How to access etcd cluster endpoints from kubernetes master

Is there a way I can access the etcd endpoints from the kubernetes master node without actually getting into the etcd cluster?
For example, can I curl the etcd endpoints' health (over ssh) or list the endpoints and get the return status from the kubernetes master node (i.e. without actually getting inside the etcd master)?

It really depends on how you configured the cluster. The etcd cluster may even run entirely outside of the k8s cluster. Also, etcd can be configured with TLS auth, in which case you will need to provide cert files to be able to make any request via curl. etcdctl does everything you need. Something like:
~# export ETCDCTL_API=3
~# export ETCDCTL_ENDPOINTS=https://kub01.msk.test.ru:2379,https://kub02.msk.test.ru:2379,https://kub03.msk.test.ru:2379
~# etcdctl endpoint status
https://kub01.msk.test.ru:2379, e9bc9d307c96fd08, 3.3.13, 10 MB, true, 1745, 17368976
https://kub02.msk.test.ru:2379, 885ed66440d63a79, 3.3.13, 10 MB, false, 1745, 17368976
https://kub03.msk.test.ru:2379, 8c5c20ece034a652, 3.3.13, 10 MB, false, 1745, 17368976
(The status columns are: endpoint, member ID, version, db size, is-leader flag, raft term, raft index.) With TLS enabled but no client certificates provided, the same commands fail:
~# etcdctl endpoint health
client: etcd cluster is unavailable or misconfigured; error #0: remote error: tls: bad certificate
; error #1: remote error: tls: bad certificate
; error #2: remote error: tls: bad certificate
# need to export environment vars
~# export ETCDCTL_CACERT=<PATH_TO_FILE>
~# export ETCDCTL_CERT=<PATH_TO_FILE>
~# export ETCDCTL_KEY=<PATH_TO_FILE>
~# etcdctl endpoint health
https://kub01.msk.test.ru:2379 is healthy: successfully committed proposal: took = 2.946423ms
https://kub02.msk.test.ru:2379 is healthy: successfully committed proposal: took = 1.5883ms
https://kub03.msk.test.ru:2379 is healthy: successfully committed proposal: took = 1.745591ms
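If you prefer raw curl over etcdctl, the /health endpoint accepts the same certificates. A minimal sketch, reusing the variables exported above against one of the example endpoints:
~# curl --cacert "$ETCDCTL_CACERT" --cert "$ETCDCTL_CERT" --key "$ETCDCTL_KEY" https://kub01.msk.test.ru:2379/health
{"health":"true"}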

You can run commands in a pod without actually getting inside it. For example, if I have to run ls -l inside the etcd pod, I would do:
kubectl exec -it -n kube-system etcd-kanister-control-plane -- ls -l
Similarly, you can run any command in place of ls -l.
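For instance, a health check from the master without entering the pod might look like this; a sketch assuming the pod name from the example above, the standard kubeadm certificate paths inside the pod, and a shell present in the etcd image:
kubectl exec -n kube-system etcd-kanister-control-plane -- sh -c \
  'ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key endpoint health'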

Related

rancher rke up errors on etcd host health checks remote error: tls: bad certificate

rke --debug up --config cluster.yml
fails health checks on the etcd hosts with the error:
DEBU[0281] [etcd] failed to check health for etcd host [x.x.x.x]: failed to get /health for host [x.x.x.x]: Get "https://x.x.x.x:2379/health": remote error: tls: bad certificate
Checking etcd health checks:
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5"); do
echo "Validating connection to ${endpoint}/health";
curl -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/health";
done
Running that on the master node gives:
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
You can also run the check manually and see if it responds correctly:
curl -w "\n" --cacert /etc/kubernetes/ssl/kube-ca.pem --cert /etc/kubernetes/ssl/kube-etcd-x-x-x-x.pem --key /etc/kubernetes/ssl/kube-etcd-x-x-x-x-key.pem https://x.x.x.x:2379/health
Checking my self-signed certificate hashes:
# md5sum /etc/kubernetes/ssl/kube-ca.pem
f5b358e771f8ae8495c703d09578eb3b /etc/kubernetes/ssl/kube-ca.pem
# for key in $(cat /home/kube/cluster.rkestate | jq -r '.desiredState.certificatesBundle | keys[]'); do echo $(cat /home/kube/cluster.rkestate | jq -r --arg key $key '.desiredState.certificatesBundle[$key].certificatePEM' | sed '$ d' | md5sum) $key; done | grep kube-ca
f5b358e771f8ae8495c703d09578eb3b - kube-ca
Versions on my master node:
Debian GNU/Linux 10
rke version v1.3.1
docker version Version: 20.10.8
kubectl v1.21.5
v1.21.5-rancher1-1
I think my cluster.rkestate has gone bad; are there any other locations where the rke tool checks for certificates?
Currently I cannot do anything with this production cluster, and I want to avoid downtime. I experimented with different scenarios on a testing cluster; as a last resort I could recreate the cluster from scratch, but maybe I can still fix it...
rke remove && rke up
rke util get-state-file helped me to reconstruct the bad cluster.rkestate file,
and I was able to successfully run rke up and add a new master node to fix the whole situation.
The problem can be solved by doing the following steps:
Remove the kube_config_cluster.yml file from the directory where you run the rke up command (since some data is missing on your K8s nodes).
Remove the cluster.rkestate file.
Re-run the rke up command.
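In shell terms, from the directory containing cluster.yml, that is roughly:
rm kube_config_cluster.yml
rm cluster.rkestate
rke up --config cluster.yml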

Kubernetes, Unable to connect to the server: EOF

Environment of kubectl: Windows 10.
Kubectl version: https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/windows/amd64/kubectl.exe
Hello. I've just installed a Kubernetes cluster on Google Cloud Platform, then ran the following command:
gcloud container clusters get-credentials my-cluster --zone europe-west1-b --project my-project
It successfully added the credentials at %UserProfile%\.kube\config
But when I try kubectl get pods it returns Unable to connect to the server: EOF. My computer accesses the internet through a corporate proxy. How and where can I provide a cert file for kubectl so it uses the cert with all requests? Thanks.
You would get EOF if there is no response to kubectl API calls within a certain time (the idle timeout is 300 seconds by default).
Try increasing the cluster idle timeout, or you might need a VPN to reach those pods (something like that).
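Since you access the internet through a corporate proxy: kubectl honors the standard proxy environment variables, so another thing worth trying is pointing kubectl at the proxy explicitly. A sketch for Windows cmd, with a placeholder proxy address:
set HTTPS_PROXY=http://proxy.corp.example:3128
set NO_PROXY=localhost,127.0.0.1
kubectl get pods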

In a Kubernetes cluster, is there a way to migrate etcd from external to internal?

I made a Kubernetes cluster one year ago with an external etcd cluster (3 members).
At the time, I did not know that it was possible to make etcd internal, so I made an external cluster and connected Kubernetes to it.
Now I am seeing that an internal cluster is a thing, and it is a cleaner solution because the etcd nodes are updated when you update your Kubernetes cluster.
I can't find a clean solution to migrate an external etcd cluster to an internal one. I hope there is a solution with zero downtime. Do you know if it is possible?
Thank you for your response and have a nice day!
As I understand it, you have 3 etcd cluster members, external from the Kubernetes cluster's perspective. The expected outcome is to have all three members running on Kubernetes master nodes.
Some information is left undisclosed, so I'll try to explain several possible options.
First of all, there are several reasonable ways to run the etcd process for use as the Kubernetes control-plane key-value storage:
etcd run as a static pod, with its startup configuration in the /etc/kubernetes/manifests/etcd.yaml file
etcd run as a system service defined in /etc/systemd/system/etcd.service or a similar file
etcd run as a docker container configured using command line options (this solution is not really safe unless you make the container restart after failure or host reboot)
For experimental purposes, you can also run etcd:
as a simple process in linux userspace
as a stateful set in the kubernetes cluster
as an etcd cluster managed by etcd-operator.
My personal recommendation is a 5-member etcd cluster: 3 members run as static pods on 3 Kubernetes master nodes, and two more run as static pods on external (Kubernetes-cluster-independent) hosts. In this case you will still have a quorum if at least one master node is running, or if you lose both external nodes for any reason.
There are at least two ways to migrate an etcd cluster from external instances to the Kubernetes cluster master nodes. They work in the opposite direction too.
Migration
It's a quite straightforward way to migrate the cluster. During this procedure, members are turned off (one at a time), moved to another host, and started again. Your cluster shouldn't have any problems as long as the etcd cluster keeps quorum. My recommendation is to have an etcd cluster of at least 3, or better 5, nodes to make the migration safer. For bigger clusters, it may be more convenient to use the other solution from my second answer.
The process of moving an etcd member to another IP address is described in the official documentation:
To migrate a member:
Stop the member process.
Copy the data directory of the now-idle member to the new machine.
Update the peer URLs for the replaced member to reflect the new machine according to the runtime reconfiguration instructions.
Start etcd on the new machine, using the same configuration and the copy of the data directory.
Now let's look closer at each step:
0.1 Ensure your etcd cluster is healthy and all members are in good condition. I would recommend also checking the logs of all etcd members, just in case.
(To successfully run the following commands, please refer to step 3 for auth variables and aliases.)
# the last two commands only show the members specified via the --endpoints command line option
# the following commands should be run with root privileges because the certificates are not accessible to a regular user
e2 cluster-health
e3 endpoint health
e3 endpoint status
0.2 Check each etcd member's configuration and find out where the etcd data-dir is located, then ensure that it will remain accessible after the etcd process terminates. In most cases it's located under /var/lib/etcd on the host machine and used directly or mounted as a volume into the etcd pod or docker container.
0.3 Create a snapshot of each etcd cluster member; it's better to have one and not need it than to need one and not have it.
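A snapshot can be taken with the v3 API against a single member, for example (a sketch using the auth variables from step 3, run on the member host itself and assuming etcd also listens on 127.0.0.1; the output path is an example):
ETCDCTL_API=3 etcdctl $ETCDAUTH3 --endpoints https://127.0.0.1:2379 snapshot save /var/backups/etcd-snapshot.db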
1. Stop the etcd member process.
If you use kubelet to start etcd, as recommended here, move the etcd.yaml file out of /etc/kubernetes/manifests/. Right after that, the etcd Pod will be terminated by kubelet:
sudo mv /etc/kubernetes/manifests/etcd.yaml ~/
sudo chmod 644 ~/etcd.yaml
In case you start the etcd process as a systemd service, you can stop it using the following command:
sudo systemctl stop etcd-service-name.service
In the case of a docker container, you can stop it using the following commands:
docker ps -a
docker stop <etcd_container_id>
docker rm <etcd_container_id>
If you run the etcd process from the command line, you can kill it using the following command:
kill `pgrep etcd`
2. Copy the data directory of the now-idle member to the new machine.
Not much complexity here. Pack the etcd data-dir into an archive and copy it to the destination instance. I also recommend copying the etcd manifest or systemd service configuration if you plan to run etcd on the new instance the same way.
tar -C /var/lib -czf etcd-member-name-data.tar.gz etcd
tar -czf etcd-member-name-conf.tar.gz [etcd.yaml] [/etc/systemd/system/etcd.service] [/etc/kubernetes/manifests/etcd.conf ...]
scp etcd-member-name-data.tar.gz destination_host:~/
scp etcd-member-name-conf.tar.gz destination_host:~/
3. Update the peer URLs for the replaced member to reflect the new member's IP address, according to the runtime reconfiguration instructions.
There are two ways to do it: by using the etcd API or by running the etcdctl utility.
That's how the etcdctl way may look:
(replace the etcd endpoint variables with the correct etcd cluster member IP addresses)
# all etcd cluster members should be specified
export ETCDSRV="--endpoints https://etcd.ip.addr.one:2379,https://etcd.ip.addr.two:2379,https://etcd.ip.addr.three:2379"
#authentication parameters for v2 and v3 etcdctl APIs
export ETCDAUTH2="--ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key"
export ETCDAUTH3="--cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key"
# etcdctl API v3 alias
alias e3="ETCDCTL_API=3 etcdctl $ETCDAUTH3 $ETCDSRV"
# etcdctl API v2 alias
alias e2="ETCDCTL_API=2 etcdctl $ETCDAUTH2 $ETCDSRV"
# list all etcd cluster members and their IDs
e2 member list
e2 member update member_id http://new.etcd.member.ip:2380
#or
e3 member update member_id --peer-urls="https://new.etcd.member.ip:2380"
That's how the etcd API way may look:
export CURL_ETCD_AUTH="--cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt"
curl https://healthy.etcd.instance.ip:2379/v2/members/member_id -XPUT -H "Content-Type: application/json" -d '{"peerURLs":["http://new.etcd.member.ip:2380"]}' ${CURL_ETCD_AUTH}
4. Start etcd on the new machine, using the adjusted configuration and the copy of the data directory.
Unpack the etcd data-dir on the new host:
tar -xzf etcd-member-name-data.tar.gz -C /var/lib/
Adjust the etcd startup configuration according to your needs. At this point it's easy to select another way to run etcd. Depending on your choice, prepare the manifest or service definition file and replace the old IP address with the new one there. E.g.:
sed -i 's/\/10.128.0.12:/\/10.128.0.99:/g' etcd.yaml
Now it's time to start etcd by moving etcd.yaml back to /etc/kubernetes/manifests/, or by running the following command (if you run etcd as a systemd service):
sudo systemctl start etcd-service-name.service
5. Check the updated etcd process logs and the etcd cluster health to ensure that the member is healthy.
To do that you can use the following commands:
$ e2 cluster-health
$ kubectl logs etcd_pod_name -n kube-system
$ docker logs etcd_container_id 2>&1 | less
$ journalctl -e -u etcd_service_name
The second solution I've mentioned in another answer is
Growing and then shrinking the etcd cluster
The downside of this method is that the etcd quorum size is temporarily increased (e.g. growing from 3 to 4 members raises the quorum from 2 to 3), so if several nodes fail during the process, the etcd cluster may break. To avoid that, you may want to remove one existing etcd cluster member before adding another one.
Here is a brief overview of the process:
1. generate certificates for all additional members using the etcd ca.crt and ca.key from an existing etcd node's folder (/etc/kubernetes/pki/etcd/)
2. add a new member to the cluster using the etcdctl command
3. create an etcd config for the new member
4. start the new etcd member using the new keys and config
5. check cluster health
6. repeat steps 2-5 until all required etcd nodes are added
7. remove one excessive etcd cluster member using the etcdctl command
8. check cluster health
9. repeat steps 7-8 until the desired size of the etcd cluster is achieved
10. adjust all etcd.yaml files for all etcd cluster members
11. adjust the etcd endpoints in all kube-apiserver.yaml manifests
Another possible sequence:
1. generate certificates for all additional members using the etcd ca.crt and ca.key from an existing etcd node's folder (/etc/kubernetes/pki/etcd/)
2. remove one etcd cluster member using the etcdctl command
3. add a new member to the cluster using the etcdctl command
4. create an etcd config for the new member
5. start the new etcd member using the new keys and config
6. check cluster health
7. repeat steps 2-6 until the required etcd configuration is achieved
8. adjust all etcd.yaml files for all etcd cluster members
9. adjust the etcd endpoints in all kube-apiserver.yaml manifests
How to generate certificates:
using the kubeadm command (manual; see the sketch after the notes below)
using the cfssl tool (Kubernetes the hard way guide)
using openssl (link1, link2)
Note: If you have an etcd cluster, you likely have the etcd CA certificate somewhere. Consider using it along with the etcd CA key to generate certificates for all additional etcd members.
Note: In case you choose to generate certificates manually, the usual Kubernetes certificate parameters are:
Signature Algorithm: sha256WithRSAEncryption
Public Key Algorithm: rsaEncryption
RSA Public-Key: (2048 bit)
CA certs age: 10 years
other certs age: 1 year
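If you go the kubeadm route, generating the certificates for a new etcd member might look like the sketch below; it assumes the etcd CA pair (ca.crt and ca.key) has already been copied to /etc/kubernetes/pki/etcd/ on the new host, and that the default SANs derived from the node are acceptable (otherwise pass a kubeadm config file):
# run on the new etcd member host; the CA pair must already be in place
kubeadm init phase certs etcd-server
kubeadm init phase certs etcd-peer
kubeadm init phase certs etcd-healthcheck-client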
You can check the content of the certificates using the following command:
find /etc/kubernetes/pki/ -name '*.crt' | xargs -l bash -c 'echo $0 ; openssl x509 -in $0 -text -noout'
How to remove a member from the etcd cluster
(Please refer to my other answer, step 3, for variable and alias definitions.)
e3 member list
b67816d38b8e9d2, started, kube-ha-m3, https://10.128.0.12:2380, https://10.128.0.12:2379
3de72bd56f654b1c, started, kube-ha-m1, https://10.128.0.10:2380, https://10.128.0.10:2379
ac98ece88e3519b5, started, kube-etcd2, https://10.128.0.14:2380, https://10.128.0.14:2379
cfb0839e8cad4c8f, started, kube-ha-m2, https://10.128.0.11:2380, https://10.128.0.11:2379
eb9b83c725146b96, started, kube-etcd1, https://10.128.0.13:2380, https://10.128.0.13:2379
401a166c949e9584, started, kube-etcd3, https://10.128.0.15:2380, https://10.128.0.15:2379 # Let's remove this one
e2 member remove 401a166c949e9584
The member will shut down instantly. To prevent further attempts to rejoin the cluster, move/delete etcd.yaml out of /etc/kubernetes/manifests/ or shut down the etcd service on that member's node.
How to add a member to the etcd cluster
e3 member add kube-etcd3 --peer-urls="https://10.128.0.16:2380"
The output shows the parameters required to start the new etcd cluster member, e.g.:
ETCD_NAME="kube-etcd3"
ETCD_INITIAL_CLUSTER="kube-ha-m3=https://10.128.0.15:2380,kube-ha-m1=https://10.128.0.10:2380,kube-etcd2=https://10.128.0.14:2380,kube-ha-m2=https://10.128.0.11:2380,kube-etcd1=https://10.128.0.13:2380,kube-etcd3=https://10.128.0.16:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.128.0.16:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
Note: the ETCD_INITIAL_CLUSTER variable contains all existing etcd cluster members plus the new node. If you need to add several nodes, it should be done one node at a time.
Note: all ETCD_INITIAL_* variables and the corresponding command line parameters are only required for the first start of the etcd Pod. After the node is successfully added to the etcd cluster, these parameters are ignored and can be removed from the startup configuration. All required information is stored in the /var/lib/etcd folder, in the etcd database file.
The default etcd.yaml manifest can be generated using the following kubeadm command:
kubeadm init phase etcd local
It's better to move the etcd.yaml file out of /etc/kubernetes/manifests/ somewhere else to make adjustments.
Also delete the content of the /var/lib/etcd folder: it contains the data of a new etcd cluster, so it can't be used to add a member to an existing cluster.
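For example (paths as used throughout this answer; the rm is destructive, so double-check you are on the new member host):
sudo mv /etc/kubernetes/manifests/etcd.yaml ~/
sudo rm -rf /var/lib/etcd/*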
Then the manifest should be adjusted according to the member add command output (--advertise-client-urls, --initial-advertise-peer-urls, --initial-cluster, --initial-cluster-state, --listen-client-urls, --listen-peer-urls). E.g.:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.128.0.16:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.128.0.16:2380
    - --initial-cluster=kube-ha-m3=https://10.128.0.15:2380,kube-ha-m1=https://10.128.0.10:2380,kube-etcd2=https://10.128.0.14:2380,kube-ha-m2=https://10.128.0.11:2380,kube-etcd1=https://10.128.0.13:2380,kube-etcd3=https://10.128.0.16:2380
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://10.128.0.16:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://10.128.0.16:2380
    - --name=kube-etcd3
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.3.10
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
After saving the file, kubelet will start the etcd pod. Check the etcd container logs to ensure it has joined the cluster.
How to check cluster health
$ e2 cluster-health
member b67816d38b8e9d2 is healthy: got healthy result from https://10.128.0.15:2379
member 3de72bd56f654b1c is healthy: got healthy result from https://10.128.0.10:2379
member ac98ece88e3519b5 is healthy: got healthy result from https://10.128.0.14:2379
member cfb0839e8cad4c8f is healthy: got healthy result from https://10.128.0.11:2379
member eb9b83c725146b96 is healthy: got healthy result from https://10.128.0.13:2379
cluster is healthy
$ e2 member list
b67816d38b8e9d2: name=kube-ha-m3 peerURLs=https://10.128.0.15:2380 clientURLs=https://10.128.0.15:2379 isLeader=true
3de72bd56f654b1c: name=kube-ha-m1 peerURLs=https://10.128.0.10:2380 clientURLs=https://10.128.0.10:2379 isLeader=false
ac98ece88e3519b5: name=kube-etcd2 peerURLs=https://10.128.0.14:2380 clientURLs=https://10.128.0.14:2379 isLeader=false
cfb0839e8cad4c8f: name=kube-ha-m2 peerURLs=https://10.128.0.11:2380 clientURLs=https://10.128.0.11:2379 isLeader=false
eb9b83c725146b96: name=kube-etcd1 peerURLs=https://10.128.0.13:2380 clientURLs=https://10.128.0.13:2379 isLeader=false
$ e3 endpoint health
# the output includes only the etcd members specified in the --endpoints cli option or the corresponding environment variable. I've included only three out of five members
https://10.128.0.13:2379 is healthy: successfully committed proposal: took = 2.310436ms
https://10.128.0.15:2379 is healthy: successfully committed proposal: took = 1.795723ms
https://10.128.0.14:2379 is healthy: successfully committed proposal: took = 2.41462ms
$ e3 endpoint status
# the output includes only the etcd members specified in the --endpoints cli option or the corresponding environment variable. I've included only three out of five members
https://10.128.0.13:2379 is healthy: successfully committed proposal: took = 2.531676ms
https://10.128.0.15:2379 is healthy: successfully committed proposal: took = 1.285312ms
https://10.128.0.14:2379 is healthy: successfully committed proposal: took = 2.266932ms
How to check etcd Pod logs without using kubectl?
If you run the etcd member using kubelet only, you can check its logs using the following command:
docker logs `docker ps -a | grep etcd | grep -v pause | awk '{print $1}' | head -n1` 2>&1 | less
Note: Usually, only one etcd Pod can run on a node at a time, because it uses the database in the host directory /var/lib/etcd/, which cannot be shared between two pods, and the etcd Pod uses the node's network interface to communicate with the etcd cluster.
Of course, you can configure an etcd Pod to use a different host directory and different host ports as a workaround, but the above command assumes that only one etcd Pod is present on the node.

Why does tiller connect to localhost:8080 for the kubernetes api?

When using helm for kubernetes package management, after installing the helm client and running
helm init
I can see tiller pods running on the kubernetes cluster, but when I then run helm ls, it gives an error:
Error: Get http://localhost:8080/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: dial tcp 127.0.0.1:8080: getsockopt: connection refused
and with kubectl logs I can see a similar message:
[storage/driver] 2017/08/28 08:08:48 list: failed to list: Get http://localhost:8080/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: dial tcp 127.0.0.1:8080: getsockopt: connection refused
I can see the tiller pod is running on one of the nodes instead of the master, and there is no api server running on that node. Why does it connect to 127.0.0.1 instead of my master ip?
Run this before doing helm init. It worked for me.
kubectl config view --raw > ~/.kube/config
First, delete the tiller deployment and stop the tiller service by running the commands below:
kubectl delete deployment tiller-deploy --namespace=kube-system
kubectl delete service tiller-deploy --namespace=kube-system
rm -rf $HOME/.helm/
By default, helm init installs the Tiller pod into the kube-system namespace, with Tiller configured to use the default service account.
Configure Tiller with cluster-admin access with the following command:
kubectl create clusterrolebinding tiller-cluster-admin \
--clusterrole=cluster-admin \
--serviceaccount=kube-system:default
Then install the helm server (Tiller) with the following command:
helm init
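After that, you can verify that Tiller is up and reachable; a quick check, assuming the labels helm init puts on the tiller-deploy pod:
kubectl get pods -n kube-system -l name=tiller
helm version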
I had been having this problem for a couple of weeks on my workstation, and none of the answers provided (here or on GitHub) worked for me.
What did work is this:
sudo kubectl proxy --kubeconfig ~/.kube/config --port 80
Notice that I am using port 80, so I needed sudo to be able to bind the proxy there; if you are using 8080 you won't need that.
Be careful with this, because the kubeconfig file that the command above points to is /root/.kube/config rather than the one in your usual $HOME. You can either use an absolute path to point to the config you want, or create one in root's home (or use the sudo flag --preserve-env=HOME to preserve your original HOME env var).
Now, if you are using helm by itself, I guess this is it. In my setup, as I am using Helm through the Terraform provider on GKE, this was a pain in the ass to debug, because the message I was getting doesn't even mention Helm and is returned by Terraform when planning. For anybody who may be in a similar situation:
The errors when doing a plan/apply operation in Terraform on any cluster with Helm releases in the state:
Error: error installing: Post "http://localhost/apis/apps/v1/namespaces/kube-system/deployments": dial tcp [::1]:80: connect: connection refused
Error: Get "http://localhost/api/v1/namespaces/system/secrets/apigee-secrets": dial tcp [::1]:80: connect: connection refused
One of these errors appears for every helm release in the cluster, or something like that. In this case, for a GKE cluster, I had to ensure that the env var GOOGLE_APPLICATION_CREDENTIALS points to a key file with valid credentials (application-default, unless you are not using the default setup for application auth):
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS=/home/$USER/.config/gcloud/application_default_credentials.json
With the kube proxy in place and the correct credentials, I am able to use Terraform (and Helm) as usual again. I hope this is helpful for anybody experiencing this.
kubectl config view --raw > ~/.kube/config
export KUBECONFIG=~/.kube/config
worked for me.

DNS not resolving though all keys are there in etcd?

Here are some details, and why this is important for my next step in testing:
I can resolve any outside DNS.
etcd appears to have all keys updating correctly, along with directories (as expected).
Local-to-Kubernetes DNS queries don't appear to be working against the etcd datastore, even though I can manually query for key-values.
This is the next step that I need to complete before I can start using an NGINX L7 LB demo.
I looked at the advice in #10265 first [just in case], but it appears I do have secrets for the service account... and I think(?) everything should be there as expected.
The only thing I really see in the Kube2Sky logs is that etcd is found. I would imagine I should be seeing more than this?
[fedora@kubemaster ~]$ kubectl logs kube-dns-v10-q9mlb -c kube2sky --namespace=kube-system
I0118 17:42:24.639508 1 kube2sky.go:436] Etcd server found: http://127.0.0.1:4001
I0118 17:42:25.642366 1 kube2sky.go:503] Using https://10.254.0.1:443 for kubernetes master
I0118 17:42:25.642772 1 kube2sky.go:504] Using kubernetes API
[fedora@kubemaster ~]$
More Details:
[fedora@kubemaster ~]$ kubectl exec -t busybox -- nslookup kubelab.local
Server: 10.254.0.10
Address 1: 10.254.0.10
nslookup: can't resolve 'kubelab.local'
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
[fedora@kubemaster ~]$ etcdctl ls --recursive
/kubelab.local
/kubelab.local/network
/kubelab.local/network/config
/kubelab.local/network/subnets
/kubelab.local/network/subnets/172.16.46.0-24
/kubelab.local/network/subnets/172.16.12.0-24
/kubelab.local/network/subnets/172.16.70.0-24
/kubelab.local/network/subnets/172.16.21.0-24
/kubelab.local/network/subnets/172.16.54.0-24
/kubelab.local/network/subnets/172.16.71.0-24
....and so on...the keys are all there, as expected...
I see you changed the default "cluster.local" to "kubelab.local". Did you change the skydns config to serve that domain?
kubectl exec --namespace=kube-system $podname -c skydns ps
PID USER COMMAND
1 root /skydns -machines=http://127.0.0.1:4001 -addr=0.0.0.0:53 -ns-rotate=false -domain=cluster.local.
11 root ps
Note the -domain flag.
If that is correct, check that you passed the correct --cluster-dns and --cluster-domain flags to Kubelet. Then show me /etc/resolv.conf from a pod that cannot do DNS lookups.
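For reference, the kubelet flags in question would look something like this; a sketch that reuses the DNS service IP from your nslookup output and your custom domain:
kubelet --cluster-dns=10.254.0.10 --cluster-domain=kubelab.local ...
And /etc/resolv.conf inside a pod in the default namespace should then contain something like:
nameserver 10.254.0.10
search default.svc.kubelab.local svc.kubelab.local kubelab.local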