Problems with deploying a load balancer in front of etcd clusters - google-cloud-internal-load-balancer

I am trying to deploy a load balancer in front of my three-node etcd cluster, which serves clients on port 2379, but I always get a connection refused error. I've added the health-check source IPs to the firewall and even added a rule allowing 0.0.0.0/0 on port 2379, but the issue persists.
Is deploying an internal LB that forwards to backends serving on port 2379 really that difficult, or am I missing something?
Has anyone run into this kind of issue before?

OK, thank you to everyone who contributed. Here are the problem and the solution; it comes down to the nature of GCP load balancers, I think.
Below are the earlier etcd config and the latest config. As soon as I changed the config, the LB started routing traffic.
Earlier:
ExecStart=/usr/local/bin/etcd \\
--name $ETCD_NAME \\
--discovery-srv bstock.local \\
--initial-advertise-peer-urls http://$INTERNAL_IP:2380 \\
--initial-cluster-token etcd-cluster-1 \\
--initial-cluster-state new \\
--advertise-client-urls http://$INTERNAL_IP:2379 \\
--listen-peer-urls http://$INTERNAL_IP:2380 \\
--listen-client-urls http://$INTERNAL_IP:2379,http://127.0.0.1:2379 \\
--data-dir=/var/lib/etcd
Latest:
ExecStart=/usr/local/bin/etcd \\
--name $ETCD_NAME \\
--discovery-srv bstock.local \\
--initial-advertise-peer-urls http://$INTERNAL_IP:2380 \\
--initial-cluster-token etcd-cluster-1 \\
--initial-cluster-state new \\
--advertise-client-urls http://$INTERNAL_IP:2379 \\
--listen-peer-urls http://$INTERNAL_IP:2380 \\
--listen-client-urls http://0.0.0.0:2379 \\
--data-dir=/var/lib/etcd
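
For context, the usual explanation is that GCP's internal TCP load balancer is a pass-through balancer: connections arrive at the backend VM with the forwarding rule's IP as the destination address, so an etcd that listens only on the instance's own internal IP refuses them, while listening on 0.0.0.0 accepts them. A quick way to verify, using <LB_IP> as a placeholder for the forwarding-rule address and the v2 etcdctl syntax used elsewhere in this thread:
# On each backend VM, check which address etcd is bound to:
sudo ss -ntlp | grep 2379
# From another VM in the same VPC, confirm the LB routes traffic:
etcdctl --endpoints=http://<LB_IP>:2379 cluster-health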

Related

Openstack Magnum ERROR: The Parameter (octavia_ingress_controller_tag) was not defined in template

While creating a Kubernetes cluster using Magnum I get this error, and when I pass octavia_ingress_controller_tag as a parameter it reports an invalid argument.
I tried creating the cluster from the OpenStack UI as well as from the command line.
Command I used to create the cluster:
openstack coe cluster template create k8s-cluster-template \
--image coreos \
--keypair DevController \
--external-network Public \
--dns-nameserver 8.8.8.8 \
--flavor m1.tiny \
--docker-volume-size 10 \
--network-driver flannel \
--coe kubernetes \
--octavia_ingress_controller_tag ''
openstack coe cluster create k8s-cluster \
--cluster-template k8s-cluster-template \
--master-count 1 \
--node-count 2
A CoreOS image can be used for a Kubernetes cluster starting from Magnum 9.1.0, as noted in the compatibility matrix on this wiki: https://wiki.openstack.org/wiki/Magnum
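As a side note, Magnum options like this are normally passed as labels on the cluster template rather than as dedicated CLI flags, so on a Magnum release that supports it the tag would be set roughly like this (<tag> is a placeholder for the octavia-ingress-controller image tag, and ingress_controller=octavia is an assumption about the intended setup):
openstack coe cluster template create k8s-cluster-template \
--coe kubernetes \
--labels ingress_controller=octavia,octavia_ingress_controller_tag=<tag>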

Google cloud kubernetes unable to connect to cluster

I'm getting Unable to connect to the server: dial tcp <IP> i/o timeout when trying to run kubectl get pods while connected to my cluster in Google Cloud Shell. This started out of the blue, without me making any changes to my cluster setup.
gcloud beta container clusters create tia-test-cluster \
--create-subnetwork name=my-cluster \
--enable-ip-alias \
--enable-private-nodes \
--master-ipv4-cidr <IP> \
--enable-master-authorized-networks \
--master-authorized-networks <IP> \
--no-enable-basic-auth \
--no-issue-client-certificate \
--cluster-version=1.11.2-gke.18 \
--region=europe-north1 \
--metadata disable-legacy-endpoints=true \
--enable-stackdriver-kubernetes \
--enable-autoupgrade
This is the current cluster-config.
I've run gcloud container clusters get-credentials my-cluster --zone europe-north1-a --project <my project> before doing this as well.
I also noticed that my compute instances have lost their external IPs. In our staging environment, everything works as it should based on the same config.
Any pointers would be greatly appreciated.
From what you've posted, you have turned on master authorized networks for the network <IP>.
If the IP address of the Google Cloud Shell ever changes, that is exactly the error you would expect.
As per https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#cloud_shell, you need to update the allowed IP address:
gcloud container clusters update tia-test-cluster \
--region europe-north1 \
--enable-master-authorized-networks \
--master-authorized-networks [EXISTING_AUTH_NETS],[SHELL_IP]/32
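To find the Cloud Shell's current external IP for [SHELL_IP], one common approach (assuming outbound access to the OpenDNS resolver) is:
dig +short myip.opendns.com @resolver1.opendns.com
Substitute the returned address, with a /32 suffix, for [SHELL_IP] in the update command above.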

kubectl get componentstatus showing extra etcd instances

I have a single-node Kubernetes cluster running. Everything works fine, but when I run "kubectl get cs" (kubectl get componentstatus) it shows two instances of etcd, even though I am running a single etcd instance.
[root@master01 vagrant]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}
[root@master01 vagrant]# etcdctl member list
19ef3eced66f4ae3: name=master01 peerURLs=http://10.0.0.10:2380 clientURLs=http://0.0.0.0:2379 isLeader=true
[root@master01 vagrant]# etcdctl cluster-health
member 19ef3eced66f4ae3 is healthy: got healthy result from http://0.0.0.0:2379
cluster is healthy
Etcd is running as a Docker container. In the /etc/systemd/system/etcd.service file a single etcd member is configured (http://10.0.0.10:2380):
/usr/local/bin/etcd \
--name master01 \
--data-dir /etcd-data \
--listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://0.0.0.0:2379 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-advertise-peer-urls http://10.0.0.10:2380 \
--initial-cluster master01=http://10.0.0.10:2380 \
--initial-cluster-token my-token \
--initial-cluster-state new \
Also, in the API server config file /etc/kubernetes/manifests/api-srv.yaml the --etcd-servers flag is used:
- --etcd-servers=http://10.0.0.10:2379,
[root@master01 manifests]# netstat -ntulp | grep etcd
tcp6 0 0 :::2379 :::* LISTEN 31109/etcd
tcp6 0 0 :::2380 :::* LISTEN 31109/etcd
Does anyone know why "kubectl get cs" shows etcd-0 and etcd-1? Any help is appreciated.
Despite the fact that @Jyothish Kumar S found the root cause on his own and fixed the issue, it's good practice to have an answer available for those who face the same problem in the future.
The issue came from a misconfiguration in the API server config file /etc/kubernetes/manifests/api-srv.yaml, where --etcd-servers was set inappropriately.
All flags for kube-apiserver, along with their descriptions, may be found here.
The problem was the trailing comma in the --etcd-servers=http://10.0.0.10:2379, line. That comma was interpreted as an additional etcd server record, http://:::2379, which is why the "kubectl get cs" output showed two etcd entries instead of one.
Pay attention to this aspect while configuring etcd.
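For reference, the corrected entry in /etc/kubernetes/manifests/api-srv.yaml is the same line without the trailing comma:
- --etcd-servers=http://10.0.0.10:2379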

Running Kubernetes in multimaster mode

I have set up a Kubernetes (version 1.6.1) cluster with three servers in the control plane.
Apiserver is running with the following config:
/usr/bin/kube-apiserver \
--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota \
--advertise-address=x.x.x.x \
--allow-privileged=true \
--audit-log-path=/var/lib/k8saudit.log \
--authorization-mode=ABAC \
--authorization-policy-file=/var/lib/kubernetes/authorization-policy.jsonl \
--bind-address=0.0.0.0 \
--etcd-servers=https://kube1:2379,https://kube2:2379,https://kube3:2379 \
--etcd-cafile=/etc/etcd/ca.pem \
--event-ttl=1h \
--insecure-bind-address=0.0.0.0 \
--kubelet-certificate-authority=/var/lib/kubernetes/ca.pem \
--kubelet-client-certificate=/var/lib/kubernetes/kubernetes.pem \
--kubelet-client-key=/var/lib/kubernetes/kubernetes-key.pem \
--kubelet-https=true \
--service-account-key-file=/var/lib/kubernetes/ca-key.pem \
--service-cluster-ip-range=10.32.0.0/24 \
--service-node-port-range=30000-32767 \
--tls-cert-file=/var/lib/kubernetes/kubernetes.pem \
--tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \
--token-auth-file=/var/lib/kubernetes/token.csv \
--v=2 \
--apiserver-count=3 \
--storage-backend=etcd2
Now I am running kubelet with the following config:
/usr/bin/kubelet \
--api-servers=https://kube1:6443,https://kube2:6443,https://kube3:6443 \
--allow-privileged=true \
--cluster-dns=10.32.0.10 \
--cluster-domain=cluster.local \
--container-runtime=docker \
--network-plugin=kubenet \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--serialize-image-pulls=false \
--register-node=true \
--cert-dir=/var/lib/kubelet \
--tls-cert-file=/var/lib/kubernetes/kubelet.pem \
--tls-private-key-file=/var/lib/kubernetes/kubelet-key.pem \
--hostname-override=node1 \
--v=2
This works great as long as kube1 is running. If I take kube1 down, the node does not communicate with kube2 or kube3: it always uses the first apiserver passed to the --api-servers flag and does not fail over if that apiserver crashes.
What is the correct way to fail over when one of the apiservers goes down?
The --api-servers flag is deprecated and no longer in the documentation; a kubeconfig is now the way to point the kubelet at the kube-apiserver.
The kosher way to do this today is to deploy a Pod with nginx on each worker node (i.e. the ones running kubelet) that load-balances across the 3 kube-apiservers. nginx knows when a master goes down and stops routing traffic to it; that's its job. The kubespray project uses this method; a minimal config sketch is shown below.
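A minimal sketch of the nginx stream (TCP) proxy config such a local load-balancer Pod might use, assuming the masters resolve as kube1, kube2 and kube3 (names taken from the question); the kubelet on the node would then be pointed at https://127.0.0.1:6443:
stream {
    upstream kube_apiserver {
        server kube1:6443;
        server kube2:6443;
        server kube3:6443;
    }
    server {
        listen 127.0.0.1:6443;   # local endpoint the kubelet talks to
        proxy_pass kube_apiserver;
    }
}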
The 2nd, not-so-good way is to use DNS round robin (RR). Create a DNS "A" record containing the IPs of the 3 masters and point kubelet at this RR hostname instead of the 3 IPs. Each time kubelet contacts a master it will be routed to one of the IPs in the RR list. This technique isn't robust, because traffic will still be routed to the downed node, so the cluster will experience intermittent outages.
The 3rd, and more complex method IMHO, is to use keepalived. keepalived uses VRRP to ensure that at least one node owns the virtual IP (VIP). If a master goes down, another master hijacks the VIP to ensure continuity. The downside of this method is that load-balancing doesn't come by default: all traffic is routed to one master (i.e. the primary VRRP node) until it goes down, and only then does the secondary VRRP node take over. You can see the nice write-up I contributed at this page :) A rough config sketch follows.
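A rough keepalived config for the primary master, assuming eth0 as the interface and 10.0.0.100 as the chosen VIP (both are placeholders, not values from this cluster); the other masters would run the same block with state BACKUP and a lower priority:
vrrp_instance VI_1 {
    state MASTER
    interface eth0            # assumed NIC name
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        10.0.0.100/24         # assumed VIP the kubelets would target
    }
}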
More details about kube-apiserver HA here. Good luck!
For the moment, until 1.8, the best solution seems to be using a load-balancer, as already suggested.
See https://github.com/sipb/homeworld/issues/10.

KubeDNS namespace lookups failing

Stack
Environment: Azure
Type of install: Custom
Base OS: CentOS 7.3
Docker: 1.12.5
The first thing I will say is that I have this same install working in AWS with the same configuration files for apiserver, manager, scheduler, kubelet, and kube-proxy.
Here is the kubelet config:
/usr/bin/kubelet \
--require-kubeconfig \
--allow-privileged=true \
--cluster-dns=10.32.0.10 \
--container-runtime=docker \
--docker=unix:///var/run/docker.sock \
--network-plugin=kubenet \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--serialize-image-pulls=true \
--cgroup-root=/ \
--system-container=/system \
--node-status-update-frequency=4s \
--tls-cert-file=/var/lib/kubernetes/kubernetes.pem \
--tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \
--v=2
Kube-proxy config:
/usr/bin/kube-proxy \
--master=https://10.240.0.6:6443 \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--proxy-mode=iptables \
--v=2
Behavior:
Log in to any of the pods on any node:
nslookup kubernetes 10.32.0.10
Server: 10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes': Try again
What does work is:
nslookup kubernetes.default.svc.cluster.local. 10.32.0.10
Server: 10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default.svc.cluster.local.
Address 1: 10.32.0.1 kubernetes.default.svc.cluster.local
So I figured out that on Azure, the resolv.conf looked like this:
; generated by /usr/sbin/dhclient-script
search ssnci0siiuyebf1tqq5j1a1cyd.bx.internal.cloudapp.net
nameserver 10.32.0.10
options ndots:5
If I added the search domains default.svc.cluster.local svc.cluster.local cluster.local, everything started working, and I understand why.
However, this is problematic because for every namespace I create, I would need to manage resolv.conf.
This does not happen when I deploy in Amazon, so I am somewhat stumped as to why it is happening in Azure.
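For illustration, the manually patched resolv.conf that made lookups work looked roughly like this (the ordering alongside the Azure-generated search domain is an assumption):
search default.svc.cluster.local svc.cluster.local cluster.local ssnci0siiuyebf1tqq5j1a1cyd.bx.internal.cloudapp.net
nameserver 10.32.0.10
options ndots:5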
Kubelet has a command-line flag, --cluster-domain, which it looks like you're missing. See the docs.
Add --cluster-domain=cluster.local to your kubelet startup command and it should start working as expected.
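Applied to the kubelet config from the question, the startup command would look like this, with only the --cluster-domain flag added:
/usr/bin/kubelet \
--require-kubeconfig \
--allow-privileged=true \
--cluster-dns=10.32.0.10 \
--cluster-domain=cluster.local \
--container-runtime=docker \
--docker=unix:///var/run/docker.sock \
--network-plugin=kubenet \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--serialize-image-pulls=true \
--cgroup-root=/ \
--system-container=/system \
--node-status-update-frequency=4s \
--tls-cert-file=/var/lib/kubernetes/kubernetes.pem \
--tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \
--v=2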