Kubernetes HA joining worker node error - cluster CA found in cluster-info configmap is invalid: public key

I am trying to set up a 3-node multi-master Kubernetes v1.10.0 cluster on CentOS 7, following the kubeadm documentation steps. I was able to set up the 3 masters, and all of them are in Ready status.
https://kubernetes.io/docs/setup/independent/high-availability/#kubeadm-init-master0
While joining a worker node, I am getting a certificate issue; the kubeadm join command gives this error message.
kubeadm join <masterip>:6443 --token v5ylg8.jgqn122kewvobaoo --discovery-token-ca-cert-hash sha256:b01832713190461cc96ca02e5c2b1e578473c6712b0965d2383aa9de9b41d4b6 --ignore-preflight-errors=cri
[preflight] Running pre-flight checks.
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.09.1-ce. Max validated version: 17.03
[WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
How can I find the root cause of this certificate error message?
Thanks
SR
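One way to narrow this down (a sketch, assuming the CA sits at kubeadm's default path /etc/kubernetes/pki/ca.crt) is to recompute the CA public key hash on each master and compare it with the --discovery-token-ca-cert-hash value passed to kubeadm join:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
If the masters print different hashes, they were initialised with different CA certificates, which would explain why the public key published in the cluster-info configmap does not match the pinned one.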

Related

Single-Node Kubernetes Cluster Has Cluster-Wide 401 Unauthorized Error in Microservices After CA Cert Rotation

What was done?
kubeadm init phase certs all
kubeadm init phase kubeconfig all
Daemon reloaded
Kubelet restarted
Calico CNI restarted (roughly the commands sketched below)
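A minimal sketch of those restart steps (the k8s-app=calico-node label and the systemd commands are assumptions, not taken from the original post):
systemctl daemon-reload
systemctl restart kubelet
kubectl -n kube-system delete pod -l k8s-app=calico-node   # the DaemonSet recreates the Calico pods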
Now:
All worker nodes show Ready status
All Deployments and Pods show Running status
The application has errors in its logs:
akka.management.cluster.bootstrap.internal.BootstrapCoordinator -
Resolve attempt failed! Cause:
akka.discovery.kubernetes.KubernetesApiServiceDiscovery$KubernetesApiException:
Non-200 from Kubernetes API server: 401 Unauthorized
Kube Apiserver has logs:
Unable to authenticate the request due to an error: [invalid bearer token, square/go-jose: error in cryptographic primitive]
Could it be the old certs and tokens being cached by the services somewhere?
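One plausible cause is that kubeadm init phase certs all regenerated the service-account signing key, so tokens already mounted into pods no longer validate against the API server. A rough way to check and refresh them (a sketch, assuming a cluster version that stores service-account tokens as Secrets; the namespace, secret and deployment names are placeholders):
# find the token secret referenced by the default service account in the app's namespace
kubectl -n my-namespace get sa default -o jsonpath='{.secrets[0].name}'
# delete it so the controller re-issues a token signed with the current key
kubectl -n my-namespace delete secret <secret-name-from-above>
# restart the workload so the pods remount the fresh token
kubectl -n my-namespace rollout restart deployment my-app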

Cannot get nodes using kubectl get nodes with gcloud shell

My GCP GKE cluster is connected to Rancher (v2.3.3), but it shows as unavailable with the message:
Failed to communicate with API server: Get https://X.x.X.x:443/api/v1/namespaces/kube-system?timeout=30s: waiting for cluster agent to connect
When I try to connect to the GCP K8s cluster via Cloud Shell, I cannot retrieve any info with kubectl get nodes.
Any idea why this is happening? All workloads and services are running and green; only the Ingress resources show warnings, some of them with Unhealthy status from the backend services. But first I need to know how to troubleshoot the connectivity problem to the cluster from gcloud or Rancher.
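A first step from Cloud Shell (a sketch; the cluster name, zone and project are placeholders) is to refresh the kubeconfig credentials and run kubectl with verbose output to see where the request fails:
gcloud container clusters get-credentials my-cluster --zone us-central1-a --project my-project
kubectl get nodes --v=6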

Worker node unable to join master node in kubernetes

I have two network interfaces on my master node:
192.168.56.118
10.0.3.15
While doing kubeadm init on master node, I got following command to add workers
kubeadm join --token qr1czu.5lh1nt34ldiauc1u 192.168.56.118:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
As you can see, it shows the 192.168.56.118 IP to connect to from the worker.
But while executing the same on the worker node, I'm getting the following error.
[root@k8s-worker ~]# kubeadm join --token qr1czu.5lh1nt34ldiauc1u 192.168.56.118:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "192.168.56.118:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.56.118:6443"
[discovery] Requesting info from "https://192.168.56.118:6443" again to validate TLS against the pinned public key
[discovery] Failed to request cluster info, will try again: [Get https://192.168.56.118:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: x509: certificate is valid for 10.96.0.1, 10.0.3.15, not 192.168.56.118]
I tried with the other IP, 10.0.3.15, but it returns a connection refused error, despite the fact that the firewall is disabled on the master.
[root@k8s-worker ~]# kubeadm join --token qr1czu.5lh1nt34ldiauc1u 10.0.3.15:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
[preflight] Running pre-flight checks
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[discovery] Trying to connect to API Server "10.0.3.15:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.3.15:6443"
[discovery] Failed to request cluster info, will try again: [Get https://10.0.3.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.0.3.15:6443: connect: connection refused]
How can I force the certificate to treat 192.168.56.118 as valid? Or any idea how I can resolve this issue?
You need to provide an extra API server certificate SAN (--apiserver-cert-extra-sans <ip_address>) and the API server advertise address (--apiserver-advertise-address) while initialising the cluster using kubeadm init. Your kubeadm init command will look like:
kubeadm init --apiserver-cert-extra-sans 192.168.56.118 --apiserver-advertise-address 192.168.56.118
Once you initialise the cluster with the above command, you will not face the certificate issue while joining the cluster.
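To verify which SANs the current API server certificate actually contains (a sketch, assuming the default kubeadm path on the master), you can inspect it with openssl:
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 "Subject Alternative Name"
If 192.168.56.118 is missing from that list, the certificate has to be regenerated, for example by re-initialising with the flags above.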

Kubernetes: Trying to add a second master node using the stacked control plane instructions

I was following instructions at https://kubernetes.io/docs/setup/independent/high-availability/#stacked-control-plane-and-etcd-nodes and I can't get the secondary master node to join the primary master.
$> kubeadm join LB_IP:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH --experimental-control-plane
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "LB_IP:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://LB_IP:6443"
[discovery] Requesting info from "https://LB_IP:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "LB_IP:6443"
[discovery] Successfully established connection with API Server "LB_IP:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance on a cluster that doesn't use an external etcd
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The cluster uses an external etcd.
* The certificates that must be shared among control plane instances are provided.
Here is my kubeadm init config:
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: "1.12.3"
apiServer:
  certSANs:
  - "LB_IP"
controlPlaneEndpoint: "LB_IP:6443"
networking:
  podSubnet: "192.168.128.0/17"
  serviceSubnet: "192.168.0.0/17"
And I initialized the primary master node like:
kubeadm init --config=./kube-adm-config.yaml
I have also copied all the certs to the secondary node and kubectl works on the secondary:
[root@secondary ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
primary.fqdn Ready master 8h v1.12.3
I was really hoping to not set up external etcd nodes. The instructions seem pretty straightforward and I don't understand what I am missing.
Any advice to help get this stacked control plane multi-master setup with local etcd to work would be appreciated. Or any debugging ideas. Or at least "stacked control plane doesn't work, you must use external etcd".
Upgrading to k8s version 1.13.0 resolved my issue. I think the instructions were specifically for this newer version.
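For anyone doing this on v1.13 or later, a sketch of copying the shared certificates to the joining control-plane node before running kubeadm join --experimental-control-plane (assuming the default kubeadm pki paths and root SSH access; the hostname is a placeholder):
ssh root@secondary mkdir -p /etc/kubernetes/pki/etcd
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key root@secondary:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub root@secondary:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key root@secondary:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/ca.key root@secondary:/etc/kubernetes/pki/etcd/
This is the certificate set the stacked control-plane instructions expect to be present on the new master before it joins.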

Unable to see joined nodes on Kubernetes master

This is my worker node:
root@ivu:~# kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 --ignore-preflight-errors=swap
[preflight] Running pre-flight checks.
[WARNING Swap]: running with swap on is not supported. Please disable swap
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "10.16.70.174:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.16.70.174:6443"
[discovery] Requesting info from "https://10.16.70.174:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.16.70.174:6443"
[discovery] Successfully established connection with API Server "10.16.70.174:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
While checking on the master node using kubectl get nodes, I am only able to see the master:
ivum01@ivum01-HP-Pro-3330-SFF:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ivum01-hp-pro-3330-sff Ready master 36m v1.10.0
To answer the question:
docker, kubelet, kubeadm and kubectl were installed fine
kubectl get nodes could not see the newly added node; of course kubectl get pods --all-namespaces showed nothing for this node either
docker on that node showed no activity from the kubeadm command (no k8s images pulled, no running containers)
most importantly, kubelet was not running on the worker node
Running kubelet manually gave this output:
Failed to get system container stats for "/user.slice/user-1000.slice/session-1.scope": failed to get cgroup stats for "/user.slice/user-1000.slice/session-1.scope": failed to get container info for "/user.slice/user-1000.slice/session-1.scope": unknown container "/user.slice/user-1000.slice/session-1.scope"
This is the same as described in this issue.
Tearing down and resetting the cluster (kubeadm reset) and redoing the join fixed it in my case.
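A quick way to confirm the kubelet state on the worker before and after the reset (a sketch, assuming the systemd-based kubeadm packages):
systemctl status kubelet
journalctl -u kubelet --no-pager -n 50
If the unit is inactive or crash-looping, the journal usually shows the same cgroup/config errors as above.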
I had this problem and it was solved by ensuring that the cgroup driver on the worker nodes was also set properly.
check with:
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
set it with:
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
then restart the kubelet service and rejoin the cluster:
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset
kubeadm join ...
Info from docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node