I have two network interfaces on my master node:
192.168.56.118
10.0.3.15
While doing kubeadm init on the master node, I got the following command to add workers:
kubeadm join --token qr1czu.5lh1nt34ldiauc1u 192.168.56.118:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
As you can see, it shows the 192.168.56.118 IP for the worker to connect to.
But while executing the same command on the worker node, I'm getting the following error.
[root@k8s-worker ~]# kubeadm join --token qr1czu.5lh1nt34ldiauc1u 192.168.56.118:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "192.168.56.118:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.56.118:6443"
[discovery] Requesting info from "https://192.168.56.118:6443" again to validate TLS against the pinned public key
[discovery] Failed to request cluster info, will try again: [Get https://192.168.56.118:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: x509: certificate is valid for 10.96.0.1, 10.0.3.15, not 192.168.56.118]
I tried with the other IP, 10.0.3.15, but it returns a connection refused error, despite the fact that the firewall is disabled on the master.
[root@k8s-worker ~]# kubeadm join --token qr1czu.5lh1nt34ldiauc1u 10.0.3.15:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
[preflight] Running pre-flight checks
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[discovery] Trying to connect to API Server "10.0.3.15:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.3.15:6443"
[discovery] Failed to request cluster info, will try again: [Get https://10.0.3.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.0.3.15:6443: connect: connection refused]
How can I force the certificate to treat 192.168.56.118 as valid? Or is there any other way to resolve this issue?
You need to provide an extra API server certificate SAN (--apiserver-cert-extra-sans <ip_address>) and the API server advertise address (--apiserver-advertise-address) while initialising the cluster with kubeadm init. Your kubeadm init command will look like:
kubeadm init --apiserver-cert-extra-sans 192.168.56.118 --apiserver-advertise-address 192.168.56.118
Once you initialise the cluster with the above command, you will not face the certificate issue while joining the cluster.
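If you want to verify that the extra IP actually made it into the API server certificate after re-initialising, one way (assuming the default kubeadm certificate path) is to inspect its Subject Alternative Names on the master:
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'
192.168.56.118 should now appear alongside 10.96.0.1 and 10.0.3.15.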
Related
What was done?
kubeadm init phase certs all
kubeadm init phase kubeconfig all
Daemon reloaded
Kubelet restarted
Calico CNI restarted
Now:
All Worker Nodes show Ready State
All Deployments and pods show Running state
Application has errors in logs:
akka.management.cluster.bootstrap.internal.BootstrapCoordinator -
Resolve attempt failed! Cause:
akka.discovery.kubernetes.KubernetesApiServiceDiscovery$KubernetesApiException:
Non-200 from Kubernetes API server: 401 Unauthorized
The kube-apiserver logs show:
Unable to authenticate the request due to an error: [invalid bearer token, square/go-jose: error in cryptographic primitive]
Could it be that the old certs and tokens are being cached by the services somewhere?
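One thing worth checking: regenerating the certificates can leave pods mounting service account tokens that were signed with the old key, which produces exactly this kind of 401. A rough sketch of refreshing them (the namespace, secret and label names below are placeholders, not from the original post):
# list the service account token secrets in the application's namespace
kubectl -n my-namespace get secrets | grep token
# delete the stale token secret; the token controller recreates it with the current key
kubectl -n my-namespace delete secret my-app-token-xxxxx
# delete the pods so they restart and mount the fresh token
kubectl -n my-namespace delete pod -l app=my-app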
I have 5 VMs in GCP, of which three are supposed to be master1, master2 and master3, and the other two are worker nodes (worker1 & worker2). I have created a TCP load balancer (LB) to enable load balancing for the master nodes. The LB has two sections:
i) frontend ii) backend
In the backend, I have defined all the master IPs. In the frontend, I generated a static public IP and set port 6443 as the LB port.
On master1, I successfully ran the kubeadm init command as follows:
kubeadm init --control-plane-endpoint="<LB_IP>:6443" --apiserver-advertise-address=10.128.0.2 --pod-network-cidr=10.244.0.0/16
where 10.128.0.2 is master1's internal IP and 10.244.0.0/16 is the pod network CIDR for kube-flannel.
The kubeadm init runs successfully and gives two kubeadm join commands, one to join a new control plane and the other to join a new worker node.
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join LB_IP:6443 --token znnlha.6Gfn1vlkunwpz36b \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae67e9d64cb585458a194018f3ba5a82ac4U \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join LB_IP:6443 --token znnlha.6sfn1vlkunwpz36A \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae68e9d64cb585458a194018f3ba5a82ac4e
I am not using --upload-certs for transferring the certificates from one control plane to another; I am doing it manually.
But when I run the above kubeadm join command to add a new control plane on one of my other master nodes, say master2, I get an error like the following:
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://LB_IP:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp LB_IP:6443: connect: connection refused
I just came across the same issue - in my case the problem was that when hitting GCP's load balancer from an instance which is also a target of that load balancer, the request always lands on the same instance you sent it from.
Based on this:
you run kubeadm init on Node A using LB_IP. LB_IP gets resolved to Node A. Everything works as expected, as you are creating a new cluster.
you run kubeadm join on Node B using LB_IP. LB_IP gets resolved to Node B, while the master you just initialized is on Node A. Node B doesn't have anything running on port 6443, so you get a connection refused error.
A simple solution is to remove the instance you are running kubeadm join on from the load balancer targets. You can re-add it right after a successful join.
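If the load balancer is a target-pool-based TCP LB, removing and re-adding the node could look roughly like this with gcloud (pool, instance and zone names are placeholders; adjust to how the LB was actually created):
gcloud compute target-pools remove-instances my-k8s-pool --instances=master2 --instances-zone=us-central1-a
# run kubeadm join on master2, then put it back:
gcloud compute target-pools add-instances my-k8s-pool --instances=master2 --instances-zone=us-central1-a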
I was following instructions at https://kubernetes.io/docs/setup/independent/high-availability/#stacked-control-plane-and-etcd-nodes and I can't get the secondary master node to join the primary master.
$> kubeadm join LB_IP:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH --experimental-control-plane
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "LB_IP:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://LB_IP:6443"
[discovery] Requesting info from "https://LB_IP:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "LB_IP:6443"
[discovery] Successfully established connection with API Server "LB_IP:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance on a cluster that doesn't use an external etcd
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The cluster uses an external etcd.
* The certificates that must be shared among control plane instances are provided.
Here is my kubeadm init config:
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: "1.12.3"
apiServer:
  certSANs:
  - "LB_IP"
controlPlaneEndpoint: "LB_IP:6443"
networking:
  podSubnet: "192.168.128.0/17"
  serviceSubnet: "192.168.0.0/17"
And I initialized the primary master node like:
kubeadm init --config=./kube-adm-config.yaml
I have also copied all the certs to the secondary node and kubectl works on the secondary:
[root@secondary ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
primary.fqdn Ready master 8h v1.12.3
I was really hoping to not set up external etcd nodes. The instructions seem pretty straightforward and I don't understand what I am missing.
Any advice to help get this stacked control plane multi-master setup with local etcd to work would be appreciated. Or any debugging ideas. Or at least "stacked control plane doesn't work, you must use external etcd".
Upgrading to k8s version 1.13.0 resolved my issue. I think the instructions were specifically for this newer version.
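For reference, the manual certificate copy that the stacked control-plane guide describes looks roughly like this (a sketch assuming the default /etc/kubernetes/pki layout, root SSH access, and that the pki/etcd directory already exists on the target; the hostname is a placeholder):
# run on the primary control-plane node
USER=root
HOST=secondary.example.com
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
    /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
    /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
    "${USER}@${HOST}":/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/ca.key \
    "${USER}@${HOST}":/etc/kubernetes/pki/etcd/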
This is my worker node:
root@ivu:~# kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 --ignore-preflight-errors=swap
[preflight] Running pre-flight checks.
[WARNING Swap]: running with swap on is not supported. Please disable swap
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "10.16.70.174:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.16.70.174:6443"
[discovery] Requesting info from "https://10.16.70.174:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.16.70.174:6443"
[discovery] Successfully established connection with API Server "10.16.70.174:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
While checking on the master node using the command kubectl get nodes, I can only see the master:
ivum01@ivum01-HP-Pro-3330-SFF:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ivum01-hp-pro-3330-sff Ready master 36m v1.10.0
To answer the questions:
docker, kubelet, kubeadm and kubectl were installed fine
kubectl get nodes cannot see the newly added node; of course kubectl get pods --all-namespaces shows nothing for this node either
docker on the new node shows no activity from the kubeadm join (no k8s images pulled, no containers running for it)
most important: kubelet is not running on the worker node
Running kubelet manually gives this output:
Failed to get system container stats for "/user.slice/user-1000.slice/session-1.scope": failed to get cgroup stats for "/user.slice/user-1000.slice/session-1.scope": failed to get container info for "/user.slice/user-1000.slice/session-1.scope": unknown container "/user.slice/user-1000.slice/session-1.scope"
the same as this issue describes
Tearing down and resetting the cluster (kubeadm reset) and redoing the setup worked without problems in my case.
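When the join output looks successful but the node never appears, checking the kubelet service on the worker is usually the quickest way to see what is wrong (standard systemd commands; this assumes kubelet is managed by systemd, which is how the kubeadm packages set it up):
systemctl status kubelet
journalctl -u kubelet --no-pager -n 50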
I had this problem and it was solved by ensuring that the cgroup driver on the worker nodes was also set properly.
check with:
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
set it with:
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
then restart the kubelet service and rejoin the cluster:
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset
kubeadm join ...
Info from docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node
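After the change, a quick way to confirm the two drivers actually match (assuming the kubelet flag still lives in 10-kubeadm.conf, as in the sed command above):
docker info --format '{{.CgroupDriver}}'
grep cgroup-driver /etc/systemd/system/kubelet.service.d/10-kubeadm.conf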
I am trying to set up a 3-node multi-master Kubernetes v1.10.0 cluster using CentOS 7,
following the kubeadm documentation steps. I was able to set up the 3 masters, all of them in Ready status.
https://kubernetes.io/docs/setup/independent/high-availability/#kubeadm-init-master0
While joining a worker node, I am getting a certificate issue; the kubeadm join command gives this error message:
kubeadm join <masterip>:6443 --token v5ylg8.jgqn122kewvobaoo --discovery-token-ca-cert-hash sha256:b01832713190461cc96ca02e5c2b1e578473c6712b0965d2383aa9de9b41d4b6 --ignore-preflight-errors=cri
[preflight] Running pre-flight checks.
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.09.1-ce. Max validated version: 17.03
[WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
How can I find out the root cause of this certificate error message?
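One way to narrow it down is to recompute the CA public key hash on the master you are joining through and compare it with the --discovery-token-ca-cert-hash value you are passing (this is the standard command from the kubeadm docs, assuming the default CA path):
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
If the value differs from the hash in the join command, the worker is reaching a master whose CA is not the one that produced that hash, which can happen when each master was initialised with its own certificates instead of sharing them.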