Unauthorized when trying to allow nodes to join a Kubernetes cluster

I had a two-node cluster in which one node was the master and the other a slave. It had been running for the last 26 days. Today I tried to remove a node using kubeadm reset and add it again, and the kubelet was not able to start:
cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
The binary conntrack is not installed, this can cause failures in network connection cleanup.
server.go:376] Version: v1.10.2
feature_gate.go:226] feature gates: &{{} map[]}
plugins.go:89] No cloud provider specified.
server.go:233] failed to run Kubelet: cannot create certificate signing request: Unauthorized
while the join command itself is successful:
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "aaaaa:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://aaaaa:6443"
[discovery] Requesting info from "https://aaaaaa:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server
[discovery] Successfully established connection with API Server "aaaa:6443"
This node has joined the cluster:
Certificate signing request was sent to master and a response
was received.
The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
In my opinion, the log line "failed to run Kubelet: cannot create certificate signing request: Unauthorized" is the source of the problem, but I do not know why it occurs or how to fix it.
Thanks in advance. I can give more details, but I am not sure which ones are relevant.
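One common cause (not confirmed in the question) is stale bootstrap credentials: kubeadm reset does not always clean everything under /etc/kubernetes on the node, and bootstrap tokens minted by kubeadm init have a default TTL of 24 hours, while this cluster is 26 days old. A minimal sketch, assuming a working master; the token, endpoint, and hash values below are hypothetical placeholders:

```shell
# On the node, clear leftover state before rejoining (paths are the kubeadm
# defaults; verify them on your system before deleting anything):
#   rm -f /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf
# On the master, mint a fresh token (needs a live cluster, shown as comments):
#   kubeadm token list                         # inspect existing tokens and TTLs
#   kubeadm token create --print-join-command  # fresh token plus a full join command
# The printed join command has this shape (all values below are hypothetical):
TOKEN="abcdef.0123456789abcdef"
ENDPOINT="aaaaa:6443"
CA_HASH="sha256:<hash of the CA public key>"
echo "kubeadm join ${ENDPOINT} --token ${TOKEN} --discovery-token-ca-cert-hash ${CA_HASH}"
```

Running the freshly printed command on the cleaned-up node replaces whatever stale credentials were left behind by the earlier join.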

Related

Renewing Kubernetes cluster certificates

We currently have a cluster of 2 master and 2 worker nodes on Kubernetes v1.13.4. The cluster is down because the kubelet certificate located in /var/lib/kubelet/pki/kubelet.crt has expired and the kubelet service is not running. On checking the kubelet logs I get the following error:
E0808 09:49:35.126533 55154 bootstrap.go:209] Part of the existing bootstrap client certificate is expired: 2019-08-06 22:39:23 +0000 UTC
The certificates ca.crt and apiserver-kubelet-client.crt are still valid. We are unable to renew the kubelet.crt certificate manually using kubeadm-config.yaml. Can someone please provide the steps to renew the certificate?
We have tried setting the --rotate-certificates property and also using kubeadm-config.yaml, but since we are on v1.13.4 the kubeadm --config flag is not present.
Since, as you mentioned, only kubelet.crt has expired and apiserver-kubelet-client.crt is valid, you can try to renew it with the command kubeadm alpha certs renew, based on the documentation.
A second way to renew kubeadm certificates is to upgrade the cluster version, as in this article.
You can also try kubeadm init phase certs all, which was explained in this Stack Overflow case.
Let me know if that helped. If not, provide more information and more logs.
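Before renewing anything, it is worth confirming exactly which certificate has expired; openssl can print the expiry date directly. A sketch, demonstrated on a throwaway self-signed certificate since the real one lives at /var/lib/kubelet/pki/kubelet.crt on the affected node:

```shell
# Generate a throwaway self-signed cert (a stand-in for the real kubelet.crt):
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 1 -subj "/CN=demo" 2>/dev/null
# Print its expiry date; run the same command against
# /var/lib/kubelet/pki/kubelet.crt on the broken node:
openssl x509 -noout -enddate -in /tmp/demo.crt
```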

etcd 3rd pod not getting scheduled on master node due to peers expecting old cert

I need a hint to resolve an etcd cert issue on two etcd server pods.
I have 2 of 3 etcd server pods running, and they report that the 3rd pod's x509 cert is valid for etc.test1.com and not for etc.test2.com.
So my assumption is that etcd server pods 2 & 3 are somehow expecting the old cert DNS name and not the new cert DNS name, which is etc.test2.com.
This is causing the 3rd pod to never be accepted as a valid peer, so the pod never gets scheduled on the node.
Any hint on how I can reset the two pods that are expecting the old cert so that they start expecting the new cert?
Below is the error from the etcd server pods that are running:
rafthttp: health check for peer 44ffe8e24fa23c10 could not connect: x509: certificate is valid for etcd-a.internal.test1.com, etcd-b.internal.test1.com, etcd-c.internal.test1.com, etcd-events-a.internal.test1.com, etcd-events-b.internal.test1.com, etcd-events-c.internal.test1.com, localhost, not etcd-b.internal.test2.com
Also, will the cluster work with a single etcd server pod, or does it need to have 3?
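The error above is plain x509 SAN checking: the presented certificate lists only the test1.com names, so a peer advertising a test2.com name is rejected. A sketch of inspecting which DNS names a certificate actually covers, demonstrated on a throwaway certificate (OpenSSL 1.1.1+ assumed for -addext/-ext); run the same x509 command against the real etcd peer certificate:

```shell
# Create a throwaway cert whose SAN carries the new DNS name:
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/etcd-peer.key \
  -out /tmp/etcd-peer.crt -days 1 -subj "/CN=etcd-peer" \
  -addext "subjectAltName=DNS:etcd-b.internal.test2.com" 2>/dev/null
# List the DNS names the cert is valid for; a peer is rejected unless its
# advertised name appears in this list:
openssl x509 -in /tmp/etcd-peer.crt -noout -ext subjectAltName
```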

kubeadm init is failing

I was trying to set up a Kubernetes cluster on CentOS 7, and I am facing an issue while running the kubeadm init command below.
kubeadm init --apiserver-advertise-address=10.70.6.18 --pod-network-cidr=192.168.0.0/16
I1224 18:20:55.388552 11136 version.go:94] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: dial tcp: lookup dl.k8s.io on 10.171.221.11:53: no such host
I1224 18:20:55.388679 11136 version.go:95] falling back to the local client version: v1.13.1
[init] Using Kubernetes version: v1.13.1
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.13.1: output: v1.13.1: Pulling from kube-apiserver
73e3e9d78c61: Pulling fs layer
e08dba503a39: Pulling fs layer
error pulling image configuration: Get https://storage.googleapis.com/asia.artifacts.google-containers.appspot.com/containers/images/sha256:40a63db91ef844887af73e723e40e595e4aa651ac2c5637332d719e42abc4dd2: x509: certificate signed by unknown authority
, error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.13.1: output: v1.13.1: Pulling from kube-controller-manager
On your worker node(s), try using a proxy or VPN to change your IP. I think the registry you are trying to pull from has blocked your IP.
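If a proxy turns out to be the fix, note that the image pulls here are made by the Docker daemon, which on CentOS 7 reads proxy settings from a systemd drop-in rather than from shell environment variables. A hypothetical /etc/systemd/system/docker.service.d/http-proxy.conf (the proxy address is an assumption), applied with systemctl daemon-reload followed by systemctl restart docker:

```
[Service]
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1"
```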

Kubernetes - unable to start kubelet with cloud provider openstack (error fetching current node name from cloud provider)

I'm trying to set up a Kubernetes cluster on Rackspace, and I understand that to get persistent volume support I would need to use Cinder (the OpenStack service supported by Rackspace).
Following the Cloud Provider Integrations setup guide, I have set up /etc/kubernetes/cloud-config as follows:
[Global]
username=cinder
password=********
auth-url=https://identity.api.rackspacecloud.com/v2.0
tenant-name=1234567
region=LON
I've added the following to the kubelet startup command in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
--cloud-provider=openstack --cloud-config=/etc/kubernetes/cloud-config
And I'm then running kubeadm init --config=kubeadm.conf where kubeadm.conf is:
kind: MasterConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
cloudProvider: openstack
pod-network-cidr: 10.244.0.0/16
It fails waiting for the kubelet to start. I tracked down the kubelet error to the following:
07:24:51.407692 21412 feature_gate.go:156] feature gates: map[]
07:24:51.407790 21412 controller.go:114] kubelet config controller: starting controller
07:24:51.407849 21412 controller.go:118] kubelet config controller: validating combination of defaults and flags
07:24:51.413973 21412 mount_linux.go:168] Detected OS with systemd
07:24:51.414065 21412 client.go:75] Connecting to docker on unix:///var/run/docker.sock
07:24:51.414137 21412 client.go:95] Start docker client with request timeout=2m0s
07:24:51.415471 21412 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
07:24:51.437924 21412 iptables.go:564] couldn't get iptables-restore version; assuming it doesn't support --wait
07:24:51.440245 21412 feature_gate.go:156] feature gates: map[]
07:24:52.066765 21412 server.go:301] Successfully initialized cloud provider: "openstack" from the config file: "/etc/kubernetes/cloud-config"
07:24:52.066984 21412 openstack_instances.go:39] openstack.Instances() called
07:24:52.067048 21412 openstack_instances.go:46] Claiming to support Instances
07:24:52.070870 21412 metadata.go:84] Unable to run blkid: exit status 2
07:24:52.070993 21412 metadata.go:124] Attempting to fetch metadata from http://169.254.169.254/openstack/2012-08-10/meta_data.json
07:25:22.071444 21412 metadata.go:127] Cannot read http://169.254.169.254/openstack/2012-08-10/meta_data.json: Get http://169.254.169.254/openstack/2012-08-10/meta_data.json: dial tcp 169.254.169.254:80: i/o timeout
error: failed to run Kubelet: error fetching current node name from cloud provider: Get http://169.254.169.254/openstack/2012-08-10/meta_data.json: dial tcp 169.254.169.254:80: i/o timeout
How can I debug this further? I don't really understand the role of the IP address 169.254.169.254 in this request.
Right now I can't tell whether I have a Kubernetes issue or a Rackspace issue.
The answer is that Rackspace Cloud does not use the OpenStack metadata service. Instead it uses cloud-init with a config drive: a read-only block device (virtual CD-ROM) that is attached at boot.
The config drive contains the cloud-init data. Example: https://developer.rackspace.com/blog/using-cloud-init-with-rackspace-cloud/
Anecdotally, it seems most Rackspace customers who run Kubernetes use CoreOS VMs, which support cloud-config and the OpenStack config drive. When K8s runs on a machine with the drive mounted, it attempts to obtain the metadata from there.
As per the link below, you need to put your cloud-config file inside /etc/kubernetes/pki. I have tried this approach and it works:
https://github.com/kubernetes/kubeadm/issues/484
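As background for the timeout in the question: 169.254.169.254 is a link-local address that never leaves the local network segment. On clouds that implement the metadata service, the hypervisor answers requests to it; on Rackspace nothing does, hence the i/o timeout. A sketch of the same probe with a short timeout (the URL is the one from the kubelet log):

```shell
# Probe the metadata endpoint; on hosts with no metadata service this falls
# through to the fallback message instead of hanging for long:
out=$(curl -sS -m 3 http://169.254.169.254/openstack/2012-08-10/meta_data.json 2>&1 \
  || echo "metadata service unreachable")
echo "$out"
```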

kubeadm join failing. Unable to request signed cert

I'm a bit confused by this, because it was working for days without issue.
I used to be able to join nodes to my cluster without issue. I would run the below on the master node:
kubeadm init .....
After that, it would generate a join command and token to issue to the other nodes I want to join. Something like this:
kubeadm join --token 99385f.7b6e7e515416a041 192.168.122.100
I would run this on the nodes, and they would join without issue. The next morning, all of a sudden, this stopped working. This is what I see when I run the command now:
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for
production clusters.
[preflight] Running pre-flight checks
[tokens] Validating provided token
[discovery] Created cluster info discovery client, requesting info from "http://192.168.122.100:9898/cluster-info/v1/?token-id=99385f"
[discovery] Cluster info object received, verifying signature using given token
[discovery] Cluster info signature and contents are valid, will use API endpoints [https://192.168.122.100:6443]
[bootstrap] Trying to connect to endpoint https://192.168.122.100:6443
[bootstrap] Detected server version: v1.6.0-rc.1
[bootstrap] Successfully established connection with endpoint "https://192.168.122.100:6443"
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
failed to request signed certificate from the API server [cannot create certificate signing request: the server could not find the requested resource]
It seems like the node I'm trying to join does successfully connect to the API server on the master node, but for some reason, it now fails to request a certificate.
Any thoughts?
For me,
sudo service kubelet restart
didn't work.
What I did was the following:
I copied the contents of /etc/kubernetes/* from the master node into the slave nodes at the same location, /etc/kubernetes.
Then I tried the "kubeadm join ..." command again. This time the nodes joined the cluster without any complaint.
I think this is a temporary hack, but it worked!
OK, I just stopped and started kubelet on the master node as shown below, and things started working again:
sudo service kubelet stop
sudo service kubelet start
EDIT:
This only seemed to work one time for me.