can not join new node to k8s cluster


I want to join my new server to a k8s cluster, but the join fails and I do not know why.
# kubeadm join 10.100.1.20:6443 --token xxxxxx --discovery-token-ca-cert-hash sha256:xxxxxx
[preflight] running pre-flight checks
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{}]
you can solve this problem with following methods:
1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support
I1126 10:30:33.608681 7238 kernel_validator.go:81] Validating kernel version
I1126 10:30:33.608737 7238 kernel_validator.go:96] Validating kernel config
[WARNING Hostname]: hostname "t-k8s-b1" could not be reached
[WARNING Hostname]: hostname "t-k8s-b1" lookup t-k8s-b1 on 103.224.222.222:53: no such host
[discovery] Trying to connect to API Server "10.100.1.20:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.100.1.20:6443"
[discovery] Requesting info from "https://10.100.1.20:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.100.1.20:6443"
[discovery] Successfully established connection with API Server "10.100.1.20:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
Unauthorized
The new node does not show up:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
t-k8s-a1 Ready master 6d v1.11.3
t-k8s-b2 Ready <none> 6d v1.11.3
In /var/log/messages:
Nov 26 10:40:39 t-k8s-b1 systemd: Configuration file /etc/systemd/system/kubelet.service is marked executable. Please remove executable permission bits. Proceeding anyway.
I changed /etc/systemd/system/kubelet.service from 0755 to 0644 and the warning disappeared. I also ran modprobe for the modules ip_vs, ip_vs_rr, ip_vs_wrr and ip_vs_sh, but the join still fails with Unauthorized:
[preflight] running pre-flight checks
I1126 10:48:03.529871 8416 kernel_validator.go:81] Validating kernel version
I1126 10:48:03.529927 8416 kernel_validator.go:96] Validating kernel config
[WARNING Hostname]: hostname "t-k8s-b1" could not be reached
[WARNING Hostname]: hostname "t-k8s-b1" lookup t-k8s-b1 on 103.224.222.222:53: no such host
[discovery] Trying to connect to API Server "10.100.1.20:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.100.1.20:6443"
[discovery] Requesting info from "https://10.100.1.20:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.100.1.20:6443"
[discovery] Successfully established connection with API Server "10.100.1.20:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
Unauthorized
Solution
The reason was that the token had expired. I created a new token and joined with it, and everything worked:
# kubeadm token create
new token
# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
ca cert hash
# kubeadm join 10.100.1.20:6443 --token new_token --discovery-token-ca-cert-hash sha256:ca_cert_hash
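For anyone hitting the same Unauthorized error: tokens created by kubeadm expire after 24 hours by default, which would fit a cluster that was already six days old. Here is a minimal shell sketch of the steps above. The helper names valid_token and build_join_cmd are made up for illustration, and the token and hash in the usage comment are placeholders, not real values.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of kubeadm.
#
# On the master, these real kubeadm subcommands do most of the work:
#   kubeadm token list                         # existing tokens and their expiry
#   kubeadm token create --print-join-command  # new token plus a full join command

# kubeadm bootstrap tokens have the form [a-z0-9]{6}.[a-z0-9]{16}
valid_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# Assemble the join command from endpoint, token and CA cert hash.
# Refuses obviously malformed tokens so a copy/paste slip fails early.
build_join_cmd() {
  endpoint=$1
  token=$2
  hash=$3
  valid_token "$token" || { echo "malformed token: $token" >&2; return 1; }
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$endpoint" "$token" "$hash"
}

# Usage (placeholder values):
#   build_join_cmd 10.100.1.20:6443 abcdef.0123456789abcdef <ca_cert_hash>
```

If you do not need to assemble the command yourself, kubeadm token create --print-join-command alone is usually enough.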

Related

kubelet won't start after kubernetes/manifest update

This is sort of strange behavior in our K8 cluster.
When we try to deploy a new version of our applications we get:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "<container-id>" network for pod "application-6647b7cbdb-4tp2v": networkPlugin cni failed to set up pod "application-6647b7cbdb-4tp2v_default" network: Get "https://[10.233.0.1]:443/api/v1/namespaces/default": dial tcp 10.233.0.1:443: connect: connection refused
I ran kubectl get cs and found the controller and scheduler in an Unhealthy state.
As described here, I updated /etc/kubernetes/manifests/kube-scheduler.yaml and
/etc/kubernetes/manifests/kube-controller-manager.yaml by commenting out --port=0.
When I checked systemctl status kubelet, it was running:
Active: active (running) since Mon 2020-10-26 13:18:46 +0530; 1 years 0 months ago
I then restarted the kubelet service, and the controller and scheduler were shown as healthy.
But systemctl status kubelet now shows the following (right after the restart it still showed the running state):
Active: activating (auto-restart) (Result: exit-code) since Thu 2021-11-11 10:50:49 +0530; 3s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 21234 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET
I tried adding Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false" to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf as described here, but it is still not working properly.
I also removed the comment on --port=0 in the manifests mentioned above and tried restarting, with the same result.
Edit: This issue was caused by an expired kubelet certificate and was fixed by following these steps. If someone faces this issue, make sure the certificate and key values from /var/lib/kubelet/pki/kubelet-client-current.pem are base64 encoded when you place them in /etc/kubernetes/kubelet.conf.
Many others suggested running kubeadm init again, but this cluster was created using kubespray, with no manually added nodes.
We have baremetal k8 running on Ubuntu 18.04.
K8: v1.18.8
We would welcome any debugging and fixing suggestions.
PS:
When we try to telnet 10.233.0.1 443 from any node, the first attempt fails and the second attempt succeeds.
Edit: Found this in kubelet service logs
Nov 10 17:35:05 node1 kubelet[1951]: W1110 17:35:05.380982 1951 docker_sandbox.go:402] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "app-7b54557dd4-bzjd9_default": unexpected command output nsenter: cannot open /proc/12311/ns/net: No such file or directory

Posting the comment as a community wiki answer for better visibility.
This issue was caused by an expired kubelet certificate and was fixed by following these steps. If someone faces this issue, make sure the certificate and key values from /var/lib/kubelet/pki/kubelet-client-current.pem are base64 encoded when you place them in /etc/kubernetes/kubelet.conf.
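To make the base64 part concrete, here is a small shell sketch. It assumes openssl and base64 are installed on the node; the function names cert_still_valid and b64_oneline are hypothetical helpers, and client-certificate-data / client-key-data are the standard kubeconfig fields the encoded values would normally go into.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical helpers.

# Returns 0 if the certificate in $1 has not expired yet.
# "openssl x509 -checkend 0" exits non-zero once the cert is past its notAfter date.
cert_still_valid() {
  openssl x509 -checkend 0 -noout -in "$1" >/dev/null 2>&1
}

# Print the contents of file $1 as base64 on a single line,
# which is the form kubeconfig *-data fields expect.
b64_oneline() {
  base64 < "$1" | tr -d '\n'
}

# Usage on the affected node:
#   cert_still_valid /var/lib/kubelet/pki/kubelet-client-current.pem || echo "kubelet client cert expired"
#   b64_oneline /var/lib/kubelet/pki/kubelet-client-current.pem
#   # paste the result into client-certificate-data / client-key-data
#   # in /etc/kubernetes/kubelet.conf, then restart kubelet
```

Stripping the newlines matters because base64 wraps its output by default, and a wrapped value would break the YAML field.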

OC Cluster never goes up Error: timed out waiting for the condition

Whenever I try to bring the cluster up using "oc cluster up", I get the error below.
Kindly help me fix this.
[mano@mano ~]$ oc cluster up
Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.11 ...
I0923 13:40:32.364326 15396 config.go:40] Running "create-master-config"
I0923 13:40:59.938492 15396 config.go:46] Running "create-node-config"
I0923 13:41:10.721711 15396 flags.go:30] Running "create-kubelet-flags"
I0923 13:41:18.241285 15396 run_kubelet.go:49] Running "start-kubelet"
I0923 13:41:23.016238 15396 run_self_hosted.go:181] Waiting for the kube-apiserver to be ready ...
E0923 13:46:23.023479 15396 run_self_hosted.go:571] API server error: Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused ()
Error: timed out waiting for the condition
oc version:
[mano@mano ~]$ oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
I followed this article, yet no luck: https://github.com/openshift/origin/blob/release-3.11/docs/cluster_up_down.md

Minikube is slow and unresponsive

Today minikube suddenly started taking a very long time to respond to commands via kubectl.
And occasionally even:
kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
How can I diagnose this?
Some logs from minikube logs:
==> kube-scheduler <==
I0527 14:16:55.809859 1 serving.go:319] Generated self-signed cert in-memory
W0527 14:16:56.256478 1 authentication.go:387] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0527 14:16:56.256856 1 authentication.go:249] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0527 14:16:56.257077 1 authentication.go:252] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0527 14:16:56.257189 1 authorization.go:177] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0527 14:16:56.257307 1 authorization.go:146] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0527 14:16:56.264875 1 server.go:142] Version: v1.14.1
I0527 14:16:56.265228 1 defaults.go:87] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W0527 14:16:56.286959 1 authorization.go:47] Authorization is disabled
W0527 14:16:56.286982 1 authentication.go:55] Authentication is disabled
I0527 14:16:56.286995 1 deprecated_insecure_serving.go:49] Serving healthz insecurely on [::]:10251
I0527 14:16:56.287397 1 secure_serving.go:116] Serving securely on 127.0.0.1:10259
I0527 14:16:57.417028 1 controller_utils.go:1027] Waiting for caches to sync for scheduler controller
I0527 14:16:57.524378 1 controller_utils.go:1034] Caches are synced for scheduler controller
I0527 14:16:57.827438 1 leaderelection.go:217] attempting to acquire leader lease kube-system/kube-scheduler...
E0527 14:17:10.865448 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0527 14:17:43.418910 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0527 14:18:01.447065 1 leaderelection.go:227] successfully acquired lease kube-system/kube-scheduler
I0527 14:18:29.044544 1 leaderelection.go:263] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0527 14:18:38.999295 1 server.go:252] lost master
E0527 14:18:39.204637 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
lost lease
Update:
To work around this issue I just did a minikube delete and minikube start, and the performance issue was resolved.

As a solution has been found, I am posting this as a Community Wiki for future users.
1) Debug issues with minikube by adding the --v flag and setting the debug level (0, 1, 2, 3, 7).
For example: minikube start --v=1 to set output to INFO level.
More detailed information here
2) Use the logs command: minikube logs
3) Because Minikube runs in a virtual machine, it is sometimes better to delete minikube and start it again (this helped in this case).
minikube delete
minikube start
4) It might get slow due to a lack of resources.
By default, Minikube uses 2048 MB of memory and 2 CPUs. More details about this can be found here
In addition, you can make Minikube allocate more resources using the command
minikube start --cpus 4 --memory 8192
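If you are not sure how much memory to hand over, one option is to derive it from the host. The sketch below is only a rough heuristic I am assuming for illustration (half of the host's RAM, Linux hosts only); suggest_memory_mb is a made-up helper name, not a minikube feature.

```shell
#!/bin/sh
# Sketch only: "half of host RAM" is an arbitrary rule of thumb, not a minikube recommendation.

# Print half of MemTotal in MB, read from a meminfo-style file.
# $1 is optional and defaults to /proc/meminfo (Linux only).
suggest_memory_mb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 2 }' "${1:-/proc/meminfo}"
}

# Usage:
#   minikube delete
#   minikube start --cpus 4 --memory "$(suggest_memory_mb)"
```

The new size only takes effect when the VM is created, which is why the usage comment deletes the existing minikube VM first.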

Kubeadm alpha phase certs causes join command token to be invalid

I have used kubeadm alpha phase certs to recreate the certificates used in my Kubernetes cluster, and I also used the alpha phase for kubeconfig. Now, when trying to join a new worker, I get errors saying that my token is invalid, even though the token has been regenerated three times using kubeadm token create --print-join-command.
The error that I keep getting is:
[discovery] Created cluster-info discovery client, requesting info from "https://x.x.x.x:6443"
[discovery] Failed to connect to API Server "x.x.x.x:6443": token id "bvw4cz" is invalid for this cluster or it has expired. Use "kubeadm token create" on the master node to creating a new valid token
Has anyone run into the same problem, or does anyone have a suggestion?
Thanks!
EDIT--
This is the tail end of /var/log/syslog --
Nov 5 09:40:01 master01 kubelet[755]: E1105 09:40:01.892304 755 kubelet.go:2236] node "master01" not found
Nov 5 09:40:01 master01 kubelet[755]: E1105 09:40:01.928937 755 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://x.x.x.x:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkubernetserver&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Nov 5 09:40:01 master01 kubelet[755]: E1105 09:40:01.992427 755 kubelet.go:2236] node "master01" not found
EDIT 2: 1. Now the real question is: if regenerating the certs does not make the cluster trust itself as a CA, how do you fix this problem? 2. Is this a well-known problem?

haproxy as loadbalancer for kube-apiserver

I've set up a kubernetes cluster with three masters. The kube-apiserver should be stateless. To reach the masters properly from the worker nodes, I've set up an haproxy that exposes the apiserver port (8080).
frontend http_front_8080
bind *:8080
stats uri /haproxy?stats
default_backend http_back_8080
backend http_back_8080
balance roundrobin
server m01 192.168.33.21:8080 check
server m02 192.168.33.22:8080 check
server m03 192.168.33.23:8080 check
But when I run the nodes with the load balancer's IP as the address of the apiserver, I receive these errors:
Apr 20 12:35:07 n01 kubelet[3383]: E0420 12:35:07.308337 3383 reflector.go:271] pkg/kubelet/kubelet.go:240: Failed to watch *api.Service: too old resource version: 4001 (4041)
Apr 20 12:36:48 n01 kubelet[3383]: E0420 12:36:48.321021 3383 reflector.go:271] pkg/kubelet/kubelet.go:240: Failed to watch *api.Service: too old resource version: 4011 (4041)
Apr 20 12:37:31 n01 kube-proxy[3408]: E0420 12:37:31.381042 3408 reflector.go:271] pkg/proxy/config/api.go:47: Failed to watch *api.Service: too old resource version: 4011 (4041)
Apr 20 12:41:42 n01 kube-proxy[3408]: E0420 12:41:42.409604 3408 reflector.go:271] pkg/proxy/config/api.go:47: Failed to watch *api.Service: too old resource version: 4011 (4041)
If I change the load balancer's IP to that of one of the master nodes, it works as expected (without the error messages above).
Am I missing something in my haproxy configuration that is vital for running this setup?

I had the same issue as you. I assume the watch requires some sort of state on the api server side.
The solution is to change the configuration so all the requests from a client go to the same server using balance source. I assume you only have multiple api servers so kubernetes is highly available (instead of load balancing).
frontend http_front_8080
bind *:8080
stats uri /haproxy?stats
default_backend http_back_8080
backend http_back_8080
balance source
server m01 192.168.33.21:8080 check
server m02 192.168.33.22:8080 check
server m03 192.168.33.23:8080 check
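After editing the file it is easy to reload haproxy and still be running the old algorithm, so here is a small shell sketch for checking what a given backend is actually set to. backend_balance is a hypothetical helper of mine, and the config path in the usage comment is an assumption (the common default location).

```shell
#!/bin/sh
# Sketch only: backend_balance is a hypothetical helper, not an haproxy tool.

# Print the balance algorithm configured for backend $2 in haproxy config $1.
# A new section (global/defaults/frontend/listen/backend) ends the previous one.
backend_balance() {
  awk -v b="$2" '
    $1 ~ /^(global|defaults|frontend|listen|backend)$/ {
      in_b = ($1 == "backend" && $2 == b)
    }
    in_b && $1 == "balance" { print $2 }
  ' "$1"
}

# Usage:
#   haproxy -c -f /etc/haproxy/haproxy.cfg                     # syntax check before reloading
#   backend_balance /etc/haproxy/haproxy.cfg http_back_8080    # should print "source"
```

With balance source, haproxy picks the server from a hash of the client's address, so each node keeps talking to the same apiserver as long as the set of healthy servers does not change.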
