I used kubeadm alpha phase certs to recreate the certificates used in my Kubernetes cluster, and I also used the alpha phase commands to recreate the kubeconfig files. Now, when trying to join a new worker, it gives me errors that my token is invalid, even though the token has been regenerated 3 times using kubeadm token create --print-join-command.
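For reference, the commands were roughly along these lines (a sketch; on this kubeadm version the steps are alpha phase subcommands, while newer releases expose the same steps as kubeadm certs renew and kubeadm init phase kubeconfig):
kubeadm alpha phase certs all
kubeadm alpha phase kubeconfig all
kubeadm token create --print-join-command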
The error that I keep getting is:
[discovery] Created cluster-info discovery client, requesting info from "https://x.x.x.x:6443"
[discovery] Failed to connect to API Server "x.x.x.x:6443": token id "bvw4cz" is invalid for this cluster or it has expired. Use "kubeadm token create" on the master node to creating a new valid token
Has anyone run into the same problem, or does anyone have a suggestion?
Thanks!
EDIT--
This is the tail end of /var/log/syslog --
Nov 5 09:40:01 master01 kubelet[755]: E1105 09:40:01.892304 755 kubelet.go:2236] node "master01" not found
Nov 5 09:40:01 master01 kubelet[755]: E1105 09:40:01.928937 755 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://x.x.x.x:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkubernetserver&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Nov 5 09:40:01 master01 kubelet[755]: E1105 09:40:01.992427 755 kubelet.go:2236] node "master01" not found
EDIT 2 - 1. The real question now is: if regenerating the certs does not re-establish trust in the cluster's own CA, how do you fix this problem? 2. Is this a well-known problem?
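A diagnostic sketch (assuming the default kubeadm paths; kubelet-client-current.pem is only present when client-certificate rotation is enabled) to check whether the certificates still chain to the cluster CA that the kubelet is complaining about:
openssl verify -CAfile /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/apiserver.crt
openssl x509 -noout -issuer -in /var/lib/kubelet/pki/kubelet-client-current.pem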
Related
This is some sort of strange behavior in our K8s cluster.
When we try to deploy a new version of our applications we get:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "<container-id>" network for pod "application-6647b7cbdb-4tp2v": networkPlugin cni failed to set up pod "application-6647b7cbdb-4tp2v_default" network: Get "https://[10.233.0.1]:443/api/v1/namespaces/default": dial tcp 10.233.0.1:443: connect: connection refused
I used kubectl get cs and found the controller-manager and scheduler in an Unhealthy state.
As described here, I updated /etc/kubernetes/manifests/kube-scheduler.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml by commenting out --port=0.
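A sketch of that edit, assuming the standard static-pod manifest locations (the kubelet watches /etc/kubernetes/manifests and recreates the pods on its own):
sed -i 's|- --port=0|#- --port=0|' /etc/kubernetes/manifests/kube-scheduler.yaml
sed -i 's|- --port=0|#- --port=0|' /etc/kubernetes/manifests/kube-controller-manager.yaml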
When I checked systemctl status kubelet it was working.
Active: active (running) since Mon 2020-10-26 13:18:46 +0530; 1 years 0 months ago
I restarted the kubelet service, and the controller-manager and scheduler were then shown as healthy.
But systemctl status kubelet now shows the following (soon after restarting kubelet it briefly showed a running state):
Active: activating (auto-restart) (Result: exit-code) since Thu 2021-11-11 10:50:49 +0530; 3s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 21234 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET
I tried adding Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false" to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf as described here, but it is still not working properly.
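(Note: after editing a systemd drop-in, the units have to be reloaded before the restart takes effect, roughly:)
systemctl daemon-reload
systemctl restart kubelet
systemctl status kubelet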
I also removed the comment on --port=0 in the above-mentioned manifests and tried restarting; still the same result.
Edit: This issue was due to the kubelet certificate having expired, and it was fixed by following these steps. If someone faces this issue, make sure the certificate and key values from /var/lib/kubelet/pki/kubelet-client-current.pem are base64-encoded when placing them in /etc/kubernetes/kubelet.conf.
Many others suggested running kubeadm init again, but this cluster was created using Kubespray, with no manually added nodes.
We have bare-metal K8s running on Ubuntu 18.04.
K8s: v1.18.8
We would like to know any debugging and fixing suggestions.
PS:
When we try to telnet 10.233.0.1 443 from any node, the first attempt fails and the second attempt succeeds.
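A sketch of the checks behind that observation (10.233.0.1 is the in-cluster API address that the CNI plugin is calling in the error above):
kubectl get svc kubernetes -n default
kubectl get endpoints kubernetes -n default
telnet 10.233.0.1 443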
Edit: Found this in kubelet service logs
Nov 10 17:35:05 node1 kubelet[1951]: W1110 17:35:05.380982 1951 docker_sandbox.go:402] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "app-7b54557dd4-bzjd9_default": unexpected command output nsenter: cannot open /proc/12311/ns/net: No such file or directory
Posting the comment as a community wiki answer for better visibility:
This issue was due to the kubelet certificate having expired, and it was fixed by following these steps. If someone faces this issue, make sure the certificate and key values from /var/lib/kubelet/pki/kubelet-client-current.pem are base64-encoded when placing them in /etc/kubernetes/kubelet.conf.
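A sketch of that step, assuming the default kubeadm paths and the usual client-certificate-data / client-key-data fields of a kubeconfig:
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem | base64 -w0    # value for client-certificate-data
openssl pkey -in /var/lib/kubelet/pki/kubelet-client-current.pem | base64 -w0    # value for client-key-data
systemctl restart kubelet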
I am trying to configure a demo, cross-platform Puppet setup, that is:
Puppet master on Centos 8 running puppetserver version: 6.11.1
Puppet node on ubuntu18.04 running Puppet v5.4.0
However, when I try a test puppet run with puppet agent --test, I get an error like the one below.
It looks like it is expecting the issuer certificate of the Puppet CA, that is, the certificate of the root CA which signed the Puppet CA's certificate. This error is not present with the centosMaster-centosUbuntu combination.
Can anyone help out here? Can I copy this cert from the Puppet server's files on CentOS to the Ubuntu node that is expecting it? If so, what would be the location on the CentOS host?
root@node04-ubuntu:/# puppet agent --test --ca_server=puppetmaster.localdomain.com --no-daemonize --waitforcert=20 --verbose
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get issuer certificate): [unable to get issuer certificate for /CN=Puppet CA: puppetmaster.localdomain.com]
Info: Retrieving pluginfacts
Error: /File[/var/cache/puppet/facts.d]: Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get issuer certificate): [unable to get issuer certificate for /CN=Puppet CA: puppetmaster.localdomain.com]
Error: /File[/var/cache/puppet/facts.d]: Could not evaluate: Could not retrieve file metadata for puppet:///pluginfacts: SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get issuer certificate): [unable to get issuer certificate for /CN=Puppet CA: puppetmaster.localdomain.com]
Info: Retrieving plugin
Error: /File[/var/cache/puppet/lib]: Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get issuer certificate): [unable to get issuer certificate for /CN=Puppet CA: puppetmaster.localdomain.com]
Error: /File[/var/cache/puppet/lib]: Could not evaluate: Could not retrieve file metadata for puppet:///plugins: SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get issuer certificate): [unable to get issuer certificate for /CN=Puppet CA: puppetmaster.localdomain.com]
Error: Could not retrieve catalog from remote server: SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get issuer certificate): [unable to get issuer certificate for /CN=Puppet CA: puppetmaster.localdomain.com]
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not send report: SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get issuer certificate): [unable to get issuer certificate for /CN=Puppet CA: puppetmaster.localdomain.com]
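For the "where does the cert live" part of the question, a hedged way to check on either machine is puppet config print (the AIO packages on CentOS default to paths under /etc/puppetlabs, while the Ubuntu distro packages use different defaults):
puppet config print localcacert ssldir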
Fixed
The issue was a version mismatch. I upgraded the client to the same latest version as the master, and it is working now. Cheers
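As a sketch, one way to bring the Ubuntu 18.04 node up to the server's major version is the official Puppet apt repository (assuming the puppet6 release package for bionic):
wget https://apt.puppet.com/puppet6-release-bionic.deb
dpkg -i puppet6-release-bionic.deb
apt-get update && apt-get install -y puppet-agent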
Today, seemingly at random, minikube started taking a very long time to respond to commands via kubectl.
And occasionally even:
kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
How can I diagnose this?
Some logs from minikube logs:
==> kube-scheduler <==
I0527 14:16:55.809859 1 serving.go:319] Generated self-signed cert in-memory
W0527 14:16:56.256478 1 authentication.go:387] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0527 14:16:56.256856 1 authentication.go:249] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0527 14:16:56.257077 1 authentication.go:252] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0527 14:16:56.257189 1 authorization.go:177] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0527 14:16:56.257307 1 authorization.go:146] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0527 14:16:56.264875 1 server.go:142] Version: v1.14.1
I0527 14:16:56.265228 1 defaults.go:87] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W0527 14:16:56.286959 1 authorization.go:47] Authorization is disabled
W0527 14:16:56.286982 1 authentication.go:55] Authentication is disabled
I0527 14:16:56.286995 1 deprecated_insecure_serving.go:49] Serving healthz insecurely on [::]:10251
I0527 14:16:56.287397 1 secure_serving.go:116] Serving securely on 127.0.0.1:10259
I0527 14:16:57.417028 1 controller_utils.go:1027] Waiting for caches to sync for scheduler controller
I0527 14:16:57.524378 1 controller_utils.go:1034] Caches are synced for scheduler controller
I0527 14:16:57.827438 1 leaderelection.go:217] attempting to acquire leader lease kube-system/kube-scheduler...
E0527 14:17:10.865448 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0527 14:17:43.418910 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0527 14:18:01.447065 1 leaderelection.go:227] successfully acquired lease kube-system/kube-scheduler
I0527 14:18:29.044544 1 leaderelection.go:263] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0527 14:18:38.999295 1 server.go:252] lost master
E0527 14:18:39.204637 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
lost lease
Update:
To work around this issue I just did a minikube delete and minikube start, and the performance issue was resolved.
As a solution has been found, I am posting this as a Community Wiki answer for future users.
1) Debug issues with minikube by adding the -v flag and setting the verbosity level (0, 1, 2, 3, 7).
For example: minikube start --v=1 sets the output to INFO level.
More detailed information here
2) Use the logs command: minikube logs
3) Because Minikube runs inside a virtual machine, sometimes it is better to delete minikube and start it again (it helped in this case).
minikube delete
minikube start
4) It might get slow due to lack of resources.
By default, Minikube uses 2048 MB of memory and 2 CPUs. More details about this can be found here.
In addition, you can make Minikube allocate more using the command:
minikube start --cpus 4 --memory 8192
I want to join my new server to the k8s cluster, but it failed and I do not know why.
# kubeadm join 10.100.1.20:6443 --token xxxxxx --discovery-token-ca-cert-hash sha256:xxxxxx
[preflight] running pre-flight checks
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{}]
you can solve this problem with following methods:
1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support
I1126 10:30:33.608681 7238 kernel_validator.go:81] Validating kernel version
I1126 10:30:33.608737 7238 kernel_validator.go:96] Validating kernel config
[WARNING Hostname]: hostname "t-k8s-b1" could not be reached
[WARNING Hostname]: hostname "t-k8s-b1" lookup t-k8s-b1 on 103.224.222.222:53: no such host
[discovery] Trying to connect to API Server "10.100.1.20:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.100.1.20:6443"
[discovery] Requesting info from "https://10.100.1.20:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.100.1.20:6443"
[discovery] Successfully established connection with API Server "10.100.1.20:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
Unauthorized
The new node cannot be found:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
t-k8s-a1 Ready master 6d v1.11.3
t-k8s-b2 Ready <none> 6d v1.11.3
In /var/log/messages:
Nov 26 10:40:39 t-k8s-b1 systemd: Configuration file /etc/systemd/system/kubelet.service is marked executable. Please remove executable permission bits. Proceeding anyway.
I changed the permissions of /etc/systemd/system/kubelet.service from 0755 to 0644 and the warning disappeared, and I loaded the modules ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh with modprobe, but the join still fails with Unauthorized.
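Roughly what was run (a sketch; nf_conntrack_ipv4 was also listed in the preflight warning):
chmod 0644 /etc/systemd/system/kubelet.service
modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4
The second join attempt then gives: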
[preflight] running pre-flight checks
I1126 10:48:03.529871 8416 kernel_validator.go:81] Validating kernel version
I1126 10:48:03.529927 8416 kernel_validator.go:96] Validating kernel config
[WARNING Hostname]: hostname "t-k8s-b1" could not be reached
[WARNING Hostname]: hostname "t-k8s-b1" lookup t-k8s-b1 on 103.224.222.222:53: no such host
[discovery] Trying to connect to API Server "10.100.1.20:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.100.1.20:6443"
[discovery] Requesting info from "https://10.100.1.20:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.100.1.20:6443"
[discovery] Successfully established connection with API Server "10.100.1.20:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
Unauthorized
Solution
The reason was that the token had expired. I created a new token and joined with it, and everything works now:
# kubeadm token create
new token
# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
ca cert hash
# kubeadm join 10.100.1.20:6443 --token new_token --discovery-token-ca-cert-hash sha256:ca_cert_hash
composer network install -c adminCard -a hyperledger-fabric-network.bna
The network install command fails with the following error:
Installing business network. This may take a minute...E1115 11:51:11.667324200 30359 ssl_transport_security.cc:599] Could not load any root certificate.
E1115 11:51:11.667359374 30359 ssl_transport_security.cc:1400] Cannot load server root certificates.
E1115 11:51:11.667373715 30359 security_connector.cc:1025] Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E1115 11:51:11.667384067 30359 secure_channel_create.cc:111] Failed to create secure subchannel for secure name 'ldn-zbc03a.4.secure.blockchain.ibm.com:20355'
E1115 11:51:11.667390697 30359 secure_channel_create.cc:142] Failed to create subchannel arguments during subchannel creation.
E1115 11:51:11.668097853 30359 ssl_transport_security.cc:599] Could not load any root certificate.
E1115 11:51:11.668109600 30359 ssl_transport_security.cc:1400] Cannot load server root certificates.
E1115 11:51:11.668118612 30359 security_connector.cc:1025] Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E1115 11:51:11.668123679 30359 secure_channel_create.cc:111] Failed to create secure subchannel for secure name 'ldn-zbc03a.4.secure.blockchain.ibm.com:20355'
E1115 11:51:11.668129626 30359 secure_channel_create.cc:142] Failed to create subchannel arguments during subchannel creation.
✖ Installing business network. This may take a minute...
Error: Error trying install business network. Error: No valid responses from any peers.
Response from attempted peer comms was an error: Error: Failed to connect before the deadline
Please check the adminCard. It seems the certificate isn't correct.
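A hedged way to inspect what the card actually contains (a .card file is a zip archive with connection.json, metadata.json and a credentials folder; composer card export is part of composer-cli, though flag spellings may differ by version):
composer card export -c adminCard -f adminCard-check.card
unzip -o adminCard-check.card -d adminCard-check
cat adminCard-check/connection.json    # check the TLS/root certificate PEMs referenced for each peer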