"Calico CNI - calico-kube-controllers Failed to initialize Calico data store" error - calico

I am running Calico CNI v3.19.1 on Kubernetes v1.19.13, using CentOS Stream 8 and RHEL 8 for this cluster with 3 master nodes and 3 worker nodes.
When calico-kube-controllers starts on a worker node, it fails with the error [FATAL][1] main.go 118: Failed to initialize Calico data store.
I used the settings below while deploying Calico:
# Auto-detect the BGP IP address.
- name: IP
value: "autodetect"
- name: IP_AUTODETECTION_METHOD
value: "interface=en.*"
- name: FELIX_IPTABLESBACKEND
value: "NFT"
Error message during pod startup:
klf calico-kube-controllers-5978c5f6b5-bxbmw
2021-07-26 15:24:21.353 [INFO][1] main.go 92: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0726 15:24:21.356093 1 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2021-07-26 15:24:21.357 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2021-07-26 15:24:31.357 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2021-07-26 15:24:31.357 [FATAL][1] main.go 118: Failed to initialize Calico datastore error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
Any tips to resolve this error message?
Thanks
SR

Try this on your master node; it worked for me.
$ sudo iptables -P INPUT ACCEPT
$ sudo iptables -P FORWARD ACCEPT
$ sudo iptables -F
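To verify that the datastore is reachable again afterwards, here is a quick check (a sketch, assuming the default kubernetes service IP 10.96.0.1 from the error above and the stock k8s-app=calico-kube-controllers label from the Calico manifest):
# from the node where calico-kube-controllers is scheduled; any HTTP response (even 403) proves connectivity, a timeout means traffic to the service IP is still dropped
$ curl -k https://10.96.0.1:443/version
# then watch the controller pod recover
$ kubectl -n kube-system get pods -l k8s-app=calico-kube-controllers -w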

Related

While creating kubernetes cluster with kubeadm with loadbalancer dns and port in aws getting Error

I'm trying to create a high-availability cluster. For that I did the following procedure:
I created EC2 instances and added them to a load balancer.
Using that load balancer's DNS/IP and port, I ran the following command:
sudo kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT" --upload-certs
But the master node is not created, and I am getting this error:
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
When I run systemctl status kubelet, I'm getting an "ip not found" error.
Can someone please help?

How to access etcd cluster endpoints from kubernetes master

Is there a way I can access the etcd endpoints from the Kubernetes master node without actually getting into the etcd cluster?
For example, can I curl the etcd endpoints for health (using ssh) or list the endpoints and get the return status from the Kubernetes master node (i.e. without really getting inside the etcd master)?
It really depends on how you configured the cluster. Actually, the etcd cluster could run entirely outside of the k8s cluster. Also, etcd could be configured with TLS auth, so you will need to provide cert files to be able to make any request via curl. etcdctl does everything you need. Something like:
~# export ETCDCTL_API=3
~# export ETCDCTL_ENDPOINTS=https://kub01.msk.test.ru:2379,https://kub02.msk.test.ru:2379,https://kub03.msk.test.ru:2379
~# etcdctl endpoint status
https://kub01.msk.test.ru:2379, e9bc9d307c96fd08, 3.3.13, 10 MB, true, 1745, 17368976
https://kub02.msk.test.ru:2379, 885ed66440d63a79, 3.3.13, 10 MB, false, 1745, 17368976
https://kub03.msk.test.ru:2379, 8c5c20ece034a652, 3.3.13, 10 MB, false, 1745, 17368976
or with TLS:
~# etcdctl endpoint health
client: etcd cluster is unavailable or misconfigured; error #0: remote error: tls: bad certificate
; error #1: remote error: tls: bad certificate
; error #2: remote error: tls: bad certificate
# need to export environment vars
~# export ETCDCTL_CACERT=<PATH_TO_FILE>
~# export ETCDCTL_CERT=<PATH_TO_FILE>
~# export ETCDCTL_KEY=<PATH_TO_FILE>
~# etcdctl endpoint health
https://kub01.msk.test.ru:2379 is healthy: successfully committed proposal: took = 2.946423ms
https://kub02.msk.test.ru:2379 is healthy: successfully committed proposal: took = 1.5883ms
https://kub03.msk.test.ru:2379 is healthy: successfully committed proposal: took = 1.745591ms
You can run commands in a pod without actually getting inside the pod. For example, if I have to run ls -l inside the etcd pod, what I would do is:
kubectl exec -it -n kube-system etcd-kanister-control-plane -- ls -l
Similarly, you can run any command instead of ls -l.
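Combining both answers: on a kubeadm-provisioned cluster you can usually run etcdctl through the static etcd pod on the master, using the certificate paths kubeadm generates. A sketch, assuming the kubeadm defaults (the pod name suffix and cert locations may differ in your setup):
# etcd pods are named etcd-<node-name>; check with "kubectl -n kube-system get pods"
$ kubectl -n kube-system exec etcd-<node-name> -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health
# older etcd versions may additionally need ETCDCTL_API=3 in the environment, as in the answer above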

The connection to the server localhost:8080 was refused

I was able to cluster 2 nodes together in Kubernetes. The master node seems to be running fine but running any command on the worker node results in the error: "The connection to the server localhost:8080 was refused - did you specify the right host or port?"
From master (node1),
$ kubectl get nodes
NAME STATUS AGE VERSION
node1 Ready 23h v1.7.3
node2 Ready 23h v1.7.3
From worker (node 2),
$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
$ telnet localhost 8080
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.032 ms
I am not sure how to fix this issue. Any help is appreciated.
On executing "journalctl -xeu kubelet" I see:
"CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container", but this seems to be related to installing a pod network ... which I am not able to do because of the above error.
Thanks!
kubectl interfaces with kube-apiserver for cluster management. The command works on the master node because that's where kube-apiserver runs. On the worker nodes, only kubelet and kube-proxy are running.
In fact, kubectl is supposed to be run on a client (e.g. laptop, desktop) and not on the Kubernetes nodes.
From the master you need ~/.kube/config; pass this file as an argument to the kubectl command. Copy the config file to the other server or laptop, then point kubectl at it,
e.g.:
kubectl --kubeconfig=$HOME/.kube/config get nodes
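A minimal sketch of the whole flow, assuming you have ssh access from the worker or laptop to the master as user "user" (the host and user names here are placeholders):
# on the worker or laptop
$ mkdir -p $HOME/.kube
$ scp user@master:/home/user/.kube/config $HOME/.kube/config
# either pass it explicitly...
$ kubectl --kubeconfig=$HOME/.kube/config get nodes
# ...or export it once for the shell session
$ export KUBECONFIG=$HOME/.kube/config
$ kubectl get nodes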
This worked for me after executing the following commands:
$ sudo mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
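Those are the standard post-kubeadm-init steps for a regular user; if you are root on the control-plane node, the kubeadm init output also suggests simply pointing KUBECONFIG at the admin config instead:
$ export KUBECONFIG=/etc/kubernetes/admin.conf
$ kubectl get nodes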
As a hint, the message being prompted indicates it is related to the network.
So one potential answer, which worked in my case, is to have a look at the cluster value for your context under contexts in the kubeconfig.
My error was that I had placed an incorrect cluster name there.
Having the appropriate cluster name for the respective context is crucial, and the error will disappear.
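A quick way to inspect exactly that mapping with standard kubectl config subcommands (no cluster-specific names assumed):
$ kubectl config current-context   # which context kubectl is using
$ kubectl config get-contexts      # context -> cluster/user mapping
$ kubectl config view --minify     # only the current context, including the cluster's server URL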
To solve the issue "The connection to the server localhost:8080 was refused - did you specify the right host or port?", you may be missing a step.
My Fix:
On macOS, if you install K8s with brew, you still need to brew install minikube and afterwards run minikube start. This will start your cluster.
Run the command kubectl cluster-info and you should get a happy path response similar to:
Kubernetes control plane is running at https://127.0.0.1:63000
KubeDNS is running at https://127.0.0.1:63308/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Kubernetes install steps: https://kubernetes.io/docs/tasks/tools/install-kubectl-macos/
Minikube docs: https://minikube.sigs.k8s.io/docs/start/
Ensure the right context is selected if you're running Kubernetes in Docker Desktop.
Once you've selected it correctly, you'll be able to run kubectl commands without any exception:
% kubectl cluster-info
Kubernetes control plane is running at https://kubernetes.docker.internal:6443
CoreDNS is running at https://kubernetes.docker.internal:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
% kubectl get nodes
NAME STATUS ROLES AGE VERSION
docker-desktop Ready control-plane,master 2d11h v1.22.5
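For example, Docker Desktop conventionally names its context docker-desktop, so switching to it looks like this (verify the exact name with get-contexts first):
$ kubectl config get-contexts
$ kubectl config use-context docker-desktop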

"kubectl get nodes" shows NotReady always even after giving the appropriate IP

I am trying to set up a Kubernetes cluster for testing purposes with a master and one minion. When I run kubectl get nodes, it always says NotReady. Following is the configuration on the minion in /etc/kubernetes/kubelet:
KUBELET_ADDRESS="--address=0.0.0.0"
KUBELET_PORT="--port=10250"
KUBELET_HOSTNAME="--hostname-override=centos-minion"
KUBELET_API_SERVER="--api-servers=http://centos-master:8080"
KUBELET_ARGS=""
When the kubelet service is started, the following logs can be seen:
Mar 16 13:29:49 centos-minion kubelet: E0316 13:29:49.126595 53912 event.go:202] Unable to write event: 'Post http://centos-master:8080/api/v1/namespaces/default/events: dial tcp 10.143.219.12:8080: i/o timeout' (may retry after sleeping)
Mar 16 13:16:01 centos-minion kube-proxy: E0316 13:16:01.195731 53595 event.go:202] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/default/events: dial tcp [::1]:8080: getsockopt: connection refused' (may retry after sleeping)
Following is the config on the master in /etc/kubernetes/apiserver:
KUBE_API_ADDRESS="--bind-address=0.0.0.0"
KUBE_API_PORT="--port=8080"
KUBELET_PORT="--kubelet-port=10250"
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"
and in /etc/kubernetes/config:
KUBE_ETCD_SERVERS="--etcd-servers=http://centos-master:2379"
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=0"
KUBE_ALLOW_PRIV="--allow-privileged=false"
KUBE_MASTER="--master=http://centos-master:8080"
On the master, the following processes are running properly:
kube 5657 1 0 Mar15 ? 00:12:05 /usr/bin/kube-apiserver --logtostderr=true --v=0 --etcd-servers=http://centos-master:2379 --address=0.0.0.0 --port=8080 --kubelet-port=10250 --allow-privileged=false --service-cluster-ip-range=10.254.0.0/16
kube 5690 1 1 Mar15 ? 00:16:01 /usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://centos-master:8080
kube 5723 1 0 Mar15 ? 00:02:23 /usr/bin/kube-scheduler --logtostderr=true --v=0 --master=http://centos-master:8080
So I still do not know what is missing.
I was having the same issue when setting up Kubernetes with Fedora following the steps on kubernetes.io.
The tutorial leaves KUBELET_ARGS="--cgroup-driver=systemd" commented out in the node's /etc/kubernetes/kubelet; if you uncomment it, you will see the node status become Ready.
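The reason this matters is that the kubelet's cgroup driver has to match the container runtime's. A quick comparison, assuming Docker is the runtime on the minion:
# what Docker is using (prints e.g. "Cgroup Driver: systemd" or "cgroupfs")
$ docker info 2>/dev/null | grep -i "cgroup driver"
# what the kubelet is told to use (in this setup it comes from /etc/kubernetes/kubelet)
$ grep -i cgroup-driver /etc/kubernetes/kubelet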
Hope this helps.
Rejoin the worker nodes to the master.
My install is on three physical machines. One master and two workers. All needed reboots.
You will need your join token, which you probably don't have:
sudo kubeadm token list
Copy the TOKEN field data; the output looks like this (no, that's not my real one):
TOKEN                     TTL   EXPIRES                     USAGES                   DESCRIPTION                                                EXTRA GROUPS
ow3v08ddddgmgzfkdkdkd7 18h 2018-07-30T12:39:53-05:00 authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
THEN join the cluster. The master node IP is the real IP address of your master machine:
sudo kubeadm join --token <YOUR TOKEN HASH> <MASTER_NODE_IP>:6443 --discovery-token-unsafe-skip-ca-verification
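If kubeadm token list comes back empty (bootstrap tokens expire after 24 hours by default), you can mint a fresh token and have kubeadm print the complete join command in one go:
$ sudo kubeadm token create --print-join-command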
You have to restart the kubelet service on the node (systemctl enable kubelet && systemctl restart kubelet). Then you can see your node in "Ready" status.

DNS not resolving though all keys are there in etcd?

Here are some details, and why this is important for my next step in testing:
- I can resolve any outside DNS.
- etcd appears to have all keys updating correctly, along with directories (as expected).
- Local-to-Kubernetes DNS queries don't appear to be working against the etcd datastore, even though I can manually query for key-values.
This is the next step that I need to complete before I can start using an NGINX L7 LB demo.
I looked at the advice in #10265 first [just in case], but it appears I do have secrets for the service account... and I think(?) everything should be there as expected.
The only thing I really see in the Kube2Sky logs is that etcd is found. I would imagine I should be seeing more than this?
[fedora#kubemaster ~]$ kubectl logs kube-dns-v10-q9mlb -c kube2sky --namespace=kube-system
I0118 17:42:24.639508 1 kube2sky.go:436] Etcd server found: http://127.0.0.1:4001
I0118 17:42:25.642366 1 kube2sky.go:503] Using https://10.254.0.1:443 for kubernetes master
I0118 17:42:25.642772 1 kube2sky.go:504] Using kubernetes API
[fedora#kubemaster ~]$
More Details:
[fedora#kubemaster ~]$ kubectl exec -t busybox -- nslookup kubelab.local
Server: 10.254.0.10
Address 1: 10.254.0.10
nslookup: can't resolve 'kubelab.local'
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
[fedora#kubemaster ~]$ etcdctl ls --recursive
/kubelab.local
/kubelab.local/network
/kubelab.local/network/config
/kubelab.local/network/subnets
/kubelab.local/network/subnets/172.16.46.0-24
/kubelab.local/network/subnets/172.16.12.0-24
/kubelab.local/network/subnets/172.16.70.0-24
/kubelab.local/network/subnets/172.16.21.0-24
/kubelab.local/network/subnets/172.16.54.0-24
/kubelab.local/network/subnets/172.16.71.0-24
....and so on...the keys are all there, as expected...
I see you changed the default "cluster.local" to "kubelab.local". Did you change the skydns config to serve that domain?
kubectl exec --namespace=kube-system $podname -c skydns ps
PID USER COMMAND
1 root /skydns -machines=http://127.0.0.1:4001 -addr=0.0.0.0:53 -ns-rotate=false -domain=cluster.local.
11 root ps
Note the -domain flag.
If that is correct, check that you passed the correct --cluster-dns and --cluster-domain flags to the kubelet. Then show me /etc/resolv.conf from a pod that cannot do DNS lookups.
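For reference, a sketch of what those two pieces should look like if the cluster domain really is kubelab.local; the DNS service IP 10.254.0.10 is taken from the nslookup output above, and the default namespace is an assumption:
# kubelet flags on each node (here via KUBELET_ARGS in /etc/kubernetes/kubelet)
KUBELET_ARGS="--cluster-dns=10.254.0.10 --cluster-domain=kubelab.local"
# and inside a pod, /etc/resolv.conf should then look something like:
$ kubectl exec -t busybox -- cat /etc/resolv.conf
nameserver 10.254.0.10
search default.svc.kubelab.local svc.kubelab.local kubelab.local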