How to check if Kubernetes cluster is running fine - kubernetes

I have a Kubernetes cluster running. I used:
kubeadm init --apiserver-advertise-address=192.168.20.101 --pod-network-cidr=10.244.0.0/16
This is working okay. Now I'm putting this in a script and I only want to execute kubeadm init again if my cluster is not running fine. How can I check if a Kubernetes cluster is running fine? So if not I can recreate the cluster.

You can use the following command to do that:
[root#ip-10-0-1-19]# kubectl cluster-info
Kubernetes master is running at https://10.0.1.197:6443
KubeDNS is running at https://10.0.1.197:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
It shows that your master is running fine on particular url.

kubectl get cs
The above command would display the health of controller, scheduler and etcd as Healthy if your cluster is fine
# kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}

You should also check nodes running in the cluster.
kubectl get nodes

Related

How to restore accidentally deleted a kube-proxy DaemonSet in a Kubernetes cluster?

I accidentally deleted kube-proxy daemonset by using command: kubectl delete -n kube-system daemonset kube-proxy which should run kube-proxy pods in my cluster, what the best way to restore it?
That's how it should look
Kubernetes allows you to reinstall kube-proxy by running the following command which install the kube-proxy addon components via the API server.
$ kubeadm init phase addon kube-proxy --kubeconfig ~/.kube/config --apiserver-advertise-address string
This will generate the output as
[addons] Applied essential addon: kube-proxy
The IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
Hence kube-proxy will be reinstalled in the cluster by creating a DaemonSet and launching the pods.
kube-proxy daemon got created at the time of cluster creation, so you need to write your own manifest for daemon-set unless you have a backup to restore it from there.

I cannot load the node information on kubernetes

When I ran the command below, I got the below messages
bistel#BISTelResearchDev-DN03:~$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
While in the master node, I get the information as below:
bistel#BISTelResearchDev-NN:/etc/kubernetes$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
bistelresearchdev-dn03 NotReady <none> 62s v1.19.3
bistelresearchdev-nn Ready master 57m v1.19.3
bistel#BISTelResearchDev-NN:/etc/kubernetes$
The bistelresearchdev-dn03 is the worker node and the message appears when I ran any command using kubectl as follows The connection to the server localhost:8080 was refused - did you specify the right host or port?.
I googled it a lot but any trials didn't work for me.
Thanks,
kubectl works only on master node in cluster. If you are getting this error then there is no issue.
I can see the issue here is node is NotReady status for that you can check below things.
Check kubelet is running on node bistelresearchdev-dn03 with systemctl status kubelet
Check network plugin is installed on your cluster.
The first computer you ran on is missing the kube config file.
Normally kubectl expects to find it at
~/.kube/config
If you get the one off the master node and copy it onto your machine your kubectl will see it and be able to use it.

Nginx Kubernetes POD stays in ContainerCreating

I was able to setup the Kubernetes Cluster on Centos7 with one master and two worker nodes, however when I try to deploy a pod with nginx, the state of the pod stays in ContainerRunning forever and doesn't seem to get out of it.
For pod network I am using the calico.
Can you please help me resolve this issue? for some reason I don't feel satisfied moving forward without resolving this issue, I tried to check forums etc, since the last two days and this is the last resort that I am reaching out to you.
[root#kube-master ~]# kubectl get pods --all-namespaces
[get pods result][1]
However when I run describe pods I see the below error for the nginx container under events section.
Warning FailedCreatePodSandBox 41s (x8 over 11m) kubelet,
kube-worker1 (combined from similar events): Failed to create pod
sandbox: rpc error: code = Unknown desc = failed to set up sandbox
container
"ac77a42270009cba0c508e2fd82a84d6caef287bdb117d288d5193960b52abcb"
network for pod "nginx-6db489d4b7-2r4d2": networkPlugin cni failed to
set up pod "nginx-6db489d4b7-2r4d2_default" network: unable to connect
to Cilium daemon: failed to create cilium agent client after 30.000000
seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config:
dial unix /var/run/cilium/cilium.sock: connect: no such file or
directory
Hope you can help here.
Edit 1:
The ip address of the master VM is 192.168.40.133
Used the below command to initialize the kubeadm:
kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address 192.168.40.133
Used the below command to install the pod network:
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
The kubeadm init above gave me the join command that I used to join the workers into the cluster.
All the VMs are connected to host and bridged network adapters.
your pod subnet (specified by --pod-network-cidr) clashes with the network your VMs are located in: these 2 have to be distinct. Use something else for the pod subnet, for example 10.244.0.0/16 and then edit calico.yaml before applying it as described in the official docs:
POD_CIDR="10.244.0.0/16"
kubeadm init --pod-network-cidr=${POD_CIDR} --apiserver-advertise-address 192.168.40.133
curl https://docs.projectcalico.org/manifests/calico.yaml -O
sed -i -e "s?192.168.0.0/16?${POD_CIDR}?g" calico.yaml
kubectl apply -f calico.yaml
hope this helps :)
note: you don't really need to specify --apiserver-advertise-address flag: kubeadm will detect correctly the main IP of the machine most of the time.

Unable to get kubernetes dashboard

I've installad a new cluster (version 1.13.5 of kubectl kubelet kubeadm), then I've installed flannel and add a worker node.
Now I'm trying to add kubernetes dashboard to my cluster but after i run
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
I've this situation
kubernetes-dashboard-**** 0/1 CrashLoopBackOff 1 8s
Then if I get the log i can see this
Error while initializing connection to Kubernetes apiserver...
Where I'm wrong?
It seems that the problem was on the worker, when I put the dashboard on master the pod starts.
Maybe the kube dashboard has to be installed on the master or there is something wrong with flannel and the master-node communication.
Check api-server pod is running or not and KubeDNS is working fine or not.

Rancher 2.0 - Troubleshooting and fixing “Controller Manager Unhealthy Issue”

I have a problem with controller-manager and scheduler not responding, that is not related to github issues I've found (rancher#11496, azure#173, …)
Two days ago we had a memory overflow by one POD on one Node in our 3-node HA cluster. After that rancher webapp was not accessible, we found the compromised pod and scaled it to 0 over kubectl. But that took some time, figuring everything out.
Since then rancher webapp is working properly, but there are continuous alerts from controller-manager and scheduler not working. Alerts are not consist, sometimes they are both working, some times their health check urls are refusing connection.
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-2 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
Restarting controller-manager and scheduler on compromised Node hasn’t been effective. Even reloading all of the components with
docker restart kube-apiserver kubelet kube-controller-manager kube-scheduler kube-proxy
wasn’t effective either.
Can someone please help me figure out the steps towards troubleshooting and fixing this issue without downtime on running containers?
Nodes are hosted on DigitalOcean on servers with 4 Cores and 8GB of RAM each (Ubuntu 16, Docker 17.03.3).
Thanks in advance !
The first area to look at would be your logs... Can you export the following logs and attach them?
/var/log/kube-controller-manager.log
The controller manager is an endpoint, so you will need to do a "get endpoint". Can you run the following:
kubectl -n kube-system get endpoints kube-controller-manager
and
kubectl -n kube-system describe endpoints kube-controller-manager
and
kubectl -n kube-system get endpoints kube-controller-manager -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
Please run this command in master nodes
sed -i 's|- --port=0|#- --port=0|' /etc/kubernetes/manifests/kube-scheduler.yaml
sed -i 's|- --port=0|#- --port=0|' /etc/kubernetes/manifests/kube-controller-manager.yaml
systemctl restart kubelet
After restarting the kubelet, the problem will be solved.