Kubernetes master server is failing to come up - kubernetes

Installed kubeadm v1.6.0-alpha, kubectl v1.5.3, kubelet v1.5.3.
Executed the command kubeadm init to bring the Kubernetes master up.
Issue observed: stuck with the log message below
Created API client, waiting for the control plane to become ready
How can I get the Kubernetes master up and running, or how can I debug the issue?

Could you try using kubelet and kubectl 1.6 to see if it is a version mismatch?
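While kubeadm init is stuck waiting for the control plane, the kubelet logs are usually the fastest way to see what is wrong (including version-mismatch errors). A minimal debugging sketch, assuming a systemd host:

```shell
# Check whether the kubelet is running at all
systemctl status kubelet --no-pager

# Look at the most recent kubelet log lines while kubeadm init waits
journalctl -u kubelet --no-pager | tail -n 50

# Filter for common failure signatures (registration, certificate, cgroup errors)
journalctl -u kubelet --no-pager | grep -iE 'error|fail' | tail -n 20
```

If the kubelet repeatedly fails to register with the API server, the versions printed at the top of its log are worth comparing against the kubeadm version.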


Kubectl connection refused existing cluster

Hope someone can help me.
To describe the situation in short: I have a self-managed k8s cluster running on 3 machines (1 master, 2 worker nodes). To make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add a controlPlaneEndpoint setting to the kubeadm-config ConfigMap. So I did, setting it to masternodeHostname:6443.
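For reference, the controlPlaneEndpoint change described above typically looks like the fragment below in the kubeadm-config ConfigMap (the hostname is a placeholder; the v1beta2 API matches kubeadm 1.15):

```yaml
# kubectl -n kube-system edit cm kubeadm-config
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeadm-config
  namespace: kube-system
data:
  ClusterConfiguration: |
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    kubernetesVersion: v1.15.3
    controlPlaneEndpoint: "masternodeHostname:6443"   # placeholder hostname
```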
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443, so I cannot run any kubectl commands.
I tried recreating the .kube folder, with all the config copied there; no luck.
I restarted kubelet and docker.
The containers running on the cluster seem OK, but I am locked out of any cluster configuration (the dashboard is down and kubectl commands are not working).
Is there any way to make it work again, without losing any of the configuration or the deployments already present?
Thanks! Sorry if it's a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: (put bare-metal if not on a public cloud) bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by a short etcd timeout period. As far as I'm aware it is a hard-coded value in the source and cannot be changed (a feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version and recreate your cluster.
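Before recreating anything, it may be worth checking what state the first master is actually in. Since the API server refuses connections, kubectl is unusable, but the container runtime on that host can still be inspected directly. A rough sketch, assuming Docker as the runtime:

```shell
# See whether the control-plane containers are up or crash-looping
docker ps -a | grep -E 'kube-apiserver|etcd'

# Inspect why the apiserver exited (pick a container ID from the output above)
docker logs --tail 50 "$(docker ps -aq --filter name=k8s_kube-apiserver | head -n 1)"

# The static pod manifests the kubelet keeps restarting live here:
ls /etc/kubernetes/manifests/
```

If etcd itself is down (often the case after a failed control-plane join), its container logs on the first master usually show the reason.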

Kubernetes issue: runtime network not ready

I am a beginner in Kubernetes and I'm trying to set up my first cluster. My worker node has joined the cluster successfully, but when I run kubectl get nodes it is in NotReady status,
and this message appears when I run
kubectl describe node k8s-node-1
runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I have run this command to install a Pod network add-on:
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
How can I solve this issue?
Adding this answer as a community wiki for better visibility. The OP already solved the problem by rebooting the machine.
It's worth remembering that going through all the steps of bootstrapping the cluster and installing all the prerequisites will make your cluster run successfully. If you had any previous installations, please remember to perform kubeadm reset and remove the .kube folder from the home or root directory.
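The cleanup mentioned above can be sketched as follows (the admin.conf path is the kubeadm default; adjust if your install differs):

```shell
# Tear down any previous kubeadm installation on this node
sudo kubeadm reset -f

# Remove the stale kubectl config left over from the old cluster
rm -rf "$HOME/.kube"

# After a fresh `kubeadm init`, recreate the kubectl config from the new admin credentials
mkdir -p "$HOME/.kube"
sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
```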
I'm also linking this GitHub issue describing the same problem, where people provide solutions.

OKD unable to pull larger images from the internal registry right after deployment of microservices through Jenkins X

I am trying to deploy microservices in OKD through Jenkins X, and the deployment succeeds every time.
But the pods go into an "ImagePullBackOff" error right after deployment, and only reach the Running state after I delete them.
ImagePullBackOff Error:
Events:
The images are pulled from OKD's internal registry, and the image is about 1.25 GB in size. The images are available in the internal registry when the pod tries to pull them.
I came across the "image-pull-progress-deadline" field, which should be updated in "/etc/origin/node/node-config.yaml" on all the nodes. I updated it on all the nodes but am still facing the same "ImagePullBackOff" error.
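For reference, the image-pull-progress-deadline setting goes under kubeletArguments in node-config.yaml; the 30m value below is only an example, sized so a 1.25 GB pull has time to complete:

```yaml
# /etc/origin/node/node-config.yaml (excerpt)
kubeletArguments:
  image-pull-progress-deadline:
  - "30m"
```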
I tried restarting the kubelet service, but that fails with a kubelet.service not found error:
[master ~]$ sudo systemctl status kubelet
Unit kubelet.service could not be found.
Please let me know whether restarting the kubelet service is necessary, and I would appreciate any suggestions to resolve the "ImagePullBackOff" issue.
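On OKD/OpenShift 3.x there is no standalone kubelet.service; the kubelet runs inside the node service, which is why the restart above fails. A sketch for finding and restarting the right unit (the exact name varies between origin and OCP installs):

```shell
# Find the node service actually installed on this host
systemctl list-units --type=service --all | grep -E 'origin-node|atomic-openshift-node'

# Restart it so the new node-config.yaml (image-pull-progress-deadline) is picked up
sudo systemctl restart origin-node   # or atomic-openshift-node on OCP
```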

Kubernetes event logs

As part of debugging, I need to track down events like pod creation and removal. In my Kubernetes setup I am using logging level 5.
The kube-apiserver, scheduler, controller manager, and etcd run on the master node, and the minion nodes run kubelet and docker.
I am using journalctl to get the Kubernetes logs on the master node as well as on the worker nodes. On a worker node I can see logs from Docker and the kubelet. These logs contain events, as I would expect, as I create and destroy pods.
However, on the master node I don't see any relevant logs that would indicate a pod creation or removal request being handled.
What other logs or methods can I use to get such logs from the Kubernetes master components (API server, controller manager, scheduler, etcd)?
I have checked the logs from the API server, controller manager, scheduler, and etcd pods; they don't seem to have such information.
Thanks
System component logs:
There are two types of system components:
those that run in a container
and those that do not run in a container.
For example:
The Kubernetes scheduler and kube-proxy run in a container
The kubelet and container runtime, for example Docker, do not run in containers.
On machines with systemd, the kubelet and container runtime write to journald. If systemd is not present, they write to .log files in the /var/log directory. System components inside containers always write to the /var/log directory, bypassing the default logging mechanism. They use the klog logging library.
Master components logs:
Get them from those containers running on master nodes.
$ docker ps | grep apiserver
d6af65a248f1 af20925d51a3 "kube-apiserver --ad…" 2 weeks ago Up 2 weeks k8s_kube-apiserver_kube-apiserver-minikube_kube-system_177a3eb80503eddadcdf8ec0423d04b9_0
5f0e6b33a29f k8s.gcr.io/pause-amd64:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_kube-apiserver-minikube_kube-system_177a3eb80503eddadcdf8ec0423d04b9_0
$ docker logs -f d6a
But all of these approaches to logging are only suitable for testing. You should stream all the logs (app logs, container logs, cluster-level logs, everything) to a central logging system such as ELK or EFK.
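For the original question about pod creation and removal specifically, the event stream may be more direct than the component logs, since those lifecycle transitions are recorded as events. A sketch:

```shell
# Pod lifecycle shows up as events (Scheduled, Pulling, Created, Started, Killing)
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp

# Watch events live while creating and destroying pods
kubectl get events --all-namespaces --watch

# Or filter for just the creation/removal transitions
kubectl get events --all-namespaces | grep -iE 'created|killing'
```

Note that events are retained only for a limited time (one hour by default), so a central logging pipeline is still needed for a durable record.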

calico-policy-controller on the worker node is in a restart loop. How can I check why?

I have two CoreOS stable machines (with the latest stable version installed) to test Kubernetes. I installed Kubernetes 1.5.1 using the script from https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic and patched it with https://github.com/kfirufk/coreos-kubernetes-multi-node-generic-install-script.
I installed the controller script on one and the worker script on the other. kubectl get nodes shows both servers.
kubectl get pods --namespace=kube-system shows that calico-policy-controller-2j5dn restarts a lot. On the worker server I also see that calico-policy-controller restarts a lot. Any idea how to investigate this issue further?
How can I check why it restarts? Are there any logs for this container?
kubectl logs --previous $id --namespace=kube-system
I added --previous because when the controller restarts, a different random suffix is appended to its name.
In my case, the kube-policy-controller was started on one server and requested etcd2 certificates that were generated on a different server.
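A more general sketch for investigating a crash-looping pod like this one (the pod name below is the example from the question and will differ on your cluster):

```shell
# Find the current pod name, since the random suffix changes across restarts
kubectl get pods --namespace=kube-system | grep calico-policy

# Show the restart count and the reason for the last termination
kubectl describe pod calico-policy-controller-2j5dn --namespace=kube-system

# Logs from the previous (crashed) container instance
kubectl logs --previous calico-policy-controller-2j5dn --namespace=kube-system
```

The "Last State" and "Events" sections of kubectl describe usually name the failure (e.g. a certificate or connection error against etcd) before you even need the logs.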