CoreDNS in CrashLoopBackOff state with Calico network - Kubernetes

I have an Ubuntu 16.04 VM running in VirtualBox. I installed Kubernetes on it as a single node using kubeadm.
But the CoreDNS pods are in CrashLoopBackOff state.
All other pods are running.
Single interface (enp0s3) - bridged network
Applied Calico using:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
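CoreDNS only comes up cleanly once the CNI is working, so it is worth first confirming that the Calico pods themselves are Running; a quick check (a sketch, assuming the standard k8s-app=calico-node label from calico.yaml):
# Calico node pods should be Running before CoreDNS can start
kubectl get pods -n kube-system -l k8s-app=calico-node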
Output of kubectl describe pod:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 41m default-scheduler Successfully assigned kube-system/coredns-66bff467f8-dxzq7 to kube
Normal Pulled 39m (x5 over 41m) kubelet, kube Container image "k8s.gcr.io/coredns:1.6.7" already present on machine
Normal Created 39m (x5 over 41m) kubelet, kube Created container coredns
Normal Started 39m (x5 over 41m) kubelet, kube Started container coredns
Warning BackOff 87s (x194 over 41m) kubelet, kube Back-off restarting failed container

I did a kubectl logs <coredns-pod>, found the error below, and looked at the link mentioned in it.
As per the suggestion there, I added resolvConf: /etc/resolv.conf at the end of the kubelet config (/etc/kubernetes/kubelet/conf.yaml) and recreated the pod.
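For reference, the relevant kubelet configuration fragment would look like this (a sketch; note that kubeadm normally writes the kubelet config to /var/lib/kubelet/config.yaml, so the exact path may differ from the one above):
# KubeletConfiguration fragment: point the kubelet at the real resolver
# instead of the local 127.0.1.1 stub that causes the forwarding loop
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /etc/resolv.conf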
kubectl logs coredns-66bff467f8-dxzq7 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[FATAL] plugin/loop: Loop (127.0.0.1:34536 -> :53) detected for zone ".", see coredns.io/plugins/loop#troubleshooting. Query: "HINFO 8322382447049308542.5528484581440387393."
root@kube:/home/kube#

Commented out the line below in /etc/resolv.conf (host machine) and deleted the CoreDNS pods in the kube-system namespace.
The new pods came up in Running state :)
#nameserver 127.0.1.1
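For anyone repeating this, a minimal sketch of the restart step, assuming the stock kubeadm labels (the CoreDNS pods carry k8s-app=kube-dns):
# Delete the CoreDNS pods; the Deployment recreates them with the fixed resolv.conf
kubectl -n kube-system delete pod -l k8s-app=kube-dns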

Related

Istio bookinfo sample application deployment failed with "ImagePullBackOff"

I am trying to deploy Istio's sample bookinfo application using the command below:
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
from here
but each time I get an ImagePullBackOff error like this:
NAME READY STATUS RESTARTS AGE
details-v1-c74755ddf-m878f 2/2 Running 0 6m32s
productpage-v1-778ddd95c6-pdqsk 2/2 Running 0 6m32s
ratings-v1-5564969465-956bq 2/2 Running 0 6m32s
reviews-v1-56f6655686-j7lb6 1/2 ImagePullBackOff 0 6m32s
reviews-v2-6b977f8ff5-55tgm 1/2 ImagePullBackOff 0 6m32s
reviews-v3-776b979464-9v7x5 1/2 ImagePullBackOff 0 6m32s
For error details, I ran:
kubectl describe pod reviews-v1-56f6655686-j7lb6
which returned these events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m41s default-scheduler Successfully assigned default/reviews-v1-56f6655686-j7lb6 to minikube
Normal Pulled 7m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 7m39s kubelet Created container istio-init
Normal Started 7m39s kubelet Started container istio-init
Warning Failed 5m39s kubelet Failed to pull image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0": rpc error: code = Unknown desc = context deadline exceeded
Warning Failed 5m39s kubelet Error: ErrImagePull
Normal Pulled 5m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 5m39s kubelet Created container istio-proxy
Normal Started 5m39s kubelet Started container istio-proxy
Normal BackOff 5m36s (x3 over 5m38s) kubelet Back-off pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Warning Failed 5m36s (x3 over 5m38s) kubelet Error: ImagePullBackOff
Normal Pulling 5m25s (x2 over 7m38s) kubelet Pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Do I need to build the Dockerfile first and push the image to a local repository? There are no clear instructions there, or I failed to find any.
Can anybody help?
If you check Docker Hub, the image is there:
https://hub.docker.com/r/istio/examples-bookinfo-reviews-v1/tags
So the error you need to deal with is the context deadline exceeded while pulling the image from Docker Hub. This is most likely a networking error (it is a generic Go error that just means the operation took too long). Depending on where your cluster is running, you can manually do a docker pull from the nodes, and that should work.
EDIT: for minikube, do a minikube ssh and then a docker pull
Solved the problem with the command below:
minikube ssh docker pull istio/examples-bookinfo-reviews-v1:1.17.0
from this GitHub issue here
See also: How to use local docker images with Minikube?
Hope this helps somebody.
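The v2 and v3 reviews pods were stuck in the same ImagePullBackOff, so the same workaround presumably applies to their images as well; a sketch, assuming v2 and v3 use the same 1.17.0 tag as v1 (only the v1 tag appears in the events above):
# Pre-pull all three reviews images inside the minikube VM
minikube ssh docker pull istio/examples-bookinfo-reviews-v1:1.17.0
minikube ssh docker pull istio/examples-bookinfo-reviews-v2:1.17.0
minikube ssh docker pull istio/examples-bookinfo-reviews-v3:1.17.0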

CrashLoopBackOff while creating NGINX ingress controller

I have installed Kubernetes on AWS EC2 machines; the cluster has a master and 2 worker nodes connected to it:
[root@k8-m deployments]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8-m Ready control-plane,master 107m v1.20.1
k8-n1 Ready <none> 101m v1.20.1
k8-n2 Ready <none> 91m v1.20.1
I have a requirement to install an ingress controller for exposing traffic externally, and the chosen controller is NGINX. I am creating the resources, such as the namespace, service account, secret, RBAC, config map, ap-rbac, and daemon-set configs, taken from https://github.com/nginxinc/kubernetes-ingress.git.
After creating the resources for the ingress controller, I see the pods going into CrashLoopBackOff state:
[root@k8-m deployments]# kubectl get all -n nginx-ingress
NAME READY STATUS RESTARTS AGE
pod/nginx-ingress-555f75f85f-5vxf6 0/1 CrashLoopBackOff 7 11m
pod/nginx-ingress-7wmhw 0/1 CrashLoopBackOff 7 11m
pod/nginx-ingress-mss7v 0/1 CrashLoopBackOff 7 11m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/nginx-ingress 2 2 0 2 0 <none> 11m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx-ingress 0/1 1 0 11m
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-ingress-555f75f85f 1 1 0 11m
Describing the pod gives the following (pasting only the event details):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned nginx-ingress/nginx-ingress-555f75f85f-5vxf6 to k8-n2
Normal Pulled 14m kubelet Successfully pulled image "nginx/nginx-ingress:edge" in 2.456877779s
Normal Pulled 14m kubelet Successfully pulled image "nginx/nginx-ingress:edge" in 2.501405255s
Normal Pulled 13m kubelet Successfully pulled image "nginx/nginx-ingress:edge" in 2.63456627s
Normal Created 13m (x4 over 14m) kubelet Created container nginx-ingress
Normal Started 13m (x4 over 14m) kubelet Started container nginx-ingress
Normal Pulled 13m kubelet Successfully pulled image "nginx/nginx-ingress:edge" in 2.659821346s
Normal Pulling 12m (x5 over 14m) kubelet Pulling image "nginx/nginx-ingress:edge"
Warning BackOff 3m53s (x47 over 14m) kubelet Back-off restarting failed container
I am not able to see the logs, though.
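If the container dies before writing anything on its current run, the logs of the previous instance usually capture the error; a quick check (a sketch, using one pod name from the listing above):
# Current container logs (may be empty right after a restart)
kubectl logs nginx-ingress-555f75f85f-5vxf6 -n nginx-ingress
# Logs from the previous, crashed container instance
kubectl logs nginx-ingress-555f75f85f-5vxf6 -n nginx-ingress --previous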
Below are the commands executed while creating the NGINX controller:
kubectl create -f common/ns-and-sa.yaml
kubectl create -f rbac/rbac.yaml
kubectl create -f rbac/ap-rbac.yaml
kubectl create -f common/default-server-secret.yaml
kubectl create -f common/nginx-config.yaml
kubectl create -f deployment/nginx-ingress.yaml
kubectl create -f daemon-set/nginx-ingress.yaml
Could anyone here advise me on this?

More detailed monitoring of Pod states

Our Pods usually spend at least a minute, and up to several minutes, in the Pending state; the events via kubectl describe pod x yield:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned testing/runner-2zyekyp-project-47-concurrent-0tqwl4 to host
Normal Pulled 55s kubelet, host Container image "registry.com/image:c1d98da0c17f9b1d4ca81713c138ee2e" already present on machine
Normal Created 55s kubelet, host Created container build
Normal Started 54s kubelet, host Started container build
Normal Pulled 54s kubelet, host Container image "gitlab/gitlab-runner-helper:x86_64-6214287e" already present on machine
Normal Created 54s kubelet, host Created container helper
Normal Started 54s kubelet, host Started container helper
The provided information is not detailed enough to figure out exactly what is happening.
Question:
How can we gather more detailed metrics of what exactly happens, and when, on the way to getting a Pod running, in order to troubleshoot which step needs how much time?
Of special interest would be the metric of how long it takes to mount a volume.
Check the kubelet and kube-scheduler logs, because the kube-scheduler schedules the pod to a node, and the kubelet then starts the pod on that node and reports its status as Ready.
journalctl -u kubelet # after logging into the kubernetes node
kubectl logs kube-scheduler-<control-plane-node-name> -n kube-system
Describe the pod, deployment, replicaset to get more details
kubectl describe pod podname -n namespacename
kubectl describe deploy deploymentname -n namespacename
kubectl describe rs replicasetname -n namespacename
Check events
kubectl get events -n namespacename
Describe the nodes and check available resources and status which should be ready.
kubectl describe node nodename
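Since events carry timestamps, sorting them chronologically gives a rough timeline of each step (scheduling, volume attach/mount, image pull, container start); a sketch, noting that volume activity shows up as events such as SuccessfulAttachVolume or FailedMount:
# Chronological event timeline for the namespace, timestamps included
kubectl get events -n namespacename --sort-by=.metadata.creationTimestamp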

Error while starting Pod in a newly created Kubernetes cluster (ContainerCreating)

I am new to Kubernetes. I have created a Kubernetes cluster with one master node and 2 worker nodes. I have installed Helm for the deployment of apps. I am getting the following error while starting the Tiller pod:
tiller-deploy-5b4685ffbf-znbdc 0/1 ContainerCreating 0 23h
After describing the pod, I got the following result:
[root@master-node flannel]# kubectl --namespace kube-system describe pod tiller-deploy-5b4685ffbf-znbdc
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 10m (x34020 over 22h) kubelet, worker-node1 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "cdda0a8ae9200668a2256e8c7b41904dce604f73f0282b0443d972f5e2846059" network for pod "tiller-deploy-5b4685ffbf-znbdc": networkPlugin cni failed to set up pod "tiller-deploy-5b4685ffbf-znbdc_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 25s (x34556 over 22h) kubelet, worker-node1 Pod sandbox changed, it will be killed and re-created.
Any hint on how I can get past this error?
You need to set up a CNI plugin such as Flannel. Verify that all the pods in the kube-system namespace are running.
To apply Flannel in your cluster, run the following command:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
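Once the Flannel DaemonSet is up, each node should have the subnet file that the sandbox error complained about; a quick check to run on a node (a sketch):
# The FailedCreatePodSandBox error was about this file being missing
cat /run/flannel/subnet.env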
For Flannel to work correctly, the pod-network-cidr should be 10.244.0.0/16. If you have a different CIDR, you can customize the Flannel manifest (kube-flannel.yml) according to your needs.
Example:
net-conf.json: |
  {
    "Network": "10.10.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
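On a fresh cluster, the pod CIDR is set when the control plane is initialized; a minimal sketch, assuming kubeadm and the Flannel default range:
# Initialize the control plane with the pod CIDR Flannel expects by default
kubeadm init --pod-network-cidr=10.244.0.0/16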

Trouble while installing GlusterFS on a Kubernetes cluster using Heketi

I am trying to install GlusterFS on my Kubernetes cluster using Heketi. I start gk-deploy, but it shows that the pods aren't found:
Using Kubernetes CLI.
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... not found.
heketi pod ... not found.
gluster-s3 pod ... not found.
Creating initial resources ... Error from server (AlreadyExists): error when creating "/heketi/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view not labeled
OK
node/sapdh2wrk1 not labeled
node/sapdh2wrk2 not labeled
node/sapdh2wrk3 not labeled
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ... pods not found.
I've started gk-deploy more than once.
I have 3 nodes in my Kubernetes cluster, and it seems like the pods can't start up on any of them, but I don't understand why.
Pods are created but aren't ready:
kubectl get pods
NAME READY STATUS RESTARTS AGE
glusterfs-65mc7 0/1 Running 0 16m
glusterfs-gnxms 0/1 Running 0 16m
glusterfs-htkmh 0/1 Running 0 16m
heketi-754dfc7cdf-zwpwn 0/1 ContainerCreating 0 74m
Here is the event log of one GlusterFS Pod; it ends with warnings:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned default/glusterfs-65mc7 to sapdh2wrk1
Normal Pulled 19m kubelet, sapdh2wrk1 Container image "gluster/gluster-centos:latest" already present on machine
Normal Created 19m kubelet, sapdh2wrk1 Created container
Normal Started 19m kubelet, sapdh2wrk1 Started container
Warning Unhealthy 13m (x12 over 18m) kubelet, sapdh2wrk1 Liveness probe failed: /usr/local/bin/status-probe.sh failed check: systemctl -q is-active glusterd.service
Warning Unhealthy 3m58s (x35 over 18m) kubelet, sapdh2wrk1 Readiness probe failed: /usr/local/bin/status-probe.sh failed check: systemctl -q is-active glusterd.service
GlusterFS 5.8-100.1 is installed and started on every node, including the master.
What is the reason the Pods don't start up?
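The probe message itself names the failing check, so rerunning it inside one of the pods narrows things down; a hedged sketch (pod name from the listing above; the gluster/gluster-centos image runs systemd, so systemctl is assumed to be available in the container):
# Rerun the exact check the liveness/readiness probe performs
kubectl exec -it glusterfs-65mc7 -- systemctl is-active glusterd.service
# Inspect the daemon state and the container log for the underlying error
kubectl exec -it glusterfs-65mc7 -- systemctl status glusterd.service
kubectl logs glusterfs-65mc7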