What does this error from my CoreDNS pod's log mean, and how do I debug it?
[ERROR] plugin/errors: 2 2858211404501823821.6843583298703021155. HINFO: read udp 192.168.27.16:47449->67.207.67.3:53: i/o timeout
The behavior is odd.
A single test pod can execute a curl command successfully, but the network as a whole does not work.
Also, each node is able to communicate with every other node.
To my knowledge, I have not changed any relevant configuration since the network last functioned "as expected."
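Reading the error: the HINFO query for a random-looking name is most likely the self-check probe that CoreDNS's loop plugin sends, and the address after `->` is the upstream resolver CoreDNS forwards to (here 67.207.67.3:53). An `i/o timeout` therefore suggests that CoreDNS got no answer from that upstream resolver, rather than a problem inside the cluster. The sketch below is a hypothetical helper (`extract_upstream` is an illustrative name, not part of CoreDNS) that pulls the upstream address out of such lines, assuming the log format matches the line above.

```shell
# Hypothetical helper: print the upstream "ip:port" from CoreDNS error lines
# of the form "... read udp SRC->DST: i/o timeout" (format assumed from the log above).
# Lines that do not match are silently skipped.
extract_upstream() {
  sed -n 's/.*->\([0-9.]*:[0-9]*\): i\/o timeout.*/\1/p'
}
```

For example, `kubectl -n kube-system logs deployment/coredns | extract_upstream | sort -u` lists the distinct upstreams that timed out; you can then check from a node whether that resolver answers at all, for instance with `dig @67.207.67.3 kubernetes.io`.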
UPDATE:
I'm not sure whether this counts as a solution, but I deleted all pods (including CoreDNS) and let them restart. The system now works.
I'll leave this question up and mark it as solved in case anyone doesn't know this handy command (do not use it on a production cluster):
kubectl delete po -A --all
Another, probably safer, way is to restart only the CoreDNS Deployment:
kubectl -n kube-system rollout restart deployment coredns
Thanks to @Richard_Bateman.
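The restart can be followed by a wait and a quick name-resolution test. A minimal sketch, assuming the CoreDNS Deployment is named `coredns` in `kube-system` (as in the command above) and kubectl 1.15 or newer (`rollout restart` does not exist before that). `restart_coredns_and_check` and the `KUBECTL` variable are hypothetical: set `KUBECTL="echo kubectl"` to print the commands instead of running them. `busybox:1.28` is chosen because newer busybox builds reportedly have an unreliable `nslookup`.

```shell
# Restart CoreDNS, wait for the rollout, then resolve a cluster name from a throwaway pod.
# KUBECTL is a hypothetical override for dry runs (e.g. KUBECTL="echo kubectl").
restart_coredns_and_check() {
  k=${KUBECTL:-kubectl}
  $k -n kube-system rollout restart deployment coredns &&
    $k -n kube-system rollout status deployment coredns --timeout=120s &&
    $k run dns-check --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
}
```

If `nslookup kubernetes.default` succeeds but external names still time out, the upstream resolver rather than CoreDNS itself is the likely culprit.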
I have a Kubernetes cluster on IBM Cloud. The cluster currently has only one node;
I verified this by running kubectl get nodes.
A few servers are running on that node, and I want to restart one of them.
How can I get into the node and restart that server?
I tried SSH, but this link says it cannot be done directly.
It seems your main questions are:
"how to restart a pod", "how to SSH into the entity my service is running in", and "how to check whether I deleted a pod".
First, most of these questions have already been answered on Stack Overflow. Second, you need to get familiar with basic Kubernetes terminology and how things work here. You can do that with any Kubernetes introduction or the documentation.
Answering the questions:
1) You can find information about restarting here. Alternatively, if the pod belongs to a running Deployment, deleting the pod results in it being recreated.
2) You can use kubectl exec as described here:
kubectl exec -ti pod_name -- sh (or bash)
3) To see your pods, run kubectl get pods. After you run kubectl delete pod <name> -n <namespace>, you can run kubectl get pods -w to watch the deleted pod's status change and a new one being spawned. You will then see a new pod running, but with a different NAME.
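Item 2 can be wrapped in a small helper. A sketch, assuming the container image ships at least `sh` (many minimal images have no `bash`, and distroless images have no shell at all); `pod_shell` and the `KUBECTL` dry-run override are hypothetical names for illustration. Note the `--`, which current kubectl versions expect before the command to run inside the container.

```shell
# Open an interactive shell in a pod, preferring bash and falling back to sh.
# Usage: pod_shell <pod> [namespace]   (namespace defaults to "default")
pod_shell() {
  ${KUBECTL:-kubectl} exec -it "$1" -n "${2:-default}" -- \
    sh -c 'if command -v bash >/dev/null 2>&1; then exec bash; else exec sh; fi'
}
```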
I have deployed some pods on my Kubernetes cluster, but sometimes my image has bugs that prevent the pod from starting correctly.
For example:
nats-1 0/1 CrashLoopBackOff 121 10h
I also can't see any error in the kubectl logs output.
Is there any way to access this pod? Or are there any tools or techniques that let me enter the container?
Thanks a lot all! :)
You can use kubectl describe to get the pod's events; they sometimes show errors. Otherwise, you can make the deployment/pod run a command such as sleep 3600 to keep the container alive, so you can exec into it and investigate further.
Edited after clarification:
You could log in to the worker node (run kubectl get pod <pod-name> -o wide to find out which one) and read the node's syslog or the pods' logs. That should give you more detailed information about what happened.
But @ho-man's approach is perfectly valid and less cumbersome.
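The first steps of both answers can be bundled. A sketch under these assumptions: the pod has a single container (otherwise add `-c <container>` to `logs`), and `crash_diagnostics` plus the `KUBECTL` dry-run override are hypothetical names. The key flag for a CrashLoopBackOff pod is `--previous`, which prints the logs of the last terminated container instance; the current instance has often just restarted and logged nothing yet, which may be why `kubectl logs` shows no error.

```shell
# Collect the usual evidence for a crash-looping pod.
# Usage: crash_diagnostics <pod> [namespace]   (namespace defaults to "default")
crash_diagnostics() {
  pod=$1
  ns=${2:-default}
  k=${KUBECTL:-kubectl}
  $k -n "$ns" describe pod "$pod"                                     # events, exit code, last state
  $k -n "$ns" logs "$pod" --previous                                  # output of the crashed instance
  $k -n "$ns" get events --field-selector "involvedObject.name=$pod"  # events for this pod only
  $k -n "$ns" get pod "$pod" -o wide                                  # which node it is on
}
```

To keep the container alive for `exec`, as the first answer suggests, you could temporarily override its command, for example on a hypothetical Deployment `my-app`: `kubectl patch deployment my-app --type=json -p '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["sleep","3600"]}]'`.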
Here is a transcript:
LANELSON$ kubectl --kubeconfig foo get -a jobs
No resources found.
OK, fine; even with the -a option, no jobs exist. Cool! Oh, let's just be paranoid and check for one that we know was created. Who knows? Maybe we'll learn something:
LANELSON$ kubectl --kubeconfig foo get -a job emcc-poc-emcc-broker-mp-populator
NAME DESIRED SUCCESSFUL AGE
emcc-poc-emcc-broker-mp-populator 1 0 36m
Er, um, what?
In this second case, I just happen to know the name of a job that was created, so I ask for it directly. I would have thought that kubectl get -a jobs would have returned it in its output. Why doesn't it?
Of course what I'd really like to do is get the logs of one of the pods that the job created, but kubectl get -a pods doesn't show any of that job's terminated pods either, and of course I don't know the name of any of the pods that the job would have spawned.
What is going on here?
Kubernetes 1.7.4 if it matters.
The answer is that Istio automatic sidecar injection happened to be "on" in the environment (I had no idea, nor should I have). When it is on, you can opt out of it, but otherwise all workloads are affected by default (!). If you don't opt out, and Istio's presence prevents your Job from being created for any reason, then your Job is technically uninitialized. Uninitialized resources do not show up in kubectl get lists. To make them show up, you need to pass the --include-uninitialized option to kubectl get. Once I issued kubectl --kubeconfig foo get -a --include-uninitialized jobs, I could see the failed jobs.
My higher-level takeaway is that the initializer portion of Kubernetes, currently in alpha, is not at all ready for prime time yet.
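For the underlying goal (logs from the Job's pods without knowing their names), the Job controller labels the pods it creates with `job-name=<job name>`, so a label selector finds them. A sketch, assuming a kubectl recent enough to support `kubectl logs -l`; on 1.7 you would also need `-a` on `get pods` to see terminated pods. `--include-uninitialized` belonged to the alpha initializers feature, which later releases removed. If the Job never initialized, it may have no pods at all, and these commands then return nothing. `job_pods` and `KUBECTL` are hypothetical names.

```shell
# List a Job's pods and print their logs via the job-name label the Job controller adds.
# Usage: job_pods <job> [namespace]   (namespace defaults to "default")
job_pods() {
  job=$1
  ns=${2:-default}
  k=${KUBECTL:-kubectl}
  $k -n "$ns" get pods -l "job-name=$job" -o wide
  # With a selector, kubectl logs shows only 10 lines per pod unless --tail=-1 is given.
  $k -n "$ns" logs -l "job-name=$job" --tail=-1
}
```

For a Job with a single pod, `kubectl logs job/emcc-poc-emcc-broker-mp-populator` should also work.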
When a Kubernetes pod goes into the CrashLoopBackOff state and you fix the underlying issue, how do you force it to be rescheduled?
To apply a new configuration, a new pod has to be created (and the old one removed).
If your pod was created automatically by a Deployment or DaemonSet, this happens automatically every time you update the resource's YAML.
It does not happen if your DaemonSet (or StatefulSet) has spec.updateStrategy.type=OnDelete; Deployments do not offer that strategy.
If the problem was caused by an error inside the Docker image that you have since fixed, you should update the pods manually; you can use the rolling-update feature for this. If the new image has the same tag, you can just delete the broken pod (see below), provided the pod's imagePullPolicy is Always; otherwise the node may reuse its cached copy of the old image.
If a node fails, the pod is recreated on another node after some time, and the old pod is removed once the broken node has fully recovered. Note that this does not happen if the pod was created by a DaemonSet or StatefulSet.
Either way, you can manually delete a crashed pod:
kubectl delete pod <pod_name>
Or all pods in the CrashLoopBackOff state:
kubectl delete pod `kubectl get pods | awk '$3 == "CrashLoopBackOff" {print $1}'`
If a node is completely dead, you can add the --grace-period=0 --force options to remove only the pod's record from Kubernetes.
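The awk one-liner above only looks at the current namespace and at exact `CrashLoopBackOff` matches. A sketch that works across namespaces and also catches `Init:CrashLoopBackOff`, assuming kubectl 1.14+ (for `-A`) and the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...); both function names are hypothetical.

```shell
# Print "namespace pod" for every line whose STATUS column (4th with -A) mentions CrashLoopBackOff.
crashloop_pods() {
  awk '$4 ~ /CrashLoopBackOff/ { print $1, $2 }'
}

# Delete every crash-looping pod in every namespace (its controller, if any, recreates it).
delete_crashloop_pods() {
  k=${KUBECTL:-kubectl}
  $k get pods -A --no-headers | crashloop_pods |
    while read -r ns pod; do
      $k -n "$ns" delete pod "$pod"
    done
}
```

Keeping the filter in its own function makes it easy to run `kubectl get pods -A --no-headers | crashloop_pods` first and check what would be deleted.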
Generally a fix requires you to change something about the configuration of the pod (the docker image, an environment variable, a command line flag, etc), in which case you should remove the old pod and start a new pod. If your pod is running under a replication controller (which it should be), then you can do a rolling update to the new version.
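The replication-controller `kubectl rolling-update` command has since been removed from kubectl; with a Deployment, the equivalent rolling update is setting the new image and waiting for the rollout. A sketch in which `roll_out_image`, the Deployment, container and image names are hypothetical. If the fixed image reuses the same tag, `kubectl rollout restart deployment/<name>` (kubectl 1.15+) recreates the pods instead, provided `imagePullPolicy: Always` so the nodes pull the new build.

```shell
# Roll a Deployment to a fixed image and wait until the new pods are ready.
# Usage: roll_out_image <deployment> <container> <image>
roll_out_image() {
  deploy=$1
  container=$2
  image=$3
  k=${KUBECTL:-kubectl}
  $k set image "deployment/$deploy" "$container=$image" &&
    $k rollout status "deployment/$deploy" --timeout=300s
}
```

If the rollout stalls, `kubectl rollout undo deployment/<name>` returns to the previous revision.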
Five years later, unfortunately, this still seems to be the case.
@kvaps's answer above suggested an alternative (rolling updates) that essentially updates (overwrites) a pod instead of deleting it; see the current working link for rolling updates.
My alternative to deleting a pod was NOT to create a bare pod at all, but to create a Deployment instead and then delete the Deployment that contains the pod to be removed.
$ kubectl get deployments -A
$ kubectl delete -n <NAMESPACE> deployment <DEPLOYMENT>
# When on minikube or using docker for development + testing
$ docker system prune -a
The first command lists all deployments together with their namespaces. This helped me avoid deleting the wrong deployment when two deployments in different namespaces share the same name (a name collision).
The second command deletes a deployment in one specific namespace.
The last command helps in development mode: it removes all unused images (along with stopped containers, unused networks, and the build cache). This is not required, but it helps clean up and save disk space.
Another great tip is to try to understand why a Pod is failing. The problem may lie somewhere else entirely, and Kubernetes records a good deal of information about it. For that, one of the following may help:
$ kubectl logs -f <POD NAME>
$ kubectl get events
Another reference on Stack Overflow:
https://stackoverflow.com/a/55647634/132610
For anyone interested I wrote a simple helm chart and python script which watches the current namespace and deletes any pod that enters CrashLoopBackOff.
The chart is at https://github.com/timothyclarke/helm-charts/tree/master/charts/dr-abc.
This is a sticking plaster; fixing the problem is always the best option. In my specific case, getting the historic apps into K8s, so that the development teams have a common place to work and can strangle the old applications with new ones, is preferable to fixing all the bugs in the old apps. Having this in the namespace keeps up the illusion that everything is running, which buys that time.
This command deletes all pods in any CrashLoopBackOff-like state (CrashLoopBackOff, Init:CrashLoopBackOff, etc.). You can use grep -i <keyword> to match other states and then delete the pods that match. In your case it would be:
kubectl get pod -n <namespace> --no-headers | grep -i crash | awk '{print $1}' | while read -r line; do kubectl delete pod -n <namespace> "$line"; done
I have a new Kubernetes setup, and I created a replication controller with 2 replicas. However, when I run kubectl get pods, one pod is Running and the other is Pending. Yet when I go to my 7 test nodes and run docker ps, I see that all of them are running.
I think what is happening is that I had to change the default insecure port from 8080 to 7080 (the Docker app itself runs on 8080), but I don't know how to tell whether I'm right, or where else to look.
Along the same lines, is there any way to set up a kubectl config where I can specify the port? Typing kubectl --server="" every time is a bit annoying (yes, I know I can alias this).
If you changed the API port, did you also update the nodes to point them at the new port?
For the kubectl --server=... question, you can use kubectl config set-cluster to set cluster info in your ~/.kube/config file to avoid having to use --server all the time. See the following docs for details:
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_config.html
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_config_set-cluster.html
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_config_set-context.html
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_config_use-context.html
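Put together, the commands from those docs let you save the non-default port once. A sketch, assuming the API server is reachable over plain HTTP on the insecure port 7080 from the question; the host name `master`, the cluster/context name `local`, the function `save_cluster` and the `KUBECTL` dry-run override are hypothetical placeholders.

```shell
# Store the API server address in ~/.kube/config and make it the active context,
# so plain "kubectl get pods" no longer needs --server.
# Usage: save_cluster <name> <server-url>
save_cluster() {
  name=$1
  server=$2
  k=${KUBECTL:-kubectl}
  $k config set-cluster "$name" --server="$server" &&
    $k config set-context "$name" --cluster="$name" &&
    $k config use-context "$name"
}
```

For example, `save_cluster local http://master:7080` once, after which `kubectl get pods` talks to port 7080 by default.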