I am testing the kubectl logs --previous command, and for that I need a pod to restart.
I can get my pods using a command like
kubectl get pods -n $ns -l $label
This shows that my pods have not restarted so far. I want to test the command:
kubectl logs $podname -n $ns --previous=true
That command fails because my pod has not restarted, which makes the --previous=true switch meaningless.
I am aware of this command to restart pods when configuration changed:
kubectl rollout restart deployment myapp -n $ns
This does not restart the containers in a way that is meaningful for my log command test but rather terminates the old pods and creates new pods (which have a restart count of 0).
I tried various versions of exec to see if I can shut them down from within but most commands I would use are not found in that container:
kubectl exec $podname -n $ns -- shutdown
kubectl exec $podname -n $ns -- shutdown now
kubectl exec $podname -n $ns -- halt
kubectl exec $podname -n $ns -- poweroff
How can I use a kubectl command to forcefully restart the pod so that it retains its identity and its restart counter increases by one, giving my test log command a previous instance to return logs from?
EDIT:
Connecting to the pod is well described.
kubectl -n $ns exec --stdin --tty $podname -- /bin/bash
The process list shows only a handful running processes:
ls -1 /proc | grep -Eo "^[0-9]{1,5}$"
proc 1 seems to be the one running the pod.
kill 1 does nothing, not even kill the proc with pid 1
I am still looking into this at the moment.
There are different ways to achieve your goal. I'll describe the most useful options below.
Crictl
The most correct and efficient way is to restart the pod at the container runtime level.
I tested this on Google Cloud Platform - GKE and minikube with docker driver.
You need to ssh into the worker node where the pod is running. Then find its POD ID:
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
9863a993e0396 87a94228f133e 3 minutes ago Running nginx-3 2 6d17dad8111bc
OR
$ crictl pods -s ready
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
6d17dad8111bc About an hour ago Ready nginx-3 default 2 (default)
Then stop it:
$ crictl stopp 6d17dad8111bc
Stopped sandbox 6d17dad8111bc
After some time, the kubelet will start this pod again (with a different POD ID in CRI; the Kubernetes cluster, however, treats it as the same pod):
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
f5f0442841899 87a94228f133e 41 minutes ago Running nginx-3 3 b628e1499da41
This is how it looks in cluster:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-3 1/1 Running 3 48m
Getting logs with --previous=true flag also confirmed it's the same POD for kubernetes.
Kill process 1
This works with most images, but not always.
For example, I tested it on a simple pod with the nginx image:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 27h
$ kubectl exec -it nginx -- /bin/bash
root@nginx:/# kill 1
root@nginx:/# command terminated with exit code 137
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 1 27h
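As a rough sketch, the kill approach above can be wrapped so that it runs without an interactive shell. The function names restart_in_place and previous_logs are made up for illustration. This assumes the image ships /bin/sh and kill, and that PID 1 actually exits on SIGTERM. A process running as PID 1 ignores signals it has installed no handler for, which may be why kill 1 did nothing in the question.

```shell
# Hypothetical helpers; both take the namespace first, then the pod name.

# Send SIGTERM to PID 1 inside the container. If PID 1 exits, the kubelet
# restarts the container in place and the RESTARTS counter goes up by one.
restart_in_place() {
  ns=$1
  pod=$2
  # The exec session may end with a non-zero code when PID 1 dies,
  # so the exit status is deliberately ignored here.
  kubectl exec "$pod" -n "$ns" -- /bin/sh -c 'kill 1' || true
}

# Print the logs of the previous (terminated) container instance.
previous_logs() {
  ns=$1
  pod=$2
  kubectl logs "$pod" -n "$ns" --previous=true
}

# Usage against a live cluster (not run here):
#   restart_in_place "$ns" "$podname"
#   kubectl get pods -n "$ns"        # wait until RESTARTS has increased
#   previous_logs "$ns" "$podname"
```

If the image has no shell at all, this will not work and the crictl route is the one to try.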
Useful link:
Debugging Kubernetes nodes with crictl
For some troubleshooting, I want to connect to my coredns pod. Is this possible?
$ microk8s kubectl get pod --namespace kube-system
NAME READY STATUS RESTARTS AGE
hostpath-provisioner-5c65fbdb4f-w6fmn 1/1 Running 1 7d22h
coredns-7f9c69c78c-mcdl5 1/1 Running 1 7d23h
calico-kube-controllers-f7868dd95-hbmjt 1/1 Running 1 7d23h
calico-node-rtprh 1/1 Running 1 7d23h
When I try, I get the following error msg:
$ microk8s kubectl --namespace kube-system exec --stdin --tty coredns-7f9c69c78c-mcdl5 -- /bin/bash
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "f1d08ed8494894d1281cd5c43dee36119225ab1ba414def333659538e5edc561": OCI runtime exec failed: exec failed: container_linux.go:370: starting container process caused: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
As user AndD rightly mentioned in a comment:
Coredns Pod have no shell, I think. Check this to kind-of exec with a sidecar: How to get into CoreDNS pod kuberrnetes?
Yes. This image has no shell. You can read more about this situation in this thread:
The image does not contain a shell. Logs can be viewed with kubectl.
You have asked:
I want to connect to my coredns pod, is this possible?
Theoretically yes, but you need a workaround with docker. It is described in this answer:
In short, do this to find a node where a coredns pod is running:
kubectl -n kube-system get po -o wide | grep coredns
ssh to one of those nodes, then:
docker ps -a | grep coredns
Copy the Container ID to clipboard and run:
ID=<paste ID here>
docker run -it --net=container:$ID --pid=container:$ID --volumes-from=$ID alpine sh
You will now be inside the "sidecar" container and can poke around. For example:
cat /etc/coredns/Corefile
Additionally, you can check the logs, with kubectl. See also official documentation about DNS debugging.
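On clusters where ephemeral containers are available, kubectl debug can attach a throwaway container to the running pod without ssh access to the node. This is only a sketch: the helper name debug_pod is made up, busybox is an arbitrary image choice, and the target container name (coredns in the usage line) is an assumption about how the pod is defined.

```shell
# Hypothetical helper: attach an ephemeral busybox container to a pod.
# Arguments: namespace, pod name, name of the container to target.
debug_pod() {
  ns=$1
  pod=$2
  container=$3
  # --target asks the runtime to share the process namespace of the named
  # container, so its processes are visible from the debug shell.
  kubectl debug --stdin --tty --namespace "$ns" "$pod" \
    --image=busybox --target="$container" -- sh
}

# Usage against a live cluster (not run here):
#   debug_pod kube-system coredns-7f9c69c78c-mcdl5 coredns
```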
When I run:
kubectl get pods --field-selector=status.phase=Running
I see:
NAME READY STATUS RESTARTS AGE
k8s-fbd7b 2/2 Running 0 5m5s
testm-45gfg 1/2 Error 0 22h
I don't understand why this command returns pods that are in the Error status.
According to the Kubernetes API, there is no such thing as STATUS=Error.
How can I get only the pods that are in this Error status?
When I run:
kubectl get pods --field-selector=status.phase=Failed
It tells me that there are no pods in that status.
Using the kubectl get pods --field-selector=status.phase=Failed command you can display all Pods in the Failed phase.
Failed means that all containers in the Pod have terminated, and at least one container has terminated in failure (see: Pod phase):
Failed - All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system.
In your example, both Pods are in the Running phase because at least one container is still running in each of them:
Running - The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.
You can check the current phase of Pods using the following command:
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
Let's check how this command works:
$ kubectl get pods
NAME READY STATUS
app-1 1/2 Error
app-2 0/1 Error
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
app-1 Running
app-2 Failed
As you can see, only the app-2 Pod is in the Failed phase. There is still one container running in the app-1 Pod, so this Pod is in the Running phase.
To list all pods with the Error status, you can simply use:
$ kubectl get pods -A | grep Error
default app-1 1/2 Error
default app-2 0/1 Error
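A plain grep also matches any pod whose name or namespace happens to contain "Error". As a small sketch, the filter below compares only the STATUS column, which it locates from the header line so that it works with and without -A. The function name pods_with_status is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# only the rows whose STATUS column equals the first argument.
pods_with_status() {
  awk -v want="$1" '
    # Header line: remember which column is STATUS, then skip the line.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i; next }
    # Data lines: print the row when the STATUS column matches exactly.
    col && $col == want
  '
}

# Usage against a live cluster (not run here):
#   kubectl get pods -A | pods_with_status Error
```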
Additionally, it's worth mentioning that you can check the state of all containers in Pods:
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state}{"\n"}{end}'
app-1 {"terminated":{"containerID":"containerd://f208e2a1ff08c5ce2acf3a33da05603c1947107e398d2f5fbf6f35d8b273ac71","exitCode":2,"finishedAt":"2021-08-11T14:07:21Z","reason":"Error","startedAt":"2021-08-11T14:07:21Z"}} {"running":{"startedAt":"2021-08-11T14:07:21Z"}}
app-2 {"terminated":{"containerID":"containerd://7a66cbbf73985efaaf348ec2f7a14d8e5bf22f891bd655c4b64692005eb0439b","exitCode":2,"finishedAt":"2021-08-11T14:08:50Z","reason":"Error","startedAt":"2021-08-11T14:08:50Z"}}
You can simply grep for the Error pods using:
kubectl get pods --all-namespaces | grep Error
Remove all error pods from the cluster
kubectl delete pod `kubectl get pods --namespace <yournamespace> | awk '$3 == "Error" {print $1}'` --namespace <yournamespace>
Most Pod failures return explicit error states that can be observed in the status field.
Error:
Your pod has crashed: it was scheduled on a node successfully but crashed after that. To debug it further, you can use commands such as
kubectl describe pod <pod-name> -n <namespace>
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/#my-pod-is-crashing-or-otherwise-unhealthy
Here is an overkill go-template based attempt:
kubectl get pods -o go-template='{{range $index, $element := .items}}{{range .status.containerStatuses}}{{range .state }}{{if .reason }}{{if (eq .reason "Error") }}{{$element.metadata.name}} {{$element.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}{{end}}'
job1-stn45 default
My pod status:
k get pod
NAME READY STATUS RESTARTS AGE
foo 1/1 Running 1 2d11h
nginx-0 1/1 Running 3 5d10h
nginx-2 1/1 Running 3 5d10h
nginx-1 1/1 Running 3 5d10h
job1-stn45 0/1 Error 0 113m
update-test-27145740-82z7s 0/1 ImagePullBackOff 0 96m
update-test-27145500-7f2l9 0/1 ImagePullBackOff 0 5h36m
I successfully deployed my Kubernetes app using kubectl apply -f deployment.yaml.
When I try to hit the URL endpoint, I'm getting an nginx 404 Not Found error page.
My next step is to open a bash shell on the docker instance that is running my app. How do I do this in Kubernetes?
How do I ssh into the docker container running my app, or docker exec bash to an app I've deployed to Kubernetes?
If I were running in docker I would run docker ps to find the container ID and then run docker exec -t ### bash to open a shell on my container to look around to troubleshoot why something isn't working.
What is the equivalent way to do this in Kubernetes?
Searching for a solution
I searched and found this URL, which says how to get a shell on your app.
The summary of that URL is:
kubectl apply -f https://k8s.io/examples/application/shell-demo.yaml
kubectl get pod shell-demo
kubectl exec --stdin --tty shell-demo -- /bin/bash
But when I tried the equivalent commands, I got an error (see below):
kubectl get pods --namespace my-app-namespace
NAME READY STATUS RESTARTS AGE
dpl-my-app-787bc5b7d-4ftkb 1/1 Running 0 2h
Then I tried:
kubectl exec --stdin --tty my-app-namespace -- /bin/bash
Error from server (NotFound): pods "my-app-namespace" not found
exit status 1
I figured this happened because I was trying to exec into the namespace not the pod, so I also tried with the dpl-my-app-... (see below) but got the same error.
kubectl exec --stdin --tty dpl-my-app-787bc5b7d-4ftkb -- /bin/bash
Error from server (NotFound): pods "dpl-my-app-787bc5b7d-4ftkb" not found
exit status 1
What is the command I need to get the pod instance so that kubectl exec will work?
As correctly stated by @David Maze:
Your kubectl get pods command has a --namespace option; you need to repeat this in the kubectl exec command. – David Maze 12 hours ago
If you've created your Deployment dpl-my-app in the namespace my-app-namespace, you should also specify the --namespace/-n parameter in all of your commands.
A side note!
There is a tool to change namespaces, called: kubens
With the following command:
kubectl exec --stdin --tty my-app-namespace -- /bin/bash
You've correctly identified the issue: you are trying to exec into a namespace, not into a Pod.
With the following command:
kubectl exec --stdin --tty dpl-my-app-787bc5b7d-4ftkb -- /bin/bash
You've tried to exec into a Pod named dpl-my-app-787bc5b7d-4ftkb, but in the default namespace rather than the namespace where your Pod resides.
To exec into your Pod in a specific namespace, use the following command:
kubectl exec --stdin --tty --namespace my-app-namespace dpl-my-app-787bc5b7d-4ftkb -- /bin/bash
Please notice the --namespace is before -- where the commands to the Pod should be placed (like -- /bin/bash).
Additional resources:
Kubernetes.io: Docs: Concepts: Overview: Working with objects: Namespaces
Kubernetes.io: Docs: Tasks: Debug application cluster: Get shell running container
We have deployed a few pods in a cluster across various namespaces. I would like to inspect and identify all pods that are not in a Ready state.
master $ k get pod/nginx1401 -n dev1401
NAME READY STATUS RESTARTS AGE
nginx1401 0/1 Running 0 10m
In the output above, the Pod shows a Running status but is not Ready. How can we list such pods? The commands below do not give me the desired output:
kubectl get po -A | grep Pending    # looking for pods that have yet to be scheduled
kubectl get po -A | grep -v Running    # looking for pods in a state other than Running
kubectl get pods --field-selector=status.phase=Failed
There is a long-standing feature request for this. The latest entry suggests
kubectl get po --all-namespaces | gawk 'match($3, /([0-9]+)\/([0-9]+)/, a) {if (a[1]+0 < a[2]+0 && $4 != "Completed") print $0}'
for finding pods that are running but not complete.
There are a lot of other suggestions in the thread that might work as well.
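The gawk one-liner above relies on gawk's three-argument match(), which other awk implementations do not provide. As a sketch of the same idea in portable awk, the filter below splits the READY column on "/" and compares the two counts numerically. The function name not_ready_pods is hypothetical, and the column positions are taken from the header line rather than hard-coded.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# rows where fewer containers are ready than expected, skipping pods whose
# STATUS is Completed.
not_ready_pods() {
  awk '
    # Header line: find the READY and STATUS columns, then skip the line.
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "READY") ready = i
        if ($i == "STATUS") status = i
      }
      next
    }
    ready && status {
      # READY looks like "1/2": n[1] containers ready out of n[2].
      split($ready, n, "/")
      # Adding 0 forces a numeric comparison, so 10/12 is handled correctly.
      if (n[1] + 0 < n[2] + 0 && $status != "Completed") print
    }
  '
}

# Usage against a live cluster (not run here):
#   kubectl get po --all-namespaces | not_ready_pods
```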
You can try this:
$ kubectl get po --all-namespaces -w
You will get an update whenever a pod in any namespace changes (create/update/delete).
Or you can watch all pods by using:
$ watch -n 1 kubectl get po --all-namespaces
This will continuously list all pods in every namespace at a 1-second interval.
I am using GKE. I've launched the following traefik deployment through kubectl:
https://github.com/containous/traefik/blob/master/examples/k8s/traefik-deployment.yaml
The pod runs on the kube-system namespace.
I'm not able to ssh into the pod.
kubectl get po -n kube-system
traefik-ingress-controller-5bf599f65d-fl9gx 1/1 Running 0 30m
kubectl exec -it traefik-ingress-controller-5bf599f65d-fl9gx -n kube-system -- '\bin\bash'
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"\\\\bin\\\\bash\": executable file not found in $PATH"
command terminated with exit code 126
Am I missing something? The same thing for '-- sh' too.
Use forward slashes (/) rather than backslashes (your example uses backslashes), as in:
kubectl exec -it traefik-ingress-controller-5bf599f65d-fl9gx -n kube-system -- '/bin/bash'
If this still does not work, try a different shell, such as:
kubectl exec -it traefik-ingress-controller-5bf599f65d-fl9gx -n kube-system -- '/bin/sh'
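The try-bash-then-sh sequence can be sketched as a small helper that probes each shell with a no-op before opening the interactive session. The name pod_shell is made up for illustration. If the image contains no shell at all, the helper simply reports that and returns a non-zero status.

```shell
# Hypothetical helper: open an interactive shell in a pod, preferring
# /bin/bash and falling back to /bin/sh. Arguments: namespace, pod name.
pod_shell() {
  ns=$1
  pod=$2
  for shell in /bin/bash /bin/sh; do
    # Probe first: run a no-op with the candidate shell and discard output.
    if kubectl exec "$pod" -n "$ns" -- "$shell" -c 'exit 0' >/dev/null 2>&1; then
      kubectl exec -it "$pod" -n "$ns" -- "$shell"
      return
    fi
  done
  echo "no usable shell found in pod $pod" >&2
  return 1
}

# Usage against a live cluster (not run here):
#   pod_shell kube-system traefik-ingress-controller-5bf599f65d-fl9gx
```

Probing first avoids falling through to /bin/sh just because an interactive bash session ended with a non-zero exit code.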
So, apparently the default traefik image is an amd64 version. I had to use the alpine version to ssh into it using:
kubectl exec -it _podname_ -- sh
It seems that this here is the right answer. You cannot exec a shell into the traefik container using the default image, you must use the alpine one.