Init container not restarting on pod restart - kubernetes

I have an init container that does some setup the main container needs in order to run correctly, such as creating some directories, and a liveness probe that may fail if one of these directories is deleted. When the pod is restarted because the liveness probe fails, I expect the init container to be restarted as well, but it is not.
This is what kubernetes documentation says about this:
If the Pod restarts, or is restarted, all init containers must execute again.
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
The easiest way to demonstrate this behavior was to take the example pod from the Kubernetes documentation and add a liveness probe that always fails, expecting the init container to be restarted. Again, it does not behave as expected.
This is the example I'm working with:
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  restartPolicy: Always
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo "App started at $(date)" && tail -f /dev/null']
    livenessProbe:
      exec:
        command:
        - sh
        - -c
        - exit 1
      initialDelaySeconds: 1
      periodSeconds: 1
  initContainers:
  - name: myapp-init
    image: busybox:1.28
    command: ['/bin/sh', '-c', 'sleep 5 && echo "Init container started at $(date)"']
The sleep and date commands are there to confirm whether the init container was restarted.
The pod is being restarted:
NAME READY STATUS RESTARTS AGE
pod/myapp-pod 1/1 Running 4 2m57s
But the logs make it clear that the init container is not:
$ k logs pod/myapp-pod myapp-init
Init container started at Thu Jun 16 12:12:03 UTC 2022
$ k logs pod/myapp-pod myapp-container
App started at Thu Jun 16 12:14:20 UTC 2022
I checked it on both v1.19.5 and v1.24.0 kubernetes servers.
The question is how to force the init container to restart on pod restart.

The restart number refers to container restarts, not pod restarts.
An init container needs to run only once in a pod's lifetime, and you need to design your containers with that in mind. You can read this PR, and especially this comment.
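Since the init container is not re-run when only the app container restarts, one way to follow this advice is to move idempotent setup into the app container's own startup. A minimal sketch, assuming the setup is just directory creation; the function name ensure_dirs, the paths and the app command are hypothetical, not taken from the question:

```shell
#!/bin/sh
# ensure_dirs: create every directory passed as an argument if it is missing.
# mkdir -p is idempotent, so this is safe to run on every container start.
ensure_dirs() {
  for dir in "$@"; do
    mkdir -p "$dir" || return 1
  done
}

# Hypothetical entrypoint usage (paths and app command are placeholders):
#   ensure_dirs /data/cache /data/tmp && exec my-app
```

Because the container's command is executed again on every container restart, the directories would be re-created even though the init container does not run again.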

Related

LivenessProbe command for a background process

What is an appropriate Kubernetes livenessProbe command for a background process?
We have a NodeJS process that consumes messages off an SQS queue. Since it's a background job we don't expose any HTTP endpoints, so a liveness command seems to be the more appropriate way to do the liveness check. What would a "good enough" command setup look like that actually checks the process is alive and running properly? Should the NodeJS process touch a file to update its modification time and have the liveness check validate that? Examples I've seen online seem disconnected from the actual process, e.g. they only check that a file exists.
You could use a liveness probe with an exec command.
Here is an example:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
To perform a probe, the kubelet executes the command cat /tmp/healthy in the target container. If the command succeeds, it returns 0, and the kubelet considers the container to be alive and healthy. If the command returns a non-zero value, the kubelet kills the container and restarts it.
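The example above only checks that a file exists. For the file-touching idea raised in the question, the probe command could instead check how recently the file was modified. This is a sketch under assumptions: the worker touches a heartbeat file after each successful poll, and the image ships a stat that understands -c %Y (GNU coreutils and BusyBox do; BSD stat is tried as a fallback). The function name, the file path and the 60-second limit are hypothetical.

```shell
#!/bin/sh
# heartbeat_fresh FILE MAX_AGE_SECONDS
# Succeeds (exit 0) only if FILE exists and was modified within MAX_AGE_SECONDS.
heartbeat_fresh() {
  file=$1
  max_age=$2
  [ -f "$file" ] || return 1
  now=$(date +%s)
  # GNU/BusyBox stat first, BSD stat as a fallback.
  mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file") || return 1
  [ $((now - mtime)) -le "$max_age" ]
}

# Hypothetical probe command inside the container:
#   heartbeat_fresh /tmp/heartbeat 60
```

With this approach a hung consumer stops refreshing the file, so the probe starts failing even though the process still exists.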

How can a failed Kubernetes Ceph node be deleted automatically?

On an environment with more than one node and using Ceph block volumes in RWO mode, if a node fails (is unreachable and will not come back soon) and the pod is rescheduled to another node, the pod can't start if it has a Ceph block PVC. The reason is that the volume is 'still being used' by the other pod (because as the node failed, its resources can't be removed properly).
If I remove the node from the cluster using kubectl delete node dead-node, the pod can start because the resources get removed.
How can I do this automatically? Some possibilities I have thought about are:
Can I set a force detach timeout for the volume?
Set a delete node timeout?
Automatically delete a node with given taints?
I can use the ReadWriteMany mode with other volume types to be able to let the PV be used by more than one pod, but it is not ideal.
You can probably add a sidecar container and tweak the readiness and liveness probes in your pod so that the pod doesn't restart if the Ceph block volume is unreachable for some time by the container that is using it. (There may be other implications for your application, though.)
Something like this:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: ceph
  name: ceph-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
  - name: cephclient
    image: ceph
    volumeMounts:
    - name: ceph
      mountPath: /cephmountpoint
    livenessProbe:
      ... 👈 something
      initialDelaySeconds: 5
      periodSeconds: 3600 👈 make this real long
✌️☮️

Pods stuck in PodInitializing state indefinitely

I've got a k8s cronjob that consists of an init container and a one pod container. If the init container fails, the Pod in the main container never gets started, and stays in "PodInitializing" indefinitely.
My intent is for the job to fail if the init container fails.
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: job-name
  namespace: default
  labels:
    run: job-name
spec:
  schedule: "15 23 * * *"
  startingDeadlineSeconds: 60
  concurrencyPolicy: "Forbid"
  successfulJobsHistoryLimit: 30
  failedJobsHistoryLimit: 10
  jobTemplate:
    spec:
      # only try twice
      backoffLimit: 2
      activeDeadlineSeconds: 60
      template:
        spec:
          initContainers:
          - name: init-name
            image: init-image:1.0
          restartPolicy: Never
          containers:
          - name: some-name
            image: someimage:1.0
          restartPolicy: Never
Running kubectl describe on the pod that's stuck results in:
Name: job-name-1542237120-rgvzl
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: my-node-98afffbf-0psc/10.0.0.0
Start Time: Wed, 14 Nov 2018 23:12:16 +0000
Labels: controller-uid=ID
job-name=job-name-1542237120
Annotations: kubernetes.io/limit-ranger:
LimitRanger plugin set: cpu request for container elasticsearch-metrics; cpu request for init container elasticsearch-repo-setup; cpu requ...
Status: Failed
IP: 10.0.0.0
Controlled By: Job/job-1542237120
Init Containers:
init-container-name:
Container ID: docker://ID
Image: init-image:1.0
Image ID: init-imageID
Port: <none>
Host Port: <none>
State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 14 Nov 2018 23:12:21 +0000
Finished: Wed, 14 Nov 2018 23:12:32 +0000
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wwl5n (ro)
Containers:
some-name:
Container ID:
Image: someimage:1.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wwl5n (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
To try and figure this out I would run the command:
kubectl get pods - Add the namespace param if required.
Then copy the pod name and run:
kubectl describe pod {POD_NAME}
That should give you some information as to why it's stuck in the initializing state.
A Pod can be stuck in Init status due to many reasons.
A PodInitializing or Init status means that the Pod contains an init container that has not finished yet (init containers are specialized containers that run before the app containers in a Pod; they can contain utilities or setup scripts). If the pod's status is Init:0/1, one init container has not finished; Init:N/M means the Pod has M init containers, and N have completed so far.
Gathering information
In these scenarios it is best to gather information first, as the root cause can be different for every PodInitializing issue.
kubectl describe pods pod-XXX: with this command you can get a lot of information about the pod, and you can check whether there is any meaningful event as well. Note the init container name.
kubectl logs pod-XXX: this command prints the logs for a container in a pod or specified resource.
kubectl logs pod-XXX -c init-container-xxx: this is the most accurate, as it prints the logs of the init container itself. You can get the init container name by describing the pod, and then replace "init-container-xxx" with it, for example "copy-default-config".
The output of kubectl logs pod-XXX -c init-container-xxx can reveal meaningful information about the issue.
In the referenced example, the root cause is that the init container can't download the plugins from Jenkins (timeout); from there we can check the connection configuration, proxy and DNS, or just modify the YAML to deploy the container without the plugins.
Additional:
kubectl describe node node-XXX describing the pod will give you the name of its node, which you can also inspect with this command.
kubectl get events to list the cluster events.
journalctl -xeu kubelet | tail -n 10 kubelet logs on systemd (journalctl -xeu docker | tail -n 1 for docker).
Solutions
The solution depends on the information gathered, once the root cause is found.
When you find a log with an insight into the root cause, you can investigate that specific root cause.
Some examples:
1 > There, this happened when the init container was deleted; it can be fixed by deleting the pod so that it is recreated, or by redeploying it. Same scenario in 1.1.
2 > If you found "bad address 'kube-dns.kube-system'", the PVC may not have been recycled correctly; the solution provided in 2 is running /opt/kubernetes/bin/kube-restart.sh.
3 > There, a sh file was not found; the solution would be to modify the YAML file or remove the container if it is unnecessary.
4 > A FailedSync was found, and it was solved by restarting Docker on the node.
In general you can modify the YAML (for example to avoid using an outdated URL), try to recreate the affected resource, or just remove the init container that causes the issue from your deployment. However, the specific solution will depend on the specific root cause.
I think you may have missed that this is the expected behavior of init containers.
The rule is that if an init container fails, the Pod is not restarted when restartPolicy is set to Never; otherwise Kubernetes keeps restarting the init container until it succeeds.
Also:
If the init container fails, the Pod in the main container never gets
started, and stays in "PodInitializing" indefinitely.
According to documentation:
A Pod cannot be Ready until all Init Containers have succeeded. The
ports on an Init Container are not aggregated under a service. A Pod
that is initializing is in the Pending state but should have a
condition Initializing set to true.
I can see that you tried to change this behavior, but I am not sure whether you can do that with a CronJob; I have only seen examples with Jobs. I am just theorizing, though, and if this post did not help you solve your issue I can try to recreate it in a lab environment.
You have already figured out that init containers are meant to run to completion, successfully. If you can't get rid of the init container, what I would do in this case is make sure that the init container always ends successfully. The result of the init container can be written to an emptyDir volume, as something like a status file, shared by both your init container and your work container.
I would delegate to the work container the responsibility of deciding what to do when the init container's work was unsuccessful.
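A sketch of that status-file idea, assuming both containers mount the same emptyDir volume; the function names, the status file path and the setup script are hypothetical:

```shell
#!/bin/sh
# run_init STATUS_FILE COMMAND [ARGS...]
# Runs the real init command, records the outcome in STATUS_FILE, and always
# returns 0 so the Pod is never left waiting on a failed init container.
run_init() {
  status_file=$1
  shift
  if "$@"; then
    echo ok > "$status_file"
  else
    echo failed > "$status_file"
  fi
  return 0
}

# init_succeeded STATUS_FILE
# Used by the work container to decide whether to proceed.
init_succeeded() {
  [ "$(cat "$1" 2>/dev/null)" = "ok" ]
}

# Hypothetical usage:
#   init container:  run_init /shared/init-status ./setup.sh
#   work container:  init_succeeded /shared/init-status || exit 1
```

If the work container exits with a non-zero code when the status is not ok, the pod ends in a failed state rather than waiting in PodInitializing, which appears to match the stated intent.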

Not able to see Pod when I create a Job

When I create a resource of kind Job, it does not pull any image.
Below is the .yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: copyartifacts
spec:
  backoffLimit: 1
  template:
    metadata:
      name: copyartifacts
    spec:
      restartPolicy: "Never"
      volumes:
      - name: sharedvolume
        persistentVolumeClaim:
          claimName: shared-pvc
      - name: dockersocket
        hostPath:
          path: /var/run/docker.sock
      containers:
      - name: copyartifacts
        image: alpine:3.7
        imagePullPolicy: Always
        command: ["sh", "-c", "ls -l /shared; rm -rf /shared/*; ls -l /shared; while [ ! -d /shared/artifacts ]; do echo Waiting for artifacts to be copied; sleep 2; done; sleep 10; ls -l /shared/artifacts; "]
        volumeMounts:
        - mountPath: /shared
          name: sharedvolume
Can you please guide here?
Regards,
Vikas
There are two possible reasons for not seeing the pod:
The pod hasn't been created yet.
The pod has completed its task and terminated before you noticed.
1. Pod hasn't been created:
If the pod hasn't been created yet, you have to find out why the Job failed to create it. You can view the Job's events to see whether there is any failure event. Use the following command to describe a Job:
kubectl describe job <job-name> -n <namespace>
Then, check the Events: field. There might be some events showing pod creation failure with respective reason.
2. Pod has completed and terminated:
Jobs are used to perform one-time tasks rather than to serve an application that needs to maintain a desired state. When the task is complete, the pod goes to the Completed state and terminates (but is not deleted). If your Job is intended for a task that does not take much time, the pod may terminate after completing the task before you notice.
As the pod is terminated, kubectl get pods may not show it. However, you will be able to see the pod using the kubectl get pods -a command, as it hasn't been deleted. (On recent kubectl versions, completed pods are listed by default and the -a flag is no longer needed.)
You can also describe the Job and check for a completion or success event.
If you used kind to create the K8s cluster, all the cluster nodes run as Docker containers. If you have rebooted your computer or VM, the cluster (pod) IP addresses may change, leading to failed communication between the cluster nodes. In this case, check the cluster manager logs; they will contain error messages. The Job is created, but the pod is not.
Try to re-create the cluster, or change the node's IP address configuration.

My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log

This is what I keep getting:
[root@centos-master ~]# kubectl get pods
NAME               READY     STATUS             RESTARTS   AGE
nfs-server-h6nw8   1/1       Running            0          1h
nfs-web-07rxz      0/1       CrashLoopBackOff   8          16m
nfs-web-fdr9h      0/1       CrashLoopBackOff   8          16m
Below is output from describe pods
kubectl describe pods
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
16m 16m 1 {default-scheduler } Normal Scheduled Successfully assigned nfs-web-fdr9h to centos-minion-2
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Created Created container with docker id 495fcbb06836
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Started Started container with docker id 495fcbb06836
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Started Started container with docker id d56f34ae4e8f
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Created Created container with docker id d56f34ae4e8f
16m 16m 2 {kubelet centos-minion-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "web" with CrashLoopBackOff: "Back-off 10s restarting failed container=web pod=nfs-web-fdr9h_default(461c937d-d870-11e6-98de-005056040cc2)"
I have two pods, nfs-web-07rxz and nfs-web-fdr9h, but if I run kubectl logs nfs-web-07rxz, with or without the -p option, I don't see any logs for either pod.
[root@centos-master ~]# kubectl logs nfs-web-07rxz -p
[root@centos-master ~]# kubectl logs nfs-web-07rxz
This is my ReplicationController YAML file:
apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-web
spec:
  replicas: 2
  selector:
    role: web-frontend
  template:
    metadata:
      labels:
        role: web-frontend
    spec:
      containers:
      - name: web
        image: eso-cmbu-docker.artifactory.eng.vmware.com/demo-container:demo-version3.0
        ports:
        - name: web
          containerPort: 80
        securityContext:
          privileged: true
My Docker image was built from this simple Dockerfile:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get install -y nfs-common
I am running my Kubernetes cluster on CentOs-1611, kube version:
[root@centos-master ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
If I run the image with docker run, it runs without any issue; the crash only happens through Kubernetes.
Can someone help me out? How can I debug without seeing any logs?
As @Sukumar commented, your Dockerfile needs to have a command to run, or your ReplicationController needs to specify a command.
The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
# Show details of specific pod
kubectl describe pod <pod name> -n <namespace-name>
# View logs for specific pod
kubectl logs <pod name> -n <namespace-name>
If you have an application that is slow to bootstrap, the problem could be related to the initial values of the readiness/liveness probes. I solved my problem by increasing the value of initialDelaySeconds to 120s, as my Spring Boot application does a lot of initialization. The documentation does not mention the default of 0 (https://kubernetes.io/docs/api-reference/v1.9/#probe-v1-core).
service:
  livenessProbe:
    httpGet:
      path: /health/local
      scheme: HTTP
      port: 8888
    initialDelaySeconds: 120
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 10
  readinessProbe:
    httpGet:
      path: /admin/health
      scheme: HTTP
      port: 8642
    initialDelaySeconds: 150
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 10
A very good explanation about those values is given by What is the default value of initialDelaySeconds.
The health or readiness check algorithm works like this:
wait for initialDelaySeconds
perform the check and wait up to timeoutSeconds for a result
if the number of consecutive successes has reached successThreshold, return success
if the number of consecutive failures has reached failureThreshold, return failure; otherwise wait periodSeconds and start a new check
In my case, my application now has ample time to bootstrap, so I know I will not get a periodic CrashLoopBackOff, which used to happen because startup was sometimes right at the limit of those values.
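The consecutive-failure rule in the steps above can be illustrated with a small simulation. This is only a sketch of the counting logic, assuming the probe is treated as failed once failureThreshold consecutive failures have occurred and that any success resets the streak; it is not kubelet code, and the function name and the ok/fail encoding are made up for illustration.

```shell
#!/bin/sh
# first_failed_check THRESHOLD RESULT...
# RESULT is "ok" or "fail" for each probe run, in order.
# Prints the 1-based index of the run at which THRESHOLD consecutive
# failures is reached; returns 1 if that never happens.
first_failed_check() {
  threshold=$1
  shift
  streak=0
  n=0
  for result in "$@"; do
    n=$((n + 1))
    if [ "$result" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0   # any success resets the failure streak
    fi
    if [ "$streak" -ge "$threshold" ]; then
      echo "$n"
      return 0
    fi
  done
  return 1
}

first_failed_check 3 fail fail ok fail fail fail   # prints 6
```

With failureThreshold set to 10 and periodSeconds set to 5, as in the configuration above, a container would need to fail ten checks in a row before being restarted, which leaves a slow-starting application considerable slack.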
I needed to keep a pod running for subsequent kubectl exec calls and, as the comments above pointed out, my pod was being killed by my k8s cluster because it had finished running all its tasks. I managed to keep the pod running by simply starting it with a command that does not stop on its own, as in:
kubectl run YOUR_POD_NAME -n YOUR_NAMESPACE --image SOME_PUBLIC_IMAGE:latest --command -- tail -f /dev/null
My pod kept crashing and I was unable to find the cause. Luckily, Kubernetes keeps a record of all the events that occurred before my pod crashed.
To see these events, sorted by timestamp, run the command:
kubectl get events --sort-by=.metadata.creationTimestamp
Make sure to add a --namespace mynamespace argument to the command if needed.
The events shown in the output of the command showed me why my pod kept crashing.
According to this page, the container runs everything correctly but then exits because all of its commands have ended, and Kubernetes treats that as a crash. Either make your services run in the foreground, or create a keep-alive script. By doing so, Kubernetes will show that your application is running. Note that in a plain Docker environment this problem is not encountered; it is only Kubernetes that wants a running app.
Update (an example):
Here's how to avoid CrashLoopBackOff, when launching a Netshoot container:
kubectl run netshoot --image nicolaka/netshoot -- sleep infinity
In your yaml file, add command and args lines:
...
containers:
- name: api
  image: localhost:5000/image-name
  command: [ "sleep" ]
  args: [ "infinity" ]
...
Works for me.
I observed the same issue and added the command and args block to my YAML file. Here is a sample of my YAML file for reference:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: ubuntu
  name: ubuntu
  namespace: default
spec:
  containers:
  - image: gcr.io/ow/hellokubernetes/ubuntu
    imagePullPolicy: Never
    name: ubuntu
    resources:
      requests:
        cpu: 100m
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
As mentioned in above posts, the container exits upon creation.
If you want to test this without using a YAML file, you can pass the sleep command to the kubectl create deployment statement. The double hyphen -- introduces the command, which is the equivalent of command: in a Pod or Deployment YAML file.
The below command creates a deployment for debian with sleep 1234, so it doesn't exit immediately.
kubectl create deployment deb --image=debian:buster-slim -- "sh" "-c" "while true; do sleep 1234; done"
You then can create a service etc, or, to test the container, you can kubectl exec -it <pod-name> -- sh (or -- bash) into the container you just created to test it.
I solved this problem by increasing the memory resources:
resources:
  limits:
    cpu: 1
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 250Mi
In my case the problem was what Steve S. mentioned:
The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
Namely I had a Java application whose main threw an exception (and something overrode the default uncaught exception handler so that nothing was logged). The solution was to put the body of main into try { ... } catch and print out the exception. Thus I could find out what was wrong and fix it.
(Another cause could be something in the app calling System.exit; you could use a custom SecurityManager with an overridden checkExit to prevent (or log the caller of) exit; see https://stackoverflow.com/a/5401319/204205.)
Whilst troubleshooting the same issue I found no logs when using kubectl logs <pod_id>.
Therefore I SSHed into the node instance to try to run the container using plain Docker. To my surprise, this also failed.
When entering the container with:
docker run -it faulty:latest /bin/sh
and poking around, I found that it wasn't the latest version.
A faulty version of the Docker image was already present on the instance.
When I removed the faulty:latest image with:
docker rmi faulty:latest
everything started to work.
I had same issue and now I finally resolved it. I am not using docker-compose file.
I just added this line to my Dockerfile and it worked.
ENV CI=true
Reference:
https://github.com/GoogleContainerTools/skaffold/issues/3882
Try rerunning the pod and running
kubectl get pods --watch
to watch the status of the pod as it progresses.
In my case, I would only see the end result, 'CrashLoopBackOff,' but the docker container ran fine locally. So I watched the pods using the above command, and I saw the container briefly progress into an OOMKilled state, which meant to me that it required more memory.
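The exit code shown under Last State in kubectl describe pod can help tell such cases apart: an OOM-killed container is terminated with SIGKILL, which typically shows up as exit code 137 (128 + 9). A small helper to decode such codes, as a sketch; the function name and the message wording are hypothetical:

```shell
#!/bin/sh
# explain_exit_code CODE
# Interprets a container exit code using the common convention that
# codes above 128 mean "terminated by signal (CODE - 128)".
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited normally (command finished)"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "exited with error $code"
  fi
}

explain_exit_code 137   # prints: killed by signal 9
```

An exit code of 0 combined with CrashLoopBackOff usually points to the other cause discussed in this thread: the container's command simply finished.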
In my case this error was specific to the hello-world docker image. I used the nginx image instead of the hello-world image and the error was resolved.
I solved this problem by removing the space between the quotes and the command value inside the array. This happened because the container exited right after starting, as there was no executable command to run inside the container.
['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
I had a similar issue, which was solved when I corrected my zookeeper.yaml file, in which the Service name did not match the Deployment's container name. It was resolved by making them the same.
apiVersion: v1
kind: Service
metadata:
  name: zk1
  namespace: nbd-mlbpoc-lab
  labels:
    app: zk-1
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zk-1
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: zk-deployment
  namespace: nbd-mlbpoc-lab
spec:
  template:
    metadata:
      labels:
        app: zk-1
    spec:
      containers:
      - name: zk1
        image: digitalwonderland/zookeeper
        ports:
        - containerPort: 2181
        env:
        - name: ZOOKEEPER_ID
          value: "1"
        - name: ZOOKEEPER_SERVER_1
          value: zk1
In my case, the issue was a malformed list of command-line arguments. I was doing this in my deployment file:
...
args:
- "--foo 10"
- "--bar 100"
Instead of the correct approach:
...
args:
- "--foo"
- "10"
- "--bar"
- "100"
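The difference is that each YAML list item is passed to the program as exactly one argument; nothing splits it on spaces. The same effect can be reproduced in a shell through quoting, as a rough illustration (count_args is a made-up helper, not part of any tool):

```shell
#!/bin/sh
# count_args prints how many separate arguments it received.
count_args() {
  echo "$#"
}

count_args "--foo 10" "--bar 100"        # prints 2: each flag is fused with its value
count_args "--foo" "10" "--bar" "100"    # prints 4: flags and values are separate
```

A program that expects --foo followed by a separate value would see the single string "--foo 10" as an unknown flag in the first form, which is a plausible reason for it to exit immediately.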
I finally found the cause when I executed the 'docker run xxx' command and got the error there. It is caused by incomplete-platform.
It seems there can be many reasons why a Pod ends up in the CrashLoopBackOff state.
In my case, one of the containers was terminating continuously due to a missing environment value.
So, the best way to debug is to:
1. Check the Pod description output, i.e. kubectl describe pod abcxxx
2. Check the events generated for the Pod, i.e. kubectl get events | grep abcxxx
3. Check whether Endpoints have been created for the Pod, i.e. kubectl get ep
4. Check whether dependent resources are in place, e.g. CRDs, ConfigMaps or any other resource that may be required.
kubectl logs -f POD will only produce logs from a running container. Append --previous to the command to get logs from the previous container instance. This is used mainly for debugging. Hope this helps.