AdmissionController holding back a Terminated Pod from getting completely removed - kubernetes

I have an AdmissionController which is running successfully and prevents some pods from getting instantiated, checking on the prescribed conditions.
But the Pod gets stuck in Terminated Status and never goes away. I also have a process that monitors for stuck pods and cleans up. It tries to delete these Terminated Pods using deleteNamespacedPod. The Api call works fine, but the Pod lingers on without getting deleted. Is the AdmissionController denial a finalizer that is holding back the Pod from getting deleted ?
When I took down the Admission Controller, the clean up process was successfully able to delete the Pod.
Any insights or things I am missing in the AdmissionController ?
I appreciate any help/insights in this issue.
Thanks a lot,
-Sreeni

Run the below command against the terminated pod to delete it forcefully
kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":null}}'

Related

Auto delete CrashBackoffLoop pods in a deployment

In my kubernetes cluster, there are multiple deployments in a namespace.
For a specific deployment, there is a need to not allow "CrashLoopBackoff" pods to exist.
So basically, when any pod gets to this state, I would want it to be deleted and later a new pod to be created which is already handled by the ReplicaSet.
I tried with custom controllers, with the thought that the SharedInformer would alert about the state of Pod and then I would delete it from that loop.
However, this brings dependency on the pod on which the custom controller would run.
I also tried searching for any option to be configured in the manifest itself, but could not find any.
I am pretty new to Kuberenetes, so need help in the implementation of this behaviour.
Firstly, you should address the reason why the pod has entered the CrashLoopBackOff state rather than just delete it. If you do this, you'll potentially just recreate the problem again and you'll be deleting pods repeatedly. For example, if your pod is trying to access an external DB and that DB is down, it'll CrashLoop, and deleting and restarting the pod won't help fix that.
Secondly, if you want to do this deleting in an automated manner, an easy way would be to run a CronJob resource that goes through your deployment and deletes the CrashLooped pods. You could set the cronjob to run once an hour or whatever schedule you wish.
Deleting the POD and waiting for the New one is like restarting the deployment or POD.
Kubernetes will auto restart your CrashLoopBackoff POD if failing, you can check the Restart count.
NAME READY STATUS RESTARTS AGE
te-pod-1 0/1 CrashLoopBackOff 2 1m44s
This restarts will be similar to what you have mentioned
when any pod gets to this state, I would want it to be deleted and
later a new pod to be created which is already handled by the
ReplicaSet.
If you want to remove Crashing the POD fully and not look for new POD to come up, you have to rollback the deployment.
If there is any issue with your Replicaset and your POD is crashing it would be useless, any number of times you delete and restart the POD it will crash all time, unless you check logs & debug to solve the real issue in replicaset(Deployment).

Kubernetes limit number of retry

For some context, I'm creating an API in python that creates K8s Jobs with user input in ENV variables.
Sometimes, it happens that the Image selected does not exist or has been deleted. Secrets does not exists or Volume isn't created. So it makes the Job in a crashloopbackoff or imagepullbackoff state.
First I'm am wondering if the ressource during this state are allocated to the job?
If yes, I don't want the Job to loop forever and lock resources to a never starting Job.
I've set the backofflimit to 0, but this is when the Job detect a Pod that goes in fail and tries to relaunch an other Pod to retry. In my case, I know that if a Pod fails for a job, then it's mostly due to OOM or code that fails and will always fails due to user input. So retrying will always fail.
But it doesn't limit the number of tries to crashloopbackoff or imagepullbackoff. Is there a way to set to terminate or fail the Job? I don't want to kill it, but just free the ressource and keep the events in (status.container.state.waiting.reason + status.container.state.waiting.message) or (status.container.state.terminated.reason + status.container.state.terminated.exit_code)
Could there be an option to set to limit the number of retry at the creation so I can free resources, but not to remove it to keep logs.
I have tested your first question and YES even if a pod is in crashloopbackoff state, the resources are still allocated to it !!! Here is my test: Are the Kubernetes requested resources by a pod still allocated to it when it is in crashLoopBackOff state?
Thanks for your question !
Long answer short, unfortunately there is no such option in Kubernetes.
However, you can do this manually by checking if the pod is in a crashloopbackoff then, unallocate its resources or simply delete the pod itself.
The following script delete any pod in the crashloopbackoff state from a specified namespace
#!/bin/bash
# This script check the passed namespace and delete pods in 'CrashLoopBackOff state
NAMESPACE="test"
delpods=$(sudo kubectl get pods -n ${NAMESPACE} |
grep -i 'CrashLoopBackOff' |
awk '{print $1 }')
for i in ${delpods[#]}; do
sudo kubectl delete pod $i --force=true --wait=false \
--grace-period=0 -n ${NAMESPACE}
done
Since we have passed the option --grace-period=0 the pod won't automatically restart again.
But, if after using this script or assigning it to a job, you noticed that the pod continues to restart and fall in the CrashLoopBackOff state again for some weird reason. Thera is a workaround for this, which is changing the restart policy of the pod:
A PodSpec has a restartPolicy field with possible values Always,
OnFailure, and Never. The default value is Always. restartPolicy
applies to all Containers in the Pod. restartPolicy only refers to
restarts of the Containers by the kubelet on the same node. Exited
Containers that are restarted by the kubelet are restarted with an
exponential back-off delay (10s, 20s, 40s …) capped at five minutes,
and is reset after ten minutes of successful execution. As discussed
in the Pods document, once bound to a node, a Pod will never be
rebound to another node.
See more details in the documentation or from here.
And that is it! Happy hacking.
Regarding the first question, it is already answered by bguess here.

Kubernetes: view logs of crashed Airflow worker pod

Pods on our k8s cluster are scheduled with Airflow's KubernetesExecutor, which runs all Tasks in a new pod.
I have a such a Task for which the pod instantly (after 1 or 2 seconds) crashes, and for which of course I want to see the logs.
This seems hard. As soon the pod crashes, it gets deleted, along with the ability to retrieve crash logs. I already tried all of:
kubectl logs -f <pod> -p: cannot be used since these pods are named uniquely
(courtesy of KubernetesExecutor).
kubectl logs -l label_name=label_value: I
struggle to apply the labels to the pod (if this is a known/used way of working, I'm happy to try further)
An shared nfs is mounted on all pods on a fixed log directory. The failing pod however, does not log to this folder.
When I am really quick I run kubectl logs -f -l dag_id=sample_dag --all-containers (dag_idlabel is added byAirflow)
between running and crashing and see Error from server (BadRequest): container "base" in pod "my_pod" is waiting to start: ContainerCreating. This might give me some clue but:
these are only but the last log lines
this is really backwards
I'm basically looking for the canonical way of retrieving logs from transient pods
You need to enable remote logging. Code sample below is for using S3. In airflow.cfg set the following:
remote_logging = True
remote_log_conn_id = my_s3_conn
remote_base_log_folder = s3://airflow/logs
The my_s3_conn can be set in airflow>Admin>Connections. In the Conn Type dropdown, select S3.

How can a pod have status ready and terminating?

Curiously, I saw that a pod I had had both ready 1/1 status and status terminating when I ran kubectl get pods. Are these states not mutually exclusive? Why or why not?
For context, this was noticed immediately after I had killed skaffold so these pods were in the middle of shutting down.
When pods are in terminating state, they could still be functioning. The pod could be delayed in termination due to many reasons (eg. could be that you have a PVC attached, other pods are being terminated at the same time, etc). You could test this by running the following on a pod with a PVC attached or another reason to be terminated with a delay:
$ kubectl delete pod mypod-xxxxx-xxxxxx
pod mypod-xxxxx-xxxxxx deleted
$ kubectl delete pod mypod-xxxxx-xxxxxx
pod mypod-xxxxx-xxxxxx deleted
$ kubectl apply mypod.yaml
pod mypod-xxxxx-xxxxxx configured
Sometimes this happens because the pod is still in the terminating period and is functioning normally, so it will be treated as an existing pod that gets configured (neglecting the fact that you usually can't configure pods like this, but you get the point).
The ready column says how many containers are up.
The status terminating means no more traffic is being sent to that pod by the controllers. From kubernetes' docs:
When a user requests deletion of a pod, the system records the
intended grace period before the pod is allowed to be forcefully
killed, and a TERM signal is sent to the main process in each
container. Once the grace period has expired, the KILL signal is sent
to those processes, and the pod is then deleted from the API server.
That's the state it is. The containers are up, finishing processing whatever work it had already and a TERM signal was sent.
I want to update #nrxr answer:
The status terminating means no more traffic is being sent to that pod by the controllers.
That is what we want, but in reality, it not always be like that. The pod may terminate completely and the traffic still forward to it.
For detail please read this blog: https://learnk8s.io/graceful-shutdown.

Use kubectl to delete a node that has running pods on it

We are using Heat + Kubernetes (V0.19) to manage our apps. When do rolling update, sometimes container staring will always fail on a node but kubelet on the node will always retry but always fail. So the updating will hang there which is not the behavior we expected.
I found that using "kubectl delete node" to remove the node can avoid pods scheduled to that node. But in our env, the node to be deleted may have running pods on it.
So my question is:
After using "kubectl delete node" to remove the node, will the pods on that node still worked correctly ?
If you just want to cancel the rolling update, remove the failed pods and try again later, I have found that it is best to stop the update loop with CTRL+c and then delete the replication controller corresponding to the new app that is failing.
^C
kubectl delete replicationcontrollers your-app-v1.2.3