k8s - Keep pod up even if sidecar crashed

I have a pod with a sidecar. The sidecar does file synchronisation and is optional. However it seems that if the sidecar crashes, the whole pod becomes unavailable. I want the pod to continue serving requests even if its sidecar crashed. Is this doable?

Set the pod's restartPolicy to Never. This prevents the kubelet from restarting your containers even if one of them fails.
Suppose a Pod is running and has two containers, and container 1 exits with failure. If the restartPolicy is set to Never, the kubelet will not restart the container, and the Pod's phase stays Running.
Reference
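A minimal sketch of such a pod, with hypothetical names and images. One caveat: restartPolicy applies to all containers in the pod, so with Never the main container will not be restarted either if it crashes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar     # hypothetical name
spec:
  restartPolicy: Never       # kubelet will not restart any failed container
  containers:
  - name: app                # main container keeps serving requests
    image: my-app:latest     # placeholder image
    ports:
    - containerPort: 8080
  - name: sync-sidecar       # optional file-sync sidecar; if it crashes,
    image: my-sync:latest    # the pod's phase stays Running
```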

Related

How to automatically force delete pods stuck in 'Terminating' after node failure?

I have a deployment that deploys a single pod with a persistent volume claim. If I switch off the node it is running on, after a while k8s terminates the pod and tries to spin it up elsewhere. However the new pod cannot attach the volume (Multi-Attach error for volume "pvc-...").
I can manually delete the old 'Terminating' pod with kubectl delete pod <PODNAME> --grace-period=0 --force and then things recover.
Is there a way to get Kubernetes to force delete the 'Terminating' pods after a timeout or something? Tx.
According to the docs:
A Pod is not deleted automatically when a node is unreachable. The Pods running on an unreachable Node enter the 'Terminating' or 'Unknown' state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node. The only ways in which a Pod in such a state can be removed from the apiserver are as follows:
The Node object is deleted (either by you, or by the Node Controller).
The kubelet on the unresponsive Node starts responding, kills the Pod and removes the entry from the apiserver.
Force deletion of the Pod by the user.
So I assume you are neither deleting nor draining the node that is being shut down.
In general I'd advise ensuring any broken nodes are deleted from the node list; that should cause the Terminating pods to be deleted by the controller manager.
Node deletion normally happens automatically, at least on Kubernetes clusters running on the major cloud providers, but if that is not happening for you, then you need a way to remove nodes that are not healthy.
Use Recreate in .spec.strategy.type of your Deployment. This tells Kubernetes to delete the old pods before creating new ones.
Ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
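A sketch of that setting, with hypothetical names. With Recreate, the Deployment scales the old ReplicaSet down fully before scaling the new one up, so the volume is released before the replacement pod tries to attach it. Note this governs rollouts; a pod stuck in Terminating on a dead node may still need force deletion.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app             # hypothetical name
spec:
  replicas: 1
  strategy:
    type: Recreate         # old pod is deleted before the new one starts
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:latest   # placeholder image
```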

Is it possible to get the details of the node where the pod ran before restart?

I'm running a kubernetes cluster of 20+ nodes. And one pod in a namespace got restarted. The pod got killed due to OOM with exit code 137 and restarted again as expected. But would like to know the node in which the pod was running earlier. Any place we could check the logs for the info? Like tiller, kubelet, kubeproxy etc...
But would like to know the node in which the pod was running earlier.
If a pod is killed with ExitCode: 137, e.g. when it used more memory than its limit, it will be restarted on the same node - not re-scheduled. For this, check your metrics or container logs.
But Pods can also be killed due to over-committing a node, see e.g. How to troubleshoot Kubernetes OOM and CPU Throttle.
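To check this yourself, commands along these lines can be used (my-namespace and my-pod are placeholders):

```shell
# Show the node each pod is currently scheduled on
kubectl get pod -n my-namespace -o wide

# Inspect the last terminated state of the restarted container;
# look for "Reason: OOMKilled" and "Exit Code: 137"
kubectl describe pod my-pod -n my-namespace

# Since an OOM-killed container is restarted on the same node,
# the node reported here is also where it ran before the restart
kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.nodeName}'
```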

Does a Kubernetes POD with restart policy always have to be under the auspice of a controller to work?

If I create a POD manifest (pod-definition.yaml) and set the restartPolicy: Always does that Pod also need to be associated with any controller (i.e., a Replicaset or Deployment)? The end goal here it to auto-start the container in the Pod should it die. Without a Pod being associated with a controller will that container automatically restart? What happens if the Pod has only one container?
The documentation is not clear here, but it led me to believe that the Pod must be under a controller for this to work, i.e., if you implicitly create a K8s object and specify a restart policy of Never you'll get a pod, while if you specify Always (the default) you'll get a deployment.
A Pod without a controller (Deployment, ReplicationController, etc.) and only with a restartPolicy will not be restarted or rescheduled if the node (to be exact, the kubelet on that node) where it is running dies, is drained or rebooted, or the pod is evicted from the node for some other reason. If the node is in a good state and the pod crashes for some reason, it will be restarted on the same node without the need for a controller.
The reason is that the pod's restartPolicy is handled by the kubelet, i.e. the pod is restarted by the kubelet of its node. Now if the node dies, the kubelet is also dead and cannot restart the pod. Hence you need a controller, which will restart it on another node.
From the docs
restartPolicy only refers to restarts of the Containers by the kubelet on the same node
In short, if you want pods to survive a node failure or a kubelet failure on a node, you should use a higher-level controller.
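A sketch of such a higher-level controller, with hypothetical names. A bare pod with restartPolicy: Always is restarted by its node's kubelet only; wrapped in a Deployment, the scheduler places replacements on healthy nodes after a node failure.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resilient-app        # hypothetical name
spec:
  replicas: 2                # replacements are scheduled onto healthy nodes
  selector:
    matchLabels:
      app: resilient-app
  template:
    metadata:
      labels:
        app: resilient-app
    spec:
      containers:
      - name: app
        image: my-app:latest   # placeholder image
      # restartPolicy defaults to Always in a Deployment's pod template
```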

Would Kubernetes bring up the down-ed Pod if only Pod definition file exists?

I have Pod definition file only. Kubernetes will bring up the pod. What happens if it goes down? Would Kubernetes bring it up automatically? Or if we want certain numbers of pods up at all time, we MUST take the help of ReplicationController( or ReplicaSet in new versions)?
Your question is not entirely clear, but yes: if you have deployed the pod through a Deployment or ReplicaSet, then Kubernetes will create another one if you or someone else deletes that pod.
If you have just the pod, without any controller like a ReplicaSet, then it is gone forever, as there is no one to take care of it.
In case the app crashes inside the pod:
A CrashloopBackOff means that you have a pod starting, crashing, starting again, and then crashing again.
A PodSpec has a restartPolicy field with possible values Always, OnFailure, and Never which applies to all containers in a pod. The default value is Always and the restartPolicy only refers to restarts of the containers by the kubelet on the same node (so the restart count will reset if the pod is rescheduled in a different node). Failed containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes, and is reset after ten minutes of successful execution.
https://sysdig.com/blog/debug-kubernetes-crashloopbackoff/
A pod's restartPolicy only refers to restarts of the containers by the kubelet on the same node. If there is no ReplicationController or Deployment, then when a node goes down Kubernetes will not reschedule or restart that node's pods on any other node. This is the reason pods are not recommended to be used directly in production.
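To keep a certain number of pods up at all times, a ReplicaSet along these lines can be used (hypothetical names and image):

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: my-replicaset      # hypothetical name
spec:
  replicas: 3              # desired count, maintained even after pod deletion
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:latest   # placeholder image
```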

Kubernetes Deployment restartPolicy alternatives

I have a Deployment configuration that keeps a certain amount of Pods alive. However due to some strange circumstances these Pods fail the readiness probes sometimes and do not recover after a restart thus requiring me to manually delete the Pod from the Replica Set.
A solution to this would be to set the Pod restartPolicy to Never but that is actually not supported https://github.com/kubernetes/kubernetes/issues/24725.
My question is: what alternatives are there to make it so that if a Pod has failed its readiness probe, then the Pod would be deleted?
You could change the liveness probe to make it fail whenever the readiness probe fails. A liveness failure causes the kubelet to kill the container and start a fresh one, which gets the Pod out of its stuck state.
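A sketch of that idea: point both probes at the same endpoint (the path, port, and thresholds below are assumptions), so a persistent readiness failure eventually trips the liveness probe and the container is restarted cleanly.

```yaml
containers:
- name: app
  image: my-app:latest       # placeholder image
  readinessProbe:
    httpGet:
      path: /healthz         # hypothetical health endpoint
      port: 8080
    periodSeconds: 5
  livenessProbe:
    httpGet:
      path: /healthz         # same check as the readiness probe
      port: 8080
    periodSeconds: 5
    failureThreshold: 6      # allow more failures before restarting
```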