Know when a Pod was killed after exceeding its termination grace period - kubernetes

The scenario is as follows:
Our pods have a terminationGracePeriodSeconds of 60, which gives them ~60 seconds to do any necessary cleanup before Kubernetes kills them ungracefully. In the majority of cases the cleanup happens well within the 60 seconds. But every now and then we (manually) observe pods that didn't complete their graceful termination and were killed by Kubernetes.
How does one monitor these situations? When I try replicating this scenario with a simple Linux image and a sleep, I don't see Kubernetes logging an additional event after the "Killed" one. Without an additional event this is impossible to monitor from the infrastructure side.

You can use container lifecycle hooks and then monitor the events those hooks emit. For example, the preStop hook, which is called when a Pod is terminated, will fire a FailedPreStopHook event if it cannot complete its work within terminationGracePeriodSeconds. These events can be watched from the infrastructure side, as sketched after the links below.
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
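One way to monitor this from the infrastructure side is to watch for those hook-failure events directly. A minimal sketch, assuming kubectl access; my-namespace is a placeholder:
# watch for preStop hook failures in a namespace (my-namespace is a placeholder)
kubectl get events -n my-namespace --field-selector reason=FailedPreStopHook --watch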

Related

Do Kubernetes PreStop hooks prevent ALL containers in a pod from stopping, or only the container it's set on?

The Kubernetes documentation about PreStop hooks claims the following:
PreStop hooks are not executed asynchronously from the signal to stop
the Container; the hook must complete its execution before the TERM
signal can be sent.
However, it does not say anything about other containers in the pod.
Suppose terminationGracePeriodSeconds has not yet passed.
Are all containers in a pod protected from termination until the PreStop hook for each container finishes? Or does each PreStop hook only protect its own container?
I believe the preStop hook only protects the container for which it's declared.
For example, with the following setup:
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh","-c","sleep 15"]
  - name: other-container
    image: mysql
    env:
    - name: MYSQL_ALLOW_EMPTY_PASSWORD
      value: "true"
If I terminate the pod, the mysql container receives SIGTERM and shuts down immediately, while the nginx container stays alive for an extra 15 seconds due to its preStop hook.
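A rough way to confirm this yourself, assuming the manifest above is saved as lifecycle-demo.yaml: delete the pod and, while it is Terminating, inspect the container states; mysql should already show terminated while nginx is still sitting in its preStop sleep.
kubectl apply -f lifecycle-demo.yaml
kubectl delete pod lifecycle-demo --wait=false
# while the pod is Terminating, mysql should already be terminated, nginx still running
kubectl get pod lifecycle-demo -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state}{"\n"}{end}'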

LivenessProbe command for a background process

What is an appropriate Kubernetes livenessProbe command for a background process?
We have a NodeJS process that consumes messages off an SQS queue. Since it's a background job we don't expose any HTTP endpoints, so a liveness command seems to be the more appropriate way to do the liveness check. What would a "good enough" command setup look like that actually checks the process is alive and running properly? Should the NodeJS process touch a file to update its edited time and the liveness check validate that? Examples I've seen online seem disconnected from the actual process, e.g. they just check that a file exists.
You could configure a liveness probe using an exec command.
Here is an example:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
To perform a probe, the kubelet executes the command cat /tmp/healthy in the target container. If the command succeeds, it returns 0, and the kubelet considers the container to be alive and healthy. If the command returns a non-zero value, the kubelet kills the container and restarts it.
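To tie the probe to the actual worker, as the question suggests, one hedged approach is to have the NodeJS loop touch a heartbeat file (for example /tmp/heartbeat) after each successful poll, and have the liveness command fail once that file goes stale. A minimal sketch, assuming a one-minute staleness threshold is acceptable; the path and timings are placeholders:
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # succeed only if /tmp/heartbeat exists and was modified within the last minute
    - find /tmp/heartbeat -mmin -1 | grep -q .
  initialDelaySeconds: 30
  periodSeconds: 10
The staleness window should be comfortably larger than the worker's normal poll interval, otherwise the probe will restart a healthy container.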

Pod failure and recovery events

We are listening to multiple mailboxes on a single pod, but if this pod goes down for some reason, we need another pod that is up to take over listening to these mailboxes, in order to keep receiving emails.
I would like to know if it is possible to detect that a pod has gone down, for example via an event, and trigger a script to perform the above action on the fly.
Approach 1:
Kubernetes lifecycle handler hook
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
Approach 2:
Write a script which monitors the pod's health, say every x seconds. When 3 consecutive health checks fail, Kubernetes deletes the pod; so in your script, if 3 consecutive REST calls for the health check fail, the pod is being deleted and you can trigger your event (a rough watcher sketch follows after Approach 3).
Approach 3:
Maintain 2 replicas. The problem could be two pods processing the same mail; you can avoid this if you use Kafka.
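A very rough sketch of the Approach 2 watcher, here polling the pod phase via kubectl rather than calling a REST health endpoint; the app=mail-listener label and failover-mailboxes.sh are hypothetical placeholders:
#!/bin/sh
# hypothetical watcher: poll the pod phase and trigger a failover action when it stops Running
while true; do
  phase=$(kubectl get pod -l app=mail-listener -o jsonpath='{.items[0].status.phase}' 2>/dev/null)
  if [ "$phase" != "Running" ]; then
    ./failover-mailboxes.sh   # hypothetical script that re-attaches the mailboxes elsewhere
  fi
  sleep 10
done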

Kubernetes Rolling Updates: Respect pod readiness before updating

My deployment's pods are doing work that should not be interrupted. Is it possible to have K8s poll an endpoint about update readiness, or to inform my pod that it is about to go down so it can get its affairs in order and then declare itself ready to be replaced?
Ideal process:
An updated pod is ready to replace an old one
A request is sent to the old pod by k8s, telling it that it is about to be updated
Old pod gets polled about update readiness
Old pod gets its affairs in order (e.g. stop receiving new tasks, finishes existing tasks)
Old pod says it is ready
Old pod gets replaced
You could perhaps look into using container lifecycle hooks - specifically preStop in this case (a drain variant is sketched after the example below).
apiVersion: v1
kind: Pod
metadata:
  name: your-pod
spec:
  containers:
  - name: your-awesome-image
    image: image-name
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "my-app", "-start"]
      preStop:
        exec:
          # specifically by adding the cmd you want your image to run here
          command: ["/bin/sh","my-app","-stop"]

preStop hook doesn't get executed

I am testing lifecycle hooks, and post-start works pretty well, but I think pre-stop never gets executed. There is another answer, but it is not working, and actually, if it did work, it would contradict the k8s documentation. So, from the docs:
PreStop
This hook is called immediately before a container is terminated due
to an API request or management event such as liveness probe failure,
preemption, resource contention and others. A call to the preStop hook
fails if the container is already in terminated or completed state.
So, the API request makes me think I can simply do kubectl delete pod POD, and I am good.
More from the docs (pod shutdown process):
1.- User sends command to delete Pod, with default grace period (30s)
2.- The Pod in the API server is updated with the time beyond which the Pod is considered “dead” along with the grace period.
3.- Pod shows up as “Terminating” when listed in client commands
4.- (simultaneous with 3) When the Kubelet sees that a Pod has been marked as terminating because the time in 2 has been set, it begins the pod shutdown process.
4.1.- If one of the Pod’s containers has defined a preStop hook, it is invoked inside of the container. If the preStop hook is still running after the grace period expires, step 2 is then invoked with a small (2 second) extended grace period.
4.2.- The container is sent the TERM signal. Note that not all containers in the Pod will receive the TERM signal at the same time and may each require a preStop hook if the order in which they shut down matters.
...
So, since when you do kubectl delete pod POD the pod shows up as Terminating, I assume I can trigger it that way.
According to the other answer I can't do this; the way is to do a rolling update. Well, I tried that in all possible ways and it didn't work either.
My tests:
I have a deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-deploy
spec:
  replicas: 1
  template:
    metadata:
      name: lifecycle-demo
      labels:
        lifecycle: demo
    spec:
      containers:
      - name: nginx
        image: nginx
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - echo "Hello at" `date` > /usr/share/post-start
          preStop:
            exec:
              command:
              - /bin/sh"
              - -c
              - echo "Goodbye at" `date` > /usr/share/pre-stop
        volumeMounts:
        - name: hooks
          mountPath: /usr/share/
      volumes:
      - name: hooks
        hostPath:
          path: /usr/hooks/
I expect the pre-stop and post-start files to be created in /usr/hooks/ on the host (the node where the pod is running). post-start is there, but pre-stop never is.
I tried kubectl delete pod POD, and it didn't work.
I tried kubectl replace -f deploy.yaml, with a different image, and when I do kubectl get rs, I can see the new replicaSet created, but the file isn't there.
I tried kubectl set image ..., and again, I can see the new replicaSet created, but the file isn't there.
I even tried putting them in completely separate volumes, as I thought maybe when I kill the pod and it gets re-created, it re-creates the folder where the files should be created, deleting the folder and the pre-stop file with it, but that was not the case.
Note: It always gets re-created on the same node. I made sure of that.
What I have not tried is to bomb the container and break it by setting a low CPU limit, but that's not what I need.
Any idea what are the circumstances under which preStop hook would get triggered?
Posting this as community wiki for better visibility.
There is a typo in the second "/bin/sh", the one for preStop: there is an extra double quote ("). It still let me create the deployment, but it was the reason the file was not being created. All works fine now.
The exact point where the issue lay was here:
preStop:
  exec:
    command:
    - /bin/sh" # <- this quotation
    - -c
    - echo "Goodbye at" `date` > /usr/share/pre-stop
To be correct it should look like this:
preStop:
  exec:
    command:
    - /bin/sh
    - -c
    - echo "Goodbye at" `date` > /usr/share/pre-stop
At the time of writing this community wiki post, this Deployment manifest is outdated. The following changes were needed to be able to run this manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: good-deployment
spec:
  selector:
    matchLabels:
      lifecycle: demo
  replicas: 1
  template:
    metadata:
      labels:
        lifecycle: demo
    spec:
      containers:
      - name: nginx
        image: nginx
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - echo "Hello at" `date` > /usr/share/post-start
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - echo "Goodbye at" `date` > /usr/share/pre-stop
        volumeMounts:
        - name: hooks
          mountPath: /usr/share/
      volumes:
      - name: hooks
        hostPath:
          path: /usr/hooks/
The changes were as follows:
1. apiVersion
+--------------------------------+---------------------+
| Old | New |
+--------------------------------+---------------------+
| apiVersion: extensions/v1beta1 | apiVersion: apps/v1 |
+--------------------------------+---------------------+
StackOverflow answer for more reference:
Stackoverflow.com: Questions: No matches for kind “Deployment” in version extensions/v1beta1
2. selector
Added selector section under spec:
spec:
  selector:
    matchLabels:
      lifecycle: demo
Additional links with reference:
What is spec - selector - matchLabels used for while creating a deployment?
Kubernetes.io: Docs: Concepts: Workloads: Controllers: Deployment: Selector
Posting this as community wiki for better visibility.
When a pod should be terminated:
A SIGTERM signal is sent to the main process (PID 1) in each container, and a “grace period” countdown starts (defaults to 30 seconds for a k8s pod - see below to change it).
Upon receiving the SIGTERM, each container should start a graceful shutdown of the running application and exit.
If a container doesn’t terminate within the grace period, a SIGKILL signal will be sent and the container forcibly terminated.
For a detailed explanation, please see:
Kubernetes: Termination of pods
Kubernetes: Pods lifecycle hooks and termination notice
Kubernetes: Container lifecycle hooks
Always confirm this:
Check whether preStop is taking more than 30 seconds to run (more than the default grace period). If it is, increase terminationGracePeriodSeconds to more than 30 seconds, maybe 60 (a minimal placement sketch follows below). Refer to this for more info about terminationGracePeriodSeconds.
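For reference, the field sits at the pod spec level, next to containers; a minimal sketch with an example value of 60:
spec:
  terminationGracePeriodSeconds: 60   # default is 30; should exceed the time preStop needs
  containers:
  - name: nginx
    image: nginx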
I know it's too late to answer, but it is worth adding here.
I spent a full day figuring out preStop in K8s.
K8s does not print any logs from the preStop stage. preStop is part of the lifecycle, also called a hook.
Generally, hook and probe (liveness & readiness) logs do not show up in kubectl logs.
Read this issue and you will understand it fully.
But there is an indirect way to get the output into kubectl logs: follow the last comment in the above link.
Adding it here as well:
lifecycle:
  postStart:
    exec:
      command:
      - /bin/sh
      - -c
      - sleep 10; echo 'hello from postStart hook' >> /proc/1/fd/1