Kubernetes - What happens if startupProbe runs beyond periodSeconds

I have a Deployment that runs a simple Apache server. I want to execute some commands after the service is up, but I am not quite sure how much time the post-action commands are going to take, so I have set "timeoutSeconds" higher than "periodSeconds".
Kubernetes version: 1.25
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness
spec:
  replicas: 1
  selector:
    matchLabels:
      app: readiness
  template:
    metadata:
      labels:
        app: readiness
    spec:
      containers:
      - image: sujeetkp/readiness:3.0
        name: readiness
        resources:
          limits:
            memory: "500M"
            cpu: "1"
        readinessProbe:
          httpGet:
            path: /health_monitor
            port: 80
          initialDelaySeconds: 20
          timeoutSeconds: 10
          failureThreshold: 20
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health_monitor
            port: 80
          initialDelaySeconds: 60
          timeoutSeconds: 10
          failureThreshold: 20
          periodSeconds: 10
        startupProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - |-
              OUTPUT=$(curl -s -o /dev/null -w %{http_code} http://localhost:80/health_monitor)
              if [ $? -eq 0 ] && [ $OUTPUT -ge 200 ] && [ $OUTPUT -lt 400 ]
              then
                echo "Success" >> /tmp/post_action_track
                if [ ! -f /tmp/post_action_success ]
                then
                  # Trigger Post Action
                  sleep 60
                  echo "Success" >> /tmp/post_action_success
                fi
              else
                exit 1
              fi
          initialDelaySeconds: 20
          timeoutSeconds: 80
          failureThreshold: 20
          periodSeconds: 10
When I run this, I see very strange results.
As "periodSeconds" is 10 and my script sleeps for 60 seconds, shouldn't the startup probe trigger at least 6 times? But it only triggers 2 times.
I am checking the contents of the files /tmp/post_action_success and /tmp/post_action_track to identify how many times the probe triggers (by counting the number of "Success" entries in the files).
Question: If the previous instance of the startup probe is still running, is the startupProbe triggered again on top of it or not? If yes, why did it trigger only twice in my case?
Another observation:
When I set the options below:
initialDelaySeconds: 20
timeoutSeconds: 5
failureThreshold: 20
periodSeconds: 10
then the file /tmp/post_action_success contains sleep/timeoutSeconds (60/5) = 12 "Success" entries.
Can someone please explain how this works?

I think the reason you see the probe being triggered only twice is timeoutSeconds: 80. See this question. The official docs are also quite handy in explaining the other fields.
Perhaps you can set initialDelaySeconds: 61 instead of using sleep in your script?
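To make that suggestion concrete, here is a minimal sketch (untested, values illustrative): keep the exec check but drop the sleep 60, and let initialDelaySeconds cover that wait instead, so every probe run finishes well within periodSeconds:
startupProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - |-
      OUTPUT=$(curl -s -o /dev/null -w %{http_code} http://localhost:80/health_monitor)
      if [ $? -eq 0 ] && [ "$OUTPUT" -ge 200 ] && [ "$OUTPUT" -lt 400 ]
      then
        echo "Success" >> /tmp/post_action_track
        if [ ! -f /tmp/post_action_success ]
        then
          # Trigger the post action here; no sleep needed any more
          echo "Success" >> /tmp/post_action_success
        fi
      else
        exit 1
      fi
  initialDelaySeconds: 61
  timeoutSeconds: 10
  failureThreshold: 20
  periodSeconds: 10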

Related

Readiness probe with command running script readiness-probe.sh fails on OpenShift with the Bitnami MongoDB Helm chart

While deploying the Bitnami MongoDB Helm chart to OpenShift, I got the error "Readiness probe failed".
The health check settings for the readiness and liveness probes look like this:
livenessProbe:
  failureThreshold: 6
  initialDelaySeconds: 30
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 10
  exec:
    command:
    - /bitnami/scripts/ping-mongodb.sh
readinessProbe:
  failureThreshold: 6
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
  exec:
    command:
    - /bitnami/scripts/readiness-probe.sh
The script called by the command (/bitnami/scripts/readiness-probe.sh) looks like this:
#!/bin/bash
# Run the proper check depending on the version
[[ $(mongod -version | grep "db version") =~ ([0-9]+\.[0-9]+\.[0-9]+) ]] && VERSION=${BASH_REMATCH[1]}
. /opt/bitnami/scripts/libversion.sh
VERSION_MAJOR="$(get_sematic_version "$VERSION" 1)"
VERSION_MINOR="$(get_sematic_version "$VERSION" 2)"
VERSION_PATCH="$(get_sematic_version "$VERSION" 3)"
if [[ ( "$VERSION_MAJOR" -ge 5 ) || ( "$VERSION_MAJOR" -ge 4 && "$VERSION_MINOR" -ge 4 && "$VERSION_PATCH" -ge 2 ) ]]; then
mongosh $TLS_OPTIONS --port $MONGODB_PORT_NUMBER --eval 'db.hello().isWritablePrimary || db.hello().secondary' | grep -q 'true'
else
mongosh $TLS_OPTIONS --port $MONGODB_PORT_NUMBER --eval 'db.isMaster().ismaster || db.isMaster().secondary' | grep -q 'true'
fi
Running this script makes the pod very slow.
No matter how high I set the readiness probe timings, it doesn't work.
I checked whether the script exists in the running pod --> the file /bitnami/scripts/readiness-probe.sh is present in the pod.
I changed the command in the readiness probe setting to just "cat /bitnami/scripts/readiness-probe.sh" --> IT WORKS:
livenessProbe:
  failureThreshold: 6
  initialDelaySeconds: 30
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 10
  exec:
    command:
    - cat
    - /bitnami/scripts/ping-mongodb.sh
readinessProbe:
  failureThreshold: 6
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
  exec:
    command:
    - cat
    - /bitnami/scripts/readiness-probe.sh
I increased the CPU and memory --> NO success!
I have noticed that the pod becomes very slow as soon as a MongoDB command is executed.
I had the same problem and saw that timeoutSeconds was not high enough:
readinessProbe:
  timeoutSeconds: 5
in the deployment, so I increased the timeout to 20 and the error went away.
I don't know how many resources it needs to run stably.
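As a rough sketch of that change against the chart's probe block above (values illustrative, adjust to your environment):
readinessProbe:
  failureThreshold: 6
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 20   # raised from 5 so the mongosh check has time to finish
  exec:
    command:
    - /bitnami/scripts/readiness-probe.sh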

startup probes not working with exec as expected

I have a sample web app and Redis that I am running in Kubernetes.
I am using probes for the basic checks like below.
Now I want to make sure that Redis is up and running before the application.
The code snippet below is from the web app.
When I run the command nc -zv <redis service name> 6379 it works well, but when I use it as the command in the startupProbe it gives me errors. I think the way I am passing the command is not right; can someone help me understand what is wrong?
The error I get:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "nc -zv redis 6379": executable file not found in $PATH: unknown
readinessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 20
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 30
  periodSeconds: 5
startupProbe:
  exec:
    command:
    - nc -zv redis 6379
  failureThreshold: 20
  periodSeconds: 5
The command has to be entered in the proper format, as it is an array. The code below is in the expected format:
startupProbe:
  exec:
    command:
    - nc
    - -zv
    - redis
    - "6379"
  failureThreshold: 30
  periodSeconds: 5

Kubernetes postStart hook leads to race condition

I run MySQL on Kubernetes with a postStart hook which should run a query after the database starts.
This is the relevant part of my template.yaml:
spec:
  containers:
  - name: ${{APP}}
    image: ${REGISTRY}/${NAMESPACE}/${APP}:${VERSION}
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - hostname && sleep 12 && echo $QUERY | /opt/rh/rh-mysql80/root/usr/bin/mysql
            -h localhost -u root -D grafana
            -P 3306
    ports:
    - name: tcp3306
      containerPort: 3306
    readinessProbe:
      tcpSocket:
        port: 3306
      initialDelaySeconds: 15
      timeoutSeconds: 1
    livenessProbe:
      tcpSocket:
        port: 3306
      initialDelaySeconds: 120
      timeoutSeconds: 1
When the pod starts, the PVC for the database gets corrupted and the pod crashes. When I restart the pod, it works. I guess the query runs when the database is not up yet. I guess this might be fixed with the readiness probe, but I am not an expert on these topics.
Did anyone else run into a similar issue and know how to fix it?
Note that postStart will be called at least once, but may also be called more than once. This makes postStart a bad place to run a query.
You can set the pod's restartPolicy: OnFailure and run the query in a separate MySQL client container. Start your second container with a wait, then run your query. Note that your query should produce an idempotent result, or your data integrity may break; consider what happens when the pod is re-created with the existing data volume.
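A minimal sketch of that approach, assuming a stock mysql client image and treating the helper container name, the credentials (MYSQL_ROOT_PASSWORD) and the QUERY variable as placeholders rather than your actual setup (also note restartPolicy: OnFailure is only accepted for bare Pods and Jobs, not Deployments):
spec:
  restartPolicy: OnFailure
  containers:
  - name: mysql                 # the database container, unchanged
    image: ${REGISTRY}/${NAMESPACE}/${APP}:${VERSION}
    ports:
    - containerPort: 3306
  - name: run-query             # hypothetical helper container
    image: mysql:8.0            # any image with a mysql client would do
    command:
    - /bin/sh
    - -c
    - |
      # wait until the server accepts connections, then run the (idempotent) query
      until mysqladmin ping -h 127.0.0.1 -u root -p"$MYSQL_ROOT_PASSWORD" --silent; do
        sleep 2
      done
      echo "$QUERY" | mysql -h 127.0.0.1 -P 3306 -u root -p"$MYSQL_ROOT_PASSWORD" -D grafana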

What is the command to check ZooKeeper health for liveness and readiness probes, and to start and stop ZooKeeper?

I have tried the following commands to check ZooKeeper health, with the corresponding errors I am getting:
sh -c zookeeper-ready 2181 (error: zookeeper-ready command not found)
I have tried all the echo commands (error: it is not a file)
/apache-zookeeper-3.5.5-bin/bin/zkServer.sh start (error: it cannot start)
/apache-zookeeper-3.5.5-bin/bin/zkServer.sh stop (error: zookeeper stopping ...... there is no zookeeper to stop)
/apache-zookeeper-3.5.5-bin/bin/zkServer.sh status (problem: when I stop ZooKeeper, the probe needs to fail for this command, but that is not happening; it needs to)
and I have used these commands in the Go file as:
LivenessProbe: &corev1.Probe{
    Handler: corev1.Handler{
        Exec: &corev1.ExecAction{
            Command: []string{
                "sh",
                "/apache-zookeeper-3.5.5-bin/bin/zkServer.sh",
                "status",
            },
        },
    },
    InitialDelaySeconds: 30,
    TimeoutSeconds:      5,
},
ReadinessProbe: &corev1.Probe{
    Handler: corev1.Handler{
        Exec: &corev1.ExecAction{
            Command: []string{
                "sh",
                "/apache-zookeeper-3.5.5-bin/bin/zkServer.sh",
                "status",
            },
        },
    },
    InitialDelaySeconds: 30,
    TimeoutSeconds:      5,
},
To check liveness and readiness for ZooKeeper you can use the following command:
echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok
but make sure to set the env variable ZOO_4LW_COMMANDS_WHITELIST=ruok, otherwise the check will fail.
You have to configure
livenessProbe:
  exec:
    command: ['/bin/bash', '-c', 'echo "ruok" | nc -w 2 localhost 2181 | grep imok']
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6
  successThreshold: 1
readinessProbe:
  exec:
    command: ['/bin/bash', '-c', 'echo "ruok" | nc -w 2 localhost 2181 | grep imok']
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6
  successThreshold: 1
For ruok:
Plain Kafka: you need to set the env variable
ZOO_4LW_COMMANDS_WHITELIST=ruok
Confluent Kafka:
KAFKA_OPTS=-Dzookeeper.4lw.commands.whitelist=ruok
Also, you have to set podManagementPolicy: Parallel so the pods start in parallel.
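For reference, a minimal sketch of where those settings live, assuming the official zookeeper image managed by a StatefulSet (names and versions are illustrative):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
spec:
  serviceName: zookeeper
  podManagementPolicy: Parallel          # start the pods in parallel
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
      - name: zookeeper
        image: zookeeper:3.5.5
        env:
        - name: ZOO_4LW_COMMANDS_WHITELIST   # allow the "ruok" four-letter word
          value: "ruok"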

How to check uwsgi with gevent at socket file in k8s readinessProbe and livenessProbe

I have a Flask app with uWSGI and gevent.
Here is my app.ini.
How could I write a readinessProbe and livenessProbe on Kubernetes to check the Flask app?
[uwsgi]
socket = /tmp/uwsgi.sock
chdir = /usr/src/app/
chmod-socket = 666
module = flasky
callable = app
master = false
processes = 1
vacuum = true
die-on-term = true
gevent = 1000
listen = 1024
I think what you are really asking is "How to health check a uWSGI application". There are some example tools to do this. Particularly:
https://github.com/andreif/uwsgi-tools
https://github.com/che0/uwping
https://github.com/m-messiah/uwget
The uwsgi-tools project seems to have the most complete example at https://github.com/andreif/uwsgi-tools/issues/2#issuecomment-345195583. In a Kubernetes Pod spec context this might end up looking like:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: myapp
    image: myimage
    livenessProbe:
      exec:
        command:
        - uwsgi_curl
        - -H
        - Host:host.name
        - /path/to/unix/socket
        - /health
      initialDelaySeconds: 5
      periodSeconds: 5
This would also assume your application responds to /health as a health endpoint.
You can configure uWSGI to serve an http-socket alongside the uwsgi-socket, and only expose the uwsgi-socket to the k8s service.
In this case your uwsgi.ini would look something like:
[uwsgi]
socket = /tmp/uwsgi.sock
chdir = /usr/src/app/
chmod-socket = 666
module = flasky
callable = app
master = false
processes = 1
vacuum = true
die-on-term = true
gevent = 1000
listen = 1024
http-socket = 0.0.0.0:5050
And assuming you have a /health endpoint in your app, your k8s manifest can be something like:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: myapp
    image: myimage
    livenessProbe:
      httpGet:
        path: /health
        port: 5050
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 5
      periodSeconds: 5
In this case, your application remains reachable as the upstream at socket = /tmp/uwsgi.sock via your k8s Service, and the kubelet health checks can reach your container on http-socket 5050.
I wrote a small readiness check for uWSGI applications: https://github.com/filipenf/uwsgi-readiness-check/
It reads uWSGI's stats socket and checks the queue size. If the queue is above a configurable threshold, the Pod is marked as "NotReady" until its queue is drained and it can be marked as ready again.
Install it into your container image with:
pip install uwsgi-readiness-check
and then run the check with something like:
readinessProbe:
  exec:
    command:
    - uwsgi-is-ready
    - --stats-socket
    - /tmp/uwsgi-stats
    - --queue-threshold
    - "0.7"
  failureThreshold: 2
  initialDelaySeconds: 5
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1
Hope it helps for your use case.