I have a sample webapp and Redis that I am running in Kubernetes.
I am using probes for basic checks, as shown below.
Now I want to make sure that Redis is up and running before the application starts.
The code snippet below is from the webapp.
When I run the command nc -zv <redis service name> 6379 manually it works well, but when I use it as the command in a startupProbe it gives me errors. I think the way I am passing the command is not right; can someone help me understand what is wrong?
The error I get:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "nc -zv redis 6379": executable file not found in $PATH: unknown
readinessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 20
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 30
  periodSeconds: 5
startupProbe:
  exec:
    command:
      - nc -zv redis 6379
  failureThreshold: 20
  periodSeconds: 5
The command has to be entered in the proper format, as it is an array: each element is a separate argument. Exec probes do not run through a shell, so the single string "nc -zv redis 6379" is looked up as one executable name, which is exactly what the error message says. The code below is in the expected format.
startupProbe:
  exec:
    command:
      - nc
      - -zv
      - redis
      - "6379"
  failureThreshold: 30
  periodSeconds: 5
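If you prefer to keep the command as a single string, you can run it through a shell instead; a minimal sketch, assuming the image provides /bin/sh (the question already confirms nc is present):

startupProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - nc -zv redis 6379
  failureThreshold: 30
  periodSeconds: 5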
I have a Deployment which runs a simple Apache server. I want to execute some commands after the service is up. I am not quite sure how much time the post-action commands are going to take, so I have timeoutSeconds set higher than periodSeconds.
Kubernetes version: 1.25
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness
spec:
  replicas: 1
  selector:
    matchLabels:
      app: readiness
  template:
    metadata:
      labels:
        app: readiness
    spec:
      containers:
        - image: sujeetkp/readiness:3.0
          name: readiness
          resources:
            limits:
              memory: "500M"
              cpu: "1"
          readinessProbe:
            httpGet:
              path: /health_monitor
              port: 80
            initialDelaySeconds: 20
            timeoutSeconds: 10
            failureThreshold: 20
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health_monitor
              port: 80
            initialDelaySeconds: 60
            timeoutSeconds: 10
            failureThreshold: 20
            periodSeconds: 10
          startupProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - |-
                  OUTPUT=$(curl -s -o /dev/null -w %{http_code} http://localhost:80/health_monitor)
                  if [ $? -eq 0 ] && [ $OUTPUT -ge 200 ] && [ $OUTPUT -lt 400 ]
                  then
                    echo "Success" >> /tmp/post_action_track
                    if [ ! -f /tmp/post_action_success ]
                    then
                      # Trigger Post Action
                      sleep 60
                      echo "Success" >> /tmp/post_action_success
                    fi
                  else
                    exit 1
                  fi
            initialDelaySeconds: 20
            timeoutSeconds: 80
            failureThreshold: 20
            periodSeconds: 10
When I run this code, I see very strange results.
As periodSeconds is 10 and my script has a sleep of 60 seconds, shouldn't the startup probe trigger at least 6 times? But it only triggers 2 times.
I am checking the contents of the files /tmp/post_action_success and /tmp/post_action_track to identify how many times the probe triggers (by counting the number of "Success" lines inside the files).
Question: if a previous instance of the startup probe is still running, is the startupProbe triggered on top of it or not? If yes, why did it trigger only twice in my case?
Another observation:
When I set the options below,
initialDelaySeconds: 20
timeoutSeconds: 5
failureThreshold: 20
periodSeconds: 10
then the file /tmp/post_action_success contains sleep/timeoutSeconds (60/5) = 12 "Success" entries.
Can someone please explain how this works?
I think the reason you see the probe being triggered twice is because of timeoutSeconds: 80. See this question. Also, the official doc is quite handy in explaining the other fields.
Perhaps you can set initialDelaySeconds: 61 instead of using sleep in your script?
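A minimal sketch of that suggestion, assuming the image ships curl and the same /health_monitor endpoint; the probe itself stays fast, and the waiting is moved into initialDelaySeconds (curl -f exits non-zero on an HTTP status >= 400, roughly matching the original 200-399 check):

startupProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - curl -sf http://localhost:80/health_monitor
  initialDelaySeconds: 61
  timeoutSeconds: 10
  failureThreshold: 20
  periodSeconds: 10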
I run MySQL on Kubernetes with a postStart hook which should run a query after the database starts.
This is the relevant part of my template.yaml:
spec:
  containers:
    - name: ${{APP}}
      image: ${REGISTRY}/${NAMESPACE}/${APP}:${VERSION}
      imagePullPolicy: Always
      lifecycle:
        postStart:
          exec:
            command:
              - /bin/sh
              - -c
              - hostname && sleep 12 && echo $QUERY | /opt/rh/rh-mysql80/root/usr/bin/mysql
                -h localhost -u root -D grafana
                -P 3306
      ports:
        - name: tcp3306
          containerPort: 3306
      readinessProbe:
        tcpSocket:
          port: 3306
        initialDelaySeconds: 15
        timeoutSeconds: 1
      livenessProbe:
        tcpSocket:
          port: 3306
        initialDelaySeconds: 120
        timeoutSeconds: 1
When the pod starts, the PVC for the database gets corrupted and the pod crashes. When I restart the pod, it works. I guess the query runs while the database is not up yet. I guess this might get fixed with the readinessProbe, but I am not an expert on these topics.
Did anyone else run into a similar issue and know how to fix it?
Note that postStart will be called at least once, but may also be called more than once. This makes postStart a bad place to run a query.
You can set the pod's restartPolicy: OnFailure and run the query in a separate MySQL container. Start your second container with a wait and then run your query. Note that your query should produce an idempotent result, or your data integrity may break; consider what happens when the pod is re-created with the existing data volume.
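A minimal sketch of that approach, reusing the image and client path from the question (the container name run-query and the polling loop are illustrative assumptions):

  - name: run-query
    image: ${REGISTRY}/${NAMESPACE}/${APP}:${VERSION}
    command:
      - /bin/sh
      - -c
      - |
        # Poll until MySQL accepts connections before running the query
        until /opt/rh/rh-mysql80/root/usr/bin/mysql -h 127.0.0.1 -P 3306 -u root -e 'SELECT 1'; do
          echo "waiting for mysql"; sleep 5
        done
        # $QUERY should be idempotent: this container may run more than once
        echo "$QUERY" | /opt/rh/rh-mysql80/root/usr/bin/mysql -h 127.0.0.1 -P 3306 -u root -D grafana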
I am implementing a Docker Compose health check for a Prysm Docker container. Prysm is an Ethereum 2 node.
My goal is to ensure that the RPC APIs (gRPC, JSON-RPC) of Prysm are up before starting other services in the same Docker Compose file, as those services depend on Prysm. I can use depends_on in the Docker Compose file for this, but I need to figure out how to construct a check that verifies Prysm's HTTP ports are ready to accept traffic.
The equivalent Kubernetes health check is:
readinessProbe:
  initialDelaySeconds: 180
  timeoutSeconds: 1
  periodSeconds: 60
  failureThreshold: 3
  successThreshold: 1
  httpGet:
    path: /healthz
    port: 9090
    scheme: HTTP
livenessProbe:
  initialDelaySeconds: 60
  timeoutSeconds: 1
  periodSeconds: 60
  failureThreshold: 60
  successThreshold: 1
  httpGet:
    path: /healthz
    port: 9090
    scheme: HTTP
The problem with the Prysm image is that it lacks the normal UNIX tools (curl, netcat, /bin/sh) one usually uses to create such checks.
Is there a way to implement an HTTP health check with Docker Compose that would use built-in features of Compose (are there any?) or commands from the host system instead of ones within the container?
I managed to accomplish this by creating another service using the Dockerize image.
version: '3'
services:

  # Oracle connects to ETH1 and ETH2 nodes
  # oracle:
  stakewise:
    container_name: stakewise-oracle
    image: stakewiselabs/oracle:v1.0.1
    # Do not start the oracle service until the beacon health check succeeds
    depends_on:
      beacon_ready:
        condition: service_healthy

  # ETH2 Prysm node
  beacon:
    container_name: eth2-beacon
    image: gcr.io/prysmaticlabs/prysm/beacon-chain:latest
    restart: always
    hostname: beacon-chain

  # An external startup check tool for Prysm
  # Using https://github.com/jwilder/dockerize
  # Simply wait until the TCP port of the RPC becomes available before
  # starting the Oracle, to avoid errors on startup.
  beacon_ready:
    image: jwilder/dockerize
    container_name: eth2-beacon-ready
    command: "/bin/sh -c 'while true ; do dockerize -wait tcp://beacon-chain:3500 -timeout 300s ; sleep 99 ; done'"
    depends_on:
      - beacon
    healthcheck:
      test: ["CMD", "dockerize", "-wait", "tcp://beacon-chain:3500"]
      interval: 1s
      retries: 999
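Dockerize can also wait on an HTTP URL instead of a bare TCP port, which is closer to the Kubernetes httpGet probe above; a sketch, assuming the /healthz endpoint is reachable on that port (the Kubernetes example above used port 9090):

    healthcheck:
      test: ["CMD", "dockerize", "-wait", "http://beacon-chain:3500/healthz", "-timeout", "10s"]
      interval: 30s
      retries: 10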
I have tried the following commands to check the Zookeeper health, with the corresponding error I get for each:
sh -c zookeeper-ready 2181 (error: zookeeper-ready command not found)
all the echo commands (error: it is not a file)
/apache-zookeeper-3.5.5-bin/bin/zkServer.sh start (error: cannot be started)
/apache-zookeeper-3.5.5-bin/bin/zkServer.sh stop (error: zookeeper stopping ...... there is no zookeeper to stop)
/apache-zookeeper-3.5.5-bin/bin/zkServer.sh status (problem: when I stop Zookeeper, the probe needs to fail for this command, but that is not happening, and it needs to)
I have used these commands in my Go file as follows:
LivenessProbe: &corev1.Probe{
    Handler: corev1.Handler{
        Exec: &corev1.ExecAction{
            Command: []string{
                "sh",
                "/apache-zookeeper-3.5.5-bin/bin/zkServer.sh",
                "status",
            },
        },
    },
    InitialDelaySeconds: 30,
    TimeoutSeconds:      5,
},
ReadinessProbe: &corev1.Probe{
    Handler: corev1.Handler{
        Exec: &corev1.ExecAction{
            Command: []string{
                "sh",
                "/apache-zookeeper-3.5.5-bin/bin/zkServer.sh",
                "status",
            },
        },
    },
    InitialDelaySeconds: 30,
    TimeoutSeconds:      5,
},
To check liveness and readiness for Zookeeper you can use the following command:
echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok
but make sure to set the env variable ZOO_4LW_COMMANDS_WHITELIST=ruok, otherwise the check will fail.
You have to configure the probes as follows:
livenessProbe:
  exec:
    command: ['/bin/bash', '-c', 'echo "ruok" | nc -w 2 localhost 2181 | grep imok']
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6
  successThreshold: 1
readinessProbe:
  exec:
    command: ['/bin/bash', '-c', 'echo "ruok" | nc -w 2 localhost 2181 | grep imok']
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6
  successThreshold: 1
For ruok to be whitelisted:
Plain Kafka: set the env variable
ZOO_4LW_COMMANDS_WHITELIST=ruok
Confluent Kafka: set
KAFKA_OPTS=-Dzookeeper.4lw.commands.whitelist=ruok
Also, you have to set podManagementPolicy: Parallel so the pods start in parallel.
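A minimal sketch of where those settings live for a plain-Kafka Zookeeper StatefulSet, assuming the official zookeeper image (names are illustrative):

apiVersion: apps/v1
kind: StatefulSet
spec:
  podManagementPolicy: Parallel
  template:
    spec:
      containers:
        - name: zookeeper
          image: zookeeper:3.5.5
          env:
            # Whitelist the four-letter-word command used by the probes
            - name: ZOO_4LW_COMMANDS_WHITELIST
              value: "ruok"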
How do I define distinct Pod ports, one for the application and another for the health check (readinessProbe)?
Is the specification of ports shown below a correct way to make the readinessProbe check the health check port TCP/9090? I mean, is the readinessProbe going to reach port 9090 (assuming it is open by the running container, of course)? Or does one need to specify any other port (nodePort, targetPort, port, whatever)?
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: myapp
          image: <image>
          ports:
            - name: myapp-port
              containerPort: 8080
              protocol: TCP
            - name: healthcheck-port
              containerPort: 9090
              protocol: TCP
          readinessProbe:
            httpGet:
              port: healthcheck-port
              scheme: HTTP
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 2
            failureThreshold: 2
Yes, your specification snippet is almost correct. You don't need to specify anything else to make the readiness probe work.
However, port names cannot be more than 15 characters, so the name healthcheck-port (16 characters) won't work. You might want to change it to something shorter, like healthcheck.
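For instance, a sketch of the relevant fragment with a shorter name:

ports:
  - name: healthcheck   # port names are limited to 15 characters
    containerPort: 9090
    protocol: TCP
readinessProbe:
  httpGet:
    port: healthcheck
    scheme: HTTP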
Your current configuration is almost correct, as mentioned by @shashank-v, except for the port name.
What I would rather like to point out here, apart from the name, is that it is best practice to use the same port, TCP/8080, but expose a /healthz path where your application responds with "ok" or "running". Then in your httpGet:
readinessProbe:
  httpGet:
    port: 8080
    path: /healthz
You can specify any port and path (assuming it's HTTP) for livenessProbe and readinessProbe, but, of course, you need to be serving something there.
It shouldn't be a Service port, so NodePort is not an option: it is the kubelet that is in charge of the containers' health, and it has direct access to the containers.
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
Good reference:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-tcp-liveness-probe