Docker Compose health check of HTTP API using tools outside the container - kubernetes

I am implementing a Docker Compose health check for the Prysm Docker container. Prysm is an Ethereum 2 node.
My goal is to ensure that Prysm's RPC APIs (gRPC, JSON-RPC) are up before starting the other services in the same Docker Compose file, as those services depend on Prysm. I can use depends_on in the Compose file for this, but I need to figure out how to construct a check that verifies Prysm's HTTP ports are ready to accept traffic.
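On the Compose side, the shape I am after is roughly the following sketch; the healthcheck test command is exactly the part I don't know how to write, since the image ships neither curl nor netcat nor /bin/sh (some-http-check is a hypothetical placeholder):
services:
  beacon:
    image: gcr.io/prysmaticlabs/prysm/beacon-chain:latest
    healthcheck:
      # hypothetical placeholder: what can run here without curl, netcat or /bin/sh in the image?
      test: ["CMD", "some-http-check", "http://localhost:3500"]
      interval: 60s
      retries: 3
  oracle:
    depends_on:
      beacon:
        condition: service_healthy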
The equivalent Kubernetes health check is:
readinessProbe:
  initialDelaySeconds: 180
  timeoutSeconds: 1
  periodSeconds: 60
  failureThreshold: 3
  successThreshold: 1
  httpGet:
    path: /healthz
    port: 9090
    scheme: HTTP
livenessProbe:
  initialDelaySeconds: 60
  timeoutSeconds: 1
  periodSeconds: 60
  failureThreshold: 60
  successThreshold: 1
  httpGet:
    path: /healthz
    port: 9090
    scheme: HTTP
The problem with the Prysm image is that it lacks the usual UNIX tools (curl, netcat, even /bin/sh) one would normally use to build such a check.
Is there a way to implement an HTTP health check with Docker Compose that uses built-in Compose features (are there any?) or commands from the host system instead of tools inside the container?

I managed to accomplish this by creating another service that uses the Dockerize image.
version: '3'
services:

  # Oracle connects to ETH1 and ETH2 nodes
  # oracle:
  stakewise:
    container_name: stakewise-oracle
    image: stakewiselabs/oracle:v1.0.1
    # Do not start the oracle service until the beacon health check succeeds
    depends_on:
      beacon_ready:
        condition: service_healthy

  # ETH2 Prysm node
  beacon:
    container_name: eth2-beacon
    image: gcr.io/prysmaticlabs/prysm/beacon-chain:latest
    restart: always
    hostname: beacon-chain

  # An external startup check tool for Prysm
  # Using https://github.com/jwilder/dockerize
  # Simply wait until the TCP port of the RPC becomes available
  # before starting the Oracle, to avoid errors on startup.
  beacon_ready:
    image: jwilder/dockerize
    container_name: eth2-beacon-ready
    command: "/bin/sh -c 'while true ; do dockerize -wait tcp://beacon-chain:3500 -timeout 300s ; sleep 99 ; done'"
    depends_on:
      - beacon
    healthcheck:
      test: ["CMD", "dockerize", "-wait", "tcp://beacon-chain:3500"]
      interval: 1s
      retries: 999
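Dockerize can also wait on an HTTP(S) URL rather than a raw TCP port, so the same healthcheck can verify that the API actually answers HTTP. A minimal sketch, assuming the /healthz endpoint on port 9090 from the Kubernetes probes above is reachable inside the Compose network (your path and port may differ):
  beacon_ready:
    image: jwilder/dockerize
    container_name: eth2-beacon-ready
    depends_on:
      - beacon
    healthcheck:
      # -wait also accepts http:// URLs and succeeds once the endpoint responds successfully
      test: ["CMD", "dockerize", "-wait", "http://beacon-chain:9090/healthz", "-timeout", "10s"]
      interval: 30s
      retries: 10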

Related

startup probes not working with exec as expected

I have a sample webapp and Redis that I am running in Kubernetes.
I am using probes for the basic checks, as shown below.
Now I want to make sure that Redis is up and running before the application starts.
The code snippet below is from the webapp.
When I run the command nc -zv <redis service name> 6379 manually it works well, but when I use it as the command in startupProbe it gives me errors. I think the way I am passing the command is not right; can someone help me understand what is wrong?
The error I get:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "nc -zv redis 6379": executable file not found in $PATH: unknown
readinessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 20
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 30
  periodSeconds: 5
startupProbe:
  exec:
    command:
    - nc -zv redis 6379
  failureThreshold: 20
  periodSeconds: 5
The command has to be entered in the proper format, as it is an array. The code below is in the expected format.
startupProbe:
  exec:
    command:
    - nc
    - -zv
    - redis
    - "6379"
  failureThreshold: 30
  periodSeconds: 5
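Alternatively, the whole command line can be handed to a shell, which keeps it as a single string; a minimal sketch, assuming the image ships /bin/sh as well as nc:
startupProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # the shell does the word-splitting, so one string is fine here
    - nc -zv redis 6379
  failureThreshold: 30
  periodSeconds: 5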

Kubernetes postStart hook leads to race condition

I run MySQL on Kubernetes with a postStart hook which should run a query after the database starts.
This is the relevant part of my template.yaml:
spec:
  containers:
    - name: ${{APP}}
      image: ${REGISTRY}/${NAMESPACE}/${APP}:${VERSION}
      imagePullPolicy: Always
      lifecycle:
        postStart:
          exec:
            command:
              - /bin/sh
              - -c
              - hostname && sleep 12 && echo $QUERY | /opt/rh/rh-mysql80/root/usr/bin/mysql
                -h localhost -u root -D grafana
                -P 3306
      ports:
        - name: tcp3306
          containerPort: 3306
      readinessProbe:
        tcpSocket:
          port: 3306
        initialDelaySeconds: 15
        timeoutSeconds: 1
      livenessProbe:
        tcpSocket:
          port: 3306
        initialDelaySeconds: 120
        timeoutSeconds: 1
When the pod starts, the PVC for the database gets corrupted and the pod crashes. When I restart the pod, it works. I guess the query runs while the database is not up yet. I suspect this might be fixed with the readiness probe, but I am not an expert on these topics.
Did anyone else run into a similar issue and know how to fix it?
Note that postStart will be called at least once, but may also be called more than once. This makes postStart a bad place to run a query.
You can set the pod's restartPolicy: OnFailure and run the query in a separate MySQL container. Start the second container with a wait and then run your query, as in the sketch below. Note that your query should produce an idempotent result, or your data integrity may break; consider what happens when the pod is re-created with the existing data volume.
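A minimal sketch of that idea (image tag, credentials, and client tool names are placeholders for your environment); the second container goes under spec.containers next to the database container:
    - name: run-query
      image: mysql:8.0
      command:
        - /bin/sh
        - -c
        - |
          # wait until the database container (same pod, shared localhost) accepts connections
          until mysqladmin ping -h 127.0.0.1 -P 3306 --silent; do
            echo "waiting for mysql"; sleep 2
          done
          # run the query once; it should be idempotent, as noted above
          echo "$QUERY" | mysql -h 127.0.0.1 -P 3306 -u root -D grafana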

How to verify certificates for Liveness probe configured to https?

I'm using a readinessProbe on my container and configured it to work over HTTPS with the scheme attribute.
My server expects to receive the certificates. How can I configure the readiness probe to support HTTPS with a certificate exchange? I don't want it to skip the certificates.
readinessProbe:
  httpGet:
    path: /eh/heartbeat
    port: 2347
    scheme: HTTPS
  initialDelaySeconds: 210
  periodSeconds: 10
  timeoutSeconds: 5
You can use a readiness command instead of an HTTP request. This gives you complete control over the check, including the certificate exchange.
So, instead of:
readinessProbe:
  httpGet:
    path: /eh/heartbeat
    port: 2347
    scheme: HTTPS
, you would have something like:
readinessProbe:
  exec:
    command:
    - python
    - your_script.py
Be sure the script returns 0 if all is well, and non-zero value on failure.
(python your_script.py is, of course, just one example. You would know what is the best approach for you)
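For example, if curl is available in the image, the certificate exchange can happen directly in the probe command; a minimal sketch, assuming the client certificate, key, and CA bundle are mounted at the paths shown (the paths are placeholders):
readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # curl presents the client certificate and verifies the server against the CA;
    # --fail makes the probe exit non-zero if the handshake or the request fails
    - >
      curl --fail --silent
      --cacert /etc/tls/ca.crt
      --cert /etc/tls/tls.crt
      --key /etc/tls/tls.key
      https://localhost:2347/eh/heartbeat
  initialDelaySeconds: 210
  periodSeconds: 10
  timeoutSeconds: 5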

How to define Pod health check port for livenessProbe/readinessProbe

How do I define distinct Pod ports, one for the application and another for the health check (readinessProbe)?
Is the ports specification shown below the correct way to make the readinessProbe check the health check port TCP/9090? I mean, will the readinessProbe reach port 9090 (assuming it is open by the running container, of course)? Or does one need to specify any other port (nodePort, targetPort, port, whatever)?
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: myapp
        image: <image>
        ports:
        - name: myapp-port
          containerPort: 8080
          protocol: TCP
        - name: healthcheck-port
          containerPort: 9090
          protocol: TCP
        readinessProbe:
          httpGet:
            port: healthcheck-port
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 2
          failureThreshold: 2
Yes, your specification snippet is almost correct. You don't need to specify anything else to make the readiness probe work.
Port names cannot be more than 15 characters, so the name healthcheck-port won't work. You might want to change it to something shorter, like healthcheck, as shown below.
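For example, the same snippet with only the name shortened (nothing else needs to change):
ports:
- name: healthcheck   # port names are limited to 15 characters
  containerPort: 9090
  protocol: TCP
readinessProbe:
  httpGet:
    port: healthcheck
    scheme: HTTP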
Your current configuration is almost correct, as mentioned by @shashank-v, except for the port name.
What I would rather like to point out, apart from the name, is that it is best practice to use the same port, TCP/8080, but expose a healthz path where your application responds with "ok" or "running". Then, in your httpGet:
readinessProbe:
  httpGet:
    port: 8080
    path: /healthz
You can specify any port and path (assuming it's HTTP) for livenessProbe and readinessProbe, but, of course, you need to be serving something there.
It shouldn't be a Service port, so NodePort is not an option; it is the kubelet that is in charge of the health of the containers, and it has direct access to them.
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
Good reference:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-tcp-liveness-probe

Increase startup threshold for k8s container in v1.12

Following the documentation here, I could set the threshold for container startup like so:
startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10
Unfortunately, it seems that startupProbe is not supported by our current k8s version (1.12.1):
unknown field "startupProbe" in io.k8s.api.core.v1.Container; if you choose to ignore these errors, turn validation off with --validate=false
Is there a workaround for this? I'd like to give a container a chance of ~40+ minutes to start.
Yes, startupProbe was introduced in 1.16, so you cannot use it with Kubernetes 1.12.
I am guessing you are defining a livenessProbe, so the easiest way around your problem is to remove the livenessProbe. Most applications won't need one (some won't even need a readinessProbe). See also this excellent article: Liveness Probes are Dangerous.
If you do have a probe, you can specify initialDelaySeconds and make it some large value that is sufficient for your container to start up.
If you don't care about the probe's check at all, you can just let it execute a command that will never fail, e.g. whoami.
Take what you need from the example below:
readinessProbe:
  exec:
    command:
    - whoami
  initialDelaySeconds: 2400
  periodSeconds: 5
You could do the same config for livenessProbe if you require one.
I know this is not an answer to the question, but it can be useful...
startupProbe comes with k8s 1.16+.
If you are using Helm, you can surround your startupProbe block with this in your template:
{{- if (semverCompare ">=1.16-0" .Capabilities.KubeVersion.GitVersion) }}
startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10
{{- end }}