Following the documentation here, I could set the threshold for container startup like so:
startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10
Unfortunately, it seems like startupProbe.failureThreshold is not compatible with our current k8s version (1.12.1):
unknown field "startupProbe" in io.k8s.api.core.v1.Container; if you choose to ignore these errors, turn validation off with --validate=false
Is there a workaround for this? I'd like to give a container a chance of ~40+ minutes to start.
Yes, startupProbe was introduced with 1.16 - so you cannot use it with Kubernetes 1.12.
I am guessing you are defining a livenessProbe - so the easiest way to get around your problem is to remove the livenessProbe. Most applications won't need one (some won't even need a readinessProbe). See also this excellent article: Liveness Probes are Dangerous.
If you have a probe, you could specify initialDelaySeconds and make it some large value that is sufficient for your container to start up.
If you don't care about the probe's outcome at all, you could have it execute a command that will never fail, e.g. whoami.
Take what you need from the example below:
readinessProbe:
  exec:
    command:
      - whoami
  initialDelaySeconds: 2400
  periodSeconds: 5
You could do the same config for livenessProbe if you require one.
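For instance, a minimal sketch of the equivalent livenessProbe, reusing the same illustrative values as the readinessProbe above:
livenessProbe:
  exec:
    command:
      - whoami
  initialDelaySeconds: 2400
  periodSeconds: 5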
I know this is not an answer to this question, but it can be useful...
startupProbe comes with k8s 1.16+.
If you are using Helm, you can surround your startupProbe block with this in your template:
{{- if (semverCompare ">=1.16-0" .Capabilities.KubeVersion.GitVersion) }}
startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10
{{- end }}
I have a sample webapp and redis that I am running in Kubernetes.
I am using probes for the basic checks, as shown below.
Now I want to make sure that redis is up and running before the application.
The below code snippet is from the webapp.
When I run the command nc -zv <redis service name> 6379 it works well, but when I use it as the command in startupProbe it gives me errors. I think the way I am passing the command is not right; can someone help me understand what is wrong?
The error I get:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "nc -zv redis 6379": executable file not found in $PATH: unknown
readinessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 20
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 30
  periodSeconds: 5
startupProbe:
  exec:
    command:
      - nc -zv redis 6379
  failureThreshold: 20
  periodSeconds: 5
The command has to be entered in the proper format, since it is an array of separate arguments. The code below is in the expected format.
startupProbe:
  exec:
    command:
      - nc
      - -zv
      - redis
      - "6379"
  failureThreshold: 30
  periodSeconds: 5
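As a side note, if the image also provides a shell (an assumption, not guaranteed for minimal images), an equivalent sketch wraps the whole string in sh -c so it is passed as a single argument:
startupProbe:
  exec:
    command:
      - sh
      - -c
      - nc -zv redis 6379
  failureThreshold: 30
  periodSeconds: 5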
I am implementing a Docker Compose health check for the Prysm Docker container. Prysm is an Ethereum 2 node.
My goal is to ensure that RPC APIs (gRPC, JSON-RPC) of Prysm are up before starting other services in the same Docker Compose file, as those services depend on Prysm. I can use depends_on of Docker Compose file for this, but I need to figure out how to construct a check that checks if Prysm HTTP ports are ready to accept traffic.
The equivalent Kubernetes health check is:
readinessProbe:
  initialDelaySeconds: 180
  timeoutSeconds: 1
  periodSeconds: 60
  failureThreshold: 3
  successThreshold: 1
  httpGet:
    path: /healthz
    port: 9090
    scheme: HTTP
livenessProbe:
  initialDelaySeconds: 60
  timeoutSeconds: 1
  periodSeconds: 60
  failureThreshold: 60
  successThreshold: 1
  httpGet:
    path: /healthz
    port: 9090
    scheme: HTTP
The problem with the Prysm image is that it lacks the normal UNIX tools (curl, netcat, /bin/sh) one usually uses to create such checks.
Is there a way to implement an HTTP health check with Docker Compose that would use built-in features in compose (are there any) or commands from the host system instead of ones within the container?
I managed to accomplish this by creating another service using the Dockerize image.
version: '3'

services:

  # Oracle connects to ETH1 and ETH2 nodes
  # oracle:
  stakewise:
    container_name: stakewise-oracle
    image: stakewiselabs/oracle:v1.0.1
    # Do not start oracle service until beacon health check succeeds
    depends_on:
      beacon_ready:
        condition: service_healthy

  # ETH2 Prysm node
  beacon:
    container_name: eth2-beacon
    image: gcr.io/prysmaticlabs/prysm/beacon-chain:latest
    restart: always
    hostname: beacon-chain

  # An external startup check tool for Prysm
  # Using https://github.com/jwilder/dockerize
  # Simply wait that TCP port of RPC becomes available before
  # starting the Oracle to avoid errors on the startup.
  beacon_ready:
    image: jwilder/dockerize
    container_name: eth2-beacon-ready
    command: "/bin/sh -c 'while true ; do dockerize -wait tcp://beacon-chain:3500 -timeout 300s ; sleep 99 ; done'"
    depends_on:
      - beacon
    healthcheck:
      test: ["CMD", "dockerize", "-wait", "tcp://beacon-chain:3500"]
      interval: 1s
      retries: 999
I'm using a readinessProbe on my container and have configured it to work over HTTPS with the scheme attribute.
My server expects to get the certificates. How can I configure the readiness probe to support HTTPS with certificate exchange? I don't want it to skip the certificates.
readinessProbe:
  httpGet:
    path: /eh/heartbeat
    port: 2347
    scheme: HTTPS
  initialDelaySeconds: 210
  periodSeconds: 10
  timeoutSeconds: 5
You can use a readiness command (an exec probe) instead of an HTTP request. This gives you complete control over the check, including the certificate exchange.
So, instead of:
readinessProbe:
  httpGet:
    path: /eh/heartbeat
    port: 2347
    scheme: HTTPS
you would have something like:
readinessProbe:
  exec:
    command:
      - python
      - your_script.py
Be sure the script returns 0 if all is well, and a non-zero value on failure.
(python your_script.py is, of course, just one example. You would know what is the best approach for you)
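For instance, a minimal sketch of such a check using curl with client certificates, assuming curl is available in the image and the certificate, key, and CA bundle are mounted at the (hypothetical) paths shown:
readinessProbe:
  exec:
    command:
      - curl
      - --fail
      - --cacert
      - /etc/certs/ca.crt       # hypothetical CA bundle path
      - --cert
      - /etc/certs/client.crt   # hypothetical client certificate
      - --key
      - /etc/certs/client.key   # hypothetical client key
      - https://localhost:2347/eh/heartbeat
  initialDelaySeconds: 210
  periodSeconds: 10
  timeoutSeconds: 5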
How do I define distinct Pod ports, one for the application and another for the health check (readinessProbe)?
Is the specification for ports, shown below, a correct way to make the readinessProbe check the health check port TCP/9090? I mean, is the readinessProbe going to reach port 9090 (assuming it is open by the running container, of course)? Or does one need to specify any other port (nodePort, targetPort, port, whatever)?
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: myapp
          image: <image>
          ports:
            - name: myapp-port
              containerPort: 8080
              protocol: TCP
            - name: healthcheck-port
              containerPort: 9090
              protocol: TCP
          readinessProbe:
            httpGet:
              port: healthcheck-port
              scheme: HTTP
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 2
            failureThreshold: 2
Yes, your specification snippet is almost correct. You don't need to specify anything else to make the readiness probe work.
Port names cannot be more than 15 characters, so the name healthcheck-port won't work. You might want to change the name to something smaller like healthcheck.
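For example, a minimal sketch with only the port name shortened:
ports:
  - name: myapp-port
    containerPort: 8080
    protocol: TCP
  - name: healthcheck
    containerPort: 9090
    protocol: TCP
readinessProbe:
  httpGet:
    port: healthcheck
    scheme: HTTP
  initialDelaySeconds: 60
  timeoutSeconds: 5
  periodSeconds: 10
  successThreshold: 2
  failureThreshold: 2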
Your current configuration is almost correct, as mentioned by @shashank-v, except for the port name.
What I would rather point out here, apart from the name, is that it is best practice to use the same port, TCP/8080, but expose a healthz path where your application responds with ok or running. Then, in your httpGet:
readinessProbe:
  httpGet:
    port: 8080
    path: /healthz
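A minimal sketch of what that single-port layout could look like in the container spec, assuming the application actually serves /healthz on 8080:
containers:
  - name: myapp
    image: <image>
    ports:
      - name: myapp-port
        containerPort: 8080
        protocol: TCP
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 10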
You can specify any port and path (assuming it's http) for livenessProbe and readinessProbe, but, of course, you need to be serving something there.
It shouldn't be a Service port, so nodePort is not an option: it's the kubelet that is in charge of the health of the containers, and it has direct access to them.
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
Good reference:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-tcp-liveness-probe
Basically, I need the database deployment to spin up before the API deployment. If the database isn't running, it throws an error in the API.
I've messed with the order in artifacts: and also in:
deploy:
  kubectl:
    manifests:
      - manifests/ingress.yaml
      - manifests/postgres.yaml
      - manifests/client.yaml
      - manifests/api.yaml
But it doesn't seem to have any bearing on the order they start up.
The only thing I can think of is that it is alphabetical. I used to not have an issue: the database would start up 49 times out of 50 before the api. Now it is about the opposite. The only thing I've changed is a new computer, and I renamed the server to api, which puts it first alphabetically.
So two questions:
How is the deployment order determined in Skaffold?
Is there a way to set the order?
What I had to do was set up a readinessProbe (a livenessProbe is optional, for a continuous liveness check) in the containers section of the *.yaml files.
livenessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 2
  periodSeconds: 2
readinessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 2
  periodSeconds: 2
This watches for Django failing (i.e. it can't connect to the database), and if it does, Kubernetes keeps retrying it until it doesn't. This was the only way that I could find.
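For context, a minimal sketch of how this could sit in the API manifest; the name api and port 5000 come from the question, everything else (labels, image placeholder) is illustrative:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: <your-api-image>   # placeholder
          ports:
            - containerPort: 5000
          # The Pod is only marked Ready once something is listening on port 5000,
          # which per the answer above only happens once Django can reach the database.
          readinessProbe:
            tcpSocket:
              port: 5000
            initialDelaySeconds: 2
            periodSeconds: 2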