I have a question regarding the timeout of exec type container probes in openshift/kubernetes.
My OpenShift version is 3.11, which uses Kubernetes 1.11.
I have defined a readiness probe as shown below:
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - check_readiness.sh
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 1
According to the OpenShift documentation, the timeoutSeconds parameter has no effect on exec type container probes.
The check_readiness.sh script is long-running and may take more than 5 minutes to return.
After the container started, I logged into the container to check the status of the script.
What I found is that after approximately 2 minutes another check_readiness.sh instance was started while the first one was still running, and another one roughly 2 minutes after that.
Can someone explain what OpenShift or Kubernetes is doing with the probe in this case?
Yes, that is correct: Container Execution Checks do not support the timeoutSeconds argument. However, as the documentation notes, you can implement similar functionality with the timeout command:
[...]
readinessProbe:
  exec:
    command:
      - /bin/bash
      - '-c'
      - timeout 60 /opt/eap/bin/livenessProbe.sh
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
[...]
So in your case I am guessing the following is happening:
Your container is started.
After the duration initialDelaySeconds (30 seconds in your case), the first readiness probe is started and your script is executed.
Then, after periodSeconds (30s) the next probe is launched, in your case leading to the script being executed the second time.
Every 30s, the script is started again, even though the previous iteration(s) are still running.
So in your case you should either use the timeout command as shown in the documentation or increase periodSeconds to make sure two instances of the script are not executed simultaneously.
In general, I would recommend making sure your readiness-check script returns much faster than several minutes to avoid these kinds of problems.
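For reference, here is a sketch of the timeout wrapper applied to the probe from your question (the 25-second limit is an arbitrary placeholder; keeping it below periodSeconds prevents overlapping runs, which also means the script then has to finish within that window):

readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - timeout 25 check_readiness.sh
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 1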
I have found several people asking about how to start a container running a DB, then run a different container that runs maintenance/migration on the DB which then exits. Here are all of the solutions I've examined and what I think are the problems with each:
Init Containers - This won't work because these run before the main container is up, and they block the starting of the main container until they successfully complete.
Post Start Hook - If the postStart hook could start containers rather than simply exec a command inside the container then this would work. Unfortunately, the container with the DB does not (and should not) contain the rather large maintenance application required to run it this way. This would be a violation of the principle that each component should do one thing and do it well.
Sidecar Pattern - This WOULD work if the restartPolicy were assignable or overridable at the container level rather than the pod level. In my case the maintenance container should terminate successfully before the pod is considered Running (just like would be the case if the postStart hook could run a container) while the DB container should Always restart.
Separate Pod - Running the maintenance as a separate pod can work, but the DB shouldn't be considered up until the maintenance runs. That means managing the Running state has to be done completely independently of Kubernetes. Every other container/pod in the system will have to do a custom check that the maintenance has run rather than a simple check that the DB is up.
Using a Job - Unless I misunderstand how these work, this would be equivalent to the above ("Separate Pod").
OnFailure restart policy with a Sidecar - This means using a restartPolicy of OnFailure for the POD but then hacking the DB container so that it always exits with an error. This is doable but obviously just a hacked workaround. EDIT: This also causes problems with the state of the POD. When the maintenance runs and stays up and both containers are running, the state of the POD is Ready, but once the maintenance container exits, even with a SUCCESS (0 exit code), the state of the POD goes to NotReady 1/2.
Is there an option I've overlooked or something I'm missing about the above solutions? Thanks.
One option would be to use the Sidecar pattern with 2 slight changes to the approach you described:
After the maintenance command is executed, you keep the container running with a while : ; do sleep 86400; done command or something similar.
You set an appropriate startupProbe in place that resolves successfully only when your maintenance command is executed successfully. You could for example create a file /maintenance-done and use a startupProbe like this:
startupProbe:
  exec:
    command:
      - cat
      - /maintenance-done
  initialDelaySeconds: 5
  periodSeconds: 5
With this approach you have the following outcome:
Having the same restartPolicy for both your database and sidecar containers works fine thanks to the sleep hack.
Your Pod only becomes ready when both containers are ready. In the sidecar container's case this happens when the startupProbe succeeds.
Furthermore, there will be no noticeable overhead in your pod: even if the sidecar container keeps running, it will consume close to zero resources since it is only running the sleep command.
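To make that concrete, here is a minimal sketch of such a sidecar container spec; the container name, image, and run-maintenance.sh script are placeholders, not taken from your setup:

containers:
  - name: db-maintenance            # hypothetical sidecar name
    image: example/db-maintenance   # placeholder image
    command:
      - /bin/sh
      - -c
      # run the (assumed) maintenance script, mark success, then idle forever
      - "./run-maintenance.sh && touch /maintenance-done && while : ; do sleep 86400; done"
    startupProbe:
      exec:
        command:
          - cat
          - /maintenance-done
      initialDelaySeconds: 5
      periodSeconds: 5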
I am basically looking for mechanics similar to init containers, with one caveat: I want it to run after the pod is ready (responds to the readinessProbe, for instance). Are there any hooks that can be applied to the readinessProbe, so that it can fire a job after the first successful probe?
thanks in advance
You can use a lifecycle hook on the pod's container.
For example:
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "echo In postStart > /dev/termination-log"]
It's a postStart hook, so I think it will work.
But the postStart hook is asynchronous: it is triggered as soon as the container starts, possibly even before the container's entrypoint has run.
Update
The postStart hook above runs as soon as the container is created, not when it's Ready.
So if you are looking for the moment the container becomes Ready, you have to use a startup probe instead.
A startup probe is like the readiness and liveness probes, but it only runs during startup: it checks for application readiness, and once the application is ready the liveness probe takes its place.
Read more about startup probes.
So from the startup probe you can invoke a Job or run any kind of shell script; it runs only during startup, after your application returns 200 on the /healthz endpoint.
startupProbe:
  exec:
    command:
      - /bin/bash
      - -c
      - ./run-after-ready.sh
  failureThreshold: 30
  periodSeconds: 10
The file run-after-ready.sh in the container:
#!/bin/sh
curl -f -s -I "http://localhost/healthz" > /dev/null 2>&1 && echo OK || echo FAIL
.
. # your extra code or logic; waits, sleeps, etc. can be handled here
.
You can add more checks or conditions to the shell script, for example whether the application is ready or whether some API responds, as needed.
I don't think there is anything in vanilla k8s that can achieve this right now. However, there are 2 options to go about this:
If it is fine to retry the initialization task multiple times until it succeeds, then I would just start the task as a Job at the same time as the pod you want to initialize. This is the easiest option but might be a bit slow because of exponential backoff.
If it is critical that the initialization task only runs after the pod is ready, or if you want the task to not waste time failing and backing off a few times, then you should still run that task as a Job, but this time have it watch the pod in question using the k8s API and execute the task as soon as the pod becomes ready.
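As a rough sketch of the second option, you could use kubectl wait inside the Job rather than coding against the API directly; the Job name, label selector, image, service account, and init-task.sh are all assumptions, not part of your setup:

apiVersion: batch/v1
kind: Job
metadata:
  name: init-after-ready              # hypothetical name
spec:
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: pod-watcher   # assumed SA with RBAC to get/list/watch pods
      containers:
        - name: init-task
          image: bitnami/kubectl        # any image that ships kubectl works
          command:
            - /bin/sh
            - -c
            # block until the target pod reports Ready, then run the (assumed) task
            - kubectl wait --for=condition=Ready pod -l app=my-app --timeout=600s && ./init-task.sh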
I had to stop a job in k8s by killing the pod, and now the job is not scheduled anymore.
# Import
- name: cron-xml-import-foreman
  schedule: "*/7 * * * *"
  args:
    - /bin/sh
    - -c
    - /var/www/bash.sh; /usr/bin/php /var/www/import-products.php -->env=prod;
  resources:
    request_memory: "3Gi"
    request_cpu: "2"
    limit_memory: "4Gi"
    limit_cpu: "4"
Error:
Warning FailedNeedsStart 5m34s (x7883 over 29h) cronjob-controller
Cannot determine if job needs to be started: Too many missed start
time (> 100). Set or decrease .spec.startingDeadlineSeconds or check
clock skew.
According to the official documentation:
If startingDeadlineSeconds is set to a large value or left unset (the
default) and if concurrencyPolicy is set to Allow, the jobs will
always run at least once.
A CronJob is counted as missed if it has failed to be created at its
scheduled time. For example, If concurrencyPolicy is set to Forbid and
a CronJob was attempted to be scheduled when there was a previous
schedule still running, then it would count as missed.
And regarding the concurrencyPolicy:
It specifies how to treat concurrent executions of a job that is
created by this cron job.
Check your CronJob configuration and adjust those values accordingly.
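For reference, here is a sketch of where those fields sit in the CronJob spec; the values are placeholders to adjust for your case:

apiVersion: batch/v1beta1          # batch/v1 on Kubernetes 1.21+
kind: CronJob
metadata:
  name: cron-xml-import-foreman
spec:
  schedule: "*/7 * * * *"
  concurrencyPolicy: Forbid        # or Allow / Replace
  startingDeadlineSeconds: 200     # placeholder: how late a missed run may still start
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: import
              image: example/import   # placeholder image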
Please let me know if that helped.
I have the following setting for a readiness probe:
readinessProbe:
  httpGet:
    path: /xyzapi/health
    port: 8888
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 30
  successThreshold: 2
  failureThreshold: 5
I want this readiness probe to be hit only once. It should fail or pass my deployment based on one try only.
I did some googling, but it was not of much help. Any kube expert? Please help.
Oswin Noetzelmann's comment is spot-on. The purpose of a readiness probe is to continually check for the state of readiness. You can probably still change it so that your readiness-check script checks once and then caches that result, but it wouldn't be idiomatic use of readiness.
Better alternatives for a one-time check are: init-containers or just using a wrapper script which wraps the actual work of your main container and performs the check you want.
I think it is a good use case for init-containers. From the documentation, one of the common purposes of init containers is:
They run to completion before any app Containers start, whereas app
Containers run in parallel, so Init Containers provide an easy way to
block or delay the startup of app Containers until some set of
preconditions are met.
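For illustration, here is a minimal sketch of the init-container approach for a one-time precondition check; the image and the dependency URL are placeholders (note that an init container cannot probe the main container of its own pod, since that has not started yet):

initContainers:
  - name: one-time-check           # hypothetical name
    image: curlimages/curl         # any small image with curl and a shell works
    command:
      - sh
      - -c
      # runs exactly once before the app containers start; a non-zero exit blocks the pod
      - curl -f -s http://some-dependency:8888/xyzapi/health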
I want systemd to start a script and retry a maximum of 5 times, 30s apart.
Reading the systemd.service manual and searching the Internet didn't produce any obvious answers.
To allow a maximum of 5 retries separated by 30 seconds use the following options in the relevant systemd service file.
[Unit]
StartLimitInterval=200
StartLimitBurst=5
[Service]
Restart=always
RestartSec=30
This worked for a service that runs a script using Type=idle. Note that StartLimitInterval must be greater than RestartSec * StartLimitBurst, otherwise the service will be restarted indefinitely (with the values above: 200 > 30 * 5 = 150). The service is considered failed once it has been restarted StartLimitBurst times within StartLimitInterval.
See https://www.freedesktop.org/software/systemd/man/systemd.unit.html#StartLimitIntervalSec=interval and https://www.freedesktop.org/software/systemd/man/systemd.service.html#RestartSec=