Incorrect liveness probe for Redis not failing - kubernetes

I have configured a liveness probe for my Redis instances that checks whether Redis can retrieve keys before the instance is considered 'alive'.
livenessProbe:
  initialDelaySeconds: 20
  periodSeconds: 10
  exec:
    command:
      {{- include "liveness_probe" . | nindent 16 }}
_liveness.tpl
{{/* Liveness probe script. */}}
{{- define "liveness_probe" -}}
- "redis-cli"
- "set"
- "liveness_test_key"
- "\"SUCCESS\""
- "&&"
- "redis-cli"
- "get"
- "liveness_test_key"
- "|"
- "awk"
- "'$1 != \"SUCCESS\" {exit 1}'"
{{- end }}
The pod is able to start after adding this probe. However, I would like to make sure that the probe is actually working as expected, so I added a delete command before the get command.
{{/* Liveness probe script. */}}
{{- define "liveness_probe" -}}
- "redis-cli"
- "set"
- "liveness_test_key"
- "\"SUCCESS\""
- "&&"
- "redis-cli"
- "del"
- "liveness_test_key"
- "&&"
- "redis-cli"
- "get"
- "liveness_test_key"
- "|"
- "awk"
- "'$1 != \"SUCCESS\" {exit 1}'"
{{- end }}
I get the expected exit codes when I execute this command directly in my command prompt.
But my pod still starts and the probe never fails.
Is the liveness probe command I am using okay? If so, how do I verify this?

Try this for your liveness probe; it works fine, and you can use the same approach for a readinessProbe:
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: redis
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: redis
    spec:
      containers:
        - image: redis
          name: redis
          livenessProbe:
            exec:
              command:
                - sh
                - -c
                - |
                  #!/usr/bin/env bash -e
                  #export REDISCLI_AUTH="$REDIS_PASSWORD"
                  set_response=$(
                    redis-cli set liveness_test_key "SUCCESS"
                  )
                  del_response=$(
                    redis-cli del liveness_test_key
                  )
                  response=$(
                    redis-cli get liveness_test_key
                  )
                  if [ "$response" != "SUCCESS" ] ; then
                    echo "Unable to get keys, something is wrong"
                    exit 1
                  fi
            initialDelaySeconds: 5
            periodSeconds: 5
status: {}
You will need to adjust these values in your template.
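If you want to keep your existing {{- include "liveness_probe" . | nindent 16 }} call, the same idea can live in the helper itself. The reason the original list form never fails is that exec probes are not run through a shell, so "&&", "|" and "awk" are handed to redis-cli as literal arguments (the same point is made for curl further down this page). A minimal sketch of such a helper, assuming redis-cli is on the PATH inside the container and no AUTH is required:
{{/* Liveness probe script: set a key, then verify it can be read back. */}}
{{- define "liveness_probe" -}}
- sh
- -c
- |
  redis-cli set liveness_test_key "SUCCESS" &&
  [ "$(redis-cli get liveness_test_key)" = "SUCCESS" ]
{{- end }}
Because everything runs inside a single sh -c invocation, the exit code the kubelet sees is non-zero whenever the get does not return SUCCESS.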

I think you're confusing livenessProbe with readinessProbe. livenessProbe tells Kubernetes to restart your pod if your command returns a non-zero exit code; it starts being executed after the period specified in initialDelaySeconds: 20.
readinessProbe, on the other hand, is what decides whether a pod is in the Ready state to accept traffic or not.
readinessProbe:
  initialDelaySeconds: 20
  periodSeconds: 10
  exec:
    command:
      {{- include "liveness_probe" . | nindent 16 }}
They can also be used together if you need both.
Please check this page of the Kubernetes documentation, where livenessProbe, readinessProbe and startupProbe are explained.
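As for how to verify that a liveness probe really does something: when the probe fails failureThreshold times in a row (3 by default), the kubelet restarts the container, and you can observe that without any extra tooling. A rough sketch of the checks, where redis-0 is just a placeholder for your actual pod name:
# The RESTARTS column increments each time the liveness probe triggers a restart.
kubectl get pods -w

# Individual probe failures show up as Warning/Unhealthy entries under Events.
kubectl describe pod redis-0
If the restart counter never moves even though the probe command should fail (for example after adding the del step), the probe command itself is not doing what you expect.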

Related

kubernetes cache clear and handling

I am using Kubernetes with Helm 3.8.0 and Windows Docker Desktop configured on WSL2.
Sometimes, after running helm install, the container that is created behind the scenes is an old container that was created before (even after restarting the computer).
For example: the YAML now declares password: 12345 and database: test. Previously I ran the container YAML with password: 11111 and database: my_database.
Now when I run helm install mychart ./mychart --namespace test-chart --create-namespace for the chart in the current folder, the container runs with password: 11111 and database: my_database instead of the new parameters. No current YAML contains the old password, so I don't understand why the container runs with the old one.
I tried several things, such as docker system prune and restarting Windows Docker Desktop, but I still get the old container, which cannot be seen even in Windows Docker Desktop (I have enabled the option in Settings -> Kubernetes -> Show system containers).
After some investigation, I realized this may be because Kubernetes has its own garbage collection handling for containers, and that is why I may be referring to an old container even though I didn't mean to.
In my case, I am creating a job template (I didn't put any line that references this job in the _helpers.tpl file; I never changed that file, and I don't know whether that may cause a problem).
Here is my job template:
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myChart.fullname" . }}-migration
  labels:
    name: {{ include "myChart.fullname" . }}-migration
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-300"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 1
  template:
    metadata:
      labels:
        app: {{ template "myChart.name" . }}
        release: {{ .Release.Namespace }}
    spec:
      initContainers:
        - name: wait-mysql
          image: {{ .Values.mysql.image }}
          imagePullPolicy: IfNotPresent
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "12345"
            - name: MYSQL_DATABASE
              value: test
          command:
            - /bin/sh
            - -c
            - |
              service mysql start &
              until mysql -uroot -p12345 -e 'show databases'; do
                echo `date +%H:%M:%S`' - Waiting for mysql...'
                sleep 5
              done
      containers:
        - name: migration
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command: {{- toYaml .Values.image.entrypoint | nindent 12 }}
          args: {{- toYaml .Values.image.cmd | nindent 12 }}
      restartPolicy: Never
In the job, a database is first created and then populated with data by code.
Also, are the annotations (hooks) necessary?
After running helm install myChart ./myChart --namespace my-namespace --create-namespace, I realized that I am using a very old container, which I don't really need.
I didn't understand whether writing the metadata as in the following example (from the Garbage Collection docs) would really help, and what to put in uid when I don't know it or don't have it.
metadata:
  ...
  ownerReferences:
    - apiVersion: extensions/v1beta1
      controller: true
      blockOwnerDeletion: true
      kind: ReplicaSet
      name: my-repset
      uid: d9607e19-f88f-11e6-a518-42010a800195
Sometimes I really want to reference an existing pod (or container) from several templates (use the same container, which is not stateless, such as a database container: one template for the pod and another for the job). How can I do that as well?
Is there any command (on the command line, or some other method) that clears everything cached by garbage collection, or a way to not use garbage collection at all? (What are the main benefits of Kubernetes garbage collection?)

livenessProbe seems not to be executed

A container defined inside a deployment has a livenessProbe set up: by definition, it calls a remote endpoint and checks whether the response contains useful information or is empty (which should trigger a restart of the pod).
The whole definition is as follows (I removed the other checks for clarity):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fc-backend-deployment
  labels:
    name: fc-backend-deployment
    app: fc-test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: fc-backend-pod
      app: fc-test
  template:
    metadata:
      name: fc-backend-pod
      labels:
        name: fc-backend-pod
        app: fc-test
    spec:
      containers:
        - name: fc-backend
          image: localhost:5000/backend:1.3
          ports:
            - containerPort: 4044
          env:
            - name: NODE_ENV
              value: "dev"
            - name: REDIS_HOST
              value: "redis"
          livenessProbe:
            exec:
              command:
                - curl -X GET $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v3/stats | head -c 30 > /app/out.log
            initialDelaySeconds: 20
            failureThreshold: 12
            periodSeconds: 10
I also tried putting the command into an array:
command: ["sh", "-c", "curl -X GET $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v3/stats", "|", "head", "-c", "30", ">", "/app/out.log"]
and splitting into separate lines:
- /bin/bash
- -c
- curl
- -X
- GET
- $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v3/stats
- |
- head
- -c
- "30"
- >
- /app/out.log
and even like this:
command:
  - |
    curl -X GET $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v3/stats | head -c 30 > /app/out.log
All attempts were made with and without (/bin/ba)sh -c - with the same result.
But, as you're reading this, you already know that none of these worked.
I know this from exec'ing into the running container and looking for the /app/out.log file: it was never present, no matter when I checked the directory contents. It looks like the probe never gets executed.
The same command run inside the running container works just fine: data gets fetched and written to the specified file.
What might be causing the probe not to get executed?
When using exec probes, Kubernetes does not run a shell to process the command; it just runs the command directly. This means that you can only use a single command and that the | character is treated as just another argument to your curl.
To solve the problem, you need to use sh -c to execute shell code, something like the following:
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - >-
        curl -X GET $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v3/stats |
        head -c 30 > /app/out.log
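One caveat worth adding (it is not part of the answer above): in a pipeline, the exit code sh -c reports is that of the last command, here head, so the probe shown never fails just because curl could not reach the endpoint. If the goal is to restart the pod on HTTP errors, a variant along these lines may be closer to what you want; -f (fail on HTTP error status codes), -sS and -o are standard curl options, and the head -c 30 truncation is dropped for simplicity:
livenessProbe:
  exec:
    command:
      - sh
      - -c
      # curl -f exits non-zero on HTTP errors, -sS keeps the output quiet but
      # still reports failures, -o writes the response body to a file.
      - >-
        curl -fsS -o /app/out.log
        "$BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT/api/v3/stats"
  initialDelaySeconds: 20
  failureThreshold: 12
  periodSeconds: 10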

Keep getting error status on Kubernetes Cron Job with connection refused?

I am trying to write a cron job which hits a REST endpoint of the application whose image it pulls.
Below is the sample code:
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: {{ .Chart.Name }}-cronjob
  labels:
    app: {{ .Release.Name }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    release: {{ .Release.Name }}
spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  startingDeadlineSeconds: 1800
  jobTemplate:
    spec:
      template:
        metadata:
          name: {{ .Chart.Name }}-cronjob
          labels:
            app: {{ .Chart.Name }}
        spec:
          restartPolicy: OnFailure
          containers:
            - name: demo
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
              command: ["/bin/sh", "-c", "curl http://localhost:8080/hello"]
              readinessProbe:
                httpGet:
                  path: "/healthcheck"
                  port: 8081
                initialDelaySeconds: 300
                periodSeconds: 60
                timeoutSeconds: 30
                failureThreshold: 3
              livenessProbe:
                httpGet:
                  path: "/healthcheck"
                  port: 8081
                initialDelaySeconds: 300
                periodSeconds: 60
                timeoutSeconds: 30
                failureThreshold: 3
              resources:
                requests:
                  cpu: 200m
                  memory: 2Gi
                limits:
                  cpu: 1
                  memory: 6Gi
  schedule: "*/5 * * * *"
But I keep running into curl: (7) Failed to connect to localhost port 8080: Connection refused.
I can see from the events that it creates the container and immediately throws: Back-off restarting failed container.
I already have pods of the demo app running and they work fine; it is only when I try to point at this existing app and hit a REST endpoint that I start running into connection refused errors.
Exact output when looking at the logs:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8080: Connection refused
Event Logs:
Container image "wayfair/demo:728ac13-as_test_cron_job" already present on machine
9m49s Normal Created pod/demo-cronjob-1619108100-ndrnx Created container demo
6m17s Warning BackOff pod/demo-cronjob-1619108100-ndrnx Back-off restarting failed container
5m38s Normal SuccessfulDelete job/demo-cronjob-1619108100 Deleted pod: demo-cronjob-1619108100-ndrnx
5m38s Warning BackoffLimitExceeded job/demo-cronjob-1619108100 Job has reached the specified backoff limit
Being new to K8s, any pointers are helpful!
You are trying to connect to localhost:8080 with your curl, which doesn't make sense from what I understand of your CronJob definition.
From the docs (at https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#define-a-command-and-arguments-when-you-create-a-pod )
The command and arguments that you define in the configuration file
override the default command and arguments provided by the container
image. If you define args, but do not define a command, the default
command is used with your new arguments.
Note: The command field corresponds to entrypoint in some container
runtimes. Refer to the Notes below.
If you define a command for the image, then even if the image would start a REST application on port 8080 on localhost with its default entrypoint (or command, depending on the container type you are using), your command overrides the entrypoint and no application is started.
If you need to both start the application and then perform other operations, like curls and so on, I suggest using a .sh script or something like that, depending on what the Job's objective is.
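For illustration only, a rough sketch of such a script; the start command, port and endpoint are placeholders that depend on how the demo image actually works:
#!/bin/sh
# Hypothetical wrapper for the CronJob container: start the app, wait for it,
# then perform the request the job exists for.
/app/start.sh &                                   # placeholder for the image's normal start command

until curl -sSf http://localhost:8080/hello > /dev/null; do
  sleep 2                                         # keep waiting until the app answers
done

curl -sSf http://localhost:8080/hello             # the exit code of this call decides job success
Alternatively, if the application is already running in other pods behind a Service, the CronJob container can simply curl that Service's DNS name instead of localhost and does not need to start anything itself.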

Pass current date to kubernetes cronjob

I have a Docker image that receives an env var named SINCE_DATE.
I have created a cronjob to run that container and I want to pass it the current date.
How can I do it?
Trying this, I get the literal string date -d "yesterday 23:59"
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cron
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: my-cron
              image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
              imagePullPolicy: {{ .Values.image.pullPolicy }}
              env:
                - name: SINCE_DATE
                  value: $(date -d "yesterday 23:59")
You could achieve it by overwriting the container's entrypoint command and setting the environment variable there.
In your case it would look like this:
containers:
  - name: my-cron
    image: nginx
    #imagePullPolicy: {{ .Values.image.pullPolicy }}
    command:
      - bash
      - -c
      - |
        export SINCE_DATE=`date -d "yesterday 23:59"`
        exec /docker-entrypoint.sh
Note:
The nginx docker-entrypoint.sh is located in /. If your image keeps it at a different path, use that instead, for example exec /usr/local/bin/docker-entrypoint.sh.
A very similar use case can be found in this Stack Overflow question.
What does this solution do?
It overwrites the default script set in the container ENTRYPOINT with the same script, but sets the environment variable dynamically beforehand.
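If you prefer not to hardcode the entrypoint path, a slightly more generic variant of the same trick passes the original command via args and hands over to it with exec "$@". This is only a sketch; my-image and /docker-entrypoint.sh are placeholders for your actual image and its start command:
containers:
  - name: my-cron
    image: my-image                      # placeholder
    command:
      - sh
      - -c
      # Compute SINCE_DATE when the container starts, then exec the command
      # supplied in args so it keeps running as the main process.
      - |
        export SINCE_DATE="$(date -d "yesterday 23:59")"
        exec "$@"
      - --
    args: ["/docker-entrypoint.sh"]      # placeholder: the image's real command
Keep in mind that the yesterday 23:59 syntax needs GNU date; minimal busybox-based images may not understand it.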
I solved the same problem recently using KubeMod, which patches resources as they are created/updated in K8S. It is nice for this use case since it requires no modification to the original job specification.
In my case I needed to insert a date into the middle of a previously existing string in the spec, but it's the same concept.
For example, this matches a specific job by regex, and alters the second argument of the first container in the spec.
apiVersion: api.kubemod.io/v1beta1
kind: ModRule
metadata:
  name: 'name-of-your-modrule'
  namespace: default
spec:
  type: Patch
  match:
    - select: '$.metadata.name'
      matchRegex: 'regex-that-matches-your-job-name'
    - select: '$.kind'
      matchValue: 'Job'
  patch:
    - op: replace
      path: '/spec/template/spec/containers/0/args/1'
      select: '$.spec.template.spec.containers[0].args[1]'
      value: '{{ .SelectedItem | replace "Placeholder Value" (cat "The time is" (now | date "2006-01-02T15:04:05Z07:00")) | squote }}'

What would be the opa policy in .rego for the following examples?

I am new to OPA and K8s and don't have much knowledge or experience in this field. I would like to have the policies in Rego code (OPA policies) and execute them to see the result.
The examples are the following:
Always Pull Images - Ensure every container sets its ‘imagePullPolicy’ to ‘Always’
Check for Liveness Probe - Ensure every container sets a livenessProbe
Check for Readiness Probe - Ensure every container sets a readinessProbe
For the following, I would like to have an OPA policy:
1. Always Pull Images:
apiVersion: v1
kind: Pod
metadata:
  name: test-image-pull-policy
spec:
  containers:
    - name: nginx
      image: nginx:1.13
      imagePullPolicy: IfNotPresent
2. Check for Liveness Probe
3. Check for Readiness Probe
containers:
  - name: opa
    image: openpolicyagent/opa:latest
    ports:
      - name: http
        containerPort: 8181
    args:
      - "run"
      - "--ignore=.*"   # exclude hidden dirs created by Kubernetes
      - "--server"
      - "/policies"
    volumeMounts:
      - readOnly: true
        mountPath: /policies
        name: example-policy
    livenessProbe:
      httpGet:
        scheme: HTTP    # assumes OPA listens on localhost:8181
        port: 8181
      initialDelaySeconds: 5   # tune these periods for your environment
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /health?bundle=true   # include bundle activation in readiness
        scheme: HTTP
        port: 8181
      initialDelaySeconds: 5
      periodSeconds: 5
Is there any way to create OPA policies for the above conditions? Could anyone help, as I am new to OPA? Thanks in advance.
For the liveness and readiness probe checks, you can simply test if those fields are defined:
package kubernetes.admission

deny["container is missing livenessProbe"] {
  container := input_container[_]
  not container.livenessProbe
}

deny["container is missing readinessProbe"] {
  container := input_container[_]
  not container.readinessProbe
}

input_container[container] {
  container := input.request.object.spec.containers[_]
}
# Always Pull Images
package kubernetes.admission

deny[msg] {
  input.request.kind.kind = "Pod"
  container = input.request.object.spec.containers[_]
  container.imagePullPolicy != "Always"
  msg = sprintf("Forbidden imagePullPolicy value \"%v\"", [container.imagePullPolicy])
}
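If you only want to execute these policies and see the result, without a cluster, one option (not covered above) is to evaluate them locally with the OPA CLI against a hand-written AdmissionReview-style input. A rough sketch, assuming the rules above are saved in policy.rego under a single package declaration and input.json looks like the Pod request from the question:
# input.json (hypothetical, trimmed to the fields the rules inspect):
# {"request": {"kind": {"kind": "Pod"},
#              "object": {"spec": {"containers": [
#                {"name": "nginx", "image": "nginx:1.13",
#                 "imagePullPolicy": "IfNotPresent"}]}}}}
opa eval --format pretty --data policy.rego --input input.json 'data.kubernetes.admission.deny'
With that input, the output should contain the imagePullPolicy violation as well as the messages about the missing livenessProbe and readinessProbe.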