How to put a health check in a Deployment manifest? - kubernetes

I'm still learning Kubernetes and, having started with Pods, I'm moving on to Deployment configuration.
On Pods I like to put health checks; here's an example using Spring Boot's actuator:
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 60
  timeoutSeconds: 1
  periodSeconds: 30
  failureThreshold: 3
The problem is that the above configuration only works for Pods. How can I use it in my Deployment?

The Deployment will create a ReplicaSet, and the ReplicaSet will maintain your Pods.
Liveness and readiness probes are configured at the container level, and a Pod is considered ready when all of its containers are ready.
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
The Spring actuator health check API is part of your application, which is bundled in a container.
Kubernetes checks the liveness and readiness probes of each container in a Pod; if a liveness probe keeps failing past the configured threshold, the container is killed and restarted, while a failing readiness probe takes the Pod out of Service endpoints.
Setting a probe at the Deployment level wouldn't make sense, since you can have multiple Pods running under the same Deployment and you wouldn't want to kill healthy Pods just because one of them is unhealthy.
A Deployment manifest using the same Pod configuration would look something like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: liveness-deployment
  labels:
    app: liveness
spec:
  replicas: 3
  selector:
    matchLabels:
      app: liveness
  template:
    metadata:
      labels:
        app: liveness
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
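To tie this back to the question: the Spring Boot actuator probe is configured on the container inside the Deployment's Pod template, exactly as it was written for the bare Pod. A sketch, assuming a placeholder image name and that the app listens on port 8080:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spring-app
  template:
    metadata:
      labels:
        app: spring-app
    spec:
      containers:
      - name: spring-app
        image: my-spring-app:latest  # placeholder image
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 60
          timeoutSeconds: 1
          periodSeconds: 30
          failureThreshold: 3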

Related

Why would k8s wait for all containers in a pod to start before restarting a failed/exited one?

No documentation mentions this behaviour, and I find it very peculiar that k8s won't restart a failed container in a pod before all containers have started. I'm using a sidecar to the main container. The latter needs to restart itself at pod startup. After that, the sidecar will send some requests to the main container and continue to serve traffic from then on.
However, this all gets stuck with the first container not being restarted, i.e. the startup/liveness/readiness probes never kick in. Thus my questions are:
Why does this happen?
Where is it documented?
Can I circumvent this behaviour (i.e. make k8s restart my main container without decoupling the 2 containers into 2 distinct pods)?
Here's a small deployment yaml to illustrate the issue:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      restartPolicy: Always
      containers:
      - name: nginx
        image: nginx:1.14.2
        livenessProbe:
          tcpSocket:
            port: 80
        startupProbe:
          tcpSocket:
            port: 80
        command:
        - bash
        - -c
        - echo exit 1; exit 1
      - name: nginx2
        image: nginx:1.14.2
        lifecycle:
          postStart:
            exec:
              command:
              - bash
              - -c
              - while true; do sleep 1; echo .; done
I expect the restart counters to increase:
$ k describe pod -l app=nginx | grep Restart
Restart Count: 0
Restart Count: 0
What makes this annoying is the fact that k8s won't publish container stdout logs until the whole pod starts:
$ k logs --all-containers -l app=nginx
Error from server (BadRequest): container "nginx" in pod "nginx-test-cd5c64644-b48hj" is waiting to start: ContainerCreating
My real-life example is a Percona (cluster) node with a ProxySQL sidecar. FWIW, all containers have "proper" liveness/readiness/startup probe checks.

Kubernetes CronJob startup probe

I'm starting with Kubernetes and I implemented a CronJob that runs a Java jar.
It works fine, but I have observed that if for some reason (for example, a wrong secret key) the container does not start, the pod sits there indefinitely with the error status CreateContainerConfigError.
Is there a way to automatically kill the pod when such a situation occurs?
I tried a startup probe, as shown in the code below, but the probe did not even run.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: appName
  namespace: appNamespace
  labels:
    app: appName
    release: production
    tiers: backend
spec:
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          volumes:
          - name: tmp-pod
            emptyDir: {}
          containers:
          - name: appName
            image: docker-image
            command: ["/bin/bash", "-c"]
            args:
            - |
              touch /tmp/pod/app-started;
              java -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/urandom -jar /app.jar;
            volumeMounts:
            - mountPath: /tmp/pod
              name: tmp-pod
            env:
            - name: env_var
              value: value
            # if app is not started within 5m (30 * 10 = 300s), container will be killed.
            startupProbe:
              exec:
                command:
                - cat
                - /tmp/pod/app-started
              initialDelaySeconds: 5
              periodSeconds: 10
              failureThreshold: 30
            resources:
              requests:
                memory: "2200Mi"
                cpu: "750m"
              limits:
                memory: "2200Mi"
          restartPolicy: OnFailure
  schedule: "0 12 * * *"
  concurrencyPolicy: Forbid
Do CronJobs not support probes?
Or am I doing something wrong?
Is there another way of killing a container that is not able to start after some time?
I had the same issue on Kubernetes v1.23.14.
It seems that Kubernetes has a bug with CronJob probes:
Events:
  Normal   Started    3m38s  kubelet  Started container k8s-costs
  Warning  Unhealthy  3m38s  kubelet  Startup probe failed: cat: /var/run/secrets/vault-envs: No such file or directory
  Normal   Killing    3m37s  kubelet  Stopping container k8s-costs
The kubelet said "Stopping container", but the container kept running.
As a result, the pod ends up as Succeeded and everything looks fine, with no alerts.
But in reality the job execution failed, and we were already on fire.
You can use a liveness probe to automatically kill the container if the application inside it fails to start. A liveness probe periodically checks the container, and if the check keeps failing, the container is killed and restarted.
Here is an example of how to add a liveness probe to your pod definition:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: appName
  namespace: appNamespace
  labels:
    app: appName
    release: production
    tiers: backend
spec:
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          volumes:
          - name: tmp-pod
            emptyDir: {}
          containers:
          - name: appName
            image: docker-image
            command: ["/bin/bash", "-c"]
            args:
            - |
              touch /tmp/pod/app-started;
              java -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/urandom -jar /app.jar;
            volumeMounts:
            - mountPath: /tmp/pod
              name: tmp-pod
            env:
            - name: env_var
              value: value
            livenessProbe:
              exec:
                command:
                - cat
                - /tmp/pod/app-started
              initialDelaySeconds: 5
              periodSeconds: 10
              failureThreshold: 30
            resources:
              requests:
                memory: "2200Mi"
                cpu: "750m"
              limits:
                memory: "2200Mi"
          restartPolicy: OnFailure
  schedule: "0 12 * * *"
  concurrencyPolicy: Forbid
In the above example, the liveness probe exec command checks for the presence of the file /tmp/pod/app-started. If the file does not exist after the initial delay of 5 seconds, the probe will check every 10 seconds until the failure threshold of 30 is reached. At that point, the container will be killed and restarted.
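One caveat: probes only run once the container process has actually started, so they never fire for a pod stuck in CreateContainerConfigError. As a backstop for that case, a Job-level activeDeadlineSeconds (not part of the answer above; the value here is purely illustrative) terminates the Job's pods and marks the Job failed after a fixed amount of time, regardless of probe state. A minimal sketch of just the relevant part of the jobTemplate:
spec:
  jobTemplate:
    spec:
      # fail the Job and terminate its pods if it has not completed within 10 minutes
      activeDeadlineSeconds: 600
      backoffLimit: 2
      template:
        spec:
          # ... containers, volumes, etc. as above ...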

How to ping DaemonSet in kubernetes

I've set up a Kubernetes DaemonSet manifest for handling metrics in my project, and I'm having a little trouble pinging this DaemonSet. So my eventual question is: if the DaemonSet runs a pod on every node, how can I ping a specific one (or just one of them) in order to make sure that everything is okay with it? Most likely using curl, but I would also consider other ways if possible.
Example of the DaemonSet I have:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metrics-api-service
spec:
  selector:
    matchLabels:
      app: api-metrics-app
  template:
    metadata:
      labels:
        app: api-metrics-app
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: api-metrics-service
        image: image
        livenessProbe:
          exec:
            < what I need to put here In order to ping the DaemonSet ? >
          initialDelaySeconds: 60
        resources:
          limits:
            cpu: "0.5"
            memory: "500Mi"
          requests:
            cpu: "0.4"
            memory: "200Mi"
Thanks
The health check should be enough to tell you whether it's working or not, but if you still want to confirm it from outside, make sure your DaemonSet exposes the port and the security group allows internal traffic; then you can ping/curl the same node where the DaemonSet pod is running. Every DaemonSet pod can get its node's IP as an environment variable.
Pass the host IP to the DaemonSet as an environment variable:
env:
- name: HOST_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
and then update the liveness probe accordingly
livenessProbe:
  exec:
    command:
    - sh
    - -c
    # -f makes curl return a non-zero exit code on HTTP errors, so the probe actually fails
    - curl -sf $HOST_IP:PORT/healthcheck
I would recommend an HTTP check over an exec/bash one:
livenessProbe:
  initialDelaySeconds: 100
  periodSeconds: 10
  timeoutSeconds: 5
  httpGet:
    path: /metrics
    port: 3000
If /metrics is working, it just needs to return a 200 status code.
One way to do this would be to look up your Pods' IP addresses, then query each of them:
for i in $(kubectl get pods -n logging \
    -l app=api-metrics-app \
    -o jsonpath='{.items[*].status.podIP}')
do
    curl http://$i:4242/metrics
done

Kubernetes pod never ready and service endpoint empty

I want to deploy an ASP.NET server with Kubernetes:
Deployment with my docker image (ASP.NET server)
Service to expose the pods
Nginx ingress controller
Ingress to access my pods from "outside"
For this issue, we can ignore all the Ingress YAML.
When I deploy my Deployment and my Service I have a problem: my pod is never ready and my Service's endpoints field is empty.
NAME READY STATUS RESTARTS AGE
pod/server-deployment-bd4977bf5-n7gmx 0/1 Running 36 (41s ago) 147m
When I run "kubectl logs pod/server-deployment-bd4977bf5-n7gmx" there are no logs related to this issue
Microsoft.Hosting.Lifetime[14]
Now listening on: http://[::]:80
Microsoft.Hosting.Lifetime[14]
Now listening on: https://[::]:403
Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
Microsoft.Hosting.Lifetime[0]
Hosting environment: Production
Microsoft.Hosting.Lifetime[0]
Content root path: /app/
Microsoft.Hosting.Lifetime[0]
Application is shutting down...
When I run "kubectl describe service/server-svc" I see that the "Endpoints:" field is empty.
After some research on Stack Overflow and other sites, I didn't find any solution or explanation for my problem. From what I read, I know that a Service's endpoints field shouldn't be empty and that my pods might have a problem with the readinessProbe.
Below is the .yaml of my Deployment and Service
Deployment :
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server-app
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: server-app
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - name: server-container
        image: server:0.0.2
        imagePullPolicy: Always
        command: ["dotnet", "server.dll"]
        envFrom:
        - configMapRef:
            name: server-configmap
            optional: false
        - secretRef:
            name: server-secret
            optional: false
        ports:
        - name: http
          containerPort: 443
          hostPort: 443
        livenessProbe:
          httpGet:
            path: /api/health/live
            port: http
          initialDelaySeconds: 10
          periodSeconds: 20
          timeoutSeconds: 1
          failureThreshold: 6
          successThreshold: 1
        readinessProbe:
          httpGet:
            path: /api/health/ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 20
          timeoutSeconds: 1
          failureThreshold: 6
          successThreshold: 1
        volumeMounts:
        - name: server-pfx-volume
          mountPath: "/https"
          readOnly: true
      volumes:
      - name: server-pfx-volume
        secret:
          secretName: server-pfx
Service:
apiVersion: v1
kind: Service
metadata:
  name: server-svc
spec:
  type: ClusterIP
  selector:
    app: server-app
  ports:
  - name: http
    protocol: TCP
    port: 443
    targetPort: 443
When I run "kubectl get pods --show-labels" I got the pod with the correct label
NAME READY STATUS RESTARTS AGE LABELS
server-deployment-bd4977bf5-n7gmx 0/1 CrashLoopBackOff 38 (74s ago) 158m app=server-app,pod-template-hash=bd4977bf5
So I'm looking for help to figure out why my pod is never ready and why my Service's endpoints field is empty.

Kubernetes health check outside container

Can I do a liveness- or readiness-style health check from outside the container? I mean, can I stop traffic to pods and restart containers if the application is not accessible?
The HTTP request liveness probe and the TCP liveness probe can be used to check from outside the container (the kubelet performs the check from the node) whether the application running inside it is reachable:
pods/probe/http-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
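The TCP variant mentioned above works the same way, except that the kubelet simply tries to open a TCP connection to the container on the given port and treats a successful connection as a passing check. A minimal sketch, with a placeholder image and port:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-tcp
spec:
  containers:
  - name: liveness
    image: my-tcp-app:latest  # placeholder image listening on port 8080
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20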
See this piece of documentation on configuring probes. Does that answer your question?