Currently it takes quite a long time for the pod to terminate after a kubectl delete command. I suspect this is because of the sleep command.
How can I make the container stop faster?
What best practices should I use here?
apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    spec:
      containers:
      - image: alpine
        ..
        command:
        - /bin/sh
        - -c
        - |
          trap : TERM INT
          while true; do
            # some code to check something
            sleep 10
          done
Is my approach with "trap : TERM INT" correct? At the moment I don't see any positive effect...
When I delete the pod, it takes several seconds for the kubectl command to return.
kubectl delete pod my-pod
Adding terminationGracePeriodSeconds to your spec will do it:
...
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 10 # <-- default is 30; 0 skips the grace period and the pod is killed immediately.
      containers:
      - image: alpine
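On the trap itself: with trap : TERM INT the shell catches SIGTERM but does nothing with it, and it only evaluates the trap after the current foreground sleep 10 finishes, so the pod effectively waits until the grace period expires and it is force-killed. A minimal sketch of your loop, assuming the goal is simply to exit on TERM/INT, traps an explicit exit and backgrounds the sleep so the signal is handled immediately:
command:
- /bin/sh
- -c
- |
  # exit cleanly as soon as TERM or INT arrives instead of ignoring it
  trap 'exit 0' TERM INT
  while true; do
    # some code to check something
    sleep 10 &
    # wait returns as soon as a trapped signal is received
    wait $!
  done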
I'm running a pod with 3 containers (telegraf, fluentd and an in-house agent) that makes use of shareProcessNamespace: true.
I've written a python script to fetch the initial config for telegraf and fluentd from a central controller API endpoint. Since this is a one-time operation, I plan to use a helm post-install hook.
apiVersion: batch/v1
kind: Job
metadata:
  name: agent-postinstall
  annotations:
    "helm.sh/hook-weight": "3"
    "helm.sh/hook": "post-install"
spec:
  template:
    spec:
      containers:
      - name: agent-postinstall
        image: "{{ .Values.image.agent.repository }}:{{ .Values.image.agent.tag | default .Chart.AppVersion }}"
        imagePullPolicy: IfNotPresent
        command: ['python3', 'getBaseCfg.py']
        volumeMounts:
        - name: config-agent-volume
          mountPath: /etc/config
      volumes:
      - name: config-agent-volume
        configMap:
          name: agent-cm
      restartPolicy: Never
  backoffLimit: 1
The python script needs to check that the telegraf/fluentd/agent processes are up before getting the config. I intend to wait (with a timeout) until pgrep <telegraf/fluentd/agent> returns true, and then fire the APIs. Is there a way to enable shareProcessNamespace for the post-install hook as well? Thanks.
PS: Currently, the agent calls the python script along with its own startup script. It works, but it is kludgy. I'd like to move it out of the agent container.
shareProcessNamespace
The most important aspect of this flag is that it works only within a single pod: all containers in that pod share processes with each other.
The described approach uses a Job, and a Job creates a separate pod, so it won't work this way. A container must be part of the "main" pod, alongside all the other containers, to have access to that pod's running processes.
More details about process sharing.
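For reference, a minimal sketch of where the flag lives in a pod spec (the pod and image names are just placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: agent-pod
spec:
  shareProcessNamespace: true   # all containers in this pod see each other's processes
  containers:
  - name: telegraf
    image: telegraf
  - name: fluentd
    image: fluentd
  - name: agent
    image: my-agent:latest      # placeholder for the in-house agent image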
A possible way to solve it
It's possible to check the processes in the containers directly, using the kubectl command.
Below is an example of how to check the state of the processes using the pgrep command. The container referenced by pgrepContainer needs to have pgrep installed.
job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-postinstall-hook"
  annotations:
    "helm.sh/hook": post-install
spec:
  template:
    spec:
      serviceAccountName: config-user # a service account with appropriate permissions is required for this approach
      volumes:
      - name: check-script
        configMap:
          name: check-script
      restartPolicy: Never
      containers:
      - name: post-install-job
        image: "bitnami/kubectl" # using this image with kubectl so we can connect to the cluster
        command: ["bash", "/mnt/script/checkScript.sh"]
        volumeMounts:
        - name: check-script
          mountPath: /mnt/script
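Because the job runs kubectl exec against another pod, the config-user service account needs RBAC permissions for that. A rough sketch of what those permissions could look like (resource names and scope are assumptions, adjust to your cluster):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: config-user
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-exec
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: config-user-pod-exec
subjects:
- kind: ServiceAccount
  name: config-user
roleRef:
  kind: Role
  name: pod-exec
  apiGroup: rbac.authorization.k8s.io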
And configmap.yaml, which contains the script and the logic that checks the three processes in a loop, for up to 60 iterations of 10 seconds each:
apiVersion: v1
kind: ConfigMap
metadata:
  name: check-script
data:
  checkScript.sh: |
    #!/bin/bash
    podName=test
    pgrepContainer=app-1
    process1=sleep
    process2=pause
    process3=postgres
    attempts=0

    until [ $attempts -eq 60 ]; do
      kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process1} 1>/dev/null 2>&1 \
      && kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process2} 1>/dev/null 2>&1 \
      && kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process3} 1>/dev/null 2>&1
      if [ $? -eq 0 ]; then
        break
      fi
      attempts=$((attempts + 1))
      sleep 10
      echo "Waiting for all containers to be ready...$(( attempts * 10 )) s"
    done

    if [ $attempts -eq 60 ]; then
      echo "ERROR: Timeout"
      exit 1
    fi

    echo "All containers are ready !"
    echo "Configuring telegraf and fluentd services"
The final result will look like this:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test 2/2 Running 0 20m
test-postinstall-hook-dgrc9 0/1 Completed 0 20m
$ kubectl logs test-postinstall-hook-dgrc9
Waiting for all containers to be ready...10 s
All containers are ready !
Configuring telegraf and fluentd services
The above is another approach; you can use its logic as a base to achieve your end goal.
postStart
A postStart hook can also be considered as a place for this logic. It runs after the container is created. Since the main application takes time to start and there is already logic that waits for it, it is not an issue that:
there is no guarantee that the hook will execute before the container ENTRYPOINT
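For illustration, a sketch of what such a hook could look like on the agent container (the container name, image and script path are assumptions based on the question):
containers:
- name: agent
  image: my-agent:latest              # placeholder image
  lifecycle:
    postStart:
      exec:
        # runs right after the container is created, roughly in parallel with the ENTRYPOINT;
        # the script itself already waits until the processes are up before calling the APIs
        command: ["python3", "/etc/config/getBaseCfg.py"]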
My YAML file:
apiVersion: batch/v1
kind: Job
metadata:
  name: auto
  labels:
    app: auto
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    metadata:
      labels:
        app: auto
    spec:
      containers:
      - name: auto
        image: busybox
        imagePullPolicy: Always
        ports:
        - containerPort: 9080
      imagePullSecrets:
      - name: imageregistery
      restartPolicy: Never
The pods are killed appropriately, but the Job itself is not removed after the 100 seconds.
Is there anything we could do to delete the Job once the container/pod has completed its work?
kubectl version --short
Client Version: v1.6.1
Server Version: v1.13.10+IKS
kubectl get jobs --namespace abc
NAME DESIRED SUCCESSFUL AGE
auto 1 1 26m
Thank you,
The default way to delete jobs after they are done is to use the kubectl delete command.
As mentioned by @Erez:
Kubernetes is keeping pods around so you can get the logs, configuration etc from it.
If you don't want to do that manually, you could write a script running in your cluster that checks for jobs with completed status and then deletes them.
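A rough sketch of such a cleanup, assuming the abc namespace from the question and relying on kubectl's jsonpath filtering to find succeeded jobs:
# delete every job in namespace "abc" that has at least one succeeded pod
kubectl get jobs -n abc \
  -o jsonpath='{range .items[?(@.status.succeeded==1)]}{.metadata.name}{"\n"}{end}' \
  | xargs -r kubectl delete job -n abc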
Another way would be to use the TTL feature, which deletes jobs automatically a specified number of seconds after they finish. Note that if you set it to zero, they are cleaned up immediately on completion. For more details on how to set it up, look here.
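For illustration, a minimal sketch of the TTL field (ttlSecondsAfterFinished) applied to the Job from the question; note that on older clusters this feature may not be available:
apiVersion: batch/v1
kind: Job
metadata:
  name: auto
spec:
  ttlSecondsAfterFinished: 60   # delete the Job and its pods 60 seconds after it finishes
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: auto
        image: busybox
      restartPolicy: Never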
Please let me know if that helped.
I am trying to run a simple ubuntu container in a kubernetes cluster. It keeps failing with CrashLoopBackOff status, and I am not even able to see any logs that would tell me the reason.
My YAML file looks like the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu
  labels:
    app: jubuntu
spec:
  selector:
    matchLabels:
      app: jubuntu
  template:
    metadata:
      labels:
        app: jubuntu
    spec:
      containers:
      - name: ubuntu
        image: ubuntu
That's because you're using a Deployment, which assumes a long-running task. In your case, the container starts and exits immediately since there's nothing to be done there. In other words, this Deployment doesn't make a lot of sense. You could add the following under the containers: field to see it running (still useless, but at least you won't see it crashing anymore):
command:
- sh
- '-c'
- "while true; do echo working ; sleep 5; done;"
See also this troubleshooting guide.
For your convenience, if you don't want to do it by editing a YAML manifest, you can also use this command:
$ kubectl run ubuntu --image=ubuntu -- sh -c 'while true; do echo working ; sleep 5; done;'
And if you're super curious and want to check if it's the same, then you can append the following to the run command: --dry-run --output=yaml (after --image, before -- sh).
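Roughly, that would be (newer kubectl versions spell the flag --dry-run=client):
$ kubectl run ubuntu --image=ubuntu --dry-run --output=yaml -- sh -c 'while true; do echo working ; sleep 5; done;'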
I have a job definition based on an example from the Kubernetes website.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout-6
spec:
  activeDeadlineSeconds: 30
  completions: 1
  parallelism: 1
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["exit", "1"]
      restartPolicy: Never
I would like to run this job once and not restart it if it fails. With the command exit 1, Kubernetes keeps starting new pods, trying to get an exit code of 0, until the activeDeadlineSeconds timeout is reached. How can I avoid that? I would like to run build commands in Kubernetes to check compilation, and if compilation fails I'll get an exit code different from 0. I don't want the compilation to run again.
Is it possible? How?
This is now possible by setting backoffLimit: 0, which tells the controller to do 0 retries (the default is 6).
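Applied to the job from the question, a sketch could look like this (note that exit is a shell builtin, so the command is wrapped in a shell here, which the original manifest did not do):
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout-6
spec:
  backoffLimit: 0               # do not retry the pod if it fails
  activeDeadlineSeconds: 30
  completions: 1
  parallelism: 1
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["sh", "-c", "exit 1"]
      restartPolicy: Never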
If you want a one-try command runner, you should probably create a bare pod, because the job will try to execute the command until it succeeds or the active deadline is met.
Just create the pod from your template:
apiVersion: v1
kind: Pod
metadata:
  name: pi
spec:
  containers:
  - name: pi
    image: perl
    command: ["exit", "1"]
  restartPolicy: Never
Sadly, there is currently no way to prevent the job controller from just respawning new pods when they fail, but the Kubernetes community is working on a solution; see:
"Backoff policy and failed pod limit" https://github.com/kubernetes/community/pull/583
I have a pod with the following config:
apiVersion: v1
kind: Pod
metadata:
  labels:
    name: demo
  name: demo
spec:
  containers:
  - name: demo
    image: ubuntu:14.04
    command:
    - sleep
    - "3600"
When I try to stop it, the SIGTERM is ignored by the sleep command, and it takes 30 seconds (the full default grace period) to stop. I can also get on the pod and send the signal to the process (pid 1) manually, and it does not kill the pod. How can I get sleep to die when a signal is sent to it?
Bash ignores SIGTERM when there are no traps. You can trap SIGTERM to force an exit. For example, trap 'exit 255' SIGTERM; sleep 3600
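Keep in mind that the shell only runs the trap once the current foreground command finishes, so a plain trap '...' SIGTERM; sleep 3600 still waits out the sleep. A minimal sketch (the 255 exit code is just an example) that reacts to the signal immediately backgrounds the sleep and waits on it:
command:
- /bin/bash
- -c
- |
  # exit as soon as SIGTERM or SIGINT is received
  trap 'exit 255' SIGTERM SIGINT
  # background the sleep; wait returns immediately when a trapped signal arrives
  sleep 3600 &
  wait $!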