How to schedule pods restart - kubernetes

Is it possible to restart pods automatically based on the time?
For example, I would like to restart the pods of my cluster every morning at 8.00 AM.

Use a cronjob, but not to run your pods, but to schedule a Kubernetes API command that will restart the deployment everyday (kubectl rollout restart). That way if something goes wrong, the old pods will not be down or removed.
Rollouts create new ReplicaSets, and wait for them to be up, before killing off old pods, and rerouting the traffic. Service will continue uninterrupted.
You have to setup RBAC, so that the Kubernetes client running from inside the cluster has permissions to do needed calls to the Kubernetes API.
---
# Service account the client will use to reset the deployment,
# by default the pods running inside the cluster can do no such things.
kind: ServiceAccount
apiVersion: v1
metadata:
name: deployment-restart
namespace: <YOUR NAMESPACE>
---
# allow getting status and patching only the one deployment you want
# to restart
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: deployment-restart
namespace: <YOUR NAMESPACE>
rules:
- apiGroups: ["apps", "extensions"]
resources: ["deployments"]
resourceNames: ["<YOUR DEPLOYMENT NAME>"]
verbs: ["get", "patch", "list", "watch"] # "list" and "watch" are only needed
# if you want to use `rollout status`
---
# bind the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: deployment-restart
namespace: <YOUR NAMESPACE>
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: deployment-restart
subjects:
- kind: ServiceAccount
name: deployment-restart
namespace: <YOUR NAMESPACE>
And the cronjob specification itself:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: deployment-restart
namespace: <YOUR NAMESPACE>
spec:
concurrencyPolicy: Forbid
schedule: '0 8 * * *' # cron spec of time, here, 8 o'clock
jobTemplate:
spec:
backoffLimit: 2 # this has very low chance of failing, as all this does
# is prompt kubernetes to schedule new replica set for
# the deployment
activeDeadlineSeconds: 600 # timeout, makes most sense with
# "waiting for rollout" variant specified below
template:
spec:
serviceAccountName: deployment-restart # name of the service
# account configured above
restartPolicy: Never
containers:
- name: kubectl
image: bitnami/kubectl # probably any kubectl image will do,
# optionaly specify version, but this
# should not be necessary, as long the
# version of kubectl is new enough to
# have `rollout restart`
command:
- 'kubectl'
- 'rollout'
- 'restart'
- 'deployment/<YOUR DEPLOYMENT NAME>'
Optionally, if you want the cronjob to wait for the deployment to roll out, change the cronjob command to:
command:
- bash
- -c
- >-
kubectl rollout restart deployment/<YOUR DEPLOYMENT NAME> &&
kubectl rollout status deployment/<YOUR DEPLOYMENT NAME>

Another quick and dirty option for a pod that has a restart policy of Always (which cron jobs are not supposed to handle - see creating a cron job spec pod template) is a livenessProbe that simply tests the time and restarts the pod on a specified schedule
ex. After startup, wait an hour, then check hour every minute, if hour is 3(AM) fail probe and restart, otherwise pass
livenessProbe:
exec:
command:
- exit $(test $(date +%H) -eq 3 && echo 1 || echo 0)
failureThreshold: 1
initialDelaySeconds: 3600
periodSeconds: 60
Time granularity is up to how you return the date and test ;)
Of course this does not work if you are already utilizing the liveness probe as an actual liveness probe ¯\_(ツ)_/¯

I borrowed idea from #Ryan Lowe but modified it a bit. It will restart pod older than 24 hours
livenessProbe:
exec:
command:
- bin/sh
- -c
- "end=$(date -u +%s);start=$(stat -c %Z /proc/1 | awk '{print int($1)}'); test $(($end-$start)) -lt 86400"

There's a specific resource for that: CronJob
Here an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: your-cron
spec:
schedule: "*/20 8-19 * * 1-5"
concurrencyPolicy: Forbid
jobTemplate:
spec:
template:
metadata:
labels:
app: your-periodic-batch-job
spec:
containers:
- name: my-image
image: your-image
imagePullPolicy: IfNotPresent
restartPolicy: OnFailure
change spec.concurrencyPolicy to Replace if you want to replace the old pod when starting a new pod. Using Forbid, the new pod creation will be skip if the old pod is still running.

According to cronjob-in-kubernetes-to-restart-delete-the-pod-in-a-deployment
you could create a kind: CronJob with a jobTemplate having containers. So your CronJob will start those containers with a activeDeadlineSeconds of one day (until restart). According to you example, it will be then schedule: 0 8 * * ? for 8:00AM

We were able to do this by modifying the manifest (passing a random param every 3 hours) file of deployment from a CRON job:
We specifically used Spinnaker for triggering deployments:
We created a CRON job in Spinnaker like below:
Configuration step looks like:
The Patch Manifest looks like: (K8S restarts PODS when YAML changes, to counter that check bottom of post)
As there can be a case where all pods can restart at a same time, causing downtime, we have a policy for Rolling Restart where maxUnavailablePods is 0%
spec:
# replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 50%
maxUnavailable: 0%
This spawns new pods and then terminates old ones.

Related

restart policy in Kubernetes deployment

Is it possible to create a deployment on Kubernetes and using restart policy on failure ?
I did a small research but didn't find anything that enables restart policy on failure for Deployment.
KR
restartPolicy for kind: Deployment support Always only.
One of the requirements is "restart policy on failure max 3 times"
Try:
apiVersion: batch/v1
kind: Job
metadata:
name: busybox
spec:
template:
backoffLimit: 3 # <-- max fail 3 times
spec:
restartPolicy: OnFailure # <-- You can do this with job
containers:
- name: busybox
image: busybox
command: ["ash","-c","sleep 15"] # <-- replace with an invalid command to see backoffLimit in action

K8s Cronjob Rolling Restart Every Day

I have one pod that I want to automatically restart once a day. I've looked at the Cronjob documentation and I think I'm close, but I keep getting an Exit Code 1 error. I'm not sure if there's an obvious error in my .yaml. If not, I can post the error log as well. Here's my code:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: my-deployment-restart
spec:
schedule: "0 20 * * *"
jobTemplate:
spec:
template:
spec:
restartPolicy: Never
containers:
- name: kubectl
image: bitnami/kubectl
command:
- 'kubectl'
- 'rollout'
- 'restart'
- 'deployment my-deployment'
You would need to give it permissions to access the API, that means making a ServiceAccount and some RBAC policy objects (Role, RoleBinding) and then set serviceAccountName in your pod spec there.

Is it possible to trigger a kubernetes cronjob also upon deployment?

I have a simple cronjob that runs every 10 minutes:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: myjob
spec:
schedule: "*/10 * * * *" #every 10 minutes
successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
spec:
containers:
- name: job
image: image
imagePullPolicy: Always
restartPolicy: OnFailure
It indeed runs every 10 minutes, but i would like it to run first time when i deploy the cronjob. is it possible?
You could have a one time CronJob trigger the scheduled CronJob:
kubectl create job --from=cronjob/<name of cronjob> <name of job>
Source
The one time CronJob would need to run after the scheduled CronJob has been created, and its image would need to include the kubectl binary. Api-server permissions needed to run kubectl within the container could be provided by linking a ServiceAccount to the one time CronJob.

Terminate istio sidecar istio-proxy for a kubernetes job / cronjob

We recently started using istio Istio to establish a service-mesh within out Kubernetes landscape.
We now have the problem that jobs and cronjobs do not terminate and keep running forever if we inject the istio istio-proxy sidecar container into them. The istio-proxy should be injected though to establish proper mTLS connections to the services the job needs to talk to and comply with our security regulations.
I also noticed the open issues within Istio (istio/issues/6324) and kubernetes (kubernetes/issues/25908), but both do not seem to provide a valid solution anytime soon.
At first a pre-stop hook seemed suitable to solve this issue, but there is some confusion about this conecpt itself: kubernetes/issues/55807
lifecycle:
preStop:
exec:
command:
...
Bottomline: Those hooks will not be executed if the the container successfully completed.
There are also some relatively new projects on GitHub trying to solve this with a dedicated controller (which I think is the most preferrable approach), but to our team they do not feel mature enough to put them right away into production:
k8s-controller-sidecars
K8S-job-sidecar-terminator
In the meantime, we ourselves ended up with the following workaround that execs into the sidecar and sends a SIGTERM signal, but only if the main container finished successfully:
apiVersion: v1
kind: ServiceAccount
metadata:
name: terminate-sidecar-example-service-account
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: terminate-sidecar-example-role
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get","delete"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: terminate-sidecar-example-rolebinding
subjects:
- kind: ServiceAccount
name: terminate-sidecar-example-service-account
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: terminate-sidecar-example-role
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: terminate-sidecar-example-cronjob
labels:
app: terminate-sidecar-example
spec:
schedule: "30 2 * * *"
jobTemplate:
metadata:
labels:
app: terminate-sidecar-example
spec:
template:
metadata:
labels:
app: terminate-sidecar-example
annotations:
sidecar.istio.io/inject: "true"
spec:
serviceAccountName: terminate-sidecar-example-service-account
containers:
- name: ****
image: ****
command:
- "/bin/ash"
- "-c"
args:
- node index.js && kubectl exec -n ${POD_NAMESPACE} ${POD_NAME} -c istio-proxy -- bash -c "sleep 5 && /bin/kill -s TERM 1 &"
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
So, the ultimate question to all of you is: Do you know of any better workaround, solution, controller, ... that would be less hacky / more suitable to terminate the istio-proxy container once the main container finished its work?
- command:
- /bin/sh
- -c
- |
until curl -fsI http://localhost:15021/healthz/ready; do echo \"Waiting for Sidecar...\"; sleep 3; done;
echo \"Sidecar available. Running the command...\";
<YOUR_COMMAND>;
x=$(echo $?); curl -fsI -X POST http://localhost:15020/quitquitquit && exit $x
Update: sleep loop can be omitted if holdApplicationUntilProxyStarts is set to true (globally or as an annotation) starting with istio 1.7
This was not a misconfiguration, this was a bug in upstream Kubernetes. As of September of 2019, this has been resolved by Istio by introducing a /quitquitquit endpoint to the Pilot agent.
Unfortunately, Kubernetes has not been so steadfast in solving this issue themselves. So it still does exist in some facets. However, the /quitquitquit endpoint in Istio should have resolved the problem for this specific use case.
I have found a work around by editing the configmap of istio-sidecar-injector as per the link Istio documentation
https://istio.io/docs/setup/additional-setup/sidecar-injection/
apiVersion: v1
kind: ConfigMap
metadata:
name: istio-sidecar-injector
data:
config: |-
policy: enabled
neverInjectSelector:
- matchExpressions:
- {key: job-name, operator: Exists}
But with this changes in our cronjob sidecar will not inject and istio policy will not apply on the cronjob job, and in our case we dont want any policy to be enforced by istio
Note :- job-name is by default label gets added in the pod creation
For those for whom curl is a luxury my wget version of the Dimitri's code:
command:
- /bin/sh
- -c
- |
until wget -q --spider http://127.0.0.1:15021/healthz/ready 2>/dev/null; do echo "Waiting for Istio sidecar..."; sleep 3; done;
echo \"Sidecar available. Running...\";
<COMMAND>;
x=$?; wget -q --post-data='' -S -O /dev/null http://127.0.0.1:15020/quitquitquit && exit $x

How to automatically remove completed Kubernetes Jobs created by a CronJob?

Is there a way to automatically remove completed Jobs besides making a CronJob to clean up completed Jobs?
The K8s Job Documentation states that the intended behavior of completed Jobs is for them to remain in a completed state until manually deleted. Because I am running thousands of Jobs a day via CronJobs and I don't want to keep completed Jobs around.
You can now set history limits, or disable history altogether, so that failed or successful CronJobs are not kept around indefinitely. See my answer here. Documentation is here.
To set the history limits:
The .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit fields are optional. These fields specify how many completed and failed jobs should be kept. By default, they are set to 3 and 1 respectively. Setting a limit to 0 corresponds to keeping none of the corresponding kind of jobs after they finish.
The config with 0 limits would look like:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: hello
spec:
schedule: "*/1 * * * *"
successfulJobsHistoryLimit: 0
failedJobsHistoryLimit: 0
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: busybox
args:
- /bin/sh
- -c
- date; echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
This is possible from version 1.12 Alpha with ttlSecondsAfterFinished. An example from Clean Up Finished Jobs Automatically:
apiVersion: batch/v1
kind: Job
metadata:
name: pi-with-ttl
spec:
ttlSecondsAfterFinished: 100
template:
spec:
containers:
- name: pi
image: perl
command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
restartPolicy: Never
Another way using a field-selector:
kubectl delete jobs --field-selector status.successful=1
That could be executed in a cronjob, similar to others answers.
Create a service account, something like my-sa-name
Create a role with list and delete permissions for the resource jobs
Attach the role in the service account (rolebinding)
Create a cronjob that will use the service account that will check for completed jobs and delete them
# 1. Create a service account
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-sa-name
namespace: default
---
# 2. Create a role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: default
name: my-completed-jobs-cleaner-role
rules:
- apiGroups: [""]
resources: ["jobs"]
verbs: ["list", "delete"]
---
# 3. Attach the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: my-completed-jobs-cleaner-rolebinding
namespace: default
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: my-completed-jobs-cleaner-role
subjects:
- kind: ServiceAccount
name: my-sa-name
namespace: default
---
# 4. Create a cronjob (with a crontab schedule) using the service account to check for completed jobs
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: jobs-cleanup
spec:
schedule: "*/30 * * * *"
jobTemplate:
spec:
template:
spec:
serviceAccountName: my-sa-name
containers:
- name: kubectl-container
image: bitnami/kubectl:latest
# I'm using bitnami kubectl, because the suggested kubectl image didn't had the `field-selector` option
command: ["sh", "-c", "kubectl delete jobs --field-selector status.successful=1"]
restartPolicy: Never
I've found the below to work
To remove failed jobs:
kubectl delete job $(kubectl get jobs | awk '$3 ~ 0' | awk '{print $1}')
To remove completed jobs:
kubectl delete job $(kubectl get jobs | awk '$3 ~ 1' | awk '{print $1}')
i'm using wernight/kubectl's kubectl image
scheduled a cron deleting anything that is
completed
2 - 9 days old (so I have 2 days to review any failed jobs)
it runs every 30mins so i'm not accounting for jobs that are 10+ days old
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: cleanup
spec:
schedule: "*/30 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: kubectl-runner
image: wernight/kubectl
command: ["sh", "-c", "kubectl get jobs | awk '$4 ~ /[2-9]d$/ || $3 ~ 1' | awk '{print $1}' | xargs kubectl delete job"]
restartPolicy: Never
I recently built a kubernetes-operator to do this task.
After deploy it will monitor selected namespace and delete completed jobs/pods if they completed without errors/restarts.
https://github.com/lwolf/kube-cleanup-operator
Using jsonpath:
kubectl delete job $(kubectl get job -o=jsonpath='{.items[?(#.status.succeeded==1)].metadata.name}')
As stated in the documentation "It is up to the user to delete old jobs", see http://kubernetes.io/docs/user-guide/jobs/#job-termination-and-cleanup
I would run a pod to do this cleanup based on job name and certain conditions, thus letting kubernetes at least take care of the availability of your process here. You could run a recurring job for this (assuming you run kubernetes 1.5).
A simple way to delete them by running a cron job:
kubectl get jobs --all-namespaces | sed '1d' | awk '{ print $2, "--namespace", $1 }' | while read line; do kubectl delete jobs $line; done