I'd like to get timing information about my k8s CronJobs.
There are many jobs in my k8s cluster,
so it's hard to tell which times they are concentrated around.
I want to distribute my jobs evenly.
Is there a way to count CronJobs by time or sort them by time?
I have tried to find a suitable tool that can help with your case.
Unfortunately, I did not find anything suitable and easy to use at the same time.
It is possible to use Prometheus + Grafana to monitor CronJobs, e.g. using this Kubernetes Cron and Batch Job monitoring dashboard.
However, I don't think you will find any useful information in this way, just a dashboard that displays the number of CronJobs in the cluster.
For this reason, I decided to write a Bash script that is able to display the last few CronJobs run in a readable manner.
As described in the Kubernetes CronJob documentation:
A CronJob creates Jobs on a repeating schedule.
To find out how long a specific Job was running, we can check its startTime and completionTime e.g. using the commands below:
# kubectl get job <JOB_NAME> --template '{{.status.startTime}}' # "startTime"
# kubectl get job <JOB_NAME> --template '{{.status.completionTime}}' # "completionTime"
To get the duration of Jobs in seconds, we can convert startTime and completionTime dates to epoch:
# date -d "<SOME_DATE>" +%s
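Putting the two steps together, here is a minimal sketch of the duration calculation. It assumes GNU date (the -d option) and timestamps in the format kubectl prints; the function name duration_seconds is only illustrative.

```shell
# Print the number of seconds between two timestamps (GNU date required).
# $1: start time, $2: completion time
duration_seconds() {
    local start=$1 end=$2
    echo $(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
}

duration_seconds "2021-03-18T14:23:00Z" "2021-03-18T14:23:12Z"   # prints 12
```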
And this is the entire Bash script:
NOTE: We need to pass the namespace name as an argument.
#!/bin/bash
# script name: cronjobs_timetable.sh <NAMESPACE>
namespace=$1
for cronjob_name in $(kubectl get cronjobs -n "$namespace" --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}'); do
    echo "===== CRONJOB_NAME: ${cronjob_name} ==========="
    printf "%-15s %-15s %-15s %-15s\n" "START_TIME" "COMPLETION_TIME" "DURATION" "JOB_NAME"
    # Jobs created by a CronJob are named <cronjob_name>-<number>
    for job_name in $(kubectl get jobs -n "$namespace" --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep -w "${cronjob_name}-[0-9]*$"); do
        startTime="$(kubectl get job "${job_name}" -n "$namespace" --template '{{.status.startTime}}')"
        completionTime="$(kubectl get job "${job_name}" -n "$namespace" --template '{{.status.completionTime}}')"
        # Skip Jobs that have not completed yet
        if [[ "$completionTime" == "<no value>" ]]; then
            continue
        fi
        # Duration in seconds: difference between the two epoch timestamps
        duration=$(( $(date -d "$completionTime" +%s) - $(date -d "$startTime" +%s) ))
        printf "%-15s %-15s %-15s %-15s\n" "$(date -d "${startTime}" +%X)" "$(date -d "${completionTime}" +%X)" "${duration} s" "$job_name"
    done
done
By default, this script displays only the last three successful Jobs, because that is how many a CronJob keeps. This can be changed in the CronJob configuration using the .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit fields (for more information see Kubernetes Jobs History Limits).
We can check how it works:
$ ./cronjobs_timetable.sh default
===== CRONJOB_NAME: hello ===========
START_TIME      COMPLETION_TIME DURATION        JOB_NAME
02:23:00 PM     02:23:12 PM     12 s            hello-1616077380
02:24:02 PM     02:24:13 PM     11 s            hello-1616077440
02:25:03 PM     02:25:15 PM     12 s            hello-1616077500
===== CRONJOB_NAME: hello-2 ===========
START_TIME      COMPLETION_TIME DURATION        JOB_NAME
02:23:01 PM     02:23:23 PM     22 s            hello-2-1616077380
02:24:02 PM     02:24:24 PM     22 s            hello-2-1616077440
02:25:03 PM     02:25:25 PM     22 s            hello-2-1616077500
===== CRONJOB_NAME: hello-3 ===========
START_TIME      COMPLETION_TIME DURATION        JOB_NAME
02:23:01 PM     02:23:32 PM     31 s            hello-3-1616077380
02:24:02 PM     02:24:34 PM     32 s            hello-3-1616077440
02:25:03 PM     02:25:35 PM     32 s            hello-3-1616077500
===== CRONJOB_NAME: hello-4 ===========
START_TIME      COMPLETION_TIME DURATION        JOB_NAME
02:23:01 PM     02:23:44 PM     43 s            hello-4-1616077380
02:24:02 PM     02:24:44 PM     42 s            hello-4-1616077440
02:25:03 PM     02:25:45 PM     42 s            hello-4-1616077500
Additionally, you'll likely want to create exceptions and error handling to make this script work as expected in all cases.
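Since the goal in the question is to spread the jobs evenly, one possible follow-up is to count how many Jobs start in each hour. The sketch below is an assumption about how that could be done, not part of the script above. It reads .status.startTime values (UTC, in the format kubectl prints, e.g. 2021-03-18T14:23:00Z) on standard input, and count_by_hour is a hypothetical helper name.

```shell
# Count timestamps per hour of day.
# Input:  one timestamp per line, e.g. 2021-03-18T14:23:00Z
# Output: "<hour> <count>" lines, sorted by hour
count_by_hour() {
    # Characters 12-13 of the timestamp hold the hour; shorter lines
    # (such as "<no value>" for Jobs that never started) are ignored.
    awk 'length($0) >= 13 { hour = substr($0, 12, 2); count[hour]++ }
         END { for (hour in count) print hour, count[hour] }' | sort
}

# Possible usage against a namespace (requires cluster access):
# kubectl get jobs -n default --template '{{range .items}}{{.status.startTime}}{{"\n"}}{{end}}' | count_by_hour
```

Hours with the highest counts are the crowded ones, so the schedules of some CronJobs could be moved to hours with low counts.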
Related
I'm trying to delete releases older than 10 days, but some namespaces shouldn't be touched (e.g. monitoring).
In Helm 2 I did it with awk, but Helm 3 changed the date format, so that no longer works.
Is there any way to do that?
Let me show you how I've resolved a similar issue. In our flow, we have an automatic rollout of helm releases for every feature branch, and we decided to implement an automatic cleanup process for deleting old feature releases in the development flow.
The current implementation requires jq as a dependency.
#!/usr/bin/env bash
set -e
echo "Starting delete-old-helm-release.sh ..."
helm_release_name=${1:-$HELM_RELEASE_NAME}
k8s_namespace=${2:-$KUBERNETES_NAMESPACE}
# Get helm release date, take updated field and remove UTC from string
helm_release_updated=$(helm list --filter "${helm_release_name}" -n "${k8s_namespace}" -o json \
    | jq --raw-output ".[0].updated" \
    | sed s/"UTC"// \
)
# jq prints "null" when no release matched the filter
if [[ "$helm_release_updated" == null ]]; then
    echo "Helm release: ${helm_release_name} in namespace: ${k8s_namespace} not found"
    echo "Exit from delete-old-helm-release.sh ..."
    exit 1
fi
# Convert date string to timestamp, get current timestamp and calculate time delta
helm_release_date_timestamp=$(date --utc --date="${helm_release_updated}" +%s)
current_date_timestamp=$(date --utc +%s)
time_difference=$((current_date_timestamp - helm_release_date_timestamp))
# 86400 means 24 hours (60*60*24) in seconds
if (( time_difference > 86400 )); then
    echo "Detected old release: ${helm_release_name} in namespace: ${k8s_namespace}"
    echo "Difference is more than 24hr: $((time_difference/60/60))hr"
    echo "Deleting it ..."
    # Helm 3 has no --purge flag; uninstall removes the release history by default
    helm uninstall "${helm_release_name}" -n "${k8s_namespace}"
    echo "Done"
else
    echo "Detected fresh release"
    echo "Current time difference is less than 24hr: $((time_difference/60/60))hr"
    echo "Skipping ..."
fi
exit 0
It's tested with helm 3.2.4, and I think it should work with all helm 3.x.x versions unless they change the date format.
BTW, please update your issue description so it is clearer and ranks higher in search engines :)
Please let me know if it helps.
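For the case in the question (releases older than 10 days, with some namespaces left alone), the same idea could be extended roughly as follows. This is a sketch under assumptions: the updated field is assumed to look like 2020-07-01 12:00:00.123456789 +0000 UTC, GNU date is assumed, and release_age_days and is_protected_namespace are hypothetical helper names.

```shell
# Print the age of a release in whole days.
# $1: "updated" string from `helm list -o json`
# $2: current time as epoch seconds
release_age_days() {
    local updated now=$2
    # Drop fractional seconds and the trailing zone name so GNU date can parse it.
    updated=$(printf '%s' "$1" | sed -E 's/\.[0-9]+//; s/ UTC$//')
    echo $(( (now - $(date --utc --date="$updated" +%s)) / 86400 ))
}

# Succeed if namespace $1 appears in the space-separated list $2.
is_protected_namespace() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Possible usage (requires helm and jq; the namespace list is an example):
# helm list --all-namespaces -o json | jq -r '.[] | [.name, .namespace, .updated] | @tsv' |
# while IFS=$'\t' read -r name ns updated; do
#     is_protected_namespace "$ns" "monitoring kube-system" && continue
#     if [ "$(release_age_days "$updated" "$(date +%s)")" -gt 10 ]; then
#         helm uninstall "$name" -n "$ns"
#     fi
# done
```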
Good luck,
Oleg
In PBS, one can query a specific job with qstat -f and obtain (all?) info and details to reproduce the job:
# qstat -f 1234
Job Id: 1234.login
Job_Name = job_name_here
Job_Owner = user@pbsmaster
...
Resource_List.select = 1:ncpus=24:mpiprocs=24
Resource_List.walltime = 23:59:59
...
Variable_List = PBS_O_HOME=/home/user,PBS_O_LANG=en_US.UTF-8,
PBS_O_LOGNAME=user,...
etime = Mon Apr 20 16:38:27 2020
Submit_arguments = run_script_here --with-these flags
How may I extract the same information from SLURM?
scontrol show job %j only works for currently running jobs or those terminated up to 5 minutes ago.
Edit: I'm currently using the following to obtain some information, but it's not as complete as a qstat -f:
sacct -u $USER \
-S 2020-05-13 \
-E 2020-05-15 \
--format "Account,JobID%15,JobName%20,State,ExitCode,Submit,CPUTime,MaxRSS,ReqMem,MaxVMSize,AllocCPUs,ReqTres%25"
.. usually piped into |(head -n 2; grep -v COMPLETED) |sort -k12 to inspect only failed runs.
You can get a list of all jobs that started before a certain date like so:
sacct --starttime 2020-01-01
Then pick the job you are interested in (e.g. job 1234) and print its details with sacct:
sacct -j 1234 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
See the sacct documentation under --helpformat for a complete list of available fields.
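For the failed-run inspection mentioned in the question, filtering on the State column by name may be more robust than grep plus a fixed sort column. A minimal sketch, assuming sacct is run with --parsable2 so that fields are separated by "|"; filter_not_completed is a hypothetical helper name.

```shell
# Keep the header line plus every row whose State column is not COMPLETED.
# Reads `sacct --parsable2` output on stdin.
filter_not_completed() {
    awk -F'|' '
        # Locate the State column by its header name, then print the header.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "State") col = i; print; next }
        col && $col != "COMPLETED" { print }
    '
}

# Possible usage (requires a SLURM installation):
# sacct -u "$USER" -S 2020-05-13 -E 2020-05-15 --parsable2 \
#     --format=JobID,JobName,State,ExitCode,Submit | filter_not_completed
```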
I'm looking for a way to tell (from within a script) when a Kubernetes Job has completed. I want to then get the logs out of the containers and perform cleanup.
What would be a good way to do this? Would the best way be to run kubectl describe job <job_name> and grep for 1 Succeeded or something of the sort?
Since version 1.11, you can do:
kubectl wait --for=condition=complete job/myjob
and you can also set a timeout:
kubectl wait --for=condition=complete --timeout=30s job/myjob
You can visually watch a job's status with this command:
kubectl get jobs myjob -w
The -w option watches for changes. You are looking for the SUCCESSFUL column to show 1 (newer kubectl versions show a COMPLETIONS column instead, which should read 1/1).
For waiting in a shell script, I'd use this command:
until kubectl get jobs myjob -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}' | grep True ; do sleep 1 ; done
You can use the official Python kubernetes-client.
https://github.com/kubernetes-client/python
Create new Python virtualenv:
virtualenv -p python3 kubernetes_venv
activate it with
source kubernetes_venv/bin/activate
and install kubernetes client with:
pip install kubernetes
Create new Python script and run:
from kubernetes import client, config
config.load_kube_config()
v1 = client.BatchV1Api()
ret = v1.list_namespaced_job(namespace='<YOUR-JOB-NAMESPACE>', watch=False)
for i in ret.items:
    print(i.status.succeeded)
Remember to set up your kubeconfig in ~/.kube/config and to replace '<YOUR-JOB-NAMESPACE>' with a valid value for your job namespace.
I would use -w or --watch:
$ kubectl get jobs.batch --watch
NAME COMPLETIONS DURATION AGE
python 0/1 3m4s 3m4s
Adding the best answer, from a comment by @Coo: if you add the -f or --follow option when getting logs, it will keep tailing the log and terminate when the job completes or fails. The $? status code is even non-zero when the job fails.
kubectl logs -l job-name=myjob --follow
One downside of this approach, that I'm aware of, is that there's no timeout option.
Another downside is the logs call may fail while the pod is in Pending (while the containers are being started). You can fix this by waiting for the pod:
# Wait for pod to be available; logs will fail if the pod is "Pending"
while [[ "$(kubectl get pod -l job-name=myjob -o json | jq -rc '.items | .[].status.phase')" == 'Pending' ]]; do
    # Avoid flooding k8s with polls (seconds)
    sleep 0.25
done
# Tail logs
kubectl logs -l job-name=myjob --tail=400 -f
Use either one of these queries with kubectl:
kubectl get job test-job -o jsonpath='{.status.succeeded}'
or
kubectl get job test-job -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
kubectl wait --for=condition=complete job/myjob and kubectl wait --for=condition=failed job/myjob each wait for one specific outcome, but there is no single command to check whether the job has simply finished executing (irrespective of success or failure). If this is what you are looking for, a simple bash while loop with a kubectl status check did the trick for me.
#!/bin/bash
while true; do
    status=$(kubectl get job jobname -o jsonpath='{.status.conditions[0].type}')
    echo "$status" | grep -qi 'Complete' && echo "0" && exit 0
    echo "$status" | grep -qi 'Failed' && echo "1" && exit 1
    # Pause between polls to avoid hammering the API server
    sleep 1
done
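A variant of the loop above with a timeout, as a sketch rather than a definitive implementation. It looks at every condition whose status is True instead of only the first one. The function name wait_for_job_end and the return codes (0 for Complete, 1 for Failed, 2 for timeout) are choices made for this example.

```shell
# Poll a Job until it reports a Complete or Failed condition,
# or until the timeout expires.
# $1: job name, $2: timeout in seconds (default 300), $3: poll interval (default 2)
wait_for_job_end() {
    local job=$1 timeout=${2:-300} interval=${3:-2} waited=0 types
    while [ "$waited" -lt "$timeout" ]; do
        # Space-separated types of all conditions whose status is "True".
        types=$(kubectl get job "$job" \
            -o jsonpath='{.status.conditions[?(@.status=="True")].type}' 2>/dev/null)
        case " $types " in
            *" Complete "*) return 0 ;;
            *" Failed "*)   return 1 ;;
        esac
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 2
}

# Possible usage (requires cluster access):
# wait_for_job_end myjob 600 5; echo "result: $?"
```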
I can delete all jobs inside a cluster by running
kubectl delete jobs --all
However, jobs are deleted one after another which is pretty slow (for ~200 jobs I had the time to write this question and it was not even done).
Is there a faster approach?
It's a little easier to set up an alias for this bash command:
kubectl delete jobs `kubectl get jobs -o custom-columns=:.metadata.name`
I have a script that deletes the jobs considerably faster:
$ cat deljobs.sh
set -x
for j in $(kubectl get jobs -o custom-columns=:.metadata.name)
do
    kubectl delete jobs $j &
done
To create the 200 jobs, I used the following script, run with for i in {1..200}; do ./jobs.sh; done
$ cat jobs.sh
kubectl run memhog-$(cat /dev/urandom | tr -dc 'a-z0-9' | fold -w 8 | head -n 1) --restart=OnFailure --record --image=derekwaynecarr/memhog --command -- memhog -r100 20m
If you are using CronJobs and the Jobs are piling up quickly, you can let Kubernetes delete them automatically by configuring the job history limits described in the documentation. This is available starting from version 1.6.
...
spec:
  ...
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
This works really well for me:
kubectl delete jobs $(kubectl get jobs -o custom-columns=:.metadata.name)
There is an easier way to do it:
To delete successful jobs:
kubectl delete jobs --field-selector status.successful=1
To delete failed or long-running jobs:
kubectl delete jobs --field-selector status.successful=0
I use this script. It's fast, but it can thrash the CPU (one process per job); you can always adjust the sleep parameter:
#!/usr/bin/env bash
echo "Deleting all jobs (in parallel - it can thrash the CPU)"
kubectl get jobs --all-namespaces | sed '1d' | awk '{ print $2, "--namespace", $1 }' | while read -r line; do
    echo "Running with: ${line}"
    # ${line} is left unquoted on purpose so it splits into "<name> --namespace <ns>"
    kubectl delete jobs ${line} &
    sleep 0.05
done
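An alternative to spawning one background process per job is to cap the number of concurrent kubectl processes with xargs -P. The sketch below assumes that kubectl get jobs -o name prints names in the job.batch/<name> form, which kubectl delete accepts directly. delete_jobs_parallel and the DELETE_CMD override are hypothetical names introduced for this example.

```shell
# Read resource names on stdin and delete them in batches, with a bounded
# number of kubectl processes running at once.
# $1: parallel workers (default 8), $2: names per kubectl call (default 20)
# DELETE_CMD is a hypothetical override, handy for a dry run with `echo`.
delete_jobs_parallel() {
    # DELETE_CMD is intentionally unquoted so it splits into command and arguments.
    xargs -r -P "${1:-8}" -n "${2:-20}" ${DELETE_CMD:-kubectl delete}
}

# Possible usage (requires cluster access):
# kubectl get jobs -o name | delete_jobs_parallel 8 20
```

Batching several names into each kubectl call keeps the process count low, while the -P value limits the load on both the local CPU and the API server.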
The best way for me is (for completed jobs older than a day):
kubectl get jobs | grep 1/1 | gawk 'match($0, / ([0-9]*)h/, ary) { if(ary[1]>24) print $1}' | parallel -r --bar -P 32 kubectl delete jobs
grep 1/1 for completed jobs
gawk 'match($0, / ([0-9]*)h/, ary) { if(ary[1]>24) print $1}' for jobs older than a day
-P number of parallel processes
It is faster than kubectl delete jobs --all, has a progress bar and you can use it when some jobs are still running.
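The AGE column is a human-readable string, so a pattern that looks for hours may miss jobs whose age is printed in days. A sketch that compares .metadata.creationTimestamp against a threshold instead, assuming GNU date; names_older_than is a hypothetical helper name.

```shell
# Read "<name> <creationTimestamp>" lines on stdin and print the names
# that are older than the threshold.
# $1: threshold in seconds, $2: "now" as epoch seconds (default: current time)
names_older_than() {
    local max_age=$1 now=${2:-$(date +%s)} name created
    while read -r name created; do
        if [ $(( now - $(date -d "$created" +%s) )) -gt "$max_age" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Possible usage (requires cluster access); 86400 seconds is one day:
# kubectl get jobs -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.creationTimestamp}{"\n"}{end}' \
#     | names_older_than 86400 | xargs -r kubectl delete jobs
```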
kubectl delete jobs --all --cascade=false is fast, but it won't delete associated resources such as Pods (kubectl 1.20 and later spell this option --cascade=orphan).
https://github.com/kubernetes/kubernetes/issues/8598
Parallelize using GNU parallel
parallel --jobs=5 "echo {}; kubectl delete jobs {} -n core-services;" ::: $(kubectl get job -o=jsonpath='{.items[?(@.status.succeeded==1)].metadata.name}' -n core-services)
kubectl get jobs -o custom-columns=:.metadata.name | grep 'specific' | xargs kubectl delete jobs
kubectl get jobs -o custom-columns=:.metadata.name gives you the list of job names, grep then selects the ones you need with a regexp, and xargs passes that output to kubectl delete jobs to delete the jobs on the list.
Probably there's no other way to delete all jobs at once, because even kubectl delete jobs deletes one job at a time. What Norbert van Nobelen is suggesting might be faster, but it won't make much difference.
Kubectl bulk (bulk-action on krew) plugin may be useful for you, it gives you bulk operations on selected resources.
This is the command for deleting jobs:
kubectl bulk jobs delete
You could check details in
https://github.com/emreodabas/kubectl-plugins/blob/master/README.md#kubectl-bulk-aka-bulk-action
I am looking for a way to schedule a set of commands on a unix server. The server time is in UTC. I essentially want to perform the below steps automatically every Wednesday at 4pm UK time:
Change the server date to the next time it is UK midnight (to avoid timezone change issues)
Restart the tomcat server
If tomcat is running, run a jar
Change the server date to the correct present UTC time
Restart the tomcat server
If tomcat is running, we are done
The below commands are what I currently run manually:
date -s "Thu Feb 09 00:01:00 UTC 2017" (represents the next day at 1 minute past midnight)
service tomcat restart
sudo -u tomcat java -jar Test.jar -type "Major" -status "Active"
date -s "<the current UTC time>"
service tomcat restart
I understand we can use cron to schedule the running of a script, but I am unsure how to do this. Any help is appreciated.
Create a file called unix-server-date.sh and save it, for example, in /opt. The script will have this content:
#!/bin/bash
# Set the clock to the next day at 1 minute past midnight
date -s "Thu Feb 09 00:01:00 UTC 2017"
service tomcat restart
sudo -u tomcat java -jar Test.jar -type "Major" -status "Active"
# Restore the clock; replace the placeholder with the current UTC time
date -s "<the current UTC time>"
service tomcat restart
Make the script executable:
chmod +x /opt/unix-server-date.sh
Then issue crontab -e to edit the crontab entries and add an entry like:
00 16 * * 3 /opt/unix-server-date.sh
Depending on your editor, after you have added the crontab entry please save the file and the crontab will be automatically installed.
That would be the basics!
If you run those commands from a specific user account you should add the cronjob to that user's crontab. Change crontab -e to crontab -e -u user
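One caveat: the server runs on UTC, while 4pm UK time is 16:00 UTC in winter and 15:00 UTC during British Summer Time. One possible workaround, sketched here as an assumption rather than a tested setup, is to let cron fire at both hours (0 15,16 * * 3 /opt/unix-server-date.sh) and have the script exit early unless it is currently 16:00 in the UK. uk_hour is a hypothetical helper name; GNU date and an installed Europe/London time zone are assumed.

```shell
# Print the hour (00-23) in UK local time.
# $1: epoch seconds to convert (default: the current time)
uk_hour() {
    TZ=Europe/London date -d "@${1:-$(date +%s)}" +%H
}

# Hypothetical guard for the top of /opt/unix-server-date.sh:
# [ "$(uk_hour)" = "16" ] || exit 0
```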