Kubernetes cron job OOMKilled

I have a Rails app deployed on K8S. Inside my web app there is a cron job that runs every day at 8 pm and takes about 6 hours to finish. I noticed that an OOMKilled error occurs a few hours after the cron job starts. I also increased the memory of the pod, but the error still happens.
This is my yaml file:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sync-data
spec:
  schedule: "0 20 * * *" # at 20:00 every day
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 100
      template:
        spec:
          serviceAccountName: sync-data
          containers:
            - name: sync-data
              resources:
                requests:
                  memory: 2024Mi # OOMKilled
                  cpu: 1000m
                limits:
                  memory: 2024Mi # OOMKilled
                  cpu: 1000m
              image: xxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/path
              imagePullPolicy: IfNotPresent
              command:
                - "/bin/sh"
                - "-c"
                - |
                  rake xxx:yyyy # will take ~6 hours to finish
          restartPolicy: Never
Are there any best practices for running long, resource-intensive cron jobs on K8S?
Any help is welcome!

OOMKilled can happen for two reasons:
1. Your pod is using more memory than the limit specified. In that case, you obviously need to increase the limit.
2. If all the pods on the node are using more memory than they requested, Kubernetes will kill some pods to free up space. In that case, you can give this pod a higher priority.
You should have monitoring in place to determine the actual reason. Proper monitoring will show you which pods are performing as expected and which are not. You could also use node selectors for long-running pods and set a priority class so that non-cron pods are evicted first; a sketch of the priority-class approach follows below.
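As a minimal sketch of that idea (the class name and value are illustrative, not from the question): define a PriorityClass and reference it from the cron job's pod template, so that under node pressure lower-priority pods are preempted or evicted before this one.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: long-running-cron   # hypothetical name
value: 1000000              # higher value = higher priority
globalDefault: false
description: "Priority for long-running cron pods"
Then, in the CronJob's pod spec (next to serviceAccountName), add priorityClassName: long-running-cron.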

Honestly, there is no universally correct resource request/limit setting in Kubernetes, because it depends entirely on what your pod is doing. One thing I would suggest is to deploy the Vertical Pod Autoscaler and observe what resource requests/limits it recommends for your cron job. Here is a very nice article to start with; it shows how you can apply this to your use case:
https://medium.com/infrastructure-adventures/vertical-pod-autoscaler-deep-dive-limitations-and-real-world-examples-9195f8422724
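As a rough illustration of that approach (not taken from the article; the names mirror the question's CronJob, and it assumes the VPA components are installed and your VPA version supports CronJob targets), a VerticalPodAutoscaler in recommendation-only mode could look like this:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sync-data-vpa
spec:
  targetRef:
    apiVersion: batch/v1beta1
    kind: CronJob
    name: sync-data
  updatePolicy:
    updateMode: "Off"   # only produce recommendations, never evict or resize pods
After a few runs, kubectl describe vpa sync-data-vpa shows the recommended requests, which you can copy into the CronJob manifest.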

Related

When does the kubelet report that the node is in DiskPressure?

I know that the kubelet reports that the node is in DiskPressure if there is not enough space on the node,
but I want to know the exact thresholds for DiskPressure.
Please point me to the kubelet source code related to this behaviour if you can,
or to the official k8s documentation or anything else that covers it.
Thanks again!!
The kubelet running on the node reports DiskPressure based on the hard or soft eviction threshold values that have been set; if nothing is set, the defaults apply. Please refer to the Kubernetes documentation.
The eviction signals used for DiskPressure are nodefs.available, nodefs.inodesFree, imagefs.available and imagefs.inodesFree.
Disk pressure is a condition indicating that a node is using too much disk space, or is using it too fast, according to the thresholds you have set in your Kubernetes configuration.
A DaemonSet can deploy an app to multiple nodes in a single step. Like Deployments, DaemonSets must be applied with kubectl before they can take effect.
Since Kubernetes runs on Linux, this is easily checked by running the du command. You can either SSH into each Kubernetes node manually, or use a DaemonSet as follows:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: disk-checker
  labels:
    app: disk-checker
spec:
  selector:
    matchLabels:
      app: disk-checker
  template:
    metadata:
      labels:
        app: disk-checker
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
        - name: disk-checker
          resources:
            requests:
              cpu: 0.15
          securityContext:
            privileged: true
          image: busybox
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh"]
          args: ["-c", "du -a /host | sort -n -r | head -n 20"]
          volumeMounts:
            - name: host
              mountPath: "/host"
      volumes:
        - name: host
          hostPath:
            path: "/"
DiskPressure means that the available disk space or inodes on either the node's root filesystem or its image filesystem have crossed an eviction threshold; check the complete list of node conditions for more details.
Ways to set kubelet options:
1) Command-line flags such as --eviction-hard.
2) A config file (see the sketch below).
3) More recently, dynamic kubelet configuration.
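As a sketch of the config-file route (the values shown are the documented Linux defaults, adjust them to your needs), the hard eviction thresholds can be set explicitly in a KubeletConfiguration file passed to the kubelet with --config:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
The node reports DiskPressure once one of the nodefs/imagefs signals crosses its threshold.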
When you run into node disk pressure, your first thought should be garbage-collection problems or log files; the better answer, of course, is to clean up unused files and free some disk space.
So monitor your clusters, get notified when any node's disk approaches pressure, and resolve the issue before it starts evicting other pods in the cluster.
Edit :
AFAIK there is no magic trick for knowing the exact DiskPressure threshold in advance. You need to start with reasonable values (limits & requests) and refine them by trial and error.
Refer to this SO answer for more information on how to set the DiskPressure thresholds.

Kubernetes cluster hangs when pod consumes too much memory

I have a job I run on a k8s node that quickly grows to 32 GB+ of memory. The node has 32 GB of memory.
I would expect Kubernetes to evict the pod, but instead the node becomes completely unreachable and the only way to get it back is with a reboot. Looking at htop immediately after deploying the job, all CPUs and memory are maxed out.
Even when I set memory limits on the job configuration to 16M, the same happens.
This is what the job config looks like:
apiVersion: batch/v1
kind: Job
metadata:
  name: mem-test
spec:
  template:
    spec:
      containers:
        - name: mem-test
          image: gcr.io/foo/mem-test:latest
          resources:
            limits:
              memory: 16000Mi
      restartPolicy: Never
What am I missing?

How to make automatic cluster upscaling on GKE/DigitalOcean work for a Job with varying memory requests?

I have a 1-node K8s cluster on DigitalOcean with 1 CPU / 2 GB RAM
and a 3-node cluster on Google Cloud with 1 CPU / 2 GB RAM per node.
I ran two jobs separately on each cloud platform with autoscaling enabled.
The first job had a memory request of 200Mi:
apiVersion: batch/v1
kind: Job
metadata:
  name: scaling-test
spec:
  parallelism: 16
  template:
    metadata:
      name: scaling-test
    spec:
      containers:
        - name: debian
          image: debian
          command: ["/bin/sh", "-c"]
          args: ["sleep 300"]
          resources:
            requests:
              cpu: "100m"
              memory: "200Mi"
      restartPolicy: Never
More nodes (1 CPU / 2 GB RAM) were added to the cluster automatically, and after the job completed the extra nodes were deleted automatically.
After that, I ran a second job with a memory request of 4500Mi:
apiVersion: batch/v1
kind: Job
metadata:
  name: scaling-test2
spec:
  parallelism: 3
  template:
    metadata:
      name: scaling-test2
    spec:
      containers:
        - name: debian
          image: debian
          command: ["/bin/sh", "-c"]
          args: ["sleep 5"]
          resources:
            requests:
              cpu: "100m"
              memory: "4500Mi"
      restartPolicy: Never
When I checked later, the job remained in the Pending state. I checked the pod's event log and saw the following errors:
0/5 nodes are available: 5 Insufficient memory (source: default-scheduler)
pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 Insufficient memory (source: cluster-autoscaler)
The cluster did not autoscale to satisfy the resources requested by the job. Is this possible with Kubernetes?
The cluster autoscaler (CA) doesn't add nodes to the cluster if doing so wouldn't make a pod schedulable, and it only considers adding nodes to the node groups it was configured for. So one reason it doesn't scale up here may be that the pod's request is simply too large for the configured node type (e.g. 4500Mi of memory on 2 GB nodes). Another possible reason is that all suitable node groups are already at their maximum size.
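For reference, a hedged sketch of where those node groups are declared when you run the cluster-autoscaler yourself (on GKE and DigitalOcean the autoscaler is managed for you, so the equivalent fix there is to add a node pool with a larger machine type and autoscaling enabled); the node-group name, sizes and version below are placeholders:
# excerpt from a self-managed cluster-autoscaler Deployment (AWS example)
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --nodes=1:10:my-large-node-asg   # min:max:node-group; the group's instance type must be big enough for the 4500Mi request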

Avoid multiple cron jobs running for one cron execution point in Kubernetes

EDIT: The question is solved; it was my mistake, I simply used the wrong cron settings. I assumed "* 2 * * *" would run only once per day at 2, but in fact it runs every minute during hour 2. So Kubernetes behaves correctly.
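For reference, a schedule that runs only once per day at 02:00 would be:
schedule: "0 2 * * *" # minute 0 of hour 2 -- once per day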
I keep having multiple jobs running at one cron execution point, but it seems to happen only if those jobs have a very short runtime. Any idea why this happens and how I can prevent it? I use concurrencyPolicy: Forbid, backoffLimit: 0 and restartPolicy: Never.
Example of a cron job that is supposed to run once per day but runs multiple times just after its scheduled run time:
job-1554346620 1/1 11s 4h42m
job-1554346680 1/1 11s 4h41m
job-1554346740 1/1 10s 4h40m
Relevant config:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: job
spec:
  schedule: "* 2 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: job
              image: job_image:latest
              command: ["rake", "run_job"]
          restartPolicy: Never
          imagePullSecrets:
            - name: regcred
      backoffLimit: 0
The most common problem with running CronJobs on k8s is
spawning too many pods, which consume all of the cluster's resources.
It is very important to set proper CronJob limits.
If you are not sure what you need, just take this example as a template:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-first-conjob
  namespace: devenv-admitriev
spec:
  schedule: "*/10 * * * *" # MM HH DD MM WKD -- minutes, hour, day, month, weekday (e.g. Sun, Mon)
  successfulJobsHistoryLimit: 3 # how many completed jobs should be kept
  failedJobsHistoryLimit: 1 # how many failed jobs should be kept
  suspend: false # here you can suspend the cronjob without deleting it
  concurrencyPolicy: Forbid # choose Forbid if you don't want concurrent executions of your Job
  # The amount of time that Kubernetes can miss and still start a job.
  # If Kubernetes missed too many job starts (100)
  # then Kubernetes logs an error and doesn't start any future jobs.
  startingDeadlineSeconds: 300 # if a job hasn't started in this many seconds, skip it
  jobTemplate:
    spec:
      parallelism: 1 # how many pods will be instantiated at once
      completions: 1 # how many successful pod completions are required, one after the other (sequentially)
      backoffLimit: 3 # maximum pod restarts in case of failure
      activeDeadlineSeconds: 1800 # limit on the time for which the Job can continue to run
      template:
        spec:
          restartPolicy: Never # if you want restarts, use OnFailure
          terminationGracePeriodSeconds: 30
          containers:
            - name: my-first-conjob
              image: busybox
              command:
                - /bin/sh
              args:
                - -c
                - date; echo sleeping....; sleep 90s; echo exiting...;
              resources:
                requests:
                  memory: '128Mi'
                limits:
                  memory: '1Gi'
Hi, it's not entirely clear what you expected, but if I understand correctly you want to avoid running all the cron jobs at the same time:
1. The first option is to change their schedule times.
2. The second option is to use other settings in your spec template, such as parallel Jobs, described at https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
"For a work queue Job, you must leave .spec.completions unset, and set .spec.parallelism to a non-negative integer"
jobTemplate:
  spec:
    parallelism: 1
    template:
To reproduce this issue, please provide more details.
In addition, regarding the jobs history:
by default, successfulJobsHistoryLimit and failedJobsHistoryLimit are set to 3 and 1 respectively.
Please take a look at https://kubernetes.io/docs/tasks/job/
If you are interested, you can set the limits in the "spec" section:
successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 1
Hope this helps.

How to fail a (cron) job after a certain number of retries?

We have a Kubernetes cluster of web-scraping cron jobs set up. Everything seems to go well until a cron job starts to fail (e.g., when a site structure changes and our scraper no longer works). It looks like every now and then a few failing cron jobs keep retrying to the point where they bring down our cluster. Running kubectl get cronjobs (prior to a cluster failure) shows too many jobs running for a failing job.
I've attempted to follow the note described here regarding a known issue with the pod backoff failure policy; however, that does not seem to work.
Here is our config for reference:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scrape-al
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    metadata:
      labels:
        app: scrape
        scrape: al
    spec:
      template:
        spec:
          containers:
            - name: scrape-al
              image: 'govhawk/openstates:1.3.1-beta'
              command:
                - /opt/openstates/openstates/pupa-scrape.sh
              args:
                - al bills --scrape
          restartPolicy: Never
      backoffLimit: 3
Ideally we would prefer that a cron job be terminated after N retries (e.g., something like running kubectl delete cronjob my-cron-job after my-cron-job has failed 5 times). Any ideas or suggestions would be much appreciated. Thanks!
You can tell your Job to stop retrying by using backoffLimit, which specifies the number of retries before marking the Job as failed.
In your case:
spec:
  template:
    spec:
      containers:
        - name: scrape-al
          image: 'govhawk/openstates:1.3.1-beta'
          command:
            - /opt/openstates/openstates/pupa-scrape.sh
          args:
            - al bills --scrape
      restartPolicy: Never
  backoffLimit: 3
You set backoffLimit: 3 for your Job. That means that when a Job is created by the CronJob, it will retry 3 times if it fails. This controls the Job, not the CronJob.
When a Job fails, another Job will be created at the next scheduled time.
What you want:
If I am not wrong, you want to stop scheduling new Jobs when your scheduled Jobs have failed 5 times. Right?
Answer:
In that case, this is not possible automatically.
Possible solution:
You need to suspend the CronJob so that it stops scheduling new Jobs:
suspend: true
You can do this manually. If you do not want to do it manually, you need to set up a watcher that watches your CronJob status and updates the CronJob to suspend it when necessary.
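As a minimal sketch, you can suspend it manually with kubectl patch cronjob scrape-al -p '{"spec": {"suspend": true}}', or set the field directly in the manifest:
spec:
  suspend: true           # the CronJob controller skips all new scheduled runs
  schedule: '*/15 * * * *'
Setting suspend back to false resumes scheduling.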