Limited pods in Kubernetes (EKS) and Airflow

Despite increasing the values of the variables that control Airflow's concurrency levels, I never get more than nine simultaneous pods.
I have an EKS cluster with two m4.large nodes, each with capacity for 20 pods. The system itself occupies 15 pods, so there is room for 25 more, but Airflow never runs more than nine at a time.
I have created a scaling policy because the scheduler gets somewhat stressed when 500 DAGs are thrown at it at the same time, but EKS just brings up an additional node whose only effect is to spread the same nine pods around.
I have also tested with two m4.2xlarge nodes, with capacity for almost 120 pods, and the result is the same, despite quadrupling the system's capacity and increasing the number of scheduler threads from 2 to 6.
These are the environment variable values I am using:
AIRFLOW__CORE__PARALLELISM=1000
AIRFLOW__CORE__NON_POOLED_TASK_SLOT_COUNT=1000
AIRFLOW__CORE__DAG_CONCURRENCY=1000
AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE=0
AIRFLOW__CORE__SQL_ALCHEMY_MAX_OVERFLOW=-1
What could be happening?

OK, I've found where the problem is: Kubernetes does not schedule pods well when they carry no resource requests or limits. I have added requests and limits, and now the nodes fill up completely, with 20 pods each.
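For reference, this is roughly what the added requests and limits look like on a worker pod; it is a minimal sketch, and the pod name, image, and values are illustrative rather than the exact ones I used:

apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-example          # illustrative name only
spec:
  restartPolicy: Never
  containers:
    - name: task
      image: example/airflow-task:latest    # placeholder image
      resources:
        requests:                       # the scheduler packs nodes based on requests
          cpu: "100m"
          memory: "128Mi"
        limits:                         # the kubelet enforces these at runtime
          cpu: "250m"
          memory: "256Mi"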
Now I have another problem: the pods don't seem to disappear when they finish. The pods only print "Hello world", yet in dag_run there are runs that take anywhere from 49 seconds to 22 minutes. Even though each node now holds more pods, the whole batch still takes more than 20 minutes to complete, just as before.

Something is wrong. If I have two nodes that can host 100 pods, and every pod takes a minute to finish, then running five hundred pods should complete all the work in about five minutes. But it always takes 16-20 minutes. The nodes are never filled to capacity, and the pods finish their work but take a while to be deleted. What makes it so slow?
I use Airflow 1.10.9 with this configuration:
ENV AIRFLOW__CORE__PARALLELISM=100
ENV AIRFLOW__CORE__NON_POOLED_TASK_SLOT_COUNT=100
ENV AIRFLOW__CORE__DAG_CONCURRENCY=100
ENV AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG=100
ENV AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE=0
ENV AIRFLOW__CORE__SQL_ALCHEMY_MAX_OVERFLOW=-1
ENV AIRFLOW__KUBERNETES__WORKER_PODS_CREATION_BATCH_SIZE=10
ENV AIRFLOW__SCHEDULER__MAX_THREADS=6
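If the scheduler itself runs inside the cluster as a Deployment, the same settings can also be passed through its env section. The sketch below is only illustrative (the Deployment name and image are assumptions); it additionally shows the [kubernetes] delete_worker_pods option, which controls whether the executor removes worker pods once they finish:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-scheduler                # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow-scheduler
  template:
    metadata:
      labels:
        app: airflow-scheduler
    spec:
      containers:
        - name: scheduler
          image: example/airflow:1.10.9  # placeholder image
          env:
            - name: AIRFLOW__KUBERNETES__WORKER_PODS_CREATION_BATCH_SIZE
              value: "10"                # worker pod creation calls per scheduler loop
            - name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS
              value: "True"              # delete worker pods when their task finishes
            - name: AIRFLOW__SCHEDULER__MAX_THREADS
              value: "6"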

Related

Does Kubernetes put containers to sleep when the maximum capacity of a node is reached?

I am currently researching how the number of running pods on a single worker node relates to the energy consumption of the server. From the Kubernetes considerations for large clusters documentation, I found that the recommended maximum number of pods per node is 110. I am running microk8s from Canonical on a server with Ubuntu 20.04 LTS, with some add-ons enabled that use 15 pods, which leaves 95 pods available. The server is attached to the grid through a WattsUp Pro meter, which lets me measure power consumption on a per-second basis.
My set-up is as follows:
I use 1, 24, 48, 71 and 95 pods.
For every number, I scale up a deployment of a Python Flask application served by Gunicorn to the desired number of pods (a sketch of this step follows the list).
I wait for the power consumption to stabilize. Upon stabilization, I calculate the energy consumption of the server over a period of 60 seconds.
For every number, I run the measurement 30 times.
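Scaling to each target count is just a matter of changing the replica count on the Deployment; the manifest below is a minimal sketch with assumed names (flask-gunicorn, the image, and the port are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-gunicorn                   # assumed name
spec:
  replicas: 95                           # set to 1, 24, 48, 71 or 95 per run
  selector:
    matchLabels:
      app: flask-gunicorn
  template:
    metadata:
      labels:
        app: flask-gunicorn
    spec:
      containers:
        - name: app
          image: example/flask-gunicorn:latest   # placeholder image
          ports:
            - containerPort: 8000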
The results are more or less straightforward for the first four pod counts: the energy consumption during the "stable" 60 seconds increases linearly with the number of pods. However, the mean energy consumption for the 95-pod treatment deviates from this pattern.
Is there any mechanism in Kubernetes (or Ubuntu) that I am missing, that could be causing this unexpected result? Does anybody have a possible explanation?

Kubernetes: Is there a way to speed up node creation after deletion?

I've deployed my application on GCP Kubernetes and at times, I need to delete a node from one of the node pools.
Once I run kubectl delete node <node-id>, it takes about half an hour to an hour for a new node to come up in its place, even when the node is gracefully stopped and then deleted, which is a lot. Autoscaling is set to 1-3.
How do I make the node spawning process faster?
Any leads are appreciated!
Node version: 1.22.10-gke.600
Size: Number of nodes: 0
Autoscaling: On (1-5 nodes)
CPU target limit: 40%
Node zones: us-east1-b
It usually takes 1-2 minutes for a node to respawn (when the conditions are met). But when you delete a node, there may simply be no need for a new one.
If you want it to spawn faster, either increase the traffic/load or decrease the CPU target in the HPA (say, 50% or less).
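Purely as an illustration (the names and numbers below are assumptions, not taken from the cluster above), the CPU target is the targetCPUUtilizationPercentage field of the HPA; lowering it makes the HPA add pods sooner, which in turn puts pressure on the node pool to scale up:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa                 # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                   # assumed Deployment name
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 40   # lower this to scale out earlier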

Kubernetes Cronjob: Reset missed start times after cluster recovery

I have a cluster that includes a Cronjob scheduled to run every 5 minutes.
We recently experienced an issue that incurred downtime and required manual recovery of the cluster. Although now healthy again, this particular cronjob is failing to run with the following error:
Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
I understand that the CronJob has 'missed' a number of scheduled jobs while the cluster was down, and this has passed the threshold beyond which no further jobs will be scheduled.
How can I reset the number of missed start times and have these jobs scheduled again (without all the missed jobs suddenly being scheduled to run)?
Per the Kubernetes CronJob docs, there does not seem to be a way to cleanly resolve this. Setting .spec.startingDeadlineSeconds to a large number will re-schedule all missed occurrences that fall within the increased window.
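For reference, the field sits directly on the CronJob spec. The sketch below is illustrative (name, schedule, and container are assumptions) and uses a deliberately short window so that only very recent missed starts are considered, rather than replaying old ones:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron               # assumed name
spec:
  schedule: "*/5 * * * *"          # every 5 minutes, as in the question
  startingDeadlineSeconds: 200     # only count misses from the last 200 seconds
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: task
              image: busybox        # placeholder image
              command: ["sh", "-c", "echo hello"]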
My solution was just to kubectl delete cronjob x-y-z and recreate it, which worked as desired.

Pods start to be in pending state too long

I have a cluster where jobs are created depending on what my users do.
Sometimes I have 0 jobs in parallel and sometimes 20 to 100.
I have set the following limits and requests for each container (the corresponding manifest is sketched below):
cpu limit: 512m
memory limit: 512Mi
cpu request: 256m
memory request: 128Mi
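Expressed as a manifest, those values sit in the container's resources block; this is a minimal sketch, and the Job name and image are placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: user-job-example           # placeholder name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example/job:latest    # placeholder image
          resources:
            requests:
              cpu: "256m"
              memory: "128Mi"
            limits:
              cpu: "512m"
              memory: "512Mi"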
I have 2 nodes by default, and each one has:
7.91 CPU allocatable
10.16 GB memory allocatable
The node pool can scale to 5 nodes max.
But when the cluster has 8 or more jobs in parallel, new jobs start to sit in Pending, waiting for other jobs to finish.
If a job is scheduled immediately, it completes in 6 to 7 seconds.
But once the cluster starts to struggle, from 8 or 10 jobs, each job takes approximately 20 seconds to complete because it gets blocked in the Pending or ContainerCreating state.
I have IfNotPresent as imagePullPolicy and each image has a version.
Given my allocatable resources, I would expect the cluster to start struggling only at around 28 parallel jobs, then create a new node, and so on.
Why am I wrong?
Is it possible to force each container to start without sitting in the Pending state?
I have found an alternative scheduler (poseidon-firmament-alternate-scheduler), but I am not sure whether it can help me.

Running multiple pods simultaneously takes a lot of time in kubernetes

On my local machine, I am running multiple pods at the same time. The work takes a long time to complete even though all the pods reach the Running state almost instantly. Each pod internally runs a Docker image (1.8 GB). When I run the pods one after another, each run takes around 12 seconds, but when I run them in parallel the time grows dramatically, far beyond what the serial timing would suggest. What could be the probable cause?
EDIT 1
The operation is really CPU-intensive, with utilization above 90%. Is there a way to queue pods as they arrive, so that instead of all of them slowing down, each one executes quickly in turn?
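One possible approach, offered only as a sketch and not something from the original setup: if the work can be expressed as a Kubernetes Job, the parallelism field caps how many pods run at once while the rest queue, and a CPU request per container keeps the scheduler from over-packing a node. All names and numbers below are assumptions:

apiVersion: batch/v1
kind: Job
metadata:
  name: cpu-heavy-batch            # assumed name
spec:
  completions: 20                  # total pods to run
  parallelism: 4                   # at most 4 pods run at once; the rest wait
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example/cpu-task:latest   # placeholder for the 1.8 GB image
          resources:
            requests:
              cpu: "1"             # reserve a full core per pod
            limits:
              cpu: "1"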