Celery upgrade - workers not receiving tasks from RabbitMQ - Kubernetes

Problem description
I have a working Django application that uses Celery together with MongoDB and RabbitMQ (3.7.17-management-alpine). The application runs on a Kubernetes cluster and works fine in general.

However, when I upgrade Celery 3.1.25 and Kombu 3.0.37 to Celery 4.1.0 and Kombu 4.1.0, I run into the following issue: the Celery worker pods come up but do not receive any tasks. I have verified that RabbitMQ receives the messages needed to run the tasks in the Celery workers. There is no error in either the RabbitMQ pod or the Celery worker pods; in fact, the Celery pod reports that it is connected to RabbitMQ.

Strangely, when I restart the RabbitMQ pod after the Celery worker pod has come up, everything becomes fine: I am then able to run new tasks in the Celery workers. So I suspect something changed in the Celery/Kombu/RabbitMQ interaction with the upgrade to 4.1.0. The code works fine with the older versions of Celery and Kombu.

Can someone please help me with this?
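For context, a minimal Celery 4.x application sketch is shown below. The project name, broker URL and setting values are placeholders, and none of this is a confirmed fix for the problem above; these are simply the settings (default queue, prefetch, broker heartbeat and connection retries) that are commonly reviewed after a 3.x to 4.x upgrade when workers connect to RabbitMQ but sit idle.

```python
# Sketch only: a Celery 4.x app definition with placeholder names and values.
# These settings are common things to review after a 3.x -> 4.x upgrade; they
# are not a confirmed fix for the issue described above.
from celery import Celery

app = Celery("myproject", broker="amqp://user:password@rabbitmq:5672//")

app.conf.update(
    task_default_queue="celery",         # producers and workers must agree on the queue name
    worker_prefetch_multiplier=1,        # keep prefetch behaviour predictable
    broker_heartbeat=0,                  # AMQP heartbeats through k8s Services are a common suspect
    broker_connection_max_retries=None,  # keep retrying the broker connection indefinitely
)
```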

Related

How to keep Flink taskmanager pod running on K8s

I've created a Flink cluster in Session mode on native K8s, based on these instructions, using the command:
$ ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-first-flink-cluster
I am able to submit jobs to the cluster and view the Flink UI. However, I noticed that Flink creates a taskmanager pod only when the job is submitted and deletes it right after the job is finished. Previously I tried the same using YARN based deployment on Google Dataproc and with that method the cluster had a taskmanager always running which reduced job start time.
Hence, is there a way to keep a TaskManager pod always running in a K8s Flink deployment so that job start time is reduced?
The intention of the native K8s support provided by Flink is to do this active resource allocation (i.e., acquiring task slots by starting new TaskManager instances) only when it is needed. In addition, it allows TaskManager pods to be shut down once they are no longer used. That's the behavior you're observing.
What you're looking for is the standalone k8s support. Here, Flink does not try to start new TaskManager pods. The ResourceManager is passive, i.e. it only considers the TaskManagers that are registered. Some outside process (or a user) has to manage TaskManager pods instead. This might lead to jobs failing if there are not enough task slots available.
Best,
Matthias
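To make the "outside process" mentioned above concrete, here is a minimal sketch (not part of the original answer) that uses the official Kubernetes Python client to pin the TaskManager count of a standalone Flink deployment. The Deployment name flink-taskmanager and the namespace are assumptions about how the standalone setup is laid out.

```python
# Sketch: keep a fixed number of TaskManager replicas in a standalone Flink
# deployment by scaling its Deployment. Assumes the TaskManagers run as a
# Deployment named "flink-taskmanager"; name and namespace are placeholders.
from kubernetes import client, config


def ensure_taskmanager_replicas(replicas: int, namespace: str = "default") -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    client.AppsV1Api().patch_namespaced_deployment_scale(
        name="flink-taskmanager",
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )


if __name__ == "__main__":
    # Keep two TaskManagers around so submitted jobs start without waiting
    # for new pods to be scheduled.
    ensure_taskmanager_replicas(2)
```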

Airflow: set up more than one scheduler in a docker compose YAML

I'm running Airflow 2.2.5 using a docker compose setup. I use the Celery executor and 10+ worker nodes on different machines. This setup works fine for a few worker nodes, but if I launch all 12 nodes, the worker instances start to crash. I suspect that the reason might be that the scheduler can't handle the traffic from all the worker nodes.
I would like to test a setup with multiple schedulers on the main node to see if this solves my problem. I was unable to find an answer on how to implement this sort of setup in my docker compose file. Can I just create two services, scheduler1 and scheduler2, with identical definitions (roughly as sketched below), or is there a better way?
The official documentation was a bit short on this one:
https://airflow.apache.org/docs/apache-airflow/2.0.2/scheduler.html?highlight=scheduler#running-more-than-one-scheduler
I know that in the Kubernetes setup the scheduler count is just one parameter, but unfortunately I do not have Kubernetes at hand at the moment.
Update: I found an answer in this post: How to set up multiple schedulers for airflow. It didn't solve the Celery issue, though.
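For reference, the kind of compose-level duplication the question asks about would look roughly like the sketch below. It assumes the official apache/airflow docker-compose layout with an x-airflow-common YAML anchor defined elsewhere in the same file; the service names are placeholders, and running more than one scheduler also requires a metadata database that supports the row-level locking Airflow's HA scheduler relies on (see the linked documentation).

```yaml
# Sketch only: two scheduler services with identical definitions, assuming the
# official docker-compose file's "x-airflow-common" anchor. Names are placeholders.
airflow-scheduler-1:
  <<: *airflow-common
  command: scheduler
  restart: always

airflow-scheduler-2:
  <<: *airflow-common
  command: scheduler
  restart: always
```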

How to configure Prometheus + Celery Flower auth?

I use Celery Flower for task monitoring, and recently I decided to add Prometheus. Everything worked fine until I added the --auth option to my Flower command; after this, Prometheus stopped getting data from Flower, so my guess is that auth also blocks Prometheus from scraping it. Is there a way to authenticate Prometheus against Flower without compromising security?
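Not part of the original question, but a quick way to narrow this down: as far as I know, Flower's --auth option enables OAuth-based login, which Prometheus cannot easily pass, while --basic_auth=user:password enables HTTP basic auth, which Prometheus does support via the basic_auth section of a scrape job. Assuming basic auth, the sketch below checks that the /metrics endpoint is reachable with credentials; the URL and credentials are placeholders.

```python
# Diagnostic sketch only (not the Prometheus configuration itself): verify that
# Flower's /metrics endpoint answers when HTTP basic auth credentials are sent.
# URL and credentials are placeholders.
import requests

resp = requests.get(
    "http://flower.example.internal:5555/metrics",
    auth=("prometheus", "secret"),  # must match Flower's --basic_auth=user:password
    timeout=5,
)
resp.raise_for_status()
print("\n".join(resp.text.splitlines()[:5]))  # first few exported metric lines
```

If this request succeeds, giving Prometheus the same credentials in its scrape configuration should restore the metrics.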

Does Airflow Kubernetes Executor run any operator?

I am assessing the migration of my current Airflow deployment from Celery executor to Kubernetes (K8s) executor to leverage the dynamic allocation of resources and the isolation of tasks provided by pods.
It is clear to me that we can use the native KubernetesPodOperator to run tasks on a K8s cluster via the K8s executor. However, I couldn't find info about compatibility between the K8s executor and other operators, such as the Bash and Athena operators.
So the question is: is it possible to run a Bash (or any other) operator on a K8s-powered Airflow, or should I migrate all my tasks to the KubernetesPodOperator?
Thanks!
The Kubernetes executor will work with all operators.
Using the Kubernetes executor creates a worker pod for every task, instead of running tasks on a long-lived Celery worker as the Celery executor does.
Using the KubernetesPodOperator will pull any specific image to launch a pod and execute your task.
So if you are to use the KubernetesPodOperator with the KubernetesExecutor, Airflow will launch a worker pod for your task, and that task will launch a pod and monitor its execution. 2 pods for 1 task.
If you use a BashOperator with the KubernetesExecutor, Airflow will launch a worker pod and execute bash commands on that worker pod. 1 pod for 1 task.
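To illustrate the "1 pod for 1 task" case, here is a minimal DAG sketch (assuming Airflow 2.x import paths; the DAG id and command are placeholders). With the KubernetesExecutor, each run of this BashOperator task executes in its own short-lived worker pod, with no KubernetesPodOperator involved.

```python
# Sketch: a plain BashOperator task. Under the KubernetesExecutor, each task
# instance runs in its own worker pod. DAG id and command are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="bash_on_k8s_executor",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'running inside a KubernetesExecutor worker pod'",
    )
```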

How to properly use Kubernetes for job scheduling?

I have the following system in mind: A master program that polls a list of tasks to see if they should be launched (based on some trigger information). The tasks themselves are container images in some repository. Tasks are executed as jobs on a Kubernetes cluster to ensure that they are run to completion. The master program is a container executing in a pod that is kept running indefinitely by a replication controller.
However, I have not stumbled upon this pattern of launching jobs from a pod. Every tutorial seems to be assuming that I just call kubectl from outside the cluster. Of course I could do this but then I would have to ensure the master program's availability and reliability through some other system. So am I missing something? Launching one-off jobs from inside an indefinitely running pod seems to me as a perfectly valid use case for Kubernetes.
Your master program can utilize the Kubernetes client libraries to perform operations on a cluster. Find a complete example here.
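As a rough illustration of that pattern, the sketch below uses the official Kubernetes Python client to launch a one-off Job from inside a long-running pod. The job name, image and namespace are placeholders, and it assumes the pod's service account has permission to create Jobs.

```python
# Sketch: a long-running "master" pod launching a one-off Kubernetes Job via
# the official Python client. Job name, image and namespace are placeholders.
from kubernetes import client, config


def launch_task_job(name: str, image: str, namespace: str = "default") -> None:
    config.load_incluster_config()  # authenticate using the pod's service account
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(
            backoff_limit=2,  # retry the task container up to twice on failure
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(name=name, image=image)],
                )
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)


if __name__ == "__main__":
    launch_task_job("example-task", "busybox:1.36")
```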