Celery executor in Airflow without a backend - celery

When running Airflow on a single node, can I configure the CeleryExecutor without a Celery backend (Redis, RabbitMQ, Zookeeper, etc.)?
Currently, since there is no backend infrastructure available, I am running the webserver and scheduler on a single node (LocalExecutor).
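For context, here is a minimal sketch of what each executor expects, expressed with Airflow's documented AIRFLOW__<SECTION>__<KEY> environment-variable overrides (the connection strings are placeholders, not values from my setup): the LocalExecutor only needs the metadata database, while the CeleryExecutor additionally expects a broker URL and a result backend, which is where Redis/RabbitMQ usually come in.

```python
# Minimal sketch of what each executor needs, written as Airflow's documented
# AIRFLOW__<SECTION>__<KEY> environment-variable overrides.
# The connection strings below are placeholders, not values from this setup.
import os

# Option 1 - LocalExecutor: only the metadata database is required.
os.environ["AIRFLOW__CORE__EXECUTOR"] = "LocalExecutor"

# Option 2 - CeleryExecutor: additionally requires a message broker and a
# result backend, which is normally where Redis/RabbitMQ come in.
# os.environ["AIRFLOW__CORE__EXECUTOR"] = "CeleryExecutor"
# os.environ["AIRFLOW__CELERY__BROKER_URL"] = "redis://localhost:6379/0"
# os.environ["AIRFLOW__CELERY__RESULT_BACKEND"] = "db+postgresql://user:pass@localhost/airflow"
```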

Related

How to keep Flink taskmanager pod running on K8s

I've created a Flink cluster using Session mode on native K8s using the command:
$ ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-first-flink-cluster
based on these instructions.
I am able to submit jobs to the cluster and view the Flink UI. However, I noticed that Flink creates a taskmanager pod only when a job is submitted and deletes it right after the job is finished. Previously I tried the same thing using a YARN-based deployment on Google Dataproc, and with that method the cluster always had a taskmanager running, which reduced job start time.
Hence, is there a way to keep a taskmanager pod always running with the K8s Flink deployment so that job start time is reduced?
The intention of the native K8s support provided by Flink is to allocate resources actively (i.e., to provide task slots by starting new TaskManager instances) when they are needed. In addition to that, it allows TaskManager pods to be shut down when they are not used anymore. That's the behavior you're observing.
What you're looking for is the standalone k8s support. Here, Flink does not try to start new TaskManager pods. The ResourceManager is passive, i.e. it only considers the TaskManagers that are registered. Some outside process (or a user) has to manage TaskManager pods instead. This might lead to jobs failing if there are not enough task slots available.
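If it helps, the sketch below shows roughly what that outside management amounts to when done with the Kubernetes Python client: a plain TaskManager Deployment that you create and scale yourself. It is loosely modelled on the TaskManager Deployment from Flink's standalone Kubernetes documentation; all names, the image tag, the replica count, and the jobmanager.rpc.address value are placeholders.

```python
# Rough sketch: a user-managed TaskManager Deployment for a *standalone* session
# cluster, created with the Kubernetes Python client. All names, the image tag,
# the replica count, and the jobmanager.rpc.address value are placeholders.
from kubernetes import client, config

config.load_kube_config()

labels = {"app": "flink", "component": "taskmanager"}

container = client.V1Container(
    name="taskmanager",
    image="flink:1.13",        # placeholder image tag
    args=["taskmanager"],      # official Flink image entrypoint mode
    env=[client.V1EnvVar(
        name="FLINK_PROPERTIES",
        # Point the TaskManagers at the (separately deployed) JobManager service.
        value="jobmanager.rpc.address: flink-jobmanager\n",
    )],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="flink-taskmanager"),
    spec=client.V1DeploymentSpec(
        replicas=2,            # these pods stay up until you scale them down
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```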
Best,
Matthias

Airflow with KubernetesExecutor: how to get network access to worker pods from outside?

I have to run Airflow DAGs with the KubernetesExecutor (mostly PythonVirtualenvOperators), and I want to receive incoming connections in my operators.
How can this be done? I really need it in order to run a SparkSession in .master("yarn") mode.
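To make it concrete, here is a rough sketch of what I have in mind, assuming Airflow 2.x-style executor_config (the pod label, Service DNS name, driver port, and Spark settings are placeholders I made up, not a working configuration): label the KubernetesExecutor worker pod so that a separately created Kubernetes Service could select it, and pin the Spark driver address settings so that YARN can reach back into the pod.

```python
# Rough sketch (Airflow 2.x style). Everything below that is not standard API --
# the DAG id, pod label, Service hostname, and port -- is a made-up placeholder.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from kubernetes.client import models as k8s


def start_spark_driver():
    # Runs inside the KubernetesExecutor worker pod; in .master("yarn") client
    # mode, YARN executors must be able to connect back to this driver.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("yarn")
        .config("spark.driver.host", "airflow-spark-driver.airflow.svc.cluster.local")  # placeholder Service DNS name
        .config("spark.driver.bindAddress", "0.0.0.0")
        .config("spark.driver.port", "40000")  # fixed port so a Service can expose it
        .getOrCreate()
    )
    return spark.version


with DAG("spark_on_yarn_sketch", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    PythonOperator(
        task_id="spark_driver",
        python_callable=start_spark_driver,
        # Label the worker pod so a separately created Kubernetes Service
        # (selecting app=airflow-spark-driver) can route traffic to it.
        executor_config={
            "pod_override": k8s.V1Pod(
                metadata=k8s.V1ObjectMeta(labels={"app": "airflow-spark-driver"})
            )
        },
    )
```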

Celery Upgrade - Workers not working with RabbitMQ

Problem description
I have a working Django application using Celery along with Mongo and RabbitMQ (3.7.17-management-alpine). The application runs on a Kubernetes cluster and works fine in general.
But when I upgrade Celery (3.1.25) and Kombu (3.0.37) to Celery (4.1.0) and Kombu (4.1.0), I face the following issue:
1. The Celery worker pods come up but do not receive the tasks.
2. I have verified that RabbitMQ receives the messages needed to run the tasks in the Celery workers.
3. There is no error in the RabbitMQ or Celery worker pods; in fact, the Celery worker pod reports that it is connected to RabbitMQ.
4. Strangely, when I restart the RabbitMQ pod after the Celery worker pod comes up, things become fine: I am able to run new tasks in the Celery workers after restarting the RabbitMQ pod.
So I guess something changed with respect to Celery/Kombu/RabbitMQ after the upgrade to 4.1.0. The code works fine with the older versions of Celery and Kombu.
Can someone please help me with this?
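For reference, the standard Celery 4.x Django wiring looks roughly like the sketch below (the project name and broker URL are placeholders, not my real ones); in 4.x the Django settings are read through the CELERY_ namespace rather than the 3.x-era configuration.

```python
# celery.py inside the Django project package -- standard Celery 4.x wiring.
# "myproject" and the broker URL are placeholders.
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery("myproject")

# In Celery 4.x the Django settings are read via the CELERY_ namespace, e.g.
#   CELERY_BROKER_URL = "amqp://user:pass@rabbitmq:5672//"
app.config_from_object("django.conf:settings", namespace="CELERY")

# Pick up tasks.py modules from all installed Django apps.
app.autodiscover_tasks()
```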

Does Airflow Kubernetes Executor run any operator?

I am assessing the migration of my current Airflow deployment from Celery executor to Kubernetes (K8s) executor to leverage the dynamic allocation of resources and the isolation of tasks provided by pods.
It is clear to me that we can use the native KubernetesPodOperator to run tasks on a K8s cluster via the K8s executor. However, I couldn't find any info about the compatibility of the K8s executor with other operators, such as the Bash and Athena operators.
So the question is: is it possible to run a Bash (or any other) operator on a K8s-powered Airflow, or should I migrate all my tasks to the KubernetesPodOperator?
Thanks!
The Kubernetes executor will work with all operators.
Using the Kubernetes executor creates a worker pod for every task, instead of running tasks on a Celery worker as the Celery executor does.
Using the KubernetesPodOperator pulls the specified image, launches a pod from it, and executes your task there.
So if you use the KubernetesPodOperator with the KubernetesExecutor, Airflow will launch a worker pod for your task, and that task will launch another pod and monitor its execution: 2 pods for 1 task.
If you use a BashOperator with the KubernetesExecutor, Airflow will launch a worker pod and execute the bash commands on that worker pod: 1 pod for 1 task.
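To illustrate, here is a minimal sketch assuming Airflow 2.x with the cncf.kubernetes provider (the DAG id, namespace, and image are placeholders): both tasks below run under the KubernetesExecutor; the BashOperator gets one worker pod, while the KubernetesPodOperator task gets a worker pod plus the pod it launches.

```python
# Minimal sketch: both operators work under the KubernetesExecutor.
# DAG id, namespace, and image are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG("k8s_executor_sketch", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:

    # Runs directly inside the worker pod the executor creates: 1 pod for 1 task.
    hello_bash = BashOperator(
        task_id="hello_bash",
        bash_command="echo 'running inside the KubernetesExecutor worker pod'",
    )

    # The worker pod launches and monitors a second pod: 2 pods for 1 task.
    hello_pod = KubernetesPodOperator(
        task_id="hello_pod",
        name="hello-pod",
        namespace="default",
        image="python:3.9-slim",
        cmds=["python", "-c"],
        arguments=["print('running inside the KubernetesPodOperator pod')"],
    )

    hello_bash >> hello_pod
```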

Is there any way to deploy Airflow on Elastic Beanstalk (separate environments for webserver, scheduler and worker)?

I am trying to set up Airflow on AWS Elastic Beanstalk, with separate environments for the webserver, scheduler, and workers, so that each of them can be scaled independently.
Problem:
1. How do we know the health of all the applications (webserver, scheduler, and workers)?
2. Is there any better way to deploy Airflow in a production environment?
Motivation for deploying on AWS Elastic Beanstalk: if any of the servers goes down, AWS will manage it and spin up a new server/servers.
Currently, I am running the webserver, scheduler, and workers on one EC2 machine, started with airflow webserver, airflow scheduler, and airflow worker respectively. I came across "/health" (https://airflow.apache.org/howto/check-health.html), which provides a health check for the scheduler and the metadata database, but this endpoint reports the scheduler as unhealthy even when the scheduler is running.
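For context, this is roughly how I am checking the endpoint (a small sketch; the host/port is a placeholder for my setup):

```python
# Small sketch of polling the webserver health endpoint; the host/port is a
# placeholder. The scheduler is reported unhealthy when its latest heartbeat is
# older than [scheduler] scheduler_health_check_threshold in airflow.cfg.
import requests

resp = requests.get("http://localhost:8080/health", timeout=5)
health = resp.json()

print("metadatabase:", health["metadatabase"]["status"])
print("scheduler:", health["scheduler"]["status"])
print("last heartbeat:", health["scheduler"].get("latest_scheduler_heartbeat"))
```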