Airflow kubernetes architecture understanding - kubernetes

I'm trying to understand the Arch of Airflow on Kubernetes.
Using the helm and Kubernetes executor, the installation mounts 3 pods called: Trigger, WebServer, and Scheduler...
When I run a dag using the Kubernetes pod operator, it also mounts 2 pods more: one with the dag name and another one with the task name...
I want to understand the communication between pods... So far I know the only the expressed in the image:
Note: I'm using the git sync option
Thanks in advance for the help that you can give me!!

Airflow application has some components that require for it to operate normally: Webserver, Database, Scheduler, Trigger, Worker(s), Executor. You can read about it here.
Lets go over the options:
Kubernetes Executor (As you choose):
In your instance since you are deploying on Kubernetes with Kubernetes Executor then each task being executed is a pod. Airflow wraps the task with a Pod no matter what task it is. This brings to the front the isolation that Kubernetes offer, this also bring the overhead of creating a pod for each task. Choosing Kubernetes Executor often goes with case where many/most of your tasks takes long time to execute - as if your tasks takes 5 seconds to complete it might not be worth to pay the overhead of creating pod for each task. As for what you see as the DAG -> Task1 in your diagram. Consider that the Scheduler launches the Airflow workers. The workers are starting the tasks in new pods. So the worker needs to monitor the execution of the task.
Celery Executor - Setting up a Worker/Pod which tasks can run in it. This gives you speed as there is no need to create pod for each task but there is no isolation for each task. Noting that using this executor doesn't mean that you can't run tasks in their own Pod. User can run KubernetesPodOperator and it will create a Pod for the task.
CeleryKubernetes Executor - Enjoying both worlds. You decide which tasks will be executed by Celery and which by Kubernetes. So for example you can set small short tasks to Celery and longer tasks to Kubernetes.
How will it look like Pod wise?
Kubernetes Executor - Every task creates a pod. PythonOperator, BashOperator - all of them will be wrapped with pods (user doesn't need to change anything on his DAG code).
Celery Executor - Every task will be executed in a Celery worker(pod). So the pod is always in Running waiting to get tasks. You can create a dedicated pod for a task if you will explicitly use KubernetesPodOperator.
CeleryKubernetes - Combining both of the above.
Note again that you can use each one of these executors with Kubernetes environment. Keep in mind that all of these are just executors. Airflow has other components like mentioned earlier so it's very OK to deploy Airflow on Kubernetes (Scheduler, Webserver) but to use CeleryExecutor thus the user code (tasks) are not creating new pods automatically.
As for Triggers since you asked about it specifically - It's a feature added in Airflow 2.2: Deferrable Operators & Triggers it allows tasks to defer and release worker slot.

Related

Can I "recycle" Pods when using Kubernetes Executor on Airflow, to avoid repeating the Initialization?

Context: Running Airflow v2.2.2 with Kubernetes Executor.
I am running a process that creates a burst of quick tasks.
The tasks are short enough that the Kubernetes Pod initialization takes up the majority of runtime for many of them.
Is there a way to re-utilize pre-initialized pods for multiple tasks?
I've found a comment in an old issue that states that when running the Subdag operator, all subtasks will be run on the same pod, but I cannot find any further information. Is this true?
I have searched the following resources:
Airflow Documentation: Kubernetes
Airflow Documentation: KubernetesExecutor
Airflow Documentation: KubernetesPodOperator
StackOverflow threads: Run two jobs on same pod, Best Run Airflow on Kube
Google Search: airflow kubernetes reuse initialization
But haven't really found anything that directly addresses my problem.
I don't think this is possible in Airlfow even with Subdag operator which runs a separate dag as a part of the current dag with the same way used for the other tasks.
To solve your problem, you can use CeleryKubernetesExecutor which combine CeleryExecutor and KubernetesExecutor. By default the tasks will be queued in Celery queue but for the heavy tasks, you can choose K8S queue in order to run them in isolated pods. In this case you will be able to use the Celery workers which are up all the time.
kubernetes_task= BashOperator(
task_id='kubernetes_executor_task',
bash_command='echo "Hello from Kubernetes"',
queue = 'kubernetes'
)
celery_task = BashOperator(
task_id='celery_executor_task',
bash_command='echo "Hello from Celery"',
)
If you are worry about the scalability/cost of Celery workers, you can use KEDA to scale the Celery workers from 0 workers to a maximum number of workers based on queued tasks count.

Airflow fault tolerance

I have 2 questions:
first, what does it mean that the Kubernetes executor is fault tolerance, in other words, what happens if one worker nodes gets down?
Second question, is it possible that the whole Airflow server gets down? if yes, is there a backup that runs automatically to continue the work?
Note: I have started learning airflow recently.
Thanks in advance
This is a theoretical question that faced me while learning apache airflow, I have read the documentation
but it did not mention how fault tolerance is handled
what does it mean that the Kubernetes executor is fault tolerance?
Airflow scheduler use a Kubernetes API watcher to watch the state of the workers (tasks) on each change in order to discover failed pods. When a worker pod gets down, the scheduler detect this failure and change the state of the failed tasks in the Metadata, then these tasks can be rescheduled and executed based on the retry configurations.
is it possible that the whole Airflow server gets down?
yes it is possible for different reasons, and you have some different solutions/tips for each one:
problem in the Metadata: the most important part in Airflow is the Metadata where it's the central point used to communicate between the different schedulers and workers, and it is used to save the state of all the dag runs and tasks, and to share messages between tasks, and to store variables and connections, so when it gets down, everything will fail:
you can use a managed service (AWS RDS or Aurora, GCP Cloud SQL or Cloud Spanner, ...)
you can deploy it on your K8S cluster but in HA mode (doc for postgresql)
problem with the scheduler: the scheduler is running as a pod, and the is a possibility to lose depending on how you deploy it:
Try to request enough resources (especially memory) to avoid OOM problem
Avoid running it on spot/preemptible VMs
Create multiple replicas (minimum 3) for the scheduler to activate HA mode, in this case if a scheduler gets down, there will be other schedulers up
problem with webserver pod: it doesn't affect your workload, but you will not be able to access the UI/API during the downtime:
Try to request enough resources (especially memory) to avoid OOM problem
It's a stateless service, so you can create multiple replicas without any problem, if one gets down, you will access the UI/API using the other replicas

Spring boot scheduler running cron job for each pod

Current Setup
We have kubernetes cluster setup with 3 kubernetes pods which run spring boot application. We run a job every 12 hrs using spring boot scheduler to get some data and cache it.(there is queue setup but I will not go on those details as my query is for the setup before we get to queue)
Problem
Because we have 3 pods and scheduler is at application level , we make 3 calls for data set and each pod gets the response and pod which processes at caches it first becomes the master and other 2 pods replicate the data from that instance.
I see this as a problem because we will increase number of jobs for get more datasets , so this will multiply the number of calls made.
I am not from Devops side and have limited azure knowledge hence I need some help from community
Need
What are the options available to improve this? I want to separate out Cron schedule to run only once and not for each pod
1 - Can I keep cronjob at cluster level , i have read about it here https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
Will this solve a problem?
2 - I googled and found other option is to run a Cronjob which will schedule a job to completion, will that help and not sure what it really means.
Thanks in Advance to taking out time to read it.
Based on my understanding of your problem, it looks like you have following two choices (at least) -
If you continue to have scheduling logic within your springboot main app, then you may want to explore something like shedlock that helps make sure your scheduled job through app code executes only once via an external lock provider like MySQL, Redis, etc. when the app code is running on multiple nodes (or kubernetes pods in your case).
If you can separate out the scheduler specific app code into its own executable process (i.e. that code can run in separate set of pods than your main application code pods), then you can levarage kubernetes cronjob to schedule kubernetes job that internally creates pods and runs your application logic. Benefit of this approach is that you can use native kubernetes cronjob parameters like concurrency and few others to ensure the job runs only once during scheduled time through single pod.
With approach (1), you get to couple your scheduler code with your main app and run them together in same pods.
With approach (2), you'd have to separate your code (that runs in scheduler) from overall application code, containerize it into its own image, and then configure kubernetes cronjob schedule with this new image referring official guide example and kubernetes cronjob best practices (authored by me but can find other examples).
Both approaches have their own merits and de-merits, so you can evaluate them to suit your needs best.

Airflow - Async pod lauch

we have a long-running task in airflow (5hr and more)
I'm looking on a way to do fire a forget (after the pod started running) and monitor the status of the task using a sensor
The rationale is to free the worker and reduce resource consumption.
This line seems to monitor for pod health.
https://github.com/apache/airflow/blob/10ce31127f1ff87176158935925afce46a989917/airflow/kubernetes/pod_launcher.py#L140
Now, I know I can override this limitation using a Cloud Function to trigger the pod, but this is a kind of over-complication of the DAG.
I've also opened an issue in github

Task definitions AWS Fargate

Let us say I am defining a task definition in AWS Fargate, this task definition would be used to start up tasks that involve a multi-container application regarding 2 web servers. How many task definitions would I need, how many tasks would I pay for and how many services are create?
I have read a lot of documentation, but it does not click for me. Is there anyone who can explain the correlation between: task definitions, task/s, Docker containers, services and ECS Fargate clusters?
A task definition is a specification. You use it to define one or more containers (with image URIs) that you want to run together, along with other details such as environment variables, CPU/memory requirements, etc. The task definition doesn't actually run anything, its a description of how things will be set up when something does run.
A task is an actual thing that is running. ECS uses the task definition to run the task; it downloads the container images, configures the runtime environment based on other details in the task definition. You can run one or many tasks for any given task definition. Each running task is a set of one or more running containers - the containers in a task all run on the same instance.
A service in ECS is a way to run N tasks all using the same task definition, and keep those N tasks running if they happen to shut down unexpectedly. Those N tasks can run on different instances in EC2 (although some may run on the same instance depending on the placement strategy used for the service); on Fargate, there are no instances and the tasks "just run", so you don't have to think about placement strategies. You can also use services to connect those tasks to a load balancer, so that requests from a client inside or outside of AWS can be routed evenly cross all N tasks. You can update the task definition used by a service, which will then trigger a rolling update (starting up and shutting down running tasks) so that all running tasks will be using the new version of the task definition after the deployment completes. This is used, for example, when you create a new container image and want your service to be updated to use the latest version.
A service is scoped to a cluster. A cluster is really just a name. Different clusters can have different IAM policies and roles, so that you can restrict who can create services in different clusters using IAM.