Recommendation needed for running cron in replicated services - kubernetes

Say my service runs cron jobs, but its replica count is set to 2 in Kubernetes; this causes each cron job to run twice at every schedule.
Is there any design recommendation to prevent this?
Should we keep only 1 replica for services that run cron jobs, or is there another way?

Related

If the deployment has 2 replicas, will the scheduler in quarkus be executed 2 times?

I am new to Kubernetes and have a question. I have a Quarkus deployment that schedules periodic tasks.
Is there any chance that the scheduler will run in every replica?
@Scheduled(cron="3 0 2 * * ?")
Yes.
Unless you implement some kind of clustering between your application instances, each replica is a standalone process that will execute the schedule just the same.
Yes. If you want a single task execution per cluster, you will need to use a clustered Quartz setup: https://quarkus.io/guides/quartz
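As a rough illustration, here is a minimal sketch of what that looks like, assuming a recent Quarkus with the quarkus-quartz extension and a JDBC job store configured (the class and method names are made up for the example):

    import io.quarkus.scheduler.Scheduled;
    import jakarta.enterprise.context.ApplicationScoped;

    @ApplicationScoped
    public class NightlyJob {

        // With Quartz running clustered (set in application.properties):
        //   quarkus.quartz.clustered=true
        //   quarkus.quartz.store-type=jdbc-cmt
        // the trigger state lives in the shared database, so only one
        // replica fires this method per schedule instead of every pod.
        @Scheduled(cron = "3 0 2 * * ?")
        void runNightly() {
            // ... the periodic task ...
        }
    }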

Airflow kubernetes architecture understanding

I'm trying to understand the architecture of Airflow on Kubernetes.
Using the Helm chart and the Kubernetes executor, the installation creates 3 pods: Triggerer, Webserver, and Scheduler...
When I run a DAG using the KubernetesPodOperator, it also creates 2 more pods: one with the DAG name and another with the task name...
I want to understand the communication between the pods... So far I only know what is expressed in the image:
Note: I'm using the git-sync option.
Thanks in advance for any help you can give me!
An Airflow deployment has several components that it requires to operate normally: Webserver, Database, Scheduler, Triggerer, Worker(s), and Executor. You can read about them here.
Let's go over the options:
Kubernetes Executor (As you choose):
In your case, since you are deploying on Kubernetes with the Kubernetes Executor, each task is executed as a pod: Airflow wraps the task in a pod no matter what the task is. This gives you the isolation that Kubernetes offers, but it also brings the overhead of creating a pod for every task. The Kubernetes Executor is usually chosen when many or most of your tasks take a long time to execute; if a task completes in 5 seconds, it may not be worth paying the overhead of creating a pod for it. As for the DAG -> Task1 you see in your diagram: the Scheduler launches the Airflow workers, the workers start the tasks in new pods, and each worker then monitors the execution of its task.
Celery Executor - sets up workers (pods) in which tasks run. This gives you speed, since there is no need to create a pod for each task, but there is no isolation between tasks. Note that using this executor doesn't mean you can't run tasks in their own pods: you can still use the KubernetesPodOperator, which will create a pod for that task.
CeleryKubernetes Executor - the best of both worlds. You decide which tasks are executed by Celery and which by Kubernetes; for example, you can route small, short tasks to Celery and longer ones to Kubernetes.
How will this look pod-wise?
Kubernetes Executor - every task creates a pod. PythonOperator, BashOperator - all of them are wrapped in pods (the user doesn't need to change anything in the DAG code).
Celery Executor - every task is executed in a Celery worker (pod), so the pod is always Running, waiting to pick up tasks. You can still create a dedicated pod for a task by explicitly using the KubernetesPodOperator.
CeleryKubernetes - Combining both of the above.
Note again that you can use any of these executors in a Kubernetes environment. Keep in mind that these are just executors; Airflow has the other components mentioned earlier, so it is perfectly fine to deploy Airflow on Kubernetes (Scheduler, Webserver) while using the CeleryExecutor, in which case the user code (tasks) does not create new pods automatically.
As for Triggers, since you asked about them specifically: the Triggerer supports a feature added in Airflow 2.2, Deferrable Operators & Triggers, which allows tasks to defer and release their worker slot.

Spring boot scheduler running cron job for each pod

Current Setup
We have a Kubernetes cluster with 3 pods running a Spring Boot application. We run a job every 12 hours using the Spring Boot scheduler to fetch some data and cache it. (There is also a queue setup, but I won't go into those details, as my question concerns the setup before we get to the queue.)
Problem
Because we have 3 pods and the scheduler runs at the application level, we make 3 calls for the data set; each pod gets the response, the pod that caches it first becomes the master, and the other 2 pods replicate the data from that instance.
I see this as a problem because we will be adding more jobs to fetch more datasets, which will multiply the number of calls made.
I am not on the DevOps side and have limited Azure knowledge, hence I need some help from the community.
Need
What options are available to improve this? I want the cron schedule to run only once, not once per pod.
1 - Can I keep the cron job at the cluster level? I have read about it here: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
Will this solve the problem?
2 - I googled and found another option: running a CronJob that schedules a Job to completion. Will that help? I'm not sure what it really means.
Thanks in advance for taking the time to read this.
Based on my understanding of your problem, you have at least the following two choices:
If you continue to keep the scheduling logic within your main Spring Boot app, you may want to explore something like ShedLock, which makes sure a scheduled job in your application code executes only once, via an external lock provider such as MySQL or Redis, even when the code is running on multiple nodes (or Kubernetes pods, in your case).
If you can separate the scheduler-specific code into its own executable process (i.e., code that runs in a separate set of pods from your main application pods), then you can leverage a Kubernetes CronJob to schedule a Kubernetes Job, which internally creates pods and runs your application logic. The benefit of this approach is that you can use native CronJob parameters such as the concurrency policy to ensure the job runs only once per scheduled time, in a single pod.
With approach (1), you couple your scheduler code with your main app and run them together in the same pods (a sketch follows below).
With approach (2), you would separate the code that the scheduler runs from the overall application code, containerize it into its own image, and then configure a Kubernetes CronJob with this new image, following the official guide example and Kubernetes CronJob best practices (authored by me, but you can find other examples).
Both approaches have their own merits and drawbacks, so evaluate which suits your needs best.
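As an illustration of approach (1), here is a minimal ShedLock sketch; the job name, cron expression, and lock durations are made-up examples, and you would also need @EnableSchedulerLock plus a LockProvider bean pointing at your shared lock store:

    import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Component
    public class DatasetRefreshJob {

        // Runs every 12 hours; only the pod that acquires the lock in the
        // shared store (MySQL, Redis, ...) executes the body, while the
        // other replicas see the lock held and skip this run.
        @Scheduled(cron = "0 0 */12 * * *")
        @SchedulerLock(name = "datasetRefresh",
                       lockAtMostFor = "PT30M", lockAtLeastFor = "PT5M")
        public void refreshDatasets() {
            // fetch the dataset and populate the cache
        }
    }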

How to run kubernetes pod for a set period of time each day?

I'm looking for a way to deploy a pod on Kubernetes that runs for a few hours each day. Essentially, I want it to start every morning at 8 AM and keep running until about 5:30 PM.
I've done a lot of research and haven't found a way to deploy a pod with a specific timeframe in mind. I've found CronJobs, but those seem to be for pods that terminate themselves, whereas mine should run continuously.
Is there any way to deploy my pod on Kubernetes this way? Or should I set up the pod itself to run its intended application based on its internal clock?
According to the Kubernetes documentation, a Job creates one or more pods and ensures that a specified number of them terminate successfully. As pods complete successfully, the Job tracks the completions; when the specified number is reached, the Job itself is complete.
In simple words, Jobs run until completion or failure. That's why there is no built-in option to schedule a CronJob's termination in Kubernetes.
In your case, you can start a CronJob regularly and terminate it using one of the following options:
The better way is for the container to terminate itself, so you can add such functionality to your application (see the sketch after this list) or use cron inside the container. More information about adding cron to a Docker container can be found here.
You can use a second CronJob to terminate the first one: it runs a command inside a pod that finds and deletes the pod belonging to your Job. For more information, look through this link. This is not a good approach, though, because your CronJob will always finish with a Failed status.
In both cases, you need to check which status your CronJob finished with and set the correct restartPolicy accordingly.
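To sketch the first option, the application can simply watch the clock and exit cleanly at the end of its window; the times, the zone, and the class name here are assumptions based on the question (a CronJob would start the pod at 8 AM):

    import java.time.LocalTime;
    import java.time.ZoneId;

    public class TimeboxedWorker {

        public static void main(String[] args) throws InterruptedException {
            // Assumed: a Kubernetes CronJob starts this pod at 08:00.
            ZoneId zone = ZoneId.systemDefault();
            LocalTime stopAt = LocalTime.of(17, 30);

            while (LocalTime.now(zone).isBefore(stopAt)) {
                doWork();              // the application's normal work
                Thread.sleep(60_000);  // re-check the clock every minute
            }
            // Exiting with code 0 marks the Job pod as Completed rather
            // than Failed, which keeps the Job status clean.
            System.exit(0);
        }

        private static void doWork() {
            // ... application logic ...
        }
    }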
It seems you can implement this using a CronJob object: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#creating-a-cron-job

Prevent Cron jobs from Single Point of Failure

I have many cron jobs running on a server, including a DB backup (daily) and sending notifications to users (hourly).
Currently I have 5 API servers, and the cron jobs are set up on one of them.
I want to protect the cron jobs from being a single point of failure: what happens if the machine on which they are set up crashes?
Any suggestions, please?