Can i schedule a quartz job which will be picked by a specific node/server? - quartz-scheduler

I am working in quartz scheduler. We have cluster environment for scheduler, can i schedule a quartz job which will be picked by a specific node/server?
For example : I have 3 Node in cluster: Node1, Node2 and Node3 and have 2 Jobs, job1 and job2.
I want to that every time i schedule job2, it will be picked by only Node2. For Job1 there is no restriction.
Thanks in advance.

Set org.quartz.scheduler.instanceName=node1 in scheduler.properties file.At time of job scheduling, we can set SchedName in job detail and trigger as node1, quartz will pick node1 as server to run job.

Related

Airflow kubernetes architecture understanding

I'm trying to understand the Arch of Airflow on Kubernetes.
Using the helm and Kubernetes executor, the installation mounts 3 pods called: Trigger, WebServer, and Scheduler...
When I run a dag using the Kubernetes pod operator, it also mounts 2 pods more: one with the dag name and another one with the task name...
I want to understand the communication between pods... So far I know the only the expressed in the image:
Note: I'm using the git sync option
Thanks in advance for the help that you can give me!!
Airflow application has some components that require for it to operate normally: Webserver, Database, Scheduler, Trigger, Worker(s), Executor. You can read about it here.
Lets go over the options:
Kubernetes Executor (As you choose):
In your instance since you are deploying on Kubernetes with Kubernetes Executor then each task being executed is a pod. Airflow wraps the task with a Pod no matter what task it is. This brings to the front the isolation that Kubernetes offer, this also bring the overhead of creating a pod for each task. Choosing Kubernetes Executor often goes with case where many/most of your tasks takes long time to execute - as if your tasks takes 5 seconds to complete it might not be worth to pay the overhead of creating pod for each task. As for what you see as the DAG -> Task1 in your diagram. Consider that the Scheduler launches the Airflow workers. The workers are starting the tasks in new pods. So the worker needs to monitor the execution of the task.
Celery Executor - Setting up a Worker/Pod which tasks can run in it. This gives you speed as there is no need to create pod for each task but there is no isolation for each task. Noting that using this executor doesn't mean that you can't run tasks in their own Pod. User can run KubernetesPodOperator and it will create a Pod for the task.
CeleryKubernetes Executor - Enjoying both worlds. You decide which tasks will be executed by Celery and which by Kubernetes. So for example you can set small short tasks to Celery and longer tasks to Kubernetes.
How will it look like Pod wise?
Kubernetes Executor - Every task creates a pod. PythonOperator, BashOperator - all of them will be wrapped with pods (user doesn't need to change anything on his DAG code).
Celery Executor - Every task will be executed in a Celery worker(pod). So the pod is always in Running waiting to get tasks. You can create a dedicated pod for a task if you will explicitly use KubernetesPodOperator.
CeleryKubernetes - Combining both of the above.
Note again that you can use each one of these executors with Kubernetes environment. Keep in mind that all of these are just executors. Airflow has other components like mentioned earlier so it's very OK to deploy Airflow on Kubernetes (Scheduler, Webserver) but to use CeleryExecutor thus the user code (tasks) are not creating new pods automatically.
As for Triggers since you asked about it specifically - It's a feature added in Airflow 2.2: Deferrable Operators & Triggers it allows tasks to defer and release worker slot.

Fargate: how to stop task after job done?

I need to calculate the task on my Fargate cluster and after finishing calculating the task should be stopped or terminated to decrease payments.
The consequence of my actions:
One task always running on EC2 Cluster and checking DB to the new data.
If new data appears, Boto3 runs the Fargate container.
After the job is done, the task should be stopped.
Also if in the DB appears the second row of data during proceeding the first task, Fargate should create a second task for the second job. and then stop tasks...
So, I have:
Task written on Python and deployed on ECR
Fargate cluster
Task definition (describe Memory, CPU's and container)
How task know, that it should be closed? Any idea or solution to stop task after job done?

Recommendation needed for running cron in replicated services

If let's say my service is running cron jobs. But the service replica is set to 2 in the Kubernetes, this will cause the cron job to run twice at each schedule.
Is there any design recommendation to prevent this?
Should we only have 1 replicas for a cron job running services or is there any other way?

How to scheduling jobs created in different scheduler instance

I am using spring-boot-starter-quartz 2.2.1.RELEASE to schedule Quartz jobs.And I've deploy my code on two nodes.
And the quartz.properties is like this:
For node one:
org.quartz.scheduler.instanceName: machine1
org.quartz.scheduler.instanceId = AUTO
For node two:
org.quartz.scheduler.instanceName: machine2
org.quartz.scheduler.instanceId = AUTO
So in this situation, each node can run the same scanning job separately.
And now in my database in "qrtz_job_details", I can have two job records ,namely scanJobbyMachine1 and scanJobbyMachine2.
I also deployed an frontend UI on node1 that have RESTful API to schedule jobs. And I use nginx to randomly send my request to one of my nodes.
If I made a request to query all jobs, the request may be sent to node1 and only node1's job will be shown. But I want to show both node1 and node2's jobs.
If I made a request to update scanJobbyMachine1, and it may be sent to node2. And update can't be made, because node2 only have properties file whose instanceName is machine2.
Here is my plan:
Plan A:use cluster mode. But Quartz doesn't support "Allow Job Excution to be pin to a cluster node" yet. So in cluster mode, my job will only be excuted by one node. But I want both node to do the scanning jobs.
here is the issue link in github
Plan B:use Non cluster mode. Then I have to write duplicate APIs in Controller like this:
localhost:8090/machine0/updateJob
localhost:8090/machine1/updateJob
And use nginx to set when I request /machine0/updateJob, send it to 10.110.200.60(machine1's ip), when I request /machine1/updateJob, send it to 10.110.200.62(machine2's ip)
And for queryAllJobs I have to use my backend to send request to 10.110.200.60 and 10.110.200.62 first, and combine the response list in my backend, then show it in the frontend.
Plan C:write another backend with two properties files. just to schedule the jobs and don't excute these jobs (I don't know if this can work) and depoyed it on these two nodes.
I really don't want write and deploy another backend like Plan C or write duplicate APIs like Plan B.
Any good ideas?
your problem is a cluster problem :)
Perhaps you could dynamically configure your job registration: as node1 start, it will be alone on the cluster and run jobs only on this one. As node2 start, the two nodes can be called.

Can Rundeck Be Configured to Use Different Nodes If Someone Else is Already Running A Job On Current Node?

I am trying to see if rundeck is capable of deciding if a node is currently busy runnning another job and it will switch to another node and ruin the job there instead.
For example, I am current running a job on NODE1, then another person logs into rundeck and decide to run their job on NODE1, but NODE1 is busy running my job so rundeck will automatically run their job on NODE2.
Thanks
This could be possible assuming the following design:
1) Create/use a resource model source plugin that sets attributes about the node's business. This can be a metric like load status or something else that you use to gauge rundeck utilization.
2) Write the job with a nodefilter to match on the attribute about utilization metric
3 ) Define the job to use the RandomSubset Orchestration strategy specifying to use 1 node.