Quartz scheduler shutdown(true) wait for all threads started from running Jobs to stop? - quartz-scheduler

If I have a job and from that job I create some threads, what happens when I call scheduler.shutdown(true)?
Will the scheduler wait for all of my threads to finish or not?

Quartz 1.8.1 API docs:
Halts the Scheduler's firing of Triggers, and cleans up all resources associated with the Scheduler.
Parameters:
waitForJobsToComplete - if true the scheduler will not allow this method to return until all currently executing jobs have completed.
Quarts neither know nor cares about any threads spawned by your job, it will simply wait for the job to complete. If your job spawns new threads then exits, then as far as Quartz is concerned, it's finished.
If your job needs to wait for its spawned threads to complete, then you need to use something like an ExecutorService (see javadoc for java.util.concurrent), which will allow the job thread to wait for its spawned threads to complete. If you're using raw java threads, then use Thread.join().

Related

ADF Scheduling when existing Job not yet finished

Having read https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-scheduling-and-execution, it is unclear to me if:
A schedule is made every hr for a job to run,
can we stop the concurrent execution of the next job at hr+1 if the job for hr+0 is still running?
It looks if concurrency = 1 means this,
But is that invocation simply not start until concurrent execution is finished?
Or will it be discarded?
When we set the concurrency 1, only one instance will be allowed to run at a time. When the scheduled trigger runs again and tries to run the pipeline, If the pipeline is already running, the next invocation will be queued. It will start after finishing the current instance.
For your question, the following invocation will be queued. After the first run finishes, the next run will start.

How to safely restart Airflow and kill a long-running task?

I have Airflow is running in Kubernetes using the CeleryExecutor. Airflow submits and monitors Spark jobs using the DatabricksOperator.
My streaming Spark jobs have a very long runtime (they run forever unless they fail or are cancelled). When pods for Airflow worker are killed while a streaming job is running, the following happens:
Associated task becomes a zombie (running state, but no process with heartbeat)
Task is marked as failed when Airflow reaps zombies
Spark streaming job continues to run
How can I force the worker to kill my Spark job before it shuts down?
I've tried killing the Celery worker with a TERM signal, but apparently that causes Celery to stop accepting new tasks and wait for current tasks to finish (docs).
You need to be more clear about the issue. If you are saying that the spark cluster finishes the jobs as expected and not calling the on_kill function, it's expected behavior. As per the docs on kill function is for cleaning up after task get killed.
def on_kill(self) -> None:
"""
Override this method to cleanup subprocesses when a task instance
gets killed. Any use of the threading, subprocess or multiprocessing
module within an operator needs to be cleaned up or it will leave
ghost processes behind.
"""
In your case when you manually kill the job it is doing what it has to do.
Now if you want to have a clean_up even after successful completion of the job, override post_execute function.
As per the docs. The post execute is
def post_execute(self, context: Any, result: Any = None):
"""
This hook is triggered right after self.execute() is called.
It is passed the execution context and any results returned by the
operator.
"""

Should we unschedule Sling Jobs running within AEM after they are completed?

I am creating multiple SlingJobs on the fly using org.apache.sling.commons.scheduler.Scheduler OSGi service in AEM.
i.e. scheduler.schedule(Runnable, ScheduleOptions);
I have requirement that these Sling Jobs be run only once, so I am using ScheduleOptions.AT(Date date,int times,long period) ScheduleOptions Docs
And passing times=1 as a parameter.
(Also what is period parameter ?)
The Job successfully runs only once.
My question is am I supposed to keep a track of this Job by name and UnSchedule it using Scheduler.unschedule(String jobName) after it has finished running ?
Will completed SlingJobs that are not UnScheduled, consume memory in the AEM server ?
Will these completed BUT unscheduled jobs cause my AEM server to slow down and later on require some purge activity as maintenance?
According to https://sling.apache.org/documentation/bundles/apache-sling-eventing-and-job-handling.html#scheduled-jobs
Internally the scheduled Jobs use the Commons Scheduler Service. But in addition they are persisted (by default below /var/eventing/scheduled-jobs) and survive therefore even server restarts. When the scheduled time is reached, the job is automatically added as regular Sling Job through the JobManager.
I had a problem with a scheduled jobs before(they were triggered on the daily basis). When the server was restarted scheduled jobs wasn't un-persisted and a new job doing the same action was scheduled(job was scheduled on #Activate method). As a result, I got several jobs doing the same action at the scheduled time, so I had to unschedule them in #Deactivate method.
You may make an experiment and make sure that there is no duplicated jobs under /var/eventing/scheduled-jobs

Number of celery tasks executed at a given point of time

I am trying to create a bunch of celery tasks asynchronously on the fly. Say there are 1000 tasks I start asynchronously and I have only one celeryd process running to execute tasks. How many threads will be created by celery to handle these tasks?
If there are multiple threads that celery starts automatically to process the task queue, how do I limit celery to execute only 100 threads at a given point of time.
Thanks.
Its starts as many as you specify with the CELERYD_OPTS concurrency parameter.
Which is also discussed here.

RTOS Task Management

If a task is attempting to surrender the processor, what steps does a real time operating system need to execute to ensure that another task has the opportunity to run?
A thread yielding CPU is a scheduling event; all scheduling events cause the scheduler to run.