How does Quartz store job details between restarts (regarding last successful run, misfires etc.)? - quartz-scheduler

A task is scheduled in the Quartz-scheduler for every hour, starting at 9 AM.
Application stops at 10 AM and is restarted at 12 PM. In that case two executions at 10 AM and 11 AM will be missed.
In that case when Scheduler starts again, how many misfires will be considered?
As job was executed at 9AM, it should consider two misfires from 10 AM and 11 AM. If it is so, then how can Quartz identify the last successful schedule as application is already restarted?

You are asking two questions:
How many misfires will be considered?
As you guessed, two misfires (10 AM and 11 AM) will be detected.
However, Quartz may or may not consider all of them: depending on the misfire instruction configured in each trigger, Quartz may decide to just consider the last misfire, or ignore them all.
How does Quartz store job details between restarts?
Depending on your configuration, Quartz will store job data in a database via its JDBCJobStore (Java) / AdoJobStore (.NET), or in RAM via its RAMJobStore.
When using a database store, Quartz will persist all job details to it as they are scheduled, run, finished etc.; and retrieve said details from it upon restart.
When using a RAM store, job information will not be persisted between Scheduler runs. If the Scheduler gets stopped and then restarted, all jobs will need to be scheduled again; also, no misfires will be detected.
All of this is explained in Quartz's official documentation, I suggest that you take a look around over there for more detailed info.

Related

What queuing tools or algorithms support fair queuing of jobs

I am hitting a well known problem, but I can't find a simple answer that tells me how to solve it.
I would appreciate you directing me by answering which feature I should look for in available queuing software or suitable algorithms if the solution requires programming in addition to the tools. and if you can direct me to Python supported tools, it would be helpful
My problem is that I get over the span of the day jobs which deploy 10, 100 or 1000 tests (I exaggerate , but it helps make a point). Many jobs deploy 10 tests, some deploy 100 tests and one or two deploy 1000 tests.
I want to deploy the tests in such a manner that the delay in execution is spread in a fair manner between all jobs. Let me explain myself.
If the very large job takes 2 hours on a idle server, it would be acceptable if it completes after 4 hours.
If a small job takes 3 minutes on an idle server, it would be acceptable if it completes after 15 minutes.
I want the delay of running the jobs to be spread in a fair way, so jobs that started earlier don't get too delayed. If it looks that the job is going to be delayed more than allowed it's priority will increase.
I think that prioritizing queues may be the solution, so dynamically changing the weights on a large queue will make it faster when needed.
Is there a queue software that knows how to do the above automatically. Lets say that I give each job some time limit and the queue software knows how to prioritize the tests from each queue so that no job is delayed too much?
Thanks.
Adding information following Jim's comments.
Not enough information to supply an answer. Is a job essentially just a list of tests? Can multiple tests for a single job be run concurrently? Do you always run all tests for a job? – Jim Mischel 14 hours ago
Each job deploys between 10 to 1000 tests.
The test can run concurrently to all other tests from the same or other users without conflicts.
All tests that were deploy by a job, are planned to run.
Additional info:
I've learned so far that Prioritized Queues are actually about applying weights to items in a single queue, where items with the hightest are pulled first. If two or more items have the same highest priority, the first item to arrive will be executed first.
When I pondered about Priority Queues it was more in the way of:
Multiple Queues, where each queue has a priority assigned to the entire queue.
The priority can be changed dynamically in runtime, based on some condition, e.g. setting a time limit on the execution of the entire queue.

Do not reschedule Quartz stateful job

I have a Quartz v1.x stateful job. The repeat interval is let's say 1 minute. The job itself typically terminates within a second, but it might happen that it lasts long, let's say 5 minutes. The scheduler prevents parallel run, but when the long running job finishes, it starts it again over and over again those, which were missed during the long running job. In this example, 5 other runs will be scheduled right after the long execution finishes. What I want is to make the scheduler "forget" the missed starts. E.g. if a job starts at 12:00 and finished at 12:05, then simply omit the runs at 12:01, 12:02, 12:03, 12:04, and depending on the exact finish, even 12:05. Is this somehow possible?
I need stateful job for preventing the parallel execution. Stateless job with proper annotation is not an option, because we are using Quartz version 1.x. I already tried playing around with the misfire policies (e.g. MISFIRE_INSTRUCTION_DO_NOTHING), but it seems that these are not intended for such situations. Could anyone help me?

Scheduling jobs in Quartz as a process

Is jobs in quartz are executed as process or thread?
If it is executed as a thread then will it effect the performance of quartz scheduler when heavy jobs or time consuming jobs are executed.
If so then please suggest the solution.
If we execute 10 time consuming jobs simultaneously what is the effect?
I read the tutorials but didnt find the solution.
Please suggest the solution.
Thanks.
Read the documentation regarding Configuring the thread pool which explains how the quartz thread pool can be suited for your needs. More specifically the org.quartz.threadPool.threadCount configuration property can be set according to your needs as the documentation explains:
The number of threads available for concurrent execution of jobs. You
can specify any positive integer, although only numbers between 1 and
100 are practical. If you only have a few jobs that fire a few times a
day, then one thread is plenty. If you have tens of thousands of jobs,
with many firing every minute, then you want a thread count more like
50 or 100 (this highly depends on the nature of the work that your
jobs perform, and your systems resources).
In the specific example you mentioned regarding 10 jobs firing simultaneously, if you have configured above property with more than 10 threads, then each job will run concurrently on its own thread. Otherwise if you have configured less, some will start first, and the others will wait for threads to become available. If no threads become available until a configured period of time, the misfire instructions you have set will handle the action to be taken, which usually is to trigger delayed jobs as soon as possible but this is also a configurable setting.

Select node in Quartz cluster to execute a job

I have some questions about Quartz clustering, specifically about how triggers fire / jobs execute within the cluster.
Does quartz give any preference to nodes when executing jobs? Such as always or never the node that executed the same job the last time, or is it simply whichever node that gets to the job first?
Is it possible to specify the node which should execute the job?
The answer to this will be something of a "it depends".
For quartz 1.x, the answer is that the execution of the job is always (only) on a more-or-less random node. Where "randomness" is really based on whichever node gets to it first. For "busy" schedulers (where there are always a lot of jobs to run) this ends up giving a pretty balanced load across the cluster nodes. For non-busy scheduler (only an occasional job to fire) it may sometimes look like a single node is firing all the jobs (because the scheduler looks for the next job to fire whenever a job execution completes - so the node just finishing an execution tends to find the next job to execute).
With quartz 2.0 (which is in beta) the answer is the same as above, for standard quartz. But the Terracotta folks have built an Enterprise Edition of their TerracottaJobStore which offers more complex clustering control - as you schedule jobs you can specify which nodes of the cluster are valid for the execution of the job, or you can specify node characteristics/requisites, such as "a node with at least 100 MB RAM available". This also works along with ehcache, such that you can specify the job to run "on the node where the data keyed by X is local".
I solved this question for my web application using Spring + AOP + memcached. My jobs do know from the data they traverse if the job has already been executed, so the only thing I need to avoid is two or more nodes running at the same time.
You can read it here:
http://blog.oio.de/2013/07/03/cluster-job-synchronization-with-spring-aop-and-memcached/

Use beanstalkd, for periodic tasks, how to always make a job replaced by its latest one?

I am trying to use beanstalk for queuing a large number of periodic
tasks (for example, tasks need processed every N minutes), for each
task, if the last queued job is not completed (not reserved, i mean)
when current job to be added, the last queued job should be replaced
with current job, in other words, only the latest queued job of a task
should be processed.
how can i achieve that using beanstalk?
Ideas i have got right now is:
for each task, use memcached store its latest timestamps (set this
when add jobs to queue),
every time the worker reserved a job successfully, it first checks
timestamps for this task in memcached,
if timestamps of the job is same as timestamps in memcached, then
process this job,
otherwise skip this job, and delete it from the queue.
So is there better ways to do such work? please give your suggestions,
thanks.
I found a memcache/beanstalk combination also the best solution for an implementation where I didnt want a newer but identical job entering a queue.
Until 'named jobs' are done and the software released, that may be one of the better solutions.