Need to set the quartz cron expression dynamically - quartz-scheduler

I'm using Quartz in my web application (a servlet-based web app). Below are snippets of the quartz.properties file and the jobs XML file.
quartz.properties
#===================================================
# Configure the Job Initialization Plugin
#===================================================
org.quartz.plugin.jobInitializer.class = org.quartz.plugins.xml.XMLSchedulingDataProcessorPlugin
org.quartz.plugin.jobInitializer.fileNames = jobs.xml
org.quartz.plugin.jobInitializer.failOnFileNotFound = true
org.quartz.plugin.jobInitializer.scanInterval = 10
org.quartz.plugin.jobInitializer.wrapInUserTransaction = false
jobs.xml
<?xml version='1.0' encoding='utf-8'?>
<job-scheduling-data xmlns="http://www.quartz-scheduler.org/xml/JobSchedulingData"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.quartz-scheduler.org/xml/JobSchedulingData http://www.quartz-scheduler.org/xml/job_scheduling_data_1_8.xsd"
    version="1.8">
  <schedule>
    <job>
      <name>my-very-clever-job</name>
      <group>MYJOB_GROUP</group>
      <description>The job description</description>
      <job-class>com.acme.scheduler.job.ReportJob</job-class>
    </job>
    <trigger>
      <cron>
        <name>my-trigger</name>
        <group>MYTRIGGER_GROUP</group>
        <job-name>my-very-clever-job</job-name>
        <job-group>MYJOB_GROUP</job-group>
        <!-- trigger every night at 4:30 am -->
        <cron-expression>0 30 4 * * ?</cron-expression>
      </cron>
    </trigger>
  </schedule>
</job-scheduling-data>
Everything works fine in this setup. Now I need to allow users to change the time (the cron expression) as they wish. My question is: how do I set the cron expression dynamically?

CronTrigger cronTrigger = (CronTrigger) stdScheduler
        .getTrigger(triggerName, triggerGroupName);
CronTrigger newTriggerIns = new CronTrigger();
newTriggerIns.setJobName(cronTrigger.getJobName());
newTriggerIns.setName(triggerName);
newTriggerIns.setCronExpression(newCronExpression);
stdScheduler.rescheduleJob(triggerName, triggerGroupName, newTriggerIns);
In the code above I take the existing trigger instance, create a new trigger instance with the new cron expression, and then reschedule the existing trigger with the new one.

Creating a new Trigger like this doesn't work (note that the new instance never gets its trigger group or job group set, so it no longer points at the existing job):
CronTrigger cronTrigger = (CronTrigger) stdScheduler.getTrigger(triggerName, triggerGroupName);
CronTrigger newTriggerIns = new CronTrigger();
newTriggerIns.setJobName(cronTrigger.getJobName());
newTriggerIns.setName(triggerName);
newTriggerIns.setCronExpression(newCronExpression);
stdScheduler.rescheduleJob(triggerName, triggerGroupName, newTriggerIns); // doesn't work
You just have to edit the original trigger like this:
CronTrigger cronTrigger = (CronTrigger) stdScheduler.getTrigger(triggerName,triggerGroupName);
cronTrigger.setCronExpression(newCronExpression);
stdScheduler.rescheduleJob(triggerName,triggerGroupName,cronTrigger);

Use the Quartz API: programmatically fetch the trigger instance, cast it to CronTrigger, and use its setCronExpression method to set the expression dynamically.
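Note that the snippets above use the Quartz 1.x mutable-trigger API (matching the 1.8 schema in jobs.xml). On Quartz 2.x, triggers are immutable and have to be rebuilt instead; a rough equivalent (untested, reusing the same stdScheduler, triggerName, triggerGroupName and newCronExpression variables):
import static org.quartz.CronScheduleBuilder.cronSchedule;

import org.quartz.CronTrigger;
import org.quartz.TriggerKey;

// Quartz 2.x: fetch the old trigger and rebuild it with the new cron expression.
TriggerKey key = TriggerKey.triggerKey(triggerName, triggerGroupName);
CronTrigger oldTrigger = (CronTrigger) stdScheduler.getTrigger(key);
CronTrigger newTrigger = (CronTrigger) oldTrigger.getTriggerBuilder()
        .withSchedule(cronSchedule(newCronExpression))
        .build();
stdScheduler.rescheduleJob(key, newTrigger);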


airflow TriggerDagRunOperator how to change the execution date

I noticed that for scheduled tasks the execution date is set in the past, in line with the docs:
Airflow was developed as a solution for ETL needs. In the ETL world,
you typically summarize data. So, if I want to summarize data for
2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be
right after all data for 2016-02-19 becomes available.
However, when a DAG triggers another DAG, the execution time is set to now().
Is there a way to have the triggered DAG use the same execution date as the triggering DAG? Of course, I could rewrite the template and use yesterday_ds, but that is a tricky solution.
The following class expands on TriggerDagRunOperator to allow passing the execution date as a string that then gets converted back into a datetime. It's a bit hacky but it is the only way I found to get the job done.
from datetime import datetime
import logging

from airflow import settings
from airflow.utils.state import State
from airflow.models import DagBag
from airflow.operators.dagrun_operator import TriggerDagRunOperator, DagRunOrder


class MMTTriggerDagRunOperator(TriggerDagRunOperator):
    """
    MMT-patched for passing an explicit execution date
    (otherwise it's hard to hook the datetime.now() date).
    Use when you want to explicitly set the execution date on the target DAG
    from the controller DAG.

    Adapted from Paul Elliot's solution on the airflow-dev mailing list archives:
    http://mail-archives.apache.org/mod_mbox/airflow-dev/201711.mbox/%3cCAJuWvXgLfipPmMhkbf63puPGfi_ezj8vHYWoSHpBXysXhF_oZQ#mail.gmail.com%3e

    Parameters
    ----------
    execution_date: str
        the custom execution date (jinja'd)

    Usage example:
        my_dag_trigger_operator = MMTTriggerDagRunOperator(
            execution_date="{{ execution_date }}",
            task_id='my_dag_trigger_operator',
            trigger_dag_id='my_target_dag_id',
            python_callable=lambda context, dro: dro,
            params={},
            dag=my_controller_dag
        )
    """
    template_fields = ('execution_date',)

    def __init__(self, trigger_dag_id, python_callable, execution_date,
                 *args, **kwargs):
        self.execution_date = execution_date
        super(MMTTriggerDagRunOperator, self).__init__(
            trigger_dag_id=trigger_dag_id, python_callable=python_callable,
            *args, **kwargs
        )

    def execute(self, context):
        run_id_dt = datetime.strptime(self.execution_date, '%Y-%m-%d %H:%M:%S')
        dro = DagRunOrder(run_id='trig__' + run_id_dt.isoformat())
        dro = self.python_callable(context, dro)
        if dro:
            session = settings.Session()
            dbag = DagBag(settings.DAGS_FOLDER)
            trigger_dag = dbag.get_dag(self.trigger_dag_id)
            dr = trigger_dag.create_dagrun(
                run_id=dro.run_id,
                state=State.RUNNING,
                execution_date=self.execution_date,
                conf=dro.payload,
                external_trigger=True)
            logging.info("Creating DagRun {}".format(dr))
            session.add(dr)
            session.commit()
            session.close()
        else:
            logging.info("Criteria not met, moving on")
There is an issue you may run into when using this without setting execution_date=now(): your operator will throw a MySQL error if you try to start a DAG with an identical execution_date twice. This is because the execution_date and dag_id are used to create the row index, and rows with identical indexes cannot be inserted.
I can't think of a reason you would ever want to run two identical DAGs with the same execution_date in production anyway, but it is something I ran into while testing, and you should not be alarmed by it. Simply clear the old job or use a different datetime, for instance as sketched below.
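If you need to clear the old run programmatically, deleting the stale DagRun row might look like this (a sketch, untested; it uses the same airflow.models and settings imports as the operator above, and 'my_target_dag_id' is the example DAG id from the docstring):
from datetime import datetime

from airflow import settings
from airflow.models import DagRun

# Remove the stale DagRun so the same dag_id/execution_date pair can be reused.
session = settings.Session()
session.query(DagRun).filter(
    DagRun.dag_id == 'my_target_dag_id',
    DagRun.execution_date == datetime(2018, 1, 1),
).delete(synchronize_session=False)
session.commit()
session.close()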
The TriggerDagRunOperator now has an execution_date parameter for setting the execution date of the triggered run.
Unfortunately, the parameter is not in the template fields.
If it gets added to the template fields (or if you override the operator and change the template_fields value, as sketched after the links below), it will be possible to use it like this:
my_trigger_task = TriggerDagRunOperator(
    task_id='my_trigger_task',
    trigger_dag_id="triggered_dag_id",
    python_callable=conditionally_trigger,
    execution_date='{{ execution_date }}',
    dag=dag)
It has not been released yet but you can see the sources here:
https://github.com/apache/incubator-airflow/blob/master/airflow/operators/dagrun_operator.py
The commit that did the change was:
https://github.com/apache/incubator-airflow/commit/089c996fbd9ecb0014dbefedff232e8699ce6283#diff-41f9029188bd5e500dec9804fed26fb4
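In the meantime, overriding the operator is a two-line subclass. A sketch (the class name is mine, untested, and it assumes the execution_date parameter from the linked commit):
from airflow.operators.dagrun_operator import TriggerDagRunOperator

class TemplatedTriggerDagRunOperator(TriggerDagRunOperator):
    # Listing execution_date here makes Airflow render it through Jinja,
    # so execution_date='{{ execution_date }}' is expanded before execute() runs.
    template_fields = ('execution_date',)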
I improved the MMTTriggerDagRunOperator a bit. My execute method checks whether a dag_run already exists; if one is found, it restarts the DAG using Airflow's clear function. This lets us create dependencies between DAGs, because being able to move the execution date onto the triggered DAG opens up a whole universe of amazing possibilities. I wonder why this is not the default behavior in Airflow.
def execute(self, context):
    run_id_dt = datetime.strptime(self.execution_date, '%Y-%m-%d %H:%M:%S')
    dro = DagRunOrder(run_id='trig__' + run_id_dt.isoformat())
    dro = self.python_callable(context, dro)
    if dro:
        session = settings.Session()
        dbag = DagBag(settings.DAGS_FOLDER)
        trigger_dag = dbag.get_dag(self.trigger_dag_id)
        if not trigger_dag.get_dagrun(self.execution_date):
            dr = trigger_dag.create_dagrun(
                run_id=dro.run_id,
                state=State.RUNNING,
                execution_date=self.execution_date,
                conf=dro.payload,
                external_trigger=True
            )
            logging.info("Creating DagRun {}".format(dr))
            session.add(dr)
            session.commit()
        else:
            trigger_dag.clear(
                start_date=self.execution_date,
                end_date=self.execution_date,
                only_failed=False,
                only_running=False,
                confirm_prompt=False,
                reset_dag_runs=True,
                include_subdags=False,
                dry_run=False
            )
            logging.info("Cleared DagRun {}".format(trigger_dag))
        session.close()
    else:
        logging.info("Criteria not met, moving on")
There is a function in the experimental API section of Airflow that allows you to trigger a DAG with a specific execution date: https://github.com/apache/incubator-airflow/blob/master/airflow/api/common/experimental/trigger_dag.py
You can call this function as part of a PythonOperator to achieve the objective.
So it will look like this:
from airflow.api.common.experimental.trigger_dag import trigger_dag

trigger_operator = PythonOperator(
    task_id='YOUR_TASK_ID',
    python_callable=trigger_dag,
    op_args=['dag_id'],
    op_kwargs={'execution_date': datetime.now()})

Perform native batch UPDATE with EclipseLink

Dear fellow programmers,
I have been given the task of updating about 10,000-100,000 records in an Oracle 11g database EACH minute. The current state of those records is held in a global ArrayList, so I don't need to SELECT all records from the DB on every update. A scheduler refreshes the records in the ArrayList at the beginning of each minute and then starts updating the records in the database.
I cannot change this design; it is a customer requirement.
To achieve high performance, those updates should be done using the native batch update feature.
I am using a TomEE plume 7.0.2 application server with EclipseLink 2.6.3 (this version is included with TomEE).
Code:
@PersistenceContext(unitName = "MES_Tables")
private EntityManager em;

...

@Schedule(second="0", minute="*", hour="*", persistent=false)
public void startUpdate(){
    Query q = em.createNativeQuery(
        "UPDATE " +
            "SCHEMA.PROPERTIES_GRP_CONT " +
        "SET " +
            "STRVAL = ? " +                // <-- SQL param
        "WHERE " +
            "STATES_ID = 1 " +
            "AND PROPERTIES_ID = ? " +     // <-- SQL param
            "AND PROPERTIES_GRP_ID = ?");  // <-- SQL param

    for(BatchInfo bi : biList){
        int rowsUpdated = q
            .setParameter(1, Long.toString(bi.getLifetime()))
            .setParameter(2, bi.getPropertiesId())
            .setParameter(3, bi.getBatchId())
            .executeUpdate();
    }
}
Unfortunately those updates are executed as single statements and no batching happens, so 10,000 updates take about 40-50 seconds.
To my understanding, the EntityManager (em) should automatically create batch updates if you execute multiple updates within a single for-each loop.
Even simplifying the SQL UPDATE to a statement without any parameters, so that the exact same update is executed every time, did not change the fact that single updates were executed.
persistence.xml
<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
    xmlns="http://xmlns.jcp.org/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
  <persistence-unit name="MES_Tables" transaction-type="JTA">
    <jta-data-source>MES_Connection</jta-data-source>
    <exclude-unlisted-classes>false</exclude-unlisted-classes>
    <properties>
      <property name="javax.persistence.schema-generation.database.action" value="none" />
      <property name="eclipselink.ddl-generation" value="none" />
      <property name="eclipselink.logging.level" value="WARNING" />
      <property name="eclipselink.logging.level.sql" value="FINE" />
      <property name="eclipselink.logging.parameters" value="true" />
      <property name="javax.persistence.query.timeout" value="1800000" />
      <property name="eclipselink.jdbc.connections.wait-timeout" value="1800000" />
      <property name="eclipselink.jdbc.batch-writing" value="JDBC" />
      <property name="eclipselink.jdbc.batch-writing.size" value="600" />
      <property name="eclipselink.logging.logger" value="mes.core.logging.EclipseLinkLogger"/>
    </properties>
  </persistence-unit>
</persistence>
To test whether batch updating works at all, I refactored the code to use managed JPA entities instead of the native SQL UPDATE. The problem here is that I need to perform an em.merge(entity) on each entity for it to be managed again, because the entities become unmanaged after committing (which happens each minute in the scheduler).
This causes 10,000 slow SELECTs (30-40 seconds). Once those SELECTs finish, EclipseLink performs a fast batch update (3-4 seconds).
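For illustration, the entity-based variant looked roughly like this (PropertiesGrpCont and recordList are stand-in names, not the real ones):
// Re-attach each detached entity; every merge() triggers a SELECT,
// but the resulting UPDATEs are batched at commit thanks to
// eclipselink.jdbc.batch-writing=JDBC in persistence.xml.
for (PropertiesGrpCont row : recordList) {
    em.merge(row);
}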
For the last few days I have been trying to prevent EclipseLink from performing those SELECTs and just issue the UPDATEs, but without luck. From another Stack Overflow post (Perform UPDATE without SELECT in EclipseLink) I found a method to do updates without the SELECT:
EntityManagerImpl emImpl = ((EntityManagerImpl) em.getDelegate());
UnitOfWork uow = emImpl.getUnitOfWork();
AbstractSession as = uow.getParent();
for(BatchInfo bi : biList)
    as.updateObject(bi);
Unfortunately this did not work either, because of the following exception:
org.eclipse.persistence.internal.sessions.IsolatedClientSession cannot be cast to org.eclipse.persistence.internal.sessions.UnitOfWorkImpl
I am out of options now, and hopefully someone can give me a hint on where to look to solve this problem. It would be greatly appreciated.
I would rather get the native batch update working than manipulate EclipseLink into not performing any SELECTs on merge.
After searching for a long time and trying different approaches (thanks to Chris), I found the simplest solution if you want to stay on the native side of JPA:
@Schedule(second="0", minute="*", hour="*", persistent=false)
public void startUpdate(){
    String updateSql =
        "UPDATE " +
            "SCHEMA.PROPERTIES_GRP_CONT " +
        "SET " +
            "STRVAL = ? " +                // <-- SQL param
        "WHERE " +
            "STATES_ID = 1 " +
            "AND PROPERTIES_ID = ? " +     // <-- SQL param
            "AND PROPERTIES_GRP_ID = ?";   // <-- SQL param

    // Unwrap the JDBC connection backing the EntityManager and batch by hand.
    java.sql.Connection connection = em.unwrap(java.sql.Connection.class);
    try (PreparedStatement prepStatement = connection.prepareStatement(updateSql)) {
        for (BatchInfo bi : biList) {
            prepStatement.setString(1, Long.toString(bi.getLifetime()));
            prepStatement.setLong(2, bi.getPropertiesId());
            prepStatement.setLong(3, bi.getBatchId());
            prepStatement.addBatch();
        }
        prepStatement.executeBatch();
    } catch (SQLException e) {
        throw new RuntimeException("Batch update failed", e);
    }
}
Warning: the important part (em.unwrap) may be EclipseLink specific and require JPA 2.1 or higher!
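One further thought, not from the original answer: with very large lists you may want to send the batch to the database in chunks rather than in one giant executeBatch() call, mirroring the eclipselink.jdbc.batch-writing.size of 600 configured above. A sketch of the adjusted loop (inside the same try block):
int count = 0;
for (BatchInfo bi : biList) {
    prepStatement.setString(1, Long.toString(bi.getLifetime()));
    prepStatement.setLong(2, bi.getPropertiesId());
    prepStatement.setLong(3, bi.getBatchId());
    prepStatement.addBatch();
    if (++count % 600 == 0) {
        prepStatement.executeBatch(); // flush a chunk, keeps driver buffers bounded
    }
}
prepStatement.executeBatch(); // flush the remainder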

Crontab style scheduling in Play 2.4.x?

Technically I can install cron on the machine and curl the URL, but I'm trying to avoid that. Is there any way to accomplish this?
The reason I want to avoid cron is so that I can easily change the schedule or stop it completely without also SSHing into the machine to do so.
Take a look at: https://github.com/enragedginger/akka-quartz-scheduler.
Refer to http://quartz-scheduler.org/api/2.1.7/org/quartz/CronExpression.html for valid CronExpressions and examples.
An example taken from the docs: a schedule called Every30Seconds which, aptly, fires off every 30 seconds:
akka {
  quartz {
    schedules {
      Every30Seconds {
        description = "A cron job that fires off every 30 seconds"
        expression = "*/30 * * ? * *"
        calendar = "OnlyBusinessHours"
      }
    }
  }
}
You can integrate this into your Play! application (probably in your Global application)
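Wiring it up might look something like this (a sketch; ReportActor and the "tick" message are my inventions, and it assumes the akka-quartz-scheduler dependency is on the classpath):
import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.akka.extension.quartz.QuartzSchedulerExtension

// A hypothetical worker actor that does the actual work on each tick.
class ReportActor extends Actor {
  def receive = { case "tick" => println("running the scheduled job") }
}

val system = ActorSystem("scheduler-system")
val reportActor = system.actorOf(Props[ReportActor], "reportActor")

// Binds the "Every30Seconds" schedule from the config above to the actor.
QuartzSchedulerExtension(system).schedule("Every30Seconds", reportActor, "tick")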
You can use the Akka scheduler.
val scheduler = Akka.system(app).scheduler
scheduler.schedule(0 seconds, 1 hour) {
  // run this block every hour
}
The first parameter is a delay, so if you wanted to delay to a specific time you could easily calculate the target time with some simple date arithmetic.
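For example, to run every day at 4:30 AM you could compute the initial delay along these lines (a sketch using java.time; the variable names are mine, and scheduler is the Akka scheduler from above):
import java.time.{Duration, LocalTime, ZoneId, ZonedDateTime}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Work out how long until the next 4:30 AM local time.
val now = ZonedDateTime.now(ZoneId.systemDefault())
val todayAt430 = now.toLocalDate.atTime(LocalTime.of(4, 30)).atZone(ZoneId.systemDefault())
val nextRun = if (todayAt430.isAfter(now)) todayAt430 else todayAt430.plusDays(1)
val initialDelay = Duration.between(now, nextRun).toMillis.millis

scheduler.schedule(initialDelay, 24.hours) {
  // runs every day at roughly 4:30 AM local time (ignoring DST shifts)
}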
Check out https://github.com/philcali/cronish
Some example code from README.md:
val payroll = task {
  println("You have just been paid... Finally!")
}
// Yes... that's how you run it
payroll executes "every last Friday in every month"

val greetings = job (println("Hello there")) describedAs "General Greetings"

// give a delayed start
val delayed = greetings runs "every day at 7:30" in 5.seconds

// give an exact time to start
val exact = greetings runs "every day at noon" starting now + 1.week

// resets a job to its definition
val reseted = exact.reset()
reseted starting now + 1.day

How to ensure only one job fires at a time in Quartz.NET?

I have a Windows Service that uses Quartz.NET to execute jobs that are scheduled. I only want it to pick up a single job at a time. However, occasionally I am seeing behavior that indicates that it has picked up two jobs at once.
There are two log files (the regular one and one automatically generated when the regular one is in use) with jobs that start at the exact same time. I can see both jobs executing in the QRTZ_FIRED_TRIGGERS table, but only one has the correct instance ID, which is odd.
I have configured Quartz to use only a single thread. Is this not how you tell it to only pick up a single job at a time?
Here is my quartz.config file with sensitive values hashed out:
quartz.scheduler.instanceName = DefaultQuartzJobScheduler
quartz.scheduler.instanceId = ######################
quartz.jobstore.clustered = true
quartz.jobstore.clusterCheckinInterval = 15000
quartz.threadPool.type = Quartz.Simpl.SimpleThreadPool, Quartz
quartz.jobStore.useProperties = false
quartz.jobStore.type = Quartz.Impl.AdoJobStore.JobStoreTX, Quartz
quartz.jobStore.driverDelegateType = Quartz.Impl.AdoJobStore.OracleDelegate, Quartz
quartz.jobStore.tablePrefix = QRTZ_
quartz.jobStore.lockHandler.type = Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore, Quartz
quartz.jobStore.misfireThreshold = 60000
quartz.jobStore.dataSource = default
quartz.dataSource.default.connectionString = ######################
quartz.dataSource.default.provider = OracleClient-20
# Customizable values per Node
quartz.threadPool.threadCount = 1
quartz.threadPool.threadPriority = Normal
Make the threadcount = 1.
<add key="quartz.threadPool.threadCount" value="1"/>
<add key="quartz.threadPool.threadPriority" value="Normal"/>
(as you have done)
Make each of your jobs "Stateful"
[PersistJobDataAfterExecution]
[DisallowConcurrentExecution]
public class StatefulDoesNotRunConcurrentlyJob : IJob /* was: IStatefulJob */
{
}
I've left in the name of the older way of doing this (the IStatefulJob interface). Coding against the outdated interface produces: Error 43 'Quartz.IStatefulJob' is obsolete: 'Use DisallowConcurrentExecutionAttribute and/or PersistJobDataAfterExecutionAttribute annotations instead.' That error message gives the hint.
Basically, if you have one thread AND every job is marked with DisallowConcurrentExecution, you should get at most one job running at any given time, i.e. everything runs in "serial mode".

Replacing Celerybeat with Chronos

How mature is Chronos? Is it a viable alternative to a scheduler like celerybeat?
Right now our scheduling implements a periodic "heartbeat" task that checks for "outstanding" events and fires them if they are overdue. We are using python-dateutil's rrule to define this.
We are looking at alternatives to this approach, and Chronos seems a very attractive one: 1) it would remove the need for a heartbeat task, 2) it supports RESTful submission of events in ISO 8601 format, 3) it has a useful management interface, and 4) it scales.
The crucial requirement is that scheduling needs to be configurable on the fly from the web interface. This is why we can't use celerybeat's built-in scheduling out of the box.
Are we going to shoot ourselves in the foot by switching over to Chronos?
This SO answer has solutions to your dynamic periodic task problem (it's not the accepted answer at the moment):
from datetime import datetime

from django.db import models
from djcelery.models import PeriodicTask, IntervalSchedule


class TaskScheduler(models.Model):
    periodic_task = models.ForeignKey(PeriodicTask)

    @staticmethod
    def schedule_every(task_name, period, every, args=None, kwargs=None):
        """Schedules a task by name to run every "every" "period". An example call:
        TaskScheduler.schedule_every('mycustomtask', 'seconds', 30, [1, 2, 3])
        would schedule your custom task to run every 30 seconds with the arguments
        1, 2 and 3 passed to the actual task.
        """
        permissible_periods = ['days', 'hours', 'minutes', 'seconds']
        if period not in permissible_periods:
            raise Exception('Invalid period specified')
        # create the periodic task and the interval
        ptask_name = "%s_%s" % (task_name, datetime.now())  # create some name for the periodic task
        interval_schedules = IntervalSchedule.objects.filter(period=period, every=every)
        if interval_schedules:  # just check if interval schedules like that already exist, and reuse them
            interval_schedule = interval_schedules[0]
        else:  # create a brand new interval schedule
            interval_schedule = IntervalSchedule()
            interval_schedule.every = every  # should check to make sure this is a positive int
            interval_schedule.period = period
            interval_schedule.save()
        ptask = PeriodicTask(name=ptask_name, task=task_name, interval=interval_schedule)
        if args:
            ptask.args = args
        if kwargs:
            ptask.kwargs = kwargs
        ptask.save()
        return TaskScheduler.objects.create(periodic_task=ptask)

    def stop(self):
        """pauses the task"""
        ptask = self.periodic_task
        ptask.enabled = False
        ptask.save()

    def start(self):
        """starts the task"""
        ptask = self.periodic_task
        ptask.enabled = True
        ptask.save()

    def terminate(self):
        self.stop()
        ptask = self.periodic_task
        self.delete()
        ptask.delete()
I haven't used djcelery myself yet, but it supposedly also provides an admin interface for managing periodic tasks dynamically.
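Usage, assuming a Celery task registered under the name 'mycustomtask', would then look something like this (a sketch based on the docstring above):
# Schedule the task every 30 seconds with args [1, 2, 3].
scheduler = TaskScheduler.schedule_every('mycustomtask', 'seconds', 30, [1, 2, 3])

scheduler.stop()       # pause the periodic task (celerybeat will skip it)
scheduler.start()      # re-enable it
scheduler.terminate()  # delete both the scheduler row and the underlying PeriodicTask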