I am using Quartz to schedule cron jobs in my web application, with an Oracle database storing the jobs and related info. When I add jobs to the database, I have to restart the server/application (Tomcat) before the new jobs get scheduled. How can I add jobs to the database and have them run without restarting the server?
I assume you mean you are using JDBCJobStore? In that case it is not ideal to make direct changes to the database tables that store the job data. However, I suppose you could set up a separate job that runs every X minutes or hours, checks whether the database contains new jobs that still need to be scheduled, and schedules them as usual.
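The polling idea above boils down to a small reconciliation step: compare the job names found in the database with the names the scheduler already knows, and schedule only the difference. This is a minimal sketch in plain Java, not Quartz code; `JobSync`, `findUnscheduled` and the job names are hypothetical, and the two input collections would come from whatever query and scheduler lookups your setup uses.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: decides which job definitions still need scheduling.
final class JobSync {

    // Returns the names present in the database but not yet scheduled,
    // preserving the order in which the database returned them.
    static Set<String> findUnscheduled(Collection<String> definedInDb,
                                       Collection<String> alreadyScheduled) {
        Set<String> pending = new LinkedHashSet<>(definedInDb);
        pending.removeAll(alreadyScheduled);
        return pending;
    }

    public static void main(String[] args) {
        Set<String> pending = findUnscheduled(
                Arrays.asList("nightlyReport", "cleanup", "emailDigest"),
                Arrays.asList("nightlyReport", "cleanup"));
        // Only the names left in `pending` would be handed to the scheduler.
        System.out.println(pending);
    }
}
```

Running the comparison on every poll keeps the step idempotent: jobs that were scheduled on an earlier pass drop out of the result on the next one.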
Add jobs via the Scheduler API.
http://www.quartz-scheduler.org/docs/best_practices.html
Related
I have a reporting application that uses Celery to process thousands of jobs per day. There is one Python module per report type that encapsulates all job steps. Jobs take customer-specific parameters and typically complete within a few minutes. Currently, jobs are triggered by customers on demand when they create a new report or request a refresh of an existing one.
Now, I would like to add scheduling, so the jobs run daily, and reports get refreshed automatically. I understand that Airflow shines at task orchestration and scheduling. I also like the idea of expressing my jobs as DAGs and getting the benefit of task retries. I can see how I can use Airflow to run scheduled batch-processing jobs, but I am unsure about my use case.
If I express my jobs as Airflow DAGs, I will still need to run them parametrized for each customer. That means that if a customer creates a new report, I will need a way to trigger a DAG with the customer-specific configuration. And for scheduled execution, I will need to enumerate all customers and create a parametrized (sub-)DAG for each of them. My understanding is that this should be possible, since Airflow supports dynamically created DAGs; however, I am not sure whether this is an efficient and correct way to use Airflow.
I wonder if anyone has considered using Airflow for a scenario similar to mine.
Celery workflows do literally the same thing, and you can create and run them at any point in time. Also, Celery has a pretty good scheduler, Celery Beat (I have never seen it fail in 5 years of using Celery).
Sure, Airflow can be used to do what you need without any problems.
You can use Airflow to create DAGs dynamically, though I am not sure whether this will work at a scale of thousands of DAGs. There are some good examples on astronomer.io on Dynamically Generating DAGs in Airflow.
I have some DAGs and tasks that are dynamically generated from a YAML configuration with different schedules and configurations. It all works without any issue.
The only thing that might be challenging is the "jobs are triggered by customers on-demand" part. I guess you could trigger any DAG with Airflow's REST API, but it's still in an experimental state.
I want to consolidate a couple of historically grown scripts (Python, Bash and PowerShell) whose purpose is to sync data between a lot of different database backends (mostly Postgres, but also Oracle and SQL Server) at different sites. There isn't really a master; it's more like a loose group of partner companies working on the same domain-specific use cases, each with its own data silo, and it's my job to hold all this together as well as I can.
Currently the scripts I mentioned are scheduled with cron and need to run on the origin server where a dataset is initially written, to sync it to every partner overnight.
I am also familiar with Apache Airflow and use it in another project. So my idea was to use a workflow management tool like Airflow to streamline the sync process and centralize it more. But Airflow, too, only offers a time-interval scheduler to trigger a DAG.
As most writes come in through the Postgres databases, I'd like to make use of the NOTIFY/LISTEN feature. I already have a Python daemon based on it that listens for any database change (via triggers) and then calls an event handler.
The last missing piece is how best to trigger an Airflow DAG from this handler, and how to keep all of this running reliably.
Perhaps there is a better solution?
I am writing a Quartz.net application using AdoJobStore to allow automated report scheduling.
In my scenario, users will define custom reports to be scheduled in one application which will add the required jobs and triggers to the database (using the AdoJobStore routines).
A separate Quartz.net application then reads these settings from the database (also using the AdoJobStore routines) and emails the reports as necessary.
Is there a way to get the quartz scheduler to automatically start scheduling new jobs and triggers that have been added to the database after the scheduler last started, or will I need to write a routine that periodically checks for database changes, and if found restart the Quartz scheduler instance?
You can handle all of this directly with Quartz.Net. Here's one way to do it:
Set up a Quartz.Net server as a Windows service. The distribution comes with a Windows service implementation, or you can build your own. Enable remoting on the Quartz server.
From the application where users will configure their reports and schedules, connect to the Quartz.Net server using the Quartz.Net library and directly schedule the jobs and triggers as necessary.
You'll probably want to store the user's report configuration somewhere other than Quartz.Net, in case the user wants to look at it later or change/copy it. If the user changes the stored report configuration, connect again to the Quartz.Net server and update/reschedule the jobs using the Quartz.Net library. Alternatively, you could create a job that runs on the Quartz.Net server and periodically checks whether there have been any report configuration changes.
You'll have to write the jobs that generate your reports generically enough that any report can be built by passing data to the job via the JobDataMap, instead of having to create a separate job for each report.
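To illustrate the last point, here is a rough sketch, written in Java rather than C#, of a single job class that handles any report from the parameters handed to it. The plain `Map` only stands in for Quartz's JobDataMap, and `GenericReportJob` and the keys `reportName`, `format` and `recipient` are invented for this example; a real job would implement the Quartz job interface and read the same values from its data map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical generic job: one class serves every report type.
final class GenericReportJob {

    // Builds a one-line description of the work from the supplied parameters.
    // A missing "format" falls back to PDF; a missing report name is an error.
    static String execute(Map<String, String> data) {
        String report = data.get("reportName");
        if (report == null) {
            throw new IllegalArgumentException("reportName is required");
        }
        String format = data.getOrDefault("format", "PDF");
        String recipient = data.getOrDefault("recipient", "nobody");
        return "Render " + report + " as " + format + " and email to " + recipient;
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("reportName", "monthly-sales");
        data.put("recipient", "alice@example.com");
        System.out.println(execute(data));
    }
}
```

Scheduling a new report then only means scheduling the same job class again with a different data map, so no code changes are needed when users define new reports.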
I am working to migrate from Quartz 1.6 to 2.1 and use a JDBCJobStore. Previously, the jobs were loaded from an XML file when the webapp started. The scheduler is now running using the JDBCJobStore, but I don't understand how to add the jobs to the database which need to run on an ongoing basis (not one-off jobs).
My first thought is to create a servlet which runs on startup which adds the jobs to the database. But my concern is that this will be executed every time I need to restart the app and the jobs will get duplicated.
Thanks,
steve
The jobs won't disappear from the database when you restart. So in your servlet, before adding any jobs at startup, check whether they already exist. When you create your jobs you can give them identities; using those identities and the Quartz methods for looking up jobs, you can check whether they are already there.
It sounds like the memory-based scheduler is a better fit for these fixed jobs. You can create more than one scheduler, one in-memory (RAMJobStore) and one JDBC, if that makes sense for your application.
I am using Spring 3 and Quartz 1.8.5 to schedule jobs in clustered mode. I have set overwriteExistingJobs=true in Spring's scheduler configuration.
I also need to create dynamic jobs programmatically, in addition to the Quartz jobs that are part of the configuration. Everything works fine until I restart the server; at that point there is a problem related to overwriteExistingJobs=true.
Say I have a dynamic job that executes every two minutes. If I stop the server and start it again ten minutes later, the job executes five times as soon as the server starts. A job that is part of the Spring configuration, like the one given in the Spring documentation, is instead overwritten when the server restarts.
My observation is that for jobs configured in the Spring configuration file and added to org.springframework.scheduling.quartz.SchedulerFactoryBean, PREV_FIRE_TIME in the QRTZ_TRIGGERS table gets updated to '-1', whereas for dynamically created jobs it is not overwritten.
The fix is as follows:
a) My dynamic jobs use CronTriggers, so I set a misfire instruction on each trigger.
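// Quartz 1.x constructors: JobDetail(name, group, jobClass), CronTrigger(name, group, cronExpression)
JobDetail jobDetail = new JobDetail(job.getDescription(), job.getName(), job.getClass());
CronTrigger crTrigger = new CronTrigger("cronTrigger", job.getName(), cronExpression);
crTrigger.setStartTime(firstFireTime);
// On a misfire, skip the missed runs and wait for the next scheduled fire time
crTrigger.setMisfireInstruction(CronTrigger.MISFIRE_INSTRUCTION_DO_NOTHING);
scheduler.scheduleJob(jobDetail, crTrigger);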
b) The misfire threshold was set very high (6000000 ms, i.e. 100 minutes). Reducing the misfire threshold fixed it, and it worked like a charm.
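The burst of five executions and the effect of the threshold can be worked through with a little arithmetic. The sketch below is plain Java, not Quartz code: `MisfireMath` and its methods are made up for illustration, and it assumes, as I understand Quartz's behaviour, that a trigger only counts as misfired once it is late by more than the threshold, so anything less late is simply fired late.

```java
import java.time.Duration;

// Hypothetical helper that models the numbers from the question.
final class MisfireMath {

    // Number of fire times that fall inside an outage, assuming the job is
    // due exactly every `interval` and the last run coincided with shutdown.
    static long missedFirings(Duration downtime, Duration interval) {
        return downtime.toMillis() / interval.toMillis();
    }

    // Assumed rule: a late trigger is treated as a misfire only when its
    // lateness exceeds the configured threshold.
    static boolean countsAsMisfire(Duration lateness, Duration threshold) {
        return lateness.compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        Duration downtime = Duration.ofMinutes(10);
        Duration interval = Duration.ofMinutes(2);
        System.out.println(missedFirings(downtime, interval)); // 5

        // Threshold of 6000000 ms: a 10-minute delay is not a misfire, so the
        // DO_NOTHING instruction never applies and the backlog runs at once.
        System.out.println(countsAsMisfire(downtime, Duration.ofMillis(6_000_000L))); // false

        // Threshold of 60000 ms: the same delay now counts as a misfire.
        System.out.println(countsAsMisfire(downtime, Duration.ofMillis(60_000L))); // true
    }
}
```

Under that assumption, both parts of the fix are needed together: the misfire instruction says what to do with a misfire, and a threshold shorter than the outage is what makes the missed runs count as misfires in the first place. For a JDBC job store the threshold is normally set through the org.quartz.jobStore.misfireThreshold property, whose documented default is 60000 ms.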