how to load/boostrap ongoing jobs with a quartz jdbcStore - quartz-scheduler

I am working to migrate from Quartz 1.6 to 2.1 and use a JDBCJobStore. Previously, the the jobs were loaded via an xml file when the webapp started. The scheduler is now running using the JDBCJobStore but I don't understand how to add the jobs to the database which need to run on an ongoing basis (not one-off jobs).
My first thought is to create a servlet which runs on startup which adds the jobs to the database. But my concern is that this will be executed every time I need to restart the app and the jobs will get duplicated.
Thanks,
steve

The Jobs wont disappear from the database when you do a restart. So within your servlet, when it starts up before adding any jobs check to see if they already exist. When you create your jobs you can give them identities. Using the identities and some quartz methods you check if they already exist.

It sounds like the memory based scheduler is a better fit for these fixed jobs. You can create more than one scheduler, one memory, one JDBC if that makes sense for your application.

Related

How to create a cron job in a Kubernetes deployed app without duplicates?

I am trying to find a solution to run a cron job in a Kubernetes-deployed app without unwanted duplicates. Let me describe my scenario, to give you a little bit of context.
I want to schedule jobs that execute once at a specified date. More precisely: creating such a job can happen anytime and its execution date will be known only at that time. The job that needs to be done is always the same, but it needs parametrization.
My application is running inside a Kubernetes cluster, and I cannot assume that there always will be only one instance of it running at the any moment in time. Therefore, creating the said job will lead to multiple executions of it due to the fact that all of my application instances will spawn it. However, I want to guarantee that a job runs exactly once in the whole cluster.
I tried to find solutions for this problem and came up with the following ideas.
Create a local file and check if it is already there when starting a new job. If it is there, cancel the job.
Not possible in my case, since the duplicate jobs might run on other machines!
Utilize the Kubernetes CronJob API.
I cannot use this feature because I have to create cron jobs dynamically from inside my application. I cannot change the cluster configuration from a pod running inside that cluster. Maybe there is a way, but it seems to me there have to be a better solution than giving the application access to the cluster it is running in.
Would you please be as kind as to give me any directions at which I might find a solution?
I am using a managed Kubernetes Cluster on Digital Ocean (Client Version: v1.22.4, Server Version: v1.21.5).
After thinking about a solution for a rather long time I found it.
The solution is to take the scheduling of the jobs to a central place. It is as easy as building a job web service that exposes endpoints to create jobs. An instance of a backend creating a job at this service will also provide a callback endpoint in the request which the job web service will call at the execution date and time.
The endpoint in my case links back to the calling backend server which carries the logic to be executed. It would be rather tedious to make the job service execute the logic directly since there are a lot of dependencies involved in the job. I keep a separate database in my job service just to store information about whom to call and how. Addressing the startup after crash problem becomes trivial since there is only one instance of the job web service and it can just re-create the jobs normally after retrieving them from the database in case the service crashed.
Do not forget to take care of failing jobs. If your backends are not reachable for some reason to take the callback, there must be some reconciliation mechanism in place that will prevent this failure from staying unnoticed.
A little note I want to add: In case you also want to scale the job service horizontally you run into very similar problems again. However, if you think about what is the actual work to be done in that service, you realize that it is very lightweight. I am not sure if horizontal scaling is ever a requirement, since it is only doing requests at specified times and is not executing heavy work.

What is the current recommended approach to manage/stop a spring-batch job?

We have some spring-batch jobs are triggered by autosys with shell scripts as short lived processes.
Right now there's no way to view what is going on in the spring-batch process so I was exploring ways to view the status & manage(stop) the jobs.
Spring Cloud Data Flow is one of the options that I was exploring - but it seems that may not work when jobs are scheduled with Autosys.
What are the other options that I can explore in this regard and what is the recommended approach to manage spring-batch jobs now?
To stop a job, you first need to get the ID of the job execution to stop. This can be done using the JobExplorer API that allows you to explore meta-data that Spring Batch is aware of in the job repository. Once you get the job execution ID, you can stop it by calling the JobOperator#stop method, please refer to the Stopping a job section of the reference documentation.
This is independent of any method you used to launch the job (either manually, or via a scheduler or a graphical tool) and allows you to gracefully stop a job and leave the repository in a consistent state (ready for a restart if needed).

Use Airflow to run parametrized jobs on-demand and with a schedule

I have a reporting application that uses Celery to process thousands of jobs per day. There is a python module per each report type that encapsulates all job steps. Jobs take customer-specific parameters and typically complete within a few minutes. Currently, jobs are triggered by customers on-demand when they create a new report or request a refresh of an existing one.
Now, I would like to add scheduling, so the jobs run daily, and reports get refreshed automatically. I understand that Airflow shines at task orchestration and scheduling. I also like the idea of expressing my jobs as DAGs and getting the benefit of task retries. I can see how I can use Airflow to run scheduled batch-processing jobs, but I am unsure about my use case.
If I express my jobs as Airflow DAGs, I will still need to run them parametrized for each customer. It means, if the customer creates a new report, I will need to have a way to trigger a DAG with the customer-specific configuration. And with a scheduled execution, I will need to enumerate all customers and create a parametrized (sub-)DAG for each of them. My understanding this should be possible since Airflow supports DAGs created dynamically, however, I am not sure if this is an efficient and correct way to use Airflow.
I wonder if anyway considered using Airflow for a scenario similar to mine.
Celery workflows do literally the same, and you can create and run them at any point of time. Also, Celery has a pretty good scheduler (I have never seen it failing in 5 years of using Celery) - Celery Beat.
Sure, Airflow can be used to do what you need without any problems.
You can use Airflow to create DAGs dynamically, I am not sure if this will work with a scale of 1000 of DAGs though. There are some good examples on astronomer.io on Dynamically Generating DAGs in Airflow.
I have some DAGs and task that are dynamically generated by a yaml configuration with different schedules and configurations. It all works without any issue.
Only thing that might be challenging is the "jobs are triggered by customers on-demand" - I guess you could trigger any DAG with Airflow's REST API, but it's still in a experimental state.

Quartz Scheduler using database

I am using Quartz to schedule cron jobs in my web application. i am using a oracle Databse to store jobs and related info. When i add the jobs in the Database, i need to re-start the server/application (tomcat server) for these new jobs to get scheduled. How can i add jobs in the database and make them work without restarting the server.
I assume you mean you are using JDBCJobStore? In that case it is not ideal to make direct changes in the database tables storing the job data. However, I suppose you could set up a separate job that runs every X minutes / hours, checks whether there are new jobs in the database (that need to be scheduled), and schedule them as usual.
Add jobs via the Scheduler API.
http://www.quartz-scheduler.org/docs/best_practices.html

QuartzScheduler.NET - Adding new job definitions while scheduler is running

Can anyone please say if quartz will allow you to add additional job types once the scheduler is up and running?
I reckon that our implementation of quarts will be an asp.net service using ram store. It is likely that new jobs will be written over time and that we will want to add these jobs into the scheduler without having to shut the service down.
Yes, so long as the classes related to the new job types make it into the classpath, and your class loader will discover them. Quartz does nothing to "preload" or "prediscover" job classes - it just loads them as they're referenced.