How to monitor a quartz scheduler job? - quartz-scheduler

I am very new to the Quartz scheduler. I am aware that we can enable logging for Quartz jobs and triggers with the following configuration:
org.quartz.plugin.jobHistory.class: org.quartz.plugins.history.LoggingJobHistoryPlugin
# Format of Log Generated
org.quartz.plugin.jobHistory.jobSuccessMessage= Job [{1}.{0}] execution complete and reports: {8}
org.quartz.plugin.jobHistory.jobToBeFiredMessage= Job [{1}.{0}] to be fired by trigger [{4}.{3}], re-fire: {7}
org.quartz.plugin.triggHistory.class= org.quartz.plugins.history.LoggingTriggerHistoryPlugin
# Format of Log Generated
org.quartz.plugin.triggHistory.triggerFiredMessage= Trigger {1}.{0} fired job {6}.{5} at: {4, date, HH:mm:ss MM/dd/yyyy}
org.quartz.plugin.triggHistory.triggerCompleteMessage= Trigger {1}.{0} completed firing job {6}.{5} at {4, date, HH:mm:ss MM/dd/yyyy}
But I am trying to understand whether there is a way to directly get quantitative metrics, such as how many jobs are currently running or the duration of each job.
I am also aware of tools like QuartzDesk, which provide a UI for these metrics. But I am more interested in raw metrics that I could in turn push to my Prometheus instance.

Related

How to wait until a job is done or a file is updated in airflow

I am trying to use apache-airflow, with Google Cloud Composer, to schedule batch processing that results in the training of a model with Google AI Platform. I failed to use the Airflow operators, as I explain in this question: unable to specify master_type in MLEngineTrainingOperator
Using the command line I managed to launch a job successfully.
So now my issue is to integrate this command in airflow.
Using BashOperator I can train the model, but I need to wait for the job to complete before creating a version and setting it as the default. This DAG creates a version before the job is done:
bash_command_train = "gcloud ai-platform jobs submit training training_job_name " \
"--packages=gs://path/to/the/package.tar.gz " \
"--python-version=3.5 --region=europe-west1 --runtime-version=1.14" \
" --module-name=trainer.train --scale-tier=CUSTOM --master-machine-type=n1-highmem-16"
bash_train_operator = BashOperator(task_id='train_with_bash_command',
                                   bash_command=bash_command_train,
                                   dag=dag,)
...
create_version_op = MLEngineVersionOperator(
    task_id='create_version',
    project_id=PROJECT,
    model_name=MODEL_NAME,
    version={
        'name': version_name,
        'deploymentUri': export_uri,
        'runtimeVersion': RUNTIME_VERSION,
        'pythonVersion': '3.5',
        'framework': 'SCIKIT_LEARN',
    },
    operation='create')
set_version_default_op = MLEngineVersionOperator(
    task_id='set_version_as_default',
    project_id=PROJECT,
    model_name=MODEL_NAME,
    version={'name': version_name},
    operation='set_default')
# Ordering the tasks
bash_train_operator >> create_version_op >> set_version_default_op
The training results in a file being updated in Google Cloud Storage. So I am looking for an operator or a sensor that will wait until this file is updated. I noticed GoogleCloudStorageObjectUpdatedSensor, but I don't know how to make it retry until the file is updated.
Another solution would be to check for the job to be completed, but I can't find out how to do that either.
Any help would be greatly appreciated.
The Google Cloud documentation for the --stream-logs flag:
"Block until job completion and stream the logs while the job runs."
Add this flag to bash_command_train and I think it should solve your problem. The command should only release once the job is finished, then Airflow will mark it as success. It will also let you monitor your training job's logs in Airflow.
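As a rough sketch of that change, the command string from the question can be extended with the flag. Everything except --stream-logs is copied from the question; the BashOperator wiring is left out so the snippet runs without Airflow installed.

```python
# Training command from the question, with --stream-logs appended so that
# gcloud blocks until the training job completes.
bash_command_train = (
    "gcloud ai-platform jobs submit training training_job_name "
    "--packages=gs://path/to/the/package.tar.gz "
    "--python-version=3.5 --region=europe-west1 --runtime-version=1.14 "
    "--module-name=trainer.train --scale-tier=CUSTOM "
    "--master-machine-type=n1-highmem-16 "
    "--stream-logs"
)

print(bash_command_train)
```

The string is then passed to BashOperator as bash_command exactly as before, so the downstream create_version task only starts after the command returns.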

ECS/Fargate - can I schedule a job to run every 5 minutes UNLESS its already running?

I've got an ECS/Fargate task that runs every five minutes. Is there a way to tell it not to run if the prior instance is still working? At the moment I'm just passing it a cron expression, and there's nothing in the AWS cron/rate documentation about blocking subsequent runs.
Conceptually I'm looking for something similar to Spring's @Scheduled(fixedDelay=xxx), where it'll run five minutes after the previous run finishes.
EDIT - I've created the task using CloudFormation, not the CLI
This solution works if you are using CloudWatch Logs for your ECS application:
- Have your script emit a 'task completed' or 'script successfully completed running' message so you can track it later on.
- Using the describeLogStreams function, first retrieve the latest log stream. This will be the stream that was created for the task which ran 5 minutes ago in your case.
- Once you have the name of the stream, check the last few logged events (text printed in the stream) to see if it's the expected task completed event that your stream should have printed. Use the getLogEvents function for this.
- If it isn't, don't launch the next task; wait or handle it as needed.
- Schedule your script to run every 5 minutes as you would normally.
API links to aws-sdk docs are below. This script is written in JS and uses the AWS-SDK (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS.html) but you can use boto3 for python or a different lib for other languages
API ref for describeLogStreams
API ref for getLogEvents
const AwsSdk = require('aws-sdk'); // AWS SDK for JavaScript (v2)
const logGroupName = 'logGroupName';
const cloudwatchlogs = new AwsSdk.CloudWatchLogs({
  apiVersion: '2014-03-28', region: 'us-east-1'
});
// Get the latest log stream for your task's log group.
// Limit results to 1 to get only one stream back.
const descLogStreamsParams = {
  logGroupName: logGroupName,
  descending: true,
  limit: 1,
  orderBy: 'LastEventTime'
};
cloudwatchlogs.describeLogStreams(descLogStreamsParams, (err, data) => {
  if (err) { console.error(err); return; }
  // Log stream for the previous task run.
  const latestLogStreamName = data.logStreams[0].logStreamName;
  // Call getLogEvents to read from this log stream now.
  const getEventsParams = {
    logGroupName: logGroupName,
    logStreamName: latestLogStreamName,
  };
  cloudwatchlogs.getLogEvents(getEventsParams, (err, data) => {
    if (err) { console.error(err); return; }
    // Assumes the task logs each message as JSON.
    const latestParsedMessage = JSON.parse(data.events[0].message);
    // Loop over data.events to inspect the last n messages
    // ...
  });
});
If you are launching the task with the CLI, the run-task command returns the task ARN.
You can then use this to check the status of that task:
aws ecs describe-tasks --cluster MYCLUSTER --tasks TASK-ARN --query 'tasks[0].lastStatus'
It will return RUNNING if it's still running, STOPPED if stopped, etc.
Note that Fargate is very aggressive about harvesting stopped tasks. If that command returns null, you can consider it STOPPED.
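A minimal sketch of how a launcher script might interpret that output, assuming the CLI uses its JSON output format (so the value arrives as a quoted string or as null). The function name previous_task_still_active is hypothetical, and treating every status other than STOPPED or null as "still active" is a conservative assumption rather than documented ECS guidance.

```python
import json


def previous_task_still_active(cli_output: str) -> bool:
    """Interpret the output of
    aws ecs describe-tasks ... --query 'tasks[0].lastStatus'
    (JSON output format assumed)."""
    status = json.loads(cli_output)
    # null means the task is no longer known to ECS, so it is treated
    # as STOPPED, as described above.
    return status not in (None, "STOPPED")


print(previous_task_still_active('"RUNNING"'))
print(previous_task_still_active('null'))
```

The launcher would skip starting a new task whenever this returns True for the ARN saved from the previous run.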

JES2 SPOOL volume waiting for jobs, but $DJ shows no jobs

I’m trying to drain a JES2 SPOOL volume. It says it’s waiting for jobs:
$DSPL
$HASP893 VOLUME(SPLZ00) 852
$HASP893 VOLUME(SPLZ00) STATUS=DRAINING,AWAITING(JOBS),
$HASP893 PERCENT=2
$HASP893 VOLUME(SPLZ01) STATUS=ACTIVE,PERCENT=38
$HASP893 VOLUME(SPLZ02) STATUS=ACTIVE,PERCENT=36
$HASP646 37.5371 PERCENT SPOOL UTILIZATION
But when I look to see which jobs it’s waiting for, I don’t find any:
$DJ(*),SPL=(VOL=SPLZ00)
$HASP003 RC=(52),D 879
$HASP003 RC=(52),D J(*) - NO SELECTABLE ENTRIES FOUND MATCHING
$HASP003 SPECIFICATION
Any ideas about why this volume won’t finish draining?
Thanks to Dave Gibney on the IBM-MAIN mailing list (IBM-MAIN@LISTSERV.UA.EDU), I have the answer.
$DJ doesn't show started tasks or TSO users. $DJQ(*),SPL=(VOL=SPLZ00) displays everything. There's also $DS, which shows only STCs, and $DT, which shows only TSUs.
Although $DJQ commands show batch jobs, TSO users and started tasks, they do not include job group logging jobs. You would need to use the command $DG(*),SPL=(VOL=SPLZ00) to show any job groups using a spool volume.

quartz Fire Job immediately doesn't work

I integrated Quartz 2 and Spring 4 with Maven and Java annotations (using Servlet 3), and I am using the Tomcat 7 Maven plugin to deploy my project. My Quartz configuration class is as below:
My job class is defined simply as below:
Then I use the Quartz Scheduler to fire my job trigger immediately, as below:
My problem is that when I call the fireNow method with the "job1", "mygroup" parameters, nothing happens: job1 is not called immediately and nothing is printed to the console. I also checked the DB tables and noticed that
after running the fireNow method a new row is inserted into my qrtz_triggers table in MySQL:
If the Quartz scheduler is not set to start automatically, you need to start it explicitly.
scheduler.start();
If the Quartz scheduler started successfully, you should see information in your log or console output similar to the following.
[main] INFO org.quartz.core.QuartzScheduler - Scheduler meta-data: Quartz Scheduler (v2.2.1)'org.springframework.scheduling.quartz.SchedulerFactoryBean#0' with instanceId 'MyScheduler'
Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
NOT STARTED.
Currently in standby mode.
Number of jobs executed: 0
Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.
...
[main] INFO org.quartz.core.QuartzScheduler - started
I finally found the solution to my problem. After enabling Quartz logging in log4j (adding log4j.logger.org.quartz=DEBUG to my log4j.properties), I saw a JDBC exception in the console, caused by an outdated Quartz SQL script.
I had added the Quartz 2.2.1 dependency to my POM but created the tables with the SQL script for version 2.1.7. That mismatch between the Quartz jar and the SQL script version left the schema missing some elements, such as the SCHED_TIME column.

find if a job is running in Quartz1.6

I would like to clarify details of the scheduler.getCurrentlyExecutingJobs() method in Quartz 1.6. I have a job that should have only one instance running at any given moment. It can be triggered to "run now" from a UI, but if an instance of this job is already running, nothing should happen.
This is how I check whether there is a job running that interests me:
List<JobExecutionContext> currentJobs = scheduler.getCurrentlyExecutingJobs();
for (JobExecutionContext jobCtx : currentJobs) {
    String jobName = jobCtx.getJobDetail().getName();
    String groupName = jobCtx.getJobDetail().getGroup();
    if (jobName.equalsIgnoreCase("job_I_am_looking_for_name") &&
            groupName.equalsIgnoreCase("job_group_I_am_looking_for_name")) {
        // found it!
        logger.warn("the job is already running - do nothing");
    }
}
Then, to test this, I have a unit test that tries to schedule two instances of this job one after the other. I was expecting to see the warning when trying to schedule the second job; instead, I'm getting this exception:
org.quartz.ObjectAlreadyExistsException: Unable to store Job with name:
'job_I_am_looking_for_name' and group: 'job_group_I_am_looking_for_name',
because one already exists with this identification.
When I run this unit test in debug mode, with a breakpoint on this line:
List currentJobs = scheduler.getCurrentlyExecutingJobs();
I see that the list is empty, so the scheduler does not see this job as running. But it still fails to schedule it again, which tells me the job was indeed running at the time...
Am I missing some finer points with this scheduler method?
Thanks!
Marina
For the benefit of others, I'm posting an answer to the issue I was having - I got help from the Terracotta Forum's Zemian Deng: posting on Terracotta's forum
Here is the re-cap:
The actual check for running jobs was working fine; it was just a timing issue in the unit tests, of course. I added some sleeping in the job and tweaked the unit tests to schedule the second job while the first one is still running, and verified that I could indeed find the first job still running.
The exception I was getting was because I was trying to schedule a new job with the same name, rather than trigger the job already stored in the scheduler. The following code worked exactly as I needed:
List<JobExecutionContext> currentJobs = scheduler.getCurrentlyExecutingJobs();
for (JobExecutionContext jobCtx : currentJobs) {
    String jobName = jobCtx.getJobDetail().getName();
    String groupName = jobCtx.getJobDetail().getGroup();
    if (jobName.equalsIgnoreCase("job_I_am_looking_for_name") && groupName.equalsIgnoreCase("job_group_I_am_looking_for_name")) {
        // found it!
        logger.warn("the job is already running - do nothing");
        return;
    }
}
// check if this job is already stored in the scheduler
JobDetail emailJob;
emailJob = scheduler.getJobDetail("job_I_am_looking_for_name", "job_group_I_am_looking_for_name");
if (emailJob == null) {
    // this job is not in the scheduler yet
    // create JobDetail object for my job
    emailJob = jobFactory.getObject();
    emailJob.setName("job_I_am_looking_for_name");
    emailJob.setGroup("job_group_I_am_looking_for_name");
    scheduler.addJob(emailJob, true);
}
// this job is in the scheduler and it is not running right now - run it now
scheduler.triggerJob("job_I_am_looking_for_name", "job_group_I_am_looking_for_name");
Thanks!
Marina
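The distinction in the re-cap (storing a job under a name and group versus triggering a job that is already stored) can be illustrated with a toy in-memory model. This is only a sketch of the idea, not Quartz's implementation: ToyScheduler, add_job and trigger_job are hypothetical names invented for the illustration.

```python
class ToyScheduler:
    """Toy model: jobs are stored once per (name, group) key and can
    then be triggered any number of times."""

    def __init__(self):
        self.jobs = {}    # (name, group) -> callable
        self.fired = []   # history of triggered keys

    def add_job(self, name, group, func):
        key = (name, group)
        if key in self.jobs:
            # Mirrors the situation behind the exception in the question:
            # a job with this identification already exists.
            raise KeyError("job already exists: %s.%s" % (group, name))
        self.jobs[key] = func

    def trigger_job(self, name, group):
        # Triggering reuses the stored job instead of storing a new one.
        key = (name, group)
        self.fired.append(key)
        return self.jobs[key]()


scheduler = ToyScheduler()
scheduler.add_job("email", "reports", lambda: "sent")

try:
    scheduler.add_job("email", "reports", lambda: "sent")
    stored_twice = True
except KeyError:
    stored_twice = False

first = scheduler.trigger_job("email", "reports")
second = scheduler.trigger_job("email", "reports")
print(stored_twice, first, second, len(scheduler.fired))
```

Storing the same key twice fails, while triggering the stored job repeatedly succeeds, which matches the fix described in the re-cap.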