Spring Batch Job Stop Using jobOperator - spring-batch

I have Started my job using jobLauncher.run(processJob,jobParameters); and when i try stop job using another request jobOperator.stop(jobExecution.getId()); then get exeption :
org.springframework.batch.core.launch.JobExecutionNotRunningException:
JobExecution must be running so that it can be stopped
Set<JobExecution> jobExecutionsSet= jobExplorer.findRunningJobExecutions("processJob");
for (JobExecution jobExecution:jobExecutionsSet) {
System.err.println("job status : "+ jobExecution.getStatus());
if (jobExecution.getStatus()== BatchStatus.STARTED|| jobExecution.getStatus()== BatchStatus.STARTING || jobExecution.getStatus()== BatchStatus.STOPPING){
jobOperator.stop(jobExecution.getId());
System.out.println("###########Stopped#########");
}
}
when print job status always get job status : STOPPING but batch job is running
its web app, first upload some CSV file and start some operation using spring batch and during this execution if user need stop then stop request from another controller method come and try to stop running job
Please help me for stop running job

If you stop a job while it is running (typically in a STARTED state), you should not get this exception. If you have this exception, it means you have stopped your job while it is currently stopping (that is what the STOPPING status means).
jobExplorer.findRunningJobExecutions returns only running executions, so if in the next line right after this one you have a job in STOPPING status, this means the status changed right after calling jobExplorer.findRunningJobExecutions. You need to be aware that this is possible and your controller should handle this case.

When you tell spring batch to stop a job it goes into STOPPING mode. What this means is it will attempt to complete the unit of work chunk it is currently processing but then stop working. Likely what's happening is you are working on a long running task that is not finishing a unit of work (is it hung?) so it can't move from STOPPING to STOPPED.
Doing it twice rightly leads to an Exception because your job is already STOPPING by the time you did it the first time.

Related

How can I find the jobId that caused node failure in slurm?

I know that sacctmgr command can list the event history of nodes with the reason.
sacctmgr show event Start=09/01-00:00 format=nodename,timestart,timeend,state,reason,user
This command gives the following output
gnodeXX 2020-09-04T20:21:34 2020-09-05T01:21:38 DRAIN Kill task failed root(ZZ)
gnodeXX 2020-09-09T16:44:55 2020-09-09T17:50:21 DOWN* Not responding slurm(DDDD)
Is there any way I can get the jobId or username that caused the Node Failure or any info on the task for which kill failed? The user column gives either of two outputs for all results root(ZZ) and slurm(DDDD) and I am not sure what these imply.

In a scala spark job, running in yarn, how can I fail the job so that yarn shows a Failed status

I have a simple if statement in my scala spark job code, that if false i want to stop the job and mark it failed. I want the yarn UI to show the spark job with a status of failed, but everything i've done so far has stopped the job, but only shows up as successfully finished on the yarn UI.
if(someBoolen) {
//context.clearAllJobs()
//System.exit(-1)
//etc, nothing so far, stops the job and show as failed in the yarn UI
}
Any help would be great.
Throwing an exception (and not catching it) will cause the process to fail.
if(someBoolen) {
throw new Exception("Job failed");
}

How to stop and resume a spring batch job

Goal : I am using spring batch for data processing and I want to have an option to stop/resume (where it left off).
Issue: I am able to send a stop signal to a running job and it gets stopped successfully. But when I try to send start signal to same job its creating a new instance of the job and starts as a fresh job.
My question is how can we achieve a resume functionality for a stopped job in spring batch.
You just have to run it with the same parameters. Just make sure you haven't marked the job as non-restarted and that you're not using RunIdIncrementer or similar to automatically generate unique job parameters.
See for instance, this example. After the first run, we have:
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{}] and the following status: [STOPPED]
Status is: STOPPED, job execution id 0
#1 step1 COMPLETED
#2 step2 STOPPED
And after the second:
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]
Status is: COMPLETED, job execution id 1
#3 step2 COMPLETED
#4 step3 COMPLETED
Note that stopped steps will be re-executed. If you're using chunk-oriented steps, make sure that at least the ItemReader implements ItemStream (and does it with the correct semantics).
Steps marked with allowRestartWithComplete will always be re-run.

How spring batch admin is stopping a running job?

How spring batch admin is stopping a running job from the UI .
On the spring batch admin's online documentation i have read the following lines .
"A job that is executing can be stopped by the user (whether or not it
is launchable). The stop signal is sent via the database and once
detected by Spring Batch in whatever process is running the job, the
job is stopped (status moves from STOPPING to STOPPED) and no further
processing takes place."
Does that mean Spring batch admin UI is directly changing the status of job inside the spring batch table ?
UPDATE: I tried executing the below query on the running job .
update batch_job_execution set status="STOPPED" where job_ins
tance_id=19;
The above query is getting updated in the DB but spring batch is not bale to stop the running job.
If anybody has tried this please do share the logic here .
You're confused between Batch Status vs. Exit Status.
What are you doing with that SQL is changed the STATUS to STOPPED
When a job is running you can stop the job from the code. In each step iteration, check their status and if STOPPING its set, then send the step to stop ongoing.
Anyway, what you doing is not elegant. The correct way is explained in Common Batch Patterns -> 11.2 Stopping a Job Manually for Business Reasons
public class FooProcessor implements ItemProcessor<FooIn,FooOut>{
public FooOut process(FooIn foo) throws Exception {
if (sendToStop(item)) {
throw new MyStopException("I need to Stop: " + item);
}
//do my stuff
return new FooOut(foo);
}
}
Another simple way to stop chunk step is return null in the reader. This tells us that no more elements to iterate the reader
public T read() throws Exception {
T item = delegate.read();
if (ifNeedStop(item)) {
return null; // end the step here
}
return item;
}
I investigated the spring batch code.
It seems they update both the version and status of the BATCH_JOB_EXECUTION.
This works for me:
update batch_job_execution set status="STOPPED", version=version+1 where job_instance_id=19;
If you look into the jars of spring batch admin, you can see that in AbstractStep.java(spring-batch admin class) it checks for the status of the Step and Job from Database .
Based on this status it validates step before running it .
This works well for all cases except in chunk, since next step is called after large processing . If you want to implement in it, you can implement your own listener to check status (but it will increase DB hits) .

find if a job is running in Quartz1.6

I would like to clarify details of the scheduler.getCurrentlyExecutingJobs() method in Quartz1.6. I have a job that should have only one instance running at any given moment. It can be triggered to "run Now" from a UI but if a job instance already running for this job - nothing should happen.
This is how I check whether there is a job running that interests me:
List<JobExecutionContext> currentJobs = scheduler.getCurrentlyExecutingJobs();
for (JobExecutionContext jobCtx: currentJobs){
jobName = jobCtx.getJobDetail().getName();
groupName = jobCtx.getJobDetail().getGroup();
if (jobName.equalsIgnoreCase("job_I_am_looking_for_name") &&
groupName.equalsIgnoreCase("job_group_I_am_looking_for_name")) {
//found it!
logger.warn("the job is already running - do nothing");
}
}
then, to test this, I have a unit test that tries to schedule two instances of this job one after the other. I was expecting to see the warning when trying to schedule the second job, however, instead, I'm getting this exception:
org.quartz.ObjectAlreadyExistsException: Unable to store Job with name:
'job_I_am_looking_for_name' and group: 'job_group_I_am_looking_for_name',
because one already exists with this identification.
When I run this unit test in a debug mode, with the break on this line:
List currentJobs = scheduler.getCurrentlyExecutingJobs();
I see the the list is empty - so the scheduler does not see this job as running , but it still fails to schedule it again - which tells me the job was indeed running at the time...
Am I missing some finer points with this scheduler method?
Thanks!
Marina
For the benefit of others, I'm posting an answer to the issue I was having - I got help from the Terracotta Forum's Zemian Deng: posting on Terracotta's forum
Here is the re-cap:
The actual checking of the running jobs was working fine - it was just timing in the Unit tests, of course. I've added some sleeping in the job, and tweaked unit tests to schedule the second job while the first one is still running - and verified that I could indeed find the first job still running.
The exception I was getting was because I was trying to schedule a new job with the same name, rather than try to trigger the already stored in the scheduler job. The following code worked exactly as I needed:
List<JobExecutionContext> currentJobs = scheduler.getCurrentlyExecutingJobs();
for (JobExecutionContext jobCtx: currentJobs){
jobName = jobCtx.getJobDetail().getName();
groupName = jobCtx.getJobDetail().getGroup();
if (jobName.equalsIgnoreCase("job_I_am_looking_for_name") && groupName.equalsIgnoreCase("job_group_I_am_looking_for_name")) {
//found it!
logger.warn("the job is already running - do nothing");
return;
}
}
// check if this job is already stored in the scheduler
JobDetail emailJob;
emailJob = scheduler.getJobDetail("job_I_am_looking_for_name", "job_group_I_am_looking_for_name");
if (emailJob == null){
// this job is not in the scheduler yet
// create JobDetail object for my job
emailJob = jobFactory.getObject();
emailJob.setName("job_I_am_looking_for_name");
emailJob.setGroup("job_group_I_am_looking_for_name");
scheduler.addJob(emailJob, true);
}
// this job is in the scheduler and it is not running right now - run it now
scheduler.triggerJob("job_I_am_looking_for_name", "job_group_I_am_looking_for_name");
Thanks!
Marina