My software is choreographing a number of Spring Batch jobs. The output of one job is partially an input for the next job. It may happen that the entire process (the entire job chain) is restarted, even if one or more jobs in the chain have already completed successfully. In this case, when I try to run one of those jobs again with the same parameters, I get a JobInstanceAlreadyCompletedException, as expected. I could skip it and go on to the next job, but I would need to access the context of the completed instance in order to get the output produced by its steps and pass it over to the next job.
According to the JobExplorer API, this is only possible if you have the executionId of the completed instance. I can't get it from the JobInstanceAlreadyCompletedException, and it looks like there is no API for getting it from the already-used parameter list. Do you know a way to get this executionId given the parameters? Or to get access, in whatever way, to the completed instance's job context?
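One direct approach, sketched below assuming a standard JobRepository setup: JobRepository exposes getLastJobExecution(jobName, jobParameters), which looks the execution up by the very parameters you already have, so no executionId is needed up front (jobRepository, jobName, and parameters are assumed to be in scope):

// Recover the completed execution's contexts from the job name and parameters.
JobExecution lastExecution = jobRepository.getLastJobExecution(jobName, parameters);
if (lastExecution != null && lastExecution.getStatus() == BatchStatus.COMPLETED) {
    ExecutionContext jobContext = lastExecution.getExecutionContext();
    for (StepExecution stepExecution : lastExecution.getStepExecutions()) {
        ExecutionContext stepContext = stepExecution.getExecutionContext(); // step-level output lives here
    }
}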
Why not put all these jobs into one main job and use JobSteps to integrate them? This way, already completed subjobs will be treated as completed steps, which will not be started again. Moreover, all information is available in the job/step contexts, even if you restart.
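A rough sketch of that wiring with the Java-config builders (mainJob, jobA, jobB, and the factories are placeholders for your own beans):

// Wrap each existing job in a JobStep so the main job drives the whole chain.
Step stepA = stepBuilderFactory.get("stepA").job(jobA).launcher(jobLauncher).build();
Step stepB = stepBuilderFactory.get("stepB").job(jobB).launcher(jobLauncher).build();

Job mainJob = jobBuilderFactory.get("mainJob")
        .start(stepA)   // steps already COMPLETED are skipped on restart
        .next(stepB)
        .build();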
Another way would be to save all needed parameters and information into a file and use this to start the next job, instead of being dependent on the JobExecution info. Your last step could simply be a tasklet that writes an appropriate property file.
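A minimal sketch of such a final tasklet (the file location and keys are made up for illustration):

// Final step: dump whatever the next job needs into a plain properties file.
public class WritePropertiesTasklet implements Tasklet {
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        JobExecution jobExecution = chunkContext.getStepContext().getStepExecution().getJobExecution();
        Properties props = new Properties();
        // copy the values the next job needs out of the job's execution context
        props.setProperty("output.file", String.valueOf(jobExecution.getExecutionContext().get("output.file")));
        try (Writer out = new FileWriter("/tmp/job-handover.properties")) { // assumed location
            props.store(out, "handover data for the next job");
        }
        return RepeatStatus.FINISHED;
    }
}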
I know there's a common pattern to stop a job when certain exceptions are thrown. But we allow our users to stop any job at any time.
I have a number of microservices, each running a different batch job. In the front end, I have a controller method that looks up all running jobs, gets the execution ID, and then uses JobOperator to issue a stop command. But execution appears to continue.
jobOperator.stop(Long.parseLong(jobExecId));
All of the examples I've seen issue just this command and update the JobRepository, which I do:
jobExecution.setEndTime(new Date());
jobExecution.setStatus(BatchStatus.ABANDONED);
jobExecution.setExitStatus(ExitStatus.STOPPED);
jobRepository.update(jobExecution);
Is there something more I should be doing?
I suppose calling Thread.currentThread().interrupt() should be the right way to proceed.
Spring Batch will intercept the interruption in ThreadStepInterruptionPolicy and stop the job.
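If the step in question is a custom tasklet, another option (a sketch, assuming a TaskletStep) is to implement Spring Batch's StoppableTasklet, whose stop() callback JobOperator.stop() invokes directly:

public class LongRunningTasklet implements StoppableTasklet {

    private volatile boolean stopped;

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        while (!stopped && hasMoreWork()) { // hasMoreWork()/doUnitOfWork() are placeholders
            doUnitOfWork();
        }
        return RepeatStatus.FINISHED;
    }

    @Override
    public void stop() {
        stopped = true; // called by JobOperator.stop(executionId)
    }
}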
Here is what I am trying to do; I am not sure if it is possible:
Autosys gets File1:10pm and starts DataStage Job1.1:10pm.
Job1.1:10pm is still running
Autosys gets File1:20pm; it needs to start the same Job1 but run it as Job1.1:20pm, even though Job1.1:10pm is still running, and not wait for it to finish but go ahead and run.
Can Autosys call the same DataStage job every time it gets a new file and run it with the new timestamp as the invocation ID, without waiting for the previous job to finish?
Thanks, y'all.
Yes, absolutely, this is possible. To enable different invocation IDs you have to check the "multiple instance" property in the job's properties. With this you allow multiple simultaneous runs of the job.
The invocation ID can also be passed as a parameter when calling the job from a sequence.
When your (multiple instance) job writes to a file, make sure that each filename is unique to avoid side effects from the simultaneous runs. This can be done by specifying DSJobInvocationId as part of the filename. Note that it is a parameter provided by DataStage and needs to be written exactly as shown, with the same upper- and lowercase letters. DataStage will then replace it with the content of your job invocation ID at runtime.
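For example (assuming the usual #parameter# reference syntax in the stage's file properties), a target filename such as output_#DSJobInvocationId#.txt would expand to a different name for each invocation above.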
I currently have a setup that creates a job and then collects some metrics about the tasks in the job. I want to do something similar, but by setting a job schedule instead. In particular, I want to set a job schedule that wakes up at a recurrence interval that I specify and runs the same code that I was running when creating a job. What's the best way to go about doing that?
It seems that there is a CloudJobSchedule that I could use to set up my job schedule, but this only lets me create, say, a job manager task and specify a few properties. How can I run external code on the jobs created by the job schedule?
It would also help to clarify how the CloudJobSchedule works. Specifically, after I commit my job schedule, what happens programmatically? Does the code just move on sequentially and run the rest of the program? In that case, does it make sense to get a reference to the last job created by the job schedule and run code on the job returned?
You'll want to create a CloudJobSchedule. You can specify the recurrence in the Schedule.
If you only need to run a single task per recurrence, your job manager task can simply be the task you need to run. If you need to run multiple tasks per job recurrence, your job manager needs to have logic to submit tasks to Batch and monitor for completion (if necessary).
When you submit a job schedule to Batch, your client-side code will continue running. The behavior is no different than if you were submitting a regular job. You can retrieve the last job run via JobScheduleExecutionInformation and its RecentJob property.
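A rough sketch with the Azure Batch Java SDK (com.microsoft.azure.batch); the exact model and method names vary by SDK version, so treat this as an outline rather than copy-paste code:

// Outline: schedule a recurring job whose job manager task runs your code.
Schedule schedule = new Schedule()
        .withRecurrenceInterval(Period.minutes(10));          // Joda-Time Period

JobManagerTask jobManager = new JobManagerTask()
        .withId("metrics-collector")                          // placeholder id
        .withCommandLine("collect-metrics.sh");               // placeholder command

JobSpecification jobSpec = new JobSpecification()
        .withPoolInfo(new PoolInformation().withPoolId("myPool"))
        .withJobManagerTask(jobManager);

batchClient.jobScheduleOperations()
        .createJobSchedule("metrics-schedule", schedule, jobSpec);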
I am maintaining a legacy application written using Spring Batch and need to tweak it to never lose data.
I have to read from various web services (one for each step) and then write to a remote database. Things go bad when the connection with the DB drops, because all items read from the web service are discarded (I can't read the same item twice), and the data is lost because it cannot be written.
I need to set up Spring Batch to keep the data already read in one step and retry the write operation the next time the step runs. The same step must not read more data until the write operation has completed successfully.
When it is unable to write, the step should keep the read data and pass execution to the next step; after a while, when it is time for the failed step to run again, it should not read another item, but retry the failed write operation instead.
The batch application should run in an infinite loop, and each step should gather data from a different source. Failed write operations should be momentarily skipped (keeping the read data) so as not to delay the other steps, but should resume from the write operation the next time they are called.
I have been researching various web sources besides the official docs, but Spring Batch does not have the most intuitive documentation I have come across.
Can this be achieved? If yes, how?
You can write the data you need to persist, in case the job fails, to the batch step's ExecutionContext, and restart the job again with this data:
Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and JobExecution, and transaction-related data such as commit and rollback count and start and end times. Additionally, each step execution will contain an ExecutionContext, which contains any data a developer needs persisted across batch runs, such as statistics or state information needed to restart.
More from: http://static.springsource.org/spring-batch/reference/html/domain.html#domainStepExecution
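A small sketch of that idea (the key name and Item type are made up): stash the unwritten items in the step's ExecutionContext in afterStep, and restore them in beforeStep on the next run. Note that anything you put in the context must be serializable.

public class KeepUnwrittenItemsListener implements StepExecutionListener {

    private List<Item> pending = new ArrayList<>(); // Item is your domain type

    @Override
    public void beforeStep(StepExecution stepExecution) {
        ExecutionContext ctx = stepExecution.getExecutionContext();
        if (ctx.containsKey("pending.items")) {
            pending = (List<Item>) ctx.get("pending.items"); // restore state on restart
        }
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // persist whatever could not be written so the next run can retry it
        stepExecution.getExecutionContext().put("pending.items", pending);
        return stepExecution.getExitStatus();
    }
}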
I do not know if this will be ok with you, but here are my thoughts on your configuration.
Since you have two remote sources that are open to failure, let us partition the overall system into two jobs (not two steps):
JOB A
Step 1: Tasklet
Check a shared folder for files. If files exist, do not proceed to the next step. This will make more sense in the description of JOB B; a sketch of this guard follows the JOB A description.
Step 2: Webservice to files
Read from your web service and write the results to flat files in the shared folder. Since you would be using flat files for the output, you solve your "all items read from the web service are discarded and the data is lost because it cannot be written" problem.
Use Quartz or an equivalent scheduler for this job.
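A sketch of the Step 1 guard tasklet (the folder location and status name are made up); the job flow can then end early on FILES_PENDING and proceed to the web-service step otherwise:

// If files are still pending, expose a custom exit status the job flow can branch on.
public class FolderGuardTasklet implements Tasklet {

    private final Path sharedFolder = Paths.get("/data/shared"); // assumed location

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws IOException {
        try (Stream<Path> files = Files.list(sharedFolder)) {
            boolean pending = files.findAny().isPresent();
            contribution.setExitStatus(pending ? new ExitStatus("FILES_PENDING") : ExitStatus.COMPLETED);
        }
        return RepeatStatus.FINISHED;
    }
}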
JOB B
Poll the shared folder for generated files and create a JobLauncher request with the file (e.g. the file path as a JobParameter). The Spring Integration project may help with this polling; see the sketch after this job's description.
Step 1:
Read from the file, write the items to the remote DB, and move/delete the file if writing to the DB is successful.
No scheduling will be needed, since job launching originates from the polled files.
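A sketch of turning a polled file into a job launch (spring-batch-integration provides JobLaunchRequest; the transformer method and jobB reference below are hypothetical):

// Transformer used in a Spring Integration file-polling flow:
// each polled file becomes one run of JOB B, identified by its path.
public JobLaunchRequest toRequest(File file) {
    JobParameters parameters = new JobParametersBuilder()
            .addString("input.file.path", file.getAbsolutePath(), true) // identifying parameter
            .toJobParameters();
    return new JobLaunchRequest(jobB, parameters); // jobB is the configured Job bean
}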
Sample Execution
Time 0: No file in the shared folder
Time 1: Read from web service and write to shared folder
Time 2: Job B's file polling occurs and tries to write to the db.
If successful, the system continues to execute.
If not, when Job A tries to execute at its scheduled time, it will skip reading from the web service since files still exist in the shared folder. It will keep skipping until Job B consumes the files.
I did not want to go into implementation specifics but Spring Batch can handle all of these situations. Hope that this helps.
I need a clarification.
Is it possible for us to run multiple instances of a job at the same time?
Currently, we have a single instance of a job at any given time.
If it is possible, please let me know how to do it.
Yes, you can. Spring Batch distinguishes jobs based on their JobParameters, so if you always pass different JobParameters to the same job, you will have multiple instances of the same job running.
A simple way is just to add a UUID parameter to each request to start a job.
Example:
final JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
// 'true' marks the parameter as identifying, so every run creates a new JobInstance
jobParametersBuilder.addString("instance_id", UUID.randomUUID().toString(), true);
jobLauncher.run(job, jobParametersBuilder.toJobParameters());
The boolean 'true' at the end signals to Spring Batch to use that parameter as part of the 'identity' of the job instance, so you will always get a new instance with each 'run' of the job.
Yes, you can very much run tasks in parallel, as also documented here.
But there are certain things to be considered:
Does your application logic need parallel execution? If you are going to run steps in parallel, you have to take care to build your application logic so that the work done by parallel steps does not overlap (unless that is the intention of your application). A minimal split-flow sketch follows.
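A minimal sketch of running two steps in parallel with a split flow (step1/step2 and the builder factories are assumed to be defined elsewhere):

// Each sub-flow runs on its own thread from the given TaskExecutor.
Flow flow1 = new FlowBuilder<SimpleFlow>("flow1").start(step1).build();
Flow flow2 = new FlowBuilder<SimpleFlow>("flow2").start(step2).build();

Flow splitFlow = new FlowBuilder<SimpleFlow>("splitFlow")
        .split(new SimpleAsyncTaskExecutor())
        .add(flow1, flow2)
        .build();

Job parallelJob = jobBuilderFactory.get("parallelJob")
        .start(splitFlow)
        .end()
        .build();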
Yes, it's completely possible to have multiple instances (or executions) of a job run concurrently.