Spring Batch JobOperator.restart starts a new job instance and doesn't resume the job from the last chunk, only from the last step

jobOperator.restart(jobExecutionId)
starts a new job instance and doesn't resume the job from the last chunk, only from the last step. I need to resume the job from the last chunk written.
My reader is a custom RestReader that first counts the total number of items to process and then reads exactly that number from the API. I'm using the @StepScope annotation because I need custom variables in the custom reader.
Spring Batch's restart functionality is not working when using @StepScope.
Is it possible to resume the job from the last chunk written, or is the problem my custom reader?

Your RestReader must implement ItemStream. This is the contract that stateful item readers should implement. The ItemStream#update method will be called at chunk boundaries to save any contextual data required to restart from the last checkpoint in case of failure.
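
For illustration, here is a minimal sketch of such a reader (the counting and fetching methods are placeholders for your actual REST calls):

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemStreamReader;

// Hypothetical sketch: a REST-backed reader that checkpoints its position.
public class RestReader implements ItemStreamReader<String> {

    private static final String CURRENT_INDEX_KEY = "restReader.currentIndex";

    private int currentIndex = 0;
    private int totalItems;

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        totalItems = countItemsFromApi(); // your "count first" call
        if (executionContext.containsKey(CURRENT_INDEX_KEY)) {
            // Restart case: resume from the last committed chunk boundary.
            currentIndex = executionContext.getInt(CURRENT_INDEX_KEY);
        }
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Called at each chunk boundary: persist the checkpoint.
        executionContext.putInt(CURRENT_INDEX_KEY, currentIndex);
    }

    @Override
    public void close() throws ItemStreamException {
        // Release any resources held by the reader (none in this sketch).
    }

    @Override
    public String read() {
        if (currentIndex >= totalItems) {
            return null; // signals end of data
        }
        return fetchItemFromApi(currentIndex++); // hypothetical API call
    }

    private int countItemsFromApi() { /* call the counting endpoint */ return 0; }

    private String fetchItemFromApi(int index) { /* call the item endpoint */ return null; }
}

@StepScope does not prevent this from working: as long as the bean's declared type exposes ItemStream (which ItemStreamReader does), the step will invoke open/update/close at the right times.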

Related

Scheduler processing using Spring Batch

We have a requirement to process millions of records using Spring Batch. We have planned to read the db using JdbcPagingItemReaderBuilder, process in chunks, and write to a Kafka queue. The active consumers of the queue will process the chunks of data and update the db.
The consumer task is to iterate over every item in the chunk and invoke external APIs.
In case the external system is down or does not respond with a success response, there should be at least 3 retries, and considering that each task in the chunk has to do this, what would be the ideal approach?
Another use case to consider: what happens when the job is processing and the system goes down, say after the job has already processed 10000 records and the remaining records are yet to be processed? After the restart, how do we make sure the execution doesn't restart the entire process from the beginning and resumes from the point of failure?
Spring Batch creates the following tables. You can use them to check the status of your job and customize your scheduler to behave in a way you see fit.
I'd use the step execution id in BATCH_STEP_EXECUTION to validate the status that's set and then retry based on that status, or something along those lines.
BATCH_JOB_EXECUTION
BATCH_JOB_EXECUTION_CONTEXT
BATCH_JOB_EXECUTION_PARAMS
BATCH_JOB_INSTANCE
BATCH_STEP_EXECUTION
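
If you would rather check the status from Java than query the tables directly, the same metadata is exposed through the JobExplorer API. A minimal sketch:

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;

public class JobStatusChecker {

    private final JobExplorer jobExplorer;

    public JobStatusChecker(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    // Returns true if the given execution has a failed step that should be retried.
    public boolean hasFailedStep(long jobExecutionId) {
        JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
        if (jobExecution == null) {
            return false;
        }
        for (StepExecution stepExecution : jobExecution.getStepExecutions()) {
            if (stepExecution.getStatus() == BatchStatus.FAILED) {
                return true;
            }
        }
        return false;
    }
}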

Put a deadline in Spring Batch

In a Java program, I need to read a database, take that data, make some REST calls, and write the data to a txt file (which has a header, data, and a footer).
The job starts Saturday night and needs to finish before Sunday morning. If it hasn't finished, we need to close the file (writing the footer first) and start a new one.
I started looking at tools to do this job. Spring Batch seems interesting.
I can split the job into reader, processor, writer.
Is there something to check whether a job has reached its deadline?
The job will be launched with Jenkins.
I guess you must use a scheduler for that.
You must read the end date from the DB every minute or so, and
if (endDate.compareTo(new Date()) <= 0)
then the scheduler's job must stop the batch job.
You can use Quartz.
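
A sketch of that idea using Spring's @Scheduled and JobOperator#stop (the deadline lookup and the tracking of the running execution id are placeholders you would need to fill in):

import java.util.Date;
import org.springframework.batch.core.launch.JobOperator;
import org.springframework.scheduling.annotation.Scheduled;

public class DeadlineWatcher {

    private final JobOperator jobOperator;

    public DeadlineWatcher(JobOperator jobOperator) {
        this.jobOperator = jobOperator;
    }

    @Scheduled(fixedRate = 60_000) // check every minute, as suggested above
    public void stopJobIfPastDeadline() throws Exception {
        Date endDate = loadEndDateFromDb();      // hypothetical DB lookup
        long executionId = currentExecutionId(); // hypothetical: track the running execution
        if (endDate.compareTo(new Date()) <= 0) {
            // Sends a stop signal; the step finishes its current chunk and then stops.
            jobOperator.stop(executionId);
        }
    }

    private Date loadEndDateFromDb() { /* read the deadline from the DB */ return new Date(); }

    private long currentExecutionId() { /* look up the running JobExecution */ return 0L; }
}

JobOperator#stop only sends a stop signal, which a chunk-oriented step checks at chunk boundaries, so the job stops gracefully; a StepExecutionListener could then write the footer and close the file.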

How to restart a Spring Batch chunk-oriented step at the correct position?

Hello everyone,
I have a Spring Batch job where one of its steps is chunk-oriented with a chunk size of 1.
Its process is really simple:
- itemReader reads a file and returns a line
- itemProcessor transforms this line into an object
- itemWriter persists this object in the db
Now I have written a unit test where I voluntarily set an unparseable value to fail the job at the second iteration of this chunk, so it reads 1 line and persists it.
Then I use the jobId to restart the job, but I can see in the log that the step restarts from the beginning and persists the first line a second time.
Do I need to skip it myself in a before-step, or do I need to set a specific annotation to restart from the correct chunk iteration?
Regards, Mathieu
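
As in the first question above, restarting at the correct position depends on the reader saving its state through ItemStream. For a file-based step, the stock FlatFileItemReader already checkpoints the current line number, provided saveState is enabled (the default) and the reader has a name. A sketch, with the file path, field names, and target type as assumptions:

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

// Declared inside a @Configuration class. The name keys the line-number
// checkpoint in the step's ExecutionContext; saveState(true) is the default,
// spelled out here for clarity. MyLine and the field names are hypothetical.
@Bean
public FlatFileItemReader<MyLine> itemReader() {
    return new FlatFileItemReaderBuilder<MyLine>()
            .name("myLineReader")
            .resource(new FileSystemResource("input/data.csv"))
            .saveState(true)
            .delimited()
            .names("field1", "field2")
            .targetType(MyLine.class)
            .build();
}

On restart of a failed execution, such a reader reopens the file and skips ahead to the position stored at the last successful chunk boundary, which is exactly the behaviour asked for here.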

In Spring Batch, how to mark a record as skipped (without retry) during the writing phase

Spring Batch has a facility to provide a declarative skip policy (i.e. skippable-exception-classes) to state that a particular record needs to be skipped during batch processing.
This is quite straightforward in the case of ItemReader and ItemProcessor (as they operate on a record-by-record basis).
However, in the case of ItemWriter, when the writing of a record fails (because of a DB constraint violation), I want to skip that record and let the other records go through.
As far as I have researched, I can implement this in two ways:
1) Throw the skippable exception, and Spring Batch will start a retry operation with one item per batch, so if the original batch size is 1000, the batch will call the writer (and the processor, if it's transactional) 1000 times (once for each record) and record the skipCount for each item that fails with the skip exception (which is most probably the same item that had failed in the normal operation).
2) The ItemWriter catches the SQLException and resumes processing the next record until the end of the items list.
The 2nd approach has the problem of losing statistics about how many records did not go through (i.e. skipped records): the batch will record that all items were successfully written and hence update the write count with an improper value.
The 1st approach is a little tricky in my use case as it involves re-execution of all the items (on the DB side we have complex SPs + triggers) and therefore takes unnecessarily more time.
I am looking for some legitimate alternative to retry that just records the skipped record count during the writing phase.
If there is none, I will go for the 1st option.
Thanks!
The commit-interval specifies after how many items the writer's transaction is committed:
<chunk ... commit-interval="10"/>
Since you want to skip items that fail while being persisted to the DB, you need the commit-interval to be 1 in order to actually persist the good items and not roll them back along with a bad one.
Assuming the reader sends only one item to the processor (and not a list of 1000), the reader, processor, and writer get executed in order for each item. In this case option 2) is not useful, as the writer always receives only one item.
You can control how the skip count is incremented by calling StepContribution#incrementWriteSkipCount and the other increment*Count methods from this class.
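
If you do go with option 2), here is a sketch of how the writer could keep the skip statistics honest (assuming Spring Batch 4's List-based write signature; the item type, the DAO call, and the exact exception are placeholders):

import java.util.List;

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ItemWriter;
import org.springframework.dao.DataIntegrityViolationException;

public class SkipCountingWriter implements ItemWriter<MyRecord> {

    private StepExecution stepExecution;

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public void write(List<? extends MyRecord> items) {
        for (MyRecord item : items) {
            try {
                insertIntoDb(item); // hypothetical DAO call
            } catch (DataIntegrityViolationException e) {
                // Keep the metadata honest: count this item as a write-skip
                // instead of silently swallowing the failure.
                stepExecution.setWriteSkipCount(stepExecution.getWriteSkipCount() + 1);
            }
        }
    }

    private void insertIntoDb(MyRecord item) { /* persist one record */ }
}

Depending on your version, you may need to register the writer as a listener explicitly for @BeforeStep to be picked up.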

Retry failed writing operations without delaying other steps in a Spring Batch application

I am maintaining a legacy application written using Spring Batch and need to tweak it to never lose data.
I have to read from various webservices (one for each step) and then write to a remote database. Things go bad when the connection to the DB drops, because all items read from the webservice are discarded (I can't read the same item twice), and the data is lost because it cannot be written.
I need to set up Spring Batch so that one step keeps the data it has already read and retries the writing operation the next time the step runs. The same step must not read more data until the write operation has concluded successfully.
When it cannot write, the step should keep the read data and pass execution to the next step; after a while, when it's time for the failed step to run again, it should not read another item, but retry the failed writing operation instead.
The batch application should run in an infinite loop, and each step should gather data from a different source. Failed writing operations should be momentarily skipped (keeping the read data) so as not to delay other steps, but should resume from the write operation the next time they are called.
I have been researching various web sources aside from the official docs, but Spring Batch doesn't have the most intuitive docs I have come across.
Can this be achieved? If yes, how?
You can write the data that needs to survive a failure to the batch step's ExecutionContext, and restart the job with this data:
Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and JobExecution, and transaction related data such as commit and rollback count and start and end times. Additionally, each step execution will contain an ExecutionContext, which contains any data a developer needs persisted across batch runs, such as statistics or state information needed to restart.
More from: http://static.springsource.org/spring-batch/reference/html/domain.html#domainStepExecution
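
A minimal sketch of that idea, parking unwritten items in the step's ExecutionContext so that the next execution can retry them first (the key name and item type are assumptions; stored values must be serializable):

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.item.ExecutionContext;

// Hypothetical listener: on a failed write, park the pending items in the
// ExecutionContext; on the next run, pick them up before reading new data.
public class PendingItemsListener implements StepExecutionListener {

    private final List<String> pendingItems = new ArrayList<>();

    @Override
    @SuppressWarnings("unchecked")
    public void beforeStep(StepExecution stepExecution) {
        ExecutionContext context = stepExecution.getExecutionContext();
        if (context.containsKey("pending.items")) {
            pendingItems.addAll((List<String>) context.get("pending.items"));
        }
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Persist whatever is still unwritten so the next execution can retry it.
        stepExecution.getExecutionContext().put("pending.items", new ArrayList<>(pendingItems));
        return stepExecution.getExitStatus();
    }
}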
I do not know if this will be OK with you, but here are my thoughts on your configuration.
Since you have two remote sources that are open to failure, let us partition the overall system into two jobs (not two steps).
JOB A
Step 1: Tasklet
Check a shared folder for files. If files exist, do not proceed to the next step. This will make more sense after reading about JOB B.
Step 2: Webservice to files
Read from your web service and write the results to flat files in the shared folder. Since you would be using flat files for the output, you solve your "all items read from the webservice are discarded, and the data is lost because it cannot be written" problem.
Use Quartz or equivalent for the scheduling of this job.
JOB B
Poll the shared folder for generated files and launch a job for each file (the file location as a job parameter). The Spring Integration project may help with this polling.
Step 1:
Read from the file, write the records to the remote db, and move/delete the file if writing to the db is successful.
No scheduling is needed, since job launching originates from the polled files.
Sample Execution
Time 0: No file in the shared folder.
Time 1: Read from the web service and write to the shared folder.
Time 2: Job B's file polling occurs and tries to write to the db.
If successful, the system continues to execute.
If not, when Job A tries to execute at its scheduled time, it will skip reading from the web service since files still exist in the shared folder. It will keep skipping until Job B consumes the files.
I did not want to go into implementation specifics, but Spring Batch can handle all of these situations. Hope this helps.
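
For JOB A's Step 1, the "do not proceed if files exist" gate could look like the following tasklet sketch (the folder location and file pattern are assumptions):

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// Hypothetical tasklet for JOB A, Step 1: look for leftover files and expose
// the result as an exit status the job flow can route on.
public class SharedFolderGateTasklet implements Tasklet {

    private final Path sharedFolder = Paths.get("/data/shared"); // assumed location

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sharedFolder, "*.csv")) {
            if (files.iterator().hasNext()) {
                // Files still pending: signal the flow to skip the webservice step.
                contribution.setExitStatus(new ExitStatus("FILES_PENDING"));
            }
        }
        return RepeatStatus.FINISHED;
    }
}

The job flow would then route on the FILES_PENDING exit status, ending the execution early instead of continuing to the webservice step.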