Say I have 1000 records being read by the reader and my chunk size is 50. In my processor, after some business logic, I delete data from a table by calling a Java stored procedure, and if the deletion fails I increment a failed-deletion count. Finally, when control reaches the writer, if the failed-deletion count is > 0 I want to roll back all the deletions for that particular chunk, without affecting the processing of the other chunks. Can someone please help with this? If more information is needed, please let me know.
I have a multi-instance application and each instance is multi-threaded.
To make each thread process only rows not already fetched by another thread, I'm thinking of using pessimistic locks combined with SKIP LOCKED.
My database is PostgreSQL 11 and I use Spring Batch.
For the Spring Batch part I use a classic chunk step (reader, processor, writer). The reader is a JdbcPagingItemReader.
However, I don't see how to use a pessimistic lock (SELECT ... FOR UPDATE) together with SKIP LOCKED from a JdbcPagingItemReader, and I can't find a tutorial on the net that explains simply how this is done.
Any help would be welcome.
Thank you
I have approached a similar problem with a different pattern.
Please refer to
https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html#remoteChunking
Here you need to break the job into two parts:
Master
The master picks the records to be processed from the DB and sends each chunk as a message to the task-queue. It then waits for acknowledgements on a separate ack-queue; once it has received all acknowledgements it moves on to the next step.
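In Java config the master side looks roughly like the sketch below. It assumes Spring Batch Integration 4.3+ (@EnableBatchIntegration and RemoteChunkingManagerStepBuilderFactory; in 4.1/4.2 the equivalent class is RemoteChunkingMasterStepBuilderFactory), a hypothetical MyRecord item type, and that the requests/replies channels are bridged to the task-queue and ack-queue of your broker (the broker adapters are not shown).

import org.springframework.batch.core.step.tasklet.TaskletStep;
import org.springframework.batch.integration.chunk.RemoteChunkingManagerStepBuilderFactory;
import org.springframework.batch.integration.config.annotation.EnableBatchIntegration;
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.channel.QueueChannel;

@Configuration
@EnableBatchIntegration
public class ManagerConfiguration {

    @Autowired
    private RemoteChunkingManagerStepBuilderFactory managerStepBuilderFactory;

    // "requests" and "replies" are assumed to be channel beans bridged to the
    // broker's task-queue and ack-queue (e.g. via Spring Integration AMQP/JMS
    // adapters), not shown here. MyRecord is a hypothetical item type.
    @Bean
    public TaskletStep managerStep(ItemReader<MyRecord> reader,
                                   DirectChannel requests,
                                   QueueChannel replies) {
        return this.managerStepBuilderFactory.get("managerStep")
                .chunk(50)                 // how many records go into each message
                .reader(reader)            // only the reader runs on the master
                .outputChannel(requests)   // chunks are sent to the task-queue
                .inputChannel(replies)     // acknowledgements come back on the ack-queue
                .build();
    }
}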
Slave
A slave receives a message, processes it, and sends an acknowledgement to the ack-queue.
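The worker side, under the same assumptions (same hypothetical MyRecord type and broker-backed channels), is roughly:

import org.springframework.batch.integration.chunk.RemoteChunkingWorkerBuilder;
import org.springframework.batch.integration.config.annotation.EnableBatchIntegration;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.dsl.IntegrationFlow;

@Configuration
@EnableBatchIntegration
public class WorkerConfiguration {

    @Autowired
    private RemoteChunkingWorkerBuilder<MyRecord, MyRecord> workerBuilder;

    // processor and writer do the actual work; the channels are the same
    // broker-backed task-queue and ack-queue as on the master side.
    // MyRecord is a hypothetical item type.
    @Bean
    public IntegrationFlow workerFlow(ItemProcessor<MyRecord, MyRecord> processor,
                                      ItemWriter<MyRecord> writer,
                                      DirectChannel requests,
                                      DirectChannel replies) {
        return this.workerBuilder
                .itemProcessor(processor)  // business logic per item
                .itemWriter(writer)        // persists the processed items
                .inputChannel(requests)    // chunk requests from the master
                .outputChannel(replies)    // acknowledgement sent back on the ack-queue
                .build();
    }
}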
I am trying to implement a Spring Batch job in which processing a single record requires 2-3 DB calls, which slows down the processing (the input is 1 million records). With plain chunk-based processing each record is enriched separately, so performance is poor. I therefore need to process 1000 records in one go, as a bulk operation, which would reduce the number of DB calls and improve performance. But my question is: if I implement a Tasklet I lose restartability and the retry/skip features, and if I implement it using an AggregateInputReader I am not sure what the impact on restartability and transaction handling would be.
As per the thread below, an AggregateReader should work, but I am not sure about its impact on transaction handling and restartability in case of failure:
Spring batch: processing multiple record at once
The first extension point in the chunk-oriented processing model that gives you access to the list of items to be written is the ItemWriteListener#beforeWrite(List items). So if you do not want to enrich items one at a time in an ItemProcessor, you can use that listener to do the enrichment for the entire chunk at once.
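For example, here is a rough sketch of such a listener (Spring Batch 4.x listener signatures; the Order and OrderDetails item types and the EnrichmentDao that fetches the extra data for a whole chunk in a single query are hypothetical):

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.springframework.batch.core.ItemWriteListener;

public class ChunkEnrichmentListener implements ItemWriteListener<Order> {

    private final EnrichmentDao enrichmentDao; // hypothetical DAO with a bulk lookup

    public ChunkEnrichmentListener(EnrichmentDao enrichmentDao) {
        this.enrichmentDao = enrichmentDao;
    }

    @Override
    public void beforeWrite(List<? extends Order> items) {
        // one bulk DB call for the whole chunk (e.g. 1000 items) instead of 2-3 calls per item
        List<Long> ids = items.stream().map(Order::getId).collect(Collectors.toList());
        Map<Long, OrderDetails> detailsById = enrichmentDao.findDetailsByIds(ids);
        items.forEach(order -> order.setDetails(detailsById.get(order.getId())));
    }

    @Override
    public void afterWrite(List<? extends Order> items) {
        // nothing to do
    }

    @Override
    public void onWriteError(Exception exception, List<? extends Order> items) {
        // nothing to do
    }
}

Register it on the step with .listener(...) so that beforeWrite runs once per chunk, right before the writer is called.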
I have a flow of Reader -> Processor -> writer
Every 50 million records, the writer writes the data into a file and zips it.
The problem is that once the reader has finished, the writer still "holds" many records which are not written, since they never reached the 50 M record threshold.
Any advice on how to implement this so that the data is written to many files with 50 M records each, plus a single file with the remaining records?
If you use a MultiResourceItemWriter, you can configure the item count per resource to dictate how this should work. It can be configured to roll over to a new file at your specific threshold, and if there is a remainder in the final chunk, that will also be written out. You can read more about this useful delegate in the documentation here: https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/MultiResourceItemWriter.html
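A rough Java-config sketch (the MyRecord type and the FlatFileItemWriter delegate bean are assumptions, and the zipping step is not shown):

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.MultiResourceItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

// MyRecord and the delegate bean are assumptions for the sake of the example.
@Bean
public MultiResourceItemWriter<MyRecord> multiFileWriter(FlatFileItemWriter<MyRecord> delegate) {
    MultiResourceItemWriter<MyRecord> writer = new MultiResourceItemWriter<>();
    writer.setDelegate(delegate);                                   // the writer that actually formats the lines
    writer.setResource(new FileSystemResource("output/records"));   // base name for the generated files
    writer.setItemCountLimitPerResource(50_000_000);                // 50 M records per file
    writer.setResourceSuffixCreator(index -> "." + index + ".csv"); // records.1.csv, records.2.csv, ...
    return writer;
}

Note that the decision to roll over to a new resource is only taken at chunk boundaries, so a file may exceed the limit by up to one chunk of items.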
I have some confusion regarding @StepScope in chunk-oriented processing:
I have, let's say, 2 million records to read. So I want to run my Spring Batch application in chunks: read, process and write, say, 2000 items, then go and read the 2001st through 4000th items, process and write them, and so on.
The question is: if I don't use @StepScope, will the batch know that it has to read the 2001st item next, and not re-read what it has already read?
Yes, even without using @StepScope the reader will read the next chunk and not re-read the same chunk again.
Step scope is actually required in order to use late binding of attributes from the job/step execution context. More details on this here: https://docs.spring.io/spring-batch/4.0.x/reference/html/step.html#late-binding
So if your reader does not need to access job parameters or attributes from the job/step execution context, it does not need to be step scoped and will still read data chunk by chunk. In sum, there is no relation between step scope and chunk-oriented processing.
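For example, here is a sketch of a reader that does need step scope, solely because it takes a value from the job parameters via late binding ("inputFile" is a hypothetical job parameter name):

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

// "inputFile" is a hypothetical job parameter; the bean must be step scoped so
// the SpEL expression can be resolved once a step execution exists.
@Bean
@StepScope
public FlatFileItemReader<FieldSet> reader(
        @Value("#{jobParameters['inputFile']}") String inputFile) {
    DefaultLineMapper<FieldSet> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(new DelimitedLineTokenizer());
    lineMapper.setFieldSetMapper(new PassThroughFieldSetMapper());

    FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<>();
    reader.setName("recordReader");
    reader.setResource(new FileSystemResource(inputFile)); // resolved at step start
    reader.setLineMapper(lineMapper);
    return reader;
}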
Spring Batch has a facility to declare a skip policy (i.e. skippable-exception-classes) stating that a particular record should be skipped during batch processing.
This is quite straightforward in the case of the ItemReader and ItemProcessor (as they operate on a record-by-record basis).
However, in the case of the ItemWriter, when writing a record fails (because of a DB constraint violation), I want to skip that record and let the other records go through.
As far as I have researched, I can implement this in two ways:
1) Throw the skippable exception, and Spring Batch will start a retry operation with one item per batch; so if the original batch size is 1000, the writer (and the processor, if it is transactional) will be called 1000 times (once for each record) and the skip count will be recorded for each item that fails with the skippable exception (which is most probably the same item that failed in the normal run).
2) The ItemWriter catches the SQLException and resumes processing with the next record, until the end of the item list.
The 2nd approach has the problem of losing the statistics about how many records did not go through (i.e. skipped records): the batch records all the items as successfully written and hence updates the write count with an incorrect value.
The 1st approach is a little tricky in my use case, as it involves re-executing all the items (on the DB side we have complex SPs + triggers) and therefore takes unnecessarily more time.
I am looking for a legitimate alternative to retry, just to record the skipped record count during the write phase.
If none, I will go for the 1st option.
Thanks !
This specifies after how many executions of the writer the transaction is committed:
<chunk ... commit-interval="10"/>
Since you want to skip all the items that fail while being persisted to the DB, you need the commit-interval to be 1 in order to actually persist the good items and not have them rolled back along with a bad one.
Assuming the reader sends only one item to the processor (and not a list of 1000), the reader, processor and writer get executed in order for each item. In this case option 2) is not useful, as the writer always receives only one item.
You can control how the skip and write counts are incremented by calling StepContribution#incrementWriteCount and the other increment*Count methods from this class.
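If you do end up going with the 1st option, a minimal Java-config sketch of the declarative route looks roughly like this (the MyRecord type is hypothetical and stepBuilderFactory is assumed to be injected). The framework then maintains the write-skip count for you; the SkipListener is only a hook if you also want to log or persist each skipped record:

import org.springframework.batch.core.SkipListener;
import org.springframework.batch.core.Step;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.dao.DataIntegrityViolationException;

// MyRecord is a hypothetical item type; stepBuilderFactory is assumed to be an
// injected StepBuilderFactory field on the surrounding configuration class.
@Bean
public Step writeStep(ItemReader<MyRecord> reader, ItemWriter<MyRecord> writer) {
    return stepBuilderFactory.get("writeStep")
            .<MyRecord, MyRecord>chunk(1000)
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .skip(DataIntegrityViolationException.class) // DB constraint violations become skips
            .skipLimit(1000)
            .listener(new SkipListener<MyRecord, MyRecord>() {
                @Override
                public void onSkipInRead(Throwable t) { }
                @Override
                public void onSkipInProcess(MyRecord item, Throwable t) { }
                @Override
                public void onSkipInWrite(MyRecord item, Throwable t) {
                    // called once for each item skipped during the item-by-item scan of a failed chunk
                    System.err.println("Skipped " + item + ": " + t.getMessage());
                }
            })
            .build();
}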