I have a Spring Batch job with a standard reader, processor and writer.
I have a simple requirement:
1) Whatever records the reader reads should all be passed to the writer after being processed by the processor.
2) My reader reads records via an SQL query.
So if the reader reads 100 records, all of them should be passed to the writer at once.
3) If it reads 1000 records, all 1000 should be passed at once.
4) So in essence, the commit-interval is dynamic here, not fixed.
5) Is there any way we can achieve this?
EDIT:
To give more clarity: in Spring Batch, the commit-interval drives chunk-oriented processing.
E.g. if chunk-size = 10, the reader reads 10 records, passing them one by one to the processor, and at the commit-interval (count = 10) all 10 records are written by the writer.
Now what we want is a dynamic commit-interval: whatever the reader reads should all be passed to the writer at once.
This can be achieved using the chunk-completion-policy property.
<step id="XXXXX">
    <tasklet>
        <chunk reader="XXXReader"
               processor="XXXProcessor"
               writer="XXXWriter"
               chunk-completion-policy="defaultResultCompletionPolicy"/>
    </tasklet>
</step>

<bean id="defaultResultCompletionPolicy"
      class="org.springframework.batch.repeat.policy.DefaultResultCompletionPolicy"
      scope="step"/>
We can also write a custom chunk-completion policy; see this post:
Spring Batch custom completion policy for dynamic chunk size
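For reference, here is a minimal Java sketch of what such a custom policy could look like. The class name DynamicChunkCompletionPolicy and the constructor-supplied size are illustrative assumptions, not code from the linked post; the policy counts items per chunk and also closes the chunk when the reader signals the end of the input.

import org.springframework.batch.repeat.CompletionPolicy;
import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.batch.repeat.context.RepeatContextSupport;

public class DynamicChunkCompletionPolicy implements CompletionPolicy {

    private final int chunkSize;

    public DynamicChunkCompletionPolicy(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    @Override
    public RepeatContext start(RepeatContext parent) {
        // fresh context (and item counter) for every new chunk
        return new RepeatContextSupport(parent);
    }

    @Override
    public void update(RepeatContext context) {
        // called after each item is read: bump the per-chunk counter
        ((RepeatContextSupport) context).increment();
    }

    @Override
    public boolean isComplete(RepeatContext context) {
        // close the chunk once the runtime-decided size is reached
        return context.getStartedCount() >= chunkSize;
    }

    @Override
    public boolean isComplete(RepeatContext context, RepeatStatus result) {
        // also close the chunk when the reader signals end of input
        return result == null || !result.isContinuable() || isComplete(context);
    }
}

Such a policy would then be declared as a bean and referenced via chunk-completion-policy in place of defaultResultCompletionPolicy.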
Related
I have a use case where, for every input record (returned by the ItemReader), the processor produces millions of rows (a huge dataset) which need to be persisted in a database table.
I tried to handle all the inserts in the ItemWriter, but before the step even reaches the ItemWriter, I get an out-of-memory error. I have only one step in my job.
How do I handle the persistence of such a large output dataset in a Spring Batch step?
Note: local chunking is something I could not use here, as it is failing even for an input of just one record.
I have a flow of Reader -> Processor -> Writer.
Every 50 million records, the writer writes the data into a file and zips it.
The problem is that once the reader has finished, the writer still "holds" many records which are not written, since the 50 million record threshold was not reached.
Any advice on how to implement this so that the data is written to many files of 50 million records each, plus a single file with the remaining records?
If you use a MultiResourceItemWriter, you can use the chunk size to dictate how this should work. It can be configured to write at your specific threshold, and if there is a remainder in the final chunk, that will also be written out. You can read more about this useful delegate in the documentation here: https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/MultiResourceItemWriter.html
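For example, a rough Java sketch of such a setup might look like the following; the FlatFileItemWriter delegate, the /tmp/output/records path and the pass-through line aggregator are illustrative assumptions, not your actual configuration.

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.MultiResourceItemWriter;
import org.springframework.core.io.FileSystemResource;

public class MultiFileWriterConfig {

    public MultiResourceItemWriter<String> multiFileWriter() {
        // delegate that knows how to write one resource (file) at a time
        FlatFileItemWriter<String> delegate = new FlatFileItemWriter<>();
        delegate.setName("recordFileWriter");
        delegate.setLineAggregator(item -> item); // write each record as-is

        MultiResourceItemWriter<String> writer = new MultiResourceItemWriter<>();
        writer.setDelegate(delegate);
        // base resource; a numeric suffix (.1, .2, ...) is appended per file by default
        writer.setResource(new FileSystemResource("/tmp/output/records"));
        // roll over to a new file once 50M items have been written;
        // the smaller remainder after the reader finishes ends up in the last file
        writer.setItemCountLimitPerResource(50_000_000);
        return writer;
    }
}

The remainder of the final chunk simply lands in the last rolled-over file, which matches the "single file with the remaining records" requirement.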
I am new to Spring Batch. I have a requirement to read and process 500,000 lines from a text file to CSV. My item processor takes five minutes to process 100 lines, which will result in almost 2 days for processing and writing 500k lines.
How can I invoke the item reader and processor concurrently?
You can use "SimpleAsyncTaskExecutor" for parallel processing and use it in your spring application context as follows:
<bean id="taskExecutor"
class="org.springframework.core.task.SimpleAsyncTaskExecutor">
</bean>
Then you can reference this taskExecutor on the tasklet of the step in question:
<tasklet task-executor="taskExecutor">
    <chunk reader="deskReader" processor="deskProcessor"
           writer="deskWriter" commit-interval="1"/>
</tasklet>
Note that you need to define the ItemReader, ItemWriter and ItemProcessor classes as specified here.
Also, for parallel processing you can specify the throttle-limit, which controls how many threads run in parallel; it defaults to 4 if not specified.
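For completeness, here is a sketch of the equivalent Java-based step configuration, assuming Spring Batch 4.x (StepBuilderFactory) and the same illustrative bean names; the Desk item type is just a placeholder for your domain object.

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class ParallelStepConfig {

    // placeholder item type standing in for the actual domain object
    public static class Desk { }

    @Bean
    public Step parallelStep(StepBuilderFactory steps,
                             ItemReader<Desk> deskReader,
                             ItemProcessor<Desk, Desk> deskProcessor,
                             ItemWriter<Desk> deskWriter) {
        return steps.get("parallelStep")
                .<Desk, Desk>chunk(1)                        // commit-interval="1"
                .reader(deskReader)
                .processor(deskProcessor)
                .writer(deskWriter)
                .taskExecutor(new SimpleAsyncTaskExecutor()) // task-executor="taskExecutor"
                .throttleLimit(8)                            // defaults to 4 when omitted
                .build();
    }
}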
In my chunk processing, I read one value from a file, and in my processor I pass this value to the DB, which returns 4 records for that single value. I then return these 4 records to the writer, which writes them to the DB. I deliberately fail the job at the 3rd record returned for the value read from the file, but after the job fails, the 3 records already written are not rolled back in the DB. Why not?
How does the chunk maintain the transaction? Is it based on the read count and write count of the records?
How is the data read passed from the reader to the ItemProcessor in Spring Batch? Is there a queue into which the ItemReader's read method puts items and from which the ItemProcessor consumes them? I have to read 10 records at a time from a database and process 5 at a time in the ItemProcessor's process method. The ItemProcessor currently takes the records one by one, and I want to change it to 5 records at a time in the process method.
Every item that is returned from the read method of a reader will be forwarded to the processor as one item.
If you want to collect a group of items and pass them as a group to the processor, you need a reader that groups them.
You could implement something like a group-wrapper.
I explained such an approach in another answer: Spring Batch Processor
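For illustration, a minimal sketch of such a group-wrapper; the GroupingItemReader name and the group size of 5 are assumptions. It wraps the real reader and returns lists of up to 5 items, so the processor receives each List<T> as a single item.

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class GroupingItemReader<T> implements ItemReader<List<T>> {

    private final ItemReader<T> delegate;
    private final int groupSize;

    public GroupingItemReader(ItemReader<T> delegate, int groupSize) {
        this.delegate = delegate;
        this.groupSize = groupSize;
    }

    @Override
    public List<T> read() throws Exception {
        List<T> group = new ArrayList<>(groupSize);
        T item;
        // pull up to groupSize items from the underlying reader
        while (group.size() < groupSize && (item = delegate.read()) != null) {
            group.add(item);
        }
        // returning null signals end of input to the step
        return group.isEmpty() ? null : group;
    }
}

The step is then typed on List<T> rather than T. For restartability, the wrapper would additionally need to implement ItemStream and delegate open/update/close to the wrapped reader.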