I have a job that processes items in chunks (of 1000). The items are marshalled into a single JSON payload and posted to a remote service as a batch (all 1000 in one HTTP POST). Sometimes the remote service bogs down and the connection times out. I set up skip for this:
return steps.get("sendData")
        .<DataRecord, DataRecord>chunk(1000)
        .reader(reader())
        .processor(processor())
        .writer(writer())
        .faultTolerant()
        .skipLimit(10)
        .skip(IOException.class)
        .build();
If a chunk fails, Spring Batch retries the chunk one item at a time in order to find out which item caused the failure. In my case, though, no single item causes the failure: the entire chunk succeeds or fails as a unit and should be retried as a unit. In fact, dropping to single-item mode makes the remote service very angry, and it refuses to accept the data. We do not control the remote service.
What's my best way out of this? I was trying to see if I could disable single-item retry mode, but I don't even fully understand where this happens. Is there a custom SkipPolicy or something that I can implement? (the methods there didn't look that helpful)
Or is there some way to have the item reader read the 1000 records but pass it to the writer as a List (1000 input items => one output item)?
Let me walk through this in two parts. First I'll explain why it works the way it does, then I'll propose an option for addressing your issue.
Why Is Retry Item By Item
In your configuration, you've specified that it be fault tolerant. With that, when an exception is thrown in the ItemWriter, we don't know which item caused it so we don't have a way to skip/retry it. That's why, when we do begin the skip/retry logic, we go item by item.
How To Handle Retry By The Chunk
What this comes down to is you need to get to a chunk size of 1 in order for this to work. What that means is that instead of relying on Spring Batch for iterating over the items within a chunk for the ItemProcessor, you'll have to do it yourself. So your ItemReader would return a List<DataRecord> and your ItemProcessor would loop over that list. Your ItemWriter would take a List<List<DataRecord>>. I'd recommend creating a decorator for an ItemWriter that unwraps the outer list before passing it to the main ItemWriter.
This does remove the ability to do true skipping of a single item within that list but it sounds like that's ok for your use case.
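The unwrapping decorator described above could look roughly like the following. This is only a sketch: the `ItemWriter` interface is declared locally as a stand-in with the same shape as Spring Batch's pre-5.0 `ItemWriter` so that the example compiles on its own, and `UnwrappingItemWriter` and the size-recording delegate are hypothetical names, not part of the framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnwrappingWriterSketch {

    // Stand-in with the same shape as Spring Batch's pre-5.0 ItemWriter,
    // declared locally so the sketch compiles without the framework.
    interface ItemWriter<T> {
        void write(List<? extends T> items) throws Exception;
    }

    // Hypothetical decorator: each "item" it receives is itself a list,
    // and every inner list is handed to the real writer in a single call.
    static class UnwrappingItemWriter<T> implements ItemWriter<List<T>> {
        private final ItemWriter<T> delegate;

        UnwrappingItemWriter(ItemWriter<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(List<? extends List<T>> batches) throws Exception {
            // With a chunk size of 1 this loop runs once per chunk.
            for (List<T> batch : batches) {
                delegate.write(batch);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> postSizes = new ArrayList<>();

        // Hypothetical delegate: records how many records each "POST" carried.
        ItemWriter<String> poster = items -> postSizes.add(items.size());
        ItemWriter<List<String>> writer = new UnwrappingItemWriter<>(poster);

        // One outer item (a whole batch) arrives as a chunk of size 1.
        writer.write(Collections.singletonList(Arrays.asList("a", "b", "c")));

        System.out.println(postSizes); // prints [3]
    }
}
```

Because the framework then only ever sees one item per chunk, a failure in the writer leaves nothing to split up: the whole inner list is what gets retried or skipped.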
Related
I have a use case for a Spring Batch job that I could design in either of the following ways.
1) First Way:
Step 1 (chunk-oriented step): Read from the file -> filter, validate and transform each row into a DTO (data transfer object), storing any errors in the DTO itself -> check whether any of the DTOs has errors. If not, write to the database; if so, write to an error file.
However, the problem with this way is that I need the entire job inside one transaction boundary. If any chunk fails, I don't want to write to the DB, and I want to roll back all writes that have succeeded up to that point. This way forces me to write rollback logic for all successful writes whenever any chunk fails.
2) Second way
Step 1 (chunk-oriented step): Read items from the file -> filter, validate and transform each row into a DTO (data transfer object). Errors are stored in the DTO object itself.
Step 2 (Tasklet): Read the entire list (not chunks) of DTOs created in step 1 -> check whether any of the DTOs has errors populated in it. If so, abort the write to the DB and fail the job.
In the second way, I get all the benefits of chunk processing and scaling, and at the same time I have created a transaction boundary for the entire job.
PS: In both ways, the first step never fails; if something goes wrong, the errors are stored in the DTO object itself. Thus, a DTO object is always created.
My question: since I am new to Spring Batch, is the second way a good pattern to follow? And is there a way to share data between steps so that the entire list of DTOs is available to the second step (in the second way above)?
In my opinion, trying to process the entire file in a single transaction (ie a transaction at the job level) is not the way to go. I would proceed in two steps:
Step 1: process the input and write errors to the file
Step 2: this step is conditional on step 1. If no errors have been detected in step 1, then save the data to the db.
This approach does not require writing data to the database and rolling it back if there are errors (as option 1 in your description does). It only writes to the database when everything is ok.
Moreover, this approach does not require holding a list of items in memory as suggested by option 2, which could be inefficient in terms of memory usage and perform poorly if the file is big.
I am writing a Spring Batch job with a dynamic chunk size. I am using a skip policy that skips an item if any kind of exception occurs.
Now, when an exception occurs in the writer, it does the following:
When the item writer throws a skippable exception, the framework doesn't know which item threw it, so it reprocesses the items one at a time and passes each item to the writer on its own (except the failing item).
What I want is that, if any kind of exception occurs in the writer, all items (except the failing one) are passed to the writer again as a list and NOT one by one. I am creating an Excel file in the writer, so I need all the items in the chunk at the same time.
Is there any way to achieve this in spring batch?
Thank you !!
The only way to accomplish this is basically working around Spring Batch and having your item be the full list, and your chunk size be 1. Then it would be up to you to handle skip/etc logic.
I have a chunk-oriented job of the form "reader / processor / writer" called Job1. I have to execute database EJB operations after this job ends, if possible in the same transaction. I have other jobs (implemented as Tasklets) where I could do this in a simple manner: in those jobs I simply call these operations in the tasklet, before the execute method finishes. But in this case I don't know the right way to do it. As a first try I implemented it in a step listener (outside the transaction), but I can't keep that, because there is an architecture rule in my company not to call database operations in listeners. I could execute the operations in a tasklet in another step after this one, and I will go that way if I don't find a better one, but if possible I would like to execute them in the same transaction as Job1.
A couple of notes:
1. In a chunk-based step (reader/processor/writer), you'll typically have multiple transactions, one for each chunk.
2. Because of 1, you typically can't do a db call at the end of a step in the same transaction the items were processed in. They were processed in multiple transactions.
That being said, from what it sounds like, the best option would be to put your call in another step after the chunk based one.
We have a Spring Batch job that reads a file (flatfileitemreader), processes it and writes data to a queue (jmsitemwriter).
We have another job that reads the queue (jmsitemreader) and writes a file (flatfileitemwriter). It's an asynchronous process (between the executions of the two jobs, some manual process must be performed).
The flat file content doesn't have a line identifier, and we use a multi-threaded approach when reading the file ("throttle-limit"). So the queued messages do not keep the order they had in the flat file.
The problem is that we have to generate an output file that respects the original order. So line 33 of the incoming file should be line 33 of the outgoing file (it will have the contents of the original line, plus some data).
Does Spring Batch provide a "native" way to order the output, respecting the original read order? I say "native" because one solution we thought of is to create an additional step just to add a line number to the file and use it at the end, but I was wondering whether this would be reinventing the wheel...
We are using SB 3.0.3
TIA,
Bob
The use case you are describing asks that you maintain order across multiple jobs which is not supported. In theory (while not guaranteed) a single, single threaded step would retain the order of the input file.
Since you are reading in a multithreaded manner, there really isn't a good way to guarantee the order of the items as they are being read. The best you could do is synchronize the read method and add an id as the items are being read. If the bottleneck you're attempting to address with multithreading is in the processor or writer, this may not be a bad option.
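To make the "synchronize the read and add an id" idea concrete, here is a rough sketch. The `ItemReader` interface is a local stand-in with the same shape as Spring Batch's `ItemReader` (returning `null` at end of input), and `NumberingItemReader` and `NumberedLine` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class NumberingReaderSketch {

    // Stand-in with the shape of Spring Batch's ItemReader: null means end of input.
    interface ItemReader<T> {
        T read() throws Exception;
    }

    // A line together with its 1-based position in the input.
    static final class NumberedLine {
        final long lineNumber;
        final String content;

        NumberedLine(long lineNumber, String content) {
            this.lineNumber = lineNumber;
            this.content = content;
        }
    }

    // Hypothetical wrapper: reading and numbering happen under one lock,
    // so the number always matches the order the delegate returned the lines in.
    static class NumberingItemReader implements ItemReader<NumberedLine> {
        private final ItemReader<String> delegate;
        private long nextLineNumber = 1;

        NumberingItemReader(ItemReader<String> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized NumberedLine read() throws Exception {
            String line = delegate.read();
            return line == null ? null : new NumberedLine(nextLineNumber++, line);
        }
    }

    public static void main(String[] args) throws Exception {
        Iterator<String> lines = Arrays.asList("first", "second", "third", "fourth").iterator();
        NumberingItemReader reader =
                new NumberingItemReader(() -> lines.hasNext() ? lines.next() : null);

        // Two workers drain the reader concurrently, as a multi-threaded step would.
        List<NumberedLine> seen = Collections.synchronizedList(new ArrayList<>());
        Runnable worker = () -> {
            try {
                NumberedLine item;
                while ((item = reader.read()) != null) {
                    seen.add(item);
                }
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();

        // Whatever order the workers finished in, the id restores the input order.
        seen.sort(Comparator.comparingLong((NumberedLine n) -> n.lineNumber));
        for (NumberedLine n : seen) {
            System.out.println(n.lineNumber + ": " + n.content);
        }
    }
}
```

The id would then have to travel with the message through the queue so that the second job can sort on it before writing the output file.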
How would one implement a custom counter in an itemProcessor? A basic counter could work as defined here, but I need the counter not to include retried items or items in rolled-back chunks. Maybe there is an itemStream-like interface for itemProcessor that I haven't found yet. Using SpringBatch 2.1.7.
EDIT: The batch configuration can be found here (Using compositeProcessor). I've tried to implement the counter like follows (with no luck):
Setting an itemProcessListener for all processors and, in afterProcess(I,O), incrementing the counters for each processor in a cache (each processor has its own cache). Then using an itemWriteListener for all processors and, in afterWrite(), flushing the cache to the stepExecution. But this doesn't work, as the itemProcessListener does not work with the compositeProcessor's child processors the way I would have expected. Any other ideas?
EDIT: I've removed the compositeProcessor and tried to use only a single processor and found out that the itemProcessListener.afterProcess will be called too many times. I am guessing that this is related to the chunk-processing-mode vs. single-processing-mode. So some of the not retried items of a chunk will be re-processed. I've also tried to use RetryListener (to disable counter increments if a retry is in progress), but was not able to configure it. The open and close would not have been called on RetryListener.
I think the StepExecution domain object should fit your request.
Intercept it with a StepExecutionListener and access wanted properties.
I was able to solve this counter issue; however, I had to change the batch a bit. The first thing to remove was the compositeItemProcessor. Then I needed to update the SpringBatch version to 2.2.7 (to get ChunkListener.afterChunkError). Now, when the processor increments a counter, I cache the counter in the processor. I also use ChunkListener.afterChunkError to clear the cache, so that when items are reprocessed, the old counter values are cleared. When ItemWriteListener.afterWrite() occurs, I flush the cache to the stepExecutionContext. This way I was able to overcome the problem of retries incrementing the counters.
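The caching scheme can be sketched in plain Java roughly as follows. `ChunkSafeCounter` and its method names are hypothetical; in a real step, `increment` would be called from the processor, `onChunkError` from `ChunkListener.afterChunkError`, and `onAfterWrite` from `ItemWriteListener.afterWrite()`, while the `committed` map merely stands in for the step's execution context. The sketch is not thread-safe and assumes a single-threaded step.

```java
import java.util.HashMap;
import java.util.Map;

public class ChunkSafeCounterSketch {

    // Hypothetical helper illustrating the cache / clear / flush cycle.
    static class ChunkSafeCounter {
        // Counts for the chunk currently in flight; discarded on rollback.
        private final Map<String, Long> pending = new HashMap<>();
        // Stand-in for the step execution context: only written after a successful write.
        private final Map<String, Long> committed = new HashMap<>();

        // Called by the processor for every item it handles.
        void increment(String name) {
            pending.merge(name, 1L, Long::sum);
        }

        // Would be wired to ChunkListener.afterChunkError: forget the failed chunk's counts.
        void onChunkError() {
            pending.clear();
        }

        // Would be wired to ItemWriteListener.afterWrite: make the counts permanent.
        void onAfterWrite() {
            pending.forEach((name, count) -> committed.merge(name, count, Long::sum));
            pending.clear();
        }

        long committedCount(String name) {
            return committed.getOrDefault(name, 0L);
        }
    }

    public static void main(String[] args) {
        ChunkSafeCounter counter = new ChunkSafeCounter();

        // First attempt: three items are processed, then the chunk fails and rolls back.
        for (int i = 0; i < 3; i++) {
            counter.increment("processed");
        }
        counter.onChunkError();

        // The same three items are reprocessed and this time written successfully.
        for (int i = 0; i < 3; i++) {
            counter.increment("processed");
        }
        counter.onAfterWrite();

        System.out.println("processed = " + counter.committedCount("processed")); // prints processed = 3
    }
}
```

The point of the two maps is that nothing reaches the committed totals until a write has actually succeeded, so reprocessed items are counted once.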