I am writing a Spring Batch job with a dynamic chunk size. I am using a skip policy that skips an item if any kind of exception occurs.
Now, when an exception occurs in the writer, the following happens:
when the item writer throws a skippable exception, the framework doesn't know which item caused it, so it reprocesses the chunk one item at a time, passing a single item to the writer on each call (excluding the error item).
What I want is this: if any kind of exception occurs in the writer, pass all the items (except the failing one) back to the writer as a list, and NOT one by one. I am creating an Excel file in the writer, and there I need all the items in the chunk at the same time, not one by one.
Is there any way to achieve this in spring batch?
Thank you !!
The only way to accomplish this is basically to work around Spring Batch: make your item the full list and your chunk size 1. It would then be up to you to handle the skip logic, etc., yourself.
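A minimal sketch of that workaround, assuming a delegating reader (the class name `AggregatingReader` and the wiring are illustrative, not from the question): the delegate is your real reader, and each `read()` drains up to `groupSize` items into one list, so the "item" Spring Batch sees is the whole chunk.

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class AggregatingReader<T> implements ItemReader<List<T>> {

    private final ItemReader<T> delegate;
    private final int groupSize;

    public AggregatingReader(ItemReader<T> delegate, int groupSize) {
        this.delegate = delegate;
        this.groupSize = groupSize;
    }

    @Override
    public List<T> read() throws Exception {
        List<T> group = new ArrayList<>();
        T item;
        while (group.size() < groupSize && (item = delegate.read()) != null) {
            group.add(item);
        }
        return group.isEmpty() ? null : group; // null signals end of input
    }
}
```

The step would then be declared as `.<List<MyItem>, List<MyItem>>chunk(1)`, so the writer always receives whole lists and a skip applies to the entire list at once.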
Related
I have a use-case for which I could use spring batch job which I could design in following ways.
1) First Way:
Step 1 (chunk-oriented step): Read from the file -> filter, validate, and transform each row into a DTO (data transfer object), storing any errors in the DTO itself -> check whether any of the DTOs has errors; if not, write to the database, and if yes, write to an error file.
The problem with this approach is that I need the entire job inside a transaction boundary. If any chunk fails, I don't want to write to the DB and I want to roll back all the successful writes up to that point. This approach forces me to write rollback logic for all successful writes whenever any chunk fails.
2) Second way:
Step 1 (chunk-oriented step): Read items from the file -> filter, validate, and transform each row into a DTO (data transfer object), storing any errors in the DTO itself.
Step 2 (tasklet): Read the entire list (not chunks) of DTOs created in step 1 -> check whether any of the DTOs has errors populated. If yes, abort the write to the DB and fail the job.
With the second way, I get all the benefits of chunk processing and scaling, and at the same time I have a transaction boundary for the entire job.
PS: In both ways, the first step never fails; if there is a failure, the errors are stored in the DTO object itself, so the DTO object is always created.
The question is: since I am new to Spring Batch, is the second way a good pattern to follow? And is there a way to share data between steps so that the entire list of DTOs is available to the second step (in the second way above)?
In my opinion, trying to process the entire file in a single transaction (i.e. a transaction at the job level) is not the way to go. I would proceed in two steps:
Step 1: process the input and write errors to a file.
Step 2: this step is conditioned on step 1. If no errors were detected in step 1, save the data to the db.
This approach does not require writing data to the database and rolling it back if there are errors (as option 1 in your description would). It only writes to the database when everything is ok.
Moreover, this approach does not require holding the whole list of items in memory, as option 2 suggests, which could be inefficient in terms of memory usage and perform poorly if the file is big.
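The conditional flow between the two steps could be wired roughly like this. Everything here is a placeholder sketch: the bean names, the two steps, and the `ERRORS_FOUND` exit status (which a `StepExecutionListener` on the validation step would return whenever it wrote anything to the error file).

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.context.annotation.Bean;

public class FileToDbJobConfig {

    @Bean
    public Job fileToDbJob(JobBuilderFactory jobs, Step validateStep, Step saveStep) {
        return jobs.get("fileToDbJob")
                .start(validateStep)                      // read file, write errors to an error file
                .on("ERRORS_FOUND").fail()                // errors found: fail, never touch the db
                .from(validateStep).on("*").to(saveStep)  // clean input: persist to the db
                .end()
                .build();
    }
}
```

This way the database write step simply never runs on bad input, so there is nothing to roll back.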
I have a file to be written to a db after doing some validations. The file has a header and a trailer which need to be validated and then skipped; all the lines in between should be mapped and loaded into the db if the validations pass. Can I use an ItemReader and ItemWriter to do this? Below is sample file data with a header line, a trailer line, and in between them the lines with the actual data to be loaded into the db. Any help is appreciated.
HEADER|xxxxx|20190405T143025Z
linedata|linedata|linedata|linedata|||linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata
TRAILER|20190405T143025Z|1
p.s: I am an IIB Developer, this is my first time using spring batch.
You can break down the requirement into two steps:
Step 1: a simple tasklet that does the validation logic (looks for the header/trailer records and validates them). The success of this step is a pre-condition for the next step.
Step 2: a chunk-oriented step that is triggered only if step 1 succeeds, which skips the header with FlatFileItemReader.setLinesToSkip(1) and skips the trailer with a processor that filters out records starting with TRAILER.
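Step 2's reader and filtering processor could look roughly like this. The `LineDto` type, the file path, and the pass-through line mapping are assumptions for illustration; a real mapping would tokenize the pipe-delimited fields.

```java
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

public class HeaderTrailerStepConfig {

    // Minimal DTO for illustration only
    public static class LineDto {
        private final String raw;
        public LineDto(String raw) { this.raw = raw; }
        public String getRaw() { return raw; }
    }

    @Bean
    public FlatFileItemReader<LineDto> lineReader() {
        return new FlatFileItemReaderBuilder<LineDto>()
                .name("lineReader")
                .resource(new FileSystemResource("input.txt")) // path is a placeholder
                .linesToSkip(1)                                // skip the HEADER line
                .lineMapper((line, lineNumber) -> new LineDto(line))
                .build();
    }

    @Bean
    public ItemProcessor<LineDto, LineDto> trailerFilter() {
        // Returning null filters the item out, so the TRAILER line never reaches the writer
        return item -> item.getRaw().startsWith("TRAILER") ? null : item;
    }
}
```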
Please see the image.
So here is the flow: the first component executes a database query to find QAR_ID (a single row); if it is found, all is well. I am trying to add error handling to this. When no rows are found, the flow goes to tJava_11, which raises a Java exception that then gets logged by another tJava component.
The problem I am facing is that when the job takes the error-handling flow, it logs the error and then goes straight to the post-job section. However, I want Talend to take the OnSubjobOk route so that it continues with the other steps instead of jumping directly to the post-job section.
I know this is possible using subjobs, but I don't want to keep creating 'n' subjobs.
Is there any way this can be done in the same job?
You could remove the RunIf and handle both scenarios in the "get_QAR_ID into context" component itself: query the database component's NB_LINE "after" variable, and if it's < 1 raise the error, else set the value. Your job would then flow on to the OnSubjobOk.
You can do something like this :
In tJava_1 you do your error logging if no row is returned by your query, and you continue to the next subjob. There is no need to throw an exception here only to catch it immediately afterwards.
If any row is found, you continue to the next subjob (tJava_2) via an If trigger.
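The tJava logic could look roughly like the method below. The component name `tDBInput_1` (and hence the `tDBInput_1_NB_LINE` key) is an assumption; in a real Talend job, `globalMap` is provided by the generated job code rather than passed in, so this is sketched as a self-contained method purely to show the branching.

```java
import java.util.HashMap;
import java.util.Map;

public class QarIdCheck {

    // Read the row count the DB component exposes in globalMap after it runs,
    // log instead of throwing, and let the job continue via OnSubjobOk either way.
    static String checkQarId(Map<String, Object> globalMap) {
        Integer nbLine = (Integer) globalMap.get("tDBInput_1_NB_LINE");
        if (nbLine == null || nbLine < 1) {
            return "WARN: no QAR_ID found"; // logged, no exception raised
        }
        return "QAR_ID loaded";
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tDBInput_1_NB_LINE", 0);
        System.out.println(checkQarId(globalMap)); // takes the warning branch
    }
}
```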
I have a job that processes items in chunks (of 1000). The items are marshalled into a single JSON payload and posted to a remote service as a batch (all 1000 in one HTTP POST). Sometimes the remote service bogs down and the connection times out. I set up skip for this:
return steps.get("sendData")
        .<DataRecord, DataRecord>chunk(1000) // commit interval: 1000 records per chunk
        .reader(reader())
        .processor(processor())
        .writer(writer())                    // marshals the whole chunk into one JSON POST
        .faultTolerant()
        .skipLimit(10)                       // allow at most 10 skips before failing the step
        .skip(IOException.class)             // treat connection timeouts as skippable
        .build();
If a chunk fails, Batch retries the chunk, but one item at a time (in order to find out which item caused the failure). In my case, though, no single item caused the failure; the entire chunk succeeds or fails as a unit and should be retried as a chunk (in fact, dropping to single-item mode makes the remote service very angry, and it refuses to accept the data; we do not control the remote service).
What's my best way out of this? I was trying to see if I could disable single-item retry mode, but I don't fully understand where this happens. Is there a custom SkipPolicy or something that I can implement? (The methods there didn't look that helpful.)
Or is there some way to have the item reader read the 1000 records but pass them to the writer as a single List (1000 input items => one output item)?
Let me walk though this in two parts. First I'll explain why it works the way it does, then I'll propose an option for addressing your issue.
Why the Retry Is Item by Item
In your configuration, you've specified that the step be fault tolerant. With that, when an exception is thrown in the ItemWriter, we don't know which item caused it, so we don't have a way to skip or retry just that item. That's why, when we begin the skip/retry logic, we go item by item.
How To Handle Retry By The Chunk
What this comes down to is that you need a chunk size of 1 for this to work. That means that instead of relying on Spring Batch to iterate over the items within a chunk for the ItemProcessor, you'll have to do it yourself. So your ItemReader would return a List<DataRecord>, your ItemProcessor would loop over that list, and your ItemWriter would take a List<List<DataRecord>>. I'd recommend creating a decorator for an ItemWriter that unwraps the outer list before passing it to the main ItemWriter.
This does remove the ability to do true skipping of a single item within that list, but it sounds like that's OK for your use case.
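The decorator described above could be sketched like this (the class name and `DataRecord` placeholder are illustrative; the delegate is your existing JSON-posting writer):

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemWriter;

// The reader returns List<T>, so with chunk(1) the framework hands the writer
// a List<List<T>>; this decorator flattens it before delegating, so the real
// writer still sees one plain list of records per call.
public class ListUnwrappingItemWriter<T> implements ItemWriter<List<T>> {

    private final ItemWriter<T> delegate;

    public ListUnwrappingItemWriter(ItemWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(List<? extends List<T>> lists) throws Exception {
        List<T> flattened = new ArrayList<>();
        for (List<T> list : lists) {
            flattened.addAll(list);
        }
        delegate.write(flattened); // one call with the whole batch of records
    }
}
```

You would register it with something like `.writer(new ListUnwrappingItemWriter<>(jsonPostWriter()))`, and a skip or retry would then apply to the whole 1000-record batch.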
My case is that I have a batch job that will read data from 2 different tables and process them differently.
The first reader will do a simple SQL retrieval and a simple conversion; the second reader will do a SQL retrieval and then run update and insert logic behind it. Both readers will return a string line and write it into a file.
In Spring Batch, is it possible to have 2 readers and 2 processors in 1 step and then pass to 1 writer?
I'd go for the second approach, suggested by Faiz Pinkman. It's simply closer to the way spring-batch works.
first step
reader for your simple sql -> use the standard db reader
processor -> your own implementation of your simple logic
writer to a file -> use the standard FlatFileItemWriter
second step
I don't understand exactly what you mean by "process update and insert logic behind". I assume that you read data from a db and, based on that data, you have to execute inserts and updates on a table.
reader for your more complex data -> again, use the standard db reader
processor ->
prepare the string for the text file
prepare the new inserts and updates
writer -> use a composite writer with the following delegates
FlatFileItemWriter for your textfile
DbWriter depending on your inserts and update needs
This way, you have clear transaction boundaries and can be sure that the content of the file and the inserts and updates are "in sync".
note: the first and second steps can run in parallel
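The composite writer for the second step could be wired roughly like this. `OutputRecord` and the two delegate beans are placeholders for your own types:

```java
import java.util.Arrays;

import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.support.CompositeItemWriter;
import org.springframework.context.annotation.Bean;

public class SecondStepWriterConfig {

    @Bean
    public CompositeItemWriter<OutputRecord> secondStepWriter(
            FlatFileItemWriter<OutputRecord> fileWriter,
            JdbcBatchItemWriter<OutputRecord> dbWriter) {
        CompositeItemWriter<OutputRecord> writer = new CompositeItemWriter<>();
        // Both delegates are called within the same chunk transaction, so the
        // file line and the db insert/update for a record commit together.
        writer.setDelegates(Arrays.asList(fileWriter, dbWriter));
        return writer;
    }
}
```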
third step
reader -> use a MultiResourceItemReader to read from the two files
writer -> use a FlatFileItemWriter to write both contents into one file
Of course, if you don't need the content in one file, you can skip step 3.
You could also execute steps 1 and 2 one after the other and write to the same file. But depending on the execution times of steps 1 and 2, performance could be worse than executing steps 1 and 2 in parallel and using a third step to combine the data.
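The third step's reader could be sketched like this (file paths and bean names are assumptions; a pass-through line mapper keeps each line unchanged):

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

public class CombineStepConfig {

    @Bean
    public MultiResourceItemReader<String> combinedReader() {
        // Delegate reads raw lines; the MultiResourceItemReader feeds it each file in turn
        FlatFileItemReader<String> delegate = new FlatFileItemReaderBuilder<String>()
                .name("rawLineReader")
                .lineMapper(new PassThroughLineMapper())
                .build();

        MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
        reader.setResources(new Resource[] {           // file paths are placeholders
                new FileSystemResource("out/step1.txt"),
                new FileSystemResource("out/step2.txt")
        });
        reader.setDelegate(delegate);
        return reader;
    }
}
```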
You can code a custom reader and put application-level logic in a custom processor that handles the inputs based on their content. It does not make sense to have two readers in one step; how would Spring Batch execute them? Finishing reader 1 and then starting reader 2 makes no sense either; that would be equivalent to having two different steps.
Another approach would be to place the output from both readers in one file and then have another step for writing. But I'd go with the first technique.