Can we use Spring batch Item Reader and Writer on a file which needs to skip first and last line? - spring-batch

I have a file to be written to a db after doing some validations. The file has a header and a trailer, which need to be validated and then skipped; all the lines in between should be mapped and loaded to the db if the validations pass. Can I use an ItemReader and ItemWriter to do this? Below is sample file data with a header line, a trailer line, and between them the lines with the actual data to be loaded. Any help is appreciated.
HEADER|xxxxx|20190405T143025Z
linedata|linedata|linedata|linedata|||linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata
TRAILER|20190405T143025Z|1
p.s: I am an IIB Developer, this is my first time using spring batch.

You can break down the requirement into two steps:
Step 1: a simple tasklet that does the validation logic (looks for header/trailer records and validates them). The success of this step is a pre-condition for the next step.
Step 2: a chunk-oriented tasklet that is triggered only if step 1 succeeds. It skips the header with FlatFileItemReader.setLinesToSkip(1) and skips the trailer with a processor that filters out records starting with TRAILER (an ItemProcessor filters an item by returning null for it).
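The validation in step 1 could look roughly like the plain-Java sketch below, which does not depend on Spring Batch. It makes two assumptions the question does not state: that the header and trailer carry the same timestamp, and that the last trailer field is the number of data lines. The class and method names (HeaderTrailerCheck, validate, dataLines) are made up for illustration.

```java
import java.util.List;
import java.util.Optional;

final class HeaderTrailerCheck {

    // Returns an empty Optional when the file looks valid,
    // otherwise a message describing the first problem found.
    static Optional<String> validate(List<String> lines) {
        if (lines.size() < 2) {
            return Optional.of("file needs at least a header and a trailer");
        }
        String[] header = lines.get(0).split("\\|", -1);
        String[] trailer = lines.get(lines.size() - 1).split("\\|", -1);
        if (header.length < 3 || !header[0].equals("HEADER")) {
            return Optional.of("malformed header");
        }
        if (trailer.length < 3 || !trailer[0].equals("TRAILER")) {
            return Optional.of("malformed trailer");
        }
        // Assumption: header and trailer carry the same timestamp.
        if (!header[2].equals(trailer[1])) {
            return Optional.of("header and trailer timestamps differ");
        }
        // Assumption: the last trailer field is the number of data lines.
        int expected;
        try {
            expected = Integer.parseInt(trailer[2]);
        } catch (NumberFormatException e) {
            return Optional.of("trailer count is not a number");
        }
        int actual = lines.size() - 2;
        if (expected != actual) {
            return Optional.of("trailer says " + expected + " data lines, found " + actual);
        }
        return Optional.empty();
    }

    // Everything between the header and the trailer.
    static List<String> dataLines(List<String> lines) {
        return lines.subList(1, lines.size() - 1);
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "HEADER|xxxxx|20190405T143025Z",
                "linedata|linedata|linedata",
                "TRAILER|20190405T143025Z|1");
        System.out.println(validate(sample).orElse("file is valid"));
    }
}
```

In a real job this check would run inside the step 1 tasklet, which would fail the step when a problem is reported so that step 2 never starts.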

Related

Spring batch entire Job in transaction boundary

I have a use case for which I could use a Spring Batch job, which I could design in the following ways.
1) First way:
Step 1 (chunk-oriented step): Read from the file -> filter, validate and transform each row into a DTO (data transfer object); if there are any errors, store them in the DTO itself -> check whether any of the DTOs has errors. If not, write to the database. If yes, write to an error file.
However, the problem with this way is that I need the entire job in one transaction boundary. If there is a failure in any of the chunks, I don't want to write to the DB, and I want to roll back all successful writes made to the DB up to that point. This way forces me to write rollback logic for all successful writes if any chunk fails.
2) Second way:
Step 1 (chunk-oriented step): Read items from the file -> filter, validate and transform each row into a DTO (data transfer object). Errors are stored in the DTO object itself.
Step 2 (tasklet): Read the entire list (not chunks) of DTOs created in step 1 -> check whether any of the DTOs has errors populated in it. If yes, abort the write to the DB and fail the job.
With the second way, I get all the benefits of chunk processing and scaling. At the same time, I have created a transaction boundary for the entire job.
PS: In both ways, the first step never fails; if there is a failure, the errors are stored in the DTO object itself. Thus, a DTO object is always created.
My question: since I am new to Spring Batch, is the second way a good pattern to follow? And is there a way to share data between steps so that the entire list of DTOs is available to the second step (in the second way above)?
In my opinion, trying to process the entire file in a single transaction (i.e. a transaction at the job level) is not the way to go. I would proceed in two steps:
Step 1: process the input and write errors to a file.
Step 2: this step is conditional on step 1. If no errors have been detected in step 1, then save the data to the db.
This approach does not require writing data to the database and rolling it back if there are errors (as suggested by option 1 in your description). It only writes to the database when everything is ok.
Moreover, this approach does not require holding a list of items in memory as suggested by option 2, which could be inefficient in terms of memory usage and perform poorly if the file is big.
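The control flow of these two steps can be illustrated with a small plain-Java sketch that does not use Spring Batch. The in-memory list stands in for the input file, which a real job would stream once per step. The validation rule (three pipe-separated fields) and all names (ValidateThenLoad, run, dbWriter, errorWriter) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ValidateThenLoad {

    // Step 1: check every line and collect error messages. Nothing is written to the db here.
    static List<String> validate(List<String> lines) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // Hypothetical rule: a valid line has exactly three pipe-separated fields.
            if (lines.get(i).split("\\|", -1).length != 3) {
                errors.add("line " + (i + 1) + ": expected 3 fields");
            }
        }
        return errors;
    }

    // Runs step 1, then step 2 only if step 1 reported no errors.
    // Returns true when the data was handed to dbWriter.
    static boolean run(List<String> lines, int chunkSize,
                       Consumer<List<String>> dbWriter,
                       Consumer<List<String>> errorWriter) {
        List<String> errors = validate(lines);
        if (!errors.isEmpty()) {
            errorWriter.accept(errors); // the error file in the description above
            return false;               // step 2 is never started
        }
        // Step 2: write chunk by chunk; each chunk would be its own transaction.
        for (int from = 0; from < lines.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, lines.size());
            dbWriter.accept(lines.subList(from, to));
        }
        return true;
    }

    public static void main(String[] args) {
        boolean loaded = run(List.of("a|b|c", "d|e|f", "broken"), 2,
                chunk -> System.out.println("db <- " + chunk),
                errors -> System.out.println("errors: " + errors));
        System.out.println("loaded: " + loaded);
    }
}
```

The trade-off is that the input is read twice, in exchange for never having to undo a database write.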

Spring batch - Multiple step execution model, if step 3rd fails then data inserted by 1st and 2nd step should also get rolled back

Is there a way to do this with a CSV file?
I have multiple steps to execute, and each step inserts data into the DB.
My scenario is that if my 3rd step fails, then step 1 and step 2 should also fail, or the data they inserted should be rolled back.
In my current design, step 1 and step 2 successfully insert data into the DB even though my 3rd step fails.

Disable Spring Batch single-item processing in skip situation

I have a job that processes items in chunks (of 1000). The items are marshalled into a single JSON payload and posted to a remote service as a batch (all 1000 in one HTTP POST). Sometimes the remote service bogs down and the connection times out. I set up skip for this:
return steps.get("sendData")
        .<DataRecord, DataRecord>chunk(1000)
        .reader(reader())
        .processor(processor())
        .writer(writer())
        .faultTolerant()
        .skipLimit(10)
        .skip(IOException.class)
        .build();
If a chunk fails, Spring Batch retries the chunk one item at a time (in order to find out which item caused the failure). In my case, though, no single item caused the failure: the entire chunk succeeds or fails as a chunk and should be retried as a chunk. (In fact, dropping to single-item mode makes the remote service very angry and it refuses to accept the data. We do not control the remote service.)
What's my best way out of this? I was trying to see if I could disable single-item retry mode, but I don't even fully understand where this happens. Is there a custom SkipPolicy or something that I can implement? (The methods there didn't look that helpful.)
Or is there some way to have the item reader read the 1000 records but pass them to the writer as a List (1000 input items => one output item)?
Let me walk through this in two parts. First I'll explain why it works the way it does, then I'll propose an option for addressing your issue.
Why Is Retry Item By Item
In your configuration, you've specified that it be fault tolerant. With that, when an exception is thrown in the ItemWriter, we don't know which item caused it so we don't have a way to skip/retry it. That's why, when we do begin the skip/retry logic, we go item by item.
How To Handle Retry By The Chunk
What this comes down to is you need to get to a chunk size of 1 in order for this to work. What that means is that instead of relying on Spring Batch for iterating over the items within a chunk for the ItemProcessor, you'll have to do it yourself. So your ItemReader would return a List<DataRecord> and your ItemProcessor would loop over that list. Your ItemWriter would take a List<List<DataRecord>>. I'd recommend creating a decorator for an ItemWriter that unwraps the outer list before passing it to the main ItemWriter.
This does remove the ability to do true skipping of a single item within that list but it sounds like that's ok for your use case.
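The list-of-lists arrangement described above can be sketched in plain Java as follows. The Reader and Writer interfaces are simplified stand-ins for Spring Batch's ItemReader and ItemWriter, not the real types, and BatchingReader and UnwrappingWriter are hypothetical names. The step itself would then be configured with a chunk size of 1, so that one "item" is a whole list of 1000 records.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkAsItem {

    // Simplified stand-ins for Spring Batch's ItemReader and ItemWriter.
    interface Reader<T> { T read(); }                            // null means end of input
    interface Writer<T> { void write(List<? extends T> items); }

    // Reader decorator: groups up to batchSize items from the delegate into one List item.
    static final class BatchingReader<T> implements Reader<List<T>> {
        private final Reader<T> delegate;
        private final int batchSize;

        BatchingReader(Reader<T> delegate, int batchSize) {
            this.delegate = delegate;
            this.batchSize = batchSize;
        }

        @Override
        public List<T> read() {
            List<T> batch = new ArrayList<>(batchSize);
            T item;
            // Stop when the batch is full or the delegate is exhausted.
            while (batch.size() < batchSize && (item = delegate.read()) != null) {
                batch.add(item);
            }
            return batch.isEmpty() ? null : batch;
        }
    }

    // Writer decorator: unwraps the outer list and hands each inner list to the
    // real writer in one call, so an inner list is sent (or fails) as a whole.
    static final class UnwrappingWriter<T> implements Writer<List<T>> {
        private final Writer<T> delegate;

        UnwrappingWriter(Writer<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(List<? extends List<T>> batches) {
            for (List<T> batch : batches) {
                delegate.write(batch);
            }
        }
    }
}
```

With a chunk size of 1, the outer list passed to UnwrappingWriter holds a single inner list, so the framework's item-by-item fallback re-sends that whole list rather than individual records.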

Spring Batch - construct 2 reader and 2 processor with single writer

In my case, I have a batch job that reads data from 2 different tables and processes each differently.
The first reader does a simple SQL retrieval and a simple conversion; the second reader does an SQL retrieval followed by update and insert logic. Both readers return a string line that is written to a file.
In Spring Batch, is it possible to have 2 readers and 2 processors in 1 step and then pass the results to 1 writer?
I'd go for the second approach, suggested by Faiz Pinkman. It's simply closer to the way spring-batch works.
First step:
- reader for your simple SQL -> use the standard db reader
- processor -> your own implementation of your simple logic
- writer to a file -> use the standard FlatFileItemWriter
Second step:
I don't understand exactly what you mean by "process update and insert logic behind". I assume that you read data from a db and, based on that data, have to execute inserts and updates in a table.
- reader for your more complex data -> again, use the standard db reader
- processor ->
  - prepare the string for the text file
  - prepare the new inserts and updates
- writer -> use a composite writer with the following delegates:
  - FlatFileItemWriter for your text file
  - DbWriter depending on your insert and update needs
This way, you have clear transaction boundaries and can be sure that the content of the file and the inserts and updates are "in sync".
Note: the first and second steps can run in parallel.
Third step:
- reader -> use a multi-resource reader to read from the two files
- writer -> use a FlatFileItemWriter to write both contents into one file
Of course, if you don't need to have the content in one file, you can skip step 3.
You could also execute steps 1 and 2 one after the other and write to the same file. But depending on the execution time of steps 1 and 2, the performance could be worse than executing steps 1 and 2 in parallel and using a third step to combine the data.
You can code a custom reader and put application-level logic in a custom processor that handles the inputs based on their content. It does not make sense to have two readers in one step. How would Spring Batch execute them? It doesn't make sense to finish reader 1 and then start reader 2; that is equivalent to having two separate steps.
Another approach would be to place the output from both readers in one file and then have another step for writing. But I'd go with the first technique.
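The composite writer used in the second step can be pictured with this minimal plain-Java sketch. Writer is a simplified stand-in for Spring Batch's ItemWriter, and CompositeWriter is an illustrative class written for this example, not the framework's own composite writer. The sketch only shows the delegation and does not model transactions.

```java
import java.util.List;

final class CompositeWriting {

    // Simplified stand-in for Spring Batch's ItemWriter.
    interface Writer<T> { void write(List<? extends T> items); }

    // Hands every chunk to each delegate in the order the delegates were given.
    static final class CompositeWriter<T> implements Writer<T> {
        private final List<Writer<T>> delegates;

        CompositeWriter(List<Writer<T>> delegates) {
            this.delegates = List.copyOf(delegates);
        }

        @Override
        public void write(List<? extends T> items) {
            for (Writer<T> delegate : delegates) {
                delegate.write(items);
            }
        }
    }

    public static void main(String[] args) {
        Writer<String> file = items -> System.out.println("file <- " + items);
        Writer<String> db = items -> System.out.println("db <- " + items);
        new CompositeWriter<>(List.of(file, db)).write(List.of("row1", "row2"));
    }
}
```

Because both delegates receive the same chunk in the same call, a failure in either one surfaces while that chunk is still being written.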

Itemwriter output using same order that itemreader used to read file

We have a Spring Batch job that reads a file (FlatFileItemReader), processes it and writes data to a queue (JmsItemWriter).
We have another job that reads the queue (JmsItemReader) and writes a file (FlatFileItemWriter). It's an asynchronous process (between the executions of the two jobs, some manual process must be performed).
The flat file content doesn't have a line identifier, and we use a multi-threaded approach when reading the file ("throttle-limit"). So the queued messages do not keep the order they had in the flat file.
The problem is that we should generate an output file that respects the original order. So line 33 of the incoming file should be line 33 of the outgoing file (it will have the contents of the original line, plus some data).
Does Spring Batch provide a "native" way to order the output, respecting the original read order? I say "native" because one solution we thought of is to create an additional step just to add a line number to the file and use it at the end, but I was wondering whether this is reinventing the wheel...
We are using SB 3.0.3
TIA,
Bob
The use case you are describing asks that you maintain order across multiple jobs, which is not supported. In theory (while not guaranteed), a single, single-threaded step would retain the order of the input file.
Since you are reading in a multithreaded manner, there really isn't a good way to guarantee the order of the items as they are being read. The best you could do is synchronize the read method and add an id as the items are being read. If the bottleneck you're attempting to address with multithreading is in the processor or writer, this may not be a bad option.
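A synchronized, id-stamping read could look like the plain-Java sketch below. It is not tied to Spring Batch: NumberingReader wraps a plain Iterator rather than a FlatFileItemReader, and all names are hypothetical. The final sort is an extra assumption not taken from the answer above; it shows one way the second job might restore the order, and it holds all lines in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

final class NumberedReading {

    // A line together with its 1-based position in the input.
    static final class NumberedLine {
        final long number;
        final String text;

        NumberedLine(long number, String text) {
            this.number = number;
            this.text = text;
        }
    }

    // read() is synchronized so that taking the next line and assigning its
    // number happen as one atomic step, even with several reader threads.
    static final class NumberingReader {
        private final Iterator<String> lines;
        private long next = 1;

        NumberingReader(Iterator<String> lines) {
            this.lines = lines;
        }

        synchronized NumberedLine read() {
            return lines.hasNext() ? new NumberedLine(next++, lines.next()) : null;
        }
    }

    // Restores the input order from the stamped numbers (assumes everything fits in memory).
    static List<String> inOriginalOrder(List<NumberedLine> processed) {
        List<NumberedLine> sorted = new ArrayList<>(processed);
        sorted.sort(Comparator.comparingLong(line -> line.number));
        List<String> out = new ArrayList<>();
        for (NumberedLine line : sorted) {
            out.add(line.text);
        }
        return out;
    }
}
```

The number would have to travel with each message through the queue so that the second job can use it when writing the output file.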