I have a batch job that reads data from two different tables and processes each of them differently.
The first reader does a simple SQL retrieval and a simple conversion; the second reader does an SQL retrieval followed by update and insert logic. Both readers return a string line that is written to a file.
In Spring Batch, is it possible to have two readers and two processors in one step and then pass their output to one writer?
I'd go for the second approach, suggested by Faiz Pinkman. It's simply closer to the way spring-batch works.
first step
- reader for your simple SQL -> use the standard db reader
- processor -> your own implementation of your simple logic
- writer to a file -> use the standard FlatFileItemWriter
second step
I don't understand exactly what you mean by "process update and insert logic behind". I assume that you read data from a db and, based on that data, you have to execute inserts and updates in a table.
- reader for your more complex data -> again, use the standard db reader
- processor ->
  - prepare the string for the text file
  - prepare the new inserts and updates
- writer -> use a composite writer with the following delegates
  - FlatFileItemWriter for your text file
  - DbWriter depending on your insert and update needs
This way, you have clear transaction boundaries and can be sure that the content of the file and the inserts and updates are "in sync".
note: first and second step can run in parallel
third step
- reader -> use a multi-resource reader (MultiResourceItemReader) to read from the two files
- writer -> use a FlatFileItemWriter to write both contents into one file
Of course, if you don't need to have the content in one file, you can skip step 3.
You could also execute steps 1 and 2 one after the other and write to the same file. But depending on the execution time of steps 1 and 2, that could be slower than running steps 1 and 2 in parallel and using a third step to combine the data.
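To illustrate the composite writer in the second step, here is a minimal plain-Java sketch of the delegation idea. Spring Batch ships its own CompositeItemWriter that takes a list of delegates; the LineWriter interface and CompositeLineWriter class below are hypothetical stand-ins so that the example runs without the framework, and they do not model transactions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a writer abstraction (not the Spring Batch API).
interface LineWriter<T> {
    void write(List<? extends T> items);
}

// Passes every chunk of items to each delegate in turn, so the file
// writer and the db writer always see exactly the same items.
final class CompositeLineWriter<T> implements LineWriter<T> {
    private final List<LineWriter<T>> delegates;

    CompositeLineWriter(List<LineWriter<T>> delegates) {
        // Defensive copy so later changes to the caller's list have no effect.
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void write(List<? extends T> items) {
        for (LineWriter<T> delegate : delegates) {
            delegate.write(items);
        }
    }
}
```

In a real job, the first delegate would be the FlatFileItemWriter and the second the writer that performs the inserts and updates, both invoked inside the same chunk transaction.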
You can code a custom reader and put application-level logic in a custom processor that handles the inputs based on their content. It does not make sense to have two readers in one step. How would Spring Batch execute them? Finishing reader 1 and then starting reader 2 within one step is equivalent to having two different steps.
Another approach would be to place the output from both readers in one file and then have another step for writing. But I'd go with the first technique.
Related
I need to export a database of around 180k objects to JSON files, so that I can retain the data structure in a way that suits me for a later import into another database. However, because of the amount of data, I want to separate and group the data based on an attribute value from the database records themselves. So all records that have attribute1=value1 should go to value1.json, those with value2 to value2.json, and so on.
However I still haven't figured out how to do this kind of job. I am using RepositoryItemReader and JsonFileWriter.
I started by filtering data on that attribute and running separate exports, just to verify that works, however I need to do this so I can automate whole process and let it work.
Can this be done?
There are several ways to do that. Here are a couple of options:
Option 1: parallel steps
You start by creating a tasklet that calculates the distinct values of the attribute you want to group items by, and you put this information in the job execution context.
After that, you create a flow with a chunk-oriented step for each value. Each chunk-oriented step would process a distinct value and generate an output file. The item reader and writer would be step-scoped beans, dynamically configured with the information from the job execution context.
Option 2: partitioned step
Here, you would implement a Partitioner that creates a partition for each distinct value. Each worker step would then process a distinct value and generate an output file.
Both options should perform equally in your use-case. However, option 2 is easier to implement and configure in my opinion.
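As a rough illustration of option 2, the sketch below builds one partition per distinct attribute value. In Spring Batch, Partitioner.partition(int gridSize) returns a Map<String, ExecutionContext>; here plain maps stand in for ExecutionContext so the example runs on its own. The class name, the key names attributeValue and outputFile, and the "partition-" prefix are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class ValuePartitions {
    // Builds one partition per distinct attribute value. Each partition
    // carries the value to filter on and the file its worker should write.
    // Values must be non-null, because TreeSet rejects null elements.
    static Map<String, Map<String, String>> partitionsFor(List<String> attributeValues,
                                                          String outputDir) {
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        // TreeSet removes duplicates and gives a stable, sorted order.
        for (String value : new TreeSet<>(attributeValues)) {
            Map<String, String> context = new LinkedHashMap<>();
            context.put("attributeValue", value);                      // assumed key name
            context.put("outputFile", outputDir + "/" + value + ".json"); // assumed key name
            partitions.put("partition-" + value, context);
        }
        return partitions;
    }
}
```

Each worker step would then read only the records whose attribute equals its attributeValue and write them to its outputFile, typically through step-scoped reader and writer beans.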
I have a use case for a Spring Batch job, which I could design in the following ways.
1) First Way:
Step 1 (Chunk oriented step): Read from the file -> filter, validate and transform the read row into a DTO (data transfer object); if there are any errors, store them in the DTO itself -> check if any of the DTOs has errors. If not, write to the database. If yes, write to an error file.
However, the problem with this way is that I need the entire job within one transaction boundary. If there is a failure in any of the chunks, I don't want to write to the DB and I want to roll back all writes that have succeeded in the DB up to that point. This design forces me to write rollback logic for all successful writes if any chunk fails.
2) Second way
Step 1 (Chunk oriented step): Read items from the file —> filter, validate and transform the read row in DTO (data transfer object). This does store the errors in the DTO object itself.
Step 2 (Tasklet): Read entire list (and not chunks) of DTOs created from step 1 —> Check if any of the DTOs has errors populated in it. If yes, then abort the writing to DB and fail the JOB.
In the second way, I get all the benefits of chunk processing and scaling. At the same time, I have created a transaction boundary for the entire job.
PS: In both ways, the first step never fails; if something goes wrong, the errors are stored in the DTO object itself. Thus, a DTO object is always created.
The question is: since I am new to Spring Batch, is the second way a good pattern to go with? And is there a way to share data between steps so that the entire list of DTOs is available to the second step (in the second way above)?
In my opinion, trying to process the entire file in a single transaction (ie a transaction at the job level) is not the way to go. I would proceed in two steps:
Step 1: process the input and write errors to the file
Step 2: this step is conditioned by step 1. If no errors have been detected in step 1, then save the data to the db.
This approach does not require writing data to the database and rolling it back if there are errors (as suggested by option 1 in your description). It only writes to the database when everything is ok.
Moreover, this approach does not require holding a list of items in memory as suggested by option 2, which could be inefficient in terms of memory usage and perform poorly if the file is big.
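The control flow of the two steps can be sketched in plain Java as follows. This is only an illustration of the "validate first, load only if clean" idea, not Spring Batch configuration: the class and method names are hypothetical, and lists stand in for the error file and the database. In a real job each step would stream the input file chunk by chunk rather than receive it as a collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ValidateThenLoad {
    // Step 1: pass over the input once and record an error for each invalid row.
    static List<String> validate(Iterable<String> rows, Predicate<String> isValid) {
        List<String> errors = new ArrayList<>();
        for (String row : rows) {
            if (!isValid.test(row)) {
                errors.add("invalid row: " + row);
            }
        }
        return errors;
    }

    // Runs step 1, then step 2 only when step 1 found no errors.
    // Returns true if the data was loaded.
    static boolean run(Iterable<String> rows, Predicate<String> isValid,
                       List<String> errorFile, List<String> database) {
        List<String> errors = validate(rows, isValid);
        errorFile.addAll(errors);
        if (!errors.isEmpty()) {
            return false; // nothing was written to the database, so nothing to roll back
        }
        // Step 2: second pass over the input, writing to the database.
        for (String row : rows) {
            database.add(row);
        }
        return true;
    }
}
```

The input is read twice, which is the price paid for never touching the database when the file contains errors.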
I have a file to be written to a db after doing some validations. The file has a header and a trailer, which need to be validated and then skipped, and all the lines in between should be mapped and loaded into a db if the validations pass. Can I use an item reader and writer to do this? Below is a sample of the file data, which has a header line, a trailer line and, in between them, the lines with the actual data to be loaded into the db. Any help is appreciated.
HEADER|xxxxx|20190405T143025Z
linedata|linedata|linedata|linedata|||linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata|linedata
TRAILER|20190405T143025Z|1
p.s: I am an IIB Developer, this is my first time using spring batch.
You can break down the requirement in two steps:
Step 1: a simple tasklet that does the validation logic (looks for header/trailer records and validates them). The success of this step is a pre-condition for the next step.
Step 2: a chunk-oriented tasklet that is triggered only if step 1 succeeds, and which skips the header with FlatFileItemReader.setLinesToSkip(1) and skips the trailer with a processor that filters records starting with TRAILER.
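The validation done by step 1 might look roughly like the following plain-Java sketch. It checks that the first line starts with HEADER and the last with TRAILER. It also assumes that the last trailer field is the number of data lines; that is only a guess based on the sample file (one data line, trailer ending in 1), so adjust it to the real file specification. The class and method names are hypothetical.

```java
import java.util.List;

final class HeaderTrailerCheck {
    // Throws IllegalArgumentException if the header or trailer is missing or wrong.
    static void validate(List<String> lines) {
        if (lines.size() < 2) {
            throw new IllegalArgumentException("file needs a header and a trailer line");
        }
        // Limit -1 keeps trailing empty fields such as "a||".
        String[] header = lines.get(0).split("\\|", -1);
        String[] trailer = lines.get(lines.size() - 1).split("\\|", -1);
        if (!"HEADER".equals(header[0])) {
            throw new IllegalArgumentException("first line is not a HEADER record");
        }
        if (!"TRAILER".equals(trailer[0])) {
            throw new IllegalArgumentException("last line is not a TRAILER record");
        }
        // Assumption: the last trailer field holds the count of data lines.
        int expected;
        try {
            expected = Integer.parseInt(trailer[trailer.length - 1].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("trailer record count is not a number", e);
        }
        int actual = lines.size() - 2;
        if (expected != actual) {
            throw new IllegalArgumentException(
                "trailer says " + expected + " data lines, found " + actual);
        }
    }

    // Returns only the data lines, after validating the header and trailer.
    static List<String> dataLines(List<String> lines) {
        validate(lines);
        return lines.subList(1, lines.size() - 1);
    }
}
```

Inside a tasklet, a failed validation would typically be surfaced by throwing an exception so that the step fails and step 2 is never started.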
We have a Spring Batch job that reads a file (FlatFileItemReader), processes it and writes data to a queue (JmsItemWriter).
We have another job that reads the queue (JmsItemReader) and writes a file (FlatFileItemWriter). It's an asynchronous process (in between the execution of the two jobs, there is a manual process that must be performed).
The flat file content doesn't have a line identifier, and we use a multi-threaded approach when reading the file ("throttle-limit"). So the queued messages do not keep the order they had in the flat file.
The problem is that we have to generate an output file that respects the original order. So line 33 of the incoming file should be line 33 of the outgoing file (it will have the contents of the original line, plus some data).
Does Spring Batch provide a "native" way to order the output so that it respects the original read order? I say "native" because one solution we thought of is to create an additional step just to add a line number to the file and use it at the end, but I was wondering if this is reinventing the wheel...
We are using SB 3.0.3
TIA,
Bob
The use case you are describing asks that you maintain order across multiple jobs, which is not supported. In theory (while not guaranteed), a single, single-threaded step would retain the order of the input file.
Since you are reading in a multithreaded manner, there really isn't a good way to guarantee the order of the items as they are being read. The best you could do is synchronize the read method and add an id as the items are being read. If the bottleneck you're attempting to address with multithreading is in the processor or writer, this may not be a bad option.
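A minimal plain-Java sketch of "synchronize the read method and add an id" could look like this. It is not the Spring Batch API: NumberedLineReader and NumberedLine are hypothetical names, and an Iterator stands in for the file. Like a Spring Batch ItemReader, read() returns null when the input is exhausted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

final class NumberedLineReader {
    // A line together with its 1-based position in the input.
    static final class NumberedLine {
        final long lineNumber;
        final String content;

        NumberedLine(long lineNumber, String content) {
            this.lineNumber = lineNumber;
            this.content = content;
        }
    }

    private final Iterator<String> source;
    private long nextLineNumber = 1;

    NumberedLineReader(Iterator<String> source) {
        this.source = source;
    }

    // synchronized: reading a line and numbering it happen atomically,
    // so concurrent threads can never swap or duplicate line numbers.
    synchronized NumberedLine read() {
        if (!source.hasNext()) {
            return null; // end of input
        }
        return new NumberedLine(nextLineNumber++, source.next());
    }

    // Restores the original order on the output side by sorting on the id.
    static List<String> restoreOrder(List<NumberedLine> lines) {
        List<NumberedLine> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingLong((NumberedLine l) -> l.lineNumber));
        List<String> contents = new ArrayList<>();
        for (NumberedLine line : sorted) {
            contents.add(line.content);
        }
        return contents;
    }
}
```

The line number would have to travel with each message through the queue so that the second job can sort on it before writing the output file.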
I'm wondering how to increment a number "extracted" from a field in a CSV file, and then rewrite the file with the number incremented.
I need this counter in a tMap.
Is the design below a good way to do it?
EDIT: I'm trying a new method. See the design of my subjob below, but I get an error when I link the tJavaRow to my main tMap in the main job:
Exception in component tMap_1
java.lang.NullPointerException
at mod_file_02.file_02_0_1.FILE_02.tFileList_1Process(FILE_02.java:9157)
at mod_file_02.file_02_0_1.FILE_02.tRowGenerator_5Process(FILE_02.java:8226)
at mod_file_02.file_02_0_1.FILE_02.tFileInputDelimited_2Process(FILE_02.java:7340)
at mod_file_02.file_02_0_1.FILE_02.runJobInTOS(FILE_02.java:12170)
at mod_file_02.file_02_0_1.FILE_02.main(FILE_02.java:11954)
2014-08-07 12:43:35|bm9aSI|bm9aSI|bm9aSI|MOD_FILE_02|FILE_02|Default|6|Java
Exception|tMap_1|java.lang.NullPointerException:null|1
[statistics] disconnected
You should be able to do this mid flow in a tMap or a tJavaRow.
Simply read the number in as an integer (or other numeric data type) and then add your increment to it.
A really simple example might look like this:
Here we have a tFixedFlowInput that has some hard coded values for the job:
And we run it through a tMap where we add 1 to the age column:
And finally, we output it to the console in a table:
EDIT:
As Gabriele B has pointed out, this doesn't exactly work when reading and writing to the same flat file as Talend claims an exclusive read-write lock on the file when reading and keeps it open throughout the job.
Instead you would have to write the incremented data to some other place such as a temporary file, a database or even just to the buffer and then read that data in to a separate job which would then output the file you want and clean up anything temporary.
The problem with that is you can't do the output in the same process. I've just tried reading in the file in one child job, passing the data back to a parent job using a tBufferOutput, then passing that data to another child job as a context variable and trying to output to the file. Unfortunately the file lock remains, so you can't do this all in one self-contained job (even using a parent job and several child jobs).
If this sounds horrible to you (it is) and you absolutely need this to happen (I'd suggest a database table sounds like a better match for this functionality than a flat file) then you could raise a feature request on the Talend Jira for the tFileInputDelimited to not hold the file open or to not insist on an exclusive read-write lock on the file.
Once again, I strongly recommend that you move to using a database table for this because even without the file lock issue, this is definitely not the right use of a flat file and this use case perfectly fits a database, even something as lightweight as an embedded H2 database.