Spring Batch - Read X number of lines from each file at a time

I have a folder with .CSV files, one for each user.
I have to read the records from these files and make an HTTP request for each record.
Because of traffic issues at the downstream application, I cannot make calls in an arbitrary order or volume.
The limitation is that the downstream application can process at most 5 records of type File-1, 3 records of type File-2, and 1 record of type File-3 at a time.
So I have to select 5 records from File 1, 3 records from File 2, and 1 record from File 3, and process them (send the HTTP requests asynchronously in the item processor).
How can I do this with Spring Batch? For reading multiple files I can use a MultiResourceItemReader, but I am more worried about the logic of selecting N records from each file.
Thanks in advance.
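For illustration only (this is not from the thread), one way to get the 5/3/1 grouping is a custom reader that wraps one delegate reader per file and pulls up to its quota from each on every read() call; the class and field names below are made up, and the delegates would typically be FlatFileItemReaders:

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;

// Emits one "group" per read(): up to 5 items from file 1, 3 from file 2 and 1 from file 3.
public class QuotaGroupingReader<T> implements ItemReader<List<T>> {

    private final ItemReader<T> file1Reader;
    private final ItemReader<T> file2Reader;
    private final ItemReader<T> file3Reader;

    public QuotaGroupingReader(ItemReader<T> file1Reader, ItemReader<T> file2Reader, ItemReader<T> file3Reader) {
        this.file1Reader = file1Reader;
        this.file2Reader = file2Reader;
        this.file3Reader = file3Reader;
    }

    @Override
    public List<T> read() throws Exception {
        List<T> group = new ArrayList<>();
        drain(file1Reader, 5, group); // max 5 records of type File-1
        drain(file2Reader, 3, group); // max 3 records of type File-2
        drain(file3Reader, 1, group); // max 1 record of type File-3
        return group.isEmpty() ? null : group; // null tells Spring Batch the input is exhausted
    }

    private void drain(ItemReader<T> reader, int quota, List<T> group) throws Exception {
        for (int i = 0; i < quota; i++) {
            T item = reader.read();
            if (item == null) {
                break; // this file has no more records
            }
            group.add(item);
        }
    }
}

The processor would then receive one such group per read and could fan out its HTTP calls asynchronously. Since the delegates are no longer the step's reader, they would also need to be registered as streams on the step so they are opened and closed properly.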

Related

Do activities in the Data Flow count towards the Pipeline Activity limit of 40?

I essentially want two sinks for one output. It seems like I'll have to duplicate my pipeline and change the sink to the other sink location.
I'd like to avoid that as much as possible.
So, I have a pipeline with 30 Copy Activities. Plain and simple, source to sink.
If I changed those into Data flows which split the Sink between two different sources (using the new Branch feature), would that increase the count of activities or do Data flows count as 1 activity?
A Data Flow counts as one activity. But within a Data Flow activity, we can create more flows to copy data or do data conversion from source to sink.
We can create multiple sources for one sink, but only one sink per output; for now we can't achieve two sinks for one output.
The maximum of 40 activities per pipeline still applies, but Data Flow doesn't have source and sink limits. I just tested it and we can create more than 40 flows. That means we can create 40 Data Flows in one pipeline, and each Data Flow can contain more than 40 flows (source to sink).
Like you said, you have a pipeline with 30 Copy Activities; you have two ways to build the pipeline:
30 activities: copy activity 1 + copy activity 2 + copy activity 3 + ... + copy activity 30.
1 Data Flow activity: source1-->sink1, source2-->sink2, ..., source30-->sink30.
Data Factory doesn't charge by the number of activities; it only charges for the amount of data transferred and the resources you use in Data Factory.

How to run a Spring Batch job only after the currently running job completes

I have a list of records to process via a Spring Batch job. Each record has millions of data points; I want to process the records one after another, otherwise the database will not handle the load.
The data will look like this:
artworkList will contain 10 records, and each artwork record will contain 30 million data points.
I am using Spring Batch with the Quartz scheduler.
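As an aside (a sketch of my own, not from the question): with Quartz triggering the launches, one common guard is to ask the JobExplorer whether an execution of the same job is still running and to skip the launch if so. The class, job, and parameter names here are illustrative:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobLauncher;

public class SequentialJobLauncher {

    private final JobLauncher jobLauncher;
    private final JobExplorer jobExplorer;
    private final Job artworkJob;

    public SequentialJobLauncher(JobLauncher jobLauncher, JobExplorer jobExplorer, Job artworkJob) {
        this.jobLauncher = jobLauncher;
        this.jobExplorer = jobExplorer;
        this.artworkJob = artworkJob;
    }

    // Called from the Quartz trigger; only launches when no execution of this job is in flight.
    public void launchIfIdle(long artworkId) throws Exception {
        if (!jobExplorer.findRunningJobExecutions(artworkJob.getName()).isEmpty()) {
            return; // previous run is still busy, wait for the next trigger
        }
        jobLauncher.run(artworkJob, new JobParametersBuilder()
                .addLong("artworkId", artworkId)
                .toJobParameters());
    }
}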

Spring Batch - how to let the Writer know that it received the last entity running through the flow

I have a flow of Reader -> Processor -> writer
Every 50 million records the writer writes the data into a file and zips it.
The problem is that once the Reader has finished, the Writer still "holds" many records which are not written because it hasn't reached the 50 M records threshold.
Any advice on how to implement this so that the data is written to many files with 50 M records each, plus a single file with the remaining records?
If you use a MultiResourceItemWriter, you can use the chunk size to dictate how this should work. It can be configured to write at your specific threshold, and if there is a remainder in the final chunk, that will also be written out. You can read more about this useful delegate in the documentation here: https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/MultiResourceItemWriter.html
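A rough configuration sketch of that writer (my own example, assuming Spring Batch 4; the 50 M threshold comes from the question, while the item type, file path, and suffix are placeholders):

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.MultiResourceItemWriter;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.core.io.FileSystemResource;

public class MultiFileWriterConfig {

    public MultiResourceItemWriter<String> multiResourceWriter() {
        // Delegate that writes one file at a time; its resource is assigned by the MultiResourceItemWriter.
        FlatFileItemWriter<String> delegate = new FlatFileItemWriter<>();
        delegate.setName("recordFileWriter");
        delegate.setLineAggregator(new PassThroughLineAggregator<>());

        MultiResourceItemWriter<String> writer = new MultiResourceItemWriter<>();
        writer.setDelegate(delegate);
        writer.setResource(new FileSystemResource("output/records"));
        // Start a new file once 50 million items have been written; whatever is
        // left at the end of the step is flushed into the last (smaller) file.
        writer.setItemCountLimitPerResource(50_000_000);
        writer.setResourceSuffixCreator(index -> "." + index + ".txt");
        return writer;
    }
}

Note that the writer only checks the limit at chunk boundaries, so it rolls over to a new file on the first chunk that reaches or exceeds the 50 M count.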

Spring Batch - Chunk Processing

In my chunk processing, I read one value from a file, and in my processor I pass this value to the DB, which returns 4 records for that single value. I then return those 4 records to the writer, which writes them to the DB. I fail the job at the 3rd record returned for the value read from the file. But after the job fails, why are the 3 records not rolled back from the DB?
How does the chunk maintain the transaction? Is it based on the read count and write count of the records, or not?
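For background (not an answer from the thread): in a chunk-oriented step the commit interval defines the transaction boundary, so each chunk is read, processed, and written inside one transaction and is rolled back as a whole on failure, but only for resources enlisted with the step's transaction manager; updates made through a separate, auto-committed connection will not roll back. A minimal sketch, assuming Java config and illustrative item types:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ChunkStepConfig {

    @Bean
    public Step chunkStep(StepBuilderFactory steps,
                          ItemReader<String> reader,
                          ItemProcessor<String, String> processor,
                          ItemWriter<String> writer,
                          PlatformTransactionManager transactionManager) {
        return steps.get("chunkStep")
                // Commit interval of 10: each chunk of up to 10 items is read,
                // processed and written in a single transaction. A failure inside
                // the chunk rolls that whole chunk back, provided the DB work runs
                // through the transaction manager below.
                .<String, String>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .transactionManager(transactionManager)
                .build();
    }
}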

Spring Batch Integration - Multiple files as single Message

In the sample https://github.com/ghillert/spring-batch-integration-sample, the file inbound adapter is configured to poll a directory, and once a file is placed in that directory, a FileMessageToJobRequest is constructed and the Spring Batch job is launched.
So for each file, a new FileMessageToJobRequest is constructed and a new Spring Batch job instance is created.
We also want to configure a file inbound adapter to poll for the files, but we want to process all the files using a single batch job instance.
For example, if we place 1000 files in the directory and have max-messages-per-poll set to 1000, we want to send the names of the 1000 files as one of the parameters to the Spring Batch job instead of calling the job 1000 times.
Is there a way to send the list of files that the file inbound adapter picked up during one poll as a single message to the subsequent Spring components?
Thank You,
Regards
Suresh
Even if it is a single poll, the inbound-channel-adapter emits a separate message for each entry.
So, to collect them into a single message you need to use an <aggregator>.
With that, though, you have to come up with a ReleaseStrategy. Even if you can just use 1 as the correlationKey, there is still the question of when to release the group.
You'll agree that you won't always have exactly 1000 files there to group into a single message. So maybe a TimeoutCountSequenceSizeReleaseStrategy is a good compromise: it emits the result after some timeout, even if there aren't enough files to complete the group by size.
HTH
UPDATE
You can consider using group-timeout on the <aggregator> to allow groups to be released even if no new message arrives during that period.
In addition, there is an expire-groups-upon-completion option to make sure that your "single" group is cleared and removed after each release, allowing a new group to form for the next poll.
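As a rough illustration (my own sketch, not code from the answer or the linked sample), the same idea in the Java DSL, assuming Spring Integration 5: a constant correlation key groups every file from a poll, group-timeout releases a partial group after a quiet period, and expire-groups-upon-completion lets the next poll start a fresh group. The directory, timings, and channel name are placeholders:

import java.io.File;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;

@Configuration
public class FileAggregationFlow {

    @Bean
    public IntegrationFlow filesToSingleMessage() {
        return IntegrationFlows
                .from(Files.inboundAdapter(new File("/path/to/inbox")),
                        e -> e.poller(Pollers.fixedDelay(5000).maxMessagesPerPoll(1000)))
                .aggregate(a -> a
                        .correlationStrategy(message -> 1)     // one group for all files
                        .groupTimeout(10_000)                  // release after a quiet period...
                        .sendPartialResultOnExpiry(true)       // ...even if the group is "incomplete"
                        .expireGroupsUponCompletion(true))     // clear the group so the next poll starts fresh
                // downstream: the payload is a List<File> that can be turned into a single JobLaunchRequest
                .channel("jobRequests")
                .get();
    }
}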