How to read multiple items in ItemReader - spring-batch

Following is my use case for Spring Batch:
1. Read the input from a web service. The web service will return all records.
2. Process the records.
3. Write the processed records one by one.
I'm clear about steps 2 and 3, but I'm not able to figure out how to implement a reader that reads all the records in one go. How do I pass the records one by one to the item processor/writer?
Should I be using a tasklet instead of a reader/writer?

What will your web service return? A collection of objects, I guess!
Your ItemReader needs to loop over this collection, handing items out one by one, and then return null once they have all been processed.
As @Kik was saying, the rest is handled by Spring Batch based on your commit-interval: if you have a commit-interval of 10, for example, your reader will read 10 items, those 10 items will be passed to the ItemProcessor, and then they will be passed on to the writer.
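For example, a minimal sketch of a chunk-oriented step with a commit-interval of 10 (the bean names and the Record type are hypothetical placeholders):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WsStepConfig {

    @Bean
    public Step wsStep(StepBuilderFactory steps,
                       ItemReader<Record> myReader,
                       ItemProcessor<Record, Record> myProcessor,
                       ItemWriter<Record> myWriter) {
        return steps.get("wsStep")
                .<Record, Record>chunk(10) // commit-interval: 10 reads per chunk
                .reader(myReader)
                .processor(myProcessor)
                .writer(myWriter)
                .build();
    }
}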
Hope this clarifies.
EDIT: In Spring Batch you have more than one option to do what you need.
Easy option: create a custom MyWsItemReader that implements the ItemReader interface.
- Define a method init() in this class that will call your web service and put the results in a collection attribute of MyWsItemReader.
- Implement the read() method from the interface (read the contract in the doc carefully: you must return null once you have passed all the elements of the collection).
- Then configure a step listener around the step and implement the beforeStep() method to call the init() of your MyWsItemReader. You can autowire the reader into the listener to accomplish this.
Alternatively, your MyWsItemReader could implement InitializingBean. You would then implement afterPropertiesSet(), where you could call the web service and store the result in a private attribute of MyWsItemReader.
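A minimal sketch of that easy option, assuming a hypothetical WsClient and Record type in place of your real web service client and domain object:

import java.util.Iterator;

import org.springframework.batch.item.ItemReader;

public class MyWsItemReader implements ItemReader<Record> {

    private final WsClient wsClient; // hypothetical web service client
    private Iterator<Record> records;

    public MyWsItemReader(WsClient wsClient) {
        this.wsClient = wsClient;
    }

    // Call this from the step listener's beforeStep(), or from
    // afterPropertiesSet() if you implement InitializingBean instead.
    public void init() {
        this.records = wsClient.fetchAllRecords().iterator();
    }

    @Override
    public Record read() {
        // Per the ItemReader contract: return null once all elements are consumed.
        return (records != null && records.hasNext()) ? records.next() : null;
    }
}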
regards

Related

Fetch and maintain reference data at Job level in Spring Batch

I am configuring a new job where I need to read data from the database, and in the processor the data will be used to call a REST endpoint with a payload. Along with the dynamic data, the payload needs to include reference data that is constant for every record processed in the job. This reference data is stored in the DB. I am thinking of implementing one of the following approaches:
1. In the beforeJob listener method, make a DB call, populate the reference data object, and use it for the whole job run.
2. In the processor, make a DB call to get the reference data and cache the query so there will be no DB call to fetch the same data for each record.
Please suggest whether these approaches are correct or if there is a better way to implement this in Spring Batch.
For performance reasons, I would not recommend doing a DB call in the item processor, unless that is really a requirement.
The first approach seems reasonable to me, since the reference data is constant. You can populate/clear a cache with a JobExecutionListener and use the cache in your chunk-oriented step. Please refer to the following thread for more details and a complete sample: Spring Batch With Annotation and Caching.
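To illustrate the first approach, a minimal sketch of such a listener, assuming a hypothetical ReferenceDataDao and a simple shared map as the cache:

import java.util.Map;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class ReferenceDataListener implements JobExecutionListener {

    private final ReferenceDataDao dao;       // hypothetical DAO for the reference data
    private final Map<String, Object> cache;  // shared with the item processor

    public ReferenceDataListener(ReferenceDataDao dao, Map<String, Object> cache) {
        this.dao = dao;
        this.cache = cache;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // One DB call for the whole run; the processor only reads from the cache.
        cache.put("referenceData", dao.loadReferenceData());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        cache.clear(); // release the reference data once the job is done
    }
}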

How to add a list of Steps to a Job in Spring Batch

I'm extending an existing Job. What I need to do is update a list of records from the database with data fetched from an external service. I don't know how to do it in a loop, so I thought about creating a list of Steps, each consisting of a reader, a processor and a writer, and simply adding them with the next() method of the jobBuilder. Looking at the documentation, it's only possible to add one Step at a time, and I have several thousand rows in the database, thus several thousand Steps. How should I do this?
edit:
In short I need to (one possible sketch follows this list):
1. read a list of ids from the DB,
2. for every id, call an external service to get the information relevant to that id,
3. process the data from it,
4. save the updated row to the DB.
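For illustration, a hedged sketch of one way this is often done in Spring Batch: a single chunk-oriented step where the reader streams the ids, the processor calls the external service per id, and the writer saves the updated rows. ExternalService, UpdatedRow and the table/column names are made-up placeholders:

import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EnrichStepConfig {

    @Bean
    public Step enrichStep(StepBuilderFactory steps, DataSource dataSource,
                           ExternalService externalService) {
        JdbcCursorItemReader<Long> reader = new JdbcCursorItemReaderBuilder<Long>()
                .name("idReader")
                .dataSource(dataSource)
                .sql("SELECT id FROM records")
                .rowMapper((rs, rowNum) -> rs.getLong("id"))
                .build();

        // One remote call per id; UpdatedRow is a placeholder with id/details properties.
        ItemProcessor<Long, UpdatedRow> processor = id -> externalService.fetchDetails(id);

        JdbcBatchItemWriter<UpdatedRow> writer = new JdbcBatchItemWriterBuilder<UpdatedRow>()
                .dataSource(dataSource)
                .sql("UPDATE records SET details = :details WHERE id = :id")
                .beanMapped()
                .build();

        return steps.get("enrichStep")
                .<Long, UpdatedRow>chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}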

How do I create a custom Kafka consumer interceptor to intercept one record at a time?

I'm trying to add a custom consumer interceptor using org.apache.kafka.clients.consumer.ConsumerInterceptor.
The problem I have here is that the onConsume() method takes ConsumerRecords<String, Object> records, but I'm looking to intercept only one record at a time instead of a batch of records.
How do I do that? Please suggest.
I was able to achieve this by implementing the RecordInterceptor interface from spring-kafka-2.7.7-RELEASE without changing any settings on the Kafka side.
This new interface was added last week; you can see the details here if you are interested:
https://github.com/spring-projects/spring-kafka/issues/1118
As per the ConsumerInterceptor interface implementation, records come in a batch inside the onConsume() method. You can iterate over these records and perform the required business operation.
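For instance, a minimal sketch where the per-record logic is simply a loop over the batch (the loop body is where your business operation would go):

import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class PerRecordConsumerInterceptor implements ConsumerInterceptor<String, Object> {

    @Override
    public ConsumerRecords<String, Object> onConsume(ConsumerRecords<String, Object> records) {
        // The client hands over a batch; iterate to handle one record at a time.
        for (ConsumerRecord<String, Object> record : records) {
            // per-record business operation goes here
        }
        return records;
    }

    @Override
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) { }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}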
If you use Spring:
You can use org.springframework.kafka.listener.RecordInterceptor, whose method
public ConsumerRecord<Object, Object> intercept(ConsumerRecord<Object, Object> record)
intercepts one record at a time.
Then, if you're going to change the record after sending, you can use org.apache.kafka.clients.producer.ProducerInterceptor.
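A minimal sketch of the RecordInterceptor approach (the logging is purely illustrative):

import org.apache.kafka.clients.consumer.ConsumerRecord;

import org.springframework.kafka.listener.RecordInterceptor;

public class LoggingRecordInterceptor implements RecordInterceptor<String, Object> {

    @Override
    public ConsumerRecord<String, Object> intercept(ConsumerRecord<String, Object> record) {
        // Invoked once per record, before the listener receives it.
        System.out.println("Intercepted " + record.topic() + "-" + record.partition()
                + "@" + record.offset());
        // Return null to skip the record, or the (possibly modified) record to proceed.
        return record;
    }
}

You then register it on the listener container or container factory, e.g. via setRecordInterceptor.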

Spring Batch ItemReader to iterate over a REST API call

I have a Spring Batch job which needs to fetch details from a REST API call and process that data on my side. My REST API call will have mainly the below parameters:
StartingIdNumber (offset)
PageSize (limit)
PS: StartingIdNumber serves the same purpose as a row number or "offset" in this particular API. The API response results are sorted by IdNumber, so by specifying a StartingIdNumber, the API will in turn perform a "where IdNumber >= StartingIdNumber order by IdNumber limit pageSize" in its DB query.
It will return the given number of user details; I need to iterate through all the ids by changing the StartingIdNumber parameter on each request.
I have looked at the existing ItemReader implementations in the Spring Batch framework, which read from a database or XML etc., but I didn't come across any reader that helps in my case. Please suggest a way to iterate through the user details as specified above.
Note: If I write my own custom item reader, I have to take care of preserving state (the last processed "StartingIdNumber"), which is proving challenging for me.
Does implementing ItemStream serve my purpose? Or is there any better way?
Implementing the ItemStream interface and writing my own custom reader served my purpose. It is now stateful, as I required. Thanks.
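For reference, a minimal sketch of what such a stateful reader can look like; RestClient, UserDetail and getIdNumber() are placeholders for the real API client and response type. The ItemStream callbacks keep StartingIdNumber in the ExecutionContext so a restart resumes where the last run stopped:

import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;

public class PagedRestItemReader implements ItemReader<UserDetail>, ItemStream {

    private static final String CONTEXT_KEY = "startingIdNumber";

    private final RestClient restClient;  // hypothetical REST client
    private final int pageSize;
    private long startingIdNumber = 0;
    private Iterator<UserDetail> page;

    public PagedRestItemReader(RestClient restClient, int pageSize) {
        this.restClient = restClient;
        this.pageSize = pageSize;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // On restart, resume from the last saved StartingIdNumber.
        if (executionContext.containsKey(CONTEXT_KEY)) {
            startingIdNumber = executionContext.getLong(CONTEXT_KEY);
        }
    }

    @Override
    public void update(ExecutionContext executionContext) {
        // Called at each chunk boundary; persists the offset for restartability.
        executionContext.putLong(CONTEXT_KEY, startingIdNumber);
    }

    @Override
    public void close() { }

    @Override
    public UserDetail read() {
        if (page == null || !page.hasNext()) {
            List<UserDetail> next = restClient.fetch(startingIdNumber, pageSize);
            if (next.isEmpty()) {
                return null; // input exhausted, end of step
            }
            page = next.iterator();
        }
        UserDetail item = page.next();
        startingIdNumber = item.getIdNumber() + 1; // next request starts after this id
        return item;
    }
}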

How to make ItemReader read 2 tables

I have to create a batch job to do financial reconciliation. Right now I have 3 steps:
Step 1: read an XML file from the third party, convert it into our domain objects, write to DB (table 1).
Step 2: read a flat file from our transactions datastore, write to DB (table 2).
Step 3: read both table 1 and table 2 into an aggregator object, process both lists to find the differences and set a status code, write the status code to table 2.
My problem is with step 3. I can't find a good solution for having my ItemReader read from 2 SQL queries.
I started with a custom ItemReader like this:
package batch.concilliation.readers;

import java.util.List;

import org.apache.log4j.Logger;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component("conciliationReader")
public class TransactionReader implements ItemReader<TransactionsAgragegator> {

    private final Logger log = Logger.getLogger(TransactionReader.class);

    @Autowired
    private ConciliationContext context;
    @Autowired
    private ServiceSommaireConciliation serviceTransactionThem;
    @Autowired
    private ServiceTransactionVirement serviceTransactionUs;

    @Override
    public TransactionsAgragegator read() throws Exception, UnexpectedInputException,
            ParseException, NonTransientResourceException {
        TransactionsAgragegator aggregator = new TransactionsAgragegator();
        SommaireConciliationVirementInterac sommaire =
                serviceTransactionThem.findByRunNo(context.getRunNo());
        List<TransactionVirement> journalSic = serviceTransactionUs
                .findByTimestamp(sommaire.getBeginDate(), sommaire.getEndDate());
        // put both lists into the aggregator object
        aggregator.setListeTransactionThem(sommaire.getPayments());
        aggregator.setListeTransactionsUs(journalSic);
        return aggregator;
    }
}
This reader uses two services already implemented (DAOs) that read both tables and return domain objects. I take the two lists of transactions, ours and theirs, and put them in an aggregator object. This object would be passed to the ItemProcessor, and I could do my business logic... but this reader starts an infinite loop, since it will never return null.
I read about ItemReaderAdapter, but I still have the same problem of looping over a collection until I get a null.
So in summary, I want to read 2 different tables and get 2 lists:
List<TransactionThirdParty>
List<TransactionHome>
Then my ItemProcessor would check whether both lists are equal, whether one has more or fewer transactions than the other, etc.
Can any Spring Batch expert suggest something?
The problem here is that your first two steps are chunk-oriented but the third one is not. While the first two may have the usual read-process-write cycle, the third step, although dependent on the first two, is a one-time operation. It is no different than copying a file in the batch domain.
So you should not use the ItemReader approach here, because you do not have an exit criterion (that is why you never get null from the reader: it cannot know when the source is exhausted, since it does not deal with a line or a record).
That is where the TaskletStep helps:
The Tasklet is a simple interface that has one method, execute, which will be called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure.
So implement your third step as a Tasklet instead of the chunk-oriented way.
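A minimal sketch of step 3 as a Tasklet, reusing the services from the reader above (the comparison logic is left as a placeholder):

import java.util.List;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component("conciliationTasklet")
public class ConciliationTasklet implements Tasklet {

    @Autowired
    private ConciliationContext context;
    @Autowired
    private ServiceSommaireConciliation serviceTransactionThem;
    @Autowired
    private ServiceTransactionVirement serviceTransactionUs;

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        SommaireConciliationVirementInterac sommaire =
                serviceTransactionThem.findByRunNo(context.getRunNo());
        List<TransactionVirement> journalSic = serviceTransactionUs
                .findByTimestamp(sommaire.getBeginDate(), sommaire.getEndDate());

        // compare the two lists, set the status codes and persist them here

        // Returning FINISHED ends the step after this single pass - the exit
        // criterion that the ItemReader approach was missing.
        return RepeatStatus.FINISHED;
    }
}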