Passing data from itemreader to processor

Passing data from itemreader to processor - spring-batch

How is the data read is passed from reader to Itemprocessor in Spring batch? Is there a queue where it is put from ItemReader's read method which is consumed by ItemProcessor? I have to read 10 records at a time from a database and process 5 at a time in the ItemProcessor's process method. ItemProcessor is taking the records one by one and I want to change it to 5 records at a time in the process method.

Every item that is returned from the read method of a reader will be forwarded to the processor as one item.
If you want to collect a group of items and pass them as a group to the processor, you need a reader that groups them.
You could implement something like a group-wrapper.
I explained such an approach in another answer: Spring Batch Processor

Related

How to process List of items in Spring batch using Chunk based processing| Bulk processing items in Chunk

I am trying to implement a Spring batch job where in order to process a record , it require 2-3 db calls which is slowing down the processing of records(size is 1 million).If I go with chunk based processing it would process each record separately and would be slow in performance. So, I need to process 1000 records in one go as bulk processing which would reduce the db calls and performance would increase. But my question is If I implement Tasklet then I would lose the functionality of restartability and retrial/skip features too and if implemented using AggregateInputReader I am not sure what would be the impact on restartability and transaction handling.
As per the below thread AggregateReader should work but not sure its impact on transaction handling and restartability in case of failure:
Spring batch: processing multiple record at once

The first extension point in the chunk-oriented processing model that gives you access to the list of items to be written is the ItemWriteListener#beforeWrite(List items). So if you do not want to enrich items one at a time in an ItemProcessor, you can use that listener to do the enrichment for the entire chunk at once.

How to schedule periodical task based on number of processed messages?

I want to use Kafka Processor API to process messages from Kafka.
I would like to call some periodically function - something like:
context.schedule(IntervalMS,punctuationType, somePunctuator), where somePunctuator perform some periodical job, but instead using interval time as trigger I would like to invoke that task after processing some number of messages
Is it possible do such triggering in Kafka streams?

yes, it's possible with using Kafka Streams State Store.
logic depends on what exactly you need to do on reaching the number of processed messages.
if you need to propagate data to the next processor or sink node, you need to store aggregated values as a list of objects inside key-value state store. inside Processor.process(..) you put data into key-value store, and after that check whether number of items reached limit, and do required logic (like processorContext.forward(..)). please take a look at similar example here.
if you need to do some logic after reaching number and don't need values, you could store only counter, and inside Processor.process(..) increment this value.

In spring batch, how to mark a record a skipped record (without retry) during the writing phase

Spring batch has facility to provide the declarative skip policy (i.e. skippable-exception-classes) to state that the particular record needs to be skipped in the batch processing.
This is quite straight forward in case of ItemReader and ItemProcessor (as they operate record by record basis).
However in case of ItemWriter, when the writing of the record fails (because of the DB Constraint violation), I want to skip that record and let other records go through.
As far as I have researched, I can implement this in two ways,
1) Throw the skippable exception, and Spring Batch will start retry operation with one item per batch, and so if the original batch size is 1000, then the batch will call the writer (and processor if it's transactional) 1000 times (once for each record) and record the skipCount for such item which fails with skip exception (which is most probably the same item which had failed in normal operation)
2) ItemWriter catches the SQLException, and resumes the processing the next record till the end of the items list.
The 2nd approach has a problem of losing the statistics about how many records did not go through (i.e. skipped records) and the batch will record all the items are successfully written and hence update the write count with improper value.
The 1st approach is a little bit tricky in my use-case as it involves re-execution of all the items (on DB side we have complex SPs + triggers) and therefore unnecessarily takes more time.
I am looking for some legal alternative to retry to just record the skipped record count during writing phase.
If none, I will go for the 1st option.
Thanks !

This specifies after how many executions of writer the transaction is commited.
<chunk ... commit-interval="10"/>
As you want to skip all the items that fail while persisted to DB you need commit-interval to be 1 in order to actually persist the good items and not be rolled back along a bad one.
Assuming the reader sends only one item to the processor (and not the list of 1000) reader, processor and writer get executed in order for each item. In this case option 2) is not useful as writer receives only one item always.
You can control how the skip count is incremented by calling StepContribution.html#incrementWriteCount and other increment*Count methods from this class.

How to design spring batch to avoid long queue of request and restart failed job

I am writing a project that will be generating reports. It would read all requests from a database by making a rest call, based on the type of request it will make a rest call to an endpoint, after getting the response it will save the response in an object and save it back to the database by making a call to an endpoint.
I am using spring-batch to handle the batch work. So far what I came up with is a single job (reader, processor, writer) that will do the whole things. I am not sure if this is the correct design considering
I do not want to queue up requests if some request is taking a long time to get a response back. [not sure yet]
I do not want to hold up saving response until all the responses are received. [using commit-internal will help]
If the job crashes for some reason, how can I restart the job [maybe using batch-admin will help but what are my other options]

By using chunk oriented processing Reader, Processor and Writer get executed in order until Reader has nothing to return.
If you can read one item at a time, process it and send it back to the endpoint that handles the persistence this approach is handy.
If you must read ALL the information at once the reader will get a big collection with all items and pass it to processor. The processor will process all the items and send the result to the writer. You cannot send just a few to the writer so you would have to do the persistence directly from processor and that would be against the design.
So, as I understand this, you have two options:
Design a reader that can read one item at a time. Use the chunk oriented processing that you already started to read one item, process it and send it back for persistence. Have a look at how other readers are implemented (like JdbcCursorItemReader).
You create a tasklet that reads the whole collection of items process it and sends them back for processing. You can break this in different tasklets.
commit-interval only controls after how many items transaction is commited. So it will not help you as all the processing and persistence is done by calling rest services.

I have figured out a design and I think it will work fine.
As for the questions that I asked, following are the answers:
Using asynchronous processors will help avoiding any queue.
http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#asynchronous-processors
using commit-internal will solve it
This thread has the answer - Spring batch :Restart a job and then start next job automatically

Clarification needed with Spring batch concepts

I am new at Spring batch and I am having an issue implementing my business use case with Spring batch.
Basically, I am reading data from a database i.e. a List of subscribers to a newsletter. I then need to send an email to each subscriber as well as to insert data into the database in order to know which subscriber the email was sent to.
I use an ItemProcessor implementation whose process method returns a MimeMessage and takes a subscriber as an argument; the writer associated with this processor is of type: org.springframework.batch.item.mail.javamail.MimeMessageItemWriter.
The issue is that I need another writer for the database inserts (possibly using a CompositeItemWriter) that takes a List of subscribers as an argument and all I have as input is a MimeMessage from the above ItemProcessor.
Can anyone please help?

From what you've said using the ItemProcessor interface to save the message to the database is conceptually not right. You need to use ItemWriter for that. You can implement writing to DB as ItemWriter and sending the mail message as ItemWriter and use CompositeItemWriter to combine them.
Subscriber is passed to these item writers.
The transformation of Subscriber to MimeMessage is done by 2nd writer internally before transferring to MimeMessageItemWriter (which is aggregated by this writer).
Sending the message to subscriber should be done after saving to the DB, as DB can be rolled back if something goes wrong with sending the message (if you need that functionality), and your batch size should be 1 (otherwise rollback will wrongly discard all notifications which have been successfully sent).