Passing an Object from LineCallbackHandler to ItemWriter - spring-batch

I am writing a batch application which reads a file line by line, processes the content, and writes to a database. I am using FlatFileItemReader for reading from the file.
The first line in the file is special (a header), which is skipped using linesToSkip and processed using a LineCallbackHandler (HeaderHandler). The HeaderHandler builds a cache from the header information.
Now I want to make use of this cache within my ItemWriter. I am not sure how to pass the cache object I build within HeaderHandler to my ItemWriter. Is there a clean way of doing this?

You have at least two possibilities:
1. Use the StepExecutionContext as temporary storage for your data. Take a look at the Spring Batch docs on passing data to future steps (just ignore the promotionListener part): your HeaderHandler writes the cache into the context and the writer pulls it back out, as in the sketch below.
2. Use a Spring bean, either a custom data class or a String, Map, etc., which is injected into both the HeaderHandler and the (custom) ItemWriter; both then use it accordingly.
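A minimal sketch of the first option, assuming Spring Batch 4.x-era APIs and a plain Map as the cache; the names headerCache, parseHeader, and MyItem are illustrative, not from the original post. Both beans must be registered as listeners on the step for @BeforeStep to fire. Note that the writer fetches the cache lazily inside write(): the header is only read after the step has started, so it is not yet in the context when beforeStep runs.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.LineCallbackHandler;

class MyItem { } // placeholder for your domain type

class HeaderHandler implements LineCallbackHandler {

    private StepExecution stepExecution;

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public void handleLine(String headerLine) {
        // build the cache from the header line and expose it to the step
        Map<String, String> cache = parseHeader(headerLine);
        stepExecution.getExecutionContext().put("headerCache", cache);
    }

    private Map<String, String> parseHeader(String headerLine) {
        return new HashMap<>(); // your header parsing logic goes here
    }
}

class CacheAwareWriter implements ItemWriter<MyItem> {

    private StepExecution stepExecution;

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void write(List<? extends MyItem> items) {
        // fetch lazily: the header has been read by the time the first chunk arrives
        Map<String, String> cache = (Map<String, String>)
                stepExecution.getExecutionContext().get("headerCache");
        // write the items to the database, consulting the cache as needed
    }
}

The second option is even simpler to wire: define the cache as a Spring bean and inject it into both components; the trade-off is that the two classes now share a mutable object outside the step's managed context.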

Related

Fetch and maintain reference data at Job level in Spring Batch

I am configuring a new job where I need to read data from the database; in the processor, the data will be used to call a REST endpoint with a payload. Along with the dynamic data, the payload needs to include reference data that is constant for each record processed in the job. This reference data is stored in the DB. I am considering the following approaches:
1. In the beforeJob listener method, make a DB call, populate the reference data object, and use it for the whole job run.
2. In the processor, make a DB call to get the reference data and cache the query so there is no DB call to fetch the same data for each record.
Please suggest whether these approaches are correct or if there is a better way to implement this in Spring Batch.
For performance reasons, I would not recommend doing a DB call in the item processor, unless that is really a requirement.
The first approach seems reasonable to me, since the reference data is constant. You can populate/clear a cache with a JobExecutionListener and use the cache in your chunk-oriented step. Please refer to the following thread for more details and a complete sample: Spring Batch With Annotation and Caching.
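A minimal sketch of that listener approach, assuming a plain in-memory Map as the cache; the reference_data table and its columns are hypothetical. Register the listener on the job and inject the same bean (or its map) into your processor:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.sql.DataSource;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.jdbc.core.JdbcTemplate;

class ReferenceDataListener implements JobExecutionListener {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final JdbcTemplate jdbcTemplate;

    ReferenceDataListener(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // one DB call for the whole run; reference_data is a hypothetical table
        jdbcTemplate.query("SELECT ref_key, ref_value FROM reference_data", rs -> {
            cache.put(rs.getString("ref_key"), rs.getString("ref_value"));
        });
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        cache.clear(); // release the reference data once the job is done
    }

    Map<String, String> getCache() {
        return cache;
    }
}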

Spring Batch: reuse existing service as a reader

I want to reuse an existing, transactional, paginated service class, which retrieves the items using JPA from a database, inside a Spring Batch job, as a reader. I want to do that instead of using the JpaPagingItemReader directly, basically because the JPA query is more complex to build and the service already provides this functionality.
My question is what I should take into account when developing the Spring Batch adapter over this service. Although the reference documentation http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#pagingItemReaders has a section on reusing existing services, it doesn't say anything about the constraints, if there are any, of using such a transactional service.
Now, I looked at the JpaPagingItemReader as an example for building the reader, and I came up with a couple of questions I couldn't find answers for, neither in the documentation nor on Stack Overflow, although this post https://stackoverflow.com/a/26549831/4473261 helped.
The first thing I noticed is that the JpaPagingItemReader uses a new transaction for reading a page of data. The above post says that this new transaction is needed "so that features like retry and skip can be correctly performed". I have also found this article on the matter, https://blog.codecentric.de/en/2012/03/transactions-in-spring-batch-part-3-skip-and-retry/, which says that "when a skippable exception occurs during reading, we just increase the skip count and keep the exception for a later call on the onSkipInRead method of the SkipListener, if configured. There's no rollback".
So I assume that the reader has to do any reading of the records in a new transaction so that, if the transaction started when the processing of the chunk began is rolled back, the reader is not affected. I am wondering if this is true, and if in this case my adapter should create a new transaction, invoke the service inside that transaction, and then commit the transaction, similarly to how the JpaPagingItemReader does it. If that's true, though, I wonder why the framework doesn't provide a template which creates the transaction, delegates the actual data retrieval to the service, and then commits the transaction.
Greetings,
Cristi
From a reader perspective, there really isn't much to be concerned about. You can see in our JmsItemReader, which obviously works with a transactional store, that we don't take any additional precautions within the ItemReader itself.
What really matters is how you configure your step. When configuring your step, you'll need to mark the reader as transactional so that Spring Batch handles rollback correctly. When Spring Batch reads items in a fault-tolerant step, the default behavior is to buffer them so that they won't be re-read on failure (retry, skip, etc.). However, since items read from a transactional store are tied to the transaction (and therefore reset when a rollback occurs), you need to tell Spring Batch not to buffer the items as they are read.
To mark the ItemReader as transactional, set the not-quite-well-named flag is-reader-transactional-queue to true. You can read more about configuring steps and transactions in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html
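In Java config, the builder equivalent of that flag appears to be readerIsTransactionalQueue() on the chunk step builder. A minimal sketch, assuming the Spring Batch 4.x Java DSL inside a @Configuration class; MyEntity, serviceBackedReader(), and itemWriter() are hypothetical:

// inside a @Configuration class, with a StepBuilderFactory available
@Bean
public Step serviceReadingStep(StepBuilderFactory steps) {
    return steps.get("serviceReadingStep")
            .<MyEntity, MyEntity>chunk(10)
            .reader(serviceBackedReader())
            .writer(itemWriter())
            .readerIsTransactionalQueue() // do not buffer items across rollbacks
            .faultTolerant()              // enables the retry/skip machinery
            .build();
}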

How to read multiple items in ItemReader

The following is my use case for Spring Batch:
1. Read the input from a web service. The web service will return all records.
2. Process the records.
3. Write the processed records one by one.
I'm clear about steps 2 and 3, but I can't figure out how to implement a reader which reads all the records in one go. How do I pass the records one by one to the item processor/writer?
Should I be using a tasklet instead of a reader/writer?
What does your web service return? A collection of objects, I guess!
Your ItemReader needs to loop over this collection, removing items one by one, and return null when they have all been processed.
What @Kik was saying is that the rest is handled by Spring Batch based on your commit-interval: with a commit-interval of 10, for example, your reader reads 10 items one at a time, each one is passed to the ItemProcessor, and the resulting chunk of 10 is then passed to the writer.
Hope that clarifies things.
EDIT: In Spring Batch you have more than one option to do what you need.
The easy option is to create a custom MyWsItemReader that implements the ItemReader interface:
- Define a method init() in this class that calls your web service and puts the results in a collection attribute of MyWsItemReader.
- Implement the read() method from the interface (read the contract in the docs carefully: you must return null once you have handed out all the elements of the collection).
- Then configure a StepListener around the step and implement the beforeStep() method to call the init() of your MyWsItemReader. You can autowire the reader into the listener to accomplish this. A sketch of this option follows below.
Alternatively, your MyWsItemReader could implement InitializingBean. You would then implement afterPropertiesSet(), where you could call the web service and store the result in a private attribute of MyWsItemReader.
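Here is a minimal sketch of the first option; MyWsClient and MyRecord are hypothetical placeholders for your web service client and record type:

import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ItemReader;

class MyRecord { } // placeholder for your record type

interface MyWsClient { // placeholder for your web service client
    List<MyRecord> fetchAllRecords();
}

class MyWsItemReader implements ItemReader<MyRecord> {

    private final MyWsClient wsClient;
    private Iterator<MyRecord> iterator;

    MyWsItemReader(MyWsClient wsClient) {
        this.wsClient = wsClient;
    }

    // call this from a StepExecutionListener's beforeStep(), as described above
    public void init() {
        this.iterator = wsClient.fetchAllRecords().iterator();
    }

    @Override
    public MyRecord read() {
        // per the ItemReader contract, return null once everything has been handed out
        return (iterator != null && iterator.hasNext()) ? iterator.next() : null;
    }
}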
regards

Executing Spring Batch ItemReader in a loop and pass IN parameters

I am new to Spring Batch. I have the following question.
I am using Spring Batch to develop a batch process.
I have a Java array with some 'process_id' values in it. What I want to do is, for each 'process_id', call a database stored procedure using an ItemReader. Can anyone help me write an ItemReader to achieve this?
Thanks for your help.
You should think about your reader as designed:
The reader should provide the ID collection.
It will pass each ID to a writer that will call your stored procedure.
The goal of the reader is to find the data to process.
Each item is then sent to a writer (or a processor, depending on your batch design); see the sketch below.
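A minimal sketch of that design, assuming a ListItemReader supplies the IDs and the writer calls the procedure through SimpleJdbcCall; the procedure name process_record and its p_process_id parameter are hypothetical:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import javax.sql.DataSource;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.jdbc.core.simple.SimpleJdbcCall;

class ProcessIdWriter implements ItemWriter<Long> {

    private final SimpleJdbcCall procedureCall;

    ProcessIdWriter(DataSource dataSource) {
        this.procedureCall = new SimpleJdbcCall(dataSource)
                .withProcedureName("process_record"); // hypothetical procedure
    }

    @Override
    public void write(List<? extends Long> processIds) {
        // one stored-procedure call per ID in the chunk
        for (Long id : processIds) {
            procedureCall.execute(Collections.singletonMap("p_process_id", id));
        }
    }
}

// The reader simply hands the IDs out one by one:
// ItemReader<Long> reader = new ListItemReader<>(Arrays.asList(101L, 102L, 103L));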

HSQLDB and in-memory files

Is it possible to set up HSQLDB in such a way that the files with the db information are written into memory instead of using actual files? I want to use HSQLDB to export some data structures together with Hibernate mappings. It is, however, not possible to write temporary files, so I need to generate the files in-memory and return a stream with their contents as a response.
Setting HSQLDB to use nio does not seem to be a solution, because there is no way to get hold of those files before they are written to the filesystem.
What I'm thinking of is a protocol handler for hsqldb, but I haven't found a suitable solution yet.
To describe it in other words: a hack solution would be to pass hsqldb a stream, or several streams. It would then, during its operation, write data into those streams. After all the data is written, the user of the db could use those streams to send it back over the network.
Yes, of course; we use it all the time for integration testing.
Use this as the URL: jdbc:hsqldb:mem:aname
See here for more details.
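For example, a plain JDBC connection; the database lives entirely in memory and disappears with the JVM:

// no files are created; "aname" identifies the in-memory database
Connection jdbcConnection = DriverManager.getConnection("jdbc:hsqldb:mem:aname", "sa", "");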
DbUnit offers a handy database dump method as part of its package:
import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;

// database connection
Class driverClass = Class.forName("org.hsqldb.jdbcDriver");
Connection jdbcConnection = DriverManager.getConnection("jdbc:hsqldb:sample", "sa", "");
IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);

// full database export
IDataSet fullDataSet = connection.createDataSet();
FlatXmlDataSet.write(fullDataSet, new FileOutputStream("full.xml"));
See the DbUnit FAQ for more details. Of course there are routines to restore the data, as that is actually the purpose of the package: preparing a test database for integration testing. Usually we do this with an annotation, but you'll have to use the API for that.