Spring Batch : CompositeItemWriter, #BeforeStep and Controlling StepExecution - spring-batch

I've stumbled upon a pretty twisted issue with a spring batch recently.
Requirements are as follows :
I have two main steps :
The first one reads some data from an oracle database, from a table to write to another table.
The second one does some other database stuff, based upon a data handled on first step.
From a design standpoint, first step looks like this :
#Bean
public Step myFirstStep(JdbcCursorItemReader<Revision> reader) {
return stepBuilderFactory.get("my-first-step")
.<Revision, Revision>chunk(1)
.reader(readerRevisionNumber)
.writer(compositeItemWriter())
.listener(executionContextPromotionListener())
.build();
Composite item writer :
#Bean
public CompositeItemWriter<Revision> compositeItemWriter() {
CompositeItemWriter writer = new CompositeItemWriter();
writer.setDelegates(Arrays.asList(somewriter(), someOtherwriter(), aWriterThatIsSupposedToPassDataToAnotherStep()));
return writer;
}
While the first two writer are not complex, my interest is focused on the third one.
aWriterThatIsSupposedToPassDataToAnotherStep()
As you might have guessed, this one will be used to get some data being processed before, to promote it on my second Step :
#Component
#StepScope
public class AWriterThatIsSupposedToPassDataToAnotherStep implements ItemWriter<SomeEntity> {
private StepExecution stepExecution;
public void write(List<? extends SomeEntity> items) {
ExecutionContext stepContext = this.stepExecution.getExecutionContext();
stepContext.put("revisionNumber", items.stream().findFirst().get().getSomeField());
System.out.println("writing : " + items.stream().findFirst().get().getSomeField()+ "to ExecutionContext");
}
#BeforeStep
public void saveStepExecution(StepExecution stepExecution) {
this.stepExecution = stepExecution;
}
}
Problem is : As long as this writer is part of a composite writer list (as declared above)
The #BeforeStep of my last writer is never executed, this ends up me unable to transmit my information to execution context.
When replacing my CompositeItemWriter by my single "AWriterThatIsSupposedToPassDataToAnotherStep" inside step definition, it gets executed properly.
Does it have to do anything with some kind of declaration order or something ?
Big Thanks to further help.

Found the solution (with some of my coworkers help), and sourced-in from : https://stackoverflow.com/a/39698653/1957764
You'll need to both declare the writer as part of the composite writer AND a step listener to make it execute the #BeforeStep annotated method.

Related

Proper way to write a spring-batch ItemReader

I'm constructing a spring-batch job that modifies a given number of records. The list of record ID's are an input parameter of the job. For example, one job might be: Modify the record Id's {1,2,3,4} and set parameters X and Y on related tables.
Since I'm unable to pass a potentialy very long input list (tipical cases, 50K records) to my ItemReader I only pass a MyJobID which then the itemReader uses to load the target ID list.
Problem is, the resulting code appears "wrong" (altough it works) and not in the spirit of spring-batch. Here's the reader:
#Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
#Component
public class MyItemReader implements ItemReader<Integer> {
#Autowired
private JobService jobService;
private List<Integer> itemsList;
private Long jobId;
#Autowired
public MyItemReader(#Value("#{jobParameters['jobId']}") final Long jobId) {
this.jobId = jobId;
this.itemsList = null;
}
#Override
public Integer read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
// First pass: Load the list.
if (itemsList == null) {
itemsList = new ArrayList<Integer>();
MyJob myJob = (MyJob) jobService.loadById(jobId);
for (Integer i : myJob.getTargedIdList()) {
itemsList.add(i);
}
}
// Serve one at a time:
if (itemsList.isEmpty()) {
return null;
} else {
return itemsList.remove(0);
}
}
}
I tried to move the first part of the read() method to the constructor but the #Autowired reference is null at that point. Afterwards (on the read method) it's initialized.
Is there a better way to write the ItemReader? I would like to move the "load"Or is this the best solution for this scenario?
Thank you.
Generally, your approach is not "wrong", but probably not ideal.
Firstly, you could move the initialisation to a initMethod which is annotated with #PostConstruct. This method is called after all Autowired fields have been injected:
#PostConstruct
public void afterPropertiesSet() throws Exception {
itemsList = new ArrayList<Integer>();
MyJob myJob = (MyJob) jobService.loadById(jobId);
for (Integer i : myJob.getTargedIdList()) {
itemsList.add(i);
}
}
But there is still the problem, that you load all the data at once. If you have a billion records to process, this could blow up the memory.
So what you should do is to load only a chunk of your data into memory, then return the items one by one in your read method. If all entries of a chunk have been returned, load the next chunk and return its items one by one again. If there is no other chunk to be loaded, then return null from the read method.
This ensures that you have a constant memory footprint regardless of how many records you have to process.
(If you have a look at FlatFileItemReader, you see that it uses a BufferedReader to read the data from the disk. While it has nothing to do with SpringBatch, it is the same principle: it reads a chunk of data from the disk, returns that and if more data is needed, it reads the next chunk of data).
Next problem is the restartability. What happens if the job crashes after doing 90% of the work? How can the job be restarted and only process the missing 10%?
This is actually a feature that springbatch provides, all you have to do is to implement the ItemStream interface and implement the methods open(), update(), close().
If you consider this two points - load data in chunks instead all at once and implement ItemStream interface - you'll end up having a reader that is in the spring spirit.

FlatFileItemWriter write header only in case when data is present

have a task to write header to file only if some data exist, other words if reader return nothing file created by writer should be empty.
Unfortunately FlatFileItemWriter implementation, in version 3.0.7, has only private access fields and methods and nested class that store all info about writing process, so I cannot just take and overwrite write() method. I need to copy-paste almost all content of FlatFileItemWriter to add small piece of new functionality.
Any idea how to achieve this more elegantly in Spring Batch?
So, finally found a less-more elegant solution.
The solution is to use LineAggregators, and seems in the current implementation of FlatFileItemWriter this is only one approach that you can use safer when inheriting this class.
I use separate line aggregator only for a header, but the solution can be extended to use multiple aggregators.
Also in my case header is just predefined string, thus I use PassThroughLineAggregator by default that just return my string to FlatFileItemWriter.
public class FlatFileItemWriterWithHeaderOnData extends FlatFileItemWriter {
private LineAggregator lineAggregator;
private LineAggregator headerLineAggregator = new PassThroughLineAggregator();
private boolean applyHeaderAggregator = true;
#Override
public void afterPropertiesSet() throws Exception {
Assert.notNull(headerLineAggregator, "A HeaderLineAggregator must be provided.");
super.afterPropertiesSet();
}
#Override
public void setLineAggregator(LineAggregator lineAggregator) {
this.lineAggregator = lineAggregator;
super.setLineAggregator(lineAggregator);
}
public void setHeaderLineAggregator(LineAggregator headerLineAggregator) {
this.headerLineAggregator = headerLineAggregator;
}
#Override
public void write(List items) throws Exception {
if(applyHeaderAggregator){
LineAggregator initialLineAggregator = lineAggregator;
super.setLineAggregator(headerLineAggregator);
super.write(getHeaderItems());
super.setLineAggregator(initialLineAggregator);
applyHeaderAggregator = false;
}
super.write(items);
}
private List<String> getHeaderItems() throws ItemStreamException {
// your actual implementation goes here
return Arrays.asList("Id,Name,Details");
}
}
PS. This solution assumed that if method write() called then some data exist.
Try this in your writer
writer.setShouldDeleteIfEmpty(true);
If you have no data, there is no file.
In other case, you write your header and your items
I'm thinking of a way as below.
BeforeStep() (or a Tasklet) if there is no Data at all, you set a flag such as "noData" is 'true'. Otherwise will be 'false'
And you have 2 writers, one with Header and another one without Header. In this case you can have a base Writer acts as a parent and then 2 writers inherits it. The only difference between them is one with Header and one doesn't have HeaderCallBack.
Base on the flag, you can switch to either 'Writer with Header' or 'Writer without Header'
Thanks,
Nghia

mark read data as "processing" by a table column flag then restore at the end

Below is a relevant portion of code for reader, processor , writer and step for batch job that I create.
I have a requirement to update a flag column in table from where data is being read ( source table ) to mark that this data is being processed by this job so other apps don't pick up that data. Then once processing of read records is finished, I need to restore that column to original value so other apps can work on those records too.
I guess, listener is the approach to take ( ItemReadListener ? ) . Reader listener seems suitable only for first step ( i.e to update flag column ) but not for restore at the end of chunk. Challenge seems to be making read data available at the end of processor.
Can anybody suggest about possible approaches?
#Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
return stepBuilderFactory.get("step1")
.<RemittanceVO, RemittanceClaimVO> chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
.reader(reader)
.processor(processor)
.writer(writer)
.taskExecutor(simpleAsyntaskExecutor)
.throttleLimit(Constants.THROTTLE_LIMIT)
.build();
}
#Bean
public ItemReader<RemittanceVO> reader() {
JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<RemittanceVO>();
reader.setDataSource(dataSource);
reader.setRowMapper(new RemittanceRowMapper());
reader.setQueryProvider(queryProvider);
reader.setPageSize(Constants.SPRING_BATCH_READER_PAGE_SIZE);
return reader;
}
#Bean
public ItemProcessor<RemittanceVO, RemittanceClaimVO> processor() {
return new MatchClaimProcessor();
}
#Bean
public ItemWriter<RemittanceClaimVO> writer(DataSource dataSource) {
return new MatchedClaimWriter();
}
I started with Spring Batch few days ago so don't have familiarity with all the provided modeling and patterns.
Firstly, a small hint about using an asyncTaskExecutor: you have to synchronize the reader, otherwise you will run into concurrency problems. You can use SynchronizedItemStreamReader to do this:
#Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
return stepBuilderFactory.get("step1")
.<RemittanceVO, RemittanceClaimVO> chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
.reader(syncReader)
.processor(processor)
.writer(writer)
.taskExecutor(simpleAsyntaskExecutor)
.throttleLimit(Constants.THROTTLE_LIMIT)
.build();
}
#Bean
public ItemReader<RemittanceVO> syncReader() {
SynchronizedItemStreamReader<RemittanceVO> syncReader = new SynchronizedItemStreamReader<>();
syncReader.setDelegate(reader());
return syncReader;
}
#Bean
public ItemReader<RemittanceVO> reader() {
JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<RemittanceVO>();
reader.setDataSource(dataSource);
reader.setRowMapper(new RemittanceRowMapper());
reader.setQueryProvider(queryProvider);
reader.setPageSize(Constants.SPRING_BATCH_READER_PAGE_SIZE);
return reader;
}
Secondly, a possible approach to your real question:
I would use a simple tasklet in order to "mark" the entries you want to process.
You can do this with one simple UPDATE-statement, since you know your selection criterias. This way, you only need one call and therefore only one transaction.
After that, I would implement an normal step with reader, processor and writer.
The reader has to read only the marked entries, making your select clause also very simple.
In order to restore the flag, you could do that in a third step which is implemented as tasklet and uses an appropriate UPDATE-statement (like the first step). To ensure that the flag is restored in the case of an exception, just configure your jobflow appropriately, so that step 3 is executed even if step 2 fails (-> see my answer to this question Spring Batch Java Config: Skip step when exception and go to next steps)
Of course, you could also restore the flag when writing the chunk if you use a compositeItemWriter. However, you need a strategy how to restore the flag in case of an exception in step 2.
IMO, using a Listener is not a good idea, since the transaction handling is differently.

CustomItemReader to retrieve list from DAO

I have a DAO class to retrieve a set of data from Hibernate.
<batch:step id="firstStep">
<batch:tasklet>
<batch:chunk reader="firstReader" writer="firstWriter"
processor="itemProcessor" commit-interval="2">
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="firstReader" class="com.process.MyReader"
scope="step">
</bean>
Inside my reader, I will call DAO to get the data before read.
public class MyReader implements ItemReader<JobInstance>{
private List<JobInstance> jobList;
private String currentDate;
#Autowired
private JobDAO perDAO;
#BeforeRead
public void init() {
//jobList= perDAO.getPersonAJobList(currentDate);
}
#Override
public JobInstance read() throws Exception, UnexpectedInputException,
ParseException, NonTransientResourceException {
return !jobList.isEmpty() ? jobList.remove(0) : null;
}
#Value("#{jobParameters['currentDate']}")
public void setCurrentDate(String currentDate) {
this.currentDate = currentDate;
}
#Override
public void beforeStep(StepExecution stepExecution) {
// TODO Auto-generated method stub
}
#Override
public ExitStatus afterStep(StepExecution stepExecution) {
// TODO Auto-generated method stub
return null;
}
}
When I run the batch job, the batch job keep repeating reading and processing.
[org.springframework.batch.repeat.support.RepeatTemplate] [getNextResult] [372] - Repeat operation about to start at count=1
Below is my DAO class
#Autowired
private QueryManager queryManager;
#Autowired
public JobDAO Impl(SessionFactory sessionFactory) {
super(sessionFactory, JobInstance.class);
}
public List<JobInstance> getPersonAJobList(String currentDate) {
String sql = queryManager.getNamedQuery("getJobList");
System.out.println("---------------------- " + sql + " " + currentDate);
SQLQuery query = this.getCurrentSession().createSQLQuery(sql);
query.setParameter("current_date", currentDate);
....
return result;
}
if you fill the list within the #BeforeRead annotated method, the list will be renewed before every read
see http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/annotation/BeforeRead.html
Marks a method to be called before an item is read from an ItemReader
if you need to get the items from a DAO you need to think about the implementation of either
easy way - keep the current implementation, but add a check in BeforeRead to init the list only once
a stateful DAO which fills the list once and removes items for every
read call
a stateless DAO with pagination
a better way is to move the data access (the SQL) into the batch, Spring Batch provides out of the box readers for SQL, Hibernate and even more... see http://docs.spring.io/spring-batch/reference/html/listOfReadersAndWriters.html
The init method should be called only once. The correct way to do this is either to implement the InitializingBean interface and implementing the afterPropertiesSet method, or using the #PostConstruct annotation instead of #BeforeRead.
The use of #BeforeRead is definitely wrong and makes no sense.
As also mentioned in the comments to Michael's answers, you should also consider to use one of the standard readers to get data from a db. If you just get a couple of hundred or thousand entries from getPersonAJobList it won't be a problem, but if you get millions of entries, it would definitely be wrong approach.
What about add an 'init' flag into your reader? Into MyReader.read():
if flag is not setted call jobDAO to fill jobList and set flag
If flag is setted consume jobList items.
Be careful using jobList.remove(0) because your reader seems not to be restartable; you need to maintain last consumed items index into execution-context so a restart will continue from first item of last not commited chunk.

How to end a job when no input read

We read most of our data from a DB. Sometimes the result-set is empty, and for that case we want the job to stop immediately, and not hand over to a writer. We don't want to create a file, if there is no input.
Currently we achieve this goal with a Step-Listener that returns a certain String, which is the input for a transition to either the next business-step or a delete-step, which deletes the file we created before (the file contains no real data).
I'd like the job to end after the reader realizes that there is no input?
New edit (more elegant way)
This approach is to elegantly move to the next step or end the batch application when the file is not found and prevent unwanted steps to execute (and their listeners too).
-> Check for the presence of file in a tasklet, say FileValidatorTasklet.
-> When the file is not found set some exit status (enum or final string) , here we have set EXIT_CODE
sample tasklet
public class FileValidatorTasklet implements Tasklet {
static final String EXIT_CODE = "SOME_EXIT_CODE";
static final String EXIT_DESC = "SOME_EXIT_DESC";
#Override
public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
boolean isFileFound = false;
//do file check and set isFileFound
if(!isFileFound){
stepContribution.setExitStatus(new ExitStatus(EXIT_CODE, EXIT_DESC));
}
return RepeatStatus.FINISHED;
}
}
-> In the job configuration of this application after executing FileValidatorTasklet, check for the presence of the EXIT_CODE.
-> Provide the subsequent path for this job if the code is found else the normal flow of the job.( Here we are simply terminating the job if the EXIT_CODE is found else continue with the next steps)
sample config
public Job myJob(JobBuilderFactory jobs) {
return jobs.get("offersLoaderJob")
.start(fileValidatorStep).on(EXIT_CODE).end() // if EXIT_CODE is found , then end the job
.from(fileValidatorStep) // else continue the job from here, after this step
.next(step2)
.next(finalStep)
.end()
.build();
}
Here we have taken advantage of conditional step flow in spring batch.
We have to define two separate path from step A. The flow is like A->B->C or A->D->E.
Old answer:
I have been through this and hence I am sharing my approach. It's better to
throw new RunTimeException("msg");.
It will start to terminate the Spring Application , rather than exact terminate at that point. All methods like close() in ( reader/writer) would be called and destroy method of all the beans would be called.
Note: While executing this in Listener, remember that by this point all the beans would have been initialized and code in their initialization (like afterPropertySet() ) would have executed.
I think above is the correct way, but if you are willing to terminate at that point only, you can try
System.exit(1);
It would likely be cleaner to use a JobExecutionDecider and based on the read count from the StepExecution set a new FlowExecutionStatus and route it to the end of the job.
Joshua's answer addresses the stopping of the job instead of transitioning to the next business step.
Your file writer might still create the file unnecessarily. You can create something like a LazyItemWriter with a delegate (FlatFileItemWriter) and it will only call delegate.open (once) if there's a call to write method. Of course you have to check if delegate.close() needs to be called only if the delegate was previously opened. This makes sure that no empty file is created and deleting it is no longer a concern.
I have the same question as the OP. I am using all annotations, and if the reader returns as null when no results (in my case a File) are found, then the Job bean will fail to be initialized with an UnsatisfiedDependencyException, and that exception is thrown to stdout.
If I create a Reader and then return it w/o a File specified, then the Job will be created. After that an ItemStreamException is thrown, but it is thrown to my log, as I am past the Job autowiring and inside the Step at that point. That seems preferable, at least for what I am doing.
Any other solution would be appreciated.
NiksVij Answer works for me, i implemented it like this:
#Component
public class FileValidatorTasklet implements Tasklet {
private final ImportProperties importProperties;
#Autowired
public FileValidatorTasklet(ImportProperties importProperties) {
this.importProperties = importProperties;
}
#Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
String folderPath = importProperties.getPathInput();
String itemName = importProperties.getItemName();
File currentItem = new File(folderPath + File.separator + itemName);
if (currentItem.exists()) {
contribution.setExitStatus(new ExitStatus("FILE_FOUND", "FILE_FOUND"));
} else {
contribution.setExitStatus(new ExitStatus("NO_FILE_FOUND", "NO_FILE_FOUND"));
}
return RepeatStatus.FINISHED;
}
}
and in the Batch Configuration:
#Bean
public Step fileValidatorStep() {
return this.stepBuilderFactory.get("step1")
.tasklet(fileValidatorTasklet)
.build();
}
#Bean
public Job tdZuHostJob() throws Exception {
return jobBuilderFactory.get("tdZuHostJob")
.incrementer(new RunIdIncrementer())
.listener(jobCompletionNotificationListener)
.start(fileValidatorStep()).on("NO_FILE_FOUND").end()
.from(fileValidatorStep()).on("FILE_FOUND").to(testStep()).end()
.build();
}