Spring Batch: How do I use a decider based on my Reader result?

I am new to Spring Batch. I found out that by using ExecutionContextPromotionListener, I can set up key-value pairs in one step and read them in future steps.
<step id="step1">....</step>
<decision id="decision1" .... />
When I used a Tasklet instead of a reader, I did the following:
Created a bean of ExecutionContextPromotionListener with the expected keys in my batch config file.
Registered the listener on my step.
Put the key-value pairs in the ExecutionContext retrieved from the ChunkContext inside my Tasklet, as sketched below.
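A minimal sketch of that Tasklet (the key and value are placeholders):
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.repeat.RepeatStatus;

public class MyTasklet implements Tasklet {
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // the StepExecution's context is mutable, unlike the map exposed by StepContext
        ExecutionContext stepContext = chunkContext.getStepContext()
                .getStepExecution()
                .getExecutionContext();
        stepContext.put("someKey", "someValue"); // placeholder key and value
        return RepeatStatus.FINISHED;
    }
}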
Now the Decider can read from the step execution context, along the lines of the sketch below, and decide.
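A sketch of such a decider (the key name and status strings are placeholders):
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class MyDecider implements JobExecutionDecider {
    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // stepExecution is the execution of the step that preceded this decision;
        // a key promoted by the listener is also available on jobExecution.getExecutionContext()
        String value = (String) stepExecution.getExecutionContext().get("someKey");
        return new FlowExecutionStatus("OK".equals(value) ? "COMPLETED" : "FAILED");
    }
}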
But I want to make the decision based on the Reader of the previous step. So, in my decider, how do I get a value from the Reader? Or is my approach wrong? Please suggest.

A simple way is to make use of the execution context from your step and pass the values on to the next step.
So, in your first step, do something like this:
// ...
ExecutionContext stepContext = this.stepExecution.getExecutionContext();
stepContext.put("DATA_KEY", dataToShare);
Then, in your next step, you can read this using the execution context:
ExecutionContext jobContext = jobExecution.getExecutionContext();
dataToShare = jobContext.get("DATA_KEY");
You just need to manage the keys: use the same key to put the value in the first step and to read it in the next step.
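One caveat: the snippet above puts the value into the step execution context but reads it from the job execution context, which only works if the key is promoted between steps, e.g. with an ExecutionContextPromotionListener registered on the first step. A sketch of that bean, reusing the DATA_KEY key:
<bean id="promotionListener" class="org.springframework.batch.core.listener.ExecutionContextPromotionListener">
    <property name="keys" value="DATA_KEY" />
</bean>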

Related

Spring Batch repeat step until a criteria is met

I have this Spring Batch flow:
return jobBuilderFactory.get(JOB).preventRestart().incrementer(new RunIdIncrementer()).listener(jobCompletionListener)
.start(clean).next(generateFiles1).next(callApi1)
.next(clean).next(generateFiles2).next(callApi2)
.next(clean).next(generateFiles3).next(callApi3)
.next(clean).next(generateFiles4).next(callApi4)
.build();
I must repeat the first three steps (clean, generateFiles1 and callApi1) until a certain criterion is met (I have to count some data in the database to check whether I need to call the API again), and so on for the next three steps.
I have seen the on and to functions explained there, but it seems to me that they do not allow writing such loops.
I could define such flows:
final FlowBuilder<Flow> flowBuilderStep1 = new FlowBuilder<>("Step1");
flowBuilderStep1.start(clean).next(generateFiles1).next(callApi1).end();
final Flow step1 = flowBuilderStep1.build();
final FlowBuilder<Flow> flowBuilderStep2 = new FlowBuilder<>("Step2");
flowBuilderStep2.start(clean).next(generateFiles2).next(callApi2).end();
final Flow step2 = flowBuilderStep2.build();
And then build the conditional structure (maybe after adding a Decider or afterStep() somewhere):
return jobBuilderFactory.get(JOB).preventRestart().incrementer(new RunIdIncrementer()).listener(jobCompletionListener)
.start(step1).on("RETRY").to(step1).on("CONTINUE")
.to(step2).on("RETRY").to(step2).on("CONTINUE")
.to(step3).on("RETRY").to(step3).on("CONTINUE")
.to(step4)
.end().build();
But I don't think it would loop properly. Am I right? Can such a loop be accomplished (without an XML config)?
I had to do something like this to make it work:
return jobBuilderFactory.get(JOB).preventRestart().incrementer(new RunIdIncrementer()).listener(jobCompletionListener)
.start(step1).next(step1Decider).on(RETRY).to(step1).from(step1Decider).on(CONTINUE)
.to(step2).next(step2Decider).on(RETRY).to(step2).from(step2Decider).on(CONTINUE)
...
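Each stepNDecider above is a JobExecutionDecider. A sketch of what step1Decider could look like, assuming the criterion is a row count (the table, query and field names are placeholders; the status strings must match the RETRY and CONTINUE constants used in on()):
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;
import org.springframework.jdbc.core.JdbcTemplate;

public class Step1Decider implements JobExecutionDecider {

    private final JdbcTemplate jdbcTemplate;

    public Step1Decider(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // placeholder criterion: loop while unprocessed rows remain
        Integer remaining = jdbcTemplate.queryForObject(
                "select count(*) from my_table where processed = 0", Integer.class);
        return new FlowExecutionStatus(remaining != null && remaining > 0 ? "RETRY" : "CONTINUE");
    }
}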

Adding a job parameter in beforeJob doesn't work - Spring Batch

I have a Spring Batch job wherein I need to set a job parameter in beforeStep.
I am using the code below for the same:
@Override
public void beforeJob(JobExecution jobExecution) {
    String pid = fetchPid();
    jobExecution
        .getJobParameters()
        .getParameters()
        .put("pid", new JobParameter(pid));
}
When I run the above code and debug, I see that pid is not present in the JobParameters. What could be wrong here?
What could be wrong here?
JobParameters#getParameters returns an unmodifiable map of parameters (see the Javadoc), so adding the pid key the way you did won't work.
I need to set job parameter in beforeStep
I guess you mean in beforeJob and not beforeStep, since your code shows the beforeJob method. Using a JobExecutionListener to add parameters is too late: the parameters are used to identify the job instance, and by the time beforeJob is invoked, the execution has already been launched with the given parameters. You need to prepare the parameters upfront, then use them to launch the job with jobLauncher.run(job, jobParameters).
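For example (a sketch; fetchPid() is the same lookup as in the question, just done before launching):
String pid = fetchPid(); // resolve the value before the job is launched
JobParameters jobParameters = new JobParametersBuilder()
        .addString("pid", pid)
        .toJobParameters();
jobLauncher.run(job, jobParameters);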

How to commit a file (entire file) in Spring Batch without using chunks - commit interval?

The commit interval will commit the data at specified intervals. I want to commit the entire file in a single shot, since my requirement is to validate the file (line by line) and, if it fails at any point, roll back with no commit. Is there any way to achieve this in Spring Batch?
You can either set your commit-interval to Integer.MAX_VALUE (2^31 - 1) or create your own CompletionPolicy.
Here's how you configure a step to use a custom CompletionPolicy:
<chunk reader="reader" writer="writer" chunk-completion-policy="completionPolicy"/>
<bean id="completionPolicy" class="xx.xx.xx.CompletionPolicy"/>
Then you have to either choose an out-of-the-box CompletionPolicy provided by Spring Batch (a list of implementations is available at the previous link) or create your own.
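For example, a custom policy can extend CompletionPolicySupport and never signal completion on its own, so the chunk ends only when the reader runs out of input (a sketch; the class name is arbitrary):
import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.policy.CompletionPolicySupport;

public class WholeFileCompletionPolicy extends CompletionPolicySupport {
    @Override
    public boolean isComplete(RepeatContext context) {
        // never complete mid-input: the chunk ends only when the reader returns null
        return false;
    }
}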
What do you mean by "commit"?
You are talking about validating, not about writing the read data to another file or into a database.
As mentioned in the comment by Michael Prarlow, memory problems could arise if the size of the file changes.
In order to prevent this, I would suggest starting your job with a validation step: simply read the data chunk-wise, check the data line by line in your processor, and throw a non-skippable exception if a line is not valid. Use a pass-through writer, so nothing is persisted. If there is a problem, the whole job will fail.
If you really have to write the data into a DB or another file, you can do this in a second step. Since you have validated your data, you shouldn't observe any problems.
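The validating processor could look like this (a sketch; the blank-line check stands in for your real line-by-line validation):
import org.springframework.batch.item.ItemProcessor;

public class ValidatingProcessor implements ItemProcessor<String, String> {
    @Override
    public String process(String line) {
        // placeholder check: treat blank lines as invalid
        if (line == null || line.trim().isEmpty()) {
            // an unchecked exception that is not declared skippable fails the whole job
            throw new IllegalStateException("Invalid line: " + line);
        }
        return line;
    }
}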
Simple PassThroughItemWriter
import java.util.List;
import org.springframework.batch.item.ItemWriter;

public class PassThroughItemWriter<T> implements ItemWriter<T> {
    @Override
    public void write(List<? extends T> items) {
        // do nothing: nothing gets persisted
    }
}
Or, if you use the Java API to build your job and steps, you can simply use a lambda:
stepBuilders.get("step")
.<..., ...>chunk(..)
.reader(...)
.processor(...) // your processor with the validation logic
.writer(items -> {}) // empty lambda expression
.build();

Exchanging data between steps in spring batch

I have a job that is built of the following components:
Processing Step - activates an external process that logs its result in the DB and returns an internal id, so I can take this id and process it further
Logging Step - built of a tasklet that contains a chunk
The chunk is built of an item reader - I plan to use #{stepExecutionContext['job.id']} as part of the SQL written in the XML file, so it will fetch the relevant logging info
I'm trying to work with the solution suggested here - 11.8 Passing Data to Future Steps - but I get an error when I try to add a property to the step execution context or the job execution context:
chunkContext.getStepContext().getStepExecutionContext().put("job.id", jobId);
And I get this error:
java.lang.UnsupportedOperationException: null
at java.util.Collections$UnmodifiableMap.put(Collections.java:1342)
at ...
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:395)
Did I miss something?
The StepContext available from the ChunkContext is a read-only object; use a StepExecutionListener instead and save the values in the step execution context passed as a parameter to StepExecutionListener.afterStep(StepExecution stepExecution).
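A sketch, reusing the job.id key from the question:
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class JobIdListener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to do before the step
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // this StepExecution is mutable, unlike the map returned by StepContext
        stepExecution.getExecutionContext().put("job.id", stepExecution.getJobExecutionId());
        return stepExecution.getExitStatus();
    }
}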
I always follow this path:
chunkContext
.getStepContext()
.getStepExecution()
.getJobExecution()
.getExecutionContext()
.put("", "");
I also had problems doing the put on the StepExecution; maybe it loses data when it goes to the next step. However, I leave that explanation to the experts.
chunkContext.getStepContext().getStepExecutionContext() returns a copy; use chunkContext.getStepContext().getStepExecution().getExecutionContext() instead.

spring batch - processor chain

I need to execute seven distinct processes sequentially (one after the other). The data is stored in MySQL. I am thinking of the following options; please correct me if I am wrong, or suggest a better solution.
Requirements:
Read the data from the DB, run the seven processes (data validation, calculation1, calculation2, etc.) and finally write the processed data to the DB.
Need to process the data in chunks.
My solution and issues:
Data read:
Read the data using JdbcCursorItemReader, because this is the best-performing DB reader - but the SQL is very complex, so I may have to consider a custom ItemReader using JdbcTemplate, which gives me more flexibility in handling the data.
Process:
Define seven steps and chunks, and share the data between the steps using a data bean. But this won't be a good idea, because the data is processed in chunks, and after each chunk the step 1 writer will create a new set of data in the data bean. When this data bean is shared across the other steps, data integrity will be an issue.
Use the StepExecutionContext to share the data between steps. But this may affect performance, as it involves the batch job repository.
Define only one step, with one ItemReader, a chain of processors (the seven processes), and one ItemWriter which writes the processed data to the DB. But then I won't be able to administer or monitor each of the different processes; all will be in one step.
The org.springframework.batch.item.support.CompositeItemProcessor is an out-of-the-box component from the Spring Batch framework that supports your requirement, akin to your last option. It would allow you to do the following:
- keep separation in your design/solution for reading from the database (ItemReader)
- keep each individual processor's concerns and configuration separate
- allow any individual processor to filter an item out of the chunk by returning null, irrespective of the previous processors
The CompositeItemProcessor iterates over a list of delegates, so it's "similar" to a chain of responsibility. It's quite useful in the scenario you've described, and it still lets you leverage the chunk benefits (exception handling, retry, commit policy, etc.).
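A sketch of the wiring (the delegate beans are hypothetical, each an ItemProcessor over your data type):
CompositeItemProcessor<MyData, MyData> compositeProcessor = new CompositeItemProcessor<>();
compositeProcessor.setDelegates(Arrays.asList(
        dataValidationProcessor, // hypothetical delegates, each an ItemProcessor<MyData, MyData>
        calculation1Processor,
        calculation2Processor));
// then register compositeProcessor as the single processor of your chunk-oriented step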
Suggestions:
1) Read the data using JdbcCursorItemReader.
All out-of-the-box components are a good choice because they already implement the ItemStream interface, which makes your steps restartable. But, as you mention, sometimes the query is just too complex or, like me, you already have a service or DAO that you can reuse.
I would suggest you use the ItemReaderAdapter. It lets you configure a delegate service to call to get your data:
<bean id="MyReader" class="xxx.adapters.MyItemReaderAdapter">
<property name="targetObject" ref="AnExistingDao" />
<property name="targetMethod" value="next" />
</bean>
Note that the targetMethod must respect the read contract of ItemReaders (return null when there is no more data).
If your job does not need to be restartable, you can simply use the class org.springframework.batch.item.adapter.ItemReaderAdapter.
But if you need your job to be restartable, you can create your own ItemReaderAdapter like this:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.adapter.AbstractMethodInvokingDelegator;

public class MyItemReaderAdapter<T> extends AbstractMethodInvokingDelegator<T> implements ItemReader<T>, ItemStream {

    private static final Logger log = LoggerFactory.getLogger(MyItemReaderAdapter.class);

    private static final String CONTEXT_COUNT_KEY = "count";

    private long currentCount = 0;

    /**
     * @return the return value of the target method.
     */
    @Override
    public T read() throws Exception {
        super.setArguments(new Long[] { currentCount++ });
        return invokeDelegateMethod();
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // restore the position on a restart
        currentCount = executionContext.getLong(CONTEXT_COUNT_KEY, 0);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // save the current position so a restart can resume from it
        executionContext.putLong(CONTEXT_COUNT_KEY, currentCount);
        log.info("Update stream, current count: " + currentCount);
    }

    @Override
    public void close() throws ItemStreamException {
        // nothing to close
    }
}
Because the out-of-the-box ItemReaderAdapter is not restartable, you just create your own that implements ItemStream.
2) Regarding the 7 steps vs 1 step.
I would go with 1 step and a composite processor on this one; the 7-step options will only bring problems, IMO.
1) 7 steps with a databean: your writers commit into the databean until step 7, then the step 7 writer tries to commit to the real database and boom, error!!! All is lost and the batch must restart from step 1!!
2) 7 steps with the context: could be better, since the state will be saved in the Spring Batch metadata, BUT it is not good practice to store big data in the Spring Batch metadata!!
3) is the way to go IMO. ;-)