How to group steps in one FlowStep and re-run all of the steps after a failure? - spring-batch

I am trying to use Spring Batch to handle a bunch of precheck, validation and actions together.
If the precheck or validation fails, then I want to re-run all the prechecks and validations regardless of whether they previously succeeded or failed; but if an action fails, I want to start from the failed action only.
I want to group the prechecks and validations in a step such as a FlowStep, but it seems that Spring retry does not support this.
Does anybody know a solution? Thanks in advance!

You can group two steps in a FlowStep as follows:
@Bean
public Flow preCheckAndValidationFlow() {
    return new FlowBuilder<SimpleFlow>("flow")
            .start(preCheckStep())
            .next(validationStep())
            .build();
}

@Bean
public Step preCheckAndValidationStep(JobRepository jobRepository) {
    FlowStep flowStep = new FlowStep();
    flowStep.setFlow(preCheckAndValidationFlow());
    flowStep.setJobRepository(jobRepository);
    return flowStep;
}
If you want the preCheckStep to re-run (even if it was successful the first time) when the validation step fails, then you need to set the allowStartIfComplete flag on the preCheckStep.
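For instance, assuming preCheckStep is built with a StepBuilderFactory and a placeholder tasklet (names and the tasklet body are illustrative), the flag can be set on the builder:

@Bean
public Step preCheckStep() {
    return stepBuilderFactory.get("preCheckStep")
            .tasklet((contribution, chunkContext) -> {
                // precheck logic goes here
                return RepeatStatus.FINISHED;
            })
            .allowStartIfComplete(true) // re-execute on restart even if it completed before
            .build();
}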
IMO, it would be much simpler for you to create a single step that contains both the precheck and validation logic.

Related

Spring Batch repeat step until a criteria is met

I have this Spring Batch flow:
return jobBuilderFactory.get(JOB)
        .preventRestart()
        .incrementer(new RunIdIncrementer())
        .listener(jobCompletionListener)
        .start(clean).next(generateFiles1).next(callApi1)
        .next(clean).next(generateFiles2).next(callApi2)
        .next(clean).next(generateFiles3).next(callApi3)
        .next(clean).next(generateFiles4).next(callApi4)
        .build();
I must repeat the first three steps (clean, generateFiles1 and callApi1) until a certain criteria is met (I have to count some data in the database to check if I need to call the API again). And so on for the next three steps.
I have seen the on and to methods explained there, but it seems to me that they do not allow writing such loops.
I could define such flows:
final FlowBuilder<Flow> flowBuilderStep1 = new FlowBuilder<>("Step1");
flowBuilderStep1.start(clean).next(generateFiles1).next(callApi1).end();
final Flow step1 = flowBuilderStep1.build();
final FlowBuilder<Flow> flowBuilderStep2 = new FlowBuilder<>("Step2");
flowBuilderStep2.start(clean).next(generateFiles2).next(callApi2).end();
final Flow step2 = flowBuilderStep2.build();
And then build the conditional structure (maybe after adding a Decider or afterStep() somewhere):
return jobBuilderFactory.get(JOB)
        .preventRestart()
        .incrementer(new RunIdIncrementer())
        .listener(jobCompletionListener)
        .start(step1).on("RETRY").to(step1).on("CONTINUE")
        .to(step2).on("RETRY").to(step2).on("CONTINUE")
        .to(step3).on("RETRY").to(step3).on("CONTINUE")
        .to(step4)
        .end().build();
But I don't think it would loop properly. Am I right? Can such a loop be accomplished (without an XML config)?
I had to do something like this to make it work.
return jobBuilderFactory.get(JOB)
        .preventRestart()
        .incrementer(new RunIdIncrementer())
        .listener(jobCompletionListener)
        .start(step1).next(step1Decider).on(RETRY).to(step1).from(step1Decider).on(CONTINUE)
        .to(step2).next(step2Decider).on(RETRY).to(step2).from(step2Decider).on(CONTINUE)
        ...
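The deciders themselves can be plain JobExecutionDecider implementations. A rough sketch of one that checks a database count (the query, table name and JdbcTemplate wiring are made up for the example):

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;
import org.springframework.jdbc.core.JdbcTemplate;

public class RemainingDataDecider implements JobExecutionDecider {

    private final JdbcTemplate jdbcTemplate;

    public RemainingDataDecider(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // hypothetical criterion: loop back while unprocessed rows remain
        Integer remaining = jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM some_table WHERE processed = 0", Integer.class);
        return (remaining != null && remaining > 0)
                ? new FlowExecutionStatus("RETRY")
                : new FlowExecutionStatus("CONTINUE");
    }
}

Each of step1Decider through step4Decider could then be an instance of such a decider, wired with its own criterion.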

Spring Batch: How do I use decider based on my Reader result?

I am new to Spring Batch. I found out that by using ExecutionContextPromotionListener, I can set key-value pairs and read them in future steps.
<step id="step1">....</step>
<decision id="decision1" .... />
When I used a Tasklet instead of a reader, I did the following:
Created bean of ExecutionContextPromotionListener with expected keys in my batch config file.
Registered the Listener to my step.
Put the key-value pairs in the executionContext retrieved from the chunkContext inside my Tasklet like below:
Now the Decider can read from the step execution context as follows and decide.
But I want to make the decision based on the Reader from the previous step. So, in my decider, how do I get the value from the Reader? Or is my approach wrong? Please suggest.
A simple way is to make use of the ExecutionContext from your step and pass the values on to the next step.
So, in your first step, do something like this:
// ...
ExecutionContext stepContext = this.stepExecution.getExecutionContext();
stepContext.put("DATA_KEY", dataToShare);
Then in your next step you can read it from the job execution context (the ExecutionContextPromotionListener promotes it there from the step context):
ExecutionContext jobContext = jobExecution.getExecutionContext();
dataToShare = jobContext.get("DATA_KEY");
You just need to manage the keys: use the same key to put the value in the first step and to read it in the next step.
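To tie this back to the ExecutionContextPromotionListener mentioned in the question: the listener has to be registered on the first step so the key is copied from the step context into the job context, and the decider can then read it from there. A rough sketch (the key, the expected value and the bean names are illustrative):

// Promotes "DATA_KEY" from the step execution context to the job execution context
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] { "DATA_KEY" });
    return listener; // register this listener on the step that puts the value
}

// Decider that reads the promoted value and chooses the next path
public class MyDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        Object value = jobExecution.getExecutionContext().get("DATA_KEY");
        return "EXPECTED".equals(value)
                ? FlowExecutionStatus.COMPLETED
                : FlowExecutionStatus.FAILED;
    }
}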

Spring step does not run properly when I "fib" the reader, must I use a tasklet?

I'm aware that all Spring steps need to have a reader, a writer, and optionally a processor. So even though my step only needs a writer, I am also fibbing a reader that does nothing but make Spring happy.
This is based on the solution found here. Is it outdated, or am I missing something?
I have a Spring Batch job that has two chunked steps. My first step, deleteCounts, just deletes all rows from the table so that the second step has a clean slate. This means my first step doesn't need a reader, so I followed the above-linked Stack Overflow solution, created a NoOpItemReader, and added it to my StepBuilder object (code at the bottom).
My writer is mapped to a simple SQL statement that deletes all the rows from the table (code is at the bottom).
My table is not being cleared by the deleteCounts step. I am expecting deleteCounts to delete all rows from the table, yet it does not - I suspect it's because of my "fibbed" reader, but I'm not sure what I'm doing wrong.
My delete statement:
<delete id="delete">
DELETE FROM ${schemaname}.DERP
</delete>
My deleteCounts Step:
@Bean
@JobScope
public Step deleteCounts() {
    StepBuilder sb = stepBuilderFactory.get("deleteCounts");
    SimpleStepBuilder<ProcessedCountData, ProcessedCountData> ssb =
            sb.<ProcessedCountData, ProcessedCountData>chunk(10);
    ssb.reader(noOpItemReader());
    ssb.writer(writerFactory.myBatisBatchWriter(COUNT_DATA_DELETE));
    ssb.startLimit(1);
    ssb.allowStartIfComplete(true);
    return ssb.build();
}
My NoOpItemReader, based on the previously linked Stack Overflow solution:
public NoOpItemReader<? extends ProcessedCountData> noOpItemReader() {
    return new NoOpItemReader<>();
}

// for steps that do not need to read anything
public class NoOpItemReader<T> implements ItemReader<T> {

    @Override
    public T read() throws Exception {
        return null;
    }
}
I left out some MyBatis plumbing, since I know that is working (step 2 is much more involved with the MyBatis stuff, and step 2 is inserting rows just fine; deleting is so simple, it must be something with my step config...).
Your NoOpItemReader returns null. An ItemReader returning null indicates that the input has been exhausted. Since, in your case, that's all it returns, the framework assumes that there was no input in the first place.
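If you want to keep the chunk-oriented step, one way around this is a reader that hands back a single placeholder item and then null, so the writer (and its DELETE) is invoked exactly once; a Tasklet that issues the DELETE directly is the other, arguably more natural, option. A rough sketch of the former (the item type is whatever your writer expects):

// Returns exactly one placeholder item, then null, so the writer runs once.
public class SingleItemReader<T> implements ItemReader<T> {

    private final T item;
    private boolean consumed = false;

    public SingleItemReader(T item) {
        this.item = item;
    }

    @Override
    public T read() {
        if (consumed) {
            return null; // end of input after the single item
        }
        consumed = true;
        return item;
    }
}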

How to commit a file (entire file) in Spring Batch without using chunks - commit interval?

The commit interval commits the data at specified intervals. I want to commit the entire file in a single shot, since my requirement is to validate the file (line by line) and, if it fails at any point, roll back with no commit. Is there any way to achieve this in Spring Batch?
You can either set your commit-interval to Integer.MAX_VALUE (2^31 - 1) or create your own CompletionPolicy.
Here's how you configure a step to use a custom CompletionPolicy:
<chunk reader="reader" writer="writer" chunk-completion-policy="completionPolicy"/>
<bean id="completionPolicy" class="xx.xx.xx.CompletionPolicy"/>
Then you have to either choose an out-of-the-box CompletionPolicy provided by Spring Batch (a list of implementations is available at the previous link) or create your own.
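As a rough sketch of the "create your own" route: a policy that never reports completion makes the chunk end only when the reader runs out of input, so the whole file is written and committed in one transaction (memory permitting). This assumes extending Spring Batch's CompletionPolicySupport base class:

import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.policy.CompletionPolicySupport;

// Never signals completion, so the chunk only finishes when the reader
// returns null - i.e. the entire file ends up in a single commit.
public class WholeFileCompletionPolicy extends CompletionPolicySupport {

    @Override
    public boolean isComplete(RepeatContext context) {
        return false;
    }
}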
What do you mean by "commit"?
You are talking about validating, not about writing the read data to another file or into a database.
As mentioned in the comment by Michael Prarlow, memory problems could arise if the size of the file changes.
In order to prevent this, I would suggest starting your job with a validation step. Simply read the data chunkwise, check it line by line in your processor, and throw a non-skippable exception if a line is not valid. Use a pass-through writer so nothing is persisted. If there is a problem, the whole job will fail.
If you really have to write the data into a DB or another file, you could do this in a second step. Since you have validated your data, you shouldn't observe any problems.
Simple PassThroughItemWriter
public class PassThroughItemWriter<T> implements ItemWriter<T> {

    @Override
    public void write(List<? extends T> items) {
        // do nothing
    }
}
Or, if you use the Java API to build your job and steps, you could simply use a lambda:
stepBuilders.get("step")
        .<..., ...>chunk(..)
        .reader(...)
        .processor(...) // your processor with the validation logic
        .writer(items -> {}) // empty lambda expression
        .build();
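And the processor slot could hold something as simple as this (the FileLine type and the validation check are placeholders for whatever fits your model):

// Passes valid lines through unchanged and fails the step (and thus the job)
// on the first invalid one, so nothing gets committed.
public class ValidatingProcessor implements ItemProcessor<FileLine, FileLine> {

    @Override
    public FileLine process(FileLine item) {
        if (!isValid(item)) {
            throw new IllegalStateException("Invalid line: " + item);
        }
        return item;
    }

    private boolean isValid(FileLine item) {
        // hypothetical line-level checks go here
        return item != null;
    }
}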

spring batch - processor chain

I need to execute seven distinct processes sequentially (one after the other). The data is stored in MySQL. I am thinking of the following options; please correct me if I am wrong, or if there is a better solution.
Requirements:
Read the data from the DB, run the seven processes (data validation, calculation1, calculation2, etc.), and finally write the processed data into the DB.
Need to process the data in chunks.
My solution and issues:
Data read:
Read the data using JdbcCursorItemReader, because this is the best-performing DB reader - but the SQL is very complex, so I may have to consider a custom ItemReader using JdbcTemplate, which gives me more flexibility in handling the data.
Process:
Define seven steps and chunks, and share the data between the steps using a data bean. But this won't be a good idea, because the data is processed in chunks and after each chunk the step1 writer will create a new set of data in the data bean. When this data bean is shared across the other steps, data integrity will be an issue.
Use the StepExecutionContext to share the data between steps. But this may affect performance, as it involves the batch job repository.
Define only one step, with one ItemReader and a chain of processors (the seven processes), and create one ItemWriter which writes the processed data into the DB. But I won't be able to administer or monitor each process separately; they will all be in one step.
The org.springframework.batch.item.support.CompositeItemProcessor is an out-of-the-box component from the Spring Batch framework that would support your requirement, akin to your second option. This would allow you to do the following:
- keep separation in your design/solution for reading from the database (ItemReader)
- keep separation of each individual processor's 'concerns' and configuration
- allow any individual processor to 'shut down' the chunk by returning null, irrespective of previous processors
The CompositeItemProcessor iterates over a list of delegates, so it's 'similar' to an action pattern. It's quite useful in the scenario you've described and still allows you to leverage the chunk benefits (exception handling, retry, commit policy, etc.).
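A configuration along these lines (the item types and delegate bean names are illustrative) chains the delegates in order, each one's output feeding the next:

@Bean
public CompositeItemProcessor<InputData, OutputData> compositeProcessor() {
    CompositeItemProcessor<InputData, OutputData> processor = new CompositeItemProcessor<>();
    processor.setDelegates(Arrays.asList(
            validationProcessor(),
            calculation1Processor(),
            calculation2Processor()
            // ... the remaining processors in order
    ));
    return processor;
}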
Suggestions:
1) Read the data using JdbcCursorItemReader.
All out-of-the-box components are a good choice because they already implement the ItemStream interface that makes your steps restartable. But, as you mention, sometimes the query is just too complex or, as in my case, you already have a service or DAO that you can reuse.
I would suggest you use the ItemReaderAdapter. It lets you configure a delegate service to call to get your data.
<bean id="MyReader" class="xxx.adapters.MyItemReaderAdapter">
    <property name="targetObject" ref="AnExistingDao" />
    <property name="targetMethod" value="next" />
</bean>
Note that the targetMethod must respect the read contract of ItemReaders (return null when no more data)
If your job does not need to be restartable, you could simply use the class org.springframework.batch.item.adapter.ItemReaderAdapter.
But if you need your job to be restartable, you can create your own ItemReaderAdapter like this:
public class MyItemReaderAdapter<T> extends AbstractMethodInvokingDelegator<T> implements ItemReader<T>, ItemStream {

    private long currentCount = 0;

    private final String CONTEXT_COUNT_KEY = "count";

    /**
     * @return return value of the target method.
     */
    public T read() throws Exception {
        super.setArguments(new Long[] { currentCount++ });
        return invokeDelegateMethod();
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        currentCount = executionContext.getLong(CONTEXT_COUNT_KEY, 0);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        executionContext.putLong(CONTEXT_COUNT_KEY, currentCount);
        log.info("Update Stream current count : " + currentCount);
    }

    @Override
    public void close() throws ItemStreamException {
        // nothing to close
    }
}
Because the out-of-the-box ItemReaderAdapter is not restartable, you just create your own that implements ItemStream.
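For reference, the delegate method the adapter calls with that count would honour the read contract by returning null once the data runs out, roughly like this (the MyItem type and the data access are purely illustrative):

import java.util.Collections;
import java.util.List;

public class AnExistingDao {

    // Called by MyItemReaderAdapter with the current count as its argument;
    // returns null when there is nothing left, per the ItemReader contract.
    public MyItem next(Long index) {
        List<MyItem> items = loadItems(); // however the data is actually fetched
        return index < items.size() ? items.get(index.intValue()) : null;
    }

    private List<MyItem> loadItems() {
        // hypothetical data access
        return Collections.emptyList();
    }
}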
2) Regarding the 7 steps vs 1 step.
I would go with 1 step with a CompositeItemProcessor on this one. The 7-step option will only bring problems, IMO.
1) 7 steps with a data bean: your writers commit into a data bean until step 7... then the step 7 writer tries to commit to the real database and boom, error!!! All is lost and the batch must restart from step 1!!
2) 7 steps with context: could be better, since you will have the state saved in the Spring Batch metadata... BUT it is not good practice to store big data in the Spring Batch metadata!!
3) is the way to go IMO. ;-)