What is a better way to pass parameters between steps when using a Partitioner and FlowStep in Spring Batch? - spring-batch

I am considering using the JobExecutionContext to pass a parameter between multiple steps with a partitioner, but in that case the parameter can also be read by other threads.
And in my understanding, I can't use the stepExecutionContext to share data, because the steps are different.
So I want to share data only within the same thread without using the JobExecutionContext, although I think the JobExecutionContext would technically work, since it is backed by a ConcurrentHashMap (thread safe), if I allow the job to use a lot of memory.
For example, I would like to pass a file path as a parameter from "firstStep" to "nextStep".
note: TaskExecutor-1 and TaskExecutor-2 are thread names.
TaskExecutor-1 executes step → firstStep generates filePathA → nextStep uses filePathA
TaskExecutor-2 executes step → firstStep generates filePathB → nextStep uses filePathB
But I don't want "nextStep" in TaskExecutor-2 to access filePathA, which comes from "firstStep" in TaskExecutor-1.
※ I might have to use @AfterStep to do everything I need in this case.
・Sample Configuration
@Bean
fun flowStep(jobRepository: JobRepository, transactionManager: PlatformTransactionManager): Step {
    val flow: Flow = FlowBuilder<Flow>("myFlowStepGroup")
        .start(firstStep(jobRepository, transactionManager)) // Generate file path.
        .next(nextStep(jobRepository, transactionManager)) // Upload file, but needs the file path from firstStep.
        .build()
    val flowStep = FlowStep(flow)
    flowStep.setJobRepository(jobRepository)
    return StepBuilder("myFlowStep", jobRepository)
        .partitioner("myFlowStepPartitioner", MyPartitioner())
        .step(flowStep)
        .taskExecutor(taskExecutor())
        .build()
}
fun taskExecutor(): TaskExecutor {
    val executor = ThreadPoolTaskExecutor()
    executor.corePoolSize = 2 // use two threads
    executor.initialize()
    return executor
}
I am using the following versions:
Spring Boot 3.0.1
Spring Batch 5.0.0
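One pattern worth considering (a sketch, not from the question): have the Partitioner create a separate ExecutionContext per partition with a partition name in it, and namespace any runtime values, such as the generated file path, by that partition name when writing them to the JobExecutionContext. The map is still shared, but each thread only reads its own keys, so filePathA and filePathB cannot collide. The key names below (partitionName, "partition" + i) are illustrative assumptions:

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

// Sketch: one ExecutionContext per partition, each carrying its own name.
public class MyPartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putString("partitionName", "partition" + i); // illustrative key
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```

firstStep could then write its generated path under a per-partition key (e.g. "filePath." + partitionName) in the job context, and nextStep reads the same key. How the inner steps of the FlowStep obtain the partition name depends on your wiring, so treat this as a starting point rather than a drop-in solution.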

Related

Spring Batch repeat step until a criteria is met

I have this Spring Batch flow:
return jobBuilderFactory.get(JOB).preventRestart().incrementer(new RunIdIncrementer()).listener(jobCompletionListener)
.start(clean).next(generateFiles1).next(callApi1)
.next(clean).next(generateFiles2).next(callApi2)
.next(clean).next(generateFiles3).next(callApi3)
.next(clean).next(generateFiles4).next(callApi4)
.build();
I must repeat the first three steps (clean, generateFiles1 and callApi1) until a certain criteria is met (I have to count some data in the database to check if I need to call the API again). And so on for the next three steps.
I have seen the on and to functions explained elsewhere, but it seems to me that they do not allow writing such loops.
I could define such flows:
final FlowBuilder<Flow> flowBuilderStep1 = new FlowBuilder<>("Step1");
flowBuilderStep1.start(clean).next(generateFiles1).next(callApi1).end();
final Flow step1 = flowBuilderStep1.build();
final FlowBuilder<Flow> flowBuilderStep2 = new FlowBuilder<>("Step2");
flowBuilderStep2.start(clean).next(generateFiles2).next(callApi2).end();
final Flow step2 = flowBuilderStep2.build();
And then build the conditional structure (maybe after adding a Decider or afterStep() somewhere):
return jobBuilderFactory.get(JOB).preventRestart().incrementer(new RunIdIncrementer()).listener(jobCompletionListener)
.start(step1).on("RETRY").to(step1).on("CONTINUE")
.to(step2).on("RETRY").to(step2).on("CONTINUE")
.to(step3).on("RETRY").to(step3).on("CONTINUE")
.to(step4)
.end().build();
But I don't think it would loop properly. Am I right? Can a loop be accomplished (without a xml config)?
I had to do something like this to make it work:
return jobBuilderFactory.get(JOB).preventRestart().incrementer(new RunIdIncrementer()).listener(jobCompletionListener)
.start(step1).next(step1Decider).on(RETRY).to(step1).from(step1Decider).on(CONTINUE)
.to(step2).next(step2Decider).on(RETRY).to(step2).from(step2Decider).on(CONTINUE)
...
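The step1Decider above can be a JobExecutionDecider that evaluates the criteria (the database count, in this question) and returns RETRY or CONTINUE. A minimal sketch; the countRemaining() method is a placeholder assumption for the real query:

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

// Sketch of a decider that loops step1 until the criteria is met.
public class Step1Decider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        if (countRemaining() > 0) {
            return new FlowExecutionStatus("RETRY");    // loop back to step1
        }
        return new FlowExecutionStatus("CONTINUE");     // move on to step2
    }

    // Placeholder: replace with the real database count.
    private long countRemaining() {
        return 0;
    }
}
```

The custom statuses ("RETRY", "CONTINUE") must match the strings used in the on(...) transitions of the job definition.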

Spring batch config with multiple state recording writes in one step

I am implementing the following config in Spring Batch and I am wondering what the best approach would be:
ItemReader ---> item ---> processor ---> processor -> ... -> processor -> itemWriter
                              |              |                   |
                       Write state to DB   Write...            Write...
So the item is read from the database, and each item goes through multiple units of processing, which are serial (not parallel), before the final writer finishes it up by writing the result.
It looks like this could be done via listeners... what would be the best approach here? Thanks.
P.S.
What I had in mind was something like this, which does not seem possible using only one step:
ItemReader->item -> process -> write -> process -> write -> ...process ->itemWriter
I modified the diagram as below.
ItemReader -> item -> CompositeItemProcessor -> itemWriter
CompositeItemProcessor is a serial processor to which you can add multiple delegate processors:
public ItemProcessor<Document, Document> compositeItemProcessor() {
    CompositeItemProcessor<Document, Document> processor = new CompositeItemProcessor<>();
    List<ItemProcessor<Document, Document>> delegates = new ArrayList<>();
    delegates.add(tikaItemProcessor);
    delegates.add(pdfBoxItemProcessor);
    delegates.add(metadataItemProcessor);
    delegates.add(webserviceDocumentItemProcessor);
    processor.setDelegates(delegates);
    return processor;
}
Please find the API documentation below
https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/item/support/CompositeItemProcessor.html
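Independent of Spring, the behaviour CompositeItemProcessor provides, serial delegates with per-stage side effects such as state writes, can be sketched with plain functions; the stateLog list below stands in for the database writes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ChainSketch {
    public static void main(String[] args) {
        List<String> stateLog = new ArrayList<>(); // stands in for "write state to DB"

        // Each "processor" transforms the item and records state as a side effect.
        UnaryOperator<String> upper = s -> { stateLog.add("upper done"); return s.toUpperCase(); };
        UnaryOperator<String> exclaim = s -> { stateLog.add("exclaim done"); return s + "!"; };

        // Serial composition, as CompositeItemProcessor chains its delegates.
        String item = "hello";
        for (UnaryOperator<String> processor : List.of(upper, exclaim)) {
            item = processor.apply(item);
        }

        System.out.println(item);     // HELLO!
        System.out.println(stateLog); // [upper done, exclaim done]
    }
}
```

In the real chunk-oriented step, each delegate would persist its state inside process() (or via a StepListener) before returning the item to the next delegate.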

Spring Batch: How do I use decider based on my Reader result?

I am new to Spring Batch. I found out that by using ExecutionContextPromotionListener, I can set up key-value pairs and get them in future steps.
<step id="step1">....</step>
<decision id="decision1" .... />
When I used a Tasklet instead of a reader, I did the following:
Created a bean of ExecutionContextPromotionListener with the expected keys in my batch config file.
Registered the listener to my step.
Put the key-value pairs in the executionContext retrieved from the chunkContext inside my Tasklet.
Now the Decider can read from the step execution context and decide.
But I want to take the decision based on the Reader from the previous step. So, in my decision, how do I get the value from the Reader? Or is my approach wrong? Please suggest.
A simple way is to make use of the ExecutionContext from your step and pass the values on to the next step.
So, in your first step, do something like this:
// ...
ExecutionContext stepContext = this.stepExecution.getExecutionContext();
stepContext.put("DATA_KEY", dataToShare );
Then in your next step you can read this via the job execution context; this works because the ExecutionContextPromotionListener you mentioned promotes the key from the step context to the job context once the step completes.
ExecutionContext jobContext = jobExecution.getExecutionContext();
dataToShare = jobContext.get("DATA_KEY");
You just need to manage the keys: use the same key to put the value in the first step and to read it in the next step.
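For reference, a sketch of wiring the promotion listener in Java config (the class and bean names are illustrative assumptions):

```java
import org.springframework.batch.core.listener.ExecutionContextPromotionListener;
import org.springframework.context.annotation.Bean;

public class PromotionConfig {

    // Promotes DATA_KEY from the step context to the job context after the step.
    @Bean
    public ExecutionContextPromotionListener promotionListener() {
        ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
        listener.setKeys(new String[] { "DATA_KEY" });
        return listener;
    }
}
```

Register it on the first step (e.g. .listener(promotionListener()) in the step builder) so the promotion runs when that step finishes.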

How to commit a file(entire file) in spring batch without using chunks - commit interval?

The commit interval commits the data at specified intervals. I want to commit the entire file in a single shot, since my requirement is to validate the file (line by line) and, if it fails at any point, roll back with no commit. Is there any way to achieve this in Spring Batch?
You can either set your commit-interval to Integer.MAX_VALUE (2^31 - 1) or create your own CompletionPolicy.
Here's how you configure a step to use a custom CompletionPolicy:
<chunk reader="reader" writer="writer" chunk-completion-policy="completionPolicy"/>
<bean id="completionPolicy" class="xx.xx.xx.CompletionPolicy"/>
Then you have to either choose an out-of-the-box CompletionPolicy provided by Spring Batch (a list of implementations is available on previous link) or create your own.
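As a sketch of the "own CompletionPolicy" route: extending CompletionPolicySupport and never reporting completion on count means the chunk only ends when the reader is exhausted, i.e. the whole file becomes one chunk (assuming it fits in memory and in one transaction):

```java
import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.policy.CompletionPolicySupport;

// One chunk per file: never complete on count, only when the reader runs out.
public class WholeFileCompletionPolicy extends CompletionPolicySupport {

    @Override
    public boolean isComplete(RepeatContext context) {
        return false; // the chunk ends only when the reader returns null
    }
}
```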
What do you mean by "commit"?
You are talking about validating, not about writing the read data to another file or into a database.
As mentioned in the comment by Michael Prarlow, memory problems could arise if the size of the file changes.
In order to prevent this, I would suggest starting your job with a validation step. Simply read the data chunk-wise, check the data line by line in your processor, and throw a non-skippable exception if a line is not valid. Use a pass-through writer, so nothing is persisted. If there is a problem, the whole job will fail.
If you really have to write the data into a DB or another file, you can do this in a second step. Since you have validated your data, you shouldn't observe any problems.
A simple PassThroughItemWriter:
public class PassThroughItemWriter<T> implements ItemWriter<T> {

    @Override
    public void write(List<? extends T> items) {
        // do nothing
    }
}
or, if you use the Java-Api to build your job and steps, you could simply use a lambda:
stepBuilders.get("step")
.<..., ...>chunk(..)
.reader(...)
.processor(...) // your processor with the validation logic
.writer(items -> {}) // empty lambda expression
.build();

How to write more than one class in Spring Batch

Situation:
I read the URL of a file on the internet from the DB. In the ItemProcessor I download this file, and I want to save each row to the database. Then processing continues and I want to create some new "summary" class, which I want to save to the DB too. How should I configure my job in Spring Batch?
For your use-case the job can be defined using this step sequence (this way the job is also restartable):
Download the file from the URL to HDD using a Tasklet: a Tasklet is the strategy to process a single step; in your case something similar to this post can help. Store the local filename in the JobExecutionContext.
Process the downloaded file:
2.1 With a FlatFileItemReader<S> (or your own ItemReader/ItemStream implementation), read the downloaded file.
2.2 With an ItemProcessor<S,T>, process each row.
2.3 Write each object processed in 2.2 to the database using a custom MyWriter<T> that does the summary calculation and delegates to an ItemWriter<T> for T's database persistence and to an ItemWriter<Summary> to write the Summary object.
Here <S> is the bean that contains each file row and
<T> is the bean you write to the DB.
MyWriter<T> can be implemented in this way:
class MyWriter<T> implements ItemWriter<T> {
    private ItemWriter<Summary> summaryWriter;
    private ItemWriter<T> tWriter;

    @Override
    public void write(List<? extends T> items) throws Exception {
        List<Summary> summaries = new ArrayList<>(items.size());
        for (T item : items) {
            final Summary summary = null; /* Here create the summary object, reading it
                                           * from the database or creating a new one */
            /* Do the summary or update the summary */
            summaries.add(summary);
        }
        /* The code above is trivial: you can group Summary objects using a
         * Map<SummaryKey, Summary> to reduce reads and use
         * summaryWriter.write(summariesMap.values()), for example */
        tWriter.write(items);
        summaryWriter.write(summaries);
    }
}
You need to register both MyWriter.summaryWriter and MyWriter.tWriter as streams for restartability.
You can use a CompositeItemWriter.
But perhaps your summary processing should be in a separate step which reads the rows you previously inserted.
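A sketch of the CompositeItemWriter alternative; note that all delegates receive the same item type, so this fits when each delegate writes a different aspect of the same row (the Row type and writer names are illustrative assumptions):

```java
import java.util.Arrays;

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.CompositeItemWriter;

// Placeholder domain type for illustration.
class Row {}

public class WriterConfig {

    // Both delegates get the same chunk of items, in order.
    public CompositeItemWriter<Row> compositeWriter(ItemWriter<Row> rowWriter,
                                                    ItemWriter<Row> summaryUpdatingWriter) {
        CompositeItemWriter<Row> writer = new CompositeItemWriter<>();
        writer.setDelegates(Arrays.asList(rowWriter, summaryUpdatingWriter));
        return writer;
    }
}
```

If the Summary is a genuinely different type produced from the rows, the two-step approach (write rows first, then aggregate them in a second step) is usually cleaner than forcing both writes into one writer.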