I am working through a Spring Batch / Boot example, trying to convert an XML-based configuration into an annotation-based one. However, I am struggling to reproduce the exact configuration using @Bean in a Step.
<batch:step id="step1">
    <batch:tasklet>
        <batch:chunk reader="paymentDataReader" writer="paymentDataWriter" commit-interval="100000">
            <batch:listeners>
                <batch:listener ref="paymentingStepExecutionListener" />
            </batch:listeners>
        </batch:chunk>
    </batch:tasklet>
    <batch:next on="COMPLETED" to="sendpaymentingBatchFiles" />
</batch:step>
JobConfiguration.java
@Configuration
public class JobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private DataSource dataSource;

    @Bean
    @StepScope
    public PaymentContextTasklet paymentContextTasklet() {
        return new PaymentContextTasklet();
    }

    // Either execute for "Payment" or "Order"
    @Bean
    public ContextDecider contextDecider() {
        return new ContextDecider();
    }

    @Bean
    public JdbcPagingItemReader<Payment> pagingItemReader() {
        JdbcPagingItemReader<Payment> reader = new JdbcPagingItemReader<>();
        reader.setDataSource(this.dataSource);
        reader.setFetchSize(10);
        reader.setRowMapper(new PaymentRowMapper());
        MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
        queryProvider.setSelectClause("select paymentId, amount, customerId, paymentDate");
        queryProvider.setFromClause("from payment");
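        // Note: a paging query provider requires a sort key, or afterPropertiesSet()
        // will fail; "paymentId" is assumed here for illustration.
        Map<String, Order> sortKeys = new HashMap<>();
        sortKeys.put("paymentId", Order.ASCENDING);
        queryProvider.setSortKeys(sortKeys);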
        reader.setQueryProvider(queryProvider);
        return reader;
    }
    @Bean
    public ItemWriter<Payment> paymentItemWriter() {
        return items -> {
            for (Payment c : items) {
                System.out.println(c.toString());
            }
        };
    }

    @Bean
    public PaymentStepExecutionListener paymentStepExecutionListener() {
        return new PaymentStepExecutionListener();
    }
    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<Payment, Payment>chunk(10)
                .reader(pagingItemReader())
                .writer(paymentItemWriter())
                .tasklet(paymentStepExecutionListener()) // this does not compile: how do I register the listener?
                .build();
    }
    @Bean
    @StepScope
    public PaymentDataTasklet paymentDataTasklet() {
        return new PaymentDataTasklet();
    }

    @Bean
    public Step paymentContextStep() {
        return stepBuilderFactory.get("paymentContextStep")
                .tasklet(paymentContextTasklet())
                .build();
    }

    @Bean
    public Step paymentDataStep() {
        return stepBuilderFactory.get("paymentDataStep")
                .tasklet(paymentDataTasklet())
                .build();
    }

    @Bean
    public Step endStep() {
        return stepBuilderFactory.get("endStep")
                .tasklet(null)
                .build();
    }
    @Bean
    public Job paymentDataBatchJob() {
        return jobBuilderFactory.get("paymentDataBatchJob")
                .start(paymentContextStep())
                .next(contextDecider())
                .on("Payment").to(paymentDataStep()).on("COMPLETED").to(endStep())
                .from(contextDecider())
                .on("Order").to(endStep()).end()
                .build();
    }
}
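For reference, ContextDecider is used above but not shown in the question; a minimal sketch of such a JobExecutionDecider (the class name is from the question, the body is assumed) could look like:

public class ContextDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // Assumed: an earlier tasklet stores the context type in the job execution context.
        String context = jobExecution.getExecutionContext().getString("context", "Order");
        return new FlowExecutionStatus(context); // "Payment" or "Order"
    }
}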
The equivalent of your XML snippet in Java config would be something like:

@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<Payment, Payment>chunk(100000)
            .reader(paymentDataReader())
            .writer(paymentDataWriter())
            .listener(paymentingStepExecutionListener())
            .build();
}

@Bean
public Job paymentDataBatchJob() {
    return jobBuilderFactory.get("paymentDataBatchJob")
            .start(step1())
            .next(sendpaymentingBatchFiles())
            .build();
}
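If you also want the explicit on("COMPLETED") transition from the XML (rather than the implicit one that .next() gives you), a sketch using the flow API, with the same method names as above:

@Bean
public Job paymentDataBatchJob() {
    return jobBuilderFactory.get("paymentDataBatchJob")
            .start(step1())
            .on("COMPLETED").to(sendpaymentingBatchFiles())
            .end()
            .build();
}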
Thanks @Mahmoud Ben Hassine. I realize that I should be registering paymentStepExecutionListener via listener() and not via tasklet().

One more thing: the stepExecutionContext can only be accessed from a bean defined in scope="step", so the listener should be declared as:

// You can access the stepExecutionContext only within a bean defined in scope="step".
@Bean
@StepScope
public PaymentStepExecutionListener paymentStepExecutionListener() {
    return new PaymentStepExecutionListener();
}

Now everything is working fine.
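For illustration, this is the kind of late binding that requires @StepScope (the context key and field are assumptions, not from the original):

public class PaymentStepExecutionListener implements StepExecutionListener {

    // Resolvable only because the bean is step-scoped (late binding).
    @Value("#{stepExecutionContext['paymentFile']}")
    private String paymentFile; // assumed key, for illustration

    @Override
    public void beforeStep(StepExecution stepExecution) {
        System.out.println("Processing " + paymentFile);
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return stepExecution.getExitStatus();
    }
}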
Related
My Spring Batch job configuration has 5 steps, all of which are identical except for the reader. Is there a way to abstract out all of the other parts of the step into a "parent" step, so that I don't need to repeat everything? I know this can be done in XML, but I can't figure out the Java equivalent.
Here's one of the steps:
public Step quarterlyStep(FileIngestErrorListener listener, ItemReader<DistributionItem> quarterlyReader) {
    return stepBuilderFactory.get("quarterlyStep")
            .<DistributionItem,DistributionItem>chunk(10)
            .reader(quarterlyReader) // The only thing that changes among 5 different steps
            .listener(listener.asReadListener())
            .processor(processor())
            .listener(listener.asProcessListener())
            .writer(writer())
            .listener(listener.asWriteListener())
            .faultTolerant()
            .skip(ValidationException.class)
            .skip(ExcelFileParseException.class)
            .build();
}
Here's the definition of one of the readers:
@Bean
@JobScope
public PoiItemReader<DistributionItem> yearEndReader(@Value("#{jobExecutionContext['filename']}") String filename) {
    PoiItemReader<DistributionItem> reader = new PoiItemReader<>();
    reader.setLinesToSkip(1);
    reader.setRowMapper(yearEndRowMapper());
    reader.setResource(new FileSystemResource(filename));
    return reader;
}
You can do something like:

@Autowired
private StepBuilderFactory stepBuilderFactory;

// Not a @Bean: a plain helper method that returns a pre-configured builder
private SimpleStepBuilder<Integer, Integer> createBaseStep(String stepName) {
    return stepBuilderFactory.get(stepName)
            .<Integer, Integer>chunk(5)
            .processor(itemProcessor())
            .writer(itemWriter());
}

@Bean
public Step step1(ItemReader<Integer> itemReader) {
    return createBaseStep("step1")
            .reader(itemReader)
            .build();
}

@Bean
public Step step2(ItemReader<Integer> itemReader) {
    return createBaseStep("step2")
            .reader(itemReader)
            .build();
}
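Applied to the quarterlyStep from the question, a sketch (assuming the same listener, processor, and writer beans shown above) could look like:

private SimpleStepBuilder<DistributionItem, DistributionItem> createBaseStep(String stepName, FileIngestErrorListener listener) {
    return stepBuilderFactory.get(stepName)
            .<DistributionItem, DistributionItem>chunk(10)
            .listener(listener.asReadListener())
            .processor(processor())
            .listener(listener.asProcessListener())
            .writer(writer())
            .listener(listener.asWriteListener());
}

@Bean
public Step quarterlyStep(FileIngestErrorListener listener, ItemReader<DistributionItem> quarterlyReader) {
    return createBaseStep("quarterlyStep", listener)
            .reader(quarterlyReader) // the only part that varies per step
            .faultTolerant()
            .skip(ValidationException.class)
            .skip(ExcelFileParseException.class)
            .build();
}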
I have a Spring Batch project with multiple jobs (job A, job B, job C, ...). When I run job A, the log shows that all of the beans of jobs B, C, ... are created too. Is there any way to avoid creating the other jobs' beans when job A is launched?
I have tried the @Lazy annotation, but it does not seem to work.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("springDataSource")
    public DataSource springDataSource;

    @Autowired
    @Qualifier("batchJobDataSource")
    public DataSource batchJobDataSource;
}

@Configuration
@PropertySource("classpath:partner.properties")
public class B extends BatchConfiguration {

    @Value("${partnerId}")
    private String partnerId;

    @Lazy
    @Bean
    public Job ProcessB(JobCompletionNotificationListener listener) {
        return jobBuilderFactory
                .get("ProcessB")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(ProcessStepB())
                .build();
    }

    @Lazy
    @Bean
    public Step ProcessStepB() {
        return stepBuilderFactory
                .get("ProcessStepB")
                .<PartnerDTO, PartnerDTO>chunk(1)
                .reader(getPartner())
                .processor(process())
                .writer(saveTransaction())
                .build();
    }

    @Lazy
    @Bean(destroyMethod = "")
    public Reader getPartner() {
        return new Reader(batchJobDataSource, partnerId);
    }

    @Lazy
    @Bean
    public Processor process() {
        return new Processor();
    }

    @Lazy
    @Bean
    HistoryWriter historyWriter() {
        return new HistoryWriter(batchJobDataSource);
    }

    @Lazy
    @Bean
    UpdateWriter updateWriter() {
        return new UpdateWriter(batchJobDataSource);
    }

    @Lazy
    @Bean
    public CompositeItemWriter<PartnerDTO> saveTransaction() {
        List<ItemWriter<? super PartnerDTO>> delegates = new ArrayList<>();
        delegates.add(updateWriter());
        delegates.add(historyWriter());
        CompositeItemWriter<PartnerDTO> itemWriter = new CompositeItemWriter<>();
        itemWriter.setDelegates(delegates);
        return itemWriter;
    }
}
I have also put @Lazy on the @Configuration class, but that does not work either.
That should not be an issue. But here are a few ideas to try:

- Use Spring profiles to isolate job beans (see the sketch below)
- If you use Spring Boot 2.2+, try activating the lazy bean initialization mode
- Package each job in its own jar. This is the best option IMO.
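A minimal sketch of the profile approach (the class and profile names are assumptions, not from the original):

@Configuration
@Profile("jobA") // beans in this class exist only when the "jobA" profile is active
public class JobAConfiguration {

    @Bean
    public Job jobA(JobBuilderFactory jobBuilderFactory, Step stepA) {
        return jobBuilderFactory.get("jobA")
                .start(stepA)
                .build();
    }
}

Launched with --spring.profiles.active=jobA, only job A's beans are instantiated. The lazy-init alternative in Boot 2.2+ is the spring.main.lazy-initialization=true property.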
In my simple Spring Batch application, I have a configuration...
@Configuration
public class StepTransitionConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Step step1() {
        ...
    }

    @Bean
    public Step step2() {
        ...
    }

    @Bean
    public Job transitionJobSimpleNext() {
        return jobBuilderFactory.get("transitionJobNext")
                .start(step1())
                .next(step2())
                .build();
    }
}
The only other class is:

@SpringBootApplication
@EnableBatchProcessing
public class TransitionsApplication {

    public static void main(String[] args) {
        SpringApplication.run(TransitionsApplication.class, args);
    }
}
So I started wondering how that application class knows to use that particular configuration for the batch. So I added another configuration...
@Configuration
public class AnotherTransitionConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Step step4() {
        ...
    }

    @Bean
    public Step step5() {
        ...
    }

    @Bean
    public Job transitionJobSimpleNext() {
        return jobBuilderFactory.get("transitionJobNext")
                .start(step4())
                .next(step5())
                .build();
    }
}
Now when I run the application, it always uses the second configuration, even though both are on the classpath. Why is this?
1) I have a large file (> 100k lines) that needs to be processed. There is a lot of business validation, and there are checks against external systems for each line item. The code is being migrated from a legacy app, and I just put this business logic into the AsyncItemProcessor, which also persists the data into the DB. Is it good practice to create/save records in the ItemProcessor (in lieu of the ItemWriter)?
2) Here is the code:
@Configuration
@EnableAutoConfiguration
@ComponentScan(basePackages = "com.liquidation.lpid")
@EntityScan(basePackages = "com.liquidation.lpid.entities")
@EnableTransactionManagement
public class SimpleJobConfiguration {

    @Autowired
    public JobRepository jobRepository;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("myFtpSessionFactory")
    private SessionFactory myFtpSessionFactory;

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Bean
    public ThreadPoolTaskExecutor lpidItemTaskExecutor() {
        ThreadPoolTaskExecutor tExec = new ThreadPoolTaskExecutor();
        tExec.setCorePoolSize(10);
        tExec.setMaxPoolSize(10);
        tExec.setAllowCoreThreadTimeOut(true);
        return tExec;
    }

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        String name = stepExecution.getStepName();
        System.out.println("name: " + name);
    }
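    // Note: @BeforeStep here has no effect; Spring Batch only honors it on beans
    // that are registered as listeners on a step (e.g. via .listener(...)), not on
    // the @Configuration class itself.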
    @Bean
    public SomeItemWriterListener someItemWriterListener() {
        return new SomeItemWriterListener();
    }

    @Bean
    @StepScope
    public FlatFileItemReader<FieldSet> lpidItemReader(@Value("#{stepExecutionContext['fileResource']}") String fileResource) {
        System.out.println("itemReader called for customer data: " + fileResource);
        FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<FieldSet>();
        reader.setResource(new ClassPathResource("/data/stage/" + fileResource));
        reader.setLinesToSkip(1);
        DefaultLineMapper<FieldSet> lineMapper = new DefaultLineMapper<FieldSet>();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        // Derive the column names from the skipped header line
        reader.setSkippedLinesCallback(new LineCallbackHandler() {
            public void handleLine(String line) {
                if (line != null) {
                    tokenizer.setNames(line.split(","));
                }
            }
        });
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(new PassThroughFieldSetMapper());
        lineMapper.afterPropertiesSet();
        reader.setLineMapper(lineMapper);
        return reader;
    }

    @Bean
    public ItemWriter<FieldSet> lpidItemWriter() {
        return new LpidItemWriter();
    }

    @Autowired
    private MultiFileResourcePartitioner multiFileResourcePartitioner;

    @Bean
    public Step masterStep() {
        return stepBuilderFactory.get("masterStep")
                .partitioner(slaveStep().getName(), multiFileResourcePartitioner)
                .step(slaveStep())
                .gridSize(4)
                .taskExecutor(lpidItemTaskExecutor())
                .build();
    }

    @Bean
    public ItemProcessListener<FieldSet, String> processListener() {
        return new LpidItemProcessListener();
    }

    @SuppressWarnings("unchecked")
    @Bean
    public Step slaveStep() {
        return stepBuilderFactory.get("slaveStep")
                .<FieldSet, FieldSet>chunk(5)
                .faultTolerant()
                .listener(new ChunkListener())
                .reader(lpidItemReader(null))
                .processor(asyncItemProcessor())
                .writer(asyncItemWriter())
                .listener(someItemWriterListener())
                .build();
    }

    @Bean
    public AsyncItemWriter<FieldSet> asyncItemWriter() {
        AsyncItemWriter<FieldSet> asyncItemWriter = new AsyncItemWriter<>();
        asyncItemWriter.setDelegate(lpidItemWriter());
        try {
            asyncItemWriter.afterPropertiesSet();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asyncItemWriter;
    }

    @Bean
    public ItemProcessor<FieldSet, FieldSet> processor() {
        return new lpidCheckItemProcessor();
    }

    @Bean
    public AsyncItemProcessor<FieldSet, FieldSet> asyncItemProcessor() {
        AsyncItemProcessor<FieldSet, FieldSet> asyncItemProcessor = new AsyncItemProcessor<FieldSet, FieldSet>();
        asyncItemProcessor.setDelegate(processor());
        asyncItemProcessor.setTaskExecutor(lpidItemTaskExecutor());
        try {
            asyncItemProcessor.afterPropertiesSet();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asyncItemProcessor;
    }

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get("job")
                .incrementer(new RunIdIncrementer())
                .start(masterStep())
                .build();
    }
}
The ItemWriter runs before the ItemProcessor has completed. My understanding is: for every chunk, the item reader reads the data, the item processor churns through each item, and at the end of the chunk the item writer gets called (which in my case does nothing, since the item processor persists the data). But the item writer gets called before the item processor completes, and my job never finishes. What am I doing incorrectly here? (I looked at previous issues around this, and the solution was to wrap the writer in an AsyncItemWriter, which I am doing.)
Thanks
Sundar
I am trying to build a sample application demonstrating parallel step execution using a Java configuration file, but I am perplexed about how many pieces (job repository, job launcher, job execution, etc.) need to be configured and initialized, and how.
Put simply, I need a sample application that clarifies the basics of parallel execution of steps in a job.
With @EnableBatchProcessing, the infrastructure beans (JobRepository, JobLauncher, and so on) are created for you, so you only need to define steps, flows, and the job itself. Here's an example of using splits via Java config. In this example, flows 1 and 2 will be executed in parallel:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Tasklet tasklet() {
        return new CountingTasklet();
    }

    @Bean
    public Flow flow1() {
        return new FlowBuilder<Flow>("flow1")
                .start(stepBuilderFactory.get("step1")
                        .tasklet(tasklet()).build())
                .build();
    }

    @Bean
    public Flow flow2() {
        return new FlowBuilder<Flow>("flow2")
                .start(stepBuilderFactory.get("step2")
                        .tasklet(tasklet()).build())
                .next(stepBuilderFactory.get("step3")
                        .tasklet(tasklet()).build())
                .build();
    }

    @Bean
    public Job job() {
        return jobBuilderFactory.get("job")
                .start(flow1())
                .split(new SimpleAsyncTaskExecutor()).add(flow2())
                .end()
                .build();
    }

    public static class CountingTasklet implements Tasklet {

        @Override
        public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
            System.out.println(String.format("%s has been executed on thread %s",
                    chunkContext.getStepContext().getStepName(), Thread.currentThread().getName()));
            return RepeatStatus.FINISHED;
        }
    }
}
Suppose you have steps A, B1, B2, B3, and C, and you want to run B1, B2 and B3 in parallel. You first need to create sub-flows for them and then combine them into one flow with a SimpleAsyncTaskExecutor:
@Bean
public Job job()
{
    final Flow flowB1 = new FlowBuilder<Flow>("subflowb1").from(stepb1()).end();
    final Flow flowB2 = new FlowBuilder<Flow>("subflowb2").from(stepb2()).end();
    final Flow flowB3 = new FlowBuilder<Flow>("subflowb3").from(stepb3()).end();

    final Flow splitFlow = new FlowBuilder<Flow>("splitFlow")
            .start(flowB1)
            .split(new SimpleAsyncTaskExecutor())
            .add(flowB2, flowB3).build();

    return jobBuilderFactory
            .get("job")
            .flow(stepA())
            .next(splitFlow)
            .next(stepC())
            .end()
            .build();
}
Here is a basic example of parallel step execution on different data sets: you provide a Partitioner that creates a separate execution context for each partition, and each step instance works on its own data set based on that context.
<batch:job id="myJob" job-repository="jobRepository">
    <batch:step id="master">
        <batch:partition step="step1" partitioner="stepPartitioner">
            <batch:handler grid-size="4" task-executor="taskExecutor"/>
        </batch:partition>
    </batch:step>
</batch:job>

<batch:step id="step1">
    <batch:tasklet>
        <batch:chunk reader="myReader" processor="myProcessor" writer="myWriter"
                     commit-interval="10"/>
    </batch:tasklet>
</batch:step>
public class StepPartitioner implements Partitioner {

    @Autowired
    DaoInterface daoInterface;

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> result = new HashMap<>();
        List<String> keys = daoInterface.getUniqueKeyForStep();
        for (String key : keys) {
            ExecutionContext executionContext = new ExecutionContext();
            executionContext.putString("key", key);
            result.put(key, executionContext);
        }
        return result;
    }
}
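Since the question asked for Java config, here is a hedged sketch of the equivalent master step and a step-scoped reader that picks up the partition key (bean names follow the XML above; stepBuilderFactory, dataSource, MyItem, and the SQL are assumptions for illustration):

@Bean
public Step master(StepPartitioner stepPartitioner, TaskExecutor taskExecutor) {
    return stepBuilderFactory.get("master")
            .partitioner(step1().getName(), stepPartitioner) // one partition per key
            .step(step1())
            .gridSize(4)
            .taskExecutor(taskExecutor)
            .build();
}

@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<MyItem, MyItem>chunk(10)
            .reader(myReader(null)) // null is replaced by the step-scoped proxy at runtime
            .processor(myProcessor())
            .writer(myWriter())
            .build();
}

@Bean
@StepScope
public JdbcCursorItemReader<MyItem> myReader(@Value("#{stepExecutionContext['key']}") String key) {
    // Each partition reads only the rows for its own key, as stored in the
    // ExecutionContext by the partitioner above.
    JdbcCursorItemReader<MyItem> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(dataSource);
    reader.setSql("select * from my_table where partition_key = ?"); // assumed query
    reader.setPreparedStatementSetter(ps -> ps.setString(1, key));
    reader.setRowMapper(new BeanPropertyRowMapper<>(MyItem.class));
    return reader;
}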