I am trying to build a sample application demonstrating parallel step execution with a Java configuration file, but I am confused about how many beans (job repository, job launcher, job execution, etc.) need to be configured and initialized, and if so, how.
Simply put, I need a sample application that clarifies the basics of executing steps in parallel within a job.
Here's an example of using splits via Java config. In this example, flows 1 and 2 will be executed in parallel:
@Configuration
public class BatchConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Tasklet tasklet() {
        return new CountingTasklet();
    }

    @Bean
    public Flow flow1() {
        return new FlowBuilder<Flow>("flow1")
                .start(stepBuilderFactory.get("step1")
                        .tasklet(tasklet()).build())
                .build();
    }

    @Bean
    public Flow flow2() {
        return new FlowBuilder<Flow>("flow2")
                .start(stepBuilderFactory.get("step2")
                        .tasklet(tasklet()).build())
                .next(stepBuilderFactory.get("step3")
                        .tasklet(tasklet()).build())
                .build();
    }

    @Bean
    public Job job() {
        return jobBuilderFactory.get("job")
                .start(flow1())
                .split(new SimpleAsyncTaskExecutor()).add(flow2())
                .end()
                .build();
    }

    public static class CountingTasklet implements Tasklet {

        @Override
        public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
            System.out.println(String.format("%s has been executed on thread %s",
                    chunkContext.getStepContext().getStepName(), Thread.currentThread().getName()));
            return RepeatStatus.FINISHED;
        }
    }
}
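As for the infrastructure beans the question asks about (job repository, job launcher, etc.): with Java config you normally don't declare them yourself. Adding @EnableBatchProcessing to a configuration class makes Spring Batch register a JobRepository, a JobLauncher, a JobRegistry and the two builder factories automatically, so the configuration above only needs:

@Configuration
@EnableBatchProcessing // auto-configures JobRepository, JobLauncher, JobRegistry and the builder factories
public class BatchConfiguration {
    // the tasklet, flow and job beans shown above
}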
Suppose you have steps A, B1, B2, B3 and C, and you want to run B1, B2 and B3 in parallel. You first need to create sub-flows for them and then add those to one split flow with a SimpleAsyncTaskExecutor():
@Bean
public Job job() {
    final Flow flowB1 = new FlowBuilder<Flow>("subflowb1").from(stepb1()).end();
    final Flow flowB2 = new FlowBuilder<Flow>("subflowb2").from(stepb2()).end();
    final Flow flowB3 = new FlowBuilder<Flow>("subflowb3").from(stepb3()).end();
    final Flow splitFlow = new FlowBuilder<Flow>("splitFlow")
            .start(flowB1)
            .split(new SimpleAsyncTaskExecutor())
            .add(flowB2, flowB3).build();
    return jobBuilderFactory
            .get("job") // a job name is required before building the flow; "job" is an assumed name here
            .flow(stepA())
            .next(splitFlow)
            .next(stepC())
            .end()
            .build();
}
Here is basic parallel step execution on different data sets. You have to provide a Partitioner that creates a separate ExecutionContext for each partition; based on that context, each step instance works on its own data set.
<batch:job id="myJob" job-repository="jobRepository">
    <batch:step id="master">
        <batch:partition step="step1" partitioner="stepPartitioner">
            <batch:handler grid-size="4" task-executor="taskExecutor"/>
        </batch:partition>
    </batch:step>
</batch:job>

<batch:step id="step1">
    <batch:tasklet>
        <batch:chunk reader="myReader" processor="myProcessor" writer="myWriter"
                     commit-interval="10"/>
    </batch:tasklet>
</batch:step>
public class stepPartitioner implements Partitioner {

    @Autowired
    DaoInterface daoInterface;

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> result = new HashMap<>();
        List<String> keys = daoInterface.getUniqueKeyForStep();
        for (String key : keys) {
            ExecutionContext executionContext = new ExecutionContext();
            executionContext.putString("key", key);
            result.put(key, executionContext);
        }
        return result;
    }
}
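For comparison, here is a rough Java-config equivalent of the partitioned master step above. This is a minimal sketch assuming the step1, stepPartitioner and taskExecutor beans exist under the names used in the XML:

@Bean
public Step master(Step step1, Partitioner stepPartitioner, TaskExecutor taskExecutor) {
    return stepBuilderFactory.get("master")
            .partitioner("step1", stepPartitioner) // name of the worker step + the partitioner
            .step(step1)                           // worker step executed once per partition
            .gridSize(4)                           // same grid-size as in the XML handler
            .taskExecutor(taskExecutor)            // run the partitions in parallel
            .build();
}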
My Spring Batch job configuration has 5 steps, all of which are identical except for the reader. Is there a way I can abstract all of the other parts of the step into a "parent" step, so that I don't need to repeat everything? I know this can be done in XML, but I can't figure out the Java equivalent.
Here's one of the steps:
public Step quarterlyStep(FileIngestErrorListener listener, ItemReader<DistributionItem> quarterlyReader) {
    return stepBuilderFactory.get("quarterlyStep")
            .<DistributionItem, DistributionItem>chunk(10)
            .reader(quarterlyReader) // The only thing that changes among 5 different steps
            .listener(listener.asReadListener())
            .processor(processor())
            .listener(listener.asProcessListener())
            .writer(writer())
            .listener(listener.asWriteListener())
            .faultTolerant()
            .skip(ValidationException.class)
            .skip(ExcelFileParseException.class)
            .build();
}
Here's the definition of one of the readers:
@Bean
@JobScope
public PoiItemReader<DistributionItem> yearEndReader(@Value("#{jobExecutionContext['filename']}") String filename) {
    PoiItemReader<DistributionItem> reader = new PoiItemReader<>();
    reader.setLinesToSkip(1);
    reader.setRowMapper(yearEndRowMapper());
    reader.setResource(new FileSystemResource(filename));
    return reader;
}
You can do something like:
@Autowired
private StepBuilderFactory stepBuilderFactory;

private SimpleStepBuilder<Integer, Integer> createBaseStep(String stepName) {
    return stepBuilderFactory.get(stepName)
            .<Integer, Integer>chunk(5)
            .processor(itemProcessor())
            .writer(itemWriter());
}

@Bean
public Step step1(ItemReader<Integer> itemReader) {
    return createBaseStep("step1")
            .reader(itemReader)
            .build();
}

@Bean
public Step step2(ItemReader<Integer> itemReader) {
    return createBaseStep("step2")
            .reader(itemReader)
            .build();
}
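Note that createBaseStep is deliberately not a @Bean method: each call returns a fresh SimpleStepBuilder, so every step gets its own name and reader while the processor/writer wiring is written only once. To give each step a different reader, qualify the injected parameter, e.g. @Qualifier("quarterlyReader") on the step's ItemReader parameter.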
I would like to exit a tasklet cleanly when an error occurs and stop the batch, without having to resort to a System.exit(1).
Here is my code:
/**
 * Tasklet execution.
 */
@Override
public RepeatStatus execute(StepContribution arg0, ChunkContext arg1) throws IOException {
    if (suiviFluxDao.getNbFileDateTrt(FilenameUtils.getName(resource), Utils.getDateFromStringFormatUS(dateTraitement)) > 0) {
        LOGGER.info(PropertiesUtils.getLibelleExcep(Constantes.ERREUR_NB_FILE_SELECT,
                new String[]{ConstantesNomsSql.TABLE_STCO_STAU_SUIVI_FLUX, FilenameUtils.getName(resource), dateTraitement, Constantes.NAME_TRT}));
        System.exit(1);
    } else {
        SuiviFluxBO suiviFluxBO = new SuiviFluxBO();
        suiviFluxBO.setDateSysteme(Utils.getDateTodayFormatUS());
        suiviFluxBO.setDateTrt(Utils.getDateFromStringFormatUS(dateTraitement));
        suiviFluxBO.setLibelleTrt("Batch_Java");
        suiviFluxBO.setNomficTrt(FilenameUtils.getName(resource));
        suiviFluxBO.setNbrrecTrt(Utils.countNbFile(resource));
        suiviFluxBO.setNomtabTrt(ConstantesNomsSql.TABLE_STCO_STAU_FIC_ADH);
        suiviFluxBO.setNbrlignesTrt(0);
        suiviFluxDao.insertSuiviBO(suiviFluxBO);
    }
    // end of execution
    return RepeatStatus.FINISHED;
}
The tasklet implements StepExecutionListener, but how do I change the execution status in the IF branch that detects the error, so that the step ends up FAILED?
Thank you for your leads.
Based on the above requirement, we can build a flow using the Spring Batch FlowBuilder.
1. Build a Tasklet that performs the required validations and sets the ExitStatus based on the validation result:
@Component
public class TestTasklet implements StepExecutionListener, Tasklet {

    // Any additional properties can be added here if required.

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // Any logic added here will execute before the step
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Any logic added here will execute after the step
        return null;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws IOException {
        if (suiviFluxDao.getNbFileDateTrt(FilenameUtils.getName(resource),
                Utils.getDateFromStringFormatUS(dateTraitement)) > 0) {
            LOGGER.info(PropertiesUtils.getLibelleExcep(Constantes.ERREUR_NB_FILE_SELECT,
                    new String[]{ConstantesNomsSql.TABLE_STCO_STAU_SUIVI_FLUX, FilenameUtils.getName(resource),
                            dateTraitement, Constantes.NAME_TRT}));
            contribution.setExitStatus(ExitStatus.FAILED);
        } else {
            // any logic goes here
            contribution.setExitStatus(ExitStatus.COMPLETED);
        }
        return RepeatStatus.FINISHED;
    }
}
2. The snippet below configures the job using the flow builder:
@Configuration
public class JobConfigurations {

    private StepBuilderFactory stepBuilderFactory;
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    public JobConfigurations(StepBuilderFactory stepBuilderFactory,
                             JobBuilderFactory jobBuilderFactory) {
        this.stepBuilderFactory = stepBuilderFactory;
        this.jobBuilderFactory = jobBuilderFactory;
    }

    @Bean
    public Job job(TestTasklet testTasklet) {
        Step validationStep = stepBuilderFactory.get("validationTasklet")
                .tasklet(testTasklet).build();
        // create another step where you want to perform the business logic;
        // for the sake of brevity let us assume it to be businessValidationStep:
        // Step businessValidationStep = stepBuilderFactory.get("businessValidationStep")
        //         .chunk().reader().processor().writer();
        return jobBuilderFactory.get("JOB_NAME").incrementer(new RunIdIncrementer())
                .start(validationStep)                      // start your job with the validation step
                .on(ExitStatus.FAILED.getExitCode()).end()  // this will terminate your job cleanly
                .from(validationStep)
                .on(ExitStatus.COMPLETED.getExitCode())     // .to(businessValidationStep)
                .to(validationStep).build().build();
    }
}
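One subtlety: terminating the flow with .end() after the FAILED transition stops the job but records its status as COMPLETED in the job repository. If the job itself should be marked FAILED when validation fails, use .fail() on that transition instead of .end().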
@lasnico37 Hope the above code solves the problem.
I have a Spring Batch project with multiple jobs (job A, job B, job C, ...). When I run a particular job A, the log shows that all of the beans of jobs B, C, ... are created too. Is there any way to avoid creating the other jobs' beans when job A is launched?
I have tried to use the @Lazy annotation but it doesn't seem to work.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("springDataSource")
    public DataSource springDataSource;

    @Autowired
    @Qualifier("batchJobDataSource")
    public DataSource batchJobDataSource;
}

@Configuration
@PropertySource("classpath:partner.properties")
public class B extends BatchConfiguration {

    @Value("${partnerId}")
    private String partnerId;

    @Lazy
    @Bean
    public Job ProcessB(JobCompletionNotificationListener listener) {
        return jobBuilderFactory
                .get("ProcessB")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(ProcessStepB())
                .build();
    }

    @Lazy
    @Bean
    public Step ProcessStepB() {
        return stepBuilderFactory
                .get("ProcessStepB")
                .<PartnerDTO, PartnerDTO>chunk(1)
                .reader(getPartner())
                .processor(process())
                .writer(saveTransaction())
                .build();
    }

    @Lazy
    @Bean(destroyMethod = "")
    public Reader getPartner() {
        return new Reader(batchJobDataSource, partnerId);
    }

    @Lazy
    @Bean
    public Processor process() {
        return new Processor();
    }

    @Lazy
    @Bean
    HistoryWriter historyWriter() {
        return new HistoryWriter(batchJobDataSource);
    }

    @Lazy
    @Bean
    UpdateWriter updateWriter() {
        return new UpdateWriter(batchJobDataSource);
    }

    @Lazy
    @Bean
    public CompositeItemWriter<PartnerDTO> saveTransaction() {
        List<ItemWriter<? super PartnerDTO>> delegates = new ArrayList<>();
        delegates.add(updateWriter());
        delegates.add(historyWriter());
        CompositeItemWriter<PartnerDTO> itemWriter = new CompositeItemWriter<>();
        itemWriter.setDelegates(delegates);
        return itemWriter;
    }
}
I have also put @Lazy on the @Configuration class, but that doesn't work either.
That should not be an issue. But here are a few ideas to try:
Use Spring profiles to isolate job beans
If you use Spring Boot 2.2+, try to activate the lazy bean initialization mode
Package each job in its own jar. This is the best option IMO.
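For the first two ideas, a minimal sketch (the profile name jobA is just an example, not something from your code):

// Isolate job A's beans behind a profile; activate with --spring.profiles.active=jobA
@Configuration
@Profile("jobA")
public class JobAConfiguration {
    // beans for job A only; job B's configuration gets its own profile
}

For the lazy initialization mode, set spring.main.lazy-initialization=true in application.properties (Spring Boot 2.2+).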
I am using a Spring Batch Boot example. In this example, I am trying to convert an XML-based application into an annotation-based one. However, I am struggling to create the exact equivalent of the step configuration using @Bean.
<batch:step id="step1">
    <batch:tasklet>
        <batch:chunk reader="paymentDataReader" writer="paymentDataWriter" commit-interval="100000">
            <batch:listeners>
                <batch:listener ref="paymentingStepExecutionListener" />
            </batch:listeners>
        </batch:chunk>
    </batch:tasklet>
    <batch:next on="COMPLETED" to="sendpaymentingBatchFiles" />
</batch:step>
JobConfiguration.java
@Configuration
public class JobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private DataSource dataSource;

    @Bean
    @StepScope
    public PaymentContextTasklet paymentContextTasklet() {
        return new PaymentContextTasklet();
    }

    // Either execute for "Payment" or "Order"
    @Bean
    public ContextDecider contextDecider() {
        return new ContextDecider();
    }

    @Bean
    public JdbcPagingItemReader<Payment> pagingItemReader() {
        JdbcPagingItemReader<Payment> reader = new JdbcPagingItemReader<>();
        reader.setDataSource(this.dataSource);
        reader.setFetchSize(10);
        reader.setRowMapper(new PaymentRowMapper());
        MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
        queryProvider.setSelectClause("select paymentId, amount, customerId, paymentDate");
        queryProvider.setFromClause("from payment");
        reader.setQueryProvider(queryProvider);
        return reader;
    }

    @Bean
    public ItemWriter<Payment> paymentItemWriter() {
        return items -> {
            for (Payment c : items) {
                System.out.println(c.toString());
            }
        };
    }

    @Bean
    public PaymentStepExecutionListener paymentStepExecutionListener() {
        return new PaymentStepExecutionListener();
    }

    // This is the step I cannot get right: it mixes chunk() and tasklet() and does not compile
    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .<Payment, Payment>chunk(10)
                .reader(pagingItemReader())
                .writer(paymentItemWriter())
                .tasklet(paymentStepExecutionListener())
                .rea
                .build();
    }

    @Bean
    @StepScope
    public PaymentDataTasklet paymentDataTasklet() {
        return new PaymentDataTasklet();
    }

    @Bean
    public Step paymentContextStep() {
        return stepBuilderFactory.get("paymentContextStep")
                .tasklet(paymentContextTasklet())
                .build();
    }

    @Bean
    public Step paymentDataStep() {
        return stepBuilderFactory.get("paymentDataStep")
                .tasklet(paymentDataTasklet())
                .build();
    }

    @Bean
    public Step endStep() {
        return stepBuilderFactory.get("endStep")
                .tasklet(null)
                .build();
    }

    @Bean
    public Job paymentDataBatchJob() {
        return jobBuilderFactory.get("paymentDataBatchJob")
                .start(paymentContextStep())
                .next(contextDecider())
                .on("Payment").to(paymentDataStep()).on("COMPLETED").to(endStep())
                .from(contextDecider())
                .on("Order").to(endStep()).end()
                .build();
    }
}
The equivalent of your XML snippet in Java config would be something like:
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<Payment, Payment>chunk(100000)
            .reader(paymentDataReader())
            .writer(paymentDataWriter())
            .listener(paymentingStepExecutionListener())
            .build();
}

@Bean
public Job paymentDataBatchJob() {
    return jobBuilderFactory.get("paymentDataBatchJob")
            .start(step1())
            .next(sendpaymentingBatchFiles())
            .build();
}
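Note that .next(...) proceeds only when the previous step completes successfully, which matches the <batch:next on="COMPLETED" .../> transition. If you prefer the transition to be explicit, an equivalent flow-style sketch:

@Bean
public Job paymentDataBatchJob() {
    return jobBuilderFactory.get("paymentDataBatchJob")
            .start(step1())
            .on("COMPLETED").to(sendpaymentingBatchFiles()) // explicit equivalent of <batch:next on="COMPLETED"/>
            .end()
            .build();
}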
Thanks @Mahmoud Ben Hassine. I realized that I should be using paymentStepExecutionListener as a listener and not as a tasklet.
One more thing: the stepExecutionContext can only be accessed from a bean defined in the step scope, so I should be using:

// The stepExecutionContext is only accessible within a bean defined in the step scope.
@Bean
@StepScope
public PaymentStepExecutionListener paymentStepExecutionListener() {
    return new PaymentStepExecutionListener();
}

Now things are working fine.
I have a working Spring Boot/Batch project containing 2 jobs.
I'm now trying to add Spring Integration to poll files from a remote SFTP server, using only Java configuration / the Java DSL, and then launch a job.
The file polling is working, but I have no idea how to launch a job from my flow, despite reading these links:
Spring Batch Integration config using Java DSL
and
Spring Batch Integration job-launching-gateway
Some code snippets:

@Bean
public SessionFactory<ChannelSftp.LsEntry> sftpSessionFactory() {
    DefaultSftpSessionFactory sftpSessionFactory = new DefaultSftpSessionFactory();
    sftpSessionFactory.setHost("myip");
    sftpSessionFactory.setPort(22);
    sftpSessionFactory.setUser("user");
    sftpSessionFactory.setPrivateKey(new FileSystemResource("path to my key"));
    return sftpSessionFactory;
}

@Bean
public IntegrationFlow ftpInboundFlow() {
    return IntegrationFlows
            .from(Sftp.inboundAdapter(sftpSessionFactory())
                    .deleteRemoteFiles(Boolean.FALSE)
                    .preserveTimestamp(Boolean.TRUE)
                    .autoCreateLocalDirectory(Boolean.TRUE)
                    .remoteDirectory("remote dir")
                    .regexFilter(".*\\.txt$")
                    .localDirectory(new File("C:/sftp/")),
                    e -> e.id("sftpInboundAdapter").poller(Pollers.fixedRate(600000)))
            .handle("FileMessageToJobRequest", "toRequest")
            // what to put next to process the jobRequest?
For .handle("FileMessageToJobRequest", "toRequest") I use the FileMessageToJobRequest transformer described here: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html
I would appreciate any help on that, many thanks.
EDIT after Gary's comment
I've added the following; it doesn't compile, of course, because I don't understand how the request is propagated:
    .handle("FileMessageToJobRequest", "toRequest")
    .handle(jobLaunchingGw())
    .get();
}

@Bean
public MessageHandler jobLaunchingGw() {
    return new JobLaunchingGateway(jobLauncher());
}

@Autowired
private JobLauncher jobLauncher;

@Bean
public JobExecution jobLauncher(JobLaunchRequest req) throws JobExecutionException {
    JobExecution execution = jobLauncher.run(req.getJob(), req.getJobParameters());
    return execution;
}
I've found a way to launch a job using a @ServiceActivator and adding it to my flow, but I'm not sure it's good practice:
    .handle("lauchBatchService", "launch")

@Component("lauchBatchService")
public class LaunchBatchService {

    private static Logger log = LoggerFactory.getLogger(LaunchBatchService.class);

    @Autowired
    private JobLauncher jobLauncher;

    @ServiceActivator
    public JobExecution launch(JobLaunchRequest req) throws JobExecutionException {
        JobExecution execution = jobLauncher.run(req.getJob(), req.getJobParameters());
        return execution;
    }
}
To process the JobLaunchRequest, hand it to a JobLaunchingGateway as the next handler in the flow:

    .handle(jobLaunchingGw())
    // handle result
    ...

@Bean
public MessageHandler jobLaunchingGw() {
    return new JobLaunchingGateway(jobLauncher());
}

where jobLauncher() is the JobLauncher bean.
EDIT
Your service activator is doing about the same as the JLG; it uses this code.
Your jobLauncher @Bean is wrong.
@Bean methods are definitions; they don't do runtime work like this:

@Bean
public JobExecution jobLauncher(JobLaunchRequest req) throws JobExecutionException {
    JobExecution execution = jobLauncher.run(req.getJob(), req.getJobParameters());
    return execution;
}
Since you are already autowiring a JobLauncher, just use that:

@Autowired
private JobLauncher jobLauncher;

@Bean
public MessageHandler jobLaunchingGw() {
    return new JobLaunchingGateway(jobLauncher);
}
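Putting it together, a minimal sketch of the end of the corrected flow (bean names are taken from the snippets above; the channel name jobExecutions is just an example for wherever you want the replies to go):

    // ... continuing the ftpInboundFlow() builder chain shown earlier
    .handle("FileMessageToJobRequest", "toRequest") // File -> JobLaunchRequest
    .handle(jobLaunchingGw())                       // JobLaunchRequest -> JobExecution (reply)
    .channel("jobExecutions")                       // route the JobExecution replies somewhere
    .get();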