1) I have a large file (> 100k lines) that needs to be processed, with a lot of business validation and checks against external systems for each line item. The code is being migrated from a legacy app, and I just put this business logic into the AsyncItemProcessor, which also persists the data into the DB. Is it good practice to create/save records in the ItemProcessor (in lieu of the ItemWriter)?
2) The code is:
@Configuration
@EnableAutoConfiguration
@ComponentScan(basePackages = "com.liquidation.lpid")
@EntityScan(basePackages = "com.liquidation.lpid.entities")
@EnableTransactionManagement
public class SimpleJobConfiguration {

    @Autowired
    public JobRepository jobRepository;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("myFtpSessionFactory")
    private SessionFactory myFtpSessionFactory;

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Bean
    public ThreadPoolTaskExecutor lpidItemTaskExecutor() {
        ThreadPoolTaskExecutor tExec = new ThreadPoolTaskExecutor();
        tExec.setCorePoolSize(10);
        tExec.setMaxPoolSize(10);
        tExec.setAllowCoreThreadTimeOut(true);
        return tExec;
    }

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        String name = stepExecution.getStepName();
        System.out.println("name: " + name);
    }

    @Bean
    public SomeItemWriterListener someItemWriterListener() {
        return new SomeItemWriterListener();
    }

    @Bean
    @StepScope
    public FlatFileItemReader<FieldSet> lpidItemReader(@Value("#{stepExecutionContext['fileResource']}") String fileResource) {
        System.out.println("itemReader called !!!!!!!!!!! for customer data" + fileResource);
        FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<>();
        reader.setResource(new ClassPathResource("/data/stage/" + fileResource));
        reader.setLinesToSkip(1);
        DefaultLineMapper<FieldSet> lineMapper = new DefaultLineMapper<>();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        // derive the column names from the skipped header line
        reader.setSkippedLinesCallback(new LineCallbackHandler() {
            public void handleLine(String line) {
                if (line != null) {
                    tokenizer.setNames(line.split(","));
                }
            }
        });
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(new PassThroughFieldSetMapper());
        lineMapper.afterPropertiesSet();
        reader.setLineMapper(lineMapper);
        return reader;
    }

    @Bean
    public ItemWriter<FieldSet> lpidItemWriter() {
        return new LpidItemWriter();
    }

    @Autowired
    private MultiFileResourcePartitioner multiFileResourcePartitioner;

    @Bean
    public Step masterStep() {
        return stepBuilderFactory.get("masterStep")
                .partitioner(slaveStep().getName(), multiFileResourcePartitioner)
                .step(slaveStep())
                .gridSize(4)
                .taskExecutor(lpidItemTaskExecutor())
                .build();
    }

    @Bean
    public ItemProcessListener<FieldSet, String> processListener() {
        return new LpidItemProcessListener();
    }

    @SuppressWarnings("unchecked")
    @Bean
    public Step slaveStep() {
        return stepBuilderFactory.get("slaveStep")
                .<FieldSet, FieldSet>chunk(5)
                .faultTolerant()
                .listener(new ChunkListener())
                .reader(lpidItemReader(null))
                .processor(asyncItemProcessor())
                .writer(asyncItemWriter())
                .listener(someItemWriterListener())
                .build();
    }

    @Bean
    public AsyncItemWriter<FieldSet> asyncItemWriter() {
        AsyncItemWriter<FieldSet> asyncItemWriter = new AsyncItemWriter<>();
        asyncItemWriter.setDelegate(lpidItemWriter());
        try {
            asyncItemWriter.afterPropertiesSet();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asyncItemWriter;
    }

    @Bean
    public ItemProcessor<FieldSet, FieldSet> processor() {
        return new lpidCheckItemProcessor();
    }

    @Bean
    public AsyncItemProcessor<FieldSet, FieldSet> asyncItemProcessor() {
        AsyncItemProcessor<FieldSet, FieldSet> asyncItemProcessor = new AsyncItemProcessor<>();
        asyncItemProcessor.setDelegate(processor());
        asyncItemProcessor.setTaskExecutor(lpidItemTaskExecutor());
        try {
            asyncItemProcessor.afterPropertiesSet();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asyncItemProcessor;
    }

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get("job").incrementer(new RunIdIncrementer()).start(masterStep()).build();
    }
}
The ItemWriter runs before the ItemProcessor has completed. My understanding is: for every chunk, the item reader reads the data, the item processor churns through each item, and at the end of the chunk the item writer gets called (which in my case does nothing, since the item processor persists the data). But the item writer gets called before the item processor completes, and my job never finishes. What am I doing incorrectly here? (I looked at previous issues around this, and the solution was to wrap the writer in an AsyncItemWriter, which I am doing.)
Thanks
Sundar
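For reference, a minimal sketch of how the AsyncItemProcessor/AsyncItemWriter pair is typically declared, with the Future generics made explicit (assuming the same FieldSet item type and delegate beans as the code above). The AsyncItemProcessor emits Future<FieldSet> items, and the AsyncItemWriter calls get() on each future before delegating, so the writer being invoked while processing is still in flight is expected behavior; the write itself blocks until the futures complete.
@Bean
public Step slaveStep(ItemReader<FieldSet> reader,
                      ItemProcessor<FieldSet, FieldSet> delegateProcessor,
                      ItemWriter<FieldSet> delegateWriter,
                      TaskExecutor taskExecutor) throws Exception {
    // The processor submits each item to the task executor and returns a Future.
    AsyncItemProcessor<FieldSet, FieldSet> asyncProcessor = new AsyncItemProcessor<>();
    asyncProcessor.setDelegate(delegateProcessor);
    asyncProcessor.setTaskExecutor(taskExecutor);
    asyncProcessor.afterPropertiesSet();

    // The writer unwraps the futures (Future#get) before calling the delegate,
    // blocking until each processor task has finished.
    AsyncItemWriter<FieldSet> asyncWriter = new AsyncItemWriter<>();
    asyncWriter.setDelegate(delegateWriter);
    asyncWriter.afterPropertiesSet();

    return stepBuilderFactory.get("slaveStep")
            .<FieldSet, Future<FieldSet>>chunk(5)  // note the Future output type
            .reader(reader)
            .processor(asyncProcessor)
            .writer(asyncWriter)
            .build();
}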
Is it possible to have multiple threads with a StoredProcedureItemReader? I have done multithreading with a page reader, but I am not sure whether it will work with a StoredProcedureItemReader.
Below is the job configuration using a StoredProcedureItemReader. I have wrapped the reader in a thread-safe reader. I want to use a ThreadPoolTaskExecutor, but I cannot figure out how to partition the work for each thread with the stored procedure.
@Configuration
public class SpPocJobConfigurationMT {

    private DataSource dataSource;

    /**
     * The Job builder factory.
     */
    private JobBuilderFactory jobBuilderFactory;

    /**
     * The Jdbc template.
     */
    @Autowired
    JdbcTemplate jdbcTemplate;

    /**
     * The Step builder factory.
     */
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private BillingRecordAuditRepository billingRecordAuditRepository;

    @Autowired
    private StagingMortgageDataTxnRepository stagingMortgageDataTxnRepository;

    private SystemRepository systemRepository;

    @Autowired
    public SpPocJobConfigurationMT(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory, SystemRepository systemRepository, DataSource dataSource) {
        Assert.notNull(systemRepository, "SystemRepository cannot be null");
        Assert.notNull(jobBuilderFactory, "JobBuilderFactory cannot be null");
        Assert.notNull(stepBuilderFactory, "StepBuilderFactory cannot be null");
        Assert.notNull(dataSource, "DataSource cannot be null");
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
        this.systemRepository = systemRepository;
        this.dataSource = dataSource;
    }

    @Bean
    @Transactional
    @Description(value = "")
    public Job SpPocJobMT() throws Exception {
        return jobBuilderFactory.get("spPocJobMT")
                .start(spPocStepMT())
                .build();
    }

    @Bean
    public Step spPocStepMT() throws Exception {
        return stepBuilderFactory.get("spPocStepMT")
                .allowStartIfComplete(false)
                .<StagingDataDto, StagingDataDto>chunk(20)
                .reader(sybcSpReaderMT())
                .processor(spPocProcessorMT())
                .writer(spPocWriterMT())
                // .taskExecutor(new ThreadPoolTaskExecutor())
                // .taskExecutor(new SimpleAsyncTaskExecutor())
                .build();
    }

    @Bean
    public SpPocWriter spPocWriterMT() {
        return new SpPocWriter(this.billingRecordAuditRepository, this.stagingMortgageDataTxnRepository);
    }

    @Bean
    public SpPocProcessor spPocProcessorMT() {
        return new SpPocProcessor();
    }

    @Bean
    @StepScope
    public SynchronizedItemStreamReader<StagingDataDto> sybcSpReaderMT() {
        StoredProcedureItemReader<StagingDataDto> reader = new StoredProcedureItemReader<>();
        SqlParameter[] parameters = {new SqlParameter("#p_id", OracleTypes.NUMBER)
                , new SqlOutParameter("#p_out_c1", OracleTypes.CURSOR)
                , new SqlOutParameter("#p_out_c2", OracleTypes.CURSOR)
        };
        reader.setDataSource(dataSource);
        reader.setProcedureName("SP_POC_FINAL");
        reader.setRowMapper(new SPRowMapper());
        reader.setRefCursorPosition(3);
        reader.setPreparedStatementSetter(new MyItemPreparedStatementSetter());
        reader.setParameters(parameters);
        reader.setSaveState(false);
        reader.setVerifyCursorPosition(false);
        SynchronizedItemStreamReader<StagingDataDto> synchronizedItemStreamReader = new SynchronizedItemStreamReader<>();
        synchronizedItemStreamReader.setDelegate(reader);
        return synchronizedItemStreamReader;
    }

    public class MyItemPreparedStatementSetter implements PreparedStatementSetter {
        @Override
        public void setValues(PreparedStatement ps) throws SQLException {
            ps.setInt(1, 1);
            ((CallableStatement) ps).registerOutParameter(2, OracleTypes.CURSOR);
            ((CallableStatement) ps).registerOutParameter(3, OracleTypes.CURSOR);
        }
    }
}
The StoredProcedureItemReader extends AbstractItemCountingItemStreamItemReader, which is not thread-safe; see its Javadoc.
If you want to use the StoredProcedureItemReader in a multi-threaded step, you need to wrap it in a SynchronizedItemStreamReader or make it step-scoped, as sketched below.
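A rough sketch of the step-scoped option in a partitioned step. The assumptions here: a partitioner puts a numeric "partitionId" into each step execution context, and the stored procedure can slice its result set by that id; the bean reuses the poster's SPRowMapper and parameter layout. Each partition then gets its own reader instance, so no synchronization wrapper is needed.
@Bean
@StepScope
public StoredProcedureItemReader<StagingDataDto> spPocPartitionedReader(
        @Value("#{stepExecutionContext['partitionId']}") Integer partitionId) {
    StoredProcedureItemReader<StagingDataDto> reader = new StoredProcedureItemReader<>();
    reader.setDataSource(dataSource);
    reader.setProcedureName("SP_POC_FINAL");
    reader.setRowMapper(new SPRowMapper());
    reader.setRefCursorPosition(3);
    reader.setParameters(new SqlParameter[] {
            new SqlParameter("#p_id", OracleTypes.NUMBER),
            new SqlOutParameter("#p_out_c1", OracleTypes.CURSOR),
            new SqlOutParameter("#p_out_c2", OracleTypes.CURSOR)});
    reader.setPreparedStatementSetter(ps -> {
        ps.setInt(1, partitionId);  // hypothetical: each partition reads its own slice
        ((CallableStatement) ps).registerOutParameter(2, OracleTypes.CURSOR);
        ((CallableStatement) ps).registerOutParameter(3, OracleTypes.CURSOR);
    });
    reader.setVerifyCursorPosition(false);
    return reader;
}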
I am using Spring Batch to read data from MongoDB with a MongoItemReader bean. Suppose I want to read data from two different collections in the same job instance. Is this possible?
@Bean
@StepScope
public MongoItemReader<Object> reader() throws UnexpectedInputException, ParseException, Exception {
    DataReader dataReader = new DataReader();
    return dataReader.read();
}

@Bean
public DataItemProcessor processor() {
    return new DataItemProcessor();
}

@Bean
public MongoItemWriter<DestinationCollectionModelClass> writer() {
    MongoItemWriter<DestinationCollectionModelClass> writer = new MongoItemWriter<>();
    writer.setCollection("collection_name_where_data_is_saved");
    writer.setTemplate(mongoTemplate);
    return writer;
}

@Bean
public Step step1(MongoItemWriter<DestinationModelClass> writer) throws UnexpectedInputException, ParseException, Exception {
    return stepBuilderFactory.get("step1")
            // TODO: P3 chunk size configurable
            .<Object, DestinationModelClass>chunk(100)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
Below is my class DataReader.java
public class DataReader extends MongoItemReader {

    @Autowired
    private MongoTemplate mongoTemplate;

    @Override
    public MongoItemReader<Object> read() throws Exception, UnexpectedInputException, ParseException {
        List<Object> mongoItemReaderList = new ArrayList<>();
        Map<String, Direction> sorts = new HashMap<>();
        sorts.put("_id", Direction.ASC);
        MongoItemReader<Object> collectionOneReader = new MongoItemReader<>();
        collectionOneReader.setTemplate(mongoTemplate);
        collectionOneReader.setTargetType(CollectionOneModelClass.class);
        collectionOneReader.setQuery("{}");
        collectionOneReader.setSort(sorts);
        MongoItemReader<Object> collectionTwoReader = new MongoItemReader<>();
        collectionTwoReader.setTemplate(mongoTemplate);
        collectionTwoReader.setTargetType(CollectionTwoModelClass.class);
        collectionTwoReader.setQuery("{}");
        collectionTwoReader.setSort(sorts);
        mongoItemReaderList.add(collectionOneReader);
        mongoItemReaderList.add(collectionTwoReader);
        MongoItemReader<Object> readerObject = (MongoItemReader<Object>) mongoItemReaderList;
        return readerObject;
    }
}
Below is my DataItemProcessor.java
public class DataItemProcessor implements ItemProcessor<Object, DestinationModelClass> {

    public DataItemProcessor() {}

    @Override
    public DestinationModelClass process(Object phi) throws Exception {
        DestinationModelClass hbd = new DestinationModelClass();
        if (phi instanceof CollectionOneModelClass) {
            // Processing code if Object is an instance of CollectionOneModelClass
        }
        if (phi instanceof CollectionTwoModelClass) {
            // Processing code if Object is an instance of CollectionTwoModelClass
        }
        return hbd;
    }
}
You can't have two readers in the same chunk-oriented step. What you can do is use the driving query pattern, which, in your case, could be implemented as follows (see the sketch after this list):
Item Reader: reads items from collection 1
Item Processor: enriches items from collection 2
Item Writer: writes enriched items to collection 3
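For illustration, a rough sketch of that pattern with the poster's model classes; the getId() lookup key and the target collection name are assumptions.
@Bean
public MongoItemReader<CollectionOneModelClass> drivingReader(MongoTemplate mongoTemplate) {
    // Driving reader: iterates collection one only.
    MongoItemReader<CollectionOneModelClass> reader = new MongoItemReader<>();
    reader.setTemplate(mongoTemplate);
    reader.setTargetType(CollectionOneModelClass.class);
    reader.setQuery("{}");
    Map<String, Direction> sorts = new HashMap<>();
    sorts.put("_id", Direction.ASC);
    reader.setSort(sorts);
    return reader;
}

@Bean
public ItemProcessor<CollectionOneModelClass, DestinationModelClass> enrichingProcessor(MongoTemplate mongoTemplate) {
    // Enriches each driving item with the matching document from collection two.
    return item -> {
        CollectionTwoModelClass related = mongoTemplate.findById(item.getId(), CollectionTwoModelClass.class);
        DestinationModelClass out = new DestinationModelClass();
        // ... merge fields from item and related into out ...
        return out;
    };
}

@Bean
public MongoItemWriter<DestinationModelClass> enrichedWriter(MongoTemplate mongoTemplate) {
    // Writes the merged result to a third collection (name is illustrative).
    MongoItemWriter<DestinationModelClass> writer = new MongoItemWriter<>();
    writer.setTemplate(mongoTemplate);
    writer.setCollection("destination_collection");
    return writer;
}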
I created a PollableChannel that listens to an S3 bucket, fetches files, and launches a job.
My class looks like this:
@Bean
public S3SessionFactory s3SessionFactory(AmazonS3 pAmazonS3) {
    return new S3SessionFactory(pAmazonS3);
}

@Bean
public S3InboundFileSynchronizer s3InboundFileSynchronizer(S3SessionFactory s3SessionFactory) {
    S3InboundFileSynchronizer synchronizer = new S3InboundFileSynchronizer(s3SessionFactory);
    synchronizer.setPreserveTimestamp(true);
    synchronizer.setDeleteRemoteFiles(false);
    synchronizer.setRemoteDirectory(awsS3Properties.getCercBucket());
    return synchronizer;
}

@Bean
public S3InboundFileSynchronizingMessageSource s3InboundFileSynchronizingMessageSource(
        S3InboundFileSynchronizer s3InboundFileSynchronizer) {
    S3InboundFileSynchronizingMessageSource messageSource = new S3InboundFileSynchronizingMessageSource(
            s3InboundFileSynchronizer);
    messageSource.setAutoCreateLocalDirectory(true);
    messageSource.setLocalDirectory(new FileSystemResource(integrationProperties.getTempDirectoryName()).getFile());
    return messageSource;
}

@Bean("${receivable.integration.inChannel}")
public PollableChannel s3FilesChannel() {
    return new QueueChannel();
}

@Bean
public IntegrationFlow integrationFlow(
        S3InboundFileSynchronizingMessageSource s3InboundFileSynchronizingMessageSource) {
    return IntegrationFlows
            .from(s3InboundFileSynchronizingMessageSource,
                    c -> c.poller(Pollers.fixedRate(1000).maxMessagesPerPoll(1)))
            .transform(fileMessageToJobRequest())
            .handle(jobLaunchingGateway())
            .get();
}

@Bean
public FileMessageToJobRequest fileMessageToJobRequest() {
    FileMessageToJobRequest fileMessageToJobRequest = new FileMessageToJobRequest();
    fileMessageToJobRequest.setFileParameterName("input.file.name");
    fileMessageToJobRequest.setJob(receivablePositionJob);
    return fileMessageToJobRequest;
}

@Bean
@ServiceActivator(inputChannel = "${receivable.integration.inChannel}", poller = @Poller(fixedRate = "1000"))
public JobLaunchingGateway jobLaunchingGateway() {
    SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
    simpleJobLauncher.setJobRepository(jobRepository);
    simpleJobLauncher.setTaskExecutor(new SyncTaskExecutor());
    JobLaunchingGateway jobLaunchingGateway = new JobLaunchingGateway(simpleJobLauncher);
    jobLaunchingGateway.setOutputChannel(s3FilesChannel());
    return jobLaunchingGateway;
}
And my FileMessageToJobRequest is like this:
public class FileMessageToJobRequest {

    private Job job;
    private String fileParameterName;

    public void setFileParameterName(String fileParameterName) {
        this.fileParameterName = fileParameterName;
    }

    public void setJob(Job job) {
        this.job = job;
    }

    @Transformer
    public JobLaunchRequest toRequest(Message<File> message) {
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        jobParametersBuilder.addString(fileParameterName, message.getPayload().getAbsolutePath());
        return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
    }
}
I want to add a custom MessageHeader to the Message; my second option is to intercept the context before the message is published, because I need to set my tenant in a ThreadLocal.
How could I do that?
Thanks in advance.
UPDATE with enrichHeaders:
@Bean
public IntegrationFlow integrationFlow(
        S3InboundFileSynchronizingMessageSource s3InboundFileSynchronizingMessageSource) {
    return IntegrationFlows
            .from(s3InboundFileSynchronizingMessageSource,
                    c -> c.poller(Pollers.fixedRate(1000).maxMessagesPerPoll(1)))
            .transform(fileMessageToJobRequest())
            .enrichHeaders(Map.of("teste", "testandio"))
            .handle(jobLaunchingGateway())
            .get();
}
First of all, you must remove that @ServiceActivator(inputChannel = "${receivable.integration.inChannel}"), since it points to the same s3FilesChannel, which is also the outputChannel of that JobLaunchingGateway. So you are creating a loop with this configuration; it is surprising that it works for you at all.
To add a header before sending to that JobLaunchingGateway, you just need to add enrichHeaders() before your .handle(jobLaunchingGateway()) in that integrationFlow definition.
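For example, a sketch of the corrected flow, with the @ServiceActivator annotation removed from jobLaunchingGateway() so the gateway only participates in this flow. The "tenant" header name and the tenantFor(File) resolver are hypothetical.
@Bean
public IntegrationFlow integrationFlow(
        S3InboundFileSynchronizingMessageSource s3InboundFileSynchronizingMessageSource) {
    return IntegrationFlows
            .from(s3InboundFileSynchronizingMessageSource,
                    c -> c.poller(Pollers.fixedRate(1000).maxMessagesPerPoll(1)))
            // payload is still the local File here, so the tenant can be derived from it
            .enrichHeaders(h -> h.headerFunction("tenant",
                    m -> tenantFor((File) m.getPayload())))
            .transform(fileMessageToJobRequest())
            .handle(jobLaunchingGateway())
            .get();
}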
I am working on a Spring Batch (3) and Spring Boot (1.5) project. I have an end-of-day job "endOfDayJob" that is executed asynchronously through a web controller; in the controller I return the job execution id.
Below is the code for the configuration class. The highlight here is that I am implementing the BatchConfigurer interface and creating an async JobLauncher with a SimpleAsyncTaskExecutor.
@Configuration
@EnableBatchProcessing
@EnableAsync
public class BatchConfiguration implements BatchConfigurer {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private DataSource dataSource;

    @Bean
    public Job endOfDayJob() throws Exception {
        SimpleJobBuilder simpleJobBuilder = jobBuilderFactory.get("endOfDayJob")
                .incrementer(new RunIdIncrementer())
                .start(init())
                .next(updateInventory())
                .next(generateSalesReport())
                .next(cleanup())
                .next(sendReport());
        return simpleJobBuilder.build();
    }

    @Bean
    public Step init() {
        return stepBuilderFactory.get("initStep").tasklet(initTasklet()).build();
    }

    @Bean
    public Step updateInventory() {
        return stepBuilderFactory.get("updateInventoryStep").tasklet(updateInventoryTasklet()).build();
    }

    @Bean
    public Step generateSalesReport() {
        return stepBuilderFactory.get("generateSalesReportStep").tasklet(generateSalesReportTasklet()).build();
    }

    @Bean
    public Step cleanup() {
        return stepBuilderFactory.get("cleanupStep").tasklet(cleanupTasklet()).build();
    }

    @Bean
    public Step sendReport() {
        return stepBuilderFactory.get("sendReportStep").tasklet(sendReportTasklet()).build();
    }

    @Override
    public JobRepository getJobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(getTransactionManager());
        factory.setIsolationLevelForCreate("ISOLATION_READ_COMMITTED");
        factory.setTablePrefix("BATCH_");
        return factory.getObject();
    }

    @Override
    public PlatformTransactionManager getTransactionManager() throws Exception {
        return new DataSourceTransactionManager(dataSource);
    }

    @Override
    public JobLauncher getJobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(getJobRepository());
        jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }

    @Override
    public JobExplorer getJobExplorer() throws Exception {
        JobExplorerFactoryBean jobExplorerFactoryBean = new JobExplorerFactoryBean();
        jobExplorerFactoryBean.setDataSource(dataSource);
        jobExplorerFactoryBean.setTablePrefix("BATCH_");
        jobExplorerFactoryBean.afterPropertiesSet();
        return jobExplorerFactoryBean.getObject();
    }
}
Here is the code for the web controller.
@RestController
@RequestMapping("/api/job")
public class WebController {

    private static final Logger logger = LoggerFactory.getLogger(WebController.class);

    @Autowired
    private DataSource dataSource;

    @Autowired
    private BatchConfiguration batchConfiguration;

    @Autowired
    private JobLauncher jobLauncher;

    @GetMapping("/endOfDayJob")
    private Long kycrBatch(@RequestParam(value = "odate", required = true) String odate) {
        logger.info("Executing endOfDayJob with odate = {}", odate);
        if (odate == null || odate.isEmpty() || odate.trim().isEmpty()) {
            return -1L;
        }
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        jobParametersBuilder.addString("odate", odate);
        long jobExecutionId = -1L;
        try {
            Job endOfDayJob = this.batchConfiguration.endOfDayJob();
            jobParametersBuilder.addDate("runtime", new Date());
            jobExecutionId = jobLauncher.run(endOfDayJob, jobParametersBuilder.toJobParameters()).getId();
        } catch (Exception e) {
            logger.error("Error occurred executing endOfDayJob with message: {}", e.getMessage());
            return -1L;
        }
        return jobExecutionId;
    }
}
Now I want to add a new method in the controller to know whether the job has ended. What is a possible way to check if a job is still running or has already finished, regardless of the final status?
Then I want to add a new method in the controller to know if the job ended or not.
You can inject the JobExplorer in your controller and write something like:
public boolean isRunning(long jobExecutionId) {
    JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
    return jobExecution.isRunning();
}
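For instance, the check can be exposed from the same controller (a sketch; the mapping path is illustrative). Note that JobExplorer#getJobExecution returns null for an unknown id, so a null check is worthwhile:
@Autowired
private JobExplorer jobExplorer;

@GetMapping("/endOfDayJob/{executionId}/running")
public boolean isRunning(@PathVariable("executionId") long executionId) {
    JobExecution jobExecution = jobExplorer.getJobExecution(executionId);
    // null means no execution with this id exists in the job repository
    return jobExecution != null && jobExecution.isRunning();
}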
I would like to restart a batch job after it was terminated.
I stop the batch when a particular exception is thrown:
public class IntegrationItemProcessorExceptionHandler implements ExceptionHandler {

    private static final Logger LOG = LoggerFactory.getLogger(IntegrationItemProcessorExceptionHandler.class);

    @Override
    public void handleException(RepeatContext context, Throwable throwable) throws Throwable {
        LOG.error("handleException", throwable);
        if (throwable instanceof CustomResponseException) {
            context.setTerminateOnly();
        }
    }
}
The input is a JSON file that I read with a FlatFileItemReader:
@Bean
public ItemReader<UserDto> reader() {
    FlatFileItemReader<UserDto> reader = new FlatFileItemReader<>();
    reader.setResource(new ClassPathResource("user.json"));
    reader.setRecordSeparatorPolicy(new CustomJsonRecordSeparatorPolicy());
    reader.setLineMapper(new CustomLineMapper());
    return reader;
}
If CustomResponseException is thrown by the ItemProcessor, I stop the batch.
Afterwards, I would like to restart the batch at the same line where I stopped it.
What do I need to do to get this behavior?
And my job config:
@Bean
public Job importUserJob(JobBuilderFactory jobs, Step s1) {
    return jobs.get("integrationJob")
            .incrementer(new RunIdIncrementer())
            .flow(s1)
            .end()
            .build();
}

@Bean
public Step step1(StepBuilderFactory stepBuilderFactory) {
    return stepBuilderFactory.get("step1")
            .<UserDto, User>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .listener(new IntegrationItemProcessorListener())
            .exceptionHandler(new IntegrationItemProcessorExceptionHandler())
            .build();
}
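A sketch of one way to get the restart-at-the-same-line behavior, under stated assumptions: the FlatFileItemReader saves its current line in the step execution context at each chunk commit (its saveState flag is on by default), so restarting the same job instance resumes at the last committed chunk. A STOPPED or FAILED execution can be restarted through the JobOperator; the bean wiring below is assumed. Note that restart() reuses the original job parameters, so the RunIdIncrementer does not create a new job instance on this path.
// Sketch: restart a terminated execution where it left off (assumes a
// configured JobOperator bean and the id of the stopped execution).
@Autowired
private JobOperator jobOperator;

public long restartStoppedExecution(long stoppedExecutionId) throws Exception {
    // Resumes the same job instance; the reader's saved state ("current line"
    // in the execution context) makes it skip the already committed lines.
    return jobOperator.restart(stoppedExecutionId);
}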