How do I create a custom ItemReader for each step in my Spring Batch project? - spring-batch

I am trying to use a custom reader, processor and writer in each step:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
                  ItemReader<Assessment> reader,
                  ExpireAssessmentWriter writer,
                  AssessmentItemProcessor processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step1")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
//update aggregate balance table
@Bean
public Step step2(StepBuilderFactory stepBuilderFactory,
                  ItemReader<Assessment> reader,
                  BalanceItemWriter writer,
                  BalanceProcessor processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step2")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
@Bean
public Step step3(StepBuilderFactory stepBuilderFactory,
                  ItemReader<Assessment> reader,
                  CustomWriter3 writer,
                  CustomItemProcessor3 processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step3")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
The first step works fine, but that's only when I leave this reader in the same class:
private static final String READER_QUERY = "SELECT * FROM TABLE1 WHERE COLUMN='TEST'";

@Bean
public JdbcCursorItemReader<Assessment> reader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Assessment>()
            .dataSource(dataSource)
            .name("AssessmentUtilityReader")
            .sql(READER_QUERY)
            .rowMapper(new AssessmentMapper())
            .build();
}
How can I create a custom reader for each of these steps that reads its own query? Can I create a custom reader that extends JdbcCursorItemReader and returns this same snippet of code?
@Bean
public JdbcCursorItemReader<Assessment> reader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Assessment>()
            .dataSource(dataSource)
            .name("AssessmentUtilityReader")
            .sql(READER_QUERY)
            .rowMapper(new AssessmentMapper())
            .build();
}

Since the item type is the same for all steps, you can create a method that accepts a query and returns an item reader:
public JdbcCursorItemReader<Assessment> getReader(DataSource dataSource, String query) {
    return new JdbcCursorItemReaderBuilder<Assessment>()
            .dataSource(dataSource)
            .name("AssessmentUtilityReader") // the name can be passed as a parameter as well
            .sql(query)
            .rowMapper(new AssessmentMapper())
            .build();
}
Then call this method in each step definition and pass the required query for each step.
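For example, step1 could pass its own query to getReader. This is a sketch reusing the bean names from the question; the query string is a placeholder to be replaced with the real SQL for that step:

```java
// Hypothetical per-step query; substitute the real SQL.
private static final String STEP1_QUERY = "SELECT * FROM TABLE1 WHERE COLUMN='TEST'";

@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
                  DataSource dataSource,
                  ExpireAssessmentWriter writer,
                  AssessmentItemProcessor processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step1")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(getReader(dataSource, STEP1_QUERY)) // step-specific query
            .processor(processor)
            .writer(writer)
            .build();
}
```

step2 and step3 would be defined the same way, each passing its own query constant.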

To turn your reader into a custom component that can be autowired, add the following class:
@Component
public class AssessmentUtilityReader extends JdbcCursorItemReader<Assessment> {
    public AssessmentUtilityReader(final DataSource dataSource) {
        setName(getClass().getSimpleName());
        setDataSource(dataSource);
        setRowMapper(new AssessmentMapper());
        // language=SQL
        setSql(
            """
            SELECT *
            FROM TABLE1
            WHERE COLUMN = 'TEST'
            """);
    }
}
Hint: the comment (// language=SQL) is a hint for IntelliJ to use SQL highlighting in the following lines. It's optional.
Simply autowire it in the step definition (note that AssessmentUtilityReader is not generic, so it is declared without a type argument):
@Bean
public Step step3(StepBuilderFactory stepBuilderFactory,
                  AssessmentUtilityReader assessmentUtilityReader,
                  CustomWriter3 writer,
                  CustomItemProcessor3 processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step3")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(assessmentUtilityReader)
            .processor(processor)
            .writer(writer)
            .build();
}
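Following the same pattern, each remaining step can get its own component with its own query. For instance, a reader for step2 might look like this (BalanceReader and its SQL are made-up names for illustration):

```java
@Component
public class BalanceReader extends JdbcCursorItemReader<Assessment> {
    public BalanceReader(final DataSource dataSource) {
        setName(getClass().getSimpleName());
        setDataSource(dataSource);
        setRowMapper(new AssessmentMapper());
        // Hypothetical query for step2; replace with the real one.
        setSql("SELECT * FROM TABLE2 WHERE COLUMN = 'TEST'");
    }
}
```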

Related

How can I enable or disable a Step dynamically reading the enableStatus from the Database?

@Bean
public Step step1() {
    return stepBuilderFactory.get("checkInfo").<A, B>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
I have created this step called "checkInfo", and I have other steps with other names. In my database I have a table "STEPS" with the name of each step and whether it is enabled or disabled. So I have to chain only the enabled steps to my job.
@PostConstruct
public void getActiveSteps() {
    stepsList = stepManagementRepository.findAllByActive(true);
    for (StepManagement s : stepsList) {
        System.out.println(s.getDescription());
    }
}
This gets all of the active ones. The problem is: how can I get the step I want by the name saved in my DB? (So that I can use .next() in the job only if the step is active.)
@Bean
public Job runJob() {
    SimpleJobBuilder jobBuilder = jobBuilderFactory.get("mainCalculationJob")
            .incrementer(new RunIdIncrementer())
            .start(step1());
    return jobBuilder.build();
}
I solved it by getting the bean by name:
@Bean(name = "checkInfo")
public Step step1() {
    return stepBuilderFactory.get("checkInfo").<A, B>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}

@Bean
public Job runJob() {
    SimpleJobBuilder jobBuilder = jobBuilderFactory.get("mainCalculationJob")
            .incrementer(new RunIdIncrementer())
            .start((Step) context.getBean("checkInfo"));
    return jobBuilder.build();
}
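Building on that, the enabled steps can be chained dynamically instead of hard-coding the .next() calls. This is a sketch that assumes the step bean names match the descriptions stored in the STEPS table, and that context is an autowired ApplicationContext as in the snippet above:

```java
@Bean
public Job runJob() {
    List<StepManagement> activeSteps = stepManagementRepository.findAllByActive(true);
    // Start with the first active step, then chain the remaining ones in order.
    SimpleJobBuilder jobBuilder = jobBuilderFactory.get("mainCalculationJob")
            .incrementer(new RunIdIncrementer())
            .start(context.getBean(activeSteps.get(0).getDescription(), Step.class));
    for (StepManagement s : activeSteps.subList(1, activeSteps.size())) {
        jobBuilder.next(context.getBean(s.getDescription(), Step.class));
    }
    return jobBuilder.build();
}
```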

[Spring Batch][Mongo] Cannot read jobParameters in ItemReader read()

I have set up a step configuration and an ItemReader that reads data from MongoDB in the same file, like this:
@Bean("STEP_FETCH_DATA")
public Step fetchDatabaseStep(
        ItemReader<ExampleDao> dataReader,
        DataProcessor dataProcessor,
        DataWriter dataWriter,
        @Qualifier("TASK_EXECUTOR") TaskExecutor taskExecutor
) {
    log.info("Initialize step: {}", "STEP_FETCH_DATA");
    return stepBuilderFactory.get("STEP_FETCH_DATA")
            .<ExampleDao, ExampleDao>chunk(chunkSize)
            .processor(dataProcessor)
            .reader(dataReader)
            .writer(dataWriter)
            .taskExecutor(taskExecutor)
            .build();
}
@Bean("dataReader")
@StepScope
public ItemReader<ExampleDao> read(@Value("#{jobParameters.get(\"batchRunDate\")}") String batchRunDate) throws UnexpectedInputException, ParseException, NonTransientResourceException {
    log.info("Reading start... batchRunDate : {}", batchRunDate);
    MongoItemReader<ExampleDao> reader = new MongoItemReader<>();
    reader.setTemplate(mongoTemplate);
    reader.setSort(new HashMap<String, Sort.Direction>() {{
        put("_id", Sort.Direction.DESC);
    }});
    reader.setTargetType(ExampleDao.class);
    reader.setQuery("{}");
    return reader;
}
With the code above, the reader can access my jobParameters and works as expected. However, if I create a class to contain my Mongo ItemReader like this:
@Component
@Slf4j
public class DataReaderExample {

    @Autowired
    private MongoTemplate mongoTemplate;

    @Bean
    @StepScope
    public ItemReader<ExampleDao> read(@Value("#{jobParameters.get(\"batchRunDate\")}") String batchRunDate) throws UnexpectedInputException, ParseException, NonTransientResourceException {
        log.info("Reading start... batchRunDate : {}", batchRunDate);
        MongoItemReader<ExampleDao> reader = new MongoItemReader<>();
        reader.setTemplate(mongoTemplate);
        reader.setSort(new HashMap<String, Sort.Direction>() {{
            put("_id", Sort.Direction.DESC);
        }});
        reader.setTargetType(ExampleDao.class);
        reader.setQuery("{}");
        return reader;
    }
}
Then I set up the step configuration like this. (Notice the .reader(dataReadExample.read(null)); I expected the @Value("#{jobParameters.get(\"batchRunDate\")}") on the read() parameter to override the null value.)
@Bean("STEP_FETCH_DATA")
public Step fetchDatabaseStep(
        DataReaderExample dataReadExample,
        DataProcessor dataProcessor,
        DataWriter dataWriter,
        @Qualifier("TASK_EXECUTOR") TaskExecutor taskExecutor
) {
    log.info("Initialize step: {}", "STEP_FETCH_DATA");
    return stepBuilderFactory.get("STEP_FETCH_DATA")
            .<ExampleDao, ExampleDao>chunk(chunkSize)
            .processor(dataProcessor)
            .reader(dataReadExample.read(null))
            .writer(dataWriter)
            .taskExecutor(taskExecutor)
            .build();
}
My log.info("Reading start... batchRunDate : {}", batchRunDate) always prints batchRunDate as null, and the @Value("#{jobParameters.get(\"batchRunDate\")}") expression is not applied; it seems I cannot access the jobParameters. Could anyone explain this behavior and how to move the ItemReader to another class? My goal is to separate the ItemReader into its own class. Thanks!
Your DataReaderExample is declared as a @Component; it should rather be a @Configuration class in which you declare bean definitions. Also note that calling dataReadExample.read(null) invokes the method directly and bypasses the step-scoped proxy Spring would otherwise create, which is why the job parameter is never injected.
I suggest renaming the read method to itemReader or something similar, because its purpose is to define the item reader bean, not to actually read the data.
Once that is done, you can import your DataReaderExample configuration class in your application context and autowire the item reader in your step:
@Bean("STEP_FETCH_DATA")
public Step fetchDatabaseStep(
        ItemReader<ExampleDao> itemReader,
        DataProcessor dataProcessor,
        DataWriter dataWriter,
        @Qualifier("TASK_EXECUTOR") TaskExecutor taskExecutor
) {
    log.info("Initialize step: {}", "STEP_FETCH_DATA");
    return stepBuilderFactory.get("STEP_FETCH_DATA")
            .<ExampleDao, ExampleDao>chunk(chunkSize)
            .processor(dataProcessor)
            .reader(itemReader)
            .writer(dataWriter)
            .taskExecutor(taskExecutor)
            .build();
}
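For completeness, here is a sketch of the reworked reader class under those suggestions; the @Configuration annotation and the renamed method are the only changes, the reader body is taken from the question (returning the concrete MongoItemReader type is a common convention for step-scoped beans):

```java
@Configuration
@Slf4j
public class DataReaderExample {

    @Autowired
    private MongoTemplate mongoTemplate;

    @Bean
    @StepScope
    public MongoItemReader<ExampleDao> itemReader(
            @Value("#{jobParameters.get(\"batchRunDate\")}") String batchRunDate) {
        log.info("Configuring reader... batchRunDate : {}", batchRunDate);
        MongoItemReader<ExampleDao> reader = new MongoItemReader<>();
        reader.setTemplate(mongoTemplate);
        reader.setSort(Collections.singletonMap("_id", Sort.Direction.DESC));
        reader.setTargetType(ExampleDao.class);
        reader.setQuery("{}");
        return reader;
    }
}
```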

Spring Batch - abstract Step definition in Java configuration?

My Spring Batch job configuration has 5 steps, all of which are identical except for the reader. Is there a way I can abstract out all of the other parts of the step into a "parent" step, so that I don't need to repeat everything? I know this can be done in XML, but I can't figure out the Java equivalent.
Here's one of the steps:
public Step quarterlyStep(FileIngestErrorListener listener, ItemReader<DistributionItem> quarterlyReader) {
    return stepBuilderFactory.get("quarterlyStep")
            .<DistributionItem, DistributionItem>chunk(10)
            .reader(quarterlyReader) // The only thing that changes among 5 different steps
            .listener(listener.asReadListener())
            .processor(processor())
            .listener(listener.asProcessListener())
            .writer(writer())
            .listener(listener.asWriteListener())
            .faultTolerant()
            .skip(ValidationException.class)
            .skip(ExcelFileParseException.class)
            .build();
}
Here's the definition of one of the readers:
@Bean
@JobScope
public PoiItemReader<DistributionItem> yearEndReader(@Value("#{jobExecutionContext['filename']}") String filename) {
    PoiItemReader<DistributionItem> reader = new PoiItemReader<>();
    reader.setLinesToSkip(1);
    reader.setRowMapper(yearEndRowMapper());
    reader.setResource(new FileSystemResource(filename));
    return reader;
}
You can do something like:
private StepBuilderFactory stepBuilderFactory;

private SimpleStepBuilder<Integer, Integer> createBaseStep(String stepName) {
    return stepBuilderFactory.get(stepName)
            .<Integer, Integer>chunk(5)
            .processor(itemProcessor())
            .writer(itemWriter());
}

@Bean
public Step step1(ItemReader<Integer> itemReader) {
    return createBaseStep("step1")
            .reader(itemReader)
            .build();
}

@Bean
public Step step2(ItemReader<Integer> itemReader) {
    return createBaseStep("step2")
            .reader(itemReader)
            .build();
}
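Applied to the question's own types, the shared listeners and fault-tolerance settings would move into the base method as well, so only the reader varies per step. This is a sketch reusing the beans and types from the question:

```java
private FaultTolerantStepBuilder<DistributionItem, DistributionItem> createBaseStep(
        String stepName, FileIngestErrorListener listener) {
    // Everything shared by the 5 steps lives here.
    return stepBuilderFactory.get(stepName)
            .<DistributionItem, DistributionItem>chunk(10)
            .listener(listener.asReadListener())
            .processor(processor())
            .listener(listener.asProcessListener())
            .writer(writer())
            .listener(listener.asWriteListener())
            .faultTolerant()
            .skip(ValidationException.class)
            .skip(ExcelFileParseException.class);
}

@Bean
public Step quarterlyStep(FileIngestErrorListener listener,
                          ItemReader<DistributionItem> quarterlyReader) {
    return createBaseStep("quarterlyStep", listener)
            .reader(quarterlyReader) // the only per-step difference
            .build();
}
```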

Why do I need an ItemReader in my job step if I only need to delete rows using ItemWriter

I have a step in my batch job that I want to use only to delete rows from a table.
The step looks like this:
@Bean
public Step step2(StepBuilderFactory stepBuilderFactory,
                  PurgeAggBalanceWriter writer,
                  DataSource dataSource,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step2")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(getReader(dataSource, READER_QUERY2, "AggBalanceMapper", new AggBalanceMapper()))
            .writer(writer)
            .build();
}
I am using this writer class with a JdbcTemplate to run the delete statement:
public class PurgeAggBalanceWriter implements ItemWriter<Assessment> {

    private static final String DELETE_QUERY = "DELETE FROM TABLE WHERE COLUMN = 'TEST'";

    private final JdbcTemplate jdbcTemplate;

    public PurgeAggBalanceWriter(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public void write(List<? extends Assessment> list) {
        jdbcTemplate.update(DELETE_QUERY);
    }
}
The step completes successfully, but I don't see why an ItemReader is required (as the error states when I try to remove the .reader() call from step2). Is there a way to avoid using a reader/mapper and use just the writer, since all I have to do is run a delete query?
all I have to do is run a delete query
In this case, you don't need a chunk-oriented tasklet. A simple tasklet is enough, something like:
public class DeletionTasklet implements Tasklet {

    private static final String DELETE_QUERY = "DELETE FROM TABLE WHERE COLUMN = 'TEST'";

    private final JdbcTemplate jdbcTemplate;

    public DeletionTasklet(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        jdbcTemplate.update(DELETE_QUERY);
        return RepeatStatus.FINISHED;
    }
}
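Wiring it into the job could then look like this; a sketch reusing the bean names from the question, with no reader, processor, or writer required for a tasklet step:

```java
@Bean
public Step step2(StepBuilderFactory stepBuilderFactory,
                  DataSource dataSource,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step2")
            .transactionManager(platformTransactionManager)
            .tasklet(new DeletionTasklet(dataSource)) // replaces reader/processor/writer
            .build();
}
```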

For second page onwards, JdbcPagingItemReader is not putting values automatically for sortkey placeholder

I am using JdbcPagingItemReader as below,
@Bean
public ItemReader<RemittanceVO> reader() {
    JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setRowMapper(new RemittanceRowMapper());
    reader.setQueryProvider(queryProvider);
    reader.setPageSize(100);
    return reader;
}
@Bean
public PagingQueryProvider queryProvider() throws Exception {
    SqlPagingQueryProviderFactoryBean queryProviderBean = new SqlPagingQueryProviderFactoryBean();
    queryProviderBean.setDataSource(dataSource);
    queryProviderBean.setSelectClause(Constants.REMITTANCES_SELECT_CLAUSE);
    queryProviderBean.setFromClause(Constants.REMITTANCES_FROM_CLAUSE);
    queryProviderBean.setWhereClause(Constants.REMITTANCES_WHERE_CLAUSE);
    queryProviderBean.setSortKey(Constants.REMITTANCES_SORT_KEY);
    return queryProviderBean.getObject();
}
As of now, I launch the job as below (as I am very new to Spring Batch):
JobLauncher jobLauncher = (JobLauncher) ctx.getBean("jobLauncher");
Job job = (Job) ctx.getBean("runRCMatcher");
try {
    JobExecution execution = jobLauncher.run(job, new JobParameters());
} catch (Exception e) {
    e.printStackTrace();
}
I am running this app as a Spring Boot app. It fetches the first 100 records successfully and hands them over to the processor, and then the next query fails. The query fails because the sort key value has not been placed into it; the generated SQL contains AND ((REMIT_ID > ?)) ORDER BY REMIT_ID ASC FETCH FIRST 100 ROWS ONLY;.
Where am I wrong? My DB is DB2, so I guess it should be using Db2PagingQueryProvider.
Step and job are defined as:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
                  ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
                  ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO>chunk(100).reader(reader)
            .processor(processor).writer(writer).build();
}
@Bean
public Job runRCMatcher(JobBuilderFactory jobs, Step s1) {
    return jobs.get("RCMatcher")
            .incrementer(new RunIdIncrementer())
            .flow(s1)
            .end()
            .build();
}
The sort key specified (Constants.REMITTANCES_SORT_KEY) is a table column name; it is the primary key of the table and is of type BIGINT.