My Spring Batch job configuration has 5 steps, all of which are identical except for the reader. Is there a way I can abstract out all of the other parts of the step into a "parent" step, so that I don't need to repeat everything? I know this can be done in XML, but I can't figure out the Java equivalent.
Here's one of the steps:
public Step quarterlyStep(FileIngestErrorListener listener, ItemReader<DistributionItem> quarterlyReader) {
    return stepBuilderFactory.get("quarterlyStep")
            .<DistributionItem, DistributionItem>chunk(10)
            .reader(quarterlyReader) // The only thing that changes among the 5 different steps
            .listener(listener.asReadListener())
            .processor(processor())
            .listener(listener.asProcessListener())
            .writer(writer())
            .listener(listener.asWriteListener())
            .faultTolerant()
            .skip(ValidationException.class)
            .skip(ExcelFileParseException.class)
            .build();
}
Here's the definition of one of the readers:
@Bean
@JobScope
public PoiItemReader<DistributionItem> yearEndReader(@Value("#{jobExecutionContext['filename']}") String filename) {
    PoiItemReader<DistributionItem> reader = new PoiItemReader<>();
    reader.setLinesToSkip(1);
    reader.setRowMapper(yearEndRowMapper());
    reader.setResource(new FileSystemResource(filename));
    return reader;
}
You can do something like:
private StepBuilderFactory stepBuilderFactory;

private SimpleStepBuilder<Integer, Integer> createBaseStep(String stepName) {
    return stepBuilderFactory.get(stepName)
            .<Integer, Integer>chunk(5)
            .processor(itemProcessor())
            .writer(itemWriter());
}
@Bean
public Step step1(ItemReader<Integer> itemReader) {
    return createBaseStep("step1")
            .reader(itemReader)
            .build();
}

@Bean
public Step step2(ItemReader<Integer> itemReader) {
    return createBaseStep("step2")
            .reader(itemReader)
            .build();
}
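Applied to the question's example, a sketch of the same pattern might look like this (assuming the listener, processor() and writer() beans from the original step):
private SimpleStepBuilder<DistributionItem, DistributionItem> createBaseStep(String stepName,
        FileIngestErrorListener listener) {
    return stepBuilderFactory.get(stepName)
            .<DistributionItem, DistributionItem>chunk(10)
            .listener(listener.asReadListener())
            .processor(processor())
            .listener(listener.asProcessListener())
            .writer(writer())
            .listener(listener.asWriteListener());
}

@Bean
public Step quarterlyStep(FileIngestErrorListener listener, ItemReader<DistributionItem> quarterlyReader) {
    return createBaseStep("quarterlyStep", listener)
            .reader(quarterlyReader) // only the reader differs per step
            .faultTolerant()
            .skip(ValidationException.class)
            .skip(ExcelFileParseException.class)
            .build();
}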
@Bean
public Step step1() {
    return stepBuilderFactory.get("checkInfo").<A, B>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
I have created this step called "checkInfo", and I have other steps with other names. In my database I have a table "STEPS" with the name of each step and whether it is enabled or disabled.
So I need to chain only the enabled steps into my job.
@PostConstruct
public void getActiveSteps() {
    stepsList = stepManagementRepository.findAllByActive(true);
    for (StepManagement s : stepsList) {
        System.out.println(s.getDescription());
    }
}
I get all of the active ones in this function. The problem is: how can I get the step I want by the name saved in my DB? (So that I can then use .next() in the job only if the step is active.)
@Bean
public Job runJob() {
    SimpleJobBuilder jobBuilder = jobBuilderFactory.get("mainCalculationJob")
            .incrementer(new RunIdIncrementer())
            .start(step1());
    return jobBuilder.build();
}
I solved it by getting the bean by name:
@Bean(name = "checkInfo")
public Step step1() {
    return stepBuilderFactory.get("checkInfo").<A, B>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
@Bean
public Job runJob() {
    SimpleJobBuilder jobBuilder = jobBuilderFactory.get("mainCalculationJob")
            .incrementer(new RunIdIncrementer())
            .start((Step) context.getBean("checkInfo"));
    return jobBuilder.build();
}
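If you also want to chain the remaining active steps with .next(), a rough sketch (assuming getDescription() returns the bean name stored in the STEPS table and context is an injected ApplicationContext):
@Bean
public Job runJob() {
    List<StepManagement> activeSteps = stepManagementRepository.findAllByActive(true);

    SimpleJobBuilder jobBuilder = jobBuilderFactory.get("mainCalculationJob")
            .incrementer(new RunIdIncrementer())
            .start((Step) context.getBean(activeSteps.get(0).getDescription()));

    // Append the remaining enabled steps in their stored order
    for (StepManagement s : activeSteps.subList(1, activeSteps.size())) {
        jobBuilder.next((Step) context.getBean(s.getDescription()));
    }
    return jobBuilder.build();
}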
This section reads in the file from our server, processes it, writes it out and archives it.
@Bean
public Step step1() {
    log.info("Made it to step1");
    System.out.println("Made it to Step 1");
    return this.stepBuilderFactory.get("step1")
            .<PaymentTransaction, PaymentTransaction>chunk(10)
            .reader(paymentTransactionItemReader())
            .writer(paymentTransactionItemWriter())
            .build();
}
@Bean
public JobExecutionDecider decider() {
    System.out.println("Made it to the decider");
    return (jobExecution, stepExecution) -> new FlowExecutionStatus("Success");
}
@Bean
public FlowJobBuilder job() {
    return jobBuilderFactory.get("BenefitIssuance")
            .start(step1())
            .next(decider())
            .on("Success")
            .end()
            .build();
}
However, when it reaches the build() step at the end, it loops back to the reader.
As mentioned in the comments, I don't see why the job() method returns a FlowJobBuilder and not a Job. The following job definition does not loop back on the same step:
@Bean
public Job job() {
    return jobs.get("job")
            .start(step1())
            .next(decider())
            .on("Success")
            .end()
            .build()
            .build();
}
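Note the two build() calls: the first ends the flow definition and returns a FlowJobBuilder, and the second builds the Job itself.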
I am trying to use a custom reader, processor and writer in each step:
@Bean
public Step step1(StepBuilderFactory factory,
                  ItemReader reader,
                  ExpireAssessmentWriter writer,
                  AssessmentItemProcessor processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step1")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
//update aggregate balance table
@Bean
public Step step2(StepBuilderFactory factory,
                  ItemReader reader,
                  BalanceItemWriter writer,
                  BalanceProcessor processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step2")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
@Bean
public Step step3(StepBuilderFactory factory,
                  ItemReader<Assessment> reader,
                  CustomWriter3 writer,
                  CustomItemProcessor3 processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step3")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
The first step works fine, but that's only when I leave this reader in the same class:
private static final String READER_QUERY = "SELECT * FROM TABLE1 WHERE COLUMN='TEST'";

@Bean
public JdbcCursorItemReader<Assessment> reader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Assessment>()
            .dataSource(dataSource)
            .name("AssessmentUtilityReader")
            .sql(READER_QUERY)
            .rowMapper(new AssessmentMapper())
            .build();
}
How can I create a custom reader for each of these steps that will read its own query?
Can I create a custom reader that extends JdbcCursorItemReader and returns this same snippet of code?
@Bean
public JdbcCursorItemReader<Assessment> reader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Assessment>()
            .dataSource(dataSource)
            .name("AssessmentUtilityReader")
            .sql(READER_QUERY)
            .rowMapper(new AssessmentMapper())
            .build();
}
Since the item type is the same for all steps, you can create a method that accepts a query and returns an item reader:
public JdbcCursorItemReader<Assessment> getReader(DataSource dataSource, String query) {
    return new JdbcCursorItemReaderBuilder<Assessment>()
            .dataSource(dataSource)
            .name("AssessmentUtilityReader") // can be passed as a parameter as well
            .sql(query)
            .rowMapper(new AssessmentMapper())
            .build();
}
Then call this method in each step definition and pass the required query for each step.
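For example, a sketch of two of the steps using this helper (the query strings here are placeholders, not the real ones):
private static final String STEP1_QUERY = "SELECT * FROM TABLE1 WHERE COLUMN='TEST'"; // placeholder query
private static final String STEP2_QUERY = "SELECT * FROM TABLE2";                      // placeholder query

@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
                  DataSource dataSource,
                  ExpireAssessmentWriter writer,
                  AssessmentItemProcessor processor) {
    return stepBuilderFactory.get("step1")
            .<Assessment, Assessment>chunk(10)
            .reader(getReader(dataSource, STEP1_QUERY)) // each step gets its own query
            .processor(processor)
            .writer(writer)
            .build();
}

@Bean
public Step step2(StepBuilderFactory stepBuilderFactory,
                  DataSource dataSource,
                  BalanceItemWriter writer,
                  BalanceProcessor processor) {
    return stepBuilderFactory.get("step2")
            .<Assessment, Assessment>chunk(10)
            .reader(getReader(dataSource, STEP2_QUERY))
            .processor(processor)
            .writer(writer)
            .build();
}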
To turn your reader into a custom component that can be autowired, add the following class:
@Component
public class AssessmentUtilityReader extends JdbcCursorItemReader<Assessment> {

    public AssessmentUtilityReader(final DataSource dataSource) {
        setName(getClass().getSimpleName());
        setDataSource(dataSource);
        setRowMapper(new AssessmentMapper());

        // language=SQL
        setSql(
                """
                SELECT *
                FROM TABLE1
                WHERE COLUMN = 'TEST'
                """);
    }
}
Hint: the comment (// language=SQL) is a hint for IntelliJ to use SQL highlighting in the following lines. It's optional.
Simply autowire it in the step definition:
@Bean
public Step step3(StepBuilderFactory factory,
                  AssessmentUtilityReader assessmentUtilityReader,
                  CustomWriter3 writer,
                  CustomItemProcessor3 processor,
                  PlatformTransactionManager platformTransactionManager) {
    return stepBuilderFactory.get("step3")
            .transactionManager(platformTransactionManager)
            .<Assessment, Assessment>chunk(10)
            .reader(assessmentUtilityReader)
            .processor(processor)
            .writer(writer)
            .build();
}
I have a Spring Batch project with multiple jobs (job A, job B, job C, ...). When I run a particular job A, the log shows that the beans of jobs B, C, ... are created too. Is there any way to avoid creating the other jobs' beans when job A is launched?
I have tried to use the @Lazy annotation, but it doesn't seem to work.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("springDataSource")
    public DataSource springDataSource;

    @Autowired
    @Qualifier("batchJobDataSource")
    public DataSource batchJobDataSource;
}
@Configuration
@PropertySource("classpath:partner.properties")
public class B extends BatchConfiguration {

    @Value("${partnerId}")
    private String partnerId;

    @Lazy
    @Bean
    public Job ProcessB(JobCompletionNotificationListener listener) {
        return jobBuilderFactory
                .get("ProcessB")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(ProcessStepB())
                .build();
    }

    @Lazy
    @Bean
    public Step ProcessStepB() {
        return stepBuilderFactory
                .get("ProcessStepB")
                .<PartnerDTO, PartnerDTO>chunk(1)
                .reader(getPartner())
                .processor(process())
                .writer(saveTransaction())
                .build();
    }

    @Lazy
    @Bean(destroyMethod = "")
    public Reader getPartner() {
        return new Reader(batchJobDataSource, partnerId);
    }

    @Lazy
    @Bean
    public Processor process() {
        return new Processor();
    }

    @Lazy
    @Bean
    HistoryWriter historyWriter() {
        return new HistoryWriter(batchJobDataSource);
    }

    @Lazy
    @Bean
    UpdateWriter updateWriter() {
        return new UpdateWriter(batchJobDataSource);
    }

    @Lazy
    @Bean
    public CompositeItemWriter<PartnerDTO> saveTransaction() {
        List<ItemWriter<? super PartnerDTO>> delegates = new ArrayList<>();
        delegates.add(updateWriter());
        delegates.add(historyWriter());
        CompositeItemWriter<PartnerDTO> itemWriter = new CompositeItemWriter<>();
        itemWriter.setDelegates(delegates);
        return itemWriter;
    }
}
I have also put @Lazy on the @Configuration class, but that does not work either.
That should not be an issue. But here are a few ideas to try:
Use Spring profiles to isolate job beans (see the sketch after this list)
If you use Spring Boot 2.2+, try activating the lazy bean initialization mode (spring.main.lazy-initialization=true)
Package each job in its own jar. This is the best option IMO.
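For the profiles option, a minimal sketch (the profile name is an assumption): guard each job's configuration class with its own profile so that only the beans of the active profile are created, e.g. launching with --spring.profiles.active=jobA skips job B's beans entirely.
@Configuration
@Profile("jobB") // beans in this class are only created when the "jobB" profile is active
@PropertySource("classpath:partner.properties")
public class B extends BatchConfiguration {
    // ... job B beans as before ...
}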
1) I have a large file (> 100k lines) that needs to be processed. I have a lot of business validation and checks against external systems for each line item. The code is being migrated from a legacy app, and I just put this business logic into the AsyncItemProcessor, which also persists the data into the DB. Is it good practice to create/save records in the ItemProcessor (in lieu of the ItemWriter)?
2) Here is the code:
@Configuration
@EnableAutoConfiguration
@ComponentScan(basePackages = "com.liquidation.lpid")
@EntityScan(basePackages = "com.liquidation.lpid.entities")
@EnableTransactionManagement
public class SimpleJobConfiguration {

    @Autowired
    public JobRepository jobRepository;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("myFtpSessionFactory")
    private SessionFactory myFtpSessionFactory;

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Bean
    public ThreadPoolTaskExecutor lpidItemTaskExecutor() {
        ThreadPoolTaskExecutor tExec = new ThreadPoolTaskExecutor();
        tExec.setCorePoolSize(10);
        tExec.setMaxPoolSize(10);
        tExec.setAllowCoreThreadTimeOut(true);
        return tExec;
    }
    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        String name = stepExecution.getStepName();
        System.out.println("name: " + name);
    }

    @Bean
    public SomeItemWriterListener someItemWriterListener() {
        return new SomeItemWriterListener();
    }
    @Bean
    @StepScope
    public FlatFileItemReader<FieldSet> lpidItemReader(@Value("#{stepExecutionContext['fileResource']}") String fileResource) {
        System.out.println("itemReader called !!!!!!!!!!! for customer data" + fileResource);
        FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<FieldSet>();
        reader.setResource(new ClassPathResource("/data/stage/" + fileResource));
        reader.setLinesToSkip(1);
        DefaultLineMapper<FieldSet> lineMapper = new DefaultLineMapper<FieldSet>();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        reader.setSkippedLinesCallback(new LineCallbackHandler() {
            public void handleLine(String line) {
                if (line != null) {
                    tokenizer.setNames(line.split(","));
                }
            }
        });
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(new PassThroughFieldSetMapper());
        lineMapper.afterPropertiesSet();
        reader.setLineMapper(lineMapper);
        return reader;
    }
    @Bean
    public ItemWriter<FieldSet> lpidItemWriter() {
        return new LpidItemWriter();
    }

    @Autowired
    private MultiFileResourcePartitioner multiFileResourcePartitioner;

    @Bean
    public Step masterStep() {
        return stepBuilderFactory.get("masterStep")
                .partitioner(slaveStep().getName(), multiFileResourcePartitioner)
                .step(slaveStep())
                .gridSize(4)
                .taskExecutor(lpidItemTaskExecutor())
                .build();
    }

    @Bean
    public ItemProcessListener<FieldSet, String> processListener() {
        return new LpidItemProcessListener();
    }
    @SuppressWarnings("unchecked")
    @Bean
    public Step slaveStep() {
        return stepBuilderFactory.get("slaveStep")
                .<FieldSet, FieldSet>chunk(5)
                .faultTolerant()
                .listener(new ChunkListener())
                .reader(lpidItemReader(null))
                .processor(asyncItemProcessor())
                .writer(asyncItemWriter())
                .listener(someItemWriterListener())
                .build();
    }
    @Bean
    public AsyncItemWriter<FieldSet> asyncItemWriter() {
        AsyncItemWriter<FieldSet> asyncItemWriter = new AsyncItemWriter<>();
        asyncItemWriter.setDelegate(lpidItemWriter());
        try {
            asyncItemWriter.afterPropertiesSet();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asyncItemWriter;
    }

    @Bean
    public ItemProcessor<FieldSet, FieldSet> processor() {
        return new lpidCheckItemProcessor();
    }
    @Bean
    public AsyncItemProcessor<FieldSet, FieldSet> asyncItemProcessor() {
        AsyncItemProcessor<FieldSet, FieldSet> asyncItemProcessor = new AsyncItemProcessor<FieldSet, FieldSet>();
        asyncItemProcessor.setDelegate(processor());
        asyncItemProcessor.setTaskExecutor(lpidItemTaskExecutor());
        try {
            asyncItemProcessor.afterPropertiesSet();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asyncItemProcessor;
    }

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get("job")
                .incrementer(new RunIdIncrementer())
                .start(masterStep())
                .build();
    }
}
The ItemWriter runs before the ItemProcessor has completed. My understanding is: for every chunk, the item reader reads the data, the item processor churns through each item, and at the end of the chunk the item writer gets called (which, in my case, does not do anything since the item processor persists the data). But the item writer gets called before the item processor completes, and my job never finishes. What am I doing incorrectly here? (I looked at previous issues around this and the solution was to wrap the writer in an AsyncItemWriter, which I am doing.)
Thanks
Sundar
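For reference, the usual AsyncItemProcessor/AsyncItemWriter pairing types the chunk on the Future, so the writer only receives the processor's futures; a minimal sketch of that typing (not a confirmed fix for the job above):
@Bean
public Step slaveStep() {
    return stepBuilderFactory.get("slaveStep")
            // chunk is typed <input, Future<output>> because the async processor returns futures
            .<FieldSet, Future<FieldSet>>chunk(5)
            .reader(lpidItemReader(null))
            .processor(asyncItemProcessor()) // ItemProcessor<FieldSet, Future<FieldSet>>
            .writer(asyncItemWriter())       // AsyncItemWriter unwraps each Future and delegates
            .build();
}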