Mark read data as "processing" with a table column flag, then restore it at the end - Spring Batch

Below is the relevant portion of code for the reader, processor, writer, and step of a batch job that I created.
I have a requirement to update a flag column in the table the data is being read from (the source table) to mark that the data is being processed by this job, so other apps don't pick it up. Then, once processing of the read records is finished, I need to restore that column to its original value so other apps can work on those records too.
I guess a listener is the approach to take (ItemReadListener?). A reader listener seems suitable only for the first part (i.e., updating the flag column) but not for restoring it at the end of the chunk. The challenge seems to be making the read data available again at the end of the processor.
Can anybody suggest possible approaches?
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO>chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .taskExecutor(simpleAsyntaskExecutor)
            .throttleLimit(Constants.THROTTLE_LIMIT)
            .build();
}

@Bean
public ItemReader<RemittanceVO> reader() {
    JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<RemittanceVO>();
    reader.setDataSource(dataSource);
    reader.setRowMapper(new RemittanceRowMapper());
    reader.setQueryProvider(queryProvider);
    reader.setPageSize(Constants.SPRING_BATCH_READER_PAGE_SIZE);
    return reader;
}

@Bean
public ItemProcessor<RemittanceVO, RemittanceClaimVO> processor() {
    return new MatchClaimProcessor();
}

@Bean
public ItemWriter<RemittanceClaimVO> writer(DataSource dataSource) {
    return new MatchedClaimWriter();
}
I started with Spring Batch a few days ago, so I'm not yet familiar with all the provided models and patterns.

Firstly, a small hint about using an asynchronous TaskExecutor: you have to synchronize the reader, otherwise you will run into concurrency problems. You can use a SynchronizedItemStreamReader to do this:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO>chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
            .reader(syncReader)
            .processor(processor)
            .writer(writer)
            .taskExecutor(simpleAsyntaskExecutor)
            .throttleLimit(Constants.THROTTLE_LIMIT)
            .build();
}
@Bean
public SynchronizedItemStreamReader<RemittanceVO> syncReader() {
    SynchronizedItemStreamReader<RemittanceVO> syncReader = new SynchronizedItemStreamReader<>();
    syncReader.setDelegate(reader());
    return syncReader;
}

@Bean
public JdbcPagingItemReader<RemittanceVO> reader() {
    // return the concrete type so setDelegate(ItemStreamReader) above compiles
    JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setRowMapper(new RemittanceRowMapper());
    reader.setQueryProvider(queryProvider);
    reader.setPageSize(Constants.SPRING_BATCH_READER_PAGE_SIZE);
    return reader;
}
Secondly, a possible approach to your real question:
I would use a simple tasklet in order to "mark" the entries you want to process.
You can do this with one simple UPDATE statement, since you know your selection criteria. This way, you only need one call and therefore only one transaction.
After that, I would implement a normal step with reader, processor and writer.
The reader has to read only the marked entries, which also keeps your select clause very simple.
In order to restore the flag, you could do that in a third step which is implemented as a tasklet and uses an appropriate UPDATE statement (like the first step). To ensure that the flag is restored in the case of an exception, just configure your job flow appropriately, so that step 3 is executed even if step 2 fails (see my answer to this question: Spring Batch Java Config: Skip step when exception and go to next steps). A sketch of such a flow is shown below.
Of course, you could also restore the flag when writing the chunk if you use a CompositeItemWriter. However, you then need a strategy for how to restore the flag in case of an exception in step 2.
IMO, using a listener is not a good idea, since the transaction handling is different.
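To make that concrete, here is a minimal sketch of the marking tasklet and the job flow. It assumes the source table is called REMITTANCE, the flag column is called PROCESSING_FLAG, and a JdbcTemplate is available; none of these names come from the original configuration.

public class MarkRecordsTasklet implements Tasklet {

    private final JdbcTemplate jdbcTemplate;

    public MarkRecordsTasklet(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // one UPDATE, one transaction: mark everything the chunk step will read
        jdbcTemplate.update("UPDATE REMITTANCE SET PROCESSING_FLAG = 'Y' WHERE PROCESSING_FLAG = 'N'");
        return RepeatStatus.FINISHED;
    }
}

A RestoreFlagTasklet would look the same with the inverse UPDATE. The job flow then routes to the restore step from both the success and the failure path of the chunk step:

@Bean
public Job remittanceJob(JobBuilderFactory jobBuilderFactory,
        Step markStep, Step step1, Step restoreStep) {
    return jobBuilderFactory.get("remittanceJob")
            .start(markStep)                              // tasklet: set the flag
            .next(step1).on("*").to(restoreStep)          // chunk step, then restore
            .from(step1).on("FAILED").to(restoreStep)     // restore even if step1 fails
            .end()
            .build();
}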

Related

Spring Batch: CompositeItemWriter, @BeforeStep and Controlling StepExecution

I've stumbled upon a pretty twisted issue with Spring Batch recently.
The requirements are as follows:
I have two main steps:
The first one reads some data from an Oracle database table and writes it to another table.
The second one does some other database work, based upon data handled in the first step.
From a design standpoint, the first step looks like this:
@Bean
public Step myFirstStep(JdbcCursorItemReader<Revision> reader) {
    return stepBuilderFactory.get("my-first-step")
            .<Revision, Revision>chunk(1)
            .reader(readerRevisionNumber)
            .writer(compositeItemWriter())
            .listener(executionContextPromotionListener())
            .build();
}

The composite item writer:

@Bean
public CompositeItemWriter<Revision> compositeItemWriter() {
    CompositeItemWriter<Revision> writer = new CompositeItemWriter<>();
    writer.setDelegates(Arrays.asList(somewriter(), someOtherwriter(), aWriterThatIsSupposedToPassDataToAnotherStep()));
    return writer;
}
While the first two writers are not complex, my interest is focused on the third one:
aWriterThatIsSupposedToPassDataToAnotherStep()
As you might have guessed, this one will be used to take some data processed earlier and promote it to my second step:
@Component
@StepScope
public class AWriterThatIsSupposedToPassDataToAnotherStep implements ItemWriter<SomeEntity> {

    private StepExecution stepExecution;

    public void write(List<? extends SomeEntity> items) {
        ExecutionContext stepContext = this.stepExecution.getExecutionContext();
        stepContext.put("revisionNumber", items.stream().findFirst().get().getSomeField());
        System.out.println("writing : " + items.stream().findFirst().get().getSomeField() + " to ExecutionContext");
    }

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }
}
The problem is: as long as this writer is part of the composite writer's delegate list (as declared above),
the @BeforeStep method of my last writer is never executed, which leaves me unable to transmit my information to the execution context.
When I replace the CompositeItemWriter with my single "AWriterThatIsSupposedToPassDataToAnotherStep" inside the step definition, it gets executed properly.
Does it have anything to do with some kind of declaration order or something like that?
Big thanks for any further help.
Found the solution (with some help from my coworkers), sourced from: https://stackoverflow.com/a/39698653/1957764
You'll need to both declare the writer as part of the composite writer AND register it as a step listener to make it execute the @BeforeStep annotated method.
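In other words, a sketch of the adjusted step definition could look like the following (the promotingWriter parameter name is just an illustration): the delegate stays inside the composite writer and is additionally registered as a listener so its @BeforeStep method is picked up. Depending on your Spring Batch version you may need to register it explicitly as a StepExecutionListener.

@Bean
public Step myFirstStep(JdbcCursorItemReader<Revision> reader,
        AWriterThatIsSupposedToPassDataToAnotherStep promotingWriter) {
    return stepBuilderFactory.get("my-first-step")
            .<Revision, Revision>chunk(1)
            .reader(reader)
            .writer(compositeItemWriter())
            .listener(promotingWriter)                      // makes @BeforeStep get called on the delegate
            .listener(executionContextPromotionListener())
            .build();
}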

How to make a non-blocking item processor in Spring Batch (not only asynchronous with a TaskExecutor)?

Spring Batch has a facility called AsyncItemProcessor. It simply wraps an ItemProcessor and runs it with a TaskExecutor, so it can run asynchronously. I want to make a REST call in this ItemProcessor; the problem is that every thread inside this TaskExecutor which makes the REST call will be blocked until the response is received. I want to make it non-blocking, something like the reactive paradigm.
I have an ItemProcessor that calls a REST endpoint and gets its response:
@Bean
public ItemProcessor<String, String> testItemProcessor() {
    return item -> {
        String url = "http://localhost:8787/test";
        try {
            // it's a long-running process and takes a lot of time
            String response = restTemplate.exchange(new URI(url), HttpMethod.GET,
                    new RequestEntity(HttpMethod.GET, new URI(url)), String.class).getBody();
            return response;
        } catch (URISyntaxException e) {
            e.printStackTrace();
            return null;
        }
    };
}
Now I wrap it with AsyncItemProcessor:
@Bean
public AsyncItemProcessor testAsyncItemProcessor() throws Exception {
    AsyncItemProcessor asyncItemProcessor = new AsyncItemProcessor<>();
    asyncItemProcessor.setDelegate(testItemProcessor());
    asyncItemProcessor.setTaskExecutor(testThreadPoolTaskExecutor());
    asyncItemProcessor.afterPropertiesSet();
    return asyncItemProcessor;
}

@Bean
public ThreadPoolTaskExecutor testThreadPoolTaskExecutor() {
    ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
    threadPoolTaskExecutor.setCorePoolSize(50);
    threadPoolTaskExecutor.setMaxPoolSize(100);
    threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
    return threadPoolTaskExecutor;
}
I used a ThreadPoolTaskExecutor as the TaskExecutor.
This is the ItemWriter:
@Bean
public ItemWriter<String> testItemWriter() {
    return items -> {
        // I write them to a file and a database, but for simplicity:
        for (String item : items) {
            System.out.println(item);
        }
    };
}

@Bean
public AsyncItemWriter asyncTestItemWriter() throws Exception {
    AsyncItemWriter asyncItemWriter = new AsyncItemWriter<>();
    asyncItemWriter.setDelegate(testItemWriter());
    asyncItemWriter.afterPropertiesSet();
    return asyncItemWriter;
}
The step and job configuration:
@Bean
public Step testStep() throws Exception {
    return stepBuilderFactory.get("testStep")
            .<String, String>chunk(1000)
            .reader(testItemReader())
            .processor(testAsyncItemProcessor())
            .writer(asyncTestItemWriter())
            .build();
}

@Bean
public Job testJob() throws Exception {
    return jobBuilderFactory.get("testJob")
            .start(testStep())
            .build();
}

The ItemReader is a simple ListItemReader:

@Bean
public ItemReader<String> testItemReader() {
    List<String> integerList = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        integerList.add(String.valueOf(i));
    }
    return new ListItemReader<>(integerList);
}
Now I have a ThreadPoolTaskExecutor with 50-100 threads. Each thread inside the ItemProcessor makes a REST call and waits/blocks until it receives the response from the server. Is there a way to make these calls non-blocking? If the answer is yes, how should I design the ItemWriter? Inside the ItemWriter I want to write the results from the ItemProcessor to a file and a database.
Each chunk has a size of 1000. I can wait until all of the records inside it get processed, but I don't want to block a thread for each REST call inside the chunk. Is there any way to accomplish that?
I know that the Spring RestTemplate is what makes the process blocking and that WebClient should be used instead, but is there an equivalent component in Spring Batch (instead of AsyncItemProcessor/AsyncItemWriter) for WebClient?
No, there is no support for reactive programming in Spring Batch yet; there is an open feature request here: https://github.com/spring-projects/spring-batch/issues/1008.
Please note that going reactive means the entire stack should be reactive, from batch artefacts (reader, processor, writer, listeners, etc.) to infrastructure beans (job repository, transaction manager, etc.), and not only your item processor and writer.
Moreover, the current chunk processing model is actually incompatible with the reactive paradigm. The reason is that a ChunkOrientedTasklet basically uses two collaborators:
A ChunkProvider which provides chunks of items (delegating item reading to an ItemReader)
A ChunkProcessor which processes chunks (delegating processing and writing respectively to an ItemProcessor/ItemWriter)
Here is a simplified version of the code:
Chunk inputs = chunkProvider.provide();
chunkProcessor.process(inputs);
As you can see, the step will wait for the chunkProcessor (processor + writer) to process the whole chunk before reading the next one. So in your case, even if you use non-blocking APIs in your processor + writer, your step will be waiting for the chunk to be completely processed before reading the next chunk (besides waiting for blocking interactions with the job repository and transaction manager).

RepositoryItemReader skipping chunks

I am using a RepositoryItemReader to read records from the database and chunk-oriented processing to process them. I am using 100 as the page size and commit interval.
The query for the reader has a "timestamp" in the where condition, and this date gets updated by the chunk processing when a chunk of 100 records is processed and committed. The problem I am running into: let's say I have 986 records that need to be read and have their date updated, in chunks of size 100 (1-100). All works as expected the first time, but when the reader picks up the second chunk it processes records 201-300, not 101-200, which is unexpected. This pattern continues: the third time it picks up 501-600, and so on. It skips 100 records the first time, 200 the second time, and so on. Are my updates and commits of chunks causing this? Please advise how to fix this so it can process all the records.
Spring batch version: 4.0.1.RELEASE
Code:
@Autowired
private MpImportRepository importRepo;

@Autowired
JpaTransactionManager jpaTransactionManager;

@Bean
@StepScope
public RepositoryItemReader<MpImport> importDataReader() {
    RepositoryItemReader<MpImport> reader = new RepositoryItemReader<>();
    reader.setPageSize(100);
    reader.setRepository(importRepo);
    reader.setMethodName("findAllImportedMissingPersons");
    reader.setSort(Collections.singletonMap("missingDate", Sort.Direction.ASC));
    return reader;
}
@Bean
@Qualifier("mpDataExtractAndSaveToYrstJob")
public Job mpDataExtractAndSaveToYrstJob() {
    return jobBuilderFactory.get("mpDataExtractAndSaveToYrstJob")
            .incrementer(new RunIdIncrementer())
            .listener(jobCompletionListener)
            .flow(mpDataExtractAndSaveToYrstStep())
            .end().build();
}

@Bean
@Qualifier("mpDataExtractAndSaveToYrstStep")
public Step mpDataExtractAndSaveToYrstStep() {
    return stepBuilderFactory.get("mpDataExtractAndSaveToYrstStep")
            .<VMpHotfilesDailyExtract, MpImport>chunk(100)
            .reader(hotFilesReader())
            .processor(hotFilesProcessor())
            .writer(importDataWriter())
            .transactionManager(jpaTransactionManager)
            .listener(mpdataExtractStepListener)
            .listener(chunkCompletionListener)
            .build();
}
@Bean
@StepScope
public RepositoryItemWriter<MpImport> importDataWriter() {
    RepositoryItemWriter<MpImport> writer = new RepositoryItemWriter<>();
    writer.setRepository(importRepo);
    writer.setMethodName("save");
    return writer;
}

How to handle stateful item reader in SpringBatch

Our Spring Batch job has a single step with an ItemReader, ItemProcessor, and ItemWriter. We run the same job concurrently with different parameters. The ItemReader is stateful, as it contains an input stream that it reads from.
So, we don't want the same instance of the ItemReader to be used for every JobInstance (Job + Parameters) invocation.
I am not quite sure which is the best "scoping" for this situation.
1) Should the Step be annotated with @JobScope and the ItemReader be a prototype?
OR
2) Should the Step be annotated with @StepScope and the ItemReader be a prototype?
OR
3) Should both the Step and the ItemReader be annotated as prototype?
The end result should be such that a new ItemReader is created for every new execution of the Job with different identifying parameters (i.e., for every new JobInstance).
Thanks.
-AP_
Here's how it goes from a class instantiation standpoint (from least to most instances):
Singleton (per JVM)
JobScope (per job)
StepScope (per step)
Prototype (per reference)
If you have multiple jobs running in a single JVM (assuming you aren't in a partitioned step), JobScope will be sufficient. If you have a partitioned step, you'll want StepScope. Prototype would be overkill in all scenarios.
However, if these jobs are launching in different JVMs (and not a partitioned step), then a simple Singleton bean will be just fine.
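For illustration, a job-scoped reader bean is created once per job execution, so concurrent runs of the same job with different parameters each get their own reader and their own underlying stream. This is only a minimal sketch; the String item type, the inputFile job parameter, and the use of a FlatFileItemReader are assumptions:

@Bean
@JobScope
public FlatFileItemReader<String> reader(@Value("#{jobParameters['inputFile']}") String inputFile) {
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource(inputFile)); // each job execution opens its own stream
    reader.setLineMapper(new PassThroughLineMapper());
    return reader;
}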
There is no need for every component (Step, ItemReader, ItemProcessor, ItemWriter) to be a Spring bean. For instance, with the Spring Batch Java API, only your Job needs to be a Spring bean, but not your Steps, Readers and Writers:
@Autowired
private JobBuilderFactory jobs;

@Autowired
private StepBuilderFactory steps;

@Bean
public Job job() throws Exception {
    return this.jobs.get(JOB_NAME) // create jobbuilder
            .start(step1(JOB_NAME)) // add step 1
            .next(step2(JOB_NAME)) // add step 2
            .build(); // create job
}

private Step step1(String jobName) throws Exception {
    return steps.get(jobName + "_Step_1").chunk(10) //
            .faultTolerant() //
            .reader(() -> null) // you could use lambdas
            .writer(items -> {
            }) //
            .build();
}

private Step step2(String jobName) throws Exception {
    return steps.get(jobName + "_Step_2").chunk(10) //
            .faultTolerant() //
            .reader(createDbItemReader(ds, sqlString, rowmapper)) //
            .writer(createFileWriter(resource, aggregator)) //
            .build();
}
The only thing you have to pay attention to is that you have to call the afterPropertiesSet methods yourself when creating instances like JdbcCursorItemReader or FlatFileItemReader/Writer:
private static <T> ItemReader<T> createDbItemReader(DataSource ds, String sql, RowMapper<T> rowMapper) throws Exception {
    JdbcCursorItemReader<T> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(ds);
    reader.setSql(sql);
    reader.setRowMapper(rowMapper);
    reader.afterPropertiesSet(); // don't forget
    return reader;
}

private static <T> ItemWriter<T> createFileWriter(Resource target, LineAggregator<T> aggregator) throws Exception {
    FlatFileItemWriter<T> writer = new FlatFileItemWriter<>();
    writer.setEncoding("UTF-8");
    writer.setResource(target);
    writer.setLineAggregator(aggregator);
    writer.afterPropertiesSet(); // don't forget
    return writer;
}
This way, there is no need for you to hassle with scopes. Every job will have its own instances of its steps and their readers and writers.
Another advantage of this approach is that you can now create your jobs completely dynamically.

How to end a job when no input is read

We read most of our data from a DB. Sometimes the result set is empty, and in that case we want the job to stop immediately and not hand over to a writer. We don't want to create a file if there is no input.
Currently we achieve this goal with a step listener that returns a certain String, which is the input for a transition to either the next business step or a delete step, which deletes the file we created before (the file contains no real data).
I'd like the job to end as soon as the reader realizes that there is no input. How can I do that?
New edit (more elegant way)
This approach is to elegantly move to the next step or end the batch application when the file is not found, and to prevent unwanted steps (and their listeners) from executing.
-> Check for the presence of the file in a tasklet, say FileValidatorTasklet.
-> When the file is not found, set some exit status (an enum or final String); here we have set EXIT_CODE.
sample tasklet
public class FileValidatorTasklet implements Tasklet {

    static final String EXIT_CODE = "SOME_EXIT_CODE";
    static final String EXIT_DESC = "SOME_EXIT_DESC";

    @Override
    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
        boolean isFileFound = false;
        // do the file check and set isFileFound
        if (!isFileFound) {
            stepContribution.setExitStatus(new ExitStatus(EXIT_CODE, EXIT_DESC));
        }
        return RepeatStatus.FINISHED;
    }
}
-> In the job configuration, after executing FileValidatorTasklet, check for the presence of the EXIT_CODE.
-> Provide the alternative path for this job if the code is found, otherwise follow the normal flow of the job. (Here we simply terminate the job if the EXIT_CODE is found, else continue with the next steps.)
sample config
public Job myJob(JobBuilderFactory jobs) {
    return jobs.get("offersLoaderJob")
            .start(fileValidatorStep).on(EXIT_CODE).end() // if EXIT_CODE is found, then end the job
            .from(fileValidatorStep) // else continue the job from here, after this step
            .next(step2)
            .next(finalStep)
            .end()
            .build();
}
Here we have taken advantage of conditional step flow in Spring Batch.
We have to define two separate paths from step A. The flow is like A->B->C or A->D->E.
Old answer:
I have been through this, and hence I am sharing my approach. It's better to
throw new RuntimeException("msg");
It will start to terminate the Spring application, rather than terminating exactly at that point. All methods like close() in the reader/writer would be called, and the destroy methods of all the beans would be called.
Note: while executing this in a listener, remember that by this point all the beans would have been initialized and the code in their initialization (like afterPropertiesSet()) would have executed.
I think the above is the correct way, but if you want to terminate at that exact point, you can try
System.exit(1);
It would likely be cleaner to use a JobExecutionDecider and, based on the read count from the StepExecution, set a new FlowExecutionStatus and route it to the end of the job, as in the sketch below.
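A minimal sketch of that decider approach; the class name, the CONTINUE/NO_INPUT statuses and the step names are assumptions, not fixed Spring Batch values:

public class EmptyInputDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // stepExecution belongs to the step executed right before the decider
        if (stepExecution.getReadCount() > 0) {
            return new FlowExecutionStatus("CONTINUE");
        }
        return new FlowExecutionStatus("NO_INPUT");
    }
}

@Bean
public Job job(Step readStep, Step nextBusinessStep) {
    EmptyInputDecider decider = new EmptyInputDecider();
    return jobBuilderFactory.get("job")
            .start(readStep)
            .next(decider).on("NO_INPUT").end()        // end the job cleanly when nothing was read
            .from(decider).on("CONTINUE").to(nextBusinessStep)
            .end()
            .build();
}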
Joshua's answer addresses the stopping of the job instead of transitioning to the next business step.
Your file writer might still create the file unnecessarily. You can create something like a LazyItemWriter with a delegate (FlatFileItemWriter) that only calls delegate.open() (once) when its write method is actually called. Of course, you then have to make sure delegate.close() is called only if the delegate was previously opened. This ensures that no empty file is created, so deleting it is no longer a concern. A rough sketch follows below.
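Such a LazyItemWriter is not part of Spring Batch; the following is just one way it could be sketched, assuming a FlatFileItemWriter delegate and the Spring Batch 4 ItemStream callbacks:

public class LazyItemWriter<T> implements ItemStreamWriter<T> {

    private final FlatFileItemWriter<T> delegate;
    private ExecutionContext executionContext;
    private boolean opened = false;

    public LazyItemWriter(FlatFileItemWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // remember the context, but don't create the file yet
        this.executionContext = executionContext;
    }

    @Override
    public void write(List<? extends T> items) throws Exception {
        if (!items.isEmpty() && !opened) {
            delegate.open(executionContext); // the file is created only now
            opened = true;
        }
        if (!items.isEmpty()) {
            delegate.write(items);
        }
    }

    @Override
    public void update(ExecutionContext executionContext) {
        if (opened) {
            delegate.update(executionContext);
        }
    }

    @Override
    public void close() {
        if (opened) {
            delegate.close(); // only close what was actually opened
        }
    }
}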
I have the same question as the OP. I am using all annotations, and if the reader returns null when no results (in my case a file) are found, then the Job bean fails to be initialized with an UnsatisfiedDependencyException, and that exception is printed to stdout.
If I create a reader and then return it without a file specified, then the Job will be created. After that an ItemStreamException is thrown, but it goes to my log, as I am past the Job autowiring and inside the Step at that point. That seems preferable, at least for what I am doing.
Any other solution would be appreciated.
NiksVij's answer works for me; I implemented it like this:
@Component
public class FileValidatorTasklet implements Tasklet {

    private final ImportProperties importProperties;

    @Autowired
    public FileValidatorTasklet(ImportProperties importProperties) {
        this.importProperties = importProperties;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        String folderPath = importProperties.getPathInput();
        String itemName = importProperties.getItemName();
        File currentItem = new File(folderPath + File.separator + itemName);
        if (currentItem.exists()) {
            contribution.setExitStatus(new ExitStatus("FILE_FOUND", "FILE_FOUND"));
        } else {
            contribution.setExitStatus(new ExitStatus("NO_FILE_FOUND", "NO_FILE_FOUND"));
        }
        return RepeatStatus.FINISHED;
    }
}
and in the Batch Configuration:
@Bean
public Step fileValidatorStep() {
    return this.stepBuilderFactory.get("step1")
            .tasklet(fileValidatorTasklet)
            .build();
}

@Bean
public Job tdZuHostJob() throws Exception {
    return jobBuilderFactory.get("tdZuHostJob")
            .incrementer(new RunIdIncrementer())
            .listener(jobCompletionNotificationListener)
            .start(fileValidatorStep()).on("NO_FILE_FOUND").end()
            .from(fileValidatorStep()).on("FILE_FOUND").to(testStep()).end()
            .build();
}