I used Spring batch to build a ETL job. My main job is simply read from a db and write to another one. Before my main job, I need to check a status in the source db to see if it is ready or not. I will proceed with the main job only if the source db is ready. I implemented the check status logic as a Tasklet and I build my job with the simple logic that if check status step failed repeat this step util the step succeed, then proceed with the main job. I build the job as follows:
#Bean
public Job myJob(MyListener listener) {
return jobBuilderFactory.get(jobName)
.incrementer(new RunIdIncrementer())
.listener(listener)
.start(checkStatusStep())
.on("*").to(mainStep())
.on(ExitStatus.FAILED.getExitCode())
.to(checkStatusStep())
.end()
.build();
}
The check status step is a tasklet as follows:
#Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws InterruptedException {
//isReady method check the db to see if it is ready
if (isReady()) {
//contribution.setExitStatus(ExitStatus.COMPLETED);
return RepeatStatus.FINISHED;
} else {
//contribution.setExitStatus(ExitStatus.FAILED);
return RepeatStatus.CONTINUABLE;
}
}
I read the spring batch document and get that the ON method in the conditional follow will check the exit status of the step so that I set the exit status in the StepContribution. What makes me confused is that when I comment those two lines out and the code still works.
So my question is, first, if the conditional flow checks the exit status why my code works without explicitly change the exit status? Second, why tasklet return a repeat status and by whom this repeat status is consumed.
Third, is there a better way to achieve my goal?
if the conditional flow checks the exit status why my code works without explicitly change the exit status?
Because by default if the tasklet returns RepeatStatus.FINISHED, its exit code will be set to COMPLETED
Second, why tasklet return a repeat status and by whom this repeat status is consumed.
A tasklet is called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure. Each call to a Tasklet is wrapped in a transaction. The structure is therefore that of a loop with transaction boundary inside the loop.
Third, is there a better way to achieve my goal?
IMO, your Tasklet implementation is ok as is, you don't need the exit status. Something like:
#Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws InterruptedException {
//isReady method check the db to see if it is ready
if (isReady()) {
return RepeatStatus.FINISHED;
} else {
return RepeatStatus.CONTINUABLE;
}
}
This will continuously run until isReady is true, so you don't need to implement the iteration with a flow at the job level. The job definition would be:
#Bean
public Job myJob(MyListener listener) {
return jobBuilderFactory.get(jobName)
.incrementer(new RunIdIncrementer())
.listener(listener)
.start(checkStatusStep())
.next(mainStep())
.build();
}
Hope this helps.
Related
I'm using Spring batch to write a batch process and I'm having issues handling the exceptions.
I have a reader that fetches items from a database with an specific state. The reader passes the item to the processor step that can launch the exception MyException.class. When this exception is thrown I want to skip the item that caused that exception and continue reading the next one.
The issue here is that I need to change the state of that item in the database so it's not fetched again by the reader.
This is what I tried:
return this.stepBuilderFactory.get("name")
.<Input, Output>chunk(1)
.reader(reader())
.processor(processor())
.faultTolerant()
.skipPolicy(skipPolicy())
.writer(writer())
.build();
In my SkipPolicy class I have the next code:
public boolean shouldSkip(Throwable throwable, int skipCount) throws SkipLimitExceededException {
if (throwable instanceof MyException.class) {
// log the issue
// update the item that caused the exception in database so the reader doesn't return it again
return true;
}
return false;
}
With this code the exception is skipped and my reader is called again, however the SkipPolicy didn't commit the change or did a rollback, so the reader fetches the item and tries to process it again.
I also tried with an ExceptionHandler:
return this.stepBuilderFactory.get("name")
.<Input, Output>chunk(1)
.reader(reader())
.processor(processor())
.faultTolerant()
.skip(MyException.class)
.exceptionHandler(myExceptionHandler())
.writer(writer())
.build();
In my ExceptionHandler class I have the next code:
public void handleException(RepeatContext context, Throwable throwable) throws Throwable {
if (throwable.getCause() instanceof MyException.class) {
// log the issue
// update the item that caused the exception in database so the reader doesn't return it again
} else {
throw throwable;
}
}
With this solution the state is changed in the database, however it doesn't call the reader, instead it calls the method process of the processor() again, getting in an infinite loop.
I imagine I can use a listener in my step to handle the exceptions, but I don't like that solution because I will have to clone a lot of code asumming this exception could be launched in different steps/processors of my code.
What am I doing wrong?
EDIT: After a lot of tests and using different listeners like SkipListener, I couldn't achieve what I wanted, Spring Batch is always doing a rollback of my UPDATE.
Debugging this is what I found:
Once my listener is invoked and I update my item, the program enters the method write in the class FaultTolerantChunkProcessor (line #327).
This method will try the next code (copied from github):
try {
doWrite(outputs.getItems());
} catch (Exception e) {
status = BatchMetrics.STATUS_FAILURE;
if (rollbackClassifier.classify(e)) {
throw e;
}
/*
* If the exception is marked as no-rollback, we need to
* override that, otherwise there's no way to write the
* rest of the chunk or to honour the skip listener
* contract.
*/
throw new ForceRollbackForWriteSkipException(
"Force rollback on skippable exception so that skipped item can be located.", e);
}
The method doWrite (line #151) inside the class SimpleChunkProcessor will try to write the list of output items, however, in my case the list is empty, so in the line #159 (method writeItems) will launch an IndexOutOfBoundException, causing the ForceRollbackForWriteSkipException and doing the rollback I'm suffering.
If I override the class FaultTolerantChunkProcessor and I avoid writing the items if the list is empty, then everything works as intended, the update is commited and the program skips the error and calls the reader again.
I don't know if this is actually a bug or it's caused by something I'm doing wrong in my code.
A SkipListener is better suited to your use case than an ExceptionHandler in my opinion, as it gives you access to the item that caused the exception. With the exception handler, you need to carry the item in the exception or the repeat context.
Moreover, the skip listener allows you to know in which phase the exception happened (ie in read, process or write), while with the exception handler you need to find a way to detect that yourself. If the skipping code is the same for all phases, you can call the same method that updates the item's status in all the methods of the listener.
Starting to use Spring Batch for a new project. I have a simple batch application (based off the Person sample) that has 4 jobs invoked in order. The first 2 jobs delete all the elements of a mongo collection, the next Job is the Person job, and the 4th job manipulates the data loaded in Job #3.
Each Job has 1 step. Each step has a unique MongoItemWriter. It appears that the MongoItemWriters are not flushing to the mongodb until the application is finished, but not at the end of the step or job.
The job that loads the data from a flat file is this:
#Bean
#Qualifier("personWriter")
public MongoItemWriter<Person> mongoItemWriter(MongoTemplate mongoTemplate) {
return new MongoItemWriterBuilder<Person>().template(mongoTemplate)
.collection("person")
.build();
}
#Order(10)
#Bean
public Job importUserJob(PersonCompletionNotificationListener listener, Step step1) {
return jobBuilderFactory.get("importUserJob")
.incrementer(new RunIdIncrementer())
.listener(listener)
.flow(step1)
.end()
.build();
}
#Bean
public Step step1(#Qualifier("personWriter") MongoItemWriter<Person> writer) {
return stepBuilderFactory.get("importUserStep")
.<Person, Person>chunk(1)
.reader(reader())
.processor(processor())
.writer(writer)
.build();
}
When the next job runs, that table is empty.
I expected that any of Chunk size=1, End of Step, or End of Job, would flush to mongo.
What am I missing?
Below is a relevant portion of code for reader, processor , writer and step for batch job that I create.
I have a requirement to update a flag column in table from where data is being read ( source table ) to mark that this data is being processed by this job so other apps don't pick up that data. Then once processing of read records is finished, I need to restore that column to original value so other apps can work on those records too.
I guess, listener is the approach to take ( ItemReadListener ? ) . Reader listener seems suitable only for first step ( i.e to update flag column ) but not for restore at the end of chunk. Challenge seems to be making read data available at the end of processor.
Can anybody suggest about possible approaches?
#Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
return stepBuilderFactory.get("step1")
.<RemittanceVO, RemittanceClaimVO> chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
.reader(reader)
.processor(processor)
.writer(writer)
.taskExecutor(simpleAsyntaskExecutor)
.throttleLimit(Constants.THROTTLE_LIMIT)
.build();
}
#Bean
public ItemReader<RemittanceVO> reader() {
JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<RemittanceVO>();
reader.setDataSource(dataSource);
reader.setRowMapper(new RemittanceRowMapper());
reader.setQueryProvider(queryProvider);
reader.setPageSize(Constants.SPRING_BATCH_READER_PAGE_SIZE);
return reader;
}
#Bean
public ItemProcessor<RemittanceVO, RemittanceClaimVO> processor() {
return new MatchClaimProcessor();
}
#Bean
public ItemWriter<RemittanceClaimVO> writer(DataSource dataSource) {
return new MatchedClaimWriter();
}
I started with Spring Batch few days ago so don't have familiarity with all the provided modeling and patterns.
Firstly, a small hint about using an asyncTaskExecutor: you have to synchronize the reader, otherwise you will run into concurrency problems. You can use SynchronizedItemStreamReader to do this:
#Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
return stepBuilderFactory.get("step1")
.<RemittanceVO, RemittanceClaimVO> chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
.reader(syncReader)
.processor(processor)
.writer(writer)
.taskExecutor(simpleAsyntaskExecutor)
.throttleLimit(Constants.THROTTLE_LIMIT)
.build();
}
#Bean
public ItemReader<RemittanceVO> syncReader() {
SynchronizedItemStreamReader<RemittanceVO> syncReader = new SynchronizedItemStreamReader<>();
syncReader.setDelegate(reader());
return syncReader;
}
#Bean
public ItemReader<RemittanceVO> reader() {
JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<RemittanceVO>();
reader.setDataSource(dataSource);
reader.setRowMapper(new RemittanceRowMapper());
reader.setQueryProvider(queryProvider);
reader.setPageSize(Constants.SPRING_BATCH_READER_PAGE_SIZE);
return reader;
}
Secondly, a possible approach to your real question:
I would use a simple tasklet in order to "mark" the entries you want to process.
You can do this with one simple UPDATE-statement, since you know your selection criterias. This way, you only need one call and therefore only one transaction.
After that, I would implement an normal step with reader, processor and writer.
The reader has to read only the marked entries, making your select clause also very simple.
In order to restore the flag, you could do that in a third step which is implemented as tasklet and uses an appropriate UPDATE-statement (like the first step). To ensure that the flag is restored in the case of an exception, just configure your jobflow appropriately, so that step 3 is executed even if step 2 fails (-> see my answer to this question Spring Batch Java Config: Skip step when exception and go to next steps)
Of course, you could also restore the flag when writing the chunk if you use a compositeItemWriter. However, you need a strategy how to restore the flag in case of an exception in step 2.
IMO, using a Listener is not a good idea, since the transaction handling is differently.
One of the step in my job is having an exception and hence the job is failing with the EXIT_CODE "FAILED". Now I want to set the EXIT_MESSAGE as well, I did the following but the message is not getting set.. Any ideas??
chunkContext.getStepContext().getStepExecution().getJobExecution().setExitStatus(ExitStatus.FAILED);
ExitStatus es = jobExecution.getExitStatus();
es = exitStatus.**addExitDescription**("CUSTOM EXCEPTION MESSAGE");
chunkContext.getStepContext().getStepExecution().getJobExecution().setExitStatus(es);
I also tried the following but didn't work.
setExitStatus(new ExitStatus("FAILED","CUSTOM EXCEPTION MESSAGE"));
The way to manipulate the exit status of a job (aka the Job's ExitStatus) is via a JobExecutionLisener. The way you're attempting to manipulate it is using a copy of the real thing. We do that so that rollback can be implemented cleanly. You can read more about the JobExecutionListener in the documentation here: http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/JobExecutionListener.html
Gotcha!!!
Adding the listener at the job level and then giving a custom EXIT_CODE made it work.
Thanks Michael.
public class SampleJobListener implements JobExecutionListener {
#Override
public void beforeJob(JobExecution jobExecution) {
}
#Override
public void afterJob(JobExecution jobExecution) {
// Setting the exception in batch EXIT MESSAGE
jobExecution.setExitStatus(new ExitStatus("ERROR","Exception in JOB"));
}
}
We read most of our data from a DB. Sometimes the result-set is empty, and for that case we want the job to stop immediately, and not hand over to a writer. We don't want to create a file, if there is no input.
Currently we achieve this goal with a Step-Listener that returns a certain String, which is the input for a transition to either the next business-step or a delete-step, which deletes the file we created before (the file contains no real data).
I'd like the job to end after the reader realizes that there is no input?
New edit (more elegant way)
This approach is to elegantly move to the next step or end the batch application when the file is not found and prevent unwanted steps to execute (and their listeners too).
-> Check for the presence of file in a tasklet, say FileValidatorTasklet.
-> When the file is not found set some exit status (enum or final string) , here we have set EXIT_CODE
sample tasklet
public class FileValidatorTasklet implements Tasklet {
static final String EXIT_CODE = "SOME_EXIT_CODE";
static final String EXIT_DESC = "SOME_EXIT_DESC";
#Override
public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
boolean isFileFound = false;
//do file check and set isFileFound
if(!isFileFound){
stepContribution.setExitStatus(new ExitStatus(EXIT_CODE, EXIT_DESC));
}
return RepeatStatus.FINISHED;
}
}
-> In the job configuration of this application after executing FileValidatorTasklet, check for the presence of the EXIT_CODE.
-> Provide the subsequent path for this job if the code is found else the normal flow of the job.( Here we are simply terminating the job if the EXIT_CODE is found else continue with the next steps)
sample config
public Job myJob(JobBuilderFactory jobs) {
return jobs.get("offersLoaderJob")
.start(fileValidatorStep).on(EXIT_CODE).end() // if EXIT_CODE is found , then end the job
.from(fileValidatorStep) // else continue the job from here, after this step
.next(step2)
.next(finalStep)
.end()
.build();
}
Here we have taken advantage of conditional step flow in spring batch.
We have to define two separate path from step A. The flow is like A->B->C or A->D->E.
Old answer:
I have been through this and hence I am sharing my approach. It's better to
throw new RunTimeException("msg");.
It will start to terminate the Spring Application , rather than exact terminate at that point. All methods like close() in ( reader/writer) would be called and destroy method of all the beans would be called.
Note: While executing this in Listener, remember that by this point all the beans would have been initialized and code in their initialization (like afterPropertySet() ) would have executed.
I think above is the correct way, but if you are willing to terminate at that point only, you can try
System.exit(1);
It would likely be cleaner to use a JobExecutionDecider and based on the read count from the StepExecution set a new FlowExecutionStatus and route it to the end of the job.
Joshua's answer addresses the stopping of the job instead of transitioning to the next business step.
Your file writer might still create the file unnecessarily. You can create something like a LazyItemWriter with a delegate (FlatFileItemWriter) and it will only call delegate.open (once) if there's a call to write method. Of course you have to check if delegate.close() needs to be called only if the delegate was previously opened. This makes sure that no empty file is created and deleting it is no longer a concern.
I have the same question as the OP. I am using all annotations, and if the reader returns as null when no results (in my case a File) are found, then the Job bean will fail to be initialized with an UnsatisfiedDependencyException, and that exception is thrown to stdout.
If I create a Reader and then return it w/o a File specified, then the Job will be created. After that an ItemStreamException is thrown, but it is thrown to my log, as I am past the Job autowiring and inside the Step at that point. That seems preferable, at least for what I am doing.
Any other solution would be appreciated.
NiksVij Answer works for me, i implemented it like this:
#Component
public class FileValidatorTasklet implements Tasklet {
private final ImportProperties importProperties;
#Autowired
public FileValidatorTasklet(ImportProperties importProperties) {
this.importProperties = importProperties;
}
#Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
String folderPath = importProperties.getPathInput();
String itemName = importProperties.getItemName();
File currentItem = new File(folderPath + File.separator + itemName);
if (currentItem.exists()) {
contribution.setExitStatus(new ExitStatus("FILE_FOUND", "FILE_FOUND"));
} else {
contribution.setExitStatus(new ExitStatus("NO_FILE_FOUND", "NO_FILE_FOUND"));
}
return RepeatStatus.FINISHED;
}
}
and in the Batch Configuration:
#Bean
public Step fileValidatorStep() {
return this.stepBuilderFactory.get("step1")
.tasklet(fileValidatorTasklet)
.build();
}
#Bean
public Job tdZuHostJob() throws Exception {
return jobBuilderFactory.get("tdZuHostJob")
.incrementer(new RunIdIncrementer())
.listener(jobCompletionNotificationListener)
.start(fileValidatorStep()).on("NO_FILE_FOUND").end()
.from(fileValidatorStep()).on("FILE_FOUND").to(testStep()).end()
.build();
}