I'm trying to import city data from a CSV file. Some of the data may be duplicated, which triggers a conflict error:
ERROR: duplicate key value violates unique constraint "city__unique_idx"
Detail: Key (country, city_name, latitude, longitude)=(231, Monticello, 30.5450000000000017, -83.8703000000000003) already exists.
2018-03-13 14:34:03.607 ERROR 13275 --- [ main] o.s.batch.core.step.AbstractStep : Encountered an error executing step cityStep1 in job importCityJob
I want to know how to ignore this error and keep the job running, because it currently exits immediately.
Writer:
@Bean
public ItemWriter<City> cityWriter(EntityManagerFactory factory) {
    JpaItemWriter<City> writer = new JpaItemWriter<>();
    writer.setEntityManagerFactory(factory);
    return writer;
}
Here is my job method:
@Bean
public Step cityStep1(ItemWriter<City> writer) {
    return stepBuilderFactory.get("cityStep1").<City, City>chunk(10).reader(cityReader())
            .processor(cityProcessor()).writer(writer).build();
}

@Bean
public Job importCityJob(JobCompletionNotificationListener listener, Step cityStep1) {
    return jobBuilderFactory.get("importCityJob").incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(cityStep1)
            .build();
}
Thanks.
Edit 1:
After applying faultTolerant():
@Bean
public Step cityStep1(ItemWriter<City> writer) {
    return stepBuilderFactory.get("cityStep1").<City, City>chunk(50).reader(cityReader())
            .processor(cityProcessor()).writer(writer)
            .faultTolerant()
            .skip(ConflictException.class)
            .skip(ConstraintViolationException.class)
            .noRetry(ConflictException.class)
            .noRetry(ConstraintViolationException.class)
            .skipLimit(150)
            .build();
}
I still get an error:
2018-03-14 15:49:40.047 ERROR 26613 --- [ main] o.h.engine.jdbc.spi.SqlExceptionHelper : ERROR: duplicate key value violates unique constraint "city__unique_idx"
Detail: Key (country, city_name, latitude, longitude)=(231, Monticello, 30.5450000000000017, -83.8703000000000003) already exists.
2018-03-14 15:49:40.161 ERROR 26613 --- [ main] o.s.batch.core.step.AbstractStep : Encountered an error executing step cityStep1 in job importCityJob
org.springframework.retry.ExhaustedRetryException: Retry exhausted after last attempt in recovery path, but exception is not skippable.; nested exception is javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: could not execute statement
at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor$5.recover(FaultTolerantChunkProcessor.java:405) ~[spring-batch-core-3.0.8.RELEASE.jar:3.0.8.RELEASE]
at org.springframework.retry.support.RetryTemplate.handleRetryExhausted(RetryTemplate.java:512) ~[spring-retry-1.2.1.RELEASE.jar:na]
at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:351) ~[spring-retry-1.2.1.RELEASE.jar:na]
at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:211) ~[spring-retry-1.2.1.RELEASE.jar:na]
You need to make your step fault tolerant for specific exceptions.
@Bean
public Step cityStep1(ItemWriter<City> writer) {
    return stepBuilderFactory.get("cityStep1")
            .<City, City>chunk(10)
            .reader(cityReader())
            .processor(cityProcessor())
            .writer(writer)
            .faultTolerant()
            .skip(YourException.class)
            .noRetry(YourException.class)
            .noRollback(YourException.class)
            .skipLimit(10)
            .build();
}
You haven't listed your writer or the exception that you received, so put that exception in place of YourException.
noRetry and noRollback are there to alter Spring Batch's default behavior when skipping: by default, Spring Batch rolls back the transaction for the whole chunk and then reprocesses it item by item. If you are OK with that behavior, you can remove those calls from your step definition.
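Edit 1's stack trace shows the failure arriving as a javax.persistence.PersistenceException wrapping org.hibernate.exception.ConstraintViolationException. Two things worth checking (assumptions on my part, not a confirmed fix): that the ConstraintViolationException you declare is Hibernate's rather than javax.validation's, and that the wrapper itself is declared skippable. A sketch:
@Bean
public Step cityStep1(ItemWriter<City> writer) {
    return stepBuilderFactory.get("cityStep1")
            .<City, City>chunk(50)
            .reader(cityReader())
            .processor(cityProcessor())
            .writer(writer)
            .faultTolerant()
            // skip the wrapper that Hibernate/JPA actually throws (javax.persistence.PersistenceException)...
            .skip(PersistenceException.class)
            // ...as well as the underlying org.hibernate.exception.ConstraintViolationException
            .skip(ConstraintViolationException.class)
            .skipLimit(150)
            .build();
}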
Related
JPA repository methods called from inside an @Scheduled method are throwing a PSQLException saying the relation does not exist, while the same repository methods execute fine from other parts of the code.
This is my @Configuration class for the scheduler:
@Configuration
@EnableScheduling
public class SchedulingConfig {

    @Bean
    public TaskScheduler taskScheduler() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        executor.setThreadFactory(new ThreadFactoryBuilder().setNameFormat("Scheduled-%d").build());
        ConcurrentTaskScheduler taskScheduler = new ConcurrentTaskScheduler(executor);
        taskScheduler.setTaskDecorator(new MasterTaskDecorator());
        return taskScheduler;
    }
}
And this is the @Scheduled-annotated method:
@Scheduled(fixedRate = 60000, initialDelay = 60000)
public void scheduledRevalidationOfOpenJourney() {
    List<JourneyEntity> openJourneys = journeyRepository.findAllByState(JourneyState.OPEN);
    openJourneys.forEach(openJourney -> openJourney.getDatasets().forEach(validationService::removeRelatedViolation));
}
This is the exception being thrown:
2022-12-02 15:22:58.894 ERROR 32704 --- [ scheduling-1] o.h.engine.jdbc.spi.SqlExceptionHelper : ERROR: relation "journey" does not exist
org.springframework.dao.InvalidDataAccessResourceUsageException: could not extract ResultSet; SQL [n/a]; nested exception is org.hibernate.exception.SQLGrammarException: could not extract ResultSet
2022-12-02 15:23:58.896 ERROR 32704 --- [ scheduling-1] o.s.s.s.TaskUtils$LoggingErrorHandler : Unexpected error occurred in scheduled task
Caused by: org.hibernate.exception.SQLGrammarException: could not extract ResultSet
Caused by: org.postgresql.util.PSQLException: ERROR: relation "journey" does not exist
As I mentioned before, the table 'journey' does already exist, and its repository methods execute without any problems elsewhere in the code. Can anybody point out what is causing this and what the solution could be?
I'm using Spring Batch to write a batch process and I'm having issues handling the exceptions.
I have a reader that fetches items in a specific state from a database. The reader passes each item to the processor step, which can throw MyException. When this exception is thrown, I want to skip the item that caused it and continue reading the next one.
The issue here is that I need to change the state of that item in the database so it's not fetched again by the reader.
This is what I tried:
return this.stepBuilderFactory.get("name")
.<Input, Output>chunk(1)
.reader(reader())
.processor(processor())
.faultTolerant()
.skipPolicy(skipPolicy())
.writer(writer())
.build();
In my SkipPolicy class I have the following code:
public boolean shouldSkip(Throwable throwable, int skipCount) throws SkipLimitExceededException {
    if (throwable instanceof MyException) {
        // log the issue
        // update the item that caused the exception in the database so the reader doesn't return it again
        return true;
    }
    return false;
}
With this code the exception is skipped and my reader is called again; however, the update made in the SkipPolicy was not committed (or was rolled back), so the reader fetches the item and tries to process it again.
I also tried with an ExceptionHandler:
return this.stepBuilderFactory.get("name")
.<Input, Output>chunk(1)
.reader(reader())
.processor(processor())
.faultTolerant()
.skip(MyException.class)
.exceptionHandler(myExceptionHandler())
.writer(writer())
.build();
In my ExceptionHandler class I have the following code:
public void handleException(RepeatContext context, Throwable throwable) throws Throwable {
    if (throwable.getCause() instanceof MyException) {
        // log the issue
        // update the item that caused the exception in the database so the reader doesn't return it again
    } else {
        throw throwable;
    }
}
With this solution the state is changed in the database, but the reader is not called again; instead, the processor's process method is invoked again for the same item, getting into an infinite loop.
I imagine I could use a listener in my step to handle the exceptions, but I don't like that solution because I would have to duplicate a lot of code, assuming this exception could be thrown in different steps/processors of my code.
What am I doing wrong?
EDIT: After a lot of tests, and after using different listeners like SkipListener, I couldn't achieve what I wanted: Spring Batch always rolls back my UPDATE.
Debugging this is what I found:
Once my listener is invoked and I update my item, the program enters the write method of the class FaultTolerantChunkProcessor (line #327).
This method runs the following code (copied from GitHub):
try {
    doWrite(outputs.getItems());
} catch (Exception e) {
    status = BatchMetrics.STATUS_FAILURE;
    if (rollbackClassifier.classify(e)) {
        throw e;
    }
    /*
     * If the exception is marked as no-rollback, we need to
     * override that, otherwise there's no way to write the
     * rest of the chunk or to honour the skip listener
     * contract.
     */
    throw new ForceRollbackForWriteSkipException(
            "Force rollback on skippable exception so that skipped item can be located.", e);
}
The doWrite method (line #151) inside the class SimpleChunkProcessor then tries to write the list of output items; however, in my case the list is empty, so line #159 (method writeItems) throws an IndexOutOfBoundsException, causing the ForceRollbackForWriteSkipException and the rollback I'm suffering.
If I override the class FaultTolerantChunkProcessor and avoid writing the items when the list is empty, then everything works as intended: the update is committed, and the program skips the error and calls the reader again.
I don't know if this is actually a bug or it's caused by something I'm doing wrong in my code.
A SkipListener is better suited to your use case than an ExceptionHandler, in my opinion, as it gives you access to the item that caused the exception. With the exception handler, you need to carry the item in the exception or the repeat context.
Moreover, the skip listener lets you know in which phase the exception happened (i.e. in read, process, or write), while with the exception handler you need to find a way to detect that yourself. If the skipping code is the same for all phases, you can call the same method that updates the item's status from all the methods of the listener.
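For example, here is a minimal sketch of such a listener, reusing the Input/Output types from the step above; ItemStatusDao is a hypothetical DAO (not from the original code) that performs the status UPDATE:
import org.springframework.batch.core.SkipListener;

public class MyExceptionSkipListener implements SkipListener<Input, Output> {

    private final ItemStatusDao itemStatusDao; // hypothetical DAO that runs the UPDATE

    public MyExceptionSkipListener(ItemStatusDao itemStatusDao) {
        this.itemStatusDao = itemStatusDao;
    }

    @Override
    public void onSkipInRead(Throwable t) {
        // no item is available when the read itself fails; just log
    }

    @Override
    public void onSkipInProcess(Input item, Throwable t) {
        // the exception in the question is thrown by the processor, so it lands here
        if (t instanceof MyException) {
            itemStatusDao.markAsFailed(item);
        }
    }

    @Override
    public void onSkipInWrite(Output item, Throwable t) {
        // same idea for the write phase, if it applies to your output type
    }
}
It would be registered on the step with .faultTolerant().skip(MyException.class).listener(new MyExceptionSkipListener(itemStatusDao)).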
Using spring batch, I am trying to start a job with some parameters but parameters from previous instance are used.
Spring is started using ApplicationContext context = SpringApplication.run(Application.class, args);
My job bean:
@Bean
public Job closingJob(JobCompletionNotificationListener listener,
                      Step step1,
                      Step step2,
                      Step step3,
                      JobParametersValidator validator) {
    return jobBuilderFactory.get("quarterly-closing")
            .incrementer(new RunIdIncrementer())
            .validator(validator)
            .listener(listener)
            .flow(step1)
            .next(step2)
            .next(step3)
            .end()
            .build();
}
In the logs:
2020-04-15 08:51:10,259 - INFO - [] {o.s.b.a.b.JobLauncherCommandLineRunner} --> Running default command line with: [run.id(long)=1, my.param=secondRun ]
2020-04-15 08:51:10,422 - INFO - [] {o.s.b.c.l.s.SimpleJobLauncher} --> Job: [FlowJob: [name=my-job]] launched with the following parameters: [{run.id=2, my.param=firstRun}]
I saw a similar question, but its only answer doesn't help me.
Edit: I tried it with a custom JobParametersIncrementer, but it doesn't work: it still uses the previous instance's parameters.
@Bean
public JobParametersIncrementer incrementer() {
    return parameters -> {
        if (parameters == null || parameters.isEmpty()) {
            return new JobParametersBuilder().addLong("run.id", 1L).toJobParameters();
        }
        long id = parameters.getLong("run.id", 1L) + 1;
        return new JobParametersBuilder().addLong("run.id", id).toJobParameters();
    };
}
From your log statements, it does look like the RunIdIncrementer is being hit.
2020-04-15 08:51:10,259 - INFO - [] {o.s.b.a.b.JobLauncherCommandLineRunner} --> Running default command line with: [run.id(long)=1, my.param=secondRun ]
2020-04-15 08:51:10,422 - INFO - [] {o.s.b.c.l.s.SimpleJobLauncher} --> Job: [FlowJob: [name=my-job]] launched with the following parameters: [{run.id=2, my.param=firstRun}]
JobLauncherCommandLineRunner says:
run.id(long)=1
SimpleJobLauncher says:
run.id=2
JobLauncherCommandLineRunner
The log statement for JobLauncherCommandLineRunner happens before any modifications to the JobParameters occur.
Relevant Code Snippet (Source):
public void run(String... args) throws JobExecutionException {
    logger.info("Running default command line with: " + Arrays.asList(args));
    launchJobFromProperties(StringUtils.splitArrayElementsIntoProperties(args, "="));
}
Assumption
I'm assuming you're trying to execute the job more than once given a static run.id=1 parameter, and you keep getting run.id=2 from your incrementer. If you want to increment the JobParameters to guarantee unique JobParameters, you have to try a different approach.
Look at the JobParametersIncrementer snippet below, which takes a set of JobParameters and adds a parameter random=(random long value):
@Override
public JobParameters getNext(JobParameters params) {
    Long random = (long) (Math.random() * 1000000);
    return new JobParametersBuilder(params).addLong("random", random).toJobParameters();
}
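For completeness, a sketch of wiring such an incrementer into the job definition from the question (assuming it is exposed as a bean named randomIncrementer, a name I've made up; only the .incrementer(...) line changes):
@Bean
public Job closingJob(JobCompletionNotificationListener listener,
                      Step step1,
                      Step step2,
                      Step step3,
                      JobParametersValidator validator,
                      JobParametersIncrementer randomIncrementer) {
    return jobBuilderFactory.get("quarterly-closing")
            .incrementer(randomIncrementer) // instead of new RunIdIncrementer()
            .validator(validator)
            .listener(listener)
            .flow(step1)
            .next(step2)
            .next(step3)
            .end()
            .build();
}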
Using spring batch, I am trying to start a job with some parameters but parameters from previous instance are used.
According to your job definition, you are using a JobParametersIncrementer. When you add an incrementer to your job definition, you are basically telling Spring Batch the following: whenever I run my job, increment the parameters of the previous instance using my incrementer and create a new job instance.
Using a job parameters incrementer makes sense when there is a logical sequence of job instances (i.e. it is possible to calculate the next job instance from the previous one).
So Spring Batch will take the parameters of the previous instance, pass them to your incrementer, and use those returned by the incrementer to create the "next" job instance in the sequence.
Hopefully this makes it clear why "parameters from previous instance are used".
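To illustrate what such a logical sequence looks like, here is a hypothetical date-based incrementer (not from the original answer) where each instance processes the day after the previous run:
import java.time.LocalDate;

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersIncrementer;

public class DailyRunIncrementer implements JobParametersIncrementer {

    @Override
    public JobParameters getNext(JobParameters parameters) {
        if (parameters == null || parameters.isEmpty()) {
            // first instance of the sequence
            return new JobParametersBuilder()
                    .addString("run.date", LocalDate.now().toString())
                    .toJobParameters();
        }
        // derive the next instance from the previous one
        LocalDate previous = LocalDate.parse(parameters.getString("run.date"));
        return new JobParametersBuilder()
                .addString("run.date", previous.plusDays(1).toString())
                .toJobParameters();
    }
}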
I want to use the Spring Batch (v3.0.9) restart functionality so that when a JobInstance is restarted, the process step reads from the last failed chunk onward. My restart works fine as long as I don't use the @StepScope annotation on my myBatisPagingItemReader bean method.
I was using @StepScope so that I could do late binding to get the JobParameters in my myBatisPagingItemReader bean method: @Value("#{jobParameters['run-date']}").
If I use the @StepScope annotation on the myBatisPagingItemReader() bean method, the restart does not work, as it creates a new instance (scope=step, name=scopedTarget.myBatisPagingItemReader).
If I use step scope, is it possible for my myBatisPagingItemReader to set read.count from the last failure to get the restart working?
I have explained this issue with an example below.
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Bean
    public Step step1(StepBuilderFactory stepBuilderFactory,
                      ItemReader<Model> myBatisPagingItemReader,
                      ItemProcessor<Model, Model> itemProcessor,
                      ItemWriter<Model> itemWriter) {
        return stepBuilderFactory.get("data-load")
                .<Model, Model>chunk(10)
                .reader(myBatisPagingItemReader)
                .processor(itemProcessor)
                .writer(itemWriter)
                .listener(itemReadListener())
                .listener(new JobParameterExecutionContextCopyListener())
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory, @Qualifier("step1") Step step1) {
        return jobBuilderFactory.get("load-job")
                .incrementer(new RunIdIncrementer())
                .start(step1)
                .listener(jobExecutionListener())
                .build();
    }

    @Bean
    @StepScope
    public ItemReader<Model> myBatisPagingItemReader(
            SqlSessionFactory sqlSessionFactory,
            @Value("#{jobParameters['run-date']}") String runDate) {
        MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
        Map<String, Object> parameterValues = new HashMap<>();
        parameterValues.put("runDate", runDate);
        reader.setSqlSessionFactory(sqlSessionFactory);
        reader.setParameterValues(parameterValues);
        reader.setQueryId("query");
        return reader;
    }
}
Restart example: when I use the @StepScope annotation on myBatisPagingItemReader(), the reader fetches 5 records and I have the chunk size (commit interval) set to 3.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
- writer - writes all 3 records
- chunk-1 commit successful
chunk-2:
- process record-4
- process record-5 - throws an exception
Job completes and is set to FAILED status.
Now the job is restarted using the same job parameter.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
- writer - writes all 3 records
- chunk-1 commit successful
chunk-2:
- process record-4
- process record-5 - throws an exception
Job completes and is set to FAILED status.
The @StepScope annotation on the myBatisPagingItemReader() bean method creates a new instance; see the log messages below.
Creating object in scope=step, name=scopedTarget.myBatisPagingItemReader
Registered destruction callback in scope=step, name=scopedTarget.myBatisPagingItemReader
As it is a new instance, it starts the process from the beginning instead of from chunk-2.
If I don't use @StepScope, it restarts from chunk-2, as the restarted job step sets MyBatisPagingItemReader.read.count=3.
The issue here is that you are returning an ItemReader instead of the fully qualified class (MyBatisPagingItemReader), or at least ItemStreamReader. When you use Spring Batch's step scope, we create a proxy to allow for late initialization, and the proxy is based on the return type of the method (ItemReader in your case). Because the proxy is of type ItemReader, Spring Batch does not know that your bean also implements ItemStream, and it is that interface that enables restartability. By default, Spring Batch automatically registers all beans of type ItemStream for you (you can also register the beans explicitly yourself, but that's typically not needed).
To address your issue, the following should work (note the change in the return type):
@Bean
@StepScope
public MyBatisPagingItemReader<Model> myBatisPagingItemReader(
        SqlSessionFactory sqlSessionFactory,
        @Value("#{jobParameters['run-date']}") String runDate) {
    MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("runDate", runDate);
    reader.setSqlSessionFactory(sqlSessionFactory);
    reader.setParameterValues(parameterValues);
    reader.setQueryId("query");
    return reader;
}
This is why my recommendation is that, where possible, when using @Bean-annotated methods, you should return the most concrete type possible to allow Spring to help as much as possible.
We read most of our data from a DB. Sometimes the result set is empty, and in that case we want the job to stop immediately and not hand over to a writer. We don't want to create a file if there is no input.
Currently we achieve this goal with a step listener that returns a certain String, which is the input for a transition to either the next business step or a delete step, which deletes the file we created before (the file contains no real data).
How can I make the job end as soon as the reader realizes that there is no input?
New edit (more elegant way)
This approach elegantly moves to the next step, or ends the batch application, when the file is not found, and prevents unwanted steps (and their listeners) from executing.
-> Check for the presence of the file in a tasklet, say FileValidatorTasklet.
-> When the file is not found, set some exit status (an enum or final String); here we have set EXIT_CODE.
Sample tasklet:
public class FileValidatorTasklet implements Tasklet {

    static final String EXIT_CODE = "SOME_EXIT_CODE";
    static final String EXIT_DESC = "SOME_EXIT_DESC";

    @Override
    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
        boolean isFileFound = false;
        // do the file check and set isFileFound
        if (!isFileFound) {
            stepContribution.setExitStatus(new ExitStatus(EXIT_CODE, EXIT_DESC));
        }
        return RepeatStatus.FINISHED;
    }
}
-> In the job configuration of this application, after executing FileValidatorTasklet, check for the presence of the EXIT_CODE.
-> Provide a separate path for the job if the code is found, else follow the normal flow. (Here we simply terminate the job if the EXIT_CODE is found; otherwise we continue with the next steps.)
Sample config:
@Bean
public Job myJob(JobBuilderFactory jobs) {
    return jobs.get("offersLoaderJob")
            .start(fileValidatorStep).on(EXIT_CODE).end() // if EXIT_CODE is found, end the job
            .from(fileValidatorStep) // else continue the job from here, after this step
            .next(step2)
            .next(finalStep)
            .end()
            .build();
}
Here we have taken advantage of conditional step flow in Spring Batch.
We have to define two separate paths from step A: the flow is either A->B->C or A->D->E.
Old answer:
I have been through this, and hence I am sharing my approach. It's better to
throw new RuntimeException("msg");
It terminates the Spring application gracefully rather than abruptly at that exact point: all the close() methods (in the reader/writer) are called, and the destroy methods of all the beans are called.
Note: while executing this in a listener, remember that by this point all the beans will have been initialized and their initialization code (like afterPropertiesSet()) will have executed.
I think the above is the correct way, but if you are willing to terminate at that exact point, you can try
System.exit(1);
It would likely be cleaner to use a JobExecutionDecider and, based on the read count from the StepExecution, set a new FlowExecutionStatus and route it to the end of the job.
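A minimal sketch of such a decider (the status names here are assumptions, not from the original answer):
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class EmptyInputDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // stepExecution belongs to the step that ran just before this decider
        if (stepExecution.getReadCount() == 0) {
            return new FlowExecutionStatus("EMPTY");
        }
        return new FlowExecutionStatus("CONTINUE");
    }
}
In the flow definition you would then route "EMPTY" to .end() and "CONTINUE" to the next business step, along the same lines as the conditional flows shown above.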
Joshua's answer addresses the stopping of the job instead of transitioning to the next business step.
Your file writer might still create the file unnecessarily. You can create something like a LazyItemWriter with a delegate (FlatFileItemWriter), and it will only call delegate.open() (once) on the first call to the write method. Of course, you have to make sure delegate.close() is called only if the delegate was previously opened. This ensures that no empty file is created, and deleting it is no longer a concern.
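A sketch of what such a wrapper could look like, assuming the List-based ItemWriter contract of Spring Batch 3/4 (the class name LazyItemWriter is the answer's; the rest is illustrative):
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamWriter;
import org.springframework.batch.item.file.FlatFileItemWriter;

public class LazyItemWriter<T> implements ItemStreamWriter<T> {

    private final FlatFileItemWriter<T> delegate;
    private ExecutionContext executionContext;
    private boolean opened = false;

    public LazyItemWriter(FlatFileItemWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // remember the context, but don't open the delegate (and create the file) yet
        this.executionContext = executionContext;
    }

    @Override
    public void write(List<? extends T> items) throws Exception {
        if (!opened) {
            delegate.open(executionContext);
            opened = true;
        }
        delegate.write(items);
    }

    @Override
    public void update(ExecutionContext executionContext) {
        if (opened) {
            delegate.update(executionContext);
        }
    }

    @Override
    public void close() {
        // only close the delegate if it was actually opened
        if (opened) {
            delegate.close();
        }
    }
}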
I have the same question as the OP. I am using all annotations, and if the reader returns null when no results (in my case, a file) are found, then the Job bean fails to initialize with an UnsatisfiedDependencyException, and that exception is thrown to stdout.
If I create a reader and then return it without a file specified, then the Job is created. After that an ItemStreamException is thrown, but it is thrown to my log, as I am past the Job autowiring and inside the Step at that point. That seems preferable, at least for what I am doing.
Any other solution would be appreciated.
NiksVij's answer works for me; I implemented it like this:
@Component
public class FileValidatorTasklet implements Tasklet {

    private final ImportProperties importProperties;

    @Autowired
    public FileValidatorTasklet(ImportProperties importProperties) {
        this.importProperties = importProperties;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        String folderPath = importProperties.getPathInput();
        String itemName = importProperties.getItemName();
        File currentItem = new File(folderPath + File.separator + itemName);
        if (currentItem.exists()) {
            contribution.setExitStatus(new ExitStatus("FILE_FOUND", "FILE_FOUND"));
        } else {
            contribution.setExitStatus(new ExitStatus("NO_FILE_FOUND", "NO_FILE_FOUND"));
        }
        return RepeatStatus.FINISHED;
    }
}
and in the Batch Configuration:
@Bean
public Step fileValidatorStep() {
    return this.stepBuilderFactory.get("step1")
            .tasklet(fileValidatorTasklet)
            .build();
}

@Bean
public Job tdZuHostJob() throws Exception {
    return jobBuilderFactory.get("tdZuHostJob")
            .incrementer(new RunIdIncrementer())
            .listener(jobCompletionNotificationListener)
            .start(fileValidatorStep()).on("NO_FILE_FOUND").end()
            .from(fileValidatorStep()).on("FILE_FOUND").to(testStep()).end()
            .build();
}