How do I prevent rollback when an exception occurs in ItemWriter? - spring-batch

Our writer is designed to write records to a relational database.
If an exception occurs on any record, Spring Batch rolls back the chunk and then retries the write operation for each record in the chunk individually. This causes an SQL duplicate key exception, because the records processed earlier in the chunk had already been written to the database successfully.
We have tried using noRetry() and noRollback(), explicitly listing the exceptions that should not trigger a retry or rollback.
According to the Spring Batch online documentation, noRollback() can be used to prevent a rollback when an error occurs in the ItemWriter: https://docs.spring.io/spring-batch/4.1.x/reference/html/step.html#controllingRollback
However, this contradicts the Javadoc in the source code, which says that FaultTolerantStepBuilder.noRollback() is ignored during write:
https://docs.spring.io/spring-batch/4.1.x/api/index.html?org/springframework/batch/core/step/builder/FaultTolerantStepBuilder.html
Here is a sample of our Job definition:
@Bean("my-job")
public Job job(Step step) {
return jobBuilderFactory.get("my-job")
.start(step)
.build();
}
@Bean
public Step step() {
return stepBuilderFactory.get("skip-step")
.<String, String>chunk(3)
.reader(reader())
.processor(myprocessor())
.writer(this::write)
.faultTolerant()
.skipLimit(1)
.skip(JobSkippableException.class)
.noRollback(JobSkippableException.class)
.noRetry(JobSkippableException.class)
.processorNonTransactional()
.build();
}
public ItemReader<String> reader() {
return new ItemReader<String> () {
@Override
public String read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
String s = randomUUID().toString();
logger.debug("READ STRING {}", s);
return s;
}
};
}
public void write(List<? extends String> items) {
for(String s : items) {
logger.debug("WRITE STRING {}", s);
throw new JobSkippableException("My skippable exception");
}
}
public ItemProcessor <String, String> myprocessor() {
return new ItemProcessor<String, String>() {
@Override
public String process(String item) throws Exception {
logger.debug("PROCESS STRING {}", item);
return item;
}
};
}
Our expected behavior is that exceptions that occur in write do not trigger a retry or rollback. This would prevent repeated calls to the database and hence avoid the SQL duplicate key exception.

Not a solution, but at least an explanation of why the framework does not behave as you expect. I found it in lines 335-350 of FaultTolerantChunkProcessor:
try {
doWrite(outputs.getItems());
}
catch (Exception e) {
if (rollbackClassifier.classify(e)) {
throw e;
}
/*
* If the exception is marked as no-rollback, we need to
* override that, otherwise there's no way to write the
* rest of the chunk or to honour the skip listener
* contract.
*/
throw new ForceRollbackForWriteSkipException(
"Force rollback on skippable exception so that skipped item can be located.", e);
}

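To make the comment in that snippet more concrete, here is a minimal plain-Java simulation of what I understand the fault-tolerant step to do after a skippable write failure: roll the whole chunk back, then write the items again one at a time, each in its own transaction, so the failing item can be located and skipped. This is not Spring Batch code; ChunkScanSketch, FakeTable and writeChunk are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java simulation of chunk "scanning" after a write failure; not Spring Batch code. */
class ChunkScanSketch {

    /** Hypothetical stand-in for a transactional table: inserts are staged until commit. */
    static class FakeTable {
        final List<String> committed = new ArrayList<>();
        private final List<String> staged = new ArrayList<>();

        void insert(String row) {
            if ("bad".equals(row)) {
                throw new IllegalStateException("cannot write " + row);
            }
            staged.add(row);
        }

        void commit() {
            committed.addAll(staged);
            staged.clear();
        }

        void rollback() {
            staged.clear();
        }
    }

    /** Writes a chunk; on failure rolls back and rewrites item by item. Returns the skipped items. */
    static List<String> writeChunk(FakeTable table, List<String> chunk) {
        List<String> skipped = new ArrayList<>();
        try {
            for (String item : chunk) {
                table.insert(item);
            }
            table.commit();
            return skipped;
        } catch (IllegalStateException e) {
            table.rollback(); // discard the partial writes of the failed chunk
        }
        // Scan: one item per "transaction" so the failing item can be identified.
        for (String item : chunk) {
            try {
                table.insert(item);
                table.commit();
            } catch (IllegalStateException e) {
                table.rollback();
                skipped.add(item);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        List<String> skipped = writeChunk(table, Arrays.asList("a", "bad", "c"));
        System.out.println("committed=" + table.committed + ", skipped=" + skipped);
    }
}
```

In the simulation, "a" ends up written only once because the first attempt was rolled back. If the real writer's inserts are not part of the step's transaction (an auto-committing connection, for example), the rollback cannot undo them, and the second pass would plausibly run into duplicate keys.
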
Related

@Transactional with handling error and db-inserts in catch block (Spring Boot)

I would like to roll back a transaction for the data in case of errors and, at the same time, write the error to the db.
I can't manage to do this with @Transactional annotations.
The following code produces a runtime error (1/0) and still writes the data into the db. It also writes the data into the error table.
I tried several variations and followed similar questions on StackOverflow, but I didn't succeed.
Does anyone have a hint on how to do this?
@Service
public class MyService {
    @Transactional(rollbackFor = Exception.class)
    public void updateData() {
        try {
            processAndPersist(); // <- db operation with inserts
            int i = 1 / 0; // <- Runtime error
        } catch (Exception e) {
            persistError(e.getMessage());
            trackReportError(filename, e.getMessage());
        }
    }
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void persistError(String message) {
        persistError2Db(message); // <- db operation with insert
    }
}
To roll back the transaction, you need to throw an exception out of the updateData() method. At the same time, the persistError() transaction must not be rolled back.
@Transactional(rollbackFor = Exception.class)
public void updateData() {
    try {
        processAndPersist(); // <- db operation with inserts
        int i = 1 / 0; // <- Runtime error
    } catch (Exception e) {
        persistError(e.getMessage());
        trackReportError(filename, e.getMessage());
        throw e; // rethrowing here alone will not work: the error record is rolled back too
    }
}
Just throwing the exception will not help, because persistError() will run in the same transaction as updateData(). That happens because persistError() is called through the this reference, not through a reference to the Spring proxy, so its @Transactional annotation is ignored.
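The effect of calling through this can be reproduced without Spring. The sketch below uses a plain JDK dynamic proxy in place of Spring's transactional proxy; SelfInvocationSketch, Service, ServiceImpl and proxyFor are hypothetical names, and the recording handler only marks the spot where a transaction interceptor would act.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch: calls made through "this" never reach a proxy's interceptor. */
class SelfInvocationSketch {

    interface Service {
        void updateData();

        void persistError();
    }

    static class ServiceImpl implements Service {
        @Override
        public void updateData() {
            persistError(); // plain call on "this": the proxy is not involved
        }

        @Override
        public void persistError() {
            // nothing to do in the sketch
        }
    }

    /** Wraps the target in a JDK proxy that records the name of every intercepted method. */
    static Service proxyFor(Service target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName()); // where a transaction interceptor would act
            return method.invoke(target, args);
        };
        return (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(), new Class<?>[] {Service.class}, handler);
    }

    public static void main(String[] args) {
        List<String> intercepted = new ArrayList<>();
        Service service = proxyFor(new ServiceImpl(), intercepted);
        service.updateData();
        System.out.println(intercepted); // only the outer call is listed
    }
}
```

Only updateData shows up in the list: the nested persistError() call goes straight to the target object, which is why annotations on it have no effect when it is invoked from inside the same class.
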
Options to solve
Use a self reference.
Use self injection (see "Spring self injection for transactions").
Move the call of persistError() outside updateData() (and its transaction). Remove @Transactional from persistError() (it will not work) and use the transaction of the Repository in persistError2Db().
Move persistError() to a separate service. It will be called through a proxy in this case.
Don't use declarative transactions (with the @Transactional annotation). Use programmatic transaction management to set transaction boundaries manually: https://docs.spring.io/spring-framework/docs/3.0.0.M3/reference/html/ch11s06.html
Also keep in mind that persistError() can fail too (and with high probability it will).
Using self reference
You can pass a self reference to MyService to get a transaction, because you will then call a method of the Spring proxy rather than a method of MyServiceImpl directly.
@Service
public class MyServiceImpl implements MyService {
public void doWork(MyService self) {
DataEntity data = loadData();
try {
self.updateData(data);
} catch (Exception ex) {
log.error("Error for dataId={}", data.getId(), ex);
self.persistError("Error");
trackReportError(filename, ex);
}
}
@Transactional
public void updateData(DataEntity data) {
persist(data); // <- db operation with inserts
}
@Transactional
public void persistError(String message) {
try {
persistError2Db(message); // <- db operation with insert
} catch (Exception ex) {
log.error("Error for message={}", message, ex);
}
}
}
public interface MyService {
void doWork(MyService self);
void updateData(DataEntity data);
void persistError(String message);
}
To use
MyService service = ...;
service.doWork(service);

How to commit the offsets when using KafkaItemReader in spring batch job, once all the messages are processed and written to the .dat file?

I have developed a Spring Batch job which reads from a Kafka topic using the KafkaItemReader class. I want to commit the offsets only when the messages read in the defined chunk have been processed and written successfully to an output .dat file.
@Bean
public Job kafkaEventReformatjob(
@Qualifier("MaintStep") Step MainStep,
@Qualifier("moveFileToFolder") Step moveFileToFolder,
#Qualifier("compressFile") Step compressFile,
JobExecutionListener listener)
{
return jobBuilderFactory.get("kafkaEventReformatJob")
.listener(listener)
.incrementer(new RunIdIncrementer())
.flow(MainStep)
.next(moveFileToFolder)
.next(compressFile)
.end()
.build();
}
@Bean
Step MainStep(
ItemProcessor<IncomingRecord, List<Record>> flatFileItemProcessor,
ItemWriter<List<Record>> flatFileWriter)
{
return stepBuilderFactory.get("mainStep")
.<InputRecord, List<Record>> chunk(5000)
.reader(kafkaItemReader())
.processor(flatFileItemProcessor)
.writer(writer())
.listener(basicStepListener)
.build();
}
// Reader reads all the messages from the Kafka topic and returns them as IncomingRecord objects.
@Bean
KafkaItemReader<String, IncomingRecord> kafkaItemReader() {
Properties props = new Properties();
props.putAll(this.properties.buildConsumerProperties());
List<Integer> partitions = new ArrayList<>();
partitions.add(0);
partitions.add(1);
return new KafkaItemReaderBuilder<String, IncomingRecord>()
.partitions(partitions)
.consumerProperties(props)
.name("records")
.saveState(true)
.topic(topic)
.pollTimeout(Duration.ofSeconds(40L))
.build();
}
@Bean
public ItemWriter<List<Record>> writer() {
ListUnpackingItemWriter<Record> listUnpackingItemWriter = new ListUnpackingItemWriter<>();
listUnpackingItemWriter.setDelegate(flatWriter());
return listUnpackingItemWriter;
}
public ItemWriter<Record> flatWriter() {
FlatFileItemWriter<Record> fileWriter = new FlatFileItemWriter<>();
String tempFileName = "abc";
LOGGER.info("Output File name " + tempFileName + " is in working directory ");
String workingDir = service.getWorkingDir().toAbsolutePath().toString();
Path outputFile = Paths.get(workingDir, tempFileName);
fileWriter.setName("fileWriter");
fileWriter.setResource(new FileSystemResource(outputFile.toString()));
fileWriter.setLineAggregator(lineAggregator());
fileWriter.setForceSync(true);
fileWriter.setFooterCallback(customFooterCallback());
fileWriter.close();
LOGGER.info("Successfully created the file writer");
return fileWriter;
}
@StepScope
@Bean
public TransformProcessor processor() {
return new TransformProcessor();
}
==============================================================================
Writer Class
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
this.stepExecution = stepExecution;
}
@AfterStep
public void afterStep(StepExecution stepExecution) {
this.stepExecution.setWriteCount(count);
}
@Override
public void write(final List<? extends List<Record>> lists) throws Exception {
List<Record> consolidatedList = new ArrayList<>();
for (List<Record> list : lists) {
if (list != null && !list.isEmpty()) // check for null before dereferencing
consolidatedList.addAll(list);
}
delegate.write(consolidatedList);
count += consolidatedList.size(); // to count Trailer record count
}
===============================================================
Item Processor
@Override
public List<Record> process(IncomingRecord record) {
    List<Record> recordList = new ArrayList<>();
    if (null != record.getEventName() /* and a few other conditions inside this section */) {
        // Set the values of the Record class by extracting them from the IncomingRecord,
        // then add the valid records that match the condition to recordList.
    } else {
        return null; // returning null filters the item out
    }
    return recordList;
}
Synchronizing a read operation and a write operation between two transactional resources (a queue and a database, for instance) is possible by using a JTA transaction manager that coordinates both transaction managers (2PC protocol).
However, this approach is not possible if one of the resources is not transactional (like the majority of file systems). So unless you use a transactional file system and a JTA transaction manager that coordinates a Kafka transaction manager and a file system transaction manager, you need another approach, like the Compensating Transaction pattern. In your case, the "undo" operation (compensating action) would be rewinding the offset to where it was before the failed chunk.
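As a rough illustration of that compensating action, here is a plain-Java sketch that remembers the read position before each chunk and seeks back to it when the write fails. It does not use the Kafka client or KafkaItemReader; OffsetRewindSketch, FakeConsumer, ChunkWriter and processChunk are hypothetical names standing in for a single-partition consumer and a non-transactional file writer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of a compensating action: rewind the read offset when the write fails. */
class OffsetRewindSketch {

    /** Hypothetical stand-in for a single-partition consumer with a movable position. */
    static class FakeConsumer {
        private final List<String> messages;
        private int position = 0;

        FakeConsumer(List<String> messages) {
            this.messages = messages;
        }

        /** Returns up to max messages starting at the current position and advances it. */
        List<String> poll(int max) {
            int end = Math.min(position + max, messages.size());
            List<String> batch = new ArrayList<>(messages.subList(position, end));
            position = end;
            return batch;
        }

        int position() {
            return position;
        }

        void seek(int offset) {
            position = offset;
        }
    }

    /** Hypothetical non-transactional sink, such as a flat file writer. */
    interface ChunkWriter {
        void write(List<String> chunk) throws IOException;
    }

    /** Reads one chunk and writes it; rewinds and returns false if the write fails. */
    static boolean processChunk(FakeConsumer consumer, ChunkWriter writer, int chunkSize) {
        int offsetBeforeChunk = consumer.position();
        List<String> chunk = consumer.poll(chunkSize);
        try {
            writer.write(chunk);
            return true;
        } catch (IOException e) {
            consumer.seek(offsetBeforeChunk); // the "undo": this chunk will be read again
            return false;
        }
    }

    public static void main(String[] args) {
        FakeConsumer consumer = new FakeConsumer(Arrays.asList("m0", "m1", "m2", "m3"));
        boolean first = processChunk(consumer, chunk -> { throw new IOException("disk full"); }, 2);
        System.out.println("written=" + first + ", position=" + consumer.position());
        boolean second = processChunk(consumer, chunk -> { }, 2);
        System.out.println("written=" + second + ", position=" + consumer.position());
    }
}
```

The sketch only compensates on the read side. It does not undo a partially written file, so a real job would also need to truncate or regenerate the output for the failed chunk.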

Save on JDBC connections by using JdbcCursorItemReader or JdbcPagingItemReader

In a Spring Batch project, I used JdbcCursorItemReader to read data and process it in parallel. I can run the batch locally without any problem.
I also heard that JdbcPagingItemReader is recommended over JdbcCursorItemReader for parallel processing, as the cursor reader holds its connection for too long, while the paging reader can release the connection once the page size is reached.
I then switched to JdbcPagingItemReader in step2 but, to my surprise, I got the exception below when running locally.
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 -
Connection is not available, request timed out after 300001ms.
However, it seems the above exception occurs in step1, before the paging reader in step2 is executed, and the reader switch is the only change I made. Please shed some light on why the exception is thrown and whether it is good practice to use a paging reader instead of a cursor reader in parallel processing. Your help is much appreciated!
The code snippet is pasted below:
@Bean
@StepScope
public Flow createParallelSubFlow() {
List<Flow> subFlowList = new ArrayList<>();
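List<Stream> streamList;
try {
streamList = dataSourceConfig.streamMapper().
getStreamListByStatus(Constants.PENDING_STATUS_CD);
} catch (Exception e) {
// Fail fast instead of swallowing the exception; otherwise streamList is left unassigned.
throw new IllegalStateException("Failed to load pending streams", e);
}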
streamList.forEach(stream -> {
long id = stream.getStreamId();
String flowName = "stream" + id + "_flow";
Flow subFlow = new FlowBuilder<Flow>(flowName)
.start(step1(id))
.next(step2(id))
.end();
subFlowList.add(subFlow);
});
return new FlowBuilder<Flow>("splitFlow").split(new SimpleAsyncTaskExecutor())
.add(subFlowList.toArray(new Flow[0])).build();
}
public Step step1(long id) {
return stepBuilderFactory.get("step1")
.<Domain, Domain>chunk(100)
.reader(reader1(id))
.writer(writer1())
.build();
}
//@StepScope
//@Bean
public Step step2(long id) {
return stepBuilderFactory.get("step2")
.<Domain, Domain>chunk(100)
.reader(cursorReader2(id))
.processor(processor2)
.writer(writer2())
.build();
}
public JdbcCursorItemReader<Domain> cursorReader2(Long id) {
return new JdbcCursorItemReaderBuilder<Domain>()
.dataSource(dataSourceConfig.dataSource())
.name("cursorReader")
.sql(Constants.QUERY_SQL)
.preparedStatementSetter(new PreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps) throws SQLException {
ps.setLong(1, id);
}})
.rowMapper(new RowMapper())
.build();
}
//Switch from cursorReader2 to pagingReader2 in step2
public JdbcPagingItemReader<Domain> pagingReader2(Long id) {
return new JdbcPagingItemReaderBuilder<Domain>()
.dataSource(dataSourceConfig.dataSource())
.name("pagingReader")
.queryProvider(queryProvider())
.parameterValues(parameterValues(id))
.rowMapper(new RowMapper())
.pageSize(100)
.build();
}
@Bean
public PagingQueryProvider queryProvider() {
SqlPagingQueryProviderFactoryBean providerFactory = new SqlPagingQueryProviderFactoryBean();
Map<String, Order> sortKeys = new HashMap<>(2);
sortKeys.put("ID", Order.ASCENDING);
providerFactory.setDataSource(dataSourceConfig.dataSource());
providerFactory.setSelectClause("SELECT Clause");
providerFactory.setFromClause("FROM Clause");
providerFactory.setWhereClause("WHERE Clause");
providerFactory.setSortKeys(sortKeys);
PagingQueryProvider pagingQueryProvider = null;
try {
pagingQueryProvider = providerFactory.getObject();
} catch (Exception e) {
logger.error("Failed to get PagingQueryProvider", e);
throw new RuntimeException("Failed to get PagingQueryProvider", e);
}
return pagingQueryProvider;
}
private Map<String, Object> parameterValues(Long id) {
Map<String, Object> parameterValues = new HashMap<>();
parameterValues.put("1", id);
return parameterValues;
}

Spring Batch 4.0 ~ Code under JdbcCursorItemReader method doesn't run with @StepScope defined

I have a sql query defined in my batch job that needs to get input at runtime from the user.
I have the following item reader in my batch job defined as follows
@StepScope
@Bean
public JdbcCursorItemReader<QueryCount> queryCountItemReader() throws Exception {
ListPreparedStatementSetter preparedStatementSetter = new ListPreparedStatementSetter() {
@Override
public void setValues(PreparedStatement pstmt) throws SQLException {
pstmt.setString(1, "#{jobparameters[fromDate]}");
pstmt.setString(2, "#{jobparameters[toDate]}");
pstmt.setString(3, "#{jobparameters[fromDate]}");
pstmt.setString(4, "#{jobparameters[toDate]}");
pstmt.setString(5, "#{jobparameters[fromDate]}");
pstmt.setString(6, "#{jobparameters[toDate]}");
pstmt.setString(7, "#{jobparameters[eventType]}");
pstmt.setString(8, "#{jobparameters[businessUnit]}");
pstmt.setString(9, "#{jobparameters[deviceCategory]}");
pstmt.setString(10, "#{jobparameters[numberOfSearchIds]}");
}
};
JdbcCursorItemReader<QueryCount> queryCountJdbcCursorItemReader = new JdbcCursorItemReader<>();
queryCountJdbcCursorItemReader.setDataSource(dataSource);
queryCountJdbcCursorItemReader.setSql(sqlQuery);
queryCountJdbcCursorItemReader.setRowMapper(new QueryCountMapper());
queryCountJdbcCursorItemReader.setPreparedStatementSetter(preparedStatementSetter);
int counter = 0;
ExecutionContext executionContext = new ExecutionContext();
queryCountJdbcCursorItemReader.open(executionContext);
try {
QueryCount queryCount;
while ((queryCount = queryCountJdbcCursorItemReader.read()) != null) {
System.out.println(queryCount.toString());
counter++;
}
}catch (Exception e){
e.printStackTrace();
}finally {
queryCountJdbcCursorItemReader.close();
}
return queryCountJdbcCursorItemReader;
}
I am sending in the job parameters from my application class as follows
JobParameters jobParameters = new JobParametersBuilder()
.addString("fromDate", "20180410")
.addString("toDate", "20180410")
.addString("eventType", "WEB")
.addString("businessUnit", "UPT")
.addString("numberOfSearchIds", "10")
.toJobParameters();
JobExecution execution = jobLauncher.run(job, jobParameters);
The issue is that when I run my batch job, the code inside the queryCountItemReader() method is never executed and the job completes with no errors. Essentially, the SQL query I am trying to run never executes. If I remove the @StepScope annotation, the code runs but fails with an error, since it is unable to bind the parameters sent in from the application class to the SQL query. I realize that @StepScope is necessary to use job parameters, but why doesn't the code in my method execute?
Solved this by adding the @EnableBatchProcessing and @EnableAutoConfiguration annotations and changing the item reader method definition as follows:
@StepScope
@Bean
public JdbcCursorItemReader<QueryCount> queryCountItemReader(@Value("#{jobParameters['fromDate']}") String fromDate,
        @Value("#{jobParameters['toDate']}") String toDate,
        @Value("#{jobParameters['eventType']}") String eventType,
        @Value("#{jobParameters['businessUnit']}") String businessUnit,
        @Value("#{jobParameters['deviceCategory']}") String deviceCategory,
        @Value("#{jobParameters['numberOfSearchIds']}") String numberOfSearchIds) throws Exception {

How to exclude job parameter from uniqueness in Spring Batch?

I am trying to launch a job in Spring Batch 2, and I need to pass some information in the job parameters, but I do not want it to count for the uniqueness of the job instance. For example, I'd want these two sets of parameters to be considered unique:
file=/my/file/path,session=1234
file=/my/file/path,session=5678
The idea is that there will be two different servers trying to start the same job, but with different sessions attached to them. I need that session number in both cases. Any ideas?
Thanks!
So, if 'file' is the only attribute that's supposed to be unique and 'session' is used by downstream code, then your problem matches almost exactly what I had. I had a JMSCorrelationId that I needed to store in the execution context for later use, and I didn't want it to play into the job parameters' uniqueness. Per Dave Syer, this really wasn't possible, so I took the route of creating the job with the parameters (without the 'session' in your case), and then adding the 'session' attribute to the execution context before anything actually runs.
This gave me access to 'session' downstream but it was not in the job parameters so it didn't affect uniqueness.
References
https://jira.springsource.org/browse/BATCH-1412
http://forum.springsource.org/showthread.php?104440-Non-Identity-Job-Parameters&highlight=
You'll see from this forum thread that there's no good way to do it (per Dave Syer), but I wrote my own launcher based on the SimpleJobLauncher (in fact, I delegate to the SimpleJobLauncher if the non-overloaded method is called). It has an overloaded method for starting a job that takes a callback interface, which allows contributing parameters to the execution context without them being 'true' job parameters. You could do something very similar.
I think the applicable LOC for you is right here:
jobExecution = jobRepository.createJobExecution(job.getName(),
jobParameters);
if (contributor != null) {
if (contributor.contributeTo(jobExecution.getExecutionContext())) {
jobRepository.updateExecutionContext(jobExecution);
}
}
which is where, right after the job execution (and with it the execution context) is created, the extra values are added to the execution context. Hopefully this helps you in your implementation.
public class ControlMJobLauncher implements JobLauncher, InitializingBean {
private JobRepository jobRepository;
private TaskExecutor taskExecutor;
private SimpleJobLauncher simpleLauncher;
private JobFilter jobFilter;
public void setJobRepository(JobRepository jobRepository) {
this.jobRepository = jobRepository;
}
public void setTaskExecutor(TaskExecutor taskExecutor) {
this.taskExecutor = taskExecutor;
}
/**
* Optional filter to prevent job launching based on some specific criteria.
* Jobs that are filtered out will return success to ControlM, but will not run
*/
public void setJobFilter(JobFilter jobFilter) {
this.jobFilter = jobFilter;
}
public JobExecution run(final Job job, final JobParameters jobParameters, ExecutionContextContributor contributor)
throws JobExecutionAlreadyRunningException, JobRestartException,
JobInstanceAlreadyCompleteException, JobParametersInvalidException, JobFilteredException {
Assert.notNull(job, "The Job must not be null.");
Assert.notNull(jobParameters, "The JobParameters must not be null.");
//See if job is filtered
if(this.jobFilter != null && !jobFilter.launchJob(job, jobParameters)) {
throw new JobFilteredException(String.format("Job has been filtered by the filter: %s", jobFilter.getFilterName()));
}
final JobExecution jobExecution;
JobExecution lastExecution = jobRepository.getLastJobExecution(job.getName(), jobParameters);
if (lastExecution != null) {
if (!job.isRestartable()) {
throw new JobRestartException("JobInstance already exists and is not restartable");
}
logger.info(String.format("Restarting job %s instance %d", job.getName(), lastExecution.getId()));
}
// Check the validity of the parameters before doing creating anything
// in the repository...
job.getJobParametersValidator().validate(jobParameters);
/*
* There is a very small probability that a non-restartable job can be
* restarted, but only if another process or thread manages to launch
* <i>and</i> fail a job execution for this instance between the last
* assertion and the next method returning successfully.
*/
jobExecution = jobRepository.createJobExecution(job.getName(),
jobParameters);
if (contributor != null) {
if (contributor.contributeTo(jobExecution.getExecutionContext())) {
jobRepository.updateExecutionContext(jobExecution);
}
}
try {
taskExecutor.execute(new Runnable() {
public void run() {
try {
logger.info("Job: [" + job
+ "] launched with the following parameters: ["
+ jobParameters + "]");
job.execute(jobExecution);
logger.info("Job: ["
+ job
+ "] completed with the following parameters: ["
+ jobParameters
+ "] and the following status: ["
+ jobExecution.getStatus() + "]");
} catch (Throwable t) {
logger.warn(
"Job: ["
+ job
+ "] failed unexpectedly and fatally with the following parameters: ["
+ jobParameters + "]", t);
rethrow(t);
}
}
private void rethrow(Throwable t) {
if (t instanceof RuntimeException) {
throw (RuntimeException) t;
} else if (t instanceof Error) {
throw (Error) t;
}
throw new IllegalStateException(t);
}
});
} catch (TaskRejectedException e) {
jobExecution.upgradeStatus(BatchStatus.FAILED);
if (jobExecution.getExitStatus().equals(ExitStatus.UNKNOWN)) {
jobExecution.setExitStatus(ExitStatus.FAILED
.addExitDescription(e));
}
jobRepository.update(jobExecution);
}
return jobExecution;
}
static interface ExecutionContextContributor {
boolean CONTRIBUTED_SOMETHING = true;
boolean CONTRIBUTED_NOTHING = false;
/**
*
* @param executionContext
* @return true if the execution context was contributed to
*/
public boolean contributeTo(ExecutionContext executionContext);
}
@Override
public void afterPropertiesSet() throws Exception {
Assert.state(jobRepository != null, "A JobRepository has not been set.");
if (taskExecutor == null) {
logger.info("No TaskExecutor has been set, defaulting to synchronous executor.");
taskExecutor = new SyncTaskExecutor();
}
this.simpleLauncher = new SimpleJobLauncher();
this.simpleLauncher.setJobRepository(jobRepository);
this.simpleLauncher.setTaskExecutor(taskExecutor);
this.simpleLauncher.afterPropertiesSet();
}
@Override
public JobExecution run(Job job, JobParameters jobParameters)
throws JobExecutionAlreadyRunningException, JobRestartException,
JobInstanceAlreadyCompleteException, JobParametersInvalidException {
return simpleLauncher.run(job, jobParameters);
}
}
Starting from Spring Batch 2.2.x, there is support for non-identifying parameters. If you are using CommandLineJobRunner, you can specify non-identifying parameters with a '-' prefix.
For example:
java org.springframework.batch.core.launch.support.CommandLineJobRunner file=/my/file/path -session=5678
If you are using an old version of Spring Batch, you need to migrate your database schema. See the 'Migrating to 2.x.x' section at http://docs.spring.io/spring-batch/getting-started.html.
This is the Jira page of the feature: https://jira.springsource.org/browse/BATCH-1412, and here is the change that implements it: https://fisheye.springsource.org/changelog/spring-batch?cs=557515df45c0f596588418d53c3f2bae3781c1c3
In more recent versions of Spring Batch (I am using spring-batch-core:4.3.3), you can use the JobParametersBuilder to specify whether a parameter is identifying or not. For example:
new JobParametersBuilder()
.addString("identifying-param-name", paramValue1)
.addString("non-identifying-param-name", paramValue2, false)
.toJobParameters();
The 'false' in the third argument makes the parameter non-identifying.
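To show what "non-identifying" means for the original question, here is a small plain-Java sketch in which only identifying parameters contribute to the key that distinguishes one job instance from another. It is a simplification and not Spring Batch's actual key generation; JobIdentitySketch, Param, identityKey and launchParams are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch: only identifying parameters take part in job instance identity. */
class JobIdentitySketch {

    /** Hypothetical stand-in for a job parameter with an identifying flag. */
    static class Param {
        final String value;
        final boolean identifying;

        Param(String value, boolean identifying) {
            this.value = value;
            this.identifying = identifying;
        }
    }

    /** Builds an identity key from the identifying parameters only, sorted by name. */
    static String identityKey(Map<String, Param> params) {
        StringBuilder key = new StringBuilder();
        for (Map.Entry<String, Param> entry : new TreeMap<>(params).entrySet()) {
            if (entry.getValue().identifying) {
                key.append(entry.getKey()).append('=').append(entry.getValue().value).append(';');
            }
        }
        return key.toString();
    }

    /** Builds the parameter set from the question: identifying file, non-identifying session. */
    static Map<String, Param> launchParams(String file, String session) {
        Map<String, Param> params = new LinkedHashMap<>();
        params.put("file", new Param(file, true));
        params.put("session", new Param(session, false));
        return params;
    }

    public static void main(String[] args) {
        String first = identityKey(launchParams("/my/file/path", "1234"));
        String second = identityKey(launchParams("/my/file/path", "5678"));
        System.out.println(first.equals(second)); // both launches map to the same identity
    }
}
```

With 'session' marked as non-identifying, both launches from the question produce the same key, while the session value itself is still carried in the parameters for downstream code.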