I'm using Spring Batch with partitioning.
Initially I was getting a DeadlockLoserDataAccessException, so I configured the step as fault tolerant; please see the following code:
Step masterCalculationStep = stepBuilderFactory.get("STEP_1")
        .<Map<Long, List<CostCalculation>>, List<TempCostCalc>>chunk(1)
        .reader(reader)
        .processor(processor)
        .writer(writer)
        .faultTolerant()
        .retryLimit(5)
        .retry(DeadlockLoserDataAccessException.class)
        .build();
But now we are getting another exception:
org.springframework.batch.core.step.skip.NonSkippableReadException:
Non-skippable exception during read
I don't understand why this new exception occurs or how to resolve it.
The RetryPolicy in a chunk-oriented step is not applied to the reader. So if your reader might throw a transient exception, you need to add the retry logic around the reader yourself. This can be done, for example, with:
AOP, by applying org.springframework.retry.interceptor.RetryOperationsInterceptor to your reader
or by using org.springframework.retry.support.RetryTemplate in a decorator of your reader that retries the read method when it throws a transient exception (see the sketch after this list)
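For the second option, here is a minimal sketch of a RetryTemplate-based decorator (the class name RetryingItemReader and the retry settings are only illustrative, not part of any framework API):

import java.util.Collections;
import org.springframework.batch.item.ItemReader;
import org.springframework.dao.DeadlockLoserDataAccessException;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.RetryContext;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

// Hypothetical decorator that retries the delegate's read() on transient deadlock errors
public class RetryingItemReader<T> implements ItemReader<T> {

    private final ItemReader<T> delegate;
    private final RetryTemplate retryTemplate = new RetryTemplate();

    public RetryingItemReader(ItemReader<T> delegate) {
        this.delegate = delegate;
        // retry up to 5 times, only for DeadlockLoserDataAccessException
        this.retryTemplate.setRetryPolicy(new SimpleRetryPolicy(5,
                Collections.singletonMap(DeadlockLoserDataAccessException.class, true)));
    }

    @Override
    public T read() throws Exception {
        // run the delegate's read inside a retry callback
        return retryTemplate.execute(new RetryCallback<T, Exception>() {
            @Override
            public T doWithRetry(RetryContext context) throws Exception {
                return delegate.read();
            }
        });
    }
}

You would then register this decorator as the reader of the step instead of the original reader.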
Similar questions can be found here:
Spring batch retry mechanism for reader failure
Retry in case of a failed reading
Hope this helps.
I am trying to use Spring Batch to handle a bunch of prechecks, validations and actions together.
If a precheck or validation fails, then I want to re-run all the prechecks and validations regardless of whether they previously succeeded or failed; but if an action fails, I want to restart from the failed action only.
I want to group the prechecks and validations in a step such as a FlowStep, but it seems that Spring retry does not support this.
Does anybody know a solution? Thanks in advance!
You can group two steps in a FlowStep as follows:
@Bean
public Flow preCheckAndValidationFlow() {
    return new FlowBuilder<SimpleFlow>("flow")
            .start(preCheckStep())
            .next(validationStep())
            .build();
}

@Bean
public Step preCheckAndValidationStep(JobRepository jobRepository) {
    FlowStep flowStep = new FlowStep();
    flowStep.setFlow(preCheckAndValidationFlow());
    flowStep.setJobRepository(jobRepository);
    return flowStep;
}
If you want the preCheckStep to re-run (even if it was successful the first time) when the validation step fails, then you need to set the allowStartIfComplete flag on the preCheckStep.
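A minimal sketch with the Java DSL, assuming the precheck step is a simple tasklet step (the step name and tasklet body are placeholders):

@Bean
public Step preCheckStep() {
    return stepBuilderFactory.get("preCheckStep")
            .tasklet((contribution, chunkContext) -> {
                // hypothetical precheck logic goes here
                return RepeatStatus.FINISHED;
            })
            // re-run this step on a restart even if it completed successfully before
            .allowStartIfComplete(true)
            .build();
}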
That said, IMO it would be much simpler for you to create a single step that contains both the precheck and validation logic.
The commit interval commits the data at specified intervals. I want to commit the entire file in a single shot, since my requirement is to validate the file (line by line) and, if it fails at any point, roll back with no commit. Is there any way to achieve this in Spring Batch?
You can either set your commit-interval to Integer.MAX_VALUE (2^31 - 1) or create your own CompletionPolicy.
Here's how you configure a step to use a custom CompletionPolicy:
<chunk reader="reader" writer="writer" chunk-completion-policy="completionPolicy"/>
<bean id="completionPolicy" class="xx.xx.xx.CompletionPolicy"/>
Then you either choose an out-of-the-box CompletionPolicy provided by Spring Batch (a list of implementations is available at the previous link) or create your own.
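As a sketch with the Java DSL (assuming an injected stepBuilderFactory and placeholder reader(), processor() and writer() beans), the out-of-the-box DefaultResultCompletionPolicy keeps the chunk open until the reader returns null, so the whole file is processed and committed in a single transaction:

@Bean
public Step validateWholeFileStep() {
    return stepBuilderFactory.get("validateWholeFileStep")
            // the chunk is only complete when the reader returns null,
            // i.e. the entire file goes into one chunk and one commit
            .<String, String>chunk(new DefaultResultCompletionPolicy())
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}

Be aware that this keeps every read item in memory until the single commit, which matters for large files.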
What do you mean by "commit"?
You are talking about validating, not about writing the read data to another file or into a database.
As mentioned in the comment by Michael Prarlow, memory problems could arise if the size of the file grows.
In order to prevent this, I would suggest starting your job with a validation step. Simply read the data chunk-wise, check the data line by line in your processor, and throw a non-skippable exception if a line is not valid. Use a pass-through writer, so nothing is persisted. If there is a problem, the whole job will fail.
If you really have to write the data into a DB or another file, you could do this in a second step. Since you have validated your data, you shouldn't observe any problems.
Simple PassThroughItemWriter
import java.util.List;
import org.springframework.batch.item.ItemWriter;

public class PassThroughItemWriter<T> implements ItemWriter<T> {
    @Override
    public void write(List<? extends T> items) {
        // do nothing, the items are intentionally discarded
    }
}
Or, if you use the Java API to build your job and steps, you could simply use a lambda:
stepBuilders.get("step")
.<..., ...>chunk(..)
.reader(...)
.processor(...) // your processor with the validation logic
.writer(items -> {}) // empty lambda expression
.build();
I use Spring Batch 3.0.3.RELEASE in Grails 2.4.4.
I get the following exception when I execute the code below:
"When ##GLOBAL.ENFORCE_GTID_CONSISTENCY = 1, updates to non-transactional tables can only be done in either autocommitted statements or single-statement transactions, and never in the same statement as updates to transactional tables."
The code is:
List<Flow> flowList = Lists.newArrayList()
Shop.findAllByCityIdAndTypeAndStatus(cityId, 1 as byte, 1 as byte).each { Shop stationShop ->
    TaskletStep taskletStep = stepBuilderFactory.get("copy_city_item_to_station").tasklet(new Tasklet() {
        @Override
        RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
            copyCityItemToStationItem(item, stationShop)
            return RepeatStatus.FINISHED
        }
    }).build()
    Flow flow = new FlowBuilder<Flow>("subflow").from(taskletStep).end();
    flowList.add(flow)
}
Flow splitFlow = new FlowBuilder<Flow>("split_city_item_to_station").split(eventTaskExecutor).add(flowList.toArray(new Flow[0])).build();
FlowJobBuilder builder = jobBuilderFactory.get("push_item_to_all_station").start(splitFlow).end();
Job job = builder.preventRestart().build()
jobLauncher.run(job, new JobParametersBuilder().addLong("city.item.id", item.id).toJobParameters())
Google suggests the problem may be related to "https://dev.mysql.com/doc/refman/5.6/en/replication-gtids-restrictions.html", so I replaced all the ENGINE declarations from MyISAM to InnoDB in the file 'schema-mysql.sql', and it works.
Now I want to know whether what I did is right or not. Is there a potential bug in my approach?
What you did is correct. That's a bug in Spring Batch's generated SQL file for MySql. I've created an issue in Jira that you can follow here: https://jira.spring.io/browse/BATCH-2373.
I have a job that is built of the following components:
Processing step - activates an external process that logs its result in the DB and returns an internal id, so I can take this id and process it further.
Logging step - built of a tasklet that contains a chunk.
The chunk is built of an item reader that I plan to have use #{stepExecutionContext['job.id']} as part of the SQL written in the XML file, so it will read the relevant logging info.
I'm trying to work with the solution suggested here - 11.8 Passing Data to Future Steps - but I run into a problem when I try to add a property to the step execution context or the job execution context:
chunkContext.getStepContext().getStepExecutionContext().put("job.id", jobId);
And I get this error:
java.lang.UnsupportedOperationException: null
at java.util.Collections$UnmodifiableMap.put(Collections.java:1342)
at ...
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:395)
Did I miss something?
The StepContext available from the ChunkContext is a read-only object; use a StepExecutionListener instead and write to the step execution context available from the StepExecution passed to StepExecutionListener.afterStep(StepExecution stepExecution).
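A minimal sketch of that approach, using the "job.id" key from the question (the listener class name is just an example):

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class JobIdPromotionListener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to do before the step
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // the ExecutionContext obtained from the StepExecution is writable
        stepExecution.getExecutionContext().put("job.id", stepExecution.getJobExecutionId());
        return stepExecution.getExitStatus();
    }
}

Following the pattern from 11.8, the key can then be promoted to the job execution context with an ExecutionContextPromotionListener configured with that key, and read in a later step via #{jobExecutionContext['job.id']}.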
I always follow this path:
chunkContext
.getStepContext()
.getStepExecution()
.getJobExecution()
.getExecutionContext()
.put("", "");
I also had problems doing the put on the StepExecution; maybe it loses data when it goes to the next step.
However, I leave that last point to those with more expertise.
chunkContext.getStepContext().getStepExecutionContext() returns a copy; use chunkContext.getStepContext().getStepExecution().getExecutionContext() instead.
I need to execute seven distinct processes sequentially (one after the other). The data is stored in MySQL. I am thinking of the following options; please correct me if I am wrong, or if there is a better solution.
Requirements:
Read the data from the DB, run the seven processes (data validation, calculation1, calculation2, etc.) and finally write the processed data into the DB.
Need to process the data in chunks.
My solution and issues:
Data read:
Read the data using JdbcCursorItemReader, because this is the best-performing DB reader. But the SQL is very complex, so I may have to consider a custom ItemReader using JdbcTemplate, which gives me more flexibility in handling the data.
Process:
Define seven steps with chunks and share the data between the steps using a data bean. But this won't be a good idea, because the data is processed in chunks and after each chunk the step 1 writer will create a new set of data in the data bean. When this data bean is shared across the other steps, data integrity will be an issue.
Use the StepExecutionContext to share the data between steps. But this may affect performance, as it involves the batch job repository.
Define only one step, with one ItemReader and a chain of processors (the seven processes), and create one ItemWriter that writes the processed data into the DB. But then I won't be able to administer or monitor each process separately; all will be in one step.
The org.springframework.batch.item.support.CompositeItemProcessor is an out-of-the-box component from the Spring Batch framework that would support your requirement, akin to your second option. This would allow you to do the following:
- keep separation in your design/solution for reading from the database (ItemReader)
- keep each individual processor's concerns and configuration separate
- allow any individual processor to 'shut down' the item by returning null (the item is filtered out of the chunk), irrespective of previous processors
The CompositeItemProcessor iterates over a list of delegates, so it's 'similar' to an action pattern. It's quite useful in the scenario you've described and still allows you to leverage the chunk benefits (exception handling, retry, commit policy, etc.).
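A minimal sketch of wiring a CompositeItemProcessor (InputData, OutputData and the delegate processor beans are placeholders for your seven processes):

@Bean
public CompositeItemProcessor<InputData, OutputData> compositeProcessor() {
    // delegates run in order; each one receives the output of the previous one
    List<ItemProcessor<?, ?>> delegates = Arrays.asList(
            validationProcessor(),
            calculation1Processor(),
            calculation2Processor()
            // ... the remaining processors
    );
    CompositeItemProcessor<InputData, OutputData> composite = new CompositeItemProcessor<>();
    composite.setDelegates(delegates);
    return composite;
}

The composite is then registered as the processor of the single chunk-oriented step.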
Suggestions:
1) Read the data using JdbcCursorItemReader.
All out-of-the-box components are a good choice because they already implement the ItemStream interface, which makes your steps restartable. But as you mention, sometimes the query is just too complex or, like me, you already have a service or DAO that you can reuse.
I would suggest you use the ItemReaderAdapter. It lets you configure a delegate service to call to get your data.
<bean id="MyReader" class="xxx.adapters.MyItemReaderAdapter">
<property name="targetObject" ref="AnExistingDao" />
<property name="targetMethod" value="next" />
</bean>
Note that the targetMethod must respect the read contract of ItemReaders (return null when there is no more data).
If your job does not need to be restartable, you could simply use the class org.springframework.batch.item.adapter.ItemReaderAdapter.
But if you need your job to be restartable, you can create your own ItemReaderAdapter like this:
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.adapter.AbstractMethodInvokingDelegator;

public class MyItemReaderAdapter<T> extends AbstractMethodInvokingDelegator<T> implements ItemReader<T>, ItemStream {

    private static final Log log = LogFactory.getLog(MyItemReaderAdapter.class);
    private static final String CONTEXT_COUNT_KEY = "count";

    private long currentCount = 0;

    /**
     * @return return value of the target method.
     */
    @Override
    public T read() throws Exception {
        // pass the current position to the delegate method
        super.setArguments(new Long[] { currentCount++ });
        return invokeDelegateMethod();
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // restore the position saved in the execution context on a restart
        currentCount = executionContext.getLong(CONTEXT_COUNT_KEY, 0);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // save the current position so the step can be restarted
        executionContext.putLong(CONTEXT_COUNT_KEY, currentCount);
        log.info("Update stream current count: " + currentCount);
    }

    @Override
    public void close() throws ItemStreamException {
        // nothing to close
    }
}
Because the out-of-the-box ItemReaderAdapter is not restartable, you just create your own that implements ItemStream.
2) Regarding the 7 steps vs 1 step.
I would go with 1 step with a composite processor on this one. The 7-step options will only bring problems, IMO.
1) 7 steps with a data bean: your writers commit into a data bean until step 7, then the step 7 writer tries to commit to the real database and boom, an error! All is lost and the batch must restart from step 1.
2) 7 steps with the execution context: could be better, since you will have the state saved in the Spring Batch metadata. BUT it is not good practice to store big data in the Spring Batch metadata!
3) is the way to go IMO. ;-)