Job context in parallel execution [Spring Batch]

I'm using an asynchronous JobLauncher to start Spring Batch jobs. I tried to execute the same job twice in parallel, but the second started job seems to have had a side effect on the first one: the value of "step_output" written by the first job appears to have been overwritten by the second job.
Everything works well if I use a synchronous JobLauncher.
My question is: can I use the job ExecutionContext in the following way?
MyTaskletStep (implements StepExecutionListener):
public void beforeStep(StepExecution stepExecution) {
    JobExecution jobExecution = stepExecution.getJobExecution();
    ExecutionContext jobContext = jobExecution.getExecutionContext();
    ...
    jobContext.put("step_output", outList);
    ...
}
The started jobs look like this:
<batch:job id="TACKJob" restartable="true" incrementer="runIdIncrementer" parent="joblistenerjob">
<batch:step id="F1" next = "F2">
<batch:tasklet ref="N_COMMAND"/>
</batch:step>
<batch:step id="F2">
<batch:tasklet ref="Z_COMMAND"/>
</batch:step>
</batch:job>
The referenced tasklet beans are prototype-scoped; they implement StepExecutionListener as shown above.
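For reference, a minimal sketch (with hypothetical bean names jobLauncher and tackJob) of how the two parallel launches are started. With an asynchronous TaskExecutor on the launcher, run(...) returns immediately; each launch with distinct JobParameters creates a separate JobExecution, and each JobExecution has its own ExecutionContext:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class ParallelLaunchExample {

    // Launch the same job twice; distinct parameters yield distinct
    // JobInstances/JobExecutions, each with its own ExecutionContext.
    public void launchTwice(JobLauncher jobLauncher, Job tackJob) throws Exception {
        JobParameters params1 = new JobParametersBuilder().addLong("run.id", 1L).toJobParameters();
        JobParameters params2 = new JobParametersBuilder().addLong("run.id", 2L).toJobParameters();
        jobLauncher.run(tackJob, params1); // returns immediately with an async TaskExecutor
        jobLauncher.run(tackJob, params2); // second execution runs concurrently
    }
}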
Do you have any advice for this problem?
Thanks

Related

Spring Batch annotation based Job added as a step to XML based job

We have a Spring Batch project which is XML based.
We need to create a new job and add it as a nested job to the previous XML-based job.
Is it possible to create the new job annotation based and add it as a step to the existing XML-based job?
I have created a TaskletStep, tried adding it to the XML-based job as a step, and am getting:
Cannot convert value of type 'org.springframework.batch.core.step.tasklet.TaskletStep' to required type 'org.springframework.batch.core.step.tasklet.Tasklet' for property 'tasklet': no matching editors or conversion strategy found
A tasklet is not the appropriate type for delegating a step's processing to a job; you should use a JobStep instead.
The main job can be defined in XML and refer to the "delegate" job (which could be a bean defined in XML or Java config). Here is an example:
<batch:job id="mainJob">
<batch:step id="step">
<batch:job ref="subjob">
</batch:job>
</batch:step>
</batch:job>
In this example, subjob could be a Spring Batch job defined in XML or Java config.
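A minimal sketch of what the Java-config side could look like, assuming a trivial tasklet step (the step name and tasklet body are illustrative): the delegate job is defined as a bean named "subjob" so that the <batch:job ref="subjob"/> element above can resolve it.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SubJobConfig {

    // The bean name "subjob" matches the ref attribute used in the XML above.
    @Bean
    public Job subjob(JobBuilderFactory jobs, StepBuilderFactory steps) {
        Step step = steps.get("subjobStep")
                .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED)
                .build();
        return jobs.get("subjob").start(step).build();
    }
}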

Exception that terminate batch but without rollback

I need to insert a record into a table and then have the batch terminate with an exception without causing the insert to be rolled back.
I have used no-rollback-exception-classes (shown below), but it does not terminate the batch.
You can use a fault-tolerant step and set the exceptions that should not cause a rollback using FaultTolerantStepBuilder#noRollback.
Then you can use a listener (ItemProcessListener or ItemWriteListener, depending on where the exception is thrown) to intercept the exception and terminate the step (and its surrounding job) with StepExecution#setTerminateOnly.
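A minimal sketch of such a listener, assuming the exception is thrown in the processor (in your case the check would be for com.exception.myException); the listener would then be registered on the step via <batch:listeners>:

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.ItemProcessListener;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class TerminateOnExceptionListener
        implements StepExecutionListener, ItemProcessListener<Object, Object> {

    private StepExecution stepExecution;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // Remember the running StepExecution so it can be flagged later.
        this.stepExecution = stepExecution;
    }

    @Override
    public void onProcessError(Object item, Exception e) {
        // Flag the step (and its surrounding job) for termination. The insert
        // is not rolled back for exceptions listed in no-rollback-exception-classes.
        stepExecution.setTerminateOnly();
    }

    @Override
    public void beforeProcess(Object item) {
    }

    @Override
    public void afterProcess(Object item, Object result) {
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return null; // no change to the exit status
    }
}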
I have used this:
<batch:step id="id">
<batch:tasklet>
<batch:chunk reader="reader" processor="processor" writer="writer" commit-interval="1">
</batch:chunk>
<batch:no-rollback-exception-classes>
<batch:include class="com.exception.myException"></batch:include>
</batch:no-rollback-exception-classes>
</batch:tasklet>
</batch:step>
It only allows me to avoid the rollback; it does not terminate the batch. I want to terminate the batch after myException is thrown.
myException is thrown in the processor.

Spring Batch: Duplicate rows after job re-run

Our Spring Batch application is, upon restart of a failed job, processing the same records again, resulting in duplicate rows, and we want to understand how to avoid this.
The Spring Integration poller which starts the batch job is configured to run every couple of hours. When it runs a second time, the job parameters will be the same, but if the previous run failed (for example, because of a DataTruncation exception), Spring Batch will not complain that the job has already completed.
At the point of failure, several hundred thousand records will already have been processed and copied from the source table to the destination table. When the job is run a subsequent time, the same rows will be copied to the destination table, resulting in duplicates. Therefore, it appears that the job is not being resumed, but restarted from the beginning.
The Spring Batch database is Derby (file based); it is set up when the application starts, and it appears state is not maintained between restarts of the actual application (because a job can be run again with the same parameters). However, within one application run, state is maintained. For instance, if the job completes successfully, the next time the poller runs an exception will be thrown because a job (with those parameters) has already completed.
Our job definition is as follows:
<batch:job id="publisherJob" >
<batch:step id="step1">
<batch:tasklet >
<batch:chunk reader="itemReader" processor="itemProcessor"
writer="itemWriter" commit-interval="${...}" />
</batch:tasklet>
<batch:listeners>
...
</batch:listeners>
</batch:job>
<bean id="itemReader" class="org.springframework.batch.item.database.JdbcCursorItemReader">
<property name="dataSource" ref="dataSource" />
<property name="sql" value="select ${...} from ${...} where ${...}" />
<property name="rowMapper" ref="rowMapper" />
</bean>
The SQL statement includes an ORDER BY clause (after the WHERE clause).
Our understanding was that Spring Batch would retain the state at which processing failed and proceed from that point (if the error in the source table has been fixed), thereby preventing duplicate rows. What has to be configured for this to happen?
Thanks
Spring Batch maintains state in the sense that it remembers how many records were processed, not specifically which ones. Because of that, it's up to you to guarantee that the order of the items is reproducible from run to run, so that if run 1 processes 100 records and fails, the first 100 records skipped in run 2 are the right 100 records to skip. You didn't provide the configuration for your JdbcCursorItemReader, but my assumption is that you are not using an ORDER BY in your SQL. If you want restartability, you need some way to guarantee the order of the items; an ORDER BY in your SQL is the easiest way to accomplish this (there are others, like the process-indicator pattern, if that's needed).
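A minimal sketch of a restartable reader in Java config (table and column names are hypothetical): the ORDER BY on a unique key makes the row order reproducible, so the row count that Spring Batch saves in the step ExecutionContext identifies exactly the rows to skip on restart.

import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.ColumnMapRowMapper;

public class ReaderConfig {

    @Bean
    public JdbcCursorItemReader<Map<String, Object>> itemReader(DataSource dataSource) {
        JdbcCursorItemReader<Map<String, Object>> reader = new JdbcCursorItemReader<>();
        reader.setName("itemReader"); // key prefix for ExecutionContext entries
        reader.setDataSource(dataSource);
        // Hypothetical statement; the ORDER BY on a unique key is what matters.
        reader.setSql("select id, payload from source_table order by id");
        reader.setRowMapper(new ColumnMapRowMapper());
        reader.setSaveState(true); // default: row count is stored in the ExecutionContext
        return reader;
    }
}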

Spring Batch Javaconfig - parameterize commit-interval aka chunksize

With Spring Batch XML-based configuration you can parameterize the commit-interval / chunk size like this:
<job id="basicSimpleJob"
xmlns="http://www.springframework.org/schema/batch">
<step id="basicSimpleStep" >
<tasklet>
<chunk
reader="reader"
processor="processor"
writer="writer"
commit-interval="#{jobParameters['commit.interval']}">
</chunk>
</tasklet>
</step>
</job>
With Java-config-based configuration it could look like this:
@Bean
public Step step(
        ItemStreamReader<Map<String, Object>> reader,
        ItemWriter<Map<String, Object>> writer,
        @Value("#{jobParameters['commit.interval']}") Integer commitInterval
) throws Exception {
    return steps
            .get("basicSimpleStep")
            .<Map<String, Object>, Map<String, Object>>chunk(commitInterval)
            .reader(reader)
            .processor(new FilterItemProcessor())
            .writer(writer)
            .build();
}
But it does not work; I get either
Caused by: org.springframework.expression.spel.SpelEvaluationException:
EL1008E:(pos 0): Property or field 'jobParameters' cannot be found on object of type
'org.springframework.beans.factory.config.BeanExpressionContext' - maybe not public?
or, while using @StepScope for the step bean:
Caused by: java.lang.IllegalStateException: No context holder available for step scope
I know I have a working step scope; other step-scoped beans work (defined inside the same class as the step).
Right now I use a CompletionPolicy, which does work with step scope, but I would like to know if someone got it to work the "normal" way, or if it's time for a JIRA ticket
... which has now been created at https://jira.spring.io/browse/BATCH-2263
Adding the @JobScope annotation to the Step definition works in Spring Batch 3:
@Bean
@JobScope
public Step step(
        ItemStreamReader<Map<String, Object>> reader,
        ItemWriter<Map<String, Object>> writer,
        @Value("#{jobParameters['commit.interval']}") Integer commitInterval
)
This initializes the step bean at job execution time, so late binding of jobParameters works in this case.
I have little experience with Java config, and this may be an issue only for the commit-interval during late binding in Java configuration (in the Spring Batch ChunkElementParser.java source there are a few lines that check whether commit-interval starts with a # and, if so, inject a step-scoped SimpleCompletionPolicy); you can try injecting a StepExecutionSimpleCompletionPolicy and check if that solution works.
Also, I have never tried late binding of commit-interval with XML config, but there is an open ticket titled "Commit Interval not working as intended when used in Late Binding".
As a last chance, if you are using version 3.0, you can also annotate the step with @JobScope and check if that solution works.
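A minimal sketch of the StepExecutionSimpleCompletionPolicy suggestion (bean and builder names assumed): the policy reads the chunk size from the "commit.interval" job parameter in beforeStep, so it is registered both as the chunk completion policy and as a step listener.

import java.util.Map;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.resource.StepExecutionSimpleCompletionPolicy;
import org.springframework.batch.item.ItemStreamReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

public class StepConfig {

    @Bean
    public Step basicSimpleStep(StepBuilderFactory steps,
                                ItemStreamReader<Map<String, Object>> reader,
                                ItemWriter<Map<String, Object>> writer) {
        StepExecutionSimpleCompletionPolicy policy = new StepExecutionSimpleCompletionPolicy();
        policy.setKeyName("commit.interval"); // the default key name, shown explicitly
        return steps.get("basicSimpleStep")
                .<Map<String, Object>, Map<String, Object>>chunk(policy)
                .reader(reader)
                .writer(writer)
                .listener(policy) // beforeStep pulls the chunk size from jobParameters
                .build();
    }
}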

Spring batch Writer, Processor called twice on skip?

I have a very basic question regarding skip. I am using the spring-batch-simple-cli project provided by the Spring samples and trying to understand the skip behaviour. It has a very basic example reader that reads from an array of strings (I have modified it to read from a list of 10 strings, "Helloworld 1" through "Helloworld 10") and a basic writer that logs to the console. The writer throws java.lang.Exception on each write.
I have added a skip limit of 4 to the job configuration. Once it reaches "Helloworld 5", the job stops as expected.
But whenever the writer throws an exception, the writer is called back immediately with the same item. Why is the writer called twice? I was expecting the item to simply be skipped. Is there something I am missing?
<job id="job1" xmlns="http://www.springframework.org/schema/batch" incrementer="jobParametersIncrementer">
<step id="step1" parent="simpleStep">
<tasklet>
<chunk reader="reader" writer="writer" skip-limit="4" >
<skippable-exception-classes>
<include class="java.lang.Exception" />
</skippable-exception-classes>
</chunk>
</tasklet>
</step>
</job>
This is most likely caused by the default behaviour: when a skippable exception is thrown during the write, Spring Batch rolls back the chunk and then re-processes each chunk item (in this case there is only one item) one at a time to isolate the failing item, which is why the writer sees the same item again before it is skipped.
https://stackoverflow.com/a/6730807/1627688
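To see when an item is finally skipped (as opposed to re-written during the scan), a SkipListener can be registered in the step's <listeners> element; a minimal sketch, assuming the string items from the question:

import org.springframework.batch.core.SkipListener;

public class LoggingSkipListener implements SkipListener<String, String> {

    @Override
    public void onSkipInRead(Throwable t) {
        System.out.println("skipped during read: " + t.getMessage());
    }

    @Override
    public void onSkipInProcess(String item, Throwable t) {
        System.out.println("skipped during process: " + item);
    }

    @Override
    public void onSkipInWrite(String item, Throwable t) {
        // Fires once per skipped item, after the one-by-one re-scan has
        // isolated the failing item; the earlier writer call-backs are the scan.
        System.out.println("skipped during write: " + item);
    }
}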