Spring batch Writer, Processor called twice on skip? - spring-batch

I have a very basic question about skip behaviour. I am using the spring-batch-simple-cli project from the Spring samples and trying to understand how skipping works. The project has a basic example reader that reads from an array of strings (I have modified it to read from a list of 10 strings, Hellowworld 1 to Hellowworld 10) and a basic writer that logs to the console. The writer throws java.lang.Exception on each write.
I have added a skip limit of 4 to the job configuration. Once it reaches Hellowworld 5, the job stops as expected.
But whenever the writer throws an exception, the writer is called again immediately with the same item. Why is the writer called twice? I expected the item to simply be skipped. Is there something I am missing?
<job id="job1" xmlns="http://www.springframework.org/schema/batch" incrementer="jobParametersIncrementer">
<step id="step1" parent="simpleStep">
<tasklet>
<chunk reader="reader" writer="writer" skip-limit="4" >
<skippable-exception-classes>
<include class="java.lang.Exception" />
</skippable-exception-classes>
</chunk>
</tasklet>
</step>
</job>

This is most likely caused by the default behaviour: when the writer throws a skippable exception, Spring Batch rolls back the chunk and then retries each item of the chunk individually (in this case the chunk contains only one item) to find the one that fails.
https://stackoverflow.com/a/6730807/1627688
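
As a rough illustration of the configuration involved, the sketch below is the same step with a processor added. The `processor` bean is an assumption made for the example (the step in the question has none). Setting `processor-transactional="false"` asks Spring Batch to cache processor results across a rollback, so the processor should not be invoked again for the same item; the writer is still called again item by item, because that is how the failing item is located.

```xml
<job id="job1" xmlns="http://www.springframework.org/schema/batch" incrementer="jobParametersIncrementer">
    <step id="step1" parent="simpleStep">
        <tasklet>
            <!-- "processor" is a hypothetical bean, not part of the original configuration -->
            <chunk reader="reader" processor="processor" writer="writer"
                   skip-limit="4" processor-transactional="false">
                <skippable-exception-classes>
                    <include class="java.lang.Exception" />
                </skippable-exception-classes>
            </chunk>
        </tasklet>
    </step>
</job>
```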

Related

Spring Batch annotation based Job added as a step to XML based job

We have a Spring Batch project which is XML based.
We need to create a new job and add it as a nested job to an existing XML-based job.
Is it possible to create the new job with annotation-based configuration and add it as a step to the existing XML-based job?
I have created a tasklet step and tried adding it to the XML-based job as a step, and I am getting:
Cannot convert value of type 'org.springframework.batch.core.step.tasklet.TaskletStep' to required type 'org.springframework.batch.core.step.tasklet.Tasklet' for property 'tasklet': no matching editors or conversion strategy found
A tasklet is not the appropriate type for delegating step processing to a job; you should use a JobStep instead.
The main job can be defined in XML and refer to the "delegate" job (which could be a bean defined in XML or Java config). Here is an example:
<batch:job id="mainJob">
<batch:step id="step">
<batch:job ref="subjob">
</batch:job>
</batch:step>
</batch:job>
In this example, subjob could be a Spring Batch job defined in XML or Java config.
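
For the Java config case, one way to make the annotation-based job visible to the XML context is to declare its configuration class as a bean. The following is a minimal sketch under stated assumptions: `com.example.SubJobConfiguration` is a hypothetical `@Configuration` class exposing a `Job` bean named `subjob`, and `jobLauncher` and `jobRepository` beans are assumed to exist elsewhere in the context. The `job-launcher` attribute is optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

    <!-- Enables processing of @Configuration/@Bean on classes declared as beans -->
    <context:annotation-config/>

    <!-- Hypothetical @Configuration class that defines a Job bean named "subjob" -->
    <bean class="com.example.SubJobConfiguration"/>

    <batch:job id="mainJob">
        <batch:step id="step">
            <!-- Delegates this step to the Java-config job via a JobStep -->
            <batch:job ref="subjob" job-launcher="jobLauncher"/>
        </batch:step>
    </batch:job>
</beans>
```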

Exception that terminate batch but without rollback

I need to insert a record into a table and then have the batch terminate with an exception without causing the insert to be rolled back.
I have used but it does not terminate the batch
You can use a fault tolerant step and set the exceptions that should not cause a rollback using FaultTolerantStepBuilder#noRollback.
Then you can use a listener (ItemProcessListener or ItemWriteListener depending on where the exception is thrown) to intercept the exception and terminate the step (and its surrounding job) with StepExecution#setTerminateOnly.
I have used this:
<batch:step id="id">
<batch:tasklet>
<batch:chunk reader="reader" processor="processor" writer="writer" commit-interval="1">
</batch:chunk>
<batch:no-rollback-exception-classes>
<batch:include class="com.exception.myException"></batch:include>
</batch:no-rollback-exception-classes>
</batch:tasklet>
</batch:step>
This only avoids the rollback; it does not terminate the batch. I want to terminate the batch after myException is thrown.
myException is thrown in the processor.
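
Combining the no-rollback configuration with the listener suggested in the answer could look roughly like the sketch below. The listener class `com.example.TerminateOnProcessErrorListener` is hypothetical: it is assumed to implement ItemProcessListener (since the exception is thrown in the processor) and StepExecutionListener, to keep the StepExecution it receives in beforeStep, and to call StepExecution#setTerminateOnly from onProcessError.

```xml
<!-- Hypothetical listener bean; see the assumptions described above -->
<bean id="terminateOnErrorListener" class="com.example.TerminateOnProcessErrorListener"/>

<batch:step id="id">
    <batch:tasklet>
        <batch:chunk reader="reader" processor="processor" writer="writer" commit-interval="1"/>
        <!-- myException does not roll back the transaction, so the insert is kept -->
        <batch:no-rollback-exception-classes>
            <batch:include class="com.exception.myException"/>
        </batch:no-rollback-exception-classes>
        <!-- The listener flags the step execution for termination when the processor fails -->
        <batch:listeners>
            <batch:listener ref="terminateOnErrorListener"/>
        </batch:listeners>
    </batch:tasklet>
</batch:step>
```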

JobExecution null in spring batch

I am running jobs in parallel. My JobExecution is always null when I use JobRepositoryFactoryBean. I need to use it, because otherwise I cannot use the metadata tables: I want to restart a job that did not complete because of a failure, so I need the previous record, which I will fetch from the metadata tables. If I use MapJobRepositoryFactoryBean, the JobExecution is not null, but then nothing is inserted into the metadata tables.
I referred to this question:
My job is always null. Can't inject a batch job with Spring Batch. Why?
But it is not working for me.
My configuration is:
<bean id="batchScheduler" class="com.abc.BatchScheduler">
<property name="jobLauncher" ref="jobLauncher" />
<property name="jobtwo" ref="JobTwo" />
</bean>
I searched a lot. Please help me out. I am not able to proceed.
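
The question does not show how the repository and launcher are declared. For reference only, a persistent setup is typically declared along the lines of the sketch below; the `dataSource` and `transactionManager` bean names are assumptions, and this is not a diagnosis of the null JobExecution.

```xml
<!-- Persistent job repository backed by the metadata tables (bean names are assumed) -->
<bean id="jobRepository"
      class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
    <property name="dataSource" ref="dataSource"/>
    <property name="transactionManager" ref="transactionManager"/>
</bean>

<!-- Launcher wired to that repository; this is the bean injected into batchScheduler -->
<bean id="jobLauncher"
      class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository"/>
</bean>
```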

Spring Batch Integration, Email to be sent out in case of JobInstanceAlreadyCompleteException

I would like to put a hook somewhere in the following code/config to be able to spot a JobInstanceAlreadyCompleteException and then email the production support team that this occurred.
I have tried a JobExecutionListener#beforeJob() method in Spring Batch, but the JobInstanceAlreadyCompleteException is occurring before job execution.
I am using this Spring Batch Integration configuration from the documentation:
<int:channel id="inboundFileChannel"/>
<int:channel id="outboundJobRequestChannel"/>
<int:channel id="jobLaunchReplyChannel"/>
<int-file:inbound-channel-adapter id="filePoller"
channel="inboundFileChannel"
directory="file:/tmp/myfiles/"
filename-pattern="*.csv">
<int:poller fixed-rate="1000"/>
</int-file:inbound-channel-adapter>
<int:transformer input-channel="inboundFileChannel"
output-channel="outboundJobRequestChannel">
<bean class="io.spring.sbi.FileMessageToJobRequest">
<property name="job" ref="personJob"/>
<property name="fileParameterName" value="input.file.name"/>
</bean>
</int:transformer>
I want to handle JobInstanceAlreadyCompleteException in case the same CSV file name appears as the job parameter. Do I extend org.springframework.integration.handler.LoggingHandler?
I notice that class is reporting the error:
ERROR org.springframework.integration.handler.LoggingHandler - org.springframework.messaging.MessageHandlingException: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={input.file.name=C:\Users\csv\file2015.csv}. If you want to run this job again, change the parameters.
That ERROR from org.springframework.integration.handler.LoggingHandler comes from the default errorChannel, which is reached from the <poller> on your <int-file:inbound-channel-adapter>.
So, to handle it yourself, you just need to specify your own error-channel there and go ahead with sending the email:
<int-file:inbound-channel-adapter>
<int:poller fixed-rate="1000" error-channel="sendErrorToEmailChannel"/>
</int-file:inbound-channel-adapter>
<int-mail:outbound-channel-adapter id="sendErrorToEmailChannel"/>
Of course, you will have to do some ErrorMessage transformation before sending it over e-mail, but those are details of the target business logic implementation.
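
One possible shape for that transformation is sketched below, used in place of the bare outbound adapter. It assumes the `int`, `int-mail` namespaces are declared, that a `mailSender` bean exists, and that the JobInstanceAlreadyCompleteException is the direct cause of the ErrorMessage payload (as the logged stack suggests); `mailOutChannel` and the address are made up for the example. Note that the filter silently drops other errors unless a discard channel is configured.

```xml
<int:channel id="sendErrorToEmailChannel"/>
<int:channel id="mailOutChannel"/>

<int:chain input-channel="sendErrorToEmailChannel" output-channel="mailOutChannel">
    <!-- Only pass on failures caused by an already-complete job instance -->
    <int:filter expression="payload.cause instanceof T(org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException)"/>
    <!-- Use the exception message as the mail body -->
    <int:transformer expression="payload.cause.message"/>
    <!-- Hypothetical recipient and subject -->
    <int-mail:header-enricher>
        <int-mail:to value="support@example.com"/>
        <int-mail:subject value="Job instance already complete"/>
    </int-mail:header-enricher>
</int:chain>

<int-mail:outbound-channel-adapter channel="mailOutChannel" mail-sender="mailSender"/>
```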

Spring Batch: Duplicate rows after job re-run

Our Spring Batch application is, upon restart of a failed job, processing the same records again, resulting in duplicate rows, and we want to understand how to avoid this.
The Spring Integration poller which starts the batch job is configured to run every couple of hours. When it runs a second time, the job parameters will be the same, but if the previous run failed (for example, because of a DataTruncation exception), Spring Batch will not complain that the job has already completed.
At the point of failure, several hundred thousand records will already have been processed and copied from the source table to the destination table. When the job is run a subsequent time, the same rows will be copied to the destination table, resulting in duplicates. Therefore, it appears that the job is not being resumed, but restarted from the beginning.
The Spring Batch database is Derby (file based). It is set up when the application starts, and it appears state is not maintained between restarts of the actual application (because a job can be run again with the same parameters). However, within one application run, state is maintained. For instance, if the job completes successfully, the next time the poller runs an exception will be thrown because a job (with those parameters) has already completed.
Our job definition is as follows:
<batch:job id="publisherJob" >
<batch:step id="step1">
<batch:tasklet >
<batch:chunk reader="itemReader" processor="itemProcessor"
writer="itemWriter" commit-interval="${...}" />
</batch:tasklet>
<batch:listeners>
...
</batch:listeners>
</batch:step>
</batch:job>
<bean id="itemReader" class="org.springframework.batch.item.database.JdbcCursorItemReader">
<property name="dataSource" ref="dataSource" />
<property name="sql" value="select ${...} from ${...} where ${...}" />
<property name="rowMapper" ref="rowMapper" />
</bean>
The WHERE clause includes ORDER BY.
Our understanding was that Spring Batch would retain the state at which processing failed and proceed from that point (if the error in the source table has been fixed), therefore preventing duplicate rows. What has to be configured for this to happen?
Thanks
Spring Batch maintains state in that it remembers how many records were processed, not specifically which ones. Because of that, it's up to you to guarantee the order of the items is reproducible from run to run so that if we process 100 records in run 1 and fail, when we skip the first 100 records in run 2, those are the right 100 records to skip. You didn't provide the configuration for your JdbcCursorItemReader but my assumption is that you are not using an order by in your SQL. If you want restartability, you need some way to guarantee the order of the items. Using an order by in your SQL is the easiest way to accomplish this (there are others like using the process indicator pattern if that's needed).
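
A minimal sketch of a restart-friendly reader definition follows. The table and column names (`source_table`, `id`, `name`) are hypothetical stand-ins for the elided parts of the original SQL; the point is the ORDER BY on a unique key, so that "the first N items" means the same rows on every run. Restart also relies on the earlier failed execution still being present in the job repository.

```xml
<bean id="itemReader" class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <property name="dataSource" ref="dataSource" />
    <!-- Hypothetical query: ordering by a unique key makes the item order reproducible -->
    <property name="sql" value="select id, name from source_table order by id" />
    <property name="rowMapper" ref="rowMapper" />
    <!-- saveState is true by default; shown here because the item count is stored through it -->
    <property name="saveState" value="true" />
</bean>
```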