Spring: How to restart transaction in a for-loop? - spring-data-jpa

I have a Spring Batch app that, during the write step, loops through records to be inserted into a Postgres database. Every now and again we get a DuplicateKeyException in the loop, but don't want the whole job to fail. We log that record and want to continue inserting the following records.
But upon getting an exception, the transaction becomes "bad" and Postgres won't accept any more commands, as described in this excellent post. So my question is: what's the best way to restart the transaction? Again, I'm not retrying the record that failed - I just want to continue my loop with the next record.
This is part of my job config xml:
<batch:job id="portHoldingsJob">
<batch:step id="holdingsStep">
<tasklet throttle-limit="10">
<chunk reader="PortHoldingsReader" processor="PortHoldingsProcessor" writer="PortHoldingsWriter" commit-interval="1" />
</tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="JobExecutionListener"/>
</batch:listeners>
</batch:job>
Thanks for any input!

Not sure if you are using the Spring transaction annotations to manage the transactions or not ... if so, perhaps you can try:
@Transactional(noRollbackFor = DuplicateKeyException.class)
Hope that helps.
In Spring Batch, no-rollback exceptions are apparently designated like this:
<batch:tasklet>
    <batch:chunk ... />
    <batch:no-rollback-exception-classes>
        <batch:include class="MyRuntimeException"/>
    </batch:no-rollback-exception-classes>
</batch:tasklet>
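For completeness, here is a minimal sketch of the same idea in Java config, assuming the reader/processor/writer beans from the question; the item type and skip limit are illustrative, not from the question:

// Inside a @Configuration class with @EnableBatchProcessing.
// Key imports: org.springframework.batch.core.Step,
// org.springframework.batch.core.configuration.annotation.StepBuilderFactory,
// org.springframework.dao.DuplicateKeyException
@Bean
public Step holdingsStep(StepBuilderFactory steps,
                         ItemReader<Object> portHoldingsReader,
                         ItemProcessor<Object, Object> portHoldingsProcessor,
                         ItemWriter<Object> portHoldingsWriter) {
    return steps.get("holdingsStep")
            .<Object, Object>chunk(1)                 // commit-interval="1" from the question
            .reader(portHoldingsReader)
            .processor(portHoldingsProcessor)
            .writer(portHoldingsWriter)
            .faultTolerant()
            .skip(DuplicateKeyException.class)        // log-and-skip the duplicate record
            .skipLimit(1000)                          // illustrative bound, pick your own
            .noRollback(DuplicateKeyException.class)  // Java-config counterpart of no-rollback-exception-classes
            .build();
}

Note that since Postgres aborts the whole transaction after a failed statement (as described in the question), the default skip behavior (roll back the chunk, then re-drive the items one at a time in fresh transactions) may actually be safer than noRollback here.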

Related

How are task executor of tasklet and partition handler different in spring batch?

I am running Spring Batch with partitioning, i.e. Partition/Read/Process/Write, where the process step spends more time in I/O than in CPU-intensive processing.
The problem with the current partitioning scheme is that it creates some hot partitions, i.e. some partitions have more data to process than others. The job processes most of the records very fast, but the last few are processed slowly, because a few partitioned threads have more data and execute their records one by one, causing the job to complete slower.
Conceptually, I want to be able to spawn each partitioned step into parallel threads (say 4 threads), so that hot partitions are also processed in parallel rather than 1 record at a time.
Here is the batch config I am currently using (before optimisation) for running partition/read/process logic for a job.
<batch:job id="Scheduler">
<batch:step id="Scheduler">
<batch:partition step="ScheduleStep" partitioner="Partitioner">
<batch:handler grid-size="30" task-executor="asyncTaskExecutor"/>
</batch:partition>
</batch:step>
</batch:job>
<batch:step id="ScheduleStep">
<batch:tasklet>
<batch:chunk reader="tsV2Reader" processor="tsV2Processor"
writer="tsV2Writer" commit-interval="1"/>
</batch:tasklet>
</batch:step>
<bean id="asyncTaskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor">
<property name="concurrencyLimit" value="30"/>
</bean>
I want to decrease the number of partitions from 30 to, say, 10 and let each partition thread spawn 4 threads of its own, so that even if a hot partition exists, it still executes 4 records in parallel instead of 1, i.e. 10 x 4 = 40 threads overall.
So I changed the above config by adding a task-executor at the tasklet level with a concurrencyLimit of 4, and changed the grid-size and concurrencyLimit above to 10:
<batch:job id="Scheduler">
<batch:step id="Scheduler">
<batch:partition step="ScheduleStep" partitioner="Partitioner">
<batch:handler **grid-size="10"** task-executor="asyncTaskExecutor"/>
</batch:partition>
</batch:step>
</batch:job>
<batch:step id="ScheduleStep">
<batch:tasklet **task-executor="taskletTaskExecutor" throttle-limit="4"**>
<batch:chunk reader="tsV2Reader" processor="tsV2Processor"
writer="tsV2Writer" commit-interval="1"/>
</batch:tasklet>
</batch:step>
<bean id="asyncTaskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor">
<property name="concurrencyLimit" **value="10"** />
</bean>
**<bean id="taskletTaskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor">
<property name="concurrencyLimit" value="4"/>
</bean>**
This seems to slow down the processing instead of speeding it up. Maybe the concurrencyLimit (4) of the tasklet executor is being applied across all the threads; of that I am not sure. I tried looking up documentation on how the task-executor configs of the tasklet and the partition handler are interpreted together under the hood, but could not find anything relevant. If someone can help with the internals of these configs and with how to achieve what I am looking for, it would be of great help.
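One thing worth checking (an assumption about the internals, not verified against this setup): throttle-limit is applied per step execution, but concurrencyLimit is enforced per executor instance. Since taskletTaskExecutor is a singleton bean shared by all partitions, its limit of 4 may cap the entire job at 4 concurrent chunk threads, which would explain the slowdown. A sketch that sizes the shared executor for the whole thread budget and lets throttle-limit="4" cap each individual partition:

// Key import: org.springframework.core.task.SimpleAsyncTaskExecutor
// Sizes are illustrative: 10 partitions x 4 chunk threads each.
@Bean
public SimpleAsyncTaskExecutor asyncTaskExecutor() {
    SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("partition-");
    executor.setConcurrencyLimit(10); // one thread per partition
    return executor;
}

@Bean
public SimpleAsyncTaskExecutor taskletTaskExecutor() {
    SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("chunk-");
    executor.setConcurrencyLimit(40); // shared by all partitions; throttle-limit="4" still caps each one
    return executor;
}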

Tasklet transaction-manager and chunk transaction

I specified a tasklet with chunk-oriented processing.
<batch:step id="midxStep" parent="stepParent">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk
reader="processDbItemReaderJdbc"
processor="midxItemProcessor"
writer="midxCompositeItemWriter"
processor-transactional="false"
reader-transactional-queue="false"
skip-limit="${cmab.batch.skip.limit}"
commit-interval="#{jobParameters['toProcess']==T(de.axa.batch.ecmcm.cmab.util.CmabConstants).TYPE_POSTAUSGANG ? '${consumer.global.pa.midx.commitSize}' : '${consumer.global.pe.midx.commitSize}' }"
cache-capacity="20">
<batch:skippable-exception-classes>
<batch:include class="de.axa.batch.ecmcm.cmab.util.CmabProcessMidxException" />
<batch:exclude class="java.lang.IllegalArgumentException" />
</batch:skippable-exception-classes>
<batch:retryable-exception-classes>
<batch:include class="de.axa.batch.ecmcm.cmab.util.CmabTechnicalMidxException" />
<batch:include class="de.axa.batch.ecmcm.cmab.util.CmabTechnicalException" />
</batch:retryable-exception-classes>
<batch:retry-listeners>
<batch:listener ref="logRetryListener"/>
</batch:retry-listeners>
<batch:listeners>
<batch:listener>
<bean id="midxProcessSkipListener" class="de.axa.batch.ecmcm.cmab.core.batch.listener.CmabDbSkipListener" scope="step">
<constructor-arg index="0" value="#{jobParameters['errorStatus']}" type="java.lang.String"/>
</bean>
</batch:listener>
</batch:listeners>
</batch:chunk>
<batch:transaction-attributes isolation="SERIALIZABLE" propagation="MANDATORY" timeout="${cmab.jta.usertransaction.timeout}"/>
<batch:listeners>
<batch:listener ref="midxStepListener"/>
<batch:listener>
<bean id="cmabChunkListener" class="de.axa.batch.ecmcm.cmab.core.batch.listener.CmabChunkListener" scope="step"/>
</batch:listener>
</batch:listeners>
</batch:tasklet>
</batch:step>
The tasklet runs with a JTA transaction manager (Atomikos, name="transactionManager").
Now my question:
Is this transaction manager "delegated" to the chunk processing?
Why am I asking this? If I set the transaction-attributes (see chunk) to propagation level "MANDATORY", the chunk processing aborts with the error that no transaction is available.
This left me confused, because I thought the tasklet transaction specification implies that the chunk runs within this tasklet transaction, too.
Furthermore, I intend to run the application within a cloud system with more than one pod. The processDbItemReaderJdbc fetches items via a StoredProcedureItemReader with "FOR UPDATE SKIP LOCKED" from a Postgres DB.
So my intention is to run the whole chunk, including the reader, within one transaction, so that the rows in the reader's result set stay locked and are skipped by the other pods' readers.
The transaction attributes are for the transaction that Spring Batch will create to run your step with the transaction manager that you set on the step. Those are the attributes of the transaction itself, not the transaction manager (that does not make sense).
All batch artifacts are executed within the scope of that same transaction, including the reader and the writer. The only exception to that is the JdbcCursorItemReader, which by default does not participate in the transaction, unless useSharedExtendedConnection is set.
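For the cursor-reader exception mentioned above, a minimal sketch (the SQL, item type, and bean wiring are placeholders, not from the question):

// Key imports: javax.sql.DataSource,
// org.springframework.batch.item.database.ExtendedConnectionDataSourceProxy,
// org.springframework.batch.item.database.JdbcCursorItemReader
@Bean
public JdbcCursorItemReader<String> cursorReader(DataSource dataSource) {
    // The proxy keeps one connection open across chunk boundaries so the
    // cursor's connection can be shared with the step's transaction.
    ExtendedConnectionDataSourceProxy proxy = new ExtendedConnectionDataSourceProxy(dataSource);
    JdbcCursorItemReader<String> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(proxy);
    reader.setSql("SELECT id FROM work_items FOR UPDATE SKIP LOCKED"); // placeholder SQL
    reader.setRowMapper((rs, rowNum) -> rs.getString(1));
    reader.setUseSharedExtendedConnection(true); // cursor participates in the surrounding transaction
    return reader;
}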

Spring Batch: Do I need a Transactional annotation?

I have a Spring Batch app that I've configured with a SkipPolicy so the whole batch won't fail if one record can't be inserted. But the behavior doesn't seem to work that way. When an insert into the database fails, Postgresql says the transaction is "bad" and all following commands will be aborted, so all the subsequent inserts into the db fail too. I thought Spring was supposed to retry the transaction one record at a time?
So we're wondering if the problem is that we mark the service method with @Transactional. Should I just let Spring Batch control the transactions? Here's my job configuration:
<bean id="stepScope" class="org.springframework.batch.core.scope.StepScope">
<property name="autoProxy" value="true"/>
</bean>
<bean id="skipPolicy" class="com.company.batch.common.job.listener.BatchSkipPolicy"/>
<bean id="chunkListener" class="com.company.batch.common.job.listener.ChunkExecutionListener"/>
<batch:job id="capBkdnJob">
<batch:step id="capStep">
<tasklet throttle-limit="20">
<chunk reader="CapReader" processor="CapProcessor" writer="CapWriter" commit-interval="50"
skip-policy="skipPolicy" skip-limit="10">
<batch:skippable-exception-classes>
<batch:include class="com.company.common.exception.ERDException"/>
</batch:skippable-exception-classes>
</chunk>
<batch:no-rollback-exception-classes>
<batch:include class="com.company.common.exception.ERDException"/>
</batch:no-rollback-exception-classes>
<batch:listeners>
<batch:listener ref="chunkListener"/>
</batch:listeners>
</tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="batchWorkerJobExecutionListener"/>
</batch:listeners>
</batch:job>
Short answer is: no
Spring Batch will use the transaction manager defined as part of your JobRepository by default. This will allow it to roll back the whole chunk when an error has been encountered and then retry each item individually in its own transaction.
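In other words, the service method invoked during the write phase should not declare its own transaction boundary. A minimal sketch (class and method names are assumed, not from the question):

// Sketch: let the chunk transaction own the boundary.
@Service
public class CapService {

    private final CapDao capDao; // hypothetical DAO

    public CapService(CapDao capDao) {
        this.capDao = capDao;
    }

    // No @Transactional here: the insert joins the transaction Spring Batch
    // opened for the current chunk, so a failed item can be rolled back and
    // the chunk re-driven one record at a time.
    public void insert(CapRecord record) {
        capDao.insert(record);
    }
}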

Spring Batch SkipPolicy not used

I use the setup below in a project for a job definition.
On the project the batch-jobs are defined in a database. The xml-job definition below serves as a template for creating all these batch jobs at runtime.
This works fine, except in the case of a BeanCreationException in the dataProcessor. When this exception occurs, the skip policy is never called and the batch ends immediately instead.
What could be the reason for that? What do I have to do so that every exception in the dataProcessor goes through the SkipPolicy?
Thanks a lot in advance
Christian
Version: spring-batch 3.0.7
<batch:job id="MassenGevoJob" restartable="true">
<batch:step id="selectDataStep" parent="selectForMassenGeVoStep" next="executeProcessorStep" />
<batch:step id="executeProcessorStep"
allow-start-if-complete="true" next="decideExitStatus" >
<batch:tasklet>
<batch:chunk reader="dataReader" processor="dataProcessor"
writer="dataItemWriter" commit-interval="10"
skip-policy="batchSkipPolicy">
</batch:chunk>
<batch:listeners>
<batch:listener ref="batchItemListener" />
<batch:listener ref="batchSkipListener" />
<batch:listener ref="batchChunkListener" />
</batch:listeners>
</batch:tasklet>
</batch:step>
<batch:decision decider="failOnPendingObjectsDecider"
id="decideExitStatus">
<batch:fail on="FAILED_PENDING_OBJECTS" exit-code="FAILED_PENDING_OBJECTS" />
<batch:next on="*" to="endFlowStep" />
</batch:decision>
<batch:step id="endFlowStep">
<batch:tasklet ref="noopTasklet"></batch:tasklet>
</batch:step>
<batch:validator ref="batchParameterValidator" />
<batch:listeners>
<batch:listener ref="batchJobListener" />
</batch:listeners>
</batch:job>
A BeanCreationException isn't really skippable, because it usually happens before Spring Batch starts. It's also typically a fatal error for your application (Spring couldn't create a component you've defined as being critical to your application). If the creation of that bean is subject to issues and not having it is ok, I'd suggest wrapping its creation in a factory so that you can control any exceptions that come out of creating that bean. For example, if you can't create your custom ItemProcessor, your FactoryBean could return the PassThroughItemProcessor if that's ok.
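A minimal sketch of that factory idea (class names are illustrative; the fallback assumes a pass-through processor is acceptable for your job):

// Key imports: org.springframework.batch.item.ItemProcessor,
// org.springframework.batch.item.support.PassThroughItemProcessor,
// org.springframework.beans.factory.FactoryBean
public class SafeProcessorFactoryBean implements FactoryBean<ItemProcessor<Object, Object>> {

    @Override
    public ItemProcessor<Object, Object> getObject() {
        try {
            return new MyFragileItemProcessor(); // hypothetical processor whose construction may fail
        } catch (RuntimeException e) {
            // Degrade to a no-op processor instead of letting a
            // BeanCreationException abort the whole context.
            return new PassThroughItemProcessor<>();
        }
    }

    @Override
    public Class<?> getObjectType() {
        return ItemProcessor.class;
    }

    @Override
    public boolean isSingleton() {
        return true;
    }
}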

Skip step based on job parameter

I've read through the spring batch docs a few times and searched for a way to skip a job step based on job parameters.
For example, say I have this job:
<batch:job id="job" restartable="true"
xmlns="http://www.springframework.org/schema/batch">
<batch:step id="step1-partitioned-export-master">
<batch:partition handler="partitionHandler"
partitioner="partitioner" />
<batch:next on="COMPLETED" to="step2-join" />
</batch:step>
<batch:step id="step2-join">
<batch:tasklet>
<batch:chunk reader="xmlMultiResourceReader" writer="joinXmlItemWriter"
commit-interval="1000">
</batch:chunk>
</batch:tasklet>
<batch:next on="COMPLETED" to="step3-zipFile" />
</batch:step>
<batch:step id="step3-zipFile">
<batch:tasklet ref="zipFileTasklet" />
<!-- <batch:next on="COMPLETED" to="step4-fileCleanUp" /> -->
</batch:step>
<!-- <batch:step id="step4-fileCleanUp">
<batch:tasklet ref="fileCleanUpTasklet" />
</batch:step> -->
</batch:job>
I want to be able to skip step4 if desired, by specifying so in the job parameters.
The only somewhat related question I could find was how to select which spring batch job to run based on application argument - spring boot java config,
which seems to indicate that 2 distinct job contexts should be created and the decision made outside the batch step definition.
I have already followed this pattern, since I had a csv export as well as xml, as in the example. I split the 2 jobs into two separate spring-context.xml files, one for each export type, even though there were not many differences.
At that point I thought it was perhaps cleaner, since I could find no examples of alternatives.
But having to create 4 separate context files just to make it possible to include step 4 or not for each export case seems a bit crazy.
I must be missing something here.
Can't you do that with a decider? http://docs.spring.io/spring-batch/reference/html/configureStep.html (chapter 5.3.4 Programmatic Flow Decisions)
EDIT: link to the updated URL:
https://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html#programmaticFlowDecisions
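For example, a decider keyed off a job parameter could look roughly like this (the parameter name "skipCleanup" and the custom status value are assumptions, not from the question):

// Key imports: org.springframework.batch.core.JobExecution,
// org.springframework.batch.core.StepExecution,
// org.springframework.batch.core.job.flow.FlowExecutionStatus,
// org.springframework.batch.core.job.flow.JobExecutionDecider
public class CleanupStepDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        String skip = jobExecution.getJobParameters().getString("skipCleanup");
        return "true".equalsIgnoreCase(skip)
                ? new FlowExecutionStatus("SKIP_CLEANUP") // route around step4
                : FlowExecutionStatus.COMPLETED;          // proceed to step4
    }
}

The decider would then be wired into the flow with a <batch:decision> element and a <batch:next on="SKIP_CLEANUP" to="..."/> transition, much like the decideExitStatus decision in the MassenGevoJob example above.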