I have the following requirement: a CSV input file contains lines where one of the fields is an ID. There can be several lines with the same ID. The lines should be processed grouped by ID (meaning, if one line fails validation, then all lines with that same ID should fail to process). The groups of lines can be processed in parallel.
I have an implementation that works fine, but it reads the CSV input file with my own code in a Partitioner implementation. It would be nicer if I could use an out-of-the-box implementation for that (e.g. FlatFileItemReader) and configure it just as you would for a chunk step.
To clarify, my job config is like this:
<batch:job id="job">
<batch:step id="partitionStep">
<batch:partition step="chunkStep" partitioner="partitioner">
<batch:handler grid-size="10" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
</batch:job>
<batch:step id="chunkStep">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader" processor="processor" writer="writer" chunk-completion-policy="completionPolicy">
.. skip and retry policies omitted for brevity
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="partitioner" class="com.acme.InputFilePartitioner" scope="step">
<property name="inputFileName" value="src/main/resources/input/example.csv" />
</bean>
<bean id="reader" class="org.springframework.batch.item.support.ListItemReader" scope="step">
<constructor-arg value="#{stepExecutionContext['key']}"/>
</bean>
where the Partitioner implementation reads the input file, "manually" parses the lines to get the ID field, groups them by that ID into Lists, and creates ExecutionContexts that each get one of those Lists.
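Concretely, that manual code amounts to something like the following sketch (the ID being the first CSV field, the "key" context entry, and the partition naming are assumptions based on the config above):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class InputFilePartitioner implements Partitioner {

    private String inputFileName;

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // Group the raw lines by their ID field (assumption: ID is the first column)
        Map<String, List<String>> linesById = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(inputFileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String id = line.split(",")[0];
                linesById.computeIfAbsent(id, k -> new ArrayList<>()).add(line);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Failed to read " + inputFileName, e);
        }
        // One ExecutionContext per ID group; the "key" entry feeds the ListItemReader
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (Map.Entry<String, List<String>> group : linesById.entrySet()) {
            ExecutionContext context = new ExecutionContext();
            context.put("key", group.getValue());
            partitions.put("partition-" + group.getKey(), context);
        }
        return partitions;
    }

    public void setInputFileName(String inputFileName) {
        this.inputFileName = inputFileName;
    }
}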
It would be great if I could replace that "manual" code in the Partitioner with a FlatFileItemReader configured with a LineMapper. (I hope I'm expressing myself clearly.)
Is it possible?
I specified a tasklet with chunk-oriented processing.
<batch:step id="midxStep" parent="stepParent">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk
reader="processDbItemReaderJdbc"
processor="midxItemProcessor"
writer="midxCompositeItemWriter"
processor-transactional="false"
reader-transactional-queue="false"
skip-limit="${cmab.batch.skip.limit}"
commit-interval="#{jobParameters['toProcess']==T(de.axa.batch.ecmcm.cmab.util.CmabConstants).TYPE_POSTAUSGANG ? '${consumer.global.pa.midx.commitSize}' : '${consumer.global.pe.midx.commitSize}' }"
cache-capacity="20">
<batch:skippable-exception-classes>
<batch:include class="de.axa.batch.ecmcm.cmab.util.CmabProcessMidxException" />
<batch:exclude class="java.lang.IllegalArgumentException" />
</batch:skippable-exception-classes>
<batch:retryable-exception-classes>
<batch:include class="de.axa.batch.ecmcm.cmab.util.CmabTechnicalMidxException" />
<batch:include class="de.axa.batch.ecmcm.cmab.util.CmabTechnicalException" />
</batch:retryable-exception-classes>
<batch:retry-listeners>
<batch:listener ref="logRetryListener"/>
</batch:retry-listeners>
<batch:listeners>
<batch:listener>
<bean id="midxProcessSkipListener" class="de.axa.batch.ecmcm.cmab.core.batch.listener.CmabDbSkipListener" scope="step">
<constructor-arg index="0" value="#{jobParameters['errorStatus']}" type="java.lang.String"/>
</bean>
</batch:listener>
</batch:listeners>
</batch:chunk>
<batch:transaction-attributes isolation="SERIALIZABLE" propagation="MANDATORY" timeout="${cmab.jta.usertransaction.timeout}"/>
<batch:listeners>
<batch:listener ref="midxStepListener"/>
<batch:listener>
<bean id="cmabChunkListener" class="de.axa.batch.ecmcm.cmab.core.batch.listener.CmabChunkListener" scope="step"/>
</batch:listener>
</batch:listeners>
</batch:tasklet>
</batch:step>
The tasklet runs with a JTA transaction manager (Atomikos, name="transactionManager").
Now my question:
Is this transaction manager "delegated" to the chunk process?
Why am I asking this? If I set the transaction-attributes (see chunk) to propagation level "MANDATORY", the chunk process aborts with an error saying that no transaction is available.
That left me confused, because I thought the tasklet transaction specification implied that the chunk runs within this tasklet transaction, too.
Furthermore, I intend to run the application in a cloud system with more than one pod. The processDbItemReaderJdbc is a StoredProcedureItemReader that fetches items with a "FOR UPDATE SKIP LOCKED" query from a PostgreSQL database.
So my intention is to run the whole chunk, including the reader, within one transaction, in order to keep the reader's result set locked away from the other pods' processes.
The transaction attributes are for the transaction that Spring Batch creates to run your step, using the transaction manager that you set on the step. They are the attributes of the transaction itself, not of the transaction manager (attributes on a transaction manager would not make sense).
All batch artifacts are executed within the scope of that same transaction, including the reader and the writer. The only exception to that is the JdbcCursorItemReader, which by default does not participate in the transaction, unless useSharedExtendedConnection is set.
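If you do want a cursor-based reader to take part in the step transaction, useSharedExtendedConnection can be set; it is defined on AbstractCursorItemReader, the common base class of JdbcCursorItemReader and StoredProcedureItemReader, and it requires wrapping the DataSource in an ExtendedConnectionDataSourceProxy. A minimal sketch (the factory class and procedure name are made-up placeholders):

import javax.sql.DataSource;

import org.springframework.batch.item.database.ExtendedConnectionDataSourceProxy;
import org.springframework.batch.item.database.StoredProcedureItemReader;
import org.springframework.jdbc.core.RowMapper;

public class SharedConnectionReaderFactory {

    // Build a reader whose cursor shares the connection used by the step
    // transaction, instead of opening a separate one.
    public static <T> StoredProcedureItemReader<T> create(DataSource dataSource,
                                                          RowMapper<T> rowMapper) {
        StoredProcedureItemReader<T> reader = new StoredProcedureItemReader<>();
        reader.setDataSource(new ExtendedConnectionDataSourceProxy(dataSource));
        reader.setProcedureName("fetch_items_for_update"); // placeholder name
        reader.setRowMapper(rowMapper);
        reader.setUseSharedExtendedConnection(true);
        return reader;
    }
}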
I have a Spring Batch app that I've configured with a SkipPolicy so the whole batch won't fail if one record can't be inserted. But the behavior doesn't seem to work that way. When an insert into the database fails, PostgreSQL marks the transaction as aborted, and all following commands are rejected, so all the subsequent inserts into the db fail too. Spring is supposed to retry the transaction one record at a time, I thought?
So we're wondering if the problem is that we mark the service method with @Transactional. Should I just let Spring Batch control the transactions? Here's my job configuration:
<bean id="stepScope" class="org.springframework.batch.core.scope.StepScope">
<property name="autoProxy" value="true"/>
</bean>
<bean id="skipPolicy" class="com.company.batch.common.job.listener.BatchSkipPolicy"/>
<bean id="chunkListener" class="com.company.batch.common.job.listener.ChunkExecutionListener"/>
<batch:job id="capBkdnJob">
<batch:step id="capStep">
<tasklet throttle-limit="20">
<chunk reader="CapReader" processor="CapProcessor" writer="CapWriter" commit-interval="50"
skip-policy="skipPolicy" skip-limit="10">
<batch:skippable-exception-classes>
<batch:include class="com.company.common.exception.ERDException"/>
</batch:skippable-exception-classes>
</chunk>
<batch:no-rollback-exception-classes>
<batch:include class="com.company.common.exception.ERDException"/>
</batch:no-rollback-exception-classes>
<batch:listeners>
<batch:listener ref="chunkListener"/>
</batch:listeners>
</tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="batchWorkerJobExecutionListener"/>
</batch:listeners>
</batch:job>
The short answer is: no.
Spring Batch will use the transaction manager defined as part of your JobRepository by default. This will allow it to roll back the whole chunk when an error has been encountered and then retry each item individually in its own transaction.
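As an aside, a custom skip policy like the BatchSkipPolicy wired above decides per item during that one-by-one re-processing. A minimal sketch of such a policy (the actual class isn't shown in the question; the shouldSkip(Throwable, int) signature is the classic pre-5.0 one, and the limit of 10 mirrors the skip-limit above):

import org.springframework.batch.core.step.skip.SkipLimitExceededException;
import org.springframework.batch.core.step.skip.SkipPolicy;

import com.company.common.exception.ERDException;

public class BatchSkipPolicy implements SkipPolicy {

    @Override
    public boolean shouldSkip(Throwable t, int skipCount) throws SkipLimitExceededException {
        // Skip only the known data error, up to a limit; infrastructure
        // failures (e.g. a dead connection) are not skipped.
        return t instanceof ERDException && skipCount < 10;
    }
}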
I've read through the Spring Batch docs a few times and searched for a way to skip a job step based on job parameters.
For example, say I have this job:
<batch:job id="job" restartable="true"
xmlns="http://www.springframework.org/schema/batch">
<batch:step id="step1-partitioned-export-master">
<batch:partition handler="partitionHandler"
partitioner="partitioner" />
<batch:next on="COMPLETED" to="step2-join" />
</batch:step>
<batch:step id="step2-join">
<batch:tasklet>
<batch:chunk reader="xmlMultiResourceReader" writer="joinXmlItemWriter"
commit-interval="1000">
</batch:chunk>
</batch:tasklet>
<batch:next on="COMPLETED" to="step3-zipFile" />
</batch:step>
<batch:step id="step3-zipFile">
<batch:tasklet ref="zipFileTasklet" />
<!-- <batch:next on="COMPLETED" to="step4-fileCleanUp" /> -->
</batch:step>
<!-- <batch:step id="step4-fileCleanUp">
<batch:tasklet ref="fileCleanUpTasklet" />
</batch:step> -->
</batch:job>
I want to be able to skip step4 if desired, by specifying that in the job parameters.
The only somewhat related question I could find was "how to select which spring batch job to run based on application argument - spring boot java config", which seems to indicate that two distinct job contexts should be created and the decision made outside the batch step definition.
I have already followed this pattern, since I had a CSV export as well as an XML one, as in the example. I split the two jobs into two separate spring-context.xml files, one for each export type, even though there were not many differences.
At that point I thought it was perhaps cleaner, since I could find no examples of alternatives.
But having to create four separate context files just to make it possible to include step 4 or not for each export case seems a bit crazy.
I must be missing something here.
Can't you do that with a decider? http://docs.spring.io/spring-batch/reference/html/configureStep.html (chapter 5.3.4 Programmatic Flow Decisions)
EDIT: link to the updated URL:
https://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html#programmaticFlowDecisions
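A decider is a small Java class; here is a sketch of one keyed off a job parameter (the parameter name "skipCleanup" and the status values are assumptions):

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class FileCleanUpDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // Route around the cleanup step when the job parameter says so
        String skip = jobExecution.getJobParameters().getString("skipCleanup");
        return "true".equals(skip)
                ? new FlowExecutionStatus("SKIP_CLEANUP")
                : new FlowExecutionStatus("DO_CLEANUP");
    }
}

Wired in after step3-zipFile as a <batch:decision> element with <batch:next on="DO_CLEANUP" to="step4-fileCleanUp"/> and <batch:end on="SKIP_CLEANUP"/> transitions, a single job definition can then include or skip step 4 at runtime.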
I was trying to copy data from one data source to another, using the IbatisBatchItemWriter class to do so. The records get inserted into the target database, but at the end of the batch I get a NullPointerException as below:
java.lang.NullPointerException
    at org.springframework.batch.item.database.IbatisBatchItemWriter.write(IbatisBatchItemWriter.java:142)
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.writeItems(SimpleChunkProcessor.java:175)
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.doWrite(SimpleChunkProcessor.java:151)
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.write(SimpleChunkProcessor.java:274)
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:199)
    at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:75)
But after setting the property assertUpdates = false I no longer get the error, and the data is copied. Still, I'm not convinced about the NullPointerException; it looks like I'm missing something in my config.
I use Spring Batch (infrastructure) 2.2.4 and iBATIS version 2.3.0.
<bean id="targetWriterDepAcct03"
class="org.springframework.batch.item.database.IbatisBatchItemWriter">
<property name="sqlMapClient" ref="targetDatabaseMap" />
<property name="statementId" value="DepositAccountSqlMap.updtDepositAccount" />
<property name="assertUpdates" value="false" />
</bean>
<batch:job id="baseJob" abstract="true" restartable="true"
job-repository="jobRepository" />
<batch:job id="TboltSyncBatchJob">
<batch:step id="CheckForConfigFileStep">
<batch:tasklet ref="CheckForConfigFile" />
<batch:next on="COMPLETED" to="SyncDataDepAcct03" />
<batch:end on="FAILED" />
</batch:step>
<batch:step id="SyncDataDepAcct03">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="sourceReaderForDepAcct03" writer="targetWriterDepAcct03"
commit-interval="1000" />
</batch:tasklet>
</batch:step>
Any thoughts?
That NPE is due to the fact that zero results were returned by the SqlMapClient. If you don't need the number of updated records to be checked, you can turn that off (assertUpdates="false", as you did). If you do need them checked, you'll want to look into why that query isn't returning any results. You can see the code in the IbatisBatchItemWriter source.
I'm setting up a batch job that moves data between three databases. I am planning on using the out-of-the-box Spring Batch classes to handle the query from the first database, but I want to include details of the current job/step in the extract. The example Spring config might look like this:
<bean id="jdbcPagingItemReader" class="org.springframework.batch.item.database.JdbcPagingItemReader"> <property name="dataSource" ref="dataSource"/>
<property name="pageSize" value="1000"/>
<property name="fetchSize" value="100"/>
<property name="queryProvider">
<bean class="org.springframework.batch.item.database.support.HsqlPagingQueryProvider">
<property name="selectClause" value="select id, bar"/>
<property name="fromClause" value="foo"/>
<property name="sortKeys">
Is there a way via Groovy or SpEL to access the current JobExecution? I found this thread on access-spring-batch-job-definition, but it assumes custom code.
Your configuration is cut off at the sortKeys entry so I'm not 100% sure what you are attempting to accomplish. That being said, using step scope, you can inject the StepExecution which has a reference to the JobExecution. Getting the JobExecution would look something like this:
<bean id="jdbcPagingItemReader" class="org.springframework.batch.item.database.JdbcPagingItemReader">
…
<property id="jobExecution" value="#{stepExecution.jobExecution}"/>
</bean>
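The same late binding works for any step-scoped bean; for example, injecting the JobExecution into a field of a custom processor (a sketch; the class name and item type are made up):

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

@Component
@Scope("step") // late binding only resolves on step-scoped beans
public class JobAwareProcessor implements ItemProcessor<String, String> {

    @Value("#{stepExecution.jobExecution}")
    private JobExecution jobExecution;

    @Override
    public String process(String item) {
        // Tag each extracted record with the current job execution id
        return item + "," + jobExecution.getId();
    }
}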
The write count exists on the step context (StepExecution.getWriteCount()), so first you need to promote it to the JobExecutionContext.
I suggest using a step listener with the @AfterStep annotation for this:
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepListener;
import org.springframework.batch.core.annotation.AfterStep;
import org.springframework.stereotype.Component;

@Component
public class PromoteWriteCountToJobContextListener implements StepListener {

    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) {
        int writeCount = stepExecution.getWriteCount();
        // Save the step's write count under a step-specific key in the job context
        stepExecution.getJobExecution().getExecutionContext()
                .put(stepExecution.getStepName() + ".writeCount", writeCount);
        return stepExecution.getExitStatus();
    }
}
Every step that performs the inserts gets this listener added:
<batch:step id="readFromDB1WrietToDB2" next="readFroDB2WrietToDB3">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader" writer="writter" />
</batch:tasklet>
<batch:listeners>
<batch:listener ref="promoteWriteCountToJobContextListener"/>
</batch:listeners>
</batch:step>
<batch:step id="readFromDB2WrietToDB3" next="summerizeWrites">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader" writer="writter" />
</batch:tasklet>
<batch:listeners>
<batch:listener ref="promoteWriteCountToJobContextListener"/>
</batch:listeners>
</batch:step>
You can use Log4j to write the results to the log within the step listener, or you can use the saved values in a future step. You need a SpEL expression to read them, and to use a SpEL expression you need to set the writer/tasklet to scope="step":
<batch:step id="summerizeWrites">
<batch:tasklet id="summerizeCopyWritesTaskelt"">
<bean class="Tasklet" scope="step">
<property name="writeCountsList">
<list>
<value>#{jobExecutionContext['readFromDB1WrietToDB2.writeCount']}</value>
<value>#{jobExecutionContext['readFromDB2WrietToDB3.writeCount']}</value>
</list>
</property>
</bean>
</batch:tasklet>