Spring Batch insert/update on a heavy DB takes time - spring-batch

I have implemented a Spring Batch job which reads data from CSV files and inserts or updates records accordingly.
I have a table XXX037 with 800k records, and inserting into and updating it takes too much time.
In the Spring Batch configuration I use a commit-interval of 1000, but it still takes a long time to process.
Is there any way I can improve performance?
Configuration:
<batch:job id="pcqJob">
<batch:step id="pcqStep">
<batch:tasklet>
<batch:chunk reader="pcqReader" writer="compositeWriter" commit-interval="1000">
<!-- <batch:skippable-exception-classes>
<batch:include class="javax.persistence.PersistenceException"/>
</batch:skippable-exception-classes> -->
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
<!-- <bean id="skipPolicy" class="com.test.domain.services.writer.SkipPolicy">
<property name="skipLimit" value="2500"/>
</bean> -->
<bean id="compositeWriter" class="org.springframework.batch.item.support.ClassifierCompositeItemWriter">
<property name="classifier">
<bean class="org.springframework.classify.BackToBackPatternClassifier">
<property name="routerDelegate">
<bean class="com.test.domain.services.writer.ItemCodeClassifier" />
</property>
<property name="matcherMap">
<map>
<entry key="*Doss*" value-ref="fileItemWriter1" />
<entry key="*Ldt*" value-ref="fileItemWriter2" />
<entry key="*Old*" value-ref="oldDossierfileItemWriter" />
<entry key="*Tpm*" value-ref="tpmfileItemWriter" />
<entry key="*Txm*" value-ref="tpxfileItemWriter" />
<entry key="*DoD*" value-ref="dossierDeletefileItemWriter" />
<entry key="*LdD*" value-ref="ldtDeletefileItemWriter" />
<entry key="*TpD*" value-ref="tpmDeletefileItemWriter" />
<entry key="*TxD*" value-ref="txmDeletefileItemWriter" />
</map>
</property>
</bean>
</property>
</bean>
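One common lever, besides the commit-interval, is to make each commit a single JDBC batch rather than item-by-item persists. Below is a minimal sketch, assuming the records can be written with a plain parameterized SQL statement; the PcqRecord class, table columns, and SQL are hypothetical, not taken from the question.

import javax.sql.DataSource;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;

public class BatchedWriterSketch {

    // Hypothetical writer: sends one JDBC batch per chunk (here, 1000 rows per commit-interval)
    public static JdbcBatchItemWriter<PcqRecord> jdbcWriter(DataSource dataSource) throws Exception {
        JdbcBatchItemWriter<PcqRecord> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(dataSource);
        // Hypothetical table columns; named parameters are resolved from PcqRecord bean properties
        writer.setSql("INSERT INTO XXX037 (ITEM_CODE, ITEM_VALUE) VALUES (:itemCode, :itemValue)");
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
        writer.afterPropertiesSet();
        return writer;
    }

    // Hypothetical record type read from the CSV
    public static class PcqRecord {
        private String itemCode;
        private String itemValue;
        public String getItemCode() { return itemCode; }
        public void setItemCode(String itemCode) { this.itemCode = itemCode; }
        public String getItemValue() { return itemValue; }
        public void setItemValue(String itemValue) { this.itemValue = itemValue; }
    }
}

An update or upsert writer would follow the same pattern with a different statement; whether this helps depends on how the current delegate writers talk to the database.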

Related

RetryLogic using AOP is not working for MongoItemWriter, but the same is working for my custom readers and writers

My retry logic using AOP is not working for MongoItemWriter, but the same configuration works for my custom readers and writers.
Is there anything that I'm doing wrong here?
Below is the code snippet.
<bean id="retryAdvice"
class="org.springframework.retry.interceptor.RetryOperationsInterceptor">
<property name="retryOperations" ref="taskBatchRetryTemplate" />
</bean>
<bean id="taskBatchRetryTemplate" class="org.springframework.retry.support.RetryTemplate">
<property name="retryPolicy" ref="genericRetryPolicy" />
<property name="backOffPolicy">
<bean class="org.springframework.retry.backoff.ExponentialBackOffPolicy">
<property name="initialInterval" value="${mongoloader.backOffPeriod.initialInterval}"/>
<property name="maxInterval" value="${mongoloader.backOffPeriod.maxInterval}"/>
<property name="multiplier" value="${mongoloader.backOffPeriod.multiplier}"/>
</bean>
</property>
<property name="listeners">
<bean class="com.company.ens.myload.job.StepRetryListener"/>
</property>
</bean>
<bean id="genericRetryPolicy" class="org.springframework.retry.policy.SimpleRetryPolicy" >
<constructor-arg index="0" value="${mongoloader.retry.limit}"/>
<constructor-arg index="1">
<map>
<entry key="org.springframework.data.mongodb.CannotGetMongoDbConnectionException" value="true"/>
<entry key="org.springframework.jdbc.CannotGetJdbcConnectionException" value="true"/>
<!-- Just included the below exception for testing purpose, needs to be removed -->
<entry key="java.io.FileNotFoundException" value="true"/>
<entry key="org.springframework.dao.DuplicateKeyException" value="true"/>
</map>
</constructor-arg>
</bean>
Including only my step configuration here:
<batch:step id="Step4a-MainFlow_TranslateRawFedObjectsToFilingModel"
allow-start-if-complete="false">
<batch:tasklet>
<batch:chunk reader="rawFedObjectMongoReader"
processor="translatingProcessor" writer="SctModelMongoWriter"
commit-interval="100"/>
</batch:tasklet>
<batch:listeners>
<batch:listener ref="step4aListener"/>
</batch:listeners>
</batch:step>
I did not find any posts related to my question. Any help would be appreciated.
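For reference, the retry pieces wired above can also be assembled programmatically; a minimal sketch follows (the retry limit and back-off values are placeholders standing in for the ${mongoloader.*} properties).

import java.util.HashMap;
import java.util.Map;
import org.springframework.dao.DuplicateKeyException;
import org.springframework.data.mongodb.CannotGetMongoDbConnectionException;
import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

public class RetryTemplateSketch {

    public static RetryTemplate taskBatchRetryTemplate() {
        // Retry only the listed exception types, up to 3 attempts (placeholder for ${mongoloader.retry.limit})
        Map<Class<? extends Throwable>, Boolean> retryable = new HashMap<>();
        retryable.put(CannotGetMongoDbConnectionException.class, true);
        retryable.put(DuplicateKeyException.class, true);
        SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy(3, retryable);

        // Exponential back-off mirroring the XML (placeholder intervals in milliseconds)
        ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
        backOff.setInitialInterval(1000L);
        backOff.setMaxInterval(30000L);
        backOff.setMultiplier(2.0);

        RetryTemplate template = new RetryTemplate();
        template.setRetryPolicy(retryPolicy);
        template.setBackOffPolicy(backOff);
        return template;
    }
}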

Using optional fields with StaxEventItemReader

I have a Spring Batch application and I'm using StaxEventItemReader as my ItemReader. By default, XStream requires us to declare a property for each possible XML tag, or else it throws an UnknownFieldException. There are ways to code around this in Java, but with Spring Batch the item reader doesn't seem to offer a way to do that. Is there a way to flag fields as optional in the XML?
My beans are configured basically like this:
<job id="synchronizecustomerData" xmlns="http://www.springframework.org/schema/batch">
<step id="readWritecustomers">
<tasklet>
<chunk reader="customerReader"
processor="customerProcessor"
writer="customerSyncWriter"
commit-interval="1"
skip-policy="alwaysSkip" >
</chunk>
</tasklet>
</step>
</job>
<bean id="customerReader" class="org.springframework.batch.item.xml.StaxEventItemReader">
<property name="fragmentRootElementName" value="customer" />
<property name="resource" ref="inputResource" />
<property name="unmarshaller" ref="customerMarshaller" />
</bean>
<bean id="inputResource" class="org.springframework.core.io.FileSystemResource">
<constructor-arg value="c:/sf/data.xml" />
</bean>
<bean id="customerMarshaller" class="org.springframework.oxm.xstream.XStreamMarshaller">
<property name="aliases">
<util:map id="aliases">
<entry key="customer" value="com.company.batchmaster.sf.beans.customer" />
<entry key="name" value="java.lang.String" />
</util:map>
</property>
</bean>
<bean id="customerProcessor" class="org.springframework.batch.item.support.CompositeItemProcessor">
<property name="delegates">
<list>
<ref bean="customerTransformer" />
</list>
</property>
</bean>
<bean id="customerTransformer" class="com.company.batchmaster.sf.chunk.customerTransformer" />
<bean id="customerSyncWriter" class="com.company.batchmaster.sf.chunk.customerSyncWriter" />
My import file looks like this (just getting it up and running):
<?xml version="1.0" encoding="UTF-8"?>
<records>
<customer xmlns="http://springframework.org/batch/sample/io/oxm/domain">
<name>ABC Dealer</name>
<types>CR</types>
</customer>
</records>
Thanks for any help.
I am assuming the Customer class has properties name and types.
Annotate the optional field with XmlAttribute.defaultValue() as described in the JAXB guide.
There is no need for the <entry key="name" value="java.lang.String" /> alias, because you are unmarshalling a complete Customer object from the node specified by <property name="fragmentRootElementName" value="customer" />.
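For illustration, here is the assumed shape of the Customer target class (field names mirroring the XML elements; by default XStream maps elements to fields by name):

package com.company.batchmaster.sf.beans;

// Assumed target class for the "customer" alias above; field names mirror the XML elements.
public class Customer {

    private String name;   // <name>ABC Dealer</name>
    private String types;  // <types>CR</types>

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getTypes() { return types; }
    public void setTypes(String types) { this.types = types; }
}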

I get "Writer must be open before it can be written to" while using ClassifierCompositeItemWriter

The question states my problem. Can you not use FlatFileItemWriters (FFIW) as the writers on anything but simple chunk processing with a single writer? I'm new.
I've attempted to inject an FFIW into ItemProcessors and gotten the same thing. Perhaps I need to write my own custom writers. I was trying to leverage the FFIW to do the work, because all I need is to sift the one input file and populate three outfiles. My routerDelegate works fine, no problems there. Just fails on the write because the file is not open, and I can't see how to manually open it (which I think is the wrong approach, even if I could).
Thanks...
here's my code:
<batch:step id="processCustPermits" next="somethingElse">
<batch:description>Sift permits</batch:description>
<batch:tasklet>
<batch:chunk reader="custPermitReader" writer="custPermitCompositeWriter"
commit-interval="1" />
</batch:tasklet>
</batch:step>
<bean id="custPermitCompositeWriter"
class="org.springframework.batch.item.support.ClassifierCompositeItemWriter">
<property name="classifier">
<bean
class="org.springframework.batch.classify.BackToBackPatternClassifier">
<property name="routerDelegate" ref="permitRouterClassifier" />
<property name="matcherMap">
<map>
<entry key="hierarchy" value-ref="custPermitWriter" />
<entry key="omit" value-ref="custPermitOmithWriter" />
<entry key="trash" value-ref="custPermitTrashWriter" />
</map>
</property>
</bean>
</property>
</bean>
<bean id="custPermitWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" value="${sap.cust.permit.outfile.heirarchy}" />
<property name="lineAggregator" ref="passThroughLineAggregator" />
<property name="shouldDeleteIfExists" value="true" />
<property name="shouldDeleteIfEmpty" value="false" />
</bean>
<bean id="custPermitOmithWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" value="${sap.cust.permit.outfile.omits}" />
<property name="lineAggregator" ref="passThroughLineAggregator" />
<property name="shouldDeleteIfExists" value="true" />
<property name="shouldDeleteIfEmpty" value="true" />
</bean>
<bean id="custPermitTrashWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" value="${sap.cust.permit.outfile.trash}" />
<property name="lineAggregator" ref="passThroughLineAggregator" />
<property name="shouldDeleteIfExists" value="true" />
<property name="shouldDeleteIfEmpty" value="true" />
</bean>
Sometimes you just have to read really closely. I added the streams element to my chunk element and voila!
<batch:step id="processCustPermits" next="somethingElse">
<batch:description>Sort out unwanted permits</batch:description>
<batch:tasklet>
<batch:chunk reader="custPermitReader" writer="custPermitCompositeWriter"
commit-interval="1">
<batch:streams>
<batch:stream ref="custPermitWriter" />
<batch:stream ref="custPermitOmithWriter" />
<batch:stream ref="custPermitTrashWriter" />
</batch:streams>
</batch:chunk>
</batch:tasklet>
</batch:step>
For those who prefer a Java configuration to XML configuration, it is done as follows:
@Bean
public Step processCustPermits(StepBuilderFactory stepBuilderFactory,
        @Qualifier("custPermitReader") ItemReader<Wscpos> custPermitReader,
        @Qualifier("custPermitCompositeWriter") ItemWriter<Wscpos> custPermitCompositeWriter,
        @Qualifier("custPermitWriter") FlatFileItemWriter<Wscpos> custPermitWriter,
        @Qualifier("custPermitOmithWriter") FlatFileItemWriter<Wscpos> custPermitOmithWriter,
        @Qualifier("custPermitTrashWriter") FlatFileItemWriter<Wscpos> custPermitTrashWriter) {
    return stepBuilderFactory.get("processCustPermits")
            .<Wscpos, Wscpos>chunk(1)
            .reader(custPermitReader)
            .writer(custPermitCompositeWriter)
            .stream(custPermitWriter)
            .stream(custPermitOmithWriter)
            .stream(custPermitTrashWriter)
            .build();
}
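Registering the delegates as streams is what matters here: ClassifierCompositeItemWriter does not itself implement ItemStream, so unless the FlatFileItemWriters are registered with the step explicitly, they are never opened before the first write.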

Spring Batch - Steps to Improve Performance

I am currently developing data loaders: reading a file and writing to a database. I am using a partition handler to process multiple comma-separated files in 30 threads. I want to scale and increase throughput. Daily I receive 15,000 files (each having 1 million records); how do I scale this with Spring Batch so that the job completes within a day? Is there any open-source grid computing framework that can do this, or are there simple fine-tuning steps?
The Spring Batch data loader runs standalone; there is no web container involved. It runs on a single Solaris machine with 24 CPUs. The data is written to a single database, with default isolation and propagation. The XML config is given below:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:batch="http://www.springframework.org/schema/batch"
xmlns:p="http://www.springframework.org/schema/p"
xmlns:aop="http://www.springframework.org/schema/aop"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:tx="http://www.springframework.org/schema/tx"
xmlns:task="http://www.springframework.org/schema/task"
xsi:schemaLocation="http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop.xsd
http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task-3.0.xsd">
<!-- IMPORT DB CONFIG -->
<import resource="classpath:bom/bom/bomloader/job/DataSourcePoolConfig.xml" />
<!-- USE ANNOTATIONS TO CONFIGURE SPRING BEANS -->
<context:component-scan base-package="bom.bom.bom" />
<!-- INJECT THE PROCESS PARAMS HASHMAP BEFORE CONTEXT IS INITIALISED -->
<bean id="holder" class="bom.bom.bom.loader.util.PlaceHolderBean" >
<property name="beanName" value="holder"/>
</bean>
<bean id="logger" class="bom.bom.bom.loader.util.PlaceHolderBean" >
<property name="beanName" value="logger"/>
</bean>
<bean id="dataMap" class="java.util.concurrent.ConcurrentHashMap" />
<!-- JOB REPOSITORY - WE USE DATABASE REPOSITORY -->
<!-- <bean id="jobRepository" class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean" >-->
<!-- <property name="transactionManager" ref="frdtransactionManager" />-->
<!-- <property name="dataSource" ref="frddataSource" />-->
<!-- <property name="databaseType" value="oracle" />-->
<!-- <property name="tablePrefix" value="batch_"/> -->
<!-- </bean>-->
<!-- JOB REPOSITORY - WE IN MEMORY REPOSITORY -->
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
<property name="transactionManager" ref="frdtransactionManager" />
</bean>
<!-- <bean id="jobExplorer" class="org.springframework.batch.core.explore.support.JobExplorerFactoryBean">-->
<!-- <property name="dataSource" ref="frddataSource" />-->
<!-- <property name="tablePrefix" value="batch_"/> -->
<!-- </bean>-->
<!-- LAUNCH JOBS FROM A REPOSITORY -->
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
<property name="taskExecutor">
<bean class="org.springframework.core.task.SyncTaskExecutor" />
</property>
</bean>
<!-- CONFIGURE SCHEDULING IN QUARTZ -->
<!-- <bean id="jobDetail" class="org.springframework.scheduling.quartz.JobDetailBean">-->
<!-- <property name="jobClass" value="bom.bom.bom.assurance.core.JobLauncherDetails" />-->
<!-- <property name="group" value="quartz-batch" />-->
<!-- <property name="jobDataAsMap">-->
<!-- <map>-->
<!-- <entry key="jobName" value="${jobname}"/>-->
<!-- <entry key="jobLocator" value-ref="jobRegistry"/>-->
<!-- <entry key="jobLauncher" value-ref="jobLauncher"/>-->
<!-- </map>-->
<!-- </property>-->
<!-- </bean>-->
<!-- RUN EVERY 2 HOURS -->
<!-- <bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean">-->
<!-- <property name="triggers">-->
<!-- <bean id="cronTrigger" class="org.springframework.scheduling.quartz.CronTriggerBean">-->
<!-- <property name="jobDetail" ref="jobDetail" />-->
<!-- <property name="cronExpression" value="2/0 * * * * ?" />-->
<!-- </bean>-->
<!-- </property>-->
<!-- </bean>-->
<!-- -->
<!-- RUN STANDALONE -->
<bean id="jobRunner" class="bom.bom.bom.loader.core.DataLoaderJobRunner">
<constructor-arg value="${LOADER_NAME}" />
</bean>
<!-- Get all the files for the exchanges and feed as resource to the MultiResourcePartitioner -->
<bean id="fileresource" class="bom.bom.bom.loader.util.FiltersFoldersResourceFactory" p:dataMap-ref="dataMap">
<property name="filePath" value="${PARENT_PATH}" />
<property name="acceptedFolders" value="${EXCH_CODE}" />
<property name="logger" ref="logger" />
</bean>
<!-- The network Data Loading Configuration goes here -->
<job id="CDR_network _PARALLEL" xmlns="http://www.springframework.org/schema/batch" restartable="false" >
<step id="PREPARE_CLEAN" >
<flow parent="prepareCleanFlow" />
<next on="COMPLETED" to="LOAD_EXCHANGE_DATA" />
<fail on="FAILED" exit-code="Failed on cleaning error records."/>
</step>
<step id="LOAD_EXCHANGE_DATA" >
<tasklet ref="businessData" transaction-manager="ratransactionManager" />
<next on="COMPLETED" to="LOAD_CDR_FILES" />
<fail on="FAILED" exit-code="FAILED ON LOADING EXCHANGE INFORMATION FROM DB." />
</step>
<step id="LOAD_CDR_FILES" >
<tasklet ref="fileresource" transaction-manager="frdtransactionManager" />
<next on="COMPLETED" to="PROCESS_FILE_TO_STAGING_TABLE_PARALLEL" />
<fail on="FAILED" exit-code="FAILED ON LOADING CDR FILES." />
</step>
<step id="PROCESS_FILE_TO_STAGING_TABLE_PARALLEL" next="limitDecision" >
<partition step="filestep" partitioner="filepartitioner" >
<handler grid-size="100" task-executor="executorWithCallerRunsPolicy" />
</partition>
</step>
<decision id="limitDecision" decider="limitDecider">
<next on="COMPLETED" to="MOVE_RECS_STAGING_TO_MAIN_TABLE" />
<next on="CONTINUE" to="PROCESS_FILE_TO_STAGING_TABLE_PARALLEL" />
</decision>
<step id="MOVE_RECS_STAGING_TO_MAIN_TABLE" >
<tasklet ref="moveRecords" transaction-manager="ratransactionManager" >
<transaction-attributes isolation="SERIALIZABLE"/>
</tasklet>
<fail on="FAILED" exit-code="FAILED ON MOVING DATA TO THE MAIN TABLE." />
<next on="*" to="PREPARE_ARCHIVE"/>
</step>
<step id="PREPARE_ARCHIVE" >
<flow parent="prepareArchiveFlow" />
<fail on="FAILED" exit-code="FAILED ON Archiving files" />
<end on="*" />
</step>
</job>
<flow id="prepareCleanFlow" xmlns="http://www.springframework.org/schema/batch">
<step id="CLEAN_ERROR_RECORDS" next="archivefileExistsDecisionInFlow" >
<tasklet ref="houseKeeping" transaction-manager="ratransactionManager" />
</step>
<decision id="archivefileExistsDecisionInFlow" decider="archivefileExistsDecider">
<end on="NO_ARCHIVE_FILE" />
<next on="ARCHIVE_FILE_EXISTS" to="runprepareArchiveFlow" />
</decision>
<step id="runprepareArchiveFlow" >
<flow parent="prepareArchiveFlow" />
</step>
</flow>
<flow id="prepareArchiveFlow" xmlns="http://www.springframework.org/schema/batch" >
<step id="ARCHIVE_CDR_FILES" >
<tasklet ref="archiveFiles" transaction-manager="frdtransactionManager" />
</step>
</flow>
<bean id="archivefileExistsDecider" class="bom.bom.bom.loader.util.ArchiveFileExistsDecider" >
<property name="logger" ref="logger" />
<property name="frdjdbcTemplate" ref="frdjdbcTemplate" />
</bean>
<bean id="filepartitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner" scope="step" >
<property name="resources" value="#{dataMap[processFiles]}"/>
</bean>
<task:executor id="executorWithCallerRunsPolicy"
pool-size="90-95"
queue-capacity="6"
rejection-policy="CALLER_RUNS"/>
<!-- <bean id="dynamicJobParameters" class="bom.bom.bom.assurance.core.DynamicJobParameters" />-->
<bean id="houseKeeping" class="bom.bom.bom.loader.core.HousekeepingOperation">
<property name="logger" ref="logger" />
<property name="jdbcTemplate" ref="rajdbcTemplate" />
<property name="frdjdbcTemplate" ref="frdjdbcTemplate" />
</bean>
<bean id="businessData" class="bom.bom.bom.loader.core.BusinessValidatorData">
<property name="logger" ref="logger" />
<property name="jdbcTemplate" ref="NrajdbcTemplate" />
<property name="param" value="${EXCH_CODE}" />
<property name="sql" value="${LOOKUP_QUERY}" />
</bean>
<step id="filestep" xmlns="http://www.springframework.org/schema/batch">
<tasklet transaction-manager="ratransactionManager" allow-start-if-complete="true" >
<chunk writer="jdbcItenWriter" reader="fileItemReader" processor="itemProcessor" commit-interval="500" retry-limit="2">
<retryable-exception-classes>
<include class="org.springframework.dao.DeadlockLoserDataAccessException"/>
</retryable-exception-classes>
</chunk>
<listeners>
<listener ref="customStepExecutionListener">
</listener>
</listeners>
</tasklet>
</step>
<bean id="moveRecords" class="bom.bom.bom.loader.core.MoveDataFromStaging">
<property name="logger" ref="logger" />
<property name="jdbcTemplate" ref="rajdbcTemplate" />
</bean>
<bean id="archiveFiles" class="bom.bom.bom.loader.core.ArchiveCDRFile" >
<property name="logger" ref="logger" />
<property name="jdbcTemplate" ref="frdjdbcTemplate" />
<property name="archiveFlag" value="${ARCHIVE_FILE}" />
<property name="archiveDir" value="${ARCHIVE_LOCATION}" />
</bean>
<bean id="limitDecider" class="bom.bom.bom.loader.util.LimitDecider" p:dataMap-ref="dataMap">
<property name="logger" ref="logger" />
</bean>
<!-- <bean id="multifileReader" class="org.springframework.batch.item.file.MultiResourceItemReader" scope="step" >-->
<!-- <property name="resources" value="#{stepExecutionContext[fileName]}" />-->
<!-- <property name="delegate" ref="fileItemReader" />-->
<!-- </bean>-->
<!-- READ EACH FILE PARALLELY -->
<bean id="fileItemReader" scope="step" autowire-candidate="false" parent="itemReaderParent">
<property name="resource" value="#{stepExecutionContext[fileName]}" />
<property name="saveState" value="false" />
</bean>
<!-- LISTEN AT THE END OF EACH FILE TO DO POST PROCESSING -->
<bean id="customStepExecutionListener" class="bom.bom.bom.loader.core.StagingStepExecutionListener" scope="step">
<property name="logger" ref="logger" />
<property name="frdjdbcTemplate" ref="frdjdbcTemplate" />
<property name="jdbcTemplate" ref="rajdbcTemplate" />
<property name="sql" value="${INSERT_IA_QUERY_COLUMNS}" />
</bean>
<!-- CONFIGURE THE ITEM PROCESSOR TO DO BUSINESS LOGIC ON EACH ITEM -->
<bean id="itemProcessor" class="bom.bom.bom.loader.core.StagingLogicProcessor" scope="step">
<property name="logger" ref="logger" />
<property name="params" ref="businessData" />
</bean>
<!-- CONFIGURE THE JDBC ITEM WRITER TO WRITE IN TO DB -->
<bean id="jdbcItenWriter" class="org.springframework.batch.item.database.JdbcBatchItemWriter" scope="step">
<property name="dataSource" ref="radataSource"/>
<property name="sql">
<value>
<![CDATA[
${SQL1A}
]]>
</value>
</property>
<property name="itemSqlParameterSourceProvider">
<bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider">
</bean>
</property>
</bean>
<!-- <bean id="itemWriter" class="bom.bom.bom.assurance.core.LoaderDBWriter" scope="step">-->
<!-- <property name="sQL" value="${loader.sql}" />-->
<!-- <property name="jdbcTemplate" ref="NrajdbcTemplate" />-->
<!-- </bean>-->
<!-- CONFIGURE THE FLAT FILE ITEM READER TO READ INDIVIDUAL BATCH -->
<bean id="itemReaderParent" class="org.springframework.batch.item.file.FlatFileItemReader" abstract="true">
<property name="strict" value="false"/>
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.FixedLengthTokenizer">
<property name="names" value="${COLUMNS}" />
<property name="columns" value="${RANGE}" />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="bom.bom.bom.loader.util.DataLoaderMapper">
<property name="params" value="${BEANPROPERTIES}"/>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
What I have tried:
I could see that the ThreadPoolExecutor hangs after 3 hours. prstat on Solaris says it is processing, but there is no progress in the log.
I tried a smaller chunk size of 500 because of the memory footprint, but there was no progress.
Since it inserts into a single database (30 pooled connections), is there anything I can do here?
Instances from VisualVM.
Stack trace of the threads; all are locked at the connection level:
Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):
"Attach Listener" daemon prio=3 tid=0x00bbf800 nid=0x26 waiting on condition [0x00000000..0x00000000]
java.lang.Thread.State: RUNNABLE
"executorWithCallerRunsPolicy-1" prio=3 tid=0x008a7000 nid=0x25 runnable [0xd5a7d000..0xd5a7fb70]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Packet.java:240)
at oracle.net.ns.DataPacket.receive(DataPacket.java:92)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:172)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:117)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:92)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:77)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1034)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1010)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:588)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:194)
at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:953)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1222)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3387)
at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3468)
- locked <0xdbdafa30> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1350)
at org.springframework.jdbc.core.JdbcTemplate$2.doInPreparedStatement(JdbcTemplate.java:818)
at org.springframework.jdbc.core.JdbcTemplate$2.doInPreparedStatement(JdbcTemplate.java:1)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:587)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:812)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:868)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:876)
at
I would suggest you lower the chunk size to 50.
500 seems to be too big: you wait too long while talking to the DB.
At the same time, lower the TaskExecutor's pool size or increase your DB pool size.
You can choose which one by watching your DB host: if its CPU and IO are not maxed out, increase your DB pool size to increase the DB load. If your DB CPU is already at its maximum, lower the TaskExecutor's pool size. The objective is to have a fluid process.
I think the DB will be your main limiting element, so begin by adjusting the DB pool size according to the DB host's capacities. When that's done, adjust your TaskExecutor's pool size according to the DB pool size (TE pool size = DB pool size * 1.5) and to the batch host's capacities (CPU, memory, and IO).
Splitting your incoming files across multiple hard drives may help too (if possible).
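As a rough illustration of that sizing heuristic, the executor could be declared in Java along these lines (the 20-connection DB pool is an assumed figure; the CallerRuns policy mirrors the rejection-policy in the question's XML):

import java.util.concurrent.ThreadPoolExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class ExecutorSizingSketch {

    public static ThreadPoolTaskExecutor partitionExecutor() {
        int dbPoolSize = 20;                          // assumed DB connection pool size
        int workerThreads = (int) (dbPoolSize * 1.5); // heuristic from the answer above

        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(workerThreads);
        executor.setMaxPoolSize(workerThreads);
        executor.setQueueCapacity(10);
        executor.setThreadNamePrefix("partition-");
        // Same rejection behaviour as rejection-policy="CALLER_RUNS" in the XML
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}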
I think the problem here is the million records per file. Since you have already reduced the chunk size, you should process smaller files. For testing's sake, reduce the number of records in each file to 10k. My guess is that you are creating objects, doing some processing, and repeating this for 1M records in a loop. Each thread will hold its objects in memory until processing is completed. My guess is that, because of the volume of data, there are too many objects in memory that are not garbage collected. If reducing the size helps, you can try using lightweight objects in your code and setting each object to null at the end of processing.
Just a fix for your cron expression; this is the correct one for every 2 hours:
0 0 0/2 1/1 * ? *
Is your batch job relying on reflection (e.g., BeanPropertyRowMapper)? That can hamper performance.
If your database is causing problems, you may want to profile it; I don't have much concrete to offer here.
As already mentioned, drop that chunk size.

Making Spring Batch ItemReader restartable

The Spring Batch program I am working on reads data from a table. It uses the org.springframework.batch.item.database.JdbcCursorItemReader ItemReader. Earlier, the plan was to alter the table, add a PROCESSED_INDICATOR flag, and prepopulate it with status 'PENDING'. Once a record was processed, the writer would update the PROCESSED_INDICATOR flag to 'PROCESSED'. This was to support restartability: for example, if the batch picks up 1 million records and dies at half a million, then when I restart the batch it should start where it left off.
But unfortunately, management didn't approve this solution. I am digging into ways to make the ItemReader restartable. As per the Spring documentation, "Most ItemReaders have much more sophisticated restart logic. The JdbcCursorItemReader, for example, stores the row ID of the last processed row in the cursor."
Does anyone have a sample of such a custom reader which extends JdbcCursorItemReader and stores the last processed row in the cursor?
https://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html
==FULL XML CONFIGURATION==
<import resource="classpath:/batch/utility/skip/batch_skip.xml" />
<import resource="classpath:/batch/config/context-postgres.xml" />
<import resource="classpath:/batch/config/oracle-database.xml" />
<context:property-placeholder
location="classpath:/batch/jobs/TPF-1001-DD-01/TPF-1001-DD-01.properties" />
<bean id="gridSizePartitioner"
class="com.tpf.partitioner.GridSizePartitioner" />
<task:executor id="taskExecutor" pool-size="${pool.size}" />
<batch:job id="XYZJob" job-repository="jobRepository"
restartable="true">
<batch:step id="XYZSTEP">
<batch:description>Convert TIF files to PDF</batch:description>
<batch:partition partitioner="gridSizePartitioner">
<batch:handler task-executor="taskExecutor"
grid-size="${pool.size}" />
<batch:step>
<batch:tasklet allow-start-if-complete="true">
<batch:chunk commit-interval="${commit.interval}"
skip-limit="${job.skip.limit}">
<batch:reader>
<bean id="timeReader"
class="org.springframework.batch.item.database.JdbcCursorItemReader"
scope="step">
<property name="dataSource" ref="oracledataSource" />
<property name="sql">
<value>
select TIME_ID as timesheetId,count(*),max(CREATION_DATETIME) as creationDateTime , ILN_NUMBER as ilnNumber
from TS_FAKE_NAME
where creation_datetime >= '#{jobParameters['creation_start_date1']} 12.00.00.000000000 AM'
and creation_datetime < '#{jobParameters['creation_start_date2']} 11.59.59.999999999 PM'
and mod(time_id,${pool.size})=#{stepExecutionContext['partition.id']}
group by time_id ,ILN_NUMBER
</value>
</property>
<property name="rowMapper">
<bean
class="org.springframework.jdbc.core.BeanPropertyRowMapper">
<property name="mappedClass"
value="com.tpf.model.Time" />
</bean>
</property>
</bean>
</batch:reader>
<batch:processor>
<bean id="compositeItemProcessor"
class="org.springframework.batch.item.support.CompositeItemProcessor">
<property name="delegates">
<list>
<ref bean="timeProcessor" />
</list>
</property>
</bean>
</batch:processor>
<batch:writer>
<bean id="compositeItemWriter"
class="org.springframework.batch.item.support.CompositeItemWriter">
<property name="delegates">
<list>
<ref bean="timeWriter" />
</list>
</property>
</bean>
</batch:writer>
<batch:skippable-exception-classes>
<batch:include
class="com.utility.skip.BatchSkipException" />
</batch:skippable-exception-classes>
<batch:listeners>
<batch:listener ref="batchSkipListener" />
</batch:listeners>
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:partition>
</batch:step>
<batch:validator>
<bean
class="org.springframework.batch.core.job.DefaultJobParametersValidator">
<property name="requiredKeys">
<list>
<value>batchRunNumber</value>
<value>creation_start_date1</value>
<value>creation_start_date2</value>
</list>
</property>
</bean>
</batch:validator>
</batch:job>
<bean id="timesheetWriter" class="com.tpf.writer.TimeWriter"
scope="step">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="timeProcessor"
class="com.tpf.processor.TimeProcessor" scope="step">
<property name="dataSource" ref="oracledataSource" />
</bean>
Does anyone have a sample of such a custom reader which extends JdbcCursorItemReader and stores the last processed row in the cursor?
The JdbcCursorItemReader already does that; see its Javadoc, here is an excerpt:
ExecutionContext: The current row is returned as restart data,
and when restored from that same data, the cursor is opened and the current row
set to the value within the restart data.
So you don't need a custom reader.
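A minimal sketch of a reader relying on that behaviour (placeholder SQL; the important parts are the reader name, under which the restart data is keyed, and saveState, which defaults to true):

import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.jdbc.core.BeanPropertyRowMapper;
import com.tpf.model.Time;

public class TimeReaderSketch {

    public static JdbcCursorItemReader<Time> timeReader(DataSource dataSource) {
        JdbcCursorItemReader<Time> reader = new JdbcCursorItemReader<>();
        reader.setName("timeReader");     // key used to store the current row index in the ExecutionContext
        reader.setDataSource(dataSource);
        reader.setSql("SELECT ... FROM TS_FAKE_NAME WHERE ...");  // placeholder for the query in the question
        reader.setRowMapper(new BeanPropertyRowMapper<>(Time.class));
        reader.setSaveState(true);        // default; on restart the cursor is reopened and advanced to the saved row
        return reader;
    }
}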