I get "Writer must be open before it can be written to" while using ClassifierCompositeItemWriter - spring-batch

The title states my problem. Can FlatFileItemWriters (FFIW) only be used as writers in simple chunk processing with a single writer? I'm new to Spring Batch.
I've also tried injecting an FFIW into ItemProcessors and got the same error. Perhaps I need to write my own custom writers, but I was hoping to let the FFIW do the work, because all I need is to sift one input file into three output files. My routerDelegate works fine; the job fails only on the write because the file is not open, and I can't see how to open it manually (which I suspect is the wrong approach even if I could).
Thanks...
Here's my code:
<batch:step id="processCustPermits" next="somethingElse">
<batch:description>Sift permits</batch:description>
<batch:tasklet>
<batch:chunk reader="custPermitReader" writer="custPermitCompositeWriter"
commit-interval="1" />
</batch:tasklet>
</batch:step>
<bean id="custPermitCompositeWriter"
class="org.springframework.batch.item.support.ClassifierCompositeItemWriter">
<property name="classifier">
<bean
class="org.springframework.batch.classify.BackToBackPatternClassifier">
<property name="routerDelegate" ref="permitRouterClassifier" />
<property name="matcherMap">
<map>
<entry key="hierarchy" value-ref="custPermitWriter" />
<entry key="omit" value-ref="custPermitOmithWriter" />
<entry key="trash" value-ref="custPermitTrashWriter" />
</map>
</property>
</bean>
</property>
</bean>
<bean id="custPermitWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" value="${sap.cust.permit.outfile.heirarchy}" />
<property name="lineAggregator" ref="passThroughLineAggregator" />
<property name="shouldDeleteIfExists" value="true" />
<property name="shouldDeleteIfEmpty" value="false" />
</bean>
<bean id="custPermitOmithWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" value="${sap.cust.permit.outfile.omits}" />
<property name="lineAggregator" ref="passThroughLineAggregator" />
<property name="shouldDeleteIfExists" value="true" />
<property name="shouldDeleteIfEmpty" value="true" />
</bean>
<bean id="custPermitTrashWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" value="${sap.cust.permit.outfile.trash}" />
<property name="lineAggregator" ref="passThroughLineAggregator" />
<property name="shouldDeleteIfExists" value="true" />
<property name="shouldDeleteIfEmpty" value="true" />
</bean>

Sometimes you just have to read really closely. I added a batch:streams element inside my chunk element, registering each FlatFileItemWriter as a stream, and voila!
<batch:step id="processCustPermits" next="somethingElse">
<batch:description>Sort out unwanted permits</batch:description>
<batch:tasklet>
<batch:chunk reader="custPermitReader" writer="custPermitCompositeWriter"
commit-interval="1">
<batch:streams>
<batch:stream ref="custPermitWriter" />
<batch:stream ref="custPermitOmithWriter" />
<batch:stream ref="custPermitTrashWriter" />
</batch:streams>
</batch:chunk>
</batch:tasklet>
</batch:step>

For those who prefer a Java configuration to XML configuration, it is done as follows:
@Bean
public Step processCustPermits(StepBuilderFactory stepBuilderFactory,
        @Qualifier("custPermitReader") ItemReader<Wscpos> custPermitReader,
        @Qualifier("custPermitCompositeWriter") ItemWriter<Wscpos> custPermitCompositeWriter,
        @Qualifier("custPermitWriter") FlatFileItemWriter<Wscpos> custPermitWriter,
        @Qualifier("custPermitOmithWriter") FlatFileItemWriter<Wscpos> custPermitOmithWriter,
        @Qualifier("custPermitTrashWriter") FlatFileItemWriter<Wscpos> custPermitTrashWriter)
{
    return stepBuilderFactory.get("processCustPermits")
            .<Wscpos, Wscpos> chunk(1)
            .reader(custPermitReader)
            .writer(custPermitCompositeWriter)
            // register every delegate writer as a stream so the step opens and closes it
            .stream(custPermitWriter)
            .stream(custPermitOmithWriter)
            .stream(custPermitTrashWriter)
            .build();
}
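
Why registering the delegates as streams matters can be shown with a toy model. As far as I understand it, the step only opens the streams it knows about, and ClassifierCompositeItemWriter is a plain ItemWriter that does not open its delegates itself. The classes below (ToyFileWriter, ToyClassifierWriter, ToyStep) are made-up stand-ins that only mimic that lifecycle; they are not Spring Batch classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy stand-in for a file writer: it refuses to write until open() was called.
class ToyFileWriter {
    private boolean open = false;
    final List<String> lines = new ArrayList<>();

    void open() { open = true; }

    void write(String item) {
        if (!open) {
            throw new IllegalStateException("Writer must be open before it can be written to");
        }
        lines.add(item);
    }
}

// Toy stand-in for the classifier writer: it routes items to a delegate,
// but has no open() of its own and never opens the delegates.
class ToyClassifierWriter {
    private final Map<String, ToyFileWriter> delegates;

    ToyClassifierWriter(Map<String, ToyFileWriter> delegates) { this.delegates = delegates; }

    void write(String key, String item) { delegates.get(key).write(item); }
}

// Toy stand-in for the step: it opens only the streams registered with it,
// then writes a single item through the classifier writer.
class ToyStep {
    static void run(ToyClassifierWriter writer, List<ToyFileWriter> registeredStreams) {
        for (ToyFileWriter stream : registeredStreams) {
            stream.open();
        }
        writer.write("trash", "some permit line");
    }
}
```

Running ToyStep with an empty stream list fails with the same message as in the question, while passing the delegate in the list lets the write go through. That mirrors what the batch:streams element (or .stream(...) in Java config) does for the real writers.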

Related

Spring Batch update/insert on a heavy DB table takes too much time

I have implemented a Spring Batch job that reads data from CSV files and inserts or updates rows based on the records.
One table, XXX037, has 800k records, and inserting into and updating it takes too much time.
I have configured the job with a commit-interval of 1000, but processing is still slow.
Is there any way I can improve performance?
Configuration:
<batch:job id="pcqJob">
<batch:step id="pcqStep">
<batch:tasklet>
<batch:chunk reader="pcqReader" writer="compositeWriter" commit-interval="1000">
<!-- <batch:skippable-exception-classes>
<batch:include class="javax.persistence.PersistenceException"/>
</batch:skippable-exception-classes> -->
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
<!-- <bean id="skipPolicy" class="com.test.domain.services.writer.SkipPolicy">
<property name="skipLimit" value="2500"/>
</bean> -->
<bean id="compositeWriter" class="org.springframework.batch.item.support.ClassifierCompositeItemWriter">
<property name="classifier">
<bean class="org.springframework.classify.BackToBackPatternClassifier">
<property name="routerDelegate">
<bean class="com.test.domain.services.writer.ItemCodeClassifier" />
</property>
<property name="matcherMap">
<map>
<entry key="*Doss*" value-ref="fileItemWriter1" />
<entry key="*Ldt*" value-ref="fileItemWriter2" />
<entry key="*Old*" value-ref="oldDossierfileItemWriter" />
<entry key="*Tpm*" value-ref="tpmfileItemWriter" />
<entry key="*Txm*" value-ref="tpxfileItemWriter" />
<entry key="*DoD*" value-ref="dossierDeletefileItemWriter" />
<entry key="*LdD*" value-ref="ldtDeletefileItemWriter" />
<entry key="*TpD*" value-ref="tpmDeletefileItemWriter" />
<entry key="*TxD*" value-ref="txmDeletefileItemWriter" />
</map>
</property>
</bean>
</property>
</bean>

Retry logic using AOP is not working for MongoItemWriter, but the same is working for my custom readers and writers

Retry logic using AOP is not working for MongoItemWriter, although the same advice works for my custom readers and writers.
Is there anything I'm doing wrong here?
Below is the code snippet.
<bean id="retryAdvice"
class="org.springframework.retry.interceptor.RetryOperationsInterceptor">
<property name="retryOperations" ref="taskBatchRetryTemplate" />
</bean>
<bean id="taskBatchRetryTemplate" class="org.springframework.retry.support.RetryTemplate">
<property name="retryPolicy" ref="genericRetryPolicy" />
<property name="backOffPolicy">
<bean class="org.springframework.retry.backoff.ExponentialBackOffPolicy">
<property name="initialInterval" value="${mongoloader.backOffPeriod.initialInterval}"/>
<property name="maxInterval" value="${mongoloader.backOffPeriod.maxInterval}"/>
<property name="multiplier" value="${mongoloader.backOffPeriod.multiplier}"/>
</bean>
</property>
<property name="listeners">
<bean class="com.company.ens.myload.job.StepRetryListener"/>
</property>
</bean>
<bean id="genericRetryPolicy" class="org.springframework.retry.policy.SimpleRetryPolicy" >
<constructor-arg index="0" value="${mongoloader.retry.limit}"/>
<constructor-arg index="1">
<map>
<entry key="org.springframework.data.mongodb.CannotGetMongoDbConnectionException" value="true"/>
<entry key="org.springframework.jdbc.CannotGetJdbcConnectionException" value="true"/>
<!-- Just included the below exception for testing purpose, needs to be removed -->
<entry key="java.io.FileNotFoundException" value="true"/>
<entry key="org.springframework.dao.DuplicateKeyException" value="true"/>
</map>
</constructor-arg>
</bean>
<!-- including only my step configuration here -->
<batch:step id="Step4a-MainFlow_TranslateRawFedObjectsToFilingModel"
allow-start-if-complete="false">
<batch:tasklet>
<batch:chunk reader="rawFedObjectMongoReader"
processor="translatingProcessor" writer="SctModelMongoWriter"
commit-interval="100"/>
</batch:tasklet>
<batch:listeners>
<batch:listener ref="step4aListener"/>
</batch:listeners>
</batch:step>
Did not find any posts related to my question. Any help would be appreciated.

Mapping JPA entity to more than one entityManagers with SpringBatch program

I have developed a Spring Batch application and deployed it as a web application in a WebSphere Liberty profile container. The batch program reads records from a table and invokes an HTTP service. Based on the service response, a column named status is updated to RECORD_SENT, COMPLETE, or ERROR.
The objective is to reuse the same program for multiple data sources. The data source is selected through a job parameter that carries the client type. The data sources are in different schemas but share the same data model.
Question: How can the transaction manager be chosen at run time inside a job step or tasklet? Any help would be appreciated.
Configuration:
<bean id="entityManagerFactory1"
class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
<property name="dataSource" ref="dataSource1" />
<property name="persistenceUnitName" value="user" />
<property name="jpaVendorAdapter">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
<property name="showSql" value="false" />
</bean>
</property>
<property name="jpaDialect">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaDialect" />
</property>
</bean>
<bean id="entityManagerFactory2"
class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
<property name="dataSource" ref="dataSource2" />
<property name="persistenceUnitName" value="user" />
<property name="jpaVendorAdapter">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
<property name="showSql" value="false" />
</bean>
</property>
<property name="jpaDialect">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaDialect" />
</property>
</bean>
<bean id="entityManagerSelector" class="com.spring.jpa.test.EntitymanagerSelector">
<property name="entityManagerFactory1" ref="entityManagerFactory1"></property>
<property name="entityManagerFactory2" ref="entityManagerFactory2"></property>
</bean>
job.xml snippet
<bean id="itemReader" class="org.springframework.batch.item.database.JpaPagingItemReader" scope="step">
<property name="entityManagerFactory" value="#{entityManagerSelector.getEntitymanagerForClient({jobParameters['client']})}" />
<property name="queryString" value="select u from User u where u.age > #{jobParameters['age']}" />
</bean>
Setting the job parameters during runtime to identify the client
JobParameters param = new JobParametersBuilder()
.addString("age", "20").addString("client", "client2")
.toJobParameters();
JobExecution execution = jobLauncher.run(job, param);
It is not possible to set the transaction-manager of a step/tasklet at runtime. You are better off creating a separate job for each client, each using its own transaction manager in the tasklet.
<bean id="transactionManager1" class="org.springframework.orm.jpa.JpaTransactionManager">
<property name="entityManagerFactory" ref="entityManagerFactory1" />
</bean>
<bean id="transactionManager2" class="org.springframework.orm.jpa.JpaTransactionManager">
<property name="entityManagerFactory" ref="entityManagerFactory2" />
</bean>
Now use these transaction managers when creating the batch jobs:
<job id="testJob1" xmlns="http://www.springframework.org/schema/batch">
<step id="client1step1">
<tasklet transaction-manager="transactionManager1">
<chunk reader="itemReader" writer="itemWriter" commit-interval="1" />
</tasklet>
</step>
</job>
<job id="testJob2" xmlns="http://www.springframework.org/schema/batch">
<step id="client2step2">
<tasklet transaction-manager="transactionManager2">
<chunk reader="itemReader" writer="itemWriter" commit-interval="1" />
</tasklet>
</step>
</job>
Let me know if this works out.

Passing data through two steps - Custom Field Mapper and Custom Tasklet

This is my job configuration:
<batch:job id="clientesJob" job-repository="jobRepository">
<batch:step id="step1" next="renameFiles">
<tasklet>
<chunk reader="multiResourceReader" writer="sqlWriter"
commit-interval="1" />
</tasklet>
</batch:step>
<batch:step id="renameFiles">
<tasklet ref="fileRenamingTasklet" />
</batch:step>
</batch:job>
<bean id="multiResourceReader"
class="org.springframework.batch.item.file.MultiResourceItemReader">
<property name="resources" value="file:c:/cvs/basecli*" />
<property name="delegate" ref="flatFileItemReader" />
</bean>
<bean id="flatFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="fieldSetMapper" ref="clienteMapper" />
<property name="lineTokenizer" ref="tickerLineTokenizer" />
</bean>
</property>
</bean>
<bean name="tickerLineTokenizer"
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer" />
<bean id="clienteMapper" class="com.bind.mapper.ClienteFieldSetMapper">
</bean>
<bean id="fileRenamingTasklet" class="com.bind.tasklet.FileRenamingTasklet">
<property name="directory" value="file:c:/cvs/" />
</bean>
In the first step I read the folder with a MultiResourceItemReader and write the records to SQL Server.
The second step renames the files to "PROCESSFILE-{originalname}".
What I want to achieve: if there was a problem in the first step, rename the file differently, e.g. "PROCESSERROR-{originalname}".
So I need to know the status of the first step in my FileRenamingTasklet.
I read about storing data in the step ExecutionContext, but I can't access it in ClienteFieldSetMapper.
I also tried using listeners, but I couldn't pass the data through them.
For further processing I need both the file name and the status.
Any ideas?
Make your fileRenamingTasklet a StepExecutionListener and have it listen for the result of step1: in StepExecutionListener.afterStep(StepExecution stepExecution), check stepExecution.getExitStatus(), and you can then rename your files correctly.
To add the listener, modify your XML as follows:
<batch:step id="step1" next="renameFiles">
<tasklet>
<chunk reader="multiResourceReader" writer="sqlWriter" commit-interval="1" />
</tasklet>
<listeners>
<listener ref="fileRenamingTasklet" />
</listeners>
</batch:step>
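
The renaming decision itself can be sketched in plain Java. This is only an illustration under assumptions: FileRenamePolicy and renamedName are hypothetical names, and the Spring Batch wiring is left out so the snippet compiles on its own. In the real tasklet, afterStep would presumably remember stepExecution.getExitStatus().getExitCode() in a field, and the tasklet's execute method would use it when renaming.

```java
// Hypothetical helper, not part of Spring Batch.
class FileRenamePolicy {

    // exitCode stands for the value of stepExecution.getExitStatus().getExitCode()
    // captured in afterStep; "COMPLETED" is the exit code of ExitStatus.COMPLETED.
    static String renamedName(String exitCode, String originalName) {
        // Anything other than a clean completion is treated as an error here.
        String prefix = "COMPLETED".equals(exitCode) ? "PROCESSFILE-" : "PROCESSERROR-";
        return prefix + originalName;
    }
}
```

For example, FileRenamePolicy.renamedName("FAILED", "basecli01.csv") yields "PROCESSERROR-basecli01.csv" (the file name is made up for the example).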

Making Spring Batch ItemReader restartable

The Spring Batch program I am working on reads data from a table using ‘org.springframework.batch.item.database.JdbcCursorItemReader’. The original plan was to alter the table, add a PROCESSED_INDICATOR flag, and prepopulate it with the status ‘PENDING’. Once a record is processed, the writer would update the PROCESSED_INDICATOR flag to ‘Processed’. This was to support restartability: for example, if the batch picks up 1 million records and dies after half a million, then when I restart the batch it should resume where it left off.
Unfortunately, management didn’t approve this solution, so I am looking into ways to make the ItemReader restartable. As per the Spring documentation: “Most ItemReaders have much more sophisticated restart logic. The JdbcCursorItemReader, for example, stores the row id of the last processed row in the Cursor.”
Does anyone have a sample of such a custom reader that extends JdbcCursorItemReader and stores the last processed row in the cursor?
https://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html
==FULL XML CONFIGURATION==
<import resource="classpath:/batch/utility/skip/batch_skip.xml" />
<import resource="classpath:/batch/config/context-postgres.xml" />
<import resource="classpath:/batch/config/oracle-database.xml" />
<context:property-placeholder
location="classpath:/batch/jobs/TPF-1001-DD-01/TPF-1001-DD-01.properties" />
<bean id="gridSizePartitioner"
class="com.tpf.partitioner.GridSizePartitioner" />
<task:executor id="taskExecutor" pool-size="${pool.size}" />
<batch:job id="XYZJob" job-repository="jobRepository"
restartable="true">
<batch:step id="XYZSTEP">
<batch:description>Convert TIF files to PDF</batch:description>
<batch:partition partitioner="gridSizePartitioner">
<batch:handler task-executor="taskExecutor"
grid-size="${pool.size}" />
<batch:step>
<batch:tasklet allow-start-if-complete="true">
<batch:chunk commit-interval="${commit.interval}"
skip-limit="${job.skip.limit}">
<batch:reader>
<bean id="timeReader"
class="org.springframework.batch.item.database.JdbcCursorItemReader"
scope="step">
<property name="dataSource" ref="oracledataSource" />
<property name="sql">
<value>
select TIME_ID as timesheetId,count(*),max(CREATION_DATETIME) as creationDateTime , ILN_NUMBER as ilnNumber
from TS_FAKE_NAME
where creation_datetime >= '#{jobParameters['creation_start_date1']} 12.00.00.000000000 AM'
and creation_datetime < '#{jobParameters['creation_start_date2']} 11.59.59.999999999 PM'
and mod(time_id,${pool.size})=#{stepExecutionContext['partition.id']}
group by time_id ,ILN_NUMBER
</value>
</property>
<property name="rowMapper">
<bean
class="org.springframework.jdbc.core.BeanPropertyRowMapper">
<property name="mappedClass"
value="com.tpf.model.Time" />
</bean>
</property>
</bean>
</batch:reader>
<batch:processor>
<bean id="compositeItemProcessor"
class="org.springframework.batch.item.support.CompositeItemProcessor">
<property name="delegates">
<list>
<ref bean="timeProcessor" />
</list>
</property>
</bean>
</batch:processor>
<batch:writer>
<bean id="compositeItemWriter"
class="org.springframework.batch.item.support.CompositeItemWriter">
<property name="delegates">
<list>
<ref bean="timeWriter" />
</list>
</property>
</bean>
</batch:writer>
<batch:skippable-exception-classes>
<batch:include
class="com.utility.skip.BatchSkipException" />
</batch:skippable-exception-classes>
<batch:listeners>
<batch:listener ref="batchSkipListener" />
</batch:listeners>
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:partition>
</batch:step>
<batch:validator>
<bean
class="org.springframework.batch.core.job.DefaultJobParametersValidator">
<property name="requiredKeys">
<list>
<value>batchRunNumber</value>
<value>creation_start_date1</value>
<value>creation_start_date2</value>
</list>
</property>
</bean>
</batch:validator>
</batch:job>
<bean id="timesheetWriter" class="com.tpf.writer.TimeWriter"
scope="step">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="timeProcessor"
class="com.tpf.processor.TimeProcessor" scope="step">
<property name="dataSource" ref="oracledataSource" />
</bean>
Does anyone have any sample example of such custom reader which implements JdbcCursorItemReader and stores last processed row in the cursor
The JdbcCursorItemReader already does that; see its Javadoc. Here is an excerpt:
ExecutionContext: The current row is returned as restart data,
and when restored from that same data, the cursor is opened and the current row
set to the value within the restart data.
So you don't need a custom reader.
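
To illustrate the idea behind that excerpt, here is a toy model of count-based restart. It is not the real JdbcCursorItemReader: ToyCountingReader, its key name, and the Map standing in for the ExecutionContext are all assumptions made for the sketch. Note that this kind of restart presumably only works if the query returns the rows in the same order on every run, so an ORDER BY on the SQL is worth considering.

```java
import java.util.List;
import java.util.Map;

// Toy model of a restartable reader; the Map stands in for the ExecutionContext.
class ToyCountingReader {
    private static final String KEY = "toyReader.read.count"; // hypothetical key name
    private final List<String> rows; // stands in for the ordered result set
    private int position;

    ToyCountingReader(List<String> rows) { this.rows = rows; }

    // On (re)start, jump to the row count saved by the previous execution, if any.
    void open(Map<String, Integer> executionContext) {
        position = executionContext.getOrDefault(KEY, 0);
    }

    // Returns the next row, or null once the data is exhausted.
    String read() {
        return position < rows.size() ? rows.get(position++) : null;
    }

    // Called at each commit: persist how many rows have been read so far.
    void update(Map<String, Integer> executionContext) {
        executionContext.put(KEY, position);
    }
}
```

A second reader opened with the context saved by the first one resumes at the first unread row instead of starting over, which is the behaviour the Javadoc excerpt describes.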