I have written a simple Spring Batch application that reads a CSV file, does some transforming and writes a modified CSV to the disk.
The reading of the file into domain objects works like a charm. I use DelimitedLineTokenizer to tokenize the lines and a BeanWrapperFieldSetMapper to feed the values into a bean:
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<property name="resource" value="#{jobParameters['inputResource']}" />
<property name="linesToSkip" value="1" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="delimiter" value=";" />
<property name="names"
value="ID,NAME,DESCRIPTION,PRICE,DATE" />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="targetType" value="myapp.MyDomainObject" />
<property name="customEditors">
<map>
<entry key="java.util.Date" value-ref="dateEditor" />
<entry key="java.math.BigDecimal" value-ref="numberEditor" />
</map>
</property>
</bean>
</property>
</bean>
</property>
</bean>
I especially like BeanWrapperFieldSetMapper's ability to "guess" the field names and the option to define CustomEditors, which I use for the special date and number formats in the input file.
Now I would like to write the modified file in the same format as the input file.
I use the following configuration:
<bean id="writer" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step">
<property name="resource" value="#{jobParameters['outputResource']}" />
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="delimiter" value=";" />
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
<property name="names" value="id,name,description,price,date" />
</bean>
</property>
</bean>
</property>
</bean>
There are two things I miss with this configuration:
BeanWrapperFieldSetMapper allowed me to set CustomEditors, but BeanWrapperFieldExtractor has no such option. Is there a way to use them here?
Is there a way to define the headings in the first line of the file? I have not found any way to write an initial line that is not a bean. It would be great to use the same names here as in BeanWrapperFieldSetMapper, so that BeanWrapperFieldExtractor writes the initial line and guesses the bean property names as BeanWrapperFieldSetMapper does.
The process to load files is so comfortable in Spring Batch. Why is the writing of files so different? Am I missing something?
I have to use Spring Batch 2.1.x because we are using Spring 3.0.x. Therefore, an upgrade to 2.2.x is not an option.
What exactly do you need? To extract the field properties as text? You can:
use a FormatterLineAggregator, if your needs are not too complicated
write your own CustomEditorsFieldExtractor (better)
generate a complex domain object composed of the original domain object and a text-formatted object, and pass the latter to the writer (but this breaks your current processor/writer)
For the header, use FlatFileItemWriter.headerCallback: if set, it lets you write a custom header.
Writing, in your case, seems painful compared with reading because Spring Batch's reading components happen to fit your needs. The standard components target the most common use cases and cover a lot of scenarios. Sometimes we just have to write a custom FieldExtractor! :)
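The CustomEditorsFieldExtractor and headerCallback suggestions above could be sketched roughly as follows. To keep the example compilable without Spring on the classpath, FieldExtractor and HeaderCallback are local stand-ins that mirror the single-method shape of Spring Batch's FieldExtractor<T> (Object[] extract(T item)) and FlatFileHeaderCallback (void writeHeader(Writer writer)). Item, FormattingFieldExtractor, HeaderSketch and the date and number patterns are assumptions for illustration and do not come from the original configuration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Stand-in mirroring Spring Batch's FieldExtractor<T>.
interface FieldExtractor<T> {
    Object[] extract(T item);
}

// Stand-in mirroring Spring Batch's FlatFileHeaderCallback.
interface HeaderCallback {
    void writeHeader(Writer writer) throws IOException;
}

// Hypothetical domain object with three of the five columns from the question.
class Item {
    final long id;
    final BigDecimal price;
    final Date date;

    Item(long id, BigDecimal price, Date date) {
        this.id = id;
        this.price = price;
        this.date = date;
    }
}

// Converts each property to text, the role custom editors play on the reading side.
class FormattingFieldExtractor implements FieldExtractor<Item> {
    @Override
    public Object[] extract(Item item) {
        // Assumed formats. Neither formatter is thread-safe, so both are created per call.
        DecimalFormat priceFormat =
                new DecimalFormat("0.00", DecimalFormatSymbols.getInstance(Locale.GERMANY));
        SimpleDateFormat dateFormat = new SimpleDateFormat("dd.MM.yyyy");
        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
        return new Object[] {
            item.id, priceFormat.format(item.price), dateFormat.format(item.date)
        };
    }
}

class HeaderSketch {
    // Writes the heading line once, before any item.
    static final HeaderCallback HEADER = writer -> writer.write("ID;PRICE;DATE");

    // Convenience for trying the callback without a real file.
    static String headerLine() {
        StringWriter out = new StringWriter();
        try {
            HEADER.writeHeader(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(headerLine());
        Object[] fields = new FormattingFieldExtractor()
                .extract(new Item(1L, new BigDecimal("12.5"), new Date(0L)));
        System.out.println(fields[0] + ";" + fields[1] + ";" + fields[2]);
    }
}
```

In a real job, an extractor like this would take the place of BeanWrapperFieldExtractor inside the DelimitedLineAggregator, and the callback would be set through the writer's headerCallback property.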
Related
I am working on a Spring Batch application where I am using FlatFileItemReader to read a file delimited by ~ or |. That works fine, and the processor is called once the read completes.
But when I try to use \001 as the delimiter, the processor is not called, and I do not get any error in the console either (Linux environment).
Example file format:
0002~000000000000000470~000006206210008078~PR~7044656907~7044641561~~~~240082202~~~ENG~CH~~19940926~D~~~AL~~~P~USA
This is my reader configuration.
<property name="resource" value="#{stepExecutionContext['fileResource']}" />
<!-- <property name="linesToSkip" value="1"></property> -->
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="delimiter" value="${file.delimiter}"/>
<property name="names" value="sor_id,sor_cust_id,acct_id,cust_role_type_cd,cust_full_nm,mailg_adr_line_1,mailg_adr_line_2,mailg_city_nm,mailg_geo_st_cd,mailg_full_pstl_cd,mailg_cntry_cd,mailg_adr_desc,phy_adr_line_1,phy_adr_line_2,phy_city_nm,phy_geo_st_cd,phy_full_pstl_cd,phy_cntry_cd,phy_adr_desc,home_phn_num,work_phn_num,mobile_phn_num,email_adr_txt,ssn,cust_tax_idn_num,gndr_cd,martl_cd,lang_cd,acct_stat_cd,cust_brth_dt,acct_open_dt,sor_acct_stat_cd,sor_acct_stat_desc,vld_phn_num_ind,prod_cd,prft_ctr_cd,bus_legl_strc_cd,acct_use_cd,cntry_of_origin_cd" />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="com.cap1.cdi.batch.SrcMasterFieldSetMapper" />
</property>
</bean>
</property>
</bean>
Has anyone else faced the same kind of issue?
Regards,
Shankar
I am going to answer my own question.
The actual issue was that a control character (^A) was used as the delimiter on Linux.
In Java, string.split("\u0001") worked. Passing the same value to Spring Batch's FlatFileItemReader as the delimiter also works like a charm.
Thanks
Shankar.
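The fix above can be checked in plain Java. This sketch only demonstrates the String.split behaviour mentioned in the answer; the sample record is a shortened, made-up variant of the question's format, and the class and method names are hypothetical. How the value reaches ${file.delimiter} is not shown in the original, so that part is left out.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ControlCharDelimiterSketch {
    // U+0001 (Start of Heading) is the character terminals display as ^A.
    static final String DELIMITER = "\u0001";

    // Splits one record. The -1 limit keeps trailing empty fields,
    // which split(regex) without a limit would drop.
    static String[] tokenize(String line) {
        return line.split(Pattern.quote(DELIMITER), -1);
    }

    public static void main(String[] args) {
        String line = "0002" + DELIMITER + "PR" + DELIMITER + DELIMITER + "USA" + DELIMITER;
        System.out.println(Arrays.toString(tokenize(line)));
    }
}
```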
I am relatively new to Spring Batch.
I have an input file with a header. This header contains several fields, one of which I am interested in (YYYYMM data).
Here is my config for this:
<bean id="detaillesHeaderReaderCallback" class="fr.generali.ede.daemon.batch.dstaff.detailles.DetaillesHeaderReaderCallback" >
<property name="headerTokenizer" ref="headerTokenizer" />
<property name="fieldSetMapper" ref="fieldSetMapperHeaderLog07" />
<!-- need to write moisComptable to ChunkContext -->
<property name="chunkContext" value="#{chunkExecutionContext}" />
</bean>
<bean id="headerTokenizer"
class="org.springframework.batch.item.file.transform.FixedLengthTokenizer">
<property name="names" value="dummy1,moisComptable,dummy2" />
<property name="columns" value="1-22,23-28,29-146" />
</bean>
After that, in the next step of the job, I want to generate an output file whose name is composed of a static part and that header field:
<bean id="fileItemWriterLog07" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource"
value="file:${batch.coherence.out.path}/DSTAF007_LOG_#{jobExecutionContext['moisComptable']}.txt" />
<property name="shouldDeleteIfExists" value="true" />
<property name="headerCallback" ref="DetaillesHeaderWriterCallbackLog07" />
...
</bean>
(I have two jobs because I first write to a database, and then read from it.)
As one would guess, this doesn't work: the config file is flawed, so I get BeanCreationExceptions. But it gives an idea of what I want to achieve.
I have no exception on the ChunkContext (yet?) but one on the writer resource. Here is the exception:
Field or property 'jobExecutionContext' cannot be found on object of type 'org.springframework.beans.factory.config.BeanExpressionContext'
Does anyone have an idea about how to proceed?
Thanks in advance.
I was trying to share my in-memory jobRepository with the jobExplorer, but it throws this error:
Nested exception is org.springframework.beans.ConversionNotSupportedException:
Failed to convert property value of type '$Proxy1 implementing org.springframework.batch.core.repository.JobRepository,org.springframework.aop.SpringProxy,org.springframework.aop.framework.Advised'
to required type
I even tried putting the '&' sign before jobRepository when passing it to jobExplorer for sharing, but the attempt was in vain.
I am using Spring Batch 2.2.1.
Does jobExplorer depend on a database-backed repository only, and not an in-memory one?
The definition is:
<bean id="jobRepository"
class="com.test.repository.BatchRepositoryFactoryBean">
<property name="cache" ref="cache" />
<property name="transactionManager" ref="transactionManager" />
</bean>
<bean id="jobOperator" class="test.batch.LauncherTest.TestBatchOperator">
<property name="jobExplorer" ref="jobExplorer" />
<property name="jobRepository" ref="jobRepository" />
<property name="jobRegistry" ref="jobRegistry" />
<property name="jobLauncher" ref="jobLauncher" />
</bean>
<bean id="jobExplorer" class="test.batch.LauncherTest.TestBatchExplorerFactoryBean">
<property name="repositoryFactory" ref="&amp;jobRepository" />
</bean>
<bean id="transactionManager"
class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
<bean id="jobLauncher" class="com.scb.smartbatch.core.BatchLauncher">
<property name="jobRepository" ref="jobRepository" />
</bean>
<!-- To store Batch details -->
<bean id="jobRegistry" class="com.scb.smartbatch.repository.SmartBatchRegistry" />
<bean id="jobRegistryBeanPostProcessor"
class="org.springframework.batch.core.configuration.support.JobRegistryBeanPostProcessor">
<property name="jobRegistry" ref="jobRegistry" />
</bean>
<!--Runtime cache of batch executions -->
<bean id="cache" class="com.scb.cache.TCRuntimeCache" />
Thanks for your valuable inputs.
But I used '&' before the job repository reference, which allowed me to use it for my job explorer as a shared resource.
Problem solved.
Kudos.
Usually you should wire against the interface instead of the implementation.
Otherwise, you probably have to add <aop:config proxy-target-class="true"> to create a CGLIB-based proxy instead of the standard JDK interface-based proxy.
See the official Spring documentation on this.
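The difference between the two proxy styles mentioned above can be reproduced with the JDK alone. In this sketch, Repository and InMemoryRepository are hypothetical stand-ins for JobRepository and a concrete implementation. It only illustrates why a JDK dynamic proxy can be injected where the interface is expected but not where the concrete class is; it does not use Spring AOP itself.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for an interface such as JobRepository.
interface Repository {
    String name();
}

// Hypothetical concrete implementation.
class InMemoryRepository implements Repository {
    @Override
    public String name() {
        return "in-memory";
    }
}

class JdkProxySketch {
    // Wraps the target in a JDK dynamic proxy. The proxy class implements
    // only the listed interfaces; it does not extend the target's class.
    static Repository proxy(final Repository target) {
        InvocationHandler handler =
                (proxyInstance, method, methodArgs) -> method.invoke(target, methodArgs);
        return (Repository) Proxy.newProxyInstance(
                Repository.class.getClassLoader(),
                new Class<?>[] { Repository.class },
                handler);
    }

    public static void main(String[] args) {
        Repository proxied = proxy(new InMemoryRepository());
        System.out.println(proxied.name());
        // The proxy satisfies the interface type...
        System.out.println(proxied instanceof Repository);
        // ...but not the concrete type, so a property typed to the class cannot accept it.
        System.out.println(proxied instanceof InMemoryRepository);
    }
}
```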
We have jobs that might process up to 20,000 files. We are using a MultiResourcePartitioner to set things up. The job does run, but we have noticed a bottleneck.
Spring Batch creates an entry in the BATCH_STEP_EXECUTION table for each file found and does not process any files until it has created an entry for every file. Loading this table seems to take a very long time.
In local testing, trying to process just 1,000 files, it is taking 38-40 minutes to add the rows to 'BATCH_STEP_EXECUTION'. Once the table is loaded, the files are processed quite rapidly (usually under 1 minute).
I would hope that this is not typical behavior and that I am just missing something.
Here is how the database is set up. We actually subclass 'OracleDataSource' (from the 'ojdbc6.jar' file), and db_file is a properties file that supplies the URL, password, etc.:
<bean id="dataSource" class="oracle.jdbc.pool.OracleDataSource" destroy-method="close">
<constructor-arg value="db_file" />
<property name="connectionCachingEnabled" value="true" />
<property name="connectionCacheProperties">
<props merge="default">
<prop key="InitialLimit">10</prop>
<prop key="MinLimit">25</prop>
<prop key="MaxLimit">50</prop>
<prop key="InactivityTimeout">1800</prop>
<prop key="AbandonedConnectionTimeout">900</prop>
<prop key="MaxStatementsLimit">20</prop>
<prop key="PropertyCheckInterval">20</prop>
</props>
</property>
</bean>
Here is the rest of the JobRepository definition:
<bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean" >
<property name="databaseType" value="oracle" />
<property name="dataSource" ref="dataSource" />
<property name="transactionManager" ref="transactionManager" />
<property name="isolationLevelForCreate" value="ISOLATION_DEFAULT"/>
</bean>
<bean id="jobExplorer" class="org.springframework.batch.core.explore.support.JobExplorerFactoryBean">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
</bean>
<bean id="jobParametersIncrementer" class="org.springframework.batch.core.launch.support.RunIdIncrementer" />
Does anyone have any ideas?
As an FYI, SpringSource has identified this as a bug: Batch-1908.
As a workaround, we are simply lowering the number of files to process with a given run, and then increasing the number of times that the job runs in a given day.
We are using 2,000 as our file limit as it provides acceptable performance.
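The workaround of capping each run could be sketched like this. The class and method names are hypothetical, and the original does not describe how the selected files are handed to the partitioner or cleared from the input directory afterwards, so the sketch covers only the selection step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FileBatchLimiter {
    // Cap reported above as giving acceptable performance.
    static final int MAX_FILES_PER_RUN = 2000;

    // Returns at most 'limit' file names in sorted order, so successive
    // runs work through a backlog in a predictable sequence.
    static List<String> selectForRun(List<String> candidates, int limit) {
        List<String> sorted = new ArrayList<>(candidates);
        Collections.sort(sorted);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            names.add(String.format("file%04d.csv", i));
        }
        List<String> picked = selectForRun(names, MAX_FILES_PER_RUN);
        System.out.println(picked.size() + " files selected for this run");
    }
}
```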
Take this as an alternative approach.
For loading a table from files, it is better to use LOAD DATA (Oracle SQL*Loader):
http://infolab.stanford.edu/~ullman/fcdb/oracle/or-load.html
This improves performance considerably. For me it takes only 30 seconds to process a file with 1 million records.
I am new to Spring Batch and I have to design a task that reads from a database and writes the data into multiple XML files. The output format is as follows:
<Records xmlns="somevalue" ...>
<Version>1.0</Version>
<SequenceNo>1</SequenceNo>
<Date>12/12/2012 12:12:12 PM</Date>
<RecordCount>100</RecordCount><!--This is total number of Update and Insert txns-->
<SenderEmail>asds#asds.com</SenderEmail>
<Transaction type="Update">
<TxnNo>1</TxnNo>
<Details>
<MoreDetails>
</MoreDetails>
</Details>
</Transaction>
<Transaction type="Insert">
<TxnNo>2</TxnNo>
<Details>
<MoreDetails>
</MoreDetails>
</Details>
</Transaction>
<Transaction type="Update">
</Transaction>
<Transaction type="Update">
</Transaction>
</Records>
Please suggest which marshaller I should use and how to get started. Later I will have to make it multithreaded for optimization and performance.
No need to write your own writer. Spring Batch includes a MultiResourceItemWriter to write your items into multiple XML files.
I'm using a Jaxb2Marshaller to write my complex XML.
<bean id="multiItemWriter" class="org.springframework.batch.item.file.MultiResourceItemWriter">
<property name="resource" value="file:data/output/output.xml"/>
<!-- <property name="resourceSuffixCreator" ref="resourceSuffixCreator"/> -->
<property name="saveState" value="true"/>
<property name="itemCountLimitPerResource" value="10"/>
<property name="delegate" ref="itemWriter" />
</bean>
<bean id="itemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">
<!-- <property name="resource" value="file:data/output/output.xml" /> -->
<property name="marshaller" ref="customVrdbMarshaller" />
<property name="rootTagName" value="recordings" />
<property name="overwriteOutput" value="true" />
</bean>
<bean id="customVrdbMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
<property name="classesToBeBound">
<list>
<value>your.model.model.Albums</value>
</list>
</property>
</bean>
You should code a Writer that writes XML files. Choose a library and use it in the Writer.
Be careful to write thread-safe code for your future multithreading optimization.
An example from the Spring Batch samples: XML Processing
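As a rough illustration of a hand-written, thread-safe XML writer, the sketch below uses the JDK's StAX API (javax.xml.stream) and serialises access with synchronized. Transaction, RecordsXmlWriter and the subset of elements written are assumptions based on the sample in the question. A real Spring Batch writer would implement ItemWriter, which is not shown here.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical item type carrying two values from the sample output.
class Transaction {
    final String type;
    final int txnNo;

    Transaction(String type, int txnNo) {
        this.type = type;
        this.txnNo = txnNo;
    }
}

class RecordsXmlWriter {
    private final XMLStreamWriter xml;

    RecordsXmlWriter(Writer out) throws XMLStreamException {
        xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("Records");
    }

    // synchronized: XMLStreamWriter implementations are generally not safe for
    // concurrent use, so chunks from different threads must not interleave.
    synchronized void write(List<Transaction> items) throws XMLStreamException {
        for (Transaction t : items) {
            xml.writeStartElement("Transaction");
            xml.writeAttribute("type", t.type);
            xml.writeStartElement("TxnNo");
            xml.writeCharacters(Integer.toString(t.txnNo));
            xml.writeEndElement(); // TxnNo
            xml.writeEndElement(); // Transaction
        }
    }

    synchronized void close() throws XMLStreamException {
        xml.writeEndElement(); // Records
        xml.flush();
        xml.close();
    }

    // Convenience for trying the writer without a file.
    static String render(List<Transaction> items) {
        try {
            StringWriter out = new StringWriter();
            RecordsXmlWriter writer = new RecordsXmlWriter(out);
            writer.write(items);
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList(
                new Transaction("Update", 1), new Transaction("Insert", 2))));
    }
}
```

Synchronizing whole chunks keeps each chunk's elements contiguous in the output, at the cost of writers waiting on each other; the order of chunks from different threads is still not deterministic.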