Spring Batch 3: File conversion - spring-batch

I need to read multiple txt files and generate multiple CSV files.
I want to read multiple txt files so that I can select some columns from one txt file and some from the other. My txt files are pipe-separated.
I am able to read a single txt file and generate multiple CSV files.
Thanks in advance.
<!-- ItemReader reads a complete line one by one from input file -->
<bean id="flatFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<property name="resource" value="classpath:student.txt" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="fieldSetMapper">
<!-- Mapper which maps each individual item in a record to properties in the POJO -->
<bean class="com.springbatch.ExamResultFieldSetMapper" />
</property>
<property name="lineTokenizer">
<!-- A tokenizer class to be used when items in input record are separated by specific characters -->
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="delimiter" value="|" />
</bean>
</property>
</bean>
</property>
</bean>
<!-- CSV ItemWriter which writes the data in CSV format -->
<bean id="CSVWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" value="file:output/examResultcsv1.csv"></property>
<property name="shouldDeleteIfExists" value="true"></property>
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="delimiter" value=","></property>
<property name="fieldExtractor">
<bean
class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
<property name="names" value="studentName, percentage" />
</bean>
</property>
</bean>
</property>
</bean>
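One common way to feed several pipe-separated txt files through the reader shown above is to wrap it in a MultiResourceItemReader. This is only a sketch: the file pattern and bean wiring are assumptions, and the delegate's own resource property should be removed because the MultiResourceItemReader supplies each file to it in turn.
<!-- Sketch: reads every matching txt file and hands it to the pipe-delimited FlatFileItemReader above.
     The file pattern is an assumption; remove the "resource" property from the delegate reader. -->
<bean id="multiResourceReader" class="org.springframework.batch.item.file.MultiResourceItemReader" scope="step">
<property name="resources" value="file:input/*.txt" />
<property name="delegate" ref="flatFileItemReader" />
</bean>
Note that this applies the same column layout to every file; picking some columns from one file and some from another (a join across files) would still need a separate step or a custom composite reader.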

Related

Spring Batch - create a new file each time instead of overwriting it when transferring data from CSV to XML

I am new to Spring Batch. I was trying to transfer data from a CSV file to an XML file and was able to do so successfully. But each time I run the code, my XML output file gets overwritten, which I don't want; instead I want a new output file to be created for each run (the old output files should remain, as they are needed for data-tracking purposes). How can I do that?
Here is my code. What do I need to change in the file below? Let me know if you need more of the code from my side.
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:batch="http://www.springframework.org/schema/batch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
<!-- JobRepository and JobLauncher are configuration/setup classes -->
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" />
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
</bean>
<!-- ============= ItemReader reads a complete line one by one from input file ============ -->
<bean id="flatFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<!-- Get the Resource file -->
<property name="resource" value="classpath:ExamResult.txt" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="fieldSetMapper">
<!-- Mapper which maps each individual item in a record to properties in the POJO -->
<bean class="com.websystique.springbatch.mapper.ExamResultFieldSetMapper" />
</property>
<property name="lineTokenizer">
<!-- A tokenizer class to be used when items in input record are separated by specific characters -->
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="delimiter" value="|" />
</bean>
</property>
</bean>
</property>
</bean>
<!-- ======== XML ItemWriter which writes the data in XML format =========== -->
<bean id="xmlItemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">
<property name="resource" value="file:xml/ExamResult.xml" />
<property name="rootTagName" value="UniversityExamResultList" />
<property name="marshaller">
<bean class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
<property name="classesToBeBound">
<list>
<value>com.websystique.springbatch.model.ExamResult</value>
</list>
</property>
</bean>
</property>
</bean>
<!-- Optional ItemProcessor to perform business logic/filtering on the input records -->
<bean id="itemProcessor" class="com.websystique.springbatch.processor.ExamResultItemProcessor" />
<!-- Optional JobExecutionListener to perform business logic before and after the job -->
<bean id="jobListener" class="com.websystique.springbatch.listener.ExamResultJobListener" />
<!-- Step will need a transaction manager -->
<bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
<!-- ==================== Actual Job =================== -->
<batch:job id="examResultJob">
<batch:step id="step1">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="flatFileItemReader" writer="xmlItemWriter" processor="itemProcessor" commit-interval="10" />
</batch:tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="jobListener" />
</batch:listeners>
</batch:job>
</beans>
Try using the Spring Expression Language (SpEL) to append a date and time to the output file name (note the single quotes inside the expression, so the XML attribute stays well-formed). Something like:
<property name="resource"
value="file:xml/ExamResult-#{new java.text.SimpleDateFormat('Mddyyyyhhmmss').format(new java.util.GregorianCalendar().getTime())}.xml" />

Spring Batch - FlatFileItemReader \001 delimiter issue

I am working on a Spring Batch application where I am using FlatFileItemReader to read a file with the delimiter ~ or |. That works fine, and the processor is called once the read is complete.
But when I try to use \001 as the delimiter, the processor is not called and I do not get any error in the console either (Linux environment).
Example file format:
0002~000000000000000470~000006206210008078~PR~7044656907~7044641561~~~~240082202~~~ENG~CH~~19940926~D~~~AL~~~P~USA
This is my reader configuration.
<property name="resource" value="#{stepExecutionContext['fileResource']}" />
<!-- <property name="linesToSkip" value="1"></property> -->
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="delimiter" value="${file.delimiter}"/>
<property name="names" value="sor_id,sor_cust_id,acct_id,cust_role_type_cd,cust_full_nm,mailg_adr_line_1,mailg_adr_line_2,mailg_city_nm,mailg_geo_st_cd,mailg_full_pstl_cd,mailg_cntry_cd,mailg_adr_desc,phy_adr_line_1,phy_adr_line_2,phy_city_nm,phy_geo_st_cd,phy_full_pstl_cd,phy_cntry_cd,phy_adr_desc,home_phn_num,work_phn_num,mobile_phn_num,email_adr_txt,ssn,cust_tax_idn_num,gndr_cd,martl_cd,lang_cd,acct_stat_cd,cust_brth_dt,acct_open_dt,sor_acct_stat_cd,sor_acct_stat_desc,vld_phn_num_ind,prod_cd,prft_ctr_cd,bus_legl_strc_cd,acct_use_cd,cntry_of_origin_cd" />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="com.cap1.cdi.batch.SrcMasterFieldSetMapper" />
</property>
</bean>
</property>
</bean>
Has anyone else faced the same kind of issue?
Regards,
Shankar
I am going to answer my own question.
The actual issue was that the delimiter is the ^A control character on Linux.
In Java, string.split("\u0001") works, and passing the same value to the Spring Batch FlatFileItemReader as the delimiter works like a charm as well.
Thanks
Shankar.
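For reference, a sketch of how that value can be supplied through the ${file.delimiter} placeholder used in the reader configuration above (the property name comes from the question; the comment reflects standard java.util.Properties behaviour):
<!-- Sketch: if the backing .properties file contains the entry  file.delimiter=\u0001
     then Properties.load() interprets the \uXXXX escape, so the tokenizer receives the real
     ^A (U+0001) control character rather than the six literal characters "\u0001". -->
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="delimiter" value="${file.delimiter}" />
</bean>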

Spring Batch MultiResourceItemWriter doesn't write data to files correctly

This is my Spring Batch Maven dependency:
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-core</artifactId>
<version>2.2.0.RELEASE</version>
</dependency>
Below is my job.xml file
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:batch="http://www.springframework.org/schema/batch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/batch
http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.2.xsd">
<import resource="../config/launch-context.xml" />
<bean id="inputFileForMultiResource" class="org.springframework.core.io.FileSystemResource" scope="step">
<constructor-arg value="src/main/resources/files/customerInputValidation.txt"/>
</bean>
<bean id="outputFileForMultiResource" class="org.springframework.core.io.FileSystemResource" scope="step">
<constructor-arg value="src/main/resources/files/xml/customerOutput.xml"/>
</bean>
<bean id="readerForMultiResource" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" ref="inputFileForMultiResource" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="firstName,middleInitial,lastName,address,city,state,zip" />
<property name="delimiter" value="," />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="customer5TO" />
</bean>
</property>
</bean>
</property>
</bean>
<bean id="customer5TO" class="net.gsd.group.spring.batch.tutorial.to.Customer5TO"></bean>
<bean id="xmlOutputWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">
<property name="resource" ref="outputFileForMultiResource" />
<property name="marshaller" ref="customerMarshallerJMSJob" />
<property name="rootTagName" value="customers" />
</bean>
<bean id="customerMarshallerJMSJob" class="org.springframework.oxm.xstream.XStreamMarshaller">
<property name="aliases">
<map>
<entry key="customer" value="net.gsd.group.spring.batch.tutorial.to.Customer5TO"></entry>
</map>
</property>
</bean>
<bean id="multiResourceItemWriter" class="org.springframework.batch.item.file.MultiResourceItemWriter">
<property name="resource" ref="customerOutputXmlFile"/>
<property name="delegate" ref="xmlOutputWriter"/>
<property name="itemCountLimitPerResource" value="3"/>
<property name="resourceSuffixCreator" ref="suffix"/>
</bean>
<bean id="suffix" class="net.gsd.group.spring.batch.tutorial.util.CustomerOutputFileSuffixCreator"></bean>
<batch:step id="multiResourceWriterParentStep">
<batch:tasklet>
<batch:chunk
reader="readerForMultiResource"
writer="multiResourceItemWriter"
commit-interval="3">
</batch:chunk>
</batch:tasklet>
</batch:step>
<batch:job id="multiResourceWriterJob">
<batch:step id="multiResourceStep" parent="multiResourceWriterParentStep"/>
</batch:job>
</beans>
Basically I have one input file and several output files. I read the data from the input file (which contains 10 rows) and I want to write it in chunks of 3 into more than one output file (in my case there will be 4 files: 3/3/3/1). There will be one file per chunk.
The job completes without errors, but the content of the output files is wrong.
Each of the files contains only the last element that was read in the current chunk.
Let's say the first chunk reads A, B and C. When this chunk is written to the file, only C is written, and it is written 3 times.
From my tests it works correctly only when the chunk commit-interval is 1.
Is this the correct behaviour?
What am I doing wrong?
Thank you in advance.
Your customer5TO bean should have scope prototype. Currently it is a singleton, so the bean properties will always be overwritten with the last item.
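Applied to the bean definition from the question, that is a one-attribute change:
<bean id="customer5TO" class="net.gsd.group.spring.batch.tutorial.to.Customer5TO" scope="prototype" />
With BeanWrapperFieldSetMapper and prototypeBeanName, each line is then mapped onto a fresh instance; with a singleton, every item in the chunk points at the same object, which is why only the last value shows up in the output.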

MultiResourceItemReader inside Jar doesn't read resource files

I created a job and packaged it into a jar.
When I run it with exec:java it works, but when I invoke it with java -cp myjar.jar, the MultiResourceItemReader doesn't seem to read the resources that were fetched from the FTP server.
<bean id="merge.reader.resource"
class="org.springframework.batch.item.file.MultiResourceItemReader"
scope="step">
<property name="resources" value="file:job1/merge/fetched/*.xml" />
...
</bean>
What did I do wrong?
$ ls -l
mybatch.jar
job1/merge/fetched/xxxx.xml
$ java -cp mybatch.jar x.y.z.Main
I created a class extending MultiResourceItemReader and verified that the file resources are set via the setResources method and the delegate via setDelegate.
<!--bean id="merge.reader.resource"
class="org.springframework.batch.item.file.MultiResourceItemReader" scope="step"-->
<bean id="merge.reader.resource"
class="xxx.ExtendedMultiResourceItemReader"
scope="step">
<property name="strict" value="true"/>
<property name="resources" value="file:job1/merge/fetched/*.xml"/>
<property name="delegate" ref="merge.reader.item" />
</bean>
<bean id="merge.reader.item"
class="org.springframework.batch.item.xml.StaxEventItemReader">
<property name="fragmentRootElementName" value="XXX"/>
<property name="unmarshaller" ref="merge.reader.unmarshaller"/>
</bean>
<bean id="merge.reader.unmarshaller"
class="org.springframework.oxm.xstream.XStreamMarshaller">
<property name="aliases" ref="merge.reader.binder"/>
<property name="autodetectAnnotations" value="true"/>
</bean>
<util:map id="merge.reader.binder">
<entry key="XXX" value="xxx"/>
</util:map>
I really don't understand why the reader doesn't process the files.

Making Spring Batch ItemReader restartable

The Spring Batch program I am working on reads data from a table, using the org.springframework.batch.item.database.JdbcCursorItemReader item reader. The earlier plan was to alter the table, add a PROCESSED_INDICATOR flag, and prepopulate it with the status 'PENDING'. Once a record is processed, the writer would update the PROCESSED_INDICATOR flag to 'PROCESSED'. This is to support restartability. For example, if the batch picks up 1 million records and dies after half a million, then on restart it should pick up where it left off.
Unfortunately, management did not approve this solution, so I am digging into ways to make the ItemReader itself restartable. As per the Spring documentation: "Most ItemReaders have much more sophisticated restart logic. The JdbcCursorItemReader, for example, stores the row id of the last processed row in the Cursor."
Does anyone have a sample of such a custom reader that extends JdbcCursorItemReader and stores the last processed row in the cursor?
https://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html
==FULL XML CONFIGURATION==
<import resource="classpath:/batch/utility/skip/batch_skip.xml" />
<import resource="classpath:/batch/config/context-postgres.xml" />
<import resource="classpath:/batch/config/oracle-database.xml" />
<context:property-placeholder
location="classpath:/batch/jobs/TPF-1001-DD-01/TPF-1001-DD-01.properties" />
<bean id="gridSizePartitioner"
class="com.tpf.partitioner.GridSizePartitioner" />
<task:executor id="taskExecutor" pool-size="${pool.size}" />
<batch:job id="XYZJob" job-repository="jobRepository"
restartable="true">
<batch:step id="XYZSTEP">
<batch:description>Convert TIF files to PDF</batch:description>
<batch:partition partitioner="gridSizePartitioner">
<batch:handler task-executor="taskExecutor"
grid-size="${pool.size}" />
<batch:step>
<batch:tasklet allow-start-if-complete="true">
<batch:chunk commit-interval="${commit.interval}"
skip-limit="${job.skip.limit}">
<batch:reader>
<bean id="timeReader"
class="org.springframework.batch.item.database.JdbcCursorItemReader"
scope="step">
<property name="dataSource" ref="oracledataSource" />
<property name="sql">
<value>
select TIME_ID as timesheetId,count(*),max(CREATION_DATETIME) as creationDateTime , ILN_NUMBER as ilnNumber
from TS_FAKE_NAME
where creation_datetime >= '#{jobParameters['creation_start_date1']} 12.00.00.000000000 AM'
and creation_datetime < '#{jobParameters['creation_start_date2']} 11.59.59.999999999 PM'
and mod(time_id,${pool.size})=#{stepExecutionContext['partition.id']}
group by time_id ,ILN_NUMBER
</value>
</property>
<property name="rowMapper">
<bean
class="org.springframework.jdbc.core.BeanPropertyRowMapper">
<property name="mappedClass"
value="com.tpf.model.Time" />
</bean>
</property>
</bean>
</batch:reader>
<batch:processor>
<bean id="compositeItemProcessor"
class="org.springframework.batch.item.support.CompositeItemProcessor">
<property name="delegates">
<list>
<ref bean="timeProcessor" />
</list>
</property>
</bean>
</batch:processor>
<batch:writer>
<bean id="compositeItemWriter"
class="org.springframework.batch.item.support.CompositeItemWriter">
<property name="delegates">
<list>
<ref bean="timeWriter" />
</list>
</property>
</bean>
</batch:writer>
<batch:skippable-exception-classes>
<batch:include
class="com.utility.skip.BatchSkipException" />
</batch:skippable-exception-classes>
<batch:listeners>
<batch:listener ref="batchSkipListener" />
</batch:listeners>
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:partition>
</batch:step>
<batch:validator>
<bean
class="org.springframework.batch.core.job.DefaultJobParametersValidator">
<property name="requiredKeys">
<list>
<value>batchRunNumber</value>
<value>creation_start_date1</value>
<value>creation_start_date2</value>
</list>
</property>
</bean>
</batch:validator>
</batch:job>
<bean id="timesheetWriter" class="com.tpf.writer.TimeWriter"
scope="step">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="timeProcessor"
class="com.tpf.processor.TimeProcessor" scope="step">
<property name="dataSource" ref="oracledataSource" />
</bean>
Does anyone have a sample of such a custom reader that extends JdbcCursorItemReader and stores the last processed row in the cursor?
The JdbcCursorItemReader already does that; see its Javadoc, here is an excerpt:
ExecutionContext: The current row is returned as restart data, and when restored from that same data, the cursor is opened and the current row set to the value within the restart data.
So you don't need a custom reader.
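A sketch of what that means for the configuration above (the bean and query names are taken from the question; saveState and driverSupportsAbsolute are standard JdbcCursorItemReader properties, shown here only to make the restart behaviour explicit, and saveState is already true by default):
<!-- Sketch: JdbcCursorItemReader restart support is built in. With saveState=true the reader
     stores the current row number in the step's ExecutionContext, and when a failed execution
     is restarted it repositions the cursor to that row before continuing. -->
<bean id="timeReader" class="org.springframework.batch.item.database.JdbcCursorItemReader" scope="step">
<property name="dataSource" ref="oracledataSource" />
<property name="sql" value="select TIME_ID as timesheetId, ILN_NUMBER as ilnNumber from TS_FAKE_NAME" /> <!-- use the full query from the job -->
<property name="rowMapper">
<bean class="org.springframework.jdbc.core.BeanPropertyRowMapper">
<property name="mappedClass" value="com.tpf.model.Time" />
</bean>
</property>
<property name="saveState" value="true" /> <!-- default -->
<property name="driverSupportsAbsolute" value="true" /> <!-- reposition with ResultSet.absolute() instead of re-reading rows -->
</bean>
The job itself must also remain restartable (restartable="true", as in the configuration above), and with the partitioned setup each partition restarts from its own saved row, since restart data is kept per step execution.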