Skip FlatFileParseException or a specific exception in Spring Batch

Hi, I have a requirement to read n flat files. While reading a file, if the reader throws a FlatFileParseException, I want to stop reading the current file, exit cleanly, move on to the next file, and continue the job execution. Currently I have the XML config below, but I don't want to go with it because I don't really have a skip-limit count. Is there any way to handle this scenario, maybe using an ItemReadListener?
<chunk reader="flatFileItemReader" writer="itemWriter"
       commit-interval="10" skip-limit="2">
    <skippable-exception-classes>
        <include class="org.springframework.batch.item.file.FlatFileParseException"/>
    </skippable-exception-classes>
</chunk>

Instead of specifying a skip-limit, you can use a skip policy. There are several out-of-the-box skip policies; since it sounds like you always want to skip (no limit), use AlwaysSkipItemSkipPolicy.
Example config:
<batch:skip-policy>
    <bean:bean class="org.springframework.batch.core.step.skip.AlwaysSkipItemSkipPolicy"/>
</batch:skip-policy>

Thanks Doeleman, based on your input I am able to skip the exception using AlwaysSkipItemSkipPolicy. This is how I have implemented it:
public class SkipPolicy extends AlwaysSkipItemSkipPolicy {

    @Override
    public boolean shouldSkip(java.lang.Throwable t, int skipCount) {
        if (t instanceof NonSkippableReadException) {
            return true;
        }
        return false;
    }
}
XML config:
<batch:chunk reader="cvsFileItemReader" writer="mysqlItemWriter"
             commit-interval="2" skip-policy="mySkipPolicy"/>

<bean id="mySkipPolicy" class="com.model.SkipPolicy"/>
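For reference, if the intent is to skip only the parse errors coming from the reader, a policy keyed directly on FlatFileParseException would also work. This is a minimal sketch, not from the original thread, and the class name is a placeholder:

import org.springframework.batch.core.step.skip.SkipPolicy;
import org.springframework.batch.item.file.FlatFileParseException;

public class FlatFileParseSkipPolicy implements SkipPolicy {

    @Override
    public boolean shouldSkip(Throwable t, int skipCount) {
        // Skip every malformed line, however many have been seen so far
        return t instanceof FlatFileParseException;
    }
}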

Related

Spring Batch: improve performance by partitioning

I need to convert an existing project into a Spring Batch job to improve its speed.
Suppose the first tasklet retrieves a list of data from the database and hands it to a listener, so that the next step can pick it up in @BeforeStep, apply some conditions to get another list (10k-20k records), and then run several pieces of business logic for each record.
But I am stuck on how to implement this step with partitioning in Spring Batch. All the tutorials I found query directly in the reader and inject the range via the ExecutionContext in a RangePartitioner, but I can't follow that approach.
<job id="testJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step1" next="step2">
<tasklet ref="driver"/>
<listeners>
<listener ref="promotionListener">
</listener>
</listeners>
</step>
<step id="step2">
<tasklet>
<chunk reader="bmtbBillGenReqReader"
processor="bmtbBillGenReqProcessor"
writer="bmtbBillGenReqWriter"
commit-interval="1">
</chunk>
</tasklet>
</step>
</job>
<bean id="promotionListener"
class="org.springframework.batch.core.listener.ExecutionContextPromotionListener">
<property name="keys">
<util:list>
<value>billGenRequests</value>
</util:list>
</property>
</bean>
Please advise how I can implement partitioning from step2. Maybe store the new list from step2 in a CSV file or something first?
You could implement your own Partitioner instead of using RangePartitioner, and retrieve the data in that implementation rather than in a dedicated step.
Then pass each partition the data it needs. For example:
public class FilesPartitioner implements Partitioner {

    private JdbcOperations jdbcTemplate;

    @Autowired
    public void setDataSource(DataSource dataSource) {
        // build the template from the injected DataSource so it can be used below
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> map = new HashMap<>();
        List<String> filesname = jdbcTemplate.queryForList(
                "SELECT DISTINCT FILENAME FROM MYTABLE", String.class);
        for (int i = 0; i < filesname.size(); i++) {
            ExecutionContext executionContext = new ExecutionContext();
            executionContext.put("data", filesname.get(i));
            String key = filesname.get(i);
            map.put(key, executionContext);
        }
        return map;
    }
}
Then inject the parameter into the reader accordingly, via step-scope late binding.
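A minimal sketch of that wiring, assuming each partition's reader should pick up the file name the partitioner stored under the key "data" (the bean ids, job id, grid size, taskExecutor and the com.example package are placeholders, not from the original answer):

<bean id="filesPartitioner" class="com.example.FilesPartitioner"/>

<bean id="partitionedItemReader" scope="step"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <!-- late binding: each partition sees its own value under 'data' -->
    <property name="resource" value="file:#{stepExecutionContext['data']}"/>
    <!-- lineMapper etc. omitted -->
</bean>

<job id="testJobPartitioned" xmlns="http://www.springframework.org/schema/batch">
    <step id="step2.master">
        <partition step="step2" partitioner="filesPartitioner">
            <handler grid-size="4" task-executor="taskExecutor"/>
        </partition>
    </step>
</job>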

Spring Batch: create file by setting the name programmatically

I have a Spring Batch job (defined in XML) which generates a CSV export.
Inside the FlatFileItemWriter bean I am setting the resource, where the file name is set.
<bean id="customDataFileWriter" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step">
<property name="resource" value="file:/tmp/export/custom-export.csv"/>
...
Now I need to set this file name according to certain logic, so I need to set it from some Java class. Any ideas?
Use the different builder classes of Spring Batch (job builder, step builder, and so on). Have a look at https://blog.codecentric.de/en/2013/06/spring-batch-2-2-javaconfig-part-1-a-comparison-to-xml/ to get an idea.
You can extend FlatFileItemWriter and override the setResource method, adding your own logic to rename the file.
Here's an example implementation:
@Override
public void setResource(Resource resource) {
    if (resource instanceof ClassPathResource) {
        // Convert resource
        ClassPathResource res = (ClassPathResource) resource;
        try {
            String path = res.getPath();
            // Do something to "path" here
            File file = new File(path);
            // Check for permissions to write
            if (file.canWrite() || file.createNewFile()) {
                file.delete();
                // Call parent setter with new resource
                super.setResource(new FileSystemResource(file.getAbsolutePath()));
                return;
            }
        } catch (IOException e) {
            // File could not be read/written
        }
    }
    // If something went wrong or resource was delegated to MultiResourceItemWriter,
    // call parent setter with default resource
    super.setResource(resource);
}
Another possibility is to use jobParameters, if your logic can be applied before the job is launched. See section 5.4, Late Binding, of the Spring Batch documentation.
Example:
<bean id="flatFileItemReader" scope="step" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" value="#{jobParameters['input.file.name']}" />
</bean>
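For completeness, a hedged sketch of supplying that parameter at launch time (the jobLauncher and job references are assumed to be wired elsewhere):

JobParameters params = new JobParametersBuilder()
        .addString("input.file.name", "file:/tmp/export/custom-export.csv")
        .toJobParameters();
jobLauncher.run(job, params);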
You can also use a MultiResourceItemWriter with a custom ResourceSuffixCreator. That will let you create 1 to n files with a common filename pattern.
Here's an example of the getSuffix method of a custom ResourceSuffixCreator:
@Override
public String getSuffix(int index) {
    // Your logic
    if (true)
        return "XXX" + index;
    else
        return "";
}
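And a minimal sketch of how the MultiResourceItemWriter could be wired around the existing writer (the bean names, the suffix creator class and the item count limit are illustrative assumptions):

<bean id="multiResourceWriter" scope="step"
      class="org.springframework.batch.item.file.MultiResourceItemWriter">
    <property name="resource" value="file:/tmp/export/custom-export"/>
    <property name="resourceSuffixCreator">
        <bean class="com.example.MyResourceSuffixCreator"/>
    </property>
    <!-- the FlatFileItemWriter from the question does the actual writing -->
    <property name="delegate" ref="customDataFileWriter"/>
    <property name="itemCountLimitPerResource" value="10000"/>
</bean>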

Spring Batch JpaPagingItemReader: why are some rows not read?

I'm using Spring Batch (3.0.1.RELEASE) / JPA and an HSQLDB server database.
I need to browse an entire table (using paging) and update items (one by one), so I used a JpaPagingItemReader. But when I run the job I can see that some rows are skipped, and the number of skipped rows is equal to the page size. For example, if my table has 12 rows and jpaPagingItemReader.pageSize = 3, the job reads lines 1, 2, 3 and then lines 7, 8, 9 (so it skips lines 4, 5, 6)…
Could you tell me what is wrong in my code/configuration, or is it maybe an issue with HSQLDB paging?
Below is my code:
[EDIT]: The problem comes from my ItemProcessor, which modifies the entity POJOs. Since the JpaPagingItemReader flushes between page reads, the entities are updated (this is what I want), but the paging cursor also advances, as can be seen in the log: rows with ID 4, 5 and 6 have been skipped. How can I manage this issue?
@Configuration
@EnableBatchProcessing(modular = true)
public class AppBatchConfig {

    @Inject
    private InfrastructureConfiguration infrastructureConfiguration;

    @Inject private JobBuilderFactory jobs;
    @Inject private StepBuilderFactory steps;

    @Bean public Job job() {
        return jobs.get("Myjob1").start(step1()).build();
    }

    @Bean public Step step1() {
        return steps.get("step1")
                .<SNUserPerCampaign, SNUserPerCampaign> chunk(0)
                .reader(reader()).processor(processor()).build();
    }

    @Bean(destroyMethod = "")
    @JobScope
    public ItemStreamReader<SNUserPerCampaign> reader() {
        JpaPagingItemReader reader = new JpaPagingItemReader();
        reader.setEntityManagerFactory(infrastructureConfiguration.getEntityManagerFactory());
        reader.setQueryString("select t from SNUserPerCampaign t where t.isactive=true");
        reader.setPageSize(3);
        return reader;
    }

    @Bean @JobScope
    public ItemProcessor<SNUserPerCampaign, SNUserPerCampaign> processor() {
        return new MyItemProcessor();
    }
}

@Configuration
@EnableBatchProcessing
public class StandaloneInfrastructureConfiguration implements InfrastructureConfiguration {

    @Inject private EntityManagerFactory emf;

    @Override
    public EntityManagerFactory getEntityManagerFactory() {
        return emf;
    }
}
from my ItemProcessor:
@Override
public SNUserPerCampaign process(SNUserPerCampaign item) throws Exception {
    // do some stuff …
    // then if (condition) update the entity POJO:
    item.setModificationDate(new Timestamp(System.currentTimeMillis()));
    item.setIsactive(false);
    return item;
}
from Spring xml config file:
<tx:annotation-driven transaction-manager="transactionManager" />
<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
<property name="entityManagerFactory" ref="entityManagerFactory" />
</bean>
<bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
<property name="driverClassName" value="org.hsqldb.jdbcDriver" />
<property name="url" value="jdbc:hsqldb:hsql://localhost:9001/MYAppDB" />
<property name="username" value="sa" />
<property name="password" value="" />
</bean>
trace/log, summarized:
11:16:05.728 TRACE MyItemProcessor - item processed: snUserInternalId=1]
11:16:06.038 TRACE MyItemProcessor - item processed: snUserInternalId=2]
11:16:06.350 TRACE MyItemProcessor - item processed: snUserInternalId=3]
11:16:06.674 DEBUG SQL- update SNUSER_CAMPAIGN set ...etc...
11:16:06.677 DEBUG SQL- update SNUSER_CAMPAIGN set ...etc...
11:16:06.679 DEBUG SQL- update SNUSER_CAMPAIGN set ...etc...
11:16:06.681 DEBUG SQL- select ...etc... from SNUSER_CAMPAIGN snuserperc0_
11:16:06.687 TRACE MyItemProcessor - item processed: snUserInternalId=7]
11:16:06.998 TRACE MyItemProcessor - item processed: snUserInternalId=8]
11:16:07.314 TRACE MyItemProcessor - item processed: snUserInternalId=9]
org.springframework.batch.item.database.JpaPagingItemReader creates its own entityManager instance
(from org.springframework.batch.item.database.JpaPagingItemReader#doOpen):
entityManager = entityManagerFactory.createEntityManager(jpaPropertyMap);
If you are running within a transaction, as seems to be the case here, the entities read are not detached
(from org.springframework.batch.item.database.JpaPagingItemReader#doReadPage):
if (!transacted) {
    List<T> queryResult = query.getResultList();
    for (T entity : queryResult) {
        entityManager.detach(entity);
        results.add(entity);
    }//end if
} else {
    results.addAll(query.getResultList());
    tx.commit();
}
For this reason, when you update an item in the processor or the writer, that item is still managed by the reader's entityManager.
When the item reader reads the next chunk of data, it flushes the context to the database.
So, looking at your case, after the first chunk of data is processed we have in the database:
|id|active
|1 | false
|2 | false
|3 | false
org.springframework.batch.item.database.JpaPagingItemReader uses limit & offset to retrieve paginated data, so the next select created by the reader looks like:
select * from table where active = true limit 3 offset 3
The reader therefore misses the items with id 4, 5 and 6: once rows 1-3 have been set inactive, rows 4-6 become the first rows matching the where clause, and the offset of 3 skips straight past them.
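To make this concrete, here is the sequence of queries for the 12-row example from the question, written out as a sketch (table and column names are illustrative, and the exact SQL generated by the JPA provider will differ):

-- Page 1: rows 1, 2, 3 are read; the processor sets them inactive and the
-- reader flushes those updates before fetching the next page
SELECT * FROM snuser_campaign WHERE active = true LIMIT 3 OFFSET 0;

-- Page 2: rows 1-3 no longer match the WHERE clause, so rows 4, 5, 6 now occupy
-- positions 0-2 of the result set and OFFSET 3 starts at row 7
SELECT * FROM snuser_campaign WHERE active = true LIMIT 3 OFFSET 3;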
What you can do, as a workaround, is use the JDBC implementation (org.springframework.batch.item.database.JdbcPagingItemReader), as it does not page with an offset; it is based on a sorted column (typically the id column), so you will not miss any data.
Of course, you will then have to update your data in the writer (using either a JPA or a pure JDBC implementation); see the writer sketch after the reader code below.
The reader will be more verbose:
@Bean
public ItemReader<? extends Entity> reader() {
    JdbcPagingItemReader<Entity> reader = new JdbcPagingItemReader<Entity>();
    final SqlPagingQueryProviderFactoryBean sqlPagingQueryProviderFactoryBean = new SqlPagingQueryProviderFactoryBean();
    sqlPagingQueryProviderFactoryBean.setDataSource(dataSource);
    sqlPagingQueryProviderFactoryBean.setSelectClause("select *");
    sqlPagingQueryProviderFactoryBean.setFromClause("from <your table name>");
    sqlPagingQueryProviderFactoryBean.setWhereClause("where active = true");
    sqlPagingQueryProviderFactoryBean.setSortKey("id");
    try {
        reader.setQueryProvider(sqlPagingQueryProviderFactoryBean.getObject());
    } catch (Exception e) {
        e.printStackTrace();
    }
    reader.setDataSource(dataSource);
    reader.setPageSize(3);
    reader.setRowMapper(new BeanPropertyRowMapper<Entity>(Entity.class));
    return reader;
}
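As noted above, the update itself then belongs in an explicit writer. A minimal sketch using JdbcBatchItemWriter (the SQL, column names and the dataSource reference are assumptions, not part of the original answer):

@Bean
public ItemWriter<Entity> writer() {
    JdbcBatchItemWriter<Entity> writer = new JdbcBatchItemWriter<Entity>();
    writer.setDataSource(dataSource);
    // named parameters below are resolved from the Entity bean properties
    writer.setItemSqlParameterSourceProvider(
            new BeanPropertyItemSqlParameterSourceProvider<Entity>());
    writer.setSql("update SNUSER_CAMPAIGN set ISACTIVE = false, "
            + "MODIFICATIONDATE = :modificationDate where ID = :id");
    return writer;
}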
I faced the same case: my reader was a JpaPagingItemReader that queried on a field updated in the writer, and it consequently skipped half of the items that needed to be updated, because the page window kept advancing while the items already read were no longer in the reader's scope.
The simplest workaround for me was to override the getPage method on the JpaPagingItemReader to always return the first page.
JpaPagingItemReader<XXXXX> jpaPagingItemReader = new JpaPagingItemReader<XXXXX>() {
    @Override
    public int getPage() {
        return 0;
    }
};
A couple of things to note:
All entities that are returned from the JpaPagingItemReader are detached. We accomplish this in one of two ways. We either create a transaction before querying for the page, then commit the transaction (which detaches all entities associated with the EntityManager for that transaction) or we explicitly call entityManager.detach. We do this so that features like retry and skip can be correctly performed.
While you didn't post all the code in your processor, my hunch is that in the //do some stuff section, your item is getting re-attached which is why the update is occurring. However, without being able to see that code, I can't be sure.
In either case, you should use an explicit ItemWriter. In fact, I consider it a bug that we don't require an ItemWriter when using Java config (we do for XML).
For your specific issue of missing records, you need to keep in mind that none of the *PagingItemReaders use a cursor; they all execute independent queries for each page of data. So if you update the underlying data between pages, it can have an impact on the items returned in future pages. For example, if my paging query specifies where val1 > 4 and a record's val1 is updated from 1 to 5, then in chunk 2 that item may be returned since it now meets the criteria. If you need to update values that are in your where clause (thereby impacting what falls into the set of data you'd be processing), it's best to add a processed flag of some kind that you can query by instead.
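A hedged sketch of that processed-flag idea for this case: the paging query selects on a flag that only the batch run touches, so rows already read stay inside the query window. The processed property and the query change are assumptions, not part of the original code.

import org.springframework.batch.item.ItemProcessor;

// Sketch only: assumes a boolean "processed" property added to the entity,
// and a reader query such as
//   select t from SNUserPerCampaign t where t.processed = false
public class MarkProcessedItemProcessor
        implements ItemProcessor<SNUserPerCampaign, SNUserPerCampaign> {

    @Override
    public SNUserPerCampaign process(SNUserPerCampaign item) throws Exception {
        item.setIsactive(false);   // the business update
        item.setProcessed(true);   // only this flag appears in the where clause
        return item;
    }
}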
I had the same problem with rows being skipped based on the pageSize.
If I had pageSize set to 2, for example, it would read 2, ignore 2, read 2, ignore 2, and so on.
I was building a daemon processor to poll a 'Request' database table for records at a 'Waiting To Be Processed' status. The daemon is designed to run forever in the background.
I had a 'status' field which was defined in the @NamedQuery and would select records whose status was '10': waiting to be processed. After a record was processed, the status field would be updated to '20': error or '30': success.
This turned out to be the cause of the problem: I was updating a field which was defined in the query. If I introduced a 'processedField' and updated that instead of the 'status' field, then there was no problem; all the records would be read.
As a possible solution to updating the status field, I set maxItemCount to be the same as the pageSize (see the sketch below); this updated the records correctly before step completion. I then keep executing the step until a request is made to stop the daemon. OK, probably not the most efficient way to do it (but I'm still benefiting from the ease of use that JPA provides), and I think it would probably be better to use JdbcPagingItemReader (described above, thanks!). Opinions on the best approach to this batch database polling problem would be welcome :)
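For reference, a minimal sketch of the maxItemCount idea described above (the Request entity, query and values are illustrative): capping the reader at exactly one page per step execution means the paging offset never advances past rows whose status this run has just changed.

JpaPagingItemReader<Request> reader = new JpaPagingItemReader<Request>();
reader.setEntityManagerFactory(entityManagerFactory);  // assumed to be available
reader.setQueryString("select r from Request r where r.status = '10'");
reader.setPageSize(2);
// read at most one page per step execution, then let the step finish and run again
reader.setMaxItemCount(2);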

Spring Batch job to execute a method in a manager

I am new to Spring Batch, so I appreciate the help. So far I have two Spring Batch jobs. Both of them have an item reader (SQL select) and an item writer (SQL insert).
They look like this...
<job id="job-daily-tran-counts" xmlns="http://www.springframework.org/schema/batch">
<step id="job-daily-tran-counts-step1">
<tasklet>
<chunk
reader="dailyTranCountJdbcCursorItemReader"
writer="dailyTranCountItemWriter"
commit-interval="1000" />
</tasklet>
</step>
</job>
Now I want to write a simple batch job that executes a method inside one of my managers, which refreshes the cache of a number of list-of-values maps. An item reader and item writer do not really fit here, I think. How should I structure this batch job?
To be more specific, I have a class named LovManagerImpl and I need to execute its afterPropertiesSet method from Spring Batch. What's the best way to do that?
public class LovManagerImpl implements LovManager, InitializingBean {

    /**
     * The list of values data access object factory
     */
    @Autowired
    public LovDaoFactory lovDaoFactory;

    /* (non-Javadoc)
     * @see org.springframework.beans.factory.InitializingBean#afterPropertiesSet()
     */
    public void afterPropertiesSet() throws ReportingManagerException {
        Map<String, LovDao> lovDaoMap = lovDaoFactory.getLovDaoMap();
        for (Map.Entry<String, LovDao> entry : lovDaoMap.entrySet()) {
            String code = entry.getKey();
            LovDao dao = entry.getValue();
            dao.getLov(code);
        }
    }
}
thanks
Use a Tasklet; please refer to the answer to "Can we write a Spring Batch Job Without ItemReader and ItemWriter".
For your specific case (reusing an existing service method), use a MethodInvokingTaskletAdapter.
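A minimal sketch of that in the same XML style as the jobs above (the bean ids lovManager and lovRefreshTasklet and the job/step ids are placeholders, not from the original answer):

<bean id="lovRefreshTasklet"
      class="org.springframework.batch.core.step.tasklet.MethodInvokingTaskletAdapter">
    <property name="targetObject" ref="lovManager"/>
    <property name="targetMethod" value="afterPropertiesSet"/>
</bean>

<job id="job-refresh-lov-cache" xmlns="http://www.springframework.org/schema/batch">
    <step id="job-refresh-lov-cache-step1">
        <tasklet ref="lovRefreshTasklet"/>
    </step>
</job>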

Strange behavior in Spring Batch skip policy implementation

I have a Spring Batch program.
The skip limit is set to 5 and the chunk size is 1000.
I have a job with two steps as below:
<step id="myFileGenerator" next="myReportGenerator">
<tasklet transaction-manager="jobRepository-transactionManager">
<chunk reader="myItemReader" processor="myItemProcessor" writer="myItemWriter" commit-interval="1000" skip-policy="skipPolicy"/>
</tasklet>
<listeners>
<listener ref="mySkipListener"/>
</listeners>
</step>
<step id="myReportGenerator">
<tasklet ref="myReportTasklet" transaction-manager="jobRepository-transactionManager"/>
</step>
The skip policy is as below:
<beans:bean id="skipPolicy" class="com.myPackage.util.Skip_Policy">
<beans:property name="skipLimit" value="5"/>
</beans:bean>
The SkipPolicy class is as below:
public class Skip_Policy implements SkipPolicy {
private int skipLimit;
public void setSkipLimit(final int skipLimit) {
this.skipLimit = skipLimit;
}
public boolean shouldSkip(final Throwable t, final int skipCount) throws SkipLimitExceededException {
if (skipCount < this.skipLimit) {
return true;
}
return false;
}
}
Thus for any error occurring before the skip limit is reached, the skip policy will ignore the error (return true). The job will fail for any error after the skip limit is reached.
The mySkipListener class is as below:
public class mySkipListener implements SkipListener<MyItem, MyItem> {
public void onSkipInProcess(final MyItem item, final Throwable t) {
// TODO Auto-generated method stub
System.out.println("Skipped details during PROCESS is: " + t.getMessage());
}
public void onSkipInRead(final Throwable t) {
System.out.println("Skipped details during READ is: " + t.getMessage());
}
public void onSkipInWrite(final MyItem item, final Throwable t) {
// TODO Auto-generated method stub
System.out.println("Skipped details during WRITE is: " + t.getMessage());
}
}
Now in myItemProcessor I have the following code block:
if (item.getTheNumber().charAt(4) == '-') {
item.setProductNumber(item.getTheNumber().substring(0, 3));
} else {
item.setProductNumber("55");
}
For some of the items the theNumber field is null, and so the above code block throws a "StringIndexOutOfBounds" exception.
But I am seeing a strange behavior which I do not understand.
In all, there are 6 items that have the error, i.e. whose theNumber field is null.
If the skip limit is greater than the number of errors (i.e. > 6), the sysouts in the skip listener class are called and the skipped errors are reported.
However, if the skip limit is lower (say 5, as in my example), the sysouts in the skip listener class are not called at all and I directly get the below exception dump on the console:
org.springframework.batch.retry.RetryException: Non-skippable exception in recoverer while processing; nested exception is java.lang.StringIndexOutOfBoundsException
at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor$2.recover(FaultTolerantChunkProcessor.java:282)
at org.springframework.batch.retry.support.RetryTemplate.handleRetryExhausted(RetryTemplate.java:416)
at org.springframework.batch.retry.support.RetryTemplate.doExecute(RetryTemplate.java:285)
at org.springframework.batch.retry.support.RetryTemplate.execute(RetryTemplate.java:187)
What is the reason behind this behavior? What should I do to resolve it?
Thanks for reading!
The SkipListener is only invoked at the end of the chunk, and only if the tasklet that contains it finishes normally. When you have more errors than the skip-limit, that is reported via the exception you see, and the tasklet is aborted.
If the number of errors is less than the skip-limit, then the tasklet finishes normally and the SkipListener is invoked once for each skipped line or item; Spring Batch builds a list of them internally as it goes along but only reports them at the end.
The idea behind this is that if the task fails you are probably going to retry it, so knowing what got skipped during an incomplete run is not useful: every time you retry you would get the same notification. Only if everything else succeeds do you get to see what was skipped. Imagine you are logging the skipped items; you don't want them to be logged as skipped over and over again.
As you have seen, the simple solution is to make the skip-limit large enough. Again, the idea is that if you have to skip lots of items, there is probably a more serious problem.