I am new to Spring Batch, so I appreciate the help. So far I have two Spring Batch jobs. Both have an item reader (SQL select) and an item writer (SQL insert).
They look like this...
<job id="job-daily-tran-counts" xmlns="http://www.springframework.org/schema/batch">
<step id="job-daily-tran-counts-step1">
<tasklet>
<chunk
reader="dailyTranCountJdbcCursorItemReader"
writer="dailyTranCountItemWriter"
commit-interval="1000" />
</tasklet>
</step>
</job>
Now I want to write a simple batch job that executes a method in one of my managers, which refreshes the cache of several list-of-values maps. I don't think an item reader and item writer really fit here. How should I structure this batch job?
To be more specific, I have a class named LovManagerImpl and I need to execute its afterPropertiesSet method from Spring Batch. What's the best way to do that?
public class LovManagerImpl implements LovManager,InitializingBean {
/**
* The list of values data access object factory
*/
@Autowired
public LovDaoFactory lovDaoFactory;
/* (non-Javadoc)
* @see org.springframework.beans.factory.InitializingBean#afterPropertiesSet()
*/
public void afterPropertiesSet() throws ReportingManagerException {
Map<String,LovDao> lovDaoMap = lovDaoFactory.getLovDaoMap();
for (Map.Entry<String,LovDao> entry : lovDaoMap.entrySet()){
String code = (String)entry.getKey();
LovDao dao = (LovDao)entry.getValue();
dao.getLov(code);
}
}
}
thanks
Use a Tasklet; see the answer to "Can we write a Spring Batch Job Without ItemReader and ItemWriter".
For your specific case, reusing an existing service method, use a MethodInvokingTaskletAdapter.
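To illustrate the idea, here is a minimal plain-Java sketch of what a method-invoking tasklet does: it wraps an existing method and calls it once when the step runs. SimpleTasklet, LovCache, and MethodInvokingTasklet are hypothetical stand-ins and not Spring Batch classes; the real adapter is configured with a target object and a target method name rather than a Runnable.

```java
// Hypothetical stand-in for a tasklet contract: one unit of work per step.
interface SimpleTasklet {
    /** Returns true when the step has nothing more to do. */
    boolean execute() throws Exception;
}

// Hypothetical stand-in for the existing manager whose method should be reused.
class LovCache {
    int refreshCount = 0;

    void refresh() {
        refreshCount++; // the real method would reload the list-of-values maps
    }
}

// Wraps an existing no-argument method so a step can run it without a reader or writer.
class MethodInvokingTasklet implements SimpleTasklet {
    private final Runnable targetMethod;

    MethodInvokingTasklet(Runnable targetMethod) {
        this.targetMethod = targetMethod;
    }

    @Override
    public boolean execute() {
        targetMethod.run(); // delegate to the existing service method
        return true;        // a single invocation completes the step
    }
}
```

With the real adapter, the target object would be the LovManagerImpl bean and the target method afterPropertiesSet, and the step would reference the tasklet instead of declaring a chunk.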
I need to convert an existing project to a Spring Batch job to improve the job's speed.
Suppose the first tasklet retrieves a list of data from the database and passes it to a listener, so that the next step can retrieve it in @BeforeStep, apply some conditions to get another list (10k-20k records), and then run multiple pieces of business logic for each record.
But I am stuck on how to implement this step with partitioning in Spring Batch. All the tutorials I found put the query directly in the reader, with parameters injected from the ExecutionContext by a rangePartitioner. I can't do it that way.
<job id="testJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step1" next="step2">
<tasklet ref="driver"/>
<listeners>
<listener ref="promotionListener">
</listener>
</listeners>
</step>
<step id="step2">
<tasklet >
<chunk reader="bmtbBillGenReqReader"
processor="bmtbBillGenReqProcessor"
writer="bmtbBillGenReqWriter"
commit-interval="1">
</chunk>
</tasklet>
</step>
</job>
<bean id="promotionListener"
class="org.springframework.batch.core.listener.ExecutionContextPromotionListener">
<property name="keys">
<util:list>
<value>billGenRequests</value>
</util:list>
</property>
</bean>
Please advise how I can implement partitioning for step2. Maybe store the new list from step2 in a CSV file or something first?
You could implement your own Partitioner instead of using a RangePartitioner, and retrieve the data in that implementation instead of in a dedicated step.
Then pass the data each partition needs through its ExecutionContext. For example:
public class FilesPartitioner implements Partitioner {
@Autowired
private DataSource dataSource;
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
Map<String, ExecutionContext> map = new HashMap<>();
// Build the template from the injected DataSource; as a bare field it was never initialized.
JdbcOperations jdbcTemplate = new JdbcTemplate(dataSource);
List<String> filesname = jdbcTemplate.queryForList(
"SELECT DISTINCT FILENAME FROM MYTABLE", String.class);
for (int i = 0; i < filesname.size(); i++) {
ExecutionContext executionContext = new ExecutionContext();
executionContext.put("data", filesname.get(i));
String key = filesname.get(i);
map.put(key, executionContext);
}
return map;
}
}
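Then inject the parameters into the reader accordingly, for example with a step-scoped reader and a late-binding expression such as #{stepExecutionContext['data']}.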
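If the records are already in memory, as in the question, the same idea can be applied to index ranges: give each partition the bounds of its slice instead of a file name. The following is a minimal plain-Java sketch under that assumption; ListRangePartitioner, the "fromIndex" and "toIndex" keys, and the use of a plain Map in place of ExecutionContext are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ListRangePartitioner {

    // Splits itemCount items into at most gridSize contiguous index ranges.
    // Each inner map stands in for one partition's ExecutionContext.
    static Map<String, Map<String, Integer>> partition(int itemCount, int gridSize) {
        if (gridSize <= 0) {
            throw new IllegalArgumentException("gridSize must be positive");
        }
        Map<String, Map<String, Integer>> partitions = new LinkedHashMap<>();
        int base = itemCount / gridSize;      // minimum slice size
        int remainder = itemCount % gridSize; // the first 'remainder' slices get one extra item
        int from = 0;
        for (int i = 0; i < gridSize && from < itemCount; i++) {
            int size = base + (i < remainder ? 1 : 0);
            Map<String, Integer> context = new LinkedHashMap<>();
            context.put("fromIndex", from);      // inclusive
            context.put("toIndex", from + size); // exclusive
            partitions.put("partition" + i, context);
            from += size;
        }
        return partitions;
    }
}
```

Each worker step would then read only the items between its own fromIndex and toIndex.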
Hi, I have a requirement to read n flat files. If the reader throws a FlatFileParseException while reading a file, I want to stop reading the current file, exit it safely, move on to the next file, and continue the job execution. Currently I have this XML config, but I don't want to use it because I don't have a real skip limit. Is there any way to handle this scenario, maybe using an ItemReadListener?
<chunk reader="flatFileItemReader" writer="itemWriter"
commit-interval="10" skip-limit="2">
<skippable-exception-classes>
<include class="org.springframework.batch.item.file.FlatFileParseException"/>
</skippable-exception-classes>
</chunk>
Instead of specifying a skip-limit, you can use a skip policy. There are several out-of-the-box skip policies; since it sounds like you always want to skip (no limit), use AlwaysSkipItemSkipPolicy.
Example config:
<batch:skip-policy>
<bean:bean class="org.springframework.batch.core.step.skip.AlwaysSkipItemSkipPolicy"/>
</batch:skip-policy>
Thanks Doeleman. Based on your input I am able to skip the exception using AlwaysSkipItemSkipPolicy. This is how I have implemented it:
public class SkipPolicy extends AlwaysSkipItemSkipPolicy {
@Override
public boolean shouldSkip(java.lang.Throwable t, int skipCount){
if(t instanceof NonSkippableReadException){
return true;
}
return false;
}
}
XML config:
<batch:chunk reader="cvsFileItemReader" writer="mysqlItemWriter"
commit-interval="2" skip-policy="mySkipPolicy"/>
<bean id="mySkipPolicy" class="com.model.SkipPolicy"/>
I have a DAO class to retrieve a set of data from Hibernate.
<batch:step id="firstStep">
<batch:tasklet>
<batch:chunk reader="firstReader" writer="firstWriter"
processor="itemProcessor" commit-interval="2">
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="firstReader" class="com.process.MyReader"
scope="step">
</bean>
Inside my reader, I call the DAO to get the data before reading.
public class MyReader implements ItemReader<JobInstance>, StepExecutionListener {
private List<JobInstance> jobList;
private String currentDate;
@Autowired
private JobDAO perDAO;
@BeforeRead
public void init() {
//jobList= perDAO.getPersonAJobList(currentDate);
}
@Override
public JobInstance read() throws Exception, UnexpectedInputException,
ParseException, NonTransientResourceException {
return !jobList.isEmpty() ? jobList.remove(0) : null;
}
@Value("#{jobParameters['currentDate']}")
public void setCurrentDate(String currentDate) {
this.currentDate = currentDate;
}
@Override
public void beforeStep(StepExecution stepExecution) {
// TODO Auto-generated method stub
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
// TODO Auto-generated method stub
return null;
}
}
When I run the batch job, it keeps reading and processing repeatedly.
[org.springframework.batch.repeat.support.RepeatTemplate] [getNextResult] [372] - Repeat operation about to start at count=1
Below is my DAO class
@Autowired
private QueryManager queryManager;
@Autowired
public JobDAOImpl(SessionFactory sessionFactory) {
super(sessionFactory, JobInstance.class);
}
public List<JobInstance> getPersonAJobList(String currentDate) {
String sql = queryManager.getNamedQuery("getJobList");
System.out.println("---------------------- " + sql + " " + currentDate);
SQLQuery query = this.getCurrentSession().createSQLQuery(sql);
query.setParameter("current_date", currentDate);
....
return result;
}
If you fill the list within the @BeforeRead annotated method, the list will be renewed before every read.
see http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/annotation/BeforeRead.html
Marks a method to be called before an item is read from an ItemReader
If you need to get the items from a DAO, consider one of these implementations:
- the easy way: keep the current implementation, but add a check in the @BeforeRead method so the list is initialized only once
- a stateful DAO which fills the list once and removes an item on every read call
- a stateless DAO with pagination
A better way is to move the data access (the SQL) into the batch. Spring Batch provides out-of-the-box readers for SQL, Hibernate, and more; see http://docs.spring.io/spring-batch/reference/html/listOfReadersAndWriters.html
The init method should be called only once. The correct way to do this is either to implement the InitializingBean interface and its afterPropertiesSet method, or to use the @PostConstruct annotation instead of @BeforeRead.
The use of @BeforeRead is definitely wrong and makes no sense.
As also mentioned in the comments on Michael's answer, you should consider using one of the standard readers to get data from a database. If getPersonAJobList only returns a couple of hundred or a few thousand entries, it won't be a problem, but with millions of entries it would definitely be the wrong approach.
What about adding an 'init' flag to your reader? In MyReader.read():
if the flag is not set, call jobDAO to fill jobList and set the flag;
if the flag is set, consume the jobList items.
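A minimal plain-Java sketch of the flag approach follows. LazyListReader is a hypothetical stand-in for MyReader, and the Supplier stands in for the DAO call; the point is only that the load happens on the first read and never again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LazyListReader<T> {
    private final Supplier<List<T>> loader; // stands in for the DAO call
    private List<T> items;
    private boolean initialized = false;    // the 'init' flag
    private int nextIndex = 0;

    LazyListReader(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    // Returns the next item, or null once the list is exhausted.
    T read() {
        if (!initialized) {
            items = new ArrayList<>(loader.get()); // load exactly once
            initialized = true;
        }
        return nextIndex < items.size() ? items.get(nextIndex++) : null;
    }
}
```

Because the list is never refilled, read() eventually returns null, which is what ends the step.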
Be careful using jobList.remove(0), because your reader does not seem to be restartable; you need to keep the index of the last consumed item in the execution context, so that a restart continues from the first item of the last uncommitted chunk.
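The index bookkeeping could look like the following plain-Java sketch. RestartableListReader and the "reader.nextIndex" key are hypothetical, and a plain Map stands in for the execution context; in Spring Batch the equivalent hooks are the open and update methods of ItemStream.

```java
import java.util.List;
import java.util.Map;

class RestartableListReader<T> {
    static final String INDEX_KEY = "reader.nextIndex"; // hypothetical context key

    private final List<T> items;
    private int nextIndex = 0;

    RestartableListReader(List<T> items) {
        this.items = items;
    }

    // Called before reading starts: resume from the saved index, or from 0 on a fresh run.
    void open(Map<String, Object> context) {
        nextIndex = (Integer) context.getOrDefault(INDEX_KEY, 0);
    }

    // Called when a chunk commits: record how far reading has progressed.
    void update(Map<String, Object> context) {
        context.put(INDEX_KEY, nextIndex);
    }

    // Advances an index instead of removing items, so the position can be saved.
    T read() {
        return nextIndex < items.size() ? items.get(nextIndex++) : null;
    }
}
```

A restarted reader that is opened with the saved context picks up after the last committed item instead of starting over.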
I want to read multiple tables, fetch a few fields from each of them, and write the result to XML.
I have created a custom ItemReader and have multiple queries.
I have two issues:
1) My reader goes into an infinite loop, as I am not sure when and how to return null
2) What is the best way to consolidate data from multiple tables and send it to the ItemWriter?
public class SolrTransformProductReader implements ItemReader <ProductWithPrograms> {
@Autowired
private JdbcTemplate jdbcTemplate;
private String sql1 = "Select PRODUCT_CODE from product";
private String sql2 = "Select PRODUCT_CODE, CONTRIBUTOR_ID from product_Contributor";
@Override
public ProductWithPrograms read() throws Exception {
SqlRowSet productRows = jdbcTemplate.queryForRowSet(sql1);
while(productRows.next()) {
System.out.println("Product Code " + productRows.getString("PRODUCT_CODE"));
ProductWithPrograms pp = new ProductWithPrograms();
pp.setProduct_Code(productRows.getString("PRODUCT_CODE"));
return pp;
}
return null;
}
}
And my xml is as below
<job id="SEG_SolrTransformation" xmlns="http://www.springframework.org/schema/batch">
<batch:step id="solrProductTransformation">
<tasklet>
<chunk reader="solrTransformProductReader" writer="solrTransformProductWriter" commit-interval="999" />
</tasklet>
</batch:step>
</job>
It is better to use JdbcPagingItemReader, which is provided by Spring Batch, for reading the data. You can start multiple job instances, one for each table, and convert the results into XML.
You can specify the select, from, and where clauses as parameters for the job.
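For context, the infinite loop in the question comes from running the query again on every read() call, so the first row is returned each time and null is never reached. The plain-Java sketch below contrasts that shape with one that keeps its position between calls; ProductCodeReader is hypothetical and a List stands in for the product table.

```java
import java.util.Iterator;
import java.util.List;

class ProductCodeReader {
    private final List<String> table; // stands in for the product table
    private Iterator<String> rows;    // position kept across read() calls

    ProductCodeReader(List<String> table) {
        this.table = table;
    }

    // Shape of the reader in the question: a fresh "query" per call,
    // so every call yields the first row and the step never ends.
    String readRequeryingEveryCall() {
        Iterator<String> fresh = table.iterator();
        return fresh.hasNext() ? fresh.next() : null;
    }

    // Runs the "query" once, then steps through the rows and returns null at the end.
    String read() {
        if (rows == null) {
            rows = table.iterator();
        }
        return rows.hasNext() ? rows.next() : null;
    }
}
```

The built-in readers handle this position tracking themselves, which is one reason to prefer them over a hand-written reader.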
I'm using Spring Batch (3.0.1.RELEASE) with JPA and an HSQLDB server database.
I need to browse an entire table (using paging) and update items (one by one). So I used a JpaPagingItemReader. But when I run the job I can see that some rows are skipped, and the number of skipped rows is equal to the page size. For example, if my table has 12 rows and jpaPagingItemReader.pagesize = 3, the job will read rows 1, 2, 3 and then rows 7, 8, 9 (skipping rows 4, 5, 6)…
Could you tell me what is wrong in my code/configuration, or maybe it's an issue with HSQLDB paging?
Below is my code:
[EDIT]: The problem is with my ItemProcessor, which modifies the entity POJOs. Since JpaPagingItemReader flushes between reads, the entities are updated (this is what I want). But it seems that the paging cursor is also incremented (as can be seen in the log: rows with ID 4, 5 and 6 have been skipped). How can I manage this issue?
@Configuration
@EnableBatchProcessing(modular=true)
public class AppBatchConfig {
@Inject
private InfrastructureConfiguration infrastructureConfiguration;
@Inject private JobBuilderFactory jobs;
@Inject private StepBuilderFactory steps;
@Bean public Job job() {
return jobs.get("Myjob1").start(step1()).build();
}
@Bean public Step step1() {
return steps.get("step1")
.<SNUserPerCampaign, SNUserPerCampaign> chunk(0)
.reader(reader()).processor(processor()).build();
}
@Bean(destroyMethod = "")
@JobScope
public ItemStreamReader<SNUserPerCampaign> reader() {
JpaPagingItemReader<SNUserPerCampaign> reader = new JpaPagingItemReader<>();
reader.setEntityManagerFactory(infrastructureConfiguration.getEntityManagerFactory());
reader.setQueryString("select t from SNUserPerCampaign t where t.isactive=true");
reader.setPageSize(3);
return reader;
}
@Bean @JobScope
public ItemProcessor<SNUserPerCampaign, SNUserPerCampaign> processor() {
return new MyItemProcessor();
}
}
@Configuration
@EnableBatchProcessing
public class StandaloneInfrastructureConfiguration implements InfrastructureConfiguration {
@Inject private EntityManagerFactory emf;
@Override
public EntityManagerFactory getEntityManagerFactory() {
return emf;
}
}
from my ItemProcessor:
@Override
public SNUserPerCampaign process(SNUserPerCampaign item) throws Exception {
//do some stuff …
//then if (condition) update the Entity pojo :
item.setModificationDate(new Timestamp(System.currentTimeMillis()));
item.setIsactive(false);
return item;
}
from Spring xml config file:
<tx:annotation-driven transaction-manager="transactionManager" />
<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
<property name="entityManagerFactory" ref="entityManagerFactory" />
</bean>
<bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
<property name="dataSource" ref="dataSource" />
</bean>
<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
<property name="driverClassName" value="org.hsqldb.jdbcDriver" />
<property name="url" value="jdbc:hsqldb:hsql://localhost:9001/MYAppDB" />
<property name="username" value="sa" />
<property name="password" value="" />
</bean>
trace/log summarized :
11:16:05.728 TRACE MyItemProcessor - item processed: snUserInternalId=1]
11:16:06.038 TRACE MyItemProcessor - item processed: snUserInternalId=2]
11:16:06.350 TRACE MyItemProcessor - item processed: snUserInternalId=3]
11:16:06.674 DEBUG SQL- update SNUSER_CAMPAIGN set ...etc...
11:16:06.677 DEBUG SQL- update SNUSER_CAMPAIGN set ...etc...
11:16:06.679 DEBUG SQL- update SNUSER_CAMPAIGN set ...etc...
11:16:06.681 DEBUG SQL- select ...etc... from SNUSER_CAMPAIGN snuserperc0_
11:16:06.687 TRACE MyItemProcessor - item processed: snUserInternalId=7]
11:16:06.998 TRACE MyItemProcessor - item processed: snUserInternalId=8]
11:16:07.314 TRACE MyItemProcessor - item processed: snUserInternalId=9]
org.springframework.batch.item.database.JpaPagingItemReader creates its own entityManager instance
(from org.springframework.batch.item.database.JpaPagingItemReader#doOpen) :
entityManager = entityManagerFactory.createEntityManager(jpaPropertyMap);
If you are within a transaction, as it seems to be, reader entities are not detached
(from org.springframework.batch.item.database.JpaPagingItemReader#doReadPage):
if (!transacted) {
List<T> queryResult = query.getResultList();
for (T entity : queryResult) {
entityManager.detach(entity);
results.add(entity);
}//end if
} else {
results.addAll(query.getResultList());
tx.commit();
}
For this reason, when you update an item in the processor or writer, this item is still managed by the reader's entityManager.
When the item reader reads the next chunk of data, it flushes the context to the database.
So, if we look at your case, after the first chunk of data is processed, the database contains:
| id | active |
| 1  | false  |
| 2  | false  |
| 3  | false  |
org.springframework.batch.item.database.JpaPagingItemReader uses limit & offset to retrieve paginated data. So the next select created by the reader looks like :
select * from table where active = true offset 3 limit 3.
The reader will miss the items with id 4, 5, 6, because they are now the first three rows that match the query, and the offset skips exactly those rows.
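The effect can be reproduced with a small plain-Java simulation. OffsetPagingDemo and Row are hypothetical; the page method mimics a query of the form "where active = true offset ? limit ?" that is executed again for every page, while processing deactivates each row it has read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class OffsetPagingDemo {
    static class Row {
        final int id;
        boolean active = true;
        Row(int id) { this.id = id; }
    }

    // Mimics: select ... where active = true offset ? limit ?
    static List<Row> page(List<Row> table, int offset, int limit) {
        return table.stream()
                .filter(r -> r.active)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Reads page after page, deactivating every row it processes,
    // and returns the ids that were actually processed.
    static List<Integer> run(int rowCount, int pageSize) {
        List<Row> table = new ArrayList<>();
        for (int id = 1; id <= rowCount; id++) {
            table.add(new Row(id));
        }
        List<Integer> processed = new ArrayList<>();
        for (int pageNo = 0; ; pageNo++) {
            List<Row> rows = page(table, pageNo * pageSize, pageSize);
            if (rows.isEmpty()) {
                break;
            }
            for (Row row : rows) {
                row.active = false; // the update changes what the next query matches
                processed.add(row.id);
            }
        }
        return processed;
    }
}
```

With 12 rows and a page size of 3, run(12, 3) processes ids 1, 2, 3, 7, 8, 9 and then stops, which matches the rows reported in the question.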
As a workaround, you can use the JDBC implementation (org.springframework.batch.item.database.JdbcPagingItemReader), which does not page by offset. It pages on a sorted column (typically the id column), so you will not miss any data.
Of course, you will then have to update your data in the writer (using either JPA or a pure JDBC implementation).
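Why paging on a sort key is not affected by the updates can be shown with a plain-Java simulation. KeysetPagingDemo and Row are hypothetical, and the page method assumes a simplified query of the form "where active = true and id > lastId order by id limit n", which only approximates what the reader generates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class KeysetPagingDemo {
    static class Row {
        final int id;
        boolean active = true;
        Row(int id) { this.id = id; }
    }

    // Mimics: select ... where active = true and id > ? order by id limit ?
    static List<Row> page(List<Row> table, int lastId, int limit) {
        return table.stream()
                .filter(r -> r.active && r.id > lastId)
                .sorted(Comparator.comparingInt((Row r) -> r.id))
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Each page starts after the last id already read, so deactivating
    // earlier rows does not shift the window.
    static List<Integer> run(int rowCount, int pageSize) {
        List<Row> table = new ArrayList<>();
        for (int id = 1; id <= rowCount; id++) {
            table.add(new Row(id));
        }
        List<Integer> processed = new ArrayList<>();
        int lastId = 0;
        while (true) {
            List<Row> rows = page(table, lastId, pageSize);
            if (rows.isEmpty()) {
                break;
            }
            for (Row row : rows) {
                row.active = false;
                processed.add(row.id);
                lastId = row.id;
            }
        }
        return processed;
    }
}
```

In this simulation run(12, 3) processes all twelve ids in order, because the starting point of each page depends on the last key read and not on how many rows still match.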
Reader will be more verbose:
@Bean
public ItemReader<? extends Entity> reader() {
JdbcPagingItemReader<Entity> reader = new JdbcPagingItemReader<Entity>();
final SqlPagingQueryProviderFactoryBean sqlPagingQueryProviderFactoryBean = new SqlPagingQueryProviderFactoryBean();
sqlPagingQueryProviderFactoryBean.setDataSource(dataSource);
sqlPagingQueryProviderFactoryBean.setSelectClause("select *");
sqlPagingQueryProviderFactoryBean.setFromClause("from <your table name>");
sqlPagingQueryProviderFactoryBean.setWhereClause("where active = true");
sqlPagingQueryProviderFactoryBean.setSortKey("id");
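try {
reader.setQueryProvider(sqlPagingQueryProviderFactoryBean.getObject());
} catch (Exception e) {
// Fail fast: the reader cannot work without a query provider.
throw new IllegalStateException("Could not create the paging query provider", e);
}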
reader.setDataSource(dataSource);
reader.setPageSize(3);
reader.setRowMapper(new BeanPropertyRowMapper<Entity>(Entity.class));
return reader;
}
I faced the same case: my reader was a JpaPagingItemReader that queried on a field that was updated in the writer. As a result, half of the items that needed to be updated were skipped, because the page window kept advancing while the items already read no longer matched the reader's query.
The simplest workaround for me was to override the getPage method on the JpaPagingItemReader to always return the first page.
JpaPagingItemReader<XXXXX> jpaPagingItemReader = new JpaPagingItemReader<XXXXX>() {
@Override
public int getPage() {
return 0;
}
};
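A plain-Java simulation shows why always reading the first page works here, and when it would not. FirstPageOnlyDemo and Row are hypothetical; the firstPage method mimics a query of the form "where active = true limit n" with the offset fixed at zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class FirstPageOnlyDemo {
    static class Row {
        final int id;
        boolean active = true;
        Row(int id) { this.id = id; }
    }

    // Mimics: select ... where active = true limit ? (offset is always 0)
    static List<Row> firstPage(List<Row> table, int limit) {
        return table.stream()
                .filter(r -> r.active)
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Processed rows drop out of the query, so the "first page"
    // is a new set of rows on every iteration.
    static List<Integer> run(int rowCount, int pageSize) {
        List<Row> table = new ArrayList<>();
        for (int id = 1; id <= rowCount; id++) {
            table.add(new Row(id));
        }
        List<Integer> processed = new ArrayList<>();
        while (true) {
            List<Row> rows = firstPage(table, pageSize);
            if (rows.isEmpty()) {
                break;
            }
            for (Row row : rows) {
                row.active = false; // must happen for every row, or the loop never ends
                processed.add(row.id);
            }
        }
        return processed;
    }
}
```

Note the caveat this makes visible: the approach only terminates if every row that is read stops matching the query; a row that stays active would be returned again on each page.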
A couple things to note:
All entities that are returned from the JpaPagingItemReader are detached. We accomplish this in one of two ways. We either create a transaction before querying for the page, then commit the transaction (which detaches all entities associated with the EntityManager for that transaction) or we explicitly call entityManager.detach. We do this so that features like retry and skip can be correctly performed.
While you didn't post all the code in your processor, my hunch is that in the //do some stuff section, your item is getting re-attached which is why the update is occurring. However, without being able to see that code, I can't be sure.
In either case, using an explicit ItemWriter should be done. In fact, I consider it a bug that we don't require an ItemWriter when using java config (we do for XML).
For your specific issue of missing records, you need to keep in mind that a cursor isn't used by any of the *PagingItemReaders. They all execute independent queries for each page of data. So if you update the underlying data in between each page, it can have an impact on the items returned in future pages. For example, if my paging query specifies where val1 > 4 and a record's val1 is updated from 1 to 5, then in chunk 2 that item may be returned, since it now meets the criteria. If you need to update values that are in your where clause (thereby impacting what falls into the set of data you'd be processing), it's best to add a processed flag of some kind that you can query by instead.
I had the same problem with rows being skipped based on the pageSize.
If I have pageSize set to 2 for example, it would read 2, ignore 2, read 2, ignore 2 etc.
I was building a daemon processor to poll a 'Request' database table for records at a 'Waiting To Be Processed' status. The daemon is designed to run for ever in the background.
I had a 'status' field which was defined in the @NamedQuery and would select records whose status was '10':Waiting to be processed. After the record was processed, the status field would be updated to '20':Error or '30':Success.
This turned out to be the cause of the problem - I was updating a field which was defined in the query. If I introduced a 'processedField' and updated that instead of the 'status' field then no problem - all the records would be read.
As a possible solution to updating the status field, I setMaxItemCount to be the same as the PageSize; this updated the records correctly before step completion. I then keep executing the step until a request is made to stop the daemon. OK, probably not the most efficient way to do it (but I’m still benefiting from the ease of use that JPA provides) but I think it would probably be better to use JdbcPagingItemReader (described above – thanks!). Opinions on the best approach to this batch database polling problem would be welcome :)