Spring Batch: improve performance by partitioning - spring-batch

I need to convert an existing project to a Spring Batch job to improve the job's speed.
Suppose the first tasklet retrieves a list of data from the database and puts it in a listener. The next step can then retrieve it in @BeforeStep, apply some conditions to get another list (10k-20k records), and run multiple pieces of business logic for each record.
But I am stuck on how to implement this step with partitioning in Spring Batch. All the tutorials I found query directly in the reader, injected via the ExecutionContext populated by a RangePartitioner. I can't follow that approach.
<job id="testJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step1" next="step2">
<tasklet ref="driver"/>
<listeners>
<listener ref="promotionListener">
</listener>
</listeners>
</step>
<step id="step2">
<tasklet >
<chunk reader="bmtbBillGenReqReader"
processor="bmtbBillGenReqProcessor"
writer="bmtbBillGenReqWriter"
commit-interval="1">
</chunk>
</tasklet>
</step>
</job>
<bean id="promotionListener"
class="org.springframework.batch.core.listener.ExecutionContextPromotionListener">
<property name="keys">
<util:list>
<value>billGenRequests</value>
</util:list>
</property>
</bean>
Please advise how I can implement partitioning for step2. Should I store the new list from step2 in a CSV file or something first?

You could implement your own partitioner instead of using RangePartitioner, and retrieve the data in that implementation instead of in a dedicated step.
Then pass the data for each partition to be created according to your needs. For example:
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcOperations;
import org.springframework.jdbc.core.JdbcTemplate;

public class FilesPartitioner implements Partitioner {

    private JdbcOperations jdbcTemplate;

    @Autowired
    public void setDataSource(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> map = new HashMap<>();
        List<String> filesname = jdbcTemplate.queryForList(
                "SELECT DISTINCT FILENAME FROM MYTABLE", String.class);
        // one partition per file name; each worker step picks up
        // its own value under the "data" key
        for (String filename : filesname) {
            ExecutionContext executionContext = new ExecutionContext();
            executionContext.put("data", filename);
            map.put(filename, executionContext);
        }
        return map;
    }
}
And inject the parameters accordingly in the reader.
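For example, the worker step's reader can be declared step-scoped and pick up each partition's value through late binding. A rough sketch building on the file-based example above; the partition wiring, the taskExecutor, and the lineMapper bean are placeholders of my own, not taken from your config:
<job id="partitionedJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="masterStep">
        <partition step="step2" partitioner="filesPartitioner">
            <handler grid-size="10" task-executor="taskExecutor"/>
        </partition>
    </step>
</job>

<bean id="partitionedReader"
      class="org.springframework.batch.item.file.FlatFileItemReader"
      scope="step">
    <!-- "data" is the key each partition's ExecutionContext was populated with above -->
    <property name="resource" value="file:#{stepExecutionContext['data']}"/>
    <property name="lineMapper" ref="lineMapper"/>
</bean>
Each partition then runs its own copy of step2 against the single value it was handed.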

Related

unable to set/get the property values in the rowMapper

I'm trying to access the StepExecution in my RowMapper but am unable to do so. I have a property set in the XML called 'prop1'. I expect this to be set, but it is not. I also added a @BeforeStep method to the RowMapper, hoping I could get the stepExecutionContext, but this method is never invoked. Is there something else I need to do?
Here is my xml:
<bean id="bean1"
class="org.springframework.batch.item.database.JdbcCursorItemReader"
scope="step">
<property name="dataSource" ref="dataSource" />
<property name="sql"
value="${sql}"/>
<property name="fetchSize" value="${fetchSize}"></property>
<property name="rowMapper">
<bean id="rowMapper1" class="c.p.MyRowMapper" scope="step">
<property name="prop1" value="${prop1}"></property>
</property>
Here is my RowMapper:
public class MyRowMapper implements RowMapper<Object> {

    private String prop1;
    private StepExecution se;

    public String getProp1() {
        return prop1;
    }

    public void setProp1(String prop1) {
        this.prop1 = prop1;
    }

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        this.se = stepExecution;
    }
}
I have some properties set in the stepExecutionContext by another step before this one, and I want to use them here in the RowMapper. The same thing works in the ItemProcessor but not in the RowMapper. Please let me know if I need to do something more for the lazy binding, or if there is some other issue.
Thanks.
The step execution context is not shared between steps. Maybe you mean job execution context?
I suppose you could register rowMapper1 as a listener in the step (with the <listeners> tag), but if you just want to read some value from the job execution context you could use
value="#{jobExecutionContext['foobar']}"
If you do want to inject some value from the step execution context, you just have to replace jobExecutionContext with stepExecutionContext above.
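For the listener route, a minimal sketch, assuming rowMapper1 is pulled out to a top-level bean definition so it can be referenced from both the reader and the step (the writer name is a placeholder of mine):
<step id="step1">
    <tasklet>
        <chunk reader="bean1" writer="someWriter" commit-interval="100"/>
    </tasklet>
    <listeners>
        <!-- registering the row mapper here is what makes its @BeforeStep method fire -->
        <listener ref="rowMapper1"/>
    </listeners>
</step>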

Reading multiple tables Spring batch ItemReader

I want to read multiple tables to fetch a few fields from each of these tables & write them to an XML file.
I have created a custom ItemReader and have multiple queries.
I have two issues:
1) My reader goes into an infinite loop, as I am not sure when & how to return null
2) What is the best way to consolidate data from multiple tables & send it to the ItemWriter?
public class SolrTransformProductReader implements ItemReader<ProductWithPrograms> {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    private String sql1 = "Select PRODUCT_CODE from product";
    private String sql2 = "Select PRODUCT_CODE, CONTRIBUTOR_ID from product_Contributor";

    @Override
    public ProductWithPrograms read() throws Exception {
        // Note: this query is re-executed on every read() call, so the first
        // row is returned over and over - hence the infinite loop.
        SqlRowSet productRows = jdbcTemplate.queryForRowSet(sql1);
        while (productRows.next()) {
            System.out.println("Product Code " + productRows.getString("PRODUCT_CODE"));
            ProductWithPrograms pp = new ProductWithPrograms();
            pp.setProduct_Code(productRows.getString("PRODUCT_CODE"));
            return pp;
        }
        return null;
    }
}
And my XML is as below:
<job id="SEG_SolrTransformation" xmlns="http://www.springframework.org/schema/batch">
<batch:step id="solrProductTransformation">
<tasklet>
<chunk reader="solrTransformProductReader" writer="solrTransformProductWriter" commit-interval="999" />
</tasklet>
</batch:step>
</job>
Better to use the JdbcPagingItemReader provided by Spring Batch for reading the data. You can start multiple instances of the job, one for each table, and convert the results into XML.
You can specify the select, from, and where clauses as parameters for the job.
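A minimal sketch of such a step-scoped JdbcPagingItemReader taking its clauses from job parameters; the parameter names, the pagingReader id, and the productRowMapper bean are placeholders of my own:
<bean id="pagingReader"
      class="org.springframework.batch.item.database.JdbcPagingItemReader"
      scope="step">
    <property name="dataSource" ref="dataSource"/>
    <property name="queryProvider">
        <bean class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
            <property name="dataSource" ref="dataSource"/>
            <property name="selectClause" value="#{jobParameters['selectClause']}"/>
            <property name="fromClause" value="#{jobParameters['fromClause']}"/>
            <property name="whereClause" value="#{jobParameters['whereClause']}"/>
            <!-- a unique, ordered key is required for paging -->
            <property name="sortKey" value="PRODUCT_CODE"/>
        </bean>
    </property>
    <property name="pageSize" value="1000"/>
    <property name="rowMapper" ref="productRowMapper"/>
</bean>
Unlike the custom reader above, this reader tracks its own position and returns null by itself once the result set is exhausted, which also takes care of your first issue.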

spring batch job to execute method in manager

I am new to Spring Batch, so I appreciate the help. So far I have two Spring Batch jobs. Both of them have an item reader (SQL select) and an item writer (SQL insert).
They look like this...
<job id="job-daily-tran-counts" xmlns="http://www.springframework.org/schema/batch">
<step id="job-daily-tran-counts-step1">
<tasklet>
<chunk
reader="dailyTranCountJdbcCursorItemReader"
writer="dailyTranCountItemWriter"
commit-interval="1000" />
</tasklet>
</step>
</job>
Now I want to write a simple batch job to execute a method inside one of my managers which refreshes the cache of a number of list-of-values maps. An item reader and item writer do not really fit here, I think. How should I structure this batch job?
To be more specific, I have a class named LovManagerImpl and I need to execute its afterPropertiesSet method from Spring Batch. What's the best way to do that?
public class LovManagerImpl implements LovManager, InitializingBean {

    /**
     * The list of values data access object factory
     */
    @Autowired
    public LovDaoFactory lovDaoFactory;

    /* (non-Javadoc)
     * @see org.springframework.beans.factory.InitializingBean#afterPropertiesSet()
     */
    public void afterPropertiesSet() throws ReportingManagerException {
        Map<String, LovDao> lovDaoMap = lovDaoFactory.getLovDaoMap();
        for (Map.Entry<String, LovDao> entry : lovDaoMap.entrySet()) {
            String code = entry.getKey();
            LovDao dao = entry.getValue();
            dao.getLov(code);
        }
    }
}
Thanks.
Use a Tasklet; please refer to the Can we write a Spring Batch Job Without ItemReader and ItemWriter answer.
For your specific case - reuse of an existing service method - use a MethodInvokingTaskletAdapter.
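A minimal sketch of the second option, assuming your LovManagerImpl is registered as a bean named lovManager:
<bean id="lovRefreshTasklet"
      class="org.springframework.batch.core.step.tasklet.MethodInvokingTaskletAdapter">
    <!-- "lovManager" is assumed to be the bean name of your LovManagerImpl -->
    <property name="targetObject" ref="lovManager"/>
    <property name="targetMethod" value="afterPropertiesSet"/>
</bean>

<job id="job-lov-refresh" xmlns="http://www.springframework.org/schema/batch">
    <step id="job-lov-refresh-step1">
        <tasklet ref="lovRefreshTasklet"/>
    </step>
</job>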

How do I pass previous step data to partitioner

I am trying to run a partition-oriented job and having trouble accessing the data stored in the stepExecutionContext. Here is my job definition:
<batch:job id="job1" restartable="false" incrementer="idIncrementer">
<batch:step id="readwritestep" next="partitionStep">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader1"
writer="writer1"
commit-interval="500"/>
</batch:tasklet>
<batch:listeners>
<batch:listener ref="promotionListener"/>
</batch:listeners>
</batch:step>
<batch:step id="partitionStep" >
<batch:partition step="detailsStep" partitioner="partitioner">
<batch:handler grid-size="10" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
</batch:job>
<batch:step id="detailsStep">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader2"
processor="processor"
writer="writer2"
commit-interval="1500"/>
</batch:tasklet>
</batch:step>
While processing readwritestep I am storing some data in the step context and promoting it to the job context so that I can access it in the partitioner. But the custom partitioner I have implemented doesn't have any reference to the parent step from which I could access the stored data. Even though the partitioner is bound to the step, it can't access the parent step's data. Am I missing something here? One option the partitioner gives me is a jdbcTemplate to generate the splitter contexts, which I am not interested in. I have tried the @BeforeStep annotation to get access to the context data, but it is not getting invoked. I don't want to perform a JDBC read to generate the slave data; I want to take the LIST stored in the step/job execution context and generate the splitter contexts from it. Can somebody point me in the right direction so that I can access that data?
Here is the partitioning class...
public class ProductDetailsPartitioner implements Partitioner {

    private List<Product> prds;

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        List<String> referenceIds = new ArrayList<>();
        for (Product prd : prds) {
            referenceIds.add(prd.getReferenceId());
        }
        Map<String, ExecutionContext> results = new LinkedHashMap<>();
        for (String referenceId : referenceIds) {
            ExecutionContext context = new ExecutionContext();
            context.put("referenceId", referenceId);
            results.put("partition." + referenceId, context);
        }
        return results;
    }

    @BeforeStep
    public void retrieveInterstepData(StepExecution stepExecution) {
        System.out.println("Entered Before step in partition");
        JobExecution jobExecution = stepExecution.getJobExecution();
        ExecutionContext jobContext = jobExecution.getExecutionContext();
        System.out.println("ExecutionContext" + jobContext);
        this.prds = (List<Product>) jobContext.get("products");
    }
}
OK, I was able to get past this issue by taking another approach. Now I am building the partitions based on the job ID rather than passing the reference through the execution context. It's running fine. It's hard to explain the real solution, as it is based on the real business needs.
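For anyone hitting the same wall: the @BeforeStep method is never invoked because a partitioner is not automatically registered as a step listener. One alternative that is often suggested is to declare the partitioner step-scoped and inject the promoted list via late binding. A sketch, assuming the list was promoted to the job execution context under the key "products" and the class gets a setPrds setter (the package name is a placeholder):
<bean id="partitioner" class="com.example.ProductDetailsPartitioner" scope="step">
    <!-- late binding from the job execution context; "products" must have been
         promoted by readwritestep, e.g. via an ExecutionContextPromotionListener -->
    <property name="prds" value="#{jobExecutionContext['products']}"/>
</bean>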

How to pass data between Spring Batch jobs

I'm familiar with how to pass data between steps in a Spring Batch job. But what happens when your job is composed of many smaller jobs? In the example below, I would like to set some data in the JobExecutionContext at the end of the first job, siNotificationJob. Then, that data could be read from the JobExecutionContext or StepExecutionContext in the next job, ciNotificationJob. Do I need to promote this data somehow? I can't seem to see the results in the job parameters extractor, defined in the 'ciNotificationJob' step, that I use to configure my job parameters.
Thoughts?
Andrew
<job id="notificationJob" xmlns="http://www.springframework.org/schema/batch">
<batch:step id="pn_step_0" next="pn-step-1">
<batch:job ref="siNotificationJob" job-launcher="jobLauncher"
job-parameters-extractor="jobParamsExtractor"/>
</batch:step>
<batch:step id="pn-step-1" next="pn-step-2">
<batch:job ref="ciNotificationJob" job-launcher="jobLauncher"
job-parameters-extractor="jobParamsExtractor"/>
</batch:step>
</job>
I was able to resolve this. I'll show you by example how I solved it. It was complicated, but I think the end result is fairly easy to understand.
I have one overall job called 'notificationJob'. It has three steps that call 3 different jobs (not steps). Each of these jobs can run independently or be called from within the top-level 'notificationJob'. Also, each sub-job has many steps. I'm not going to show all those steps here; I just want to highlight that these are complete jobs themselves, with multiple steps each.
<job id="notificationJob" xmlns="http://www.springframework.org/schema/batch">
<batch:listeners>
<batch:listener ref="pn_job-parent-listener" />
</batch:listeners>
<batch:step id="pn_step-0" next="pn-step-1">
<batch:job ref="siNotificationJob" job-launcher="jobLauncher"
job-parameters-extractor="futureSiParamsExtractor"/>
</batch:step>
<batch:step id="pn-step-1" next="pn-step-2">
<batch:job ref="ciNotificationJob" job-launcher="jobLauncher"
job-parameters-extractor="futureCiParamsExtractor"/>
</batch:step>
<batch:step id="pn-step-2">
<batch:job ref="combineResultsJob" job-launcher="jobLauncher"
job-parameters-extractor="jobParamsExtractor"/>
</batch:step>
</job>
The key is being able to extract the results from one job and read them in the next job. Now, you could do this in multiple ways. One way would be to write the output of one job to a DB table or text file and have the next job read from that file/table. Since I wasn't dealing with that much data, I passed the information around in memory. So, you'll notice the job-parameters-extractors. You can either rely on a built-in implementation of a parameters extractor, or you can implement your own. I actually use both. All they do is extract the value from the StepExecution; then we need to promote/move it to the next sub-job.
<bean id="jobParamsExtractor" class="org.springframework.batch.core.step.job.DefaultJobParametersExtractor">
<property name="keys">
<list>
<value>OUTPUT</value>
</list>
</property>
</bean>
<bean id="futureSiParamsExtractor" class="jobs.SlideDatesParamExtractor">
<property name="mode" value="FORWARD" />
<property name="addedParams">
<map><entry>
<key><value>outputName</value></key>
<value>FUTURE_SI_JOB_RESULTS</value>
</entry></map>
</property>
</bean>
<bean id="futureCiParamsExtractor" class="jobs.SlideDatesParamExtractor">
<property name="mode" value="FORWARD" />
<property name="addedParams">
<map><entry>
<key><value>outputName</value></key>
<value>FUTURE_CI_JOB_RESULTS</value>
</entry></map>
</property>
</bean>
Finally, you'll notice that there is a parent job listener. This is the magic that transfers state from one job and makes it available to the next. Here is my implementation of the classes that do that.
<bean id="pn_job-state-listener" class="jobs.JobStateListener">
<property name="parentJobListener" ref="pn_job-parent-listener" />
</bean>
<bean id="pn_job-parent-listener" class="cjobs.ParentJobListener">
</bean>
package jobs.permnotification;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class ParentJobListener implements JobExecutionListener {

    private JobExecution parentExecution;

    @Override
    public void beforeJob(JobExecution jobExecution) {
        this.parentExecution = jobExecution;
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // TODO Auto-generated method stub
    }

    public void setParentExecution(JobExecution parentExecution) {
        this.parentExecution = parentExecution;
    }

    public JobExecution getParentExecution() {
        return parentExecution;
    }
}
package jobs.permnotification;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class JobStateListener implements JobExecutionListener {

    private ParentJobListener parentJobListener;

    @Override
    public void beforeJob(JobExecution jobExecution) {
        if (parentJobListener == null || parentJobListener.getParentExecution() == null) return;
        passStateFromParentToJob(StepKey.FUTURE_SI_JOB_RESULTS.toString(), jobExecution);
        passStateFromParentToJob(StepKey.FUTURE_CI_JOB_RESULTS.toString(), jobExecution);
        passStateFromParentToJob(StepKey.OUTPUT.toString(), jobExecution);
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (parentJobListener == null || parentJobListener.getParentExecution() == null) return;
        // take state from the child job and move it into the parent execution context
        passStateFromJobToParent(StepKey.FUTURE_SI_JOB_RESULTS.toString(), jobExecution);
        passStateFromJobToParent(StepKey.FUTURE_CI_JOB_RESULTS.toString(), jobExecution);
        passStateFromJobToParent(StepKey.OUTPUT.toString(), jobExecution);
    }

    private void passStateFromJobToParent(String key, JobExecution jobExecution) {
        Object obj = jobExecution.getExecutionContext().get(key);
        if (obj != null)
            parentJobListener.getParentExecution().getExecutionContext().put(key, obj);
    }

    private void passStateFromParentToJob(String key, JobExecution jobExecution) {
        Object obj = parentJobListener.getParentExecution().getExecutionContext().get(key);
        if (obj != null)
            jobExecution.getExecutionContext().put(key, obj);
    }

    public void setParentJobListener(ParentJobListener parentJobListener) {
        this.parentJobListener = parentJobListener;
    }

    public ParentJobListener getParentJobListener() {
        return parentJobListener;
    }
}
This is sort of a hack... I recommend you use Spring Integration instead, but see if this applies to your situation.
If you have the Spring Batch meta-data tables set up, you can probably get at the data that you generate within each job by querying the tables for your latest job run. All your data in the job execution context is stored and can be queried.
See: Spring Batch meta-data tables.