How to pass data between Spring Batch jobs - spring-batch

I'm familiar with how to pass data between steps in a Spring Batch job. But what happens when your job is composed of many smaller jobs? In the example below, I would like to set some data in the JobExecutionContext at the end of the first job, siNotificationJob. That data could then be read from the JobExecutionContext or StepExecutionContext in the next job, ciNotificationJob. Do I need to promote this data somehow? I can't see the results in the job parameters extractor on the step that launches ciNotificationJob, which I use to configure that job's parameters.
Thoughts?
Andrew
<job id="notificationJob" xmlns="http://www.springframework.org/schema/batch">
<batch:step id="pn_step_0" next="pn-step-1">
<batch:job ref="siNotificationJob" job-launcher="jobLauncher"
job-parameters-extractor="jobParamsExtractor"/>
</batch:step>
<batch:step id="pn-step-1" next="pn-step-2">
<batch:job ref="ciNotificationJob" job-launcher="jobLauncher"
job-parameters-extractor="jobParamsExtractor"/>
</batch:step>
</job>

I was able to resolve this. I'll walk through an example of how I solved it. It was complicated, but I think the end result is fairly easy to understand.
I have one overall job called 'notificationJob'. It has three steps that call three different jobs (not steps). Each of these jobs can run independently or be called from within the top-level 'notificationJob'. Each sub-job also has many steps. I won't show all of them here, but I want to highlight that these are complete jobs in their own right, each with multiple steps.
<job id="notificationJob" xmlns="http://www.springframework.org/schema/batch">
<batch:listeners>
<batch:listener ref="pn_job-parent-listener" />
</batch:listeners>
<batch:step id="pn_step-0" next="pn-step-1">
<batch:job ref="siNotificationJob" job-launcher="jobLauncher"
job-parameters-extractor="futureSiParamsExtractor"/>
</batch:step>
<batch:step id="pn-step-1" next="pn-step-2">
<batch:job ref="ciNotificationJob" job-launcher="jobLauncher"
job-parameters-extractor="futureCiParamsExtractor"/>
</batch:step>
<batch:step id="pn-step-2">
<batch:job ref="combineResultsJob" job-launcher="jobLauncher"
job-parameters-extractor="jobParamsExtractor"/>
</batch:step>
</job>
The key is being able to extract the results from one job and read them in the next. You could do this in several ways. One would be to write the result of one job to a DB table or text file and have the next job read from that table/file. Since I wasn't dealing with much data, I passed the information around in memory, which is why you'll notice the job-parameters-extractors. You can either rely on a built-in parameters extractor implementation or implement your own; I actually use both. All they do is extract values from the StepExecution; we'll still need to promote/move them to the next sub-job.
<bean id="jobParamsExtractor" class="org.springframework.batch.core.step.job.DefaultJobParametersExtractor">
<property name="keys">
<list>
<value>OUTPUT</value>
</list>
</property>
</bean>
<bean id="futureSiParamsExtractor" class="jobs.SlideDatesParamExtractor">
<property name="mode" value="FORWARD" />
<property name="addedParams">
<map><entry>
<key><value>outputName</value></key>
<value>FUTURE_SI_JOB_RESULTS</value>
</entry></map>
</property>
</bean>
<bean id="futureCiParamsExtractor" class="jobs.SlideDatesParamExtractor">
<property name="mode" value="FORWARD" />
<property name="addedParams">
<map><entry>
<key><value>outputName</value></key>
<value>FUTURE_CI_JOB_RESULTS</value>
</entry></map>
</property>
</bean>
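The source of jobs.SlideDatesParamExtractor isn't shown, so here is a plain-Java model (not the Spring Batch API) of what an extractor of this kind does: copy selected keys from the parent step's execution context and merge in fixed `addedParams`. The class name, the `extract` method and the map-based types are illustrative assumptions; a real extractor would implement Spring Batch's `JobParametersExtractor` and return `JobParameters`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of a job-parameters extractor (not Spring Batch code).
final class ParamsExtractorModel {

    /**
     * Builds the parameter map handed to a sub-job.
     *
     * @param stepContext values in the parent step's execution context
     * @param keys        keys to copy (like DefaultJobParametersExtractor's "keys")
     * @param addedParams fixed extra parameters (like the "addedParams" property above)
     */
    static Map<String, String> extract(Map<String, Object> stepContext,
                                       List<String> keys,
                                       Map<String, String> addedParams) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String key : keys) {
            Object value = stepContext.get(key);
            if (value != null) {            // skip keys the previous job never set
                params.put(key, value.toString());
            }
        }
        params.putAll(addedParams);         // fixed params win on a name clash
        return params;
    }
}
```

With a context holding `OUTPUT=done` and the keys `OUTPUT` plus one that was never set, the result contains `OUTPUT` and `outputName` only, so a missing upstream value simply doesn't reach the sub-job.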
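Finally, you'll notice that there is a parent job listener. This is the magic that transfers the state from one job and makes it available to the next. ParentJobListener, registered on the parent job, remembers the parent JobExecution; JobStateListener, which also has to be registered as a listener on each sub-job, copies the shared keys from the parent's execution context into the sub-job before it runs and copies them back afterwards. Here is my implementation of the classes that do that.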
<bean id="pn_job-state-listener" class="jobs.permnotification.JobStateListener">
<property name="parentJobListener" ref="pn_job-parent-listener" />
</bean>
<bean id="pn_job-parent-listener" class="jobs.permnotification.ParentJobListener">
</bean>
package jobs.permnotification;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class ParentJobListener implements JobExecutionListener
{
    private JobExecution parentExecution;

    @Override
    public void beforeJob(JobExecution jobExecution)
    {
        // remember the parent job's execution so the sub-jobs can share its context
        this.parentExecution = jobExecution;
    }

    @Override
    public void afterJob(JobExecution jobExecution)
    {
        // nothing to do when the parent job finishes
    }

    public void setParentExecution(JobExecution parentExecution)
    {
        this.parentExecution = parentExecution;
    }

    public JobExecution getParentExecution()
    {
        return parentExecution;
    }
}
package jobs.permnotification;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

// StepKey is an application enum (not shown) whose constants name the shared context keys.
public class JobStateListener implements JobExecutionListener
{
    private ParentJobListener parentJobListener;

    @Override
    public void beforeJob(JobExecution jobExecution)
    {
        if (parentJobListener == null || parentJobListener.getParentExecution() == null) return;
        // take state from the parent execution context and copy it into this sub-job
        passStateFromParentToJob(StepKey.FUTURE_SI_JOB_RESULTS.toString(), jobExecution);
        passStateFromParentToJob(StepKey.FUTURE_CI_JOB_RESULTS.toString(), jobExecution);
        passStateFromParentToJob(StepKey.OUTPUT.toString(), jobExecution);
    }

    @Override
    public void afterJob(JobExecution jobExecution)
    {
        if (parentJobListener == null || parentJobListener.getParentExecution() == null) return;
        // take state from the child job and move it into the parent execution context
        passStateFromJobToParent(StepKey.FUTURE_SI_JOB_RESULTS.toString(), jobExecution);
        passStateFromJobToParent(StepKey.FUTURE_CI_JOB_RESULTS.toString(), jobExecution);
        passStateFromJobToParent(StepKey.OUTPUT.toString(), jobExecution);
    }

    private void passStateFromJobToParent(String key, JobExecution jobExecution)
    {
        Object obj = jobExecution.getExecutionContext().get(key);
        if (obj != null)
            parentJobListener.getParentExecution().getExecutionContext().put(key, obj);
    }

    private void passStateFromParentToJob(String key, JobExecution jobExecution)
    {
        Object obj = parentJobListener.getParentExecution().getExecutionContext().get(key);
        if (obj != null)
            jobExecution.getExecutionContext().put(key, obj);
    }

    public void setParentJobListener(ParentJobListener parentJobListener)
    {
        this.parentJobListener = parentJobListener;
    }

    public ParentJobListener getParentJobListener()
    {
        return parentJobListener;
    }
}
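To make the order of events concrete, here is a framework-free simulation of the relay (plain Java, not Spring Batch): the parent context is one shared map, and each child copies the relay keys in before its body runs and back out afterwards. `RelaySimulation`, `runChild` and the use of plain maps are illustrative assumptions; in the real setup the maps are the `ExecutionContext`s of the parent and child `JobExecution`s.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical simulation of ParentJobListener + JobStateListener using plain maps.
final class RelaySimulation {

    // Keys relayed between sub-jobs, mirroring the StepKey values used above.
    static final List<String> KEYS =
            List.of("FUTURE_SI_JOB_RESULTS", "FUTURE_CI_JOB_RESULTS", "OUTPUT");

    /** Runs one child "job" whose body reads and writes its own context map. */
    static void runChild(Map<String, Object> parentContext,
                         Consumer<Map<String, Object>> jobBody) {
        Map<String, Object> childContext = new HashMap<>();
        // beforeJob: copy parent state into the child
        for (String key : KEYS) {
            Object value = parentContext.get(key);
            if (value != null) childContext.put(key, value);
        }
        jobBody.accept(childContext);
        // afterJob: copy child state back into the parent
        for (String key : KEYS) {
            Object value = childContext.get(key);
            if (value != null) parentContext.put(key, value);
        }
    }
}
```

Running an SI body that writes `FUTURE_SI_JOB_RESULTS`, then a CI body that reads it, leaves both results in the parent map; that is the effect the two listeners produce across siNotificationJob and ciNotificationJob.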

This is something of a hack; I'd recommend using Spring Integration instead, but see whether it applies to your situation.
If you have the Spring Batch metadata tables set up, you can probably get at the data generated within each job by querying the tables for your latest job run. Everything in the job execution context is stored there and can be queried.
Reference: Spring Batch meta-data tables
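A minimal JDBC sketch of that query, assuming the default table names from the Spring Batch schema scripts (no custom table prefix) and a `java.sql.Connection` to the metadata database. The class and method names are hypothetical, and the returned string is still in whatever serialized form the job repository's `ExecutionContextSerializer` produced, so it would need deserializing before use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: reads the job execution context of the latest run of a job
// directly from the Spring Batch metadata tables (default names assumed).
final class LatestJobContext {

    static final String SQL =
            "SELECT ec.SHORT_CONTEXT, ec.SERIALIZED_CONTEXT "
          + "FROM BATCH_JOB_EXECUTION_CONTEXT ec "
          + "JOIN BATCH_JOB_EXECUTION e ON e.JOB_EXECUTION_ID = ec.JOB_EXECUTION_ID "
          + "JOIN BATCH_JOB_INSTANCE i ON i.JOB_INSTANCE_ID = e.JOB_INSTANCE_ID "
          + "WHERE i.JOB_NAME = ? "
          + "ORDER BY e.JOB_EXECUTION_ID DESC";

    /** Returns the serialized context of the most recent execution, or null if none. */
    static String fetch(Connection connection, String jobName) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(SQL)) {
            ps.setString(1, jobName);
            ps.setMaxRows(1); // only the newest execution is needed
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;
                // Long contexts go to SERIALIZED_CONTEXT and SHORT_CONTEXT is truncated.
                String full = rs.getString("SERIALIZED_CONTEXT");
                return full != null ? full : rs.getString("SHORT_CONTEXT");
            }
        }
    }
}
```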

Related

Spring batch improve performance by Partitioning

I need to convert an existing project into a Spring Batch job to improve the job's speed.
Suppose my first tasklet retrieves a list of data from the database and puts it into the step execution context, where a promotion listener promotes it to the job context. The next step can then retrieve it in a @BeforeStep method, apply some conditions to get another list (10k-20k records), and then run several pieces of business logic for each record.
But I am stuck on how to implement this step with partitioning in Spring Batch. All the tutorials I found run the query directly in the reader, with values injected from the ExecutionContext created by a range partitioner. I can't do it that way.
<job id="testJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step1" next="step2">
<tasklet ref="driver"/>
<listeners>
<listener ref="promotionListener">
</listener>
</listeners>
</step>
<step id="step2">
<tasklet >
<chunk reader="bmtbBillGenReqReader"
processor="bmtbBillGenReqProcessor"
writer="bmtbBillGenReqWriter"
commit-interval="1">
</chunk>
</tasklet>
</step>
</job>
<bean id="promotionListener"
class="org.springframework.batch.core.listener.ExecutionContextPromotionListener">
<property name="keys">
<util:list>
<value>billGenRequests</value>
</util:list>
</property>
</bean>
Please advise how I can implement partitioning for step2. Should I maybe store the new list from step2 in a CSV file or something first?
You could implement your own partitioner instead of using a range partitioner, and retrieve the data in that implementation instead of in a dedicated step.
Then pass the data for each partition you create, according to your needs. For example:
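import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcOperations;
import org.springframework.jdbc.core.JdbcTemplate;

public class FilesPartitioner implements Partitioner {
    @Autowired
    private DataSource dataSource;

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // build the JDBC template from the injected DataSource
        JdbcOperations jdbcTemplate = new JdbcTemplate(dataSource);
        Map<String, ExecutionContext> map = new HashMap<>();
        List<String> filenames = jdbcTemplate.queryForList(
                "SELECT DISTINCT FILENAME FROM MYTABLE", String.class);
        // one partition per distinct file name (gridSize is not used here)
        for (String filename : filenames) {
            ExecutionContext executionContext = new ExecutionContext();
            executionContext.put("data", filename);
            map.put(filename, executionContext);
        }
        return map;
    }
}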
Then inject the parameters into the reader accordingly, for example with #{stepExecutionContext['data']} on a step-scoped reader.
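For the original question, where the records come from a previous step rather than a query, the same idea works with an in-memory list: split it into about `gridSize` contiguous ranges and give each partition only its bounds. Below is a plain-Java model of that splitting logic. The class name and the `fromIndex`/`toIndex` keys are hypothetical, and plain maps stand in for Spring Batch's `ExecutionContext`; in a real `Partitioner` each map would become an `ExecutionContext`, and a step-scoped reader could pick the bounds up with `#{stepExecutionContext['fromIndex']}`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java model of a range partitioner over an in-memory list.
final class ListRangePartitioner {

    /** Splits [0, size) into at most gridSize contiguous ranges, one map per partition. */
    static Map<String, Map<String, Integer>> partition(int size, int gridSize) {
        Map<String, Map<String, Integer>> partitions = new LinkedHashMap<>();
        if (size <= 0 || gridSize <= 0) return partitions;
        int chunk = (size + gridSize - 1) / gridSize; // ceiling division
        int index = 0;
        for (int from = 0; from < size; from += chunk) {
            Map<String, Integer> context = new LinkedHashMap<>();
            context.put("fromIndex", from);                       // inclusive
            context.put("toIndex", Math.min(from + chunk, size)); // exclusive
            partitions.put("partition" + index++, context);
        }
        return partitions;
    }
}
```

For 10 records and a grid size of 3 this yields the ranges [0, 4), [4, 8) and [8, 10); for 20,000 records and a grid size of 10 it yields ten ranges of 2,000 each. Passing only bounds keeps each partition's context small, which matters because execution contexts are persisted in the job repository.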

unable to set/get the property values in the rowMapper

I'm trying to access the StepExecution in my RowMapper but am unable to. I have a property set in the XML called 'prop1'. I expect it to be set, but it is not. I also added a @BeforeStep method to the RowMapper, hoping to get the step execution context, but this method is never invoked. Is there something else I need to do?
Here is my xml:
<bean id="bean1"
class="org.springframework.batch.item.database.JdbcCursorItemReader"
scope="step">
<property name="dataSource" ref="dataSource" />
<property name="sql"
value="${sql}"/>
<property name="fetchSize" value="${fetchSize}"></property>
<property name="rowMapper">
<bean id="rowMapper1" class="c.p.MyRowMapper" scope="step">
<property name="prop1" value="${prop1}"></property>
</bean>
</property>
</bean>
Here is my RowMapper:
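import java.sql.ResultSet;
import java.sql.SQLException;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.jdbc.core.RowMapper;
public class MyRowMapper implements RowMapper<Object> {
    private String prop1;
    private StepExecution se;
    public String getProp1() {
        return prop1;
    }
    public void setProp1(String prop1) {
        this.prop1 = prop1;
    }
    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        this.se = stepExecution;
    }
    @Override
    public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
        return null; // row-mapping logic omitted in the question
    }
}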
I have some properties set in the step execution context by another step that runs before this one, and I want to use them here in the RowMapper. The same thing works in the ItemProcessor but not in the RowMapper. Please let me know if I need to do something more for late binding, or if there is another issue.
Thanks.
The step execution context is not shared between steps. Maybe you mean the job execution context?
I suppose you could register rowMapper1 as a listener in the step (with the <listeners> tag). The ItemProcessor works because Spring Batch automatically registers the step's reader, processor and writer as listeners, whereas a RowMapper nested inside the reader is not registered. But if you just want to read a value from the job execution context, you could use
value="#{jobExecutionContext['foobar']}"
If you do want a value injected from the step execution context, just replace jobExecutionContext with stepExecutionContext above.

How can I make a XStreamMarshaller skip unknown binding?

I'm working on a Spring Batch program that unmarshals XML files with XStreamMarshaller.
How can I make XStreamMarshaller skip any unknown, unannotated fields?
<bean id="merge.reader.item"
class="org.springframework.batch.item.xml.StaxEventItemReader">
<property name="fragmentRootElementName" value="xml-fragment"/>
<property name="unmarshaller" ref="merge.reader.unmarshaller"/>
</bean>
<bean id="merge.reader.unmarshaller"
class="org.springframework.oxm.xstream.XStreamMarshaller">
<property name="aliases" ref="merge.reader.binder"/>
<property name="autodetectAnnotations" value="true"/>
</bean>
<util:map id="merge.reader.binder">
<entry key="xml-fragment" value="path.to.my.Model"/>
</util:map>
import com.thoughtworks.xstream.annotations.XStreamAlias;

public class Model {
    @XStreamAlias(value = "one")
    private String one;

    @XStreamAlias(value = "other")
    private String other;
}
The problem is that new XML elements will be introduced at some point.
I don't want to (actually, I can't) add extra fields to my Model.
I'm answering my own question. The solution is the one @biziclop linked (disclaimer: I posted the same answer on that post).
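import com.thoughtworks.xstream.XStream;
import org.springframework.oxm.xstream.XStreamMarshaller;

public class ExtendedXStreamMarshaller extends XStreamMarshaller {
    @Override
    protected void configureXStream(final XStream xstream) {
        super.configureXStream(xstream);
        xstream.ignoreUnknownElements(); // skip XML elements that have no matching field
    }
}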

How do I pass previous step data to partitioner

I am trying to run a partition-oriented job and am having trouble accessing data stored in the stepExecutionContext. Here is my job definition:
<batch:job id="job1" restartable="false" incrementer="idIncrementer">
<batch:step id="readwritestep" next="partitionStep">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader1"
writer="writer1"
commit-interval="500"/>
</batch:tasklet>
<batch:listeners>
<batch:listener ref="promotionListener"/>
</batch:listeners>
</batch:step>
<batch:step id="partitionStep" >
<batch:partition step="detailsStep" partitioner="partitioner">
<batch:handler grid-size="10" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
</batch:job>
<batch:step id="detailsStep">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader2"
processor="processor"
writer="writer2"
commit-interval="1500"/>
</batch:tasklet>
</batch:step>
While processing readwritestep, I store some data in the step context and promote it to the job context so that I can access it in the partitioner. But the custom partitioner I implemented has no reference to the parent step through which I could reach the stored data. Even though the partitioner is bound to a step, it can't access the parent step's data. Am I missing something here? One option is to use a JdbcTemplate in the partitioner to generate the splitter contexts, which I'm not interested in. I tried adding a @BeforeStep method to get access to the context data, but it is never invoked. I don't want to perform a JDBC read to generate the slave data; I want to take the List stored in the step/job execution context and generate the splitter contexts from it. Can somebody point me in the right direction so that I can access that data?
Here is the partitioning class...
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class ProductDetailsPartitioner implements Partitioner {
    private List<Product> prds;

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        List<String> referenceIds = new ArrayList<String>();
        for (Product prd : prds) {
            referenceIds.add(prd.getReferenceId());
        }
        Map<String, ExecutionContext> results = new LinkedHashMap<String, ExecutionContext>();
        for (String referenceId : referenceIds) {
            ExecutionContext context = new ExecutionContext();
            context.put("referenceId", referenceId);
            results.put("partition." + referenceId, context);
        }
        return results;
    }

    @BeforeStep
    public void retrieveInterstepData(StepExecution stepExecution) {
        System.out.println("Entered Before step in partition");
        JobExecution jobExecution = stepExecution.getJobExecution();
        ExecutionContext jobContext = jobExecution.getExecutionContext();
        System.out.println("ExecutionContext" + jobContext);
        this.prds = (List<Product>) jobContext.get("products");
    }
}
OK, I was able to get past this issue by taking another approach: I now partition based on the job ID rather than passing the reference through the execution context. It's running fine. It's hard to explain the real solution, as it is based on the actual business needs.

Implementing SkipListener to write invalid records to a flat file

I am setting up a Spring Batch job that does the conventional READ > PROCESS > WRITE operation. However, I am trying to implement a listener that captures records considered invalid during the PROCESS phase and writes them to an error log file.
My listener class uses an instance of FlatFileItemWriter to write the data. However, Spring Batch is not instantiating the writer instance properly.
My listener class looks like this:
import java.util.ArrayList;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.listener.SkipListenerSupport;
import org.springframework.batch.item.file.FlatFileItemWriter;

public class DTOProcessorListener extends SkipListenerSupport<AttributeReportGenerationDTO, AttributeValue> {
    private static final Logger LOGGER = LoggerFactory.getLogger(DTOProcessorListener.class);

    private FlatFileItemWriter<AttributeReportGenerationDTO> flatFileItemWriter;

    @Override
    public void onSkipInProcess(AttributeReportGenerationDTO item, Throwable t) {
        try {
            LOGGER.error("Record not processed for attribute value with ID : " + item.getAttributeValueId());
            List<AttributeReportGenerationDTO> list = new ArrayList<AttributeReportGenerationDTO>();
            list.add(item);
            flatFileItemWriter.write(list);
        } catch (Exception e) {
            LOGGER.error("Unable to write to the error output file", e);
        }
    }

    /**
     * @param flatFileItemWriter
     *            the flatFileItemWriter to set
     */
    public void setFlatFileItemWriter(FlatFileItemWriter<AttributeReportGenerationDTO> flatFileItemWriter) {
        this.flatFileItemWriter = flatFileItemWriter;
    }
}
and my job configuration XML looks like this:
<bean id="skipListener" class="something.DTOProcessorListener" scope="step">
<property name="flatFileItemWriter">
<bean id="errorItemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step">
<property name="resource" value="file:#{jobParameters['error.filename']}" />
<property name="appendAllowed" value="true" />
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
<property name="names"
value="productTypeId, productTypeName, productId, productName, skuId, skuName, attributeValueId, attributeName, attributeValue, attributeType, nonEditableValueCheckSum, editableValueCheckSum" />
</bean>
</property>
</bean>
</property>
<property name="headerCallback">
<bean class="something.CsvHeaderImplementation">
<property name="headerString"
value="Product Type ID,Product Type,Product ID,Product Name,Sku ID,Sku Name,Attribute Value ID,Attribute Name,Attribute Value,Attribute Type,Check Sum 1,Check Sum 2" />
</bean>
</property>
</bean>
</property>
</bean>
I get the error
org.springframework.batch.item.WriterNotOpenException: Writer must be open before it can be written to
I am unable to add a stream entry in the job config because the FlatFileItemWriter bean is defined inline (inside the listener). If I create a bean outside the listener and refer to it, I get a proxy instance of the FlatFileItemWriter class.
Has anyone successfully wired up a writer to a flat file in the listener?
Thanks for the help
Well, why don't you use the writer as a normal bean? You could register it as a stream, and to get around the step-scope proxy you could use the PropertyPlaceholderConfigurer.
I created a working example in my GitHub repo, but I think Spring Batch could use an improvement here; it should be easier to implement error-item logging.
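The `WriterNotOpenException` means the writer was used before its `open(...)` call; registering it as a stream lets the step open and close it around the chunk processing. The plain-Java sketch below models that lifecycle outside Spring Batch: open once before the step, append one delimited line per skipped item, close afterwards. `SkippedItemLog` and its methods are hypothetical, and the comma-joined fields merely stand in for `DelimitedLineAggregator`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical model of an error-record writer used from a skip listener.
final class SkippedItemLog implements AutoCloseable {

    private BufferedWriter out; // null until open() is called

    /** Mirrors ItemStream.open: must run before the first write. */
    void open(Path file, String header) throws IOException {
        boolean isNew = !Files.exists(file);
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND); // like appendAllowed=true
        if (isNew) { // header only for a fresh file
            out.write(header);
            out.newLine();
        }
    }

    /** Called from onSkipInProcess: writes one delimited line for the skipped item. */
    void write(List<String> fields) throws IOException {
        if (out == null) {
            throw new IllegalStateException("Writer must be open before it can be written to");
        }
        out.write(String.join(",", fields));
        out.newLine();
    }

    /** Mirrors ItemStream.close: flushes and releases the file. */
    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}
```

Calling `write` before `open` fails the same way the original listener did, which is why the open/close calls have to be driven by the step rather than left to the listener.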