Spring Batch restart functionality not working when using #StepScope - spring-batch

I want to use Spring Batch (v3.0.9) restart functionality so that when JobInstance restarted the process step reads from the last failed chunk point forward. My restart works fine as long as I don't use #StepScope annotation to my myBatisPagingItemReader bean method.
I was using #StepScope so that i can do late binding to get the JobParameters in my myBatisPagingItemReader bean method #Value("#{jobParameters['run-date']}"))
If I use #StepScope annotation on myBatisPagingItemReader() bean method the restart does not work as it creates new instance (scope=step, name=scopedTarget.myBatisPagingItemReader).
If i use stepscope, is it possible for my myBatisPagingItemReader to set the read.count from the last failure to get restart working?
I have explained this issue with example below.
#Configuration
#EnableBatchProcessing
public class BatchConfig {
#Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
ItemReader<Model> myBatisPagingItemReader,
ItemProcessor<Model, Model> itemProcessor,
ItemWriter<Model> itemWriter) {
return stepBuilderFactory.get("data-load")
.<Model, Model>chunk(10)
.reader(myBatisPagingItemReader)
.processor(itemProcessor)
.writer(itemWriter)
.listener(itemReadListener())
.listener(new JobParameterExecutionContextCopyListener())
.build();
}
#Bean
public Job job(JobBuilderFactory jobBuilderFactory, #Qualifier("step1")
Step step1) {
return jobBuilderFactory.get("load-job")
.incrementer(new RunIdIncrementer())
.start(step1)
.listener(jobExecutionListener())
.build();
}
#Bean
#StepScope
public ItemReader<Model> myBatisPagingItemReader(
SqlSessionFactory sqlSessionFactory,
#Value("#{JobParameters['run-date']}") String runDate)
{
MyBatisPagingItemReader<Model> reader = new
MyBatisPagingItemReader<>();
Map<String, Object> parameterValues = new HashMap<>();
parameterValues.put("runDate", runDate);
reader.setSqlSessionFactory(sqlSessionFactory);
reader.setParameterValues(parameterValues);
reader.setQueryId("query");
return reader;
}
}
Restart Example when I use #Stepscope annotation to myBatisPagingItemReader(), the reader is fetching 5 records and I have chunk size(commit-interval) set to 3.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
writer - writes all 3 records
chunk-1 commit successful
chunk-2:
process record-4
process record-5 - Throws and exception
Job completes and set to 'FAILED' status
Now the Job is Restarted again using same Job Parameter.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
process record-1
process record-2
process record-3
writer - writes all 3 records
chunk-1 commit successful
chunk-2:
process record-4
process record-5 - Throws and exception
Job completes and set to 'FAILED' status
The #StepScope annotation on myBatisPagingItemReader() bean method creates a new instance , see below log message.
Creating object in scope=step, name=scopedTarget.myBatisPagingItemReader
Registered destruction callback in scope=step, name=scopedTarget.myBatisPagingItemReader
As it is new instance it start the process from start, instead of starting from chunk-2.
If i don't use #Stepscope, it restarts from chunk-2 as the restarted job step sets - MyBatisPagingItemReader.read.count=3.

The issue here is that you are returning an ItemReader instead of the fully qualified class (MyBatisPagingItemReader) or at least ItemStreamReader. When you use Spring Batch's step scope, we create a proxy to allow for late initialization. The proxy is based on the return type of the method (ItemReader in your case). The issue you are running into is that because the proxy is of ItemReader, Spring Batch does not know that your bean also implements ItemStream and it is that interface that enables restartability. By default, Spring Batch will automatically register all beans of type ItemStream for you (you can also explicitly register the beans yourself, but it's typically not needed).
To address your issue, the following should work (note the change in the return type):
#Bean
#StepScope
public MyBatisPagingItemReader<Model> myBatisPagingItemReader(
SqlSessionFactory sqlSessionFactory,
#Value("#{JobParameters['run-date']}") String runDate) {
MyBatisPagingItemReader<Model> reader =
new MyBatisPagingItemReader<>();
Map<String, Object> parameterValues = new HashMap<>();
parameterValues.put("runDate", runDate);
reader.setSqlSessionFactory(sqlSessionFactory);
reader.setParameterValues(parameterValues);
reader.setQueryId("query");
return reader;
}
This is why it is my recommendation that where possible, when using #Bean annotated methods, you should return the most concrete type possible to allow Spring to help as much as possible.

Related

Step ExecutionContext not promoted using Spring Cloud Task on Spring Cloud Data Flow

I successfully deployed a remote partitioned job using Spring Cloud Data Flow and Spring Cloud Task; the installation is based on Kubernetes, so I added the Kubernetes implementation of Spring Cloud Deployer to the project.
But it seems that it's impossible to propagate values from the step execution context of a worker to the job execution context.
The worker tasklet writes some data into the step execution context, which is successfully saved in the "BATCH_STEP_EXECUTION_CONTEXT" table:
#Bean
public Tasklet workerTasklet() {
return new Tasklet() {
#Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
ExecutionContext executionContext = chunkContext.getStepContext()
.getStepExecution()
.getExecutionContext();
Integer partitionNumber = executionContext.getInt("partitionNumber");
System.out.println("This tasklet ran partition UPD1: " + partitionNumber);
executionContext.put(RESULT_PREFIX+partitionNumber, "myResult "+partitionNumber);
return RepeatStatus.FINISHED;
}
};
}
There's an ExecutionPromotionListener which is added during the Step building process:
#Bean
public StepExecutionListener promotionListener() {
ExecutionContextPromotionListener listener = new
ExecutionContextPromotionListener();
listener.setKeys(new String[] {"result0","result1","result2","result3"});
return listener;
}
#Bean
public Step workerStep() {
return this.stepBuilderFactory.get("workerStep")
.tasklet(workerTasklet())
.listener(promotionListener())
.build();
}
The job successfully completes and the work is partitioned properly and executed by 4 Kubernetes pods:
But the expected values are not present in the BATCH_JOB_EXECUTION_CONTEXT table.
Conversley, the step execution context propagation works with a partitioned job in a not cloud environment, for example using a TaskExecutorPartitionHandler.

Reading file dynamically in spring-batch

I am trying to transfer any files (video,txt etc) between different endpoints (pc, s3, dropbox, google drive) using spring-batch on a network. For that, I am getting json file containing list of files location(url) to be transferred (assume I can access those location).
So, how do I tell the reader to read the input once my controller is hit (in which job is created) and not at the time of starting spring-boot application?
I have tried adding "spring.batch.job.enabled=false" which stops spring-batch to start automatically but my concern is where should I write setting my resource line that will be provided to ItemReader :
FlatFileItemReader<String> reader = new FlatFileItemReader<String>();
reader.setResource(someResource);
Because during setting resources I am getting NullPointerException.
The Running Jobs from within a Web Container explains that with a code example. Here is an except:
#Controller
public class JobLauncherController {
#Autowired
JobLauncher jobLauncher;
#Autowired
Job job;
#RequestMapping("/jobLauncher.html")
public void handle() throws Exception{
jobLauncher.run(job, new JobParameters());
}
}
In your case, you need to extract the file name from the request and pass it as a job parameter, something like:
#RequestMapping("/jobLauncher.html")
public void handle() throws Exception{
URL url = // extract url from request
JobParameters parameters = new JobParametersBuilder()
.addString("url", url)
.toJobParameters();
jobLauncher.run(job, parameters);
}
Then make your reader step-scoped and dynamically extract the file from job parameters:
#StepScope
#Bean
public FlatFileItemReader flatFileItemReader(#Value("#{jobParameters['url']}") URL url) {
return new FlatFileItemReaderBuilder<String>()
.resource(new UrlResource(url))
// set other properties
.build();
}
This is explained in the Late Binding of Job and Step Attributes section.

Spring Batch restart

I am new to Spring Batch. I have some question about restart. I know restart feature enabled by default. Any extra code I need to do restart any job? Which jobs are restart-able. How can I test my batch app is restartable. I tried to stop the batch middle of process and run again. It always executing a new job.
Below are my code :
#Bean
#Qualifier("dataTransferJob")
public Job dataJob() {
return jobBuilderFactory.get("data-transfer-job")
.listener(jobExecutionListener())
.flow(step()).end().build();
}
#Bean
public Step step() {
return stepBuilderFactory.get("data-transfer-step")
.<TestData, TestDataVO>chunk(100)
.reader(reader())
.processor(process())
.writer(writer)
.taskExecutor(threadPool)
.transactionManager(transactionManager)
.listener(stepExecutionListener())
.listener(chunkListener())
.throttleLimit(10)
.build();
}
#PersistenceContext
private EntityManager em;
#Bean(destroyMethod="")
public ItemReader<TestData> reader() {
JpaPagingItemReader<TestData> itemReader = new JpaPagingItemReader<>();
try {
String sqlQuery = "SELECT * FROM TEST_DATA";
JpaNativeQueryProvider<TestData> queryProvider = new JpaNativeQueryProvider<TestData>();
queryProvider.setSqlQuery(sqlQuery);
queryProvider.setEntityClass(TestData.class);
queryProvider.afterPropertiesSet();
itemReader.setEntityManagerFactory(em.getEntityManagerFactory());
itemReader.setPageSize(100);
itemReader.setQueryProvider(queryProvider);
itemReader.afterPropertiesSet();
itemReader.setSaveState(true);
}
catch (Exception e) {
System.out.println("BatchConfiguration.reader() ==> error " + e.getMessage());
}
return itemReader;
}
And lunch the job using CommandLineRunner
#Autowired
JobLauncher jobLauncher;
#Autowired
#Qualifier("dataTransferJob")
Job dataJob;
JobParametersBuilder paramsBuilder = new JobParametersBuilder();
paramsBuilder.addString("date", LocalDateTime.now().toString());
JobExecution jobExecution=jobLauncher.run(dataJob, paramsBuilder.toJobParameters());
In Spring Batch, a job instance is identified by the (identifying) job parameters. Please check the The domain language of Batch section to understand the difference between the Job, JobInstance and JobExecution concepts and how parameters are used to identify job instances.
I tried to stop the batch middle of process and run again. It always executing a new job.
In your case, since your are adding the current time as a job parameter on each run here:
JobParametersBuilder paramsBuilder = new JobParametersBuilder();
paramsBuilder.addString("date", LocalDateTime.now().toString());
you end up with a different job instance each time. If you want to start the same job instance again, you need to pass the same timestamp of the first attempt as a job parameter.

How to handle stateful item reader in SpringBatch

Our SpringBatch Job has a single Step with an ItemReader, ItemProcessor, and ItemWriter. We are running the same job concurrently with different parameters. The ItemReader is stateful as it contains an input stream that it reads from.
So, we don't want the same instance of the ItemReader to be used for every JobInstance (Job + Parameters) invocation.
I am not quite sure which is the best "scoping" for this situation.
1) Should the Step be annotated with #JobScope and ItemReader be a prototype?
OR
2) Should the Step be annotated with #StepScope and ItemReader be a prototype?
OR
3) Should both the Step and ItemReader be annotated as Prototype?
The end result should be such that a new ItemReader is created for every new execution of the Job with different identifying parameters (ie, for every new JobInstance).
Thanks.
-AP_
Here's how it goes from a class instantiation standpoint (from least to most instances):
Singleton (per JVM)
JobScope (per job)
StepScope (per step)
Prototype (per reference)
If you have multiple jobs running in a single JVM (assuming you aren't in a partitioned Step, JobScope will be sufficient. If you have a partitioned step, you'll want StepScope. Prototype would be overkill in all scenarios.
However, if these jobs are launching in different JVMs (and not a partitioned step), then a simple Singleton bean will be just fine.
There is no need that every component (Step, ItemReader, ItemProcessor, ItemWriter) has to be a spring component. For instance, with the SpringBatch-JavaApi, only your Job needs to be a SpringBean, but not your Steps, Readers and Writers:
#Autowired
private JobBuilderFactory jobs;
#Autowired
private StepBuilderFactory steps;
#Bean
public Job job() throws Exception {
return this.jobs.get(JOB_NAME) // create jobbuilder
.start(step1()) // add step 1
.next(step2()) // add step 2
.build(); // create job
}
#Bean
public Job job() throws Exception {
return this.jobs.get(JOB_NAME) // create jobbuilder
.start(step1(JOB_NAME)) // add step 1
.next(step2(JOB_NAME)) // add step 2
.build(); // create job
}
private Step step1(String jobName) throws Exception {
return steps.get(jobName + "_Step_1").chunk(10) //
.faultTolerant() //
.reader(() -> null) // you could lambdas
.writer(items -> {
}) //
.build();
}
private Step step2(String jobName) throws Exception {
return steps.get(jobName + "_Step_2").chunk(10) //
.faultTolerant() //
.reader(createDbItemReader(ds, sqlString, rowmapper)) //
.writer(createFileWriter(resource, aggregator)) //
.build();
}
The only thing you have to pay attention to is that you have to call the "afterPropertiesSet"-methods when creating instances like JdbcCurserItemReader, FlatFileItemReader/Writer:
private static <T> ItemReader<T> createDbItemReader(DataSource ds, String sql, RowMapper<T> rowMapper) throws Exception {
JdbcCursorItemReader<T> reader = new JdbcCursorItemReader<>();
reader.setDataSource(ds);
reader.setSql(sql);
reader.setRowMapper(rowMapper);
reader.afterPropertiesSet(); // don't forget
return reader;
}
private static <T> ItemWriter<T> createFileWriter(Resource target, LineAggregator<T> aggregator) throws Exception {
FlatFileItemWriter<T> writer = new FlatFileItemWriter();
writer.setEncoding("UTF-8");
writer.setResource(target);
writer.setLineAggregator(aggregator);
writer.afterPropertiesSet(); // don't forget
return writer;
}
This way, there is no need for you to hassle around with the Scopes. Every Job will have its own instances of its Steps and their Readers and Writers.
Another advantage of this approach is the fact that you now can create your jobs completly dynamically.

why spring batch creates a new instance when i try to restart the day after a failed job?

image datas from database
I have a failed execution instance in my repository on date 2016-03-14.
If i try to restart the job's instance on the date of 2016-03-15, a new instance and a new execution with the previous job's parameters (2016-03-14) are creating.
But the job is restarting a complete step instead of doing a recover process(starting at the last line before the fail event).
Why i have a new instance?
If i restart on the same day (failed job and the restarted job) i have no problem (one instance sharing between job's execution).
EDIT:
I start my job with this code:
#Bean
public Job myJob(JobBuilderFactory jobs, Step stepInjectCsvWsIntoCsv) {
return jobs.get("myJob")
.listener(new JobListener())
.incrementer(new RunIdDateIncrementor())
.flow(stepInjectCsvWsIntoCsv)
.end().build();
}
RunIdDateIncrementor is my own class. It's here that i create parameters (run.id and run.date)
I use a FlatItemReader and a CompositeWriter which manage two MultiResourceItemWriter and implements ResourceAwareItemWriterItemStream
And the step configuration :
#Bean(name = "stepInjectCsvWsIntoCsv")
public Step stepInjectCsvWsIntoCsv(StepBuilderFactory stepBuilderFactory, ItemReader<GetDataInCsv> csvReader,
CompositeTwoCsvFileItemWriter getDataWriter,
ItemProcessor<GetDataInCsv, List<GetDataOutCsv>> getDataProcessor
) {
/* it handles bunches of 10 units => limité à 10 stations*/
return stepBuilderFactory.get("stepInjectCsvWsIntoCsv").listener(new StepListener())
.<GetDataInCsv, List<GetDataOutCsv>> chunk(1)
.reader(csvReader).processor(getDataProcessor).writer(getDataWriter)
.faultTolerant().skipLimit(1000).skip(GetDataFault.class)
.listener(new CustomChunkListener())
.listener(new CustomItemReaderListener())
.listener(new GetDataItemProcessListener())
.listener(new CustomItemWriterListener())
.build();
}
I have a new instance then an empty execution context and so the restart isn't detected.
I use SPRING BOOT too.
The Launch
#SpringBootApplication
public class BatchWsVersCsv implements CommandLineRunner {
public static void main(String[] args) {
Logger logger = LoggerFactory.getLogger(BatchWsVersCsv.class);
SpringApplication springApplication = new SpringApplication(new Object[] { BatchWsVersCsv.class });
Map<String, Object> defaultProperties = new HashMap<String, Object>();
//set some default properties
//...
springApplication.setDefaultProperties(defaultProperties);
springApplication.run(args);
}
public void run(String... strings) throws Exception {
System.out.println("running...");
}
}
Ok, it's my bad. For testing an error event on day before, i do a request in data base for updating all date.
So the serialized key in batch_instance isn't match anymore.
If i change the system date for generating error on day before, all works prefectly if i run a restart the day after.