Reading file dynamically in spring-batch

I am trying to transfer files (video, txt, etc.) between different endpoints (PC, S3, Dropbox, Google Drive) using spring-batch on a network. For that, I receive a JSON file containing a list of file locations (URLs) to be transferred (assume I can access those locations).
So, how do I tell the reader to read the input once my controller is hit (in which the job is created), and not at the time the spring-boot application starts?
I have tried adding "spring.batch.job.enabled=false", which stops spring-batch from starting automatically, but my concern is where I should put the line that sets the resource provided to the ItemReader:
FlatFileItemReader<String> reader = new FlatFileItemReader<String>();
reader.setResource(someResource);
Because while setting the resource I am getting a NullPointerException.

The Running Jobs from within a Web Container section of the documentation explains that with a code example. Here is an excerpt:
@Controller
public class JobLauncherController {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    Job job;

    @RequestMapping("/jobLauncher.html")
    public void handle() throws Exception {
        jobLauncher.run(job, new JobParameters());
    }
}
In your case, you need to extract the URL from the request and pass it as a job parameter, something like:
@RequestMapping("/jobLauncher.html")
public void handle() throws Exception {
    URL url = // extract url from request
    JobParameters parameters = new JobParametersBuilder()
            .addString("url", url.toString()) // addString expects a String
            .toJobParameters();
    jobLauncher.run(job, parameters);
}
Then make your reader step-scoped and dynamically extract the file from job parameters:
@StepScope
@Bean
public FlatFileItemReader<String> flatFileItemReader(@Value("#{jobParameters['url']}") URL url) {
    return new FlatFileItemReaderBuilder<String>()
            .name("flatFileItemReader") // a name is required when state is saved
            .resource(new UrlResource(url))
            // set other properties
            .build();
}
This is explained in the Late Binding of Job and Step Attributes section.
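To show how the pieces fit together, here is a minimal sketch of wiring the step-scoped reader into a step and job. The job and step names and the writer are illustrative, not from the original answer:

@Configuration
@EnableBatchProcessing
public class TransferJobConfig {

    @Bean
    public Job transferJob(JobBuilderFactory jobBuilderFactory, Step transferStep) {
        return jobBuilderFactory.get("transferJob")
                .start(transferStep)
                .build();
    }

    @Bean
    public Step transferStep(StepBuilderFactory stepBuilderFactory,
                             FlatFileItemReader<String> flatFileItemReader,
                             ItemWriter<String> itemWriter) {
        return stepBuilderFactory.get("transferStep")
                .<String, String>chunk(10)
                .reader(flatFileItemReader) // the step-scoped reader defined above
                .writer(itemWriter)         // a placeholder writer bean, not shown here
                .build();
    }
}

With this in place, each request to the controller launches a new job execution whose reader is bound to the "url" parameter of that particular run.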

How to write integration tests for spring-batch-integration?

I'm using spring-integration together with spring-batch and got stuck trying to write integration tests that cover the whole flow, not just a single config.
I've created an embedded SFTP server for these tests and am trying to send a message to sftpInboundChannel. The message is sent, but nothing happens; when I send the message to the next channel (after sftpInboundChannel), it works fine. Also, I'm not able to load the test source properties, even though I'm using the @TestPropertySource annotation.
These are my class annotations:
@TestPropertySource(properties = {
        // all the properties go here
})
@EnableConfigurationProperties
@RunWith(SpringRunner.class)
@Import({TestConfig.class, SessionConfig.class})
@ActiveProfiles("it")
@SpringIntegrationTest
@EnableIntegration
@SpringBootTest
@DirtiesContext(classMode = DirtiesContext.ClassMode.BEFORE_EACH_TEST_METHOD)
This is my class body
@Autowired
private PollableChannel sftpInboundChannel;

@Autowired
private SessionFactory<ChannelSftp.LsEntry> defaultSftpSessionFactory;

@Autowired
private EmbeddedSftpServer server;

@Test
public void shouldDoSmth() {
    RemoteFileTemplate<ChannelSftp.LsEntry> template;
    try {
        template = new RemoteFileTemplate<>(defaultSftpSessionFactory);
        SftpTestUtils.moveToRemoteFolder(template);
        final List<ChannelSftp.LsEntry> movedFiles =
                SftpTestUtils.listFilesFromDirectory("folder/subfolder", template);
        log.info("Moved file {}", movedFiles.size());
        final MessageBuilder<String> messageBuilder = MessageBuilder.withPayload("Sample.txt") // path to file
                .setHeader("file_Path", "Sample.txt");
        boolean wasSent = this.sftpInboundChannel.send(messageBuilder.build());
        log.info("Was sent to sftpInboundChannel channel {}", wasSent);
        log.info("message {}", messageBuilder.build());
    } finally {
        SftpTestUtils.cleanUp();
    }
}
For the case of the property file not being read, one solution is to add something like this to your test class:
@BeforeClass
public static void beforeClass() {
    System.setProperty("propertyfile", "nameOfFile.properties");
}
A second way is to create an XML (or class-based) config where you add the tag:
<context:property-placeholder
location="nameOfFile.properties"
ignore-resource-not-found="true" system-properties-mode="OVERRIDE" />
and your file will be found.
The property file should be inside the resources folder.
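If you prefer Java configuration, a rough equivalent of the XML tag above is a sketch like this (the file name is the same placeholder; note that PropertySourcesPlaceholderConfigurer consults system properties through the Environment by default, so there is no one-to-one switch for system-properties-mode):

@Configuration
public class PropertyConfig {

    // Java-config counterpart of <context:property-placeholder .../>
    @Bean
    public static PropertySourcesPlaceholderConfigurer propertyPlaceholder() {
        PropertySourcesPlaceholderConfigurer configurer = new PropertySourcesPlaceholderConfigurer();
        configurer.setLocation(new ClassPathResource("nameOfFile.properties"));
        configurer.setIgnoreResourceNotFound(true); // same as ignore-resource-not-found="true"
        return configurer;
    }
}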

Spring Batch restart functionality not working when using @StepScope

I want to use the Spring Batch (v3.0.9) restart functionality so that when a JobInstance is restarted, the process step reads from the last failed chunk point forward. My restart works fine as long as I don't use the @StepScope annotation on my myBatisPagingItemReader bean method.
I was using @StepScope so that I can do late binding to get the JobParameters in my myBatisPagingItemReader bean method: @Value("#{jobParameters['run-date']}")
If I use the @StepScope annotation on the myBatisPagingItemReader() bean method, the restart does not work, as it creates a new instance (scope=step, name=scopedTarget.myBatisPagingItemReader).
If I use step scope, is it possible for my myBatisPagingItemReader to pick up read.count from the last failure so that restart works?
I have explained this issue with an example below.
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Bean
    public Step step1(StepBuilderFactory stepBuilderFactory,
                      ItemReader<Model> myBatisPagingItemReader,
                      ItemProcessor<Model, Model> itemProcessor,
                      ItemWriter<Model> itemWriter) {
        return stepBuilderFactory.get("data-load")
                .<Model, Model>chunk(10)
                .reader(myBatisPagingItemReader)
                .processor(itemProcessor)
                .writer(itemWriter)
                .listener(itemReadListener())
                .listener(new JobParameterExecutionContextCopyListener())
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory, @Qualifier("step1") Step step1) {
        return jobBuilderFactory.get("load-job")
                .incrementer(new RunIdIncrementer())
                .start(step1)
                .listener(jobExecutionListener())
                .build();
    }

    @Bean
    @StepScope
    public ItemReader<Model> myBatisPagingItemReader(
            SqlSessionFactory sqlSessionFactory,
            @Value("#{jobParameters['run-date']}") String runDate) {
        MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
        Map<String, Object> parameterValues = new HashMap<>();
        parameterValues.put("runDate", runDate);
        reader.setSqlSessionFactory(sqlSessionFactory);
        reader.setParameterValues(parameterValues);
        reader.setQueryId("query");
        return reader;
    }
}
Restart example: when I use the @StepScope annotation on myBatisPagingItemReader(), the reader fetches 5 records and I have the chunk size (commit-interval) set to 3.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
- writer - writes all 3 records
- chunk-1 commit successful
chunk-2:
- process record-4
- process record-5 - throws an exception
Job completes and is set to 'FAILED' status.
Now the job is restarted again using the same job parameters.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
- writer - writes all 3 records
- chunk-1 commit successful
chunk-2:
- process record-4
- process record-5 - throws an exception
Job completes and is set to 'FAILED' status.
The @StepScope annotation on the myBatisPagingItemReader() bean method creates a new instance; see the log messages below.
Creating object in scope=step, name=scopedTarget.myBatisPagingItemReader
Registered destruction callback in scope=step, name=scopedTarget.myBatisPagingItemReader
As it is a new instance, it starts the processing from the beginning instead of starting from chunk-2.
If I don't use @StepScope, it restarts from chunk-2, as the restarted job step sets MyBatisPagingItemReader.read.count=3.
The issue here is that you are returning an ItemReader instead of the fully qualified class (MyBatisPagingItemReader) or at least ItemStreamReader. When you use Spring Batch's step scope, we create a proxy to allow for late initialization. The proxy is based on the return type of the method (ItemReader in your case).
The issue you are running into is that because the proxy is of ItemReader, Spring Batch does not know that your bean also implements ItemStream, and it is that interface that enables restartability. By default, Spring Batch will automatically register all beans of type ItemStream for you (you can also explicitly register the beans yourself, but it's typically not needed).
To address your issue, the following should work (note the change in the return type):
@Bean
@StepScope
public MyBatisPagingItemReader<Model> myBatisPagingItemReader(
        SqlSessionFactory sqlSessionFactory,
        @Value("#{jobParameters['run-date']}") String runDate) {
    MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("runDate", runDate);
    reader.setSqlSessionFactory(sqlSessionFactory);
    reader.setParameterValues(parameterValues);
    reader.setQueryId("query");
    return reader;
}
This is why my recommendation is that, where possible, when using @Bean annotated methods, you should return the most concrete type possible to allow Spring to help as much as possible.
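The answer also mentions that streams can be registered explicitly. With the reader bean now returning the concrete type, that would look roughly like the sketch below; it is redundant here (auto-registration already works once the return type exposes ItemStream), but it shows the mechanism via the step builder's stream() method:

@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
                  MyBatisPagingItemReader<Model> myBatisPagingItemReader,
                  ItemProcessor<Model, Model> itemProcessor,
                  ItemWriter<Model> itemWriter) {
    return stepBuilderFactory.get("data-load")
            .<Model, Model>chunk(10)
            .reader(myBatisPagingItemReader)
            .processor(itemProcessor)
            .writer(itemWriter)
            // explicitly register the reader as an ItemStream so its state
            // (e.g. read.count) is saved to the execution context for restart
            .stream(myBatisPagingItemReader)
            .build();
}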

Spring Batch Using Java No XML 3.0.8.RELEASE

I would like to pass a "timestamp" into the JobParameters that get passed to the JobLauncher, so I can run a job multiple times a day.
With XML I was able to do this:
JobParameters jobParameters = new JobParametersBuilder().addLong("time", System.currentTimeMillis()).toJobParameters();
JobExecution execution = jobLauncher.run(job, jobParameters);
Now that I am trying to use only Java inside the Spring Boot framework, I am at a loss.
Startup class with main:
@SpringBootApplication
public class SpringBatchAnnotatedApplication {
    public static void main(String[] args) {
        SpringApplication.run(SpringBatchAnnotatedApplication.class, args);
    }
}
And example code from the class that configures the Job:
@Bean
public Job testJob(JobCompletionNotificationListener listener) {
    return jobBuilderFactory.get("testJob")
            .preventRestart()
            .listener(listener)
            .flow(step1())
            .end()
            .build();
}
Any advice? I am following the docs here: https://docs.spring.io/spring-batch/4.0.x/reference/html/job.html#configureJob [Java]. Maybe there is a better reference?
The below should work for you. I am using the same configuration and it's working fine for me.
JobParameters jobParameters = new JobParametersBuilder()
.addDate("time", new Date())
.toJobParameters();
Spring Batch has the concept of a RunIdIncrementer for this very use case. It doesn't pass a timestamp; instead, it increments a job parameter. I'd recommend using this approach instead of manipulating the job parameters. You can read more about the RunIdIncrementer in the documentation here: https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/core/launch/support/RunIdIncrementer.html
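A minimal sketch of attaching the incrementer to the job, mirroring the builder API already shown in the question:

@Bean
public Job testJob(JobCompletionNotificationListener listener) {
    return jobBuilderFactory.get("testJob")
            // adds an incrementing 'run.id' parameter so each launch
            // creates a new JobInstance without manual timestamp handling
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .flow(step1())
            .end()
            .build();
}

One caveat, as far as I know: the incrementer is only applied by launch paths that consult it, such as Spring Boot's job launching on startup or JobOperator.startNextInstance(..); a direct jobLauncher.run(job, parameters) call does not invoke it.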

How to handle stateful item reader in SpringBatch

Our SpringBatch Job has a single Step with an ItemReader, ItemProcessor, and ItemWriter. We are running the same job concurrently with different parameters. The ItemReader is stateful as it contains an input stream that it reads from.
So, we don't want the same instance of the ItemReader to be used for every JobInstance (Job + Parameters) invocation.
I am not quite sure which is the best "scoping" for this situation.
1) Should the Step be annotated with #JobScope and ItemReader be a prototype?
OR
2) Should the Step be annotated with #StepScope and ItemReader be a prototype?
OR
3) Should both the Step and ItemReader be annotated as Prototype?
The end result should be such that a new ItemReader is created for every new execution of the Job with different identifying parameters (i.e., for every new JobInstance).
Thanks.
-AP_
Here's how it goes from a class instantiation standpoint (from least to most instances):
Singleton (per JVM)
JobScope (per job)
StepScope (per step)
Prototype (per reference)
If you have multiple jobs running in a single JVM (assuming you aren't in a partitioned step), JobScope will be sufficient. If you have a partitioned step, you'll want StepScope. Prototype would be overkill in all scenarios.
However, if these jobs are launching in different JVMs (and not a partitioned step), then a simple Singleton bean will be just fine.
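For illustration, a job-scoped reader bean could look like this sketch (the 'inputFile' parameter name, the resource type, and the line mapper are illustrative, not from the original question):

@Bean
@JobScope
public FlatFileItemReader<String> itemReader(
        @Value("#{jobParameters['inputFile']}") String inputFile) {
    // a new reader instance is created per job execution, so concurrently
    // running JobInstances do not share the underlying input stream
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource(inputFile));
    reader.setLineMapper(new PassThroughLineMapper());
    return reader;
}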
Not every component (Step, ItemReader, ItemProcessor, ItemWriter) has to be a Spring bean. For instance, with the Spring Batch Java API, only your Job needs to be a Spring bean, but not your Steps, Readers, and Writers:
@Autowired
private JobBuilderFactory jobs;

@Autowired
private StepBuilderFactory steps;

@Bean
public Job job() throws Exception {
    return this.jobs.get(JOB_NAME) // create jobbuilder
            .start(step1(JOB_NAME)) // add step 1
            .next(step2(JOB_NAME)) // add step 2
            .build(); // create job
}
private Step step1(String jobName) throws Exception {
    return steps.get(jobName + "_Step_1").chunk(10) //
            .faultTolerant() //
            .reader(() -> null) // you could use lambdas
            .writer(items -> {
            }) //
            .build();
}

private Step step2(String jobName) throws Exception {
    return steps.get(jobName + "_Step_2").chunk(10) //
            .faultTolerant() //
            .reader(createDbItemReader(ds, sqlString, rowmapper)) //
            .writer(createFileWriter(resource, aggregator)) //
            .build();
}
The only thing you have to pay attention to is that you have to call the "afterPropertiesSet" methods when creating instances like JdbcCursorItemReader or FlatFileItemReader/Writer:
private static <T> ItemReader<T> createDbItemReader(DataSource ds, String sql, RowMapper<T> rowMapper) throws Exception {
    JdbcCursorItemReader<T> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(ds);
    reader.setSql(sql);
    reader.setRowMapper(rowMapper);
    reader.afterPropertiesSet(); // don't forget
    return reader;
}

private static <T> ItemWriter<T> createFileWriter(Resource target, LineAggregator<T> aggregator) throws Exception {
    FlatFileItemWriter<T> writer = new FlatFileItemWriter<>();
    writer.setEncoding("UTF-8");
    writer.setResource(target);
    writer.setLineAggregator(aggregator);
    writer.afterPropertiesSet(); // don't forget
    return writer;
}
This way, there is no need to hassle with scopes. Every Job will have its own instances of its Steps and their Readers and Writers.
Another advantage of this approach is that you can now create your jobs completely dynamically.
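For instance, a sketch of building and launching a job whose name is chosen at runtime; it assumes a JobLauncher is autowired alongside the factories above (the method name is illustrative):

public JobExecution launchDynamicJob(String jobName) throws Exception {
    // build a brand-new Job on the fly, reusing the private step factory methods
    Job job = this.jobs.get(jobName)
            .start(step1(jobName))
            .next(step2(jobName))
            .build();
    JobParameters parameters = new JobParametersBuilder()
            .addLong("time", System.currentTimeMillis())
            .toJobParameters();
    return jobLauncher.run(job, parameters);
}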

Why does Spring Batch create a new instance when I try to restart the day after a failed job?

[Image: job instance data from the database]
I have a failed execution instance in my repository on the date 2016-03-14.
If I try to restart the job's instance on 2016-03-15, a new instance and a new execution with the previous job's parameters (2016-03-14) are created.
But the job restarts the complete step instead of doing a recovery process (starting at the last line before the failure).
Why do I have a new instance?
If I restart on the same day (failed job and restarted job), I have no problem (one instance shared between the job's executions).
EDIT:
I start my job with this code:
@Bean
public Job myJob(JobBuilderFactory jobs, Step stepInjectCsvWsIntoCsv) {
    return jobs.get("myJob")
            .listener(new JobListener())
            .incrementer(new RunIdDateIncrementor())
            .flow(stepInjectCsvWsIntoCsv)
            .end().build();
}
RunIdDateIncrementor is my own class. It's there that I create the parameters (run.id and run.date).
I use a FlatFileItemReader and a CompositeWriter which manages two MultiResourceItemWriters and implements ResourceAwareItemWriterItemStream.
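For reference, an incrementer like the RunIdDateIncrementor described above could look roughly like this sketch; the actual class is not shown in the question, so this is only an assumption about its shape:

public class RunIdDateIncrementor implements JobParametersIncrementer {

    @Override
    public JobParameters getNext(JobParameters parameters) {
        // start from 0 when the job has never run before
        long previousRunId = (parameters != null && parameters.getParameters().containsKey("run.id"))
                ? parameters.getLong("run.id") : 0L;
        return new JobParametersBuilder()
                .addLong("run.id", previousRunId + 1)
                // identifying date parameter; a new day means a new JobInstance
                .addString("run.date", new SimpleDateFormat("yyyy-MM-dd").format(new Date()))
                .toJobParameters();
    }
}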
And the step configuration :
@Bean(name = "stepInjectCsvWsIntoCsv")
public Step stepInjectCsvWsIntoCsv(StepBuilderFactory stepBuilderFactory, ItemReader<GetDataInCsv> csvReader,
        CompositeTwoCsvFileItemWriter getDataWriter,
        ItemProcessor<GetDataInCsv, List<GetDataOutCsv>> getDataProcessor) {
    /* it handles batches of 10 units => limited to 10 stations */
    return stepBuilderFactory.get("stepInjectCsvWsIntoCsv").listener(new StepListener())
            .<GetDataInCsv, List<GetDataOutCsv>>chunk(1)
            .reader(csvReader).processor(getDataProcessor).writer(getDataWriter)
            .faultTolerant().skipLimit(1000).skip(GetDataFault.class)
            .listener(new CustomChunkListener())
            .listener(new CustomItemReaderListener())
            .listener(new GetDataItemProcessListener())
            .listener(new CustomItemWriterListener())
            .build();
}
I get a new instance, hence an empty execution context, and so the restart isn't detected.
I use Spring Boot too.
The Launch
@SpringBootApplication
public class BatchWsVersCsv implements CommandLineRunner {

    public static void main(String[] args) {
        Logger logger = LoggerFactory.getLogger(BatchWsVersCsv.class);
        SpringApplication springApplication = new SpringApplication(new Object[] { BatchWsVersCsv.class });
        Map<String, Object> defaultProperties = new HashMap<String, Object>();
        // set some default properties
        // ...
        springApplication.setDefaultProperties(defaultProperties);
        springApplication.run(args);
    }

    public void run(String... strings) throws Exception {
        System.out.println("running...");
    }
}
OK, it's my bad. To test an error event on the day before, I ran a query in the database to update all the dates.
So the serialized key in batch_instance didn't match anymore.
If I instead change the system date to generate the error on the day before, everything works perfectly when I run a restart the day after.