I'm constructing a Spring Batch job that modifies a given set of records. The list of record IDs is an input parameter of the job. For example, one job might be: modify the records with IDs {1,2,3,4} and set parameters X and Y on related tables.
Since I'm unable to pass a potentially very long input list (typically around 50K records) to my ItemReader, I only pass a MyJobID, which the ItemReader then uses to load the target ID list.
The problem is that the resulting code feels "wrong" (although it works) and not in the spirit of Spring Batch. Here's the reader:
@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
@Component
public class MyItemReader implements ItemReader<Integer> {

    @Autowired
    private JobService jobService;

    private List<Integer> itemsList;
    private Long jobId;

    @Autowired
    public MyItemReader(@Value("#{jobParameters['jobId']}") final Long jobId) {
        this.jobId = jobId;
        this.itemsList = null;
    }

    @Override
    public Integer read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        // First pass: load the list.
        if (itemsList == null) {
            itemsList = new ArrayList<Integer>();
            MyJob myJob = (MyJob) jobService.loadById(jobId);
            for (Integer i : myJob.getTargedIdList()) {
                itemsList.add(i);
            }
        }
        // Serve one item at a time:
        if (itemsList.isEmpty()) {
            return null;
        } else {
            return itemsList.remove(0);
        }
    }
}
I tried to move the first part of the read() method to the constructor, but the @Autowired reference is null at that point; only later (when read() is called) has it been injected.
Is there a better way to write the ItemReader? I would like to move the "load" part out of read(). Or is this the best solution for this scenario?
Thank you.
Generally, your approach is not "wrong", but probably not ideal.
Firstly, you could move the initialisation to an init method annotated with @PostConstruct. This method is called after all @Autowired fields have been injected:
@PostConstruct
public void afterPropertiesSet() throws Exception {
    itemsList = new ArrayList<Integer>();
    MyJob myJob = (MyJob) jobService.loadById(jobId);
    for (Integer i : myJob.getTargedIdList()) {
        itemsList.add(i);
    }
}
But there is still the problem that you load all the data at once. If you have a billion records to process, this could blow up the memory.
So what you should do is to load only a chunk of your data into memory, then return the items one by one in your read method. If all entries of a chunk have been returned, load the next chunk and return its items one by one again. If there is no other chunk to be loaded, then return null from the read method.
This ensures that you have a constant memory footprint regardless of how many records you have to process.
(If you have a look at FlatFileItemReader, you see that it uses a BufferedReader to read the data from disk. While that has nothing to do with Spring Batch, it is the same principle: it reads a chunk of data from disk, returns it, and if more data is needed, it reads the next chunk.)
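A minimal sketch of that idea for the ID-list case above; the page size and the jobService.loadTargetIdPage(...) paging method are assumptions for illustration, not an existing API:
public class PagingIdItemReader implements ItemReader<Integer> {

    private static final int PAGE_SIZE = 1000; // assumed chunk size

    private final JobService jobService;       // assumed service exposing a paged query
    private final Long jobId;

    private List<Integer> currentPage = new ArrayList<>();
    private int nextIndexInPage = 0;
    private int pagesRead = 0;

    public PagingIdItemReader(JobService jobService, Long jobId) {
        this.jobService = jobService;
        this.jobId = jobId;
    }

    @Override
    public Integer read() throws Exception {
        if (nextIndexInPage >= currentPage.size()) {
            // Current page exhausted: load the next chunk from the database.
            currentPage = jobService.loadTargetIdPage(jobId, pagesRead++, PAGE_SIZE);
            nextIndexInPage = 0;
            if (currentPage.isEmpty()) {
                return null; // no more data, this ends the step
            }
        }
        return currentPage.get(nextIndexInPage++);
    }
}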
The next problem is restartability. What happens if the job crashes after doing 90% of the work? How can the job be restarted so that it only processes the missing 10%?
This is actually a feature that Spring Batch provides; all you have to do is implement the ItemStream interface with its methods open(), update() and close().
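A minimal sketch of what that could look like for the reader above; the execution-context key name is an assumption, and the list of IDs is assumed to be provided from elsewhere (e.g. the @PostConstruct method shown earlier):
public class MyRestartableItemReader implements ItemReader<Integer>, ItemStream {

    private static final String CURRENT_INDEX_KEY = "myReader.currentIndex"; // assumed key name

    private final List<Integer> itemsList; // target IDs, loaded elsewhere
    private int currentIndex = 0;

    public MyRestartableItemReader(List<Integer> itemsList) {
        this.itemsList = itemsList;
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // On a restart, resume from the index saved with the last committed chunk.
        if (executionContext.containsKey(CURRENT_INDEX_KEY)) {
            currentIndex = executionContext.getInt(CURRENT_INDEX_KEY);
        }
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Called before each chunk commit: persist how far we have read.
        executionContext.putInt(CURRENT_INDEX_KEY, currentIndex);
    }

    @Override
    public void close() throws ItemStreamException {
        // Release any resources here if needed.
    }

    @Override
    public Integer read() throws Exception {
        if (itemsList == null || currentIndex >= itemsList.size()) {
            return null;
        }
        return itemsList.get(currentIndex++);
    }
}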
If you consider these two points - loading data in chunks instead of all at once, and implementing the ItemStream interface - you'll end up with a reader that is in the Spring Batch spirit.
Related
I have created a simple Spring Batch job and I have a chunk implemented as follows:
public class ChunksConfig {

    .......
    .......

    @Override
    protected ItemWriter<List<Pratica>> getItemWriter() {
        return praticaWriter;
    }

    @Override
    protected ItemReader<List<GaranziaRichiesta>> getItemReader() {
        return praticheReader;
    }

    @Override
    protected ItemProcessor<List<GaranziaRichiesta>, List<Pratica>> getItemProcessor() {
        return praticaProcessor;
    }
}
Now my reader should retrieve data from SOAP web services, fetching 100 elements per call. I tried to implement my reader as follows:
@Component
public class PraticaReader implements ItemReader<List<GaranziaRichiesta>> {

    @Autowired
    SOAPServices soapServices;

    @Override
    public List<GaranziaRichiesta> read()
            throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        GaranzieRichiesteRequest request = new GaranzieRichiesteRequest();
        GaranzieRichiesteResponse response = soapServices.garanzieRichieste(request);
        return response.getGaranzie(); // this contains exactly 100 elements
    }
}
Now the problem is that I don't know how to set the chunk size and how to let Spring understand how many elements it is processing after the read operation.
In fact, if I set the chunk size to 1, the flow passes through reader -> processor -> writer correctly (but the job doesn't stop: after the writer runs, it starts reading again). If instead I set the chunk size to 100, the flow performs 100 read calls before reaching the processor.
I'm absolutely new to Spring Batch; I've read the documentation, but 90% of the examples are about reading data from files. Can you suggest whether a compliant chunk implementation is possible for this particular case, or should I try another way?
Thank you
I've stumbled upon a pretty twisted issue with Spring Batch recently.
Requirements are as follows :
I have two main steps :
The first one reads some data from an Oracle database, from one table, and writes it to another table.
The second one does some other database work, based upon data handled in the first step.
From a design standpoint, first step looks like this :
@Bean
public Step myFirstStep(JdbcCursorItemReader<Revision> reader) {
    return stepBuilderFactory.get("my-first-step")
            .<Revision, Revision>chunk(1)
            .reader(reader)
            .writer(compositeItemWriter())
            .listener(executionContextPromotionListener())
            .build();
}
Composite item writer:
@Bean
public CompositeItemWriter<Revision> compositeItemWriter() {
    CompositeItemWriter<Revision> writer = new CompositeItemWriter<>();
    writer.setDelegates(Arrays.asList(somewriter(), someOtherwriter(), aWriterThatIsSupposedToPassDataToAnotherStep()));
    return writer;
}
While the first two writers are not complex, my interest is focused on the third one:
aWriterThatIsSupposedToPassDataToAnotherStep()
As you might have guessed, this one is used to capture some data processed earlier so it can be promoted to my second step:
@Component
@StepScope
public class AWriterThatIsSupposedToPassDataToAnotherStep implements ItemWriter<SomeEntity> {

    private StepExecution stepExecution;

    @Override
    public void write(List<? extends SomeEntity> items) {
        ExecutionContext stepContext = this.stepExecution.getExecutionContext();
        stepContext.put("revisionNumber", items.stream().findFirst().get().getSomeField());
        System.out.println("writing : " + items.stream().findFirst().get().getSomeField() + " to ExecutionContext");
    }

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }
}
The problem is: as long as this writer is part of the composite writer list (as declared above), the @BeforeStep method of my last writer is never executed, which leaves me unable to transmit my information to the execution context.
When I replace the CompositeItemWriter with the single AWriterThatIsSupposedToPassDataToAnotherStep inside the step definition, it gets executed properly.
Does it have anything to do with some kind of declaration order, or something similar?
Big thanks for any further help.
Found the solution (with some of my coworkers' help), sourced from: https://stackoverflow.com/a/39698653/1957764
You'll need to both declare the writer as part of the composite writer AND register it as a step listener to make Spring Batch execute its @BeforeStep annotated method.
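A minimal sketch of what that step definition could look like, reusing the bean names from the question above; registering the delegate with .listener(...) is the key point, the rest is assumed from the earlier snippets:
@Bean
public Step myFirstStep(JdbcCursorItemReader<Revision> reader,
                        AWriterThatIsSupposedToPassDataToAnotherStep promotingWriter) {
    return stepBuilderFactory.get("my-first-step")
            .<Revision, Revision>chunk(1)
            .reader(reader)
            .writer(compositeItemWriter())
            // Register the delegate explicitly so its @BeforeStep is invoked;
            // delegates hidden inside a CompositeItemWriter are not auto-detected as listeners.
            .listener(promotingWriter)
            .listener(executionContextPromotionListener())
            .build();
}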
I have a requirement where I need to look up a few tables in the ItemProcessor. I don't want to make multiple JDBC calls for each row in the ItemProcessor, which might lead to performance issues once Spring Batch starts processing a larger number of records. What are the workarounds to avoid this situation? Is there any way to preload these objects before the ItemProcessor, or before the batch starts, and refer to them in the ItemProcessor?
You can annotate a method with @PostConstruct to read the data during Spring application context initialization. Make your ItemReader's read method return values from that list; when the entire list has been served, return null. This stops the reading.
@Service
public class YourItemReader implements ItemReader<DomainObject> {

    private int index;
    private List<DomainObject> dbRows;

    @PostConstruct
    public void init() {
        // read from the database here and assign the result to dbRows
    }

    @Override
    public DomainObject read() {
        if (null != dbRows && index < dbRows.size()) {
            return dbRows.get(index++); // serve one item and advance
        }
        return null; // end of data
    }
}
If the number of records is in the millions, I would suggest doing a chunk-based read from your database instead of reading all the records at once, which might cause an out-of-memory error. This can be done easily by adding a column called STATUS to your table to track the status of the records being processed. Initially, when you load data into your table, set the status to 'NOT PROCESSED'; when your ItemReader reads a chunk of records, set the status to 'IN PROGRESS'. Once your ItemProcessor or ItemWriter completes its processing, change the status from 'IN PROGRESS' to 'PROCESSED'. Make sure the method which fetches the data from the database is synchronized; this ensures that multiple threads do not fetch the same data from the database.
public List<DomainObject> read() {
    return fetchProductAssociationData();
}

private synchronized List<DomainObject> fetchProductAssociationData() {
    // Read your chunk-size worth of records whose status is 'NOT PROCESSED'
    // and change the status of the records just read to 'IN PROGRESS'.
    return list;
}
I have a task to write a header to a file only if some data exists; in other words, if the reader returns nothing, the file created by the writer should be empty.
Unfortunately, the FlatFileItemWriter implementation in version 3.0.7 has only private fields and methods and a nested class that stores all the information about the writing process, so I cannot simply override the write() method; I would need to copy-paste almost all of FlatFileItemWriter to add a small piece of new functionality.
Any idea how to achieve this more elegantly in Spring Batch?
So, I finally found a more or less elegant solution.
The solution is to use LineAggregators; it seems that in the current implementation of FlatFileItemWriter this is the only approach you can use safely when inheriting from this class.
I use a separate line aggregator only for the header, but the solution can be extended to use multiple aggregators.
Also, in my case the header is just a predefined string, so by default I use a PassThroughLineAggregator, which simply returns my string to FlatFileItemWriter.
public class FlatFileItemWriterWithHeaderOnData extends FlatFileItemWriter {

    private LineAggregator lineAggregator;
    private LineAggregator headerLineAggregator = new PassThroughLineAggregator();
    private boolean applyHeaderAggregator = true;

    @Override
    public void afterPropertiesSet() throws Exception {
        Assert.notNull(headerLineAggregator, "A HeaderLineAggregator must be provided.");
        super.afterPropertiesSet();
    }

    @Override
    public void setLineAggregator(LineAggregator lineAggregator) {
        this.lineAggregator = lineAggregator;
        super.setLineAggregator(lineAggregator);
    }

    public void setHeaderLineAggregator(LineAggregator headerLineAggregator) {
        this.headerLineAggregator = headerLineAggregator;
    }

    @Override
    public void write(List items) throws Exception {
        if (applyHeaderAggregator) {
            LineAggregator initialLineAggregator = lineAggregator;
            super.setLineAggregator(headerLineAggregator);
            super.write(getHeaderItems());
            super.setLineAggregator(initialLineAggregator);
            applyHeaderAggregator = false;
        }
        super.write(items);
    }

    private List<String> getHeaderItems() throws ItemStreamException {
        // your actual implementation goes here
        return Arrays.asList("Id,Name,Details");
    }
}
PS: This solution assumes that if the write() method is called, then some data exists.
Try this in your writer
writer.setShouldDeleteIfEmpty(true);
If you have no data, there is no file.
Otherwise, you write your header and your items.
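A minimal sketch of such a writer configuration; the resource path, item type and header string are placeholders, not taken from the original answer:
@Bean
public FlatFileItemWriter<DomainObject> headerAwareWriter() {
    FlatFileItemWriter<DomainObject> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("output/report.csv")); // assumed target file
    writer.setHeaderCallback(w -> w.write("Id,Name,Details"));       // header is written when the file is opened
    writer.setLineAggregator(new PassThroughLineAggregator<>());
    writer.setShouldDeleteIfEmpty(true); // if no items were written, the file is deleted on close
    return writer;
}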
I'm thinking of an approach like the one below.
In a @BeforeStep method (or a Tasklet), if there is no data at all, set a flag such as "noData" to true; otherwise set it to false.
Then you have two writers, one with a header and one without. You can have a base writer act as the parent and let the two writers inherit from it; the only difference between them is that one has a header callback and the other doesn't.
Based on the flag, you can switch to either the 'writer with header' or the 'writer without header'.
Thanks,
Nghia
We read most of our data from a DB. Sometimes the result-set is empty, and for that case we want the job to stop immediately, and not hand over to a writer. We don't want to create a file, if there is no input.
Currently we achieve this goal with a Step-Listener that returns a certain String, which is the input for a transition to either the next business-step or a delete-step, which deletes the file we created before (the file contains no real data).
I'd like the job to end as soon as the reader realizes that there is no input. Is that possible?
New edit (more elegant way)
This approach elegantly moves to the next step or ends the batch application when the file is not found, and prevents unwanted steps (and their listeners) from executing.
-> Check for the presence of the file in a tasklet, say FileValidatorTasklet.
-> When the file is not found, set some exit status (an enum or final string); here we have set EXIT_CODE.
Sample tasklet:
public class FileValidatorTasklet implements Tasklet {

    static final String EXIT_CODE = "SOME_EXIT_CODE";
    static final String EXIT_DESC = "SOME_EXIT_DESC";

    @Override
    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
        boolean isFileFound = false;
        // do the file check and set isFileFound
        if (!isFileFound) {
            stepContribution.setExitStatus(new ExitStatus(EXIT_CODE, EXIT_DESC));
        }
        return RepeatStatus.FINISHED;
    }
}
-> In the job configuration, after executing FileValidatorTasklet, check for the presence of the EXIT_CODE.
-> Provide the alternative path for this job if the code is found, else the normal flow of the job. (Here we simply terminate the job if the EXIT_CODE is found, else continue with the next steps.)
Sample config:
public Job myJob(JobBuilderFactory jobs) {
    return jobs.get("offersLoaderJob")
            .start(fileValidatorStep).on(EXIT_CODE).end() // if EXIT_CODE is found, then end the job
            .from(fileValidatorStep) // else continue the job from here, after this step
            .next(step2)
            .next(finalStep)
            .end()
            .build();
}
Here we have taken advantage of conditional step flow in Spring Batch.
We have to define two separate paths from step A; the flow is either A->B->C or A->D->E.
Old answer:
I have been through this, and hence I am sharing my approach. It's better to
throw new RuntimeException("msg");
It will start to terminate the Spring application, rather than terminating exactly at that point. All methods like close() in the reader/writer will be called, and the destroy methods of all the beans will be called.
Note: while executing this in a listener, remember that by this point all the beans have been initialized and their initialization code (like afterPropertiesSet()) has already executed.
I think the above is the correct way, but if you want to terminate at exactly that point, you can try
System.exit(1);
It would likely be cleaner to use a JobExecutionDecider and based on the read count from the StepExecution set a new FlowExecutionStatus and route it to the end of the job.
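A minimal sketch of such a decider; the status name "EMPTY" and the step/bean names in the usage comment are assumptions for illustration:
public class EmptyInputDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // stepExecution belongs to the step that ran just before this decider.
        if (stepExecution != null && stepExecution.getReadCount() == 0) {
            return new FlowExecutionStatus("EMPTY"); // route the flow straight to the end of the job
        }
        return FlowExecutionStatus.COMPLETED; // otherwise continue with the business steps
    }
}
Usage would then look roughly like:
jobBuilderFactory.get("myJob")
        .start(readingStep)
        .next(emptyInputDecider()).on("EMPTY").end()
        .from(emptyInputDecider()).on("*").to(writingStep)
        .end()
        .build();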
Joshua's answer addresses the stopping of the job instead of transitioning to the next business step.
Your file writer might still create the file unnecessarily. You can create something like a LazyItemWriter with a delegate (FlatFileItemWriter) that only calls delegate.open() (once) when the write method is first invoked. Of course, you then have to make sure delegate.close() is called only if the delegate was previously opened. This ensures that no empty file is created, and deleting it is no longer a concern.
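A minimal sketch of such a LazyItemWriter, assuming the ItemWriter/ItemStream contracts of Spring Batch 4; the class is not part of Spring Batch, it is the wrapper this answer proposes:
public class LazyItemWriter<T> implements ItemWriter<T>, ItemStream {

    private final FlatFileItemWriter<T> delegate;
    private ExecutionContext executionContext;
    private boolean opened = false;

    public LazyItemWriter(FlatFileItemWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // Only remember the context; don't open (and thus create) the file yet.
        this.executionContext = executionContext;
    }

    @Override
    public void update(ExecutionContext executionContext) {
        if (opened) {
            delegate.update(executionContext);
        }
    }

    @Override
    public void close() {
        // Close the delegate only if it was actually opened, i.e. data was written.
        if (opened) {
            delegate.close();
        }
    }

    @Override
    public void write(List<? extends T> items) throws Exception {
        if (!opened && !items.isEmpty()) {
            delegate.open(executionContext); // the file is created only now
            opened = true;
        }
        if (opened) {
            delegate.write(items);
        }
    }
}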
I have the same question as the OP. I am using all annotations, and if the reader returns null when no results (in my case a file) are found, then the Job bean fails to initialize with an UnsatisfiedDependencyException, and that exception goes to stdout.
If I create a reader and return it without a file specified, then the Job is created. After that an ItemStreamException is thrown, but it goes to my log, as I am past the Job autowiring and inside the Step at that point. That seems preferable, at least for what I am doing.
Any other solution would be appreciated.
NiksVij's answer works for me; I implemented it like this:
@Component
public class FileValidatorTasklet implements Tasklet {

    private final ImportProperties importProperties;

    @Autowired
    public FileValidatorTasklet(ImportProperties importProperties) {
        this.importProperties = importProperties;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        String folderPath = importProperties.getPathInput();
        String itemName = importProperties.getItemName();
        File currentItem = new File(folderPath + File.separator + itemName);
        if (currentItem.exists()) {
            contribution.setExitStatus(new ExitStatus("FILE_FOUND", "FILE_FOUND"));
        } else {
            contribution.setExitStatus(new ExitStatus("NO_FILE_FOUND", "NO_FILE_FOUND"));
        }
        return RepeatStatus.FINISHED;
    }
}
and in the Batch Configuration:
@Bean
public Step fileValidatorStep() {
    return this.stepBuilderFactory.get("step1")
            .tasklet(fileValidatorTasklet)
            .build();
}

@Bean
public Job tdZuHostJob() throws Exception {
    return jobBuilderFactory.get("tdZuHostJob")
            .incrementer(new RunIdIncrementer())
            .listener(jobCompletionNotificationListener)
            .start(fileValidatorStep()).on("NO_FILE_FOUND").end()
            .from(fileValidatorStep()).on("FILE_FOUND").to(testStep()).end()
            .build();
}