ETL implementation using Spring Batch

I need to implement an ETL application for one of the projects I am working on. It has the following steps:

1. Read from a table to retrieve some values that will be passed in as job parameters.
2. Use the object returned by step 1 to retrieve some data from a second table.
3. Read from a flat file, combine it with the values from step 2, apply the business logic, and write to a table.

We are using Spring Data JPA and Spring Integration.
The challenge I am facing is reading the values from a table to retrieve the parameters for the job, and then launching the job. The output of step 2 then has to be sent along with the file information for further processing.
I know how to implement the above steps independently, but I am struggling to tie them together end to end.
Any ideas on how to design this would be great. Thanks in advance.

I'll try to give you some ideas for your different points.
1 - Read table values and pass them as Job Parameters
I see 2 solutions here:
You could do a "manual" query (i.e. without Spring Batch), then apply your business logic and pass the results as JobParameters (you just need a JobLauncher or a CommandLineJobRunner; see the Spring Batch documentation, §4.4):
JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
Job job = (Job) context.getBean(jobName);
// Do your business logic and your database query here.
// Create your parameters
JobParameter parameter = new JobParameter(resultOfQuery);
// Add them to a map
Map<String, JobParameter> parameters = new HashMap<String, JobParameter>();
parameters.put("yourParameter", parameter);
// Pass them to the job
JobParameters jobParameters = new JobParameters(parameters);
JobExecution execution = jobLauncher.run(job, jobParameters);
The other solution would be to add a JobExecutionListener and override the beforeJob method to do your query, then save the results in the ExecutionContext (which you can then access with #{jobExecutionContext['name']}).
@Override
public void beforeJob(JobExecution jobExecution) {
    // Do your business logic and your database query here.
    jobExecution.getExecutionContext().put(key, value);
}
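For instance, a step-scoped bean can then read the stored value back via late binding. A minimal sketch (the key 'yourKey' and the bean wiring are illustrative, not from the question):
<bean id="dependentReader" class="org.springframework.batch.item.database.JdbcCursorItemReader" scope="step">
    <property name="dataSource" ref="dataSource" />
    <!-- 'yourKey' is whatever key beforeJob stored in the execution context -->
    <property name="sql" value="SELECT * FROM some_table WHERE id = #{jobExecutionContext['yourKey']}" />
    <property name="rowMapper" ref="yourRowMapper" />
</bean>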
In each case, you can use a Spring Batch ItemReader to do your query. You can, for example, declare an ItemReader as a field of your listener (don't forget the setter) and configure it like this:
<batch:listener>
    <bean class="xx.xx.xx.YourListener">
        <property name="reader">
            <bean class="org.springframework.batch.item.database.JdbcCursorItemReader">
                <property name="dataSource" ref="dataSource" />
                <property name="sql" value="${yourSQL}" />
                <property name="rowMapper">
                    <bean class="xx.xx.xx.YourRowMapper" />
                </property>
            </bean>
        </property>
    </bean>
</batch:listener>
2 - Read a table depending on results from previous step
Once more, you can use the JobExecutionContext to store and retrieve data between steps. You can then implement a StepExecutionListener and override the beforeStep method to access the StepExecution, which will lead you to the JobExecution.
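As a rough sketch (the class name and context key are made up for illustration), such a listener could look like this:
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class Step2Listener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // StepExecution -> JobExecution -> the ExecutionContext shared between steps
        Object step1Result = stepExecution.getJobExecution()
                .getExecutionContext().get("resultFromStep1");
        // ... use step1Result to prepare this step's query
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return stepExecution.getExitStatus();
    }
}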
3 - Send result from table reading along results of file reading
There is no "default" CompositeItemReader that would let you read from 2 sources at the same time, but I don't think that's what you actually want to do.
For your case, I would declare the "table reader" as the reader in a <batch:chunk> and then declare a custom ItemProcessor with another ItemReader field. That reader would be your FlatFileItemReader. You can then drive the read manually and apply your business logic in the process method, as sketched below.
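A possible shape for that processor (all type names here are hypothetical placeholders for your domain classes):
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.file.FlatFileItemReader;

public class TableAndFileProcessor implements ItemProcessor<TableRow, MergedResult> {

    // Injected FlatFileItemReader; you are responsible for opening/closing it,
    // e.g. by registering it as a stream on the step
    private FlatFileItemReader<FileRecord> fileReader;

    public void setFileReader(FlatFileItemReader<FileRecord> fileReader) {
        this.fileReader = fileReader;
    }

    @Override
    public MergedResult process(TableRow item) throws Exception {
        // Drive the file read manually, then combine with the table item
        FileRecord record = fileReader.read();
        return applyBusinessLogic(item, record);
    }

    private MergedResult applyBusinessLogic(TableRow item, FileRecord record) {
        // your business logic here
        return new MergedResult(item, record);
    }
}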

Related

spring batch using ClassifierCompositeItemWriter and additionally write all items to database

I have a project in Spring Batch in which I read from a txt file (input data), and depending on a validation of the item I read, it should be written either to one txt file (output 1) or to another txt file (output 2). I think I should use a ClassifierCompositeItemWriter for this. How can I additionally write all items that I read to a database (output 3)?
I must keep in mind that the three outputs have different formats.
Thanks!
You can use a CompositeItemWriter, which delegates writing to a list of ItemWriters. The order of the ItemWriter list is important, as the writers are called in order, so make sure the ClassifierCompositeItemWriter comes before the ItemWriter that writes to the database:
@Bean
public ItemWriter<MyItem> itemWriter(ClassifierCompositeItemWriter<MyItem> classifierWriter,
                                     JdbcBatchItemWriter<MyItem> jdbcWriter) {
    // MyItem stands in for your actual item type
    CompositeItemWriter<MyItem> writer = new CompositeItemWriter<>();
    writer.setDelegates(List.of(classifierWriter, jdbcWriter));
    return writer;
}
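For the classifier side, one possible configuration (the writer names and the isValid() check are illustrative assumptions, not from the question): a Classifier routes each item to one of the two pre-configured flat-file writers.
@Bean
public ClassifierCompositeItemWriter<MyItem> classifierWriter(
        FlatFileItemWriter<MyItem> output1Writer,
        FlatFileItemWriter<MyItem> output2Writer) {
    ClassifierCompositeItemWriter<MyItem> writer = new ClassifierCompositeItemWriter<>();
    // Route items that pass validation to output 1, the rest to output 2
    writer.setClassifier(item -> item.isValid() ? output1Writer : output2Writer);
    return writer;
}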

Spring batch get lastJobExecution

I need to process DB data from the last job execution until now.
There is the JobRepository class. It has a getLastJobExecution(jobName, jobParameters) method, but to use it I would somehow have to extract the last run's job parameters.
Is there a possibility provided by Spring Batch to do this?
You can query the Spring Batch metadata tables directly if the interface exposed by JobRepository is not enough for your needs.
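If you would rather stay within the Spring Batch API, one option is the JobExplorer, which lets you find the last execution without knowing the previous run's parameters. A minimal sketch (assumes a configured JobExplorer bean and a job named "myJob"):
// Look up the most recent instance of the job, then its executions
List<JobInstance> instances = jobExplorer.getJobInstances("myJob", 0, 1);
if (!instances.isEmpty()) {
    // With the default JDBC-backed repository, executions come back most recent first
    List<JobExecution> executions = jobExplorer.getJobExecutions(instances.get(0));
    JobExecution lastExecution = executions.get(0);
    Date lastRun = lastExecution.getEndTime(); // process data from this point until now
}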

Using Spring Batch Admin

I have looked around quite a bit at Spring Batch and Spring Batch Admin. My question is as follows. I understand that the Spring Batch meta-tables do not store an attribute 'jobId' as such, but rather the 'job name', which is the value passed as the 'id' in the <job/> bean. I want to have something of the following sort. For example:
<job id="myJob">
    <property name="jobId" value="123"/>
</job>
That is, for my specific requirement, I want to display the 'jobId' against the respective 'jobName'. So I have created another table that holds the 'jobName' and the 'jobId'. But I am unable to make any progress on how to make the Spring Batch Admin UI pick up the 'jobId' for a given 'jobName' from my table and display it on the Admin screen. Or is there any other way through which Spring Batch Admin could pick up the jobId? For instance, would it make sense to have a class extend 'SimpleJob' and then make the job a child of this class? Say, something like this:
class MyJob extends SimpleJob {
    private int jobId;
}

// And then in the config file:
<bean id="baseJob" class="...MyJob"/>
<job id="myJob" parent="baseJob">
    <property name="jobId" value="123"/>
</job>
By the way, I am using spring-batch-admin-manager and spring-batch-admin-resources version '1.3.1.RELEASE', and the Spring Batch version is '2.1.8.RELEASE'.
Would someone please share some pointers?
Thanks
What Spring Batch version are you using?
A while ago, when I was using Spring Batch 2.1.8, it used to insert the jobId, jobName, jobStatus, and time too.

Passing fragmentRootElementName as a parameter to the XML

Is it possible to send fragmentRootElementName as a parameter to the job XML file? I have two processes: one is plan and the other is contract. So I divided my job into reading the file from the database, converting it to an object, and then publishing it to web services. The reading part first reads a property file, where we find out whether the process is a plan or a contract, and accordingly we need to call the corresponding process. I did the flow for plan, but is it possible to pass the fragmentRootElementName as a parameter, as it would be different for plan and contract?
Thanks
Yes, you can, using late binding via scope="step", in this way:
<bean id="myReader" class="org.springframework.batch.item.xml.StaxEventItemReader" scope="step">
    <property name="fragmentRootElementName" value="#{jobParameters['rootFragmentName']}" />
    <!-- Other properties -->
</bean>
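On the launching side, a quick sketch of supplying that parameter (job and jobLauncher wiring assumed; the key must match the name used in the SpEL expression above):
JobParameters params = new JobParametersBuilder()
        .addString("rootFragmentName", "plan") // or "contract"
        .toJobParameters();
jobLauncher.run(job, params);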

How to make an ItemReader read 2 tables

I have to create a batch job to do financial reconciliation. Right now I have 3 steps:
Step 1: Read an XML from the third party, convert it into our domain objects, write to DB (table 1).
Step 2: Read a flat file from our transactions datastore, write to DB (table 2).
Step 3: Read both table 1 and table 2 into an aggregator object, process both lists to find differences and set status codes, write the status code to table 2.
My problem is with step 3. I can't find a good solution to have my ItemReader read from 2 SQL tables.
I started with a custom ItemReader like this:
package batch.concilliation.readers;

import java.util.List;

import org.apache.log4j.Logger;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component("conciliationReader")
public class TransactionReader implements ItemReader<TransactionsAgragegator> {

    private final Logger log = Logger.getLogger(TransactionReader.class);

    @Autowired
    private ConciliationContext context;

    @Autowired
    private ServiceSommaireConciliation serviceTransactionThem;

    @Autowired
    private ServiceTransactionVirement serviceTransactionUs;

    @Override
    public TransactionsAgragegator read() throws Exception, UnexpectedInputException,
            ParseException, NonTransientResourceException {
        TransactionsAgragegator aggregator = new TransactionsAgragegator();
        SommaireConciliationVirementInterac sommaire = serviceTransactionThem.findByRunNo(context.getRunNo());
        List<TransactionVirement> journalSic = serviceTransactionUs.findByTimestamp(sommaire.getBeginDate(), sommaire.getEndDate());
        // Place these two lists in the aggregator object.
        aggregator.setListeTransactionThem(sommaire.getPayments());
        aggregator.setListeTransactionsUs(journalSic);
        return aggregator;
    }
}
This reader uses two services already implemented (DAOs) that read both tables and return domain objects. I take the two lists of transactions, from us and from them, and put them in an aggregator object. This object would be passed to the ItemProcessor, where I could apply my business logic... but this reader starts an infinite loop, since it will never return null.
I read about ItemReaderAdapter, but I still have the same problem of looping over a collection until I get a null.
So in summary, I want to read 2 different tables and get 2 lists:
List<TransactionThirdParty>
List<TransactionHome>
Then my ItemProcessor would check whether both lists are equal or not, whether one has more or fewer transactions than the other, etc.
Can any Spring Batch expert suggest something?
The problem here is that your first two steps are chunk-oriented but the third one is not. While the first two may have the usual read-process-write cycle, the third step, while dependent on the first two, is a one-time operation. It is no different than copying a file in the batch domain.
So you should not use the ItemReader approach here, because you do not have an exit criterion (that is why you never get nulls from the reader: it cannot know when the source is exhausted, since it does not deal with a line or record).
That is where a TaskletStep helps:
The Tasklet is a simple interface that has one method, execute, which will be called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure.
So implement your third step as a Tasklet instead of the chunk-oriented way, along the lines of the sketch below.
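A bare-bones sketch of what that could look like (the class name is made up; the comparison logic, using the two services from your reader, is yours to fill in):
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class ReconciliationTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Read both tables once, via your existing services:
        // e.g. serviceTransactionThem and serviceTransactionUs from the custom reader above.
        // Compare the two lists, set the status codes, and write them to table 2.
        return RepeatStatus.FINISHED; // one-shot: tells the TaskletStep we are done
    }
}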