How do you read metadata before the actual job in Spring Batch

I'm currently designing a Spring Batch application that reads from a table, transforms the data and then writes it to another table.
However, before I begin reading the source table, I need to collect some metadata for the application run (e.g. read the holiday calendar table to determine whether it's a bank holiday). This metadata will not change during runtime, so it needs to be read only once, at the very beginning of the application run.
How can this be achieved? Use a JobListener? Configure a separate Job for this and then pass the information to the "actual" job through an ExecutionContext? Configure a separate step that gets only executed once?

Configure a JobExecutionListener to get the information you need and store it on the Job's ExecutionContext.
You can create a listener class that either extends JobExecutionListenerSupport and overrides only the beforeJob method, or is a standalone listener class with a beforeJob method annotated with @BeforeJob.
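For illustration, a minimal listener along those lines could look like the sketch below; the actual calendar lookup is just a placeholder, and the "myDate" key is simply the name reused in the later examples:
public class MyListener extends JobExecutionListenerSupport {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Placeholder: replace with the real lookup, e.g. reading the holiday calendar table
        Date myDate = new Date();
        // Values stored here are visible to job-scoped beans via #{jobExecutionContext['myDate']}
        jobExecution.getExecutionContext().put("myDate", myDate);
    }
}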
When configuring the job, just add an instance of your custom Listener class to your JobBuilder configuration before adding any steps.
@Bean
public Job myJob() {
    return this.jobBuilderFactory.get("myJob")
            .listener(new MyListener())
            .start(step1())
            .next(step2())
            .next(step3())
            .build();
}
Anything you add in your Job's ExecutionContext can then be injected into any other Processor/Reader/Writer/Step beans that are configured, as long as they are annotated with either @JobScope or @StepScope:
@Bean
@JobScope
public ItemReader<MyItem> myItemReader(
        @Value("#{jobExecutionContext['myDate']}") Date myDate) {
    //...
}
Component classes work the same way:
@Component
@JobScope
static class MyProcessor implements ItemProcessor<ItemA, ItemB> {

    private Date myDate;

    public MyProcessor(
            @Value("#{jobExecutionContext['myDate']}") Date myDate) {
        this.myDate = myDate;
    }
    // ...
}

Related

How to perform logic after ItemWriter has completed?

Hi all, I'm new to SO and to Spring Batch. I have written a batch job with a Classifier that updates a table in two different ways (i.e. two ItemWriters), depending on what's retrieved through the ItemReader, and all of that is working fine. Now, I want to perform some logic after the ItemWriters are done updating: I want to do some logging and update another table with the same set of data retrieved previously. How can I achieve this? I looked at ItemWriteListener, but it seems it cannot perform data-specific logic. I did some searching but with no luck. Any help would be appreciated. Thanks in advance!
You can try implementing StepExecutionListener on your writer class to execute logic once the ItemWriter is done. Below is a snippet of such an ItemWriter for reference:
public class TestWriter implements ItemWriter<Test>, StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
    }

    @Override
    public void write(List<? extends Test> items) throws Exception {
        // Logic of the writer
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // You can perform post-write logic here inside afterStep, based on your requirements
        // Return a custom exit status based on the run
        return ExitStatus.COMPLETED;
    }
}
Now, I want to perform some logic after the ItemWriters are done updating. I want to do some logging and update another table with the same set of data retrieved previously. How can I achieve this? I looked at ItemWriteListener, but it seems it cannot perform data-specific logic.
Since you want to do something with the same items retrieved previously, you need to use ItemWriteListener#afterWrite, as this method gives you access to the items that have just been written.
EDIT: Added details about the failure case, based on the comments.
If the transaction is rolled back, the method ItemWriteListener#onWriteError will be called. Please find more details about this in the common patterns section.
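For illustration, a minimal listener along those lines might look like the sketch below; the logging and the second-table update are placeholders to fill in:
public class TestWriteListener implements ItemWriteListener<Test> {

    @Override
    public void beforeWrite(List<? extends Test> items) {
    }

    @Override
    public void afterWrite(List<? extends Test> items) {
        // Called after the items have been written successfully:
        // do the logging and update the other table with the same items here
    }

    @Override
    public void onWriteError(Exception exception, List<? extends Test> items) {
        // Called when the write fails and the transaction is rolled back
    }
}
It can be registered on the step like any other listener, e.g. with .listener(new TestWriteListener()) in the step builder.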

Spring Batch - How to prevent batch from storing transactions in DB

First the problem statement:
I am using Spring Batch in my DEV environment without issue. When I move the code to a production environment, I run into a problem. In my DEV environment, Spring Batch is able to create its transaction data tables in our DB2 database server without problem. This is not an option when we go to PROD, as this is a read-only job.
Attempted solution:
Searching Stack Overflow, I found this posting:
Spring-Batch without persisting metadata to database?
Which sounded perfect, so I added
@Bean
public ResourcelessTransactionManager transactionManager() {
    return new ResourcelessTransactionManager();
}

@Bean
public JobRepository jobRepository(ResourcelessTransactionManager transactionManager) throws Exception {
    MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean(transactionManager);
    mapJobRepositoryFactoryBean.setTransactionManager(transactionManager);
    return mapJobRepositoryFactoryBean.getObject();
}
I also added it to my Job by calling .repository(jobRepository).
But I get
Caused by: java.lang.NullPointerException: null
at org.springframework.batch.core.repository.dao.MapJobExecutionDao.synchronizeStatus(MapJobExecutionDao.java:158) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
So I am not sure what to do here. I am new to Spring, so I am teaching myself as I go. I am open to other solutions, such as an in-memory database, but I have not been able to get those to work either. I do NOT need to save any state or session information between runs, but the database query I am running will return around a million or so rows, so I will need to get that in chunks.
Any suggestions or help would be greatly appreciated.
Add these beans to your application class:
@Bean
public PlatformTransactionManager transactionManager() {
    return new ResourcelessTransactionManager();
}

@Bean
public JobExplorer jobExplorer() throws Exception {
    MapJobExplorerFactoryBean jobExplorerFactory = new MapJobExplorerFactoryBean(mapJobRepositoryFactoryBean());
    jobExplorerFactory.afterPropertiesSet();
    return jobExplorerFactory.getObject();
}

@Bean
public MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean() {
    MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean();
    mapJobRepositoryFactoryBean.setTransactionManager(transactionManager());
    return mapJobRepositoryFactoryBean;
}

@Bean
public JobRepository jobRepository() throws Exception {
    return mapJobRepositoryFactoryBean().getObject();
}

@Bean
public JobLauncher jobLauncher() throws Exception {
    SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
    simpleJobLauncher.setJobRepository(jobRepository());
    return simpleJobLauncher;
}
This doesn't directly answer your question, but that is not a good solution; the map-based repository is supposed to be used only for testing. It will grow in memory indefinitely.
I suggest you use an embedded database like SQLite. The main problem with using a separate database for job metadata is that you then have to coordinate the transactions between the two databases you use (so that the state of the metadata matches that of the data), but since it seems you're not even writing to the main database, that probably won't be a problem for you.
You could use an in-memory database (for example H2 or HSQL) quite easily. Examples of that you can find for example here: http://www.mkyong.com/spring/spring-embedded-database-examples/.
As for the Map-backed job repository, it does provide a method to clear its contents:
public void clear()
Convenience method to clear all the map DAOs globally, removing all entities.
Be aware that a Map-based job repository is not fit for use with partitioned steps or other multi-threaded scenarios.
The following seems to have done the job for me:
#Bean
public DataSource dataSource() {
EmbeddedDatabaseBuilder builder = new EmbeddedDatabaseBuilder();
EmbeddedDatabase db = builder
.setType(EmbeddedDatabaseType.HSQL)
.build();
return db;
}
Now Spring does not create tables in our production database, and since state is lost when the JVM exits, nothing is left hanging around.
UPDATE: The above code has caused concurrency errors for us. We have addressed this by abandoning the EmbeddedDatabaseBuilder and declaring the HSQLDB this way instead:
@Bean
public BasicDataSource dataSource() {
    BasicDataSource dataSource = new BasicDataSource();
    dataSource.setDriverClassName("org.hsqldb.jdbcDriver");
    dataSource.setUrl("jdbc:hsqldb:mem:testdb;sql.enforce_strict_size=true;hsqldb.tx=mvcc");
    dataSource.setUsername("sa");
    dataSource.setPassword("");
    return dataSource;
}
The primary difference is that we are able to specify MVCC (multiversion concurrency control) in the connection string, which resolves the issue.

MEF exports that require remote data (like DB data) in order to be created

Please excuse the long description at the beginning; the questions are at the end.
I have a Windows service that is supposed to read data from some data sources (represented by the IDataSource interface).
I'm using MEF in my project and I was thinking of injecting the required data sources via constructor injection, like below:
[Export(typeof(Service))]
public class Service : ServiceBase {
    [ImportingConstructor]
    public Service([ImportMany] IEnumerable<IDataSource> dataSources) {
        //...
    }
}
However, there is a problem in doing it like this. The service needs to use any combination of data sources: multiple data sources of the same type (ex: 2 CSVDataSource instances) or multiple data sources of different types (ex: 2 CSVDataSource instances and 1 SQLDataSource instance).
Each data source has properties that are retrieved from the DB in order to properly set it up. These settings might indicate from where to read the data and at what intervals. This is why, in my implementation, the data sources have a constructor that accepts an id. This id is used to identify the data source in the DB and to retrieve the specific data source settings from the DB. This can be seen below.
public class CSVDataSource : IDataSource {
    public CSVDataSource(int dsId) {
        // Call the web service in order to get the properties to
        // properly set up the data source.
    }
    //...
}
I feel that the service definition presented above is not suited for this scenario. The other approach I can think of is to use some sort of factory that allows the service to dynamically create the data sources inside it. That implementation might look like below.
public class Service : ServiceBase {
    [ImportingConstructor]
    public Service(IDataSourceFactory dsFactory)
    {
        if (dsFactory == null) throw new ArgumentNullException("dsFactory");
        IEnumerable<IDataSource> dataSources = dsFactory.CreateAll();
    }
}

[Export(typeof(IDataSourceFactory))]
[PartCreationPolicy(CreationPolicy.Shared)]
public class DataSourceFactory : IDataSourceFactory
{
    private readonly int agentId;

    [ImportingConstructor]
    public DataSourceFactory([Import("AgentId")] int agentId)
    {
        this.agentId = agentId;
    }

    public IEnumerable<IDataSource> CreateAll()
    {
        List<IDataSource> dataSources = new List<IDataSource>();
        // Access the web service and instantiate the data sources
        return dataSources;
    }
}
And now to my questions:
Is my factory approach a good idea, or should I look for another approach?
Is it OK to have exports that require data from a remote location in order to be created?
Have you come across ExportMetadataAttribute before? It allows you to assign metadata to an export that you can inspect before the export is created. You'll be able to import your IDataSource instances as Lazy and should then be able to create them yourself with the required parameters.
There's a good breakdown of Lazy and ExportMetadata here

How to write more than one class in Spring Batch

Situation:
I read the URL of a file on the internet from the DB. In the ItemProcessor I download this file, and I want to save each row to the database. Then processing continues, and I want to create some new "summary" class which I want to save to the DB too. How should I configure my job in Spring Batch?
For your use case, the job can be defined using this step sequence (this way the job is also restartable):
Download the file from the URL to disk using a Tasklet: a Tasklet is the strategy for processing a single step; in your case something similar to this post can help. Store the local filename in the JobExecutionContext (a sketch of such a Tasklet follows this list).
Process the downloaded file:
2.1. With a FlatFileItemReader<S> (or your own ItemReader/ItemStream implementation), read the downloaded file
2.2. With an ItemProcessor<S,T>, process each row
2.3. Write each object processed in 2.2 to the database using a custom MyWriter<T> that does the summary calculation and delegates to an ItemWriter<T> for T's database persistence and to an ItemWriter<Summary> to write the Summary object.
<S> is the bean that contains each file row and
<T> is the bean you write to the DB
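A rough sketch of the download Tasklet from step 1 could look like this; the file location and the "localFileName" key are just placeholders:
public class DownloadFileTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Placeholder: download the file from the URL read from the database
        String localFileName = "/tmp/downloaded-file.csv";
        // Store the local filename in the JobExecutionContext so the next step can read it
        chunkContext.getStepContext()
                .getStepExecution()
                .getJobExecution()
                .getExecutionContext()
                .putString("localFileName", localFileName);
        return RepeatStatus.FINISHED;
    }
}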
MyWriter<T> can be used in this way:
class MyWriter<T> implements ItemWriter<T> {

    private ItemWriter<Summary> summaryWriter;
    private ItemWriter<T> tWriter;

    public void write(List<? extends T> items) throws Exception {
        List<Summary> summaries = new ArrayList<>(items.size());
        for (T item : items) {
            final Summary summary = /* here create the summary object, reading it from
                                     * the database or creating a new object */
            /* do the summary or update the summary */
            summaries.add(summary);
        }
        /* The code above is trivial: you could group Summary objects using a
         * Map<SummaryKey,Summary> to reduce reads and use
         * summaryWriter.write(summariesMap.values()), for example */
        tWriter.write(items);
        summaryWriter.write(summaries);
    }
}
For restartability, you need to register both MyWriter.summaryWriter and MyWriter.tWriter as streams on the step.
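Assuming Java configuration with a stepBuilderFactory, and that the delegate writers implement ItemStream (file- and stream-based writers do), that registration could look roughly like this; all bean names and the FileRow/DbEntity types are placeholders:
@Bean
public Step processFileStep() {
    return stepBuilderFactory.get("processFileStep")
            .<FileRow, DbEntity>chunk(100)
            .reader(downloadedFileReader())      // e.g. FlatFileItemReader<FileRow>
            .processor(rowProcessor())           // ItemProcessor<FileRow, DbEntity>
            .writer(myWriter())                  // the MyWriter shown above
            // register the delegates so their state is saved to the ExecutionContext
            .stream(tWriter())
            .stream(summaryWriter())
            .build();
}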
You can use a CompositeItemWriter.
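For reference, a CompositeItemWriter setup is roughly as follows; the DbEntity type and the delegate beans are placeholders. Note that every delegate receives the same chunk of items, so a summary of a different type would still have to be produced elsewhere (for example in a later step, as suggested below):
@Bean
public CompositeItemWriter<DbEntity> compositeWriter(ItemWriter<DbEntity> tableWriter,
                                                     ItemWriter<DbEntity> auditWriter) {
    CompositeItemWriter<DbEntity> writer = new CompositeItemWriter<>();
    // Each delegate is called in order with the same list of items
    List<ItemWriter<? super DbEntity>> delegates = new ArrayList<>();
    delegates.add(tableWriter);
    delegates.add(auditWriter);
    writer.setDelegates(delegates);
    return writer;
}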
But perhaps your summary processing should be in another step, which reads the rows you previously inserted.

spring batch - processor chain

I need to execute seven distinct processes sequentially (one after the other). The data is stored in MySQL. I am thinking of the following options; please correct me if I am wrong, or if there is a better solution.
Requirements:
Read the data from the DB, do the seven processes (data validation, calculation1, calculation2, etc.), and finally write the processed data into the DB.
Need to process the data in chunks.
My solution and issues:
Data read:
Read the data using JdbcCursorItemReader, because this is the best-performing DB reader. But the SQL is very complex, so I may have to consider a custom ItemReader using JdbcTemplate, which gives me more flexibility in handling the data.
Process:
Define seven steps and chunks, and share the data between the steps using a data bean. But this won't be a good idea, because the data is processed in chunks, and after each chunk the step 1 writer will create a new set of data in the data bean. When this data bean is shared across the other steps, data integrity will be an issue.
Use the StepExecutionContext to share the data between steps. But this may affect performance, as it involves the batch job repository.
Define only one step, with one ItemReader and a chain of processors (the seven processes), and create one ItemWriter which writes the processed data into the DB. But then I won't be able to administer or monitor each of the different processes; all will be in one step.
The org.springframework.batch.item.support.CompositeItemProcessor is an out-of-the-box component from the Spring Batch framework that supports your requirement, akin to your third option. It allows you to do the following:
- keep separation in your design/solution for reading from the database (ItemReader)
- keep each individual processor's concerns and configuration separate
- allow any individual processor to filter an item out of the chunk by returning null, irrespective of the previous processors
The CompositeItemProcessor iterates over a list of delegates, so it works like a processing pipeline. It's quite useful in the scenario you've described and still allows you to leverage the chunk benefits (exception handling, retry, commit policy, etc.).
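A sketch of that configuration could look like the following; the InputRow type and the individual processor beans stand in for your seven processes:
@Bean
public CompositeItemProcessor<InputRow, InputRow> compositeProcessor() {
    CompositeItemProcessor<InputRow, InputRow> processor = new CompositeItemProcessor<>();
    // Delegates run in order; each receives the previous delegate's output,
    // and returning null from any of them filters the item out of the chunk
    processor.setDelegates(Arrays.asList(
            validationProcessor(),
            calculation1Processor(),
            calculation2Processor()
            // ... the remaining processors
    ));
    return processor;
}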
Suggestions:
1) Read the data using JdbcCursorItemReader.
All out-of-the-box components are a good choice because they already implement the ItemStream interface, which makes your steps restartable. But like you mention, sometimes the query is just too complex or, like in my case, you already have a service or DAO that you can reuse.
I would suggest you use the ItemReaderAdapter. It lets you configure a delegate service to call to get your data.
<bean id="MyReader" class="xxx.adapters.MyItemReaderAdapter">
<property name="targetObject" ref="AnExistingDao" />
<property name="targetMethod" value="next" />
</bean>
Note that the targetMethod must respect the read contract of ItemReaders (return null when there is no more data).
If your job does not need to be restartable, you could simply use the class org.springframework.batch.item.adapter.ItemReaderAdapter.
But if you need your job to be restartable, you can create your own ItemReaderAdapter like this:
public class MyItemReaderAdapter<T> extends AbstractMethodInvokingDelegator<T> implements ItemReader<T>, ItemStream {

    private long currentCount = 0;
    private final String CONTEXT_COUNT_KEY = "count";

    /**
     * @return return value of the target method.
     */
    public T read() throws Exception {
        super.setArguments(new Long[]{currentCount++});
        return invokeDelegateMethod();
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        currentCount = executionContext.getLong(CONTEXT_COUNT_KEY, 0);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        executionContext.putLong(CONTEXT_COUNT_KEY, currentCount);
        log.info("Update stream current count : " + currentCount);
    }

    @Override
    public void close() throws ItemStreamException {
        // no resources to release
    }
}
Because the out-of-the-box ItemReaderAdapter is not restartable, you just create your own that implements ItemStream.
2) Regarding the 7 steps vs. 1 step.
I would go with 1 step with a CompositeItemProcessor on this one. The 7-step options will only bring problems, IMO.
1) 7 steps with a data bean: your writers commit to a data bean until step 7, then the step 7 writer tries to commit to the real database and boom, an error! Everything is lost and the batch must restart from step 1.
2) 7 steps with the context: could be better, since the state is saved in the Spring Batch metadata, BUT it is not good practice to store big data in the Spring Batch metadata.
3) is the way to go, IMO. ;-)