Situation:
I read the URL of a file on the internet from the DB. In the ItemProcessor I download this file, and I want to save each row to the database. Then processing continues and I want to create some new "summary" class which I want to save to the DB too. How should I configure my job in Spring Batch?
For your use case the job can be defined with the following step sequence (defined this way, the job is also restartable):
1. Download the file from the URL to disk using a Tasklet: a Tasklet is the strategy for processing a single step; in your case something similar to this post can help. Store the local filename in the JobExecutionContext (a minimal sketch follows this list).
2. Process the downloaded file:
2.1. Read the downloaded file with a FlatFileItemReader<S> (or your own ItemReader/ItemStream implementation).
2.2. Process each row with an ItemProcessor<S,T>.
2.3. Write each object produced in 2.2 to the database using a custom MyWriter<T> that does the summary calculation and delegates to an ItemWriter<T> for T's database persistence and to an ItemWriter<Summary> to write the Summary objects.
<S> is the bean that contains each file row, and
<T> is the bean you write to the DB.
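A minimal sketch of the download Tasklet from step 1 (the "localFilename" context key and the way the URL and target path are supplied are assumptions to adapt to your job):
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class DownloadFileTasklet implements Tasklet {

    private String fileUrl;     // the URL read from the database
    private String targetFile;  // local path to download to

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        Path target = Paths.get(targetFile);
        // download the file to disk
        try (InputStream in = new URL(fileUrl).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        // store the local filename so the reader of the next step can pick it up
        chunkContext.getStepContext().getStepExecution().getJobExecution()
                .getExecutionContext().putString("localFilename", target.toString());
        return RepeatStatus.FINISHED;
    }
}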
MyWriter<T> can be implemented like this:
class MyWriter<T> implements ItemWriter<T> {

    private ItemWriter<Summary> summaryWriter;
    private ItemWriter<T> tWriter;

    @Override
    public void write(List<? extends T> items) throws Exception {
        List<Summary> summaries = new ArrayList<>(items.size());
        for (T item : items) {
            // Here create the Summary object, reading it from the database
            // or creating a new one, then do/update the summary from the item.
            final Summary summary = /* create or load the summary for this item */;
            summaries.add(summary);
        }
        /* The code above is deliberately simple: you can group Summary objects in a
         * Map<SummaryKey, Summary> to reduce reads and use
         * summaryWriter.write(new ArrayList<>(summariesMap.values())), for example. */
        tWriter.write(items);
        summaryWriter.write(summaries);
    }
}
For restartability you need to register both MyWriter.summaryWriter and MyWriter.tWriter as streams on the step.
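A hedged sketch of that registration with the Java builder API (FileRow/DbEntity stand in for your <S>/<T> beans, and the bean/method names are placeholders inside a batch @Configuration class):
@Bean
public Step processFileStep(MyWriter<DbEntity> myWriter,
                            ItemStreamWriter<DbEntity> tWriter,
                            ItemStreamWriter<Summary> summaryWriter) {
    return stepBuilderFactory.get("processFileStep")
            .<FileRow, DbEntity>chunk(100)
            .reader(reader())        // FlatFileItemReader<FileRow> from step 2.1
            .processor(processor())  // ItemProcessor<FileRow, DbEntity> from step 2.2
            .writer(myWriter)        // delegates to tWriter and summaryWriter
            .stream(tWriter)         // registered as streams so their state is saved
            .stream(summaryWriter)   // and the step stays restartable
            .build();
}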
You can use a CompositeItemWriter.
But perhaps your summary processing should be in another step, one that reads back the rows you inserted previously.
Related
I'm currently designing a Spring Batch application that reads from a table, transforms the data and then writes it to another table.
However, before I begin reading the source table, I need to collect some meta data for the application run (e.g. read the holiday calendar table to determine if it's a bank holiday or not). This meta data will not change anymore during runtime, so it needs to be read only once, at the very beginning of the application run.
How can this be achieved? Use a JobListener? Configure a separate Job for this and then pass the information to the "actual" job through an ExecutionContext? Configure a separate step that gets only executed once?
Configure a JobExecutionListener to get the information you need and store it on the Job's ExecutionContext.
You can create a listener class that either extends JobExecutionListenerSupport and only overrides the beforeJob method, or a standalone listener class with a beforeJob method annotated with @BeforeJob.
When configuring the job, just add an instance of your custom Listener class to your JobBuilder configuration before adding any steps.
@Bean
public Job myJob() {
    return this.jobBuilderFactory.get("myJob")
            .listener(new MyListener())
            .start(step1())
            .next(step2())
            .next(step3())
            .build();
}
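A minimal sketch of such a listener (the "myDate" key matches the injection examples below; the metadata lookup itself is a placeholder):
import java.util.Date;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;

public class MyListener extends JobExecutionListenerSupport {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // read the metadata once, before any step runs (e.g. the holiday calendar lookup)
        Date myDate = new Date(); // placeholder for the value you actually look up
        jobExecution.getExecutionContext().put("myDate", myDate);
    }
}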
Anything you add in your Job's ExecutionContext can then be injected into any other Processor/Reader/Writer/Step beans that are configured, as long as they are annotated with either @JobScope or @StepScope:
@Bean
@JobScope
public ItemReader<MyItem> myItemReader(
        @Value("#{jobExecutionContext['myDate']}") Date myDate) {
    //...
}
Component classes work the same way as well:
@Component
@JobScope
static class MyProcessor implements ItemProcessor<ItemA, ItemB> {

    private Date myDate;

    public MyProcessor(
            @Value("#{jobExecutionContext['myDate']}") Date myDate) {
        this.myDate = myDate;
    }

    // ...
}
The commit interval will commit the data at specified intervals. I want to commit the entire file in a single shot, since my requirement is to validate the file (line by line) and, if it fails at any point, roll back with no commit. Is there any way to achieve this in Spring Batch?
You can either set your commit-interval to Integer.MAX_VALUE (2^31 - 1) or create your own CompletionPolicy.
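If you build the step with the Java API instead of XML, the first option is just a very large chunk size (a sketch; the Line type and the reader/processor/writer beans are placeholders):
// one huge chunk: the whole file is read, validated and written in a single
// transaction, so any failure rolls everything back and nothing is committed
@Bean
public Step singleCommitStep() {
    return stepBuilderFactory.get("singleCommitStep")
            .<Line, Line>chunk(Integer.MAX_VALUE)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}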
Here's how you configure a step to use a custom CompletionPolicy:
<chunk reader="reader" writer="writer" chunk-completion-policy="completionPolicy"/>
<bean id="completionPolicy" class="xx.xx.xx.CompletionPolicy"/>
Then you have to either choose an out-of-the-box CompletionPolicy provided by Spring Batch (a list of implementations is available at the previous link) or create your own.
What do you mean by "commit"?
You are talking about validating, not about writing the read data to another file or into a database.
As mentioned in the comment by Michael Prarlow, memory problems could arise if the size of the file changes.
In order to prevent this, I would suggest starting your job with a validation step. Simply read the data chunk-wise, check the data line by line in your processor and throw a non-skippable exception if a line is not valid. Use a pass-through writer, so nothing is persisted. If there is a problem, the whole job will fail.
If you really have to write the data into a DB or another file, you can do this in a second step. Since you have already validated your data, you shouldn't observe any problems.
Simple PassThroughItemWriter
public class PassThroughItemWriter<T> implements ItemWriter<T> {

    @Override
    public void write(List<? extends T> items) {
        // do nothing
    }
}
or, if you use the Java API to build your job and steps, you can simply use a lambda:
stepBuilders.get("step")
        .<..., ...>chunk(..)
        .reader(...)
        .processor(...) // your processor with the validation logic
        .writer(items -> {}) // empty lambda expression
        .build();
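A sketch of the validating processor mentioned above (Line and isValid are placeholders for your own row type and validation rules; any unhandled, non-skippable exception fails the whole job):
public class ValidatingProcessor implements ItemProcessor<Line, Line> {

    @Override
    public Line process(Line item) {
        if (!isValid(item)) {
            // not configured as skippable, so this rolls back and fails the job
            throw new IllegalArgumentException("Invalid line: " + item);
        }
        return item; // pass the item through unchanged
    }

    private boolean isValid(Line item) {
        // your line-by-line validation rules go here
        return true;
    }
}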
Actually I am executing my Selenium tests by reading test case data from Excel. I want to fetch whether the test result is passed or failed after executing my first test case and write it next to that test case, then do the same for my second test case, and so on.
Excel sheet screenshot before execution of my test cases:
http://i.stack.imgur.com/L2LNz.png
Excel sheet screenshot after execution of my test cases:
http://i.stack.imgur.com/mMivW.png
You can fetch the results using TestNG. TestNG contains default listeners which read whether your test passed/failed/was skipped.
To set this data in the Excel sheet you need to create a class that implements ITestListener:
public class ExcelListener implements ITestListener
If you use an IDE, you should see a warning about the need to create the unimplemented methods. Let the IDE create them and you should see methods like
@Override
public void onTestSuccess(ITestResult result) {
    // TODO Auto-generated method stub
}
Then all you have to code is:
1. Open the Excel file
2. Find the right column
3. Insert the status
To do that I recommend using the Java Excel API.
To read an existing Excel sheet you need to provide the absolute path, the workbook name and a sheet name. Here's my code for the getExcel method:
public void getExcel(String filePath, String sheetName, String fileName) throws BiffException, IOException {
    String absolutePath = filePath.concat("/").concat(fileName);
    file = new FileInputStream(new File(absolutePath));
    workbook = Workbook.getWorkbook(file);
    worksheet = workbook.getSheet(sheetName);
}
After getting the Excel file, you need to iterate through the data.
You can provide the exact column and row.
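A hedged sketch of writing the status back with the Java Excel API (jxl); the paths, the column/row indices and the status strings are assumptions you would adapt to your sheet layout:
import java.io.File;

import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;

public class ExcelResultWriter {

    public void writeStatus(String sourcePath, String targetPath, String sheetName,
                            int column, int row, String status) throws Exception {
        Workbook existing = Workbook.getWorkbook(new File(sourcePath));
        // jxl cannot edit a file in place: copy it into a writable workbook, modify, then save
        WritableWorkbook copy = Workbook.createWorkbook(new File(targetPath), existing);
        WritableSheet sheet = copy.getSheet(sheetName);
        sheet.addCell(new Label(column, row, status)); // e.g. "PASS" or "FAIL" next to the test case row
        copy.write();
        copy.close();
        existing.close();
    }
}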
Hope it helps!
EDIT:
Place a listener like this
@Listeners(MyExcelListener.class)
public class MyTestClass {
}
Please excuse the long description at the beginning; the questions are at the end.
I have a Windows service that is supposed to read data from some data sources (represented by the IDataSource interface).
I'm using MEF in my project and I was thinking of injecting the required data sources via constructor injection like below:
[Export(typeof(Service))]
public class Service : ServiceBase {
    [ImportingConstructor]
    public Service([ImportMany] IEnumerable<IDataSource> dataSources) {
        //...
    }
}
However, there is a problem in doing it like this. The service needs to use any combination of data sources: multiple data sources of the same type (ex: 2 CSVDataSource instances) or multiple data sources of different types (ex: 2 CSVDataSource instances and 1 SQLDataSource instance).
Each data source has properties that are retrieved from the DB in order to properly set it up. These settings might indicate from where to read the data and at what intervals. This is why, in my implementation, the data sources have a constructor that accepts an ID. This ID is used to identify the data source in the DB and to retrieve the specific data source settings from the DB. This can be seen below.
public class CSVDataSource : IDataSource {
    public CSVDataSource(int dsId) {
        //call web service in order to get properties to
        //properly set up the data source.
    }
    //...
}
I feel that the service definition presented above is not suited for this scenario. The other approach I can think of is to use some sort of factory that allows the service to create the data sources dynamically. That implementation might look like the following:
public class Service : ServiceBase {
    [ImportingConstructor]
    public Service(IDataSourceFactory dsFactory)
    {
        if (dsFactory == null) throw new ArgumentNullException("dsFactory");
        IEnumerable<IDataSource> dataSources = dsFactory.CreateAll();
    }
}

[Export(typeof(IDataSourceFactory))]
[PartCreationPolicy(CreationPolicy.Shared)]
public class DataSourceFactory : IDataSourceFactory
{
    private readonly int agentId;

    [ImportingConstructor]
    public DataSourceFactory([Import("AgentId")] int agentId)
    {
        this.agentId = agentId;
    }

    public IEnumerable<IDataSource> CreateAll()
    {
        List<IDataSource> dataSources = new List<IDataSource>();
        //access web service and instantiate the data sources
        return dataSources;
    }
}
And now to my questions:
1. Is my factory approach a good idea, or should I look for another approach?
2. Is it OK to have exports that require data from a remote location in order to be created?
Have you come across the ExportMetadataAttribute before? It allows you to assign metadata to an export that you can view before the export is created. You'll be able to import your IDataSources as Lazy instances and then create them yourself with the required parameters.
There's a good breakdown of Lazy and ExportMetadata here.
I need to execute seven distinct processes sequentially (one after the other). The data is stored in MySQL. I am thinking of the following options; please correct me if I am wrong, or if there is a better solution.
Requirements:
Read the data from the DB, run the seven processes (data validation, calculation1, calculation2, etc.) and finally write the processed data to the DB.
The data needs to be processed in chunks.
My solution and issues:
Data read:
Read the data using a JdbcCursorItemReader, because this is the best-performing DB reader. But the SQL is very complex, so I may have to consider a custom ItemReader using JdbcTemplate, which gives me more flexibility in handling the data.
Process:
Define seven steps and chunks, and share the data between the steps using a data bean. But this won't be a good idea, because the data is processed in chunks and after each chunk the step 1 writer will create a new set of data in the data bean. When this data bean is shared across the other steps, data integrity will be an issue.
Use the StepExecutionContext to share the data between steps. But this may affect performance, as it involves the batch job repository.
Define only one step, with one ItemReader and a chain of processors (the seven processes), and create one ItemWriter which writes the processed data to the DB. But I won't be able to administer or monitor each of the different processes; they will all be in one step.
The org.springframework.batch.item.support.CompositeItemProcessor is an out-of-the-box component from the Spring Batch framework that would support your requirement, akin to your second option. It would allow you to do the following:
- keep separation in your design/solution for reading from the database (ItemReader)
- keep separation of each individual processor's concerns and configuration
- allow any individual processor to 'shut down' the chunk by returning null, irrespective of the previous processes
The CompositeItemProcessor iterates over a list of delegates, so it's 'similar' to an action pattern. It's quite useful in the scenario you've described and still allows you to leverage the chunk benefits (exception handling, retry, commit policy, etc.).
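A minimal sketch of wiring it up (InputRow/OutputRow and the delegate processor beans are placeholders for your own row types and your seven processes):
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.CompositeItemProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ProcessingConfig {

    @Bean
    public CompositeItemProcessor<InputRow, OutputRow> compositeProcessor(
            ItemProcessor<InputRow, InputRow> validationProcessor,
            ItemProcessor<InputRow, OutputRow> calculationProcessor) throws Exception {
        List<ItemProcessor<?, ?>> delegates = new ArrayList<>();
        delegates.add(validationProcessor);   // process 1: validation
        delegates.add(calculationProcessor);  // processes 2..7: the calculations, in order
        CompositeItemProcessor<InputRow, OutputRow> composite = new CompositeItemProcessor<>();
        composite.setDelegates(delegates);    // delegates run in order; returning null filters the item
        composite.afterPropertiesSet();
        return composite;
    }
}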
Suggestions:
1) Read the data using JdbcCursorItemReader.
All the out-of-the-box components are a good choice because they already implement the ItemStream interface that makes your steps restartable. But, like you mentioned, sometimes the query is just too complex or, like in my case, you already have a service or DAO that you can reuse.
I would suggest you use the ItemReaderAdapter. It lets you configure a delegate service to call to get your data.
<bean id="MyReader" class="xxx.adapters.MyItemReaderAdapter">
    <property name="targetObject" ref="AnExistingDao" />
    <property name="targetMethod" value="next" />
</bean>
Note that the targetMethod must respect the read contract of ItemReaders (return null when there is no more data).
If your job does not need to be restartable, you can simply use the class org.springframework.batch.item.adapter.ItemReaderAdapter.
But if you need your job to be restartable, you can create your own ItemReaderAdapter like this:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.adapter.AbstractMethodInvokingDelegator;

public class MyItemReaderAdapter<T> extends AbstractMethodInvokingDelegator<T> implements ItemReader<T>, ItemStream {

    private static final Logger log = LoggerFactory.getLogger(MyItemReaderAdapter.class);

    private static final String CONTEXT_COUNT_KEY = "count";

    private long currentCount = 0;

    /**
     * @return return value of the target method.
     */
    @Override
    public T read() throws Exception {
        super.setArguments(new Long[] { currentCount++ });
        return invokeDelegateMethod();
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        currentCount = executionContext.getLong(CONTEXT_COUNT_KEY, 0);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        executionContext.putLong(CONTEXT_COUNT_KEY, currentCount);
        log.info("Update stream, current count: " + currentCount);
    }

    @Override
    public void close() throws ItemStreamException {
        // no resources to release
    }
}
Because the out-of-the-box ItemReaderAdapter is not restartable, you just create your own that also implements ItemStream.
2) Regarding the 7 steps vs 1 step.
I would go with 1 step with a CompositeItemProcessor on this one. The 7-step option will only bring problems, IMO.
1) 7 steps with a data bean: your writers commit into the data bean until step 7... then the step 7 writer tries to commit to the real database and boom, error! Everything is lost and the batch must restart from step 1!
2) 7 steps with the context: this could be better, since you would have the state saved in the Spring Batch metadata... BUT it is not good practice to store big data in the Spring Batch metadata!
3) is the way to go, IMO. ;-)