This is a Spring Batch problem.
I would like to read some information from a CSV, then use that to read from two different tables in a database, then perform an update on those rows. I have a reader that reads from a CSV, and I can write to two tables by making a composite writer.
I would prefer a solution that uses Java configuration (it's too bad so many examples on the Web use XML configuration and haven't been updated to Java configuration).
The more sample code you can provide, the better; in particular, if I had to use a listener or a processor, how would I perform the query and get the result?
What you're really looking for isn't chaining of readers but using an ItemProcessor to enrich the data that was read in from the CSV. I'd expect your step to be something along the lines of FlatFileItemReader for the reader, your own custom ItemProcessor that enriches the object provided from the reader, and then (as you mentioned) a CompositeItemWriter that delegates the writes to the appropriate other writers.
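To make the enrichment step concrete, here is a minimal sketch of such an ItemProcessor. The `Customer` item type, the `customer_ext` table, and the query are assumptions for illustration, not taken from the question:

```java
import org.springframework.batch.item.ItemProcessor;
import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical enriching processor: looks up extra columns for each
// row read from the CSV. Customer and customer_ext are placeholder
// names -- adapt them to your own item type and schema.
public class EnrichingItemProcessor implements ItemProcessor<Customer, Customer> {

    private final JdbcTemplate jdbcTemplate;

    public EnrichingItemProcessor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public Customer process(Customer item) {
        // One lookup per item; join the two tables here if both are needed.
        String status = jdbcTemplate.queryForObject(
                "SELECT status FROM customer_ext WHERE id = ?",
                String.class, item.getId());
        item.setStatus(status);
        return item; // returning null would instead filter the item out
    }
}
```

Wire it into the step between the FlatFileItemReader and the CompositeItemWriter; the JdbcTemplate can simply be built on the same DataSource the writers use.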
Related
I have a use case where I am using Spring Batch and writing to 3 different data sources based on the job parameters. All of this works absolutely fine, but the one problem is the metadata: Spring Batch uses the default DataSource to write its metadata. So whenever I run a job, the transactional data goes to the correct DB, but the batch metadata always goes to the default DB.
Is it possible to selectively write the meta data also to the respective databases based on the jobs parameter?
@michaelMinella, @MahmoudBenHassine, can you please help?
We have a scenario where I have to get data from one database and update data in another database after applying business rules.
I want to use spring batch+drools+hibernate.
Can we apply rules in batch as we have million records at one time?
I am not an expert on Drools; I am simply trying to give some context about Spring Batch.
Spring Batch is a read -> process -> write framework, and what we do with Drools is the same as what we do in the process step of Spring Batch, i.e. we transform a read item in an ItemProcessor.
Spring Batch helps you handle a large number of items by implementing chunk-oriented processing: we read N items in one go, transform these items one by one in the processor, and then write a chunk of items in the writer - this way we reduce the number of DB calls.
There is further scope for performance improvement by implementing parallelism via partitioning etc., if your data can be partitioned on some criteria.
So we read items in bulk, transform them one by one, and then write in bulk to the target database - and I don't think Hibernate is a good tool for bulk updates/inserts at the write step; I would go with plain JDBC.
Drools comes into the picture at the transformation step, and that is going to be your custom code; its performance has nothing to do with Spring Batch, i.e. how you initialize sessions, precompile rules etc. You will have to plug in this code in such a way that you don't initialize the Drools session every time - that should be a one-time activity.
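Since the one-time initialization is the part that is easy to get wrong, here is a hedged sketch of what such a processor could look like, using the kie-api classes. The `Order` item type and the classpath-based rule container are assumptions for illustration:

```java
import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;
import org.springframework.batch.item.ItemProcessor;

// Sketch only: the KieContainer (which compiles the rules) is built
// once and reused; only the lightweight KieSession is per-item.
// Order is a placeholder item type.
public class RuleApplyingProcessor implements ItemProcessor<Order, Order> {

    // Expensive, one-time setup -- never rebuild this per item/chunk.
    private final KieContainer kieContainer =
            KieServices.Factory.get().getKieClasspathContainer();

    @Override
    public Order process(Order item) {
        KieSession session = kieContainer.newKieSession();
        try {
            session.insert(item);
            session.fireAllRules(); // rules mutate the item in place
            return item;
        } finally {
            session.dispose();      // sessions are cheap; containers are not
        }
    }
}
```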
As per the Spring Batch documentation, it provides a variety of ItemReader flavors for reading data from a database. In my case, a lot of business validation needs to be performed against the database.
Let's say that after reading data from any of the sources below, I want to validate it against multiple databases. Can I use Spring JdbcTemplate in a Spring Batch job implementation?
1. HibernatePagingItemReader
2. HibernateCursorItemReader
3. JpaPagingItemReader
4. JdbcPagingItemReader
5. JdbcCursorItemReader
You can use whatever mechanism you desire, including JdbcTemplate, to read the database with Spring Batch. Spring Batch as a framework doesn't impose any such restrictions.
Spring Batch provides those convenient readers (listed by you) for simple use cases, and if they don't fit your requirement, you are free to write your own readers too.
JdbcPagingItemReader itself uses a NamedParameterJdbcTemplate created on datasource that you provide.
Your requirement is not very clear to me, but I guess you can do either of two things:
1. Composite reader - write your own composite reader, using one of the Spring Batch readers as the first reader, then apply validation logic to the items it reads.
2. Validate in a processor - read your items with the Spring Batch-provided readers, then process/validate in a processor. Chaining of processors is possible in Spring Batch (Chaining ItemProcessors), so you can put different transformations in different processors and produce a final output after the chain.
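A hedged sketch of option 2, assuming a hypothetical `Order` item and a lookup against a second database via JdbcTemplate (the table and column names are invented for illustration):

```java
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.validator.ValidationException;
import org.springframework.jdbc.core.JdbcTemplate;

// One link of a processor chain: validates the item against another
// database. Throwing ValidationException fails the item (or skips it,
// if the step is configured with skip logic for that exception).
public class ExistsInOtherDbProcessor implements ItemProcessor<Order, Order> {

    private final JdbcTemplate validationJdbcTemplate; // built on the second DataSource

    public ExistsInOtherDbProcessor(JdbcTemplate validationJdbcTemplate) {
        this.validationJdbcTemplate = validationJdbcTemplate;
    }

    @Override
    public Order process(Order item) {
        Integer count = validationJdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM customers WHERE id = ?",
                Integer.class, item.getCustomerId());
        if (count == null || count == 0) {
            throw new ValidationException("Unknown customer: " + item.getCustomerId());
        }
        return item;
    }
}
```

Several such processors can then be chained with a CompositeItemProcessor by passing them to its setDelegates(...) method.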
What is the best way to import data from a few CSV files in Spring Batch? I mean, one CSV file corresponds to one table in the database.
I created one batch configuration class for each table, and every table has its own job and step.
Is there any solution to do this in a more elegant way?
There's a variety of ways you could tackle the problem, but the simplest job would look something like:
A FlatFileItemReader with a DelimitedLineTokenizer and BeanWrapperFieldSetMapper to read the file
A processor if you need to do any additional validation/filtering/transformation
A JdbcBatchItemWriter to insert/update the target table
Here's an example that includes more information around specific dependencies, config, etc. The example uses context file config rather than annotation-based, but it should be sufficient to show you the way.
A more complex solution might be a single job with a partitioned step that scans the input folder for files and, leveraging reference table/schema information, creates a reader/writer step for each file that it finds.
You also may want to consider what to do with the files once you're done... Delete them? Compress them?
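For the simple one-file-per-table case, a Java-config sketch of the step described above might look like this (Spring Batch 5 builder style; `Person`, `person.csv`, and the column names are placeholders):

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

// Hypothetical one-CSV-to-one-table step; repeat (or parameterize) per file.
public class PersonCsvStepConfig {

    @Bean
    public FlatFileItemReader<Person> personReader() {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personReader")
                .resource(new FileSystemResource("person.csv"))
                .delimited()
                .names("firstName", "lastName")  // CSV columns, in order
                .targetType(Person.class)        // BeanWrapperFieldSetMapper under the hood
                .build();
    }

    @Bean
    public JdbcBatchItemWriter<Person> personWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                .dataSource(dataSource)
                .sql("INSERT INTO person (first_name, last_name) VALUES (:firstName, :lastName)")
                .beanMapped()
                .build();
    }

    @Bean
    public Step personStep(JobRepository jobRepository, PlatformTransactionManager txManager,
                           FlatFileItemReader<Person> personReader,
                           JdbcBatchItemWriter<Person> personWriter) {
        return new StepBuilder("personStep", jobRepository)
                .<Person, Person>chunk(100, txManager)
                .reader(personReader)
                .writer(personWriter)
                .build();
    }
}
```

On Spring Batch 4.x the step would come from a StepBuilderFactory instead, but the reader/writer builders are the same.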
I have a chunk-oriented step in Spring Batch. The processor reads from table A; the writer writes to table A when the record is not present. When I configure the commit interval to 1, it works fine.
When I configure the commit interval to a higher number, I get duplicate entry exceptions, because the processor didn't see the uncommitted (dirty read) information.
My tasklet is configured with READ_UNCOMMITTED isolation:
<batch:transaction-attributes isolation="READ_UNCOMMITTED"/>
It seems this configuration was not picked up. Any ideas?
You shouldn't encounter this problem, because read/process/write are (usually) managed in this manner:
read is done in a separate connection
chunk write is done in its own transaction for skip/retry/fault management
You don't need to use READ_UNCOMMITTED; there is an easier way:
Create an ItemReader<S> (a JdbcCursorItemReader should be fine)
Process your item with an ItemProcessor<S,T>
Write your own ItemWriter<T> that writes/updates an object based on its presence in the database
If you want to reduce the items passed to your custom writer, you can filter out duplicate objects during the process phase: you can achieve this using a map to store duplicated items, as described by @jackson (only for the current chunk's items, not for all rows in the database - that check is done later by the ItemWriter)
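The insert-if-absent writer from the third bullet could look roughly like this (Spring Batch 5 `write(Chunk)` signature; the `RowItem` type, `table_a`, and its columns are assumptions):

```java
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.JdbcTemplate;

// Writes each item only if its key is not already present in table A.
// RowItem and table_a are placeholders for illustration.
public class InsertIfAbsentWriter implements ItemWriter<RowItem> {

    private final JdbcTemplate jdbcTemplate;

    public InsertIfAbsentWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void write(Chunk<? extends RowItem> chunk) {
        for (RowItem r : chunk) {
            Integer present = jdbcTemplate.queryForObject(
                    "SELECT COUNT(*) FROM table_a WHERE id = ?",
                    Integer.class, r.getId());
            if (present == null || present == 0) {
                jdbcTemplate.update(
                        "INSERT INTO table_a (id, value) VALUES (?, ?)",
                        r.getId(), r.getValue());
            }
        }
    }
}
```

On 4.x the method takes a `List<? extends RowItem>` instead. If your database supports a native upsert (e.g. MERGE), that would avoid the per-item existence check entirely.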
Dirty reads are in general a scary idea; this sounds like a design issue instead.
What you should be doing is:
1) Introduce a cache/map to store entries you plan to commit but haven't written into the DB yet.
If the entry is already in table A or in the cache, skip it.
If the entry is NOT in table A or the cache, save a copy into the cache and add it to the list of candidates to be written by the writer.
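The cache-then-skip logic can be sketched in plain Java; here the `tableA` set stands in for the real database lookup, which is an assumption made purely for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Chunk-local de-duplication: returning null mirrors how an
// ItemProcessor filters an item out of the chunk.
class DedupFilter {

    private final Set<String> seenThisChunk = new HashSet<>();
    private final Set<String> tableA; // stand-in for the "already in table A" check

    DedupFilter(Set<String> tableA) {
        this.tableA = tableA;
    }

    /** Returns null to signal "skip"; otherwise the key is a write candidate. */
    String process(String key) {
        if (tableA.contains(key) || !seenThisChunk.add(key)) {
            return null; // already in table A, or seen earlier in this chunk
        }
        return key;
    }

    /** Reset the cache once the chunk has been committed. */
    void onChunkComplete() {
        seenThisChunk.clear();
    }
}
```

In a real job the `process` logic would live in an ItemProcessor and the reset in a ChunkListener, so the cache never outlives the chunk it guards.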