I have a use case where I am using Spring Batch and writing to 3 different data sources based on the job parameters. All of this works absolutely fine, but the one problem is the metadata: Spring Batch uses the default DataSource to write its metadata. So whenever I run a job, the transactional data always goes to the correct DB, but the batch metadata always goes to the default DB.
Is it possible to selectively write the metadata to the respective database as well, based on the job parameters?
@michaelMinella, @MahmoudBenHassine, can you please help?
Example:
I have two different batches, batch-a and batch-b, running on Azure and connecting to an on-prem DB.
batch-a is deployed first and creates the metadata tables.
Let's say batch-b is deployed a few months later.
Can it use the same metadata tables that were created and used by batch-a?
If batch-a and batch-b are different jobs, you can still use the same Spring Batch metadata tables: as long as batch-a and batch-b connect to the same DB, the Spring Batch framework will automatically take care of it.
See my article here: https://prateek-ashtikar512.medium.com/spring-batch-metadata-in-different-schema-c18813a0448a
We have a Spring Batch application which inserts data into a few tables, then selects data from a few tables based on multiple business conditions and writes it to a feed file (a flat text file). When the application runs, it generates an empty feed file containing only the headers and no data. The select query, when run separately in SQL Developer, takes 2 hours and fetches the data (approximately 50 million records). We are using the following components in the application: JdbcCursorItemReader and FlatFileItemWriter. Below are the configuration details used.
maxBatchSize=100
fileFetchSize=1000
commitInterval=10000
There are no errors or exceptions while the application runs. I wanted to know if we are missing anything here, or if any Spring Batch component is not being used properly. Any pointers in this regard would be really helpful.
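For context, below is a minimal sketch of how these two components are typically wired together. The bean names, SQL, output path, and the Feed class are illustrative assumptions, not the asker's actual configuration.

```java
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

@Configuration
public class FeedStepConfig {

    // Reader: streams rows through a JDBC cursor; fetchSize mirrors the
    // "fileFetchSize=1000" value mentioned in the question.
    @Bean
    public JdbcCursorItemReader<Feed> feedReader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<Feed>()
                .name("feedReader")
                .dataSource(dataSource)
                .sql("SELECT col1, col2, col3 FROM feed_source") // hypothetical query
                .fetchSize(1000)
                .rowMapper(new BeanPropertyRowMapper<>(Feed.class)) // Feed is a hypothetical POJO
                .build();
    }

    // Writer: pipe-delimited flat file with a header line.
    @Bean
    public FlatFileItemWriter<Feed> feedWriter() {
        return new FlatFileItemWriterBuilder<Feed>()
                .name("feedWriter")
                .resource(new FileSystemResource("out/feed.txt")) // hypothetical path
                .headerCallback(writer -> writer.write("COL1|COL2|COL3"))
                .delimited()
                .delimiter("|")
                .names("col1", "col2", "col3")
                .build();
    }
}
```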
We have a scenario where I have to get data from one database and update data in another database after applying business rules.
I want to use Spring Batch + Drools + Hibernate.
Can we apply the rules in the batch, given that we have a million records at one time?
I am not an expert on Drools; I am simply trying to give some context about Spring Batch.
Spring Batch is a Read -> Process -> Write framework, and what you do with Drools is the same as what you do in the Process step of Spring Batch, i.e. you transform a read item in an ItemProcessor.
The way Spring Batch helps you handle a large number of items is chunk-oriented processing: we read N items in one go, transform them one by one in the processor, and then write a bulk of items in the writer - this way we reduce the number of DB calls.
There is further scope for performance improvement by implementing parallelism via partitioning etc., if your data can be partitioned on some criteria.
So we read items in bulk, transform them one by one, and then write in bulk to the target database. I don't think Hibernate is a good tool for bulk updates/inserts at the write step - I would go with plain JDBC.
Drools comes into the picture at the transformation step. That is going to be your custom code, and its performance will have nothing to do with Spring Batch, i.e. how you initialize sessions, pre-compile rules, etc. You should plug this code in such a way that you don't initialize the Drools session every time - that should be a one-time activity. A rough sketch of where this fits is shown below.
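To make the shape concrete, here is a minimal sketch of a chunk-oriented step with the rules engine plugged in as the ItemProcessor. It assumes the Spring Batch 5 StepBuilder API (the older StepBuilderFactory works the same way); SourceRecord, TargetRecord, and the mapping inside the processor are hypothetical placeholders.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class RuleStepConfig {

    // The processor is where a rules engine such as Drools would transform each
    // item. The rule session / knowledge base should be built once (e.g. in a
    // constructor or @PostConstruct), not once per item.
    @Bean
    public ItemProcessor<SourceRecord, TargetRecord> ruleProcessor() {
        return item -> new TargetRecord(item); // hypothetical mapping; fire rules here
    }

    // Chunk-oriented step: read N items, process them one by one, write in bulk.
    @Bean
    public Step ruleStep(JobRepository jobRepository,
                         PlatformTransactionManager transactionManager,
                         ItemReader<SourceRecord> reader,
                         ItemProcessor<SourceRecord, TargetRecord> processor,
                         ItemWriter<TargetRecord> writer) {
        return new StepBuilder("ruleStep", jobRepository)
                .<SourceRecord, TargetRecord>chunk(1000, transactionManager)
                .reader(reader)        // e.g. a JdbcCursorItemReader on the source DB
                .processor(processor)  // the Drools-backed transformation
                .writer(writer)        // e.g. a JdbcBatchItemWriter on the target DB
                .build();
    }
}
```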
In Spring Batch, when are the metadata tables dropped?
I see a drop SQL file at /org/springframework/batch/core/... but I am not sure whether some trigger from the program (the batch job itself) drops these tables, whether they need to be dropped manually, or whether it has anything to do with Batch Admin.
I suppose they are never dropped automatically; a manual action is always required, either from a Spring Batch admin or from your application (as part of your application's service layer).
The metadata tables are not created automatically, nor are they dropped automatically.
You need to do it yourself once. (This can be automated if necessary, but it need not be.)
Spring Boot does provide a facility that will automatically create the tables needed, but that is not part of the native Spring Batch functionality.
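If you want to create them yourself once, one option is to run the DDL script that ships inside the Spring Batch jar. A minimal sketch, assuming a PostgreSQL database (pick the schema-*.sql that matches yours; matching schema-drop-*.sql scripts exist as well):

```java
import javax.sql.DataSource;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.datasource.init.ResourceDatabasePopulator;

public class BatchSchemaInitializer {

    // Runs the DDL bundled with Spring Batch against the given DataSource.
    public static void createMetadataTables(DataSource dataSource) {
        ResourceDatabasePopulator populator = new ResourceDatabasePopulator();
        populator.addScript(new ClassPathResource(
                "org/springframework/batch/core/schema-postgresql.sql"));
        populator.execute(dataSource);
    }
}
```

With Spring Boot, setting spring.batch.jdbc.initialize-schema=always (spring.batch.initialize-schema on older Boot versions) achieves the same thing at startup.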
In the Spring Batch program, I am reading records from a file and comparing them with the DB to check whether the data, say column1 from the file, already exists in table1.
Table1 is fairly small and static. Is there a way I can get all the data from table1 and store it in memory in the Spring Batch code? Right now, for every record in the file, a select query hits the DB.
The file has 3 columns delimited with "|".
The file I am reading has on average 12 million records, and the job takes around 5 hours to complete.
Preload the data in memory using StepExecutionListener.beforeStep (or @BeforeStep).
With this approach, the data is loaded once before the step executes.
This also works when the step is restarted.
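A minimal sketch of this approach, where the processor itself acts as the step listener, preloads the table1 keys before the step, and then filters against the in-memory set. FileRecord, its getColumn1() accessor, the column names, and the filtering behaviour are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.Set;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.jdbc.core.JdbcTemplate;

public class ExistingKeyProcessor
        implements ItemProcessor<FileRecord, FileRecord>, StepExecutionListener {

    private final JdbcTemplate jdbcTemplate;
    private final Set<String> existingKeys = new HashSet<>();

    public ExistingKeyProcessor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Runs once before the step: load the small, static table1 into memory.
    @Override
    public void beforeStep(StepExecution stepExecution) {
        existingKeys.addAll(
                jdbcTemplate.queryForList("SELECT column1 FROM table1", String.class));
    }

    // Runs per file record: no DB call, just an in-memory lookup.
    // Returning null filters out records whose column1 already exists in table1.
    @Override
    public FileRecord process(FileRecord item) {
        return existingKeys.contains(item.getColumn1()) ? null : item;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return stepExecution.getExitStatus();
    }
}
```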
I'd use caching like a standard web app. Add service caching using Spring's caching abstractions and that should take care of it IMHO.
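For example, a small sketch using Spring's @Cacheable on a hypothetical lookup service (requires @EnableCaching and a cache manager on the classpath; the names are illustrative):

```java
import java.util.List;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
public class Table1LookupService {

    private final JdbcTemplate jdbcTemplate;

    public Table1LookupService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // The query runs once; subsequent calls are served from the "table1Keys" cache.
    @Cacheable("table1Keys")
    public List<String> loadKeys() {
        return jdbcTemplate.queryForList("SELECT column1 FROM table1", String.class);
    }
}
```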
Load the static table in JobExecutionListener.beforeJob(-), keep it in the job ExecutionContext, and you can access it across multiple steps using 'Late Binding of Job and Step Attributes'.
You may refer to section 5.4 of this link: http://docs.spring.io/spring-batch/reference/html/configureStep.html
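A minimal sketch of this idea (the listener name, key, and query are hypothetical):

```java
import java.util.List;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.jdbc.core.JdbcTemplate;

public class Table1PreloadListener implements JobExecutionListener {

    private final JdbcTemplate jdbcTemplate;

    public Table1PreloadListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Runs once before the job: load table1 and store it in the job ExecutionContext.
    @Override
    public void beforeJob(JobExecution jobExecution) {
        List<String> keys =
                jdbcTemplate.queryForList("SELECT column1 FROM table1", String.class);
        jobExecution.getExecutionContext().put("table1Keys", keys);
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // nothing to clean up
    }
}
```

A step-scoped bean in any later step can then receive the value through late binding, e.g. @Value("#{jobExecutionContext['table1Keys']}").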