Dropping of metadata tables by Spring Batch

In Spring Batch, when are the metadata tables dropped?
I see a drop SQL file at /org/springframework/batch/core/..., but I'm not sure whether something in the program (the batch job itself) triggers the drop, whether these tables need to be dropped manually, or whether it has anything to do with Spring Batch Admin.

I suppose they are never dropped automatically; a manual action is always required, either from Spring Batch Admin or from your application (as part of your application service layer).

The metadata tables are neither created nor dropped automatically.
You need to do it yourself once. (This can be automated if necessary, but need not be.)
Spring Boot does provide a facility that creates the required tables automatically, but that is not part of native Spring Batch functionality.

Related

Can the same metadata tables be used for multiple Spring Batch jobs?

Example:
I have two different batches, batch-a and batch-b, running on Azure and connecting to an on-prem DB.
batch-a is deployed first and creates the metadata tables.
Let's say batch-b is deployed a few months later.
Can it use the same metadata tables that were created and used by batch-a?
If batch-a and batch-b are different jobs, then yes, they can use the same Spring Batch metadata tables, provided both connect to the same DB. The framework takes care of it automatically, since each job instance is recorded under its job name.
See my article here: https://prateek-ashtikar512.medium.com/spring-batch-metadata-in-different-schema-c18813a0448a
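As a rough illustration of why two jobs can coexist in one set of tables, here is a toy in-memory model, not Spring Batch code. It assumes that the job-instance table treats the pair (job name, job key) as unique, which is how the stock BATCH_JOB_INSTANCE schema is defined. The class and method names (`SharedJobInstanceTable`, `findOrCreate`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a shared job-instance table; hypothetical, not a Spring Batch class. */
class SharedJobInstanceTable {

    // One entry per (jobName, jobKey) pair, mimicking a unique constraint on both columns.
    private final Map<String, Long> instanceIds = new HashMap<>();
    private long nextId = 1;

    /** Returns the existing instance id for (jobName, jobKey), or registers a new one. */
    long findOrCreate(String jobName, String jobKey) {
        String uniqueKey = jobName + "::" + jobKey;
        Long existing = instanceIds.get(uniqueKey);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        instanceIds.put(uniqueKey, id);
        return id;
    }

    /** Number of distinct job instances currently stored. */
    int rowCount() {
        return instanceIds.size();
    }
}
```

In this model, batch-a and batch-b with identical parameters still get separate rows because their job names differ, while re-registering batch-a with the same parameters returns the row it already has.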

Can we write the spring batch metadata to multiple datasources?

I have a use case where I am using Spring Batch and writing to 3 different data sources based on the job parameters. This mechanism works fine; the only problem is the metadata. Spring Batch uses the default DataSource to write the metadata, so whenever I write the data for a job, the transactional data goes to the correct DB but the batch metadata always goes to the default DB.
Is it possible to selectively write the metadata to the respective databases as well, based on the job parameters?
@michaelMinella, @MahmoudBenHassine, can you please help?

Is it possible to configure Hibernate for flush only but never commit (a kind of commit simulation)?

I need to migrate from an old PostgreSQL database with an old schema (58 tables) to a new database with a new schema (40 tables). The schemas are completely different.
It is not a simple copy-and-paste migration, but rather a copy-transform-paste.
I decided to write a batch using Spring Batch, Spring Data and JPA. So I have two dataSources and a chainedTransaction. My Spring config consists mainly of chunk-oriented steps with a JpaPagingItemReader and an ItemWriterAdapter.
For performance, I also configured a Partitioner, which lets me partition my source tables into several sub-tables, and a chunkSize = 500000.
Everything works smoothly, but given the size of my old tables, it takes a week to migrate all the data.
I would like to run a test in which my batch runs without committing: Hibernate would write all the generated SQL statements to a ".sql" file but not commit the data to the database.
This would let me see whether the commit is costly in execution time.
Is it possible to configure Hibernate to flush only but never commit? A kind of commit simulation?
Thanks
Usually, the costly part is foreign key and unique key checks as well as index maintenance, but since you don't say how you fetch the data, it could very well be that you are accessing your data inefficiently.
In general, I would recommend creating a dump with pg_dump, restoring it, and then trying to do the migration in an SQL-only way. That way, no data has to travel over the network; it stays on the database machine, which is generally much more efficient.

Spring Batch 2.2.0 not writing data to file

We have a Spring Batch application that inserts data into a few tables, then selects data from a few tables based on multiple business conditions and writes it to a feed file (a flat text file). When run, the application generates a feed file containing only headers and no data. The select query, when run separately in SQL Developer, takes 2 hours and fetches approximately 50 million records. We are using JdbcCursorItemReader and FlatFileItemWriter in the application. Below are the configuration details used.
maxBatchSize=100
fileFetchSize=1000
commitInterval=10000
There are no errors or exceptions while the application runs. We would like to know whether we are missing anything here or whether any Spring Batch components are not being used properly. Any pointers would be really helpful.

Getting data from the DB in Spring Batch and storing it in memory

In my Spring Batch program, I read records from a file and check against the DB whether the data (say, column1 from the file) already exists in table1.
Table1 is fairly small and static. Is there a way to fetch all the data from table1 and keep it in memory in the Spring Batch code? Right now, the select query hits the DB for every record in the file.
The file has 3 columns delimited with "|".
The file I am reading has on average 12 million records, and the job takes around 5 hours to complete.
Preload in memory using StepExecutionListener.beforeStep (or @BeforeStep).
With this trick, the data is loaded once before step execution.
This also works when the step is restarted.
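As a rough illustration of the preload approach above, here is a minimal plain-Java sketch with no Spring classes. It assumes the values of table1's column have already been fetched once (in a real job that load would happen in beforeStep) and that each file line's first "|"-delimited field is the value to check. The class and method names (`PreloadedLookup`, `existsInTable`, `newRecords`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java sketch of the preload idea; hypothetical, not a Spring Batch class. */
class PreloadedLookup {

    // Keys from the small static table, loaded once before any record is processed.
    private final Set<String> knownKeys = new HashSet<>();

    PreloadedLookup(Iterable<String> table1Column1Values) {
        for (String value : table1Column1Values) {
            knownKeys.add(value);
        }
    }

    /** Returns true if the first "|"-delimited field of the line is already in table1. */
    boolean existsInTable(String fileLine) {
        // The pipe is escaped because String.split takes a regular expression.
        String[] fields = fileLine.split("\\|", -1);
        return knownKeys.contains(fields[0]);
    }

    /** Keeps only the lines whose first column is not yet in the table. */
    List<String> newRecords(List<String> fileLines) {
        List<String> result = new ArrayList<>();
        for (String line : fileLines) {
            if (!existsInTable(line)) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Each lookup is then a hash-set membership test in memory instead of a select query per record, which is where the saving would come from for a file with millions of lines.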
I'd use caching like a standard web app. Add service caching using Spring's caching abstractions and that should take care of it IMHO.
Load the static table in JobExecutionListener.beforeJob(..) and keep it in the job context; you can then access it from multiple steps using 'Late Binding of Job and Step Attributes'.
You may refer to section 5.4 of this link: http://docs.spring.io/spring-batch/reference/html/configureStep.html