Spring Batch reading from Oracle View - spring-batch

I have an external Oracle Database for which I have access only to one view.
It will take more than 40 minutes for the view to provide results for which there are around 50,000 records in resultset.
We do not have control on optimizing the oracle view.
I have to process the resultset and persist to a table in another postgres database
Is using Spring Batch recommended for my requirement?

Yes, Spring Batch will work fine for this

Related

What's the bottleneck in Spring r2dbc database connection?

I've set up a sample project using spring boot, webflux, and r2dbc. I've been able to stream rows from a postgres db table to the client.
Is there a memory bottleneck on this server implementation (for storing the results of the query)? Do the rows stream through?
PS I'm not claiming any level of quality on this, I know pagination and so on would be essential, just wondering about how the db query interacts with the reactive framework.
Pagination is not essential with R2DBC. If you have a lot of rows to process you can issue a single query instead of fetching batches. The driver uses back-pressure to allow flow control so it does not overwhelm your application. You could read here about how backpressure is applied on such queries.

Bulk data insertion and updating from one db server to another db server

I have some set of tables which has 20 million records in a postgres server. As of now i m migrating some table data from one server to another server using insert and update queries with dependent tables in functions. It takes around 2 hours even after optimizing the query. I need a solution to migrate the data faster by using mongodb or cassandra. How?
Try putting your updates and inserts into a file and then load the file. I understand Postgresql will optimise loading the file contents. It's always worked for me although I haven't used that quantity.

Spring batch: several hosts, one database

My question is simple - is spring batch made in the way that it allows batch processing hosts to be connected to the same database which contains all spring batch's tables like BATCH_JOB_EXECUTION, BATCH_JOB_STATUS etc?
I'm asking because now I have application organized in this way and I have DB deadlocks on those tables...

dropping of meta data tables by spring batch

In Spring Batch , when are meta data tables dropped?
I see drop sql file at - /org/springframework/batch/core/... but not sure if its some trigger from program ( Batch Job itself ) that drops these tables or these tables need to be dropped manually or does it have anything to do with batch admin?
I suppose they are never dropped automatically but a manual action is always required (from SB admin) or from your application (as part of your application service layer)
The meta data tables are not created automatically nor are they dropped automatically.
You need to do it yourself once. (This can be automated if necessary but need not be.)
Spring boot does provide a facility that automatically will create the tables needed, but that is not part of the native Spring Batch functionality.

Can Spring-JPA work with Postgres partitioning?

We have a Spring Boot project that uses Spring-JPA for data access. We have a couple of tables where we create/update rows once (or a few times, all within minutes). We don't update rows that are older than a day. These tables (like audit table) can get very large and we want to use Postgres' table partitioning features to help break up the data by month. So the main table always has this calendar month's data but if the query requires retrieval from previous months it would somehow read it from other partitions.
Two questions:
1) Is this a good idea for archiving older data but still leave it query-able?
2) Does Spring-JPA work with partitioned tables? Or do we have to figure out how to break up the query and do native queries and concatenate the restult set?
Thanks.
I am working with postgres partitioning with Hibernate & Spring JPA for a period of time. So I think, I can try to answer your questions.
1) Is this a good idea for archiving older data but still leave it query-able?
If you are applying indexes and not re-indexing table frequently, then partitioning of data may result faster query results.
Also you can use clustered index feature in postgres as well to fetch the data faster.
Because table with older data will not going to be updated, so clustered index will improve the performance efficiently.
2) Does Spring-JPA work with partitioned tables? Or do we have to figure out how to break up the query and do native queries and concatenate the restult set?
Spring JPA will work out of the box with partitioned table. It will retrieve the data from master as well as child tables and returns the concatenated result set.
Note : Issue with partitioned table
The only issue you will face with partitioned table is insertion in partitioned table.
Let me explain, when you partition a table, you will create a trigger over master table, and that trigger will return null. This is the key behind insertion issue in partitioned table using Spring JPA / Hibernate.
When you try to insert a row using Spring JPA or Hibernate you will face below issue
Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
To overcome this issue you need to override implementation of Batching batcher.
In hibernate you can provide the custom implementation of batcher factory using below configuration
hibernate.jdbc.factory_class=path.to.my.batcher.factory.implementation
In spring JPA you can achieve the same by custom implementation of batch builder using below configuration
hibernate.jdbc.batch.builder=path.to.my.batch.builder.implementation
References :
Custom Batch Builder/Batch in Spring-JPA
Demo Application
In addition to the #Anil Agrawal answer.
If you are using spring boot 2 then you need to define the customBatcher using the property.
spring.jpa.properties.hibernate.jdbc.batch.builder=net.xyz.jdbc.CustomBatchBuilder
You do not have to break down the JDBC query with postgres 11+.
If you execute select on the main table with plain jdbc, the DB would return the aggregated results from the partitioned tables.
In other words, the work is done by the Postgres DB, so Spring JPA will simply get the result and map it to objects as if there were no partitioning.
For having inserts work in a partitioned table you need to make sure that your partitions are already created, i think spring data will not create them for you.