Read from database and delete from database through Spring Batch - spring-data-jpa

I have to read data from a relational database and delete all the data present in the table through Spring Batch.
Is there any way to do so?
I have tried reading through a JdbcCursorItemReader and issuing the delete query in an ItemWriter, and also in an ItemProcessor, but failed to do it.
Being new makes it difficult for me to get this right.
I'm not getting any error in either case, but the data is not getting deleted.
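For reference, here is a minimal sketch of the approach described above (a JdbcCursorItemReader feeding a JdbcBatchItemWriter that issues DELETE statements), assuming Spring Batch 4.x, the builder-style configuration, and a hypothetical PERSON table with an ID column. The deletes only become visible once each chunk transaction commits, which is worth checking when the data appears not to be deleted.

    import javax.sql.DataSource;

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.database.JdbcBatchItemWriter;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
    import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;

    @Configuration
    @EnableBatchProcessing
    public class DeleteJobConfig {

        // Reads only the primary keys of the rows that should be removed.
        @Bean
        public JdbcCursorItemReader<Long> personIdReader(DataSource dataSource) {
            return new JdbcCursorItemReaderBuilder<Long>()
                    .name("personIdReader")
                    .dataSource(dataSource)
                    .sql("SELECT id FROM person")
                    .rowMapper((rs, rowNum) -> rs.getLong("id"))
                    .build();
        }

        // The "writer" issues one DELETE per item; the statements run inside the
        // chunk transaction, so nothing is visible until the chunk commits.
        @Bean
        public JdbcBatchItemWriter<Long> personDeleteWriter(DataSource dataSource) {
            return new JdbcBatchItemWriterBuilder<Long>()
                    .dataSource(dataSource)
                    .sql("DELETE FROM person WHERE id = :id")
                    .itemSqlParameterSourceProvider(id -> new MapSqlParameterSource("id", id))
                    .build();
        }

        @Bean
        public Step deleteStep(StepBuilderFactory steps,
                               JdbcCursorItemReader<Long> personIdReader,
                               JdbcBatchItemWriter<Long> personDeleteWriter) {
            return steps.get("deleteStep")
                    .<Long, Long>chunk(100)
                    .reader(personIdReader)
                    .writer(personDeleteWriter)
                    .build();
        }

        @Bean
        public Job deleteJob(JobBuilderFactory jobs, Step deleteStep) {
            return jobs.get("deleteJob").start(deleteStep).build();
        }
    }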

Related

Spring Batch is blocking insertion of data in other tables

I am using Postgres as my SQL database. My Spring Boot application uses Spring Batch for processing and insertion of data. I am auditing my code flow: for example, if a 3rd-party API that I call fails, I audit this failure event. This piece of code is in my Spring Batch writer. I see logs of my audit DTO class getting created, however I don't see the data in the audit table. If I move the auditing code outside the Spring Batch writer, it works. What should be done so that my audit table insertion code in the Spring Batch writer works?
More details would be needed to be sure, but I assume your writer writes to the 3rd party API and you write the audit log to the same DataSource that you use for the Spring Batch meta data.
Every write of a chunk that Spring Batch does in a writer is wrapped in a transaction. Such a transaction will be rolled back if you throw an exception in the writer.
You need to write the audit log outside of the transaction created by Spring Batch, for example by using Spring transaction management and starting a new transaction with propagation level REQUIRES_NEW.
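A minimal sketch of that suggestion, assuming a hypothetical AUDIT_EVENT table and a JdbcTemplate on the same DataSource (the persistence API used here is an assumption; the important part is the propagation level):

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Propagation;
    import org.springframework.transaction.annotation.Transactional;

    @Service
    public class AuditService {

        private final JdbcTemplate jdbcTemplate;

        public AuditService(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        // REQUIRES_NEW suspends the chunk transaction opened by Spring Batch and
        // commits the audit row independently, so it is kept even if the chunk is
        // rolled back afterwards. Note that this method must be called through the
        // Spring proxy (i.e. from another bean such as the ItemWriter), not via
        // self-invocation, for the annotation to take effect.
        @Transactional(propagation = Propagation.REQUIRES_NEW)
        public void recordFailure(String apiName, String errorMessage) {
            jdbcTemplate.update(
                    "INSERT INTO audit_event (api_name, error_message) VALUES (?, ?)",
                    apiName, errorMessage);
        }
    }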

Spring batch with MongoDB and transactions

I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored. The relational DB still uses a DataSourceTransactionManager.
However, I don't think the Mongo writes are done within an active transaction with rollbacks. Here is the excerpt from the official Spring Batch documentation on MongoItemWriter:
A ItemWriter implementation that writes to a MongoDB store using an implementation of Spring Data's MongoOperations. Since MongoDB is not a transactional store, a best effort is made to persist written data at the last moment, yet still honor job status contracts. No attempt to roll back is made if an error occurs during writing.
However, this is no longer the case; MongoDB introduced ACID transactions in version 4.
How do I go about adding transactions to my writes? I could use @Transactional on my service methods when I use an ItemWriterAdapter. But I still don't know what to do with MongoItemWriter... What is the right configuration here? Thank you.
I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored.
I invite you to take a look at the following posts to understand the implications of this design choice:
How to java-configure separate datasources for spring batch data and business data? Should I even do it?
How does Spring Batch transaction management work?
In your case, you have a distributed transaction across two data sources:
SQL datasource for the job repository, which is managed by a DataSourceTransactionManager
MongoDB for your step (using the MongoItemWriter), which is managed by a MongoTransactionManager
If you want technical meta-data and business data to be committed/rolled back in the scope of the same distributed transaction, you need to use a JtaTransactionManager that coordinates the DataSourceTransactionManager and MongoTransactionManager. You can find some resources about the matter here: https://stackoverflow.com/a/56547839/5019386.
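If it helps, here is a sketch of where such a transaction manager is plugged into the step, assuming Spring Batch 4.x and placeholder reader/writer beans; the actual JTA provider setup is not shown and the bean names are assumptions:

    import org.bson.Document;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.data.MongoItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.transaction.PlatformTransactionManager;

    @Configuration
    public class MongoStepConfig {

        // "jtaTransactionManager" is assumed to be a JtaTransactionManager bean
        // backed by a JTA provider (e.g. Atomikos) with resources that can enlist
        // in the distributed transaction; configuring that provider is out of the
        // scope of this sketch.
        @Bean
        public Step mongoStep(StepBuilderFactory steps,
                              ItemReader<Document> mongoItemReader,
                              MongoItemWriter<Document> mongoItemWriter,
                              PlatformTransactionManager jtaTransactionManager) {
            return steps.get("mongoStep")
                    .<Document, Document>chunk(100)
                    .reader(mongoItemReader)
                    .writer(mongoItemWriter)
                    // the chunk transaction now spans both the SQL job repository
                    // and the MongoDB writes
                    .transactionManager(jtaTransactionManager)
                    .build();
        }
    }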
BTW, there is a feature request to use MongoDB as a job repository in Spring Batch: https://github.com/spring-projects/spring-batch/issues/877. When this is implemented, you could store both business data and technical meta-data in the same datasource (so no need for a distributed transaction anymore) and you would be able to use the same MongoTransactionManager for both the job repository and your step.

Is it possible to configure Hibernate for flush only but never commit ( A kind of commit simulation)

I need to migrate from an old PostgreSQL database with an old schema (58 tables) to a new database with a new schema (40 tables). The two schemas are completely different.
It is not a simple migration (copy and paste), but rather a copy-transform-paste.
I decided to write a batch and use Spring Batch, Spring Data and JPA. So I have two DataSources and a chained transaction manager. My Spring configuration mainly consists of a chunk-oriented step with a JpaPagingItemReader and an ItemWriterAdapter.
For performance needs, I also configured a Partitioner, which allows me to partition my source tables into several sub-tables, and a chunk size of 500000.
Everything works smoothly, but considering the size of my old tables it takes me a week to migrate all the data.
I would like to run a test that consists of running my batch without committing: Hibernate would just generate all the SQL statements into a ".sql" file, but not commit the data to the database.
This would allow me to see whether the commit is costly in execution time.
Is it possible to configure Hibernate to flush only but never commit? A kind of commit simulation?
Thanks
Usually, the costly part is foreign key and unique key checks as well as index maintenance, but since you don't write how you fetch data, it could very well be the case that you are accessing your data in an inefficient manner.
In general, I would recommend creating a dump with pg_dump, restoring that, and then trying to do the migration in an SQL-only way. This way, no data has to flow around but can stay on the machine, which is generally much more efficient.

JPA locking database record

I am writing a Spring Boot service using JPA to interact with a database. What I want to do is to lock a database record as soon as it is read by JPA, i.e. once it has been read, no other thread will be able to read it until the first has updated the record.
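One common way to get close to this with Spring Data JPA is a pessimistic write lock acquired at read time; a minimal sketch, assuming the javax.persistence namespace and hypothetical Account/AccountRepository/AccountService types (each of which would normally live in its own file):

    import java.math.BigDecimal;
    import java.util.Optional;

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.LockModeType;

    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.data.jpa.repository.Lock;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    @Entity
    class Account {
        @Id
        private Long id;
        private BigDecimal balance;
        // getters and setters omitted
    }

    interface AccountRepository extends JpaRepository<Account, Long> {

        // PESSIMISTIC_WRITE typically translates to SELECT ... FOR UPDATE, so the
        // row is locked as soon as it is read.
        @Lock(LockModeType.PESSIMISTIC_WRITE)
        Optional<Account> findWithLockById(Long id);
    }

    @Service
    class AccountService {

        private final AccountRepository repository;

        AccountService(AccountRepository repository) {
            this.repository = repository;
        }

        // Must run inside a transaction: the lock is held until this method
        // commits, and other transactions requesting the same lock will block
        // (or time out) until then. Plain non-locking reads may still see the
        // row on MVCC databases.
        @Transactional
        public void updateExclusively(Long id) {
            Account account = repository.findWithLockById(id)
                    .orElseThrow(() -> new IllegalStateException("not found"));
            // ... modify the entity; changes are flushed and the lock released on commit
            repository.save(account);
        }
    }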

How Entity Framework works in case of batch insert and update data

I use the MS Data Access Application Block for interaction with the database and I have seen that its performance is good. When I want to add 100 or more records, I send those 100 records in XML format to a stored procedure and from there I do a bulk insert. Now I have to use Entity Framework. I have never used EF before, so I am not familiar with EF and how it works.
In another forum I asked a question like "How Entity Framework works in case of batch insert and update data" and got this answer:
From my experience, EF does not support batch insert or batch update.
What it does is that it will issue an individual insert or update statement, but it will wrap all of them in a transaction if you add all of your changes to the DbContext before calling SaveChanges().
Is it true that EF cannot handle batch insert/update? In the case of a batch insert/update, does EF insert data in a loop? If there are 100 records which we need to commit at once, can EF not do it?
If that is not right, then please guide me on how one should write code so that EF can do batch insert/update. Also tell me how to see what kind of SQL it will generate.
If possible, please guide me with sample code for batch insert/update with EF. Also tell me which version of EF supports true batch operations. Thanks.
Yes, EF is not a bulk load/update tool.
You can of course put a few K entries and commit (SaveChanges).
But when you have serious volumes and speed is critical, use SQL.
See Batch update/delete EF5 as an example on the topic.