How to avoid JPA persist inserting directly into the database?

This is the MySQL table:
create table Customer
(
    id int auto_increment primary key,
    birth date null,
    createdTime time null,
    updateTime datetime(6) null
);
This is my Java code:
@Before
public void init() {
    this.entityManagerFactory = Persistence.createEntityManagerFactory("jpaLearn");
    this.entityManager = this.entityManagerFactory.createEntityManager();
    this.entityTransaction = this.entityManager.getTransaction();
    this.entityTransaction.begin();
}
@Test
public void persistentTest() {
    this.entityManager.setFlushMode(FlushModeType.COMMIT); // doesn't work
    for (int i = 0; i < 1000; i++) {
        Customer customer = new Customer();
        customer.setBirth(new Date());
        customer.setCreatedTime(new Date());
        customer.setUpdateTime(new Date());
        this.entityManager.persist(customer);
    }
}
@After
public void destroy() {
    this.entityTransaction.commit();
    this.entityManager.close();
    this.entityManagerFactory.close();
}
When I was reading the JPA wikibooks, it said: "This means that when you call persist, merge, or remove the database DML INSERT, UPDATE, DELETE is not executed, until commit, or until a flush is triggered."
But while my code was running, I read the MySQL log and saw that MySQL executes the SQL on every persist call; Wireshark likewise shows a request to the database each time.
I remember that Spring Data JPA's saveAll method can send SQL statements to the database in batches. If I want to insert 10,000 records, how can I improve the efficiency?

My answer below assumes that you use Hibernate as your JPA implementation. Hibernate doesn't enable batching by default, which means it will send a separate SQL statement for each insert/update operation. (Note also that your id column is auto_increment; if the entity uses the IDENTITY generation strategy, Hibernate must execute each INSERT immediately to obtain the generated key, which would explain why changing the flush mode made no difference.)
You should set the hibernate.jdbc.batch_size property to a number bigger than 0.
It is better to set this property in your persistence.xml file where you have your JPA configuration, but since you have not posted it in the question, the init() method further below sets it directly on the EntityManagerFactory.
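For reference, a hypothetical persistence.xml fragment would look roughly like this (persistence-unit name taken from your code; the rest of the file omitted):
<persistence-unit name="jpaLearn">
    <properties>
        <!-- enable JDBC batching; 5 matches the example below -->
        <property name="hibernate.jdbc.batch_size" value="5"/>
    </properties>
</persistence-unit>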
@Before
public void init() {
    Properties properties = new Properties();
    properties.put("hibernate.jdbc.batch_size", "5");
    this.entityManagerFactory = Persistence.createEntityManagerFactory("jpaLearn", properties);
    this.entityManager = this.entityManagerFactory.createEntityManager();
    this.entityTransaction = this.entityManager.getTransaction();
    this.entityTransaction.begin();
}
Then by observing your logs you should see that the Customer records are persisted in the database in batches of 5.
For further reading please check: https://www.baeldung.com/jpa-hibernate-batch-insert-update
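For the 10,000-record case you mention, it is also common practice (covered in the Baeldung article above) to flush and clear the persistence context every batch_size entities so that it does not grow without bound. A rough sketch along the lines of your test (note: if your id uses the IDENTITY strategy, Hibernate cannot batch the INSERT statements themselves, so you would also need a SEQUENCE or TABLE generator to see real batching):
int batchSize = 5; // keep in sync with hibernate.jdbc.batch_size
for (int i = 0; i < 10_000; i++) {
    Customer customer = new Customer();
    customer.setBirth(new Date());
    customer.setCreatedTime(new Date());
    customer.setUpdateTime(new Date());
    this.entityManager.persist(customer);
    if (i > 0 && i % batchSize == 0) {
        this.entityManager.flush();  // push the current batch to the database
        this.entityManager.clear();  // detach the flushed entities
    }
}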

You should enable batching for Hibernate:
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_size=20
and also enable batch rewriting in the JDBC driver by appending rewriteBatchedStatements=true (MySQL Connector/J) or reWriteBatchedInserts=true (PostgreSQL) to the end of your connection string.
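For example, with MySQL Connector/J (hypothetical host and schema names):
spring.datasource.url=jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true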

Related

Trying to persist data if no duplicate record is found in the collection

I'm using Quarkus Reactive and I'm trying to insert data into the DB if a duplicate record is not found there. Here's my code:
@ApplicationScoped
public class MovieRepositories implements IMovieRepositories {

    @Inject
    MovieMapper mapper;

    @Override
    public Uni<Object> create(MovieEntity entity) {
        final var data = mapper.toEntity(entity);
        final var dataMovie = MovieEntity
                .find("name=?1 and deleted_at is null", entity.getName());
        return dataMovie.firstResult().onItem().ifNotNull()
                .failWith(new ValidationException("Movie already exist"))
                .call(c -> MovieEntity.persist(data))
                .chain(d -> Uni.createFrom().item(data.getId()));
    }
}
However, this code terminates after a failure on this line:
.failWith(new ValidationException("Movie already exist"))
and persist is never executed.
How can I make this code insert the data if no duplicate record is found?
Thanks in advance
Solved by adding a return value to the incoming-message method that calls this insert. The problem was due to an uncommitted transaction to MongoDB; simply adding the return value committed the transaction.
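For reference, one way to express "insert only when no duplicate exists" in Mutiny is to switch to the persist Uni when the lookup yields null, instead of calling persist unconditionally. A rough, untested sketch reusing the names from the question:
@Override
public Uni<Object> create(MovieEntity entity) {
    final var data = mapper.toEntity(entity);
    return MovieEntity
            .find("name=?1 and deleted_at is null", entity.getName())
            .firstResult()
            // a duplicate was found: propagate a validation failure
            .onItem().ifNotNull().failWith(new ValidationException("Movie already exist"))
            // no duplicate (the item is null): switch to the persist operation
            .onItem().ifNull().switchTo(() -> MovieEntity.persist(data).replaceWith(data))
            // finally emit the new id
            .chain(d -> Uni.createFrom().item(data.getId()));
}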

Spring Batch JdbcBatchItemWriter insert is very slow with MySQL

I'm using a chunk step with a reader and writer. I am reading data from Hive with a 50,000 chunk size and inserting into MySQL with the same 50,000-record commit interval.
@Bean
public JdbcBatchItemWriter<Identity> writer(DataSource mysqlDataSource) {
    return new JdbcBatchItemWriterBuilder<Identity>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql(insertSql)
            .dataSource(mysqlDataSource)
            .build();
}
When I start the data load, the inserts into MySQL commit very slowly: 100,000 records take more than an hour to load, whereas the same loader with Gemfire loads 5 million records in 30 minutes.
It seems like it inserts one by one instead of in batches (loading 1,500 records, then 4,000, then ... etc.). Has anyone faced the same issue?
Since you are using BeanPropertyItemSqlParameterSourceProvider, a lot of reflection is involved in setting the variables on the prepared statement, and this increases the time.
If speed is your high priority, try implementing your own ItemWriter as given below and use a prepared-statement batch to execute the updates.
@Component
public class CustomWriter implements ItemWriter<Identity> {
    // your SQL statement here
    private static final String SQL = "INSERT INTO table_name (column1, column2, column3, ...) VALUES (?,?,?,?);";

    @Autowired
    private DataSource dataSource;

    @Override
    public void write(List<? extends Identity> list) throws Exception {
        // try-with-resources releases the connection and statement even on failure
        try (Connection connection = dataSource.getConnection();
             PreparedStatement preparedStatement = connection.prepareStatement(SQL)) {
            for (Identity identity : list) {
                // Set the variables (placeholder getters from the original post; adjust to your entity)
                preparedStatement.setInt(1, identity.getMxx());
                preparedStatement.setString(2, identity.getMyx());
                preparedStatement.setString(3, identity.getMxt());
                preparedStatement.setInt(4, identity.getMxt());
                // Add it to the batch
                preparedStatement.addBatch();
            }
            // Execute the whole chunk as one JDBC batch
            preparedStatement.executeBatch();
        }
    }
}
Note: this is rough code, so exception handling is not done properly; you can work on the same. I think this will improve your writing speed very much.
Try adding ";useBulkCopyForBatchInsert=true" to your connection URL:
Connection con = DriverManager.getConnection(connectionUrl + ";useBulkCopyForBatchInsert=true");
Source: https://learn.microsoft.com/en-us/sql/connect/jdbc/use-bulk-copy-api-batch-insert-operation?view=sql-server-ver15
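Note that useBulkCopyForBatchInsert belongs to the Microsoft SQL Server JDBC driver. Since this question targets MySQL, the analogous MySQL Connector/J option is rewriteBatchedStatements (hypothetical URL):
// MySQL Connector/J: rewrites JDBC batches into multi-row INSERT statements
Connection con = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true", user, password);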

How does ADO.NET know that a SQL Server concurrency violation has occurred?

I don't understand how ADO.NET recognizes a concurrency violation unless it's doing something beyond what I'm telling it to do, inside its "black box".
My update query in SQL Server 2000 does something like the following example, which is simplified; if the rowversion passed to the stored proc by the client doesn't match the rowversion in the database, the where-clause will fail, and no rows will be updated:
create proc UpdateFoo
    @rowversion timestamp OUTPUT,
    @id int,
    @foodescription varchar(255)
as
    UPDATE FOO set description = @foodescription
    where id = @id and rowversion = @rowversion;
    if @@ROWCOUNT = 1
        select @rowversion = rowversion from foo where id = @id;
I create a SqlCommand object and populate the parameters and assign the command object to the SqlDataAdapter's UpdateCommand property. Then I invoke the data adapter's Update method.
There should indeed be a concurrency error because I deliberately change the database row in order to force a new rowversion. But how does ADO.NET know this? Is it doing something more than executing the command?
In the RowUpdated event of the SqlDataAdapter there will be a concurrency error:
mySqlDataAdapter.RowUpdated += (sender, evt) =>
{
    if ((evt.Status == UpdateStatus.Continue) && (evt.StatementType == StatementType.Update))
    {
        // update succeeded
    }
    else
    {
        // update failed, check evt.Errors
    }
};
Is ADO.NET comparing the rowversions? Is it looking at @@ROWCOUNT?

Reset Embedded H2 database periodically

I'm setting up a new version of my application on a demo server and would love to find a way of resetting the database daily. I guess I could always have a cron job executing drop and create queries, but I'm looking for a cleaner approach. I tried using a special persistence unit with a drop-create approach, but it doesn't work, as the system connects to and disconnects from the server frequently (on demand).
Is there a better approach?
H2 supports a special SQL statement to drop all objects:
DROP ALL OBJECTS [DELETE FILES]
If you don't want to drop all tables, you might want to use truncate table:
TRUNCATE TABLE
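A minimal sketch of running such a reset over plain JDBC (hypothetical in-memory URL and credentials):
try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo", "sa", "");
     Statement st = conn.createStatement()) {
    // drops every table, view, sequence, etc. in the database
    st.execute("DROP ALL OBJECTS");
}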
As this response is the first Google result for "reset H2 database", I post my solution below.
After each JUnit @Test:
Disable integrity constraints
List all tables in the (default) PUBLIC schema
Truncate all tables
List all sequences in the (default) PUBLIC schema
Reset all sequences
Re-enable the constraints
@After
public void tearDown() {
    try {
        clearDatabase();
    } catch (Exception e) {
        Fail.fail(e.getMessage());
    }
}
public void clearDatabase() throws SQLException {
    Connection c = datasource.getConnection();
    Statement s = c.createStatement();

    // Disable FK
    s.execute("SET REFERENTIAL_INTEGRITY FALSE");

    // Find all tables and truncate them
    Set<String> tables = new HashSet<String>();
    ResultSet rs = s.executeQuery("SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA='PUBLIC'");
    while (rs.next()) {
        tables.add(rs.getString(1));
    }
    rs.close();
    for (String table : tables) {
        s.executeUpdate("TRUNCATE TABLE " + table);
    }

    // Idem for sequences
    Set<String> sequences = new HashSet<String>();
    rs = s.executeQuery("SELECT SEQUENCE_NAME FROM INFORMATION_SCHEMA.SEQUENCES WHERE SEQUENCE_SCHEMA='PUBLIC'");
    while (rs.next()) {
        sequences.add(rs.getString(1));
    }
    rs.close();
    for (String seq : sequences) {
        s.executeUpdate("ALTER SEQUENCE " + seq + " RESTART WITH 1");
    }

    // Enable FK
    s.execute("SET REFERENTIAL_INTEGRITY TRUE");
    s.close();
    c.close();
}
The other solution would be to recreate the database at the beginning of each test, but that might take too long with a big DB.
There is special support in Spring for database manipulation within unit tests:
@Sql(scripts = "classpath:drop_all.sql", executionPhase = Sql.ExecutionPhase.AFTER_TEST_METHOD)
@Sql(scripts = {"classpath:create.sql", "classpath:init.sql"}, executionPhase = Sql.ExecutionPhase.BEFORE_TEST_METHOD)
public class UnitTest {}
In this example we execute the drop_all.sql script (where we drop all required tables) after every test method, and we execute the create.sql script (where we create all required tables) plus the init.sql script (where we initialize all required tables) before each test method.
The command: SHUTDOWN
You can execute it using:
RunScript.execute(jdbc_url, user, password, "classpath:shutdown.sql", "UTF8", false);
I run it every time the suite of tests has finished, using @AfterClass.
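A minimal sketch of that teardown (assuming a shutdown.sql on the classpath containing just the SHUTDOWN statement, and hypothetical connection details):
@AfterClass
public static void shutdownDatabase() throws SQLException {
    // closes the in-memory database so the next suite starts fresh
    RunScript.execute("jdbc:h2:mem:test", "sa", "", "classpath:shutdown.sql", "UTF8", false);
}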
If you are using Spring Boot, see this Stack Overflow question.
Set up your data source. I don't have any special close-on-exit configuration.
datasource:
  driverClassName: org.h2.Driver
  url: "jdbc:h2:mem:psptrx"
Spring Boot's @DirtiesContext annotation:
@DirtiesContext(classMode = DirtiesContext.ClassMode.BEFORE_EACH_TEST_METHOD)
Use @Before to initialise on each test case.
@DirtiesContext will cause the H2 context to be dropped between each test.
You can write the following in application.properties to reset the tables that are mapped by JPA on startup:
spring.jpa.hibernate.ddl-auto=create

EF Code First - Recreate Database If Model Changes

I'm currently working on a project which is using EF Code First with POCOs. I have 5 POCOs that so far depend on the POCO "User".
The POCO "User" should refer to my already existing MemberShip table "aspnet_Users" (which I map it to in the OnModelCreating method of the DbContext).
The problem is that I want to take advantage of the "Recreate Database If Model Changes" feature that Scott Gu shows at: http://weblogs.asp.net/scottgu/archive/2010/07/16/code-first-development-with-entity-framework-4.aspx - What the feature basically does is recreate the database as soon as it sees any changes in my POCOs. What I want it to do is recreate the database but somehow NOT delete the whole database, so that aspnet_Users is still alive. However, it seems impossible, as it either makes a whole new database or replaces the current one.
So my question is: am I doomed to define my database tables by hand, or can I somehow merge my POCOs into my current database and still make use of the feature without wiping it all?
As of EF Code First in CTP5, this is not possible. Code First will either drop and create your database or not touch it at all. I think in your case you should manually create your full database and then try to come up with an object model that matches the DB.
That said, the EF team is actively working on the feature that you are looking for: altering the database instead of recreating it:
Code First Database Evolution (aka Migrations)
I was just able to do this in EF 4.1 with the following considerations:
CodeFirst
DropCreateDatabaseAlways
keeping the same connection string and database name
The database is still deleted and recreated - it has to be for the schema to reflect your model changes - but your data remains intact.
Here's how: you read your database into your in-memory POCO objects, and then, after the POCO objects have successfully made it into memory, you let EF drop and recreate the database. Here is an example:
public class NorthwindDbContextInitializer : DropCreateDatabaseAlways<NorthwindDbContext> {
    /// <summary>
    /// Connection from which to read the data to insert into the new database.
    /// Not the same connection instance as the DbContext, but it may have the same connection string.
    /// </summary>
    DbConnection connection;
    Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable> map;

    public NorthwindDbContextInitializer(DbConnection connection, Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable> map = null) {
        this.connection = connection;
        this.map = map ?? ReadDataIntoMemory();
    }

    // Read the data into memory BEFORE the database is dropped.
    Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable> ReadDataIntoMemory() {
        Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable> map = new Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable>();
        switch (connection.State) {
            case System.Data.ConnectionState.Closed:
                connection.Open();
                break;
        }
        using (this.connection) {
            var metaquery = from p in typeof(NorthwindDbContext).GetProperties().Where(p => p.PropertyType.IsGenericType)
                            let elementType = p.PropertyType.GetGenericArguments()[0]
                            let dbsetType = typeof(DbSet<>).MakeGenericType(elementType)
                            where dbsetType.IsAssignableFrom(p.PropertyType)
                            select new Tuple<PropertyInfo, Type>(p, elementType);
            foreach (var tuple in metaquery) {
                map.Add(tuple, ExecuteReader(tuple));
            }
            this.connection.Close();
            // Call Delete explicitly; if you let the framework do it implicitly,
            // it will complain that the connection is in use.
            Database.Delete(this.connection);
        }
        return map;
    }

    protected override void Seed(NorthwindDbContext context) {
        foreach (var keyvalue in this.map) {
            foreach (var obj in (System.Collections.IEnumerable)keyvalue.Value) {
                PropertyInfo p = keyvalue.Key.Item1;
                dynamic dbset = p.GetValue(context, null);
                dbset.Add(((dynamic)obj));
            }
        }
        context.SaveChanges();
        base.Seed(context);
    }

    System.Collections.IEnumerable ExecuteReader(Tuple<PropertyInfo, Type> tuple) {
        DbCommand cmd = this.connection.CreateCommand();
        cmd.CommandText = string.Format("select * from [dbo].[{0}]", tuple.Item2.Name);
        DbDataReader reader = cmd.ExecuteReader();
        using (reader) {
            ConstructorInfo ctor = typeof(Test.ObjectReader<>).MakeGenericType(tuple.Item2)
                .GetConstructors()[0];
            ParameterExpression p = Expression.Parameter(typeof(DbDataReader));
            LambdaExpression newlambda = Expression.Lambda(Expression.New(ctor, p), p);
            System.Collections.IEnumerable objreader = (System.Collections.IEnumerable)newlambda.Compile().DynamicInvoke(reader);
            MethodCallExpression toArray = Expression.Call(typeof(Enumerable),
                "ToArray",
                new Type[] { tuple.Item2 },
                Expression.Constant(objreader));
            LambdaExpression lambda = Expression.Lambda(toArray, Expression.Parameter(typeof(IEnumerable<>).MakeGenericType(tuple.Item2)));
            var array = (System.Collections.IEnumerable)lambda.Compile().DynamicInvoke(new object[] { objreader });
            return array;
        }
    }
}
This example relies on a ObjectReader class which you can find here if you need it.
I wouldn't bother with the blog articles, read the documentation.
Finally, I would still suggest you always back up your database before running the initialization. (E.g. if the Seed method throws an exception, all your data is in memory, so you risk losing it once the program terminates.) A model change isn't exactly an afterthought action anyway, so be sure to back your data up.
One thing you might consider is to use a 'disconnected' foreign key. You can leave the ASPNETDB alone and just reference the user in your DB using the User key (guid). You can access the logged in user as follows:
MembershipUser currentUser = Membership.GetUser(User.Identity.Name, true /* userIsOnline */);
And then use the User's key as a FK in your DB:
Guid UserId = (Guid) currentUser.ProviderUserKey ;
This approach decouples your DB from the ASPNETDB and associated provider architecturally. However, operationally the data will of course be loosely connected, since the IDs will be in each DB. Note also there will be no referential constraints, which may or may not be an issue for you.