Portable JPA Batch / Bulk Insert

I just jumped on a feature written by someone else that seems slightly inefficient, but my knowledge of JPA isn't good enough to find a portable solution that's not Hibernate-specific.
In a nutshell, the Dao method called within a loop to insert each one of the new entities does an "entityManager.merge(object);".
Isn't there a way defined in the JPA spec to pass a list of entities to the Dao method and do a bulk / batch insert instead of calling merge for every single object?
Plus, since the Dao method is annotated with "@Transactional", I'm wondering if every single merge call is happening within its own transaction... which would not help performance.
Any idea?

No, there is no batch insert operation in vanilla JPA.
Yes, each insert will be done within its own transaction. The @Transactional attribute (with no qualifiers) means a propagation level of REQUIRED (create a transaction if it doesn't exist already). Assuming you have:
public class Dao {
    @Transactional
    public void insert(SomeEntity entity) {
        ...
    }
}
you do this:
public class Batch {
    private Dao dao;

    @Transactional
    public void insert(List<SomeEntity> entities) {
        for (SomeEntity entity : entities) {
            dao.insert(entity);
        }
    }

    public void setDao(Dao dao) {
        this.dao = dao;
    }
}
That way the entire group of inserts gets wrapped in a single transaction. If you're talking about a very large number of inserts, you may want to split them into groups of 1,000, 10,000, or whatever works, since a sufficiently large uncommitted transaction may starve the database of resources and possibly fail due to its size alone.
Note: @Transactional is a Spring annotation. See Transaction Management in the Spring Reference.
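For a very large list, a minimal sketch of caller-side chunking (the chunk size of 1,000 is illustrative; note that each call has to go through the Spring proxy, i.e. come from outside the Batch bean, for every chunk to get its own transaction):

// Each call to batch.insert(...) runs in its own REQUIRED transaction,
// so no single uncommitted transaction grows unbounded.
int chunkSize = 1000;
for (int i = 0; i < entities.size(); i += chunkSize) {
    batch.insert(entities.subList(i, Math.min(i + chunkSize, entities.size())));
}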

What you could do, if you were in a crafty mood, is:
@Entity
public class SomeEntityBatch {

    @Id
    @GeneratedValue
    private int batchID;

    @OneToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE})
    private List<SomeEntity> entities;

    protected SomeEntityBatch() {
        // no-arg constructor required by JPA
    }

    public SomeEntityBatch(List<SomeEntity> entities) {
        this.entities = entities;
    }
}
List<SomeEntity> entitiesToPersist;
em.persist(new SomeEntityBatch(entitiesToPersist));
// remove the SomeEntityBatch object later
Because of the cascade, that will cause the entities to be inserted with a single call to the EntityManager.
I doubt there is any practical advantage to doing this over simply persisting the individual objects in a loop. It would be interesting to look at the SQL that the JPA implementation emits, and to benchmark.
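If you want to see the emitted SQL, one option is a sketch like the following (these property names are Hibernate-specific, not part of the JPA spec, and "myUnit" is a placeholder persistence unit name):

Map<String, String> props = new HashMap<>();
props.put("hibernate.show_sql", "true");   // log every SQL statement Hibernate issues
props.put("hibernate.format_sql", "true"); // pretty-print the logged statements
EntityManagerFactory emf = Persistence.createEntityManagerFactory("myUnit", props);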

Related

JPA: How to get results by compromised where-clause

I have a table with 30 columns.
I fill the object within my Java code. Now I want to look up in my database whether the row is already inserted. I can do this primitively, like:
SELECT *
FROM tablename
WHERE table.name=object.name
AND table.street=object.street
AND ...
AND ...
AND ...
I think you get it. It works, but in my opinion this is not the best solution.
Is there any kind of generic solution (e.g. one where I do not need to change the code if the table changes), where I can hand my object to the where-clause and it can match itself? That way the where-clause also wouldn't be so massive.
The closest thing that comes to mind is the Spring Data JPA Specifications.
You can isolate the where clauses in an instance for a particular entity.
Afterwards, you just pass it to any of the @Repository methods:
public interface UserRepository extends CrudRepository<User, Long>,
        JpaSpecificationExecutor<User> {
}
Then in your service:
@Autowired
private UserRepository repo;

public void findMatching() {
    List<User> users = repo.findAll(new MyUserSpecification());
}
Then, whenever the DB changes, you simply alter one place: the Specification implementation.
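For completeness, a minimal sketch of what a Specification (org.springframework.data.jpa.domain.Specification) implementation might look like; MyUserSpecification, the firstName/lastName fields, and constructing it from an example User are assumptions for illustration:

public class MyUserSpecification implements Specification<User> {

    private final User example;

    public MyUserSpecification(User example) {
        this.example = example;
    }

    @Override
    public Predicate toPredicate(Root<User> root, CriteriaQuery<?> query, CriteriaBuilder cb) {
        // Build the where-clause from the example object; add one predicate per column
        return cb.and(
                cb.equal(root.get("firstName"), example.getFirstName()),
                cb.equal(root.get("lastName"), example.getLastName()));
    }
}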

How id can be found in Transaction-Scoped Persistence context if it's not in the database

An example from Pro JPA:
@Stateless
public class AuditServiceBean implements AuditService {
    @PersistenceContext(unitName = "EmployeeService")
    EntityManager em;

    public void logTransaction(int empId, String action) {
        // verify employee number is valid
        if (em.find(Employee.class, empId) == null) {
            throw new IllegalArgumentException("Unknown employee id");
        }
        LogRecord lr = new LogRecord(empId, action);
        em.persist(lr);
    }
}
@Stateless
public class EmployeeServiceBean implements EmployeeService {
    @PersistenceContext(unitName = "EmployeeService")
    EntityManager em;

    @EJB
    AuditService audit;

    public void createEmployee(Employee emp) {
        em.persist(emp);
        audit.logTransaction(emp.getId(), "created employee");
    }
    // ...
}
And the text:
Even though the newly created Employee is not yet in the database, the
audit bean can find the entity and verify that it exists. This works
because the two beans are actually sharing the same persistence
context.
As far as I understand, the id is generated by the database. So how can emp.getId() be passed into audit.logTransaction() if the transaction has not been committed yet and the id has not been generated yet?
It depends on the GeneratedValue strategy. If you use something like the SEQUENCE or TABLE strategy, the persistence provider usually assigns the id to the entities (it has some reserved ids, based on the allocation size) immediately after the persist method is called.
But if you use the IDENTITY strategy, different providers may act differently. For example, in Hibernate, if you use the IDENTITY strategy, it performs the insert statement immediately and fills the id field of the entity.
https://thoughts-on-java.org/jpa-generate-primary-keys/ says:
Hibernate requires a primary key value for each managed entity and
therefore has to perform the insert statement immediately.
But in EclipseLink, if you use the IDENTITY strategy, the id will be assigned after flushing. So if you set the flush mode to AUTO (or call the flush method), you will have the id after persist.
https://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Entities/Ids/GeneratedValue says:
There is a difference between using IDENTITY and other id generation
strategies: the identifier will not be accessible until after the
insert has occurred – it is the action of inserting that caused the
identifier generation. Due to the fact that insertion of entities is
most often deferred until the commit time, the identifier would not be
available until after the transaction has been flushed or committed.
In the implementation, UnitOfWorkChangeSet has a collection for the new entities, which will have no real identity until inserted.
// This collection holds the new objects which will have no real identity until inserted.
protected Map<Class, Map<ObjectChangeSet, ObjectChangeSet>> newObjectChangeSets;
JPA - Returning an auto generated id after persist() is a related question about EclipseLink.
There are good points at https://forum.hibernate.org/viewtopic.php?p=2384011#p2384011:
I am basically referring to some remarks in Java Persistence with
Hibernate. Hibernate's API guarantees that after a call to save() the
entity has an assigned database identifier. Depending on the id
generator type this means that Hibernate might have to issue an INSERT
statement before flush() or commit() is called. This can cause
problems at rollback time. There is a discussion about this on page
490 of Java Persistence with Hibernate.
In JPA persist() does not return a database identifier. For that
reason one could imagine that an implementation holds back the
generation of the identifier until flush or commit time.
Your approach might work fine for now, but you could run into troubles
when changing the id generator or JPA implementation (switching from
Hibernate to something else).
Maybe this is no issue for you, but I just thought I bring it up.
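To make the difference concrete, a rough sketch (the entity and field names are illustrative; as discussed above, what is available right after persist() depends on the provider and strategy):

@Entity
public class LogRecordExample {
    @Id
    // With SEQUENCE (or TABLE), the provider can assign the id from its reserved
    // block as soon as persist() is called.
    // With IDENTITY, the id only exists once the row is inserted: Hibernate inserts
    // immediately on persist(), while EclipseLink defers it until flush/commit.
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    public Long getId() {
        return id;
    }
}

// elsewhere, inside a transaction:
LogRecordExample record = new LogRecordExample();
em.persist(record);
Long id = record.getId(); // non-null here with SEQUENCE; with IDENTITY it depends on the provider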

How to properly use Locking or Transactions to prevent duplicates using Spring Data

What is the best way to check if a record exists and if it doesn't, create it (avoiding duplicates)?
Keep in mind that this is a distributed application running across many application servers.
I'm trying to avoid these:
Race Conditions
TOCTOU
A simple example:
Person.java
@Entity
public class Person {
    @Id
    @GeneratedValue
    private long id;

    private String firstName;
    private String lastName;

    // Getters and Setters Omitted
}
PersonRepository.java
public interface PersonRepository extends CrudRepository<Person, Long> {
    public Person findByFirstName(String firstName);
}
Some Method
public void someMethod() {
    Person john = new Person();
    john.setFirstName("John");
    john.setLastName("Doe");

    if (personRepo.findByFirstName(john.getFirstName()) == null) {
        personRepo.save(john);
    } else {
        // Don't Save Person
    }
}
Clearly as the code currently stands, there is a chance that the Person could be inserted in the database in between the time I checked if it already exists and when I insert it myself. Thus a duplicate would be created.
How should I avoid this?
Based on my initial research, perhaps a combination of
#Transactional
#Lock
But the exact configuration is what I'm unsure of. Any guidance would be greatly appreciated. To reiterate, this application will be distributed across multiple servers so this must still work in a highly-available, distributed environment.
For inserts: if you want to prevent the same records from being persisted, then you may want to take some precautions on the DB side. In your example, if firstName should be unique, define a unique index on that column (or on the group of columns that should be unique together) and let the DB handle the check; you just insert, and you get an exception if you're inserting a record that already exists.
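For example, a minimal sketch of a DB-side unique constraint on the Person entity above (the constraint name is illustrative, and making firstName unique is just for the example):

@Entity
@Table(uniqueConstraints = @UniqueConstraint(name = "uk_person_first_name",
                                             columnNames = "firstName"))
public class Person {
    @Id
    @GeneratedValue
    private long id;

    private String firstName;
    private String lastName;
}

// A duplicate insert now fails in the database; with Spring Data this typically
// surfaces as a DataIntegrityViolationException that you can catch.
try {
    personRepo.save(john);
} catch (DataIntegrityViolationException e) {
    // another node already inserted this person
}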
For updates: use the @Version (javax.persistence.Version) annotation like this:
@Version
private long version;
Define a version column in the tables; Hibernate or any other ORM will automatically populate the value and also add the version to the where clause when the entity is updated. So if someone tries to update a stale entity, this prevents it. Be careful: this doesn't throw an exception, it just returns an update count of 0, so you may want to check for that.

EJB - Using an EntityManager - Can finding an entity cause an OptimisticLockException

Unfortunately I'm getting an OptimisticLockException in my code and I'm not sure why. Perhaps there is someone who can help me with an answer to a general question.
Following scenario:
@Entity
public class MyEntity {
    @Id
    @GeneratedValue
    private Integer id;

    @Version
    private int version;

    private String value;
}
@Singleton
@TransactionManagement(TransactionManagementType.CONTAINER)
public class MyBean {
    @PersistenceContext
    private EntityManager em;

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void test() {
        MyEntity myEntity = em.find(MyEntity.class, 1);
    }
}
CMT is used. Method test() requires a new transaction.
Now my question: can method test() throw an OptimisticLockException if there is another thread in another bean, using the same persistence context, that changes my entity before commit, even though I only use find and don't update anything in my test() method?
From this blog:
JPA Optimistic locking allows anyone to read and update an entity, however a version check is made upon commit and an exception is thrown if the version was updated in the database since the entity was read
So there is no need to do an update to get an OptimisticLockException. Assume myEntity.getVersion() == 1 when you read it. You will get an OptimisticLockException if, at commit (i.e. when your test() method ends), the actual value in the version column is != 1.
It means that someone updated the entity (in the meantime, between the read and the transaction commit), so the values you have just read are no longer valid at commit time.
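As a side note, JPA 2 also lets you explicitly request a commit-time version check on a pure read via a lock mode; a minimal sketch using the entity above (not from the original answer):

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public void test() {
    // LockModeType.OPTIMISTIC asks the provider to verify the entity's version at
    // commit even though nothing is modified; a concurrent update then causes an
    // OptimisticLockException when this transaction commits.
    MyEntity myEntity = em.find(MyEntity.class, 1, LockModeType.OPTIMISTIC);
}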

Persisting a list of an interface type with JPA2

I suspect there's no perfect solution to this problem, so least-worst solutions are more than welcome.
I'm implementing a dashboard using PrimeFaces and I would like to persist the model backing it (using JPA2). I've written my own implementation of DashboardModel and DashboardColumn with the necessary annotations and other fields I need. The model is shown below:
@Entity
public class DashboardSettings implements DashboardModel, Serializable {

    @Id
    private long id;

    @OrderColumn(name = "COLUMN_ORDER")
    private List<DashboardColumn> columns;

    // ...a few other fields...

    public DashboardSettings() {}

    @Override
    public void addColumn(DashboardColumn column) {
        this.columns.add(column);
    }

    @Override
    public List<DashboardColumn> getColumns() {
        return columns;
    }

    // ...snip...
}
The problem is the columns field. I would like this field to be persisted into its own table, but because DashboardColumn is an interface (and from a third party, so it can't be changed), the field currently gets stored in a blob. If I change the type of the columns field to my own implementation (DashboardColumnSettings), which is marked with @Entity, the addColumn method would cease to work correctly - it would have to do a type check and cast.
The type check and cast is not the end of the world, as this code will only be consumed by our development team, but it is a trip hazard. Is there any way to have the columns field persisted while at the same time leaving it as a DashboardColumn?
You can try to use the targetEntity attribute, though I'm not sure it would be better than an explicit cast:
@OrderColumn(name = "COLUMN_ORDER")
@OneToMany(targetEntity = DashboardColumnSettings.class)
private List<DashboardColumn> columns;
Depends on the JPA implementation (you don't mention which one); the JPA spec doesn't define support for interface fields, nor for Collections of interfaces. DataNucleus JPA certainly allows it, primarily because we support it for JDO also, being something that is part of the JDO spec.