Read and write the same table defined as two different persistent entities with different entity managers - JPA/EclipseLink

Here is a scenario in which I need an optimistic lock exception to be thrown if something has changed underneath me between the time I read from the DB and the time I actually write back. I have a project that defines its own persistence unit and a number of persistent entities. Its EntityManager is defined with an extended persistence context.
I also depend on a jar file that defines its own persistent entities, its own persistence unit (in a persistence.xml file), and its own EntityManager. That EntityManager is also defined with an extended persistence context.
Both persistence units point to the same database and the same schema/library (DB2). Some persistent entities are common to both persistence units, i.e. they refer to the same underlying table, but because they come from two different jars they have different names and package structures, and a different set of keys defined as the composite primary key.
I need to read an entity with one EntityManager and then write/update it with the other. Since I run this in a single transaction, the persistence context propagates. But because these are two different persistence units, each defining a different class for the same underlying table, when I read from one and try to save through the other, the second EntityManager saves the object as a new entity, as if it did not exist in the table. So even if the row has changed in the DB between the time I read it with the first EntityManager and the time I write it with the second, no optimistic lock exception is thrown; the data is simply saved as is.
I have tried my best to explain the scenario, but please don't hesitate to ask if anything is unclear. My belief is that if two different PUs define differently named persistent entities for the same underlying table, you simply cannot read from one em and expect the other em to know that the underlying table is the same. In other words, the second em will not recognize that the row was already read by the first em, convert the entity to the form its own PU defines, check on write whether the row has changed since the first em read it, and throw the optimistic lock exception if it has.
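To make the desired behaviour concrete: the version value seen at read time has to travel with the data and be checked at write time, no matter which entity class carries it. The following is only a plain-Java sketch with no JPA involved. `VersionedTable`, `Row`, `StaleRowException` and `OptimisticLockDemo` are hypothetical names standing in for the shared table, a row with a version column, the provider's optimistic lock exception, and the two entity managers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the exception a JPA provider raises on a version mismatch.
class StaleRowException extends RuntimeException {
    StaleRowException(String message) { super(message); }
}

// One row of the shared table: a payload plus a version column.
final class Row {
    final String payload;
    final long version;
    Row(String payload, long version) { this.payload = payload; this.version = version; }
}

// In-memory stand-in for the table that both persistence units map to.
class VersionedTable {
    private final Map<String, Row> rows = new HashMap<>();

    synchronized void insert(String key, String payload) {
        rows.put(key, new Row(payload, 1L));
    }

    synchronized Row read(String key) {
        return rows.get(key);
    }

    // Mimics "UPDATE ... SET version = version + 1 WHERE key = ? AND version = ?".
    synchronized Row update(String key, String newPayload, long expectedVersion) {
        Row current = rows.get(key);
        if (current == null || current.version != expectedVersion) {
            throw new StaleRowException("Row " + key + " changed since it was read");
        }
        Row next = new Row(newPayload, expectedVersion + 1);
        rows.put(key, next);
        return next;
    }
}

class OptimisticLockDemo {
    // Returns true if the write based on a stale version was rejected.
    static boolean staleWriteIsRejected() {
        VersionedTable table = new VersionedTable();
        table.insert("order-1", "initial");

        // The "first entity manager" reads the row and remembers its version.
        Row readByFirstUnit = table.read("order-1");
        // Someone else changes the row in the meantime.
        table.update("order-1", "changed elsewhere", readByFirstUnit.version);

        try {
            // The "second entity manager" writes using the version seen at read time.
            table.update("order-1", "my change", readByFirstUnit.version);
            return false;
        } catch (StaleRowException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale write rejected: " + staleWriteIsRejected());
    }
}
```

In JPA terms this would roughly correspond to both entity classes mapping the same version column and the second entity being populated with the version read through the first. Whether a given provider then treats such a hand-built instance as detached rather than new is provider-specific and would need to be verified.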

Related

Should my logs table be managed by entity framework?

I want to log exceptions to my database to ensure failures are recorded. I am using Entity Framework.
Should I set up an encapsulated logging service that records to a table which is not managed by Entity Framework, or should I just make an EF class called Log?
I'm thinking that a log is not really an entity that represents a part of my application but rather metadata, which is why I ask.
Consider a separate (bounded) context for your general logging. If logs happen to reference top-level entities you can define minimal entity definitions for these as well. Logging operations are heavy-write, so by keeping a separate DbContext you minimize the spin-up time.
When it comes to auditing (i.e. persisting change tracking), I commonly use a pattern that hooks directly into the DbContext events and records information based on when entities are updated, inserted, or deleted.

JPA EntityManager Merge Exception

I'm getting a primaryKeyUpdateDisallowed ValidationException when trying to merge an entity into the database when there is an existing entity with the same primary key.
Of course, I don't get the exception when I perform a TypedQuery and have the entity manager return the entity first, update the appropriate values, and then merge. The problem is this process is too expensive, resource-wise. I need to be able to simply merge without the resulting exception.
Is there a way to structure our entity class so that we can over-write the records, including the primary keys? Or some other way around the problem?
In JPA, if you wish to do:
entityManager.merge(someEntity);
then you must first have someEntity loaded from the DB into the entityManager persistence context and then detached via "entityManager.detach(someEntity)" or by clearing the persistence context. If someEntity is not pre-loaded and detached, but is instead created via "new SomeEntity()", then the merge() function will determine that you have added a new entity and carry out an internal operation very similar to entityManager.persist(someEntity). When the data is flushed or committed in a transaction, it will generate a SQL INSERT, which will clash with the pre-existing PK.
Here's the specified behaviour from the JPA 2 spec:
The semantics of the merge operation applied to an entity X are as follows:
If X is a detached entity, the state of X is copied onto a pre-existing managed entity instance X' of the same identity or a new managed copy X' of X is created.
If X is a new entity instance, a new managed entity instance X' is created and the state of X is copied into the new managed entity instance X'.
> The problem is this process is too expensive, resource-wise. I need to be able to simply merge without the resulting exception.
This shouldn't be the case. Retrieving the entity or list of entities from the DB should be efficient. Possibly your query could be improved. Do you have a "where" clause on the query? Do you have much cascading of entities during refresh (via attribute cascade=CascadeType.REFRESH/ALL on relationships such as @OneToMany or @ManyToOne)? Do you have a very complex inheritance hierarchy with many entities/tables? If you could provide your JPQL or criteria query, I'm sure the problem will be simple to fix. :-)

Compose queries across Entity Data Models

Is there a way to compose queries from 2 different entity models if the models are hitting the same underlying database?
The scenario I have is this:
I have a framework that uses EF for data access (EDM1).
I have a client application that uses services of the framework and also uses EF for its own data access (EDM2).
There are situations where I need to compose queries and join on entities that span the 2 EDMs.
Is there a way to do this without getting the data in memory from the first EDM and then apply additional predicates/joins in memory from the entities of the 2nd EDM?
I hope I'm articulating this the right way
EDIT
@Ladislav Mrnka:
The first EDM is the data access layer for a reusable framework. It doesn't make sense to couple the EF-generated entities from this EDM with those of the consuming client. Doing so would defeat the reusability of the API, and I'd have to carry around additional bloat (EF metadata and DB tables of the client) every time I wanted to redeploy the framework. It would also make managing the model in the designer unwieldy.
I'm currently using what you mention in item 7 as the solution, and the performance is abysmal because I end up returning more data (i.e. entities) than needed from the framework using EDM1 and then filtering out the ones not needed, based on predicates on property values of entities in the second EDM. The end result is a huge performance degradation and an unhappy DBA.
For this reason I ended up pushing the logic needed to retrieve the entities into a SPROC, in which I can access the tables that both EDMs use, apply the predicates needed, and have the entire query run in the DB instead of bringing the data into memory and then filtering out the unnecessary rows. The downside is that I can't use LINQ.
Item 8 that you mention sounds interesting, but from the sound of it I doubt that you get strong typing at design time, or do you?
Can you upload your code sample someplace so that I can try it out?
Important edit
There is no built-in support for achieving this with two ObjectContext types. Your query must always be executed against a single ObjectContext.
Probably the best way to go: This was interesting enough for me to try it myself. I started with a very simple idea: two EDMX files (used with POCO T4 generators), each containing a single entity. I took the metadata description from the second connection string and added it to the first connection string. I used ObjectContext and ObjectSet directly. By doing this I was able to query and modify both entities from a single ObjectContext instance. I also tried to create a query joining entities from both models and it worked.
This obviously works only if both EDMX files map to the same database (same DB connection string).
The important part is the connection string:
<configuration>
<connectionStrings>
<add name="TestEntities" connectionString="metadata=res://*/FirstModel.csdl|res://*/FirstModel.ssdl|res://*/FirstModel.msl|res://*/SecondModel.csdl|res://*/SecondModel.ssdl|res://*/SecondModel.msl;provider=System.Data.SqlClient;provider connection string=&quot;Data Source=.;Initial Catalog=Test;Integrated Security=True;MultipleActiveResultSets=True&quot;" providerName="System.Data.EntityClient" />
</connectionStrings>
</configuration>
This connection string contains metadata from two models - FirstModel.edmx and SecondModel.edmx.
Another problem is forcing EF to use the mapping from both of these files. Each EDMX file must define a unique container for SSDL and CSDL. ObjectContext offers a property called DefaultContainerName, which can be set directly or through some constructor overloads. Once you set this property you bind your ObjectContext instance to a single EDMX, so for this scenario you must not set it. Omitting DefaultContainerName can have consequences, because some features and declarations can stop working (you will get runtime errors). You should not have problems with POCOs unless you want to use some advanced features. You will most probably have problems if you are using Entity objects (heavy EF entities). All methods using entity sets defined as strings are dependent on the container. For this reason I suggest using this configuration only when necessary, i.e. for cross-model queries.
The last problem is generating entities and a "strongly typed" derived ObjectContext. The way to go is to modify the T4 template so that one template reads data from multiple EDMX files and generates the context and entities for all of them. I am already doing this in my project and it works. The default T4 implementation doesn't follow the approach described in the previous paragraph: the derived ObjectContext it generates is dependent on a single EDMX and entity container.
This part was written before the previous edit.
I'm leaving the rest of the information in place because some of it can be useful in other scenarios, including work with multiple databases.
An ORM like Entity Framework operates on top of a mapping between the object world and the database world. In EF the object world is described by CSDL, the database world is described by SSDL, and the mapping between them is described by MSL (all are just XML with a well-known schema). At design time these descriptions are part of the model stored in the EDMX file. During compilation they are extracted from the EDMX and by default included as resource files in the compiled assembly.
When you create an instance of ObjectContext it receives a connection string which contains references to the CSDL, SSDL and MSL resource files. SSDL and MSL do not specify an include element for adding information from other files. CSDL offers a Using element which allows you to reuse an existing mapping, but this feature is not supported by the designer. The connection string is used to initialize an EntityConnection instance, which is in turn used to initialize the ObjectContext's MetadataWorkspace (runtime mapping information). ObjectContext also doesn't provide any way of nesting multiple contexts into a hierarchy. Connection string can't contain reference to multiple instances of these files. Edit: It can. I just tested it. See the initial paragraphs.
When you run a LINQ or ESQL query on an instance of ObjectContext, it uses MSL to map your entities or POCO classes (defined by CSDL) into a DB query (defined by the SSDL description of database tables). If it doesn't have this information it will not work (and it can't have that information if it is stored in a separate EDMX).
So how to solve this problem? There are several ways:
1. Always consider: Merge your mapping into one file (if multiple files are used for the same database). That is the intended way to use EF, and as you mentioned you are querying the same DB, so two EF models are not needed.
2. Duplicate the entity description in the second model. If you use EF4 and POCO you can map the same descriptions from multiple models into one POCO class definition. I don't like this solution but sometimes it can help.
3. Define a DB view or stored procedure containing your query (or the core of your query) and map it in one model to a new entity.
4. Use DefiningQuery in one model (you will probably need a 3rd one if you use the Update from database feature) and map it to a new entity. DefiningQuery is a custom SQL query defined in SSDL instead of a table or view description.
5. Use Function with a custom CommandText specifying the DB query. It is similar to using DefiningQuery and it has the same limitation. You must manually (in EDMX) map the result of the function into a new complex type (another difference from DefiningQuery, which is mapped to a new entity).
6. Define a new type for the result of the query (properties of the type must have the same names as the columns returned by the query) and use ObjectContext's ExecuteStoreQuery (only in EF4).
7. Divide the query into two parts, each executed separately on its own context, and use LINQ to Objects to get the result. I don't like this solution.
8. This one is only a high-level idea. I didn't try it and I don't know if it works. As described above, runtime mapping depends on the content of the MetadataWorkspace instance, which is filled from EntityConnection. EntityConnection also provides a constructor which receives a MetadataWorkspace instance directly. So generally, if it were possible to fill MetadataWorkspace from multiple EDMX files, you would not need multiple ObjectContext instances, but your mapping would still be separated into two EDMXs. This would hopefully allow you to write custom LINQ queries on top of two mapping files. Edit: It should be possible because it is exactly what EF is doing if you define multiple mappings in the connection string.
9. Use the CSDL Using feature for breaking the model into multiple reused parts.

Create new or update existing entity at one go with JPA

I have a JPA entity that has a timestamp field and is distinguished by a complex identifier field. What I need is to update the timestamp in an entity that has already been stored, or otherwise create and store a new entity with the current timestamp.
As it turns out the task is not as simple as it seems from the first sight. The problem is that in concurrent environment I get nasty "Unique index or primary key violation" exception. Here's my code:
// Load existing entity, if any.
Entity e = entityManager.find(Entity.class, id);
if (e == null) {
    // Could not find entity with the specified id in the database, so create new one.
    e = entityManager.merge(new Entity(id));
}
// Set current time...
e.setTimestamp(new Date());
// ...and finally save entity.
entityManager.flush();
Please note that in this example entity identifier is not generated on insert, it is known in advance.
When two or more threads run this block of code in parallel, they may simultaneously get null from the entityManager.find(Entity.class, id) call, so they will attempt to save two or more entities with the same identifier at the same time, resulting in an error.
I think that there are a few possible solutions to the problem.
Sure, I could synchronize this code block with a global lock to prevent concurrent access to the database, but would that be the most efficient way?
Some databases support a very handy MERGE statement that updates an existing row or creates a new one if none exists. But I doubt that OpenJPA (the JPA implementation of my choice) supports it.
Even if JPA does not support SQL MERGE, I can always fall back to plain old JDBC and do whatever I want with the database. But I don't want to leave the comfortable API and mess with a hairy JDBC+SQL combination.
There is a magic trick to fix it using standard JPA API only, but I don't know it yet.
Please help.
You are referring to the transaction isolation of JPA transactions. I.e. what is the behaviour of transactions when they access other transactions' resources.
According to this article:
READ_COMMITTED is the expected default Transaction Isolation level for using [..] EJB3 JPA
This means that - yes, you will have problems with the above code.
But JPA doesn't support custom isolation levels.
This thread discusses the topic more extensively. Depending on whether you use Spring or EJB, I think you can make use of the proper transaction strategy.
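The race described in the question, and one common way around it (try to update, insert if nothing was updated, and fall back to an update when the insert hits the key constraint), can be sketched in plain Java. This is an in-memory analogy rather than JPA code: `TimestampTable` is a hypothetical stand-in for a table with a primary key, and `UpsertDemo.touch` is an assumed helper name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a table with a primary key constraint.
class TimestampTable {
    private final ConcurrentMap<String, Long> rows = new ConcurrentHashMap<>();

    // Mimics INSERT: fails if the key already exists.
    void insert(String id, long timestamp) {
        if (rows.putIfAbsent(id, timestamp) != null) {
            throw new IllegalStateException("Unique index or primary key violation: " + id);
        }
    }

    // Mimics UPDATE: returns the number of rows affected (0 or 1).
    int update(String id, long timestamp) {
        return rows.replace(id, timestamp) != null ? 1 : 0;
    }

    Long find(String id) { return rows.get(id); }

    int size() { return rows.size(); }
}

class UpsertDemo {
    // Update first; if no row exists, insert; if another thread won the race, update again.
    static void touch(TimestampTable table, String id, long now) {
        if (table.update(id, now) == 1) {
            return;
        }
        try {
            table.insert(id, now);
        } catch (IllegalStateException duplicateKey) {
            table.update(id, now);
        }
    }

    // Runs touch() from several threads at once and returns how many of them failed.
    static int runConcurrently(TimestampTable table, String id, int threads) {
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await();
                    touch(table, id, System.currentTimeMillis());
                } catch (Exception e) {
                    failures.incrementAndGet();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        TimestampTable table = new TimestampTable();
        System.out.println("failures: " + runConcurrently(table, "id-1", 8));
        System.out.println("rows: " + table.size());
    }
}
```

With a real JPA provider the fallback step usually cannot run in the same transaction, because a failed flush generally marks the transaction for rollback; the retry would typically happen in a fresh transaction.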

JPA EntityManager: Why use persist() over merge()?

EntityManager.merge() can insert new objects and update existing ones.
Why would one want to use persist() (which can only create new objects)?
Either way will add an entity to a PersistenceContext, the difference is in what you do with the entity afterwards.
Persist takes an entity instance, adds it to the context and makes that instance managed (i.e. future updates to the entity will be tracked).
Merge returns the managed instance that the state was merged with. It does return something that exists in PersistenceContext or creates a new instance of your entity. In any case, it will copy the state from the supplied entity, and return a managed copy. The instance you pass in will not be managed (any changes you make will not be part of the transaction - unless you call merge again). Though you can use the returned instance (managed one).
Maybe a code example will help.
MyEntity e = new MyEntity();

// scenario 1
// tran starts
em.persist(e);
e.setSomeField(someValue);
// tran ends, and the row for someField is updated in the database

// scenario 2
// tran starts
e = new MyEntity();
em.merge(e);
e.setSomeField(anotherValue);
// tran ends but the row for someField is not updated in the database
// (you made the changes *after* merging)

// scenario 3
// tran starts
e = new MyEntity();
MyEntity e2 = em.merge(e);
e2.setSomeField(anotherValue);
// tran ends and the row for someField is updated
// (the changes were made to e2, not e)
Scenario 1 and 3 are roughly equivalent, but there are some situations where you'd want to use Scenario 2.
Persist and merge are for two different purposes (they aren't alternatives at all).
(edited to expand differences information)
persist:
Insert a new record into the database.
Attach the object to the entity manager.
merge:
Find an attached object with the same id and update it.
If it exists, update and return the already attached object.
If it doesn't exist, insert the new record into the database.
persist() efficiency:
It can be more efficient than merge() for inserting a new record into a database.
It doesn't duplicate the original object.
persist() semantics:
It makes sure that you are inserting and not updating by mistake.
Example:
{
    AnyEntity newEntity;
    AnyEntity nonAttachedEntity;
    AnyEntity attachedEntity;

    // Create a new entity and persist it
    newEntity = new AnyEntity();
    em.persist(newEntity);

    // Save 1 to the database at next flush
    newEntity.setValue(1);

    // Create a new entity with the same id as the persisted one.
    // (The variable is already declared above, so it is only assigned here.)
    nonAttachedEntity = new AnyEntity();
    nonAttachedEntity.setId(newEntity.getId());

    // Save 2 to the database at next flush instead of 1!!!
    nonAttachedEntity.setValue(2);
    attachedEntity = em.merge(nonAttachedEntity);

    // This condition is true:
    // merge has found the already attached object (newEntity) and returns it.
    if (attachedEntity == newEntity) {
        System.out.print("They are the same object!");
    }

    // Set value to 3
    attachedEntity.setValue(3);
    // Both references really point to the same object now. Prints 3
    System.out.println(newEntity.getValue());

    // Modifying the unattached object has no effect on the entity manager
    // or on the other objects
    nonAttachedEntity.setValue(42);
}
This way there is only one attached object for any record in the entity manager.
merge() for an entity with an id is something like:
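AnyEntity myMerge(AnyEntity entityToSave) {
    AnyEntity attached = em.find(AnyEntity.class, entityToSave.getId());
    if (attached == null) {
        attached = new AnyEntity();
        em.persist(attached);
    }
    // Copy state from entityToSave onto the managed instance.
    // Argument order here is (dest, orig), as in Apache Commons BeanUtils.
    BeanUtils.copyProperties(attached, entityToSave);
    return attached;
}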
Although merge() could in principle be as efficient as persist() when connected to MySQL, by issuing an INSERT with the ON DUPLICATE KEY UPDATE option, JPA is a very high-level programming model and you can't assume this will be the case everywhere.
If you're using the assigned generator, using merge instead of persist can cause a redundant SQL statement, therefore affecting performance.
Also, calling merge for managed entities is also a mistake since managed entities are automatically managed by Hibernate, and their state is synchronized with the database record by the dirty checking mechanism upon flushing the Persistence Context.
To understand how all this works, you should first know that Hibernate shifts the developer mindset from SQL statements to entity state transitions.
Once an entity is actively managed by Hibernate, all changes are going to be automatically propagated to the database.
Hibernate monitors currently attached entities. But for an entity to become managed, it must be in the right entity state.
To understand the JPA state transitions better, you can visualize the following diagram:
[Diagram: JPA entity state transitions]
Or if you use the Hibernate specific API:
[Diagram: Hibernate entity state transitions]
As illustrated by the above diagrams, an entity can be in one of the following four states:
New (Transient)
A newly created object that hasn’t ever been associated with a Hibernate Session (a.k.a Persistence Context) and is not mapped to any database table row is considered to be in the New (Transient) state.
To become persisted we need to either explicitly call the EntityManager#persist method or make use of the transitive persistence mechanism.
Persistent (Managed)
A persistent entity has been associated with a database table row and it’s being managed by the currently running Persistence Context. Any change made to such an entity is going to be detected and propagated to the database (during the Session flush-time).
With Hibernate, we no longer have to execute INSERT/UPDATE/DELETE statements. Hibernate employs a transactional write-behind working style and changes are synchronized at the very last responsible moment, during the current Session flush-time.
Detached
Once the currently running Persistence Context is closed all the previously managed entities become detached. Successive changes will no longer be tracked and no automatic database synchronization is going to happen.
To associate a detached entity to an active Hibernate Session, you can choose one of the following options:
Reattaching
Hibernate (but not JPA 2.1) supports reattaching through the Session#update method.
A Hibernate Session can only associate one Entity object for a given database row. This is because the Persistence Context acts as an in-memory cache (first level cache) and only one value (entity) is associated with a given key (entity type and database identifier).
An entity can be reattached only if there is no other JVM object (matching the same database row) already associated with the current Hibernate Session.
Merging
The merge is going to copy the detached entity state (source) to a managed entity instance (destination). If the merging entity has no equivalent in the current Session, one will be fetched from the database.
The detached object instance will continue to remain detached even after the merge operation.
Removed
Although JPA demands that managed entities only are allowed to be removed, Hibernate can also delete detached entities (but only through a Session#delete method call).
A removed entity is only scheduled for deletion and the actual database DELETE statement will be executed during Session flush-time.
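The four states and the transitions described above can be summarised as a small state machine. This is a deliberately simplified sketch, not Hibernate's or JPA's actual implementation: `EntityState` and `EntityLifecycle` are hypothetical names, and transitions that the text above does not describe are simply rejected.

```java
// The four states named in the answer above.
enum EntityState { NEW, MANAGED, DETACHED, REMOVED }

// Simplified transition rules; cases the answer does not describe are rejected.
class EntityLifecycle {

    // persist(): a new instance becomes managed; a managed one stays managed.
    static EntityState persist(EntityState state) {
        switch (state) {
            case NEW:
            case MANAGED:
                return EntityState.MANAGED;
            default:
                throw new IllegalStateException("persist is not modelled for " + state);
        }
    }

    // Closing the persistence context detaches every managed instance.
    static EntityState closeContext(EntityState state) {
        return state == EntityState.MANAGED ? EntityState.DETACHED : state;
    }

    // merge(): state of the instance *returned* by merge.
    static EntityState mergedCopy(EntityState state) {
        if (state == EntityState.REMOVED) {
            throw new IllegalArgumentException("cannot merge a removed entity");
        }
        return EntityState.MANAGED;
    }

    // merge(): the instance that was passed in keeps the state it had.
    static EntityState afterMerge(EntityState state) {
        mergedCopy(state); // applies the same validation
        return state;
    }

    // remove(): simplification of the JPA rule that only managed entities are removed.
    static EntityState remove(EntityState state) {
        if (state != EntityState.MANAGED) {
            throw new IllegalArgumentException("only managed entities can be removed, got " + state);
        }
        return EntityState.REMOVED;
    }

    public static void main(String[] args) {
        EntityState state = persist(EntityState.NEW);
        state = closeContext(state);
        System.out.println("passed instance after merge: " + afterMerge(state));
        System.out.println("returned instance after merge: " + mergedCopy(state));
    }
}
```

Keeping `afterMerge` and `mergedCopy` separate mirrors the point made under "Merging": the detached instance stays detached, and only the returned copy is managed.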
I noticed that when I used em.merge, I got a SELECT statement for every INSERT, even when there was no field that JPA was generating for me--the primary key field was a UUID that I set myself. I switched to em.persist(myEntityObject) and got just INSERT statements then.
The JPA specification says the following about persist().
If X is a detached object, the EntityExistsException may be thrown when the persist
operation is invoked, or the EntityExistsException or another PersistenceException may be thrown at flush or commit time.
So using persist() would be suitable when the object ought not to be a detached object. You might prefer to have the code throw the PersistenceException so it fails fast.
Although the specification is unclear, persist() might set the @GeneratedValue @Id for an object. merge() however must have an object with the @Id already generated.
Some more details about merge which will help you to use merge over persist:
"Returning a managed instance other than the original entity is a critical part of the merge
process. If an entity instance with the same identifier already exists in the persistence context, the
provider will overwrite its state with the state of the entity that is being merged, but the managed
version that existed already must be returned to the client so that it can be used. If the provider did not
update the Employee instance in the persistence context, any references to that instance will become
inconsistent with the new state being merged in.
When merge() is invoked on a new entity, it behaves similarly to the persist() operation. It adds
the entity to the persistence context, but instead of adding the original entity instance, it creates a new
copy and manages that instance instead. The copy that is created by the merge() operation is persisted
as if the persist() method were invoked on it.
In the presence of relationships, the merge() operation will attempt to update the managed entity
to point to managed versions of the entities referenced by the detached entity. If the entity has a
relationship to an object that has no persistent identity, the outcome of the merge operation is
undefined. Some providers might allow the managed copy to point to the non-persistent object,
whereas others might throw an exception immediately. The merge() operation can be optionally
cascaded in these cases to prevent an exception from occurring. We will cover cascading of the merge()
operation later in this section. If an entity being merged points to a removed entity, an
IllegalArgumentException exception will be thrown.
Lazy-loading relationships are a special case in the merge operation. If a lazy-loading
relationship was not triggered on an entity before it became detached, that relationship will be
ignored when the entity is merged. If the relationship was triggered while managed and then set to null while the entity was detached, the managed version of the entity will likewise have the relationship cleared during the merge."
All of the above information was taken from "Pro JPA 2: Mastering the Java™ Persistence API" by Mike Keith and Merrick Schincariol, Chapter 6, the section on detachment and merging. This is actually the authors' second book devoted to JPA, and it contains much new information compared with the former one. I really recommend this book to anyone who will be seriously involved with JPA. I am sorry for posting my first answer anonymously.
There are some more differences between merge and persist (I will enumerate again those already posted here):
D1. merge does not make the passed entity managed, but rather returns another instance that is managed. persist on the other side will make the passed entity managed:
//MERGE: passedEntity remains unmanaged, but newEntity will be managed
Entity newEntity = em.merge(passedEntity);
//PERSIST: passedEntity will be managed after this
em.persist(passedEntity);
D2. If you remove an entity and then decide to persist the entity back, you may do that only with persist(), because merge will throw an IllegalArgumentException.
D3. If you decided to take care manually of your IDs (e.g by using UUIDs), then a merge
operation will trigger subsequent SELECT queries in order to look for existent entities with that ID, while persist may not need those queries.
D4. There are cases when you simply do not trust the code that calls your code, and in order to make sure that no data is updated, but rather is inserted, you must use persist.
JPA is indisputably a great simplification in the domain of enterprise
applications built on the Java platform. As a developer who had to
cope with the intricacies of the old entity beans in J2EE I see the
inclusion of JPA among the Java EE specifications as a big leap
forward. However, while delving deeper into the JPA details I find
things that are not so easy. In this article I deal with comparison of
the EntityManager’s merge and persist methods whose overlapping
behavior may cause confusion not only to a newbie. Furthermore I
propose a generalization that sees both methods as special cases of a
more general method combine.
Persisting entities
In contrast to the merge method the persist method is pretty straightforward and intuitive. The most common scenario of the persist method's usage can be summed up as follows:
"A newly created instance of the entity class is passed to the persist method. After this method returns, the entity is managed and planned for insertion into the database. It may happen at or before the transaction commits or when the flush method is called. If the entity references another entity through a relationship marked with the PERSIST cascade strategy this procedure is applied to it also."
The specification goes more into details, however, remembering them is not crucial as these details cover more or less exotic situations only.
Merging entities
In comparison to persist, the description of the merge's behavior is not so simple. There is no main scenario, as it is in the case of persist, and a programmer must remember all scenarios in order to write a correct code. It seems to me that the JPA designers wanted to have some method whose primary concern would be handling detached entities (as the opposite to the persist method that deals with newly created entities primarily.) The merge method's major task is to transfer the state from an unmanaged entity (passed as the argument) to its managed counterpart within the persistence context. This task, however, divides further into several scenarios which worsen the intelligibility of the overall method's behavior.
Instead of repeating paragraphs from the JPA specification I have prepared a flow diagram that schematically depicts the behaviour of the merge method:
[Diagram: flow of the merge method]
So, when should I use persist and when merge?
persist
You want the method to always create a new entity and never update an existing one. Otherwise, the method throws an exception as a consequence of a primary key uniqueness violation.
Batch processes, handling entities in a stateful manner (see Gateway pattern).
Performance optimization
merge
You want the method to either insert or update an entity in the database.
You want to handle entities in a stateless manner (data transfer objects in services).
You want to insert a new entity that may have a reference to another entity that may or may not have been created yet (the relationship must be marked MERGE). For example, inserting a new photo with a reference to either a new or a preexisting album.
I was getting lazy-loading exceptions on my entity because I was trying to access a lazy-loaded collection on an entity that was stored in the session.
In a separate request I would retrieve the entity from the session and then try to access a collection in my JSP page, which was problematic.
To alleviate this, I updated the same entity in my controller and passed it to my JSP. I imagine that once I re-save it in the session it will also be accessible through SessionScope without throwing a LazyLoadingException. This is a modification of example 2:
The following has worked for me:
// scenario 2 MY WAY
// tran starts
e = new MyEntity();
e = em.merge(e); // re-assign to the same entity "e"
//access e from jsp and it will work dandy!!
I found this explanation from the Hibernate docs enlightening, because it contains a use case:
The usage and semantics of merge() seems to be confusing for new users. Firstly, as long as you are not trying to use object state loaded in one entity manager in another new entity manager, you should not need to use merge() at all. Some whole applications will never use this method.
Usually merge() is used in the following scenario:
The application loads an object in the first entity manager
the object is passed up to the presentation layer
some modifications are made to the object
the object is passed back down to the business logic layer
the application persists these modifications by calling merge() in a second entity manager
Here is the exact semantic of merge():
if there is a managed instance with the same identifier currently associated with the persistence context, copy the state of the given object onto the managed instance
if there is no managed instance currently associated with the persistence context, try to load it from the database, or create a new managed instance
the managed instance is returned
the given instance does not become associated with the persistence context, it remains detached and is usually discarded
From: http://docs.jboss.org/hibernate/entitymanager/3.6/reference/en/html/objectstate.html
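The four rules quoted above can be illustrated with a toy model in plain Java. This is only a sketch under my own assumptions, not Hibernate's implementation: `MergeSketch`, `Item`, `Context` and the `database` map are hypothetical names, one map stands in for the database and another for a single persistence context.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the quoted merge() rules; NOT the real Hibernate/JPA implementation. */
class MergeSketch {

    /** Hypothetical entity with an identifier and one field of state. */
    static class Item {
        final Long id;
        String name;
        Item(Long id, String name) { this.id = id; this.name = name; }
    }

    /** Stand-in for the database: id -> stored name. */
    static final Map<Long, String> database = new HashMap<>();

    /** Stand-in for one persistence context: id -> managed instance. */
    static class Context {
        final Map<Long, Item> managed = new HashMap<>();

        Item merge(Item given) {
            Item target = managed.get(given.id);
            if (target == null) {
                // No managed instance yet: "load" it from the database,
                // or create a new managed instance if nothing is stored.
                target = new Item(given.id, database.get(given.id));
                managed.put(given.id, target);
            }
            // Copy the state of the given object onto the managed instance.
            target.name = given.name;
            // The managed instance is returned; the given one stays detached.
            return target;
        }

        /** True only if this exact instance is the managed one. */
        boolean contains(Item item) {
            return managed.get(item.id) == item;
        }
    }

    public static void main(String[] args) {
        database.put(1L, "old name");
        Item detached = new Item(1L, "new name"); // e.g. edited in the presentation layer

        Context second = new Context();
        Item managedCopy = second.merge(detached);

        System.out.println(managedCopy == detached);      // false
        System.out.println(second.contains(managedCopy)); // true
        System.out.println(second.contains(detached));    // false
        System.out.println(managedCopy.name);             // new name
    }
}
```

The practical consequence is the same as in the real API: keep working with the instance that merge() returns, not with the one you passed in.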
Going through the answers, some details are missing regarding cascading and id generation. See question
Also, it is worth mentioning that you can configure cascading separately for merging and persisting: CascadeType.MERGE and CascadeType.PERSIST, each of which is applied according to the method used.
The spec is your friend ;)
Scenario X:
Table: Spitter (one), Table: Spittles (many) (Spittles is the owner of the relationship, with a FK: spitter_id)
This scenario saves the Spitter and both Spittles, with both Spittles owned by the same Spitter.
Spitter spitter=new Spitter();
Spittle spittle3=new Spittle();
spitter.setUsername("George");
spitter.setPassword("test1234");
spittle3.setSpittle("I love java 2");
spittle3.setSpitter(spitter);
dao.addSpittle(spittle3); // <--persist
Spittle spittle=new Spittle();
spittle.setSpittle("I love java");
spittle.setSpitter(spitter);
dao.saveSpittle(spittle); //<-- merge!!
Scenario Y:
This will save the Spitter and both Spittles, but the two Spittles will not reference the same Spitter!
Spitter spitter=new Spitter();
Spittle spittle3=new Spittle();
spitter.setUsername("George");
spitter.setPassword("test1234");
spittle3.setSpittle("I love java 2");
spittle3.setSpitter(spitter);
dao.save(spittle3); // <--merge!!
Spittle spittle=new Spittle();
spittle.setSpittle("I love java");
spittle.setSpitter(spitter);
dao.saveSpittle(spittle); //<-- merge!!
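One way to read the difference between the two scenarios is that persist makes the passed Spitter itself managed (so it gets an id), while merge copies it and leaves the original without an id. The toy model below sketches that reading in plain Java. It assumes generated ids and cascading of both PERSIST and MERGE from Spittle to Spitter; `SpitterCascadeSketch` and its methods are hypothetical and only imitate the behavior, they are not what a JPA provider actually runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of cascading persist vs. merge; a simplification, not a JPA provider. */
class SpitterCascadeSketch {

    static class Spitter {
        Long id;          // null until the instance has been given an id
        String username;
    }

    static class Spittle {
        Long id;
        String text;
        Spitter spitter;
    }

    /** Stand-ins for the two tables. */
    final List<Spitter> spitterRows = new ArrayList<>();
    final List<Spittle> spittleRows = new ArrayList<>();
    private long nextId = 1;

    /** persist: the passed instances themselves become managed and receive ids. */
    void persist(Spittle spittle) {
        if (spittle.spitter.id == null) {      // cascade PERSIST to the new Spitter
            spittle.spitter.id = nextId++;
            spitterRows.add(spittle.spitter);
        }
        spittle.id = nextId++;
        spittleRows.add(spittle);
    }

    /** merge: state is copied onto managed copies; the passed instances stay untouched. */
    Spittle merge(Spittle spittle) {
        Spitter managedSpitter = mergeSpitter(spittle.spitter); // cascade MERGE
        Spittle copy = new Spittle();          // the Spittles in both scenarios are new
        copy.id = nextId++;
        copy.text = spittle.text;
        copy.spitter = managedSpitter;
        spittleRows.add(copy);
        return copy;
    }

    private Spitter mergeSpitter(Spitter given) {
        if (given.id != null) {
            for (Spitter row : spitterRows) {
                if (row.id.equals(given.id)) { row.username = given.username; return row; }
            }
        }
        Spitter copy = new Spitter();          // treated as new; 'given' keeps its old id value
        copy.id = nextId++;
        copy.username = given.username;
        spitterRows.add(copy);
        return copy;
    }

    /** Runs one scenario and returns how many Spitter rows were written. */
    static int spitterRowsAfter(boolean persistFirst) {
        SpitterCascadeSketch dao = new SpitterCascadeSketch();
        Spitter spitter = new Spitter();
        spitter.username = "George";

        Spittle first = new Spittle();
        first.text = "I love java 2";
        first.spitter = spitter;
        if (persistFirst) dao.persist(first); else dao.merge(first);

        Spittle second = new Spittle();
        second.text = "I love java";
        second.spitter = spitter;
        dao.merge(second);
        return dao.spitterRows.size();
    }

    public static void main(String[] args) {
        System.out.println("Scenario X: " + spitterRowsAfter(true));  // Scenario X: 1
        System.out.println("Scenario Y: " + spitterRowsAfter(false)); // Scenario Y: 2
    }
}
```

If this reading is right, Scenario Y can probably be repaired by taking the Spitter from the Spittle returned by the first merge and setting that instance on the second Spittle, instead of reusing the original `spitter` object.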
Another observation:
merge() will only care about an auto-generated id(tested on IDENTITY and SEQUENCE) when a record with such an id already exists in your table. In that case merge() will try to update the record.
If, however, an id is absent or does not match any existing record, merge() will completely ignore it and ask the database to allocate a new one. This is sometimes a source of a lot of bugs. Do not use merge() to force an id for a new record.
persist() on the other hand will never let you even pass an id to it. It will fail immediately. In my case, it's:
Caused by: org.hibernate.PersistentObjectException: detached entity passed to persist
hibernate-jpa javadoc has a hint:
Throws: javax.persistence.EntityExistsException - if the entity
already exists. (If the entity already exists, the
EntityExistsException may be thrown when the persist operation is
invoked, or the EntityExistsException or another PersistenceException
may be thrown at flush or commit time.)
You may have come here for advice on when to use persist and when to use merge. I think that it depends on the situation: how likely is it that you need to create a new record, and how hard is it to retrieve persisted data?
Let's presume you can use a natural key/identifier.
Data needs to be persisted, but once in a while a record exists and an update is called for. In this case you could try a persist and if it throws an EntityExistsException, you look it up and combine the data:
try { entityManager.persist(entity); }
catch(EntityExistsException exception) { /* retrieve and merge */ }
Persisted data needs to be updated, but once in a while there is no record for the data yet. In this case you look it up, and do a persist if the entity is missing:
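MyEntity existing = entityManager.find(MyEntity.class, key); // MyEntity: placeholder for your entity class
if (existing == null) { entityManager.persist(entity); } // persist the new instance, not the null lookup result
else { /* merge */ }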
If you don't have a natural key/identifier, you'll have a harder time figuring out whether the entity exists, or how to look it up.
The merges can be dealt with in two ways, too:
If the changes are usually small, apply them to the managed entity.
If changes are common, copy the ID from the persisted entity, as well as unaltered data. Then call EntityManager::merge() to replace the old content.
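The two strategies above (try persist first, or look up first) can be compared side by side in a small self-contained sketch. Everything here is hypothetical: a `HashMap` stands in for the EntityManager, `EntityExists` stands in for `EntityExistsException`, and `Customer` with its `email` natural key is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy store contrasting "persist first" with "look up first"; not real JPA. */
class UpsertSketch {

    /** Hypothetical stand-in for javax.persistence.EntityExistsException. */
    static class EntityExists extends RuntimeException {}

    static class Customer {
        final String email; // natural key
        String name;
        Customer(String email, String name) { this.email = email; this.name = name; }
    }

    /** Stand-in for the table, keyed by the natural key. */
    final Map<String, Customer> rows = new HashMap<>();

    Customer find(String key) { return rows.get(key); }

    void persist(Customer c) {
        if (rows.containsKey(c.email)) throw new EntityExists();
        rows.put(c.email, c);
    }

    /** Mostly new data: try the insert, fall back to an update when the key exists. */
    Customer persistFirst(Customer incoming) {
        try {
            persist(incoming);
            return incoming;
        } catch (EntityExists e) {
            Customer existing = find(incoming.email);
            existing.name = incoming.name; // apply the change to the stored instance
            return existing;
        }
    }

    /** Mostly existing data: look up first, insert only when nothing is found. */
    Customer lookUpFirst(Customer incoming) {
        Customer existing = find(incoming.email);
        if (existing == null) {
            persist(incoming);
            return incoming;
        }
        existing.name = incoming.name;
        return existing;
    }

    public static void main(String[] args) {
        UpsertSketch store = new UpsertSketch();
        Customer a = store.persistFirst(new Customer("a@example.com", "Ann"));
        Customer b = store.lookUpFirst(new Customer("a@example.com", "Anna"));
        System.out.println(a == b);            // true
        System.out.println(a.name);            // Anna
        System.out.println(store.rows.size()); // 1
    }
}
```

Keep in mind that the toy throws immediately, whereas the javadoc quoted earlier says a real provider may only report the conflict at flush or commit time, so the "persist first" variant needs more care in real code than it does here.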
persist(entity) should be used with totally new entities, to add them to the DB (if the entity already exists in the DB, an EntityExistsException will be thrown).
merge(entity) should be used to put an entity back into the persistence context if the entity was detached and was changed.
Probably persist generates an INSERT SQL statement and merge an UPDATE SQL statement (but I'm not sure).
Merge won't update a passed entity, unless this entity is managed. Even if entity ID is set to an existing DB record, a new record will be created in a database.