Best way to handle JPA merge?

I'm new to the whole JPA thing, so I have multiple questions about the best way to handle JPA merge and persist.
I have a user object which should be updated (some values like the date and the name). Do I have to merge the passed object first, or is it safer to look the object up again?
Currently my code for updating a user looks like this:
public void updateUserName(User user, String name) {
    // maybe first merge it?
    user.setName(name);
    user.setChangeDate(new Date());
    em.merge(user);
}
How can I be sure that the user has not been manipulated before the update method is called? Is it safer to do something like this:
public void updateUserName(int userId, String name) {
    User user = em.getReference(User.class, userId);
    user.setName(name);
    user.setChangeDate(new Date());
    em.merge(user);
}
Maybe other solutions? I watched multiple videos and saw many examples, but they were all different and nobody explained what the best practice is.
What is the best approach for adding children to relationships? For example, my user object is connected to multiple groups. Should I call the JPA handler for users and just add the new group to the user's group list, or should I create the group in a dedicated group handler with persist and add it manually to my user object?
Hope somebody here has a clue ;)

It depends on what you want to achieve and how much information you have about the origin of the object you're trying to merge.
Firstly, it doesn't matter whether you invoke em.merge(user) on the first line of your method or at the end. If you use JTA and CMT, your entity is updated when the method invocation finishes. The only difference is that if you invoke em.merge(user) before changing the user, you should use the returned instance instead of your parameter. So it is either:
public void updateUserName(User user, String name) {
    User managedUser = em.merge(user);
    managedUser.setName(name); // use the returned, managed instance
    managedUser.setChangeDate(new Date());
    // no need for an additional em.merge(-) here:
    // JTA automatically commits the transaction for this business method
}
or
public void updateUserName(User user, String name) {
    user.setName(name);
    user.setChangeDate(new Date());
    em.merge(user);
    // JTA automatically commits the transaction for this business method
}
Now, about updating the entity.
If you just want to update some well-defined fields in your entity, use the second approach, as it's safer: you can't be sure that a client of your method hasn't modified other fields of the entity, and em.merge(-) would persist those changes as well, which might not be what you wanted to achieve. A sketch of this approach follows below.
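For illustration, a minimal sketch of that second approach, assuming em is an injected, container-managed EntityManager and the method runs inside a JTA transaction:

public void updateUserName(int userId, String name) {
    User user = em.find(User.class, userId); // 'user' is managed
    user.setName(name);                      // touch only the fields you intend to change
    user.setChangeDate(new Date());
    // no em.merge(...) needed: changes to the managed entity are flushed on commit
}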
On the other hand, if you want to accept all changes made by the client and just override or add some properties like changeDate in your example, the first approach (merging the whole entity passed to the business method) is also fine. It really depends on your use case.
As for your second question, I guess it depends on your cascading settings. If you want to automatically persist or merge all Groups whenever the User entity changes, it's safe to just add the group to the User's collection (something like User#addGroup(Group g) { groups.add(g); }). If you don't want cascading, you can always write helper methods that propagate the change to the other side of the relationship, e.g. a User#addGroup(Group g) that also invokes g.addUser(this). A sketch of such helpers follows.
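For illustration, a hedged sketch of such a helper pair on a bidirectional many-to-many mapping (the field names and the cascade choice are assumptions, not a prescription):

// User.java
@Entity
public class User {
    @Id @GeneratedValue
    private Long id;

    // Cascading MERGE is just one possible choice; pick what fits your model.
    @ManyToMany(cascade = CascadeType.MERGE)
    private Set<Group> groups = new HashSet<>();

    // Helper that keeps both sides of the relationship in sync.
    public void addGroup(Group g) {
        groups.add(g);
        g.getUsers().add(this);
    }
}

// Group.java
@Entity
public class Group {
    @Id @GeneratedValue
    private Long id;

    @ManyToMany(mappedBy = "groups")
    private Set<User> users = new HashSet<>();

    public Set<User> getUsers() { return users; }
}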

Question 1
The merge method is meant to be called on a detached entity, and it returns the merged object, attached to the EntityManager.
What does that mean?
An entity is detached as soon as the EntityManager you used to fetch it is closed (i.e., most of the time, because you fetched it in a previous transaction).
In your second code sample, user is attached (because you just fetched it), so calling merge is useless. (By the way, the method you want there is find, not getReference.)
In your first sample we don't know the state of user (detached or not?). If it is detached, it makes sense to call merge, but be careful: merge does not modify the object passed as an argument. So here is my version of your first sample:
/**
 * @param user a detached entity
 * @return the attached, updated entity
 */
public User updateUserName(User user, String name) {
    user.setName(name);
    user.setChangeDate(new Date());
    return em.merge(user);
}
Question 2
A code sample explaining what you mean by "JPA handler" would help us understand your concern. Anyway, I'll try to help you.
If you have a persistent user and you need to create a new group and associate it with that user:
User user = em.find(User.class, userId);
Group group = new Group();
...
em.persist(group);
user.addToGroups(group);
group.addToUsers(user); // JPA won't update the other side of the relationship,
                        // so you have to do it by hand OR be aware of that
If you have a persistent user and a persistent group and you need to associate them:
User user = em.find(User.class, userId);
Group group = em.find(Group.class, groupId);
...
user.addToGroups(group);
group.addToUsers(user);
General considerations
The best practice regarding all of this really depends on how you manage your transactions (and thus the lifecycle of the EntityManager) versus the lifecycle of your objects.
Most of the time, an EntityManager is a really short-lived object. Your business objects, on the other hand, may live longer, so you will have to call merge (and be careful about the fact that merge does not modify the object passed as an argument!).
Alternatively, you can decide to fetch and modify your business objects in the same transaction (i.e., with the same EntityManager). That means many more database accesses, so this strategy must generally be combined with a second-level cache for performance reasons. But in this case you won't have to call merge; a sketch follows.
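As a minimal sketch of that strategy, assuming an application-managed EntityManager with resource-local transactions (emf, userId, and newName come from the surrounding code):

EntityManager em = emf.createEntityManager();
try {
    em.getTransaction().begin();
    User user = em.find(User.class, userId); // managed for the whole transaction
    user.setName(newName);
    user.setChangeDate(new Date());
    em.getTransaction().commit(); // changes are flushed here; no merge needed
} finally {
    em.close();
}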
I hope this helps.

Related

Entity Framework update with attach or single

I'm new to Entity Framework and am seeing different approaches for updating.
public void Update(Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);
    modelInDb.Name = "New Name";
    context.SaveChanges();
}

public void Update(Model model)
{
    context.Models.Attach(model);
    model.Name = "New Name";
    context.SaveChanges();
}
Why should I use Attach over Single? Could you explain the difference?
Passing entities between client and server should be considered an antipattern because it can make your system vulnerable to man-in-the-browser and similar attacks.
Your two examples don't really outline much because you are setting the updated value solely in your method, rather than based on input from the view. A more common example of an update would be:
public void Update(Model model)
{
    var modelInDb = context.Models.Single(m => m.Id == model.Id);
    modelInDb.Name = model.Name;
    context.SaveChanges();
}
and
public void Update(Model model)
{
    context.Models.Attach(model);
    context.Entry(model).State = EntityState.Modified;
    context.SaveChanges();
}
In your example, if your method sets the modifications, then the UPDATE SQL statement should be OK, modifying just the customer's Name. However, if you attach the model and set its state to Modified to save the new model fields to the DB, it will update all columns.
Of these two examples, the first is better than the second for a number of reasons. The first loads the data from the context and copies across only the data you expect to be changeable from the view. The second takes the model from the view as-is, attaches it to the context, and will overwrite any existing fields. Attackers can discover this and use the behaviour to alter data your view did not allow them to change. A customer Order, for instance, might contain a lot of data about an order, including relationships for products, discounts, etc. A user may not see any of these details in their view, but by passing an entity graph, all of it is visible in the web request data. Not only does this send far more information to the client than the client needs (slower), but it can also be altered in debug tools and the like before reaching your service. Attaching and updating the returned entity exposes your system to tampering.
Additionally, you risk overwriting stale data in your objects. With option 1, you are loading the "right now" copy of the entity. A simple check of a row version number or last-modified date between your passed-in data and the current DB copy can signal whether that row has changed since the copy was passed to the client a while ago. With the second method, you can inadvertently erase modifications to data without a trace.
The better approach is to pass ViewModels to and from your view. By using Select or Automapper to fill a view model, you avoid exposing any more about your domain than the client needs to see. You also accept back only the data needed for the operation the client can perform. This reduces the payload size and the vulnerability to tampering. I've seen an alarming number of examples, even from Microsoft, passing entities around between client and server. It looks practical since the objects are already there, but this is wasteful for resources/performance, troublesome for dealing with cyclic references and serialization, and prone to data tampering and stale data overwrites.
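Since this thread started on the JPA side: the same projection idea is expressed there with a JPQL constructor expression. A hypothetical sketch (UserSummary, its fields, and the com.example package are invented names, not an established API):

// Hypothetical read model: a plain DTO, not an entity.
public class UserSummary {
    public final Long id;
    public final String name;

    public UserSummary(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// JPQL constructor expression: only the needed columns are selected,
// and no entity ever reaches the view layer.
List<UserSummary> summaries = em.createQuery(
        "select new com.example.UserSummary(u.id, u.name) from User u",
        UserSummary.class)
    .getResultList();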

Partially initializing domain entities

In the following, the author advises against partially initializing domain entities.
As we stated earlier, each customer must have no more than 5 contacts. By not returning the contacts along with the customers themselves, we leave a hole in our domain model which allows us to add a 6th contact and thus break this invariant.
Because of that, the practice of partial initialization should be avoided. If your repository returns a list of domain entities (or just a single domain entity), make sure the entities are fully initialized meaning that all their properties are filled out.
https://enterprisecraftsmanship.com/posts/partially-initialized-entities-anti-pattern/
So, do we have to load the whole object graph (a customer with all contacts and everything related), or would Entity Framework lazy loading help?
It probably has less to do with the object graph and more to do with the invariants involved.
As someone posted in the comments of that post, a performance issue may very well arise when there are thousands of permitted contacts. An example to this effect may be that a Customer may only have, say, 5 active Order instances. Should all Order instances linked to the customer be loaded? Most certainly not. In fact, an Order is another aggregate, and an instance of one aggregate should not be contained in another aggregate. You could use a value object containing the id of the other aggregate, but for a great many of these the same performance issue may manifest itself.
An alternative may be to simply keep a ContactCount or, in my example, an ActiveOrderCount, which is kept consistent. If the actual relationships are to be stored or removed, then these may be attached to the relevant aggregate when adding/removing in order to persist the change, but that is a transient representation.
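To make that concrete, a minimal sketch (written in Java here since the idea is framework-agnostic; the member names are invented) of an aggregate guarding the invariant through a counter rather than a loaded collection:

// Sketch: the aggregate enforces "no more than 5 active orders" with a counter
// that is persisted alongside it, so no Order instances need to be loaded.
public class Customer {
    private static final int MAX_ACTIVE_ORDERS = 5;
    private int activeOrderCount;

    public void registerActiveOrder() {
        if (activeOrderCount >= MAX_ACTIVE_ORDERS) {
            throw new IllegalStateException("Customer may not have more than 5 active orders");
        }
        activeOrderCount++;
    }

    public void registerCompletedOrder() {
        activeOrderCount--;
    }
}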
So, do we have to load the whole object graph (a customer with all contacts and everything related), or would Entity Framework lazy loading help?
The answer is, actually, a resounding "yes". However, your object model should not be deep. You should make every attempt to create small aggregates. I try to model my aggregates with a single root entity that contains value objects, and the entire aggregate is loaded. Lazy loading is probably an indication that you are querying your domain, which is something I suggest one not do. Rather, create a simple query mechanism that uses some read model to return the relevant data for your front end.
The anti-pattern of partially loaded entities concerns both graphs (children and relatives) and the data within an entity. The reason it is an anti-pattern is that any code written to accept and expect an entity should be given a complete and valid entity.
This is not to say that you must always load a complete entity; it is that if you ever return an entity, it should be a complete, or completable, entity (i.e. proxies associated with a live DbContext).
An example of a partially loaded entity and why it goes bad:
Someone goes to write the following method that an MVC controller will call to get customers and return them to a view...
public IEnumerable<Customer> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers
            .Where(x => x.IsActive && x.CustomerName.StartsWith(criteria))
            .ToList();
    }
}
Code like this may have worked earlier with simpler entities, but once Customer gained related data like Orders, serializing it in MVC produced an error: the Orders proxies could not lazy load because the DbContext had been disposed. The options were to somehow eager-load all related details with this call and return the complete customer, to completely disable lazy-loading proxies, or to return an incomplete customer. Since this method is used to display a summary list of just customer details, the author could choose to do something like:
public IEnumerable<Customer> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers
            .Where(x => x.IsActive && x.CustomerName.StartsWith(criteria))
            .Select(x => new Customer
            {
                CustomerId = x.CustomerId,
                CustomerName = x.CustomerName,
                // ... any other fields that we want to display ...
            }).ToList();
    }
}
The problem seems solved. The trouble with this approach, or with turning off lazy-load proxies, is that you are returning a class that claims "I am a Customer entity." That object may be serialized to a view, deserialized back from the view, and passed to another method that expects a Customer entity. Modifications to your code down the road will need to somehow determine which "Customer" objects are actually associated with a DbContext (or are complete, disconnected entities) and which are these partial, incomplete Customer objects.
Eager-loading all of the related data would avoid the issue of the partial entity; however, it is both wasteful in terms of performance and memory usage, and prone to bugs as entities evolve: when new relationships are added, they need to be eager-fetched in the repository, or you get lazy-load hits, errors, or incomplete entity views creeping in down the road.
Now, in the early days of EF and NHibernate, you would be advised either to always return complete entities or to write your repositories so that they never return entities but DTOs instead. For example:
public IEnumerable<CustomerDTO> GetCustomers(string criteria)
{
    using (var context = new MyDbContext())
    {
        return context.Customers
            .Where(x => x.IsActive && x.CustomerName.StartsWith(criteria))
            .Select(x => new CustomerDTO
            {
                CustomerId = x.CustomerId,
                CustomerName = x.CustomerName,
                // ... any other fields that we want to display ...
            }).ToList();
    }
}
This is a better approach than the previous one because, by returning and using the CustomerDTO, there is absolutely no confusion between this partial object and a Customer entity. However, this solution has its drawbacks. One is that you may have several similar but different views that need customer data, some of which need a bit extra or some of the related data. Other methods will have different search requirements; some will want pagination or sorting. Using this approach, you end up, as in the article's example, with a repository returning several similar but different DTOs and a large number of variant methods for different criteria, inclusions, etc. (CustomerDTO, CustomerWithAddressDTO, etc.)
With modern EF there is a better solution available for repositories, and that is to return IQueryable<TEntity> rather than IEnumerable<TEntity> or even TEntity. For example, to search for customers leveraging IQueryable:
public IQueryable<Customer> GetCustomers()
{
    return Context.Customers.Where(x => x.IsActive);
}
Then, when your MVC controller goes to get a list of customers with its criteria:
using (var contextScope = ContextScopeFactory.Create())
{
    return CustomerRepository.GetCustomers()
        .Where(x => x.CustomerName.Contains(criteria))
        .Select(x => new CustomerViewModel
        {
            CustomerId = x.CustomerId,
            CustomerName = x.CustomerName,
            // ... details from Customer and related entities as needed ...
        }).ToList();
}
By returning IQueryable the repository does not need to worry about complete vs. incomplete representations of entities. It can enforce core rules such as active state checking, but leave it up to the consumers to filter, sort, paginate, or otherwise consume the data as they see fit. This keeps the repositories very lightweight and simple to work with while allowing controllers and services that consume them to be unit tested with mocks in place of the repositories. The controllers should consume the entities returned by the repository, but take care not to return these entities themselves. Instead they can populate view models (or DTOs) to hand over to the web client or API consumer to avoid partial entities being passed around and confused for real entities.
This applies even when a repository is expected to return just one entity; returning IQueryable has its advantages. For instance, comparing:
public Customer GetCustomerById(int customerId)
{
    return Context.Customers.SingleOrDefault(x => x.CustomerId == customerId);
}

vs.

public IQueryable<Customer> QGetCustomerById(int customerId)
{
    return Context.Customers.Where(x => x.CustomerId == customerId);
}
These look very similar, but to the consumer (controller/service) it would be a bit different.
var customer = CustomerRepository.GetCustomerById(customerId);
vs.
var customer = CustomerRepository.QGetCustomerById(customerId).Single();
Slightly different, but the second is far more flexible. What if we just wanted to check whether a customer exists?
var customerExists = CustomerRepository.GetCustomerById(customerId) != null;
vs.
var customerExists = CustomerRepository.QGetCustomerById(customerId).Any();
The first would execute a query that loads the entire customer entity. The second merely executes an exists-check query. When it comes to loading related data, the first method would need to rely on lazy loading or simply not have related details available, whereas the IQueryable method could do:
var customer = CustomerRepository.QGetCustomerById(customerId).Include(x => x.Related).Single();
or better, if loading a view model with or without related data:
var customerViewModel = CustomerRepository.QGetCustomerById(customerId)
    .Select(x => new CustomerViewModel
    {
        CustomerId = x.CustomerId,
        CustomerName = x.CustomerName,
        RelatedName = x.Related.Name,
        // ... etc.
    }).Single();
Disclaimer: Actual mileage may vary depending on your EF version. EF Core has had a number of changes compared to EF6 around lazy loading and query building.
A requirement for this pattern is that the DbContext either has to be injected (DI) or provided via a unit of work pattern as the consumer of the repository will need to interact with the entities and their DbContext when materializing the query created by the repository.
A case where using a partially initialized entity is perfectly valid is when performing a delete without pre-fetching the entity. For instance, in cases where you're certain a particular ID or range of IDs needs to be deleted, rather than loading those entities you can instantiate a new class with just the entity's PK populated and tell the DbContext to delete it. The key point when considering the use of incomplete entities is that the entity should only live within the scope of the operation and never be returned to callers.
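For readers following the JPA side of this thread, the analogous delete-without-fetching relies on getReference, which returns an uninitialized proxy without issuing a SELECT. A sketch (Customer and customerId are assumed names):

em.getTransaction().begin();
// getReference returns a lazy proxy without hitting the database;
// the DELETE statement is issued at flush/commit.
Customer stub = em.getReference(Customer.class, customerId);
em.remove(stub);
em.getTransaction().commit();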

persisting an update query using openJPA

I am attempting to update an existing record using JPA. The following link seems to suggest that the only way to update a record is to write an update query for it:
enter link description here
Which is fine. But again, I am wondering why I am pulling this out of a stored proc to use all of the magic of OpenJPA.
I thought that if I had an instance of a JPA object and I tried to persist it to the database using a call similar to this:
emf.persist(launchRet)
then the JPA framework would check to see if the record already exists; if so, it would proceed to make the changes to that record, and if not, it would just add a new record. Which would be really cool. Instead, I am going to end up writing all that logic myself in an update query. Which is fine, but why can't I just use a stored proc and pass it all the necessary values?
UPDATE: CODE EXPLAINING WHAT MY LAST COMMENT IS ALL ABOUT
try {
    // 'emf' here is actually an EntityManager, despite its name
    launchRet = emf.find(QuickLaunch.class, launch.getQuickLaunchId());
    if (launchRet == null) {
        emf.getTransaction().begin();
        emf.persist(launch); // persist the new instance (the original snippet persisted launchRet, which is null here)
        emf.getTransaction().commit();
    } else {
        emf.refresh(launchRet);
    }
}
The variable launch is passed into the method...
public QuickLaunch UpdateQuickLaunchComponent(QuickLaunch launch)
Would I simply just set the found launchRet equal to the launch that was passed in?
Read the link that you posted:
You can modify an entity instance in one of the following ways:
Using an Updating Query
Using the Entity's Public API
[...]
The way used in 99% of the cases is the second way:
Foo someExistingFoo = em.find(Foo.class, someExistingFooId);
someExistingFoo.setSomeValue(theNewValue);
// no need to call any persist method: the EM will flush the new state
// of the entity to the database when needed
Of course, the entity can also be loaded via a query, or by navigating through the graph of entities.
If you have a detached entity, and you want to persist its state to the database, use the EntityManager.merge() method. It finds the entity with the same ID as the detached one passed as argument, copies the state from the detached entity to the attached one, and returns the attached one:
Foo attachedModifiedFoo = em.merge(detachedFoo);
If the detached entity isn't persistent (i.e. doesn't have any ID), then it is created and made persistent.

Calculation before db update

I'm using Play Framework and I have what I think is a very common persistence problem:
I display a form with values coming from the database, including a 'quantity' field.
The user updates the form and changes the 'quantity' value.
He clicks the save button.
In the controller method called, I want to get the old value of 'quantity' and calculate the difference between the new one and the old one before updating the DB.
To do that, I use findById (before calling the object.save method), but it gives me the new value, not the old one: it apparently looks into some cache (which one?) instead of querying the DB.
Is that normal? How can I get the old value, do my calculation, and then persist?
Thanks a lot for your help. I do not want to manage old/new values in my DB... I'm sure it's bad practice!
UPDATE
public static void save(@Valid Lot lot) {
    History element = new History();
    element.date = new Date();
    // HERE below it returns the new value, not the old one
    Lot databaseLot = Lot.findById(lot.id);
    element.delta = databaseLot.quantity - lot.quantity;
    element.save();
    lot.save();
    list(null, null, null, null);
}
This is because Play is doing some magic for you here.
When you pass a JPA object that contains an ID into your controller, Play will automatically retrieve that JPA object from the database. The Play documentation explains this in a little more detail. It states (assuming an action call that is passed a User JPA POJO):
You can automatically bind a JPA object using the HTTP to Java binding. You can provide the user.id field yourself in the HTTP parameters. When Play finds the id field, it loads the matching instance from the database before editing it. The other parameters provided by the HTTP request are then applied. So you can save it directly.
So, how can you fix this? I guess the easiest way is to not pass the id as part of the POJO, and instead pass the ID as a separate parameter, so that Play does not think the object needs to be automagically retrieved from the database.
An alternative method is to have a setter method for the quantity field which updates the delta. Play will automatically retrieve the object from the DB, then call your setter method to update the values (as per normal POJO binding), and as part of that operation your new quantity and the delta are both set. Perfect! This is the best option in my opinion, as it also ensures that the business logic stays neatly inside your model, and not your controller. A sketch follows.
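A hypothetical sketch of that setter-based option in Play 1.x style (the @Transient delta field and its handling are assumptions layered on the question's code):

@Entity
public class Lot extends play.db.jpa.Model {
    public int quantity;

    @Transient
    public int delta; // computed during binding; not persisted

    // Play's POJO binding loads the Lot by id first, then calls this setter
    // with the submitted value, so the old quantity is still in the field.
    public void setQuantity(int newQuantity) {
        this.delta = this.quantity - newQuantity;
        this.quantity = newQuantity;
    }
}

The controller's save action can then build its History element from lot.delta before calling lot.save().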
I can't speak to the Play Framework specifically, but in JPA the EntityManager caches objects for its lifetime unless it is explicitly cleared. Because of that, querying for an object that the context is already managing will just give you the cached version. Furthermore, it sounds like the EM you are getting is declared as EXTENDED, which causes the same EM to be used across multiple requests (or perhaps the framework does the lookup under the covers and sets the values before you get to handle it).
You will either need to work around this cache or configure Play to use a TRANSACTION-scoped persistence context (a.k.a. EntityManager). I can't help you with the latter, but the former is easy enough:
int newQuantity = entity.getQuantity();
entityManager.refresh(entity);
// entity.getQuantity() should now give you the old value.
// Do your calculation, then restore the new value:
// entity.setQuantity(newQuantity);
At the end of the transaction, your new state should be saved.
Alternatively, you could keep the old quantity value in a hidden form field and process that.

Why does updating an object only work one particular way?

I am trying to update an object using EF4. An object is passed from the strongly-typed page to the action method:
[HttpPost]
public ActionResult Index(Scenario scenario, Person person)
{
    // Some business logic.
    // Update the Scenario with the Person information.
    scenario.Person = person;

    // Update the corresponding object and persist the changes.
    // (The repository stems from the repository pattern and contains the ObjectContext.)
    Scenario updateScenario = repository.GetScenario(scenario.ScenarioID);
    updateScenario = scenario;
    repository.Save();

    return View(); // return statement omitted in the original snippet
}
However, the problem is that the changes do not persist when I do this. If I instead update every single property within the scenario individually and then persist the changes (via the Save method), everything is persisted.
I'm confused why this is happening. In my real application, there are MANY items and subobjects within a Scenario so it is not feasible to update every individual property. Can someone please help clear up what is happening and what I need to do to fix it?
In the context of your action method, you have two different objects of type Scenario: scenario points to one of them and updateScenario points to the other. With the line of code:
updateScenario = scenario
all you are doing is making updateScenario point to the same object that scenario points to; you are not copying the values that make up the object from one to the other. Essentially, your database context is aware of only one of the two instances of Scenario. The other instance was created outside of the context, and the context has not been made aware of it.
In your particular case, you can accomplish what you want by not taking a Scenario as your parameter; instead, pull the Scenario that you want to update from your database context and, in your action method, invoke:
this.TryUpdateModel(updateScenario);
This will cause the model binder to update the properties/fields on the Scenario object that your database context is aware of, and it will therefore persist the changes when you call Save().
HTH