Why are JPA entities treated like this outside a session? - jpa

Hi,
I am getting a "failed to lazily initialize a collection of role..." exception in JPA. I understand that if you try to retrieve a lazy collection outside a session, when there is no session bound, you will get this error; fine. But what I don't understand is this: if I have this Spring controller (the code is not 100% correct, it's just to illustrate my case):
@Controller
public class EnterpriseController {

    @Autowired
    EnterpriseService enterpriseService;

    public List<Enterprise> getAll() {
        List<Enterprise> enterprises = enterpriseService.getAll();
        for (Enterprise enterprise : enterprises) {
            enterprise.getEmployees();
        }
        return enterprises;
    }
}
When I call "enterprise.getEmployees()", I know there is not anymore a session but why when I try to do "enterprise.getEmployees()", why enterprise is treated like a jpa entity and not just like a normal bean?, I mean; for what I understand a jpa entity is treated like this inside a session but outside like in this case it should be treated like a normal java bean so the calling to enterprise.getEmployees() should be treated like the calling of a get method of a java bean and should not throw any lazy exception....
Maybe is because spring controller treats the enterprise objects like jpa entities and not just only like java beans? this behaviour is specific to spring controllers?
Thanks

An entity returned from an EntityManager is not necessarily an instance of your entity class, but a proxy class which extends your class. In many cases the same is true for the persistent properties of such entities (especially for those annotated with (One/Many)To(One/Many)).
For example, if you are using field-based access:
@Entity
public class Enterprise {

    @OneToMany
    private List<Employee> employees = new ArrayList<>();

    public List<Employee> getEmployees() {
        return employees;
    }
}
Here the JPA provider will create a proxy class that extends Enterprise and remembers the previous state from the DB somehow. In addition it will change the employees list to its own implementation of List (which does not need to extend ArrayList).
Because the proxy class can override your methods, it could know when you are calling getEmployees() and check for the session. But I don't think that happens here, as the method is not annotated with any JPA-specific annotation.
In addition, some providers like Hibernate support byte code enhancement or byte code instrumentation. That changes the implementation of your class (in byte code) and replaces every access to employees with some provider-specific code. I don't know if Spring JPA provides this, but it could lead to a check for the session.
Otherwise any call to enterprise.getEmployees() should just return the current value of employees - without any check for the session and without the LazyInitializationException.
But calling enterprise.getEmployees().size() will try to initialize the list and check for the session - this could lead to the mentioned exception.
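To make that concrete, here is a minimal sketch of the field-access case (assuming Hibernate as the provider; em is an open EntityManager, the other names as above):
Enterprise e = em.find(Enterprise.class, 1L); // e itself is initialized
em.close();                                   // no session from here on

List<Employee> employees = e.getEmployees();  // fine: just returns the provider's list wrapper
int count = employees.size();                 // first real access needs the session
                                              // -> LazyInitializationException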
If you are using property-based access, things are a little bit different:
@Entity
public class Enterprise {

    private List<Employee> employees = new ArrayList<>();

    @OneToMany
    public List<Employee> getEmployees() {
        return employees;
    }
}
Here the proxy class will not delegate to your implementation, but will override the getEmployees() method and return its own List implementation, without touching employees. Thus here it is possible to get a LazyInitializationException for enterprise.getEmployees().
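A minimal sketch of that situation (assuming Hibernate; an uninitialized proxy obtained via getReference()):
// With property access, the provider's proxy overrides the getter
Enterprise e = em.getReference(Enterprise.class, 1L); // uninitialized proxy
em.close();
e.getEmployees(); // intercepted by the proxy, needs the session
                  // -> LazyInitializationException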
Remark: This describes how most JPA implementations work - but as this is implementation-specific, some unusual frameworks could do things differently.

It can't do anything else. The only alternative would be to return an empty collection of employees, which would be much, much worse: you would incorrectly assume that the enterprise has 0 employees, which is a valid but completely wrong result.
To realize how much worse that would be, imagine a HospitalAnalysis entity having a collection of DetectedDisease entities. And let's say you try to display the result of the analysis but forget to initialize the collection. The page would tell you that you're perfectly healthy and can safely go home, whereas in reality you have cancer and the program has a bug. I'd much prefer the program to crash with an exception and be fixed rather than not starting my treatment.
Trying to access employees without having initialized the collection, and thus without knowing the actual collection of employees, is just a bug. This bug is signalled by throwing a runtime exception.

Related

Mark complete entity as "updatable=false"

In one application there are several entity classes for which update statements to the database shall be forbidden. I want to insert into and read from the database, but never update existing records.
Is there a way to mark a whole @Entity class as updatable=false?
One could annotate every single field with @Column(updatable=false) or similar annotations, but for obvious reasons I'd like to avoid this.
Getting rid of the setter methods is not an option either, because the entity classes are also used as DTOs and other parts of the application need those setters, so this would cause too much refactoring of existing code.
Is there another simple and clean way to achieve what I'd like with JPA 2.1 / EclipseLink (+ extensions)?
You could use @PreUpdate for that:
@Entity
public class ReadOnlyEntity {

    @PreUpdate
    private void preUpdate() {
        throw new UnsupportedOperationException();
    }
}
That way you can never update (write) that entity.
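A quick sketch of the effect (the setter is hypothetical, added just for the demonstration):
ReadOnlyEntity entity = em.find(ReadOnlyEntity.class, 1L);
entity.setName("new value"); // hypothetical setter; marks the entity dirty
em.flush();                  // the provider detects the change and fires @PreUpdate
                             // -> UnsupportedOperationException before the UPDATE runs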

Why are my entities only partially populated when loading them using Spring Data JPA?

I'm using Spring Data JPA with DataNucleus as the JPA persistence provider and have something like
interface BookRepository extends CrudRepository<Book, Long> {
    Book findByAuthorId(Long id);
}
If I call bookRepository.findByAuthorId() and then access book.publishingHouse.manager.name, it is null. In contrast, after calling bookRepository.findAll() the fields are populated correctly all the way down. I set datanucleus.DetachAllOnCommit=true and datanucleus.maxFetchDepth=-1 (I also tried 10).
Any idea why?
If you don't have any additional transaction boundaries defined, the EntityManager is closed when leaving the query method. That means what you get back are detached entities, and the load state you see is determined by the defaults the persistence provider uses.
You basically have two options:
Have a client (a service or controller class) use @Transactional to keep the EntityManager open, so that the loaded instances remain eligible for lazy loading and can pull data out of the store while you use them (see the sketch after this list). If a controller or service is not enough, you might want to look into the OpenEntityManagerInViewFilter/-Interceptor, which basically keeps the EntityManager open until the view is rendered.
Define what should be fetched explicitly, either using JPA 2.1 entity graphs (see the reference docs for details) or by adding fetch joins to a manually defined query.
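A minimal sketch of the first option (the service and the accessor chain are hypothetical):
@Service
public class BookService {

    @Autowired
    private BookRepository bookRepository;

    @Transactional(readOnly = true)
    public String findManagerName(Long authorId) {
        Book book = bookRepository.findByAuthorId(authorId);
        // Still inside the transaction, so lazy navigation can hit the database
        return book.getPublishingHouse().getManager().getName();
    }
}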

Does having a static Repository class in a webforms project reuse Entity Framework connections?

I have a
public static class Repository
in my webforms project.
In the static constructor of that class I set up my Entity Framework entities object:
private static readonly ProjectEntities db;

static Repository()
{
    db = new ProjectEntities("Name=ProjectEntities");
}
Then I set up some public static methods like this:
public static Order GetOrder(int orderID)
{
    return db.Orders.First(o => o.OrderID == orderID);
}
The problem is that when, for instance, a deletion fails (because of some constraint), I randomly get clues about it in subsequent requests, surfacing as exceptions from queries that should be innocent. For instance, exceptions about deletions as a result of select queries.
I never call
db.AcceptAllChanges();
upon any exception, and I should not have to, because across page accesses there should be no trace of failed queries. Or should there? Is the cleanup responsibility mine?
These problems should not be caused by my use of static (please say it isn't so), so is this related to Entity Framework connection pooling?
Generally speaking, the Entity Framework context is meant to be short-lived: it is generally regarded as a unit of work, whereby you create it for a particular task and dispose of it at the end. It's a lightweight object and should be used in this way.
Your issue is a result of the object being long-lived (i.e. a singleton shared across requests). In this case the internal state of the context becomes invalid: you try to delete something, it cannot persist those changes to the database, and it is therefore left in an invalid state.
You could probably resolve your issue by calling the refresh method before making use of the object in every case - this will cause the object to update its state based on the database - but this will probably cause other issues.
However, this is the wrong thing to do - the context should be created, used and disposed per request.
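The unit-of-work idea is the same on the Java side of this page; here is a hedged sketch with JPA's EntityManager, named plainly as an analogue since the principle (create, use, dispose per request) is identical:
// One context per request: create, use, dispose
EntityManager em = entityManagerFactory.createEntityManager();
try {
    em.getTransaction().begin();
    Order order = em.find(Order.class, orderId);
    // ... work with the order ...
    em.getTransaction().commit();
} catch (RuntimeException ex) {
    if (em.getTransaction().isActive()) {
        em.getTransaction().rollback(); // a failed operation never leaks into the next request
    }
    throw ex;
} finally {
    em.close();
}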
Hope this helps.
I would seriously suggest you investigate the lifecycle management of your context object.
Have a look at this excellent answer as to what your options are.

Refactoring application: Direct database access -> access through REST

We have a huge database application which must be refactored (there are many reasons for this; the biggest one: security).
What we already have:
MySQL Database
JPA2 (Eclipselink) classes for over 100 tables
Client application that accesses the database directly
What needs to be there:
REST interface
Login/Logout with roles via database
What I've done so far:
Set up Spring MVC 3.2.1 with Spring Security 3.1.1
Using a custom UserDetailsService (contains just static data for testing atm)
Created a few Controllers for testing (simply receiving/providing data)
Design Problems:
We have maaaaany @OneToMany and @ManyToMany relations in our database
1.: (important)
If I sent the whole object tree with all child objects as a response, I could probably send the whole database at once. So I need a way to request, for example, 'all Articles', but omitting all the child objects. I tried this yesterday and the objects I received were tons of megabytes:
@PersistenceContext
private EntityManager em;

@RequestMapping(method = RequestMethod.GET)
public @ResponseBody List<Article> index() {
    List<Article> a = em.createQuery("SELECT a FROM Article a", Article.class).getResultList();
    return a;
}
2.: (important)
If the client receives an Article, at the moment we can simply call article.getAuthor() and JPA will do a SELECT a FROM Author a JOIN Article ar WHERE ar.author_id = ?.
With REST we could make a request to /authors/{id}. But this way we can't use our old JPA models on the client side, because the model contains Author author and not Long author_id.
Do we have to rewrite every model or is there a simpler approach?
3.: (less important)
Authentication: make it stateless or not? I've never worked with stateless auth so far, but Spring seems to have some kind of support for it. When I look at some sample implementations on the web, I have security concerns: they send the username and password with every request. That can't be the right way.
If someone knows a nice solution for that, please tell me. Otherwise I'd just go with standard HTTP sessions.
4.:
What's the best way to design the client side model?
public class Book {
    int id;
    List<Author> authors;             // option 1
    List<Integer> authorIds;          // option 2
    Map<Integer, Author> idAuthorMap; // option 3
}
(This is a Book which has multiple authors.) All three options have different pros and cons:
1. I could directly access the corresponding Author models, but if I request a Book model via REST, maybe I don't want the authors right away, but later. So option 2 would be better:
2. I could request a Book model directly via REST and use the authorIds to fetch the corresponding author(s) afterwards. But then I can't simply call myBook.getAuthors().
3. This is a mixture of 1 and 2: if I request the books with only the author ids included, I could do something like idAuthorMap.put(authorId, null).
But maybe there's a Java library that handles all the stuff for me?!
That's it for now. Thank you guys :)
The maybe solution(s):
Problem: select only the data I need. This means more or less ignoring every @ManyToMany, @OneToMany and @ManyToOne relation.
Solution: use @JsonIgnore and/or @JsonIgnoreProperties.
Problem: every ignored relation should still be easy to fetch, without modifying the data model.
Solution: example models:
class Book {
    int bId;
    Author author;    // has @ManyToOne
}

class Author {
    int aId;
    List<Book> books; // has @OneToMany
}
Now I can fetch a book via REST: GET /books/4, and the result will look like this ('cause I ignore all relations via @JsonIgnore): {"bId":4}
Then I have to create another route to retrieve the related author: GET /books/4/author. It will return: {"aId":6}.
Backwards: GET /authors/6/books -> [{"bId":4},{"bId":42}].
There will be a route for every @ManyToMany, @OneToMany and @ManyToOne relation, but nothing more. So this will not exist: GET /authors/6/books/42. The client should use GET /books/42.
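A rough sketch of how these routes could look with Spring MVC and Jackson (the controller is hypothetical; fields are public only to keep the sketch short):
@Entity
public class Book {
    @Id
    public int bId;

    @ManyToOne
    @JsonIgnore // keeps the relation out of the JSON for GET /books/{id}
    public Author author;
}

@Controller
public class BookController {

    @PersistenceContext
    private EntityManager em;

    // GET /books/4 -> {"bId":4}
    @RequestMapping(value = "/books/{id}", method = RequestMethod.GET)
    public @ResponseBody Book getBook(@PathVariable int id) {
        return em.find(Book.class, id);
    }

    // GET /books/4/author -> {"aId":6}
    @RequestMapping(value = "/books/{id}/author", method = RequestMethod.GET)
    public @ResponseBody Author getBookAuthor(@PathVariable int id) {
        return em.find(Book.class, id).author; // @ManyToOne is eager by default
    }
}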
First, you will want to control how the JPA layer handles your relationships, i.e. lazy loading vs. eager loading. This can easily be controlled via the "fetch" option on the annotation, like this:
@OneToMany(fetch = FetchType.LAZY)
What this tells JPA is: for this related object, only load it when some code requests it. Behind the scenes, a dynamic "proxy" object is created. When you try to access this proxy, it's smart enough to go out and issue another SQL query to gather the needed bit. In the case of a Collection, it's even smart enough to grab the underlying objects in batches as you iterate over the items. But be warned: access to these proxies has to happen within the same general session. Otherwise the underlying ORM framework (I don't know how EclipseLink works... I am a Hibernate user) will not know how to associate the sub-requests with the proper domain object. This has a bigger effect when you use transport frameworks like Flex BlazeDS, which tries to marshal objects using bytecode instead of the interface, and usually gets tripped up when it sees these proxy objects.
You may also want to set your cascade policy, which can be done via the "cascade" option:
@OneToMany(cascade = CascadeType.ALL)
Or you can give it a list:
@OneToMany(cascade = {CascadeType.MERGE, CascadeType.REMOVE})
Once you control what is getting pulled from your database, you need to look at how you are marshalling your domain objects. Are you sending them via JSON, XML, or a mixture depending on the request? Which frameworks are you using (Jackson, FlexJSON, XStream, something else)? The problem is that even if you set the fetch type to lazy, these frameworks will still go after the related objects, negating all the work you did telling JPA to load lazily. This is where things get specific to the marshalling/serializing scheme: you will need to figure out how to tell your framework what to marshal and what not to marshal. Again, this will be highly dependent on whatever framework is in use.

EF 4.2 Code First and DDD Design Concerns

I have several concerns when trying to do DDD development with EF 4.2 (or EF 4.1) code first. I've done some extensive research but haven't come up with concrete answers for my specific questions. Here they are:
The domain cannot know about the persistence layer, or in other words the domain is completely separate from EF. However, to persist data to the database each entity must be attached to or added to the EF context. I know you are supposed to use factories to create instances of the aggregate roots so the factory could potentially register the created entity with the EF context. This appears to violate DDD rules since the factory is part of the domain and not part of the persistence layer. How should I go about creating and registering entities so that they correctly persist to the database when needed to?
Should an aggregate entity be the one to create its child entities? What I mean is: if I have an Organization, and that Organization has a collection of Employee entities, should Organization have a method such as CreateEmployee or AddEmployee? If not, where does creating an Employee entity come in, keeping in mind that the Organization aggregate root 'owns' every Employee entity?
When working with EF code first, the IDs (in the form of identity columns in the database) of each entity are handled automatically and should generally never be changed by user code. Since DDD states that the domain should be persistence-ignorant, exposing the IDs in the domain seems odd, because it implies that the domain should handle assigning unique IDs to newly created entities. Should I be concerned about exposing the ID properties of entities?
I realize these are kind of open-ended design questions, but I am trying my best to stick to DDD design patterns while using EF as my persistence layer.
Thanks in advance!
On 1: I'm not all that familiar with EF, but using the code-first/convention-based mapping approach, I'd assume it's not too hard to map POCOs with getters and setters (even keeping that "DbContext with DbSet properties" class in another project shouldn't be that hard). I would not consider the POCOs to be the aggregate root. Rather, they represent "the state inside an aggregate you want to persist". An example below:
// This is what gets persisted
public class TrainStationState {
    public Guid Id { get; set; }
    public string FullName { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    // ... more state here
}

// This is what you work with
public class TrainStation : IExpose<TrainStationState> {
    TrainStationState _state;

    public TrainStation(TrainStationState state) {
        _state = state;
        // You can also copy into member variables
        // the state that's required to make this
        // object work (think memento pattern).
        // Alternatively you could have a parameter-less
        // constructor and an explicit method
        // to restore/install state.
    }

    TrainStationState IExpose<TrainStationState>.GetState() {
        return _state;
        // Again, nothing stopping you from
        // assembling this "state object"
        // manually.
    }

    public void IncludeInRoute(TrainRoute route) {
        route.AddStation(_state.Id, _state.Latitude, _state.Longitude);
    }
}
Now, with regard to aggregate life-cycle, there are two main scenarios:
Creating a new aggregate: You could use a factory, factory method, builder, constructor, ... whatever fits your needs. When you need to persist the aggregate, query for its state and persist it (typically this code doesn't reside inside your domain and is pretty generic).
Retrieving an existing aggregate: You could use a repository, a DAO, ... whatever fits your needs. It's important to understand that what you are retrieving from persistent storage is a state POCO, which you need to inject into a pristine aggregate (or use it to populate its private members). This all happens behind the repository/DAO facade. Don't muddle your call-sites with this generic behavior.
On 2: Several things come to mind. Here's a list:
Aggregate Roots are consistency boundaries. What consistency requirements do you see between an Organization and an Employee?
Organization COULD act as a factory of Employee, without mutating the state of Organization.
"Ownership" is not what aggregates are about.
Aggregate roots generally have methods that create entities within the aggregate. This makes sense because the roots are responsible for enforcing consistency within the aggregate (see the sketch below).
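In the question's own terms, a sketch of such a creation method on the root (the method and its invariant are hypothetical; written in Java like the rest of this page, since the idea is ORM-agnostic):
public class Organization {

    private final List<Employee> employees = new ArrayList<>();

    // The root creates its children, so it can enforce consistency rules
    public Employee addEmployee(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("employee needs a name");
        }
        Employee employee = new Employee(name, this);
        employees.add(employee);
        return employee;
    }
}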
On 3: Assign identifiers from the outside, get over it, move on. That does not imply exposing them, though (only in the state POCO).
The main problem with EF-DDD compatibility seems to be how to persist private properties. The solution proposed by Yves seems to be a workaround for the lack of EF power in some cases. For example, you can't really do DDD with the Fluent API, which requires the state properties to be public.
I've found that only mapping with .edmx files allows you to leave domain entities pure. It doesn't force you to make things public or to add any EF-dependent attributes.
Entities should always be created by some aggregate root. See a great post by Udi Dahan: http://www.udidahan.com/2009/06/29/dont-create-aggregate-roots/
Always loading some aggregate and creating entities from there also solves the problem of attaching an entity to the EF context. You don't need to attach anything manually in that case: it gets attached automatically, because the aggregate loaded from the repository is already attached and holds a reference to the new entity. While the repository interface belongs to the domain, the repository implementation belongs to the infrastructure and is aware of EF, contexts, attaching, etc.
I tend to treat autogenerated IDs as an implementation detail of the persistent store, one that has to be considered by the domain entity but shouldn't be exposed. So I have a private ID property that is mapped to the autogenerated column, and another, public ID which is meaningful for the domain, like an identity card ID or a passport number for a Person class. If there is no such meaningful data, I use the Guid type, which has the great feature of creating (almost) unique identifiers without a database call.
So in this pattern I use those Guid/meaningful IDs to load aggregates from a repository, while the autogenerated IDs are used internally by the database to make joins a bit faster (Guid is not good for that).
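The same two-identifier pattern sketched in JPA terms (the class and column names are hypothetical):
@Entity
public class Person {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long dbId; // autogenerated; internal to the persistence layer, used for joins

    @Column(unique = true, nullable = false, updatable = false)
    private String publicId = UUID.randomUUID().toString(); // assigned without a database call

    public String getPublicId() {
        return publicId;
    }
}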