Unidirectional or bidirectional relationships in JPA?

What is the difference between Unidirectional and Bidirectional associations?
Since the tables generated in the database are the same, the only difference I found is that each side of a bidirectional association holds a reference to the other, while in a unidirectional association only one side does.
This is a unidirectional association:
public class User {
    private int id;
    private String name;
    @ManyToOne
    @JoinColumn(name = "groupId")
    private Group group;
}
public class Group {
    private int id;
    private String name;
}
This is a bidirectional association:
public class User {
    private int id;
    private String name;
    @ManyToOne
    @JoinColumn(name = "groupId")
    private Group group;
}
public class Group {
    private int id;
    private String name;
    @OneToMany(mappedBy = "group")
    private List<User> users;
}
The difference is whether the Group holds a reference to its Users.
So I wonder: is this the only difference? Which one is recommended?

The main difference is that a bidirectional relationship provides navigational access in both directions, so that you can access the other side without explicit queries. It also allows you to apply cascading options in both directions.
Note that navigational access is not always good, especially for "one-to-very-many" and "many-to-very-many" relationships. Imagine a Group that contains thousands of Users:
How would you access them? With so many Users, you usually need to apply some filtering and/or pagination, so you need to execute a query anyway (unless you use collection filtering, which looks like a hack to me). Some developers may tend to apply filtering in memory in such cases, which is obviously bad for performance. Note that having such a relationship can encourage those developers to use it without considering the performance implications.
How would you add new Users to the Group? Fortunately, Hibernate looks at the owning side of the relationship when persisting it, so you can just set User.group. However, if you want to keep the objects in memory consistent, you also need to add the User to Group.users. But that would make Hibernate fetch all elements of Group.users from the database!
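A minimal sketch of that point (assuming an EntityManager em, a known groupId, and setters on the question's entities):
Group group = em.find(Group.class, groupId);   // loads the Group, but not its (lazy) users collection

User user = new User();
user.setName("alice");
user.setGroup(group);    // owning side: this is what Hibernate writes to the groupId FK column
em.persist(user);        // persisting works without ever touching the group's users list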
So, I can't agree with the recommendation from the Best Practices. You need to design bidirectional relationships carefully, considering use cases (do you need navigational access in both directions?) and possible performance implications.
See also:
Deterring “ToMany” Relationships in JPA models
Hibernate mapped collections performance problems

There are two main differences.
Accessing the association sides
The first one is related to how you will access the relationship. For a unidirectional association, you can navigate the association from one end only.
So, for a unidirectional @ManyToOne association, it means you can only access the relationship from the child side, where the foreign key resides.
If you have a unidirectional @OneToMany association, it means you can only access the relationship from the parent side, which manages the foreign key.
For a bidirectional @OneToMany association, you can navigate the association in both ways, either from the parent or from the child side.
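Since the question only shows @ManyToOne mappings, here is a minimal sketch of the unidirectional @OneToMany case mentioned above (same Group/User names; putting @JoinColumn on the parent side is the usual way to avoid an extra join table and is an assumption, not something stated in the thread):
public class Group {
    private int id;
    private String name;

    // Unidirectional one-to-many: only Group knows about the association.
    // @JoinColumn makes the provider use the groupId column in the User table
    // instead of creating a separate join table.
    @OneToMany
    @JoinColumn(name = "groupId")
    private List<User> users;
}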
You also need to use add/remove utility methods for bidirectional associations to make sure that both sides are properly synchronized.
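For example (a sketch, not code from the thread), such helpers on the Group side could look like this:
// Inside Group (the inverse side); keeps both sides of the
// bidirectional association consistent in memory.
public void addUser(User user) {
    users.add(user);
    user.setGroup(this);
}

public void removeUser(User user) {
    users.remove(user);
    user.setGroup(null);
}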
Performance
The second aspect is related to performance.
For @OneToMany, unidirectional associations don't perform as well as bidirectional ones.
For @OneToOne, a bidirectional association will cause the parent to be fetched eagerly if Hibernate cannot tell whether the Proxy should be assigned or a null value.
For @ManyToMany, the collection type makes quite a difference, as Sets perform better than Lists.
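As an illustration of the last point (Post and Tag are made-up entity names, not from the thread), the many-to-many collection would be mapped as a Set rather than a List:
public class Post {
    @Id
    private Long id;

    // A Set lets Hibernate remove a single join-table row when an element
    // is removed; with a List it deletes and re-inserts the whole collection.
    @ManyToMany
    private Set<Tag> tags = new HashSet<>();
}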

I'm not 100% sure this is the only difference, but it is the main one. The Hibernate docs also recommend using bidirectional associations:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/best-practices.html
Specifically:
Prefer bidirectional associations: Unidirectional associations are more difficult to query. In a large application, almost all associations must be navigable in both directions in queries.
I personally have a slight problem with this blanket recommendation -- it seems to me there are cases where a child doesn't have any practical reason to know about its parent (e.g., why does an order item need to know about the order it is associated with?), but I do see value in it a reasonable portion of the time as well. And since the bi-directionality doesn't really hurt anything, I don't find it too objectionable to adhere to.

In terms of coding, a bidirectional relationship is more complex to implement because the application is responsible for keeping both sides in sync, according to the JPA specification (page 42). Unfortunately, the example given in the specification does not provide more details, so it gives no sense of the level of complexity involved.
When not using a second-level cache, it is usually not a problem if the relationship methods are not implemented correctly, because the instances are discarded at the end of the transaction.
When using a second-level cache, anything that gets corrupted because of wrongly implemented relationship-handling methods will be visible to other transactions as well (the second-level cache is global).
A correctly implemented bi-directional relationship can make queries and the code simpler, but should not be used if it does not really make sense in terms of business logic.

Related

DDD, JPA and non-default constructors

DDD says that your domain model needs to be expressive and say what it needs. With that in mind I created the following class:
@Entity
public class User {
    @Id
    @GeneratedValue
    private Long id;
    private String userName;
    private String password;

    public User(String userName, String password) {
        /* set the values */
    }

    /* Validations, getters and business methods */
}
With this constructor I state the minimum needed to build a new User object.
The problem with this approach is that JPA needs a no-argument default constructor to instantiate the class via reflection. In this case do I have to forget about that DDD suggestion, or is there a workaround?
In your specific situation you can give the no-argument constructor reduced visibility (the JPA spec requires it to be public or protected, so protected is the portable choice) to keep it out of your public API. But soon another JPA requirement will force you into another workaround.
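A minimal sketch of that workaround (the protected constructor exists only for JPA; field names follow the question):
@Entity
public class User {
    @Id
    @GeneratedValue
    private Long id;
    private String userName;
    private String password;

    // For JPA only; never called by application code.
    protected User() {
    }

    // The expressive constructor the domain actually uses.
    public User(String userName, String password) {
        this.userName = userName;
        this.password = password;
    }
}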
voiceofunreason gave you the right answer. The right way is to have an adapter in the infrastructure/persistence layer that receives your domain object and then convert it to a JPA anemic model and vice versa.
Some other options that fit better, without too many conversions:
1 - Spring Data JDBC is trying to absorb the Aggregate concept
https://www.youtube.com/watch?v=GOSW911Ox6s&t=3s;
https://docs.spring.io/spring-data/jdbc/docs/1.1.7.RELEASE/reference/html/#reference
2 - Persist your aggregate as a json in a relational database. Vaughn Vernon proposed an implementation -> https://kalele.io/the-ideal-domain-driven-design-aggregate-store/
3 - Document oriented database such as https://www.mongodb.com/ among others...
In this case do I have to forget about that DDD sugestion or is there a workaround?
There isn't really a good answer here. There's a tension at work when we try to mix objects that have mutable, encapsulated state with persistence (i.e., storing the authoritative copy of state in some anemic durable store).
The "right" answer is that we don't use an O/RM to load domain entities from the durable store; the O/RM loads a DTO, and we copy information from the DTO into the domain object. Similarly, to save, you extract information from the domain entity and copy it into the DTO, then save the DTO.
In other words, the O/RM loads an in memory representation of an anemic data structure.
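A rough sketch of that separation in JPA terms (the class names, the fromState factory, and the repository methods are invented here for illustration; they are not from the answer):
@Entity
public class UserRecord {                 // anemic DTO; the only thing the O/RM sees
    @Id
    private Long id;
    private String userName;
    // getters and setters omitted
}

public class UserRepository {
    private final EntityManager em;

    public UserRepository(EntityManager em) { this.em = em; }

    public User load(Long id) {
        UserRecord dto = em.find(UserRecord.class, id);
        return User.fromState(dto.getUserName());      // copy DTO -> domain entity
    }

    public void save(Long id, User user) {
        UserRecord dto = em.find(UserRecord.class, id);
        dto.setUserName(user.userName());              // copy domain entity -> DTO
    }
}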
In practice, it's a lot more common to couple the entity state to the O/RM, implementing within the domain entity whatever interfaces the O/RM needs to do its work.
The trick is to make sure that you can decouple your implementation from the O/RM when the O/RM assumptions start to get in the way. For instance, you should be able to easily change a specific aggregate from using a relational data store to using a document data store without requiring a rewrite of your entire system.

Why do I need to set the entities both ways in JPA

I have two following entities:
@Entity
public class Pilot implements Serializable {
    ....
    @ManyToOne
    @JoinColumn(name = "FLIGHT_ID")
    private Flight flight;
}
and
@Entity
public class Flight implements Serializable {
    ....
    @OneToMany(mappedBy = "flight")
    private List<Pilot> pilots;
}
I am using an EJB to add a pilot to a flight:
public void addPilotToFlight(String pilotId, String flightId) {
    // <code to find Flight f>
    // <code to find Pilot p>
    p.setFlight(f);
    f.getPilots().add(p); // Why do I need this one?
}
My confusion is why do I need to link objects both ways?
I have tried the code with and without the f.getPilots().add(p); part, and in the DB it looks exactly the same. However, when I then use f.getPilots() on the same Flight entity object, the new pilot shows up only if I have done the above both-way linking. Could anyone please shed some light on this? I'd be very grateful. Thanks in advance.
Your question seems aimed at asking why you have to maintain both sides of a bidirectional relationship, which is a common question and source of confusion.
CMP 2.0 tried to perform relationship maintenance and magic behind the scenes, but the overhead and lack of control was a problem (IMO) and it was not included in JPA. Instead, JPA entities are meant to be plain old java objects. That means if you want changes made to one side of a relationship to be reflected in the other, you must make them yourself.
The confusion will be that if you modify one side of a relationship and not the other, sometimes the database will get updated correctly, sometimes not, and sometimes both sides will show the change. It all depends on which side you change, and how the entities are cached and refreshed from the database. Changes to the owning side are persisted to the database, and JPA allows for caching, so the non-owning side may not reflect these changes until it is refreshed from the database.
The best practice is that any change you make to one side should also be made to the other side by the application, if you want your object model to remain in sync with the database.
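For example, one way to do that (a sketch; this helper is not in the original code) is to encapsulate the two assignments in the Flight entity:
// Inside Flight
public void addPilot(Pilot pilot) {
    pilots.add(pilot);      // keep the in-memory collection consistent
    pilot.setFlight(this);  // set the owning side that JPA actually persists
}
The EJB method then only needs to call f.addPilot(p);.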
Because in JPA you have the concept of a "one-way" relationship. If you don't specify the mappedBy attribute, you're going to create two one-way relationships (1:N and N:1) instead of one bidirectional 1:N relationship.
Then one may ask "why the heck do people need two one-way relationships?" And the answer is: because JPA was built to accommodate even strange legacy databases. :-)

Foreign key between aggregate roots

I understand the concept of an aggregate root, and I know that one aggregate root must reference another by identity ( http://dddcommunity.org/wp-content/uploads/files/pdf_articles/Vernon_2011_2.pdf ). What I don't get is how I can force Entity Framework to add a foreign key constraint between two aggregates.
Let's suppose I have a simplified domain:
public class AggregateOne {
    [Key]
    public Guid AggregateOneID { get; private set; }
    public Guid AggregateTwoFK { get; private set; }
    /* Other Properties and methods */
}
public class AggregateTwo {
    [Key]
    public Guid AggregateTwoID { get; private set; }
    /* Other Properties and methods */
}
With this domain design, Entity Framework doesn't know that there is a relationship between AggregateOne and AggregateTwo and consequently there is no foreign key at the generated database.
In DDD, EF doesn't exist. Domain relationships are not the same as database relationships. Don't try to mix EF with domain modeling, they don't work together. So in a nutshell, what you have there is not DDD, just plain old relational db masquerading as DDD. EF would be used by the Repositories and would care about persisting one Aggregate Root (AR).
Two ARs can work together, however you need to model the process according to the domain. EF is there to act as a db for the app, it's concerned with persistence issues and shouldn't care about the Domain. Persistence is all about storage and not about reflecting domain relationships (the EF entity is not the domain entity although they can have the same name and can look similar. The important detail is that both belong to different layers and handle different issues). The Domain repositories care only to persist the AR in a way that can be easily restored when it will change. If more AR need to be persisted together, embrace eventual consistency and learn how to use a service bus and sagas. It will greatly simplify your life (consider it a kind of implementation for the unit of work pattern).
For querying, the most clean and elegant way is to generate/update a read model suitable for the querying use cases and this is usually done after a domain event tells the 'world' that something changed in the Domain.
Doing DDD right is not straightforward, and it's very easy to fall into the trap of believing you are applying DDD when in fact you're just CRUDing away, using DDD terminology. Also, IMO, CQRS is a must with DDD if you like an easy life.
Understand the domain without rushing it or being superficial, identify the bounded contexts, model the domain concepts and their use cases (very important!!!), define repository interfaces as you need them, and implement the repositories only when there's nothing else left to do (the real repos; in the meantime you can use fake ones like in-memory repos - they're very fast to implement, and your app being decoupled means it shouldn't care how persistence is implemented, right?). I know it sounds weird, but this is how you know you have a maintainable DDD app.
The point of implementing the repositories last is to really decouple the app from the persistence details and also to have defined the expectations (repository methods) the app has of persistence. Once those are defined, you can write tests :D and then implement the repositories. The bonus is that you get to focus on the repo implementation in isolation, and when all the tests pass, you know everything works as it should.
Why should you have two completely different objects? Why not just expose your entities as domain objects through a domain interface?
In this case there's no issue with having your entities also act as domain objects with their implementation details neatly hidden behind the interface.
Another point: a neat way to represent aggregate roots with EF is to make sure the foreign key column is also part of the primary key of the dependent entity. In your case that would mean AggregateOneID and AggregateTwoFK together form the composite primary key of AggregateOne. This ensures that EF doesn't need a repository call to remove instances of AggregateOne: as long as an instance is removed from AggregateTwo's collection, it will be properly marked for deletion from the database (without such a key you also need to remove it from the AggregateOne set, because EF would otherwise throw an exception, not understanding the developer's intent that the AggregateOne should be deleted).

EF 4.2 Code First and DDD Design Concerns

I have several concerns when trying to do DDD development with EF 4.2 (or EF 4.1) code first. I've done some extensive research but haven't come up with concrete answers for my specific concerns. Here are my concerns:
The domain cannot know about the persistence layer, or in other words the domain is completely separate from EF. However, to persist data to the database each entity must be attached to or added to the EF context. I know you are supposed to use factories to create instances of the aggregate roots so the factory could potentially register the created entity with the EF context. This appears to violate DDD rules since the factory is part of the domain and not part of the persistence layer. How should I go about creating and registering entities so that they correctly persist to the database when needed to?
Should an aggregate entity be the one to create it's child entities? What I mean is, if I have an Organization and that Organization has a collection of Employee entities, should Organization have a method such as CreateEmployee or AddEmployee? If not where does creating an Employee entity come in keeping in mind that the Organization aggregate root 'owns' every Employee entity.
When working with EF code first, the IDs (in the form of identity columns in the database) of each entity are handled automatically and should generally never be changed by user code. Since DDD states that the domain should be persistence-ignorant, exposing the IDs in the domain seems like an odd thing to do, because it implies that the domain should handle assigning unique IDs to newly created entities. Should I be concerned about exposing the ID properties of entities?
I realize these are kind of open ended design questions, but I am trying to do my best to stick to DDD design patterns while using EF as my persistence layer.
Thanks in advance!
On 1: I'm not all that familiar with EF but using the code-first/convention based mapping approach, I'd assume it's not too hard to map POCOs with getters and setters (even keeping that "DbContext with DbSet properties" class in another project shouldn't be that hard). I would not consider the POCOs to be the Aggregate Root. Rather they represent "the state inside an aggregate you want to persist". An example below:
// This is what gets persisted
public class TrainStationState {
    public Guid Id { get; set; }
    public string FullName { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    // ... more state here
}
// This is what you work with
public class TrainStation : IExpose<TrainStationState> {
    TrainStationState _state;

    public TrainStation(TrainStationState state) {
        _state = state;
        // You can also copy into member variables
        // the state that's required to make this
        // object work (think memento pattern).
        // Alternatively you could have a parameter-less
        // constructor and an explicit method
        // to restore/install state.
    }

    TrainStationState IExpose<TrainStationState>.GetState() {
        return _state;
        // Again, nothing stopping you from
        // assembling this "state object"
        // manually.
    }

    public void IncludeInRoute(TrainRoute route) {
        route.AddStation(_state.Id, _state.Latitude, _state.Longitude);
    }
}
Now, with regard to the aggregate life-cycle, there are two main scenarios:
Creating a new aggregate: You could use a factory, factory method, builder, constructor, ... whatever fits your needs. When you need to persist the aggregate, query for its state and persist it (typically this code doesn't reside inside your domain and is pretty generic).
Retrieving an existing aggregate: You could use a repository, a DAO, ... whatever fits your needs. It's important to understand that what you are retrieving from persistent storage is a state POCO, which you need to inject into a pristine aggregate (or use it to populate its private members). This all happens behind the repository/DAO facade. Don't muddle your call sites with this generic behavior.
On 2: Several things come to mind. Here's a list:
Aggregate Roots are consistency boundaries. What consistency requirements do you see between an Organization and an Employee?
Organization COULD act as a factory of Employee, without mutating the state of Organization.
"Ownership" is not what aggregates are about.
Aggregate Roots generally have methods that create entities within the aggregate. This makes sense because the roots are responsible for enforcing consistency within the aggregate.
On 3: Assign identifiers from the outside, get over it, move on. That does not imply exposing them, though (only in the state POCO).
The main problem with EF-DDD compatibility seems to be how to persist private properties. The solution proposed by Yves seems to be a workaround for the lack of EF power in some cases. For example, you can't really do DDD with the Fluent API, which requires the state properties to be public.
I've found that only mapping with .edmx files lets you keep domain entities pure. It doesn't force you to make things public or to add any EF-dependent attributes.
Entities should always be created by some aggregate root. See a great post of Udi Dahan: http://www.udidahan.com/2009/06/29/dont-create-aggregate-roots/
Always loading some aggregate and creating entities from there also solves a problem of attaching an entity to EF context. You don't need to attach anything manually in that case. It will get attached automatically because aggregate loaded from the repository is already attached and has a reference to a new entity. While repository interface belongs to the domain, repository implementation belongs to the infrastructure and is aware of EF, contexts, attaching etc.
I tend to treat autogenerated IDs as an implementation detail of the persistent store, one that has to be considered by the domain entity but shouldn't be exposed. So I have a private ID property that is mapped to the autogenerated column, and another, public ID which is meaningful for the domain, like an Identity Card ID or Passport Number for a Person class. If there is no such meaningful data, I use the Guid type, which has the great feature of creating (almost) unique identifiers without needing database calls.
So in this pattern I use those Guid/MeaningfulID values to load aggregates from a repository, while the autogenerated IDs are used internally by the database to make joins a bit faster (Guid is not good for that).
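Translated into JPA terms (this thread is about EF, so the following is only an illustrative sketch with made-up names), the two-ID idea could look like this:
@Entity
public class Person {
    @Id
    @GeneratedValue
    private Long dbId;          // surrogate key: used for joins, never exposed

    @Column(nullable = false, unique = true, updatable = false)
    private UUID publicId = UUID.randomUUID();   // the ID the domain actually exposes

    public UUID getPublicId() { return publicId; }
    // intentionally no accessor for dbId
}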

JPA, pattern or anti-pattern: to have both flat and related sets of mappings?

This question concerns using JPA to manage some data where some scenarios benefit from the full object model and others seem to be better implemented by a much flatter model. I'm therefore inclined to create two models. I get the feeling that this is not a good idea but I'm hard-pressed to see exactly why, or what the alternatives may be.
The basic scenario is that there is an entity, let's call it A, which is on the many side of a relationship with entity B. So in the database A has a foreign key field, and in the full object model we see (simplified, getters/setters removed):
public class A {
    public int aKey;
    public B b;
    // more attributes
}
public class B {
    public int bKey;
    public List<A> collectionOfA;
    // and more
}
One particular scenario is handling the arrival into the system of new As. They come from some external source in the form of, say, text files. The insertion code needs to:
for each CSV record
    get the bKey from the record
    find the B, or manage any error
    create the A, setting the B
    persist
Now in fact my scenario is more complex, there are several such relationships, so that find/set pairing is repeated several times.
Alternatively I could create (and in fact have created) a second mapping for the A table:
public class Ainserter {
    public int aKey;
    public int bKey;
    // more attributes
}
Now I just set the two values and persist. This does assume that the DB will have the referential integrity constraints, but with the tooling I'm using that is the case. In this, and in many legacy systems the DB pre-exists and may be accessed from both the new JPA code and other even non-Java code. I therefore don't see a reason to put the referential integrity checking in the JPA code in such simple cases.
I can see that potentially there are opportunities for aspects of the full model to become stale with respect to my insertions, but in a legacy environment there could be insertions happening in the DB itself at any time. So I don't see a new problem here.
I can also see potential for confusion if the same Entity Context were used for both models, but that can be avoided by suitable encapsulation.
Any other thoughts?
Edit:
There is a suggestion from axtavt to use EntityManager.getReference(B.class, bKey) to get the B instance. My understanding is that if I do this, then to properly conform to the JPA programming model I am supposed to set both sides of the relationship, hence I would need to visit the "referenced" B object and add my A to its collection.
Edited again:
I was concerned that visiting B would cause a database lookup, so in performance terms I would not get the win. I have it on very good authority that OpenJPA, at least, will in fact not need to "inflate" B if we only access B's key and the collection of As, so getReference() is a good suggestion. It seems reasonable that a well-designed JPA implementation would have such optimisations.
JPA has an EntityManager.getReference() method, which basically combines the approaches you describe.
It takes a primary key and returns a proxy object with that primary key, without hitting the database. So you can use that object to initialize the relationship field, exactly as you want to do in your second approach.
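Applied to the insertion loop from the question, a minimal sketch (assuming an EntityManager em and that the keys come from the CSV record) could look like this:
// For each CSV record: link the new A to B without loading B.
B bRef = em.getReference(B.class, bKey);   // proxy only; no SELECT issued here

A a = new A();
a.aKey = aKey;       // assuming the key comes from the record rather than being generated
a.b = bRef;          // only B's primary key is needed when the INSERT is flushed
// ... set the other attributes from the record ...
em.persist(a);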