How do I express a polymorphic association in JPA? - jpa

A polymorphic association is similar to a foreign key or many-to-one relationship, with the difference being that the target might be one of a number of types (classes in the language, tables in the db).
I'm porting a database design I've been using for some years from PHP to Java. In the old code, I had rolled my own ORM, which wasn't optimal for a number of reasons. Although I might start to tweak things later, and maybe end up implementing things myself again, for now I'd like to use an off-the-shelf ORM and JPA on my entity classes.
Now, there's one thing about the database layout that I don't know how to express in JPA:
I have a Node and an Edge table storing a graph (a DAG, if it matters). Each node may optionally reference one other entity from the database. These entities may be referenced multiple times throughout the graph, and there may also be "orphaned" entities, which wouldn't be accessible to the user but which may make sense to keep, at least for a while.
These objects are not at all related in terms of inheritance etc. but have a natural hierarchy, similar to Customer->Site->Floor->Room. In fact, years ago, I started out with just foreign key fields pointing to the "parent" objects. However, this hierarchy isn't flexible enough and started falling apart.
For example, I want to allow users to group objects in folders, some objects can have multiple "parents", and the relations also change over time. I need to keep track of how the relations used to be, so the edges of the graph have a timespan associated with them that states from when to when that edge was valid.
The link from a node to an object is stored in two columns of the node table: one carries the id in the foreign table, the other carries that table's name. For example (some columns omitted):
table Node:
+--------+-------+----------+
| ixNode | ixRef | sRefType |
+--------+-------+----------+
|      1 |  NULL | NULL     |  <-- this is what a "folder" would look like
|      2 |    17 | Source   |
|      3 |    58 | Series   |  <-- there are seven types of related objects so far
+--------+-------+----------+
table Source (excerpt):
+----------+--------------------+
| ixSource | sName              |
+----------+--------------------+
|       16 | 4th floor breaker  |
|       17 | 5th floor breaker  |
|       18 | 6th floor breaker  |
+----------+--------------------+
There might be a different solution than using JPA. I could change something about the table layout or introduce a new table etc. However, I have thought about this a lot already and the table structure seems OK to me. Maybe there's also a third way that I didn't think of.

I think you've already hit on an answer. Create an abstract class (either @Entity or @MappedSuperclass) and have the different types extend it.
Something like this might work:
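import java.util.Collection;
import javax.persistence.*;

// The common base has to be an @Entity here: a @MappedSuperclass
// cannot be the target of a relationship such as @ManyToOne.
@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
public abstract class Edge {
    // . . .
    @OneToMany(mappedBy = "edge")   // inverse side of Node.edge
    Collection<Node> nodes;
}

@Entity
public class Source extends Edge {
}

@Entity
public class Series extends Edge {
}

@Entity
public class Node {
    // . . .
    @ManyToOne
    Edge edge;
}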
I understand you might not want to imply a relationship between the Source and Series, but extending a common abstract (table-less) class is the only way I can think of to do what you want.
InheritanceType.TABLE_PER_CLASS will keep Source and Series in separate tables (you could use SINGLE_TABLE to do something like the previous answer).
If this isn't what you're looking for, many JPA providers provide a tool that creates mappings based on an existing set of tables. In OpenJPA it's called the ReverseMappingTool [1]. The tool will generate Java source files that you can use as a starting point for your mappings. I suspect Hibernate or EclipseLink have something similar, but you could just use the OpenJPA one and use the entity definitions with a different provider (the tool doesn't generate any OpenJPA specific code as far as I know).
[1] http://openjpa.apache.org/builds/latest/docs/manual/manual.html#ref_guide_pc_reverse

The answer would be:
inheritance (as already suggested by Mike)
plus @DiscriminatorColumn to tell the provider which column stores the information about which subclass to use: sRefType. The only doubt I have is that sRefType is a nullable column. I suspect that is not allowed.

Have you looked at the @Any annotation? It's not part of JPA but is a Hibernate annotation extension to it.
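To make the idea concrete, here is a library-free sketch of what resolving the (sRefType, ixRef) pair amounts to. It is not Hibernate's API and not a JPA mapping. The names RefResolver, NodeRow and SourceRow are made up for illustration, and the loaders stand in for whatever per-table lookup you already have.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongFunction;

/** Hypothetical stand-in for a row of the Source table. */
class SourceRow {
    final long ixSource;
    final String sName;

    SourceRow(long ixSource, String sName) {
        this.ixSource = ixSource;
        this.sName = sName;
    }
}

/** One row of the Node table; ixRef and sRefType are both null for a "folder". */
class NodeRow {
    final long ixNode;
    final Long ixRef;
    final String sRefType;

    NodeRow(long ixNode, Long ixRef, String sRefType) {
        this.ixNode = ixNode;
        this.ixRef = ixRef;
        this.sRefType = sRefType;
    }
}

/** Maps each discriminator value (e.g. "Source") to a lookup for that type. */
class RefResolver {
    private final Map<String, LongFunction<Object>> loaders = new HashMap<>();

    void register(String refType, LongFunction<Object> loader) {
        loaders.put(refType, loader);
    }

    /** Returns the referenced object, or empty for folders and dangling ids. */
    Optional<Object> resolve(NodeRow node) {
        if (node.ixRef == null || node.sRefType == null) {
            return Optional.empty();
        }
        LongFunction<Object> loader = loaders.get(node.sRefType);
        if (loader == null) {
            throw new IllegalStateException("Unknown sRefType: " + node.sRefType);
        }
        return Optional.ofNullable(loader.apply(node.ixRef));
    }
}
```

The nullable discriminator is handled explicitly in resolve, which is the case the previous answer was worried about.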

How much information is stored in the Source and Series tables? Is it just a name? If so, you could combine them into one table, and add a "type" column. Your Node table would lose its sRefType, and you would have a new table that looks like this:
ixSource  sName               sType
16        4th floor breaker   SOURCE
17        5th floor breaker   SOURCE
18        6th floor breaker   SOURCE
19        1st floor widget    SERIES
20        2nd floor widget    SERIES
This table would replace the Source and Series tables. Do Source and Series both belong to a superclass? That would be a natural name for this table.

Related

Should a birthdate/deathdate class be a composition or an aggregation of an individual class?

The entity is a person.
So the entity has a birthdate and may already have a deathdate.
But these dates may or may not be provided (depending on the entity and on the availability of the information), so the entity might have neither of them.
But I feel I'm making a mess of the cardinality and the relationship type.
How should I represent that?
I have created an abstract class Individual. It leads to two final classes: Person (an identified person) and Pseudonym (an anonymous person).
It is linked to a class Birthdate and a class Deathdate (both are generalized as a class Date).
[Birthdate]----<>[Individual] relationship is:
one (optional)-to-many (0..1 - 1..*)
0..1: because the birthdate can be omitted, and an individual can have only one date of birth.
1..*: because a birthdate must concern at least one individual, but can concern several.
[Deathdate]----<>[Individual] relationship is:
one (optional)-to-many (0..1 - 1..*)
0..1: because the individual may not be dead yet, and can die only once.
1..*: because a deathdate must concern at least one individual, but can concern several.
But since, theoretically, everyone has a birthdate (and will have a deathdate), I was tempted by a composition. But some might prefer to keep these dates secret, and I wondered whether composition could allow that.
Furthermore, one date can correspond to several individuals, and there too I guess composition isn't possible. Or else I'm the one confusing the Individual class with its instances (the individuals), in which case composition would be possible, but not with the aforementioned cardinality.
At the moment I have chosen this:
Aggregation:
___________                 _______________
|Birthdate|0..1-----1..*< >|               |
___________                | <<Individual>>|
|Deathdate|0..1-----1..*< >|_______________|
But I'm hesitating over this one:
Composition:
___________              _______________
|Birthdate|0..1-----1<#>|               |
___________             | <<Individual>>|
|Deathdate|0..1-----1<#>|_______________|
What is the right answer? Thanks for your attention.
There are a number of issues with the approach.
First, using a class for dates is simply overkill. Both birthdate and deathdate are attributes of a specific person and can easily be modelled as inline properties of the Individual class. Unless there is some significant reason to use something more than the good old Date DataType, stick with the standard approach.
As for the visibility issue, object-oriented principles say you should not expose the properties directly anyway. Instead, you should have an operation responsible for retrieving the birthdate and deathdate, which controls whether the date can be read. You may add boolean attributes to support that, but they aren't necessary if the ability to see the dates depends on some state of the Individual or on other things (e.g. "who" asks). In the former case you may still wish to show those boolean attributes explicitly, as derived ones.
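A minimal sketch of the first two points, in Java: the dates are plain attributes of Individual, and reading them goes through operations that decide whether to reveal them. The datesPublic flag and the constructor shape are assumptions for illustration only; the original model does not specify them.

```java
import java.time.LocalDate;
import java.util.Optional;

abstract class Individual {
    private final LocalDate birthDate;   // null when not provided
    private final LocalDate deathDate;   // null when not provided or still alive
    private final boolean datesPublic;   // hypothetical visibility flag

    protected Individual(LocalDate birthDate, LocalDate deathDate, boolean datesPublic) {
        this.birthDate = birthDate;
        this.deathDate = deathDate;
        this.datesPublic = datesPublic;
    }

    /** Empty when the date is unknown or the individual keeps it secret. */
    Optional<LocalDate> getBirthDate() {
        return datesPublic ? Optional.ofNullable(birthDate) : Optional.empty();
    }

    /** Empty when unknown, secret, or the individual has not died. */
    Optional<LocalDate> getDeathDate() {
        return datesPublic ? Optional.ofNullable(deathDate) : Optional.empty();
    }
}

final class Person extends Individual {
    Person(LocalDate birthDate, LocalDate deathDate, boolean datesPublic) {
        super(birthDate, deathDate, datesPublic);
    }
}
```

The 0..1 multiplicity on each date shows up as the Optional return type; no separate Birthdate or Deathdate class is needed.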
If you insist on using a class for dates (e.g. because you want Wikipedia-style "Born on date"/"Deceased on date" collections), you should create just one class Date and build associations to this class, much as you did in your second approach. In such a situation, the multiplicity does not work "database style" but is a property of the association itself. In a particular association you have one birthdate/deathdate and one Individual. By default you will have two 1-0..1 associations, one for each date, but depending on the approach you may end up with something much more complex as well.
I'll later add diagrams for more clarity.
One last remark.
Do not use << >> for the class name. Those are reserved to indicate stereotypes.
If you want to indicate that Individual is abstract either show it in italics or (if your tool doesn't allow that) use <<abstract>> stereotype.

Entity Framework: Doing JOINs without having to create Entities

Just starting out with Entity Framework (Code First), and I have to say I am having a lot of problems with it when loading SQL data that is fairly complex. For example, let's say I have the following tables, which store which animals belong to which regions of the world; the animals are also categorized.
Table: Region
  Id: integer
  Name: string
Table: AnimalCategory
  Id: integer
  Name: string
  RegionId: integer -- refers back to Region
Table: Animal
  Id: integer
  AnimalCategoryId: integer -- refers back to AnimalCategory
Let's say I want to create a query with Entity Framework that would load all Animals for a specific region. The easiest thing to do is to create 3 Entities Region, AnimalCategory, and Animal and use LINQ to load the data.
But let's say I am not interested in loading any AnimalCategory information, and I don't want to define an Entity class to represent AnimalCategory just so that I can do the JOIN. How can I do this with Entity Framework? Even with its many mapping functions, I still don't think this is possible.
In non Entity Framework solutions this is easy to accomplish by using INNER JOINs in SPs or inline SQL. So what are my options in Entity Framework? Shall I pollute my data model with these useless tables just so I can do a JOIN?
It's a matter of choice, I guess. EF chose to support many-to-many associations with transparent junction tables, i.e. where junction tables only have two foreign keys to the associated entities. They simply chose not to support this far less common "skipping one-to-many-to-many" scenario in a similar manner.
And I can imagine why.
To start with, in a many-to-many association, the junction table is nothing but that: a junction, an association. However, in a chain of one-to-many (or many-to-one) associations it would be exceptional for any of the involved tables to be just an association. In your example...
Animal → AnimalCategory → Region
...AnimalCategory would only have a primary key (Id) and a foreign key (RegionId). That would be useless though: Animal might just as well have a RegionId itself. There's no reason to support a data model that doesn't make sense.
What you're after though, is a model in which the table in the middle does carry information (AnimalCategory.Name), but where you'd like to map it as a transparent junction table, because a particular class model doesn't need this information.
Your focus seems to be on reading data. But EF has to support all CRUD actions. The problem here would be: how to deal with inserts? Suppose Name is a required field. There would be no way to supply its value.
Another problem would be that a statement like...
region.Animals.Add(animal);
...could mean two things:
Add an Animal and a new AnimalCategory, the latter referring to the Region.
Add an Animal referring to an existing AnimalCategory, without being able to choose which one.
EF wouldn't want to pick some default behavior. You'd have to make the choice yourself, so you can't do without access to AnimalCategory.
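For the read side only, the shape of the query can be sketched outside EF. The following is plain Java over in-memory lists, not Entity Framework code; the class AnimalQueries and its method are hypothetical. It shows that even a read that returns only Animals still has to go through AnimalCategory to reach the RegionId.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class Animal {
    final int id;
    final int animalCategoryId;

    Animal(int id, int animalCategoryId) {
        this.id = id;
        this.animalCategoryId = animalCategoryId;
    }
}

class AnimalCategory {
    final int id;
    final String name;
    final int regionId;

    AnimalCategory(int id, String name, int regionId) {
        this.id = id;
        this.name = name;
        this.regionId = regionId;
    }
}

class AnimalQueries {
    /** Animal -> AnimalCategory -> Region, returning only the animals. */
    static List<Animal> animalsInRegion(List<Animal> animals,
                                        List<AnimalCategory> categories,
                                        int regionId) {
        // First hop: the categories that belong to the region.
        Set<Integer> categoryIds = categories.stream()
                .filter(c -> c.regionId == regionId)
                .map(c -> c.id)
                .collect(Collectors.toSet());
        // Second hop: the animals in any of those categories.
        return animals.stream()
                .filter(a -> categoryIds.contains(a.animalCategoryId))
                .collect(Collectors.toList());
    }
}
```

The category's Name never appears in the result, but the category rows themselves cannot be skipped.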

EF Table-per-hierarchy mapping

In trying to normalize a database schema and mapping it in Entity Framework, I've found that there might end up being a bunch of lookup tables. They would end up only containing key and value pairs. I'd like to consolidate them into one table that basically has two columns "Key" and "Value". For example, I'd like to be able to get Addresses.AddressType and Person.Gender to both point to the same table, but ensure that the navigation properties only return the rows applicable to the appropriate entity.
EDIT: Oops. I just realized that I left this paragraph out:
It seems like a TPH type of problem, but all of the reading I've done indicates that you start with fields in the parent entity and migrate fields over to the inherited children. I don't have any fields to move here because there would generally only be two.
There are a lot of domain-specific key-value pairs that need to be represented. Some of them will change from time to time, others will not. Rather than pick and choose, I want to just make everything editable. Due to the number of these kinds of properties that are going to be used, I'd rather not have to maintain a list of enums that require a recompile, or end up with lots of lookup tables. So, I thought that this might be a solution.
Is there a way to represent this kind of structure in EF4? Or, am I barking up the wrong tree?
EDIT: I guess another option would be to build the table structure I want at the database level and then write views on top of that and surface those as EF entities. It just means any maintenance needs to be done at multiple levels. Does that sound more or less desirable than a pure EF solution?
Table per hierarchy demands that you have one parent entity which is used as the base class for the child entities. All entities are mapped to the same table, and a special discriminator column distinguishes the type of entity stored in each database record. You can generally use it even if your child entities do not define any new properties. You will also have to define a primary key for your table; otherwise EF will treat it as a read-only entity. So your table can look like:
CREATE TABLE KeyValuePairs
(
    Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    -- KEY is a reserved word in T-SQL, so the column name is bracketed
    [Key] VARCHAR(50) NOT NULL,
    [Value] NVARCHAR(255) NOT NULL,
    Discriminator VARCHAR(10) NOT NULL,
    [Timestamp] TIMESTAMP NOT NULL
)
You will define your top level KeyValuePair entity with properties Id, Key, Value and Timestamp (set as concurrency mode fixed). Discriminator column will be used for inheritance mapping.
Be aware that EF mapping is static. If you define AddressType and Gender entities you will be able to use them but you will not be able to dynamically define new type like PhoneType. This will always require modifying your EF model, recompiling and redeploying your application.
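The static nature of the mapping can be pictured with a small sketch. This is plain Java, not EF, and KeyValuePairFactory is a hypothetical name; it only mimics what a discriminator-based mapping does when a row is materialized.

```java
abstract class KeyValuePair {
    final int id;
    final String key;
    final String value;

    protected KeyValuePair(int id, String key, String value) {
        this.id = id;
        this.key = key;
        this.value = value;
    }
}

final class AddressType extends KeyValuePair {
    AddressType(int id, String key, String value) {
        super(id, key, value);
    }
}

final class Gender extends KeyValuePair {
    Gender(int id, String key, String value) {
        super(id, key, value);
    }
}

class KeyValuePairFactory {
    /** Picks the subclass from the discriminator column of one row. */
    static KeyValuePair materialize(int id, String key, String value, String discriminator) {
        switch (discriminator) {
            case "AddressType":
                return new AddressType(id, key, value);
            case "Gender":
                return new Gender(id, key, value);
            default:
                // A new kind such as "PhoneType" needs a new class and a new
                // case here, i.e. a code change and a redeploy.
                throw new IllegalArgumentException("Unmapped discriminator: " + discriminator);
        }
    }
}
```

The set of known discriminator values is fixed at compile time, which is the limitation described above.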
From an OOP perspective it would be nicer not to model this as an object hierarchy and instead use conditional mapping of multiple unrelated entities to the same table. Unfortunately, even though EF supports conditional mapping, I have never yet been able to map two entities to the same table.

Sort order in Core Data with a multi-multi relationship

Say I'm modeling a school, so I have 2 Entities: Student and Class. For whatever reason, I want each class roster to have a custom sort order. In a simple relationship, this would mean giving Student a sortOrder attribute and just sorting the list by this number. Issue is, a Student might be order 3 in one Class and order 6 in another. How would I store these orderings in Core Data in a way that I can easily access them and sort my lists properly?
Student                   Class
classes <<--------->> students
   ^                      ^
   |                      |
unordered              ordered
This diagram might help explain what I'm trying to do. The students "roster" I would want to be fetched in a specific order stored somewhere, which could be any ordering. Storing this ordering is what I'm not sure how to do in a way that's the most efficient. Creating a bunch of Order objects and trying to manage the links sounds like a lot of overhead, and it feels like there must be a better way.
If the ordering of students can be described by one or more NSSortDescriptors, you could create a fetched property on the Class entity that fetches the students and applies the sort descriptor. Alternatively, it may be easier (depending on your use case) to apply the sort descriptor(s) to the NSFetchedResultsController that you're using to deal with the class' students collection.
If you can't use an NSSortDescriptor, then you'll need an index attribute (or a name of your choice) on the Student entity if there's only one ordering, or a collection of Order entities describing the index in each ordering for each Student. You'll be responsible for maintaining these index values. Unfortunately, there's no easy way to do this in Core Data. It's just a lot of work.
Student <<->> StudentClass <<->> Class
StudentClass
----
studentID
order
classID
Then you can select as necessary.
For example, you have a student. Fetch all StudentClass where StudentID is student.studentID. You then have the order, as well as access to the Class.
You'll likely want to add some business logic to make your life easier. Also, if you're not already using it, take a peek at MOGenerator: https://github.com/rentzsch/mogenerator
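As a rough illustration of the join-entity idea, here is a sketch in plain Java rather than Core Data. StudentClass mirrors the three attributes above; the Rosters helper is hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Join row: one student's position in one class. */
class StudentClass {
    final String studentID;
    final String classID;
    final int order;

    StudentClass(String studentID, String classID, int order) {
        this.studentID = studentID;
        this.classID = classID;
        this.order = order;
    }
}

class Rosters {
    /** Student ids of one class, sorted by that class's own ordering. */
    static List<String> roster(List<StudentClass> links, String classID) {
        return links.stream()
                .filter(link -> link.classID.equals(classID))
                .sorted(Comparator.comparingInt((StudentClass link) -> link.order))
                .map(link -> link.studentID)
                .collect(Collectors.toList());
    }
}
```

Because the order lives on the link rather than on the student, the same student can be third in one class and sixth in another.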
EDIT: I'd really like to know why this is getting voted down. Comments would be much appreciated.

Entity Framework many-to-many question

Please help an EF n00b design his database.
I have several companies that produce several products, so there's a many-to-many relationship between companies and products. I have an intermediate table, Company_Product, that relates them.
Each company/product combination has a unique SKU. For example Acme widgets have SKU 123, but Omega widgets have SKU 456. I added the SKU as a field in the Company_Product intermediate table.
EF generated a model with a 1:* relationship between the company and Company_Product tables, and a 1:* relationship between the product and Company_Product tables. I really want a *:* relationship between company and product. But, most importantly, there's no way to access the SKU directly from the model.
Do I need to put the SKU in its own table and write a join, or is there a better way?
I just tested this in a new VS2010 project (EFv4) to be sure, and here's what I found:
When your associative table in the middle (Company_Product) has ONLY the 2 foreign keys to the other tables (CompanyID and ProductID), then adding all 3 tables to the designer ends up modeling the many to many relationship. It doesn't even generate a class for the Company_Product table. Each Company has a Products collection, and each Product has a Companies collection.
However, if your associative table (Company_Product) has other fields (such as SKU, its own primary key, or other descriptive fields like dates, descriptions, etc.), then the EF modeler will create a separate class, and it does what you've already seen.
Having the class in the middle with 1:* relationships out to Company and Product is not a bad thing, and you can still get the data you want with some easy queries.
// Get all products for Company with ID = 1
var q =
from compProd in context.Company_Product
where compProd.CompanyID == 1
select compProd.Product;
True, it's not as easy to just navigate the relationships of the model, when you already have your entity objects loaded, for instance, but that's what a data layer is for. Encapsulate the queries that get the data you want. If you really want to get rid of that middle Company_Product class, and have the many-to-many directly represented in the class model, then you'll have to strip down the Company_Product table to contain only the 2 foreign keys, and get rid of the SKU.
Actually, I shouldn't say you HAVE to do that...you might be able to do some edits in the designer and set it up this way anyway. I'll give it a try and report back.
UPDATE
Keeping the SKU in the Company_Product table (meaning my EF model had 3 classes, not 2; it created the Company_Product class, with a 1:* to the other 2 tables), I tried to add an association directly between Company and Product. The steps I followed were:
Right click on the Company class in the designer
Add > Association
Set "End" on the left to be Company (it should be already)
Set "End" on the right to Product
Change both multiplicities to "* (Many)"
The navigation properties should be named "Products" and "Companies"
Hit OK.
Right Click on the association in the model > click "Table Mapping"
Under "Add a table or view" select "Company_Product"
Map Company -> ID (on left) to CompanyID (on right)
Map Product -> ID (on left) to ProductID (on right)
But, it doesn't work. It gives this error:
Error 3025: Problem in mapping fragments starting at line 175:Must specify mapping for all key properties (Company_Product.SKU) of table Company_Product.
So that particular association is invalid, because it uses Company_Product as the table, but doesn't map the SKU field to anything.
Also, while I was researching this, I came across this "Best Practice" tidbit from the book Entity Framework 4.0 Recipes (note that for an association table with extra fields besides the 2 FKs, they refer to the extra fields as the "payload". In your case, SKU is the payload in Company_Product).
Best Practice
Unfortunately, a project that starts out with several, payload-free, many-to-many relationships often ends up with several, payload-rich, many-to-many relationships. Refactoring a model, especially late in the development cycle, to accommodate payloads in the many-to-many relationships can be tedious. Not only are additional entities introduced, but the queries and navigation patterns through the relationships change as well. Some developers argue that every many-to-many relationship should start off with some payload, typically a synthetic key, so the inevitable addition of more payload has significantly less impact on the project.
So here's the best practice. If you have a payload-free, many-to-many relationship and you think there is some chance that it may change over time to include a payload, start with an extra identity column in the link table. When you import the tables into your model, you will get two one-to-many relationships, which means the code you write and the model you have will be ready for any number of additional payload columns that come along as the project matures. The cost of an additional integer identity column is usually a pretty small price to pay to keep the model more flexible.
(From Chapter 2. Entity Data Modeling Fundamentals, 2.4. Modeling a Many-to-Many Relationship with a Payload)
Sounds like good advice. Especially since you already have a payload (SKU).
I would just like to add the following to Samuel's answer:
If you want to directly query from one side of a many-to-many relationship (with payload) to the other, you can use the following code (using the same example):
Company c = context.Companies.First();
// Company_Products is an in-memory navigation collection, so Select yields IEnumerable<Product>
IEnumerable<Product> products = c.Company_Products.Select(cp => cp.Product);
The products variable would then be all Product records associated with the Company c record. If you would like to include the SKU for each of the products, you could use an anonymous class like so:
var productsWithSKU = c.Company_Products.Select(cp => new {
ProductID = cp.Product.ID,
Name = cp.Product.Name,
Price = cp.Product.Price,
SKU = cp.SKU
});
foreach (var
You can encapsulate the first query in a read-only property for simplicity like so:
public partial class Company
{
    // Read-only convenience property; "property" is not a C# keyword, so it is dropped here.
    public IEnumerable<Product> Products
    {
        get { return Company_Products.Select(cp => cp.Product); }
    }
}
You can't do that with the query that includes the SKU because you can't return anonymous types. You would have to have a definite class, which would typically be done by either adding a non-mapped property to the Product class or creating another class that inherits from Product that would add an SKU property. If you use an inherited class though, you will not be able to make changes to it and have it managed by EF - it would only be useful for display purposes.
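The "definite class" option can be sketched as follows. This is plain Java rather than C#/EF, and every name in it (ProductWithSku, productsWithSku, the field layout) is an assumption for illustration. It shows a small read-only result type that carries the product data together with the payload from the link row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class Product {
    final int id;
    final String name;

    Product(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Link row between a company and a product, carrying the SKU payload. */
class CompanyProduct {
    final Product product;
    final String sku;

    CompanyProduct(Product product, String sku) {
        this.product = product;
        this.sku = sku;
    }
}

/** Named, read-only projection used instead of an anonymous type. */
class ProductWithSku {
    final int productId;
    final String name;
    final String sku;

    ProductWithSku(int productId, String name, String sku) {
        this.productId = productId;
        this.name = name;
        this.sku = sku;
    }
}

class Company {
    final List<CompanyProduct> companyProducts = new ArrayList<>();

    /** Projects each link row into the named result type. */
    List<ProductWithSku> productsWithSku() {
        return companyProducts.stream()
                .map(cp -> new ProductWithSku(cp.product.id, cp.product.name, cp.sku))
                .collect(Collectors.toList());
    }
}
```

Like the inherited-class variant described above, such a projection is for display only; changes to it are not tracked by the ORM.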
Cheers. :)