I am currently reading about the possibility about using inheritance with Entity Framework. Sometimes I use a approch to type data records and I am not sure if I would use TPT or TPH or none...
For example...
I have a ecommerce shop which adds shipping, billing, and delivery address
I have a address table:
RecordID
AddressTypeID
Street
ZipCode
City
Country
and a table AddressType
RecordID
AddressTypeDescription
The table design differs to the gerneral design when people show off TPT or TPH...
Does it make sense to think about inheritance an when having a approach like this..
I hope it makes sense...
Thanks for any help...
When considering how to represent inheritance in the database, you need to consider a few things.
If you have many different sub classes you can have a lot of extra joins in queries involving those more complex types which can hurt performance. One big advantage of TPH is that you query one table for all types in the hierarchy and this is a boon for performance, particularly for larger hierarchies. For this reason i tend to favour that approach in most scenarioes
However, TPH means that you can no longer have NOT NULL fields for sub types as all fields for all types are in a single table, pushing the responsibility for data integrity towards your application. Although this may sound horrible in practice i haven't found this to be too big a restriction.
However i would tend to use TPT if there were a lot of fields for each type and that the number of types in the hierarchy was likely to be small, meaning that performance was not so much of an issue with the joins, and you get better data integrity.
Note that one of the advantages of EF and other ORMs is that you can change your mind down the track without impacting your application so the decision doesn't need to be completely carved in stone.
In your example, it doesn't appear to have an inheritance relationship, it looks like a one to many from the address type to the addresses
This would be represented between your classes something like the following:
Address.AddressType
AddressType.Addresses
As Keith hints, this article suggests TPT in EF scales horribly, but I haven't tried it myself.
Related
I'm using EF 6 Code First, and I'm facing a design decision:
The entity model
I have an abstract entity which is realized by several concrete ones, each of which add several attributes to the entity. I have polymorphic associations (base class has several related entities), so when mapping this in EF, it makes a clear case for TPT inheritance. (More info here for readers: http://weblogs.asp.net/manavi/archive/2010/12/28/inheritance-mapping-strategies-with-entity-framework-code-first-ctp5-part-2-table-per-type-tpt.aspx)
The issue with TPT design
What I'm concerned about, is that I have several cases where I only need to process data on the base class. To be specific, I have several scheduled batch processes that need to load the entities and perform calculations and updates.
TPT inheritance will cause performing JOIN on all sub-type tables, which is unnecessary for my case.
Second design option
My second option is to create separate concrete entities for parent and child types, and forget about the inheritance altogether.
From the SQL point of view, these designs will be equivalent. From the object-oriented point of view, this option is less correct. Plus, reaching child entities from the parent is not clean enough.
When to use which design?
Personally, I'm leaning toward the second design (associations) because most of my cases would be easier and simpler. (Less logic handled by the ORM tool means more control and simplicity).
My question is when to use which design.
I am using EF with TPC and I have a multiple inheritance lets say I have
Employee (abstract)
Developer (inherits from Employee)
SeniorDeveloper (inherits from Developer)
I inserted some rows in the database and EF reads them correctly.
BUT
When I insert a new SeniorDeveloper, the values get written to the SeniorDeveloper AND Developer database table, hence querying just the Developers (context.Employees.OfType()) also gets the recently added SeniorDevelopers.
Is there a way to tell EF, that it should store only in one table, or why does EF fall back to TPT strategy?
Since it doesn't look like EF supports the multiple inheritance with TPC, we ended up using TPC for Employee to Developer and TPT between Developer and SeniorDeveloper...
I believe there is a reason for this, although I may not see the full picture and might just be speculating.
The situation
Indeed, the only way (that I see) for EF to be able to list only the non-senior developers (your querying use-case) in a TPT scenario by reading only the Developer table would be by using a discriminator, and we know that EF doesn't use one in TPT/TPC strategies.
Why? Well, remember that all senior developers are developers, so it's only natural (and necessary) that they have a Developer record as well as a SeniorDeveloper record.
The only exception is if Developer is an abstract type, in which case you can use a TPC strategy to remove the Developer table altogether. In your case however, Developer is concrete.
The current solution
Remembering this, and without a discriminator in the Developer table, the only way to determine if any developer is a non-senior developer is by checking if it is not a senior developer; in other words, by verifying that there is no record of the developer in the SeniorDeveloper table, or any other subtype table.
That did sound a little obvious, but now we understand why the SeniorDeveloper table must be used and accessed when its base type (Developer) is concrete (non-abstract).
The current implementation
I'm writing this from memory so I hope it isn't too off, but this is also what Slauma mentioned in another comment. You probably want to fire up a SQL profiler and verify this.
The way it is implemented is by requesting a UNION of projections of the tables. These projections simply add a discriminator declaring their own type in some encoded way[1]. In the union set, the rows can then be filtered based on this discriminator.
[1] If I remember correctly, it goes something like this: 0X for the base type, 0X0X for the first subtype in the union, 0X1X for the second subtype, and so on.
Trade-off #1
We can already identify a trade-off: EF can either store a discriminator in the table, or it can "generate one" at "run time".
The disadvantages of a stored discriminator are that it is less space efficient, and possibly "ugly" (if that's an argument). The advantages are lookup performance in a very specific case (we only want the records of the base type).
The disadvantages of a "run time" discriminator are that lookup performance is not as good for that same use-case. The advantages are that it is more space efficient.
At first sight, it would seem that sometimes we might prefer to trade a little bit of space for query performance, and EF wouldn't let us.
In reality, it's not always clear when; by requesting a UNION of two tables, we just lookup two indexes instead of one, and the performance difference is negligible. With a single level of inheritance, it can't be worse than 2x (since all subtype sets are disjoint). But wait, there's more.
Trade-off #2
Remember that I said the performance advantage of the stored-discriminator approach would only appear in the specific use-case where we lookup records of the base type. Why is that?
Well, if you're searching for developers that may or may not be senior developers[2], you're forced to lookup the SeniorDeveloper table anyway. While this, again, seems obvious, what may be less obvious is that EF can't know in advance if the types will only be of one type or another. This means that it would have to issue two queries in the worst case: one on the Developer table, and if there is even one senior developer in the result set, a second one on the SeniorDeveloper table.
Unfortunately, the extra roundtrip probably has a bigger performance impact than a UNION of the two tables. (I say probably, I haven't verified it.) Worse, it increases for each subtype for which there is a row in the result set. Imagine a type with 3, or 5, or even 10 subtypes.
And that's your trade-off #2.
[2] Remember that this kind of operation could come from any part of your application(s), while the resolving the trade-off must be done globally to satisfy all processes/applications/users. Also couple that with the fact that the EF team must make these trade-offs for all EF users (although it is true that they could add some configuration for these kinds trade-offs).
A possible alternative
By batching SQL queries, it would be possible to avoid the multiple roundtrips. EF would have to send some procedural logic to the server for the conditional lookups (T-SQL). But since we already established in trade-off #1 that the performance advantage is most likely negligible in many cases, I'm not sure this would ever be worth the effort. Maybe someone could open an issue ticket for this to determine if it makes sense.
Conclusion
In the future, maybe someone can optimize a few typical operations in this specific scenario with some creative solutions, then provide some configuration switches when the optimization involves such trade-offs.
Right now however, I think EF has chosen a fair solution. In a strange way, it's almost cleaner.
A few notes
I believe the use of union is an optimization applied in certain cases. In other cases, it would be an outer join, but the use of a discriminator (and everything else) remains the same.
You mentioned multiple inheritance, which sort of confused me initially. In common object-oriented parlance, multiple inheritance is a construct in which a type has multiple base types. Many object-oriented type systems don't support this, including the CTS (used by all .NET languages). You mean something else here.
You also mentioned that EF would "fallback" to a TPT strategy. In the case of Developer/SeniorDeveloper, a TPC strategy would have the same results as a TPT strategy, since Developer is concrete. If you really want a single table, you must then use a TPH strategy.
I need to model a large number of tables in my domain... I am trying to figure out how to correctly normalize the following:
I have Address Entity which is Abstract
StreetAddress and POBoxAddress are derived from Address
Many other entities in this domain will need a collection of addresses for instance:
Vendor.Addresses
CondoComplex.Addresses
Employee.Addresses
PositionShift.Addresses
Location.Addresses
Guest.Addresses
Property.Addresses
Owner.Addresses
etc... many other enities... So I am confused on how to store these associations in EF ? As a many to many tph with a discriminator column or am I just missing the forest for the trees and there is an less complex solution ?
Inheritance is overused in entity mappings (indeed, it's overused in general, but especially in the case of mapped objects, which aren't really strongly OO in the first place and shouldn't generally have behaviors). Most of the time, you should avoid it, as it makes queries and data structures far more complicated. Do you really need two different types for StreetAddress and POBoxAddress? Why? The Post Office won't care.
There needs to be a clear, compelling case for inheritance before you take that complexity into your model. In this case, you not only don't have it, but your question indicates that you have a strong case for not using inheritance here at all.
Leaving the case of complexity of your Address entity aside, if related entities are not going to share Addresses I don't think there's a point in creating many to many relationship. I'm not sure how your abstract Address class looks like, but I would go simply with
public List<Address> Addresses {get;set;}
in your entities
So I heard L2S is going the way of the dodo bird. I am also finding out that if I use L2S, I will have to write multiple versions of the same code to target different schemas even if they vary slightly. I originally chose L2S because it was reliable and easy to learn, while EF 3 wasn't ready for public consumption at the time.
After reading lots of praises for EF 4.1, I thought I would do a feasibility test. I discovered that EF 4.1 is a beast to get your head around. It is mindnumblingly complex with hundreds of ways of doing the same thing. It seems to work fine if you're planning on using simple table-to-object mapped entities, but complex POCO object mapping has been a real PITA. There are no good tutorials and the few that exist are very rudimentary.
There are tons of blogs about learning the fundamentals about EF 4.1, but I have a feeling that they deliberately avoid advanced topics. Are there any good tutorials on more complex mapping scenarios? For instance, taking an existing POCO object and mapping it across several tables, or persisting a POCO object that is composed of other POCO objects? I keep hearing this is possible, but haven't found any examples.
Disclaimer: IMO EF 4.1 is best known for its Code-First approach. Most of the following links point to articles about doing stuff in code-first style. I'm not very familiar with DB-First or Model-First approaches.
I have learned many things from Mr. Manavi's blog. Especially, the Inheritance with code-first series was full of new stuff for me. This MSDN link has some valuable links/infos about different mapping scenarios too. Also, I have learned manu stuff by following or answering questions with entity-framework tags here on SO.
Whenever I want to try some new complex object mapping, I do my best (based on my knowledge about EF) to create the correct mappings; However sometimes, you face a dead end. That's why god created StackOverflow. :)
What do you mean by EFv4.1? Do you mean overhyped code-first / fluent-API? In such case live with a fact that it is mostly for simple mapping scenarios. It offers more then L2S but still very little in terms of advanced mappings.
The basic mapping available in EF follows basic rule: one table = one entity. Entity can be single class or composition of the main class representing the entity itself and helper classes for set of mapped fields (complex types).
The most advanced features you will get with EF fluent-API or designer are:
TPH inheritance - multiple tables in inheritance hierarchy mapped to the same table. Types are differed by special column called discriminator. Shared fields must be in parent class.
TPT inheritance - each type mapped to the separate table = basic type has one table and each derived type has one table as well. Shared fields must be defined in base type and thus in base table. Relation between base and derived table is one-to-one. Derived entities span multiple tables.
TPC inheritance - each class has separate table = shared fields must be defined in base type but each derived type has them in its own table.
Entity splitting - entity is split into two or more tables which are related by one-to-one relation. All parts of entity must exist.
Table splitting - table is split into two or more entities related with one-to-one relation.
Designer also offers
Conditional mapping - this is not real mapping. It is only hardcoded filter on mapping level where you select one or more fields to restrict records which are allowed for loading.
When using basic or more advanced features table can participate only in one mapping.
All these mapping techniques follow very strict rules. Your classes and tables must follow these rules to make them work. That means you cannot take arbitrary POCO and map it to multiple tables without satisfying those rules.
These rules can be avoided only when using EDMX and advanced approach with advanced skills = no fluent API and no designer but manual modifications of XML defining EDMX. Once you go this way you can use
Defining query - custom SQL query used to specify loading of new "entity". This is also approach natively used by EDMX and designer when mapping database view
Query view - custom ESQL query used to specify new "entity" from already mapped entities. It is more usable for predefined projections because in contrast to defining query it has some limitations (for example aggregations are not allowed).
Both these features allow you defining classes combined from multiple tables. The disadvantage of both these mapping techniques is that mapped result is read only. You must use stored procedures for persisting changes when using these techniques.
I thought I'll post this to the community. I am using coredata, and have two entities. Both entities have a hierarchical relationship. I am noticing quite a lot of duplicated functionality now, and am wondering if I should re-structure to have a base Entity which is abstract (HierarchicalObject), and make my entities inherit from them.
So the question is are there some limitations of this inheritance that I should take into account? Reading some of the posts out there, I see a few trade-offs, let me know if my assumptions are correct.
(Good) clean up structure, keep the HierarchicalObject functionality in one spot.
(Ok) With inheritance, both objects now end up in the same sqlite table (I am using Sqlite as the backend). So if the number of objects grow, search/sorting could take longer? Not sure if this is a huge deal, as the number of objects in my case should stay pretty static.
(not so good) With inheritance, the relationship could get more complicated? (http://www.cocoadev.com/index.pl?CoreDataInheritanceIssues)
Are there other things to take into account?
Thanks for your comments.
I think it's a mistake to draw to close a parallel between entities and classes. While very similar they do have some important differences.
The most important difference is that entities don't have code like a class would so when you have entities with duplicate attributes, your not adding a lot of extra coding and potential for introducing bugs.
A lot of people believe that class inheritance must parallel entity inheritance. It does not. As a long as a class descends from NSManagedObject and responds to the right key-value messages for the entity it represents, the class can have many merry adventures in it's inheritance that are not reflected in the entities inheritance. E.g. It's fairly common to create a custom base class right below NSManagedObject and the have all the subsequent managed object subclasses inherit from that regardless of their entities.
I think the only time that entity inheritance is absolutely required is when you need different entities to show up in the same relationship. E.g:
Owner{
vehical<-->Vehical.owner
}
Vehical(abstract){
owner<-->Owner.vehical
}
Motocycle:Vehical{
}
Car:Vehical{
}
Now the Owner.vehical can hold either a Motocycle object or a Car object. Note that the managed object class inheritance for Motocycle and Car don't have to be same. You could have something like Motocycle:TwoWheeled:NSManagedObject and Car:FourWheeled:NSManagedObject and everything would work fine.
In the end, entities are just instructions to context to tell it how the object graph fits together. As long as your entity arrangement makes that happen, you have a lot flexibility in the design details, quite a bit more than you would have in an analogous situation with classes.
I thought it would be useful to mention that the Notes app on iOS 10 uses inheritance in its Core Data model. They use a base entity SyncingObject, that has 7 sub-entities including Note and Folder. And as you mentioned all of these are stored in the same SQLite table which has a whopping 106 columns, and since are shared among all entities most are NULL. They also implemented the folder-notes one-to-many relation as a many-to-many which creates a pivot table, which might be a work-around for an inheritance problem.
There are a couple of advantages to using entity inheritance that likely outweigh these storage limitations. For example, a unique constraint can be unique across entities. And a fetch request for a parent entity can return multiple child entities making UI that uses fetched results controller simpler, e.g. grouping by accounts or folders in a sidebar. Notes uses this to show an "All Notes" row above the Folder rows which is actually backed by an Account.
I have had issues in the past with data migration of models that had inheritance - you may want to experiment with that and see if you can get it to work.
As you noted also, all objects go in one table.
However, as Core Data is managing an object graph, it is really nice to keep the structure the way you would naturally have it just modeling objects - which includes inheritance. There's a lot to be said for keeping the model sane so that you have to do less work in maintaining code.
I have personally used a fairly complex CD model with inheritance in one of my own apps, and it has worked out OK (apart from as I said having issues with data migration, but that has been so flakey for me in general I do not rely on that working any longer).