Entity Framework TPC with multiple inheritance

I am using EF with TPC and I have multiple inheritance. Let's say I have:
Employee (abstract)
Developer (inherits from Employee)
SeniorDeveloper (inherits from Developer)
I inserted some rows in the database and EF reads them correctly.
BUT
When I insert a new SeniorDeveloper, the values get written to both the SeniorDeveloper AND the Developer database tables, so querying just the Developers (context.Employees.OfType<Developer>()) also returns the recently added SeniorDevelopers.
Is there a way to tell EF that it should store the row in only one table? Or why does EF fall back to a TPT strategy?

Since it doesn't look like EF supports multiple inheritance with TPC, we ended up using TPC between Employee and Developer and TPT between Developer and SeniorDeveloper...

I believe there is a reason for this, although I may not see the full picture and might just be speculating.
The situation
Indeed, the only way (that I see) for EF to list only the non-senior developers (your query use case) in a TPT scenario by reading only the Developer table would be to use a discriminator, and we know that EF doesn't use one in TPT/TPC strategies.
Why? Well, remember that all senior developers are developers, so it's only natural (and necessary) that they have a Developer record as well as a SeniorDeveloper record.
The only exception is if Developer is an abstract type, in which case you can use a TPC strategy to remove the Developer table altogether. In your case however, Developer is concrete.
The current solution
Remembering this, and without a discriminator in the Developer table, the only way to determine if any developer is a non-senior developer is by checking if it is not a senior developer; in other words, by verifying that there is no record of the developer in the SeniorDeveloper table, or any other subtype table.
That did sound a little obvious, but now we understand why the SeniorDeveloper table must be used and accessed when its base type (Developer) is concrete (non-abstract).
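To make the "not a senior developer" check concrete, here is a minimal sketch at the SQL level, using Python's built-in sqlite3 module so it runs without an EF setup. The table and column names (Developer, SeniorDeveloper, MentorCount) are hypothetical, and this is not the SQL EF itself generates; it only shows why the subtype table has to be read even when you want base-type rows only.

```python
import sqlite3

# Hypothetical TPT-style tables: every SeniorDeveloper also has a Developer
# row with the same Id, and neither table stores a discriminator.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Developer (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE SeniorDeveloper (
        Id INTEGER PRIMARY KEY REFERENCES Developer(Id),
        MentorCount INTEGER NOT NULL
    );
    INSERT INTO Developer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO SeniorDeveloper VALUES (2, 4);
""")

# A "plain" developer is one for which no SeniorDeveloper row exists,
# so the subtype table has to be consulted even for this query.
non_senior = conn.execute("""
    SELECT d.Id, d.Name
    FROM Developer AS d
    WHERE NOT EXISTS (SELECT 1 FROM SeniorDeveloper AS s WHERE s.Id = d.Id)
    ORDER BY d.Id
""").fetchall()

print(non_senior)  # [(1, 'Ann'), (3, 'Cid')]
```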
The current implementation
I'm writing this from memory, so I hope it isn't too far off, but this is also what Slauma mentioned in another comment. You probably want to fire up a SQL profiler and verify this.
The way it is implemented is by requesting a UNION of projections of the tables. These projections simply add a discriminator declaring their own type in some encoded way[1]. In the union set, the rows can then be filtered based on this discriminator.
[1] If I remember correctly, it goes something like this: 0X for the base type, 0X0X for the first subtype in the union, 0X1X for the second subtype, and so on.
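The "tag each row, then filter on the tag" idea can be sketched the same way, again with hypothetical tables in SQLite via Python's sqlite3. One caveat: in a TPT layout, every senior developer also has a Developer row, so this sketch uses the outer-join form mentioned in the notes at the end rather than a UNION (a UNION of per-table projections only tags rows correctly when the tables are disjoint, as in true TPC). The '0X'/'0X0X' values mirror the footnote's from-memory encoding and are illustrative only; check the real SQL with a profiler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Developer (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE SeniorDeveloper (
        Id INTEGER PRIMARY KEY REFERENCES Developer(Id),
        MentorCount INTEGER NOT NULL
    );
    INSERT INTO Developer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO SeniorDeveloper VALUES (2, 4);
""")

# Inner query: tag every row with a discriminator generated at query time.
# The literal values are illustrative, not EF's guaranteed encoding.
TAGGED = """
    SELECT d.Id, d.Name, s.MentorCount,
           CASE WHEN s.Id IS NULL THEN '0X' ELSE '0X0X' END AS Disc
    FROM Developer AS d
    LEFT OUTER JOIN SeniorDeveloper AS s ON s.Id = d.Id
"""

all_rows = conn.execute(TAGGED + " ORDER BY d.Id").fetchall()

# Outer query: filter on the generated discriminator to keep base-type rows.
developers_only = conn.execute(
    f"SELECT Id, Name FROM ({TAGGED}) WHERE Disc = ? ORDER BY Id", ("0X",)
).fetchall()
```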
Trade-off #1
We can already identify a trade-off: EF can either store a discriminator in the table, or it can "generate one" at "run time".
The disadvantages of a stored discriminator are that it is less space-efficient, and possibly "ugly" (if that's an argument). The advantage is lookup performance in one very specific case (when we only want the records of the base type).
The disadvantages of a "run time" discriminator are that lookup performance is not as good for that same use-case. The advantages are that it is more space efficient.
At first sight, it would seem that sometimes we might prefer to trade a little bit of space for query performance, and EF wouldn't let us.
In reality, it's not always clear when. By requesting a UNION of two tables, we just look up two indexes instead of one, and the performance difference is negligible. With a single level of inheritance, it can't be worse than 2x (since all subtype sets are disjoint). But wait, there's more.
Trade-off #2
Remember that I said the performance advantage of the stored-discriminator approach would only appear in the specific use case where we look up records of the base type. Why is that?
Well, if you're searching for developers that may or may not be senior developers[2], you're forced to look up the SeniorDeveloper table anyway. While this, again, seems obvious, what may be less obvious is that EF can't know in advance whether the results will all be of one type or another. This means that it would have to issue two queries in the worst case: one on the Developer table and, if there is even one senior developer in the result set, a second one on the SeniorDeveloper table.
Unfortunately, the extra roundtrip probably has a bigger performance impact than a UNION of the two tables. (I say probably; I haven't verified it.) Worse, the number of extra roundtrips grows with each subtype that has a row in the result set. Imagine a type with 3, or 5, or even 10 subtypes.
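Here is a rough sketch of that hypothetical stored-discriminator strategy, with a roundtrip counter to show where the second query comes from. It uses SQLite via Python's sqlite3; the IsSenior column, the load_developers helper and the schema are assumptions for illustration, and this is not something EF does.

```python
import sqlite3

# Hypothetical schema with a *stored* discriminator (IsSenior) on Developer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Developer (
        Id INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        IsSenior INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE SeniorDeveloper (
        Id INTEGER PRIMARY KEY REFERENCES Developer(Id),
        MentorCount INTEGER NOT NULL
    );
    INSERT INTO Developer VALUES (1, 'Ann', 0), (2, 'Bob', 1), (3, 'Bea', 0);
    INSERT INTO SeniorDeveloper VALUES (2, 4);
""")


def load_developers(conn, name_pattern):
    """Return (rows, roundtrips); rows are (Id, Name, MentorCount or None)."""
    roundtrips = 1
    base = conn.execute(
        "SELECT Id, Name, IsSenior FROM Developer WHERE Name LIKE ? ORDER BY Id",
        (name_pattern,),
    ).fetchall()

    senior_ids = [dev_id for dev_id, _, is_senior in base if is_senior]
    mentor_counts = {}
    if senior_ids:
        # Only needed when at least one senior developer is in the result set.
        roundtrips += 1
        placeholders = ", ".join("?" for _ in senior_ids)
        mentor_counts = dict(conn.execute(
            f"SELECT Id, MentorCount FROM SeniorDeveloper WHERE Id IN ({placeholders})",
            senior_ids,
        ).fetchall())

    rows = [(dev_id, name, mentor_counts.get(dev_id)) for dev_id, name, _ in base]
    return rows, roundtrips


rows_a, trips_a = load_developers(conn, "A%")  # only Ann: one roundtrip
rows_b, trips_b = load_developers(conn, "B%")  # Bob is senior: two roundtrips
```

With more subtypes, each subtype that shows up in the first result set would add another follow-up query of this kind.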
And that's your trade-off #2.
[2] Remember that this kind of operation could come from any part of your application(s), while resolving the trade-off must be done globally to satisfy all processes/applications/users. Couple that with the fact that the EF team must make these trade-offs for all EF users (although it is true that they could add some configuration for these kinds of trade-offs).
A possible alternative
By batching SQL queries, it would be possible to avoid the multiple roundtrips. EF would have to send some procedural logic (T-SQL) to the server for the conditional lookups. But since we already established in trade-off #1 that the performance advantage is most likely negligible in many cases, I'm not sure this would ever be worth the effort. Maybe someone could open an issue ticket to determine whether it makes sense.
Conclusion
In the future, maybe someone can optimize a few typical operations in this specific scenario with some creative solutions, then provide some configuration switches when the optimization involves such trade-offs.
Right now however, I think EF has chosen a fair solution. In a strange way, it's almost cleaner.
A few notes
I believe the use of union is an optimization applied in certain cases. In other cases, it would be an outer join, but the use of a discriminator (and everything else) remains the same.
You mentioned multiple inheritance, which sort of confused me initially. In common object-oriented parlance, multiple inheritance is a construct in which a type has multiple base types. Many object-oriented type systems don't support this, including the CTS (used by all .NET languages). You mean something else here.
You also mentioned that EF would "fall back" to a TPT strategy. In the case of Developer/SeniorDeveloper, a TPC strategy would have the same results as a TPT strategy, since Developer is concrete. If you really want a single table, you must then use a TPH strategy.

Related

How to stop EF Core from indexing all foreign keys

As documented in questions like Entity Framework Indexing ALL foreign key columns, EF Core seems to automatically generate an index for every foreign key. This is a sound default for me (let's not get into an opinion war here...), but there are cases where it is just a waste of space and slows down inserts and updates. How do I prevent it on a case-by-case basis?
I don't want to wholly turn it off, as it does more good than harm; I don't want to have to manually configure it for all those indices I do want. I just want to prevent it on specific FKs.
Related side question: is the fact that these indexes are automatically created mentioned anywhere in the EF documentation? I can't find it anywhere, which is probably why I can't find how to disable it.
Someone is bound to question why I would want to do this... so in the interest of saving time, the OP of the linked question gave a great example in a comment:
We have a People table and an Addresses table, for example. The People.AddressID FK was indexed by EF, but I only ever start from a People row and search for the Addresses record; I never find an Addresses row and then search the People.AddressID column for a matching record.
EF Core has a configuration option to replace one of its services.
I found that replacing IConventionSetBuilder with a custom one is a much cleaner approach.
https://giridharprakash.me/2020/02/12/entity-framework-core-override-conventions/
If it is really necessary to avoid some foreign key indexes, then as far as I (currently) know, in .NET Core you have to remove the code that creates those indexes from the generated migration file.
Another approach would be to implement a custom migration generator in combination with an attribute, or maybe an extension method, that avoids the index creation. You can find more information in this answer for EF6: EF6 preventing not to create Index on Foreign Key. But I'm not sure whether it works in .NET Core too; the approach seems to be a bit different. Here is an MS doc article that should help.
But I strongly advise against doing this! I'm against it because you have to modify generated migration files, not because of skipping indexes on FKs. As you mentioned in the question's comments, some real-world scenarios do need such an approach.
For people who aren't really sure whether they need to avoid indexes on FKs (and would therefore have to modify migration files):
Before you go that way, I would suggest implementing the application with indexes on FKs and checking the performance and space usage. To do that, I would generate a lot of test data.
If that really results in performance and space usage issues in a test or QA stage, it's still possible to remove the indexes in the migration files.
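One cheap way to get a first feel for the space cost is to load the same data with and without the FK index and compare the database size. The sketch below does that with Python's sqlite3 module. Note that SQLite, unlike EF Core migrations, does not create FK indexes on its own, so the index (given an EF-style name, IX_People_AddressID) is created explicitly. The table shapes follow the People/Addresses example from the question; the row count is arbitrary, and timings from an in-memory SQLite database say little about your production server.

```python
import sqlite3
import time


def load_people(with_fk_index, rows=20_000):
    """Load rows into People/Addresses and return (page_count, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, Street TEXT NOT NULL);
        CREATE TABLE People (
            Id INTEGER PRIMARY KEY,
            Name TEXT NOT NULL,
            AddressID INTEGER REFERENCES Addresses(Id)
        );
    """)
    if with_fk_index:
        # Stand-in for the index a migration would create for the FK.
        conn.execute("CREATE INDEX IX_People_AddressID ON People (AddressID)")

    start = time.perf_counter()
    with conn:  # one transaction for the whole load
        conn.executemany(
            "INSERT INTO Addresses VALUES (?, ?)",
            ((i, f"Street {i}") for i in range(rows)),
        )
        conn.executemany(
            "INSERT INTO People VALUES (?, ?, ?)",
            ((i, f"Person {i}", i) for i in range(rows)),
        )
    seconds = time.perf_counter() - start

    (page_count,) = conn.execute("PRAGMA page_count").fetchone()
    conn.close()
    return page_count, seconds


pages_with, secs_with = load_people(with_fk_index=True)
pages_without, secs_without = load_people(with_fk_index=False)
print(f"with FK index:    {pages_with} pages, {secs_with:.3f}s")
print(f"without FK index: {pages_without} pages, {secs_without:.3f}s")
```

Against a real SQL Server or PostgreSQL test database, the same before/after comparison would use that server's own size and timing tools.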
Because we already discussed EnsureCreated vs. migrations here, for completeness, here is some further information about EnsureCreated and migrations (even if you don't need it :-)):
MS doc about EnsureCreated() (it will not update your database if your model changes; migrations will)
Interesting too (even if it is for EF7): EF7 EnsureCreated vs. Migrate Methods
Entity Framework Core 2.0 (the latest version available when the question was asked) doesn't have such a mechanism, but EF Core 2.2 just might, in the form of Owned Entity Types.
Namely, since you said:
" I only ever start from a People row and search for the Addresses record; I never find an Addresses row"
Then you may want to make the Address an Owned Entity Type (and especially the variant with 'Storing owned types in separate tables', to match your choice of storing the address information in a separate Addresses table).
The feature's docs seem to say something that matches:
"Owned entities are essentially a part of the owner and cannot exist without it"
By the way, now that the feature is in EF, this may explain why EF always creates the indexes for HasMany/HasOne. It's likely because the Has* relations are meant to be used towards other entities (as opposed to 'value objects'), and these, since they have their own identity, are meant to be queried independently and allow accessing other entities they relate to using navigation properties. For such a use case, it would simply be dangerous to use such navigation properties without indexes (a few queries could slow the database down hugely).
There are few caveats here though:
Turning an entity into an owned one doesn't just tell EF about the index; it tells EF to map the model to the database in a somewhat different way (more on this below), but the end effect is in fact free of that extra index on People.
But chances are this might actually be the better solution for you: this way you also say that no one should query the Address directly (EF doesn't allow a DbSet<T> of that type), minimizing the chance of someone using it to reach the other entities with these costly index-less queries.
As to what the difference is, you'll note that if you make the Address owned by Person, EF will create a PersonId column in the Address table, which is different from your AddressId in the People table (in a sense, the lack of the foreign key index is a bit of a cheat: an index for querying Person from Address is there, it's just that it's the primary key index of the People table, which was there anyway). But note that this design is actually rather good: it not only needs one column fewer (no AddressId in People), but it also guarantees that there's no way to create an orphaned Address record that your code will never be able to access.
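A minimal sketch of that table shape, in SQLite via Python's sqlite3, with hypothetical column names (EF's actual owned-type mapping, names and constraints may differ). It shows that People no longer needs an AddressId column, that finding a person's address is a primary-key lookup, and that an Address pointing at no person is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE People (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- Owned-type shape: the address is keyed by its owner's Id, so the
    -- primary key doubles as the foreign key and its index serves both.
    CREATE TABLE Addresses (
        PersonId INTEGER PRIMARY KEY REFERENCES People(Id),
        Street TEXT NOT NULL
    );
    INSERT INTO People VALUES (1, 'Ann');
    INSERT INTO Addresses VALUES (1, 'Main St 1');
""")

# People no longer carries an AddressId column.
people_columns = [row[1] for row in conn.execute("PRAGMA table_info(People)")]

# Going from a person to their address is a primary-key lookup.
(street,) = conn.execute(
    "SELECT Street FROM Addresses WHERE PersonId = ?", (1,)
).fetchone()

# An address that points at no person is rejected, so it cannot be orphaned.
try:
    with conn:
        conn.execute("INSERT INTO Addresses VALUES (?, ?)", (99, "Nowhere 0"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```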
If you would still like to keep the AddressId column in the Addresses, then there's still one option:
Just choose a name of AddressId for the foreign key in the Addresses table and just "pretend" you don't know that it happens to have the same values as the PersonId :)
If that option isn't funny (e.g. because you can't change your database schema), then you're somewhat out of luck. But do take note that among the Current shortcomings of EF they still list "Instances of owned entity types cannot be shared by multiple owners", while some shortcomings of the previous versions are already listed as addressed. Might be worth watching that space as, it seems to me, resolving that one will probably involve introducing the ability to have your AddressId in the People, because in such a model, for the owned objects to be shared among many entities the foreign keys would need to be sitting with the owning entities to create an association to the same value for each.
In the OnModelCreating override, AFTER the call to base.OnModelCreating(modelBuilder), add:
// Remove the index EF Core creates by convention for this foreign key column
var indexForRemoval = modelBuilder.Entity<Your_Table_Entity>().HasIndex(x => x.Column_Index_Is_On).Metadata;
modelBuilder.Entity<Your_Table_Entity>().Metadata.RemoveIndex(indexForRemoval);

Entity Framework Pluralization Concern

This is a two part question:
1) What is the advantage of pluralizing, other than having each model's table name imply that it contains a collection of entity records?
2) Pluralizing is a very intricate art, and is sensitive to language localization. When I created an Entity called Schema, EF yielded a table called Schemata.
There is a major problem with this. Primarily, a developer would need to know that the plural of Schema is not Schemas, but the aforementioned. Also, this means that EF maintains some sort of linguistic dictionary that explicitly dictates the pluralization of words, and this can lead to unexpected results.
For every Entity Framework entity I've ever created, I've had control over the pluralized version of the name, so I'm not sure what the issue is. You don't have to accept the suggested pluralization. The pluralization is useful in following connections to child entities and their collections, so there is a reason to have them in the first place. Use common sense in creating pluralized names that have the broadest, most easily grasped meaning.
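To illustrate the "dictionary plus rules, with the developer's choice winning" behaviour discussed above, here is a toy pluralizer in Python. It is a hypothetical sketch, not EF's actual pluralization service: the IRREGULAR table, the suffix rules and the overrides parameter are all assumptions, and a real English pluralizer needs far more cases.

```python
from typing import Dict, Optional

# Hypothetical irregular-word dictionary; a real service has far more entries.
IRREGULAR = {"schema": "schemata", "person": "people", "child": "children"}


def _match_case(template: str, word: str) -> str:
    """Capitalize `word` if `template` starts with an uppercase letter."""
    return word[:1].upper() + word[1:] if template[:1].isupper() else word


def pluralize(name: str, overrides: Optional[Dict[str, str]] = None) -> str:
    # 1. An explicit choice by the developer always wins.
    if overrides and name in overrides:
        return overrides[name]
    lower = name.lower()
    # 2. Dictionary lookup for irregular words.
    if lower in IRREGULAR:
        return _match_case(name, IRREGULAR[lower])
    # 3. Simple English suffix rules for everything else.
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return name + "es"
    if lower.endswith("y") and lower[-2:-1] not in ("a", "e", "i", "o", "u"):
        return name[:-1] + "ies"
    return name + "s"
```

Here pluralize("Schema") gives "Schemata" from the dictionary, while pluralize("Schema", {"Schema": "Schemas"}) keeps the name you chose.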

Polymorphic association foreign key constraints. Is this a good solution?

We're using polymorphic associations in our application. We've run into the classic problem: we encountered an invalid foreign key reference, and we can't create a foreign key constraint, because it's a polymorphic association.
That said, I've done a lot of research on this. I know the downsides of using polymorphic associations, and the upsides. But I found what seems to be a decent solution:
http://blog.metaminded.com/2010/11/25/stable-polymorphic-foreign-key-relations-in-rails-with-postgresql/
This is nice, because you get the best of both worlds. My concern is the data duplication. I don't have a deep enough knowledge of PostgreSQL to completely understand the cost of this solution.
What are your thoughts? Should this solution be completely avoided? Or is it a good solution?
The only alternative, in my opinion, is to create a foreign key for each association type. But then you run into validating that only one association exists. It's a "pick your poison" situation. Polymorphic associations clearly describe intent, and also make this scenario impossible. In my opinion that is the most important. The database foreign key constraint is a behind the scenes feature, and altering "intent" to work with database limitations feels wrong to me. This is why I'd like to use the above solution, assuming there is not a glaring "avoid" with it.
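For comparison, the "one foreign key per association type" alternative can still enforce "exactly one association" with a CHECK constraint (a pattern sometimes called an exclusive arc). Here is a sketch in SQLite via Python's sqlite3, with hypothetical Articles/Photos/Comments tables; the boolean arithmetic in the CHECK is SQLite-specific, and other databases spell it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Articles (Id INTEGER PRIMARY KEY);
    CREATE TABLE Photos   (Id INTEGER PRIMARY KEY);
    -- One nullable FK column per association type.
    CREATE TABLE Comments (
        Id INTEGER PRIMARY KEY,
        ArticleId INTEGER REFERENCES Articles(Id),
        PhotoId   INTEGER REFERENCES Photos(Id),
        -- Exactly one association may be set (SQLite treats booleans as 0/1).
        CHECK ((ArticleId IS NOT NULL) + (PhotoId IS NOT NULL) = 1)
    );
    INSERT INTO Articles VALUES (1);
    INSERT INTO Photos VALUES (1);
""")


def add_comment(article_id, photo_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Comments (ArticleId, PhotoId) VALUES (?, ?)",
                (article_id, photo_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Each extra association type adds a column and a term to the CHECK, which is the maintenance cost the question is weighing.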
The biggest problem I have with PostgreSQL's INHERITS implementation is that you can't set a foreign key reference to the parent table. There are a lot of cases where you need to do that. See the examples at the end of my answer.
The decision to create tables, views, or triggers outside of Rails is the crucial one. Once you decide to do that, then I think you might as well use the very best structure you can find.
I have long used a base parent table, enforcing disjoint subtypes using foreign keys. This structure guarantees only one association can exist, and that the association resolves to the right subtype in the parent table. (In Bill Karwin's slideshow on SQL antipatterns, this approach starts on slide 46.) This doesn't require triggers in the simple cases, but I usually provide one updatable view per subtype, and require client code to use the views. In PostgreSQL, updatable views require writing either triggers or rules. (Versions before 9.1 require rules.)
In the most general case, the disjoint subtypes don't have the same number or kind of attributes. That's why I like updatable views.
Table inheritance isn't portable, but this kind of structure is. You can even implement it in MySQL. In MySQL, you have to replace the CHECK constraints with foreign key references to one-row tables. (MySQL parses and ignores CHECK constraints.)
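A minimal sketch of the base-parent-table structure described above, in SQLite via Python's sqlite3. The table names and the 'article'/'photo' kinds are hypothetical, and the updatable views (and the triggers or rules PostgreSQL needs for them) are left out. The composite foreign key on (Id, Kind) plus the CHECK on Kind is what keeps the subtypes disjoint, and the association itself becomes an ordinary foreign key to the parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Base parent table: one row per commentable thing, tagged with its subtype.
    CREATE TABLE Commentables (
        Id   INTEGER PRIMARY KEY,
        Kind TEXT NOT NULL CHECK (Kind IN ('article', 'photo')),
        UNIQUE (Id, Kind)
    );
    -- Each subtype table can only reference parent rows of its own kind.
    CREATE TABLE Articles (
        Id    INTEGER PRIMARY KEY,
        Kind  TEXT NOT NULL DEFAULT 'article' CHECK (Kind = 'article'),
        Title TEXT NOT NULL,
        FOREIGN KEY (Id, Kind) REFERENCES Commentables (Id, Kind)
    );
    CREATE TABLE Photos (
        Id   INTEGER PRIMARY KEY,
        Kind TEXT NOT NULL DEFAULT 'photo' CHECK (Kind = 'photo'),
        Url  TEXT NOT NULL,
        FOREIGN KEY (Id, Kind) REFERENCES Commentables (Id, Kind)
    );
    -- The formerly polymorphic association is now an ordinary foreign key.
    CREATE TABLE Comments (
        Id            INTEGER PRIMARY KEY,
        CommentableId INTEGER NOT NULL REFERENCES Commentables (Id),
        Body          TEXT NOT NULL
    );
    INSERT INTO Commentables VALUES (1, 'article');
    INSERT INTO Articles (Id, Title) VALUES (1, 'Hello');
""")


def accepted(sql, params):
    """Run one INSERT; True if it succeeded, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Parent row 1 is an article, so it cannot also become a photo.
photo_for_article = accepted(
    "INSERT INTO Photos (Id, Url) VALUES (?, ?)", (1, "http://example.invalid/1.png")
)
comment_ok = accepted(
    "INSERT INTO Comments (CommentableId, Body) VALUES (?, ?)", (1, "Nice")
)
comment_dangling = accepted(
    "INSERT INTO Comments (CommentableId, Body) VALUES (?, ?)", (42, "Lost")
)
```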
I don't think you have to worry about data duplication. In the first place, I'm pretty sure data isn't duplicated between parent tables and inheriting tables. It just appears that way. In the second place, duplicated or derived data whose integrity is completely controlled by the dbms is not an especially bitter pill to swallow. (But uncontrolled duplication is.)
Give some thought to whether deletes should cascade.
A publications example with SQL code.
A "parties" example with SQL code.
You cannot enforce that in a database in an easy way, so this is a really bad idea. The best solution is usually the simple one: forget about polymorphic associations; they have the smell of an antipattern.

Rules of thumbs for writing "queries" using ADO.NET Entity Framework

I’m currently working on a prototype of a medium-size web application, and I thought that it would be good to also experiment with Entity Framework. The problem is that the major part of the application is not the data layer and logic, so I don't have much time to play with Entity Framework. On the other hand, the database schema is quite simple.
One of the problems I’m facing is that I cannot find a consistent way to "write queries". As far as I can tell, there are four "interfaces" for the job:
LINQ to Entities
LINQ to Entities using LINQ extension methods
Entity SQL
Query builder
OK, the first two are essentially the same, but it’s good to use just one for maintenance and consistency.
I’m mostly puzzled by the fact that none of them seems to be complete or fully general. I often find myself cornered and using some ugly-looking combination of several of them. My guess is that Entity SQL is the most general one, but writing queries using strings feels like a step back. The main reason I’m experimenting with something like Entity Framework is that I like the compile-time checking.
Some other random thoughts/issues:
I often also use the ObjectQuery.Include() method, but again it takes a string. Is this the only way?
When to use ObjectQuery.Execute() (vs. ToList())? Does it actually execute the query?
Should I execute queries as soon as possible (e.g. using ToList()), or should I not care and just leave the execution to the first enumeration that gets in the way?
Are ObjectQuery.Skip() and ObjectQuery.Take() available only as extension methods? Is there a better way to do paging? It’s 2009 and almost every web application deals with paging.
Overall, I understand there are many difficulties when implementing an ORM, and often one has to compromise. On the other hand, direct database access (e.g. ADO.NET) is plain and simple and has a well-defined interface (tabular results, data readers), so all code, no matter who writes it or when, is consistent. I don’t want to be faced with too many choices whenever I write a database query. It’s too tedious, and more than likely different developers will come up with different ways.
What are your rules of thumbs?
I use LINQ-to-Entities as much as possible. I also try to stick to the lambda form, as opposed to the SQL-style query syntax. I have to admit to having had problems enforcing relationships and making compromises on efficiency just to expedite the coding of our application (e.g. Master->Child tables may need to be manually loaded), but all in all, EF is a good product.
I do use EF's .Include() method for eager loading, which, as you say, does require a string input. I find no problem with this, other than identifying the string to use, which is relatively simple. I guess if you're keen on compile-time checking of such relations, a model similar to Parent.GetChildren() might be more appropriate.
My application does require some "dynamic" queries to be performed, though. I have two ways of meeting this:
a) I create a mediator object, eg. ClientSearchMediator, which "knows" how to search for clients by name, etc. I can then put this through a SearchHandler.Search(ISearchMediator[] mediators) call (for example). This can be used to target specific data structures and sort results accordingly using LINQ-to-Entities.
b) For a looser experience, possibly as a result of a user designing their own query (using high level tools our application provides), eSQL is ideal for this purpose. It can be made to be injection-safe.
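Option (a) can be sketched roughly like this. It is a hypothetical reconstruction in Python over sqlite3 rather than LINQ-to-Entities: ISearchMediator, ClientSearchMediator and SearchHandler.search are modelled on the names above, but their members and the Clients table are assumptions. The part that carries over to eSQL in option (b) is that user input is only ever passed as bound parameters, never spliced into the query text.

```python
import sqlite3
from typing import List, Optional, Protocol, Sequence, Tuple


class ISearchMediator(Protocol):
    def criteria(self) -> Tuple[List[str], List[object]]:
        """Return SQL predicate fragments and their parameter values."""
        ...


class ClientSearchMediator:
    """Knows how to search clients by name prefix and/or city (hypothetical)."""

    def __init__(self, name_prefix: Optional[str] = None, city: Optional[str] = None):
        self.name_prefix = name_prefix
        self.city = city

    def criteria(self) -> Tuple[List[str], List[object]]:
        clauses: List[str] = []
        params: List[object] = []
        if self.name_prefix is not None:
            clauses.append("Name LIKE ?")
            params.append(self.name_prefix + "%")
        if self.city is not None:
            clauses.append("City = ?")
            params.append(self.city)
        return clauses, params


class SearchHandler:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def search(self, mediators: Sequence[ISearchMediator]) -> List[tuple]:
        clauses: List[str] = []
        params: List[object] = []
        for mediator in mediators:
            more_clauses, more_params = mediator.criteria()
            clauses.extend(more_clauses)
            params.extend(more_params)
        # User input only ever travels as bound parameters, never as SQL text.
        where = " AND ".join(clauses) if clauses else "1 = 1"
        sql = f"SELECT Id, Name, City FROM Clients WHERE {where} ORDER BY Id"
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (Id INTEGER PRIMARY KEY, Name TEXT, City TEXT);
    INSERT INTO Clients VALUES (1, 'Ann', 'Oslo'), (2, 'Alf', 'Bergen'), (3, 'Bob', 'Oslo');
""")
handler = SearchHandler(conn)
```

A hostile value such as city="x' OR '1'='1" is simply compared as a literal string and matches nothing.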
I don't have enough knowledge to address all of this, but I'll at least take a few stabs.
I don't know why you think ADO.NET is more consistent than Entity Framework. There are many different ways to use ADO.NET and I've definitely seen inconsistency within a single code base.
Entity Framework is currently a 1.0 release and it suffers from many 1.0 type problems (incomplete & inconsistent API, missing features, etc.).
Regarding Include, I assume you are referring to eager loading. Multiple people (outside of Microsoft) have developed solutions for getting "type safe" includes (try googling something like: Entity Framework ObjectQueryExtension Include). That said, Include is more of a hint than anything. You can't force eager loading, and you always have to remember to check the IsLoaded property to see if your request was fulfilled. As far as I know, the way "Include" works is not changing at all in the next version of Entity Framework (4.0, to ship with VS 2010).
As far as executing the LINQ query as soon as it's built vs. at the last possible moment, that decision is situational. Personally, I would probably execute it as soon as it's built for the most part unless there was a compelling reason not to, but I can see other people going the opposite direction.
There are more mature ORMs on the market and Entity Framework isn't necessarily your best option. For the most part, you can bend Entity Framework to your will, but you may end up rolling your own implementation of features that come out of the box with other ORMs.

Entity Framework inheritance: TPT, TPH or none?

I am currently reading about the possibility of using inheritance with Entity Framework. Sometimes I use an approach of typing data records, and I am not sure whether I should use TPT, TPH, or neither...
For example...
I have an e-commerce shop that stores shipping, billing, and delivery addresses.
I have an address table:
RecordID
AddressTypeID
Street
ZipCode
City
Country
and a table AddressType
RecordID
AddressTypeDescription
The table design differs from the general design people show when demonstrating TPT or TPH...
Does it make sense to think about inheritance when having an approach like this?
I hope it makes sense...
Thanks for any help...
When considering how to represent inheritance in the database, you need to consider a few things.
If you have many different subclasses, you can have a lot of extra joins in queries involving those more complex types, which can hurt performance. One big advantage of TPH is that you query one table for all types in the hierarchy, and this is a boon for performance, particularly for larger hierarchies. For this reason I tend to favour that approach in most scenarios.
However, TPH means that you can no longer have NOT NULL fields for subtypes, as all fields for all types are in a single table, pushing the responsibility for data integrity towards your application. Although this may sound horrible, in practice I haven't found this to be too big a restriction.
However, I would tend to use TPT if there were a lot of fields for each type and the number of types in the hierarchy was likely to be small, meaning that performance was not so much of an issue with the joins, and you get better data integrity.
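The query-shape difference can be sketched in SQLite (via Python's sqlite3) with a hypothetical Employee/Developer/Manager hierarchy: TPH reads one table whose subtype columns must be nullable, while TPT needs one extra join per subtype to rebuild the same rows. This only illustrates the shapes; the SQL EF generates for each strategy is more elaborate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- TPH: one table, a discriminator, and nullable subtype columns.
    CREATE TABLE Employees (
        Id INTEGER PRIMARY KEY,
        Discriminator TEXT NOT NULL,
        Name TEXT NOT NULL,
        Language TEXT,        -- Developer-only, so it must be nullable
        Budget INTEGER        -- Manager-only, so it must be nullable
    );
    INSERT INTO Employees VALUES
        (1, 'Developer', 'Ann', 'C#', NULL),
        (2, 'Manager',   'Bob', NULL, 500);

    -- TPT: base table plus one table per subtype, joined on the key.
    -- Subtype columns can be NOT NULL again.
    CREATE TABLE Employee  (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Developer (Id INTEGER PRIMARY KEY REFERENCES Employee(Id), Language TEXT NOT NULL);
    CREATE TABLE Manager   (Id INTEGER PRIMARY KEY REFERENCES Employee(Id), Budget INTEGER NOT NULL);
    INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Developer VALUES (1, 'C#');
    INSERT INTO Manager VALUES (2, 500);
""")

# TPH: a single-table read for the whole hierarchy.
tph = conn.execute("SELECT Id, Name, Language, Budget FROM Employees ORDER BY Id").fetchall()

# TPT: one extra LEFT JOIN per subtype; the join count grows with the hierarchy.
tpt = conn.execute("""
    SELECT e.Id, e.Name, d.Language, m.Budget
    FROM Employee AS e
    LEFT JOIN Developer AS d ON d.Id = e.Id
    LEFT JOIN Manager   AS m ON m.Id = e.Id
    ORDER BY e.Id
""").fetchall()
```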
Note that one of the advantages of EF and other ORMs is that you can change your mind down the track without impacting your application so the decision doesn't need to be completely carved in stone.
In your example, it doesn't appear to be an inheritance relationship; it looks like a one-to-many from the address type to the addresses.
This would be represented between your classes something like the following:
Address.AddressType
AddressType.Addresses
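In plain classes, that relationship could look like the sketch below. It uses Python dataclasses rather than EF entity classes, and the snake_case field names are assumptions mapped from the columns in the question; the point is that AddressType is an ordinary related entity, not a base class.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AddressType:
    record_id: int
    description: str
    # AddressType.Addresses: the "many" side; excluded from repr/eq to avoid cycles.
    addresses: List["Address"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Address:
    record_id: int
    street: str
    zip_code: str
    city: str
    country: str
    # Address.AddressType: the "one" side, backed by the AddressTypeID column.
    address_type: Optional[AddressType] = None

    def __post_init__(self) -> None:
        # Keep both navigation properties in sync, roughly what an ORM's
        # relationship fix-up does when it loads related rows.
        if self.address_type is not None:
            self.address_type.addresses.append(self)


shipping = AddressType(1, "Shipping")
billing = AddressType(2, "Billing")
home = Address(1, "Main St 1", "0150", "Oslo", "Norway", shipping)
office = Address(2, "Dock Rd 5", "5003", "Bergen", "Norway", billing)
```

Adding a new kind of address is then a new AddressType row, not a new subclass or table.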
As Keith hints, this article suggests TPT in EF scales horribly, but I haven't tried it myself.