JPA/Hibernate and composite keys

I have come across some SO discussions and other posts (e.g. here, here and here) where using composite primary keys with JPA is described either as something to be avoided if possible, as a necessity due to legacy databases, or as having "hairy" corner cases. Since we are designing a new database from scratch and don't have any legacy issues to consider, is it recommended, or let's say safer, to avoid composite primary keys with JPA (either Hibernate or EclipseLink)?
My own feeling is that since JPA engines are complex enough and, like all software, not without bugs, it may be best to live with non-normalized tables rather than endure the horror of running into a bug related to composite primary keys (the rationale being that numeric single-column primary keys and foreign keys are the simplest use case for JPA engines to support, and so should be as bug-free as possible).

I've tried both methods, and personally I prefer avoiding composite primary keys for several reasons:
You can make a superclass containing the id field, so you don't have to bother with it in all your entities (see the sketch after these lists).
Entity creation becomes much easier
JPA plays nicer in general
Referencing an entity becomes easier. For example, storing a bunch of IDs in a set, or specifying a single id in the query string of a web page, is greatly simplified by only having to use a single number.
You can use a single equals method specified in the superclass that works for all entities.
If you use JSF you can make a generic converter
Easier to specify objects when working with your DB client
But it brings some bad parts as well:
Small amount of denormalization
Working with unpersisted objects (if you use auto-generated IDs, which you should) can mean trouble in some cases, since equality methods and the like need an ID to work correctly
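A minimal sketch of the base-class idea: in JPA terms this would be a @MappedSuperclass carrying the @Id field; the C#-style sketch below shows the same shape (all names are illustrative) and its equality rule reflects the unpersisted-object caveat from the list above.

public abstract class BaseEntity
{
    public long Id { get; set; }   // single auto-generated surrogate key

    // One equality definition shared by every entity: equal when the
    // concrete types match and both sides carry the same persistent Id.
    public override bool Equals(object obj)
    {
        if (ReferenceEquals(this, obj)) return true;
        return obj is BaseEntity other
            && other.GetType() == GetType()
            && Id != 0              // unpersisted objects (no Id yet) never match
            && other.Id == Id;
    }

    public override int GetHashCode() => Id.GetHashCode();
}

public class Employee : BaseEntity   // no per-entity key boilerplate
{
    public string Name { get; set; }
}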

Related

When to use owned entity types vs. just creating a foreign key or adding the columns directly to the table?

I was reading about owned entity types here https://learn.microsoft.com/en-us/ef/core/modeling/owned-entities#feedback and I was wondering when I would use them, especially when using .ToTable(); I am not sure whether ToTable creates a relationship with keys.
I read the entire article, so I understand that it essentially forces you to access the data via nav properties and prevents the owned table from being treated as an entity. They also say Include() is not needed and the data comes down with every query for the parent table, so it's not like you are reducing the amount of data that comes back.
So what's the point exactly? And what's the point of "table splitting"?
It takes the place of complex types, with the option to set it up like a 1-1 relationship (via ToTable) that is automatically eager-loaded. This would use the same PK in both tables, same as a 1-1.
The point of table splitting would be that you want an object model that is normalized where the table structure is not. This fits scenarios where you have an existing table structure and want to split off related pieces of that data into sub-entities associated with the main entity (sketched after this answer). With the ToTable option, it would be similar to a 1-1 relationship, but automatically eager-loaded. However, when considering the reasons to use a 1-1 relationship, I would consider this option a bad choice.
The common reasons for using it in normal 1-1 relationships would include:
Splitting off expensive to load, rarely used data. (images, binary, memo)
Encapsulating data particular to a single application off of a common entity. i.e. if I have a "Customer" which is used by a billing system vs. a CRM I might have "CustomerBillingData" and "CustomerCRMData" owned by "Customer" rather than an inherited BillingCustomer / CRMCustomer. As there is a "single" customer that may serve one or both systems. Billing doesn't care about CRM data, CRM doesn't care about Billing. If all data is in "Customer" then both systems potentially need to be updated, and I cannot rely on constraints when the data is optional to the other system. By using composition I can enforce required data for a particular system.
In neither of these cases would I want to use table splitting or anything that automatically eager-loads, so Owned Types with ToTable would not replace 1-1 relationships by any stretch. It's essentially a stricter version of complex types; I'd say it's strictly used for entity organization. Not something I'd admit to wanting to use very often.
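For reference, a rough sketch of the Owned Type with ToTable configuration discussed above, reusing the Customer example (property and table names are illustrative):

public class Customer
{
    public int Id { get; set; }
    public CustomerBillingData BillingData { get; set; }   // owned, no key of its own
}

public class CustomerBillingData
{
    public string BillingAddress { get; set; }
}

// In DbContext.OnModelCreating:
modelBuilder.Entity<Customer>().OwnsOne(
    c => c.BillingData,
    b => b.ToTable("CustomerBillingData"));   // separate table sharing Customer's PK,
                                              // always eager-loaded with its owner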

How to stop EF Core from indexing all foreign keys

As documented in questions like Entity Framework Indexing ALL foreign key columns, EF Core seems to automatically generate an index for every foreign key. This is a sound default for me (let's not get into an opinion war here...), but there are cases where it is just a waste of space and slowing down inserts and updates. How do I prevent it on a case-by-case basis?
I don't want to wholly turn it off, as it does more good than harm; I don't want to have to manually configure it for all those indices I do want. I just want to prevent it on specific FKs.
Related side question: is the fact that these indexes are automatically created mentioned anywhere in the EF documentation? I can't find it anywhere, which is probably why I can't find how to disable it.
Someone is bound to question why I would want to do this... so in the interest of saving time, the OP of the linked question gave a great example in a comment:
We have a People table and an Addresses table, for example. The People.AddressID FK was indexed by EF, but I only ever start from a People row and search for the Addresses record; I never find an Addresses row and then search the People.AddressID column for a matching record.
EF Core has a configuration option to replace one of its services.
I found that replacing IConventionSetBuilder with a custom one is a much cleaner approach.
https://giridharprakash.me/2020/02/12/entity-framework-core-override-conventions/
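The rough shape of that approach is below. This is only a sketch, assuming SQL Server; the exact base class and constructor signatures vary across EF Core versions and providers, so adjust to your setup.

// Derive from the provider's convention set builder and drop the convention
// that creates an index for every foreign key.
public class NoFkIndexConventionSetBuilder : SqlServerConventionSetBuilder
{
    public NoFkIndexConventionSetBuilder(
        ProviderConventionSetBuilderDependencies dependencies,
        RelationalConventionSetBuilderDependencies relationalDependencies,
        ISqlGenerationHelper sqlGenerationHelper)
        : base(dependencies, relationalDependencies, sqlGenerationHelper)
    {
    }

    public override ConventionSet CreateConventionSet()
    {
        var conventionSet = base.CreateConventionSet();
        // ForeignKeyIndexConvention is what indexes each FK; note it is
        // registered in several convention lists, so remove it wherever needed.
        var fkIndexConvention = conventionSet.ForeignKeyAddedConventions
            .OfType<ForeignKeyIndexConvention>()
            .FirstOrDefault();
        if (fkIndexConvention != null)
            conventionSet.ForeignKeyAddedConventions.Remove(fkIndexConvention);
        return conventionSet;
    }
}

// Registration, e.g. in OnConfiguring:
optionsBuilder
    .UseSqlServer(connectionString)
    .ReplaceService<IConventionSetBuilder, NoFkIndexConventionSetBuilder>();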
If it is really necessary to avoid certain foreign key indices, then - as far as I know (currently) - in .NET Core you have to remove the code that creates the indices from the generated migration file.
Another approach would be to implement a custom migration generator in combination with an attribute (or maybe an extension method) that avoids the index creation. You can find more information in this answer for EF6: EF6 preventing not to create Index on Foreign Key. But I'm not sure whether it will work in .NET Core too; the approach seems to be a bit different. Here is a MS doc article that should help.
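In practice, removing the code from the migration means deleting the scaffolded index creation, something like this (the migration, table, and index names are illustrative):

public partial class AddPeople : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // ... scaffolded CreateTable(...) calls stay as generated ...

        // Delete (or comment out) the FK index the scaffolder emitted:
        // migrationBuilder.CreateIndex(
        //     name: "IX_People_AddressId",
        //     table: "People",
        //     column: "AddressId");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // Remove the matching DropIndex(...) call here as well.
    }
}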
But I strongly advise against doing this! Not because of skipping indices on FKs - as you mentioned in the question's comments, some real-world scenarios need such an approach - but because you have to modify generated migration files.
For those who are not sure whether they really need to avoid indices on FKs (and would therefore have to modify migration files):
Before you go that way, I would suggest implementing the application with indices on FKs and checking the performance and space usage against a large amount of test data.
If it really results in performance or space usage issues at the test or QA stage, it's still possible to remove the indices in the migration files.
Since we already chatted about EnsureCreated vs. migrations, for completeness here is further information about EnsureCreated and migrations (even if you don't need it :-)):
MS doc about EnsureCreated() (it will not update your database if you have model changes - migrations would do that)
Also interesting (even if about EF7): EF7 EnsureCreated vs. Migrate Methods
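In code the difference looks like this (MyDbContext is a placeholder):

using (var context = new MyDbContext())
{
    // Creates the database and schema from the current model if the database
    // does not yet exist; no migration history is recorded, and later model
    // changes will NOT be applied to an existing database.
    context.Database.EnsureCreated();

    // Alternative: applies all pending migrations (creating the database
    // first if necessary), so the schema evolves with the model.
    // context.Database.Migrate();
}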
Entity Framework Core 2.0 (the latest version available when the question was asked) doesn't have such a mechanism, but EF Core 2.2 just might - in the form of Owned Entity Types.
Namely, since you said:
" I only ever start from a People row and search for the Addresses record; I never find an Addresses row"
Then you may want to make the Address an Owned Entity Type (and especially the variant with 'Storing owned types in separate tables', to match your choice of storing the address information in a separate Addresses table).
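That would look roughly like this (a sketch following the People/Addresses example; the resulting column naming is discussed in the rest of this answer):

public class Person
{
    public int Id { get; set; }
    public Address Address { get; set; }   // owned: no DbSet<Address>, no AddressId FK
}

public class Address
{
    public string Street { get; set; }
}

// In OnModelCreating:
modelBuilder.Entity<Person>().OwnsOne(
    p => p.Address,
    a => a.ToTable("Addresses"));   // Addresses is keyed by the owner's id
                                    // (the PersonId column discussed below),
                                    // so no extra index on People is created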
The docs for the feature seem to say something matching:
"Owned entities are essentially a part of the owner and cannot exist without it"
By the way, now that the feature is in EF, this may justify why EF always creates the indexes for HasMany/HasOne. It's likely because the Has* relations are meant to be used towards other entities (as opposed to 'value objects'), and these, since they have their own identity, are meant to be queried independently and allow accessing other entities they relate to using navigation properties. For such a use case, it would be simply dangerous to use such navigation properties without indexes (a few queries could slow the database down hugely).
There are a few caveats here though:
Turning an entity into an owned one doesn't instruct EF only about the index; rather, it instructs EF to map the model to the database in a somewhat different way (more on this below), but the end effect is in fact free of that extra index on People.
But chances are this actually might be the better solution for you: this way you also declare that no one should query the Address directly (there can be no DbSet<T> of an owned type), minimizing the chance of someone using it to reach the other entities with these costly index-less queries.
As to what the difference is, you'll note that if you make the Address owned by Person, EF will create a PersonId column in the Address table, which is different from your AddressId in the People table (in a sense, the lack of the foreign key is a bit of a cheat: an index for querying Person from Address is there, it's just that it's the primary key index of the People table, which was there anyway). But take note that this design is actually rather good: it not only needs one column less (no AddressId in People), it also guarantees that there's no way to create an orphaned Address record that your code will never be able to access.
If you would still like to keep the AddressId column in the Addresses table, then there's still one option:
Choose the name AddressId for the foreign key in the Addresses table and just "pretend" you don't know that it happens to hold the same values as the PersonId :)
If that option isn't funny (e.g. because you can't change your database schema), then you're somewhat out of luck. But do take note that among the Current shortcomings of EF they still list "Instances of owned entity types cannot be shared by multiple owners", while some shortcomings of the previous versions are already listed as addressed. It might be worth watching that space since, it seems to me, resolving that one will probably involve introducing the ability to have your AddressId in the People table: for owned objects to be shared among many entities, the foreign keys would need to sit with the owning entities so that several owners can refer to the same value.
In the OnModelCreating override, after the call to base.OnModelCreating(modelBuilder);, add:

// Grab the index that the FK convention created via HasIndex(...).Metadata,
// then remove it from the entity's metadata. Your_Table_Entity and
// Column_Index_Is_On are placeholders for your own entity and column.
var indexForRemoval = modelBuilder.Entity<Your_Table_Entity>().HasIndex(x => x.Column_Index_Is_On).Metadata;
modelBuilder.Entity<Your_Table_Entity>().Metadata.RemoveIndex(indexForRemoval);

Entity Framework Primary Key Design (Human Resources web app)

I'm going to start the development for a Human Resources (HRMS) application and I have been thinking about choosing the best datatype for database primary keys.
The application has a few must-haves:
Javascript Framework (EXTJS)
ASP.NET WebAPI Server Side
Multi-Tenant feature (Database design)
I have developed other enterprise applications before using Entity Framework and auto-incrementing INTs as primary keys, but sometimes you get into trouble when dealing with manual imports, etc., because the primary key is dynamic.
So I have been thinking of using GUIDs as primary keys, because they give you a lot of benefits in terms of data management, but I would like to know how that performs with Entity Framework. Is there any side effect of using GUIDs as primary keys in all my tables?
The only downside I can think of to using GUIDs on the server side is the payload to the client: each JSON record sent from server to client will carry a GUID (36 chars instead of a simple INT).
Appreciate any feedback.
No downside I know of, besides GUIDs being harder to read than integers when debugging and comparing keys. The major benefit I see is that you don't need the table locked when doing inserts, because you simply call Guid.NewGuid() and the value is practically guaranteed to be unique.
I use GUIDs all the time as my primary key with EF, so I'm sure it works very well. Unless your table's records are very, very short, I don't think the key length is of material concern.
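A minimal sketch of the pattern (entity and context names are illustrative):

public class Employee
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// The key is generated on the client, with no identity column involved:
var employee = new Employee { Id = Guid.NewGuid(), Name = "Ada" };
context.Employees.Add(employee);   // 'context' is an illustrative DbContext
context.SaveChanges();             // the INSERT ships the pre-generated GUID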
My 2 cents.

Entity Framework TPC with multiple inheritance

I am using EF with TPC and I have multiple inheritance; let's say I have
Employee (abstract)
Developer (inherits from Employee)
SeniorDeveloper (inherits from Developer)
I inserted some rows in the database and EF reads them correctly.
BUT
When I insert a new SeniorDeveloper, the values get written to both the SeniorDeveloper AND the Developer database tables, hence querying just the Developers (context.Employees.OfType<Developer>()) also returns the recently added SeniorDevelopers.
Is there a way to tell EF that it should store the values in only one table, or why does EF fall back to a TPT strategy?
Since it doesn't look like EF supports multiple inheritance with TPC, we ended up using TPC between Employee and Developer and TPT between Developer and SeniorDeveloper...
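In EF6 fluent mapping, that combination looks roughly like this (a sketch; table names are illustrative):

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // TPC between Employee and Developer: MapInheritedProperties() pulls the
    // columns of the abstract Employee down into the Developers table.
    modelBuilder.Entity<Developer>().Map(m =>
    {
        m.MapInheritedProperties();
        m.ToTable("Developers");
    });

    // TPT between Developer and SeniorDeveloper: SeniorDevelopers holds only
    // the extra columns, keyed to the corresponding Developers row.
    modelBuilder.Entity<SeniorDeveloper>().ToTable("SeniorDevelopers");
}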
I believe there is a reason for this, although I may not see the full picture and might just be speculating.
The situation
Indeed, the only way (that I see) for EF to be able to list only the non-senior developers (your querying use-case) in a TPT scenario by reading only the Developer table would be by using a discriminator, and we know that EF doesn't use one in TPT/TPC strategies.
Why? Well, remember that all senior developers are developers, so it's only natural (and necessary) that they have a Developer record as well as a SeniorDeveloper record.
The only exception is if Developer is an abstract type, in which case you can use a TPC strategy to remove the Developer table altogether. In your case however, Developer is concrete.
The current solution
Remembering this, and without a discriminator in the Developer table, the only way to determine if any developer is a non-senior developer is by checking if it is not a senior developer; in other words, by verifying that there is no record of the developer in the SeniorDeveloper table, or any other subtype table.
That did sound a little obvious, but now we understand why the SeniorDeveloper table must be used and accessed when its base type (Developer) is concrete (non-abstract).
The current implementation
I'm writing this from memory, so I hope it isn't too far off, but this is also what Slauma mentioned in another comment. You probably want to fire up a SQL profiler and verify this.
The way it is implemented is by requesting a UNION of projections of the tables. These projections simply add a discriminator declaring their own type in some encoded way[1]. In the union set, the rows can then be filtered based on this discriminator.
[1] If I remember correctly, it goes something like this: 0X for the base type, 0X0X for the first subtype in the union, 0X1X for the second subtype, and so on.
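To see it from the query side, here is a polymorphic query and the conceptual shape of its translation; the comment paraphrases the SQL per the description above, and the exact encoding and operators differ by scenario:

// Fetch all developers, senior or not:
var developers = context.Employees.OfType<Developer>().ToList();

// Conceptually, a UNION of projections, each adding a synthetic
// discriminator column that lets EF materialize the right CLR type:
//   SELECT ..., '0X'   AS __disc FROM Developer ...
//   UNION ALL
//   SELECT ..., '0X0X' AS __disc FROM SeniorDeveloper ...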
Trade-off #1
We can already identify a trade-off: EF can either store a discriminator in the table, or it can "generate one" at "run time".
The disadvantages of a stored discriminator are that it is less space efficient and possibly "ugly" (if that's an argument). The advantage is lookup performance in a very specific case (we only want the records of the base type).
The disadvantage of a "run time" discriminator is that lookup performance is not as good for that same use-case. The advantage is that it is more space efficient.
At first sight, it would seem that sometimes we might prefer to trade a little bit of space for query performance, and EF wouldn't let us.
In reality, it's not always clear when: by requesting a UNION of two tables, we just look up two indexes instead of one, and the performance difference is negligible. With a single level of inheritance, it can't be worse than 2x (since all subtype sets are disjoint). But wait, there's more.
Trade-off #2
Remember that I said the performance advantage of the stored-discriminator approach only appears in the specific use-case where we look up records of the base type. Why is that?
Well, if you're searching for developers that may or may not be senior developers[2], you're forced to look up the SeniorDeveloper table anyway. While this, again, seems obvious, what may be less obvious is that EF can't know in advance whether the results will only be of one type or another. This means it would have to issue two queries in the worst case: one on the Developer table, and if there is even one senior developer in the result set, a second one on the SeniorDeveloper table.
Unfortunately, the extra roundtrip probably has a bigger performance impact than a UNION of the two tables. (I say probably, I haven't verified it.) Worse, it increases for each subtype for which there is a row in the result set. Imagine a type with 3, or 5, or even 10 subtypes.
And that's your trade-off #2.
[2] Remember that this kind of operation could come from any part of your application(s), while resolving the trade-off must be done globally to satisfy all processes/applications/users. Couple that with the fact that the EF team must make these trade-offs for all EF users (although it is true that they could add some configuration for these kinds of trade-offs).
A possible alternative
By batching SQL queries, it would be possible to avoid the multiple roundtrips. EF would have to send some procedural logic to the server for the conditional lookups (T-SQL). But since we already established in trade-off #1 that the performance advantage is most likely negligible in many cases, I'm not sure this would ever be worth the effort. Maybe someone could open an issue ticket for this to determine if it makes sense.
Conclusion
In the future, maybe someone can optimize a few typical operations in this specific scenario with some creative solutions, then provide some configuration switches when the optimization involves such trade-offs.
Right now however, I think EF has chosen a fair solution. In a strange way, it's almost cleaner.
A few notes
I believe the use of union is an optimization applied in certain cases. In other cases, it would be an outer join, but the use of a discriminator (and everything else) remains the same.
You mentioned multiple inheritance, which sort of confused me initially. In common object-oriented parlance, multiple inheritance is a construct in which a type has multiple base types. Many object-oriented type systems don't support this, including the CTS (used by all .NET languages). You mean something else here.
You also mentioned that EF would "fallback" to a TPT strategy. In the case of Developer/SeniorDeveloper, a TPC strategy would have the same results as a TPT strategy, since Developer is concrete. If you really want a single table, you must then use a TPH strategy.
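For completeness, a TPH sketch in EF6 terms (a single table with a discriminator; the column and values shown are illustrative):

// TPH: the whole hierarchy maps to one table; rows are told apart by a
// discriminator column. In EF6 this is the default mapping, and the
// discriminator can be customized per subtype:
modelBuilder.Entity<Employee>().ToTable("Employees");
modelBuilder.Entity<Developer>()
    .Map(m => m.Requires("EmployeeType").HasValue("Developer"));
modelBuilder.Entity<SeniorDeveloper>()
    .Map(m => m.Requires("EmployeeType").HasValue("SeniorDeveloper"));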

Polymorphic association foreign key constraints. Is this a good solution?

We're using polymorphic associations in our application. We've run into the classic problem: we encountered an invalid foreign key reference, and we can't create a foreign key constraint, because it's a polymorphic association.
That said, I've done a lot of research on this. I know the downsides of using polymorphic associations, and the upsides. But I found what seems to be a decent solution:
http://blog.metaminded.com/2010/11/25/stable-polymorphic-foreign-key-relations-in-rails-with-postgresql/
This is nice, because you get the best of both worlds. My concern is the data duplication. I don't have deep enough knowledge of PostgreSQL to completely understand the cost of this solution.
What are your thoughts? Should this solution be completely avoided? Or is it a good solution?
The only alternative, in my opinion, is to create a foreign key for each association type. But then you run into validating that only one association exists. It's a "pick your poison" situation. Polymorphic associations clearly describe intent, and also make this scenario impossible. In my opinion that is the most important. The database foreign key constraint is a behind the scenes feature, and altering "intent" to work with database limitations feels wrong to me. This is why I'd like to use the above solution, assuming there is not a glaring "avoid" with it.
The biggest problem I have with PostgreSQL's INHERITS implementation is that you can't set a foreign key reference to the parent table. There are a lot of cases where you need to do that. See the examples at the end of my answer.
The decision to create tables, views, or triggers outside of Rails is the crucial one. Once you decide to do that, then I think you might as well use the very best structure you can find.
I have long used a base parent table, enforcing disjoint subtypes using foreign keys. This structure guarantees only one association can exist, and that the association resolves to the right subtype in the parent table. (In Bill Karwin's slideshow on SQL antipatterns, this approach starts on slide 46.) This doesn't require triggers in the simple cases, but I usually provide one updatable view per subtype, and require client code to use the views. In PostgreSQL, updatable views require writing either triggers or rules. (Versions before 9.1 require rules.)
In the most general case, the disjoint subtypes don't have the same number or kind of attributes. That's why I like updatable views.
Table inheritance isn't portable, but this kind of structure is. You can even implement it in MySQL. In MySQL, you have to replace the CHECK constraints with foreign key references to one-row tables. (MySQL parses and ignores CHECK constraints.)
I don't think you have to worry about data duplication. In the first place, I'm pretty sure data isn't duplicated between parent tables and inheriting tables; it just appears that way. In the second place, duplicated or derived data whose integrity is completely controlled by the dbms is not an especially bitter pill to swallow. (But uncontrolled duplication is.)
Give some thought to whether deletes should cascade.
A publications example with SQL code.
A "parties" example with SQL code.
You cannot enforce that in a database in an easy way, so this is a really bad idea. The best solution is usually the simple one: forget about the polymorphic associations; this smells like an antipattern.