How to stop EF Core from indexing all foreign keys - entity-framework

As documented in questions like Entity Framework Indexing ALL foreign key columns, EF Core seems to automatically generate an index for every foreign key. This is a sound default for me (let's not get into an opinion war here...), but there are cases where it is just a waste of space and slowing down inserts and updates. How do I prevent it on a case-by-case basis?
I don't want to wholly turn it off, as it does more good than harm; I don't want to have to manually configure it for all those indices I do want. I just want to prevent it on specific FKs.
Related side question: is the fact that these index are automatically created mentioned anywhere in the EF documentation? I can't find it anywhere, which is probably why I can't find how to disable it?
Someone is bound to question why I would want to do this... so in the interest of saving time, the OPer of the linked question gave a great example in a comment:
We have a People table and an Addresses table, for example. The
People.AddressID FK was Indexed by EF but I only ever start from a
People row and search for the Addresses record; I never find an
Addresses row and then search the People.AddressID column for a
matching record.

EF Core has a configuration option to replace one of its services.
I found replacing IConventionSetBuilder to custom one would be a much cleaner approach.
https://giridharprakash.me/2020/02/12/entity-framework-core-override-conventions/

If it is really necessary to avoid the usage of some foreign keys indices - as far as I know (currently) - in .Net Core, it is necessary to remove code that will set the indices in generated migration code file.
Another approach would be to implement a custom migration generator in combination with an attribute or maybe an extension method that will avoid the index creation. You could find more information in this answer for EF6: EF6 preventing not to create Index on Foreign Key. But I'm not sure if it will work in .Net Core too. The approach seems to be bit different, here is a MS doc article that should help.
But, I strongly advise against doing this! I'm against doing this, because you have to modify generated migration files and not because of not using indices for FKs. Like you mentioned in question's comments, in real world scenarios some cases need such approach.
For other people they are not really sure if they have to avoid the usage of indices on FKs and therefor they have to modify migration files:
Before you go that way, I would suggest to implement the application with indices on FKs and would check the performance and space usage. Therefor I would produce a lot test data.
If it really results in performance and space usage issues on a test or QA stage, it's still possible to remove indices in migration files.
Because we already chat about EnsureCreated vs migrations here for completeness further information about EnsureCreated and migrations (even if you don't need it :-)):
MS doc about EnsureCreated() (It will not update your database if you have some model changes - migrations would do it)
interesting too (even if for EF7) EF7 EnsureCreated vs. Migrate Methods

Entity Framework core 2.0 (the latest version available when the question was asked) doesn't have such a mechanism, but EF Core 2.2 just might - in the form of Owned Entity Types.
Namely, since you said:
" I only ever start from a People row and search for the Addresses record; I never find an Addresses row"
Then you may want to make the Address an Owned Entity Type (and especially the variant with 'Storing owned types in separate tables', to match your choice of storing the address information in a separate Addresses table).
The docs of the feature seem to say a matching:
"Owned entities are essentially a part of the owner and cannot exist without it"
By the way, now that the feature is in EF, this may justify why EF always creates the indexes for HasMany/HasOne. It's likely because the Has* relations are meant to be used towards other entities (as opposed to 'value objects') and these, since they have their own identity, are meant to be queried independently and allow accessing other entities they relate to using navigational properties. For such a use case, it would be simply dangerous use such navigation properties without indexes (a few queries could make the database slow down hugely).
There are few caveats here though:
Turning an entity into an owned one doesn't instruct EF only about the index, but rather it instructs to map the model to database in a way that is a bit different (more on this below) but the end effect is in fact free of that extra index on People.
But chances are, this actually might be the better solution for you: this way you also say that no one should query the Address (by not allowing to create a DbSet<T> of that type), minimizing the chance of someone using it to reach the other entities with these costly indexless queries.
As to what the difference is, you'll note that if you make the Address owned by Person, EF will create a PersonId column in the Address table, which is different to your AddressId in the People table (in a sense, lack of the foreign key is a bit of a cheat: an index for querying Person from Address is there, it's just that it's the primary key index of the People table, which was there anyways). But take note that this design is actually rather good - it not only needs one column less (no AddressId in People), but it also guarantees that there's no way to make orphaned Address record that your code will never be able to access.
If you would still like to keep the AddressId column in the Addresses, then there's still one option:
Just choose a name of AddressId for the foreign key in the Addresses table and just "pretend" you don't know that it happens to have the same values as the PersonId :)
If that option isn't funny (e.g. because you can't change your database schema), then you're somewhat out of luck. But do take note that among the Current shortcomings of EF they still list "Instances of owned entity types cannot be shared by multiple owners", while some shortcomings of the previous versions are already listed as addressed. Might be worth watching that space as, it seems to me, resolving that one will probably involve introducing the ability to have your AddressId in the People, because in such a model, for the owned objects to be shared among many entities the foreign keys would need to be sitting with the owning entities to create an association to the same value for each.

in the OnModelCreating override
AFTER the call to
base.OnModelCreating(modelBuilder);
add:
var indexForRemoval = modelBuilder.Entity<You_Table_Entity>().HasIndex(x => x.Column_Index_Is_On).Metadata;
modelBuilder.Entity<You_Table_Entity>().Metadata.RemoveIndex(indexForRemoval);
'''

Related

hibernate-search for one-directional associations

According to the spec, when #IndexedEmbedded points to an entity, the association has to be directional and the other side has to be annotated with #ContainedIn. If not, Hibernate Search has no way to update the root index when the associated entity is updated.
Am I right to assume the word directional should be bi-directional? I have exactly the problem that my index is not updated. I have one-directional relationships, e.g. person to order but the order does not know the person. Now when I change the order the index is not updated.
If changing the associations to become bi-directional is no option which possibilities would I have to still use hibernate-search? Would it be possible to create two separate indices and to combine queries?
Am I right to assume the word directional should be bi-directional?
Yes. I will fix this typo.
If changing the associations to become bi-directional is no option which possibilities would I have to still use hibernate-search?
If Person is indexed and embeds Order, but Order doesn't have an inverse association to Person, then Hibernate Search cannot retrieve the Persons that have to be reindexed when an Order changes.
Thus you will have to reindex manually: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#manual-index-changes .
You can adopt one of two strategies:
The easy path: reindex all the Person entities periodically, e.g. every night.
The hard path: reindex the affected Person entities whenever an Order changes. This basically means adding code to your services so that whenever an order is created/updated/deleted, you run a query to retrieve all the corresponding persons, and reindex them manually.
The first solution is fairly simple, but has the big disadvantage that the Person index will be up to 24 hours out of date. Depending on your use case, that may be ok or that may not.
The second solution is prone to errors and you would basically be doing Hibernate Search's work.
All in all, you really have to ask yourself if adding the inverse side of the association to your model wouldn't be better.
Would it be possible to create two separate indices and to combine queries?
Technically, if you are using the Lucene integration (not the Elasticsearch one), then yes, it would be possible.
But:
you would need above-average knowledge of Lucene.
you would have to bypass Hibernate Search APIs, and would need to write code to do what Hibernate Search usually does.
you would have to use experimental (read: unstable) Lucene APIs.
I am unsure as to how well that would perform, as I never tried it.
So I wouldn't recommend it if you're not familiar with Lucene's APIs. If you really want to take that path, here are a few pointers:
How to use the index readers directly: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#IndexReaders
Lucene's documentation for joins (what you're looking for is query-time joins): https://lucene.apache.org/core/5_5_5/join/org/apache/lucene/search/join/package-summary.html

When to use an owned entity types vs just creating a foreign key or adding the columns directly to the table?

I was reading about owned entity types here https://learn.microsoft.com/en-us/ef/core/modeling/owned-entities#feedback and I was wondering when I would use that. Especially when using .ToTable(); although I am not sure if ToTable creates a relationship with keys.
I read the entire article so I understand that it essentially forces you to access the data via nav properties and prevents the owned table from being treated as an entity. They also say Include() is not needed and the data comes down with every query for the parent table so its not like you are reducing the amount of data that comes back.
So whats the point exactly? Also whats the point of "table splitting"?
It takes the place of Complex types with the option to set it up like a 1-1 relationship /w ToTable while automatically eager-loaded. This would use the same PK in both tables, same as 1-1.
The point Table-splitting would be that you want an object model that is normalized, where the table structure is not. This would fit scenarios where you have an existing table structure and want to split off related pieces of that data into sub-entities associated with the main entity. With the ToTable option, it would be similar to a 1-1 relationship, but automatically eager-loaded. However when considering the reasons to use a 1-1 relationship I would consider this option a bad choice.
The common reasons for using it in normal 1-1 relationships would include:
Splitting off expensive to load, rarely used data. (images, binary, memo)
Encapsulating data particular to a single application off of a common entity. i.e. if I have a "Customer" which is used by a billing system vs. a CRM I might have "CustomerBillingData" and "CustomerCRMData" owned by "Customer" rather than an inherited BillingCustomer / CRMCustomer. As there is a "single" customer that may serve one or both systems. Billing doesn't care about CRM data, CRM doesn't care about Billing. If all data is in "Customer" then both systems potentially need to be updated, and I cannot rely on constraints when the data is optional to the other system. By using composition I can enforce required data for a particular system.
In neither of these cases would I want to use table-splitting or anything that automatically eager-loads, so Owned Types /w ToTable would not replace 1-1 relationships by any stretch. It's essentially a more strict version of complex types, I'd say it's strictly used for entity organization. Not something I'd admit to wanting to use very often.

Entity Framework Pluralization Concern

This is a two part question:
1) What is the advantage of pluralizing other than having model respective tables names implying that they contain a collection of entity records?
2) Pluralizing is a very intricate art, and is sensitive to language localization. When I created an Entity called Schema, EF yielded a table called Schemata.
There is a major problem with this. Primarily, a developer would need to know that the plural of Schema is not Schemas, but the aforementioned. Also, this means that EF maintains some sort of a linguistic dictionary which explicitely dictates pluralization of words, and this can lead to unexpected results..
PS: Ok..., lets have the SO antifa-blm-nazis vote to close my question because it doesn't meet some guidelines, and because they have nothing better to do with their lives, and this commentary is really offensive(albeit true to life)!
Every Entity Framework entity I've ever created I have control over the pluralized version of the name, so I'm not sure what the issue is. You don't have to accept the suggested pluralization. The pluralization is useful in following connections to child entities and their collections, so there is a reason to have them in the first place. Use common sense in creating pluralized names that have the broadest, most easily grasped meaning.

Entity Framework Code First unique constraint across multiple tables

So I'm creating a database model using Entity Framework's Code First paradigm and I'm trying to create two tables (Players and Teams) that must share a uniqueness constraint regarding their primary key.
For example, I have 3 Players with Ids "1", "2" and "3" and when I try to create a Team with Id "2", the system should validate uniqueness and fail because there already exists a Player with Id "2".
Is this possible with data annotations? Both these entities share a common Interface called IParticipant if that helps!
Txs in advance lads!
The scenario you are describing here isn't really ideal. This isn't really a restriction on Entity Framework; it's more a restriction on the database stack. By default, the Id primary key is an Identity column, and SQL itself isn't really supportive of the idea of "shared" Identity columns. You can disable Identity and manage the Id properties yourself, but then Entity Framework cannot automatically build navigation properties for your entities.
The best option here is to use one single participant table, in a technique called "Table Per Hierarchy", or TPH. Entity Framework can manage the single table using an internal discriminator column. Shared properties can be put into the base class, and non-shared properties can be put on the individual classes, which Entity Framework will composite into a single large table in the DB. The main drawback to this strategy is that columns for non-shared properties will automatically be nullable in the database. This article describes this scenario very well.
The more I try to come up with a solution, I realize that this is an example of the XY Problem. There is not really a good solution to this question, because this question is already a proposed solution. There is a problem here that has led you to create an Interface which you suggest requires the entities which are using the interface to have a unique Id. This really sounds like an issue with the design of the Interface itself, as Interfaces should be agnostic to the entity they are applied to. Perhaps providing some code and showing what your problem actually is would be helpful, since the proposed solution you are asking how to implement here isn't really practical.

I don't need/want a key!

I have some views that I want to use EF 4.1 to query. These are specific optimized views that will not have keys to speak of; there will be no deletions, updates, just good ol'e select.
But EF wants a key set on the model. Is there a way to tell EF to move on, there's nothing to worry about?
More Details
The main purpose of this is to query against a set of views that have been optimized by size, query parameters and joins. The underlying tables have their PKs, FKs and so on. It's indexed, statiscized (that a word?) and optimized.
I'd like to have a class like (this is a much smaller and simpler version of what I have...):
public MyObject //this is a view
{
Name{get;set}
Age{get;set;}
TotalPimples{get;set;}
}
and a repository, built off of EF 4.1 CF where I can just
public List<MyObject> GetPimply(int numberOfPimples)
{
return db.MyObjects.Where(d=> d.TotalPimples > numberOfPimples).ToList();
}
I could expose a key, but whats the real purpose of dislaying a 2 or 3 column natural key? That will never be used?
Current Solution
Seeming as their will be no EF CF solution, I have added a complex key to the model and I am exposing it in the model. While this goes "with the grain" on what one expects a "well designed" db model to look like, in this case, IMHO, it added nothing but more logic to the model builder, more bytes over the wire, and extra properties on a class. These will never be used.
There is no way. EF demands unique identification of the record - entity key. That doesn't mean that you must expose any additional column. You can mark all your current properties (or any subset) as a key - that is exactly how EDMX does it when you add database view to the model - it goes through columns and marks all non-nullable and non-computed columns as primary key.
You must be aware of one problem - EF internally uses identity map and entity key is unique identification in this map (each entity key can be associated only with single entity instance). It means that if you are not able to choose unique identification of the record and you load multiple records with the same identification (your defined key) they will all be represented by a single entity instance. Not sure if this can cause you any issues if you don't plan to modify these records.
EF is looking for a unique way to identify records. I am not sure if you can force it to go counter to its nature of desiring something unique about objects.
But, this is an answer to the "show me how to solve my problem the way I want to solve it" question and not actually tackling your core business requirement.
If this is a "I don't want to show the user the key", then don't bind it when you bind the data to your form (web or windows). If this is a "I need to share these items, but don't want to give them the keys" issue, then map or surrogate the objects into an external domain model. Adds a bit of weight to the solution, but allows you to still do the heavy lifting with a drag and drop surface (EF).
The question is what is the business requirement that is pushing you to create a bunch of objects without a unique identifier (key).
One way to do this would be not to use views at all.
Just add the tables to your EF model and let EF create the SQL that you are currently writing by hand.