Hibernate Search for one-directional associations

According to the spec, when @IndexedEmbedded points to an entity, the association has to be directional and the other side has to be annotated with @ContainedIn. If not, Hibernate Search has no way to update the root index when the associated entity is updated.
Am I right to assume the word directional should be bi-directional? I have exactly the problem that my index is not updated. I have one-directional relationships, e.g. person to order but the order does not know the person. Now when I change the order the index is not updated.
If changing the associations to become bi-directional is no option which possibilities would I have to still use hibernate-search? Would it be possible to create two separate indices and to combine queries?

Am I right to assume the word directional should be bi-directional?
Yes. I will fix this typo.
If changing the associations to become bi-directional is no option which possibilities would I have to still use hibernate-search?
If Person is indexed and embeds Order, but Order doesn't have an inverse association to Person, then Hibernate Search cannot retrieve the Persons that have to be reindexed when an Order changes.
Thus you will have to reindex manually: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#manual-index-changes .
You can adopt one of two strategies:
The easy path: reindex all the Person entities periodically, e.g. every night.
The hard path: reindex the affected Person entities whenever an Order changes. This basically means adding code to your services so that whenever an order is created/updated/deleted, you run a query to retrieve all the corresponding persons, and reindex them manually.
The first solution is fairly simple, but has the big disadvantage that the Person index will be up to 24 hours out of date. Depending on your use case, that may be ok or that may not.
The second solution is prone to errors and you would basically be doing Hibernate Search's work.
All in all, you really have to ask yourself if adding the inverse side of the association to your model wouldn't be better.
Would it be possible to create two separate indices and to combine queries?
Technically, if you are using the Lucene integration (not the Elasticsearch one), then yes, it would be possible.
But:
you would need above-average knowledge of Lucene.
you would have to bypass Hibernate Search APIs, and would need to write code to do what Hibernate Search usually does.
you would have to use experimental (read: unstable) Lucene APIs.
I am unsure as to how well that would perform, as I never tried it.
So I wouldn't recommend it if you're not familiar with Lucene's APIs. If you really want to take that path, here are a few pointers:
How to use the index readers directly: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#IndexReaders
Lucene's documentation for joins (what you're looking for is query-time joins): https://lucene.apache.org/core/5_5_5/join/org/apache/lucene/search/join/package-summary.html


How to stop EF Core from indexing all foreign keys

As documented in questions like Entity Framework Indexing ALL foreign key columns, EF Core seems to automatically generate an index for every foreign key. This is a sound default for me (let's not get into an opinion war here...), but there are cases where it just wastes space and slows down inserts and updates. How do I prevent it on a case-by-case basis?
I don't want to wholly turn it off, as it does more good than harm; I don't want to have to manually configure it for all those indices I do want. I just want to prevent it on specific FKs.
Related side question: is the fact that these indexes are automatically created mentioned anywhere in the EF documentation? I can't find it anywhere, which is probably why I can't find how to disable it.
Someone is bound to question why I would want to do this... so in the interest of saving time, the OP of the linked question gave a great example in a comment:
We have a People table and an Addresses table, for example. The People.AddressID FK was Indexed by EF but I only ever start from a People row and search for the Addresses record; I never find an Addresses row and then search the People.AddressID column for a matching record.
EF Core has a configuration option to replace one of its services.
I found that replacing IConventionSetBuilder with a custom one is a much cleaner approach.
https://giridharprakash.me/2020/02/12/entity-framework-core-override-conventions/
If it is really necessary to avoid some foreign key indices, then as far as I currently know, in .NET Core you have to remove the code that creates those indices from the generated migration file.
Another approach would be to implement a custom migration generator in combination with an attribute or maybe an extension method that will avoid the index creation. You can find more information in this answer for EF6: EF6 preventing not to create Index on Foreign Key. But I'm not sure whether it will work in .NET Core too. The approach seems to be a bit different; here is a MS doc article that should help.
But I strongly advise against doing this! I'm against it because you have to modify generated migration files, not because FKs would go without indices. As you mentioned in the question's comments, some real-world scenarios do need such an approach.
For other readers who are not sure whether they need to avoid indices on FKs, and would therefore have to modify migration files:
Before you go that way, I would suggest implementing the application with indices on FKs and checking the performance and space usage. For that I would generate a lot of test data.
If it really results in performance and space usage issues on a test or QA stage, it's still possible to remove indices in migration files.
Because we already discussed EnsureCreated vs migrations here, for completeness here is further information about EnsureCreated and migrations (even if you don't need it :-)):
MS doc about EnsureCreated() (It will not update your database if you have some model changes - migrations would do it)
Also interesting (even though it is about EF7): EF7 EnsureCreated vs. Migrate Methods
Entity Framework Core 2.0 (the latest version available when the question was asked) doesn't have such a mechanism, but EF Core 2.2 just might - in the form of Owned Entity Types.
Namely, since you said:
" I only ever start from a People row and search for the Addresses record; I never find an Addresses row"
Then you may want to make the Address an Owned Entity Type (and especially the variant with 'Storing owned types in separate tables', to match your choice of storing the address information in a separate Addresses table).
The docs of the feature seem to describe a matching case:
"Owned entities are essentially a part of the owner and cannot exist without it"
By the way, now that the feature is in EF, this may justify why EF always creates the indexes for HasMany/HasOne. It's likely because the Has* relations are meant to be used towards other entities (as opposed to 'value objects') and these, since they have their own identity, are meant to be queried independently and allow accessing other entities they relate to using navigation properties. For such a use case, it would be simply dangerous to use such navigation properties without indexes (a few queries could make the database slow down hugely).
There are a few caveats here, though:
Turning an entity into an owned one doesn't instruct EF only about the index, but rather it instructs to map the model to database in a way that is a bit different (more on this below) but the end effect is in fact free of that extra index on People.
But chances are, this actually might be the better solution for you: this way you also say that no one should query the Address (by not allowing to create a DbSet<T> of that type), minimizing the chance of someone using it to reach the other entities with these costly indexless queries.
As to what the difference is, you'll note that if you make the Address owned by Person, EF will create a PersonId column in the Address table, which is different from your AddressId in the People table (in a sense, the lack of the foreign key is a bit of a cheat: an index for querying Person from Address is there, it's just that it's the primary key index of the People table, which was there anyway). But take note that this design is actually rather good - it not only needs one column less (no AddressId in People), but it also guarantees that there's no way to make an orphaned Address record that your code will never be able to access.
If you would still like to keep the AddressId column in the Addresses table, then there's still one option:
Just choose a name of AddressId for the foreign key in the Addresses table and just "pretend" you don't know that it happens to have the same values as the PersonId :)
If that option isn't funny (e.g. because you can't change your database schema), then you're somewhat out of luck. But do take note that among the Current shortcomings of EF they still list "Instances of owned entity types cannot be shared by multiple owners", while some shortcomings of the previous versions are already listed as addressed. Might be worth watching that space as, it seems to me, resolving that one will probably involve introducing the ability to have your AddressId in the People, because in such a model, for the owned objects to be shared among many entities the foreign keys would need to be sitting with the owning entities to create an association to the same value for each.
In the OnModelCreating override, after the call to
base.OnModelCreating(modelBuilder);
add:
// Get the index on the FK column (HasIndex returns the existing one or defines it), then remove it from the model.
var indexForRemoval = modelBuilder.Entity<You_Table_Entity>().HasIndex(x => x.Column_Index_Is_On).Metadata;
modelBuilder.Entity<You_Table_Entity>().Metadata.RemoveIndex(indexForRemoval);

AddItemToSet vs StoreRelatedEntities

I am trying to understand when someone would use AddItemToSet vs StoreRelatedEntities.
It seems the former is a way to associate a set label with a string-based item handle.
The latter is a way to associate two entities, which seems like a more generalized operation.
What is it that AddItemToSet does that StoreRelatedEntities can't do?
Thanks
The AddItemToSet API in ServiceStack.Redis is a 1:1 mapping that calls the Redis server's SADD operation, i.e. it adds an item to a Redis SET.
StoreRelatedEntities is a higher-level operation that also maintains an index containing the relationship between the entities, described in detail in this Storing Related Entities in Redis answer.

find(1234) including relationships

I have a model "Events" (Zend_Db_Table_Abstract) that has various relationships to other models. Usually I think I would do something like this to find it and its relationships:
$events = new Events();
$event = $events->find($id)->current();
$eventsRelationship1 = $event->findDependentRowset('Relationship1');
As the relationship is already set up I am wondering if there's any sort of automatic join available or something. Every time I fetch my event I need to have all the relationships, too. Currently I see only two ways to achieve that:
Build the query myself, hard coded. Don't like this, because it's working around the already set up relationship and "model method convenience".
Fetch every related object with a single query. This one's ugly, too, as I have to trigger too many queries.
This goes even a step further when thinking about getting a set of multiple rows. For a single event I may query the database multiple times, but when fetching 100 rows, joins are essential.
So, does anyone know a way to create joins by using those relationships or is there no other way than hardcoding the query?
Thanks in advance
Arne
The way to solve this challenge is to 'upgrade' your database access to use the Data Mapper pattern.
You are essentially adding an extra layer between the models in your application and their representation in the db. This mapper layer allows you to read/write data from different tables - rather than having a direct link between one model and one table.
Here is a good tutorial to follow. (There are some bits you can skip - I left out all the getters and setters as it's just me using the code).
It takes a little while to get your head round the way it works, when you've just been using Zend_Db_Table_Abstract, but it is worth it.

DDD: Persisting aggregates

Let's consider the typical Order and OrderItem example. Assuming that OrderItem is part of the Order Aggregate, it can only be added via Order. So, to add a new OrderItem to an Order, we have to load the entire Aggregate via Repository, add a new item to the Order object and persist the entire Aggregate again.
This seems to have a lot of overhead. What if our Order has 10 OrderItems? This way, just to add a new OrderItem, not only do we have to read 10 OrderItems, but we also have to re-insert all 10 of them. (This is the approach that Jimmy Nilsson has taken in his DDD book. Every time he wants to persist an Aggregate, he clears all the child objects, and then re-inserts them again. This can cause other issues, as the IDs of the children change every time because of the IDENTITY column in the database.)
I know some people may suggest applying the Unit of Work pattern at the Aggregate Root so it keeps track of what has been changed and only commits those changes. But this violates the Persistence Ignorance (PI) principle because persistence logic is leaking into the Domain Model.
Has anyone thought about this before?
Mosh
This doesn't have to be a problem; some ORMs support lazy lists.
E.g. you could load the order entity and add items to the Details collection w/o actually materializing all of the other entities in that list.
I think N/Hibernate supports this.
If you are writing your own entity persistence code w/o any ORM, then you are pretty much out of luck: you would have to re-implement the same dirty tracking machinery that OR mappers give you for free.
The entire aggregate must be loaded from the database because DDD assumes that aggregate roots ensure consistency within the boundaries of aggregates. For these rules to be checked, all necessary data must be loaded. If there is a requirement that an order can be worth no more than $100000 for a particular customer, the aggregate root (Order) must check this rule before persisting changes. This does not imply that all the existing items must be loaded and their values summed up. Order can maintain a pre-calculated sum of existing items, which is updated when new ones are added. This way, checking the business rule requires only the Order data to be loaded when adding new items.
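As a minimal sketch of that idea, assuming a hypothetical Order root that stores its running total and item count alongside its own row (the class, field and method names are illustrative, not taken from any framework), the limit can be enforced without touching the existing items:

```java
import java.math.BigDecimal;

// Hypothetical aggregate root; names and the limit value are illustrative only.
class Order {
    // Example limit from the text: an order may be worth at most $100000.
    static final BigDecimal MAX_TOTAL = new BigDecimal("100000");

    // Running total persisted with the Order row, so existing items need not be loaded.
    private BigDecimal total;
    private int itemCount;

    Order(BigDecimal storedTotal, int storedItemCount) {
        this.total = storedTotal;
        this.itemCount = storedItemCount;
    }

    // Enforces the invariant using only data held by the root itself.
    void addItem(BigDecimal price) {
        BigDecimal newTotal = total.add(price);
        if (newTotal.compareTo(MAX_TOTAL) > 0) {
            throw new IllegalStateException("Order total would exceed " + MAX_TOTAL);
        }
        total = newTotal;
        itemCount++;
    }

    BigDecimal total() { return total; }

    int itemCount() { return itemCount; }
}
```

A repository would then only need to insert the one new item row and update the stored total on the Order row; how that is persisted is left out of the sketch.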
I'm not 100% sure about this approach, but I think applying the Unit of Work pattern could be the answer. Keeping in mind that any transaction should be done in application or domain services, you could populate the unit of work class/object with the objects from the aggregate that you have changed. After that, let the UoW class/object do the magic (of course, building a proper UoW might be hard in some cases).
Here is a description of the Unit of Work pattern from here:
A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.
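A rough sketch of such a tracker, kept outside the domain model so that the entities themselves stay unaware of persistence. Everything here is hypothetical: the class name, the register methods, and the fact that commit() only returns a description of the pending statements instead of talking to a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal Unit of Work sketch; an application service registers the objects it touched.
class UnitOfWork {
    private final Set<Object> added = new LinkedHashSet<>();
    private final Set<Object> changed = new LinkedHashSet<>();
    private final Set<Object> removed = new LinkedHashSet<>();

    void registerNew(Object entity) {
        added.add(entity);
    }

    void registerDirty(Object entity) {
        // An object that is still pending insert needs no separate update.
        if (!added.contains(entity)) {
            changed.add(entity);
        }
    }

    void registerRemoved(Object entity) {
        // Removing a not-yet-inserted object simply cancels the insert.
        if (added.remove(entity)) {
            return;
        }
        changed.remove(entity);
        removed.add(entity);
    }

    // Works out what has to be written (inserts, then updates, then deletes)
    // and forgets the tracked objects afterwards.
    List<String> commit() {
        List<String> statements = new ArrayList<>();
        for (Object e : added) statements.add("INSERT " + e);
        for (Object e : changed) statements.add("UPDATE " + e);
        for (Object e : removed) statements.add("DELETE " + e);
        added.clear();
        changed.clear();
        removed.clear();
        return statements;
    }
}
```

With this shape, adding one OrderItem to an Order with ten existing items would register one new object and one dirty root, so only those two are written.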

How do I use entity framework with hierarchical data?

I'm working with a large hierarchical data set in sql server - modelled using the standard "EntityID, ParentID" kind of approach. There are about 25,000 nodes in the whole tree.
I often need to access subtrees of the tree, and then access related data that hangs off the nodes of the subtree. I built a data access layer a few years ago based on table-valued functions, using recursive queries to fetch an arbitrary subtree, given the root node of the subtree.
I'm thinking of using Entity Framework, but I can't see how to query hierarchical data like this. AFAIK there is no recursive querying in Linq, and I can't expose a TVF in my entity data model.
Is the only solution to keep using stored procs? Has anyone else solved this?
Clarification: By 25,000 nodes in the tree I'm referring to the size of the hierarchical dataset, not to anything to do with objects or the Entity Framework.
It may be best to use a pattern called "Nested Set", which allows you to get an arbitrary subtree within one query. This is especially useful if the nodes aren't manipulated very often: Managing hierarchical data in MySQL.
In a perfect world the entity framework would provide possibilities to save and query data using this data pattern.
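To make the nested set idea concrete, here is an in-memory sketch in Java (not Entity Framework code). It assumes each node gets two integers, here called lft and rgt, assigned by a depth-first walk; in a database these would be two columns. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Nested-set sketch over an in-memory tree; names are hypothetical.
class NestedSetTree {
    private final Map<Integer, List<Integer>> children = new HashMap<>();
    private final Map<Integer, int[]> bounds = new HashMap<>(); // id -> {lft, rgt}
    private int counter;

    // Records an "EntityID, ParentID" style edge.
    void addChild(int parentId, int childId) {
        children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
    }

    // Numbers every node reachable from the root with a depth-first walk.
    void renumber(int rootId) {
        bounds.clear();
        counter = 0;
        visit(rootId);
    }

    private void visit(int id) {
        int lft = ++counter;
        for (int child : children.getOrDefault(id, Collections.emptyList())) {
            visit(child);
        }
        bounds.put(id, new int[] {lft, ++counter});
    }

    // Equivalent of: SELECT id FROM node WHERE lft BETWEEN :lft AND :rgt ORDER BY lft
    List<Integer> subtree(int rootId) {
        int[] root = bounds.get(rootId);
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, int[]> e : bounds.entrySet()) {
            int lft = e.getValue()[0];
            if (lft >= root[0] && lft <= root[1]) {
                result.add(e.getKey());
            }
        }
        result.sort((a, b) -> Integer.compare(bounds.get(a)[0], bounds.get(b)[0]));
        return result;
    }
}
```

The subtree lookup is a plain range filter with no recursion, which is why it maps onto a single query. The cost sits in renumber: inserting or moving a node shifts the numbers of other nodes, which is why the pattern suits trees that rarely change.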
Everything IS possible with Entity Framework but you have to hack and slash your way into it. The database I am currently working against has too many "holder tables" since Points for instance is shared with both teams and users. Both users and teams can also have a blog.
When you say 25 000 nodes do you mean navigational properties? If so I think it could be tricky to get the data access in place. It's not hard to navigate, search etc with entity framework but I tend to model on paper then create the database based on how I want to navigate while using entity framework. Sounds like you don't have that option.
Thanks for these suggestions.
I'm beginning to realise that the answer is to remodel the data in the database - either along the lines of nested sets as Georg suggests, or maybe a transitive closure table, which I've just come across.
That way, I'm hoping to get two key benefits:
a) faster querying against arbitrary subtrees
b) a data model which no longer requires recursive querying - so perhaps bringing it within easy reach of the Entity Framework!
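For the transitive closure table variant, a small in-memory sketch in Java may help show why recursion disappears. It assumes one (ancestor, descendant, depth) row per ancestor/descendant pair, including each node paired with itself; the class, the row type and the method names are hypothetical stand-ins for a three-column table.

```java
import java.util.ArrayList;
import java.util.List;

// Transitive closure table sketch; the list of rows stands in for a database table.
class ClosureTable {
    static final class Row {
        final int ancestor;
        final int descendant;
        final int depth;

        Row(int ancestor, int descendant, int depth) {
            this.ancestor = ancestor;
            this.descendant = descendant;
            this.depth = depth;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    void addRoot(int id) {
        rows.add(new Row(id, id, 0));
    }

    // Inserting a node copies every ancestor row of its parent, one level deeper,
    // and adds the node's own self-reference row.
    void addChild(int parentId, int childId) {
        List<Row> inherited = new ArrayList<>();
        for (Row r : rows) {
            if (r.descendant == parentId) {
                inherited.add(new Row(r.ancestor, childId, r.depth + 1));
            }
        }
        rows.addAll(inherited);
        rows.add(new Row(childId, childId, 0));
    }

    // Equivalent of: SELECT descendant FROM closure WHERE ancestor = :id
    List<Integer> subtree(int rootId) {
        List<Integer> result = new ArrayList<>();
        for (Row r : rows) {
            if (r.ancestor == rootId) {
                result.add(r.descendant);
            }
        }
        return result;
    }
}
```

Fetching a subtree becomes a single equality filter on the ancestor column, at the price of storing one row per ancestor/descendant pair.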
It's always amazing how so often the right answer to a difficult problem is not to answer it, but to do something else instead!