Hibernate Search 6: updating an embedded index from another microservice

Good morning,
I have a rather special scenario and I would like to have your opinion on the best way to handle this situation.
We have an application divided into several functional microservices, but a common database (it's not ideal but for the moment we have no choice).
From microservice A, I index entity A with entities B, C and D embedded via @IndexedEmbedded.
1- If I make modifications on A, by changing B or C or D, is it automatically propagated to the indexed document or does it require additional configuration?
2- The tables of entities B, C and D are updated by other microservices, and in this case I have to update my index of entity A. What is the best way to do this?
I thought of triggering manual indexing on every change made in the other microservices, but I'm not sure that's the best way to do it.
Thank you

I'll state the obvious and say that if you use the same model across microservices, you're in for some headaches, especially when updating your schema, but I guess you know that and can't do anything about it. So, let's have a look at solutions...
if I make modifications on A, by changing B or C or D, is it automatically propagated in the indexing document or does it require additional configuration
Assuming everything happens in the same microservice, and the updates are performed using Hibernate ORM (and not native SQL), it should be automatic. See https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#mapper-orm-reindexing-basics .
the tables of entities B, C and D are updated by other microservices and in this case I have to update my index of entity A. What is the best way to do this?
Assuming your other microservices share the same Hibernate ORM mapping (they know of entity A, they just don't deal with it), e.g. they all import a common JAR that contains your annotated entities... you could simply rely on outbox-polling coordination, which allows multiple instances of an application (or of different applications with the same model/mapping) to cooperate and index safely and reliably, as long as they all use Hibernate Search with compatible configuration.
If that's not your case, e.g. each microservice has its own entity classes, and may not include all entities from other microservices... I'm afraid Hibernate Search can't solve that problem for you (yet). Hibernate Search exposes a way to trigger reindexing based on entity events (entity created, entity property 'foo.bar' updated, entity deleted, ...) that you input manually, via the SearchIndexingPlan, but you will have to devise a way to propagate these events from one microservice to another. And that kind of makes Hibernate Search a lot less useful, unfortunately.
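To make the "propagate these events yourself" idea more concrete, here is a minimal sketch in plain Java of the receiving side in microservice A. Everything in it is an assumption made for illustration: the `EmbeddedEntityChanged` event, the `OwnerLookup` query and the reindex callback are hypothetical names, and the transport that carries the event between services (message broker, database table, HTTP call) is left out entirely. In a real application the callback would load entity A and hand it to Hibernate Search, for example through the indexing plan (`Search.session(entityManager).indexingPlan().addOrUpdate(entity)`).

```java
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical event published by another microservice after it changes B, C or D.
final class EmbeddedEntityChanged {
    final String entityType;
    final long entityId;

    EmbeddedEntityChanged(String entityType, long entityId) {
        this.entityType = entityType;
        this.entityId = entityId;
    }
}

// Hypothetical lookup, typically a query on the shared database:
// which A ids embed the given B/C/D row?
interface OwnerLookup {
    Set<Long> findOwningAIds(String entityType, long entityId);
}

// Runs inside microservice A and translates incoming events into reindex requests.
final class ReindexOnRemoteChange {
    private final OwnerLookup lookup;
    private final Consumer<Long> reindexA; // stand-in for the call into Hibernate Search

    ReindexOnRemoteChange(OwnerLookup lookup, Consumer<Long> reindexA) {
        this.lookup = lookup;
        this.reindexA = reindexA;
    }

    void onEvent(EmbeddedEntityChanged event) {
        // Reindex every A whose indexed document embeds the changed entity.
        for (Long aId : lookup.findOwningAIds(event.entityType, event.entityId)) {
            reindexA.accept(aId);
        }
    }
}
```

The point of the sketch is only the shape of the work: the other microservices have to tell A what changed, and A has to work out which of its documents are affected.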

Related

hibernate-search for one-directional associations

According to the spec, when @IndexedEmbedded points to an entity, the association has to be directional and the other side has to be annotated with @ContainedIn. If not, Hibernate Search has no way to update the root index when the associated entity is updated.
Am I right to assume the word directional should be bi-directional? I have exactly the problem that my index is not updated. I have one-directional relationships, e.g. person to order but the order does not know the person. Now when I change the order the index is not updated.
If changing the associations to become bi-directional is no option which possibilities would I have to still use hibernate-search? Would it be possible to create two separate indices and to combine queries?
Am I right to assume the word directional should be bi-directional?
Yes. I will fix this typo.
If changing the associations to become bi-directional is no option which possibilities would I have to still use hibernate-search?
If Person is indexed and embeds Order, but Order doesn't have an inverse association to Person, then Hibernate Search cannot retrieve the Persons that have to be reindexed when an Order changes.
Thus you will have to reindex manually: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#manual-index-changes .
You can adopt one of two strategies:
The easy path: reindex all the Person entities periodically, e.g. every night.
The hard path: reindex the affected Person entities whenever an Order changes. This basically means adding code to your services so that whenever an order is created/updated/deleted, you run a query to retrieve all the corresponding persons, and reindex them manually.
The first solution is fairly simple, but has the big disadvantage that the Person index will be up to 24 hours out of date. Depending on your use case, that may be ok or that may not.
The second solution is prone to errors and you would basically be doing Hibernate Search's work.
All in all, you really have to ask yourself if adding the inverse side of the association to your model wouldn't be better.
Would it be possible to create two separate indices and to combine queries?
Technically, if you are using the Lucene integration (not the Elasticsearch one), then yes, it would be possible.
But:
you would need above-average knowledge of Lucene.
you would have to bypass Hibernate Search APIs, and would need to write code to do what Hibernate Search usually does.
you would have to use experimental (read: unstable) Lucene APIs.
I am unsure as to how well that would perform, as I never tried it.
So I wouldn't recommend it if you're not familiar with Lucene's APIs. If you really want to take that path, here are a few pointers:
How to use the index readers directly: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#IndexReaders
Lucene's documentation for joins (what you're looking for is query-time joins): https://lucene.apache.org/core/5_5_5/join/org/apache/lucene/search/join/package-summary.html

Writing revisions for all audit tables

I use Envers 3.5 with Spring.
Let's say I have an entity A with a relation to entity B, which has a relation to entity C.
All entities are audited. When I change something in entity C, I can see the change in that entity's audit table. What I want is to see the change in the audit table of entity A, too. It would also be OK to see that change in entity B's table.
Can this be done with Envers?
(I'm sorry for my poor English.)
This is a common question about Envers, however that is not possible currently.
First of all Envers doesn't have a way to automatically know what are the roots of entity trees, that is which entities should be marked as modified upon a child-entity modification.
Secondly it would cause a lot more data to be written on each change. It would be possible to implement using some custom annotations and "marker" revisions, but I guess that task waits for a contributor :)

DDD, EF and Referential Integrity

OK, so I have my roots defined. Entities inside a root are allowed references to other entities inside the same root, but not outside. There they can only hold the identity of the related entity. This is all great.
But, I'm using EF5 and navigation properties are getting in the way. I really want to only define navigation properties where entities are inside the aggregate. But how can I apply some referential integrity where related entities are in different aggregates? Is the only option to add FKs manually using migrations?
And again, but... this is going to cause a bit of a problem, because I want to be able to have ReadOnlyRepositories for fetching aggregated data from all over the place. I don't want to get into CQRS, as I feel there is too much of a leap there. Am I stuck needing a second bounded context with a new set (or maybe a derived set) of entities with extra navigation properties defined? All so I can write queries that pull data from more than one root?
Needing referential integrity usually indicates a deeper issue. Why do you need the entity identifier to exist in both tables? What is being held consistent that way? And why isn't that modeled explicitly? These questions are not as important as the answer to them. Also realize that by just using other technology over the same db schema (and proper indexes) many of your problems could go away. Who knows, you just might be doing CQRS at that point in time.

Entity Framework 4: which approach is the best

I know similar questions have been asked before.
I am starting with a set of xsd-generated data objects (plus the db model is there) and need to persist these almost 1:1 to a single SQL Server database. The number of entities is small (10), and the logic required for the db insert/update/delete (mostly upserts) is thin (albeit there is some).
I am wondering which approach is best?
1. no ORM with SQL Server stored procs, probably generated using T4 or something like CodeSmith
2. Entity Fx, generate entities from DB, and manually map the xsd entities to EFx entities at runtime
3. Entity Fx, generate edmx file from DB, then use the POCO approach and directly persist the xsd-generated entities (after handcoding the ObjectContext derived class I suppose)
4. code-only EFx approach (looks like one of the most idiotic ideas I have ever seen to me)
5. anything else?
I am especially keen in terms of maintenance - what happens if a property is added to the XSD-generated entities, how much effort does each approach take.
I would be tempted to go with 1, since the logic is slim and there are no complex mappings (m:n). But it would be possible the Data model will evolve to a more complex domain model, and we don't want to reimplement anything then.
How bad does each of the EFx approaches hurt in terms of run-time performance?
Your decision in this case should be informed largely by the future direction of your application.
You should consider Option 3 primarily if you do not want your Entities to have any dependency on the Entity Framework assembly (System.Data.Entity). If you think you might want to distribute or share your Entity/DAL/BL layer as an independent assembly with another application, consider option 3. This will allow you to keep your Entities separated from your persistence implementation. If, however, you don't expect to have multiple persistence implementations and don't care about the dependency on the EF assemblies, options 1 or 2 will work just fine.
On a side note, given the limited persistence logic required, be sure to look into compiled queries in Entity Framework for a big performance improvement.

DDD: Persisting aggregates

Let's consider the typical Order and OrderItem example. Assuming that OrderItem is part of the Order Aggregate, it can only be added via Order. So, to add a new OrderItem to an Order, we have to load the entire Aggregate via the Repository, add a new item to the Order object and persist the entire Aggregate again.
This seems to have a lot of overhead. What if our Order has 10 OrderItems? This way, just to add a new OrderItem, not only do we have to read 10 OrderItems, but we also have to re-insert all 10 of them. (This is the approach that Jimmy Nilsson has taken in his DDD book. Every time he wants to persist an Aggregate, he clears all the child objects and then re-inserts them. This can cause other issues, as the IDs of the children change every time because of the IDENTITY column in the database.)
I know some people may suggest to apply Unit of Work pattern at the Aggregate Root so it keeps track of what has been changed and only commit those changes. But this violates Persistence Ignorance (PI) principle because persistence logic is leaking into the Domain Model.
Has anyone thought about this before?
Mosh
This doesn't have to be a problem; some ORMs support lazy lists.
E.g. you could load the order entity and add items to the Details collection without actually materializing all of the other entities in that list.
I think N/Hibernate supports this.
If you are writing your own entity persistence code w/o any ORM, then you are pretty much out of luck, you would have to re-implement the same dirty tracking machinery as ORMappers give you for free.
The entire aggregate must be loaded from the database because DDD assumes that aggregate roots ensure consistency within the boundaries of aggregates. For these rules to be checked, all necessary data must be loaded. If there is a requirement that an order can be worth no more than $100000 for a particular customer, the aggregate root (Order) must check this rule before persisting changes. This does not imply that all the existing items must be loaded and their values summed up. Order can maintain a pre-calculated sum of existing items, which is updated when new ones are added. This way, checking the business rule requires only the Order data to be loaded when adding new items.
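As a rough illustration of the pre-calculated sum, here is a minimal sketch in plain Java. The class name `OrderAggregate`, its fields and the way the stored total is passed to the constructor are assumptions made for this example; persistence is left out entirely.

```java
import java.math.BigDecimal;

// Hypothetical aggregate root that keeps a running total instead of loading its items.
final class OrderAggregate {
    private final BigDecimal limit;
    private BigDecimal total;

    // 'storedTotal' is the pre-calculated sum read from the order row itself.
    OrderAggregate(BigDecimal limit, BigDecimal storedTotal) {
        this.limit = limit;
        this.total = storedTotal;
    }

    // Checks the business rule using only the order's own data, then updates the sum.
    void addItem(BigDecimal unitPrice, int quantity) {
        if (quantity <= 0 || unitPrice.signum() < 0) {
            throw new IllegalArgumentException("quantity must be positive and price non-negative");
        }
        BigDecimal newTotal = total.add(unitPrice.multiply(BigDecimal.valueOf(quantity)));
        if (newTotal.compareTo(limit) > 0) {
            throw new IllegalStateException("order total would exceed the limit");
        }
        total = newTotal;
    }

    BigDecimal total() {
        return total;
    }
}
```

With this shape, adding an item means loading one order row, checking the limit, inserting one item row and updating the stored total. The existing items never have to be read.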
I'm not 100% sure about this approach, but I think applying the Unit of Work pattern could be the answer. Keeping in mind that any transaction should be done in application or domain services, you could populate the unit of work class/object with the objects from the aggregate that you have changed. After that, let the UoW class/object do the magic (of course, building a proper UoW might be hard in some cases).
Here is a description of the Unit of Work pattern from here:
A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.
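For what it's worth, a very small sketch of that idea in plain Java could look like the following. The class `SimpleUnitOfWork` and its method names are made up for this example, and `commit()` only returns the list of operations it would run instead of talking to a database. A real implementation would also track objects by identity rather than by `equals`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, minimal Unit of Work: it records what changed and
// works out the resulting database operations at commit time.
final class SimpleUnitOfWork {
    private final Set<Object> added = new LinkedHashSet<>();
    private final Set<Object> changed = new LinkedHashSet<>();
    private final Set<Object> removed = new LinkedHashSet<>();

    void registerNew(Object entity) {
        added.add(entity);
    }

    void registerDirty(Object entity) {
        // A new object is inserted with its latest state, so no separate update is needed.
        if (!added.contains(entity)) {
            changed.add(entity);
        }
    }

    void registerRemoved(Object entity) {
        // An object that was never persisted can simply be forgotten.
        if (added.remove(entity)) {
            return;
        }
        changed.remove(entity);
        removed.add(entity);
    }

    // Returns the planned operations (inserts, then updates, then deletes) and resets the state.
    List<String> commit() {
        List<String> operations = new ArrayList<>();
        for (Object entity : added) operations.add("INSERT " + entity);
        for (Object entity : changed) operations.add("UPDATE " + entity);
        for (Object entity : removed) operations.add("DELETE " + entity);
        added.clear();
        changed.clear();
        removed.clear();
        return operations;
    }
}
```

The application service would register the order and the new item while handling the request and call `commit()` once at the end, so only the rows that actually changed are written and the domain objects stay free of persistence code.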