Aggregate Pattern and Performance Issues - aggregate

I have read about the Aggregate Pattern but I'm confused about something here. The pattern states that all the objects belonging to the aggregate should be accessed via the Aggregate Root, and not directly.
And I'm assuming that is the reason why they say you should have a single Repository per Aggregate.
But I think this adds a noticeable overhead to the application. For example, in a typical web-based application, what if I want to get an object belonging to an aggregate (which is NOT the aggregate root)? I'll have to call Repository.GetAggregateRootObject(), which loads the aggregate root and all its child objects, and then iterate through the child objects to find the one I'm looking for. In other words, I'm loading a lot of data and throwing all of it away except the particular object I'm looking for.
Is there something I'm missing here?
PS: I know some of you may suggest that we can improve performance with Lazy Loading. But that's not what I'm asking here... The aggregate pattern requires that all objects belonging to the aggregate be loaded together, so we can enforce business rules.
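To make the concern concrete, here is a minimal sketch in TypeScript; the Order, OrderLine and repository names are hypothetical, purely to illustrate the access pattern described above:

// Hypothetical aggregate: an Order (root) with OrderLine children.
interface OrderLine {
  id: string;
  productId: string;
  quantity: number;
}

class Order {
  constructor(
    public readonly id: string,
    private readonly lines: OrderLine[], // loaded together with the root
  ) {}

  // Child objects are only reachable through the root.
  findLine(lineId: string): OrderLine | undefined {
    return this.lines.find((l) => l.id === lineId);
  }
}

interface OrderRepository {
  // One repository per aggregate: it only hands out whole roots.
  getById(orderId: string): Promise<Order>;
}

// To read a single line we still load the entire aggregate.
async function getLine(repo: OrderRepository, orderId: string, lineId: string) {
  const order = await repo.getById(orderId); // loads the root and all its lines
  return order.findLine(lineId);             // everything else is thrown away
}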

I'm not sure where you read about this "Aggregate Pattern". Please post a link.
One thing it could be is where we encapsulate a list inside another object. For a simple example, if we have a shopping cart, instead of passing around a list of purchases we use a cart object. Then code that works on the whole cart (e.g. getting total spend) can be encapsulated in the cart. I'm not sure if this is really a pattern, but Google found this link: http://perldesignpatterns.com/?AggregatePattern
I suspect that when you say "The aggregate pattern requires that all objects belonging to the aggregate be loaded together, so we can enforce business rules", this depends on your business rules.
Generally patterns should not be regarded as a set of rules you must follow for everything. It is up to you as a developer to recognise where they can be used effectively.
In our cart example we would generally want to work on the whole cart at once. We might have business rules that say our customers can only order a limited number of items - or maybe they get a discount for ordering multiple items. So it makes sense to read the whole thing.
If you take a different example, e.g. products, you could still have a products repository, but you needn't load them all at once. You probably only ever want a page of products at a time.
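As a rough illustration of the cart example, here is a minimal TypeScript sketch; the Cart class, the MAX_ITEMS limit and the method names are made up for illustration, but they show why a rule that spans all items needs the whole aggregate in memory:

// Hypothetical cart aggregate: the rule "at most 10 items" needs every item loaded.
interface CartItem {
  productId: string;
  unitPrice: number;
  quantity: number;
}

class Cart {
  private static readonly MAX_ITEMS = 10; // assumed business rule

  constructor(private readonly items: CartItem[] = []) {}

  addItem(item: CartItem): void {
    const totalQuantity =
      this.items.reduce((sum, i) => sum + i.quantity, 0) + item.quantity;
    if (totalQuantity > Cart.MAX_ITEMS) {
      throw new Error(`A cart may contain at most ${Cart.MAX_ITEMS} items`);
    }
    this.items.push(item);
  }

  // Behaviour that works on the whole cart, e.g. getting total spend.
  totalSpend(): number {
    return this.items.reduce((sum, i) => sum + i.unitPrice * i.quantity, 0);
  }
}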

How to model a domain model - aggregate root

I'm having some issues to correctly design the domain that I'm working on.
My straightforward use case is the following:
The user (~5,000 users) can access a list of ads (~5 million)
He can choose to add/remove some of them as favorites.
He can decide to show/hide some of them.
I have a command which will mutate the aggregate state, to set Favorite to TRUE, let's say.
In terms of DDD, how should I design the aggregates?
How should I design the relationship between a user and his selection of favorite ads?
Considering the large number of ads, I cannot duplicate each ad inside a user aggregate root.
Can I design an Ads aggregate root containing a user "collection"?
And finally, how should I handle the read model part?
Thanks in advance
Cheers
Two concepts may help you understand how to model this:
1. Aggregates are Transaction Boundaries.
An aggregate is a cluster of associated objects that are considered as a single unit. All parts of the aggregate are loaded and persisted together.
If you have an aggregate that encloses a thousand entities, then you have to load all of them into memory. So it follows that you should preferably have small aggregates whenever possible.
2. Aggregates are Distinct Concepts.
An Aggregate represents a distinct concept in the domain. Behavior associated with more than one Aggregate (like Favoriting, in your case) is usually an aggregate by itself with its own set of attributes, domain objects, and behavior.
From your example, User is a clear aggregate.
An Ad has a distinct concept associated with it in the domain, so it is an aggregate too. There may be other entities that will be embedded within the Ad like valid_until, description, is_active, etc.
The concept of favoriting an Ad links the User and the Ad aggregates. Your question seems to be centered around where this linkage should be preserved. Should it be in the User aggregate (a list of Ads), or should an Ad have a collection of User objects embedded within it?
While both are possibilities, IMHO, I think FavoriteAd is yet another aggregate, which holds references to both the User aggregate and the Ad aggregate. This way, you don't burden the concepts of User or the Ad with favoriting behavior.
Those aggregates will also not be required to load this additional data every time they are loaded into memory. For example, if you are loading an Ad object to edit its contents, you don't want the favorites collection to be loaded into memory by default.
These aggregate structures don't matter as far as read models are concerned. Aggregates only deal with the write side of the domain. You are free to rewire the data any way you want, in multiple forms, on the read side. You can have a subscriber just to listen to the Favorited event (raised after processing the Favorite command) and build a composite data structure containing data from both the User and the Ad aggregates.
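A minimal TypeScript sketch of this idea; the class, property and event names below are illustrative assumptions, not prescribed structures:

import { randomUUID } from 'crypto';

// Hypothetical third aggregate that links a User and an Ad by id only.
class FavoriteAd {
  constructor(
    public readonly id: string,
    public readonly userId: string, // reference to the User aggregate
    public readonly adId: string,   // reference to the Ad aggregate
    public readonly favoritedAt: Date = new Date(),
  ) {}
}

// Event raised after processing the Favorite command; the read side can
// project it into any composite structure it needs.
interface AdFavorited {
  favoriteId: string;
  userId: string;
  adId: string;
  occurredAt: Date;
}

function favoriteAd(userId: string, adId: string): { aggregate: FavoriteAd; event: AdFavorited } {
  const aggregate = new FavoriteAd(randomUUID(), userId, adId);
  const event: AdFavorited = {
    favoriteId: aggregate.id,
    userId,
    adId,
    occurredAt: aggregate.favoritedAt,
  };
  return { aggregate, event };
}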
I really like the answer given by Subhash Bhushan and I want to add another approach for you to consider.
If you look closely at your question you will see that you've made the assumption that an aggregate can 'see' everything that the user does when they are interacting with the UI. This doesn't need to be so.
Depending on the requirements of the domain, you don't need to hold a list of Ads in the aggregate to favourite them. Here's what I mean:
For this example, it doesn't matter where the 'favourite ad' command sits. It could be on the User aggregate or a specific aggregate for handling the concept of favouriting. The command just needs to hold the id of the User and the Ad they are favouriting.
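For instance, a sketch of that command in TypeScript; FavouriteAdCommand, the store and the handler are made-up names, and the point is simply that nothing here loads any Ad data:

// The command carries identifiers only.
interface FavouriteAdCommand {
  userId: string;
  adId: string;
}

// A store for the user/ad links; it could live behind whichever aggregate you choose.
interface FavouriteStore {
  add(userId: string, adId: string): Promise<void>;
}

// Handling the command records the link without touching the 5 million ads.
async function handleFavouriteAd(cmd: FavouriteAdCommand, store: FavouriteStore): Promise<void> {
  await store.add(cmd.userId, cmd.adId);
  // If the user or the ad is later deleted, a process manager listening to the
  // relevant events can issue a compensating command to remove this link.
}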
You may need to handle what happens if a user or ad is deleted but that would just be a case of an event process manager listening to the appropriate events and issuing compensating commands.
This way you don't need to load up 5 million ads. That's a job for the read model and UI, not the domain.
Just a thought.

Conducting searches with REST that return large datasets?

I'm creating a RESTful WebAPI for our system in .Net. When conducting a search in my client, I presume that it should hit the /person route, passing parameters when required to filter the data. However, the person object that is returned has quite a lot of nested objects, which could slow down data retrieval. Should I have a separate controller which returns a more skeletonised view of a person, should I continue the way I am going, or should I be making subsequent requests to break down the person?
Actually, there is no silver-bullet way to solve your problem, but there are several approaches that could be useful for you. In my opinion, your idea of optimizing the size of the resource representation in search results is correct.
You can include the list of requested fields in the filtering query (for example, see the similar approach in the ES search API). Many search engines follow this approach to reduce redundant response payload.
As you have mentioned, you can break your heavy object into sub-resources, so that you would be able to include only links to the nested objects inside the person, without including the whole representations of the inner objects. The HATEOAS approach fits perfectly for this purpose, but it will add extra complexity to your application (and extra flexibility too).
You have to choose which approach is better for your particular application, but I think that a good starting point would be the approach with a list of requested fields.
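As a rough sketch of the requested-fields approach: the question is about .NET WebAPI, but this illustration uses TypeScript with Express, and the fields parameter name, the /person handler and the in-memory data are assumptions, not an existing API:

import express from 'express';

interface Person {
  id: number;
  firstName: string;
  lastName: string;
  addresses: string[]; // stands in for the heavy nested data
}

// In-memory stand-in for the real search; the other filters are ignored for brevity.
const people: Person[] = [
  { id: 1, firstName: 'Ada', lastName: 'Lovelace', addresses: ['...'] },
];

const app = express();

// GET /person?fields=id,firstName returns a trimmed representation.
app.get('/person', (req, res) => {
  const fields = typeof req.query.fields === 'string' ? req.query.fields.split(',') : null;
  const results = people; // a real implementation would apply the filter parameters here

  if (!fields) {
    res.json(results); // no projection requested: return full representations
    return;
  }
  res.json(
    results.map((p) =>
      Object.fromEntries(fields.filter((f) => f in p).map((f) => [f, p[f as keyof Person]])),
    ),
  );
});

app.listen(3000);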

Validation using other models

What are the options to validate an event which uses another model?
Shopping cart example:
When adding a cart item to the shopping cart, there should be a check if the item isn't sold out yet.
You would usually validate a command rather than an event, as an event should be something that cannot be changed.
In answer to the question, typically it depends on what the business cost of the process is. In your example for instance, what is the cost to the business of ordering an item that is sold out? Probably very little - an email saying the item is out of stock, with an estimate of how long it will take.
In this type of scenario you could use an eventually consistent read model over the data, where you'd query the read model / cache for stock levels, but accept that some orders might go through for things that are out of stock.
If you have tighter constraints then you'll have to enforce them, ideally by remodelling your aggregates, or having transactional and/or blocking on the ordering process.
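A rough TypeScript sketch of that eventually consistent check; the read model interface and the handler below are assumptions for illustration, not a prescribed design:

// Hypothetical command handler that checks stock against an eventually
// consistent read model instead of locking the inventory aggregate.
interface AddCartItem {
  cartId: string;
  itemId: string;
  quantity: number;
}

interface StockReadModel {
  // May lag slightly behind the write side; that is the accepted trade-off.
  availableQuantity(itemId: string): Promise<number>;
}

async function handleAddCartItem(cmd: AddCartItem, stock: StockReadModel): Promise<void> {
  const available = await stock.availableQuantity(cmd.itemId);
  if (available < cmd.quantity) {
    throw new Error('Item appears to be sold out'); // best-effort check only
  }
  // ...add the item to the cart aggregate and persist it.
  // If stock changed in the meantime, a later process (e.g. an
  // "item out of stock" email) compensates for the stale read.
}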
What are the options to validate an event which uses another model?
A domain event is an important happening from the point of view of the business. It's something that happened in the past, so it can't be changed.
In OO, it's commonly represented as a Value Object, that is to say, an immutable object whose interesting parts are its attributes.
Commonly, those Domain Events are yielded from an operation on an Aggregate Root (DDD jargon). The client of the Aggregate Root is the Application Service (aka use case). The Application Service receives a Command object and, based on that, executes an operation on the Aggregate Root.
The validation could consist of primitive validation, object validation and/or composed object validation. The object in charge of performing this validation should be the Aggregate Root itself and/or some objects whose specific purpose is validation.
When adding a cart item to the shopping cart, there should be a check if the item isn't sold out yet
Following your example, the objects would be:
Command: AddItemToShoppingCartCommand. Holds information about the item to be added and the shopping cart identifier, for example.
Application Service: AddItemToShoppingCartService.
Aggregate Root: ShoppingCartInventory. I've deliberately used Inventory in the name to be explicit that this Aggregate Root satisfies the invariant "...if the item isn't sold out yet."
Note: In my opinion, the invariant that checks the inventory in the Aggregate Root makes the Aggregate too big. My advice is to relax this invariant and embrace eventual consistency if "sold out" does not happen often.
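For illustration, a compact TypeScript sketch wiring up the three objects named above; the method signatures and the way stock is tracked are assumptions, not part of the answer:

// Command: plain data handed to the application service.
interface AddItemToShoppingCartCommand {
  shoppingCartId: string;
  itemId: string;
  quantity: number;
}

// Domain event yielded by the aggregate after a successful operation.
interface ItemAddedToShoppingCart {
  shoppingCartId: string;
  itemId: string;
  quantity: number;
}

// Aggregate Root: enforces the "not sold out" invariant.
class ShoppingCartInventory {
  constructor(
    public readonly shoppingCartId: string,
    private readonly remainingStock: Map<string, number>,
  ) {}

  addItem(itemId: string, quantity: number): ItemAddedToShoppingCart {
    const remaining = this.remainingStock.get(itemId) ?? 0;
    if (remaining < quantity) {
      throw new Error(`Item ${itemId} is sold out`);
    }
    this.remainingStock.set(itemId, remaining - quantity);
    return { shoppingCartId: this.shoppingCartId, itemId, quantity };
  }
}

// Application Service: loads the aggregate, executes the operation on it.
class AddItemToShoppingCartService {
  constructor(
    private readonly repository: { getById(id: string): Promise<ShoppingCartInventory> },
  ) {}

  async execute(command: AddItemToShoppingCartCommand): Promise<ItemAddedToShoppingCart> {
    const cart = await this.repository.getById(command.shoppingCartId);
    return cart.addItem(command.itemId, command.quantity);
  }
}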

nosql wishlist models - Struggle between reference and embedded document

I got a question about modeling wishlists using MongoDB and mongoose. The idea is that I need a user to be able to have many different wishlists, each containing many wishes, and each wish making a reference to a single article.
I was thinking about it, and because a wishlist only belongs to a single user, I thought about using an embedded document for that.
Same for the wish being embedded in a wishlist.
So I got something like this:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// wishSchema (a single wish referencing an article) is assumed to be defined elsewhere.

var WishlistSchema = new Schema({
  // ...other wishlist fields
  wishes: [wishSchema]
});

var UserSchema = new Schema({
  // ...other user fields
  wishlists: [WishlistSchema]  // must match the variable defined above (was 'wishlistSchema')
});
But my question is what to do with the article. Should I use a reference, or should I copy the article's data into an embedded document?
If I use an embedded document, I get an update problem. When the article's price changes, updating every wish referencing this article becomes a struggle. But accessing those wishes' article is a piece of cake.
If I use a reference, the update is not a problem anymore, but I get a problem when I filter the wishes depending on their article's criteria (when I filter the wishes depending on a price, category, etc.).
I think the second way is probably the best, but I don't know if it's possible to build a query to filter the wishes depending on the article's fields. I tried a lot of things using population, but nothing works very well when you need to populate depending on a nested object's field (for example, getting wishes whose article meets certain conditions).
Is this kind of query doable?
Sorry for the long question and for my bad English, but any advice would be great!
In my experience dealing with NoSQL databases (Mongo, mainly), when designing a collection, do not think of the relations. Instead, think of how you would display, page, and retrieve the documents.
I would prefer embedding and updating multiple documents when there's a change, as opposed to using a reference, for multiple reasons:
Gets would be fast and easy, and filtering is not a problem (like you've said).
Retrieve operations usually happen a lot more often than updates, and with proper indexing, you wouldn't really have to worry about performance.
It leverages NoSQL's schema-less nature, and you'll be less prone to restructuring due to requirement changes (new sorting, new filters, etc.).
Paging would be a lot less of a hassle, and the UI would not be restricted in its design by paging and limits.
Joining could become expensive. Redundant data might be a hassle to update, but that's always better than not being able to display data in a particular way because your schema is normalized and joining is difficult.
I'd say the rule of thumb is to only split them when you do not need to display them together. It is not impossible to join them back if you do, but it's definitely more troublesome.
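As a rough sketch of what the embedded approach makes possible with mongoose (TypeScript here; the schema, model and field names are assumptions loosely following the question, not a tested design):

import mongoose, { Schema } from 'mongoose';

// Assumed shapes: each wish embeds a copy of the article data it refers to.
const WishSchema = new Schema({
  article: {
    articleId: { type: Schema.Types.ObjectId, ref: 'Article' }, // keep the id for later updates
    title: String,
    price: Number,
    category: String,
  },
});

const WishlistSchema = new Schema({
  name: String,
  wishes: [WishSchema],
});

const Wishlist = mongoose.model('Wishlist', WishlistSchema);

// With the article data embedded, filtering is a plain query on nested fields,
// e.g. all wishlists containing a wish for a 'books' article cheaper than 20.
async function findCheapBookWishlists() {
  return Wishlist.find({
    wishes: { $elemMatch: { 'article.price': { $lt: 20 }, 'article.category': 'books' } },
  });
}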

Displaying results of perform find in a portal

I have some global variables $$A, $$B, $$C and want to search within a table for these terms in fieldA, fieldB and fieldC (using Perform Find). How can I use the result of this Perform Find to display the results in a portal?
The implementation by my predecessor replaces a field fieldSEARCH with 1 if the record is in the Perform Find results and 0 otherwise, and then uses a portal filtered by this field. This seems a very dodgy way of doing it, not least because it means that multiple users will not be able to search at the same time!
Can you enhance the portal filter to filter against the variables themselves? Or you can perform the find, grab the IDs of the found set, put them into a global field, and then use that field to construct the relationship. Global fields are multi-user safe.
The best way is not to do this at all, but use list views to perform searches. List views are naturally searchable and much more flexible than portals (you can easily sort them, omit arbitrary records, and so on). It's possible to repeat this functionality in portals, but it's way more complex. I mean, if there's some serious gain from using a portal, then it's doable, but if not, then the native way is obviously better.
List views are easier to search, as FileMaker still hasn't transitioned to the 21st century and insists on this model... Most users, however, want a Master-Detail view, like a mail app, and understandably so, as it's more intuitive (i.e. a list view on one side, where clicking a row updates the detail fields in the middle).
If this is what you want, you may want to cast an eye at Modular FM, where someone has already done the hard work for you:
http://www.modularfilemaker.org/module/masterdetail-2-0/
HTH
Stam