Validation using other models - cqrs

What are the options to validate an event which uses another model?
Shopping cart example:
When adding a cart item to the shopping cart, there should be a check if the item isn't sold out yet.

You would usually validate a command rather than an event as an event should be something that cannot be changed
In answer to the question, typically it depends on what the business cost of the process is. In your example for instance, what is the cost to the business of ordering an item that is sold out? Probably very little - an email saying the item is out of stock, with an estimate of how long it will take.
In this type of scenario you could use an eventually consistent read model over the data, where you'd query the read model / cache for stock levels, but accept that some orders might go through for things that are out of stock.
If you have tighter constraints then you'll have to enforce them, ideally by remodelling your aggregates, or having transactional and/or blocking on the ordering process.

What are the options to validate an event which uses another model?
A domain event is an important happening from the point of view of the business. It's something that happen in the past, so it can't be changed.
In OO, it's commonly represented as a Value Object, that's say, an immutable object where the interesting part are their attributes.
Commonly, those Domain Events are yield from an operation in a Aggregate Root (DDD jargon). The client of the Aggregate Root is the Application Service (aka use case). The Application Service receive a Command object and base on that, executes operation in the Aggregate Root.
The validation could consist in primitive validation, object validation and/or composed object validation. Then the object in charge to perform this validation should be the Aggregate Root itself and/or some objects with the specific goal in validation.
When adding a cart item to the shopping cart, there should be a check
if the item isn't sold out yet
Following your example the objects woulb be:
Command: AddItemToShoppingCartCommand. Holds information about the item to be added, and shopping cart identifier for example.
Application Service: AddItemToShoppingCartService.
Aggregate Root: ShoppingCartInventory. I've used deliberately Inventory in the name to be explicit that this Aggregate Root satisfy the invariant "...if the item isn't sold out yet."
Note: In my opinion, the invariant that checks the inventory in the Aggregate Root, makes the Aggregate too big. My advice is relax this invariant and embrace eventual consistency if this "sold out" it's not happen normally.

Related

CQRS projections, joining data from different aggregates via probe commands

In CQRS when we need to create a custom-tailored projections for our read-models, we usually prefer a "denormalized" projections (assume we are talking about projecting onto a DB). It is not uncommon to have the information need by the application/UI come from different aggregates (possibly from different BCs).
Imagine we need a projected table to contain customer's information together with her full address and that Customer and Address are different aggregates in our system (possibly in different BCs). Meaning that, addresses are generated and maintained independently of customers. Or, in other words, when a new customer is created, there is no guarantee that there will be an AddressCreatedEvent subsequently produced by the system, this event may have already been processed prior to the creation of the customer. All we have at the time of CreateCustomerCommand is an UUID of an existing address.
We have several solutions here.
Enrich CreateCustomerCommand and the subsequent CustomerCreatedEvent to contain full address of the customer (looking up this information on the fly from the UI or the controller). This way the projection handler will just update the table directly upon receiving CustomerCreatedEvent.
Use the addrUuid provided in CustomerCreatedEvent to perform an ad-hoc query in the projection handler to get the missing part of the address information before updating the table.
These are commonly discussed solution to this problem. However, as noted by many others, there are problems with each approach. Enriching events can be difficult to justify as well described by Enrico Massone in this question, for example. Querying other views/projections (kind of JOINs) will work but introduces coupling (see the same link).
I would like describe another method here, which, as I believe, nicely addresses these concerns. I apologize beforehand for not giving a proper credit if this is a known technique. Sincerely, I have not seen it described elsewhere (at least not as explicitly).
"A picture speaks a thousand words", as they say:
The idea is that :
We keep CreateCustomerCommand and CustomerCreatedEvent simple with only addrUuid attribute (no enriching).
In API controller we send two commands to the command handler (aggregates): the first one, as usual, - CreateCustomerCommand to create customer and project customer information together with addrUuid to the table leaving other columns (full address, etc.) empty for time being. (Warning: See the update, we may have concurrency issue here and need to issue the probe command from a Saga.)
Right after this, and after we have obtained custUuid of the newly created customer, we issue a special ProbeAddrressCommand to Address aggregate triggering an AddressProbedEvent which will encapsulate the full state of the address together with the special attribute probeInitiatorUuid which is, of course our custUuid from the previous command.
The projection handler will then act upon AddressProbedEvent by simply filling in the missing pieces of the information in the table looking up the required row by matching the provided probeInitiatorUuid (i.e. custUuid) and addrUuid.
So we have two phases: create Customer and probe for the related Address. They are depicted in the diagram with (1) and (2) correspondingly.
Obviously, we can send as many such "probe" commands (in parallel) as needed by our projection: ProbeBillingCommand, ProbePreferencesCommand, etc. effectively populating or "filling in" the denormalized projection with missing data from each handled "probe" event.
The advantages of this method is that we keep the commands/events in the first phase simple (only UUIDs to other aggregates) all the while avoiding synchronous coupling (joining) of the projections. The whole approach has a nice EDA feeling about it.
My question is then: is this a known technique? Seems like I have not seen this... And what can go wrong with this approach?
I would be more then happy to update this question with any references to other sources which describe this method.
UPDATE 1:
There is one significant flaw with this approach that I can see already: command ProbeAddrressCommand cannot be issued before the projection handler had a chance to process CustomerCreatedEvent. But this is impossible to know from the API gateway (or controller).
The solution would probably involve a Saga, say CustomerAddressJoinProjectionSaga with will start upon receiving CustomerCreatedEvent and which will only then issue ProbeAddrressCommand. The Saga will end upon registering AddressProbedEvent. Or, if many other aggregates are involved in probing, when all such events have been received.
So here is the updated diagram.
UPDATE 2:
As noted by Levi Ramsey (see answer below) my example is rather convoluted with respect to the choice of aggregates. Indeed, Customer and Address are often conceptualized as belonging together (same Aggregate Root). So it is a better illustration of the problem to think of something like Student and Course instead, assuming for the sake of simplicity that there is a straightforward relation between the two: a student is taking a course. This way it is more obvious that Student and Course are independent aggregates (students and courses can be created and maintained at different times and different places in the system).
But the question still remains: how can we obtain a projection containing the full information about a student (full name, etc.) and the courses she is registered for (title, credits, the instructor's full name, prerequisites, etc.) all in the same table, if the UI requires it ?
A couple of thoughts:
I question why address needs to be a separate aggregate much less in a different bounded context, in view of the requirement that customers have an address. If in some other bounded context customer addresses are meaningful (e.g. you want to know "which addresses have more customers" etc.), then that context can subscribe to the events from the customer service.
As an alternative, if there's a particularly strong reason to model addresses separately from customers, why not have the read side prospectively listen for events from the address aggregate and store the latest address for a given address UUID in case there's a customer who ends up with that address. The reliability per unit effort of that approach is likely to be somewhat greater, I would expect.

How to modelling domain model - aggregate root

I'm having some issues to correctly design the domain that I'm working on.
My straightforward use case is the following:
The user (~5000 users) can access to a list of ads (~5 millions)
He can choose to add/remove some of them as favorites.
He can decide to show/hide some of them.
I have a command which will mutate the aggregate state, to set Favorite to TRUE, let's say.
In terms of DDD, how should I design the aggregates?
How design the relationship between a user and his favorite's ads selection?
Considering the large numbers of ads, I cannot duplicate each ad inside a user aggregate root.
Can I design a Ads aggregateRoot containing a user "collection".
And finally, how to handle/perform the readmodels part?
Thanks in advance
Cheers
Two concepts may help you understand how to model this:
1. Aggregates are Transaction Boundaries.
An aggregate is a cluster of associated objects that are considered as a single unit. All parts of the aggregate are loaded and persisted together.
If you have an aggregate that encloses a 1000 entities, then you have to load all of them into memory. So it follows that you should preferably have small aggregates whenever possible.
2. Aggregates are Distinct Concepts.
An Aggregate represents a distinct concept in the domain. Behavior associated with more than one Aggregate (like Favoriting, in your case) is usually an aggregate by itself with its own set of attributes, domain objects, and behavior.
From your example, User is a clear aggregate.
An Ad has a distinct concept associated with it in the domain, so it is an aggregate too. There may be other entities that will be embedded within the Ad like valid_until, description, is_active, etc.
The concept of a favoriting an Ad links the User and the Ad aggregates. Your question seems to be centered around where this linkage should be preserved. Should it be in the User aggregate (a list of Ads), or should an Ad have a collection of User objects embedded within it?
While both are possibilities, IMHO, I think FavoriteAd is yet another aggregate, which holds references to both the User aggregate and the Ad aggregate. This way, you don't burden the concepts of User or the Ad with favoriting behavior.
Those aggregates will also not be required to load this additional data every time they are loaded into memory. For example, if you are loading an Ad object to edit its contents, you don't want the favorites collection to be loaded into memory by default.
These aggregate structures don't matter as far as read models are concerned. Aggregates only deal with the write side of the domain. You are free to rewire the data any way you want, in multiple forms, on the read side. You can have a subscriber just to listen to the Favorited event (raised after processing the Favorite command) and build a composite data structure containing data from both the User and the Ad aggregates.
I really like the answer given by Subhash Bhushan and I want to add another approach for you to consider.
If you look closely at your question you will see that you've made the assumption that an aggregate can 'see' everything that the user does when they are interacting with the UI. This doesn't need to be so.
Depending on the requirements of the domain you don't need to hold a list of any Ads in the aggregate to favourite them. Here's what I mean:
For this example, it doesn't matter where the the 'favourite' ad command sits. It could be on the user aggregate or a specific aggregate for handling the concept of Favouriting. The command just needs to hold the id of the User and the Ad they are favouriting.
You may need to handle what happens if a user or ad is deleted but that would just be a case of an event process manager listening to the appropriate events and issuing compensating commands.
This way you don't need to load up 5 million ads. That's a job for the read model and UI, not the domain.
Just a thought.

REST API filtering/search on a parent resource

Suppose I have a store such as Amazon that sells a variety of products such as computers and paintings. They are quite different from each other and have their own set of fields and logic.
In addition to the typical CRUD, I need to design a JSON API that allows me to:
A. Fetch an ungrouped list of paintings and computers. For example: [computer, painting, painting, computer, ...] ordered by date published (so with filtering capability).
B. Fetch only paintings
C. Fetch only computers
The RESTful approach will typically be something like: /api/paintings and api/computers which works really well for segregated results.
But my main concern is operation A - getting an ungrouped list of paintings and products sorted by date published. The way I see it, there are three approaches:
1) Create a new standalone resource called products such as /api/products which will have filtering capability and continue to use /api/resource for specific CRUD operations.
2) Create a parent products resource which will be used for filtering operations. So I can do something like /products?order_by=published_date And for more specific resources I can do something like /products/paintings or /products/computers
3) Do not have a resource for paintings or computers. Instead have one for a generic product. I will then have most logic in the api layer and reduce the complexity of the client.
I am leaning towards approach #3 but wanted to get feedback prior to implementing since this will be a core feature of the API.
I've always taken the approach the your API Layer should match your object modeling. So, the answer to your question would be it depends on the source data. Well, the source data after it's object modeling.
If you have an object model for Computer and for Printer, they should be resources like you've said. Do they share any data/functions? If so, you should have an object model for that, too, perhaps: Product. Then Computer and Printer extend the Product class.
With that in mind, design the API layer to mirror it. Since Computer and Printer both extend Product. Product as a parent of the Computer and Printer resources make sense.
In my opinion I would go for approach #3 and query the API with type of product if you search for it.
/products?type=computers&order_by=date

CQRS events do not contain details needed for updating read model

There is one thing about CQRS I do not get: How to update the read model when the raised event does not contain the details needed for updating the read model.
Unfortunately, this is a quite common scenario.
Example: I add a user to a group, so I send a addUserToGroup(userId, groupId) command. This is received, handled by the command handler, the userAddedToGroup event is created, stored and published.
Now, an event handler receives this event and the both IDs. Now there shall be a view that lists all users with the names of the groups they're in. To update the read model for that view, we do need the user id (which we have) and the group name (which we don't have, we only have its id).
So the question is: How do I handle this scenario?
Currently, four options come to my mind, all with their specific disadvantages:
The read model asks the domain. => Forbidden, and not even possible, as the domain only has behavior, no (public) state.
The read model reads the group name from another table in the read model. => Works, but what if there is no matching table?
Add the neccessary data to the event. => Does not work, as this means that I had to update all previous events as well, and I cannot foresee which data I may need one day.
Do not handle the event via a "usual" event handler, but start an ETL process in the background that deals with the event store, creates the neccessary data and writes the read model. => Works, but to me this seems a little bit of way too much overhead for such a simple scenario.
So, the question is: How do I deal with this scenario correctly?
There are two common solutions.
1) "Event Enrichment" is where you indeed put information on the event that reflects the information you are mentioning, e.g. the group name. Doing this is somewhere between modeling your domain properly and cheating. If you know, for instance, that group names change, emitting the name at the moment of the change is not a bad idea. Imagine when you create a line item on a quote or invoice, you want to emit the price of the good sold on the invoice created event. This is because you must honor that price, even if it changes later.
2) Project several streams at once. Write a projector which watches information from the various streams and joins them together. You might watch user and group events as well as your user added to group event. Depending on the ordering of events in your system, you may know that a user is in a group before you know the name of the group, but you should know the general properties of your event store before you get going.
Events don't necessarily represent a one-to-one mapping of the commands that have initiated the process in the first place. For instance, if you have a command:
SubmitPurchaseOrder
Shopping Cart Id
Shipping Address
Billing Address
The resulting event might look like the following:
PurchaseOrderSubmitted
Items (Id, Name, Amount, Price)
Shipping Address
Shipping Provider
Our Shipping Cost
Shipping Cost billed to Customer
Billing Address
VAT %
VAT Amount
First Time Customer
...
Usually the information is available to the domain model (either by being provided by the command or as being known internal state of the concerned aggregate or by being calculated as part of processing.)
Additionally the event can be enriched by querying the read model or even a different BC (e.g. to retrieve the actual VAT % depending on state) during processing.
You're correctly assuming that events can (and probably will) change over time. This basically doesn't matter at all if you employ versioning: Add the new event (e.g. SubmitPurchaseOrderV2) and add an appropriate event handler to all the classes that are supposed to consume it. No need to change the old event, it can still be consumed since you don't modify the interface, you extend it. This basically comes down to a very good example of the Open/Closed Principle in practice.
Option 2 would be fine, your question about "what about the mismatching in the groups' name read-model table" wouldn´t apply. no data should be deleted, should invalidated when a previous event (say delete group) was emmited. In the end the row in the groups table is there effectively and you can read the group name without problem at all. The only apparent problem could be speed inconsistency, but thats another issue, events should be orderly processed no matter speed they are being processed.

Aggregate Pattern and Performance Issues

I have read about the Aggregate Pattern but I'm confused about something here. The pattern states that all the objects belonging to the aggregate should be accessed via the Aggregate Root, and not directly.
And I'm assuming that is the reason why they say you should have a single Repository per Aggregate.
But I think this adds a noticeable overhead to the application. For example, in a typical Web-based application, what if I want to get an object belonging to an aggregate (which is NOT the aggregate root)? I'll have to call Repository.GetAggregateRootObject(), which loads the aggregate root and all its child objects, and then iterate through the child objects to find the one I'm looking for. In other words, I'm loading lots of data and throwing them out except the particular object I'm looking for.
Is there something I'm missing here?
PS: I know some of you may suggest that we can improve performance with Lazy Loading. But that's not what I'm asking here... The aggregate pattern requires that all objects belonging to the aggregate be loaded together, so we can enforce business rules.
I'm not sure where you read about this "Aggregate Pattern". Please post a link.
One thing it could be is where we encapsulate a list into another object. For a simple example if we have a shopping cart instead of passing around a list of purchases we use a cart object instead. Then code that works on the whole cart (eg. getting total spend) can be encapsulated in the cart. I'm not sure if this is really a pattern but google found this link: http://perldesignpatterns.com/?AggregatePattern
I suspect when you say
"The aggregate pattern requires that
all objects belonging to the aggregate
be loaded together, so we can enforce
business rules. "
this depends on your business rules.
Generally patterns should not be regarded as a set of rules you must follow for everything. It is up to you as a developer to recognise where they can be used effectively.
In our cart example we would generally want to work on the whole cart at once. We might have business rules that say our customers can only order a limited about of items - or maybe they get a discount for order multiple items. So it makes sense to read the whole thing.
If you take a different example eg. products. You could still have a products repository but you needn't not load them all at once. You probably only ever want a page of products at a time.