In CQRS/event sourcing, what is the best approach for a parent to modify the state of all its children? - cqrs

Usecase: Suppose I have the following aggregates
Root aggregate - CustomerRootAggregate (manages each CustomerAggregate)
Child aggregate of the Root aggregate - CustomerAggregate (there are 10 customers)
Question: How do I send a DisableCustomer command to all 10 CustomerAggregates to update their state to disabled?
customerState.enabled = false
Solutions: Since CQRS does not allow the write side to query the read side to get a list of CustomerAggregate IDs, I thought of the following:
CustomerRootAggregate always stores the IDs of all its CustomerAggregates in the database as JSON. When a DisableAllCustomers command is received by CustomerRootAggregate, it fetches the CustomerIds JSON and sends a DisableCustomer command to each child, where each child restores its state before applying the DisableCustomer command. But this means I will have to maintain the consistency of the CustomerIds JSON record.
The client (browser UI) always sends the list of CustomerIds to apply DisableCustomer to. But this will be problematic for a database with thousands of customers.
In the REST API layer, check for the DisableAllCustomers command, fetch all the IDs from the read side, and send DisableAllCustomers(ids), with the IDs populated, to the write side.
Which is the recommended approach, or is there a better one?

Root aggregate - CustomerRootAggregate (manages each CustomerAggregate)
Child aggregate of the Root aggregate - CustomerAggregate (there are 10 customers)
For starters, the language "child aggregate" is a bit confusing. If your model includes a "parent" entity that holds a direct reference to a "child" entity, then both of those entities must be part of the same aggregate.
However, you might have a Customer aggregate for each customer, and a CustomerSet aggregate that manages a collection of Ids.
How do I send DisableCustomer command to all the 10 CustomerAggregate to update their state to be disabled ?
The usual answer is that you run a query to get the set of Customers to be disabled, and then you dispatch a disableCustomer command to each.
So both 3 and 2 are reasonable answers, with the caveat that you need to consider what your requirements are if some of the DisableCustomer commands fail.
2 in particular is seductive, because it clearly articulates that the client (human operator) is describing a task, which the application then translates into commands to be run by the domain model.
Trying to pack "thousands" of customer ids into the message may be a concern, but for several use cases you can find a way to shrink that down. For instance, if the task is "disable all", then the client can send the application instructions for how to recreate the "all" collection -- i.e. "run this query against this specific version of the collection" describes the list of customers to be disabled unambiguously.
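As a sketch of that translation step, suppose the application layer receives the "disable all" task, runs the read-side query, and expands the result into one command per aggregate. All names here (DisableCustomer, expand) are illustrative assumptions, not a prescribed API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the application layer expands a "disable all" task
// into one DisableCustomer command per id returned by the read side.
public class DisableAllHandler {
    record DisableCustomer(String customerId) {}

    // customerIds stands in for the result of a read-side query (an
    // assumption; the real query mechanism depends on your stack).
    static List<DisableCustomer> expand(List<String> customerIds) {
        List<DisableCustomer> commands = new ArrayList<>();
        for (String id : customerIds) {
            commands.add(new DisableCustomer(id)); // one command per aggregate
        }
        return commands;
    }

    public static void main(String[] args) {
        List<DisableCustomer> cmds = expand(List.of("c1", "c2", "c3"));
        System.out.println(cmds.size()); // 3
    }
}
```

Each command is then dispatched independently, which is where the "what if some fail" question from the answer above comes in.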
When a DisableAllCustomers command is received by CustomerRootAggregate, it fetches the CustomerIds JSON and sends a DisableCustomer command to each child, where each child restores its state before applying the DisableCustomer command. But this means I will have to maintain the consistency of the CustomerIds JSON record.
This is close to a right idea, but not quite there. You dispatch a command to the collection aggregate. If it accepts the command, it produces an event that describes the customer ids to be disabled. This domain event is persisted as part of the event stream of the collection aggregate.
Subscribe to these events with an event handler that is responsible for creating a process manager. This process manager is another event sourced state machine. It looks sort of like an aggregate, but it responds to events. When an event is passed to it, it updates its own state, saves those events off in the current transaction, and then schedules commands to each Customer aggregate.
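A minimal sketch of such a process manager, with illustrative names and an in-memory stand-in for the command scheduling and event subscription described above:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the process manager: it consumes the collection
// aggregate's event, tracks which customers still need disabling, and marks
// them done as confirmations arrive. Names are illustrative, not from any
// particular framework.
public class DisableCustomersProcess {
    private final Set<String> pending = new HashSet<>();

    // Reacts to the domain event emitted by the collection aggregate.
    void onAllCustomersDisableRequested(List<String> customerIds) {
        pending.addAll(customerIds);
        // a real implementation would schedule one DisableCustomer command
        // per id here, with retries on failure
    }

    // Reacts to each confirmation event from a Customer aggregate.
    void onCustomerDisabled(String customerId) {
        pending.remove(customerId);
    }

    boolean isComplete() { return pending.isEmpty(); }

    public static void main(String[] args) {
        DisableCustomersProcess pm = new DisableCustomersProcess();
        pm.onAllCustomersDisableRequested(List.of("c1", "c2"));
        pm.onCustomerDisabled("c1");
        pm.onCustomerDisabled("c2");
        System.out.println(pm.isComplete()); // true
    }
}
```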
But it's a bunch of extra work to do it that way. Conventional wisdom suggests that you should usually begin by assuming that the process manager approach isn't necessary, and only introduce it if the business demands it. "Premature automation is the root of all evil" or something like that.

Related

Synchronising events between microservices using Kafka and MongoDb connector

I'm experimenting with microservices architecture. I have UserService and ShoppingService.
In UserService I'm using MongoDb. When I create a new user in UserService, I want to sync basic user info to ShoppingService. In UserService I'm using something like event sourcing. When I create a new User, I first create the UserCreatedEvent and then apply the event to the domain User object. So in the end I get a domain User object that has the current state and a list of events containing one UserCreatedEvent.
I wonder if I should persist the Events collection as a nested property of the User document or in a separate UserEvents collection. I was planning to use Kafka Connect to synchronize the events from UserService to ShoppingService.
If I decide to persist the events inside the User document, then I don't need the transaction that I would use to save events to a separate UserEvents collection, but I can't set up the Kafka connector to track changes in the nested property only.
If I decide to persist events in a separate UserEvents collection, I need to wrap the changes to User and UserEvents in a transaction. But saving events to a separate collection makes setting up the Kafka connector very easy, because I track only inserts and don't need to track updates of the nested UserEvents array in the User document.
I think I will go with the second option for the sake of simplicity, but maybe I've missed something. Is it a good idea to implement it like this?
I would generally advise the second approach. Note that you can also eliminate the need for a transaction by observing that User is just a snapshot based on the UserEvents up to some point in the stream and thus doesn't have to be immediately updated.
With this, your read operation for User can be: select a user from User (the latest snapshot), which includes a version/sequence number saying that it's as-of some event; then select the events with later sequence numbers and apply those events to the user. If there's some querier which wants a faster response and can tolerate getting something stale, a different endpoint (or an option in the query) can bypass the event replay.
You can then have some asynchronous process which subscribes to the stream of user events and updates User based on those events.
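A small sketch of that read path, assuming an illustrative NameChanged event and a snapshot carrying an as-of sequence number (the real event shapes are whatever your UserEvents collection holds):

```java
import java.util.List;

// Illustrative sketch of the read described above: User is a snapshot valid
// "as of" some event sequence number; a read replays only the newer events.
// Event and field names are assumptions, not the poster's actual schema.
public class UserReadSide {
    record NameChanged(long seq, String newName) {}

    static class UserSnapshot {
        long asOfSeq;
        String name;
        UserSnapshot(long asOfSeq, String name) { this.asOfSeq = asOfSeq; this.name = name; }
    }

    // Apply only events with a sequence number later than the snapshot's.
    static UserSnapshot upToDate(UserSnapshot snapshot, List<NameChanged> events) {
        UserSnapshot result = new UserSnapshot(snapshot.asOfSeq, snapshot.name);
        for (NameChanged e : events) {
            if (e.seq > result.asOfSeq) {
                result.name = e.newName;
                result.asOfSeq = e.seq;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        UserSnapshot stale = new UserSnapshot(2, "Alice");
        UserSnapshot fresh = upToDate(stale, List.of(
            new NameChanged(2, "Alice"),    // already in the snapshot, skipped
            new NameChanged(3, "Alicia"))); // newer, applied
        System.out.println(fresh.name); // Alicia
    }
}
```

The asynchronous updater mentioned above would simply persist the result of upToDate as the new snapshot from time to time.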

version of aggregate event sourcing

According to event sourcing, when a command is handled, all events of a domain have to be stored. Per event, the system must increase the version of the aggregate. My event store looks something like this:
(AggregateId, AggregateVersion, Sequence, Data, EventName, CreatedDate)
(AggregateId, AggregateVersion) is key
In some cases it does not make sense to increase the version of an aggregate. For example,
a command registers a user and raises RegisteredUser, WelcomeEmailEvent, GiftCardEvent.
how can I handle this problem?
how can I handle this problem?
Avoid confusing your representation-of-information-changes events with your publishing-for-use-elsewhere events.
"Event sourcing", as commonly understood in the domain-drive-design and cqrs space, is a kind of data model. We're talking specifically about the messages an aggregate sends to its future self that describe its own changes over time.
It's "just" another way of storing the state of the aggregate, same as we would do if we were storing information in a relational database, or a document store, etc.
Messages that we are going to send to other components and then forget about don't need to have events in the event stream.
In some cases, there can be confusion when we haven't recognized that there are multiple different processes at work.
A requirement like "when a new user is registered, we should send them a welcome email" is not necessarily part of the registration process; it might instead be an independent process that is triggered by the appearance of a RegisteredUser event. The information that you need to save for the SendEmail process would be "somewhere else" - outside of the Users event history.
Event changes the state of an aggregate, and therefore changes its version. If state is not changed, then there should be no event for this aggregate.
In your example, I would ask myself: if WelcomeEmailEvent does not change the state of the User aggregate, then whose state does it change? Perhaps some other aggregate - some EmailNotification service that cares about a successful or failed email attempt. In that case I would make it an event of the aggregate whose state it changes. And it will affect the version of that aggregate.
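A sketch of the rule both answers describe, with illustrative names: only the state-changing RegisteredUser event enters the User stream and bumps the version; the welcome email is another component's concern and never appears here:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only events that change the aggregate's state enter
// its stream and increment its version. Sending the welcome email is a
// separate process triggered by RegisteredUser, not a User event.
public class UserAggregate {
    long version = 0;
    boolean registered = false;
    final List<String> events = new ArrayList<>();

    void register() {
        // RegisteredUser changes User state -> stored, version increments
        events.add("RegisteredUser");
        registered = true;
        version++;
        // no WelcomeEmailEvent or GiftCardEvent is appended to this stream;
        // those belong to whichever component's state they actually change
    }

    public static void main(String[] args) {
        UserAggregate user = new UserAggregate();
        user.register();
        System.out.println(user.version);       // 1
        System.out.println(user.events.size()); // 1
    }
}
```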

Maintain reference between aggregates

I'm trying to wrap my head around how to maintain id references between two aggregates, eg. when an event happens on either side that affects the relationship, the other side is updated as well in an eventual consistent manner.
I have two aggregates, one for "Team" and one for "Event", in the context of a festival with the following code:
@Aggregate
public class Event {
    @AggregateIdentifier
    private EventId eventId;
    private Set<TeamId> teams; // List of associated teams
    ... protected constructor, getters/setters and command handlers ...
}

@Aggregate
public class Team {
    @AggregateIdentifier
    private TeamId teamId;
    private EventId eventId; // Owning event
    ... protected constructor, getters/setters and command handlers ...
}
A Team must always be associated to an event (through the eventId). An event contains a list of associated teams (through the team id set).
When a team is created (CreateTeamCommand) on the Team aggregate, I would like the TeamId set on the Event aggregate to be updated with the team id of the newly created team.
If the command "DeleteEventCommand" on the Event aggregate is executed, all teams associated to the event should also be deleted.
If a team is moved from one event to another event (MoveTeamToEventCommand) on the Team aggregate, the eventId on the Team aggregate should be updated but the TeamId should be removed from the old Event aggregate and be added to the new Event aggregate.
My current idea was to create a saga where I would run SagaLifecycle.associateWith for both the eventId on the Event aggregate and the teamId on the Team aggregate, with @StartSaga on the CreateTeamCommand (essentially the first time the relationship starts), and then have an event handler for every event that affects the relationship. My main issues with this solution are:
1: It would mean I would have a unique saga for each possible combination of team and event. Could this cause performance trouble if scaled to e.g. 1 million events with 50 teams each? (This is unrealistic for this scenario but relevant for a general solution to maintaining relationships between aggregates.)
2: It would require custom commands and event handlers dedicated to updating the team list of the Event aggregate, as the resulting events should not be processed in the saga, to avoid an infinite loop of updating references.
Thank you for reading this small story and I hope someone can either confirm that I'm on the right track or point me in the direction of a proper solution.
An event contains a list of associated teams (through the team id set).
If you mean "An event aggregate" here by "An event", I don't believe your event aggregate needs team ids. If you think it does, it would be great to understand your reasoning on this.
What I think you need instead is for your read side to know about this. Your read model for a single Event can listen to the events produced by CreateTeamCommand and MoveTeamToEventCommand, as well as all other Event-related events, and build up the projection accordingly. Remember, don't design your aggregates with querying concerns in mind.
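An in-memory sketch of such a projection, with illustrative handler and event names (an Axon projection would use @EventHandler methods, but the shape is the same): the read side maintains the event -> teams view, so the Event aggregate itself never stores team ids.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical read-model projection: listens to team-related events and
// maintains the event -> teams view. Event names are assumptions.
public class EventTeamsProjection {
    private final Map<String, Set<String>> teamsByEvent = new HashMap<>();

    void onTeamCreated(String teamId, String eventId) {
        teamsByEvent.computeIfAbsent(eventId, k -> new HashSet<>()).add(teamId);
    }

    void onTeamMovedToEvent(String teamId, String fromEventId, String toEventId) {
        teamsByEvent.getOrDefault(fromEventId, new HashSet<>()).remove(teamId);
        teamsByEvent.computeIfAbsent(toEventId, k -> new HashSet<>()).add(teamId);
    }

    Set<String> teamsOf(String eventId) {
        return teamsByEvent.getOrDefault(eventId, Set.of());
    }

    public static void main(String[] args) {
        EventTeamsProjection view = new EventTeamsProjection();
        view.onTeamCreated("t1", "e1");
        view.onTeamMovedToEvent("t1", "e1", "e2");
        System.out.println(view.teamsOf("e2")); // [t1]
    }
}
```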
If the command "DeleteEventCommand" on the Event aggregate is executed, all teams associated to the event should also be deleted.
A few things here:
Again, your read side can listen on this event, and update the projections accordingly.
You can also perform validation in the relevant command handlers for the Team aggregate, to check whether the Event exists before performing the operations. This won't be exactly in sync, but will cover most cases (see the "How can I verify that a customer ID really exists when I place an order?" section here).
If you really want to delete the associated Team aggregates off the back of a DeleteEventCommand, you need to handle this inside a saga, as there is no way for you to perform this atomically without leaking the data storage system's specifics into your domain model. So you have certain retry and idempotency needs here, which a saga can meet. It's not exactly what you are suggesting, but a related fact is that a single command can't act on a set of aggregates; see the "How can I update a set of aggregates with a single command?" section here.

DB relationship: implementing a conversation

I want to implement a simple conversation feature, where each conversation has a set of messages between two users. My question is, if I have a reference from a message to a conversation, whether I should also have a reference the other way.
Right now, each message has a conversationId. To retrieve all the messages that belong to a certain conversation, I do Message.find({conversationId: ..}). If I had stored an array of messages in a conversation object, I could do conversation.messages.
Which way is the convention?
It all depends on usage patterns. First, you normalize: 1 conversation has many messages, 1 message belongs to 1 conversation. That means you've got a 1-to-many (1:M) relationship between conversations and messages.
In a 1:M relationship, the SQL standard is to assign the "1" as a foreign key to each of the "M". So, each message would have the conversationId.
In Mongo, you have the option of doing the opposite via arrays. Like you said, you could store an array of messageIds in the conversation. This gets pretty messy because for every new message, you have to edit the conversation doc. You're essentially doubling your writes to the DB & keeping the 2 writes in sync is completely on you (e.g. what if the user deletes a message & it's not deleted from the conversation?).
In Mongo, you also have to consider the difference between 1:M and 1:F (1-to-few). Many times it's advantageous to nest 1:F relationships, i.e. make the "F" a subdoc of the "1". There is a limit: each doc cannot exceed 16MB (this may be lifted in future versions). The advantage of nesting subdocs is you get atomic updates because it's all the same doc, not to mention subscriptions in a pub/sub are easier. This may work, but if you've got a group chat with 20 friends that's been going on for the last 4 years, you might have to get clever (cap it, start a new conversation, etc.).
Nesting would be my recommendation, although your original idea of assigning a conversationId to each message works too (make sure to index!).

How to get list of aggregates using JOliviers's CommonDomain and EventStore?

The repository in the CommonDomain only exposes GetById(). So what do I do if my handler needs a list of Customers, for example?
On the face value of your question: if you needed to perform operations on multiple aggregates, you would just provide the IDs of each aggregate in your command (which the client would obtain from the query side), then get each aggregate from the repository.
However, looking at one of your comments in response to another answer I see what you are actually referring to is set based validation.
This very question has raised quite a lot of debate about how to do this, and Greg Young has written a blog post on it.
The classic question is "how do I check that the username hasn't already been used when processing my CreateUserCommand?". I believe the suggested approach is to assume that the client has already done this check by asking the query side before issuing the command. When the user aggregate is created, the UserCreatedEvent will be raised and handled by the query side. Here, the insert query will fail (either because of a check or a unique constraint in the DB), and a compensating command would be issued, which would delete the newly created aggregate and perhaps email the user telling them the username is already taken.
The main point is, you assume that the client has done the check. I know this approach is difficult to grasp at first - but it's the nature of eventual consistency.
Also you might want to read this other question which is similar, and contains some wise words from Udi Dahan.
In the classic event sourcing model, queries like "get all customers" would be carried out by a separate query handler which listens to all events in the domain and builds a query model to satisfy the relevant questions.
If you need to query customers by last name, for instance, you could listen to all customer created and customer name change events and just update one table of last-name to customer-id pairs. You could hold other information relevant to the UI that is showing the data, or you could simply hold IDs and go to the repository for the relevant customers in order to work further with them.
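A sketch of that last-name query model, using in-memory maps and illustrative event shapes (CommonDomain itself does not prescribe these; a real handler would write to a table instead):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical query model: listens to customer created and name change
// events and maintains a last-name -> customer-ids lookup.
public class CustomersByLastName {
    private final Map<String, Set<String>> index = new HashMap<>();
    private final Map<String, String> lastNameOf = new HashMap<>();

    void onCustomerCreated(String customerId, String lastName) {
        apply(customerId, lastName);
    }

    void onCustomerNameChanged(String customerId, String newLastName) {
        String old = lastNameOf.get(customerId);
        if (old != null) index.get(old).remove(customerId); // drop stale entry
        apply(customerId, newLastName);
    }

    private void apply(String customerId, String lastName) {
        lastNameOf.put(customerId, lastName);
        index.computeIfAbsent(lastName, k -> new HashSet<>()).add(customerId);
    }

    Set<String> find(String lastName) {
        return index.getOrDefault(lastName, Set.of());
    }

    public static void main(String[] args) {
        CustomersByLastName view = new CustomersByLastName();
        view.onCustomerCreated("c1", "Smith");
        view.onCustomerNameChanged("c1", "Jones");
        System.out.println(view.find("Jones")); // [c1]
    }
}
```

The ids returned by find can then be handed to the repository's GetById() to load the actual aggregates.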
You don't need a list of customers in your handler. Each aggregate MUST be processed in its own transaction. If you want to show this list to the user, just build an appropriate view.
Your command needs to contain the id of the aggregate root it should operate on.
This id will be looked up by the client sending the command using a view in your readmodel. This view will be populated with data from the events that your AR emits.