CQRS and Passing Data - cqrs

Suppose I have an aggregate containing some data, and when it reaches a certain state I'd like to take all that state and pass it to some outside service. For argument's and simplicity's sake, let's just say it is an aggregate that has a list, and when all items in that list are checked off, I'd like to send the entire state to some outside service. Now, when I'm handling the command for checking off the last item in the list, I'll know that I'm at the end, but it doesn't seem correct to send the state to the outside system from within the command processing. So given this scenario, what is the recommended approach if the outside system requires all of the state of the aggregate? Should the outside system build its own copy of the data based on the aggregate events, or is there some better approach?

Should the outside system build its own copy of the data based on the aggregate events?
Probably not -- it's almost never a good idea to share the responsibility of rehydrating an aggregate from its history. The service that owns the object should be responsible for rehydration.
The first key idea to understand is when in the flow the call to the outside service should happen.
First, the domain model processes the command arguments, computing the update to the event history, including the ChecklistCompleted event.
The application takes that history and saves it to the book of record.
The transaction completes successfully.
At this point, the application knows that the operation was successful, but the caller doesn't. So the usual answer is to think in terms of an asynchronous operation that will do the rest of the work.
Possibility one: the application takes the history that it just saved, and uses that history to schedule a task that rehydrates a read-only copy of the aggregate state and then sends that state to the external service.
Possibility two: you ditch the copy of the history that you have now, and fire off an asynchronous task that has enough information to load its own copy of the history from the book of record.
There are at least three ways that you might do this. First, you could have the command handler schedule the task as before.
Second, you could have an event handler listening for ChecklistCompleted events in the book of record, and have that handler schedule the task.
Third, you could read the ChecklistCompleted event from the book of record, and publish a representation of that event to a shared bus, and let the handler in the external service call you back for a copy of the state.
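To make the second possibility concrete, here is a minimal sketch in Java. All the names (EventStore, ExternalService, Checklist.rehydrate, and so on) are invented placeholders, not any particular framework's API: an event handler observes the saved ChecklistCompleted event and fires off a task that loads its own copy of the history and calls the external service.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ChecklistCompletedHandler {

    private final EventStore bookOfRecord;
    private final ExternalService externalService;

    ChecklistCompletedHandler(EventStore bookOfRecord, ExternalService externalService) {
        this.bookOfRecord = bookOfRecord;
        this.externalService = externalService;
    }

    // Runs after the transaction that saved the history has committed.
    void on(ChecklistCompleted event) {
        CompletableFuture.runAsync(() -> {
            // The task carries only the aggregate id and loads its own copy of the history.
            List<Object> history = bookOfRecord.loadStream(event.checklistId());
            Checklist checklist = Checklist.rehydrate(history);   // read-only copy of the state
            externalService.send(checklist.toRepresentation());   // the side effect happens here
        });
    }
}

// Hypothetical collaborators, defined only so the sketch is self-contained.
interface EventStore { List<Object> loadStream(String aggregateId); }
interface ExternalService { void send(Object representation); }
record ChecklistCompleted(String checklistId) {}
class Checklist {
    static Checklist rehydrate(List<Object> history) { return new Checklist(); }
    Object toRepresentation() { return new Object(); }
}
```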
I was under the impression that one bounded context should not reach out to get state from another bounded context but rather keep local copies of the data it needed.
From my experience, the key idea is that the services shouldn't block each other -- or more specifically, a call to service B should not block when service A is unavailable. Responding to events is fundamentally non-blocking; does it really matter that we respond to an asynchronously delivered event by making a synchronous blocking call?
What this buys you, however, is independent evolution of the two services - A broadcasts an event, B reacts to the event by calling A and asking for a representation of the aggregate that B understands, A -- being backwards compatible -- delivers the requested representation.
Compare this with requiring a new release of B every time the rehydration logic in A changes.
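As a rough illustration of that callback, assuming a hypothetical HTTP endpoint and media type on service A (both invented for this sketch), B's handler might look something like this:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ChecklistCompletedListener {

    private final HttpClient http = HttpClient.newHttpClient();

    void on(String checklistId) {
        // B asks A for the representation version that B was built against;
        // A stays backwards compatible and keeps serving it.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://service-a.example.com/checklists/" + checklistId))
                .header("Accept", "application/vnd.service-a.checklist.v1+json")
                .GET()
                .build();
        try {
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            process(response.body());
        } catch (Exception e) {
            // In practice you'd retry or park the event; A being down must not block B forever.
        }
    }

    private void process(String representation) { /* B's own logic */ }
}
```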
Udi Dahan raised a challenging idea - the notion that each piece of data belongs to a single technical authority. "Raw business data" should not be replicated between services.
A service is the technical authority for a specific business capability.
Any piece of data or rule must be owned by only one service.
So in Udi's approach, you'd start to investigate why B has any responsibility for data owned by A, and from there determine how to align that responsibility and the data into a single service. (Part of the trick: the physical view of a service can span process boundaries; in other words, a process may be composed from components that belong to more than one service).
Jeppe Cramon's series on microservices is nicely sourced, and touches on many of the points above.

You should never externalise your state. Reporting on that state is a function of the read side, as it produces reports and you'll need that data to call the service. The structure of your state is plastic, and you shouldn't have an external service that relies upon that structure; otherwise you'll have to update both in lockstep, which is a bad thing.
There is a blog that puts forward a strong argument that the process manager is the correct place to put this type of feature (calling an external service), because that's the appropriate place for orchestrating events.
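A sketch of what such a process manager might look like, with invented names and with the data for the external call taken from the read side as described above (the collaborators are assumptions, not a specific library):

```java
// The process manager listens to domain events, tracks where the process is,
// and is the one place that talks to the external service.
class ChecklistCompletionProcessManager {

    private final ExternalReportingService reportingService;   // assumed collaborator
    private boolean alreadyNotified = false;                    // process state; persisted in practice

    ChecklistCompletionProcessManager(ExternalReportingService reportingService) {
        this.reportingService = reportingService;
    }

    void handle(ChecklistCompleted event, ChecklistReadModel readModel) {
        if (alreadyNotified) {
            return;                                             // guard against redelivery
        }
        // The read side already has the full state in the shape the external call needs.
        reportingService.submit(readModel.fullStateOf(event.checklistId()));
        alreadyNotified = true;
    }

    interface ExternalReportingService { void submit(Object state); }
    interface ChecklistReadModel { Object fullStateOf(String checklistId); }
    record ChecklistCompleted(String checklistId) {}
}
```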

Related

How to communicate between order placing system and trade matching system

Order Placing System
A user places an order; the corresponding amount is put on hold and an order is created. This order is then pushed to some queue to be consumed by the trade matching system. The user gets back a reference order id for the placed order in response to the API call.
Trade Matching System
This system consumes data from the queue generated by the order placing system and looks for possible matches; if a match can be executed, it executes it and pushes the result to another queue.
User Notification System
This system fetches data from the executed queue and broadcasts it to the user it belongs to. The user can also fetch the status of the order using the reference id that was shared in the first API call.
These two systems currently communicate indirectly via a queue. Now the requirement is: in the order placing system, when a user places an order, along with the order id we also need to return the execution status (i.e. whether it got executed or not, and if so, the rate, the fee charged, etc.).
What should the mode of communication between the order placing and trade matching systems be to make it possible to return execution details in the first API call itself?
Challenges
The matching system is single threaded, so we cannot merge it with the order engine.
Polling and waiting for execution results from the execution queue will probably make our order placing API slow.
Right now our order placing system and matching system are separate.
Just looking for possible solution and opinion. Please let me know if something is unclear.
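Purely to illustrate the "poll and wait for execution" option listed in the challenges, here is a sketch that correlates on the reference order id; the types and the queue wiring are stand-ins invented for the sketch, not a real broker client:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

class OrderPlacementService {

    record ExecutionReport(String orderId, String status, double rate, double fee) {}

    private final Map<String, CompletableFuture<ExecutionReport>> pending = new ConcurrentHashMap<>();

    // Called by the API handler after the order has been pushed to the matching queue.
    ExecutionReport awaitExecution(String orderId, long timeoutMillis) throws Exception {
        CompletableFuture<ExecutionReport> future =
                pending.computeIfAbsent(orderId, id -> new CompletableFuture<>());
        try {
            // This is exactly the "slow API" risk the challenges point out: the
            // caller holds the request open while the matching engine works.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            pending.remove(orderId);
        }
    }

    // Called by a background consumer reading from the executed queue.
    void onExecutionReport(ExecutionReport report) {
        pending.computeIfAbsent(report.orderId(), id -> new CompletableFuture<>())
               .complete(report);
    }
}
```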

DDD: How to solve this using Domain-Driven design?

I'm new to DDD and cutting my teeth on the following exercise. The use case is real, but my attempt to solve it with DDD is purely for learning.
We have multiple Git repos, each containing a file that we call the
product spec. The system needs to respond to an HTTP POST by cloning all
the repos, and then updating the product spec in those that match some
information in the POST body. The system also needs to log the POST request as the cause for updating the product spec.
I'd like to use Aggregates and event sourcing for solving this problem because they seem like a good fit. Event sourcing comes with automatic persistence of the commands, so if I convert the POST body to a command, I get auditing for free.
The problem is, the POST may match multiple product specs. I'm not sure how to deal with that. Should I create a domain service, let it find all the matching product specs, and then issue an update command to each? Or should I have the aggregate root do so? If I use an aggregate root to update multiple entities, it itself needs to be an entity, so what would it be in my problem domain?
The first comment to your question is right (the one from VoiceOfUnreason): this 'is mostly side effect coordination'.
But I will try to answer your question: How to solve this using DDD / Event Sourcing:
The first aggregate root could just be named: 'MultipleRepoOperations'. This aggregate root has only one stream of events.
The command that fires the whole process could be: 'CloneAndUpdateProdSpecRepos' which carries a list of all the repos to be cloned and updated.
When the aggregate root processes the command it will simply emit a bunch of events of type 'UserRequestedToCloneAndUpdateProdSpec'.
The second bounded context manages all the repos; it is subscribed to all the events from 'MultipleRepoOperations' and will receive each event emitted by it. This bounded context's aggregate root can be called 'GitRepoManagement', and it has a stream per repo, e.g. GitRepoManagement-Repo1, GitRepoManagement-Repo215, GitRepoManagement-20158, etc.
'GitRepoManagement' receives each event of type 'UserRequestedToCloneAndUpdateProdSpec', replays its corresponding repo stream in order to rehydrate the current state, and then tries to clone and update the product spec for the repo. When it fails it emits a failure event, or a success event if appropriate.
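A minimal sketch of those two aggregate roots (in Java, with invented event and command classes, and with stream rehydration and the real Git plumbing elided) might look like this:

```java
import java.util.ArrayList;
import java.util.List;

class MultipleRepoOperations {

    record CloneAndUpdateProdSpecRepos(List<String> repoUrls, String postBody) {}
    record UserRequestedToCloneAndUpdateProdSpec(String repoUrl, String postBody) {}

    // Handling the command just records one event per repo; no side effects here.
    List<UserRequestedToCloneAndUpdateProdSpec> handle(CloneAndUpdateProdSpecRepos cmd) {
        List<UserRequestedToCloneAndUpdateProdSpec> events = new ArrayList<>();
        for (String repoUrl : cmd.repoUrls()) {
            events.add(new UserRequestedToCloneAndUpdateProdSpec(repoUrl, cmd.postBody()));
        }
        return events;
    }
}

class GitRepoManagement {

    record ProdSpecUpdateSucceeded(String repoUrl) {}
    record ProdSpecUpdateFailed(String repoUrl, String reason) {}

    // One instance per repo stream (GitRepoManagement-Repo1, -Repo215, ...);
    // in the full design it would first replay its own stream to rehydrate state.
    Object handle(MultipleRepoOperations.UserRequestedToCloneAndUpdateProdSpec event,
                  GitClient git) {
        try {
            git.cloneAndUpdateProdSpec(event.repoUrl(), event.postBody());  // the side effect
            return new ProdSpecUpdateSucceeded(event.repoUrl());
        } catch (Exception e) {
            return new ProdSpecUpdateFailed(event.repoUrl(), e.getMessage());
        }
    }

    interface GitClient { void cloneAndUpdateProdSpec(String repoUrl, String postBody) throws Exception; }
}
```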
For learning purposes, try to choose a problem domain that has more complex rules and logic, where many actions are needed, for example a small game (a card game, a multiplayer quiz game or whatever), or simulate some real-world process like school management or some business process.

Complicated job aggregate

I have a very complicated job process and it's not 100% clear to me where to handle what.
I don't want code here; it's just the question of who is responsible for what.
Given is the following:
There is a root directory "C:\server"
Inside are two directories "ftp" and "backup"
Imagine the following process:
An external customer sends a file into the ftp directory.
An importer application gets the file and now the fun starts.
A job aggregate has to be created for this file.
The command "CreateJob(string file)" is fired.
?. The file has to be moved from ftp to backup. Inside the CommandHandler, inside the Aggregate, or on the JobCreated event?
StartJob(Guid jobId) gets called. A third folder, "in-progress", has to be created, and the file has to be copied from backup to in-progress. Who does it?
So it's unclear to me where filesystem things have to be handled, if the Aggregate cannot work correctly without the correct filesystem state.
Because my first approach was to do that inside an infrastructure layer/lib which listens to the events from the job layer. But it seems not 100% correct?!
And on top of this, what about replaying?
You can't replay things/files that were moved; you'd have to somehow simulate that a customer sends the file to the ftp folder...
Thankful for answers
The file has to be moved from ftp to backup. Inside the CommandHandler, inside the Aggregate, or on the JobCreated event?
In situations like this, I move the file to the destination folder in the Application service that sends the command to the Aggregate (or that calls a command-like method on the Aggregate, which is the same thing), before the command is sent to the Aggregate. In this way, if there are problems with the file system (not enough permissions, no space available, etc.) the command is not sent. These kinds of problems should not reach our Aggregate. We must protect it from the infrastructure. In fact we should keep the Aggregate isolated from anything else; it must contain only pure business logic that is used to decide what events get generated.
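A small sketch of that approach, with illustrative names (the repository and the Job class are assumptions made only for the sketch): the Application service does the filesystem work first, and only sends the command if the move succeeded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class JobApplicationService {

    private final JobRepository repository;   // assumed: loads/saves the Job aggregate

    JobApplicationService(JobRepository repository) {
        this.repository = repository;
    }

    void createJob(String fileName) throws IOException {
        Path source = Path.of("C:\\server\\ftp", fileName);
        Path target = Path.of("C:\\server\\backup", fileName);

        // Infrastructure concern handled here: if this throws (permissions, disk
        // space, missing file), the command never reaches the Aggregate.
        Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);

        Job job = Job.create(fileName);        // pure business logic, would emit JobCreated
        repository.save(job);
    }

    interface JobRepository { void save(Job job); }
    static class Job {
        private final String file;
        private Job(String file) { this.file = file; }
        static Job create(String file) { return new Job(file); }
    }
}
```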
Because my first approach was to do that inside an infrastructure layer/lib which listens to the events from the job layer. But it seems not 100% correct?!
Indeed, this seems like over engineering to me. You must KISS.
StartJob(Guid jobId) gets called. A third folder, "in-progress", has to be created, and the file has to be copied from backup to in-progress. Who does it?
Whoever calls StartJob could do the moving, before StartJob gets called. Again, keep the Aggregate pure. In this case it depends on your framework/domain details.
And on top of this, what about replaying? You can't replay things/files that were moved; you'd have to somehow simulate that a customer sends the file to the ftp folder...
The events are loaded from the event store and replayed in two situations:
Before every command gets sent to the Aggregate, the Aggregate Repository loads all the events from the event store and then applies every one of them to the Aggregate, probably by calling some applyThisEvent(TheEvent) method on the Aggregate. So these methods should have no side effects (they should be pure), otherwise you change the outside world again and again at every command execution, and you don't want that (a small sketch of such pure apply methods follows below).
The read-models (the projections, the query-models) that present data to the user listen to those events and update the database tables that hold the data the users see. The events are sent to those read-models after they are generated and every time the read-models are recreated. When you introduce a new read-model, you must pass it all the events that were previously generated by the aggregates in order to build the correct/complete state in it. If your read-models' event listeners have side effects, what do you think happens when you replay those long-past events? The outside world is modified again and again, and you don't want that! The read-models only interpret the events; they don't generate other events and they don't change the outside world.
There is a special third case when events reach another type of model, a Saga. A Saga must receive an event only once! This is the case you were thinking of with "Because my first approach was to do that inside an infrastructure layer/lib which listens to the events from the job layer". You could do this in your case, but it is not KISS.
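Here is the small sketch of pure apply methods mentioned above: rehydration only mutates the Aggregate's in-memory state and never touches the file system. The event types are invented for illustration.

```java
import java.util.List;

class JobAggregate {

    record JobCreated(String file) {}
    record JobStarted(String jobId) {}

    private String file;
    private boolean started;

    // Called by the repository for every stored event, on every rehydration.
    void applyThisEvent(Object event) {
        if (event instanceof JobCreated created) {
            this.file = created.file();    // state change only -- no file is moved here
        } else if (event instanceof JobStarted) {
            this.started = true;           // state change only -- no folder is created here
        }
    }

    static JobAggregate rehydrate(List<Object> history) {
        JobAggregate aggregate = new JobAggregate();
        history.forEach(aggregate::applyThisEvent);
        return aggregate;
    }
}
```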
I have a very complicated job process and it's not 100% clear to me where to handle what. I don't want code here; it's just the question of who is responsible for what.
The usual answer is that the domain model -- aka the "aggregate" -- makes decisions and saves them. Observing those decisions, some event handler induces the side effects.
And on top of this, what about replaying? You can't replay things/files that were moved; you'd have to somehow simulate that a customer sends the file to the ftp folder...
You replay the events to the aggregate, so that it is restored to the state where it made the last decision. That's a separate concern from replaying the side effects -- which is part of the motivation for handling the side effects elsewhere.
Where possible, of course, you prefer to have the side effects be idempotent, so that a duplicated message doesn't create a problem. But notice that from the point of view of the model, it doesn't actually matter whether the side effect succeeds or not.
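One common way to get that idempotency, sketched here with an assumed "processed events" store: remember which event ids have already been handled, so a redelivered message does not repeat the side effect.

```java
class IdempotentSideEffectHandler {

    interface ProcessedEventStore {
        boolean markProcessed(String eventId);   // returns false if already recorded
    }

    private final ProcessedEventStore processed;

    IdempotentSideEffectHandler(ProcessedEventStore processed) {
        this.processed = processed;
    }

    void handle(String eventId, Runnable sideEffect) {
        if (!processed.markProcessed(eventId)) {
            return;               // duplicate delivery: the side effect already happened
        }
        sideEffect.run();         // e.g. move the file, call the external service
    }
}
```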

Can I use Time as globally unique event version?

I found time to be the best value to use as an event version.
I can merge perfectly independent events from different event sources on different servers whenever needed, without worrying about read-side event order synchronization. I know which event (from server 1) happened before the other (from server 2) without the need for a global sequential event id generator, which would make all read sides depend on it.
As long as the time is a globally ever-increasing event version, different teams in a company can act as distributed event sources or event readers, and everyone can always rely on the contract.
The world's simplest notification from the write side to the subscribed read sides, followed by a query pulling the recent changes from the underlying write side, can simplify everything.
Are there any side effects I'm not aware of?
Time is indeed increasing and you get a deterministic number; however, event versioning does not only serve the purpose of preventing conflicts. We always say that when we commit a new event to the event store, we send the new event version there as well, and it must match the expected version on the event store side, which must be the previous version plus exactly one. Whether there are a thousand or three million ticks between two events, I do not really care; that does not give me the information I need. But whether I have missed an event along the way is critical to know. So I would not use anything other than an incremental counter, with events versioned per aggregate/stream.
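A sketch of that per-stream expected-version check, using an in-memory stand-in for a real event store's append API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InMemoryEventStore {

    static class ConcurrencyException extends RuntimeException {
        ConcurrencyException(String message) { super(message); }
    }

    private final Map<String, List<Object>> streams = new HashMap<>();

    // The caller says which version it thinks the stream is at; the append only
    // succeeds if that matches, so a missed or conflicting event is detected immediately.
    synchronized void append(String streamId, long expectedVersion, Object event) {
        List<Object> stream = streams.computeIfAbsent(streamId, id -> new ArrayList<>());
        long actualVersion = stream.size();
        if (actualVersion != expectedVersion) {
            throw new ConcurrencyException(
                    "Expected version " + expectedVersion + " but stream is at " + actualVersion);
        }
        stream.add(event);   // the new event becomes version expectedVersion + 1
    }
}
```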

Data store in an actor system

I'm working on an event processing pipeline based on Akka actors. I have three actors, one for each step of the pipeline: FilterWorker, EnrichWorker and ProcessWorker; plus a supervisor actor that makes sure the events are sent from one step of the pipeline to the next.
The enrich step might need to query some external database for extra data or even create new data that I'll want to persist. For example, the enrich step of a web analytics system might want to enrich a click event with the user that made the click and store that user information in a database.
Keeping in mind that example, I see the following options:
1. Use a singleton, e.g. a UserStore that keeps in memory all the users gathered so far, saves them to the database once in a while, and has all the logic to fetch users that are not yet in memory. It doesn't seem like a good idea to use a singleton in an actor system, however (?).
2. Use a store actor. Use tell to add a new user and ask to fetch it.
Is there a better pattern for this?
Thanks!
In order to not leave this unanswered, I went with my second option and johanandren's suggestion of having an Actor fill the data store role. Works pretty well!
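For anyone looking for a starting point, a minimal sketch of such a store actor (assuming Akka classic's Java API; the message types and the string-keyed user map are invented placeholders) could be:

```java
import akka.actor.AbstractActor;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class UserStoreActor extends AbstractActor {

    // Hypothetical message protocol for this sketch.
    public static final class PutUser {
        public final String id;
        public final String name;
        public PutUser(String id, String name) { this.id = id; this.name = name; }
    }

    public static final class GetUser {
        public final String id;
        public GetUser(String id) { this.id = id; }
    }

    // The actor owns the in-memory cache, so there is no shared mutable state.
    private final Map<String, String> users = new HashMap<>();

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(PutUser.class, put -> users.put(put.id, put.name))   // sent with tell, no reply
                .match(GetUser.class, get ->                                 // sent with ask, reply to sender
                        getSender().tell(Optional.ofNullable(users.get(get.id)), getSelf()))
                .build();
    }
}
```

Writes arrive as fire-and-forget tell messages; reads use the ask pattern, and the actor replies to the sender with whatever it has for that id.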