DDD: How to solve this using Domain-Driven design? - event-handling

I'm new to DDD and cutting my teeth on the following exercise. The use case is real, but my attempt to solve it with DDD is purely for learning.
We have multiple Git repos, each containing a file that we call
product spec. The system needs to respond to a HTTP POST by cloning all
the repos, and then update the product spec in those that match some
information in the POST body. System also needs to log the POST request as the cause for updating the product spec.
I'd like to use Aggregates and event sourcing for solving this problem because they seem like a good fit. Event sourcing comes with automatic persistence of the commands, so if I convert the POST body to a command, I get auditing for free.
Problem is, the POST may match multiple product spec. I'm not sure how to deal with that. Should I create a domain service, let it find all the matching product spec and then issue an update command to each? Or should I have the aggregate root do so? If using aggregate root to update multiple entities, it itself needs to be an entity, so what would it be in my problem domain?

The first comment to your question is right (the one of #VoiceOfUnreason): this 'is mostly side effect coordination'.
But I will try to answer your question: How to solve this using DDD / Event Sourcing:
The first aggregate root could just be named: 'MultipleRepoOperations'. This aggregate root has only one stream of events.
The command that fires the whole process could be: 'CloneAndUpdateProdSpecRepos' which carries a list of all the repos to be cloned and updated.
When the aggregate root processes the command it will simply spit a bunch of events of type 'UserRequestedToCloneAndUpdateProdSpec'
The second bounded context manages all the repos, and it its subscribed to all the events from 'MultipleRepoOperations' and will receive each event emitted by it. This bounded context aggregate root can be called: 'GitRepoManagement', and has a stream per repo. Eg: GitRepoManagement-Repo1, GitRepoManagement-Repo215, GitRepoManagement-20158, etc.
'GitRepoManagement' receives each event of type 'UserRequestedToCloneAndUpdateProdSpec', replays its corresponding repo stream in order to rehydrate the current state, and then tries to clone and update the product spec for the repo. When fails emits a failed event or a suceed if appropiate.

for learning purposes try to choose problem domain that has more complex rules and logic, where many actions is needed. for example small game (card game,multiplayer quiz game or whatever). or simulate some real world process like school management or some business process.

Related

Complicated job aggregate

I have a very complicated job process and it's not 100% clear to me where to handle what.
I don't want to have code, it just the question who is responsible for what.
Given is the following:
There is a root directory "C:\server"
Inside are two directories "ftp" and "backup"
Imagine the following process:
An external customer sends a file into the ftp directory.
An importer application get's the file and now the fun starts.
A job aggregate have to be created for this file.
The command "CreateJob(string file)" is fired.
?. The file have to be moved from ftp to backup. Inside the CommandHandler or inside the Aggregate or on JobCreated event?
StartJob(Guid jobId) get's called. A third folder have to be created "in-progress", File have to be copied from backup to in-progress. Who does it?
So it's unclear for me where Filesystem things have to be handled if the Aggregate can not work correctly without the correct filesystem.
Because my first approach was to do that inside an Infrastructure layer/lib which listen to the events from the job layer. But it seems not 100% correct?!
And top of this, what is with replaying?
You can't replay things/files that were moved, you have to somehow simulate that a customer sends the file to the ftp folder...
Thankful for answers
The file have to be moved from ftp to backup. Inside the CommandHandler or inside the Aggregate or on JobCreated event?
In situations like this, I move the file to the destination folder in the Application service that sends the command to the Aggregate (or that calls a command-like method on the Aggregate, it's the same) before the command is sent to the Aggregate. In this way, if there are some problems with the file-system (not enough permissions or space is not available etc) the command is not sent. These kind of problems should not reach our Aggregate. We most protect it from the infrastructure. In fact we should keep the Aggregate isolated from anything else; it must contain only pure business logic that is used to decide what events get generated.
Because my first approach was to do that inside an Infrastructure layer/lib which listen to the events from the job layer. But it seems not 100% correct?!
Indeed, this seems like over engineering to me. You must KISS.
StartJob(Guid jobId) get's called. A third folder have to be created "in-progress", File have to be copied from backup to in-progress. Who does it?
Whoever's calling the StartJob could do the moving, before the StartJob gets called. Again, keep the Aggregate pure. In this case it depends on your framework/domain details.
And top of this, what is with replaying? You can't replay things/files that where moved, you have to somehow simulate that a customer sends the file to the ftp folder...
The events are loaded from the event store and replayed in two situations:
Before every command gets sent to the Aggregate, the Aggregate Repository loads all the events from the event store then it applies every one of them to the Aggregate, probably calling some applyThisEvent(TheEvent) method on the Aggregate. So, this methods should be with no side effects (pure) otherwise you change the outside world again and again at every command execution and you don't want that.
The read-models (the projections, the query-models) that present data to the user listen to those events and update the database tables that hold the data that the users see. The events are sent to those read-models after they are generated and every time the read-models are being recreated. When you invent a new read-model, you must pass it all the events that were previous generated by the aggregates in order to build the correct/complete state on them. If your read-model's event listeners have side effects what do you think happens when you replay those long past events? The outside world is modified again and again and you don't want that! The read-models only interpret the events, they don't generate other events and they don't change the outside world.
There is a special third case when events reach another type of model, a Saga. A Saga must receive an event only once! This is the case that you thought to use in Because my first approach was to do that inside an Infrastructure layer/lib which listen to the events from the job layer. You could do this in your case but is not KISS.
I have a very complicated job process and it's not 100% clear to me where to handle what. I don't want to have code, it just the question who is responsible for what.
The usual answer is that the domain model -- aka the "aggregate" makes decisions, and saves them. Observing those decisions, some event handler induces side effects.
And top of this, what is with replaying? You can't replay things/files that where moved, you have to somehow simulate that a customer sends the file to the ftp folder...
You replay the events to the aggregate, so that it is restored to the state where it made the last decision. That's a separate concern from replaying the side effects -- which is part of the motivation for handling the side effects elsewhere.
Where possible, of course, you prefer to have the side effects be idempotent, so that a duplicated message doesn't create a problem. But notice that from the point of view of the model, it doesn't actually matter whether the side effect succeeds or not.

CQRS and Passing Data

Suppose I have an aggregate containing some data and when it reaches a certain state, I'd like to take all that state and pass it to some outside service. For argument and simplicity's sake, lets just say it is an aggregate that has a list and when all items in that list are checked off, I'd like to send the entire state to some outside service. Now when I'm handling the command for checking off the last item in the list, I'll know that I'm at the end but it doesn't seem correct to send it to the outside system from the processing of the command. So given this scenario what is the recommended approach if the outside system requires all of the state of the aggregate. Should the outside system build its own copy of the data based on the aggregate events or is there some better approach?
Should the outside system build its own copy of the data based on the aggregate events.
Probably not -- it's almost never a good idea to share the responsibility of rehydrating an aggregate from its history. The service that owns the object should be responsible for rehydration.
First key idea to understand is when in the flow the call to the outside service should happen.
First, the domain model processes the command arguments, computing the update to the event history, including the ChecklistCompleted event.
The application takes that history, and saves it to the book of record
The transaction completes successfully.
At this point, the application knows that the operation was successful, but the caller doesn't. So the usual answer is to be thinking of an asynchronous operation that will do the rest of the work.
Possibility one: the application takes the history that it just saved, and uses that history to create schedule a task to rehydrate a read-only copy of the aggregate state, and then send that state to the external service.
Possibility two: you ditch the copy of the history that you have now, and fire off an asynchronous task that has enough information to load its own copy of the history from the book of record.
There are at least three ways that you might do this. First, you could have the command schedule the task as before.
Second, you could have a event handler listening for ChecklistCompleted events in the book of record, and have that handler schedule the task.
Third, you could read the ChecklistCompleted event from the book of record, and publish a representation of that event to a shared bus, and let the handler in the external service call you back for a copy of the state.
I was under the impression that one bounded context should not reach out to get state from another bounded context but rather keep local copies of the data it needed.
From my experience, the key idea is that the services shouldn't block each other -- or more specifically, a call to service B should not block when service A is unavailable. Responding to events is fundamentally non blocking; does it really matter that we respond to an asynchronously delivered event by making an asynchronous blocking call?
What this buys you, however, is independent evolution of the two services - A broadcasts an event, B reacts to the event by calling A and asking for a representation of the aggregate that B understands, A -- being backwards compatible -- delivers the requested representation.
Compare this with requiring a new release of B every time the rehydration logic in A changes.
Udi Dahan raised a challenging idea - the notion that each piece of data belongs to a singe technical authority. "Raw business data" should not be replicated between services.
A service is the technical authority for a specific business capability.
Any piece of data or rule must be owned by only one service.
So in Udi's approach, you'd start to investigate why B has any responsibility for data owned by A, and from there determine how to align that responsibility and the data into a single service. (Part of the trick: the physical view of a service can span process boundaries; in other words, a process may be composed from components that belong to more than one service).
Jeppe Cramon series on microservices is nicely sourced, and touches on many of the points above.
You should never externalise your state. Reporting on that state is a function of the read side, as it produces reports and you'll need that data to call the service. The structure of your state is plastic, and you shouldn't have an external service that relies up that structure otherwise you'll have to update both in lockstep which is a bad thing.
There is a blog that puts forward a strong argument that the process manager is the correct place to put this type of feature (calling an external service), because that's the appropriate place for orchestrating events.

building audit trail functionality

Following is a use case in a workflow system
Work order enters into a system. Work order will have a target which goes through different workflow states before completing a work order.
Say Work order for a target Vehicle came into a system - workflow for this work oder involves 2 tasks say
a)wash vehicle
b)inspect vehicle
Say wash vehicle workflow task changes vehicle attribute from "not washed" to "washed". And say "inspect vehicle" workflow task changes vehicle attribute "not inspected" to "inspection done"
If user is pulling work order data user will always see latest vehicle data (in this example assuming both workflow tasks are completed user will see value "washed" and "inspection done". However when user pulls ONLY workflow Task Wash Vehicle data -> user will see "washed" -Though second task was done, workflow Task 1 will only see that that it modified. Getting data for Workflow Task 2 will see both "washed" and "inspection done"
This involves milstoning (audit trail) of data; One approach is as shown below image - where when workflow task modifies data it'll update version number, modified_ts and maintain that version number in it's own data row (via a JOIN table as depicted below). Basically this is nothing but maintaining a reference to a history record for workflow task data so when pulling workflow task data it knows which history record to pull back. please ignore parent_id and other notes, noise in a below picture. it's not relevant for this question.
I am thinking event sourcing will also be another alternative design - however don't want to apply event sourcing(or any other similar solution) as a whole sale solution but only for this particular use case (affecting only 3 or so tables where audit trail matters). I am trying to evaluate if CQRS/Event sourcing is a right fit as a partial solution (again only limited to 3-4 tables which need to preserve history/audit trail data) or ES/CQRS will be an overkill? any other thoughts?
P.S. Though this isn't related to Scala - Scala is a platform we are using hence tagging it to see if there are language specific solutions that can help. tagging Akka for finding out if ES/CQRS via Akka persistence is an option or not. Postgresql is a db - And DB triggers is not a solution I am looking for.

Data store in an actor system

I'm working on an event processing pipeline based on Akka actors. I have 3 actors for each step of the pipeline: FilterWorker, EnrichWorker and ProcessWorker; plus a supervisor actor that makes sure the events are sent from one step of the pipeline to the next.
The enrich step might need to query some external database for extra data or even create new data that I'll want to persist. For example, the enrich step of a web analytics system might want to enrich a click event with the user that made the click and store that user information in a database.
Keeping in mind that example, I see the following options:
1.Use a singleton; e.g. UserStore that keeps in memory all the users gathered so far and saves them to the database once in a while; has all the logic to fetch users that are not yet in memory. Doesn't seem like a good idea to use a singleton in an actor system however (?).
Use a store actor. Use tell to add a new user and ask to fetch it.
Is there a better pattern for this?
Thanks!
In order to not leave this unanswered, I went with my second option and johanandren's suggestion of having an Actor fill the data store role. Works pretty well!

Work Flow Support multiple Scenario

I am building a base workflow will support around 25 Customer
all customers they matches with one basic workflow and each one has little different request lets say one customer wanna send email and another one don't wanna send email
What I am thinking to make
1- make one workflow and in the different requirement I will make switch to check who is
the user then switch each user to his requirements
(Advantages)this way powerful in maintenance and if there is any common requirements
easy to add
(Disadvantages) if The customer number increase and be like 100 and each is different
and we expect to have 100 user using the workflow but with the Different
little requirements
2- make Differnt workflow for each customer which meaning I will have a 100 workflow
in the future and in declaration instantiate the object from the specific workflow
which related to the Current user
(Advantages) each workflow is separate
(Disadvantages) - hard to add simple feature this meaning write the same thing 100
time so this is not Professional
so What I need ??
I wanna know if those only the ways I have to use in this situation or I missing another technique
One way would be to break out your workflow into smaller parts, each which do a specific thing. You could organize a layout like the following, to be able to support multiple variations of the inbound request.
Customer1-Activity.xaml
- Common-Activity1.xaml
- Common-Activity2.xaml
Customer2-Activity.xaml
- Common-Activity1.xaml
- Common-Activity2.xaml
For any new customers you have, you only need to create a root XAML activity, with each having the slight changes required for your incoming request parameters.
Option #2: Pass in a dictionary to your activity
Thought of a better idea, where you could have your workflow have a Dictionary<string, object> type be an input argument. The dictionary can contain the parameter/argument set that was given to your workflow. Your workflow could then query for the parameter set to initialize itself with that info.