CQRS/EventStore - changing two aggregates

I have a command that updates two aggregates. Since aggregate roots are transactional boundaries, I have a command that does a repository.Save() action on the first aggregate and then fires another command (from within the first command) which acts on the second aggregate. Each Save() action starts its own EventStore transaction, commits the changes, and then publishes them.
First is this correct, i.e. letting one command notify another aggregate via another command?
I noticed in Mark Nijhof's code that he uses event handlers, which is nice as you can register several event handlers against the same event. I tried doing this using Jonathan Oliver's EventStore, but the commit's events in IDispatchCommit were referencing the first aggregate's values when processing the second. This caused some weird errors.
So should I find a way of making this work with EventHandlers or is firing off commands within commands okay?
JD
Edit - I have switched my wire-up to use .UsingAsynchronousDispatchScheduler() and am now allowing registered events to fire more than one event handler, which in turn fires a command on the other aggregate, and it seems to work. So, is this the correct way to do it, rather than having commands fire commands?

I think there's a million and one ways to skin this cat. I'm not sure firing a command from an event handler is the way to go; I have two command handlers respond to the same command in this instance.
I do find Documently good as a reference app. Have you looked at that?
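For reference, the event-handler-fires-a-command pattern described in the edit above -- an event handler reacting to the first aggregate's event and issuing a command against the second -- might look roughly like the sketch below. The names (OrderPlaced, ReserveCredit, CommandBus) are hypothetical, and it is not tied to Jonathan Oliver's EventStore API or its dispatcher wiring.

import java.util.UUID;

// Minimal sketch: an event handler that reacts to the first aggregate's event by
// dispatching a command against the second aggregate. All names are illustrative.
public class OrderPlacedHandler {

    // Hypothetical event published when the first aggregate is saved.
    public record OrderPlaced(UUID orderId, UUID customerId) {}

    // Hypothetical command targeting the second aggregate.
    public record ReserveCredit(UUID customerId, UUID orderId) {}

    // Hypothetical command bus abstraction: whatever routes commands to handlers.
    public interface CommandBus {
        void send(Object command);
    }

    private final CommandBus commandBus;

    public OrderPlacedHandler(CommandBus commandBus) {
        this.commandBus = commandBus;
    }

    // Registered with the (asynchronous) dispatcher; each aggregate still gets its
    // own command, its own transaction and its own commit.
    public void handle(OrderPlaced event) {
        commandBus.send(new ReserveCredit(event.customerId(), event.orderId()));
    }
}

The second command handler then loads and saves only the second aggregate, so each aggregate keeps its own commit boundary.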

Early firing in Flink - how to emit early window results to a different DataStream with a trigger

I'm working with code that uses a tumbling window of one day, and would like to send early results to a different DataStream on an hourly basis.
I understand that triggers are a way to go here, but don't really see how it would work.
The current code is as follows:
myStream
.keyBy(..)
.window(TumblingEventTimeWindows.of(Time.days(1)))
.aggregate(new MyAggregateFunction(), new MyProcessWindowFunction())
In my understanding, I should register a trigger and then, in its onEventTime method, get hold of a TriggerContext so I can send data to the labeled output from there. But how do I get the current state of MyAggregateFunction from there? Or would I need to do my own computation inside onEventTime()?
Also, the documentation states that "By specifying a trigger using trigger() you are overwriting the default trigger of a WindowAssigner". Would my one-day window then still fire correctly, or do I need to trigger it somehow differently?
Another way of doing this is creating two different operators - one that windows by 1 hour, and another that windows by 1 day. Would triggers be a preferred approach to that?
Rather than using a custom Trigger, it would be simpler to have two layers of windowing, where the hourly results are further aggregated into daily results. Something like this:
hourlyResults = myStream
.keyBy(...)
.window(TumblingEventTimeWindows.of(Time.hours(1)))
.aggregate(new MyAggregateFunction(), new MyProcessWindowFunction())
dailyResults = hourlyResults
.keyBy(...)
.window(TumblingEventTimeWindows.of(Time.days(1)))
.aggregate(new MyAggregateFunction(), new MyProcessWindowFunction())
hourlyResults.addSink(...)
dailyResults.addSink(...)
Note that the result of a window is not a KeyedStream, so you will need to use keyBy again, unless you can arrange to leverage reinterpretAsKeyedStream (docs).
Normally when I get to more complex behavior like this, I use a KeyedProcessFunction. You can aggregate (and save in state) hourly and daily results, set timers as needed, and use a side output for the hourly results versus the regular output for the daily results.
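To make that concrete, here is a minimal sketch of such a KeyedProcessFunction. It assumes String keys, that timestamps and watermarks are assigned, and that the aggregation is a simple count (substitute your own accumulator type for whatever MyAggregateFunction computes); hourly results go to a side output and daily results to the regular output.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Sketch: per key, emit a daily count on the main output and hourly counts on a
// side output. Hours/days are aligned to the epoch, like Flink's tumbling windows.
public class HourlyDailyCounter<T> extends KeyedProcessFunction<String, T, Long> {

    public static final OutputTag<Long> HOURLY_OUTPUT = new OutputTag<Long>("hourly") {};

    private static final long HOUR = 60 * 60 * 1000L;
    private static final long DAY = 24 * HOUR;

    private transient ValueState<Long> hourlyCount;
    private transient ValueState<Long> dailyCount;

    @Override
    public void open(Configuration parameters) {
        hourlyCount = getRuntimeContext().getState(new ValueStateDescriptor<>("hourly", Types.LONG));
        dailyCount = getRuntimeContext().getState(new ValueStateDescriptor<>("daily", Types.LONG));
    }

    @Override
    public void processElement(T value, Context ctx, Collector<Long> out) throws Exception {
        // Update both running aggregates (here: a simple count).
        hourlyCount.update(hourlyCount.value() == null ? 1L : hourlyCount.value() + 1);
        dailyCount.update(dailyCount.value() == null ? 1L : dailyCount.value() + 1);

        // Timers for the end of the current hour and the end of the current day.
        long ts = ctx.timestamp(); // assumes timestamps have been assigned
        ctx.timerService().registerEventTimeTimer(endOf(ts, HOUR));
        ctx.timerService().registerEventTimeTimer(endOf(ts, DAY));
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) throws Exception {
        // Every timer we register falls on an hour boundary; day boundaries coincide
        // with hour boundaries, so both cases can be handled in one callback.
        Long hourly = hourlyCount.value();
        if (hourly != null) {
            ctx.output(HOURLY_OUTPUT, hourly);
            hourlyCount.clear();
        }
        if ((timestamp + 1) % DAY == 0) {
            Long daily = dailyCount.value();
            out.collect(daily == null ? 0L : daily);
            dailyCount.clear();
        }
    }

    private static long endOf(long timestamp, long size) {
        // Last millisecond of the hour/day that contains the timestamp.
        return timestamp - (timestamp % size) + size - 1;
    }
}

The hourly stream is then obtained from the operator's result with getSideOutput(HourlyDailyCounter.HOURLY_OUTPUT).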
There are quite a few questions here. I will try to answer all of them. First of all, if you specify your own trigger using trigger(), you are effectively overriding the default trigger, and thus the window may not work the way it would by default. So if, for example, you create a 1-day event-time tumbling window but override the trigger so that it fires for every 20th element, it will never fire based on event time.
Now, after your custom trigger fires, the output of MyAggregateFunction is passed to MyProcessWindowFunction, so it will work the same as with the default trigger; you don't need to access MyAggregateFunction from inside the trigger.
Finally, while it may be technically possible to implement a trigger that fires partial results every hour, my personal opinion is that you should probably go with the two separate windows. While that solution may create slightly more overhead and a somewhat larger state, it should be much clearer, easier to implement, and much more resilient to errors.
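For completeness, an early-firing trigger along those lines might look roughly like the sketch below (assuming a one-day TumblingEventTimeWindows assigner). Note that every firing, early or final, goes to the same downstream output, and with an AggregateFunction each hourly firing emits the running daily aggregate so far rather than an hourly delta -- which is part of why the two-window or KeyedProcessFunction approaches are simpler.

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Sketch of an event-time trigger that fires early once per hour and still fires
// at the end of the (one-day) window, mirroring the default EventTimeTrigger.
public class HourlyEarlyFiringTrigger extends Trigger<Object, TimeWindow> {

    private static final long HOUR = 60 * 60 * 1000L;

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window,
                                   TriggerContext ctx) throws Exception {
        // End-of-window timer, as the default event-time trigger registers.
        ctx.registerEventTimeTimer(window.maxTimestamp());
        // Early-firing timer for the end of the hour this element falls into.
        long endOfHour = timestamp - (timestamp % HOUR) + HOUR - 1;
        if (endOfHour < window.maxTimestamp()) {
            ctx.registerEventTimeTimer(endOfHour);
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        // FIRE emits the current window contents without clearing them, so the final
        // end-of-window firing still covers the whole day.
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
    }
}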

Complicated job aggregate

I have a very complicated job process and it's not 100% clear to me where to handle what.
I don't want code; it's just the question of who is responsible for what.
Given is the following:
There is a root directory "C:\server"
Inside are two directories "ftp" and "backup"
Imagine the following process:
An external customer sends a file into the ftp directory.
An importer application gets the file, and now the fun starts.
A job aggregate has to be created for this file.
The command "CreateJob(string file)" is fired.
The file has to be moved from ftp to backup. Inside the CommandHandler, inside the Aggregate, or on the JobCreated event?
StartJob(Guid jobId) gets called. A third folder, "in-progress", has to be created, and the file has to be copied from backup to in-progress. Who does it?
So it's unclear to me where filesystem concerns have to be handled if the Aggregate cannot work correctly without the correct filesystem.
Because my first approach was to do that inside an infrastructure layer/lib which listens to the events from the job layer. But it seems not 100% correct?!
And on top of this, what about replaying?
You can't replay things/files that were moved; you have to somehow simulate that a customer sends the file to the ftp folder...
Thankful for answers
The file has to be moved from ftp to backup. Inside the CommandHandler, inside the Aggregate, or on the JobCreated event?
In situations like this, I move the file to the destination folder in the application service that sends the command to the Aggregate (or that calls a command-like method on the Aggregate, which is the same thing), before the command is sent to the Aggregate. In this way, if there is some problem with the file system (not enough permissions, no space available, etc.) the command is not sent at all. These kinds of problems should not reach our Aggregate. We must protect it from the infrastructure. In fact, we should keep the Aggregate isolated from anything else; it must contain only the pure business logic that is used to decide what events get generated.
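A minimal sketch of that application service, with hypothetical names (Job, JobRepository) and the directory layout from the question:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Sketch: the application service moves the file to backup first, and only sends
// CreateJob to the aggregate if the infrastructure work succeeded.
public class JobApplicationService {

    // Stand-ins for the real aggregate and repository.
    public record Job(UUID id, String file) {}
    public interface JobRepository { void save(Job job); }

    private final JobRepository repository;
    private final Path backupDir = Path.of("C:\\server\\backup");

    public JobApplicationService(JobRepository repository) {
        this.repository = repository;
    }

    public void createJob(Path ftpFile) throws IOException {
        Path backupFile = backupDir.resolve(ftpFile.getFileName());

        // Infrastructure concern: if this fails (permissions, disk full, ...),
        // no command reaches the aggregate at all.
        Files.move(ftpFile, backupFile, StandardCopyOption.REPLACE_EXISTING);

        // From here on it is pure business logic: CreateJob(string file).
        repository.save(new Job(UUID.randomUUID(), backupFile.toString()));
    }
}

If the move throws, nothing has been decided or persisted by the aggregate, so there is nothing to compensate.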
Because my first approach was to do that inside an infrastructure layer/lib which listens to the events from the job layer. But it seems not 100% correct?!
Indeed, this seems like over-engineering to me. You must KISS.
StartJob(Guid jobId) gets called. A third folder, "in-progress", has to be created, and the file has to be copied from backup to in-progress. Who does it?
Whoever is calling StartJob could do the moving, before StartJob gets called. Again, keep the Aggregate pure. In this case it depends on your framework/domain details.
And on top of this, what about replaying? You can't replay things/files that were moved; you have to somehow simulate that a customer sends the file to the ftp folder...
The events are loaded from the event store and replayed in two situations:
Before every command is sent to the Aggregate, the Aggregate repository loads all the events from the event store and applies every one of them to the Aggregate, probably by calling some applyThisEvent(TheEvent) method on the Aggregate. So these methods should have no side effects (they should be pure); otherwise you would change the outside world again and again at every command execution, and you don't want that (see the sketch below).
The read-models (the projections, the query-models) that present data to the user listen to those events and update the database tables that hold the data the users see. The events are sent to those read-models after they are generated and every time the read-models are recreated. When you introduce a new read-model, you must feed it all the events that were previously generated by the aggregates in order to build its correct/complete state. If your read-model's event listeners have side effects, what do you think happens when you replay those long-past events? The outside world is modified again and again, and you don't want that! The read-models only interpret the events; they don't generate other events and they don't change the outside world.
There is a special third case, when events reach another type of model, a Saga. A Saga must receive an event only once! This is the case you were thinking of with "my first approach was to do that inside an infrastructure layer/lib which listens to the events from the job layer". You could do this in your case, but it is not KISS.
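As a small illustration of the first point, here is a sketch (hypothetical names) of an aggregate whose applyThisEvent method is pure; the business decision lives in the command-like method, and the applier only mutates in-memory state, so it can safely be re-executed on every load:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Sketch: the apply method is re-executed on every load/replay, so it may only
// mutate in-memory state and must never touch the file system or other resources.
public class JobAggregate {

    public record JobCreated(UUID jobId, String file) {}

    private final List<Object> uncommittedEvents = new ArrayList<>();
    private UUID jobId;
    private String file;

    // Command-like method: contains the business decision and generates the event.
    public void create(UUID jobId, String file) {
        if (this.jobId != null) {
            throw new IllegalStateException("Job already created");
        }
        raise(new JobCreated(jobId, file));
    }

    // Pure event applier: called both for new events and when replaying history.
    // No side effects here -- no Files.move, no emails, no HTTP calls.
    public void applyThisEvent(JobCreated event) {
        this.jobId = event.jobId();
        this.file = event.file();
    }

    private void raise(JobCreated event) {
        applyThisEvent(event);
        uncommittedEvents.add(event);
    }
}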
I have a very complicated job process and it's not 100% clear to me where to handle what. I don't want code; it's just the question of who is responsible for what.
The usual answer is that the domain model -- aka the "aggregate" -- makes decisions and saves them. Observing those decisions, some event handler induces the side effects.
And on top of this, what about replaying? You can't replay things/files that were moved; you have to somehow simulate that a customer sends the file to the ftp folder...
You replay the events to the aggregate, so that it is restored to the state where it made the last decision. That's a separate concern from replaying the side effects -- which is part of the motivation for handling the side effects elsewhere.
Where possible, of course, you prefer to have the side effects be idempotent, so that a duplicated message doesn't create a problem. But notice that from the point of view of the model, it doesn't actually matter whether the side effect succeeds or not.
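A sketch of what an idempotent side effect could look like here, with hypothetical names and the folder layout from the question: a duplicate or replayed JobStarted event finds the file already copied and does nothing.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: an event handler whose side effect is safe to repeat.
public class JobStartedFileMover {

    public record JobStarted(String fileName) {}

    private final Path backupDir = Path.of("C:\\server\\backup");
    private final Path inProgressDir = Path.of("C:\\server\\in-progress");

    public void handle(JobStarted event) throws IOException {
        Path source = backupDir.resolve(event.fileName());
        Path target = inProgressDir.resolve(event.fileName());

        Files.createDirectories(inProgressDir);

        // Idempotence check: a duplicate or replayed event finds the work already done.
        if (Files.exists(target)) {
            return;
        }
        Files.copy(source, target, StandardCopyOption.COPY_ATTRIBUTES);
    }
}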

Processing a row externally that fired a trigger

I'm working on a PostgreSQL 9.3-database on an Ubuntu 14 server.
I'm trying to write a trigger function (AFTER EACH ROW) that launches an external process which needs to access the row that fired the trigger.
My problem:
Even though I can run queries on the table including the new row inside the trigger, the external process does not see it (while the trigger function is still running).
Is there a way to manage that?
I thought about starting some kind of asynchronous function call to give the trigger some time to terminate first, but that's of course really ugly.
Also, I read about notifiers and listeners, but that would require some refactoring of my existing code and an additional listener, which I was trying to avoid by using a trigger. (I'm also afraid of new problems that may come up along that road.)
Any more thoughts?
Robin

Akka Persistence: where does the execution of a command go when it is not simply a state update

Just for clarification: where does the execution of a command go when the execution is not simply a state update (like in most examples found online)?
For instance, in my case,
The command is FetchLastHistoryChangeSet, which consists of fetching the last history changeset from an external service, based on where we left off last time -- in other words, the time of the newest change in the previously fetched history changeset.
The event would be HistoryChangeSetFetched(changeSet, time). In line with what was said above, the time should be that of the newest change in the newly fetched history changeset (as per the command currently being handled).
Now, in all the examples that I see, it is always: (i) validate the command, then (ii) persist the event, and finally (iii) handle the event.
It is in handling the event that I have seen custom code added, in addition to the updateState logic. The custom code is usually added after the updateState call, but it is most of the time about sending a message back to the sender, or broadcasting the event on the event bus.
As per my example, it is clear that I need to do quite a few operations before I can actually call persist(HistoryChangeSetFetched(changeSet, time)). Indeed, I need the new changeset, and the time of its newest change.
The only way I see to make this possible is to do the fetch while validating the command.
That is:
case FetchLastHistoryChangeSet =>
  // validateCommand fetches the new changeset and the time of its newest change, if any
  validateCommand(FetchLastHistoryChangeSet).foreach { case (changeSet, time) =>
    persist(HistoryChangeSetFetched(changeSet, time))(updateState)
  }
Here validateCommand(FetchLastHistoryChangeSet) would have as its logic: read the last changeset time (the newest change of the changeset), fetch a new changeset based on it and, if one exists, get the time of its newest change and return the tuple.
My question is: is that how it is supposed to work? Can validating the command be something as complex as that, i.e. actually executing the command?
As it says in the documentation: "validation can mean anything, from simple inspection of a command message's fields up to a conversation with several external services"
So I think what you're trying to do is exactly right. Any interaction with an external service must be done at the command validation stage.

Should commands / handlers hold the full aggregate or only its id?

I'm trying to play around with DDD and CQRS.
And I've got these two solutions:
Add the AggregateId to my command / event. It's nice because I can use my command as my web service's parameter, and I can also return instances of my commands to my forms to say "you can do this command, this one and this one".
Add my full Aggregate to my command / event. It's nice because I'm sure that I won't load my aggregate 100 times if there are a lot of events going on; I'll just pass my reference around (for instance, I won't load it both in my command's validator and in my command handler). But I'd have to create a parameter class for each command with only the id.
For now I have the id in the commands and the full model in the events (I trust my unit of work to cache the Load(aggregateId), so I won't execute the same request 100 times for one command).
Is there a right / better way?
Yes, your current approach is correct -- reference the aggregate with an identity value on the command. A command is meant to be serialized and sent across process boundaries. Also, a command is normally constructed by a client that may not have enough information to create an entire aggregate instance, which is another reason to use an identity. And yes, your unit of work should take care of caching an aggregate for the duration of a unit of work, if need be.
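A minimal sketch of that shape, with hypothetical names: the command carries only the identity plus the plain data needed to express the intent, and the handler loads the aggregate through the repository (which the unit of work can cache behind):

import java.util.UUID;

// Sketch: the command is small and serializable; the handler resolves the aggregate.
public class PlaceOrderHandler {

    // Command: just the aggregate id and the data needed to express the intent.
    public record PlaceOrder(UUID orderId, String productCode, int quantity) {}

    // Stand-ins for the aggregate and its repository.
    public interface Order { void place(String productCode, int quantity); }
    public interface OrderRepository {
        Order load(UUID orderId);   // the unit of work can cache this per request
        void save(Order order);
    }

    private final OrderRepository repository;

    public PlaceOrderHandler(OrderRepository repository) {
        this.repository = repository;
    }

    public void handle(PlaceOrder command) {
        Order order = repository.load(command.orderId());
        order.place(command.productCode(), command.quantity());
        repository.save(order);
    }
}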