I'm building a simple REST API for generating some objects that must be created and sent periodically out of the API. The nature of the objects doesn't matter, neither the framework supporting the REST interface (Spray, Play Framework, whatever else). My question is, what would be a good scalable actor design for this system using Akka? Suppose the service crashes or it's migrated or whatever that causes to stop it. In order to recover the description of the tasks about what objects must be sent and when, is akka-persistence a good way to go here? or it's better to persist such things in a traditional DB?
Thanks.
NOTE: also I would like to know, supposing there's some actor which is not stateful himself, but creates many children actors, if it's a good practice to use akka-persistence in order to replay the messages which causes this actor to create his children again (the children being also non-stateful).
In a traditional DB you would most likely end up modeling this with timestamps and events, and with event sourcing this is already the native model.
Akka-persistence would be a natural fit for this scenario since it will persist every event about what objects must be created and sent periodically out. The snapshot support will also help with speed of recovery when the number of events gets very large.
In the case of crashes or migration, the recovery process will handle this just fine.
Regarding your note, if the actor is truly stateless then there is no need to persist the events that cause the children to be created since they can be recreated on demand. If the existence of the children does need to be recovered, then the actor is not stateless. In that case then it may indeed make sense to persist those events.
Related
In my WebApi I want to subscribe to the ActorEvents for all of my ActorIds. Subscribing to a single ActorId is easy, and works. However, I'm wondering if there is a way to subscribe everything all and future ones at once, that I might've missed in the documentation.
Currently, I'm iterating through all my ActorIds on WebApi startup and subscribing. I then subscribe to new Ids, and unsubscribe from deleted ones. However, this is somewhat cumbersome, and since it is during startup, will throw errors and stop the api process if any of the actors aren't up and running (happens on deployment).
Also for consideration, is that the same ActorId may be used in multiple IActors. For example UserActor(Bob), Wishlist(Bob), etc.
Any suggestions on how best to subscribe to anything?
Thanks for your time and suggestions!
You can't subscribe to future Actor events, because they don't exist yet. As a workaround, you can make the Actors register themselves to a central hub, as they are created. And use that instead of ActorEvents. One way to do this is by using something like pub/sub.
An ActorId simply identifies an Actor and it is used to determine the ActorService partition it is stored on. It does not hold Actor type info, sou it's safe to re-use it across different Actor types.
According to CQRS à la Greg Young, event handlers (and the downstream event denormalizers) react on incoming events that were published before by the event publisher.
Now lets suppose that at runtime we want to add a new event denormalizer: Basically, this is easy, but it needs to get to its data to the current state.
What is the best way to do this?
Should I send an out-of-order request to the event store and ask for all previously emitted events?
Or is there a better way to do this?
You can fetch and replay all (required) events against the new handler. This can be done in a separate process since what you essentially want is to get the persisted view models into the proper state.
Have a look at Rinat Abdullin's Lokad.CQRS sample project for a production example. Especially the SaaS.Engine.StartupProjectionRebuilder might be an interesting source even though it's rather complex.
One can also build the projections so that they remember what event they saw last. Then on any startup, they ask for this event and all forward. Re-starting an old projection and building a new one then become roughly the same thing.
If you embrace bounded context complex integration you may need to drop the entire read model and rebuild it.
I am attempting to learn and apply the CQRS design approach (pattern and architecture) to a new project but seem to be missing a key piece.
My client application executes a query and retrieves a list of light-weight, read-only DTOs from the read model. The user selects an item and clicks a button to initiate some action. The action is performed by creating and sending the corresponding command object to the write model (where the command handler carries out the action, updates the data store, etc.) At some point, however, I need to update the UI to reflect changes to the state of the application resulting from the action.
How does the UI know when it is time to refresh the original list?
Additional Info
I have noticed that most articles/blogs discussing CQRS use MVC client apps in their examples. I am working on a Silverlight client right now and am beginning to wonder if the pattern simply doesn't work in that case.
Follow-Up Question
After thinking more about Bartlomiej's response and subsequent discussion, I am wondering about error handling in CQRS. Given that commands are basically fire-and-forget asynchronous operations, how do we report an error condition to the UI?
I see 'refreshing the UI' to take one of two forms:
The operation succeeds, data has changed and the UI should be updated to reflect these changes
The operation fails, data has not changed but the user should be notified of the failure and potential corrective actions.
Even with a Post-Redirect-Get pattern in an MVC, you can't really Redirect until you know the outcome of the operation. None of the examples I've seen thus far address these real-world concerns.
I've been struggling with similar issues for a WPF client. The re-query trigger for any data is dependent on the data your updating, commands tend to fall into categories:
The command is a true fire and forget method, it informs the back-end of a state change but this change does not need to be reflected in the UI, or the change simply isn't important to the UI.
The command will alter the result of a single query
The command will alter the result of multiple queries, usually (in my domain at least) in a cascading fashion, that is, changing the state of a single "high level" piece of data will likely affect many "low level" caches.
My first trigger is the page load, very few items are exempt from this as most pages must assume data has been updated since it was last visited. Though some systems may be able to escape with only updating financial and other critical data in this way.
For short commands I also update data when 'success' is returned from a command. Though this is mostly laziness as IMHO all CQRS commands should be fired asynchronously. It's still an option I couldn't live without but one you may have to if your implementation expects high latency between command and query.
One pattern I'm starting to make use of is the mediator (most MVVM frameworks come with one). When I fire a command, I also fire a message to the mediator specifying which command was launched. Each Cache (A view model property Retriever<T>) listens for commands which affect it and then updates appropriately. I try to minimise the number of messages while still minimising the number of caches that update unnecessary from a single message so I'll (hopefully) eventually end up with a shortlist of update reasons, with each 'reason' updating a list of caches.
Another approach is simple honesty, I find that by exposing graphically how the system updates itself makes users more willing to be patient with it. On firing a command show some UI indicating you're waiting for the successful response, on error you could offer to retry / show the error, on success you start the update of the relevant fields. Baring in mind that this command could have been fired from another terminal (of which you have no knowledge) so data will need to timeout eventually to avoid missing state changes invoked by other machines also.
Noting the irony that the only efficient method of updating cache's and values on a client is to un-separate the commands and queries again, be it through hardcoding or something like a hashmap.
My two cents.
I think MVVM actually fits into CQRS quite well. The ViewModel simply becomes an observable ReadModel.
1 - You initialize your ViewModel state via a query on the ReadModel.
2 - Changes on your ViewModel are automatically reflected on any Views that are bound to it.
3 - Certain changes on your ViewModel trigger a command to propegate to a message queue, an object responsible for sending those commands to the server takes those messages off the queue and sends them to the WriteModel.
4 - Clients should be well formed, meaning the ViewModel should have performed appropriate validation before it ever triggered the command. Once the command has been triggered, any event notifications can be published onto an event bus for the client to communicate changes to other ViewModels or components in the system interested in those changes. These events should carry the relevant information necessary. Typically, this means that other view models usually don't have to re-query the read model as a result of the change unless they are dependent on other data that needs to be retrieved.
5 - There is an object that connects to the message bus on the server for real-time push notifications when other clients make changes that this client is interested in knowing about, falling back to long-polling if necessary. It propagates those to the internal message bus that ties the components on the client together.
6 - The last part to handle is the fact that clients can be occasionally connected, which should be the only reason a command fails (they don't have internet access at the moment), which is when the client should be notified of problems.
In my ASP.NET MVC 3 I use 2 techniques depending on use case:
already well-known Post-Redirect-Get pattern which fits nicely with CQRS. Your MVC action that triggers the command returns a redirection to action that performs a query.
in some cases, like real-time updates of other clients, I rely on domain events/messages. I create an event handler that uses singlarR to push changes to all connected and interested clients.
There are two major ways you can take as far as I know :
1) design your UI , so that the user does not see its changes right away. Like for instance a message to tell him his action is a success, and offering him different choices to continue his work. this should buy you enough time to have updated your readmodel.
2) more complex, but you might keep the information you have send to the server and shows them in the interface.
The most important I guess, educate your user if you can so that they know why the data is not here... yet!
I am thinking about it only now, but these are for sync command handling, not async, in async things go really harder on the brain...the client interface becomes an event eater too..
Event sourcing is pitched as a bonus for a number of things, e.g. event history / audit trail, complete and consistent view regeneration, etc. Sounds great. I am a fan. But those are read-side implementation details, and you could accomplish the same by moving the event store completely to the read side as another subscriber.. so why not?
Here's some thoughts:
The views/denormalizers themselves don't care about an event store. They just handle events from the domain.
Moving the event store to the read side still gives you event history / audit
You can still regenerate your views from the event store. Except now it need not be a write model leak. Give him read model citizenship!
Here seems to be one technical argument for keeping it on the write side. This from Greg Young at http://codebetter.com/gregyoung/2010/02/20/why-use-event-sourcing/:
There are however some issues that exist with using something that is storing a snapshot of current state. The largest issue revolves around the fact that you have introduced two models to your data. You have an event model and a model representing current state.
The thing I find interesting about this is the term "snapshot", which more recently has become a distinguished term in event sourcing as well. Introducing an event store on the write side adds some overhead to loading aggregates. You can debate just how much overhead, but it's apparently a perceived or anticipated problem, since there is now the concept of loading aggregates from a snapshot and all events since the snapshot. So now we have... two models again. And not only that, but the snapshotting suggestions I've seen are intended to be implemented as an infrastructure leak, with a background process going over your entire data store to keep things performant.
And after a snapshot is taken, events before the snapshot become 100% useless from the write perspective, except... to rebuild the read side! That seems wrong.
Another performance related topic: file storage. Sometimes we need to attach large binary files to entities. Conceptually, sometimes these are associated with entities, but sometimes they ARE the entities. Putting these in the event store means you have to physically load that data each and every time you load the entity. That's bad enough, but imagine several or hundreds of these in a large aggregate. Every answer I have seen to this is to basically bite the bullet and pass a uri to the file. That is a cop-out, and undermines the distributed system.
Then there's maintenance. Rebuilding views requires a process involving the event store. So now every view maintenance task you ever write further binds your write model into using the event store.. forever.
Isn't the whole point of CQRS that the use cases around the read model and write model are fundamentally incompatible? So why should we put read model stuff on the write side, sacrificing flexibility and performance, and coupling them back up again. Why spend the time?
So all in all, I am confused. In all respects from where I sit, the event store makes more sense as a read model detail. You still achieve the many benefits of keeping an event store, but you don't over-abstract write side persistence, possibly reducing flexibility and performance. And you don't couple your read/write side back up by leaky abstractions and maintenance tasks.
So could someone please explain to me one or more compelling reasons to keep it on the write side? Or alternatively, why it should NOT go on the read side as a maintenance/reporting concern? Again, I'm not questioning the usefulness of the store. Just where it should go :)
This is a long dead question that someone pointed me to. There are quite a few reasons why its better to store events on the write side.
From my understanding the architecture you are talking about is a very common one that I see ... fail. We will store our domain model in a relational database then put out events. You add the twist of them saving the events on the read side in an event store. This will likely lead to a mess.
The first issue you will run into is in the publishing of your events. What happens when I save to the database and publish to say MSMQ (I die in the middle). So DTC gets introduced between them. This is a huge thing to bring in, distributed transactions should be avoided like the plague. It is also quite inefficient as I am probably making the data durable twice (once to queue once to database). This will limit system throughput by a lot (DTC benchmarks of 200-300 messages/second are common, with events only 20-30k/second is common).
Some work around the need for DTC by putting a table in their database that has the events and operates as a queue. This will avoid the need for DTC however this will still run into the next issue.
What happens when you have a bug? I know you would never write buggy code but one of the Jrs/maintenance developers later working with the project. As an example what happens when the domain object change and the event raised do not match? Say you set State on your domain object to "LA" (hardcoded) but you properly set State on the event to cmd.State ("CT").
How will you detect such errors are occurring? The biggest problem with what is being discussed is that there are now two sources of "truth" there is the database on the write side and the event stream coming out. There is no way to prove that they are equivalent. This will cause all sorts of weird bugs down the line.
I think this is really an excellent question. Treating your aggregate as a sequence of events is useful in its own right on the write side, making command retries and the like easier. But I agree that it seems upsetting to work to create your events, then have to make yet another model of your object for persistence if you need this snapshotting performance improvement.
A system where your aggregates only stored snapshots, but sent events to the read-model for projection into read models would I think be called "CQRS", just not "Event Sourcing". If you kept the events around for re-projection, I guess you'd have a system that was very much both.
But then wouldn't you have three definitions? One for persisting your aggregates, one for communicating state changes, and any number more for answering queries?
In such a system it would be tempting to start answering queries by loading your aggregates and asking them questions directly. While this isn't forbidden by any means, it does tend to start causing those aggregates to accrete functionality they might not otherwise need, not to mention complicating threading and transactions.
One reason for having the event store on the write-side might be for resolving concurrency issues before events become "facts" and get distributed/dispatched, e.g. through optimistic locking on committing to event streams. That way, on the write side you can make sure that concurrent "commits" to the same event stream (aggregate) are resolved, one of them gets through, the other one has to resolve the conflicts in a smart way through comparing events or propagating the conflict to the client, thus rejecting the command.
In the reply to few questions, Jonathan Oliver mentions using an AsynchronousCommitDispatcher to handle multiple unit of works.
I am still in the design stage of my project (and still learning CRQS and ES) and have a few questions:
Would I create an AsynchronousCommitDispatcher for each aggregate root that will be affected by a domain event being raised?
What happens if I have some sort of locking mechanism where the dispatched event cannot make a change to an aggregate root if it is locked by another user? Does AsynchronousCommitDispatcher retry if there is a lock?
What if the system goes down before an domain event is handled? Unless I persist the fact that it has not been handled, wont it be lost?
My initial understanding was that the types of Dispatchers were for messaging across the wire or for updating the read model. Here we are using it to update another aggregate root. I this correct?
TIA
JD
The commit dispatchers are all about pushing events onto the wire after everything has been completely successfully. No, you don't need more than one dispatcher for a given endpoint. The AsyncCommitScheduler (which uses a dispatcher) is multi-threaded and can dispatch more than one event at a time.
A dispatcher is not about handling an incoming message--that's what your message handlers are for. The dispatcher just sends once everything is complete.
Yes dispatchers can help update read models, but not in the way you think. Instead, the dispatchers just push the messages into your messaging framework (MSMQ, RabbitMQ, or, at a higher level, NServiceBus/MassTransit). Then once a message is received at your view models, you update your view model tables accordingly.