Snapshot taking and restore strategies - cqrs

I've been reading about CQRS+EventSoucing patterns (which I wish to apply in a near future) and one point common to all decks and presentations I found is to take snapshots of your model state in order to restore it, but none of these share patterns/strategies of doing that.
I wonder if you could share your thoughts and experience in this matter particularly in terms of:
When to snapshot
How to model a snapshot store
Application/cache cold start
TL;DR: How have you implemented Snapshotting in your CQRS+EventSourcing application? Pros and Cons?

Rule #1: Don't.
Rule #2: Don't.
Snapshotting an event sourced model is a performance optimization. The first rule of performance optimization? Don't.
Specifically, snapshotting reduces the amount of time you lose in your repository trying to reload the history of your model from your event store.
If your repository can keep the model in memory, then you aren't going to be reloading it very often. So the win from snapshotting will be small. Therefore: don't.
If you can decompose your model into aggregates, which is to say that you can decompose the history of your model into a number of entities that have non-overlapping histories, then your one model long model history becomes many many short histories that each describe the changes to a single entity. Each entity history that you need to load will be pretty short, so the win from a snapshot will be small. Therefore: don't.
The kind of systems I'm working today require high performance but not 24x7 availability. So in a situation where I shut down my system for maintenace and restart it I'd have to load and reprocess all my event store as my fresh system doesn't know which aggregate ids to process the events. I need a better starting point for my systems to restart be more efficient.
You are worried about missing a write SLA when the repository memory caches are cold, and you have long model histories with lots of events to reload. Bolting on snapshotting might be a lot more reasonable than trying to refactor your model history into smaller streams. OK....
The snapshot store is a read model -- at any point in time, you should be able to blow away the model and rebuild it from the persisted history in the event store.
From the perspective of the repository, the snapshot store is a cache; if no snapshot is available, or if the store itself doesn't respond within the SLA, you want to fall back to reprocessing the entire event history, starting from the initial seed state.
The service provider interface is going to look something like
interface SnapshotClient {
SnapshotRecord getSnapshot(Identifier id)
}
SnapshotRecord is going to provide to the repository the information it needs to consume the snapshot. That's going to include at a minimum
a memento that allows the repository to rehydrate the snapshotted state
a description of the last event processed by the snapshot projector when building the snapshot.
The model will then re-hydrate the snapshotted state from the memento, load the history from the event store, scanning backwards (ie, starting from the most recent event) looking for the event documented in the SnapshotRecord, then apply the subsequent events in order.
The SnapshotRepository itself could be a key-value store (at most one record for any given id), but a relational database with blob support will work fine too
select *
from snapshots s
where id = ?
order by s.total_events desc
limit 1
The snapshot projector and the repository are tightly coupled -- they need to agree on what the state of the entity should be for all possible histories, they need to agree how to de/re-hydrate the memento, and they need to agree which id will be used to locate the snapshot.
The tight coupling also means that you don't need to worry particularly about the
schema for the memento; a byte array will be fine.
They don't, however, need to agree with previous incarnations of themselves. Snapshot Projector 2.0 discards/ignores any snapshots left behind by Snapshot Projector 1.0 -- the snapshot store is just a cache after all.
i'm designing an application that will probably generate millions event a day. what can we do if we need to rebuild a view 6 month later
One of the more compelling answers here is to model time explicitly. Do you have one entity that lives for six months, or do you have 180+ entities that each live for one day? Accounting is a good domain to reference here: at the end of the fiscal year, the books are closed, and the next year's books are opened with the carryover.
Yves Reynhout frequently talks about modeling time and scheduling; Evolving a Model may be a good starting point.

There are few instances you need to snapshot for sure. But there are a couple - a common example is an account in a ledger. You'll have thousands maybe millions of credit/debit events producing the final BALANCE state of the account - it would be insane not to snapshot that every so often.
My approach to snapshoting when I designed Aggregates.NET was its off by default and to enable your aggregates or entities must inherit from AggregateWithMemento or EntityWithMemento which in turn your entity must define a RestoreSnapshot, a TakeSnapshot and a ShouldTakeSnapshot
The decision whether to take a snapshot or not is left up to the entity itself. A common pattern is
Boolean ShouldTakeSnapshot() {
return this.Version % 50 == 0;
}
Which of course would take a snapshot every 50 events.
When reading the entity stream the first thing we do is check for a snapshot then read the rest of the entity's stream from the moment the snapshot was taken. IE: Don't ask for the entire stream just the part we have not snapshoted.
As for the store - you can use literally anything. VOU is right though a key-value store is best because you only need to 1. check if one exists 2. load the entire thing - which is ideal for kv
For system restarts - I'm not really following what your described problem is. There's no reason for your domain server to be stateful in the sense that its doing something different at different points in time. It should do just 1 thing - process the next command. In the process of handling a command it loads data from the event store, including a snapshot, runs the command against the entity which either produces a business exception or domain events which are recorded to the store.
I think you may be trying to optimize too much with this talk of caching and cold starts.

Related

Snapshotting as Domain Event in Event Sourcing

I have some pretty constant aggregates in my event sourcing model that will accumulate a large amount of events. I am thinking about using snapshots to optimize the re-hydration of these aggregates. I.E. the aggregates are warehouses.
My question is whether or not I should produce a specific event for snapshotting, so something like "WarehouseStateSnapshotted". In my current prototype, a snapshot state is saved in duplicate code existing in a few command handlers. I feel this is not the right area to be handling it. I would rather dispatch an event for the snapshot to my service bus, and have the event handler handle saving the snapshot state. This may, however, violate the domain driven pattern of events them self. Have other's created events for snapshots?
If this is not the right approach, should I at least move my snapshotting logic out of the command handlers and into the aggregate class?
Thanks!
EDIT: Title and -- This comment seems to suggest snapshots as domain events is the wrong approach.
EDIT2: Simplified Question - Is it appropriate to have repos injected into command handlers?
Let me attack the easy one first. The snapshotting logic does not belong in the aggregate. Whether and when you shapshot is purely a performance concern and so does not belong with business rules. It helps to draw the line by imagining a server with infinite resources. If you don’t need to do “the thing” on this amazing machine, “the thing” does not belong in the aggregate.
In the link you posted above I agree with RBanks54 that the snapshot does not belong in the aggregate event stream, for all the reasons he lists. I think your solution to dispatch an event on the service bus, then handle that event in a different command, is the correct approach. Handling snapshotting in the context of handling a new event means you cannot snapshot unless a new event is received. Having a distinct message on the service bus means any process can request a snapshot when appropriate.
My question is whether or not I should produce a specific event for snapshotting, so something like "WarehouseStateSnapshotted".
"It depends".
The reference you should review for snapshoting is CQRS Documents, by Greg Young. It's relatively old 2010, but serves as a simple introduction to snapshotting as a concept.
There's nothing wrong with generating snapshots asynchronously and storing them outside of the event stream.
You can use any sensible trigger for the snapshotting process; you don't necessarily need an event in the stream. "Snapshot every 100 events" or "Snapshot every 10 minutes" or "Snapshot when the admin clicks the snapshot button" are all viable.
Some domains have a natural cadence to them, where the domain itself might suggest a snapshot -- think "closing the books on the fiscal year".
I'm somewhat skeptical about putting a domain agnostic "make a snapshot" message into the event stream - I don't think it's appropriate to have the aggregate be responsible for snapshot cadence. It's not broken, but it does feel a bit like overloading the semantics of the event stream with a different concern.
I have been dabbling a bit with event-sourcing but I'm no expert. I do not particularly like the idea of a separate "stream" representing a snapshot. It isn't much of a stream since it only stores the last snapshot. In my Shuttle.Recall project, which is still in its infancy, I store snapshots as normal domain events but they are specifically marked as snapshots and the last snapshot version is stored separately in order to load it and then the events after that version are applied. I find some advantages to this in that you can add some functionality around snapshots also.
When you are using snapshots as a purely technical performance improvement it may not add much value to your domain. If a snapshot does not belong in the aggregate/domain then how would one go about hydrating the aggregate from the snapshot?
In some instances a snapshot may be very much part of the domain. When you look at your monthly bank statement you will not find each and every transaction (event) from the day that you opened up your account. Instead we have an opening balance (snapshot) with the new transactions (events) for that month. In this way the "MonthEndProcessed" event may very well be a snapshot.
I also don't really buy the argument that should a snapshot contain an error you cannot fix it since an event stream is immutable. What happens if your event contains an error? Can you not fix it? These errors should ideally not make it into a production system but if they do then they should be fixed. The immutability, to me anyway, relates to the typical interaction with the system. We do not typically make changes to an event once it has taken place.
In some instances it may even be beneficial to go back and change some events to a newer version. These should be kept to a minimum and ideally avoided but perhaps it may be pragmatic in some instances.
But like I said... I'm still learning :)

How to manage a user's game state using akka

I am trying to figure out how to manage a users game state using akka.
The game state will be persisted to mysql and this cannot change because we have other services that require this.
Anything that happens in a game is considered an "event".
Then you I have "Levels" which someone can achieve. A level is achieved when you complete all the "events" associated with it.
So you have:
Level
- event1 e.g. reach a point in the game
- event2 e.g. pickup a sword
- event3 e.g. defeat a monster
So in a game there are many levels, and 100's of events that are linked to levels.
So all "events" are sent via HTTP to my backend, and I save the event in the database.
I then have to load the users game profile in memory, and then re-calculate the Level's achieved since there was a new event that happened.
Note: This calculation cannot be done at the database level because it is a little more complicated that I am writing here.
The problem I see is that if I use akka, I can't have multiple actors processing the events for the same user, because the data can become stale.
Just to be clear, so when a new event arrives, I have to load the game profile in memory, loop through the levels and see if any of them have been achieved, if they have, update the database
e.g. update levels set achieved=true where level_id = 123 and user_id=234
e.g. actor1 loads the profile (all the levels and events for this user) and then processes the new event that just arrived in the inbox.
at the same time, actor2 loads the profile (same as actor1), and then processes the new event. When it persists the changes to mysql, the data will be out of sych.
If I was using threads, I would have to lock during the game profile calculation and persisting to the db.
How can I do this using Akka and be able to handle things in parallel, or is this scenerio not allow for it?
Let's think how you would manage it without actors. So, in nutshell, you have the following problem scenario:
two (or more) update requests arrive at the same time, both are
going to modify the same data
both requests read some stable data
state, then update it each in its own manner and persist to the DB
the modifications from the request which checked in first are lost, more precisely - overridden by the later request.
This is a classical problem. There are at least two classical solutions of it:
Optimistic locking
Pessimistic locking: it's usually achieved by applying Serializable isolation level for transactions.
It worth reading this answer with a nice comparison of both worlds.
As you're using Akka, you most probably want to prefer better concurrency and occasional failures, which are easy to recover. It goes on par with Akka motto let it crash.
So, you need to make the next steps:
Add version column to your table(s). It can be numeric or string (with hash). Numeric is the simplest one.
When you insert new record - initialize versions.
When you update the record - check version value has not changed. So, here's your update strategy:
Read record and its version.
Update record in memory.
Execute update query with criteria where rec_id=$id and version=$version.
If updated records count is 1 - you're good. If 0 - throw OptimisticLockException or smth like this.
Finally, it's time for Akka to do its job: come up with appropriate supervision strategy (I'd pick something like try again in 1 second). In actor's preRestart method return the update message back to the actor's mailbox (see Restart Hooks chapter in Akka docs).
With this strategy, even if two requests try to update the same record at a time, one of them will fail and will be immediately processed again.

CQRS, Event-Sourcing and Web-Applications

As I am reading some CQRS resources, there is a recurrent point I do not catch. For instance, let's say a client emits a command. This command is integrated by the domain, so it can refresh its domain model (DM). On the other hand, the command is persisted in an Event-Store. That is the most common scenario.
1) When we say the DM is refreshed, I suppose data is persisted in the underlying database (if any). Am I right ? Otherwise, we would deal with a memory-transient model, which I suppose, would not be a good thing ? (state is not supposed to remain in memory on server side outside a client request).
2) If data is persisted, I suppose the read-model that relies on it is automatically updated, as each client that requests it generates a new "state/context" in the application (in case of a Web-Application or a RESTful architecture) ?
3) If the command is persisted, does that mean we deal with Event-Sourcing (by construct when we use CQRS) ? Does Event-Sourcing invalidate the database update process ? (as if state is reconstructed from the Event-Store, maintaining the database seems useless) ?
Does CQRS only apply to multi-databases systems (when data is propagated on separate databases), and, if it deals with memory-transient models, does that fit well with Web-Applications or RESTful services ?
1) As already said, the only things that are really stored are the events.
The only things that commands do are consistency checks prior to the raise of events. In pseudo-code:
public void BorrowBook(BorrowableBook dto){
if (dto is valid)
RaiseEvent(new BookBorrowedEvent(dto))
else
throw exception
}
public void Apply(BookBorrowedEvent evt) {
this.aProperty = evt.aProperty;
...
}
Current state is retrieved by sequential Apply. Since this, you have to point a great attention in the design phase cause there are common pitfalls to avoid (maybe you already read it, but let me suggest this article by Martin Fowler).
So far so good, but this is just Event Sourcing. CQRS come into play if you decide to use a different database to persist the state of an aggregate.
In my project we have a projection that every x minutes apply the new events (from event store) on the aggregate and save the results on a separate instance of MongoDB (presentation layer will access to this DB for reading). This model is clearly eventually consistent, but in this way you really separate Command (write) from Query (read).
2) If you have decided to divide the write model from the read model there are various options that you can use to make them synchronized:
Every x seconds apply events from the last checkpoint (some solutions offer snapshot to avoid reapplying of heavy commands)
A projection that subscribe events and update the read model as soon event is raised
3) The only thing stored are the events. Infact we have an event-store, not a command store :)
Is database is useless? Depends! How many events do you need to reapply for take the aggregate to the current state?
Three? Maybe you don't need to have a database for read-model
The thing to grok is that the ONLY thing stored is the events*. The domain model is rebuilt from the events.
So yes, the domain model is memory transient as you say in that no representation of the domain model is stored* only the events which happend to the domain to put the model in the current state.
When an element from the domain model is loaded what happens is a new instance of the element is created and then the events that affect that instance are replayed one after the other in the right order to put the element into the correct state.
you could keep instances of your domain objects around and subscribing to new events so that they can be kept up to date without loading them from all the events every time, but usually its quick enough just to load all the events from the database and apply them every time in the same way that you might load the instance from the database on every call to your web service.
*Unless you have snapshots of you domain object to reduce the number of events you need to load/process
Persistence of data is not strictly needed. It might be sufficient to have enough copies in enough different locations (GigaSpaces). So no, a database is not required. This is (at least was a few years ago) used in production by the Dutch eBay equivalent.

Using aggregates and Domain events with nosql storage

I'm wandering on DDD and NoSql field actually. I have a doubt now: i need to produce events from the aggregate and i would like to use a NoSql storage. But how can i be sure that events are saved on the storage AND the changes on the aggregate root not having transactions?
Does it makes sense? Is there a way to do this without being forced to use event sourcing or a transactional db?
Actually i was lookin at implementing a 2 phase commit algorithm but it seems pretty heavy from a performance point of view...
Am i approaching the problem the wrong way?
Stuffed with questions...
Thanks for every suggestion
Enrico
PS
I'm a newbie on stackoverflow so any suggestion/critic/... is more than welcome
Enrico
Edit 1
Well i would need events to notify aggregates that something happened and i they should react to the change. The problem arise when such events are important for the business logic. As far as i understood, after a night of thinking, i can't use a nosql storage to do such things. Let me explain (thinking with loud voice :P):
With ES (1st scenery): I save the "diff" of the data. Then i produce an event associated with it. 2 operations.
With ES (2nd scenery): I save the "diff" of the data. A process, watch the ES and produce the event. But i'm tied to having only one watcher process to ensure the correct ordering of events.
With ES (3d scenery): Idempotent events. The events can be inferred by the state and every reapplication of the event can cause a change on the consumer only once, can have multiple "dequeue" processes, duplicates can't possibly happen. 1 operation, but it introduce heavy limitations on the consumers.
In general: I save the aggregate's data. Then i produce an event associated with it. 2 operations.
Now the question becomes wider imho, is it possible to work with domain events and nosql when such domain events are fundamental part of the business process?
I think that could be a better option to go relational... even if i would need to add quite a lot of machines to get the same performances.
Edit 2
For the sake of completness, searching for "domain events nosql idempotent" on google: http://svendvanderveken.wordpress.com/2011/08/26/transactional-event-based-nosql-storage/
If you need Event Sourcing, you should store events only.
This should be the sequence:
the aggregate root recieves a command
it fires proper events
events are stored
Each aggregate's re-hydratation should be done only by executing events over them. You can create aggregates' snapshots if you measure performance problems on their initialization, but this doesn't require two-phase commits, since you can build snapshots asynchronously via batch.
Note however that you need CQRS and/or Event Sourcing only if your application is heavily concurrent and you need to cope with partition tolerance and compensating actions.
edit
Event Sourcing is alternative to the persistence of object state. You either store the events or the state of the object model. You can save snapshot, but they're just performance tools: your application must be able to work without them. You can consider such snapshots as a caching technique. As an alternative you can persist object state (the classical model), but in that case you don't need to store events.
In my own DDD application, I use observable entities to decouple (via direct events' subscription from the repository) aggregates and their persistence. For example your repository can subscribe each domain events, and execute the actions required by the application (persist to the store, dispatch to a queue and so on...). But as a persistence technique, Event Sourcing is alternative to classical persistence of the observable object state. In most scenarios you don't need both.
edit 2
A final note: if you choose ES, one of the events subscriber can build a relational read-model too.

Recreate a graph that change in time

I have an entity in my domain that represent a city electrical network. Actually my model is an entity with a List that contains breakers, transformers, lines.
The network change every time a breaker is opened/closed, user can change connections etc...
In all examples of CQRS the EventStore is queried with Version and aggregateId.
Do you think I have to implement events only for the "network" aggregate or also for every "Connectable" item?
In this case when I have to replay all events to get the "actual" status (based on a date) I can have near 10000-20000 events to process.
An Event modify one property or I need an Event that modify an object (containing all properties of the object)?
Theres always an exception to the rule but I think you need to have an event for every command handled in your domain. You can get around the problem of processing so many events by making use of Snapshots.
http://thinkbeforecoding.com/post/2010/02/25/Event-Sourcing-and-CQRS-Snapshots
I assume you mean currently your "connectable items" are part of the "network" aggregate and you are asking if they should be their own aggregate? That really depends on the nature of your system and problem and is more of a DDD issue than simple a CQRS one. However if the nature of your changes is typically to operate on the items independently of one another then then should probably be aggregate roots themselves. Regardless in order to answer that question we would need to know much more about the system you are modeling.
As for the challenge of replaying thousands of events, you certainly do not have to replay all your events for each command. Sure snapshotting is an option, but even better is caching the aggregate root objects in memory after they are first loaded to ensure that you do not have to source from events with each command (unless the system crashes, in which case you can rely on snapshots for quicker recovery though you may not need them with caching since you only pay the penalty of loading once).
Now if you are distributing this system across multiple hosts or threads there are some other issues to consider but I think that discussion is best left for another question or the forums.
Finally you asked (I think) can an event modify more than one property of the state of an object? Yes if that is what makes sense based on what that event represents. The idea of an event is simply that it represents a state change in the aggregate, however these events should also represent concepts that make sense to the business.
I hope that helps.