How can event handlers be re-run in production? - cqrs

In production environments the number of events scales massively. In an emergency, how can you re-run all the handlers when there are so many events that it can take days?

It depends on which sort of emergency you are describing.
If the nature of your emergency is that your event handlers have fallen massively behind the writers (e.g. your message consumers blocked, and you now have 48 hours of backlog waiting for you), there is not much you can do. If your consumer is parallelizable, you may be able to speed things up by using a data structure like LMAX Disruptor to support parallel recovery.
(Analogously: you decide to introduce a new read model, which requires processing a huge backlog of data to reach the correct state. There isn't any "answer" except chewing through them all. In some cases you may be able to create an approximation based on some manageable number of events while waiting for the real answer to complete, but there's no shortcut to processing all events.)
On the other hand, in cases where the history is large but the backlog is manageable (i.e. the write model wasn't producing new events), you can usually avoid needing a full replay.
In the write model: most event sourced solutions leverage an event store that supports multiple event streams, where each aggregate in the write model has a dedicated stream. Massive event numbers usually mean massive numbers of manageable streams. Where that's true, you can just leave the write model alone and load the entire history on demand.
In cases where that assumption doesn't hold (a part of the write model that has an extremely large stream, or pieces of the read model that compose events of multiple streams), the usual answer is snapshotting.
Which is to say, in the healthy system, the handlers persist their state on some schedule, and include in the metadata an identifier that tracks where in the history that snapshot was taken.
To recover, you reload the snapshot, and the identifier. You then start the replay from that point (this assumes you've got an event store that allows you to start the replay from an arbitrary point in the history).
So managing recovery time is simply a matter of tuning the snapshotting interval so that you are never more than your recovery SLA behind "latest". The creation of the snapshots can happen in a completely separate process. (In truth, your persistent snapshot store looks a lot like a persisted read model.)
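The snapshot-then-replay recovery described above can be sketched roughly as follows. This is a minimal in-memory illustration, not any particular event store's API: the `Snapshot` class, the `apply` handler (which just counts events per type) and the list-based event log are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    state: dict    # handler state at the time the snapshot was taken
    position: int  # index of the last event already folded into `state`


def apply(state, event):
    """Hypothetical handler: counts how many events of each type were seen."""
    new_state = dict(state)
    new_state[event["type"]] = new_state.get(event["type"], 0) + 1
    return new_state


def full_replay(events):
    """Rebuild state from the very first event (the slow path)."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


def take_snapshot(events, upto):
    """Persist state together with the position it corresponds to."""
    return Snapshot(full_replay(events[: upto + 1]), upto)


def recover(snapshot, events):
    """Reload the snapshot, then replay only the events recorded after it."""
    state = dict(snapshot.state)
    for event in events[snapshot.position + 1:]:
        state = apply(state, event)
    return state
```

Recovery cost is then bounded by the number of events after `snapshot.position`, which is what tuning the snapshot interval controls.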


How to replay Event Sourcing events reliably?

One of the great promises of Event Sourcing is the ability to replay events. When there's no relationship between entities (e.g. blob storage, user profiles) it works great, but how can you replay quickly when there are important relationships to check?
For example: Product(id, name, quantity) and Order(id, list of productIds). If we have CreateProduct and then CreateOrder events, then it will succeed (product is available in warehouse), it's easy to implement e.g. with Kafka (one topic with n1 partitions for products, another with n2 partitions for orders).
During replay everything happens more quickly, and Kafka may reorder the events (e.g. CreateOrder and then CreateProduct), which will give us different behavior than originally (CreateOrder will now fail because product doesn't exist yet). It's because Kafka guarantees ordering only within one topic within one partition. The easy solution would be putting everything into one huge topic with one partition, but this would be completely unscalable, as single-threaded replay of bigger databases could take days at least.
Is there any existing, better solution for quick replaying of related entities? Or should we forget about event sourcing and replaying of events when we need to check relationships in our databases, and replaying is good only for unrelated data?
As a practical necessity when event sourcing, you need the ability to conjure up a stream of events for a particular entity so that you can apply your event handler to build up the state. For Kafka, outside of the case where you have so few entities that you can assign an entire topic partition to just the events for a single entity, this entails a linear scan and filter through a partition. For this reason, while Kafka is very likely to be a critical part of any event-driven/event-based system in relaying events published by a service for consumption by other services (at which point, if we consider the event vs. command dichotomy, we're talking about commands from the perspective of the consuming service), it's not well suited to the role of an event store, which is defined by its ability to quickly give you an ordered stream of the events for a particular entity.
The most popular purpose-built event store is, probably, the imaginatively named Event Store (at least partly due to the involvement of a few prominent advocates of event sourcing in its design and implementation). Alternatively, there are libraries/frameworks like Akka Persistence (JVM with a .Net port) which use existing DBs (e.g. relational SQL DBs, Cassandra, Mongo, Azure Cosmos, etc.) in a way which facilitates their use as an event store.
Event sourcing also, as a practical necessity, tends to lead to CQRS (they go together very well: event sourcing is arguably the simplest possible persistence model capable of being a write model, while it's nearly useless as a read model). The typical pattern seen is that the command processing component of the system enforces constraints like "product exists before being added to the cart" before writing events to the event store (how those constraints are enforced is generally a question of whatever concurrency model is in use: the actor model has a high level of mechanical sympathy with this approach, but other models are possible). The events read back from the event store can then be assumed to have been valid as of the time they were written (it's possible to later decide a compensating event needs to be recorded). The events from within the event store can be projected to a Kafka topic for communication to another service (the command processing component is the single source of truth for events).
From the perspective of that other service, as noted, the projected events in the topic are commands (the implicit command for an event is "update your model to account for this event"). Semantically, their provenance as events means that they've been validated and are undeniable (they can be ignored, however). If there's some model validation that needs to occur, that generally entails either a conscious decision to ignore that command or to wait until another command is received which allows that command to be accepted.
OK, you are still thinking about how we developed applications over the last 20 years instead of how we should develop applications in the future. There are frameworks that fit the paradigms of the future well. One of them, mentioned above, is Akka, and more importantly its subcomponent Akka FSM (Finite State Machine). The finite state machine is a concept we ignored in software development for years, but the future seems to be more and more event based and we can't ignore it anymore.
So how will these help you? Akka is a framework based on the actor concept: every actor is a unique entity with a mailbox. Say you have an Order actor with id 123456789. Every event for order id 123456789 will be processed by this actor, and its messages will be queued in its mailbox on a first-in, first-out basis, so you no longer need synchronisation logic. You can have millions of Order actors in your system, and they work in parallel: while Order actor 123456789 is processing its events, Order actor 987654321 can process its own, which gives you parallelism and scalability. As long as Kafka guarantees the order of the messages for key 123456789 and for key 987654321, everything is green.
Now you may ask where the finite state machine comes into play. As you mentioned, the problem arises when the addProduct event arrives before the createOrder event (because they are on different Kafka topics). At that point the state machine behaves differently depending on whether the Order actor is in the CREATED state or the INITIALISING state: in the CREATED state it will just add the product, while in the INITIALISING state it will probably just stash the event until the createOrder event arrives.
These concepts are explained really well in this video, and if you want to see a practical example I have a blog for it and this one for a more direct dive.
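To make the stash-until-created behaviour concrete, here is a rough sketch in plain Python. It is not the Akka FSM API; the `OrderActor` class, its `receive` method and the event dictionaries are hypothetical names chosen for illustration, and only the two states and two event names come from the description above.

```python
class OrderActor:
    """Toy state machine for one order; processes its events one at a time."""

    def __init__(self, order_id):
        self.order_id = order_id
        self.state = "INITIALISING"
        self.products = []
        self._stash = []  # events that arrived too early

    def receive(self, event):
        if self.state == "INITIALISING":
            if event["type"] == "createOrder":
                self.state = "CREATED"
                # Re-deliver everything that arrived before the order existed,
                # in the order it was originally received.
                stashed, self._stash = self._stash, []
                for stashed_event in stashed:
                    self.receive(stashed_event)
            else:
                # Too early to act on: keep it until createOrder shows up.
                self._stash.append(event)
        elif event["type"] == "addProduct":
            # CREATED state: the product can simply be added.
            self.products.append(event["product_id"])
```

With this shape, delivering addProduct before createOrder ends in the same final state as the original order of events, which is the property the replay needs.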
I think I found the solution for scalable (multi-partition) event sourcing:
create in Kafka (or in a similar system) topic named messages
assign users to partitions (e.g. by murmurHash(login) % partitionCount)
if a piece of data is mutable (e.g. Product, Order), every partition should contain its own copy of the data
if we have e.g. 256 pieces of a product in our warehouse and 64 partitions, we can initially 'give' every partition 4 pieces, so most CreateOrder events will be processed quickly without leaving the user's partition
if a user (a partition) sometimes needs to mutate data in other partition, it should send a message there:
for example for Product / Order domain, partitions could work similarly to Walmart/Tesco stores around a country, and the messages sent between partitions ('stores') could be like CreateProduct, UpdateProduct, CreateOrder, SendProductToMyPartition, ProductSentToYourPartition
the message will become an 'event' as if it was generated by a user
the message shouldn't be sent during replay (already sent, no need to do it twice)
This way even when Kafka (or any other event sourcing system) chooses to reorder messages between partitions, we'll still be ok, because we don't ever read any data outside our single-threaded 'island'.
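A rough sketch of the partition 'islands' idea, under some loud assumptions: `zlib.crc32` stands in for murmurHash (which is not in the Python standard library), and `partition_for`, `split_stock` and the `Partition` class are hypothetical helpers, not part of Kafka or any client library.

```python
import zlib


def partition_for(login, partition_count):
    """Deterministically map a user to a partition (CRC32 used as a stand-in hash)."""
    return zlib.crc32(login.encode("utf-8")) % partition_count


def split_stock(total, partition_count):
    """Give every partition an (almost) equal share of the warehouse stock."""
    base, extra = divmod(total, partition_count)
    return [base + (1 if i < extra else 0) for i in range(partition_count)]


class Partition:
    """Single-threaded 'island': it only ever reads and writes its own stock."""

    def __init__(self, stock):
        self.stock = stock

    def create_order(self, quantity):
        if quantity <= self.stock:
            self.stock -= quantity
            return "OrderCreated"
        # Not enough local stock: ask another partition via a message
        # instead of reading its data directly.
        return "SendProductToMyPartition"
```

Because a partition never reads another partition's state, reordering between partitions during replay cannot change the outcome of a local decision.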
EDIT: As @LeviRamsey noted, this 'single-threaded island' is basically the actor model, and frameworks like Akka can make it a bit easier.

Kafka validate messages in stateful processing

I have an application where multiple users can send REST operations to modify the state of shared objects.
When an object is modified, then multiple actions will happen (DB, audit, logging...).
Not all operations are valid; for example, you cannot Modify an object after it was Deleted.
Using Kafka I was thinking about the following architecture:
1. REST operations are queued in a Kafka topic.
2. Operations on the same object go to the same partition, so all of an object's operations will be in sequence and processed by one consumer.
3. Consumers listen to a partition and validate each operation using an in-memory database.
4. If the operation is valid, it is sent to a "Valid operation topic"; otherwise it is sent to an "Invalid operation topic".
5. Other consumers (db, log, audit) listen to the "Valid operation topic".
I am not very sure about point number 3.
I don't like the idea of keeping the state of all my objects. (I have billions of objects, and even though an object can be 10 MB in size, what I need to store to validate its state is just a few kilobytes...)
However, is this a common pattern? Otherwise how can you verify the validity of certain operations?
Also, what would you use as an in-memory database? Surely it has to be highly available, fault-tolerant and support transactions (read and write).
I believe this is a very valid pattern, and is essentially a variation of an event-sourced CQRS pattern.
For example, Lagom implements their CQRS persistence in a very similar fashion (although based on a completely different toolset).
A few points:
you are right about the need for sequential operations: since all your state mutations need to be based on the result of the previous mutation, there must be a strong order in their execution. This is very often the case for such things, so we like to be able to scale those operations horizontally as much as possible, so that each of those sequences of operations happens in parallel with many other sequences. In your case we have one such sequence per shared object.
Relying on Kafka partitioning by key is a good way to achieve that (assuming you do not set higher than the default value 1). Here again Lagom has a similar approach, with its persistent entities being distributed and single-threaded. I'm not saying Lagom is better, I'm just reassuring you that this approach is used by others :)
a key aspect of your pattern is the transformation of a Command into an Event: in that jargon a command is seen as a request to impact the state and may be rejected for various reasons. An event is a description of a state update that happened in the past and is irrefutable from the point of view of those who receive it: an event always tells the truth. The process you are describing would be a controller that sits at the boundary between the two: it is responsible for transforming commands into events.
In that sense the "Valid operation topic" you mention would be an event-sourced description of the state updates of your process. Since it's all backed by Kafka it would be arbitrarily partitionable and thus scalable, which is awesome :)
Don't worry about the size of the state of all your objects; it must sit somewhere somehow. Since you have this controller that transforms the commands into events, it becomes the primary source of truth related to that object and is responsible for storing it: this controller handles the primary storage for your events, so you must provide space for it. You can use Kafka Streams' key-value stores: those are local to each of your processing instances, though if you make them persistent they have no problem handling data much bigger than the available RAM. Behind the scenes, data is spilled to disk thanks to RocksDB, and even further behind the scenes it's all event-sourced to a Kafka topic, so your state store is replicated and will be transparently re-created on another machine if necessary.
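The command-to-event controller described in these points could look roughly like the sketch below. It deliberately uses no Kafka client: a plain dict stands in for the per-partition state store, the topic names are made up, and `validate` is a hypothetical function. Note that it keeps only the small piece of state needed for validation (whether the object is live or deleted), not the whole object.

```python
# Past-tense event names for accepted commands (illustrative only).
EVENT_NAMES = {"Create": "Created", "Modify": "Modified", "Delete": "Deleted"}


def validate(state, command):
    """Turn one command into a (topic, record) pair, updating `state` if accepted.

    `state` maps object_id -> "live" or "deleted"; a missing key means the
    object has never been created.
    """
    obj_id = command["object_id"]
    op = command["op"]
    status = state.get(obj_id)

    if op == "Create" and status is None:
        state[obj_id] = "live"
    elif op == "Modify" and status == "live":
        pass  # allowed, validation state does not change
    elif op == "Delete" and status == "live":
        state[obj_id] = "deleted"
    else:
        # Rejected command: e.g. Modify after Delete, or an unknown operation.
        return ("invalid-operations", command)

    return ("valid-operations", {"object_id": obj_id, "event": EVENT_NAMES[op]})
```

Since all commands for one object arrive on one partition in order, a single consumer can run this function sequentially without any locking.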
I hope this helps you finalise your design :)

Handling large amount of events in event sourcing

CQRS with event sourcing looks like a perfect fit as an architecture for one of our systems. There is only one little thing we are currently worried about: handling a large number of events and dealing with huge event stores as a consequence.
Our current system receives about a million events a day (which currently have nothing to do with event sourcing, though). If we were to store them all over a longer period of time, our event stores might get pretty big, but if we dump/purge to a rolling snapshot frequently, we might lose one of the big advantages of event sourcing: information about the history of the system, and replay.
What are common ways to deal with this problem in a CQRS architecture? Is it a problem at all? Do we just throw more hardware at the event store or is there something we can do at the architecture design level?
I think the most common approach is to use snapshots and persistent read models. That is, you don't actually replay your events very often, except when you need to build a new read model or change the way an existing one works. By storing snapshots of your domain objects, you avoid having to replay long streams of events.
One could argue that storing snapshots and persistent read models isn't a whole lot different than just doing CQRS without event-sourcing. But the old events are there in the event that you made a mistake in your read model, or need to derive new information, or have other strict auditing requirements.
In our application, where we have many events that have low business value, we plan to scrub events heavily during execution so that our event logs stay smaller. But I imagine for some objects we will still fall back to snapshots and persistent models.
Look at your "active streamset". Are there streams that have a lifecycle where they tend to come into existence, mutate over a relatively short period of time, and then die as they reach their final state? If so, these streams could be moved to cheaper storage (backup). The only reason you'd need them is for replaying purposes, so you may want to either keep them accessible (albeit at a slower response rate) or keep a compressed copy around for replay purposes. In any case, do question whether there are streams you can move out of the event store, or at least out of the active streamset.
Another option is to partition your streams across multiple physical event stores. Maybe there is a geographical boundary that can be used, or maybe there's something that naturally partitions them (the domain you are in usually provides hints). It's the kind of thing where you need to reflect about advantages and disadvantages.
This technique is not restricted to event sourcing. It can equally be applied to state-based models (it's just data after all).

NEventStore 3.0 - Throughput / Performance

I have been experimenting with JOliver's Event Store 3.0 as a potential component in a project and have been trying to measure the throughput of events through the Event Store.
I started using a simple harness which essentially iterated through a for loop, creating a new stream and committing a very simple event comprising a GUID id and a string property to a MSSQL2K8 R2 DB. The dispatcher was essentially a no-op.
This approach managed to achieve ~3K operations/second running on an 8 way HP G6 DL380 with the DB on a separate 32 way G7 DL580. The test machines were not resource bound, blocking looks to be the limit in my case.
Has anyone got any experience of measuring the throughput of the Event Store and what sort of figures have been achieved? I was hoping to get at least 1 order of magnitude more throughput in order to make it a viable option.
I would agree that blocking IO is going to be the biggest bottleneck. One of the issues that I can see with the benchmark is that you're operating against a single stream. How many aggregate roots do you have in your domain with 3K+ events per second? The primary design of the EventStore is for multithreaded operations against multiple aggregates, which reduces contention and locks for real-world applications.
Also, what serialization mechanism are you using? JSON.NET? I don't have a Protocol Buffers implementation (yet), but every benchmark shows that PB is significantly faster in terms of performance. It would be interesting to run a profiler against your application to see where the biggest bottlenecks are.
Another thing I noticed was that you're introducing a network hop into the equation which increases latency (and blocking time) against any single stream. If you were writing to a local SQL instance which uses solid state drives, I could see the numbers being much higher as compared to a remote SQL instance running magnetic drives and which have the data and log files on the same platter.
Lastly, did your benchmark application use System.Transactions or did it default to no transactions? (The EventStore is safe without use of System.Transactions or any kind of SQL transaction.)
Now, with all of that being said, I have no doubt that there are areas in the EventStore that could be dramatically optimized with a little bit of attention. As a matter of fact, I'm kicking around a few backward-compatible schema revisions for the 3.1 release to reduce the number writes performed within SQL Server (and RDBMS engines in general) during a single commit operation.
One of the biggest design questions I faced when starting on the 2.x rewrite that serves as the foundation for 3.x is the idea of async, non-blocking IO. We all know that node.js and other non-blocking web servers beat threaded web servers by an order of magnitude. However, the potential for complexity introduced on the caller is increased and is something that must be strongly considered because it is a fundamental shift in the way most programs and libraries operate. If and when we do move to an evented, non-blocking model, it would be more in a 4.x time frame.
Bottom line: publish your benchmarks so that we can see where the bottlenecks are.
Excellent question Matt (+1), and I see Mr Oliver himself replied as the answer (+1)!
I wanted to throw in a slightly different approach that I myself am playing with to help with the 3,000 commits-per-second bottleneck you are seeing.
The CQRS Pattern, that most people who use JOliver's EventStore seem to be attempting to follow, allows for a number of "scale out" sub-patterns. The first one people usually queue off is the Event commits themselves, which you are seeing a bottleneck in. "Queue off" meaning offloaded from the actual commits and inserting them into some write-optimized, non-blocking I/O process, or "queue".
My loose interpretation is:
Command broadcast -> Command Handlers -> Event broadcast -> Event Handlers -> Event Store
There are actually two scale-out points here in these patterns: the Command Handlers and Event Handlers. As noted above, most start with scaling out the Event Handler portions, or the Commits in your case to the EventStore library, because this is usually the biggest bottleneck due to the need to persist it somewhere (e.g. Microsoft SQL Server database).
I myself am using a few different providers to test for the best performance to "queue up" these commits. CouchDB and .NET's AppFabric Cache (which has a great GetAndLock() feature). [OT]I really like AppFabric's durable-cache features that lets you create redundant cache servers that backup your regions across multiple machines - therefore, your cache stays alive as long as there is at least 1 server up and running.[/OT]
So, imagine your Event Handlers do not write the commits to the EventStore directly. Instead, you have a handler insert them into a "queue" system, such as Windows Azure Queue, CouchDB, Memcache, AppFabric Cache, etc. The point is to pick a system with little to no blocking to queue up the events, but something that is durable with redundancy built in (Memcache being my least favorite for redundancy options). You must have that redundancy so that, if a server drops, you still have the event queued up.
To finally commit from this "Queued Event", there are several options. I like Windows Azure's Queue pattern for this, because of the many "workers" you can have constantly looking for work in the queue. But it doesn't have to be Windows Azure - I've mimicked Azure's Queue pattern in local code using a "Queue" and "Worker Roles" running in background threads. It scales really nicely.
Say you have 10 workers constantly looking into this "queue" for any User Updated events (I usually write a single worker role per Event type, which makes scaling out easier as you get to monitor the stats of each type). Two events get inserted into the queue, the first two workers instantly pick up a message each, and insert them (Commit them) directly into your EventStore at the same time - multithreading, as Jonathan mentioned in his answer. Your bottleneck with that pattern would be whatever database/eventstore backing you select. Say your EventStore is using MSSQL and the bottleneck is still 3,000 RPS. That is fine, because the system is built to 'catch up' when the rate drops down to, say, 50 RPS after a 20,000 burst. This is the natural pattern CQRS allows for: "Eventual Consistency."
I said there were other scale-out patterns native to CQRS. Another, as I mentioned above, is the Command Handlers (or Command Events). This is one I have done as well, especially if you have a very rich domain, as one of my clients does (dozens of processor-intensive validation checks on every Command). In that case, I'll actually queue off the Commands themselves, to be processed in the background by some worker roles. This gives you a nice scale-out pattern as well, because now your entire backend, including the EventStore commits of the Events, can be threaded.
Obviously, the downside to that is that you lose some real-time validation checks. I usually solve that by segmenting validation into two categories when structuring my domain. One is Ajax or real-time "lightweight" validations in the domain (kind of like a Pre-Command check). The others are hard-failure validation checks that are only done in the domain and are not available for real-time checking. You would then need to code for failure in the Domain model, meaning: always code for a way out if something fails, usually in the form of a notification email back to the user that something went wrong. Because the user is no longer blocked by this queued Command, they need to be notified if the command fails.
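As a local illustration of the queue-plus-workers idea, here is a minimal sketch using Python's standard `queue` and `threading` modules. It is only a stand-in: an in-process `queue.Queue` is not durable or redundant the way the queue systems mentioned above are, and `drain` and the `commit` callback are hypothetical names (in a real system `commit` would write to the event store).

```python
import queue
import threading


def drain(events, commit, worker_count=10):
    """Queue up a burst of events and let several workers commit them in parallel."""
    pending = queue.Queue()
    for event in events:
        pending.put(event)

    def worker():
        while True:
            try:
                event = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to pick up, this worker is done
            commit(event)  # e.g. write the event to the event store
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

The producer side only pays the cost of `put`; the slower commits happen in the background and catch up once the burst is over.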
And your validation checks that need to go to the 'backend' are going to your Query or "read-only" database, riiiight? Don't go into the EventStore to check for, say, a unique Email address. You'd be doing your validation against your highly-available read-only datastore for the Queries of your front end. Heck, have a single CouchDB document be dedicated to only a list of all email addresses in the system as your Query portion of CQRS.
CQRS is just suggestions... If you really need real-time checking of a heavy validation method, then you can build a Query (read-only) store around that, and speed up the validation - on the PreCommand stage, before it gets inserted into the queue. Lots of flexibility. And I would even argue that validating things like empty Usernames and empty Emails is not even a domain concern, but a UI responsibility (off-loading the need to do real-time validation in the domain). I've architected a few projects where I had very rich UI validation on my MVC/MVVM ViewModels. Of course my Domain had very strict validation, to ensure it is valid before processing. But moving the mediocre input-validation checks, or what I call "light-weight" validation, up into the ViewModel layers gives that near-instant feedback to the end-user, without reaching into my domain. (There are tricks to keep that in sync with your domain as well.)
So in summary, possibly look into queuing off those Events before they are committed. This fits nicely with EventStore's multi-threading features as Jonathan mentions in his answer.
We built a small boilerplate for massive concurrency using Erlang/Elixir, using Eventstore. We still have to optimize db connections, pooling, etc... but the idea of having one process per aggregate with multiple db connections is aligned with your needs.

Is CEP what I need (system state and event replaying)

I'm looking for a CEP engine, but I don't know if any engine meets my requirements.
My system has to process multiple streams of event data and generate complex events and this is exactly what almost any CEP engine perfectly fits (ESPER, Drools).
I store all raw events in a database (it's not the CEP part, but I do this) and use rules (or continuous queries or something similar) to generate custom actions on complex events. But some of my rules depend on events in the past.
For instance: I could have a sensor sending an event every time my spouse is coming or leaving home, and if both my car and the car of my fancy woman are near the house, I get an SMS 'Dangerous'.
The problem is that when the event processing service restarts I lose all information on the state of the system (is my wife at home?), and to restore it I need to replay events for an unknown period of time. The system state can depend not only on raw events, but on complex events as well.
The same problem arises when I need some report on complex events in the past. I have raw events data stored in database, and could generate these complex events replaying raw events, but I don't know for which exactly period I have to replay them.
At the same time it's clear that for the most rules it's possible to find automatically the number of events to be processed from the past (or period of time to load events to be processed) to restore system state.
If given action depends on presence of my wife at home, CEP system has to request last status change. If report on complex events is requested and complex event depends on average price within the previous period, all price change events for this period should be replayed. And so on...
Am I missing something?
The RuleCore CEP Server might solve your problems if I remember correctly. It does not lose state if you restart it and it contains a virtual logical clock so that you can replay events using any notion of time.
I'm not sure if your question is whether current CEP products offer joining historical data with live events, but if that's what you need, Esper allows you to pull data from JDBC sources (which connects your historical data with your live events) and reflect them in your EPL statements. I guess you already checked the Esper website; if not, you'll see that Esper has excellent documentation with lots of cookbook examples.
But even if you model your historical events after your live events, that does not solve your problem with choosing the correct timeframe, and as you wrote, this timeframe is use case dependent.
As previous people mentioned, I don't think your problem is really an engine problem, but more of a use case one. All engines I am familiar with, including Drools Fusion and Esper can join incoming events with historical data and/or state data queried on demand from an external source (like a database). It seems to me that what you need to do is persist state (or "timestamp check-points") when a relevant change happens and re-load the state on re-starts instead of replaying events for an unknown time frame.
Alternatively, if using Drools, you can inspect existing rules (kind of reflection on your rules/queries) to figure out which types of events your rules need and backtrack your event log until a point in time where all requirements are met and load/replay your events from there using the session clock.
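The backtracking idea can be sketched independently of any engine. The function below is not a Drools API; `replay_start`, the list-based event log and the event type names in the usage note are assumptions. It also covers only rules whose state depends on the latest event of each required type (like "is my wife at home?"); window-based rules such as an average price over a period would need a time bound instead.

```python
def replay_start(event_log, required_types):
    """Find the index from which the log must be replayed to restore state.

    Walks the log backwards until the most recent event of every required
    type has been seen. Returns len(event_log) if nothing is required, and
    None if some required type never occurs in the log.
    """
    missing = set(required_types)
    if not missing:
        return len(event_log)  # nothing to restore, start with live events
    for index in range(len(event_log) - 1, -1, -1):
        missing.discard(event_log[index]["type"])
        if not missing:
            return index
    return None
```

For example, with a log of types `spouse_presence, car_position, price_change, car_position`, a rule needing `spouse_presence` and `car_position` has to replay from index 0, while a rule needing only `car_position` can start at index 3.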
Finally, you can use a cluster to reduce the restarts, but that does not solve the problem you describe.
Hope it helps.