How to create snapshots using GetStreamsToSnapshot in EventStore 3.0

We are following a CQRS architecture and using Jonathan Oliver's EventStore version 3 for events. We want to create snapshots of the aggregate roots to improve performance.
I found an API (GetStreamsToSnapshot) that can be used for this. It returns all streams whose most recent snapshot is at least a given number of revisions old.
But I am not sure how to use the stream to create the snapshot, as I do not know the aggregate type.
Please provide any input on how to create snapshots.

As you have discovered, GetStreamsToSnapshot gives you a list of streams that are at least X revisions behind the head revision.
From there, it's a matter of loading up each stream. This is where you can append some kind of header information to the stream to determine what type of aggregate you're dealing with.
Many times I'm asked why I don't just store the aggregate type information directly in the EventStore and make it a first-class part of the API. The answer is that the EventStore doesn't care about aggregates, which are a DDD concept. All the EventStore cares about is streams and events.
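To make that concrete, here is a minimal sketch of such a snapshotting loop, assuming the v3 API shape (IStoreEvents.Advanced, StreamHead, Snapshot). The "AggregateType" header key and the BuildSnapshot helper are hypothetical, something your own commit and domain code would have to supply:

    // store: a configured IStoreEvents instance.
    foreach (var head in store.Advanced.GetStreamsToSnapshot(50))
    {
        using (var stream = store.OpenStream(head.StreamId, 0, int.MaxValue))
        {
            // Read back the header appended at commit time to learn the aggregate type.
            var typeName = (string)stream.CommittedHeaders["AggregateType"]; // hypothetical key
            var aggregateType = Type.GetType(typeName);

            // Rehydrate the aggregate however your domain layer does it, then snapshot it.
            object memento = BuildSnapshot(aggregateType, stream.CommittedEvents); // hypothetical helper
            store.Advanced.AddSnapshot(new Snapshot(head.StreamId, stream.StreamRevision, memento));
        }
    }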

Related

Axon or Kafka to support CQRS/ES

Consider the simple use case in which I want to store product ratings as events in an event store.
I could use two different approaches:
Using Axon: A Rating aggregate is responsible for handling the CreateRatingCommand and sending the RatingCreatedEvent. Sending the event would cause the Rating to be stored in the event store. Other event handlers have the possibility to replay the event stream when connecting to the Axon server instance and to do whatever is needed with the ratings. In this case, the event handler will be used as a stream processor.
Using Kafka: A KafkaProducer will be used to store a Rating POJO (after proper serialization) in a Kafka topic. Setting the topic's retention time to indefinite would ensure no events are ever lost.
Some architectural questions arise for me with both approaches:
When using Axon:
Is there any added value in using Axon (or similar solutions) if there is no real state to be maintained or altered within the aggregate? The aggregate just serves as a "dumb" placeholder for the data, but does not provide any state-changing logic.
How does Axon handle multiple event handlers of the same event type? Will they all handle the same event (same aggregate id) in parallel, or is the same event only handled once by one of the handlers?
Are events stored in the Axon event store kept until the end of time?
When using Kafka:
Kafka stores events/messages with the same key in the same partition. How does one select the best value for a key in the use case of user-product ratings? UserId, ProductId, or a separate topic for each, publishing every event to both topics?
Would it be wise to use a separate topic for each user and each product, resulting in a massive number of topics on the cluster? (Approximately <5k products and >10k users.)
I don't know if SO is the preferred forum for this kind of question... I was just wondering what you (would) recommend as best practice in this particular use case. Looking forward to your feedback, and feel free to point out other points of thought I missed in the previous questions.
EDIT 12/11/2020: I just found a related discussion containing useful information related to my question.
As Jan Galinski already puts it, this hasn't got a foolproof answer to it, really. This is worth a broader discussion on, for example, AxonIQ's Discuss forum. Regardless, there are some questions in here I can definitely give an answer to, so let's get to it:
Axon Question 1 - Axon Framework is, as you've noticed, used a lot for DDD-centric applications. Nothing, however, forces you to base yourself on that notion at all. You can strip the framework of its Event Sourcing specifics, as well as its modelling specifics, entirely, and purely go for the messaging idea of distinct commands, events and queries. It was actually a conscious decision to segregate Axon Framework version 3 into these sub-parts when version 4 (the current version) was released. Next to that, I think there is great value in not just basing yourself on event messages. Using distinct commands and queries only further decouples your components, making for a far richer and easier-to-extend application landscape.
Axon Question 2 - This depends on where the @EventHandler annotated methods are located. If they're in the same class, only one will be invoked. If they're positioned in distinct classes, then both will receive the same event. Furthermore, if they're segregated between distinct classes, it is important to note that Axon uses an Event Processor as the technical solution for invoking your event handlers. If distinct classes are grouped under the same Event Processor, you can impose a certain ordering of which handler is invoked first. Next to this, if the event handling should occur in parallel, you will have to configure a so-called TrackingEventProcessor (the default in Axon Framework), as it allows configuration of several threads to handle events concurrently. So, to conclude this section: everything you're asking about in question two is an option, not a necessity. It's just a matter of configuration, really. It might be worth checking up on this documentation page of Axon Framework on the matter.
Axon Question 3 - As Axon Server serves the purpose of an Event Store, there is no retention period at all. So yes, by default they're kept until the end of time. There is nothing stopping you from dropping the events, though, if you feel there's no value in storing them to, for example, base all your models on (as you'd do when using Event Sourcing).
It's the Kafka questions I'm personally less familiar with (figures, as a contributor to Axon Framework, I guess). I can give you my two cents on the matter here too, though I'd recommend getting a second opinion:
Kafka Question 1 - From my personal feeling of what such an application would require, I'd assume you'd want to be able to retrieve all data for a given product as efficiently as possible. I'd wager it's important that all its events are in the same partition, as that makes the process as efficient as possible and wouldn't require any merging afterwards. With this in mind, I'd think using the ProductId makes the most sense.
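To illustrate the keying idea (the .NET Confluent.Kafka client is my choice here purely to keep one language across these sketches; the same applies to Java's KafkaProducer):

    using System.Threading.Tasks;
    using Confluent.Kafka;

    class RatingPublisher
    {
        public static async Task PublishAsync(string productId, string ratingJson)
        {
            var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
            using var producer = new ProducerBuilder<string, string>(config).Build();

            // Keying by ProductId puts all ratings for one product in the same
            // partition, so they can be consumed in order without merging.
            await producer.ProduceAsync("product-ratings", new Message<string, string>
            {
                Key = productId,    // e.g. "product-42"
                Value = ratingJson  // the serialized Rating
            });
        }
    }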
Kafka Question 2 - If you are anticipating only 5,000 products and 10,000 users, I'd guess it should be doable to have separate topics for these. Opinion incoming - it is here, though, where I personally feel that Kafka's intent to give you direct power to decide when to use topics overcomplicates what you're actually trying to achieve, which is business functionality. Giving you the power to segregate streams feels more like an afterthought from the perspective of application development. As soon as you require an enterprise-grade, efficient message bus, that's when this option really shines, I think, as then you can optimize for bulk.
Hoping all this helps you further, @KDW!

EventStore basics - what's the difference between Event Meta Data/MetaData and Event Data?

I'm very much at the beginning of using / understanding EventStore or get-event-store as it may be known here.
I've consumed the documentation regarding clients, projections and subscriptions and feel ready to start using it on some internal projects.
One thing I can't quite get past: is there a guide / set of recommendations describing the difference between event metadata and data? I'm aware of the notional differences; event data is "core" to the domain, metadata is for describing it, but it is becoming quite philosophical.
I wonder if there are hard rules regarding implementation (querying, etc.).
Any guidance at all gratefully received!
Shamelessly copying (and paraphrasing) parts from Szymon Kulec's blog post "Enriching your events with important metadata" (emphases mine):
But what information can be useful to store in the metadata; which information is worth storing despite the fact that it was not captured in the creation of the model?
1. Audit data
who? – simply store the user id of the action invoker
when? – the timestamp of the action and the event(s)
why? – the serialized intent/action of the actor
2. Event versioning
Event sourcing deals with the effect of actions. An action executed on a state results in an event, according to the current implementation. Wait. The current implementation? Yes, the implementation of your aggregate can change, and it will, either because of bug fixing or because of introducing new features. Wouldn't it be nice if the version, like a commit id (SHA1 for gitters) or a semantic version, could be stored with the event as well? Imagine that you published a broken version and your business sold 100 tickets before fixing a bug. It'd be nice to be able to tell which events were created on the basis of the broken implementation. Having this knowledge, you can easily compensate transactions performed by the broken implementation.
3. Document implementation details
It’s quite common to introduce canary releases, feature toggling and
A/B tests for users. With automated deployment and small code
enhancement all of the mentioned approaches are feasible to have on a
project board. If you consider the toggles or different implementation
coexisting in the very same moment, storing the version only may be
not enough. How about adding information which features were applied
for the action? Just create a simple set of features enabled, or map
feature-status and add it to the event as well. Having this and the
command, it’s easy to repeat the process. Additionally, it’s easy to
result in your A/B experiments. Just run the scan for events with A
enabled and another for the B ones.
4. Optimized combination of 2. and 3.
If you think that this is too much, create a lookup for sets of versions × features. It's not that big and it's repeatable across many users, hence you can easily optimize by storing the set elsewhere, under a reference key. You can serialize this map and calculate its SHA1, put the values in a map (a table will do as well) and use the identifiers to put them in the event. There are plenty of options to shift the load either to the query side (lookups) or to the storage (store everything as named metadata).
Summing up
If you create an event-sourced architecture, consider adding the temporal dimension (version) and a bit of configuration to the metadata. Once you have it, it's much easier to reason about the sources of your events and to introduce tooling like compensation. There's no such thing as too much data, is there?
I will share my experience, which may help. I have been playing with akka-persistence, akka-persistence-eventstore and eventstore. akka-persistence stores its event wrapper, a PersistentRepr, in binary format. I wanted this data in JSON so that I could:
use projections
make these events easily available to any other technologies
You can implement your own serialization for akka-persistence-eventstore to do this, but it still ended up just storing the wrapper, which had my event embedded in a payload attribute. The other attributes were all akka-persistence specific. The author of akka-persistence-eventstore gave me some good advice: get the serializer to store the payload as the Data, and the rest as MetaData. That way my event is now just the business data, and the metadata aids the technology that put it there in the first place. My projections now don't need to parse the metadata to get at the payload.

Using aggregates and Domain events with nosql storage

I'm wandering in the DDD and NoSQL fields at the moment. I have a doubt now: I need to produce events from the aggregate and I would like to use NoSQL storage. But how can I be sure that the events are saved to the storage AND the changes to the aggregate root are saved too, without having transactions?
Does it make sense? Is there a way to do this without being forced to use event sourcing or a transactional db?
Actually I was looking at implementing a two-phase commit algorithm, but it seems pretty heavy from a performance point of view...
Am I approaching the problem the wrong way?
Stuffed with questions...
Thanks for every suggestion
Enrico
PS
I'm a newbie on Stack Overflow so any suggestion/criticism/... is more than welcome
Enrico
Edit 1
Well, I would need events to notify aggregates that something happened, and they should react to the change. The problem arises when such events are important for the business logic. As far as I understand, after a night of thinking, I can't use NoSQL storage to do such things. Let me explain (thinking out loud :P):
With ES (1st scenario): I save the "diff" of the data. Then I produce an event associated with it. 2 operations.
With ES (2nd scenario): I save the "diff" of the data. A process watches the ES and produces the event. But I'm tied to having only one watcher process to ensure the correct ordering of events.
With ES (3rd scenario): Idempotent events. The events can be inferred from the state, every reapplication of an event can cause a change on the consumer only once, there can be multiple "dequeue" processes, and duplicates can't possibly happen. 1 operation, but it introduces heavy limitations on the consumers.
In general: I save the aggregate's data. Then I produce an event associated with it. 2 operations.
Now the question becomes wider, IMHO: is it possible to work with domain events and NoSQL when such domain events are a fundamental part of the business process?
I think it could be a better option to go relational... even if I would need to add quite a lot of machines to get the same performance.
Edit 2
For the sake of completeness, searching for "domain events nosql idempotent" on Google: http://svendvanderveken.wordpress.com/2011/08/26/transactional-event-based-nosql-storage/
If you need Event Sourcing, you should store events only.
This should be the sequence:
the aggregate root receives a command
it fires the proper events
events are stored
Each aggregate's re-hydration should be done only by executing its events over it. You can create aggregate snapshots if you measure performance problems in their initialization, but this doesn't require two-phase commits, since you can build snapshots asynchronously via batch.
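A minimal sketch of that sequence (the types are illustrative, not tied to any particular framework):

    using System.Collections.Generic;

    public abstract class AggregateRoot
    {
        private readonly List<object> _uncommitted = new List<object>();
        public int Version { get; private set; }

        // Steps 1-2: a command handler mutates state only by raising events.
        protected void Raise(object @event)
        {
            Apply(@event);
            _uncommitted.Add(@event);
        }

        protected abstract void Apply(object @event);

        // Re-hydration: replay the stored events over a fresh instance; nothing else.
        public void LoadFromHistory(IEnumerable<object> history)
        {
            foreach (var e in history) { Apply(e); Version++; }
        }

        // Step 3: the repository appends these to the event store.
        public IReadOnlyList<object> UncommittedEvents => _uncommitted;
    }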
Note however that you need CQRS and/or Event Sourcing only if your application is heavily concurrent and you need to cope with partition tolerance and compensating actions.
edit
Event Sourcing is an alternative to the persistence of object state. You either store the events or the state of the object model. You can save snapshots, but they're just performance tools: your application must be able to work without them. You can consider such snapshots a caching technique. As an alternative you can persist object state (the classical model), but in that case you don't need to store events.
In my own DDD application, I use observable entities to decouple aggregates from their persistence (via direct event subscription from the repository). For example, your repository can subscribe to each domain event and execute the actions required by the application (persist to the store, dispatch to a queue, and so on...). But as a persistence technique, Event Sourcing is an alternative to the classical persistence of the observable object state. In most scenarios you don't need both.
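A rough sketch of that observable-entity decoupling (all names here are mine, for illustration):

    using System;

    public sealed class OrderShipped { /* event payload */ }

    public class Order
    {
        // The repository subscribes here and decides what to do with each
        // event (persist it to the store, dispatch it to a queue, and so on).
        public event Action<object> EventRaised = delegate { };

        public void Ship()
        {
            // ...change internal state, then notify subscribers...
            EventRaised(new OrderShipped());
        }
    }

    // In the repository, roughly:
    //   order.EventRaised += e => { Persist(e); Dispatch(e); };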
edit 2
A final note: if you choose ES, one of the event subscribers can build a relational read model too.

CQRS + Event Sourcing: (is it correct that) Commands are generally communicated point-to-point, while Domain Events are communicated through pub/sub?

Didn't know how to shorten that title.
I'm basically trying to wrap my head around the concept of CQRS (http://en.wikipedia.org/wiki/Command-query_separation) and related concepts.
Although CQRS doesn't necessarily incorporate Messaging and Event Sourcing, it seems to be a good combination (as can be seen in the many examples / blog posts combining these concepts).
Given a use case for a state change (say, updating a Question on SO), would you consider the following flow to be correct (as in best practice)?
The system issues an aggregate UpdateQuestionCommand, which might be separated into a couple of smaller commands: UpdateQuestion, targeted at the Question Aggregate Root, and UpdateUserAction (to count points, etc.), targeted at the User Aggregate Root. These are sent asynchronously using point-to-point messaging.
The aggregate roots do their thing and, if all goes well, fire the events QuestionUpdated and UserActionUpdated respectively, which contain state that is outsourced to an Event Store... to be persisted, yada yada - just to be complete, not really the point here.
These events are also put on a pub/sub queue for broadcasting. Any subscriber (among them likely one or more Projectors which create the Read Views) is free to subscribe to these events.
The general question: is it indeed best practice that Commands are communicated point-to-point (i.e. the receiver is known), whereas Events are broadcast (i.e. the receivers are unknown)?
Assuming the above, what would be the advantages/disadvantages of allowing Commands to be broadcast through pub/sub instead of point-to-point?
For example: broadcasting Commands while using Sagas (http://blog.jonathanoliver.com/2010/09/cqrs-sagas-with-event-sourcing-part-i-of-ii/) could be a problem, since the mediating role a Saga needs to play in case of failure of one of the aggregate roots is hindered, because the Saga doesn't know which aggregate roots participate to begin with.
On the other hand, I see advantages (flexibility) if broadcasting Commands were allowed.
Any help in clearing my head is highly appreciated.
Yes. For a Command or a Query there is one and exactly one receiver (thus you can still load-balance), but for Events there can be zero or more receivers (subscribers).
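That convention shows up directly in most bus contracts; roughly (a hypothetical interface, not any particular library's API):

    public interface IBus
    {
        // Point-to-point: exactly one logical handler receives each command
        // (several instances of that handler can still be load-balanced).
        void Send<TCommand>(TCommand command);

        // Pub/sub: zero or more subscribers receive each event.
        void Publish<TEvent>(TEvent @event);
    }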

J Oliver EventStore V2.0 questions

I am embarking upon an implementation of a project using CQRS and intend to use the J Oliver EventStore V2.0 as my persistence engine for events.
1) In the documentation, ExampleUsage.cs uses 3 serializers in "BuildSerializer". I presume this is just to show the flexibility of the deserialization process?
2) In the "Restart after failure" case where some events were not dispatched I believe I need startup code that invokes GetUndispatchedCommits() and then dispatch them, correct?
3) Again, in "ExampleUseage.cs" it would be useful if "TakeSnapshot" added the third event to the eventstore and then "LoadFromSnapShotForward" not only retrieve the most recent snapshot but also retrieved events that were post snapshot to simulate the rebuild of an aggregate.
4) I'm failing to see the use of retaining older snapshots. Can you give a use case where they would be useful?
5) If I have a service that is handling receipt of commands and generation of events, what is a suggested strategy for keeping track of the number of events since the last snapshot for a given aggregate? I certainly don't want to invoke "GetStreamsToSnapshot" too often.
6) In the SqlPersistence.SqlDialects namespace the sql statement name is "GetStreamsRequiringSnaphots" rather than "GetStreamsRequiringSnapShots"
1) There are a few "base" serializers, such as the Binary, JSON, and BSON serializers. The other two in the example, the GZip/compression and encryption serializers, are wrapping serializers and are only meant to modify what's already been serialized into a byte stream. For the example, I'm just showing flexibility. You don't have to encrypt if you don't want to. In fact, I've got stuff running in production that uses simple JSON, which makes debugging very easy because everything is text.
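The wrapping composition looks roughly like this (the type names follow the v2-era serializers, but treat the exact names and constructor signatures as assumptions):

    // Each wrapper only transforms the byte stream produced by the serializer it wraps.
    var serializer = new RijndaelSerializer(  // encryption, outermost
        new GzipSerializer(                   // compression
            new JsonSerializer()),            // base: plain JSON text, easy to debug
        encryptionKey);                       // byte[] key for the encryption wrapper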
2) The SynchronousDispatcher and AsynchronousDispatcher implementations are both configured to query and find any undispatched commits. You shouldn't have to do anything special.
3) Greg Young talked about how he used to "inline" his snapshots with the main event stream, but there were a number of optimistic concurrency and race conditions in high-performance systems that came up. He therefore decided to move them "out of band". I have followed this decision for many of the same reasons.
In addition, snapshots are really a performance consideration only when you have extremely low SLAs. If you have a stream with a few thousand events on it and you don't have low SLAs, why not just take the minimal performance hit instead of adding additional complexity to your system? In other words, snapshots are an "ancillary" concept. They're in the EventStore API, but they're an optional concept that should be considered only for certain use cases.
4) Let's suppose you had an aggregate with tens of millions of events and you wanted to run a "what if" scenario from a point before your most recent snapshot. It's a lot cheaper to go forward from another, older snapshot. The really nice thing about snapshots being a secondary concept is that, if you wanted to drop older snapshots, you could, and it wouldn't affect your system at all.
5) There is a method in each implementation of IPersistStreams called GetStreamsRequiringSnapshots. You provide a threshold, 50 for example, and it finds all streams having 50 or more events since their last snapshot. This can (and probably should) be done asynchronously from your normal processing.
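In other words, rather than counting events yourself, sweep periodically. A hedged sketch (persistence is the configured IPersistStreams; ScheduleSnapshot is a hypothetical helper):

    // Run on a timer or background worker, away from the command-handling path.
    foreach (var streamHead in persistence.GetStreamsRequiringSnapshots(50))
    {
        // Each stream here is >= 50 events past its last snapshot.
        ScheduleSnapshot(streamHead.StreamId);
    }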
6) "Snapshots" is the correct casing for that word. Much like "website" used to be "Web site" but because of common usage it became "website".