I am implementing a bus using Rx (reactive framework) and so far it looks good. The problem that I am facing right now is how to add events before the beginning of a stream.
To give more context, this is for an Event Source (a la CQRS/ES). When the bus starts, a number of subscribers get the IObservable. At this point, the bus asks the subscribers what event number/sequence they need to start at. The events are loaded from disk starting at the lowest number and added to the stream, with each subscriber using a where statement to start at the right event. As the application runs, new events get added and go to the subscribers. This part works great.
Some subscribers that attach late, but still need all the events. It looks like ReplaySubject fits the bill, as long as the number of events is small enough. I am thinking that I can probably do some disk/offline caching to keep more them around (any pointers welcome!).
The bigger issue is that when some subscribers get the IObservable, they need to get events that occurred before the ones that were loaded initially. A situation like the following.
When the bus started it loaded event #50 onward (the earliest anyone wanted). Now, a new subscriber requests the IObservable, except they need from #20 onward. So what I want to do is load 20-49, and add them before the start of the stream. None of the existing subscribers should see any of the events (it will be filtered by the Where).
It seems like this should be possible, but I can't quite get my head around it. Is this possible in Rx? If so, how?
Thanks!
Erick
I would look at this in a very simple Rx way. Don't use a ReplaySubject, use a single Subject for all of the "live" events and concat the historical events in front of this for each subscriber.
I would imagine that you might have some sort of data structure that keeps track of the historical records. If you've loaded/captured events from #50 to say #80 in memory already, then when a new subscriber comes along that requests the events from #20 I'd do something like this:
var fromIndex = 20;
var eventSubject = new Subject<Event>();
capturedEvents
.GetFromIndex(fromIndex) // Gets a enumerable list of events
.ToObservable()
.Concat(eventSubject)
.Subscribe( ... );
You then can have the eventSubject be used to capture new events, like so:
eventSubject.Subscribe(capturedEvents.CaptureEvent);
Does this meet your need?
How about just:
itemsFromTwentyToFourtyNine.Concat(currentItems);
Related
Trying to migrate to rx-java2and came across a problem with resubscribing the shared observable inside it's own flatMap. Need this pattern to get-update-refresh chain:
Get current data from network (shared observable to avoid multiple network requests if source is being subscribed by several observers at the same time).
Modify the data and send it back to server (completable)
Get the data again after update completes
The whole thing looks like this:
#Test fun sharedTest() {
val o = Observable.just(1).share()
assertEquals(1, o
.take(1)
.flatMap({
Completable.complete()
.andThen(o) })
.blockingFirst())
}
The test fails with: java.util.NoSuchElementException
If o is not shared everything works.
That behavior seems to be because the latter subscriber comes when a single value of original has already been dispatched and only onComplete event is to be seen.
Does anybody know is that a by-design behavior and documented somehow? There is a workaround of course but I need to know the cause, as this is a bit annoying. The approach worked in Rx 1.x
Currently using version 2.1.3
Edit:
Seems to be no legitimate way to "restart" a shared observable and its side-effects as there is no guarantee other subscribers are not listening at the moment.
Take a look at the bubble diagram for 'share' and you'll see why it behaves like that: Observable.share().
share() emits items that are emitted after the subscription, it does not re-emit previously emitted items. Take a look at Observable.replay() for the behavior that should be what you expect.
Seems to be no legitimate way to "restart" a shared observable and its side-effects as there is no guarantee other subscribers are not listening at the moment.
I have a CQRS/ES application where some of the views are populated by events from multiple aggregate roots.
I have a CashRegisterActivated event on the CashRegister aggregate root and a SaleCompleted event on the Sale aggregate root. Both events are used to populate the CashRegisterView. The CashRegisterActivated event creates the CashRegisterView or sets it active in case it already exists. The SaleCompleted event sets the last sale sequence number and updates the cash in the drawer.
When two of these events arrive within milliseconds, the first update is overwritten by the last one. So that's a lost update.
I already have a few possible solutions in mind, but they all have their drawbacks:
Marshal all event processing for a view or for one record of a view on the same thread. This works fine on a single node, but once you scale out, things start to get complex. You need to ensure all events for a view are delivered to the same node. And you need to migrate to another node when it goes down. This requires some smart load balancer which is aware of the events and the views.
Lock the record before updating to make sure no other threads or nodes modify it in the meantime. This will probably work fine, but it means giving up on a lock-free system. Threads will set there, waiting for a lock to be freed. Locking also means increased latency when I scale out the data store (if I'm not mistaken).
For the record: I'm using Java with Apache Camel, RabbitMQ to deliver the events and MariaDB for the view data store.
I have a CQRS/ES application where some of the views in the read model are populated by events from multiple aggregate roots.
That may be a mistake.
Driving a process off of an isolated event. But composing a view normally requires a history, rather than a single event.
A more likely implementation would be to use the arrival of the events to mark the current view stale, and to use a single writer to update the view from the history of events produced by the aggregate(s) concerned.
And that requires a smart messaging solution. I thought "Smart endpoints and dumb pipes" would be a good practice for CQRS/ES systems.
It is. The endpoints just need to be smart enough to understand when they need histories, or when events are sufficient.
A view, after all, is just a snapshot. You take inputs (X.history, Y.history), produce a snapshot, write the snapshot into your view store (possibly with meta data describing the positions in the histories that were used), and you are done.
The events are just used to indicate to the writer that a previous snapshot is stale. You don't use the event to extend the history, you use the event to tell the writer that a history has changed.
You don't lose updates with multiple events, because the event itself, with all of its state, is captured in the history. It's the history that is used to build the event-sourced view.
Konrad Garus wrote
... handling events coming from a single source is easier, but more importantly because a DB-backed event store trivially guarantees ordering and has no issues with lost or duplicate messages.
A solution could be to detect the when this situation happens, and do a retry.
To do this:
Add to each table the aggregate version number which is kept up to date
On each update statement add the following the the where clause "aggr_version=n-1" (where n is the version of the event being processed)
When the result of the update statement is that no records where modified, it probably means that the event was processed out of order and a retry strategy can be performed
The problem is that this adds complexity and is hard to test. The performance bottleneck is very likely in the database, so a single process with a failover solution will probably be the easiest solution.
Although I see you ask how to handle these things at scale - I've seen people recommend using a single threaded approach - until such times as it actually becomes a problem - and then address it.
I would have a process manager per view model, draw the events you need from the store and write them single threaded.
I combined the answers of VoiceOfUnreason and StefRave into something I think might work. Populating a view from multiple aggregate roots feels wrong indeed. We have out of order detection with a retry queue. So an event on an aggregate root will only be processed when the last completely processed event is version n-1.
So when I create new aggregate roots for the views that would be populated by multiple aggregate roots (say aggregate views), all updates for the view will be synchronised without row locking or thread synchronisation. We have conflict detection with a retry mechanism on the aggregate roots as well, that will take care of concurrency on the command side. So if I just construct these aggregate roots from the events I'm currently using to populate the aggregate views, I will have solved the lost update problem.
Thoughts on this solution?
I am trying to figure out how to manage a users game state using akka.
The game state will be persisted to mysql and this cannot change because we have other services that require this.
Anything that happens in a game is considered an "event".
Then you I have "Levels" which someone can achieve. A level is achieved when you complete all the "events" associated with it.
So you have:
Level
- event1 e.g. reach a point in the game
- event2 e.g. pickup a sword
- event3 e.g. defeat a monster
So in a game there are many levels, and 100's of events that are linked to levels.
So all "events" are sent via HTTP to my backend, and I save the event in the database.
I then have to load the users game profile in memory, and then re-calculate the Level's achieved since there was a new event that happened.
Note: This calculation cannot be done at the database level because it is a little more complicated that I am writing here.
The problem I see is that if I use akka, I can't have multiple actors processing the events for the same user, because the data can become stale.
Just to be clear, so when a new event arrives, I have to load the game profile in memory, loop through the levels and see if any of them have been achieved, if they have, update the database
e.g. update levels set achieved=true where level_id = 123 and user_id=234
e.g. actor1 loads the profile (all the levels and events for this user) and then processes the new event that just arrived in the inbox.
at the same time, actor2 loads the profile (same as actor1), and then processes the new event. When it persists the changes to mysql, the data will be out of sych.
If I was using threads, I would have to lock during the game profile calculation and persisting to the db.
How can I do this using Akka and be able to handle things in parallel, or is this scenerio not allow for it?
Let's think how you would manage it without actors. So, in nutshell, you have the following problem scenario:
two (or more) update requests arrive at the same time, both are
going to modify the same data
both requests read some stable data
state, then update it each in its own manner and persist to the DB
the modifications from the request which checked in first are lost, more precisely - overridden by the later request.
This is a classical problem. There are at least two classical solutions of it:
Optimistic locking
Pessimistic locking: it's usually achieved by applying Serializable isolation level for transactions.
It worth reading this answer with a nice comparison of both worlds.
As you're using Akka, you most probably want to prefer better concurrency and occasional failures, which are easy to recover. It goes on par with Akka motto let it crash.
So, you need to make the next steps:
Add version column to your table(s). It can be numeric or string (with hash). Numeric is the simplest one.
When you insert new record - initialize versions.
When you update the record - check version value has not changed. So, here's your update strategy:
Read record and its version.
Update record in memory.
Execute update query with criteria where rec_id=$id and version=$version.
If updated records count is 1 - you're good. If 0 - throw OptimisticLockException or smth like this.
Finally, it's time for Akka to do its job: come up with appropriate supervision strategy (I'd pick something like try again in 1 second). In actor's preRestart method return the update message back to the actor's mailbox (see Restart Hooks chapter in Akka docs).
With this strategy, even if two requests try to update the same record at a time, one of them will fail and will be immediately processed again.
I'm trying to learn EventStore, I like the concept but when I try to apply in practice I'm getting stuck in same point.
Let's see the code:
foreach (var k in stream.CommittedEvents)
{
//handling events
}
Two question about that:
When an app start ups after some maintenance, how do we bookmark in a
safe way what events start to read? Is there a pattern to use?
as soon the events are all consumed, the cycle ends... what about the message arriving run time? I would expect the call blocking until some new message arrive ( of course need to be handled in a thread ) or having something like BeginRead EndRead.
Do I have to bind an ESB to handle run time event or does the EventSore provides some facility to do this?
I try to better explain with an example
Suppose the aggregate is a financial portfolio, and the application is an application showing that portfolio to a trader. Suppose the trader connect to the web app and he looks at his own portfolio. The current state will be the whole history, so I have to read potentially a lot of records to reproduce the status. I guess this could be done by a so called snapshot, but who's responsible for creating it? When one should choose to create an aggregate? How can one guess a snapshot for an aggregate exists ?
For the runtime part: as soon the user look at the reconstructed portfolio state, the real time part begin to run. The user can place an order and a new position can be created by succesfully execute that order in the market. How is the portfolio updated by the infrastructure? I would expect, but maybe I'm completely wrong, having the same event stream being the source of that new event new long position, otherwise I have two path handling the state of the same aggregate. I would like to know if this is how the strategy is supposed to work, even if I feel a little tricky having the two state agents, that can possibly overlap.
Just to clarify how I fear the overlapping:
I know events has to be idempotent, so I know it must not be a
problem anyway,
But let's consider the following:
I subscribe an event bus before streaming the event to update the state of the portfolio. some "open position event" appears on the bus: I must handle them, but maybe the portfolio is not in the correct state to handle it since is not yet actualized. Even if I'm able to handle such events I will find them again when I read the stream.
More insidious: I open the stream and I read all events and I create a state. Then I subscribe to the bus: some message on the bus happen in the middle between the end of the steram reading and the beggining of the subscription: those events are missing and the aggregate is not in the correct state.
Please be patient all, my English is poor and the argument is tricky, hope I managed to share my doubt :)
The current state will be the whole history, so I have to read
potentially a lot of records to reproduce the status. I guess this
could be done by a so called snapshot, but who's responsible for
creating it?
In CQRS and event sourcing, queries are served by projections which are generated from events emitted by aggregates. You don't use the aggregate instance as reconstituted from the event store to display information.
The term snapshot refers specifically to an optimization of the event store which allows rebuilding the aggregate without replaying all of the events.
Projections are essentially event handlers which maintain a denormalized view of aggregates. Events emitted from aggregates are published, possibly out of band, and the projection subscribes to and handles those events. A projection can combine multiple aggregates if a requirement exists to display summary information, for instance. In case of a trading application, each view will typically contain data from various aggregates. Projections are designed in a consumer-driven way - application requirements determine the different views of the underlying data that are needed.
With this type of workflow you have to embrace eventual consistency throughout your application. For instance, if an end user is viewing their portfolio and initiating new trades, the UI has to subscribe to updates to reflect updated projections in an asynchronous manner.
Take a look at here for an overview of CQRS and event sourcing.
I have a composite structure in my domain where the leaf node (Allocation) has a DurationChanged event that I would like to use at the top of my presentation layer view model structure (in the TimeSheetViewModel), and I am wondering what the best way is to get to it.
Options that come to mind include:
Subscribe to it in the TimeSheetComposite. Each composite is ultimately composed of Allocations, and the TimeSheetComposite is the Model to the TimeSheetViewModel. It seems I would also need an event in the TimeSheetComposite that gets fired when a child DurationChanged event is fired; the TimeSheetViewModel would subscribe to the latter event.
Ignore the DurationChanged event and just follow the INPC chain that bubbles up to the TimeSheetViewModel when AllocationViewModel.Amount is changed. I wouldn't have a useful piece of information, specifically the old Amount prior to the edit, but I can calculate the needed end results cheaply enough if necessary.
Make the DurationChanged event a Domain Event; I do not currently use domain events, but I sure like the concept and it looks like there is enough code in Udi's article to get started with it.
Set up some sort of Event Aggregator to publish & subscribe to DurationChanged. I am not very sure yet what the difference is yet between Domain Events and Event Aggregators are, and whether they are complimentary or alternative approaches to solving the same thing. The implementation here using Rx looks promising.
In this design, the TimeSheetViewModel needs to know when an Allocation.Duration has changed so it can get a new total of all allocation durations by date.
How would you provide the DurationChanged notice?
Cheers,
Berryl
Domain Composite structure & event
Presentation layer structure
I wound up listening for the leaf event in the (TimeSheet)Composite, and then essentially re-throwing a similar event there to make it easy for the (TimeSheet)ViewModel to listen to it.
When I understand DomainEvents / EventAggregators better I will revisit this one.
Cheers,
Berryl