I use the Flowable.merge operator with a buffer of size 1 for each of two upstreams.
Flowable.merge(sources, 2, 1)
.observeOn(Schedulers.io(), false, 1)
Is it possible to give priority to one of the streams when the downstream calls request(1)?
When both streams have an item in their queue (full), the next request(1) should emit the item from the first source.
The sources can be two subjects.
You would need to create a specialized observable that implemented the priority handling.
The operator would construct two subscribers and subscribe them to the two observer chains. Each subscriber would then handle the merging of the streams. The "priority one" subscriber would receive a value, check whether the downstream had requested a value, and send it on. The "priority two" subscriber would receive a value and hold on to it if there was a "priority one" value waiting to be sent. Each subscriber would issue request(1) whenever it emitted a value downstream.
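As a much-simplified illustration of the priority decision (not the full backpressure-correct operator described above), one could tag each item with its source's priority and arrival order and pull everything through a single queue. PriorityMerge, Tagged and prioritized are made-up names here, and completion, cancellation and the buffer-of-1 upstream requests are ignored for brevity:

    import io.reactivex.Flowable;
    import java.util.concurrent.PriorityBlockingQueue;
    import java.util.concurrent.atomic.AtomicLong;

    public class PriorityMerge {

        // An item tagged with its source priority and arrival sequence.
        static final class Tagged<T> implements Comparable<Tagged<T>> {
            final int priority;   // 0 = first (preferred) source, 1 = second
            final long seq;       // arrival order, used as the tie-breaker
            final T value;
            Tagged(int priority, long seq, T value) {
                this.priority = priority; this.seq = seq; this.value = value;
            }
            @Override public int compareTo(Tagged<T> o) {
                int c = Integer.compare(priority, o.priority);
                return c != 0 ? c : Long.compare(seq, o.seq);
            }
        }

        // When both sources have queued items, the first source wins;
        // otherwise whichever item is available gets emitted.
        public static <T> Flowable<T> prioritized(Flowable<T> first, Flowable<T> second) {
            PriorityBlockingQueue<Tagged<T>> queue = new PriorityBlockingQueue<>();
            AtomicLong seq = new AtomicLong();
            first.subscribe(v -> queue.put(new Tagged<>(0, seq.getAndIncrement(), v)));
            second.subscribe(v -> queue.put(new Tagged<>(1, seq.getAndIncrement(), v)));
            // generate() invokes this once per item requested downstream
            return Flowable.generate(emitter -> emitter.onNext(queue.take().value));
        }
    }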
Fortunately, RxJava is open source, and, even more fortunately, there are several blogs that describe the proper implementation of RxJava operators. This blog by David Karnok has very well written descriptions of implementing many operators. His postings describe in excellent detail how to avoid the many pitfalls of threaded data structure access.
This is easy for projections that subscribe to all events from the stream: you just keep the version of the last event applied on your read model. But what do you do when the projection is a composite of multiple streams? Do you keep the version of each stream that partakes in the projection? But then what about the gaps, if you are not subscribing to all events? At most you can assert that the version is greater than the last one. How do others deal with this? Do you respond to every event and bump up the version(s)?
For the EventStore, I would suggest using the $all stream as the default stream for any read-model subscription.
I have used the category stream that essentially produces the snapshot of a given entity type but I stopped doing so since read-models serve a different purpose.
It might not be desirable to use the $all stream, as it might also contain events that aren't domain events; integration events could be an example. In this case, adding some attributes either to the event contracts or to the metadata can help you create an internal (JS) projection that builds a special all-stream for domain events, or for any event category in that regard, which you can subscribe to. You can also use a negative condition: for example, filter out all system events and those whose original stream name starts with Integration.
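For illustration, that negative condition might look like the following predicate (hypothetical names; EventStore's actual internal projections are written in JS):

    import java.util.function.BiPredicate;

    // Keep an event only if it is not a system event (type starting with '$')
    // and its original stream name does not start with "Integration".
    BiPredicate<String, String> isDomainEvent =
        (eventType, originalStreamName) ->
            !eventType.startsWith("$")
                && !originalStreamName.startsWith("Integration");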
As well as processing messages in the correct order, you also have the problem of resuming a projection after it is restarted - how do you ensure you start from the right place when you restart?
The simplest option is to use an event store or message broker that both guarantees order and provides some kind of global stream position field (such as a global event number or an ordered timestamp with a disambiguating component such as MongoDB's Timestamp type). Event stores where you pull the events directly from the store (such as eventstore.org or homegrown ones built on a database) tend to guarantee this. Also, some message brokers like Apache Kafka guarantee ordering (again, this is pull-based). You want at-least-once ordered delivery, ideally.
This approach limits write scalability (reads scale fine, using read replicas). You can shard your streams across multiple event store instances in various ways, but then you have to track the position on a per-shard basis, which adds some complexity.
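A minimal sketch of the resume logic under those guarantees, with EventStore, CheckpointStore and Event as hypothetical interfaces (the pattern, not any particular product's API):

    import java.util.List;

    interface Event {
        long getGlobalPosition();      // the global stream position field
    }
    interface EventStore {
        List<Event> readAllForward(long fromPosition, int batchSize);
    }
    interface CheckpointStore {
        long load(String projection);  // returns 0 if no checkpoint yet
        void save(String projection, long position);
    }

    final class ProjectionRunner {
        // Resume from the last stored global position, then checkpoint
        // after each event is applied (polling/back-off elided).
        void run(EventStore store, CheckpointStore checkpoints, String name) {
            long position = checkpoints.load(name);
            while (true) {
                for (Event e : store.readAllForward(position, 100)) {
                    project(e);                        // update the read model
                    position = e.getGlobalPosition();
                    checkpoints.save(name, position);
                }
            }
        }
        void project(Event e) { /* apply to the read model */ }
    }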
If you don't have these ordering, delivery and position guarantees, your life is much harder, and it may be hard to make the system completely reliable. You can:
Hold onto messages for a while after receiving them, before processing them, to allow other ones to arrive
Have code to detect missing or out-of-order messages. As you mention, this only works if you receive all events with a global sequence number or if you track all stream version numbers, and even then it isn't reliable in all cases.
For each individual stream, you keep things in order by fetching them from a data store that knows the correct order. A way of thinking of this is that you query the data store and get a Document Message back.
It may help to review Greg Young's Polyglot Data talk.
As for synchronization of events in multiple streams: the thing you need to recognize is that events in different streams are inherently concurrent.
You can get some loose coordination between different streams if you have happens-before data encoded into your messages. "Event B happened in response to Event A, therefore A happened-before B". That gets you a partial ordering.
If you really do need a total ordering of everything everywhere, then you'll need to be looking into patterns like Lamport Clocks.
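For reference, a Lamport clock itself is only a few lines; this is a generic sketch, not tied to any messaging library:

    import java.util.concurrent.atomic.AtomicLong;

    // Local events increment the counter; on receiving a message, jump past
    // the sender's timestamp. Comparing timestamps (with a process ID as a
    // tie-breaker) then yields a total order consistent with happens-before.
    final class LamportClock {
        private final AtomicLong time = new AtomicLong();

        long tick() {                       // call on each local event or send
            return time.incrementAndGet();
        }

        long onReceive(long senderTime) {   // call when a message arrives
            return time.updateAndGet(local -> Math.max(local, senderTime) + 1);
        }
    }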
The documentation of Subscription#cancel says that
Data may still be sent to meet previously signalled demand after calling cancel.
In which scenario would people expect the publisher to continue to send till previous signalled demand is met?
Also, if I don't want any new items to be sent after cancellation, what should I do?
Unless you are creating low-level operators or Publishers, you don't have to worry about this.
In which scenario would people expect the publisher to continue to send till previous signalled demand is met?
None of the mainstream Reactive Streams libraries do that; they all stop sending items eventually. RxJava 2 and Reactor 3 are pretty eager about this, so you'd most likely see one extra item at most when a cancellation is issued asynchronously at a low level. Akka Streams may signal more than that (last time I checked, they mix control and item signals, and there is a configuration setting for the maximum number of synchronous items per stream, which can lead to multiple items being emitted before the cancellation takes effect).
Also, if I don't want any new items to be sent after cancellation, what should I do?
Depends on what you implement: a Publisher or a Subscriber.
In a Publisher, the most eager method is to set a volatile boolean cancelled field and check it every time you are in some kind of emission loop.
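For example, a simplified sketch against the org.reactivestreams interfaces (real operators also need request accounting, reentrancy and concurrency handling, all elided here):

    import org.reactivestreams.Subscriber;
    import org.reactivestreams.Subscription;

    // Emits the integers [start, end); the emission loop checks the volatile
    // `cancelled` flag on every iteration and stops as soon as it is set.
    final class RangeSubscription implements Subscription {
        final Subscriber<? super Integer> downstream;
        final int end;
        int index;
        volatile boolean cancelled;

        RangeSubscription(Subscriber<? super Integer> downstream, int start, int end) {
            this.downstream = downstream;
            this.index = start;
            this.end = end;
        }

        @Override public void request(long n) {
            // Simplified: assumes no concurrent or reentrant request() calls.
            for (long i = 0; i < n && index < end; i++) {
                if (cancelled) {
                    return;                 // stop emitting once cancelled
                }
                downstream.onNext(index++);
            }
            if (index == end && !cancelled) {
                downstream.onComplete();
            }
        }

        @Override public void cancel() {
            cancelled = true;
        }
    }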
In a Subscriber, you can have a boolean done field that is checked in each onXXX method, so that when you call Subscription.cancel() from onNext, any subsequent call will be ignored.
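And on the Subscriber side, a sketch of that done-flag guard (again simplified; a hypothetical subscriber that cancels after the first item):

    import org.reactivestreams.Subscriber;
    import org.reactivestreams.Subscription;

    // Takes one item, cancels, and ignores anything delivered afterwards.
    final class TakeOneSubscriber implements Subscriber<Integer> {
        Subscription upstream;
        boolean done;

        @Override public void onSubscribe(Subscription s) {
            upstream = s;
            s.request(Long.MAX_VALUE);
        }

        @Override public void onNext(Integer t) {
            if (done) return;              // drop late items after cancel()
            done = true;
            upstream.cancel();
            System.out.println("Got " + t);
        }

        @Override public void onError(Throwable e) {
            if (done) return;
            done = true;
            e.printStackTrace();
        }

        @Override public void onComplete() {
            if (done) return;
            done = true;
        }
    }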
I'm trying to implement a simple CQRS/event-sourcing proof of concept on top of Kafka Streams (as described in https://www.confluent.io/blog/event-sourcing-using-apache-kafka/)
I have 4 basic parts:
commands topic, which uses the aggregate ID as the key for sequential processing of commands per aggregate
events topic, to which every change in aggregate state is published (again, the key is the aggregate ID). This topic has a retention policy of "never delete"
A KTable to reduce aggregate state and save it to a state store
events topic stream ->
group into a KTable by aggregate ID ->
reduce aggregate events to current state ->
materialize as a state store
commands processor - commands stream, left joined with the aggregate state KTable. For each entry in the resulting stream, use a function (command, state) => events to produce the resulting events and publish them to the events topic (a rough sketch of this wiring follows)
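A rough Kafka Streams sketch of that wiring (serdes configuration and error handling are elided; Command, Event, Aggregate and decide() are assumed application types, not Kafka APIs):

    import java.util.List;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;

    StreamsBuilder builder = new StreamsBuilder();

    // events topic -> group by aggregate ID -> reduce to current state,
    // materialized as a state store
    KTable<String, Aggregate> aggregates =
        builder.<String, Event>stream("events")
               .groupByKey()
               .aggregate(Aggregate::empty,
                          (aggregateId, event, state) -> state.apply(event),
                          Materialized.as("aggregate-store"));

    // commands stream, left joined with the aggregate state KTable;
    // (command, state) => events, published back to the events topic
    builder.<String, Command>stream("commands")
           .leftJoin(aggregates, (command, state) -> decide(command, state))
           .flatMapValues((List<Event> events) -> events)
           .to("events");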
The question is - is there a way to make sure I have the latest version of the aggregate in the state store?
I want to reject a command if it violates business rules (for example, a command to modify the entity is not valid if the entity was marked as deleted). But if a DeleteCommand is published followed by a ModifyCommand right after it, the delete command will produce the DeletedEvent, but when the ModifyCommand is processed, the state loaded from the state store might not reflect that yet, and conflicting events will be published.
I don't mind sacrificing command processing throughput, I'd rather get the consistency guarantees (since everything is grouped by the same key and should end up in the same partition)
Hope that was clear :) Any suggestions?
I don't think Kafka is good for CQRS and Event Sourcing yet, the way you described it, because it lacks a (simple) way of ensuring protection from concurrent writes. This article talks about this in detail.
What I mean by "the way you described it" is the fact that you expect a command to generate zero or more events or to fail with an exception; this is classical CQRS with Event Sourcing. Most people expect this kind of architecture.
You could have Event Sourcing, however, in a different style. Your command handlers could yield events for every command that is received (e.g. DeleteWasAccepted). Then, an event handler could eventually handle that event in an Event-Sourced way (by rebuilding the Aggregate's state from its event stream) and emit other events (e.g. ItemDeleted or ItemDeletionWasRejected). So, commands are fire-and-forget, sent asynchronously; the client does not wait for an immediate response. It waits, however, for an event describing the outcome of its command execution.
An important aspect is that the Event handler must process events from the same Aggregate in a serial way (exactly once and in order). This can be implemented using a single Kafka Consumer Group. You can see about this architecture in this video.
Please read this article by my colleague Jesper. Kafka is a great product but actually not a good fit at all for event sourcing:
https://medium.com/serialized-io/apache-kafka-is-not-for-event-sourcing-81735c3cf5c
A possible solution I came up with is to implement a sort of optimistic locking mechanism:
Add an expectedVersion field on the commands
Use the KTable Aggregator to increase the version of the aggregate snapshot for each handled event
Reject commands if the expectedVersion doesn't match the snapshot's aggregate version
This seems to provide the semantics I'm looking for
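For illustration, the check itself might look like this (hypothetical types; the snapshot version is the one maintained by the KTable Aggregator):

    import java.util.List;

    // Reject a command issued against stale state instead of emitting
    // conflicting events. Command, Snapshot, Event and CommandRejected
    // are assumed application types.
    List<Event> handle(Command command, Snapshot snapshot) {
        long currentVersion = snapshot == null ? 0L : snapshot.version();
        if (command.expectedVersion() != currentVersion) {
            return List.of(new CommandRejected(command, currentVersion));
        }
        return decide(command, snapshot);  // normal business-rule validation
    }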
We are developing an application that will receive events from various systems via a message queue (Azure), but it is possible that some events (messages) will not arrive in the order they were sent. These events will be received and processed by a central CQRS/ES based system, but my worry is that if the events are placed in the event store in the wrong order we will get garbage out (for example "order create" after "add order item").
Are typical ES systems meant to resolve this issue or are we meant to ensure that such messages are put in the right order before being pushed into the event store? If you have links to articles that back up either view it would help.
Edit: I think my description is clearly far too vague, so the responses, while helpful in understanding CQRS/ES, do not quite answer my problem. I'll add a little more detail and hopefully someone will recognise the problem.
Firstly the players.
the front-end web site (not actually relevant to this problem), which delivers orders to the management system.
our management system, which takes orders from the web site, passes them to the warehouse, and is hosted on site.
the warehouse, which accepts orders, fulfils them if possible, and notifies us when an order is fulfilled or cannot be partially or completely fulfilled.
Linking the warehouse to the management system is a fairly thin Azure cloud based coupling. Messages from the warehouse are sent to a WCF/SOAP layer in the cloud, parsed, and sent over the message bus. Messages to the warehouse are sent over the message bus and then, again in the cloud, converted into SOAP calls to a server at the warehouse.
The warehouse is very careful to ensure that messages it sends have identifiers that increment without a gap so we can know when a message is missed. However when we take those messages and forward them to the management system they are transported over the message bus and could, in theory, arrive in the wrong order.
Now, given that we have a sequence number in the messages, we could ensure the messages are put back in the right order before they are sent to the CQRS/ES system, but my question is: is that necessary? Can the ES actually be used to reorder the events into the logical order in which they were intended?
Each message that arrives in Service Bus is tagged with a SequenceNumber. The SequenceNumber is a monotonically increasing, gapless 64-bit integer sequence, scoped to the Queue (or Topic), that provides an absolute order criterion by arrival in the Queue. That order may differ from the delivery order due to errors/aborts, and it exists so you can reconstitute the order of arrival.
Two features in Service Bus specific to management of order inside a Queue are:
Sessions. A sessionful queue puts locks on all messages with the same SessionId property, meaning that FIFO is guaranteed for that sequence, since no messages later in the sequence are delivered until the "current" message is either processed or abandoned.
Deferral. The Defer method puts a message aside if the message cannot be processed at this time. The message can later be retrieved by its SequenceNumber, which pulls from the hidden deferral queue. If you need a place to keep track of which messages have been deferred for a session, you can put a data structure holding that information right into the message session, if you use a sessionful queue. You can then pick up that state again elsewhere on an accepted session if you, for instance, fail over processing onto a different machine.
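A sketch of the defer-and-resequence pattern (method names as in the azure-messaging-servicebus Java client to the best of my knowledge, so treat them as an assumption; session state and error paths are elided, and appSequence() stands in for reading the application-level gapless sequence number from the payload):

    import com.azure.messaging.servicebus.ServiceBusReceivedMessage;
    import com.azure.messaging.servicebus.ServiceBusReceiverClient;
    import java.util.HashMap;
    import java.util.Map;

    // Defer out-of-order messages, then pull them back by their broker
    // SequenceNumber once their turn comes.
    void pump(ServiceBusReceiverClient receiver, long firstExpected) {
        long expected = firstExpected;               // next app sequence wanted
        Map<Long, Long> deferred = new HashMap<>();  // appSeq -> broker SequenceNumber

        for (ServiceBusReceivedMessage msg : receiver.receiveMessages(10)) {
            long appSeq = appSequence(msg);
            if (appSeq == expected) {
                handle(msg);
                receiver.completeMessage(msg);
                expected++;
                while (deferred.containsKey(expected)) {   // drain successors
                    ServiceBusReceivedMessage next =
                        receiver.receiveDeferredMessage(deferred.remove(expected));
                    handle(next);
                    receiver.completeMessage(next);
                    expected++;
                }
            } else {
                deferred.put(appSeq, msg.getSequenceNumber());
                receiver.deferMessage(msg);          // put aside for later
            }
        }
    }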
These features have been built specifically for document workflows in Office 365 where order obviously matters quite a bit.
I would have commented on KarlM's answer but stackoverflow won't allow it, so here goes...
It sounds like you want the transport mechanism to provide transactional locking on your aggregate. To me this sounds inherently wrong.
It sounds as though the design being proposed is flawed. Having had this exact problem in the past, I would look at your constraints. Either you want to provide transactional guarantees to the website, or you want to provide them to the warehouse. You can't do both, one always wins.
To be fully distributed: If you want to provide them to the website, then the warehouse must ask if it can begin to fulfil the order. If you want to provide them to the warehouse, then the website must ask if it can cancel the order.
Hope that is useful.
For events generated from a single command handler/aggregate in an "optimistic locking" scenario, I would assume you would include the aggregate version in the event, and thus those events are implicitly ordered.
Events from multiple aggregates should not need a relative order, because the transactional guarantees apply only within a single aggregate.
Check out http://cqrs.nu/Faq/aggregates , http://cqrs.nu/Faq/command-handlers and related FAQs
For an intro to ES and optimistic locking, look at http://www.jayway.com/2013/03/08/aggregates-event-sourcing-distilled/
You say:
"These events will be received and processed by a central CQRS/ES based system but my worry is that if the events are placed in the event store in the wrong order we will get garbage out (for example "order create" after "add order item")."
There seems to be a misunderstanding about what CQRS pattern with Event Sourcing is.
Simply put, Event Sourcing means that you change Aggregates (as per DDD terminology) via internally generated events, the Aggregate's persistence is represented by events, and the Aggregate can be restored by replaying events. This means that the scope is quite small: the Aggregate itself.
Now, CQRS with Event Sourcing means that these events from the Aggregates are published and used to create Read projections, or other domain models that have different purposes.
So I don't really get your question given the explanations above.
Related to Ordering:
there is already an answer mentioning optimistic locking, so events generated inside a single Aggregate must be ordered and optimistic locking is a solution
Read projections processing events in order: a solution I used in the past was to publish events on RabbitMQ and process them with Storm.
RabbitMQ has some guarantees about ordering, and Storm has some processing-affinity features. Storm (as far as I remember) allows you to specify that for a given ID (for example an Aggregate ID) the same handler will be used, hence the events are processed in the same order as received from RabbitMQ.
The article on MSDN https://msdn.microsoft.com/en-us/library/jj591559.aspx states "Stored events should be immutable and are always read in the order in which they were saved" under "Performance, Scalability, and consistency". This clearly means that appending events out of order is not tolerated. The same article also states multiple times that while events cannot be altered, corrective events can be made. This would imply again that events are processed in the order they are received to determine the current truth (the state of the aggregate). My conclusion is that we should fix the messaging order problem before posting events to the event store.
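Concretely, with the warehouse's gapless sequence numbers, that reordering step can be a small buffer in front of the event store; this is a generic sketch, independent of the message bus:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Consumer;

    // Releases messages strictly in sequence-number order, buffering any
    // that arrive early. Relies on the identifiers incrementing without gaps.
    final class Resequencer<T> {
        private final Map<Long, T> pending = new HashMap<>();
        private final Consumer<T> downstream;   // e.g. append to the event store
        private long next;

        Resequencer(long firstSequence, Consumer<T> downstream) {
            this.next = firstSequence;
            this.downstream = downstream;
        }

        void accept(long sequence, T message) {
            pending.put(sequence, message);
            while (pending.containsKey(next)) {  // emit the contiguous run
                downstream.accept(pending.remove(next));
                next++;
            }
        }
    }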
Microsoft says, "developers represent asynchronous data streams with Observables." I'm trying to reason through the idea. If I were to tackle the concept implicitly, I would imagine that it's just, anything that could be observed in the data stream. Code should be more precise.
How would I know an "observable" if I saw it? Could you give me a better explanation of what an "observable" is?
Microsoft says, "developers represent asynchronous data streams with Observables." I'm trying to reason through the idea. If I were to tackle the concept implicitly, I would imagine that it's just, anything that could be observed in the data stream. Code should be more precise.
The code actually is more precise. An Observable is represented by the IObservable<T> interface. The main job of IObservable<T> is to handle IObserver<T>s. These two work in tandem: an IObservable<T> represents a stream of type T that can be subscribed to. An IObserver<T> represents a handler that subscribes to the observable to handle those events.
There are three types of messages that an observable can emit:
OnNext: The next instance of T
OnCompleted: A non-error (empty-message) terminator.
OnError: An error terminator.
However, observables don't emit these messages directly; rather, they emit them only to subscribed observers.
How would I know an "observable" if I saw it? Could you give me a better explanation of what an "observable" is?
Imagine a service that reports the latest Apple stock price. You can think of the service as an observable. To get this information, you would have to subscribe to the service. Once subscribed, the service could emit one of three messages:
The next (most recent) stock price
Market closed
Some sort of failure (connection failure would be most typical)
You would in turn write a handler to handle these three types of messages. That handler would be an observer to the observable stream of prices.
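In RxJava terms (the question is about Rx.NET, but the contract is identical), that service and its observer might look like this; applePrices is a made-up source:

    import io.reactivex.Observable;

    // A made-up stock-price stream illustrating the three message types.
    Observable<Double> applePrices = Observable.create(emitter -> {
        emitter.onNext(189.95);   // OnNext: the latest price
        emitter.onNext(190.10);
        emitter.onComplete();     // OnCompleted: the market closed
        // emitter.onError(new IOException("connection lost")) would be OnError
    });

    // The observer: one handler per message type.
    applePrices.subscribe(
        price -> System.out.println("price: " + price),
        error -> System.err.println("failed: " + error),
        () -> System.out.println("market closed"));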
From Wikipedia:
The observer pattern is a software design pattern in which an object, called the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods.
This definition is clear when applied to the events used in user interfaces: you observe button clicks by providing an event handler which the button calls when it is clicked. In this case, the button is an observable, which notifies a number of observers in the form of event handlers.
Applied to reactive programming, an observable is just a stream of events that you can subscribe to - i.e. observe. Think of it as a pipe through which events traverse and that you can peek into. You do so by observing the stream and handling those events you are interested in. Furthermore, operations can be performed over streams - for instance, merging a couple of streams into a new one.
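For instance, merging two streams (an RxJava illustration; the operator exists under the same name in most Rx implementations):

    import io.reactivex.Observable;

    // Subscribers of `merged` see events from both sources as they arrive.
    Observable<String> clicks = Observable.just("click:ok", "click:cancel");
    Observable<String> keys = Observable.just("key:a", "key:b");
    Observable<String> merged = Observable.merge(clicks, keys);
    merged.subscribe(System.out::println);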
Both the publishing of events to the stream and the handling of those events - your observer which processes them - can be done asynchronously which promotes scalability.
Similar concepts are those of messages, topics, and subscribers: some stakeholder can publish messages to a topic, to which many different stakeholders can subscribe. Respectively, these would correspond to the events, the observable stream, and the observers.
Microsoft uses the terms Observer and Observable, while some other reactive frameworks may use other terms. The Getting Started section of Introduction to Rx can help you further clarify these concepts, and the whole book is a free gem. Note that this book prefers the term sequence to refer to a stream of events.
I would imagine that it's just, anything that could be observed in the data stream.
That's right. Actually, in Microsoft's Rx, the core is just the two interfaces defining the contract between observers and observables; the rest is pretty much abstracted away.
I think the terminology varies, but if you search for functional reactive programming papers, e.g. on Google Scholar, you will find definitions of the basic concepts behavior and event. I think the following two definitions from the paper Functional Reactive Programming from First Principles are representative:
Behavior is a value of type a that changes over time
Event is a time-ordered sequence of event occurrences
Intuitively, a behavior is a stream transformer: a function that takes an infinite stream of sample times, and yields an infinite stream of values. Similarly, an event is a stream transformer, and can be thought of as a behavior where, at each time t, the event either occurs or does not occur.
It seems MS fuses both into the concept of an Observable.
I think it is good to read some background papers to get the terminology. The papers by Conal Elliott are a good start. Or you could enroll in Principles of Reactive Programming on Coursera if you want a more interactive introduction.