CQRS & ElasticSearch - using ElasticSearch to build read model

Does anyone use ElasticSearch for building the read model in a CQRS approach? I have some questions related to such a solution:
Where do you store your domain events? In a JDBC database? In ElasticSearch?
Do you build indexes with event handlers that process domain events, or by using ElasticSearch's River functionality?
How do you handle a complete rebuild of the view model, for example when a view is corrupted? Do you replay all events to rebuild the view?

Where the authoritative repository for your domain events is located is an implementation detail. For example, you could store serialized versions on S3, in CouchDB, or in any number of other storage implementations. The easiest option if you're just getting started is a relational database.
Typically people use event handlers that understand the business intent behind each message and can then properly denormalize the message into the read model structure appropriate for the needs of your views.
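As an illustration, here is a minimal Java sketch of such a denormalizing handler. All names here are hypothetical: ReadModelIndex stands in for whatever read-store client you actually use (an ElasticSearch client, for instance), and OrderPlaced is an invented event.

    import java.util.Map;
    import java.util.UUID;

    // Hypothetical event carrying the business facts the view needs.
    record OrderPlaced(UUID orderId, String customerName, long totalCents) {}

    // Stand-in for the read-store client (e.g. an ElasticSearch client).
    interface ReadModelIndex {
        void index(String indexName, String id, Map<String, Object> document);
    }

    // The handler understands the intent of the event and flattens it
    // into the exact shape the view will query.
    class OrderPlacedHandler {
        private final ReadModelIndex readModelIndex;

        OrderPlacedHandler(ReadModelIndex index) { this.readModelIndex = index; }

        void handle(OrderPlaced event) {
            readModelIndex.index("orders", event.orderId().toString(), Map.of(
                    "customerName", event.customerName(),
                    "totalCents", event.totalCents(),
                    "status", "PLACED"));
        }
    }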
If the view model is ever corrupted or perhaps you have a bug in a view model handler, there are a few simple steps to follow after fixing the bug:
1. Temporarily queue up the flow of events arriving from the domain--these are the typical messages being published as your domain does its work. We still want these messages, just not yet. This could be done by turning off any message bus or by not connecting to your queuing infrastructure, if you use one.
2. Read all events from event storage. As each event is received (this can be done through a simple DB query), run it through the appropriate message handler. Make sure that you keep track of the last 10,000 (or so) identifiers of all messages processed.
3. Now reconnect to your queues and start processing normally. If a message's identifier has already been seen, drop the message. Otherwise, process it normally.
The reason for tracking identifiers is to avoid a race condition where you're reading all events from the event store while the same message is also coming across through the message queue.
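A minimal Java sketch of this replay-then-dedup idea, assuming invented names throughout (StoredEvent, dispatchToHandler) and the roughly-10,000-identifier window mentioned above:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.UUID;

    record StoredEvent(UUID messageId, String type, String payload) {}

    // Bounded memory of recently processed message ids; the oldest entry
    // is evicted once the cap is exceeded (a simple LRU via LinkedHashMap).
    class RecentMessageIds {
        private static final int CAPACITY = 10_000;
        private final Map<UUID, Boolean> ids =
                new LinkedHashMap<>(CAPACITY, 0.75f, false) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<UUID, Boolean> eldest) {
                        return size() > CAPACITY;
                    }
                };

        void remember(UUID id) { ids.put(id, Boolean.TRUE); }
        boolean seen(UUID id) { return ids.containsKey(id); }
    }

    class ReadModelRebuilder {
        private final RecentMessageIds recent = new RecentMessageIds();

        // Phase 1: replay everything from event storage (queues still paused).
        void rebuild(Iterable<StoredEvent> allEvents) {
            for (StoredEvent e : allEvents) {
                dispatchToHandler(e);
                recent.remember(e.messageId());
            }
        }

        // Phase 2: queues reconnected; drop anything already replayed.
        void onLiveMessage(StoredEvent e) {
            if (recent.seen(e.messageId())) return; // seen during replay
            dispatchToHandler(e);
            recent.remember(e.messageId());
        }

        private void dispatchToHandler(StoredEvent e) {
            // denormalize into the read model, as in normal processing
        }
    }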
A closely related technique, which involves keeping track of all message identifiers, can be found here: http://blog.jonathanoliver.com/2011/03/removing-2pc-two-phase-commit.html

Related

Axon Event Published Multiple Times Over EventBus

Just want to confirm the intended behavior of Axon versus what I'm seeing in my application. We have a customized Kafka publisher integrated with the Axon framework, as well as a custom Cassandra-backed event store.
The issue I’m seeing is as follows: (1) I publish a command (e.g. CreateServiceCommand) which hits the constructor of the ServiceAggregate, and then (2) A ServiceCreatedEvent is applied to the aggregate. (3) We see the domain event persisted in the backend and published over the EventBus (where we have a Kafka consumer).
All well and good, but suppose I publish that same command again with the same aggregate identifier. I do see the ServiceCreatedEvent being applied to the aggregate in the debugger, but since a domain event record already exists with that key, nothing is persisted to the backend. Again, all well and good; however, I see the ServiceCreatedEvent being published out to Kafka and consumed by our listener, which is unexpected behavior.
I’m wondering whether this is the expected behavior of Axon, or if our Kafka integrations ought to be ensuring we’re not publishing duplicate events over the EventBus.
Edit:
I swapped in Axon's JPA event store and saw the following log when attempting to issue a command to create an aggregate that already exists, so the issue is indeed due to a defect in our custom event store:
"oracle.jdbc.OracleDatabaseException: ORA-00001: unique constraint (R671659.UK8S1F994P4LA2IPB13ME2XQM1W) violated\n\n\tat oracle.jdbc.driver.T4CTTIoer11.processError
To be honest, the given explanation has a couple of holes which make it odd, and which make it hard to pinpoint where the problem lies.
In short, no: Axon would not publish an event twice as a result of dispatching the exact same command a second time. Beyond that, it depends on your implementation. If the command creates an aggregate, then you should see a constraint violation on the uniqueness requirement of the aggregate identifier and (aggregate) sequence number. If it's a command that works on an existing aggregate, whether it is idempotent depends on your implementation.
From your transcript I guess you are talking about a command handler that creates an aggregate, so the behavior you are seeing strikes me as odd. Either the custom event store introduces this undesired behavior, or it's due to not using Axon's dedicated Kafka Extension.
Also note that using a single solution for event storage and message distribution, like Axon Server, avoids the problem entirely. You'd no longer need to configure any custom event handling and publication to Kafka at all, saving you development work and infrastructure coordination. In addition, it provides the guarantees I've discussed earlier. For more insights on the why/how of Axon Server, you can check this other SO response of mine.
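To make the ordering concrete, here is a rough Java sketch (not Axon's actual internals; EventStore, EventBus and the exception types are invented stand-ins) of why persisting before publishing keeps duplicates off the bus: if the unique (aggregate identifier, sequence number) constraint fires, publication is never reached.

    import java.util.UUID;

    record DomainEvent(UUID aggregateId, long sequenceNumber, String payload) {}

    interface EventStore { void append(DomainEvent e) throws DuplicateKeyException; }
    interface EventBus { void publish(DomainEvent e); }

    class DuplicateKeyException extends Exception {}
    class ConcurrencyException extends RuntimeException {
        ConcurrencyException(String message, Throwable cause) { super(message, cause); }
    }

    class StoreThenPublish {
        private final EventStore eventStore;
        private final EventBus eventBus;

        StoreThenPublish(EventStore store, EventBus bus) {
            this.eventStore = store;
            this.eventBus = bus;
        }

        // Publish only after a successful append: when the unique
        // (aggregateId, sequenceNumber) constraint is violated,
        // nothing goes out to downstream consumers.
        void appendAndPublish(DomainEvent event) {
            try {
                eventStore.append(event);
            } catch (DuplicateKeyException e) {
                throw new ConcurrencyException(
                        "Duplicate sequence for aggregate " + event.aggregateId(), e);
            }
            eventBus.publish(event); // reached only if the append succeeded
        }
    }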

Client Interaction With Event Sourcing

I have been recently looking into event sourcing and have some questions about the interactions with clients.
So event sourcing sounds great: decoupling all your microservices, keeping your information in immutable events, and building stored state off of them to fit your needs is really handy. Having events propagate through your system/services, with each service reacting to them in its own way, is all fine.
The issue I am having lies in understanding the client interaction.
So you want clients to interact with the system, but they now need to do this via events. They can no longer submit a state to mutate your existing one.
So the question is: how do clients fire off specific events and interact with, not just an event-based system, but a system based on event sourcing?
My understanding is that you no longer use the REST API as resources (which you can get, update, delete, etc., handling them as resources), but instead you post to an endpoint as an event.
So how do these endpoints work?
My second question is: how does the user get responses back?
For instance, let's say we have an event to place an order.
You're going to fire off an event and it's going to do its thing. Again, my understanding is that you don't validate the request up front, e.g. checking whether the user placing the order has enough money, but instead fire it off to be placed and let it be handled within the system.
e.g. it will now be:
- order placed
- this will be picked up by the pricing service, which will fire either a money-reserved or a money-exceeded event based on whether the user can afford it.
- the order service will then listen for those events and mark the order as confirmed or as denied for insufficient credit.
So because this is an async process and the user has fired and forgotten, how do you then show the user it has failed or succeeded? Do you show them an order confirmation page with the order status as it is (even if it's pending), or do you poll until it changes (web sockets or something)?
I'm sorry if a lot of this is nonsense; I am still learning about this architecture and am very much in the mindset of a monolith with REST responses.
Any help would be appreciated.
The issue I am having lies in understanding the client interaction.
Some of the issue may be understanding, but I promise you a fair share of the issue is that the literature sucks.
In particular, the word "Event" gets re-used a lot of different ways. If you aren't paying very careful attention to which meaning is being used, you are going to get knotted.
Event Sourcing is really about persistence: how does a microservice store its private copy of state for later re-use? Instead of destructively overwriting our previous state, we write new information that links back to the previous state. If you imagine each microservice storing each change of state as a commit in its own git repository, you are in the right ballpark.
That's a different animal from using Event Messages to communicate information between one microservice and another.
There's some obvious overlap, of course, because the one message that you are likely to share with other microservices is "I just changed state".
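A minimal sketch of the "persistence as append-only history" idea in Java; the Account/AccountEvent names are invented for illustration:

    import java.util.ArrayList;
    import java.util.List;

    // Events record what happened; they are appended, never overwritten.
    sealed interface AccountEvent permits Deposited, Withdrawn {}
    record Deposited(long amountCents) implements AccountEvent {}
    record Withdrawn(long amountCents) implements AccountEvent {}

    class Account {
        private final List<AccountEvent> history = new ArrayList<>();
        private long balanceCents; // derived state, rebuildable from history

        // Commands validate against current state, then append an event.
        void deposit(long amountCents) { apply(new Deposited(amountCents)); }

        void withdraw(long amountCents) {
            if (amountCents > balanceCents) {
                throw new IllegalStateException("insufficient funds");
            }
            apply(new Withdrawn(amountCents));
        }

        private void apply(AccountEvent e) {
            history.add(e); // persistence = appending, like a git commit
            if (e instanceof Deposited d) balanceCents += d.amountCents();
            if (e instanceof Withdrawn w) balanceCents -= w.amountCents();
        }

        // Current state is always recoverable by replaying the history.
        static Account replay(List<AccountEvent> events) {
            Account account = new Account();
            events.forEach(account::apply);
            return account;
        }
    }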
So how do these endpoints work?
The same way that web forms do. I send you a representation of a form, the client displays the form to you. You fill in your data and submit the form, the client processes the contents of the form, and sends back to me an HTTP request with a "FormSubmitted" event in the message body.
You can achieve similar results by sending new representations of the state, but it's a bit error-prone to strip away the semantic intent and then try to guess it again on the server. So you are more likely to see task-based user interfaces, or protocols that clearly identify the semantics of the change.
When the outside world is the authority for some piece of data (a shopper's shipping address, for example), you are more likely to see the more traditional "just edit the existing representation" approach.
So because this is an async process and the user has fired and forgotten, how do you then show the user it has failed or succeeded?
Fire and forget really doesn't work for a distributed protocol on an unreliable network. In most cases, at-least-once delivery is important, so Fire until verified is the more common option. The initial acknowledgement of the message might be something like 202 Accepted -- "We received your message, we wrote it down, here's our current progress, here are some links you can fetch for progress reports".
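For instance, a hypothetical Spring WebFlux-style endpoint (class name, request type and paths are all invented) that acknowledges the command with 202 Accepted and hands back a link the client can poll for progress:

    import java.net.URI;
    import java.util.UUID;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    // Hypothetical endpoint: accept the command, acknowledge with
    // 202 Accepted, and point the client at a progress resource.
    @RestController
    class OrderCommandController {

        record PlaceOrderRequest(String sku, int quantity) {}

        @PostMapping("/orders")
        ResponseEntity<Void> placeOrder(@RequestBody PlaceOrderRequest request) {
            UUID orderId = UUID.randomUUID();
            // ... enqueue the command for asynchronous processing here ...
            return ResponseEntity
                    .accepted()                                  // 202 Accepted
                    .location(URI.create("/orders/" + orderId))  // progress report link
                    .build();
        }
    }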
It doesn't seem to me that event sourcing fits with the traditional REST model where you CRUD a resource.
Jim Webber's 2011 talk may help to prune away the noise. A REST API is a disguise that your domain model wears; you exchange messages about manipulating resources, and as a side effect your domain model does useful work.
One way you could do this that would look more "traditional" is to work with representations of the event stream. I do a GET /08ff2ec9-a9ad-4be2-9793-18e232dbe615 and it returns me a representation of a list of events. I append a new event onto the end of that list, and PUT /08ff2ec9-a9ad-4be2-9793-18e232dbe615, and interesting side effects happen. Or perhaps I instead create a patch document that describes my change, and PATCH /08ff2ec9-a9ad-4be2-9793-18e232dbe615.
But more likely, I would do something else: instead of GET /08ff2ec9-a9ad-4be2-9793-18e232dbe615 to fetch a representation of the list of events, I'd probably GET /08ff2ec9-a9ad-4be2-9793-18e232dbe615 to fetch a representation of the available protocols, which is to say, a document filled with hyperlinks. From there, I might GET /08ff2ec9-a9ad-4be2-9793-18e232dbe615/603766ac-92af-47f3-8265-16f003ce5a09 to obtain a representation of the data collection form. I fill in the details of my event, submit the form, and POST /08ff2ec9-a9ad-4be2-9793-18e232dbe615 the form data to the server.
You can, of course, use any spelling you like for the URI.
In the first case, we need something like an HTTP-capable document editor; the second case uses something more like a web browser.
If there were lots of different kinds of events, then the second case might well have lots of different form resources, all submitting POST /08ff2ec9-a9ad-4be2-9793-18e232dbe615 requests.
(You don't have to have all of the forms submitting to the same URI, but there are advantages to consider).
In a non-event-sourcing pattern, I guess that would first be put into the database, and then the event gets raised.
Even when you aren't event sourcing, there may still be some advantages to committing events to your durable store before emitting them. See Pat Helland: Data on the Outside versus Data on the Inside.
So you want clients to interact with the system, but they now need to do this via events.
Clients don't have to. Clients may not even be aware of the underlying event store.
There are a number of trade-offs to consider and decisions to make when implementing an event-sourced system. To start with, you can try to name a few pre-computer-era examples of event-sourced systems and look at their non-functional characteristics.
So the question is how do clients fire off specific events
Clients don't send events. Rather, they should express an intent (a command). It is then the responsibility of the event-sourced system to validate the intent and either reject it, or accept it and store the corresponding event. The stored event means that an intent to change the system's state was accepted, and it confirms the change.
My understanding is that you no longer use the REST API as resources
REST is one of the options; you just consider different things to be resources. A command can be a REST resource. An event-sourced entity can be a resource, to which you POST a command. If you like it async, you can later GET the command to check its status. You can GET an entity to know its current state. You can GET events from a class of entities as a means of subscription.
If we are talking about an end user, then most likely they don't deal with the event store directly. There is some third tier in between, which does CQRS. From a user-client perspective it can be exposed over REST, GraphQL, SOAP, gRPC or even e-mail - whatever transport solution you find suitable. The command-processing part of CQRS is what is specifically domain-driven; it decides which intents to accept and which to reject.
The event store itself is responsible for data consistency, i.e. it should not allow two concurrent events leading to an invalid state to be published. This is what pre-computer event-sourced systems are good at: you usually have some physical object as the entity, so you lock it for update just by getting hold of it.
An end-user client then usually reads from some prepared read model. The responsibility of the read (R in CQRS) component is to prepare read-optimised data for clients. This data may come from multiple event-sourced entities of the same or different classes. Again, a client may interact with a read model over whatever transport is suitable.
While an event store is immediately consistent, a read model is eventually consistent. But it's up to you to tune how eventual that is.
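A small Java sketch of such a read component, with invented names: a projection folds events into a read-optimised view that clients query, and the view simply lags the event store a little.

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    record OrderEvent(UUID orderId, String type) {}

    // Folds events into a read-optimised view; the view is eventually
    // consistent because it only updates as events are delivered.
    class OrderSummaryProjection {
        private final Map<UUID, String> statusByOrder = new ConcurrentHashMap<>();

        // Called by whatever delivers events (poller, bus subscription, ...).
        void on(OrderEvent event) {
            switch (event.type()) {
                case "OrderPlaced"  -> statusByOrder.put(event.orderId(), "PENDING");
                case "OrderDenied"  -> statusByOrder.put(event.orderId(), "DENIED");
                case "OrderShipped" -> statusByOrder.put(event.orderId(), "SHIPPED");
                default -> { /* this view ignores everything else */ }
            }
        }

        // Queries never touch the event store; they read the prepared view.
        String statusOf(UUID orderId) {
            return statusByOrder.getOrDefault(orderId, "UNKNOWN");
        }
    }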
Just try to throw REST out of the architecture for a while and consider it one of the available transport options - that may help you get to the root of things.

How can reactive programming react to database changes?

I am new to the topic of reactive programming and therefore have some questions.
I am developing a small software.
I would like to take the opportunity to get to know reactive programming better.
So I looked at Spring's project-reactor.
I also use R2DBC to reactively access the database.
I would like to know if there is any way for the database to notify about changes.
Or rather: if a user saves an entry in the database, then the server (for example, a RestController) should be notified.
How could I go about doing that?
The corresponding controllers, configuration, entities, etc. I have already implemented.
Sorry for spelling mistakes.
Addendum: the updates to the frontend are then made via Server-Sent Events.
Basically, what Nick Tsitlakidis mentioned. Let me add a couple of things here.
The typical database query pattern is to query for a number of records. The database responds with its results and indicates that the query is complete once the server has sent all records to your application. If new records arrive while the query is active or after it has completed, you do not see these changes: while the query runs, isolation hides them, and once the query is complete you no longer hold a reference to it.
The feature you're asking about is event-driven consumption of data. Databases call this feature continuous queries. Some stores (such as MongoDB with tailable cursors, or Postgres with logical decoding) come with features that allow keeping a cursor/query open so that your client is able to receive continuous updates.
Kafka and JMS follow a similar idea: they send messages that are typically consumed by listeners, or even through a reactive stream.
So it all boils down to the technology that you're using.
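When the store offers no continuous-query feature, one common workaround is to poll reactively. A minimal Project Reactor sketch, with invented Entry/EntryRepository types (with Spring Data R2DBC the repository method could be a derived query method):

    import java.time.Duration;
    import java.time.Instant;
    import reactor.core.publisher.Flux;

    record Entry(long id, Instant createdAt) {}

    // Invented abstraction; with Spring Data R2DBC this could be a
    // repository method like findByCreatedAtAfter(Instant since).
    interface EntryRepository {
        Flux<Entry> findByCreatedAtAfter(Instant since);
    }

    // Poll on an interval and pass along only rows not seen before.
    class ChangePoller {
        private volatile Instant lastSeen = Instant.EPOCH;

        Flux<Entry> changes(EntryRepository repository) {
            return Flux.interval(Duration.ofSeconds(1))
                       .concatMap(tick -> repository.findByCreatedAtAfter(lastSeen))
                       .doOnNext(entry -> lastSeen = entry.createdAt());
        }
    }

The resulting Flux could then feed the Server-Sent Events endpoint mentioned in the question.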
My understanding is that Reactor can't solve this problem for you on its own. If you want your application to respond (react) to some database change, then you need to identify who's making that change and implement some kind of integration there.
For example, if you have Service1 updating the database and Service2 needs to respond, then Service1 can either call Service2 directly, or you can emit an event from Service1 and listen for that event in Service2.
The first approach is simpler and easier to implement, but it has the disadvantage that it couples the two services. The second is trickier to implement, but the services stay decoupled.
Reactor can help you in both cases:
- For events, Reactor can give you a way to listen to them, for example using the reactor-rabbitmq or reactor-kafka modules.
- For service-to-service calls, Reactor can help you if you use Spring WebFlux (see the sketch below).
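As a sketch of the second bullet (the base URL and path are invented), Service1 could notify Service2 with WebFlux's WebClient after a successful write:

    import org.springframework.web.reactive.function.client.WebClient;
    import reactor.core.publisher.Mono;

    // Service1 notifies Service2 after it has updated the database.
    class Service1Notifier {
        private final WebClient webClient = WebClient.create("http://service2.example");

        Mono<Void> notifyEntryCreated(long entryId) {
            return webClient.post()
                            .uri("/entries/{id}/created", entryId)
                            .retrieve()
                            .bodyToMono(Void.class);
        }
    }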
Perhaps you can tell us more about your case so we can provide a more specific solution?

Read Directly from the Event Store Or Implement a Copy of the Events in the Read Side

I'm working on a project where I have implemented CQRS without event sourcing. I am using Mongo for both the read database and the write database. When something has to be changed, it is first changed in the write db and then the read db is synchronized.
Later I introduced something like an Event Store, also a MongoDB instance, keeping a history of all the events that changed the other databases in some way. The read db does not synchronize with the Event Store, so I have no way to read the events.
I've reached a situation where I need the information that's inside the events in the Event Store. Should I connect directly to the Event Store and read from there, or should I make the read db synchronize with the Event Store and basically hold a duplicate of it?
Thank you in advance, guys! I am using C# .NET Core, if someone needs this kind of info.
There's no functional reason why you shouldn't just read events directly from the event store.
The motivation for read models is usually low latency queries; we take the representation of data in the book of record (the event streams), and reshape it into a data structure that can answer queries. We accept the consequences of eventual consistency to get fast response times.
But if the shape we need for a query is an event stream, then we can just use the source data.
If the queries of the event store are having a negative performance impact on the writes to the store, then we might redesign the system to direct queries to a cached copy of the events instead.

CQRS - Consuming event service

My question is regarding the consuming event services which subscribe to the events published by commands in CQRS.
Say I have a document generation service which will generate some documents based on certain events. Does the document generation service load the data from the domain via an aggregate root? If so, wouldn't the document generation service load data which may have been updated after the event was raised but before it was received by the generation service? How would you stop that from happening?
I guess I am assuming that the event will only carry the information received in the command DTO, and passing the whole domain model's data in the event feels very wrong.
You really should build your read model from your events, unless you consider your documents a part of the domain (in which case you would have a CreateDocumentX command).
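A small Java sketch of what that looks like in practice (InvoiceIssued and its fields are invented): the handler builds the document from the data the event itself carries, i.e. the state as of the moment the event happened, so later updates to the aggregate cannot leak in.

    import java.util.UUID;

    // Invented event: it carries everything the document needs.
    record InvoiceIssued(UUID invoiceId, String customerName, long totalCents) {}

    record Document(String title, String customer, long totalCents) {}

    class DocumentGenerationHandler {

        void on(InvoiceIssued event) {
            Document doc = new Document(
                    "Invoice " + event.invoiceId(),
                    event.customerName(),  // state as of the event, not as of now
                    event.totalCents());
            store(doc);
        }

        private void store(Document doc) {
            // write to blob storage, filesystem, etc.
        }
    }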
All I can say is that when you are talking about CQRS, you should describe the issue in more depth so it can be solved, or helped with, properly.
However, from what I've read, you can have persistent storage on your write side, but you should ensure that you are not reaching outside of your aggregate context.
Related issue: reading-data-from-database-on-write-side-in-cqrs