Axon - How to retrieve the new entity version number in the CommandHandler? - rest

I'm currently writing a distributed application with a microservice architecture.
For that I am applying the CQRS pattern and event sourcing with the help of the axon framework. Therefore the data is eventual consistent.
Both, the write and the read side, are accessible over HTTP; REST specifically.
The initial problem:
After updating/creating an entity, the user [1] should be able to see the results. Because the events are handled asynchronously, the client/UI doesn't know when the entity is really updated (or created). So when the client fetches the data after sending the update-request but before the event is processed, the unchanged data is returned. Therefore the user could think, that the application is broken and/or sends a new request.
Solution attempt:
While looking for a solution for the read-after-write problem I came accross this blog entry.
There is proposed to return the new entity version in the write response. The client can then request the data with the expected entity version (as Expect header). If the actual version is equal or greater than the expected version, the data is returned. Or else an empty response with a Retry-After Header is returned.
The problem:
When the client sends an UpdateFoo request to the write side, the application sends a corresponding UpdateFooCommand over the CommandGateway. The Command is processed by the entity aggregate which publishes the FooUpdatedEvent. The read side receives this event and updates its entity view which can be accessed over the REST interface of the read side.
This is controlled by the axon framework. The handlers are annotated with #CommandHandler and #EventSourcingHandler respectively.
Now: How can I access the new version number of the affected entity in the CommandHandler, so that this number can be returned in the update response?
Thanks in advance
[1] Not only users. There can als be non human clients.

you can use AggregateLifecycle.getVersion() from within your aggregate. You can choose to return that value as part of your command's results and pass that information when doing a query. If the query doesn't have that version number of the aggregate's information, yet, you can (wait and) retry.

Related

How do PUT, POST or PATCH request differ ultimately?

The data, being sent over a PUT/PATCH/POST request, ultimately ends up in the database.
Now whether we are inserting a new resource or updating or modifying an existing one - it all depends upon the database operation being carried out.
Even if we send a POST and ultimately perform just an update in the database, it does not impact anywhere at all, isn't it?!
Hence, do they actually differ - apart from a purely conceptual point of view?
Hence, do they actually differ - apart from a purely conceptual point of view?
The semantics differ - what the messages mean, and what general purpose components are allowed to assume is going on.
The meanings are just those described by the references listed in the HTTP method registry. Today, that means that POST and PUT are described by HTTP semantics; PATCH is described by RFC 5789.
Loosely: PUT means that the request content is a proposed replacement for the current representation of some resource -- it's the method we would use to upload or replace a single web page if we were using the HTTP protocol to do that.
PATCH means that the request content is a patch document - which is to say a proposed edit to the current representation of some resource. So instead of sending the entire HTML document with PUT, you might instead just send a fix to the spelling error in the title element.
POST is... well, POST is everything else.
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding 2009
The POST method has the fewest constraints on its semantics (which is why we can use it for anything), but the consequence is that the HTTP application itself has to be very conservative with it.
Webber 2011 includes a good discussion of the implementations of the fact that HTTP is an application protocol.
Now whether we are inserting a new resource or updating or modifying an existing one - it all depends upon the database operation being carried out.
The HTTP method tells us what the request means - it doesn't place any constraints on how your implementation works.
See Fielding, 2002:
HTTP does not attempt to require the results of a GET to be safe. What it does is require that the semantics of the operation be safe, and therefore it is a fault of the implementation, not the interface or the user of that interface, if anything happens as a result that causes loss of property (money, BTW, is considered property for the sake of this definition).
The HTTP methods are part of the "transfer of documents over a network" domain - ie they are part of the facade that allows us to pretend that the bank/book store/cat video archive you are implementing is just another "web site".
It is about the intent of the sender and from my perspective it has a different behaviour on the server side.
in a nutshell:
POST : creates new data entry on the server (especially with REST)
PUT : updates full data entry on the server (REST) or it creates a new data entry (non REST). The difference to a POST request is that the client specifies the target location on the server.
PATCH : the client requests a partial update (Id and partial data of entry are given). The difference to PUT is that the client sends not the full data back to the server this can save bandwidth.
In general you can use any HTTP request to store data (GET, HEAD, DELETE...) but it is common practice to use POST, PUT, and PATCH for specific and standardized scenarios. Because every developer can understand it later
They are slightly different and they bind to different concepts of REST API (which is based on HTTP)
Just imagine that you have some Booking entity. And yo perform the following actions with resources:
POST - creates a new resource. And it is not idempotent - if you sent the same request twice -> two bookings will be stored. The third time - will create the third one. You are updating your DB with every request.
PUT - updates the full representation of a resource. It means - it replaces the booking full object with a new one. And it is idempotent - you could send a request ten times result will be the same (if a resource wasn't changed between your calls)
PATCH - updates some part of the resource. For example, your booking entity has a date property -> you update only this property. For example, replace the existing date with new date which is sent at the request.
However, for either of the above - who is deciding whether it is going to be a new resource creation or updating/modifying an existing one, it's the database operation or something equivalent to that which takes care of persistence
You are mixing different things.
The persistence layer and UI layer are two different things.
The general pattern used at Java - Model View Controller.
REST API has its own concept. And the DB layer has its own purpose. Keep in mind that separating the work of your application into layers is exactly high cohesion - when code is narrow-focused and does one thing and does it well.
Mainly at the answer, I posted some concepts for REST.
The main decision about what the application should do - create the new entity or update is a developer. And this kind of decision is usually done through the service layer. There are many additional factors that could be done, like transactions support, performing filtering of the data from DB, pagination, etc.
Also, it depends on how the DB layer is implemented. If JPA with HIbernate is used or with JDBC template, custom queries execution...

Which HTTP Verb should I use to claim and lock an item in a job queue?

I plan on using an HTTP REST interface to connect to a Job Control service.
One key operation is to request a computational Job.
The caller does not know the ID of the Job; that is what it will be told.
The job will be marked in the database as locked by the service.
The data needed for processing of the job will be returned to the caller.
Later on, when the caller is done processing the job, it will send the results back via another REST call.
Now it knows the ID of the record to be updated.
The second REST call will update the Job record with the results.
and change the Job's status and release the lock.
Only the Success/Fail status needs to be returned.
I am leaning towards using PUT for each operation because no new record is being created; it is being updated in both cases.
Is this proper? Can the first PUT return a large JSON payload with the Job data or does it just return an HTTP status? Should I use a POST instead, even though I am not creating a record, just updating it?
I would have used a GET for the first operation, but a GET is not supposed to change any objects on the service, and I am locking it, which is a change. Is locking a record acceptable in a GET request?
Which HTTP Verb should I use to claim and lock an item in a job queue?
Key idea: a REST API is a facade - your application/service pretends to be an HTTP compliant document store. All of the interesting things that happen are side effects triggered by modifying documents. See Jim Webber, 2011.
With that in mind...
POST is fine. It's okay to use POST.
PUT/PATCH are a good for remote authoring; the client fetches your representation of a resource, makes edits to his local copy, and sends you a copy of the representation (PUT) or a patch document describing the changes (PATCH). The server can then apply those edits to its copy, or not.
So for your specific example, I would expect the client to GET a representation of your resource, change the information in that representation from unlocked to locked, and then to PUT the changed representation back to your server. You server would be expected to update your copy of the representation to match.
It may remind you of a declarative style - the client tells the server what the representation should look like, and it's up to the server to figure out how to do that.
Included for Completeness, NOT Recommened:
The HTTP method registry also includes a method LOCK, with a corresponding UNLOCK. The semantics for these method tokens are defined by the WebDAV specification. If your meaning of LOCK matches that of WebDAV, then using that might be an answer. Note that the specification includes comments like
Any resource that supports the LOCK method MUST, at minimum, support the XML request and response formats defined herein.
Unless you are already in a space where people are expecting to be able to use general-purpose WebDAV clients to interact with your API, that's probably not a good fit.
The HTTP method registry is extendable. So you could define the semantics of your own method token, then push to have it adopted as a standard.

How to merge/consolidate responses from multiple RESTful microservices?

Let's say there are two (or more) RESTful microservices serving JSON. Service (A) stores user information (name, login, password, etc) and service (B) stores messages to/from that user (e.g. sender_id, subject, body, rcpt_ids).
Service (A) on /profile/{user_id} may respond with:
{id: 1, name:'Bob'}
{id: 2, name:'Alice'}
{id: 3, name:'Sue'}
and so on
Service (B) responding at /user/{user_id}/messages returns a list of messages destined for that {user_id} like so:
{id: 1, subj:'Hey', body:'Lorem ipsum', sender_id: 2, rcpt_ids: [1,3]},
{id: 2, subj:'Test', body:'blah blah', sender_id: 3, rcpt_ids: [1]}
How does the client application consuming these services handle putting the message listing together such that names are shown instead of sender/rcpt ids?
Method 1: Pull the list of messages, then start pulling profile info for each id listed in sender_id and rcpt_ids? That may require 100's of requests and could take a while. Rather naive and inefficient and may not scale with complex apps???
Method 2: Pull the list of messages, extract all user ids and make bulk request for all relevant users separately... this assumes such service endpoint exists. There is still delay between getting message listing, extracting user ids, sending request for bulk user info, and then awaiting for bulk user info response.
Ideally I want to serve out a complete response set in one go (messages and user info). My research brings me to merging of responses at service layer... a.k.a. Method 3: API Gateway technique.
But how does one even implement this?
I can obtain list of messages, extract user ids, make a call behind the scenes and obtain users data, merge result sets, then serve this final result up... This works ok with 2 services behind the scenes... But what if the message listing depends on more services... What if I needed to query multiple services behind the scenes, further parse responses of these, query more services based on secondary (tertiary?) results, and then finally merge... where does this madness stop? How does this affect response times?
And I've now effectively created another "client" that combines all microservice responses into one mega-response... which is no different that Method 1 above... except at server level.
Is that how it's done in the "real world"? Any insights? Are there any open source projects that are built on such API Gateway architecture I could examine?
The solution which we used for such problem was denormalization of data and events for updating.
Basically, a microservice has a subset of data it requires from other microservices beforehand so that it doesn't have to call them at run time. This data is managed through events. Other microservices when updated, fire an event with id as a context which can be consumed by any microservice which have any interest in it. This way the data remain in sync (of course it requires some form of failure mechanism for events). This seems lots of work but helps us with any future decisions regarding consolidation of data from different microservices. Our microservice will always have all data available locally for it process any request without synchronous dependency on other services
In your case i.e. for showing names with a message, you can keep an extra property for names in Service(B). So whenever a name update in Service(A) it will fire an update event with id for the updated name. The Service(B) then gets consumes the event, fetches relevant data from Service(A) and updates its database. This way even if Service(A) is down Service(B) will function, albeit with some stale data which will eventually be consistent when Service(A) comes up and you will always have some name to be shown on UI.
https://enterprisecraftsmanship.com/2017/07/05/how-to-request-information-from-multiple-microservices/
You might want to perform response aggregation strategies on your API gateway. I've written an article on how to perform this on ASP.net Core and Ocelot, but there should be a counter-part for other API gateway technologies:
https://www.pogsdotnet.com/2018/09/api-gateway-response-aggregation-with.html
You need to write another service called Aggregator which will internally call both services and get the response and merge/filter them and return the desired result. This can be easily achieved in non-blocking using Mono/Flux in Spring Reactive.
An API Gateway often does API composition.
But this is typical engineering problem where you have microservices which is implementing databases per service pattern.
The API Composition and Command Query Responsibility Segregation (CQRS) pattern are useful ways to implement queries .
Ideally I want to serve out a complete response set in one go
(messages and user info).
The problem you've described is what Facebook realized years ago in which they decided to tackle that by creating an open source specification called GraphQL.
But how does one even implement this?
It is already implemented in various popular programming languages and maybe you can give it a try in the programming language of your choice.

How to handle network connectivity loss in the middle of REST POST request?

REST POST is used to create resources.
Let's say we have resource url
"http://example.com/cars"
We want to create a new car.
We POST to "http://example.com/cars" with JSON payload containing car properties (color, weight, model, etc).
Server receives the request, creates a new car, sends a response over the network.
At this point network fails (let's say router stops working properly and ignores every packet).
Client fails with TCP timeout (like 90 seconds).
Client has no idea whether car was created or not.
Also client haven't received car resource id, so it can't GET it to check if it was created.
Now what?
How do you handle this?
You can't simply retry creating, because retrying will just create a duplicate (which is bad).
REST POST is used to create resources.
HTTP POST is used for lots of things. REST doesn't particularly care; it just wants resources that support a uniform interface, and hypermedia.
At this point network fails
Bummer!
Now what? How do you handle this? You can't simply retry creating, because retrying will just create a duplicate (which is bad).
This is a general messaging concern, not directly related to REST. The most common solution is to use the Idempotent Receiver pattern. In short, you
need to define your messages so that the receiver has enough information to recognize the request as something that has already been done.
Ideally, this is being supported at the business level.
Idempotent collections of values are often straight forward; we just need to be thinking sets, rather than lists.
Idempotent collections of entities are trickier; if the request includes an identifier for the new entity, or if we can compute one from the data provided, then we can think of our collection as a hash.
If none of those approaches fits, then there's another possibility. Instead of performing an idempotent mutation of the collection, we make the mutation of the collection itself idempotent. Think "compare and swap" - we encode into the request information that identifies the current state of the collection; is that state is still current when the request arrives, then the mutation is applied. If the condition does not hold, then the request becomes a no-op.
Translating this into HTTP, we make a small modification to the protocol for updating the collection resource. First, we GET the current representation; and in the meta data the server provides validators that can be used in subsquent requests. Having obtained the validator, the client evaluates the current representation of the resource to determine if it needs to be changed. If the client decides to make a change, then submits the change with an If-Match or an If-Unmodified-Since header including the validator. The server, before processing the requests, then considers the validator, immediately abandoning the request with 412 Precondition Failed.
Thus, if a conditional state-changing request is lost, the client can at its own discretion repeat the request without concern that server will misunderstand the client's intent.
Retry it a limited number of times, with increasing delays between the attempts, and make sure the transaction concerned is idempotent.
because retrying will just create a duplicate (which is bad).
It is indeed, and it needs fixing, see above. It should be impossible in your system to create two entries with the same attributes. This is easily accomplished at the database level. You can attain idempotence by having the transaction return the same thing whether the entry already existed or was newly created. Or else just have it return EXISTS if the entry already exists, and adjust your client accordingly.

CQRS and REST HATEOAS mismatch

Suppose you have a model Foo.
One business case is to simply create an instance of Foo, so there is a corresponding CreateFooCommand in my model, triggered by invoking a POST request to a given REST endpoint.
There are of course other Commands too.
But now, there is a ViewModel, which is derived from my DomainModel. It's simply a sql table with raw data - each Foo instance from DomainModel has corresponding derived ViewModel instance. Both have different IDs (on DomainModel there is a DomainID, on ViewModel it's simply a long value).
Now: should I even care about HATEOAS in such a case? In a proper REST implementation, I should at least return location-url in the header. But since my view model is only derived from DomainModel, should I care? I don't even have the view model's ID at the time my DomainModel is created.
Since CQRS means that Queries are separated from Commands, you may not be able to perform a Query right away, because the Command may not yet have been applied (perhaps it never will).
In order to reconcile that with HATEOAS, instead of returning 200 OK from the POST request, the service can return 202 Accepted:
The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place. There is no facility for re-sending a status code from an asynchronous operation such as this.
The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent's connection to the server persist until the process is completed. The entity returned with this response SHOULD include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
(My emphasis)
That pointer could be a link that the client can query to get the status of the Command. When/if the Command completes and the View is updated, that status resource could then contain a link to the view.
This is pretty much a workflow straight out of REST in Practice - very reminiscent of its Restbucks example.
Another option to deal with the ID issue is to generate the ID before accepting the Command - perhaps even asking the client to supply the ID. Read more about such options here.
As Greg Young explains, CQRS is nothing more than "splitting one object into two". So assume that you have one domain aggregate and it has an id. Now you are talking about your view model having another id. However, you are unable to update your view model unless you have the aggregate id in your view model as well. From my point of view, your REST POST request should return a result that has the aggregate id in it. This is your id, the view model id has no interest to anyone except the read model storage.
Should it return a command status URI like Mark suggests is a topic for another discussion. Many CQRS practitioners currently tend to handle commands synchronously to avoid FE/BE mismatch in case of failure and give the FE an ability to react on errors on the BE. There is no real win to execute commands asynchronously for one user. Commands do mutate the state and in 99% of cases the user needs to know if the state was mutated properly.