Design Commands And Events while Handling External partner with Axon 4 - cqrs

This is a question related to designing command handling with Axon 4.
Let say I've a domain that model the concept of a Payment.
The actual payment will be done by an external Partner. I want to track it in my system via the following events: a Payment Request Was Issued followed by either
Partner Agreed the Payment or Partner Declined the Payment.
Every events issued by the command should be enrolled in the same database transaction.
What would be the best practice to actually call my partner in Axon 4 ?
Here's what I've done so far:
Have one command named RequestPaymentCommand
This command will be handled by a Payment Aggregate like this:
do some checks
apply the event PaymentRequestWasIssued
and then, call the external partner and given the result it will apply either PaymentAccepted or PaymentRefused
In this answer from stackoverflow, it is said that
All the data that you need to apply the event should normally be available in the command
With this statement in mind, I understand that I should create as much Commands as Events ? But In this case, what is the point of all theses commands ? Should I end up with something like:
My command RequestPaymentCommand will generate the PaymentRequestWasIssued event.
Then from somewhere I call my partner and then send another command (how to name it ?) that will generate the event given the result from the partner ?

The actual payment will be done by an external Partner
This means that your application is not the source of truth and it should not try to behave like one. This means that it should only observe what is happening in the remote system and possible react to remote events. To "observe" could mean to duplicate/copy the remote events in local databases, without modifications, just for cache reasons or for display reasons. Your system should not directly give other interpretations to these events, other than those given by their source.
After the remote events are copied locally, your system could react to them. This could mean that a Saga, after receives the Partner Agreed the Payment it sends a UnlockFeature command to a local Aggregate (see DDD).
With this statement in mind, I understand that I should create as much Commands as Events ? But In this case, what is the point of all theses commands ?
This is an indication that those are not your events: you should not emit them from your code; in the worst case you store them and react to them (in a Saga/Process manager). This means that you should discover the local business processes and model them as such: they react to events by sending commands.

Related

Client Interaction With Event Sourcing

I have been recently looking into event sourcing and have some questions about the interactions with clients.
So event-sourcing sounds great. decoupling all your microservices, keeping your information in immutable events and formulating a stored states off of that to fit your needs is really handy. Having event propagate through your system/services and reacting to events in their own way is all fine.
The issue i am having lies with understanding the client interaction.
So you want clients to interact with the system, but they need to do this now by events. They can not longer submit a state to mutate your existing one.
So the question is how do clients fire off specific event and interact with (not only an event based system) but a system based on event sourcing.
My understanding is that you no longer use the rest api as resources (which you can get, update, delete, etc.. handling them as a resource), but you instead post to an endpoint as an event.
So how do these endpoint work?
my second question is how does the user get responses back?
for instance lets say we have an event to place an order.
your going to fire off an event an its going to do its thing. Again my understanding is that you dont now validate the request, e.g. checking if the user ordering the order has enough money, but instead fire it to be place and it will be handled in the system.
e.g. it will not be
- order placed
- this will be picked up by the pricing service and it will either fire an reserved money or money exceeded event based on if the user can afford it.
- The order service will then listen for those and then mark the order as denied or not enough credit.
So because this is a async process and the user has fired and forgotten, how do you then show the user it has either failed or succeeded? do you show them an order confirmation page with the order status as it is (even if its pending)
or do you poll it until it changes (web sockets or something).
I'm sorry if a lot of this is all nonsense, I am still learning about this architecture and am very much in the mindset of a monolith with REST responses.
Any help would be appreciated.
The issue i am having lies with understanding the client interaction.
Some of the issue may be understanding, but I promise you a fair share of the issue is that the literature sucks.
In particular, the word "Event" gets re-used a lot of different ways. If you aren't paying very careful attention to which meaning is being used, you are going to get knotted.
Event Sourcing is really about persistence - how does a micro-server store its private copy of state for later re-use? Instead of destructively overwriting our previous state, we write new information that links back to the previous state. If you imagine each microservice storing each change of state as a commit in its own git repository, you are in the right ballpark.
That's a different animal from using Event Messages to communicate information between one microservice and another.
There's some obvious overlap, of course, because the one message that you are likely to share with other microservices is "I just changed state".
So how do these endpoint work?
The same way that web forms do. I send you a representation of a form, the client displays the form to you. You fill in your data and submit the form, the client processes the contents of the form, and sends back to me an HTTP request with a "FormSubmitted" event in the message body.
You can achieve similar results by sending new representations of the state, but its a bit error prone to strip away the semantic intent and then try to guess it again on the server. So you are more likely to instead see task based user interfaces, or protocols that clearly identify the semantics of the change.
When the outside world is the authority for some piece of data (a shopper's shipping address, for example), you are more likely to see the more traditional "just edit the existing representation" approach.
So because this is a async process and the user has fired and forgotten, how do you then show the user it has either failed or succeeded?
Fire and forget really doesn't work for a distributed protocol on an unreliable network. In most cases, at-least-once delivery is important, so Fire until verified is the more common option. The initial acknowledgement of the message might be something like 202 Accepted -- "We received your message, we wrote it down, here's our current progress, here are some links you can fetch for progress reports".
It doesnt seem to me that event-sourcing fits with the traditional REST model where you CRUD a resource.
Jim Webber's 2011 talk may help to prune away the noise. A REST API is a disguise that your domain model wears; you exchange messages about manipulating resources, and as a side effect your domain model does useful work.
One way you could do this that would look more "traditional" is to work with representations of the event stream. I do a GET /08ff2ec9-a9ad-4be2-9793-18e232dbe615 and it returns me a representation of a list of events. I append a new event onto the end of that list, and PUT /08ff2ec9-a9ad-4be2-9793-18e232dbe615, and interesting side effects happen. Or perhaps I instead create a patch document that describes my change, and PATCH /08ff2ec9-a9ad-4be2-9793-18e232dbe615.
But more likely, I would do something else -- instead of GET /08ff2ec9-a9ad-4be2-9793-18e232dbe615 to fetch a representation of the list of events, I'd probably GET /08ff2ec9-a9ad-4be2-9793-18e232dbe615 to fetch a representation of available protocols - which is to say, a document filled with hyper links. From there, I might GET /08ff2ec9-a9ad-4be2-9793-18e232dbe615/603766ac-92af-47f3-8265-16f003ce5a09 to obtain a representation of the data collection form. I fill in the details of my event, submit the form, and POST /08ff2ec9-a9ad-4be2-9793-18e232dbe615 the form data to the server.
You can, of course, use any spelling you like for the URI.
In the first case, we need something like an HTTP capable document editor; the second case uses something more like a web browser.
If there were lots of different kinds of events, then the second case might well have lots of different form resources, all submitting POST /08ff2ec9-a9ad-4be2-9793-18e232dbe615 requests.
(You don't have to have all of the forms submitting to the same URI, but there are advantages to consider).
In a non event sourcing pattern I guess that would be first put into the database, then the event gets risen.
Even when you aren't event sourcing, there may still be some advantages to committing events to your durable store before emitting them. See Pat Helland: Data on the Outside versus Data on the Inside.
So you want clients to interact with the system, but they need to do this now by events.
Clients don't have to. Client may even not be aware of the underlying event store.
There are a number of trade-offs to consider and decisions to take when implementing an event-sourced system. To start with you can try to name a few pre computer era examples of event-sourced systems and look at their non-functional characteristics.
So the question is how do clients fire off specific event
Clients don't send events. They rather should express an intent (a command). Then it is the responsibility of the event-sourced system to validate the intent and either reject it or accept and store the corresponding event. It would mean that an intent to change the system's state was accepted and the stored event confirms the change.
My understanding is that you no longer use the rest api as resources
REST is one of the options. You just consider different things as resources. A command can be a REST resource. An event-sourced entity can be a resource, to which you POST a command. If you like it async - you can later GET the command to check its status. You can GET an entity to know its current state. You cant GET events from a class of entities as a means of subscription.
If we are talking about an end user, then most likely it doesn't deal with the event store directly. There is some third tier in between, which does CQRS. From a user client perspective it can be provided with REST, GraphQL, SOAP, gRPC or event e-mail. Whatever transport solution you find suitable. Command-processing part from CQRS is what specifically domain-driven. It decides which intent to accept and which to reject.
Event store itself is responsible for the data consistency. I.e. it should not allow two concurrent event leading to invalid state be published. This is what pre-computer event-sourced systems are good at. You usually have some physical object as an entity, so you lock for update by just getting hand of it.
Then an end-user client usually reads from some prepared read model. The responsibility of a read (R in CQRS) component is to prepare read-optimised data for clients. This data may come from multiple event-sourced of the same or different classes. Again, client may interact with a read model with whatever transport is suitable.
While an event-store is consistent and consistent immediately, a read model is eventually consistent. But it's up to you to tune this eventuality.
Just try to throw REST out of the architecture for a while. Consider it a one of available transport options - that may help to look at the root.

Event-Sourcing how to change business rules

My application use cqrs and event sourcing. It's already in production.
Now i must add a business rules. My business rules are in my aggregate root UserAggregate.
My commands :
public class CallUserForMarketingPlanCommand
{
public Guid UserId {get;set;}
public DateTime CallDate {get;set;}
public Guid PlanId {get;set;}
}
public class AcceptMarketingPlanCommand
{
public Guid UserId {get;set;}
public Date AswerDate {get;set;}
public Guid PlanId {get;set;}
}
... the same thing for RefuseMarketingPlanCommand
these commands are applied on my aggregate root which generate events stored in event store
Now if 50 days after the call, the user do not give answer, the user must be recalled by operator. To do this, i think generate event UserDoNotRepliedInDelayEvent and use it to project to a read model with recall informations.
My solution is to create a deferred command (from UserCalledForMarketingPlanEvent handler) CheckUserAnswerCommand which check the call date and generate UserDoNotRepliedInDelayEvent if necessary across the aggregate. Ok.
My problem is how to deffered this command on users already in my event store (before this change) ?
EDIT :
Without considering deferred message, how to change business rules (or a business rules parameter) affecting the state of an aggregate. Simple example :
Disable account if two payments are not permformed.
this rule come with the first deployement. Ok, now there are 1000 accounts disabled. The boss change the rule because the business is impacted, and want disable account if 5 payments are not performed.
How to enable account having less than 5 payments not performed ?
Thanks for your help.
Now if 50 days after the call, the user do not give answer, the user must be recalled by operator. To do this, i think generate event UserDoNotRepliedInDelayEvent and use it to project to a read model with recall informations.
If I undestood your question correctly, the main point here, is that the user "not replying" in time is not an action (command) of your domain, quite the contrary, it is the absence of an action. So in this scenario, I don't think you need an event at all.
You simply need a read model which will register all sent invitations and their statuses (whether they're replied, their reply dates and how long did they stand unanswered). Then, you can check this read model for unanswered invitations that exceed your deadline of 50 days (which should be simple enough at this point).
So, up to this point, no new events are generated in your "Invitations" event store. You're simply interpreting the store into a specific read model that will answer you a question you have (which invitations were not answered).
From this point, it depends on your architecture.
You might want a recurring process to check this read model for invitations that exceed your deadline, having those specific invitations trigger a "InvitationExpiredEvent" or something to notify the interested parties (those who will resend them, for instance)
Or you simply might want a more passive approach, not needing an extra event, simply reading this Read Model when appropriate (on the GUI, maybe) and listing the expired invitations.
This will then fix itself... since you can generate the read model retroactively (finding users from any given point in the that never answered their invitations) and put them through the re-invitation pipeline.
Without considering deferred message, how to change business rules (or a business rules parameter) affecting the state of an aggregate. Simple example :
Disable account if two payments are not permformed.
this rule come with the first deployement. Ok, now there are 1000 accounts disabled. The boss change the rule because the business is impacted, and want disable account if 5 payments are not performed.
How to enable account having less than 5 payments not performed ?
This part of your question is more confusing. From what I understood, you once had a rule that stated "Accounts with two or more expired payments should be inactivated" and you want to change this rule to "Accounts with five or more expired payments should be inactivated". If that's the case, you have to deal with this on multiple levels...
First, you must first implement the new rule on your command model, the same way it always have been but with the updated parameter.
Second, you cannot retroactively reactivated accounts with 2,3,4 expired payments by ignoring their "deactivation events". From your event store point of view, this happened and you must abide by the rules that an event store is a "push only" storage. So, you must use compensating events to reactivate them after the rule change.
So, if you took care of the first topic (and your domain is up and running with the new rule) and since you can't take a shortcut because of the second topic, one of your easier options is to simply develop a one-shot operation that will find accounts with 2,3,4 expired payments that are currently disabled and append to their event stores a reactivation event. At this point you will have to regenerate any affected read models if your architecture doesn't do this automatically.
That way, the next time commands are executed against these accounts, their event stores will reflect the fact that they have been reactivated and thus are currently active.
From an event store point of view... each of these accounts will have something like this on their event streams:
... > Payment Expired > Account Disabled > (maybe other stuff happened) > Account Re-Enabled
So your event store will be a pretty accurate representation of your business scenario... once you chose to disable accounts with only 2 expired payments, so a certain account was disabled by that... later you changed your mind, and even without paying their debts, these accounts were re-enabled.
EDIT:
In fact, i think the problem can be summarized by "how to integrate retroactive rules in event sourced system"
If that's the case, than the answer will be more focused on the lines of "there shouldn't be retroactive actions in an event-sourced domain".
As I said in my original answer, an event stream should be a "push-only" storage and that's mainly because only the exact order of events, as they happened, can guarantee the integrity of your rules as they were when those events happened. In that sense, an event storage is less flexible than a traditional one as it will be way more sensitive to external interference and that will sometimes be a pain (were used to meddling with the data sources directly to fix stuff).
However, we should really try to keep the rule and acknowledge that whatever happened, happened and can't be changed. What you can do, is add, to the end of event stream "compensation events" that is, new events that will register a change of state at a given time to reflect your rules-changing. And then you will need a one-shot process to go through your entities and decide which of them are eligible for such a compensating event.
Now, of course, rules are meant to be broken when needed and with enough consideration, you can go wild into the event store. Just know the risks. If you choose to go "full time machine mode" into the event store, the main risks you will face (and should guard against) are:
Entities going into invalid states in their lifetime. It doesn't matter the entity "ends" the event stream in a valid state. You must validate it never enters an invalid state as that is a prerequisite of event streams. So, for each entity affected by your editing, you will need to evaluate its validity step by step through the new event stream.
Mismatches between source code and event stream. This is a little trickier. But one of the maneuvers you can pull with an event sourced system is rollback your source code repository to a given date and "discard" events from that date forward. That way, you can re-execute actions as they would have happened in the past. If you edit past events though, you might face situations where the recorded events don't match what would have happened in the past based on source code. That might be critical and extremely misleading in the future. You should monitor that.
If your architecture integrates different contexts/domains/microservices, that might also need further evaluation. Say context-A issued a cross-boundary message to context-B because of a given state of an entity. Moving forward, you change the entity state by meddling with the event stream. Now there's a chance these contexts might be left inconsistent between them as context-B believes that entity had a state it no longer has. This might be very relevant in your scenario.
you could also use a Saga that keeps track of the process and than create a command like "recallneeded" when the time is up. it also keeps track of events that tells the Saga to complete if there was a call within the 50 days. (Keep in mind that a Saga is part of your Domain logic and acts as an AR if doing DDD)

Best place to fetch 3rd party data in ES/CQRS

I have a RegisterUserCommand with some user data.
To be able to register user with some additional information, I need to connect to 3rd party so my question is:
1) Should Command already have all of that 3rd party data when called?
2) Would it be ok if CommandHandler connects to 3rd party and retrieves it?
3) I don't think that my aggregate root should be doing it but in a sense, it is domain logic.
I think that #2 is the best way but would like to hear if I'm going wrong about it or not?
(actual case is not registering user but it needs to fetch data from a remote service/3rd party)
The issue with (2) that your domain layer (where the command handler definitely belongs) become dependent on an external bounded context. This breaks the onion architecture inner layer isolation.
Your first point is basically correct for some cases, if your service layer can fetch this data and send a self-contained command, this is one of possible solutions.
Another solution that instead of invoking a command handler, you can send a message to start a process manager that will send an information collection request, get the data back and send a command to your handler with all information required. Since this happens via asynchronous messaging, you will not have synchronous dependency on a third party and your application will keep working even if the third party is down, at least to some extent, and when the third party will wake up, all queued requests will be processed.
Usually, durable messaging also has some retry capabilities that decrease the risk of the request made to the external bounded context to fail.
Your 3rd party data is not part of your domain but is required by it so you could have a command that results in a "data requested" event to which an external process subscribes to. This process could then gather the required 3rd party data and package it into another command which results in another event stating that the data was provided which would cause your query data to be update.

CQRS Saga Management

I'm developing a new software based on CQRS principles, but I have some doubts.
I'm creating a Saga to manage the user creation. Each user has some general information (name, surname, birthdate), several addresses and other stuff. The "CreateUserSaga" is started by a "CreateUserCommand". After CreateUserCommand is handled, I want to raise a "UserCreatedEvent" that is handled by the same saga. Inside this event I want to send the command CreateUserAddress to register addresses.
What I don't know is where to retrieve data for addresses. Have I to send them in the CreateUserCommand?
You have a few options really.
1) Send all the information in the first command and event, then start removing the data that is no longer needed for subsequent events/commands as you move from command to event and onto the next command... not the best.
2) Have the 'client' / calling system hold all the data (something must have it all even if it's a browser) raise your first command/event (CreateUser/UserCreated) then have that 'client' send the next command(s) when it receives the UserCreated event adding what is necassary. You can also add in other event handlers to the process here and not listen to the UserCreated event but something else that has already processed it etc. etc.... rather easy with browsers... but sometimes it's a little overkill.
3) Send all the data into a saga via a command, have that saga manage creation of the next step of commands based on incoming events. The same as option two, just using another saga instead of the client. This keeps the flow control internal to your system... if you don't trust your connecting clients... which raises a lot of questions on why you can't trust your connected clients.
In all cases something needs to hold all the data. be it a client, a separate saga or the first command. I'm not going to question why you'd need it in your example, but those are your options. They are all state management objects... something holds all the incoming state change requests and decides what and who get what and when.

Transactions across REST microservices?

Let's say we have a User, Wallet REST microservices and an API gateway that glues things together. When Bob registers on our website, our API gateway needs to create a user through the User microservice and a wallet through the Wallet microservice.
Now here are a few scenarios where things could go wrong:
User Bob creation fails: that's OK, we just return an error message to the Bob. We're using SQL transactions so no one ever saw Bob in the system. Everything's good :)
User Bob is created but before our Wallet can be created, our API gateway hard crashes. We now have a User with no wallet (inconsistent data).
User Bob is created and as we are creating the Wallet, the HTTP connection drops. The wallet creation might have succeeded or it might have not.
What solutions are available to prevent this kind of data inconsistency from happening? Are there patterns that allow transactions to span multiple REST requests? I've read the Wikipedia page on Two-phase commit which seems to touch on this issue but I'm not sure how to apply it in practice. This Atomic Distributed Transactions: a RESTful design paper also seems interesting although I haven't read it yet.
Alternatively, I know REST might just not be suited for this use case. Would perhaps the correct way to handle this situation to drop REST entirely and use a different communication protocol like a message queue system? Or should I enforce consistency in my application code (for example, by having a background job that detects inconsistencies and fixes them or by having a "state" attribute on my User model with "creating", "created" values, etc.)?
What doesn't make sense:
distributed transactions with REST services. REST services by definition are stateless, so they should not be participants in a transactional boundary that spans more than one service. Your user registration use case scenario makes sense, but the design with REST microservices to create User and Wallet data is not good.
What will give you headaches:
EJBs with distributed transactions. It's one of those things that work in theory but not in practice. Right now I'm trying to make a distributed transaction work for remote EJBs across JBoss EAP 6.3 instances. We've been talking to RedHat support for weeks, and it didn't work yet.
Two-phase commit solutions in general. I think the 2PC protocol is a great algorithm (many years ago I implemented it in C with RPC). It requires comprehensive fail recovery mechanisms, with retries, state repository, etc. All the complexity is hidden within the transaction framework (ex.: JBoss Arjuna). However, 2PC is not fail proof. There are situations the transaction simply can't complete. Then you need to identify and fix database inconsistencies manually. It may happen once in a million transactions if you're lucky, but it may happen once in every 100 transactions depending on your platform and scenario.
Sagas (Compensating transactions). There's the implementation overhead of creating the compensating operations, and the coordination mechanism to activate compensation at the end. But compensation is not fail proof either. You may still end up with inconsistencies (= some headache).
What's probably the best alternative:
Eventual consistency. Neither ACID-like distributed transactions nor compensating transactions are fail proof, and both may lead to inconsistencies. Eventual consistency is often better than "occasional inconsistency". There are different design solutions, such as:
You may create a more robust solution using asynchronous communication. In your scenario, when Bob registers, the API gateway could send a message to a NewUser queue, and right-away reply to the user saying "You'll receive an email to confirm the account creation." A queue consumer service could process the message, perform the database changes in a single transaction, and send the email to Bob to notify the account creation.
The User microservice creates the user record and a wallet record in the same database. In this case, the wallet store in the User microservice is a replica of the master wallet store only visible to the Wallet microservice. There's a data synchronization mechanism that is trigger-based or kicks in periodically to send data changes (e.g., new wallets) from the replica to the master, and vice-versa.
But what if you need synchronous responses?
Remodel the microservices. If the solution with the queue doesn't work because the service consumer needs a response right away, then I'd rather remodel the User and Wallet functionality to be collocated in the same service (or at least in the same VM to avoid distributed transactions). Yes, it's a step farther from microservices and closer to a monolith, but will save you from some headache.
This is a classic question I was asked during an interview recently How to call multiple web services and still preserve some kind of error handling in the middle of the task. Today, in high performance computing, we avoid two phase commits. I read a paper many years ago about what was called the "Starbuck model" for transactions: Think about the process of ordering, paying, preparing and receiving the coffee you order at Starbuck... I oversimplify things but a two phase commit model would suggest that the whole process would be a single wrapping transaction for all the steps involved until you receive your coffee. However, with this model, all employees would wait and stop working until you get your coffee. You see the picture ?
Instead, the "Starbuck model" is more productive by following the "best effort" model and compensating for errors in the process. First, they make sure that you pay! Then, there are message queues with your order attached to the cup. If something goes wrong in the process, like you did not get your coffee, it is not what you ordered, etc, we enter into the compensation process and we make sure you get what you want or refund you, This is the most efficient model for increased productivity.
Sometimes, starbuck is wasting a coffee but the overall process is efficient. There are other tricks to think when you build your web services like designing them in a way that they can be called any number of times and still provide the same end result. So, my recommendation is:
Don't be too fine when defining your web services (I am not convinced about the micro-service hype happening these days: too many risks of going too far);
Async increases performance so prefer being async, send notifications by email whenever possible.
Build more intelligent services to make them "recallable" any number of times, processing with an uid or taskid that will follow the order bottom-top until the end, validating business rules in each step;
Use message queues (JMS or others) and divert to error handling processors that will apply operations to "rollback" by applying opposite operations, by the way, working with async order will require some sort of queue to validate the current state of the process, so consider that;
In last resort, (since it may not happen often), put it in a queue for manual processing of errors.
Let's go back with the initial problem that was posted. Create an account and create a wallet and make sure everything was done.
Let's say a web service is called to orchestrate the whole operation.
Pseudo code of the web service would look like this:
Call Account creation microservice, pass it some information and a some unique task id 1.1 Account creation microservice will first check if that account was already created. A task id is associated with the account's record. The microservice detects that the account does not exist so it creates it and stores the task id. NOTE: this service can be called 2000 times, it will always perform the same result. The service answers with a "receipt that contains minimal information to perform an undo operation if required".
Call Wallet creation, giving it the account ID and task id. Let's say a condition is not valid and the wallet creation cannot be performed. The call returns with an error but nothing was created.
The orchestrator is informed of the error. It knows it needs to abort the Account creation but it will not do it itself. It will ask the wallet service to do it by passing its "minimal undo receipt" received at the end of step 1.
The Account service reads the undo receipt and knows how to undo the operation; the undo receipt may even include information about another microservice it could have called itself to do part of the job. In this situation, the undo receipt could contain the Account ID and possibly some extra information required to perform the opposite operation. In our case, to simplify things, let's say is simply delete the account using its account id.
Now, let's say the web service never received the success or failure (in this case) that the Account creation's undo was performed. It will simply call the Account's undo service again. And this service should normaly never fail because its goal is for the account to no longer exist. So it checks if it exists and sees nothing can be done to undo it. So it returns that the operation is a success.
The web service returns to the user that the account could not be created.
This is a synchronous example. We could have managed it in a different way and put the case into a message queue targeted to the help desk if we don't want the system to completly recover the error". I've seen this being performed in a company where not enough hooks could be provided to the back end system to correct situations. The help desk received messages containing what was performed successfully and had enough information to fix things just like our undo receipt could be used for in a fully automated way.
I have performed a search and the microsoft web site has a pattern description for this approach. It is called the compensating transaction pattern:
Compensating transaction pattern
All distributed systems have trouble with transactional consistency. The best way to do this is like you said, have a two-phase commit. Have the wallet and the user be created in a pending state. After it is created, make a separate call to activate the user.
This last call should be safely repeatable (in case your connection drops).
This will necessitate that the last call know about both tables (so that it can be done in a single JDBC transaction).
Alternatively, you might want to think about why you are so worried about a user without a wallet. Do you believe this will cause a problem? If so, maybe having those as separate rest calls are a bad idea. If a user shouldn't exist without a wallet, then you should probably add the wallet to the user (in the original POST call to create the user).
IMHO one of the key aspects of microservices architecture is that the transaction is confined to the individual microservice (Single responsibility principle).
In the current example, the User creation would be an own transaction. User creation would push a USER_CREATED event into an event queue. Wallet service would subscribe to the USER_CREATED event and do the Wallet creation.
If my wallet was just another bunch of records in the same sql database as the user then I would probably place the user and wallet creation code in the same service and handle that using the normal database transaction facilities.
It sounds to me you are asking about what happens when the wallet creation code requires you touch another other system or systems? Id say it all depends on how complex and or risky the creation process is.
If it's just a matter of touching another reliable datastore (say one that can't participate in your sql transactions), then depending on the overall system parameters, I might be willing to risk the vanishingly small chance that second write won't happen. I might do nothing, but raise an exception and deal with the inconsistent data via a compensating transaction or even some ad-hoc method. As I always tell my developers: "if this sort of thing is happening in the app, it won't go unnoticed".
As the complexity and risk of wallet creation increases you must take steps to ameliorate the risks involved. Let's say some of the steps require calling multiple partner apis.
At this point you might introduce a message queue along with the notion of partially constructed users and/or wallets.
A simple and effective strategy for making sure your entities eventually get constructed properly is to have the jobs retry until they succeed, but a lot depends on the use cases for your application.
I would also think long and hard about why I had a failure prone step in my provisioning process.
One simple Solution is you create user using the User Service and use a messaging bus where user service emits its events , and Wallet Service registers on the messaging bus, listens on User Created event and create Wallet for the User. In the mean time , if user goes on Wallet UI to see his Wallet, check if user was just created and show your wallet creation is in progress, please check in some time
What solutions are available to prevent this kind of data inconsistency from happening?
Traditionally, distributed transaction managers are used. A few years ago in the Java EE world you might have created these services as EJBs which were deployed to different nodes and your API gateway would have made remote calls to those EJBs. The application server (if configured correctly) automatically ensures, using two phase commit, that the transaction is either committed or rolled back on each node, so that consistency is guaranteed. But that requires that all the services be deployed on the same type of application server (so that they are compatible) and in reality only ever worked with services deployed by a single company.
Are there patterns that allow transactions to span multiple REST requests?
For SOAP (ok, not REST), there is the WS-AT specification but no service that I have ever had to integrate has support that. For REST, JBoss has something in the pipeline. Otherwise, the "pattern" is to either find a product which you can plug into your architecture, or build your own solution (not recommended).
I have published such a product for Java EE: https://github.com/maxant/genericconnector
According to the paper you reference, there is also the Try-Cancel/Confirm pattern and associated Product from Atomikos.
BPEL Engines handle consistency between remotely deployed services using compensation.
Alternatively, I know REST might just not be suited for this use case. Would perhaps the correct way to handle this situation to drop REST entirely and use a different communication protocol like a message queue system?
There are many ways of "binding" non-transactional resources into a transaction:
As you suggest, you could use a transactional message queue, but it will be asynchronous, so if you depend on the response it becomes messy.
You could write the fact that you need to call the back end services into your database, and then call the back end services using a batch. Again, async, so can get messy.
You could use a business process engine as your API gateway to orchestrate the back end microservices.
You could use remote EJB, as mentioned at the start, since that supports distributed transactions out of the box.
Or should I enforce consistency in my application code (for example, by having a background job that detects inconsistencies and fixes them or by having a "state" attribute on my User model with "creating", "created" values, etc.)?
Playing devils advocate: why build something like that, when there are products which do that for you (see above), and probably do it better than you can, because they are tried and tested?
In micro-services world the communication between services should be either through rest client or messaging queue. There can be two ways to handle the transactions across services depending on how are you communicating between the services. I will personally prefer message driven architecture so that a long transaction should be a non blocking operation for a user.
Lets take you example to explain it :
Create user BOB with event CREATE USER and push the message to a message bus.
Wallet service subscribed to this event can create a wallet corresponding to the user.
The one thing which you have to take care is to select a robust reliable message backbone which can persists the state in case of failure. You can use kafka or rabbitmq for messaging backbone. There will be a delay in execution because of eventual consistency but that can be easily updated through socket notification. A notifications service/task manager framework can be a service which update the state of the transactions through asynchronous mechanism like sockets and can help UI to update show the proper progress.
Personally I like the idea of Micro Services, modules defined by the use cases, but as your question mentions, they have adaptation problems for the classical businesses like banks, insurance, telecom, etc...
Distributed transactions, as many mentioned, is not a good choice, people now going more for eventually consistent systems but I am not sure this will work for banks, insurance, etc....
I wrote a blog about my proposed solution, may be this can help you....
https://mehmetsalgar.wordpress.com/2016/11/05/micro-services-fan-out-transaction-problems-and-solutions-with-spring-bootjboss-and-netflix-eureka/
Eventual consistency is the key here.
One of the services is chosen to become primary handler of the event.
This service will handle the original event with single commit.
Primary handler will take responsibility for asynchronously communicating the secondary effects to other services.
The primary handler will do the orchestration of other services calls.
The commander is in charge of the distributed transaction and takes control. It knows the instruction to be executed and will coordinate executing them. In most scenarios there will just be two instructions, but it can handle multiple instructions.
The commander takes responsibility of guaranteeing the execution of all instructions, and that means retires.
When the commander tries to effect the remote update and doesn’t get a response, it has no retry.
This way the system can be configured to be less prone to failure and it heals itself.
As we have retries we have idempotence.
Idempotence is the property of being able to do something twice such a way that the end results be the same as if it had been done once only.
We need idempotence at the remote service or data source so that, in the case where it receives the instruction more than once, it only processes it once.
Eventual consistency
This solves most of distributed transaction challenges, however we need to consider couple of points here.
Every failed transaction will be followed by a retry, the amount of attempted retries depends on the context.
Consistency is eventual i.e., while the system is out of consistent state during a retry, for example if a customer has ordered a book, and made a payment and then updates the stock quantity. If the stock update operations fail and assuming that was the last stock available, the book will still be available till the retry operation for the stock updating has succeeded. After the retry is successful your system will be consistent.
Why not use API Management (APIM) platform that supports scripting/programming? So, you will be able to build composite service in the APIM without disturbing micro services. I have designed using APIGEE for this purpose.