Applying SAGA pattern in situations where immediate feedback to user is required - cqrs

Imagine an app where a user has a wallet that they can top up with cash, cash out, or use to make purchases from an external system. When the user creates a new purchase order, we first deduct the amount from the user’s wallet. Then we send an API call to the external API saying the user purchased these items, and we get a response back from the merchant on whether the purchase was successful or not. If the purchase was not successful, we refund the amount to the user’s wallet on our system. One key point is that the merchant's purchase API endpoint can return error responses that are domain errors (user is not registered, user has been deactivated, purchase is less than the minimum allowed amount or above the maximum amount). The user gets an immediate confirmation response on whether the transaction was successful or not, and if not, we show the user the failure reason we got from the external API.
I’d like to apply saga to the flow above, but there are some challenges:
Let’s say we’re going to use a message broker (Kafka, RabbitMQ) for an async saga flow. How do we return a response to the user on whether the transaction was successful or not? The async transaction could fail for any reason, and if it fails it might take a while to process retries or even rollbacks in the background.
And even if we were able to notify the front-end/user of the result using something like webhooks, where we push data to the client, what happens on timeouts or technical failures? Since the flow is async, it could take either a second or an hour to finish. Meanwhile, what should the user see? If we show a timeout error, the user could retry the request and end up with 2 requests in a pending state that will both be processed later on, even though the user’s intention was to make only one.
I cannot show the user a successful message like “Purchase created” then notify them later on for two reasons:
The external API returns domain errors a lot of the time. And their response is immediate. So it won’t make sense for the user to see this response message then immediately get a notification about the failure
The user must be able to see the error message returned by the external API
How do we solve this? The main reason behind attempting to solve it with saga is to ensure consistency and retry on failure, but given that, how do we handle user interaction?

This is how I would solve this using the temporal.io open source project:
1. Synchronously (waiting for completion) execute a workflow when a user creates the purchase order.
2. The workflow deducts the purchase amount from the user's wallet.
3. The workflow calls the external API.
4. If the API call completes without failure, complete the workflow. This unblocks the synchronous call from (1) and shows the status to the user.
5. If the API call fails, start (without waiting for the result) another workflow that implements the rollback.
6. Fail the original workflow. This returns the failure to the caller at (1), which allows showing the error to the user.
7. The rollback workflow executes the rollback logic for as long as needed.
Here is the implementation of the above logic using Java SDK. Other supported SDKs are Go, Typescript/Javascript, Python, PHP.
    import io.temporal.activity.ActivityOptions;
    import io.temporal.api.enums.v1.ParentClosePolicy;
    import io.temporal.failure.ActivityFailure;
    import io.temporal.workflow.Async;
    import io.temporal.workflow.ChildWorkflowOptions;
    import io.temporal.workflow.Workflow;
    import java.time.Duration;
    import java.util.List;

    // PurchaseWorkflow, RollbackWalletUpdate, Activities, Money, Item and
    // WalletUpdate are application-defined types.
    public class PurchaseWorkflowImpl implements PurchaseWorkflow {

      private final ActivityOptions options =
          ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofSeconds(10)).build();

      private final Activities activities = Workflow.newActivityStub(Activities.class, options);

      @Override
      public void purchase(String accountId, Money amount, List<Item> items) {
        WalletUpdate walletUpdate = activities.deductFromWallet(accountId, amount);
        try {
          activities.notifyItemsPurchased(accountId, items);
        } catch (ActivityFailure e) {
          // Create stub used to start a child workflow.
          // ABANDON tells child to keep running after the parent completion.
          RollbackWalletUpdate rollback =
              Workflow.newChildWorkflowStub(
                  RollbackWalletUpdate.class,
                  ChildWorkflowOptions.newBuilder()
                      .setParentClosePolicy(ParentClosePolicy.PARENT_CLOSE_POLICY_ABANDON)
                      .build());
          // Start rollbackWalletUpdate child workflow without blocking.
          Async.procedure(rollback::rollbackWalletUpdate, walletUpdate);
          // Wait for the child to start.
          Workflow.getWorkflowExecution(rollback).get();
          // Fail workflow.
          throw e;
        }
      }
    }
The code that synchronously executes the workflow:
    PurchaseWorkflow purchaseWorkflow =
        workflowClient.newWorkflowStub(PurchaseWorkflow.class, options);
    // Blocks until workflow completion.
    purchaseWorkflow.purchase(accountId, amount, items);
Note that Temporal ensures that the code of the workflow keeps running as if nothing happened in the presence of various types of failures including process crashes. So all the fault tolerance aspects are taken care of automatically.

Related

Continuous request throwing with Flutter

After saving some data, I wait for confirmation from the remote server. I want to keep checking that server's approval status in the background through a different endpoint, sending requests continuously until the transaction is approved, and then take action once it is. How can I keep polling an endpoint like this?
Where should I look for help? If there are examples, I would like to study them.
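One common way to approach this is a bounded polling loop with a growing delay between attempts. The sketch below is in Java rather than Dart, and only illustrates the control flow. `ApprovalPoller`, `pollUntilApproved` and the `checkApproved` callback (a stand-in for the HTTP call to the approval endpoint) are hypothetical names, and the 30 second cap on the delay is an arbitrary assumption.

```java
import java.time.Duration;
import java.util.OptionalInt;
import java.util.function.BooleanSupplier;

/** Polls a status check until it reports approval or the attempt budget runs out. */
final class ApprovalPoller {

    /**
     * @param checkApproved hypothetical stand-in for the request to the approval endpoint
     * @param maxAttempts   upper bound so the loop cannot run forever
     * @param initialDelay  wait after the first miss; doubled after every further miss
     * @return the 1-based attempt on which approval was seen, or empty if it never was
     */
    static OptionalInt pollUntilApproved(
            BooleanSupplier checkApproved, int maxAttempts, Duration initialDelay) {
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (checkApproved.getAsBoolean()) {
                return OptionalInt.of(attempt);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                    return OptionalInt.empty();
                }
                delayMillis = Math.min(delayMillis * 2, 30_000); // assumed cap of 30 s
            }
        }
        return OptionalInt.empty();
    }
}
```

The caller takes action when the result is present and falls back to an error or a "still pending" state when it is empty. Bounding the attempts keeps a permanently unapproved transaction from polling forever.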

What to do if a RESTful api is only partly successful

In our design we have something of a paradox. We have a database of projects. Each project has a status. We have a REST api to change a project from “Ready” status to “Cleanup” status. Two things must happen.
1. update the status in the database
2. send out an email to the approvers
Currently the RESTful api does 1 and, if that is successful, does 2.
But sometimes the email fails to send. But since (1) is already committed, it is not possible to rollback.
I don't want to send the email prior to commit, because I want to make sure the commit is successful before sending the email.
I thought about undoing step 1, but that is very hard. The status change involves adding new records to the history table, so I would need to delete them. And if another person makes other changes concurrently, the undo might get messed up.
So what can I do? If (2) fails, should I return “200 OK” to the client?
Seems like the best option is to return “500 Server Error” with error message that says “The project status was changed. However, sending the email to the approvers failed. Please take appropriate action.”
Perhaps I should not try to do 1 + 2 in a single operation? But that just puts the burden on the client, which is worse!
Just some random thoughts:
You can have a notification-sent status flag along with a datetime of submission. When an email is sent successfully the flag flips; if not, it stays. When changes are submitted, your code iterates through ALL unsent notifications and tries to send them. No idea what backend db you are using, but I believe many have the functionality to send emails as well. You could have a scheduled job (SQL Server Agent for MSSQL) that runs hourly and tries to send if a certain amount of time has passed since the submission datetime, or starts setting off alarms if it fails as well.
If it is that insanely important then maybe you could integrate a third party service such as SendGrid to run as a backup sending mechanism. That of course would be more $$ though...
Traditionally I've always separated functions like this into a backend worker process that handles this kind of administrative tasking stuff across many different applications. Some notifications get sent out every morning. Some get sent out every 15 minutes. Some are weekly summaries. If I run into a crash and burn then I light up the event log and we are (lucky/unlucky) enough to have server monitoring tools that alert us on specified application events.
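The "sent flag plus retry" idea above can be sketched in memory as follows. This is a minimal illustration, not a definitive implementation: `NotificationOutbox`, `Notification`, `enqueue` and `flush` are hypothetical names, the list stands in for a database table, and the `sendEmail` predicate stands in for whatever mail call the application uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** In-memory stand-in for a table of pending email notifications. */
final class NotificationOutbox {

    /** One pending email; {@code sent} flips to true only after a successful send. */
    static final class Notification {
        final String projectId;
        boolean sent;
        int attempts;

        Notification(String projectId) {
            this.projectId = projectId;
        }
    }

    private final List<Notification> rows = new ArrayList<>();

    /** Meant to be recorded in the same unit of work as the status change. */
    void enqueue(String projectId) {
        rows.add(new Notification(projectId));
    }

    /** Tries every unsent notification and returns how many are still unsent. */
    int flush(Predicate<Notification> sendEmail) {
        int remaining = 0;
        for (Notification n : rows) {
            if (n.sent) {
                continue; // already delivered on an earlier run
            }
            n.attempts++;
            boolean ok;
            try {
                ok = sendEmail.test(n);
            } catch (RuntimeException e) {
                ok = false; // a failed send leaves the flag untouched for the next run
            }
            if (ok) {
                n.sent = true;
            } else {
                remaining++;
            }
        }
        return remaining;
    }
}
```

With this shape the API can return success as soon as the status change and the notification row are stored, and a scheduled job calls `flush` until nothing remains. The `attempts` counter gives that job something to alert on.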

REST APIs: How to ensure atomicity?

I am developing a small REST API. As I got into analyzing all the possible failure scenarios, which I have to handle to create a reliable and stable system, I went into thinking about how to make my APIs atomic.
Take the simple case of creating a contact through the POST API:
1. The server gets the POST request for the new contact.
2. Creates the contact in the DB.
3. Creates a response to send back to the client.
4. The server crashes before sending the response.
5. The client gets a timeout error (or connection refused?)
The client is bound to think that the contact creation has failed, though, in fact, the contact was in the DB.
Is this a rare case we can ignore? How do big companies deal with such an issue?
To handle this, you should make your write APIs idempotent, i.e. if the same operation is executed multiple times, the result should be the same as if the operation was done only once.
To achieve this in your current example, you need to be able to identify a contact uniquely based on some parameter, say emailAddress. So, if the createContact is called again with the same emailAddress, check in the DB if a contact already exists with the emailAddress. If so, return the existing contact. Else, create a new contact with the emailAddress and return it.
Hope this helps.
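A minimal sketch of the idempotent createContact described above, assuming emailAddress is the unique key. `ContactService` and `Contact` are hypothetical names, and a `ConcurrentHashMap` stands in for the database table with a unique constraint on the email column.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Idempotent contact creation keyed by email address (in-memory stand-in for a DB). */
final class ContactService {

    static final class Contact {
        final long id;
        final String emailAddress;
        final String name;

        Contact(long id, String emailAddress, String name) {
            this.id = id;
            this.emailAddress = emailAddress;
            this.name = name;
        }
    }

    private final Map<String, Contact> byEmail = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /**
     * Returns the existing contact for this email if there is one; otherwise
     * creates it. Repeating the call after a timeout therefore has no extra effect.
     */
    Contact createContact(String emailAddress, String name) {
        return byEmail.computeIfAbsent(
                emailAddress, key -> new Contact(nextId.getAndIncrement(), key, name));
    }
}
```

`computeIfAbsent` makes the check and the insert a single atomic step, which is the role a unique constraint plays in a real database. A separate "check, then insert" would leave a window for two concurrent requests to both create the contact.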
If the request times out, the client should not make any assumption about whether it failed or succeeded.
If it is just a user making a request from a web form, then the timeout should just be exposed to the user, and they can hit the back button and check whether the operation succeeded or not, and if not they submit the request again. (This is fine as long as you always keep a consistent state. If your operation has multiple steps and fails mid way, you need to roll back.)
However if reliable messaging is important to your application you will have to use a library or build your own reliable messaging layer. This could work by having the client assign a unique ID to every request, and having another request that lets you check the result of that request ID later. Then you can do automated retries but only where necessary.
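The request-ID approach can be sketched as below. Everything here is a hypothetical illustration: `ReliableRequests`, `Server`, `handle`, `status` and `send` are made-up names, the server is an in-process object rather than a remote service, and a thrown `IllegalStateException` stands in for a timeout in which the work was done but the response was lost.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class ReliableRequests {

    /** Server side: remembers the outcome of every request ID it has completed. */
    static final class Server {
        private final Map<String, String> outcomes = new ConcurrentHashMap<>();
        int executions = 0;               // how many times the operation really ran
        boolean dropNextResponse = false; // hook that simulates a lost response

        String handle(String requestId, String payload) {
            String outcome = outcomes.computeIfAbsent(requestId, id -> {
                executions++;
                return "created:" + payload;
            });
            if (dropNextResponse) {
                dropNextResponse = false;
                throw new IllegalStateException("timeout"); // work done, response lost
            }
            return outcome;
        }

        /** The extra request that lets a client ask what happened to a request ID. */
        Optional<String> status(String requestId) {
            return Optional.ofNullable(outcomes.get(requestId));
        }
    }

    /** Client side: on timeout, ask for the status first and resend only if unknown. */
    static String send(Server server, String payload) {
        String requestId = UUID.randomUUID().toString();
        try {
            return server.handle(requestId, payload);
        } catch (IllegalStateException timeout) {
            return server.status(requestId)
                    .orElseGet(() -> server.handle(requestId, payload));
        }
    }
}
```

Because the ID is chosen by the client before the first attempt, a resend with the same ID is harmless even when the status check itself is skipped. The server simply returns the recorded outcome.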

How to prevent user from withdrawing funds twice?

I have an endpoint in my api, which lets users specify the amount of money they want to withdraw. Before I send the withdrawal request to the payment processor, I check that the requested amount is <= the user's balance. Once the payment is processed, I deduct the amount from the user's balance.
But I'm thinking, someone could send a second request before the first payment is processed, effectively withdrawing the amount twice. How do I prevent this situation?
PS: I'm using Flask Restless and Postgres, if that makes any difference.
In your case, where you're co-ordinating with an external service, it's harder than you'd expect.
Traditional: prepared transactions and XA
The standard solution to this is to use two-phase commit, creating a distributed transaction, where you update the user's record before sending the payment request:
    UPDATE account
    SET balance = balance - :requested_amount
    WHERE balance >= :requested_amount AND user_id = :userid
If the update succeeds (i.e. they had enough money), you PREPARE TRANSACTION to get the DB to confirm the tx will be saved even if the DB crashes. You then send the request off to the provider, and COMMIT PREPARED or ROLLBACK PREPARED depending on the result.
A lock is held on the balance record by the prepared transaction so no other tx can begin until the prepared tx is rolled back or committed, at which point the new balance is visible.
The old balance shows up to other transactions until the prepared transaction commits or rolls back, unless they use SELECT ... FOR UPDATE or SELECT ... FOR SHARE, in which case they'll wait until the prepared TX commits/rolls back. The NOWAIT option lets them error out instead. It's all very convenient.
However, this approach scales poorly for very large client counts, and it can become problematic if the payment processor is slow or becomes unresponsive. At least in PostgreSQL there's a limit on how many prepared transactions you can have at a time.
More scalable: Keep an open-transaction log in the app
If you don't want to use two-phase commit, you'll need to keep an open transaction log instead.
Rather than just checking the user's balance, the app inserts a row into the active_transactions table as part of beginning a transaction for the user. You'll need a unique constraint on active_transactions.user_id so that if the user already has an active transaction, or if there are concurrent inserts, all but one gets rejected.
You'll probably want to update the user's balance in the same transaction.
Other approaches, like SELECTing to check for the user before inserting a record, are unsafe and prone to race conditions. They're OK if they help provide nicer error messages, etc, but are only acceptable as additional checks.
Then you send the payment request and wait for a response. Whether it's successful or not, when you get a response you delete the open transaction journal entry and copy it to a history table; if the payment failed, you also put the user's balance back up in the same transaction, then commit. Do whatever record-keeping etc you need to in the same transaction you process the payment response in.
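The open-transaction log flow can be sketched in memory like this. It is an illustration under stated assumptions, not the answer's own code: `WithdrawalLedger` and its methods are hypothetical, the two maps stand in for the account and active_transactions tables, `synchronized` stands in for a database transaction, and amounts are in cents.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of the balance table plus an active_transactions journal. */
final class WithdrawalLedger {

    private final Map<String, Long> balances = new HashMap<>();
    // One entry per user at most, mirroring a unique constraint on user_id.
    private final Map<String, Long> activeTransactions = new HashMap<>();

    synchronized void deposit(String userId, long cents) {
        balances.merge(userId, cents, Long::sum);
    }

    synchronized long balance(String userId) {
        return balances.getOrDefault(userId, 0L);
    }

    /** Reserves the amount; rejected if a withdrawal is already open or funds are short. */
    synchronized boolean begin(String userId, long cents) {
        long balance = balances.getOrDefault(userId, 0L);
        if (activeTransactions.containsKey(userId) || balance < cents) {
            return false;
        }
        activeTransactions.put(userId, cents);
        balances.put(userId, balance - cents); // deducted together with the journal entry
        return true;
    }

    /** Closes the journal entry; a failed payment puts the money back in the same step. */
    synchronized void finish(String userId, boolean paymentSucceeded) {
        Long cents = activeTransactions.remove(userId);
        if (cents == null) {
            throw new IllegalStateException("no active transaction for " + userId);
        }
        if (!paymentSucceeded) {
            balances.merge(userId, cents, Long::sum);
        }
    }
}
```

The payment request is sent between `begin` and `finish`. A second withdrawal that arrives in that window is rejected by `begin`, so the balance cannot be spent twice.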
Either way: Transaction cleanup
With prepared transactions or an app-defined transaction journal, now you're left with the problem of what to do when your app/server crashes with transactions active and you don't know what the payment processor response for them was ... or whether you actually sent the request yet.
Most payment processor APIs offer some help for this by letting you attach application-defined tokens to each request. If you were using prepared transactions you'd use the prepared transaction ID for this; if you were doing your own transaction journal you'd use the ID you generated when you inserted the entry into your transaction journal. On restart after a crash you can then check each open transaction in your app and ask the payment processor if it knows about it and, if so, whether it was successful or not.
You also have to deal with cases where there was no crash, but a payment processor request or response got lost due to a transient network issue, etc. You'll need code that periodically checks for apparently abandoned open transactions and re-checks them with the payment processor, like after crash recovery.
There are a number of failure modes you have to deal with:
App crash / network issue / etc after local payment request saved, but before request sent successfully to processor
Processor down/unreachable
Processor sends reply (payment failed / payment OK) but your app is down/unreachable and you never get the response.
App sends payment request then is restarted before payment processor finishes processing the request (or finishes receiving it). Cleanup code thinks the processor never received the request and discards the local transaction record, then payment processor responds to confirm the payment. (There are a few ways to deal with this, but it's really out of scope for this answer.)
... more?
Fun times, eh?
A useful additional sanity check is to periodically (say, daily) fetch the list of transactions from the provider, and compare it to the transactions you thought you did, making sure the completion statuses match. Flag any mismatches for human evaluation. It shouldn't happen, but ....
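The periodic comparison can be as simple as a diff of two maps from transaction ID to completion status. In this sketch `Reconciler` and `mismatches` are hypothetical names, and the status strings are placeholders for whatever the provider actually reports.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Compares locally recorded transaction statuses with the provider's list. */
final class Reconciler {

    /**
     * @return one entry per transaction ID whose status differs, or which is
     *         known to only one side (the missing side shows up as null)
     */
    static Map<String, String> mismatches(
            Map<String, String> local, Map<String, String> provider) {
        Set<String> allIds = new TreeSet<>(local.keySet());
        allIds.addAll(provider.keySet());

        Map<String, String> result = new TreeMap<>();
        for (String id : allIds) {
            String localStatus = local.get(id);
            String providerStatus = provider.get(id);
            if (!Objects.equals(localStatus, providerStatus)) {
                result.put(id, "local=" + localStatus + ", provider=" + providerStatus);
            }
        }
        return result;
    }
}
```

Every entry in the returned map is something to flag for human evaluation. An empty map means both sides agree.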

Cancel gwt rpc call

In this example there is a nice description of how to implement timeout logic using Timer#schedule. But there is a pitfall. Suppose we have 2 rpc requests: the first does a lot of computation on the server (or retrieves a large amount of data from the database), and the second is a tiny request that returns results immediately. If we make the first request, we will not receive results immediately; instead we hit the timeout. After the timeout we make the second, tiny request, and the abortFlag from the example will then be true, so we can retrieve the results of the second request. But we can also still retrieve the results of the first request that timed out earlier (because the AsyncCallback object of the first call was not destroyed).
So we need some way of cancelling the first rpc call after the timeout occurs. How can I do this?
Let me give you an analogy.
You, the boss, made a call to a supplier to get some product info. The supplier says they need to call you back because the info will take some time to gather. So, you gave them the contact of your foreman.
Your foreman waits for the call. Then you told your foreman to cancel the info request if it takes more than 30 minutes.
Your foreman thinks you are bonkers because he cannot cancel the request, because he does not have an account that gives him privilege to access the supplier's ordering system.
So, your foreman simply ignores any response from the supplier after 30 minutes. Your ingenious foreman sets up a timer in his phone that ignores the call from the supplier after 30 minutes. Even if you killed your foreman, cut off all communication links, the vendor would still be busy servicing your request.
There is nothing on the GWT client-side to cancel. The callback is merely a javascript object waiting to be invoked.
To cancel the call, you need to tell the server-side to stop wasting cpu resources (if that is your concern). Your server-side must be programmed to provide a service API which when invoked would cancel the job and return immediately to trigger your GWT callback.
You can refresh the page, and that would discard the page request and close the socket, but the server side would still be running. And when the server side completes its tasks and tries to perform a http response, it would fail, saying in the server logs that it had lost the client socket.
It is a very straightforward piece of reasoning.
Therefore, it falls into the design of your servlet/service, how a previous request can be identified by a subsequent request.
Cascaded Callbacks
If request 2 is dependent on the status of request 1, you should perform a cascaded callback rather than submitting the two requests one after another. If request 2 is to be run on success, you should place request 2 into the onSuccess block of the callback (or into the onFailure block if it is to be run on failure).
Otherwise, your timer should trigger request 2, and request 2 would have two responsibilities:
tell the server to cancel the previous request
get the small piece of info
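On the client, "ignoring the call after 30 minutes" comes down to wrapping the callback in a guard that the timer can switch off. The sketch below uses plain Java rather than GWT's AsyncCallback so that it stays self-contained. `CancellableCallback` is a hypothetical name, and a `Consumer` stands in for the real callback's success handler.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Wraps a result handler so that late responses can be dropped after a timeout. */
final class CancellableCallback<T> {

    private final AtomicBoolean active = new AtomicBoolean(true);
    private final Consumer<T> delegate;

    CancellableCallback(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    /** Called by the timeout timer; any response arriving afterwards is ignored. */
    void cancel() {
        active.set(false);
    }

    /** Called when the response arrives. Returns true if it was delivered. */
    boolean onSuccess(T result) {
        if (!active.get()) {
            return false; // the request timed out earlier, so drop the stale result
        }
        delegate.accept(result);
        return true;
    }
}
```

This only stops the client from acting on the stale result. As explained above, the server keeps working on the first request unless it exposes its own cancel operation.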