In this example there is a nice description of how to implement timeout logic using Timer#schedule. But there is a pitfall. Say we have two RPC requests: the first does a lot of computation on the server (or retrieves a large amount of data from the database), and the second is a tiny request that returns results immediately. If we make the first request, we will not receive results immediately; instead we hit the timeout, and after the timeout we make the second tiny request. Then abortFlag from the example will be true, so we can retrieve the results of the second request, but we can also retrieve the results of the first request that timed out earlier (because the AsyncCallback object of the first call was never destroyed).
So we need some way to cancel the first RPC call after the timeout occurs. How can I do this?
Let me give you an analogy.
You, the boss, made a call to a supplier to get some product info. The supplier says they need to call you back because the info will take some time to gather. So, you gave them the contact of your foreman.
Your foreman waits for the call. Then you tell your foreman to cancel the info request if it takes more than 30 minutes.
Your foreman thinks you are bonkers, because he cannot cancel the request: he does not have an account that gives him the privilege to access the supplier's ordering system.
So, your foreman simply ignores any response from the supplier after 30 minutes. Your ingenious foreman sets up a timer on his phone and ignores the call from the supplier after 30 minutes. Even if you killed your foreman and cut off all communication links, the supplier would still be busy servicing your request.
There is nothing on the GWT client side to cancel. The callback is merely a JavaScript object waiting to be invoked.
To cancel the call, you need to tell the server-side to stop wasting cpu resources (if that is your concern). Your server-side must be programmed to provide a service API which when invoked would cancel the job and return immediately to trigger your GWT callback.
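A minimal sketch of such a server-side cancel facility, assuming running jobs are tracked under a client-supplied job ID (the names JobRegistry, startJob, and cancelJob are illustrative, not a real GWT API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical server-side registry that a "cancel" RPC endpoint could use.
public class JobRegistry {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();

    // Called by the expensive RPC: run the work under a known job id.
    public void startJob(String jobId, Runnable work) {
        jobs.put(jobId, pool.submit(work));
    }

    // Called by a separate cancel RPC: interrupt the job and return immediately.
    public boolean cancelJob(String jobId) {
        Future<?> f = jobs.remove(jobId);
        return f != null && f.cancel(true); // true -> interrupt if already running
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

The key design point is that a *second* request carries the same job ID as the first, so the service can identify and interrupt the earlier work.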
You can refresh the page, which discards the page request and closes the socket, but the server side would still be running. And when the server side completes its task and tries to send an HTTP response, it will fail, logging on the server that it has lost the client socket.
It is a very straightforward piece of reasoning.
Therefore, it falls into the design of your servlet/service, how a previous request can be identified by a subsequent request.
Cascaded Callbacks
If request 2 is dependent on the status of request 1, you should perform a cascaded callback: place request 2 inside the matching block of request 1's callback (onSuccess if it should run on success, onFailure if it should run on failure or timeout), rather than submitting the two requests one after another.
Otherwise, your timer should trigger request 2, and request 2 would have two responsibilities:
tell the server to cancel the previous request
get the small piece of info
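A sketch of the cascaded-callback idea in plain Java (the AsyncCallback interface is mimicked here so the example is self-contained; in real GWT code you would use com.google.gwt.user.client.rpc.AsyncCallback, and the two request methods stand in for real RPC calls):

```java
import java.util.ArrayList;
import java.util.List;

public class CascadedCallbacks {
    interface AsyncCallback<T> {
        void onSuccess(T result);
        void onFailure(Throwable caught);
    }

    static final List<String> log = new ArrayList<>();

    // Stand-ins for the two RPC calls: the big one fails (times out),
    // the tiny one succeeds immediately.
    static void bigRequest(AsyncCallback<String> cb) { cb.onFailure(new RuntimeException("timeout")); }
    static void tinyRequest(AsyncCallback<String> cb) { cb.onSuccess("tiny-result"); }

    public static void run() {
        bigRequest(new AsyncCallback<String>() {
            public void onSuccess(String result) { log.add("big: " + result); }
            public void onFailure(Throwable caught) {
                log.add("big failed: " + caught.getMessage());
                // Cascade: request 2 fires only from request 1's failure path,
                // so it can also tell the server to cancel, then get the small info.
                tinyRequest(new AsyncCallback<String>() {
                    public void onSuccess(String result) { log.add("tiny: " + result); }
                    public void onFailure(Throwable caught) { log.add("tiny failed"); }
                });
            }
        });
    }
}
```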
Related
Imagine an app where a user has a wallet which they can top up with cash, cash out, or use to make purchases from an external system. When the user creates a new purchase order, we first deduct the amount from the user’s wallet, then send an API call to the external system saying the user purchased these items, and we get a response back from the merchant on whether the purchase was successful or not. If the purchase was not successful, we refund the amount to the user’s wallet on our system. One key note here is that the merchant purchase API endpoint can return error responses that are domain errors (user is not registered, user has been deactivated, purchase is below the minimum or above the maximum allowed amount). The user gets an immediate confirmation response on whether the transaction was successful, and if not, we show the user the failure reason we got from the external API.
I’d like to apply the saga pattern to the flow above, but there are some challenges.
Let’s say we’re using a message broker (Kafka, RabbitMQ) for the async saga flow. How do we return a response to the user on whether the transaction was successful or not? The async transaction could fail for any reason, and if it fails it might take a while to process retries or even rollbacks in the background.
And even if we could notify the front-end/user of the result using something like webhooks, where we push data to the client: what happens on timeouts or technical failures? Since the flow is async, it could take either a second or an hour to finish. Meanwhile, what should the user see? If we show a timeout error, the user could retry the request and end up with two requests in a pending state that will be processed later, when the user’s intention was only to make one.
I cannot show the user a successful message like “Purchase created” then notify them later on for two reasons:
The external API returns domain errors a lot of the time, and its response is immediate, so it won’t make sense for the user to see a success message and then immediately get a notification about the failure.
The user must be able to see the error message returned by the external API
How do we solve this? The main reason behind attempting to solve it with saga is to ensure consistency and retry on failure, but given that, how do we handle user interaction?
This is how I would solve this using the temporal.io open-source project:
1. Synchronously (waiting for completion) execute a workflow when the user creates the purchase order.
2. The workflow deducts the purchase amount from the user's wallet.
3. The workflow calls the external API.
4. If the API call completes without failure, complete the workflow. This unblocks the synchronous call from (1) and shows the status to the user.
5. If the API call fails, start (without waiting for the result) another workflow that implements the rollback.
6. Fail the original workflow. This returns the failure to the caller at (1) and allows showing the error to the user.
7. The rollback workflow executes the rollback logic for as long as needed.
Here is an implementation of the above logic using the Java SDK. Other supported SDKs are Go, TypeScript/JavaScript, Python, and PHP.
public class PurchaseWorkflowImpl implements PurchaseWorkflow {

  private final ActivityOptions options =
      ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofSeconds(10)).build();

  private final Activities activities = Workflow.newActivityStub(Activities.class, options);

  @Override
  public void purchase(String accountId, Money amount, List<Item> items) {
    WalletUpdate walletUpdate = activities.deductFromWallet(accountId, amount);
    try {
      activities.notifyItemsPurchased(accountId, items);
    } catch (ActivityFailure e) {
      // Create stub used to start a child workflow.
      // ABANDON tells the child to keep running after the parent completes.
      RollbackWalletUpdate rollback =
          Workflow.newChildWorkflowStub(
              RollbackWalletUpdate.class,
              ChildWorkflowOptions.newBuilder()
                  .setParentClosePolicy(ParentClosePolicy.PARENT_CLOSE_POLICY_ABANDON)
                  .build());
      // Start the rollbackWalletUpdate child workflow without blocking.
      Async.procedure(rollback::rollbackWalletUpdate, walletUpdate);
      // Wait for the child to start.
      Workflow.getWorkflowExecution(rollback).get();
      // Fail the workflow.
      throw e;
    }
  }
}
The code that synchronously executes the workflow:
PurchaseWorkflow purchaseWorkflow =
    workflowClient.newWorkflowStub(PurchaseWorkflow.class, options);
// Blocks until workflow completion.
purchaseWorkflow.purchase(accountId, amount, items);
Note that Temporal ensures that the code of the workflow keeps running as if nothing happened in the presence of various types of failures including process crashes. So all the fault tolerance aspects are taken care of automatically.
I have a Firebase HTTPS function that sends timed messages and is triggered by Google Cloud Tasks.
According to the Cloud Tasks documentation, any response code outside the 200 range is seen as a failure and will trigger a retry.
This function needs to scale to millions of daily messages, so we need to avoid retrying messages that have a permanent failure (the person opted out, etc).
Note: This is especially important in this example because each task needs to look up the latest information before processing, adding 2-10 Firestore reads to each attempt. We can't send this info in the payload because it might change between the time the message was queued and it is processed.
It's easy to delete the task using the Cloud Tasks API, but I was wondering if there is any HTTP response code (or header) that can mark these tasks as permanently failed (400 Bad Request, for example) so they are not retried.
Only HTTP codes in the 2XX range (200 to 299) are considered a task completion and stop the retries.
All other return codes are considered failures and imply a retry.
Note: 429 (and 503 for App Engine task queues) throttle the retries on the queue (to prevent service congestion).
If you want to stop the Cloud Tasks retry mechanism, return a 2XX code. That's the only way.
You could return 299 and plug an Error Reporting alert on this specific code to track these cases and be alerted on them.
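The outcome-to-status mapping can be sketched as follows. The Outcome enum and the choice of 299 for permanent failures are this example's convention, not part of the Cloud Tasks API; the only contract Cloud Tasks imposes is "2XX means done, everything else retries":

```java
// Map handler outcomes to HTTP status codes so Cloud Tasks retries only
// transient failures.
public class TaskStatus {
    enum Outcome { SENT, PERMANENT_FAILURE, TRANSIENT_FAILURE }

    static int statusFor(Outcome o) {
        switch (o) {
            case SENT:              return 200; // 2XX: task done, no retry
            case PERMANENT_FAILURE: return 299; // still 2XX -> no retry, but
                                                // distinguishable for alerting
            default:                return 503; // outside 2XX -> Cloud Tasks retries
        }
    }
}
```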
I am developing a small REST API. As I got into analyzing all the possible failure scenarios, which I have to handle to create a reliable and stable system, I went into thinking about how to make my APIs atomic.
If we take a simple case of creating a contact through the POST API.
The server gets the POST request for the new contact.
Creates the contact in the DB.
Creates a response to send back to the client.
The server crashes before sending the response.
The client gets a timeout error (or connection refused?)
The client is bound to think that the contact creation has failed, though, in fact, the contact was in the DB.
Is this a rare case we can ignore? How do big companies deal with such an issue?
To handle this, you should make your write APIs idempotent, i.e., if the same operation is executed multiple times, the result should be the same as if the operation was done only once.
To achieve this in your current example, you need to be able to identify a contact uniquely based on some parameter, say emailAddress. So, if the createContact is called again with the same emailAddress, check in the DB if a contact already exists with the emailAddress. If so, return the existing contact. Else, create a new contact with the emailAddress and return it.
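A minimal sketch of that idempotent create, with an in-memory map standing in for the real database and a bare-bones Contact type (both are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContactService {
    static class Contact {
        final String email;
        Contact(String email) { this.email = email; }
    }

    // Stand-in for the DB, keyed by the unique identifying parameter.
    private final Map<String, Contact> db = new ConcurrentHashMap<>();

    // Repeated calls with the same emailAddress return the same contact
    // instead of creating a duplicate.
    public Contact createContact(String email) {
        return db.computeIfAbsent(email, Contact::new);
    }
}
```

With a real database, the same effect comes from a unique constraint on emailAddress plus a "select or insert" in one transaction.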
Hope this helps.
If the request times out, the client should not make any assumption about whether it failed or succeeded.
If it is just a user making a request from a web form, then the timeout should just be exposed to the user, and they can hit the back button and check whether the operation succeeded or not, and if not they submit the request again. (This is fine as long as you always keep a consistent state. If your operation has multiple steps and fails mid way, you need to roll back.)
However if reliable messaging is important to your application you will have to use a library or build your own reliable messaging layer. This could work by having the client assign a unique ID to every request, and having another request that lets you check the result of that request ID later. Then you can do automated retries but only where necessary.
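A sketch of that reliable-messaging layer, under the assumption that the client assigns the request ID and the server exposes a status-check endpoint (all names here are illustrative):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class ReliableRequests {
    enum Status { UNKNOWN, DONE }

    private final Map<String, Status> results = new ConcurrentHashMap<>();

    // Server side: process the request only if this ID was not seen before,
    // so a client retry after a timeout cannot duplicate the work.
    public void submit(String requestId) {
        results.putIfAbsent(requestId, Status.DONE); // do the real work once
    }

    // Server side: the "check result" endpoint the client polls after a timeout.
    public Status check(String requestId) {
        return results.getOrDefault(requestId, Status.UNKNOWN);
    }

    // Client side: a unique ID per logical request.
    public static String newRequestId() { return UUID.randomUUID().toString(); }
}
```

On a timeout the client first calls check; only if the status is UNKNOWN does it resubmit with the same ID.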
The long string of NOTIFY messages happens after the called number answers, and after about 20-30 seconds the 503 happens and then the call connects fine with audio.
If that trace is for a single call, it's an incredibly complex one. After spending a bit of time looking over it, I don't think it is for a single call; instead there are a few different calls mixed up in it. It's complicated by the fact that 10.10.20.1 is a Back to Back User Agent (B2BUA) and is initiating its own calls in response to different events.
As to your question about the NOTIFY request: it's originally generated by the UAC at 10.10.10.3 as part of what appears to be an attended transfer. The REFER request is the start of the transfer. An implicit subscription, which is what the NOTIFY request is part of, gets created for a REFER transaction (see https://www.rfc-editor.org/rfc/rfc3515, and also https://www.rfc-editor.org/rfc/rfc4488, which deals with suppressing the implicit subscription).
For an attended transfer the NOTIFY request allows a call leg end point to indicate that the transfer has been processed successfully. In this case it looks like the user agent at 10.10.10.3 isn't happy to accept the transfer until it gets a response to its NOTIFY request. This is unusual behaviour as typically the NOTIFY requests are for just that, notifying agents of events not controlling call flow. Once 10.10.10.3 gets the 503 response to its NOTIFY request it finally starts sending the RTP to 10.10.20.4. It mustn't care what the response is as 503 is an error condition and would usually result in whatever was waiting for it to fail.
In my CGI script I make a long (up to 10 seconds) request to another server, parse the results, and show the response to my user (via AJAX). But the other server's owner asked me to perform no more than 1 request per 10 seconds, so:
- I need to save each request of my user;
- every ten seconds I can make only one request to the other server.
First I thought about cron, which would open a simple text file (the queue file), read the first line, and send it as a request to the other server. After that it would save the result in another file (where I'd cache all results). So my CGI would first check the cache file and try to find the result in it, and (if the result is not found) save the task in the queue file (for cron).
But cron runs only once per minute, so my users would have to wait a long time...
So how can I do this via CGI?
Maybe:
1. After checking the cache file, the CGI will estimate the time to complete the request (by reading the current queue file) and send this estimate to the HTML (where I can get the time and make another request after this time via AJAX).
2. After that it will save the request to the queue file and fork. The forked process will wait until its request is at the top of the queue and will make the request to the other server.
3. After that it will save the result in the cache file.
What do you think?
Maybe some module is already written for such tasks?
One option is to create a local daemon/service (Linux/Windows) that handles sending all requests to the remote server. Your web service can talk to this daemon instead of the remote service using the same protocol, except on a private port/socket. The daemon can accept requests from the web server/application and every ten seconds, if there is a pending request it can send it on to the remote server, and when there is a response, it can forward it back to the incoming request socket. You can think of this daemon as a proxy server that simply adds a queueing functionality. Note that the daemon doesn't actually have to parse either the incoming request or returning results; it just forwards the bits on to the destination in each case. It only has to implement the queueing and networking functionality.
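The daemon's queueing core can be sketched as below, assuming a single worker drains one request per interval (10 seconds in the question; configurable here so the sketch runs quickly) and the remote call is represented by a plain function. All names are illustrative:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// A proxy that queues incoming requests and forwards at most one per interval.
public class ThrottledProxy {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ConcurrentMap<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public ThrottledProxy(Function<String, String> remoteCall, long intervalMillis) {
        // Every interval, forward at most one queued request to the remote server.
        scheduler.scheduleAtFixedRate(() -> {
            String req = queue.poll();
            if (req != null) {
                CompletableFuture<String> f = pending.remove(req);
                if (f != null) f.complete(remoteCall.apply(req));
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // The web server calls this; the future completes when the daemon has
    // forwarded the request and received the response.
    public CompletableFuture<String> forward(String request) {
        CompletableFuture<String> f = pending.computeIfAbsent(request, k -> new CompletableFuture<>());
        queue.offer(request);
        return f;
    }

    public void shutdown() { scheduler.shutdownNow(); }
}
```

A real daemon would add the result cache from the question and persist the queue across restarts, but the throttling logic is the same.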