Long-running operations in web-application - rest

Web application operations are generally meant to be quick to avoid long wait times to users. However, some operations the web application may perform may be computationally-intensive and take a fair bit of time. What is the best practice in REST to deal with such operations that may be take several minutes yet require an immediate response to users? Is it okay for the web application to take several minutes to return the response of the HTTP request, or is it better to return a 202 response, process in the background somewhere else, and then provide some form of notification to the user?

Is it okay for the web application to take several minutes to return the response of the HTTP request
No. Part of the problem with this approach is that if the server doesn't acknowledge the request in a timely fashion, the client won't know that it reached its intended destination.
is it better to return a 202 response, process in the background somewhere else, and then provide some form of notification to the user?
Yes. That's exactly what 202 Accepted is designed for
The 202 response is intentionally noncommittal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent's connection to the server persist until the process is completed. The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
It can help, I think, to remember that we're talking about your integration domain; the client isn't talking to your app. It's instead talking to your API, which pretends to be a web site that the client can integrate with. So your client sends the request to the API, and the API responds with an accepted message accompanied by a bunch of links that will help the client continue with the protocol and eventually reach its goal.

Related

HTTP and REST: Are Status Codes Relevant to the Request, or the Resource?

In REST using HTTP, are status codes relevant to the request, or the resource?
In non-REST HTTP request/response calls, the status code refers to the success/failure/etc of the request being made. Did the request fail because the client sent an invalid request? Then return 400, and so on.
But with REST, the URI is stated to refer to a "resource", and the status code often appears to be used to refer to the resource (i.e. it's metadata about the resource). But if the request to retrieve the resource fails, how does the client know if there's a problem with the resource, or just a problem with the request to GET the resource?
My question is - have I understood that correctly? And if so, what happens if the request fails server-side?
Example:
Client calls POST /api/order/ with a payload.
Server validates the request and enqueues it before immediately returning 202 Accepted with Location: /api/order/43573/status and Retry-After: 60.
Client waits the requisite 60 seconds before polling GET /api/order/43573/status.
Server returns 202 Accepted, including a payload with more information and Retry-After: 60.
(The waiting and polling continues until processing completes.)
One of two things happens:
Success
Background processing succeeds.
Client polls GET /api/order/43573/status.
Server returns 303 See Other, with Location: /api/order/43573.
Client calls GET /api/order/43573.
Server returns 200 OK with appropriate payload.
Failure
Background processing fails for some reason.
Client polls GET /api/order/43573/status.
Server returns 200 See Other, with payload indicating failure.
My problem is that last request - returning 200 OK seems right for the web request, but the only body it'll ever contain is information about failure.
TL;DR
a HTTP response code belongs to the response issued for a request and not the resource itself
Unfortunately there is a widespread misunderstanding of what REST is and what it isn't.
As Jim Webber correctly pointed out, HTTP is an application protocol whose domain is the transfer of documents over a network and any business activities we deduce are just side effect of the actual document management. It is therefore our task to narrow down these side-effects to something useful.
REST or more precisely the REST architecture is a set of constraints that mainly deal with the decoupling of clients from servers by relying on standardized document types / representation formats and on an interaction model that is similar to the one used on the Web. The main goal here is to allow server to evolve freely in future without having to fear that introduced changes will break clients, as they only operate on standardized document formats they'll understand and are able to process. Through careful design and focus on above mentioned standardized formats the interoperability between different systems should be guaranteed.
In example, REST puts a strong focus on how server teach clients on what they can do next. Jim Webber compared it to a text based computer game where you are given options on what to do next. He later on gave also the example of a typical Web based Amazon-like checkout where you play along a given predefined checkout protocol until you reached the end of that state-machine in some way or the other.
A strong hint that a HTTP response code belongs to the response issued for a request and not the resource itself is, when you i.e. have resource hidden behind some authentication. In the first request you do not include an Authorization HTTP request header which then results in a 401 Unauthorized status code. Your browser will now ask you for a username and password and upon entering it the same reuqest will be reissued but this time with the Authorization header set which then might succeed, depending on your input and/or the server implementation.
what happens if the request fails server-side?
HTTP is based upon a request-response model meaning that for each request a response must be issued. As mentioned before, the HTTP status code returned here acts as coordination metadata that allows a client to act upon. I.e. with the above example with the 401 Unauthorized status code the browser (client) asked for credentials to send along and reissued the request.
Further, each of the HTTP operations used contain some defined properties a client can assume to hold true. Whether a server sticks to these is a different story, but a well-behaved server should. Such properties are safety and idempotency. The primer one does guarantee that a resource state wont be change when a request issued with such an operation is processed. Prominent examples here are GET and HEAD which are used to retrieve docuements. Itempotency is a property that is useful in case of a network connection issue and the client can't be sure whether the request reached the server at all or just the response got lost mid-way. It basically allows a server to resend the request without further thinking as the outcome here should be the same regardless if the initial request was processed at the server or not. PUT and DELETE are often used as example here.
Other HTTP operations such as POST or PATCH don't have such properties, unfortunately. So in case of a network issue the client can't be sure whether a request can be issued automatically. I.e. in case of a network issue while performing an order and later on resending the request, a server might have taken the order twice. As such, if something cruicial should be performed, such a payment, an expansive order or the like it is probably better to use an idempotent operation i.e. by using a POST-PUT creation pattern.
A further solution to prevent processing of certain requests my be to send them as conditional requests containing either ETag, If-Modified-Since or similar request headers. This though usually is used if certain unsafe operations should be performed on very specific resource states. I.e. a patch is usually dependent on the current, checkout version. If some changes are done to the remote state in the meantime, we usually don't want the patch to be applied to the altered version and as such want the request to fail which forces us to download the new state and modify the request in a way so that the desired outcome will be as desired.
According to your question:
While it seems natural to combine 202 Accepted with a Location response header, unfortunately RFC 7231 does not mention the location header as requirement for a 202 Accepted response nor does the definition of the Location header state that it can be used in a 202 Accepted response and as such, according to a response from Roy Fielding, a Location header outside of 201 Created and 3xx repsonses the Location header has no defined meaning. Instead of returning a 202 Accepted initially, you could have returned a 303 See Other response code containing a Location header for the status of the resource, which then could have returned a 202 Accepted status and later on perform a redirect to the final state as you did.
My problem is that last request - returning 200 OK seems right for the web request, but the only body it'll ever contain is information about failure.
According to RFC 7231
The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
So, in case the processing of a long-running process (partially) failed, the payload itself could indicate a failed processing to start with. You might also perform a redirect to a resource that then returns a 4xx status code containing i.e. a application/problem+json response payload giving hints on the reason why processing failed and so on.
I feel like 202 isn't appropriate for GET requests, so I would return 200 with a status payload until such time that returning a redirect makes sense?
The actual definition of 202 Accepted is
The 202 (Accepted) status code indicates that the request has been
accepted for processing, but the processing has not been completed.
The request might or might not eventually be acted upon, as it might
be disallowed when processing actually takes place. There is no
facility in HTTP for re-sending a status code from an asynchronous
operation.
The 202 response is intentionally noncommittal. Its purpose is to
allow a server to accept a request for some other process (perhaps a
batch-oriented process that is only run once per day) without
requiring that the user agent's connection to the server persist
until the process is completed. The representation sent with this
response ought to describe the request's current status and point to
(or embed) a status monitor that can provide the user with an
estimate of when the request will be fulfilled.
So, depending on how you interpret the request has been accepted for processing part you can return a 202 Accepted for a GET request, as the actual process might still not have been finished, or return a 200 OK response. I think the important part here is though the usage of a media type (a.k.a. representation format) a client understands and knows how to process and interpret. Based on that response a client should know whether the actual process completed with or without failures or is still pending. I have to admit though that I am currently unaware of any suitable media types that can be used to express such knowledge.
The status-code element is a 3-digit integer code describing the result of the server's attempt to understand and satisfy the client's corresponding request. The rest of the response message is to be interpreted in light of the semantics defined for that status code. -- RFC 7230
That's always true. It's part of the uniform interface constraint that we all use the same self descriptive messages, regardless of the target resource.
But with REST, the URI is stated to refer to a "resource", and the status code often appears to be used to refer to the resource (i.e. it's metadata about the resource).
I think you've been mislead here. The motivation for the uniform interface is that we can use general purpose components (browsers, caches) in our web applications because all servers use the same messages the same way. That would go right out the window if "REST" resources had different message semantics than the "non-REST" sort.
My problem is that last request - returning 200 OK seems right for the web request, but the only body it'll ever contain is information about failure.
You should probably review Jim Webber's 2011 talk. In summary: HTTP is an application protocol, the domain of that application is the transfer of documents over a network. The status line and headers are metadata for the "transfer of documents over a network" domain.
On the web, we transfer documents ("web pages") for the human beings to read. The audience for the metadata is the browser, not the human. Uniform interface means that you can use the same browser to do your banking, book shopping, track your sales funnel, and ask programming questions . It's the metadata that allows the components of the HTTP application to do useful work (ex: cache-invalidation).
So in your case, you are returning a document (technically: a representation of a resource) that describes the progress of a side effect of an earlier request, presumably with headers that describe how caches can re-use the response.

Should users await for a response after the http request in a saga pattern architecture?

I am designing a microservice architecture, using a database per service pattern.
Following the example of Order Service and Shipping Service, when a user makes an HTTP REST request to the Order Service, this one fires an event to notify shipping service. All this happens asynchronously. So, what happens with the user experience? I mean, the user needs an immediate response from the HTTP request. How can I handle this scenario?
All this happens asynchronously. So, What happen with the user experience? I mean, the user needs an immediately response from the HTTP request. How can I handle this scenario?
Respond as soon as you have stored the request.
Part of the point of microservices is that you have a system composed of independently deployable elements that do not require coordination.
If you want a system that is reliable even though the services don't have 100% uptime, then you need to have some form of durable message storage so that the sender and the receiver don't need to be running at the same time.
Therefore, your basic pattern for data from the outside is that the information from the incoming HTTP request is copied, not directly into a running service, but instead into the message store, to be processed by the service at some later time.
In other words, your REST API is a facade in front of your storage, not in front of the service itself.
The actor model may be a useful analogy; information moves around by copying messages into different inboxes, and are later consumed by the subscribing actor.
From the perspective of the client, the HTTP response is an acknowledgement that the request has been received and recognized as valid. Think "thank you for your order, we'll send you an email when your purchase is ready for pick up."
On the web, we would include in the response links to other useful resources; click here to see the status of your order, click there to see your history of recent orders, and so on.

REST API status when external APIs are down - Best Practices

I'm looking for guidance on good practices when it comes to returning errors from a REST API. I'm working on a new API so I can take it any direction.
In my case client invokes my API which internally invokes some external APIs. In case of success no problem, but in case of error responses from the far end(external cloud APIs) I am not sure what is industry standard for such services. Am currently thinking of returning 200 OK and then a json payload which details about the external API errors.
So what is the industry recommendations? Good practices (please explain why!) and also, from a client pov, what kind of error handling in the REST API makes life easier for the client code?
The failure you're asking about is one that has occurred within the internals of the service itself, though it is having external dependencies, so a 5XX status code range is the correct choice. 503 Service Unavailable looks perfect for the situation you've described.
5XX codes used for telling the client that even though the request was fine, the server has had some kind of problem fulfilling the request. On the other hand,
4XX codes are used to tell the client that it has done something wrong in request (and that the server is just fine, thanks).
Sections 10.4 and 10.5 of the HTTP 1.1 spec explain the different purposes of 4XX and 5XX codes.
Our colleagues have already provided the links / explanations about the HTTP status codes so you should learn them and find the most appropriate in your case.
I'll more concentrate on what can influence your decisions, assuming you've learnt the status codes.
Basically, You should understand what are the business implications of the flow triggered by client when he/she calls "your" API. The client doesn't know anything about the external cloud API you're working with and doesn't really care whether it works or not, the client works with your application.
If so, when the remote system returns some kind of error (and yes, different error statuses should give you a clue of what's wrong with the remote system), its your business decision about how to handle this error, and depending on this decision you might want to "behave" differently in the interaction with a client.
Here are some examples:
You know that the remote system breaks extremely rarely. But once its unavailable, you system doesn't work as well.
In this case you can might consider to retry the call to remote system if it failed. And if you still out of luck - then return some error status. Probably something like 5XX
You know that the data provided by remote client is not really important, on the other hand when the client calls your API its better to provide "something" even if its not really up-to-date than nothing. Think about the remote system that provides the "recommended movies" by some client id. And you're building a portal (netflix style). If this recommended movies service is down for some reason, it doesn't make sense to fail the whole portal page (think about the awful user experience). In this case you might want to "pre-cache" some generic list of movies, and use it as a fallback in case of failure of that remote service. In this case obviously you should return 2XX status in any case.
More advanced architecture. You know that the remote service fails often, and you can continue to work when its down. In this case maybe you will want to choose an "asynchronous" style of interaction with the client. For example: the client calls your rest and you respond immediately with an "Accepted" status code (202). You can save this id with status in some Database so that when the user "asks for status of the ticket by ticket id" you'll be able to query the DB. The point is that you return immediately. Then you might want to send the message with the task to some messaging system and once the consumer will pick the message, it will be processed and the db will be updated. As long as the remote service fails the message will get back to queue still being "unprocessed" (usually messaging systems can implement this behavior). Now at some point in time, the remote system starts responding, and all the messages get processed. Now their status in DB is "done".
So its up to client to ask "what happens" /or you can implement some push model with web sockets or something (its not REST style communication anymore in this case). But the point is that at some point in time the client will receive "OK, we're done with the ticket ID" (status 200). In this case the client can call a special endpoint and consume the stored results that you'll store in the DB as well (again status 200)
Bottom line, as you see, HTTP return codes are just an indicator, but its up to you how to organize the process of interconnection with the client and the relevant HTTP statuses will be derived from your decisions.
I would use 503 - Service Unavailable - as the error. Reason -
This is considering the case that the API operation cannot be completed without response from the external API. This is similar to my DB not responding. So my API is unavailable for service till the external service is back online.
As an API client, I am not concerned whether the API server internally invokes other APIs or not. I am just concerned with the result of the API server. So it does not matter to the client whether I am a proxy or not - hence, I would avoid 502 (Bad Gateway) and 504 (Gateway Timeout). These error can put the client into wrong assumption that the Gateway between the client and our service is causing trouble.
As suggested by #developerjack, I would also recommend to - "Include a Retry-After header so that your HTTP client knows not to spam you with retries until after X time. This means less error traffic for you, and better request planning for the client."
HTTP calls are between client and server, and so the error codes should reflect where the error or fault lies on either side of that relationship. Just because its downstream to you doesn't mean the HTTP client needs to care about that.
Given this, you should be returning a 5xx error because the fault is not with the client, its with the server (or its downstream services). Returning a 2xx (see below for caveat) would be incorrect because the HTTP call did not succeed, and a 4xx would be incorrect because it's not the client's fault.
Digging into specific 5xx's you can return:
A 504 or 502 might be appropriate if you specifically want to signal that your service is acting as a gateway/proxy.
A 523 is unofficial but used by cloudflare to specifically signal that an upstream/origin service is unreachable
A 500 (with a human and machine readable error body) is a safe default that simply indicates "there is something not right with the server and its services right now".
Now, in terms of best practice, there are some techniques you can use to either reduce the 500 errors, or make it easier on the clients to respond/react to this 5xx response.
Put in place retries within your service. If your service is working and the fault is downstream, and can successfully store the client's request to retry later when downstream services are available then you can still respond with a 2xx and the client knows that their request will be submitted. A great example of this might be a user sign up workflow; you can process the signup at your side, and then queue the welcome email to retry later if your email provider is unavailable.
Have both human descriptions, machine error codes and links in your API responses. Human descriptions are useful when debugging and developing against your service. Machine codes mean clients can index/track and code up specific code paths to a given scenario, and links to your docs mean you can alway provide more information. Even better is including any specific ID's for you to trace instances of this error in case the HTTP client needs to reach out for support (though this will be heavily dependant on your logging & telemetry). Here's an example:
{
"error_code": 1234,
"description": "X happened with Y because of Z.",
"learn_more": "https://dev.my.app/errors/1234",
"id": "90daa63b-f5ac-4c33-97d5-0801ade75a5d"
}
Include a Retry-After header so that your HTTP client knows not to spam you with retries until after X time. This means less error traffic for you, and better request planning for the client.

How can I implement a RESTful Progress Indicator?

I want my API to be be RESTful
Let say I have started a long running Task with POST and now want to be informed about the progress?
What is the idiomatic REST way to do this?
Poll with GET every 10 seconds?
The response to your
POST /new/long/running/task
should include a Location header. That header will point to an endpoint the client can hit to find out the status of the task. I would suggest your response look something like:
Location: http://my.server/task-status/15
{
"self": "/task-status/15",
"status": "running",
"expectedFinishedAt": <timestamp>
}
Then your client doesn't have to ping arbitrarily, because the server is giving it a hint as to when to check back. Later GETs to /task-status/15 will return updated timestamps. This saves you from having to blindly poll the server. Of course, this works much better if the server has some idea as to how long it will take to finish processing the task.
The way REST works, or rather the mechanism it uses - the HTTPS GET/POST/PUT/DELETE etc. doesn't provide a mechanism to have an event-driven mechanism where the server could send the data to the client. Though, it is theoretically be possible to have client/server functionality in both your server and in your client - though I wouldn't personally endorse this design. So having some sort of a submit API - POST/PUT and then a status query mechanism - GET would do the job.
The client should be the one giving you that information, showing you how many bytes have been sent already to the server. The server should not care about a partially uploaded resource.
That put aside, you will return a "Location" header indicating where is the resource once is created, but not earlier. I mean, when you POST you donĀ“t know which is going to be the address of the resource (that is indicated later in the Location header), so there is no reasonable way of providing an URL to check the status of the upload, because there is no reasonable way of identifying it till is done (you may try crazy things, but it is not recommendable).
Again, the client should give you that feedback, not the server.

http interface for long operation

I have a running system that process short and long running operations with a Request-Response interface based on Agatha-RRSL.
Now we want to change a little in order to be able to send requests via website in Json format so i'm trying many REST server implementation that support Json.
REST server will be one module or "shelve" handled by Topshelf, another module will be the processing module and the last the NoSQL database runner module.
To talk between REST and processing module i'm thinking about a servicebus but we have two types of request: short requests that perform work in 1-2 seconds and long requests that do work in 1 minute..
Is servicebus the right choice for this work? I'm thinking about returning a "response" for long running op with a token that can be used to request operation status and results with a new request. The problem is that big part of the requests must be used like sync request in order to complete http response.
I think I have also problems with response size (on MSMQ message transport) when I have to return huge list of objects
Any hint?
NServiceBus is not really suitable for request-response messaging patterns. It's more suited to asynchronous publish-subscribe.
Edit: In order to implement a kind of request response, you would need to message in both directions, but consisting of three logical steps:
So your client sends a message requesting the data.
The server would receive the message, process it, construct a return message with the data, and send it to the client.
The client can then process the data.
Because each of these steps takes place in isolation and in an asynchronous manner there can be no meaningful SLA or timeout enforced between when a client sends a request and receives a response. But this works nicely for large processing job which may take several minutes to complete.
Additionally a common value which can be used to tie the request to the response will need to be present in both messages. Otherwise a client could send more than one request, and receive multiple responses and not know which response was for which request.
So you can do this with NServiceBus but it takes a little more thought.
Also NServiceBus uses MSMQ as the underlying transport, not http.