I have an API that charges for an order. It accepts the orderId and the amount as inputs. Then it makes a '/charge' call to the downstream, which returns a 202. Immediately after this call, it calls a '/verify' endpoint to make sure that the previous charge was successful.
Now it may happen that the charge was declined. One of the reasons for this can be that the user used an expired card. What should be the error code in this scenario?
As I see it, I can't send a 4xx as the request was correct for my API perspective. A bad request is something that the user can correct - In this case, he can't correct anything since the API just accepts the 'orderId' and the total amount to charge.
If I am sending a 5XX, then 500 does not make sense as this was not an 'unexpected condition' on my server. I can neither send a 503 as my server was not overloaded or down for maintenance.
Currently, I am sending back a 503 with an app code that maps to: Payment verification failed.
The response of the server must always be in the context of the domain responsibility of the service
If the service "accepts" the request and that is all the requester (client) is expected to know, with the domain operation being performed asynchronously behind the scenes, it should return a 202
If the interaction is synchronous, you must surely respond with an error, since the request was unsuccessful.
The response code depends on the domain remits.
As per your service, if the api accepts an identifier in the request, that led to the failed payment and it was the responsibility of the client to pass the right identifier, then you must respond with a 400 - BAD REQUEST.
If however, the api is just an intimation from the client to request you to perform some domain actions, and one of the domain actions failed; then there is nothing the client can do about it, and you must return a 5XX, since it is a service failure
500 is generally used for ungraceful error scenarios as a rule of thumb. But if you are ok to term this a server error, then return a 500
502 - is a BAD GATEWAY, wherein your domain services acting as a proxy for your downstream services failed to perform a domain action.
Please choose what fits
Related
In REST using HTTP, are status codes relevant to the request, or the resource?
In non-REST HTTP request/response calls, the status code refers to the success/failure/etc of the request being made. Did the request fail because the client sent an invalid request? Then return 400, and so on.
But with REST, the URI is stated to refer to a "resource", and the status code often appears to be used to refer to the resource (i.e. it's metadata about the resource). But if the request to retrieve the resource fails, how does the client know if there's a problem with the resource, or just a problem with the request to GET the resource?
My question is - have I understood that correctly? And if so, what happens if the request fails server-side?
Example:
Client calls POST /api/order/ with a payload.
Server validates the request and enqueues it before immediately returning 202 Accepted with Location: /api/order/43573/status and Retry-After: 60.
Client waits the requisite 60 seconds before polling GET /api/order/43573/status.
Server returns 202 Accepted, including a payload with more information and Retry-After: 60.
(The waiting and polling continues until processing completes.)
One of two things happens:
Success
Background processing succeeds.
Client polls GET /api/order/43573/status.
Server returns 303 See Other, with Location: /api/order/43573.
Client calls GET /api/order/43573.
Server returns 200 OK with appropriate payload.
Failure
Background processing fails for some reason.
Client polls GET /api/order/43573/status.
Server returns 200 See Other, with payload indicating failure.
My problem is that last request - returning 200 OK seems right for the web request, but the only body it'll ever contain is information about failure.
TL;DR
a HTTP response code belongs to the response issued for a request and not the resource itself
Unfortunately there is a widespread misunderstanding of what REST is and what it isn't.
As Jim Webber correctly pointed out, HTTP is an application protocol whose domain is the transfer of documents over a network and any business activities we deduce are just side effect of the actual document management. It is therefore our task to narrow down these side-effects to something useful.
REST or more precisely the REST architecture is a set of constraints that mainly deal with the decoupling of clients from servers by relying on standardized document types / representation formats and on an interaction model that is similar to the one used on the Web. The main goal here is to allow server to evolve freely in future without having to fear that introduced changes will break clients, as they only operate on standardized document formats they'll understand and are able to process. Through careful design and focus on above mentioned standardized formats the interoperability between different systems should be guaranteed.
In example, REST puts a strong focus on how server teach clients on what they can do next. Jim Webber compared it to a text based computer game where you are given options on what to do next. He later on gave also the example of a typical Web based Amazon-like checkout where you play along a given predefined checkout protocol until you reached the end of that state-machine in some way or the other.
A strong hint that a HTTP response code belongs to the response issued for a request and not the resource itself is, when you i.e. have resource hidden behind some authentication. In the first request you do not include an Authorization HTTP request header which then results in a 401 Unauthorized status code. Your browser will now ask you for a username and password and upon entering it the same reuqest will be reissued but this time with the Authorization header set which then might succeed, depending on your input and/or the server implementation.
what happens if the request fails server-side?
HTTP is based upon a request-response model meaning that for each request a response must be issued. As mentioned before, the HTTP status code returned here acts as coordination metadata that allows a client to act upon. I.e. with the above example with the 401 Unauthorized status code the browser (client) asked for credentials to send along and reissued the request.
Further, each of the HTTP operations used contain some defined properties a client can assume to hold true. Whether a server sticks to these is a different story, but a well-behaved server should. Such properties are safety and idempotency. The primer one does guarantee that a resource state wont be change when a request issued with such an operation is processed. Prominent examples here are GET and HEAD which are used to retrieve docuements. Itempotency is a property that is useful in case of a network connection issue and the client can't be sure whether the request reached the server at all or just the response got lost mid-way. It basically allows a server to resend the request without further thinking as the outcome here should be the same regardless if the initial request was processed at the server or not. PUT and DELETE are often used as example here.
Other HTTP operations such as POST or PATCH don't have such properties, unfortunately. So in case of a network issue the client can't be sure whether a request can be issued automatically. I.e. in case of a network issue while performing an order and later on resending the request, a server might have taken the order twice. As such, if something cruicial should be performed, such a payment, an expansive order or the like it is probably better to use an idempotent operation i.e. by using a POST-PUT creation pattern.
A further solution to prevent processing of certain requests my be to send them as conditional requests containing either ETag, If-Modified-Since or similar request headers. This though usually is used if certain unsafe operations should be performed on very specific resource states. I.e. a patch is usually dependent on the current, checkout version. If some changes are done to the remote state in the meantime, we usually don't want the patch to be applied to the altered version and as such want the request to fail which forces us to download the new state and modify the request in a way so that the desired outcome will be as desired.
According to your question:
While it seems natural to combine 202 Accepted with a Location response header, unfortunately RFC 7231 does not mention the location header as requirement for a 202 Accepted response nor does the definition of the Location header state that it can be used in a 202 Accepted response and as such, according to a response from Roy Fielding, a Location header outside of 201 Created and 3xx repsonses the Location header has no defined meaning. Instead of returning a 202 Accepted initially, you could have returned a 303 See Other response code containing a Location header for the status of the resource, which then could have returned a 202 Accepted status and later on perform a redirect to the final state as you did.
My problem is that last request - returning 200 OK seems right for the web request, but the only body it'll ever contain is information about failure.
According to RFC 7231
The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
So, in case the processing of a long-running process (partially) failed, the payload itself could indicate a failed processing to start with. You might also perform a redirect to a resource that then returns a 4xx status code containing i.e. a application/problem+json response payload giving hints on the reason why processing failed and so on.
I feel like 202 isn't appropriate for GET requests, so I would return 200 with a status payload until such time that returning a redirect makes sense?
The actual definition of 202 Accepted is
The 202 (Accepted) status code indicates that the request has been
accepted for processing, but the processing has not been completed.
The request might or might not eventually be acted upon, as it might
be disallowed when processing actually takes place. There is no
facility in HTTP for re-sending a status code from an asynchronous
operation.
The 202 response is intentionally noncommittal. Its purpose is to
allow a server to accept a request for some other process (perhaps a
batch-oriented process that is only run once per day) without
requiring that the user agent's connection to the server persist
until the process is completed. The representation sent with this
response ought to describe the request's current status and point to
(or embed) a status monitor that can provide the user with an
estimate of when the request will be fulfilled.
So, depending on how you interpret the request has been accepted for processing part you can return a 202 Accepted for a GET request, as the actual process might still not have been finished, or return a 200 OK response. I think the important part here is though the usage of a media type (a.k.a. representation format) a client understands and knows how to process and interpret. Based on that response a client should know whether the actual process completed with or without failures or is still pending. I have to admit though that I am currently unaware of any suitable media types that can be used to express such knowledge.
The status-code element is a 3-digit integer code describing the result of the server's attempt to understand and satisfy the client's corresponding request. The rest of the response message is to be interpreted in light of the semantics defined for that status code. -- RFC 7230
That's always true. It's part of the uniform interface constraint that we all use the same self descriptive messages, regardless of the target resource.
But with REST, the URI is stated to refer to a "resource", and the status code often appears to be used to refer to the resource (i.e. it's metadata about the resource).
I think you've been mislead here. The motivation for the uniform interface is that we can use general purpose components (browsers, caches) in our web applications because all servers use the same messages the same way. That would go right out the window if "REST" resources had different message semantics than the "non-REST" sort.
My problem is that last request - returning 200 OK seems right for the web request, but the only body it'll ever contain is information about failure.
You should probably review Jim Webber's 2011 talk. In summary: HTTP is an application protocol, the domain of that application is the transfer of documents over a network. The status line and headers are metadata for the "transfer of documents over a network" domain.
On the web, we transfer documents ("web pages") for the human beings to read. The audience for the metadata is the browser, not the human. Uniform interface means that you can use the same browser to do your banking, book shopping, track your sales funnel, and ask programming questions . It's the metadata that allows the components of the HTTP application to do useful work (ex: cache-invalidation).
So in your case, you are returning a document (technically: a representation of a resource) that describes the progress of a side effect of an earlier request, presumably with headers that describe how caches can re-use the response.
I'm looking for guidance on good practices when it comes to returning errors from a REST API. I'm working on a new API so I can take it any direction.
In my case client invokes my API which internally invokes some external APIs. In case of success no problem, but in case of error responses from the far end(external cloud APIs) I am not sure what is industry standard for such services. Am currently thinking of returning 200 OK and then a json payload which details about the external API errors.
So what is the industry recommendations? Good practices (please explain why!) and also, from a client pov, what kind of error handling in the REST API makes life easier for the client code?
The failure you're asking about is one that has occurred within the internals of the service itself, though it is having external dependencies, so a 5XX status code range is the correct choice. 503 Service Unavailable looks perfect for the situation you've described.
5XX codes used for telling the client that even though the request was fine, the server has had some kind of problem fulfilling the request. On the other hand,
4XX codes are used to tell the client that it has done something wrong in request (and that the server is just fine, thanks).
Sections 10.4 and 10.5 of the HTTP 1.1 spec explain the different purposes of 4XX and 5XX codes.
Our colleagues have already provided the links / explanations about the HTTP status codes so you should learn them and find the most appropriate in your case.
I'll more concentrate on what can influence your decisions, assuming you've learnt the status codes.
Basically, You should understand what are the business implications of the flow triggered by client when he/she calls "your" API. The client doesn't know anything about the external cloud API you're working with and doesn't really care whether it works or not, the client works with your application.
If so, when the remote system returns some kind of error (and yes, different error statuses should give you a clue of what's wrong with the remote system), its your business decision about how to handle this error, and depending on this decision you might want to "behave" differently in the interaction with a client.
Here are some examples:
You know that the remote system breaks extremely rarely. But once its unavailable, you system doesn't work as well.
In this case you can might consider to retry the call to remote system if it failed. And if you still out of luck - then return some error status. Probably something like 5XX
You know that the data provided by remote client is not really important, on the other hand when the client calls your API its better to provide "something" even if its not really up-to-date than nothing. Think about the remote system that provides the "recommended movies" by some client id. And you're building a portal (netflix style). If this recommended movies service is down for some reason, it doesn't make sense to fail the whole portal page (think about the awful user experience). In this case you might want to "pre-cache" some generic list of movies, and use it as a fallback in case of failure of that remote service. In this case obviously you should return 2XX status in any case.
More advanced architecture. You know that the remote service fails often, and you can continue to work when its down. In this case maybe you will want to choose an "asynchronous" style of interaction with the client. For example: the client calls your rest and you respond immediately with an "Accepted" status code (202). You can save this id with status in some Database so that when the user "asks for status of the ticket by ticket id" you'll be able to query the DB. The point is that you return immediately. Then you might want to send the message with the task to some messaging system and once the consumer will pick the message, it will be processed and the db will be updated. As long as the remote service fails the message will get back to queue still being "unprocessed" (usually messaging systems can implement this behavior). Now at some point in time, the remote system starts responding, and all the messages get processed. Now their status in DB is "done".
So its up to client to ask "what happens" /or you can implement some push model with web sockets or something (its not REST style communication anymore in this case). But the point is that at some point in time the client will receive "OK, we're done with the ticket ID" (status 200). In this case the client can call a special endpoint and consume the stored results that you'll store in the DB as well (again status 200)
Bottom line, as you see, HTTP return codes are just an indicator, but its up to you how to organize the process of interconnection with the client and the relevant HTTP statuses will be derived from your decisions.
I would use 503 - Service Unavailable - as the error. Reason -
This is considering the case that the API operation cannot be completed without response from the external API. This is similar to my DB not responding. So my API is unavailable for service till the external service is back online.
As an API client, I am not concerned whether the API server internally invokes other APIs or not. I am just concerned with the result of the API server. So it does not matter to the client whether I am a proxy or not - hence, I would avoid 502 (Bad Gateway) and 504 (Gateway Timeout). These error can put the client into wrong assumption that the Gateway between the client and our service is causing trouble.
As suggested by #developerjack, I would also recommend to - "Include a Retry-After header so that your HTTP client knows not to spam you with retries until after X time. This means less error traffic for you, and better request planning for the client."
HTTP calls are between client and server, and so the error codes should reflect where the error or fault lies on either side of that relationship. Just because its downstream to you doesn't mean the HTTP client needs to care about that.
Given this, you should be returning a 5xx error because the fault is not with the client, its with the server (or its downstream services). Returning a 2xx (see below for caveat) would be incorrect because the HTTP call did not succeed, and a 4xx would be incorrect because it's not the client's fault.
Digging into specific 5xx's you can return:
A 504 or 502 might be appropriate if you specifically want to signal that your service is acting as a gateway/proxy.
A 523 is unofficial but used by cloudflare to specifically signal that an upstream/origin service is unreachable
A 500 (with a human and machine readable error body) is a safe default that simply indicates "there is something not right with the server and its services right now".
Now, in terms of best practice, there are some techniques you can use to either reduce the 500 errors, or make it easier on the clients to respond/react to this 5xx response.
Put in place retries within your service. If your service is working and the fault is downstream, and can successfully store the client's request to retry later when downstream services are available then you can still respond with a 2xx and the client knows that their request will be submitted. A great example of this might be a user sign up workflow; you can process the signup at your side, and then queue the welcome email to retry later if your email provider is unavailable.
Have both human descriptions, machine error codes and links in your API responses. Human descriptions are useful when debugging and developing against your service. Machine codes mean clients can index/track and code up specific code paths to a given scenario, and links to your docs mean you can alway provide more information. Even better is including any specific ID's for you to trace instances of this error in case the HTTP client needs to reach out for support (though this will be heavily dependant on your logging & telemetry). Here's an example:
{
"error_code": 1234,
"description": "X happened with Y because of Z.",
"learn_more": "https://dev.my.app/errors/1234",
"id": "90daa63b-f5ac-4c33-97d5-0801ade75a5d"
}
Include a Retry-After header so that your HTTP client knows not to spam you with retries until after X time. This means less error traffic for you, and better request planning for the client.
Our image board allows users to upload images by copy-pasting URLs. A client app sends a POST request to our API with an image URL given in the request body. Our web service receives the POST request and handles it by downloading the image from the given URL by using a server-side HTTP client (request in our case).
In successful case, the service finds the image, downloads it, and stores it to the server. The service returns HTTP 200 to the client.
Now, what if the image cannot be found? What if the download attempt results in HTTP 404? What HTTP error code should we use to response to the client?
HTTP 400 Bad Request is not applicable because the request was well-formed and all parameters were valid.
HTTP 404 Not Found is not applicable because the request URL was found and served although the image URL was not.
HTTP 502 Bad Gateway does not feel right either because there is nothing wrong with our server or the upstream server (the server of the source image). The user just happened to type in an image URL that does not exist.
Any experience on the matter? Which error code is the most correct?
First of all you should decide if this is a client error (4xx) or server error (5xx). From what you describe, it feels more like a client error. The client has requested the creation of a resource from another resource (the image URL) which does not exist.
There is no perfect match for this scenario, although one could make a case for each of the 2 following response codes:
HTTP 409 Conflict: From the RFC:
The request could not be completed due to a conflict with the current
state of the target resource. This code is used in situations where
the user might be able to resolve the conflict and resubmit the
request...
This applies to your case if you consider the target resource to be in a bad state (image not found). If someone provides an image at the specified URL, that effectively transitions your resource to a valid state.
This is also a good match because, as the RFC states, this code implies the user might be able to resolve the conflict (in your case the user would correct this by posting the image to the specified URL).
HTTP 424 Failed Dependency: From the RFC:
The 424 (Failed Dependency) status code means that the method could
not be performed on the resource because the requested action depended
on another action and that action failed...
This applies to your case in that "the requested action depended on another action and that action failed". The dependent action is the posting of an image to the other URL. What you have described is a case where that dependent action either failed or did not happen (which could also be called a failure).
Since the API determines on something that is not available, its service is unavailable as well.
The status code 503: Service Unavailable is the best fit for your situation.
According to the RFC description:
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
Alternatively, if your API supports a way of communicating errors (e.g. to tell the user that the information he submitted is incorrect) you may be able to use this method to tell the user that the external resource is unavailable. This might be a little friendlier and might avoid some error raises on the user's side.
Since the client app sends POST requests to your API server the response codes should be generated according to the received server in your case this is your API server.
If the server has received correct information from the client app and server determines the request as valid, it should return apropriate code with proper JSON or header based error messages.
http error codes were conceived assuming that all pages possibly served were stored locally, one way or another.
Your scenario does not match that assumption and it should therefore not come as a surprise that you don't find codes that fit your bill properly.
Your "not found" scenario is in fact an application error and you should notify your user of the situation by providing an error message on the form where he entered the URL (or return a fully dedicated error page or some such). Or choose an http error nonetheless and accept the notion that it will be a poor fit no matter what.
Now, what if the image cannot be found? What if the download attempt results in HTTP 404? What HTTP error code should we use to response to the client?
The main thing to keep in mind: you are trying to fool the client into thinking that you are a web site - just a dumb document store which might respond to some content editing messages.
For the client, the primary means of communication is the body of the response. See RFC 7231
Except when responding to a HEAD request, the server SHOULD send a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition.
The status code is meta-data: aimed at giving the generic components participating in the exchange a chance to know what is going on (examples: the web browser doesn't need to know what page you are asking for to recognize a redirection response returned by the server, the web browser asking for credentials when it receives a 401 unauthorized response, web caches invalidating entries, or not, depending on the status code returned by the response).
HTTP 400 Bad Request is not applicable because the request was well-formed and all parameters were valid.
Yes, that's exactly right.
I would probably use 500 Internal Server Error, on the grounds that there's nothing wrong with the _document that the server received, the problems are all involved in the side effects of the server's implementation.
A different approach you might consider: 202 Accepted. Roughly translated "I got your message, I understood your message, and I'll get around to it later." If you don't need the side effects to be synchronous, you can defer judgment. That allows you to do things like applying a retry strategy.
The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
"I'll get to it later; if you want to know how it is going, go ask him -->"
Because 202 is a non-error status code, its effect on caches is different from those of a 4xx or 5xx. If you are already thinking ahead about caching, you'll want to the implications of that in mind.
How is possible to handle timeouts in time consuming operations in a REST API. Let's say we have the following scenario as example:
A client service sends a request to insert a resource through a REST API.
Timeout elapses. The client thinks the insertion failed.
REST API keep working and finishes the insertion.
Client do not notify the resource insertion and it status is "Failed".
I can think I a solution with a message broker to send orders to a queue and wait until they are solved.
Any other workaround?
EDIT 1:
POST-PUT Pattern as has been suggested in this thread.
A Message Broker (add more complexity to the system)
Callback or webhook. Pass in the request a return url that the server API can call to let the client know that the work is completed.
HTTP offers a set of properties for invoking certain methods. These are primarily safetiness, idempotency and cacheability. While the first one guarantees a client that no data is modified, the 2nd one gives a promise whether a request can be reissued in regards to connection issues and the client not knowing whether the initial request succeeded or not and only the response got lost mid way. PUT i.e. does provide such a property, i.e.
A simple POST request to "insert" some data does not have any of these properties. A server receiving a POST request furthermore processes the payload according to its own semantics. The client does not know beforehand whether a resource will be created or if the server just ignores the request. In case the server created a resource the server will inform the client via the Location HTTP response header pointing to the actual location the client can retrieve information from.
PUT is usually used only to "update" a resource, though according to the spec it can also be used in order to create a new resource if it does not yet exist. As with POST on a successful resource creation the PUT response should include such a Location HTTP response header to inform the client that a resource was created.
The POST-PUT-Creation pattern separates the creation of the URI from the actual persistence of the representation by first firing off POST requests to the server until a response is received containing a Location HTTP response header. This header is used in a PUT request to actually send the payload to the server. As PUT is idempotent the server simply can reissue the request until it receives a valid response from the server.
On sending the initial POST request to the server, a client can't be sure whether the request reached the server and only the response got lost, or the initial request didn't make it to the server. As the request is only used to create a new URI (without any content yet) the client may simply reissue the request and in worst case just create a new URI that points to nothing. The server may have a cleanup routine that frees unused URIs after a certain amount of time.
Once the client receives the URI, it simply can use PUT to reliably send data to the server. As long as the client didn't receive a valid response, it can just reissue the request over and over until it receives a response.
I therefore do not see the need to use a message-oriented middleware (MOM) using brokers and queues in order to guarantee reliable messaging.
You could also cache the data after a successful insertion with a previously exchanged request_id or something of that sort. But I believe message broker with some asynchronous task runner is a much better way to deal with the problem especially if your request thread is a scarce resource. What I mean by that is. If you are receiving a good amount of requests all the time. Then it is a good idea to keep your responses as quickly as possible so the workers will be available for any requests to come.
What is the correct HTTP status code to return when a server is having issues communicating with an external API?
Say a client sends a valid request to my server A, A then queries server B's API in order to do something. However B's API is currently throwing 500's or is somehow unreachable, what status code should A return to the client? A 5* error doesn't seem correct because server A is functioning as it should, and a 4* error doesn't seem correct because the client is sending a valid request to A.
Did you consider status codes 502 and 504?
502 – The server while acting as a gateway or a proxy,
received an invalid response from the upstream server it accessed
in attempting to fulfill the request.
504 – The server, while acting as a gateway or proxy,
did not receive a timely response from the upstream server
specified by the URI (e.g. HTTP, FTP, LDAP)
or some other auxiliary server (e.g. DNS) it needed to access
in attempting to complete the request.
Of course, this would require a broad interpretation of "gateway" (implementation of interface A requiring a call to interface B), applied to the application layer. But this could be a nice way to say : "I cannot answer but it's not my fault nor yours".
Since the API relies on something that is not available, its service is unavailable as well.
I would think that the status code 503: Service Unavailable is the best fit for your situation. From the RFC description:
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
Granted, the description implies that this status code should be applied for errors on the server itself (and not to signal a problem with an external dependency). However, this is the best fit within the RFC status codes, and I wouldn't suggest using any custom status codes so anyone can understand them.
Alternatively, if your API supports a way of communicating errors (e.g. to tell the user that the ID he supplied is incorrect) you may be able to use this method to tell the user that the dependency is unavailable. This might be a little friendlier and might avoid some bug searching on the user's side, since at least some of the users won't be familiar with any status codes besides 403, 404 and maybe 500, depending on your audience.
You can refer this link.
HTTP Status 424 or 500 for error on external dependency
503 Service Unavailable looks perfect for the situation.
It's been a while that this question was asked.
Faced similar situation today. After exploring a bit, to me it makes more sense to send 424 FAILED_DEPENDENCY.