HTTP Status Code for External Dependency Error - rest

What is the correct HTTP status code to return when a server is having issues communicating with an external API?
Say a client sends a valid request to my server A, A then queries server B's API in order to do something. However B's API is currently throwing 500's or is somehow unreachable, what status code should A return to the client? A 5* error doesn't seem correct because server A is functioning as it should, and a 4* error doesn't seem correct because the client is sending a valid request to A.

Did you consider status codes 502 and 504?
502 – The server while acting as a gateway or a proxy,
received an invalid response from the upstream server it accessed
in attempting to fulfill the request.
504 – The server, while acting as a gateway or proxy,
did not receive a timely response from the upstream server
specified by the URI (e.g. HTTP, FTP, LDAP)
or some other auxiliary server (e.g. DNS) it needed to access
in attempting to complete the request.
Of course, this would require a broad interpretation of "gateway" (implementation of interface A requiring a call to interface B), applied to the application layer. But this could be a nice way to say : "I cannot answer but it's not my fault nor yours".

Since the API relies on something that is not available, its service is unavailable as well.
I would think that the status code 503: Service Unavailable is the best fit for your situation. From the RFC description:
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
Granted, the description implies that this status code should be applied for errors on the server itself (and not to signal a problem with an external dependency). However, this is the best fit within the RFC status codes, and I wouldn't suggest using any custom status codes so anyone can understand them.
Alternatively, if your API supports a way of communicating errors (e.g. to tell the user that the ID he supplied is incorrect) you may be able to use this method to tell the user that the dependency is unavailable. This might be a little friendlier and might avoid some bug searching on the user's side, since at least some of the users won't be familiar with any status codes besides 403, 404 and maybe 500, depending on your audience.

You can refer this link.
HTTP Status 424 or 500 for error on external dependency
503 Service Unavailable looks perfect for the situation.

It's been a while that this question was asked.
Faced similar situation today. After exploring a bit, to me it makes more sense to send 424 FAILED_DEPENDENCY.

Related

HTTP Post under the hood

We have 2 Windows services (same machine) that communicate on top of HTTP Protocol.
On specific machine we see the HTTP POST being sent from the client (Windows service) and arrives to the server (Windows service listening to REST CALLs) - 2 times, meaning i get 2 exact HTTP Post request on the service, but we see on client it was executed only 1 time.
Before going to wireshark/analyze the HTTP protocol, I wish to understand what explain this behavior.
When going to https://www.rfc-editor.org/rfc/rfc7231#section-4.3.3
"the origin server SHOULD send a 201 (Created) response containing a Location header
field that provides an identifier for the primary resource created"
I guess we should look in wireshark for 201 response? And if no response? Does the HTTP or network framework for my C# application is retrying the POST on the server side? because we dont see 2 requests sent from client code.
POST reply behavior
While true, more often than not the server replies with a 200-ok status code and some extra information.
Whether this is by mistake or to avoid chatty apis or some other architecture/design consideration, only the developer can tell.
So in theory you get a 201 with an identifier and then make a GET request with said identifier to retrieve details.
In practice a lot of times this does not occur. So it is not safe to assume this behavior.
Your problem
I highly doubt that there is a built in mechanism that retries post. There are plenty of reasons for that:
Duplicating entries. Imagine creating a PayPal payment. If the network has an error and you just did not receive the answer, the built in mechanism will charge you twice.
There are libraries that do that only when you are sure that the request is idempotent, that is the post contained some sort of identifier and the second request will fail.
First, the calls are HTTP GET (not POST).
We define the URL with hostname/FQDN, the solution to avoid duplicated calls was to work with ip address instead of hostname when sending the Rest API.
This is the long explanation of the problem, no root cause yet.
Used both Wireshark/Process Monitor to diag, not sure for the root cause.
Process Monitor: Filtering to display network calls
Wireshark: Filter to show only HTTP
The Client send a single HTTP Get request to:
/DLEManagement/API/Engine/RunLearningPeriod
The call was executed at 11:08:16.931906
We can see 2nd call at 11:08:54.511909 - We did not trigger.
HTTP Get executed from *Server.exe (in red) and the Server is at *Management.Webservice.exe (in red).
We see that a *Client.exe (Antivirus process, in blue) is sending TCPCopy packets in the window between we sent and received.
Also, we can see that the first request was made with APIPA IPv6 and the 2nd call is IPv4, We checked the network interface and it was disabled.
Wireshark screenshot:
Process Monitor screenshot:
Network configuration:

What HTTP status code should server return if client request can't be fulfilled at the moment because of some business logic?

I have a chat application that works in browsers and uses REST API backend. It has following business logic rules:
a user can start chat session with any other users
a user can be in one and only one chat session at the time
if userA and userB have started a chat session and the session is
currently active, then if userC tries to start chat session with
either userA or userB server should prevent that and return some
kind of error to userC
My question is what would be appropriate HTTP status code for this error that userC should receive?
This is not fault of the client so 4xx codes don't seem appropriate.
This is not server error so 5xx codes also don't seem appropriate.
I will send response body with additional JSON info message of why request failed but what would be appropriate HTTP status code that would satisfy RESTful principles?
409 sounds the most appropriate here. 409 is often used in cases where a request is otherwise valid, but cannot be fulfilled because of the current state of the server/some other resource. If that 'other state' changes, the request could be valid again.
I wrote a bit more about 409 Conflict here: https://evertpot.com/http/409-confict
An important thing to recognize is that status codes, like headers, are meta data that belong to the transporting documents over a network domain. The audience for a status code isn't just your bespoke client, but also all of the general purpose components participating in the message exchange.
My question is what would be appropriate HTTP status code for this error that userC should receive?
The first thing to work through is the appropriate class of status code to use. For unsafe requests, the primary question to ask is "did this request change the representation of the resource?" If it did, then you want to think about using a 2xx status code, because you will want the general purpose components to be using cache invalidation to evict the now out-of-date representations in their caches. If the resource didn't change state, then you want to be reviewing the 4xx status code.
In either case, you can get a sense for the possibilities by reviewing the Status Code Registry, deciding which descriptions seem likely, then reviewing the authoritative reference to see if the semantics match what you are looking for.
More often than not, you can cheat and jump immediately to RFC 7231 -- the most familiar status codes are defined by the HTTP standard.
This is not fault of the client so 4xx codes don't seem appropriate.
It's probably the correct choice, though. 5xx is "I wanted to do what you asked, but I couldn't" is unlikely to be the right choice.
403 Forbidden is a pretty good option that says "I understood what you wanted, but I'm not going to do it. It's most commonly associated with a credentials problem, but the standard explicitly allows us to use this code elsewhere
a request might be forbidden for reasons unrelated to the credentials.
409 Conflict is a reasonable candidate.
The good news is that, aside from the human semantics, there isn't actually a lot of difference in how general purpose candidates handle these two status codes. For instance, they have exactly the same default caching behaviors.
HTTP does have a standardized treatment of conditional requests; "apply this change to the resource only if the specified predicate is true". That, in effect, gives you a compare and swap operation - you tag your request with metadata indicating which version of a resource you are looking at locally.
Conditional requests have their own special error code to handle a request that is out of date: 412 Precondition Failed. There's also a 428 Precondition Required status code if a request is missing a predicate and you want to insist. The client would be expected to include an appropriate precondition header to proceed.
As noted by Andrei Dragotoniu, status codes aren't intended to describe your domain behaviors. So you sometimes need to consider that 2xx is appropriate, because the server did what you asked, even though what you asked didn't have the outcome you hoped for.
Imagine, for example, a game on the web; you make a legal move, and the result of your move is that you lose the game. What status code should be used? Probably a 200 in that case - the server's state machine moved from playing the game to losing the game, and that's a perfectly legit outcome for a correctly handled HTTP request.
I don't guess that applies in your case; but you have more information to make that judgment.
First of all what you use is all upto you. There is no silver bullet to decide, you can think of the error code that you feel suits the usecase.
If I was writing this application, I would prefer to return 400 for this. Initiating a chat with a user which is already chatting with someone is a client's intention. If application cannot satisfy this request I would consider to return 400. And in the error message you can say that user is already in chat session with someone else.
5XX is a server side error which should not apply here because the server is still running fine. So I wouldn't use that
Some people will even return 200 and have a error field which tells the request actually failed. So all such things ends up to developer preferences.
The status code that seems more appropariate is 409 Conflict it shows exactly that the server won't accept the request because it is busy or doing something else.
for more info see https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

REST API status when external APIs are down - Best Practices

I'm looking for guidance on good practices when it comes to returning errors from a REST API. I'm working on a new API so I can take it any direction.
In my case client invokes my API which internally invokes some external APIs. In case of success no problem, but in case of error responses from the far end(external cloud APIs) I am not sure what is industry standard for such services. Am currently thinking of returning 200 OK and then a json payload which details about the external API errors.
So what is the industry recommendations? Good practices (please explain why!) and also, from a client pov, what kind of error handling in the REST API makes life easier for the client code?
The failure you're asking about is one that has occurred within the internals of the service itself, though it is having external dependencies, so a 5XX status code range is the correct choice. 503 Service Unavailable looks perfect for the situation you've described.
5XX codes used for telling the client that even though the request was fine, the server has had some kind of problem fulfilling the request. On the other hand,
4XX codes are used to tell the client that it has done something wrong in request (and that the server is just fine, thanks).
Sections 10.4 and 10.5 of the HTTP 1.1 spec explain the different purposes of 4XX and 5XX codes.
Our colleagues have already provided the links / explanations about the HTTP status codes so you should learn them and find the most appropriate in your case.
I'll more concentrate on what can influence your decisions, assuming you've learnt the status codes.
Basically, You should understand what are the business implications of the flow triggered by client when he/she calls "your" API. The client doesn't know anything about the external cloud API you're working with and doesn't really care whether it works or not, the client works with your application.
If so, when the remote system returns some kind of error (and yes, different error statuses should give you a clue of what's wrong with the remote system), its your business decision about how to handle this error, and depending on this decision you might want to "behave" differently in the interaction with a client.
Here are some examples:
You know that the remote system breaks extremely rarely. But once its unavailable, you system doesn't work as well.
In this case you can might consider to retry the call to remote system if it failed. And if you still out of luck - then return some error status. Probably something like 5XX
You know that the data provided by remote client is not really important, on the other hand when the client calls your API its better to provide "something" even if its not really up-to-date than nothing. Think about the remote system that provides the "recommended movies" by some client id. And you're building a portal (netflix style). If this recommended movies service is down for some reason, it doesn't make sense to fail the whole portal page (think about the awful user experience). In this case you might want to "pre-cache" some generic list of movies, and use it as a fallback in case of failure of that remote service. In this case obviously you should return 2XX status in any case.
More advanced architecture. You know that the remote service fails often, and you can continue to work when its down. In this case maybe you will want to choose an "asynchronous" style of interaction with the client. For example: the client calls your rest and you respond immediately with an "Accepted" status code (202). You can save this id with status in some Database so that when the user "asks for status of the ticket by ticket id" you'll be able to query the DB. The point is that you return immediately. Then you might want to send the message with the task to some messaging system and once the consumer will pick the message, it will be processed and the db will be updated. As long as the remote service fails the message will get back to queue still being "unprocessed" (usually messaging systems can implement this behavior). Now at some point in time, the remote system starts responding, and all the messages get processed. Now their status in DB is "done".
So its up to client to ask "what happens" /or you can implement some push model with web sockets or something (its not REST style communication anymore in this case). But the point is that at some point in time the client will receive "OK, we're done with the ticket ID" (status 200). In this case the client can call a special endpoint and consume the stored results that you'll store in the DB as well (again status 200)
Bottom line, as you see, HTTP return codes are just an indicator, but its up to you how to organize the process of interconnection with the client and the relevant HTTP statuses will be derived from your decisions.
I would use 503 - Service Unavailable - as the error. Reason -
This is considering the case that the API operation cannot be completed without response from the external API. This is similar to my DB not responding. So my API is unavailable for service till the external service is back online.
As an API client, I am not concerned whether the API server internally invokes other APIs or not. I am just concerned with the result of the API server. So it does not matter to the client whether I am a proxy or not - hence, I would avoid 502 (Bad Gateway) and 504 (Gateway Timeout). These error can put the client into wrong assumption that the Gateway between the client and our service is causing trouble.
As suggested by #developerjack, I would also recommend to - "Include a Retry-After header so that your HTTP client knows not to spam you with retries until after X time. This means less error traffic for you, and better request planning for the client."
HTTP calls are between client and server, and so the error codes should reflect where the error or fault lies on either side of that relationship. Just because its downstream to you doesn't mean the HTTP client needs to care about that.
Given this, you should be returning a 5xx error because the fault is not with the client, its with the server (or its downstream services). Returning a 2xx (see below for caveat) would be incorrect because the HTTP call did not succeed, and a 4xx would be incorrect because it's not the client's fault.
Digging into specific 5xx's you can return:
A 504 or 502 might be appropriate if you specifically want to signal that your service is acting as a gateway/proxy.
A 523 is unofficial but used by cloudflare to specifically signal that an upstream/origin service is unreachable
A 500 (with a human and machine readable error body) is a safe default that simply indicates "there is something not right with the server and its services right now".
Now, in terms of best practice, there are some techniques you can use to either reduce the 500 errors, or make it easier on the clients to respond/react to this 5xx response.
Put in place retries within your service. If your service is working and the fault is downstream, and can successfully store the client's request to retry later when downstream services are available then you can still respond with a 2xx and the client knows that their request will be submitted. A great example of this might be a user sign up workflow; you can process the signup at your side, and then queue the welcome email to retry later if your email provider is unavailable.
Have both human descriptions, machine error codes and links in your API responses. Human descriptions are useful when debugging and developing against your service. Machine codes mean clients can index/track and code up specific code paths to a given scenario, and links to your docs mean you can alway provide more information. Even better is including any specific ID's for you to trace instances of this error in case the HTTP client needs to reach out for support (though this will be heavily dependant on your logging & telemetry). Here's an example:
{
"error_code": 1234,
"description": "X happened with Y because of Z.",
"learn_more": "https://dev.my.app/errors/1234",
"id": "90daa63b-f5ac-4c33-97d5-0801ade75a5d"
}
Include a Retry-After header so that your HTTP client knows not to spam you with retries until after X time. This means less error traffic for you, and better request planning for the client.

HTTP error code when server cannot find a user-given external resource

Our image board allows users to upload images by copy-pasting URLs. A client app sends a POST request to our API with an image URL given in the request body. Our web service receives the POST request and handles it by downloading the image from the given URL by using a server-side HTTP client (request in our case).
In successful case, the service finds the image, downloads it, and stores it to the server. The service returns HTTP 200 to the client.
Now, what if the image cannot be found? What if the download attempt results in HTTP 404? What HTTP error code should we use to response to the client?
HTTP 400 Bad Request is not applicable because the request was well-formed and all parameters were valid.
HTTP 404 Not Found is not applicable because the request URL was found and served although the image URL was not.
HTTP 502 Bad Gateway does not feel right either because there is nothing wrong with our server or the upstream server (the server of the source image). The user just happened to type in an image URL that does not exist.
Any experience on the matter? Which error code is the most correct?
First of all you should decide if this is a client error (4xx) or server error (5xx). From what you describe, it feels more like a client error. The client has requested the creation of a resource from another resource (the image URL) which does not exist.
There is no perfect match for this scenario, although one could make a case for each of the 2 following response codes:
HTTP 409 Conflict: From the RFC:
The request could not be completed due to a conflict with the current
state of the target resource. This code is used in situations where
the user might be able to resolve the conflict and resubmit the
request...
This applies to your case if you consider the target resource to be in a bad state (image not found). If someone provides an image at the specified URL, that effectively transitions your resource to a valid state.
This is also a good match because, as the RFC states, this code implies the user might be able to resolve the conflict (in your case the user would correct this by posting the image to the specified URL).
HTTP 424 Failed Dependency: From the RFC:
The 424 (Failed Dependency) status code means that the method could
not be performed on the resource because the requested action depended
on another action and that action failed...
This applies to your case in that "the requested action depended on another action and that action failed". The dependent action is the posting of an image to the other URL. What you have described is a case where that dependent action either failed or did not happen (which could also be called a failure).
Since the API determines on something that is not available, its service is unavailable as well.
The status code 503: Service Unavailable is the best fit for your situation.
According to the RFC description:
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
Alternatively, if your API supports a way of communicating errors (e.g. to tell the user that the information he submitted is incorrect) you may be able to use this method to tell the user that the external resource is unavailable. This might be a little friendlier and might avoid some error raises on the user's side.
Since the client app sends POST requests to your API server the response codes should be generated according to the received server in your case this is your API server.
If the server has received correct information from the client app and server determines the request as valid, it should return apropriate code with proper JSON or header based error messages.
http error codes were conceived assuming that all pages possibly served were stored locally, one way or another.
Your scenario does not match that assumption and it should therefore not come as a surprise that you don't find codes that fit your bill properly.
Your "not found" scenario is in fact an application error and you should notify your user of the situation by providing an error message on the form where he entered the URL (or return a fully dedicated error page or some such). Or choose an http error nonetheless and accept the notion that it will be a poor fit no matter what.
Now, what if the image cannot be found? What if the download attempt results in HTTP 404? What HTTP error code should we use to response to the client?
The main thing to keep in mind: you are trying to fool the client into thinking that you are a web site - just a dumb document store which might respond to some content editing messages.
For the client, the primary means of communication is the body of the response. See RFC 7231
Except when responding to a HEAD request, the server SHOULD send a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition.
The status code is meta-data: aimed at giving the generic components participating in the exchange a chance to know what is going on (examples: the web browser doesn't need to know what page you are asking for to recognize a redirection response returned by the server, the web browser asking for credentials when it receives a 401 unauthorized response, web caches invalidating entries, or not, depending on the status code returned by the response).
HTTP 400 Bad Request is not applicable because the request was well-formed and all parameters were valid.
Yes, that's exactly right.
I would probably use 500 Internal Server Error, on the grounds that there's nothing wrong with the _document that the server received, the problems are all involved in the side effects of the server's implementation.
A different approach you might consider: 202 Accepted. Roughly translated "I got your message, I understood your message, and I'll get around to it later." If you don't need the side effects to be synchronous, you can defer judgment. That allows you to do things like applying a retry strategy.
The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
"I'll get to it later; if you want to know how it is going, go ask him -->"
Because 202 is a non-error status code, its effect on caches is different from those of a 4xx or 5xx. If you are already thinking ahead about caching, you'll want to the implications of that in mind.

API: what HTTP status code to use for multiple items found error?

Suppose there is a lookup API endpoint. A response can be successful (200), not found (404), ... and in my case more than one item found is an error. Which HTTP status code can describe more than one item found error the best?
Suppose there is a lookup API endpoint. A response can be successful (200), not found (404), ... and in my case more than one item found is an error. Which HTTP status code can describe more than one item found error the best?
The server fully understands the request, but can't deliver a representation that meets its part of the contract.
So the right error code is going to be in the 5xx range.
The 5xx (Server Error) class of status code indicates that the server is aware that it has erred or is incapable of performing the requested method.
If none of the specialized 5xx error codes fits, you should use 500
The 500 (Internal Server Error) status code indicates that the server encountered an unexpected condition that prevented it from fulfilling the request.
Michael Kropat did a good job of enumerating the options in Stop Making It Hard. He makes this interesting observation about 502
I can tell you we would have saved hours upon hours of debugging time if only we had distinguished 502 Bad Gateway (an upstream problem) instead of confusing it with 500 Internal Server Error.
The modern definition of "gateway" can be found in RFC 7230 section 2.3 (Intermediaries).
A "gateway" (a.k.a. "reverse proxy") is an intermediary that acts as an origin server for the outbound connection but translates received requests and forwards them inbound to another server or servers. Gateways are often used to encapsulate legacy or untrusted information services, to improve server performance through "accelerator" caching, and to enable partitioning or load balancing of HTTP services across multiple machines.
All HTTP requirements applicable to an origin server also apply to the outbound communication of a gateway. A gateway communicates with inbound servers using any protocol that it desires, including private extensions to HTTP that are outside the scope of this specification.
Very very roughly, 500 is "my bad", where 502/504 points a finger somewhere else.
what error code would you use for my case?
Based on what you have described, 500. That's appropriate for "my representation of this resource is corrupt."
The reasonable alternative is 502, which is appropriate for "the upstream representation of this resource is corrupt".
In either case, the audience for the error is internal (the client can't do anything to correct the problem. Your support team probably can't do anything useful with the distinction between the status codes). You could just as reasonably argue that the fact that the problem is upstream is an implementation detail of no interest to clients (so 500 everything). Alternatively, you could argue that your API is a gateway that translates received requests and forwards them inbound to another server, and therefore the status code should because the problem is in your store, not in your api.
So it comes down to things like "when tracking the number of errors we have in the api, do we want to distinguish this sort of issue from an exception being thrown internally"?
Authoritative guidance seems to be lacking; choose a method, document your justification, and ship.
That's an interesting problem. If it's an API call for a lookup and you get multiple results back, I'm actually expecting a 200 code with an results array.
But if the request itself is wrongfully formatted, so that it's not clear what it is the client is asking for, you could send a 400 bad request http status code.
Also have a look at the 207 multi-status code, as this might actually closer what you're looking for. Could you maybe provide request and response examples?