REST API: how to notify a client that the request has failed when the service already returned 200 and some data?

What am I doing?
I am developing a REST Web service that returns data from two sources:
A CSV file from an HTTP server, which changes often and is sometimes huge.
A local file.
When a client invokes the service, it does this:
It sends a request to the HTTP server to obtain the CSV file.
After obtaining the CSV file, it combines the data from both sources.
Sends the result to the client. The result is an XML document.
Problem
Sometimes, after I have already returned some data to the client, the HTTP server fails so I cannot continue sending data to the client.
When this happens, I would like to notify the client that there was an error. How should I do this? The service already returned the HTTP code 200 and some data. So I cannot send the client an error 500.
Should I simply write an error message to the output? The client will fail because the XML document will not be valid.
The service cannot wait to send the response until the entire file from the HTTP server has been read. The reason is that the file obtained from the HTTP server is sometimes very big and does not fit in memory.
Environment: although I do not think this is important, this service is developed in Jersey 1.x.

As you say, there are a couple options:
Start sending the response 200 OK before your upload request is complete, but rely on the client to detect an invalid content response; or
Wait until your request file upload is complete before sending the HTTP response. Then you can send the correct status code (2xx or 500).
I would recommend waiting until the upload is complete.
If the file cannot fit in server-side memory, then find a technique to write the stream to persistent storage rather than memory, such as a cache, a NoSQL database, or the filesystem. This will allow for faster processing of the file upload.
If you require additional time to process the file on the server side after upload, you can return a 202 Accepted status, with the Location header pointing to a resource that represents the long-running job. The client can keep checking whether the job is complete. This avoids having to process the whole thing in one HTTP round-trip.
Some good examples of RESTful long-running operations:
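The 202 Accepted pattern described above could be sketched roughly as follows. This is a framework-free illustration, not the Jersey implementation: the `JOBS` registry, the `/jobs/{id}` URL layout, and the function names are all assumptions made for the example, and each handler simply returns a `(status, headers, body)` tuple.

```python
import uuid

# Hypothetical in-memory job registry; a real service would use durable storage.
JOBS = {}


def submit_job(payload):
    """Accept the work and answer immediately with 202 and a status URL."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"state": "pending", "payload": payload}
    # The Location header tells the client where to poll (assumed URL layout).
    return 202, {"Location": f"/jobs/{job_id}"}, ""


def job_status(job_id):
    """Report the current state of a job; the client polls this resource."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {}, ""
    return 200, {}, job["state"]
```

A background worker (not shown) would update `JOBS[job_id]["state"]` once processing finishes or fails, so a failure can be reported cleanly on the status resource instead of in the middle of a streamed response.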
Best practice for implementing long-running searches with REST
http://billhiggins.us/blog/2011/04/27/resty-long-ops/
REST with JAX-RS - Handling long running operations

Replying to myself. This may be useful for someone else.
Initially, I developed this option: if there was an error generating the output of the service when the HTTP code 200 was already sent, the service would write the error message to the output and close the connection. In these cases, the XML of the response was invalid.
Later, I had to change this behavior because users complained that in this scenario, the response was an invalid XML. As a consequence, all they were seeing was the error returned by the XML parser of their applications saying that the XML was invalid, not the actual error message.
To avoid this issue, I changed the behavior of the service:
When there are no errors, the response looks like this:
<view name="demo_stats">
<demo_stats>
<int_type>1</int_type>
<numeric_type>1.1</numeric_type>
</demo_stats>
<demo_stats>
<int_type>2</int_type>
<numeric_type>2.2</numeric_type>
</demo_stats>
</view>
If there is an error generating the output of the service and the service already sent the HTTP code 200, the response looks like this:
<view name="demo_stats">
<demo_stats>
<int_type>1</int_type>
<numeric_type>1.1</numeric_type>
</demo_stats>
<demo_stats>
<int_type>2</int_type>
<numeric_type>2.2</numeric_type>
</demo_stats>
<errors>
<error>There was an error transforming the value of row #3</error>
</errors>
</view>
The errors element is optional and only appears when there is an error while generating the output. The result is still a valid XML document, and it lets client applications handle this situation better.
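A minimal sketch of this approach, written in Python rather than Jersey and assuming the rows arrive from an iterator that may fail midway. The function name `stream_view` and the `(int_value, numeric_value)` row shape are assumptions made for illustration; the element names follow the sample above.

```python
from xml.sax.saxutils import escape, quoteattr


def stream_view(rows, name="demo_stats"):
    """Yield an XML document chunk by chunk.

    If reading `rows` fails midway, the document is closed with an
    <errors> element instead of being left truncated.
    `name` is assumed to be a valid XML element name.
    """
    yield f"<view name={quoteattr(name)}>"
    try:
        for int_value, numeric_value in rows:
            yield (
                f"<{name}>"
                f"<int_type>{escape(str(int_value))}</int_type>"
                f"<numeric_type>{escape(str(numeric_value))}</numeric_type>"
                f"</{name}>"
            )
    except Exception as exc:
        # The 200 status line has already been sent, so the failure
        # is reported inside the document itself.
        yield f"<errors><error>{escape(str(exc))}</error></errors>"
    yield "</view>"
```

Because the closing `</view>` tag is always emitted, the client's XML parser succeeds either way, and the client only needs to check whether an `errors` element is present.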

Related

What is the correct status code after testing if an e-mail address exists in the database or not?

We are using a specific endpoint on our API to test whether an e-mail address is already registered in our database. When it's not, what would be the right status code to return to the client?
We cannot decide between 404, 204 and 200. There are a couple of articles on the net, but they all just list pros and cons, and none of them gives a clear answer.
200 says that the request was successful
204 says that the request was successful AND that the message body included in the response is 0 bytes long.
404 says that there is no current representation associated with the requested resource
Which of these is correct really depends on your resource design.
Consider a database query with a where clause -- if there are no matching rows, then you get SUCCESS, with an empty result set. So the analogous thing in a HTTP response would be a 2xx status code, and a body that describes an empty set.
If you were using a JSON list as your representation of the set, then the representation would be two bytes long ([]), and a 200 status code would be appropriate. If you were using a JSON Lines representation, with each record on its own line, then with no records you would have no lines, and therefore a 0-byte representation, so 204 would be a good choice.
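The difference between the two representations can be shown with a small sketch. The `respond` function and its `fmt` parameter are hypothetical names for illustration; the status code is derived only from whether the serialized body is empty.

```python
import json


def respond(records, fmt="json"):
    """Return (status, body) for a possibly empty result set."""
    if fmt == "json":
        # A JSON list is never empty on the wire: no records gives "[]".
        body = json.dumps(records)
    else:
        # JSON Lines: one record per line, so no records gives "".
        body = "".join(json.dumps(record) + "\n" for record in records)
    status = 200 if body else 204
    return status, body
```

With an empty result set, `respond([], "json")` yields a 200 with the two-byte body `[]`, while `respond([], "jsonl")` yields a 204 with a zero-byte body.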
What about a case where we have a simple web page, that tells you if the email address is registered or not? If it's registered, the server responds with a 200 message and a html document that tells you about the registration. If it isn't registered, then you get an html message telling you that the email address isn't registered... and a 200, because we were able to find the current representation of the resource.
And 404? 404 indicates to the client that there appears to have been a spelling error in the target URI of the HTTP request: there isn't even an empty result to find there.
It may help to understand that status codes are metadata about the HTTP response, which is to say that they are part of the application domain of transferring documents over a network, not about the business domain. They are there so that generic components, like caches, can do interesting things without needing to know any specifics about the domain in question.
Our web API is a facade to make our domain model look like a boring document store.

Call remote procedure in a RESTful way

I want to make a webservice which provides endpoints through which a user can generate and manipulate configuration files saved in a database.
The endpoint
https://api.example.com/generate-new
should execute a procedure which generates a standard configuration file for the user.
A RESTful webservice should return the complete config file as an answer (state transfer).
Since the config files also contain sensitive information like api keys, this is not possible here.
I would like to return only a success message. An HTTP 200 would be enough.
But is this approach still RESTful?
Ask yourself: What is the Resource (the 'R' in REST) here? It sounds like the Resource is something like a User Configuration Generation Task. The URL should reflect this:
https://api.example.com/user-configuration-generation-tasks
Let's say a HTTP POST request to this collection resource creates a new task, maybe using information passed in the request body. The server executes this task and generates the new user configuration.
Now you have three options for what to return:
201 CREATED: This would require including the Location of the user configuration in the response. Since you don't want to make the configuration accessible, this is not an option.
200 OK: This would indicate general success of the operation. But since you don't plan to return any response body, I don't suggest this. Instead, use:
204 NO CONTENT: The server has successfully fulfilled the request and there is no additional content to send in the response payload body.
All this is perfectly RESTful.

REST response code for accessing a corrupt/invalid resource

What's the best HTTP status code to use in response to an HTTP GET for a resource that's corrupt or semantically invalid?
E.g., consider a request to GET /person/1234 where data for person ID 1234 exists on the server but violates some business rule, so the server refuses to use it.
404 doesn't apply (because the data actually exists).
4xx in general seems not ideal (because the problem is on the server end, not under the client's control).
503 seems to apply to the service as a whole, not a particular resource.
500 certainly fits, but it's very vague in actually telling the client what might be wrong.
Any suggestions?
After reading the comments and the linked resources, it looks like @RemyLebeau's approach is best:
I think 500 is the only official response code that fits this situation. And there is nothing stopping you from including a response body that describes the reason for the failure.
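As a rough illustration of a 500 that still tells the client what went wrong, the sketch below returns a small JSON body describing the failure. The `get_person` handler, the `store` mapping, the `is_valid` callback, and the `stored_data_invalid` error code are all assumptions made for the example.

```python
import json

JSON = {"Content-Type": "application/json"}


def get_person(person_id, store, is_valid):
    """Return (status, headers, body) for GET /person/{person_id}."""
    record = store.get(person_id)
    if record is None:
        # No data at all for this ID.
        return 404, {}, ""
    if not is_valid(record):
        # The data exists but the server refuses to use it: a server-side
        # problem, reported with an explanatory body.
        body = json.dumps({
            "error": "stored_data_invalid",
            "detail": f"Data for person {person_id} violates a business rule",
        })
        return 500, JSON, body
    return 200, JSON, json.dumps(record)
```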
according to iana.org:
4xx: Client Error - The request contains bad syntax or cannot be fulfilled
5xx: Server Error - The server failed to fulfill an apparently valid request
I think no 4xx status code is a valid response to an internal server error, a migration, or any other situation where the client bears no responsibility and the user's input does not need to be rechecked. The exception is when data tied to the user is involved: for example, if the user's package no longer allows access to that data after a predetermined, known date, a 403 Forbidden may be valid, as @Bari suggested.
I'm not an expert, but I think that when the server itself decides to treat the endpoint's data as corrupt or invalid, the right code depends on what should happen next. I see 3 possible cases:
1. It is expected that somehow this is going to be fixed, and the client should be invited to request it again at some future moment ==> 503 (Service Unavailable):
The 503 (Service Unavailable) status code indicates that the server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay. The server MAY send a Retry-After header field (Section 7.1.3) to suggest an appropriate amount of time for the client to wait before retrying the request.
2. Something is wrong and it is not the client's responsibility, but there is an alternative way to access the data, maybe by following a specific process or sending further details. Example: when the requested data is corrupt, the server's error response may include a list of older (or unsaved, unversioned) versions of it and expect the client to be more specific about which version to select, so that it can be fetched instead of the corrupted one ==> 510 Not Extended
510 Not Extended
The policy for accessing the resource has not been met in the request. The server should send back all the information necessary for the client to issue an extended request. It is outside the scope of this specification to specify how the extensions inform the client.
If the 510 response contains information about extensions that were not present in the initial request then the client MAY repeat the request if it has reason to believe it can fulfill the extension policy by modifying the request according to the information provided in the 510 response. Otherwise the client MAY present any entity included in the 510 response to the user, since that entity may include relevant diagnostic information.
Case 2 was updated to include an example, as IMHO it may fit such a case. But again, I'm no expert and I may be wrong about it.
3. No alternative ways, nothing to be expected or none of the other cases ==> 500 should be good
The 500 (Internal Server Error) status code indicates that the server encountered an unexpected condition that prevented it from fulfilling the request.

HTTP Status Code 404 or 501 when Operation is not supported by a Resource in REST

I have a REST service and, depending on the type of Resource that is being viewed, I have certain Operations available.
So Resource1 supports Operation1 and Operation2
eg:
Resource1/Operation1
Resource1/Operation2
And Resource2 only supports Operation1
eg:
Resource2/Operation1
So if I receive a call for Operation2 on Resource2 (eg: Resource2/Operation2) the server will not raise an error as Operation2 is valid, but internally I don't support it, so should I return a 404 - Not Found or would a 501 - Not Implemented be more accurate or another error code?
You have to think in terms of Resources (that exist and you operate on them) and not Services as in WebServices (which you call to do something).
So every Operation of your REST API is a resource in its own right (for example, to model long-running operations or more complex transactions that are triggered by sending a POST request to the operation resource), and the URI of that resource is an opaque string, which can be 'Resource1/Operation1' or "/someOther/arbitrary/path".
A 501 means that the HTTP server itself lacks a feature that is required to fulfill the request (that is, it is missing some technical capability), but the resource itself is valid.
But as you've written, your REST API is not designed to support Operation2 for Resource2 at all, which means that there is no such thing as a "Resource2/Operation2" service resource. Therefore a 404 response is the only reasonable choice.
In terms of "how client will interact...", ask yourself: as the consumer of a resource (say, some arbitrary web site), will you revisit the same URL after receiving a 404? (Probably not.) Will you do so after receiving a 501? (Probably yes, as you assume the problem has been resolved in the meantime.)
TL;DR
return 404
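One way to picture this answer: if the set of operation resources is a lookup table, an unsupported combination is simply a path that does not exist. The `SUPPORTED` table and `route` function below are hypothetical names, and the sketch returns only the status code.

```python
# Hypothetical routing table: which operation resources exist.
SUPPORTED = {
    "Resource1": {"Operation1", "Operation2"},
    "Resource2": {"Operation1"},
}


def route(path):
    """Return the status code for a 'Resource/Operation' path."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        return 404
    resource, operation = parts
    if operation in SUPPORTED.get(resource, set()):
        return 200
    # 'Resource2/Operation2' is not a resource at all, so it is treated
    # like any other unknown URI.
    return 404
```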
Never return any HTTP 5xx status from your code. Those codes are used by the server to report internal server problems, so returning them yourself would confuse the users of the application. Just use 404 for your purpose.
You can use either of them; it's your web API, so it depends on how the client will interact with it.
4xx is for client errors and 5xx is for server errors.
You may use 501 Not Implemented instead of 404 Not Found, because the resource in the URL is not there.
Or you can use your own custom error code and error message, or:
422 Unprocessable Entity
The request was well-formed but was unable to be followed due to semantic errors.
If the client request by definition is illegal, I would return HTTP 400 Bad Request.
Sending HTTP 400 also implies that the request should not merely be retried, as it is permanently illegal and is not understood by the server.
Only reservation I would have with using HTTP 400 is that it usually seems to be used for malformed requests; i.e. typos and the like. Whether your request is illegal or malformed, is a question of semantics.

High Scale REST API

One of our REST APIs will cause a long-running process to execute. Rather than have the client wait for a long time, we would prefer to return an immediate response.
So, let's consider this use case: An applicant submits an application, for which there will be an eventual result. Since this is a very high-scale platform, we cannot persist the application to storage, but must place it onto a queue for processing.
In this situation, is it acceptable practice to return the URI where the application will eventually live, such as http://example.com/application/abc123?
Similarly, would it be acceptable practice to return the URI of the result document, which represents the decision regarding the application, as part of the representation of the application resource? The result document will not be created for some minutes, and an HTTP GET to its URI (or the URI of the application for that matter) will result in a 404 until they are persisted.
What is the best practice in this kind of situation? Is it acceptable to hand out "future" URIs for resources?
I don't see anything wrong with such design, but have a closer look at the list of HTTP status codes for better responses. IMHO the first request should return 202 Accepted:
The request has been accepted for processing, but the processing has not been completed.
while requests to the URL where the result will eventually be should in the meantime return 204 No Content (?):
The server successfully processed the request, but is not returning any content
And of course it should eventually return 200 OK when processing finishes.
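From the client's side, the flow suggested above could be sketched as a polling loop. This is only an illustration: `poll_until_done` is a hypothetical helper, `fetch` stands in for whatever HTTP client is used and is assumed to return a `(status, body)` tuple, and treating both 202 and 204 as "not ready yet" is an assumption based on the status codes proposed in this answer.

```python
import time


def poll_until_done(fetch, url, interval=1.0, max_attempts=10, sleep=time.sleep):
    """Poll `url` until the server answers 200 OK, then return the body."""
    for _ in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body  # processing finished
        if status not in (202, 204):
            raise RuntimeError(f"unexpected status {status} from {url}")
        sleep(interval)  # not ready yet, wait before asking again
    raise TimeoutError(f"{url} not ready after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.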
From "RESTful Web Services Cookbook"
Problem
You want to know how to provide resource abstractions for
tasks such as performing computations or validating data.
Solution
Treat the processing function as a resource, and use HTTP GET to fetch
a representation containing the output of the processing function. Use
query parameters to supply inputs to the processing function.
This entails just GET requests on a URI that represents the processing function, such as your example URI 'http://example.com/application/abc123'. When returning a response, you would include the information you have so far and use HTTP status codes to indicate the status of the processing, as already suggested by Tomasz.
However..., you should not use this approach, if the subsequent application processing stores or modifies data in any way.
GET requests should never have side effects. If submitting the application leads in any way (even if only after it is processed from the queue) to new information or data being stored, you should use a PUT or a POST request with the application's data in the request body. See "Why shouldn't data be modified on an HTTP GET request?" for more information.
If the application's submittal stores or modifies data, use the pattern for asynchronous processing: a POST or PUT request with the application's details.
For example
POST http://example.com/applications
which returns "201 Created" with the URI of the new application resource.
or
PUT http://example.com/applications/abc123
which returns "201 Created" and
Both would also return any resource information that is already known at that time.
You can then safely perform GET requests on the URI of the new resource as they now only retrieve data - the results of the application processing so far - and no data is stored or modified as a result of the GET.
To indicate the application's processing progress, the GET request can either return some specific status code in the response (queued, processing, accepted, rejected), and/or use the HTTP response codes. In either case a "200 OK" should only be returned when the application's processing is complete.