Correct REST API semantics to check email exists or not

I have an API that looks like GET user/exists/by?email=<email_here>.
The objective of the API is to check if the email exists or not.
I am confused about what the proper semantics of the API should be. Currently, I have 2 options.
Option 1:
Use HTTP status codes to drive the API.
Send 204 if the email exists
Send 404 if the email doesn't exist
Send 400 if email fails validation
Option 2:
Send 200 with body {"exists": true} when email exists
Send 200 with body {"exists": false} when email doesn't exist
Send 400 if email fails validation

Send 204 if the email exists
Send 404 if the email doesn't exist
I don't think you are going to find an authoritative answer to your question.
That said, one thing I would encourage you to do is to look at the HTTP responses being sent by your server, and in particular pay attention to the number of bytes of meta data being sent; the status-line, the headers, and so on.
Then consider carefully whether the difference between a small JSON payload and a zero length payload is all that significant.
Furthermore, recall that if a client doesn't want a copy of the representation to be returned, the client can use the method token HEAD rather than GET to request a refreshed copy of the resource meta data.
200 vs 404 is harder. 200 means that the payload is a representation of the requested resource (which is cacheable by default). 404 means that the payload is a representation of an error message (which is cacheable by default).
The HTTP status codes are metadata about the transfer of documents over a network domain. So it is really the wrong mechanism to use to finesse fine grained distinctions in your documents.
For instance, take a look at the cache invalidation specification, and notice please the distinction between the handling of 2xx and 4xx responses to unsafe requests.
As a matter of principle, HTTP data belongs in the headers, your data belongs in the body, and it's only when your data is going to be of interest to general purpose HTTP components that we should be lifting copies of your data into the HTTP meta data.
But, as far as I know, not authoritative. This is all very hand wavy, and not easily matched to a specific RFC or advisory.
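To make the trade-off concrete, here is a minimal sketch of Option 2 from the question, written as a plain Python function rather than against any particular web framework. The function name, the in-memory set of registered addresses, and the deliberately loose validation pattern are all assumptions made for illustration.

```python
import json
import re

# Deliberately loose pattern, for illustration only; the real validation
# rules are not given in the question and are an assumption here.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_exists_response(email, registered):
    """Return (status, body) for GET user/exists/by?email=<email>.

    The answer to "does it exist?" lives in the body; the status code only
    reports whether the request itself could be served.
    """
    if not EMAIL_RE.match(email):
        # Malformed input: the body is an explanation of the error.
        return 400, json.dumps({"error": "invalid email"})
    # The answer document exists whether or not the email is registered.
    return 200, json.dumps({"exists": email in registered})
```

Under this design a cache or monitoring tool sees an ordinary successful response in both the "exists" and "doesn't exist" cases, and only the client code that understands the JSON body needs to care about the difference.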

Related

Is a 404 http code appropriate when a successful REST query gives 0 results?

We had a bit of a heated debate at work this week regarding the proper use of HTTP error code 404. I feel that it is being used inappropriately in this situation, and they insist that "this is how it is everywhere".
Here's the scenario:
We have a REST API endpoint with a static URI like "https://server.domain.com/restapi/getnode" that accepts a json body during a POST operation to see if a server exists within the database. There is 1 server allowed per query.
The way it is designed, if the server exists, it returns http code 200 with the body of the message being the details of the server within the database. If it doesn't exist, then they return a 404 code with the message "server x is not found in database y".
I feel that this is incorrect because, according to the standards from W3C, 4xx codes are for client-related issues, and 404 is specifically for when the server cannot serve up that specific URI. Further, this is not really an error: we would not only expect the occasional negative/empty response as part of normal business, but that is expected to be the outcome of the majority of the calls.
Their response to this is that, according to sources such as AWS REST Standards and ServiceNow REST Standards, 404 is appropriate because the query didn't find the server, therefore it couldn't find the resource (to which I feel that they are misinterpreting the term "resource").
So, who is correct here? Or is this more of a gray area than I think it is?
The relevant part of the specification here is Client Error 4xx.
Except when responding to a HEAD request, the server SHOULD send a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition.
User agents SHOULD display any included representation to the user.
In other words, the status code is meta data that indicates to general-purpose components that the payload of the response is an explanation of the error, and not a representation of the resource.
If what you are actually doing is returning a representation of the resource which is at this moment an empty list, then you should be using a Successful code, not a Client Error code.
For example, if we have a resource that lists women who have been sworn into the office of President of the United States, the expression of that resource, as of 2019-09-25, would be an empty list. So a standard HTTP response would look like
200 OK
Content-Type: application/json
[]
On the other hand, this response says something completely different
404 Not Found
Content-Type: application/json
[]
What this says is that there was no current representation of the requested resource, and the representation of the explanation of the problem is an empty list, so please show an empty list to the user so that they know what to do.
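The difference between the two responses above can be sketched from the point of view of a generic client that knows nothing about the domain. This is only an illustration; the function name and the labels it returns are made up for this example.

```python
import json


def read_response(status, body):
    """Classify a JSON payload the way a general-purpose component would.

    The status code decides what the payload *is*; the payload itself is
    identical in both of the examples above.
    """
    payload = json.loads(body)
    if 200 <= status < 300:
        return {"kind": "resource", "value": payload}
    if 400 <= status < 500:
        return {"kind": "error-explanation", "value": payload}
    return {"kind": "other", "value": payload}
```

Calling `read_response(200, "[]")` yields an empty list labelled as the resource, while `read_response(404, "[]")` yields the same empty list labelled as an explanation of an error, which is rarely what the server meant.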
This is a neverending debate with no useful outcome. See for example How to design RESTful search/filtering?, Do web applications use HTTP as a transport layer, or do they count as an integral part of the HTTP server? (disclaimer: my question), and thousands of other questions on the Stack Exchange network and thousands of blogs like "which status code to pick for REST scenario XYZ".
The fact that you're using POST for a request with GET semantics, means you're already not properly applying REST. POST is used to create resources, not to find existing resources. If you have a "servers" resource collection, then you're dealing with "server" resources.
If you want to have a "find-server" endpoint, and you want to use POST to "create" a "find-server resource result", to which you can then issue a GET request to obtain your search results, you're a masochist who tries to shoehorn an application into a design philosophy that doesn't fit the problem domain.
So just be pragmatic. Does the endpoint exist? Yes, so 404 is not applicable. Return a 200 (the resource exists, the request is correct) with a body indicating no results were found, or a 204 ("no content") without a body.

What is the correct status code after testing if an e-mail address exists in the database or not?

We are using a specific endpoint on our API to test if an e-mail address is already registered in our database. When it's not, what would be the right status code to return to the client ?
We cannot decide between 404, 204 and 200. There are a couple of articles on the net, but they all just list pros and cons, and none gives a clear answer.
200 says that the request was successful
204 says that the request was successful AND that the message body included in the response is 0 bytes long.
404 says that there is no current representation associated with the requested resource
Which of these is correct really depends on your resource design.
Consider a database query with a where clause -- if there are no matching rows, then you get SUCCESS, with an empty result set. So the analogous thing in a HTTP response would be a 2xx status code, and a body that describes an empty set.
If you were using a JSON List as your representation of the set, then the representation would be two bytes long [], and a 200 status code would be appropriate. If you were using a json lines representation, with each record on its own line, then with no records you would have no lines, therefore a 0 byte representation and 204 would be a good choice.
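The JSON list versus JSON Lines distinction can be sketched as follows. This is a minimal illustration, not a recommendation for either format; the function name and the "json"/"jsonl" format labels are assumptions.

```python
import json


def encode_matches(records, fmt):
    """Return (status, body_bytes) for a query result in the given format."""
    if fmt == "json":
        # A JSON array: an empty result is still two bytes, "[]".
        body = json.dumps(records).encode("utf-8")
    elif fmt == "jsonl":
        # JSON Lines: one record per line, so no records means no bytes.
        body = "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")
    else:
        raise ValueError(f"unknown format: {fmt}")
    # 204 only fits when the representation is zero bytes long.
    status = 200 if body else 204
    return status, body
```

With an empty result, `encode_matches([], "json")` gives a 200 with the two-byte body `[]`, while `encode_matches([], "jsonl")` gives a 204 with an empty body. The status follows from the representation, not from whether the email was found.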
What about a case where we have a simple web page, that tells you if the email address is registered or not? If it's registered, the server responds with a 200 message and a html document that tells you about the registration. If it isn't registered, then you get an html message telling you that the email address isn't registered... and a 200, because we were able to find the current representation of the resource.
And 404? 404 indicates to the client that there appears to have been a spelling error in the target-uri of the http request -- that there isn't even nothing to find.
It may help to understand that status codes are metadata about the HTTP response, which is to say that they are part of the application domain of transferring documents over a network, not about the business domain. They are there so that generic components, like caches, can do interesting things without needing to know any specifics about the domain in question.
Our web API is a facade to make our domain model look like a boring document store.

HTTP status code for GET request with non-existing query parameter value

Let's clarify three common scenarios when no item matches the request with a simple example:
GET /posts/{postId} and postId does not exist (status code 404, no question)
GET /posts?userId={userId} and the user with userId does not have any posts
GET /posts?userId={userId} and the user with userId does not exist itself
I know there's no strict REST guideline for the appropriate status codes for cases 2 and 3, but it seems to be a common practice to return 200 for case 2 as it's considered a "search" request on the posts resource, so it is said that 404 might not be the best choice.
Now I wonder if there's a common practice to handle case 3. Based on a similar reasoning with case 2, 200 seems to be more relevant (and of course in the response body more info could be provided), although returning 404 to highlight the fact that userId itself does not exist is also tempting.
Any thoughts?
Ok, so first, REST doesn't say anything about what the endpoints should return. That's the HTTP spec. The HTTP spec says that if you make a request for a non-existent resource, the proper response code is 404.
Case 1 is a request for a single thing. That would return 404, as you said.
The resource being returned in case 2 is typically an envelope which contains metadata and a collection of things. It doesn't matter if the envelope has any things in it or not. So 200 is the correct response code because the envelope exists, it just so happens the envelope isn't holding any things. It would be allowable under the spec to say there's no envelope if there are no things and return 404, but that's usually not done because then the API can't send the metadata.
Case 3, then, is exactly the same thing as case 2. If the expected response is an envelope, then the envelope exists whether or not the userId is valid. It would not be unreasonable to include metadata in the envelope pointing out that there is no user with userId, if the API designer thinks that information would be useful to clients.
Case 2 and Case 3 are really the same case, and should both either return 200 with an empty envelope or 404.
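The envelope idea for cases 2 and 3 might look like the sketch below. The field names (`items`, `meta`, `count`, `userExists`) are invented for illustration and are not taken from any particular API.

```python
def posts_for_user(user_id, users, posts):
    """Return (status, envelope) for GET /posts?userId={userId}.

    The envelope always exists, so the status is 200 whether the user has
    no posts (case 2) or the user is unknown (case 3).
    """
    items = [p for p in posts if p["userId"] == user_id]
    envelope = {
        "items": items,
        # Optional metadata; "userExists" lets clients tell case 3 apart.
        "meta": {"count": len(items), "userExists": user_id in users},
    }
    return 200, envelope
```

An API that prefers 404 for an unknown user would branch on `user_id in users` instead, at the cost of losing the metadata channel.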
First piece, you need to recognize that /posts?userId={userId} identifies a resource, precisely in the same sense that /posts/{userId} or /index.html specifies a resource.
So GET /posts?userId={userId} "requests transfer of a current selected representation for the target resource."
The distinction between 200 and 404 is straightforward; if you are reporting to the consumer that "the origin server did not find a current representation for the target resource or is not willing to disclose that one exists", then you should be returning 404. If the response payload includes a current representation of the resource, then you should use the 200 status code.
404 is, of course, a response from the Client Error response class:
the server SHOULD send a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition.
So a way of figuring out which of these status codes to use, is to just look at the message body of the response. If it is a current representation of the resource, then you use the 200 status code. If it is a representation of a message that explains no current representation is available, then you use the 404 status code.
Of course that ducks the big question: what should the representation of the resource be in each case? Once you know that, you can work out the rest.
If you think that an unexpected identifier indicates an error on the client (for example, a corrupted link), then it will probably improve the consumer's experience to report that as an explicit error, rather than returning a representation of an empty list.
But that's a judgment call; different APIs are going to have different answers, and HTTP isn't particularly biased one way or the other; HTTP just asks that you ensure that the response code and headers are appropriate for the choice that you have made.

REST-API HTTP status code for invalid input on a Patch request

There is a PATCH request in my application that updates a user's password. We have an Ember validator to block all invalid input except for 1 business rule, which is that the new password must not be one of your past 5 passwords.
We are currently returning a 400 Bad Request in this case. However, my company has a dashboard for component availability that counts 400 and 500 responses as unavailability, because most applications are SOAP and they just expect 200s and 300s. Even though we handle this 400 appropriately through the UI, it is still a ding against us and puts us on the radar as an area with poor availability.
Should we take this to the people that monitor availability and have them change this for REST services, as this will become a more and more common occurrence as the company creates more REST applications? Or do we cave and return a 200 that also states that the password was not successfully updated?
I would argue that a 400 response is inappropriate for the service. If the service is responding with a 400 when the user's password has been repeated within the last 5 passwords, then the request was understood by the server.
According to the W3C:
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.
In your case, the request was understood. It is returning a 400 to signal an application concern (regarding password reuse). I believe a 200 response would be more appropriate with a payload indicating the application problem.
EDIT:
One might also argue that a 422 response would be in order:
The 422 (Unprocessable Entity) status code means the server
understands the content type of the request entity (hence a
415(Unsupported Media Type) status code is inappropriate), and the
syntax of the request entity is correct (thus a 400 (Bad Request)
status code is inappropriate) but was unable to process the contained
instructions. For example, this error condition may occur if an XML
request body contains well-formed (i.e., syntactically correct), but
semantically erroneous, XML instructions.
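As a rough sketch of the options discussed here, the password-reuse rule can be expressed as a function whose failure status is a parameter, so that switching between 422 and a 200-with-payload design is a one-argument change. The function names are hypothetical, and the unsalted SHA-256 fingerprint is only there to keep the example short; a real system would use a dedicated password hashing scheme.

```python
import hashlib


def fingerprint(password):
    # Illustration only: unsalted SHA-256 is NOT suitable for real passwords.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def change_password(new_password, previous, reuse_status=422):
    """Return (status, body) for a PATCH that updates a password.

    `previous` is a list of earlier fingerprints, most recent last.
    """
    if fingerprint(new_password) in previous[-5:]:
        # The request was well-formed and understood; only the business
        # rule failed, so the body explains the application problem.
        return reuse_status, {"updated": False, "reason": "password reused"}
    return 204, None
```

Passing `reuse_status=200` gives the variant proposed in the first part of the answer, where the payload alone carries the failure.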

HTTP Spec: PUT without data transfer, since hash of data is known to server

Does the HTTP/WebDav spec allow this client-server dialog?
client: I want to PUT data to /user1/foo.mkv which has this hash sum: HASH
server: OK, PUT was successful, you don't need to send the data since I already know the data with this hash sum.
Note: This PUT is an initial upload. It is not an update.
If this is possible, much faster file syncing could be implemented.
Use case: The WebDAV server hosts a directory for each user. The favorite video foo.mkv gets uploaded by several users. In this example the favorite video is already stored at this location: /user2/myfoo.mkv. The second and following uploads don't need to send any data, since the server already knows the content. This would reduce a lot of network load.
Preconditions:
Client and server would need to agree on the hash algorithm beforehand.
The server needs to store the hash-value of already known files.
It would be very easy to implement this in a custom client and server. But that's not what I want.
My question: Is there an RFC or other standard that allows such a dialog?
If there is no standard yet, then how should I proceed to make this dream come true?
Security consideration
With the above dialog it would be possible to access the content of known hashes. For example, an evil client knows that there is a file with the hash sum of 1234567.... He could do the above two steps, and after that the client could use a GET to download the data.
A way around this is to extend the dialog:
client: I want to PUT data which has this hash sum: HASH
server: OK, PUT would be successful, but to be sure that you have the data, please send me the bytes N up to M. I need this to be sure you have the hash-sum and the data.
client: Bytes N up to M of the data are abcde...
server: OK, your bytes match mine. I trust you. Upload successful, you don't need to send the data any more.
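The extended dialog above is essentially a challenge-response check. A minimal sketch of the three steps, assuming the server holds the full content in memory and that a 16-byte window is long enough (both are assumptions, as are the function names), could look like this:

```python
import secrets


def make_challenge(size, span=16):
    """Server: pick a random byte range [start, end) inside the stored file."""
    span = min(span, size)
    start = secrets.randbelow(size - span + 1)
    return start, start + span


def answer_challenge(data, challenge):
    """Client: return the requested bytes from its local copy."""
    start, end = challenge
    return data[start:end]


def verify(stored, challenge, answer):
    """Server: compare the client's bytes with its own, in constant time."""
    start, end = challenge
    return secrets.compare_digest(stored[start:end], answer)
```

Because the range is chosen at random for each request, a client that only knows the hash cannot precompute the answer, although a short window gives weaker assurance than hashing the whole file with a server-supplied nonce would.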
How to get this done?
Since it seems that there is no spec yet, this part of the question remains:
How should I proceed to make this dream come true?
From what you described, it seems like ETags should be used.
They were specifically designed to associate a tag (usually an MD5 hash, but can be anything) with a resource's content (and/or location) so you can later tell whether the resource has changed or not.
PUT requests are supported by ETags and are commonly used with the If-Match header for optimistic concurrency control.
However, your use case is slightly different as you are trying to prevent a PUT to a resource with the same content, whereas the If-Match header is used to only allow the PUT to a resource with the same content.
In your case, you can instead use the If-None-Match header:
The meaning of "If-None-Match: *" is that the method MUST NOT be
performed if the representation selected by the origin server (or by a
cache, possibly using the Vary mechanism, see section 14.44) exists,
and SHOULD be performed if the representation does not exist. This
feature is intended to be useful in preventing races between PUT
operations.
WebDAV also supports ETags, though how they are used may depend on the implementation:
Note that the meaning of an ETag in a PUT response is not clearly
defined either in this document or in RFC 2616 (i.e., whether the ETag
means that the resource is octet-for-octet equivalent to the body of
the PUT request, or whether the server could have made minor changes
in the formatting or content of the document upon storage). This is an
HTTP issue, not purely a WebDAV issue.
If you are implementing your own client, I would do something like this:
Client sends a HEAD request to the resource to check the ETag
If the client sees that it matches what it has already, do not send anything else
If it doesn't match, then send the PUT request with the If-None-Match header
UPDATE
From your updated question, it now seems clear that when a PUT request is received, you want to check ALL resources on the server for the absence of the same content before the request is accepted. That means also checking resources which are in a different location than what was specified as the destination to the PUT request.
AFAIK, there's no existing spec to specifically handle this case. However, the ETag mechanism (and the HTTP protocol) was designed to be generic and flexible enough to handle many cases and this is one of them.
Of course, this just means you can't take advantage of standard HTTP server logic -- you'd need to custom code both the client and server side.
Assumptions
Before I get into possible implementations, there are some assumptions that need to be made.
As mentioned, you need to control both the server and the client
An algorithm needs to be agreed upon for generating the ETag based on the content. This can be MD5, SHA1, SHA2-256, SHA3, a concatenation of a combination of them, etc. I'll just mention the algorithm output as the ETag, but how you do it is up to you.
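As one possible way to derive such an ETag, the sketch below hashes the content with SHA-256 from Python's standard `hashlib` module and wraps the hex digest in double quotes, which is the form a strong ETag takes in a header. The function name and the default algorithm are assumptions; any algorithm both sides agree on would do.

```python
import hashlib


def content_etag(data, algorithm="sha256"):
    """Return a quoted ETag string derived from the content bytes."""
    digest = hashlib.new(algorithm, data).hexdigest()
    # ETag header values are quoted strings.
    return f'"{digest}"'
```

For large files the same digest can be built incrementally by calling `update()` on the hash object for each chunk read, so the whole file never has to be held in memory.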
Possible implementations
These have been ordered from simplest to increasing complexity if the simple case doesn't work for you.
Possible implementation 1
This assumes your server implementation allows you to read the request headers and respond before the entire request is received.
Client computes the ETag for the file/resource to upload.
Client sends a PUT request to the server (location doesn't matter) with the header If-None-Match containing the ETag and continues sending the body normally.
Server checks to see if a resource with the ETag already exists.
Server:
If ETag already exists, immediately return a 412 response code. Optionally terminate the connection to stop the client from continuing to send the resource (NOTE: This is NOT advisable by the HTTP spec, though not explicitly prohibited. See note 1 below). Yes, a little bandwidth is wasted, but you wouldn't have to wait for the entire request to finish.
If ETag doesn't exist, wait for the request to finish normally.
Client:
If the 412 response is received, interpret it to mean that the resource already exists and the request needs to be aborted -- stop sending data.
Possible implementation 2
This is slightly more complex, but better adheres to the HTTP spec. Also, this MIGHT work if your server architecture doesn't allow you to read the headers before the entire request is received.
Client computes the ETag for the file/resource to upload.
Client sends a PUT request to the server (location doesn't matter) with the header If-None-Match containing the ETag and an Expect: 100-continue header. The request body is NOT yet sent at this point.
Server checks to see if a resource with the ETag already exists.
Server:
If ETag already exists, return a 412 response.
If ETag doesn't exist, send a 100 response and wait for the request to finish normally.
Client:
If the 412 response is received, interpret it to mean that the resource already exists and the request was therefore aborted.
If the 100 response is received, continue sending the body normally
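The decision logic of implementation 2 can be sketched without any networking code. The functions below are hypothetical and assume the headers arrive as a plain dict with exact-case keys, and that the server keeps a set of ETags for all content it already stores. Treating If-None-Match as "no resource anywhere has this content" is this answer's custom convention, not standard HTTP server behaviour.

```python
def decide_before_body(headers, known_etags):
    """Server: status to send after reading only the request headers."""
    etag = headers.get("If-None-Match")
    if etag is not None and etag in known_etags:
        return 412  # Precondition Failed: the content is already stored
    if headers.get("Expect", "").lower() == "100-continue":
        return 100  # Continue: the client may now send the body
    return None  # no interim response; just read the body as usual


def next_client_step(status):
    """Client: what to do with the response to the header-only request."""
    if status == 412:
        return "skip-upload"
    if status == 100:
        return "send-body"
    return "unexpected"
```

A client that sends `Expect: 100-continue` normally also needs a timeout after which it sends the body anyway, since not every server or intermediary answers the expectation.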
Possible implementation 3
This implementation probably requires the most work but should be broadly compatible with all major libraries / architectures. There's a small risk of another client uploading a file with the same contents in between the two requests though.
Client computes the ETag for the file/resource to upload.
Client sends a HEAD request (no body) to the server at /check-etag/<etag> where <etag> is the ETag. This checks whether the ETag already exists at the server.
Server code at /check-etag/* checks to see if a resource with that ETag already exists.
Server:
If ETag already exists, return a 200 response.
If ETag doesn't exist, send a 404 response.
Client:
If the 200 response is received, interpret it to mean that the resource already exists and do not proceed with a PUT request.
If the 404 response is received, follow up with a normal PUT request to the intended destination.
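Implementation 3 can be sketched in the same style. The `/check-etag/` path comes from the steps above; everything else (function names, the `head` and `put` callables standing in for real HTTP requests) is an assumption for illustration.

```python
def check_etag(path, known_etags):
    """Server: status for HEAD /check-etag/<etag>."""
    prefix = "/check-etag/"
    if not path.startswith(prefix):
        return 404
    etag = path[len(prefix):]
    return 200 if etag in known_etags else 404


def upload_if_missing(etag, head, put):
    """Client: probe first, then upload only when the server lacks the content.

    `head(path)` returns a status code and `put()` performs the real upload.
    """
    if head(f"/check-etag/{etag}") == 200:
        return "skipped"
    put()
    return "uploaded"
```

The window between the HEAD and the PUT is where the race mentioned above can occur, so the server may still want to deduplicate when the PUT arrives.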
Considerations
Although the implementation is up to you, here are some points to consider:
When a resource is added or updated, the ETag and the location should be stored in a database for quick retrieval. It is needlessly inefficient for a server to recompute the hash for every single resource whenever a resource is being uploaded. There should also be an index on the ETag and location fields for quick retrieval.
If two clients upload a resource with the same ETag at the same time, you might want to abort the 2nd one as soon as the 1st one finishes.
Using hashes for ETag means that there's a possibility for collision (where two resources would have the same hash), though in practice the possibility is extremely slim if a good hash is used. Note that MD5 is known to be weak to intentional collision attacks. If you are paranoid, you can concatenate multiple hashes to make a collision much less likely.
In regards to your "security consideration", I still don't see how knowing a hash would lead to retrieval of a resource. The server will only and SHOULD ONLY tell you whether a specific ETag exists or not. Without divulging the location, it's not possible for the client to retrieve the file. And even if the client knows the location, the server SHOULD implement other security controls such as authentication and authorization to restrict access. Using the resource location solely as a way of restricting access is just security by obscurity, especially since from what you mentioned, the paths seem to follow a pattern by username.
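The first consideration above, storing the ETag and location in an indexed table, might be sketched with Python's built-in `sqlite3` module as follows. The table, column, and function names are invented for this example, and an in-memory database stands in for whatever store the server actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (location TEXT PRIMARY KEY, etag TEXT NOT NULL)"
)
# Index on etag so the "do we already have this content?" lookup is fast.
conn.execute("CREATE INDEX idx_resources_etag ON resources (etag)")


def record_upload(conn, location, etag):
    """Store or update the ETag for a location after a successful PUT."""
    conn.execute(
        "INSERT OR REPLACE INTO resources (location, etag) VALUES (?, ?)",
        (location, etag),
    )


def etag_exists(conn, etag):
    """Return True if any stored resource has this ETag."""
    row = conn.execute(
        "SELECT 1 FROM resources WHERE etag = ? LIMIT 1", (etag,)
    ).fetchone()
    return row is not None
```

Note that `etag_exists` reveals only whether the content is known, never where it is stored, which matches the point made about the security consideration.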
Notes
RFC 2616 indicates this SHOULD NOT be done:
If an origin server receives a request that does not include an Expect
request-header field with the "100-continue" expectation, the request
includes a request body, and the server responds with a final status
code before reading the entire request body from the transport
connection, then the server SHOULD NOT close the transport connection
until it has read the entire request, or until the client closes the
connection. Otherwise, the client might not reliably receive the
response message.
Also, DO NOT close the connection from the server side without sending any status codes, as the client will most likely retry the request:
If an HTTP/1.1 client sends a request which includes a request body,
but which does not include an Expect request-header field with the
"100-continue" expectation, and if the client is not directly
connected to an HTTP/1.1 origin server, and if the client sees the
connection close before receiving any status from the server, the
client SHOULD retry the request.