RESTful API design: Validity of username - rest

I am designing a RESTful API which should return the validity of a username for registration. The invalid cases include:
duplicate username
too short or too long
invalid characters
My current design is:
GET /valid_username/{username}
returns 204 for valid username
returns 404 for invalid username with {err: 'DUPLICATE_USERNAME'}
Is this the preferred way in RESTful API?

Is this the preferred way in RESTful API?
I don't think so.
The guiding star for designing a REST API is a single question: How would you implement this as a web site?
As an integration protocol, when the client submits the username, you either want to advance them to the next form, or you send them back to the "previous" form with a bunch of error messages.
Technically, you could use 204 to do that in the happy path, with a Link in the meta data that the client could follow to go from the success page to the next. But it's a lot more likely that you would just send a representation of the next page at once; so 200 would be your most likely play.
For the unhappy path, instead of sending a representation of the next page, you'd probably send a representation of the current page with the data pre-filled in and the errors highlighted.
So key point: the human being is looking at the rendered representation, not at the meta-data in the headers. The browser is the audience for the metadata. The browser doesn't care anything about the semantics of the message, it just wants to know things like "can I cache this response?"
The most obvious choice of status codes for this case is: 200. Semantically, the representation of the integration resource change from one that would allow you to proceed (username Bob was available) to a representation that would force you to make another choice (username Bob is no longer available). The various headers in the meta data describe the appropriate caching policy to apply to this representation, and so on.
Technically, I think 404 "works", in so far that you can make it do what you want. The semantics are incorrect; from RFC 7231
The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists
My interpretation of that passage is that the server is claiming that the client submitted a bad request (all 4xx status codes share that meaning) , and more specifically that the spelling of the resource identifier is the root of the problem.
It's a bad fit because the server, in this case, isn't having any difficulty fulfilling the request -- it understands the request just fine; the "problem" is that the representation being returned isn't the one that the client is hoping for.

REST is about exposing, creating, and mutating resources. Each endpoint URI defines the location of that resource. In the case of a User resource, you would typically expose an endpoint to the collection of User resources with a URI like /users, and an endpoint for individual User resources of that collection with a URI like /users/{id}. Each endpoint is tied to a resource; not an operation. Endpoints that expose operations follow RPC architecture.
To do this RESTfully, your validations would be run when attempting to create a User resource using the user input. This could be done using an HTTP request line like POST /users. Upon invalid format of the username value, you would respond with a 400 status code, to indicate error on the user's part. Upon conflict with an existing username, you could return a 409 status to indicate a conflict between the request and internal API state. Your responses should also contain a body with further details of the issue. Upon a successful creation of a User resource, you should respond with a 201 status to indicate that the resource was created successfully.
Another way to go about this RESTfully, would be to key the individual User resource URI on the username, so that you would have something like /users/{username}. Then, to check if a username already exists, you would simply form a request like GET /users/{username}. If a User resource with that username does not exist, the API should respond with a 404 status, which would indicate to your client that the username is available. A response status of 200 would obviously indicate that the username is already in use.

Related

What is the best way to implement checking whether a username is already registered on the client side, if I want to keep my API RESTful?

As the title suggests, I am looking to have the client be able to check whether a username or email is already registered before the user submits the registration form. I had considered using an API endpoint that would return true or false for a given username, but this seems more RPC than RESTful. Is it bad practice to have such an endpoint if the rest of my API is RESTful? If so, what would a RESTful approach to this situation look like?
A key concept in REST is that anything that can be named can be a resource; this includes procedures. If you want to have an endpoint that accepts a username in the request body and returns true/false that's perfectly fine.
Alternatively, you can (or may already) treat a user as a resource. Take the GitHub API as an example: you can fetch a user by sending a GET request to https://api.github.com/users/{username}. If the user exists, and therefore the username is taken, you'll get back 200 OK. If the user does not exist you'll get 404 Not Found.
If you want to check if a username has been taken you can just issue a request for that username and then check the response. If you choose this approach HEAD is the more appropriate method. HEAD is essentially the same as GET except that the response body is empty. Since you don't need the body to determine if the user exists you can save a tiny bit of bandwidth with HEAD over GET.
You could do a POST /registrations and return a 400 with validation errors array and just have client side logic filter that array for invalid username. In other words, no reason you can't hit the endpoint multiple times. This should help decouple your UX from your API.

Which REST HTTP verb to use for "Q&A" scenario?

An auth system I work on has this new function:
1. Auth system allows users to specify Relying Parties they transact with,
2. The Relying Party can approve/deny/maybe the request (authorisation) - maybe causes a redirect to the RP website for further authorisation questions by the RP.
The RP has to implement a web service specified by the Auth System to perform the approve/deny/maybe request that the auth system generates.
My problem is what this looks like as a REST service. As the auth system can't really dictate the URI style for the RP system, i would like to specifying that the path does not have any parameters in it, auth system just needs to know the URI of the service. The data of the request (user name/id) might be in a bit of json in the request body (suggesting POST http verb. GET might be OK, but loath to expose user ids in the URI). The auth system does not care what the RP does with the request data, the auth system just wants a "yes/no/maybe" reply (so may not really be a GET/POST/PATCH/DELETE/etc paradigm).
What would be the best verb to use? and how to facilitate the reply; its not really a success/failure response as there are 3 possible results to the query, is it acceptable to have some json returned with the response (then what http verb to use)?
I'm a bit baffled by this. GET seems the most obvious
GET /api/user_link_authorize/{userid}
except then i'm forced to put user ids in the URI (which I dont want to do)...
Any suggestions?
My problem is what this looks like as a REST service.
Think about how it would look as a web site.
You would start with some known URI in your list of bookmarks. Fetching that page would give you a representation of a form, which would have input controls that describe what data needs to be provided (and possibly includes default values). The client provides the data it knows about, and submits the form. The data in the form is used to create a HTTP request as described by HTML's form processing rules. The response to that request includes a representation of the answer, or possibly the next bit of work to be done.
That's REST.
Retrieving the form (via the bookmarked URI) would be a GET of course; we're just updating our locally cached copy of the forms "current" representation. Submitting the form could be a GET or a POST; we don't necessarily need to know that in advance, because that information is carried in the representation of the form itself.
GET vs POST involves a number of trade offs. Semantically, GET is safe, it implies that the resource can be fetched at any time, that spiders can crawl it, that accessing the resource in that way is "free". Which is great when the resource is free, because clients on an unreliable network can automatically retry the request if the response is lost. On the other hand, announcing to the world that the request is safe when it is actually expensive to produce responses is not a winning play.
Furthermore, GET doesn't support a message body (more precisely, the payload has no defined semantics). That means that information provided by the client needs to be part of the target resource identifier itself. If you are dealing with sensitive information, that can be problematic -- not necessarily in transit (you can use a secured socket), but certainly in making sure that the URI with sensitive information is not logged where the sensitive data can leak.
POST supports including a payload with the request, but it doesn't promise that the query is safe, which means that generic components won't know if they can automatically retry the request when a response is lost.
Given that you don't want the user id in the URI, that's a point against GET, and therefore in favor of POST.

Differentiating REST status codes

Lately, I have started adding status codes to my responses instead of returning them directly.
Let's assume /person/1 returns a person with id 1 from the DB. If the person does not exist, should I return 404 status? How am I supposed to differentiate if the endpoint does not exist on the server or the resource does not exist?
Now, let's assume I have a POST endpoint for inserting users. What if that endpoint checks if the email is formed correctly and I return 400? How should I know if the request was not formed correctly and did not route to any servlets or if it indeed reached the servlet which decided that email is badly formed?
Is it a good practice to always return a 200 OK response from all of my servlets indicating that the application has done its job regardless of the outcome and write the status in a json field status or is this an overkill and an anti-pattern?
I do not have a lot of experience nor knowledge of HTTP servers so I am not sure I am explaining this (nor using it) right, so I apologize for the broad descriptions.
Let's assume /person/1 returns a person with id 1 from the DB. If the person does not exist, should I return 404 status? How am I supposed to differentiate if the endpoint does not exist on the server or the resource does not exist?
To a client it doesn't matter whether the resource or the endpoint did not exist. All it is told by the server is that for the given URI there is no representation available.
As inf3rno already mentioned a client is usually served all of the URIs a client will need by the server directly in a response. Through bookmarking or including links in some external resource certain links might get invalid over time and as such a 404 Not Found response just informs the client that no representation is available for the given URI.
A client typically is also not interested in the internals of an API but just to send or receive data it can work upon.
A further misconception many users have, unfortunately, is, that they already assume certain resources to return certain types. Such types may lead to failures on the client side if the expected representation format ever changes. In addition to that the URI structure itself, including any path, matrix and query parameters, should not be used to deduce any logical structure of the API, its exposed endpoints or the logical structure of the resources to other resources of that API. A URI as a whole is a pointer to a resource. A resource may have a dozens of links pointing to it. You might think of a URI as cache-key for representations returned that, on consecutive invocations are further served by the cache instead of the actual server. This is actually one of the constraints REST imposes and is widely used on the Web.
Now, let's assume I have a POST endpoint for inserting users. What if that endpoint checks if the email is formed correctly and I return 400? How should I know if the request was not formed correctly and did not route to any servlets or if it indeed reached the servlet which decided that email is badly formed?
RFC 7231 defines POST as an all-purpose tool that should be used if other methods aren't fitting for the task at hand. It explicitely states that the payload provided by that method will be processed according to the resource's own specific semantics. So, if you need to validate an email-address of a user before persisting it or before starting a calculation, background process or whatever, fine, do that :) Even PUT, which is often said to only replace the current representatin with the given one in the request, is not only allowed but also encouraged to perform verifications regarding any constraints the server has for the target resource and therefore it should refuse payloads that do not fit its expectations.
The quintesence here is, that a server should provide a client always with as much information as possible to let a client determine what to do next. Think of a Web based application which you access through your browser. If you receive a 400 Bad Request the browser will usually tell what the server didn't like about your request, i.e. incomplete syntax or missing value of a required field. The same holds true for REST APIs as they are basically just a generalization of the interaction model used on the Web. So the same concepts that apply to the Web also apply to REST :)
By that, each HTTP status code has its own semantics and should help the client to determine what the client should do next. A 400 Bad Request i.e. states that the server either cannot or will not process the request due to something that the server considers to be a client based error and it's up to the client to correct that failure and resend the request.
A 405 Method Not Allowed on the other hand indicates that the client used a HTTP method not supported by the targeted endpoint. An error response not only indicates that to the client but also which methods are allowed on the targeted endpoint within an Alllow response header.
Each of the HTTP status codes specified in RFC 7231 has their own semantics and its probably advisable to at least skim over these. You can also lookup all available status codes at IANA that provides links to the specificaton describing those status codes.
Is it a good practice to always return a 200 OK response from all of my servlets indicating that the application has done its job regardless of the outcome and write the status in a json field status or is this an overkill and an anti-pattern?
As with error codes also the success codes (in the 200 range) have their own semantics. If a new resource is created as outcome of processing a request (via PUT or POST) a client should be notified with a 201 Created status response that furthremore contains a HTTP Location header containing a URI targeting at the newly created resource.
If a server may take some time in order to calculate a response it is probably advisable to return a 202 Accepted response in order to inform a client about the pending request. A client can later on poll for the request either after some threshold period or after getting notified by the server through callback mechanisms such as email-notification or similar stuff. Due to German law restrictions i.e. German companies have to maintain archives of their messages exchanged via EDI. We, as an EDI provider, offer our clients to perform an archive of their exchanged messages via triggering one of our HTTP endpoints. Depending on the number of messages exchange by that company and the time period selected the archive should be generated for, this process may take some time (a couple of hours to be more concrete) and instead of letting the client wait for that period we simply return 202 Accepted and start the archiving process in the back. Depending on the configuration they either poll for the finished archive, get an information about the final result or directly get the archive sent through email if the file isn't to large.
204 No Content is also quite useful if a client performs an update onto a resource. As PUT is generally defined as replace the current representation with the one provided in the payload, upon receiving a 204 No Content response the client knows that the server applied the update and the current representation does look like the requested one by the client. Thus the server does not need to inform the client further how the current representation looks like, as the client already knows how it should look like. However, in case the server had to convert the payload to a different representation that maybe lead to an other outcome, it is probably benefitial to inform a client about the new state of the resource within a 200 OK response including the a representation of the outcome of the update process.
Returning 200 OK for a failure including a JSON payload with fields indicating about the error is for sure a bad way to proceed. Not only does it give clients a wrong hint but the response might also be cached by intermediaries and returned to other clients requesting the same even when the failure might only be of temporary nature (DB crash or the like). In additon to that is such a JSON payload proabably using a non-standardized format and thus requires out-of-band knowledge to actually process the message. While we humans are quite capable of figuring out what's going on, computers aren't yet that smart on their own.
I hope you can see that HTTP offers a lot of semantics on when to use what method or response code. They are there for a reason and therefore also should be used if the circumstances are right.
In GET request, 404 status is just a response code. You have to provide error message in body of the response in case when record is not found for the id provided.
For POST request, you can return 400 error code with specifying in the body which fields are missing/failing validation.
For url not found, User will always get the 404 error code.
For succcessful GET or POST request, you can return the response with 200 status
How am I supposed to differentiate if the endpoint does not exist on
the server or the resource does not exist?
The endpoint is the IRI (URI) of the web resource in this case. If the endpoint does not exist, then there is a good chance that the web resource does not exist either. It is an unlikely scenario, since you got your URIs from the server (HATEOAS), but it can happen if something changes between two requests, e.g. the URI template changes or somebody deletes the resource. In all of these cases the 404 is a fine HTTP status code. You can elaborate in the error message or use an additional error code, but for me it does not make sense, because the URI template change is a rare event. It would make the client more flexible though, since it could clear the cache and retry with a new link.

Authorization and the query string in REST API design

We have a design where a user has access to products, but only in a certain category.
A superuser can list all products in all categories with a simple GET request, such as GET /products.
Simple filtering already exists via the query string, so the superuser can already restrict his search to category N by requesting GET /products?category=N.
Say that user X is authenticated, and user X has access to products with category of 3.
Should an API server:
mandate that the user pass the filter for the appropriate category -- ie require GET /products?filter=3, and GET /products would fail with 403 Forbidden? -- or
expect a simple GET /products and silently filter the results to those that the user is authorized to access?
expect a simple GET /products and silently filter the results to those that the user is authorized to access?
Changing the representation of a resource depending on which user is looking at it strikes me was a Bad Idea [tm].
Consider, for instance, the use case where a uri gets shared between two users. They each navigate to the "same" resource, but get back representations of two different entites.
Remember - here's what Fielding had to say about resources
The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author's hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.
Now, there are conceptual mappings which depend on the perspective of the user (e.g. "today's weather in my city"). But I'm a lot more comfortable addressing those by delegating the work to another resource (Moved Temporarily to "today's weather in Camelot") than I am in treating the authorization credentials as a hidden part of the resource identifier.
If consumers of your api are arriving at this resource by following a link, then 403 Forbidden on the unfiltered spelling is fine. On the other hand, if consumers are supposed to be bookmarking this URI, then implementing it as a redirecting dispatcher may make more sense.
The redirecting approach is consistent with remarks made by Fielding in a conversation about content negotiation on the rest-discuss list in 2006
In general, I avoid content negotiation without redirects like the
plague because of its effect on caching.
E.g., if you follow a link to
http://www.day.com/
then the site will redirect according to the user's preference to one of
http://www.day.com/site/en/index.html
http://www.day.com/site/de/index.html
Both of the solutions are perfectly valid. You can always hide things when the user does not have the permission to see them. This means, that you can display a filtered representation of /products by the actual user, or you can hide the /products resource and give link only to the /products?filter=3 resource. I don't think there is a best practice here, since both solutions are easy to implement.
#VoiceOfUnreason:
Apparently you did not read Fielding's work carefully enough.
Changing the representation of a resource depending on which user is
looking at it strikes me was a Bad Idea [tm].
The representation depends on the whole request, not just on the GET /resource part. The Authorization header is part of the HTTP request as well. So changing the representation depending on the Authorization header is valid in REST. It is just the same as changing the representation depending on the Accept header. Or do you say that we should serve XML even if somebody requests for JSON? It is a huge mistake to think that a single resource can have only a single representation. Another mistake to return a broken link to the client and let them waste resources by sending pointless HTTP messages...
Consider, for instance, the use case where a uri gets shared between
two users. They each navigate to the "same" resource, but get back
representations of two different entites.
REST is for machine to machine communication and not for human to machine communication! The browser is not a REST client. Or did you mean that a facebook bot will share the link with a google crawler? Such a nonsense...

Is leaving out the current user ID in a REST endpoint bad practice?

I'm currently writing my own REST API. One of my endpoints is named /api/profile, which takes either GET to retrieve or PUT to update this profile. I have been told that it's considered bad practice or plain wrong to not include any form of user ID, email or username in the URL, when calling PUT /api/profile. Every request has a Bearer token for authentication, which makes the server load the user in case and stores it in the request data. Upon updating information using PUT, I thought it'd be safe to assume that if no parameter has been supplied to the URL, the target is self, in this case, the authenticated user.
Am I wrong to make this assumption and/or is this a bad example of REST?
One of the architectural constraints of REST is to make your resources cacheable.
When you have some proxy or other cache between your client and server, the request line GET /api/profile cannot be cached, or another user would potentially retrieve the wrong profile.
You could use a Vary: Authorization header in your case to prevent this. But having an id in the URL will make unequivocally clear which resource you want to retrieve.
For PUT requests, this is not an issue as they won't be cached, but you should keep your URI design consistent.