RESTful API design debate: complex query for RESTful endpoint

I'm designing a RESTful API for a large collection of reporting data, and I would like to pass a complex set of parameters like those in the code block below. I'm debating between using POST and GET for this endpoint. Team members seem to favor GET, but I am not sure of the best way to pass this amount of data as GET parameters. The best idea so far is to have a single GET parameter, called something like jsonparams, that would contain all of the below, JSON-encoded:
{
  "filters": [
    {
      "field": "metric-name",
      "gt": (float/int),
      "lt": (float/int)
    },
    {
      "field": "metric-name-2",
      "gt": (float/int),
      "lt": (float/int)
    }
  ],
  "sort": [
    {
      "field": "metric-name",
      "order": "ASC"/"DESC"
    },
    {
      "field": "metric-name-2",
      "order": "ASC"/"DESC"
    }
  ],
  "limit": 100,
  "offset": 0
}

POST is the method to use for any operation that isn't standardized by HTTP. Retrieval is standardized by the GET method, so using POST to retrieve information that corresponds to a potential resource is never RESTful. In theory, you should use GET, regardless of how convoluted your URI turns out to be.
However, since you're performing a query for which there isn't a single resource you could perform a GET to, it seems fine to use POST, as long as you're aware of the disadvantages and your documentation is clear about it. Frankly, I think using a POST is much clearer than encoding that payload as a JSON + base64 and sending it as a querystring merely for purism.
The real issue with using POST is when people use it in such a way that it avoids or prevents using a real URI. That doesn't seem to be the issue in your case, since you have a valid URI for the collection, but the semantics of your query are too complex to be expressed easily.
If you decide to go with GET, there's a catch. While the HTTP specs establish no limit for URIs, most implementations do, and you might hit that limit if you need to feed all those parameters as a querystring. In that case, it's RESTful to circumvent limitations of the underlying implementation, as long as that's decoupled from your application. A convention for doing what you want would be to use the POST method with the payload you described above, and the X-HTTP-Method-Override: GET header.
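A minimal Java sketch of that convention, assuming Java 11+ (`java.net.http`). The collection URI and JSON body are hypothetical placeholders. The server-side helper only honors a POST-to-GET override. That is one reasonable policy, not something the convention itself mandates.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.Optional;

class MethodOverride {
    static final String OVERRIDE_HEADER = "X-HTTP-Method-Override";

    // Client side: send the query as a POST body, flagged as a GET via the override header.
    static HttpRequest buildQuery(URI collectionUri, String jsonQuery) {
        return HttpRequest.newBuilder(collectionUri)
                .header("Content-Type", "application/json")
                .header(OVERRIDE_HEADER, "GET")
                .POST(HttpRequest.BodyPublishers.ofString(jsonQuery))
                .build();
    }

    // Case-insensitive header lookup (HTTP header names are case-insensitive).
    static Optional<String> header(Map<String, String> headers, String name) {
        return headers.entrySet().stream()
                .filter(e -> e.getKey().equalsIgnoreCase(name))
                .map(e -> e.getValue().trim())
                .findFirst();
    }

    // Server side: pick the semantics to apply. Only POST may be downgraded to GET,
    // so the header can't be used to turn a request into PUT or DELETE.
    static String effectiveMethod(String requestMethod, Map<String, String> headers) {
        boolean overrideToGet = "POST".equalsIgnoreCase(requestMethod)
                && header(headers, OVERRIDE_HEADER).map("GET"::equalsIgnoreCase).orElse(false);
        return overrideToGet ? "GET" : requestMethod.toUpperCase();
    }
}
```

Keeping `effectiveMethod` in one place, such as a filter in front of the routing, is what keeps the workaround decoupled from the rest of the application.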

If you're adding data to a resource or creating a resource, use POST. GET is for retrieving an already existing resource, not for changing the state of resources.
Update: While POST requests are fine for updating a resource, if the action is idempotent (meaning it will not create a new resource, and every time you issue the request with the same parameters and data you can guarantee the same resulting resource), then it's recommended to use PUT. If you're updating only part of the resource rather than replacing it entirely, use PATCH.
If you go with a crazy serialized-GET-parameter request for some kind of perceived simplicity, you're not going to be adhering to REST.
Now, if you're only retrieving resources (no creation), use GET. While I prefer human-typeable parameters, they're not required. If your situation is 100% retrieval, you could encode the entire set into one giant encoded param string, but I'd suggest at least splitting it out a bit for improved sanity by doing something like:
/resource?filters=urlencoded_filter_array&sort=urlencoded_sort_array&offset=0&count=100
Or you could go more explicit like:
/resource?filter1=urlencoded_filter_json&filter2=urlencoded_filter_json .... sort2=urlencoded_sort_json&offset=0&count=100
Or finally (my favorite), a completely explicit, broken-out set of params:
/resource?filter1_field=bah&filter1_gt=1.0&filter1_lt=2&filter2_field=boo&filter2_lt...
I like the final one because there's no encoding/decoding of JSON and then URL-encoding of the entire JSON string. This format is easy to decipher in access logs and to troubleshoot. It's also very cacheable: even if the parameter order changes, some proxy caches can still work with it, whereas if filters encoded in a JSON object get moved around, they look like entirely different values as far as proxies are concerned. For me it's the most REST-friendly (if that makes any sense), even though the first two examples are fine REST GET requests.
The added work to parse the parameter names isn't really that much fuss. One simple method could convert your JSON into the parameter string, and another could rehydrate the JSON object from the explicit filter1_xyz format.
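A minimal Java sketch of those two helpers, assuming filters shaped like the question's {"field", "gt", "lt"} objects. `Filter` and `FilterParams` are hypothetical names. The numbering (filter1, filter2, ...) follows the explicit format above.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical filter type; gt/lt are nullable so a filter can have just one bound.
final class Filter {
    final String field;
    final Double gt;
    final Double lt;
    Filter(String field, Double gt, Double lt) { this.field = field; this.gt = gt; this.lt = lt; }
}

final class FilterParams {
    // filters -> "filter1_field=...&filter1_gt=...&filter1_lt=..." (1-based indices)
    static String toQuery(List<Filter> filters) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            Filter f = filters.get(i);
            String prefix = "filter" + (i + 1) + "_";
            parts.add(prefix + "field=" + URLEncoder.encode(f.field, StandardCharsets.UTF_8));
            if (f.gt != null) parts.add(prefix + "gt=" + f.gt);
            if (f.lt != null) parts.add(prefix + "lt=" + f.lt);
        }
        return String.join("&", parts);
    }

    // Rehydrates filters from the explicit format; parameter order in the URL doesn't matter.
    static List<Filter> fromQuery(String query) {
        Map<Integer, Map<String, String>> byIndex = new TreeMap<>(); // index -> (attribute -> value)
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue;
            String key = pair.substring(0, eq);
            int underscore = key.indexOf('_');
            if (!key.startsWith("filter") || underscore < 0) continue; // e.g. offset, count
            int index = Integer.parseInt(key.substring("filter".length(), underscore));
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            byIndex.computeIfAbsent(index, k -> new LinkedHashMap<>()).put(key.substring(underscore + 1), value);
        }
        List<Filter> filters = new ArrayList<>();
        for (Map<String, String> attrs : byIndex.values()) {
            filters.add(new Filter(attrs.get("field"),
                    attrs.containsKey("gt") ? Double.valueOf(attrs.get("gt")) : null,
                    attrs.containsKey("lt") ? Double.valueOf(attrs.get("lt")) : null));
        }
        return filters;
    }
}
```

The parser groups by the numeric index rather than by position. As a result, shuffling the parameters yields the same filters, and unrelated parameters such as offset and count are skipped.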

Related

REST - Does the PUT method have to remove an optional field when it is omitted?

I have a resource Car which has some required fields and some optional ones.
The Car was created with the following request:
POST /cars
{
  plate: "XYZ-A2C4",
  color: "blue",
  owner: "John" //OPTIONAL
}
A REST client wants to update all required info of this car:
PUT /cars/:id
{
plate: "ABC-1234",
color: "black"
}
What happens to the optional owner field?
Will it be removed, since it was not provided? I.e., must PUT replace the entire resource with the representation passed in the payload?
Or, since owner is not required, may the server preserve the old value?
I know that the server can provide a PATCH method, but sometimes it is not possible to update a single field because the new state could become invalid (there is no minimum required payload to enforce related field values). Also, manipulating arrays, removing fields, or setting them to null can be tricky in some cases with PATCH, since it can be done with two different patterns: JSON Merge Patch is limited and JSON Patch is kinda strange.
Is it OK to perform a PUT with the required fields and the server preserves the old optional values when it is omitted?
If you want to go by the book (section 4.3.4 of RFC 7231), then yes, the PUT request should, in your case, replace the entire Car resource:
The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload. A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.
So, by the book, you should not use PUT for partial updates, but rather PATCH.
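Here is a small Java sketch that makes the difference concrete. It models the stored Car and the request bodies as Maps, a stand-in for parsed JSON (a real service would use a JSON library). `put()` follows the by-the-book replacement semantics. `mergePatch()` follows JSON Merge Patch (RFC 7396), one of the two PATCH formats mentioned in the question. `CarUpdates` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CarUpdates {
    // PUT: the new state is exactly the enclosed representation; the previous state is
    // deliberately ignored, so any field the client omitted (e.g. owner) is gone.
    static Map<String, Object> put(Map<String, Object> previous, Map<String, Object> representation) {
        return new LinkedHashMap<>(representation);
    }

    // PATCH with JSON Merge Patch (RFC 7396): absent members are left alone,
    // null members are removed, and nested objects are merged recursively.
    @SuppressWarnings("unchecked")
    static Map<String, Object> mergePatch(Map<String, Object> target, Map<String, Object> patch) {
        Map<String, Object> result = new LinkedHashMap<>(target);
        for (Map.Entry<String, Object> e : patch.entrySet()) {
            String key = e.getKey();
            Object value = e.getValue();
            if (value == null) {
                result.remove(key);
            } else if (value instanceof Map) {
                Object current = result.get(key);
                Map<String, Object> base = current instanceof Map
                        ? (Map<String, Object>) current
                        : new LinkedHashMap<String, Object>();
                result.put(key, mergePatch(base, (Map<String, Object>) value));
            } else {
                result.put(key, value); // scalars and lists replace the old value wholesale
            }
        }
        return result;
    }
}
```

With the question's example body, `put()` drops owner while `mergePatch()` keeps "John". Sending {"owner": null} through `mergePatch()` is how a Merge Patch client removes it.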
However, in practice, it is really up to you to decide how exactly this is applicable to your service, and more importantly, to document it.
Here are a few real-world examples of how some well-known APIs allow partial updates:
The Ghost API does not support partial resource update, it requires a PUT request with a full resource for any update
The Rossum API supports PATCH for partial resource update, but their documentation explicitly states that only top-level properties are supported
GitHub allows both PATCH and POST requests for partial data updates
Firebase allows PATCH requests but also POST with an X-HTTP-Method-Override header
You are exactly right that sometimes a PATCH request could result in an invalid resource if processed as-is. However, nothing prevents the server from ensuring proper data state as a side effect. Therefore, on each call you can have the server:
verify the proper state of the resource before persisting it
reject (with a 400 Bad Request error) any request that would result in improper state
respond with the resource (maybe bearing side-effects) on success
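Those three steps can be sketched in Java as follows. The sketch assumes plate and color are the required fields (as in the question) and uses a flat partial update in which a null value removes a field. `Response` and `CarPatchHandler` are hypothetical names. The Map stands in for whatever storage the service really uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical response holder: status code plus body.
final class Response {
    final int status;
    final Object body;
    Response(int status, Object body) { this.status = status; this.body = body; }
}

final class CarPatchHandler {
    static final List<String> REQUIRED = List.of("plate", "color");

    // Builds the candidate state, validates it, and persists only if it is valid.
    static Response patch(Map<String, Object> store, Map<String, Object> changes) {
        Map<String, Object> candidate = new LinkedHashMap<>(store);
        for (Map.Entry<String, Object> e : changes.entrySet()) {
            if (e.getValue() == null) candidate.remove(e.getKey()); // null means "remove this field"
            else candidate.put(e.getKey(), e.getValue());
        }
        // 1. verify the state the resource would end up in
        for (String field : REQUIRED) {
            Object value = candidate.get(field);
            if (!(value instanceof String) || ((String) value).isBlank()) {
                // 2. reject; nothing has been persisted
                return new Response(400, "missing or empty required field: " + field);
            }
        }
        store.clear();
        store.putAll(candidate); // persist
        // 3. respond with the resulting resource (including any server-side effects)
        return new Response(200, new LinkedHashMap<>(store));
    }
}
```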
Strictly speaking PUT should replace the resource being identified with the entity that is being supplied. In your example, that would mean car would be replaced without the optional field unless the optional field was also supplied in the PUT request.
APIs that strictly adhere to REST or a resource-oriented architecture are few and far between, so I personally would try not to sweat this level of detail and would just document your API and the behavior your users can expect.
If you really want to be fanatical about it, I think you're on the right track with PATCH, or you could identify a sub-resource:
PUT /cars/:id/plate
"ABC-1234"
PUT /cars/:id/color
"black"
Or perhaps:
PUT /cars/:id/description
{
plate: "ABC-1234",
color: "black"
}
The www-tag mailing list archives include this interesting observation from Roy Fielding in 2002:
HTTP does not attempt to require the results of a GET to be safe. What it does is require that the semantics of the operation be safe, and therefore it is a fault of the implementation, not the interface or the user of that interface, if anything happens as a result that causes loss of property (money, BTW, is considered property for the sake of this definition).
The specification for HTTP PUT should be understood the same way; the specification tells us what the messages mean, but not how to do it.
The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload.
PUT semantics are fundamentally "remote authoring": the request tells the server to make its copy of a resource look like the client's copy.
So when you leave an "optional" element out of the representation provided in the request, you are telling the server to remove that optional element from its own representation as well.
It's the responsibility of the client to create a message that describes what it actually wants. So if you, the client, intend the optional fields to be unchanged after your request, you need to include them in the representation you send in the body of the request.
The server is expected to interpret the PUT request as described; but it is not constrained in what it does with such a thing (subject to Fielding's observation above: which way does the blame finger point when things go wrong?)
HTTP does not define exactly how a PUT method affects the state of an origin server beyond what can be expressed by the intent of the user agent request and the semantics of the origin server response.
What happens to the optional owner field?
So in your specific example, the request clearly says "do not include the owner element". But the server is allowed to ignore that; it need only be careful in crafting its response not to imply that the provided representation was stored unchanged.

Best practices for returning representation of resource that is also a collection

Let's say I want to make a RESTful interface, and I want to work with foos based upon their IDs. Nothing new here:
GET /api/foo1 returns a representation (e.g. using JSON) of foo1.
DELETE /api/foo1 deletes foo1.
etc.
Now let me tell you that a "foo" is a collection type thing. So I want to be able to add a "bar" to a "foo":
PUT /api/foo1/bar3 adds bar3 to foo1.
GET /api/foo1/bar3 returns a representation of bar3.
DELETE /api/foo1/bar3 removes bar3 from foo1.
DELETE /api/foo1 deletes foo1 altogether.
Now the question remains: what does GET /api/foo1 do? Does it simply return a representation of foo1, as I originally assumed in this question? Or does it return a list of bars? Or does it return a representation of foo1 that both describes foo1 and includes a list of all contained bars?
Or should GET /api/foo1 merely return a representation of foo1 as I assumed at the beginning, and require a PROPFIND request to list the bars inside foo1 (the approach taken by WebDAV)? But then in order to be consistent, wouldn't I have to change all my other list-type functionality to PROPFIND, directly contradicting all those thousands of RESTful tutorials that say to use GET /api/foo1 to list the contents?
After some pondering, I think the best conceptual explanation from a RESTful perspective is that usually the "thing" is not the same thing as its "collection". So while in the WebDAV world a directory/ might be the same thing that holds its files, in the RESTful world I might have a separate directory/files/ subpath for the contained files. That way I can manipulate directories separately from the files they hold.
Consider a RESTful API for a farm which contains barns. The endpoint farm/api/barns/ might return a list of barns, one of which would be farm/api/barns/bigredbarn. Naively I would think that retrieving farm/api/barns/bigredbarn/ would provide me a list of animals in the barn, which is what prompted this question.
But really, the animals in the barn are only one aspect of the Big Red Barn. It might also contain vehicles and hay:
farm/api/barns/bigredbarn/animals/
farm/api/barns/bigredbarn/vehicles/
farm/api/barns/bigredbarn/haybales/
With this approach the dilemma I faced does not arise.
The semantics of webdav has never really been reconciled with the idioms of RESTful interfaces.
In theory, GET should retrieve a representation of a state of a resource and PROPFIND should be used to retrieve the members of a collection.
So you should do this:
GET /api/foo1/ - returns state of foo1 only
PROPFIND /api/foo1/ - returns the members of foo1
Most front-end devs would freak out if you told them to use PROPFIND, although it's completely supported in browser JS implementations.
Personally, I use a WebDAV/JSON gateway, where requests are made using RESTful idioms but routed to my WebDAV implementation.
For example, I would do this:
GET /api/foo1/_PROPFIND?fields=name,fooProp1,fooProp2
And that would return
[
{ name : "bar1", fooProp1: "..", fooProp2 : ".."},
{ name : "bar2", fooProp1: "..", fooProp2 : ".."}
]
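The field selection itself is just a projection over the collection members. Here is a minimal Java sketch. It assumes the members are already loaded as Maps and that fieldsParam is the raw comma-separated query value. `FieldProjection` is a hypothetical name, not part of any WebDAV library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FieldProjection {
    // fieldsParam is the raw ?fields= value, e.g. "name,fooProp1,fooProp2".
    static List<Map<String, Object>> project(List<Map<String, Object>> members, String fieldsParam) {
        List<String> fields = Arrays.stream(fieldsParam.split(","))
                .map(String::trim)
                .filter(f -> !f.isEmpty())
                .collect(Collectors.toList());
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> member : members) {
            Map<String, Object> projected = new LinkedHashMap<>(); // keeps the requested field order
            for (String field : fields) {
                if (member.containsKey(field)) {
                    projected.put(field, member.get(field)); // unknown fields are silently skipped
                }
            }
            result.add(projected);
        }
        return result;
    }
}
```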
One advantage of this approach is that clients get to control the JSON properties returned. This is good because a rich API will have a lot of properties, but in most situations clients don't need all of them.
The routes and their operations in a RESTful API are entirely designed by the developers. It's the developer who decides what to return when a specific route is requested, say GET /api/foo1.
And the developer should design every route including /api/foo1/bar. There is no specific rule on what a particular route should do. If your API is an open-source project make a clean and clear documentation of every route.
Don't waste your time thinking about the old school strategies.

REST - Common practice for "change-all" mutations

In my app there is a button which (when clicked) should apply some mutation to all entities (of a certain type). Assuming my entities are "purchases" clicking on the "confirm all" button should result in the server/db setting the "confirmed" field of all "purchases" to "true".
I have the code that does this on the server side. My question is this: what is the URI that I should use for this action?
POST seems like the wrong choice, as this action is idempotent. Thus, I am left with PUT. Two ideas come to mind:
PUT /purchases?confirmed=true
PUT /purchases/__all__?confirmed=true
Is there any well established convention?
EDIT
There is a third option (suggested by Markus):
PUT /confirmations/?confirmed=true
Of course, this can work and clearly has its merits. The (only) problem I find with it is that confirmations is not an entity in my system. In particular, there is no GET /confirmations/some_id URI, which may be confusing.
First of all, the URI you should use for this operation is whatever URI the collection of purchases returned for that action. The URI semantics shouldn't matter.
Second, when we say POST isn't idempotent, what we are saying is that the client can't assume by default that the operation is idempotent. The semantics of the POST method are entirely within your control, and that's why they must be documented and you can't rely on the client knowing beforehand what it does, as is the case with GET, PUT and DELETE.
In simple terms, having a PUT operation that isn't idempotent is a violation of the HTTP method semantics, but having a POST that is idempotent is fine.
Third, what you're doing is RPC, not REST. For instance:
PUT /purchases?confirmed=true
You're simply calling a method by passing parameters. The only way this could make sense in REST would be if you were creating or replacing a set of purchases to replace the subset where confirmed=true, which would be the exact opposite of what you want. If you want to do something like that, you should use POST, not PUT.
It's clear that your confirmation is a partial update, so you shouldn't be using PUT at all. As a general rule, if you're trying to define separate semantics coupled to a particular resource for GET, PUT, PATCH and DELETE, you're doing it wrong. Those methods should be generic and work in exactly the same way for everything. If you need to couple some URI and method to a particular resource state transition, do it with POST.
I don't know the details of the API, but it looks like you're using query string parameters and payload parameters interchangeably. That's not a good idea either, since you can't treat URIs as atomic identifiers.
If your URIs were atomic, you could say this returns all unconfirmed purchases:
GET /purchases?confirmed=false
And you could document POST as the method to perform partial updates in bulk, changing the filtered subset:
POST /purchases?confirmed=false
{ "confirmed": true }
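A minimal in-memory Java sketch of that pairing, assuming a hypothetical `Purchase` type with a boolean confirmed flag. The POST updates exactly the subset that a GET on the same URI returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory purchase.
final class Purchase {
    final int id;
    boolean confirmed;
    Purchase(int id, boolean confirmed) { this.id = id; this.confirmed = confirmed; }
}

final class Purchases {
    // GET /purchases?confirmed=<filter>: the filtered subset.
    static List<Purchase> find(List<Purchase> all, boolean confirmedFilter) {
        List<Purchase> subset = new ArrayList<>();
        for (Purchase p : all) {
            if (p.confirmed == confirmedFilter) subset.add(p);
        }
        return subset;
    }

    // POST /purchases?confirmed=<filter> with body {"confirmed": <newValue>}:
    // a bulk partial update applied to the same subset the GET would return.
    static int bulkUpdate(List<Purchase> all, boolean confirmedFilter, boolean newValue) {
        List<Purchase> subset = find(all, confirmedFilter);
        for (Purchase p : subset) p.confirmed = newValue;
        return subset.size(); // number of purchases changed, e.g. to report back to the client
    }
}
```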
If you insist on using PUT, you should do something like this. If you have the URI below which returns a global confirmed status by reducing all purchases with AND:
GET /purchases/confirmation
{ "confirmed": false }
Then you could make something like:
PUT /purchases/confirmation
{ "confirmed": true }
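The same idea can be modeled in memory with hypothetical `Purchase` and `ConfirmationResource` types. The derived confirmation resource reduces to an AND on GET and a set-all on PUT. That is why repeating the PUT is harmless and a subsequent GET returns what was PUT.

```java
import java.util.List;

// Hypothetical in-memory purchase.
final class Purchase {
    boolean confirmed;
    Purchase(boolean confirmed) { this.confirmed = confirmed; }
}

final class ConfirmationResource {
    // GET /purchases/confirmation: AND of every purchase's flag (true for an empty list).
    static boolean get(List<Purchase> purchases) {
        return purchases.stream().allMatch(p -> p.confirmed);
    }

    // PUT /purchases/confirmation {"confirmed": value}: make every purchase match.
    // Repeating the same PUT changes nothing further, so it is idempotent.
    static void put(List<Purchase> purchases, boolean value) {
        for (Purchase p : purchases) p.confirmed = value;
    }
}
```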

POST a list of items using REST

I'm looking for a convention on how to serialize my data when I have a (long) list of items that I want to POST to the server.
For example, if I have a resource /users and I wanted to POST a new one to it, I'd http-encode the fields for the new user and put it in the request body like this: name=foo&age=20
But if I have a list of users, like this [{ name: 'foo', age: 20 }, { name: 'bar', age: 10 }], is there a conventional way of POSTing this?
I'm thinking name[0]=foo&age[0]=20&name[1]=bar&age[1]=10 but I can't find anything to back it up. What do web servers usually accept/expect?
Quick question which may change my answer: are you POSTing directly from an HTML form, or are you expecting something more sophisticated (e.g. JavaScript processing, or not even a web-based client)?
If you have a sophisticated enough client, you could just construct a JSON string and POST with a content type of application/json. Then whatever resource is processing the POST could use any number of json libraries to read the posted string and process as is.
Further Rambling:
What framework/languages are you using to construct your REST service? Do they have built-in functionality/conventions to help you?
For example, if you're using JAX-RS to build your service, there is a built-in annotation, @FormParam, which can be used to process posted forms. For example, if you posted the following with a content type of application/x-www-form-urlencoded: name=foo&age=20&name=bar&age=10
You could retrieve parallel lists on the service side via:
import java.util.List;
import javax.ws.rs.Consumes;
import javax.ws.rs.FormParam;
import javax.ws.rs.POST;

@POST
@Consumes("application/x-www-form-urlencoded")
public void createUsers(@FormParam("name") List<String> name, @FormParam("age") List<String> age) {
    // Store your users
}
But you would then have to deal with the question of what happens if one list is shorter or longer than the other: how do you resolve that? What happens if a new field is required or optional to create a list of users? (But as I mentioned initially, a JSON array of JSON objects would solve that issue. There are a number of libraries out there that support automagic JSON deserialization in JAX-RS, or there is also the option of creating your own MessageBodyReader.)
(Disclaimer on the next section: I don't know Rails; my experience is more in the Java services world, and I'm basing this on this guide.) It looks like Rails has a convention of name[]=foo&name[]=bar to process posted data into arrays automagically, and a similar convention to populate structures like user[name]=foo&user[age]=20. Perhaps if you are on Rails there is some way to use/abuse both of these features to get the desired result?
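One way to answer the length question is to zip the parallel lists and reject mismatched lengths outright. Here is a plain-Java sketch, with no JAX-RS needed to run it. It assumes the bound lists arrive as strings. `User` and `ParallelLists` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical user type built from the form fields.
final class User {
    final String name;
    final int age;
    User(String name, int age) { this.name = name; this.age = age; }
}

final class ParallelLists {
    // Pairs name[i] with age[i]; mismatched lengths can't be paired reliably, so reject them.
    static List<User> zip(List<String> names, List<String> ages) {
        if (names.size() != ages.size()) {
            throw new IllegalArgumentException(
                    "name and age counts differ: " + names.size() + " vs " + ages.size());
        }
        List<User> users = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            users.add(new User(names.get(i), Integer.parseInt(ages.get(i))));
        }
        return users;
    }
}
```

Rejecting is only one policy. This still can't tell which entry is missing, which is exactly why an array of JSON objects is the sturdier format.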
Other REST frameworks and languages may have their own conventions and functionality :)
Rails serializes forms in a format not unlike what you suggest. If you have a nested model, it encodes it like this:
name=theo&company[name]=acme
(the equivalent JSON would be {"name": "theo", "company": {"name": "acme"}})
I can't say that I've seen a Rails application sending arrays, but there's no reason why it wouldn't work (worst case you would end up with a hash with string keys).
PHP has another convention, if you want to send an array you do
names[]=alice&names[]=bob&names[]=steve
But I don't know how you do nested objects that way.
The HTTP spec, or the URI spec, not sure which atm, actually specifies that if you pass the same argument multiple times you get an array of values (instead of the last-wins behaviour of most application frameworks). You can see this in the API docs for Jetty, for example: http://api.dpml.net/org/mortbay/jetty/6.1.5/org/mortbay/jetty/Request.html#getParameterValues(java.lang.String)
However, most of this applies to GET requests, not necessarily POST (but perhaps application/x-www-form-urlencoded should adhere to the same standards as GET).
In short, I don't think there is a standard for doing this; POST bodies are a bit of a wild west territory. I think, however, that either you should go with JSON, because it's made to describe structures and application/x-www-form-urlencoded is not, or you should try to represent the structure of your data better, something like:
users[0][name]=foo&users[0][age]=20&users[1][name]=bar&users[1][age]=10
That has some kind of chance of actually being interpretable by a Rails app out of the box, for example.
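If you do go the users[0][name] route and your framework doesn't parse it for you, the decoding is small. Here is a Java sketch. It assumes an application/x-www-form-urlencoded body whose brackets may or may not arrive percent-encoded. `IndexedFormParser` is a hypothetical helper, not a Rails or servlet API.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexedFormParser {
    // Matches keys of the form prefix[index][field], e.g. users[0][name].
    private static final Pattern KEY = Pattern.compile("^(\\w+)\\[(\\d+)\\]\\[(\\w+)\\]$");

    static List<Map<String, String>> parse(String body, String prefix) {
        Map<Integer, Map<String, String>> byIndex = new TreeMap<>(); // sorted by index
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue;
            // Decode both halves: brackets may arrive as %5B / %5D.
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            Matcher m = KEY.matcher(key);
            if (!m.matches() || !m.group(1).equals(prefix)) continue; // ignore unrelated keys
            byIndex.computeIfAbsent(Integer.parseInt(m.group(2)), i -> new LinkedHashMap<>())
                   .put(m.group(3), value);
        }
        return new ArrayList<>(byIndex.values());
    }
}
```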

Rest Standard: Path parameters or Request parameters

I am creating a new REST service.
What is the standard for passing parameters to REST services? In different REST implementations in Java, you can configure parameters as part of the path or as request parameters. For example:
Path parameters
http://www.rest.services.com/item/b
Request parameters
http://www.rest.services.com/get?item=b
Does anyone know the advantages/disadvantages of each method of passing parameters? It seems that passing the parameters as part of the path coincides better with the notion of the REST protocol. That is, a single location signifies a unique response, correct?
Paths tend to be cached, parameters tend to not be, as a general rule.
So...
GET /customers/bob
vs
GET /customers?name=bob
The first is more likely to be cached (assuming proper headers, etc.) whereas the latter is likely not to be cached.
tl;dr: You might want both.
Item #42 exists:
GET /items/42
Accept: application/vnd.foo.item+json
--> 200 OK
{
"id": 42,
"bar": "baz"
}
GET /items?id=42
Accept: application/vnd.foo.item-list+json
--> 200 OK
[
{
"id": 42,
"bar": "baz"
}
]
Item #99 doesn't exist:
GET /items/99
Accept: application/vnd.foo.item+json
--> 404 Not Found
GET /items?id=99
Accept: application/vnd.foo.item-list+json
--> 200 OK
[
]
Explanations & comments
/items/{id} returns an item while /items?id={id} returns an item-list.
Even if there is only a single element in a filtered item-list, a list of a single element is still returned for consistency (as opposed to the element itself).
It just so happens that id is a unique property. If we were to filter on other properties, this would still work in exactly the same way.
Elements of a collection resource can only be named using unique properties (e.g. keys as a subresource of the collection) for obvious reasons (they're normal resources and URIs uniquely identify resources).
If the element is not found when using a filter, the response is still OK and still contains a list (albeit empty). Just because we're requesting a filtered list containing an item that doesn't exist doesn't mean the list itself doesn't exist.
Because they're so different and independently useful, you might want both. The client will want to differentiate between all cases (e.g. whether the list is empty or the list itself doesn't exist, in which case you should return a 404 for /items?...).
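Here is a minimal Java sketch of the two handlers. It assumes an in-memory Map from id to bar. `ItemsResource` and `Result` are hypothetical names, and routing and content negotiation are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal result type: status + body.
final class Result {
    final int status;
    final Object body;
    Result(int status, Object body) { this.status = status; this.body = body; }
}

final class ItemsResource {
    private final Map<Integer, String> items; // id -> bar

    ItemsResource(Map<Integer, String> items) { this.items = items; }

    // GET /items/{id}: the item itself, or 404 when that item doesn't exist.
    Result getItem(int id) {
        String bar = items.get(id);
        return bar == null
                ? new Result(404, null)
                : new Result(200, Map.<String, Object>of("id", id, "bar", bar));
    }

    // GET /items?id={id}: always a list, possibly empty; the filtered list exists even when nothing matches.
    // Filtering on any other property would follow the same pattern.
    Result listItems(int idFilter) {
        List<Map<String, Object>> matches = new ArrayList<>();
        String bar = items.get(idFilter);
        if (bar != null) matches.add(Map.<String, Object>of("id", idFilter, "bar", bar));
        return new Result(200, matches);
    }
}
```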
Disclaimer: This approach is by no means "standard". It makes so much sense to me though that I felt like sharing.
PS: Naming the item collection "get" is a code smell; prefer "items" or similar.
Your second example of "request parameters" is not correct because "get" is included as part of the path. GET is the request type, it should not be part of the path.
There are 4 main types of requests:
GET
PUT
POST
DELETE
GET requests should always be able to be completed without any information in the request body. Additionally, GET requests should be "safe", meaning that no significant data is modified by the request.
Besides the caching concern mentioned above, parameters in the URL path tend to be required and/or expected because they are also part of your routing, whereas parameters passed in the query string are more variable and don't affect which part of your application the request is routed to. That said, you could also pass a variable-length set of parameters through the path:
GET somedomain.com/states/Virginia,California,Mississippi/
A good book to read as a primer on this topic is "RESTful Web Services", though I will warn you to be prepared to skim over some redundant information.
I think it depends. One URL for one resource. If you want to receive that resource in a slightly different way, give it a query string. But for a value that would deliver a different resource, put it in the path.
So in your example, the variable's value is directly related to the resource being returned. So it makes more sense in the path.
The first variation is a little cleaner, and allows you to reserve the request parameters for things like sort order and page, as in
http://www.rest.services.com/items/b?sort=ascending;page=6
This is a great fundamental question. I've recently come to the conclusion to stay away from using path parameters. They lead to ambiguous resource resolution. The URL is basically the 'method name' of a piece of code running somewhere on a server. I prefer not to mix variable names with method names. The name of your method is apparently 'customer' (which IMHO is a rotten name for a method, but REST folks love this pattern). The parameter you're passing to this method is the name of the customer. A query parameter works well for that, and this resource and query-parameter value can even be cached if desired.
There is no physical IT customer resource. There is likely no file on disk under a customer folder that's named after the customer. This is a web-service that performs some kind of database transaction. The 'resource' is your service, not the customer.
This obsession over REST and web-verbs reminds me of the early days of Object Oriented programming where we attempted to cram our code into virtual representations of physical objects. Then we realized that objects are usually virtual concepts in a system. OO is still useful when done the right way. REST is also useful if you realize that RESTful resources are services, not objects.