Handling long queries without violating REST

We have a REST API, and we've done a pretty good job of sticking to the spirit of REST. However, an important consumer is requesting a way to reconcile their datastore. The flow works like this:
1. Consumer makes a GET call to retrieve all inventory objects created within a date range. Let's say this returns 1 million inventory VINs.
2. Consumer compares the payload with their own datastore and sees that they're missing 5,000 inventory objects.
3. Consumer would like to make a request with the 5,000 VINs and get those 5,000 objects back.
The problem is that the long query string (a JSON array of VINs) bumps into the query string length limit imposed by our server. Possible ideas: make 5,000 separate calls (seems horrible), increase the query string length limit on the server (would rather not do this), or use POST instead (not RESTful?).
So, I'm wondering what Roy Fielding would do...

What about a POST that submits the JSON list of IDs to a new resource, e.g. one called /inventory/difference?
If the computation takes long, you can answer with 202 Accepted and the ID of the resource being generated, then point the client back to it at /inventory/difference/:id.

Somewhat similar to what moonwave99 suggested, but instead you create a resource called a "set".
You POST to /set a list of identifiers that you wish to be in the set. The result of the POST is a redirect URL to the resource that names the specific set.
So:
POST /set
Result:
303 See Other
Location: /set/123
Then:
GET /set/123
Returns the list of items in the set.
Sets are orthogonal to the use case of "fetching differences", they're simply a compilation of items.
If the creation of a set takes a long time, and you consider the set itself to be a snapshot of the data, then when the user tries GET /set/123 the server can simply reply with 202 Accepted until the actual dataset has been completed.
You can then use:
GET /set/123/identifiers
To get a collection of the actual identifiers in the set, for example, if you like.
You can do something like
POST /setfromquery
and send a list of criteria (name like "John*", city = "Los Angeles", etc.). This doesn't really need its own specific resource, just define your query "language" to include both simple lists of IDs as well as perhaps other filter criteria.
You can also support set operations (unions, differences, etc.). Lots of powerful things can be done with a set resource.
Finally, of course, there's the ever popular:
DELETE /set/123
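A minimal sketch of the set resource described above, in Python, with an in-memory dictionary standing in for real storage. The class name SetStore, its method names, and the sequential numeric IDs are assumptions made for illustration; they do not come from any particular framework.

```python
import itertools


class SetStore:
    """In-memory stand-in for the /set resource (hypothetical design)."""

    def __init__(self):
        self._sets = {}
        self._next_id = itertools.count(1)

    def create(self, identifiers):
        """Handle POST /set: store the list and return the new set's URL."""
        set_id = next(self._next_id)
        self._sets[set_id] = list(identifiers)
        return f"/set/{set_id}"

    def identifiers(self, set_id):
        """Handle GET /set/{id}/identifiers."""
        return self._sets[set_id]

    def delete(self, set_id):
        """Handle DELETE /set/{id}."""
        del self._sets[set_id]
```

A client would call create() once with the full list of identifiers, then issue ordinary GETs against the returned URL, which stays short no matter how many identifiers the set holds.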

I don't think anyone would fault you for working around GET not accepting a request body by using POST for a request that needs one. You are just being pragmatic.
I agree, making 5000 individual requests or upping the query string limit are ugly. POST is the way forward.

Using a POST without creating a resource just seemed too dirty to me. In the end, we set a limit of 100 IDs per requested "chunk". In practice, these requests will rarely contain more than 100 IDs, so hacking REST principles to accommodate an edge case seemed like a bad idea. I made sure the limitation was clearly defined in our API docs. Done and done...
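On the consumer side, the 100-ID limit means splitting the missing IDs into chunks before requesting them. A small sketch in Python; the function name chunk_ids and the sample VIN values are made up for illustration.

```python
def chunk_ids(ids, size=100):
    """Yield successive lists of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# 5,000 placeholder VINs, as in the reconciliation scenario above.
missing_vins = [f"VIN{n:05d}" for n in range(5000)]
chunks = list(chunk_ids(missing_vins))
```

With the default size, the 5,000 missing VINs become 50 GET requests instead of 5,000.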

Related

Complex requests with REST API

I am wondering if it is possible to adhere to REST principles when creating what will essentially amount to a BI tool.
In my scenario I have a high data volume with hundreds of thousands of IDs (frankly more than this, but for the sake of this example let's go with that). These are presented in a traditional table that provides the features needed for large data sets, such as pagination. The user can also filter by one or many of these IDs to drill down into the data set as they see fit.
It is theoretically possible that the user would want to filter on 100 of the IDs, which would make the GET URI incredibly long. As I understand it, that would kind of break the resource identification principle of a REST API, not to mention that it could bump into the URL length limit of certain browsers, since the IDs may be quite long. Normally I would just use a POST, since I can put all of the applied filters in the body and generate a WHERE clause on the server.
Since a POST in a REST API is supposed to "create a new entry in the collection", it would appear, at least to me, that generating a complex query like this means a REST API is not possible. Or perhaps I am approaching the solution wrong (totally plausible).
It would seem that in my scenario using a GET simply isn't possible due to the potential length of the parameters. Thus I am forced to use a POST. Though using a POST as I am seems to violate REST style, which isn't the end of the world. I mostly just wanted to double check that I am not missing something and there is not a solution using a GET.

To follow the resource principle, create a search-like resource. POST your IDs in a body to this endpoint; it will prepare a list of results for you and redirect you to searchresults/{id}.
See for example: https://gooroo.io/GoorooTHINK/Article/16583/HTTP-Patterns---Bouncer/25829#.W3aBsugzaUk

Design a REST API in which a search request can take parameters for multiple Queries

I have to design a REST API in which a search request can take parameters for multiple queries (i.e. when the client makes a call using this API, they should be able to send parameters that form multiple queries).
We have an existing API that uses GET and takes multiple parameters which together form a single query; the call then returns the response for that query.
E.g. currently I can pass firstName, lastName, age, etc. in the request and get back the person.
But now I have to enhance this service (or build a separate one) so that I can send parameters like firstName1, lastName1, age1 to search for person1; firstName2, lastName2, age2 to search for person2; and so on.
Should I use POST for the new API and send a list of parameters (params for query1, params for query2, and so on)?
Or is there a better approach.
We are using Spring Boot for REST implementation.

It's better to use POST: GET is fine for two or three parameters, but when you have a whole set of parameters or an object, POST is the better choice.

The best thing to do here will be do POST and then return a JSON object with all the details of the Person in an array.
That way it will be faster and you would not have to deal with long urls for GET.
Also, a GET request is subject to URL length limits, whereas a POST body is not constrained by URL length.

It is really hard to give one right answer here. In general, sending a GET request has the advantage that you can easily leverage caching at the HTTP level, e.g. with products like Varnish or nginx. But if you can already foresee that your URL, including all params, will exceed the length that browsers and servers accept, you'll have to send a POST request to make it work in all browsers.

A RESTful architecture should respect the principle of addressability.
Since multiple users can be accessed through a single request, ideally this group of users should get an address that identifies it as a resource.
However, I understand that in the real world URIs have a limited length (maximum length of HTTP GET request?). A POST request would indeed work well, but we lose the benefit of addressability.
Another way would be to expose a new resource: group.
Let's suppose that your current model is something like this:
.../users/{id}
.../users/search?{arg1}={val1};{arg2}={val2}
You could eventually do something like :
.../users/groups/
.../users/groups/{id}
.../users/search?group={id}
(explanation below)
Then you could split your search in two:
First, a POST on .../users/groups/ with, as proposed in other answers, a JSON description of the search parameters. This request could scan the .../users/groups/ directory and, if this set of parameters already exists, return the corresponding address .../users/groups/{id}. (For performance reasons you could, for instance, define {id} with a first part that gives the number of users requested.)
Then you could request this group with a GET, something like this: .../users/search?group={id}.
This approach would be a bit more complex to implement, but is more consistent with the resource oriented paradigm.
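One way to make the POST return the same address for the same set of parameters is to derive {id} from the parameters themselves. The following Python sketch does that under stated assumptions: the helper names group_id and GroupRegistry, the SHA-256 hash, and the 16-character truncation are all illustrative choices, not something prescribed above. The count prefix follows the suggestion that the first part of {id} give the number of users requested.

```python
import hashlib
import json


def group_id(queries):
    """Derive a stable id from a list of search-parameter dicts."""
    # Canonical JSON: key order inside each query does not change the id.
    canonical = json.dumps(queries, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{len(queries)}-{digest}"


class GroupRegistry:
    """In-memory stand-in for the groups directory (hypothetical)."""

    def __init__(self):
        self._groups = {}

    def post(self, queries):
        """Handle POST on the groups collection: reuse or create the group."""
        gid = group_id(queries)
        self._groups.setdefault(gid, queries)
        return f"/users/groups/{gid}"
```

Posting the same list of queries twice yields the same address, so the follow-up GET on search?group={id} remains cacheable and bookmarkable. Note that the order of the queries in the list still matters in this sketch.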

POST to get REST resource - three approaches - which one would you recommend?

I have a REST resource (e.g. Tickets). To obtain a set of Tickets that match a given set of constraints (e.g. start date, end date, price and other criteria), a user will need to pass information. This information can be included as query parameters, and the protocol can define:
GET: Tickets?start-date=date&end-date=date&price=someprice...
The set of constraints to pass could be a lot.
In such situations, is it better to use a POST and pass the set of constraints as JSON object within the body?
POST: Tickets
Body:
{
"start-date": "date",
"end-date" : "date",
. . .
}
What are the drawbacks of such an approach? Does it still agree with the REST guidelines? Ref: http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
Another alternative is the client could create a new resource called "Constraints" on the server, obtain a constraint-id (ex:123) as a response. Then it could use:
GET: Tickets?constraints-id=123
But this will mean that the server will periodically have to expire and delete "Constraint" objects, as clients might keep creating those without completing the business flow (ex: without confirming a Ticket in the end)
A third approach could be to still use POST, but not create any resource. We can use a URI scheme like this:
POST: Tickets/Constraints
Body:
{
"start-date": "date",
"end-date" : "date",
. . .
}
Response:
200 OK ...
Tickets
This will mean that although no resource was created on the server, the need to POST the constraints to obtain Tickets is still made clear.
Which of these approaches would you recommend? Which is most intuitive? Or is there any other alternative you would recommend?

Simply according to the HTTP spec, a POST is not a valid method to send a large amount of data for a query, as the intention is that the body of the request is to be stored by the server in some way, which is not the case in your example.
My current project faced the same problem and we decided to go with the more correct GET with many templated query parameters. Even though we support over a dozen query params, some of which can be quite long, most servers specify a GET request maximum length of 8KB, which I would expect to be ample. I suppose this limit could be reached if you were attempting to send a GET that repeats the same query parameter many times to describe a long list, but if this is the case then I would suggest taking a step back and looking at how this became a requirement of the API.
In my opinion a GET is the most intuitive and clearest use, and definitely seems to be the "correct" RESTful implementation. If the size of the request is an issue for you and you control the environment you are deploying to, you can even increase your server's max request size.
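To check whether a given set of query parameters stays under such a limit, you can build the URL and measure it. A rough Python sketch using the standard library; the 8KB figure is taken from the answer above and should be treated as an assumption about your server, and the helper names and sample VIN are made up.

```python
from urllib.parse import urlencode

# Assumed server limit; check your own server's configuration.
MAX_URL_BYTES = 8 * 1024


def query_url(base, params):
    """Build a URL; list values become repeated query parameters."""
    return f"{base}?{urlencode(params, doseq=True)}"


def fits(url, limit=MAX_URL_BYTES):
    """Return True if the URL is within the assumed length limit."""
    return len(url) <= limit


short = query_url("/Tickets", {"start-date": "2020-01-01", "price": "10"})
long_list = query_url("/inventory", {"vin": ["1HGCM82633A004352"] * 5000})
```

A dozen scalar constraints fit easily, while a list of 5,000 identifiers of 17 characters each is over 100,000 characters and does not.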

Yes, definitely OK and a good idea, especially if the post data is large, as it may exceed the max url length. It is better as part of the body of the message rather than on the url.

REST Design: What Http verb should be used to retrieve a dynamic resource?

I have a scenario in which I have REST API which manages a Resource which we will call Group. A Group contains members and the group resource is dynamic - whenever you retrieve it, you get the latest data (so a query must run server side to update the number of members in a group - in other words, the result of the request is to modify the data, since the results of running the query are stored).
Given a *group_id* it should return a minimal amount of information like
{
group_id: "5t7yu8i9io0op",
group_name: "That's my name",
size: 34
}
So a GET to this resource causes the resource to change, since a subsequent GET could return a new value for 'size'. This tells me it is not idempotent and so you should use POST to retrieve this resource. Am I correct in this conclusion?
If I am correct, do you think it is advisable to also provide a GET method that only returns the currently stored data for the group (eg. so the size could be out of date, even the name too). I suppose in this case I should return a last-modified date as one of the fields so that the user knows how up-to-date the resource is and can then elect to use the POST method...but then I am left wondering why would anyone do that, so why not ONLY provide the POST method and forget about GET?
Confused I am!
Thanks in advance.
[EDIT]
@Satish posted a link in his/her answer to the HTTP specs. Section 9.1.1 ends with this sentence:
Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.
So in my scenario, the requester does not really care about the side effect that the value for 'size' is recomputed as a direct result of making the request. They want the group information, and it just so happens that to provide accurate, up-to-date group data, the size query must be run in order to update that value. Although the fact that making the request causes data to change suggests this should be a POST, the user did not request that side effect, so a GET request would be acceptable and more intuitive, would it not? And it would therefore still be RESTful according to this sentence.
[2nd EDIT]
@Satish asks a very important question in the comments. So for others who read this I'll explain further about this problem:
Normally you would not run the group query to update its size from a REST request. As members are added or removed from a group, you would update the computed size of that group, store it and then a simple GET request would always return the correct size. However, our situation is more complicated in that a group is only stored as a query definition in ElasticSearch (kind of like a view in an RDBMS). Members do not get added/removed to and from groups. They get added to a much larger set of data (a collection in MongoDB). There are hundreds, potentially thousands, of different 'group definitions' so it is not practical to recompute size for every group when the collection changes. We cannot know when an item is added/removed to/from the collection which groups might change size - you only know by running the group definition who is in that group and what the size is. I hope that clears things up. :)

You should use GET. Even if the dynamic resource changes, you did not ask for that change through your request and you are not accountable for it. Ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html

In your case, when you do a GET you retrieve some information about the Group. You don't modify the group structure. OK, the group can be changed by an external entity, so your next GET may bring you different data. Am I right? Who modifies the structure of the group, and when?
So you should use GET, because the resource is modified from somewhere else and not by your call, which only tries to do a read operation.
EDIT
After you edited the question, I just want to add that I agree about the side effects.
What matters is whether you explicitly sent data or a change command to the server, or whether you just read something and don't have to pay attention to what the server side does to give you the response. More intuitively:
GET - Requests data from a specified resource
POST - Submits data to be processed to a specified resource

It is the combination of GET and POST. So you should use POST.
Refer : http://adarshdchaurasia.wordpress.com/2013/09/26/http-get-vs-post/
You should not use GET, because if you use the GET method then search engines may cache the responses. It may cause unintentional data updates on your server side, which you do not want. The GET method is meant to return content without updating anything on the server. POST is meant to update things on the server and return the result of that operation.

RESTful way to create multiple items in one request

I am working on a small client server program to collect orders. I want to do this in a "REST(ful) way".
What I want to do is:
Collect all orderlines (product and quantity) and send the complete order to the server
At the moment I see two options to do this:
Send each orderline to the server: POST qty and product_id
I actually don't want to do this because I want to limit the number of requests to the server so option 2:
Collect all the orderlines and send them to the server at once.
How should I implement option 2? A couple of ideas I have:
Wrap all orderlines in a JSON object and send this to the server, or use an array to post the orderlines.
Is it a good idea or good practice to implement option 2, and if so, how should I do it?
What is good practice?

I believe that another correct way to approach this would be to create another resource that represents your collection of resources.
Example, imagine that we have an endpoint like /api/sheep/{id} and we can POST to /api/sheep to create a sheep resource.
Now, if we want to support bulk creation, we should consider a new flock resource at /api/flock (or /api/<your-resource>-collection if you lack a better meaningful name). Remember that resources don't need to map to your database or app models. This is a common misconception.
Resources are a higher-level representation, not tied to your data. Operating on a resource can have significant side effects, like firing an alert to a user, updating other related data, initiating a long-lived process, etc. For example, we could map a file system or even the unix ps command as a REST API.
I think it is safe to assume that operating on a resource may also mean creating several other entities as a side effect.

Although bulk operations (e.g. batch create) are essential in many systems, they are not formally addressed by the RESTful architecture style.
I found that POSTing a collection as you suggested basically works, but problems arise when you need to report failures in response to such a request. Such problems are worse when multiple failures occur for different causes or when the server doesn't support transactions.
My suggestion to you is that if there is no performance problem, for example when the service provider is on the LAN (not WAN) or the data is relatively small, it's worth it to send 100 POST requests to the server. Keep it simple, start with separate requests and if you have a performance problem try to optimize.

Facebook explains how to do this: https://developers.facebook.com/docs/graph-api/making-multiple-requests
Simple batched requests
The batch API takes in an array of logical HTTP requests represented
as JSON arrays - each request has a method (corresponding to HTTP
method GET/PUT/POST/DELETE etc.), a relative_url (the portion of the
URL after graph.facebook.com), optional headers array (corresponding
to HTTP headers) and an optional body (for POST and PUT requests). The
Batch API returns an array of logical HTTP responses represented as
JSON arrays - each response has a status code, an optional headers
array and an optional body (which is a JSON encoded string).

Your idea seems valid to me. The implementation is a matter of your preference. You can use JSON or just parameters for this ("order_lines[]" array) and do
POST /orders
Since you are going to create several resources at once in a single action (the order and its lines), it's vital to validate each and every one of them and save them only if all of them pass validation, i.e. you should do it in a transaction.
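The validate-everything-then-save idea can be sketched as follows in Python. This is an illustration under assumptions: the field names product_id and qty come from the question, while the validation rules, the function names, and the plain list standing in for a database are made up. With a real database, the save step would run inside a transaction.

```python
def validate_line(line):
    """Return a list of problems found in one order line."""
    problems = []
    if not line.get("product_id"):
        problems.append("product_id is required")
    qty = line.get("qty")
    if not isinstance(qty, int) or qty <= 0:
        problems.append("qty must be a positive integer")
    return problems


def create_order(order_lines, store):
    """Handle POST /orders: save all lines or none of them."""
    errors = {}
    for index, line in enumerate(order_lines):
        problems = validate_line(line)
        if problems:
            errors[index] = problems
    if errors:
        # Nothing is saved when any line is invalid.
        return False, errors
    store.append(list(order_lines))
    return True, len(store) - 1
```

On failure the caller gets the errors keyed by line index, which makes it possible to report every invalid line in a single response.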

I've actually been wrestling with this lately, and here's what I'm working towards.
If a POST that adds multiple resources succeeds, return a 200 OK (I was considering a 201, but the user ultimately doesn't land on a resource that was created) along with a page that displays all resources that were added, either in read-only or editable fashion. For instance, a user is able to select and POST multiple images to a gallery using a form comprising only a single file input. If the POST request succeeds in its entirety the user is presented with a set of forms for each image resource representation created that allows them to specify more details about each (name, description, etc).
In the event that one or more resources fail to be created, the POST handler aborts all processing and appends each individual error message to an array. Then, a 409 Conflict is returned and the user is routed to a 409 Conflict error page that presents the contents of the error array, as well as a way back to the form that was submitted.

I guess it's better to send separate requests over a single connection. Of course, your web server has to support that.

You won't want to send the HTTP headers for 100 orderlines. Nor do you want to generate any more requests than necessary.
Send the whole order in one JSON object to the server, to: server/order or server/order/new.
Return something that points to: server/order/order_id
Also consider using CREATE PUT instead of POST
