RESTful API, what if the query string isn't long enough? - rest

We have a resource collection for products: /products.
We want to filter this collection to return only the members that have one of a list of specific class IDs. For example:
GET /products?classes=100,101,102
This should return a collection of product members which have any of the classes listed.
The issue is that we're working with thousands of products and classes, so the list of class IDs could be thousands long, which is too long for a query string.
I'm keen to stick to RESTful principles whenever we can, so I like the fact that the resource /products?classes=100,101,102 when called with GET returns a filtered products collection.
Obviously, we could include the ID list in the body in JSON format, but then the call GET /products would no longer return a representation of the state of the resource (the resource being the URL), because the body would be used to provide filter options.
What's the best way to request a filtered collection when the filter options are too long for the query string?

Interesting comment from C. Smith, who suggests making a POST call with an X-HTTP-Method-Override header set to GET and passing the IDs in the body. This would work.
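A minimal sketch of what that override request could look like, built with the standard library but not sent. X-HTTP-Method-Override is a widely used convention rather than part of the HTTP spec, so it only works if the server is written to honor it; the host api.example.com and the JSON body shape {"classes": [...]} are assumptions for illustration.

```python
import json
import urllib.request


def build_override_request(base_url, class_ids):
    """Build (but do not send) a POST that asks the server to treat it as a GET."""
    body = json.dumps({"classes": class_ids}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        f"{base_url}/products",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Non-standard convention: the server must explicitly support it.
            "X-HTTP-Method-Override": "GET",
        },
        method="POST",
    )


req = build_override_request("https://api.example.com", [100, 101, 102])
print(req.get_method())  # POST
```

Note that urllib stores header names via str.capitalize(), so the header is read back as req.get_header("X-http-method-override").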
After thinking about it, we're probably going to limit the number of class IDs allowed in the query string and suggest making multiple calls, breaking the ID list into groups of, say, 200. Submitting more than 200 would return an error.
GET /products?classes=1004,2342,8753... (limited to 200 IDs)
GET /products?classes=2326,3343,6981... (limited to 200 IDs)
Then the results can easily be stitched together afterwards.
This method would let you use 5,000 IDs, for example, by making 25 calls, which, while not ideal, is OK for our use case.
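A client-side sketch of the chunk-and-stitch approach, assuming the 200-ID limit described above. The fetch callable is hypothetical (plug in any HTTP client); the one design point worth noting is de-duplication, since a product that has several classes can match more than one chunk.

```python
from urllib.parse import urlencode

MAX_IDS_PER_CALL = 200  # the per-call limit proposed above


def chunked(ids, size=MAX_IDS_PER_CALL):
    """Split the full ID list into consecutive groups of at most `size` IDs."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_urls(ids, base="/products"):
    """One GET URL per chunk; safe="," keeps the commas readable instead of %2C."""
    return [
        f"{base}?{urlencode({'classes': ','.join(map(str, chunk))}, safe=',')}"
        for chunk in chunked(ids)
    ]


def fetch_all(ids, fetch):
    """Call every chunked URL and stitch the results together.

    `fetch` is a hypothetical callable: URL -> list of product dicts with an "id" key.
    """
    merged = {}
    for url in build_urls(ids):
        for product in fetch(url):
            # A product with several classes may come back from more than one
            # chunk, so de-duplicate by product ID while stitching.
            merged[product["id"]] = product
    return list(merged.values())


urls = build_urls(list(range(5000)))
print(len(urls))  # 25
```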

Related

RESTful API URL design with authentication

My data model is like this:
User: id, email, hashed_password
Item: id, name, color
UserItem: user_id, item_id, rating
and I want to write a RESTful API to get these resources. The authentication is provided via OAuth 2 with a JWT token (that contains the logged user id).
First approach
Endpoints
The first idea for a URL structure is (the one I chose when there was still no authentication):
/items/{item_id}
/users/{user_id}
/users/{user_id}/items/{item_id}
In this case a user with id 1 could use:
GET /users/1 to get their own information;
GET /users/1/items to get their own items (with rating);
GET /items to get all items that they could add to their collection.
Analysis
I think this solution is quite clear, but also inelegant.
Good:
You can easily get other users' info (if it is available to them);
1-to-1 relations between endpoints and data models.
Bad:
Longer URLs;
There is redundancy (why GET /users/1/items when in the token you already have the information about id 1?).
Second approach
Endpoints
Given that you can extract the user ID from the token, the structure could be simpler:
/items/{item_id}
/users/{user_id}
In this case a user with id 1 could use:
GET /users/me to get their own information;
GET /items?class=owned to get their own items (with rating);
GET /items?class=all to get all items that they could add to their collection.
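A minimal sketch of how the server side of this second approach could work, assuming the JWT has already been verified by your OAuth 2 library and that the user ID is in the standard "sub" claim (both assumptions). It resolves the /users/me alias and serves /items?class=owned|all with the rating merged in, returning None (null in JSON) for items the user has not added yet.

```python
def resolve_user_id(path_user_id, claims):
    """Map the /users/me alias onto the user ID carried in the verified JWT."""
    # Assumption: the user ID lives in the "sub" claim of an already-verified token.
    return str(claims["sub"]) if path_user_id == "me" else str(path_user_id)


def list_items(item_class, claims, items, user_items):
    """GET /items?class=owned|all for the user identified by the token."""
    user_id = str(claims["sub"])
    ratings = {
        ui["item_id"]: ui["rating"]
        for ui in user_items
        if str(ui["user_id"]) == user_id
    }
    if item_class == "owned":
        return [{**item, "rating": ratings[item["id"]]} for item in items if item["id"] in ratings]
    if item_class == "all":
        # Items the user has not added yet come back with rating None.
        return [{**item, "rating": ratings.get(item["id"])} for item in items]
    raise ValueError(f"unknown class: {item_class!r}")


items = [{"id": 1, "name": "mug", "color": "red"}, {"id": 2, "name": "pen", "color": "blue"}]
user_items = [{"user_id": 1, "item_id": 2, "rating": 5}]
claims = {"sub": "1"}
print(list_items("owned", claims, items, user_items))
```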
Analysis
This solution is a bit messy but probably more elegant.
Good:
Shorter URLs;
Less redundancy (GET /items to get your own items).
Bad:
Only the UserItem model is represented (even though in this case it is probably almost meaningless to get an Item without its rating, which could be set to null if the user has not yet added it);
Not straightforward to get other users' items (maybe something like GET /items?user=3?).
Conclusions
Honestly, I don't know what the best practice is in this case. I feel like there is something off about both of these. Maybe there is a hybrid approach I'm not seeing?
How would you organize a model like this?
You could look into a format like HAL. HAL gives you a way to describe specific resources (items) and allows you to create multiple collections that point to those resources.
This means that individual items could be hosted at /items/xyz, but items can be part of both the /users/a/items and /items collections.
I put a lot of work into a hypermedia client, https://github.com/badgateway/ketting , but this is not just an ad: there are alternatives, and that approach to API design might be well-suited for you.
But regardless of the client you're using, systems like this can avoid the issue of retrieving the same item through multiple endpoints. A single item has a canonical url, and if the system is designed well you only have to retrieve an item once.
A collection is just a list of links to the resources (items) that belong to that collection. They point to the item, but don't 'contain it', just like a regular hyperlink.
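To make that concrete, here is a sketch of two HAL collection documents built as Python dicts. HAL puts links under "_links", and "item" is the registered link relation for collection members; the specific URLs are the example ones from this answer. The same /items/xyz link appears in both collections, while the item itself lives at one canonical URL.

```python
import json


def hal_collection(self_href, item_hrefs):
    """A HAL collection: a list of links to items, not copies of the items."""
    return {
        "_links": {
            "self": {"href": self_href},
            # "item" is the registered link relation for members of a collection.
            "item": [{"href": href} for href in item_hrefs],
        }
    }


all_items = hal_collection("/items", ["/items/xyz", "/items/abc"])
user_a_items = hal_collection("/users/a/items", ["/items/xyz"])
print(json.dumps(user_a_items, indent=2))
```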

Passing sizable data in an REST GET request

A REST question. Let's say I have a database with a million items in it. I want to retrieve, say, 10,000 of them via a REST GET, passing the IDs of the 10,000 items in the GET request. Using URL query parameters, it'll quickly exceed the maximum length of a URL. How do people solve this? Use a POST instead and pass the IDs in the body? That seems hacky.
You should not address this through URL parameters; URLs have a practical length limit (around 2,000 characters is a commonly cited safe maximum).
I guess what you are doing is something like this:
https://localhost/api/applicationcollections/(47495dde-67d2-4854-add0-7e25b2fe88e4,1b470799-cc8a-4284-9ca7-76dc34a5aebb)
If you are planning to get more than 10k records, you can pass the information in the body of the request, which HTTP itself does not limit in length (although servers usually impose their own limits). Technically speaking, you would have to do it through a POST request, but that is not what the semantics of the POST verb are intended for. You can even include a body in a GET request (see the question "HTTP GET with request body"), but it should not be considered part of the request's semantics.
Normally you don't filter 10k elements by ID; instead, you get 10k elements in one request, passing a pagination parameter through the URL if you want. Even that can kill your app, especially considering that the DTO has more than one field, like:
ApplicationDto
field1
field2....
field15
Below is an example of how to pass pagination parameters and get the first 10k records:
https://localhost:44390/api/applications?pageNumber=1&pageSize=10000
Also, the API should return an extra header, let's call it X-Pagination, that tells you whether there are more pages to fetch and also gives the total number of elements.
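A minimal sketch of the server side of pageNumber/pageSize pagination with an X-Pagination header. The header name follows the answer, but its JSON payload shape (totalCount, totalPages, hasNext, and so on) is an assumption; there is no standard format for it.

```python
import json
import math


def paginate(records, page_number=1, page_size=10):
    """Return one page of records plus an X-Pagination header describing the rest."""
    if page_number < 1 or page_size < 1:
        raise ValueError("pageNumber and pageSize must be >= 1")
    total = len(records)
    total_pages = math.ceil(total / page_size)
    start = (page_number - 1) * page_size
    page = records[start:start + page_size]
    # Assumed header payload: a convention, not a standard.
    meta = {
        "totalCount": total,
        "pageSize": page_size,
        "currentPage": page_number,
        "totalPages": total_pages,
        "hasNext": page_number < total_pages,
    }
    return page, {"X-Pagination": json.dumps(meta)}


page, headers = paginate(list(range(25)), page_number=3, page_size=10)
print(page)  # [20, 21, 22, 23, 24]
```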
As an extra effort to reduce the size of the request, you can shape the data and only get the fields you need.
ApplicationDto should bring only field1 and field3; see below:
https://localhost:44390/api/applications?fields=field1,field3
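A sketch of how a server might apply the fields parameter to one DTO. Rejecting unknown field names (which the HTTP layer could turn into a 400 Bad Request) rather than silently ignoring them is a design choice assumed here, not something the answer prescribes.

```python
from typing import Optional


def shape(record: dict, fields: Optional[str]) -> dict:
    """Apply ?fields=field1,field3: keep only the requested keys of one DTO."""
    if not fields:
        return record  # no fields parameter: return the full representation
    wanted = [f.strip() for f in fields.split(",") if f.strip()]
    unknown = [f for f in wanted if f not in record]
    if unknown:
        # Assumed policy: reject typos instead of silently dropping them.
        raise ValueError(f"unknown fields: {', '.join(unknown)}")
    return {f: record[f] for f in wanted}


dto = {"field1": 1, "field2": 2, "field3": 3}
print(shape(dto, "field1,field3"))  # {'field1': 1, 'field3': 3}
```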
See how Twitter addresses this problem as well, with cursored responses.
Hope this helps

Add subcategories in a filtered API Restful resource

I'll give an example as the title might sound a bit confusing.
How to build a resource path for something like that:
GET /courses/?language=english&active=true/units
I want to filter the courses (not by an ID, as usual) and then get the units of the result. How would you do that? I guess using a question mark in the middle of the path is not allowed.
That depends a little on your DB schema, i.e. what a "course" and a "unit" are. The whole point of the RESTful way is to always build resource-specific requests and URLs.
But let's say that one course has X units in it. Here's what I would do to make a RESTful path for that request:
Because of the path problem of filtering courses AND using the /units suffix, it can be done by adding another query parameter that specifies which fields the request is supposed to return. Something like this:
GET /courses?language=english&active=true&fields=units
That would filter the courses and then return only the 'units' field in the response. As I said, depending on your DB and models, if the units are not stored inside the courses, it would be bad practice to get them by requesting a /courses path. In that case, first request the courses that match the desired filter, and then make another request to the /units context, sending e.g. the course IDs as query parameters.
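The two-request alternative can be sketched as follows. The courseIds parameter name on /units and the fetch callable (URL to parsed JSON) are hypothetical, and the sketch assumes each course object has an "id" field.

```python
from urllib.parse import urlencode


def courses_url(language, active):
    """Step 1: filter the courses with ordinary query parameters."""
    return "/courses?" + urlencode({"language": language, "active": str(active).lower()})


def units_url(course_ids):
    """Step 2: ask /units for the units of those courses (courseIds is a hypothetical name)."""
    return "/units?" + urlencode({"courseIds": ",".join(map(str, course_ids))}, safe=",")


def units_for_filtered_courses(fetch, language, active=True):
    """`fetch` is a hypothetical callable: URL -> parsed JSON."""
    courses = fetch(courses_url(language, active))
    return fetch(units_url([c["id"] for c in courses]))


print(courses_url("english", True))  # /courses?language=english&active=true
print(units_url([3, 7]))             # /units?courseIds=3,7
```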

REST HTTP verb best practice for querying data

I have read that it is best practice in REST to use the HTTP method as an indicator of the operation performed on the resource. Let's say I have 5 operations; I am using the resource and methods below:
Resource /customer:
POST - CreateCustomer
DELETE - delete customer
PUT - update customer
Now I have 2 more query operations: findCustomer and queryCustomer.
I can use the GET method for only one of them. What is the best practice for handling such a scenario? Passing an explicit HTTP header or an extra query string to identify one exceptional operation doesn't seem like a good alternative!
I have 2 more operations of query: findCustomer and queryCustomer. I can use GET method for one of them only.
The GET method is suitable for both operations, however you must use different URIs.
Use the following to retrieve a representation of a collection of customers (the operation you define as query):
GET /customers
The collections can be filtered with query parameters.
And use the following to retrieve a representation of a single customer (the operation you define as find):
GET /customers/{id}
{id} is a unique identifier for your customer.
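A toy router illustrating the point: both operations use GET, and the URI alone decides which one runs. The in-memory CUSTOMERS data and the exact-match filtering on query parameters are hypothetical stand-ins for a real data store.

```python
import re

# Hypothetical in-memory data standing in for the customer store.
CUSTOMERS = {
    "1": {"id": "1", "name": "Ada", "city": "London"},
    "2": {"id": "2", "name": "Grace", "city": "New York"},
}


def handle_get(path, query):
    """Route GET requests: the URI, not the method, distinguishes find from query."""
    match = re.fullmatch(r"/customers/([^/]+)", path)
    if match:  # "find": GET /customers/{id}
        customer = CUSTOMERS.get(match.group(1))
        return (200, customer) if customer else (404, None)
    if path == "/customers":  # "query": GET /customers?city=London
        hits = [c for c in CUSTOMERS.values() if all(c.get(k) == v for k, v in query.items())]
        return 200, hits
    return 404, None


print(handle_get("/customers", {"city": "London"}))
```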
Related: See this answer for some insights on which status code can be returned in each situation.

Handling long queries without violating REST

We have a REST api, and we've done a pretty good job at sticking to the spirit of REST. However, we have an important consumer, and they're requesting a way to reconcile their datastore. The flow works like this:
Consumer makes a GET call to retrieve all inventory objects created within a date range. Let's say this returns 1 million inventory VINs.
Consumer compares the payload with their own datastore and sees that they're missing 5,000 inventory objects.
Consumer would like to make a request with the 5,000 VINs and get those 5,000 objects back.
The problem is that the long query string (a JSON array of VINs) bumps into the query string length limits imposed by our server. Possible ideas: make 5k separate calls (seems horrible), increase the query string length limit on the server (would like not to do this), or use POST instead (not RESTful?).
So, I'm wondering what Roy Fielding would do...
What about a POST submitting the JSON file with the ID list to a new resource, e.g. one called /inventory/difference?
If the computation takes a long time, you can answer with 202 Accepted and the ID of the resource being generated, then point back to it at /inventory/difference/:id.
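An in-memory sketch of that asynchronous flow: the POST answers 202 Accepted with a Location pointing at the job, and polling the job returns 202 until a background worker has filled in the result, then 200. Returning 202 while polling, the Location header, and the lookup function (VIN to inventory object) are assumptions for illustration.

```python
import itertools


class DifferenceJobs:
    """POST /inventory/difference -> 202 Accepted; poll GET /inventory/difference/:id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def post(self, vins):
        job_id = next(self._ids)
        self._jobs[job_id] = {"vins": list(vins), "result": None}
        return 202, {"Location": f"/inventory/difference/{job_id}"}

    def run(self, job_id, lookup):
        """Done by a background worker; `lookup` is a hypothetical VIN -> object function."""
        job = self._jobs[job_id]
        job["result"] = [lookup(vin) for vin in job["vins"]]

    def get(self, job_id):
        job = self._jobs.get(job_id)
        if job is None:
            return 404, None
        if job["result"] is None:
            return 202, None  # still computing: the client should poll again later
        return 200, job["result"]


jobs = DifferenceJobs()
status, headers = jobs.post(["VIN1", "VIN2"])
print(status, headers["Location"])  # 202 /inventory/difference/1
```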
Somewhat similar to what moonwave99 suggested, but instead you create a resource called a "set".
You POST to /set a list of identifiers that you wish to be in the set. The result of the POST is a redirect URL to the resource that names the specific set.
So:
POST /set
Result:
303 See Other
Location: /set/123
Then:
GET /set/123
Returns the list of items in the set.
Sets are orthogonal to the use case of "fetching differences", they're simply a compilation of items.
If the creation of a set takes a long time, and you consider the set itself to be a snapshot of the data, then when the user tries GET /set/123 the server can simply reply with 202 Accepted until the actual dataset has been completed.
You can then use:
GET /set/123/identifiers
To get a collection of the actual identifiers in the set, for example, if you like.
You can do something like
POST /setfromquery
and send a list of criteria (name like "John*", city = "Los Angeles", etc.). This doesn't really need its own specific resource; just define your query "language" to include both simple lists of IDs and perhaps other filter criteria.
You can also support set operations (unions, differences, etc.). Lots of powerful things can be done with a set resource.
Finally, of course, there's the ever popular:
DELETE /set/123
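An in-memory sketch of the set resource, including the set operations mentioned above. How unions and differences would be exposed over HTTP is not specified in the answer, so the union/difference methods (each creating a new set and returning its location) are a hypothetical design; create returns the path that the POST response would carry in its Location header.

```python
import itertools


class SetStore:
    """In-memory stand-in for the /set resource and its set operations."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._sets = {}

    def _store(self, members):
        set_id = next(self._ids)
        self._sets[set_id] = frozenset(members)
        return f"/set/{set_id}"  # sent back as the Location of the POST response

    def create(self, identifiers):  # POST /set
        return self._store(identifiers)

    def get(self, set_id):  # GET /set/{id}
        members = self._sets.get(set_id)
        return sorted(members) if members is not None else None

    def union(self, a, b):  # hypothetical: a new set from two existing ones
        return self._store(self._sets[a] | self._sets[b])

    def difference(self, a, b):  # hypothetical: members of a that are not in b
        return self._store(self._sets[a] - self._sets[b])

    def delete(self, set_id):  # DELETE /set/{id}
        return self._sets.pop(set_id, None) is not None


store = SetStore()
server = store.create(["V1", "V2", "V3"])  # /set/1
consumer = store.create(["V1"])            # /set/2
missing = store.difference(1, 2)           # /set/3
print(store.get(3))  # ['V2', 'V3']
```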
I don't think anyone would fault you in working around GET not accepting a request body by using POST for a request that needs a request body. You are just being pragmatic.
I agree, making 5000 individual requests or upping the query string limit are ugly. POST is the way forward.
Using a POST without creating a resource just seemed too dirty to me. In the end, we made it so that there is a limit of 100 IDs requested in a "chunk". In practice, these requests will rarely be > 100, so hacking REST principles to accommodate an edge case seemed like a bad idea. I made sure the limitation was clearly defined in our API docs; done and done...
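A sketch of how the server could enforce that documented 100-ID chunk limit. The comma-separated input format and responding with 400 Bad Request plus an error message are assumptions; the answer only says the limit is documented.

```python
MAX_IDS_PER_REQUEST = 100  # the documented per-chunk limit


def parse_id_chunk(raw):
    """Validate a comma-separated ID list and reject chunks over the documented limit."""
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    if len(ids) > MAX_IDS_PER_REQUEST:
        # Assumed response: 400 Bad Request with a message pointing at the limit.
        return 400, {"error": f"at most {MAX_IDS_PER_REQUEST} ids per request, got {len(ids)}"}
    return 200, ids


print(parse_id_chunk("VIN1,VIN2"))  # (200, ['VIN1', 'VIN2'])
```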