REST: How best to handle large lists

REST: How best to handle large lists - rest

It seems to me that REST has clean and clear semantics for basic CRUD and for listing resources, but I've seen no discussion of how to handle large lists of resources. It doesn't scale to dump an entire database table over the network in a resource-oriented architecture (imagine a customer table with a million customers!), especially if you only need a few items. So it seems that some semantics should exist to filter, map and reduce a list of resources on the server-side.
So, do you know any tried and true ways to do the following kinds of requests in REST:
1) Retrieve just the count of the resources?
I could imagine doing something like GET /api/customer?result=count
Is that how it's usually done?
I could also imagine modifying the URL (/api/count/customer or /api/customer/count, for example), but that seems to either break the continuity of the resource paths or inflict an ugly hack on the expected ID field.
2) Filter the results on the server-side?
I could imagine using query parameters for this, in a context-specific way (such as GET /api/customer?country=US&state=TX).
It seems tricky to do it in a flexible way, especially if you need to join other tables (for example, get customers who purchased in the last 6 months).
I could imagine using the HTTP OPTIONS method to return a JSON string specifying possible filters and their possible values.
Has anyone tried this sort of thing?
If so, how complex did you get (for example, retrieving the items purchased year-to-date by female customers between 18 and 45 years old in Massachussetts, etc.)?
3) Mapping to just get a limited set of fields or to add fields from joined tables?
4) More complicated reductions than count (such as average, sum, etc.)?
EDIT: To clarify, I'm interested in how the request is formulated rather than how to implement it on the server-side.

I think the answer to your question is OData! OData is a generic protocol for querying and interacting with information. OData is based on REST but extends the semantics to include programatic elemements similar to SQL.
OData is not always URL-based only as it use JSON payloads for some scenarios. But it is a standard (OASIS) so it well structured and supported by many APIs.
A few general links:
https://en.wikipedia.org/wiki/Open_Data_Protocol
http://www.odata.org/

The most common ways of handling large data sets in GET requests are (afaict) :
1) Pagination. The request would be something like GET /api/customer?country=US&state=TX&firstResult=0&maxResults=50. This way the client has the freedom to choose the size of the data chunk he needs (this is often useful for UI-based clients).
2) Exposing a size service, so that the client gets to know how large the data set is before actually requesting it. The service would be something like
GET /api/customer/size?country=US&state=TX
Obviously the two can (and imho should) be used together, so that when/if a client (be it mobile or web or whatever) waints to fill its UI with content, he can choose what's the best data chunk size based also on the size of whole data set (e.g. to avoid creating 100 pages for the user to navigate).

Related

REST Best practise for filtering and knowing the result is singular: List or single?

Variety of REST practises suggest (i.e. 1, 2, 3) to use plurals in your endpoints and the result is always a list of objects, unless it's filtered by a specific value, such as /users/123 Query parameters are used to filter the list, but still result in a list, nevertheless. I want to know if my case should 'abandon' those best practices.
Let's use cars for my example below.
I've got a database full of cars and each one has a BuildNumber ("Id"), but also a model and build year which combination is unique. If I then query for /cars/ and search for a specific model and year, for example /cars?model=golf&year=2018 I know, according to my previous sentence, my retrieve will always contain a single object, never multiple. My result, however, will still be a list, containing just one object, nevertheless.
In such case, what will be the best practise as the above would mean the object have to be extracted from the list, even though a single object could've been returned instead.
Stick to best practises and export a list
Make a second endpoind /car/ and use the query parameters ?model=golf&year=2018, which are primarily used for filtering in a list, and have the result be a single object, as the singular endpoint states
The reason that I'm asking this is simply for the cleanness of the action: I'm 100% sure my GET request will result in single object, but still have to perform actions to extract it from the list. These steps should've been unnecessary. Aside of that, In my case I don't know the unique identifier, so cars/123 for retrieving a specific car isn't an option. I know, however, filters that will result in one object and one specific object altogether. The additional steps simply feel redundant.
1: https://learn.microsoft.com/en-us/azure/architecture/best-practices/api-design
2: https://blog.mwaysolutions.com/2014/06/05/10-best-practices-for-better-restful-api/
3: https://medium.com/hashmapinc/rest-good-practices-for-api-design-881439796dc9

As you've specifically asked for best practices in regards to REST:
REST doesn't care how you specify your URIs or that semantically meaningful tokens are used inside the URI at all. Further, a client should never expect a certain URI to return a certain type but instead rely on content-type negotiation to tell the server all of the capabilities the client supports.
You should furthermore not think of REST in terms of object orientation but more in terms of affordance and statemachines where a client get served every information needed in order to make an educated decision on what to do next.
The best sample to give here is probably to take a close look at the Web and how it's done for HTML pages. How can you filter for a specific car and how it will be presented to you? The same concepts that are used in the Web also apply to REST as both use the same interaction model. In regards to your car sample, the API should initially return some control-structures that teach a client how a request needs to be formed and what options could be filtered for. In HTML this is done via forms. For non-HTML based REST APIs dedicated media-types should be defined that translate the same approach to non-HTML structures. On sending the request to the server, your client would include all of the supported media-types it supports in an Accept HTTP header, which informs the server about the capabilities of the client. Media-types are just human-readable specification on how to process payloads of such types. Such specifications may include hints on type information a link relation might return. In order to gain wide-usage of media-types they should be defined as generic as possible. Instead of defining a media-type specific for a car, which is possible, it probably would be more convenient to use an existing or define a new general data-container format (similar to HTML).
All of the steps mentioned here should help you to design and implement an API that is free to evolve without having to risk to break clients, that furthermore is also scalable and minimizes interoperability concerns.
Unfortunately your question targets something totally different IMO, something more related to RPC. You basically invoke a generic method via HTTP on an endpoint, similar like SOAP, RMI or CORBA work. Whether you respect the semantics of HTTP operations or not is only of sub-interest here. Even if you'd reached level 3 of the Richardson Maturity Model (RMM) it does not mean that you are compliant to REST. Your client might still break if the server changes anything within the response. The RMM further doesn't even consider media-types at all, hence I consider it as rather useless.
However, regardless if you use a (true) REST or RPC/CRUD client, if retrieving single items is your preference instead of feeding them into a collection you should consider to include the URI of the items of interest instead of its data directly into the collection, as Evert also has suggested. While most people seem to be concerned on server performance and round-trip-times, it actually is very elegant in terms of caching. Further certain link-relation names such as prefetch may inform the client that it may fetch the targets payload early as it is highly possible that it's content will be requested next. Through caching a request might not even have to be triggered or sent to the server for processing, which is probably the best performance gain you can achieve.

1) If you use query like cars/where... - use CARS
2) If you whant CAR - make method GetCarById

You might not get a perfect answer to this, because all are going to be a bit subjective and often in a different way.
My general thought about this is that every item in my system will have its own unique url, for example /cars/1234. That case is always singular.
But this specific item might appear as a member in collections and search results. When /cars/1234 apears in these, they will always appear as a list with 1 item (or 0 or more depending on the query).
I feel that this is ultimately the most predictable.
In my case though, if a car appears as a member of a search or colletion, it's 'true url' will still be displayed.

Complex requests with REST API

I am wondering if it is possible to adhere to REST principles when creating what will essentially amount to a BI tool.
In my scenario I have high data volume with 100,000's of IDs (frankly more than this but for the sake of this example let's go with that.). These are presented in a traditional table that allows for necessary features when accessing large data sets such as pagination. The user also has the ability to filter by one, or many of these ID's to drill down the data set as they see fit.
It is theoretically possible that the user would want to filter on 100 of the ID's, thus rendering a GET URI incredibly long. Which as I understand it would kind of break the resource identification principle of a REST API. Not to mention could potentially bump into the character limit in a GET request for certain browsers since the ID's may be quite long. Normally I would just use a POST since I can add all of the applied filters in the body and generate a where clause on the server.
Since a POST in a REST API is supposed to
Create a new entry in the collection.
By definition it would appear, at least to me that generating a complex query for something like this would mean that a REST API is not possible. Or does this perhaps mean that I am approaching the solution wrong (totally plausible).
It would seem that in my scenario using a GET simply isn't possible due to the potential length of the parameters. Thus I am forced to use a POST. Though using a POST as I am seems to violate REST style, which isn't the end of the world. I mostly just wanted to double check that I am not missing something and there is not a solution using a GET.

To follow the resources principle, make a search like resource. POST your ids in a body wto this endpoint and it will prepare a list of results for you and redirect you to searchresults/{id}.
See for example: https://gooroo.io/GoorooTHINK/Article/16583/HTTP-Patterns---Bouncer/25829#.W3aBsugzaUk

Standard for RESful service with differing models?

From the perspective of an MVC developer, the Model should only contain Properties relevant to the view. So what is the best practice for dealing with this RESTful service scenario?
Usage 1
The RESTful endpoint "my-application/items/" is used to QUERY for a list of items which are bound to a paged list view of the items. It may contain many properties, such as ItemId, ItemName, CreatedDate, ModifiedDate, etc. It may even be a paged result of data from the server (eg 10 records out of 1000).
Usage 2
In a different area of the application, I need a select box for these items. In this scenario, I just need ItemId and ItemName. All other properties are irrelevant.
Do I...
...swallow my MVC pride and just use a single bloated model and a
single RESTful endpoint?
...create different RESTful endpoints with some sort of naming standard?
...do something else?

To me, this is not a theoretical REST issue. This is an implementation issue. Basically, you are asking whether you should implement a different endpoint for a separate use case or not. I agree with you when you say that the information exchanged with the client should be the minimum for performing a specific task. The only reason not to do so is convenience or budget.
In the use case 1, you should provide a wide set of data, in use case 2 you need far less of them..
Maybe use an endpoint as such .../items/details for the use case 1 and another endpoint .../items for the use case 2.
Maybe you could also implement a single endpoint and use a query parameter as such
.../items?detailed=true
Both solutions are perfectly acceptable.
The theoretical explanation behind this consideration is that REST asks the server to exchange resources' representation with the client. Representation means something similar to the view concept of the MVC model: it's not the entity itself, it's the most convenient way of describing it in a given context.
And this also means that different contexts may require different representations.

REST API Design - Collections as arrays with ETag

Take the following URL template:
~/pupil/{id}/subjects
If the subjects are a collection represented in the traditional way, as if each item stands alone like the pupil items, then is this the correct way to expose them?
It begins to feel wrong when you consider updating the collection in terms of the pupil and concurrently with another API caller.
Suddenly, you cannot synchronize access since there's no ETag to cover the set and you'll end up interleaving the changes and getting in a tangle.
A different design could see the subjects incorporated as a sub array in the entity of the pupil and the /subjects URL is just for read access.
Perhaps the subjects should be returned as a single array set entity with a discreet ETag and that POSTing individual subjects should be disabled and updates made via a POST/PUT of the whole set, but what if the list is very long? Needs paging?
Maybe the design decision is case-by-case, not a sweeping guideline. Not sure.
Thoughts?

It depends on whether you want to treat "subjects" as a single resource or not.
If, as you say, consumers of your API and going to want to add, remove or modify individual subjects then, yes the traditional way of representing them would be correct with regard to REST patterns:
~/pupil/{id}/subjects
would return a list of resources at
~/pupil/{id}/subjects/{subjectId}
Unless there's a strong reason to optimize bulk operations or caching, this is the most RESTful and straightforward implementation.

What to do about huge resources in REST API

I am bolting a REST interface on to an existing application and I'm curious about what the most appropriate solution is to deal with resources that would return an exorbitant amount of data if they were to be retrieved.
The application is an existing timesheet system and one of the resources is a set of a user's "Time Slots".
An example URI for these resources is:
/users/44/timeslots/
I have read a lot of questions that relate to how to provide the filtering for this resource to retrieve a subset and I already have a solution for that.
I want to know how (or if) I should deal with the situation that issuing a GET on the URI above would return megabytes of data from tens or hundreds of thousands of rows and would take a fair amount of server resource to actually respond in the first place.
Is there an HTTP response that is used by convention in these situations?
I found HTTP code 413 which relates to a Request entity that is too large, but not one that would be appropriate for when the Response entity would be too large
Is there an alternative convention for limiting the response or telling the client that this is a silly request?
Should I simply let the server comply with this massive request?
EDIT: To be clear, I have filtering and splitting of the resource implemented and have considered pagination on other large collection resources. I want to respond appropriately to requests which don't make sense (and have obviously been requested by a client constructing a URI).

You are free to design your URIs as you want encoding any concept.
So, depending on your users (humans/machines) you can use that as a split on a conceptual level based on your problem space or domain. As you mentioned you probably have something like this:
/users/44/timeslots/afternoon
/users/44/timeslots/offshift
/users/44/timeslots/hours/1
/users/44/timeslots/hours/1
/users/44/timeslots/UTC1624
Once can also limit by the ideas/concepts as above. You filter more by adding queries /users/44/timeslots?day=weekdays&dow=mon
Making use or concept and filters like this will naturally limit the response size. But you need to try design your API not go get into that situation. If your client misbehaves, give it a 400 Bad Request. If something goes wrong on your server side use a 5XX code.
Make use of one of the tools of REST - hypermedia and links (See also HATEOAS) Link to the next part of your hypermedia, make use of "chunk like concepts" that your domain understands (pages, time-slots). No need to download megabytes which also not good for caching which impacts scalability/speed.

timeslots is a collection resource, why won't you simply enable pagination on that resource
see here: Pagination in a REST web application
calling get on the collection without page information simply returns the first page (with a default page size)
Should I simply let the server comply with this massive request?
I think you shouldn't, but that's up to you to decide, can the server handle big volumes? do you find it a valid usecase?

This may be too weak of an answer but here is how my team has handled it. Large resources like that are Required to have the additional filtering information provided. If the filtering information is not there to keep the size within a specific range then we return an Internal Error (500) with an appropriate message to denote that it was a failure to use the RESTful API properly.
Hope this helps.

You can use a custom Range header - see http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html
Or you can (as others have suggested) split your resource up into smaller resources at different URLs (representing sections, or pages, or otherwise filtered versions of the original resource).

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse