Paginated REST API: how to select the amount of data returned?

I know that with most of today's REST APIs, responses to web calls have to be paginated.
But I can't find any insight on the web about how to select the ideal size of a batch returned by an API call: should it be 10, 100, 1000? In short: what factors should the size of an API response be based on?
Some people state that it should be based on the number of elements displayed by the UI. I don't agree with this, as not all APIs are directly linked with a UI, and, in any case, modern REST APIs let the client choose the number of items in the output batch with a configurable parameter, up to a certain amount.
So, how could we define the value for this "maximum number of elements returned by an HTTP request"?
Should it be based on payload size? On the internal architecture of the API? On performance measurements?
Any insight on this? I'm not really looking for an explicit figure, but rather for techniques that could help find the answer. Ideally I'd like to know the process followed by some successful APIs, but I cannot find any.

My general approach to this is to:
Try to avoid paging.
Make REST clients resilient against paging changes by using next and previous links instead of a ?page= parameter. A REST client then knows another page is available strictly by the appearance of the link (sketched below).
I don't use hard rules or measurements to figure out when paging is needed. Paging is generally painful, so my first approach is to figure out what requirement drives the need for paging, and whether that requirement can be removed.
Once I've determined it's not possible to remove the requirement another way, I set the cut-off of a page as large as is reasonable, to reduce the likelihood that clients need to make additional requests.
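As a minimal sketch of the link-following client this implies (the `next` field, the `/items` path and the JSON shape are assumptions for illustration, not a standard):

```python
import requests

def fetch_all(start_url):
    """Walk a link-paged collection by following the server's 'next' links."""
    url = start_url  # e.g. "https://api.example.com/items" (hypothetical)
    while url:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        page = response.json()
        yield from page["items"]   # assumed field holding this page's records
        url = page.get("next")     # absent on the last page, so the loop ends

# Usage: list(fetch_all("https://api.example.com/items"))
```

The client never computes page numbers, so the server can change its paging scheme without breaking anyone.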

If it's a closed API (used only by clients you control), pick whatever the UI wants. It's trivial to change. If clients can choose from several options, you can include a pageSize parameter. Or, better:
If it's an open API (used by clients you don't control), then let clients control what size paging they want. Rather than support a pageNumber parameter, support offset and limit. Offset is how many records to skip before starting to return records, and limit is the maximum number of records to return. If the client is not happy with how their request performs, they can adjust the parameters to suit their needs. Any upper limit your API has should be driven by what your service can handle. It's neither possible nor desirable for you to try to figure out the Magic Maximum Page Size that makes all clients happy and no clients sad.
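A minimal sketch of such an endpoint, assuming Flask and an in-memory stand-in for the data store; the cap of 500 is an invented number standing in for whatever your service can actually handle:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
MAX_LIMIT = 500                                # assumed server-side ceiling
RECORDS = [{"id": i} for i in range(10_000)]   # stand-in data store

@app.route("/records")
def list_records():
    # Clamp client-supplied values rather than failing the request.
    offset = max(int(request.args.get("offset", 0)), 0)
    limit = min(max(int(request.args.get("limit", 100)), 1), MAX_LIMIT)
    return jsonify({"offset": offset,
                    "limit": limit,
                    "records": RECORDS[offset:offset + limit]})
```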
Also, please note that none of this has anything to do with REST, which is silent when it comes to paging.

I usually make a rough performance measurement by hand. I want as few requests as necessary, but do not want to risk timeouts.
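For what it's worth, that rough measurement can be as simple as timing a few candidate sizes against the real endpoint; the URL and parameter name below are placeholders:

```python
import time
import requests

for size in (10, 100, 1000, 5000):
    start = time.monotonic()
    r = requests.get("https://api.example.com/items",   # placeholder endpoint
                     params={"limit": size}, timeout=30)
    elapsed = time.monotonic() - start
    print(f"limit={size}: {elapsed:.2f}s for {len(r.content)} bytes")
```

Then pick the largest size whose worst-case time still leaves comfortable headroom under your timeout.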

Related

GitHub API: is it possible to bring down the number of requests?

Playing around with the GitHub API, I see that there is a limit of 5,000 requests an hour. Is there a way to increase this? (Not the main question.)
When I hit the /users/:username/events endpoint, I get back an array of events. The ones that are PushEvent have an array called commits. Each commit has its own endpoint that I can hit to pull more data.
This racks up requests super quickly and I was wondering if there was a better way to do this.
Thanks!
This limit is in place to prevent a single user from using too many resources, so it's not likely that it will be raised. If you want to make one request for multiple events, you can do that with the v4 GraphQL API. Note that the API limits are different for the v4 API and are scored based on points, but it's possible that you may be able to structure your query so as to be more efficient and allow you more data to be fetched.
This answer explains how you can write a GraphQL query to inquire about multiple objects with one request.
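To give a flavour of what such a consolidated query can look like when sent from code (the field names follow GitHub's published v4 schema, but treat this as a sketch and check the current docs; the repository and token are placeholders):

```python
import requests

# One GraphQL request replaces a chain of REST calls: the latest commits on
# a repository's default branch, fetched in a single round trip.
query = """
query {
  repository(owner: "octocat", name: "hello-world") {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 20) {
            nodes { oid messageHeadline committedDate }
          }
        }
      }
    }
  }
}
"""

response = requests.post(
    "https://api.github.com/graphql",
    json={"query": query},
    headers={"Authorization": "Bearer YOUR_TOKEN"},  # the v4 API requires a token
    timeout=30,
)
print(response.json())
```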
There may be alternative ways to get the information you want. For example, if you want to know about push events as they happen instead of after the fact, you may be able to set up a webhook, which isn't rate limited.

Best practices for pulling data from a paginated REST API: page size vs. number of calls?

I'm working on a project where I pull data from the GitHub/Jira REST APIs, where results are returned in pages. I'm new to the concept of pagination, but I understand that it's better for server-side performance when the returned data set is large.
My question is: what's a reasonable page size if I want to make as few API calls as possible? And what's better for both the client side and the server side, to make more API calls or to pull more data per call?
I would suggest the following approach:
in case of a UI application, you can request just enough data to fill the UI screen;
in case you want to make as few API calls as possible, I can suggest the following: the three biggest concerns when requesting a big amount of data are response time, throughput and memory usage.
So if you have a requirement on supported requests per second, and if you measure the required memory per request, then you can calculate the maximum amount of data per request. After that, reduce the value until the response time and throughput are acceptable.
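A back-of-envelope version of that calculation might look like this; every figure below is invented for illustration:

```python
# Assumed figures, purely illustrative -- substitute your own measurements.
response_memory_budget = 512 * 1024 * 1024   # bytes the service can spend on responses
peak_concurrent_requests = 200               # from the requests-per-second requirement
bytes_per_item = 2 * 1024                    # measured serialized size of one record

memory_per_request = response_memory_budget / peak_concurrent_requests
max_items_per_page = int(memory_per_request / bytes_per_item)
print(max_items_per_page)   # 1310 -- then shrink until latency/throughput are acceptable
```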

Delete multiple records using REST

What is the REST-ful way of deleting multiple items?
My use case is that I have a Backbone Collection wherein I need to be able to delete multiple items at once. The options seem to be:
Send a DELETE request for every single record (which seems like a bad idea if there are potentially dozens of items);
Send a DELETE where the IDs to delete are strung together in the URL (e.g., "/records/1;2;3");
In a non-REST way, send a custom JSON object containing the IDs marked for deletion.
All options are less than ideal.
This seems like a gray area of the REST convention.
Option 1 (a DELETE request for every single record) is a viable RESTful choice, but obviously has the limitations you have described.
Don't do option 2 (IDs strung together in the URL). Intermediaries would construe it as meaning “DELETE the (single) resource at /records/1;2;3”, so a 2xx response may cause them to purge their cache of /records/1;2;3 (but not of /records/1, /records/2 or /records/3), to proxy a 410 response for /records/1;2;3, or to do other things that don't make sense from your point of view.
Option 3 (a custom JSON object) is best, and can be done RESTfully. If you are creating an API and you want to allow mass changes to resources, you can use REST to do it, but exactly how is not immediately obvious to many. One method is to create a ‘change request’ resource (e.g. by POSTing a body such as records=[1,2,3] to /delete-requests) and poll the created resource (specified by the Location header of the response) to find out whether your request has been accepted, rejected, is in progress or has completed; this is useful for long-running operations (see the sketch below). Another way is to send a PATCH request to the list resource, /records, the body of which contains a list of resources and actions to perform on those resources (in whatever format you want to support); this is useful for quick operations, where the response code of the request can indicate the outcome of the operation.
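A sketch of the first method from the client's side; the /delete-requests path, the field names and the polling interval are all made up for illustration:

```python
import time
import requests

BASE = "https://api.example.com"   # placeholder

# Make the deletion itself a resource...
created = requests.post(f"{BASE}/delete-requests",
                        json={"records": [1, 2, 3]}, timeout=10)
status_url = created.headers["Location"]   # the server says where to poll

# ...then poll that resource until it reaches a terminal state.
while True:
    state = requests.get(status_url, timeout=10).json()["state"]
    if state in ("completed", "rejected"):
        break
    time.sleep(1)   # arbitrary polling interval
print(f"delete request finished as: {state}")
```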
Everything can be achieved whilst keeping within the constraints of REST, and usually the answer is to make the "problem" into a resource, and give it a URL.
So, batch operations, such as delete here, or POSTing multiple items to a list, or making the same edit to a swathe of resources, can all be handled by creating a "batch operations" list and POSTing your new operation to it.
Don't forget, REST isn't the only way to solve any problem. “REST” is just an architectural style and you don't have to adhere to it (but you lose certain benefits of the internet if you don't). I suggest you look down this list of HTTP API architectures and pick the one that suits you. Just make yourself aware of what you lose out on if you choose another architecture, and make an informed decision based on your use case.
There are some bad answers to this question on Patterns for handling batch operations in REST web services? which have far too many upvotes, but ought to be read too.
If GET /records?filteringCriteria returns an array of all records matching the criteria, then DELETE /records?filteringCriteria could delete all such records.
In this case the answer to your question would be DELETE /records?id=1&id=2&id=3.
I think Mozilla Storage Service SyncStorage API v1.5 is a good way to delete multiple records using REST.
Deletes an entire collection.
DELETE https://<endpoint-url>/storage/<collection>
Deletes multiple BSOs from a collection with a single request.
DELETE https://<endpoint-url>/storage/<collection>?ids=<ids>
ids: deletes the BSOs from the collection whose ids are in the provided comma-separated list. A maximum of 100 ids may be provided.
Deletes the BSO at the given location.
DELETE https://<endpoint-url>/storage/<collection>/<id>
http://moz-services-docs.readthedocs.io/en/latest/storage/apis-1.5.html#api-instructions
This seems like a gray area of the REST convention.
Yes; so far I have only come across one REST API design guide that mentions batch operations (such as a batch delete): the Google API design guide.
This guide mentions the creation of "custom" methods that can be associated with a resource by using a colon, e.g. https://service.name/v1/some/resource/name:customVerb; it also explicitly mentions batch operations as a use case:
A custom method can be associated with a resource, a collection, or a service. It may take an arbitrary request and return an arbitrary response, and also supports streaming request and response. [...] Custom methods should use HTTP POST verb since it has the most flexible semantics [...] For performance critical methods, it may be useful to provide custom batch methods to reduce per-request overhead.
So you could do the following according to google's api guide:
POST /api/path/to/your/collection:batchDelete
...to delete a bunch of items of your collection resource.
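Called from code, that might look as follows; the body shape is an assumption, since the guide leaves the request format to the API designer:

```python
import requests

# Hypothetical custom batch method in the style of Google's design guide.
response = requests.post(
    "https://service.name/v1/some/resource/name:batchDelete",
    json={"ids": [1, 2, 3]},   # assumed body shape, not prescribed by the guide
    timeout=10,
)
response.raise_for_status()
```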
I've allowed for a wholesale replacement of a collection, e.g. PUT ~/people/123/shoes where the body is the entire collection representation.
This works for small child collections where the client wants to review the items, prune out some, add some others, and then update the server. They could PUT an empty collection to delete everything.
This would mean GET ~/people/123/shoes/9 would still remain in cache even though a PUT deleted it, but that's just a caching issue, and it would be a problem anyway if someone else deleted the shoe.
My data/systems APIs always use ETags as opposed to expiry times, so the server is hit on each request, and I require correct version/concurrency headers to mutate the data. For APIs that are read-only and view/report-aligned, I do use expiry times to reduce hits on the origin; e.g. a leaderboard can be good for 10 minutes.
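A sketch of that ETag-gated collection replacement, reusing the hypothetical shoes collection from above:

```python
import requests

BASE = "https://api.example.com"   # placeholder

# Read the collection and remember the version the server handed us.
resp = requests.get(f"{BASE}/people/123/shoes", timeout=10)
etag = resp.headers["ETag"]
shoes = [s for s in resp.json() if s["id"] != 9]   # prune one item client-side

# Replace the collection, but only if nobody changed it in the meantime.
update = requests.put(f"{BASE}/people/123/shoes",
                      json=shoes,
                      headers={"If-Match": etag}, timeout=10)
if update.status_code == 412:   # Precondition Failed: we lost the race
    print("Collection changed underneath us; re-fetch and retry.")
```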
For much larger collections, like ~/people, I tend not to need multiple delete, the use-case tends not to naturally arise and so single DELETE works fine.
In future, and from experience with building REST APIs and hitting the same issues and requirements, such as auditing, I'd be inclined to use only the GET and POST verbs and design around events, e.g. POST a change-of-address event, though I suspect that'll come with its own set of problems :)
I'd also allow front-end devs to build their own APIs that consume stricter back-end APIs, since there are often practical, valid client-side reasons why they don't like strict "Fielding zealot" REST API designs, and for productivity and cache-layering reasons.
You can POST a deleted resource :). The URL will be
POST /deleted-records
and the body will be
{"ids": [1, 2, 3]}

Mobile: one single request or multiple smaller requests?

On an iPhone app (or mobile in general) that constantly needs to send requests to a web service, is it better to make one single request that fetches a large amount of data, or multiple (possibly simultaneous) requests that each fetch a smaller amount of data?
Example
I want to load a list of elements in a node. I have the node's ID. The two ways I can fetch the elements are the following:
send a single request with the node ID and get all the information about the first n elements of the node in a single response;
send a first request with the node ID to get the IDs of the first n elements of the node, then for each one send another request, giving one response per element.
I'm torn between the two approaches:
the heavyweight single response may cause more lag and timeouts because of the very unstable and slow mobile internet connection;
the phone may have trouble handling too many responses at the same time.
What's your opinion?
Since there is overhead for every request, one large request is generally faster than several small ones of the same size. This applies to high speed networks too, but in mobile networks the ratio between transfer speed and latency is even bigger.
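A toy calculation makes the point; the latency and bandwidth figures are invented but mobile-ish:

```python
latency_s = 0.300            # assumed round-trip time on a mobile network
bandwidth_bps = 1_000_000    # assumed ~1 Mbit/s of usable throughput
item_bits = 2_000 * 8        # assumed 2 KB per element
n = 50                       # elements to fetch

one_big_request = latency_s + n * item_bits / bandwidth_bps
n_small_requests = n * (latency_s + item_bits / bandwidth_bps)   # sequential worst case
print(f"single: {one_big_request:.1f}s, per-element: {n_small_requests:.1f}s")
# -> single: 1.1s, per-element: 15.8s; parallel requests narrow but don't erase the gap
```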
I don't think the phone will have any problem handling the responses, so the multiple-requests approach seems workable. However, depending on the size of your requests/responses, it may actually be faster to do it in a single request, in order to reduce the delay associated with multiple round trips. The single-request approach will also transfer slightly less data than the multiple-request one.
Every call has its overhead (i.e. network load), and the number of connections might also be limited.
You might or might not be able to update your user interface during download, depending on how often your callbacks are called - you may be able to process partial data as it arrives.
If your data is easy to compress (typically, text data), then using a single call might reduce your total network usage footprint even more.
If the chunks of data are large, I'd go with several individual ones. This will also make things easier in case of network errors. Bottom line for me is to just get the right balance - make the packets reasonable sized and don't flood the server.
This depends on the situation. If you don't want your user to wait every time throughout the app, you can use a single request to load all the data at once.
If you don't mind letting the user wait, you can use multiple requests on demand. For example, if you just want to show titles in a table view and a detail view when the user taps a title, you can first fetch only the titles, and then, when the user taps, fetch the details for that title by its ID. That is a pretty good way to request data on demand only.
Sometimes the situation merits using a single request only for a certain category. Say you have a Twitter app where the tweets are separated into categories. Someone who has the app but only cares about sports may only ever look at the sports section, which could be a single AJAX call. Another user may only be interested in two categories out of 15. This means the user doesn't have to load unnecessary data. The important thing you need to determine is this:
Does all of the data need to be loaded at once for the app to work correctly, and are your users generally going to want all that data in the first place?

REST API pagination: make page size a parameter (configurable from outside)

We have a search/list resource:
http://xxxx/users/?page=1
Internally the page size is static: each call returns 20 items. The user can move forward by increasing the page number. But to be more flexible, we are now thinking about also exposing the size of a page:
http://xxxx/users/?page=1&size=20
This is flexible, as the client can now trade off the number of network calls against the size of each response when searching. Of course it has the drawback that the server could be hit hard, either by accident or maliciously on purpose:
http://xxxx/users/?page=1&size=1000000
For robustness, the solution could be to configure an upper limit on the page size (e.g. 100) and, when it is exceeded, either return an error response or an HTTP redirect to the URL with the highest possible page-size parameter.
What do you think?
Personally, I would simply document a maximum page size, and anything larger than that is simply treated as the maximum.
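A minimal sketch of that clamp-to-maximum behaviour; the default of 20 and the maximum of 100 are taken from the question's figures:

```python
MAX_PAGE_SIZE = 100      # the documented maximum
DEFAULT_PAGE_SIZE = 20   # the current static page size

def effective_page_size(requested):
    """Treat anything larger than the documented maximum as the maximum."""
    try:
        size = int(requested)
    except (TypeError, ValueError):
        return DEFAULT_PAGE_SIZE
    return min(max(size, 1), MAX_PAGE_SIZE)

# effective_page_size("1000000") -> 100; effective_page_size(None) -> 20
```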
Managing access to resources, a.k.a. protecting outside interfaces, is always a good idea: in other words, put a sensible limit on it.
The redirect might be a good idea at development time, i.e. while the user of the API is getting acquainted with the service, but outside of that situation I doubt there is much value.
Make sure the parameters are well documented either way.
Have you tested this to see whether it's even a concern? If a user asks for a page size of a million, does that really cause all your other requests to stop or slow down? If so, I might look at your underlying architecture first. But if, in the end, this is an issue, I don't think setting a hard limit on page size is bad.
Question: when a user GETs the URI http://xxx/user?page=1, does the response have a link in it to the next page? To the previous page? If not, then it's not really RESTful.