Standard format for REST pagination, field selection, querying? - rest

When designing a REST API, following guidance such as 10 Best Practices for Better RESTful API, there seem to be all sorts of ways to provide a query syntax, pagination, selecting fields to return, etc.
For example, some ways to do pagination:
/orders?max=20&start=100
/orders?per_page=20&page=5
Some ways to provide a query interface:
/orders?q=value>20
/orders?q={'value': 'gt 20'}
Are there any standards for how to design an API that offers these features? If not, standards in development or best practice guidelines would be useful.

When researching this for the Watson Discovery and Assistant APIs, we weren't able to find any widely adopted conventions for filtering or paging, although there are many different conventions.
Some considerations for which convention you use:
Do you need compound clauses in your query? If you want to be able to express a > 10 || b < 10, then you need a string syntax or structured JSON structure to represent the more complicated queries, which will likely be a usability challenge for your users, and so is preferable to avoid if you don't really need the flexibility. In general, the simpler you can keep the requirements, the easier the API will be to learn and use, while potentially at the expense of flexibility. For example, if it turns out that the created date is the only field that users actually care about doing inequality filtering on, you could have explicit begin_date and end_date filter parameters instead of allowing inequality comparisons on all fields.
For pagination, do you have frequently changing data? If so, paging by offset may give you unstable results. For example, paging through logs that are actively being created, sorted by most recent, would cause you to see duplicate items. To avoid this, the server can return a token that represents the next page. This token can either be a lookup value or directly encode the information necessary to identify the values of the next item in the potentially changing list. Microsoft's API guidelines contain examples of both token and offset based paging, and are one of many sets of conventions to follow: https://github.com/Microsoft/api-guidelines/blob/vNext/Guidelines.md#98-pagination

Related

Describing "greater than"-filters search in a REST API URL?

I'm designing a REST API where the /widgets endpoint can be filtered to only show widgets with a certain number of connections. This seems like a natural design:
/widgets?connections=4
I also want to allow filtering for widgets using lesser than and greater than, however. These URL designs seem wrong as they don't follow the classic query string pattern or appear misleading:
/widgets?connections>2
/widgets?connections=>2
What is the normal way of designing this kind of filter? I also need to be able to combine filters, e.g. "more than two connections and exactly one screen".
I've read this related question: REST URL design for greater than, less than operations, but it is not the same as it relates to pagination and ID, and does not contain a neat answer for combined filters.
REST does not give you an exact solution, it just says that your should use standards to build an uniform interface if there are available standards. If not, then it is up to you, anyways it must be documented for the client developers.
Here what you are doing is developing a complete query language for the URI. It would be good to check what exactly you need, because if there is a query language standard, then supporting it completely is just too much work. Afaik. Odata has something you need and there are other conventions, for example RQL is a very old one. With a little search there are other ones too: w x y z. I guess there are many others too. I would choose one of these and implement only what I need from it or look for an existing implementation.

REST API URL design for only single property from performance viewpoint

I am developing REST API and Frontend as a microservice. I know some basic principles of URL design, but there is a performance issue and I'm not sure how to deal with it.
For the convenience of displaying the webpage, I'd like to get certain information about more than 100 resources per page. (Actually, BFF exists as an orchestration layer)
Since the target resource includes the aggregation result from a large amount of database record, it takes about 3 seconds per request. However, the information I want on the webpage is only a part of it, and it doesn't require complex aggregation to get it, and that makes the response time much shorter.
Take a case as an example.
There is a resource of article, and return the resource data in articles/:id containing a complex aggregation. But in this case, all I need is a count of comments, which can be quickly obtained by issuing a SQL count statement without a counter cache.
However, when examining REST API design, I've never seen a case where a GET request that returns only a specific field.
And in microservices, API should only return resource state in loosely coupled situation, so I think it shouldn't be focused on specific fields.
What kind of URL design or performance optimization can be considered in the face of performance problems?
REST is the architectural style of the world wide web; the style and the web were developed in parallel in the 1990s. Given the usage patterns and technical constraints of the time, the attention of the style is focused more on the caching of large grained documents, rather than trying to reduce the latency of transport.
So you would be more likely to design your representation so that count is present somewhere in the representation, and addressable to that you can call attention to it. Thus: fragments.
So if you already had a resource with the identifier /articles, and having the comment count was important enough, then you might treat the representation like a DTO (Data Transfer Object), and simply include the comment count in the representation, accessible via some identifier like /articles#comment-count.
That's not necessarily a great fit for your use case.
An alternative is to just introduce a stand-alone comment count resource.
Any information that can be named can be a resource -- Fielding, 2000
If you are actually doing REST, then the spelling of the URI don't matter (consider - when is the last time you cared where the google search form actually submitted your query?). The identifiers are used as identifiers, general purpose clients don't try to extract semantics from the identifiers.
So using /d8a496c4-51c5-4eeb-8cbd-d5e777cbdee7 as your identifier for the comment-count should "just work".
It's not particularly friendly to the human reader, of course, so you might prefer something else. URI design is, in this sense, a lot like choosing a good variable name -- the machines don't care, so me make choices that are easier for the human beings to manage. That normally means choosing a spelling that is "consistent" with the other spellings in your API.
RFC 3986 introduces distinctions between the path, the query, and the fragment, that you can expect general-purpose components to understand; one of the potentially important distinctions is that reference resolution describes how a general purpose component can compute a new identifier from a base uri and a relative reference.
/articles/comment-count + ./1 -> /articles/1

REST design principles: Referencing related objects vs Nesting objects

My team and I we are refactoring a REST-API and I have come to a question.
For terms of brevity, let us assume that we have an SQL database with 4 tables: Teachers, Students, Courses and Classrooms.
Right now all the relations between the items are represented in the REST-API through referencing the URL of the related item. For example for a course we could have the following
{ "id":"Course1", "teacher": "http://server.com/teacher1", ... }
In addition, if ask a list of courses thought a call GET call to /courses, I get a list of references as shown below:
{
... //pagination details
"items": [
{"href": "http://server1.com/course1"},
{"href": "http://server1.com/course2"}...
]
}
All this is nice and clean but if I want a list of all the courses titles with the teachers' names and I have 2000 courses and 500 teachers I have to do the following:
Approximately 2500 queries just to read the data.
Implement the join between the teachers and courses
Optimize with caching etc, so that I will do it as fast as possible.
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently.
Colleagues say that this is approach is the standard way of implementing a REST-API but then a relatively simple query becomes a big hassle.
My question therefore is:
1. Is it wrong if we we nest the teacher information in the courses.
2. Should the listing of items e.g. GET /courses return a list of references or a list of items?
Edit: After some research I would say the model I have in mind corresponds mainly to the one shown in jsonapi.org. Is this a good approach?
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this is approach is the standard way of implementing a REST-API but then a relatively simple query becomes a big hassle.
Your colleagues have lost the plot.
Here's your heuristic - how would you support this use case on a web site?
You would probably do it by defining a new web page, that produces the report you need. You'd run the query, you the result set to generate a bunch of HTML, and ta-da! The client has the information that they need in a standardized representation.
A REST-API is the same thing, with more emphasis on machine readability. Create a new document, with a schema so that your clients can understand the semantics of the document you return to them, tell the clients how to find the target uri for the document, and voila.
Creating new resources to handle new use cases is the normal approach to REST.
Yes, I totally think you should design something similar to jsonapi.org. As a rule of thumb, I would say "prefer a solution that requires less network calls". It's especially true if amount of network calls will be less by order of magnitude.
Of course it doesn't eliminate the need to limit the request/response size if it becomes unreasonable.
Real life solutions must have a proper balance. Clean API is nice as long as it works.
So in your case I would so something like:
GET /courses?include=teachers
Or
GET /courses?includeTeacher=true
Or
GET /courses?includeTeacher=brief|full
In the last one the response can have only the teacher's id for brief and full teacher details for full.
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this is approach is the standard way of implementing a REST-API but then a relatively simple query becomes a big hassle.
Have you actually measured the overhead generated by each request? If not, how do you know that the overhead will be too intense? From an object-oriented programmers perspective it may sound bad to perform each call on their own, your design, however, lacks one important asset which helped the Web to grew to its current size: caching.
Caching can occur on multiple levels. You can do it on the API level or the client might do something or an intermediary server might do it. Fielding even mad it a constraint of REST! So, if you want to comply to the REST architecture philosophy you should also support caching of responses. Caching helps to reduce the number of requests having to be calculated or even processed by a single server. With the help of stateless communication you might even introduce a multitude of servers that all perform calculations for billions of requests that act as one cohesive system to the client. An intermediary cache may further help to reduce the number of requests that actually reach the server significantly.
A URI as a whole (including any path, matrix or query parameters) is actually a key for a cache. Upon receiving a GET request, i.e., an application checks whether its current cache already contains a stored response for that URI and returns the stored response on behalf of the server directly to the client if the stored data is "fresh enough". If the stored data already exceeded the freshness threshold it will throw away the stored data and route the request to the next hop in line (might be the actual server, might be a further intermediary).
Spotting resources that are ideal for caching might not be easy at times, though the majority of data doesn't change that quickly to completely neglect caching at all. Thus, it should be, at least, of general interest to introduce caching, especially the more traffic your API produces.
While certain media-types such as HAL JSON, jsonapi, ... allow you to embed content gathered from related resources into the response, embedding content has some potential drawbacks such as:
Utilization of the cache might be low due to mixing data that changes quickly with data that is more static
Server might calculate data the client wont need
One server calculates the whole response
If related resources are only linked to instead of directly embedded, a client for sure has to fire off a further request to obtain that data, though it actually is more likely to get (partly) served by a cache which, as mentioned a couple times now throughout the post, reduces the workload on the server. Besides that, a positive side effect could be that you gain more insights into what the clients are actually interested in (if an intermediary cache is run by you i.e.).
Is it wrong if we we nest the teacher information in the courses.
It is not wrong, but it might not be ideal as explained above
Should the listing of items e.g. GET /courses return a list of references or a list of items?
It depends. There is no right or wrong.
As REST is just a generalization of the interaction model used in the Web, basically the same concepts apply to REST as well. Depending on the size of the "item" it might be beneficial to return a short summary of the items content and add a link to the item. Similar things are done in the Web as well. For a list of students enrolled in a course this might be the name and its matriculation number and the link further details of that student could be asked for accompanied by a link-relation name that give the actual link some semantical context which a client can use to decide whether invoking such URI makes sense or not.
Such link-relation names are either standardized by IANA, common approaches such as Dublin Core or schema.org or custom extensions as defined in RFC 8288 (Web Linking). For the above mentioned list of students enrolled in a course you could i.e. make use of the about relation name to hint a client that further information on the current item can be found by following the link. If you want to enable pagination the usage of first, next, prev and last can and probably should be used as well and so forth.
This is actually what HATEOAS is all about. Linking data together and giving them meaningful relation names to span a kind of semantic net between resources. By simply embedding things into a response such semantic graphs might be harder to build and maintain.
In the end it basically boils down to implementation choice whether you want to embed or reference resources. I hope, I could shed some light on the usefulness of caching and the benefits it could yield, especially on large-scale systems, as well as on the benefit of providing link-relation names for URIs, that enhance the semantical context of relations used within your API.

Conducting searches with REST that return large datasets?

I'm creating a RESTful WebAPI for our system in .Net, when conducting a search in my client I presume that it should hit the /person route passing parameters when required to filter the data. However, the person object that is return has quite a lot of nested objects which could slow down data retrieval. Should I have a separate controller which returns a more skeletonised view of a person, should I continue the way I am going, or should I be making subsequent requests to break down the person?
Actually, there is no silver-bullet way to solve your problem, but there are several approaches, which could be usefull for you. However, in my opinion, your idea about optimizing the size of resource representation in search results is correct.
You can include the list of requested fields in filtering query. (for example, see the similar signature/approach in ES search API). Many search engines are following this approach to reduce redundant response payload.
As you have metioned, you can break your heavy object in sub-resources, so that you would be able to include only links to nested objects inside the person, without including the whole represantations of inner-objects. The HATEOAS approach will fit perfectly for this purpose, but it will add extra complexity to your application (but the extra flexibility too).
However, you have to choose, which approach is better for your particular application, but I think, that a good starting point will be the approach with list of requested fields.

REST best practice for getting a subset list

I read the article at REST - complex applications and it answers some of my questions, but not all.
I am designing my first REST application and need to return "subset" lists to GET requests. Which of the following is more "RESTful"?
/patients;listType=appointments;date=2010-02-22;user_id=1234
or
/patients/appointments-list;date=2010-02-22;user_id=1234
or even
/appointments/2010-02-22/patients;user_id=1234
There will be about a dozen different lists that I need to return. In some of these, there will be several filtering parameters and I don't want to have big 'if' statements in my server code to select the subsets based on which parameters are present. For example, I might need all patients for a specific doctor where the covering doctor is another and the primary doctor is yet another. I could select with
/patients;rounds=true;specific_id=xxxx;covering_id=yyyy;primary_id=zzzz
but that would require complicated branching logic to get the right list, where asking for a specific subset (rounds-list) will achieve that same thing.
Note that I need to use matrix parameters instead of query parameters because I need to do filtering at several levels of the URL. The framework I am using (RestEasy), fully supports matrix parameters.
Ralph,
the particular URI patterns are orthogonal to the question how RESTful your application will be.
What matters with regard to RESTfulness is that the client discovers how to construct the URIs at runtime. This can be achieved either with forms or URI templates. Both hypermedia controls tell the client what parameters can be used and where to put them in the URI.
For this to work RESTfully, client and server must know the possible parameters at design time. This is usually achieved by making them part of the specification of the link relationship.
You might for example define a 'my-subset' link relation to have the meaning of linking to subsets of collections and with it you would define the following parameters:
listType, date, userID.
In a link template that spec could be used as
<link rel="my-subset' template="/{listType}/{date}/patients;user_id={userID}"/>
Note how the actual parameter name in the URI is decoupled from the specified parameter name. The value for userID is late-bound to the URI parameter user_id.
This makes it possible for the URI parameter name to change without affecting the client.
You can look at OpenSearch description documents (http://www.opensearch.org) to see how this is done in practice.
Actually, you should be able to leverage OpenSearch quite a bit for your use case. Especially the ability to predefine queries would allow you to describe particular subsets in your 'forms'.
But see for yourself and then ask back again :-)
Jan
I would recommend that you use this URL structure:
/appointments;user_id=1234;date=2010-02-22
Why? I chose /appointments because it is simple and clear. (If you have more than one kind of appointment, let me know in the comments and I can adjust my answer.) I chose the semicolons because they don't imply hierarchy between user_id and date.
One more thing, there is no reason why you should limit yourself to just one URL. It is just fine to have multiple URL structures that refer to the same resource. So you might also use:
/users/1234/appointments;date=2010-02-22
To return a similar result.
That said, I would not recommend using /dates/2010-02-22/appointments;user_id=1234. Why? I don't think, in practice, that /dates refers to a resource. Date is an attribute of an appointment but is not a noun on its own (i.e. it is not a first-class kind of thing).
I can relate to what David James answered.
The format of your URIs can be like he suggested:
/appointments;user_id=1234;date=2010-02-22
and / or
/users/1234/appointments;date=2010-02-22
while still maintaining the discoverability (at runtime) of your resource's URIs (like Jan Algermissen suggested).