API provides pagination functionality - client wants to query beyond bounds of API - rest

For a web project, I am consuming an API that returns Educational Materials (books, videos, etc) -- in simplicity you can request:
API accepted parameters :
type: accepts 1 or many: [book, video, software]
subject matter: accepts 1 or many: [science, math, english, history]
per page: accepts an integer, defaults to 2, 0 returns ALL results
page: accepts an integer, defaults to 1
Important: This is a contrived example of a real use case, so it's not just 1 or 2 requests I'd have to cache, it's almost an infinite amount of combinations.
and it returns objects that look like:
{
"total-results": 15,
"page": 1,
"per-page": 2,
"data": [
{
"title": "Foobar",
"type": "book",
"subject-matter": [
"history",
"science"
],
"age": 10
},
{
"title": "Barfoo",
"type": "video",
"subject-matter": [
"history"
],
"age": 14
}
]
}
The client wants to be able to allow users to filter by age on my site -- so I have to essentially query everything and re-run my pagination.
I'd like to suggest to the API team (which we control) to allow me to query by age as well, but trying to explain this concept to the business is proving fruitless.
Right now all that I can think to solve this are 2 options: (1) convince the API team to allow me to query by age or (2) to cache the life out of my requests and use "0" by default and handle pagination on my end.
Again, Important: This is a contrived example of a real use case, so it's not just 1 or 2 requests I'd have to cache, it's almost an infinite amount of combinations.
Anyone have experience dealing with something similar to this?
Edit: Eric Stein asked a very sound question, here is the Q & A:
His Q: "Your API team does not know how to filter by age?"
My A: "They may it's a HUGE organization and I may get stonewalled because of bureaucracy and want to prepare for the worst."

I worked in a project that we consumed an API and had to make more filters that the API allowed (the API wasn't ours). In that case what we decided was to create a cron script that consumed the API and registered the returned data in an database of our own. We had a lot of problems maintaining that (it was A LOT of data), but kinda worked for us (at least for the time I was working in the project).
I think if it's important to your application (and for your client) that you can age filter, that's a pretty good argument to convince the API team to allow that.

Related

REST API sub resources, data to return?

If we have customers and orders, I'm looking for the correct RESTful way to get this data:
{
"customer": {
"id": 123,
"name": "Jim Bloggs"
"orders": [
{
"id": 123,
"item": "Union Jack Keyring",
"qty": 1
}, {
"id": 987,
"item": "London Eye Ticket",
"qty": 5
}
]
}
}
GET /customers/123/orders
GET /customers/123?inc-orders=1
Am I correct that the last part/folder of the URL, excluding query string params, should be the resource returned..?
If so, number 1 should only return order data and not include the customer data. While number 2 is pointing directly at customer 123 and uses query string params to effect/filter the customer data returned, in this case including the order data.
Which of these two calls is the correct RESTful call for the above JSON..? ...or is there a secret number 3 option..?
You have 3 options which I think could be considered RESTful.
1)
GET /customers/12
But always include the orders. Do you have a situation in which the client would not want to use the orders? Or can the orders array get really big? If so you might want another option.
2)
GET /customers/123, which could include a link to their orders like so:
{
"customer": {
"id": 123,
"name": "Jim Bloggs"
"orders": {
"href": "<link to you orders go here>"
}
}
}
With this your client would have to make 2 requests to get a customer and their orders. Good thing about this way though is that you can easily implement clean paging and filtering on orders.
3)
GET /customers/123?fields=orders
This is similar to your second approach. This will allow clients to use your API more efficiently, but I wouldn't go this route unless you really need to limit the fields that are coming back from your server. Otherwise it will add unnecessary complexity to your API which you will have to maintain.
The Resource (identified by the complete URL) is the same, a customer. Only the Representation is different, with or without embedded orders.
Use Content Negotiation to get different Representations for the same Resource.
Request
GET GET /customers/123/
Accept: application/vnd.acme.customer.short+json
Response
200 OK
Content-Type: application/vnd.acm.customer.short+json
{
"customer": {
"id": 123,
"name": "Jim Bloggs"
}
}
Request
GET GET /customers/123/
Accept: application/vnd.acme.customer.full+json
Response
200 OK
Content-Type: application/vnd.acme.customer.full+json
{
"customer": {
"id": 123,
"name": "Jim Bloggs"
"orders": [
{
"id": 123,
"item": "Union Jack Keyring",
"qty": 1
}, {
"id": 987,
"item": "London Eye Ticket",
"qty": 5
}
]
}
}
The JSON that you posted looks like what would be the result of
GET /customers/123
provided the Customer resource contains a collection of Orders as a property; alternatively you could either embed them, or provide a link to them.
The latter would result in something like this:
GET /customers/123/orders
which would return something like
{
"orders": [
{
"id": 123,
"item": "Union Jack Keyring",
"qty": 1
},
{
"id": 987,
"item": "London Eye Ticket",
"qty": 5
}
]
}
I'm looking for the correct RESTful way to get this data
Simply perform a HTTP GET request on a URI that points to a resource that produces this data!
TL;DR
REST does not care about URI design - but on its constraints!
Clients perform state transitions through possible actions returned by the server through dynamically identified hyperlinks contained within the response.
Clients and servers can negotiate on a preferred hypermedia type
Instead of embedding the whole (sub-)resource consider only returning the link to that resource so a client can look it up if interested
First, REST does not really care about the URI design as long as the URI is unique. Sure, a simple URI design is easier to understand for humans, though if compared to HTML the actual link can be hidden behind a more meaninful text and is thus also not that important for humans also as long as they are able to find the link and can perform an action against it. Next, why do you think your "response" or API is RESTful? To call an API RESTful, the API should respect a couple of constraints. Among these constraints is one quite buzzword-famous: hypertext as the engine of application state (HATEOAS).
REST is a generalized concept of the Web we use every day. A quite common task for a web-session is that a client requests something where the server sends a HTML document with plenty of links and other resources the client can use to request further pages or stream a video (or what ever). A user operationg on a client can use the returned information to proceed further, request new pages, send information to the server etc, etc. The same holds true for RESTful applications. This is was REST simply defines as HATEOAS. If you now have a look at your "response" and double check with the HATEOAS constraint you might see that your response does not contain any links to start with. A client therefore needs domain knowledge to proceed further.
JSON itself isn't the best hypermedia type IMO as it only defines the overall syntax of the data but does not carry any semantics, similar to plain XML which though may have some DTD or schemas a client may use to validate the document and check if further semantic rules are available elsewhere. There are a couple of hypermedia types that build up on JSON that are probably better suited like f.e. application/hal+json (A good comparison of JSON based hypermedia types can be found in this blog post). You are of course entitled to define your own hypermedia type, though certain clients may not be able to understand it out of the box.
If you take f.e. a look at HAL you see that it defines an _embedded element where you can put in certain sub-resources. This seems to be ideal in your case. Depending on your design, orders could also be a resource on its own and thus be reachable via GET /orders/{orderId} itself. Instead of embedding the whole sub-resource, you can also just include the link to that (sub)resource so a client can look up the data if interested.
If there are cases where you want to return only customer data and other cases where you want also to include oder data you can f.e. define different hypermedia types (based on HAL f.e.) for both, one returning just the customer data while the other also includes the oder data. These types could be named like this: application/vnd.yourcompanyname.version.customers.hal+json or application/vnd.yourcompanyname.version.customer_orders.hal+json. While this is for sure an development overhead compared to adding a simple query-parameter to the request, the semantics are more clear and the documentation overhead is on the hypermedia type (or representation) rather then the HTTP operation.
You can of course also define some kind of view structure where one view only returns the customer data as is while a different view returns the customer data including the orders similar to a response I gave on a not so unrelated topic.

Mongodb real basic use case

I'm approaching the noSQL world.
I studied a little bit around the web (not the best way to study!) and I read the Mongodb documentation.
Around the web I wasn't able to find a real case example (only fancy flights on big architectures not well explained or too basic to be real world examples).
So I have still some huge holes in my understanding of a noSQL and Mongodb.
I try to summarise one of them, the worst one actually, here below:
Let's imagine the data structure for a post of a simple blog structure:
{
"_id": ObjectId(),
"title": "Title here",
"body": "text of the post here",
"date": ISODate("2010-09-24"),
"author": "author_of_the_post_name",
"comments": [
{
"author": "comment_author_name",
"text": "comment text",
"date": ISODate("date")
},
{
"author": "comment_author_name2",
"text": "comment text",
"date": ISODate("date")
},
...
]
}
So far so good.
All works fine if the author_of_the_post does not change his name (not considering profile picture and description).
The same for all comment_authors.
So if I want to consider this situation I have to use relationships:
"authorID": <author_of_the_post_id>,
for post's author and
"authorID": <comment_author_id>,
for comments authors.
But MongoDB does not allow joins when querying. So there will be a different query for each authorID.
So what happens if I have 100 comments on my blog post?
1 query for the post
1 query to retrieve authors informations
100 queries to retrieve comments' authors informations
**total of 102 queries!!!**
Am I right?
Where is the advantage of using a noSQL here?
In my understanding 102 queries VS 1 bigger query using joins.
Or am I missing something and there is a different way to model this situation?
Thanks for your contribution!
Have you seen this?
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
It sounds like what you are doing is NOT a good use case for NoSQL. Use relational database for basic data storage to back applications, use NoSQL for caching and the like.
NoSQL databases are used for storage of non-sensitive data for instance posts, comments..
You are able to retrieve all data with one query. Example: Don't care about outdated fields as author_name, profile_picture_url or whatever because it's just a post and in the future this post will not be visible as newer ones. But if you want to have updated fields you have two options:
First option is to use some kind of worker service. If some user change his username or profile picture you will give some kind of signal to your service to traverse all posts and comments and update all fields his new username.
Second option use authorId instead of author name, and instead of 2 query you will make N+2 queries to query for comment_author_profile. But use pagination, instead of querying for 100 comments take 10 and show "load more" button/link, so you will make 12 queries.
Hope this helps.

RESTful way to ask for subset of a resource

Suppose I have a resource called user_stats, that has things like how many posts, comments, likes, and followers users have. Is there a RESTful way to only ask for parts of that statistics (i.e. for user_stats/3, tell me how many posts and comments this user has, but don't tell me how many follower this user has.)
The reason I'm asking is some statistical attributes can be computationally intensive (yes I'm generating them at query time). So simply not asking for it can reduce workload.
There is a very useful 38 page free ebook with best practices about designing Web APIs, you might find it helpful, at least I did.
For your case, it is stated:
Add optional fields in a comma-delimited list
The Google approach works extremely well.
Here's how to get just the information we need from our dogs API using
this approach:
/dogs?fields=name,color,location
It's simple to read; a developer can select just the information an
app needs at a given time; it cuts down on bandwidth issues, which is
important for mobile apps. The partial selection syntax can also be used to include associated resources cutting down on the number of requests needed to get the required information.
Maybe that's what you re looking for?
there are at least three options:
Use query parameter as a filter
e.g. user_stats?fields=posts,comments
Make user_stats composite resource and create new resources for particular stat
e.g. /user_stats in JSON
{
"blogs" : {
"count" : 10,
"link" : "/user_stats_blobs"
},
...
}
then you can get whole stats (GET /user_stats) or just a piece (GET /user_stats_blobs)
Create filter representation; use POST to post filter representation as part of request
e.g.
Request
POST /user_stats/filter
{
"fields" : [ "blogs", ...]
}
response body contains just requested/filtered data.
All solutions are RESTful. Solution 1. is easy to implement but has limited extensibility and transparency. Solution 2. expects that you create new resources which is overhead in this case (you need just one number). so, I would recommend solution 3. because is no so hard to implement, is easily extensible and transparent.

Backbone.js & REST API resources relationship & interraction

I have a small REST API that is being consumed by a single page web application powered by Backbone.js
There are two resource types that the API provides, and therefore, the Backbone app uses. These are articles and comments. These two resources have different endpoints and there is a link from each of the articles to the location of all the comments for that item.
The problem that I'm facing is that, on the article list in my web app I would like to be able to display the number of comments for each article. Given that that would only be possible if I also get the comments list, on the current setup, would require me to make one API request to get the the initial article list and another one for each of the articles to be able to count the number of comments. That becomes a problem if, for instance, there are 100 articles, and therefore 101 HTTP requests would be necessary to populate one single view.
The solutions I can think of right now are:
1. to include the comments data in the initial articles request like so
{
{
"id": 1,
"name": "Article 1",
...
"comments": {
{
"id": 1,
"text": "some comment"
},
{
"id": 2,
"text": "some comment"
},
...
}
},
}
The question in this case is: How is it possible to parse the "comments" as a separate comments collection and not include it into the article model?
2. to include some metadata inside the articles response like so:
{
{
"id": 1,
"name": "Article 1",
...
"comments": 13
},
}
Option that raises the question: how should I handle the parse of the model so that, on one hand the meta information is available, and on the other hand, the "comments" attribute is not one Backbone would try to perform updates on?
I feel there might be another solution, compliant with the REST philosophy, for this that I'm missing, so if you have any other suggestion please let me know.
I think your best bet is to go with your second option, include the number of comments for each article inside your article model.
Option that raises the question: how should I handle the parse of the model so that, on one hand the meta information is available, and on the other hand, the "comments" attribute is not one Backbone would try to perform updates on?
Not sure what your concern is here. Why would you be worried about the comments attribute getting updated?
I can't think of any other "RESTy" way of achieving your desired result.
I would suggest using alternative 2 and have the server return
a subset of the article attributes that are deemed useful for
applications when dealing with the article collection resource
(perhaps reachable at /articles).
The full article member resource with all its comments (whether
they are stored in separate tables in the backend) would be
available at /articles/:id).
From a Backbone.js point of view you probably want to put the
collection resource in a, say, ArticleCollection which will
convert each member (currently with a subset of the attributes)
to Article models.
When the user selects to view an article in full you pull it
out from the ArticleCollection and invoke fetch to populate
it in full.
Regarding what to do with extra/virtual attributes that are included
in the collection resource (/articles) like the comment count and
possibly other usefult aggregations, I see a few alternatives:
In Article#initialize you can pull those out from the attributes
and store them as meta-data on the article. This way the built-in
Backbone.Model#toJSON will not see them.
Keep them in the attributes section of each model and override
Backbone.Model#toJSON to exlcude them when "serializing" an Article.
In atlernative 1, an Article#commentCount() helper could return
this._commentCount || this.get('comments').length to make it work
on both partially and fully loaded articles.
For a fully loaded Article you would probably want to convert the
nested comments array into a full-blown CommentCollection anyway
and store that in this._comments so I don't think it is that unusual
to have your models store additional stuff directly on the model instance,
outside of its attributes hash.

What is a good design for a REST URI in case of several pages of results?

Say I have a service I want to expose via REST.
A query on the service may yield a long list of results, which are returned "page by page", which is why the user must be able to:
specify an ordering criterion (alpha sorting on the values of one attribute or another)
specify a key value from where to get results: "show me results from letter C on..."
specify a page number from which to start getting results (i.e. I want to get the results from page 3 )
specify the max number of results per page
I suppose the ordering criterion is well suited to a query string parameter, since it does not belong to the resource, but is just a preference for its returned representation.
What about the other options? Is the whole idea sound, or it smells too much of its web-oriented origin?
As a side note, have you got any pointers for good overall design suggestions for heavy queries with multiple pages of results (for example policies for caching results on the server)?
Thank you.
From RESTful Web Services Cookbook:
Request
GET /book?sortbyDesc=date&limit=5
Response
{
"id": 9,
"links": [{
"href": "/book?sortByDesc=date&limit=5&start=5",
"rel": "next"}]
}