REST, cross-references and performances, which compromise? - rest

After reading the excellent thread "REST Complex/Composite/Nested Resources" about nested structures in REST responses, I still have a question: which response structure is the best choice in terms of performance?
Let's take an example.
I have a Category object, which contains some Questions. Those Questions contain some Answers. All of these structures carry meta-information.
Now, when querying a URL like GET http://<base_url>/categories/, should I include only a description of the Categories, or also a description of their Questions? And if so, a full description or a simplified one?
In other words, which of the following is the best solution?
{
"results":[
{
'id':1,
'name':'category1',
'description':'foobar',
'questions':[
{
'id':1234,
'question':'My question',
'author' : 4235345,
'answers':[
{
'id':56786,
'user':456,
'votes':6,
'answer':"It's an answer !"
},
{
'id':3486,
'user':4564,
'votes':2,
'answer':"It's another answer !"
},
]
},
...
]
}
...
]
}
OR SOLUTION 2 :
{
"results":[
{
'id':1,
'name':'category1',
'description':'foobar',
'questions':[
{
'id':1234,
'url':'http://foobar/questions/1234',
'answers':[
{
'id':56786,
'url':'http://foobar/answers/56786'
},
{
'id':3486,
'url':'http://foobar/answers/3486'
},
]
},
...
]
}
...
]
}
OR SOLUTION 3 :
{
"results":[
{
'id':1,
'name':'category1',
'description':'foobar',
'questions':'http://foobar/categories/1/questions'
}
...
]
}
Or maybe another solution ?
Thanks !

That depends on what the application will do with the data. If it is only going to display a list of categories, then transferring all the data it might ever need at once is very inefficient, especially if there are many categories: it increases the response time the user experiences (an absolute no-no).
These scenarios depend heavily on the application and on how it uses the data.
One possible optimization is to offer two requests:
GET http://<base_url>/categories
which returns minimal data immediately, and
GET http://<base_url>/categories?all=true
which returns all data.
The client app can then be clever about it: when the user asks for categories, the first request is sent and its result is rendered immediately. While the user is idle looking at the list of categories, the client can use that time to fetch all the data with the second request.
However, as I said this will largely depend on the application.
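As a rough illustration of the two variants, here is a minimal server-side sketch in plain JavaScript. The function name, the `all` flag handling, and the sample data are assumptions made for this example and are not part of any framework; the minimal variant returns a link to the nested collection (as in solution 3 of the question), and the full variant embeds it.

```javascript
// Hypothetical serializer: the function name and the `all` flag are illustrative.
function serializeCategory(category, baseUrl, all) {
  const summary = {
    id: category.id,
    name: category.name,
    description: category.description,
  };
  if (all) {
    // Full variant (?all=true): embed the nested questions.
    return { ...summary, questions: category.questions };
  }
  // Minimal variant: only link to the nested collection.
  return { ...summary, questions: `${baseUrl}/categories/${category.id}/questions` };
}

// Sample data modelled on the question's example.
const category = {
  id: 1,
  name: "category1",
  description: "foobar",
  questions: [{ id: 1234, question: "My question", author: 4235345 }],
};

console.log(serializeCategory(category, "http://foobar", false));
console.log(serializeCategory(category, "http://foobar", true));
```

Keeping both variants behind one serializer means the two requests cannot drift apart in the fields they share.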


Correct JSON structure to filter through data [closed]

What's the "best" JSON structure when you need to "filter" through data in Firebase (in Swift)?
I'm having users sort their questions into:
Business
Entertainment
Other
Is it better to have a separate child for each question genre? If so, how do I get all of the data (when I want it), and then filter it down to "business" only when I want to?
In NoSQL databases you usually end up modeling your data structure for the use-cases you want to allow in your app.
It's a bit of a learning path, so I'll explain it below in four steps:
Tree by category: Storing the data in a tree by its category, as you seem to be most interested in already.
Flat list of questions, and querying: Storing the data in a flat list, and then using queries to filter.
Flat list and indexes: Combining the above two approaches, to make the result more scalable.
Duplicating data: By duplicating data on top of that, you can reduce code complexity and improve performance further.
Tree by category
If you only want to get the questions by their category, you're best off simply storing each question under its category. In a simple model that'd look like this:
questionsByCategory: {
Business: {
question1: { ... },
question4: { ... }
},
Entertainment: {
question2: { ... },
question5: { ... }
},
Other: {
question3: { ... },
question6: { ... }
}
}
With the above structure, loading a list of questions for a category is a simple, direct-access read for that category: firebase.database().ref("questionsByCategory").child("Business").once("value"....
But if you need a list of all questions, you need to read all categories and denest them client-side. If you need all questions anyway, that is not a real problem, since you have to load them all regardless. But if you want to filter on some condition other than category, this may be wasteful.
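To make the client-side denesting concrete, here is a minimal sketch in plain JavaScript. It works on the plain object you would get from snapshot.val() for the questionsByCategory node; the function name and the difficulty values in the sample data are made up for illustration.

```javascript
// Flatten a { category: { questionId: question } } tree into one list.
function flattenByCategory(tree) {
  const questions = [];
  for (const [category, byId] of Object.entries(tree)) {
    for (const [id, question] of Object.entries(byId)) {
      questions.push({ id, category, ...question });
    }
  }
  return questions;
}

// Sample data; the `difficulty` values are made up for illustration.
const questionsByCategory = {
  Business: { question1: { difficulty: 1 }, question4: { difficulty: 2 } },
  Entertainment: { question2: { difficulty: 1 } },
};

// Filtering on something other than category requires loading everything first.
const hard = flattenByCategory(questionsByCategory).filter((q) => q.difficulty >= 2);
console.log(hard.map((q) => q.id)); // [ 'question4' ]
```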
Flat list of questions, and querying
An alternative is to create a flat list of all questions, and use queries to filter the data. In that case your JSON would look like this:
questions: {
question1: { category: "Business", difficulty: 1, ... },
question2: { category: "Entertainment", difficulty: 1, ... },
question3: { category: "Other", difficulty: 2, ... },
question4: { category: "Business", difficulty: 2, ... },
question5: { category: "Entertainment", difficulty: 3, ... },
question6: { category: "Other", difficulty: 1, ... }
}
Now, getting a list of all questions is easy, as you can just read them and loop over the results:
firebase.database().ref("questions").once("value").then(function(result) {
result.forEach(function(snapshot) {
console.log(snapshot.key+": "+snapshot.val().category);
})
})
If we want to get all questions for a specific category, we use a query instead of just the ref("questions"). So:
Get all Business questions:
firebase.database().ref("questions").orderByChild("category").equalTo("Business").once("value")...
Get all questions with difficulty 3:
firebase.database().ref("questions").orderByChild("difficulty").equalTo(3).once("value")...
This approach works quite well, unless you have huge numbers of questions.
Flat list and indexes
If you have millions of questions, Firebase database queries may no longer perform well enough for you. In that case you may need to combine the two approaches above, using a flat list to store the questions, and so-called (self-made) secondary indexes to perform the filtered lookups.
If you think you'll ever reach this number of questions, I'd consider using Cloud Firestore, as that does not have the inherent scalability limits that the Realtime Database has. In fact, Cloud Firestore has the unique guarantee that retrieving a certain amount of data takes a fixed amount of time, no matter how much data there is in the database/collection.
In this scenario, your JSON would look like:
questions: {
question1: { category: "Business", difficulty: 1, ... },
question2: { category: "Entertainment", difficulty: 1, ... },
question3: { category: "Other", difficulty: 2, ... },
question4: { category: "Business", difficulty: 2, ... },
question5: { category: "Entertainment", difficulty: 3, ... },
question6: { category: "Other", difficulty: 1, ... }
},
questionsByCategory: {
Business: {
question1: true,
question4: true
},
Entertainment: {
question2: true,
question5: true
},
Other: {
question3: true,
question6: true
}
},
questionsByDifficulty: {
"1": {
question1: true,
question2: true,
question6: true
},
"2": {
question3: true,
question4: true
},
"3": {
question5: true
}
}
You see that we have a single flat list of the questions, and then separate lists for the different properties we want to filter on, holding the IDs of the questions that have each value. Those secondary lists are also often called (secondary) indexes, since they really serve as indexes on your data.
To load the hard questions in the above, we take a two-step approach:
Load the question IDs with a direct lookup.
Load each question by its ID.
In code:
firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
result.forEach(function(snapshot) {
firebase.database().ref("questions").child(snapshot.key).once("value").then(function(questionSnapshot) {
console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
});
})
})
If you need to wait for all questions before logging (or otherwise processing) them, you'd use Promise.all:
firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
var promises = [];
result.forEach(function(snapshot) {
promises.push(firebase.database().ref("questions").child(snapshot.key).once("value"));
})
Promise.all(promises).then(function(questionSnapshots) {
questionSnapshots.forEach(function(questionSnapshot) {
console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
})
})
})
Many developers assume that this approach is slow, since it needs a separate call for each question. But it's actually quite fast, since Firebase pipelines the requests over its existing connection. For more on this, see Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly
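The secondary indexes described earlier have to be written alongside the main list. As a minimal sketch in plain JavaScript (no Firebase calls; the function name is hypothetical), this is one way to derive such an index from the flat list of questions:

```javascript
// Build { value: { questionId: true } } from a flat { questionId: question } map.
function buildIndex(questions, property) {
  const index = {};
  for (const [id, question] of Object.entries(questions)) {
    const key = String(question[property]);
    if (!index[key]) index[key] = {};
    index[key][id] = true;
  }
  return index;
}

// A subset of the sample questions from the JSON above.
const flatQuestions = {
  question1: { category: "Business", difficulty: 1 },
  question3: { category: "Other", difficulty: 2 },
  question5: { category: "Entertainment", difficulty: 3 },
};

console.log(buildIndex(flatQuestions, "difficulty"));
console.log(buildIndex(flatQuestions, "category"));
```

In a real app you would more likely update the index entry at the same time as you write each question, rather than rebuilding it from scratch.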
Duplicating data
The code for the nested load/client-side join is a bit tricky to read at times. If you'd prefer only performing a single load, you could consider duplicating the data for each question into each secondary index too.
In this scenario, the secondary index would look like this:
questionsByCategory: {
Business: {
question1: { category: "Business", difficulty: 1, ... },
question4: { category: "Business", difficulty: 2, ... }
},
If you come from a background in relational data modeling, this may look quite unnatural, since we're now duplicating data between the main list and the secondary indexes.
To an experienced NoSQL data modeler however, this looks completely normal. We're trading off storing some extra data against the extra time/code it takes to load the data.
This trade-off is common in all areas of computer science, and in NoSQL data modeling you'll fairly often see folks choosing to sacrifice space (and thus store duplicate data) to get an easier and more scalable data model.
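When data is duplicated like this, the copies need to be written together. A minimal sketch, assuming the paths from the structure above and a hypothetical helper name: build one object whose keys are the paths to update, so that the main list and the category index change in a single write.

```javascript
// Build one multi-path update for a question and its duplicated copy.
// Paths follow the structure above; the function name is hypothetical.
function buildFanOut(id, question) {
  return {
    [`questions/${id}`]: question,
    [`questionsByCategory/${question.category}/${id}`]: question,
  };
}

const updates = buildFanOut("question7", { category: "Business", difficulty: 2 });
console.log(Object.keys(updates));
// With the Firebase SDK loaded, this object could be written in one call:
// firebase.database().ref().update(updates);
```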

Meteor template subscriptions and performance

I'm looking for advice on my approach here. I want to be sure I'm doing things the "Meteor way" and keeping the code fast.
Current situation:
We have a collection for Questions. Each question has a nested collection of Answers. Through a REST API, a device relays the answers that were selected by users.
Based on the answers that were selected, we show a chart for each question: simple number breakdowns and percentage bars. To improve performance, we've been tracking the number of responses each Answer has received on the Answer itself.
The publication looks (basically) like this:
Meteor.publish('questionsBySiteID', function(site_id){
return Questions.find({site_id: site_id});
});
And the route like this:
Router.route('/sites/:_id/questions', {
name: 'questionsList',
waitOn: function(){
return [
Meteor.subscribe('questionsBySiteID', this.params._id),
];
},
data: function(){
return {
publishedQuestions: Questions.find(
{ site_id : this.params._id, active: true, deleted: {$ne: true} },
{ sort : { order: 1} }
),
archivedQuestions : Questions.find(
{ site_id : this.params._id, active: false, deleted: {$ne: true} },
{ sort : { updated_at: -1 } }
),
deletedQuestions : Questions.find(
{ site_id : this.params._id, deleted: true },
{ sort : { updated_at: -1 } }
)
};
}
});
Change required:
Now we want responses to be date-filterable. This means the denormalized response counts we've tracked on Answers aren't very useful. We've been tracking another collection (Responses) with a more "raw" version of the data. A Response object tracks the module (questions in this case), question_id, answer_id, timestamp, id for the customer the question belongs to, etc.
Question:
Is this something that template subscriptions help with? Perhaps we need a publication that accepts a question_id and optional start/end dates for the filter. The stats template for each question would subscribe to the applicable Responses data in its onCreated callback (Template.question.onCreated()). Based on the question_id, the publication would need to find Responses for related answers within the date filter. And maybe we use the publish-counts package to count the number of times each answer was selected and publish those counts.
The Responses collection will be quite large, so I'm trying to be careful about what I publish here. I don't want to waitOn all Responses to be published.
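To pin down what such a publication would have to compute, here is a minimal sketch of the counting logic in plain JavaScript. It is not Meteor code: the function name is hypothetical, the field names follow the Response description above, and timestamps are assumed to be Date objects. On the server, the same filter would normally be expressed as a query on the Responses collection rather than a loop.

```javascript
// Count how often each answer of one question was selected in [start, end).
// start and end are optional; omit them (pass null) for an open-ended range.
function countAnswers(responses, questionId, start, end) {
  const counts = {};
  for (const r of responses) {
    if (r.question_id !== questionId) continue;
    if (start && r.timestamp < start) continue;
    if (end && r.timestamp >= end) continue;
    counts[r.answer_id] = (counts[r.answer_id] || 0) + 1;
  }
  return counts;
}

// Made-up sample data for illustration.
const sampleResponses = [
  { question_id: "q1", answer_id: "a1", timestamp: new Date("2015-01-05") },
  { question_id: "q1", answer_id: "a2", timestamp: new Date("2015-01-20") },
  { question_id: "q1", answer_id: "a1", timestamp: new Date("2015-02-02") },
  { question_id: "q2", answer_id: "a9", timestamp: new Date("2015-01-10") },
];

console.log(countAnswers(sampleResponses, "q1", new Date("2015-01-01"), new Date("2015-02-01")));
```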

Returning objects with multiple types in a RESTful way

I'm currently designing an API to handle requests from mobile clients. To achieve some degree of decoupling between backend and client I would like to define the webservices in a RESTful way. The challenge that I am facing is returning multiple objects with different types for a single call.
Let's say we have the following model:
Harbour ... Top level entry
Boat shed ... Assigned to a specific harbour (0..n)
Boat ... Assigned to a specific boat shed (0..n), or directly to a harbour (0..n)
As far as I understand REST, if I now want to display all the boats and sheds in the harbour I would send two requests:
/harbours/{harbour_id}/boats Returning a list of all boats. Boats in a shed would contain an id linking to the shed they are in
/harbours/{harbour_id}/sheds Returning a list of all sheds
As I want to use the web service in a mobile scenario, it would be ideal to combine these two calls into one. This could then either return the list of boats with the shed object nested within, or both object types side by side:
/harbours/22/boats
[
{
"id":1,
"boatName":"Mary",
"boatShed":{
"id":1,
"shedName":"Dock 1",
"capacity":55
}
},
{
"id":2,
"boatName":"Jane",
"boatShed":{
"id":1,
"shedName":"Dock 1",
"capacity":55
}
}
]
or
/harbours/22/boats
{
"boats":[
{
"id":1,
"boatName":"Mary",
"boatShedId":1
},
{
"id":2,
"boatName":"Jane",
"boatShedId":1
}
],
"sheds":[
{
"id":1,
"shedName":"Dock 1",
"capacity":55
}
]
}
My question now is, which of these ways is closer to the idea behind REST, or is it not RESTful at all?
As @Tarken mentioned, a /boats request should not return sheds at the top level (since the URL implies you're asking for a collection of the Boat resource).
If you have relations defined as follows
Harbour:
boats: Boat[]
sheds: Shed[]
Shed:
boats: Boat[]
Boat:
harbour: Harbour
shed: Shed
/harbours/ID then returns a Harbour representation with boats and sheds relation set.
{
boats:
[
{ Boat },
{ Boat },
..
],
sheds:
[
{ Shed },
{ Shed },
..
],
...
}
Nothing is against restful principles here - the url uniquely identifies a resource and resource representation can be anything, with links to other resources as well.
Create a Harbour model which contains both Boat Shed and Boat information. If I were implementing the service in Java, I would do something like this:
class Boat{
...
}
class BoatShed{
...
}
class Harbour{
List<Boat> boats;
List<BoatShed> boatSheds;
...
}
You can create an API like /api/harbours/{harbourId}.
As per your question you want to display all the boats and sheds in the harbour, say id=1234, you can make a request like this :
GET /api/harbours/1234
This will return list of Boats and list of Boat Sheds like this:
{
"boats":[
{
"id":1,
"boatName":"Mary",
"boatShedId":1
},
{
"id":2,
"boatName":"Mary2",
"boatShedId":2
}
],
"sheds":[
{
"id":1,
"shedName":"Dock 1",
"capacity":55
},
{
"id":2,
"shedName":"Dock 2",
"capacity":50
}
]
}
EDIT
As you want to get boats and sheds side by side by sending one request, /api/harbours/{id} looks good according to REST API design principles.
Getting all sheds while requesting boats is not in accordance with ideal REST API design, but if you want to do it that way, the first option, /api/harbours/{id}/boats with the shed nested in each boat, looks good to me.
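The two representations discussed in this thread carry the same information, so one can be derived from the other. A minimal sketch in plain JavaScript (the function name is hypothetical; field names follow the question's examples) that turns the nested form into the side-by-side form, listing each shed only once:

```javascript
// Convert the nested form (shed embedded in each boat) into the
// side-by-side form (boats reference sheds by id, sheds listed once).
function toSideBySide(nestedBoats) {
  const sheds = new Map();
  const boats = nestedBoats.map(({ boatShed, ...boat }) => {
    if (boatShed) sheds.set(boatShed.id, boatShed);
    // Boats assigned directly to the harbour get a null shed id.
    return { ...boat, boatShedId: boatShed ? boatShed.id : null };
  });
  return { boats, sheds: [...sheds.values()] };
}

const dock1 = { id: 1, shedName: "Dock 1", capacity: 55 };
const harbour = toSideBySide([
  { id: 1, boatName: "Mary", boatShed: dock1 },
  { id: 2, boatName: "Jane", boatShed: dock1 },
]);
console.log(JSON.stringify(harbour, null, 2));
```

For a mobile client the side-by-side form avoids repeating the same shed for every boat, which is the main practical difference between the two.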

MongoDB Social Network Adding Followers

I'm implementing a social network in MongoDB and I need to keep track of Followers and Following for each User. When I search for Users I want to display a list like Facebook with the User Name, Picture and number of Followers & Following. If I just wanted to display the User Name and Picture (info that doesn't change) it would be easy, but I also need to display the number of Followers & Following (which changes fairly regularly).
My current strategy is to embed the People a User follows into each User Document:
firstName: "Joe",
lastName: "Bloggs",
follows: [
{
_id: ObjectId("520534b81c9aac710d000002"),
profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
name: "Mark Rogers",
},
{
_id: ObjectId("51f26293a5c5ea4331cb786a"),
name: "The Palace Bar",
profilePictureUrl: "https://s3-eu-west-1.amazonaws.com/businesses/xxx.jpg",
}
]
The question is - What is the best strategy to keep track of the number of Followers & Following for each User?
If I include the number of Follows / Following as part of the embedded document i.e.
follows: [
{
_id: ObjectId("520534b81c9aac710d000002"),
profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
name: "Mark Rogers",
followers: 10,
following: 400
}
then every time a User follows someone, multiple updates are required across all the embedded documents.
Since the consistency of this data isn't really important (i.e. showing someone that I have 10 instead of 11 followers isn't the end of the world), I can queue this update. Is this approach OK, or can anyone suggest a better one?
You're on the right track. Think about which calculation is performed more - determining the number of followers/following or changing number of followers/following? Even if you're caching the output of the # of followers/following calculation it's still going to be performed one or two orders of magnitude more often than changing the number.
Also, think about the opposite. If you really need to display the number of followers/following for each of those users, you'll have to then do an aggregate on each load (or cache it somewhere, but you're still doing a lot of calcs).
Option 1: Cache the number of followers/following in the embedded document.
Upsides: Can display stats in O(1) time
Downsides: Requires O(N) time to follow/unfollow
Option 2: Count the number of followers/following on each page view (or cache invalidation)
Upsides: Can follow/unfollow in O(1) time
Downsides: Requires O(N) time to display
Add in the fact that follower/following stats can be eventually consistent whereas the counts have to be displayed on demand and I think it's a pretty easy decision to cache it.
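The trade-off between the two options can be sketched with an in-memory stand-in for the user documents. This is plain JavaScript, not MongoDB code, and all names are made up for illustration. It shows the cached counter versus the recount on read, and leaves out propagating the counters into embedded copies, which is where the O(N) cost of option 1 comes from.

```javascript
// In-memory stand-in for the user documents; not MongoDB code.
const users = new Map([
  ["joe", { follows: new Set(), followers: 0, following: 0 }],
  ["mark", { follows: new Set(), followers: 0, following: 0 }],
]);

// Option 1: keep cached counters up to date when someone follows.
function follow(followerId, followeeId) {
  const follower = users.get(followerId);
  if (follower.follows.has(followeeId)) return; // already following
  follower.follows.add(followeeId);
  follower.following += 1;
  users.get(followeeId).followers += 1;
}

// Option 2: recount followers by scanning every user on each read.
function countFollowers(userId) {
  let count = 0;
  for (const user of users.values()) {
    if (user.follows.has(userId)) count += 1;
  }
  return count;
}

follow("joe", "mark");
console.log(users.get("mark").followers, countFollowers("mark")); // 1 1
```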
I've gone ahead and implemented the followers/following update based on the strategy recommended by Mason (Option 1). Here's my code in Node.js with Mongoose, using the Async.js waterfall pattern, in case anyone is interested or has any opinions. I haven't implemented queuing yet, but the plan is to farm most of this off to a queue.
async.waterfall([
function (callback) {
/** find & update the person we are following; new:true returns the document with the incremented count */
Model.User
.findByIdAndUpdate(id,{$inc:{followers:1}},{new:true,upsert:true,select:{fullName:1,profilePictureUrl:1,address:1,following:1,followers:1}})
.lean()
.exec(callback);
},
function (followee, callback) {
/** find & update the person doing the following */
var query = {
$inc:{following:1},
$addToSet: { follows: followee}
}
Model.User
.findByIdAndUpdate(credentials.username,query,{new:true,upsert:true,select:{fullName:1,profilePictureUrl:1,address:1,following:1,followers:1}})
.lean()
.exec(function(err,follower){
callback(err,follower,followee);
});
},
function(follower,followee,callback){
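/** update the following count in every embedded copy of the follower;
lean() returns plain objects, so match on _id, and avoid upsert with the positional $ operator */
Model.User
.update({'follows._id':follower._id},{$set:{'follows.$.following':follower.following}},{multi:true},function(err){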
callback(err,followee);
});
},
function(followee,callback){
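/** update the followers count in every embedded copy of the followee;
lean() returns plain objects, so match on _id, and avoid upsert with the positional $ operator */
Model.User
.update({'follows._id':followee._id},{$set:{'follows.$.followers':followee.followers}},{multi:true},callback);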
}
], function (err) {
if (err)
next(err);
else {
res.send(HTTPStatus.OK);
next();
}
});

Does it make sense to use internal anchors for filtering a REST API's representation?

As a follow up to my previous question about REST URIs for retrieving statistical information for a web forum Resource, I want to know if it is possible to use the internal anchors as filter hints. See example below:
a) Get all statistics:
GET /group/5t7yu8i9io0op/stat
{
group_id: "5t7yu8i9io0op",
top_ranking_users: [
{ user: "george", posts: 789, rank: 1 },
{ user: "joel", posts: 560, rank: 2 } ...
],
popular_topics: [ ... ],
new_topics: [ ... ]
}
b) GET only popular topics
GET /group/5t7yu8i9io0op/stat#popular_topics
{
group_id: "5t7yu8i9io0op",
popular_topics: [ ... ]
}
c) GET only top ranking users
GET /group/5t7yu8i9io0op/stat#top_ranking_users
{
group_id: "5t7yu8i9io0op",
top_ranking_users: [
{ user: "george", posts: 789, rank: 1 },
{ user: "joel", posts: 560, rank: 2 } ...
]
}
Or should I be using query parameters ?
Not sure what you are trying to do exactly, but make sure you understand that fragment identifiers are not seen by the server; they are stripped off by the client before the request is sent.
See: http://www.nordsc.com/blog/?p=17
I've never seen anchors being used that way - it's interesting. That being said, I'd suggest using query parameters for a couple of reasons:
They're standard, and consumers of your API will be comfortable with them. There's nothing more annoying than dealing with a quirky API.
Many frameworks will auto-parse the query parameters and set them in a dictionary on the request object (or whatever analogue exists in your framework / http server library).
I think it would make more sense to have:
/group/5t7yu8i9io0op/stat/top_users
/group/5t7yu8i9io0op/stat/popular_topics
/group/5t7yu8i9io0op/stat/new_topics
/group/5t7yu8i9io0op/stat/user/george
No, you cannot do that because, as Jan points out, the server will never see the fragment identifier. Literally, that part of the URL will not reach the server.
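A small sketch in JavaScript (Node.js, using the built-in URL class) illustrates the difference. The host name and the `fields` query parameter are assumptions made for this example; they are one possible convention, not something the answers above prescribe.

```javascript
// The fragment is a separate URL component; HTTP clients do not send it to the server.
const withFragment = new URL("http://example.com/group/5t7yu8i9io0op/stat#popular_topics");
console.log(withFragment.pathname); // /group/5t7yu8i9io0op/stat
console.log(withFragment.hash);     // #popular_topics

// Hypothetical `fields` query parameter: this part does reach the server.
function filterStats(stats, requestUrl) {
  const fields = new URL(requestUrl, "http://example.com").searchParams.get("fields");
  if (!fields) return stats; // no filter requested: return everything
  const result = { group_id: stats.group_id };
  for (const name of fields.split(",")) {
    if (name in stats) result[name] = stats[name];
  }
  return result;
}

const stats = { group_id: "5t7yu8i9io0op", popular_topics: [], new_topics: [], top_ranking_users: [] };
console.log(filterStats(stats, "/group/5t7yu8i9io0op/stat?fields=popular_topics"));
```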