I'm looking for advice on my approach here: I want to be sure I'm doing things the "Meteor way" and keeping the code fast.
Current situation:
We have a collection for Questions. Each question has a nested collection of Answers. Through a REST API, a device relays the answers that were selected by users.
Based on the answers that were selected, we show a chart for each question: simple number breakdowns and percentage bars. To improve performance, we've been tracking the number of responses each Answer has received on the Answer itself.
The publication looks (basically) like this:
Meteor.publish('questionsBySiteID', function(site_id){
  return Questions.find({site_id: site_id});
});
And the route like this:
Router.route('/sites/:_id/questions', {
  name: 'questionsList',
  waitOn: function(){
    return [
      Meteor.subscribe('questionsBySiteID', this.params._id)
    ];
  },
  data: function(){
    return {
      publishedQuestions: Questions.find(
        { site_id: this.params._id, active: true, deleted: { $ne: true } },
        { sort: { order: 1 } }
      ),
      archivedQuestions: Questions.find(
        { site_id: this.params._id, active: false, deleted: { $ne: true } },
        { sort: { updated_at: -1 } }
      ),
      deletedQuestions: Questions.find(
        { site_id: this.params._id, deleted: true },
        { sort: { updated_at: -1 } }
      )
    };
  }
});
Change required:
Now we want responses to be date-filterable. This means the denormalized response counts we've tracked on Answers aren't very useful. We've been tracking another collection (Responses) with a more "raw" version of the data. A Response object tracks the module (questions in this case), question_id, answer_id, timestamp, the id of the customer the question belongs to, etc.
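For concreteness, a single Response document looks roughly like this (the field names follow the description above; the values here are made up):

{
  module: "questions",
  question_id: "q123",     // made-up id
  answer_id: "a456",       // made-up id
  customer_id: "c789",     // made-up id
  timestamp: ISODate("2015-06-01T12:34:56Z")
}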
Question:
Is this something that template subscriptions help with? Perhaps we need a publication that accepts a question_id and optional start/end dates for the filter. The stats template for each question would subscribe to the applicable Responses data in Template.question.onCreated(). Based on the question_id, the publication would need to find Responses for related answers within the date filter. And maybe we use the publish-counts package to count the number of times each answer was selected and publish those counts.
The Responses collection will be quite large, so I'm trying to be careful about what I publish here. I don't want to waitOn all Responses to be published.
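To make this concrete, the kind of publication and template-level subscription I'm imagining would look something like this. It's only a sketch: the publication name, the Responses field names, and the ReactiveVar holding the date range are provisional, not working code.

Meteor.publish('responsesForQuestion', function(question_id, start, end){
  var selector = { question_id: question_id };
  // apply the optional date filter
  if (start || end) {
    selector.timestamp = {};
    if (start) { selector.timestamp.$gte = start; }
    if (end) { selector.timestamp.$lte = end; }
  }
  // publish only the fields the charts need, to keep the wire traffic small
  return Responses.find(selector, { fields: { question_id: 1, answer_id: 1, timestamp: 1 } });
});

Template.question.onCreated(function(){
  var self = this;
  // provisional reactive date range; wire it to the UI however makes sense
  self.dateFilter = new ReactiveVar({ start: null, end: null });
  self.autorun(function(){
    var range = self.dateFilter.get();
    // re-subscribes whenever the question data or the date range changes
    self.subscribe('responsesForQuestion', Template.currentData()._id, range.start, range.end);
  });
});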
What's the "best" JSON structure when you need to "filter" through data in Firebase (in Swift)?
I'm having users sort their questions into:
Business
Entertainment
Other
Is it better to have a separate child for each question genre? If so, how do I get all of the data (when I want it), and then filter it only by "business" when I want to?
In NoSQL databases you usually end up modeling your data structure for the use-cases you want to allow in your app.
It's a bit of a learning path, so I'll explain it below in four steps:
Tree by category: Storing the data in a tree by its category, as you seem to be most interested in already.
Flat list of questions, and querying: Storing the data in a flat list, and then using queries to filter.
Flat list and indexes: Combining the above two approaches, to make the result more scalable.
Duplicating data: By duplicating data on top of that, you can reduce code complexity and improve performance further.
Tree by category
If you only want to get the questions by their category, you're best off simply storing each question under its category. In a simple model that'd look like this:
questionsByCategory: {
  Business: {
    question1: { ... },
    question4: { ... }
  },
  Entertainment: {
    question2: { ... },
    question5: { ... }
  },
  Other: {
    question3: { ... },
    question6: { ... }
  }
}
With the above structure, loading the list of questions for a category is a simple, direct-access read for that category: firebase.database().ref("questionsByCategory").child("Business").once("value"....
But if you need a list of all questions, you'd need to read all categories and denest them client-side. If a list of all questions is what you want, that is not a real problem, since you need to load them all anyway; but if you want to filter on some condition other than category, this may be wasteful.
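For illustration, reading everything and denesting it client-side might look like this (a sketch; the nested iteration uses the standard DataSnapshot.forEach):

firebase.database().ref("questionsByCategory").once("value").then(function(result) {
  var allQuestions = [];
  result.forEach(function(categorySnapshot) {
    categorySnapshot.forEach(function(questionSnapshot) {
      // flatten every category into one client-side list
      allQuestions.push(questionSnapshot.val());
    });
  });
  console.log(allQuestions.length);
});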
Flat list of questions, and querying
An alternative is to create a flat list of all questions, and use queries to filter the data. In that case your JSON would look like this:
questions: {
  question1: { category: "Business", difficulty: 1, ... },
  question2: { category: "Entertainment", difficulty: 1, ... },
  question3: { category: "Other", difficulty: 2, ... },
  question4: { category: "Business", difficulty: 2, ... },
  question5: { category: "Entertainment", difficulty: 3, ... },
  question6: { category: "Other", difficulty: 1, ... }
}
Now, getting a list of all questions is easy, as you can just read them and loop over the results:
firebase.database().ref("questions").once("value").then(function(result) {
result.forEach(function(snapshot) {
console.log(snapshot.key+": "+snapshot.val().category);
})
})
If we want to get all questions for a specific category, we use a query instead of just the ref("questions"). So:
Get all Business questions:
firebase.database().ref("questions").orderByChild("category").equalTo("Business").once("value")...
Get all questions with difficulty 3:
firebase.database().ref("questions").orderByChild("difficulty").equalTo(3).once("value")...
This approach works quite well, unless you have huge numbers of questions.
Flat list and indexes
If you have millions of questions, Firebase database queries may not perform well enough for you anymore. In that case you may need to combine the two approaches above, using a flat list to store the questions, and so-called (self-made) secondary indexes to perform the filtered lookups.
If you think you'll ever reach this number of questions, I'd consider using Cloud Firestore, as that does not have the inherent scalability limits that the Realtime Database has. In fact, Cloud Firestore has the unique guarantee that retrieving a certain amount of data takes a fixed amount of time, no matter how much data there is in the database/collection.
In this scenario, your JSON would look like:
questions: {
  question1: { category: "Business", difficulty: 1, ... },
  question2: { category: "Entertainment", difficulty: 1, ... },
  question3: { category: "Other", difficulty: 2, ... },
  question4: { category: "Business", difficulty: 2, ... },
  question5: { category: "Entertainment", difficulty: 3, ... },
  question6: { category: "Other", difficulty: 1, ... }
},
questionsByCategory: {
  Business: {
    question1: true,
    question4: true
  },
  Entertainment: {
    question2: true,
    question5: true
  },
  Other: {
    question3: true,
    question6: true
  }
},
questionsByDifficulty: {
  "1": {
    question1: true,
    question2: true,
    question6: true
  },
  "2": {
    question3: true,
    question4: true
  },
  "3": {
    question5: true
  }
}
You see that we have a single flat list of the questions, and then separate lists for the different properties we want to filter on, mapping each value to the IDs of the questions that have it. Those secondary lists are also often called (secondary) indexes, since they really serve as indexes on your data.
To load the hard questions in the above, we take a two-step approach:
Load the question IDs with a direct lookup.
Load each question by its ID.
In code:
firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
result.forEach(function(snapshot) {
firebase.database().ref("questions").child(snapshot.key).once("value").then(function(questionSnapshot) {
console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
});
})
})
If you need to wait for all questions before logging (or otherwise processing) them, you'd use Promise.all:
firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
var promises = [];
result.forEach(function(snapshot) {
promises.push(firebase.database().ref("questions").child(snapshot.key).once("value"));
})
Promise.all(promises).then(function(questionSnapshots) {
questionSnapshots.forEach(function(questionSnapshot) {
console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
})
})
})
Many developers assume that this approach is slow, since it needs a separate call for each question. But it's actually quite fast, since Firebase pipelines the requests over its existing connection. For more on this, see Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly.
Duplicating data
The code for the nested load/client-side join can be a bit tricky to read. If you'd prefer to perform only a single load, you could consider duplicating the data for each question into each secondary index too.
In this scenario, the secondary index would look like this:
questionsByCategory: {
  Business: {
    question1: { category: "Business", difficulty: 1, ... },
    question4: { category: "Business", difficulty: 2, ... }
  },
  ...
}
If you come from a background in relational data modeling, this may look quite unnatural, since we're now duplicating data between the main list and the secondary indexes.
To an experienced NoSQL data modeler however, this looks completely normal. We're trading off storing some extra data against the extra time/code it takes to load the data.
This trade-off is common in all areas of computer science, and in NoSQL data modeling you'll fairly often see folks choosing to sacrifice space (and thus store duplicate data) to get an easier and more scalable data model.
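If you go this route, the usual way to keep the copies consistent is a multi-location update (often called fan-out), writing the question and all of its index entries in one call. A sketch, with made-up values:

var question = { category: "Business", difficulty: 2 };
// push() generates the new question ID without writing anything yet
var key = firebase.database().ref("questions").push().key;
var updates = {};
updates["questions/" + key] = question;
updates["questionsByCategory/" + question.category + "/" + key] = question; // duplicated copy
updates["questionsByDifficulty/" + question.difficulty + "/" + key] = true; // plain index entry
firebase.database().ref().update(updates); // all paths commit atomically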
I just started using Mongoose recently and I'm a bit confused about how to sort and paginate.
Let's say I'm making a project like Twitter and I have 3 schemas. The first is user, the second is post, and the third is post_detail. The user schema contains the user's data, a post is like an FB status or a tweet that others can reply to, and post_detail holds the replies to a post.
user
var userSchema = mongoose.Schema({
  username: {
    type: String
  },
  full_name: {
    type: String
  },
  age: {
    type: Number
  }
});
post
var postSchema = mongoose.Schema({
  message: {
    type: String
  },
  created_by: {
    type: String
  },
  total_reply: {
    type: Number
  }
});
post_detail
var postDetailSchema = mongoose.Schema({
  post_id: {
    type: String
  },
  message: {
    type: String
  },
  created_by: {
    type: String
  }
});
The relations are: user._id = post.created_by, user._id = post_detail.created_by, post_detail.post_id = post._id.
Say user A makes 1 post and 1000 other users comment on that post. How can we sort the comments by the commenters' usernames? Users can change their data (full_name, age in this case), so I can't put that data on post_detail because it can change dynamically. Or should I put it on post_detail anyway and, when a user changes their data, update post_detail too? But then I'd need to change many rows, because if the same user has commented on 100 posts, all of those records need to change as well.
The problem is how to sort it; I think if I can sort it I can paginate it too. Or in this case should I just use an RDBMS instead of NoSQL?
Thanks anyway, really appreciate the help and guidance :))
Welcome to MongoDB.
If you want to do it in the way you describe, just don't go for Mongo.
You are designing the schema based on relations, not on documents.
Your design requires joins, and that does not work well in Mongo because there is no easy/fast way of doing them.
First, I would not create a separate entity for the post details, but would embed the post details in the Post document as a list.
Regarding your question:
or I just put it on the post_detail and if user change data I just
change the post_detail too?
Yes, that is what you should do. If you want to be able to sort the documents by userName, you should denormalize it and include it in the post details.
If I had to design the schema, it would be something like this:
{
  "message": "blabl",
  "authorId": "userId12",
  "total_reply": 100,
  "replies": [
    {
      "message": "okk",
      "authorId": "66234",
      "authorName": "Alberto Rodriguez"
    },
    {
      "message": "test",
      "authorId": "1231",
      "authorName": "Fina Lopez"
    }
  ]
}
With this schema and using the aggregation framework, you can sort the comments by username.
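For example, a minimal sketch of such a pipeline, assuming a Post Mongoose model over this collection (postId and the 20-per-page size are placeholders):

var page = 0; // placeholder page number
Post.aggregate([
  { $match: { _id: postId } },            // postId: the ObjectId of the post
  { $unwind: '$replies' },                // one document per reply
  { $sort: { 'replies.authorName': 1 } }, // sort by the denormalized author name
  { $skip: page * 20 },                   // paginate: 20 replies per page
  { $limit: 20 }
], function (err, docs) {
  // each doc holds the post's top-level fields plus one entry from replies
});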
If you don't like this approach, I would rather go for an RDBMS, as you mentioned.
Like Facebook, I would like to aggregate the results. But I can't figure out how to go about it.
Example:
Let's say 10 users like my posts.
I don't want to get 10 notifications. 1 is of course enough.
This is my schema:
var eventLogSchema = mongoose.Schema({
  // i.e. somebody commented, somebody liked, etc.
  event: String,
  // to a comment, to a post, to a profile, etc.
  toWhat: String,
  // who is the user we need to notify?
  toWho: {type: mongoose.Schema.Types.ObjectId, ref: 'User'},
  // post id, comment id, whatever..
  refID: {type: mongoose.Schema.Types.ObjectId},
  // who initiated the event
  whoDid: {type: mongoose.Schema.Types.ObjectId, ref: 'User'},
  // when the event happened (note: pass Date.now without parentheses,
  // otherwise one fixed timestamp is baked in when the schema is defined)
  date: {type: Date, default: Date.now},
  // whether the user already saw this notification or not
  seen: {type: Boolean, default: false}
})
So I need to count the times where:
Ex.1: event='liked' and toWhat="post" and refID=myPostID and seen=false
But at the same time, I would like to populate the last event with these parameters on the 'who' path, so I could display "Michael and 9 other people liked your post (link to post)".
Every way I can think of doing this is clunky and requires multiple queries that feel like they would cost a lot of system resources and I am wondering if there's a simple way to do it.
Actually it gets more complicated than that.
I do not want to specify values like I did in Ex.1.
Instead I would like to say
aggregate all events with similar 'event', 'toWhat',
'refID' with value seen=false and populate the last one on the 'who' path.
Would love some reading materials, links, advice, or anything.
Thanks!
Managed to solve it like this.
Not sure if it's optimal, but it works.
// Notification is the name of my Schema
Notification.aggregate([
  {
    $match: {
      // I only want to display new notifications
      seen: {$ne: true}
      // to query a specific user, add:
      // toWho: UserID
    }
  },
  {
    $group: {
      // groups when event, toWhat, and refID are similar
      _id: {
        event: '$event',
        toWhat: '$toWhat',
        refID: '$refID'
      },
      // gets the total number of this type of notification
      howMany: {$sum: 1},
      // gets the date of the last document in this query
      date: {$max: '$date'},
      // pulls the user ID of the last user in this query
      user: {$last: '$whoDid'}
    }
  }
]).exec(function (err, results) {
  if (err) throw err;
  if (results) {
    // after I get the results, I want to populate my user to get his name
    Notification.populate(results, {path: 'user', model: 'User'}, function (err, notifications) {
      if (err) throw err;
      if (notifications) res.send(notifications);
    })
  }
})
I'm not sure whether it's possible to populate the aggregated result in one query; I assume that if it's possible it would be optimal, but so far this seems acceptable for my needs.
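For what it's worth, on MongoDB 3.2+ the join can be done inside the pipeline by appending a $lookup stage instead of calling populate() afterwards; a sketch (the users collection name behind the User model is an assumption):

{
  $lookup: {
    from: 'users',        // assumed collection name behind the User model
    localField: 'user',   // the id picked by $last in the $group stage
    foreignField: '_id',
    as: 'user'            // the matched user doc arrives as a one-element array
  }
}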
Hope this helps.
TLDR; Should you use subdocuments or relational Id?
This is my PostSchema:
const Post = new mongoose.Schema({
  title: {
    type: String,
    required: true
  },
  body: {
    type: String,
    required: true
  },
  comments: [Comment]
})
And this is my Comment Schema:
const Comment = new mongoose.Schema({
  body: {
    type: String,
    required: true
  }
})
In Postgres, I would have a post_id field in Comment, instead of having an array of comments inside Post. I am sure you can do the same in MongoDB but I don't know which one is more conventional. If people use subdocuments over references (and joining tables) in MongoDB, why is that? In other words, why should I ever use subdocuments? If it's advantageous, should I do the same in Postgres as well?
Answering based on what I understood from your question.
If you keep subdocuments, you don't have to query two collections to get the comments for one post.
Let's say we have the following DB structure for posts:
[{
  _id: 1,
  title: 'some title',
  comments: [
    {
      ... // some fields that belong to comments
    },
    {
      ... // some fields that belong to comments
    },
    ...
  ]
},
{
  _id: 2,
  title: 'some title',
  comments: [
    {
      ... // some fields that belong to comments
    },
    {
      ... // some fields that belong to comments
    },
    ...
  ]
}]
Now you can query based on the _id of the post (1) and get the comments array that belongs to that specific post.
If you just keep the comment's ID inside the post, you have to query both collections, which I don't think is a good idea.
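For illustration, a minimal sketch using a Mongoose model built from the Post schema in the question (the model name is an assumption):

// assumes: const PostModel = mongoose.model('Post', Post);
PostModel.findById(postId, function (err, post) {
  if (err) throw err;
  console.log(post.comments); // embedded comments arrive with the post, no second query
});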
EDIT:
If you keep the post ID inside each comment record, it will help you track which comment belongs to which post, i.e. when you want to query the comments collection by post ID and you only need fields from the comment records; see the sketch below.
But I think the main use case will be finding which comments a post contains. Keeping comments inside the post gives you the comment fields as well as the fields from the post record.
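A sketch of that reference-based query (assumes a separate CommentModel whose schema carries a post_id field, as the question suggested for Postgres):

// assumes: const CommentModel = mongoose.model('Comment', Comment);
CommentModel.find({ post_id: postId }, 'body', function (err, comments) {
  // only fields from the comment records; the post itself is never loaded
});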
So it totally depends on your requirements and how you want to design your data structure.
The document I am working on is extremely large. It collects user input from an extremely long survey (like SurveyMonkey) and stores the answers in a MongoDB database.
I am, unsurprisingly, getting the following error:
Error: Document exceeds maximal allowed bson size of 16777216 bytes
If I cannot change the fields in my document is there anything I can do? Is there some way to compress down the document, by removing white space or something like that?
Edit
Here is the structure of the document
Schema({
  id: { type: Number, required: true },
  created: { type: Date, default: Date.now },
  last_modified: { type: Date, default: Date.now },
  data: { type: Schema.Types.Mixed, required: true }
});
An example of the data field:
{
  id: 65,
  question: {
    test: "some questions",
    answers: [2,5,6]
  }
  // there could be thousands of these question objects
}
One thing you can do is build your own MongoDB :-). MongoDB is open source, and the limit on document size is rather arbitrary, there to encourage better schema design. You could just modify this line and build it yourself. Be careful with this.
The most straightforward idea is to store each small question in a separate document, with a field that references its parent.
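A minimal sketch of that idea (the schema and names here are assumptions, not your existing code):

var mongoose = require('mongoose');

var responseSchema = new mongoose.Schema({
  survey_id: { type: Number, required: true, index: true }, // reference to the parent survey
  question_id: { type: Number, required: true },
  question: mongoose.Schema.Types.Mixed // the single question/answer payload
});
var Response = mongoose.model('Response', responseSchema);

// reassemble a whole survey at the application level:
Response.find({ survey_id: 65 }, function (err, docs) {
  // combine docs back into the full survey here
});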
Another idea is to limit the number of elements in the parent. Let's say your limit is N elements; then the parent looks like this:
{
  _id: ObjectId(),
  id: { type: Number, required: true },
  created: { type: Date, default: Date.now },       // you can store it only for the first element
  last_modified: { type: Date, default: Date.now }, // the same here
  data: [{
    id: 65,
    question: {
      test: "some questions",
      answers: [2,5,6]
    }
  }, ... up to N of such things {}
  ]
}
This way, by adjusting the number N, you can make sure each document stays within the 16 MB BSON limit. Then, to read the whole survey, you can run
db.coll.find({id: the Id you need}) and combine the pieces at the application level. Also do not forget to ensureIndex on id.
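In the mongo shell, that index and lookup would look something like this (db.coll is a placeholder name):

db.coll.ensureIndex({ id: 1 })      // index the survey id used for the lookup
db.coll.find({ id: 65 }).toArray()  // fetch every N-element chunk, then merge app-side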
Try different things, do a benchmark on your data and see what works for you.
You should be using GridFS. It allows you to store data larger than the document limit by splitting it into chunks. Here's the link: http://docs.mongodb.org/manual/reference/gridfs/
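A hedged sketch of storing an oversized survey as a GridFS file with the Node.js driver's GridFSBucket (the database name, bucket name, and URL are assumptions):

const { MongoClient, GridFSBucket } = require('mongodb');

MongoClient.connect('mongodb://localhost:27017', function (err, client) {
  if (err) throw err;
  const db = client.db('surveys');                                  // assumed db name
  const bucket = new GridFSBucket(db, { bucketName: 'surveyData' }); // assumed bucket name

  // serialize the big survey and stream it in; GridFS splits it into 255 KB chunks
  const payload = JSON.stringify({ id: 65, questions: [/* thousands of answers */] });
  const upload = bucket.openUploadStream('survey-65.json');
  upload.end(Buffer.from(payload), function () {
    client.close();
  });
});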