Currently I have two models in my application - for users and comments. The simplified structure is as follows:
User
{
id : "01",
username : "john"
}
Comment
{
id : "001",
body : "this is the comment"
}
Now I would like to associate users with their comments. Coming from the SQL world, the first thing that comes to mind is simply adding a user_id field to the comment document and then using a JOIN, but I guess that's not an optimal solution in terms of efficiency.
The other solution could be to embed the comments in the user's document:
{
id : "01",
username : "john",
comments : [
{
id : "001",
body : "this is the comment"
}
]
}
But I'm going to query for comments very often, e.g. when showing all comments from the past 24 or 48 hours. And alongside each comment, I want to display the username.
I could of course add a username field to the comment document. But then the username is stored in two places: in the users collection and in the comments collection.
What is the best approach here?
If you have a very large number of comments per user, embedding those comments as sub-documents of the user document is not a good option, and it will not improve efficiency. The better option is to put the user_id in the comments collection and create an index on that field.
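A minimal sketch of that referenced layout, using plain in-memory arrays as a stand-in for the two collections so it runs without a database. The createdAt field, the sample data, and the recentCommentsWithUsernames helper are assumptions made for illustration. In MongoDB itself the index would be created with something like db.comments.createIndex({ user_id: 1 }).

```javascript
// In-memory stand-ins for the users and comments collections (illustrative data).
const users = [
  { id: "01", username: "john" },
  { id: "02", username: "anna" },
];

const HOUR = 60 * 60 * 1000;
// Fixed "now" so the example is deterministic.
const now = Date.parse("2024-01-02T00:00:00Z");

// Each comment references its author through user_id; createdAt is an assumed field.
const comments = [
  { id: "001", user_id: "01", body: "this is the comment", createdAt: now - 2 * HOUR },
  { id: "002", user_id: "02", body: "another comment", createdAt: now - 30 * HOUR },
  { id: "003", user_id: "01", body: "an old comment", createdAt: now - 72 * HOUR },
];

// Return comments newer than `hours`, each with the author's username attached.
// The id -> username lookup is built once, so each user is resolved a single time.
function recentCommentsWithUsernames(hours, currentTime) {
  const cutoff = currentTime - hours * HOUR;
  const usernameById = new Map(users.map((u) => [u.id, u.username]));
  return comments
    .filter((c) => c.createdAt >= cutoff)
    .map((c) => ({ ...c, username: usernameById.get(c.user_id) }));
}

console.log(recentCommentsWithUsernames(24, now).map((c) => c.username));
console.log(recentCommentsWithUsernames(48, now).map((c) => c.id));
```

The username is stored only once here, at the cost of a second lookup in application code when rendering.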
There is a lot of content about which kind of relationship to use in a database schema. However, I have not seen anything about mixing both techniques.
The idea is to embed only the necessary attributes, together with a reference. This way the application has the data it needs for rendering and the reference it needs for updates.
The problem I see is that the logic for handling CRUD operations becomes trickier, because it is mandatory to update multiple collections; on the other hand, I get all the information in a single read.
Basic schema for a page that only wants the names of the students in a classroom:
CLASSROOM COLLECTION
{"_id": ObjectID(),
"students": [{"studentId" : ObjectID(),
"name" : "John Doe",
},
...
]
}
STUDENTS COLLECTION
{"_id": ObjectID(),
"name" : "John Doe",
"address" : "...",
"age" : "...",
"gender": "..."
}
I use the students collection on a different page, and there I do not want any information about the classroom. That is the reason not to embed the students.
I started learning MongoDB a few days ago and I don't know whether this kind of schema causes problems.
You can embed some fields and store other fields in a different collection as you are suggesting.
The issues with such an arrangement in my opinion would be:
What is the authority for a field? For example, what if a field like name is both embedded and stored in the separate collection, and the values differ?
Both updating and querying become awkward as you need to do it differently depending on which field is being worked with. If you make a mistake and go in the wrong place, you create/compound the first issue.
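To make the two issues concrete, here is a rough in-memory sketch of what a rename has to do under the mixed schema. Arrays stand in for the collections, the string ids replace ObjectID values, and renameStudent is a hypothetical helper. It treats the students collection as the authority and then fans the change out to every embedded copy.

```javascript
// In-memory stand-ins for the two collections (illustrative data, string ids).
const students = [
  { _id: "s1", name: "John Doe", address: "...", age: "...", gender: "..." },
];

const classrooms = [
  { _id: "c1", students: [{ studentId: "s1", name: "John Doe" }] },
  {
    _id: "c2",
    students: [
      { studentId: "s1", name: "John Doe" },
      { studentId: "s2", name: "Jane Roe" },
    ],
  },
];

// Rename a student in the authoritative collection first, then update every
// embedded copy. Returns the number of embedded copies that were touched.
function renameStudent(studentId, newName) {
  const student = students.find((s) => s._id === studentId);
  if (!student) return 0; // nothing to rename, so no copies are changed
  student.name = newName;

  let updatedCopies = 0;
  for (const classroom of classrooms) {
    for (const entry of classroom.students) {
      if (entry.studentId === studentId) {
        entry.name = newName;
        updatedCopies += 1;
      }
    }
  }
  return updatedCopies;
}

const updated = renameStudent("s1", "Jonathan Doe");
console.log(updated, classrooms[1].students.map((s) => s.name));
```

If any code path updates only one of the two places, the copies diverge, which is exactly the authority problem described above.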
I've got these four tables:
The point is that users who have joined a particular group have access to a survey during a time interval, from one date to another. How should I organize the collection structure of such a database in MongoDB?
For surveys and questions this would be a simple collection of surveys, each with an array of questions. But it is not clear to me how to store the start/end behavior of a survey.
What about something like this:
Groups
{
_id : "group1",
"members" : [{"name":"A"...},{"name":"B"...}],
"surveys" : [{"surveyId":"survey1", "startDate": ISODate(),"endDate":ISODate()},{"surveyId":"survey2", "startDate": ISODate(),"endDate":ISODate()}]
}
Surveys
{
_id : "survey1",
questions : [{"text":"Atheist??"...},{....}]
}
Honestly, it depends on which pattern you want to use; you could also embed the groups inside the survey, along with the registration details.
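As a sketch of how the Groups layout above could answer "which surveys can this user take right now", here is an in-memory version. The concrete dates, the member names, and the openSurveysFor helper are assumptions for illustration; the dates replace the ISODate() placeholders.

```javascript
// In-memory stand-in for the Groups collection, with assumed concrete dates.
const groups = [
  {
    _id: "group1",
    members: [{ name: "A" }, { name: "B" }],
    surveys: [
      {
        surveyId: "survey1",
        startDate: new Date("2024-01-01T00:00:00Z"),
        endDate: new Date("2024-01-31T00:00:00Z"),
      },
      {
        surveyId: "survey2",
        startDate: new Date("2024-03-01T00:00:00Z"),
        endDate: new Date("2024-03-15T00:00:00Z"),
      },
    ],
  },
];

// Return the ids of all surveys that are open at time `at` for a member,
// looking only at groups the member belongs to.
function openSurveysFor(memberName, at) {
  const open = [];
  for (const group of groups) {
    if (!group.members.some((m) => m.name === memberName)) continue;
    for (const survey of group.surveys) {
      // Date objects compare by their timestamp with <= and >=.
      if (survey.startDate <= at && at <= survey.endDate) {
        open.push(survey.surveyId);
      }
    }
  }
  return open;
}

console.log(openSurveysFor("A", new Date("2024-01-15T00:00:00Z")));
console.log(openSurveysFor("A", new Date("2024-02-15T00:00:00Z")));
```

Keeping the interval on the group's survey entry means the same survey can be open at different times for different groups.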
I am using MongoDB and I ended up with two Collections (unintentionally).
The first Collection (sample) has 100 million records (Tweets) with the following structure:
{
"_id" : ObjectId("515af34297c2f607b822a54b"),
"text" : "bla bla ",
"id" : NumberLong("314965680476803072"),
"user" :
{
"screen_name" : "TheFroooggie",
"time_zone" : "Amsterdam",
},
}
The second collection (users) has 30 million records of unique users from the tweet collection, and it looks like this:
{ "_id" : "000000_n", "target" : 1, "value" : { "count" : 5 } }
where the _id in the users collection is the user.screen_name from the tweets collection, target is the user's status (spammer or not), and value.count is the number of times the user appears in the first collection (sample), i.e. the number of captured tweets.
Now I'd like to make the following query:
I'd like to return all the documents from the sample collection (tweets) where the user has the target value = 1
In other words, I want to return all the tweets of all the spammers for example.
As you receive the tweets, you could upsert them into a collection, using the author information as the key in the "query" document portion of the update. The update document could use the $addToSet operator to put the tweet into a tweets array. You'll end up with a collection that has the author and an array of tweets. You can then do your spammer classification for each author and have their associated tweets.
So, you would end up doing something like this:
db.samples.update({"author":"joe"},{$addToSet:{"tweets":{"tweet_id":2}}},{upsert:true})
This approach does have the likely drawback of growing the document past its initially allocated size on disk which means it would be moved and expanded on disk. You would likely incur some penalty for index updating as well.
You could also take an approach of storing a spam rating with each tweet document and later pulling those based on user id.
As others have pointed out, there is nothing wrong with setting up the appropriate indexes and using a cursor to loop through your users pulling their tweets.
The approach you choose should be based on your intended access pattern. It sounds like you are in a good place where you can experiment with several different possible solutions.
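The "loop through your users and pull their tweets" option can be sketched in memory as two steps: collect the spammer screen names, then filter the tweets by them. The arrays below stand in for the two collections, the second and third tweets are made-up sample data, the tweet ids are plain strings instead of NumberLong, and tweetsBySpammers is a hypothetical helper. Against the real collections this would roughly correspond to reading the matching _id values from users and querying sample with $in on user.screen_name, ideally backed by an index on that field; that mapping is an assumption and has not been tested at the sizes mentioned above.

```javascript
// In-memory stand-in for the users collection (target 1 = spammer).
const usersColl = [
  { _id: "TheFroooggie", target: 1, value: { count: 2 } },
  { _id: "000000_n", target: 0, value: { count: 1 } },
];

// In-memory stand-in for the sample (tweets) collection.
const sample = [
  { text: "bla bla ", id: "314965680476803072", user: { screen_name: "TheFroooggie", time_zone: "Amsterdam" } },
  { text: "hello", id: "2", user: { screen_name: "000000_n", time_zone: "London" } },
  { text: "buy now", id: "3", user: { screen_name: "TheFroooggie", time_zone: "Amsterdam" } },
];

// Step 1: collect the screen names of all users with the given target value.
// Step 2: keep only the tweets whose author is in that set.
function tweetsBySpammers(targetValue) {
  const spammers = new Set(
    usersColl.filter((u) => u.target === targetValue).map((u) => u._id)
  );
  return sample.filter((t) => spammers.has(t.user.screen_name));
}

console.log(tweetsBySpammers(1).map((t) => t.id));
```

With tens of millions of users, the set of screen names would likely have to be processed in batches rather than in one pass.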
I have a requirement where I need to get the friends of a user. I have made two collections named User and Friends.
The code that I use to access the data from Friends and User is:
var friend = Friends.find({acceptor:req.currentUser.id,status:'0'},function(err, friends) {
console.log('----------------friends-----------------'+friends.length);
});
console.log gives me the desired results for friends. But if I then use friend to access the User data as shown below, I do not get the result I need.
var user = User.find({_id:friend.requestor},function(err, users) {
console.log('----------------user-----------------'+users.length);
});
How can I join the two queries to get the desired result? Please help.
I'd suggest you try to denormalize the data instead of going down the SQL path:
User {
"FirstName" : "Jack",
"LastName" : "Doe",
// ...
// no friend info here
}
Put denormalized information in the list of friends. Don't use an embedded array, because you probably don't need to fetch all friend ids every time you fetch a user. The details of the data structure depend on the relations you want to support (directed vs. undirected, etc.), but it would roughly look like this:
FriendList {
OwnerUserId : ObjectId("..."),
FriendUserId : ObjectId("..."),
FriendName: "Jack Doe"
// add more denormalized information
}
Now, to display the list of friends of a user:
var friends = db.FriendList.find({"OwnerUserId" : currentUserId});
The downside is that, if a friend changes her name, you'll have to update all references to that name. On the other hand, that logic is trivial, and the (typically much more common) query "fetch all friends" is super fast, easy to code, and easy to page.
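To illustrate the "easy to page" point, here is a small in-memory sketch. The array stands in for the FriendList collection, the string ids and extra rows are made-up sample data, and friendsPage is a hypothetical helper. In the mongo shell the same idea would be expressed with find(...).sort(...).skip(...).limit(...).

```javascript
// In-memory stand-in for the FriendList collection (illustrative data, string ids).
const friendList = [
  { OwnerUserId: "u1", FriendUserId: "u2", FriendName: "Jack Doe" },
  { OwnerUserId: "u1", FriendUserId: "u3", FriendName: "Ann Lee" },
  { OwnerUserId: "u1", FriendUserId: "u4", FriendName: "Bob Ray" },
  { OwnerUserId: "u2", FriendUserId: "u1", FriendName: "Sam Poe" },
];

// Return one page of friend names for a user, sorted by name.
// `page` is zero-based. No second lookup is needed because the display
// name is denormalized into each FriendList entry.
function friendsPage(ownerUserId, page, pageSize) {
  return friendList
    .filter((f) => f.OwnerUserId === ownerUserId) // filter returns a copy, so sort is safe
    .sort((a, b) => a.FriendName.localeCompare(b.FriendName))
    .slice(page * pageSize, (page + 1) * pageSize)
    .map((f) => f.FriendName);
}

console.log(friendsPage("u1", 0, 2));
console.log(friendsPage("u1", 1, 2));
```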
Let's take a simple example, a blog post. I would store comments to a particular post within the same document.
messages = { '_id' : ObjectId("4cc179886c0d49bf9424fc74"),
'title' : 'Hello world',
'comments' : [ { 'user_id' : ObjectId("4cc179886c0d49bf9424fc74"),
'comment' : 'hello to you too!'},
{ 'user_id' : ObjectId("4cc1a1830a96c68cc67ef14d"),
'comment' : 'test!!!'},
]
}
The question is, would it make sense to store the username instead of the user's objectid aka primary key? There are pros/cons to both, pro being that if I display the username within the comment, I wouldn't have to run a second query. Con being if "John Doe" decides to modify his username, I would need to run a query across my entire collection to change his username within all comments/posts.
What's more efficient?
I would store both fields. This way, you only run one query in the most common case (displaying the comments). Changing a username is really rare, so you will not have to update very often.
I would keep user_id because I don't like to use a natural field like username as the primary key, and matching on an ObjectId should be faster.
Of course, it really depends on how much traffic you're going to get, how many comments you expect to have, etc… But it's likely that “do the simplest thing that works” is your friend here: it's simpler to store only the user_id, so do that until it doesn't work any more (eg, because you've got a post with 100,000 comments that takes 30 seconds to render), then denormalize and store the username along with the comments.