Nested Comments in MongoDB - mongodb

I'm quite new to MongoDB and trying to build a nested comment system with it.
There are various document structures for this on the net, but I'm looking for proposals that would let me easily do the following with the comments:
Mark comments as spam/approved and retrieve comments by these attributes
Retrieve comments by user
Retrieve comment count for an object/user
Besides, of course, displaying the comments as is normally done. If you have any suggestions on how to handle these things with MongoDB, or think I should look for an alternative, I'd much appreciate it!

Have you considered storing the comments in all documents that need a reference to them? If you have a document for the user, store all of that user's comments in it. If you have a separate document for objects, store all comments there also. It feels sort of wrong after coming from a relational world where you try to have exactly one copy of a given piece of data, and then reference it by ID, but even with relational databases you have to start duplicating data if you want queries to run quickly.
With this design, each document that you load would be "complete". It would have all the data you need, and indexes on that collection would keep reads fast. The price would be slightly slower writes, and more of a headache when you need to update the comment text, since you need to update more than one document.
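To make that trade-off concrete, here is a minimal sketch using plain JavaScript objects in place of real collections. The document shapes and the names userDoc, objectDoc and updateCommentText are assumptions made up for illustration; they are not part of any MongoDB API.

```javascript
// Hypothetical in-memory stand-ins for two MongoDB documents that each
// embed their own copy of the same comment.
const userDoc = {
  _id: "u1",
  name: "alice",
  comments: [{ commentId: "c1", objectId: "o1", text: "First!" }],
};

const objectDoc = {
  _id: "o1",
  title: "Some article",
  comments: [{ commentId: "c1", userId: "u1", text: "First!" }],
};

// Editing the text means touching every document that holds a copy.
function updateCommentText(docs, commentId, newText) {
  let updatedCopies = 0;
  for (const doc of docs) {
    for (const comment of doc.comments) {
      if (comment.commentId === commentId) {
        comment.text = newText;
        updatedCopies += 1;
      }
    }
  }
  return updatedCopies;
}

const touched = updateCommentText([userDoc, objectDoc], "c1", "First! (edited)");
console.log(touched); // 2: one copy in the user document, one in the object document
```

Reads stay cheap because each document already carries its comments; the cost shows up as the second write on every edit.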

Because you need to retrieve comments by certain attributes, by user, and so on, you can't simply embed the comments in each object that users can comment on (even though embedding is usually faster to read in document databases). So you need to create a separate collection for the comments. I suggest the following structure:
comment
{
_id : ObjectId,
status: int (spam =1, approved =2),
userId: ObjectId,
commentedObjectId: ObjectId,
commentedObjectType: int(for example question =1, answer =2, user =3),
commentText
}
With the above structure you can easily do the things you want:
//Mark comments as spam/approved and retrieve comments by these attributes
//mark a specific comment as spam ($set changes only the status field; without it the whole document would be replaced)
db.comments.update( { _id: someCommentId }, { $set: { status: 1 } });
db.comments.find({status : 1});// get all comments marked as spam
//Retrieve comments by user
db.comments.find({'userId' : someUserId});
//Retrieve comment count for an object/user
db.comments.find({'commentedObjectId' : someId,'commentedObjectType' : 1 })
.count();
Also, for comment counting I suppose it would be better to add an extra field to each object and increment it when a comment is added (and decrement it when one is deleted).
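A rough sketch of that counter idea, with an in-memory array and Map standing in for the comments collection and the commented objects. The field name commentCount and the helpers addComment and deleteComment are hypothetical; against a real database the counter change would typically be an update using the $inc operator.

```javascript
// Hypothetical in-memory stand-ins for the "comments" collection and for
// the commented objects; a real application would issue MongoDB updates.
const comments = [];
const objects = new Map([["q1", { _id: "q1", commentCount: 0 }]]);

function addComment(comment) {
  comments.push(comment);
  // Same idea as an update with { $inc: { commentCount: 1 } }.
  objects.get(comment.commentedObjectId).commentCount += 1;
}

function deleteComment(commentId) {
  const index = comments.findIndex((c) => c._id === commentId);
  if (index === -1) return false;
  const [removed] = comments.splice(index, 1);
  // Same idea as an update with { $inc: { commentCount: -1 } }.
  objects.get(removed.commentedObjectId).commentCount -= 1;
  return true;
}

addComment({ _id: "c1", status: 2, userId: "u1", commentedObjectId: "q1", commentedObjectType: 1, commentText: "nice" });
addComment({ _id: "c2", status: 1, userId: "u2", commentedObjectId: "q1", commentedObjectType: 1, commentText: "buy now" });
deleteComment("c2");

// The count is read straight from the object, with no query over comments.
console.log(objects.get("q1").commentCount); // 1
```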

Related

How does querying nested data work? Can I still retrieve data from 1 level down?

I want to query data that is two levels down, however, would I still be able to retrieve data from its original node?
To explain better, my Firebase Database looks like:
posts
-192u3jdj0j9sj0
-message: haha this is funny (CAN I STILL GET THIS DATA)
-genre: comedy (CAN I STILL GET THIS DATA)
-author
-user: "jasonj"
-comment
-ajiwj2319j0jsf9d0jf
-comment: "lol"
-user: "David" (QUERY HERE****)
-jfaiwjfoj1ijifjojif
-comment: "so funny"
-user: "Toddy"
I essentially want to query by all of the comments David has posted. However, with how query works, can I still grab the original (message & genre) that was from "level 1"? Or would I have to restructure my data? Possibly rewriting the level 1 data under comment.
(End goal: something like Yahoo answers, where the user can see the questions he posted, as well as the questions to where he posted comments)
The code below works, but I'm not sure how to pull up the level 1 data, or if it's even possible:
ref = Database.database().reference().child("posts").child(myPost).child("comment")
var queryRef:DatabaseQuery
queryRef = ref.queryOrdered(byChild: "user").queryEqual(toValue: "David")
queryRef.observeSingleEvent(of: .value, with: { (snapshot) in
    if snapshot.childrenCount > 0 {
        // snapshot holds David's comments on this one post
    }
})
Your current data structure makes it easy to find the comments for a specific post. It does not however make it easy to find the comments from a specific author. The reason for that is that Firebase Database queries treat your content as a flat list of nodes. The value you want to filter on, must be at a fixed path under each node.
To allow finding the comments from a specific author, you'll want to add an additional node where you keep that information. For example:
"authorComments": {
"David": {
"-192u3jdj0j9sj0_-ajiwj2319j0jsf9d0jf": true
},
"Toddy": {
"-192u3jdj0j9sj0_-jfaiwjfoj1ijifjojif": true
}
}
This structure is often known as a reverse index, and it allows you to easily find the comment paths (I used a _ as the separator of path segments above) for a specific user.
This sort of data duplication is quite common when using NoSQL databases, as you often have to modify/expand your data structure to allow the use-cases that your app needs.
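As a rough illustration of the reverse index above, here is a plain JavaScript sketch that builds authorComments from the posts tree and then walks it back to the level 1 data (message and genre). The helper names buildAuthorComments and postsCommentedBy are made up for this example, and it assumes the post key itself contains no "_" so the combined key can be split on the first underscore; real keys may need a different separator.

```javascript
// Hypothetical copy of the "posts" tree from the question.
const posts = {
  "-192u3jdj0j9sj0": {
    message: "haha this is funny",
    genre: "comedy",
    author: { user: "jasonj" },
    comment: {
      "-ajiwj2319j0jsf9d0jf": { comment: "lol", user: "David" },
      "-jfaiwjfoj1ijifjojif": { comment: "so funny", user: "Toddy" },
    },
  },
};

// Build authorComments: user -> { "<postKey>_<commentKey>": true }.
function buildAuthorComments(postsTree) {
  const index = {};
  for (const [postKey, post] of Object.entries(postsTree)) {
    for (const [commentKey, c] of Object.entries(post.comment || {})) {
      if (!index[c.user]) index[c.user] = {};
      index[c.user][`${postKey}_${commentKey}`] = true;
    }
  }
  return index;
}

// Resolve one user's index entries back to the level 1 post data.
// Assumption: the post key contains no "_", so the first "_" is the separator.
function postsCommentedBy(user, index, postsTree) {
  return Object.keys(index[user] || {}).map((path) => {
    const postKey = path.slice(0, path.indexOf("_"));
    const post = postsTree[postKey];
    return { postKey, message: post.message, genre: post.genre };
  });
}

const authorComments = buildAuthorComments(posts);
const davidsPosts = postsCommentedBy("David", authorComments, posts);
console.log(davidsPosts);
```

In a live app the index would be written at the same time as the comment rather than rebuilt from the whole tree; the rebuild here just keeps the example self-contained.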
Also see my answers here:
Firebase Query Double Nested
Firebase query if child of child contains a value

how to join a collection and sort it, while limiting results in MongoDB

Let's say I have 2 collections, where each document may look like this:
Collection 1:
target:
_id,
comments:
[
{ _id,
message,
full_name
},
...
]
Collection 2:
user:
_id,
full_name,
username
I am paging through the comments via $slice; let's say I take the first 25 entries.
For these entries I need the corresponding usernames, which I get from the second collection. What I want is to get the comments sorted by the username they refer to. The problem is that I can't add the username to the comments, because usernames may change often, and if one does I would need to update every target document that contains the old username.
I can only imagine one way to solve this: read out all of the full_names and query them in the user collection. The result would be sortable, but it is not paged, so it takes a lot of resources with large documents.
Is there anything I am missing with this problem?
Thanks in advance
If comments are an embedded array, you will have to do work on the client side to sort the comments array unless you store it in sorted order. Your application requirements for username force you to either read out all of the usernames of the users who commented to do the sort, or to store the username in the comments and have (much) more difficult and expensive updates.
Sorting and pagination don't work unless you can return the documents in sorted order. You should consider a different schema where comments form a separate collection so that you can return them in sorted order and paginate them. Store the username in each comment to facilitate the sort on the MongoDB side. Depending on your application's usage pattern this might work better for you.
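A small sketch of what sorting and paging look like once each comment is its own record with the username stored on it. It uses an in-memory array rather than a real collection, and the sample data and the helper pageByUsername are hypothetical; in the mongo shell the same idea would be a find() followed by sort(), skip() and limit().

```javascript
// Hypothetical comments stored as separate records, each carrying the
// commenter's username so the sort needs no second lookup.
const comments = [
  { _id: 1, targetId: "t1", username: "carol", message: "third" },
  { _id: 2, targetId: "t1", username: "alice", message: "first" },
  { _id: 3, targetId: "t1", username: "dave", message: "fourth" },
  { _id: 4, targetId: "t1", username: "bob", message: "second" },
];

// Plain code-unit comparison keeps the ordering deterministic.
function byUsername(a, b) {
  if (a.username < b.username) return -1;
  if (a.username > b.username) return 1;
  return 0;
}

// pageNumber is 1-based; mirrors sort({username: 1}).skip(...).limit(...).
function pageByUsername(allComments, targetId, pageNumber, pageSize) {
  const sorted = allComments
    .filter((c) => c.targetId === targetId)
    .sort(byUsername);
  const start = (pageNumber - 1) * pageSize;
  return sorted.slice(start, start + pageSize);
}

console.log(pageByUsername(comments, "t1", 1, 2).map((c) => c.username)); // [ 'alice', 'bob' ]
console.log(pageByUsername(comments, "t1", 2, 2).map((c) => c.username)); // [ 'carol', 'dave' ]
```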
It also seems strange to sort on usernames and expect/allow usernames to change frequently. If you could drop these requirements it'd make your life easier :D

mongodb - add column to one collection find based on value in another collection

I have a posts collection which stores posts related info and author information. This is a nested tree.
Then I have a postrating collection which stores which user has rated a particular post up or down.
When a request is made to get a nested tree for a particular post, I also need to return whether the current user has voted and, if so, whether up or down, for each of the posts being returned.
In SQL this would be something like "posts.*, postrating.vote from posts join postrating on postID and postrating.memberID=currentUser".
I know MongoDB does not support joins. What are my options with MongoDB?
use map reduce - performance for a simple query?
in the post document store the ratings - BSON size limit?
Get list of all required posts. Get list of all votes by current user. Loop on posts and if user has voted add that to output?
Is there any other way? Can this be done using aggregation?
NOTE: I started on MongoDB last week.
In MongoDB, the simplest way is probably to handle this with application-side logic and not to try this in a single query. There are many ways to structure your data, but here's one possibility:
user_document = {
name : "User1",
postsIhaveLiked : [ "post1", "post2" ... ]
}
post_document = {
postID : "post1",
content : "my awesome blog post"
}
With this structure, you would first query for the user's user_document. Then, for each post returned, you could check if the post's postID is in that user's "postsIhaveLiked" list.
The main idea with this is that you get your data in two steps, not one. This is different from a join, but based on the same underlying idea of using one key (in this case, the postID) to relate two different pieces of data.
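The two-step approach could look roughly like this on the application side. The documents are the ones from the example above plus one extra post; the helper annotateWithLikes and the output field likedByUser are names invented for this sketch.

```javascript
// Step 1 result: the current user's document (shape taken from the example).
const userDocument = {
  name: "User1",
  postsIhaveLiked: ["post1", "post2"],
};

// Step 2 result: the posts being returned for the request.
const postDocuments = [
  { postID: "post1", content: "my awesome blog post" },
  { postID: "post3", content: "another post" },
];

// Application-side "join": flag each post the user has liked.
function annotateWithLikes(user, posts) {
  const liked = new Set(user.postsIhaveLiked); // O(1) membership checks
  return posts.map((post) => ({
    ...post,
    likedByUser: liked.has(post.postID),
  }));
}

const annotated = annotateWithLikes(userDocument, postDocuments);
console.log(annotated.map((p) => `${p.postID}:${p.likedByUser}`)); // [ 'post1:true', 'post3:false' ]
```

For up/down votes instead of likes, the user document could hold a map from postID to vote value, and the lookup would return that value rather than a boolean.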
In general, try to avoid using map-reduce for performance reasons. And for this simple use case, aggregation is not what you want.

Design MongoDb Schema For My Social

I'm new to MongoDB and just want to create a simple project to test its performance.
The project is like a simple CMS:
it has users, blogs and comments, and users can have friends.
So I designed my database like this:
user
{
_ID:
name:
birth_day:
sex:
friends:[id_1,Id_2]
}
blogs
{
title:
owner:
tags_fiends:
comments:
[
{"_id":"","content":"","date_created":""},
{"_id":"","content":"","date_created":""},
],
"like":["_id","_id"]
}
How many collections are needed for this database? Can I use one collection for both users and blogs? Thanks in advance.
Because MongoDB is a schema-less (schema-free) database, you can build any kind of structure within a document. The supported building blocks are:
individual elements
nested arrays
nested documents
There are a couple of things you have to consider during schema design, and they make it useful to keep the users and the blogs in separate collections. For example, if you store something in a nested array, you can index it to speed up searches within that array (a multikey index), but a compound index can include at most one array field per document. So if you store friends, blogs, posts, and tags all in arrays, a single compound index can cover only one of them.
It is also important to know that there is a size limit for each document, which is currently 16MB.
In your scenario, I would make Users a collection and reference it by _id from the blog collection.
In practise, you could make the Blogs an attribute of User, the only constraint being the max doc size of 16MB - but that's a lot of blogs (text).
To get round that (assuming you need to), a separate Blog collection referencing the user _id would be fine. You may need to denormalise the user name too if that's not your _id. This would mean you can get all the blogs for a user in a single query.
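A minimal sketch of the two-collection layout with the user name denormalised onto each blog, using in-memory arrays in place of collections. The field names ownerId and ownerName and the helpers blogsForUser and renameUser are assumptions for illustration only.

```javascript
// Hypothetical "users" collection.
const users = [
  { _id: "u1", name: "Ann", friends: ["u2"] },
  { _id: "u2", name: "Ben", friends: ["u1"] },
];

// Hypothetical "blogs" collection: each blog references its owner by _id
// and carries a denormalised copy of the owner's name.
const blogs = [
  { _id: "b1", title: "Hello", ownerId: "u1", ownerName: "Ann", comments: [] },
  { _id: "b2", title: "Again", ownerId: "u1", ownerName: "Ann", comments: [] },
  { _id: "b3", title: "Other", ownerId: "u2", ownerName: "Ben", comments: [] },
];

// One lookup returns every blog for a user, name included.
function blogsForUser(userId) {
  return blogs.filter((blog) => blog.ownerId === userId);
}

// The price of denormalising: a rename must refresh every copy.
function renameUser(userId, newName) {
  const user = users.find((u) => u._id === userId);
  if (!user) return 0;
  user.name = newName;
  let updatedBlogs = 0;
  for (const blog of blogs) {
    if (blog.ownerId === userId) {
      blog.ownerName = newName;
      updatedBlogs += 1;
    }
  }
  return updatedBlogs;
}

console.log(blogsForUser("u1").map((b) => b.title)); // [ 'Hello', 'Again' ]
console.log(renameUser("u1", "Anna")); // 2
```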

best possible schema design for log analysis database in mongodb

I have to store the following data in MongoDB: uid, gender, country, city, date_of_visit, url_of_visit.
I would like to store uid, gender, country and city in one collection, because this information will never change for a particular user.
In the other collection I would like to store uid, date_of_visit and url_of_visit.
I want to know the best practice for storing uid, date_of_visit and url_of_visit. There are two options in my mind:
(a) { uid: 100, date: xxxxxxxxxxxxxxx, url: abc.php }
{ uid: 100, date: xxxxxx, url: ref.php }
{ uid: 200, date: xxxxxxxxx, url: ref.php }
(b) { uid:100, visit:[{date:xxxxxxx, url:abc.php},
{date:xxxx, url:def.php},
{.........................}]}
I want to have the following index: date:1, uid:1, url:1. The problem with approach (a) is that with each row inserted, the database size and index size will grow, and there will come a point when the index no longer fits into RAM.
The problem with approach (b) is that at some point a document will exceed the 16 MB limit, and the approach will fail at that point.
Please suggest the best schema design for this scenario. I would also have queries that include uid, gender, country, date_of_visit and url_of_visit.
I know this thread is a bit older but I'm wondering if you've decided on a structure and if it works well.
My idea was, instead of risking to create too large documents, to structure it similar to your second approach but include the date in the main collection. This way each document would be the user's activity within one day. It would be indexed by user and date, easy to update and query and keep things organized.
Something like:
{ uid:100, date:xxxxxxx, event:[{time:xxxxxxx, url:abc.php},
{time:xxxx, url:def.php},
{.........................}]}
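A sketch of how raw visits could be grouped into one document per user per day, as in the structure above. It is plain JavaScript over an in-memory array; the sample visits, the ISO 8601 timestamp format and the helper bucketVisits are all assumptions for the example.

```javascript
// Hypothetical raw visits, one per page view, with ISO 8601 UTC timestamps.
const visits = [
  { uid: 100, timestamp: "2024-01-05T09:15:00Z", url: "abc.php" },
  { uid: 100, timestamp: "2024-01-05T17:40:10Z", url: "def.php" },
  { uid: 200, timestamp: "2024-01-05T11:00:00Z", url: "ref.php" },
  { uid: 100, timestamp: "2024-01-06T08:05:30Z", url: "ref.php" },
];

// Group visits into one document per (uid, date), each holding that
// day's events in an embedded array.
function bucketVisits(rawVisits) {
  const buckets = new Map();
  for (const { uid, timestamp, url } of rawVisits) {
    const date = timestamp.slice(0, 10); // "YYYY-MM-DD"
    const key = `${uid}|${date}`;
    if (!buckets.has(key)) {
      buckets.set(key, { uid, date, event: [] });
    }
    buckets.get(key).event.push({ time: timestamp.slice(11, 19), url }); // "HH:MM:SS"
  }
  return [...buckets.values()];
}

const dailyDocs = bucketVisits(visits);
console.log(dailyDocs.length); // 3
console.log(dailyDocs[0]);
```

Each document stays bounded by one user's activity in a single day, which is what keeps it far from the per-document size limit.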
I think the second approach is better than the first because it follows the idea of grouping similar data together. As for exceeding the 16MB document limit: you can reach it, but that would take a very active user. :)
You can also move some data out to another collection and reference it using an ObjectId or DBRef.
See more info http://www.mongodb.org/display/DOCS/Database+References#DatabaseReferences-DBRef
Your second approach will force you to fetch a huge amount of data from the embedded document, which cannot be filtered by Mongo. In other words, if you have a million documents stored inside the "event" field for a particular user, then when you fetch those embedded documents with dot notation, then the entire document including the parent will be returned. There's no way you can filter the results.
I would recommend the first approach which makes the data easier to retrieve and work with.