mongodb - add column to one collection find based on value in another collection - mongodb

I have a posts collection which stores posts related info and author information. This is a nested tree.
Then I have a postrating collection which stores which user has rated a particular post up or down.
When a request is made to get a nested tree for a particular post, I also need to return whether the current user has voted and, if so, whether the vote was up or down, for each of the posts being returned.
In SQL this would be something like "posts.*, postrating.vote from posts join postrating on postID and postrating.memberID=currentUser".
I know MongoDB does not support joins. What are my options with MongoDB?
use map reduce - performance for a simple query?
in the post document store the ratings - BSON size limit?
Get list of all required posts. Get list of all votes by current user. Loop on posts and if user has voted add that to output?
Is there any other way? Can this be done using aggregation?
NOTE: I started on MongoDB last week.

In MongoDB, the simplest way is probably to handle this with application-side logic and not to try this in a single query. There are many ways to structure your data, but here's one possibility:
user_document = {
name : "User1",
postsIhaveLiked : [ "post1", "post2" ... ]
}
post_document = {
postID : "post1",
content : "my awesome blog post"
}
With this structure, you would first query for the user's user_document. Then, for each post returned, you could check if the post's postID is in that user's "postsIhaveLiked" list.
The main idea with this is that you get your data in two steps, not one. This is different from a join, but based on the same underlying idea of using one key (in this case, the postID) to relate two different pieces of data.
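The second step described above can be sketched in application code. This is a minimal illustration that assumes both documents have already been fetched; the sample data, the helper name annotateWithLikes, and the likedByUser output field are made up for the example.

```javascript
// Hypothetical sample data shaped like the documents above.
const userDocument = { name: "User1", postsIhaveLiked: ["post1", "post2"] };
const posts = [
  { postID: "post1", content: "my awesome blog post" },
  { postID: "post3", content: "another post" },
];

// Annotate each post with whether this user liked it.
// likedByUser is an assumed field name, not part of the stored schema.
function annotateWithLikes(user, postList) {
  const liked = new Set(user.postsIhaveLiked); // fast membership checks
  return postList.map((post) => ({
    ...post,
    likedByUser: liked.has(post.postID),
  }));
}

const annotated = annotateWithLikes(userDocument, posts);
console.log(annotated);
```

Using a Set keeps the per-post check cheap even when the user has liked many posts.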
In general, try to avoid using map-reduce for performance reasons. And for this simple use case, aggregation is not what you want.

Related

how to join a collection and sort it, while limiting results in MongoDB

Let's say I have two collections in which each document may look like this:
Collection 1:
target:
_id,
comments:
[
{ _id,
message,
full_name
},
...
]
Collection 2:
user:
_id,
full_name,
username
I am paging through comments via $slice, let's say I take the first 25 entries.
From these entries I need the corresponding usernames, which I get from the second collection. What I want is to get the comments sorted by the username they reference. The problem is that I can't add the username to the comments, because usernames may change often and each change would require updating every target document that contains the old username.
I can only imagine one way to solve this: read out all of the full_names and query for them in the user collection. The result would be sortable, but it is not paged, so it takes a lot of resources with large documents.
Is there anything I am missing with this problem?
Thanks in advance
If comments are an embedded array, you will have to do work on the client side to sort the comments array unless you store it in sorted order. Your application requirements for username force you to either read out all of the usernames of the users who commented to do the sort, or to store the username in the comments and have (much) more difficult and expensive updates.
Sorting and pagination don't work unless you can return the documents in sorted order. You should consider a different schema where comments form a separate collection so that you can return them in sorted order and paginate them. Store the username in each comment to facilitate the sort on the MongoDB side. Depending on your application's usage pattern this might work better for you.
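As a rough sketch of what a paginated, server-sorted query on a separate comments collection might look like, the helper below only builds the query pieces. The field names targetId and username, the collection name comments, and the helper commentPageQuery are assumptions for illustration.

```javascript
// Build the pieces of a paginated, server-sorted query on a separate
// comments collection. targetId and username are assumed field names.
function commentPageQuery(targetId, page, pageSize = 25) {
  return {
    filter: { targetId: targetId },
    sort: { username: 1 },        // ascending by the username stored in each comment
    skip: (page - 1) * pageSize,  // pages are 1-based
    limit: pageSize,
  };
}

const q = commentPageQuery("target1", 3);
// In the mongo shell this would be used roughly as:
// db.comments.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit)
console.log(q);
```

An index covering the filter and sort fields would normally be needed for this to stay fast on large collections.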
It also seems strange to sort on usernames and expect/allow usernames to change frequently. If you could drop these requirements it'd make your life easier :D

Query MongoDB from a Redis list

Suppose I keep lists of user posts in Redis. For example, a user has 1000 posts; the post documents are stored in MongoDB, but the link between the user and the posts is stored in Redis. I can retrieve the array containing all the ids of a user's posts from Redis, but what is an efficient way to retrieve the documents from MongoDB?
Do I pass the array of ids as a parameter to MongoDB, and Mongo will fetch those documents for me?
I can't seem to find any documentation on this. Is anyone willing to help me out?
thanks in advance!
To retrieve a number of documents per id, you can use the $in operator to build the MongoDB query. See the following section from the documentation:
http://docs.mongodb.org/manual/reference/operator/query/in/#op._S_in
For instance you can build a query such as:
db.mycollection.find( { _id : { $in: [ id1, id2, id3, .... ] } } )
Depending on how many ids are returned by Redis, you may have to group them into batches of n items (n=100 for instance) and run several MongoDB queries. IMO, it is bad practice to build a query containing more than a few thousand ids. It is better to have smaller queries and accept paying for the extra roundtrips.
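The batching idea can be sketched as follows. The helper names chunkIds and buildInFilters and the generated ids are hypothetical; each resulting filter would be passed to a separate find() call.

```javascript
// Split an array of ids into batches of at most batchSize items.
function chunkIds(ids, batchSize) {
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Build one $in filter per batch; each filter is meant for its own find() call.
function buildInFilters(ids, batchSize = 100) {
  return chunkIds(ids, batchSize).map((batch) => ({ _id: { $in: batch } }));
}

// Stand-in for the list of ids read from Redis.
const idsFromRedis = Array.from({ length: 250 }, (_, i) => "id" + i);
const filters = buildInFilters(idsFromRedis, 100);
console.log(filters.length); // 3 batches: 100, 100 and 50 ids
```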

Mongoid: retrieving documents whose _id exists in another collection

I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example and not an actual use case, so let's assume that duplicates are fine in the name field of Course. Let's think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in the Course, with user_id holding the _id of the User. Note that it is stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries: first run a query to get the user_id field from the Course collection for the set of courses, then query the User collection using $in with the user ids retrieved by the first query. But this may not work well if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
What you are describing is a typical SQL join, but that is not possible in MongoDB. As you already suggested, you can do it with two separate queries.
There is one more way to handle it. It's not exactly a solution, but it is a valid workaround in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the user collection's fields inside the course collection as an embedded field.
Course : {
  _id : 'xx',
  name : 'yy',
  user : {
    fname : 'r',
    lname : 'v',
    pic : 's'
  }
}
This is a good approach if the subset of fields you intend to retrieve from the user collection is small. You might be wondering about the redundant user data stored in the course collection, but that is exactly what makes MongoDB powerful. It's a one-time insert, and your queries will be a lot faster.
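A small sketch of building such a denormalized course document in application code, before inserting it. The helper names and the sample user (including its email field) are hypothetical; only fname, lname and pic come from the example above.

```javascript
// Copy only the frequently read user fields into the embedded summary.
function pickUserSummary(user) {
  const { fname, lname, pic } = user;
  return { fname, lname, pic };
}

// Build the course document with the embedded user summary.
function buildCourseDocument(course, user) {
  return { _id: course._id, name: course.name, user: pickUserSummary(user) };
}

// Hypothetical full user document with extra fields that are not copied.
const fullUser = { _id: "u1", fname: "r", lname: "v", pic: "s", email: "r@example.com" };
const courseDoc = buildCourseDocument({ _id: "xx", name: "yy" }, fullUser);
console.log(courseDoc);
```

The trade-off is that a change to one of the copied user fields has to be propagated to every course document that embeds it.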

Mongodb map reduce across 2 collection

Let's say we have a user collection and a post collection. In the post collection, the vote field stores user names as keys.
db.user.insert({name:'a', age:12});
db.user.insert({name:'b', age:12});
db.user.insert({name:'c', age:22});
db.user.insert({name:'d', age:22});
db.post.insert({Title:'Title1', vote:['a']});
db.post.insert({Title:'Title2', vote:['a','b']});
db.post.insert({Title:'Title3', vote:['a','b','c']});
db.post.insert({Title:'Title4', vote:['a','b','c','d']});
We would like to group by post.Title and count the votes per user age.
> {_id:'Title1', value:{ ages:[{age:12, Count:1},{age:22, Count:0}]} }
> {_id:'Title2', value:{ ages:[{age:12, Count:2},{age:22, Count:0}]} }
> {_id:'Title3', value:{ ages:[{age:12, Count:2},{age:22, Count:1}]} }
> {_id:'Title4', value:{ ages:[{age:12, Count:2},{age:22, Count:2}]} }
I have searched and haven't found a way to access two collections in a MongoDB map-reduce.
Would it be possible to achieve this in a re-reduce?
I know it would be much simpler to embed the user document in the post, but that is not a nice solution because the real user document has many properties. If we include a simplified version of the user document, it will limit the dimensions available for analysis.
{Title:'Title1', vote:[{name:'a', age:12}]}
MongoDB does not have a multi-collection Map / Reduce. MongoDB does not have any JOIN syntax and may not be very good for ad-hoc joins. You will need to denormalize this data in some way.
You have a few options:
Option #1: Embed the age with the vote.
{Title:'Title1', vote:[{name:'a', age:12}]}
Option #2: Keep a counter of the ages
{Title:'Title1', vote:[a, b], age: { "12" : 1, "22" : 1 }}
Option #3: Do a "manual" join
Your last option is to write script/code that does a for loop over both collections and merges the data correctly.
So you would loop over post and output a collection with the title and the list of votes. Then you would loop through the new collection and update the ages by looking up each user.
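The "manual" join can be sketched in plain application code as below. It assumes both collections have already been read into arrays; the function name countVotesByAge is hypothetical, and the sample data mirrors the inserts from the question.

```javascript
// Sample data mirroring the user and post collections from the question.
const users = [
  { name: "a", age: 12 },
  { name: "b", age: 12 },
  { name: "c", age: 22 },
  { name: "d", age: 22 },
];
const posts = [
  { Title: "Title1", vote: ["a"] },
  { Title: "Title2", vote: ["a", "b"] },
  { Title: "Title3", vote: ["a", "b", "c"] },
  { Title: "Title4", vote: ["a", "b", "c", "d"] },
];

// For each post, look up each voter's age and count votes per age.
function countVotesByAge(userDocs, postDocs) {
  const ageByName = new Map(userDocs.map((u) => [u.name, u.age]));
  const allAges = [...new Set(userDocs.map((u) => u.age))].sort((x, y) => x - y);
  return postDocs.map((post) => {
    const counts = new Map(allAges.map((age) => [age, 0]));
    for (const name of post.vote) {
      const age = ageByName.get(name);
      if (age !== undefined) counts.set(age, counts.get(age) + 1); // skip unknown voters
    }
    return {
      _id: post.Title,
      value: { ages: allAges.map((age) => ({ age, Count: counts.get(age) })) },
    };
  });
}

const result = countVotesByAge(users, posts);
console.log(JSON.stringify(result[2])); // Title3: two votes at age 12, one at age 22
```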
My suggestion
Go with #1 or #2.
Instead of
{name:'a', age:12}
It is easier to add a new field to the user document and maintain it on each vote update. You can then still use map-reduce to analyze your data.
{name:'a', age:12, voteTitle:["Title1","Title2","Title3","Title4"]}

Nested Comments in MongoDB

I'm quite new to MongoDB and trying to build a nested comment system with it.
On the net you can find various document structures to achieve that, but I'm looking for proposals that would let me easily do the following things with the comments:
Mark comments as spam/approved and retrieve comments by these attributes
Retrieve comments by user
Retrieve comment count for an object/user
Besides of course displaying the comments as it is normally done. If you have any suggestions on how to handle these things with MongoDB - or - tell me to look for an alternative it'd be appreciated much!
Have you considered storing the comments in all documents that need a reference to them? If you have a document for the user, store all of that user's comments in it. If you have a separate document for objects, store all comments there also. It feels sort of wrong after coming from a relational world where you try to have exactly one copy of a given piece of data, and then reference it by ID, but even with relational databases you have to start duplicating data if you want queries to run quickly.
With this design, each document that you load would be "complete". It would have all the data you need, and indexes on that collection would keep reads fast. The price would be slightly slower writes, and more of a headache when you need to update the comment text, since you need to update more than one document.
Because you need to retrieve comments by attributes, by user, and so on, you can't embed the comments in each object that users can comment on (even though embedding is usually faster for document databases). So you need to create a separate collection for the comments. I suggest the following structure:
comment
{
_id : ObjectId,
status: int (spam =1, approved =2),
userId: ObjectId,
commentedObjectId: ObjectId,
commentedObjectType: int(for example question =1, answer =2, user =3),
commentText
}
With the above structure you can easily do the things you want:
// Mark comments as spam/approved and retrieve comments by these attributes
// Mark a specific comment as spam ($set changes only the status field)
db.comments.update({ _id: someCommentId }, { $set: { status: 1 } });
db.comments.find({ status: 1 }); // get all comments marked as spam
// Retrieve comments by user
db.comments.find({ userId: someUserId });
// Retrieve comment count for an object/user
db.comments.find({ commentedObjectId: someId, commentedObjectType: 1 }).count();
Also, for comment counting I suppose it would be better to create an extra field in each object and increment or decrement it when a comment is added or deleted.
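One possible way to express that counter maintenance is with the $inc update operator. The sketch below only builds the filter and update documents; the field name commentCount, the helper name, and the questions collection in the comment are assumptions.

```javascript
// Build the filter and update used to adjust the counter on the commented object.
// commentCount is a hypothetical field name on that object.
function commentCounterUpdate(commentedObjectId, delta) {
  return {
    filter: { _id: commentedObjectId },
    update: { $inc: { commentCount: delta } },
  };
}

const onAdd = commentCounterUpdate("obj1", 1);     // after inserting a comment
const onDelete = commentCounterUpdate("obj1", -1); // after deleting a comment
// In the mongo shell, roughly: db.questions.update(onAdd.filter, onAdd.update)
console.log(onAdd, onDelete);
```

Reading the count then becomes a single-document lookup instead of a count() over the comments collection.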