MongoDB Aggregate: Nested totals

I am trying to generate a report from our mongo db that tallies up the unique visits by country code per referral site. I'd like to use aggregation as I've heard it is quite fast and performance here is an issue.
We have an account db that has a country code and last referral site associated with each account.
{
account:"user123",
CountryCode:"CA",
ReferalSite:"Google",
lastLogin:"someisodate"
}
Conceptually, I can write the JavaScript in a few minutes:
For each unique visitor in the accounts db:
visits[visitor.country_code][visitor.referral_site] += 1;
Is this query possible with db.accounts.aggregate()? Or is map/reduce the better way to go about this?
Thanks in advance,

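You can run two $group stages one after another. The first collapses the documents to one per unique account, country code and referral site; the second counts those per country code and referral site:
db.collection.aggregate([
{$group: {_id: {account: '$account', CountryCode: '$CountryCode', ReferalSite: '$ReferalSite'}}},
{$group: {_id: {CountryCode: '$_id.CountryCode', ReferalSite: '$_id.ReferalSite'}, number: {$sum: 1}}}
])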

Related

mongodb mapreduce groupby twice

I am new to MongoDB and am trying to count how many distinct users log in per day from an existing collection. The data in the collection looks like the following:
[{
_id: xxxxxx,
properties: {
uuid: '4b5b5c2e208811e3b5a722000a97015e',
time: ISODate("2014-12-13T00:00:00Z"),
type: 'login'
}
}]
Due to my limited knowledge, what I have figured out so far is to group by day first, output the data to a tmp collection, and then run another map reduce on this tmp collection and output the result to a final collection. This solution makes my collections bigger, which I do not really like. Can anyone help me out, or point me to any good/more complex tutorials that I can follow? Thanks
Rather than a map reduce, I would suggest an Aggregation. You can think of an aggregation as somewhat like a linux pipe, in that you can pass the results of one operation to the next. With this strategy, you can perform 2 consecutive groups and never have to write anything to the database.
Take a look at this question for more details on the specifics.
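As a rough sketch of the two consecutive groups for the data shown in the question: the first $group collapses the logins to one document per (day, uuid) pair, and the second counts those pairs per day. The collection name `events` is an assumption, and $dateToString is used here only as one way to truncate properties.time to a day.

```javascript
// Pipeline sketch: distinct login users per day.
const distinctLoginsPerDay = [
  // Keep only login events.
  { $match: { "properties.type": "login" } },
  // First group: one document per (day, user) pair.
  { $group: { _id: {
      day: { $dateToString: { format: "%Y-%m-%d", date: "$properties.time" } },
      uuid: "$properties.uuid"
  } } },
  // Second group: count the distinct users for each day.
  { $group: { _id: "$_id.day", users: { $sum: 1 } } },
  // Return the days in chronological order.
  { $sort: { _id: 1 } }
];

// In the mongo shell (the collection name `events` is hypothetical):
// db.events.aggregate(distinctLoginsPerDay)
```

Nothing is written to a temporary collection; each stage feeds the next one directly.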

how to join a collection and sort it, while limiting results in MongoDB

Let's say I have 2 collections wherein each document may look like this:
Collection 1:
target:
_id,
comments:
[
{ _id,
message,
full_name
},
...
]
Collection 2:
user:
_id,
full_name,
username
I am paging through comments via $slice; let's say I take the first 25 entries.
From these entries I need the corresponding usernames, which I get from the second collection. What I want is to get the comments sorted by their username. The problem is that I can't add the username to the comments, because usernames may change often, and if one does I would need to update every target document that contains the old username.
I can only imagine one way to solve this: read out all the full_names and query them in the user collection. The result would be sortable, but it is not paged, so it takes a lot of resources with large documents.
Is there anything I am missing with this problem?
Thanks in advance
If comments are an embedded array, you will have to do work on the client side to sort the comments array unless you store it in sorted order. Your application requirements for username force you to either read out all of the usernames of the users who commented to do the sort, or to store the username in the comments and have (much) more difficult and expensive updates.
Sorting and pagination don't work unless you can return the documents in sorted order. You should consider a different schema where comments form a separate collection so that you can return them in sorted order and paginate them. Store the username in each comment to facilitate the sort on the MongoDB side. Depending on your application's usage pattern this might work better for you.
It also seems strange to sort on usernames and expect/allow usernames to change frequently. If you could drop these requirements it'd make your life easier :D
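A minimal sketch of the separate-collection schema suggested above, assuming a `comments` collection whose documents carry a `target_id` and a copy of `username` (these names are hypothetical, not from the question). With the username stored on each comment, the server can sort and page in one query:

```javascript
// Returns one page of comments for a target, sorted by username.
// `db` is a mongo shell database handle; `comments`, `target_id` and
// `username` are assumed names used only for illustration.
function commentsPage(db, targetId, page, pageSize) {
  return db.comments
    .find({ target_id: targetId })
    .sort({ username: 1, _id: 1 }) // _id breaks ties so pages stay stable
    .skip(page * pageSize)
    .limit(pageSize);
}

// Example: the first 25 comments of a target (someTargetId is a placeholder).
// commentsPage(db, someTargetId, 0, 25)
```

An index on { target_id: 1, username: 1, _id: 1 } would let MongoDB serve the sort from the index rather than sorting in memory.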

Query MongoDB from a Redis list

Suppose I keep lists of user posts in Redis. A user has, say, 1000 posts; the post documents are stored in MongoDB, but the link between the user and the posts is stored in Redis. I can retrieve the array containing all the ids of a user's posts from Redis, but what is an efficient way to retrieve the documents from MongoDB?
Do I pass the array of ids as a parameter to MongoDB, and will it fetch those documents for me?
I can't seem to find any documentation on this. Any help would be appreciated!
Thanks in advance!
To retrieve a number of documents by id, you can use the $in operator to build the MongoDB query. See the following section from the documentation:
http://docs.mongodb.org/manual/reference/operator/query/in/#op._S_in
For instance you can build a query such as:
db.mycollection.find( { _id : { $in: [ id1, id2, id3, .... ] } } )
Depending on how many ids Redis returns, you may have to group them in batches of n items (n=100, for instance) and run several MongoDB queries. IMO, it is bad practice to build a single query containing more than a few thousand ids. It is better to run smaller queries and accept the cost of the extra roundtrips.
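The batching described above could look roughly like this. The helper names `chunk` and `buildInQueries` are made up for illustration, and the batch size of 100 is just the example value from the text:

```javascript
// Split an array of ids into batches of at most `size` items.
function chunk(ids, size) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Build one $in filter per batch.
function buildInQueries(ids, size) {
  return chunk(ids, size).map(batch => ({ _id: { $in: batch } }));
}

// In the mongo shell, each filter would then be run as its own query, e.g.
// (idsFromRedis is a placeholder for the list read from Redis):
// buildInQueries(idsFromRedis, 100).forEach(q => printjson(db.mycollection.find(q).toArray()))
```

Note that $in does not return documents in the order of the id list, so if the order of the Redis list matters, the results have to be re-ordered in the application.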

mongodb - add column to one collection find based on value in another collection

I have a posts collection which stores posts related info and author information. This is a nested tree.
Then I have a postrating collection which stores which user has rated a particular post up or down.
When a request is made to get a nested tree for a particular post, I also need to return whether the current user has voted, and if yes, up or down, for each of the posts being returned.
In SQL this would be something like "posts.*, postrating.vote from posts join postrating on postID and postrating.memberID=currentUser".
I know MongoDB does not support joins. What are my options with MongoDB?
use map reduce - performance for a simple query?
in the post document store the ratings - BSON size limit?
Get list of all required posts. Get list of all votes by current user. Loop on posts and if user has voted add that to output?
Is there any other way? Can this be done using aggregation?
NOTE: I started on MongoDB last week.
In MongoDB, the simplest way is probably to handle this with application-side logic and not to try this in a single query. There are many ways to structure your data, but here's one possibility:
user_document = {
name : "User1",
postsIhaveLiked : [ "post1", "post2" ... ]
}
post_document = {
postID : "post1",
content : "my awesome blog post"
}
With this structure, you would first query for the user's user_document. Then, for each post returned, you could check if the post's postID is in that user's "postsIhaveLiked" list.
The main idea with this is that you get your data in two steps, not one. This is different from a join, but based on the same underlying idea of using one key (in this case, the postID) to relate two different pieces of data.
In general, try to avoid using map-reduce for performance reasons. And for this simple use case, aggregation is not what you want.
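A small sketch of the application-side step, using the user_document and post_document shapes shown above. The function name `markLikedPosts` and the output field `likedByUser` are hypothetical:

```javascript
// Mark each post with whether the given user has liked it.
function markLikedPosts(userDocument, posts) {
  // A Set gives constant-time membership checks per post.
  const liked = new Set(userDocument.postsIhaveLiked);
  return posts.map(post =>
    Object.assign({}, post, { likedByUser: liked.has(post.postID) })
  );
}

// In the mongo shell the two inputs would come from two queries, e.g.
// (collection names `users` and `posts`, and `postFilter`, are assumptions):
// const user = db.users.findOne({ name: "User1" });
// const posts = db.posts.find(postFilter).toArray();
// markLikedPosts(user, posts)
```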

make a join like SQL server in MongoDB

For example, we have two collections
users {userId, firstName, lastName}
votes {userId, voteDate}
I need a report of the names of all users who have more than 20 votes a day.
How can I write a query to get this data from MongoDB?
The easiest way to do this is to cache the number of votes for each user in the user documents. Then you can get the answer with a single query.
If you don't want to do that, then map-reduce the results into a results collection, and query that collection. You can then run incremental map-reduces that only calculate new votes to keep your results up to date: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-IncrementalMapreduce
You shouldn't really be trying to do joins with Mongo. If you are, you've designed your schema in a relational manner.
In this instance I would store the vote as an embedded document on the user.
In some scenarios using embedded documents isn't feasible, and in that situation I would do two database queries and join the results at the client rather than using MapReduce.
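A sketch of the two-queries-and-join-at-the-client approach for the users/votes example, assuming both result sets have already been fetched as arrays and that userId is a string or number. The function names are hypothetical, and days are taken as UTC calendar days:

```javascript
// Count votes per user per calendar day (UTC) and return the ids of users
// who exceed `threshold` votes on at least one day.
function heavyVoterIds(votes, threshold) {
  const counts = new Map();
  const heavy = new Set();
  for (const vote of votes) {
    const day = new Date(vote.voteDate).toISOString().slice(0, 10);
    const key = vote.userId + "|" + day;
    const n = (counts.get(key) || 0) + 1;
    counts.set(key, n);
    if (n > threshold) heavy.add(vote.userId);
  }
  return Array.from(heavy);
}

// Join at the client: pick the names of the matching users.
function heavyVoterNames(users, votes, threshold) {
  const ids = new Set(heavyVoterIds(votes, threshold));
  return users
    .filter(u => ids.has(u.userId))
    .map(u => u.firstName + " " + u.lastName);
}

// In the mongo shell, the second query could fetch only the matching users:
// db.users.find({ userId: { $in: heavyVoterIds(db.votes.find().toArray(), 20) } })
```

Reading every vote into the client only scales to modest collections; restricting the votes query to the date range of the report would reduce the work.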
I can't provide a fuller answer now, but you should be able to achieve this using MapReduce. The Map step would return the userIds of the users who have more than 20 votes, the reduce step would return the firstName and lastName, I think...have a look here.