Count multiple distinct fields by group with Mongo

I have a data set that looks like this:
{"BrandId":"a","SessionId":100,"UserName":"tom"}
{"BrandId":"a","SessionId":200,"UserName":"tom"}
{"BrandId":"b","SessionId":300,"UserName":"mike"}
I would like to count distinct sessions and usernames grouped by BrandId. The equivalent SQL is:
select brandid, count(distinct sessionid), count(distinct username)
from data
group by brandid
I tried to write this in MongoDB, but my current code, shown below, does not work. Is there any way to make it work?
db.logs.aggregate([
{$group:{
_id:{brand:"$BrandId",user:"$UserName",session:"$SessionId"},
count:{$sum:1}}},
{$group:{
_id:"$_id.brand",
users:{$sum:"$_id.user"},
sessions:{$sum:"$_id.session"}
}}
])
For this example, the expected result is:
{"BrandId":"a","countSession":2,"countUser":1}
{"BrandId":"b","countSession":1,"countUser":1}
In other words, the expected result is the same as what the SQL query above returns.

You can do this by using $addToSet to accumulate the distinct set of SessionId and UserName values during the $group, and then adding a $project stage to your pipeline that uses the $size operator to get the size of each set:
db.logs.aggregate([
{$group: {
_id: '$BrandId',
sessionIds: {$addToSet: '$SessionId'},
userNames: {$addToSet: '$UserName'}
}},
{$project: {
_id: 0,
BrandId: '$_id',
countSession: {$size: '$sessionIds'},
countUser: {$size: '$userNames'}
}}
])
Result:
{
"BrandId" : "b",
"countSession" : 1,
"countUser" : 1
},
{
"BrandId" : "a",
"countSession" : 2,
"countUser" : 1
}
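To make the semantics of that pipeline concrete, here is a plain-JavaScript sketch that computes the same per-brand distinct counts on an in-memory array (it does not connect to MongoDB). It assumes JavaScript `Set` equality is a good enough stand-in for `$addToSet`'s BSON equality, which holds for simple strings and numbers like the ones in the question.

```javascript
// In-memory sketch of: $group with $addToSet per field, then $project with $size.
function countDistinctByBrand(docs) {
  const groups = new Map(); // BrandId -> { sessions: Set, users: Set }
  for (const doc of docs) {
    if (!groups.has(doc.BrandId)) {
      groups.set(doc.BrandId, { sessions: new Set(), users: new Set() });
    }
    const g = groups.get(doc.BrandId);
    g.sessions.add(doc.SessionId); // like sessionIds: {$addToSet: '$SessionId'}
    g.users.add(doc.UserName);     // like userNames: {$addToSet: '$UserName'}
  }
  // Like the $project stage: report the size of each distinct set.
  return [...groups].map(([brandId, g]) => ({
    BrandId: brandId,
    countSession: g.sessions.size,
    countUser: g.users.size,
  }));
}

// The sample documents from the question.
const logs = [
  { BrandId: "a", SessionId: 100, UserName: "tom" },
  { BrandId: "a", SessionId: 200, UserName: "tom" },
  { BrandId: "b", SessionId: 300, UserName: "mike" },
];
console.log(countDistinctByBrand(logs));
```

As in MongoDB, you should not rely on the order of the groups; the sketch happens to return them in first-seen order, while the server result above lists "b" first.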

Related

MongoDB is it possible to runCommand distinct with substr on key

Is it possible to runCommand distinct with substr on the key I'm targeting?
I keep getting the error "missing : after property id":
db.runCommand(
{
distinct: "mycollection",
key: {"myfield" : { $substr: { "$myfield", 0, 10 } }},
}
)
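You can't do this with the distinct command: its key must be a plain field name, not an expression. (The syntax error itself comes from { "$myfield", 0, 10 }, which is not a valid object literal; $substr takes an array.) Instead, use the aggregation framework to process the field and then get the distinct values with $group, like this: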
db.foo.aggregate([
{$group: {_id: {$substr: [ "$myfield",0,10]} }}
]);
Very often it is useful to get the count of those distinct values:
db.foo.aggregate([
{$group: {_id: {$substr: ["$myfield",0,10]}, count: {$sum:1} }}
]);
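The counting pipeline above can be mirrored in plain JavaScript to show what it returns. This is an in-memory sketch only; it assumes every document has a string `myfield`, and the sample values are made up. Note that `$substr` operates on UTF-8 bytes while `String.prototype.slice` counts UTF-16 code units, so the two agree only for ASCII text.

```javascript
// In-memory sketch of:
//   {$group: {_id: {$substr: ["$myfield", 0, 10]}, count: {$sum: 1}}}
function countByPrefix(docs, length) {
  const counts = new Map(); // prefix -> number of documents with that prefix
  for (const doc of docs) {
    const key = doc.myfield.slice(0, length);    // like $substr: ["$myfield", 0, length]
    counts.set(key, (counts.get(key) || 0) + 1); // like count: {$sum: 1}
  }
  return [...counts].map(([prefix, count]) => ({ _id: prefix, count }));
}

// Hypothetical sample data: date-time strings whose first 10 characters are the date.
const sample = [
  { myfield: "2015-06-01T10:00" },
  { myfield: "2015-06-01T11:30" },
  { myfield: "2015-06-02T09:15" },
];
console.log(countByPrefix(sample, 10));
```

Dropping the count and keeping only the keys gives the distinct-values variant from the first pipeline.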

Group documents by subdocument field

I am trying to use Mongo's aggregation framework to group a collection based on a timestamp and use $out to write the result to a new collection. Apologies, I am new to Mongo.
I have the following JSON structure in my collection:
{
"_id" : "1",
"parent" : [
{
"child" : {
"child_id" : "1",
"timestamp" : ISODate("2010-01-08T17:49:39.814Z")
}
}
]
}
Here is what I have been trying:
db.mycollection.aggregate([
{ $project: { child_id: '$parent.child.child_id', timestamp: '$parent.child.timestamp' }},
{ $group: { cid: '$child_id', ts: { $max: '$timestmap'} }},
{ $out : 'mycollectiongrouped'}
])
However, I am getting this error. Any ideas? I assume I am using the $project stage incorrectly.
[thread1] Error: command failed: {
"ok" : 0,
"errmsg" : "the group aggregate field 'cid' must be defined as an expression inside an object",
"code" : 15951
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
db.collection.aggregate([
{$group: {
_id: "$parent.child.child_id",
timestamp: {$max: "$parent.child.timestamp"}
}},
{$project: {
cid: {$arrayElemAt: ["$_id", 0]},
ts: {$arrayElemAt: ["$timestamp", 0]},
_id: 0
}},
{$out: "groupedCollection" }
])
You are missing the _id field, which is mandatory in the $group stage. That said, since the "parent" field in your document is a one-element array, the $group stage should be the first stage in the pipeline.
By making the $group stage the first stage, you will only need to project one document per group instead of all documents in the collection.
Note that the fields in the resulting documents are arrays, hence the use of the $arrayElemAt operator in the $project stage.
You need an _id field in the $group stage. This _id is what determines which documents are grouped together. For instance, if you want to group by child_id, use _id: "$child_id". In that case you don't need a separate cid field; you can simply rename cid to _id.
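Why are the grouped fields arrays? When a field path such as "$parent.child.child_id" passes through an array, it resolves to an array of the values found in each element. The following plain-JavaScript sketch illustrates that behaviour under a simplifying assumption: it only models arrays of subdocuments like the one in the question, not every edge case of MongoDB's path resolution.

```javascript
// Simplified sketch of aggregation field-path resolution through arrays.
// path is an array of keys, e.g. ["parent", "child", "child_id"].
function resolvePath(value, path) {
  if (path.length === 0) return value;
  if (Array.isArray(value)) {
    // Resolve the remaining path in every element; drop elements where it is missing.
    return value
      .map((el) => resolvePath(el, path))
      .filter((v) => v !== undefined);
  }
  if (value === null || typeof value !== "object") return undefined;
  const [head, ...rest] = path;
  return resolvePath(value[head], rest);
}

// The document from the question.
const doc = {
  _id: "1",
  parent: [
    { child: { child_id: "1", timestamp: new Date("2010-01-08T17:49:39.814Z") } },
  ],
};
console.log(resolvePath(doc, "parent.child.child_id".split("."))); // [ '1' ]
```

Because "parent" has a single element, {$arrayElemAt: [..., 0]} recovers the scalar value.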

Get the number of documents liked per document in MongoDB

I'm working on a project that uses MongoDB as its database, and I've run into a problem: I can't find the right query to simply count the likes of each document. The collection I use looks like this:
{ "username" : "example1",
"like" : [ { "document_id" : "doc1" },
{ "document_id" : "doc2" },
...]
}
What I need is to compute the number of likes of each document, so that at the end I have:
{ "document_id" : "docA" , nbLikes : 30 }, {"document_id" : "docB", nbLikes : 1}
Can anyone help me with this? My attempts have failed.
You can do this by unwinding the like array of each doc and then grouping by document_id to get a count for each value:
db.test.aggregate([
// Duplicate each doc, once per 'like' array element
{$unwind: '$like'},
// Group them by document_id and assemble a count
{$group: {_id: '$like.document_id', nbLikes: {$sum: 1}}},
// Reshape the docs to match the desired output
{$project: {_id: 0, document_id: '$_id', nbLikes: 1}}
])
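Here is an in-memory sketch of what $unwind followed by $group computes: each element of a user's like array becomes its own record, and the records are then counted per document_id. It does not talk to MongoDB, and the second user in the sample data is hypothetical.

```javascript
// In-memory sketch of:
//   {$unwind: '$like'},
//   {$group: {_id: '$like.document_id', nbLikes: {$sum: 1}}},
//   {$project: {_id: 0, document_id: '$_id', nbLikes: 1}}
function countLikes(users) {
  const counts = new Map(); // document_id -> nbLikes
  for (const user of users) {
    // $unwind emits one record per array element; missing or empty arrays emit none.
    for (const like of user.like || []) {
      const id = like.document_id;
      counts.set(id, (counts.get(id) || 0) + 1); // $sum: 1 per unwound record
    }
  }
  // $project: rename the group key to document_id.
  return [...counts].map(([document_id, nbLikes]) => ({ document_id, nbLikes }));
}

const users = [
  { username: "example1", like: [{ document_id: "doc1" }, { document_id: "doc2" }] },
  { username: "example2", like: [{ document_id: "doc1" }] }, // hypothetical user
];
console.log(countLikes(users));
```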
Alternatively, add a "likeCount" field, increment it with $inc in the same update as each $push, and then read the "likeCount" field:
db.test.update(
{ _id: "..." },
{
$inc: { likeCount: 1 },
$push: { like: { "document_id" : "doc1" } }
}
)

mongoDB, sum the product of two fields

I have a list of items, and I want MongoDB to return the sum of price*quantity over all of them; in other words, the total value of my items.
Schema = {
_id: ObjectId,
price: Number,
quantity: Number
}
I'm trying to use the aggregation framework, or map-reduce, but I can't figure out how to use it correctly.
Here is an example that finds the sum of prices:
db.items.aggregate([
{$group: {
_id: null,
prices: {$sum: "$price"}
}}
])
Here is what I would like to obtain:
db.items.aggregate([
{$group: {
_id: null,
prices: {$sum: "$price"*"$quantity"}
}}
])
You don't need to use map-reduce for this. You can use the aggregation framework and combine multiple aggregation operators. You almost had it; you were just missing the final piece, the $multiply operator:
db.items.aggregate([{
"$group" : {
"_id" : null,
"prices" : {
"$sum" : {
"$multiply" : ["$price", "$quantity"]
}
}
}
}]);
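For comparison, the same total computed in plain JavaScript over an in-memory array: multiply per item (the $multiply step), then add everything up into one result (the $sum with _id: null). This sketch assumes every item has numeric price and quantity fields; the sample items are made up.

```javascript
// In-memory sketch of:
//   {$group: {_id: null, prices: {$sum: {$multiply: ["$price", "$quantity"]}}}}
function totalValue(items) {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

// Hypothetical items: 2.5 * 4 + 10 * 3 = 40
const items = [
  { price: 2.5, quantity: 4 },
  { price: 10, quantity: 3 },
];
console.log(totalValue(items)); // 40
```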

Querying internal array size in MongoDB

Consider a MongoDB document in users collection:
{ username : 'Alex', tags: ['C#', 'Java', 'C++'] }
Is there any way to get the length of the tags array on the server side (without passing the tags to the client)?
Thank you!
If the username Alex is unique, you can use the following code:
db.test.insert({username:"Alex", tags: ['C#', 'Java', 'C++'] });
db.test.aggregate(
{$match: {username : "Alex"}},
{$unwind: "$tags"},
{$project: {count:{$add:1}}},
{$group: {_id: null, number: {$sum: "$count" }}}
);
{ "result" : [ { "_id" : null, "number" : 3 } ], "ok" : 1 }
Since release 2.6, MongoDB supports the $size operator in aggregation.
From the documentation:
{ <field>: { $size: <array> } }
What you want can be accomplished with either of the following:
db.users.aggregate(
[
{
$group: {
_id: "$username",
tags_count: {$first: {$size: "$tags" }}
}
}
]
)
or
db.users.aggregate(
[
{
$project: {
tags_count: {$size: "$tags"}
}
}
]
)
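A plain-JavaScript sketch of what the $project version returns: each document keeps its _id (which $project includes by default) plus tags_count. It also models one behaviour worth knowing: $size raises an error when its argument is missing or not an array. The _id value in the sample is hypothetical.

```javascript
// In-memory sketch of: {$project: {tags_count: {$size: "$tags"}}}
function projectTagsCount(docs) {
  return docs.map((doc) => {
    if (!Array.isArray(doc.tags)) {
      // Like $size, refuse anything that is not an array.
      throw new TypeError("tags is not an array in document " + doc._id);
    }
    return { _id: doc._id, tags_count: doc.tags.length };
  });
}

const users = [{ _id: 1, username: "Alex", tags: ["C#", "Java", "C++"] }];
console.log(projectTagsCount(users));
```

In a real pipeline, a common guard for documents without tags is {$size: {$ifNull: ["$tags", []]}}, which counts a missing array as empty.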
It might be more efficient to calculate the number of tags on each save and store it in a separate field, perhaps using $inc, or to compute it with a scheduled job.
You could also do this with map/reduce (the canonical example), but that doesn't seem to be what you want.
I'm not sure it's possible to do exactly what you are asking, but you can query all the documents that match a certain size with $size ...
> db.collection.find({ tags : { $size: 3 }});
That'd get you all the documents with 3 tags ...
xmm.dev's answer can be simplified: instead of using the intermediate field 'count', you can sum directly in $group:
db.test.aggregate(
{$match: {username : "Alex"}},
{$unwind: "$tags"},
{$group: {_id: null, number: {$sum: 1 }}}
)
Currently, the only way to do it seems to be using db.eval, but this locks the database for other operations.
The most speed-efficient way would be to add an extra field that stores the length of the array and maintain it with $inc and $push operations.
I used a small workaround, since I needed to query the array size and return documents where it was greater than 0; in my data the size could be anything from 1 to 3.
Here was my solution:
// "field" stands for the name of your array field
db.test.find({$or: [
{field: {$exists: true, $size: 1}},
{field: {$exists: true, $size: 2}},
{field: {$exists: true, $size: 3}}
]})
This basically returns a document when the field exists and its size is 1, 2, or 3. You can add more clauses if you are looking for a different size or range. I know it's not perfect, but it worked and was relatively quick. My field only ever had 1 to 3 elements, so this solution worked for me.
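The $or of exact $size clauses is needed because the $size query operator only matches one exact length, not a range. Logically, the filter is a length-range check, as this plain-JavaScript predicate sketches. The helper name sizeInRange and the sample documents are hypothetical, and the sketch runs in memory rather than against MongoDB.

```javascript
// In-memory sketch of the $or filter above: the field must be an array
// whose length is between min and max, inclusive.
function sizeInRange(doc, field, min, max) {
  const value = doc[field];
  return Array.isArray(value) && value.length >= min && value.length <= max;
}

const docs = [
  { _id: 1, tags: [] },
  { _id: 2, tags: ["a"] },
  { _id: 3, tags: ["a", "b", "c"] },
  { _id: 4, tags: ["a", "b", "c", "d"] },
  { _id: 5 }, // field missing: $exists: true fails, so it is excluded
];
console.log(docs.filter((d) => sizeInRange(d, "tags", 1, 3)).map((d) => d._id)); // [ 2, 3 ]
```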