Double aggregation with distinct count in MongoDB

We have a collection which stores log documents.
Is it possible to have multiple aggregations on different attributes?
A document looks like this in its purest form:
{
_id : int,
agent : string,
username: string,
date : string,
type : int,
subType: int
}
With the following query I can easily count all documents and group them by subtype for a specific type during a specific time period:
db.logs.aggregate([
{
$match: {
$and : [
{"date" : { $gte : new ISODate("2020-11-27T00:00:00.000Z")}}
,{"date" : { $lte : new ISODate("2020-11-27T23:59:59.000Z")}}
,{"type" : 906}
]
}
},
{
$group: {
"_id" : '$subType',
count: { "$sum": 1 }
}
}
])
My output so far is perfect:
{
_id: 4,
count: 5
}
However, I want to add another counter: a distinct count, as a third attribute.
Let's say I want to extend the result set above with a third attribute holding the distinct count of usernames. The result set would then contain the subType as _id, a count of the total number of documents, and a second counter with the number of distinct usernames that have entries. In my case, that is the number of people who have created documents at all.
A "pseudo resultset" would look like:
{
_id: 4,
countOfDocumentsOfSubstype4: 5,
distinctCountOfUsernamesInDocumentsWithSubtype4: ?
}
Does this make any sense?
Please help me improve the question as well, since it's difficult to google it when you're not a MongoDB expert.

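You can first group at the finest level (by subType and username), then perform a second grouping by subType alone. Summing the per-username counts gives the total number of documents, and adding 1 per username group gives the number of distinct usernames: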
db.logs.aggregate([
{
$match: {
$and : [
{"date" : { $gte : new ISODate("2020-11-27T00:00:00.000Z")}}
,{"date" : { $lte : new ISODate("2020-11-27T23:59:59.000Z")}}
,{"type" : 906}
]
}
},
{
$group: {
"_id" : {
subType : "$subType",
username : "$username"
},
count: { "$sum": 1 }
}
},
{
$group: {
"_id" : "$_id.subType",
"countOfDocumentsOfSubstype4" : {$sum : "$count"},
"distinctCountOfUsernamesInDocumentsWithSubtype4" : {$sum : 1}
}
}
])
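The following plain JavaScript sketch mimics what the two $group stages compute, using an in-memory array with no MongoDB involved. The sample logs, the function name groupBySubTypeAndUser, and the shortened output fields docCount and distinctUsers are hypothetical and exist only for illustration. In the second stage, adding 1 per (subType, username) pair is what turns the pairs into a distinct-username count.

```javascript
// Hypothetical sample logs, assumed to be already filtered as $match would do.
const logs = [
  { subType: 4, username: "alice" },
  { subType: 4, username: "alice" },
  { subType: 4, username: "bob" },
  { subType: 4, username: "carol" },
  { subType: 4, username: "bob" },
  { subType: 7, username: "alice" },
];

// Mimics the two $group stages above (illustration only, no MongoDB).
function groupBySubTypeAndUser(docs) {
  // First $group: one bucket per (subType, username) pair with its document count.
  const pairs = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc.subType, doc.username]);
    const bucket = pairs.get(key) || { subType: doc.subType, count: 0 };
    bucket.count += 1;
    pairs.set(key, bucket);
  }

  // Second $group: regroup the pairs by subType. Summing the pair counts gives
  // the document total; adding 1 per pair gives the distinct username count.
  const bySubType = new Map();
  for (const { subType, count } of pairs.values()) {
    const out = bySubType.get(subType) || { _id: subType, docCount: 0, distinctUsers: 0 };
    out.docCount += count;
    out.distinctUsers += 1;
    bySubType.set(subType, out);
  }
  return [...bySubType.values()];
}

console.log(groupBySubTypeAndUser(logs));
// → [{ _id: 4, docCount: 5, distinctUsers: 3 }, { _id: 7, docCount: 1, distinctUsers: 1 }]
```

Inside MongoDB, another option is a single $group that collects usernames with $addToSet, followed by $size of that set. The two-stage version avoids building a per-subtype array of usernames, which may matter when a subtype has many users.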
Here are the test cases I used:
And here is the aggregate result:

Related

Need to use MongoDB aggregation on arrays

I need help aggregating this query: I need to aggregate the values of debito.
{
"_id" : ObjectId("5a088f6584ccb0a665900726"),
"usuario" : "tamura",
"creditos" : [
{
"nome_do_credito" : "credito inicial",
"credito" : 0
}
],
"debitos" : [
{
"nome_do_debito" : "debito inicial",
"debito" : 0
},
{
"nome_do_debito" : "Faculdade",
"debito" : "150.00"
}
]
}
I need the output:
debito : 150
(i.e. 0 + 150)
You will first need to convert all your debito fields to a numeric type (as in 150.00), because you cannot do maths on strings (as in "150.00"). $sum ignores non-numeric values, so the string would simply be skipped. Then the following query should do the trick:
db.collection.aggregate({
$project: {
"debitos": {
$sum: "$debitos.debito"
}
}
})
If you have more than one document in your collection and want the total sum over all documents, you can run this:
db.collection.aggregate({
$unwind: "$debitos" // flatten the "debitos" array
}, {
$group: {
"_id": null, // do not really group, just throw all documents in the same group
"debitos": {
$sum: "$debitos.debito" // sum up all debito fields
}
}
})
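This small plain JavaScript sketch shows why the conversion matters. It mimics $sum on the question's debitos array and relies only on $sum's documented behaviour of ignoring non-numeric values. The helpers sumLikeMongo and toNumbers are hypothetical and work only on in-memory data.

```javascript
// In-memory copy of the "debitos" array from the question.
const debitos = [
  { nome_do_debito: "debito inicial", debito: 0 },
  { nome_do_debito: "Faculdade", debito: "150.00" },
];

// Mimics $sum over an array: values that are not numbers (e.g. strings) are ignored.
function sumLikeMongo(values) {
  return values
    .filter((v) => typeof v === "number")
    .reduce((acc, v) => acc + v, 0);
}

// The suggested fix, done client-side: store debito as a number instead of a string.
function toNumbers(items) {
  return items.map((d) => ({ ...d, debito: Number(d.debito) }));
}

const before = sumLikeMongo(debitos.map((d) => d.debito)); // "150.00" is skipped
const after = sumLikeMongo(toNumbers(debitos).map((d) => d.debito));
console.log(before, after); // → 0 150
```

On MongoDB 4.0 or later, the $toDouble operator could do the conversion inside the pipeline without rewriting the stored data. That is an alternative to the approach in the answer, not part of it.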

Excluding data in mongo aggregation

I'm working with a mongodb query. Each document in the collection looks like this:
{
"_id": "12345",
"name": "Trinity Force",
"price": 3702,
"comp": [
"Zeal",
"Phage",
"Sheen"
]
}
I was working on a query that returns the 5 cheapest items (lowest price), with prices equal to 0 excluded (those trinkets, though). I wrote this (sorry for the poor formatting):
db.league.aggregate( { $project : { _id : 1, name: 1, price: 1, comp: 0 } },
{ $match : {price : { $gt : 0 } } },
{ $sort: { price : 1 } }).limit(5)
I ran into two problems, though: the limit function doesn't seem to work with this aggregation, and neither does the $project. The output I'm looking for should exclude the item components (hence comp: 0) and be limited to 5 results. Could I get some assistance, please?
db.league.aggregate(
{ $project : { _id : "$_id", name: "$name", price: "$price"} },
{ $match : { "price" : { $gt : 0 } } },
{ $sort: { "price" : 1 } },
{ $limit : 5 })
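This aggregation query returns the 5 cheapest items. Inside a pipeline, use a $limit stage instead of calling .limit() on the result. In $project, list only the fields you want to keep, because $project cannot mix inclusion and exclusion of fields other than _id.
IMO, this is not really aggregation but filtering and sorting, so a plain find() is enough: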
db.league.find({ price: { $gt :0} }, {comp: 0}).sort({price: 1}).limit(5)
Nevertheless, I would test both for performance.
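This plain JavaScript sketch runs the same steps on an in-memory array: $match on price > 0, $sort by price ascending, $limit to 5, then a projection that drops comp. Apart from Trinity Force, the items are hypothetical, and cheapestItems is an illustrative name, not a MongoDB API.

```javascript
// Hypothetical items; only Trinity Force comes from the question.
const items = [
  { _id: "1", name: "Trinity Force", price: 3702, comp: ["Zeal", "Phage", "Sheen"] },
  { _id: "2", name: "Item B", price: 0, comp: [] },
  { _id: "3", name: "Item C", price: 300, comp: [] },
  { _id: "4", name: "Item D", price: 450, comp: [] },
  { _id: "5", name: "Item E", price: 350, comp: [] },
  { _id: "6", name: "Item F", price: 1100, comp: [] },
  { _id: "7", name: "Item G", price: 875, comp: [] },
  { _id: "8", name: "Item H", price: 400, comp: [] },
];

// $match price > 0, $sort price ascending, $limit n, then keep only _id, name, price.
function cheapestItems(docs, n) {
  return docs
    .filter((d) => d.price > 0)
    .sort((a, b) => a.price - b.price)
    .slice(0, n)
    .map(({ _id, name, price }) => ({ _id, name, price }));
}

console.log(cheapestItems(items, 5).map((i) => i.name));
// → [ 'Item C', 'Item E', 'Item H', 'Item D', 'Item G' ]
```

Projecting first, as the aggregation answer does, or projecting last, as here, returns the same five documents. The projection only affects which fields are returned, not which items are selected.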

MongoDB sum() data

I am new to MongoDB and NoSQL. What is the syntax to get a sum?
In MySQL, I would do something like this:
SELECT SUM(amount) from my_table WHERE member_id = 61;
How would I convert that to MongoDB? Here is what I have tried:
db.bigdata.aggregate({
$group: {
_id: {
memberId: 61,
total: {$sum: "$amount"}
}
}
})
Using http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/ for reference, you want:
db.bigdata.aggregate(
{
$match: {
memberId: 61
}
},
{
$group: {
_id: "$memberId",
total : { $sum : "$amount" }
}
})
From the MongoDB docs:
The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated results.
It is better to match first and then group, so that the system only performs the group operation on the filtered records. If you group first, the system groups all records and only then selects the ones with memberId = 61.
db.bigdata.aggregate(
{ $match : {memberId : 61 } },
{ $group : { _id: "$memberId" , total : { $sum : "$amount" } } }
)
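To compare this with the SQL statement in the question, here is a plain JavaScript sketch of the same match-then-group order. The rows in bigdata and the name sumForMember are hypothetical and for illustration only. Like $group, the sketch returns no result document when nothing matches.

```javascript
// Hypothetical rows standing in for the "bigdata" collection.
const bigdata = [
  { memberId: 61, amount: 10 },
  { memberId: 62, amount: 99 },
  { memberId: 61, amount: 5.5 },
];

// Equivalent of SELECT SUM(amount) FROM my_table WHERE member_id = 61,
// in pipeline order: filter first ($match), then sum ($group).
function sumForMember(rows, memberId) {
  const matched = rows.filter((r) => r.memberId === memberId);
  if (matched.length === 0) return []; // $group emits nothing for empty input
  const total = matched.reduce((acc, r) => acc + r.amount, 0);
  return [{ _id: memberId, total }];
}

console.log(sumForMember(bigdata, 61)); // → [ { _id: 61, total: 15.5 } ]
```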
db.bigdata.aggregate(
{ $match : {memberId : 61 } },
{ $group : { _id: "$memberId" , total : { $sum : "$amount" } } }
)
works if you are summing data that is not part of an array. If you want to sum data stored in an array inside a document, use:
db.collectionName.aggregate(
{$unwind:"$arrayName"}, //unwinds the array element
{
$group:{_id: "$arrayName.arrayField", //id which you want to see in the result
total: { $sum: "$arrayName.value"}} //the field of array over which you want to sum
})
and you will get a result like this:
{
"result" : [
{
"_id" : "someFieldvalue",
"total" : someValue
},
{
"_id" : "someOtherFieldvalue",
"total" : someValue
}
],
"ok" : 1
}
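The $unwind + $group combination can be pictured with this plain JavaScript sketch. It reuses the answer's placeholder names arrayName, arrayField, and value. The sample documents and the helpers unwind and groupSum are hypothetical. The unwind helper follows $unwind's default behaviour: missing, null, or empty arrays produce no output.

```javascript
// Hypothetical documents using the placeholder field names from the answer.
const docsWithArray = [
  { arrayName: [{ arrayField: "a", value: 2 }, { arrayField: "b", value: 3 }] },
  { arrayName: [{ arrayField: "a", value: 4 }] },
  { arrayName: [] }, // produces no output, like $unwind by default
];

// $unwind: one copy of the document per array element. Missing/null/empty arrays
// produce nothing; a non-array value is treated as a one-element array.
function unwind(docs, field) {
  return docs.flatMap((d) => {
    const value = d[field];
    const elements = Array.isArray(value) ? value : value == null ? [] : [value];
    return elements.map((el) => ({ ...d, [field]: el }));
  });
}

// $group: { _id: "$arrayName.arrayField", total: { $sum: "$arrayName.value" } }
function groupSum(unwound) {
  const totals = new Map();
  for (const d of unwound) {
    const key = d.arrayName.arrayField;
    totals.set(key, (totals.get(key) || 0) + d.arrayName.value);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

console.log(groupSum(unwind(docsWithArray, "arrayName")));
// → [ { _id: 'a', total: 6 }, { _id: 'b', total: 3 } ]
```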

Count number of times string matches a field across multiple documents

Say I have a number of documents that look like this:
{
'domain': 'www.stackoverflow.com',
'time': 1380860676457
}
{
'domain': 'www.google.com',
'time': 1380860678001
}
{
'domain': 'www.stackoverflow.com',
'time': 1380860657233
}
What's the best way to end up with the following output?
{
'domain': 'www.stackoverflow.com',
'count': 2
}
Are there any performance considerations (a different way to store the logs?) if the initial collection contains, say, a million or more documents?
You could use aggregation. Something like:
db.sites.aggregate([{
$group: {
_id: '$domain',
count: {$sum: 1}
}
}]);
This groups on the domain field and adds 1 to count for each document it finds. To make it look like the output you want, you could also add a projection stage to your aggregation:
{
$project: {
domain: '$_id',
count: 1,
_id: 0
}
}
All you need is to group by domain and count the documents in each group. You can do it with the collection's aggregate method like this:
db.cls.aggregate(
{$group:{_id:"$domain", count: {$sum : 1}}},
{$project:{_id:0, domain:"$_id", count:"$count"}}
)
First, $group gives you:
{
"result" : [
{
"_id" : "www.google.com",
"count" : 1
},
{
"_id" : "www.stackoverflow.com",
"count" : 2
}
],
"ok" : 1
}
Then the second stage, $project, gives you:
{
"result" : [
{
"count" : 1,
"domain" : "www.google.com"
},
{
"count" : 2,
"domain" : "www.stackoverflow.com"
}
],
"ok" : 1
}
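This plain JavaScript sketch applies the same group-then-project logic to the three sample documents from the question. The function name countByDomain is hypothetical. The sketch only illustrates what the two stages compute and says nothing about performance on a million documents.

```javascript
// The three sample documents from the question.
const visits = [
  { domain: "www.stackoverflow.com", time: 1380860676457 },
  { domain: "www.google.com", time: 1380860678001 },
  { domain: "www.stackoverflow.com", time: 1380860657233 },
];

function countByDomain(docs) {
  // $group: { _id: "$domain", count: { $sum: 1 } }
  const counts = new Map();
  for (const { domain } of docs) counts.set(domain, (counts.get(domain) || 0) + 1);
  // $project: { _id: 0, domain: "$_id", count: 1 }
  return [...counts].map(([domain, count]) => ({ domain, count }));
}

console.log(countByDomain(visits));
// → [ { domain: 'www.stackoverflow.com', count: 2 }, { domain: 'www.google.com', count: 1 } ]
```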
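Or you could do it with the collection's group method. Note that db.collection.group() was deprecated in MongoDB 3.4 and removed in 4.2, so this only works on older servers: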
db.cls.group({
key: {domain:1},
reduce: function(curr,result){ result.count += 1 },
initial:{count:0}
})
To speed up the process, you should also have an index on the domain field, as #AnujAneja mentioned.

Complex logic queries on a simple Mongo document

I am battling with forming complex logic queries on very basic data types in Mongo. Essentially, I can have millions of user attributes, so my basic Mongo document is:
{
name: "Gender",
value: "Male",
userId : "ABC123"
}
{
name: "M-Spike",
value: 0.123,
userId : "ABC123"
}
What I would like to do is search for things like: find all userIds where {name: "Gender", value: "Male"} AND {name: "M-Spike", value: {$gt: 0.1}}.
I have tried using the aggregation framework, but the complexity of the queries is limited. Basically, I was ORing all the criteria and counting the results by sampleId (which replicated a rudimentary AND).
I can see a way to do it for N attributes, where N is the number of attributes you want to query (2 in your example). Try something like this:
db.collection.aggregate(
[ { $match: {$or: [
{"name":"M-Spike","value":{$gt:.1}},
{"name":"Gender","value":"Male"}
]
}
},
{ $group: { _id:"$userId",total:{$sum:1}}
},
{ $project: { _id:1,
matchedAttr : { $eq: ["$total",2] }
}
}
]
)
You will get back:
{
"result" : [
{
"_id" : "XYZ123",
"matchedAttr" : false
},
{
"_id" : "ABC123",
"matchedAttr" : true
}
],
"ok" : 1
}
With 2 conditions matched via "$or", you get back true for each _id that matched both of them. So for five conditions, your $match $or array will have five conditional pairs, and the last $project transformation will be $eq: ["$total",5].
This solution assumes that you cannot have duplicate entries (i.e. the same _id cannot have both "M-Spike": .5 and "M-Spike": .2). If you can, then this won't work.
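This plain JavaScript sketch shows the OR-then-count technique on hypothetical attribute documents. Each condition is a predicate standing in for one $or branch. A user counts as matching all conditions when their match count equals the number of conditions. The data, the conditions array, and usersMatchingAll are illustrative names, not part of the answer.

```javascript
// Hypothetical attribute documents, one per (userId, name) pair.
const attributes = [
  { name: "Gender", value: "Male", userId: "ABC123" },
  { name: "M-Spike", value: 0.123, userId: "ABC123" },
  { name: "Gender", value: "Male", userId: "XYZ123" },
  { name: "M-Spike", value: 0.05, userId: "XYZ123" },
];

// One predicate per $or branch.
const conditions = [
  (d) => d.name === "M-Spike" && d.value > 0.1,
  (d) => d.name === "Gender" && d.value === "Male",
];

function usersMatchingAll(docs, conds) {
  const totals = new Map();
  for (const d of docs) {
    // $match with $or: keep the document if any condition holds.
    if (conds.some((c) => c(d))) {
      // $group by userId with total: { $sum: 1 }.
      totals.set(d.userId, (totals.get(d.userId) || 0) + 1);
    }
  }
  // $project: matchedAttr is true when the count equals the number of conditions.
  return [...totals].map(([_id, total]) => ({ _id, matchedAttr: total === conds.length }));
}

console.log(usersMatchingAll(attributes, conditions));
// → [ { _id: 'ABC123', matchedAttr: true }, { _id: 'XYZ123', matchedAttr: false } ]
```

The sketch shares the same caveat as the aggregation: it only counts matches. A user with two qualifying M-Spike documents and no Gender document would also reach a total of 2.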