Mongodb aggregation taking average - mongodb

In MongoDB, I have a collection with the following data:
[{
_id: ObjectId('....'),
data: [{
type: 'internal',
name: 'abc',
value: 60
}, {
type: 'internal',
name: 'def',
value: 20
}, {
type: 'external',
name: 'def',
value: 20
}]
}, {
_id: ObjectId('....'),
data: [{
type: 'internal',
name: 'abc',
value: 30
}, {
type: 'internal',
name: 'def',
value: 40
}, {
type: 'external',
name: 'def',
value: 10
}]
}]
Now, if I want to group by type and take the average of the value field, I can do this:
db.testcollection.aggregate([
{$unwind: '$data'},
{$group: {_id: '$data.type', avg: {$avg: '$data.value'}}}
]);
But if the values of the same type within one document have to be treated as one value (the sum of both) before the average is calculated, what would the query be?
In my example, for type internal it should be:
((60+20)+(30+40))/2
rather than
(60+20+30+40)/4

You can do it in 2 $group phases. The first one is used to compute the sum within a document and the second one is used to compute the average across documents.
db.testcollection.aggregate([
{$unwind: "$data"},
{$group: {_id: {_id:"$_id", type:"$data.type"}, sum:{"$sum": "$data.value"}}},
{$group:{_id:"$_id.type", avg:{"$avg":"$sum"}}}
]);
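To double-check the arithmetic, here is a minimal plain-JavaScript sketch that mirrors the two $group phases in memory. It does not talk to MongoDB; the numeric _id values and the variable names are made up for illustration.

```javascript
// Sample documents copied from the question (the _id values are made up).
const docs = [
  { _id: 1, data: [
    { type: 'internal', name: 'abc', value: 60 },
    { type: 'internal', name: 'def', value: 20 },
    { type: 'external', name: 'def', value: 20 },
  ] },
  { _id: 2, data: [
    { type: 'internal', name: 'abc', value: 30 },
    { type: 'internal', name: 'def', value: 40 },
    { type: 'external', name: 'def', value: 10 },
  ] },
];

// Phase 1 ($unwind + first $group): sum the values per (document, type) pair.
const sums = new Map(); // "<_id>|<type>" -> { type, sum }
for (const doc of docs) {
  for (const item of doc.data) {
    const key = `${doc._id}|${item.type}`;
    const entry = sums.get(key) || { type: item.type, sum: 0 };
    entry.sum += item.value;
    sums.set(key, entry);
  }
}

// Phase 2 (second $group): average the per-document sums for each type.
const totals = new Map(); // type -> { total, n }
for (const { type, sum } of sums.values()) {
  const t = totals.get(type) || { total: 0, n: 0 };
  t.total += sum;
  t.n += 1;
  totals.set(type, t);
}
const averages = {};
for (const [type, { total, n }] of totals) {
  averages[type] = total / n;
}

console.log(averages); // { internal: 75, external: 15 }
```

For internal this gives ((60+20)+(30+40))/2 = 75, which is the value the question asks for.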

Related

MongoDB aggregation group by similar string

I'm starting to learn aggregations for Mongo, but in my project I found a lot of brands in my collection with very similar names, like 'BrandA' and 'BrandA tech'. Is there a way to group them at the end of my aggregation?
I have 2 collections in my database:
The first one is for brands:
{
_id: ObjectId(),
name: String
}
The second one is for products:
{
_id: ObjectId(),
name: String,
brand: ObjectId() // referring to _id of brands
}
Now let's say I have the following brands:
{_id: ObjectId('5a9fd2b8045b020013de2a47'), name: 'brand1'},
{_id: ObjectId('5a9fcf94d28420245451a39c'), name: 'brand2'},
{_id: ObjectId('5a9fcf94d28420245451a39a'), name: 'brand1 sub1'},
{_id: ObjectId('5a9fe8bf045b020013de2a6d'), name: 'sub2 brand2'}
And the following products:
{_id: ObjectId(''), name: 'item1', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item2', brand: ObjectId('5a9fcf94d28420245451a39c')},
{_id: ObjectId(''), name: 'item3', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item4', brand: ObjectId('5a9fcf94d28420245451a39a')},
{_id: ObjectId(''), name: 'item5', brand: ObjectId('5a9fe8bf045b020013de2a6d')},
{_id: ObjectId(''), name: 'item6', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item7', brand: ObjectId('5a9fcf94d28420245451a39c')},
{_id: ObjectId(''), name: 'item8', brand: ObjectId('5a9fcf94d28420245451a39a')}
The query I have now:
db.getCollection('products').aggregate([
{$group: {
_id: '$brand',
amount: { $sum: 1 },
}},
{
$sort: { 'amount': -1 }
},{$lookup: {
from: 'brands',
localField: '_id',
foreignField: '_id',
as: 'lookup'
}},
{$unwind: {path: '$lookup'}},
{$project: {
_id: '$_id',
brandName: '$lookup.name',
amount: '$amount'
}}
]);
Result:
{_id: ObjectId('5a9fd2b8045b020013de2a47'), brandName: 'brand1', amount: 3}
{_id: ObjectId('5a9fcf94d28420245451a39c'), brandName: 'brand2', amount: 2}
{_id: ObjectId('5a9fcf94d28420245451a39a'), brandName: 'brand1 sub1', amount: 2}
{_id: ObjectId('5a9fe8bf045b020013de2a6d'), brandName: 'sub2 brand2', amount: 1}
Result I want:
{_id: ObjectId(null), brandName: 'brand1', amount: 5},
{_id: ObjectId(null), brandName: 'brand2', amount: 3}
Is it possible to group the result I have now by finding similar strings in brandName? Like grouping 'brand1' with 'brand1 sub1', or 'brand2' with 'sub2 brand2'?
I think you could do what you want by using $split and $unwind.
$split will transform your string into an array of words, and $unwind will create as many entries as there are words in the array.
Then you can apply the pipeline you already prepared to count the occurrences.
A change in the model could also achieve this easily: just add the items as an array on each brand.
Then you get a count directly from the array's length, and the query should be faster.
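As a rough illustration of the $split and $unwind idea, here is a plain-JavaScript sketch that applies the same steps in memory to the result rows from the question. It is not a MongoDB pipeline, and the variable names are only for illustration.

```javascript
// Rows as produced by the question's current pipeline (brandName, amount).
const rows = [
  { brandName: 'brand1', amount: 3 },
  { brandName: 'brand2', amount: 2 },
  { brandName: 'brand1 sub1', amount: 2 },
  { brandName: 'sub2 brand2', amount: 1 },
];

// $split + $unwind equivalent: one entry per word of each brand name.
const perWord = rows.flatMap(({ brandName, amount }) =>
  brandName.split(' ').map((word) => ({ word, amount }))
);

// $group equivalent: sum the amounts for each word.
const amountByWord = {};
for (const { word, amount } of perWord) {
  amountByWord[word] = (amountByWord[word] || 0) + amount;
}

console.log(amountByWord);
// { brand1: 5, brand2: 3, sub1: 2, sub2: 1 }
```

Note that every word becomes its own group, so 'sub1' and 'sub2' show up next to 'brand1' and 'brand2'. To get exactly the wanted result you would probably still need to filter the groups, for example by keeping only words that are brand names themselves.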

Using $group in mongodb aggregation multiple times to reduce data to an NxM set with limits?

I am trying to group multiple times to create an NxM matrix of actions over a very large data set.
I have people who can perform actions (set size 10) in locations (set size 1000 per operator, 5 million possible) and I want to produce a report that gives me:
for each operator
for each action
a total count of this type of action performed by this operator
the top N locations where this action was performed by this operator
My input data looks like this:
{ time: 1, operator: 'John', action: 'up', location: 'a' },
{ time: 2, operator: 'Jane', action: 'down', location: 'b' },
{ time: 3, operator: 'John', action: 'down', location: 'a' },
{ time: 4, operator: 'Sean', action: 'charm', location: 'c' },
{ time: 5, operator: 'John', action: 'up', location: 'a' },
{ time: 6, operator: 'Jane', action: 'down', location: 'c' },
...
So for the first stage of the group, I do:
$group: {
_id: {
operator: '$operator',
action: '$action',
location: '$location',
},
count: {$sum: 1}
}
to create:
{ operator: 'John', action: 'up', location: 'a', count: 2},
{ operator: 'John', action: 'down', location: 'a', count: 1},
{ operator: 'Jane', action: 'down', location: 'b', count: 1},
{ operator: 'Jane', action: 'down', location: 'c', count: 1},
{ operator: 'Sean', action: 'charm', location: 'c', count: 1}
Now I want to count every action performed by the operator (could be thousands) but only retain the top 5 locations where each operator performed each action... I want my final output to have records that look something like:
{ operator: 'John',
total_actions: 10576,
actions: {
up: { count: 2052, most: [{a: 92}, {b: 91}, {c: 82}, {qqz: 60}, {d: 54}]},
down: { count: 8482, most: [{loc: count}, {loc: count}...]},
strange: { count: 39, most: [{loc: count}...]},
charm: {count: 3, most: ...}
}
},
{ operator: 'Jane',
total_actions: 38223,
actions: {...}
}
I'm not really fixated on "most" being an array, and it certainly doesn't need to be sorted.
I keep getting stuck on pushing...
I originally wrote the second stage to group on operator/action and push the number of actions at each location onto an array, but I found no way to sort/limit that array.
{ $group: {
_id: {
operator: '$_id.operator',
action: '$_id.action'
},
action_count: {$sum: '$count'},
locations: {
$push: {
location: '$_id.location',
count: '$count'
}
}
}}
Which further reduces the set down to:
{ _id: { operator: 'John', action: 'up' }, action_count: 2, locations: [{location: 'a', count: 2}] },
{ _id: { operator: 'John', action: 'down' }, action_count: 1, locations: [{location: 'a', count: 1}] },
{ _id: { operator: 'Jane', action: 'down' }, action_count: 2, locations: [{location: 'b', count: 1}, {location: 'c', count: 1}] },
My mongo-sense said that was wrong because locations is an array that could have potentially 1000s of entries in it per operator/action. Additionally, I have no operations for sorting and limiting this mess, but if I $unwind it at this stage, it seems like I've just reversed stage 2.
Question 1: What's the right way to proceed from here?
Thoughts:
So instead my next stage, non-intuitively, groups on locations because there could be a ton of locations and relatively few actions, and if I'm grouping on locations, I might be able to still achieve a total-action count before sorting/limiting the locations? I just don't know how to proceed onto the third stage...?
{ $group: {
_id: {
operator: '$_id.operator',
location: '$_id.location'
},
actions: {
$push: {
action: '$_id.action',
count: '$count'
}
}
}}
Gives me:
{ _id: { operator: 'John', location: 'a' }, actions: [{action: 'up', count: 2}, {action: 'down', count: 1}] },
{ _id: { operator: 'Jane', location: 'b' }, actions: [{action: 'down', count: 1}] },
{ _id: { operator: 'Jane', location: 'c' }, actions: [{action: 'down', count: 1}] },
...
I'm not sure I should be using $push at all. Is this a case for $addToSet? My brain is melting.
What is the mongo-sane way to do NxM group reports?
Specifically, I want to count all of the actions performed by the user but only report on the top N locations.
Wow, this was a tough one... Thanks to chridam's answer (not the selected answer, but the proper one) to "Mongodb: Select the top N rows from each group".
The correct solution is, after the first group, to sort all of the entries by count in descending order, then group them by operator and action and push each location onto a list (despite the fact that the list could get huge), then keep just the first few elements of the list with $slice, and continue.
The final pipeline looks like this:
{ $group: {
_id: {
operator: '$operator',
action: '$action',
location: '$location',
},
count: {$sum: 1}
}},
// make sure we push from highest count to least
{$sort: {count: -1}},
{$group: {
_id: {
operator: '$_id.operator',
action: '$_id.action'
},
count: { $sum: '$count' },
locations: {
$push: {
location: '$_id.location',
count: '$count'
}
}
}},
{$group: {
'_id': '$_id.operator',
'total': {'$sum': '$count'},
'actions': {
'$push': {
'action': '$_id.action',
'count': '$count',
// keep just the top 5 locations
'top_locations': {'$slice': ['$locations', 5]}
}
}
}}
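To see what the stages do on the small sample from the question, here is a plain-JavaScript sketch that mimics them in memory. It is only an illustration under some assumptions: report and topN are hypothetical names, the second and third $group are folded into one loop, and actions is built as an object keyed by action name (closer to the wanted output) rather than an array.

```javascript
// Sample events from the question.
const events = [
  { time: 1, operator: 'John', action: 'up', location: 'a' },
  { time: 2, operator: 'Jane', action: 'down', location: 'b' },
  { time: 3, operator: 'John', action: 'down', location: 'a' },
  { time: 4, operator: 'Sean', action: 'charm', location: 'c' },
  { time: 5, operator: 'John', action: 'up', location: 'a' },
  { time: 6, operator: 'Jane', action: 'down', location: 'c' },
];

// topN is a hypothetical parameter; the pipeline above hard-codes 5.
function report(rows, topN) {
  // First $group: count per (operator, action, location).
  const counts = new Map();
  for (const { operator, action, location } of rows) {
    const key = JSON.stringify([operator, action, location]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // $sort: highest count first, so locations are visited in that order.
  const sorted = [...counts]
    .map(([key, count]) => ({ key: JSON.parse(key), count }))
    .sort((x, y) => y.count - x.count);

  // Second and third $group, folded together: per operator, per action.
  const result = {};
  for (const { key: [operator, action, location], count } of sorted) {
    if (!result[operator]) result[operator] = { total: 0, actions: {} };
    const op = result[operator];
    if (!op.actions[action]) op.actions[action] = { count: 0, top_locations: [] };
    const act = op.actions[action];
    op.total += count;   // counts every action, not just the top ones
    act.count += count;
    // $slice equivalent: keep only the first topN locations.
    if (act.top_locations.length < topN) {
      act.top_locations.push({ location, count });
    }
  }
  return result;
}

console.log(JSON.stringify(report(events, 5), null, 2));
```

The point to notice is that the totals are summed over all locations, while only the location list is cut down to the top N.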

mongodb get distinct values with category

Suppose there is the following collection
People:
{
_id: 1,
name: 'john',
last_name: 'blah1',
job: 'lifeguard'
}
{
_id: 2,
name: 'john',
last_name: 'blah2',
job: 'lifeguard'
}
{
_id: 3,
name: 'alex',
last_name: 'blah3',
job: 'lifeguard'
}
{
_id: 4,
name: 'alex',
last_name: 'blah4',
job: 'lifeguard'
}
{
_id: 5,
name: 'alex',
last_name: 'blah5',
job: 'gardener'
}
I need to get the distinct jobs, each with an array of distinct names.
I'm trying to get the following result:
[
{
value: 'lifeguard',
names: [
'john',
'alex'
],
},
{
value: 'gardener',
names: [
'alex'
],
},
]
I understand how to get the unique jobs
db.people.distinct('job')
However, I have not figured out how to do a distinct query over multiple properties.
It is better to use the aggregation framework here: a pipeline with a $group stage groups the documents by the job key, and the $addToSet accumulator builds the array of distinct names within each group.
Consider the following aggregate operation:
db.people.aggregate([
{
"$group": {
"_id": "$job",
"names": { "$addToSet": "$name" }
}
}
])
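As a sanity check of what $group with $addToSet should produce for the sample collection, here is a small in-memory sketch in plain JavaScript. It assumes the five documents from the question and uses a Set per job; the value key mirrors the result shape the question asks for, whereas the pipeline above returns the job under _id.

```javascript
// The five sample documents from the question.
const people = [
  { _id: 1, name: 'john', last_name: 'blah1', job: 'lifeguard' },
  { _id: 2, name: 'john', last_name: 'blah2', job: 'lifeguard' },
  { _id: 3, name: 'alex', last_name: 'blah3', job: 'lifeguard' },
  { _id: 4, name: 'alex', last_name: 'blah4', job: 'lifeguard' },
  { _id: 5, name: 'alex', last_name: 'blah5', job: 'gardener' },
];

// $group by job, collecting names into a Set (the $addToSet equivalent).
const namesByJob = new Map();
for (const { job, name } of people) {
  if (!namesByJob.has(job)) namesByJob.set(job, new Set());
  namesByJob.get(job).add(name);
}

// Reshape into the layout the question asks for.
const result = [...namesByJob].map(([job, names]) => ({
  value: job,
  names: [...names],
}));

console.log(result);
```

A JavaScript Set keeps insertion order, but MongoDB does not specify the order of elements in an $addToSet array, so don't rely on the order of names in the real query.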
@chridam did help me find the right answer. In the real world my object was more like
{
_id: 1,
name: ['john', 'bah1', 'blah2', 'blah3'],
last_name: 'blah1',
job: 'lifeguard'
}
so I had to $unwind the names and then $group, just like in @chridam's answer.
model.aggregate([
{$unwind: "$name"},
{
$group: {
_id:"$name",
jobs: {
$addToSet: "$job"
}
}
}
])

Count of unique items in mongodb documents with Array of Strings

I'm having a problem that seems like it can be solved by some aggregation samples I've seen, but I've not come up with an answer yet.
Basically I have documents like so:
{
date: '2015-01-14 00:00:00.000Z',
attendees: ['john', 'jane', 'james', 'joanne'],
groupName: '31'
}
And I need to find the unique attendees for a groupName and their attendance count. So for example, with the data:
{
date: '2015-01-13 00:00:00.000Z',
attendees: ['john', 'jane', 'james', 'joanne'],
groupName: '31'
},
{
date: '2015-01-14 00:00:00.000Z',
attendees: ['james', 'joanne'],
groupName: '31'
},
{
date: '2015-01-15 00:00:00.000Z',
attendees: ['joanne'],
groupName: '31'
}
I'd like to get something like:
[{
name: 'joanne',
count: 3
}, {
name: 'john',
count: 1
}, {
name: 'james',
count: 2
}]
I can't seem to find an aggregation to get this type of result. Any help is appreciated.
You can do this:
db.collection.aggregate([
{$unwind: '$attendees'},
{$group: {_id: '$attendees', count: {$sum: 1}}},
{$project: {_id:0, name: '$_id', count: '$count'}}
])
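The pipeline above counts attendees across all documents. Since the question asks about one groupName, adding a {$match: {groupName: '31'}} stage in front of the $unwind should restrict it to that group. Here is a plain-JavaScript sketch of the same counting on the sample data, with the filter included; attendanceCounts is a hypothetical helper name.

```javascript
// Sample documents from the question.
const meetings = [
  { date: '2015-01-13 00:00:00.000Z', attendees: ['john', 'jane', 'james', 'joanne'], groupName: '31' },
  { date: '2015-01-14 00:00:00.000Z', attendees: ['james', 'joanne'], groupName: '31' },
  { date: '2015-01-15 00:00:00.000Z', attendees: ['joanne'], groupName: '31' },
];

// Hypothetical helper: count how often each attendee appears in one group.
function attendanceCounts(docs, groupName) {
  const counts = {};
  for (const doc of docs) {
    if (doc.groupName !== groupName) continue; // $match equivalent
    for (const name of doc.attendees) {        // $unwind equivalent
      counts[name] = (counts[name] || 0) + 1;  // $group with $sum: 1
    }
  }
  // $project equivalent: expose the key as name.
  return Object.entries(counts).map(([name, count]) => ({ name, count }));
}

console.log(attendanceCounts(meetings, '31'));
```

For the sample data this gives joanne 3, james 2, and john and jane 1 each.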

Mongodb: limit 2-level nested document

A MongoDB albums collection contains items like this:
var album = {
name: 'album1',
tracks: [{
title: 'track0',
language: 'en',
processing: {
tasks: [
{_id: 1, name: 'someTask1'},
{_id: 2, name: 'someTask2'},
]
}
},{
title: 'track1',
language: 'en',
},{
title: 'track2',
language: 'es',
}]
}
I need to select only one album, with only track0 and only the task with _id 1, so that the result looks like this (it contains only one track and only one task):
{
name: 'album1',
tracks: [{
title: 'track0',
language: 'en',
processing: {
tasks: [
{_id: 1, name: 'someTask1', },
]
}
}]
}
Is it possible to do that without the aggregation framework, just using find?
I tried $elemMatch and the .$ projection to limit the output, but it seems that they don't work at nesting levels deeper than 1 (tasks in this case) =(