Here is an example document. The year field contains year keys, each of which contains some metrics keyed by day:
{
"_id" : NumberInt(1),
"year" : {
"2017" : {
"g1" : {
"1" : {
"total" : 2.0
},
"2" : {
"total" : 5.0
}
},
"g2" : {
"1" : {
"total" : 3.0
},
"2" : {
"total" : 6.0
}
}
}
}
}
I don't want to load the document into memory just to sum the total field for each g# key.
How can I tell MongoDB to sum the total field for each key in the year field?
Result that I want: g1 = 7.0, g2 = 9.0
If you can, change the year part of your structure to something like the one below (preferred):
"year" : [{ "k" : "2017", "v":[{ "k": "g1", "v":[{ "k" : "1","v" : {"total" : 2 }},{ "k" : "2","v" : {"total" : 5}}]}, { "k": "g2", "v":[{ "k" : "1","v" : {"total" : 3 }},{ "k" : "2","v" : {"total" : 6}}]}]}]
You can then use the aggregation below. It works without knowing the keys ahead of time.
The query uses $unwind three times to reach the g and total documents, then groups on the g key and calculates the total sum.
db.collection.aggregate([
{$match:{_id:1}},
{$unwind:"$year"},
{$unwind:"$year.v"},
{$unwind:"$year.v.v"},
{
$group:
{
_id:"$year.v.k",
sum: {$sum:"$year.v.v.v.total"}
}
}
])
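To make the three $unwind stages and the $group concrete, here is a minimal in-memory sketch in plain JavaScript (not MongoDB code) that walks the restructured k/v arrays the same way. The helper name sumByGroupKey is hypothetical, and since it runs on the client it only shows what the pipeline computes; it is not a replacement for it.

```javascript
// Hypothetical helper: mimics $unwind "$year" -> "$year.v" -> "$year.v.v"
// followed by $group on "$year.v.k" with $sum of "$year.v.v.v.total".
function sumByGroupKey(doc) {
  const sums = {};
  for (const yearEntry of doc.year) {       // $unwind: "$year"
    for (const gEntry of yearEntry.v) {     // $unwind: "$year.v"
      for (const dayEntry of gEntry.v) {    // $unwind: "$year.v.v"
        // $group: { _id: "$year.v.k", sum: { $sum: "$year.v.v.v.total" } }
        sums[gEntry.k] = (sums[gEntry.k] || 0) + dayEntry.v.total;
      }
    }
  }
  return sums;
}

// The restructured document from above, as a plain JavaScript object.
const restructured = {
  _id: 1,
  year: [{ k: "2017", v: [
    { k: "g1", v: [{ k: "1", v: { total: 2 } }, { k: "2", v: { total: 5 } }] },
    { k: "g2", v: [{ k: "1", v: { total: 3 } }, { k: "2", v: { total: 6 } }] }
  ] }]
};

console.log(sumByGroupKey(restructured)); // { g1: 7, g2: 9 }
```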
This is the solution if you can't change your structure.
With MongoDB 3.4.4 or later, you can use $objectToArray to convert all the dynamic keys into labeled key/value pairs.
Stages 1 & 2: Match on the _id filter and convert the dynamic year keys into key/value pairs.
Stages 3 & 4: $unwind the year array, then use $reduce to sum the total values, converting the g1 and g2 dynamic keys into key/value pairs with $objectToArray inside each $reduce.
db.collection.aggregate([
{$match:{_id:1}},
{$addFields: {"year": {$objectToArray: "$year"}}},
{$unwind:"$year"},
{
$project:
{
g1:
{
$reduce: {
input: {$objectToArray: "$year.v.g1"},
initialValue: 0,
in: { $sum: [ "$$value", "$$this.v.total" ] }
}
},
g2:
{
$reduce: {
input: {$objectToArray: "$year.v.g2"},
initialValue: 0,
in: { $sum: [ "$$value", "$$this.v.total" ] }
}
}
}
}
])
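The pipeline above hard-codes g1 and g2 in its $project stage. As a rough illustration of handling any number of g keys, here is a plain-JavaScript sketch on the original (unchanged) structure, where Object.entries plays the role of $objectToArray at each level. The helper name sumTotalsPerKey is hypothetical. In the pipeline itself, a similar generalization would presumably mean applying $objectToArray to "$year.v" as well, unwinding it, and grouping on its k; treat that as an untested suggestion.

```javascript
// Hypothetical helper. Object.entries stands in for $objectToArray:
// it turns { "2017": {...} } into [["2017", {...}]], so no key is hard-coded.
function sumTotalsPerKey(doc) {
  const sums = {};
  for (const [, groups] of Object.entries(doc.year)) {    // every year key
    for (const [gKey, days] of Object.entries(groups)) {  // g1, g2, ... (any name)
      for (const [, metric] of Object.entries(days)) {    // "1", "2", ... (day keys)
        sums[gKey] = (sums[gKey] || 0) + metric.total;
      }
    }
  }
  return sums;
}

// The original document from the question, as a plain JavaScript object.
const original = {
  _id: 1,
  year: {
    "2017": {
      g1: { "1": { total: 2.0 }, "2": { total: 5.0 } },
      g2: { "1": { total: 3.0 }, "2": { total: 6.0 } }
    }
  }
};

console.log(sumTotalsPerKey(original)); // { g1: 7, g2: 9 }
```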
Assume the following records in MongoDB:
{
_id: // primary key
age: // some age.
}
The system generates the primary key, which is guaranteed to increase monotonically.
The business logic provides the value for age. Age should be increasing; however, due to a bug, in some rare cases the age could decrease.
E.g., age could go 1 yr, 2 yr, 3 yr, "2 yr", 4 yr, 5 yr, etc.
How can I write a query to spot the outlier in the age?
Assuming your collection is called 'junk' (sorry, no bad intentions here), I think this might work:
db.junk.aggregate([
{$lookup: {
from: "junk",
let: { age: "$age", id: "$_id" },
pipeline: [
{ $match :
{ $expr:
{ $and:
[
{$gt: ["$_id", "$$id"]},
{ $lt: ["$age", "$$age"] }
]
}
}
}
],
as: "data"
}},
{ $project: { _id: 1, "age": 1, "data": 1, "found": { $gt: [{ $size: "$data" }, 0] } } },
{ $match : { found: true }}
])
The intent is to self-join the collection: for each document, look up the later documents (greater _id) whose age is lower than its own. If that lookup returns at least one record, the document is output. (This form of $lookup, with let and pipeline, requires MongoDB 3.6 or later.)
Example collection:
For testing, I populated a collection called 'junk' with 7 documents:
> db.junk.find()
{ "_id" : ObjectId("5daf4700090553aca6da1535"), "age" : 0 }
{ "_id" : ObjectId("5daf4700090553aca6da1536"), "age" : 1 }
{ "_id" : ObjectId("5daf4700090553aca6da1537"), "age" : 2 }
{ "_id" : ObjectId("5daf471b090553aca6da1538"), "age" : 3 }
{ "_id" : ObjectId("5daf471e090553aca6da1539"), "age" : 4 }
{ "_id" : ObjectId("5daf4721090553aca6da153a"), "age" : 3 }
{ "_id" : ObjectId("5daf4724090553aca6da153b"), "age" : 5 }
Results:
Here is what my results look like after running this query...
{ "_id" : ObjectId("5daf471e090553aca6da1539"), "age" : 4, "data" : [ { "_id" : ObjectId("5daf4721090553aca6da153a"), "age" : 3 } ], "found" : true }
It found a record followed by an outlier (ObjectId 5daf471e090553aca6da1539 precedes the outlier; ObjectId 5daf4721090553aca6da153a is the outlier). Obviously this could be projected differently to show just the outlier, but I wanted to first verify that the query works as expected before investing more time in an inadequate approach.
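For a quick sanity check of the self-join logic outside the database, here is a plain-JavaScript sketch that applies the same two conditions to the sample ages above. The helper name findDocsFollowedByLowerAge is hypothetical, and small integer ids stand in for the ObjectIds (which increase in the same order). The sketch compares every document with every other one, so its work grows quadratically with the number of documents.

```javascript
// Hypothetical helper: for each document, collect the later documents
// (greater _id) with a lower age, like the $lookup conditions
// { $gt: ["$_id", "$$id"] } and { $lt: ["$age", "$$age"] }.
function findDocsFollowedByLowerAge(docs) {
  const results = [];
  for (const outer of docs) {
    const data = docs.filter(inner => inner._id > outer._id && inner.age < outer.age);
    if (data.length > 0) {                 // same effect as $match: { found: true }
      results.push({ _id: outer._id, age: outer.age, data });
    }
  }
  return results;
}

// Integer ids 1..7 stand in for the seven ObjectIds, in the same order.
const junk = [
  { _id: 1, age: 0 }, { _id: 2, age: 1 }, { _id: 3, age: 2 },
  { _id: 4, age: 3 }, { _id: 5, age: 4 }, { _id: 6, age: 3 }, { _id: 7, age: 5 }
];

console.log(JSON.stringify(findDocsFollowedByLowerAge(junk)));
// [{"_id":5,"age":4,"data":[{"_id":6,"age":3}]}]
```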
I have documents that have a field called ratings. This is an array of objects, each containing userId and ratingValue:
ratings: Array
0: Object
userId: "uidsample1"
ratingValue: 5
1: Object
userId:"uidsample2"
ratingValue:1.5
I want to write an aggregation pipeline that calculates the new average when one of the ratings in the array is updated or added. Then I want to store that value in the document as a new field called averageRating.
I have tried unwinding and then using $addFields with $avg: "$ratings.ratingValue", but that adds the field to each unwound document and doesn't compute the overall average. It looks something like this (not exactly, since I'm testing in Compass):
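db.test.aggregate([
{
$unwind: {
path: "$ratings"
}
},
{
$addFields: {
averageRating: {
// after $unwind, "$ratings.ratingValue" is a single value per document,
// so this "average" is just that one rating
$avg: "$ratings.ratingValue"
}
}
}
])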
What's a good query structure for this ?
You don't actually need $unwind and $group to calculate the average; those operations are costly.
You can simply use $addFields with $avg, which averages the array values directly:
db.col.aggregate([
{$addFields : {averageRating : {$avg : "$ratings.ratingValue"}}}
])
Sample collection and aggregation:
> db.t62.drop()
true
> db.t62.insert({data : {ratings : [{val : 1}, {val : 2}]}})
WriteResult({ "nInserted" : 1 })
> db.t62.find()
{ "_id" : ObjectId("5c44d9719d56bf65be5ab2e6"), "data" : { "ratings" : [ { "val" : 1 }, { "val" : 2 } ] } }
> db.t62.aggregate([{$addFields : {avg : {$avg : "$data.ratings.val"}}}])
{ "_id" : ObjectId("5c44d9719d56bf65be5ab2e6"), "data" : { "ratings" : [ { "val" : 1 }, { "val" : 2 } ] }, "avg" : 1.5 }
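To illustrate what $avg does when its operand resolves to an array, here is a small plain-JavaScript approximation applied to the ratings from the question: it averages the numeric values, ignores non-numeric ones, and returns null when nothing numeric is left. The helper name avgOfField is hypothetical; this is a client-side sketch, not MongoDB's implementation.

```javascript
// Hypothetical helper approximating $avg on an array operand in $addFields:
// average the numeric elements, skip anything non-numeric, and return null
// when there is nothing numeric to average (including a missing array).
function avgOfField(items, field) {
  const nums = (items || [])
    .map(item => item[field])
    .filter(v => typeof v === "number");
  if (nums.length === 0) return null;
  return nums.reduce((a, b) => a + b, 0) / nums.length;
}

// The ratings shown in the question.
const doc = {
  ratings: [
    { userId: "uidsample1", ratingValue: 5 },
    { userId: "uidsample2", ratingValue: 1.5 }
  ]
};
doc.averageRating = avgOfField(doc.ratings, "ratingValue");
console.log(doc.averageRating); // 3.25
```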
Alternatively, use $group after $unwind, as below, to calculate averageRating. Note that aggregate is a read operation: it does not modify the stored documents, so you need to update the document afterward.
[
{
'$unwind': {
'path': '$ratings'
}
}, {
'$group': {
'_id': '$_id',
'averageRating': {
'$avg': '$ratings.ratingValue'
}
}
}
]
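Here is a plain-JavaScript sketch of what the $unwind plus $group pair computes: one row per rating, then a running sum and count per _id. The helper name averageRatingsByUnwindGroup is hypothetical. As with $unwind's default behavior, a document with a missing or empty ratings array produces no rows and therefore no result.

```javascript
// Hypothetical helper mirroring $unwind: "$ratings" followed by
// $group: { _id: "$_id", averageRating: { $avg: "$ratings.ratingValue" } }.
function averageRatingsByUnwindGroup(docs) {
  const acc = new Map();                        // _id -> { sum, count }
  for (const doc of docs) {
    // Missing or empty ratings yield no rows, like $unwind's default.
    for (const rating of doc.ratings || []) {
      const entry = acc.get(doc._id) || { sum: 0, count: 0 };
      entry.sum += rating.ratingValue;          // assumes ratingValue is numeric
      entry.count += 1;
      acc.set(doc._id, entry);
    }
  }
  return [...acc].map(([id, { sum, count }]) => ({ _id: id, averageRating: sum / count }));
}

const docs = [
  { _id: 1, ratings: [{ userId: "uidsample1", ratingValue: 5 }, { userId: "uidsample2", ratingValue: 1.5 }] },
  { _id: 2, ratings: [] }
];
console.log(averageRatingsByUnwindGroup(docs)); // [ { _id: 1, averageRating: 3.25 } ]
```

Since aggregate only reads, the computed value still has to be written back. On MongoDB 4.2 or later, an update can also take an aggregation pipeline, e.g. [{ $set: { averageRating: { $avg: "$ratings.ratingValue" } } }], which avoids the separate read.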
Assume I have the following structure:
"KnownName" : {
"unknownName1" : {
"id" : "unknownName1",
"value" : "5"
},
"unknownName2" : {
"id" : "unknownName2",
"value" : "5"
},
"unknownName3" : {
"id" : "unknownName3",
"value" : "5"
},
"unknownName4" : {
"id" : "unknownName4",
"value" : "5"
},
"unknownName5" : {
"id" : "unknownName5_v2",
"value" : "5"
},
"unknownName6" : {
"id" : "unknownName6",
"value" : "5"
}
}
... and many more documents like the above, in various forms
I want to get all of these counted, like this:
unknownName1 : 24
unknownName2 : 27
unknownName3 : 10
....
unknownName37 : 12
I know my structure down to the 'KnownName' node, but within this node I can have several different labels (here unknownName1 to unknownName6); there can be more or fewer, and they can differ by document. Typically the id inside each subdocument has the same name as its label, but that is not a given (as with unknownName5).
I was looking for a way to get a distinct count of all these 'unknownNames', but this seems to be more challenging than expected.
Any advice on how this can be achieved (preferably using the aggregation framework)?
If there is an easy way to get all (deep) children labelled "id" in the "KnownName" tree without knowing the unknown parent name, that would also work for me. I'm aware there is no such thing as wildcards in Mongo, but I'm looking for an alternative to something like KnownName.*.id.
You need to start with $objectToArray, since your keys are unknown. That gives you an array of keys and values that can be processed with $group to get counts. You can also use $replaceRoot and $arrayToObject to get the dynamic keys in the root object:
db.col.aggregate([
{
$addFields: {
unknown: { $objectToArray: "$KnownName" }
}
},
{
$unwind: "$unknown"
},
{
$group: {
_id: "$unknown.k",
count: { $sum: 1 }
}
},
{
$sort: { _id: 1 }
},
{
$group: {
_id: null,
data: { $push: { k: "$_id", v: "$count" } }
}
},
{
$replaceRoot: {
newRoot: {
$arrayToObject: "$data"
}
}
}
])
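Here is a plain-JavaScript sketch of what this pipeline computes, run on two made-up documents (hypothetical sample data, not from the question). Object.entries stands in for $objectToArray, a Map for the $group with { $sum: 1 }, and Object.fromEntries for $arrayToObject. The byId flag groups on the inner id value instead of the key; in the pipeline that should correspond to grouping on "$unknown.v.id" instead of "$unknown.k", which may cover the KnownName.*.id part of the question.

```javascript
// Hypothetical helper mirroring $objectToArray -> $unwind -> $group/$sum
// -> $sort by _id -> $arrayToObject.
function countKnownNameKeys(docs, byId = false) {
  const counts = new Map();
  for (const doc of docs) {
    for (const [k, v] of Object.entries(doc.KnownName || {})) {
      // byId = true counts the inner "id" value instead of the key name.
      const key = byId ? v.id : k;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  const sorted = [...counts].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return Object.fromEntries(sorted);
}

// Made-up sample documents for illustration only.
const sampleDocs = [
  { KnownName: {
      unknownName1: { id: "unknownName1", value: "5" },
      unknownName5: { id: "unknownName5_v2", value: "5" } } },
  { KnownName: {
      unknownName1: { id: "unknownName1", value: "5" } } }
];

console.log(countKnownNameKeys(sampleDocs));       // { unknownName1: 2, unknownName5: 1 }
console.log(countKnownNameKeys(sampleDocs, true)); // { unknownName1: 2, unknownName5_v2: 1 }
```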
I'd like to get percentages from a group pipeline in a MongoDB aggregate.
My data:
{
_id : 1,
name : 'hello',
type : 'big'
},
{
_id : 2,
name : 'bonjour',
type : 'big'
},
{
_id : 3,
name : 'hi',
type : 'short'
},
{
_id : 4,
name : 'salut',
type : 'short'
},
{
_id : 5,
name : 'ola',
type : 'short'
}
My request groups by type and counts:
[{
$group : {
_id : {
type : '$type'
},
"count" : {
"$sum" : 1
}
}
}]
Result:
[
{
_id : {
type : 'big'
},
count : 2
},
{
_id : {
type : 'short'
},
count : 3
}
]
But I'd like to have count AND percentage, like this:
[
{
_id : {
type : 'big'
},
count: 2,
percentage: 40%
},
{
_id : {
type : 'short'
},
count: 3,
percentage: 60%
}
]
But I have no idea how to do that. I've tried $divide and other things, but without success. Could you please help me?
Well, I think percentage should be a string if the value contains %.
First, you will need to count the number of documents.
var nums = db.collection.count();
db.collection.aggregate(
[
{ "$group": { "_id": {"type": "$type"}, "count": { "$sum": 1 }}},
{ "$project": {
"count": 1,
"percentage": {
"$concat": [ { "$substr": [ { "$multiply": [ { "$divide": [ "$count", {"$literal": nums }] }, 100 ] }, 0,2 ] }, "", "%" ]}
}
}
]
)
Result
{ "_id" : { "type" : "short" }, "count" : 3, "percentage" : "60%" }
{ "_id" : { "type" : "big" }, "count" : 2, "percentage" : "40%" }
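As a client-side illustration of the same arithmetic (count per type, times 100, divided by the total), here is a plain-JavaScript sketch; the helper name percentagesByType is hypothetical. It formats the number explicitly instead of taking a two-character substring: with the [0, 2] $substr above, a type making up 100% of the documents would presumably come out as "10%".

```javascript
// Hypothetical helper: group by type with a count, then compute each
// group's share of the total and format it as a string like "40%".
function percentagesByType(docs, decimals = 0) {
  const total = docs.length;                        // stands in for db.collection.count()
  const counts = new Map();
  for (const { type } of docs) {
    counts.set(type, (counts.get(type) || 0) + 1);  // $group with { $sum: 1 }
  }
  return [...counts].map(([type, count]) => ({
    _id: { type },
    count,
    // count * 100 / total, rounded to `decimals` places, then suffixed with "%"
    percentage: ((count * 100) / total).toFixed(decimals) + "%"
  }));
}

// The sample data from the question.
const data = [
  { _id: 1, name: "hello", type: "big" },
  { _id: 2, name: "bonjour", type: "big" },
  { _id: 3, name: "hi", type: "short" },
  { _id: 4, name: "salut", type: "short" },
  { _id: 5, name: "ola", type: "short" }
];
console.log(JSON.stringify(percentagesByType(data)));
// [{"_id":{"type":"big"},"count":2,"percentage":"40%"},{"_id":{"type":"short"},"count":3,"percentage":"60%"}]
```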
First, find the total number of documents in the collection using the count method, then use that count variable to calculate the percentage in the aggregation, like this:
var totalDocument = db.collectionName.count() // count total documents
Use totalDocument in the aggregation as below:
db.collectionName.aggregate({"$group":{"_id":{"type":"$type"},"count":{"$sum":1}}},
{"$project":{"count":1,"percentage":{"$multiply":[{"$divide":[100,totalDocument]},"$count"]}}})
EDIT
If you need to do this in a single aggregation query, you can use $unwind, but $unwind creates a Cartesian-product problem. Check the aggregation query below:
db.collectionName.aggregate({"$group":{"_id":null,"count":{"$sum":1},"data":{"$push":"$$ROOT"}}},
{"$unwind":"$data"},
{"$group":{"_id":{"type":"$data.type"},"count":{"$sum":1},
"total":{"$first":"$count"}}},
{"$project":{"count":1,"percentage":{"$multiply":[{"$divide":[100,"$total"]},"$count"]}}}
).pretty()
I recommend first finding the total count and using that count in the aggregation, as in the first query.
I have a user base stored in mongo. Users may record their date of birth.
I need to run a report aggregating users by age.
I now have a pipeline that groups users by year of birth. However, that is not precise enough because most people are not born on January 1st; so even if they are born in, say, 1970, they may well not be 43 yet.
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"YearOfBirth" : {$year : "$DateOfBirth"} } },
{ $group : { _id : "$YearOfBirth", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
Do you know if it's possible to perform some kind of arithmetic within the aggregation framework to exactly calculate the age of a user? Or is this possible with MapReduce only?
It seems the whole thing is possible with the newly released MongoDB 2.4, which supports additional date operations (namely $subtract).
Here's how I did it:
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"ageInMillis" : {$subtract : [new Date(), "$DateOfBirth"] } } },
{ $project : {"age" : {$divide : ["$ageInMillis", 31558464000] }}},
// take the floor of the previous number:
{ $project : {"age" : {$subtract : ["$age", {$mod : ["$age",1]}]}}},
{ $group : { _id : "$age", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
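The constant 31558464000 is 365.26 days in milliseconds, so dividing by it only approximates age and can be off by one right around a birthday. The plain-JavaScript sketch below contrasts that approximation with a different technique, an exact calendar comparison (year difference, minus one if this year's birthday has not come yet). Both helper names are hypothetical, and UTC dates are an assumption about how DateOfBirth is stored.

```javascript
const MS_PER_YEAR = 31558464000; // constant from the pipeline above (365.26 days)

// Mirrors the pipeline: $subtract dates -> milliseconds, $divide, then floor
// with the same { $subtract: ["$age", { $mod: ["$age", 1] }] } trick.
function approxAge(birth, now) {
  const years = (now - birth) / MS_PER_YEAR;
  return years - (years % 1);
}

// Exact calendar age in whole years, using UTC date fields (an assumption).
function exactAge(birth, now) {
  let age = now.getUTCFullYear() - birth.getUTCFullYear();
  const beforeBirthday =
    now.getUTCMonth() < birth.getUTCMonth() ||
    (now.getUTCMonth() === birth.getUTCMonth() && now.getUTCDate() < birth.getUTCDate());
  if (beforeBirthday) age -= 1;  // birthday not reached yet this year
  return age;
}

const now = new Date(Date.UTC(2013, 2, 19));   // 2013-03-19, fixed for repeatability
const born = new Date(Date.UTC(2001, 2, 19));  // exactly 12 years earlier
console.log(exactAge(born, now), approxAge(born, now)); // 12 11
```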
There are not enough date-time and math operators to compute the exact age. But you might be able to create age ranges by composing a dynamic query:
Define your age ranges as cut-off dates:
dt18 = today - 18 years
dt25 = today - 25 years
...
dt65 = today - 65 years
Then use nested conditionals, progressively using the cut-off dates as age-group markers, like so:
db.folks.save({ "_id" : 1, "bd" : ISODate("2000-02-03T00:00:00Z") });
db.folks.save({ "_id" : 2, "bd" : ISODate("2010-06-07T00:00:00Z") });
db.folks.save({ "_id" : 3, "bd" : ISODate("1990-10-20T00:00:00Z") });
db.folks.save({ "_id" : 4, "bd" : ISODate("1964-09-23T00:00:00Z") });
db.folks.aggregate(
{
$project: {
ageGroup: {
$cond: [{
$gt: ["$bd",
ISODate("1995-03-19")]
},
"age0_18",
{
$cond: [{
$gt: ["$bd",
ISODate("1988-03-19")]
},
"age18_25",
"age25_plus"]
}]
}
}
},
{
$group: {
_id: "$ageGroup",
count: {
$sum: 1
}
}
})
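Computing the cut-off dates by hand is error-prone, so here is a plain-JavaScript sketch that derives them from a given day (today minus N years, in UTC) and then buckets the same four sample birthdays with the same nested conditions. The helper names yearsBefore, ageGroup and countAgeGroups are hypothetical. Presumably the computed dt18 and dt25 could be passed into the $cond expressions in place of the hard-coded ISODate values, but that is untested here.

```javascript
// Hypothetical helper: the same calendar day N years earlier, in UTC.
// Note: Feb 29 rolls over to Mar 1 when the target year is not a leap year.
function yearsBefore(today, n) {
  return new Date(Date.UTC(today.getUTCFullYear() - n, today.getUTCMonth(), today.getUTCDate()));
}

// Mirrors the nested $cond: born after the 18-year cut-off -> "age0_18", etc.
function ageGroup(bd, dt18, dt25) {
  if (bd > dt18) return "age0_18";
  if (bd > dt25) return "age18_25";
  return "age25_plus";
}

function countAgeGroups(folks, today) {
  const dt18 = yearsBefore(today, 18);
  const dt25 = yearsBefore(today, 25);
  const counts = {};
  for (const { bd } of folks) {
    const g = ageGroup(bd, dt18, dt25);  // $project: { ageGroup: ... }
    counts[g] = (counts[g] || 0) + 1;    // $group with { $sum: 1 }
  }
  return counts;
}

// The four sample documents from the answer above.
const folks = [
  { _id: 1, bd: new Date("2000-02-03T00:00:00Z") },
  { _id: 2, bd: new Date("2010-06-07T00:00:00Z") },
  { _id: 3, bd: new Date("1990-10-20T00:00:00Z") },
  { _id: 4, bd: new Date("1964-09-23T00:00:00Z") }
];

const today = new Date(Date.UTC(2013, 2, 19)); // 2013-03-19, matching the hard-coded ISODates
console.log(countAgeGroups(folks, today)); // { age0_18: 2, age18_25: 1, age25_plus: 1 }
```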