I have a user base stored in MongoDB. Users may record their date of birth.
I need to run a report aggregating users by age.
I now have a pipeline that groups users by year of birth. However, that is not precise enough because most people are not born on January 1st; so even if they are born in, say, 1970, they may well not be 43 yet.
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"YearOfBirth" : {$year : "$DateOfBirth"} } },
{ $group : { _id : "$YearOfBirth", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
Do you know if it's possible to perform some kind of arithmetic within the aggregation framework to exactly calculate the age of a user? Or is this possible with MapReduce only?
It seems the whole thing is possible with the newly released MongoDB 2.4, which supports additional date operations (namely $subtract).
Here's how I did it:
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"ageInMillis" : {$subtract : [new Date(), "$DateOfBirth"] } } },
// 31558464000 ms is roughly one year (365.26 days), so the resulting age is an approximation:
{ $project : {"age" : {$divide : ["$ageInMillis", 31558464000] }}},
// take the floor of the previous number:
{ $project : {"age" : {$subtract : ["$age", {$mod : ["$age",1]}]}}},
{ $group : { _id : "$age", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
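Dividing by a fixed year length is an approximation, so the computed age can be off by one close to a birthday. If exactness matters, one option is to compute the age in application code with calendar arithmetic. A minimal sketch in plain JavaScript; the function name ageInYears and the choice of UTC calendar fields are assumptions for illustration, not part of the answer above:

```javascript
// Hypothetical helper: whole years elapsed between a date of birth and "now",
// compared on UTC calendar fields instead of a fixed number of milliseconds.
function ageInYears(dateOfBirth, now) {
  let age = now.getUTCFullYear() - dateOfBirth.getUTCFullYear();
  const monthDiff = now.getUTCMonth() - dateOfBirth.getUTCMonth();
  const dayDiff = now.getUTCDate() - dateOfBirth.getUTCDate();
  // Subtract one year if this year's birthday has not happened yet.
  if (monthDiff < 0 || (monthDiff === 0 && dayDiff < 0)) {
    age -= 1;
  }
  return age;
}

// Born in June 1970, report run in March 2013: still 42, not 43.
const exampleAge = ageInYears(
  new Date("1970-06-15T00:00:00Z"),
  new Date("2013-03-19T00:00:00Z")
);
```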
There are not enough dateTime operators and math operators to project out the date. But you might be able to create age ranges by composing a dynamic query:
Define your age-group boundaries as cut-off dates:
dt18 = today - 18 years
dt25 = today - 25 years
...
dt65 = today - 65 years
Then nest conditionals, progressively using the cut-off dates as age group markers, like so:
db.folks.save({ "_id" : 1, "bd" : ISODate("2000-02-03T00:00:00Z") });
db.folks.save({ "_id" : 2, "bd" : ISODate("2010-06-07T00:00:00Z") });
db.folks.save({ "_id" : 3, "bd" : ISODate("1990-10-20T00:00:00Z") });
db.folks.save({ "_id" : 4, "bd" : ISODate("1964-09-23T00:00:00Z") });
db.folks.aggregate([
  {
    $project: {
      ageGroup: {
        $cond: [
          // born after the 18-year cut-off: younger than 18
          { $gt: ["$bd", ISODate("1995-03-19")] },
          "age0_18",
          {
            $cond: [
              // born after the 25-year cut-off: between 18 and 25
              { $gt: ["$bd", ISODate("1988-03-19")] },
              "age18_25",
              "age25_plus"
            ]
          }
        ]
      }
    }
  },
  {
    $group: {
      _id: "$ageGroup",
      count: { $sum: 1 }
    }
  }
])
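The cut-off dates in that pipeline are hard-coded for one particular day. A small sketch of how they could be derived at run time in plain JavaScript; cutoffDate, the fixed report date, and the variable names are assumptions made for illustration:

```javascript
// Hypothetical helper: the birth date of someone who turns `years` old on `today`.
// Anyone born after this date is younger than `years`.
function cutoffDate(today, years) {
  const cutoff = new Date(today.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - years);
  return cutoff;
}

const today = new Date("2013-03-19T00:00:00Z"); // assumed report date
const dt18 = cutoffDate(today, 18); // 1995-03-19
const dt25 = cutoffDate(today, 25); // 1988-03-19

// Nested $cond built from the cut-offs, same shape as the $project stage above.
const ageGroup = {
  $cond: [
    { $gt: ["$bd", dt18] },
    "age0_18",
    { $cond: [{ $gt: ["$bd", dt25] }, "age18_25", "age25_plus"] }
  ]
};
```

The ageGroup object could then be placed in the $project stage instead of the literal ISODate values.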
I have documents with a field called ratings. This is an array of objects, each containing userId and ratingValue:
ratings: Array
0: Object
userId: "uidsample1"
ratingValue: 5
1: Object
userId:"uidsample2"
ratingValue:1.5
I want to build an aggregation pipeline that calculates the new average whenever one of the ratings in the array is updated or added. Then I want to store that value in the document as a new field called averageRating.
I have tried unwinding and then using $addFields with $avg : "$ratings.ratingValue", but it adds the field to each unwound document and doesn't compute the average. It looks something like this (not exactly, since I'm testing in Compass):
db.test.aggregate([
{
$unwind: {
path: "$ratings"
}
},
{
$addFields: {
averageRating: {
$avg: "$ratings.ratingValue"
}
}
}
])
What's a good query structure for this?
You don't actually need $unwind and $group to calculate the average; these operations are costly.
You can simply use $addFields with $avg directly on the array:
db.col.aggregate([
{$addFields : {averageRating : {$avg : "$ratings.ratingValue"}}}
])
sample collection and aggregation
> db.t62.drop()
true
> db.t62.insert({data : {ratings : [{val : 1}, {val : 2}]}})
WriteResult({ "nInserted" : 1 })
> db.t62.find()
{ "_id" : ObjectId("5c44d9719d56bf65be5ab2e6"), "data" : { "ratings" : [ { "val" : 1 }, { "val" : 2 } ] } }
> db.t62.aggregate([{$addFields : {avg : {$avg : "$data.ratings.val"}}}])
{ "_id" : ObjectId("5c44d9719d56bf65be5ab2e6"), "data" : { "ratings" : [ { "val" : 1 }, { "val" : 2 } ] }, "avg" : 1.5 }
Use $group after $unwind as below to calculate the averageRating. Aggregate is a read operation. You need to update the doc afterward.
[
{
'$unwind': {
'path': '$ratings'
}
}, {
'$group': {
'_id': '$_id',
'averageRating': {
'$avg': '$ratings.ratingValue'
}
}
}
]
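Since aggregate only reads, the computed averages still have to be written back to the documents. One way is to turn each aggregation result into an update operation. A minimal sketch, assuming the results have the shape { _id, averageRating } produced by the $group stage above; toBulkUpdates is a hypothetical helper name, and the operation objects follow the updateOne form accepted by bulkWrite:

```javascript
// Hypothetical helper: map aggregation results to bulkWrite operations
// that store the computed average on the matching document.
function toBulkUpdates(results) {
  return results.map(function (doc) {
    return {
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { averageRating: doc.averageRating } }
      }
    };
  });
}

// Example with made-up aggregation output:
const ops = toBulkUpdates([{ _id: 1, averageRating: 3.25 }]);
// In the shell, ops could then be passed to db.test.bulkWrite(ops).
```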
I want to aggregate my data into an array of stored dates, grouped by user and day of the year, for the current day only. For example, for this data (assuming today is February 24th):
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-24T22:33:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-24T23:33:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 2,
"heure" : ISODate("2017-02-24T22:34:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-25T07:21:27.858Z")
}
I want to get this:
{
"_id" : {user : 1, jour : 55},
"date" : [ISODate("2017-02-24T22:33:27.858Z"), ISODate("2017-02-24T23:33:27.858Z") ]
}
{
"_id" : {user : 2, jour : 55},
"date" : [ISODate("2017-02-24T22:34:27.858Z") ]
}
I tried using $push and $match, but everything failed.
Optionally, I want the time between two dates; for user 1, for example, an additional field containing 1 hour. But I want to use each date at most once, so with 4 dates in the array I need only one sum: the difference between the first and second plus the difference between the third and fourth. I want to see this to learn how to use $cond properly.
Here is my current pipeline:
[
{ $match : {$eq : [{$dayOfYear : "$heure"}, {$dayOfYear : ISODate()}] } },
{
$group : {
_id : {
user : "$user",
},
date : {$push: "$heure"},
nombre: { $sum : 1 }
}
}
]
For now, I don't handle the second part of the aggregate function.
For the filtering part you need to use the $redact pipeline stage: it keeps every document that matches the condition, via the $$KEEP system variable returned by $cond on the $dayOfYear comparison, and discards the others with $$PRUNE.
Consider composing your final aggregate pipeline as:
[
{
"$redact": {
"$cond": [
{
"$eq": [
{ "$dayOfYear": "$heure" },
{ "$dayOfYear": new Date() }
]
},
"$$KEEP",
"$$PRUNE"
]
}
},
{
"$group": {
"_id": {
"user": "$user",
"jour": { "$dayOfYear": "$heure" }
},
"date": { "$push": "$heure" },
"nombre": { "$sum": 1 }
}
}
]
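The optional second part of the question (adding up the time between the first and second date, the third and fourth, and so on) is not covered by the pipeline above. One option is to do it in application code on the pushed date array. A minimal sketch in plain JavaScript; pairedDurationMs is a hypothetical name, and it assumes the array is sorted in ascending order and that an unpaired trailing date is ignored:

```javascript
// Hypothetical helper: sum the gaps between consecutive pairs of dates,
// using each date at most once: (d1 - d0) + (d3 - d2) + ...
function pairedDurationMs(dates) {
  let total = 0;
  for (let i = 0; i + 1 < dates.length; i += 2) {
    total += dates[i + 1].getTime() - dates[i].getTime();
  }
  return total;
}

// The two dates of user 1 from the sample data:
const user1 = [
  new Date("2017-02-24T22:33:27.858Z"),
  new Date("2017-02-24T23:33:27.858Z")
];
const oneHourMs = pairedDurationMs(user1); // 3600000 ms, i.e. 1 hour
```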
I'm trying to fix a small issue. I have searched around for help but couldn't find any, so I'm asking here.
I'm trying to get the most viewed products from a visitor log. The data in my MongoDB looks like this:
{
"_id" : ObjectId("56617f12cc8eaaa6010041ab"),
"Product" : {
"UUID" : "c7b3e23e-0bf9-4fd5-b8d3-f559b80c42ed"
},
"Log" : {
"Method" : "visit",
"IP" : "127.0.0.1",
"Added" : "2015-12-04 12:54:58"
}
}
What I want is to group by the Product.UUID field, considering only logs that are not older than 1.5 months. What I have so far looks like this:
db.getCollection('log-product').aggregate([
{
"$group" : {
_id:"$Product.UUID",
total: {$sum : 1}
}
},
{"$sort" : {total: -1}},
{"$limit" : 8}
])
Here I group on Product.UUID, sort descending on the total count, and limit the result to 8. My problem is that I can't find a way to count how many visitors each product has.
I hope somebody out there can help me with this question.
You need to filter "Log.Added" by time interval first then pass the results to $group:
db.getCollection('log-product').aggregate([
{
"$match": {
"Log.Added": { $gt: new Date(2015,10, 15), $lt: new Date(2015,11,15) }
}
},
{
"$group" : {
_id:"$Product.UUID",
total: {$sum : 1}
}
},
{"$sort" : {total: -1}},
{"$limit" : 8}
])
You can filter by Log.Added and group by product UUID and Log.IP:
var currentDate = new Date();
var dateOffset = (24*60*60*1000) * 45;
var initInterval = new Date(new Date() - dateOffset);
db.getCollection('log-product').aggregate([
{ "$match" : { "Log.Added": {$lte: currentDate, $gte: initInterval}}},
{
"$group" : {
_id:{"product": "$Product.UUID", "visitor":"$Log.IP"},
total: {$sum : 1}
}
},
{"$sort" : {total: -1}},
{"$limit" : 8}
])
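Both answers compare Log.Added with Date objects, which only matches if the field is stored as a BSON date. In the sample document it is a string such as "2015-12-04 12:54:58". Because that format is zero-padded and ordered from year down to second, plain string comparison sorts chronologically, so string bounds could be used instead. A minimal sketch; toLogString and recentLogsMatch are hypothetical helpers, and it assumes the strings were written in UTC:

```javascript
// Hypothetical helper: format a Date like the stored strings,
// e.g. "2015-12-04T12:54:58.000Z" -> "2015-12-04 12:54:58".
function toLogString(date) {
  return date.toISOString().slice(0, 19).replace("T", " ");
}

// Hypothetical helper: a $match stage selecting logs from the last `days` days.
function recentLogsMatch(now, days) {
  const from = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return {
    $match: { "Log.Added": { $gte: toLogString(from), $lte: toLogString(now) } }
  };
}

// 45 days is used here as an approximation of 1.5 months.
const stage = recentLogsMatch(new Date("2015-12-04T12:54:58Z"), 45);
```

The resulting stage could replace the $match stage in either pipeline above.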
I have many documents like this one:
{
"_id" : ObjectId("54a94200aa76d3db6cd51977"),
"URL" : "http://...",
"Statistics" : [
{
"Date" : ISODate("2010-05-18T18:07:29.000+0000"),
"Clicks" : NumberInt(250),
},
{
"Date" : ISODate("2010-05-21T12:06:41.000+0000"),
"Clicks" : NumberInt(165),
},
{
"Date" : ISODate("2010-05-30T08:37:50.000+0000"),
"Clicks" : NumberInt(263),
}
]
}
My query looks like this:
db.clicks.aggregate([
{ $match : { 'Statistics.Date' : { $gte: new Date("2010-05-18T00:00:00.000Z"), $lte: new Date("2010-05-18T23:59:59.999Z") } } },
{ $unwind : '$Statistics' },
{ $group : { _id : { year : { $year : '$Statistics.Date' }, month : { $month : '$Statistics.Date' }, day : { $dayOfMonth : '$Statistics.Date' } }, Clicks : { $sum : '$Statistics.Clicks' } } },
{ $sort : { _id : 1 } }
])
When I try to sum up the clicks from a specific date it gives me all dates, instead of only one. What am I doing wrong? Thanks in advance.
Edit 1:
As there are >80.000 documents in that collection I can't do a $unwind before the $match. Also afaik this would be not a good idea, 'cause that would make the query slower than necessary.
The huge amount of documents and data in it is the reason why I have to use $sum. The document I made above is just an example and only the structure is the same as in my project.
The above query gives me back something like this:
{
"_id" : [
{
"year" : 2010,
"month" : 5,
"day" : 18
}
],
"Clicks" : 250
},
{
"_id" : [
{
"year" : 2010,
"month" : 4,
"day" : 21
}
],
"Clicks" : 165
},
{
"_id" : [
{
"year" : 2010,
"month" : 5,
"day" : 30
}
],
"Clicks" : 263
}
If I don't use $group I also have to use $limit, as the result would otherwise exceed 16MB:
db.clicks.aggregate([
{ $match : { 'Statistics.Date' : { $gte: new Date("2010-05-18T00:00:00.000Z"), $lte: new Date("2010-05-18T23:59:59.999Z") } } },
{ $unwind : '$Statistics' },
{ $limit : 1 }
])
This is the result:
{
"_id" : ObjectId("54a94200aa76d3db6cd51977"),
"URL" : "http://...",
"Statistics" : {
"Date" : {
"sec" : 1274166878,
"usec" : 0
},
"Clicks" : 250
}
}
For performance reasons I have to use $group; leaving it out is not an option.
As I did all of this in PHP, there may be some errors in the document, queries and results shown above. Hopefully this won't be a problem. I still haven't figured out what's causing my problem. Can anyone help me?
Edit 2:
As this seems to be a performance issue that can't be solved, I'm migrating all the data from the 'Statistics' array into its own collection. Thanks to everyone for your help.
You need to run your $match twice, both before and after the $unwind:
db.clicks.aggregate([
{ $match : { 'Statistics.Date' : {
$gte: new ISODate("2010-05-18T00:00:00.000Z"),
$lte: new ISODate("2010-05-18T23:59:59.999Z") } } },
{ $unwind: '$Statistics' },
{ $match : { 'Statistics.Date' : {
$gte: new ISODate("2010-05-18T00:00:00.000Z"),
$lte: new ISODate("2010-05-18T23:59:59.999Z") } } },
{ $group : {
_id : { year : { $year : '$Statistics.Date' },
month : { $month : '$Statistics.Date' },
day : { $dayOfMonth : '$Statistics.Date' } },
Clicks : { $sum : '$Statistics.Clicks' } } },
{ $sort : { _id : 1 } }
])
The first $match is used to select the documents with at least one Statistics element in the right date range. The second one is used to filter out the other Statistics elements of those docs that aren't in the right date range.
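Because the same date range appears in both $match stages, it can help to define it once and reuse it. A minimal sketch that builds the pipeline in plain JavaScript; clicksPerDayPipeline is a hypothetical helper name, and Date is used in place of the shell's ISODate so the snippet also runs outside the mongo shell:

```javascript
// Hypothetical helper: build the match / unwind / match / group pipeline
// from a single definition of the date range.
function clicksPerDayPipeline(start, end) {
  const inRange = { $match: { "Statistics.Date": { $gte: start, $lte: end } } };
  return [
    inRange, // keep documents with at least one array element in range
    { $unwind: "$Statistics" },
    inRange, // drop the unwound elements that fall outside the range
    {
      $group: {
        _id: {
          year: { $year: "$Statistics.Date" },
          month: { $month: "$Statistics.Date" },
          day: { $dayOfMonth: "$Statistics.Date" }
        },
        Clicks: { $sum: "$Statistics.Clicks" }
      }
    },
    { $sort: { _id: 1 } }
  ];
}

const pipeline = clicksPerDayPipeline(
  new Date("2010-05-18T00:00:00.000Z"),
  new Date("2010-05-18T23:59:59.999Z")
);
// In the shell this could be run as db.clicks.aggregate(pipeline).
```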
This may already have been solved, but I'm posting an answer for anyone who comes to this question looking for help.
{ $match : { 'Statistics.Date' : { $gte: new Date("2010-05-18T00:00:00.000Z"),
$lte: new Date("2010-05-18T23:59:59.999Z") } } }
This $match filters the top-level documents. What you want is to filter the subdocuments inside the Statistics array.
The documents that pass $match still contain the full Statistics array. Unwinding after filtering therefore also yields subdocuments that did not match themselves but whose siblings (subdocuments in the same array) passed the $match condition.
Note: a positional projection in a simple find:
db.col_name.find({},{"Statistics.$":1}) will filter the array too (given a query condition on the array, it returns only the first matching element), but
$project in aggregation does not help with filtering an array of subdocuments in this way.
I'm attempting to use the new MongoDB aggregation features to tally some statistics by date. Below is a sample of the documents I am working with, my attempted code, and the desired result. The aggregation function returns "UNDEFINED". Can someone tell me why that is? Secondly, I want my aggregation function to group results by date in mm-dd-yyyy format. However, as currently written, I think the code will aggregate by the full ISO date. Can someone please tell me how to fix this?
DOCUMENT EXAMPLE
{
user: "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
language: "English",
imageFileName: "F7A5ED9-D43C-4671-A5C6-F06C7E41F902-7758-000008371FB5B834",
audioFileName: "F6D5727D-9377-4092-A28A-AA900F02653D-7758-0000083749066CF2",
date: ISODate("2012-10-22T02:43:52Z"),
correct: "1",
_id: ObjectId("5084b2e8179c41cc15000001")
}
AGGREGATION FUNCTION
var getUserStats = function(user, language, callback) {
var guessCollection = db.collection('Guesses');
guessCollection.aggregate(
{ $match: {
user: user,
language: language,
}},
{ $sort: {
date: 1
}},
{ $project : {
user : 1,
language : 1,
date : 1,
correct : 1,
incorrect : 1,
} },
{ $unwind : "$language" },
{ $group : {
_id : "$date",
correct : { $sum : "$correct" },
incorrect : { $sum : "$incorrect" }
} }
, function(err, result){
console.log(result);
callback(result);
});
};
DESIRED RESULT
{
"result" : [
//...snip...
{
"_id" : "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
"correct" : 32,
"incorrect" : 17,
"date" : 2012-10-22
},
{
"_id" : "2A8761E4-C13A-470E-A759-91432D61B6AF-25982-0000352D853511AF",
"correct" : 16,
"incorrect" : 7,
"date" : 2012-10-23
}
],
"Ok" : 1
}
Regarding your first question about it returning undefined, there are two problems:
You are using the $unwind operator on a field ($language) that isn't an array.
You are using the $sum operator on a string field ($correct); that's only supported for number fields.
For your second question about grouping on just the date, you need to project the date components you want to group on and then use those components in your $group operator's _id value.
For example:
test.aggregate(
{ $match: {
user: user,
language: language
}},
{ $sort: {
date: 1
}},
{ $project : {
user : 1,
language : 1,
year : { $year: '$date' },
month : { $month: '$date' },
day : { $dayOfMonth: '$date'},
correct : 1,
incorrect : 1
}},
{ $group : {
_id : { year: "$year", month: "$month", day: "$day" },
correct : { $sum : "$correct" },
incorrect : { $sum : "$incorrect" }
}},
function(err, result){
console.log(result);
}
);
Produces output of:
[ { _id: { year: 2012, month: 10, day: 22 },
correct: 0,
incorrect: 0 } ]
You can assemble that into '2012-10-22' in code from there.
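A minimal sketch of that last step in plain JavaScript, assuming each result's _id has the { year, month, day } shape shown in the output above; formatGroupId is a hypothetical helper name:

```javascript
// Hypothetical helper: turn a grouped _id such as { year: 2012, month: 10, day: 22 }
// into a zero-padded 'YYYY-MM-DD' string.
function formatGroupId(id) {
  const pad = function (n) {
    return String(n).padStart(2, "0");
  };
  return id.year + "-" + pad(id.month) + "-" + pad(id.day);
}

const label = formatGroupId({ year: 2012, month: 10, day: 22 }); // "2012-10-22"
```

Reordering the parts in the return statement would give the mm-dd-yyyy form mentioned in the question.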