I have a collection with a schema that includes a date. I would like to shape the results of a query so that it is grouped by year and month, but not aggregated, just sorted.
So my model, which looks like this:
var RoundSchema = new Schema({
user: {type: Schema.Types.ObjectId, ref: 'User', required: '{PATH} is required!'},
course: {type: Schema.Types.ObjectId, ref: 'Course', required: '{PATH} is required!'},
date: {type: Date, required: '{PATH} is required!'},
score: Number
});
should get transformed like this:
[
{
"year": 2014,
"month": "April",
"rounds": [
{
"user": "5334d6650685f68c22aa460b",
"date": "2014-04-23T05:00:00.000Z",
"course": "5340ab6000806e2433864cfc",
"score": 73,
"_id": "534f102667d635381834367b"
},
{
"user": "5334d6650685f68c22aa460b",
"date": "2014-04-21T05:00:00.000Z",
"course": "5340ab6000806e2433864cfc",
"score": 75,
"_id": "534f100067d6353818343671"
}
]
},
{
"year": 2014,
"month": "May",
"rounds": [
{
"user": "5334d6650685f68c22aa460b",
"date": "2014-05-05T05:00:00.000Z",
"course": "5337611d8d03819024515cf9",
"score": 81,
"_id": "534dc38780f1a854236203f3"
},
{
"user": "5334d6650685f68c22aa460b",
"date": "2014-05-04T05:00:00.000Z",
"course": "5337611d8d03819024515cf9",
"score": 77,
"_id": "534dc22c80f1a854236203e9"
}
]
}
]
I am guessing that I can use map-reduce or the aggregation pipeline in MongoDB, but I can't get my head around the syntax. Feeling sharp as a bowling ball right now.
Anyone have an idea to get me kick-started?
This can be done quite simply using the aggregation framework, which is not just for "summing" values but also excels at document re-shaping.
It is also your best option. Although the "non-aggregating" code can look a little simpler with mapReduce, that process relies on a JavaScript interpreter rather than the native code of the aggregation framework, so mapReduce will run much slower, considerably so over large data.
db.collection.aggregate([
// Place all items in an array by year and month
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
},
"rounds": { "$push": {
"user": "$user",
"date": "$date",
"course": "$course",
"score": "$score",
"_id": "$_id"
}}
}},
// Sort the results by year and month
{ "$sort": { "_id.year": 1, "_id.month": 1 } },
// Optional project to your exact form
{ "$project": {
"_id": 0,
"year": "$_id.year",
"month": "$_id.month",
"rounds": 1
}}
])
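To see what the pipeline above computes without a server, here is a plain-JavaScript sketch of the same $group / $sort / $project logic applied to an in-memory array. It is an illustration only: groupRoundsByMonth is a hypothetical helper, the sample rounds are taken from the question (with user and course omitted), and dates are read in UTC, which is how $year and $month evaluate a date when no timezone is given.

```javascript
// Plain-JavaScript sketch of what the $group / $sort / $project stages
// compute, applied to an in-memory array instead of a collection.
// Dates are read in UTC, as $year and $month do when no timezone is given.
function groupRoundsByMonth(rounds) {
  const groups = new Map();
  for (const round of rounds) {
    const year = round.date.getUTCFullYear();
    const month = round.date.getUTCMonth() + 1; // $month returns 1-12
    const key = `${year}-${month}`;
    if (!groups.has(key)) {
      groups.set(key, { year, month, rounds: [] });
    }
    // Like $push: append each round to its year/month bucket in input order
    groups.get(key).rounds.push(round);
  }
  // Like { "$sort": { "_id.year": 1, "_id.month": 1 } }
  return [...groups.values()].sort(
    (a, b) => a.year - b.year || a.month - b.month
  );
}

// Sample rounds from the question (user and course fields omitted)
const sampleRounds = [
  { _id: "534dc38780f1a854236203f3", score: 81, date: new Date("2014-05-05T05:00:00.000Z") },
  { _id: "534f102667d635381834367b", score: 73, date: new Date("2014-04-23T05:00:00.000Z") },
  { _id: "534f100067d6353818343671", score: 75, date: new Date("2014-04-21T05:00:00.000Z") },
];

const grouped = groupRoundsByMonth(sampleRounds);
console.log(grouped.map((g) => [g.year, g.month, g.rounds.length]));
// → [[2014, 4, 2], [2014, 5, 1]]
```

Neither version sorts the rounds inside each month. If that order matters, a { "$sort": { "date": 1 } } stage before the $group is the usual way to control the order in which $push collects them.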
Or, without the arrays you specified, leave everything in a flat form:
db.collection.aggregate([
{ "$project": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"user": 1,
"date": 1,
"course": 1,
"score": 1
}},
{ "$sort": { "year": 1, "month": 1 } }
])
Or even produce exactly the form you specified, with month names:
db.collection.aggregate([
// Place all items in an array by year and month
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
},
"rounds": { "$push": {
"user": "$user",
"date": "$date",
"course": "$course",
"score": "$score",
"_id": "$_id"
}}
}},
// Sort the results by year and month
{ "$sort": { "_id.year": 1, "_id.month": 1 } },
// Optionally project to your exact form
{ "$project": {
"_id": 0,
"year": "$_id.year",
"month": { "$cond": [
{ "$eq": [ "$_id.month", 1 ] },
"January",
{ "$cond": [
{ "$eq": [ "$_id.month", 2 ] },
"February",
{ "$cond": [
{ "$eq": [ "$_id.month", 3 ] },
"March",
{ "$cond": [
{ "$eq": [ "$_id.month", 4 ] },
"April",
{ "$cond": [
{ "$eq": [ "$_id.month", 5 ] },
"May",
{ "$cond": [
{ "$eq": [ "$_id.month", 6 ] },
"June",
{ "$cond": [
{ "$eq": [ "$_id.month", 7 ] },
"July",
{ "$cond": [
{ "$eq": [ "$_id.month", 8 ] },
"August",
{ "$cond": [
{ "$eq": [ "$_id.month", 9 ] },
"September",
{ "$cond": [
{ "$eq": [ "$_id.month", 10 ] },
"October",
{ "$cond": [
{ "$eq": [ "$_id.month", 11 ] },
"November",
"December"
]}
]}
]}
]}
]}
]}
]}
]}
]}
]}
]},
"rounds": 1
}}
])
This is overkill, of course, but it shows that it can be done.
See the aggregation operator reference for the details of each operator used here. You may also want to add your own $match stage to restrict the date range you are looking at (ideally as the first stage, so it can use an index on the date), or indeed for other purposes.
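If you are on MongoDB 3.2 or later, a much shorter way to get month names is to index into a literal array with $arrayElemAt. The sketch below builds that $project stage as a plain JavaScript object (MONTH_NAMES, projectStage and monthName are illustrative names, not part of any API) and includes the equivalent client-side lookup, in case you prefer to keep numeric months in the pipeline and name them afterwards.

```javascript
// Month names indexed 0-11; $month returns 1-12, hence the "- 1" below.
const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Replacement for the nested $cond projection (assumes MongoDB 3.2+,
// where $arrayElemAt is available).
const projectStage = {
  $project: {
    _id: 0,
    year: "$_id.year",
    month: {
      $arrayElemAt: [MONTH_NAMES, { $subtract: ["$_id.month", 1] }],
    },
    rounds: 1,
  },
};

// Client-side equivalent for results that still carry numeric months.
function monthName(month) {
  return MONTH_NAMES[month - 1];
}

console.log(monthName(4)); // → April
```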
Related
I'm doing an aggregation where I sum all the sales by month (createdAt), and I'm trying to calculate the variation from the prior value.
How can I compare a value with the prior value of the same field in MongoDB?
[
{"$addFields": { "createdAt": {"$convert": { "input": "$createdAt", "to": "date", "onError": null}}}},
{"$addFields": {"createdAt": {"$cond": {"if": { "$eq": [{"$type": "$createdAt" }, "date"]},
"then": "$createdAt", "else": null}}}},
{"$addFields": {"__alias_0": {"year": {"$year": "$createdAt" }, "month": {"$subtract": [{ "$month": "$createdAt"}, 1] } } }},
{ "$group": { "_id": { "__alias_0": "$__alias_0" }, "__alias_1": {"$sum": 1 }}},
{ "$project": {"_id": 0, "__alias_0": "$_id.__alias_0", "__alias_1": 1}},
{ "$project": {"group": "$__alias_0", "value": "$__alias_1", "_id": 0 }}
]
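On MongoDB 5.0 or later, the $setWindowFields stage with the $shift operator can read the previous document's value on the server. On older versions, or if you would rather do it client-side, a minimal plain-JavaScript sketch over the pipeline's output might look like this. addVariation is a hypothetical helper, it assumes rows shaped like { group: { year, month }, value } as produced above, and the sample totals are made up for illustration.

```javascript
// Adds a "variation" field: the difference between each row's value and
// the previous row's value (null for the first row).
// Assumes rows shaped like the pipeline output: { group: { year, month }, value }
function addVariation(rows) {
  const sorted = [...rows].sort(
    (a, b) => a.group.year - b.group.year || a.group.month - b.group.month
  );
  return sorted.map((row, i) => ({
    ...row,
    variation: i === 0 ? null : row.value - sorted[i - 1].value,
  }));
}

// Hypothetical monthly totals (months are 0-based, as in the pipeline above)
const monthly = [
  { group: { year: 2021, month: 1 }, value: 12 },
  { group: { year: 2021, month: 0 }, value: 10 },
  { group: { year: 2021, month: 2 }, value: 9 },
];
console.log(addVariation(monthly).map((r) => r.variation));
// → [null, 2, -3]
```

This compares each row with the previous row, so a month with no sales at all is simply absent, and the following month is compared with the last month that does appear.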
The collection looks like this:
[
{"currentLocation": "Chennai", "baseLocation": "Bengaluru"},
{"currentLocation": "Chennai", "baseLocation": "Bengaluru"},
{"currentLocation": "Delhi", "baseLocation": "Bengaluru"},
{"currentLocation": "Chennai", "baseLocation": "Chennai"}
]
Expected Output:
[
{"city": "Chennai", "currentLocationCount": 3, "baseLocationCount": 1},
{"city": "Bengaluru", "currentLocationCount": 0, "baseLocationCount": 3},
{"city": "Delhi", "currentLocationCount": 1, "baseLocationCount": 0}
]
What I have tried is:
db.getCollection('users').aggregate([{
$group: {
"_id": "$baseLocation",
baseLocationCount: {
$sum: 1
}
},
}, {
$project: {
"_id": 0,
"city": "$_id",
"baseLocationCount": 1
}
}])
I got this result:
[
{"city": "Chennai", "baseLocationCount": 1},
{"city": "Bengaluru", "baseLocationCount": 3}
]
I'm not familiar with Mongo, so any help would be appreciated.
MongoDB Version - 3.4
Neil Lunn and I had a lovely argument over this topic the other day, which you can read all about here: Group by day with Multiple Date Fields.
Here are two solutions to your precise problem.
The first one uses the $facet stage. Bear in mind, though, that it may not be suitable for large collections because $facet produces a single (potentially huge) document that might be bigger than the current MongoDB document size limit of 16MB (which only applies to the result document and wouldn't be a problem during pipeline processing anyway):
collection.aggregate([
{
$facet:
{
"current":
[
{
$group:
{
"_id": "$currentLocation",
"currentLocationCount": { $sum: 1 }
}
}
],
"base":
[
{
$group:
{
"_id": "$baseLocation",
"baseLocationCount": { $sum: 1 }
}
}
]
}
},
{ $project: { "result": { $setUnion: [ "$current", "$base" ] } } }, // merge results into a new array
{ $unwind: "$result" }, // unwind the array into individual documents
{ $replaceRoot: { newRoot: "$result" } }, // get rid of the additional field level
{ $group: { "_id": "$_id", "currentLocationCount": { $sum: "$currentLocationCount" }, "baseLocationCount": { $sum: "$baseLocationCount" } } }, // combine the counts per city
{ $project: { "_id": 0, "city": "$_id", "currentLocationCount": 1, "baseLocationCount": 1 } } // rename _id to city for the final result
])
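As a sanity check on the expected output, here is the same two-counts-then-merge idea as a plain-JavaScript sketch over the sample documents from the question. countByField and locationCounts are illustrative names; this mirrors the logic of the pipeline, it does not talk to MongoDB.

```javascript
// Like one of the $facet sub-pipelines: count documents per value of `field`.
function countByField(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return counts;
}

// Like $setUnion + $unwind + $replaceRoot + the final $group: merge by city.
function locationCounts(docs) {
  const current = countByField(docs, "currentLocation");
  const base = countByField(docs, "baseLocation");
  const cities = new Set([...current.keys(), ...base.keys()]);
  // A city missing from one side gets 0, as $sum does for missing fields
  return [...cities].map((city) => ({
    city,
    currentLocationCount: current.get(city) || 0,
    baseLocationCount: base.get(city) || 0,
  }));
}

const users = [
  { currentLocation: "Chennai", baseLocation: "Bengaluru" },
  { currentLocation: "Chennai", baseLocation: "Bengaluru" },
  { currentLocation: "Delhi", baseLocation: "Bengaluru" },
  { currentLocation: "Chennai", baseLocation: "Chennai" },
];
console.log(locationCounts(users));
```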
The second one is based on the $map operator instead:
collection.aggregate([
{
"$project": {
"city": {
"$map": {
"input": [ "current", "base" ],
"as": "type",
"in": {
"type": "$$type",
"name": {
"$cond": {
"if": { "$eq": [ "$$type", "current" ] },
"then": "$currentLocation",
"else": "$baseLocation"
}
}
}
}
}
}
},
{ "$unwind": "$city" },
{
"$group": {
"_id": "$city.name",
"currentLocationCount": {
"$sum": {
"$cond": {
"if": { "$eq": [ "$city.type", "current" ] },
"then": 1,
"else": 0
}
}
},
"baseLocationCount": {
"$sum": {
"$cond": {
"if": { "$eq": [ "$city.type", "base" ] },
"then": 1,
"else": 0
}
}
}
}
},
// rename _id to city to match the expected output
{ "$project": { "_id": 0, "city": "$_id", "currentLocationCount": 1, "baseLocationCount": 1 } }
])
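The same idea in plain JavaScript, as a sketch only: flatMap plays the role of $map plus $unwind, and the loop plays the role of the $group with conditional sums. locationCountsByType is an illustrative name, and the sample array is the one from the question.

```javascript
// Each document becomes two { type, name } entries, then counts per city.
function locationCountsByType(docs) {
  // Like $map over [ "current", "base" ] followed by $unwind
  const entries = docs.flatMap((doc) => [
    { type: "current", name: doc.currentLocation },
    { type: "base", name: doc.baseLocation },
  ]);
  // Like $group on the name with a conditional $sum per type
  const byCity = new Map();
  for (const { type, name } of entries) {
    if (!byCity.has(name)) {
      byCity.set(name, { city: name, currentLocationCount: 0, baseLocationCount: 0 });
    }
    const row = byCity.get(name);
    if (type === "current") {
      row.currentLocationCount += 1;
    } else {
      row.baseLocationCount += 1;
    }
  }
  return [...byCity.values()];
}

const sampleUsers = [
  { currentLocation: "Chennai", baseLocation: "Bengaluru" },
  { currentLocation: "Chennai", baseLocation: "Bengaluru" },
  { currentLocation: "Delhi", baseLocation: "Bengaluru" },
  { currentLocation: "Chennai", baseLocation: "Chennai" },
];
console.log(locationCountsByType(sampleUsers));
```

Unlike the $facet version, this approach never builds one large document, so the 16MB concern does not arise, which makes it the safer choice on large collections.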
I have a set of objects where several share the same description but have slightly different amounts:
[
{
"_id": "101",
"description": "DD from my employer1",
"amount": 1000.33
},
{
"_id": "102",
"description": "DD from my employer1",
"amount": 1000.34
},
{
"_id": "103",
"description": "DD from my employer1",
"amount": 1000.35
},
{
"_id": "104",
"description": "DD from employer1",
"amount": 5000.00
},
{
"_id": "105",
"description": "DD from my employer2",
"amount": 2000.33
},
{
"_id": "106",
"description": "DD from my employer2",
"amount": 2000.33
},
{
"_id": "107",
"description": "DD from my employer2",
"amount": 2000.33
}
]
Below, I am able to group them using the description:
[
{
"$group": {
"_id": {
"description": "$description"
},
"count": {
"$sum": 1
},
"ids": {
"$addToSet": "$_id"
}
}
},
{
"$match": {
"count": {
"$gte": 3
}
}
}
]
Is there a way to include all the amounts in the group (_ids 101, 102, and 103, plus 105, 106, and 107) even though they differ slightly, but exclude the bonus amount, which in the sample above is _id 104?
I don't believe it can be done in a group stage, but is there something that could be done at a later stage to group _ids 101, 102, and 103 together and exclude _id 104? Basically, I want MongoDB to ignore the small differences in 101, 102, and 103 and group them together, since they are paychecks coming from the same employer.
I have been working with $stdDevPop, but can't get a solid formula down.
I am looking for a simple array output of just the _ids.
{
"result": [
"101",
"102",
"103",
"105",
"106",
"107"
]
}
You can do this by doing some math on the "amount" to round it down to the nearest multiple of 1000, and using that as the grouping _id:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$trunc": "$amount" },
{ "$mod": [
{ "$trunc": "$amount" },
1000
]}
]
},
"results": { "$push": "$_id" }
}},
{ "$redact": {
"$cond": {
"if": { "$gt": [ { "$size": "$results" }, 1 ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$unwind": "$results" },
{ "$group": {
"_id": null,
"results": { "$push": "$results" }
}}
])
If your MongoDB is older than 3.2, you need a longer form that uses $mod to do what $trunc is doing. And if it is older than 2.6, you would use $match rather than $redact. So in the longer form this is:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [
"$amount",
{ "$mod": [ "$amount", 1 ] }
]},
{ "$mod": [
{ "$subtract": [
"$amount",
{ "$mod": [ "$amount", 1 ] }
]},
1000
]}
]
},
"results": { "$push": "$_id" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } } },
{ "$unwind": "$results" },
{ "$group": {
"_id": null,
"results": { "$push": "$results" }
}}
])
Either way, the output is just the _id values whose amounts rounded down to the same boundary more than once:
{ "_id" : null, "results" : [ "105", "106", "107", "101", "102", "103" ] }
You could either add a $sort in there or just sort the result array in client code.
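Here is the same bucketing logic as a plain-JavaScript sketch over the sample documents, which makes it easy to see why _id 104 drops out. recurringIds is a hypothetical helper; Math.trunc and % stand in for $trunc and $mod, and the filter stands in for the $redact (or $match) on the group size.

```javascript
// Bucket each amount by rounding its integer part down to a multiple of
// bucketSize, keep buckets with more than one document, flatten the _ids.
function recurringIds(docs, bucketSize = 1000) {
  const buckets = new Map();
  for (const doc of docs) {
    const whole = Math.trunc(doc.amount); // like $trunc
    const bucket = whole - (whole % bucketSize); // like $subtract with $mod
    if (!buckets.has(bucket)) buckets.set(bucket, []);
    buckets.get(bucket).push(doc._id);
  }
  return [...buckets.values()]
    .filter((ids) => ids.length > 1) // like $redact / $match on the size
    .flat(); // like $unwind plus $group on null
}

const payments = [
  { _id: "101", description: "DD from my employer1", amount: 1000.33 },
  { _id: "102", description: "DD from my employer1", amount: 1000.34 },
  { _id: "103", description: "DD from my employer1", amount: 1000.35 },
  { _id: "104", description: "DD from employer1", amount: 5000.0 },
  { _id: "105", description: "DD from my employer2", amount: 2000.33 },
  { _id: "106", description: "DD from my employer2", amount: 2000.33 },
  { _id: "107", description: "DD from my employer2", amount: 2000.33 },
];
console.log(recurringIds(payments));
// → ["101", "102", "103", "105", "106", "107"]
```

Two caveats follow from the design: amounts on either side of a boundary (say 999.99 and 1000.01) land in different buckets, and two employers paying within the same thousand would be merged. Adding the description to the grouping key would address the second.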
db.yourCollectionNameHere.aggregate( [
{ $match: { "amount" : { $lt : 5000 } } },
{ $project: { _id: 1 } },
])
That will grab only the _id of every transaction with an amount less than 5000.
I have a collection called user.monthly, in which we store 'day' : number of clicks.
Here are two samples for different dates.
For January:
{
name : "devid",
date : ISODate("2014-01-21T11:32:42.392Z"),
daily: {'1':12,'9':13,'30':13}
}
For February:
{
name : "devid",
date : ISODate("2014-02-21T11:32:42.392Z"),
daily: {'3':12,'12':13,'25':13}
}
How can I aggregate this to get the total clicks for January and February?
Please help me to resolve my problem.
Your current schema is not helping you here, as the "daily" field (which we presume holds your clicks per type or something like that) is represented as a sub-document, which means that you need to explicitly name the path to each field in order to do something with it.
A better approach would be to put this information in an array:
{
"name" : "devid",
"date" : ISODate("2014-02-21T11:32:42.392Z"),
"daily": [
{ "type": "3", "clicks": 12 },
{ "type": "12", "clicks": 13 },
{ "type": "25", "clicks": 13 }
]
}
Then you have an aggregation statement that goes like this:
db.collection.aggregate([
// Just match the dates in January and February
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
// Unwind the "daily" array
{ "$unwind": "$daily" },
// Group the values together by "type" on "January" and "February"
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"type": "$daily.type"
},
"clicks": { "$sum": "$daily.clicks" }
}},
// Sort the result nicely
{ "$sort": {
"_id.year": 1,
"_id.month": 1,
"_id.type": 1
}}
])
That form is pretty simple. Or, if you do not care about the type as a grouping and just want the month totals:
db.collection.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
{ "$unwind": "$daily" },
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
},
"clicks": { "$sum": "$daily.clicks" }
}},
{ "$sort": { "_id.year": 1, "_id.month": 1 }}
])
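If you do decide to move to the array form, the per-document conversion is short. Here is a plain-JavaScript sketch of it; toDailyArray and toArrayForm are hypothetical names, and how you apply it to the stored documents (for example, by reading each one and writing it back) is left open.

```javascript
// Converts the current "daily" sub-document ({ "<type>": clicks, ... })
// into the suggested array form ([{ type, clicks }, ...]).
function toDailyArray(daily) {
  return Object.entries(daily).map(([type, clicks]) => ({ type, clicks }));
}

// Applies the conversion to a whole document, leaving other fields as-is.
function toArrayForm(doc) {
  return { ...doc, daily: toDailyArray(doc.daily) };
}

const february = {
  name: "devid",
  date: new Date("2014-02-21T11:32:42.392Z"),
  daily: { 3: 12, 12: 13, 25: 13 },
};
console.log(toArrayForm(february).daily);
// → [{ type: "3", clicks: 12 }, { type: "12", clicks: 13 }, { type: "25", clicks: 13 }]
```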
But with the sub-document form you currently have, this becomes ugly:
db.collection.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
},
"clicks": {
"$sum": {
"$add": [
{ "$ifNull": ["$daily.1", 0] },
{ "$ifNull": ["$daily.3", 0] },
{ "$ifNull": ["$daily.9", 0] },
{ "$ifNull": ["$daily.12", 0] },
{ "$ifNull": ["$daily.25", 0] },
{ "$ifNull": ["$daily.30", 0] },
]
}
}
}}
])
This shows that you have no option other than to specify essentially every possible field under "daily" (and the real list is probably much larger). Each key also has to be checked, because it may not exist in a given document, so that a default value is returned instead.
For example, your first document has no "daily.3" key, so without the $ifNull check that path would evaluate to null, $add would then return null for the whole document, and $sum would ignore it, so that document would contribute 0 to the total.
Grouping on those keys as in the first aggregate example gets even worse:
db.collection.aggregate([
// Just match the dates in January and February
{ "$match": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
}},
// Project with an array to match all possible values
{ "$project": {
"date": 1,
"daily": 1,
"type": { "$literal": ["1", "3", "9", "12", "25", "30" ] }
}},
// Unwind the "type" array
{ "$unwind": "$type" },
// Project values onto the "type" while grouping
{ "$group" : {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"type": "$type"
},
"clicks": { "$sum": { "$cond": [
{ "$eq": [ "$type", "1" ] },
"$daily.1",
{ "$cond": [
{ "$eq": [ "$type", "3" ] },
"$daily.3",
{ "$cond": [
{ "$eq": [ "$type", "9" ] },
"$daily.9",
{ "$cond": [
{ "$eq": [ "$type", "12" ] },
"$daily.12",
{ "$cond": [
{ "$eq": [ "$type", "25" ] },
"$daily.25",
"$daily.30"
]}
]}
]}
]}
]}}
}},
{ "$sort": {
"_id.year": 1,
"_id.month": 1,
"_id.type": 1
}}
])
This creates one big conditional evaluation, using $cond to match each value to its "type", where all possible "type" values were first projected into an array using the $literal operator.
If you do not have MongoDB 2.6 or greater, you can use this in place of the $literal statement:
"type": { "$cond": [1, ["1", "3", "9", "12", "25", "30" ], 0] }
Here the always-true $cond simply returns the declared array as its result, which is how you specify a literal array without $literal. There is also the undocumented $const operator, which is now exposed as $literal.
As you can see, the structure here is doing you no favors, so the best option is to change it. But if you cannot, and find the aggregation approach too hard to handle, mapReduce offers an alternative, although the processing will be much slower:
db.collection.mapReduce(
function () {
for ( var k in this.daily ) {
emit(
{
year: this.date.getFullYear(),
month: this.date.getMonth() + 1,
type: k
},
this.daily[k]
);
}
},
function(key,values) {
return Array.sum( values );
},
{
"query": {
"date": {
"$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
}
},
"out": { "inline": 1 }
}
)
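To see what the map and reduce functions compute without a server, here is a small in-memory stand-in in plain JavaScript. runMapReduce is a hypothetical helper, not a MongoDB API: emit is passed in as a parameter instead of being a global, and Array.sum (a shell helper) is replaced by a plain reduce. The map here is a variant keyed on year and month only, which gives the per-month totals the question asks for, and it uses UTC getters where the shell version above uses local-time ones.

```javascript
// Minimal in-memory stand-in for mapReduce: collects emitted values per key
// and reduces them. As on the server, reduce only runs for keys that
// received more than one value.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the document
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Variant map keyed on year and month only (UTC getters).
function mapMonthTotals(emit) {
  for (const k in this.daily) {
    emit({ year: this.date.getUTCFullYear(), month: this.date.getUTCMonth() + 1 }, this.daily[k]);
  }
}

function sumValues(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

const clickDocs = [
  { name: "devid", date: new Date("2014-01-21T11:32:42.392Z"), daily: { 1: 12, 9: 13, 30: 13 } },
  { name: "devid", date: new Date("2014-02-21T11:32:42.392Z"), daily: { 3: 12, 12: 13, 25: 13 } },
];
console.log(runMapReduce(clickDocs, mapMonthTotals, sumValues));
// → January total 38, February total 38
```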
The general lesson here is that you will get the cleanest and fastest results by changing the document format and using the aggregation framework. But all of the possible approaches are listed here.
Here is my query. I would like to combine the $_id parts into YYYY-MM-DD. Or is there a function like MySQL's DATE() to convert a DATETIME value to a DATE?
db.event.aggregate([
{
$project: {
"created": {$add: ["$created", 60*60*1000*8]},
}
},
{
$group: {
"_id": {
"year": {"$year": "$created"},
"month": {"$month": "$created"},
"day": {"$dayOfMonth": "$created"}
},
"count": { $sum: 1 }
}
}
])
You basically already are doing this by using the date aggregation operators to split the components into your compound _id key, and this is probably the best way to handle it. You can, though, alter it with the $substr operator and $concat:
db.event.aggregate([
{ "$project": {
"created": {$add: ["$created", 60*60*1000*8]},
}},
{ "$group": {
"_id": {
"year": {"$year": "$created"},
"month": {"$month": "$created"},
"day": {"$dayOfMonth": "$created"}
},
"count": { "$sum": 1 }
}},
{ "$project": {
"_id": { "$concat": [
{ "$substr": [ "$_id.year", 0, 4 ] },
"-",
{ "$cond": [
{ "$lte": [ "$_id.month", 9 ] },
{ "$concat": [
"0",
{ "$substr": [ "$_id.month", 0, 2 ] }
]},
{ "$substr": [ "$_id.month", 0, 2 ] }
]},
"-",
{ "$cond": [
{ "$lte": [ "$_id.day", 9 ] },
{ "$concat": [
"0",
{ "$substr": [ "$_id.day", 0, 2 ] }
]},
{ "$substr": [ "$_id.day", 0, 2 ] }
]}
]},
"count": 1
}}
])
So there is a bit of coercion of the date-part values to strings there, as well as padding any single-digit values with a leading 0, just like in a "YYYY-MM-DD" format.
Note that this coercion works, and has for some time, but it is notably missing from the manual page description of the $substr operator.
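Two related options, sketched in JavaScript. On MongoDB 3.0 or later, $dateToString can produce the "YYYY-MM-DD" string directly, so it can serve as the $group key on its own. If you keep the compound key instead, the same padding is easy to do client-side. groupByDayString and formatDayKey are illustrative names; "$created" is the (already shifted) field from the question's $project stage.

```javascript
// Option 1 (MongoDB 3.0+): group directly on a formatted date string.
// Intended to replace the $group stage, after the question's $project.
const groupByDayString = {
  $group: {
    _id: { $dateToString: { format: "%Y-%m-%d", date: "$created" } },
    count: { $sum: 1 },
  },
};

// Option 2: keep the compound { year, month, day } key and build the string
// client-side, padding single-digit months and days with a leading 0.
function formatDayKey({ year, month, day }) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${year}-${pad(month)}-${pad(day)}`;
}

console.log(formatDayKey({ year: 2014, month: 4, day: 5 })); // → 2014-04-05
```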
Not too sure about your "date math" at the start there. I would say you would be better off using the aggregation operators and then working on the values you wanted to adjust, or, if it is indeed something like a "timezone" correction, you would again probably be better off processing that client side.
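For the timezone case, a client-side sketch of the "+8 hours" adjustment might look like this. It assumes a fixed UTC offset with no daylight-saving changes, and localDayKey and countByLocalDay are hypothetical helpers. On MongoDB 3.6 and later, date operators such as $year and $dateToString also accept a timezone argument, which is another way to avoid the manual date math.

```javascript
// Returns the "YYYY-MM-DD" calendar day of `date` at a fixed UTC offset.
// Assumes no daylight saving; 8 matches the 60*60*1000*8 in the question.
function localDayKey(date, offsetHours) {
  const shifted = new Date(date.getTime() + offsetHours * 60 * 60 * 1000);
  return shifted.toISOString().slice(0, 10);
}

// Counts events per local day, like the question's $group with $sum: 1.
function countByLocalDay(dates, offsetHours) {
  const counts = {};
  for (const date of dates) {
    const key = localDayKey(date, offsetHours);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const created = [
  new Date("2014-04-23T05:00:00Z"), // 13:00 on the 23rd at UTC+8
  new Date("2014-04-23T17:30:00Z"), // 01:30 on the 24th at UTC+8
];
console.log(countByLocalDay(created, 8));
// → { "2014-04-23": 1, "2014-04-24": 1 }
```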