I have the data below. I want to run a query to group my results by category and month and return a total.
The first desired output is a nested array of month names with aggregated totals for all 12 months by category. Months that are not present in the data will still be returned but have 0 as the total.
{"category":"Auto","month":{"Jan":9.12,"Feb":9.12,"Mar":0,...}},
{"category":"Fees","month":{..."Apr":0,"May":4.56,"Jun":0,...}},
{"category":"Travel","month":{..."Oct":0,"Nov":4.56,"Dec":0}}
The second desired output is an array that doesn't have nested months...
{"category":"Auto","Jan":9.12,"Feb":9.12,"Mar":0,...},
{"category":"Fees",..."Apr":0,"May":4.56,"Jun":0,...},
{"category":"Travel",..."Oct":0,"Nov":4.56,"Dec":0}
How can these results be queried with MongoDB? Here is the sample input data:
{
"_id" : ObjectId("583f6e6d14c8042dd7c153f1"),
"transid" : 1,
"category": "Auto",
"postdate" : ISODate("2016-01-28T05:00:00.000Z"),
"total" : 4.56 }
{
"_id" : ObjectId("583f6e6d14c8042dd7c153f2"),
"transid" : 5,
"category": "Auto",
"postdate" : ISODate("2016-01-31T05:00:00.000Z"),
"total" : 4.56 }
{
"_id" : ObjectId("583f6e6d14c8042dd7c153f3"),
"transid" : 3,
"category": "Auto",
"postdate" : ISODate("2016-02-28T05:00:00.000Z"),
"total" : 4.56 }
{
"_id" : ObjectId("583f6e6d14c8042dd7c153f4"),
"transid" : 2,
"category": "Auto",
"postdate" : ISODate("2016-02-31T05:00:00.000Z"),
"total" : 4.56 }
{
"_id" : ObjectId("583f6e6d14c8042dd7c153f5"),
"transid" : 6,
"category": "Fees",
"postdate" : ISODate("2016-05-16T05:00:00.000Z"),
"total" : 4.56 }
{
"_id" : ObjectId("583f6e6d14c8042dd7c153f6"),
"transid" : 7,
"category": "Travel",
"postdate" : ISODate("2016-11-13T05:00:00.000Z"),
"total" : 4.56 }
I'm new to MongoDB and come from a SQL background, so I feel I've been thinking about all this in SQL terms.
Below is what I've tried so far based on reading through the mongodb documentation and attempting to translate "sql think". I'm essentially trying to filter to a specified year (in this case 2016). I'm then grouping by category and date. And in the last step I plan to use project and the $cond keyword to "subaggregate" on month by specifying the start and end dates of each month and then assign the month name as Jan, Feb, etc... I have syntax errors and I don't know if this is the right or best approach.
db.transactions.aggregate(
[
{ $match: { "postdate": {$gte: new Date("2016-01-01")}} },
{ $group: { _id: {"category":"$category","postdate":"$postdate"} , "total": { $sum: "$debit" } } },
{ $project: {"_id":0,"category":"$_id.category",
"month":{$cond: {
$and:
[
{ $gte: ["$_id.postdate", new Date("2016-01-01")] },
{ $lt: ["$_id.postdate", new Date("2016-02-01")] },
]
},"Jan":"$sum"}
//repeat for all other 11 months...
}}
]
)
If you want to group by month, you can use the $month operator, e.g.:
db.transactions.aggregate([{$group:{_id:{ $month:"$postdate"}, "total":{$sum:"$total"}}}])
I am not sure what the $project stage is doing for you.
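For the pivoted output described in the question, one possible sketch (not taken from the answer above) sums the amount conditionally per month inside a single $group, so months without data come out as 0. It assumes the collection is named transactions, the amount field is total as in the sample data, the year is 2016, and months are taken in UTC. The pipeline is built as a plain JavaScript object so that it can be pasted into the shell.

```javascript
// Month labels used as output field names (index + 1 = month number).
const monthNames = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// One accumulator per month: add "total" when the month matches, otherwise add 0.
const monthSums = {};
monthNames.forEach(function (name, i) {
  monthSums[name] = {
    $sum: { $cond: [{ $eq: [{ $month: "$postdate" }, i + 1] }, "$total", 0] }
  };
});

// Nested shape: { category: ..., month: { Jan: ..., ..., Dec: ... } }
const nestedMonths = {};
monthNames.forEach(function (name) { nestedMonths[name] = "$" + name; });

const pipeline = [
  // Keep only documents posted in 2016 (assumed year, as in the question).
  { $match: { postdate: { $gte: new Date("2016-01-01T00:00:00Z"),
                          $lt: new Date("2017-01-01T00:00:00Z") } } },
  // One output document per category, with twelve monthly sums.
  { $group: Object.assign({ _id: "$category" }, monthSums) },
  // Rename _id to category and nest the months.
  { $project: { _id: 0, category: "$_id", month: nestedMonths } }
];

// In the mongo shell: db.transactions.aggregate(pipeline)
```

For the flat shape, the last stage could instead project each month directly (Jan: 1, Feb: 1, and so on) next to category.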
In MongoDB, I have a collection of different movies with their years.
Consider these documents:
{
"_id" : ObjectId("63a994974ac549c5ea982d2b"),
"title" : "Destroyer",
"year" : 1907
},
{
"_id" : ObjectId("63a994974ac549c5ea982d2a"),
"title" : "Aquaman",
"year" : 1902
},
{
"_id" : ObjectId("63a994974ac549c5ea982d29"),
"title" : "On the Basis of Sex",
"year" : 1907
},
{
"_id" : ObjectId("63a994974ac549c5ea982d28"),
"title" : "Holmes and Watson",
"year" : 1902
},
{
"_id" : ObjectId("63a994974ac549c5ea982d27"),
"title" : "Conundrum: Secrets Among Friends",
"year" : 1902
},
{
"_id" : ObjectId("63a994974ac549c5ea982d26"),
"title" : "Welcome to Marwen",
"year" : 1907
},
{
"_id" : ObjectId("63a994974ac549c5ea982d25"),
"title" : "Mary Poppins Returns",
"year" : 1997
},
{
"_id" : ObjectId("63a994974ac549c5ea982d24"),
"title" : "Bumblebee",
"year" : 2004
}
I want to show the year or years with the fewest movies showing the number of movies from that year. So, with the previous documents, you can see there are 2 years with the same count of movies. Years: 1907 and 1902.
Therefore, I want to join those years in a single document. I tried this code:
var query1 = {$group: {"_id": "$year",
"movies": {$sum:1},
"Years": {$addToSet:"$year"},
}}
var stages = [query1]
db.movies.aggregate(stages)
However, the output is this:
{
"_id" : 1907,
"movies" : 3,
"Years" : [ 1907 ]
},
{
"_id" : 1902,
"movies" : 3,
"Years" : [ 1902 ]
},
I do not want that. The expected output that I want is this:
{
"_id" : 1902,
"movies" : 3,
"Years" : [ 1907, 1902 ]
}
Once you get that, what I want to show as a final output is this:
{
"_id" : [1907, 1902],
"movies" : 3
}
I do not know how to do that. I cannot join all these years in an array...
How can I get that? How can I obtain the previous output?
Thanks so much for your attention. If you need anything else, please ask.
One option is to do what @Joe suggested:
db.collection.aggregate([
{$group: {
_id: "$year",
count: {$sum: 1}
}},
{$group: {
_id: "$count",
years: {$push: "$_id"}
}},
{$sort: {_id: -1}},
{$limit: 1},
{$project: {_id: 0, count: "$_id", years: 1}}
])
See how it works on the playground example
If two years have equal counts, you could add a condition, convert the years to strings, and concatenate them into the same array (or into a comma-separated string of four-character years), store that in the database, and then fetch the movies value together with the new concatenated array.
I am not sure about this; it is just an idea.
The _id for the group is $year, so every year gets a separate document.
What you should do is use two $group stages: the first with _id: "$year" to count the number of movies per year, and the second with _id: "$movies" and $addToSet to collect the years that have the same number of movies.
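The two-stage grouping described above can be sketched as follows, using the field names from the question (year, movies). The final $project that moves the array of years into _id is an assumption about the desired shape, and tiedYears is a hypothetical plain-JavaScript equivalent included only to check the logic on the sample years without a database.

```javascript
const pipeline = [
  { $group: { _id: "$year", movies: { $sum: 1 } } },            // movies per year
  { $group: { _id: "$movies", years: { $addToSet: "$_id" } } }, // years per count
  { $sort: { _id: -1 } },                                       // highest count first
  { $limit: 1 },
  { $project: { _id: "$years", movies: "$_id" } }
];
// In the mongo shell: db.movies.aggregate(pipeline)

// Hypothetical in-memory equivalent of the pipeline above.
function tiedYears(docs) {
  const perYear = {};
  docs.forEach(function (d) { perYear[d.year] = (perYear[d.year] || 0) + 1; });
  const counts = Object.keys(perYear).map(function (y) { return perYear[y]; });
  const best = Math.max.apply(null, counts);
  const years = Object.keys(perYear)
    .filter(function (y) { return perYear[y] === best; })
    .map(Number);
  return { _id: years, movies: best };
}

// Years taken from the sample documents in the question.
const sample = [1907, 1902, 1907, 1902, 1902, 1907, 1997, 2004]
  .map(function (year) { return { year: year }; });
console.log(tiedYears(sample));
```

Sorting with { _id: 1 } instead would return the years with the fewest movies rather than the most.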
Ok, here's my data:
"stats" : [
{
"campaign_id" : "some_id",
"log_id" : "some_id",
"agent" : "some_id",
"office" : "some_id",
"hq" : "some_name",
"seller" : "some_name",
"status" : "live",
"phases" : [
{
"phase" : "main_phase",
"banners" : [
{
"banner_id" : "some_id_same_as_below",
"split_var" : "light",
"reports" : [
{
"date" : "2016-11-25",
"banner" : "some_id_same_as_above",
"cost" : "0.231",
"impressions" : 14,
"clicks" : 0
},
...
And I need to query the database for all reports:
"reports" : [
{
"date" : "2016-11-25",
"banner" : "some_id_same_as_above",
"cost" : "0.231",
"impressions" : 14,
"clicks" : 0
},
For the "date" : "2016-11-25" within a date range. For the date range I have this:
start_month = DateTime.current.beginning_of_month - 1.month
end_month = DateTime.current.end_of_month - 1.month
Which gives me start and end of the previous month, which is right. How can I search for all documents that have reports (the nested values inside stats, phases, etc) that falls within this range?
Any ideas?
EDIT
It has been suggested to change the way the data is inserted into the db, but unfortunately I have no control on how the data is inserted (done by a third party service/API).
You can store the dates as standard Date objects instead of formatted strings, which MongoDB stores in ISODate format, for instance:
db.collection.insert({date: new Date()});
will have a field like:
{ "date" : ISODate("2016-11-15T15:50:15.167Z") }
Then you can query by date range (Use the $and operator if you need to query between two ranges or the second statement may override the first)
Such as:
// Return all documents in collection with a date between 11-1-2016 and 12-1-2016
db.collection.find({
$and: [
{ date: { $gte: ISODate("2016-11-01T00:00:00.000Z") } },
{ date: { $lt: ISODate("2016-12-01T00:00:00.000Z") } }
]
})
EDIT: In case you cannot modify your collections, you could do a regex style search...
For instance:
db.collection.find({
"stats.phases.banners.reports.date": /2016-11/
});
will return all documents for November 2016, since it matches all strings containing "2016-11".
EDIT AGAIN:
Here is a solution using the aggregation framework to return the documents in the format you mentioned above i.e.
{
"reports" : {
"date" : "2016-11-23",
"banner" : ObjectId("58404a9450b5412e92ebbb97"),
"cost" : "0.231",
"impressions" : 14,
"clicks" : 0
}
},
{
"reports" : {
"date" : "2016-11-25",
"banner" : ObjectId("58404a9450b5412e92ebbb97"),
"cost" : "0.231",
"impressions" : 14,
"clicks" : 0
}
}
Note you have to do a lot of unwinds due to your heavily nested array structure...
db.collection.aggregate([
{ $unwind: "$stats" },
{ $unwind: "$stats.phases" },
{ $unwind: "$stats.phases.banners" },
{ $unwind: "$stats.phases.banners.reports" },
{ $match: { "stats.phases.banners.reports.date": /2016-11/ } },
{ $project: { _id: 0, reports: "$stats.phases.banners.reports" } }
])
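Because the dates are stored as zero-padded "YYYY-MM-DD" strings, plain string comparison orders them chronologically, so an arbitrary range can be matched without a regex. The sketch below is an assumption-laden variant of the pipeline above: previousMonthRange is a hypothetical helper that mirrors the Ruby range from the question, computed in UTC, and the fixed date passed to it is only an example.

```javascript
// Returns the first and last day of the month before `now` as "YYYY-MM-DD" strings (UTC).
function previousMonthRange(now) {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  const first = new Date(Date.UTC(y, m - 1, 1));
  const last = new Date(Date.UTC(y, m, 0)); // day 0 of this month = last day of previous month
  const fmt = function (d) { return d.toISOString().slice(0, 10); };
  return { start: fmt(first), end: fmt(last) };
}

const range = previousMonthRange(new Date("2016-12-15T00:00:00Z"));

const reportPath = "stats.phases.banners.reports";
const match = {};
match[reportPath + ".date"] = { $gte: range.start, $lte: range.end };

const pipeline = [
  { $unwind: "$stats" },
  { $unwind: "$stats.phases" },
  { $unwind: "$stats.phases.banners" },
  { $unwind: "$" + reportPath },
  { $match: match }, // after unwinding, both bounds apply to the same report
  { $project: { _id: 0, reports: "$" + reportPath } }
];
// In the mongo shell: db.collection.aggregate(pipeline)
```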
You need to store date fields as ISODate objects; then you can use comparison operators like $lt, $lte, $gt, $gte, etc.
Here is how to insert data:
db.test.insert({
"stats": [
{
"campaign_id": "some_id",
"log_id": "some_id",
"agent": "some_id",
"office": "some_id",
"hq": "some_name",
"seller": "some_name",
"status": "live",
"phases": [
{
"phase": "main_phase",
"banners": [
{
"banner_id": "some_id_same_as_below",
"split_var": "light",
"reports": [
{
"date": ISODate("2016-11-25T00:00:00.0Z"),
"banner": "some_id_same_as_above",
"cost": "0.231",
"impressions": 14,
"clicks": 0
}
]
}
]
}
]
}
]
})
The following query finds documents whose stats.phases.banners.reports.date is between 25 Nov 2016 and 25 Dec 2016.
db.test.find({"stats.phases.banners.reports.date": {$gte: ISODate("2016-11-25T00:00:00.0Z"), $lt: ISODate("2016-12-25T00:00:00.0Z")}})
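One caveat with range conditions on a field inside arrays: when two operators are written against the dotted path, each bound can be satisfied by a different report in the same document. A hedged sketch using $elemMatch, which requires a single report to satisfy both bounds, is given below. The collection name test follows the insert example above, and new Date(...) stands in for the shell's ISODate helper so that the snippet is plain JavaScript.

```javascript
const from = new Date("2016-11-25T00:00:00Z");
const to = new Date("2016-12-25T00:00:00Z");

// At least one element of a reports array must have a date within [from, to).
const query = {
  "stats.phases.banners.reports": {
    $elemMatch: { date: { $gte: from, $lt: to } }
  }
};
// In the mongo shell: db.test.find(query)
```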
In a database in MongoDB I am trying to group some data by their date (one group for each day of the year), and then add an additional field that would be the result of the multiplication of two of the already existing fields.
The data structure is:
{
"_id" : ObjectId("567a7c6d9da4bc18967a3947"),
"units" : 3.0,
"price" : 50.0,
"name" : "Name goes here",
"datetime" : ISODate("2015-12-23T10:50:21.560+0000")
}
I first tried a two stage approach using $project and then $group like this
db.things.aggregate(
[
{
$project: {
"_id" : 1,
"name" : 1,
"units" : 1,
"price" : 1,
"datetime":1,
"unitsprice" : { $multiply: [ "$price", "$units" ] }
}
},
{
$group: {
"_id" : {
"day" : {
"$dayOfMonth" : "$datetime"
},
"month" : {
"$month" : "$datetime"
},
"year" : {
"$year" : "$datetime"
}
},
"things" : {
"$push" : "$$ROOT"
}
}
}
],
)
in this case, the first step (the $project) gives the expected output (with the expected value of unitsprice), but then when doing the second $group step, it outputs this error:
"errmsg" : "$multiply only supports numeric types, not String",
"code":16555
I tried also turning around things, doing the $group step first and then the $project
db.things.aggregate(
[
{
$group: {
"_id" : {
"day" : {
"$dayOfMonth" : "$datetime"
},
"month" : {
"$month" : "$datetime"
},
"year" : {
"$year" : "$datetime"
}
},
"things" : {
"$push" : "$$ROOT"
}
}
},
{
$project: {
"_id" : 1,
"things":{
"name" : 1,
"units" : 1,
"price" : 1,
"datetime":1,
"unitsprice" : { $multiply: [ "$price", "$units" ] }
}
}
}
],
);
But in this case, the result of the multiplication is: unitsprice:null
Is there any way of doing this multiplication? Also, it would be nice to do it in a way that the output would not have nested fields, so it would look like:
{"_id":
"units":
"price":
"name":
"datetime":
"unitsprice":
}
Thanks in advance
PS:I am running MongoDB 3.2
Finally found the error. When importing the data, a few of the price fields were created as strings. Surprisingly, the error didn't come up when doing only the multiplication in the $project step (the output looked normal until it reached the first wrong field, then it stopped), but it did once the $group step was added.
In order to find the text fields I used this query:
db.things.find( { price: { $type: 2 } } );
Thanks for the hints
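Once the string values are cleaned up, the flat output asked for in the question can be sketched with a single $project followed by a sort, assuming the goal is only to have each document carry its day and product rather than to nest documents per day. The day field and its format are assumptions added for illustration ($dateToString is available from MongoDB 3.0), and nonNumeric is a hypothetical client-side check that mirrors the $type query above.

```javascript
const pipeline = [
  { $project: {
      units: 1, price: 1, name: 1, datetime: 1,
      // Calendar day as a string, e.g. "2015-12-23" (assumed output field).
      day: { $dateToString: { format: "%Y-%m-%d", date: "$datetime" } },
      unitsprice: { $multiply: ["$price", "$units"] }
  } },
  { $sort: { day: 1 } } // documents of the same day end up next to each other
];
// In the mongo shell: db.things.aggregate(pipeline)

// Hypothetical helper: lists documents whose price or units is not a number.
function nonNumeric(docs) {
  return docs.filter(function (d) {
    return typeof d.price !== "number" || typeof d.units !== "number";
  });
}

const docs = [
  { name: "ok", units: 3.0, price: 50.0 },
  { name: "bad", units: 1.0, price: "50" }
];
console.log(nonNumeric(docs).map(function (d) { return d.name; }));
```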
I'm learning how to use MeteorJS and I have a record that looks like:
meteor:PRIMARY> db.meals.find()
{ "_id" : "kHjRCXRRoC6JLYjJY", "name" : "Spaghetti & Meatballs", "calories" : "300", "eatenAt" : ISODate("2015-05-20T07:07:00Z"), "userId" : "movpJRhRMwyMZDBqf", "author" : "sergiotapia" }
{ "_id" : "vcQZ2S4MXHs49BknJ", "name" : "Lasgagna", "calories" : "150", "eatenAt" : ISODate("2015-05-20T07:07:00Z"), "userId" : "movpJRhRMwyMZDBqf", "author" : "sergiotapia" }
{ "_id" : "oqw4HZ5tybBKfMJmj", "name" : "test", "calories" : "900", "eatenAt" : ISODate("2015-05-20T07:38:00Z"), "userId" : "movpJRhRMwyMZDBqf", "author" : "sergiotapia" }
{ "_id" : "Pq6vawvTnXQniBvMZ", "name" : "booya", "calories" : "1000", "eatenAt" : ISODate("2015-05-19T07:37:00Z"), "userId" : "movpJRhRMwyMZDBqf", "author" : "sergiotapia" }
I want to filter these records using the ISODate value by both date and time. For example, get me the records from January 1st to January 12th that are between 9am and 2pm.
Is it possible using a single field, or do I need to have a separate field specifically for time?
Your query is basically:
find documents that are between 2015-01-01 AND 2015-01-12 AND have time between 09:00 AND 14:00.
One approach is to use the aggregation framework, in particular the Date Aggregation Operators. You can use the meteorhacks:aggregate package, which adds proper aggregation support for Meteor. This package exposes an .aggregate method on Mongo.Collection instances.
Add to your app with
meteor add meteorhacks:aggregate
Then simply use .aggregate function like below.
var meals = new Mongo.Collection('meals');
var pipeline = [
{
"$project": {
"year": { "$year": "$eatenAt" },
"month": { "$month": "$eatenAt" },
"day": { "$dayOfMonth": "$eatenAt" },
"hour": { "$hour": "$eatenAt" },
"name" : 1,
"calories" : 1,
"eatenAt" : 1,
"userId" : 1,
"author" : 1
}
},
{
"$match": {
"year": 2015,
"month": 1,
"day": { "$gte": 1, "$lte": 12 },
"hour": { "$gt": 8, "$lt": 14 }
}
}
];
var result = meals.aggregate(pipeline);
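A possible refinement, not part of the answer above: match the date range on eatenAt first, so that an index on that field (if one exists) can narrow the input, and only then compute and filter the hour. The dates follow the January example from the question, and $hour is evaluated in UTC, so local-time windows would need an offset.

```javascript
const pipeline = [
  // 1. Date range on the stored field (January 1st through January 12th, 2015).
  { $match: { eatenAt: { $gte: new Date("2015-01-01T00:00:00Z"),
                         $lt: new Date("2015-01-13T00:00:00Z") } } },
  // 2. Add the hour of day while keeping the original fields.
  { $project: { name: 1, calories: 1, eatenAt: 1, userId: 1, author: 1,
                hour: { $hour: "$eatenAt" } } },
  // 3. Keep 09:00 up to (but not including) 14:00.
  { $match: { hour: { $gte: 9, $lt: 14 } } }
];
// With meteorhacks:aggregate: meals.aggregate(pipeline)
```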
I'm storing minutely performance data in MongoDB, each collection is a type of performance report, and each document is the measurement at that point in time for the port on the array:
{
"DateTime" : ISODate("2012-09-28T15:51:03.671Z"),
"array_serial" : "12345",
"Port Name" : "CL1-A",
"metric" : 104.2
}
There can be up to 128 different "Port Name" entries per "array_serial".
As the data ages I'd like to be able to average it out over increasing time spans:
Up to 1 Week : minute
1 Week to 1 month : 5 minute
1 - 3 months: 15 minute
etc..
Here's how I'm averaging the times so that they can be reduced :
var resolution = 5; // How many minutes to average over
var map = function(){
var coeff = 1000 * 60 * resolution;
var roundTime = new Date(Math.round(this.DateTime.getTime() / coeff) * coeff);
emit(roundTime, { value : this.metric, count: 1 } );
};
I'll be summing the values and counts in the reduce function, and getting the average in the finalize function.
As you can see this would average the data for just the time leaving out the "Port Name" value, and I need to average the values over time for each "Port Name" on each "array_serial".
So how can I include the port name in the above map function? Should the key for the emit be a compound "array_serial,PortName,DateTime" value that I split later? Or should I use the query function to query for each distinct serial, port and time? Am I storing this data in the database correctly?
Also, as far as I know this data gets saved out to its own collection. What's the standard practice for replacing the data in the collection with this averaged data?
Is this what you mean Asya? Because it's not grouping the documents rounded to the lower 5 minute (btw, I changed 'DateTime' to 'datetime'):
$project: {
"year" : { $year : "$datetime" },
"month" : { $month : "$datetime" },
"day" : { $dayOfMonth : "$datetime" },
"hour" : { $hour : "$datetime" },
"minute" : { $mod : [ {$minute : "$datetime"}, 5] },
array_serial: 1,
port_name: 1,
port_number: 2,
metric: 1
}
From what I can tell the "$mod" operator will return the remainder of the minute divided by five, correct?
This would really help me if I could get the aggregation framework to do this operation rather than mapreduce.
Here is how you could do it in aggregation framework. I'm using a small simplification - I'm only grouping on Year, Month and Date - in your case you will need to add hour and minute for the finer grained calculations. You also have a choice about whether to do weighted average if the point distribution is not uniform in the data sample you get.
project={"$project" : {
"year" : {
"$year" : "$DateTime"
},
"month" : {
"$month" : "$DateTime"
},
"day" : {
"$dayOfMonth" : "$DateTime"
},
"array_serial" : 1,
"Port Name" : 1,
"metric" : 1
}
};
group={"$group" : {
"_id" : {
"a" : "$array_serial",
"P" : "$Port Name",
"y" : "$year",
"m" : "$month",
"d" : "$day"
},
"avgMetric" : {
"$avg" : "$metric"
}
}
};
db.metrics.aggregate([project, group]).result
I ran this with some random sample data and got something of this format:
[
{
"_id" : {
"a" : "12345",
"P" : "CL1-B",
"y" : 2012,
"m" : 9,
"d" : 6
},
"avgMetric" : 100.8
},
{
"_id" : {
"a" : "12345",
"P" : "CL1-B",
"y" : 2012,
"m" : 9,
"d" : 7
},
"avgMetric" : 98
},
{
"_id" : {
"a" : "12345",
"P" : "CL1-A",
"y" : 2012,
"m" : 9,
"d" : 6
},
"avgMetric" : 105
}
]
As you can see this is one result per array_serial, port name, year/month/date combination. You can use $sort to get them into the order you want to process them from there.
Here is how you would extend the project step to include hour and minute while rounding minutes to average over every five minutes:
{
"$project" : {
"year" : {
"$year" : "$DateTime"
},
"month" : {
"$month" : "$DateTime"
},
"day" : {
"$dayOfMonth" : "$DateTime"
},
"hour" : {
"$hour" : "$DateTime"
},
"fmin" : {
"$subtract" : [
{
"$minute" : "$DateTime"
},
{
"$mod" : [
{
"$minute" : "$DateTime"
},
5
]
}
]
},
"array_serial" : 1,
"Port Name" : 1,
"metric" : 1
}
}
Hope you will be able to extend that to your specific data and requirements.
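The fmin expression above floors the minute to the start of its five-minute bucket. A small sketch of the same arithmetic in plain JavaScript, together with a $group stage keyed on the extra fields, follows. The key names h and f are assumptions chosen to match the style of the earlier group stage, and floorToBucket is a hypothetical helper used only to illustrate the calculation.

```javascript
// Same arithmetic as the $subtract/$mod expression: minute - (minute mod size).
function floorToBucket(minute, size) {
  return minute - (minute % size);
}

console.log(floorToBucket(51, 5)); // 50
console.log(floorToBucket(4, 5));  // 0

// Group stage extended with the hour and the floored minute.
const group = {
  $group: {
    _id: { a: "$array_serial", P: "$Port Name",
           y: "$year", m: "$month", d: "$day",
           h: "$hour", f: "$fmin" },
    avgMetric: { $avg: "$metric" }
  }
};
```

Changing the bucket size to 15 in the $mod expression gives the 15-minute averages mentioned in the question.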
"what's the standard practice for replacing the data in the collection with this averaged data?"
The standard practice is to keep the original data and to store all derived data separately.
In your case it means:
Don't delete the original data
Use another collection (in the same MongoDB database) to store average values
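One way to follow this practice from the aggregation framework is the $out stage (MongoDB 2.6 and later), which writes the pipeline result to another collection. In the sketch below, metrics_5min is a hypothetical target collection name and withOutput is a hypothetical helper; note that $out replaces the existing contents of the target collection, so the source collection should never be used as the target.

```javascript
// Returns a copy of `stages` with a final $out stage writing to `target`.
function withOutput(stages, target) {
  return stages.concat([{ $out: target }]);
}

// Placeholder averaging stage; in practice this would be the project and group stages above.
const stages = [
  { $group: { _id: { a: "$array_serial", P: "$Port Name" },
              avgMetric: { $avg: "$metric" } } }
];

const pipeline = withOutput(stages, "metrics_5min");
// In the mongo shell: db.metrics.aggregate(pipeline)
```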