Mongo date aggregate output Date object - mongodb

I need to group a datetime field by minute. That part is easy:
db.my_collection.aggregate([
{ "$group": {
"_id": {
"year": { "$year": "$meta_data.created_at" },
"dayOfYear": { "$dayOfYear": "$meta_data.created_at" },
"hour": { "$hour": "$meta_data.created_at"},
"minute": { "$minute": "$meta_data.created_at"}
},
"count": { "$sum": 1 }
}}])
The problem is that the output is:
{
"_id": {
"year": 2016,
"dayOfYear": 349,
"hour": 16,
"minute": 43
},
"count": 4
}
That is not convenient if I want to query by Date later on.
How can I get the aggregation to output the group key as a Date object again?

OK, so what I asked for can be done by subtracting the seconds and milliseconds from the date, which leaves a Date truncated to the minute:
db.my_collection.aggregate([
  { "$group": {
    "_id": {
      "date_by_minute": {
        "$subtract": [
          { "$subtract": [
            "$meta_data.created_at",
            { "$multiply": [ { "$second": "$meta_data.created_at" }, 1000 ] }
          ]},
          { "$millisecond": "$meta_data.created_at" }
        ]
      }
    },
    "count": { "$sum": 1 }
  }}
])
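The arithmetic behind the nested $subtract is easier to check outside the database. Here is a minimal Python sketch of the same truncation; the function name and the sample timestamp are made up for illustration and are not part of the original answer.

```python
from datetime import datetime, timedelta


def truncate_to_minute(ts: datetime) -> datetime:
    """Mirror the nested $subtract: remove the seconds, then the milliseconds."""
    # {"$multiply": [{"$second": ...}, 1000]} -> seconds expressed in milliseconds
    seconds_ms = ts.second * 1000
    # {"$millisecond": ...} -> the sub-second part in whole milliseconds
    # (BSON dates have millisecond precision, so nothing smaller is expected)
    millis = ts.microsecond // 1000
    return ts - timedelta(milliseconds=seconds_ms + millis)


# Hypothetical sample value: day 349 of 2016, 16:43:27.536
created_at = datetime(2016, 12, 14, 16, 43, 27, 536000)
print(truncate_to_minute(created_at))  # 2016-12-14 16:43:00
```

If the server is MongoDB 5.0 or newer, `$dateTrunc` with `unit: "minute"` should give the same result with a single operator; that is worth checking against the documentation for your version.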

Related

Aggregating Mongo collection by year and then by month

I have a Mongo collection that looks like this with a bunch of months, days, years:
[
{
"Date": ISODate("2021-08-05T04:59:54.000Z"),
"Amount": 999,
"Business": "Business 1",
},
{
"Date": ISODate("2021-08-05T04:59:54.000Z"),
"Amount": 5.99,
"Business": "Business 2",
},
{
"Date": ISODate("2021-07-17T21:41:56.000Z"),
"Amount": 20000,
"Business": "Business 2",
},
{
"Date": ISODate("2021-06-17T21:41:56.000Z"),
"Amount": 200,
"Business": "Business 5",
}
]
I have done an aggregation like this
db.collection.aggregate({
$group: {
_id: {
year: {
$year: "$Date"
},
month: {
$month: "$Date"
}
},
sum: {
$sum: "$Amount"
}
}
})
...which partially gives me what I want which is a sum of amounts per year and month.
[
{
"_id": {
"month": 6,
"year": 2021
},
"sum": 200
},
{
"_id": {
"month": 7,
"year": 2021
},
"sum": 20000
},
{
"_id": {
"month": 8,
"year": 2021
},
"sum": 1004.99
}
]
What I would like, however, is something like the output below, with the year at the top level, a total for the year, and the months nested underneath, so that it is easier to iterate over in the frontend. I have not been able to get it no matter what I have tried:
[
{
"year": 2021,
"sumAmount": 21204.99,
"months": [
{
"month": 7,
"amount": 20000
},
{
"month": 6,
"amount": 200
},
{
"month": 8,
"amount": 1004.99
}
]
},
{ "year" : 2020,
....
}
]
I have come close using another $group with $push, but I have not managed what is, in my mind, a second grouping by month. Any help will be appreciated!
You just need one more $group to get your expected result. To control the order, put a $sort between the two $group stages; $push then adds the months to the final array in the order the sorted documents arrive.
db.collection.aggregate([
{
$group: {
_id: {
year: {
$year: "$Date"
},
month: {
$month: "$Date"
}
},
sum: {
$sum: "$Amount"
}
}
},
{
"$sort": {
"_id.year": 1,
"_id.month": 1
}
},
{
"$group": {
"_id": "$_id.year",
"sumAmount": {
$sum: "$sum"
},
"months": {
"$push": {
"month": "$_id.month",
"amount": "$sum"
}
}
}
}
])
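To see what the two $group stages do, here is a rough Python sketch of the same logic run over the sample documents from the question. The function name `summarize_by_year` is invented for this example, and the output uses a `year` key where the pipeline above returns the year in `_id`.

```python
from collections import defaultdict
from datetime import datetime

# Sample documents copied from the question (Business field omitted).
docs = [
    {"Date": datetime(2021, 8, 5, 4, 59, 54), "Amount": 999},
    {"Date": datetime(2021, 8, 5, 4, 59, 54), "Amount": 5.99},
    {"Date": datetime(2021, 7, 17, 21, 41, 56), "Amount": 20000},
    {"Date": datetime(2021, 6, 17, 21, 41, 56), "Amount": 200},
]


def summarize_by_year(documents):
    # First $group: one running sum per (year, month) pair.
    per_month = defaultdict(float)
    for doc in documents:
        per_month[(doc["Date"].year, doc["Date"].month)] += doc["Amount"]

    # $sort plus second $group: walk the pairs in order and fold each
    # month into its year, appending to "months" the way $push does.
    per_year = {}
    for (year, month), amount in sorted(per_month.items()):
        entry = per_year.setdefault(
            year, {"year": year, "sumAmount": 0.0, "months": []}
        )
        entry["sumAmount"] += amount
        entry["months"].append({"month": month, "amount": amount})
    return list(per_year.values())


result = summarize_by_year(docs)
print(result)
```

In the pipeline itself, a final $project stage that maps `_id` to `year` would produce the exact shape asked for in the question.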

MongoDB - Aggregate by distinct field then count per day

I have a mongodb database that collects device data.
Example document is
{
"_id" : ObjectId("5c125a185dea1b0252c5352"),
"time" : ISODate("2018-12-13T15:09:42.536Z"),
"mac" : "10:06:21:3e:0a:ff",
}
The goal would be to count the unique mac values per day, from the first document in the db to the last document in the db.
I've been playing around and came to the conclusion that I would need to have multiple groups as well as projects during my aggregations.
This is what I tried - not sure if it's in the right direction or not or just completely messed up.
pipeline = [
{"$project": {
"_id": 1,
"mac": 1,
"day": {
"$dayOfMonth":"$time"
},
"month": {
"$month":"$time"
},
"year": {
"$year":"$time"
}
}
},
{
"$project": {
"_id": 1,
"mac": 1,
"time": {
"$concat": [{
"$substr":["$year", 0, 4]
},
"-", {
"$substr": ["$month", 0, 2]
},
"-",
{
"$substr":["$day", 0, 2]
}]
}
}
},
{
"$group": {
"_id": {
"time": "$time",
"mac": "$mac"
}
},
"$group": {
"_id": "$_id.time",
"count":{"$sum": 1},
}
}
]
data = list(collection.aggregate(pipeline, allowDiskUse=True))
The output doesn't look like any aggregation happened:
[{"_id": null, "count": 751050}]
I'm using PyMongo as my driver with MongoDB 4.
Ideally it should just show the date and count (e.g. { "_id" : "2018-12-13", "count" : 2 }).
I would love some feedback and advice.
Thanks in advance.
I prefer to minimize the number of stages, and especially to avoid unnecessary $group stages. So I would do it with the following pipeline:
pipeline = [
{ '$group' : {
'_id': { '$dateToString': { 'format': "%Y-%m-%d", 'date': "$time" } },
'macs':{ '$addToSet': '$mac' }
} },
{ '$addFields': { 'macs': { '$size': '$macs' } } }
]
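The idea behind $addToSet followed by $size is to collect a set of MAC addresses per day and then take its length. Here is a small Python sketch of that idea; the sample readings and the function name are invented for illustration (only the first MAC address appears in the question).

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical readings: two devices on the 13th (one seen twice), one on the 14th.
readings = [
    {"time": datetime(2018, 12, 13, 15, 9, 42), "mac": "10:06:21:3e:0a:ff"},
    {"time": datetime(2018, 12, 13, 16, 0, 0), "mac": "10:06:21:3e:0a:ff"},
    {"time": datetime(2018, 12, 13, 17, 30, 0), "mac": "aa:bb:cc:dd:ee:01"},
    {"time": datetime(2018, 12, 14, 9, 0, 0), "mac": "aa:bb:cc:dd:ee:01"},
]


def unique_macs_per_day(docs):
    macs_by_day = defaultdict(set)
    for doc in docs:
        # Same "%Y-%m-%d" format string as the $dateToString group key.
        day = doc["time"].strftime("%Y-%m-%d")
        # A set ignores repeats, which is what $addToSet does.
        macs_by_day[day].add(doc["mac"])
    # Taking the set's length corresponds to the $size step.
    return {day: len(macs) for day, macs in macs_by_day.items()}


counts = unique_macs_per_day(readings)
print(counts)  # {'2018-12-13': 2, '2018-12-14': 1}
```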
There's an operator called "$dateToString", which would solve most of your problems.
Edit: I didn't read the question carefully. @Asya Kamsky, thank you for pointing that out. Here's the new answer.
pipeline = [
{
"$group": {
"_id": {
"date": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$time"
}
},
"mac": "$mac"
}
}
},
{
"$group": {
"_id": "$_id.date",
"count": {
"$sum": 1
}
}
}
]
from bson.son import SON  # SON ships with PyMongo's bson package

pipeline = [
    {"$project": {
        "_id": 1,
        "mac": 1,
        "time": {"$dateToString": {"format": "%Y-%m-%d", "date": "$time", "timezone": "Africa/Johannesburg"}}
    }},
    {"$group": {
        "_id": {"time": "$time", "mac": "$mac"}
    }},
    {"$group": {
        "_id": "$_id.time",
        "count": {"$sum": 1}
    }},
    {"$sort": SON([("_id", -1)])}
]
Does exactly what it should do.
Thanks. :)
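As a side note on why the pipeline in the question returned a single `_id: null` bucket: both $group stages were written inside one Python dict, and a dict literal with a repeated key keeps only the last value. The short sketch below shows that behaviour. It is my reading of the symptom, not something confirmed in the thread.

```python
# Two "$group" keys in one dict literal, as in the question's last stage.
stage = {
    "$group": {"_id": {"time": "$time", "mac": "$mac"}},
    "$group": {"_id": "$_id.time", "count": {"$sum": 1}},
}

print(len(stage))              # 1
print(stage["$group"]["_id"])  # $_id.time

# Each stage needs its own dict so that both reach the server.
fixed_stages = [
    {"$group": {"_id": {"time": "$time", "mac": "$mac"}}},
    {"$group": {"_id": "$_id.time", "count": {"$sum": 1}}},
]
print(len(fixed_stages))       # 2
```

With only the second $group surviving, `$_id.time` most likely resolves to nothing because `_id` is still the document's ObjectId at that point. That would put every document under the same null key.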

MongoDB 2.4.10 GroupBy

I created a MongoDB database on my computer with MongoDB v3.4.4 and now want to migrate it to a Raspberry Pi running MongoDB v2.4.10, but this query does not work there because of the version difference.
db.product.aggregate(
{
"$project": {
"price": 1,
"y": {
"$year": "$date"
},
"m": {
"$month": "$date"
}
}
},
{
"$group": {
"_id": {
"year": "$y",
"month": "$m"
},
"total": {
"$sum": "$price"
},
}
},
{
$sort: {
"_id.year": 1,
"_id.month": 1
}
})
Is there a way to translate this query to MongoDB 2.4.10?
The error is this:
Thanks in advance!

Is it possible to compare two months' data in a single collection in MongoDB?

I have a collection with 10 000 000 call records.
I want to compare call usage in one month with usage in the next month.
Example of collection document
{
"_id" : ObjectId("54ed74d76c68d23af73e230a"),
"msisdn" : "9818441000",
"callType" : "ISD",
"duration" : 10.109999656677246,
"charges" : 200,
"traffic" : "Voice",
"Date" : ISODate("2014-01-05T19:51:01.928Z")
}
{
"_id" : ObjectId("54ed74d76c68d23af73e230b"),
"msisdn" : "9818843796",
"callType" : "Local",
"duration" : 1,
"charges" : 150,
"traffic" : "Voice",
"Date" : ISODate("2014-02-04T14:25:35.861Z")
}
Duration is my usage.
I want to compare the duration for ISODate("2014-01-04T14:25:35.861Z") with the following month, ISODate("2014-02-04T14:25:35.861Z"), across all records.
All msisdn numbers are the same in both months.
The obvious approach is to aggregate the data, which MongoDB's aggregation framework is well suited to. The pipeline below uses the general-purpose fields I can see in your documents. Note that it works in terms of discrete calendar months rather than a window measured one month back from the current point in time:
db.collection.aggregate([
{ "$match": {
"msisdn": "9818441000",
"Date": {
"$gte": new Date("2014-01-01"),
"$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$Date" },
"month": { "$month": "$Date" },
"callType": "$callType",
"traffic": "$traffic"
},
"charges": { "$sum": "$charges" },
"duration": { "$sum": "$duration" }
}},
{ "$sort": { "_id": 1 } }
])
The intent is to produce one record per month (for each callType and traffic combination), so that each month appears as a distinct value.
You can basically take those two results and compare the difference between them in client code.
Or you can do this over all "MSISDN" values with months grouped into pairs within the document:
db.collection.aggregate([
{ "$match": {
"Date": {
"$gte": new Date("2014-01-01"),
"$lt": new Date("2014-03-01")
}
}},
{ "$group": {
"_id": {
"year": { "$year": "$Date" },
"month": { "$month": "$Date" },
"msisdn": "$msisdn",
"callType": "$callType",
"traffic": "$traffic"
},
"charges": { "$sum": "$charges" },
"duration": { "$sum": "$duration" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": {
"msisdn": "$_id.msisdn",
"callType": "$_id.callType",
"traffic": "$_id.traffic"
},
"data": { "$push": {
"year": "$_id.year",
"month": "$_id.month",
"charges": "$charges",
"duration": "$duration"
}}
}}
])
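Once each group carries its months in the `data` array, the comparison itself is a small piece of client code. Here is a minimal Python sketch, assuming the array has the shape produced by the $push above; the function name and the sample figures are hypothetical.

```python
def month_over_month(data):
    """Return the change in duration and charges between consecutive months."""
    # Sort defensively in case the array did not arrive in date order.
    ordered = sorted(data, key=lambda m: (m["year"], m["month"]))
    deltas = []
    for prev, curr in zip(ordered, ordered[1:]):
        deltas.append({
            "from": (prev["year"], prev["month"]),
            "to": (curr["year"], curr["month"]),
            "duration_change": curr["duration"] - prev["duration"],
            "charges_change": curr["charges"] - prev["charges"],
        })
    return deltas


# Hypothetical "data" array for one msisdn / callType / traffic group.
sample = [
    {"year": 2014, "month": 1, "charges": 200, "duration": 10.5},
    {"year": 2014, "month": 2, "charges": 150, "duration": 1.0},
]
print(month_over_month(sample))
```

A group that only has data for one of the two months yields an empty list, so the caller can tell "no change" apart from "nothing to compare".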

Can the MongoDB Aggregation framework combine $group out?

Here is my query. I would like to combine the $_id fields into a single YYYY-MM-DD value. Is there a function like MySQL's DATE() that converts a DATETIME to a DATE?
db.event.aggregate([
{
$project: {
"created": {$add: ["$created", 60*60*1000*8]},
}
},
{
$group: {
"_id": {
"year": {"$year": "$created"},
"month": {"$month": "$created"},
"day": {"$dayOfMonth": "$created"}
},
"count": { $sum: 1 }
}
}
])
You are basically already doing this by using the date aggregation operators to split the date into the parts of your compound _id key, and that is probably the best way to handle it. If you do want a single string, you can build one with the $substr and $concat operators:
db.event.aggregate([
{ "$project": {
"created": {$add: ["$created", 60*60*1000*8]},
}},
{ "$group": {
"_id": {
"year": {"$year": "$created"},
"month": {"$month": "$created"},
"day": {"$dayOfMonth": "$created"}
},
"count": { "$sum": 1 }
}},
{ "$project": {
"_id": { "$concat": [
{ "$substr": [ "$_id.year", 0, 4 ] },
"-",
{ "$cond": [
{ "$lte": [ "$_id.month", 9 ] },
{ "$concat": [
"0",
{ "$substr": [ "$_id.month", 0, 2 ] }
]},
{ "$substr": [ "$_id.month", 0, 2 ] }
]},
"-",
{ "$cond": [
{ "$lte": [ "$_id.day", 9 ] },
{ "$concat": [
"0",
{ "$substr": [ "$_id.day", 0, 2 ] }
]},
{ "$substr": [ "$_id.day", 0, 2 ] }
]}
]},
"count": 1
}}
])
This coerces the date parts to strings and pads any value below two digits with a leading 0, as in a "YYYY-MM-DD" format.
Note that this coercion works, and has for some time, but it is notably missing from the manual page description of the $substr operator.
I'm not too sure about your "date math" at the start. You would be better off using the aggregation operators and then adjusting the resulting values or, if this is really a "timezone" correction, handling it on the client side.
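The padding logic in the final $project stage, and the client-side handling of the eight-hour shift, can be sketched in plain Python as follows. The function names are invented for this example. The fixed `offset_hours=8` mirrors the `60*60*1000*8` term in the query and is an assumption about what that offset is meant to do.

```python
from datetime import datetime, timedelta


def format_ymd(year: int, month: int, day: int) -> str:
    def pad(value: int) -> str:
        # Same rule as the $cond: prefix "0" when the value is 9 or less.
        text = str(value)
        return "0" + text if value <= 9 else text

    return "-".join([str(year), pad(month), pad(day)])


def day_key(created: datetime, offset_hours: int = 8) -> str:
    # Client-side equivalent of {"$add": ["$created", 60*60*1000*8]}.
    shifted = created + timedelta(hours=offset_hours)
    return format_ymd(shifted.year, shifted.month, shifted.day)


print(format_ymd(2014, 3, 7))                # 2014-03-07
print(format_ymd(2014, 11, 25))              # 2014-11-25
print(day_key(datetime(2014, 3, 7, 20, 0)))  # 2014-03-08
```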