I've got an aggregation pipeline that groups the last 24 hours of documents by hour and sorts the result. The JSON output generally looks something like the following:
[{"_id": {"day": 28, "hour": 23}, "count": 2},
However, because today is the first of the month, the order got mixed up: the JSON above was treated as the 'latest', while everything containing "day": 1 was sorted last.
I think I fixed it by adding 'month' to the group key, but just to be sure: is this the correct way of sorting my aggregation?
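from datetime import timedelta
from bson.son import SON
# timenow is assumed to be a datetime defined earlier in the script
pipeline = [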
{
"$project": {
"_id": 1,
"mac": 1,
"time": 1
}
},
{"$match": {"time": {"$gt": timenow-timedelta(hours=24)}}},
{"$group": {
"_id": {
'month': {"$month": {"date": '$time', "timezone": 'Africa/Johannesburg'}},
'day': {"$dayOfMonth": {"date": '$time', "timezone": 'Africa/Johannesburg'}},
'hour': {"$hour": {"date": '$time', "timezone": 'Africa/Johannesburg'}},
'mac': {'_id': "$mac"}
},
"count": {"$sum": 1}
}},
{
"$group": {
"_id": {
'month':"$_id.month",
'day': "$_id.day",
'hour': "$_id.hour",
},
"count": {"$sum": 1}
}
},
{"$sort": SON([("_id", -1)])}
]
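To see why the month matters, the ordering can be checked locally without a database. The sketch below uses made-up group keys spanning a month boundary (the rows and the `sort_desc` helper are hypothetical, not part of the original pipeline) and sorts them newest-first, comparing the `_id` fields in order, which is roughly how MongoDB compares embedded documents in a `$sort` on `_id`.

```python
# Hypothetical group keys as the pipeline might return them,
# spanning a month boundary (28 Feb 23:00, then 1 Mar 00:00 and 01:00).
rows = [
    {"_id": {"month": 2, "day": 28, "hour": 23}, "count": 2},
    {"_id": {"month": 3, "day": 1, "hour": 0}, "count": 5},
    {"_id": {"month": 3, "day": 1, "hour": 1}, "count": 3},
]


def sort_desc(rows, fields):
    """Sort newest-first by comparing the listed _id fields in order."""
    return sorted(
        rows,
        key=lambda r: tuple(r["_id"][f] for f in fields),
        reverse=True,
    )


# Without the month, day 28 wrongly sorts ahead of day 1.
without_month = sort_desc(rows, ["day", "hour"])
# With the month leading the key, the newest bucket comes first.
with_month = sort_desc(rows, ["month", "day", "hour"])

print([(r["_id"]["day"], r["_id"]["hour"]) for r in without_month])
print([(r["_id"]["day"], r["_id"]["hour"]) for r in with_month])
```

By the same reasoning, a 24-hour window that crosses 31 December into 1 January would presumably need a leading 'year' field as well.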
I have a series of documents gathered by aggregation grouping. This is the result for one document:
{
"_id": {
"ip": "79.xxx.xxx.117",
"myDate": "2022-10-19"
},
"date": "2022-10-19",
"allVisitedPages": [
{
"page": "/",
"time": {
"time": "2022-10-19T11:35:44.655Z",
"tz": "-120",
"_id": "634fe1100a011986b7137da0"
}
},
{
"page": "/2",
"time": {
"time": "2022-10-19T12:14:29.536Z",
"tz": "-120",
"_id": "634fea257acb264f23d421f1"
}
},
{
"page": "/",
"time": {
"time": "2022-10-19T15:37:30.002Z",
"tz": "-120",
"_id": "634fea266001ea364eeb38ea"
}
},
],
"visitedPages": 3,
"createdAt": "2022-10-19T11:35:44.920Z"
},
I want to get this (in this case 2 documents as the time difference between array position 2 and 3 is greater than 2 hours):
{
"_id": {
"ip": "79.xxx.xxx.117",
"myDate": "2022-10-19"
},
"date": "2022-10-19",
"allVisitedPages": [
{
"page": "/",
"durationInMinutes": "39",
"time": {
"time": "2022-10-19T11:35:44.655Z",
"tz": "-120",
"_id": "634fe1100a011986b7137da0"
}
},
{
"page": "/2",
"durationInMinutes": "2",
"time": {
"time": "2022-10-19T12:14:29.536Z",
"tz": "-120",
"_id": "634fea257acb264f23d421f1"
}
}
],
"visitedPages": 2,
},
{
"_id": {
"ip": "79.xxx.xxx.117",
"myDate": "2022-10-19"
},
"date": "2022-10-19",
"allVisitedPages": [
{
"page": "/",
"durationInMinutes": "2",
"time": {
"time": "2022-10-19T15:37:30.002Z",
"tz": "-120",
"_id": "634fea266001ea364eeb38ea"
}
},
],
"visitedPages": 1,
},
I want to get a new grouping document whenever the time between one array position and the next is greater than 2 hours. For the last array position, the duration should always show "2".
I tried $divide and $dateDiff, but this is not possible in the group stage, since accumulators in the group stage are unary operators. One approach I tried is to calculate the sum of start and end time by dividing. But how do I execute this at the array level in the group stage? Maybe someone could point me in the right direction, if this is possible at all?
You can group and then reduce, but another option is to use $setWindowFields to calculate your grouping index before grouping:
db.collection.aggregate([
{$setWindowFields: {
partitionBy: {$concat: ["$ip", "$date"]},
sortBy: {"time.time": 1},
output: {prevtime: {
$push: "$time.time",
window: {documents: [-1, "current"]}
}}
}},
{$addFields: {
minutesDiff: {
$toInt: {
$dateDiff: {
startDate: {$first: "$prevtime"},
endDate: {$last: "$prevtime"},
unit: "minute"
}
}
}
}},
{$addFields: {deltaIndex: {$cond: [{$gt: ["$minutesDiff", 120]}, 1, 0]}}},
{$setWindowFields: {
partitionBy: {$concat: ["$ip", "$date"]},
sortBy: {"time.time": 1},
output: {
groupIndex: {
$sum: "$deltaIndex",
window: {documents: ["unbounded", "current"]}
},
duration: {
$push: "$minutesDiff",
window: {documents: ["current", 1]}
}
}
}
},
{$set: {
duration: {
$cond: [
{$and: [
{$eq: [{$size: "$duration"}, 2]},
{$lte: [{$last: "$duration"}, 120]}
]},
{$last: "$duration"},
2
]
}
}},
{$group: {
_id: {ip: "$ip", myDate: "$date", groupIndex: "$groupIndex"},
date: {$first: "$date"},
allVisitedPages: {$push: {page: "$page", time: "$time", duration: "$duration"}},
visitedPages: {$sum: 1}
}},
{$unset: "_id.groupIndex"}
])
See how it works on the playground example
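For checking the expected output, the same splitting rule can be sketched client-side in plain Python. This is only an illustration under assumptions: `minutes_between` and `split_sessions` are hypothetical helper names, visits are taken as dicts with a `page` and a `datetime` under `time`, and durations are kept as integers rather than the strings shown in the question. Minutes are counted as minute boundaries crossed, which is how `$dateDiff` with unit "minute" behaves.

```python
from datetime import datetime


def minutes_between(start, end):
    """Count minute boundaries crossed, similar to $dateDiff with unit 'minute'."""
    start_floor = start.replace(second=0, microsecond=0)
    end_floor = end.replace(second=0, microsecond=0)
    return int((end_floor - start_floor).total_seconds() // 60)


def split_sessions(visits, max_gap_minutes=120, last_duration=2):
    """Split visits into sessions wherever the gap exceeds max_gap_minutes.

    Each visit gets a durationInMinutes: the minutes until the next visit in
    the same session, or last_duration for the final visit of a session.
    """
    sessions = []
    for visit in sorted(visits, key=lambda v: v["time"]):
        current = dict(visit, durationInMinutes=last_duration)
        if sessions:
            prev = sessions[-1][-1]
            gap = minutes_between(prev["time"], visit["time"])
            if gap <= max_gap_minutes:
                prev["durationInMinutes"] = gap
                sessions[-1].append(current)
                continue
        sessions.append([current])
    return sessions


# Timestamps taken from the sample document in the question.
visits = [
    {"page": "/", "time": datetime(2022, 10, 19, 11, 35, 44, 655000)},
    {"page": "/2", "time": datetime(2022, 10, 19, 12, 14, 29, 536000)},
    {"page": "/", "time": datetime(2022, 10, 19, 15, 37, 30, 2000)},
]
sessions = split_sessions(visits)
for session in sessions:
    print([(v["page"], v["durationInMinutes"]) for v in session])
```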
I want to aggregate entries whose date is less than 1 week old from today.
{
"collection": "my-lovely-collection",
"aggregate": [
{"$match": {"some_field": { "$regex": "awesome*"}}},
{"$match": {"created": {"$lt":
{"$dateToString":
{"date":
{"$dateSubtract":
{"startDate": {"$currentDate": {"$type": "date"}},
"unit": "day",
"amount": 7
}}}}}}},
{"$group": {"_id": "$some_field", "count": {"$sum": 1 }}},
{"$sort": [{"name": "count", "direction": 1}]}
]
}
When I use a hard-coded date for today everything works fine (but that's not what I want)
{"$match": {"created": {"$lt": "2022-08-29"}}}
You could use $dateSubtract:
{
$match: {
$expr: {
$lt: [
"$created",
{
$dateToString: {
format: "%Y-%m-%d",
date: {
$dateSubtract: {
startDate: "$$NOW",
unit: "week",
amount: 1
}
}
}
}
]
}
}
}
Mongo Playground: https://mongoplayground.net/p/QWT4ar4gbt0
Documentation: https://www.mongodb.com/docs/manual/reference/operator/aggregation/dateSubtract/
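If the pipeline is assembled in application code, another option is to compute the cutoff on the client and pass it in as a plain value. The sketch below assumes `created` is stored as a "YYYY-MM-DD" string (as the hard-coded comparison in the question suggests); `week_ago_cutoff` and `build_match` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def week_ago_cutoff(now=None, days=7):
    """Return the date `days` before `now` as a 'YYYY-MM-DD' string."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=days)).strftime("%Y-%m-%d")


def build_match(now=None):
    """Build a $match stage comparing the string field `created` to the cutoff."""
    return {"$match": {"created": {"$lt": week_ago_cutoff(now)}}}


# Fixed "today" so the result is reproducible.
stage = build_match(datetime(2022, 9, 5, tzinfo=timezone.utc))
print(stage)
```

Note that `$lt` selects entries older than the cutoff, mirroring the original query; `$gte` would select entries from the last week instead.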
I'm doing an aggregation where I sum all the sales by month (createdAt), and I'm trying to calculate the variation from the prior value.
How to compare value with the prior value of same field in MongoDB?
[
{"$addFields": { "createdAt": {"$convert": { "input": "$createdAt", "to": "date", "onError": null}}}},
{"$addFields": {"createdAt": {"$cond": {"if": { "$eq": [{"$type": "$createdAt" }, "date"]},
"then": "$createdAt", "else": null}}}},
{"$addFields": {"__alias_0": {"year": {"$year": "$createdAt" }, "month": {"$subtract": [{ "$month": "$createdAt"}, 1] } } }},
{ "$group": { "_id": { "__alias_0": "$__alias_0" }, "__alias_1": {"$sum": 1 }}},
{ "$project": {"_id": 0, "__alias_0": "$_id.__alias_0", "__alias_1": 1}},
{ "$project": {"group": "$__alias_0", "value": "$__alias_1", "_id": 0 }}
]
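One way to get the variation is to post-process the pipeline output in application code. The sketch below is an assumption-laden illustration: `add_variation` is a hypothetical helper, the sample rows are invented, and the rows are assumed to have the `group` (year and zero-based month) and `value` shape produced by the final `$project`. On MongoDB 5.0 or later, a `$setWindowFields` stage with `$shift` may be able to do the same thing server-side.

```python
def add_variation(rows):
    """Attach the change versus the previous month to each row.

    rows: dicts shaped like the pipeline output, e.g.
    {"group": {"year": 2022, "month": 0}, "value": 10}
    """
    ordered = sorted(rows, key=lambda r: (r["group"]["year"], r["group"]["month"]))
    result, prev = [], None
    for row in ordered:
        # No previous month for the first row, so no difference to report.
        diff = None if prev is None else row["value"] - prev
        # Skip the percentage when there is no prior value or it is zero.
        pct = None if not prev else diff / prev * 100
        result.append({**row, "diff": diff, "pct": pct})
        prev = row["value"]
    return result


# Invented sample data, deliberately out of order.
rows = [
    {"group": {"year": 2022, "month": 1}, "value": 15},
    {"group": {"year": 2022, "month": 0}, "value": 10},
    {"group": {"year": 2022, "month": 2}, "value": 12},
]
for row in add_variation(rows):
    print(row["group"], row["value"], row["diff"], row["pct"])
```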
I have many documents in a MongoDB database which look like the following four documents (note the first 3 are Feb 2017 and the last one is March 2017):
{"_id": 0,
"date": ISODate("2017-02-01T00:00:00Z"),
"item": "Basketball",
"category": "Sports"}
{"_id": 1,
"date": ISODate("2017-02-13T00:00:00Z"),
"item": "Football",
"category": "Sports"}
{"_id": 2,
"date": ISODate("2017-02-22T00:00:00Z"),
"item": "Massage",
"category": "Leisure"}
{"_id": 3,
"date": ISODate("2017-03-05T00:00:00Z"),
"item": "Golf club",
"category": "Sports"}
I'm trying to group the items by MONTH/YEAR and within that, group the items by CATEGORY. So the aggregation pipeline should return something that looks like this for the four docs above:
{"_id": {
"month": 2,
"year": 2017
},
"data": [
{"category": "Sports",
"items": ["Basketball", "Football"]
},
{"category": "Leisure",
"items": ["Massage"]
}
]
},
{"_id": {
"month": 3,
"year": 2017
},
"data": [
{"category": "Sports",
"items": ["Golf club"]
}
]
}
I also want the returned cursor to be in order with year as the primary sort and month as the secondary sort.
Figured it out. Here's the answer using the pymongo api:
from bson.son import SON
cursor = db.collection.aggregate([
{'$group': {
'_id': {'month': {'$month': '$date'},
'year': {'$year': '$date'},
'category': '$category'},
'items': {'$push': '$item'}
}},
{'$group': {
'_id': {'month': '$_id.month',
'year': '$_id.year'},
'data': {
'$push': {
'category': '$_id.category',
'items': '$items'
}
}
}},
{'$sort': SON([('_id.year', 1), ('_id.month', 1)])}
])
my_data = list(cursor)
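To sanity-check the expected shape without a running server, the two `$group` stages and the final sort can be mimicked in plain Python. This is only an in-memory sketch: `group_by_month_then_category` is a hypothetical helper, and the documents are the four samples from the question with `date` as a Python `datetime`.

```python
from collections import defaultdict
from datetime import datetime

docs = [
    {"_id": 0, "date": datetime(2017, 2, 1), "item": "Basketball", "category": "Sports"},
    {"_id": 1, "date": datetime(2017, 2, 13), "item": "Football", "category": "Sports"},
    {"_id": 2, "date": datetime(2017, 2, 22), "item": "Massage", "category": "Leisure"},
    {"_id": 3, "date": datetime(2017, 3, 5), "item": "Golf club", "category": "Sports"},
]


def group_by_month_then_category(docs):
    # First grouping: (year, month, category) -> list of items.
    items = defaultdict(list)
    for d in docs:
        items[(d["date"].year, d["date"].month, d["category"])].append(d["item"])
    # Second grouping: (year, month) -> list of {category, items}.
    months = defaultdict(list)
    for (year, month, category), names in items.items():
        months[(year, month)].append({"category": category, "items": names})
    # Sort by year first, then month.
    return [
        {"_id": {"month": month, "year": year}, "data": data}
        for (year, month), data in sorted(months.items(), key=lambda kv: kv[0])
    ]


result = group_by_month_then_category(docs)
for entry in result:
    print(entry)
```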
I need to group a datetime field by minute. That's rather easy:
db.my_collection.aggregate([
{ "$group": {
"_id": {
"year": { "$year": "$meta_data.created_at" },
"dayOfYear": { "$dayOfYear": "$meta_data.created_at" },
"hour": { "$hour": "$meta_data.created_at"},
"minute": { "$minute": "$meta_data.created_at"}
},
"count": { "$sum": 1 }
}}])
The problem is that the output is:
{
"_id": {
"year": 2016,
"dayOfYear": 349,
"hour": 16,
"minute": 43
},
"count": 4
}
This is not really convenient if I want to query by date later on.
How can I turn the output of the aggregation back into a DateTime object?
OK - so what I meant in the question is doable by doing:
db.my_collection.aggregate([
{"$group": {
"_id": {
"date_by_minute": {"$subtract": [{"$subtract":
["$meta_data.created_at",
{"$multiply":[{"$second": "$meta_data.created_at"} , 1000]}]},
{"$millisecond": "$meta_data.created_at"}]}},
"count": { "$sum": 1 }
}}
])
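The nested `$subtract` effectively truncates each timestamp to the start of its minute. As a rough illustration, the Python helper below (`truncate_to_minute`, a hypothetical name) does the same to a `datetime`, and the pipeline sketch shows what the grouping might look like with `$dateTrunc`, assuming MongoDB 5.0 or later where that operator is available. The field name `meta_data.created_at` is taken from the question; the sample timestamp is made up to match the output shown there.

```python
from datetime import datetime


def truncate_to_minute(dt):
    """Drop seconds and sub-second parts, like the nested $subtract does."""
    return dt.replace(second=0, microsecond=0)


# Assumption: MongoDB 5.0+, where the $dateTrunc operator exists.
pipeline = [
    {"$group": {
        "_id": {"$dateTrunc": {"date": "$meta_data.created_at", "unit": "minute"}},
        "count": {"$sum": 1},
    }},
    {"$sort": {"_id": 1}},
]

# Day 349 of 2016 is 14 December, matching the sample output above.
sample = datetime(2016, 12, 14, 16, 43, 27, 512000)
print(truncate_to_minute(sample))
```

Either way the group key stays a date, so it can be compared against other dates in later stages or queries.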