I'm wondering whether the following is possible in MongoDB.
I have a collection of documents that represent changes in some value over time:
{
"day" : ISODate("2018-12-31T23:00:00.000Z"),
"value": [some integer value]
}
There are no 'holes' in the data: I have an entry for every day within the period.
Is it possible to query this collection to get only the documents that have a different value from the previous one (when sorted by day ascending)? For example, given the following documents:
{ day: ISODate("2019-04-01T00:00:00.000Z"), value: 10 }
{ day: ISODate("2019-04-02T00:00:00.000Z"), value: 10 }
{ day: ISODate("2019-04-03T00:00:00.000Z"), value: 15 }
{ day: ISODate("2019-04-04T00:00:00.000Z"), value: 15 }
{ day: ISODate("2019-04-05T00:00:00.000Z"), value: 15 }
{ day: ISODate("2019-04-06T00:00:00.000Z"), value: 10 }
I want to retrieve the documents for 2019-04-01, 2019-04-03 and 2019-04-06, and only those, since the others don't have a change of value.
You need pairs of consecutive docs to detect a change. For that, you can push all documents into a single array and zip it with itself shifted by one element from the head:
db.collection.aggregate([
{ $sort: { day: 1 } },
{ $group: { _id: null, docs: { $push: "$$ROOT" } } },
{ $project: {
pair: { $zip: {
inputs:[ { $concatArrays: [ [false], "$docs" ] }, "$docs" ]
} }
} },
{ $unwind: "$pair" },
{ $project: {
prev: { $arrayElemAt: [ "$pair", 0 ] },
next: { $arrayElemAt: [ "$pair", 1 ] }
} },
{ $match: {
$expr: { $ne: ["$prev.value", "$next.value"] }
} },
{ $replaceRoot:{ newRoot: "$next" } }
])
The rest is straightforward: you unwind the array back into documents, compare the pairs, filter out the equal ones, and $replaceRoot with what's left.
Starting in Mongo 5, this is a perfect use case for the new $setWindowFields aggregation stage:
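To make the pairing trick concrete, here is a minimal in-memory sketch in plain JavaScript that mimics each stage of the pipeline above on an ordinary array. It does not talk to MongoDB; changedDocs and the sample array are hypothetical names used only for illustration.

```javascript
// In-memory emulation of the $sort + $group/$push + $zip pipeline above.
// Illustration only: it runs on a plain array, not on a MongoDB collection.
function changedDocs(docs) {
  // { $sort: { day: 1 } } followed by pushing everything into one array
  const sorted = [...docs].sort((a, b) => a.day - b.day);
  // { $concatArrays: [ [false], "$docs" ] }: the same array shifted right by one
  const shifted = [false, ...sorted];
  // $zip stops at the shortest input, so there is one [prev, next] pair per doc
  const pairs = sorted.map((next, i) => [shifted[i], next]);
  // $match with $ne: `false` has no .value, so the first doc always passes
  const changed = pairs.filter(([prev, next]) => prev.value !== next.value);
  // { $replaceRoot: { newRoot: "$next" } }
  return changed.map(([, next]) => next);
}

const docs = [
  { day: new Date("2019-04-01T00:00:00Z"), value: 10 },
  { day: new Date("2019-04-02T00:00:00Z"), value: 10 },
  { day: new Date("2019-04-03T00:00:00Z"), value: 15 },
  { day: new Date("2019-04-04T00:00:00Z"), value: 15 },
  { day: new Date("2019-04-05T00:00:00Z"), value: 15 },
  { day: new Date("2019-04-06T00:00:00Z"), value: 10 },
];

console.log(changedDocs(docs).map((d) => d.day.toISOString().slice(0, 10)));
// [ '2019-04-01', '2019-04-03', '2019-04-06' ]
```

The leading `false` is what lets the first document survive the comparison: it has no value field, so it never equals the first real value.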
// { day: ISODate("2019-04-01T00:00:00.000Z"), value: 10 } <=
// { day: ISODate("2019-04-02T00:00:00.000Z"), value: 10 }
// { day: ISODate("2019-04-03T00:00:00.000Z"), value: 15 } <=
// { day: ISODate("2019-04-04T00:00:00.000Z"), value: 15 }
// { day: ISODate("2019-04-05T00:00:00.000Z"), value: 15 }
// { day: ISODate("2019-04-06T00:00:00.000Z"), value: 10 } <=
db.collection.aggregate([
{ $setWindowFields: {
sortBy: { day: 1 },
output: { pair: { $push: "$value", window: { documents: [-1, "current"] } } }
}},
// { day: ISODate("2019-04-01T00:00:00Z"), value: 10, pair: [ 10 ] }
// { day: ISODate("2019-04-02T00:00:00Z"), value: 10, pair: [ 10, 10 ] }
// { day: ISODate("2019-04-03T00:00:00Z"), value: 15, pair: [ 10, 15 ] }
// { day: ISODate("2019-04-04T00:00:00Z"), value: 15, pair: [ 15, 15 ] }
// { day: ISODate("2019-04-05T00:00:00Z"), value: 15, pair: [ 15, 15 ] }
// { day: ISODate("2019-04-06T00:00:00Z"), value: 10, pair: [ 15, 10 ] }
{ $match: { $expr: { $or: [
{ $eq: [ { $size: "$pair" }, 1 ] }, // first doc doesn't have a previous doc
{ $ne: [ { $first: "$pair" }, { $last: "$pair" } ] }
]}}},
{ $unset: ["pair"] }
])
// { day: ISODate("2019-04-01T00:00:00Z"), value: 10 }
// { day: ISODate("2019-04-03T00:00:00Z"), value: 15 }
// { day: ISODate("2019-04-06T00:00:00Z"), value: 10 }
This:
starts with a $setWindowFields aggregation stage that adds a pair field holding the previous document's value and the current document's value (output: { pair: { ... }}):
  $setWindowFields gives each document a view of other documents (a window),
  which in our case is the previous document "-1" and the "current" one: window: { documents: [-1, "current"] },
  such that we build within this window an array of values: $push: "$value",
  having made sure to sort documents by day: sortBy: { day: 1 }.
and then:
  keeps the first document (recognizable by its array having only one element): { $eq: [ { $size: "$pair" }, 1 ] }
  and filters out the following documents whose pair holds two equal values: { $ne: [ { $first: "$pair" }, { $last: "$pair" } ] }
{
  metadata: {
    dat: "jkjcsvbdskjcbdskjcbdac",
    meterId: "kahcvajc"
  },
  activeEnergy: 1111,
  actualtime: 1689827191000
}
The document looks something like the above. I'm only having trouble with activeEnergy, so I want to focus on that. In the code below, the first $group stage groups by year, month and day, but in my actual code this is dynamic: if the payload I receive from the frontend is month or week, I group accordingly, and if I receive day, I group by hour. I calculate the max and min energy of each hour and then sum over all hours, since activeEnergy is continuous (it keeps accumulating).
The problem is that I don't get data every second, so the first reading in an hour might be at 10:25 AM and the last at 10:45 AM. Taking max minus min then only covers those 20 minutes and misses the rest of the hour. Ideally I should take the max of this hour minus the max of the previous hour. Similarly, for a day whose first document is at 12:59 AM and last at 11 PM, max minus min of the day still misses roughly 2 hours of data, so I need the difference between today's max and the previous day's max.
That's a problem because I don't group the data that way, and I don't know any other way to group it. How can I solve this?
db.ts_events.aggregate([
{
$project: {
"y":{"$year": {$toDate: "$actualtime"}},
"m":{"$month": {$toDate: "$actualtime"}},
"d":{"$dayOfMonth": {$toDate: "$actualtime"}},
"h":{"$hour": {$toDate: "$actualtime"}},
"activeEnergy": 1,
"metadata.meterId": 1,
"activePower": 1,
"actualtime": 1,
"powerFactor": 1,
"metadata.dat": 1
}
},
{
$match: {
"metadata.dat": "62f0f3459731692a5eab5ad6/south0tpbit/tamilnadu5dvs8w/chennaidzc2yd/kknagarj4ffzo",
"actualtime": {
$gte: 1656613800000, $lte: 1659292199999,
},
// "metadata.device":"ObjectId(62f0f9b5f757672222282d9)" how to check using object id?,
"metadata.meterId": "911615402222257_2",
}
},
{
$group: {
_id: {
date: {
year: "$y",
month: "$m",
day: "$d",
// hour: "$h",
},
meter: "$metadata.meterId",
},
maxValue: {
$max: "$activeEnergy"
},
minValue: {
$min: "$activeEnergy"
},
averageActivePowerOfDay: { $avg: "$activePower" },
averagePowerFactorOfDay: { $avg: "$powerFactor" },
}
},
{
$addFields: {
difference: {
$subtract: [
"$maxValue",
"$minValue"
]
},
}
},
//
//
//
{
$group: {
_id: null, res: {
$push: '$$ROOT'
}, differenceSum: {
$sum: '$difference'
},
averageActivePowerOverThePeriod: {
$avg: "$averageActivePowerOfDay"
},
averagePowerFactorOverThePeriod: {
$avg: "$averagePowerFactorOfDay"
}
}
}
])
You can try $setWindowFields:
db.collection.aggregate([
{ $set: { actualTime: { $toDate: "$actualtime" } } },
{
$setWindowFields: {
partitionBy: "$metadata.dat",
sortBy: { actualTime: 1 },
output: {
max_today: {
$max: "$activeEnergy",
window: { range: [-1, 1], unit: "day" }
},
max_yesterday: {
$max: "$activeEnergy",
window: { range: [-2, -1], unit: "day" }
},
day: {
$last: "$actualTime",
window: { range: [-1, 1], unit: "day" }
},
}
}
},
{
$group: {
_id: { metadata: "$metadata", day: "$day" },
values: {
$addToSet: {
max_today: "$max_today",
max_yesterday: "$max_yesterday"
}
}
}
},
{ $replaceWith: { $mergeObjects: ["$_id", { $first: "$values" }] } }
])
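The question's own idea, per-day consumption as today's maximum reading minus the previous day's maximum, can be checked with a small plain-JavaScript sketch on an in-memory array. Assumptions not taken from the original: all readings belong to one meter, actualtime is epoch milliseconds, days are UTC calendar days, and the first day gets no result. dailyConsumption and the sample readings are hypothetical. In MongoDB 5.0+ the same shape is usually built by grouping per day first and then using $setWindowFields with the $shift operator to read the previous group's maximum; that pipeline is not shown here.

```javascript
// Sketch: daily consumption of a cumulative counter computed as
// (max reading of day D) - (max reading of the previous day with data).
// Assumptions: one meter, actualtime in epoch ms, UTC calendar days.
function dailyConsumption(readings) {
  const maxByDay = new Map(); // "YYYY-MM-DD" -> max activeEnergy that day
  for (const r of readings) {
    const day = new Date(r.actualtime).toISOString().slice(0, 10);
    const prev = maxByDay.get(day);
    if (prev === undefined || r.activeEnergy > prev) maxByDay.set(day, r.activeEnergy);
  }
  const days = [...maxByDay.keys()].sort(); // ISO dates sort chronologically as strings
  return days.map((day, i) => ({
    day,
    maxValue: maxByDay.get(day),
    // null for the first day: there is no earlier maximum to subtract
    consumption: i === 0 ? null : maxByDay.get(day) - maxByDay.get(days[i - 1]),
  }));
}

const readings = [
  { actualtime: Date.UTC(2023, 6, 1, 0, 59), activeEnergy: 100 },
  { actualtime: Date.UTC(2023, 6, 1, 23, 0), activeEnergy: 150 },
  { actualtime: Date.UTC(2023, 6, 2, 1, 0), activeEnergy: 160 },
  { actualtime: Date.UTC(2023, 6, 2, 22, 0), activeEnergy: 210 },
  { actualtime: Date.UTC(2023, 6, 3, 12, 0), activeEnergy: 260 },
];

console.log(dailyConsumption(readings).map((r) => r.consumption)); // [ null, 60, 50 ]
```

With this data, 2023-07-02 gets 60 rather than its own max minus min (50): the 10 units recorded between the 23:00 and 01:00 readings are no longer lost.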
I am using the MongoDB daily bucketing pattern. Each daily document contains an array with a value calculated for every hour of that day:
{
meter: 'meterId',
date: 'dailyBucket',
hourlyConsumption: [0,0,1,1,1,2,2,2,4,4,4,4,3,3,3...] // array with 24 values for every hour of a day
}
Now, in one of my aggregation queries, I would like to group the documents of multiple meters for the same day and get a result like this:
INPUT (consumption of multiple meters on the same day)
{
meter: 'MeterA',
date: '2021-05-01',
hourlyConsumption: [0,0,1,1,1,2,2,2,4,4,4,4,3,3,3...]
},
{
meter: 'MeterB',
date: '2021-05-01',
hourlyConsumption: [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10...]
}
RESULT (combined into single document)
{
date: '2021-05-01',
hourlyConsumption: [10,10,11,11,11,12,12,12,14,14,14,14,13,13,13...]
}
Is there a way to achieve this without using $accumulator?
You can use $reduce:
db.collection.aggregate([
  {
    $group: {
      _id: "$date",
      hourlyConsumption: { $push: "$hourlyConsumption" }
    }
  },
  {
    $set: {
      hourlyConsumption: {
        $reduce: {
          input: "$hourlyConsumption",
          initialValue: [],
          in: {
            $map: {
              input: { $range: [ 0, 24 ] }, // hours 0..23 ($range excludes the end value)
              as: "h",
              in: {
                $sum: [
                  { $arrayElemAt: [ "$$value", "$$h" ] }, // missing on the first pass, ignored by $sum
                  { $arrayElemAt: [ "$$this", "$$h" ] }
                ]
              }
            }
          }
        }
      }
    }
  }
])
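The element-wise addition that $reduce and $map perform can be illustrated with a plain-JavaScript sketch on in-memory bucket documents. This is not MongoDB code: combineByDate and its hours parameter (default 24) are hypothetical, and the sample arrays are shortened to 6 hours.

```javascript
// In-memory emulation of the $group + $reduce/$map pipeline above
// (illustration only). Arrays are added index by index; a missing entry
// counts as 0, like $sum ignoring the missing result of $arrayElemAt.
function combineByDate(buckets, hours = 24) {
  const byDate = new Map(); // $group: { _id: "$date", ... }
  for (const { date, hourlyConsumption } of buckets) {
    const acc = byDate.get(date) || []; // initialValue: []
    const summed = [];
    for (let h = 0; h < hours; h++) { // $map over { $range: [0, hours] }
      summed.push((acc[h] || 0) + (hourlyConsumption[h] || 0));
    }
    byDate.set(date, summed);
  }
  return [...byDate].map(([date, hourlyConsumption]) => ({ date, hourlyConsumption }));
}

const meterBuckets = [
  { meter: "MeterA", date: "2021-05-01", hourlyConsumption: [0, 0, 1, 1, 1, 2] },
  { meter: "MeterB", date: "2021-05-01", hourlyConsumption: [10, 10, 10, 10, 10, 10] },
];

// Only 6 hours here to keep the example short.
console.log(combineByDate(meterBuckets, 6)[0].hourlyConsumption); // [ 10, 10, 11, 11, 11, 12 ]
```

Passing the number of hours explicitly is the same decision the pipeline makes with $range; getting that bound wrong silently drops the last hour.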
Or you can use $unwind and $group:
db.collection.aggregate([
{
$unwind: {
path: "$hourlyConsumption",
includeArrayIndex: "hour"
}
},
{
$group: {
_id: {
date: "$date",
hour: "$hour"
},
hourlyConsumption: { $sum: "$hourlyConsumption" }
}
},
{ $sort: { "_id.hour": 1 } },
{
$group: {
_id: "$_id.date",
hourlyConsumption: { $push: "$hourlyConsumption" }
}
}
])
However, using $unwind effectively works against your bucketing design pattern.
I have a problem with a MongoDB query. I have a schema like this:
{
type: TYPE_1,
amount: 10,
list: [
{ ...}
]
}
{
type: TYPE_1,
amount: 14,
list: [
{ ...}
]
}
{
type: TYPE_2,
amount: 17,
list: [
{ ...}
]
}
...
I want to keep, for every type, only the document with the biggest amount value.
Like this:
{
type: TYPE_1,
amount: 14,
list: [
{ ...}
]
}
{
type: TYPE_2,
amount: 17,
list: [
{ ...}
]
}
Is this possible using MongoDB aggregation?
db.collection.aggregate([
{
$group:
{
"_id": "$type",
"amount": {$max: "$amount"}
}
}
])
You can achieve this with $group:
db.collection.aggregate([
{
"$group": {
"_id": "$type", //group by type
"amount": { //get the maximum
$max: "$amount"
},
data: {
$push: { //add list so that you can use it in the next stage
"lists": "$list"
}
}
}
}
])
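Note that the $group above returns the maximum amount together with the lists of every document of that type, not only the list of the document holding the maximum. A common alternative, which is a different technique from the answer above, is to $sort by amount descending and then $group with $first: "$$ROOT" per type. The sketch below only emulates that idea on an in-memory array; maxPerType and the sample items are hypothetical.

```javascript
// In-memory sketch of { $sort: { amount: -1 } } followed by
// { $group: { _id: "$type", doc: { $first: "$$ROOT" } } } (illustration only).
function maxPerType(docs) {
  const sorted = [...docs].sort((a, b) => b.amount - a.amount); // amount descending
  const firstByType = new Map(); // type -> first (i.e. largest) document seen
  for (const doc of sorted) {
    if (!firstByType.has(doc.type)) firstByType.set(doc.type, doc);
  }
  return [...firstByType.values()];
}

const items = [
  { type: "TYPE_1", amount: 10, list: ["a"] },
  { type: "TYPE_1", amount: 14, list: ["b"] },
  { type: "TYPE_2", amount: 17, list: ["c"] },
];

console.log(JSON.stringify(maxPerType(items)));
// [{"type":"TYPE_2","amount":17,"list":["c"]},{"type":"TYPE_1","amount":14,"list":["b"]}]
```

In an actual pipeline, a final { $replaceRoot: { newRoot: "$doc" } } would restore the original document shape, including the winning document's list.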
I have records stored in my collection as follows:
{
"sessionId" : "f960e3db-838c-42aa-95ce-a807096f7036",
"date" : "12-02-2020",
"hour" : "13",
"month" : "02",
"time" : "13:46:50",
"weekDay" : "Wednesday",
}
I want to group the above records by 'date' and 'hour', getting the number of unique 'sessionId' values per hour, something like this:
{
"12-02-2020": {
00: 23,//hour:unique number of sessions in that hour
01: 3,
04: 33,
05: 0,
10: 1
},
"13-02-2020": {
00: 2,//hour:unique number of sessions in that hour
03: 33,
09: 23,
05: 6,
10: 1
}
}
Can anyone please formulate the query for the above?
Dynamic field names combined with arrays are often a challenge; here is a solution I found:
db.collection.aggregate([
// group by hour
{
$group: {
_id: { date: "$date", hour: "$hour" },
sessions: { $addToSet: "$sessionId" }
}
},
// count the sessions
{ $set: { sessions: { $size: "$sessions" } } },
// group by day
{
$group: {
_id: "$_id.date",
hour: { $push: "$_id.hour" },
sessions: { $push: "$sessions" }
}
},
// transform result
{ $set: { data: { $range: [0, { $size: "$hour" }] } } },
{
$set: {
data: {
$map: {
input: "$data",
as: "idx",
in: {
k: { $arrayElemAt: ["$hour", "$$idx"] },
v: { $arrayElemAt: ["$sessions", "$$idx"] }
}
}
}
}
},
// transform day and hour values
{ $set: { v: { $arrayToObject: "$data" } } },
{ $project: { data: { k: "$_id", v: "$v" } } },
{ $set: { data: { $arrayToObject: "$data" } } },
{ $replaceRoot: { newRoot: "$data" } }
])
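To check the expected result shape independently of the pipeline, the counting can be sketched in plain JavaScript on an in-memory array. The sample records and sessionIds are made up, and sessionsPerHour is a hypothetical helper, not part of either answer.

```javascript
// In-memory sketch of the grouping above (illustration only, no MongoDB):
// a Set per (date, hour) plays the role of $addToSet, and .size of $size.
function sessionsPerHour(records) {
  const sets = {}; // date -> hour -> Set of sessionIds
  for (const { date, hour, sessionId } of records) {
    if (!sets[date]) sets[date] = {};
    if (!sets[date][hour]) sets[date][hour] = new Set();
    sets[date][hour].add(sessionId);
  }
  const result = {}; // date -> hour -> number of unique sessions
  for (const [date, hours] of Object.entries(sets)) {
    result[date] = {};
    for (const [hour, ids] of Object.entries(hours)) {
      result[date][hour] = ids.size;
    }
  }
  return result;
}

const records = [
  { sessionId: "s1", date: "12-02-2020", hour: "13" },
  { sessionId: "s1", date: "12-02-2020", hour: "13" },
  { sessionId: "s2", date: "12-02-2020", hour: "13" },
  { sessionId: "s1", date: "12-02-2020", hour: "14" },
  { sessionId: "s3", date: "13-02-2020", hour: "00" },
];

console.log(JSON.stringify(sessionsPerHour(records)));
// {"12-02-2020":{"13":2,"14":1},"13-02-2020":{"00":1}}
```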
You can try the following:
db.collection.aggregate([
/** group based on session & date & hour to get unique docs based on session */
{ $group: { _id: { session: "$sessionId", date: "$date", hour: "$hour" } } },
/** group on date & hour & count no.of docs */
{
$group: {
_id: { date: "$_id.date", hour: "$_id.hour" },
count: { $sum: 1 }
}
},
/** move each doc's data into a "data" field, converting [{ k: ..., v: ... }] arrays into { k: v } objects with $arrayToObject */
{
$project: {
data: {
$arrayToObject: [
[
{
k: "$_id.date",
v: { $arrayToObject: [[{ k: "$_id.hour", v: "$count" }]] }
}
]
]
}
}
},
/** replace root of each doc with new root as data */
{
$replaceRoot: {
newRoot: "$data"
}
}
]);
I would like to achieve something like
{ _id: "A", count: 2 }
{ _id: "B", count: 1 }
from
{ userId: "A", timeStamp: "12:30PM" } <- start of 5 min interval A: 1
{ userId: "B", timeStamp: "12:30PM" } <- start of 5 min interval B: 1
{ userId: "B", timeStamp: "12:31PM" } <- ignored
{ userId: "A", timeStamp: "12:32PM" } <- ignored
{ userId: "B", timeStamp: "12:33PM" } <- ignored
{ userId: "A", timeStamp: "12:37PM" } <- start of next 5 min A : 2
where documents are grouped by userId and then, within each userId group, the count increases at most once every 5 minutes.
For example, within any 5-minute period (starting at, say, midnight), any number of documents can have a timeStamp from 00:00 to 00:05, but together they count as only 1 hit.
Hopefully I am explaining this clearly.
I'm able to group by userId and get the count in general, but adding a condition to the count seems tricky.
You can try $bucket and $addToSet; the drawback is that you have to specify all the ranges manually:
db.col.aggregate([
{
$bucket: {
groupBy: "$timeStamp",
boundaries: [ "12:30PM", "12:35PM", "12:40PM", "12:45PM", "12:50PM", "12:55PM", "13:00PM" ],
output: {
"users" : { $addToSet: "$userId" }
}
}
},
{
$unwind: "$users"
},
{
$group: { _id: "$users", count: { $sum: 1 } }
}
])
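One way to avoid listing every range is to compute the interval index arithmetically. The plain-JavaScript sketch below does that on an in-memory array, assuming timeStamp strings look like "12:30PM" and all fall on the same day; toMinutes and hitsPerUser are hypothetical helpers. If timeStamp were stored as a real Date, MongoDB 5.0+ could compute a similar key server-side, for example with $dateTrunc using unit: "minute" and binSize: 5.

```javascript
// Sketch: count distinct 5-minute intervals per user without listing ranges.
// Assumes timeStamp strings like "12:30PM", all on the same day.
function toMinutes(ts) {
  const m = /^(\d{1,2}):(\d{2})(AM|PM)$/i.exec(ts);
  if (!m) throw new Error(`unrecognized timeStamp: ${ts}`);
  const hour12 = Number(m[1]) % 12; // "12" maps to 0 before the AM/PM offset
  const offset = m[3].toUpperCase() === "PM" ? 12 : 0;
  return (hour12 + offset) * 60 + Number(m[2]);
}

function hitsPerUser(events, intervalMinutes = 5) {
  const buckets = new Map(); // userId -> Set of interval indexes (like $addToSet)
  for (const { userId, timeStamp } of events) {
    const idx = Math.floor(toMinutes(timeStamp) / intervalMinutes);
    if (!buckets.has(userId)) buckets.set(userId, new Set());
    buckets.get(userId).add(idx);
  }
  return [...buckets].map(([_id, set]) => ({ _id, count: set.size }));
}

const events = [
  { userId: "A", timeStamp: "12:30PM" },
  { userId: "B", timeStamp: "12:30PM" },
  { userId: "B", timeStamp: "12:31PM" },
  { userId: "A", timeStamp: "12:32PM" },
  { userId: "B", timeStamp: "12:33PM" },
  { userId: "A", timeStamp: "12:37PM" },
];

console.log(JSON.stringify(hitsPerUser(events)));
// [{"_id":"A","count":2},{"_id":"B","count":1}]
```

Intervals here are aligned to the clock (12:30 to 12:35, 12:35 to 12:40, ...), which matches the "starting at, say, midnight" framing in the question rather than a window that starts at each user's first hit.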
Micki's solution is better if you have mongo 3.6.
If you have mongo 3.4 you can use $switch.
Obviously, you would need to add a case for every interval in the day.
db.getCollection('user_timestamps').aggregate(
{
$group: {
_id: '$userId',
timeStamp: {$push: '$timeStamp'}
}
},
{
$project: {
timeStamps: {
$map: {
input: '$timeStamp',
as: 'timeStamp',
in: {
$switch: {
branches: [
{
case: {
$and: [
{$gte: ['$$timeStamp', '12:30PM']},
{$lt: ['$$timeStamp', '12:35PM']}
]
},
then: 1
},
{
case: {
$and: [
{$gte: ['$$timeStamp', '12:35PM']},
{$lt: ['$$timeStamp', '12:40PM']}
]
},
then: 2
}
],
default: 0
}
}
}
}
}
},
{
$unwind: '$timeStamps'
},
{
$group: {
_id: '$_id',
count: {
$addToSet: '$timeStamps'
}
}
},
{
$project: {
_id: true,
count: {$size: '$count'}
}
}
)
If you don't have Mongo 3.4, you can replace the $switch with nested $cond expressions:
$cond: [
  {
    $and: [
      {$gte: ['$$timeStamp', '12:30PM']},
      {$lt: ['$$timeStamp', '12:35PM']}
    ]
  },
  1,
  {
    $cond: [
      {
        $and: [
          {$gte: ['$$timeStamp', '12:35PM']},
          {$lt: ['$$timeStamp', '12:40PM']}
        ]
      },
      2,
      0
    ]
  }
]