MongoDB: aggregate and group by splitting the _id

My schema design is influenced by this tutorial on the official MongoDB site:
{
  _id: String,
  data: [
    {
      point_1: Number,
      ts: Date
    }
  ]
}
This is basically a schema designed for time-series data: I store the data for each hour per device in an array inside a single document. I build the _id field by combining the id of the device sending the data with the time. For example, if a device with id xyz1234 sends data at 2018-09-11 12:30:00, my _id becomes xyz1234:2018091112.
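For illustration, here is a minimal sketch of how such an _id could be built (the bucketId helper is just illustrative, not my actual code):
// Sketch: build the hourly bucket _id "deviceId:YYYYMMDDHH" (illustrative helper)
function bucketId(deviceId, ts) {
  const pad = n => String(n).padStart(2, '0');
  return deviceId + ':' +
    ts.getUTCFullYear() +
    pad(ts.getUTCMonth() + 1) +
    pad(ts.getUTCDate()) +
    pad(ts.getUTCHours());
}
// bucketId('xyz1234', new Date('2018-09-11T12:30:00Z')) -> 'xyz1234:2018091112'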
I create a new document if one doesn't exist yet for that hour and that device; otherwise I just push the data onto the data array:
client.db('iot')
  .collection('iotdata')
  .update({ _id: id }, { $push: { data: { point_1, ts: date } } }, { upsert: true });
Now I am facing a problem while doing aggregation. I am trying to get these types of values:
Min point_1 value for many devices in last 24 hours by grouping on device id
Max point_1 value for many devices in last 24 hours by grouping on device id
Average point_1 for many devices in last 24 hours by grouping on device id
I thought this was a very simple aggregation, but then I realized the device id is not stored on its own but is mixed with the time, so grouping by device id is not straightforward. How can I split the _id and group by device id? I have tried my best to write the question as clearly as possible, so please ask in the comments if any part is unclear.

You can start with $unwind on data to get a single document per entry. Then you can extract deviceId using the $substr and $indexOfBytes operators, apply your filtering condition (last 24 hours), and use $group to get the min, max and avg:
db.col.aggregate([
  {
    $unwind: "$data"
  },
  {
    $project: {
      point_1: "$data.point_1",
      deviceId: { $substr: [ "$_id", 0, { $indexOfBytes: [ "$_id", ":" ] } ] },
      dateTime: "$data.ts"
    }
  },
  {
    $match: {
      dateTime: { $gte: ISODate("2018-09-10T12:00:00Z") }
    }
  },
  {
    $group: {
      _id: "$deviceId",
      min: { $min: "$point_1" },
      max: { $max: "$point_1" },
      avg: { $avg: "$point_1" }
    }
  }
])
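If you don't want to hard-code the cutoff, you can compute it in the shell before running the pipeline (a small sketch, not part of the answer above):
// compute "24 hours ago" at run time and use it in the $match stage above
var cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000);
// ... { $match: { dateTime: { $gte: cutoff } } } ...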

You can use the below query in MongoDB 3.6 ($expr in $match requires 3.6):
db.colname.aggregate([
  { "$project": {
    "deviceandtime": { "$split": ["$_id", ":"] },
    "minpoint": { "$min": "$data.point_1" },
    "maxpoint": { "$max": "$data.point_1" },
    "sumpoint": { "$sum": "$data.point_1" },
    "count": { "$size": "$data.point_1" }
  } },
  // the time part of _id is a plain string in YYYYMMDDHH format, so compare against the same format
  { "$match": { "$expr": { "$gte": [{ "$arrayElemAt": ["$deviceandtime", 1] }, "2018091012"] } } },
  { "$group": {
    "_id": { "$arrayElemAt": ["$deviceandtime", 0] },
    "minpoint": { "$min": "$minpoint" },
    "maxpoint": { "$max": "$maxpoint" },
    "sumpoint": { "$sum": "$sumpoint" },
    "countpoint": { "$sum": "$count" }
  } },
  { "$project": {
    "minpoint": 1,
    "maxpoint": 1,
    "avgpoint": { "$divide": ["$sumpoint", "$countpoint"] }
  } }
])

Related

MongoDB Aggregation question using summations / matches

I have a collection with the following type of documents:
{
  device: integer,
  date: string,
  time: string,
  voltage: double,
  amperage: double
}
Data is inserted as time-series data, and a separate process aggregates and averages results so that this collection has a single document per device every 5 minutes, i.e. time is 00:05:00, 00:10:00, etc.
I need to search for a specific group of devices (usually 5-10 at a time). I need the voltage to be >= 27.0, and I need to search for a single date.
That part is easy, but I need to only find data when all 5-10 systems at a time interval meet the 27.0 requirement. I'm not sure how to handle that requirement.
Once I know that, I then need to find the specific grouping of devices that have the lowest summation of the amperage field, and I need to return the time that this occurred.
So, lets assume I am going to search for 5 devices. I need to find the time when all 5 devices have a voltage >= 27.0 and the summation of the amperage field is the lowest.
I'm not sure how to require that all the devices meet the voltage requirement, and then for that group of devices, to then find the time when the amperage summation is the lowest.
Any suggestions would be great.
Thanks.
You need to use the $all operator.
Note: please provide more information about "the summation of the amperage field is the lowest".
db.collection.aggregate([
  {
    $match: {
      device: { $in: [1, 2, 3] },
      date: "2022/10/01",
      voltage: { $gte: 27.0 }
    }
  },
  {
    $group: {
      _id: "$time",
      device: { "$addToSet": "$device" },
      amperage: { $min: "$amperage" },
      root: { $push: "$$ROOT" }
    }
  },
  {
    $match: {
      device: { $all: [1, 2, 3] }
    }
  }
])
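If "the summation of the amperage field" means the total across the matched devices, stages like the following could be appended before the closing ]) to pick the time with the lowest total (a sketch based on one possible reading of the question):
// total the amperage of the documents collected into "root" for each time slot
{ $addFields: { totalAmperage: { $sum: "$root.amperage" } } },
// keep the time slot (the group _id) with the lowest total
{ $sort: { totalAmperage: 1 } },
{ $limit: 1 }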

MongoDB, Panache, Quarkus: How to do aggregate, $sum and filter

I have a collection in MongoDB with sales transactions, each containing a userId, a timestamp and the revenue value of that specific transaction.
Now I would like to query these users and get the minimum, maximum, sum and average of all transactions per user. Only transactions between two given timestamps should be included, and only users whose sum of revenue is greater than a specified value.
I have composed the corresponding query in mongosh:
db.salestransactions.aggregate([
  {
    "$match": {
      "timestamp": {
        "$gte": new ISODate("2020-01-01T19:28:38.000Z"),
        "$lte": new ISODate("2020-03-01T19:28:38.000Z")
      }
    }
  },
  {
    $group: {
      _id: { userId: "$userId" },
      minimum: { $min: "$revenue" },
      maximum: { $max: "$revenue" },
      sum: { $sum: "$revenue" },
      avg: { $avg: "$revenue" }
    }
  },
  {
    $match: { "sum": { $gt: 10 } }
  }
])
This query works absolutely fine.
How do I implement this query in a PanacheMongoRepository using Quarkus?
Any ideas?
Thanks!
A bit late, but you could do it like this.
Define a repo.
This code is in Kotlin:
class YourRepositoryReactive : ReactivePanacheMongoRepository<YourEntity> {
    fun getDomainDocuments(): Multi<YourView> {
        val aggregationPipeline = mutableListOf<Bson>()
        // create each stage with Document.parse("{ ...stage... }") and add it to aggregationPipeline
        return mongoCollection().aggregate(aggregationPipeline, YourView::class.java)
    }
}
mongoCollection() automatically operates on your entity's collection.
YourView is a class that maps the properties you want in your output. Make sure that this class has the
@ProjectionFor(YourEntity::class)
annotation.
Hope this helps.

MongoDB: selecting every nth item of a given sorted aggregation

I want to be able to retrieve every nth item of a given collection, which is quite large (millions of records).
Here is a sample of my collection
{
  _id: ObjectId("614965487d5d1c55794ad324"),
  hour: ISODate("2021-09-21T17:21:03.259Z"),
  searches: [
    ObjectId("614965487d5d1c55794ce670")
  ]
}
My aggregation starts like this:
[
  {
    $match: {
      searches: {
        $in: [ObjectId('614965487d5d1c55794ce670')],
      },
    },
  },
  { $sort: { hour: -1 } },
  { $project: { hour: 1 } },
  ...
]
I have tried many things, including:
$sample, which does not pick items in the right order
$skip, which becomes very slow as the number given to skip grows
Using _id ranges instead of $skip, but unfortunately my ids are not created in an ordered manner
My goal is thus to retrieve the hour of a record every 20000 records, so that I can then fetch the data in chunks of approximately 20000 records.
I imagine it would be possible to sort and number every record, then keep only the first, the 20000th, the 40000th, ..., and the last (see the sketch below).
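A minimal sketch of that numbering idea (assuming MongoDB 5.0+ for $setWindowFields; it keeps the first document and then every 20000th, but not the last one):
db.collection.aggregate([
  { $match: { searches: { $in: [ObjectId('614965487d5d1c55794ce670')] } } },
  { $sort: { hour: -1 } },
  { $project: { hour: 1 } },
  // number the documents in sorted order
  { $setWindowFields: {
      sortBy: { hour: -1 },
      output: { rowNumber: { $documentNumber: {} } }
  } },
  // keep rows 1, 20001, 40001, ... i.e. one document per chunk of 20000
  { $match: { $expr: { $eq: [{ $mod: ["$rowNumber", 20000] }, 1] } } }
])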
Thanks for your help and let me know if you need more information

Aggregate rates from database

I have a collection in MongoDB. Model is:
{
  currency: String,
  price: Number,
  time: Date
}
Documents are recorded to that collection any time the official rate for a currency changes.
I am given a timestamp, and I need to fetch the rates of all available currencies as of that time. So first I need to filter all documents whose time is $lte the required timestamp, and then, for each currency, fetch only the document with the max timestamp.
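For reference, a sketch of that two-step approach (the collection name rates and the example cutoff are placeholders):
db.rates.aggregate([
  // keep only rates recorded at or before the given timestamp
  { $match: { time: { $lte: ISODate("2020-06-01T00:00:00Z") } } },
  // newest first, then take the latest price per currency
  { $sort: { time: -1 } },
  { $group: {
      _id: "$currency",
      price: { $first: "$price" },
      time: { $first: "$time" }
  } }
])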
After seeing your requirement, I think you want the max price and time, so use the $max operator:
db.collection.aggregate([
  {
    $group: {
      _id: "$currency",
      time: { $max: "$time" },
      price: { $max: "$price" }
    }
  }
])
You can use the MongoDB aggregate function to do so. Please find an example below:
db.<collection_name>.aggregate([
  // First sort all the docs by time in descending order
  { $sort: { time: -1 } },
  // Take the first 3 of those
  { $limit: 3 }
])
Hope this helps !!

How to count the number of documents by date field in MongoDB

Scenario: Consider that I have the following collection in MongoDB:
{
"_id" : "CustomeID_3723",
"IsActive" : "Y",
"CreatedDateTime" : "2013-06-06T14:35:00Z"
}
Now I want to know the count of documents created on a particular day (say 2013-03-04).
So, I am trying to find the solution using aggregation framework.
Information:
So far I have the following query built:
collection.aggregate([
  { $group: {
      _id: '$CreatedDateTime'
  } },
  { $group: {
      _id: null,
      count: { $sum: 1 }
  } },
  { $project: {
      _id: 0,
      "count": "$count"
  } }
])
Issue: The above query gives me a count, but it is not based only on the date! It takes the time into account as well when counting unique values.
Question: Considering the field holds an ISO date, can anyone tell me how to count the documents based only on the date (i.e. excluding the time)?
Replace your two $group stages with
{ $project: { day: { $dayOfMonth: '$CreatedDateTime' }, month: { $month: '$CreatedDateTime' }, year: { $year: '$CreatedDateTime' } } },
{ $group: { _id: { day: '$day', month: '$month', year: '$year' }, count: { $sum: 1 } } }
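Put together (a sketch assuming CreatedDateTime is stored as a BSON date, with a final $match for the particular day from the question):
collection.aggregate([
  { $project: {
      day: { $dayOfMonth: '$CreatedDateTime' },
      month: { $month: '$CreatedDateTime' },
      year: { $year: '$CreatedDateTime' }
  } },
  { $group: {
      _id: { day: '$day', month: '$month', year: '$year' },
      count: { $sum: 1 }
  } },
  // keep only the day you are interested in, e.g. 2013-03-04
  { $match: { '_id.year': 2013, '_id.month': 3, '_id.day': 4 } }
])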
You can read more about the date operators here: http://docs.mongodb.org/manual/reference/aggregation/#date-operators