I am logging data into MongoDB in the following format:
{ "_id" : ObjectId("54f2393f80b72b00079d1a53"), "outT" : 10.88, "inT3" : 22.3, "light" : 336, "humidity" : 41.4, "pressure" : 990.31, "inT1" : 22.81, "logtime" : ISODate("2015-02-28T21:55:11.838Z"), "inT2" : 21.5 }
{ "_id" : ObjectId("54f2394580b72b00079d1a54"), "outT" : 10.88, "inT3" : 22.3, "light" : 338, "humidity" : 41.4, "pressure" : 990.43, "inT1" : 22.75, "logtime" : ISODate("2015-02-28T21:55:17.690Z"), "inT2" : 311.72 }
...
As you can see, there is a single time element and multiple readings logged. I want to aggregate across all of the readings to provide the max, min and average of each variable, grouped by hour of day. I have managed to do this for a single variable using the following aggregation script:
db.logs.aggregate(
[
{
$match: {
logtime: {
$gte: ISODate("2015-03-01T00:00:00.000Z"),
$lt: ISODate("2015-03-03T00:00:00.000Z")
}
}
},
{
$project: {_id: 0, logtime: 1, outT: 1}
},
{
$group: {
_id: {
day: {$dayOfYear: "$logtime"},
hour: {$hour: "$logtime"}
},
average: {$avg: "$outT"},
max: {$max: "$outT"},
min:{$min: "$outT"}
}
}
]
)
which produces:
{ "_id" : { "day" : 61, "hour" : 22 }, "average" : 3.1878750000000116, "max" : 3.44, "min" : 3 }
{ "_id" : { "day" : 61, "hour" : 14 }, "average" : 13.979541666666638, "max" : 17.81, "min" : 8.81 }
...
I would like to produce output which looks like:
{"outT": { output from working aggregation above },
"inT1": { ... },
...
}
Everything I try seems to throw an error in the mongo console. Can anyone help?
Thanks
You can do this by including each statistic in your $group with a different name and then following that with a $project stage to reshape it into your desired format:
db.logs.aggregate([
{
$match: {
logtime: {
$gte: ISODate("2015-02-28T00:00:00.000Z"),
$lt: ISODate("2015-03-03T00:00:00.000Z")
}
}
},
{
$project: {_id: 0, logtime: 1, outT: 1, inT1: 1}
},
{
$group: {
_id: {
day: {$dayOfYear: "$logtime"},
hour: {$hour: "$logtime"}
},
outT_average: {$avg: "$outT"},
outT_max: {$max: "$outT"},
outT_min:{$min: "$outT"},
inT1_average: {$avg: "$inT1"},
inT1_max: {$max: "$inT1"},
inT1_min:{$min: "$inT1"}
}
},
{
$project: {
outT: {
average: '$outT_average',
max: '$outT_max',
min: '$outT_min'
},
inT1: {
average: '$inT1_average',
max: '$inT1_max',
min: '$inT1_min'
}
}
}
])
This gives you output that looks like:
{
"_id" : {
"day" : 59,
"hour" : 21
},
"outT" : {
"average" : 10.88,
"max" : 10.88,
"min" : 10.88
},
"inT1" : {
"average" : 22.78,
"max" : 22.81,
"min" : 22.75
}
}
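If you have many variables, writing every statistic out by hand gets repetitive. Here is a minimal sketch of generating the same two stages from a list of field names. The helper name buildStatsStages is my own (hypothetical), and the field list is just what appears in the sample documents:

```javascript
// Field names taken from the sample documents in the question.
const fields = ["outT", "inT1", "inT2", "inT3", "light", "humidity", "pressure"];

// Hypothetical helper: builds the $group and $project stages for a field list,
// using the same <field>_average / <field>_max / <field>_min naming as above.
function buildStatsStages(fieldNames) {
  const group = {
    _id: { day: { $dayOfYear: "$logtime" }, hour: { $hour: "$logtime" } }
  };
  const project = {};
  for (const f of fieldNames) {
    group[f + "_average"] = { $avg: "$" + f };
    group[f + "_max"] = { $max: "$" + f };
    group[f + "_min"] = { $min: "$" + f };
    // Reshape the flat statistics into one sub-document per field.
    project[f] = {
      average: "$" + f + "_average",
      max: "$" + f + "_max",
      min: "$" + f + "_min"
    };
  }
  return [{ $group: group }, { $project: project }];
}

const statsStages = buildStatsStages(fields);
// In the mongo shell, with matchStage holding your $match stage:
// db.logs.aggregate([matchStage].concat(statsStages))
```

The generated stages are plain objects, so you can print them and compare them with the hand-written pipeline before running anything.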
In a $group stage, $max returns the maximum of the corresponding values across the documents in each group, $min returns the minimum, and $avg returns the average. With _id: null, all documents reaching the stage form a single group.
See the MongoDB documentation for examples.
My documents are stored like this and no, i can't change them:
{
"_id" : ObjectId("5ea773f219d60c4f1629203a"),
"direction" : 135,
"latitude" : -3.744851,
"longitude" : -38.545571,
"metrictimestamp" : "20180201025959",
"odometer" : 55697826,
"routecode" : 0,
"speed" : 3,
"deviceid" : 134680,
"vehicleid" : 32040
}
I need to group by vehicleid and by the day of the year taken from "metrictimestamp", and count how many documents share the same vehicle and day. Any ideas?
Your metrictimestamp probably starts with the date as YYYYMMDD (20180201 here), so you can use $substrBytes in an aggregation to extract the year, month and day from the string. Try the query below:
db.collection.aggregate([
{
$addFields: {
day: { $toInt: { $substrBytes: [ "$metrictimestamp", 6, 2 ] } }, // $toInt can be optional
month: { $toInt: { $substrBytes: [ "$metrictimestamp", 4, 2 ] } },
year: { $toInt: { $substrBytes: [ "$metrictimestamp", 0, 4 ] } }
}
},
{
$group: {
_id: { vehicleid: "$vehicleid", day: "$day", month: "$month", year: "$year" }, // include month so the 1st of different months stays separate
count: { $sum: 1 }
}
}
])
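If you really need a day-of-year number rather than separate day and month values, here is a rough client-side sketch in plain JavaScript. It assumes metrictimestamp is always a "YYYYMMDDhhmmss" string; the helper names dayOfYear and countPerVehicleDay are hypothetical. It is mainly useful for sanity-checking the aggregation on a handful of documents, not for large collections:

```javascript
// Hypothetical helper: day of year (1-366) from a "YYYYMMDDhhmmss" string.
function dayOfYear(ts) {
  const year = Number(ts.slice(0, 4));
  const month = Number(ts.slice(4, 6));
  const day = Number(ts.slice(6, 8));
  // Date.UTC avoids local time zone and daylight-saving shifts.
  const start = Date.UTC(year, 0, 1);
  const current = Date.UTC(year, month - 1, day);
  return Math.round((current - start) / 86400000) + 1;
}

// Hypothetical helper: counts documents per (vehicleid, year, day of year).
// Keys look like "32040|2018|32".
function countPerVehicleDay(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const ts = doc.metrictimestamp;
    const key = [doc.vehicleid, ts.slice(0, 4), dayOfYear(ts)].join("|");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}
```

For the sample document, dayOfYear("20180201025959") gives 32, since January has 31 days.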
I've been trying to get my head around aggregation for a while now and I can't seem to work out how to find the average, min or max, of a sum of strings.
db.mycollectionname.aggregate([
{$unwind: "$Monitor"},
{$group: {_id: "$Monitor.Mon Type",
"Total": {$sum: 1}
}
}
])
I know it's not much, but this is as far as I've managed to get. It outputs this:
{ "_id" : "RM", "Total" : 21 }
{ "_id" : "PT", "Total" : 43 }
{ "_id" : "IM", "Total" : 24 }
{ "_id" : "IO", "Total" : 72 }
What I'm trying to do is get the min/max of these sum results and an average of all the results.
Any help or advice is appreciated, can't seem to find anything that helps me out.
Thank you
Add this stage as the last stage:
{$group :{_id:'', minimum :{$min: '$Total'}, maximum :{$max: '$Total'}, Total :{$sum : '$Total'}, average :{$avg : '$Total'}}}
So your query becomes:
db.mycollectionname.aggregate([
{ $unwind: "$Monitor" },
{
$group: {
_id: "$Monitor.Mon Type",
"Total": { $sum: 1 }
}
},
{
$group: {
_id: '', minimum: { $min: '$Total' }, maximum: { $max: '$Total' },
Total: { $sum: '$Total' }, average: { $avg: '$Total' }
}
}
])
When you use _id: '' or _id: null (any constant value), the $group stage puts every document that reaches it into a single group. After adding the final $group stage, the result should be:
/* 1 */
{
"_id" : "",
"minimum" : 21.0,
"maximum" : 72.0,
"Total" : 160.0,
"average" : 40.0
}
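To check those numbers by hand, here is the same arithmetic as a small plain JavaScript sketch over the four Total values from the question. The helper name summarize is hypothetical:

```javascript
// The "Total" values produced by the first $group stage in the question.
const totals = [21, 43, 24, 72];

// Hypothetical helper mirroring the final $group stage:
// $min, $max, $sum and $avg over a single group.
function summarize(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  return {
    minimum: Math.min(...values),
    maximum: Math.max(...values),
    Total: total,
    average: total / values.length
  };
}

const summary = summarize(totals);
// summary is { minimum: 21, maximum: 72, Total: 160, average: 40 }
```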
I am trying to build a dashboard chart in MongoDB Atlas.
The table should show the date on the x-axis and the _id on the y-axis.
The values should be the difference in count from the previous date.
I have a collection with data points such as:
_id: "someName"
timestamp: 2019-09-05T06:24:24.689+00:00
count: 50
_id: "someName"
timestamp: 2019-09-04T06:24:24.689+00:00
count: 40
...
The goal is to get the difference between each count and the count of the previous data point with the same name:
_id: "someName"
timestamp: 2019-09-05T06:24:24.689+00:00
count: 50
difference: 10
_id: "someName"
timestamp: 2019-09-04T06:24:24.689+00:00
count: 40
difference: 17
...
That way I could make a table listing the differences.
So far I have created an aggregation pipeline:
[
{$sort: {
"timestamp": -1
}},
{$group: {
_id: "$_id",
count: {
$push: { count: "$count", timestamp: "$timestamp" }
}
}},
{$project: {
_id: "$_id",
count: "$count",
countBefore: { $slice: [ "$count", 1, { $size: "$count" } ] }
}}
]
I was hoping to subtract count and countBefore so that I get an array with the data points and the difference...
So I tried to follow with:
{$project: {
countDifference: {
$map: {
input: "$countBefore",
as: "before",
in: {
$subtract: ["$$before.count", "$count.count"]
/*"$count.count" seems to be the problem, since an integer works*/
}
}
}
}
}
Mongo Atlas only shows "An unknown error occurred"
I would be glad for some advice :)
The following query can get us the expected output:
db.collection.aggregate([
{
$sort:{
"timestamp":1
}
},
{
$group:{
"_id":"$id",
"counts":{
$push:"$count"
}
}
},
{
$project:{
"differences":{
$reduce:{
"input":"$counts",
"initialValue":{
"values":[],
"lastValue":0
},
"in":{
"values":{
$concatArrays:[
"$$value.values",
[
{
$subtract:["$$this","$$value.lastValue"]
}
]
]
},
"lastValue":"$$this"
}
}
}
}
},
{
$project:{
"_id":0,
"id":"$_id",
"plots":"$differences.values"
}
}
]).pretty()
Data Set:
{
"_id" : ObjectId("5d724550ef5e6630fde5b71e"),
"id" : "someName",
"timestamp" : "2019-09-05T06:24:24.689+00:00",
"count" : 50
}
{
"_id" : ObjectId("5d724550ef5e6630fde5b71f"),
"id" : "someName",
"timestamp" : "2019-09-04T06:24:24.689+00:00",
"count" : 40
}
{
"_id" : ObjectId("5d724796ef5e6630fde5b720"),
"id" : "someName",
"timestamp" : "2019-09-06T06:24:24.689+00:00",
"count" : 61
}
{
"_id" : ObjectId("5d724796ef5e6630fde5b721"),
"id" : "someName",
"timestamp" : "2019-09-07T06:24:24.689+00:00",
"count" : 72
}
{
"_id" : ObjectId("5d724796ef5e6630fde5b722"),
"id" : "someName",
"timestamp" : "2019-09-08T06:24:24.689+00:00",
"count" : 93
}
{
"_id" : ObjectId("5d724796ef5e6630fde5b723"),
"id" : "someName",
"timestamp" : "2019-09-09T06:24:24.689+00:00",
"count" : 100
}
Output:
{ "id" : "someName", "plots" : [ 40, 10, 11, 11, 21, 7 ] }
Explanation: We sort by timestamp, push the count values for the same id into the counts array, and then apply $reduce to it to build a new array in which each element is the difference between the current and previous values of the counts array. For the very first value, the previous value is taken as zero.
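If the $reduce stage is hard to follow, here is the same accumulator logic as a plain JavaScript sketch. The function name differences is hypothetical; the input is the counts from the data set above, already sorted by timestamp:

```javascript
// Hypothetical helper mirroring the $reduce stage: the accumulator carries the
// output array ("values") and the previously seen count ("lastValue").
function differences(counts) {
  return counts.reduce(
    (acc, current) => ({
      values: acc.values.concat([current - acc.lastValue]),
      lastValue: current
    }),
    { values: [], lastValue: 0 } // same initialValue as in the pipeline
  ).values;
}

// Counts from the data set, in ascending timestamp order.
const plots = differences([40, 50, 61, 72, 93, 100]);
// plots is [40, 10, 11, 11, 21, 7]
```

Note that the first element is the first count itself, because the starting lastValue is 0.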
My MongoDB data contains some "duplicate" documents: they match on every field except _id and the last field, time, and their time values fall within the same minute.
For a small database, I can remove the duplicates with the code below:
var cursor = db.getCollection("light").aggregate([
{$group : {
"_id": {
index: "$index",
unit: "$unit",
min: "$min",
max: "$max",
node: "$node",
year: { "$year": "$time" },
dayOfYear: { "$dayOfYear": "$time" },
hour: { "$hour": "$time" },
minute: { "$minute": "$time" }
},
_id_not_delete: { $last: "$_id" }
}}
],
{
"allowDiskUse" : true
}
)
var ids_not_delete = cursor.map(function (doc) { return doc._id_not_delete; });
db.getCollection("light").remove({"_id": { "$nin": ids_not_delete }});
But my database has more than 20 million records, so I get this error:
E QUERY [js] Error: Converting from JavaScript to BSON failed: Object size 23146644 exceeds limit of 16793600 bytes. :
Bulk/addToOperationsList#src/mongo/shell/bulk_api.js:611:28
Bulk/findOperations.remove#src/mongo/shell/bulk_api.js:743:24
DBCollection.prototype.remove#src/mongo/shell/collection.js:404:13
#(shell):1:1
I know that the root cause is:
The maximum BSON document size is 16 megabytes
I think I should change the code below, but I don't have a good solution.
var ids_not_delete = cursor.map(function (doc) { return doc._id_not_delete; });
Do you have any ideas to optimize my code?
Example documents in the collection:
{
"_id" : ObjectId("5be22d5808c08300545effee"),
"index" : "LIGHT",
"unit" : "LUX",
"min" : NumberInt(5),
"max" : NumberInt(6),
"avg" : 5.5,
"node" : "TH",
"time" : ISODate("2018-11-07T00:10:00.091+0000")
},
{
"_id" : ObjectId("5be22b0052122e0047c3467c"),
"index" : "LIGHT",
"unit" : "LUX",
"min" : NumberInt(3),
"max" : NumberInt(5),
"avg" : NumberInt(4),
"node" : "TH",
"time" : ISODate("2018-11-07T00:00:00.204+0000")
},
{
"_id" : ObjectId("5be22b0008c08300545eff79"),
"index" : "LIGHT",
"unit" : "LUX",
"min" : NumberInt(3),
"max" : NumberInt(5),
"avg" : NumberInt(4),
"node" : "TH",
"time" : ISODate("2018-11-07T00:00:00.081+0000")
}
MongoDB shell version v4.0.2
MongoDB 4.0.0
You can invert your aggregation to select ids you want to delete, rather than ones you want to keep:
const toDelete = db.getCollection("light").aggregate([
{ $group : {
"_id": {
index: "$index",
unit: "$unit",
min: "$min",
max: "$max",
node: "$node",
year: { "$year": "$time" },
dayOfYear: { "$dayOfYear": "$time" },
hour: { "$hour": "$time" },
minute: { "$minute": "$time" }
},
ids: {$push: "$_id"}
} },
{$project: {_id: {$slice: ["$ids", 1, 10000]}}},
{$unwind: "$_id"},
{$project: {_id: 0, deleteOne: { "filter" : { "_id" : "$_id"} } } }
]).toArray()
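$slice skips the first _id in each group (the document to keep) and returns up to 10,000 of the remaining ones. The 10,000 here is an arbitrary number, chosen to be much larger than the expected number of duplicates within a group.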
Then you can use bulkWrite:
db.getCollection("light").bulkWrite(toDelete);
The driver will split the array into batches of 100,000 deletions each.
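As a variation, you could send fewer, larger operations by grouping the ids into deleteMany filters with $in instead of one deleteOne per id. This is only a sketch under assumptions: it expects a plain array of _id values (for example, from ending the pipeline at the $unwind stage and mapping each result to its _id), the helper name buildDeleteBatches is hypothetical, and the batch size of 1000 is arbitrary:

```javascript
// Hypothetical helper: turns a flat array of ids into bulkWrite operations,
// each deleting up to batchSize documents with a single $in filter.
function buildDeleteBatches(ids, batchSize) {
  const ops = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    ops.push({
      deleteMany: { filter: { _id: { $in: ids.slice(i, i + batchSize) } } }
    });
  }
  return ops;
}

// In the mongo shell, with ids holding the _id values to delete:
// db.getCollection("light").bulkWrite(buildDeleteBatches(ids, 1000));
```

Keeping each $in list small keeps every individual operation well under the 16 MB BSON limit mentioned in the question.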
I'm working with a collection called "submissions" that includes mission data for a game. Every time you submit your answer to the question/mission a document is saved to the submissions collection.
A typical document looks like this:
{
"_id" : ObjectId("5b0c99dffea598002fdb6ec9"),
"status" : "Accepted",
"missionAcceptanceDate" : NumberInt(1527526261),
"submissionDate" : NumberInt(1527552495),
"location" : "",
"approvalDate" : null,
"mission_id" : ObjectId("5b0c5b10fea598002fdb6ec6"),
"user_id" : ObjectId("5b0c99d0fea598002fdb6ec7")
}
I need to build a query that can determine the following:
Counts the total submissions per grouping by status between a date range,
Counts the Total number of submissions by mission_id per each grouping.
Below is what I have so far ("cutting out some info..."):
QUERY:
db.submissions.aggregate(
{ $match: {
"submissionDate": {
"$gte": NumberInt(1527901260), //Fri, 1 Jun 2018 18:00:01 PDT
"$lte": NumberInt(1530831600) //Thu, 5 Jul 2018 16:00:00 PDT
}
}
},
{$project:
{_id:1,
status: 1,
missionAcceptanceDate: 1,
submissionDate: 1,
approvalDate: 1,
user_id: 1,
mission_id: 1
}
},
{$group: {
_id: { status: "$status" },
statusTotal: { $sum: 1 },
mission_id: { $addToSet: "$mission_id" },
}
}
)
RESULT:
{
"_id" : {
"status" : "Rejected"
},
"statusTotal" : 16.0,
"mission_id" : [
ObjectId("5b0edbfafea598002fdb6f3a"),
ObjectId("5b0f0131fea598002fdb6f43"),
ObjectId("5b0eded6fea598002fdb6f3d"),
...
]
}
{
"_id" : {
"status" : "Approved"
},
"statusTotal" : 592.0,
"mission_id" : [
ObjectId("5b391cf4e177c700308dd0ef"),
ObjectId("5b36a0d172b5240030d304be"),
ObjectId("5b3558276a0f950030db1732"),
...
]
}
This is great, but I also need to know the total number of times each "mission_id" was "Rejected" or "Approved". The desired outcome should look something like the following:
{
"_id" : {
"status" : "Rejected"
},
"statusTotal" : 16.0,
"mission_id" : [
...
{ mission_id: ObjectId("5b0edbfafea598002fdb6f3a"), count: 3 },
{ mission_id: ObjectId("5b0f0131fea598002fdb6f43"), count: 5},
{ mission_id: ObjectId("5b0eded6fea598002fdb6f3d"), count: 2 }
...
]
}
{
"_id" : {
"status" : "Approved"
},
"statusTotal" : 592.0,
"missionData" : [
{ mission_id: ObjectId("5b391cf4e177c700308dd0ef"), count: 23 },
{ mission_id: ObjectId("5b36a0d172b5240030d304be"), count: 45},
{ mission_id: ObjectId("5b3558276a0f950030db1732"), count: 15 }
...
]
}
I tried using $size, $sum, $count, even using a second $group with {$sum:1} for each mission_id, but just can't figure it out. Any help with this would be so much appreciated. Thank you in advance!
If I understand your question correctly, you need an extra $group stage before your existing one: first count per status and mission_id pair, then roll those counts up per status.
Something like:
db.submissions.aggregate([
{"$match":{"submissionDate":{"$gte":NumberInt(1527901260),"$lte":NumberInt(1530831600)}}},
{"$group":{
"_id":{"status":"$status","mission_id":"$mission_id"},
"missionTotal":{"$sum":1}
}},
{"$group":{
"_id":"$_id.status",
"statusTotal":{"$sum":"$missionTotal"},
"mission_id":{"$push":{"mission_id":"$_id.mission_id","count":"$missionTotal"}}
}}
])
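To see what the two $group stages compute, here is a small in-memory sketch of the same two-level grouping in plain JavaScript. The function name groupByStatusAndMission is hypothetical, and it assumes mission ids are simple values such as strings (in the real collection they are ObjectIds):

```javascript
// Hypothetical helper mirroring the two $group stages on an in-memory array.
function groupByStatusAndMission(docs) {
  // First level: count documents per (status, mission_id) pair.
  const perStatus = new Map(); // status -> Map(mission_id -> count)
  for (const d of docs) {
    if (!perStatus.has(d.status)) perStatus.set(d.status, new Map());
    const missions = perStatus.get(d.status);
    missions.set(d.mission_id, (missions.get(d.mission_id) || 0) + 1);
  }
  // Second level: roll the per-mission counts up into one result per status.
  return Array.from(perStatus, ([status, missions]) => {
    const missionList = Array.from(missions, ([id, count]) => ({ mission_id: id, count: count }));
    return {
      _id: status,
      statusTotal: missionList.reduce((sum, m) => sum + m.count, 0),
      mission_id: missionList
    };
  });
}
```

Each statusTotal is the sum of the per-mission counts, which is why the second stage in the pipeline uses $sum over "$missionTotal" rather than $sum: 1.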