How do we calculate a monthly average value in MongoDB?


I have a list of documents in Mongo, each containing the value of a metric for a service for a given day. The schema is as follows:
{
  _id: ObjectId("..."),
  name: '<some_metric_name>',
  service: '<service_name>',
  timestamp_unix: Long("1635033600000"),
  timestamp: '2021-10-24 03:00:00 +0300 EEST',
  value: 99.9810785241248
},
I would like to have a dashboard in Redash where I can see the calculated average of these values per month, grouped by service.
e.g. over a span of 4 months it should look like this:
| service      | date                     | avg    |
| my_svc       | 01-01-2022 to 31-01-2022 | 99.500 |
| my_svc       | 01-02-2022 to 28-02-2022 | 99.100 |
| my_svc       | 01-03-2022 to 31-03-2022 | 99.400 |
| my_svc       | 01-04-2022 to 30-04-2022 | 99.900 |
| my_svc_total | 01-01-2022 to 30-04-2022 | 99.475 |
| my_svc_2     | 01-01-2022 to 31-01-2022 | 99.150 |
| ...          | ...                      | ...    |
So I need a query that groups the documents by month and service and calculates the average for each group. So far I have this, which calculates an average, but per day rather than per month and service:
{
  "collection": "metrics",
  "aggregate": [
    {
      "$group": {
        "_id": {
          "$dateToString": {
            "format": "%Y-%m-%d",
            "date": {
              "$toDate": "$timestamp_unix"
            }
          }
        },
        "total_average": {
          "$avg": "$value"
        }
      }
    }
  ]
}
So how would I go about implementing this?

I think you want your data grouped by year-month and service, so you should put both conditions in your $group stage. Then you can get the average with the $avg accumulator, as you did.
{
  "collection": "metrics",
  "aggregate": [
    {
      "$group": {
        "_id": {
          "service": "$service",
          "month": {
            "$dateToString": {
              "format": "%Y-%m",
              "date": {
                "$toDate": "$timestamp_unix"
              }
            }
          }
        },
        "average": {
          "$avg": "$value"
        }
      }
    },
    {
      "$project": {
        "service": "$_id.service",
        "year-month": "$_id.month",
        "average": 1,
        "_id": 0
      }
    },
    { "$sort": { "service": 1, "year-month": 1 } }
  ]
}
This should give you the data you want.
If you also want the total average per service, I would suggest using a $facet stage with two branches: this same pipeline, and the same $group stage without the date condition.
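To sanity-check what these stages return, here is a minimal plain JavaScript sketch (it runs in Node, no MongoDB needed) that mimics the $group / $avg logic on a few hypothetical documents: one grouping keyed on service plus a UTC "YYYY-MM" string (what $dateToString with "%Y-%m" produces by default), and one keyed on service only, as the suggested $facet branch would be. One thing it makes visible: a $group on service alone averages all daily values, while the total in the question's table (99.475) is the mean of the four monthly averages, and the two differ when months contain different numbers of days. The helper names and sample data are assumptions for illustration.

```javascript
// Sketch only: mimics $group + $avg in memory. Helper names and data are hypothetical.

// Same key as { $dateToString: { format: "%Y-%m", date: { $toDate: "$timestamp_unix" } } } (UTC).
function monthKey(timestampUnix) {
  return new Date(timestampUnix).toISOString().slice(0, 7);
}

// Average `value` per group; keyFn(doc) plays the role of the $group _id.
function groupAverage(docs, keyFn) {
  const acc = new Map(); // key -> { total, count }
  for (const doc of docs) {
    const key = keyFn(doc);
    const entry = acc.get(key) || { total: 0, count: 0 };
    entry.total += doc.value;
    entry.count += 1;
    acc.set(key, entry);
  }
  const result = {};
  for (const [key, { total, count }] of acc) result[key] = total / count;
  return result;
}

// Hypothetical daily metrics for one service.
const docs = [
  { service: "my_svc", timestamp_unix: 1640995200000, value: 99 },  // 2022-01-01
  { service: "my_svc", timestamp_unix: 1641081600000, value: 100 }, // 2022-01-02
  { service: "my_svc", timestamp_unix: 1643673600000, value: 98 },  // 2022-02-01
];

// Like the $group on service + month: { "my_svc 2022-01": 99.5, "my_svc 2022-02": 98 }
const monthly = groupAverage(docs, (d) => `${d.service} ${monthKey(d.timestamp_unix)}`);

// Like the $facet branch without the date: { my_svc: 99 }, the mean of all daily values.
// The mean of the two monthly averages would instead be (99.5 + 98) / 2 = 98.75.
const totals = groupAverage(docs, (d) => d.service);
```

If the dashboard total must match the table (mean of monthly averages), a second $group over the monthly results, rather than over the raw documents, would be the thing to try.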

Related

MongoError: PlanExecutor error during aggregation

I have three records in MongoDB, but there could be many more. I'm getting shops by an ID coming from the frontend.
I need to get 20 records, group them by itemId and colorId, and get the counts for every shop. The number of shops can be 1, 2, 3, ... 10, etc.
This is the output I need:
+--------+----------+-------+-------+-------+
| itemId | colorId | shop1 | shop2 | shop3 |
+========+==========+=======+=======+=======+
| 1 | colorId1 | 5 | 0 | 3 |
+--------+----------+-------+-------+-------+
| 2 | colorId2 | 3 | 0 | 0 |
+--------+----------+-------+-------+-------+
| 3 | colorId2 | 0 | 3 | 0 |
+--------+----------+-------+-------+-------+
| 2 | colorId1 | 0 | 5 | 0 |
+--------+----------+-------+-------+-------+
| 3 | colorId1 | 0 | 0 | 5 |
+--------+----------+-------+-------+-------+
Here is my data and query; here shopId is a string, and it works fine.
But when I use this query on my local machine, I get this error:
MongoError: PlanExecutor error during aggregation :: caused by :: $arrayToObject requires an object with keys 'k' and 'v', where the value of 'k' must be of type string. Found type: objectId
In other words, as soon as I change shopId to an ObjectId, I get the error.
ObjectId version

Per your request in the comments (if I got it right):
db.collection.aggregate([
  {
    "$match": {} // <-- Highly recommended: use $match to filter the input, given the complexity of this query
  },
  {
    $group: {
      _id: 0,
      data: {
        $push: {
          shopId: "$shopId",
          shopItems: "$shopItems"
        }
      },
      shopIds: {
        "$push": {
          shopId: "$shopId",
          "count": 0
        }
      }
    }
  },
  {
    $unwind: "$data"
  },
  {
    $unwind: "$data.shopItems"
  },
  {
    $group: {
      _id: {
        itemId: "$data.shopItems.itemId",
        colorId: "$data.shopItems.colorId"
      },
      data: {
        $push: {
          shopId: "$data.shopId",
          count: "$data.shopItems.itemCount"
        }
      },
      existing: {
        $push: {
          shopId: "$data.shopId",
          "count": 0
        }
      },
      shopIds: {
        $first: "$shopIds"
      }
    }
  },
  {
    "$addFields": {
      "missing": {
        "$setDifference": [
          "$shopIds",
          "$existing"
        ]
      }
    }
  },
  {
    $project: {
      data: {
        $concatArrays: [
          "$data",
          "$missing"
        ]
      }
    }
  },
  {
    $unwind: "$data"
  },
  {
    $sort: {
      "data.shopId": 1
    }
  },
  {
    $group: {
      _id: "$_id",
      counts: { // here you can change this key
        $push: "$data"
      },
      totalCount: {
        $sum: "$data.count" // if you want it
      }
    }
  }
])
After the first $match, we $group everything into a single document in order to collect all the shopIds.
Next, we $unwind and $group by the keys you wanted: itemId and colorId. Then we add every shop with a count of 0 and remove the ones that already have an actual count for that item and color (the $setDifference between shopIds and existing). The last three steps are just for sorting, summing and formatting.
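The same pivot can be sketched in plain JavaScript to check the expected rows. This is a minimal sketch assuming the field names used in the pipeline above (shopId, shopItems, itemId, colorId, itemCount) and hypothetical sample data; it is not the asker's original query, which isn't shown. It also touches on the error in the question: the shop IDs become object keys, and keys must be strings, just as $arrayToObject requires 'k' to be a string. Converting the ObjectId with $toString (MongoDB 4.0+) before building the k/v pairs should address that error.

```javascript
// Sketch only: builds one row per (itemId, colorId) with a count column per shop,
// filling 0 for shops that have no entry for that pair (the "missing" step above).
function pivotCounts(shops) {
  // Keys must be strings; String() plays the role of $toString for ObjectIds.
  const shopIds = [...new Set(shops.map((s) => String(s.shopId)))].sort();
  const rows = new Map(); // "itemId|colorId" -> row object
  for (const shop of shops) {
    for (const item of shop.shopItems) {
      const key = `${item.itemId}|${item.colorId}`;
      if (!rows.has(key)) {
        const row = { itemId: item.itemId, colorId: item.colorId };
        for (const id of shopIds) row[id] = 0; // every shop starts at 0
        rows.set(key, row);
      }
      // Summed if a shop lists the same pair twice (a choice made for this sketch).
      rows.get(key)[String(shop.shopId)] += item.itemCount;
    }
  }
  return [...rows.values()];
}

// Hypothetical data reproducing three rows of the table in the question.
const shops = [
  { shopId: "shop1", shopItems: [
    { itemId: 1, colorId: "colorId1", itemCount: 5 },
    { itemId: 2, colorId: "colorId2", itemCount: 3 },
  ] },
  { shopId: "shop2", shopItems: [{ itemId: 2, colorId: "colorId1", itemCount: 5 }] },
  { shopId: "shop3", shopItems: [{ itemId: 1, colorId: "colorId1", itemCount: 3 }] },
];

const rows = pivotCounts(shops);
// rows[0] is { itemId: 1, colorId: "colorId1", shop1: 5, shop2: 0, shop3: 3 }
```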

How to search in MongoDB an element depending on the previous one?

I'm having to deal with a query that is kind of strange. I'm creating an app for boat tracking: I have a collection of documents, each with a timestamp and the ID of the port where the boat was at that moment.
After sorting all the documents in this collection by timestamp in descending order, I need to grab the first run of consecutive documents that share the same port ID.
For example:
timestamp | port_id
2021-11-10T23:00:00.000Z | 1
2021-11-10T22:00:00.000Z | 1
2021-11-10T21:00:00.000Z | 1
2021-11-10T20:00:00.000Z | 2
2021-11-10T19:00:00.000Z | 2
2021-11-10T18:00:00.000Z | 2
2021-11-10T17:00:00.000Z | 1
2021-11-10T16:00:00.000Z | 1
2021-11-10T15:00:00.000Z | 1
With this data (sorted by timestamp), I would have to grab the first 3 documents. Currently I'm grabbing 2000 documents and filtering them at the application level.
Another approach would be to grab the first element and then filter by its port ID, but that returns 6 elements, not the first 3.
Do you know any way to perform a query like this in Mongo? Thanks!

Use $setWindowFields (available since MongoDB 5.0, as is $shift):
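db.collection.aggregate([
  {
    // c = port_id of the previous (newer) document in timestamp order;
    // a constant partitionBy puts all documents in one partition
    $setWindowFields: {
      partitionBy: "",
      sortBy: { timestamp: -1 },
      output: {
        c: {
          $shift: {
            output: "$port_id",
            by: -1,
            default: "Not available"
          }
        }
      }
    }
  },
  {
    // c = 0 if the port is the same as in the previous document, 1 if it changed
    $set: {
      c: {
        $cond: {
          if: { $eq: [ "$port_id", "$c" ] },
          then: 0,
          else: 1
        }
      }
    }
  },
  {
    // Running sum of c: every document of the most recent stay ends up with c = 1
    $setWindowFields: {
      partitionBy: "",
      sortBy: { timestamp: -1 },
      output: {
        c: {
          $sum: "$c",
          window: { documents: [ "unbounded", "current" ] }
        }
      }
    }
  },
  {
    // Keep only the most recent stay
    $match: { c: 1 }
  },
  {
    $unset: "c"
  }
])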
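To see why the running sum isolates the most recent stay, here is a minimal plain JavaScript sketch of the same steps (previous port via $shift, a 0/1 change flag, a cumulative sum, then keep c = 1), run on the question's sample rows. The function name is hypothetical, and it only models the pipeline's logic in memory.

```javascript
// Sketch only: in-memory version of the $setWindowFields pipeline above.
function latestStay(docs) {
  // sortBy: { timestamp: -1 }
  const sorted = [...docs].sort((a, b) => b.timestamp - a.timestamp);
  let c = 0; // running sum (second $setWindowFields)
  return sorted.filter((doc, i) => {
    // $shift with by: -1 -> port_id of the previous (newer) document, or the default
    const previous = i === 0 ? "Not available" : sorted[i - 1].port_id;
    c += doc.port_id === previous ? 0 : 1; // $cond: 0 if same port, 1 if it changed
    return c === 1; // $match: { c: 1 }
  });
}

// The sample rows from the question: 23:00 down to 15:00 UTC on 2021-11-10.
const sample = [23, 22, 21, 20, 19, 18, 17, 16, 15].map((hour, i) => ({
  timestamp: new Date(Date.UTC(2021, 10, 10, hour)), // month index 10 = November
  port_id: i < 3 || i >= 6 ? 1 : 2,
}));

const stay = latestStay(sample); // the 23:00, 22:00 and 21:00 documents (port 1)
```

The later run at port 1 (17:00 to 15:00) gets c = 3, which is why filtering by port ID alone returns 6 documents while this approach returns 3.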

Aggregate counting logical values

I am not familiar with MongoDB aggregation. I have an array of objects containing, per subject, the score for each question, and I am using Node.js. If possible, I want the full calculation done with a Mongo query: the subject name with its total score, the count of attempted questions and the count of questions not attempted. My JSON array is as below:
{
  "examId": ObjectId("597367af7d8d3219d88c4341"),
  "questionId": ObjectId("597368207d8d3219d88c4342"),
  "questionNo": 1,
  "subject": "Reasoning Ability",
  "yourChoice": "A",
  "correctMark": "1",
  "attempt": true,
  "notAttempt": false
}
Here, one field holds the correct marks, and the subjects differ between documents. I want an output like:
|Subject Name | Total attempts | total not attempts | total score |
| A | 5 | 3 | 10 |
| B | 10 | 5 | 25 |
I have been trying with aggregation but haven't got there yet. I have tried this query:
db.examscores.aggregate([
  { $group: {
    _id: "$examId",
    score: { $sum: '$correctMark' },
    count: { $sum: 1 }
  }}
])
Does anyone have an idea how to achieve this type of output?
If there is another way to achieve this using Node, that would also be good.

I have solved this; here is my query:
[
  { $match: { subject: 'Reasoning Ability' } },
  {
    $group: {
      _id: { id: "$examId", subject: '$subject' },
      totalAttempt: { $sum: { $cond: [ "$attempt", 1, 0 ] } },
      totalNotAttempt: { $sum: { $cond: [ "$notAttempt", 1, 0 ] } },
      markedForReview: { $sum: { $cond: [ "$markedForReview", 1, 0 ] } },
      answerAndMarkedForReview: { $sum: { $cond: [ "$answerAndMarkedForReview", 1, 0 ] } },
      // correctMark is stored as a string ("1") and $sum ignores non-numeric values,
      // so convert it first ($toDouble requires MongoDB 4.0+)
      score: { $sum: { $toDouble: '$correctMark' } },
      count: { $sum: 1 }
    }
  }
]
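The counting logic, including the string-to-number conversion for correctMark, can be sketched in plain JavaScript as below, grouping by subject only to match the output table in the question. This is a minimal sketch with hypothetical sample answers and a hypothetical function name; the field names follow the question's document.

```javascript
// Sketch only: per-subject totals like the $group above, keyed on subject alone.
function subjectTotals(answers) {
  const bySubject = new Map();
  for (const a of answers) {
    const acc = bySubject.get(a.subject) ||
      { subject: a.subject, totalAttempt: 0, totalNotAttempt: 0, score: 0 };
    if (a.attempt) acc.totalAttempt += 1;       // { $sum: { $cond: ["$attempt", 1, 0] } }
    if (a.notAttempt) acc.totalNotAttempt += 1; // { $sum: { $cond: ["$notAttempt", 1, 0] } }
    // correctMark is a string such as "1"; convert it before adding.
    acc.score += Number(a.correctMark || 0);
    bySubject.set(a.subject, acc);
  }
  return [...bySubject.values()];
}

// Hypothetical answers (only the fields the sketch uses).
const answers = [
  { subject: "Reasoning Ability", correctMark: "1", attempt: true, notAttempt: false },
  { subject: "Reasoning Ability", correctMark: "0", attempt: true, notAttempt: false },
  { subject: "Reasoning Ability", correctMark: "0", attempt: false, notAttempt: true },
  { subject: "English Language", correctMark: "1", attempt: true, notAttempt: false },
];

const totals = subjectTotals(answers);
// [{ subject: "Reasoning Ability", totalAttempt: 2, totalNotAttempt: 1, score: 1 },
//  { subject: "English Language", totalAttempt: 1, totalNotAttempt: 0, score: 1 }]
```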

MongoDB - How to extract field with max value

I have a MongoDB collection genre_count:
user | genre | count
-----+---------------+-------
1 | Western | 2
1 | Adventure | 1
1 | Comedy | 5
2 | Western | 3
2 | Thriller | 1
2 | Romance | 2
I need to extract, for each user, the genre with the maximum count; e.g. for user 1, the genre with the maximum count is Comedy, with count 5. I tried a couple of ways:
db.genre_count.aggregate([
  {
    $group: {
      _id: {
        user: "$user",
        genre: "$genre"
      },
      max_val: {
        $max: "$count"
      }
    }
  }
])
I thought this would work, but because the group key includes the genre, it returned the count of each user for each genre, so basically it returned all the records.
Then I tried another solution, which worked partially:
db.genre_count.aggregate([
  {
    $group: {
      _id: {
        user: "$user"
      },
      max_val: {
        $max: "$count"
      }
    }
  }
])
But this only returned the maximum value, with no information about the genre that corresponds to it. Is there any way I can get the desired result?

To return the maximum count and the list of genres, use $max in your $group stage to get the maximum count for each group, and the $push accumulator to collect the genre name and count for each group.
From there, use the $map operator in your $project stage to return the list of genre names that match the maximum count. The $cond compares each genre's count with the maximum value and yields false for the ones that don't match, and $setDifference with [false] removes those placeholders.
db.genre_count.aggregate([
  { '$group': {
    '_id': '$user',
    'maxCount': { '$max': '$count' },
    'genres': {
      '$push': {
        'name': '$genre',
        'count': '$count'
      }
    }
  }},
  { '$project': {
    'maxCount': 1,
    'genres': {
      '$setDifference': [
        { '$map': {
          'input': '$genres',
          'as': 'genre',
          'in': {
            '$cond': [
              { '$eq': [ '$$genre.count', '$maxCount' ] },
              '$$genre.name',
              false
            ]
          }
        }},
        [false]
      ]
    }
  }}
])
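Because this approach compares every pushed genre with the maximum, all tied genres are kept. A minimal plain JavaScript sketch of that logic on the question's table, with one extra hypothetical user to show a tie (the function name is hypothetical):

```javascript
// Sketch only: $max + $push per user, then keep the names whose count equals maxCount.
function topGenres(rows) {
  const byUser = new Map(); // user -> { maxCount, genres: [{ name, count }] }
  for (const { user, genre, count } of rows) {
    const acc = byUser.get(user) || { maxCount: -Infinity, genres: [] };
    acc.maxCount = Math.max(acc.maxCount, count); // $max: "$count"
    acc.genres.push({ name: genre, count });      // $push: { name, count }
    byUser.set(user, acc);
  }
  return [...byUser].map(([user, { maxCount, genres }]) => ({
    _id: user,
    maxCount,
    // $map + $cond + $setDifference(..., [false]); in MongoDB the set operator
    // does not guarantee the order of the returned names
    genres: genres.filter((g) => g.count === maxCount).map((g) => g.name),
  }));
}

const genreCount = [
  { user: 1, genre: "Western", count: 2 },
  { user: 1, genre: "Adventure", count: 1 },
  { user: 1, genre: "Comedy", count: 5 },
  { user: 2, genre: "Western", count: 3 },
  { user: 2, genre: "Thriller", count: 1 },
  { user: 2, genre: "Romance", count: 2 },
  { user: 3, genre: "Drama", count: 4 },  // hypothetical tie
  { user: 3, genre: "Horror", count: 4 },
];

const top = topGenres(genreCount);
// user 1 -> ["Comedy"], user 2 -> ["Western"], user 3 -> ["Drama", "Horror"]
```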

I think you can use this aggregate:
db.genre_count.aggregate([
  {
    $sort: { user: 1, count: 1 }
  },
  {
    $group: {
      _id: "$user",
      maxCount: { $max: "$count" },
      genre: { $last: "$genre" }
    }
  }
])
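This alternative relies on the $sort: once each user's rows are in ascending count order, $last picks the genre with the highest count. It returns a single genre per user, so with ties the result depends on how equal counts happen to be ordered; adding a tie-breaker field to the $sort would make it deterministic. A plain JavaScript sketch of the idea (the function name is hypothetical):

```javascript
// Sketch only: $sort { user: 1, count: 1 } then $group with $max and $last.
function lastAfterSort(rows) {
  const sorted = [...rows].sort((a, b) => a.user - b.user || a.count - b.count);
  const byUser = new Map();
  for (const row of sorted) {
    const acc = byUser.get(row.user) || { _id: row.user, maxCount: -Infinity, genre: null };
    acc.maxCount = Math.max(acc.maxCount, row.count); // $max: "$count"
    acc.genre = row.genre; // $last: later rows overwrite, so the highest count wins
    byUser.set(row.user, acc);
  }
  return [...byUser.values()];
}

const genreRows = [
  { user: 1, genre: "Western", count: 2 },
  { user: 1, genre: "Adventure", count: 1 },
  { user: 1, genre: "Comedy", count: 5 },
  { user: 2, genre: "Western", count: 3 },
  { user: 2, genre: "Thriller", count: 1 },
  { user: 2, genre: "Romance", count: 2 },
];

const result = lastAfterSort(genreRows);
// [{ _id: 1, maxCount: 5, genre: "Comedy" }, { _id: 2, maxCount: 3, genre: "Western" }]
```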

Find last record of each day

I store data about my power consumption; there is a new record every minute. Here is an example:
{"date":1393156826114,"id":"5309d4cae4b0fbd904cc00e1","adco":"O","hchc":7267599,"hchp":10805900,"hhphc":"g","ptec":"c","iinst":13,"papp":3010,"imax":58,"optarif":"s","isousc":60,"motdetat":"Á"}
so I have around 1440 records a day.
How can I get the last record of each day?
Note: I use MongoDB from Spring (Java), so I need a query like this.
Example to get all measures:
@Query("{ 'date' : { $gt : ?0 }}")
public List<Mesure> findByDateGreaterThan(Date date, Sort sort);

A bit more modern than the original answer:
db.collection.aggregate([
  { "$sort": { "date": 1 } },
  { "$group": {
    "_id": {
      "$subtract": ["$date", { "$mod": ["$date", 86400000] }]
    },
    "doc": { "$last": "$$ROOT" }
  }},
  { "$replaceRoot": { "newDocument": "$doc" } }
])
The same principle applies: you essentially $sort the collection and then $group on the required grouping key, picking up the $last data from the grouping boundary.
What makes things a bit clearer since the original writing is that you can use $$ROOT instead of specifying every document property, and of course the $replaceRoot stage allows you to restore that data to its original document form.
But the general solution is still to $sort first, then $group on the common key that is required, and keep the $last or $first (depending on the sort order) from the grouping boundary for the properties that are required.
Also, for BSON Dates as opposed to a numeric timestamp as in the question, see "Group result by 15 minutes time interval in MongoDb" for different approaches to accumulating over time intervals while actually using and returning BSON Date values.
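The grouping key above is plain integer arithmetic: subtracting date mod 86400000 (the number of milliseconds in a day) truncates the timestamp to midnight UTC. Here is a minimal plain JavaScript sketch of the whole pipeline (sort, group on that key keeping the last document, then unwrap it), using the timestamp from the question plus two hypothetical readings around midnight; the function names are hypothetical.

```javascript
// Sketch only: in-memory version of $sort + $group ($last: "$$ROOT") + $replaceRoot.
const DAY_MS = 86400000;

// { $subtract: ["$date", { $mod: ["$date", 86400000] }] } -> midnight UTC of that day
function wholeDay(dateMs) {
  return dateMs - (dateMs % DAY_MS);
}

function lastPerDay(records) {
  const sorted = [...records].sort((a, b) => a.date - b.date); // { $sort: { date: 1 } }
  const byDay = new Map();
  for (const rec of sorted) byDay.set(wholeDay(rec.date), rec); // later records replace earlier ones
  return [...byDay.values()];
}

const readings = [
  { date: 1393200000000, papp: 1500 }, // 2014-02-24T00:00:00Z (hypothetical)
  { date: 1393156826114, papp: 3010 }, // 2014-02-23T12:00:26.114Z (from the question)
  { date: 1393199999000, papp: 2800 }, // 2014-02-23T23:59:59Z (hypothetical)
];

console.log(new Date(wholeDay(1393156826114)).toISOString()); // 2014-02-23T00:00:00.000Z
const last = lastPerDay(readings); // papp values: 2800 (Feb 23), 1500 (Feb 24)
```

Since the day boundary is UTC midnight, users in another timezone would see "days" shifted by their UTC offset.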
Not quite sure what you are going for here, but if my understanding is right you could do this with aggregate. So, to get the last record for each day:
db.collection.aggregate([
  // Sort in ascending date order
  { "$sort": { "date": 1 } },
  // Date math truncates each timestamp to midnight (UTC) of its day
  { "$project": {
    "adco": 1,
    "hchc": 1,
    "hchp": 1,
    "hhphc": 1,
    "ptec": 1,
    "iinst": 1,
    "papp": 1,
    "imax": 1,
    "optarif": 1,
    "isousc": 1,
    "motdetat": 1,
    "date": 1,
    "wholeDay": { "$subtract": ["$date", { "$mod": ["$date", 86400000] }] }
  }},
  // Group on wholeDay; because of the $sort above, $last takes the latest record of each day
  { "$group": {
    "_id": "$wholeDay",
    "docId": { "$last": "$_id" },
    "adco": { "$last": "$adco" },
    "hchc": { "$last": "$hchc" },
    "hchp": { "$last": "$hchp" },
    "hhphc": { "$last": "$hhphc" },
    "ptec": { "$last": "$ptec" },
    "iinst": { "$last": "$iinst" },
    "papp": { "$last": "$papp" },
    "imax": { "$last": "$imax" },
    "optarif": { "$last": "$optarif" },
    "isousc": { "$last": "$isousc" },
    "motdetat": { "$last": "$motdetat" },
    "date": { "$last": "$date" }
  }}
])
So the principle here is that, given the timestamp value, you do the date math to project it as midnight (UTC) at the beginning of each day. Then, since the documents were sorted by date in the first stage, simply group on the wholeDay value while pulling the $last document from the grouping boundary.
If you don't need all the fields then only project and group on the ones you want.
And yes, you can do this in the Spring Data framework; I'm sure there is a wrapped command in there. Otherwise, the incantation to get to the native command goes something like this:
mongoOps.getCollection("yourCollection").aggregate( ... )
For the record, if you actually had BSON date types rather than a timestamp as a number, then you can skip the date math:
db.collection.aggregate([
  // Sort first so that $last really is the latest record of each day
  { "$sort": { "date": 1 } },
  { "$group": {
    "_id": {
      "year": { "$year": "$date" },
      "month": { "$month": "$date" },
      "day": { "$dayOfMonth": "$date" }
    },
    "hchp": { "$last": "$hchp" }
  }}
])

It's also possible to format timestamps in the group key as %Y-%m-%d (e.g. 2021-12-05) with $dateToString:
// { timestamp: 1638697946000, value: "a" } <= 2021-12-05 09:52:26 UTC
// { timestamp: 1638686311000, value: "b" } <= 2021-12-05 06:38:31 UTC
// { timestamp: 1638859111000, value: "c" } <= 2021-12-07 06:38:31 UTC
db.collection.aggregate([
  { $sort: { timestamp: 1 } },
  // { timestamp: 1638686311000, value: "b" }
  // { timestamp: 1638697946000, value: "a" }
  // { timestamp: 1638859111000, value: "c" }
  { $group: {
    _id: { $dateToString: { date: { $toDate: "$timestamp" }, format: "%Y-%m-%d" } },
    last: { $last: "$$ROOT" }
  }},
  // { _id: "2021-12-07", last: { timestamp: 1638859111000, value: "c" } }
  // { _id: "2021-12-05", last: { timestamp: 1638697946000, value: "a" } }
  // (the order of $group output is not guaranteed; add a final $sort if it matters)
  { $replaceWith: "$last" }
])
// { timestamp: 1638697946000, value: "a" } <= 2021-12-05 09:52:26 UTC
// { timestamp: 1638859111000, value: "c" } <= 2021-12-07 06:38:31 UTC
This:
- first $sorts documents in chronological order of their timestamps, so that later on we can pick the newest documents based on their order;
- then $groups documents by their %Y-%m-%d-formatted timestamps:
  - by first converting the timestamp into a datetime: { $toDate: "$timestamp" }
  - and then converting that datetime into a string representing only the year, month and day: { $dateToString: { date: ..., format: "%Y-%m-%d" } }
  - such that for each group (i.e. date), we can pick the $last (i.e. newest, since chronologically sorted) matching document
  - and the pick is the whole document, as represented by $$ROOT;
- finally cleans up the group result with a $replaceWith stage (an alias for $replaceRoot).
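To check the worked example, here is a plain JavaScript sketch in which new Date(ms).toISOString().slice(0, 10) stands in for $dateToString with "%Y-%m-%d" (both use UTC here; $dateToString also accepts a timezone option if you need local days). The function name is hypothetical; the three documents are the ones from the answer.

```javascript
// Sketch only: $sort + $group on a "%Y-%m-%d" key + $last: "$$ROOT" + $replaceWith.
function lastPerDate(docs) {
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp); // { $sort: { timestamp: 1 } }
  const byDate = new Map();
  for (const doc of sorted) {
    const day = new Date(doc.timestamp).toISOString().slice(0, 10); // e.g. "2021-12-05"
    byDate.set(day, doc); // keeps the last (newest) document of each day
  }
  return [...byDate.values()];
}

const docsByDay = [
  { timestamp: 1638697946000, value: "a" },
  { timestamp: 1638686311000, value: "b" },
  { timestamp: 1638859111000, value: "c" },
];

const newest = lastPerDate(docsByDay); // values "a" (2021-12-05) and "c" (2021-12-07)
```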