Sum the different grades by date in MongoDB [duplicate]

This question already has an answer here:
Mongodb count distinct with multiple group fields
I'm using the restaurants dataset from the MongoDB website. A document has arrays like the following:
{
"grades" : [
{
"date" : ISODate("2014-06-10T00:00:00.000Z"),
"grade" : "A"
},
{
"date" : ISODate("2013-06-05T00:00:00.000Z"),
"grade" : "B",
"score" : 7
},
{
"date" : ISODate("2012-04-13T00:00:00.000Z"),
"grade" : "A"
},
{
"date" : ISODate("2011-10-12T00:00:00.000Z"),
"grade" : "A"
}
]
}
I'm trying to get a list of all dates, with a count of how many of each grade there was on that day.
I've got this far:
db.restaurants.aggregate([{
$unwind : {
path: '$grades'
}
}, {
$group: {
_id: '$grades.date',
grades: {
$push: '$grades.grade'
}
}
}])
Which gives me each date and the grades on that date.
How do I now count the number of each unique grade?

Figured it out thanks to this question.
The solution is actually much simpler than I was thinking: group on both the date and the grade, so that each (date, grade) pair gets its own count:
db.restaurants.aggregate([{
$unwind : {
path: '$grades'
}
}, {
$group: {
_id: {
date: '$grades.date',
grade: '$grades.grade'
},
count: {
$sum: 1
}
}
}])
This gives a result like:
/* 1 */
{
"_id" : {
"date" : ISODate("2014-06-23T00:00:00.000Z"),
"grade" : "C"
},
"count" : 4
}
/* 2 */
{
"_id" : {
"date" : ISODate("2011-11-01T00:00:00.000Z"),
"grade" : "C"
},
"count" : 3
}
/* 3 */
{
"_id" : {
"date" : ISODate("2014-05-06T00:00:00.000Z"),
"grade" : "A"
},
"count" : 121
}
/* 4 */
{
"_id" : {
"date" : ISODate("2012-08-21T00:00:00.000Z"),
"grade" : "C"
},
"count" : 5
}
/* 5 */
{
"_id" : {
"date" : ISODate("2013-09-04T00:00:00.000Z"),
"grade" : "C"
},
"count" : 4
}
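To get one document per date with all of its grade counts together, the (date, grade) groups can be grouped once more by date. The sketch below shows that idea twice: as an extra $group stage (not executed here) and as a plain JavaScript function that mimics the pipeline in memory. The sample documents, the string dates and the names gradeCountsByDate and counts are assumptions made for illustration only.

```javascript
// Pipeline sketch: the second $group collects the per-grade counts of each date.
const pipeline = [
  { $unwind: "$grades" },
  { $group: { _id: { date: "$grades.date", grade: "$grades.grade" }, count: { $sum: 1 } } },
  { $group: { _id: "$_id.date", grades: { $push: { grade: "$_id.grade", count: "$count" } } } },
];

// Hypothetical sample data (dates kept as strings for brevity).
const restaurants = [
  { grades: [{ date: "2014-06-10", grade: "A" }, { date: "2013-06-05", grade: "B" }] },
  { grades: [{ date: "2014-06-10", grade: "A" }, { date: "2014-06-10", grade: "C" }] },
];

// In-memory equivalent of the same idea: count every (date, grade) pair.
function gradeCountsByDate(docs) {
  const byDate = new Map();
  for (const doc of docs) {
    for (const { date, grade } of doc.grades || []) { // plays the role of $unwind
      if (!byDate.has(date)) byDate.set(date, {});
      const counts = byDate.get(date);
      counts[grade] = (counts[grade] || 0) + 1; // plays the role of $sum: 1
    }
  }
  return [...byDate].map(([date, counts]) => ({ _id: date, counts }));
}

const gradeCounts = gradeCountsByDate(restaurants);
console.log(JSON.stringify(gradeCounts));
```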

Related

Mongo aggregation - Sorting using a field value from previous pipeline as the sort field

I have produced the output below using a MongoDB aggregation (the levelsCount field is built by a $group stage):
{
"_id" : "1",
"name" : "First",
"levelsCount" : [
{ "_id" : "level_One", "levelNum" : 1, "count" : 1 },
{ "_id" : "level_Three", "levelNum" : 3, "count" : 1 },
{ "_id" : "level_Four", "levelNum" : 4, "count" : 8 }
]
}
{
"_id" : "2",
"name" : "Second",
"levelsCount" : [
{ "_id" : "level_One", "levelNum" : 1, "count" : 5 },
{ "_id" : "level_Two", "levelNum" : 2, "count" : 2 },
{ "_id" : "level_Three", "levelNum" : 3, "count" : 1 },
{ "_id" : "level_Four", "levelNum" : 4, "count" : 3 }
]
}
{
"_id" : "3",
"name" : "Third",
"levelsCount" : [
{ "_id" : "level_One", "levelNum" : 1, "count" : 1 },
{ "_id" : "level_Two", "levelNum" : 2, "count" : 3 },
{ "_id" : "level_Three", "levelNum" : 3, "count" : 2 },
{ "_id" : "level_Four", "levelNum" : 4, "count" : 3 }
]
}
Now I need to sort these documents based on the levelNum and count fields of the levelsCount array elements. That is, if two documents both have a count of 5 for levelNum: 1 (level_One), the sort should go on to compare the counts for levelNum: 2 (level_Two), and so on.
I see how the $sort stage works on multiple fields (something like { $sort : { level_One : 1, level_Two: 1 } }), but the problem is how to access the levelNum value of each array element and use it as a field name to sort on. (I couldn't manage it even after $unwinding the levelsCount array.)
P.S.: The initial order of the levelsCount array's elements may differ between documents and is not important.
Edit:
The expected output of the above structure would be:
// Sorted result:
{
"_id" : "2",
"name" : "Second",
"levelsCount" : [
{ "_id" : "level_One", "levelNum" : 1, "count" : 5 }, // "level_One's count: 5" is greater than "level_One's count: 1" in two other documents, regardless of other level_* fields. Therefore this whole document with "name: Second" is ordered first.
{ "_id" : "level_Two", "levelNum" : 2, "count" : 2 },
{ "_id" : "level_Three", "levelNum" : 3, "count" : 1 },
{ "_id" : "level_Four", "levelNum" : 4, "count" : 3 }
]
}
{
"_id" : "3",
"name" : "Third",
"levelsCount" : [
{ "_id" : "level_One", "levelNum" : 1, "count" : 1 },
{ "_id" : "level_Two", "levelNum" : 2, "count" : 3 }, // "level_Two" exists in this document with count 3, while it does not exist in the document below, which means a count of 0 there. So this document with "name: Third" is ordered higher than the document below.
{ "_id" : "level_Three", "levelNum" : 3, "count" : 2 },
{ "_id" : "level_Four", "levelNum" : 4, "count" : 3 }
]
}
{
"_id" : "1",
"name" : "First",
"levelsCount" : [
{ "_id" : "level_One", "levelNum" : 1, "count" : 1 },
{ "_id" : "level_Three", "levelNum" : 3, "count" : 1 },
{ "_id" : "level_Four", "levelNum" : 4, "count" : 8 }
]
}
Of course, I'd prefer to have the output documents in the format below, but the first problem is to sort all the docs:
{
"_id" : "1",
"name" : "First",
"levelsCount" : [
{ "level_One" : 1 },
{ "level_Three" : 1 },
{ "level_Four" : 8 }
]
}
You can sort by levelNum in descending order and by count in ascending order:
db.collection.aggregate([
{
$sort: {
"levelsCount.levelNum": -1,
"levelsCount.count": 1
}
}
])
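One caveat worth checking: when $sort is given a field inside an array, MongoDB compares documents by a single element of that array (the smallest for an ascending sort, the largest for a descending one) rather than level by level. The comparison described in the question can be sketched in plain JavaScript as follows. This is only an in-memory illustration, not an aggregation stage; MAX_LEVEL, sortKey and compareDesc are assumed names, and a missing level is treated as count 0, as the question specifies.

```javascript
const docs = [
  { _id: "1", name: "First", levelsCount: [
    { _id: "level_One", levelNum: 1, count: 1 },
    { _id: "level_Three", levelNum: 3, count: 1 },
    { _id: "level_Four", levelNum: 4, count: 8 },
  ] },
  { _id: "2", name: "Second", levelsCount: [
    { _id: "level_One", levelNum: 1, count: 5 },
    { _id: "level_Two", levelNum: 2, count: 2 },
    { _id: "level_Three", levelNum: 3, count: 1 },
    { _id: "level_Four", levelNum: 4, count: 3 },
  ] },
  { _id: "3", name: "Third", levelsCount: [
    { _id: "level_One", levelNum: 1, count: 1 },
    { _id: "level_Two", levelNum: 2, count: 3 },
    { _id: "level_Three", levelNum: 3, count: 2 },
    { _id: "level_Four", levelNum: 4, count: 3 },
  ] },
];

const MAX_LEVEL = 4; // assumption: levels are numbered 1..4, as in the sample

// Build [countLevel1, countLevel2, ...]; levels that are absent stay at 0.
function sortKey(doc) {
  const key = new Array(MAX_LEVEL).fill(0);
  for (const { levelNum, count } of doc.levelsCount) key[levelNum - 1] = count;
  return key;
}

// Compare level by level, higher count first; move on only when counts tie.
function compareDesc(a, b) {
  const ka = sortKey(a);
  const kb = sortKey(b);
  for (let i = 0; i < MAX_LEVEL; i++) {
    if (ka[i] !== kb[i]) return kb[i] - ka[i];
  }
  return 0;
}

const sorted = [...docs].sort(compareDesc);
console.log(sorted.map((d) => d.name));
```

With the sample documents this yields the order given in the question's expected output: Second, Third, First.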
To get the levelsCount array in key-value format:
$map iterates over the levelsCount array
each element is turned into a one-entry key-value array and converted to an object with $arrayToObject
{
$addFields: {
levelsCount: {
$map: {
input: "$levelsCount",
in: {
$arrayToObject: [
[{ k: "$$this._id", v: "$$this.count" }]
]
}
}
}
}
}
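For readers who find the $map / $arrayToObject combination hard to picture, here is a rough plain-JavaScript analogue applied to the first sample document. It assumes that the value wanted for each key is count, which is what the desired output in the question shows; the variable name keyValue is made up for the example.

```javascript
const levelsCount = [
  { _id: "level_One", levelNum: 1, count: 1 },
  { _id: "level_Three", levelNum: 3, count: 1 },
  { _id: "level_Four", levelNum: 4, count: 8 },
];

// Array.prototype.map plays the role of $map; the computed property name
// { [_id]: count } plays the role of $arrayToObject on [{ k: _id, v: count }].
const keyValue = levelsCount.map(({ _id, count }) => ({ [_id]: count }));

console.log(JSON.stringify(keyValue));
```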

How can I extract whole documents based on how they compare with their whole collection?

I'm trying to extract the latest available daily measurements from a "sparse" collection that might not have a measurement for every day. I'm interested in getting the whole original document as output. The collection contains several series of measurements identified by a unique id.
For example, given the following collection:
{ "date" : "2019-04-10", "id" : 1, "measurement" : 50 }
{ "date" : "2019-04-10", "id" : 2, "measurement" : 1 }
{ "date" : "2019-04-10", "id" : 3, "measurement" : 33 }
{ "date" : "2019-04-11", "id" : 1, "measurement" : 52 }
{ "date" : "2019-04-11", "id" : 3, "measurement" : 3 }
{ "date" : "2019-04-12", "id" : 1, "measurement" : 55 }
{ "date" : "2019-04-12", "id" : 2, "measurement" : 12 }
The above collection contains measurements for 3 ids. I'd like to retrieve the latest measurements for each id.
For example, the above collection should yield the following result:
{ "date" : "2019-04-12", "id" : 1, "measurement" : 55 }
{ "date" : "2019-04-12", "id" : 2, "measurement" : 12 }
{ "date" : "2019-04-11", "id" : 3, "measurement" : 3 }
So far, I'm able to extract the latest date for every id with this:
db.control_subs.aggregate([ { $group : { _id : "$id", "last_date" : { $max : "$date" } } }, { $sort:{ "_id": 1 }} ])
But this, unfortunately, strips the actual measurement field from the output.
How could I obtain the desired output with a single MongoDB query?
You can try the aggregation below, which uses the $$ROOT system variable to keep the whole document:
db.control_subs.aggregate([
{
"$project":
{
"id": "$id",
"date": "$date",
"document": "$$ROOT" // save all fields for future usage
}
},
{
"$sort":
{ "date": -1
}
},
{
"$group":
{
"_id":{"id":"$id"},
"original_doc":{"$first":"$document"}
}
},
{
$project:
{
"original_doc.date":1, "original_doc.id":1, "original_doc.measurement":1, _id:0}
}
])
Output of above aggregation is
{ "original_doc" : { "date" : "2019-04-11", "id" : 3, "measurement" : 3 } }
{ "original_doc" : { "date" : "2019-04-12", "id" : 2, "measurement" : 12 } }
{ "original_doc" : { "date" : "2019-04-12", "id" : 1, "measurement" : 55 } }
You can also promote original_doc to the top level of the result with $replaceRoot.
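A minimal sketch of the $replaceRoot variant follows. The pipeline is shown as a plain object and is not executed here; the field name latest and the function latestPerId are assumptions. The function underneath reproduces the same "latest document per id" logic in memory on the sample data, relying on the fact that the dates are ISO-formatted strings and therefore compare correctly as text.

```javascript
// Pipeline sketch: sort newest first, keep the first whole document per id,
// then make that saved document the root of the output.
const pipeline = [
  { $sort: { date: -1 } },
  { $group: { _id: "$id", latest: { $first: "$$ROOT" } } },
  { $replaceRoot: { newRoot: "$latest" } },
  { $sort: { id: 1 } },
];

const measurements = [
  { date: "2019-04-10", id: 1, measurement: 50 },
  { date: "2019-04-10", id: 2, measurement: 1 },
  { date: "2019-04-10", id: 3, measurement: 33 },
  { date: "2019-04-11", id: 1, measurement: 52 },
  { date: "2019-04-11", id: 3, measurement: 3 },
  { date: "2019-04-12", id: 1, measurement: 55 },
  { date: "2019-04-12", id: 2, measurement: 12 },
];

// In-memory equivalent: remember the document with the greatest date per id.
function latestPerId(docs) {
  const latest = new Map();
  for (const doc of docs) {
    const seen = latest.get(doc.id);
    if (!seen || doc.date > seen.date) latest.set(doc.id, doc);
  }
  return [...latest.values()].sort((a, b) => a.id - b.id);
}

const latestDocs = latestPerId(measurements);
console.log(JSON.stringify(latestDocs));
```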

Find sum of fields inside array in MongoDB

I have data as follows:
> db.PQRCorp.find().pretty()
{
"_id" : 0,
"name" : "Ancy",
"results" : [
{
"evaluation" : "term1",
"score" : 1.463179736705023
},
{
"evaluation" : "term2",
"score" : 11.78273309957772
},
{
"evaluation" : "term3",
"score" : 6.676176060654615
}
]
}
{
"_id" : 1,
"name" : "Mark",
"results" : [
{
"evaluation" : "term1",
"score" : 5.89772766299929
},
{
"evaluation" : "term2",
"score" : 12.7726680028769
},
{
"evaluation" : "term3",
"score" : 2.78092882672992
}
]
}
{
"_id" : 2,
"name" : "Jeff",
"results" : [
{
"evaluation" : "term1",
"score" : 36.78917882992872
},
{
"evaluation" : "term2",
"score" : 2.883687879200287
},
{
"evaluation" : "term3",
"score" : 9.882668212003763
}
]
}
What I want to achieve is: find the employees who failed in aggregate (term1 + term2 + term3).
What I am doing, and what I eventually get, is:
db.PQRCorp.aggregate([
{$unwind:"$results"},
{ $group: {_id: "$id",
'totalTermScore':{ $sum:"$results.score" }
}
}])
OUTPUT: { "_id" : null, "totalTermScore" : 90.92894831067625 }
I am simply getting a flat sum of all scores. What I want is to sum terms 1, 2 and 3 separately for each employee.
Can someone please help me? I am new to MongoDB (quite evident, though).
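You do not need $unwind and $group here. Your $group returned a single document with _id: null because it groups on "$id", a field that does not exist (the field is named _id), so every document fell into the same group. A simple $project stage can $sum the scores in the results array for each document: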
db.PQRCorp.aggregate([
{ "$project": {
"name": 1,
"totalTermScore": {
"$sum": "$results.score"
}
}}
])
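The effect of applying $sum to "$results.score" inside $project can be pictured as a per-document reduce. The snippet below is an in-memory illustration on the sample data, not MongoDB code; the variable names are assumptions. No pass/fail threshold is applied because the question does not state one.

```javascript
const employees = [
  { _id: 0, name: "Ancy", results: [
    { evaluation: "term1", score: 1.463179736705023 },
    { evaluation: "term2", score: 11.78273309957772 },
    { evaluation: "term3", score: 6.676176060654615 },
  ] },
  { _id: 1, name: "Mark", results: [
    { evaluation: "term1", score: 5.89772766299929 },
    { evaluation: "term2", score: 12.7726680028769 },
    { evaluation: "term3", score: 2.78092882672992 },
  ] },
  { _id: 2, name: "Jeff", results: [
    { evaluation: "term1", score: 36.78917882992872 },
    { evaluation: "term2", score: 2.883687879200287 },
    { evaluation: "term3", score: 9.882668212003763 },
  ] },
];

// One output document per employee, with the scores of its own array summed.
const totals = employees.map(({ _id, name, results }) => ({
  _id,
  name,
  totalTermScore: results.reduce((sum, r) => sum + r.score, 0),
}));

console.log(totals);
```

Each employee keeps a separate total here, whereas the grouped version in the question collapsed all three into one number.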

Count ignoring duplicate documents

I'd like to count all emails within the specific project (ID: 7), but ignoring duplicate rows in ONE campaign.
Here's the example of my collection structure:
{
"_id" : ObjectId("581a9054c274f7b512e8ed94"),
"email" : "a#example.com",
"IDproject" : 7,
"IDcampaign" : 10
}
{
"_id" : ObjectId("581a9064c274f7b512e8ed95"),
"email" : "b#example.com",
"IDproject" : 7,
"IDcampaign" : 10
}
{
"_id" : ObjectId("581a9068c274f7b512e8ed96"),
"email" : "b#example.com",
"IDproject" : 7,
"IDcampaign" : 10
}
{
"_id" : ObjectId("581a906cc274f7b512e8ed97"),
"email" : "b#example.com",
"IDproject" : 7,
"IDcampaign" : 11
}
{
"_id" : ObjectId("581a9072c274f7b512e8ed98"),
"email" : "c#example.com",
"IDproject" : 7,
"IDcampaign" : 11
}
{
"_id" : ObjectId("581a9079c274f7b512e8ed99"),
"email" : "d#example.com",
"IDproject" : 7,
"IDcampaign" : 12
}
This is what the result should be:
a#example.com
b#example.com
b#example.com
c#example.com
d#example.com
Total: 5 (of 6). Note that b#example.com is mentioned twice. That's because b#example.com has campaigns 10, 10 and 11. We're ignoring one 10.
This is what I've tried:
db.mycollection.aggregate([
{$match : {IDproject : 7}},
{$group : {_id : "$email", total : {$sum : 1}}}
])
But it returns only unique emails ignoring IDcampaign. Also, I can get unique number of emails with the following query:
db.mycollection.distinct('email', {IDproject : 7})
But again, it shows only unique emails ignoring IDcampaign.
Could someone give me a hint how to count emails including IDcampaign?
Thanks.
p.s. I use MongoDB with PHP, and I can solve the problem with PHP calculations, but that's not the solution.
Include IDcampaign as part of your group key, so that each (email, IDcampaign) pair forms its own group, as in the following example:
db.mycollection.aggregate([
{ "$match": { "IDproject": 7 } },
{
"$group": {
"_id": {
"email" : "$email",
"IDcampaign" : "$IDcampaign"
},
"count": { "$sum": 1 }
}
}
])
Sample Output
/* 1 */
{
"_id" : {
"email" : "a#example.com",
"IDcampaign" : 10
},
"count" : 1
}
/* 2 */
{
"_id" : {
"email" : "d#example.com",
"IDcampaign" : 12
},
"count" : 1
}
/* 3 */
{
"_id" : {
"email" : "b#example.com",
"IDcampaign" : 11
},
"count" : 1
}
/* 4 */
{
"_id" : {
"email" : "b#example.com",
"IDcampaign" : 10
},
"count" : 2
}
/* 5 */
{
"_id" : {
"email" : "c#example.com",
"IDcampaign" : 11
},
"count" : 1
}
To answer your follow-up question about getting only the count, since you don't need the list of emails, you could run the following pipeline:
db.mycollection.aggregate([
{ "$match": { "IDproject": 7 } },
{
"$group": {
"_id": null,
"count": { "$sum": 1 },
"emails": {
"$addToSet": {
"email" : "$email",
"IDcampaign" : "$IDcampaign"
}
}
}
},
{
"$project": {
"_id": 0,
"count": 1,
"total": { "$size": "$emails" }
}
}
])
which gives you the result:
{
"total" : 5,
"count" : 6
}
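which you can interpret as "Total: 5 (of 6)": total is the number of distinct (email, IDcampaign) pairs and count is the number of matched documents.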
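The role of $addToSet followed by $size can be checked on the sample rows with a plain JavaScript Set. This is only an in-memory sketch; the names rows, pairs and summary are assumptions, and the email values are copied exactly as they appear in the question.

```javascript
const rows = [
  { email: "a#example.com", IDproject: 7, IDcampaign: 10 },
  { email: "b#example.com", IDproject: 7, IDcampaign: 10 },
  { email: "b#example.com", IDproject: 7, IDcampaign: 10 },
  { email: "b#example.com", IDproject: 7, IDcampaign: 11 },
  { email: "c#example.com", IDproject: 7, IDcampaign: 11 },
  { email: "d#example.com", IDproject: 7, IDcampaign: 12 },
];

// $match: keep only the rows of project 7.
const inProject = rows.filter((r) => r.IDproject === 7);

// $addToSet: a Set of serialized (email, IDcampaign) pairs drops duplicates.
const pairs = new Set(inProject.map((r) => JSON.stringify([r.email, r.IDcampaign])));

// $size of the set versus the plain document count.
const summary = { total: pairs.size, count: inProject.length };
console.log(summary);
```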

How can I get a running total with mongodb aggregate framework?

I am fairly new to MongoDB and I am playing with the aggregate framework. One of the examples from the documentation shows the following, which returns total number of new user joins per month and lists the month joined:
db.users.aggregate(
[
{ $project : { month_joined : { $month : "$joined" } } } ,
{ $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
{ $sort : { "_id.month_joined" : 1 } }
]
)
The code outputs the following:
{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}
Is it possible to also have each object contain the sum of all users that have joined since the start, so I don't have to run over the objects programmatically and calculate it myself?
Example desired output:
{
"_id" : {
"month_joined" : 1
},
"number" : 3,
"total": 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9,
"total": 12
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5,
"total": 17
}