I'm trying to get a small table from a MongoDB aggregation, for example the number of fatal accidents by year.
I want to get all the years, even if the count is zero.
MongoDB query:
[
{"$match": {"city": "myCity"}},
{"$group": {
"_id": "$accident_year",
"count": {"$sum": 1}}
},
{"$sort": {"_id": 1}}
]
actual result:
[
{"_id": "2015", "count": 2},
{"_id": "2017", "count": 4},
{"_id": "2018", "count": 6},
{"_id": "2019", "count": 2}
]
desired result:
[
{"_id": "2015", "count": 2},
{"_id": "2016", "count": 0},
{"_id": "2017", "count": 4},
{"_id": "2018", "count": 6},
{"_id": "2019", "count": 2}
]
Thank you
I don't think it's possible to get a count of 0 for years that don't appear in the collection with the aggregation pipeline as it stands. The $group stage has no way of knowing which values are missing.
Since you are sorting the results, the application code that receives the output could check which years are present and add the missing years to the results. This is probably the best option.
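The application-side approach could look something like the sketch below. It is only an illustration: the function name fill_missing_years and its optional first_year/last_year parameters are made up for this example, and it assumes the year is stored as a numeric string, as in the question's output.

```python
def fill_missing_years(rows, first_year=None, last_year=None):
    """Return the rows with a zero-count entry added for every absent year.

    `rows` is the list produced by the aggregation: {"_id": "<year>", "count": n}.
    The range defaults to the smallest and largest year found in `rows`.
    """
    if not rows:
        return []
    counts = {int(row["_id"]): row["count"] for row in rows}
    start = min(counts) if first_year is None else first_year
    end = max(counts) if last_year is None else last_year
    return [
        {"_id": str(year), "count": counts.get(year, 0)}
        for year in range(start, end + 1)
    ]


# The "actual result" from the question.
rows = [
    {"_id": "2015", "count": 2},
    {"_id": "2017", "count": 4},
    {"_id": "2018", "count": 6},
    {"_id": "2019", "count": 2},
]
filled = fill_missing_years(rows)
```

Passing first_year and last_year explicitly would also cover years missing at either end of the range, which the data alone cannot reveal.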
Here is a hack to get around this if you really want the results from your aggregation pipeline:
Create one extra document in your collection for every year. In each of those documents, add a field called something like isAccident and set it to the number 0. For example:
{
"_id": {"$oid": "5e662fca1c9d440000c1aa71"},
"accident_year": "2018",
"isAccident": 0
}
Then add a $project stage before the $group stage that gives every document lacking the isAccident field a value of 1. In the $group stage, sum the $isAccident field instead of counting documents.
[{$project: {
_id: 1,
accident_year: 1,
isAccident: { $ifNull: ["$isAccident", 1] }
}}, {$group: {
_id: "$accident_year",
count: {
$sum: "$isAccident"
}
}}]
This will give you the results you're expecting. Beware that if others later group and count the accidents in this collection without realizing you've added these extra per-year documents, their counts will be off by one for each year.
On the production server I use MongoDB 4.4.
I have a query that works well:
db.step_tournaments_results.aggregate([
{ "$match": { "tournament_id": "6377f2f96174982ef89c48d2" } },
{ "$sort": { "total_points": -1, "time_spent": 1 } },
{
$group: {
_id: "$club_name",
'total_points': { $sum: "$total_points"},
'time_spent': { $sum: "$time_spent"}
},
},
])
But the problem is in the $group stage: it sums the points of every member of each group into total_points, but I need only the best 5 of each group. How can I achieve that?
Query
like your query, match and sort
in the group, instead of summing, gather all members into one array
(I collected $$ROOT, but if the documents have many fields you can collect only the 2 fields you need inside a {})
take the first 5 of them
take the 2 sums you need from the first 5
remove the temp fields
*With MongoDB 6 you can do this inside the $group stage, without collecting the members in an array; in MongoDB 5 you can also do it with $setWindowFields, without $group. For MongoDB 4.4, I think this is a way to do it.
aggregate(
[{"$match": {"tournament_id": {"$eq": "6377f2f96174982ef89c48d2"}}},
{"$sort": {"total_points": -1, "time_spent": 1}},
{"$group": {"_id": "$club_name", "group-members": {"$push": "$$ROOT"}}},
{"$set":
{"first-five": {"$slice": ["$group-members", 5]},
"group-members": "$$REMOVE"}},
{"$set":
{"total_points": {"$sum": "$first-five.total_points"},
"time_spent": {"$sum": "$first-five.time_spent"},
"first-five": "$$REMOVE"}}])
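To check what the pipeline above should return, the same steps can be mirrored in plain Python on a handful of documents. This is a rough sketch, not the server's implementation; the function name sum_best_five and the sample data are invented for illustration.

```python
from collections import defaultdict


def sum_best_five(results, limit=5):
    """Sum total_points and time_spent over the best `limit` rows of each club."""
    # Same ordering as the $sort stage: most points first, then least time.
    ordered = sorted(results, key=lambda r: (-r["total_points"], r["time_spent"]))

    # Same as $group with $push: collect each club's rows in sorted order.
    groups = defaultdict(list)
    for row in ordered:
        groups[row["club_name"]].append(row)

    # Same as $slice followed by the two $sum expressions.
    return {
        club: {
            "total_points": sum(r["total_points"] for r in members[:limit]),
            "time_spent": sum(r["time_spent"] for r in members[:limit]),
        }
        for club, members in groups.items()
    }


# Hypothetical sample: club "A" has six results, club "B" has two.
sample = [
    {"club_name": "A", "total_points": p, "time_spent": 1} for p in (10, 9, 8, 7, 6, 5)
] + [
    {"club_name": "B", "total_points": 3, "time_spent": 2},
    {"club_name": "B", "total_points": 4, "time_spent": 3},
]
best = sum_best_five(sample)
```

For club "A" only the five highest scores are counted, so the result with 5 points is left out.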
I have a MongoDB collection that I have managed to process using an aggregation pipeline to produce the following result:
[
{
_id: 'Complex Numbers',
count: 2
},
{ _id: 'Calculus',
count: 1
}
]
But the result that I am aiming for is something like the following:
{
'Complex Numbers': 2,
'Calculus': 1
}
Is there a way to achieve that?
Query
to convert to an object with $arrayToObject we need something like [[k1, v1], ...] or [{"k": "...", "v": "..."}]
the first stage
converts each document to [{"k": "...", "v": "..."}]
then applies $arrayToObject
and replaces the root
so each document looks like {"Complex Numbers": 2}
the $group stage is used to combine all those documents into one document
and the last stage replaces the root with that one document
aggregate(
[{"$replaceRoot":
{"newRoot": {"$arrayToObject": [[{"k": "$_id", "v": "$count"}]]}}},
{"$group": {"_id": null, "data": {"$mergeObjects": "$$ROOT"}}},
{"$replaceRoot": {"newRoot": "$data"}}])
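The effect of the three stages can be pictured in plain Python: turn each row into a one-key dict, then merge those dicts into one. This is only a model of the idea, using the two rows from the question.

```python
rows = [
    {"_id": "Complex Numbers", "count": 2},
    {"_id": "Calculus", "count": 1},
]

merged = {}
for row in rows:
    # First stage: each row becomes a one-key object such as {"Complex Numbers": 2}.
    single = {row["_id"]: row["count"]}
    # $group with $mergeObjects: fold every one-key object into a single document.
    merged.update(single)
```

As with $mergeObjects, a later row with the same key would overwrite an earlier one.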
I have a gifts collection in MongoDB with four items in it. How do I query the database so that I get only the gifts whose amounts sum to less than or equal to 5500?
For example, from these four gifts in the database:
{
"_id": 1,
"amount": 3000,
},
{
"_id": 2,
"amount": 2000,
},
{
"_id": 3,
"amount": 1000,
},
{
"_id": 4,
"amount": 5000,
}
The query should return the first two only:
{
"_id": 1,
"amount": 3000,
},
{
"_id": 2,
"amount": 2000,
},
I think I should use MongoDB aggregation? If so, what is the syntax?
I did some googling. I know how to use $sum in the $group stage, but I don't know how to use it in the $match stage. Is it even possible to do so?
P.S.: I assumed I should use $sum in $match. Am I supposed to group them first? If so, how do I tell MongoDB to make a group where the sum of the amounts in that group is less than or equal to 5500?
Thanks for any help you are able to provide.
You're on the right track.
First compute the $sum in a $group stage, then filter the groups with $match:
db.gifts.aggregate([
{$match: {}}, // Initial query
{$group: {
_id: '$code', // Assume your gift could be grouped by a unique code
sum: {$sum: '$amount'}, // Sum all amount per group
items: {$push: '$$ROOT'} // Push all gift item to an array
}},
{$match: {sum: {$lte: 5500}}}, // Filter group where sum <= 5500
{$unwind: '$items'}, // Unwind items array to get all match field
{$replaceRoot: {newRoot: '$items'}} // Use this stage to get back the original items
])
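The pipeline above filters whole groups by their total. If the question is read instead as "keep taking gifts in _id order while the running total stays at or below 5500", which is what its expected output suggests, one option is to do that in application code. The sketch below assumes that reading; the function name take_while_total_within is made up for this example.

```python
def take_while_total_within(gifts, limit):
    """Return the leading gifts whose cumulative amount stays <= limit."""
    selected = []
    total = 0
    for gift in gifts:
        # Stop at the first gift that would push the running total over the limit.
        if total + gift["amount"] > limit:
            break
        total += gift["amount"]
        selected.append(gift)
    return selected


# The four gifts from the question, already ordered by _id.
gifts = [
    {"_id": 1, "amount": 3000},
    {"_id": 2, "amount": 2000},
    {"_id": 3, "amount": 1000},
    {"_id": 4, "amount": 5000},
]
chosen = take_while_total_within(gifts, 5500)
```

The third gift is rejected because 3000 + 2000 + 1000 exceeds 5500, so only the first two are returned.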
I have more than 100k records in my collection, and a new record is added every 5 seconds. I have an aggregate query that returns about 720 records from the last year of data.
The aggregate query:
db.collectionName.aggregate([
{"$match": {
"Id": "****-id-****",
"receivedDate": {
"$gte": ISODate("2016-06-26T18:30:00.463Z"),
"$lt": ISODate("2017-06-26T18:30:00.463Z")
}
}
},
{"$group": {
"_id": {
"$add": [
{"$subtract": [
{"$subtract": ["$receivedDate", ISODate("1970-01-01T00:00:00.000Z")]},
{"$mod": [
{"$subtract": ["$receivedDate", ISODate("1970-01-01T00:00:00.000Z")]},
43200000
]}
]},
ISODate("1970-01-01T00:00:00.000Z")
]
},
"_rid": {"$first": "$_id"},
"_data": {"$first": "$receivedData.data"},
"count": {"$sum": 1}
}
},
{"$sort": {"_id": -1}},
{"$project": {
"_id": "$_rid",
"receivedDate": "$_id",
"receivedData": {"data": "$_data"}
}
}
])
I am not sure why it takes more than 15 seconds; when I fetch data for 1 month it works fine.
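For readers unfamiliar with the $group key in the query above: the nested $subtract/$mod/$add expression rounds each receivedDate down to the start of its 12-hour bucket (43200000 ms), which is why a year of data yields roughly 720 groups. The same arithmetic in Python, as a sketch with a helper name (bucket_start) chosen only for this example:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
BUCKET_MS = 43_200_000  # 12 hours in milliseconds, as in the query


def bucket_start(received):
    """Round a timezone-aware datetime down to the start of its 12-hour bucket."""
    # {"$subtract": ["$receivedDate", epoch]} -> milliseconds since the epoch
    elapsed_ms = (received - EPOCH) // timedelta(milliseconds=1)
    # elapsed - (elapsed mod bucket), then added back onto the epoch
    return EPOCH + timedelta(milliseconds=elapsed_ms - elapsed_ms % BUCKET_MS)


# The lower bound used in the question's $match stage.
example = bucket_start(datetime(2016, 6, 26, 18, 30, 0, 463000, tzinfo=timezone.utc))
```

Buckets therefore start at 00:00 and 12:00 UTC each day.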
It's late to answer this question, but this may be helpful for others.
A compound index may help in this situation. Compound indexes can support queries that match on multiple fields.
You can create a compound index on the Id and receivedDate fields:
db.collectionName.createIndex({ Id: -1, receivedDate: -1 });
The order of the fields listed in a compound index is important. The index will contain references to documents sorted first by the values of the Id field and, within each value of the Id field, sorted by values of the receivedDate field.
Objective
Find out possible differences in the following MongoDB queries and understand why one of them works and the other doesn't.
Background
A while ago I posted a question asking for help with a MongoDB query:
Using $push with $group with pymongo
In that question my query didn't work, and I was looking for a way to fix it. I got a lot of help in the comments and eventually found the solution, but no one seems able to explain to me why my first, incorrect query doesn't work and the second one does.
Code
1st (incorrect) query:
pipeline = [
{"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
{"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
{"$sort" : {"count" : -1}},
{"$limit": 5}
]
2nd query:
pipeline = [
{"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
{"$sort" : {"count" : -1}},
{"$limit": 5}
]
Now, the mindful eye will see that the difference between the two queries is the project stage {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}}.
At the time I thought this stage was necessary, but since I am already selecting the fields I need in the $group stage, I don't really need it. In fact, this additional and unnecessary stage was causing the tests to fail.
Question
If the $project stage in the first example is useless and does the same thing as the $group stage, why was my code failing? Shouldn't it make no difference at all (since the change is idempotent?)
In the first query, after the $group stage, the user's screen name is stored under the _id key, not under user.screen_name. The $project stage therefore finds nothing to project for "$user.screen_name", because that key no longer exists.
If you modify your projection to use
{"$project": {"_id": "$_id", "count": 1, "tweet_texts": 1}},
or
{"$project": {"_id": 1, "count": 1, "tweet_texts": 1}},
or
{"$project": {"count": 1, "tweet_texts": 1}},
the first pipeline will behave like the second pipeline.
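The point can be illustrated with a small Python model of field-path lookup. It is a simplification, not how the server evaluates $project: the helper resolve_path and the sample document are invented for this example, and None stands in for a missing value.

```python
def resolve_path(document, path):
    """Resolve a dotted field path such as 'user.screen_name'; None if absent."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Shape of a document before $group (hypothetical tweet).
tweet = {"user": {"screen_name": "alice"}, "text": "hello"}

# Shape of a document after $group: only _id, tweet_texts and count remain.
grouped = {"_id": "alice", "tweet_texts": ["hello"], "count": 1}

before_group = resolve_path(tweet, "user.screen_name")    # path exists here
after_group = resolve_path(grouped, "user.screen_name")   # path is gone here
via_id = resolve_path(grouped, "_id")                      # the value now lives here
```

So the extra $project stage is not a no-op: it overwrites _id with a path that no longer resolves to anything.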