How to access overall document count during arithmetic aggregation expression - mongodb

I have a collection of documents in this format:
{
_id: ObjectId,
items: [
{
defindex: number,
...
},
...
]
}
Certain parts of the schema not relevant are omitted, and each item defindex within the items array is guaranteed to be unique for that array. The same defindex can occur in different documents' items fields, but will only occur once in each respective array if present.
I currently call $unwind upon the items field, followed by $sortByCount upon items.defindex to get a sorted list of items with the highest count.
I now want to add a new field to this final sorted list using $set called usage, that shows the item's usage as a percentage of the initial number of total documents in the collection.
(i.e. if the item's count is 1300, and the overall document count pre-$unwind was 2600, the usage value will be 0.5)
My initial plan was to use $facet upon the initial collection, creating a document as so:
{
total: number (achieved using $count),
documents: [{...}] (achieved using an empty $set)
}
And then calling $unwind on the documents field to add the total document count to each document. Calculating the usage value is then trivial using $set, since the total count is a field in the document itself.
This approach ran into memory issues though, since my collection is far larger than the 16MB limit.
How would I solve this?

One way to do it is use $setWindowFields:
db.collection.aggregate([
{
$setWindowFields: {
output: {
totalCount: {$count: {}}
}
}
},
{
$unwind: "$items"
},
{
$group: {
_id: "$items.defindex",
count: {$sum: 1},
totalCount: {$first: "$totalCount"}
}
},
{
$project: {
count: 1,
usage: {$divide: ["$count", "$totalCount"]
}
}
},
{$sort: {count: -1}}
])
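The arithmetic behind this pipeline can be checked outside the database. The following is a minimal in-memory sketch in plain JavaScript, not a MongoDB API; the helper name usageByDefindex and the sample documents are hypothetical. It takes the document count before flattening items, which is the role totalCount plays in the $setWindowFields stage, then divides each per-defindex count by that total.

```javascript
// In-memory sketch of the pipeline's arithmetic (hypothetical helper, not a MongoDB API).
function usageByDefindex(docs) {
  // The total is taken before "unwinding", like totalCount from $setWindowFields.
  const totalCount = docs.length;
  const counts = new Map();
  for (const doc of docs) {
    for (const item of doc.items) {
      counts.set(item.defindex, (counts.get(item.defindex) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([defindex, count]) => ({ _id: defindex, count, usage: count / totalCount }))
    .sort((a, b) => b.count - a.count); // like $sort: { count: -1 }
}

// Hypothetical sample data: four documents, defindex 10 appears in two of them.
const sampleDocs = [
  { items: [{ defindex: 10 }, { defindex: 20 }] },
  { items: [{ defindex: 10 }] },
  { items: [{ defindex: 30 }] },
  { items: [] },
];
const usage = usageByDefindex(sampleDocs);
```

Note that the document with an empty items array still counts toward the total, just as it would in the pipeline, because the count is taken before $unwind drops it.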

Related

Aggregate MongoDB query within nested arrays over multiple documents

I am doing a little hobby project for friends' disc golf results.
I'm fairly new to MongoDB and I'm stuck on a query. What I want to do is return a JSON object of the form [{player: "Name", totalThrows: (sum of the Totalt field), total+/-: (sum of the +/- field)}].
In short: I want to aggregate each player's total throws (the Totalt field) and total +/- over 3 different documents in my database.
The final JSON would look like this: [{player: "Tormod", totalThrows: 148, total+/-: 24}, {player: "Martin", totalThrows: 149, total+/-: 25}, {player: "Andreas", totalThrows: 158, total+/-: 34}]
The picture below shows my document as it is saved in MongoDB. The results array contains each player's disc golf results for that round. The players do not appear in a fixed order in the results array across documents. Each document represents a new round (think of playing on 3 different days).
Code:
aggregate(
[
{$unwind: "$results"},
{$group: {
player: "$results.PlayerName",
throws: "$results.Totalt"
}},
{$group: {
_id: "$_id",
totalThrows: {$sum: "$results.Totalt"}
}}
])
https://mongoplayground.net/p/aaKv3aaCeCi
You should pass an _id field (the group by expression) to the $group stage and use $sum to sum the Totalt and +/- values. Since the values are strings, you should convert them to integers using $toInt before summing them. Finally, you can project the values based on the desired structure.
Btw, it looks like most of the numeric fields are stored as strings in the documents. I would recommend you update the documents converting them all to numbers and make sure only numbers are added to the numeric fields in the new documents.
db.collection.aggregate([
{
$unwind: "$results",
},
{
$group: {
_id: "$results.PlayerName",
totalThrows: {
$sum: {
$toInt: "$results.Totalt",
},
},
"total+/-": {
$sum: {
$toInt: "$results.+/-",
},
},
},
},
{
$project: {
_id: 0,
player: "$_id",
totalThrows: "$totalThrows",
"total+/-": "$total+/-",
},
},
])
MongoPlayground
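To see what the $unwind and $group stages compute, here is a minimal in-memory sketch in plain JavaScript. It is not a MongoDB API: the helper name totalsByPlayer and the two sample rounds are hypothetical, and parseInt merely stands in for $toInt (which, unlike parseInt, raises an error on non-numeric strings).

```javascript
// In-memory sketch of $unwind + $group with $toInt (hypothetical data and helper name).
function totalsByPlayer(rounds) {
  const totals = new Map();
  for (const round of rounds) {
    for (const result of round.results) {
      const entry = totals.get(result.PlayerName) || { totalThrows: 0, "total+/-": 0 };
      // Values are stored as strings, so convert before summing (the role of $toInt).
      entry.totalThrows += parseInt(result.Totalt, 10);
      entry["total+/-"] += parseInt(result["+/-"], 10);
      totals.set(result.PlayerName, entry);
    }
  }
  // Reshape like the final $project stage: one object per player.
  return [...totals.entries()].map(([player, sums]) => ({ player, ...sums }));
}

// Hypothetical rounds; player order differs between documents on purpose.
const rounds = [
  { results: [
    { PlayerName: "Tormod", Totalt: "50", "+/-": "8" },
    { PlayerName: "Martin", Totalt: "52", "+/-": "10" },
  ] },
  { results: [
    { PlayerName: "Martin", Totalt: "48", "+/-": "6" },
    { PlayerName: "Tormod", Totalt: "49", "+/-": "7" },
  ] },
];
const playerTotals = totalsByPlayer(rounds);
```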

MongoDb Aggregate group and sort applications

There are documents with structure:
{"appId":<id>,"time":<number>}
For example, let's assume we have:
{"appId":"A","time":1}
{"appId":"A","time":3}
{"appId":"A","time":5}
{"appId":"B","time":1}
{"appId":"B","time":2}
{"appId":"B","time":4}
{"appId":"B","time":6}
Is it possible to group the documents by appId, each group to be sorted by time, and all results to be shown from the latest time for the group like:
{"appId":"B","time":6}
{"appId":"B","time":4}
{"appId":"B","time":2}
{"appId":"B","time":1}
{"appId":"A","time":5}
{"appId":"A","time":3}
{"appId":"A","time":1}
I tried this query:
collection.aggregate([{"$group":{"_id":{"a":"$appId"},"ttt":{"$max":"$time"}}},
{"$sort":{"_id.ttt":-1,"time":-1}}])
but I received only the latest time for each appId (2 results), and this query changes the structure of the data.
I want to keep the structure of the documents and only group and sort them as in the example.
You can try below aggregation:
db.collection.aggregate([
{
$sort: { time: -1 }
},
{
$group: {
_id: "$appId",
max: { $max: "$time" },
items: { $push: "$$ROOT" }
}
},
{
$sort: { max: -1 }
},
{
$unwind: "$items"
},
{
$replaceRoot: {
newRoot: "$items"
}
}
])
You can $sort before grouping to get the right order inside each group. Then you can use the special variable $$ROOT while grouping to capture the whole original object. In the next step you can sort by the max value and use $unwind with $replaceRoot to get back the same number of documents and to promote the original shape to the root level.
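The steps above can be traced in plain JavaScript on the sample data from the question. This is only an in-memory sketch with a hypothetical helper name (sortByLatestGroup), not something MongoDB provides.

```javascript
// In-memory sketch of sort -> group -> sort groups -> unwind (hypothetical helper name).
function sortByLatestGroup(docs) {
  // Step 1: sort all documents by time, descending ($sort: { time: -1 }).
  const sorted = [...docs].sort((a, b) => b.time - a.time);
  // Step 2: group by appId, keeping whole documents ($push: "$$ROOT") and the max time.
  const groups = new Map();
  for (const doc of sorted) {
    // Because of the sort, the first document seen per appId carries the max time.
    if (!groups.has(doc.appId)) groups.set(doc.appId, { max: doc.time, items: [] });
    groups.get(doc.appId).items.push(doc);
  }
  // Step 3: order groups by max time, then flatten ($unwind + $replaceRoot).
  return [...groups.values()]
    .sort((a, b) => b.max - a.max)
    .flatMap((group) => group.items);
}

const appDocs = [
  { appId: "A", time: 1 }, { appId: "A", time: 3 }, { appId: "A", time: 5 },
  { appId: "B", time: 1 }, { appId: "B", time: 2 }, { appId: "B", time: 4 },
  { appId: "B", time: 6 },
];
const ordered = sortByLatestGroup(appDocs);
```

For this input the result lists the B documents (times 6, 4, 2, 1) followed by the A documents (times 5, 3, 1), matching the output requested in the question.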
See if the below find & sort operation works with your real data.
collection.find({}, {_id : 0}).sort({appId:1, time:-1})
If this is a huge collection and this is going to be a repetitive query, make sure to create a compound index on these two fields.

How can I limit the result in a category using mongoose

I have a collection in MongoDB with a field displayInCategories. The collection contains thousands of records across different displayInCategories values.
Is it possible to limit the records to <=5 for each of the available displayInCategories?
I don't want to limit the whole result; I need to limit the records per displayInCategories value.
This might get you going:
db.collection.aggregate([{
$group: {
_id: "$displayInCategories", // group by displayInCategories
"docs": { $push: "$$ROOT" } // remember all documents for this category
}
}, {
$project: {
"docs": { $slice: [ "$docs", 5 ] } // limit the items in each "docs" array to 5
}
}])
You might want to apply a $sort stage at the start to make sure you don't get random documents but rather the "top 5" based on some criteria.
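The effect of sorting first and then keeping at most N documents per group can be sketched in plain JavaScript. This is an in-memory illustration only; topPerCategory, the score field, and the sample records are hypothetical and not part of the question's schema.

```javascript
// In-memory sketch of $sort + $group + $slice (hypothetical helper and sample fields).
function topPerCategory(docs, n, compare) {
  const groups = new Map();
  // Sorting first means each bucket fills up with the "best" documents.
  for (const doc of [...docs].sort(compare)) {
    const bucket = groups.get(doc.displayInCategories) || [];
    if (bucket.length < n) bucket.push(doc); // like $slice: ["$docs", n]
    groups.set(doc.displayInCategories, bucket);
  }
  return [...groups.entries()].map(([_id, docs]) => ({ _id, docs }));
}

// Hypothetical records with an assumed numeric "score" field to rank by.
const records = [
  { displayInCategories: "a", score: 1 },
  { displayInCategories: "a", score: 3 },
  { displayInCategories: "a", score: 2 },
  { displayInCategories: "b", score: 0 },
];
const top2 = topPerCategory(records, 2, (x, y) => y.score - x.score);
```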

Mongodb aggregation $group followed by $limit for pagination

In a MongoDB aggregation pipeline, do records flow from stage to stage one at a time (or in batches), or does each stage wait until it has processed the whole collection before passing results to the next stage?
For example, I have a collection classtest with the following sample records:
{name: "Person1", marks: 20}
{name: "Person2", marks: 20}
{name: "Person1", marks: 20}
I have total 1000 records for about 100 students and I have following aggregate query
db.classtest.aggregate(
[
{$sort: {name: 1}},
{$group: {_id: '$name',
total: {$sum: '$marks'}}},
{$limit: 5}
])
I have following questions.
The sort order is lost in the final results. If I place another sort after $group, the results are sorted properly. Does that mean $group does not maintain the previous sort order?
I would like to limit the results to 5. Does the group operation have to complete (for all 1000 records) before passing results to the limit stage, or does it pass records to the limit stage as they become available and stop processing once the limit is met?
My actual idea is to paginate the results of the aggregation. In the above scenario, if $group maintained the sort order and processed only the required number of records, I would apply a $match condition {$gte: 'lastPersonName'} in subsequent page queries.
I do not want to apply $limit before $group as I want results for 5 students not first 5 records.
I may not want to use $skip as that means effectively traversing those many records.
I have solved the problem without need of maintaining another collection or even without $group traversing whole collection, hence posting my own answer.
As others have pointed:
$group doesn't retain order, hence early sorting is not of much help.
$group doesn't do any optimization, even if there is a following $limit, i.e., runs $group on entire collection.
My use case has the following unique features, which helped me solve it:
There will be a maximum of 10 records per student (and a minimum of 1).
I am not very particular about page size. The front end is capable of handling varying page sizes.
The following is the aggregation command I have used.
db.classtest.aggregate(
[
{$sort: {name: 1}},
{$limit: 5 * 10},
{$group: {_id: '$name',
total: {$sum: '$marks'}}},
{$sort: {_id: 1}}
])
Explaining the above.
If $sort immediately precedes $limit, the framework optimizes the amount of data sent to the next stage.
To get at least 5 results (the page size), I need to pass at least 5 (page size) * 10 (max records per student) = 50 records to the $group stage. With this, the size of the final result may be anywhere between 0 and 50.
If the result size is less than 5, then no further pagination is required.
If the result size is greater than 5, there is a chance that the last student's records were not completely processed (i.e., not all of that student's records were grouped), hence I discard the last record from the result.
The name in the last record (among the retained results) is then used as the $match criterion in the subsequent page request, as shown below.
db.classtest.aggregate(
[
{$match: {name: {$gt: lastRecordName}}},
{$sort: {name: 1}},
{$limit: 5 * 10},
{$group: {_id: '$name',
total: {$sum: '$marks'}}},
{$sort: {_id: 1}}
])
In the above, the framework still optimizes $match, $sort and $limit together as a single operation, which I have confirmed through the explain plan.
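The over-fetch-and-discard logic described in this answer can be traced in plain JavaScript. The following is an in-memory sketch under the same assumption (a known maximum number of records per student); the helper name fetchPage and the sample marks are hypothetical, and the sketch uses a page size of 2 with at most 2 records per student to keep the data small.

```javascript
// In-memory sketch of the over-fetch pagination described above (hypothetical helper name).
function fetchPage(records, lastName, pageSize, maxPerStudent) {
  const limited = records
    .filter((r) => lastName === null || r.name > lastName) // $match: { name: { $gt: lastName } }
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0)) // $sort: { name: 1 }
    .slice(0, pageSize * maxPerStudent); // $limit
  const totals = new Map();
  for (const r of limited) {
    totals.set(r.name, (totals.get(r.name) || 0) + r.marks); // $group with $sum
  }
  const groups = [...totals.entries()].map(([_id, total]) => ({ _id, total }));
  // More groups than the page size: the last one may be cut off by the limit, so drop it.
  if (groups.length > pageSize) groups.pop();
  return groups;
}

// Hypothetical data: at most 2 records per student.
const marks = [
  { name: "A", marks: 10 },
  { name: "B", marks: 10 }, { name: "B", marks: 5 },
  { name: "C", marks: 1 }, { name: "C", marks: 2 },
  { name: "D", marks: 7 },
];
const page1 = fetchPage(marks, null, 2, 2);
const page2 = fetchPage(marks, page1[page1.length - 1]._id, 2, 2);
```

On the first page the limit cuts student C after one record, so C is discarded and picked up in full on the second page.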
The first thing to consider here is that the aggregation framework works with a "pipeline" of stages that are applied in order to get a result. If you are familiar with processing things on the "command line" or "shell" of your operating system, then you might have some experience with the "pipe" or | operator.
Here is a common unix idiom:
ps -ef | grep mongod | tee "out.txt"
In this case the output of the first command, ps -ef, is "piped" to the next command, grep mongod, which in turn has its output "piped" to tee out.txt, which writes both to the terminal and to the specified file name. This is a "pipeline" where each stage "feeds" the next, in the "order" in which they are written.
The same is true of the aggregation pipeline. A "pipeline" here is in fact an "array", which is an ordered set of instructions to be passed in processing the data to a result.
db.classtest.aggregate([
{ "$group": {
"_id": "$name",
"total": { "$sum": "$marks"}
}},
{ "$sort": { "_id": 1 } },
{ "$limit": 5 }
])
So what happens here is that all of the items in the collection are first processed by $group to get their totals. There is no specified "order" to grouping so there is not much sense in pre-ordering the data. Neither is there any point in doing so because you are yet to get to your later stages.
Then you would $sort the results and also $limit as required.
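The stage-by-stage flow described above can be mimicked in plain JavaScript by treating the pipeline as an ordered array of functions, each consuming the previous stage's output. This is only an illustrative sketch: runPipeline, groupTotals, sortById, limit and the sample records are hypothetical names and data, not part of MongoDB.

```javascript
// Sketch: a pipeline as an ordered array of stage functions (all names hypothetical).
const runPipeline = (input, stages) =>
  stages.reduce((data, stage) => stage(data), input);

// Plays the role of $group: sum marks per name. It must see every record first.
const groupTotals = (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.name, (totals.get(doc.name) || 0) + doc.marks);
  }
  return [...totals.entries()].map(([_id, total]) => ({ _id, total }));
};

// Plays the role of $sort on the grouped key.
const sortById = (docs) =>
  [...docs].sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));

// Plays the role of $limit.
const limit = (n) => (docs) => docs.slice(0, n);

const classRecords = [
  { name: "Person2", marks: 20 },
  { name: "Person1", marks: 20 },
  { name: "Person1", marks: 20 },
  { name: "Person3", marks: 5 },
];
const firstPage = runPipeline(classRecords, [groupTotals, sortById, limit(2)]);
```

Swapping the order of the stages in the array changes the result, which is the point of the pipe analogy.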
For your next "page" of data you will want ideally $match on the last unique name found, like so:
db.classtest.aggregate([
{ "$match": { "name": { "$gt": lastNameFound } }},
{ "$group": {
"_id": "$name",
"total": { "$sum": "$marks"}
}},
{ "$sort": { "_id": 1 } },
{ "$limit": 5 }
])
It's not the best solution, but there really are no alternatives for this type of grouping. It will however get notably "faster" with each iteration towards the end. Alternatively, storing all the unique names (or reading them from another collection) and "paging" through that list with a "range query" on each aggregation statement may be a viable option, if your data permits it.
Something like:
db.classtest.aggregate([
{ "$match": { "name": { "$gte": "Allan", "$lte": "David" } }},
{ "$group": {
"_id": "$name",
"total": { "$sum": "$marks"}
}},
{ "$sort": { "_id": 1 } },
])
Unfortunately there is no "limit grouping up until x results" option, so unless you can work with another list, you are basically grouping up everything (and possibly a gradually smaller set each time) with each aggregation query you send.
"$group does not order its output documents." See http://docs.mongodb.org/manual/reference/operator/aggregation/group/
$limit limits the number of processed elements of an immediately preceding $sort operation, not only the number of elements passed to the next stage. See the note at http://docs.mongodb.org/manual/reference/operator/aggregation/limit/
For the very first question you asked, I am not sure, but it appears (see 1.) that a stage n+1 can influence the behaviour of stage n : the limit will limit the sort operation to its first n elements, and the sort operation will not complete just as if the following limit stage did not exist.
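Why a $limit directly after a $sort can reduce the sort's work is easier to see with a small sketch: if only the first n results are needed, the sort never has to hold more than n candidates at once. The code below is a plain JavaScript illustration of that idea with a hypothetical helper name (topN); it is not MongoDB's actual implementation.

```javascript
// Sketch of a bounded "sort then limit": only the best n items are kept while scanning.
// Illustrative only; this is not how MongoDB implements it internally.
function topN(values, n, compare) {
  const kept = [];
  for (const value of values) {
    // Find the sorted position for the new value among the kept items.
    let i = kept.length;
    while (i > 0 && compare(value, kept[i - 1]) < 0) i--;
    kept.splice(i, 0, value);
    // Trim back to n entries so memory stays bounded by the limit.
    if (kept.length > n) kept.pop();
  }
  return kept;
}

const firstThree = topN([5, 1, 4, 2, 3], 3, (a, b) => a - b);
```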
Pagination on grouped data in MongoDB:
You can't apply pagination directly to the items inside a $group, but the trick below works.
For example, to group products by category and then return only 5 products per category:
Step 1 - write an aggregation on the products collection that groups by category:
{ $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },
Step 2 - use $slice with prdSkip (how many to skip) and prdLimit (how many to return), passed in dynamically:
{
$project: {
// pagination for products
products: {
$slice: ['$products', prdSkip, prdLimit],
}
}
},
The final query looks like this. The parameters limit and skip paginate the categories, while prdSkip and prdLimit paginate the products within each category:
db.products.aggregate([
{ $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },
{
$lookup: {
from: 'categories',
localField: '_id',
foreignField: '_id',
as: 'categoryProducts',
},
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [{ $arrayElemAt: ['$categoryProducts', 0] }, '$$ROOT'],
},
},
},
{
$project: {
// pagination for products
products: {
$slice: ['$products', prdSkip, prdLimit],
},
_id: 1,
catName: 1,
catDescription: 1,
},
},
])
.skip(skip) // pagination for category: skip first, then limit
.limit(limit);
I used $replaceRoot here to pull the category fields up to the top level.
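The three-argument form of $slice used above takes the array, a start position, and a count. A plain JavaScript sketch of the same per-category windowing follows; pageProductsPerCategory and the sample products are hypothetical, and Array.prototype.slice(start, start + count) stands in for $slice.

```javascript
// In-memory sketch of $group + $slice: [array, position, n] (hypothetical helper and data).
function pageProductsPerCategory(products, prdSkip, prdLimit) {
  const groups = new Map();
  for (const product of products) {
    const list = groups.get(product.prdCategoryId) || [];
    list.push(product); // like $push: "$$ROOT"
    groups.set(product.prdCategoryId, list);
  }
  return [...groups.entries()].map(([_id, list]) => ({
    _id,
    // $slice takes a start position and a count; JS slice takes start and end.
    products: list.slice(prdSkip, prdSkip + prdLimit),
  }));
}

const sampleProducts = [
  { prdCategoryId: "c1", name: "p1" },
  { prdCategoryId: "c1", name: "p2" },
  { prdCategoryId: "c1", name: "p3" },
  { prdCategoryId: "c2", name: "q1" },
];
const secondProductPage = pageProductsPerCategory(sampleProducts, 1, 1);
```

A category with fewer products than prdSkip simply yields an empty products array for that page.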

mongodb: aggregates over a list contained in each document

I am currently reading the MongoDB aggregation introduction. The examples show how powerful aggregation operations are, for example, for summing certain values across a subset of documents in a collection.
What I need is actually a bit different: I need to perform the same operation within a list that is contained in each document of a collection. In this way I would still get an element for each document that is contained in the collection, but the lists that are contained in each document would be collapsed, by summation on a certain field contained in the sub-documents contained in the list.
Is this possible with normal pipeline/aggregation operations in MongoDB?
I discovered that the $unwind operator expands a list contained in a document into several documents.
For example, the following query just expands the sessions list into several documents, that can be used, afterwards, for an aggregation over the Ts field:
db.userStats.aggregate([
{ $match: {"u":{ "$in": [1,2,3,4,5] }}},
{ $unwind: "$sessions" },
{ $group: { _id:"$u" , total: { $sum: "$sessions.Ts"}}},
])
It sounds like you want to do a $project, possibly followed by a $group if you'd prefer to collapse all the results into a single document. Something like:
db.userStats.aggregate([
{ $match: {"u":{ "$in": [1,2,3,4,5] }}},
{ $project: { u: 1, total: { $sum: "$sessions.Ts"}}},
{ $group: { _id:"$u" , total: { $first: "$total" }}},
])
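What the $project stage computes here is a sum inside each document, with no flattening needed. A minimal in-memory sketch in plain JavaScript shows the same idea; sessionTotals and the sample documents are hypothetical.

```javascript
// In-memory sketch of $match + $project with $sum over an array field (hypothetical names).
function sessionTotals(userStats, userIds) {
  return userStats
    .filter((doc) => userIds.includes(doc.u)) // like $match with $in
    .map((doc) => ({
      u: doc.u,
      // Collapse the sessions list into a single number per document.
      total: doc.sessions.reduce((sum, session) => sum + session.Ts, 0),
    }));
}

// Hypothetical documents shaped like the question's userStats collection.
const userStats = [
  { u: 1, sessions: [{ Ts: 10 }, { Ts: 5 }] },
  { u: 2, sessions: [] },
  { u: 9, sessions: [{ Ts: 100 }] },
];
const perUserTotals = sessionTotals(userStats, [1, 2, 3, 4, 5]);
```

Each input document that passes the filter yields exactly one output element, which is the behavior the question asks for.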