How to improve aggregate pipeline - mongodb

I have pipeline
[
{'$match':{templateId:ObjectId('blabla')}},
{
"$sort" : {
"_id" : 1
}
},
{
"$facet" : {
"paginatedResult" : [
{
"$skip" : 0
},
{
"$limit" : 100
}
],
"totalCount" : [
{
"$count" : "count"
}
]
}
}
])
Index:
"key" : {
"templateId" : 1,
"_id" : 1
}
Collection has 10.6M documents 500k of it is with needed templateId.
Aggregate use index
"planSummary" : "IXSCAN { templateId: 1, _id: 1 }",
But the request takes 16 seconds. What i did wrong? How to speed up it?

For start, you should get rid of the $sort operator. The documents are already sorted by _id since the documents are already guaranteed to sorted by the { templateId: 1, _id: 1 } index. The outcome is sorting 500k which are already sorted anyway.
Next, you shouldn't use the $skip approach. For high page numbers you will skip large numbers of documents up to almost 500k (rather index entries, but still).
I suggest an alternative approach:
For the first page, calculate an id you know for sure falls out of the left side of the index. Say, if you know that you don't have entries back dated to 2019 and before, you can use a match operator similar to this:
var pageStart = ObjectId.fromDate(new Date("2020/01/01"))
Then, your match operator should look like this:
{'$match' : {templateId:ObjectId('blabla'), _id: {$gt: pageStart}}}
For the next pages, keep track of the last document of the previous page: if the rightmost document _id is x in a certain page, then pageStart should be x for the next page.
So your pipeline may look like this:
[
{'$match' : {templateId:ObjectId('blabla'), _id: {$gt: pageStart}}},
{
"$facet" : {
"paginatedResult" : [
{
"$limit" : 100
}
]
}
}
]
Note, that now the $skip is missing from the $facet operator as well.

Related

Paginated aggregation with two fields

Trying to do an aggregate operation to find distinct property pairs in a collection of objects, and paginated, but the $skip and $limit doesn't seems to work.
I have a collection with the following object type
{
"_id" : {
"expiration" : ISODate("2021-06-30T00:00:00.000Z"),
"product" : "proda",
"site" : "warehouse1",
"type" : "AVAILABLE"
},
"quantity" : 2,
"date" : ISODate("2021-06-28T00:00:00.000Z"),
}
I'm trying to find distinct product/site pairs, but only 2 at a time with the following aggregation:
db.getCollection('OBJECT').aggregate( [
{ $group: { "_id": { product: "$_id.product", site: "$_id.site" } } },
{ $skip: 0 },
{ $limit: 2 }
])
With skip being 0 it returns 2 distinct product-site paris as expected, but when I increase the skip value to 2 or more for the next steps, the query will not return anything and I have many objects with distinct product-site pairs that should be returned.

Count of a nested value of all entries in mongodb collection

I have a collection named outbox which has this kind of structure
"_id" :ObjectId("5a94e02bb0445b1cc742d795"),
"track" : {
"added" : {
"date" : ISODate("2020-12-03T08:48:51.000Z")
}
},
"provider_status" : {
"job_number" : "",
"count" : {
"total" : 1,
"sent" : 0,
"delivered" : 0,
"failed" : 0
},
"delivery" : []
}
I have 2 tasks. First I want the sum of all the "total","sent","failed" on all the entries in the collection no matter what their objectId is. ie I want sum of all the "total","sent","delivered" and "failed". Second I want all these only for a given object Id between Start and End date.
I am trying to find total using this query
db.outbox.aggregate(
{ $group: { _id : null, sum : { $sum: "$provider_status.count.total" } } });
But I am getting this error as shown
Since I do not have much experience in mongodb I don't have any idea how to do these two tasks. Need help here.
You are executing this in Robo3t seems like.
You need to enclose this in an array like
db.test.aggregate([ //See here
{
$group: {
_id: null,
sum: {
$sum: "$provider_status.count.total"
}
}
}
])//See here
But it's not the case with playground as they handle them before submitting to the server

MongoDB $divide on aggregate output

Is there a possibility to calculate mathematical operation on already aggregated computed fields?
I have something like this:
([
{
"$unwind" : {
"path" : "$users"
}
},
{
"$match" : {
"users.r" : {
"$exists" : true
}
}
},
{
"$group" : {
"_id" : "$users.r",
"count" : {
"$sum" : 1
}
}
},
])
Which gives an output as:
{ "_id" : "A", "count" : 7 }
{ "_id" : "B", "count" : 49 }
Now I want to divide 7 by 49 or vice versa.
Is there a possibility to do that? I tried $project and $divide but had no luck.
Any help would be really appreciated.
Thank you,
From your question, it looks like you are assuming result count to be 2 only. In that case I can assume users.r can have only 2 values(apart from null).
The simplest thing I suggest is to do this arithmetic via javascript(if you're using it in mongo console) or in case of using it in progam, use the language you're using to access mongo) e.g.
var results = db.collection.aggregate([theAggregatePipelineQuery]).toArray();
print(results[0].count/results[1].count);
EDIT: I am sharing an alternative to above approach because OP commented about the constraint of not using javascript code and the need to be done only via query. Here it is
([
{ /**your existing aggregation stages that results in two rows as described in the question with a count field **/ },
{ $group: {"_id": 1, firstCount: {$first: "$count"}, lastCount: {$last: "$count"}
},
{ $project: { finalResult: { $divide: ['$firstCount','$lastCount']} } }
])
//The returned document has your answer under `finalResult` field

Sort inside cond and if mongodb

I want to sort my aggregation only if a condition is met.
This is what I have so far:
{
$cond: {
if: { $gte: [sort, "like"] },
then: { $divide: { $sort : { total_likes : -1 } } },
else: { $divide: '' }
}
}
sort is a variable that comes from a query parameter.
I want to sort by total_likes, only if sort is "likes". If it's not, I want to leave it alone.
First of all, #schoenbl, if you want to match some condition in mongo aggregation, you should use $match aggregation. It will send the documents which fulfill the given condition.
if: { $gte: [sort, "like"] }
In MongoDB, you are not allowed to compare string using "gte" operator. For string comparison in MongoDB, you get two operators:
for case sensitive $cmp.
for case insensitive $strcasecmp.
then: { $divide: { $sort : { total_likes : -1 } } },
Next, you were using divide operator don't know what is your need but syntax is improper,
refer $divide, for better knowledge.
Also, you are doing sorting in $cond, which means you want to sort each element, and that is not possible because you can't sort without having a comparison as you are inside $cond operator and it is performing manipulation on a single document.
Now, according to your need, I have prepared the next stages which will give sorted document which contains "sort" equals to "like".
{$match:{"sort":"like"}},{$sort:{"total_likes":-1}}
Output:
{ "_id" : ObjectId("5d50569fbe39828b4a22fba2"), "name" : "kyle", "sort" : "like", "total_likes" : 5 }
{ "_id" : ObjectId("5d5056a6be39828b4a22fba3"), "name" : "jack", "sort" : "like", "total_likes" : 2 }
{ "_id" : ObjectId("5d5056abbe39828b4a22fba4"), "name" : "john", "sort" : "like", "total_likes" : 1 }

MongoDB fetch documents with sort by count

I have a document with sub-document which looks something like:
{
"name" : "some name1"
"like" : [
{ "date" : ISODate("2012-11-30T19:00:00Z") },
{ "date" : ISODate("2012-12-02T19:00:00Z") },
{ "date" : ISODate("2012-12-01T19:00:00Z") },
{ "date" : ISODate("2012-12-03T19:00:00Z") }
]
}
Is it possible to fetch documents "most liked" (average value for the last 7 days) and sort by the count?
There are a few different ways to solve this problem. The solution I will focus on uses mongodb's aggregation framework. First, here is an aggregation pipeline that will solve your problem, following it will be an explanation/breakdown of what is happening in the command.
db.testagg.aggregate(
{ $unwind : '$likes' },
{ $group : { _id : '$_id', numlikes : { $sum : 1 }}},
{ $sort : { 'numlikes' : 1}})
This pipeline has 3 main commands:
1) Unwind: this splits up the 'likes' field so that there is 1 'like' element per document
2) Group: this regroups the document using the _id field, incrementing the numLikes field for every document it finds. This will cause numLikes to be filled with a number equal to the number of elements that were in "likes" before
3) Sort: Finally, we sort the return values in ascending order based on numLikes. In a test I ran the output of this command is:
{"result" : [
{
"_id" : 1,
"numlikes" : 1
},
{
"_id" : 2,
"numlikes" : 2
},
{
"_id" : 3,
"numlikes" : 3
},
{
"_id" : 4,
"numlikes" : 4
}....
This is for data inserted via:
for (var i=0; i < 100; i++) {
db.testagg.insert({_id : i})
for (var j=0; j < i; j++) {
db.testagg.update({_id : i}, {'$push' : {'likes' : j}})
}
}
Note that this does not completely answer your question as it avoids the issue of picking the date range, but it should hopefully get you started and moving in the right direction.
Of course, there are other ways to solve this problem. One solution might be to just do all of the sorting and manipulations client-side. This is just one method for getting the information you desire.
EDIT: If you find this somewhat tedious, there is a ticket to add a $size operator to the aggregation framework, I invite you to watch and potentially upvote it to try and speed to addition of this new operator if you are interested.
https://jira.mongodb.org/browse/SERVER-4899
A better solution would be to keep a count field that will record how many likes for this document. While you can use aggregation to do this, the performance will likely be not very good. Having a index on the count field will make read operation fast, and you can use atomic operation to increment the counter when inserting new likes.
You can use this simplify the above aggregation query by the following from mongodb v3.4 onwards:
> db.test.aggregate([
{ $unwind: "$like" },
{ $sortByCount: "$_id" }
]).pretty()
{ "_id" : ObjectId("5864edbfa4d3847e80147698"), "count" : 4 }
Also as #ACE said you can now use $size within a projection instead:
db.test.aggregate([
{ $project: { count: { $size : "$like" } } }
]);
{ "_id" : ObjectId("5864edbfa4d3847e80147698"), "count" : 4 }