Indexing a slow aggregation query in MongoDB

I have a slow query in MongoDB on a collection of around 50k documents.
How can I index it?
I tried adding the following index, but it did not solve the issue:
db.getCollection("events").createIndex({ "area.area_id": 1, "execute_time": -1 })
"Slow query","attr":{"type":"command","ns":"events.events",
"command":{"aggregate":"events","pipeline":
[
{"$facet":{"1":[{"$match":{"area.area_id":"1"}},
{"$sort":{"execute_time":-1}},{"$limit":30}
],
"2":
[
{"$match":{"area.area_id":"2"}},
{"$sort":{"execute_time":-1}},{"$limit":30}]}}
]
,"cursor":{},
"lsid":
{"id":{"$uuid":"2be3c461-dfc7-4591-adaf-da9454b9615c"}},"$db":"events"},
"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":37973,"cursorExhausted":true,"numYields":37,"nreturned":1,
"reslen":118011,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":61}},"Global":{"acquireCount":{"r":61}},
"Database":{"acquireCount":{"r":61}},"Collection":{"acquireCount":{"r":61}},"Mutex":{"acquireCount":{"r":24}}},"storage":{},"protocol":"op_msg","durationMillis":262}}
My query:
this.collection.aggregate([
{$facet: facetObj }])
Each facet object looks something like this:
facetObj[x] = [
{$match: {'area.area_id': x}},
{$sort: { execute_time: -1 }},
{$limit: limit}
]

You cannot use indexes in the $facet stage.
From the MongoDB documentation:
The $facet stage, and its sub-pipelines, cannot make use of indexes, even if its sub-pipelines use $match or if $facet is the first stage in the pipeline. The $facet stage will always perform a COLLSCAN during execution.
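One way around this, offered as a sketch rather than a definitive fix, is to run one small pipeline per area instead of a single $facet stage. A $match followed by a $sort at the start of a pipeline can use an index such as the one created in the question. The helper name buildAreaPipelines and the driver call shown in the comment are assumptions, not part of the original post.

```javascript
// Hypothetical helper: builds one pipeline per area instead of one $facet stage.
// Field names are taken from the logged query in the question.
function buildAreaPipelines(areaIds, limit) {
  return areaIds.map((areaId) => [
    { $match: { "area.area_id": areaId } },
    { $sort: { execute_time: -1 } },
    { $limit: limit },
  ]);
}

const pipelines = buildAreaPipelines(["1", "2"], 30);

// Assumed usage with the Node.js driver, one aggregate call per area:
//   const results = await Promise.all(
//     pipelines.map((p) => collection.aggregate(p).toArray())
//   );
```

This trades one round trip for several, which may or may not be acceptable depending on how many areas are queried at once.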

You did not show any input data or the expected result. However, from what I see, one approach could be this:
db.collection.aggregate([
{ $match: { "area.area_id": { $in: [ "1", "2" ] } } },
{ $sort: { execute_time: -1 } },
{
$group: {
_id: "$area.area_id",
execute_time: { $push: "$execute_time" }
}
},
{
$set: {
execute_time: { $slice: [ "$execute_time", 30 ] }
}
}
])

Related

MongoDB - count by field, and sort by count

I am new to MongoDB and to writing anything beyond very basic queries, and I have not managed to create a query that does the following:
I have a collection in which each document represents one "use" of a benefit (e.g. the first document states that benefit "123" was used once):
[
{
"id" : "1111",
"benefit_id":"123"
},
{
"id":"2222",
"benefit_id":"456"
},
{
"id":"3333",
"benefit_id":"456"
},
{
"id":"4444",
"benefit_id":"789"
}
]
I need to create a query that outputs an array, with the most-used benefit at the top along with how many times it was used.
For the above example, the query should output:
[
{
"benefit_id":"456",
"cnt":2
},
{
"benefit_id":"123",
"cnt": 1
},
{
"benefit_id":"789",
"cnt":1
}
]
I have tried to work with the documentation and with $sortByCount but with no success.

$group
$group by benefit_id and get the count using $sum
$sort by count in descending order
db.collection.aggregate([
{
$group: {
_id: "$benefit_id",
count: { $sum: 1 }
}
},
{ $sort: { count: -1 } }
])
$sortByCount
The same operation using the $sortByCount stage
db.collection.aggregate([
{ $sortByCount: "$benefit_id" }
])
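To make explicit what both pipelines compute, here is a plain JavaScript sketch that counts and ranks the sample documents in memory and names the output fields benefit_id and cnt as the question asks. The function countByBenefit is hypothetical and only mirrors the logic; it is not a MongoDB API.

```javascript
// Sample documents from the question.
const uses = [
  { id: "1111", benefit_id: "123" },
  { id: "2222", benefit_id: "456" },
  { id: "3333", benefit_id: "456" },
  { id: "4444", benefit_id: "789" },
];

// Hypothetical in-memory equivalent of $group + $sort (not a MongoDB API).
function countByBenefit(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.benefit_id, (counts.get(doc.benefit_id) || 0) + 1);
  }
  // Array.prototype.sort is stable, so ties keep first-seen order here.
  return [...counts.entries()]
    .map(([benefit_id, cnt]) => ({ benefit_id, cnt }))
    .sort((a, b) => b.cnt - a.cnt);
}

const ranked = countByBenefit(uses);
```

In MongoDB itself, both pipelines return the grouped key as _id and the counter as count; a trailing $project stage can rename them. The order among ties is not guaranteed there unless a secondary sort key is added.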

$match not filtering records satisfying a condition in a generated field after $project stage

I have a movies collection and I want to find all the movies with one-word names (eg: 'Adrift' should be returned but not 'Bird Box'). I did the following and nothing is returned upon executing the command in the Mongo shell. I saw that the output of only the '$project' stage works fine where all one-word movie titles have 'titleSize = true'. So, I think something is wrong with the way I wrote the '$match' stage. I am new to Mongo and may not have understood the concept. Any help in understanding what I am doing wrong will be greatly appreciated.
db.movies.aggregate([(
{$project:
{_id:0,
title:1,
"titleSize":{
$eq:[{$size:{$split: ["$title"," "]}},1]
}
}
},
{$match:
{"titleSize":true}
}
)])

There's a better alternative: $expr. Use it as the query with the find() method:
db.movies.find({
'$expr': {
'$eq': [
{ '$size': { '$split': ["$title", " "] } },
1
]
}
})
Or, if you want to keep the current aggregate pipeline, remove the parentheses that wrap the stages inside the array. You should end up with:
db.movies.aggregate([
{ $project: {
_id: 0,
title: 1,
titleSize: {
$eq: [
{ $size: {
$split: ["$title", " "]
} },
1
]
}
} },
{ $match: { titleSize: true } }
])
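A likely explanation for why the original pipeline returned nothing (my reading, not stated in the answer above): in JavaScript, parentheses around comma-separated expressions form a single comma expression, which evaluates to its last operand. The array in the question therefore contains only the $match stage, and titleSize never exists when it runs. The variable names below are made up for illustration.

```javascript
// Two stages, shortened from the question's pipeline.
const projectStage = { $project: { _id: 0, title: 1 } };
const matchStage = { $match: { titleSize: true } };

// As written in the question: the parentheses create a comma expression,
// so only the last operand (the $match stage) ends up in the array.
const wrapped = [(projectStage, matchStage)];

// As written in the fix: a plain two-element array.
const unwrapped = [projectStage, matchStage];
```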

Using "$count" Within an "addField" Operation in MongoDB Aggregation

I am trying to find the correct combination of aggregation operators to add a field titled "totalCount" to my mongoDB view.
This will get me the count at this particular stage of the aggregation pipeline and output this as the result of a count on each of the documents:
{
$count: "count"
}
But I then end up with one document with this result, rather than what I'm trying to accomplish, which is to make this value print out as an addedField that is a field/value on all of the documents, or even better, a value that prints in addition to the returned documents.
I've tried this, but it gives me the error "Unrecognized expression '$count'":
{
$addFields: {
"totalCount" : { $count: "totalCount" }
}
}
What would the correct syntactical construction be for this? Is it possible to do it this way, or do I need to use $sum, or some other operator to make this work? I also tried this:
{
$addFields: {
"totalCount" : { $sum: { _id: 1 } }
}
},
... but while it doesn't give me any errors, it just prints 0 as the value for that field on every document rather than the total count of all documents.

The total count will always be a one-document result, so you need $facet to run multiple aggregation pipelines and then merge the results. Let's say your regular pipeline contains a simple $project and you want to merge its results with $count. You can run the aggregation below:
db.col.aggregate([
{
$facet: {
totalCount: [
{ $count: "value" }
],
pipelineResults: [
{
$project: { _id: 1 } // your regular aggregation pipeline here
}
]
}
},
{
$unwind: "$pipelineResults"
},
{
$unwind: "$totalCount"
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [ "$pipelineResults", { totalCount: "$totalCount.value" } ]
}
}
}
])
After the $facet stage you'll get a single document like this:
{
"totalCount" : [
{
"value" : 3
}
],
"pipelineResults" : [
{
"_id" : ObjectId("5b313241120e4bc08ce87e46")
},
//....
]
}
Then you have to use $unwind to transform the arrays into multiple documents, and $replaceRoot with $mergeObjects to promote the regular pipeline results to the root level.
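As a rough illustration of what the $unwind and $replaceRoot stages do to that single document, here is a plain JavaScript sketch of the same reshaping. The function flattenFacetResult and the sample _id values are hypothetical; this only mirrors the logic in memory and is not a MongoDB API.

```javascript
// Shape of the single document produced by the $facet stage
// (the _id values here are made up for illustration).
const facetResult = {
  totalCount: [{ value: 3 }],
  pipelineResults: [{ _id: "a" }, { _id: "b" }, { _id: "c" }],
};

// Hypothetical in-memory equivalent of the two $unwind stages followed by
// $replaceRoot + $mergeObjects: one output document per pipeline result,
// each carrying the total count.
function flattenFacetResult(doc) {
  return doc.totalCount.flatMap((count) =>
    doc.pipelineResults.map((result) => ({ ...result, totalCount: count.value }))
  );
}

const merged = flattenFacetResult(facetResult);
```

Note that when either array is empty the sketch returns no documents at all, which matches how $unwind drops documents with empty arrays by default.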

Since MongoDB 5.0 there is another option that avoids the main disadvantage of $facet, namely that all returned documents are grouped into one big document. The concern is that a single document has a size limit of 16 MB. Using $setWindowFields avoids this.
This can simply replace @micki's four stages:
db.col.aggregate([
{$setWindowFields: {output: {totalCount: {$count: {}}}}}
])

Compare document array size to other document field

The document might look like:
{
_id: 'abc',
programId: 'xyz',
enrollment: 'open',
people: ['a', 'b', 'c'],
maxPeople: 5
}
I need to return all documents where enrollment is open and the length of people is less than maxPeople
I got this to work with $where:
const
exists = ['enrollment', 'maxPeople', 'people'],
query = _.reduce(exists, (existsQuery, field) => {
existsQuery[field] = {'$exists': true}; return existsQuery;
}, {});
query['$and'] = [{enrollment: 'open'}];
query['$where'] = 'this.people.length<this.maxPeople';
return db.coll.find(query, {fields: {programId: 1, maxPeople: 1, people: 1}});
But could I do this with aggregation, and why would it be better?
Also, if aggregation is better/faster, I don't understand how I could convert the above query to use aggregation. I'm stuck at:
db.coll.aggregate([
{$project: {ab: {$cmp: ['$maxPeople','$someHowComputePeopleLength']}}},
{$match: {ab:{$gt:0}}}
]);
UPDATE:
Based on @chridam's answer, I was able to implement a solution like the one below. Note the $and in the $match, for those who need a similar query:
return Coll.aggregate([
{
$match: {
$and: [
{"enrollment": "open"},
{"times.start.dateTime": {$gte: new Date()}}
]
}
},
{
"$redact": {
"$cond": [
{"$lt": [{"$size": "$students" }, "$maxStudents" ] },
"$$KEEP",
"$$PRUNE"
]
}
}
]);

The $redact pipeline stage in the aggregation framework should work for you in this case. It recursively descends through the document structure and acts on each level based on an evaluation of the specified conditions. The concept can be a bit tricky to grasp, but basically the stage lets you process a logical condition with the $cond operator and use the system variable $$KEEP to "keep" the document where the condition is true or $$PRUNE to "remove" the document where it is false.
This is similar to a $project stage that selects the fields in the collection and creates a new field holding the result of the logical condition, followed by a $match, except that $redact does it in a single pipeline stage, which restricts the contents of the result set based on the access required to view the data and is more efficient.
To run a query on all documents where enrollment is open and the length of people is less than maxPeople, include a $redact stage as follows:
db.coll.aggregate([
{ "$match": { "enrollment": "open" } },
{
"$redact": {
"$cond": [
{ "$lt": [ { "$size": "$people" }, "$maxPeople" ] },
"$$KEEP",
"$$PRUNE"
]
}
}
])
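On MongoDB 3.6 or later, the same condition can also be written with $expr inside an ordinary filter. This is a different technique from $redact, sketched here as an alternative rather than as what the answer above proposes. The variable name openWithRoomFilter is made up, and the usage lines in the comment are assumptions.

```javascript
// Sketch: enrollment is open and the people array is shorter than maxPeople,
// expressed with $expr (available since MongoDB 3.6).
const openWithRoomFilter = {
  enrollment: "open",
  $expr: { $lt: [{ $size: "$people" }, "$maxPeople"] },
};

// Assumed usage in the shell, with the collection name from the question:
//   db.coll.find(openWithRoomFilter)
//   db.coll.aggregate([{ $match: openWithRoomFilter }])
```

As with the $redact version, $size raises an error when people is not an array, so documents missing that field may need to be filtered out first.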

You can use:
one $project that creates a new field holding the result of comparing maxPeople with the size of the people array
one $match that checks the comparison result and that enrollment is open
The query is:
db.coll.aggregate([{
$project: {
_id: 1,
programId: 1,
enrollment: 1,
cmp: {
$cmp: ["$maxPeople", { $size: "$people" }]
}
}
}, {
$match: {
$and: [
{ cmp: { $gt: 0 } },
{ enrollment: "open" }
]
}
}])

$match operator for sub document field in MongoDb

I am trying out MongoDB's new aggregation pipeline, so I tried to execute the query below.
{
aggregate: 'Posts',
pipeline: [
{ $unwind: '$Comments'},
{ $match: {'$Comments.Owner': 'Harry' }},
{$group: {
'_id': '$Comments._id'
}
}
]
}
Nothing matches the query, so an empty result is returned. I guess the problem is in the $match stage. I am using dot notation to match the comment Owner, but I am not sure whether that is correct. Why does this query not return the comments whose Owner is 'Harry'? I am sure they exist in the database.

You don't use the $ prefix for the $match field names.
Try this:
{
aggregate: 'Posts',
pipeline: [
{ $unwind: '$Comments'},
{ $match: {'Comments.Owner': 'Harry' }},
{ $group: {
'_id': '$Comments._id'
}}
]
}

I encountered the same problem with the aggregation framework in MongoDB 2.2.
$match didn't work for me on a subdocument (but I am just learning MongoDB, so I could be doing something wrong).
I added an extra projection to remove the subdocument (Comments in this case):
{
aggregate: 'Posts',
pipeline: [
{ $unwind: '$Comments'},
{ $project: {
comment_id: "$Comments._id",
comment_owner: "$Comments.Owner"
}},
{ $match: {'comment_owner': 'Harry' }},
{$group: {
'_id': '$comment_id'
}
}
]
}

We Keep Coding
