How does mongodb use an index to count documents? - mongodb

According to docs, db.collection.countDocuments() wraps this:
db.collection.aggregate([
{ $match: <query> },
{ $group: { _id: null, n: { $sum: 1 } } }
])
Even if there is an index, all of the matched docs will be passed into the $group to be counted, no?
If not, how is mongodb able to count the docs without processing all matching docs?

The MongoDB query planner can make some optimizations.
In that sample aggregation, it can see that no fields are required except for the ones referenced in <query>, so it can add an implicit $project stage to select only those fields.
If those fields and the _id are all included in a single index, there is no need to fetch the documents to execute that query, all the necessary information is available from the index.

Related

Why sort document by id is slower with $match than not in mongodb?

So, I tried to query
db.collection('collection_name').aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}])
the query above takes up 20s
but If I tried to query
db.collection('collection_name').aggregate([{$sort : {_id : -1}}])
it's only take 0.7s
Why does it the one without $match is actually faster than without match ?
update :
when I try this query
db.getCollection('callbackvirtualaccounts').aggregate([
{
$match: { owner_id: '5860457640b4fe652bd9c3eb' }
},
{
$sort: { created: -1 }
}
])
it's only takes 0.781s
Why sort by _id is slower than by created field ?
note : I'm using mongodb v3.0.0
db.collection('collection_name').aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}])
This collection probably won't be having and index on owner_id; Try using below mentioned index creation query and rerun your previous code.
db.collection('collection_name').createIndexes({ owner_id:1}) //Simple Index
or
db.collection('collection_name').createIndexes({ owner_id:1,_id:-1}) //Compound Index
**Note:: If you don't know how to compound index yet, you can create simple indexes individually on all keys which are used either in match or sort and that should be making query efficient as well.
The query speed depends upon a lot of factors. The size of collection, size of the document, indexes defined on the collection (and used in the queries and properly), the hardware components (like CPU, RAM, network) and other processes running at the time the query is running.
You have to tell what indexes are defined on the collection being discussed for further analysis. The command will retrieve them: db.collection.getIndexes()
Note the unique index on the _id field is created by default, and cannot be modified or deleted.
(i)
But If I tried to query: db.collection.aggregate( [ { $sort : { _id : -1 } } ] ) it's
only take 0.7s.
The query is faster because there is an index on the _id field and it is used in sort process. Aggregation queries use indexes with sort stage and when this sort happens early in the pipeline. You can verify if the index is used or not by generating a query plan (use explain with executionStats mode). There will be an index scan (IXSCAN) in the generated query plan.
(ii)
db.collection.aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}
])
The query above takes up 20s.
When I try this query it's only takes 0.781s.
db.collection.aggregate([
{
$match: { owner_id: '5860457640b4fe652bd9c3eb' }
},
{
$sort: { created: -1 }
}
])
Why sort by _id is slower than by created field ?
Cannot come to any conclusions with the available information. In general, the $match and $sort stages present early in the aggregation query can use any indexes created on the fields used in the operations.
Generating a query plan will reveal what the issues are.
Please run the explain with executionStats mode and post the query plan details for all queries in question. There is documentation for Mongodb v3.0.0 version on generation query plans using explain: db.collection.explain()

Trying to select single documents from mongo collection

We have a rudimentary versioning system in a collection that uses a field (pageId) as a root key. Subsequent versions of this page have the same pageId. This allows us to very easily find all versions of a single page.
How do I go about running a query that returns only the lastModified document for each distinct pageId.
In psuedo-code you could say:
For each distinct pageId
sort documents based on lastModified descending
and return only the first document
You can use the aggregation pipelines for that.
$sort - Sorts all input documents and returns them to the pipeline in sorted order.
$group - Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping.
$first - Returns the value that results from applying an expression to the first document in a group of documents that share the same group by key.
Example:
db.getCollection('t01').aggregate([
{
$sort: {'lastModified': -1}
},
{
$group: {
_id: "$pageId",
element1: { $first: "$element1" },
element2: { $first: "$element2" },
elementN: { $first: "$elementN" },
}
}
]);

Aggregate method in MongoDB Compass?

as stated in the title i'm having some problems querying from MongoDB Compass using the aggregate methhod. I have a collection of documents in this form:
{"Array":[{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},...]}
using mongo shell or Studio 3T software I query it with aggregate method, follows an example:
db.collection.aggregate([
{ $match: {"Array.field": "val"}},
{ $unwind: "$Array"},
{ $match: {"Array.field": "val"}},
{ $group: {_id: null, count: {$sum:NumberInt(1)}, Array: {$push: "$Array"}}},
{ $project: {"N. Hits": "$count", Array:1}}
])
where I look for elements of Array who has field's value = "val" and count them. This works perfectly, but I don't know how to do the same in MongoDB Compass
in the query bar I have 'filter', 'project' and 'sort' and I can do usual queries, but i don't know how to use aggregate method.
Thanks
You are looking at the Documents tab which is restricted for querying documents.
Take a look in the second tab called Aggregations where you can do your aggregation pipelines, as usual.
For further information please visit the Aggregation Pipeline Builder documentation.

SELECT avg(rate) FROM ratings WHERE sid=1 in MongoDB

How to implement equivalent of this SQL command in MongoDB?
SELECT avg(rate) FROM ratings WHERE sid=1
No need to grouping.
Yes there is aggregation framework in mongodb where you can make a pipeline of stages you want for query.
db.collection.aggregate([
{
$match: {
"sid": 1
}
},
{
$project: avg(rate): {
$avg: "$rate"
}
}
])
As you know in sql query where part is applied first that's why we've place $match pipeline at first. $match in mongodb is somehow equivalent to where i SQL and there is $avg in mongodb which works the same as AVG in SQL
To solve this, use $avg within the $group aggregation pipeline element. Basic pipeline flow:
match on sid=1 (your WHERE clause)
group by sid (there's only one sid to group by at this point, because the others are filtered out via match), and generate an average within the group'd content
Your pipeline would look something like:
db.rates.aggregate(
[
{ $match: {"sid":1}},
{ $group: { _id: "$sid", rateAvg: {$avg: "$rate" } }}
])

Refine/Restructure data from Mongodb query

Im using NodeJs, MongoDB Native 2.0+
The following query fetch one client document containing arrays of embedded staff and services.
db.collection('clients').findOne({_id: sessId}, {"services._id": 1, "staff": {$elemMatch: {_id: reqId}}}, callback)
Return a result like this:
{
_id: "5422c33675d96d581e09e4ca",
staff:[
{
name: "Anders"
_id: "5458d0aa69d6f72418969428"
// More fields not relevant to the question...
}
],
services: [
{
_id: "54578da02b1c54e40fc3d7c6"
},
{
_id: "54578da42b1c54e40fc3d7c7"
},
{
_id: "54578da92b1c54e40fc3d7c9"
}
]
}
Note that each embedded object in services actually contains several fields, but _id is the only field returned by means of the projection of the query.
From this returned data I start by "pluck" all id's from services and save them in an array later used for validation. This is by no means a difficult operation... but I'm curious... Is there an easy way to do some kind of aggregation instead of find, to get an array of already plucked objectId's directly from the DB. Something like this:
{
_id: "5422c33675d96d581e09e4ca",
staff:[
{
name: "Anders"
_id: "5458d0aa69d6f72418969428"
// More fields not relevant to the question...
}
],
services: [
"54578da02b1c54e40fc3d7c6",
"54578da42b1c54e40fc3d7c7",
"54578da92b1c54e40fc3d7c9"
]
}
One way of doing it is to first,
$unwind the document based on the staff field, this is done to
select the intended staff. This step is required due to the
unavailability of the $elemMatch operator in the aggregation
framework.
There is an open ticket here: Jira
Once the document with the correct staff is selected, $unwind, based on $services.
The $group, together $pushing all the services _id together in an array.
This is then followed by a $project operator, to show the intended fields.
db.clients.aggregate([
{$match:{"_id":sessId}},
{$unwind:"$staff"},
{$match:{"staff._id":reqId}},
{$unwind:"$services"},
{$group:{"_id":"$_id","services_id":{$push:"$services._id"},"staff":{$first:"$staff"}}},
{$project:{"services_id":1,"staff":1}}
])