Mongo Aggregate not using Index - mongodb

My mongo find query is using an index, but the same functionality if I am implementing using aggregate, it is not using the Index.
db.collection1.find({Attribute8: "s1000",Attribute9: "s1000"}).sort({Attribute10: 1})
"cursor used in find" : "BtreeCursor Attribute8_1_Attribute9_1_Attribute10_1"
db.collection1.aggregate([
{
$match: {
Attribute8: "s1000",
Attribute9: "s1000"
}
},
{
$sort: {
Attribute10: 1
}
}
])
"cursor used in aggregate" : "BtreeCursor ".
Can someone tell me where it went wrong. My goal is to use Indexes in aggregate method.
Thanks in advance.

After some digging the issue is the limitation of usage of the following types:
Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope
In this case Symbol is used for containing a string value, so index wont work.
Please try with a Number en set explain to true in the aggregate option.
[EDIT]
My previous answer is incorrect.
The aggregation pipeline is using a 'BtreeCursor' (only when the defined field has an index) to run the $match query and does uses the ensured index, check "indexBound" for verification.
Ensuring the whole collection to have an index on "Attribute08"
db.temps.ensureIndex({Attribute08:1})
$match on a field with an index:
db.temps.aggregate([{$match:{Attribute08:"s1000"}}],{explain:true})
"allPlans" : [
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"scanAndOrder" : false,
"indexBounds" : {
"Attribute08" : [
[
"s1000",
"s1000"
]
]
}
}
]
Below the $match on a field without index:
db.temps.aggregate([{$match:{Attribute09:"s1000"}}],{explain:true})
"allPlans" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"scanAndOrder" : false
}
]

Related

mongo sort before match in aggregation

Given a collection of a few million documents that look like:
{
organization: ObjectId("6a55b2f1aae2fe0ddd525828"),
updated_on: 2019-04-18 14:08:48.781Z
}
and 2 indices, on both keys {organization: 1} and {updated_on: 1}
The following query takes ages to return:
db.getCollection('sessions').aggregate([
{
"$match" : {
"organization" : ObjectId("5a55b2f1aae2fe0ddd525827"),
}
},
{
"$sort" : {
"updated_on" : 1
}
}
])
One thing to note is, the result is 0 matches. Upon further investigation, the planner in explain() actually returns the following:
{
"stage" : "FETCH",
"filter" : {
"organization" : {
"$eq" : ObjectId("5a55b2f1aae2fe0ddd525827")
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"updated_on" : 1.0
},
"indexName" : "updated_on_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"updated_on" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"updated_on" : [
"[MinKey, MaxKey]"
]
}
}
}
Why would Mongo combine these into one stage and decide to to sort ALL documents BEFORE filtering?
How can I prevent that?
Why would Mongo combine these into one stage and decide to to sort ALL
documents BEFORE filtering? How can I prevent that?
The sort does happen after the match stage. The query plan doesn't show the SORT stage - that is because there is an index on the sort key updated_on. If you remove the index on the updated_on field you will see a SORT stage in the query plan (and it will be an in-memory sort).
See Explain Results - sort stage.
Some Ideas:
(i) You can use a compound index, instead of a two single field indexes:
{ organization: 1, updated_on: 1 }
It will work fine. See this topic on Sort and Non-prefix Subset of an Index.
(ii) Also, instead of an aggregation, a find() query can do the same job:
db.test.find( { organization : ObjectId("5a55b2f1aae2fe0ddd525827") } ).sort( { updated_on: 1 } )
NOTE: Do verify with explain() and see how they perform. Also, try using the executionStats mode.
MongoDB will use the index on to $sort because it's a heavy operation even if matching before will limit the result to be sorted,
You can either force using the index for $match:
db.collection.aggregate(pipeline, {hint: "index_name"})
Or create a better index to solve both problems, see more informations here
db.collection.createIndex({organization: 1, updated_on:1}, {background: true})

Using Mongo compound indexes

I'm trying to create a compound index for a popular query.
There are two fields. A 'client' field of type String, containing an IP address. The other is a 'sendOn' field which is of type Date. I'm looking for documents where the client is null and the sendOn is between a certain range.
To support ascending sorting on the sendOn field, I've established that I need a sendOn index with a value of 1.
So, I've run
> db.queries.ensureIndex({"client":1,"sendOn":1})
> db.queries.find({ $query: { client: { $exists: false } , sendOn: { $gt: new Date(1387664033883), $lt: new Date(1387750493883) } } }).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 2546133,
"nscannedObjects" : 2546133,
"n" : 0,
"millis" : 25071,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
The documentation says that $exists queries are generally inefficient. I've considered forcing documents to contain at least an empty client field. But that query doesn't use the index either?
It should use the index (provided you replace the $exists by a null value, as you mentioned - the $exists operator is not optimized for indexes).
You may be messing things up here at some point : documentation states you shouldn't use .explain() with a $query format :
http://docs.mongodb.org/manual/reference/operator/meta/query/
Try this :
db.queries.find({ $query: { client: null , sendOn: { $gt: new Date(1387664033883), $lt: new Date(1387750493883) } }, $explain: true });

Mongod - IndexOnly is false even though query and projection are both covered in the index

The mongo documentation for covered querieshere talks about the queries and projections and to simply turn off the _id field in the projection if you want a covered query. What if you need the _id field though and still want the efficiency of a covered query (indexOnly = True)?
db.collection.ensureIndex({field1:1,_id:1})
db.collection.getIndexKeys()
[{
"_id" : 1
},
{
"field1" : 1
},
{
"field1" : 1,
"_id" : 1
}]
db.collection.find({field1:{$regex:/^\s/}},{field1:1,_id:1}).explain()
{
"cursor" : "BtreeCursor fieldname",
"isMultiKey" : false,
"n" : 3582,
"nscannedObjects" : 3582,
"nscanned" : 130511408,
"nscannedObjectsAllPlans" : 3582,
"nscannedAllPlans" : 130511408,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 20,
"nChunkSkips" : 0,
"millis" : 158705,
"indexBounds" : {
"cdr3_aa" : [
[
"",
{
}
]
]
},
"server" : localhost}
Of course if I turn off _id on the projection, IndexOnly returns true and the query is lightning fast. What am I doing wrong?
EDIT - I made it more efficient by getting rid of case insensitivity on a space, adding a ^ to speed up the query, but IndexOnly : False. I don't understand why its not true.
From documentation:
$regex can only use an index efficiently when the regular expression
has an anchor for the beginning (i.e. ^) of a string and is a
case-sensitive match. Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent
strings, they have different performance characteristics. All of these expressions use an
index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can
stop scanning after matching the prefix.
In your case you use regex with i which means case-insensitive match. So, you should remove i from regex and start to search from the beginning of field.
BTW, I don't undestand your search criteria: Looking for one space char \s in the field with case-insensitive?
Not quite sure why, but it seems as though your query is using the { field1: 1 } index instead of { field1: 1, _id: 1 } index. Can you try running the query with the hint?
db.collection.find( { field1: {$regex:/^\s/} }, { field1: 1, _id: 1 } ).hint( { field1: 1, _id: 1 } ).explain()
It could be that the query optimizer has selected the { field1: 1 } index initially and has not re-evaluated the various plans. See http://docs.mongodb.org/manual/core/query-plans/ for explanation of the query optimizer and how it selects a plan.

Unable to get a covered query to use index only in mongodb

I am trying to use a covering index to implement stemming text search on my app which uses mongodb.
I've got the following index set:
ensureIndex({st: 1, n: 1, _id: 1});
But when I run explain() on my query, I can never get the indexOnly to read true, no matter what I do.
db.merchants.find({st: "Blue"}, {n:1,_id:1}).explain()
{
"cursor" : "BtreeCursor st_1_n_1__id_1",
"nscanned" : 8,
"nscannedObjects" : 8,
"n" : 8,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"st" : [
[
"Blue",
"Blue"
]
],
"n" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
I've already figured out that the ordering of the keys in the index matter somehow. For instance if I used {_id, n:1, st:1} it wasn't using this index at all to perform the query. I also read somewhere that too few documents could trigger unpredictable behaviour with explain() since multiple strategies are equally fast. But in this case, I see that its using the right index, but its not using just the index. What is this happening?
I am using mongoid, and mongo 2.0.8 I believe.
UPDATE:
Switched over to using Mongoid v3.1.4 and mongod v2.2
Here is the query that mongod is seeing from mongoid: Mon Jul 15 10:47:26 [conn14] runQuery called spl_development.merchants { $query: { st: { $regex: "cr", $options: "i" } }, $explain: true } Mon Jul 15 10:47:26 [conn14] query spl_development.merchants query: { $query: { st: { $regex: "cr", $options: "i" } }, $explain: true } ntoreturn:0 keyUpdates:0 locks(micros) r:212 nreturned:1 reslen:393 0ms
So the projection isn't being sent to the mongod layer and only just handles it in the application layer. Not ideal!
This has been recognized as a bug in mongoid and can be tracked here:
https://github.com/mongoid/mongoid/issues/3142
I expect your query cannot use a covered index because you have a field with an array included in the index. This is suggested in the explain with "isMultiKey" : true.
As noted in the documentation (Create Indexes that Support Covered Queries):
MongoDB cannot use a covered query if any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index and cannot support a covered query.
I wasn't able to reproduce the problem in 2.2.2, but add .sort({n: 1, _id: 1}) into the chain. Because you're not sorting, you're asking for the docs in whatever find order mongo wishes to use, and if that doesn't match the order in the index ($natural order, for instance) it still has to read the docs.
db.merchants.find({st: "Blue"}, {n:1,_id:1}).sort({n: 1, _id: 1}).explain()

mongodb index an array's key (not the values)

In MongoDB, I have the following document
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": {
"blabla": 20,
"blibli": 100,
"blublu": 250,
... (many more here)
}
}
And I would like to index it to be able to query for the "key" of the "concept" array (I know it's not really a mongoDB array...):
db.things.find({concepts:blabla});
Is it possible with the above schema? Or shall I refactor my documents to something like
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": ["blabla","blibli","blublu", ... (many more here)]
}
}
I'll answer your actual question. No you cannot index on the field names given your current schema. $exists uses an index but that is an existence check only.
There are a lot of problems with a schema like the one you're using and I would suggest a refactor to :
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": [
{name:"blabla", value: 20},
{name:"blibli", value: 100},
{name:"blublu", value: 250},
... (many more here)
]
}
then index {'concepts.name:1'} and you can actually query on the concept names rather than just check for the existence.
TL;DR : No you can't.
You can query field presence with specific query:
db.your_collection.find({"concept.yourfield": { $exists: true }})
(notice the $exists)
It will return all your document where yourfield is a field of concept subdocument
edit:
this solution is only about query. Indexes contains values not field.
MongoDB indexes each value of the array so you can query for individual items.As you can find here.
But in nested arrays you need to tell to index mongodb to index your sub-fields.
db.col1.ensureIndex({'concepts.blabla':1})
db.col1.ensureIndex({'concepts.blublu':1})
db.col1.find({'concepts.blabla': 20}).explain()
{
"cursor" : "BtreeCursor concepts.blabla_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"concepts.blabla" : [
[
20,
20
]
]
}
}
After creating the index , the cursor type changes itself from BasicCursor to BtreeCursor.
if you create your document as you stated at the end of your question
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": ["blabla","blibli","blublu", ... (many more here)]
}
}
just the indexing will be enough as below:
db.col1.ensureIndex({'concepts':1})