Unable to get a covered query to use index only in MongoDB

I am trying to use a covering index to implement stemming text search in my app, which uses MongoDB.
I've created the following index:
db.merchants.ensureIndex({st: 1, n: 1, _id: 1});
But when I run explain() on my query, I can never get indexOnly to read true, no matter what I do.
db.merchants.find({st: "Blue"}, {n:1,_id:1}).explain()
{
"cursor" : "BtreeCursor st_1_n_1__id_1",
"nscanned" : 8,
"nscannedObjects" : 8,
"n" : 8,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"st" : [
[
"Blue",
"Blue"
]
],
"n" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
I've already figured out that the ordering of the keys in the index matters somehow. For instance, if I used {_id: 1, n: 1, st: 1}, it didn't use this index at all to perform the query. I also read somewhere that too few documents can make explain() behave unpredictably, since multiple strategies are equally fast. But in this case I can see that it's using the right index, just not using only the index. Why is this happening?
I am using Mongoid and, I believe, MongoDB 2.0.8.
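The key-order observation above follows from how a compound index is sorted: the leading run of key fields that the query constrains by equality determines how narrow the scanned range of the index can be. A key pattern whose first field is not in the query has no such run, which is consistent with {_id: 1, n: 1, st: 1} not being chosen for a query on st. The helper below is a hypothetical sketch of that prefix rule only; it ignores range operators, sorting and the real query planner.

```javascript
// Hypothetical helper (not a MongoDB API): count how many leading fields of an
// index key pattern are constrained by the query. The scanned index range can
// only be narrowed along this leading run; the explain output above shows the
// unconstrained later fields (n, _id) bounded by $minElement..$maxElement.
function constrainedPrefixLength(keyPattern, query) {
  let count = 0;
  for (const field of Object.keys(keyPattern)) {
    if (!(field in query)) break; // first gap ends the usable prefix
    count++;
  }
  return count;
}

const query = { st: "Blue" };
console.log(constrainedPrefixLength({ st: 1, n: 1, _id: 1 }, query)); // → 1
console.log(constrainedPrefixLength({ _id: 1, n: 1, st: 1 }, query)); // → 0
```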
UPDATE:
I switched over to Mongoid v3.1.4 and mongod v2.2.
Here is the query that mongod receives from Mongoid:
Mon Jul 15 10:47:26 [conn14] runQuery called spl_development.merchants { $query: { st: { $regex: "cr", $options: "i" } }, $explain: true }
Mon Jul 15 10:47:26 [conn14] query spl_development.merchants query: { $query: { st: { $regex: "cr", $options: "i" } }, $explain: true } ntoreturn:0 keyUpdates:0 locks(micros) r:212 nreturned:1 reslen:393 0ms
So the projection isn't being sent to mongod at all; Mongoid only applies it in the application layer. Not ideal!
This has been recognized as a bug in mongoid and can be tracked here:
https://github.com/mongoid/mongoid/issues/3142

I expect your query cannot be covered because one of the fields included in the index holds an array. The explain output suggests this with "isMultiKey" : true.
As noted in the documentation (Create Indexes that Support Covered Queries):
MongoDB cannot use a covered query if any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index and cannot support a covered query.
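The rule quoted above can be sketched as a check over the collection's documents: if any document stores an array anywhere along any indexed path, the index is multikey and the query cannot be covered. The helpers `pathHitsArray`, `isMultiKey` and `canBeCovered` are hypothetical names for this illustration, not MongoDB APIs. The sample data assumes that st holds an array of stems in at least one document, which "isMultiKey" : true suggests but the question does not confirm.

```javascript
// Hypothetical helper: walk a dotted path ("a.b.c") and report whether any step
// along the way, or the final value, is an array.
function pathHitsArray(doc, dottedPath) {
  let current = doc;
  for (const part of dottedPath.split(".")) {
    if (Array.isArray(current)) return true; // array encountered mid-path
    if (current === null || typeof current !== "object") return false;
    current = current[part];
  }
  return Array.isArray(current);
}

// An index is multikey as soon as ONE document has an array in ANY indexed field.
function isMultiKey(docs, keyPattern) {
  const fields = Object.keys(keyPattern);
  return docs.some((doc) => fields.some((f) => pathHitsArray(doc, f)));
}

// Per the documentation quoted above, a multikey index cannot cover a query.
function canBeCovered(docs, keyPattern) {
  return !isMultiKey(docs, keyPattern);
}

// Assumed sample data: the second document stores st as an array.
const merchants = [
  { _id: 1, st: "Blue", n: "Blue Bottle" },
  { _id: 2, st: ["Blue", "Coffee"], n: "Blue Coffee" },
];
console.log(isMultiKey(merchants, { st: 1, n: 1, _id: 1 })); // → true
```

One array-valued document is enough to flip the whole index to multikey, which is why the fix is usually in the data model rather than in the query.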

I wasn't able to reproduce the problem in 2.2.2, but try adding .sort({n: 1, _id: 1}) to the chain. Because you're not sorting, you're asking for the docs in whatever order MongoDB chooses, and if that doesn't match the order in the index ($natural order, for instance), it still has to read the docs.
db.merchants.find({st: "Blue"}, {n:1,_id:1}).sort({n: 1, _id: 1}).explain()

Related

MongoDB Index boundary constraints

While working hands-on with MongoDB, I came across a problem with MongoDB indexes: they sometimes don't enforce both ends of a range query's bounds. Here is one of the outputs I encountered while querying the database:
Query:
db.user.find({transaction:{$elemMatch:{product:"mobile", firstTransaction:{$gte:ISODate("2015-01-01"), $lt:ISODate("2015-01-02")}}}}).hint("transaction.product_1_transaction.firstTransaction_1").explain()
Output:
"cursor" : "BtreeCursor transaction.firstTransaction_1_transaction.product_1",
"isMultiKey" : true,
"n" : 622,
"nscannedObjects" : 350931,
"nscanned" : 6188185,
"nscannedObjectsAllPlans" : 350931,
"nscannedAllPlans" : 6188185,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 235851,
"nChunkSkips" : 0,
"millis" : 407579,
"indexBounds" : {
"transaction.firstTransaction" : [
[
true,
ISODate("2015-01-02T00:00:00Z")
]
],
"transaction.product" : [
[
"mobile",
"mobile"
]
]
},
As you can see in the example above, for the firstTransaction field one end of the bound is true instead of the date I specified. I found that the workaround for this is the min() and max() functions. I tried those, but they don't seem to work with embedded documents (transaction is an array of sub-documents containing fields like firstTransaction, product, etc.). I get the following error:
Query:
db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).min({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-01")}}}).max({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-02")}}})
Output:
planner returned error: unable to find relevant index for max/min query
The firstTransaction field is indexed, though, as is product, and there is a compound index on both. I don't know what is going wrong here.
Sample document:
{
_id: UUID (indexed by default),
name: string,
dob: ISODate,
addr: string,
createdAt: ISODate (indexed),
.
.
.,
transaction:[
{
firstTransaction: ISODate(indexed),
lastTransaction: ISODate(indexed),
amount: float,
product: string (indexed),
.
.
.
},...
],
other sub documents...
}
This is the correct behavior. You cannot always intersect the index bounds for $lte and $gte; sometimes doing so would give incorrect results. For example, consider the document
{ "x" : [{ "a" : [4, 6] }] }
This document matches the query
db.test.find({ "x" : { "$elemMatch" : { "a" : { "$gte" : 5, "$lte" : 5 } } } });
If we define an index on { "x.a" : 1 }, the two index bounds would be [5, infinity] and [-infinity, 5]. Intersecting them would give [5, 5], but this document's index keys are 4 and 6, neither of which falls in [5, 5], so using the intersected bound would incorrectly miss the document.
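The counterexample can be checked with a small simulation. This is an illustration, not MongoDB's matcher: `elemMatchRange` models how { $gte: lo, $lte: hi } is evaluated on an array field inside $elemMatch (each operator may be satisfied by a different value of a), and `indexKeys` models the multikey entries of an index on { "x.a" : 1 }. Both helpers are hypothetical and only handle this document shape.

```javascript
// Normalize a field value to a list: an array contributes each of its elements.
function valuesOf(v) {
  return Array.isArray(v) ? v : [v];
}

// Models { x: { $elemMatch: { a: { $gte: lo, $lte: hi } } } } for this shape:
// some element of x must satisfy both predicates on `a`, but since `a` is an
// array, each predicate may be satisfied by a different value of `a`.
function elemMatchRange(doc, lo, hi) {
  return doc.x.some((elem) => {
    const vals = valuesOf(elem.a);
    return vals.some((v) => v >= lo) && vals.some((v) => v <= hi);
  });
}

// Multikey index on { "x.a": 1 }: one index key per value of x.a.
function indexKeys(doc) {
  return doc.x.flatMap((elem) => valuesOf(elem.a));
}

// Would scanning the closed interval [lo, hi] of the index find this document?
function foundByBounds(doc, lo, hi) {
  return indexKeys(doc).some((k) => k >= lo && k <= hi);
}

const doc = { x: [{ a: [4, 6] }] };
console.log(elemMatchRange(doc, 5, 5));       // → true (6 >= 5 and 4 <= 5)
console.log(indexKeys(doc));                  // → [ 4, 6 ]
console.log(foundByBounds(doc, 5, 5));        // → false (intersected bounds miss it)
console.log(foundByBounds(doc, 5, Infinity)); // → true (a single one-sided bound keeps it)
```

Scanning with only one of the two bounds and then applying the full predicate to each candidate document gives the correct answer, at the cost of the wider scan seen in the question's explain output.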
Can you provide a sample document and tell us more about what you're trying to do with the query? With context, there may be another way to write the query that uses tighter index bounds.

Mongo Aggregate not using Index

My mongo find query uses an index, but when I implement the same functionality with aggregate, it does not use the index.
db.collection1.find({Attribute8: "s1000",Attribute9: "s1000"}).sort({Attribute10: 1})
"cursor used in find" : "BtreeCursor Attribute8_1_Attribute9_1_Attribute10_1"
db.collection1.aggregate([
{
$match: {
Attribute8: "s1000",
Attribute9: "s1000"
}
},
{
$sort: {
Attribute10: 1
}
}
])
"cursor used in aggregate" : "BtreeCursor ".
Can someone tell me where it went wrong? My goal is to use indexes in the aggregate method.
Thanks in advance.
After some digging, the issue is a limitation on the use of the following types:
Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope
In this case Symbol is used to contain a string value, so the index won't work.
Please try with a Number and set explain to true in the aggregate options.
[EDIT]
My previous answer is incorrect.
The aggregation pipeline does use a 'BtreeCursor' (only when the matched field has an index) to run the $match query, and it does use the index you created; check "indexBounds" to verify.
Creating an index on "Attribute08" for the whole collection:
db.temps.ensureIndex({Attribute08:1})
$match on a field with an index:
db.temps.aggregate([{$match:{Attribute08:"s1000"}}],{explain:true})
"allPlans" : [
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"scanAndOrder" : false,
"indexBounds" : {
"Attribute08" : [
[
"s1000",
"s1000"
]
]
}
}
]
Below, the $match on a field without an index:
db.temps.aggregate([{$match:{Attribute09:"s1000"}}],{explain:true})
"allPlans" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"scanAndOrder" : false
}
]

Mongod - IndexOnly is false even though query and projection are both covered in the index

The MongoDB documentation for covered queries discusses queries and projections and says to simply exclude the _id field in the projection if you want a covered query. What if you need the _id field, though, and still want the efficiency of a covered query (indexOnly = true)?
db.collection.ensureIndex({field1:1,_id:1})
db.collection.getIndexKeys()
[{
"_id" : 1
},
{
"field1" : 1
},
{
"field1" : 1,
"_id" : 1
}]
db.collection.find({field1:{$regex:/^\s/}},{field1:1,_id:1}).explain()
{
"cursor" : "BtreeCursor fieldname",
"isMultiKey" : false,
"n" : 3582,
"nscannedObjects" : 3582,
"nscanned" : 130511408,
"nscannedObjectsAllPlans" : 3582,
"nscannedAllPlans" : 130511408,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 20,
"nChunkSkips" : 0,
"millis" : 158705,
"indexBounds" : {
"cdr3_aa" : [
[
"",
{
}
]
]
},
"server" : localhost}
Of course if I turn off _id on the projection, IndexOnly returns true and the query is lightning fast. What am I doing wrong?
EDIT: I made it more efficient by dropping the case-insensitive flag (pointless when matching a space) and adding a ^ anchor to speed up the query, but indexOnly is still false. I don't understand why it's not true.
From the documentation:
$regex can only use an index efficiently when the regular expression has an anchor for the beginning (i.e. ^) of a string and is a case-sensitive match. Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
In your case you use the regex with the i flag, which means a case-insensitive match. So you should remove i from the regex and anchor the search at the beginning of the field.
BTW, I don't understand your search criteria: are you looking for a single whitespace character (\s) in the field, case-insensitively?
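The documentation's point about anchors and case sensitivity can be sketched like this: a case-sensitive regex that starts with ^ followed by literal characters fixes a prefix, and all strings with that prefix sit in one contiguous range of a string index. The helper below is hypothetical and deliberately conservative, not MongoDB's planner logic: it only takes plain letters and digits as the prefix, and anything else (the i or m flag, no anchor, an alternation, or a class such as \s) yields the full string range. That full range is consistent with the ["", {}] bounds in the explain output in the question.

```javascript
// Hypothetical sketch: derive string-index bounds from a regex.
// Returns { lo, hi } meaning lo <= key < hi; hi === null means "no upper bound".
function regexBounds(re) {
  const FULL = { lo: "", hi: null }; // every string key must be scanned
  if (re.flags.includes("i") || re.flags.includes("m")) return FULL;
  const src = re.source;
  if (!src.startsWith("^") || src.includes("|")) return FULL;
  // Collect literal characters after ^ (conservatively: letters and digits only).
  let prefix = "";
  for (const ch of src.slice(1)) {
    if (!/[A-Za-z0-9]/.test(ch)) break;
    prefix += ch;
  }
  // A quantifier that allows zero repetitions makes the last char optional.
  const next = src.charAt(1 + prefix.length);
  if (next === "?" || next === "*" || next === "{") prefix = prefix.slice(0, -1);
  if (prefix === "") return FULL; // e.g. /^\s/: \s is a class, not a literal
  // Upper bound: bump the last character, so "abc" -> "abd".
  const last = prefix.charCodeAt(prefix.length - 1);
  return { lo: prefix, hi: prefix.slice(0, -1) + String.fromCharCode(last + 1) };
}

function inBounds(key, { lo, hi }) {
  return key >= lo && (hi === null || key < hi);
}

console.log(regexBounds(/^abc/)); // → { lo: 'abc', hi: 'abd' }
console.log(regexBounds(/^\s/));  // → { lo: '', hi: null }
```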
Not quite sure why, but it seems as though your query is using the { field1: 1 } index instead of the { field1: 1, _id: 1 } index. Can you try running the query with a hint?
db.collection.find( { field1: {$regex:/^\s/} }, { field1: 1, _id: 1 } ).hint( { field1: 1, _id: 1 } ).explain()
It could be that the query optimizer has selected the { field1: 1 } index initially and has not re-evaluated the various plans. See http://docs.mongodb.org/manual/core/query-plans/ for explanation of the query optimizer and how it selects a plan.

Mongodb 2.4 2dsphere queries very slow (using $geoIntersects)?

mongod.log shows:
{deliver_area: { $geoIntersects:
{ $geometry: {
type: "Point",
coordinates: [ 116.3426399230957, 39.95959281921387 ]
} }
} }
ntoreturn:0
ntoskip:0
nscanned:2965
keyUpdates:0
numYields: 2 locks(micros)
r:136723
nreturned:52
reslen:23453
103ms
The collection has about 10k records; deliver_area is one of the fields, holds a GeoJSON Polygon, and has a 2dsphere index.
This is my query:
db.area_coll.find( {
id: 59,
deliver_area: {
$geoIntersects: {
$geometry: {
type: "Point",
coordinates: [ 116.3175773620605, 39.97607231140137 ]
}
}
}
})
Explain result:
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 3887,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 3887,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 5,
"indexBounds" : {
},
"nscanned" : 3887,
"matchTested" : NumberLong(666),
"geoTested" : NumberLong(0),
"cellsInCover" : NumberLong(1),
"server" : "testing:27017"
}
The query in the log does not match the query you ran, because the location is different:
[ 116.3426399230957, 39.95959281921387 ] vs.
[ 116.3175773620605, 39.97607231140137 ]
I also don't think you have reproduced your whole log line, as it just mentions area and not deliver_area.
However, neither query is really slow. The first took 103ms, which can happen when your server is busy with other I/O. The second took 5ms, as the explain() output tells you.
But what is most striking is that your main criterion is id: 59. I don't know what your _id field is, but if you set an index on id, this query should not even have to use the 2dsphere index at all, unless of course you have many documents where id=59. In that case, you could be better off with a compound index on { id: 1, deliver_area: '2dsphere' }.
I had exactly the same issue. My index was a compound one:
a 2dsphere index on the location field plus an ascending index on the zoom field.
I always query by both fields, filtering by location and zoom, and it was really slow.
I tried two separate (non-compound) indexes instead and it works fast. So it looks like a compound index that includes 2dsphere doesn't work well, or has to be used in some more complicated way.

mongodb index an array's key (not the values)

In MongoDB, I have the following document
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": {
"blabla": 20,
"blibli": 100,
"blublu": 250,
... (many more here)
}
}
And I would like to index it so that I can query for the keys of the "concepts" object (I know it's not really a MongoDB array...):
db.things.find({concepts:blabla});
Is it possible with the above schema? Or shall I refactor my documents to something like
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": ["blabla","blibli","blublu", ... (many more here)]
}
I'll answer your actual question: no, you cannot index on the field names given your current schema. $exists uses an index, but that is an existence check only.
There are a lot of problems with a schema like the one you're using, and I would suggest refactoring it to:
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": [
{name:"blabla", value: 20},
{name:"blibli", value: 100},
{name:"blublu", value: 250},
... (many more here)
]
}
Then create an index on {'concepts.name': 1} and you can actually query on the concept names rather than just check for their existence.
TL;DR: No, you can't.
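The suggested refactor is mechanical, so it can be sketched as a pure function that turns the concepts sub-document into an array of { name, value } pairs. `toNameValueArray` and `hasConcept` are hypothetical helpers, and the _id is a plain string here only to keep the sketch self-contained; running the equivalent conversion over a real collection is not shown.

```javascript
// Hypothetical migration helper:
// { concepts: { blabla: 20, ... } } -> { concepts: [ { name: "blabla", value: 20 }, ... ] }
function toNameValueArray(doc) {
  const concepts = Object.entries(doc.concepts || {}).map(([name, value]) => ({ name, value }));
  return { ...doc, concepts }; // returns a new object; the input is left untouched
}

// Models the match that a query on "concepts.name" performs once concepts is an
// array; with an index on { "concepts.name": 1 } that lookup can use the index.
function hasConcept(doc, name) {
  return doc.concepts.some((c) => c.name === name);
}

const before = {
  _id: "4FFD813FE4B0931BDAAB4F01",
  concepts: { blabla: 20, blibli: 100, blublu: 250 },
};
const after = toNameValueArray(before);
console.log(after.concepts.length);       // → 3
console.log(hasConcept(after, "blabla")); // → true
```

The concept names now live in values rather than field names, which is what makes them indexable.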
You can query for a field's presence with a specific query:
db.your_collection.find({"concepts.yourfield": { $exists: true }})
(notice the $exists)
It will return all documents where yourfield is a field of the concepts sub-document.
Edit:
this solution only covers querying. Indexes contain values, not field names.
MongoDB indexes each value of an array, so you can query for individual items, as described in the documentation.
But for nested fields you need to tell MongoDB to index the sub-fields explicitly:
db.col1.ensureIndex({'concepts.blabla':1})
db.col1.ensureIndex({'concepts.blublu':1})
db.col1.find({'concepts.blabla': 20}).explain()
{
"cursor" : "BtreeCursor concepts.blabla_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"concepts.blabla" : [
[
20,
20
]
]
}
}
After creating the index, the cursor type changes from BasicCursor to BtreeCursor.
If you create your document as you proposed at the end of your question:
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": ["blabla","blibli","blublu", ... (many more here)]
}
a plain index on the array will be enough:
db.col1.ensureIndex({'concepts':1})
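Why a plain index on the array is enough can be seen in a toy model of a multikey index: one entry per array element, each pointing back to its document. `buildMultikeyIndex` and `lookup` are hypothetical names, and a JavaScript Map stands in for the B-tree; this illustrates the idea, not MongoDB's storage format.

```javascript
// Toy multikey index: maps each value of the indexed field to the set of _ids
// of documents containing it. Array fields contribute one entry per element.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    // Missing fields are indexed as null, as MongoDB does.
    const raw = doc[field] === undefined ? null : doc[field];
    const values = Array.isArray(raw) ? raw : [raw];
    for (const v of values) {
      if (!index.has(v)) index.set(v, new Set());
      index.get(v).add(doc._id);
    }
  }
  return index;
}

// Equality lookup, roughly what find({ concepts: "blabla" }) does with the index.
function lookup(index, value) {
  return [...(index.get(value) || [])];
}

const docs = [
  { _id: 1, concepts: ["blabla", "blibli", "blublu"] },
  { _id: 2, concepts: ["blibli"] },
];
const idx = buildMultikeyIndex(docs, "concepts");
console.log(lookup(idx, "blibli")); // → [ 1, 2 ]
console.log(lookup(idx, "blabla")); // → [ 1 ]
console.log(idx.size);              // → 3
```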