MongoDB - Query on nested field with index

I am trying to figure out how I must structure queries such that they will hit my index.
I have documents structured like so:
{ "attributes" : { "make" : "Subaru", "color" : "Red" } }
With an index of: db.stuff.ensureIndex({"attributes.make":1})
What I've found is that querying using dot notation hits the index while querying with a document does not.
Example:
db.stuff.find({"attributes.make":"Subaru"}).explain()
{
"cursor" : "BtreeCursor attributes.make_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"attributes.make" : [
[
"Subaru",
"Subaru"
]
]
}
}
vs
db.stuff.find({attributes:{make:"Subaru"}}).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Is there a way to get the document style query to hit the index? The reason is that when constructing queries from my persistent objects it's much easier to serialize them out as documents as opposed to something using dot notation.
I'll also add that we're using a home grown data mapper layer built w/ Jackson. Would using something like Morphia help with properly constructing these queries?
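One way to keep serializing objects as documents while still sending dot-notation queries is to flatten the nested document just before it is passed to find(). The following is a minimal sketch, not part of any driver or mapper: the helper names isPlainObject and toDotNotation are hypothetical, and the sketch assumes that sub-documents containing $-operators should be kept as values rather than flattened.

```javascript
// Hypothetical helper: true only for plain {...} objects (not arrays, dates, null).
function isPlainObject(value) {
  return value !== null && typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype;
}

// Hypothetical helper: turn { attributes: { make: "Subaru" } }
// into { "attributes.make": "Subaru" }.
function toDotNotation(doc, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? prefix + "." + key : key;
    const keys = isPlainObject(value) ? Object.keys(value) : [];
    // Assumption: operator documents such as { $gt: 5 } are query
    // conditions, so they are kept whole instead of being flattened.
    const hasOperator = keys.some((k) => k.startsWith("$"));
    if (keys.length > 0 && !hasOperator) {
      toDotNotation(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// The flattened object can be passed to find() in place of the nested one.
const flattenedQuery = toDotNotation({ attributes: { make: "Subaru" } });
console.log(flattenedQuery);
```

With this in place the data mapper can keep producing nested documents, and only the final query object is rewritten.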

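Did some more digging, and this thread explains what's going on with the sub-document query. To make the sub-document based query use the index like the dot-notation one, I needed to use $elemMatch. Note that the explain output below still reports n: 0, because $elemMatch only matches array fields and attributes is an embedded document here.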
db.stuff.find({"attributes":{"$elemMatch" : {"make":"Subaru"}}}).explain()
{
"cursor" : "BtreeCursor attributes.make_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 0,
"millis" : 2,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"attributes.make" : [
[
"Subaru",
"Subaru"
]
]
}
}
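For reference, the n: 0 in the document-style explain output is consistent with how equality on an embedded document works: the whole sub-document has to match exactly, including field order, whereas dot notation compares a single leaf. The sketch below imitates both comparisons in plain JavaScript. It is a simplification for illustration (the function names are hypothetical, and it does not traverse arrays the way the server does).

```javascript
// Exact embedded-document equality: same fields, same values, same order.
function matchesEmbedded(doc, field, expected) {
  return JSON.stringify(doc[field]) === JSON.stringify(expected);
}

// Dot-notation equality: walk the path and compare only the leaf value.
function matchesDotted(doc, path, expected) {
  let current = doc;
  for (const part of path.split(".")) {
    if (current === null || typeof current !== "object") return false;
    current = current[part];
  }
  return current === expected;
}

const car = { attributes: { make: "Subaru", color: "Red" } };

// The stored sub-document also has "color", so the exact match fails.
console.log(matchesEmbedded(car, "attributes", { make: "Subaru" })); // false
// The dotted path looks only at attributes.make, so it succeeds.
console.log(matchesDotted(car, "attributes.make", "Subaru")); // true
```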

Related

MongoDB indexing results on array fields

I have a collection with documents like { student_id : 1, teachers : [ "....", ... ] }
I ran these steps in sequence:
1) find by {teachers : "gore"}
2) create an index on { student_id : 1 }
3) find by {teachers : "gore"}
4) create an index on { teachers : 1 }
5) find by {teachers : "gore"}
Indexing teachers (an array) barely changes the time taken. Can someone explain what is happening? I may be doing something wrong here, so please correct me. The results are:
d.find({teachers : "gore"}).explain()
{ "cursor" : "BasicCursor", "nscanned" : 999999, "nscannedObjects" : 999999, "n" : 447055, "millis" : 1623, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } }
d.ensureIndex({student_id : 1})
d.find({teachers : "gore"}).explain()
{ "cursor" : "BasicCursor", "nscanned" : 999999, "nscannedObjects" : 999999, "n" : 447055, "millis" : 1300, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } }
d.ensureIndex({teachers : 1})
d.find({teachers : "gore"}).explain()
{ "cursor" : "BtreeCursor teachers_1", "nscanned" : 447055, "nscannedObjects" : 447055, "n" : 447055, "millis" : 1501, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "teachers" : [ [ "gore", "gore" ] ] } }
The fact that it is showing a BtreeCursor is a positive sign that the index is being used, but nscannedObjects is still very large. Is it possible that you have 447055 "gore" values? Do you have the same data inserted over and over again? If so, that's why it's taking such a long time.
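The point about the result size can be made concrete with the numbers from the explain outputs. The sketch below is only an illustration: summarizeExplain is a hypothetical helper, and it assumes the collection holds 999999 documents, which is the nscanned value of the unindexed scan.

```javascript
// Hypothetical helper: derive a few ratios from an explain() document.
function summarizeExplain(explain, totalDocs) {
  return {
    usesIndex: explain.cursor.startsWith("BtreeCursor"),
    // Fraction of the collection that matches the query.
    selectivity: explain.n / totalDocs,
    // Index entries examined per returned document (1 is the best case).
    scannedPerResult: explain.n === 0 ? Infinity : explain.nscanned / explain.n
  };
}

const indexedRun = { cursor: "BtreeCursor teachers_1", n: 447055, nscanned: 447055 };
const explainSummary = summarizeExplain(indexedRun, 999999);
console.log(explainSummary);
```

Here the index is used as well as it can be (one entry scanned per result), but close to 45% of the collection matches, so most of the time goes into fetching the matching documents rather than into finding them.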

Sort on $geoWithin geospatial query in MongoDB

I'm trying to retrieve a bunch of Polygons stored inside my db, and sort them by radius. So I wrote a query with a simple $geoWithin.
So, without sorting the code looks like this:
db.areas.find(
{
"geometry" : {
"$geoWithin" : {
"$geometry" : {
"type" : "Polygon",
"coordinates" : [ [ /** omissis: array of points **/ ] ]
}
}
}
}).limit(10).explain();
And the explain result is the following:
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 10,
"nscanned" : 367,
"nscannedObjectsAllPlans" : 10,
"nscannedAllPlans" : 367,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 2,
"indexBounds" : {
},
"nscanned" : 367,
"matchTested" : NumberLong(10),
"geoTested" : NumberLong(10),
"cellsInCover" : NumberLong(27),
"server" : "*omissis*"
}
(The cursor is reported as S2Cursor, which I take to mean that my compound index has not been used. Even so, the query is fast.)
However, as soon as I add a sort, simply with .sort({ radius: -1 }), the query becomes extremely slow:
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 58429,
"nscanned" : 705337,
"nscannedObjectsAllPlans" : 58429,
"nscannedAllPlans" : 705337,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 3,
"nChunkSkips" : 0,
"millis" : 3186,
"indexBounds" : {
},
"nscanned" : 705337,
"matchTested" : NumberLong(58432),
"geoTested" : NumberLong(58432),
"cellsInCover" : NumberLong(27),
"server" : "*omissis*"
}
with MongoDB scanning all the documents. Obviously I tried to add a compound index, like { radius: -1, geometry : '2dsphere' } or { geometry : '2dsphere' , radius: -1 }, but nothing helped. Still very slow.
I would like to know whether I'm using the compound index the wrong way, whether the S2Cursor tells me I should change something in my indexing strategy and, overall, what I am doing wrong.
(PS: I'm using MongoDB 2.4.5+, so the problem is NOT caused by second field ascending in compound index when using 2dsphere index as reported here https://jira.mongodb.org/browse/SERVER-9647)
First of all, S2Cursor means that the query uses a geospatial index.
There can be multiple reasons why the sort operation is slow. Sorting requires memory, and your server may have very little of it, so you could consider doing the sort in application code rather than on the server side.
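As a rough sketch of the client-side alternative, the matching documents can be fetched without .sort() and ordered in application code. The function name topByRadius and the sample data are hypothetical. Note the trade-off: every matching document has to be transferred to the client before the first ten can be picked.

```javascript
// Hypothetical helper: order areas by radius, largest first, and keep `limit` of them.
function topByRadius(areas, limit) {
  // slice() copies the array so the caller's data is left untouched.
  return areas.slice().sort((a, b) => b.radius - a.radius).slice(0, limit);
}

// Stand-in for the documents returned by the $geoWithin query.
const fetchedAreas = [
  { name: "A", radius: 12 },
  { name: "B", radius: 40 },
  { name: "C", radius: 7 },
  { name: "D", radius: 25 }
];

const largestAreas = topByRadius(fetchedAreas, 2);
console.log(largestAreas.map((area) => area.name)); // [ 'B', 'D' ]
```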

Mongodb indexing

I have a query
db.messages.find({'headers.Date':{'$gt': new Date(2001,3,1)}},{'headers.From':1, _id:0}).sort({'headers.From':1})
I have created an index on headers.From. Which part of the query will use this index: the find part or the sort part?
The explain output is:
{
"cursor" : "BtreeCursor headers.From_1",
"isMultiKey" : false,
"n" : 83057,
"nscannedObjects" : 120477,
"nscanned" : 120477,
"nscannedObjectsAllPlans" : 120581,
"nscannedAllPlans" : 120581,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 250,
"indexBounds" : {
"headers.From" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Andrews-iMac.local:27017"
}
Any help is appreciated !!!
The index is being used for the sort part, not for the query, as your query doesn't use the headers.From field and your sort does.

Why is regex prefix query on indexed array slow in MongoDB?

I am trying to perform a regex query on an array of strings in a MongoDB collection. I could only find this limitation in the docs:
$regex can only use an index efficiently when the regular expression
has an anchor for the beginning (i.e. ^) of a string and is a
case-sensitive match.
Let's make a test:
> for (var i=0; i<100000; i++) db.test.insert({f: ['a_0_'+i, 'a_1_2']})
> db.test.count()
100000
> db.test.ensureIndex({f: 1})
> db.test.find({f: /^a_(0)?_12$/ })
{ "_id" : ObjectId("514ac59886f004fe03ef2a96"), "f" : [ "a_0_12", "a_1_2" ] }
> db.test.find({f: /^a_(0)?_12$/ }).explain()
{
"cursor" : "BtreeCursor f_1 multi",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 200000,
"nscanned" : 200000,
"nscannedObjectsAllPlans" : 200000,
"nscannedAllPlans" : 200000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 482,
"indexBounds" : {
"f" : [
[
"a_",
"a`"
],
[
/^a_(0)?_12$/,
/^a_(0)?_12$/
]
]
},
"server" : "someserver:27017"
}
The query is sloooow. On the other hand, this query is optimal: (but doesn't suit my use case)
> db.test.find({f: 'a_0_12' }).explain()
{
"cursor" : "BtreeCursor f_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"f" : [
[
"a_0_12",
"a_0_12"
]
]
},
"server" : "someserver:27017"
}
Why is the regex query scanning all (sub)records when it has an index? What am I missing?
Your test case has several characteristics that are unhelpful for regex and index usage:
Each document includes an array of two values, both starting with "a_". Your regex /^a_(0)?_12$/ is looking for a string starting with "a_" followed by an optional "0", so only the prefix "a_" can bound the index scan, which leads to a comparison of all index entries (200k values).
Every document also has a value (a_1_2) that falls inside that prefix range, so every document ends up being examined irrespective of the index.
Since you have a multikey index (an index on an array field), the number of index comparisons is actually worse than just doing a full collection scan of the 100k documents. You can test with a $natural hint to see:
db.test.find({f: /^a_(0|)12$/ }).hint({$natural:1}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 100000,
"nscanned" : 100000,
"nscannedObjectsAllPlans" : 100000,
"nscannedAllPlans" : 100000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 192,
"indexBounds" : {
}
}
More random data or a more selective regex will result in fewer comparisons.
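To see where the "a_" to "a`" bounds in the explain output come from, the literal prefix of an anchored regex can be turned into a string range. The sketch below is a simplified illustration of that idea, not the server's actual implementation: literalPrefix and prefixBounds are hypothetical names, escape sequences are treated as the end of the prefix, and alternation later in the pattern is not analyzed.

```javascript
// Characters that end the literal prefix of a pattern (simplified).
const REGEX_META = new Set(["(", ")", "[", "]", "{", "}", ".", "*", "+", "?", "|", "\\", "$", "^"]);

// Hypothetical helper: literal text every match must start with, or "".
function literalPrefix(source) {
  if (!source.startsWith("^")) return "";
  let prefix = "";
  for (let i = 1; i < source.length; i++) {
    const ch = source[i];
    if (REGEX_META.has(ch)) {
      // Top-level alternation means the prefix is not shared by all branches.
      if (ch === "|") return "";
      // A quantifier allowing zero repetitions makes the previous literal optional.
      if (ch === "*" || ch === "?" || ch === "{") prefix = prefix.slice(0, -1);
      break;
    }
    prefix += ch;
  }
  return prefix;
}

// Hypothetical helper: [lower, upper) string range covering the prefix, or null.
function prefixBounds(regex) {
  // Case-insensitive patterns cannot be bounded by a literal prefix.
  if (regex.ignoreCase) return null;
  const prefix = literalPrefix(regex.source);
  if (prefix === "") return null;
  const last = prefix.charCodeAt(prefix.length - 1);
  return [prefix, prefix.slice(0, -1) + String.fromCharCode(last + 1)];
}

const bounds = prefixBounds(/^a_(0)?_12$/);
console.log(bounds); // [ 'a_', 'a`' ]

// Both array values of a test document fall inside the range.
const inRange = ["a_0_12", "a_1_2"].filter((v) => v >= bounds[0] && v < bounds[1]);
console.log(inRange.length); // 2
```

Because every value in the test data starts with "a_", the range covers the whole index, which is why nscanned reaches 200000.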

Improve querying fields exist in MongoDB

I'm in the process of evaluating MongoDB for our customers. Per the requirements, we need to associate a variable set of name-value pairs with an entity ent.
db.ent.insert({'a':5775, 'b':'b1'})
db.ent.insert({'c':'its a c', 'b':'b2'})
db.ent.insert({'a':7557, 'c':'its a c'})
After this, I need to query ent intensively for the presence of fields:
db.ent.find({'a':{$exists:true}})
db.ent.find({'c':{$exists:false}})
Per MongoDB docs:
$exists is not very efficient even with an index, and esp. with {$exists:true} since it will effectively have to scan all indexed values.
Can the experts here suggest a more efficient way (even if it means a shift in paradigm) to deal quickly with varying name-value pairs?
You can redesign your schema like this:
{
pairs:[
{k: "a", v: 5775},
{k: "b", v: "b1"},
]
}
Then you index the key:
db.ent.ensureIndex({"pairs.k" : 1})
After this you will be able to search by exact match:
db.ent.find({'pairs.k':"a"})
If you go with a sparse index and your current schema, as proposed by @WesFreeman, you will need to create an index on each key you want to search. That can affect write performance, or will not be acceptable if your keys are not static.
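Converting between the flat documents from the question and the pairs layout can be done in application code before inserting and after reading. This is a minimal sketch with hypothetical helper names (toPairs, fromPairs), and it assumes the input document carries no _id or other fields that should stay at the top level.

```javascript
// Hypothetical helper: { a: 5775, b: "b1" } -> { pairs: [ {k, v}, ... ] }
function toPairs(doc) {
  return { pairs: Object.entries(doc).map(([k, v]) => ({ k, v })) };
}

// Hypothetical helper: the inverse, for code that wants result.a style access.
function fromPairs(stored) {
  const out = {};
  for (const { k, v } of stored.pairs) {
    out[k] = v;
  }
  return out;
}

const storedEnt = toPairs({ a: 5775, b: "b1" });
console.log(JSON.stringify(storedEnt));
console.log(fromPairs(storedEnt).a); // 5775
```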
Simply redesign your schema such that it's an indexable query. Your use case is in fact analogous to the first example application given in MongoDB The Definitive Guide.
If you want/need the convenience of result.a just store the keys somewhere indexable.
Instead of the existing:
db.ent.insert({a:5775, b:'b1'})
do
db.ent.insert({a:5775, b:'b1', index: ['a', 'b']})
That's then an indexable query:
db.ent.find({index: "a"}).explain()
{
"cursor" : "BtreeCursor index_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"index" : [
[
"a",
"a"
]
]
}
}
or if you're ever likely to query also by value:
db.ent.insert({
a:5775,
b:'b1',
index: [
{name: 'a', value: 5775},
{name: 'b', value: 'b1'}
]
})
That's also an indexable query:
db.ent.find({"index.name": "a"}).explain()
{
"cursor" : "BtreeCursor index.name_",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"index.name" : [
[
"a",
"a"
]
]
}
}
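Maintaining the extra index field by hand is error-prone, so it can be derived from the document just before each insert. The helpers below are a hypothetical sketch of both variants shown above (key names only, and name/value pairs). They assume the document does not already use a field called index.

```javascript
// Hypothetical helper: add index: ["a", "b"] listing the document's own keys.
function withKeyIndex(doc) {
  return Object.assign({}, doc, { index: Object.keys(doc) });
}

// Hypothetical helper: add index: [{name, value}, ...] for queries by key and value.
function withNameValueIndex(doc) {
  const index = Object.entries(doc).map(([name, value]) => ({ name, value }));
  return Object.assign({}, doc, { index });
}

console.log(JSON.stringify(withKeyIndex({ a: 5775, b: "b1" })));
console.log(JSON.stringify(withNameValueIndex({ a: 5775, b: "b1" })));
```

Both helpers return a copy, so the original object keeps its plain shape for the rest of the application.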
I think a sparse index is the answer to this, although you'll need an index for each field. http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes
Sparse indexes should help with $exists:true queries.
Even still, if your field is not really sparse (meaning it's mostly set), it's not going to help you that much.
Update: I guess I'm wrong. It looks like there's still an open issue ( https://jira.mongodb.org/browse/SERVER-4187 ) that $exists doesn't use sparse indexes. However, you can do something like this with find and sort, which looks like it properly uses the sparse index:
db.ent.find({}).sort({a:1});
Here's a full demonstration of the difference, using your example values:
> db.ent.insert({'a':5775, 'b':'b1'})
> db.ent.insert({'c':'its a c', 'b':'b2'})
> db.ent.insert({'a':7557, 'c':'its a c'})
> db.ent.ensureIndex({a:1},{sparse:true});
Note that find({}).sort({a:1}) uses the index (BtreeCursor):
> db.ent.find({}).sort({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
And find({a:{$exists:true}}) does a full scan:
> db.ent.find({a:{$exists:true}}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Looks like you can also use .hint({a:1}) to force it to use the index.
> db.ent.find().hint({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
How about setting the non-existent fields to null? Then you can query them with {field: {$ne: null}}.
db.ent.insert({'a':5775, 'b':'b1', 'c': null})
db.ent.insert({'a': null, 'b':'b2', 'c':'its a c'})
db.ent.insert({'a':7557, 'b': null, 'c':'its a c'})
db.ent.ensureIndex({"a" : 1})
db.ent.ensureIndex({"b" : 1})
db.ent.ensureIndex({"c" : 1})
db.ent.find({'a':{$ne: null}}).explain()
Here's the output:
{
"cursor" : "BtreeCursor a_1 multi",
"isMultiKey" : false,
"n" : 4,
"nscannedObjects" : 4,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 5,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"a" : [
[
{
"$minElement" : 1
},
null
],
[
null,
{
"$maxElement" : 1
}
]
]
},
"server" : "my-laptop"
}
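Filling in the nulls can be done in one place before insert. This is a small sketch under the assumption that the full set of possible field names is known up front; fillMissing and the knownFields list are hypothetical.

```javascript
// Hypothetical helper: copy the document and set every absent known field to null.
function fillMissing(doc, knownFields) {
  const out = Object.assign({}, doc);
  for (const field of knownFields) {
    if (!(field in out)) {
      out[field] = null;
    }
  }
  return out;
}

// Assumption: these are all the names an entity can carry.
const knownFields = ["a", "b", "c"];
console.log(JSON.stringify(fillMissing({ a: 5775, b: "b1" }, knownFields)));
```

If the set of names is open-ended, this approach does not apply, since each new name would also need its own index.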