I have a MongoDB collection whose documents contain the following fields (the documents have many more fields, but I removed them for clarity):
{
"_id": ObjectId("53aad11444d0e2fd648b4567"),
"id": NumberLong(238790),
"rid": NumberLong(12),
"parent_id": {
"0": NumberLong(12),
"1": NumberLong(2)
},
"coid": NumberLong(3159),
"reid": NumberLong(4312),
"cid": NumberLong(4400)
}
When I run a query I get
> db.ads2.find({coid:3159, parent_id : 2}).sort({inserdate:1}).explain()
{
"cursor" : "BtreeCursor coid_1_parent_id_1_insertdate_-1",
"isMultiKey" : true,
"n" : 20444,
"nscannedObjects" : 20444,
"nscanned" : 20444,
"nscannedObjectsAllPlans" : 20444,
"nscannedAllPlans" : 20444,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 319,
"nChunkSkips" : 0,
"millis" : 274,
"indexBounds" : {
"coid" : [
[
3159,
3159
]
],
"parent_id" : [
[
2,
2
]
],
"insertdate" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "myserver.com:27017",
"filterSet" : false
}
The question is: how should I change the index so that MongoDB uses it properly?
First of all, your query and the document structure you posted do not match. If you ran this query against a collection containing documents with that structure, you would get no results.
But since your query yielded 20444 results, I guess your structure actually looks like this:
{
...
"parent_id": [
"0": NumberLong(12),
"1": NumberLong(2)
],
...
}
This is the index you created:
db.x.ensureIndex({coid : 1, parent_id : 1, insertdate : -1});
From your explain() output you can see that MongoDB is using the index to find the documents (the number of scanned documents equals the number of returned documents, n == nscanned). But "scanAndOrder" : true means that MongoDB is not using the index to sort them.
Your index should be used for both matching and sorting, provided the field you're sorting on exists in the documents.
The problem is that I can't see an insertdate field anywhere in your structure. If you're sorting by a field that doesn't exist in your documents, MongoDB normally can't use the index for sorting.
Edit
After your comment and a re-examination of your original question, I probably found what's causing your problem. You have a typo in the query you're executing: you're specifying the sort parameter as inserdate, while the name of the field that was indexed is insertdate.
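As a quick sanity check (a sketch, assuming the {coid : 1, parent_id : 1, insertdate : -1} index shown above exists), re-running the query with the sort field spelled correctly should let the index handle the sort, i.e. explain() should report "scanAndOrder" : false:
> db.ads2.find({coid : 3159, parent_id : 2}).sort({insertdate : 1}).explain()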
Related
I have a really simple Mongo query that should use the _id index.
The explain plan looks good:
> db.items.find({ deleted_at: null, _id: ObjectId('541fd8016d792e0804820100') }).sort({ positions: 1 }).explain()
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 6,
"nscannedAllPlans" : 7,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
ObjectId("541fd8016d792e0804820100"),
ObjectId("541fd8016d792e0804820100")
]
]
},
"server" : "mydbserver:27017",
"filterSet" : false
}
But when I actually execute the query, it takes 100-800 ms:
> db.items.find({ deleted_at: null, _id: ObjectId('541fd8016d792e0804820100') }).sort({ positions: 1 })
2014-09-26T12:34:00.279+0300 [conn38926] query mydb.items query: { query: { deleted_at: null, _id: ObjectId('541fd8016d792e0804820100') }, orderby: { positions: 1.0 } } planSummary: IXSCAN { positions: 1 } ntoreturn:0 ntoskip:0 nscanned:70043 nscannedObjects:70043 keyUpdates:0 numYields:1 locks(micros) r:1391012 nreturned:1 reslen:814 761ms
Why is it reporting nscanned:70043 nscannedObjects:70043, and why is it so slow?
I am using MongoDB 2.6.4 on CentOS 6.
I tried repairing MongoDB and a full dump/import; neither helps.
Update 1
> db.items.find({deleted_at:null}).count()
67327
> db.items.find().count()
70043
I don't have an index on deleted_at, but I do have an index on _id.
Update 2 (2014-09-26 14:57 EET)
Adding an index on _id, deleted_at doesn't help; even explain() doesn't use that index :(
> db.items.ensureIndex({ _id: 1, deleted_at: 1 }, { unique: true })
> db.items.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mydb.items"
},
{
"v" : 1,
"unique" : true,
"key" : {
"_id" : 1,
"deleted_at" : 1
},
"name" : "_id_1_deleted_at_1",
"ns" : "mydb.items"
}
]
> db.items.find({ deleted_at: null, _id: ObjectId('541fd8016d792e0804820100') }).sort({ positions: 1 }).explain()
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 7,
"nscannedAllPlans" : 8,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
ObjectId("541fd8016d792e0804820100"),
ObjectId("541fd8016d792e0804820100")
]
]
},
"server" : "myserver:27017",
"filterSet" : false
}
Update 3 (2014-09-26 15:03:32 EET)
Adding an index on _id, deleted_at, positions helped. But it still seems weird that the previous cases force a full collection scan.
> db.items.ensureIndex({ _id: 1, deleted_at: 1, positions: 1 })
> db.items.find({ deleted_at: null, _id: ObjectId('541fd8016d792e0804820100') }).sort({ positions: 1 }).explain()
{
"cursor" : "BtreeCursor _id_1_deleted_at_1_positions_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 3,
"nscannedAllPlans" : 3,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
ObjectId("541fd8016d792e0804820100"),
ObjectId("541fd8016d792e0804820100")
]
],
"deleted_at" : [
[
null,
null
]
],
"positions" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "myserver:27017",
"filterSet" : false
}
This looks like a bug. The query planner should select the _id index, and the _id index should be all you need, as it immediately reduces the result set to one document. The sort should be irrelevant, as it's ordering a single document. It's a weird case because you are explicitly asking for one document with an _id match and then sorting it. As a workaround, you should be able to bypass Mongoid and drop the sort.
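A minimal sketch of that workaround: with a unique _id equality match, the sort orders at most one document, so it can simply be dropped:
> db.items.find({ deleted_at: null, _id: ObjectId('541fd8016d792e0804820100') })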
.explain() does not ignore the sort. You can test that simply like so:
> for (var i = 0; i < 100; i++) { db.sort_test.insert({ "i" : i }) }
> db.sort_test.ensureIndex({ "i" : 1 })
> db.sort_test.find().sort({ "i" : 1 }).explain()
If MongoDB can't use the index for the sort, it will sort in memory. The field scanAndOrder in explain output tells you if MongoDB cannot use the index to sort the query results (i.e. scanAndOrder : false means MongoDB can use the index to sort the query results).
Could you please file a bug report in the MongoDB SERVER project? Perhaps the engineers will say it's working as designed, but the behavior looks wrong to me, and there have been a couple of query planner gotchas in 2.6.4 already. I may have missed it if it was said before, but does the presence/absence of deleted_at : null affect the problem?
Also, if you do file a ticket, please post a link to it in your question or as a comment on this answer so it's easy for others to follow. Thanks!
UPDATE: corrected my answer, which suggested using an (_id, deleted_at) compound index. The comments also add clarity on how explain() might not reflect the query planner's choices in certain cases.
What we expect is that find() will filter the results and then sort() will apply to the filtered set. However, according to this doc, the query planner will use neither the index on _id nor the index (if any) on positions for this query. Now, if you have a compound index on (_id, positions), it should be able to use that index to process the query.
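A sketch of the compound index that paragraph describes (the question's Update 3 used the wider { _id: 1, deleted_at: 1, positions: 1 } variant, which did eliminate scanAndOrder):
> db.items.ensureIndex({ _id: 1, positions: 1 })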
The gist is that if you have an index that covers your query, you can be assured the query planner will use it. In your case, the query definitely isn't covered, as indicated by indexOnly : false in the explain plan.
If this is by design, it definitely is counter-intuitive, and as wdberkely suggested, you should file a bug report so that the community gets a more detailed explanation of this peculiar behaviour.
I am guessing that you have 70043 objects with an id of '541fd8016d792e0804820100'. Can you simply do a find on that id and count() them? An index does not imply a 'unique' index -- once the index "bucket" for a certain value is reached, the scan inside the bucket still looks at every 'deleted_at' value to see if it's 'null'. To get around this, use a compound index of (id, deleted_at).
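That is, the check suggested above, using the _id from the question:
> db.items.find({ _id: ObjectId('541fd8016d792e0804820100') }).count()
If this returns 1, the guess is wrong and the slowness must come from scanning elsewhere.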
I have two arrays in my collection: one holds embedded documents and the other is just a simple array of strings. An example document:
{
"_id" : ObjectId("534fb7b4f9591329d5ea3d0c"),
"_class" : "discussion",
"title" : "A",
"owner" : "1",
"tags" : ["tag-1", "tag-2", "tag-3"],
"creation_time" : ISODate("2014-04-17T11:14:59.777Z"),
"modification_time" : ISODate("2014-04-17T11:14:59.777Z"),
"policies" : [
{
"participant_id" : "2",
"action" : "CREATE"
}, {
"participant_id" : "1",
"action" : "READ"
}
]
}
Since some of the queries will include only the policies, and some will include both the tags and the participants arrays, and considering the fact that I can't create a multikey index with two arrays, I thought this would be a classic scenario for using Index Intersection.
I'm executing a query, but I can't see the intersection kick in.
Here are the indexes:
db.discussion.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test-fw.discussion"
},
{
"v" : 1,
"key" : {
"tags" : 1,
"creation_time" : 1
},
"name" : "tags",
"ns" : "test-fw.discussion",
"dropDups" : false,
"background" : false
},
{
"v" : 1,
"key" : {
"policies.participant_id" : 1,
"policies.action" : 1
},
"name" : "policies",
"ns" : "test-fw.discussion"
}
]
Here is the query:
db.discussion.find({
"$and" : [
{ "tags" : { "$in" : [ "tag-1" , "tag-2" , "tag-3"] }},
{ "policies" : { "$elemMatch" : {
"$and" : [
{ "participant_id" : { "$in" : [
"participant-1",
"participant-2",
"participant-3"
]}},
{ "action" : "READ"}
]
}}}
]
})
.limit(20000).sort({ "creation_time" : 1 }).explain();
And here is the result of the explain:
"clauses" : [
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-1",
"tag-1"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-2",
"tag-2"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
},
{
"cursor" : "BtreeCursor tags",
"isMultiKey" : true,
"n" : 10000,
"nscannedObjects" : 10000,
"nscanned" : 10000,
"scanAndOrder" : false,
"indexOnly" : false,
"nChunkSkips" : 0,
"indexBounds" : {
"tags" : [
[
"tag-3",
"tag-3"
]
],
"creation_time" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
],
"cursor" : "QueryOptimizerCursor",
"n" : 20000,
"nscannedObjects" : 30000,
"nscanned" : 30000,
"nscannedObjectsAllPlans" : 30203,
"nscannedAllPlans" : 30409,
"scanAndOrder" : false,
"nYields" : 471,
"nChunkSkips" : 0,
"millis" : 165,
"server" : "User-PC:27017",
"filterSet" : false
Each of the tags in the query (tag-1, tag-2 and tag-3) matches 10K documents.
Each of the policies ({participant-1,READ}, {participant-2,READ}, {participant-3,READ}) matches 10K documents.
The AND of the two conditions results in 20K documents.
As I said earlier, I can't see why the intersection of the two indexes (I mean the policies and the tags indexes) doesn't kick in.
Can someone please shed some light on what I'm missing?
There are two things that are actually important to your understanding of this.
The first point is that the query optimizer can use only one index when resolving the query plan, so it cannot use both of the indexes you have specified. As such, it picks the one that is the best "fit" by its own determination, unless you explicitly specify one with a hint. Index intersection would somewhat suit here, but that brings us to the next point:
The second point is documented in the limitations of compound indexes. It points out that even if you were to "try" to create a compound index that included both of the array fields you want, you could not. The problem is that, as arrays, these fields introduce too many possibilities for the bounds keys, and a multikey index already introduces a fair level of complexity when used in a compound with a standard field.
The limitation on combining two multikey indexes is the main problem here: much as at creation time, the complexity of "combining" the two produces too many permutations to make it a viable option.
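You can see the creation-time side of this limitation directly. Attempting a compound index over both array fields (a sketch against this collection) is rejected with a "cannot index parallel arrays" error:
> db.discussion.ensureIndex({ "tags" : 1, "policies.participant_id" : 1 })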
It might just be the case that the policies index is actually the better one to use for this type of search, and you could probably encourage this by specifying that field first in the query:
db.discussion.find({
"policies" : { "$elemMatch" : {
"participant_id" : { "$in" : [
"participant-1",
"participant-2",
"participant-3"
]},
"action" : "READ"
}},
"tags" : { "$in" : [ "tag-1" , "tag-2" , "tag-3"] }
})
That is, if it selects the smaller range of data, which it probably does. Otherwise, use the hint() modifier as mentioned earlier.
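A sketch of forcing that choice with hint(), using the index name "policies" from the getIndexes() output above:
> db.discussion.find({
"policies" : { "$elemMatch" : {
"participant_id" : { "$in" : [ "participant-1", "participant-2", "participant-3" ] },
"action" : "READ"
}},
"tags" : { "$in" : [ "tag-1", "tag-2", "tag-3" ] }
}).hint("policies").limit(20000).sort({ "creation_time" : 1 })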
If that does not directly help, it might be worth re-considering the schema as something that does not keep those values in array fields, or as some other type of "meta" field that could easily be looked up with an index.
Also note that in the edited form none of the wrapping $and statements are required, as "and" is implicit in MongoDB queries. As a modifier, $and is only required if you want two different conditions on the same field.
After doing a little testing, I believe Mongo can, in fact, use two multikey indexes in an intersection. I created a collection with the following structure:
{
"_id" : ObjectId("54e129c90ab3dc0006000001"),
"bar" : [
"hgcmdflitt",
...
"nuzjqxmzot"
],
"foo" : [
"bxzvqzlpwy",
...
"xcwrwluxbd"
]
}
I created indexes on foo and bar and then ran the following query. Note the "true" passed in to explain(); this enables verbose mode.
db.col.find({"bar":"hgcmdflitt", "foo":"bxzvqzlpwy"}).explain(true)
In the verbose results, you can find the "allPlans" section of the response, which will show you all of the query plans mongo considered.
"allPlans" : [
{
"cursor" : "BtreeCursor bar_1",
...
},
{
"cursor" : "BtreeCursor foo_1",
...
},
{
"cursor" : "Complex Plan"
...
}
]
If you see a plan with "cursor" : "Complex Plan" that means mongo considered using an index intersection. To find the reasons why mongo might not have decided to actually use that query plan, see this answer: Why doesn't MongoDB use index intersection?
I have a collection in MongoDB (app_logins) that hold documents with the following structure:
{
"_id" : "c8535f1bd2404589be419d0123a569de"
"app" : "MyAppName",
"start" : ISODate("2014-02-26T14:00:03.754Z"),
"end" : ISODate("2014-02-26T15:11:45.558Z")
}
Since the documentation says that the clauses of an $or can be executed in parallel and can use separate indices, and I assume the same holds true for $and, I added the following indices:
db.app_logins.ensureIndex({app:1})
db.app_logins.ensureIndex({start:1})
db.app_logins.ensureIndex({end:1})
But when I do a query like this, way too many documents are scanned:
db.app_logins.find(
{
$and:[
{ app : "MyAppName" },
{
$or:[
{
$and:[
{ start : { $gte:new Date(1393425621000) }},
{ start : { $lte:new Date(1393425639875) }}
]
},
{
$and:[
{ end : { $gte:new Date(1393425621000) }},
{ end : { $lte:new Date(1393425639875) }}
]
},
{
$and:[
{ start : { $lte:new Date(1393425639875) }},
{ end : { $gte:new Date(1393425621000) }}
]
}
]
}
]
}
).explain()
{
"cursor" : "BtreeCursor app_1",
"isMultiKey" : true,
"n" : 138,
"nscannedObjects" : 10716598,
"nscanned" : 10716598,
"nscannedObjectsAllPlans" : 10716598,
"nscannedAllPlans" : 10716598,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 30658,
"nChunkSkips" : 0,
"millis" : 38330,
"indexBounds" : {
"app" : [
[
"MyAppName",
"MyAppName"
]
]
},
"server" : "127.0.0.1:27017"
}
I know this can happen because 10716598 documents match the 'app' field, but the other clauses should return a much smaller subset.
Is there any way I can optimize this? The aggregation framework comes to mind, but I was thinking that there may be a better way to optimize this, possibly using indexes.
Edit:
It looks like if I add an index on app-start-end, as Josh suggested, I get better results. I am not sure whether I can optimize this further, but the results are much better:
{
"cursor" : "BtreeCursor app_1_start_1_end_1",
"isMultiKey" : false,
"n" : 138,
"nscannedObjects" : 138,
"nscanned" : 8279154,
"nscannedObjectsAllPlans" : 138,
"nscannedAllPlans" : 8279154,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 2934,
"nChunkSkips" : 0,
"millis" : 13539,
"indexBounds" : {
"app" : [
[
"MyAppName",
"MyAppName"
]
],
"start" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"end" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "127.0.0.1:27017"
}
You can use a compound index to further improve performance.
Try using .ensureIndex({app:1, start:1, end:1})
This will allow Mongo to match on app using an index; then, within the documents that matched on app, it will match on start, also using the index. Likewise, within the documents that matched on app and start, it will match on end using the index.
I doubt $and is executed in parallel, and I haven't seen any documentation suggesting so. It just logically doesn't make sense, as $and needs both conditions to be present, whereas with $or only one needs to exist.
Your example only uses "start" & "end" without "app". I would drop "app" from the compound index, which should reduce the index size. That reduces the chance of RAM swapping if your database grows too big.
If searching on "app" happens separately from "start" & "end", then having a separate single-field index on "app" only, plus the compound index on "start" & "end", will be more efficient.
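A sketch of that alternative scheme (these ensureIndex calls replace the single { app, start, end } compound index suggested above):
> db.app_logins.ensureIndex({ app : 1 })
> db.app_logins.ensureIndex({ start : 1, end : 1 })
Whether this beats the compound index depends on how selective "app" is compared to the date-range clauses.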
I created a multi-key compound index via Casbah (Scala library for Mongo):
db.collection.ensureIndex(MongoDBObject("Header.records.n" -> 1) ++ MongoDBObject("Header.records.v" -> 1) ++ MongoDBObject("Header.records.l" -> 1))
Then, via the Mongo Shell, I ran a db.collection.find(...).explain() where nscannedObjects exceeded db.collection.count(). Looking at the Mongo docs, it appears that ensureIndex needs to be called only once, after which any writes will update the index.
However, I saw a post, and this one, saying that it's only required to call db.collection.ensureIndex(...) once.
EDIT
>db.collection.find( {"Header.records" : {$all : [
{$elemMatch: {n: "Name", v: "Kevin",
"l" : { "$gt" : 0 , "$lt" : 15}} }]}},
{_id : 1}).explain()
{
"cursor" : "BtreeCursor
Header.records.n_1_Header.records.v_1_Header.records.l_1",
"isMultiKey" : true,
"n" : 4098,
"nscannedObjects" : 9412,
"nscanned" : 9412,
"nscannedObjectsAllPlans" : 9412,
"nscannedAllPlans" : 9412,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 152,
"indexBounds" : {
"Header.records.n" : [
[
"Name",
"Name"
]
],
"Header.records.v" : [
[
"Kevin",
"Kevin"
]
],
"Header.records.l" : [
[
0,
1.7976931348623157e+308
]
]
},
"server" : "ABCD:27017"
Note that nscanned (9412) > count (4248).
> db.collection.count()
4248
Why?
About "nscanned" exceeding the count, that is probable since you actually have way more index entries than you have documents: each item in your list is an index entry. It seems like here you have on average 2 items in list per document. "nscannedObjects" follows the same principle since that counter is incremented whenever a document is looked at, even if the same document was already looked at earlier as part of the same query.
I have a collection of documents with a multikey index defined. However, query performance is pretty poor for just 43K documents. Is ~215 ms considered poor for this query? Did I define the index correctly, given that nscanned is 43902 (which equals the total number of documents in the collection)?
Document:
{
"_id": {
"$oid": "50f7c95b31e4920008dc75dc"
},
"bank_accounts": [
{
"bank_id": {
"$oid": "50f7c95a31e4920009b5fc5d"
},
"account_id": [
"ff39089358c1e7bcb880d093e70eafdd",
"adaec507c755d6e6cf2984a5a897f1e2"
]
}
],
"created_date": "2013,01,17,09,50,19,274089",
}
Index:
{ "bank_accounts.bank_id" : 1 , "bank_accounts.account_id" : 1}
Query:
db.visitor.find({ "bank_accounts.account_id" : "ff39089358c1e7bcb880d093e70eafdd" , "bank_accounts.bank_id" : ObjectId("50f7c95a31e4920009b5fc5d")}).explain()
Explain:
{
"cursor" : "BtreeCursor bank_accounts.bank_id_1_bank_accounts.account_id_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 43902,
"nscanned" : 43902,
"nscannedObjectsAllPlans" : 43902,
"nscannedAllPlans" : 43902,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 213,
"indexBounds" : {
"bank_accounts.bank_id" : [
[
ObjectId("50f7c95a31e4920009b5fc5d"),
ObjectId("50f7c95a31e4920009b5fc5d")
]
],
"bank_accounts.account_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Not_Important"
}
I see three factors in play.
First, for application purposes, make sure that $elemMatch isn't the more appropriate query for this use case: http://docs.mongodb.org/manual/reference/operator/elemMatch/. It would be bad if wrong results came back because several subdocuments together satisfied the query.
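A sketch of the $elemMatch form of this query, using the values from above; both conditions must then hold on the same bank_accounts element:
> db.visitor.find({ "bank_accounts" : { "$elemMatch" : {
"bank_id" : ObjectId("50f7c95a31e4920009b5fc5d"),
"account_id" : "ff39089358c1e7bcb880d093e70eafdd"
} } }).explain()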
Second, I imagine the high nscanned value can be accounted for by querying on each of the field values independently: .find({ "bank_accounts.bank_id": X }) vs. .find({ "bank_accounts.account_id": Y }). You may see that nscanned for the full query is about equal to nscanned of the larger subquery. If the index key were being evaluated fully as a range, this would not be expected, but...
Third, the { "bank_accounts.account_id" : [[{"$minElement" : 1},{"$maxElement" : 1}]] } clause of the explain plan shows that no range is being applied to this portion of the key.
I'm not really sure why, but I suspect it has something to do with account_id's nature (an array within a subdocument within an array). 200 ms seems about right for an nscanned that high.
A more performant document organization might be to denormalize the account_id -> bank_id relationship within the subdocument, and store:
{"bank_accounts": [
{
"bank_id": X,
"account_id: Y,
},
{
"bank_id": X,
"account_id: Z,
}
]}
instead of:
{"bank_accounts": [{
"bank_id": X,
"account_id: [Y, Z],
}]}
My tests below show that with this organization, the query optimizer gets back to work and applies a range to both keys:
> db.accounts.insert({"something": true, "blah": [{ a: "1", b: "2"} ] })
> db.accounts.ensureIndex({"blah.a": 1, "blah.b": 1})
> db.accounts.find({"blah.a": 1, "blah.b": "A RANGE"}).explain()
{
"cursor" : "BtreeCursor blah.a_1_blah.b_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 0,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"blah.a" : [
[
1,
1
]
],
"blah.b" : [
[
"A RANGE",
"A RANGE"
]
]
}
}