Does query order affect compound index usage - mongodb

MongoDB compound indexes support queries on any prefix of the index fields, but must the order of fields in the query match the order of fields in the compound index itself?
Assuming we have the following index:
{ "item": 1, "location": 1, "stock": 1 }
Does it cover this query:
{"location" : "Antarctica", "item" : "Hamster Wheel"}

Yes. The order of the fields at index creation time matters.
In your example above, all queries that filter on "item" may use the index, but queries that do not include the "item" field and filter only on "location" and/or "stock" will NOT use this index.
The order of the fields in the filter of the "read" query does NOT matter. MongoDB is smart enough to know that
{"location" : "Antarctica", "item" : "Hamster Wheel"}
is the same as
{"item" : "Hamster Wheel", "location" : "Antarctica"}
As others have pointed out, the best way to ensure that your query is using the index is to run an explain on your query: http://bit.ly/1oE6zo1
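If it helps to see the prefix rule spelled out, here is a small standalone JavaScript sketch. It is an illustration of the rule, not MongoDB's actual planner, and the helper name is made up:

```javascript
// Sketch: a compound index can serve a query when the queried fields
// include the index's leading field(s); the order of keys inside the
// filter document itself is irrelevant.
function canUsePrefix(indexFields, queryFields) {
  const queried = new Set(queryFields);
  // Walk the index left to right and count how many leading fields
  // the query actually constrains.
  let covered = 0;
  for (const field of indexFields) {
    if (queried.has(field)) covered++;
    else break;
  }
  return covered > 0; // at least the leading field is constrained
}

const index = ["item", "location", "stock"];

// Both orderings of the same filter are equivalent:
console.log(canUsePrefix(index, ["location", "item"])); // true
console.log(canUsePrefix(index, ["item", "location"])); // true

// A filter that skips the leading field cannot use this index:
console.log(canUsePrefix(index, ["location", "stock"])); // false
```

Only the leading-field requirement matters for eligibility; the key order inside the filter document is normalized by the server.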

Related

Mongo index for query

I have a collection with millions of records. I am trying to implement an autocomplete on a field called term that I broke down into an array of words called words. My query is very slow because I am missing something with regards to the index. Can someone please help?
I have the following query:
db.vx.find({
    semantic: "product",
    concept: true,
    active: true,
    $and: [ { words: { $regex: "^doxycycl.*" } } ]
}).sort({ length: 1 }).limit(100).explain()
The explain output says that no index was used even though I have the following index:
{
    "v" : 1,
    "key" : {
        "words" : 1,
        "active" : 1,
        "concept" : 1,
        "semantic" : 1
    },
    "name" : "words_1_active_1_concept_1_semantic_1",
    "ns" : "mydatabase.vx"
}
You can check whether the compound index is being used correctly from the mongo shell
db.vx.find({YOURQUERY}).explain('executionStats')
and check the field winningPlan.stage:
COLLSCAN means a full collection scan was performed and the index was not used.
IXSCAN means the index was used to answer this query.
You can also check whether MongoDB's text search fits your needs, since it is much faster than the $regex operator.
https://comsysto.com/blog-post/mongodb-full-text-search-vs-regular-expressions
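A note on why the anchored regex in this question is still index-friendly: a regex anchored at the start of the string, like ^doxycycl, translates into a contiguous range of index keys, whereas an unanchored regex forces every key to be examined. Here is a standalone JavaScript sketch of that bound computation (illustrative only, not MongoDB's exact algorithm):

```javascript
// Sketch: an anchored prefix maps to a contiguous key range, so only
// index keys in ["doxycycl", "doxycycm") must be examined. An
// unanchored regex has no such bounds and forces a scan of all keys.
function prefixBounds(prefix) {
  // Naive upper bound: bump the last character by one code point.
  // (A real implementation must handle edge cases such as characters
  // already at the maximum code point.)
  const last = prefix.charCodeAt(prefix.length - 1);
  return {
    lower: prefix,
    upper: prefix.slice(0, -1) + String.fromCharCode(last + 1),
  };
}

function inBounds(key, { lower, upper }) {
  return key >= lower && key < upper;
}

const bounds = prefixBounds("doxycycl");
console.log(bounds.upper);                    // "doxycycm"
console.log(inBounds("doxycycline", bounds)); // true
console.log(inBounds("ibuprofen", bounds));   // false
```

This is also why the trailing .* in ^doxycycl.* is redundant: the anchor alone is what makes the bounded scan possible.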

Is my MongoDB index corrupt? adding additional query parameter increases the found documents

My query is failing to find all matching results. If I add an additional _id parameter for a specific matching document, I do get a result:
> db.reviews.count({"contentProvider":"GLORP", "responses.0": {$exists: true}})
0
> db.reviews.count({_id: "1234", "contentProvider":"GLORP", "responses.0": {$exists: true}})
1
The first query is using this index:
"indexName" : "contentProvider_1_reviewDetail_1_reviewerUserName_1_providerReviewId_1",
and the query with the _id is of course using the _id_ index:
"indexName" : "_id_"
Here is the index in question:
{
    "v" : 1,
    "key" : {
        "contentProvider" : 1,
        "reviewDetail" : 1,
        "reviewerUserName" : 1,
        "providerReviewId" : 1
    },
    "name" : "contentProvider_1_reviewDetail_1_reviewerUserName_1_providerReviewId_1",
    "ns" : "test.reviews",
    "background" : true
}
Using mongodb version 3.2.3
Is the index corrupted? Will dropping it and readding it likely fix the problem?
It's possible, and you could certainly try it; however, I cannot say for certain that it will fix the problem.
There are multiple types of indexes, as well as index properties such as sparse or partial, that can change behavior and may explain why the index doesn't return the results you expect.
I'd recommend checking the index first and see if the index definition has any properties that would result in the document being excluded.
If not then you can always drop the index and recreate it.
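For instance, here is a standalone JavaScript sketch of how a partial index (available since MongoDB 3.2) could produce exactly this symptom. The field names mirror the question, but the filter expression is hypothetical and the simulation is an illustration, not MongoDB internals:

```javascript
// A partial index only contains entries for documents that match its
// partialFilterExpression, so a query answered from that index can
// silently miss documents.
const docs = [
  { _id: "1234", contentProvider: "GLORP", responses: ["r1"] },
  { _id: "5678", contentProvider: "GLORP", reviewDetail: "ok", responses: ["r2"] },
];

// Suppose the index was built with (hypothetically):
//   partialFilterExpression: { reviewDetail: { $exists: true } }
function partialIndexEntries(collection, filterField) {
  return collection.filter((d) => filterField in d);
}

const indexed = partialIndexEntries(docs, "reviewDetail");

// Counting via the partial index misses doc "1234"...
const viaIndex = indexed.filter((d) => d.contentProvider === "GLORP").length;
// ...while a full collection scan (or the _id index) finds both.
const viaScan = docs.filter((d) => d.contentProvider === "GLORP").length;

console.log(viaIndex); // 1
console.log(viaScan);  // 2
```

Running db.reviews.getIndexes() shows any partialFilterExpression or sparse flag on the index definition, which is the first thing to check.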

Conflict in choosing the perfect index in MongoDB query optimizer

My problem is related to the query optimizer of MongoDB and how it picks the best index to use. I realized that under some conditions the optimizer doesn't pick the best existing index and instead continues using one that is close enough.
Consider having a simple dataset like:
{ "_id" : 1, "item" : "f1", "type" : "food", "quantity" : 500 }
{ "_id" : 2, "item" : "f2", "type" : "food", "quantity" : 100 }
{ "_id" : 3, "item" : "p1", "type" : "paper", "quantity" : 200 }
{ "_id" : 4, "item" : "p2", "type" : "paper", "quantity" : 150 }
{ "_id" : 5, "item" : "f3", "type" : "food", "quantity" : 300 }
{ "_id" : 6, "item" : "t1", "type" : "toys", "quantity" : 500 }
{ "_id" : 7, "item" : "a1", "type" : "apparel", "quantity" : 250 }
{ "_id" : 8, "item" : "a2", "type" : "apparel", "quantity" : 400 }
{ "_id" : 9, "item" : "t2", "type" : "toys", "quantity" : 50 }
{ "_id" : 10, "item" : "f4", "type" : "food", "quantity" : 75 }
and then issue the following query:
db.inventory.find({"type": "food","quantity": {$gt: 50}})
I go ahead and create the following index:
db.inventory.ensureIndex({"quantity" : 1, "type" : 1})
The statistics from cursor.explain() confirm that this index has the following performance: ("n" : 4, "nscannedObjects" : 4, "nscanned" : 9). It scanned more index keys than the number of matching documents. Considering that "type" is a more selective attribute with an equality match, it is surely better to create the following index instead:
db.inventory.ensureIndex({ "type" : 1, "quantity" : 1})
The statistics also confirm that this index performs better: ("n" : 4, "nscannedObjects" : 4, "nscanned" : 4), meaning the second index scans exactly as many index keys as there are matched documents.
However, I observed that if I don't delete the first index, the query optimizer continues using it, even after the better index has been created.
According to the documentation, every time a new index is created the query optimizer considers it when building the query plan, but I don't see that happening here.
Can anyone explain how the query optimizer really works?
Considering the fact that "type" is a higher selective attribute
Index selectivity is a very important aspect, but in this case, note that you're using an equality query on type and a range query on quantity, which is the more compelling reason to swap the order of the index fields, even if the selectivity were lower.
However, I observed if I don't delete the first index, the query optimizer continues using the first index, although the better index is got created. [...]
The MongoDB query optimizer is largely statistical. Unlike most SQL engines, MongoDB doesn't attempt to reason what could be a more or less efficient index. Instead, it simply runs different queries in parallel from time to time and remembers which one was faster. The faster strategy will then be used. From time to time, MongoDB will perform parallel queries again and re-evaluate the strategy.
One problem of this approach (and maybe the cause of the confusion) is that there's probably not a big difference with such a tiny dataset - it's often better to simply scan elements than to use any kind of index or search strategy if the data isn't large compared to the prefetch / page size / cache size and pipeline length. As a rule of thumb, simple lists of up to maybe 100 or even 1,000 elements often don't benefit from indexing at all.
As with any design work, creating indexes requires some forward thinking. The goals are:
Efficiency - fast read / write operations
Selectivity - minimize records scanning
Other requirements - e.g. how are sorts handled?
Selectivity is the primary factor that determines how efficiently an index can be used. Ideally, the index enables us to select only those records required to complete the result set, without the need to scan a substantially larger number of index keys (or documents) in order to complete the query. Selectivity determines how many records any subsequent operations must work with. Fewer records means less execution time.
Think about which queries the application will run most frequently. Use the explain command and specifically look at the executionStats:
nReturned
totalKeysExamined - is the number of keys examined much larger than the number of documents returned? If so, an index is needed to close that gap.
Look at queryPlanner.rejectedPlans, and look at winningPlan, whose keyPattern shows which keys were indexed. Whenever we see stage: SORT, it means the sort key is not part of the index, so the database could not return documents in the sort order specified in the query and had to perform an in-memory sort. If we add the key on which the sort happens to the index, the winningPlan's stage changes from SORT to FETCH. The keys in the index should be ordered according to the cardinality of their data; e.g., a class field will have fewer distinct values than a student field. This is a trade-off: executionTimeMillis drops sharply while docsExamined and keysExamined grow a little, but the trade-off is worth making.
There is also a way to force a query to use a particular index, but this is not recommended for deployed code. The method in question is .hint(), which can be chained after find() or sort(); it takes either the index name or the index's key pattern.
In general, when building compound indexes for:
- equality field: field on which queries will perform an equality test
- sort field: field on which queries will specify a sort
- range field: field on which queries perform a range test
Keep the following rules of thumb in mind:
Equality fields before range fields
Sort fields before range fields
Equality fields before sort fields
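These three rules are often summarized as the Equality-Sort-Range (ESR) guideline. A tiny standalone JavaScript sketch of the ordering (the helper is made up for illustration):

```javascript
// Given the role each queried field plays, emit a compound-index key
// order that puts equality fields first, then sort fields, then
// range fields.
function esrKeyOrder(fields) {
  const rank = { equality: 0, sort: 1, range: 2 };
  return [...fields]
    .sort((a, b) => rank[a.role] - rank[b.role])
    .map((f) => f.name);
}

// For the inventory query above: "type" is an equality test and
// "quantity" is a range test, so "type" should lead the index.
console.log(esrKeyOrder([
  { name: "quantity", role: "range" },
  { name: "type", role: "equality" },
])); // [ 'type', 'quantity' ]
```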

Index for sorting while using $in to query field containing an array

I'm querying an array using $in operator and I'm also trying to sort results, but I keep getting this error:
too much data for sort() with no index. add an index or specify a
smaller limit
I know that the sort is limited to 32 megabytes unless an index can be used. The problem is that I have a compound index on the field I'm querying and the field I'm sorting on.
The collection:
{
    "a" : [ 1, 5, 7, 10 ],
    ... // other fields are not relevant for querying
}
The query looks like this:
db.mycol.find({ a: { $in : [ 1, 10, 19, 100, 2000 ] }}).sort({b : 1});
The $in query contains approx. 2000 IDs to match.
The index is
{
    "v" : 1,
    "key" : {
        "a" : 1,
        "b" : 1
    },
    "ns" : "db.mycol",
    "name" : "a_1_b_1",
    "background" : true
}
If I use explain() on the query without sort() I can see that MongoDB is using that index to perform the query, but it evidently cannot use the same index to perform the sort. I also tried using skip and limit, but with a large enough skip I get the same error, probably because the index is not used for sorting.
If I create an index only on field b, MongoDB will happily sort the data for me. But what I really want is to search on the indexed array field and sort the results.
I looked at the documentation but I couldn't find anything helpful. Did I encounter a bug in MongoDB or I'm doing something wrong?

MongoDB not using compound index on '_id'

I have a collection in MongoDB which has following documents.
/* 0 */
{
    "T" : [
        374135056604448742
    ],
    "_id" : {
        "#" : 7778532275691,
        "ts" : ISODate("2013-07-26T02:25:00Z")
    }
}
/* 1 */
{
    "T" : [
        1056188940167152853
    ],
    "_id" : {
        "#" : 34103385525388,
        "ts" : ISODate("2013-07-30T03:00:00Z")
    }
}
/* 2 */
{
    "T" : [
        1056188940167152853
    ],
    "_id" : {
        "#" : 34103385525388,
        "ts" : ISODate("2013-07-30T03:18:00Z")
    }
}
Now, I'm trying to query some documents with following query.
db.entries.find({
    '_id.ts': {'$gte': beginTS, '$lte': endTS},
    '_id.#' : 884327843395156951
}).hint([('_id', 1)]).explain()
According to my understanding, since _id is a compound field and Mongo always maintains an index on _id, Mongo should have used the _id index to answer the above query. However, the result of the above query is the following:
{u'allPlans': [{u'cursor': u'BtreeCursor _id_',
u'indexBounds': {u'_id': [[{u'$minElement': 1}, {u'$maxElement': 1}]]},
u'n': 2803,
u'nscanned': 4869528,
u'nscannedObjects': 4869528}],
u'cursor': u'BtreeCursor _id_',
u'indexBounds': {u'_id': [[{u'$minElement': 1}, {u'$maxElement': 1}]]},
u'indexOnly': False,
u'isMultiKey': False,
u'millis': 128415,
u'n': 2803,
u'nChunkSkips': 0,
u'nYields': 132,
u'nscanned': 4869528,
u'nscannedAllPlans': 4869528,
u'nscannedObjects': 4869528,
u'nscannedObjectsAllPlans': 4869528,
u'scanAndOrder': False,
As can be observed, MongoDB is scanning the entire collection to find just a handful of documents. I don't know what the hell is wrong here.
I tried changing the order of the query fields, but got the same result. I have no idea what is happening here. Any help is deeply appreciated.
UPDATE
I understood the nuance here. The _id index is not a compound index; it's a plain index on the whole value. This means that if _id is a document then, irrespective of the structure of that document and how many nested attributes or sub-documents it may have, the _id index will only contain one entry per document: the entire _id value, treated as a single key and kept unique.
You are using an object as a key, but you're not using a compound index here.
The _id index is a bit special because it is created automatically and is always unique. Normally, the _id value is an ObjectId, a UUID, or maybe an integer or a string that contains some kind of hash. MongoDB supports complex objects as keys; however, to MongoDB, such a key is still just a document. It can be compared to other documents, and documents that have the same fields and values will be equal. But since you didn't specify the index keys (and you can't create that index manually), MongoDB has no idea that it contains a field # and a field ts.
A compound index, on the other hand, refers to the fields of a document explicitly, e.g. {"product.quantity" : 1, "product.created" : -1}. This must be specified when the index is created.
It seems you're trying to basically store a timestamp in your primary key. MongoDB's ObjectId already contains a timestamp, so you can do date-based range queries on ObjectIds directly.
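To sketch that last point: the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds, so an ObjectId built from a date works as an _id range boundary. A standalone JavaScript illustration follows; the helper name is made up, and in practice drivers provide helpers such as the Node driver's ObjectId.createFromTime:

```javascript
// Build the smallest possible ObjectId hex string for a given date:
// the timestamp as 8 hex chars, with the remaining 16 chars zeroed.
function objectIdLowerBound(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const bound = objectIdLowerBound(new Date("2013-07-26T02:25:00Z"));
console.log(bound.length);      // 24
console.log(bound.slice(0, 8)); // hex seconds since the epoch

// Usable in the shell as:
//   db.entries.find({ _id: { $gte: ObjectId(bound) } })
```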