Which index would be used if there are multiple indexes containing the same fields? - mongodb

Take, for example, a find() that involves the fields a and b, in that order:
db.collection.find({'a':{'$lt':10},'b':{'$lt':5}})
I have two keys in my array of indexes for the collection:
[
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"ns" : "x.test",
"name" : "a_1_b_1"
},
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1,
"c" : 1
},
"ns" : "x.test",
"name" : "a_1_b_1_c_1"
}
]
Is it guaranteed that mongo will use the first key since it more accurately matches the query, or does it randomly choose either of the two keys because both will work?

MongoDB has a query optimizer which selects the indexes that are most efficient. From the docs:
The MongoDB query optimizer processes queries and chooses the most
efficient query plan for a query given the available indexes.
So it's not strictly guaranteed (but I expect that the smaller index will yield results faster than the bigger compound index).
You can also use the hint() method to force the query optimizer to use a specific index:
db.collection.find({'a':{'$lt':10},'b':{'$lt':5}}).hint({a:1, b:1});
However, the two indexes in your example are redundant, because a compound index supports queries on any prefix of its index fields.
The following index:
db.collection.createIndex({a: 1, b: 1, c: 1});
can support queries on a, on a and b, and on a, b and c, but not queries on only b, only c, or only b and c.
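As a rough illustration of the prefix rule described above, here is a minimal sketch in plain JavaScript. The function name supportsAsPrefix is made up for this example, and the check is a simplified model of the rule as stated in this answer, not how the query optimizer actually picks an index.

```javascript
// Simplified model of the prefix rule: the queried fields must be exactly
// the first N fields of the compound index (their order in the query does not matter).
// indexFields: index field names in index order, e.g. ["a", "b", "c"].
// queryFields: field names used in the query filter.
function supportsAsPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > indexFields.length) {
    return false;
  }
  return indexFields.slice(0, wanted.size).every((field) => wanted.has(field));
}

const compound = ["a", "b", "c"];
console.log(supportsAsPrefix(compound, ["a"]));           // a
console.log(supportsAsPrefix(compound, ["a", "b"]));      // a and b
console.log(supportsAsPrefix(compound, ["a", "b", "c"])); // a, b and c
console.log(supportsAsPrefix(compound, ["b"]));           // only b
console.log(supportsAsPrefix(compound, ["b", "c"]));      // only b and c
```

Under this model the query on a and b from the question is a prefix of both a_1_b_1 and a_1_b_1_c_1, which is why either index could serve it.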

You can use $exists. When <boolean> is true, $exists matches the documents that contain the field, including documents where the field value is null. If <boolean> is false, the query returns only the documents that do not contain the field.
The query will be:
db.inventory.find( { "key.a": { $exists: true }, "key.b": { $exists: true } } )

Related

Index vs Aggregation Pipeline for Sorting

I'm developing an application using MongoDB as its database, and for sorting data, I encountered an interesting argument from a colleague that an index can be used instead of an aggregation pipeline to get sorted data.
I tried this and it actually works; using an index with specified field and direction does return sorted data when queried. When using aggregation pipeline, I also obtained the same result.
I have created an index with the following specification:
index name: batch_deleted_a_desc
num: asc
marked: asc
value: desc
Using aggregation pipeline:
> db.test.aggregate([{$match: {num:"3",marked:false}}, {$sort:{"value":-1}}])
{ "_id" : ObjectId("5d70b40ba7bebd3d7c135615"), "value" : 4, "marked" : false, "num" : "3" }
{ "_id" : ObjectId("5d70b414a7bebd3d7c135616"), "value" : 2, "marked" : false, "num" : "3" }
{ "_id" : ObjectId("5d70b3fea7bebd3d7c135614"), "value" : 1, "marked" : false, "num" : "3" }
Using index:
> db.test.find({num:"3",marked:false})
{ "_id" : ObjectId("5d70b40ba7bebd3d7c135615"), "value" : 4, "marked" : false, "num" : "3" }
{ "_id" : ObjectId("5d70b414a7bebd3d7c135616"), "value" : 2, "marked" : false, "num" : "3" }
{ "_id" : ObjectId("5d70b3fea7bebd3d7c135614"), "value" : 1, "marked" : false, "num" : "3" }
As you can see, the results are the same. But I am unsure whether using an index to get sorted data is good practice, and using an aggregation pipeline (sometimes) takes more effort than just creating an index.
So, which would be the best option?
In the context of the question, the better option would be the aggregation because it explicitly specifies the sort.
In the query example, results are returned in the order specified by the index because the query is using the index { num: 1, marked: 1, value: -1 }. However, nothing specified in the query guarantees that ordering, meaning results may change at some point in the future. For example, consider the case where the index { num: 1, marked: 1, updated_at: 1 } were to be created. The query planner may decide to use this index instead, which may return results in a different order.
In the absence of a sort, a query would return results in the order of the index being used, but you should not rely upon that ordering without explicitly specifying it. Quoting the docs:
Unless you specify the sort() method or use the $near operator,
MongoDB does not guarantee the order of query results.
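To make the point about explicit ordering concrete, here is a small in-memory sketch in plain JavaScript. It is not MongoDB code: findSorted is a hypothetical stand-in that only handles equality filters and a single sort key, and the documents are the three from the question, listed out of order on purpose to mimic a different index being chosen.

```javascript
// The three documents from the question, deliberately not in value order.
const docs = [
  { value: 1, marked: false, num: "3" },
  { value: 4, marked: false, num: "3" },
  { value: 2, marked: false, num: "3" },
];

const filter = { num: "3", marked: false };
const sortSpec = { value: -1 }; // explicit: value descending

// Hypothetical stand-in for find(filter).sort(sortSpec).
// Supports equality filters and exactly one sort key.
function findSorted(collection, filter, sortSpec) {
  const [[field, direction]] = Object.entries(sortSpec);
  return collection
    .filter((doc) => Object.entries(filter).every(([k, v]) => doc[k] === v))
    .sort((x, y) => {
      const cmp = x[field] < y[field] ? -1 : x[field] > y[field] ? 1 : 0;
      return cmp * direction;
    });
}

const sortedValues = findSorted(docs, filter, sortSpec).map((d) => d.value);
console.log(sortedValues);

// Shell equivalent with the sort stated explicitly:
//   db.test.find({ num: "3", marked: false }).sort({ value: -1 })
```

Because the sort is part of the query, the output order no longer depends on which index, if any, happens to be scanned.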

Mongo index for query

I have a collection with millions of records. I am trying to implement an autocomplete on a field called term that I broke down into an array of words called words. My query is very slow because I am missing something with regards to the index. Can someone please help?
I have the following query:
db.vx.find({
semantic: "product",
concept: true,
active: true,
$and: [ { words: { $regex: "^doxycycl.*" } } ]
}).sort({ length: 1 }).limit(100).explain()
The explain output says that no index was used even though I have the following index:
{
"v" : 1,
"key" : {
"words" : 1,
"active" : 1,
"concept" : 1,
"semantic" : 1
},
"name" : "words_1_active_1_concept_1_semantic_1",
"ns" : "mydatabase.vx"
}
You can check if the compound index is exploited correctly using the mongo shell
db.vx.find({YOURQUERY}).explain('executionStats')
and check the field winningPlan.stage:
COLLSCAN means the whole collection was scanned and no index was used.
IXSCAN means an index was used for this query (it may appear as a nested input stage, for example under FETCH).
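Since the scan stage is often nested below other stages, it can help to walk the whole plan. The following is a minimal sketch in plain JavaScript, assuming the classic explain layout where each stage has a stage name and an optional inputStage or inputStages. The helper name collectStages and the sample plan object are made up for illustration; real explain() output contains many more fields and may be laid out differently depending on the server version.

```javascript
// Collect every stage name in a plan by following inputStage / inputStages.
function collectStages(plan) {
  if (!plan) {
    return [];
  }
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage, ...children.flatMap((child) => collectStages(child))];
}

// Hypothetical, heavily trimmed explain() result used only for this example.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "SORT",
      inputStage: {
        stage: "FETCH",
        inputStage: {
          stage: "IXSCAN",
          indexName: "words_1_active_1_concept_1_semantic_1",
        },
      },
    },
  },
};

const stages = collectStages(sampleExplain.queryPlanner.winningPlan);
const usesIndex = stages.includes("IXSCAN");
const scansCollection = stages.includes("COLLSCAN");
console.log(stages, usesIndex, scansCollection);
```

In the shell, the same helper could be applied to the object returned by db.vx.find(...).explain('executionStats'), again assuming that layout.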
You can also check whether text search fits your needs, since it is much faster than the $regex operator.
https://comsysto.com/blog-post/mongodb-full-text-search-vs-regular-expressions

Fetch all documents from mongo shard chunk

I have a 3-shard collection with over 65k chunks. How can I fetch the documents from this collection that reside in a particular chunk, bearing in mind that my shard key is a hashed index?
For example, one of my chunks (a document from chunks collection) looks like this:
{
"_id" : "foo.bar-x_1947951600265057904",
"lastmod" : Timestamp(1, 0),
"lastmodEpoch" : ObjectId("57910236c0b70d5ea7025479"),
"ns" : "foo.bar",
"min" : {
"x" : NumberLong("123") <-- this is a hashed value of a string field `x`
},
"max" : {
"x" : NumberLong("987") <-- this is a hashed value of a string field `x`
},
"shard" : "shard0000"
}
How can I query the foo.bar collection by min/max x, given that x is a regular string? I tried:
db.bar.find({x:{$gte: NumberLong('123'), $lte: NumberLong('987')}}).count()
but it returns 0
You need to access the subdocument properly, i.e. by using min.x and max.x. Also, you have to combine the conditions with $and, because they are now separate conditions. I entered the following query and it worked:
db.bar.find({
$and: [
{ 'min.x': { $gte: NumberLong('123') } },
{ 'max.x': { $lte: NumberLong('987') } }
]
}).count()
Your initial query returned 0 because there is no document entry that matches x.
Note that I did not use any sharded collections. I simply created a new collection and inserted the example you gave.
I hope this helps!

MapReduce index usage in MongoDB when using both sort and query

For optimal performance, if you provide both a sort and a query to the MapReduce, should you have:
one index with fields used in the sort, then the fields used in the query
one index with fields used in the query, then the fields used in the sort
two separate indexes
E.g. document contains fields A, B, C, D.
Map-Reduce is using a sort on field A and a query by fields B, C.
Which of the following indexes would be preferable:
{ "A" : 1, "B" : 1, "C" : 1 }
{ "B" : 1, "C" : 1, "A" : 1 }
{ "A" : 1 }, { "B" : 1, "C" : 1 }
Is this documented anywhere? (Index usage by map-reduce when using both sort and query.)

Index for sorting while using $in to query field containing an array

I'm querying an array using $in operator and I'm also trying to sort results, but I keep getting this error:
too much data for sort() with no index. add an index or specify a
smaller limit
I know that the sort is limited to 32 megabytes unless it can use an index. The problem is that I do have a compound index on the field that I'm querying and the field that I'm sorting on.
The collection:
{
"a" : [ 1, 5, 7, 10 ],
... // other fields are not relevant for querying
}
The query looks like this:
db.mycol.find({ a: { $in : [ 1, 10, 19, 100, 2000 ] }}).sort({b : 1});
The $in query contains approx. 2000 IDs to match.
The index is
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"ns" : "db.mycol",
"name" : "a_1_b_1",
"background" : true
},
If I use explain() when doing the query without sort() I can see that MongoDB is using that index to perform the query, but it obviously cannot use that same index to perform the sort. I also tried to use a skip and limit, but if I use a skip that's too big I get the same error, probably because index is not used for sorting.
If I create an index only on field b, MongoDB will happily sort the data for me. But what I really want is to perform a search on the indexed array field and sort the data.
I looked at the documentation but I couldn't find anything helpful. Did I encounter a bug in MongoDB, or am I doing something wrong?
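One way to picture the observation above, that the { a: 1, b: 1 } index serves the $in match but not the sort on b, is the following sketch in plain JavaScript. It is a simplified model with made-up index entries: it only shows the order in which a straightforward scan of such an index would yield entries, and it ignores any merge strategies the query planner may apply.

```javascript
// Hypothetical entries of an index on { a: 1, b: 1 }, already in index order
// (sorted by a first, then by b within each a).
const indexEntries = [
  { a: 1, b: 7 },
  { a: 1, b: 9 },
  { a: 5, b: 2 },
  { a: 10, b: 3 },
  { a: 10, b: 8 },
];

// Walk the index in order and keep the entries whose a is in the $in list.
function scanIn(entries, inValues) {
  const wanted = new Set(inValues);
  return entries.filter((entry) => wanted.has(entry.a));
}

// True if the entries are in non-decreasing order of the given field.
function isSortedBy(entries, field) {
  return entries.every((entry, i) => i === 0 || entries[i - 1][field] <= entry[field]);
}

const singleValue = scanIn(indexEntries, [10]);
const manyValues = scanIn(indexEntries, [1, 10]);

const singleSortedByB = isSortedBy(singleValue, "b");
const manySortedByB = isSortedBy(manyValues, "b");
console.log(singleSortedByB, manySortedByB);
```

With a single value of a, the scanned entries come out ordered by b; once several values of a are matched, the b values are only ordered within each a, so a separate sort step is still needed.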