Mongo index for query - mongodb

I have a collection with millions of records. I am trying to implement an autocomplete on a field called term that I broke down into an array of words called words. My query is very slow because I am missing something with regards to the index. Can someone please help?
I have the following query:
db.vx.find({
semantic: "product",
concept: true,
active: true,
$and: [ { words: { $regex: "^doxycycl.*" } } ]
}).sort({ length: 1 }).limit(100).explain()
The explain output says that no index was used even though I have the following index:
{
"v" : 1,
"key" : {
"words" : 1,
"active" : 1,
"concept" : 1,
"semantic" : 1
},
"name" : "words_1_active_1_concept_1_semantic_1",
"ns" : "mydatabase.vx"
}

You can check if the compound index is exploited correctly using the mongo shell
db.vx.find({YOURQUERY}).explain('executionStats')
and check the stages under queryPlanner.winningPlan (the scan stage may be nested inside an inputStage):
COLLSCAN means the whole collection was scanned, so no index was used for this query.
IXSCAN means an index was scanned to satisfy the query.
You can also check whether text search fits your needs, since it can be much faster than the $regex operator:
https://comsysto.com/blog-post/mongodb-full-text-search-vs-regular-expressions
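For context on why an anchored, case-sensitive prefix such as ^doxycycl is a good candidate for an index scan, here is a minimal plain JavaScript sketch. It is not MongoDB code: prefixRange and the sortedWords array are hypothetical stand-ins for sorted index keys, and the sketch assumes the last character of the prefix can simply be incremented.

```javascript
// Hypothetical helper: turn a case-sensitive prefix into a half-open key range.
// Assumes the last character of the prefix is not the maximum code unit.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper }; // matches keys k with lower <= k < upper
}

// Toy stand-in for the sorted keys of an index on "words".
const sortedWords = [
  "dog",
  "doxazosin",
  "doxycycline",
  "doxycycline hyclate",
  "doxylamine",
  "zinc",
];

const { lower, upper } = prefixRange("doxycycl");
// Only the keys inside the range need to be visited.
const hits = sortedWords.filter((w) => w >= lower && w < upper);
console.log(lower, upper, hits);
```

A regex that is not anchored at the start cannot be reduced to a single range like this, which is one reason such queries tend to be slower.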


MongoDB querying to with changing values for key

I'm trying to get back into MongoDB and I've come across something that I can't figure out.
I have this data structure
> db.ratings.find().pretty()
{
"_id" : ObjectId("55881e43424cbb1817137b33"),
"e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
"type" : "like",
"time" : 1434984003156,
"u_id" : ObjectId("55817c072e48b4b60cf366a7")
}
{
"_id" : ObjectId("55893be1e6a796c0198e65d3"),
"e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
"type" : "dislike",
"time" : 1435057121808,
"u_id" : ObjectId("55817c072e48b4b60cf366a7")
}
{
"_id" : ObjectId("55893c21e6a796c0198e65d4"),
"e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
"type" : "null",
"time" : 1435057185089,
"u_id" : ObjectId("55817c072e48b4b60cf366a7")
}
What I want to be able to do is count the documents that have either a like or a dislike, leaving the "null" out of the count. So I should get a count of 2. I tried to go about it like this, setting the query to both values:
db.ratings.find({e_id: ObjectId("5565e106cd7a763b2732ad7c")}, {type: "like", type: "dislike"})
But this just prints out all three documents. Is there any reason?
If it's glaringly obvious I'm sorry, I'm pulling out my hair at the moment.
Use the following db.collection.count() method which returns the count of documents that would match a find() query:
db.ratings.count({
"e_id": ObjectId("5565e106cd7a763b2732ad7c"),
type: {
"$in": ["like", "dislike"]
}
})
The db.collection.count() method is equivalent to the db.collection.find(query).count() construct. Your query selection criteria above can be interpreted as:
Get me the count of all documents which have the e_id field values as ObjectId("5565e106cd7a763b2732ad7c") AND the type field which has either value "like" or "dislike", as depicted by the $in operator that selects the documents where the value of a field equals any value in the specified array.
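The selection logic described above can be mimicked in plain JavaScript. This is only a sketch of the matching rule, not of how MongoDB executes the query; countMatching is a hypothetical helper and the ObjectIds are replaced by plain strings.

```javascript
// Toy copies of the three documents from the question (ObjectIds as strings).
const ratings = [
  { e_id: "5565e106cd7a763b2732ad7c", type: "like" },
  { e_id: "5565e106cd7a763b2732ad7c", type: "dislike" },
  { e_id: "5565e106cd7a763b2732ad7c", type: "null" },
];

// Hypothetical helper: e_id must be equal AND type must be one of allowedTypes,
// which is what { e_id: ..., type: { $in: [...] } } expresses.
function countMatching(docs, eId, allowedTypes) {
  return docs.filter(
    (d) => d.e_id === eId && allowedTypes.includes(d.type)
  ).length;
}

const likeOrDislike = countMatching(
  ratings,
  "5565e106cd7a763b2732ad7c",
  ["like", "dislike"]
);
console.log(likeOrDislike);
```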
db.ratings.find({e_id: ObjectId("5565e106cd7a763b2732ad7c")},
{type: "like", type: "dislike"})
But this just prints out all three documents. Is there any reason? If it's glaringly obvious I'm sorry, I'm pulling out my hair at the moment.
The second argument here is the projection used by the find method. It specifies the fields that should be included, regardless of their value, so it does not filter documents at all. Normally, you specify a value of 1 or true to include a field. Evidently, MongoDB accepted other values as true here.
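A further detail worth checking in plain JavaScript: an object literal that repeats a key keeps only the last value, so the second argument from the question never carried both values in the first place. The variable name secondArgument is just for illustration.

```javascript
// The literal from the question: the key "type" appears twice.
const secondArgument = { type: "like", type: "dislike" };

// Only one key survives, and it holds the last value written.
const keys = Object.keys(secondArgument);
console.log(keys.length, secondArgument.type);
```

This is why an operator such as $in (or $or) is needed to express "either value" for a single field.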
If you only need to count documents, you should issue a count command:
> db.runCommand({count: 'collection',
query: { "e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
type: { $in: ["like", "dislike"]}}
})
{ "n" : 2, "ok" : 1 }
Please note the Mongo Shell provides the count helper for that:
> db.collection.find({ "e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
type: { $in: ["like", "dislike"]}}).count()
2
That being said, to quote the documentation, using the count command "can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress." To avoid that, you might prefer using the aggregation framework:
> db.collection.aggregate([
{ $match: { "e_id" : ObjectId("5565e106cd7a763b2732ad7c"),
type: { $in: ["like", "dislike"]}}},
{ $group: { _id: null, n: { $sum: 1 }}}
])
{ "_id" : null, "n" : 2 }
This query should solve your problem
db.ratings.find({$or : [{"type": "like"}, {"type": "dislike"}]}).count()

Which index would be used if there are multiple indexes containing the same fields?

Take, for example, a find() that involves a field a and b, in that order. For example,
db.collection.find({'a':{'$lt':10},'b':{'$lt':5}})
I have two keys in my array of indexes for the collection:
[
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"ns" : "x.test",
"name" : "a_1_b_1"
},
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1,
"c" : 1
},
"ns" : "x.test",
"name" : "a_1_b_1_c_1"
}
]
Is it guaranteed that mongo will use the first key since it more accurately matches the query, or does it randomly choose any of the two keys because they will both work?
MongoDB has a query optimizer which selects the indexes that are most efficient. From the docs:
The MongoDB query optimizer processes queries and chooses the most
efficient query plan for a query given the available indexes.
So it's not strictly guaranteed (but I expect that the smaller index will yield results faster than the bigger compound index).
You can also use the hint() method to force the query optimizer to use a specific index:
db.collection.find({'a':{'$lt':10},'b':{'$lt':5}}).hint({a:1, b:1});
However, those two indexes in your example are redundant. That's because the compound index supports queries on any prefix of index fields.
The following index:
db.collection.ensureIndex({a: 1, b: 1, c: 1});
can support queries that include a; a and b; or a, b and c. It cannot support queries on only b, only c, or only b and c.
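The prefix rule can be written down as a small check in plain JavaScript. This is a sketch of the rule as stated above only; usesIndexPrefix is a hypothetical helper, and it deliberately ignores cases where the real planner can still use an index for part of a query.

```javascript
// Hypothetical helper: true when the set of queried fields is exactly
// the first N fields of the compound index, in any order within the query.
function usesIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > indexFields.length) return false;
  return indexFields.slice(0, wanted.size).every((f) => wanted.has(f));
}

const compound = ["a", "b", "c"];
console.log(usesIndexPrefix(compound, ["a"]));           // prefix
console.log(usesIndexPrefix(compound, ["b", "a"]));      // prefix (a, b)
console.log(usesIndexPrefix(compound, ["a", "b", "c"])); // whole index
console.log(usesIndexPrefix(compound, ["b"]));           // not a prefix
console.log(usesIndexPrefix(compound, ["b", "c"]));      // not a prefix
```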
You can use $exists. When <boolean> is true, $exists matches the documents that contain the field, including documents where the field value is null. If <boolean> is false, the query returns only the documents that do not contain the field.
The query would be:
db.inventory.find( { "key.a": { $exists: true }, "key.b": { $exists: true } } )

Why can I index a key in both ascending and descending order in MongoDB?

I can give the same key both ascending and descending index as below:
db.hw1_1.ensureIndex({answer:-1})
db.hw1_1.ensureIndex({answer:1})
And you can see that they are working at backend:
{ "v" : 1, "key" : { "answer" : -1 }, "ns" : "m101.hw1_1", "name" : "answer_-1" }
{ "v" : 1, "key" : { "answer" : 1 }, "ns" : "m101.hw1_1", "name" : "answer_1" }
Does it make any sense to have both orders of index at the same time on the same key?
Thanks!
It really doesn't make much sense to do this, as an index can be used equally well to "sort" in either order. That said, there is nothing intrinsically wrong with doing this that should stop you from creating the index. Who knows, maybe you intend to remove one "after" creating the other.
But considering that you would be maintaining two indexes for exactly the same thing, there is the additional write overhead to consider, as well as the obvious additional disk space.
But the order of processing will work both ways, and the first available index will always be chosen unless another one is specifically "hinted" at.
For a practical example, create some documents:
{ "answer" : 1 }
{ "answer" : 2 }
{ "answer" : 3 }
Then create an index:
db.collection.ensureIndex({ "answer": -1 })
Query with explain:
db.collection.find({},{_id:0}).sort({ answer: -1 }).explain()
This will of course select the index that was just created. Then create another index:
db.collection.ensureIndex({ "answer": 1 })
Issue the query with the "ascending" sort order:
db.collection.find({},{_id:0}).sort({ answer: 1 }).explain()
You will see that the original "answer_-1" index is still selected.
So it really is all down to how you use the index. If you generally want the results by "descending" key then create it that way; otherwise do the reverse.
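The reason a single-field index serves both sort orders can be sketched in plain JavaScript: an ordered list of keys can be walked from either end. The names descendingKeys and scanIndex are hypothetical, and the array is only a toy stand-in for the "answer_-1" index.

```javascript
// Toy stand-in for the keys of an index created with { answer: -1 }.
const descendingKeys = [3, 2, 1];

// Hypothetical helper: walk the stored keys forward (1) or backward (-1).
function scanIndex(storedKeys, direction) {
  const out = [];
  if (direction === 1) {
    for (let i = 0; i < storedKeys.length; i++) out.push(storedKeys[i]);
  } else {
    for (let i = storedKeys.length - 1; i >= 0; i--) out.push(storedKeys[i]);
  }
  return out;
}

const sortDescending = scanIndex(descendingKeys, 1);  // stored order
const sortAscending = scanIndex(descendingKeys, -1);  // reverse traversal
console.log(sortDescending, sortAscending);
```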

Handling optional/empty data in MongoDB

I remember reading somewhere that the mongo engine was more comfortable when the entire structure of a document was already in place in case of an update, so here is the question.
When dealing with "empty" data, for example when inserting an empty string, should I default it to null, "" or not insert it at all ?
{
_id: ObjectId("5192b6072fda974610000005"),
description: ""
}
or
{
_id: ObjectId("5192b6072fda974610000005"),
description: null
}
or
{
_id: ObjectId("5192b6072fda974610000005")
}
You have to remember that the description field may or may not be filled in every document (based on user input).
Introduction
If a document doesn't have a field, the DB considers that field's value to be null. Suppose a database with the following documents:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
If you create a query to find documents with the field desc different than null, you will get just one document:
db.test.find({desc: {$ne: null}})
// Output:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
The database doesn't distinguish between documents without a desc field and documents with a desc field whose value is null. One more test:
db.test.find({desc: null})
// Output:
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
But the differences are only ignored in queries because, as shown in the last example above, the fields are still saved on disk and you'll receive documents with the same structure as the documents that were sent to MongoDB.
Question
When dealing with "empty" data, for example when inserting an empty string, should I default it to null, "" or not insert it at all ?
There isn't much difference from {desc: null} to {}, because most of the operators will have the same result. You should only pay special attention to these two operators:
$exists
$type
I'd save documents without the desc field, because the operators will continue to work as expected and I'd save some space.
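To make the difference between equality with null, $exists and $type concrete, here is a plain JavaScript sketch that mimics the three matching rules on the sample documents. The predicates eqNull, exists and typeNull are hypothetical helpers written for this illustration, and the _id values are shortened.

```javascript
// The three sample documents, with shortened ids.
const docs = [
  { _id: "d96", a: 1, desc: "" },
  { _id: "d97", a: 1, desc: null },
  { _id: "d98", a: 1 },
];

const hasField = (d, f) => Object.prototype.hasOwnProperty.call(d, f);

// Mimics { desc: null }: matches a null value OR a missing field.
const eqNull = (d, f) => !hasField(d, f) || d[f] === null;
// Mimics { desc: { $exists: true } }: the field is present, whatever its value.
const exists = (d, f) => hasField(d, f);
// Mimics { desc: { $type: "null" } }: the field is present and holds null.
const typeNull = (d, f) => hasField(d, f) && d[f] === null;

const ids = (pred) => docs.filter((d) => pred(d, "desc")).map((d) => d._id);

console.log(ids(eqNull));   // null value and missing field
console.log(ids(exists));   // empty string and null value
console.log(ids(typeNull)); // null value only
```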
Padding factor
If you know the documents in your database grow frequently, then MongoDB might need to move the documents during an update, because there isn't enough space at the document's previous location. To prevent moving documents around, MongoDB allocates extra space for each document.
The amount of extra space allocated by MongoDB per document is controlled by the padding factor. You cannot (and don't need to) choose the padding factor, because MongoDB learns it adaptively, but you can help MongoDB preallocate internal space for each document by filling the possible future fields with null values. The difference is very small (depending on your application) and might become even smaller once MongoDB has learned the best padding factor.
Sparse indexes
This section isn't too important to your specific problem right now, but may help you when you face similar problems.
If you create a unique index on the field desc, you can't save more than one document with the same value, and in the previous database we had more than one document with the same value in the field desc. Let's try to create a unique index in the database presented above and see what error we get:
db.test.ensureIndex({desc: 1}, {unique: true})
// Output:
{
"err" : "E11000 duplicate key error index: test.test.$desc_1 dup key: { : null }",
"code" : 11000,
"n" : 0,
"connectionId" : 3,
"ok" : 1
}
If we want to be able to create a unique index on some field and let some documents have this field empty, we should create a sparse index. Let's try to create the unique index again:
// No errors this time:
db.test.ensureIndex({desc: 1}, {unique: true, sparse: true})
So far, so good, but why am I explaining all this? Because there is an obscure behaviour of sparse indexes. In the following query, we expect to get ALL documents sorted by desc.
db.test.find().sort({desc: 1})
// Output:
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
The result seems weird. What happened to the missing document? Let's try the query without sorting it:
db.test.find()
// Output:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
All documents were returned this time. What's happening? It's simple, but not so obvious. When we sort the result by desc, we use the sparse index created previously, and there are no entries in it for the documents that don't have the desc field. The following query shows that the index is used to sort the result:
db.test.find().sort({desc: 1}).explain().cursor
// Output:
"BtreeCursor desc_1"
We can skip the index using a hint:
db.test.find().sort({desc: 1}).hint({$natural: 1})
// Output:
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
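The behaviour shown above can be mimicked with a small plain JavaScript model of a sparse index. This is only a sketch of the behaviour described in this answer: buildSparseIndex and compareKeys are hypothetical helpers, the ids are shortened, and the comparator only handles null and strings.

```javascript
// The three sample documents, with shortened ids.
const testDocs = [
  { _id: "d96", a: 1, desc: "" },
  { _id: "d97", a: 1, desc: null },
  { _id: "d98", a: 1 },
];

// Order null before any string, then compare strings normally.
function compareKeys(x, y) {
  if (x === null && y === null) return 0;
  if (x === null) return -1;
  if (y === null) return 1;
  return x < y ? -1 : x > y ? 1 : 0;
}

// A sparse index only gets entries for documents that contain the field.
function buildSparseIndex(docs, field) {
  return docs
    .filter((d) => Object.prototype.hasOwnProperty.call(d, field))
    .map((d) => ({ key: d[field], id: d._id }))
    .sort((x, y) => compareKeys(x.key, y.key));
}

// Sorting "through" the index can only return documents that have an entry.
const sortedViaIndex = buildSparseIndex(testDocs, "desc").map((e) => e.id);
console.log(sortedViaIndex, "out of", testDocs.length, "documents");
```

The document without desc never appears in the index-driven result, which matches the two-document output seen earlier.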
Summary
Sparse unique indexes don't work if you include {desc: null}
Sparse unique indexes don't work if you include {desc: ""}
Sparse indexes might change the result of a query
There is little difference between a field with a null value and a document without the field. The main difference is that the former consumes a little disk space, while the latter consumes none. They can be distinguished by using the $exists operator.
A field with an empty string is quite different from both. Though it depends on the purpose, I don't recommend using it as a replacement for null. To be precise, they should be used to mean different things. For instance, think about voting. A person who casts a blank ballot is different from a person who wasn't permitted to vote. The former vote is an empty string, while the latter vote is null.

Can MongoDB use an index when checking for existence of a field with $exists operator?

If I have data in my users collection that looks like:
{ name: '...',
email: '...',
...,
photos: {
123: { url: '...', title: '...', ... },
456: { url: '...', title: '...', ... },
...
}
}
And I want to find which user owns photo id 127, then I am using the query:
db.users.find( {'photos.127': {'$exists': true} } );
I've tried, but it doesn't seem possible to get MongoDB to use an index for this query. The index I tried was: db.users.ensureIndex({photos:1});. And when I used explain() mongo told me it was using a BasicCursor (i.e., no index was used).
Is it possible to create an index that mongo will use for this query?
Updated:
It seems $exists queries use an index properly now, based on these tickets:
"$exists queries should use index" and "{$exists: false} will not use index"
Old Answer:
No, there is no way to tell MongoDB to use an index for an $exists query. Indexing is completely related to data. Since $exists is only related to the keys (fields), it can't be used in indexes.
$exists just verifies whether the given key (or field) exists in the document.
$exists will not use an index, but you can change your data structure to
photos: [
{id:123, url: '...', title: '...', ... },
{id:456, url: '...', title: '...', ... },
...
]
and then use
db.users.ensureIndex({'photos.id': 1})
to create index for photo id.
It seems I was wrong: in fact, you can force your $exists query to use your index.
Let us keep using the structure above, but suppose the photo id is not always present, that is to say,
some docs will have the key 'id' and some will not. Then you can create a sparse index on it:
db.users.ensureIndex({'photos.id': 1}, {'sparse': true})
then query like this:
db.users.find({'photos.id': {$exists: true}}).hint({'photos.id': 1})
you can add explain to see if the query is using index.
Here is my result, my collection's mobile key is similar to your photos.id:
> db.test.count()
50000
> db.test.find({'mobile': {$exists: true}}).hint({'mobile': 1}).explain()
{
"cursor" : "BtreeCursor mobile_1",
"nscanned" : 49999,
"nscannedObjects" : 49999,
"n" : 49999,
"millis" : 138,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"mobile" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}
> db.test.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.test",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"mobile" : 1
},
"ns" : "test.test",
"name" : "mobile_1",
"sparse" : true,
"background" : true
}
]
Hope to help!
Since MongoDB 2.0 $exists queries should use an index. Unfortunately this fix has disappeared in the newest version and will be fixed in MongoDB 2.5
As of June 2022 the index CANNOT be used for the {$exists: true} case and CAN ONLY BE PARTIALLY used for half of the other cases. There is a major (IMO) design bug in MongoDB indexes which people are still often unaware of, so I'm posting this answer with a workaround for some cases.
The issue is that MongoDB indexes non-existing fields as nulls, and as such these fields are indistinguishable from nulls from the index's perspective. In addition, there is some mess related to the JS analogy, where undefined === null is false but undefined == null is true.
That means:
When searching for {$exists: false} the index can be used. For that, both null and non-existing values are scanned in the index, documents are fetched, and values equal to null are filtered out. Corresponding MongoDB stages: [null, null] IXSCAN and FETCH with {"$not" : {$exists: true}} filter.
When searching for {field: null} the index can be used. For that, just the index scan is needed. MongoDB parses that as {field: { $eq: null }}. For some likely historical reasons Mongo also searches for deprecated undefined type/value: [undefined, undefined]U[null, null] IXSCAN. IMO it would be way more logical to just interpret equality as value equality and not include non-existing fields in the results. But it is what it is.
To REALLY search for value equal to null, we need to search for field: {$type: "null"}. Index can be used. Similarly to the first case, stages are: [null, null] IXSCAN and FETCH with {'$type': [10]} filter. Values with non-existing field are unnecessarily fetched and then filtered out.
If you need to retrieve all existing values, i.e. including those with value null, then you are out of luck. The search query is {field: {$exists: true}}. The index cannot be used. If Mongo used the index, it would have to include the entries indexed as null in order to return documents where the field equals null, and then filter out the documents where the field does not exist. That amounts to scanning the full index, so a full collection scan is more efficient. Mongo stages: COLLSCAN with { '$exists': true } filter.
If you can live without values equal to null, i.e. it is fine not to include {$type: "null"} documents in the results, then the index can be used. If you know the type of your field, e.g. an embedded document, then you can search for {field: {$type: "object"}}; for a list of types: {field: {$type: ["number", "object"]}}; or just {field: {$ne: null}}. The latter excludes the null and undefined types from the search. The corresponding Mongo stages are:
[{}, []) IXSCAN and FETCH
[nan.0, inf.0]U[{}, []) IXSCAN and FETCH
[MinKey, undefined)U(null, MaxKey] IXSCAN and FETCH
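The root cause described above, that a missing field and an explicit null end up with the same index key, can be illustrated in plain JavaScript. This is a toy model of the claim in this answer, not MongoDB internals; indexKeyFor is a hypothetical helper.

```javascript
// Hypothetical model of a regular (non-sparse) index key:
// a missing field is indexed as null.
function indexKeyFor(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : null;
}

const withNull = { _id: 1, field: null };
const withoutField = { _id: 2 };
const withValue = { _id: 3, field: 5 };

const keyNull = indexKeyFor(withNull, "field");
const keyMissing = indexKeyFor(withoutField, "field");
const keyValue = indexKeyFor(withValue, "field");

// Both print null: the index alone cannot tell these two documents apart,
// so answering $exists requires fetching and inspecting the documents.
console.log(keyNull, keyMissing, keyValue);
```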
There is an issue filed on Feb 2014: https://jira.mongodb.org/browse/SERVER-12869. Unfortunately, MongoDB hasn't prioritized it yet, nor reflected the issue in the official documentation.
Updating this with the latest changes as of February 2023:
if you wish to use $exists with an index, consult the table here:
https://www.mongodb.com/docs/manual/reference/operator/query/exists/
If you wish to use an index for a { $exists: true } query, the best approach is to use a sparse index.