MongoDB Indexing a field which may not exist - mongodb

I have a collection which has an optional field xy_id. About 10% of the documents (out of 500k) does not have this xy_id field.
I have quite a lot of queries to this collection like find({xy_id: <id>}).
I tried indexing it normally (.createIndex({xy_id: 1}, {"background": true})) and it does improve the query speed.
Is this the correct way to index the field in this case? or should I be using a sparse index or another way?

Yes, this is the correct way. The default behaviour of MongoDB is serving well in this case. You can see in the docs that index creation supports an unique flag, which is false by default. All your documents missing the index key will be indexed under a single index entry. Queries can use this index in all cases because all the documents are indexed.
On the other hand, if you use sparse index the documents missing the index key will not be indexed at all. Some operations such as count, sort and other queries will not be able to use the sparse index unless explicitly hinted to do so. If explicitly hinted, you should be okay with incorrect results - the entries not in the index will be omitted in the result. You can read about it here.

Related

Unique multi key hashed index in MongoDB

I have a collection with several billion documents and need to create a unique multi-key index for every attribute of my documents.
The problem is, I get an error if I try to do that because the generated keys would be too large.
pymongo.errors.OperationFailure: WiredTigerIndex::insert: key too large to index, failing
I found out MongoDB lets you create hashed indexes, which would resolve this problem, however they are not to be used for multi-key indexes.
How can i resolve this?
My first idea was to create another attribute for each of my document with an hash of every value of its attributes, then creating an index on that new field.
However this would mean to recalculate the hash every time I wish to add a new attribute, plus the excessive amount of time necessary to create both the hashes and the indexes.
This is a feature added in mongoDB since 2.6 to prevent the total size of an index entry to exceed 1024 bytes (also known as Index Key Length Limit).
In MongoDB 2.6, if you attempt to insert or update a document so that the value of an indexed field is longer than the Index Key Length Limit, the operation will fail and return an error to the client. In previous versions of MongoDB, these operations would successfully insert or modify a document but the index or indexes would not include references to the document.
For migration purposes and other temporary scenarios you can downgrade to 2.4 handling of this use case where this exception would not be triggered via setting this mongoDB server flag:
db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )
This however is not recommended.
Also consider that creating indexes for every attribute of your documents may not be the optimal solution at all.
Have you examined how you query your documents and on which fields you key on? Have you used explain to view the query plan? It would be an exception to the rule if you tell us that you query on all fields all the time.
Here are the recommended MongoDB indexing strategies.
Excessive indexing has a price as well and should be avoided.

Fundamental misunderstanding of MongoDB indices

So, I read the following definition of indexes from [MongoDB Docs][1].
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the
collection’s data set in an easy to traverse form. The index stores
the value of a specific field or set of fields, ordered by the value
of the field. The ordering of the index entries supports efficient
equality matches and range-based query operations. In addition,
MongoDB can return sorted results by using the ordering in the index.
I have a sample database with a collection called pets. Pets have the following structure.
{
"_id": ObjectId(123abc123abc)
"name": "My pet's name"
}
I created an index on the name field using the following code.
db.pets.createIndex({"name":1})
What I expect is that the documents in the collection, pets, will be indexed in ascending order based on the name field during queries. The result of this index can potentially reduce the overall query time, especially if a query is strategically structured with available indices in mind. Under that assumption, the following query should return all pets sorted by name in ascending order, but it doesn't.
db.pets.find({},{"_id":0})
Instead, it returns the pets in the order that they were inserted. My conclusion is that I lack a fundamental understanding of how indices work. Can someone please help me to understand?
Yes, it is misunderstanding about how indexes work.
Indexes don't change the output of a query but the way query is processed by the database engine. So db.pets.find({},{"_id":0}) will always return the documents in natural order irrespective of whether there is an index or not.
Indexes will be used only when you make use of them in your query. Thus,
db.pets.find({name : "My pet's name"},{"_id":0}) and db.pets.find({}, {_id : 0}).sort({name : 1}) will use the {name : 1} index.
You should run explain on your queries to check if indexes are being used or not.
You may want to refer the documentation on how indexes work.
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/

MongoDB: Indexes, Sorting

After having read the official documentations on indexes, sort, intersection, i'm a little bit confuse on how everything work together.
I've trouble making my query use the indexes i've created. I work on a mongodb 3.0.3, on a collection having ~4millions of document.
To simplify, let's say my document is composed of 6 fields:
{
a:<text>,
b:<boolean>,
c:<text>,
d:<boolean>,
e:<date>,
f:<date>
}
The query I want to achieve is the following :
db.mycoll.find({ a:"OK", b:true, c:"ProviderA", d:true, e:{ $gte:ISODate("2016-10-28T12:00:01Z"),$lt:ISODate("2016-10-28T12:00:02") } }).sort({f:1});
So intuitively I've created two indexes
db.mycoll.createIndex({a: 1, b: 1, c: 1, d:1, e:1 }, {background: true,name: "test1"})
db.mycoll.createIndex({f:1}, {background: true,name: "test2"})
But the explain() give me that the first index is not used at all.
I known there is some kind of limitation when there is ranges in play in the filter (in the e field), but I can't find my way around it.
Also instead of having a single index on f, I try a compound index on {e:1,f:1} but it didn't change anything.
So What I have misunderstood?
Thanks for your support.
Update: also I find some time the following predicate for mongodb 2.6 :
A good rule of thumb for queries with sort is to order the indexed fields in this order:
First, the field(s) on which you will query for exact values.
Second, the field(s) on which you will sort.
Finally, field(s) on which you will query for a range of values (e.g., $gt, $lt, $in)
An example of using this rule of thumb is in the section on “Sorting the results of a complex query on a range of values” below, including a link to further reading.
Does this also apply for 3.X version?
Update 2: following above predicate, I created the following index
db.mycoll.createIndex({a: 1, b: 1, c: 1, d:1 , f:1, e:1}, {background: true,name: "test1"})
And for the same query :
db.mycoll.find({ a:"OK", b:true, c:"ProviderA", d:true, e:{ $gte:ISODate("2016-10-28T12:00:01Z"),$lt:ISODate("2016-10-28T12:00:02") } }).sort({f:1});
the index is indeed used. However too much keys seems to be scan, I may need to find a better order the fields in the query/index.
Mongo acts sometimes a bit strange when it comes to the index selection.
Mongo automagically decides what index to use. The smaller an index is the more likely it is used (especially indexes with only one field) - this is my experience. May be this happens because it is more often already loaded in RAM? To find out what index to use when Mongo performs test queries when it is idle. However the result is sometimes unexpected.
Therefore if you know what index to use you can force a query to use a specific index using the $hint option. You should try that.
Your two indexes used in the query and the sort does not overlap so MongoDB can not use them for index intersection:
Index intersection does not apply when the sort() operation requires an index completely separate from the query predicate.

DB Compound indexing best practices Mongo DB

How costly is it to index some fields in MongoDB,
I have a table where i want uniqueness combining two fields, Every where i search they suggested compound index with unique set to true. But what i was doing is " Appending both field1_field2 and making it a key, so that field2 will be always unique for field1.(and add Application logic) As i thought indexing is costly.
And also as MongoDB documentation advices us not to use Custom Object ID like auto incrementing number, I end up giving big numbers to Models like Classes, Students etc, (where i could have used easily used 1,2,3 in sql lite), I didn't think to add a new field for numbering and index that field for querying.
What are the best practices advice for production
The advantage of using compound indexes vs your own indexed field system is that compound indexes allows sorting quicker than regular indexed fields. It also lowers the size of every documents.
In your case, if you want to get the documents sorted with values in field1 ascending and in field2 descending, it is better to use a compound index. If you only want to get the documents that have some specific value contained in field1_field2, it does not really matter if you use compound indexes or a regular indexed field.
However, if you already have field1 and field2 in seperate fields in the documents, and you also have a field containing field1_field2, it could be better to use a compound index on field1 and field2, and simply delete the field containing field1_field2. This could lower the size of every document and ultimately reduce the size of your database.
Regarding the cost of the indexing, you almost have to index field1_field2 if you want to go down that route anyways. Queries based on unindexed fields in MongoDB are really slow. And it does not take much more time adding a document to a database when the document has an indexed field (we're talking 1 millisecond or so). Note that adding an index on many existing documents can take a few minutes. This is why you usually plan the indexing strategy before adding any documents.
TL;DR:
If you have limited disk space or need to sort the results, go with a compound index and delete field1_field2. Otherwise, use field1_field2, but it has to be indexed!

What does the digit "1" mean when creating indexes in mongodb

I am new to mongodb and want to make indexes for a specific collection. I have seen people use a digit "1" in front of the field name when they want to create an index. for example:
db.users.ensureIndex({user_name: 1})
now I want to know what does this digit mean and is it necessary to use it?
It's the type of index. MongoDB supports different kinds of indexes. However, only the first two indexes can be combined to a compound index.
1: Ascending binary-tree index.
-1: Descending binary-tree index. Very similar to the default index but the difference can matter for the behavior of compound indexes.
"hashed": A hashtable index. Very fast for lookup by exact value, especially in very large collections. But not usable for inexact queries ($gt, $regex or similar).
"text": A text index designed for searching for words in strings with natural language.
"2d": A geospatial index on a flat plane
"2dsphere": A geospatial index on a sphere
For more information, see the documentation of index types.
It defines the index type on that specefic field. For example the value of 1 creates an index with ascending order, while the value -1 create the index with descending order.
For more information, see the Manual