Mongo - Compound Shard Indexes in 2.2

I was reading the documentation for Mongo Shard keys for 2.2, and found it a bit confusing.
All sharded collections must have an index that starts with the shard
key. If you shard a collection that does not yet contain documents and
without such an index, the shardCollection will create an index on the
shard key. If the collection already contains documents, you must
create an appropriate index before using shardCollection.
Changed in version 2.2: The index on the shard key no longer needs to
be identical to the shard key. This index can be an index of the shard
key itself as before, or a compound index where the shard key is the
prefix of the index. This index cannot be a multikey index.
If you have a collection named people, sharded using the field {
zipcode: 1 }, and you want to replace this with an index on the field
{ zipcode: 1, username: 1 }, then:
Create an index on { zipcode: 1, username: 1 }:
db.people.ensureIndex( { zipcode: 1, username: 1 } );
When MongoDB finishes building the index, you can safely drop existing index on { zipcode: 1 }:
db.people.dropIndex( { zipcode: 1 } );
Warning: The index on the shard key cannot be a multikey index. As above, an index on { zipcode: 1, username: 1 } can only replace an index on zipcode if there are no array values for the username field.
If you drop the last appropriate index for the shard key, recover by
recreating an index on just the shard key.
I have a couple of questions about shard keys and indexes.
i) From the documentation, it looks like multikey indexes were supported before 2.2. If that is the case, how is a compound index different from a multikey index?
ii) What is the difference between having
[a] an index that starts with a shard key and
[b] an index which has a shard key as a prefix?
iii) What does the warning mean when it says the index on the shard key cannot be a multikey index?
Isn't db.people.ensureIndex( { zipcode: 1, username: 1 } ) a multikey index?

How is a compound index different from a multikey index?:
A compound index is an index like the one you described in the example:
{ zipcode: 1, username: 1 }
A multikey index is one that indexes the items in an array, like an index on tags that is used to return all documents that contain the tag 'mongoDB'.
What is the difference between having [a] an index that starts with a shard key and [b] an index which has a shard key as a prefix?:
Nothing.
What is the warning note about an index on the shard key should not be a multikey index?:
This makes a fair bit of sense when you consider that a multikey index is an index on an array. Consider our index on a tags array. A document could easily live in many (or all) shards, if it had the right collection of values in the array.
In other words, documents still have to be sharded based on a single value, as opposed to an object or an array.
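To make the distinction concrete, here is a minimal sketch in plain JavaScript of how many index entries one document produces. It is not how MongoDB is implemented: indexEntries is a hypothetical helper, the sample document is made up, and the sketch ignores details such as empty arrays and MongoDB's rule against indexing two array fields together.

```javascript
// Hypothetical helper: list the index entries one document would produce
// for a key pattern. An array value yields one entry per element (multikey);
// a scalar value yields exactly one. Missing fields are treated as null.
function indexEntries(doc, keyPattern) {
  let entries = [[]];
  for (const field of Object.keys(keyPattern)) {
    const value = field in doc ? doc[field] : null;
    const values = Array.isArray(value) ? value : [value];
    // Cross every partial entry with every value of this field.
    entries = entries.flatMap((entry) => values.map((v) => [...entry, v]));
  }
  return entries;
}

// Made-up sample document.
const person = { zipcode: "10001", username: "alice", tags: ["mongoDB", "sharding"] };

// Compound index on two scalar fields: a single entry for the document.
const compoundEntries = indexEntries(person, { zipcode: 1, username: 1 });

// Multikey index on an array field: one entry per array element.
const multikeyEntries = indexEntries(person, { tags: 1 });

console.log(compoundEntries.length, multikeyEntries.length);
```

With a single entry, the document falls into exactly one shard key range. With one entry per array element, it could fall into several ranges at once, which is why the shard key index cannot be multikey.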

I noticed that the MongoDB documentation on indexes may have created confusion. Multi-key indexes are a way to create separate index entries for each element in an array: http://www.mongodb.org/display/DOCS/Multikeys. A compound index, on the other hand, creates index entries on two or more fields: http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeys.
Multi-key shard key indexes were not supported in previous versions of MongoDB. Since MongoDB splits up sharded collections based on ranges of shard key values, using a multi-key index is not possible.
Assuming I understand your question, there is no difference. The shard key index can be an index on a single field or a compound index that starts with the shard key.
db.people.ensureIndex({ zipcode: 1, username: 1 }) is an example of a compound index, not a multikey index, and it can be used for the shard key index.
If zipcode is the shard key, these indexes would work:
db.people.ensureIndex({ zipcode: 1})
db.people.ensureIndex({ zipcode: 1, username: 1 })
An example of a multi-key index:
{_id: 1, array: [{zipcode: x}, {username: y}]}
db.people.ensureIndex({array: 1})
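The "starts with the shard key" rule can be sketched as a simple prefix check. This is only an illustration in plain JavaScript: startsWithShardKey is a hypothetical helper, not a MongoDB API, and it does not check sort direction or whether the index is multikey.

```javascript
// Hypothetical helper: true when indexKey begins with every shardKey field,
// in the same order. Plain objects keep insertion order for non-integer
// string keys, so Object.keys gives the fields in key-pattern order.
function startsWithShardKey(shardKey, indexKey) {
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(indexKey);
  return shardFields.every((field, i) => indexFields[i] === field);
}

const shardKey = { zipcode: 1 };

const checks = {
  single: startsWithShardKey(shardKey, { zipcode: 1 }),
  compound: startsWithShardKey(shardKey, { zipcode: 1, username: 1 }),
  wrongOrder: startsWithShardKey(shardKey, { username: 1, zipcode: 1 }),
};

console.log(checks);
```

Only the first two key patterns pass; an index on { username: 1, zipcode: 1 } contains the shard key field but does not start with it.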

Related

Mongo Db ESR rule for multiple index keys in a compound index

Collection: appointments
Schema:
{
_id: ObjectId();
userId: string;
calType: string;
status: string;
appointment_start_date_time: string; //UTC ISO string
appointment_end_date_time: string; //UTC ISO string
}
Example:
{
_id: ObjectId('6332b21960f8083d24f3140b')
userId: "6272ccb3-4050-429c-b427-eb104f340962"
calType: "MY Personal Cal"
status: "CONFIRMED"
appointment_start_date_time: "2022-07-08T03:30:00.000Z"
appointment_end_date_time: "2022-07-08T04:00:00.000Z"
}
I want to create a compound index on userId, calType, status, appointment_start_date_time
Based on Mongo Db's ESR rule I would like to determine the arrangement of my keys.
The documentation conveniently gives an example of 3 keys in compound index where the first key is for equality, second for sort and third for range. But in my case I have more than 3 keys.
I would like to know how would the index keys be arranged for a more efficient compound index. In my case userId, calType, status will be used for equality based match whereas appointment_start_date_time will be used for sorting.
Potential queries which I will be making on this collection will be:
All appointments where userId = x, calType = y, status = z sort by appointment_start_date_time ASC
All appointments where userId = x, status = z
All appointments where calType = y, status = z
All appointments where userId = x sort by appointment_start_date_time ASC or DESC
What is the standard when we have multiple keys for equality and one for sorting/range?

None of your sample queries use a ranged filter. Assuming none of these fields contain arrays, applying the ESR rule:
Queries 1 and 2 could be optimally served by an index on
{userId:1, status:1, calType:1, appointment_start_date_time:1}
Query 3 would be best served by this index:
{calType:1, status:1}
Query 4 would be best served by:
{userId:1, appointment_start_date_time:1}
In these optimal cases, the MongoDB server could seek to the first matching index key, scan to the last key in a single pass, and encounter the documents in already sorted order.
It may also be possible to get acceptable performance for queries 1,2, and 4 using the index:
{userId:1, appointment_start_date_time:1, status:1, calType:1}
Using this index, query 4 would still be optimal, but queries 1 and 2 would require an additional index seek for each status/calType pair. This would be somewhat less performant than the optimal case, but would still be better than an in-memory sort.
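As a rough illustration of the ordering, the sketch below assembles a key pattern with the equality fields first, then the sort fields, then any range fields. esrKeyPattern is a hypothetical helper written for this example; it only builds the object you would pass to createIndex and does not talk to a server.

```javascript
// Hypothetical helper: build an index key pattern in ESR order.
function esrKeyPattern({ equality = [], sort = {}, range = [] }) {
  const pattern = {};
  // Equality fields come first.
  for (const field of equality) pattern[field] = 1;
  // Sort fields follow, keeping their requested direction.
  for (const [field, direction] of Object.entries(sort)) pattern[field] = direction;
  // Range fields go last, unless they are already present as sort fields.
  for (const field of range) {
    if (!(field in pattern)) pattern[field] = 1;
  }
  return pattern;
}

// Queries 1 and 2: equality on userId, status, calType; sort on the start time.
const query1Index = esrKeyPattern({
  equality: ["userId", "status", "calType"],
  sort: { appointment_start_date_time: 1 },
});

// Query 4: equality on userId only; sort on the start time.
const query4Index = esrKeyPattern({
  equality: ["userId"],
  sort: { appointment_start_date_time: 1 },
});

console.log(JSON.stringify(query1Index));
console.log(JSON.stringify(query4Index));
```

The order of the equality fields among themselves is a choice: userId, status, calType is used here so that query 2 (userId and status only) can still use a prefix of the same index.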

MongoDB can not create unique sparse index (duplicate key)

I want to create a unique index over two columns where the index should allow multiple null values for the second part of the index. But:
db.model.ensureIndex({userId : 1, name : 1},{unique : true, sparse : true});
Throws a duplicate key exception: E11000 duplicate key error index: devmongo.model.$userId_1_name_1 dup key: { : "-1", : null }. I thought because of the sparse=true option the index should allow this constellation? How can I achieve this? I use MongoDB 2.6.5

Sparse compound indexes will create an index entry for a document if any of the fields exist, setting the value to null in the index for any fields that do not exist in the document. Put another way: a sparse compound index will only skip a document if all of the index fields are missing from the document.
As of v3.2, partial indexes can be used to accomplish what you're trying to do. You could use:
db.model.ensureIndex({userId : 1, name : 1}, { partialFilterExpression: { name: { $exists: true } }, unique: true });
which will only index documents that have a name field.
NB: This index cannot be used by MongoDB to handle a query on userId alone, as it will not contain all of the documents in the collection. Also, a null in the document is considered a value, and a field that has a null value exists.
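The difference between the two options can be simulated in plain JavaScript. This is a sketch under assumptions, not MongoDB code: the helper functions and the three sample documents are made up for illustration, and the partial filter is hard-coded to the { name: { $exists: true } } case.

```javascript
// Sparse compound index: a document gets an entry if ANY indexed field exists.
function inSparseIndex(doc, fields) {
  return fields.some((field) => field in doc);
}

// Partial index with { name: { $exists: true } }: only documents that have name.
function inPartialIndex(doc) {
  return "name" in doc;
}

// Index key for a document; a missing field is stored as null.
function indexKey(doc, fields) {
  return JSON.stringify(fields.map((field) => (field in doc ? doc[field] : null)));
}

// True when two indexed documents share the same key (a unique violation).
function hasDuplicateKey(docs, fields, isIndexed) {
  const seen = new Set();
  for (const doc of docs.filter(isIndexed)) {
    const key = indexKey(doc, fields);
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

const fields = ["userId", "name"];
// Made-up sample data: two documents without a name for the same user.
const docs = [{ userId: "-1" }, { userId: "-1" }, { userId: "-1", name: "x" }];

const sparseConflict = hasDuplicateKey(docs, fields, (doc) => inSparseIndex(doc, fields));
const partialConflict = hasDuplicateKey(docs, fields, inPartialIndex);

console.log({ sparseConflict, partialConflict });
```

Under the sparse rule both nameless documents are indexed with the key ("-1", null) and collide. Under the partial rule they are not indexed at all, so no conflict arises.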

A compound index should be considered as a whole, so unique requires that each (userId, name) pair be unique in the collection, and sparse means that a document is left out of the index only if both userId and name are missing from it. The error message shows that there are at least two documents whose (userId, name) pairs are equal (a missing field is treated as null).

In my case, it turns out field names are case sensitive.
So creating a compound index on {field1 : 1, field2 : 1} is not the same as {Field1 : 1, Field2 : 1}

MongoDB Covered Query For Two Fields Without Compound Index

Can you perform a MongoDB covered query for two fields, for example
db.collection.find( { _id: 1, a: 2 } )
without having a compound index such as
db.collection.ensureIndex( { _id: 1, a: 1 } )
but instead having only one index for _id (you get that by default) and another index for field "a", as in
db.collection.ensureIndex( { a: 1 } )
In other words, I'd like to know if in order to perform a covered query for two fields I need a compound index vs. needing only two single (i.e., not compound) indexes, one for each field.

Queries only use one index.
Your example shows _id as one of the elements of your index? _id needs to be unique in a collection, so it wouldn't make sense to make a compound index of _id and something else.
If you instead had:
db.collection.ensureIndex( { a: 1, b: 1 })
You could then use the a index as needed, independently, or as a compound index with b.
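Since a covered query has to be answered from one index, the condition can be sketched as: every field the query filters on and every field it returns must be in that single index. isCoveredBy below is a hypothetical helper in plain JavaScript; it ignores further restrictions such as array fields.

```javascript
// Hypothetical helper: a query is covered by an index only if every queried
// field and every returned field is part of that one index.
function isCoveredBy(indexKey, queryFields, returnedFields) {
  const indexed = new Set(Object.keys(indexKey));
  return [...queryFields, ...returnedFields].every((field) => indexed.has(field));
}

// The query from the question filters on _id and a and returns both fields.
const queryFields = ["_id", "a"];
const returnedFields = ["_id", "a"];

const coverage = {
  idIndexOnly: isCoveredBy({ _id: 1 }, queryFields, returnedFields),
  aIndexOnly: isCoveredBy({ a: 1 }, queryFields, returnedFields),
  compound: isCoveredBy({ _id: 1, a: 1 }, queryFields, returnedFields),
};

console.log(coverage);
```

Neither single-field index contains both fields, so in this sketch only the compound index covers the query.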

Does order of indexes matter in MongoDB?

Is tip #25 in Tips and Tricks for MongoDB Developers correct?
It says that this query:
collection.find({"x" : criteria, "y" : criteria, "z" : criteria})
can be optimized with
collection.ensureIndex({"y" : 1, "z" : 1, "x" : 1})
I think it's false because for this to work, x should be in front. I thought the order of indexes matter.
So where did I go wrong?

The order of the fields in the index only matters if the query doesn't include all of the fields in the index. This query is referencing all three fields so the order of the fields in the index doesn't matter.
See more details in the docs on compound indexes.
The order of the fields in the find query object is not relevant.

For beginners who want to understand it better:
The MongoDB documentation says: "The index contains references to documents sorted first by the values of the item field and, within each value of the item field, sorted by values of the stock field." What does this mean?
Let's create a compound index on fields a, b, c, and d in ascending order (1):
Model.createIndex({ a: 1, b: 1, c: 1, d: 1 });
I visualize it as:
At level 1, a list of references sorted in the specified order (1) by the value of the first index field (a).
At level 2, each reference at level 1 holds another set of references, sorted in the specified order (1) by the value of the second field in the chain (b).
At level 3, each reference at level 2 holds another set of references, sorted in the specified order (1) by the value of the third field in the chain (c).
At level 4, each reference at level 3 holds another set of references, sorted in the specified order (1) by the value of the fourth field in the chain (d).
This chain forms a tree structure, which is why it is stored in a B-tree data structure.
I will call this storage layout a compound-index-chain in this context.
Normally we build indexes to support two types of operations: 1. query operations like find() and 2. non-query operations like sort().
Now you have created a compound index on { a: 1, b: 1, c: 1, d: 1 }. But creating the index is not enough. It becomes inefficient, and sometimes useless, if you don't structure your database operations (find and sort) in a way that uses it.
Let's dig deeper into which kinds of queries an index supports.
1. find():
The compound index also supports indexed find() queries on the fields of the following prefixes:
{a:1},
{a:1, b:1},
{a:1, b:1, c:1}
// Index prefixes are the beginning subsets of the indexed fields
@JohnnyHK already said "The order of the fields in the find query object is not relevant."
The fields can be in ANY ORDER, like {b:1, a:1} instead of {a:1, b:1}. The index will still be used as long as the find() operates on the fields of the compound index or of one of its prefixes.
However, two queries that use the same index do not necessarily perform the same. Performance depends on how selective the query predicates are.
Meaning: find({a: 'red', b: 'tshirt'}) and find({a: 'tshirt', b: 'red'}) both use the index {a:1, b:1}, but the query whose predicates match fewer documents (HIGH SELECTIVITY) examines fewer index entries and is more efficient than the one whose predicates match many documents (LOW SELECTIVITY).
Even a query with low selectivity will usually perform better with the index than with no index at all.
I think @Sushil tried to touch on this topic.
In case you are still wondering, query selectivity refers to how well the query predicate excludes or filters out documents in a collection. Query selectivity can determine whether or not queries can use indexes effectively or even use indexes at all.
Now let's come back to the prefixes of compound indexes.
Note: find() behaves differently on {a:1, c:1}, which is not a prefix of the compound index {a:1, b:1, c:1, d:1}.
In this case, the find() operation will not be able to use our compound index efficiently.
What happens is that only the a:1 part of the index can narrow the scan. The c:1 part cannot, because the compound-index-chain is broken by the absence of the b field in the query.
So if a find() query filters on a and c together, the index scan (IXSCAN) is narrowed by a only, and the condition on c has to be checked against every entry that matches a. This is not a collection scan (COLLSCAN), but it examines more entries than necessary. Meaning the query will be slower than with a separate compound index on {a:1, c:1}, but faster than with no index at all.
The conclusion is that index fields are parsed in order; if a query omits a particular index prefix, it is unable to make use of any index fields that follow that prefix.
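The prefix rule can be sketched as a small function that walks the index fields in order and stops at the first one the query does not filter on. usablePrefix is a hypothetical helper in plain JavaScript and only models which leading fields can narrow the scan; the real query planner does more than this.

```javascript
// Hypothetical helper: the leading index fields a query can use to narrow
// the scan. It stops at the first index field the query does not filter on.
function usablePrefix(indexKey, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of Object.keys(indexKey)) {
    if (!wanted.has(field)) break; // the chain is broken here
    prefix.push(field);
  }
  return prefix;
}

const compoundIndex = { a: 1, b: 1, c: 1, d: 1 };

const prefixForBA = usablePrefix(compoundIndex, ["b", "a"]); // query order is irrelevant
const prefixForAC = usablePrefix(compoundIndex, ["a", "c"]); // b is missing
const prefixForC = usablePrefix(compoundIndex, ["c"]);       // a is missing

console.log(prefixForBA, prefixForAC, prefixForC);
```

A query on b and a can use both leading fields, a query on a and c can only use a, and a query on c alone cannot use any leading field of this index.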
2. sort():
For non-query operations (i.e. sort), the sort fields must be in the same order as in the index, and their directions must be either all the same as, or all the opposite of, the directions specified for each field when the compound index was created.
Let's see how our compound index { a: 1, b: 1, c: 1, d: 1 } with ascending direction behaves with the sort() operation.
Let's look at the direction of the indexed fields in sorting.
As we know, a single-field index on {a:1} can support a sort on {a:1} (same direction) and on {a:-1} (reverse direction).
Compound indexes follow the same rules while sorting.
{a:1, b:1, c:1, d:1} // same direction as our compound index
{a:-1, b:-1, c:-1, d:-1 } // reverse direction of our compound index
// These fields have neither the same direction nor the reverse direction; they are ARBITRARY/MIXED.
// The index will not be used for sorting with these fields and directions.
{a:-1, b:1, c:1, d:1}
Another example: a compound index on {a:1, b:-1} can support indexed sorting on {a:1, b:-1} (same direction) and on {a:-1, b:1} (reverse direction), BUT NOT on {a:-1, b:-1}.
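The direction rule can be expressed as a short check. This is a sketch in plain JavaScript, with sortDirection as a hypothetical helper; it assumes the sort fields must be a prefix of the index and compares directions field by field.

```javascript
// Hypothetical helper: returns "forward" when the sort matches the index
// directions, "reverse" when every direction is flipped, and null when the
// index cannot provide this sort order (mixed directions or wrong fields).
function sortDirection(indexKey, sortSpec) {
  const indexFields = Object.keys(indexKey);
  const sortFields = Object.keys(sortSpec);
  // The sort fields must be a prefix of the index, in the same order.
  if (!sortFields.every((field, i) => indexFields[i] === field)) return null;
  if (sortFields.every((field) => sortSpec[field] === indexKey[field])) return "forward";
  if (sortFields.every((field) => sortSpec[field] === -indexKey[field])) return "reverse";
  return null;
}

const mixedIndex = { a: 1, b: -1 };

const sameDirection = sortDirection(mixedIndex, { a: 1, b: -1 });
const reverseDirection = sortDirection(mixedIndex, { a: -1, b: 1 });
const unsupported = sortDirection(mixedIndex, { a: -1, b: -1 });

console.log(sameDirection, reverseDirection, unsupported);
```

Walking the index backwards flips every direction at once, which is why only the exact pattern and its full reverse are supported.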
Now let's look at the order of the indexed fields in sorting
OPTIMUM SORTING:
When a sort operation uses the compound index or a prefix of the compound index, the result set does not need to be sorted in memory (RAM). Such a sort is satisfied solely by the order of the index, which gives optimum sorting performance.
For instance:
// compound index
{a:1, b:1, c:1, d:1}
// prefix of the compound index
{a:1},
{a:1, b:1},
{a:1, b:1, c:1}
Compound-Index-Chain-Break:
When a sort operation is only partially covered by the compound index, the matched result set may have to be sorted in memory.
Model.find({ a: 2 }).sort({ c: 1 }); // uses the index for finding, but not for sorting on field c
Model.find({ a: { $gt: 2 } }).sort({ c: 1 }); // uses the index for finding, but not for sorting
// because the compound-index-chain is broken: field b of the prefix {a:1, b:1, c:1} of our compound index {a:1, b:1, c:1, d:1} is absent
Sort on Non-prefix Subset:
An index can also support a sort on a non-prefix subset of its fields. For that, the query predicate (i.e. find()) MUST have equality conditions on all the index fields that precede the sort fields; range operators such as $gt or $lte do not count as equality.
A compound index can support indexed queries on its index prefixes as well.
Model.find({ c: 5 }).sort({ c: 1 }); // will not use the index at all, because c alone is not a prefix of our compound index
Model.find({ b: 3, a: 4 }).sort({ c: 1 }); // will use the index for both finding and sorting, as it matches one of our index prefixes, i.e. {a:1, b:1, c:1}
Model.find({ a: 4 }).sort({ a: 1, b: 1 }); // will use the index for both finding and sorting, because the sort fields form the prefix {a:1, b:1}
Model.find({ a: { $gt: 4 } }).sort({ a: 1, b: 1 }); // will use the index for both finding and sorting, because the sort fields form the prefix {a:1, b:1}; the range on a is fine since a is itself the first sort field
Model.find({ a: 5, b: 3 }).sort({ b: 1 }); // will use the index for both finding and sorting, because a has an equality condition and b follows a in the index
Model.find({ a: 5, b: { $lt: 3 } }).sort({ b: 1 }); // will use the index for both finding and sorting, for the same reason
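The rule for sorting on a non-prefix subset can be sketched as follows: skip the leading index fields that the query pins with an equality condition, then require the sort fields to follow immediately in index order. indexSupportsSort is a hypothetical helper in plain JavaScript, it ignores sort direction, and it is deliberately conservative compared with the real query planner.

```javascript
// Hypothetical helper: can the index return documents already in sort order?
// equalityFields are the fields the query matches with an equality condition
// (fields matched with range operators are left out on purpose).
function indexSupportsSort(indexKey, equalityFields, sortFields) {
  const indexFields = Object.keys(indexKey);
  const equal = new Set(equalityFields);
  // Skip leading index fields pinned to one value, stopping once we reach
  // the first sort field.
  let start = 0;
  while (
    start < indexFields.length &&
    equal.has(indexFields[start]) &&
    indexFields[start] !== sortFields[0]
  ) {
    start++;
  }
  // The sort fields must now follow, in index order, without gaps.
  return sortFields.every((field, i) => indexFields[start + i] === field);
}

const abcd = { a: 1, b: 1, c: 1, d: 1 };

const sortResults = {
  // find({ a: 2 }).sort({ c: 1 }): b is skipped, so the chain is broken
  aThenC: indexSupportsSort(abcd, ["a"], ["c"]),
  // find({ b: 3, a: 4 }).sort({ c: 1 }): a and b are pinned, c follows
  abThenC: indexSupportsSort(abcd, ["a", "b"], ["c"]),
  // find({ a: 5, b: { $lt: 3 } }).sort({ b: 1 }): a is pinned, b follows
  aThenB: indexSupportsSort(abcd, ["a"], ["b"]),
  // find({ c: 5 }).sort({ c: 1 }): nothing pins a and b
  cOnly: indexSupportsSort(abcd, ["c"], ["c"]),
};

console.log(sortResults);
```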
Hope this helps somebody

The book describes the scenario below.
You have 3 queries to run:
Collection.find({x:criteria, y:criteria,z:criteria})
Collection.find({z:criteria, y:criteria,w:criteria})
Collection.find({y:criteria, w:criteria })
It suggests using:
collection.ensureIndex({y:1,z:1,x:1})
The reasoning is based on how often each field occurs. Since y occurs most often, the book wants all the queries to hit y first, followed by z. x is included last because the first query will be run a thousand times more often than the other two. If that were not the case and all 3 queries ran equally often, the suggestion would be:
Collection.ensureIndex({y:1,w:1,z:1})
Moreover, as per the MongoDB documentation, "The order of fields in a compound index is very important." But the use case in the scenario above is different: it is trying to optimize all of the queries with one index.

MongoDB one compound index vs multiple single field indexes

I have a collection of photos. They can be referenced (one-to-many) from some other collections (events, news, posts, etc)
I can create reference like this:
db.photos.insert({ parent:{ collection: 'events', id: 12345 }})
db.photos.insert({ parent:{ collection: 'events', id: 54321 }})
//or just DBRef
db.photos.ensureIndex({parent:1})
OR
db.photos.insert({ post_id: 12345 })
db.photos.insert({ event_id: 54321 })
db.photos.ensureIndex({post_id:1}, {sparse: true})
db.photos.ensureIndex({event_id:1}, {sparse: true})
In the first case we have one big compound index.
In the second, some number of smaller indexes.
What are the pros and cons of each approach?

First, check how often queries hit each field.
Second, create one compound index on the most frequently hit fields.
Third, create one compound index on the least frequently hit fields.
Note:
If a large number of fields are hit together in the same query, use a compound index.
In other cases, create single-field indexes.
