MongoDB one compound index vs multiple single field indexes - mongodb

I have a collection of photos. They can be referenced (one-to-many) from some other collections (events, news, posts, etc.).
I can create a reference like this:
db.photos.insert({ parent:{ collection: 'events', id: 12345 }})
db.photos.insert({ parent:{ collection: 'events', id: 54321 }})
//or just DBRef
db.photos.ensureIndex({parent:1})
OR
db.photos.insert({ post_id: 12345 })
db.photos.insert({ event_id: 54321 })
db.photos.ensureIndex({post_id:1}, {sparse: true})
db.photos.ensureIndex({event_id:1}, {sparse: true})
In the first case we have one big index on the embedded parent document.
In the second, some number of smaller indexes.
What are the pros and cons of each approach?

First, check how often each field is used in your queries.
Second, create one compound index on the most frequently queried fields.
Third, create one compound index on the least frequently queried fields.
Note:
If many fields are queried together, use a compound index.
In other cases, create single-field indexes.
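As a rough sketch of the first step, one could tally how often each field appears in the application's query filters. The filters below are hypothetical (including the created field), and countFieldHits is an illustrative helper, not a MongoDB API.

```javascript
// Hypothetical query filters, e.g. collected from an application log.
const loggedFilters = [
  { post_id: 12345 },
  { event_id: 54321 },
  { post_id: 777, created: { $gt: 0 } },
  { post_id: 42 },
];

// Count how many logged filters reference each top-level field.
function countFieldHits(filters) {
  const hits = {};
  for (const filter of filters) {
    for (const field of Object.keys(filter)) {
      hits[field] = (hits[field] || 0) + 1;
    }
  }
  return hits;
}

const fieldHits = countFieldHits(loggedFilters);
console.log(fieldHits); // fields with the most hits are the first index candidates
```

Fields that often appear together in the same filter are the candidates for a compound index; fields that are queried on their own fit single-field indexes.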

Related

Compound index where one field can be null MongoDB

How can I create a compound index in Mongo where one of the fields may be absent or null?
For example, in the documents below, if I create a compound index on name+age, how can I still achieve this when age is absent or null in some documents?
{
name: "Anurag",
age: "21",
},
{
name: "Nitin",
},
You can create a partial index as follows:
db.contacts.createIndex(
{ name: 1, age: 1 },
{ partialFilterExpression: { age: { $exists: true } } }
)
Explanation:
As per the documentation, partial indexes only index the documents in a collection that meet a specified filter expression. By indexing a subset of the documents in a collection, partial indexes have lower storage requirements and reduced performance costs for index creation and maintenance. In this particular case, imagine your collection has 100k documents but only 5 of them have the "age" field. The partial index will then include only those 5 documents, which saves index storage space and reduces maintenance cost.
For the query optimizer to choose this partial index, the query predicate must include a condition on the name field as well as a non-null match on the age field.
The following example queries will be able to use the index:
db.contacts.find({name:"John",age:{$gt:20}})
db.contacts.find({name:"John",age:30})
The following example query is a "covered query" based on this index:
db.contacts.find({name:"John",age:30},{_id:0,name:1,age:1})
(this query will be highly efficient, since it returns the data directly from the index)
The following example queries will not be able to use the index:
db.contacts.find({name:"John"}) // no condition on age, so the partial index could miss documents
db.contacts.find({name:"John",age:{$exists:false}})
db.contacts.find({name:"John",age:null})
db.contacts.find({age:20}) // age alone is not a prefix of the index
Note that you should analyze whether you really need to search on age together with name. The name field already has very good selectivity, and this index will not be used if you search only by age. If searching by age alone is a likely use case, an additional sparse or partial index on just the age field may be a good option, so that you can fetch a list of contacts of a certain age.
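To make the filter behaviour concrete, here is a minimal in-memory sketch of which documents a partialFilterExpression of { age: { $exists: true } } would admit. It only imitates the filter in plain JavaScript; the "Priya" row and the inPartialIndex helper are assumptions added for illustration.

```javascript
// Sample documents modeled on the question; "Priya" is an added hypothetical row.
const contacts = [
  { name: "Anurag", age: "21" },
  { name: "Nitin" },
  { name: "Priya", age: null },
];

// Mimics partialFilterExpression { age: { $exists: true } }:
// a field that is present but null still counts as existing.
function inPartialIndex(doc) {
  return Object.prototype.hasOwnProperty.call(doc, "age");
}

const indexedNames = contacts.filter(inPartialIndex).map((doc) => doc.name);
console.log(indexedNames); // [ 'Anurag', 'Priya' ]
```

Documents without the age field ("Nitin" here) never enter the index, which is why a query that could match them cannot be answered from it.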

MongoDB complex indices

I'm trying to understand how to best work with indices in MongoDB. Let's say that I have a collection of documents like this one:
{
_id: 1,
keywords: ["gap", "casual", "shorts", "oatmeal"],
age: 21,
brand: "Gap",
color: "Black",
gender: "female",
retailer: "Gap",
style: "Casual Shorts",
student: false,
location: "US",
}
and I regularly run a query to find all documents that match a set of criteria for each of those fields, something like:
db.items.find({ age: { $gt: 13, $lt: 40 },
brand: { $in: ['Gap', 'Target'] },
retailer: { $in: ['Gap', 'Target'] },
gender: { $in: ['male', 'female'] },
style: { $in: ['Casual Shorts', 'Jeans']},
location: { $in: ['US', 'International'] },
color: { $in: ['Black', 'Green'] },
keywords: { $all: ['gap', 'casual'] }
})
I'm trying to figure out what sort of index I can create to improve the speed of a query such as this. Should I create a compound index like this:
db.items.ensureIndex({ age: 1, brand: 1, retailer: 1, gender: 1, style: 1, location: 1, color: 1, keywords: 1})
or is there a better set of indices I can create to optimize this query?
Should I create a compound index like this:
db.items.ensureIndex({age: 1, brand: 1, retailer: 1, gender: 1, style: 1, location: 1, color: 1, keywords: 1})
You can create an index like the one above, but you're indexing almost the entire collection. Indexes take space, and the more fields in the index, the more space it uses. That space is usually RAM, although indexes can be paged out to disk. Indexes also incur a write penalty.
Your index seems wasteful, since indexing just a few of those fields will probably let MongoDB scan a set of documents that is close to the expected result of the find operation.
Is there a better set of indices I can create to optimize this query?
Like I said before, probably yes. But this question is very difficult to answer without knowing details of the collection, like the number of documents it has, which values each field can have, how those values are distributed in the collection (50% gender male, 50% gender female?), how they correlate to each other, etc.
There are a few indexing strategies, but normally you should strive to create indexes with high selectivity. Choose "small" field combinations that will help MongoDB locate the desired documents scanning a "reasonable" amount of them. Again, "small" and "reasonable" will depend on the characteristics of the collection and query you are performing.
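One way to get a feel for selectivity before building anything is to measure, on a sample, what fraction of documents each predicate matches. The sketch below uses a tiny hypothetical sample (the "Levi's" row is invented) and an illustrative matchFraction helper.

```javascript
// Hypothetical sample of the items collection.
const sampleItems = [
  { brand: "Gap", gender: "female", style: "Casual Shorts" },
  { brand: "Gap", gender: "male", style: "Jeans" },
  { brand: "Target", gender: "female", style: "Jeans" },
  { brand: "Levi's", gender: "male", style: "Jeans" },
];

// Fraction of documents a predicate matches; lower means more selective.
function matchFraction(docs, predicate) {
  return docs.filter(predicate).length / docs.length;
}

const genderFraction = matchFraction(sampleItems, (d) => ["male", "female"].includes(d.gender));
const brandFraction = matchFraction(sampleItems, (d) => d.brand === "Target");
console.log(genderFraction, brandFraction); // 1 0.25
```

A predicate that matches every document, like the gender filter here, adds nothing to an index, while one that matches a small fraction is a good candidate.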
Since this is a fairly complex subject, here are some references that should help you build more appropriate indexes.
http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
http://docs.mongodb.org/manual/faq/indexes/#how-do-you-determine-what-fields-to-index
http://docs.mongodb.org/manual/tutorial/create-queries-that-ensure-selectivity/
And use cursor.explain to evaluate your indexes.
http://docs.mongodb.org/manual/reference/method/cursor.explain/
A large index like this one will penalize you on writes. It is better to index just what you need and let Mongo's optimizer do most of the work for you. You can always give it a hint or, as a last resort, reindex if your application or data usage changes drastically.
Your query will use the index for the fields that have one (fast) and then scan the remaining candidate documents to apply the other conditions (slow).
Depending on your application, a few standalone indexes could be better. Adding more indexes will not necessarily improve performance. With the write penalty, it could even make it worse (YMMV).
Here is a basic algorithm for selecting fields to put in an index:
What single field is in a query the most often?
If that single field is present in a query, will a table scan be expensive?
What other field could you index to further reduce the table scan?
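This index seems to be very reasonable for your query. MongoDB calls a query a covered query when all the fields it needs are part of the index, since there is then no need to access the documents and all data can be fetched from the index. Note that the query as written returns whole documents, so it would need a projection limited to indexed fields (excluding _id) to be covered, and because keywords is an array the index is multikey, which cannot cover queries on the array field.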
from the docs:
"Because the index “covers” the query, MongoDB can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query. An index can also cover an aggregation pipeline operation on unsharded collections."
Some remarks:
This index will only be used by queries that include a filter on age. A query that only filters by brand or retailer will probably not use this index.
Adding an index on only one or two of the most selective fields of your query will already bring a very significant performance boost. The more fields you add the larger the index size will be on disk.
You may want to generate some random sample data and measure the performance of this with different indexes or sets of indexes. This is obviously the safest way to know.
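A minimal sketch of generating repeatable sample data for such a measurement follows. The field values come from the question, while the age range, the seed, and the helper names (makeRandom, pick, generateItems) are assumptions made for illustration.

```javascript
// Small deterministic pseudo-random generator (linear congruential) so runs are repeatable.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

const brands = ["Gap", "Target"]; // values taken from the question
const colors = ["Black", "Green"];
const genders = ["male", "female"];

// Pick one element of values using the supplied random source.
function pick(random, values) {
  return values[Math.floor(random() * values.length)];
}

// Build count documents shaped like a subset of the question's schema.
function generateItems(count, seed) {
  const random = makeRandom(seed);
  const items = [];
  for (let i = 0; i < count; i++) {
    items.push({
      _id: i + 1,
      age: 13 + Math.floor(random() * 50), // hypothetical range: 13..62
      brand: pick(random, brands),
      color: pick(random, colors),
      gender: pick(random, genders),
    });
  }
  return items;
}

const sampleData = generateItems(1000, 42);
console.log(sampleData.length, sampleData[0]);
```

The resulting array could then be loaded in the shell with db.items.insertMany(sampleData), and the explain() output compared across the candidate indexes.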

Compound Indexes Order in Mongo

Let's say I have the following document schema:
{
_id: ObjectId(...),
name: "Kevin",
weight: 500,
hobby: "scala",
favoriteFood : "chicken",
pet: "parrot",
favoriteMovie : "Diehard"
}
If I create a compound index on name-weight, I will be able to specify a strict parameter (name == "Kevin"), and a range on weight (between 50 and 200). However, I would not be able to do the reverse: specify a weight and give a "range" of names.
Of course compound index order matters where a range query is involved.
If only exact queries will be used (example: name == "Kevin", weight == 100, hobby == "C++"), then does the order actually matter for compound indexes?
When you have an exact query, the order should not matter. But when you want to be sure, the .explain() method on database cursors is your friend. It can tell you which indexes are used and how they are used when you perform a query in the mongo shell.
Important fields of the document returned by explain are:
indexOnly: when it's true, the query was completely covered by the index
n and nscanned: the first tells you the number of documents found, the second how many index entries or documents had to be examined to find them. The latter shouldn't be notably higher than the first.
millis: the number of milliseconds the query took to execute
(In MongoDB 3.0 and later, explain("executionStats") reports the equivalent values as nReturned, totalKeysExamined, totalDocsExamined and executionTimeMillis.)
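To see why key order matters for a range query but not for a pure equality match, it can help to model the { name: 1, weight: 1 } index as a sorted list of (name, weight) pairs. This is a simplified in-memory sketch with hypothetical data, not how MongoDB stores its B-tree.

```javascript
// Hypothetical (name, weight) pairs, kept in the order an index on { name: 1, weight: 1 } would use.
const entries = [
  ["Alice", 100], ["Alice", 300],
  ["Kevin", 40], ["Kevin", 100], ["Kevin", 150], ["Kevin", 500],
  ["Maria", 100], ["Maria", 250],
].sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));

// Positions of matching entries in index order.
function matchPositions(keys, predicate) {
  return keys.map((key, i) => (predicate(key) ? i : -1)).filter((i) => i >= 0);
}

// True when the matches form one unbroken run, i.e. a single index range scan.
function isContiguous(positions) {
  return positions.every((p, i) => i === 0 || p === positions[i - 1] + 1);
}

const exactNameRangeWeight = matchPositions(entries, ([n, w]) => n === "Kevin" && w >= 50 && w <= 200);
const rangeNameExactWeight = matchPositions(entries, ([n, w]) => n >= "A" && n <= "N" && w === 100);
const allExact = matchPositions(entries, ([n, w]) => n === "Kevin" && w === 100);

console.log(isContiguous(exactNameRangeWeight)); // true
console.log(isContiguous(rangeNameExactWeight)); // false
console.log(isContiguous(allExact)); // true
```

With equality on every field the matches sit in one contiguous run whichever field comes first, whereas a range on the leading field scatters the matches across the index.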

MongoDB Covered Query For Two Fields Without Compound Index

Can you perform a MongoDB covered query for two fields, for example
db.collection.find( { _id: 1, a: 2 } )
without having a compound index such as
db.collection.ensureIndex( { _id: 1, a: 1 } )
but instead having only one index for _id (you get that by default) and another index for field "a", as in
db.collection.ensureIndex( { a: 1 } )
In other words, I'd like to know if in order to perform a covered query for two fields I need a compound index vs. needing only two single (i.e., not compound) indexes, one for each field.
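Queries generally use only one index, so to cover a query on two fields you need a compound index containing both of them. (MongoDB 2.6 added index intersection, but it applies only in limited cases.)
Your example shows _id as one of the elements of your index. _id needs to be unique in a collection, so it usually wouldn't make sense to make a compound index of _id and something else.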
If you instead had:
db.collection.ensureIndex( { a: 1, b: 1 })
You could then use this index for queries on a alone (a is a prefix of the index), or as a compound index for queries on both a and b.
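The prefix rule can be sketched as a small check. This is a deliberately simplified model that ignores sorts, ranges and the planner's heuristics, and usesIndexPrefix is an illustrative name rather than a MongoDB function.

```javascript
// Returns true when the queried fields form a leading prefix of the index keys.
// Simplified model: ignores sorts, ranges, and planner heuristics.
function usesIndexPrefix(indexFields, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexFields.length) return false;
  const prefix = indexFields.slice(0, queryFields.length);
  return queryFields.every((field) => prefix.includes(field));
}

const compoundIndex = ["a", "b"]; // models db.collection.ensureIndex({ a: 1, b: 1 })
console.log(usesIndexPrefix(compoundIndex, ["a"]));      // true
console.log(usesIndexPrefix(compoundIndex, ["a", "b"])); // true
console.log(usesIndexPrefix(compoundIndex, ["b"]));      // false
```

A query on b alone is not a prefix of { a: 1, b: 1 }, so it would need its own index.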

Mongo - Compound Shard Indexes in 2.2

I was reading the documentation for Mongo Shard keys for 2.2, and found it a bit confusing.
All sharded collections must have an index that starts with the shard
key. If you shard a collection that does not yet contain documents and
without such an index, the shardCollection will create an index on the
shard key. If the collection already contains documents, you must
create an appropriate index before using shardCollection.
Changed in version 2.2: The index on the shard key no longer needs to
be identical to the shard key. This index can be an index of the shard
key itself as before, or a compound index where the shard key is the
prefix of the index. This index cannot be a multikey index.
If you have a collection named people, sharded using the field {
zipcode: 1 }, and you want to replace this with an index on the field
{ zipcode: 1, username: 1 }, then:
Create an index on { zipcode: 1, username: 1 }: db.people.ensureIndex(
{ zipcode: 1, username: 1 } ); When MongoDB finishes building the
index, you can safely drop existing index on { zipcode: 1 }:
db.people.dropIndex( { zipcode: 1 } ); Warning The index on the shard
key cannot be a multikey index. As above, an index on { zipcode: 1,
username: 1 } can only replace an index on zipcode if there are no
array values for the username field.
If you drop the last appropriate index for the shard key, recover by
recreating a index on just the shard key.
I have a couple of questions about shard keys and indexes.
i) From the documentation, it looks like multi-key indexes were supported before 2.2. If that is the case, how is a compound index different from multikey indexes ?
ii) What is the difference between having
[a] an index that starts with a shard key and
[b] an index which has a shard key as a prefix ?
iii) What is the warning note about an index on the shard key should not be a multikey index ?
Isn't db.people.ensureIndex( { zipcode: 1, username: 1 } ) a multikey index ?
How is a compound index different from a multikey index?:
A compound index is an index like the one you described in the example:
{ zipcode: 1, username: 1 }
A multikey index is one that indexes items in an array, like an index on tags that is used to return all documents that contain the tag 'mongoDB'.
What is the difference between having [a] an index that starts with a shard key and [b] an index which has a shard key as a prefix?:
Nothing.
What is the warning note about an index on the shard key should not be a multikey index?:
This makes a fair bit of sense when you consider that a multikey index is an index on an array. Consider our index on a tags array. A document could easily live in many (or all) shards, if it had the right collection of values in the array.
In other words, documents still have to be sharded based on a single value, as opposed to an object or an array.
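The difference can be illustrated by counting the index entries each kind of index would produce for a single document. This is a conceptual sketch only; the document and both helper functions are hypothetical.

```javascript
// Hypothetical document with two scalar fields and one array field.
const person = { _id: 1, zipcode: "10001", username: "kevin", tags: ["mongoDB", "sharding", "indexes"] };

// A compound index on scalar fields yields one entry per document.
function compoundEntries(doc, fields) {
  return [fields.map((field) => doc[field])];
}

// A multikey index yields one entry per array element.
function multikeyEntries(doc, arrayField) {
  return doc[arrayField].map((element) => [element]);
}

const compound = compoundEntries(person, ["zipcode", "username"]);
const multikey = multikeyEntries(person, "tags");
console.log(compound.length, multikey.length); // 1 3
```

One document, one compound entry, so it maps to exactly one place in the key space; the multikey index gives the same document three places.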
I noticed that the MongoDB documentation on indexes may have created confusion. Multi-key indexes are a way to create separate index entries for each element in an array: http://www.mongodb.org/display/DOCS/Multikeys. A compound index, on the other hand, creates index entries on two or more fields: http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeys.
Multi-key shard key indexes were not supported in previous versions of MongoDB. Since MongoDB splits up sharded collections based on ranges of shard key values, using a multi-key index is not possible.
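A toy model of range-based routing shows the problem. The chunk boundaries and shard names below are invented for illustration, and the lookup is far simpler than what MongoDB actually does.

```javascript
// Hypothetical chunk ranges over a zipcode shard key: [min, max) per shard.
const chunks = [
  { shard: "shardA", min: "00000", max: "40000" },
  { shard: "shardB", min: "40000", max: "80000" },
  { shard: "shardC", min: "80000", max: "\uffff" },
];

// Find the single chunk whose range contains a scalar shard key value.
function routeToShard(value) {
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  return chunk ? chunk.shard : null;
}

// An array value would need one lookup per element and could land on several shards.
function shardsForArray(values) {
  return [...new Set(values.map(routeToShard))];
}

console.log(routeToShard("10001"));              // shardA
console.log(shardsForArray(["10001", "90210"])); // [ 'shardA', 'shardC' ]
```

A scalar key resolves to exactly one shard, while an array-valued key could resolve to several, which is the situation the multikey restriction rules out.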
Assuming I understand your question, there is no difference. The shard key index can be an index on a single field or a compound index that starts with the shard key.
db.people.ensureIndex({ zipcode: 1, username: 1 }) is an example of a compound index, which can be used for the shard key index.
If zipcode is the shard key, these indexes would work:
db.people.ensureIndex({ zipcode: 1})
db.people.ensureIndex({ zipcode: 1, username: 1 })
An example of a multi-key index:
{_id: 1, array: [{zipcode: x}, {username: y}]}
db.people.ensureIndex({array: 1})