mongodb: Multikey indexing structure? - mongodb

I'm finding it hard to understand exactly how multikey indexing works in MongoDB.
This is what I read about multikey indexes in the MongoDB documentation:
1) "Creating an index on an array element indexes results in the database indexing each element of the array"
2) "...will index all the tags on the document, and create index entries for "X", "Y" and "Z" for that document."
So what exactly does "index entries for that document" mean? Does each document keep its own list of entries, in which case a search would be a full collection scan? Or is it the same kind of B-tree index as in MySQL, where each index entry points to every document containing that value, in which case I'm overthinking this?
Let's take an example:
obj1 = {
name: "Apollo",
text: "Some text about Apollo moon landings",
tags: [ "moon", "apollo", "spaceflight", "nasa" ]
}
obj2 = {
name: "Atlantis",
text: "Some text about Atlantis flight missions",
tags: [ "space", "atlantis", "spaceflight", "nasa" ]
}
db.articles.ensureIndex( { tags : 1 } )
Please help me understand! Thanks in advance.

In this case, your index (which is a B-tree) would look like this:
apollo => [ obj1 ]
atlantis => [ obj2 ]
moon => [ obj1 ]
nasa => [ obj1, obj2 ]
space => [ obj2 ]
spaceflight => [ obj1, obj2 ]
This is just a "regular" B-tree index, except that every document can appear more than once (it appears once for every unique tag value).
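To make the picture concrete, here is a toy model in plain JavaScript. It is not MongoDB's actual implementation: a sorted Map stands in for the B-tree, and documents are identified by name instead of by record location. It shows each distinct tag pointing to every document that contains it, which is why a lookup is one index probe rather than a scan.

```javascript
// Toy model of a multikey index: illustrates the idea, not MongoDB internals.
const docs = [
  { name: "Apollo", tags: ["moon", "apollo", "spaceflight", "nasa"] },
  { name: "Atlantis", tags: ["space", "atlantis", "spaceflight", "nasa"] },
];

function buildMultikeyIndex(documents, field) {
  const index = new Map();
  for (const doc of documents) {
    // One entry per distinct array element, per document.
    for (const value of new Set(doc[field])) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc.name);
    }
  }
  // Keep keys in sorted order, as a B-tree would.
  const sorted = [...index.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return new Map(sorted);
}

const tagIndex = buildMultikeyIndex(docs, "tags");
for (const [key, names] of tagIndex) console.log(key, "=>", names);
// A lookup is a single probe into the index, not a pass over every document.
console.log(tagIndex.get("nasa")); // [ 'Apollo', 'Atlantis' ]
```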

I think you misunderstood the difference between multikey and compound indexes:
Compound indexes are user-defined indexes on multiple fields at once.
Multikey indexes: MongoDB detects whether the indexed field holds an array and, if it does, creates an index entry for each element of the array. For example:
db.user.ensureIndex({"address.street":1});
In this case, because the target field is an array, the index stores every street value in it, but each distinct value only once per document.
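The same idea applies to a dotted path into an array of subdocuments. The sketch below is plain JavaScript and assumes a hypothetical user document whose `address` field is an array; `indexKeys` is an illustrative helper, not a MongoDB function. It collects the keys an index on "address.street" would hold for that one document, with a repeated street producing a single key.

```javascript
// Hypothetical user document: "address" is an array of subdocuments.
const user = {
  name: "ana",
  address: [
    { street: "Main St", city: "A" },
    { street: "Oak Ave", city: "B" },
    { street: "Main St", city: "C" }, // duplicate street value
  ],
};

// Collect index keys for a dotted path, descending into arrays along the way.
// Simplified: a missing path yields no keys here (MongoDB would index null).
function indexKeys(value, path) {
  if (path.length === 0) return Array.isArray(value) ? value : [value];
  if (Array.isArray(value)) return value.flatMap((v) => indexKeys(v, path));
  if (value === null || typeof value !== "object" || !(path[0] in value)) return [];
  return indexKeys(value[path[0]], path.slice(1));
}

// A Set removes duplicates: each distinct value is indexed once for this document.
const keys = [...new Set(indexKeys(user, "address.street".split(".")))];
console.log(keys); // [ 'Main St', 'Oak Ave' ]
```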
I highly recommend taking a look at this simple article, which should clear up your doubts about the basic index types in MongoDB: http://mongodbspain.com/en/2014/01/24/mongodb-indexes-part1/
Regards,

Related

How to match an array against another array query

Say I have a collection that has documents like this where the crossRefs field is an array of strings:
{ name: "joe", crossRefs: [ "1" , "2" , "10a"] }
{ name: "jane", crossRefs: [ "10a" , "0xfgh" , "9"] }
{ name: "john", crossRefs: [ "11" , "hhj12" , "13dd"] }
...
Question 1: How to build a query against crossRefs with an array of strings
I would like to have a query that says "give me all the documents that have any one of the crossRefs values '10a', '1' and '2'". So the result from the above 3 documents would be just the first 2. I'm thinking I need $elemMatch or $in combined with some $or clauses, but I just don't know how to formulate this.
Question 2: How to specify an index
I would like for MongoDB to build an index based on the crossRefs field. In Java is this as simple as:
mainDB.getCollection("theCollection").ensureIndex("crossRefs");
I.e. does that work ok when the field is actually an array of strings?
Thanks in advance.
$in works with array fields as well as non-array fields, so in the shell it would be:
db.theCollection.find({crossRefs: {$in: ['10a', '1', '2']}})
This will find the docs where at least one crossRefs element matches an $in element.
And yes, that's the correct way to index the field, even when it's an array.
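As a rough illustration of the $in semantics described above, here is a plain JavaScript filter over the three example documents. `matchesIn` is a hypothetical helper, and it is simplified: real MongoDB also matches when an array field exactly equals an array listed inside $in, which is not modelled here.

```javascript
// Toy re-implementation of {crossRefs: {$in: [...]}}, for illustration only.
const people = [
  { name: "joe", crossRefs: ["1", "2", "10a"] },
  { name: "jane", crossRefs: ["10a", "0xfgh", "9"] },
  { name: "john", crossRefs: ["11", "hhj12", "13dd"] },
];

// A field matches if its value, or (for arrays) any element, is in the list.
function matchesIn(fieldValue, candidates) {
  const wanted = new Set(candidates);
  const values = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
  return values.some((v) => wanted.has(v));
}

const result = people.filter((d) => matchesIn(d.crossRefs, ["10a", "1", "2"]));
console.log(result.map((d) => d.name)); // [ 'joe', 'jane' ]
```

The same rule explains why no $or is needed: one element match is enough for the whole document to match.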

How to fetch between a range of indexes in mongodb?

I need help. Is there any way to fetch documents within a range of indexes when using find in Mongo, like [2:10] (from 2 to 10)?
If you are talking about the "index" position within an array in your document, then you want the $slice operator. The first argument is the index to start from and the second is how many elements to return. Positions are 0-based, so position 2 is the "third" element:
db.collection.find({}, { "list": { "$slice": [ 2, 8 ] } })
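For a non-negative start position, $slice: [skip, limit] behaves like the plain JavaScript slice below. This is only an analogy for reading the arguments; MongoDB also accepts a negative skip (counting from the end of the array), which this sketch does not handle.

```javascript
// Analogy for {"list": {"$slice": [skip, limit]}} with a non-negative skip.
function sliceProjection(array, skip, limit) {
  return array.slice(skip, skip + limit);
}

const list = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"];
console.log(sliceProjection(list, 2, 8)); // the 8 elements at positions 2..9
```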
Within the collection itself, use the .skip() and .limit() modifiers to move through a range of documents:
db.collection.find({}).skip(2).limit(8)
Keep in mind that, at the collection level, MongoDB has no inherent concept of "ordered" records; the order depends on the query and/or the sort order you specify.
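Because of that, ranges are only stable over an explicit order; in the shell that would be something like db.collection.find({}).sort({ _id: 1 }).skip(2).limit(8). The plain JavaScript sketch below mimics sort-then-skip-then-limit on an in-memory array; `page` is a hypothetical helper, not a driver method.

```javascript
// Mimics find().sort({[sortKey]: 1}).skip(skip).limit(limit) on an in-memory array.
function page(results, sortKey, skip, limit) {
  const sorted = [...results].sort((a, b) =>
    a[sortKey] < b[sortKey] ? -1 : a[sortKey] > b[sortKey] ? 1 : 0
  );
  return sorted.slice(skip, skip + limit);
}

// Twelve documents stored in "reverse" order; the sort makes the page deterministic.
const rows = Array.from({ length: 12 }, (_, i) => ({ _id: 11 - i }));
console.log(page(rows, "_id", 2, 8).map((r) => r._id)); // [2, 3, 4, 5, 6, 7, 8, 9]
```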

How feasible is this query

Suppose you have a collection of documents with the following structure:
_id
A_id = ObjectId
B_id = ObjectId
C_id = ObjectId
+ other stuff
Suppose the collection holds roughly 100 million to 1 billion documents. I have to run a query
that returns all documents whose A_id, B_id, or C_id is in some list of ObjectIds, say L = [ ObjectId ].
Something like this:
{ '$or' : [ { 'A_id' : { '$in' : L}},
{ 'B_id' : { '$in' : L}},
{ 'C_id' : { '$in' : L}} ]
}
Q: Is it feasible to run such a query? Is it normal to run such queries on MongoDB?
Q: How long could it take on a single server, and how long on a horizontally scaled database?
It is a doable query.
The real question is "is it a good query?"
The answer to that question is extremely dependent on many variables.
First off, I am assuming you have an index on each of the fields you are querying. I am also assuming that the query stands as is, without a sort. Note that there are known issues that stop the optimiser from using an index for a sort here: https://jira.mongodb.org/browse/SERVER-1205
Assuming you have indexes on A_id, B_id and C_id, MongoDB will essentially run 3 queries and merge duplicates before returning your result.
This means that for small $or queries it might be faster to let the database (or mongos) do the work, since you don't have to merge duplicates yourself in your application. That spares network traffic as well as a costly iteration over the results of each $or clause.
So for a small $or like that the query is Okay. It isn't the best query in the world but it will do if you have no choice but to do an $or.
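The "3 queries, then merge duplicates" step can be pictured with the plain JavaScript sketch below. It is a conceptual model only: strings stand in for ObjectIds, an array filter stands in for each index scan, and deduplication is keyed on _id.

```javascript
// Conceptual model of an $or over three indexed fields; not MongoDB internals.
const idList = new Set(["x", "y"]); // stand-ins for the ObjectIds in L
const records = [
  { _id: 1, A_id: "x", B_id: "p", C_id: "q" },
  { _id: 2, A_id: "p", B_id: "y", C_id: "x" }, // matches two clauses
  { _id: 3, A_id: "p", B_id: "q", C_id: "r" },
];

// One "index lookup" per $or clause (a filter stands in for the index scan).
const clauseHits = ["A_id", "B_id", "C_id"].map((field) =>
  records.filter((r) => idList.has(r[field]))
);

// Merge the clause results, keeping each document once (keyed by _id).
const merged = new Map();
for (const hits of clauseHits) for (const r of hits) merged.set(r._id, r);
console.log([...merged.keys()]); // [ 1, 2 ]
```

Document 2 is found by two clauses but returned once, which is the merging work the answer says you would otherwise do in your application.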
Q: How long could it take on a single server, and how long on a horizontally scaled database?
Not sure if anyone here can answer that. It depends on the schema, the size of the $in lists and much more.
It's certainly doable to run that query. However, you might want to consider an alternative structure that could be more easily searched.
Instead of
_id
A_id = ObjectId
B_id = ObjectId
C_id = ObjectId
+ other stuff
You might want to restructure it to be:
_id
idList = [
{ k: 'A', v: AObjectId },
{ k: 'B', v: BObjectId },
{ k: 'C', v: CObjectId }
]
+ other stuff
By using an array of sub-objects with a key field and a value field, you can index the value field ('idList.v') and run just a single efficient query:
{ 'idList.v' : { $in : listToCheck } }
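In the shell, the index would be created with something like db.collection.createIndex({ 'idList.v': 1 }). The plain JavaScript sketch below only models what the query matches on the restructured documents; strings stand in for ObjectIds and `matchesIdListIn` is a hypothetical helper.

```javascript
// Restructured documents: one array of {k, v} pairs instead of three separate fields.
const restructured = [
  { _id: 1, idList: [{ k: "A", v: "x" }, { k: "B", v: "p" }, { k: "C", v: "q" }] },
  { _id: 2, idList: [{ k: "A", v: "p" }, { k: "B", v: "q" }, { k: "C", v: "r" }] },
];

// Models {'idList.v': {$in: listToCheck}}: matches if any element's v is in the list.
function matchesIdListIn(doc, listToCheck) {
  const wanted = new Set(listToCheck);
  return doc.idList.some((entry) => wanted.has(entry.v));
}

const found = restructured.filter((d) => matchesIdListIn(d, ["x", "y"]));
console.log(found.map((d) => d._id)); // [ 1 ]
```

One multikey index on idList.v now serves what previously needed three indexes and an $or.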

MongoDB complex indices

I'm trying to understand how best to work with indices in MongoDB. Let's say that I have a collection of documents like this one:
{
_id: 1,
keywords: ["gap", "casual", "shorts", "oatmeal"],
age: 21,
brand: "Gap",
color: "Black",
gender: "female",
retailer: "Gap",
style: "Casual Shorts",
student: false,
location: "US",
}
and I regularly run a query to find all documents that match a set of criteria for each of those fields, something like:
db.items.find({ age: { $gt: 13, $lt: 40 },
brand: { $in: ['Gap', 'Target'] },
retailer: { $in: ['Gap', 'Target'] },
gender: { $in: ['male', 'female'] },
style: { $in: ['Casual Shorts', 'Jeans']},
location: { $in: ['US', 'International'] },
color: { $in: ['Black', 'Green'] },
keywords: { $all: ['gap', 'casual'] }
})
I'm trying to figure what sort of index I can create to improve the speed of a query such as this. Should I create a compound index like this:
db.items.ensureIndex({ age: 1, brand: 1, retailer: 1, gender: 1, style: 1, location: 1, color: 1, keywords: 1})
or is there a better set of indices I can create to optimize this query?
Should I create a compound index like this:
db.items.ensureIndex({age: 1, brand: 1, retailer: 1, gender: 1, style: 1, location: 1, color: 1, keywords: 1})
You can create an index like the one above, but you're indexing almost the entire collection. Indexes take space, and the more fields in an index, the more space it uses. That space is usually RAM, although indexes can be swapped out. Indexes also incur a write penalty.
Your index seems wasteful, since indexing just a few of those fields will probably already narrow MongoDB's scan to a set of documents close to the expected result of the find operation.
Is there a better set of indices I can create to optimize this query?
Like I said before, probably yes. But this question is very difficult to answer without knowing details of the collection, like the amount of documents it has, which values each field can have, how those values are distributed in the collection (50% gender male, 50% gender female?), how they correlate to each other, etc.
There are a few indexing strategies, but normally you should strive to create indexes with high selectivity. Choose "small" field combinations that will help MongoDB locate the desired documents scanning a "reasonable" amount of them. Again, "small" and "reasonable" will depend on the characteristics of the collection and query you are performing.
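One rough way to gauge selectivity is to measure, on a sample of your own data, what fraction of documents each condition matches: the lower the fraction, the more useful the field is in an index. The sample below is made up for illustration, and `matchFraction` is a hypothetical helper. Note that a condition like gender: { $in: ['male', 'female'] } may match every document and filter nothing.

```javascript
// Made-up sample documents; real numbers must come from your own collection.
const sample = [
  { gender: "female", location: "US", brand: "Gap" },
  { gender: "male", location: "US", brand: "Target" },
  { gender: "female", location: "US", brand: "Levi's" },
  { gender: "male", location: "International", brand: "Gap" },
];

// Fraction of documents an equality/$in condition would match (lower = more selective).
function matchFraction(documents, field, values) {
  const wanted = new Set(values);
  return documents.filter((d) => wanted.has(d[field])).length / documents.length;
}

console.log(matchFraction(sample, "gender", ["male", "female"])); // 1
console.log(matchFraction(sample, "brand", ["Gap", "Target"])); // 0.75
```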
Since this is a fairly complex subject, here are some references that should help you build more appropriate indexes.
http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
http://docs.mongodb.org/manual/faq/indexes/#how-do-you-determine-what-fields-to-index
http://docs.mongodb.org/manual/tutorial/create-queries-that-ensure-selectivity/
And use cursor.explain to evaluate your indexes.
http://docs.mongodb.org/manual/reference/method/cursor.explain/
A large index like this one will penalize you on writes. It is better to index just what you need and let Mongo's optimiser do most of the work for you. You can always give it a hint or, as a last resort, reindex if your application or data usage changes drastically.
Your query will use the index for the fields that are in it (fast) and then examine the documents the index returns to check the remaining conditions (slower).
Depending on your application, a few standalone indexes could be better. Adding more indexes will not necessarily improve performance; with the write penalty, it could even make it worse (YMMV).
Here is a basic algorithm for selecting fields to put in an index:
What single field is in a query the most often?
If that single field is present in a query, will a table scan be expensive?
What other field could you index to further reduce the table scan?
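The first question in that list can be answered mechanically if you log the filters your application sends. The sketch below assumes a hypothetical in-memory query log and counts how often each field appears, most frequent first; combine the result with a selectivity measurement before deciding.

```javascript
// Hypothetical log of filter objects sent by the application.
const queryLog = [
  { age: { $gt: 13 }, brand: "Gap", keywords: "gap" },
  { brand: "Target", color: "Black" },
  { brand: "Gap", keywords: "casual" },
];

// Count how often each field appears in a filter, most frequent first.
function fieldFrequency(filters) {
  const counts = new Map();
  for (const f of filters) {
    for (const field of Object.keys(f)) counts.set(field, (counts.get(field) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(fieldFrequency(queryLog)); // brand (3) first, then keywords (2)
```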
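This index seems reasonable for your query. When an index contains every field a query filters on and returns, MongoDB calls the query a covered query, since there is no need to access the documents: all data can be fetched from the index. As written, though, this query would not be covered: find() returns whole documents (including _id and student, which are not in the index), and because keywords is an array the index is multikey, and a multikey index cannot cover a query that filters on the array field.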
from the docs:
"Because the index “covers” the query, MongoDB can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query. An index can also cover an aggregation pipeline operation on unsharded collections."
Some remarks:
This index will only be used by queries that include a filter on age. A query that only filters by brand or retailer will probably not use this index.
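The prefix rule behind that remark can be sketched as follows. This is a simplification: it reports the leading run of index fields that the query constrains, and the real query planner has more cases (for example, using an index only to satisfy a sort). `usablePrefix` is a hypothetical helper.

```javascript
// Simplified prefix rule: the index helps narrow the search only from its first field
// onward, for as long as consecutive index fields are constrained by the query.
function usablePrefix(indexFields, queryFields) {
  const constrained = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!constrained.has(field)) break;
    prefix.push(field);
  }
  return prefix;
}

const idx = ["age", "brand", "retailer", "gender", "style", "location", "color", "keywords"];
console.log(usablePrefix(idx, ["age", "brand"])); // [ 'age', 'brand' ]
console.log(usablePrefix(idx, ["brand", "retailer"])); // [] (no filter on age)
```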
Adding an index on only one or two of the most selective fields of your query will already bring a very significant performance boost. The more fields you add the larger the index size will be on disk.
You may want to generate some random sample data and measure the performance of this with different indexes or sets of indexes. This is obviously the safest way to know.

How do I do a "NOT IN" query in Mongo?

This is my document:
{
title:"Happy thanksgiving",
body: "come over for dinner",
blocked:[
{user:333, name:'john'},
{user:994, name:'jessica'},
{user:11, name: 'matt'},
]
}
What is the query to find all documents that do not have user 11 in "blocked"?
You can use $nin for "not in" ($in is its "in" counterpart).
Example ...
> db.people.find({ crowd : { $nin: ["cool"] }});
I put a bunch more examples here: http://learnmongo.com/posts/being-part-of-the-in-crowd/
Since you are comparing against a single value, your example doesn't actually need a NOT IN operation. Mongo applies the search criteria to every element of an array of subdocuments, so you can use the NOT EQUALS operator, $ne, which takes the value that must not turn up in the search:
db.myCollection.find({'blocked.user': {$ne: 11}});
However, if there are many values it must not equal, use the NOT IN operator, $nin. It takes an array of values that must not turn up in the search:
db.myCollection.find({'blocked.user': {$nin: [11, 12, 13]}});
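To see why $ne behaves like "not in" here, the plain JavaScript model below gathers every blocked.user value from a document and matches only if none of them is excluded. It is a simplified model with made-up documents, not MongoDB's matcher; it also shows that a document without a blocked field still matches, as it would with $ne or $nin.

```javascript
// Simplified model of {'blocked.user': {$ne: v}} and {'blocked.user': {$nin: [...]}}.
const posts = [
  { title: "Happy thanksgiving", blocked: [{ user: 333 }, { user: 994 }, { user: 11 }] },
  { title: "Other post", blocked: [{ user: 5 }] },
  { title: "No block list" }, // field missing: still matches
];

function blockedUsers(doc) {
  return (doc.blocked || []).map((b) => b.user);
}

// Matches only if NO blocked.user value is in the excluded list.
function matchesNin(doc, excluded) {
  const bad = new Set(excluded);
  return !blockedUsers(doc).some((u) => bad.has(u));
}
const matchesNe = (doc, value) => matchesNin(doc, [value]);

console.log(posts.filter((d) => matchesNe(d, 11)).map((d) => d.title));
// [ 'Other post', 'No block list' ]
```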
Try the following:
db.stack.find({"blocked.user":{$nin:[11]}})
This worked for me.
See http://docs.mongodb.org/manual/reference/operator/query/nin/#op._S_nin
db.inventory.find( { qty: { $nin: [ 5, 15 ] } } )
This query will select all documents in the inventory collection where the qty field value does not equal 5 nor 15. The selected documents will include those documents that do not contain the qty field.
If the field holds an array, then the $nin operator selects the documents whose field holds an array with no element equal to a value in the specified array (e.g. <value1>, <value2>, etc.).
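The three cases in that quote (scalar value, missing field, array value) can be checked with the small plain JavaScript model below. `ninMatches` is a hypothetical helper and the inventory documents are invented for illustration.

```javascript
// Simplified model of {field: {$nin: excluded}} on a top-level field.
function ninMatches(doc, field, excluded) {
  if (!(field in doc)) return true; // missing field: selected
  const bad = new Set(excluded);
  const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
  return !values.some((v) => bad.has(v)); // no element may equal an excluded value
}

const inventory = [
  { item: "a", qty: 5 },
  { item: "b", qty: 20 },
  { item: "c" },
  { item: "d", qty: [1, 15] },
  { item: "e", qty: [1, 2] },
];

console.log(inventory.filter((d) => ninMatches(d, "qty", [5, 15])).map((d) => d.item));
// [ 'b', 'c', 'e' ]
```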