MongoDB best optimisation on embedded documents using index

In MongoDB, what is the best way to index a collection with embedded documents if you need to query fields of the embedded document?
For example, in a collection with documents of the following format:
{
  name: "Andy",
  address: {
    city: "London",
    street: "Sunny St."
  }
}
If we need to query by:
db.collection.find( {$and: [ {"address.city": "London"}, {"address.street": "Sunny St."} ] } )
Which type of index would be better:
1. db.collection.createIndex({"address":1})
2. db.collection.createIndex({"address.city":1})
   db.collection.createIndex({"address.street":1})
3. db.collection.createIndex({"address.city":1, "address.street":1})
Thanks

For the given query, option 3
db.collection.createIndex({"address.city":1, "address.street":1})
will do the job: one compound index covers both fields the query filters on, and there is a logical relation city => street.
If you want to see exactly how MongoDB uses the index, or run your own tests, call explain("executionStats") on the query and check the index usage.
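To make the explain() advice concrete, here is a minimal sketch in plain JavaScript that walks a plan and lists the indexes it uses. The helper name indexesUsed and the sample plan are made up for illustration; the sample only imitates the general shape of queryPlanner.winningPlan (stage, inputStage, inputStages, indexName), and real output has more fields and may differ between server versions.

```javascript
// Collect the names of all indexes used by IXSCAN stages in an explain() plan.
// Stages nest through `inputStage` (one child) or `inputStages` (several children).
function indexesUsed(stage, found = []) {
  if (!stage) return found;
  if (stage.stage === "IXSCAN" && stage.indexName) found.push(stage.indexName);
  indexesUsed(stage.inputStage, found);
  (stage.inputStages || []).forEach((child) => indexesUsed(child, found));
  return found;
}

// Hand-written sample shaped like a winning plan (illustrative, not real server output).
const samplePlan = {
  stage: "FETCH",
  inputStage: {
    stage: "IXSCAN",
    indexName: "address.city_1_address.street_1",
    keyPattern: { "address.city": 1, "address.street": 1 },
  },
};

console.log(indexesUsed(samplePlan));
// In the shell, the plan would come from something like:
// db.collection.find({ ... }).explain("executionStats").queryPlanner.winningPlan
```

An empty result means no index scan appears in the plan, which usually points to a collection scan (COLLSCAN).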

Related

mongodb using less performant index

I am using mongodb 2.6, and I have the following query:
db.getCollection('Jobs').find(
{ $and: [ { RunID: { $regex: ".*_0" } },
{ $or: [ { JobType: "TypeX" },
{ JobType: "TypeY" },
{ JobType: "TypeZ" },
{ $and: [ { Info: { $regex: "Weekly.*" } }, { JobType: "YetAnotherType" } ] } ] } ] })
I have three different indexes: RunID, RunID + JobType, and RunID + JobType + Info. MongoDB always uses the index containing RunID only, although the other indexes seem more likely to produce faster results. Sometimes it even uses an index on RunID + StartTime, although StartTime does not appear in the query at all. Any idea why it chooses that index?
Note1:
You can drop your first two indexes, RunID and RunID + JobType. The compound index RunID + JobType + Info is enough on its own; it can also serve queries on RunID alone or on RunID + JobType. From the documentation:
In addition to supporting queries that match on all the index fields,
compound indexes can support queries that match on the prefix of the
index fields.
Once you drop those indexes, MongoDB will choose the only remaining index.
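As a rough illustration of the prefix rule quoted above, the sketch below checks whether a set of queried fields matches the leading keys of a compound index. The function name isIndexPrefix is hypothetical, and the check is a deliberate simplification: the real query planner also considers sorts, ranges and partially usable prefixes.

```javascript
// True when the queried fields are exactly the first N keys of the index (in any order).
// Simplified model of the prefix rule, not the planner's actual logic.
function isIndexPrefix(indexKeys, queryFields) {
  const prefix = Object.keys(indexKeys).slice(0, queryFields.length);
  return queryFields.length > 0 && queryFields.every((field) => prefix.includes(field));
}

const compound = { RunID: 1, JobType: 1, Info: 1 };

console.log(isIndexPrefix(compound, ["RunID"]));            // true
console.log(isIndexPrefix(compound, ["RunID", "JobType"])); // true
console.log(isIndexPrefix(compound, ["JobType"]));          // false
console.log(isIndexPrefix(compound, ["JobType", "Info"]));  // false
```

This is why the separate RunID and RunID + JobType indexes are redundant, while a query on JobType alone would still need an index of its own.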
Note2:
You can always use hint() to tell MongoDB to use a specific index:
db.getCollection('Jobs').find().hint({RunID:1, JobType:1, Info:1})
Thanks to Sergiu's answer and Sammaye's comment, I think I found what I was looking for:
I got rid of the RunID index. Since RunID is a prefix of many other indexes, MongoDB can use one of those when it needs only RunID.
Concerning $or, we have the following in the documentation:
When evaluating the clauses in the $or expression, MongoDB either
performs a collection scan or, if all the clauses are supported by
indexes, MongoDB performs index scans. That is, for MongoDB to use
indexes to evaluate an $or expression, all the clauses in the $or
expression must be supported by indexes. Otherwise, MongoDB will
perform a collection scan.
As I mentioned earlier, RunID is already indexed, so we need a new index for the other fields in the query, JobType and Info. JobType needs to be the index's prefix so that the index can also be used by queries that do not contain the Info field, so the second index I created is
{ "JobType": 1.0, "Info": 1.0}
As a result, MongoDB will use a more complex plan that combines different indexes.
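The $or rule quoted above can be sketched as a small check in plain JavaScript. This is only a toy model under stated assumptions: the helper orCanUseIndexes is hypothetical, clauses are flattened to plain field objects, and a clause counts as "supported" when some index starts with one of its fields. The real planner is considerably more involved.

```javascript
// Toy model: every $or clause must contain the leading field of at least one index.
function orCanUseIndexes(orClauses, indexes) {
  const leadingFields = indexes.map((keys) => Object.keys(keys)[0]);
  return orClauses.every((clause) =>
    Object.keys(clause).some((field) => leadingFields.includes(field))
  );
}

// The $or clauses of the query from the question, with the nested $and flattened.
const orClauses = [
  { JobType: "TypeX" },
  { JobType: "TypeY" },
  { JobType: "TypeZ" },
  { Info: /^Weekly/, JobType: "YetAnotherType" },
];

console.log(orCanUseIndexes(orClauses, [{ RunID: 1 }]));                         // false
console.log(orCanUseIndexes(orClauses, [{ RunID: 1 }, { JobType: 1, Info: 1 }])); // true
```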

Put multiple fields in one index

I have a collection of users with a structure like this:
{
...
phone:"XXXXXXXXXX",
email:"XXXXXXXXXX",
...
}
Is there a way to put both fields, phone and email, into one index (for example called contactinfos), so that I can query a user by phone number OR by email address like this?
db.users.find({contactinfos:"E-Mail-Address or Phonenumber"})
You don't need to create an aggregate field. You can query your collection with the $or operator:
db.users.find( { $or: [ { email: "email@somethingc.com" }, { phone: "123456" } ] } );
This finds all documents that match either the e-mail OR the phone.
To make the query fast, create a separate index on each field: each clause of an $or can use its own index, whereas a compound index on both fields would only support the clause on its leading field.
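Since the question asks for a lookup by a single value that may be either a phone number or an e-mail address, a small helper can build the $or filter. This is a minimal sketch: contactQuery and singleFieldIndexes are names invented here, and the index list assumes one single-field index per clause, since each clause of an $or can use its own index.

```javascript
// Build a filter that matches a user whose email OR phone equals the given value.
function contactQuery(contact) {
  return { $or: [{ email: contact }, { phone: contact }] };
}

// Assumed index key patterns, one per $or clause.
const singleFieldIndexes = [{ email: 1 }, { phone: 1 }];

console.log(JSON.stringify(contactQuery("123456")));
// In the shell:
// singleFieldIndexes.forEach((keys) => db.users.createIndex(keys));
// db.users.find(contactQuery("123456"));
```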

MongoDB "filtered" index: is it possible?

Is it possible to index some documents of the collection "only if" one of the fields to be indexed has a particular value?
Let me explain with an example:
The collection "posts" has millions of documents, ALL defined as follows:
{
    "network": "network_1",
    "blogname": "blogname_1",
    "post_id": 1234,
    "post_slug": "abcdefg"
}
Let's assume that the posts are split equally between network_1 and network_2.
My application OFTEN selects the type of query based on the value of "network" (although sometimes I need the data from both networks).
For example:
www.test.it/network_1/blog_1/postid/1234/
-> db.posts.find({network: "network_1", blogname: "blog_1", post_id: 1234})
www.test.it/network_2/blog_4/slug/aaaa/
-> db.posts.find({network: "network_2", blogname: "blog_4", post_slug: "yyyy"})
I could create two separate indexes (network / blogname / post_id and network / blogname / post_slug), but that would waste a huge amount of RAM, since 50% of the data in each index will never be used.
Is there a way to create a "filtered" index?
Example:
(Note the WHERE parameter)
db.posts.ensureIndex({network: 1, blogname: 1, post_id: 1}, {where: {network: "network_1"}})
db.posts.ensureIndex({network: 1, blogname: 1, post_slug: 1}, {where: {network: "network_2"}})
Yes, this is possible in MongoDB 3.2+ with a partial index: the partialFilterExpression option sets the condition a document must satisfy to be included in the index.
Example:
db.users.createIndex({ "userId": 1, "project": 1 },
{ unique: true, partialFilterExpression:{
userId: { $exists: true, $gt : { $type : 10 } } } })
Please see Partial Index documentation
As of MongoDB v3.2, partial indexes are supported. Documentation: https://docs.mongodb.org/manual/core/index-partial/
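Applied to the posts collection from the question, the two "filtered" indexes could look roughly like the specs below. This is a sketch, not tested against a server: partialIndexes and isIndexed are illustrative names, and isIndexed only imitates equality filters to show which documents each index would contain.

```javascript
// Key patterns and options for two partial indexes, one per network.
const partialIndexes = [
  {
    keys: { network: 1, blogname: 1, post_id: 1 },
    options: { partialFilterExpression: { network: "network_1" } },
  },
  {
    keys: { network: 1, blogname: 1, post_slug: 1 },
    options: { partialFilterExpression: { network: "network_2" } },
  },
];

// Equality-only stand-in for the server-side filter check (illustration only).
function isIndexed(doc, index) {
  const filter = index.options.partialFilterExpression;
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

const post = { network: "network_1", blogname: "blogname_1", post_id: 1234, post_slug: "abcdefg" };
console.log(partialIndexes.map((index) => isIndexed(post, index))); // first true, second false

// In the shell (MongoDB 3.2+):
// partialIndexes.forEach((index) => db.posts.createIndex(index.keys, index.options));
```

A query can only use a partial index when its own conditions guarantee that matching documents are in the index, so the find() calls need to keep the network: "network_1" or network: "network_2" condition, as the queries in the question already do.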
It's possible, but it requires a workaround that creates redundancy in your documents, requires you to rewrite your find queries, and limits those queries to exact matches.
MongoDB supports sparse indexes, which only index the documents where the given field exists. You can use this feature to index only part of the collection by adding the field only to the documents you want indexed.
The bad news is that this trick only works with a single-field sparse index, because a compound sparse index includes a document as soon as any one of its fields exists. The good news is that the single field can contain an object with multiple fields, so you can still store all the data you want to search for in it.
To do this, add a new field to the documents you want indexed, holding an object with the fields you search for:
{
  "network": "network_1",
  "blogname": "blogname_1",
  "post_id": 1234,
  "post_slug": "abcdefg",
  "network_1_index_key": {
    "blogname": "blogname_1",
    "post_id": 1234
  }
}
Your ensureIndex command would index the field network_1_index_key:
db.posts.ensureIndex( { network_1_index_key: 1 }, { sparse: true } )
A find query that is supposed to use this index must now query for the exact object stored in network_1_index_key, with the same fields in the same order:
db.posts.find ({
network_1_index_key: {
blogname: "blogname_1",
post_id: 1234
}
})
Doing this likely only makes sense when the documents you want to index are a very small part of the collection. When it's about half, I would just create a regular index and live with it, because the larger document size could offset the gains from the reduced index size.
You can try creating an index on all fields (network / blogname / post_id / post_slug).

mongodb mapreduce function does not provide skip functionality, is there any solution to this?

MongoDB's mapReduce function does not provide a way to skip records the way find does. It has query, sort and limit options, but I want to skip some records and cannot find a way to do it. Please suggest a solution.
Thanks in advance.
Ideally a well-structured map-reduce query would allow you to skip particular documents in your collection.
Alternatively, as Sergio points out, you can simply not emit particular documents in map(). Using scope to define a global counter variable is one way to restrict emit to a specified range of documents. As an example, to skip the first 20 docs when sorted by ObjectID (and thus by insertion time):
db.collection_name.mapReduce(map, reduce, {out: "example_output", sort: {_id: -1}, scope: {counter: 0}});
Map function:
function () {
  counter++;
  if (counter > 20) {
    // key and value stand for whatever you want to emit
    emit(key, value);
  }
}
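The counter idea can be tried outside the database with a small simulation in plain JavaScript. Everything here is a stand-in: runMapWithSkip is a made-up helper, emit just collects pairs into an array, and the map function receives the document as an argument, whereas in a real mapReduce call the document is available as this.

```javascript
// Simulate a map phase that skips the first `skip` documents using a shared counter.
function runMapWithSkip(docs, skip) {
  const emitted = [];
  const scope = { counter: 0 }; // plays the role of the mapReduce `scope` option
  const emit = (key, value) => emitted.push({ key, value });

  const map = function (doc) {
    scope.counter++;
    if (scope.counter > skip) {
      emit(doc._id, doc.value);
    }
  };

  docs.forEach(map);
  return emitted;
}

// 25 sample documents with _id 1..25.
const docs = Array.from({ length: 25 }, (_, i) => ({ _id: i + 1, value: i * 10 }));
const result = runMapWithSkip(docs, 20);

console.log(result.length); // 5
console.log(result[0]);     // key 21, value 200
```

The result depends on the order in which documents reach map, which is why the shell example passes a sort option.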
I'm not sure since which version this feature has been available, but certainly in MongoDB 2.6 the mapReduce() function provides a query parameter:
query : document
Optional. Specifies the selection criteria using query operators for determining the documents input to the map function.
Example
Consider the following map-reduce operations on a collection orders that contains documents of the following prototype:
{
_id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 25,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions.
db.orders.mapReduce( mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example" },
query: { ord_date:
{ $gt: new Date('01/01/2012') }
},
finalize: finalizeFunction2
}
)
This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). It then outputs the results to the collection map_reduce_example. If the map_reduce_example collection already exists, the operation merges the existing contents with the results of this map-reduce operation.

How to Create a nested index in MongoDB?

A. How do I index nested and all of its values?
B. How do I index valuetwo?
{
id: 00000,
attrs: {
nested:{
value: value1,
valuetwo: value2,
}
}
}
I've looked here: http://www.mongodb.org/display/DOCS/Indexes, and the docs to my knowledge, aren't clear about indexing things that aren't nested.
You'd create them just as if you were creating an index on a top level field:
db.collection.createIndex({"attrs.nested.value": 1})
You do need to explicitly create indexes on each field.
MongoDB automatically creates a multikey index if any indexed field is an array; you do not need to explicitly specify the multikey type.
This will work for both scenarios:
db.coll.createIndex( { "addr.pin": 1 } )
Scenario 1: nested objects
{
userid: "1234",
addr: {
pin:"455522"
}
},
{
userid: "1234",
addr: {
pin:"777777"
}
}
Scenario 2: nested arrays
{
userid: "1234",
addr: [
{ pin:"455522" },
{ pin:"777777" },
]
}
https://docs.mongodb.com/manual/core/index-multikey/
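To see why the same "addr.pin" key pattern covers both scenarios, the sketch below resolves a dotted path against a document and returns every value found, descending into arrays along the way. The helper valuesAtPath is hypothetical and only approximates how index keys are extracted; with an array in the path, one document yields several keys, which is what makes the index multikey.

```javascript
// Resolve a dotted path, descending into arrays, and return all values found.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    current = current
      .flatMap((v) => (Array.isArray(v) ? v : [v]))          // unwrap arrays
      .filter((v) => v !== null && typeof v === "object")    // only objects have fields
      .map((v) => v[part])
      .filter((v) => v !== undefined);
  }
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

const nestedObject = { userid: "1234", addr: { pin: "455522" } };
const nestedArray = { userid: "1234", addr: [{ pin: "455522" }, { pin: "777777" }] };

console.log(valuesAtPath(nestedObject, "addr.pin")); // one value: "455522"
console.log(valuesAtPath(nestedArray, "addr.pin"));  // two values: "455522" and "777777"
```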
A. To index all the properties in "nested" you will have to index them separately:
db.collection.createIndex({"attrs.nested.value": 1});
db.collection.createIndex({"attrs.nested.valuetwo": 1});
This can be done in one command with:
db.collection.createIndexes([{"attrs.nested.value": 1}, {"attrs.nested.valuetwo": 1}]);
B. To index just "valuetwo":
db.collection.createIndex({"attrs.nested.valuetwo": 1})
Use createIndex rather than ensureIndex, as ensureIndex has been deprecated since version 3.0.0.
A newer feature addresses this; it was added after the answers above were written.
MongoDB has wildcard indexes (available since version 4.2) that let you create an index for a field that contains varying keys.
From the documentation:
If the field is a nested document or array, the wildcard index recurses into the document/array and indexes all scalar fields in the document/array.
So the index in the question would be constructed like this:
db.collection.createIndex({ "attrs.nested.$**": 1 })
More details are in the wildcard index documentation.
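To get a feel for what such an index would cover, this sketch lists the dotted paths of all scalar fields under a starting path. It is only an approximation for plain JSON-like values: scalarPaths is an invented helper, array positions are left out of the paths, and the sample document replaces the question's placeholder values with strings.

```javascript
// List dotted paths of all scalar fields below `prefix`, recursing into documents and arrays.
function scalarPaths(value, prefix, out = []) {
  if (Array.isArray(value)) {
    value.forEach((item) => scalarPaths(item, prefix, out));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      scalarPaths(child, `${prefix}.${key}`, out);
    }
  } else if (!out.includes(prefix)) {
    out.push(prefix);
  }
  return out;
}

const doc = { id: 0, attrs: { nested: { value: "value1", valuetwo: "value2" } } };

console.log(scalarPaths(doc.attrs.nested, "attrs.nested"));
// lists attrs.nested.value and attrs.nested.valuetwo
```

Any key added under attrs.nested later would show up in this list without a new index definition, which is the point of the wildcard approach.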
The previous answers nearly gave me a heart attack.