Using Mongo compound indexes - mongodb

I'm trying to create a compound index for a popular query.
There are two fields. A 'client' field of type String, containing an IP address. The other is a 'sendOn' field which is of type Date. I'm looking for documents where the client is null and the sendOn is between a certain range.
To support ascending sorting on the sendOn field, I've established that I need a sendOn index with a value of 1.
So, I've run
> db.queries.ensureIndex({"client":1,"sendOn":1})
> db.queries.find({ $query: { client: { $exists: false } , sendOn: { $gt: new Date(1387664033883), $lt: new Date(1387750493883) } } }).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 2546133,
"nscannedObjects" : 2546133,
"n" : 0,
"millis" : 25071,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
The documentation says that $exists queries are generally inefficient. I've considered forcing documents to contain at least an empty client field. But that query doesn't use the index either?

It should use the index (provided you replace the $exists by a null value, as you mentioned - the $exists operator is not optimized for indexes).
You may be messing things up here at some point : documentation states you shouldn't use .explain() with a $query format :
http://docs.mongodb.org/manual/reference/operator/meta/query/
Try this :
db.queries.find({ $query: { client: null , sendOn: { $gt: new Date(1387664033883), $lt: new Date(1387750493883) } }, $explain: true });

Related

MongoDB Index boundry constraints

During my hands on with MongoDB I came to understand about a problem with MongoDB indexes. Problem is that MongoDB indexes sometimes doesn't enforce the two-end boundaries to query. Here's one of the output I encountered while querying the database:
Query:
db.user.find({transaction:{$elemMatch:{product:"mobile", firstTransaction:{$gte:ISODate("2015-01-01"), $lt:ISODate("2015-01-02")}}}}).hint("transaction.product_1_transaction.firstTransaction_1").explain()
Output:
"cursor" : "BtreeCursor transaction.firstTransaction_1_transaction.product_1",
"isMultiKey" : true,
"n" : 622,
"nscannedObjects" : 350931,
"nscanned" : 6188185,
"nscannedObjectsAllPlans" : 350931,
"nscannedAllPlans" : 6188185,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 235851,
"nChunkSkips" : 0,
"millis" : 407579,
"indexBounds" : {
"transaction.firstTransaction" : [
[
true,
ISODate("2015-01-02T00:00:00Z")
]
],
"transaction.product" : [
[
"mobile",
"mobile"
]
]
},
As you can see in above example for firstTransaction field one end of the bound is true instead of date I mentioned. I found the workaround for this is min(), max() functions. I tried those but they not seem to be working with embedded document (transaction is an array of sub document which contains fields like firstTransaction, product etc). I get following error:
Query:
db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).min({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-01")}}}).max({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-02")}}})
Output:
planner returned error: unable to find relevant index for max/min query
firstTransaction field is indexed though as well as product & their compound index too. I don't know what is going wrong here.
Sample document:
{
_id: UUID (indexed by default),
name: string,
dob: ISODate,
addr: string,
createdAt: ISODate (indexed),
.
.
.,
transaction:[
{
firstTransaction: ISODate(indexed),
lastTransaction: ISODate(indexed),
amount: float,
product: string (indexed),
.
.
.
},...
],
other sub documents...
}
This is the correct behavior. You cannot always intersect the index bounds for $lte and $gte - sometimes it would give incorrect results. For example, consider the document
{ "x" : [{ "a" : [4, 6] }] }
This document matches the query
db.test.find({ "x" : { "$elemMatch" : { "a" : { "$gte" : 5, "$lte" : 5 } } } });
If we define an index on { "x.a" : 1 }, the two index bounds would be [5, infinity], and [-infinity, 5]. Intersecting them would give [5, 5] and using this index bound would not match the document - incorrectly!
Can you provide a sample document and tell us more about what you're trying to do with the query? With context, there may be another way to write the query that uses tighter index bounds.

Mongo Aggregate not using Index

My mongo find query is using an index, but the same functionality if I am implementing using aggregate, it is not using the Index.
db.collection1.find({Attribute8: "s1000",Attribute9: "s1000"}).sort({Attribute10: 1})
"cursor used in find" : "BtreeCursor Attribute8_1_Attribute9_1_Attribute10_1"
db.collection1.aggregate([
{
$match: {
Attribute8: "s1000",
Attribute9: "s1000"
}
},
{
$sort: {
Attribute10: 1
}
}
])
"cursor used in aggregate" : "BtreeCursor ".
Can someone tell me where it went wrong. My goal is to use Indexes in aggregate method.
Thanks in advance.
After some digging the issue is the limitation of usage of the following types:
Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope
In this case Symbol is used for containing a string value, so index wont work.
Please try with a Number en set explain to true in the aggregate option.
[EDIT]
My previous answer is incorrect.
The aggregation pipeline is using a 'BtreeCursor' (only when the defined field has an index) to run the $match query and does uses the ensured index, check "indexBound" for verification.
Ensuring the whole collection to have an index on "Attribute08"
db.temps.ensureIndex({Attribute08:1})
$match on a field with an index:
db.temps.aggregate([{$match:{Attribute08:"s1000"}}],{explain:true})
"allPlans" : [
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"scanAndOrder" : false,
"indexBounds" : {
"Attribute08" : [
[
"s1000",
"s1000"
]
]
}
}
]
Below the $match on a field without index:
db.temps.aggregate([{$match:{Attribute09:"s1000"}}],{explain:true})
"allPlans" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"scanAndOrder" : false
}
]

Incorrect items count with specific index usage

I'm using MongoDB, version 2.4.8 on windows server 2008 R2 and I have strange index behaviour which I can't explain. Here example of structure that I have in my collection:
{
"_id" : NUUID("67070100-4627-4aa5-8ab9-45624e5b82ad"),
"PropertyType" : "Cooperative",
"Address" : {
"Street" : "aaaaaaaaa",
"HouseNo" : "165",
"PostalCode" : 2860,
"City" : "bbbbb",
"Floor" : "1",
"DoorNumber" : ""
},
"Sales" : {
"Price" : 425000,
"Payout" : 0,
"AreaPrice" : 9042,
"GrossPrice" : 2340,
"NetPrice" : 800,
},
"WithdrawnFromSale" : true,
"UnitData" : {
"UnitType" : "aaaaa",
"Area" : 400,
"LivingArea" : 50,
"UnitArea" : 50,
"Rooms" : 2,
"BuildYear" : 1948,
"GroundArea" : 203,
"NoiseLevel" : 5
}
}
Also, I've created index for that collection:
db["UnitModel"].ensureIndex({ "Sales": 1, "PropertyType": 1, "UnitData.Rooms": 1, "UnitData.NoiseLevel": 1 })
The problem with that index is that I get wrong count of items when using this index.
When I issue this request:
db.UnitModel.find({Sales: {$ne: null}, WithdrawnFromSale: false}).explain({verbose: true})
I get following results:
{
"cursor" : "BtreeCursor Sales_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1 multi",
"isMultiKey" : false,
"n" : 19368,
"nscannedObjects" : 42875,
"nscanned" : 42876,
"nscannedObjectsAllPlans" : 43274,
"nscannedAllPlans" : 43276,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
Here we can see that index has been used, but the number of items returned is "n" : 19368. which is wrong.
It should be 70986 items in collection with that criteria.
Why am I sure that it should be more records? Well, here the code:
var totalCount = 0;
db.UnitModel.find({WithdrawnFromSale: false}).forEach(
function (e) {
if(e.hasOwnProperty('Sales') && e.Sales != null)
totalCount++;
}
)
totalCount;
totalCount = 70986
To be sure that query above do not use any indexes let's check it out:
db.UnitModel.find({WithdrawnFromSale: false}).explain({verbose: true})
And result:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 70986,
"nscannedObjects" : 3204212,
"nscanned" : 3204212,
"nscannedObjectsAllPlans" : 3204212,
"nscannedAllPlans" : 3204212,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
So, for UnitModel collection I'm using, for criteria: Sales: {$ne: null}, WithdrawnFromSale: false it should be 70986 records returned by mongo. But as you can see I get it wrong.
Can someone explain me why? What can be the reason?
BTW. When I drop that index and use following index:
db["UnitModel"].ensureIndex({ "WithdrawnFromSale": 1})
it works as expected. But I do not need that index, it's not optimzal for my case.
As at MongoDB 2.4, the maximum size of an indexed value is 1024 bytes. The current behaviour for a key too large to index is to log a warning on the server side -- but this does not throw an exception. In this case, documents with excessively long keys will not be included in the index when the key is too long, but will be included in other indexes. This can lead to inconsistencies in results such as incorrect counts and "missing documents" that cannot be found by one index but may be available in another index or with a $natural search.
In the MongoDB 2.5 development/unstable branch (which will culminate in the MongoDB 2.6 production release later this year) this behaviour has changed. As at MongoDB 2.5.5, an exception will now be raised if a insert/update includes an index update where the keys would be too large. See SERVER-5290 in the MongoDB issue tracker for more details.
Figure out what the reason of the issue. When I look in log files for monogodb I have seen tons of following messages:
HBReadModel.system.indexes Btree::insert: key too large to index, skipping HBReadModel.UnitModel.$Sales_1_WithdrawnFromSale_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1
I was trying to create index on sales field which in actually document and not field. To avoid this I just re-created index and specify field inside Sales document. Log is clear, query returns records as expected.

Mongod - IndexOnly is false even though query and projection are both covered in the index

The mongo documentation for covered querieshere talks about the queries and projections and to simply turn off the _id field in the projection if you want a covered query. What if you need the _id field though and still want the efficiency of a covered query (indexOnly = True)?
db.collection.ensureIndex({field1:1,_id:1})
db.collection.getIndexKeys()
[{
"_id" : 1
},
{
"field1" : 1
},
{
"field1" : 1,
"_id" : 1
}]
db.collection.find({field1:{$regex:/^\s/}},{field1:1,_id:1}).explain()
{
"cursor" : "BtreeCursor fieldname",
"isMultiKey" : false,
"n" : 3582,
"nscannedObjects" : 3582,
"nscanned" : 130511408,
"nscannedObjectsAllPlans" : 3582,
"nscannedAllPlans" : 130511408,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 20,
"nChunkSkips" : 0,
"millis" : 158705,
"indexBounds" : {
"cdr3_aa" : [
[
"",
{
}
]
]
},
"server" : localhost}
Of course if I turn off _id on the projection, IndexOnly returns true and the query is lightning fast. What am I doing wrong?
EDIT - I made it more efficient by getting rid of case insensitivity on a space, adding a ^ to speed up the query, but IndexOnly : False. I don't understand why its not true.
From documentation:
$regex can only use an index efficiently when the regular expression
has an anchor for the beginning (i.e. ^) of a string and is a
case-sensitive match. Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent
strings, they have different performance characteristics. All of these expressions use an
index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can
stop scanning after matching the prefix.
In your case you use regex with i which means case-insensitive match. So, you should remove i from regex and start to search from the beginning of field.
BTW, I don't undestand your search criteria: Looking for one space char \s in the field with case-insensitive?
Not quite sure why, but it seems as though your query is using the { field1: 1 } index instead of { field1: 1, _id: 1 } index. Can you try running the query with the hint?
db.collection.find( { field1: {$regex:/^\s/} }, { field1: 1, _id: 1 } ).hint( { field1: 1, _id: 1 } ).explain()
It could be that the query optimizer has selected the { field1: 1 } index initially and has not re-evaluated the various plans. See http://docs.mongodb.org/manual/core/query-plans/ for explanation of the query optimizer and how it selects a plan.

mongodb index an array's key (not the values)

In MongoDB, I have the following document
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": {
"blabla": 20,
"blibli": 100,
"blublu": 250,
... (many more here)
}
}
And I would like to index it to be able to query for the "key" of the "concept" array (I know it's not really a mongoDB array...):
db.things.find({concepts:blabla});
Is it possible with the above schema? Or shall I refactor my documents to something like
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": ["blabla","blibli","blublu", ... (many more here)]
}
}
I'll answer your actual question. No you cannot index on the field names given your current schema. $exists uses an index but that is an existence check only.
There are a lot of problems with a schema like the one you're using and I would suggest a refactor to :
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": [
{name:"blabla", value: 20},
{name:"blibli", value: 100},
{name:"blublu", value: 250},
... (many more here)
]
}
then index {'concepts.name:1'} and you can actually query on the concept names rather than just check for the existence.
TL;DR : No you can't.
You can query field presence with specific query:
db.your_collection.find({"concept.yourfield": { $exists: true }})
(notice the $exists)
It will return all your document where yourfield is a field of concept subdocument
edit:
this solution is only about query. Indexes contains values not field.
MongoDB indexes each value of the array so you can query for individual items.As you can find here.
But in nested arrays you need to tell to index mongodb to index your sub-fields.
db.col1.ensureIndex({'concepts.blabla':1})
db.col1.ensureIndex({'concepts.blublu':1})
db.col1.find({'concepts.blabla': 20}).explain()
{
"cursor" : "BtreeCursor concepts.blabla_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"concepts.blabla" : [
[
20,
20
]
]
}
}
After creating the index , the cursor type changes itself from BasicCursor to BtreeCursor.
if you create your document as you stated at the end of your question
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": ["blabla","blibli","blublu", ... (many more here)]
}
}
just the indexing will be enough as below:
db.col1.ensureIndex({'concepts':1})