MongoDB Index boundary constraints

While working hands-on with MongoDB I came across a problem with its indexes: MongoDB sometimes doesn't enforce both ends of a range boundary in a query. Here's one of the outputs I encountered while querying the database:
Query:
db.user.find({transaction:{$elemMatch:{product:"mobile", firstTransaction:{$gte:ISODate("2015-01-01"), $lt:ISODate("2015-01-02")}}}}).hint("transaction.product_1_transaction.firstTransaction_1").explain()
Output:
"cursor" : "BtreeCursor transaction.firstTransaction_1_transaction.product_1",
"isMultiKey" : true,
"n" : 622,
"nscannedObjects" : 350931,
"nscanned" : 6188185,
"nscannedObjectsAllPlans" : 350931,
"nscannedAllPlans" : 6188185,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 235851,
"nChunkSkips" : 0,
"millis" : 407579,
"indexBounds" : {
"transaction.firstTransaction" : [
[
true,
ISODate("2015-01-02T00:00:00Z")
]
],
"transaction.product" : [
[
"mobile",
"mobile"
]
]
},
As you can see in the example above, one end of the bound for the firstTransaction field is true instead of the date I specified. I found that the min() and max() functions are a workaround for this, but they don't seem to work with embedded documents (transaction is an array of subdocuments containing fields like firstTransaction, product, etc.). I get the following error:
Query:
db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).min({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-01")}}}).max({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-02")}}})
Output:
planner returned error: unable to find relevant index for max/min query
The firstTransaction field is indexed, though, as is product, and there is a compound index on both. I don't know what is going wrong here.
Sample document:
{
_id: UUID (indexed by default),
name: string,
dob: ISODate,
addr: string,
createdAt: ISODate (indexed),
.
.
.,
transaction:[
{
firstTransaction: ISODate(indexed),
lastTransaction: ISODate(indexed),
amount: float,
product: string (indexed),
.
.
.
},...
],
other subdocuments...
}

This is the correct behavior. You cannot always intersect the index bounds for $lte and $gte - sometimes doing so would give incorrect results. For example, consider the document
{ "x" : [{ "a" : [4, 6] }] }
This document matches the query
db.test.find({ "x" : { "$elemMatch" : { "a" : { "$gte" : 5, "$lte" : 5 } } } });
If we define an index on { "x.a" : 1 }, the two index bounds would be [5, infinity] and [-infinity, 5]. Intersecting them would give [5, 5], and using that index bound would fail to match the document - incorrectly!
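To make that concrete, here is a minimal shell session (a sketch, using a scratch collection named test):
// Reproduce the bounds-intersection hazard on a scratch collection.
db.test.drop()
db.test.insert({ "x" : [{ "a" : [4, 6] }] })
db.test.ensureIndex({ "x.a" : 1 })
// This matches: inside the single element of "x", a:6 satisfies $gte:5 and
// a:4 satisfies $lte:5, even though no value of "a" equals 5.
db.test.find({ "x" : { "$elemMatch" : { "a" : { "$gte" : 5, "$lte" : 5 } } } })
// An intersected bound of [5, 5] would scan no matching keys and miss this
// document, which is why MongoDB keeps one end of the range open instead.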
Can you provide a sample document and tell us more about what you're trying to do with the query? With context, there may be another way to write the query that uses tighter index bounds.
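As an aside, the min()/max() error in the question happens because those methods take raw index-key patterns, not query expressions like $elemMatch, and they generally need an explicit hint() on the matching index. A sketch (note that combining min()/max() with a multikey index can still produce surprising results):
db.user.find({transaction: {$elemMatch: {product: "mobile"}}})
    .min({"transaction.product": "mobile", "transaction.firstTransaction": ISODate("2015-01-01")})
    .max({"transaction.product": "mobile", "transaction.firstTransaction": ISODate("2015-01-02")})
    .hint("transaction.product_1_transaction.firstTransaction_1")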

Related

MongoDB does not use indexes on some fields when doing distinct

I have noticed that MongoDB sometimes won't use an index when querying for the distinct values of a field. It will use one on some fields, but won't on others.
Here's the example:
db.product.createIndex({"_indexed.preventieve_mondzorg-max_bedrag_p_jr": 1});
db.runCommand({distinct: "product", key:"_indexed.preventieve_mondzorg-max_bedrag_p_jr"});
This query does not use the index built on that field and goes for a full collection scan. Here's what it produces:
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 50,
"numIndexesAfter" : 50,
"note" : "all indexes already exist",
"ok" : 1
}
{
"values" : [
"€ 250,- | 75%",
"Geen dekking",
"...",
],
"stats" : {
"n" : 33660,
"nscanned" : 0,
"nscannedObjects" : 33660,
"timems" : 12531,
"planSummary" : "COLLSCAN"
},
"ok" : 1
}
On the other hand
db.product.createIndex({"free_choice.value": 1});
db.runCommand({distinct: "product", key:"free_choice.value"});
will use the index:
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 50,
"numIndexesAfter" : 50,
"note" : "all indexes already exist",
"ok" : 1
}
{
"values" : [
"gedeeltelijk",
"geen",
"ja"
],
"stats" : {
"n" : 4,
"nscanned" : 4,
"nscannedObjects" : 4,
"timems" : 2,
"planSummary" : "DISTINCT { free_choice.value: 1.0 }"
},
"ok" : 1
}
So... what could be the difference between those two fields?
Is it a bug, or am I doing something wrong?
I am running MongoDB 3.0.6 in a Vagrant box with Ubuntu 14.04.3 LTS
Apparently this is a bug in MongoDB core, or at least a deliberate limitation: MongoDB will not use a multikey index on dotted fields for distinct queries.
Here's the Mongo's response:
The distinct optimization uses a special index access stage which returns the distinct index keys to its parent stage. In the multikey dotted case, however, the distinct stage would have to check for null or undefined keys. In the case of null or undefined, it must fetch the full document in order to disambiguate between literal nulls versus null by virtue of missing fields. We have decided to hold off unless we see that users really need this.
If you really want this feature, please vote for it here: SERVER-13298
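If you just need the distinct values today, one possible workaround (a sketch only; whether the planner can use the index for the $group depends on version and data, so check the explain output) is to emulate distinct with aggregation:
// Sketch: emulate distinct via $group; the field name is taken from the question.
db.product.aggregate([
    { $group: { _id: "$_indexed.preventieve_mondzorg-max_bedrag_p_jr" } }
])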

Incorrect items count with specific index usage

I'm using MongoDB version 2.4.8 on Windows Server 2008 R2 and I see strange index behaviour which I can't explain. Here's an example of the structure I have in my collection:
{
"_id" : NUUID("67070100-4627-4aa5-8ab9-45624e5b82ad"),
"PropertyType" : "Cooperative",
"Address" : {
"Street" : "aaaaaaaaa",
"HouseNo" : "165",
"PostalCode" : 2860,
"City" : "bbbbb",
"Floor" : "1",
"DoorNumber" : ""
},
"Sales" : {
"Price" : 425000,
"Payout" : 0,
"AreaPrice" : 9042,
"GrossPrice" : 2340,
"NetPrice" : 800,
},
"WithdrawnFromSale" : true,
"UnitData" : {
"UnitType" : "aaaaa",
"Area" : 400,
"LivingArea" : 50,
"UnitArea" : 50,
"Rooms" : 2,
"BuildYear" : 1948,
"GroundArea" : 203,
"NoiseLevel" : 5
}
}
Also, I've created index for that collection:
db["UnitModel"].ensureIndex({ "Sales": 1, "PropertyType": 1, "UnitData.Rooms": 1, "UnitData.NoiseLevel": 1 })
The problem is that I get a wrong count of items when this index is used.
When I issue this request:
db.UnitModel.find({Sales: {$ne: null}, WithdrawnFromSale: false}).explain({verbose: true})
I get following results:
{
"cursor" : "BtreeCursor Sales_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1 multi",
"isMultiKey" : false,
"n" : 19368,
"nscannedObjects" : 42875,
"nscanned" : 42876,
"nscannedObjectsAllPlans" : 43274,
"nscannedAllPlans" : 43276,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
Here we can see that the index has been used, but the number of items returned is "n" : 19368, which is wrong.
It should be 70986 items in the collection matching that criteria.
Why am I sure that it should be more records? Well, here's the code:
var totalCount = 0;
db.UnitModel.find({WithdrawnFromSale: false}).forEach(
function (e) {
// count only documents whose Sales field exists and is not null
if(e.hasOwnProperty('Sales') && e.Sales != null)
totalCount++;
}
)
totalCount;
totalCount = 70986
To be sure the query above does not use any indexes, let's check:
db.UnitModel.find({WithdrawnFromSale: false}).explain({verbose: true})
And result:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 70986,
"nscannedObjects" : 3204212,
"nscanned" : 3204212,
"nscannedObjectsAllPlans" : 3204212,
"nscannedAllPlans" : 3204212,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
So, for the UnitModel collection I'm using, with the criteria Sales: {$ne: null}, WithdrawnFromSale: false, mongo should return 70986 records. But as you can see, I get the wrong number.
Can someone explain why? What could be the reason?
BTW, when I drop that index and use the following index instead:
db["UnitModel"].ensureIndex({ "WithdrawnFromSale": 1})
it works as expected. But I do not need that index; it's not optimal for my case.
As of MongoDB 2.4, the maximum size of an indexed value is 1024 bytes. The current behaviour for a key too large to index is to log a warning on the server side -- but no exception is thrown. Documents with excessively long keys are silently omitted from that index while still being included in other indexes. This can lead to inconsistencies in results, such as incorrect counts and "missing" documents that cannot be found via one index but are reachable through another index or a $natural scan.
In the MongoDB 2.5 development/unstable branch (which will culminate in the MongoDB 2.6 production release later this year) this behaviour has changed: as of MongoDB 2.5.5, an exception is raised if an insert/update includes an index update where the keys would be too large. See SERVER-5290 in the MongoDB issue tracker for more details.
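A quick way to check whether this is affecting a collection (a sketch) is to compare the indexed result count against a forced collection scan using a $natural hint:
// Sketch: if these two counts differ, documents are missing from the index.
db.UnitModel.find({Sales: {$ne: null}, WithdrawnFromSale: false}).itcount()                       // may use the index
db.UnitModel.find({Sales: {$ne: null}, WithdrawnFromSale: false}).hint({$natural: 1}).itcount()   // forced full scan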
Figured out the reason for the issue. When I looked into the MongoDB log files, I saw tons of the following messages:
HBReadModel.system.indexes Btree::insert: key too large to index, skipping HBReadModel.UnitModel.$Sales_1_WithdrawnFromSale_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1
I was trying to create an index on the Sales field, which is actually a subdocument, not a scalar field, so the whole subdocument was used as the index key. To avoid this I re-created the index specifying a field inside the Sales document. The log is clean now and the query returns the expected records.
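A sketch of the corrected index; the inner field Sales.Price is just an illustration, the point being that indexing a scalar inside the subdocument keeps each key under the 1024-byte limit:
// Sketch: index a scalar inside the subdocument rather than the whole
// Sales subdocument, whose serialized value could exceed 1024 bytes.
db["UnitModel"].ensureIndex({ "Sales.Price": 1, "PropertyType": 1, "UnitData.Rooms": 1, "UnitData.NoiseLevel": 1 })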

Mongod - IndexOnly is false even though query and projection are both covered in the index

The MongoDB documentation for covered queries talks about queries and projections, and says to simply turn off the _id field in the projection if you want a covered query. What if you need the _id field and still want the efficiency of a covered query (indexOnly: true)?
db.collection.ensureIndex({field1:1,_id:1})
db.collection.getIndexKeys()
[{
"_id" : 1
},
{
"field1" : 1
},
{
"field1" : 1,
"_id" : 1
}]
db.collection.find({field1:{$regex:/^\s/}},{field1:1,_id:1}).explain()
{
"cursor" : "BtreeCursor fieldname",
"isMultiKey" : false,
"n" : 3582,
"nscannedObjects" : 3582,
"nscanned" : 130511408,
"nscannedObjectsAllPlans" : 3582,
"nscannedAllPlans" : 130511408,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 20,
"nChunkSkips" : 0,
"millis" : 158705,
"indexBounds" : {
"cdr3_aa" : [
[
"",
{
}
]
]
},
"server" : localhost}
Of course, if I turn off _id in the projection, indexOnly returns true and the query is lightning fast. What am I doing wrong?
EDIT - I made it more efficient by getting rid of the case insensitivity on the space and adding a ^ anchor to speed up the query, but indexOnly is still false. I don't understand why it's not true.
From the documentation:
$regex can only use an index efficiently when the regular expression has an anchor for the beginning (i.e. ^) of a string and is a case-sensitive match. Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/ and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
In your case you use a regex with the i flag, which means a case-insensitive match. So you should remove the i from the regex and search from the beginning of the field.
BTW, I don't understand your search criteria: looking for a single whitespace character \s at the start of the field, case-insensitively?
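To see what that means for the planner (a sketch; the bounds shown are abbreviated from typical 2.x explain output), compare the index bounds of an anchored case-sensitive prefix with those of a case-insensitive match:
// Anchored, case-sensitive: /^abc/ becomes a tight prefix range.
db.collection.find({ field1: /^abc/ }).explain().indexBounds
// -> "field1" : [ [ "abc", "abd" ], [ /^abc/, /^abc/ ] ]
// Case-insensitive: no usable prefix, so the bounds span the entire index.
db.collection.find({ field1: /abc/i }).explain().indexBounds
// -> "field1" : [ [ "", { } ], [ /abc/i, /abc/i ] ]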
Not quite sure why, but it seems as though your query is using the { field1: 1 } index instead of { field1: 1, _id: 1 } index. Can you try running the query with the hint?
db.collection.find( { field1: {$regex:/^\s/} }, { field1: 1, _id: 1 } ).hint( { field1: 1, _id: 1 } ).explain()
It could be that the query optimizer has selected the { field1: 1 } index initially and has not re-evaluated the various plans. See http://docs.mongodb.org/manual/core/query-plans/ for explanation of the query optimizer and how it selects a plan.

Slow range query on a multikey index

I have a MongoDB collection named post with 35 million objects. The collection has two secondary indexes defined as follows.
> db.post.getIndexKeys()
[
{
"_id" : 1
},
{
"namespace" : 1,
"domain" : 1,
"post_id" : 1
},
{
"namespace" : 1,
"post_time" : 1,
"tags" : 1 // this is an array field
}
]
I expect the following query, which simply filters by namespace and post_time, to run in a reasonable time without scanning all objects.
>db.post.find({post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}, namespace: "my_namespace"}).count()
7408
However, it takes MongoDB at least ten minutes to retrieve the result and, curiously, it manages to scan 70 million objects to do the job according to the explain function.
> db.post.find({post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}, namespace: "my_namespace"}).explain()
{
"cursor" : "BtreeCursor namespace_1_post_time_1_tags_1",
"isMultiKey" : true,
"n" : 7408,
"nscannedObjects" : 69999186,
"nscanned" : 69999186,
"nscannedObjectsAllPlans" : 69999186,
"nscannedAllPlans" : 69999186,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 378967,
"nChunkSkips" : 0,
"millis" : 290048,
"indexBounds" : {
"namespace" : [
[
"my_namespace",
"my_namespace"
]
],
"post_time" : [
[
ISODate("2013-04-09T00:00:00Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
],
"tags" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "localhost:27017"
}
The difference between the number of objects and the number of scans must be caused by the lengths of the tags arrays (which are all equal to 2). Still, I don't understand why the post_time filter does not make use of the index.
Can you tell me what I might be missing?
(I am working on a decent machine with 24 cores and 96 GB RAM. I am using MongoDB 2.2.3.)
Found my answer in this question: Order of $lt and $gt in MongoDB range query
My index is a multikey index (on tags) and I am running a range query (on post_time). Apparently, MongoDB cannot use both sides of the range as a filter in this case, so it just picks the $gte clause, which comes first. As my lower limit happens to be the lowest post_time value, MongoDB starts scanning all the objects.
Unfortunately, this is not the whole story. Trying to solve the problem, I created non-multikey indexes too, but MongoDB insisted on using the bad one. That made me think the problem was elsewhere. Finally, I had to drop the multikey index and create one without the tags field. Everything is fine now.
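A sketch of that fix; once the index is no longer multikey, both ends of the post_time range become index bounds:
// Replace the multikey index with one that omits the array field.
db.post.dropIndex({ "namespace" : 1, "post_time" : 1, "tags" : 1 })
db.post.ensureIndex({ "namespace" : 1, "post_time" : 1 })
// explain() should now show both ISODates in indexBounds.post_time.
db.post.find({namespace: "my_namespace", post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}}).explain()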

mongodb index an array's key (not the values)

In MongoDB, I have the following document
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": {
"blabla": 20,
"blibli": 100,
"blublu": 250,
... (many more here)
}
}
And I would like to index it so I can query on the "keys" of the concepts subdocument (I know it's not really a MongoDB array...):
db.things.find({concepts: "blabla"});
Is it possible with the above schema? Or shall I refactor my documents to something like
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": ["blabla","blibli","blublu", ... (many more here)]
}
I'll answer your actual question: no, you cannot index on the field names given your current schema. $exists uses an index, but that is an existence check only.
There are a lot of problems with a schema like the one you're using, and I would suggest a refactor to:
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": [
{name:"blabla", value: 20},
{name:"blibli", value: 100},
{name:"blublu", value: 250},
... (many more here)
]
}
then index {"concepts.name": 1} and you can actually query on the concept names rather than just check for their existence.
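A short usage sketch of that refactor (the collection name things is taken from the question):
// Sketch: with the array-of-subdocuments schema, name becomes queryable.
db.things.ensureIndex({ "concepts.name": 1 })
db.things.find({ "concepts.name": "blabla" })                                           // query by concept name
db.things.find({ concepts: { $elemMatch: { name: "blabla", value: { $gte: 100 } } } })  // name and value together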
TL;DR : No you can't.
You can query field presence with specific query:
db.your_collection.find({"concepts.yourfield": { $exists: true }})
(notice the $exists)
It will return all documents where yourfield is a field of the concepts subdocument.
Edit:
this solution only addresses querying; indexes contain values, not field names.
MongoDB indexes each value of an array, so you can query for individual items, as you can find in the documentation.
But for nested fields you need to tell MongoDB explicitly which sub-fields to index:
db.col1.ensureIndex({'concepts.blabla':1})
db.col1.ensureIndex({'concepts.blublu':1})
db.col1.find({'concepts.blabla': 20}).explain()
{
"cursor" : "BtreeCursor concepts.blabla_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"concepts.blabla" : [
[
20,
20
]
]
}
}
After creating the index, the cursor type changes from BasicCursor to BtreeCursor.
If you create your documents as you stated at the end of your question
{
"_id": { "$oid" : "4FFD813FE4B0931BDAAB4F01" },
"concepts": ["blabla","blibli","blublu", ... (many more here)]
}
then indexing the array itself will be enough, as below:
db.col1.ensureIndex({'concepts':1})
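With that multikey index in place, a plain membership query is indexed (a sketch of what to expect):
// Sketch: each array element becomes an index key, so this query uses the index.
db.col1.find({ concepts: "blabla" }).explain()
// -> "cursor" : "BtreeCursor concepts_1" (instead of BasicCursor)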