MongoDB not uses indexes on some fields when doing distinct - mongodb

I have noticed that MongoDB won't use indexes when querying for a distinct value on a field. I will use it on some fields, but won't on others.
Here's the example:
db.product.createIndex({"_indexed.preventieve_mondzorg-max_bedrag_p_jr": 1});
db.runCommand({distinct: "product", key:"_indexed.preventieve_mondzorg-max_bedrag_p_jr"});
This query will not use an index that is built on that field and will go for the full collection scan. That's what it produces:
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 50,
"numIndexesAfter" : 50,
"note" : "all indexes already exist",
"ok" : 1
}
{
"values" : [
"€ 250,- | 75%",
"Geen dekking",
"...",
],
"stats" : {
"n" : 33660,
"nscanned" : 0,
"nscannedObjects" : 33660,
"timems" : 12531,
"planSummary" : "COLLSCAN"
},
"ok" : 1
}
On the other hand
db.product.createIndex({"free_choice.value": 1});
db.runCommand({distinct: "product", key:"free_choice.value"});
Will the index:
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 50,
"numIndexesAfter" : 50,
"note" : "all indexes already exist",
"ok" : 1
}
{
"values" : [
"gedeeltelijk",
"geen",
"ja"
],
"stats" : {
"n" : 4,
"nscanned" : 4,
"nscannedObjects" : 4,
"timems" : 2,
"planSummary" : "DISTINCT { free_choice.value: 1.0 }"
},
"ok" : 1
}
So... what could be the difference between those two fields?
Is it a bug, or I am doing something wrong?
I am running MongoDB 3.0.6 in a Vagrant box with Ubuntu 14.04.3 LTS

Apparently this is the bug of MongoDB core. Or misbehaviour. MongoDB would not use multikey index for dotted fields for distinct requests.
Here's the Mongo's response:
The distinct optimization uses a special index access stage which
returns the distinct index keys to its parent stage. In the multikey
dotted case, however, the distinct stage would have to check for null
or undefined keys. In the case of null or undefined, it must fetch the
full document in order to disambiguate between literal nulls versus
null by virtue of missing fields. We have decided to hold off unless
we see that users really need this.
If you really want this feature, please vote for it here: SERVER-13298

Related

What's causing this indexing error in mongo db?

I've created 3 indexes based on several json files of Yelp that I imported into my mongodb.
> db.review.createIndex({"text":"text"})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.business.createIndex({"categories":"text"})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.business.createIndex({"attributes":"text"})
{
"ok" : 0,
"errmsg" : "Index with pattern: { _fts: \"text\", _ftsx: 1 } already exists with different options",
"code" : 85
Basically I'm trying to create 3 indexes to make count function faster in my mongodb.
What does "errmsg" : "Index with pattern: { _fts: \"text\", _ftsx: 1 } already exists with different options" means?
Should I pick a differente thing as attribute, or should I drop it?
MongoDB (as of v3.4) only allows one text index per collection
In your business collection, you already built a text index on categories. So, the second text index, on attributes, will fail.

MongoDB Index boundry constraints

During my hands on with MongoDB I came to understand about a problem with MongoDB indexes. Problem is that MongoDB indexes sometimes doesn't enforce the two-end boundaries to query. Here's one of the output I encountered while querying the database:
Query:
db.user.find({transaction:{$elemMatch:{product:"mobile", firstTransaction:{$gte:ISODate("2015-01-01"), $lt:ISODate("2015-01-02")}}}}).hint("transaction.product_1_transaction.firstTransaction_1").explain()
Output:
"cursor" : "BtreeCursor transaction.firstTransaction_1_transaction.product_1",
"isMultiKey" : true,
"n" : 622,
"nscannedObjects" : 350931,
"nscanned" : 6188185,
"nscannedObjectsAllPlans" : 350931,
"nscannedAllPlans" : 6188185,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 235851,
"nChunkSkips" : 0,
"millis" : 407579,
"indexBounds" : {
"transaction.firstTransaction" : [
[
true,
ISODate("2015-01-02T00:00:00Z")
]
],
"transaction.product" : [
[
"mobile",
"mobile"
]
]
},
As you can see in above example for firstTransaction field one end of the bound is true instead of date I mentioned. I found the workaround for this is min(), max() functions. I tried those but they not seem to be working with embedded document (transaction is an array of sub document which contains fields like firstTransaction, product etc). I get following error:
Query:
db.user.find({transaction:{$elemMatch:{product:'mobile'}}}).min({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-01")}}}).max({transaction:{$elemMatch:{firstTransaction:ISODate("2015-01-02")}}})
Output:
planner returned error: unable to find relevant index for max/min query
firstTransaction field is indexed though as well as product & their compound index too. I don't know what is going wrong here.
Sample document:
{
_id: UUID (indexed by default),
name: string,
dob: ISODate,
addr: string,
createdAt: ISODate (indexed),
.
.
.,
transaction:[
{
firstTransaction: ISODate(indexed),
lastTransaction: ISODate(indexed),
amount: float,
product: string (indexed),
.
.
.
},...
],
other sub documents...
}
This is the correct behavior. You cannot always intersect the index bounds for $lte and $gte - sometimes it would give incorrect results. For example, consider the document
{ "x" : [{ "a" : [4, 6] }] }
This document matches the query
db.test.find({ "x" : { "$elemMatch" : { "a" : { "$gte" : 5, "$lte" : 5 } } } });
If we define an index on { "x.a" : 1 }, the two index bounds would be [5, infinity], and [-infinity, 5]. Intersecting them would give [5, 5] and using this index bound would not match the document - incorrectly!
Can you provide a sample document and tell us more about what you're trying to do with the query? With context, there may be another way to write the query that uses tighter index bounds.

MongoDB fulltext search not using index

We use mongoDB fulltext search to find products in our database.
Unfortunately it is incredible slow.
The collection contains 89.114.052 documents and I have the suspicion, that the full text index is not used.
Performing a search with explain(), nscannedObjects returns 133212.
Shouldn't this be 0 if an index is used?
My index:
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "textIndex",
"ns" : "search.products",
"weights" : {
"brand" : 1,
"desc" : 1,
"ean" : 1,
"name" : 3,
"shop_product_number" : 1
},
"default_language" : "german",
"background" : false,
"language_override" : "language",
"textIndexVersion" : 2
}
The complete test search:
> db.products.find({ $text: { $search: "playstation" } }).limit(100).explain()
{
"cursor" : "TextCursor",
"n" : 100,
"nscannedObjects" : 133212,
"nscanned" : 133212,
"nscannedObjectsAllPlans" : 133212,
"nscannedAllPlans" : 133212,
"scanAndOrder" : false,
"nYields" : 1041,
"nChunkSkips" : 0,
"millis" : 105,
"server" : "search2:27017",
"filterSet" : false
}
Please have a look at the question you asked:
".... The collection contains 89.114.052 documents and I have the suspicion, that the full text index is not used ...."
You are only "nScanned" for 133212 documents. Of course the index is used. If it was not then 89,114,052 documents ( because this is English locale and not German ) would have otherwise been reported in "nScanned" which means an index is not used.
Your query is slow. Well it seems your hardware is not up to the task of keeping 1333212 documents in memory or otherwise having the super fast disk to "page" effectively. But this is not a MongoDB problem but yours.
You have over 100,000 documents that match your query and even if you just want 100 then you need to accept this is how this works and MongoDB does not "give up" once you have matched 100 documents and yield control. The query pattern here finds all of the matches and then applies the "limit" to the cursor in order just to return the most recent.
Maybe some time in the future the "text" functionality might allow you do do things like you can do in the aggregate version of $geoNear and specify "minimum" and "maximum" values for a "score" in order to improve results. But right now it does not.
So either upgrade your hardware or use an external text search solution if your problem is the slow results on matching over 100,000 documents out of over 89,000,000 documents.

Incorrect items count with specific index usage

I'm using MongoDB, version 2.4.8 on windows server 2008 R2 and I have strange index behaviour which I can't explain. Here example of structure that I have in my collection:
{
"_id" : NUUID("67070100-4627-4aa5-8ab9-45624e5b82ad"),
"PropertyType" : "Cooperative",
"Address" : {
"Street" : "aaaaaaaaa",
"HouseNo" : "165",
"PostalCode" : 2860,
"City" : "bbbbb",
"Floor" : "1",
"DoorNumber" : ""
},
"Sales" : {
"Price" : 425000,
"Payout" : 0,
"AreaPrice" : 9042,
"GrossPrice" : 2340,
"NetPrice" : 800,
},
"WithdrawnFromSale" : true,
"UnitData" : {
"UnitType" : "aaaaa",
"Area" : 400,
"LivingArea" : 50,
"UnitArea" : 50,
"Rooms" : 2,
"BuildYear" : 1948,
"GroundArea" : 203,
"NoiseLevel" : 5
}
}
Also, I've created index for that collection:
db["UnitModel"].ensureIndex({ "Sales": 1, "PropertyType": 1, "UnitData.Rooms": 1, "UnitData.NoiseLevel": 1 })
The problem with that index is that I get wrong count of items when using this index.
When I issue this request:
db.UnitModel.find({Sales: {$ne: null}, WithdrawnFromSale: false}).explain({verbose: true})
I get following results:
{
"cursor" : "BtreeCursor Sales_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1 multi",
"isMultiKey" : false,
"n" : 19368,
"nscannedObjects" : 42875,
"nscanned" : 42876,
"nscannedObjectsAllPlans" : 43274,
"nscannedAllPlans" : 43276,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
Here we can see that index has been used, but the number of items returned is "n" : 19368. which is wrong.
It should be 70986 items in collection with that criteria.
Why am I sure that it should be more records? Well, here the code:
var totalCount = 0;
db.UnitModel.find({WithdrawnFromSale: false}).forEach(
function (e) {
if(e.hasOwnProperty('Sales') && e.Sales != null)
totalCount++;
}
)
totalCount;
totalCount = 70986
To be sure that query above do not use any indexes let's check it out:
db.UnitModel.find({WithdrawnFromSale: false}).explain({verbose: true})
And result:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 70986,
"nscannedObjects" : 3204212,
"nscanned" : 3204212,
"nscannedObjectsAllPlans" : 3204212,
"nscannedAllPlans" : 3204212,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
So, for UnitModel collection I'm using, for criteria: Sales: {$ne: null}, WithdrawnFromSale: false it should be 70986 records returned by mongo. But as you can see I get it wrong.
Can someone explain me why? What can be the reason?
BTW. When I drop that index and use following index:
db["UnitModel"].ensureIndex({ "WithdrawnFromSale": 1})
it works as expected. But I do not need that index, it's not optimzal for my case.
As at MongoDB 2.4, the maximum size of an indexed value is 1024 bytes. The current behaviour for a key too large to index is to log a warning on the server side -- but this does not throw an exception. In this case, documents with excessively long keys will not be included in the index when the key is too long, but will be included in other indexes. This can lead to inconsistencies in results such as incorrect counts and "missing documents" that cannot be found by one index but may be available in another index or with a $natural search.
In the MongoDB 2.5 development/unstable branch (which will culminate in the MongoDB 2.6 production release later this year) this behaviour has changed. As at MongoDB 2.5.5, an exception will now be raised if a insert/update includes an index update where the keys would be too large. See SERVER-5290 in the MongoDB issue tracker for more details.
Figure out what the reason of the issue. When I look in log files for monogodb I have seen tons of following messages:
HBReadModel.system.indexes Btree::insert: key too large to index, skipping HBReadModel.UnitModel.$Sales_1_WithdrawnFromSale_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1
I was trying to create index on sales field which in actually document and not field. To avoid this I just re-created index and specify field inside Sales document. Log is clear, query returns records as expected.

Querying a Sub Object in MongoDB is not using the Index

I am recording site usage events in a sub object of a (visitor). here is a basic example of the data structure:
{ "_id" : ObjectId("4d4c695794b332a0740009bd"), "evs" : [
{
"ev" : "Visit Home Page",
"d" : 1,
"s" : 1
},
{
"ev" : "Buy Product",
"d" : "110.10",
"upc" : 1234,
"s" : 1
},
{
"ev" : "Sign up to newsletter",
"d" : "1",
"s" : 1
}
]}
I have an index on 'evs.s', but when I search on evs.s, the index is not used:
db.visitors.find({'evs.s':0}).explain()
{
"cursor" : "BtreeCursor evs.s_1",
"nscanned" : 33361,
"nscannedObjects" : 33361,
"n" : 33361,
"millis" : 311,
"nYields" : 105,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"evs.s" : [
[
0,
0
]
]
}
}
That query takes 311 milliseconds and scans through every object.
Here is the index: db.visitors.getIndexes()
{
"ns" : "tracking.visitors",
"unique" : false,
"key" : {
"evs.s" : 1
},
"name" : "evs.s_1",
"v" : 0
}
Your query actually is using an index, as indicated by the cursor type in the explain output ("BtreeCursor evs.s_1"). If you were not using a an index, it would be "BasicCursor".
From your input data, it looks like evs.s might not be a very efficient key to index on. If all of the values of evs.s are either 1 or 0, your index will always hit a large number of matches.
My guess is that your query did not do a full table scan, but that there are actually that many records with a value of evs.s = 0 in your index.
You might compare the output of
db.visits.find({evs.s: 0}).count();
db.visits.find({evs.s: 1}).count();
db.visits.find().count();
to verify this.
There are several things you can do to speed this up:
1) You can use a different index that has more distinct values. This will reduce the search space on each query.
2) You can add a limit statement to your query. This will stop scanning the index once limit documents have been found.
"cursor" : "BtreeCursor evs.s_1"
means that the index is used.