What's causing this indexing error in MongoDB?

I'm trying to create 3 text indexes on collections built from several Yelp JSON files that I imported into MongoDB.
> db.review.createIndex({"text":"text"})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.business.createIndex({"categories":"text"})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.business.createIndex({"attributes":"text"})
{
"ok" : 0,
"errmsg" : "Index with pattern: { _fts: \"text\", _ftsx: 1 } already exists with different options",
"code" : 85
}
Basically, I'm trying to create 3 indexes to make the count function faster in MongoDB.
What does "errmsg" : "Index with pattern: { _fts: \"text\", _ftsx: 1 } already exists with different options" mean?
Should I index a different field instead of attributes, or should I drop the index?

MongoDB (as of v3.4) allows only one text index per collection.
Your business collection already has a text index on categories, so creating a second text index, on attributes, fails. A single text index can, however, cover several fields.
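One way around the limit, sketched here under the assumption that both fields should be searchable through the same index, is to replace the existing index with one text index that lists both fields. The helper name textIndexSpec is hypothetical. The shell calls in the trailing comment are not executed here, and categories_text is the default name MongoDB gives to the index created above.

```javascript
// Hypothetical helper: build one text-index key spec from several fields.
function textIndexSpec(fields) {
  const spec = {};
  for (const field of fields) {
    spec[field] = "text"; // every listed field takes part in the same text index
  }
  return spec;
}

const spec = textIndexSpec(["categories", "attributes"]);
console.log(JSON.stringify(spec));

// Assumed usage in the mongo shell (not run here):
//   db.business.dropIndex("categories_text");  // remove the single-field text index
//   db.business.createIndex(spec);             // one text index over both fields
```

Dropping the old index first matters: the new createIndex would otherwise fail with the same code 85, because a text index already exists on the collection.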

Related

What does createdCollectionAutomatically mean?

When I create an index on a collection, one of the properties of the result document is createdCollectionAutomatically: false.
db.myCollection.createIndex({"address":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
What does it mean and when is this true?
Found the answer here: https://docs.mongodb.com/manual/reference/command/createIndexes/#output
The createdCollectionAutomatically indicates if the operation
created a collection. If a collection does not exist, MongoDB
creates the collection as part of the indexing
operation.
So when I run db.myCollection.createIndex({"address":1}) and myCollection does not exist, the result is
{
"createdCollectionAutomatically" : true,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}

Indexing array/subobject in mongoDB causes duplicate key error

I have a collection where I will have a _children attribute like this:
{
_children: {
videoTags: [ { id: '1', name: 'one'}, { id: '2', name: 'two'} ],
},
a: 10
}
Since I WILL search in videoTags, I create an index as such:
> db.test4.createIndex({ "_children.videosTags.id" : 1 }, { "unique" : true } );
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
Trouble is, I can no longer add anything to that collection since I get a duplicate key error. Here is how to reproduce it:
Step 1: insert to a collection
db.test4.insert({a:20})
WriteResult({ "nInserted" : 1 })
Step 2: make the index
db.test4.createIndex({ "_children.videosTags.id" : 1 }, { "unique" : true } );
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
Step 3: try to insert again
db.test4.insert({a:30})
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: wonder_1.test4.$_children.videosTags.id_1 dup key: { : null }"
}
})
I think the issue here is that there is already a record where _children.videoTags.id is not defined.
However, what I expected was a behaviour where if videoTags.id was specified, it needed to be unique. Instead, an empty one is considered a "taken" key.
What am I doing that is stupidly wrong?
This will work if you don't set unique as true, but I have the feeling I need to fix it for real...
There could be two reasons.
Other documents with the same _children.videosTags.id may already exist in the collection.
More than one document may be missing _children.videosTags.id or have a null value for it.
With a unique index, a null or missing value counts as a key like any other, so only one such document is allowed. The solution is to create either a sparse index or, if your MongoDB version is 3.2+, a partial index. See the documentation for partial indexes.
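The second reason can be made concrete with a small model of how keys are derived for a dotted path. This is only a sketch: the function names indexKeys and violatesUnique are hypothetical, the model treats a literal null the same as a missing field, and it uses the field name videoTags from the sample document (the commands in the question spell it videosTags). The shell call in the trailing comment shows the partial index option and is not run here.

```javascript
// Simplified model of how an index derives keys for a dotted path:
// arrays are flattened, and a document without the path is indexed as null.
function indexKeys(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    current = current
      .flatMap((v) => (Array.isArray(v) ? v : [v]))
      .filter((v) => v !== null && typeof v === "object" && part in v)
      .map((v) => v[part]);
  }
  current = current.flatMap((v) => (Array.isArray(v) ? v : [v]));
  return current.length > 0 ? current : [null];
}

// Returns true if inserting `doc` would collide with a key already in `docs`.
// With `partial` set, documents that lack the path are left out of the index.
function violatesUnique(docs, doc, path, partial = false) {
  const hasPath = (d) => indexKeys(d, path)[0] !== null;
  const indexed = partial ? docs.filter(hasPath) : docs;
  if (partial && !hasPath(doc)) return false;
  const taken = new Set(indexed.flatMap((d) => indexKeys(d, path)));
  return indexKeys(doc, path).some((k) => taken.has(k));
}

const path = "_children.videoTags.id";
console.log(violatesUnique([{ a: 20 }], { a: 30 }, path));       // true: both index as null
console.log(violatesUnique([{ a: 20 }], { a: 30 }, path, true)); // false: neither is indexed

// Assumed mongo shell equivalent for MongoDB 3.2+ (not run here):
//   db.test4.createIndex(
//     { "_children.videoTags.id": 1 },
//     { unique: true,
//       partialFilterExpression: { "_children.videoTags.id": { $exists: true } } }
//   );
```

With the partial filter, uniqueness is enforced only among documents that actually carry the field, which is the behaviour the question expected.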

MongoDB does not use indexes on some fields when doing distinct

I have noticed that MongoDB won't always use an index when querying for distinct values of a field. It will use one on some fields, but not on others.
Here's the example:
db.product.createIndex({"_indexed.preventieve_mondzorg-max_bedrag_p_jr": 1});
db.runCommand({distinct: "product", key:"_indexed.preventieve_mondzorg-max_bedrag_p_jr"});
This query does not use the index built on that field and falls back to a full collection scan. This is what it produces:
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 50,
"numIndexesAfter" : 50,
"note" : "all indexes already exist",
"ok" : 1
}
{
"values" : [
"€ 250,- | 75%",
"Geen dekking",
"...",
],
"stats" : {
"n" : 33660,
"nscanned" : 0,
"nscannedObjects" : 33660,
"timems" : 12531,
"planSummary" : "COLLSCAN"
},
"ok" : 1
}
On the other hand
db.product.createIndex({"free_choice.value": 1});
db.runCommand({distinct: "product", key:"free_choice.value"});
will use the index:
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 50,
"numIndexesAfter" : 50,
"note" : "all indexes already exist",
"ok" : 1
}
{
"values" : [
"gedeeltelijk",
"geen",
"ja"
],
"stats" : {
"n" : 4,
"nscanned" : 4,
"nscannedObjects" : 4,
"timems" : 2,
"planSummary" : "DISTINCT { free_choice.value: 1.0 }"
},
"ok" : 1
}
So... what could be the difference between those two fields?
Is it a bug, or am I doing something wrong?
I am running MongoDB 3.0.6 in a Vagrant box with Ubuntu 14.04.3 LTS
Apparently this is a bug, or at least a misbehaviour, in the MongoDB core: MongoDB does not use a multikey index on a dotted field for distinct requests.
Here's MongoDB's response:
The distinct optimization uses a special index access stage which
returns the distinct index keys to its parent stage. In the multikey
dotted case, however, the distinct stage would have to check for null
or undefined keys. In the case of null or undefined, it must fetch the
full document in order to disambiguate between literal nulls versus
null by virtue of missing fields. We have decided to hold off unless
we see that users really need this.
If you really want this feature, please vote for it here: SERVER-13298

Incorrect items count with specific index usage

I'm using MongoDB version 2.4.8 on Windows Server 2008 R2, and I see strange index behaviour that I can't explain. Here is an example of the structure that I have in my collection:
{
"_id" : NUUID("67070100-4627-4aa5-8ab9-45624e5b82ad"),
"PropertyType" : "Cooperative",
"Address" : {
"Street" : "aaaaaaaaa",
"HouseNo" : "165",
"PostalCode" : 2860,
"City" : "bbbbb",
"Floor" : "1",
"DoorNumber" : ""
},
"Sales" : {
"Price" : 425000,
"Payout" : 0,
"AreaPrice" : 9042,
"GrossPrice" : 2340,
"NetPrice" : 800,
},
"WithdrawnFromSale" : true,
"UnitData" : {
"UnitType" : "aaaaa",
"Area" : 400,
"LivingArea" : 50,
"UnitArea" : 50,
"Rooms" : 2,
"BuildYear" : 1948,
"GroundArea" : 203,
"NoiseLevel" : 5
}
}
Also, I've created an index on that collection:
db["UnitModel"].ensureIndex({ "Sales": 1, "PropertyType": 1, "UnitData.Rooms": 1, "UnitData.NoiseLevel": 1 })
The problem is that I get a wrong item count when a query uses this index.
When I issue this query:
db.UnitModel.find({Sales: {$ne: null}, WithdrawnFromSale: false}).explain({verbose: true})
I get the following results:
{
"cursor" : "BtreeCursor Sales_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1 multi",
"isMultiKey" : false,
"n" : 19368,
"nscannedObjects" : 42875,
"nscanned" : 42876,
"nscannedObjectsAllPlans" : 43274,
"nscannedAllPlans" : 43276,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
Here we can see that the index has been used, but the number of items returned is "n" : 19368, which is wrong.
There should be 70986 items in the collection matching those criteria.
Why am I sure that there should be more records? Well, here is the code:
var totalCount = 0;
db.UnitModel.find({WithdrawnFromSale: false}).forEach(
function (e) {
if(e.hasOwnProperty('Sales') && e.Sales != null)
totalCount++;
}
)
totalCount;
totalCount = 70986
To be sure that the query above does not use any index, let's check:
db.UnitModel.find({WithdrawnFromSale: false}).explain({verbose: true})
And result:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 70986,
"nscannedObjects" : 3204212,
"nscanned" : 3204212,
"nscannedObjectsAllPlans" : 3204212,
"nscannedAllPlans" : 3204212,
"scanAndOrder" : false,
"indexOnly" : false,
....
}
So, for the UnitModel collection, the criteria Sales: {$ne: null}, WithdrawnFromSale: false should return 70986 records. But as you can see, the indexed query returns the wrong number.
Can someone explain why? What could be the reason?
BTW, when I drop that index and use the following index instead:
db["UnitModel"].ensureIndex({ "WithdrawnFromSale": 1})
it works as expected. But I do not need that index; it's not optimal for my case.
As at MongoDB 2.4, the maximum size of an indexed value is 1024 bytes. The current behaviour for a key too large to index is to log a warning on the server side, but this does not throw an exception. Documents whose key is too long are left out of that index but are still included in other indexes. This can lead to inconsistent results, such as incorrect counts and "missing documents" that cannot be found through one index but can be found through another index or with a $natural search.
In the MongoDB 2.5 development/unstable branch (which will culminate in the MongoDB 2.6 production release later this year) this behaviour has changed. As at MongoDB 2.5.5, an exception is raised if an insert/update includes an index update where the keys would be too large. See SERVER-5290 in the MongoDB issue tracker for more details.
I figured out the reason for the issue. When I looked in the MongoDB log files, I saw tons of the following messages:
HBReadModel.system.indexes Btree::insert: key too large to index, skipping HBReadModel.UnitModel.$Sales_1_WithdrawnFromSale_1_PropertyType_1_UnitData.Rooms_1_UnitData.NoiseLevel_1
I was trying to create the index on the Sales field, which is actually an embedded document and not a scalar field. To avoid this, I re-created the index on a specific field inside the Sales document. Now the log is clean and the query returns records as expected.
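The effect described in this thread can be imitated with a toy model. It is a sketch only: JSON string length stands in for the real BSON key size, the Notes field and the second document are hypothetical, and buildIndex simply mimics the 2.4 behaviour of silently skipping oversized keys.

```javascript
// Limit quoted above for MongoDB 2.4; keys at or over it are skipped.
const MAX_KEY_BYTES = 1024;

// Assumption: JSON length stands in for the real BSON key size.
function keySize(value) {
  return JSON.stringify(value).length;
}

// Model of a 2.4-style index build: oversized keys are silently left out.
function buildIndex(docs, keyOf) {
  return docs.filter((doc) => keySize(keyOf(doc)) < MAX_KEY_BYTES);
}

const docs = [
  { _id: 1, Sales: { Price: 425000 } },
  // Hypothetical document whose Sales subdocument is too large to index.
  { _id: 2, Sales: { Price: 310000, Notes: "x".repeat(2000) } },
];

const wholeDoc = buildIndex(docs, (d) => d.Sales);       // index on the Sales subdocument
const oneField = buildIndex(docs, (d) => d.Sales.Price); // index on a scalar inside Sales

console.log(wholeDoc.length, oneField.length); // 1 2
```

A query answered from the first index would see one document while a collection scan sees two, which is the kind of count mismatch reported above. Indexing a scalar such as Sales.Price keeps every key small.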

Querying a Sub Object in MongoDB is not using the Index

I am recording site usage events in a sub-object of a visitor document. Here is a basic example of the data structure:
{ "_id" : ObjectId("4d4c695794b332a0740009bd"), "evs" : [
{
"ev" : "Visit Home Page",
"d" : 1,
"s" : 1
},
{
"ev" : "Buy Product",
"d" : "110.10",
"upc" : 1234,
"s" : 1
},
{
"ev" : "Sign up to newsletter",
"d" : "1",
"s" : 1
}
]}
I have an index on 'evs.s', but when I search on evs.s, the index is not used:
db.visitors.find({'evs.s':0}).explain()
{
"cursor" : "BtreeCursor evs.s_1",
"nscanned" : 33361,
"nscannedObjects" : 33361,
"n" : 33361,
"millis" : 311,
"nYields" : 105,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"evs.s" : [
[
0,
0
]
]
}
}
That query takes 311 milliseconds and scans through every object.
Here is the index: db.visitors.getIndexes()
{
"ns" : "tracking.visitors",
"unique" : false,
"key" : {
"evs.s" : 1
},
"name" : "evs.s_1",
"v" : 0
}
Your query actually is using an index, as indicated by the cursor type in the explain output ("BtreeCursor evs.s_1"). If you were not using an index, it would be "BasicCursor".
From your input data, it looks like evs.s might not be a very efficient key to index on. If all of the values of evs.s are either 1 or 0, your index will always hit a large number of matches.
My guess is that your query did not do a full table scan, but that there are actually that many records with a value of evs.s = 0 in your index.
You might compare the output of
db.visitors.find({'evs.s': 0}).count();
db.visitors.find({'evs.s': 1}).count();
db.visitors.find().count();
to verify this.
There are several things you can do to speed this up:
1) You can use a different index that has more distinct values. This will reduce the search space on each query.
2) You can add a limit statement to your query. This will stop scanning the index once limit documents have been found.
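The two points above can be illustrated with a toy index scan. This is a model, not MongoDB code: the entries array, the 90/10 split of values, and the scanRange function are all assumptions made for the example.

```javascript
// Toy index: one entry per document, key is the indexed evs.s value.
// Assumption: 90% of entries hold 0, mimicking a low-cardinality field.
const entries = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  key: i % 10 === 0 ? 1 : 0,
}));

// Walk the entries inside the [value, value] bounds, stopping at `limit` hits.
function scanRange(indexEntries, value, limit = Infinity) {
  const hits = [];
  let scanned = 0;
  for (const entry of indexEntries) {
    if (entry.key !== value) continue; // outside the index bounds
    if (hits.length >= limit) break;   // limit reached, stop scanning
    scanned += 1;
    hits.push(entry.id);
  }
  return { scanned, returned: hits.length };
}

console.log(scanRange(entries, 0));     // { scanned: 900, returned: 900 }
console.log(scanRange(entries, 0, 10)); // { scanned: 10, returned: 10 }

// Assumed mongo shell counterpart of the second call (not run here):
//   db.visitors.find({'evs.s': 0}).limit(10)
```

Without a limit, the scan touches every entry that matches, so nscanned equals n just as in the explain output in the question. With a limit, the scan stops as soon as enough documents have been found.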
"cursor" : "BtreeCursor evs.s_1"
means that the index is used.