MongoDB - distinct with query doesn't use indexes

Using Mongo 3.2.
Let's say I have a collection with this schema:
{ _id: 1, type: "a", source: "x" },
{ _id: 2, type: "a", source: "y" },
{ _id: 3, type: "b", source: "x" },
{ _id: 4, type: "b", source: "y" }
Of course, my db is much larger, with many more types and sources.
I have created 4 indexes from combinations of type and source (even though one should be enough):
{type: 1}
{source: 1}
{type: 1, source: 1}
{source: 1, type: 1}
Now, I am running this distinct query:
db.test.distinct("source", {type: "a"})
The problem is that this query takes much more time than it should.
If I run it with runCommand:
db.runCommand({distinct: 'test', key: "source", query: {type: "a"}})
this is the result I get:
{
    "waitedMS": 0,
    "values": [
        "x",
        "y"
    ],
    "stats": {
        "n": 19400840,
        "nscanned": 19400840,
        "nscannedObjects": 19400840,
        "timems": 14821,
        "planSummary": "IXSCAN { type: 1 }"
    },
    "ok": 1
}
For some reason, Mongo uses only the { type: 1 } index for the query stage.
It should use an index for the distinct stage as well.
Why is that? Using the { type: 1, source: 1 } index would be much better, no? Right now it is scanning all the type: "a" documents even though it has an index covering exactly this case.
Am I doing something wrong? Do I have a better option for this kind of distinct?

As Alex mentioned, apparently MongoDB doesn't support this right now.
There is an open issue for it:
https://jira.mongodb.org/browse/SERVER-19507
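In the meantime, a possible workaround (a sketch, untested on your data) is to express the distinct as an aggregation. The $match stage can use the { type: 1, source: 1 } index, and depending on the server version the planner may even answer the $group from the index:

// Equivalent of db.test.distinct("source", {type: "a"}) as an aggregation.
// The $match can use the {type: 1, source: 1} index; on newer servers the
// $group may be turned into a DISTINCT_SCAN over that same index.
db.test.aggregate([
  { $match: { type: "a" } },
  { $group: { _id: "$source" } }
])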

Just drop the first two indexes. You don't need them. Mongo can use { type: 1, source: 1 } for any query that could use the { type: 1 } index.

Related

Creation of Unique Index not working using Mongo shell

I have created a small MongoDB database and wanted to make the username field unique, so I used the createIndex() command to create an index for that field with the UNIQUE property.
I tried creating the unique index using the command below in mongosh.
db.users.createIndex({'username':'text'},{unqiue:true,dropDups: true})
To check the current indexes, I used the getIndexes() command. Below is the output.
newdb> db.users.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  {
    v: 2,
    key: { _fts: 'text', _ftsx: 1 },
    name: 'username_text',
    weights: { username: 1 },
    default_language: 'english',
    language_override: 'language',
    textIndexVersion: 3
  }
]
Now the index is created, so for confirmation I checked it in MongoDB Compass, but I am not able to see the UNIQUE property assigned to my newly created index. Please refer to the screenshot below.
[MongoDB Compass screenshot]
I tried deleting the old index, as it was not showing the UNIQUE property, and created it again using the MongoDB Compass GUI; now I can see the UNIQUE property assigned to the index.
[MongoDB Compass screenshot 2]
And below is the output of the getIndexes() command in mongosh.
newdb> db.users.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  {
    v: 2,
    key: { _fts: 'text', _ftsx: 1 },
    name: 'username_text',
    unique: true,
    sparse: false,
    weights: { username: 1 },
    default_language: 'english',
    language_override: 'language',
    textIndexVersion: 3
  }
]
I tried searching for similar topics but didn't find anything related. Is there anything I am missing or doing wrong here?
I misspelled the property unique as unqiue, which led to this issue.
I tried again with the correct spelling, and it is working now.
Sorry for a dumb question.
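For the record, the corrected command. A plain ascending index is enough to enforce uniqueness on username (a text index is for text search and is not needed here), and note that dropDups was removed in MongoDB 3.x:

// 'unique' spelled correctly; a regular { username: 1 } key instead of a text index
db.users.createIndex({ username: 1 }, { unique: true })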

MongoDB index behavior different in PROD and Testing environments

I have two dedicated Mongo clusters that have the exact same model and indexes, and we query both environments the same way, but the results are different.
user.model.js
const mongoose = require('mongoose');

const schema = new mongoose.Schema({
  _id: mongoose.Schema.Types.ObjectId,
  role: {
    type: String,
    enum: ['user', 'admin'],
    required: true,
  },
  score: { type: Number, default: 0 },
  deactivated: { type: Date },
});
schema.index(
{ deactivated: 1, role: 1, score: -1 },
{ name: 'search_index', collation: { locale: 'en', strength: 2 } }
);
I noticed that one of our common queries was causing issues on the PROD environment.
The query looks like this:
db.getCollection('users')
  .find({ deactivated: null, role: 'user' })
  .sort({ score: -1 })
  .limit(10)
  .collation({ locale: 'en', strength: 2 })
On the Testing environment the query runs as expected, fully utilizing the index (~80K records total, 1,300 of them deactivated).
But in our PROD env the query seems to be using only the first part of the compound index (~50K records total, ~20K deactivated).
The executionStats looks like:
[executionStats output not included]
As we can see, it is using at least the first part of the index to search only in non-deactivated records, but the SORT happens in memory.
This is a legacy application so the first thing I did was ensure that the types of the indexed fields are following the schema in all the records.
I wonder if it could be the "role" collation somehow?
Any hint or clue will be greatly appreciated. Thanks in advance.
Thanks for providing the plans. It is a combination of a few things (including the multikeyness of the production index) that is causing the problem.
There are a few ways to potentially solve this; let's start with the obvious question: is score supposed to be an array?
The schema suggests not. With MongoDB, an index becomes multikey once a single document is inserted that has an array (even an empty one) as the value of an indexed field. There is no way to "undo" this change apart from rebuilding the index. If the field is not supposed to contain an array, then I would suggest fixing any documents that contain the incorrect data and then rebuilding the index. As this is production, you may want to build a temporary index to reduce the impact on the application while the original index is dropped and recreated. You may also want to look into schema validation to help prevent incorrect data from getting inserted in the future.
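A sketch of that rebuild (the index names and the extra _id key are illustrative; MongoDB refuses two indexes with an identical spec, and the prefix of the temporary index still covers the query):

// 1. Build a temporary covering index so queries stay served during the rebuild.
db.users.createIndex(
  { deactivated: 1, role: 1, score: -1, _id: 1 },
  { name: 'search_index_tmp', collation: { locale: 'en', strength: 2 } }
)
// 2. Find (and then repair) any documents where score is incorrectly an array;
//    the exact fix depends on your data.
db.users.find({ score: { $type: 'array' } })
// 3. Rebuild the original index so it is no longer multikey, then clean up.
db.users.dropIndex('search_index')
db.users.createIndex(
  { deactivated: 1, role: 1, score: -1 },
  { name: 'search_index', collation: { locale: 'en', strength: 2 } }
)
db.users.dropIndex('search_index_tmp')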
If score can be an array, then we'll need to take a different approach. We can see in the UAT plan that a SORT_MERGE is used. The only reason that stage is required is that {"deactivated": null} carries an additional index bound looking for undefined. That may be an internal implementation quirk, as that BSON type appears to be deprecated. So updating the data to have an explicit false value for this field, and using that check in the query predicate (rather than a check for null), will remove the need to split the plan out with a SORT_MERGE and will probably allow the multikey index to provide the sort:
winningPlan: {
  stage: 'LIMIT',
  limitAmount: 10,
  inputStage: {
    stage: 'FETCH',
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { deactivated: 1, role: 1, score: -1 },
      indexName: 'search_index',
      collation: {
        locale: 'en',
        caseLevel: false,
        caseFirst: 'off',
        strength: 2,
        numericOrdering: false,
        alternate: 'non-ignorable',
        maxVariable: 'punct',
        normalization: false,
        backwards: false,
        version: '57.1'
      },
      isMultiKey: true,
      multiKeyPaths: { deactivated: [], role: [], score: [ 'score' ] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        deactivated: [ '[false, false]' ],
        role: [
          '[CollationKey(0x514d314b0108), CollationKey(0x514d314b0108)]'
        ],
        score: [ '[MaxKey, MinKey]' ]
      }
    }
  }
}
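A minimal sketch of that data and query change, assuming active users currently have deactivated missing or set to null (field names as in the schema above):

// Backfill an explicit false for users that were never deactivated
// ({deactivated: null} matches both missing and null fields).
db.users.updateMany({ deactivated: null }, { $set: { deactivated: false } })

// Query with an equality check on false instead of null, so the planner
// no longer needs the extra undefined bound or a SORT_MERGE.
db.users.find({ deactivated: false, role: 'user' })
  .sort({ score: -1 })
  .limit(10)
  .collation({ locale: 'en', strength: 2 })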

Building a Mongo query

I have a model like this:
[{
  _id: 123,
  field1: someValue,
  price: [
    { _id: 456, field1: anotherValue, type: [] },
    { _id: 789, field1: anotherValue, type: ['super'] }
  ]
}]
I need to find an item by 3 parameters: the item _id, the price _id, and a check that the price's type array is empty. And all conditions must match within the same price element.
Model.findOneAndUpdate({ _id: 123, "price._id": 456, "price.type": { $size: 0 } })
This query always returns the item, because the conditions can be satisfied by different price elements.
Model.findOneAndUpdate({ _id: 123, price: { _id: 456, type: { $size: 0 } } })
This query returns an error (cast to array failed, or something like that).
I tried to build the query with $in and $and, but still got an error.
Use $elemMatch:
The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria.
db.inventory.find({
  price: {
    $elemMatch: {
      _id: 456,
      type: { $size: 0 }
    }
  }
})
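Applied to the original findOneAndUpdate (the update document, omitted in the question, goes in the second argument):

// Both conditions must match within the same price element.
Model.findOneAndUpdate(
  { _id: 123, price: { $elemMatch: { _id: 456, type: { $size: 0 } } } },
  { /* your update here */ }
)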

Mongo efficient querying log collection by level range

I have a capped collection for storing server logs:
var mongoose = require('mongoose');

var schema = new mongoose.Schema({
  level: { type: Number, required: true },
  ...
}, { capped: 64 * 1024 * 1024, versionKey: false });
I'm having trouble figuring out how to query logs by level range efficiently. Here's a sample query I want to run:
db.getCollection('logs').find({
level: { $gte: 2, $lte: 6 }
}).sort({ _id: -1 }).limit(500)
Indexing on { _id: 1, level: 1 } doesn't make any sense: _id is unique, so there is only a single level under each _id, and in the worst case the whole collection will be scanned.
If I index on { level: 1, _id: -1 }, in the worst case Mongo pulls all logs for levels 2, 3, 4, 5 and 6, joins them, and sorts them manually, so performance is horrible. Sometimes it also decides to use the { _id: 1 } index, which is terrible too.
It could just walk through these five index ranges at once and get the result while checking at most 504 documents. Or it could pull only the first 500 results from each level, so it would sort at most 2500 documents. But it won't; Mongo is just plain stupid when it comes to range queries.
The fastest solution I can think of is implementing the last mentioned method on the client, i.e. running five queries and then merging the results manually:
db.getCollection('logs').find({ level: 2 }).sort({ _id: -1 }).limit(500)
db.getCollection('logs').find({ level: 3 }).sort({ _id: -1 }).limit(500)
db.getCollection('logs').find({ level: 4 }).sort({ _id: -1 }).limit(500)
...
Merging can be done in O(n) on the client; there are only 7 log levels, so at most 7 queries will be executed and 3500 documents pulled from the database. A sketch of the merge is shown below.
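A minimal sketch of that merge in plain JavaScript, assuming each per-level result array is already sorted by _id descending, as returned by the queries above:

// Merge per-level result arrays (each sorted by _id descending) into the
// overall newest `limit` documents. A linear scan per pick is plenty here:
// at most 7 arrays and 500 picks.
function mergeByIdDesc(resultArrays, limit) {
  const pos = resultArrays.map(() => 0);
  const merged = [];
  while (merged.length < limit) {
    let best = -1;
    for (let i = 0; i < resultArrays.length; i++) {
      if (pos[i] >= resultArrays[i].length) continue;
      // ObjectId hex strings compare lexicographically in timestamp order.
      if (best === -1 ||
          String(resultArrays[i][pos[i]]._id) > String(resultArrays[best][pos[best]]._id)) {
        best = i;
      }
    }
    if (best === -1) break; // all arrays exhausted
    merged.push(resultArrays[best][pos[best]]);
    pos[best] += 1;
  }
  return merged;
}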
Is there a better way?
Since you have only 7 levels, it may be worth considering the { level: 1, _id: -1 } index with an $or query:
db.logs.find({ $or: [
  { level: 2 },
  { level: 3 },
  { level: 4 },
  { level: 5 },
  { level: 6 }
] }).sort({ _id: -1 }).limit(500)
Since these are equality conditions, it should make use of the index, but I have never tried it on capped collections.
I would give it a try and run explain() to confirm it works, then probably enable the profiler and run a few other queries.
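Something like the following should confirm it; look for an IXSCAN on { level: 1, _id: -1 } in each $or branch and the absence of an in-memory SORT stage in the winning plan:

db.logs.find({ $or: [
  { level: 2 }, { level: 3 }, { level: 4 }, { level: 5 }, { level: 6 }
] }).sort({ _id: -1 }).limit(500).explain('executionStats')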

mongodb: order and limit the collection, then order and limit the resulting document's nested documents

I have the following data structure on my game collection:
{
  name: "game1",
  date: "2010-10-10",
  media: [{
    id: 1,
    created: "2010-10-10 00:00:59"
  }, {
    id: 2,
    created: "2010-10-10 00:00:30"
  }]
},
{
  name: "game2",
  date: "2010-10-09",
  media: [{
    id: 1,
    created: "2010-10-09 00:10:40"
  }, {
    id: 2,
    created: "2010-10-09 09:01:00"
  }]
}
I want to get the game with the highest date, then get the related media entry with the highest created to get its id. In the example above, the result would be
{
  name: "game1",
  date: "2010-10-10",
  media: [{
    id: 1,
    created: "2010-10-10 00:00:59"
  }]
}
I tried to use find and find_one, and also aggregation, but I could not figure out a way to build this query.
Any suggestions?
You will need to $unwind the media array in order to get the subdocument in that array where created is the highest, then $sort your documents by date and created, all in descending order. Use $limit to output n documents, which is 1 in our case.
In [26]: import pymongo
In [27]: conn = pymongo.MongoClient()
In [28]: db = conn.test
In [29]: col = db.gamers
In [30]: list(col.aggregate([{"$unwind": "$media"}, {"$sort": {"date": -1, "media.created": -1}}, {"$limit": 1}]))
Out[30]:
[{'_id': ObjectId('553323ec0acf450bc6b7438c'),
  'date': '2010-10-10',
  'media': {'created': '2010-10-10 00:00:59', 'id': 1},
  'name': 'game1'}]