MongoDB index behavior differs between PROD and Testing environments - mongodb

I have 2 dedicated Mongo clusters with the exact same model and indexes, and we query both environments the same way, but the results differ.
user.model.js
const schema = mongoose.Schema({
  _id: ObjectId,
  role: {
    type: String,
    enum: ['user', 'admin'],
    required: true,
  },
  score: { type: Number, default: 0 },
  deactivated: { type: Date },
});

schema.index(
  { deactivated: 1, role: 1, score: -1 },
  { name: 'search_index', collation: { locale: 'en', strength: 2 } }
);
I noticed that one of our common queries was causing issues on the PROD environment.
The query looks like this:
db.getCollection('users')
  .find({ deactivated: null, role: 'user' })
  .sort({ score: -1 })
  .limit(10)
  .collation({ locale: 'en', strength: 2 })
In the Testing environment the query runs as expected, fully utilizing the index (~80K records total, ~1,300 deactivated).
But in our PROD environment the query seems to be using only the first part of the compound index (~50K records total, ~20K deactivated).
From the executionStats we can see that it is using at least the first part of the index to search only in non-deactivated records, but the SORT happens in memory.
This is a legacy application, so the first thing I did was ensure that the types of the indexed fields follow the schema in all records.
I wonder if it could be the "role" collation somehow?
Any hint or clue will be greatly appreciated. Thanks in advance.

Thanks for providing the plans. It is a combination of a few things (including the multikeyness of the production index) that is causing the problem.
There are a few ways to potentially solve this; let's start with the obvious question: is score supposed to be an array?
The schema suggests not. With MongoDB, an index becomes multikey once a single document is inserted that has an array (even an empty one) for a key in the index. There is no way to "undo" this change apart from rebuilding the index. If the field is not supposed to contain an array, then I would suggest fixing any documents that contain the incorrect data and then rebuilding the index. As this is production, you may want to build a temporary index to reduce the impact on the application while the original index is dropped and recreated. You may also want to look into schema validation to help prevent incorrect data from getting inserted in the future.
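A rough sketch of that cleanup (collection and index names taken from the question; the temporary index name and the exact repair of the bad score values are assumptions):

// 1. Find documents where score is (incorrectly) an array; how to repair them
//    depends on what the bad values should actually be.
db.getCollection('users').find({ score: { $type: 'array' } })

// 2. Build a temporary index so the query stays served while the original is rebuilt
//    (the extra trailing _id key lets it coexist with the original key pattern).
db.getCollection('users').createIndex(
  { deactivated: 1, role: 1, score: -1, _id: 1 },
  { name: 'search_index_tmp', collation: { locale: 'en', strength: 2 } }
)

// 3. Rebuild the original index; built over clean data it will no longer be multikey.
db.getCollection('users').dropIndex('search_index')
db.getCollection('users').createIndex(
  { deactivated: 1, role: 1, score: -1 },
  { name: 'search_index', collation: { locale: 'en', strength: 2 } }
)
db.getCollection('users').dropIndex('search_index_tmp')

// 4. Optionally, add schema validation so new writes cannot set score to an array.
db.runCommand({
  collMod: 'users',
  validator: { $jsonSchema: { properties: { score: { bsonType: ['double', 'int', 'long'] } } } }
})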
If score can be an array, then we'll need to take a different approach. We can see in the UAT plan that a SORT_MERGE is used. The only reason that stage is required is that {"deactivated": null} seems to have an additional index bound looking for undefined. That may be an internal implementation quirk, as that BSON type appears to be deprecated. So updating the data to have an explicit false value for this field, and using that check in the query predicate (rather than a check for null), will remove the need to split the plan out with a SORT_MERGE and will probably allow the multikey index to provide the sort:
winningPlan: {
  stage: 'LIMIT',
  limitAmount: 10,
  inputStage: {
    stage: 'FETCH',
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { deactivated: 1, role: 1, score: -1 },
      indexName: 'search_index',
      collation: {
        locale: 'en',
        caseLevel: false,
        caseFirst: 'off',
        strength: 2,
        numericOrdering: false,
        alternate: 'non-ignorable',
        maxVariable: 'punct',
        normalization: false,
        backwards: false,
        version: '57.1'
      },
      isMultiKey: true,
      multiKeyPaths: { deactivated: [], role: [], score: [ 'score' ] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        deactivated: [ '[false, false]' ],
        role: [
          '[CollationKey(0x514d314b0108), CollationKey(0x514d314b0108)]'
        ],
        score: [ '[MaxKey, MinKey]' ]
      }
    }
  }
}
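For completeness, a sketch of the data change and query shape that lead to a plan like the one above (note the mongoose schema declares deactivated as a Date, so it would also need to accommodate an explicit false value):

// Backfill an explicit false for documents that are not deactivated
// ({ deactivated: null } matches both null and missing values) ...
db.getCollection('users').updateMany(
  { deactivated: null },
  { $set: { deactivated: false } }
)

// ... and query for that explicit value instead of null.
db.getCollection('users')
  .find({ deactivated: false, role: 'user' })
  .sort({ score: -1 })
  .limit(10)
  .collation({ locale: 'en', strength: 2 })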

Related

Creation of Unique Index not working using Mongo shell

I have created a small MongoDB database and wanted to make the username field unique, so I used the createIndex() command to create an index on that field with the UNIQUE property.
I tried creating the unique index using the command below in mongosh.
db.users.createIndex({'username':'text'},{unqiue:true,dropDups: true})
To check the current indexes, I used the getIndexes() command. Below is the output.
newdb> db.users.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  {
    v: 2,
    key: { _fts: 'text', _ftsx: 1 },
    name: 'username_text',
    weights: { username: 1 },
    default_language: 'english',
    language_override: 'language',
    textIndexVersion: 3
  }
]
Now the index is created, so for confirmation I checked the same in MongoDB Compass. But I am not able to see the UNIQUE property assigned to my newly created index. Please refer to the screenshot below.
MongoDB screenshot
I tried deleting the old index, as it was not showing the UNIQUE property, and created it again using the MongoDB Compass GUI, and now I can see the UNIQUE property assigned to the index.
MongoDB screenshot 2
And below is the output of the getIndexes() command in mongosh.
newdb> db.users.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  {
    v: 2,
    key: { _fts: 'text', _ftsx: 1 },
    name: 'username_text',
    unique: true,
    sparse: false,
    weights: { username: 1 },
    default_language: 'english',
    language_override: 'language',
    textIndexVersion: 3
  }
]
I tried searching similar topics, but didn't find anything related. Is there anything I am missing or doing wrong here?
I misspelled the property unique as unqiue, which led to this issue.
I tried again with the correct spelling, and it is working now.
Sorry for a dumb question.
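For reference, the corrected command would simply be the same one with unique spelled correctly (dropDups is not supported on modern MongoDB versions, so it can be omitted):

db.users.createIndex({ username: 'text' }, { unique: true })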

When using TTL indexes, how to correctly mark document as "never-expires"

I am using Mongo's TTL indexes on user-created posts. Each post has an expiresAt field, a date which the TTL index uses.
Admins can "highlight" posts, making it so the post never expires.
I am not sure how to do this correctly and am considering these 2 methods:
Setting expiresAt to some big date in the future, i.e. 9999 years away
Deleting the expiresAt field or setting it to "undefined"
Which approach would be best, ideally also removing the document's index entry so it is not stored unnecessarily?
As you suggest, there are multiple ways to approach this. In general, TTL indexes have the following behavior:
If the indexed field in a document is not a date or an array that holds one or more date values, the document will not expire.
If a document does not contain the indexed field, the document will not expire.
This behavior confirms that your second idea (unsetting expiresAt) will work. You additionally mention that the solution should "ideally remov[e the] index on the document as well so it is not stored unnecessarily?" Based on that, I would approach this by using an index that is both TTL and Partial. For example:
db.foo.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 3600, partialFilterExpression: { expiresAt: { $exists: true } } }
)
If we have the following three documents:
test> db.foo.find()
[
  { _id: 1, expiresAt: ISODate("2022-11-27T17:09:23.394Z") },
  { _id: 2, expiresAt: ISODate("2022-11-27T17:09:26.718Z") },
  { _id: 3 }
]
We can see that only the first two are captured in our index, meaning that the third document is not taking up space and also will not expire:
test> db.foo.find().hint({expiresAt:1}).explain("executionStats").executionStats.totalKeysExamined
2
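With that index in place, "highlighting" a post is then just a matter of unsetting the field, along these lines (the filter is illustrative):

// Removing expiresAt means the document no longer matches the partial filter,
// so its index entry is removed and the TTL monitor will never delete it.
db.foo.updateOne({ _id: 1 }, { $unset: { expiresAt: "" } })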
If documents defaulted to having some highlighted field as false, then an alternative approach might be:
db.foo.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 3600, partialFilterExpression: { highlighted: false } }
)
The interesting thing about this approach is that it allows you to retain the original expiration value (in the original field) should it be needed for reference later for some reason:
test> db.foo.find()
[
  {
    _id: 1,
    expiresAt: ISODate("2022-11-27T17:09:23.394Z"),
    highlighted: false
  },
  {
    _id: 2,
    expiresAt: ISODate("2022-11-27T17:09:26.718Z"),
    highlighted: false
  },
  {
    _id: 3,
    expiresAt: ISODate("2022-11-27T17:13:25.929Z"),
    highlighted: true
  }
]
test> db.foo.find().hint({expiresAt:1}).explain("executionStats").executionStats.totalKeysExamined
2
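With this alternative index, highlighting would instead flip the flag while leaving expiresAt intact, for example:

// The document drops out of the partial TTL index (highlighted is no longer false),
// but the original expiration date is retained for later reference.
db.foo.updateOne({ _id: 3 }, { $set: { highlighted: true } })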

How to create an index for a complex MongoDB query

I need to create an index for the following query:
await Activity.find({
  $and: [
    {
      lastUpdated: {
        $gte: new Date(new Date().getTime() - 7 * 24 * 60 * 60 * 1000),
      },
    },
    { "followers._user": _user },
    { global: true },
  ]
})
  .collation({
    locale: "en_US",
    numericOrdering: true,
  })
  .sort({ lastUpdated: -1 })
  .skip(length)
  .limit(10)
I have the below index in place currently but the query does not use it.
ActivitiesSchema.index(
  { "followers._user": 1, global: 1, lastUpdated: -1 },
  {
    collation: {
      locale: "en_US",
      numericOrdering: true,
    },
  }
);
What can I try to solve this?
Change index to:
{ lastUpdated: -1, "followers._user": 1, global: 1 }
NB: this may affect other queries that rely on the existing index.
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/#sort-and-index-prefix reads:
If the sort keys correspond to the index keys or an index prefix, MongoDB can use the index to sort the query results. A prefix of a compound index is a subset that consists of one or more keys at the start of the index key pattern.
Since you are sorting by "lastUpdated", the index should start with it.
NB 2: With this change MongoDB can use the index, but it is not guaranteed to. There are many other factors like selectivity and cardinality; e.g. global: true implies extremely low cardinality, too low to benefit from an index on this field. On the other hand, if the user doesn't follow much and the total activity is massive, it might be cheaper to filter with the "followers._user" index and do an in-memory sort. It's up to the query planner to decide which index to use.
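In mongoose terms, the reordered index might look like this (keeping the same collation as the existing definition; whether you keep or drop the old index is up to you):

ActivitiesSchema.index(
  { lastUpdated: -1, "followers._user": 1, global: 1 },
  {
    collation: {
      locale: "en_US",
      numericOrdering: true,
    },
  }
);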

MongoDB - distinct with query doesn't use indexes

Using Mongo 3.2.
Let's say I have a collection with this schema:
{ _id: 1, type: "a", source: "x" },
{ _id: 2, type: "a", source: "y" },
{ _id: 3, type: "b", source: "x" },
{ _id: 4, type: "b", source: "y" }
Of course, my db is much larger, with many more types and sources.
I have created 4 index combinations of type and source (even though one should be enough):
{type: 1}
{source: 1},
{type: 1, source: 1},
{source: 1, type: 1}
Now, I am running this distinct query:
db.test.distinct("source", {type: "a"})
The problem is that this query takes much more time than it should.
If I run it with runCommand:
db.runCommand({distinct: 'test', key: "source", query: {type: "a"}})
This is the result I get:
{
  "waitedMS": 0,
  "values": [
    "x",
    "y"
  ],
  "stats": {
    "n": 19400840,
    "nscanned": 19400840,
    "nscannedObjects": 19400840,
    "timems": 14821,
    "planSummary": "IXSCAN { type: 1 }"
  },
  "ok": 1
}
For some reason, Mongo uses only the type: 1 index for the query stage.
It should use the index for the distinct stage as well.
Why is that? Using the {type: 1, source: 1} index would be much better, no? Right now it is scanning all the type: "a" documents even though there is an index covering them.
Am I doing something wrong? Do I have a better option for this kind of distinct?
As Alex mentioned, apparently MongoDB doesn't support this right now.
There is an open issue for it:
https://jira.mongodb.org/browse/SERVER-19507
Just drop the first two indexes. You don't need them: Mongo can use {type: 1, source: 1} in any query that would need the {type: 1} index (and likewise {source: 1, type: 1} covers {source: 1}).
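For example (collection name as in the question):

// Each single-field index is a prefix of one of the compound indexes,
// so both single-field indexes are redundant and can be dropped by key pattern.
db.test.dropIndex({ type: 1 })
db.test.dropIndex({ source: 1 })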

Storing null vs not storing the key at all in MongoDB

It seems to me that when you are creating a Mongo document and have a field {key: value} which is sometimes not going to have a value, you have two options:
Write {key: null} i.e. write null value in the field
Don't store the key in that document at all
Both options are easily queryable: in one you query for {key: null}, and in the other you query for {key: {$exists: false}}.
I can't really think of any differences between the two options that would have any impact in an application scenario (except that option 2 uses slightly less storage).
Can anyone tell me if there are any reasons one would prefer either of the two approaches over the other, and why?
EDIT
After asking the question it also occurred to me that indexes may behave differently in the two cases, e.g. a sparse index can be created for option 2.
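For what it's worth, such a sparse index might look like this (generic field name, sketch only):

// A sparse index skips documents that do not contain "key" at all, so with option 2
// those documents never take up index space (documents with key: null are still indexed).
db.collection.createIndex({ key: 1 }, { sparse: true })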
Indeed, you also have a third possibility:
key: "" (empty value)
And you are forgetting a specific behavior of the null value.
A query on key: null will retrieve all documents where key is null or where key doesn't exist, whereas a query on $exists: false will retrieve only documents where the key field doesn't exist.
To come back to your exact question, it depends on your queries and what the data represents.
If you need to record that, for example, a user set a value and then unset it, you should keep the field as null or empty. If you don't need that, you may remove the field.
Note that, since MongoDB doesn't use field-name dictionary compression, field: null consumes disk space and RAM, while storing no key at all doesn't consume those resources.
It really comes down to:
Your scenario
Your querying manner
Your index needs
Your language
I personally have chosen to store null keys. It makes it much easier to integrate into my app. I use PHP with Active Record, and using null values makes my life a lot easier since I am not putting the stress of field dependency upon the app. Also, I do not need any complex code to deal with magic for setting non-existent variables.
I personally would not store an empty value like "", since if you're not careful you could end up with two empty values, null and "", and then you'll have a haphazard time querying specifically. So I personally prefer null for empty values.
As for space and indexes: it depends on how many documents might not have this field, but I doubt you will really notice the index size increase due to a few extra docs with null in them. The difference in storage is minute, especially if the corresponding key name is small as well. That goes for large setups too.
I am quite frankly unsure of the index usage between $exists and null; however, null could be a more standardised method by which to query for existence. Remember that MongoDB is schemaless, which means there is no requirement to have that field in the doc, which again produces two "empty" values: non-existent and null. So it's better to choose one or the other.
I choose null.
Another point you might want to consider is when you use OGM tools like Hibernate OGM.
If you are using Java, Hibernate OGM supports the JPA standard. So if you can write a JPQL query, it would theoretically be easy to switch to an alternate NoSQL datastore that is supported by the OGM tool.
JPA does not define an equivalent for $exists in Mongo. So if you have optional attributes in your collection, you cannot write a proper JPQL query for them. In such a case, if the attribute's value is stored as NULL, then it is still possible to write a valid JPQL query like the one below.
SELECT p FROM pppoe p where p.logout IS null;
I think in terms of disk space the difference is negligible. If you need to create an index on this field, then consider a partial index.
An index with { partialFilterExpression: { key: { $exists: true } } } can be much smaller than a normal index.
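A minimal sketch of such a partial index (generic collection and field names):

// Only documents that actually contain "key" get an index entry.
db.collection.createIndex(
  { key: 1 },
  { partialFilterExpression: { key: { $exists: true } } }
)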
It should also be noted that queries behave differently; see values like this:
db.collection.insertMany([
  { _id: 1, a: 1 },
  { _id: 2, a: '' },
  { _id: 3, a: undefined },
  { _id: 4, a: null },
  { _id: 5 }
])

db.collection.aggregate([
  {
    $set: {
      type: { $type: "$a" },
      ifNull: { $ifNull: ["$a", true] },
      defined: { $ne: ["$a", undefined] },
      existing: { $ne: [{ $type: "$a" }, "missing"] }
    }
  }
])
{ _id: 1, a: 1, type: double, ifNull: 1, defined: true, existing: true }
{ _id: 2, a: "", type: string, ifNull: "", defined: true, existing: true }
{ _id: 3, a: undefined, type: undefined, ifNull: true, defined: false, existing: true }
{ _id: 4, a: null, type: null, ifNull: true, defined: true, existing: true }
{ _id: 5, type: missing, ifNull: true, defined: false, existing: false }
Or with db.collection.find():
db.collection.find({ a: { $exists: false } })
{ _id: 5 }
db.collection.find({ a: { $exists: true} })
{ _id: 1, a: 1 },
{ _id: 2, a: '' },
{ _id: 3, a: undefined },
{ _id: 4, a: null }
db.collection.find({ a: null })
{ _id: 3, a: undefined },
{ _id: 4, a: null },
{ _id: 5 }
db.collection.find({ a: {$ne: null} })
{ _id: 1, a: 1 },
{ _id: 2, a: '' },
db.collection.find({ a: {$type: "null"} })
{ _id: 4, a: null }