How to create an index for a complex MongoDB query

I need to create an index for the following query:
await Activity.find({
$and: [
{
lastUpdated: {
$gte: new Date(new Date().getTime() - 7 * 24 * 60 * 60 * 1000),
},
},
{
"followers._user": _user,
},
{
global: true,
}
]
})
.collation({
locale: "en_US",
numericOrdering: true,
})
.sort({
lastUpdated: -1
})
.skip(
length
)
.limit(
10
)
I have the below index in place currently but the query does not use it.
ActivitiesSchema.index(
{ "followers._user": 1, global: 1, lastUpdated: -1 },
{
collation: {
locale: "en_US",
numericOrdering: true,
},
}
);
What can I try to solve this?

Change the index to:
{ lastUpdated: -1, "followers._user": 1, global: 1 }
NB: this may affect other queries that rely on the existing index.
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/#sort-and-index-prefix reads:
If the sort keys correspond to the index keys or an index prefix, MongoDB can use the index to sort the query results. A prefix of a compound index is a subset that consists of one or more keys at the start of the index key pattern.
Since you are sorting by "lastUpdated", the index should start with it.
NB 2: With this change MongoDB can use the index, but it is not guaranteed to. Many other factors matter, such as selectivity and cardinality. For example, global: true implies a cardinality that is too low to benefit from an index on this field. On the other hand, if the user doesn't follow much and the total activity is massive, it might be cheaper to filter by the "followers._user" index and do an in-memory sort. It's up to the query planner to decide which index to use.
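The prefix rule quoted from the documentation can be sketched as a small standalone check. This is plain JavaScript, not a MongoDB API: sortMatchesIndexPrefix is a hypothetical helper, and it models only the prefix rule, plus the documented allowance that all sort directions may be inverted (a backward index scan). It says nothing about which plan the query planner will actually pick.

```javascript
// Hypothetical helper: does a sort spec line up with a prefix of an index key pattern?
// Both arguments are arrays of [field, direction] pairs so that key order is explicit.
function sortMatchesIndexPrefix(indexKeys, sortKeys) {
  if (sortKeys.length === 0 || sortKeys.length > indexKeys.length) return false;
  // Every sort field must sit at the same position in the index.
  const sameFields = sortKeys.every(([field], i) => indexKeys[i][0] === field);
  if (!sameFields) return false;
  // Directions must all match (forward scan) or all be inverted (backward scan).
  const forward = sortKeys.every(([, dir], i) => indexKeys[i][1] === dir);
  const backward = sortKeys.every(([, dir], i) => indexKeys[i][1] === -dir);
  return forward || backward;
}

const existingIndex = [["followers._user", 1], ["global", 1], ["lastUpdated", -1]];
const proposedIndex = [["lastUpdated", -1], ["followers._user", 1], ["global", 1]];
const sort = [["lastUpdated", -1]];

console.log(sortMatchesIndexPrefix(existingIndex, sort)); // false
console.log(sortMatchesIndexPrefix(proposedIndex, sort)); // true
```

The check is deliberately narrow. MongoDB can also use an index for a sort on a non-prefix key when the query has equality conditions on all the keys that precede it, which this sketch does not model.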

Related

MongoDB returns wrong rows with $sort, $skip and $limit

This is the function I'm using
MyModel.aggregate([
{ $match: query },
{ $sort: { 'createdAt': -1 } },
{ $skip: skip },
{ $limit: 10 }
], { allowDiskUse : true });
query filters the rows. skip is a dynamic value based on pagination (e.g. 0, 10, 20 ...). The problem is that the rows for each page are wrong. For instance, I can see a specific row on pages 1, 2, 3 and 4 at the same time! Some rows are missing as well.
Why is it happening?
I think the key to this question is the information that you shared in this comment:
createdAt for many rows has the same value. is it a probability for $sort not to work properly?
It's not necessarily that the sort doesn't "work properly"; it's that it doesn't have enough information to sort deterministically each time. The value 123 doesn't come before or after another value of 123 on its own. As @Noel points out, you need to provide one or more additional fields to your sort.
This is also covered here in the documentation:
MongoDB does not store documents in a collection in a particular order. When sorting on a field which contains duplicate values, documents containing those values may be returned in any order.
If consistent sort order is desired, include at least one field in your sort that contains unique values. The easiest way to guarantee this is to include the _id field in your sort query.
This is because the _id field is unique. If you take that approach it would change your sort to:
{ $sort: { 'createdAt': -1, _id: 1 } },
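The effect of the tie-breaker can be illustrated without a database. The sketch below is plain JavaScript over made-up in-memory documents (the sample data and the helper names byCreatedAtThenId and getPage are assumptions for illustration). It shows that once _id is part of the sort, the pages come out the same regardless of the order in which the documents happen to be stored.

```javascript
// Compare by createdAt descending, then by _id ascending as a tie-breaker.
function byCreatedAtThenId(a, b) {
  if (a.createdAt !== b.createdAt) return b.createdAt - a.createdAt;
  return a._id < b._id ? -1 : a._id > b._id ? 1 : 0;
}

// Return one page from an unordered set of documents.
function getPage(docs, skip, limit) {
  return [...docs].sort(byCreatedAtThenId).slice(skip, skip + limit);
}

// Made-up sample data: four documents share the same createdAt value.
const docs = [
  { _id: "a", createdAt: 123 },
  { _id: "b", createdAt: 123 },
  { _id: "c", createdAt: 123 },
  { _id: "d", createdAt: 123 },
  { _id: "e", createdAt: 200 },
];

// The same documents arriving in a different storage order.
const reordered = [docs[3], docs[1], docs[4], docs[0], docs[2]];

const ids = (page) => page.map((d) => d._id).join(",");
console.log(ids(getPage(docs, 0, 2)), ids(getPage(reordered, 0, 2))); // e,a e,a
console.log(ids(getPage(docs, 2, 2)), ids(getPage(reordered, 2, 2))); // b,c b,c
```

Without the _id comparison, the relative order of a, b, c and d would depend on the input order, which is how the same row can show up on several pages.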
The page size should be used for limit, and skip should be computed from the page number as (page - 1) * limit.
For example:
const limit = 10;
const page = 1;
const skip = (page - 1) * limit; // basically is 0
MyModel.aggregate([
{ $match: query },
{ $sort: { 'createdAt': -1 } },
{ $skip: skip },
{ $limit: limit }
], { allowDiskUse : true });
This query skips 0 documents and returns the 1st through 10th results.
If the requested page is 2:
const limit = 10;
const page = 2;
const skip = (page - 1) * limit; // basically is 10
MyModel.aggregate([
{ $match: query },
{ $sort: { 'createdAt': -1 } },
{ $skip: skip },
{ $limit: limit }
], { allowDiskUse : true });
This query skips 10 documents and returns the 11th through 20th results.
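The two answers can be combined in one place. The following is a minimal sketch, assuming a 1-based page number; buildPagePipeline is a hypothetical helper name and the { status: "active" } filter is made up. It only builds the array of stages and does not talk to MongoDB.

```javascript
// Hypothetical helper that returns the aggregation stages for a 1-based page number.
function buildPagePipeline(query, page, limit = 10) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be an integer >= 1");
  }
  const skip = (page - 1) * limit;
  return [
    { $match: query },
    // _id breaks ties between documents that share the same createdAt.
    { $sort: { createdAt: -1, _id: 1 } },
    { $skip: skip },
    { $limit: limit },
  ];
}

const pipeline = buildPagePipeline({ status: "active" }, 3);
console.log(pipeline[2].$skip); // 20
console.log(pipeline[3].$limit); // 10
```

The returned array would be passed to MyModel.aggregate(...) in place of the inline stages shown in the question.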

Mongodb Index behavior different in PROD and Testing environment

I have 2 dedicated Mongo clusters which have exactly the same model and indexes, and we query both environments the same way, but the results are different.
user.model.js
const schema = mongoose.Schema({
_id: ObjectId,
role: {
type: String,
enum: ['user', 'admin'],
required: true,
},
score: { type: Number, default: 0 },
deactivated: { type: Date },
});
schema.index(
{ deactivated: 1, role: 1, score: -1 },
{ name: 'search_index', collation: { locale: 'en', strength: 2 } }
);
I noticed that one of our common queries was causing issues on the PROD environment.
The query looks like this:
db.getCollection('users')
.find({deactivated: null, role: 'user'})
.sort({score: -1})
.limit(10)
.collation({locale: 'en', strength: 2})
On the Testing environment the query runs as expected, fully utilizing the index (~80K records total, 1300 deactivated).
But in our PROD env the query seems to be using only the first part of the compound index (~50K records total, ~20K deactivated).
The executionStats looks like:
As we can see it is using at least the first part of the index to only search in non-deactivated records, but the SORT is in memory.
This is a legacy application so the first thing I did was ensure that the types of the indexed fields are following the schema in all the records.
I wonder if it could be the "role" collation somehow?
Any hint or clue will be greatly appreciated. Thanks in advance.
Thanks for providing the plans. It is a combination of a few things (including the multikeyness of the production index) that is causing the problem.
There are a few ways to potentially solve this, let's start with the obvious question. Is score supposed to be an array?
The schema suggests not. With MongoDB, an index becomes multikey once a single document is inserted that has an array (even an empty one) for a key in the index. There is no way to "undo" this change apart from rebuilding the index. If the field is not supposed to contain an array, then I would suggest fixing any documents that contain the incorrect data and then rebuilding the index. As this is production, you may want to build a temporary index to reduce the impact on the application while the original index is dropped and recreated. You may also want to look into schema validation to help prevent incorrect data from getting inserted in the future.
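Locating the offending documents is the first step of that cleanup. The sketch below shows the idea on made-up in-memory data (findArrayValuedDocs and the sample users are assumptions for illustration). On a real collection, a filter along the lines of { score: { $type: "array" } } would likely play the same role, but that is an assumption about your deployment rather than something taken from the plans.

```javascript
// Hypothetical check: which documents hold an array in any of the indexed fields?
function findArrayValuedDocs(docs, indexedFields) {
  return docs.filter((doc) => indexedFields.some((f) => Array.isArray(doc[f])));
}

// Made-up sample data; the second document has score stored as an (empty) array.
const users = [
  { _id: 1, role: "user", score: 10, deactivated: null },
  { _id: 2, role: "user", score: [], deactivated: null },
  { _id: 3, role: "admin", score: 7, deactivated: null },
];

const offenders = findArrayValuedDocs(users, ["deactivated", "role", "score"]);
console.log(offenders.map((d) => d._id)); // [ 2 ]
```

A single such document is enough to mark the index as multikey, which matches the multiKeyPaths entry for score reported in the plan.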
If score can be an array, then we'll need to take a different approach. We can see in the UAT plan that a SORT_MERGE is used. The only reason that stage is required is because {"deactivated" : null} seems to have an additional index bound looking for undefined. That may be some internal implementation quirk as that BSON type appears to be deprecated. So updating the data to have an explicit false value for this field and using that check in the query predicate (rather than a check for null) will remove the need to split the plan out with a SORT_MERGE and will probably allow the multikey index to provide the sort:
winningPlan: {
stage: 'LIMIT',
limitAmount: 10,
inputStage: {
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { deactivated: 1, role: 1, score: -1 },
indexName: 'search_index',
collation: {
locale: 'en',
caseLevel: false,
caseFirst: 'off',
strength: 2,
numericOrdering: false,
alternate: 'non-ignorable',
maxVariable: 'punct',
normalization: false,
backwards: false,
version: '57.1'
},
isMultiKey: true,
multiKeyPaths: { deactivated: [], role: [], score: [ 'score' ] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: {
deactivated: [ '[false, false]' ],
role: [
'[CollationKey(0x514d314b0108), CollationKey(0x514d314b0108)]'
],
score: [ '[MaxKey, MinKey]' ]
}
}
}
}

Mongodb querying - How to specify index traversal order?

Context
I have a users collection with an array of key-value pairs like so:
{
name: 'David',
customFields: [
{ key: 'position', value: 'manager' },
{ key: 'department', value: 'HR' }
]
},
{
name: 'Allison',
customFields: [
{ key: 'position', value: 'employee' },
{ key: 'department', value: 'IT' }
]
}
The field names in customFields are configurable by the application users, so in order for it to be indexable I store them as an array of key-value pairs. Index { 'customFields.key': 1, 'customFields.value': 1} works quite well. Problem is when I want to retrieve the results in either order (ascending or descending). Say I want to sort all users having the custom field position in either order. The below query gives me ascending order:
db.users.find({ customFields: { $elemMatch: { key: 'position' } } })
However, I couldn't figure out how to get the opposite order. I'm pretty sure that, with the way the indexes are structured, if I could tell MongoDB to traverse the index backwards I would get what I want. The other option is to have another two indexes to cover both directions, but that is more costly.
Is there a way to specify the traversal direction? If not, what are some good alternatives? Much appreciated.
EDIT
I'd like to further clarify my situation. I have tried:
db.users.find({ customFields: { $elemMatch: { key: 'position' } } })
.sort({ 'customFields.key': 1, 'customFields.value': 1 })
db.users.find({ customFields: { $elemMatch: { key: 'position' } } })
.sort({'customFields.value': 1 })
These two queries only sort the documents after they have been filtered, meaning the sorting is applied on all the custom fields they have, not on the field matching the query (position in this case). So it seems using the sort method won't be of any help.
Not using any sorting conveniently returns the documents in the correct order in the ascending case. As can be seen in the explain result:
"direction" : "forward",
"indexBounds" : {
"customFields.key" : [
"[\"position\", \"position\"]"
],
"customFields.value" : [
"[MinKey, MaxKey]"
]
}
I'm using an exact match for customFields.key so I don't care about its order. Within each value of customFields.key, the values of customFields.value are arranged in the order specified in the index, so I just take them out as they are and all is good. Now if I could do something like:
db.users.find({ customFields: { $elemMatch: { key: 'position' } } })
.direction('backwards')
It would be the equivalent of using the index { 'customFields.key': -1, 'customFields.value': -1 }. Like I said I don't care about customFields.key order, and 'customFields.value': -1 gives me exactly what I want. I've searched mongodb documentation but I couldn't find anything similar to that.
I could retrieve all the documents matching customFields.key and do the sorting myself, but it's expensive as the number of documents could get very big. Ideally all the filtering and sorting is solved by the index.
Here is the compound index you created:
db.users.createIndex( { "customFields.key": 1, "customFields.value": 1 } )
This index will allow traversal either using both fields in ascending order:
db.users.find().sort( { "customFields.key": 1, "customFields.value": 1 } )
Or, it can also support traversal using both keys in descending order:
db.users.find().sort( { "customFields.key": -1, "customFields.value": -1 } )
If you need mixed behavior, i.e. ascending on one field but descending on the other, then you would need to add a second index:
db.users.createIndex( { "customFields.key": 1, "customFields.value": -1 } )
Note that after adding this second index, all four possible combinations are now supported.
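Why one index serves both the all-ascending and all-descending sorts, but not the mixed ones, can be seen with a small in-memory model. This is plain JavaScript over made-up (key, value) entries, not MongoDB itself; compareBy and the sample entries are assumptions for illustration.

```javascript
// Build a comparator from [field, direction] pairs, mimicking an index key pattern.
function compareBy(spec) {
  return (a, b) => {
    for (const [field, dir] of spec) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0;
  };
}

// Made-up index entries, as if extracted from customFields.
const entries = [
  { key: "position", value: "manager" },
  { key: "department", value: "HR" },
  { key: "position", value: "employee" },
  { key: "department", value: "IT" },
];

// Order of an index on { key: 1, value: 1 }, and the same index read backwards.
const ascending = [...entries].sort(compareBy([["key", 1], ["value", 1]]));
const backward = [...ascending].reverse();
// Orders requested by a { key: -1, value: -1 } sort and by a mixed sort.
const descending = [...entries].sort(compareBy([["key", -1], ["value", -1]]));
const mixed = [...entries].sort(compareBy([["key", 1], ["value", -1]]));

const show = (list) => list.map((e) => `${e.key}=${e.value}`).join(" ");
console.log(show(backward) === show(descending)); // true
console.log(show(mixed) === show(ascending) || show(mixed) === show(backward)); // false
```

Reading the ascending order backwards reproduces the all-descending order exactly, while the mixed order matches neither direction of traversal, which is why it needs its own index.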

How to find all documents from partial index in mongoDB?

Suppose I have a database users with a partial index:
db.users.createIndex(
{ username: 1 },
{ unique: true, partialFilterExpression: { age: { $gte: 21 } } }
)
I want to find all documents for query:
db.users.find(
{age: { $gte: 21 }}
)
However, this query will not use my index despite the fact that all I need is to find all documents from the index.
What should I do to use this index for my purpose?
Try this query
db.users.find({ username:{ $ne:null }, age:{ $gte: 21 }})
If you have null users, try:
db.users.find({$or:[{username:{ $ne:null}},{username: {$eq:null }}], age:{ $gte: 21 }})
Considerations
MongoDB requires the indexed field to be in the query in order to use an index.
There may be other suitable conditions ($exists won't work), but I find this useful enough.
Add .explain() to see the parsed Query and the usage of Indexes.
Tested
I guess this one should also work (but I did not test):
db.users.find({}).min({}).hint({ username: 1 })
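It may help to picture what a partial index contains. The toy model below is plain JavaScript over made-up documents, not MongoDB's implementation; buildPartialIndex and the sample users are assumptions for illustration. It shows that the index holds entries only for documents that satisfy the filter expression, so a query can rely on it only when every document it might match is guaranteed to be in the index.

```javascript
// Toy model: a partial index holds entries only for documents that pass the filter.
function buildPartialIndex(docs, field, filter) {
  return docs
    .filter(filter)
    .map((doc) => ({ key: doc[field], _id: doc._id }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Made-up sample data.
const users = [
  { _id: 1, username: "david", age: 35 },
  { _id: 2, username: "allison", age: 19 },
  { _id: 3, username: "noel", age: 21 },
];

const index = buildPartialIndex(users, "username", (doc) => doc.age >= 21);
console.log(index.map((e) => e.key)); // [ 'david', 'noel' ]

// A query for age >= 18 cannot be answered from this index alone:
// document 2 matches that query but has no index entry.
const matchesWiderQuery = users.filter((doc) => doc.age >= 18).length;
console.log(matchesWiderQuery > index.length); // true
```

This is why the query predicate has to include the filter expression, or a condition that selects a subset of it, before the partial index becomes eligible.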

MongoDb Indexing for MultiTenant

I have two collections, customSchemas and customData. Besides the default _id index, I've added the following indexes:
db.customData.createIndex( { "orgId": 1, "contentType": 1 });
db.customSchemas.createIndex( { "orgId": 1, "contentType": 1 }, { unique: true });
I've decided to enforce orgId on all calls, so in my service layer, every query has an orgId in it, even the ones with ids, e.g.
db.customData.find({"_id" : ObjectId("557f30402598f1243c14403c"), orgId: 1});
Should I add an index that has both _id and orgId in it? Do the indexes I have currently help at all when I'm searching by both _id and orgId?
MongoDB 2.6+ provides an index intersection feature that applies to your case by intersecting the _id index {_id: 1} with the orgId prefix of the index { "orgId": 1, "contentType": 1 }.
So your query {"_id" : ObjectId("557f30402598f1243c14403c"), orgId: 1} should already be supported by the existing indexes.
However, index intersection is less performant than a compound index on {"_id" : 1, orgId: 1}, as it comes with an extra step (intersecting the two sets). Hence, if this is a query that you use most of the time, creating the compound index for it is a good idea.
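Whichever index ends up serving the query, the question's rule of putting orgId on every call can be enforced in one place. The following is a minimal sketch of such a service-layer helper; scopeToOrg is a hypothetical name and the contentType value is made up. It only builds the filter object and does not touch the database.

```javascript
// Hypothetical service-layer helper: add the tenant's orgId to every filter.
function scopeToOrg(orgId, filter = {}) {
  if (orgId === undefined || orgId === null) {
    throw new Error("orgId is required for every query");
  }
  // Refuse filters that already target another tenant.
  if ("orgId" in filter && filter.orgId !== orgId) {
    throw new Error("filter targets a different orgId");
  }
  return { ...filter, orgId };
}

const filter = scopeToOrg(1, { contentType: "article" });
console.log(filter); // { contentType: 'article', orgId: 1 }
```

The result would be passed to db.customData.find(...) or its driver equivalent, so that no query can skip the tenant condition by accident.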