Suppose I have a database users with a partial index:
db.users.createIndex(
{ username: 1 },
{ unique: true, partialFilterExpression: { age: { $gte: 21 } } }
)
I want to find all documents for query:
db.users.find(
{age: { $gte: 21 }}
)
However, this query will not use my index, even though all I need is every document contained in the index.
What should I do to use this index for my purpose?
Try this query:
db.users.find({ username: { $ne: null }, age: { $gte: 21 } })
If some documents have a null username, try:
db.users.find({ $or: [ { username: { $ne: null } }, { username: { $eq: null } } ], age: { $gte: 21 } })
Considerations
To use an index, MongoDB requires the indexed field to appear in the query predicate
There may be other suitable conditions ($exists won't work), but I find this useful enough.
Add .explain() to see the parsed query and which indexes are used.
Tested
I guess this one should also work (but I did not test):
db.users.find({}).min({}).hint({ username: 1 })
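As a rough in-memory illustration of which documents a partial index like this contains (sample documents assumed, not from the question):

```javascript
// Sample users; only documents satisfying the partialFilterExpression
// ({ age: { $gte: 21 } }) get an entry in the partial index.
const users = [
  { username: 'alice', age: 25 },
  { username: 'bob', age: 18 },   // not in the index: age < 21
  { username: null, age: 30 },    // in the index, with a null username key
];

// Entries the partial index would contain:
const indexedEntries = users.filter(u => u.age >= 21);
console.log(indexedEntries.length); // 2
```

This is why the $or variant above matters: the index subset can contain entries whose username key is null, and username: { $ne: null } alone would skip them.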
Related
{
field_1: "string", // can only have the value "A" or "B"
field_2: "numeric"
}
The above is the schema for my collection.
The following compound index exists:
{
field_1: 1,
field_2: 1
}
The query in question is below:
db.col.find( { field_2: { $gt: 100 } } )
This query skips the prefix field_1. Hence MongoDB does not use the compound index.
So in order to get it to use the compound index, I change the query to this:
db.col.find( { field_1: { $in: ["A", "B"] }, field_2: { $gt: 100 } } )
Would MongoDB use the compound index in the second query?
Would there be any performance benefits either way?
If there is a performance benefit to the second query in some cases, are there cases where performance would actually be worse?
Yes, MongoDB will use the compound index for the second query.
There will be some performance benefit, but that will depend on how big your collection is, how many documents the query returns relative to the whole collection, and so on.
You can check the execution stats for yourself by using explain.
db.col.find({ field_1: { $in: ["A", "B"] }, field_2: { $gt: 100 } }).explain("executionStats")
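As a quick sanity check that the rewrite returns the same documents when field_1 can only ever be "A" or "B" (sample documents assumed):

```javascript
// Sample documents; field_1 only ever holds "A" or "B".
const docs = [
  { field_1: 'A', field_2: 150 },
  { field_1: 'B', field_2: 50 },
  { field_1: 'B', field_2: 200 },
];

// Original query: field_2 > 100 only.
const q1 = docs.filter(d => d.field_2 > 100);

// Rewritten query: $in over every possible field_1 value, plus field_2 > 100.
const q2 = docs.filter(d => ['A', 'B'].includes(d.field_1) && d.field_2 > 100);

console.log(JSON.stringify(q1) === JSON.stringify(q2)); // true
```

The $in predicate filters nothing out; it only gives the planner a predicate on the index prefix, so the compound index becomes usable.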
I have a bunch of records I want to upsert for specific product ids. Depending on previous calculations I want to record what type that product was in the current week/year.
Problem is that I can't figure out a way to do this except for one at a time. Right now I'm doing:
a_group.forEach(p => {
  db.abc.update(
    { product_id: p._id, year: 2021 },
    { $set: { 'abc.34': 'a' } },
    { upsert: true }
  );
});
Where a_group is just an array of products.
This is really heavy in case of a large products array. It just does a_group.length upsert operations.
Ideally I would like to do something like:
db.abc.update(
  { product_id: { $in: a_group.map(p => p._id) }, year: 2021 },
  { $set: { 'abc.34': 'a' } },
  { upsert: true, multi: true }
);
Which would see that a_group is an array and try to match and upsert for every single item in the array. Except that doesn't work.
Any help would be very much appreciated.
The problem here is that you want a separate upsert for each discrete value of product_id.
From the docs:
An upsert:
Updates documents that match your query filter
Inserts a document if there are no matches to your query filter
In the case of an upsert such as:
updateMany(
  { a: { $in: [1, 2, 3] } },
  { $set: { b: true } },
  { upsert: true }
)
If there exists a document containing a: 3, then it will match the query filter, and therefore that one document will be updated, and no inserts will occur.
In the event that no document matches any of the values passed to $in, a single new document will be upserted. Since the query has no way to determine which value of a you wanted, it will create a document containing {b: true} but will leave the field a unset.
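A toy sketch of that behavior (my own simplified in-memory simulation, not the server's actual code):

```javascript
// Simulates updateMany({ a: { $in: inValues } }, { $set: setFields }, { upsert: true })
// against an in-memory array of documents.
function upsertWithIn(coll, inValues, setFields) {
  const matches = coll.filter(d => inValues.includes(d.a));
  if (matches.length > 0) {
    matches.forEach(d => Object.assign(d, setFields)); // update path: no insert happens
  } else {
    coll.push({ ...setFields }); // insert path: `a` cannot be inferred, so it is omitted
  }
  return coll;
}

console.log(upsertWithIn([{ a: 3 }], [1, 2, 3], { b: true }));
// [ { a: 3, b: true } ]  -- the matching document is updated, nothing is inserted
console.log(upsertWithIn([], [1, 2, 3], { b: true }));
// [ { b: true } ]        -- a single document is inserted without `a`
```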
What you probably want is a bulk operation that can perform many upsert operations with a single call to the database.
Using the mongosh shell, that might look like:
let ops = [];
a_group.forEach(p => {
  ops.push({
    updateOne: {
      filter: { product_id: p._id, year: 2021 },
      update: { $set: { 'abc.34': 'a' } },
      upsert: true
    }
  });
});
db.abc.bulkWrite(ops)
Check the docs for the driver you are using to see how to do a bulk operation.
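The same ops array can also be built with a single map; a sketch with a hypothetical a_group:

```javascript
// Hypothetical product list; in practice a_group comes from your earlier calculations.
const a_group = [{ _id: 'p1' }, { _id: 'p2' }, { _id: 'p3' }];

// One upsert operation per product; bulkWrite sends them all in a single call.
const ops = a_group.map(p => ({
  updateOne: {
    filter: { product_id: p._id, year: 2021 },
    update: { $set: { 'abc.34': 'a' } },
    upsert: true,
  },
}));
// Then: db.abc.bulkWrite(ops)
```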
According to your code, a_group is an array of objects, since you use p._id in the forEach.
You should extract the _id values from a_group into a new array, for example productsId,
and replace
product_id: { $in: a_group }
with
product_id: { $in: productsId }
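In plain JavaScript, that id extraction is a one-liner (a_group contents are assumed):

```javascript
// Hypothetical a_group contents.
const a_group = [{ _id: 'p1' }, { _id: 'p2' }];

// Pull the ids out into their own array...
const productsId = a_group.map(p => p._id);
console.log(productsId); // [ 'p1', 'p2' ]
// ...and use them as { product_id: { $in: productsId } } in the filter.
```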
I want to know the performance difference between countDocuments and a find query.
I have to find the count of documents matching a certain filter. Which approach is better and takes less time?
db.collection.countDocuments({ userId: 12 })
or
db.collection.find({ userId: 12 }) and then using the length of the resulting array.
You should definitely use db.collection.countDocuments() if you don't need the data. This method uses an aggregation pipeline with the filter you pass in and returns only the count, so you don't waste processing and time waiting for an array with all the results.
This:
db.collection.countDocuments({ userId: 12 })
Is equivalent to:
db.collection.aggregate([
{ $match: { userId: 12 } },
{ $group: { _id: null, n: { $sum: 1 } } }
])
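As a rough in-memory analogue of what that count-only pipeline computes (sample documents assumed):

```javascript
// In-memory collection standing in for db.collection.
const coll = [{ userId: 12 }, { userId: 12 }, { userId: 7 }];

// countDocuments-style: accumulate a count without materializing the matches,
// mirroring the server-side $match + $group pipeline.
const n = coll.reduce((acc, d) => (d.userId === 12 ? acc + 1 : acc), 0);
console.log(n); // 2
```

With find().length you would first build the full array of matching documents and then throw everything away except its length.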
I'm using MongoDB version 4.2.0. I have a collection with the following indexes:
{uuid: 1},
{unique: true, name: "uuid_idx"}
and
{field1: 1, field2: 1, _id: 1},
{unique: true, name: "compound_idx"}
When executing this query
aggregate([
{"$match": {"uuid": <uuid_value>}}
])
the planner correctly selects uuid_idx.
When adding this sort clause
aggregate([
{"$match": {"uuid": <uuid_value>}},
{"$sort": {"field1": 1, "field2": 1, "_id": 1}}
])
the planner selects compound_idx, which makes the query slower.
I would expect the sort clause to not make a difference in this context. Why does Mongo not use the uuid_idx index in both cases?
EDIT:
A little clarification, I understand there are workarounds to use the correct index, but I'm looking for an explanation of why this does not happen automatically (if possible with links to the official documentation). Thanks!
Why is this happening?
Let's understand how Mongo chooses which index to use, as explained here.
If a query can be satisfied by multiple indexes defined on the collection ("satisfied" is used loosely here, as Mongo actually considers all possibly relevant indexes), MongoDB will test all the applicable indexes in parallel. The first index that returns 101 results is selected by the query planner.
Meaning that for that particular query, that index actually wins.
What can we do?
We can use hint. A hint forces Mongo to use a specific index; however, this is not recommended, because if changes occur (to your data or indexes), Mongo will not adapt.
The query:
aggregate(
[
{ $match : { uuid : "some_value" } },
{ $sort : { field1: 1, field2: 1, _id: 1 } }
],
)
doesn't use the index "uuid_idx".
There are a couple of options you can work with to use indexes on both the match and sort operations:
(1) Define a new compound index: { uuid: 1, field1: 1, field2: 1, _id: 1 }
Both the match and match+sort queries will use this index (for both the match and sort operations).
(2) Use the hint on the uuid index (using existing indexes)
Both the match and match+sort queries will use this index (for both the match and sort operations).
aggregate(
[
{ $match : { uuid : "some_value" } },
{ $sort : { field1: 1, field2: 1, _id: 1 } }
],
{ hint: "uuid_idx"}
)
If you can use find instead of aggregate, it will use the right index. So this is still a problem in the aggregation pipeline.
So, I tried to query
db.collection('collection_name').aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}])
the query above takes 20s
but If I tried to query
db.collection('collection_name').aggregate([{$sort : {_id : -1}}])
it only takes 0.7s
Why is the query without $match actually faster than the one with $match?
update :
when I try this query
db.getCollection('callbackvirtualaccounts').aggregate([
{
$match: { owner_id: '5860457640b4fe652bd9c3eb' }
},
{
$sort: { created: -1 }
}
])
it only takes 0.781s
Why is sorting by _id slower than sorting by the created field?
Note: I'm using MongoDB v3.0.0
db.collection('collection_name').aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}])
This collection probably doesn't have an index on owner_id. Try one of the index creation queries below and rerun your previous code.
db.collection('collection_name').createIndex({ owner_id: 1 }) // Simple index
or
db.collection('collection_name').createIndex({ owner_id: 1, _id: -1 }) // Compound index
Note: if you don't know how to build a compound index yet, you can create simple indexes individually on all keys used in either the match or the sort, and that should make the query efficient as well.
The query speed depends upon a lot of factors. The size of collection, size of the document, indexes defined on the collection (and used in the queries and properly), the hardware components (like CPU, RAM, network) and other processes running at the time the query is running.
For further analysis, you have to tell us what indexes are defined on the collection being discussed. This command will retrieve them: db.collection.getIndexes()
Note the unique index on the _id field is created by default, and cannot be modified or deleted.
(i)
But if I tried to query: db.collection.aggregate( [ { $sort : { _id : -1 } } ] ) it
only takes 0.7s.
The query is faster because there is an index on the _id field and it is used in the sort process. Aggregation queries can use indexes in the sort stage when the sort happens early in the pipeline. You can verify whether the index is used by generating a query plan (use explain with executionStats mode); there will be an index scan (IXSCAN) in the generated query plan.
(ii)
db.collection.aggregate([
{
$match: { owner_id: '5be9b2f03ef77262c2bd49e6' }
},
{
$sort: { _id: -1 }
}
])
The query above takes 20s.
When I try this query it only takes 0.781s.
db.collection.aggregate([
{
$match: { owner_id: '5860457640b4fe652bd9c3eb' }
},
{
$sort: { created: -1 }
}
])
Why is sorting by _id slower than sorting by the created field?
We cannot come to any conclusions with the available information. In general, $match and $sort stages that appear early in the aggregation pipeline can use any indexes created on the fields they reference.
Generating a query plan will reveal what the issues are.
Please run explain with executionStats mode and post the query plan details for all the queries in question. There is documentation for MongoDB v3.0.0 on generating query plans using explain: db.collection.explain()