I am querying a collection where, inside a single array field, two subfields must match specific values and two other subfields only need { $exists: true }, i.e.:
db.collection.aggregate([
{ $match: { array: { $elemMatch: { field1: "value1", field2: "value2", field3: { $exists: true }, field4: { $exists: true } } } } },
{ $unwind: "$array" },
{ $project: ... }
])
I have created three indexes:
index 1: { field1: 1, field2: 1 }
index 2: { field1: 1, field2: 1, field3: 1 }
index 3: { field1: 1, field2: 1, field3: 1, field4: 1 }
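For reference, these could be created like so (a sketch; assuming field1–field4 live inside the array's subdocuments, so the index keys use dotted paths):

```javascript
db.collection.createIndex({ "array.field1": 1, "array.field2": 1 })                                       // index 1
db.collection.createIndex({ "array.field1": 1, "array.field2": 1, "array.field3": 1 })                    // index 2
db.collection.createIndex({ "array.field1": 1, "array.field2": 1, "array.field3": 1, "array.field4": 1 }) // index 3
```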
When I run explain() on the query, the winning plan always picks index 1.
Can I create a compound index that includes all four fields to speed up my query? (I have tried partial indexes on fields 3 and 4, but it made no difference.)
{
field_1: "string", // can only have the value "A" or "B"
field_2: "numeric",
}
The above is the schema for my collection.
The following compound index exists:
{
field_1: 1,
field_2: 1
}
The query in question is below:
db.col.find( { field_2: { $gt: 100 } } )
This query skips the prefix field_1. Hence MongoDB does not use the compound index.
So in order to get it to use the compound index, I change the query to this:
db.col.find( { field_1: { $in: ["A", "B"] }, field_2: { $gt: 100 } } )
Would MongoDB use the compound index in the second query?
Would there be any performance benefits either way?
If there is a performance benefit in some cases to using the second query, are there cases where the performance would actually be worse?
Yes, MongoDB will use the index for the second query.
There will be some performance benefit, but how much depends on how big your collection is, how many documents the query returns relative to the whole collection, etc.
You can check the execution stats for yourself using explain:
db.col.find({ field_1: { $in: ["A", "B"] }, field_2: { $gt: 100 } }).explain("executionStats")
I'm using MongoDB version 4.2.0. I have a collection with the following indexes:
{uuid: 1},
{unique: true, name: "uuid_idx"}
and
{field1: 1, field2: 1, _id: 1},
{unique: true, name: "compound_idx"}
When executing this query
aggregate([
{"$match": {"uuid": <uuid_value>}}
])
the planner correctly selects uuid_idx.
When adding this sort clause
aggregate([
{"$match": {"uuid": <uuid_value>}},
{"$sort": {"field1": 1, "field2": 1, "_id": 1}}
])
the planner selects compound_idx, which makes the query slower.
I would expect the sort clause to not make a difference in this context. Why does Mongo not use the uuid_idx index in both cases?
EDIT:
A little clarification, I understand there are workarounds to use the correct index, but I'm looking for an explanation of why this does not happen automatically (if possible with links to the official documentation). Thanks!
Why is this happening?:
Let's understand how Mongo chooses which index to use, as explained here.
If a query can be satisfied by multiple indexes defined in the collection ("satisfied" is used loosely, as Mongo actually selects all possibly relevant indexes), MongoDB will trial all of the applicable plans in parallel. The first index that returns 101 results is selected by the query planner.
Meaning that for that particular query, that index actually wins.
What can we do?:
We can use hint. hint basically forces Mongo to use a specific index; however, this is not recommended, because if your data or queries change, Mongo will not adapt to those changes.
The query:
aggregate(
[
{ $match : { uuid : "some_value" } },
{ $sort : { fld1: 1, fld2: 1, _id: 1 } }
],
)
doesn't use the index "uuid_idx".
There are a couple of options you can work with to use indexes on both the match and sort operations:
(1) Define a new compound index: { uuid: 1, fld1: 1, fld2: 1, _id: 1 }
Both the match and match+sort queries will use this index (for both the match and sort operations).
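In the shell, option (1) would look something like this (a sketch; the index name is an assumption):

```javascript
db.collection.createIndex(
  { uuid: 1, fld1: 1, fld2: 1, _id: 1 },
  { name: "uuid_sort_idx" } // hypothetical name
)
```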
(2) Use the hint on the uuid index (using existing indexes)
Both the match and match+sort queries will use this index (for both the match and sort operations).
aggregate(
[
{ $match : { uuid : "some_value" } },
{ $sort : { fld1: 1, fld2: 1, _id: 1 } }
],
{ hint: "uuid_idx"}
)
If you can use find instead of aggregate, it will use the right index. So this is still a problem in the aggregation pipeline.
I need an index that enforces uniqueness of a value across all fields. For example, I have the document:
{
_id: ObjectId("123"),
fieldA: "a",
fieldB: "b"
}
and I want to forbid inserting the document
{
_id: ObjectId("456"),
fieldA: "new value for field a",
fieldB: "a"
}
because a document already exists with the value "a" set in field "fieldA". Is it possible?
It seems you need a multikey index with a unique constraint.
Keep in mind that a compound index can include at most one array field; for this reason, you have to gather all the fields whose values you want to keep unique inside a single array:
{
_id: ObjectId("123"),
multikey: [
{fieldA: "a"},
{fieldB: "b"}
]
}
Give this a try:
db.collection.createIndex( { "multikey": 1 }, { unique: true } )
To query, you would write:
db.collection.findOne({ "multikey.fieldA": "a" },                     // query
                      { "multikey.fieldA": 1, "multikey.fieldB": 1 }) // projection
For more info you can take a look at embedded multikey documents.
Hope this helps.
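One caveat worth illustrating (the values below are made up): the unique constraint compares whole array elements, so { fieldA: "a" } and { fieldB: "a" } are distinct values and would not collide. Storing bare values in the array is the simplest way to make the values themselves unique across documents:

```javascript
db.collection.createIndex({ "multikey": 1 }, { unique: true })

db.collection.insertOne({ _id: 1, multikey: ["a", "b"] }) // ok: "a" and "b" are now indexed
db.collection.insertOne({ _id: 2, multikey: ["c", "a"] }) // fails with a duplicate key error: "a" is taken
```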
Another option is to create one document per unique key, indexed by that key, and loop over the fields of each candidate document, cancelling the write if any key is found.
IMO this solution is more resource-consuming, but in exchange it gets you a list of all the keys consumed by written documents.
db.collection.createIndex( { "unikey": 1 }, { unique: true } )
db.collection.insertMany( [{ "unikey": "$FieldA" }, { "unikey": "$FieldB" }] )
db.collection.find({}, { "unikey": 1 })
I have two collections, customSchemas, and customdata. Besides the default _id index, I've added the following indexes
db.customData.createIndex( { "orgId": 1, "contentType": 1 });
db.customSchemas.createIndex( { "orgId": 1, "contentType": 1 }, { unique: true });
I've decided to enforce orgId on all calls, so in my service layer, every query has an orgId in it, even the ones with ids, e.g.
db.customData.find({"_id" : ObjectId("557f30402598f1243c14403c"), orgId: 1});
Should I add an index that has both _id and orgId in it? Do the indexes I have currently help at all when I'm searching by both _id and orgId?
MongoDB 2.6+ provides an index intersection feature that covers your case by intersecting the _id index { _id: 1 } with the orgId prefix of { "orgId": 1, "contentType": 1 }.
So your query { "_id": ObjectId("557f30402598f1243c14403c"), orgId: 1 } should already be covered by the indexes you have.
However, index intersection is less performant than a compound index on { _id: 1, orgId: 1 }, as it comes with an extra step (intersecting the two sets). Hence, if this is a query you run most of the time, creating the compound index for it is a good idea.
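A sketch of that compound index in the shell (MongoDB allows additional compound indexes that include _id, even though the default _id index itself cannot be changed):

```javascript
db.customData.createIndex({ _id: 1, orgId: 1 })
```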
Let's say I have the following document structure:
{ _id: 1,
  items: [ {n: "Name", v: "Kevin"}, ..., {n: "Age", v: 100} ],
  records: [ {n: "Environment", v: "Linux"}, ..., {n: "RecordNumber", v: 555} ]
}
If I create 2 compound indexes on items.n-items.v and records.n-records.v, I could perform an $all query:
db.collection.find( {"items" : {$all : [{ $elemMatch: {n: "Name", v: "Kevin"} },
{$elemMatch: {n: "Age", v: 100} } ]}})
I could also perform a similar search on records.
db.collection.find( {"records" : {$all : [{ $elemMatch: {n: "Environment", v: "Linux"} },
{$elemMatch: {n: "RecordNumber", v: 555} } ]}})
Can I somehow perform a query that uses the index(es) to search for a document based on the items and records field?
find all documents where items.n = "Name" and items.v = "Kevin" AND records.n = "RecordNumber" and records.v = 555
I'm not sure that this is possible using $all.
You can use an index to query on one array, but not both. Per the documentation: "While you can create multikey compound indexes, at most one field in a compound index may hold an array."
Practically:
You can use a Compound index to index multiple fields.
You can use a Multikey index to index all the elements of an array.
You can use a Multikey index as one element of a compound Index
You CANNOT use multiple multikey indexes in a compound index
The documentation lays out the reason for this pretty clearly:
MongoDB does not index parallel arrays because they require the index to include each value in the Cartesian product of the compound keys, which could quickly result in incredibly large and difficult to maintain indexes.
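A quick way to see the restriction in action, using the field names from the question: an index spanning both arrays is rejected whenever a document has array values in both fields.

```javascript
// items and records are both arrays ("parallel arrays"), so:
db.collection.createIndex({ "items.n": 1, "items.v": 1, "records.n": 1, "records.v": 1 })
// fails with a "cannot index parallel arrays" error if any existing
// document has arrays in both fields; likewise, once the index exists,
// inserting such a document is rejected
```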