Compound index - skipping a prefix vs selecting all values of a prefix - mongodb

{
field_1: "string" // can only have the value of "A" or "B",
field_2: "numeric",
}
The above is the schema for my collection.
The following compound index exists:
{
field_1: 1,
field_2: 1
}
The query in question is below:
db.col.find( { field_2: { $gt: 100 } } )
This query skips the prefix field_1. Hence MongoDB does not use the compound index.
So in order to get it to use the compound index, I change the query to this:
db.col.find( { field_1: { $in: ["A", "B"] }, field_2: { $gt: 100 } } )
Would MongoDB use the compound index in the second query?
Would there be any performance benefits either way?
If there is a performance benefit in some cases to using the second query, are there cases where the performance would actually be worse?

Yes, MongoDB will use the compound index for the second query.
There will be some performance benefit, but how much depends on how big your collection is, how many documents the query returns relative to the whole collection, and so on.
You can check the execution stats for yourself by using explain.
db.col.find({field_1: {$in: ["A","B"] }, field_2: {$gt: 100}}).explain("executionStats")
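For reference, here is a minimal sketch of what to look for in the explain output (the stage names are the typical ones; exact plans can vary by server version):
// Prefix skipped: the winning plan is typically a COLLSCAN (full collection scan).
db.col.find({ field_2: { $gt: 100 } }).explain("executionStats")
// Prefix covered by $in over all of its possible values: expect an IXSCAN
// on { field_1: 1, field_2: 1 } with bounds on both fields.
db.col.find({ field_1: { $in: ["A", "B"] }, field_2: { $gt: 100 } }).explain("executionStats")
In executionStats, compare totalDocsExamined against nReturned: when the $in rewrite pays off, the index bounds keep the documents examined close to the documents returned. In the worst case (almost every document satisfies field_2 > 100), the rewrite buys little, and the overhead of traversing two index ranges can even make it marginally slower than the plain collection scan.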

Related

MongoDB match on document and subdocuments, what to use as indexes?

I have a lot of documents looking like this:
[{
"title": "Luxe [daagse] [verzorging] # Egypte! Incl. vluchten, transfers & 4* ho",
"price": 433,
"automatic": false,
"destination": "5d26fc92f72acc7a0b19f2c4",
"date": "2020-01-19T00:00:00.000+00:00",
"days": 8,
"arrival_airport": "5d1f5b407ec7385fa2963623",
"departure_airport": "5d1f5adb7ec7385fa2963307",
"board_type": "5d08e1dfff6c4f13f6db1e6c"
},
{
"title": "Luxe [daagse] [verzorging] # Egypte! Incl. vluchten, transfers & 4* ho",
"automatic": true,
"destination": "5d26fc92f72acc7a0b19f2c4",
"prices": [{
"price": 433,
"date_from": "2020-01-19T00:00:00.000+00:00",
"date_to": "2020-01-28T00:00:00.000+00:00",
"day_count": 8,
"arrival_airport": "5d1f5b407ec7385fa2963623",
"departure_airport": "5d1f5adb7ec7385fa2963307",
"board_type": "5d08e1dfff6c4f13f6db1e6c"
},
{
"price": 899,
"date_from": "2020-04-19T00:00:00.000+00:00",
"date_to": "2020-04-28T00:00:00.000+00:00",
"day_count": 19,
"arrival_airport": "5d1f5b407ec7385fa2963623",
"departure_airport": "5d1f5adb7ec7385fa2963307",
"board_type": "5d08e1dfff6c4f13f6db1e6c"
}
]
}
]
As you can see, automatic deals have multiple prices (it can be a lot, between 1000 and 4000) and do not have the original top-level fields available.
Now I need to search in the original document as well as in the subdocuments to look for a match.
This is the aggregation I use to search through the documents:
[{
"$match": {
"destination": {
"$in": ["5d26fc9af72acc7a0b19f313"]
}
}
}, {
"$match": {
"$or": [{
"prices": {
"$elemMatch": {
"price": {
"$lte": 1500,
"$gte": 400
},
"date_to": {
"$lte": "2020-04-30T22:00:00.000Z"
},
"date_from": {
"$gte": "2020-03-31T22:00:00.000Z"
},
"board_type": {
"$in": ["5d08e1bfff6c4f13f6db1e68"]
}
}
}
}, {
"price": {
"$lte": 1500,
"$gte": 400
},
"date": {
"$lte": "2020-04-30T22:00:00.000Z",
"$gte": "2020-03-31T22:00:00.000Z"
},
"board_type": {
"$in": ["5d08e1bfff6c4f13f6db1e68"]
}
}]
}
}, {
"$limit": 20
}]
I would like to speed things up, because it can be quite slow. I was wondering what the best index strategy for this aggregation is, and which fields to use. Is this the best way of doing it, or is there a better way?
From Mongo's $or docs:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
So with that in mind, in order to avoid a collection scan in this pipeline, you have to create a compound index covering both the price and prices fields.
Remember that order matters in compound indexes, so the order of the fields should depend on how you expect to use them.
It seems to me that the index you want to create looks something like:
{destination: 1, date: 1, board_type: 1, price: 1, prices: 1}
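As a shell command that would be something like this (a sketch; the collection name deals is a placeholder). Note that indexing the whole prices array indexes entire sub-documents, which only helps exact matches on complete elements; the per-field paths discussed below ("prices.price", etc.) are usually the more useful form.
db.deals.createIndex({ destination: 1, date: 1, board_type: 1, price: 1, prices: 1 })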
A compound index including the match filter fields is required to make the aggregation run fast. In aggregation queries, a $match stage early in the pipeline (preferably the first stage) can use indexes, if any are defined on the filter fields. The posted query meets that condition, so defining the right index is all that is needed for a fast query. But an index on which fields?
The index is going to be a compound index, i.e., an index on multiple fields of the query criteria. The index prefix starts with the destination field. The remaining index fields are to be determined. What are they?
Most of the candidate fields are sub-document fields of the prices array - price, date_from, date_to and board_type. There is also the date field on the main document. Which of these need to go into the compound index?
Defining indexes on array elements (or on fields of sub-documents in an array) creates lots of index keys. This means lots of storage and, when the index is used, lots of memory (RAM). This is an important consideration. Indexes on array elements are called multikey indexes. For an index to be properly utilized, the collection's documents and the index being used by the query (together called the working set) must fit into RAM.
Another aspect to consider is query selectivity: how many documents a filter on an indexed field selects. To be effective, the filter field must select a small subset of the input documents. See Create Queries that Ensure Selectivity.
Based on these two factors it is difficult to say exactly which other fields to include (certainly some of the prices fields). So the index is going to be something like this:
{ destination: 1, fld1: 1, fld2: 1, ... }
The fld1, fld2, ... are going to be sub-document fields of the prices array and/or the date field. I think only one set of date fields can be used with the index. An example index can be one of these:
{ destination: 1, date: 1, "prices.price": 1, "prices.board_type": 1}
{ destination: 1, "prices.price": 1, "prices.date_from": 1, "prices.date_to": 1, "prices.board_type": 1}
Note that the order of the index keys, and whether price, date_from, date_to and board_type are needed at all, must be determined based on the two main factors - the working-set requirement and the query selectivity - this is important.
NOTE: A test on a small sample data set with a similar structure showed usage of the compound index, with the primary destination field and two fields from prices (one with an equality condition and one with a range condition). The query plan from explain showed an IXSCAN (index scan) on the compound index, and using an index will surely improve the query performance.
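A hedged way to reproduce that check, with the collection name deals as a placeholder and the pipeline cut down to the indexable predicates:
// One of the candidate indexes; it is multikey because prices is an array.
db.deals.createIndex({ destination: 1, "prices.price": 1, "prices.board_type": 1 })
// Run the pipeline under explain and look for IXSCAN (rather than COLLSCAN) in the winning plan.
db.deals.explain("executionStats").aggregate([
{ "$match": { "destination": { "$in": ["5d26fc9af72acc7a0b19f313"] }, "prices": { "$elemMatch": { "price": { "$gte": 400, "$lte": 1500 }, "board_type": { "$in": ["5d08e1bfff6c4f13f6db1e68"] } } } } },
{ "$limit": 20 }
])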

Mongo using the wrong index during aggregation with match+sort operations

I'm using MongoDB version 4.2.0. I have a collection with the following indexes:
{uuid: 1},
{unique: true, name: "uuid_idx"}
and
{field1: 1, field2: 1, _id: 1},
{unique: true, name: "compound_idx"}
When executing this query
aggregate([
{"$match": {"uuid": <uuid_value>}}
])
the planner correctly selects uuid_idx.
When adding this sort clause
aggregate([
{"$match": {"uuid": <uuid_value>}},
{"$sort": {"field1": 1, "field2": 1, "_id": 1}}
])
the planner selects compound_idx, which makes the query slower.
I would expect the sort clause to not make a difference in this context. Why does Mongo not use the uuid_idx index in both cases?
EDIT:
A little clarification: I understand there are workarounds to use the correct index, but I'm looking for an explanation of why this does not happen automatically (if possible, with links to the official documentation). Thanks!
Why is this happening?:
Let's understand how Mongo chooses which index to use, as explained in the query-plans documentation.
If a query can be satisfied by multiple indexes ("satisfied" is used loosely, as Mongo actually selects all possibly relevant indexes) defined on the collection, MongoDB will trial all the applicable plans in parallel. The first plan to return 101 results is selected by the query planner.
Meaning that for this particular query, the compound index actually wins that race.
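You can watch that race yourself: the allPlansExecution verbosity reports the rejected plans next to the winner (a sketch; col is a placeholder collection name):
db.col.explain("allPlansExecution").aggregate([
{ $match: { uuid: "some_value" } },
{ $sort: { field1: 1, field2: 1, _id: 1 } }
])
// Inspect the winningPlan and rejectedPlans sections of the output.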
What can we do?:
We can use hint. A hint basically forces Mongo to use a specific index; however, this is not recommended, because if your data or indexes change, Mongo will not adapt to those changes and will keep using the hinted index.
The query:
aggregate(
[
{ $match : { uuid : "some_value" } },
{ $sort : { field1: 1, field2: 1, _id: 1 } }
],
)
doesn't use the index "uuid_idx".
There are a couple of options for getting indexes used by both the match and the sort operations:
(1) Define a new compound index: { uuid: 1, field1: 1, field2: 1, _id: 1 }
Both the match-only and the match+sort queries will use this index, for both the match and the sort operations.
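A sketch of creating that index (col is a placeholder collection name):
db.col.createIndex({ uuid: 1, field1: 1, field2: 1, _id: 1 })
Because uuid is an equality match, the index can satisfy the trailing sort on field1, field2, _id without an in-memory sort stage.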
(2) Use a hint on the uuid index (i.e., keep the existing indexes).
Both queries will then use uuid_idx for the match; the sort cannot use this index, so it is performed in memory, which is acceptable when a single uuid matches only a few documents.
aggregate(
[
{ $match : { uuid : "some_value" } },
{ $sort : { field1: 1, field2: 1, _id: 1 } }
],
{ hint: "uuid_idx"}
)
If you can use find instead of aggregate, it will use the right index. So this is still a problem specific to the aggregation pipeline.
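For comparison, a sketch of the find form that reportedly picks the better plan:
db.col.find({ uuid: "some_value" }).sort({ field1: 1, field2: 1, _id: 1 })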

MongoDB index for uniqueness value

I need an index that will guarantee uniqueness of a value across several fields. For example, I have the document:
{
_id: ObjectId("123"),
fieldA: "a",
fieldB: "b"
}
and I want to forbid inserting the document
{
_id: ObjectId("456"),
fieldA: "new value for field a",
fieldB: "a"
}
because a document already exists that has the value "a" set on the field fieldA. Is it possible?
It seems you need a multikey index with a unique constraint.
Take into account that a compound index can contain at most one array field; so, to cover every field with a single unique multikey index, you have to include all the fields whose values you want kept unique inside one array:
{
_id: ObjectId("123"),
multikey: [
{fieldA: "a"},
{fieldB: "b"}
]
}
Give this code a try:
db.collection.createIndex( { "multikey": 1}, { unique: true } )
To query, you would write:
db.collection.findOne({"multikey.fieldA": "a"}, // Query
{"multikey.fieldA": 1, "multikey.fieldB": 1}) // Projection
For more info you can take a look at multikey indexes over embedded documents.
Hope this helps.
Another option is to create one document per unique key, indexed by that unique key, and loop over the fields of each candidate document, cancelling the write if any key is already present.
IMO this solution consumes more resources, but in exchange it gets you a list of all keys consumed by written documents.
db.collection.createIndex( { "unikey": 1}, { unique: true } )
db.collection.insertMany([ {"unikey": "$FieldA"}, {"unikey": "$FieldB"} ])
db.collection.find({}, {"unikey": 1})

Querying MongoDb's custom key & value with Index

I am trying to store key-value data in MongoDB.
The key could be any string, and I don't know anything more about it before storing; the value could be of any type (int, string, array). And I would like to have an index on such keys & values.
I was looking at a multikey index over an array of my key-value pairs, but it looks like it can't cover queries over array fields.
Is it possible to have an index on a custom key & value in mongoDb and make queries with such operations as $exists and $eq and $gte, $lte, $and, $or, $in without COLLSCAN but through an IXSCAN stage?
Or maybe I need another Db for that?
I may have misunderstood your question but I think that this is precisely where MongoDB's strengths are - dealing with different shapes of documents and data types.
So let's say you have the following two documents:
db.test.insertMany([
{
key: "test",
value: [ "some array", 1 ]
},
{
key: 12.7,
values: "foo"
}
])
and you create a compound index like this:
db.test.createIndex({
"key": 1,
"value": 1
})
then the following query will use that index:
db.test.find({ "key": "test", "value": 1 })
and also more complicated queries will do the same:
db.test.find({ "key": { $exists: true }, "value": { gt: 0 } })
You can verify this by adding a .explain() to the end of the above queries.
UPDATE based on your comment:
You don't need the aggregation framework for that. You can simply do something like this:
db.test.distinct("user_id", { "key": { $exists: true } })
This query is going to use the above index. Moreover it can be made even faster by changing the index definition to include the "user_id" field like this:
db.test.createIndex({
"key" : 1.0,
"value" : 1.0,
"user_id" : 1
})
This, again, can be verified by running the following query:
db.test.explain().distinct("user_id", { "key": { $exists: true } })
If your key can be any arbitrary value, then this is impossible. Your best bet is to create an index on some other known field to limit the initial results so that the inevitable collection scan's impact is reduced to a minimum.
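A sketch of that mitigation, with user_id standing in for whatever known field your documents share:
// Index the known field so the planner can bound the scan.
db.test.createIndex({ user_id: 1 })
// The user_id predicate uses the index; the arbitrary-key predicate is then
// evaluated as a filter over the narrowed result set only.
db.test.find({ user_id: 42, arbitraryKey: { $gte: 10 } })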

How to find specific key in MongoDB?

I have to find all the documents which include "p396:branchCode" as a key in MongoDB. The value does not matter; it can be anything. I tried using
{"p396:branchCode": new RegExp(".*")}
in MongoVUE, but I found nothing.
My db is very nested, and branchCode has the super-key "p396:checkTellersEApproveStatus".
Your key is nested under a super-key, so you need to use dot notation:
{"p396:checkTellersEApproveStatus.p396:branchCode": {$exists: true}}
This assumes p396:branchCode is always under p396:checkTellersEApproveStatus. When that's not the case, you have a problem, because MongoDB does not allow queries for unknown keys. When the number of possible super-keys is low, you could query for all of them with the $or operator. If not, then your only option is to refactor your objects into arrays. To give an example, a structure like this:
properties: {
prop1: "value1",
prop2: "value2",
prop3: "value3"
}
would be far easier to query for values under arbitrary keys when made to look like this:
properties: [
{ key: "prop1", value:"value1"} ,
{ key: "prop2", value:"value2"},
{ key: "prop3", value:"value3"}
]
because you could just do db.collection.find({"properties.value":"value2"})
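A multikey index on the pair would then support such lookups (a sketch):
db.collection.createIndex({ "properties.key": 1, "properties.value": 1 })
// To match key and value together within the same array element:
db.collection.find({ properties: { $elemMatch: { key: "prop2", value: "value2" } } })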
If you are actually "mixing types then that probably is not a good thing. But if all you care about is that the field $exists then that is the operator to use:
db.collection.find({
"p396:checkTellersApproveStatus.p396:branchCode": { "$exists": true }
})
If the values are actually "all" numeric and you have an expected "range", then use the $gt and $lt operators instead. This allows an "index" on the field to be used. And a "sparse" index, where the field is not present in all documents, would improve performance:
db.collection.find({
"p396:checkTellersApproveStatus.p396:branchCode": {
"$gt": 0, "$lt": 99999
}
})
In all cases, this is the "child" of the parent "p396:checkTellersEApproveStatus", so you use "dot notation" to access the full path to the property.
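A sketch of the sparse index mentioned above:
// Sparse: only documents that actually contain the path get an index entry.
db.collection.createIndex(
{ "p396:checkTellersEApproveStatus.p396:branchCode": 1 },
{ sparse: true }
)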
Sounds like you want to use the $exists operator.
{'p396:branchCode': {$exists: true}}
This assumes the key is nested at this path:
{ 'p396:checkTellersEApproveStatus': {'p396:branchCode': {$exists: true}}}
Which can be shortened to:
{ 'p396:checkTellersEApproveStatus.p396:branchCode': {$exists: true}}