Correct mongo index for a large query - mongodb

I'm using mongoose and have a schema similar to this:
var schema = new mongoose.Schema({
created: Date,
fieldA: ObjectId,
fieldB: ObjectId,
fieldC: ObjectId,
sortField: Number
});
This is a big collection so I want to make sure the indexes are optimal. I build up a query with something like this:
var query = Schema.find({created: some-date-clause});
if ( some-condition )
query = query.or({fieldA: {$in: listOfSomeFieldAIDs}});
if ( some-other-condition )
query = query.or({fieldB: {$in: listOfSomeFieldBIDs}});
if ( yet-another-condition )
query = query.or({fieldC: {$in: listOfSomeFieldCIDs}});
query = query.sort({sortField:-1});
I want to make sure that whatever the query is, it's fully covered. My instinct is to create an index that spans created, fieldA, fieldB, fieldC and sortField. But should I actually create several indices for the situations where for example only the second condition is true, or the first and the third are true? Should I be approaching this differently?

Each clause of an $or query is considered independently, so it's likely best to create three separate indexes, one each on fieldA, fieldB, and fieldC. See the MongoDB documentation on $or.
The sort occurs after the results of the three $or clauses are merged, so adding sortField to these indexes isn't likely to be useful.
But as always, validate any index approach using explain() to make sure your queries are able to use the indexes you've created.
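A minimal sketch of how the conditional query and the per-clause indexes might line up, assuming the three optional conditions arrive as arrays of IDs. The names `buildFilter`, `createdClause`, `aIds`, `bIds`, `cIds` and `indexSpecs` are illustrative and not part of the original question.

```javascript
// Build the filter as a plain object so it can be inspected or passed to find().
// `createdClause`, `aIds`, `bIds` and `cIds` are hypothetical inputs.
function buildFilter(createdClause, { aIds = [], bIds = [], cIds = [] } = {}) {
  const orClauses = [];
  if (aIds.length) orClauses.push({ fieldA: { $in: aIds } });
  if (bIds.length) orClauses.push({ fieldB: { $in: bIds } });
  if (cIds.length) orClauses.push({ fieldC: { $in: cIds } });

  const filter = { created: createdClause };
  // Leave $or out entirely when no branch applies: MongoDB rejects an empty $or array.
  if (orClauses.length) filter.$or = orClauses;
  return filter;
}

// One single-field index per $or branch, as suggested in the answer.
const indexSpecs = [{ fieldA: 1 }, { fieldB: 1 }, { fieldC: 1 }];

const filter = buildFilter(
  { $gte: new Date("2020-01-01") },
  { aIds: ["a1"], cIds: ["c1", "c2"] }
);
console.log(JSON.stringify(filter, null, 2));
```

With mongoose, each entry of `indexSpecs` could be registered through `schema.index(spec)` and the filter passed to `find(filter).sort({sortField: -1})`; running explain() on that query would show whether each branch actually uses its index.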

Related

In MongoDB how to decide for a collection which fields to be indexed for a costly query

I have a collection with 1000+ records and I need to run the query below. The query takes more than a minute even when the departmentIds array has only about 15-20 entries. I think an index would reduce the query time.
From what I observe, 99% of the time spent on the query is due to the $in part.
How do I decide which fields to index? Should I index only department.department_id, since that is what takes the most time, or should I create a compound index on userId, something and department.department_id (basically all the fields I'm using in the query)?
Here is what my query looks like:
let departmentIds = [.......................... can be large]
let query = {
userId: someid,
something: something,
'department.department_id': {
$in: departmentIds
}
};
// db query (toArray() returns a promise, so await it inside an async function)
let result = await db
  .collection(TABLE_NAME)
  .find(query)
  .project({
    anotherfield: 1,
    department: 1
  })
  .toArray();
You need to review all of your search cases and create indexes for the ones that are used often and are most critical for your application. For the particular case above, these seem to be the index options:
userId:1
userId:1,something:1
userId:1,something:1,department.department_id:1
I would bet on option 1, since userId sounds like a unique key with high selectivity, which makes it very suitable for an index. Of course, the best approach is to test and identify the fastest option. The explain option can help a lot with that testing:
db.collection.find().explain("executionStats")
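To compare the candidate indexes, the explain output can be reduced to a few numbers. The sketch below assumes the usual `executionStats` layout (`queryPlanner.winningPlan` with nested `inputStage`/`inputStages`, and `executionStats.nReturned`, `totalKeysExamined`, `totalDocsExamined`, `executionTimeMillis`); some server versions nest the plan differently, so treat the helper names and the sample document as illustrative.

```javascript
// Walk a winning plan and collect stage names (e.g. IXSCAN, FETCH, COLLSCAN).
function collectStages(plan, out = []) {
  if (!plan) return out;
  out.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, out);
  for (const child of plan.inputStages || []) collectStages(child, out);
  return out;
}

// Reduce an explain("executionStats") document to the figures worth comparing.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = collectStages(explain.queryPlanner.winningPlan);
  return {
    stages,
    usesCollectionScan: stages.includes("COLLSCAN"),
    returned: stats.nReturned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    millis: stats.executionTimeMillis,
  };
}

// Hand-written sample, trimmed to the fields used above; not real server output.
const sampleExplain = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "userId_1" } },
  },
  executionStats: { nReturned: 12, totalKeysExamined: 40, totalDocsExamined: 40, executionTimeMillis: 3 },
};
console.log(summarizeExplain(sampleExplain));
```

A plan where `docsExamined` is far larger than `returned`, or where `usesCollectionScan` is true, suggests the index being tested is not doing much for this query.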

Optimising queries in mongodb

I am working on optimising my queries in mongodb.
In a normal SQL query there is an order in which the where clauses are applied. For example, in select * from employees where department="dept1" and floor=2 and sex="male", department="dept1" is applied first, then floor=2, and lastly sex="male".
I was wondering whether it happens in a similar way in MongoDB.
E.g.
DBObject search = new BasicDBObject("department", "dept1").append("floor", 2).append("sex", "male");
Here, which match clause will be applied first, or does Mongo work in this manner at all?
This question basically arises from my background with SQL databases.
Please help.
If there are no indexes, MongoDB has to scan the full collection (a collection scan) to find the required documents. In your case, if you want the fields applied in the order department, floor, sex, you should create this compound index:
db.employees.createIndex( { "department": 1, "floor": 1, "sex" : 1 } )
From the documentation (https://docs.mongodb.org/manual/core/index-compound/):
db.products.createIndex( { "item": 1, "stock": 1 } )
The order of the fields in a compound index is very important. In the
previous example, the index will contain references to documents
sorted first by the values of the item field and, within each value of
the item field, sorted by values of the stock field.
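The ordering described in that passage can be imitated in plain JavaScript. This is only an illustration of the sort order of a `{ item: 1, stock: 1 }` index, using made-up documents; it does not touch a database.

```javascript
// Compare two entries the way a { item: 1, stock: 1 } index orders its keys:
// first by item, then by stock within equal items.
function compareKeys(a, b) {
  if (a.item !== b.item) return a.item < b.item ? -1 : 1;
  return a.stock - b.stock;
}

// Made-up documents for the illustration.
const products = [
  { item: "pen", stock: 30 },
  { item: "ink", stock: 5 },
  { item: "pen", stock: 10 },
  { item: "ink", stock: 50 },
];

const indexOrder = [...products].sort(compareKeys);
console.log(indexOrder.map((d) => `${d.item}:${d.stock}`).join(" "));
```

Because all entries for one item sit next to each other, a query on item alone (or on item plus stock) can jump straight to a contiguous block, while a query on stock alone cannot.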

How feasible is this query

Suppose you have a collection of documents with the following structure:
_id
A_id = ObjectId
B_id = ObjectId
C_id = ObjectId
+ other stuff
Suppose the collection holds roughly 100 million to 1 billion documents. I have to run a query
that returns all documents whose A_id, B_id, or C_id is in some list of ObjectIds, say L = [ ObjectId ].
Something like this:
{ '$or' : [ { 'A_id' : { '$in' : L}},
{ 'B_id' : { '$in' : L}},
{ 'C_id' : { '$in' : L}} ]
}
Q: Is it feasible to run such a query? Is it normal to run such queries on MongoDB?
Q: How long might it take on a single server, and how long on a horizontally scaled database?
It is a doable query.
The real question is "is it a good query?"
The answer to that question depends heavily on many variables.
First off, I am assuming you have an index on each of the fields you are querying. I am also assuming that the query stands as is, without a sort. Note that there are known problems that stop the optimiser from using an index for the sort here: https://jira.mongodb.org/browse/SERVER-1205
Assuming you have indexes on A_id, B_id and C_id, MongoDB will essentially run three queries and merge duplicates before returning your result.
This means that a small $or query can be faster when run within the database (or mongos) itself, since you don't have to merge duplicates in your application. That spares not only network traffic but also a costly iteration over the results of each $or clause.
So for a small $or like that, the query is okay. It isn't the best query in the world, but it will do if you have no choice but to use an $or.
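The "three queries, then merge duplicates" behaviour can be pictured with a small in-memory model. This is a simplification for illustration only: the array `collection`, the string IDs standing in for ObjectIds, and the helpers `scanClause` and `orQuery` are all made up, and the real server does this work inside the query engine.

```javascript
// Hypothetical in-memory documents standing in for the collection.
const collection = [
  { _id: 1, A_id: "x", B_id: "p", C_id: "q" },
  { _id: 2, A_id: "y", B_id: "x", C_id: "q" },
  { _id: 3, A_id: "x", B_id: "x", C_id: "r" },
  { _id: 4, A_id: "z", B_id: "p", C_id: "r" },
];

// Stand-in for one indexed lookup: every document whose `field` is in `ids`.
function scanClause(field, ids) {
  const wanted = new Set(ids);
  return collection.filter((doc) => wanted.has(doc[field]));
}

// Run each clause independently, then keep the first copy of every _id.
function orQuery(fields, ids) {
  const seen = new Map();
  for (const field of fields) {
    for (const doc of scanClause(field, ids)) {
      if (!seen.has(doc._id)) seen.set(doc._id, doc);
    }
  }
  return [...seen.values()];
}

const matches = orQuery(["A_id", "B_id", "C_id"], ["x"]);
console.log(matches.map((d) => d._id));
```

Document 3 matches two clauses but appears once in the result, which is the de-duplication the application would otherwise have to do itself after issuing three separate queries.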
Q: how long can it take of a single server and how long may it take at horizontally scaled database?
Not sure if anyone here can answer that. It depends upon schema, the size of the $ins and much more.
It's certainly doable to run that query. However, you might want to consider an alternative structure that could be more easily searched.
Instead of
_id
A_id = ObjectId
B_id = ObjectId
C_id = ObjectId
+ other stuff
You might want to restructure it to be:
_id
idList = [
{ k: 'A', v: AObjectId },
{ k: 'B', v: BObjectId },
{ k: 'C', v: CObjectId }
]
+ other stuff
By using an array of sub-objects, each with a key field and a value field, you can index the value field (idList.v) and run a single efficient query:
{ 'idList.v' : { $in : listToCheck } }
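A rough sketch of the restructuring, assuming existing documents carry `A_id`, `B_id` and `C_id` at the top level and that an index on `idList.v` is created separately. The helper names `toIdList` and `idListFilter` are hypothetical, and plain strings stand in for ObjectIds.

```javascript
// Convert { A_id, B_id, C_id, ...rest } into the { idList: [{ k, v }, ...] } shape.
function toIdList(doc) {
  const { A_id, B_id, C_id, ...rest } = doc;
  return {
    ...rest,
    idList: [
      { k: "A", v: A_id },
      { k: "B", v: B_id },
      { k: "C", v: C_id },
    ],
  };
}

// Filter that matches when the `v` of any array element is in `ids`.
function idListFilter(ids) {
  return { "idList.v": { $in: ids } };
}

const migrated = toIdList({ _id: 1, A_id: "a", B_id: "b", C_id: "c", other: true });
console.log(JSON.stringify(migrated));
console.log(JSON.stringify(idListFilter(["a", "z"])));
```

Keeping the `k` field means the original role of each ID (A, B or C) is still recoverable after the migration, at the cost of a slightly larger document.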

MongoDB MongoEngine index declaration

I have Document
class Store(Document):
    store_id = IntField(required=True)
    items = ListField(ReferenceField(Item, required=True))
    meta = {
        'indexes': [
            {
                'fields': ['campaign_id'],
                'unique': True
            },
            {
                'fields': ['items']
            }
        ]
    }
I want to set up indexes on items and store_id. Is my configuration right?
Your second index declaration looks like it should do what you want. But to make sure that the index is really effective, you should use explain. Connect to your database with the mongo shell and perform a find-query which should use that index followed by .explain(). Example:
db.yourCollection.find({items:"someItem"}).explain();
The output will be a document with lots of fields. The documentation explains what exactly each field means. Pay special attention to these fields:
millis Time in milliseconds the query required
indexOnly (self-explaining)
n number of returned documents
nscannedObjects the number of objects which had to be examined without using an index. For an index-only query this should be equal to n. When it is higher, it means that some documents could not be excluded by an index and had to be scanned manually.

Compound Indexes Order in Mongo

Let's say I have the following document schema:
{
_id: ObjectId(...),
name: "Kevin",
weight: 500,
hobby: "scala",
favoriteFood : "chicken",
pet: "parrot",
favoriteMovie : "Diehard"
}
If I create a compound index on name-weight, I will be able to specify a strict parameter (name == "Kevin"), and a range on weight (between 50 and 200). However, I would not be able to do the reverse: specify a weight and give a "range" of names.
Of course compound index order matters where a range query is involved.
If only exact queries will be used (example: name == "Kevin", weight == 100, hobby == "C++"), then does the order actually matter for compound indexes?
When you have an exact query, the order should not matter. But when you want to be sure, the .explain() method on database cursors is your friend. It can tell you which indexes are used and how they are used when you perform a query in the mongo shell.
Important fields of the document returned by explain are:
indexOnly: when it's true, the query was completely covered by the index
n and nscanned: The first tells you the number of documents found, the second how many entries had to be examined because the indexes couldn't rule them out. The latter shouldn't be notably higher than the former.
millis: number of milliseconds the query took to perform
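Why field order matters for a range but not for exact matches can be seen in a deliberately simplified model. It assumes a scan can narrow on the leading index field and can narrow on the second field only when the leading field is an exact match; the real planner is more capable than this, so the counts are illustrative, and `people` and `keysExamined` are made-up names.

```javascript
// Made-up data for the illustration.
const people = [
  { name: "Kevin", weight: 60 },
  { name: "Kevin", weight: 150 },
  { name: "Kevin", weight: 500 },
  { name: "Alice", weight: 70 },
  { name: "Bob", weight: 120 },
  { name: "Zed", weight: 180 },
];

// A condition is either an exact value or a { min, max } range.
function inBounds(value, cond) {
  return typeof cond === "object" ? value >= cond.min && value <= cond.max : value === cond;
}

// Count the entries the simplified scan would touch for a two-field index.
function keysExamined(indexFields, query) {
  const [first, second] = indexFields;
  const leadingIsExact = typeof query[first] !== "object";
  return people.filter(
    (p) => inBounds(p[first], query[first]) && (!leadingIsExact || inBounds(p[second], query[second]))
  ).length;
}

const rangeQuery = { name: "Kevin", weight: { min: 50, max: 200 } };
const exactQuery = { name: "Kevin", weight: 150 };

console.log(keysExamined(["name", "weight"], rangeQuery), keysExamined(["weight", "name"], rangeQuery));
console.log(keysExamined(["name", "weight"], exactQuery), keysExamined(["weight", "name"], exactQuery));
```

In this model the range query touches fewer entries with name first, while the exact query touches the same number either way, which matches the answer's point that order should not matter for exact matches. Checking the real behaviour with explain() is still the way to be sure.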