How to index an $or query with sort - mongodb

Suppose I have a query that looks something like this:
db.things.find({
deleted: false,
type: 'thing',
$or: [{
'creator._id': someid
}, {
'parent._id': someid
}, {
'somerelation._id': someid
}]
}).sort({
'date.created': -1
})
That is, I want to find documents that meet one of those three conditions and sort them newest first. However, $or queries do not use indexes in parallel when used with a sort. How, then, should I index this query?
http://docs.mongodb.org/manual/core/indexes/#index-behaviors-and-limitations
You can assume the following selectivity:
deleted - 99%
type - 25%
creator._id, parent._id, somerelation._id - < 1%

Now you are going to need more than one index for this query; there is no doubt about that.
The question is what indexes?
Now you have to take into consideration that none of your $or clauses will be able to sort their data optimally using an index, due to a bug in MongoDB's query optimizer: https://jira.mongodb.org/browse/SERVER-1205 .
So you know that the $or will have some performance problems with a sort, and that putting the sort field into the indexes for the $or clauses is useless at the moment.
So, considering this, the first index you want is one that covers the base query you are making. As @Leonid said, you could make this into a compound index; however, I would not use the field order he chose. Instead, I would do:
db.col.ensureIndex({type: -1, deleted: -1, 'date.created': -1})
I am very unsure about the deleted field being in the index at all due to its super low selectivity; keeping it in the index could, in fact, perform worse than leaving it out (this is true for most databases, including SQL ones). This part will need testing by you; maybe the field should be last (?).
As to the order of the index, again I have just guessed. I have said DESC for all fields because your sort is DESC, but you will need to check this yourself with explain().
So that should be able to handle the master clause of your query. Now to deal with those $ors.
Each $or clause will use an index separately, and the MongoDB query optimizer will look for an index for each clause as though they were separate queries altogether. One snag worth noting is that compound indexes ( http://docs.mongodb.org/manual/core/indexes/#compound-indexes ) work on prefixes (an example note here: http://docs.mongodb.org/manual/core/indexes/#id5 ), so you can't make one single compound index to cover all three clauses. A more optimal way of declaring indexes for the $or (considering the bug above) is:
db.col.ensureIndex({'creator._id': 1});
db.col.ensureIndex({'parent._id': 1});
db.col.ensureIndex({'somerelation._id': 1});
It should be able to get you started on making optimal indexes for your query.
I should stress however that you need to test this yourself.

MongoDB can use only one index per query, so I can't see a way to use indexes to query someid in your model.
So the best approach is to add a special field for this task:
ids = [creator._id, parent._id, somerelation._id]
In this case you'll be able to query without using $or operator:
db.things.find({
deleted: false,
type: 'thing',
ids: someid
}).sort({
'date.created': -1
})
In this case your index will look something like this:
{deleted:1, type:1, ids:1, 'date.created': -1}
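As a rough sketch of how the extra `ids` field might be kept in step with the three source fields, the helper below rebuilds the array from a document. The function names `buildIds` and `buildQuery` are hypothetical, and the sketch assumes the ids are primitive values such as strings; ObjectId values would need to be compared by their string form to de-duplicate correctly.

```javascript
// Hypothetical helper: collect the three related ids into one array field.
// Field names follow the question; `ids` is the extra field proposed above.
function buildIds(doc) {
  const candidates = [doc.creator, doc.parent, doc.somerelation]
    .filter((ref) => ref && ref._id != null)
    .map((ref) => ref._id);
  // Set-based de-duplication only works for primitive ids (assumption).
  return [...new Set(candidates)];
}

// Query shape from the answer above: a plain equality match on the array field.
function buildQuery(someid) {
  return { deleted: false, type: 'thing', ids: someid };
}

const doc = {
  creator: { _id: 'u1' },
  parent: { _id: 'u2' },
  somerelation: { _id: 'u1' },
};
console.log(buildIds(doc)); // [ 'u1', 'u2' ]
console.log(buildQuery('u1'));
```

The array would have to be rewritten in the same update that changes any of the three source fields, otherwise the query and the data drift apart.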

If you have the flexibility to adjust the schema, I would suggest adding a new field, associatedIds: [ ], which would hold creator._id, parent._id and somerelation._id. You can update that field atomically when you update the corresponding main field, and you can then have a compound index on this field, type and created_date, which eliminates the need for $or in your query entirely.

Considering your indexing requirement, I would suggest using the $orderBy operator alongside your $or query. By that I mean you should be able to index on the criteria in the expressions of your $or query and then use $orderBy to sort the result.
For example:
db.things.find({
$query: {
deleted: false,
type: 'thing',
$or: [{
'creator._id': someid
}, {
'parent._id': someid
}, {
'somerelation._id': someid
}]
},
$orderBy: {'date.created': -1}
})
The above query would require compound indexes on each of the fields in the $or expressions combined with the sort object specified in the orderBy criteria.
For example:
db.things.ensureIndex({'parent._id': 1, 'date.created': -1})
and so on for other fields.
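To spell out the "and so on", here is a small sketch that derives one compound index spec per $or branch, with the branch fields first and the sort fields last. `indexSpecsForOr` is a hypothetical helper, not part of any MongoDB API, and whether the planner actually uses such indexes for the sort should still be confirmed with explain().

```javascript
// Build one compound index spec per $or branch: branch fields first, sort fields last.
function indexSpecsForOr(orClauses, sort) {
  return orClauses.map((clause) => {
    const spec = {};
    for (const field of Object.keys(clause)) spec[field] = 1;
    for (const [field, dir] of Object.entries(sort)) spec[field] = dir;
    return spec;
  });
}

const specs = indexSpecsForOr(
  [{ 'creator._id': 'x' }, { 'parent._id': 'x' }, { 'somerelation._id': 'x' }],
  { 'date.created': -1 }
);
// Each spec could then be passed to db.things.ensureIndex(spec) in the shell.
console.log(specs);
```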
It is a good practice to specify "limit" for the result to prevent mongodb from performing a huge in memory sort.
Read More on $orderBy operator here

Related

Will $or query in mongo findOne operation use index if not all expressions are indexed?

Mongo's documentation on $or operator says:
When evaluating the clauses in the
$or
expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an
$or
expression, all the clauses in the
$or
expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
So effectively, if you want the query to be efficient, both of the attributes used in the $or condition should be indexed.
However, I'm not sure if this applies to "findOne" operations as well since I can't use Mongo's explain functionality for a findOne operation. It seems logical to me that if you only care about returning one document, you would check an indexed condition first, since you could bail after just finding one without needing to care about the non-indexed fields.
Example
Let's pretend in the below that "email" is indexed, but username is not. It's a contrived example, but bear with me.
db.users.findOne(
{
$or: [
{ email : 'someUser@gmail.com' },
{ username: 'some-user' }
]
}
)
Would the above use the email index, or would it do a full collection scan until it finds a document that matches the criteria?
I could not find anything documenting what would be expected here. Does someone know the answer?
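Read literally, the documented rule quoted above can be modelled with a few lines of plain JavaScript. This is only a simplified sketch: `orCanUseIndexes` is a hypothetical helper that treats a clause as "supported" when one of its fields is the leading key of some index, which is much cruder than the real planner.

```javascript
// Simplified model of the documented rule: every $or clause must be supported
// by an index, otherwise the whole $or falls back to a collection scan.
function orCanUseIndexes(orClauses, indexSpecs) {
  const leadingKeys = new Set(indexSpecs.map((spec) => Object.keys(spec)[0]));
  return orClauses.every((clause) =>
    Object.keys(clause).some((field) => leadingKeys.has(field))
  );
}

const orClauses = [{ email: 'someUser@gmail.com' }, { username: 'some-user' }];

// Only email indexed: the rule predicts a collection scan.
console.log(orCanUseIndexes(orClauses, [{ email: 1 }])); // false
// Both fields indexed: each clause can use its own index.
console.log(orCanUseIndexes(orClauses, [{ email: 1 }, { username: 1 }])); // true
```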
Well I feel a little silly - it turns out that the docs I found saying you can only run explain on "find" may have been just referring to when you're in MongoDB Compass.
I ran the below (userEmail not indexed)
this.userModel
.findOne()
.where({ $or: [{ _id: id }, { userEmail: email }] })
.explain();
this.userModel
.getModelInstance()
.findOne()
.where({ _id: id })
.explain();
...and found that the first does do a COLLSCAN and the second's plan was IDHACK (basically it does a normal ID lookup).
When running performance tests, I can see that the first is about 4x slower in a collection with 20K+ documents (depends on where in the natural order the document you're finding is)
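A quick way to read such explain output is to list the stage names in the winning plan. The sketch below walks a plan tree via its `inputStage` and `inputStages` children; the two plan objects are hand-written illustrations shaped like `queryPlanner.winningPlan`, not real output from the queries above.

```javascript
// Walk an explain() plan tree and list its stage names.
// Child plans appear under `inputStage` (single) or `inputStages` (array).
function collectStages(plan) {
  if (!plan) return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage, ...children.flatMap((child) => collectStages(child))];
}

// Hand-written plan fragments (illustrative only).
const scanPlan = { stage: 'LIMIT', inputStage: { stage: 'COLLSCAN' } };
const idPlan = { stage: 'IDHACK' };

console.log(collectStages(scanPlan).includes('COLLSCAN')); // true
console.log(collectStages(idPlan)); // [ 'IDHACK' ]
```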

MongoDB Single or Compound index for sorted query?

I'm using mongoose and I have a query like this:
const stories = await Story.find({ genre: 'romance' }).sort({ createdAt: -1 })
I want to set an index on Story so that this kind of query becomes faster.
Which one of these is the best approach and why:
1. Create one Compound index with both fields:
Story.createIndex({genre: 1, createdAt: -1})
2. Create two separate indexes on each field:
Story.createIndex({genre: 1})
Story.createIndex({createdAt: -1})
If "genre" is always going to be part of the search field, using a compound index will always result in better performance.
1.) A Compound index consisting of the field being searched and the field that it is being sorted on can satisfy both the conditions.
2.) Creating more than one index assumes that both indexes will be used while fulfilling the query, which is not true. Index intersection is only applicable in a few circumstances. In this particular instance, since we have one field in the search criteria and another in the sort, index intersection will not be employed by Mongo.
So in this situation, I would go with the compound index.
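The reasoning can be illustrated with a toy model in plain JavaScript, assuming an index is just a list of entries kept in key order. The data and the `buildCompoundIndex` and `findByGenre` names are made up for the illustration, and a real B-tree would seek to the matching range instead of filtering.

```javascript
// Toy model of a compound index { genre: 1, createdAt: -1 }:
// entries sorted by genre ascending, then createdAt descending.
function buildCompoundIndex(stories) {
  return [...stories].sort((a, b) =>
    a.genre < b.genre ? -1 : a.genre > b.genre ? 1 : b.createdAt - a.createdAt
  );
}

// An equality match on genre selects a contiguous slice of the index,
// and that slice is already in createdAt-descending order.
function findByGenre(index, genre) {
  return index.filter((entry) => entry.genre === genre);
}

const stories = [
  { genre: 'romance', createdAt: 3 },
  { genre: 'horror', createdAt: 5 },
  { genre: 'romance', createdAt: 9 },
  { genre: 'romance', createdAt: 1 },
];
const index = buildCompoundIndex(stories);
console.log(findByGenre(index, 'romance').map((s) => s.createdAt)); // [ 9, 3, 1 ]
```

No separate sort step is needed after the match, which is the property a pair of single-field indexes cannot offer.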
As long as all your queries that need the createdAt field use the genre field as well, you should use the compound index.
Let's compare the two options:
Queries: As long as what I stated above holds, both options will behave the same; there is no difference between the two when it comes to query execution speed.
Memory: A compound index will use less memory, which is crucial if you have limited RAM. Let's see the difference with an example:
Let's have 3 documents:
{
name: "john",
last_name: "mayer"
}
{
name: "john",
last_name: "cake"
}
{
name: "banana",
last_name: "pie"
}
Now if we run db.collection.stats() on option 1 the compound index we get:
totalIndexSize: 53248.0
On the contrary for option 2:
totalIndexSize: 69632.0
Inserting: Full disclosure, I have no idea how each is affected. From small tests it seems that a compound index is slightly quicker; however, I could not really find documentation on this, nor did I investigate deeper.

In MongoDB, which index would be more efficient? One that queries an array with two values, or one that uses an $or statement?

Let's say I have a document that looks like this:
{
_id: ObjectId("5260ca3a1606ed3e76bf3835"),
event_id: "20131020_NFL_SF_TEN",
team: {
away: "SF",
home: "TEN"
}
}
I want to query for any game with "SF" as the away team or home team. So I put an index on team.away and team.home and run an $or query to find all San Francisco games.
Another option:
{
_id: ObjectId("5260ca3a1606ed3e76bf3835"),
event_id: "20131020_NFL_SF_TEN",
team: [
{
name: "SF",
loc: "AWAY"
},
{
name: "TEN",
loc: "HOME"
}
]
}
In the array above, I could put an index on team.name instead of two indexes as before. Then I would query team.name for any game with "SF" inside.
Which query would be more efficient? Thanks!
I believe that you would want to use the second example you gave with the single index on team.name.
There are some special considerations that you need to know when working with the $or operator. Quoting from the documentation (with some additional formatting):
When using indexes with $or queries, remember that each clause of an $or query will execute in parallel. These clauses can each use their own index.
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ] } )
For this query, you would create one index on price: db.inventory.ensureIndex({ price: 1 })
and another index on sale: db.inventory.ensureIndex({ sale: 1 })
rather than a compound index.
Taking your first example into consideration, it doesn't make much sense to index a field that you are not going to specifically query. When you say that you don't mind if SF is playing on an away or home game, you would always include both the away and home fields in your query, so you're using two indexes where all you need to query is one value - SF.
It seems appropriate to mention at this stage that you should always consider the majority of your queries when thinking about the format of your documents. Think about the queries that you are planning to make most often and build your documents accordingly. It's always better to handle 80% of the cases as best you can rather than trying to solve all the possibilities (which might lead to worse performance overall).
Looking at your second example, of nested documents, as you said, you would only need to use one index (saving valuable space on your server).
Some more relevant quotes from the $or docs (again with added formatting):
Also, when using the $or operator with the sort() method in a query, the query will not use the indexes on the $or fields. Consider the following query which adds a sort() method to the above query:
db.inventory.find ({ $or: [{ price: 1.99 }, { sale: true }] }).sort({item:1})
This modified query will not use the index on price nor the index on sale.
So the question now is - are you planning to use the sort() function? If the answer is yes then you should be aware that your indexes might turn out to be useless! :(
The take-away from this is pretty much "it depends!". Consider the queries you plan to make, and consider what document structure and indexes will be most beneficial to you according to your usage projections.
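If the array layout is chosen, existing documents in the first shape would need converting. A minimal sketch of that conversion follows; `toTeamArray` is a hypothetical helper and the field names are taken from the two document shapes in the question.

```javascript
// Convert the first document shape ({ team: { away, home } })
// into the second ({ team: [{ name, loc }, ...] }).
function toTeamArray(game) {
  return {
    ...game,
    team: [
      { name: game.team.away, loc: 'AWAY' },
      { name: game.team.home, loc: 'HOME' },
    ],
  };
}

const game = { event_id: '20131020_NFL_SF_TEN', team: { away: 'SF', home: 'TEN' } };
const converted = toTeamArray(game);
// With this shape, a single-field match such as { 'team.name': 'SF' } replaces the $or.
console.log(converted.team.map((t) => t.name)); // [ 'SF', 'TEN' ]
```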

How to optimize this query with $and and $or

Coming from SQL I have this search condition
WHERE (col1 LIKE "%foo%" OR col2 LIKE "%foo%") AND
(col1 LIKE "%bar%" OR col2 LIKE "%bar%")
which I want to convert to MongoDB.
I came up with this, hopefully semantically identical, query:
{
$and: [
{
$or: [
{ col1: /.*foo.*/ },
{ col2: /.*foo.*/ }
]
},
{
$or: [
{ col1: /.*bar.*/ },
{ col2: /.*bar.*/ }
]
}
]
}
Is this the correct way or can it be improved?
Any suggestions about indexes (if they can be used at all)?
Yes, that's the correct way to implement that query for MongoDB.
If you want an index to fully assist the query, it needs to be a compound index that includes both fields, because MongoDB queries can only use one index per query. So an index like this:
db.coll.ensureIndex({col1: 1, col2: 1})
You can confirm your query is using the index you expect by using explain().
$and and $or queries are optimised in different ways. Quoting directly from 50 Tips and Tricks for MongoDB Developers:
Tip #27: AND-queries should match as little as possible as fast as possible
and
Tip #28: OR-queries should match as much as possible as soon as possible
So depending on the complexity and flexibility of your actual queries, you should make foo and bar more generic, yet try to limit the results from both $or statements.
Hope it helps
You have three problems here:
$and is two evaluated queries
$or is two evaluated queries
Non-prefixed regex
$and does not use a multiple-index plan, only a single-index plan; $or, however, does, so a single compound index here will not help that much.
In fact, no index will help, since MongoDB cannot use indexes for non-prefixed regexes.
So adding any index at all would be useless for this query.
There is no way to optimise this query in its current form.
Normally a good way of doing searches like this is to split out the words you are searching for into a subdocument of words that can be searched on directly and indexed. Take a look at: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo for examples.
When using this you would probably house one index on the array of words and that's it. MongoDB should be able to use that index for the entire query (twice over for the $or).
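A minimal sketch of the word-splitting idea follows, assuming a hypothetical `words` array field that is filled at write time and indexed. Note that this matches whole words only, so it is not a drop-in replacement for the substring semantics of LIKE "%foo%".

```javascript
// Split the searchable text fields into a de-duplicated array of lowercase words.
// `words` is a hypothetical field name for the indexed array.
function extractWords(...texts) {
  const words = texts
    .join(' ')
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  return [...new Set(words)];
}

// Query requiring every search term to be present in the words array.
function buildWordQuery(terms) {
  return { words: { $all: terms.map((t) => t.toLowerCase()) } };
}

console.log(extractWords('Foo bar', 'bar baz')); // [ 'foo', 'bar', 'baz' ]
console.log(JSON.stringify(buildWordQuery(['foo', 'bar'])));
```

The array would be computed from col1 and col2 whenever a document is saved, and the query then touches only the one indexed field.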
edit
Also an $or works different to how you have posted.
You don't need the $and and the two $ors. You only need one $or with a multi condition:
Edit again
I decided to take out the $or altogether. Note this uses the index unfriendly regex but still it is an interesting concept.
{ col1: /.*(foo|bar).*/ , col2: /.*(bar|foo).*/ }
If you want to use index friendly stuff then you will need to change the way your query works entirely as described above.

Sorting on Multiple fields mongo DB

I have a query in mongo such that I want to give preference to the first field and then the second field.
Say I have to query such that
db.col.find({category: 'A'}).sort({updated: -1, rating: -1}).limit(10).explain()
So I created the following index
db.col.ensureIndex({category: 1, rating: -1, updated: -1})
It worked just fine, scanning only as many objects as needed, i.e. 10.
But now I need to query
db.col.find({category: { $ne: 'A'}}).sort({updated: -1, rating: -1}).limit(10)
So I created the following index:
db.col.ensureIndex({rating: -1, updated: -1})
but this leads to scanning of the whole collection, and when I create
db.col.ensureIndex({ updated: -1 ,rating: -1})
it scans fewer documents.
I just want to ask to be clear about sorting on multiple fields and what is the order to be preserved when doing so. By reading the MongoDB documents, it's clear that the field on which we need to perform sorting should be the last field. So that is the case I assumed in my $ne query above. Am I doing anything wrong?
The MongoDB query optimizer works by trying different plans to determine which approach works best for a given query. The winning plan for that query pattern is then cached for the next ~1,000 queries or until you do an explain().
To understand which query plans were considered, you should use explain(1), eg:
db.col.find({category:'A'}).sort({updated: -1}).explain(1)
The allPlans detail will show all plans that were compared.
If you run a query which is not very selective (for example, if many records match your criteria of {category: { $ne:'A'}}), it may be faster for MongoDB to find results using a BasicCursor (table scan) rather than matching against an index.
The order of fields in the query generally does not make a difference for the index selection (there are a few exceptions with range queries). The order of fields in a sort does affect the index selection. If your sort() criteria does not match the index order, the result data has to be re-sorted after the index is used (you should see scanAndOrder:true in the explain output if this happens).
It's also worth noting that MongoDB will only use one index per query (with the exception of $ors).
So if you are trying to optimize the query:
db.col.find({category:'A'}).sort({updated: -1, rating: -1})
You will want to include all three fields in the index:
db.col.ensureIndex({category: 1, updated: -1, rating: -1})
FYI, if you want to force a particular query to use an index (generally not needed or recommended), there is a hint() option you can try.
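The point about sort order and index order can be sketched as a small check in plain JavaScript. This is a simplified model under stated assumptions, not the planner's actual logic: leading index keys matched by equality are skipped, and the remaining keys must list the sort fields in the same order, with directions either all matching or all reversed. `indexSupportsSort` is a hypothetical helper.

```javascript
// Simplified check: can this index return results already in sort order?
function indexSupportsSort(indexSpec, equalityFields, sortSpec) {
  const indexKeys = Object.entries(indexSpec);
  // Skip leading index keys that the query pins with an equality match.
  let start = 0;
  while (start < indexKeys.length && equalityFields.includes(indexKeys[start][0])) {
    start += 1;
  }
  const sortKeys = Object.entries(sortSpec);
  const rest = indexKeys.slice(start, start + sortKeys.length);
  if (rest.length < sortKeys.length) return false;
  const sameNames = sortKeys.every(([field], i) => rest[i][0] === field);
  const forward = sortKeys.every(([, dir], i) => rest[i][1] === dir);
  const reversed = sortKeys.every(([, dir], i) => rest[i][1] === -dir);
  return sameNames && (forward || reversed);
}

const sort = { updated: -1, rating: -1 };
console.log(indexSupportsSort({ category: 1, updated: -1, rating: -1 }, ['category'], sort)); // true
console.log(indexSupportsSort({ category: 1, rating: -1, updated: -1 }, ['category'], sort)); // false
```

When the check fails, the results would have to be re-sorted after the index scan, which is the scanAndOrder case mentioned above.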
That is true but there are two layers of ordering you have here since you are sorting on a compound index.
As you noticed when the first field of the index matches the first field of sort it worked and the index was seen. However when working the other way around it does not.
As such, by your own observations, the order that needs to be preserved is the order of the fields in the query, from first to last. The mongo analyser can sometimes move fields around to match an index, but normally it will just try to match the first field; if it cannot, it will skip it.
Try this code: it sorts the data by 'name' first and then, for documents with the same 'name', sorts by 'filter'.
var cursor = db.collection('vc').find({ "name" : { $in: [ /cpu/, /memo/ ] } }, { _id: 0, }).sort( { "name":1 , "filter": 1 } );
Sort and Index Use
MongoDB can obtain the results of a sort operation from an index which
includes the sort fields. MongoDB may use multiple indexes to support
a sort operation if the sort uses the same indexes as the query
predicate. ... Sort operations that use an index often have better
performance than blocking sorts.
db.restaurants.find().sort( { "borough": 1, "_id": 1 } )
more information :
https://docs.mongodb.com/manual/reference/method/cursor.sort/