In MongoDB, which index would be more efficient? One that queries an array with two values, or one that uses an $or statement?

Let's say I have a document that looks like this:
{
  _id: ObjectId("5260ca3a1606ed3e76bf3835"),
  event_id: "20131020_NFL_SF_TEN",
  team: {
    away: "SF",
    home: "TEN"
  }
}
I want to query for any game with "SF" as the away team or home team. So I put an index on team.away and team.home and run an $or query to find all San Francisco games.
Another option:
{
  _id: ObjectId("5260ca3a1606ed3e76bf3835"),
  event_id: "20131020_NFL_SF_TEN",
  team: [
    {
      name: "SF",
      loc: "AWAY"
    },
    {
      name: "TEN",
      loc: "HOME"
    }
  ]
}
In the array above, I could put an index on team.name instead of two indexes as before. Then I would query team.name for any game with "SF" inside.
Which query would be more efficient? Thanks!

I believe that you would want to use the second example you gave with the single index on team.name.
There are some special considerations that you need to know when working with the $or operator. Quoting from the documentation (with some additional formatting):
When using indexes with $or queries, remember that each clause of an $or query will execute in parallel. These clauses can each use their own index.
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ] } )
For this query, you would create one index on price:
db.inventory.ensureIndex({ price: 1 })
and another index on sale:
db.inventory.ensureIndex({ sale: 1 })
rather than a compound index.
Taking your first example into consideration, it doesn't make much sense to index a field that you are not going to query specifically. Since you don't mind whether SF is playing an away or a home game, you would always include both the away and home fields in your query. That means you're using two indexes when all you need to query is one value: SF.
It seems appropriate to mention at this stage that you should always consider the majority of your queries when thinking about the format of your documents. Think about the queries that you are planning to make most often and build your documents accordingly. It's always better to handle 80% of the cases as best you can rather than trying to solve all the possibilities (which might lead to worse performance overall).
In your second example, with the array of embedded documents, you would only need one index, as you said: a multikey index on team.name, with one key per array element. That saves valuable space on your server.
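To make the comparison concrete, here is a toy in-memory model of the two designs. It is plain JavaScript, not MongoDB's actual index implementation. Each "index" is just a Map from key to document _ids, and the array design stores one key per element, which is roughly what a multikey index on team.name does. The collection name `games` in the comments is hypothetical.

```javascript
// Toy model of the two designs (plain JavaScript, not MongoDB internals).
// Shell equivalents, with a hypothetical collection name `games`:
//   design 1: db.games.createIndex({ "team.away": 1 }); db.games.createIndex({ "team.home": 1 })
//             db.games.find({ $or: [{ "team.away": "SF" }, { "team.home": "SF" }] })
//   design 2: db.games.createIndex({ "team.name": 1 })
//             db.games.find({ "team.name": "SF" })

// An "index" here is a Map from key value to the Set of _ids holding that key.
function buildIndex(docs, keysOf) {
  const index = new Map();
  for (const doc of docs) {
    for (const key of keysOf(doc)) {
      if (!index.has(key)) index.set(key, new Set());
      index.get(key).add(doc._id);
    }
  }
  return index;
}

// Design 1: one probe per $or clause, then a de-duplicating union.
function orLookup(awayIdx, homeIdx, team) {
  const ids = new Set();
  let lookups = 0;
  for (const idx of [awayIdx, homeIdx]) {
    lookups++;
    for (const id of idx.get(team) || []) ids.add(id);
  }
  return { ids: [...ids].sort(), lookups };
}

// Design 2: a single probe into a multikey-style index (one key per array element).
function multikeyLookup(nameIdx, team) {
  return { ids: [...(nameIdx.get(team) || [])].sort(), lookups: 1 };
}

// Made-up games in both shapes.
const gamesAsObject = [
  { _id: "g1", team: { away: "SF", home: "TEN" } },
  { _id: "g2", team: { away: "DAL", home: "SF" } },
  { _id: "g3", team: { away: "NYG", home: "DAL" } },
];
const gamesAsArray = [
  { _id: "g1", team: [{ name: "SF", loc: "AWAY" }, { name: "TEN", loc: "HOME" }] },
  { _id: "g2", team: [{ name: "DAL", loc: "AWAY" }, { name: "SF", loc: "HOME" }] },
  { _id: "g3", team: [{ name: "NYG", loc: "AWAY" }, { name: "DAL", loc: "HOME" }] },
];

const awayIdx = buildIndex(gamesAsObject, (d) => [d.team.away]);
const homeIdx = buildIndex(gamesAsObject, (d) => [d.team.home]);
const nameIdx = buildIndex(gamesAsArray, (d) => d.team.map((t) => t.name));

console.log(orLookup(awayIdx, homeIdx, "SF"));  // ids g1 and g2, via 2 lookups
console.log(multikeyLookup(nameIdx, "SF"));     // ids g1 and g2, via 1 lookup
```

Both designs return the same games. What the model highlights is the cost shape: one index probe versus one probe per $or clause plus a de-duplicating union. Real costs depend on your data and server version, so confirm with explain().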
Some more relevant quotes from the $or docs (again with added formatting):
Also, when using the $or operator with the sort() method in a query, the query will not use the indexes on the $or fields. Consider the following query which adds a sort() method to the above query:
db.inventory.find ({ $or: [{ price: 1.99 }, { sale: true }] }).sort({item:1})
This modified query will not use the index on price nor the index on sale.
So the question now is - are you planning to use the sort() function? If the answer is yes then you should be aware that your indexes might turn out to be useless! :(
The take-away from this is pretty much "it depends!". Consider the queries you plan to make, and consider what document structure and indexes will be most beneficial to you according to your usage projections.

Related

Will $or query in mongo findOne operation use index if not all expressions are indexed?

Mongo's documentation on the $or operator says:
When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.
So effectively, if you want the query to be efficient, both of the attributes used in the $or condition should be indexed.
However, I'm not sure if this applies to "findOne" operations as well since I can't use Mongo's explain functionality for a findOne operation. It seems logical to me that if you only care about returning one document, you would check an indexed condition first, since you could bail after just finding one without needing to care about the non-indexed fields.
Example
Let's pretend in the below that "email" is indexed, but username is not. It's a contrived example, but bear with me.
db.users.findOne(
  {
    $or: [
      { email: 'someUser@gmail.com' },
      { username: 'some-user' }
    ]
  }
)
Would the above use the email index, or would it do a full collection scan until it finds a document that matches the criteria?
I could not find anything documenting what would be expected here. Does someone know the answer?
Well I feel a little silly - it turns out that the docs I found saying you can only run explain on "find" may have been just referring to when you're in MongoDB Compass.
I ran the below (userEmail not indexed)
this.userModel
.findOne()
.where({ $or: [{ _id: id }, { userEmail: email }] })
.explain();
this.userModel
.getModelInstance()
.findOne()
.where({ _id: id })
.explain();
...and found that the first does do a COLLSCAN and the second's plan was IDHACK (basically it does a normal ID lookup).
When running performance tests, I can see that the first is about 4x slower in a collection with 20K+ documents. The exact difference depends on where the document you're looking for sits in natural order.
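The documented rule quoted in the question, and the COLLSCAN seen above, can be summarized in a tiny function. This is a toy encoding under a stated simplification: a clause counts as supported if any of its fields has an index. It is not the real query planner. The field names follow the Mongoose example above.

```javascript
// Toy encoding (an assumption, not the real query planner) of the documented
// rule: an $or can use indexes only if every clause is supported by one;
// otherwise the whole query falls back to a collection scan.
// Simplification: a clause counts as supported if any of its fields is indexed.
function planForOr(orClauses, indexedFields) {
  const everyClauseIndexed = orClauses.every((clause) =>
    Object.keys(clause).some((field) => indexedFields.has(field))
  );
  return everyClauseIndexed ? "IXSCAN" : "COLLSCAN";
}

// _id is always indexed; in this scenario userEmail is not.
const indexedFields = new Set(["_id"]);

console.log(planForOr([{ _id: "abc" }, { userEmail: "someUser@gmail.com" }], indexedFields)); // COLLSCAN
console.log(planForOr([{ _id: "abc" }], indexedFields)); // IXSCAN (the real plan for a bare _id match is the IDHACK fast path)
```

In the shell, findOne() returns a document rather than a cursor. One way to inspect the plan it would get is to run the equivalent find(...).limit(1).explain(), which goes through the same planner.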

MongoDB Single or Compound index for sorted query?

I'm using mongoose and I have a query like this:
const stories = await Story.find({ genre: 'romance' }).sort({ createdAt: -1 })
I want to set an index on Story so that this kind of query becomes faster.
Which one of these is the best approach and why:
1. Create one Compound index with both fields:
Story.createIndex({genre: 1, createdAt: -1})
2. Create two separate indexes on each field:
Story.createIndex({genre: 1})
Story.createIndex({createdAt: -1})
If "genre" is always going to be part of the search criteria, a compound index will always give better performance.
1.) A compound index consisting of the field being searched and the field being sorted on can satisfy both conditions.
2.) Creating more than one index assumes that both indexes will be used while fulfilling the query, which is not true. Index intersection only applies in a few circumstances. In this particular case, there is one field in the search criteria and another in the sort, so MongoDB will not employ index intersection.
So in this situation, I would go with the compound index.
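Here is a toy model of why the compound index wins. If the index entries are kept in { genre: 1, createdAt: -1 } order, an equality match on genre selects one contiguous run that is already sorted by createdAt descending, so no in-memory sort is needed. This is a plain-JavaScript sketch with made-up data, not MongoDB internals. In Mongoose, the index itself would typically be declared with schema.index({ genre: 1, createdAt: -1 }) on a (hypothetical) storySchema.

```javascript
// Toy model (plain JavaScript, not MongoDB internals) of the compound index
// { genre: 1, createdAt: -1 }: entries ordered by genre ascending, then
// createdAt descending, the same order the index keys would have.
function compareKeys(a, b) {
  if (a.genre !== b.genre) return a.genre < b.genre ? -1 : 1;
  return b.createdAt - a.createdAt; // newer first
}

function buildCompoundIndex(stories) {
  return stories
    .map((s) => ({ genre: s.genre, createdAt: s.createdAt, id: s.id }))
    .sort(compareKeys);
}

// Equality on the index prefix (genre) selects one contiguous run of entries
// that is already in createdAt-descending order, so no extra sort step is needed.
function findByGenreNewestFirst(index, genre) {
  const start = index.findIndex((e) => e.genre === genre);
  if (start === -1) return [];
  let end = start;
  while (end < index.length && index[end].genre === genre) end++;
  return index.slice(start, end).map((e) => e.id);
}

// Made-up stories; createdAt values stand in for timestamps.
const stories = [
  { id: "a", genre: "romance", createdAt: 3 },
  { id: "b", genre: "horror", createdAt: 5 },
  { id: "c", genre: "romance", createdAt: 9 },
  { id: "d", genre: "romance", createdAt: 1 },
];
const storyIndex = buildCompoundIndex(stories);
console.log(findByGenreNewestFirst(storyIndex, "romance")); // [ 'c', 'a', 'd' ]
```

With only separate { genre: 1 } and { createdAt: -1 } indexes, the planner would typically pick one of them. It would then either sort the genre matches in memory or walk createdAt order while filtering on genre.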
As long as every query that uses the createdAt field also uses the genre field, you should use the compound index.
Let's compare the two options:
Queries: as long as what I stated above holds, both options will behave the same. There is no difference between the two when it comes to query execution speed.
Memory: a compound index will use less memory, which is crucial if you have limited RAM. Let's see the difference with an example.
Let's take 3 documents:
{ name: "john", last_name: "mayer" }
{ name: "john", last_name: "cake" }
{ name: "banana", last_name: "pie" }
Running db.collection.stats() with option 1 (the compound index) gives:
totalIndexSize: 53248.0
Whereas for option 2:
totalIndexSize: 69632.0
Inserting: full disclosure, I have no idea how each option is affected. From small tests it seems that a compound index is slightly quicker, but I could not find documentation on this, nor did I investigate deeper.

How to query data efficiently in large mongodb collection?

I have one big MongoDB collection (3 million docs, 50 GB), and querying the data is very slow even though I have created indexes.
db.collection.find({"C123":1, "C122":2})
For example, this query will time out or be extremely slow (10s at least), even though I have created separate indexes for C123 and C122.
Should I create more indexes or increase the physical memory to speed up the queries?
For such a query you should create a compound index on both fields, and then it should be very efficient. Separate indexes won't help you much. The MongoDB engine will use the first index to get the results for the first part of the query. The second index, if used at all, adds little, and in some cases it can even slow your query down because of the extra lookups in the index and then in the actual data. You can confirm which indexes are used by running .explain() on your query in the shell.
See compound indexes:
https://docs.mongodb.com/manual/core/index-compound/
Also consider sorting directions on both your fields while making indexes.
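To illustrate the difference described above, here is a toy count of the documents fetched for { C123: 1, C122: 2 } under each indexing strategy. The data is made up and the functions are plain JavaScript stand-ins, not MongoDB internals. On a real server, explain("executionStats") reports comparable numbers as totalDocsExamined and nReturned.

```javascript
// Toy comparison (plain JavaScript, not MongoDB internals).
// Real index for the compound case: db.collection.createIndex({ C123: 1, C122: 1 })

// Single-field index on C123: fetch every doc with that C123, then filter on C122.
function examinedWithSingleIndex(docs, q) {
  const candidates = docs.filter((d) => d.C123 === q.C123);
  const results = candidates.filter((d) => d.C122 === q.C122);
  return { docsExamined: candidates.length, nReturned: results.length };
}

// Compound index { C123: 1, C122: 1 }: the key pair pins down exact matches.
function examinedWithCompoundIndex(docs, q) {
  const index = new Map();
  for (const d of docs) {
    const key = JSON.stringify([d.C123, d.C122]);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(d);
  }
  const results = index.get(JSON.stringify([q.C123, q.C122])) || [];
  return { docsExamined: results.length, nReturned: results.length };
}

// Made-up data: 1000 docs with C123 === 1, only 10 of which have C122 === 2.
const docs = [];
for (let i = 0; i < 1000; i++) docs.push({ _id: i, C123: 1, C122: i < 10 ? 2 : 3 });

const query = { C123: 1, C122: 2 };
console.log(examinedWithSingleIndex(docs, query));   // 1000 examined, 10 returned
console.log(examinedWithCompoundIndex(docs, query)); // 10 examined, 10 returned
```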
The answer is really simple.
You don't need to create more indexes, you need to create the right indexes. An index on field C124 won't help queries on field C123, so there's no point in creating it.
Use better/more hardware: more RAM, more machines (sharding).
Create the right indexes and use compound indexes carefully. (You can have at most 64 indexes per collection and 31 fields in a compound index.)
Use pagination on the MongoDB side.
Try to find out your most-used queries and build compound indexes around them.
Compound indexes strictly follow field order, so read the documentation and experiment.
Also try covered queries for 'summary'-like queries.
Learned it the hard way.
Use skip and limit, and process the data in a loop, 50000 documents at a time.
https://docs.mongodb.com/manual/reference/method/cursor.skip/
https://docs.mongodb.com/manual/reference/method/cursor.limit/
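Example (second batch: skip the first 50000 groups, take the next 50000):
db.collection.aggregate(
  [
    {
      $group: {
        _id: "$myDoc.homepage_domain",
        count: { $sum: 1 },
        entry: {
          $push: {
            location_city: "$myDoc.location_city",
            homepage_domain: "$myDoc.homepage_domain",
            country: "$myDoc.country",
            employee_linkedin: "$myDoc.employee_linkedin",
            linkedin_url: "$myDoc.linkedin_url",
            homepage_url: "$myDoc.homepage_url",
            industry: "$myDoc.industry",
            read_at: "$myDoc.read_at"
          }
        }
      }
    },
    // $group output has no guaranteed order; sort so batches are stable
    { $sort: { _id: 1 } },
    // $skip must come before $limit: limiting to 50000 and then skipping 50000 returns nothing
    { $skip: 50000 },
    { $limit: 50000 }
  ],
  { allowDiskUse: true }
).forEach(function (group) {
  // Copy each grouped entry into the Or9 collection
  group.entry.forEach(function (myDoc) {
    db.Or9.insertOne({
      HomepageDomain: myDoc.homepage_domain,
      location_city: myDoc.location_city
    });
  });
});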
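Stage order matters because each stage feeds the next. The sketch below uses plain array slices as stand-ins for $skip and $limit. This is a simplification that ignores sorting and server behavior. It shows how batching works and what happens when the two stages are reversed.

```javascript
// Plain-array stand-ins for the $skip and $limit stages (a simplification).
const skip = (docs, n) => docs.slice(n);
const limit = (docs, n) => docs.slice(0, n);

// Walk the input in fixed-size batches: $skip first, then $limit.
function batches(docs, batchSize) {
  const out = [];
  for (let offset = 0; offset < docs.length; offset += batchSize) {
    out.push(limit(skip(docs, offset), batchSize));
  }
  return out;
}

const items = Array.from({ length: 7 }, (_, i) => i);
console.log(batches(items, 3));        // [ [ 0, 1, 2 ], [ 3, 4, 5 ], [ 6 ] ]
console.log(skip(limit(items, 3), 3)); // [] : $limit before $skip empties the batch
```

Keep in mind that a large $skip still has to walk past every skipped document. For very large collections, paging by a range on an indexed field (for example, _id greater than the last _id seen) usually scales better.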

How to index an $or query with sort

Suppose I have a query that looks something like this:
db.things.find({
deleted: false,
type: 'thing',
$or: [{
'creator._id': someid
}, {
'parent._id': someid
}, {
'somerelation._id': someid
}]
}).sort({
'date.created': -1
})
That is, I want to find documents that meet one of those three conditions and sort them newest first. However, $or queries do not use indexes in parallel when used with a sort. So how would I index this query?
http://docs.mongodb.org/manual/core/indexes/#index-behaviors-and-limitations
You can assume the following selectivity:
deleted - 99%
type - 25%
creator._id, parent._id, somerelation._id - < 1%
Now you are going to need more than one index for this query; there is no doubt about that.
The question is what indexes?
Now you have to take into consideration that none of your $or clauses will be able to use an index to return their data in sort order. This is due to a bug in MongoDB's query optimizer: https://jira.mongodb.org/browse/SERVER-1205 .
So you know that the $or will have some performance problems with a sort, and that putting the sort field into the $or clause indexes is useless at the moment.
Considering this, the first index you want is one that covers the base query you are making. As @Leonid said, you could make this into a compound index. However, I would not order it the way he did. Instead, I would do:
db.col.ensureIndex({ type: -1, deleted: -1, 'date.created': -1 })
I am very unsure about the deleted field being in the index at all, due to its very low selectivity. Having it in the index could, in fact, make the operation less performant than leaving it out (this is true for most databases, including SQL ones). This part will need testing by you; maybe the field should be last (?).
As for the order of the index, again I have just guessed. I have made all fields DESC because your sort is DESC, but you will need to check this yourself with explain().
So that should be able to handle the master clause of your query. Now to deal with those $ors.
Each $or clause will use an index separately. The MongoDB query optimizer also looks for indexes for each clause separately, as though it were a separate query altogether. One snag worth noting is that compound indexes ( http://docs.mongodb.org/manual/core/indexes/#compound-indexes ) work on prefixes (an example note here: http://docs.mongodb.org/manual/core/indexes/#id5 ). So you can't make one single compound index that covers all three clauses. Considering the bug above, a more optimal way of declaring indexes for the $or is:
db.col.ensureIndex({ 'creator._id': 1 });
db.col.ensureIndex({ 'parent._id': 1 });
db.col.ensureIndex({ 'somerelation._id': 1 });
This should get you started on making optimal indexes for your query.
I should stress however that you need to test this yourself.
MongoDB can use only one index per query, so I can't see a way to use indexes to query someid in your model.
So the best approach is to add a special field for this task:
ids = [creator._id, parent._id, somerelation._id]
Then you'll be able to query without using the $or operator:
db.things.find({
deleted: false,
type: 'thing',
ids: someid
}).sort({
'date.created': -1
})
In this case your index will look something like this:
{deleted:1, type:1, ids:1, 'date.created': -1}
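Here is a minimal sketch of maintaining the denormalized ids field on write. withIds is a hypothetical helper, and ids are plain strings here for simplicity. Real ObjectIds would need value comparison in JavaScript, although MongoDB itself compares them by value. The matches function mimics how an equality filter on an array field matches when any element is equal, which is what lets { ids: someid } replace the three-way $or.

```javascript
// Hypothetical helper: derive `ids` from the three relation fields on write.
function withIds(doc) {
  const ids = [doc.creator, doc.parent, doc.somerelation]
    .filter((rel) => rel && rel._id !== undefined) // relations may be missing
    .map((rel) => rel._id);
  return { ...doc, ids: [...new Set(ids)] };
}

// In-memory stand-in for { deleted: false, type: 'thing', ids: someid }:
// equality against an array field matches if any element is equal.
function matches(doc, filter) {
  return doc.deleted === filter.deleted &&
    doc.type === filter.type &&
    doc.ids.includes(filter.ids);
}

const thing = withIds({
  deleted: false,
  type: "thing",
  creator: { _id: "u1" },
  parent: { _id: "u2" },
});

console.log(thing.ids); // [ 'u1', 'u2' ]
console.log(matches(thing, { deleted: false, type: "thing", ids: "u2" })); // true
console.log(matches(thing, { deleted: false, type: "thing", ids: "u3" })); // false
```

Since ids is the only array in that compound index, the index would be multikey on ids. A single index can then serve the equality fields, the id lookup and the date.created sort.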
If you have the flexibility to adjust the schema, I would suggest adding a new field, associatedIds: [ ], which would hold creator._id, parent._id and somerelation._id. You can update that field atomically when you update the main corresponding field. You can then have a compound index on this field, type and date.created, which eliminates the need for $or in your query entirely.
Given your indexing requirements, I would suggest using the $orderby operator alongside your $or query. By that I mean you should index the criteria in each of your $or expressions and then use $orderby to sort the result. (Note that $orderby, with a lowercase "b", is a legacy query modifier equivalent to cursor.sort(); it has been deprecated in the mongo shell since v3.2.)
For example:
db.things.find({
  $query: {
    deleted: false,
    type: 'thing',
    $or: [
      { 'creator._id': someid },
      { 'parent._id': someid },
      { 'somerelation._id': someid }
    ]
  },
  $orderby: { 'date.created': -1 }
})
The above query would require a compound index on each of the fields in the $or expressions, combined with the sort key specified in $orderby. For example:
db.things.ensureIndex({ 'parent._id': 1, 'date.created': -1 })
and so on for the other fields.
It is good practice to specify a limit on the result to prevent MongoDB from performing a huge in-memory sort.
Read more about the $orderby operator in the MongoDB documentation.
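The idea behind one compound index per $or clause, each ending in 'date.created': -1, is that every clause produces its matches already newest-first. The results then only need to be merged rather than fully re-sorted. Below is a toy k-way merge in plain JavaScript with made-up documents, not MongoDB's implementation. As an assumption, newer servers may show a comparable SORT_MERGE stage in explain() output for this pattern, so check what your version reports.

```javascript
// Toy k-way merge (plain JavaScript, not MongoDB internals). Each input list
// stands for what one $or clause's index { <field>: 1, "date.created": -1 }
// returns: matching docs, already newest-first.
function mergeNewestFirst(lists) {
  const pos = lists.map(() => 0); // read position in each list
  const seen = new Set();         // a doc can match several clauses
  const out = [];
  for (;;) {
    // Pick the list whose next doc is the newest.
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (pos[i] >= lists[i].length) continue;
      if (best === -1 || lists[i][pos[i]].created > lists[best][pos[best]].created) best = i;
    }
    if (best === -1) return out; // all lists exhausted
    const doc = lists[best][pos[best]++];
    if (!seen.has(doc._id)) {
      seen.add(doc._id);
      out.push(doc._id);
    }
  }
}

// Made-up matches for someid, one list per $or clause.
const byCreator = [{ _id: "d4", created: 40 }, { _id: "d1", created: 10 }];
const byParent = [{ _id: "d3", created: 30 }, { _id: "d1", created: 10 }];
const bySomerelation = [{ _id: "d2", created: 20 }];

console.log(mergeNewestFirst([byCreator, byParent, bySomerelation])); // [ 'd4', 'd3', 'd2', 'd1' ]
```

Because output comes out in order, a limit can stop the merge early. That is part of why adding a limit helps.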

How to optimize this query with $and and $or

Coming from SQL I have this search condition
WHERE (col1 LIKE "%foo%" OR col2 LIKE "%foo%") AND
(col1 LIKE "%bar%" OR col2 LIKE "%bar%")
which I want to convert to MongoDB.
I came up with this, hopefully semantically identical, query:
{
$and: [
{
$or: [
{ col1: /.*foo.*/ },
{ col2: /.*foo.*/ }
]
},
{
$or: [
{ col1: /.*bar.*/ },
{ col2: /.*bar.*/ }
]
}
]
}
Is this the correct way or can it be improved?
Any suggestions about indexes (if they can be used at all)?
Yes, that's the correct way to implement that query for MongoDB.
If you want an index to fully assist the query, it needs to be a compound index that includes both fields, because MongoDB queries can only use one index per query. So an index like this:
db.coll.ensureIndex({col1: 1, col2: 1})
You can confirm your query is using the index you expect by using explain().
$and and $or queries are optimised in different ways. Quoting directly from "50 Tips and Tricks for MongoDB Developers":
Tip #27: AND-queries should match as little as possible as fast as possible
and
Tip #28: OR-queries should match as much as possible as soon as possible
So depending on the complexity and flexibility of your actual queries, you should make foo and bar more generic, yet try to limit the results from both $or statements.
Hope it helps
You have three problems here:
$and is two evaluated queries
$or is two evaluated queries
Non-prefix regexes
$and does not use a multiple-index plan, only a single-index plan, whereas $or does (each clause can use its own index). As such, a single compound index here will not help that much.
In fact no index will help much, since MongoDB cannot use an index efficiently for non-prefix regexes: at best it has to scan every key in the index rather than a narrow range.
So adding any index at all would do little for this query.
There is no way to optimise this query in its current form.
Normally a good way of doing searches like this is to split out the words you are searching for into a subdocument of words that can be searched on directly and indexed. Take a look at: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo for examples.
When using this approach, you would probably have one index on the array of words and that's it. MongoDB should be able to use that index for the entire query (twice over for the $or).
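Here is a sketch of the word-array approach, assuming a hypothetical words field computed when each document is written. With a multikey index on words (for example db.coll.createIndex({ words: 1 })), the query becomes { words: { $all: ["foo", "bar"] } }. The functions below only mimic that match in memory. Note the trade-off: this matches whole words, not arbitrary substrings like %foo%.

```javascript
// Hypothetical: build the `words` array from col1 and col2 at write time.
function buildWords(doc) {
  const text = `${doc.col1 || ""} ${doc.col2 || ""}`.toLowerCase();
  // Split on runs of non-word characters and de-duplicate.
  return [...new Set(text.split(/\W+/).filter(Boolean))];
}

// In-memory stand-in for { words: { $all: terms } }.
function matchesAll(words, terms) {
  return terms.every((term) => words.includes(term.toLowerCase()));
}

const docA = { col1: "Foo Fighters", col2: "snack bar" };
const docB = { col1: "food", col2: "bar stool" };

console.log(matchesAll(buildWords(docA), ["foo", "bar"])); // true
console.log(matchesAll(buildWords(docB), ["foo", "bar"])); // false: "food" is not the word "foo"
```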
Edit:
Also, $or works differently from how you have posted it.
You don't need the $and and the two $ors. You only need one $or with a multi condition:
Edit again:
I decided to take out the $or altogether. Note that this still uses index-unfriendly regexes, but it is an interesting concept.
{ col1: /.*(foo|bar).*/ , col2: /.*(bar|foo).*/ }
If you want something index-friendly, you will need to change the way your query works entirely, as described above.
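It is worth checking the single-document rewrite against the original condition before relying on it. The plain-JavaScript predicates below are a quick check, not a MongoDB query, and they suggest the two are not equivalent. The rewrite requires each field to contain foo or bar. The original only requires each term to appear in at least one of the two fields.

```javascript
// Original: (col1 ~ foo OR col2 ~ foo) AND (col1 ~ bar OR col2 ~ bar)
function original(d) {
  return (/foo/.test(d.col1) || /foo/.test(d.col2)) &&
         (/bar/.test(d.col1) || /bar/.test(d.col2));
}

// Rewrite: { col1: /.*(foo|bar).*/, col2: /.*(bar|foo).*/ }
function rewritten(d) {
  return /.*(foo|bar).*/.test(d.col1) && /.*(bar|foo).*/.test(d.col2);
}

const samples = [
  { col1: "foo bar", col2: "baz" }, // original: true,  rewrite: false
  { col1: "foo", col2: "foo" },     // original: false, rewrite: true
  { col1: "foo", col2: "bar" },     // both true
];
for (const d of samples) console.log(original(d), rewritten(d));
```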