How to optimize this query with $and and $or - mongodb

Coming from SQL I have this search condition
WHERE (col1 LIKE "%foo%" OR col2 LIKE "%foo%") AND
(col1 LIKE "%bar%" OR col2 LIKE "%bar%")
which I want to convert to MongoDB.
I came up with this query, which I hope is semantically identical:
{
$and: [
{
$or: [
{ col1: /.*foo.*/ },
{ col2: /.*foo.*/ }
]
},
{
$or: [
{ col1: /.*bar.*/ },
{ col2: /.*bar.*/ }
]
}
]
}
Is this the correct way or can it be improved?
Any suggestions about indexes (if they can be used at all)?

Yes, that's the correct way to implement that query for MongoDB.
If you want an index to fully assist the query, it needs to be a compound index that includes both fields, because MongoDB queries can only use one index per query. So an index like this:
db.coll.ensureIndex({col1: 1, col2: 1})
You can confirm your query is using the index you expect by using explain().
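For more than two search terms, the same $and-of-$or shape can be generated instead of written by hand. The following is a minimal sketch in plain JavaScript; buildSearchFilter and escapeRegex are hypothetical helper names (not part of MongoDB), and the field names are the ones from the question. Note that /foo/ matches the same strings as /.*foo.*/, so the shorter form is used.

```javascript
// Escape regex metacharacters so that user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build { $and: [ { $or: [ {f1: /t/}, {f2: /t/} ] }, ... ] }:
// every term must appear in at least one of the given fields.
function buildSearchFilter(terms, fields) {
  return {
    $and: terms.map((term) => ({
      $or: fields.map((field) => ({ [field]: new RegExp(escapeRegex(term)) })),
    })),
  };
}

const filter = buildSearchFilter(["foo", "bar"], ["col1", "col2"]);
// In the mongo shell this could then be passed as db.coll.find(filter).
```

The resulting object has the same structure as the hand-written query above, so explain() can be used on it in the same way.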

$and and $or queries are optimised in different ways. Quoting directly from 50 Tips and Tricks for MongoDB Developers:
Tip #27: AND-queries should match as little as possible as fast as possible
and
Tip #28: OR-queries should match as much as possible as soon as possible
So depending on the complexity and flexibility of your actual queries, you should make foo and bar more generic, yet try to limit the results from both $or statements.
Hope it helps

You have three problems here:
$and is evaluated as two queries
$or is evaluated as two queries
Non-prefixed regex
$and does not use a multiple-index plan, only a single-index plan. $or, however, does, so a single compound index will not help much here.
In fact no index will help much, since MongoDB cannot use indexes efficiently for regexes that are not anchored to a prefix.
So adding any index at all would be close to useless for this query.
There is no way to optimise this query in its current form.
Normally a good way of doing searches like this is to split out the words you are searching for into a subdocument of words that can be searched on directly and indexed. Take a look at: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo for examples.
When using this you would probably house one index on the array of words and that's it. MongoDB should be able to use that index for the entire query (twice over for the $or).
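As a rough illustration of the word-array idea, here is a sketch in plain JavaScript. The keywords field name and the extractKeywords helper are assumptions made for this example, and the approach changes the semantics from substring matching to whole-word matching.

```javascript
// Split the searchable fields of a document into a de-duplicated,
// lowercased word list that can be stored alongside the document.
function extractKeywords(doc, fields) {
  const words = new Set();
  for (const field of fields) {
    const value = doc[field];
    if (typeof value !== "string") continue;
    for (const word of value.toLowerCase().split(/[^a-z0-9]+/)) {
      if (word) words.add(word);
    }
  }
  return [...words];
}

const doc = { col1: "Foo fighters", col2: "a bar, a foo" };
doc.keywords = extractKeywords(doc, ["col1", "col2"]);

// Whole-word search for both terms against the (hypothetical) keywords field.
// In the shell this might look like:
//   db.coll.createIndex({ keywords: 1 })
//   db.coll.find({ keywords: { $all: ["foo", "bar"] } })
const query = { keywords: { $all: ["foo", "bar"] } };
```

The word list has to be regenerated whenever col1 or col2 changes, which is the main cost of this design.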
edit
Also, $or works differently from how you have posted.
You don't need the $and and the two $ors. You only need one $or with a multi condition:
Edit again
I decided to take out the $or altogether. Note that this still uses the index-unfriendly regex, but it is an interesting concept.
{ col1: /.*(foo|bar).*/ , col2: /.*(bar|foo).*/ }
If you want an index-friendly query, you will need to change the way your query works entirely, as described above.
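One way to sanity-check a rewritten filter against the original condition is to evaluate both on a few sample documents. The sketch below does this in plain JavaScript rather than in MongoDB, with made-up sample documents; it suggests that the single-document form is not a drop-in replacement for the $and/$or query, since it requires both fields to match rather than both terms to be present.

```javascript
// Original condition: each term must occur in col1 or col2.
function matchesOriginal(doc) {
  const has = (re) => re.test(doc.col1) || re.test(doc.col2);
  return has(/foo/) && has(/bar/);
}

// Single-document form: both fields must match /(foo|bar)/.
function matchesCombined(doc) {
  return /.*(foo|bar).*/.test(doc.col1) && /.*(bar|foo).*/.test(doc.col2);
}

const samples = [
  { col1: "foo", col2: "bar" },       // matched by both forms
  { col1: "foo bar", col2: "other" }, // matched by the original only
  { col1: "foo", col2: "foo" },       // matched by the combined form only
];

// One [original, combined] pair per sample document.
const results = samples.map((d) => [matchesOriginal(d), matchesCombined(d)]);
```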

Related

Mongodb searching on array / indexing

I'm using the airbnb sample set and it has a field that looks like:
"amenities": ["TV", "Cable TV", "Wifi"....
So I'm trying to do a case-INsensitive, wildcard search (on one or more values passed in).
Only thing I've found that works is:
{ amenities: { $in: [ /wi/ ] }}
Is that the best way?
I ran it in Compass on the dataset as imported (5600 docs). Explain says it took ~20ms on my machine and warned that there was no index. I then created an index on the amenities field through the Compass UI, and the same search jumped to ~100ms. Why is it taking 5x as long with an index? Is there a better way to do this?
The way to run that query is:
{ amenities: /wi/i }
//better but not always useful
{ amenities: /wi/i }, { amenities:1, _id:0 }
A regex match already traverses the array, and case insensitivity has to be set through the regex options.
For multikey indexes the second query won't be a covered query. Otherwise, it would be blazing fast.
I tested a similar search with and without an index. Execution time dropped about 10x (from 1500ms to 150ms on a huge collection), measured with Mongo Hacker.
As you report, executionTimeMillis is not that different, but it is still smaller.
The reason you don't see a huge decrease in time is that the index stores each array entry separately. When it finds a match, it goes back to the collection to fetch the whole array field instead of answering from the index.
Probably indexes aren't very useful for arrays.
When querying with an unanchored regex, the query executor will have to scan every index key to see if there is a match.
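To make the difference concrete, here is a toy model in plain JavaScript. It is only a simulation under simplifying assumptions (a multikey index modelled as a sorted list of [key, docId] pairs, document ids taken from array positions); it is not how MongoDB is implemented, and the helper names are invented for this example.

```javascript
// Model a multikey index as a sorted list of [key, docId] entries,
// one entry per array element.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  docs.forEach((doc, id) => {
    for (const value of doc[field]) entries.push([value, id]);
  });
  return entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Unanchored regex: every key has to be tested.
function scanAll(index, regex) {
  let examined = 0;
  const ids = new Set();
  for (const [key, id] of index) {
    examined++;
    if (regex.test(key)) ids.add(id);
  }
  return { examined, ids: [...ids] };
}

// Case-sensitive prefix: binary-search to the first candidate key,
// then visit only the keys that start with the prefix.
function scanPrefix(index, prefix) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < prefix) lo = mid + 1;
    else hi = mid;
  }
  let examined = 0;
  const ids = new Set();
  for (let i = lo; i < index.length && index[i][0].startsWith(prefix); i++) {
    examined++;
    ids.add(index[i][1]);
  }
  return { examined, ids: [...ids] };
}

const docs = [
  { amenities: ["TV", "Cable TV", "Wifi"] },
  { amenities: ["Kitchen", "Wifi"] },
  { amenities: ["TV", "Heating"] },
];
const index = buildMultikeyIndex(docs, "amenities");
const unanchored = scanAll(index, /wi/i); // tests all 7 keys
const anchored = scanPrefix(index, "Wi"); // tests only the 2 "Wifi" keys
```

Both scans find the same two documents here, but the unanchored one has to touch every key, which is why the index gives little benefit for /wi/i.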
You might find a collated index to be helpful.
Create an index with the appropriate collation, like:
(strength 1 and 2 are case-insensitive)
db.collection.createIndex({amenities:1},{collation:{locale:"en",strength:1}})
Then query using the same collation:
db.collection.find({amenities:"wifi"}).collation({locale:"en",strength:1})
The search will be case insensitive, and it can efficiently use the index.
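As an analogy only, JavaScript's Intl.Collator can show what the strength levels mean for comparisons. The mapping used here ("base" sensitivity roughly corresponding to strength 1, "accent" to strength 2) is an approximation for illustration, not MongoDB's own implementation. The last comparison is the important caveat: a collated equality match compares whole values, so it does not replace a wildcard search such as /wi/i.

```javascript
// "base" ignores case and accents (roughly strength 1);
// "accent" ignores case but not accents (roughly strength 2).
const primary = new Intl.Collator("en", { sensitivity: "base" });
const secondary = new Intl.Collator("en", { sensitivity: "accent" });

// Case differences are ignored at both levels.
const sameIgnoringCase = primary.compare("Wifi", "wifi") === 0;
const sameIgnoringCaseLevel2 = secondary.compare("Wifi", "wifi") === 0;

// Accent differences are ignored only at the first level.
const sameIgnoringAccent = primary.compare("cafe", "café") === 0;
const accentStillMatters = secondary.compare("cafe", "café") !== 0;

// A prefix is still a different value: "wi" does not equal "wifi".
const prefixIsNotEqual = primary.compare("wi", "wifi") !== 0;
```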

In MongoDB, which index would be more efficient? One that queries an array with two values, or one that uses an $or statement?

Let's say I have a document that looks like this:
{
_id: ObjectId("5260ca3a1606ed3e76bf3835"),
event_id: "20131020_NFL_SF_TEN",
team: {
away: "SF",
home: "TEN"
}
}
I want to query for any game with "SF" as the away team or home team. So I put an index on team.away and team.home and run an $or query to find all San Francisco games.
Another option:
{
_id: ObjectId("5260ca3a1606ed3e76bf3835"),
event_id: "20131020_NFL_SF_TEN",
team: [
{
name: "SF",
loc: "AWAY"
},
{
name: "TEN",
loc: "HOME"
}
]
}
In the array above, I could put an index on team.name instead of two indexes as before. Then I would query team.name for any game with "SF" inside.
Which query would be more efficient? Thanks!
I believe that you would want to use the second example you gave with the single index on team.name.
There are some special considerations that you need to know when working with the $or operator. Quoting from the documentation (with some additional formatting):
When using indexes with $or queries, remember that each clause of an $or query will execute in parallel. These clauses can each use their own index.
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ] } )
For this query, you would create one index on price:
db.inventory.ensureIndex({ price: 1 })
and another index on sale:
db.inventory.ensureIndex({ sale: 1 })
rather than a compound index.
Taking your first example into consideration, it doesn't make much sense to index a field that you are not going to specifically query. When you say that you don't mind if SF is playing on an away or home game, you would always include both the away and home fields in your query, so you're using two indexes where all you need to query is one value - SF.
It seems appropriate to mention at this stage that you should always consider the majority of your queries when thinking about the format of your documents. Think about the queries that you are planning to make most often and build your documents accordingly. It's always better to handle 80% of the cases as best you can rather than trying to solve all the possibilities (which might lead to worse performance overall).
Looking at your second example, of nested documents, as you said, you would only need to use one index (saving valuable space on your server).
Some more relevant quotes from the $or docs (again with added formatting):
Also, when using the $or operator with the sort() method in a query, the query will not use the indexes on the $or fields. Consider the following query which adds a sort() method to the above query:
db.inventory.find ({ $or: [{ price: 1.99 }, { sale: true }] }).sort({item:1})
This modified query will not use the index on price nor the index on sale.
So the question now is - are you planning to use the sort() function? If the answer is yes then you should be aware that your indexes might turn out to be useless! :(
The take-away from this is pretty much "it depends!". Consider the queries you plan to make, and consider what document structure and indexes will be most beneficial to you according to your usage projections.
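To compare the two layouts side by side, the sketch below converts the first document shape into the second and writes out the filter each layout would need. It is plain JavaScript; toTeamArray is a hypothetical helper name, and the filters are only built as objects here, not run against a database.

```javascript
// Convert { team: { away, home } } into { team: [ { name, loc }, ... ] }.
function toTeamArray(doc) {
  return {
    ...doc,
    team: [
      { name: doc.team.away, loc: "AWAY" },
      { name: doc.team.home, loc: "HOME" },
    ],
  };
}

const embedded = {
  event_id: "20131020_NFL_SF_TEN",
  team: { away: "SF", home: "TEN" },
};
const asArray = toTeamArray(embedded);

// First layout: one $or clause per field, each backed by its own index.
const orQuery = { $or: [{ "team.away": "SF" }, { "team.home": "SF" }] };

// Second layout: a single condition, backed by one index on team.name.
const arrayQuery = { "team.name": "SF" };
```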

In MongoDB, is one big $or search faster than multiple single searches?

I have a list of about 50 tags in an array, and want to search through my documents to find records that match these tags.
Because they're user-submitted and MongoDB is case-sensitive, I'm using /wildcard/i as a means of searching. I know this is not the fastest way to do a search but I can't think of a better solution.
I can do my query in two ways. The first is to run a for loop over my tags array, and for each result, perform:
db.collection.find({tags: /<tag[x]>/i})
Or, I can collect all of the tags and run one single lookup using $or, like so:
db.collection.find({$or:[{tags:/<tag1>/i},{tags:/<tag2>/i},{tags:/<tag3>/i}, ... {tags:/<tag50>/i}]});
I have tried both, and found using $or to be significantly faster - but because of the work-in-progress state of my application, it's very difficult to tell whether this is because it's actually faster or whether my app is causing significant overhead in other areas (it is).
So for clarification, in MongoDB is a big query performed once faster than small queries performed many times?
EDIT: Another example would be whether looking up 3 individual records based on _id is faster than doing one lookup using {$or:[{_id: ObjectId([id1])},{_id: ObjectId([id2])},{_id: ObjectId([id3])}]}. Is less more?
I recommend you adjust your schema so it keeps a normalized array of tags. When you insert a new document, do it like this:
tags : [ "business", "Computing", "PayPal" ],
lowercaseTags : [ "business", "computing", "paypal" ]
Similarly when you update the tags, update both arrays.
Create an index on lowercaseTags, and then when you want to query them, use a single query with the $in operator, and the normalized form of the search terms.
For example, to search for business iTunes YouTube, use this query:
db.collection.find( { lowercaseTags : { $in: [ "business", "itunes", "youtube" ] } } )
This answer gives an example of this approach. It should be loads faster than what you have.
An alternate approach you can take is to create a text index and use the text command.
Both of these approaches are geared toward index optimization, and designing your schema to work well with Mongo. The payoff should be a lot higher than whatever difference there is between a single $or query and 50 simpler queries.
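The normalized-array approach can be sketched with two small helpers in plain JavaScript. The helper names are invented for this example; lowercaseTags is the field name suggested above.

```javascript
// Add a lowercased copy of the tags so that it can be indexed and
// matched exactly instead of with a case-insensitive regex.
function withLowercaseTags(doc) {
  return { ...doc, lowercaseTags: doc.tags.map((tag) => tag.toLowerCase()) };
}

// Normalize the search terms the same way before building the $in filter.
function buildTagQuery(searchTerms) {
  return { lowercaseTags: { $in: searchTerms.map((term) => term.toLowerCase()) } };
}

const stored = withLowercaseTags({ tags: ["business", "Computing", "PayPal"] });
const tagQuery = buildTagQuery(["business", "iTunes", "YouTube"]);
// In the shell: db.collection.find(tagQuery)
```

Because the stored values and the search terms go through the same normalization, one $in lookup replaces the 50 regex clauses.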

How to index an $or query with sort

Suppose I have a query that looks something like this:
db.things.find({
deleted: false,
type: 'thing',
$or: [{
'creator._id': someid
}, {
'parent._id': someid
}, {
'somerelation._id': someid
}]
}).sort({
'date.created': -1
})
That is, I want to find documents that meet one of those three conditions and sort them by newest. However, $or queries do not use indexes in parallel when used with a sort. Thus, how would I index this query?
http://docs.mongodb.org/manual/core/indexes/#index-behaviors-and-limitations
You can assume the following selectivity:
deleted - 99%
type - 25%
creator._id, parent._id, somerelation._id - < 1%
Now you are going to need more than one index for this query; there is no doubt about that.
The question is what indexes?
Now you have to take into consideration that none of your $or clauses will be able to sort their data in an optimal manner using the index, due to a bug in MongoDB's query optimizer: https://jira.mongodb.org/browse/SERVER-1205 .
So you know that the $or will have some performance problems with a sort, and that putting the sort field into the $or clause indexes is useless at the moment.
Considering this, the first index you want is one that covers the base query you are making. As @Leonid said, you could make this into a compound index; however, I would not use the order he has used. Instead, I would do:
db.col.ensureIndex({type:-1,deleted:-1,'date.created':-1})
I am very unsure about the deleted field being in the index at all due to its super low selectivity; it could, in fact, create a less performant operation (this is true for most databases including SQL) being in the index rather than being taken out. This part will need testing by you; maybe the field should be last (?).
As to the order of the index, again I have just guessed. I have said DESC for all fields because your sort is DESC, but you will need to check this yourself with explain().
So that should be able to handle the master clause of your query. Now to deal with those $ors.
Each $or clause will use an index separately, and the MongoDB query optimizer will look for indexes for each clause as though they were separate queries altogether. A snag worth noting about compound indexes ( http://docs.mongodb.org/manual/core/indexes/#compound-indexes ) is that they work on prefixes ( an example note here: http://docs.mongodb.org/manual/core/indexes/#id5 ), so you can't make one single compound index that covers all three clauses. A more optimal way of declaring indexes for the $or (considering the bug above) is:
db.col.ensureIndex({'creator._id':1});
db.col.ensureIndex({'parent._id':1});
db.col.ensureIndex({'somerelation._id':1});
It should be able to get you started on making optimal indexes for your query.
I should stress however that you need to test this yourself.
MongoDB can use only one index per query, so I can't see a way to use indexes to query someid in your model.
So, the best approach is to add special field for this task:
ids = [creator._id, parent._id, somerelation._id]
In this case you'll be able to query without using $or operator:
db.things.find({
deleted: false,
type: 'thing',
ids: someid
}).sort({
'date.created': -1
})
In this case your index will look something like this:
{deleted:1, type:1, ids:1, 'date.created': -1}
If you have the flexibility to adjust the schema, I would suggest adding a new field, associatedIds : [ ], which would hold creator._id, parent._id and somerelation._id. You can update that field atomically when you update the corresponding main field, and you can then have a compound index on this field, type and created_date, which eliminates the need for $or in your query entirely.
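A sketch of how such a denormalized field could be maintained, in plain JavaScript. withAssociatedIds and buildQuery are hypothetical helper names, the numeric ids stand in for real ObjectIds, and the index spec simply mirrors the one proposed in the earlier answer with the field renamed.

```javascript
// Collect the three related ids into one array field so that a single
// condition can replace the three-way $or.
function withAssociatedIds(doc) {
  const associatedIds = [doc.creator, doc.parent, doc.somerelation]
    .filter((ref) => ref && ref._id !== undefined)
    .map((ref) => ref._id);
  return { ...doc, associatedIds };
}

// Filter without $or: matches if someid appears anywhere in the array.
function buildQuery(someid) {
  return { deleted: false, type: "thing", associatedIds: someid };
}

const thing = withAssociatedIds({
  type: "thing",
  deleted: false,
  creator: { _id: 1 },
  parent: { _id: 2 },
  somerelation: { _id: 3 },
});

// Compound index to pair with buildQuery() and a sort on date.created
// (an assumption to verify with explain() on real data).
const indexSpec = { deleted: 1, type: 1, associatedIds: 1, "date.created": -1 };
```

The array has to be rewritten whenever one of the three source fields changes, otherwise the query and the document drift apart.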
Given your indexing requirement, I would suggest using the $orderBy operator alongside your $or query. That is, you should be able to index the criteria in the $or expressions and then use $orderBy to sort the result.
For example:
db.things.find({
deleted: false,
type: 'thing',
$or: [{
'creator._id': someid
}, {
'parent._id': someid
}, {
'somerelation._id': someid
}]
},{$orderBy:{'date.created': -1}})
The above query would require compound indexes on each of the fields in the $or expressions combined with the sort object specified in the orderBy criteria.
for example:
db.things.ensureIndex({'parent._id': 1, "date.created": -1})
and so on for other fields.
It is a good practice to specify "limit" for the result to prevent mongodb from performing a huge in memory sort.
Read More on $orderBy operator here
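The per-clause compound indexes described above can be enumerated mechanically. The following plain JavaScript sketch derives one index spec per $or clause by appending the sort keys; indexSpecsForOrWithSort is a hypothetical helper, ascending order for the clause fields is an arbitrary choice, and whether the planner actually uses these indexes for the sort depends on the server version, so explain() is still needed.

```javascript
// For each $or clause, build an index spec made of the clause's fields
// followed by the sort fields.
function indexSpecsForOrWithSort(orClauses, sort) {
  return orClauses.map((clause) => {
    const spec = {};
    for (const field of Object.keys(clause)) spec[field] = 1; // ascending: arbitrary
    return { ...spec, ...sort };
  });
}

const specs = indexSpecsForOrWithSort(
  [
    { "creator._id": "someid" },
    { "parent._id": "someid" },
    { "somerelation._id": "someid" },
  ],
  { "date.created": -1 }
);
// Each spec could then be passed to db.things.ensureIndex(spec).
```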

Sorting on Multiple fields mongo DB

I have a query in mongo such that I want to give preference to the first field and then the second field.
Say I have to query such that
db.col.find({category: 'A'}).sort({updated: -1, rating: -1}).limit(10).explain()
So I created the following index
db.col.ensureIndex({category: 1, rating: -1, updated: -1})
It worked just fine, scanning only as many objects as needed, i.e. 10.
But now I need to query
db.col.find({category: { $ne: 'A'}}).sort({updated: -1, rating: -1}).limit(10)
So I created the following index:
db.col.ensureIndex({rating: -1, updated: -1})
but this leads to a scan of the whole collection, whereas when I create
db.col.ensureIndex({ updated: -1 ,rating: -1})
it scans fewer documents.
I just want to be clear about sorting on multiple fields and what order has to be preserved when doing so. From reading the MongoDB documentation, it's clear that the field we sort on should be the last field in the index. That is what I assumed in my $ne query above. Am I doing anything wrong?
The MongoDB query optimizer works by trying different plans to determine which approach works best for a given query. The winning plan for that query pattern is then cached for the next ~1,000 queries or until you do an explain().
To understand which query plans were considered, you should use explain(1), eg:
db.col.find({category:'A'}).sort({updated: -1}).explain(1)
The allPlans detail will show all plans that were compared.
If you run a query which is not very selective (for example, if many records match your criteria of {category: { $ne:'A'}}), it may be faster for MongoDB to find results using a BasicCursor (table scan) rather than matching against an index.
The order of fields in the query generally does not make a difference for the index selection (there are a few exceptions with range queries). The order of fields in a sort does affect the index selection. If your sort() criteria does not match the index order, the result data has to be re-sorted after the index is used (you should see scanAndOrder:true in the explain output if this happens).
It's also worth noting that MongoDB will only use one index per query (with the exception of $ors).
So if you are trying to optimize the query:
db.col.find({category:'A'}).sort({updated: -1, rating: -1})
You will want to include all three fields in the index:
db.col.ensureIndex({category: 1, updated: -1, rating: -1})
FYI, if you want to force a particular query to use an index (generally not needed or recommended), there is a hint() option you can try.
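The rule about sort order and index order can be expressed as a small check. This is a deliberately simplified model in plain JavaScript, not the query planner's actual logic: it assumes an index can return pre-sorted results when its keys are the equality-filter fields followed by the sort fields in the same order, with directions either all matching or all reversed. indexSupportsSort is a hypothetical helper.

```javascript
// Simplified model of "can this index satisfy the sort without re-sorting?"
function indexSupportsSort(indexSpec, equalityFields, sort) {
  const indexKeys = Object.entries(indexSpec);
  const sortKeys = Object.entries(sort);

  // Skip leading index fields that are pinned by equality conditions.
  let start = 0;
  while (start < indexKeys.length && equalityFields.includes(indexKeys[start][0])) {
    start++;
  }

  const rest = indexKeys.slice(start, start + sortKeys.length);
  if (rest.length < sortKeys.length) return false;

  const sameFields = rest.every(([field], i) => field === sortKeys[i][0]);
  const forward = rest.every(([, dir], i) => dir === sortKeys[i][1]);
  const backward = rest.every(([, dir], i) => dir === -sortKeys[i][1]);
  return sameFields && (forward || backward);
}

const sort = { updated: -1, rating: -1 };

// Index recommended above for the {category: 'A'} query.
const recommended = indexSupportsSort({ category: 1, updated: -1, rating: -1 }, ["category"], sort);
// Same fields, but rating before updated: does not match the sort order.
const swapped = indexSupportsSort({ category: 1, rating: -1, updated: -1 }, ["category"], sort);
// The two indexes tried for the $ne query (no equality fields to skip).
const ratingFirst = indexSupportsSort({ rating: -1, updated: -1 }, [], sort);
const updatedFirst = indexSupportsSort({ updated: -1, rating: -1 }, [], sort);
```

Under this model only the indexes whose trailing keys follow the sort order (updated, then rating) qualify, which is consistent with the {updated: -1, rating: -1} index scanning fewer documents in the question.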
That is true but there are two layers of ordering you have here since you are sorting on a compound index.
As you noticed when the first field of the index matches the first field of sort it worked and the index was seen. However when working the other way around it does not.
As such, by your own observations, the order that needs to be preserved is the query order of the fields from first to last. The mongo analyser can sometimes move fields around to match an index, but normally it will just try to match the first field; if it cannot, it will skip the index.
Try this code. It sorts the data by 'name' first and then, within each 'name', by 'filter':
var cursor = db.collection('vc').find({ "name" : { $in: [ /cpu/, /memo/ ] } }, { _id: 0, }).sort( { "name":1 , "filter": 1 } );
Sort and Index Use
MongoDB can obtain the results of a sort operation from an index which includes the sort fields. MongoDB may use multiple indexes to support a sort operation if the sort uses the same indexes as the query predicate. ... Sort operations that use an index often have better performance than blocking sorts.
db.restaurants.find().sort( { "borough": 1, "_id": 1 } )
more information :
https://docs.mongodb.com/manual/reference/method/cursor.sort/