I'm using the airbnb sample set and it has a field that looks like:
"amenities": ["TV", "Cable TV", "Wifi"....
So I'm trying to do a case-INsensitive, wildcard search (on one or more values passed in).
Only thing I've found that works is:
{ amenities: { $in: [ /wi/ ] }}
Is that the best way?
So I ran it in Compass as the dataset was imported (5600 docs), and the Explain says it took ~20ms on my machine and warned there was no index. I then created an index on the amenities column and the same search jumped up to ~100ms. I just created the index through the Compass UI, so not sure why its taking 5x as long with an index? Or if there is a better way to do this?
The way to run that query is:
{ amenities: /wi/i }
//better but not always useful
{ amenities: /wi/i }, { amenities:1, _id:0 }
It already traverses the array, and to be case insensitive it must be on the options.
For multikey indexes the second query won't be a covered query. Otherwise, it would be blazing fast.
I've tested a similar search with and without index though. Exec. time is reduced 10X. (1500ms to 150ms, in a huge collection). Measure with Mongo Hacker.
As you report executionTimeMilliseconds is not that different. But still smaller.
The reason why you don't see a huge decrease in time is because the index stores each array entry separately. When it finds a match, it comes back to collection to fetch the whole array field, instead of using the indexes.
Probably indexes aren't very useful for arrays.
When querying with an unanchored regex, the query executor will have to scan every index key to see if there is a match.
You might find a collated index to be helpful.
Create an index with the appropriate collation, like:
(strength 1 and 2 are case-insensitive)
db.collection.createIndex({amenities:1},{collation:{locale:"en",strength:1}})
Then query using the same collation:
db.collection.find({amenities:"wifi"}).collation({locale:"en",strength:1})
The search will be case insensitive, and it can efficiently use the index.
Related
I have a collection of around 200m documents. I need to build a search for it that looks for substrings. Using a regex search is incredibly slow even with a regular index on the field being searched for.
The answer seems to be a text index but there is only one text index allowed per collection. I can make the text index search multiple fields but that will actually break intended functionality as it will make results inaccurate. I need to specify the exact fields the substring should appear in.
Are there any ways around this limitation? The documentation says their cloud databases allow multiple indexes but for this project I need to keep data on our own servers.
Yea even if you index your field, it will still go for a collection scan if you use regex search. And you can only have text index on a single field. Also this text index is based on words not sub-strings, so text index would not do anything.
These indices including text index is basically pre sorting the documents according to the indexed field in alphabetical order(or reverse). For text field, it is very similar, but a little better because it indexes each word of the selected field. But in your case since you are searching for substrings, text index would be equally useless.
To solve your problem, typically you would have to go for another dedicated database such as ElasticSearch.
Fortunately, MongoDB Atlas released Atlas search index recently and it should solve your problem. You can index multiple(or all) fields and it can also search sub-strings. Its basically a "search engine". Just like ElasticSearch, it is based on the popular open source search engine, Lucene. After you apply Atlas search index you can use aggregate with $search pipeline.
But in order to use this feature, you need to use MongoDB Atlas. As far as I know you can only create this search index in MongoDB Atlas. Once you have MongoDB Atlas setup, applying and using this search feature is straight forward. You can go to MongoDB Atlas, then to your collection and apply this search index with few clicks. You can fine tune it(check the docs) but you can start with the default settings.
Using it in your backend is very simple(from docs):
db.articles.aggregate(
[
{ $match: { $text: { $search: "cake" } } },
{ $group: { _id: null, views: { $sum: "$views" } } }
]
)
When I create an Index on a string-type field in MongoDB I get no significant speed boost from it. In fact, when I use the query:
db.movies.find({plot: /some text/}).explain("executionStats")
An Index is slowing down the query by 30-50% in my Database (~55k Docs).
I know, that I can use a "text" Index, which is working fine for me, but I was wondering, why you would create a "normal" Index on a string field.
Index on string fields will improve the performance of exact matches like,
db.movies.find({name: "some movie"})
Indexes will also be used for find queries with prefix expression,
db.movies.find({plot: /^begins with/})
After having read the official documentations on indexes, sort, intersection, i'm a little bit confuse on how everything work together.
I've trouble making my query use the indexes i've created. I work on a mongodb 3.0.3, on a collection having ~4millions of document.
To simplify, let's say my document is composed of 6 fields:
{
a:<text>,
b:<boolean>,
c:<text>,
d:<boolean>,
e:<date>,
f:<date>
}
The query I want to achieve is the following :
db.mycoll.find({ a:"OK", b:true, c:"ProviderA", d:true, e:{ $gte:ISODate("2016-10-28T12:00:01Z"),$lt:ISODate("2016-10-28T12:00:02") } }).sort({f:1});
So intuitively I've created two indexes
db.mycoll.createIndex({a: 1, b: 1, c: 1, d:1, e:1 }, {background: true,name: "test1"})
db.mycoll.createIndex({f:1}, {background: true,name: "test2"})
But the explain() give me that the first index is not used at all.
I known there is some kind of limitation when there is ranges in play in the filter (in the e field), but I can't find my way around it.
Also instead of having a single index on f, I try a compound index on {e:1,f:1} but it didn't change anything.
So What I have misunderstood?
Thanks for your support.
Update: also I find some time the following predicate for mongodb 2.6 :
A good rule of thumb for queries with sort is to order the indexed fields in this order:
First, the field(s) on which you will query for exact values.
Second, the field(s) on which you will sort.
Finally, field(s) on which you will query for a range of values (e.g., $gt, $lt, $in)
An example of using this rule of thumb is in the section on “Sorting the results of a complex query on a range of values” below, including a link to further reading.
Does this also apply for 3.X version?
Update 2: following above predicate, I created the following index
db.mycoll.createIndex({a: 1, b: 1, c: 1, d:1 , f:1, e:1}, {background: true,name: "test1"})
And for the same query :
db.mycoll.find({ a:"OK", b:true, c:"ProviderA", d:true, e:{ $gte:ISODate("2016-10-28T12:00:01Z"),$lt:ISODate("2016-10-28T12:00:02") } }).sort({f:1});
the index is indeed used. However too much keys seems to be scan, I may need to find a better order the fields in the query/index.
Mongo acts sometimes a bit strange when it comes to the index selection.
Mongo automagically decides what index to use. The smaller an index is the more likely it is used (especially indexes with only one field) - this is my experience. May be this happens because it is more often already loaded in RAM? To find out what index to use when Mongo performs test queries when it is idle. However the result is sometimes unexpected.
Therefore if you know what index to use you can force a query to use a specific index using the $hint option. You should try that.
Your two indexes used in the query and the sort does not overlap so MongoDB can not use them for index intersection:
Index intersection does not apply when the sort() operation requires an index completely separate from the query predicate.
I have a mongodb database, which has following fields:
{"word":"ipad", "date":20140113, "docid": 324, "score": 98}
which is a reverse index for a log of docs(about 120 millions).
there are two kinds of queries in my system:
one of which is :
db.index.find({"word":"ipad", "date":20140113}).sort({"score":-1})
this query fetch the word "ipad" in date 20140113, and sort the all docs by score.
another query is:
db.index.find({"word":"ipad", "date":20140113, "docid":324})
to speed up these two kinds of query, what index should I build?
Should I build two indexes like this?:
db.index.ensureIndex({"word":1, "date":1, "docid":1}, {"unique":true})
db.index.ensureIndex({"word":1, "date":1, "score":1}
but I think build the two index use two much hard disk space.
So do you have some good ideas?
You are sorting by score descending (.sort({"score":-1})), which means that your index should also be descending on the score-field so it can support the sorting:
db.index.ensureIndex({"word":1, "date":1, "score":-1});
The other index looks good to speed up that query, but you still might want to confirm that by running the query in the mongo shell followed with .explain().
Indexes are always a tradeoff of space and write-performance for read-performance. When you can't afford the space, you can't have the index and have to deal with it. But usually the write-performance is the larger concern, because drive space is usually cheap.
But maybe you could save one of the three indexes you have. "Wait, three indexes?" Yes, keep in mind that every collection must have an unique index on the _id field which is created implicitely when the collection is initialized.
But the _id field doesn't have to be an auto-generated ObjectId. It can be anything you want. When you have another index with an uniqueness-constraint and you have no use for the _id field, you can move that unique-constraint to the _id field to save an index. Your documents would then look like this:
{ _id: {
"word":"ipad",
"date":20140113,
"docid": 324
},
"score": 98
}
I am running a query like the following:
db.sales.find({location: "LLLA",
created_at: {$gt: ISODate('2012-07-27T00:00:00Z')}}).
sort({created_at: -1}).limit(3)
This query is working as expected, but its not performing fast enough. I have an index on the location field, and a separate index on the created_at field. Please let me know if you see me missing something obvious here to make it faster.
Thanks for your time on this.
Mongo won't use two indexes for a given query, if you want filter by location and sort by created_at you can add a compound index:
db.sales.ensureIndex({location: 1, created_at: -1})
You should run explain on your queries when you have issues like that. Sort in particular causes non-intuitive complications with indexing, and the explain command can occasionally make it obvious why.
It's also worth noting that you can usually sort by _id to approximate insert time (assuming auto generated ObjectId values), but that won't help you as much as a compound index will in this case since you're filtering on an additional field.
just use IDs:
db.sales.find({location: "LLLA"}).sort({_id: -1}).limit(3)