I am making an application using pymongo wrapper for which my schema is like:
{
_id: <some_id>,
name: <some_name>,
my_tags: [<list_of_tags>]
}
Now I want to return those entries which falls under the user specified tags. For example,
I want to have entries where my_tags should be atleast ["college", "USA", "engineering"]. For that I read $all construct can be used. Now what I want to know is, would it be of any use making an index on my_tags. For my app, this type of queries are used extensively.
would it be of any use making an index on my_tags. For my app, this type of queries are used extensively.
Yes $all will use an index so it is still good to make one there however there are still optimisations that can be done for it: https://jira.mongodb.org/browse/SERVER-5331 and https://jira.mongodb.org/browse/SERVER-1000
Normally the docs will only warn you of when something can not use an index.
The syntax for the $all query is:
db.collection.find({'my_tags': {'$all': ['college', 'USA', 'engineering']}})
The documentation can be found at:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24all
Related
I'm new in mongoDB and I'm facing an issue about performance that need your help. I have a collection with 400k records, when not create index for any field on the collection it takes 20-30s for each query then I create indexs for fields that usually using for search query, but the problem is, when using $regex to search for a string field with index on it, mongoDB does not use index on that field, mongodb still scan for all records in that collection, I've searched on internet with this keyword: "index on regex fields mongodb" and I found some answers which say that "MongoDB use prefix of RegEx to lookup indexes" which means you have to use "^" prefix for the index to work like "db.users.find({name: /^key word/})", but that is not working for me, does "index on $regex field" need MongoDB Atlas to work? because i'm using comunity version of mongoDB. Thanks!
There's a lot to unpack here. We'll split the answer into two parts, the first to try and answer some of the direct questions about index usage and the second to explore solutions to satisfy the application requirements.
Index Usage with $regex
As is true with an index in any database that captures the full string value as the key, MongoDB can use the index for a $regex operation but its efficiency in doing so greatly depends on the regex being applied. That is what the Index Use documentation from the comments and the other answers you reference are describing.
In the comments you mention that an example query might be db.users.find({name: {$regex: '.*keyword.*', $options: 'i'}}). That means that the regex is a both unanchored and case-insensitive. The aforementioned doumentation states directly:
Case insensitive regular expression queries generally cannot use indexes effectively.
Why is this? because the substring that you are searching for can be found in any string value captured by the index. So the document with matching value {name: 'a keyword'} would be located at one end of the index, {name: 'keyWord' }, may be somewhere in the middle, and {name: 'Z keyword'} may be at the end. The only way to ensure correct results is for the database to scan the index for all string values. So while it is still using the index, it may not be efficient as most of the scanned values will not be match and will be discarded.
You may always use .explain() to better understand how the database is answering the query, such as if and how it is using an index.
Solutions
So what do we do about this?
Well as #rickhg12hs suggests in the comments, it depends on exactly what you are trying to achieve. You reiterate that that you are looking for 'full regex search capability', but that is really an approach/solution rather than a goal. If what you really need, for example, is just to match an exact string in a case insensitive manner, then something as simple as a case insensitive index would likely do the trick.
However if truly do wish to perform arbitrary substring searching, then you are really looking at search engine capabilities. In that situation your best bets would probably be to emulate their indexes directly in MongoDB (e.g. have the application manually tokenize the strings to be indexed), stand up something like Solr/Elasticsearch next to MongoDB, or use MongoDB's Atlas Search offering. The $text operator mentioned in the comment has limitations when it comes to substring searching (such as just part of a word), which may or may not be relevant for your needs.
I'm using the airbnb sample set and it has a field that looks like:
"amenities": ["TV", "Cable TV", "Wifi"....
So I'm trying to do a case-INsensitive, wildcard search (on one or more values passed in).
Only thing I've found that works is:
{ amenities: { $in: [ /wi/ ] }}
Is that the best way?
So I ran it in Compass as the dataset was imported (5600 docs), and the Explain says it took ~20ms on my machine and warned there was no index. I then created an index on the amenities column and the same search jumped up to ~100ms. I just created the index through the Compass UI, so not sure why its taking 5x as long with an index? Or if there is a better way to do this?
The way to run that query is:
{ amenities: /wi/i }
//better but not always useful
{ amenities: /wi/i }, { amenities:1, _id:0 }
It already traverses the array, and to be case insensitive it must be on the options.
For multikey indexes the second query won't be a covered query. Otherwise, it would be blazing fast.
I've tested a similar search with and without index though. Exec. time is reduced 10X. (1500ms to 150ms, in a huge collection). Measure with Mongo Hacker.
As you report executionTimeMilliseconds is not that different. But still smaller.
The reason why you don't see a huge decrease in time is because the index stores each array entry separately. When it finds a match, it comes back to collection to fetch the whole array field, instead of using the indexes.
Probably indexes aren't very useful for arrays.
When querying with an unanchored regex, the query executor will have to scan every index key to see if there is a match.
You might find a collated index to be helpful.
Create an index with the appropriate collation, like:
(strength 1 and 2 are case-insensitive)
db.collection.createIndex({amenities:1},{collation:{locale:"en",strength:1}})
Then query using the same collation:
db.collection.find({amenities:"wifi"}).collation({locale:"en",strength:1})
The search will be case insensitive, and it can efficiently use the index.
I have the following MongoDB collection.
[{A:1, B:2}, {A:1, B:1}]
Is there a way to use the property "B" something as the following?
db.myCollection.find({A: '$B'})
I read that there is an approach that calculates the diff of A and B, but what I really want to know is if I can reference document fields when matching documents. This is important for me to understand what I can do and what I cannot do.
I don't believe there is a quick way of doing this. If you want to do it, you can specify a custom $where though, for example:
db.myCollection.find({"$where": "this.A == this.B"})
As noted in the manual though, this will require running that Javascript code for every record in the collection, so this won't be the fastest query in the world.
I have a mongodb collection full of 65k+ documents, each one with a properties named site_histories. The value of it is an array that might be empty, or might not be. If it is not empty, it will have one or more objects similar to this:
"site_histories" : "[{\"site_id\":\"129373\",\"accepted\":\"1\",\"rejected\":\"0\",\"pending\":\"0\",\"user_id\":\"12743\"}]"
I need to make a query that will look for every instance in the collection of a document that does not have a given user_id.
I'm pretty new to Mongo, so I was trying to make a query that would find every instance that does have the given user_id, which I was then planning on adding a "$ne" to, but even that didn't work. This is the query I was using that didn't work:
db.test.find({site_histories: { $elemMatch: {user_id: '12743\' }}})
So can anyone tell me why this query didn't work? And can anyone help me format a query that will do what I need the final query to do?
If your site_histories really is an array, it should be as simple as doing:
db.test.find({"site_histories.user_id": "12743"})
That looks in all the elements of the array.
However, I'm a bit scared of all those backslashes. If site_histories is a string, that won't work. It would mean that the schema is poorly designed, you'd maybe try with $regex
I am not sure if it is possible, I'm just curious.
I have a region collection with the purpose of being loaded to a web dropdown. Not the big thing there.
{
_id:"some numeric id",
name:"region name",
}
and having the index created like db.regions.createIndex({name:1})
I tried both db.regions.find({},{_id:0}).sort(name:1) and without the sort.
Using the explain shows that the totalDocsExamined is greater than zero, so that means that it is not a covered query if I understood the concept.
For a covered query, you have to explicitly list each of the covered fields in the projection:
db.regions.find({}, {_id:0, name: 1}).sort({name:1})
If you only exclude _id, you're telling MongoDB to include all fields besides _id. You know that all your docs only have a name field, but MongoDB doesn't so you have to be explicit.