MongoDB simplifying double $elemMatch query - mongodb

I am still fairly new to Mongo queries. Is it possible, or, with regard to performance, necessary to put the following MongoDB query into a smarter form? Does the double use of $elemMatch affect performance?
Example for a db full of chicken-coops:
{chickens: {$elemMatch: {recentlyDroppedEggs: {$elemMatch:{appearance:"red-blue-striped"}}}}}
for finding all chicken coops that have a chicken (in its chickens-array) which recently dropped a red-blue-striped egg (into its recentlyDroppedEggs-array).
Thanks for any hints!

No, you don't need $elemMatch for that. You could just use:
{'chickens.recentlyDroppedEggs.appearance': 'red-blue-striped'}
$elemMatch is typically only needed when you want to match multiple fields in an array element or apply multiple operators to a single field (e.g. $lt and $gt).
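To illustrate the difference, here is a minimal plain-JavaScript sketch that imitates the two matching semantics in memory, without a MongoDB server. The fields `breed` and `age` and the sample coops are hypothetical and not part of the original question.

```javascript
// Hypothetical sample data: two coops, each with an array of chickens.
const coops = [
  { _id: 1, chickens: [{ breed: 'leghorn', age: 1 }, { breed: 'silkie', age: 5 }] },
  { _id: 2, chickens: [{ breed: 'silkie', age: 1 }] },
];

// Roughly what {'chickens.breed': 'silkie', 'chickens.age': {$lt: 2}} means:
// each condition may be satisfied by a DIFFERENT chicken in the array.
const anyElementIds = coops
  .filter(c => c.chickens.some(k => k.breed === 'silkie') &&
               c.chickens.some(k => k.age < 2))
  .map(c => c._id);

// Roughly what {chickens: {$elemMatch: {breed: 'silkie', age: {$lt: 2}}}} means:
// both conditions must hold for the SAME chicken.
const sameElementIds = coops
  .filter(c => c.chickens.some(k => k.breed === 'silkie' && k.age < 2))
  .map(c => c._id);

console.log(anyElementIds, sameElementIds);
```

With a single condition, as in the query from the question, both forms select the same coops, which is why the dot notation is enough there.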

Related

mongodb index on regex fields not working

I'm new to MongoDB and I'm facing a performance issue that I need your help with. I have a collection with 400k records. Without an index on any field, each query takes 20-30s, so I created indexes on the fields that are usually used in search queries. The problem is that when I use $regex to search a string field that has an index on it, MongoDB does not use the index and still scans all records in the collection. I searched the internet for "index on regex fields mongodb" and found some answers which say that "MongoDB use prefix of RegEx to lookup indexes", which means you have to use the "^" prefix for the index to work, like "db.users.find({name: /^key word/})", but that is not working for me. Does "index on $regex field" need MongoDB Atlas to work? I ask because I'm using the community version of MongoDB. Thanks!
There's a lot to unpack here. We'll split the answer into two parts, the first to try and answer some of the direct questions about index usage and the second to explore solutions to satisfy the application requirements.
Index Usage with $regex
As is true with an index in any database that captures the full string value as the key, MongoDB can use the index for a $regex operation but its efficiency in doing so greatly depends on the regex being applied. That is what the Index Use documentation from the comments and the other answers you reference are describing.
In the comments you mention that an example query might be db.users.find({name: {$regex: '.*keyword.*', $options: 'i'}}). That means that the regex is both unanchored and case-insensitive. The aforementioned documentation states directly:
Case insensitive regular expression queries generally cannot use indexes effectively.
Why is this? Because the substring that you are searching for can be found in any string value captured by the index. The document with matching value {name: 'a keyword'} may be located in one part of the index, {name: 'keyWord'} in another, and {name: 'Z keyword'} in yet another. The only way to ensure correct results is for the database to scan the index for all string values. So while it is still using the index, it may not be efficient, as most of the scanned values will not match and will be discarded.
You may always use .explain() to better understand how the database is answering the query, such as if and how it is using an index.
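The effect can be imitated with a sorted array standing in for the index. This is only a rough sketch of the idea, not how MongoDB implements its B-tree index; the key values and the helper names `prefixScan` and `fullScan` are made up for illustration.

```javascript
// A sorted array stands in for the index keys on `name` (hypothetical data).
const indexKeys = ['Z keyword', 'a keyword', 'alice', 'bob', 'keyWord', 'keyword', 'zed'].sort();

// Anchored, case-sensitive prefix (like /^key/): binary-search to the first
// key >= prefix, then walk forward only while the prefix still matches.
function prefixScan(keys, prefix) {
  let lo = 0, hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < prefix) lo = mid + 1; else hi = mid;
  }
  const matches = [];
  let examined = 0;
  for (let i = lo; i < keys.length; i++) {
    examined++;
    if (!keys[i].startsWith(prefix)) break;
    matches.push(keys[i]);
  }
  return { matches, examined };
}

// Unanchored, case-insensitive (like /keyword/i): matches can sit anywhere
// in the sort order, so every key has to be examined.
function fullScan(keys, regex) {
  let examined = 0;
  const matches = keys.filter(k => { examined++; return regex.test(k); });
  return { matches, examined };
}

const anchored = prefixScan(indexKeys, 'key');
const unanchored = fullScan(indexKeys, /keyword/i);
console.log(anchored.examined, unanchored.examined);
```

In this toy data the anchored search examines 3 of the 7 keys, while the unanchored search examines all 7, and the gap grows with the size of the collection.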
Solutions
So what do we do about this?
Well, as @rickhg12hs suggests in the comments, it depends on exactly what you are trying to achieve. You reiterate that you are looking for 'full regex search capability', but that is really an approach/solution rather than a goal. If what you really need, for example, is just to match an exact string in a case insensitive manner, then something as simple as a case insensitive index would likely do the trick.
However, if you truly do wish to perform arbitrary substring searching, then you are really looking at search engine capabilities. In that situation your best bets would probably be to emulate their indexes directly in MongoDB (e.g. have the application manually tokenize the strings to be indexed), stand up something like Solr/Elasticsearch next to MongoDB, or use MongoDB's Atlas Search offering. The $text operator mentioned in the comment has limitations when it comes to substring searching (such as just part of a word), which may or may not be relevant for your needs.
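As a rough sketch of the "manually tokenize" option, the application could store the lowercased substrings of a value in an array field and index that field. The function name `substringTokens`, the field name `nameTokens`, and the minimum token length are assumptions made for illustration only.

```javascript
// Build the set of lowercased substrings of `value` that are at least
// `minLength` characters long. The application would store the result in an
// array field (here hypothetically called `nameTokens`) with a regular index.
function substringTokens(value, minLength = 3) {
  const text = value.toLowerCase();
  const tokens = new Set();
  for (let start = 0; start < text.length; start++) {
    for (let end = start + minLength; end <= text.length; end++) {
      tokens.add(text.slice(start, end));
    }
  }
  return [...tokens];
}

const tokens = substringTokens('Keys');
// A substring search then becomes an equality match on the token array,
// e.g. a filter such as { nameTokens: 'eys' } with the search term lowercased.
console.log(tokens);
```

The number of tokens grows quadratically with the string length, so this only suits short fields; for longer text a dedicated search engine is likely the better fit.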

MongoDB Query Nested Array Search

I need to query documents with mongoDb that contain nested arrays. I see a lot of examples using the simple $in operator. The only problem is that I strictly need to check for proper subsets.
Consider the following document.
{data: [[1,2,3], [4,5,6]]}
The query needs to be able to get documents with all of [1,2,3] where 1,2,3 can be in any order, which rules out the following query, because it will only match in the correct order.
{data:{$elemMatch:{$all:[[1,2,3]]}}}
I've also tried nested $elemMatch operators with no success, because the $in operator will return the document even if only one element matches such as the following.
{data:{$elemMatch:{$elemMatch:{$in:[1,4]}}}}
Not sure what your actual query looks like, but this should do what you need:
db.documentDto.find({"some_field":{"$elemMatch":{"$in":[1,2,3]}} })
I haven't got a complete answer (and not much time as it's late here) but I would consider
Using the aggregation pipeline instead of a query if you're not already
Use $unwind operator to deconstruct your nested arrays
Use $sort to sort the contents of the arrays - so you can now compare
Use $match to filter out the arrays which don't fit the array subset values as you can now check based on order.
Use $group to group the result back together based on the _id value
Ref:
http://docs.mongodb.org/manual/reference/operator/aggregation-pipeline/ will give you info on each of the above.
From a quick search I came up with a similar question/example that might be helpful: Mongodb sort inner array
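The check that the steps above are trying to express can be written down in plain JavaScript, which may help when validating whatever pipeline you end up with. This is client-side logic run on already-fetched documents, not a MongoDB query, and the sample documents and the name `hasInnerArrayWithAll` are hypothetical.

```javascript
// True when at least one inner array of doc.data contains every required
// value, regardless of the order of the values inside that inner array.
function hasInnerArrayWithAll(doc, required) {
  return doc.data.some(inner => required.every(value => inner.includes(value)));
}

// Hypothetical sample documents shaped like the one in the question.
const docs = [
  { _id: 1, data: [[3, 1, 2], [4, 5, 6]] }, // [3,1,2] holds 1, 2 and 3
  { _id: 2, data: [[1, 4], [2, 3]] },       // values are split across arrays
  { _id: 3, data: [[1, 2, 3, 7]] },         // superset of the required values
];

const matchingIds = docs
  .filter(d => hasInnerArrayWithAll(d, [1, 2, 3]))
  .map(d => d._id);
console.log(matchingIds);
```

Note that an inner array with extra values (document 3) also passes this check; if only exact sets should match, the inner array length would have to be compared as well.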

how to find quickly all mongo documents having at least one element in an array

I have the following query
db.runCommand(
{"text":"item","search":"\"price\" ",
"project":{"_id":1},
"limit":1,
"filter":{"quotes":{"$not":{"$size":0}}}}
);
But the filter part is taking a long time. For your understanding, "quotes" is a simple array of embedded documents. Is it possible to create an index to find all elements having at least one quote quickly?
EDIT:
To be more specific: The question is not only about "how to query" but "how to make a useful index".
I think the quicker way is this one:
db.collection.find({array: {$elemMatch: {$exists: true}}})
The negation operators like $not and $nin generally perform slower. Can you check if the below query performs better and meets your needs?
db.collection.runCommand("text", {"search":"\"price\" ",
"project":{"_id":1},
"limit":1,
"filter":{"quotes.0":{"$exists":true}}}
);

How to check what index I use?

Hi, how can I check which index is used, and the number of scanned objects, in an aggregate query? Something similar to
db.collection.find().explain() ?
Right now, there is no explain functionality for aggregate() yet. However, in general indexes are only used for certain operators if they are the first element in the aggregation operator pipeline. For example, $match and $geoNear.
So in order to figure out which index is being used, simply run the explain() on a find() where the query matches your first $match options.
explain() functionality for aggregate() is an issue in JIRA: https://jira.mongodb.org/browse/SERVER-4504 — I would suggest you vote for the issue on JIRA as well.

Replace the use of $where in MongoDB

I have a question. I need to check whether the division of two fields is greater than or equal to some value. The solution I've found is:
{
"$where": "this.total / this.limit > 0.6"
}
But the docs say that the use of $where isn't good for performance, because it will run a JavaScript function and lose the indexes.
Does anyone have a better solution for this that doesn't use $where?
Thanks !
You could move to the aggregation framework here using divide ( http://docs.mongodb.org/manual/reference/aggregation/#_S_divide ):
db.col.aggregate([
{$match: {_id: whatever_or_whatever_clause_you_want}},
{$project: {
// Your document fields
divided_limit: {$divide: ['$total', '$limit']}
}},
{$match: {divided_limit: {$gt: 0.6}}}
]);
Note: the aggregation framework is new since v2.1
But the docs says that the use of $where isn't good for performance, because it will run a javascript function, and lose the indexes
You can't use an index for this anyway; there is no way to use an index for mathematical functions like this. The key thing is that the JS engine is up to 16X slower than normal querying, so the aggregation framework should be faster. Not only that, but the JS lock is global for all queries.
Of course, the fastest method is to pre-compute this ratio into another field whenever your application modifies the record. Then you won't need any of this, just a normal query.
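A minimal sketch of that pre-computation, assuming the application builds its own update documents. The helper name `buildUsageUpdate` and the field name `ratio` are hypothetical; only `total` and `limit` come from the question.

```javascript
// Build an update document that stores the ratio alongside the raw fields.
// `ratio` is a hypothetical field name; a zero limit is stored as null so
// that no Infinity or NaN value ends up in the collection.
function buildUsageUpdate(total, limit) {
  const ratio = limit === 0 ? null : total / limit;
  return { $set: { total, limit, ratio } };
}

const update = buildUsageUpdate(45, 60);

// With the ratio stored (and optionally indexed), the $where clause can be
// replaced by an ordinary range filter:
const filter = { ratio: { $gt: 0.6 } };
console.log(update, filter);
```

The update document would be passed to whatever update call the application already uses, so the ratio stays in sync as long as every write to `total` or `limit` goes through this helper.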