(Boolean logic) What is a workaround for duplicate keys when working with boolean logic in MongoDB?

When I'm trying to query for several categories in a database, such as:
Example: {
"AttributeA": type,
"AttributeB": type,
"AttributeC": type,
"AttributeD": type
etc...
}
And say that I'm trying to query for samples that have (AttributeA matching criteria1 or AttributeB matching criteria2) and (AttributeC matching criteria3, 4, 5, or 6). How would we compose the .find() method in MongoDB to match without error or complicated boolean algebra (distributing out the "and")?
I've tried using notation such as:
db.Example.find({
$or:[{"AttributeA" : criteria1, "AttributeB" : criteria2}],
$or:[{"AttributeC" : criteria3, "AttributeC" : criteria4, ...}]
})
in an attempt to write a simple query satisfying the conditions above, but in some cases only one condition of the query is satisfied (with a "duplicate key '$or'" warning before compiling), and on other platforms this simply raises an error along the lines of the warning.

If I understood the question correctly, I think you are looking for a query predicate such as:
{
$or: [
{
AttributeA: "criteria1"
},
{
AttributeB: "criteria2"
}
],
AttributeC: {
$in: [
"criteria3",
"criteria4",
"criteria5"
]
}
}
See how it works on this playground example
How would we compose the .find() method in MongoDB to match without ... complicated boolean algebra (distributing out the "and")?
By default, MongoDB applies an implicit $and to query predicates at the same level. An explicit $and operator is also available, but it is not needed in your situation; it only becomes necessary when the same key (for example, two $or clauses) has to appear more than once at the same level. Here you can let the database combine the conditions that are logically ANDed for you.
AttributeC matching criteria3, 4, 5, or 6
This is a perfect use for the $in operator, which I've used in the query above as:
AttributeC: {
$in: [ "criteria3", "criteria4", "criteria5" ]
}
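The "duplicate key '$or'" warning from the question comes from JavaScript itself rather than from MongoDB: an object literal holds only one value per key, so the second $or silently replaces the first. Here is a minimal sketch in plain JavaScript (no database needed; the criteria strings are the placeholders from the question), which also shows the explicit $and form for cases where two $or clauses really are required:

```javascript
// Two $or keys in one literal: the later one overwrites the earlier one.
const collapsed = {
  $or: [{ AttributeA: "criteria1" }, { AttributeB: "criteria2" }],
  $or: [{ AttributeC: "criteria3" }, { AttributeC: "criteria4" }]
};

// Wrapping each $or in an explicit $and keeps both clauses.
const filter = {
  $and: [
    { $or: [{ AttributeA: "criteria1" }, { AttributeB: "criteria2" }] },
    { $or: [{ AttributeC: "criteria3" }, { AttributeC: "criteria4" }] }
  ]
};

// In the shell, the filter would be passed as db.Example.find(filter).
console.log(Object.keys(collapsed).length, filter.$and.length);
```

For the query in the question the $in form above is still the shorter option; the $and wrapper is only worth it when both groups are genuinely alternatives across different fields.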

Related

Difference between aggregation and operator

I have been reading up on some mongodb documentation and ran across some confusing terminology, namely how to differentiate when a symbol will be used as an aggregate function or as an operator.
For example, the $size function either calculates the number of items in an array, or checks if the number of elements in an array is equal to a number, is there any way to know what the function will do at what time? Through trial and error I discovered that the $size will throw an error unless a number is passed to it in the $match step, but is there some rule/guideline so I can know what it will do beforehand?
db.collection.aggregate([
{
$project: {
key: 1,
number: {
$size: "$key"
}
}
},
{
$match: {
key: {
$size: 1
}
}
}
])
For querying data in MongoDB you can use the find method or the aggregate method. There are operators which you can use with these methods.
These query operators are used with the find method. Some of these can also be used with the $match stage of the aggregate method (details later in the post).
These aggregation pipeline operators are used within the aggregate's stages. Some of these can also be used with the find method (details later in the post).
You will notice that some operator names are common to both lists, for example $eq, $gte, $or, $type, and $size. Their usage and functionality can differ, though: the $eq operator does the same thing in both contexts but with a different syntax, while the $type operator differs in both functionality and syntax.
And, some of these operators can be used with both the methods.
Some Usage Scenarios:
Let's consider a users collection and some queries:
{ "_id" : 1, "age" : 21, "firstname" : "John" }
{ "_id" : 2, "age" : 18, "firstname" : "John" }
{ "_id" : 3, "age" : "39", "firstname" : "Johnson" }
The query:
db.users.find( { firstname: { $eq: "John"}, age: { $gt: 20 } } )
This query's filter is the same as { firstname: "John", age: { $gt: 20 } }. It uses the query operators $eq and $gt. The same query can be written in the aggregate method's $match stage:
db.users.aggregate([
{ $match: { firstname: "John", age: { $gt: 20 } } },
])
The operators used in this case are the same query operators. The query comparison operators can be used with the aggregate method's $match and $lookup stages.
Another Scenario:
db.users.aggregate([
{ $project: { ageGreaterThan20: { $gt: [ "$age", 20 ] } } },
])
This is the usage of aggregation comparison operator $gt. Note this is used within the aggregation query, but within the $project stage.
Just as you used query operators in the aggregation query, you can also use aggregation operators within the find method, but they must be wrapped in the "special" $expr operator. For example:
db.users.find( { $expr: { $gt: [ "$age", 20 ] } } )
The advantage of $expr is that a number of aggregation operators become usable within find queries. For example, using $strLenCP:
db.users.find( { $expr: { $gt: [ { $strLenCP: "$firstname" }, 4 ] } } )
You can also use the $expr within the aggregation, within the $match or $lookup stages:
db.users.aggregate([
{ $match: { $expr: { $gt: [ "$age", 20 ] } } },
])
Finally:
I have been reading up on some mongodb documentation and ran across
some confusing terminology, namely how to differentiate when a symbol
will be used as an aggregate function or as an operator. ... but is
there some rule/guideline so I can know what it will do beforehand?
Reading helps, practicing helps better and experience helps the best. You use a specific operator in a specific scenario or use case. To achieve some functionality you use an appropriate method and operator(s).
Reference: Operators
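To tie this back to the $size example in the question, here is a rough sketch of the two meanings in plain JavaScript. The functions matchesSize and sizeExpression are local emulations written for this illustration, not MongoDB APIs, and the sample document is hypothetical. The rule of thumb it illustrates: in a query position the operator takes a number and answers yes or no, while in an expression position it takes a field path and returns a value.

```javascript
// Local emulations for illustration only; these are not MongoDB APIs.

// Query-operator form, as in { key: { $size: 1 } }: a yes/no test on array length.
function matchesSize(doc, field, n) {
  return Array.isArray(doc[field]) && doc[field].length === n;
}

// Aggregation-expression form, as in { $size: "$key" }: returns the length itself.
function sizeExpression(doc, field) {
  return doc[field].length;
}

const sample = { key: ["a", "b"] }; // hypothetical document

// The same two shapes written as they would appear in a pipeline.
const pipeline = [
  { $project: { key: 1, number: { $size: "$key" } } }, // expression: takes a field path
  { $match: { key: { $size: 2 } } }                    // query operator: takes a number
];

console.log(matchesSize(sample, "key", 2), sizeExpression(sample, "key"));
```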
As per my understanding, I'm pointing out a few things:
Difference between aggregation and operator
You have certain basic functions for CRUD operations on MongoDB, such as .find(), .insert(), .update(), or .remove() (there are a few others like .count() and .distinct(), but those are the primary ones).
Versus
Aggregation is a whole framework, heavily used for complex reads; only two aggregation stages are capable of writes: $out and $merge.
Operators :
There are different types of operators :
Query and Projection Operators: These operators are crucial and are used to filter docs and to transform fields of docs in the response. They are usually used in the filter and project parts of .find(filter, project) or .update(filter, update, options), etc. They are also used in the $match stage of an aggregation pipeline ($match is similar to the filter in .find()). Ex.: $and, $or, $in & more.
Update Operators : By name, these operators help to update documents in the collection. Usually used in update part of .update() or .findOneAndUpdate() etc.. Ex. :- $set, $unset, $inc & more.
Aggregation :
When it comes to aggregation they call it aggregation framework, definitely for a reason as you can do a lot of things with data using aggregation.
Aggregation has aggregation pipeline , which has syntax .aggregate([]). pipeline is an array with stages. Each stage in aggregation does certain operation on data flowing through them. Ex. :- $match, $project, $group etc..
As we know, each document is independent on its own; most aggregation stages operate independently on each doc flowing through them.
Aggregation pipeline Operators :
These operators are generally used to achieve what you're looking for, certain operators can't be used in conjunction with other operators or in a stage.
Let's say in $match stage you would mostly use Query operators but not Aggregation operators as aggregation operators can't directly be used in $match in contrast with $project or $addFields stages.
Example with Stages & Operators :
Let's say you got to make Smoothie :
Out of a bunch of groceries you would filter the needed fruits (docs in the fruits collection) by matching against what you want to blend using the $match stage ($and to match fruits/veggies, $lt to keep only fruits that haven't expired, and $size to limit the number of fruits needed).
You would peel off the skin or chop the fruits into pieces to keep just the useful parts of the fruits (docs) using $project.
You would group all of the fruits into a blender using $group & add ingredients like cream & sugar using $addFields - You're too cautious about sugar quantity so you'll use operators like $size to check size and $multiply no.of nutrients based on conditions ($cond) you would $divide both sugar/nutrients to count nutrition value.
You'll iterate on adding ice pieces again and again using $map, $filter to remove un crushed ice pieces.
uhh, you always forget to add veggies (A different collection needs to be merged) based on fruits that already got blended you use $lookup to lookup for matching veggies(docs) & either blend in again with group or just put it on top.
Finally either you drink it from blender (just return the docs no more stages) or take into glass using $out or $merge stage (Remember as I said, only two stages that can write to a collection).
Of course every stage is optional: you can eat the fruits as they are instead of blending (get all docs and all their fields), but you wouldn't do that (for many reasons, like performance and unnecessary data flowing through the network) unless it's a pre-prepared product (like a small configuration collection with limited data and few docs, where every doc is needed and can easily be retrieved in one DB call).
Note :
Usually you can't use aggregation operators in the filter part, i.e. .find(filter), or in the $match stage unless you use $expr.
Starting with MongoDB v4.2 you can run an aggregation pipeline in an update (update-with-an-aggregation-pipeline), where you can take advantage of aggregation stages/operators in the update part.
When you search MongoDB's documentation you will often find the same operator in multiple places. If you search for $size, for example, you will find several references: as a query operator, as a projection operator, or as an aggregation operator. Depending on where you want to use it, refer to the matching documentation page, because although the names are the same and the behaviour is similar, the functionality may differ and the syntax differs too.
You can always learn MongoDB for free at MongoDB University.

MongoDB $pop element which is in 3 time nested array

Here is the data structure for each document in the collection. The data structure is fixed.
{
'_id': 'some-timestamp',
'RESULT': [
{
'NUMERATION': [ // numeration of divisions
{
// numeration of producttypes
'DIVISIONX': [{'PRODUCTTYPE': 'product xy', COUNT: 100}]
}
]
}
]
}
The query result should be in the same structure but only contain producttypes matching a regular expression.
I tried using a nested $elemMatch operator, but this doesn't get me any closer. I don't know how I can iterate over each value in the producttypes array for each division.
How can I do that? Then I could apply $pop, $in and $each.
I looked at:
Querying an array of arrays in MongoDB
https://docs.mongodb.com/manual/reference/operator/update/each/
https://docs.mongodb.com/manual/reference/operator/update/pop/
... and more
The solution I want to avoid is writing something like this:
collection.find().forEach(function(x) { /* more for eaches */ })
Edit:
Here is an example document to copy:
{"_id":"5ab550d7e85d5930b0879cbe","RESULT":[{"NUMERATION":[{"DIVISION":[{"PRODUCTTYPE":"Book","COUNT":10},{"PRODUCTTYPE":"Giftcard","COUNT":"300"}]}]}]}
E.g. the query result should only return the entry with the giftcard:
{"_id":"5ab550d7e85d5930b0879cbe","RESULT":[{"NUMERATION":[{"DIVISION":[{"PRODUCTTYPE":"Giftcard","COUNT":"300"}]}]}]}
Using the forEach approach the result is in the correct format. I'm still looking for a better way which does not involve the use of that function - therefore I will not mark this as an answer.
But for now this works fine:
db.collection.find().forEach(
function(wholeDocument) {
wholeDocument['RESULT'].forEach(function (resultEntry) {
resultEntry['NUMERATION'].forEach(function (numerationEntry) {
// filter() builds a new array; calling splice() inside forEach() would skip the element after each removal
// example condition (will be replaced by regular expression evaluation)
numerationEntry['DIVISION'] = numerationEntry['DIVISION'].filter(function (divisionEntry) {
return divisionEntry['PRODUCTTYPE'] == 'Giftcard';
});
})
})
printjson(wholeDocument);
}
)
UPDATE
Thanks to Rahul Raj's comments I have read up on aggregation with the $redact stage. A prototype of the solution to the issue is this query:
db.getCollection('DeepStructure').aggregate( [
{ $redact: {
$cond: {
// levels without a PRODUCTTYPE field default to "Giftcard" so they are kept and descended into
if: { $eq: [ { $ifNull: [ "$PRODUCTTYPE", "Giftcard" ] }, "Giftcard" ] },
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}
]
)
I assume you're trying to update a nested array.
You need to use the positional operators $[] or $ for that.
If you use $[], you will be able to remove all matching nested array elements.
And if you use $, only the first matching array element will get removed.
Use the $regex operator to pass in your regular expression.
Also, you need to use $pull to remove array elements based on a matching condition; in your case, that condition is a regular expression. Note that $elemMatch is not the correct operator to use with $pull, as the arguments to $pull are applied directly as queries to the array elements.
db.collection.update(
{/*additional matching conditions*/},
{$pull: {"RESULT.$[].NUMERATION.$[].DIVISIONX":{PRODUCTTYPE: {$regex: "xy"}}}},
{multi: true}
)
Just replace xy with your regular expression and add your own matching conditions as required. I'm not quite sure about your data set, but I came up with the above answer based on my assumptions from the given info. Feel free to change according to your requirements.
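To make the effect of that $pull easier to picture, here is a plain-JavaScript emulation applied to the example document from the question. It is only a sketch: pullProductTypes is a hypothetical helper, not a MongoDB function, the field name DIVISION is taken from the example document, and the pattern is an illustrative one that matches everything except "Giftcard".

```javascript
// Plain-JavaScript emulation of the $pull update; pullProductTypes is a
// hypothetical helper, not a MongoDB function.
function pullProductTypes(doc, divisionKey, pattern) {
  for (const resultEntry of doc.RESULT) {
    for (const numerationEntry of resultEntry.NUMERATION) {
      // Keep only the entries whose PRODUCTTYPE does NOT match, as $pull would.
      numerationEntry[divisionKey] = numerationEntry[divisionKey].filter(
        (entry) => !pattern.test(entry.PRODUCTTYPE)
      );
    }
  }
  return doc;
}

const doc = {
  _id: "5ab550d7e85d5930b0879cbe",
  RESULT: [{ NUMERATION: [{ DIVISION: [
    { PRODUCTTYPE: "Book", COUNT: 10 },
    { PRODUCTTYPE: "Giftcard", COUNT: "300" }
  ] }] }]
};

// Pulling every product type that is not exactly "Giftcard".
pullProductTypes(doc, "DIVISION", /^(?!Giftcard$)/);
console.log(JSON.stringify(doc.RESULT[0].NUMERATION[0].DIVISION));
```

Keep in mind that the real $pull changes the stored documents, whereas the question asks for a filtered query result, so the update is only appropriate if modifying the collection is acceptable.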

How often should we use where operator of MongoDB

I have this particular scenario where I have to update a certain value in MongoDB depending on different attributes present in the same document. So I am trying to use findOneAndUpdate with the $where operator, which will be passed a JavaScript function, and I will also be using one of the attributes as a find criterion. But the MongoDB documentation mentions that one should not use the $where operator unless it cannot be avoided, because of performance issues.
Now let's say I have 3 attributes id, counter1, counter2 in my document and I am updating counter1 by 1 only when counter1 + counter2 = 2. So I will be writing something like
db.mydb.findOneAndUpdate({"_id" : id, $where : function() {
return this.counter1 + this.counter2 == 2 ;}},
{$inc : {counter1 : 1}})
Now my question is:
Will this particular approach create any performance issue? as I am using id as another nonWhere operator criteria to search for a document.
Or I should be having another attribute in mydb collection something called say sumCounter which will store the values of counter1 and counter2.
So the main catch with $where evaluation is that the conditional logic cannot use an "index" in order to filter out matches. In addition, it is JavaScript logic after all, so it needs to be compiled, and there needs to be "object translation" from the native forms into something that will work with the evaluation in the JavaScript engine.
So its use should be "very sparing" and only when "absolutely" required, as in there is no other practical way. In your case this is an "update" operation, therefore if you need that logic then fine. If it were just a "query", then I would say to use $redact in the aggregation framework instead:
db.mydb.aggregate([
{ "$match": { "_id": id } },
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$add": [ "$counter1", "$counter2" ] },
2
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
As that is at least all in native operators and therefore going to work faster than JavaScript.
As for "performance", it is all relative. However, in your case where _id is a "unique" lookup, the actual performance "hit" should be negligible, as the "exact match" was already done on the "index" for the primary key.
This is the general advice for $where conditions: you generally "use them" in conjunction with other native query operators that do the "bulk" of the filtering. Then if it takes a few more CPU cycles to apply the conditions in your JavaScript logic (and it is absolutely needed since there is no other way), then so be it.
But if your JavaScript-based condition needs to scan many documents without the assistance of other filtering, then that is bad indeed.
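On MongoDB 3.6 or later, the $expr operator can state the same condition with native aggregation operators, which avoids the JavaScript engine even for the update case. Here is a sketch under that version assumption; buildCounterFilter is a hypothetical helper, 42 is a placeholder id, and the shell call is shown only as a comment so the snippet runs without a database:

```javascript
// Hypothetical helper: builds the filter for "counter1 + counter2 == 2".
function buildCounterFilter(id) {
  return {
    _id: id, // the indexed exact match still does the bulk of the filtering
    $expr: { $eq: [{ $add: ["$counter1", "$counter2"] }, 2] }
  };
}

const update = { $inc: { counter1: 1 } };
const filter = buildCounterFilter(42); // 42 is a placeholder id

// In the shell this would be: db.mydb.findOneAndUpdate(filter, update)
console.log(JSON.stringify(filter), JSON.stringify(update));
```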

What is the purpose of $eq

I noticed a new $eq operator released with MongoDB 3.0 and I don't understand the purpose of it. For instance these two queries are exactly the same:
db.users.find({age:21})
and
db.users.find({age:{$eq:21}})
Does anyone know why this was necessary?
The problem was that you'd have to handle equality differently from comparison when you had some kind of query builder, so it's
{ a : { $gt : 3 } }
{ a : { $lt : 3 } }
but
{ a : 3 }
for equality, which looks completely different. The same applies for composition of $not, as JohnnyHK already pointed out. Also, comparing with $eq saves you from having to $-escape user provided strings. Therefore, people asked for alternatives that are syntactically closer and it was implemented. The Jira ticket contains a longer discussion which mentions all these points.
The clearer syntax of an $eq operator might also make sense in the aggregation framework to compare two fields, should such a feature be implemented.
Also, the feature has apparently been around since 2.5, but was added to the documentation relatively late.
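The query-builder point can be sketched in a few lines of plain JavaScript. buildCondition is a hypothetical helper invented for this illustration. The second half illustrates the escaping argument made above, on the assumption that a value placed under $eq is compared as a literal rather than interpreted as operators:

```javascript
// Hypothetical helper: every comparison, equality included, has the same shape.
function buildCondition(field, operator, value) {
  return { [field]: { [operator]: value } };
}

const greater = buildCondition("a", "$gt", 3); // { a: { $gt: 3 } }
const equal = buildCondition("a", "$eq", 3);   // { a: { $eq: 3 } }

// With the shorthand, an object supplied by a user is read as operators;
// under $eq it is assumed to be compared as a plain value instead.
const userInput = { $ne: null };               // hostile input, for illustration
const shorthand = { password: userInput };     // reads as "password is not null"
const explicit = buildCondition("password", "$eq", userInput);

console.log(JSON.stringify([greater, equal, shorthand, explicit]));
```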
One specific application I can see for $eq is with cases like the $not operator which requires that its value is an operator-expression.
This allows you to construct a query like:
db.zips.find({state: {$not: {$eq: 'NY'}}})
Before, the closest you could get to this semantically was:
db.zips.find({state: {$not: {$regex: /^NY$/}}})
I realize there are other ways to represent the functionality of that query, but if you need to use the $not operator there for other reasons, this would now allow it.
Inside an aggregation expression, such as the cond of a $filter, you cannot use the { field: value } shorthand to check whether a field equals a value; you need $eq:
db.sales.aggregate([
{
$project: {
items: {
$filter: {
input: "$items",
as: "item",
cond: { $eq: [ "$$item.price", 100 ] }
}
}
}
}
])

Find documents with arrays not containing a document with a particular field value in MongoDB

I'm trying to find all documents that do not contain at least one document with a specific field value. For example here is a sample collection:
{ _id : 1,
docs : [
{ foo : 1,
bar : 2},
{ foo : 3,
bar : 3}
]
},
{ _id : 2,
docs : [
{ foo : 2,
bar : 2},
{ foo : 3,
bar : 3}
]
}
I want to find every record whose docs array does not contain any document with foo = 1. In the example above, only the second document should be returned.
I have tried the following, but it only tells me if there are any that don't match (which returns document 1).
db.collection.find({"docs": { $not: {$elemMatch: {foo: 1 } } } })
UPDATE: The query above actually does work. As often happens, my data was wrong, not my code.
I have also looked at the $nin operator but the examples only show when the array contains a list of primitive values, not an additional document. When I've tried to do this with something like the following, it looks for the EXACT document rather than just the foo field I want.
db.collection.find({"docs": { $nin: {'foo':1 } } })
Is there any way to accomplish this with the basic operators?
Using $nin will work, but you have the syntax wrong. It should be:
db.collection.find({'docs.foo': {$nin: [1]}})
Use the $ne operator:
db.collection.find({'docs.foo': {$ne: 1}})
Update: I'd advise against using $nin in this case.
{'docs.foo': {$ne: 1}} takes all elements of docs, and for each of them it checks whether the foo field equals 1 or not. If it finds a match, it discards the document from the result list.
{'docs.foo': {$nin: [1]}} takes all elements of docs, and for each element it checks whether its foo field matches any of the members of the array [1]. This is a Cartesian product: you compare an array to another array, each element to each element. Although MongoDB might be smart and optimize this query, I assume you only use $nin because "it has something to do with arrays". But if you understand what you do here, you'll realize $nin is superfluous and possibly has subpar performance.
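The $ne behaviour described above can be checked against the sample collection with a small plain-JavaScript emulation. noElementHasFoo is a local stand-in written for this illustration, not a MongoDB API; it only mirrors the rule that a document is discarded as soon as one element of docs has the given foo value.

```javascript
// Local emulation of { 'docs.foo': { $ne: 1 } }; not a MongoDB API.
// A document matches only if none of its docs elements has foo equal to the value.
function noElementHasFoo(document, value) {
  return document.docs.every((element) => element.foo !== value);
}

// The sample collection from the question.
const collection = [
  { _id: 1, docs: [{ foo: 1, bar: 2 }, { foo: 3, bar: 3 }] },
  { _id: 2, docs: [{ foo: 2, bar: 2 }, { foo: 3, bar: 3 }] }
];

const matchingIds = collection
  .filter((document) => noElementHasFoo(document, 1))
  .map((document) => document._id);

console.log(matchingIds);
```

Only the document with _id 2 passes, which is the result the question asks for.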