How often should we use the $where operator in MongoDB?

I have a scenario where I have to update a value in MongoDB depending on different attributes present in the same document. So I am trying to use findOneAndUpdate with the $where operator, which will be passed a JavaScript function, and I will also be using one of the attributes as the find criteria. But the MongoDB documentation mentions that one should not use the $where operator unless it cannot be avoided, because of performance issues.
Now let's say I have three attributes id, counter1, counter2 in my document, and I am updating counter1 by 1 only when counter1 + counter2 == 2. So I will be writing something like:
db.mydb.findOneAndUpdate(
    { "_id": id, "$where": function() {
        return this.counter1 + this.counter2 == 2;
    }},
    { "$inc": { "counter1": 1 } }
)
Now my question is:
Will this particular approach create any performance issue, given that I am using _id as another, non-$where criterion to find the document?
Or should I have another attribute in the mydb collection, say sumCounter, which stores the sum of counter1 and counter2?
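For example (a sketch of that alternative; sumCounter is a hypothetical field kept equal to counter1 + counter2), the update would then become a plain equality match:
// sumCounter must also be incremented to stay in sync with counter1:
db.mydb.findOneAndUpdate(
    { "_id": id, "sumCounter": 2 },
    { "$inc": { "counter1": 1, "sumCounter": 1 } }
)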

So the main catch with $where evaluation is that the conditional logic cannot use an "index" in order to filter out matches. In addition, it is JavaScript logic after all, so it needs to be compiled, and there also needs to be "object translation" from the native form into something that will work with the evaluation in the JavaScript engine.
So its use should be "very sparing" and only when "absolutely" required, as in there is no other practical way. In your case this is an "update" operation, therefore if you need that logic then fine. If it were just a "query", then I would say to use $redact in the aggregation framework instead:
db.mydb.aggregate([
    { "$match": { "_id": id } },
    { "$redact": {
        "$cond": {
            "if": {
                "$eq": [
                    { "$add": [ "$counter1", "$counter2" ] },
                    2
                ]
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }}
])
As that is at least all native operators and is therefore going to work faster than JavaScript.
As for "performance", it is all relative. However, in your case where _id is a "unique" lookup, the actual performance "hit" should be negligible, as the "exact match" was already done on the "index" for the primary key.
This is the general advice for $where conditions: you "use them" generally in conjunction with other native query operators that do the "bulk" of the filtering. Then if it takes a few more CPU cycles to apply the conditions in your JavaScript logic (and it is absolutely needed since there is no other way), then so be it.
If, however, your JavaScript-based condition needs to scan many documents without the assistance of other filtering, then that is bad indeed.
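As a sketch of that pattern (the status field here is just illustrative), let a selective native condition filter first and have the JavaScript inspect only what remains:
db.mydb.find({
    "status": "active",    // native, indexable condition does the bulk of the filtering
    "$where": function() {
        // JavaScript then only evaluates documents that passed the native filter
        return this.counter1 + this.counter2 == 2;
    }
})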

Is it possible to $setOnInsert with aggregation pipeline?

MongoDB has recently added an option to perform an update operation by providing an aggregation pipeline rather than the standard modifier object. Check MongoDB's docs on this topic.
The ability to use an aggregation pipeline, whose statements can refer to existing document properties, can be extremely useful in situations where certain fields need to be evaluated based on other fields, e.g. during data migration.
Moreover, most of the standard update operators like $set, $push, $inc, etc. can be successfully replicated with the aggregation expression language, so in some sense this new functionality generalizes the good old modifiers technique. Though, I must admit, the pipeline can become quite verbose if one tries to do things like $addToSet. This of course brings up a whole bunch of performance-related questions, but let's ignore them for now.
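For illustration (a minimal sketch; the field names are hypothetical), a pipeline update can derive one field from others in the same document, which a classic modifier object cannot express:
// Requires MongoDB 4.2+; "price" and "tax" are illustrative fields:
db.test.updateMany(
    { },
    [
        { $set: { total: { $add: [ "$price", "$tax" ] } } }
    ]
)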
So far, there's been just one thing which I haven't been able to fully replicate with the aggregation pipeline update, namely the $setOnInsert operator. Let's assume that I want to perform an upsert:
db.test.update(selector, pipeline, { upsert: true });
My initial intuition was that the $$ROOT variable (which I can use in the pipeline) would equal null unless there existed a document matching selector. Unfortunately, but probably for a good reason, MongoDB developers decided that $$ROOT should be derived from selector by default. It makes sense when you think about how the normal $setOnInsert works, but it also makes it practically impossible to distinguish between an update and an insert within the pipeline.
I know what you're thinking. You can look at $$ROOT._id. This is a good idea, though if _id is part of the selector it doesn't work anymore. I have figured out that this can be bypassed by tricking MongoDB a little bit and doing things like:
selector = {
    _id: { $in: [value, 'fake'] }
}
instead of the simpler { _id: value }, but this doesn't look clean. Please note that if $in contains only one element, then Mongo is actually clever enough to figure out what the identifier should be, and it populates $$ROOT accordingly (sic!).
I am wondering if anyone has a better idea how to approach this. Maybe there's some hidden variable that I could potentially use inside the pipeline itself to distinguish between update and insert (e.g. in $merge stage there's $$new variable which serves a similar purpose)?
If there are no matching documents, $$ROOT will have only the _id field. So you can transform $$ROOT into an array of its key/value pairs and check whether the size of that array is equal to 1. If it is, create a new document; if it is not, do nothing.
$objectToArray and $size to convert $$ROOT into an array of its key/value pairs and to get the size of that array
$cond to check if the size of the array above is equal to 1. If it is, merge the current $$ROOT (which is only the _id field) with the update object; if it is not, return the current $$ROOT. In both scenarios, put the result in the result field.
$mergeObjects to merge $$ROOT and the update that you are sending, and put that in the result field
$replaceRoot to replace the root with the result field from the previous stage
db.collection.update(
    { _id: 1 },
    [
        {
            $set: {
                result: {
                    $cond: {
                        if: { $eq: [ { $size: { $objectToArray: "$$ROOT" } }, 1 ] },
                        then: { $mergeObjects: [ "$$ROOT", { key: 3 } ] },
                        else: "$$ROOT"
                    }
                }
            }
        },
        {
            $replaceRoot: { newRoot: "$result" }
        }
    ],
    { upsert: true }
)
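To illustrate the two paths: run against an empty collection, the upsert inserts { _id: 1, key: 3 }; run when a document such as { _id: 1, name: "x" } already exists, $$ROOT has more than one field, so the else branch returns the document unchanged.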

How to match a trigger on a specific field in mongodb stitch?

The $match expression in MongoDB's Stitch application does not work properly.
I am trying to set up a simple update trigger that will only fire for one field in a collection.
The trigger setup provides a $match aggregation expression, which seems simple enough to set up.
For example, if I want the trigger to fire only when the field "online" in a specified collection gets set to "true", I would do:
{"updateDescription.updatedFields":{"online":"true"}}
which for a Stitch trigger is the same as:
{$match: {"updateDescription.updatedFields": {"online": "true"}}}
The problem is when I try to match an update on a field that is an object (for example hours: {online: 40, offline: 120}).
For some reason $exists or $in does not work.
So doing:
{"updateDescription.updatedFields": {"hours": {"$exists": true}}}
does not work; neither does something like:
{"updateDescription.updatedFields": {"hours.online": {"$exists": true}}}
The $match for the trigger is supposed to work exactly like a normal Mongo $match.
They provide just one example:
{
    "updateDescription.updatedFields": {
        "status": "blocked"
    }
}
The example is from here:
https://docs.mongodb.com/stitch/triggers/database-triggers/
I have tried hundreds of variations, but I can't seem to get it to work.
The trigger works fine if the match is a specific value, like:
{"updateDescription.updatedFields": {"hours.online": {"$numberInt": "20"}}}
and then I set hours.online to 20 in the database.
I was able to have it match items by using an explicit $expr operator, or by declaring it as a single field rather than an embedded object, i.e. "updateDescription.updatedFields.status": "blocked".
I struggled with this myself, trying to get a trigger to fire when a certain nested field was updated to any value (rather than just one specific one).
The issue appears to have to do with how change streams report updated fields.
With credit and thanks to MongoDB support, I can finally offer this as a potential solution, at least for simpler cases:
{
    "$expr": {
        "$not": {
            "$cmp": [
                {
                    "$let": {
                        "vars": { "updated": { "$objectToArray": "$updateDescription.updatedFields" } },
                        "in": { "$arrayElemAt": [ "$$updated.k", 0 ] }
                    }
                },
                "example.subdocument.nested_field"
            ]
        }
    }
}
Of course replace example.subdocument.nested_field with your own dot-notation field path.
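For context on why the $objectToArray detour is needed at all: when a nested field is updated, the change event typically reports it as a single dotted key inside updatedFields, which a plain match on the parent path will not see. A representative (illustrative) event fragment:
{
    "operationType": "update",
    "updateDescription": {
        "updatedFields": { "example.subdocument.nested_field": 42 },
        "removedFields": []
    }
}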

MapReduce function to return two outputs in MongoDB

I am currently doing some basic mapReduce in MongoDB.
I currently have data that looks like this:
db.football_team.insert({name: "Tane Shane", weight: 93, gender: "m"});
db.football_team.insert({name: "Lily Jones", weight: 45, gender: "f"});
...
I want to create a mapReduce function to group data by gender and show:
Total number of each gender, male & female
Average weight of each gender
I can create a map/reduce function to carry out each calculation separately, I just can't get my head around how to show the output for both. I am guessing that since the grouping is based on gender, the map function should stay the same and I just need to alter something in the reduce section...
Work so far
var map1 = function() {
    var key = this.gender;
    emit(key, { count: 1 });
};

var reduce1 = function(key, values) {
    var sum = 0;
    values.forEach(function(value) { sum += value["count"]; });
    return { count: sum };
};

db.football_team.mapReduce(map1, reduce1, { out: "gender_stats" });
Output
db.gender_stats.find()
{"_id" : "f", "value" : {"count": 12} }
{"_id" : "m", "value" : {"count": 18} }
Thanks
The key rule of "map/reduce" in any implementation is basically that the same shape of data needs to be emitted by the mapper as is returned by the reducer. The reason is that "map/reduce" conceptually works by quite possibly calling the reducer multiple times, which means the reducer can be called on output that was already emitted from a previous pass through the reducer, along with other data from the mapper.
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
That said, your best approach to "average" is therefore to total the data along with a count, and then simply divide the two. This actually adds another step to a "map/reduce" operation as a finalize function.
db.football_team.mapReduce(
    // mapper
    function() {
        emit(this.gender, { count: 1, weight: this.weight });
    },
    // reducer
    function(key, values) {
        var output = { count: 0, weight: 0 };
        values.forEach(value => {
            output.count += value.count;
            output.weight += value.weight;
        });
        return output;
    },
    // options and finalize
    {
        "out": "gender_stats", // or { "inline": 1 } if you don't need another collection
        "finalize": function(key, value) {
            value.avg_weight = value.weight / value.count; // take an average
            delete value.weight; // optionally remove the unwanted key
            return value;
        }
    }
)
All fine because both mapper and reducer are emitting data with the same shape and also expecting input in that shape within the reducer itself. The finalize method of course is just invoked after all "reducing" is finally done and just processes each result.
As noted though, the aggregate() method actually does this far more effectively and in native coded methods which do not incur the overhead ( and potential security risks ) of server side JavaScript interpretation and execution:
db.football_team.aggregate([
    { "$group": {
        "_id": "$gender",
        "count": { "$sum": 1 },
        "avg_weight": { "$avg": "$weight" }
    }}
])
And that's basically it. Moreover you can actually continue and do other things after a $group pipeline stage ( or any stage for that matter ) in ways that you cannot do with a MongoDB mapReduce implementation. Notably something like applying a $sort to the results:
db.football_team.aggregate([
    { "$group": {
        "_id": "$gender",
        "count": { "$sum": 1 },
        "avg_weight": { "$avg": "$weight" }
    }},
    { "$sort": { "avg_weight": -1 } }
])
The only sorting available with mapReduce is that the key used with emit is always sorted in ascending order. You cannot sort the aggregated output in any other way without, of course, performing queries on the output collection afterwards, or working "in memory" with the results returned from the server.
As a "side note" ( though an important one ), you probably should also consider in "learning" that the reality is the "server-side JavaScript" functionality of MongoDB is really a work-around more than being a feature. When MongoDB was first introduced, it applied a JavaScript engine for server execution mostly to make up for features which had not yet been implemented.
Thus to make up for the lack of the complete implementation of many query operators and aggregation functions which would come later, adding a JavaScript engine was a "quick fix" to allow certain things to be done with minimal implementation.
The result over the years is those JavaScript engine features are gradually being removed. The group() function of the API is removed. The eval() function of the API is deprecated and scheduled for removal at the next major version. The writing is basically "on the wall" for the limited future of these JavaScript on the server features, as the clear pattern is where the native features provide support for something, then the need to continue support for the JavaScript engine basically goes away.
The core wisdom here being that focusing on learning these JavaScript on the server features, is probably not really worth the time invested unless you have a pressing use case that currently cannot be solved by any other means.

MongoDB update with $isolated

I want to know the difference between these two queries:
myCollection.update({
    a: 1,
    b: 1,
    $isolated: 1
});

myCollection.update({
    $and: [
        { a: 1 },
        { b: 1 },
        { $isolated: 1 }
    ]
});
Basically I need to perform an .update() with $isolated for all the documents that have a=1 and b=1. I'm confused about how to write the $isolated param and how to be sure that the query works correctly.
I would basically question the "need to perform" in your statement, especially considering the lack of { multi: true } where you intend to match and update many documents.
The second consideration here is that your proposed statements lack any kind of update operation at all. That might be a consequence of the question you are asking about the "difference", but given your present apparent understanding of MongoDB query operations in relation to the usage of $and, I seriously doubt you "need" this at all.
So "If" you really needed to write a statement this way, then it should look like this:
myCollection.update(
    { "a": 1, "b": 1, "$isolated": true },
    { "$inc": { "c": 1 } },
    { "multi": true }
)
But what you really "need" to understand is what that is doing.
Essentially this query is going to cause MongoDB to hold a "write lock", at least at the collection level, so that no other operations can be performed until the entire write is complete. It also ensures that until that time, all read attempts will only see the state of the documents before any changes were made; once the operation is complete in full, subsequent reads see all the changes.
This may "sound tempting" as a good idea to some, but it really is not. Write locks affect the concurrency of updates and are generally a bad thing to be avoided. You might also be confusing this with a "transaction", but it is not one, and as such any failure during execution will only halt the operations at the point where it failed. This does not "undo" changes already made within the $isolated block, and they will remain committed.
Just about the only valid use case here would be where you "absolutely need" all of the elements matching "a" and "b" to be modified in a consistent state, in the event that something was "aggregating" that combination at the exact same time this operation was run. In that case, exposing "partially" altered values of "c" may not be desirable. But the range of usage for this is pretty slim, and most general applications do not require such consistency.
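For instance (a hypothetical concurrent reader that the lock would protect), an aggregation like this could otherwise observe a mix of pre-update and post-update values of "c" while the multi-update is in flight:
myCollection.aggregate([
    { "$group": { "_id": null, "total": { "$sum": "$c" } } }
])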
Back to the usage of $and: all MongoDB query arguments are implicitly an $and operation anyway, unless explicitly stated otherwise. The only general usage for $and is where you need multiple conditions on the same document key. And even then, that is generally better written on the "right side" of the evaluation, such as with $gt and $lt:
{ "a": { "$gt": 1, "$lt": 3 } }
Being exactly the same as:
{
    "$and": [
        { "a": { "$gt": 1 } },
        { "a": { "$lt": 3 } }
    ]
}
So it's really quite superfluous.
In the end, if all you really want to do is:
myCollection.update(
    { "a": 1, "b": 1 },
    { "$inc": { "c": 1 } }
)
updating a single document, then there is no need for $isolated at all. Obtaining an explicit lock here just adds complexity to an otherwise simple operation where it is not required. And even in bulk, you likely do not really need the consistency provided by obtaining the lock, and as such can simply do again:
myCollection.update(
    { "a": 1, "b": 1 },
    { "$inc": { "c": 1 } },
    { "multi": true }
)
This will happily yield to allow writes on all selected documents and reads of the "latest" information. Generally speaking, "you want this", as "atomic" operators such as $inc are just going to modify the present value they see anyway.
So it does not matter if another process matched one of these documents before the "multi" write found that document in all the matches, since "both" $inc operations are going to execute anyway. All $isolated really does here is "ensure" that when this operation is started, "its" write will be the "first" committed, and anything attempted during the lock will happen "after", as opposed to just the general order in which each operation is able to grab the document and make the modification.
In 9 out of 10 cases, the end result is the same, the exception being that the "write lock" obtained here "will" slow down other operations.

What is the purpose of $eq

I noticed a new $eq operator released with MongoDB 3.0 and I don't understand the purpose of it. For instance these two queries are exactly the same:
db.users.find({age:21})
and
db.users.find({age:{$eq:21}})
Does anyone know why this was necessary?
The problem was that you'd have to handle equality differently from comparison when you had some kind of query builder, so it's
{ a : { $gt : 3 } }
{ a : { $lt : 3 } }
but
{ a : 3 }
for equality, which looks completely different. The same applies to composition with $not, as JohnnyHK already pointed out. Also, comparing with $eq saves you from having to $-escape user-provided strings. Therefore, people asked for alternatives that are syntactically closer, and it was implemented. The Jira ticket contains a longer discussion which mentions all these points.
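With $eq available, a query builder can emit the same nested shape for every comparison, equality included:
{ a : { $gt : 3 } }
{ a : { $lt : 3 } }
{ a : { $eq : 3 } }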
The clearer syntax of an $eq operator might also make sense in the aggregation framework to compare two fields, should such a feature be implemented.
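(As it turned out, such a feature did arrive: since MongoDB 3.6 the $expr query operator lets you use the aggregation $eq to compare two fields of the same document. A sketch, with illustrative field names:)
db.users.find({ "$expr": { "$eq": [ "$fieldA", "$fieldB" ] } })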
Also, the feature has apparently been around since 2.5 but was added to the documentation relatively late.
One specific application I can see for $eq is with cases like the $not operator, which requires that its value be an operator expression.
This allows you to construct a query like:
db.zips.find({state: {$not: {$eq: 'NY'}}})
Before, the closest you could get to this semantically was:
db.zips.find({state: {$not: {$regex: /^NY$/}}})
I realize there are other ways to represent the functionality of that query, but if you need to use the $not operator there for other reasons, this would now allow it.
In the $filter part of an aggregation query, if you need to check whether some field is equal to a value, you cannot use the plain { field: value } assignment syntax; you need the $eq expression:
db.sales.aggregate([
    {
        $project: {
            items: {
                $filter: {
                    input: "$items",
                    as: "item",
                    cond: { $eq: [ "$$item.price", 100 ] }
                }
            }
        }
    }
])