I want to know the difference between these two queries:
myCollection.update({
    a: 1,
    b: 1,
    $isolated: 1
});
myCollection.update({
    $and: [
        { a: 1 },
        { b: 1 },
        { $isolated: 1 }
    ]
});
Basically I need to perform an .update() with $isolated for all the documents that have 'a=1 and b=1'. I'm confused about how to write the '$isolated' parameter and how to be sure that the query works correctly.
I would question the "need to perform" part of your statement, especially given the lack of { multi: true } where you intend to match and update many documents.
The second consideration is that your proposed statements lack any kind of update operation at all. That might be a consequence of asking only about the "difference", but given your apparent understanding of MongoDB query operations in relation to the usage of $and, I seriously doubt you "need" this at all.
So if you really did need to write a statement this way, it should look like this:
myCollection.update(
    { "a": 1, "b": 1, "$isolated": true },
    { "$inc": { "c": 1 } },
    { "multi": true }
)
But what you really "need" to understand is what that is doing.
Essentially this query causes MongoDB to hold a "write lock", at least at the collection level, so that no other operations can be performed until the entire write is complete. It also ensures that all read attempts see only the state of the documents before any changes were made until the operation has completed in full; subsequent reads then see all the changes.
This may "sound tempting" to some, but it really is not a good idea. Write locks reduce the concurrency of updates and are generally a bad thing to be avoided. You might also be confusing this with a "transaction", but it is not one: any failure during execution only halts the operation at the point where it failed. It does not "undo" changes already made within the $isolated block, and they remain committed.
Just about the only valid use case here would be where you "absolutely need" all of the elements matching "a" and "b" to maintain a consistent state in the event that something was "aggregating" that combination at the exact same time as this operation was run. In that case, exposing "partially" altered values of "c" may not be desirable. But the range of usage for this is pretty slim, and most general applications do not require such consistency.
Back to the usage of $and: all MongoDB query arguments are implicitly an $and operation anyway, unless explicitly stated otherwise. The only general usage for $and is where you need multiple conditions on the same document key. And even then, that is generally better written on the "right side" of the evaluation, such as with $gt and $lt:
{ "a": { "$gt": 1, "$lt": 3 } }
Which is exactly the same as:
{
    "$and": [
        { "a": { "$gt": 1 } },
        { "a": { "$lt": 3 } }
    ]
}
So it's really quite superfluous.
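To make the equivalence concrete without a server, here is a rough sketch in plain JavaScript. The `matches` and `matchesCondition` helpers are hypothetical stand-ins written for this illustration; they handle only equality, $gt, $lt and $and, and are not how MongoDB itself evaluates queries.

```javascript
// Hypothetical mini-matcher for illustration only; it supports just
// equality, $gt, $lt and $and, and is NOT MongoDB's query engine.
function matchesCondition(value, condition) {
  if (condition !== null && typeof condition === "object") {
    // Every operator listed on the "right side" must hold (implicit AND).
    return Object.entries(condition).every(([op, operand]) => {
      if (op === "$gt") return value > operand;
      if (op === "$lt") return value < operand;
      throw new Error("unsupported operator: " + op);
    });
  }
  return value === condition;
}

function matches(doc, query) {
  // Every top-level key must hold as well (implicit AND again).
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$and") return condition.every(sub => matches(doc, sub));
    return matchesCondition(doc[key], condition);
  });
}

const docs = [{ a: 1 }, { a: 2 }, { a: 3 }];
const implicitAnd = { a: { $gt: 1, $lt: 3 } };
const explicitAnd = { $and: [{ a: { $gt: 1 } }, { a: { $lt: 3 } }] };

const implicitResult = docs.filter(d => matches(d, implicitAnd));
const explicitResult = docs.filter(d => matches(d, explicitAnd));
console.log(implicitResult, explicitResult); // both select only { a: 2 }
```

Both forms select the same documents in this sketch, which is the point: the explicit $and adds nothing here.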
In the end, if all you really want to do is:
myCollection.update(
    { "a": 1, "b": 1 },
    { "$inc": { "c": 1 } }
)
which updates a single document, then there is no need for $isolated at all. Obtaining an explicit lock here just adds unneeded complexity to an otherwise simple operation. And even in bulk, you likely do not need the consistency provided by obtaining the lock, and as such can simply do:
myCollection.update(
    { "a": 1, "b": 1 },
    { "$inc": { "c": 1 } },
    { "multi": true }
)
Which will happily yield to allow writes on all selected documents and reads of the "latest" information. Generally speaking, "you want this", as "atomic" operators such as $inc just modify the present value they see anyway.
So it does not matter if another process matched one of these documents before the "multi" write reached that document among its matches, since "both" $inc operations are going to execute anyway. All $isolated really does here is "ensure" that once this operation has started, its write will be the "first" committed and anything attempted during the lock will happen "after", as opposed to the general order in which each operation is able to grab that document and make the modification.
In 9 out of 10 cases, the end result is the same. The exception is that the "write lock" obtained here "will" slow down other operations.
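The reason the order does not matter for $inc can be shown with a small simulation in plain JavaScript. Nothing here talks to MongoDB: `applyInc` and `runInOrder` are hypothetical helpers, and the two documents are made-up sample data.

```javascript
// Pure-JS simulation; no MongoDB involved. applyInc mimics what $inc does
// to one document: it adds to whatever value is present at that moment.
function applyInc(doc, field, amount) {
  doc[field] = (doc[field] || 0) + amount;
}

// Replays a list of single-document increments against fresh sample data
// and returns the final values of "c".
function runInOrder(steps) {
  const docs = [
    { _id: 1, a: 1, b: 1, c: 0 },
    { _id: 2, a: 1, b: 1, c: 10 }
  ];
  for (const step of steps) {
    applyInc(docs[step.docIndex], "c", step.amount);
  }
  return docs.map(d => d.c);
}

// The "multi" update adds 1 to both documents; another writer adds 5 to the
// second document, either after the multi update or in the middle of it.
const multiFirst = runInOrder([
  { docIndex: 0, amount: 1 },
  { docIndex: 1, amount: 1 },
  { docIndex: 1, amount: 5 }
]);
const interleaved = runInOrder([
  { docIndex: 0, amount: 1 },
  { docIndex: 1, amount: 5 },
  { docIndex: 1, amount: 1 }
]);
console.log(multiFirst, interleaved); // the final values are identical
```

Only a reader that looks at the documents part-way through could tell the two schedules apart, which is the narrow case $isolated was meant for.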
Consider the use case where I am embedding Address document IDs into a User document. Threads 1 and 2 run at the same time followed by the Check results thread.
User:
addresses: [1, 2, 3]
Async Thread 1:
user = userCollection.find(userId)
user.addresses.push(4)
userCollection.update(user)
Async Thread 2:
user = userCollection.find(userId)
user.addresses.push(5)
userCollection.update(user)
Check results:
user = userCollection.find(userId)
user.addresses.includes(4) // result is non-deterministic
user.addresses.includes(5) // result is non-deterministic
Do I need to implement my own document level locks at the application level to prevent multi-threaded applications from overwriting data that other threads are currently writing?
Maybe I'm missing a built-in atomic function to append to an array? But what about the case of find / replace? I don't just want to 'push' a new value into the array but find an old ID that's been deleted and then remove it. And at the same time another thread wants to add to the array. I'm honestly not sure what the simplest solution is. I've written the problem in pseudo-JavaScript; however, I'm using Go for the project.
According to official doc,
In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.
You won't need to worry much about atomicity for a single document. For your given example, you can simply do an update with an aggregation pipeline, which is available in MongoDB v4.2+.
db.collection.update(
    { "userId": 1 },
    [
        {
            "$set": {
                "addresses": {
                    "$setUnion": [
                        {
                            "$filter": {
                                "input": "$addresses",
                                "as": "a",
                                // keep every element except the one you want to remove
                                "cond": { "$ne": [ "$$a", 3 ] }
                            }
                        },
                        // element you want to add
                        [ 4 ]
                    ]
                }
            }
        }
    ]
)
Here is the Mongo playground for your reference.
If you need multi-document atomicity, you can opt for transactions, which are available in MongoDB v4.0+.
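To see why the read-modify-write version in the question loses data while a single-statement update does not, here is a rough simulation in plain JavaScript. The `store` object stands in for the stored document and `removeThenAdd` is a hypothetical helper that mirrors the $filter plus $setUnion idea; neither is a MongoDB API. Note that MongoDB does not guarantee the element order of a $setUnion result, whereas this sketch happens to preserve it.

```javascript
// Pure-JS simulation of the race in the question; "store" stands in for the
// stored document and none of these helpers are MongoDB APIs.
const store = { addresses: [1, 2, 3] };

// Read-modify-write: both threads read before either one writes back.
const copyForThread1 = { addresses: [...store.addresses] };
const copyForThread2 = { addresses: [...store.addresses] };
copyForThread1.addresses.push(4);
copyForThread2.addresses.push(5);
store.addresses = copyForThread1.addresses; // thread 1 saves its copy
store.addresses = copyForThread2.addresses; // thread 2 overwrites it
const lostUpdate = [...store.addresses];    // address 4 has been lost

// Single-statement style: each change is computed from the stored value at
// the moment it is applied, as the $filter + $setUnion pipeline does.
function removeThenAdd(addresses, toRemove, toAdd) {
  const kept = addresses.filter(a => a !== toRemove);
  return [...new Set([...kept, toAdd])]; // a set union drops duplicates
}

store.addresses = [1, 2, 3];
store.addresses = removeThenAdd(store.addresses, 3, 4); // "thread 1"
store.addresses = removeThenAdd(store.addresses, 1, 5); // "thread 2"
console.log(lostUpdate, store.addresses);
```

In the second half both changes survive whichever "thread" goes first, because neither one works from a stale copy.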
Consider the following two approaches of (maybe) updating a bunch of records:
Approach 1 ("find and maybe update"):
let ids = db.getCollection("users").find({
    "status.lastActivity": {"$lte": timeoutDate}
}, {
    "fields": {"_id": 1}
}).fetch().map(doc => {
    doc = doc._id;
    return doc
});
if (ids.length) {
    db.getCollection("users").update({
        "_id": {"$in": ids}
    }, {
        "$set": {
            "status.idle": true
        }
    }, {
        "multi": true
    });
}
Approach 2 ("directly update"):
db.getCollection("users").update({
    "status.lastActivity": {"$lte": timeoutDate}
}, {
    "$set": {
        "status.idle": true
    }
}, {
    "multi": true
});
And now to keep it simple let's assume that there are never users with a smaller status.lastActivity than timeoutDate (so ids is also always an empty array).
In that case I get significantly better performance with Approach 1: it takes 0.1 to 2 ms, while Approach 2 takes 40 to 80 ms.
My question now is, why is that the case? I would have assumed MongoDB is 'clever' enough to do something similar to Approach 1 under the hood when I actually use Approach 2, and not waste resources when no record is matched by the selector...
And can I change it somehow so that it would work that way? Or have I maybe some kind of wrong configuration which is causing this and I could get rid of? Because obviously writing things like in Approach 2 would be leaner...
Is this in JS? db.getCollection("users").find( looks like it should return a Promise, and promises don't have length, so the update code that's gated by ids.length would never run.
I have a document that stores sensor data where the sensor readings are objects stored in an array. Example:
{
    "readings": [
        {
            "timestamp": 1499475320,
            "temperature": 121
        },
        {
            "timestamp": 1499475326,
            "temperature": 93
        },
        {
            "timestamp": 1499475340,
            "temperature": 142
        }
    ]
}
I know how to push/add an item to the "readings" array. But what I need is when I add an item to the array, I also want to "clean" the array by removing items that have "timestamp" value older than a cutoff time.
Is this possible in mongodb?
The way I see this you basically have two options here that have varying approaches.
Restrict Arrays to Capped Size
The first option here is "not exactly" what you are asking for, but it is the option with the least implementation and execution overhead. The variance from your question is that instead of "removing past a certain age", we instead simply place a "limit/cap" on the total number of entries in the array.
This is actually done using the $slice modifier to $push:
Model.update(
    { "_id": docId },
    { "$push": {
        "readings": {
            "$each": [{ "timestamp": 1499478496679, "temperature": 100 }],
            "$slice": -10
        }
    }}
)
In this case the -10 argument restricts the array to only have the "last ten" entries from the end of the array since we are "appending" with $push. If you wanted instead the "latest" as the first entry then you would modify with $position and instead provide the "positive" value to $slice, which means "first ten" in contrast.
So it's not the same thing you asked for, but it is practical since the arrays do not have "unlimited growth" and you can simply "cap" them as each update is made and the "oldest" item will be removed once at the maximum length. This means the overall document never actually grows beyond a set size, and this is a very good thing for MongoDB.
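If the capping behaviour is hard to picture, this plain JavaScript sketch imitates it on an ordinary array. `pushWithSlice` is a hypothetical helper written for illustration, not a MongoDB or Mongoose function, and the readings are made-up values.

```javascript
// Hypothetical helper that mimics $push with $each and $slice on a plain
// array; the server does the equivalent work inside one update.
function pushWithSlice(array, each, slice) {
  const appended = array.concat(each);
  // A negative $slice keeps that many elements counted from the end;
  // a positive one keeps that many counted from the start.
  return slice < 0 ? appended.slice(slice) : appended.slice(0, slice);
}

let readings = [];
for (let i = 1; i <= 12; i++) {
  readings = pushWithSlice(readings, [{ timestamp: i, temperature: 100 }], -10);
}
// Twelve readings were pushed, but only the ten newest remain.
console.log(readings.length, readings[0].timestamp);
```

The two oldest readings fall off the front as soon as the array would exceed ten entries, so the array never grows past the cap.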
Issue with Bulk Operations
The next case, which does exactly what you ask, uses "Bulk Operations" to issue "two" update operations in a "single" request to the server. The reason why it is "two" is that you cannot have different update operators "assigned to the same path" in a single update operation.
Therefore what you want actually involves a $push AND a $pull operation, and on the "same array path" we need to issue those as "separate" operations. This is where the Bulk API can help:
Model.collection.bulkWrite([
    { "updateOne": {
        "filter": { "_id": docId },
        "update": {
            // first remove the readings older than the cutoff
            "$pull": {
                "readings": { "timestamp": { "$lt": cutOff } }
            }
        }
    }},
    { "updateOne": {
        "filter": { "_id": docId },
        "update": {
            // then append the new reading to the same array
            "$push": {
                "readings": { "timestamp": 1499478496679, "temperature": 100 }
            }
        }
    }}
])
This uses the .bulkWrite() method from the underlying driver which you access from the model via .collection as shown. This will actually return a BulkWriteOpResult within the callback or Promise which contains information about the actual operations performed within the "batch". In this case it will be the "matched" and "modified" numbers which will be appropriate to the operations that were actually performed.
Hence if the $pull did not actually "remove" anything since the timestamp values were actually newer than the given constraint, then the modified count would only reflect the $push operation. But most of the time this need not concern you, where instead you would just accept that the operations completed without error and did something according to what you actually asked.
Conclude
So the general case of "both" is that it's really all done in one request and one response. The differences come in that "under the hood" the second approach which matches your request actually does do "two" operations per request and therefore takes microseconds longer.
There is actually no reason why you could not "combine" the logic of "both", removing entries past your "cutOff" as well as keeping a "cap" on the overall array size. But the general idea here is that the first implementation, though not exactly what was asked, will actually do a "good enough" job of "housekeeping" with little to no additional overhead on the request, or indeed on the implementation of the actual code.
Also, whilst you can always "read the data" -> "modify" -> "save", that is not a great pattern. For best performance, as well as "consistency" without conflict, you should use the atomic operations to modify, just as outlined here.
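One way the "combine" idea might look is sketched below: a small function that builds the two operations, with the $push also carrying a $slice cap. `buildHousekeepingOps`, its parameter names and the sample values are hypothetical; only the operator names ($pull, $push, $each, $slice) and the updateOne operation shape come from MongoDB.

```javascript
// Hypothetical builder: returns the operations array for bulkWrite().
// It only constructs plain objects, so it runs without a database.
function buildHousekeepingOps(docId, reading, cutOff, cap) {
  return [
    // 1. Drop every reading older than the cutoff.
    { updateOne: {
      filter: { _id: docId },
      update: { $pull: { readings: { timestamp: { $lt: cutOff } } } }
    }},
    // 2. Append the new reading and keep only the last `cap` entries.
    { updateOne: {
      filter: { _id: docId },
      update: { $push: { readings: { $each: [reading], $slice: -cap } } }
    }}
  ];
}

// Sample values, made up for the illustration.
const ops = buildHousekeepingOps(
  "sensor-1",
  { timestamp: 1499478496679, temperature: 100 },
  1499478000000,
  10
);
console.log(JSON.stringify(ops, null, 2));
// With Mongoose, as above, this would be sent as: Model.collection.bulkWrite(ops)
```

Keeping the $pull and the $push in separate operations respects the rule that two operators cannot target the same path in one update.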
I performed an update in a sharded environment like this:
db.collection.update({$or:[{a:2,b:3},{a:3,b:2}]},{$set:{x:5}})
But I got this error message:
update { q: { $or: [ {a:2,b:3},{a:3,b:2} ] }, u: {$set:{x:5}}, multi: false, upsert: false } does not contain _id or shard key for pattern { a: 1.0, b: 1.0 }
How can I perform this kind of update with the $or predicate on the shard key?
Thanks
The main problem here is that your $or condition does not really make much sense without the "multi" parameter of the update statement. At least the sharding conditional logic thinks so, even if your intention was to match only a single document.
In the "mind" of the sharding manager that lives within the mongos router, the expectation is that you either target a single shard or range of keys, or you are asking to access a possible variety of shards.
Here is the actual code handling this for reference:
// Validate that single (non-multi) sharded updates are targeted by shard key or _id
if (!updateDoc.getMulti() && shardKey.isEmpty() && !isExactIdQuery(updateDoc.getQuery())) {
    return Status(ErrorCodes::ShardKeyNotFound,
                  stream() << "update " << updateDoc.toBSON()
                           << " does not contain _id or shard key for pattern "
                           << _manager->getShardKeyPattern().toString());
}
So as you should clearly see in the "if" condition, the expectation is that the query contains either the "shard key" or at least an exact _id to facilitate an exact match.
Therefore your two options for making this valid as an update over shards are to either:
Include a "range" over the possible values of the shard key in the query criteria. I don't know your shard key so I cannot really give a sample. But basically:
{
    "shardKey": { "$gt": minShardKey, "$lt": maxShardKey },
    "$or": [
        { "a": 2, "b": 3 },
        { "a": 3, "b": 2 }
    ]
}
This is the query condition, where minShardKey and maxShardKey refer to the minimum and maximum possible values of that key (on the hypothetical "shardKey" field), so that the manager considers that you really intend to search across all shards.
Include the "multi" option in the update like so:
db.collection.update(
    { "$or": [
        { "a": 2, "b": 3 },
        { "a": 3, "b": 2 }
    ]},
    { "$set": { "x": 5 } },
    { "multi": true }
)
Which makes the selection "possibly" match more than one document, and is therefore valid for searching over shards without needing a targeted shard key.
In either case, the logic is fulfilled in that you at least "intend" to search across the shards to find something, or "things", matching the conditions you have given.
As an additional note, consider that "upsert" actions have similar restrictions. The general principle is that the shard key needs to be addressed, otherwise the operation is invalid, since there needs to be some indication of which shard the new data should be inserted into.
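A further possibility, not one of the two options above, is worth a hedged mention. Assuming the shard key pattern really is { a: 1, b: 1 } as the error message suggests, each branch of the $or already pins the whole shard key, so each branch could be sent as its own targeted single-document update. The sketch below only builds the operations; `buildTargetedUpdates` is a hypothetical helper. Be aware that this updates up to one document per branch, which is not quite the same as updating one document overall.

```javascript
// Hypothetical helper: turns each $or branch into its own updateOne
// operation. Assumes every branch fully specifies the shard key.
function buildTargetedUpdates(branches, update) {
  return branches.map(branch => ({
    updateOne: { filter: branch, update: update }
  }));
}

const shardOps = buildTargetedUpdates(
  [{ a: 2, b: 3 }, { a: 3, b: 2 }],
  { $set: { x: 5 } }
);
console.log(JSON.stringify(shardOps));
// In the shell these could then be sent with: db.collection.bulkWrite(shardOps)
```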
I have a particular scenario where I have to update a certain value in MongoDB depending on different attributes present in the same document. So I am trying to use findAndUpdate with the $where operator, which will be passed a JavaScript function, and I will also be using one of the attributes as find criteria. But the MongoDB documentation says that one should not use the $where operator unless it cannot be avoided, because of performance issues.
Now let's say I have 3 attributes, id, counter1 and counter2, in my document, and I am incrementing counter1 by 1 only when counter1 + counter2 = 2. So I will be writing something like
db.mydb.findAndUpdate(
    // the $where function must return the result of the comparison
    { "_id": id, $where: function() { return this.counter1 + this.counter2 == 2; } },
    { $inc: { counter1: 1 } }
)
Now my question is:
Will this particular approach create any performance issue? as I am using id as another nonWhere operator criteria to search for a document.
Or I should be having another attribute in mydb collection something called say sumCounter which will store the values of counter1 and counter2.
So the main catch with $where evaluation is that the conditional logic cannot use an "index" to filter out matches. In addition, it is JavaScript logic after all, which needs to be compiled, and there needs to be "object translation" from the native forms into something that will work in the JavaScript engine.
So its use should be "very sparing" and only when "absolutely" required, as in there is no other practical way. In your case this is an "update" operation, so if you need that logic then fine. If it were just a "query", then I would say to use $redact in the aggregation framework instead:
db.mydb.aggregate([
    { "$match": { "_id": id } },
    { "$redact": {
        "$cond": {
            "if": {
                "$eq": [
                    { "$add": [ "$counter1", "$counter2" ] },
                    2
                ]
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }}
])
As that is at least all in native operators and is therefore going to work faster than JavaScript.
As for "performance", it is all relative. However, in your case, where _id is a "unique" lookup, the actual performance "hit" should be negligible, as the "exact match" was already done on the "index" for the primary key.
This is the general advice for $where conditions: "use them" in conjunction with other native query operators that do the "bulk" of the filtering. Then if it takes a few more CPU cycles to apply the conditions in your JavaScript logic (and it is absolutely needed since there is no other way), so be it.
But if your JavaScript-based condition needs to scan many documents without the assistance of other filtering, then that is bad indeed.
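As a side note, and assuming a server on MongoDB 3.6 or later (the question does not say which version is in use), the same comparison can usually be written with the native $expr operator in the update filter, which avoids JavaScript evaluation altogether. The sketch below only builds the filter document; `buildCounterFilter` and the sample id are hypothetical.

```javascript
// Hypothetical builder for the filter document. $expr (MongoDB 3.6+) lets
// aggregation expressions such as $add and $eq be used inside a query.
function buildCounterFilter(id) {
  return {
    _id: id,
    $expr: { $eq: [{ $add: ["$counter1", "$counter2"] }, 2] }
  };
}

const counterFilter = buildCounterFilter(42); // 42 is a made-up sample id
console.log(JSON.stringify(counterFilter));
// In the shell this might be used as:
// db.mydb.findOneAndUpdate(counterFilter, { $inc: { counter1: 1 } })
```

The exact _id match still does the real filtering, so the expression is evaluated against at most one document.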