Difference between findOneAndDelete() and findOneAndRemove() - mongodb

I am not able to differentiate findOneAndDelete() and findOneAndRemove() in the Mongoose documentation.
Query.prototype.findOneAndDelete()
This function differs slightly from Model.findOneAndRemove() in that
findOneAndRemove() becomes a MongoDB findAndModify() command, as
opposed to a findOneAndDelete() command. For most mongoose use cases,
this distinction is purely pedantic. You should use
findOneAndDelete() unless you have a good reason not to.

TLDR: You should use findOneAndDelete() unless you have a good reason not to.
Longer answer:
From the Mongoose documentation of findOneAndDelete:
This function differs slightly from Model.findOneAndRemove() in that findOneAndRemove() becomes a MongoDB findAndModify() command, as opposed to a findOneAndDelete() command. For most mongoose use cases, this distinction is purely pedantic. You should use findOneAndDelete() unless you have a good reason not to.
Mongoose Docs findOneAndDelete

In the native MongoDB client (Mongoose is different), passing { remove: true } to findAndModify makes it remove the found document; its behavior in this case appears quite similar to findOneAndDelete.
However, there are some differences when using other options:
findAndModify takes 5 parameters: query, sort, doc (the update object), options and a callback. findOneAndDelete only takes 3 (a filter or query, options and a callback).
options for findAndModify include w, wtimeout and j for write concerns, as:
"the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters", or simply the guarantee levels available for reporting the success of a write operation.
options for findOneAndDelete do not include write concerns configuration.
findAndModify lets you return the new document and remove it at the same time.
Example:
// Simple findAndModify command returning the new document and
// removing it at the same time
collection.findAndModify({b: 1}, [['b', 1]], {$set: {deleted: Date()}}, {remove: true}, callback)

Both of them are almost the same, except that findOneAndRemove uses findAndModify with the remove flag, and its time complexity will be a bit higher compared to findOneAndDelete because you are doing an update. Deletes are always faster.

In Mongoose, findOneAndDelete() works the same way as findOneAndRemove(). Both look up a document by the properties in the given filter, delete it, and return the deleted document. If you are using the native MongoDB driver directly, the underlying findAndModify command might still be useful to you, but in Mongoose it triggers a deprecation warning, so with a current Mongoose and Node.js setup I would advise you to use findOneAndDelete() for this operation. https://github.com/Automattic/mongoose/issues/6880

Here is the exact difference (quoted from the mongoose docs in Model.findOneAndDelete() section):
"This function differs slightly from Model.findOneAndRemove() in that
findOneAndRemove() becomes a MongoDB findAndModify() command, as
opposed to a findOneAndDelete() command. For most mongoose use cases,
this distinction is purely pedantic. You should use findOneAndDelete()
unless you have a good reason not to."
Here is the link to it:
https://mongoosejs.com/docs/api.html#model_Model.findOneAndDelete

The other answers here have a lot of wrong info. They are for all intents and purposes the same, and you should just use findOneAndDelete().
Mongoose's Model.findOneAndRemove(query) becomes MongoDB's findAndModify({query, remove: true}).
Mongoose's Model.findOneAndDelete() becomes MongoDB's findOneAndDelete(). However, the MongoDB driver converts findOneAndDelete(query, opts) to findAndModify({query, remove: true}) (src). Thus they do the exact same thing in the database.
Both take the same options.
Both return the deleted document.
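For reference, a minimal sketch of the recommended call, assuming a hypothetical User model:
const mongoose = require('mongoose');
const User = mongoose.model('User', new mongoose.Schema({ name: String }));

// Deletes at most one matching document and resolves with it,
// or with null if nothing matched.
async function removeUser(name) {
  return User.findOneAndDelete({ name });
}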

findOneAndRemove returns the removed document, so if you remove a document that you later decide should not have been removed, you can insert it back into the db. But ensuring your logic is sound before removing the document is preferable to checking afterward, IMO.
findOneAndDelete has a sort parameter that can be used to influence which document is deleted. It also has a maxTimeMS option that sets a time limit within which the operation has to complete.

I would suggest you use findOneAndDelete().
Mongoose provides both features for handling data through the ORM layer and features for writing directly to the database, findOneAndDelete() being one of the latter. Writing to the database directly is more dangerous, as you run the risk of not calling middleware or validators, potentially submitting partial or incomplete data to the database. Note that I said it's more dangerous, not that it's flat-out dangerous: findOneAndDelete() simply goes around the ORM and the safety it adds.

Related

MongoDB sequence number based on count in one operation

I'm working on creating an immutable, append-only event log for MongoDB. For this I need a sequence number generated, and I can base it on the count of documents, since there will be no removals from the event log. However, I'm trying to avoid having to do two operations on MongoDB and would rather have it happen in one "transaction" within the database itself.
If I were to do this from the Mongo shell, it would be something like below:
db['event-log'].insertOne({ SequenceNumber: db['event-log'].count() + 1 })
Is this doable in any way with the regular API?
Prior to v4, there was the possibility of doing eval - which would have made this much easier.
Update
The reason for my need of a sequence number is to be able to guarantee the order in which the documents were inserted when reading them back. The default behavior of Mongo is to retrieve them in $natural order, and one can explicitly request that on .find() as well (read more here). Although the documentation is clear about not relying on natural order, it seems that as long as there are no modifications or removals of documents already there, it should be fine from what I can gather.
I also realized that I might get around this another way: I'm going to introduce an actor framework, and I could make my committer a stateful actor holding the sequence number if I need it.
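For what it's worth, a common workaround (not mentioned above) is to keep a separate counter document and claim the next number atomically with $inc; the counters collection below is an illustrative name. This is still two commands, but the number itself is claimed atomically, so no two inserts can share it:
// Atomically increment and return the next sequence number.
function nextSequence(name) {
  return db.counters.findOneAndUpdate(
    { _id: name },
    { $inc: { seq: 1 } },
    { upsert: true, returnNewDocument: true }
  ).seq;
}

db['event-log'].insertOne({ SequenceNumber: nextSequence('event-log') })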

Atomic replace operation for mongo document

I'm setting up a python application which uses mongodb (through pymongo).
I need to overwrite the contents of an entire document. This can be done either with update or replace. However, the mongo documentation isn't explicit about the atomicity of these operations - saying only that individual write operations are atomic, without explaining if update or replace use multiple write operations.
Does anyone know for sure if either of these operations is completely atomic?
find_and_modify is deprecated in the pymongo driver. Instead use one of:
find_one_and_delete
find_one_and_replace
find_one_and_update
The original find_and_modify had the potential to modify multiple documents which is probably not what was intended and is also not atomic.
For a truly ACID-compliant sequence of modifications in MongoDB, please look at MongoDB ACID transactions, supported since MongoDB 4.0, released last year.
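For illustration, here is what an atomic whole-document replacement looks like in mongosh (pymongo's find_one_and_replace maps one-to-one; the collection name is illustrative):
// Replaces the entire matching document in one atomic command and
// returns the previous version (or null if nothing matched).
var previous = db.profiles.findOneAndReplace(
  { _id: 42 },
  { name: 'new contents', updatedAt: new Date() }
);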

Why does MongoDB map-reduce have implicit this

The map-reduce usage is as follows:
db.myCollection.mapReduce(
  function () {
    emit(this.smth);
  },
  function (key, values) {
    // return something done with key and values
  }
);
My question is, why is the map part implemented to have implicit this that references the current document being processed? IMO, it would be cleaner to have the current document passed in as an argument to the map function (I prefer to write all my JavaScript without this).
In practice this also rules out the use of arrow functions in mongo scripts, since this reference does not work with them.
why is the map part implemented to have implicit this that references the current document being processed?
MongoDB's Map/Reduce API was created in 2009, which is well before arrow functions were available in JavaScript (via ES6/ES2015). I can only speculate on the design intention, but much has changed in JavaScript (and MongoDB) since the original Map/Reduce implementation.
The this keyword in a JavaScript method refers to the owner or execution context, so setting it to the current document being processed was perhaps a reasonable convention (or convenience) for JavaScript usage at the time. The reduce function has a required prototype of function (key, values) so a map prototype of function (doc) might have been more consistent. However, once an API choice is made, any significant breaking changes become more challenging to introduce.
A more modern take on aggregation might look quite different, and this is the general path that MongoDB has taken. The Aggregation Framework introduced in MongoDB 2.2 (August, 2012) is a higher performance approach to data aggregation and should be preferred (where possible) over Map/Reduce.
Successive releases of the MongoDB server have made significant improvements to the features and performance of the Aggregation Framework, while Map/Reduce has not notably evolved. For example, the Aggregation Framework is written in C++ and able to manipulate MongoDB's native BSON data types; Map/Reduce spawns JavaScript threads and has to marshal data between BSON and JavaScript.
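For instance, a simple sum-per-key Map/Reduce collapses into a single $group stage (collection and field names are illustrative):
// Map/Reduce would emit(this.category, this.amount) and sum in reduce;
// the equivalent pipeline runs natively on BSON without JavaScript:
db.orders.aggregate([
  { $group: { _id: '$category', total: { $sum: '$amount' } } }
]);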
In practice this also rules out the use of arrow functions in mongo scripts, since this reference does not work with them.
Indeed. As at MongoDB 4.0, arrow functions are not supported in Map/Reduce. There is a feature request to support arrow functions which you can watch/upvote in the MongoDB issue tracker: SERVER-34281.

Why aren't defaults, setters, validators and middleware applied for findOneAndModify in Mongoose?

While reading Mongoose's documentation, I found the following note for findOneAndModify:
Although values are cast to their appropriate types when using the
findAndModify helpers, the following are not applied:
defaults
setters
validators
middleware
The documentation goes on to explain that, in order to get these, one should follow the traditional approach, which uses findOne and save.
My question: why aren't these functions applied? I understand that this can be simply a design decision of the Mongoose developers, but, looking at the code for findOne and findOneAndUpdate, I don't see much difference.
Note: This is not necessarily specific to findOneAndUpdate, but applies to other methods like findOneAndRemove.
findOneAndUpdate allows you to make a raw call to MongoDB from Mongoose. It just sends a findAndModify request to MongoDB.
Setters, validators and middleware require Mongoose to fetch all the data first.
findOneAndUpdate is faster than the traditional way because it makes a single call to MongoDB, skipping all the Mongoose magic.
The only actual difference between Mongoose's findOneAndUpdate function and the raw db.collection.findAndModify operation is that Mongoose casts your update operation according to your schema.
Update: according to the API docs, it issues a MongoDB findAndModify update command.
When you use the traditional way with findOne and save, Mongoose fetches all the data and wraps it in a Mongoose document. It then catches all your update operations, applying your setters. When you call save on the document, it runs all validators and hooks and issues an atomic update operation on the modified fields only. It does not replace the old document with a new one the way the raw MongoDB db.collection.save does.
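A minimal sketch contrasting the two paths, assuming a hypothetical User model and id/email variables:
// Raw path: one findAndModify command; no defaults, setters,
// validators or middleware, only schema casting.
const updated = await User.findOneAndUpdate(
  { _id: id },
  { $set: { email: email } },
  { new: true } // return the updated document
);

// Traditional path: fetch, mutate, save. Runs the full Mongoose
// machinery at the cost of an extra round trip.
const doc = await User.findOne({ _id: id });
doc.email = email;  // setters applied on assignment
await doc.save();   // validators and hooks run here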

It's not possible to lock a mongodb document. What if I need to?

I know that I can't lock a single mongodb document, in fact there is no way to lock a collection either.
However, I've got this scenario, where I think I need some way to prevent more than one thread (or process, it's not important) from modifying a document. Here's my scenario.
I have a collection that contains objects of type A. I have some code that retrieves a document of type A, adds an element to an array that is a property of the document (a.arr.add(new Thing())) and then saves the document back to MongoDB. This code runs in parallel: multiple threads in my application can do these operations, and for now there is no way to prevent two threads from doing them in parallel on the same document. This is bad because one of the threads could overwrite the work of the other.
I use the repository pattern to abstract access to the MongoDB collection, so I only have CRUD operations at my disposal.
Now that I think about it, maybe it's a limitation of the repository pattern and not a limitation of MongoDB that is causing me trouble. Anyway, how can I make this code "thread safe"? I guess there's a well-known solution to this problem, but being new to MongoDB and the repository pattern, I don't immediately see it.
Thanks
Hey, the only way I can think of right now is to add a status field and use the findAndModify() operation, which enables you to atomically modify a document. It's a bit slower, but should do the trick.
So let's say you add a status attribute, and when you retrieve the document you change the status from "IDLE" to "PROCESSING". Then you update the document and save it back to the collection, setting the status to "IDLE" again.
Code example:
var doc = db.runCommand({
  "findAndModify": "COLLECTION_NAME",
  "query": { "_id": "ID_DOCUMENT", "status": "IDLE" },
  "update": { "$set": { "status": "RUNNING" } }
}).value
Change COLLECTION_NAME and ID_DOCUMENT to proper values. By default findAndModify() returns the old value, which means the status value will still be IDLE on the client side. So when you are done with updating, just save/update everything again.
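When you are done, releasing is just another update that flips the flag back (same placeholder names as above):
db.runCommand({
  "findAndModify": "COLLECTION_NAME",
  "query": { "_id": "ID_DOCUMENT", "status": "RUNNING" },
  "update": { "$set": { "status": "IDLE" /* plus your actual changes */ } }
})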
The only thing you need to be aware of is that you can only modify one document at a time.
Hope it helps.
I stumbled onto this question while working on MongoDB upgrades. Unlike at the time this question was asked, MongoDB now supports document-level locking out of the box.
From: http://docs.mongodb.org/manual/faq/concurrency/
"How granular are locks in MongoDB?
Changed in version 3.0.
Beginning with version 3.0, MongoDB ships with the WiredTiger storage engine, which uses optimistic concurrency control for most read and write operations. WiredTiger uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation."
Classic solution when you want to make something thread-safe is to use locks (mutexes).
This is also called pessimistic locking as opposed to optimistic locking described here.
There are scenarios when pessimistic locking is more efficient (more details here). It is also far easier to implement (major difficulty of optimistic locking is recovery from collision).
MongoDB does not provide a mechanism for a lock. But this can easily be implemented at the application level (i.e. in your code):
Acquire lock
Read document
Modify document
Write document
Release lock
The granularity of the lock can vary: global, collection-specific, record/document-specific. The more specific the lock, the smaller its performance penalty.
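A minimal sketch of such an application-level lock, backed by a dedicated locks collection (all names here are illustrative, not prescribed by MongoDB):
// Acquire: the unique _id makes the insert atomic, so at most one
// caller can create the lock document for a given resource.
function acquireLock(resourceId) {
  try {
    db.locks.insertOne({ _id: resourceId, lockedAt: new Date() });
    return true;
  } catch (e) {
    return false; // duplicate key error: someone else holds the lock
  }
}

function releaseLock(resourceId) {
  db.locks.deleteOne({ _id: resourceId });
}

// Usage: acquire, read-modify-write, release.
if (acquireLock('documents/42')) {
  try {
    var doc = db.documents.findOne({ _id: 42 });
    doc.arr.push({ thing: true });
    db.documents.replaceOne({ _id: 42 }, doc);
  } finally {
    releaseLock('documents/42');
  }
}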
"Doctor, it hurts when I do this"
"Then don't do that!"
Basically, what you're describing sounds like you've got a serial dependency there -- MongoDB or whatever, your algorithm has a point at which the operation has to be serialized. That will be an inherent bottleneck, and if you absolutely must do it, you'll have to arrange some kind of semaphore to protect it.
So, the place to look is at your algorithm. Can you eliminate that? Could you, for example, handle it with some kind of conflict resolution, like "get record into local; update; store record", so that after the store the new record would be the one gotten on that key?
Answering my own question because I found a solution while doing research on the Internet.
I think what I need to do is use optimistic concurrency control.
It consists of adding a timestamp, a hash or another unique identifier (I'll use UUIDs) to every document. The unique identifier must be modified each time the document is modified. Before updating the document, I'll do something like this (in pseudo-code):
var oldUUID = doc.uuid;
doc.uuid = new UUID();
BeginTransaction();
if (GetDocUUIDFromDatabase(doc.id) == oldUUID)
{
    SaveToDatabase(doc);
    Commit();
}
else
{
    // Document was modified in the DB since we read it. We can't save our changes.
    RollBack();
    throw new ConcurrencyException();
}
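In MongoDB specifically, no transaction is needed for this check: put the old identifier into the update filter and inspect the result (a mongosh sketch; the things collection and payload field are illustrative):
// Optimistic concurrency: the filter only matches if the document
// still carries the uuid we read, so a concurrent writer turns the
// update into a no-op that we can detect and retry.
var res = db.things.updateOne(
  { _id: doc._id, uuid: oldUUID },
  { $set: { payload: doc.payload, uuid: UUID() } }
);
if (res.matchedCount === 0) {
  // Document was modified since we read it; reload and retry.
  throw new Error('ConcurrencyException');
}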
Update:
With MongoDB 3.2.2, the WiredTiger storage engine became the default, and MongoDB now locks at the document level by default. WiredTiger was introduced in version 3.0 but only became the default engine in 3.2.2. Therefore MongoDB now has document-level locking.
As of 4.0, MongoDB supports Transactions for replica sets. Support for sharded clusters will come in MongoDB 4.2. Using transactions, DB updates will be aborted if a conflicting write occurs, solving your issue.
Transactions are much more costly in terms of performance so don't use Transactions as an excuse for poor NoSQL schema design!
An alternative is to do an in-place update, for example:
http://www.mongodb.org/display/DOCS/Updating#comment-41821928
db.users.update( { level: "Sourcerer" }, { '$push' : { 'inventory' : 'magic wand'} }, false, true );
which will push 'magic wand' into all "Sourcerer" user's inventory array. Update to each document/user is atomic.
If you have a system with more than one server, then you'll need a distributed lock.
I prefer to use Hazelcast.
While saving, you can acquire a Hazelcast lock by entity id, fetch and update the data, then release the lock.
As an example:
https://github.com/azee/template-api/blob/master/template-rest/src/main/java/com/mycompany/template/scheduler/SchedulerJob.java
Just use lock.lock() instead of lock.tryLock()
Here you can see how to configure Hazelcast in your spring context:
https://github.com/azee/template-api/blob/master/template-rest/src/main/resources/webContext.xml
Instead of writing this question inside another question, I'll try to answer this one: I wonder if WiredTiger storage will handle the problem I pointed out here:
Limit inserts in mongodb
If the order of the elements in the array is not important for you then the $push operator should be safe enough to prevent threads from overwriting each others changes.
I had a similar problem where multiple instances of the same application would pull data from the database (the order did not matter; all documents had to be updated, efficiently), work on it and write back the results. However, without any locking in place, all instances obviously pulled the same document(s) instead of intelligently distributing their workload.
I tried to solve it by implementing a lock at the application level, which would set a locked field in the corresponding document while it was being edited, so that no other instance of my application would pick the same document and waste time performing the same operation as the other instance(s).
However, when running dozens or more instances of my application, the timespan between reading the document (using find()) and setting the locked field to true (using update()) was too long, and the instances still pulled the same documents from the database, making my idea of speeding up the work using multiple instances pointless.
Here are 3 suggestions that might solve your problem depending on your situation:
Use findAndModify(), since the read and write operations are atomic with that function. Theoretically, a document requested by one instance of your application should then appear as locked to the other instances, and by the time it is unlocked and visible again, it has already been modified (see the sketch after this list).
If, however, you need to do other work between the read (find()) and write (update()) operations, you could use transactions.
Alternatively, if that does not solve your problem, a bit of a cheesy solution (which might suffice) is to make the application pull documents in large batches and have each instance pick a random document from that batch to work on. Obviously this shaky solution relies on coincidence not punishing your application's efficiency.
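A sketch of the first suggestion: claim one document atomically so that no two instances can pick the same one (the jobs collection and locked field are illustrative):
// findOneAndUpdate is atomic: only one instance can flip the flag;
// the others get a different document (or null).
var job = db.jobs.findOneAndUpdate(
  { locked: { $ne: true } },
  { $set: { locked: true, lockedAt: new Date() } }
);
if (job !== null) {
  // ...do the work, then write back the result and unlock:
  db.jobs.updateOne(
    { _id: job._id },
    { $set: { locked: false, done: true } }
  );
}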
Sounds like you want to use MongoDB's atomic operators: http://www.mongodb.org/display/DOCS/Atomic+Operations