MongoDB collection locking - how does it work?

I have a not-so-big collection of about 500k records, but it's mission critical.
I want to add one field and remove another. I was wondering whether doing so would lock that collection against inserts/updates (I really don't want any downtime).
I ran an experiment, and it looks like it doesn't block them:
// mongo-console 1
use my_db
// add new field
db.my_col.update(
  {},
  { $set: { foobar: "bizfoo" } },
  { multi: true }
);
// mongo-console 2
use my_db
db.my_col.insert({ _id: 1, foobar: 'Im in' });
db.my_col.findOne({ _id: 1 });
// => { "_id" : 1, "foobar" : "Im in" }
Although I don't really understand why, because db.currentOp() shows that there are write locks on it.
Also, on the production system I have a replica set, and I was curious how that impacts the migration.
Can someone answer these questions, or point me to an article where this is nicely explained?
Thanks!
(MongoDB version I use is 2.4)

MongoDB 2.4 locks at the database level, per shard. You mentioned you have a replica set: replica sets have no impact on locking, but shards do. If your data is sharded, an update only locks the database on the shard where the data lives. If your data is not sharded, the whole database is locked during the write operation.
In order to see impact, you'll need a test that does a significant amount of work.
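For example, one rough way to see it for yourself (the collection and field names below are made up for the test) is to run a large multi-update in one shell, and from a second shell check db.currentOp() and try a small insert while it runs:
// Shell 1: seed a reasonably large collection, then run a multi-document update
// that will repeatedly acquire the database write lock.
for (var i = 0; i < 1000000; i++) { db.stress.insert({ n: i }); }
db.stress.update({}, { $set: { touched: true } }, { multi: true });

// Shell 2 (while the update above is still running): inspect the locks held and
// confirm that a small write still gets through between lock yields.
db.currentOp();
db.stress.insert({ n: -1, concurrent: true });
db.stress.findOne({ n: -1 });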
You can read more at:
http://www.mongodb.org/display/DOCS/How+does+concurrency+work

Related

Is it possible to update the same document on concurrent findAndModify operations in Mongo?

Let's suppose I am running the following command on multiple concurrent threads:
db.tasks.findAndModify({
  query:  { status: "TODO" },
  update: { $set: { status: "DONE" } },
  new:    true
})
From this command it's obvious that each document should be updated exactly once, since after the update the query no longer matches the state of the document. This implies that each running thread will get a different task on each execution.
Is this something that Mongo guarantees without the need for extra transactions? I've read similar questions about concurrency and document-level locking, but none of them seems to match my case, where the update operation modifies fields referenced in the query.
I am using Mongo 4.0 with the WiredTiger storage engine, if that's relevant.
As far as I can tell this isn't explicitly mentioned in the documentation.
However, https://docs.mongodb.com/manual/reference/method/db.collection.findAndModify/#upsert-with-unique-index says that findAndModify can insert multiple documents when invoked concurrently and an upsert is requested:
If all findOneAndUpdate() operations finish the query phase before any client successfully inserts data, and there is no unique index on the name field, each findOneAndUpdate() operation may result in an insert, creating multiple documents with name: Andy.
This suggests that the find part and the modify part are simply sequenced one after the other, and that findAndModify does nothing special to ensure only one modification takes place for a given starting document.
But the above may apply only to upserts.
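If you want to check the non-upsert case empirically, a rough test sketch (the collection and field names follow the question; the claims collection and worker labels are made up) is to run the claim loop below in several mongo shells at once and then verify that every task was handed out exactly once:
// Seed: 1,000 TODO tasks.
for (var i = 0; i < 1000; i++) { db.tasks.insert({ _id: i, status: "TODO" }); }

// Claim loop: run concurrently in each shell, each with its own worker label.
var worker = "worker-1";
var task;
while ((task = db.tasks.findAndModify({
          query:  { status: "TODO" },
          update: { $set: { status: "DONE", claimedBy: worker } },
          new:    true
        })) !== null) {
  db.claims.insert({ task_id: task._id, worker: worker });
}

// Verification: if no task was handed to two workers, this equals 1000
// and no task_id appears twice in `claims`.
db.claims.count();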

Best way to drop big collection in MongoDB

I have a standalone MongoDB instance, version 3.2, with the WiredTiger storage engine. What is the best way to drop a big collection (>500 GB) so as to minimize the time the exclusive database lock is held? Will there be a time difference between these two solutions?
Remove all documents from the collection, drop the index, then drop the collection
Just drop the collection
Additional information that could be important:
The collection contains about 200,000,000 documents
The collection has only one index, on _id
_id looks like {_id: {day: "2018-01-01", key: "someuniquekeybyday"}}
The correct answer is probably: "the drop operation is not linear". It takes a few seconds on a 10 GB collection and roughly the same time on a 500 GB collection.
I've dropped a 1 TB collection many times; it took several seconds.
P.S. To offer you something new, not seen in the comments: you have a third option, namely to make a copy of the other collections in this database and then switch databases in your application.
I have dropped a collection of over 1.4 TB on version 4.0. The operation took less than 1 second.
2022-03-15T01:17:25.688+0000 I REPL [replication-2] Completing collection drop for order.system.drop.1647307045i163t6.feeds with drop optime { ts: Timestamp(1647307045, 163), t: 6 } (notification optime: { ts: Timestamp(1647307045, 163), t: 6 })
As per the documentation, the drop operation will obtain a lock on the affected database and will block all operations on it, so the application will see minor latencies in database operations for a short duration. Before dropping a large collection, make sure to:
remove all reads/writes from the application on that collection;
rename it to a temporary collection, so nothing touches it during the drop (a sketch follows this list);
choose the lowest-traffic time to drop the temporary collection.
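A minimal sketch of the rename-then-drop step (the collection names are illustrative):
// Renaming is a quick metadata operation within the same database, so the
// application stops seeing the collection immediately.
db.feeds.renameCollection("feeds_to_drop");

// Later, during a low-traffic window, drop the now-unused collection.
db.feeds_to_drop.drop();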

Mongodb eventual consistency on replica-sets when writing on two documents

We have a single client that serially writes to two documents (with {w: 1}).
For example, the original documents may be:
{_id: "a", value: 0},
{_id: "b", value: 0}
and the client updates document "a" to {_id: "a", value: 1} and then, after the update completes, the client updates document "b" to {_id: "b", value: 1}.
A second client calls find({}) afterwards. The second client reads from a secondary, which may have not received all the changes.
Obviously it can read the following states:
{_id:"a",value:0},{_id:"b",value:0}
{_id:"a",value:1},{_id:"b",value:0}
{_id:"a",value:1},{_id:"b",value:1}
which are "real" states on the primary (at some moment in the past).
Can the second client see a state like: {_id:"a",value:0},{_id:"b",value:1}? Notice that this state never existed on the primary.
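For concreteness, the scenario looks roughly like this (the collection name items is made up):
// Client 1, connected to the primary: two serial, acknowledged ({w: 1}) writes.
db.items.insert({ _id: "a", value: 0 });
db.items.insert({ _id: "b", value: 0 });
db.items.update({ _id: "a" }, { $set: { value: 1 } }, { writeConcern: { w: 1 } });
// issued only after the previous update has been acknowledged:
db.items.update({ _id: "b" }, { $set: { value: 1 } }, { writeConcern: { w: 1 } });

// Client 2, reading from a secondary some time later:
db.getMongo().setReadPref("secondary");
db.items.find({}).sort({ _id: 1 });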
P.S.
The explanation here says:
Secondaries ... apply write operations in the order that they appear in the oplog.
Does that mean the secondaries change their documents in the same order they were updated on the primary?
P.S. Do find cursors "freeze" the state of the documents they are reading (i.e. ignore changes made after the cursor was created)? Could things be different if I used find(...).sort({_id: -1}), or if document "a"'s id were "c" (i.e. larger than "b")?
Thanks
First question: yes, the operations on the secondary are performed in the same order as on the primary. All operations are recorded in the oplog. The oplog itself is not a journal of the queries performed (e.g. an updateMany()) but of what has to be done to the actual documents, so its operations are idempotent.
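For example (the database and collection names are made up, and the oplog document shape shown is only approximate), an $inc on the primary is recorded in the oplog as a $set of the resulting value, which is why replaying it is harmless:
db.counters.insert({ _id: 1, n: 5 });
db.counters.update({ _id: 1 }, { $inc: { n: 1 } });

// On a replica-set member, the matching oplog entry stores the result, not the increment:
db.getSiblingDB("local").oplog.rs.find({ ns: "my_db.counters" })
  .sort({ $natural: -1 }).limit(1);
// => { ..., "op" : "u", "ns" : "my_db.counters", "o2" : { "_id" : 1 }, "o" : { "$set" : { "n" : 6 } } }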
Regarding the cursor operation: it might happen that documents get moved or updated while you iterate over the cursor. It may even happen that the same document appears twice in the cursor if its index key or storage location changes during the update.
There is a special snapshot mode that provides some sort of isolation, but it has limitations, e.g. it cannot be used with sharding.
If our document was updated on the primary by the sequence
change A
change B
change C
then the secondaries will apply the same sequence to the document:
change A
(at this point the document can be read without the other changes applied)
change B
(again, the document can be read without the remaining changes applied)
change C
Regarding locking: MongoDB can optimize the sequence of operations, which can allow reads to proceed even while a document update is in flight.

Partial doc updates to a large mongo collection - how to not lock up the database?

I've got a MongoDB instance with a collection that has around 17 million records.
I wish to alter the document structure (to add a new attribute) of all 17 million documents, so that I don't have to programmatically deal with different structures, and so queries are easier to write.
I've been told though that if I run an update script to do that, it will lock the whole database, potentially taking down our website.
What is the easiest way to alter the document without this happening? (I don't mind if the update happens slowly, as long as it eventually happens)
The query I'm attempting to do is:
db.history.update(
  { type: { $exists: false } },
  { $set: { type: 'PROGRAM' } },
  { multi: true }
)
You can update the collection in batches (say, half a million per batch); this will distribute the load.
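A rough sketch of what that batching could look like (the collection and field names follow the question above; the batch size and pause are arbitrary):
// Update documents in limited batches so the write lock is released between
// batches and other traffic can interleave.
var batchSize = 100000;
while (db.history.findOne({ type: { $exists: false } }) !== null) {
  var ids = db.history.find({ type: { $exists: false } }, { _id: 1 })
                      .limit(batchSize)
                      .map(function (d) { return d._id; });
  db.history.update(
    { _id: { $in: ids } },
    { $set: { type: 'PROGRAM' } },
    { multi: true }
  );
  sleep(1000); // give other readers/writers a chance between batches
}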
I created a collection with 20,000,000 records and ran your query on it. It took ~3 minutes to update on a virtual machine, and I could still read from the db in a separate console.
> for(var i=0;i<20000000;i++){db.testcoll.insert({"somefield":i});}
The locking in mongo is quite lightweight, and it is not going to be held for the whole duration of the update. Think of it as 20,000,000 separate updates. You can read more here:
http://docs.mongodb.org/manual/faq/concurrency/
You do actually care whether your update query is slow, because of the write-lock issue on the database that you're aware of; the two are tightly linked. This isn't a simple read query: you really want this write query to be as fast as possible.
Optimizing the "find" part of the update is a big part of the answer. First, since your collection has millions of documents, it's a good idea to keep the field name as short as possible (ideally a single character: type => t). This helps because of the schemaless nature of MongoDB collections, where field names are stored in every document.
Second, and more importantly, you need to make your query use a proper index. For that you need to work around the $exists operator, which is not well optimized (there are actually several ways to do this; one is sketched below).
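One possible workaround sketch (this is an assumption, not the only approach): querying for null matches both missing fields and explicit nulls, and it can use an index on the field:
// Build an index on the field being backfilled (note: a foreground index build
// on a large collection has its own locking cost; time it accordingly).
db.history.ensureIndex({ type: 1 });

// null matches documents where the field is missing or explicitly null.
db.history.update(
  { type: null },
  { $set: { type: 'PROGRAM' } },   // or a numeric constant, per the size note below
  { multi: true }
);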
Third, you can work on the field values themselves. Use http://bsonspec.org/#/specification to estimate the size of the value you want to store, and possibly pick a better choice (in your case, you could replace the 'PROGRAM' string with a numeric constant, for example, saving a few bytes per document, multiplied by the number of documents each multi-update touches). The smaller the data you want to write, the faster the operation will be.
A few links to other questions which can inspire you :
Can MongoDB use an index when checking for existence of a field with $exists operator?
Improve querying fields exist in MongoDB

Ensuring data persistence in MongoDB

I want to be sure that my data gets persisted successfully in MongoDB. Since in some cases MongoDB takes a fire-and-forget strategy, I want to specify the write concern {w: "majority", j: 1} at the driver level, which in my case is Mongoid.
Use-case :
I want to ensure my users have a unique 'nickname' and cannot sign up violating that uniqueness.
I have already created a unique index on the 'nickname' field.
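At the database level, the combination being asked for corresponds roughly to the following shell commands (the users collection name is assumed):
// Unique index: duplicate nicknames are rejected by the server itself.
db.users.ensureIndex({ nickname: 1 }, { unique: true });

// With this write concern, the insert is acknowledged only after a majority of
// replica-set members have the write and it has been journaled; a duplicate
// nickname produces a duplicate-key error instead of being silently dropped.
db.users.insert({ nickname: "andy" }, { writeConcern: { w: "majority", j: true } });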
For replica sets you can use the following configuration, as is described at http://mongoid.org/en/mongoid/docs/installation.html#replica:
consistency: :strong
Together with that, you'd want to have safe mode on, as is described at http://mongoid.org/en/mongoid/docs/tips.html#safe_mode:
safe: true
It does not look like you can set MongoDB's w parameter like this, but you can set it on a Band document operation; that's going to be per query, though:
Band.with(safe: { w: 3 })
You can also do it per session with:
Band.mongo_session.with(safe: { w: 3 }) do |session|
  session[:artists].find(...)
end
Short answer: you can't.
Long answer:
Consider using multiple data storage options. Far too often people jump on the NoSQL bandwagon when it isn't necessary. If you need guaranteed writes you should use a relational database, or consider a hybrid such as OrientDB. The lack of guaranteed writes is one of the big reasons why solutions such as MongoDB scale so well.