Why do reads in MongoDB sometimes wait for a lock?

While using db.currentOp(), I sometimes see operations like:
{
"opid" : 1238664,
"active" : false,
"lockType" : "read",
"waitingForLock" : true,
"op" : "query",
.....
"desc" : "conn"
}
Why does a read operation need to wait for a lock? Is there a way to tell a query to ignore any pending writes and just go ahead and read anyway?

You can't tell a query to ignore pending writes, because MongoDB indexes are updated synchronously with writes, and this is by design.
For example, indexes in RavenDB can work in either async or sync mode, so maybe you need RavenDB (if you're on Windows) ;)
Why do reads in MongoDB sometimes wait for a lock?
They are waiting for an index rebuild.
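If you just want to see which reads are currently blocked, note that db.currentOp() also accepts a filter document; a minimal sketch using the field names from the output above:
// list only the queries that are currently waiting for a lock
db.currentOp({ "op" : "query", "waitingForLock" : true })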

Related

MongoDB Memory engine vs Redis for caching the writes

I have a server that processes users' page-viewing history with MongoDB.
When a user views a page, the documents are saved like this:
view_collection
{ "_id" : "60b212afb63a57d57a8f0006",
"pageId" : "gh42RzrRqYbp2Hj1y",
"userId" : "9Swh4jkYOPjWSgxjm",
"uniqueString" : "s",
"views" : {
"date" : ISODate("2021-01-14T14:39:20.378+0000"),
"viewsCount" : NumberInt(1)
}}
page_collection
{"_id" : "gh42RzrRqYbp2Hj1y", "views" : NumberInt(30) ,"lastVisitors" : ["9Swh4jkYOPjWSgxjm"]}
user_collection
{
_id:"9Swh4jkYOPjWSgxjm",
"statistics" : {
"totalViewsCount" : NumberInt(1197) }
}
Everything is working fine, except that I want to find a way to cache the operations going to the database.
I've been thinking about using Redis to cache the writes and then periodically looping through the Redis keys to get the results inserted into the database (but that would be too complicated and need lots of coding). I also found that MongoDB has an in-memory storage engine, with which I might not need to rewrite everything from scratch and could simply change some config options of mongod to get write caching working.
Redis is a much less featureful data store than MongoDB. If you don't need any of MongoDB's functionality on your data, you can put it in Redis for higher performance.
The MongoDB in-memory storage engine sounds like a premature optimization.
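If you do end up going the Redis route, the periodic flush doesn't have to be a lot of code. Here is a rough Node.js sketch under some assumptions that are not part of your setup: a node-redis v4 style client, the official mongodb driver, and a views:<pageId> key convention for counters incremented on each page view.
// Sketch: counters are incremented elsewhere with redis.incr(`views:${pageId}`);
// this job drains them into page_collection periodically.
const { createClient } = require('redis');
const { MongoClient } = require('mongodb');

async function flushViews() {
  const redis = createClient();                                  // assumes local Redis
  const mongo = new MongoClient('mongodb://localhost:27017');    // assumes local MongoDB
  await Promise.all([redis.connect(), mongo.connect()]);
  const pages = mongo.db('mydb').collection('page_collection');  // 'mydb' is illustrative

  const ops = [];
  for await (const key of redis.scanIterator({ MATCH: 'views:*' })) {
    const count = Number(await redis.getDel(key));               // read and clear (Redis >= 6.2)
    if (count > 0) {
      ops.push({
        updateOne: {
          filter: { _id: key.slice('views:'.length) },
          update: { $inc: { views: count } },
        },
      });
    }
  }
  if (ops.length) await pages.bulkWrite(ops);                    // one round trip for all pages

  await Promise.all([redis.quit(), mongo.close()]);
}

// run from setInterval / cron, e.g. every 30 seconds
flushViews().catch(console.error);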

Does MongoDB handle queries one by one?

Assume I have a MongoDB instance on a remote server (1 CPU core and 8 GB of memory).
When I send a simple query db.table.find({_id:ObjectId('xxx')}).limit(1).explain() to the instance, the result shows that the query took 10 ms.
So can I come to the conclusion that "my mongodb server can only handle 100 simple queries per second"?
"So can I come to a conclusion that "my mongodb server can only handle 100 simple query per second""
Its not tht like that 1 query takes 10ms then 100 query will take 1sec
db.table.find({_id:ObjectId('xxx')}).limit(1) will not lock your table
So if your 100 client request this with 100 connections all will return in 10ms
Concurrency Depends upon locks and connection limits
If a query locking collection for read and write for 10 sec then all query after that will wait for lock to cleared
MongoDb Can Handle multiple Request
db.runCommand( { serverStatus: 1 } )
returns the current status of the mongod; the output contains a connections object:
"connections" : {
"current" : <num>,
"available" : <num>,
"totalCreated" : <num>,
"active" : <num>
}
https://docs.mongodb.com/manual/reference/command/serverStatus/#connections
https://docs.mongodb.com/manual/reference/configuration-options/#net.maxIncomingConnections
https://docs.mongodb.com/manual/faq/concurrency/
These will give more clarity around connections and their limits.
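For example, the db.serverStatus() shell helper lets you pull out just that section:
// returns only the connections sub-document
db.serverStatus().connections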
You should know that your MongoDB server can handle many parallel queries.
Your assumption is incorrect because it ignores the fact that MongoDB supports concurrency.
Source
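If you want to check this empirically rather than extrapolate from a single explain(), here is a rough sketch of firing 100 of those queries in parallel from Node.js (driver usage and names are illustrative, and real numbers also depend on pool size and network latency):
// rough concurrency check: 100 findOne-by-_id queries issued in parallel
const { MongoClient, ObjectId } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://remote-host:27017', { maxPoolSize: 100 });
  await client.connect();
  const table = client.db('test').collection('table');
  const id = new ObjectId();          // placeholder; use a real _id from your collection

  const start = Date.now();
  await Promise.all(
    Array.from({ length: 100 }, () => table.findOne({ _id: id }))
  );
  console.log(`100 parallel queries took ${Date.now() - start} ms in total`);
  await client.close();
}

main().catch(console.error);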

How to get notified when a new field is added to MongoDB collections?

I have a GraphQL schema defined that needs to be changed at runtime whenever a new field is added to a MongoDB collection. For example, a collection has just two fields before:
person {
    "age" : "54",
    "name" : "Tony"
}
And later a new field, "height", is added:
person {
    "age" : "54",
    "name" : "Tony",
    "height" : "167"
}
I need to change my GraphQL schema and add height to it. So how do I get alerted or notified by MongoDB?
MongoDB does not natively implement event messaging. You cannot, natively, be informed of database, collection, or document updates.
However, MongoDB has an 'operation log' feature, which gives you access to a journal of each write operation on collections.
The operation log is used for MongoDB replication, i.e. cluster synchronization. In order to have an oplog you need at least two MongoDB instances, a primary and a secondary.
The oplog is built upon the capped collection feature, which gives a collection an append-only mechanism that ensures fast writes and supports tailable cursors. The documentation says:
The oplog exists internally as a capped collection, so you cannot
modify its size in the course of normal operations.
MongoDB - Change the Size of the Oplog
And:
Capped collections are fixed-size collections that support
high-throughput operations that insert and retrieve documents based on
insertion order. Capped collections work in a way similar to circular
buffers: once a collection fills its allocated space, it makes room
for new documents by overwriting the oldest documents in the
collection.
MongoDB - Capped Collections
The schema of the documents within an operation log journal looks like:
"ts" : Timestamp(1395663575, 1),
"h" : NumberLong("-5872498803080442915"),
"v" : 2,
"op" : "i",
"ns" : "wiktory.items",
"o" : {
"_id" : ObjectId("533022d70d7e2c31d4490d22"),
"author" : "JRR Hartley",
"title" : "Flyfishing"
}
}
Eg: "op" : "i" means operation is an insertion and "o" is the object inserted.
The same way, you can be informed of update operations:
"op" : "u",
"ns" : "wiktory.items",
"o2" : {
"_id" : ObjectId("533022d70d7e2c31d4490d22")
},
"o" : {
"$set" : {
"outofprint" : true
}
}
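As a rough sketch (assuming a replica set, so the local.oplog.rs collection exists), you can follow these entries from the mongo shell with a tailable cursor:
// tail the oplog for inserts and updates on one namespace (namespace is illustrative)
var oplog = db.getSiblingDB("local").oplog.rs;
var cursor = oplog.find({ ns: "wiktory.items", op: { $in: ["i", "u"] } }).tailable();
while (cursor.hasNext()) {
    printjson(cursor.next());   // inspect "o" / "o2" here and react to the change
}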
Note that the operation log (you access it as a collection) is limited either in disk size or in number of entries (FIFO). This means that whenever your oplog consumers are slower than the oplog writers, you will eventually miss operation log entries, resulting in corrupted consumption results.
This is the reason why MongoDB is a poor fit for guaranteeing document tracking on heavily solicited clusters, and the reason why messaging solutions such as Apache Kafka are used as supplements for event tracking (e.g. tracking document updates).
To answer your question: in a reasonably solicited environment, you might want to take a look at the JavaScript Meteor project, which allows you to trigger events based on changes in query results and relies on MongoDB's oplog feature.
Credits: oplogs examples from The MongoDB Oplog
As of MongoDB 3.6 you can subscribe to a change stream. You could subscribe to an "update" event operation. More details here:
https://stackoverflow.com/a/47184757/5103354
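A minimal mongo shell sketch of that approach, assuming a person collection as in the question; for update events, newly added fields show up under updateDescription.updatedFields:
// watch the collection for updates and print the fields each update sets
var cursor = db.person.watch([ { $match: { operationType: "update" } } ]);
while (cursor.hasNext()) {
    var change = cursor.next();
    printjson(change.updateDescription.updatedFields);   // e.g. { "height" : "167" }
}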

Simple inserts and updates to MongoDB run very slowly under load

I'm using Mongo 2.6.9 with a cluster of 2 shards; each shard has 3 replicas, one of which is hidden.
It's a 5-machine deployment running on RedHat, where 4 of the machines each contain a single replica of one shard and the 5th machine contains the hidden replicas of both shards.
There is a load of around 250 inserts per second and 50 updates per second. These are simple inserts and updates of pretty small documents.
In addition there is a small load of small files inserted into GridFS (around 1 file per second). The average file size is less than 1 MB.
There are 14 indexes defined for the involved collections. Those will be required when I add the application that will read from the DB.
During the whole run, the logs of the primary replicas show a huge number of simple inserts and updates, and even getLastError requests, that take hundreds of ms or sometimes seconds (the default logging level only shows operations that took more than 100 ms). For example, this simple update uses an index for the query and doesn't update any index:
2015-10-12T06:12:17.258+0000 [conn166086] update chatlogging.SESSIONS query: { _id: "743_12101506113018605820fe43610c0a81eb9_IM" } update: { $set: { EndTime: new Date(1444630335126) } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 numYields:0 locks(micros) w:430 2131ms
2015-10-12T06:12:17.259+0000 [conn166086] command chatlogging.$cmd command: update { update: "SESSIONS", updates: [ { q: { _id: "743_12101506113018605820fe43610c0a81eb9_IM" }, u: { $set: { EndTime: new Date(1444630335126) } }, multi: false, upsert: false } ], writeConcern: { w: 1 }, ordered: true, metadata: { shardName: "S1R", shardVersion: [ Timestamp 17000|3, ObjectId('56017697ca848545f5f47bf5') ], session: 0 } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:155 2132ms
All inserts and updates are made with w:1, j:1.
The machines have plenty of available CPU and memory. The disk I/O is significant, but not coming anywhere near 100% when these occur.
I really need to figure out what's causing this unexpectedly slow responsiveness of the DB. It's very possible that I need to change something in the way the DB is set up. Mongo runs with default configuration including the logging level.
An update -
I've continued looking into this issue and here are additional details that I hope will help pinpoint the root cause of the problem, or at least point me in the right direction:
The total DB size for a single shard is currently more than 200 GB, with the indexes taking almost 50 GB. Here is the relevant part of db.stats() and the mem section of db.serverStatus() from the primary replica of one of the shards:
"collections" : 7,
"objects" : 73497326,
"avgObjSize" : 1859.9700916465995,
"dataSize" : 136702828176,
"storageSize" : 151309253648,
"numExtents" : 150,
"indexes" : 14,
"indexSize" : 46951096976,
"fileSize" : 223163187200,
"nsSizeMB" : 16,
"mem" : {
"bits" : 64,
"resident" : 5155,
"virtual" : 526027,
"supported" : true,
"mapped" : 262129,
"mappedWithJournal" : 524258
},
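(For reference, a quick way to see which indexes account for that size is the per-collection stats; shown here for the SESSIONS collection that appears in the log lines above:)
// per-collection index sizes, scaled to MB
db.getSiblingDB("chatlogging").SESSIONS.stats(1024 * 1024).indexSizes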
The servers have 8 GB of RAM, of which the mongod process uses around 5 GB. So the majority of the data, and probably more importantly the indexes, is not kept in memory. Could this be our root cause? When I previously wrote that the system has plenty of free memory, I was referring to the fact that the mongod process isn't using as much as it could, and also that most of the RAM is used for cache memory that can be released if required:
free -m output
Here is the output of mongostat from the same mongod:
mongostat output
I do see a few faults in these, but the numbers look too low to me to indicate a real problem. Am I wrong?
Also, I don't know whether the numbers seen in "locked db" are considered reasonable, or whether they indicate that we have lock contention.
During the same timeframe when these stats were taken, many simple update operations that find a document via an index and don't update any index, like the following one, took hundreds of ms:
2015-10-19T09:44:09.220+0000 [conn210844] update chatlogging.SESSIONS query: { _id: "838_19101509420840010420fe43620c0a81eb9_IM" } update: { $set: { EndTime: new Date(1445247849092) } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 numYields:0 locks(micros) w:214 126ms
Many other types of insert and update operations take hundreds of ms too, so the issue looks to be system-wide and not related to a specific type of query. Using mtools I'm not able to find operations that scan lots of documents.
I'm hoping to get help here with finding the root cause of the problem. I can provide any additional info or statistics from the system.
Thank you in advance,
Leonid
1) First you need to increase the logging level
2) Use mtools to figure out which queries are slow
3) Tune those queries to find your bottleneck
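One way to approach steps 1 and 2: from the mongo shell you can lower the slow-operation threshold so that more operations get logged (in 2.6 the signature is setProfilingLevel(level, slowms)), then summarize the log with mtools. The database name comes from the question's logs; the log path is illustrative:
// log (and profile) every operation on chatlogging slower than 50 ms instead of the default 100 ms
db.getSiblingDB("chatlogging").setProfilingLevel(1, 50)

// then, on the server, summarize the slow operations found in the log, e.g.:
//   mloginfo /var/log/mongodb/mongod.log --queries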

MongoDB collection locking - how does it work?

I have a not-so-big collection of about 500k records, but it's mission critical.
I want to add one field and remove another. I was wondering whether that would block the collection from inserting/updating (I really don't want any downtime).
I made an experiment, and it looks like it doesn't block it:
// mongo-console 1
use "my_db"
// add new field
db.my_col.update(
{},
{ $set:
{ foobar : "bizfoo"}
},
{ multi: true}
);
// mongo-console 2
use "my_db"
db.my_col.insert({_id: 1, foobar: 'Im in'});
db.my_col.findOne({_id: 1});
=>{ "_id" : 1, "foo" : "bar" }
Although I don't really understand why, because db.currentOp() shows that there are write locks on it.
Also, on the production system I have a replica set, and I was curious how that impacts the migration.
Can someone answer these questions, or point me to an article where it's nicely explained?
Thanks!
(MongoDB version I use is 2.4)
MongoDB 2.4 locks at the database level, per shard. You mentioned you have a replica set. Replica sets have no impact on the locking; shards do. If your data is sharded, then when you perform an update, the lock will only lock the database on the shard where the data lives. If your data is not sharded, then the whole database is locked during the write operation.
In order to see impact, you'll need a test that does a significant amount of work.
You can read more at:
http://www.mongodb.org/display/DOCS/How+does+concurrency+work
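If you want to see how much such a migration actually blocks other operations while it runs, here is a small sketch of what you could watch (serverStatus fields as they exist in 2.4; "my_db" is the database from the question):
// operations currently queued up waiting for a lock
db.serverStatus().globalLock.currentQueue

// per-database lock statistics in the 2.4-era output
db.serverStatus().locks["my_db"]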