Why every few minutes activity on local.oplog.rs locks mongo clients - mongodb
The problem:
Every one or two minutes, the mongo clients halt for about 3 seconds. The normal operation time for updates is about 1 or 2 milliseconds. When this slowness appears, we see a bunch of updates lasting 1 to 3 seconds.
The slow queries log does not show anything related to this. Neither does debugging the mongo client (mongo-php-client).
The current architecture has 1 master, 1 slave and 1 arbiter in the replica set.
The queries executed are always of the same sort (upsert by _id, insert with a new MongoId). There is no "every few minutes we run this super-expensive update".
The blocking seems to be caused by local.oplog.rs. At least, that is what the mongotop output below shows. I haven't found any indication that the secondary is causing this issue, as the outputs of all the following commands seem to be stable there. I haven't found any information pointing to a specific query as the cause either.
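For reference, this is the kind of check that can be run on the primary while a stall is in progress to see which operations are actually blocked. A minimal mongo shell sketch, assuming an arbitrary 1-second threshold and 100 ms profiler limit (neither is part of our setup):
// Show in-flight operations that have been running for more than a second
db.currentOp({ secs_running: { $gte: 1 } }).inprog.forEach(function (op) {
    print(op.opid, op.op, op.ns, op.secs_running + "s");
});
// The profiler can also catch slow operations that never reach the slow-query log
db.setProfilingLevel(1, 100);                        // profile operations slower than 100 ms
db.system.profile.find().sort({ ts: -1 }).limit(5);  // most recent slow operations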
The idea behind the way we store data is pre-aggregated reports. We have a lot of updates (couple of hundreds per second), but a very low rate of queries.
The indexes are date-bound (except for _id, which is calculated from a composite key based on all the dimensions the record contains. By this I mean that the _id is not incremental, as it would be with an ObjectId index). Just to give an idea, the indexes in the biggest collection are (in MB):
"indexSizes" : {
"_id_" : 1967,
"owner_date" : 230,
"flow_date" : 231,
"date" : 170
},
Most of the other collections have indexes of 100 MB or less. In all the collections, the _id index is the biggest one. It is worth noting that these _ids are generated manually (based on the metadata, so the insertion is done as an upsert) and are not incremental.
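To make the write pattern concrete, here is a minimal mongo shell sketch of the upsert-by-computed-_id approach described above. The dimension and counter names are illustrative, not the real schema:
// The _id is derived from the report dimensions, so repeated events for the same
// combination always hit the same pre-aggregated document (non-incremental key)
var dims = { owner: 42, flow: "web", date: "2015-07-07" };
var id = dims.owner + "|" + dims.flow + "|" + dims.date;
db.d_date_flow_owner.update(
    { _id: id },
    { $inc: { hits: 1, revenue: 0.05 } },   // pre-aggregated counters (invented names)
    { upsert: true }
);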
Follow-up:
Edit 1: After more digging, it seems that the locking is related to the sync of the journal when it is flushed to disk. The journal is on the same filesystem as the data files, but the slowness is not expected as the disks are fast SSD devices.
Edit 2: After some testing, the write capability of the disks is not an issue. We are usually writing at a rate of several megabytes per second, and running some tests on the disk it accepts 150 MB/s without problems.
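If the journal flush / checkpoint from Edit 1 is the suspect, one way to watch it is to sample serverStatus() from the mongo shell while a stall is in progress. This is only a hedged sketch: the wiredTiger section exists in 3.0, but the exact counter names can vary between versions:
var wt = db.serverStatus().wiredTiger;
printjson(wt.log);          // journal (write-ahead log) activity
printjson(wt.transaction);  // checkpoint counters and timings
printjson(wt.cache);        // dirty bytes vs. configured cache size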
The expected answer:
Why is this downtime happening?
Pointers to possible causes, to investigate further
Experience/solutions based on similar cases
Issue Explanation:
The following commands are run on the primary node.
Every time the slowness appears, we see the following in mongostat (two examples):
insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn set repl time
10 *0 141 *0 93 120|0 0.4 80.0 0 11.5G 9.4G 0|1 1|0 110k 71k 274 rs0 PRI 13:15:44
12 *0 178 *0 72 105|0 0.2 80.0 1 11.5G 9.4G 0|0 1|0 111k 79k 274 rs0 PRI 13:15:45
47 *0 7 *0 0 159|0 0.1 80.0 0 11.5G 9.4G 0|0 2|1 15k 44k 274 rs0 PRI 13:15:49 !!!!HERE
14 *0 929 *0 99 170|0 0.2 80.0 0 11.5G 9.4G 0|0 1|0 419k 417k 274 rs0 PRI 13:15:50
21 *0 287 *0 124 181|0 0.2 80.0 0 11.5G 9.4G 0|0 1|0 187k 122k 274 rs0 PRI 13:15:51
insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn set repl time
10 *0 145 *0 70 108|0 0.3 79.9 0 11.5G 9.4G 0|0 1|0 98k 71k 274 rs0 PRI 13:16:48
11 *0 155 *0 72 111|0 0.2 79.9 1 11.5G 9.4G 0|0 1|0 103k 75k 274 rs0 PRI 13:16:49
44 *0 3 *0 0 144|0 0.0 79.9 0 11.5G 9.4G 0|2 1|0 11k 75k 274 rs0 PRI 13:16:53 !!!!HERE
11 *0 837 *0 94 134|0 0.2 79.9 0 11.5G 9.4G 0|0 1|0 377k 348k 274 rs0 PRI 13:16:54
12 *0 180 *0 86 139|0 0.2 79.9 0 11.5G 9.4G 0|0 1|0 122k 85k 274 rs0 PRI 13:16:55
14 *0 195 *0 83 124|0 0.2 79.9 0 11.5G 9.4G 0|0 2|0 125k 89k 274 rs0 PRI 13:16:56
The update column shows a drop, and the following row shows many more updates. Note also that we are running mongostat with a 1-second delay; when the slowness appears, mongostat itself stops reporting for a few seconds.
The stall is only present on the master, not on the slave server.
This is the output of mongotop when this problem happens (at 2015-07-07T13:29:38):
(an example with a bit more context can be found here)
ns total read write 2015-07-07T13:29:33+02:00
database_name.d_date_flow_owner 555ms 550ms 4ms
local.oplog.rs 61ms 53ms 7ms
database_name.client_context_bbbbbbb 15ms 0ms 15ms
database_name.d_date_landing_owner 15ms 0ms 15ms
database_name.d_date_billing_owner 10ms 0ms 10ms
database_name.w_bl_carrier_country_date_device_flow_landing_manager_op1_os_owner_prod_site 7ms 0ms 7ms
database_name.d_carrier_country_date_device_flow_landing_op1_os_owner_site 5ms 0ms 5ms
database_name.d_country_date_owner 5ms 0ms 5ms
database_name.d_date_device_owner 5ms 0ms 5ms
database_name.d_date_os_owner 5ms 0ms 5ms
ns total read write 2015-07-07T13:29:37+02:00
database_name.client_context_bbbbbbb 2ms 0ms 2ms
database_name.client_context_aaaaaaaaa 1ms 0ms 1ms
admin.system.backup_users 0ms 0ms 0ms
admin.system.namespaces 0ms 0ms 0ms
admin.system.new_users 0ms 0ms 0ms
admin.system.profile 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
database_name 0ms 0ms 0ms
ns total read write 2015-07-07T13:29:38+02:00
local.oplog.rs 8171ms 4470ms 3701ms
database_name.d_date_op1_owner 45ms 0ms 45ms
database_name.d_date_device_owner 39ms 0ms 39ms
database_name.m_date_owner 34ms 0ms 34ms
database_name.d_date_owner 32ms 0ms 32ms
database_name.d_date_owner_site 31ms 0ms 31ms
database_name.d_carrier_date_owner 30ms 0ms 30ms
database_name.d_date_flow_owner 30ms 0ms 30ms
database_name.d_date_owner_product 28ms 0ms 28ms
database_name.d_carrier_country_date_device_flow_landing_op1_os_owner_site 27ms 0ms 27ms
ns total read write 2015-07-07T13:29:39+02:00
database_name.d_date_flow_owner 558ms 552ms 6ms
local.oplog.rs 62ms 61ms 1ms
database_name.d_carrier_date_owner 17ms 0ms 17ms
database_name.d_date_owner 16ms 0ms 16ms
database_name.w_bl_carrier_country_date_device_flow_landing_manager_op1_os_owner_prod_site 7ms 0ms 7ms
database_name.d_date_billing_owner 6ms 0ms 6ms
database_name.d_carrier_country_date_device_flow_landing_op1_os_owner_site 5ms 0ms 5ms
database_name.d_country_date_owner 5ms 0ms 5ms
database_name.d_date_device_owner 5ms 0ms 5ms
database_name.d_date_op1_owner 5ms 0ms 5ms
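Since mongotop points at local.oplog.rs, a hedged extra check (not something we have conclusive data for) is to look at the oplog's configured size and time window from the shell:
db.printReplicationInfo();  // configured oplog size and the time window it covers
var stats = db.getSiblingDB("local").oplog.rs.stats();
print("oplog storage size (MB):", Math.round(stats.storageSize / 1024 / 1024));
print("oplog document count:", stats.count);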
Debug output from the PHP mongo client that shows the issue (note the last two "PhpMongoClient debug" lines, which are 4 seconds apart):
(an example with a bit more context can be found here)
Update duration: 2ms
Update duration: 1ms
Update duration: 1ms
Update duration: 4006ms
PhpMongoClient debug: 2015-07-07 10:40:26 - PARSE (INFO): Parsing mongodb://primary_host.lan,secondary_host.lan
PhpMongoClient debug: 2015-07-07 10:40:26 - PARSE (INFO): - Found node: primary_host.lan:27017
[....]
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): limiting by credentials: done
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): sorting servers by priority and ping time
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): - connection: type: PRIMARY, socket: 42, ping: 0, hash: primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): sorting servers: done
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): selecting near servers
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): selecting near servers: nearest is 0ms
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): - connection: type: PRIMARY, socket: 42, ping: 0, hash: primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): selecting near server: done
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (INFO): pick server: random element 0
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (INFO): - connection: type: PRIMARY, socket: 42, ping: 0, hash: primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
PhpMongoClient debug: 2015-07-07 10:40:26 - CON (FINE): No timeout changes for primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
PhpMongoClient debug: 2015-07-07 10:40:30 - CON (FINE): No timeout changes for primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
Update duration: 3943ms
Update duration: 3476ms
Update duration: 2008ms
Update duration: 961ms
Update duration: 956ms
Update duration: 20ms
Update duration: 20ms
Update duration: 3ms
Update duration: 42ms
Update duration: 24ms
Update duration: 25ms
Update duration: 56ms
Update duration: 24ms
Update duration: 11ms
Update duration: 11ms
Update duration: 3ms
Update duration: 2ms
Update duration: 3ms
Update duration: 1ms
Update duration: 1ms
Update duration: 1ms
Update duration: 2ms
Mongo Information:
Mongo Version: 3.0.3
Replica set with 1 slave and 1 arbiter
Replication lag varies between 0 and 4 seconds
Engine: WiredTiger
Filesystem: XFS
Operating System: Red Hat Enterprise Linux Server release 7.1
Memory: 24 GB. Reported by htop as 40% used, 60% cache
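For context, the replication lag and the WiredTiger cache usage behind the mongostat "% used" column can be checked from the shell. A hedged sketch; the counter names come from 3.0-era serverStatus output and may differ in other versions:
rs.printSlaveReplicationInfo();   // per-member replication lag
var cache = db.serverStatus().wiredTiger.cache;
print("configured cache (GB):", (cache["maximum bytes configured"] / (1024 * 1024 * 1024)).toFixed(1));
print("tracked dirty bytes:", cache["tracked dirty bytes in the cache"]);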
This issue has now disappeared. Two actions were undertaken:
Reworked the pre-aggregated reports system. The workload on mongo was reduced roughly 10 times.
Updated MongoDB to version 3.0.6
Unfortunately, the two changes went online without much time between them. I suspect that reducing the workload did the trick (which might or might not be linked to the issue that #steve-brisk pointed out), at least for now (we'll see when we hit the previous workload levels again). But, as the version was also updated, it might be that even with the previous levels of workload we would not hit this issue again.
I have no evidence pointing to either of the two changes alone, but after applying both of them the issue is resolved.
Related
ACI Container hangs during run CPU=0
I am attempting to run a bash script using ACI. Occasionally the container stalls or hangs: CPU activity drops to 0, memory flattens (see below) and network activity drops to ~50 bytes, but the script never completes. The container never terminates, as far as I can tell, and a bash window can still be opened on the container. The logs suggest the hang occurs during wget.
Possible clue: how can I verify my container is using SMB 3.0 to connect to my share, or is that handled at the host server level and I have to assume ACI uses SMB 3.0?
This script:
Dequeues an item from ServiceBus.
Runs an exe to obtain a URL.
Performs a wget using the URL; writes the output to a StorageAccount Fileshare.
Exits, terminating the container.
wget is invoked with a 4-minute timeout. Data is written directly to the share so the run can be retried if it fails and the wget can resume. The timeout command should force wget to end if it hangs. The logs suggest the container hangs at wget.
timeout 4m wget -c -O "/aci/mnt/$ID/$ID.sra" $URL
I have 100 items in queue and 5 ACIs running 5 containers each (25 total containers). A Logic App checks the queue and, if items are present, runs the containers. Approximately 95% of the download runs work as expected. The rest of the runs simply hang, as far as I can tell at 104 GB total downloads. I am using a Premium Storage Account with a 300 GB Fileshare with SMB Multichannel=Enabled. It seems that on some of the large files (>3 GB) the Container Instance will hang.
A successful run looks something like this:
PeekLock Message (5min)...
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 353 0 353 0 0 1481 0 --:--:-- --:--:-- --:--:-- 1476
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-0: ---------- BEGIN RUN ----------
./pipeline.sh: line 80: can't create /aci/mnt/SRR10575111/SRR10575111.txt: nonexistent directory
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-0: vdb-dump [SRR10575111]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-0: wget [SRR10575111]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-0: URL=https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR10575111/SRR10575111
Connecting to sra-pub-run-odp.s3.amazonaws.com (54.231.131.113:443)
saving to '/aci/mnt/SRR10575111/SRR10575111.sra'
SRR10575111.sra 0% | | 12.8M 0:03:39 ETA
SRR10575111.sra 1% | | 56.1M 0:01:39 ETA
...
SRR10575111.sra 99% |******************************* | 2830M 0:00:00 ETA
SRR10575111.sra 100% |********************************| 2833M 0:00:00 ETA
'/aci/mnt/SRR10575111/SRR10575111.sra' saved
Wed Dec 28 14:35:42 UTC 2022: prefetch-01-0: wget exit...
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: wget Success!
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: Delete Message...
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Wed Dec 28 14:35:43 UTC 2022: prefetch-01-0: POST to [orchestrator] queue...
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 325 0 0 100 325 0 1105 --:--:-- --:--:-- --:--:-- 1109
Wed Dec 28 14:35:44 UTC 2022: prefetch-01-0: exit RESULTCODE=0
A hung run looks like this:
PeekLock Message (5min)...
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 352 0 352 0 0 1252 0 --:--:-- --:--:-- --:--:-- 1252
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-1: ---------- BEGIN RUN ----------
./pipeline.sh: line 80: can't create /aci/mnt/SRR9164212/SRR9164212.txt: nonexistent directory
Wed Dec 28 14:31:41 UTC 2022: prefetch-01-1: vdb-dump [SRR9164212]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-1: wget [SRR9164212]...
Wed Dec 28 14:31:44 UTC 2022: prefetch-01-1: URL=https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR9164212/SRR9164212
Connecting to sra-pub-run-odp.s3.amazonaws.com (52.216.146.75:443)
saving to '/aci/mnt/SRR9164212/SRR9164212.sra'
SRR9164212.sra 0% | | 2278k 0:55:44 ETA
SRR9164212.sra 0% | | 53.7M 0:04:30 ETA
SRR9164212.sra 1% | | 83.9M 0:04:18 ETA
...
SRR9164212.sra 44% |************** | 3262M 0:04:55 ETA
SRR9164212.sra 44% |************** | 3292M 0:04:52 ETA
SRR9164212.sra 45% |************** | 3326M 0:04:47 ETA
The container is left in a Running state. CPU goes to 0; network activity goes to ~50 B received/transmitted.
How to free memory after mongodump?
Is there a way to tell mongodump, or mongod for that matter, to free the currently used RAM? I have an instance with a Mongo DB server which has a couple of databases totalling around 2 GB. The instance has 5 GB of RAM. Every night I have a backup cron running mongodump, and I have set up a 5 GB swap file.
Every other night the OOM killer will kill mongo; the memory will go down to ~30%, spike up again to ~60% at backup time, and stay like that until the next backup spikes up the memory and the OOM killer kicks in. Before this I had 3.75 GB and no swap, and it was getting killed every night and sometimes during the day when used. I added more RAM and the swap file a few days ago, and it has improved, but it is still getting killed every other day, and the memory after the backup sits at ~60%. And I'm paying for extra RAM that is only used for these spikes at backup time.
If I run mongostat I see that mongo increases the used RAM during the backup but does not free it afterward. Is there a way to free it? Something that does not require stopping and starting mongod?
Before backup:
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time
*0 *0 *0 *0 0 3|0 0.0% 1.0% 0 1.10G 100M 0|0 1|0 212b 71.0k 3 May 3 16:14:41.461
During backup:
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time
*0 *0 *0 *0 1 1|0 0.0% 81.2% 0 2.61G 1.55G 0|0 2|0 348b 33.5m 7 May 3 16:16:01.464
After backup:
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time
*0 *0 *0 *0 0 2|0 0.0% 79.7% 0 2.65G 1.62G 0|0 1|0 158b 71.1k 4 May 3 16:29:18.015
No drop in mongodb write bandwidth after changing to atomic update
I recently changed how data was being updated in a large mongo collection. The average document size in this collection is 71 KB.
MongoDB info:
[ec2-user@ip-10-0-1-179 mongodata]$ mongod --version
db version v3.0.7
git version: 6ce7cbe8c6b899552dadd907604559806aa2e9bd
Storage Engine: WiredTiger
Previously we were retrieving the entire document, adding a single element to 3 arrays and then writing the entire document back to Mongo. The recent change I made was to create atomic ADD operations, so that rather than writing the entire document back to Mongo we just call 3 ADD operations instead. This has had a considerable impact on the network bandwidth into the mongo server (from 634 MB/s to 72 MB/s average). But the disk volume metrics tell a VERY different story: there was absolutely NO change in the data volume metrics (i.e. /var/mongodata/), and the journal volume metrics appear to show significantly INCREASED write bandwidth and IOPS (i.e. /var/mongodata/journal). I can 'just about' justify the increased journal write IOPS, as I am now performing multiple smaller operations instead of one large one.
Here is a current snapshot of mongostat (which doesn't suggest a huge number of inserts or updates):
insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn set repl time
1 910 412 *0 0 77|0 7.7 80.2 0 29.2G 21.9G 0|0 4|2 1m 55m 738 rs0 PRI 16:21:45
1 758 19 *0 0 51|0 7.3 80.0 0 29.2G 21.9G 0|1 2|22 82k 29m 738 rs0 PRI 16:21:47
*0 1075 164 *0 0 83|0 7.4 80.1 0 29.2G 21.9G 0|0 4|2 1m 36m 738 rs0 PRI 16:21:48
*0 1046 378 *0 0 77|0 7.3 80.1 0 29.2G 21.9G 0|0 4|2 629k 55m 738 rs0 PRI 16:21:49
*0 1216 167 *0 0 58|0 7.6 80.2 0 29.2G 21.9G 0|1 1|11 238k 43m 738 rs0 PRI 16:21:50
*0 1002 9 *0 0 59|0 1.1 79.7 1 29.2G 21.9G 0|1 1|22 105k 35m 738 rs0 PRI 16:21:51
*0 801 37 *0 0 275|0 0.7 79.9 0 29.2G 21.9G 0|2 13|1 949k 17m 738 rs0 PRI 16:21:52
1 2184 223 *0 0 257|0 0.9 80.1 0 29.2G 21.9G 0|0 3|2 825k 52m 738 rs0 PRI 16:21:53
*0 1341 128 *0 0 124|0 0.9 80.0 0 29.2G 21.9G 0|1 2|39 706k 55m 738 rs0 PRI 16:21:54
1 1410 379 *0 0 121|0 1.2 80.0 0 29.2G 21.9G 0|0 2|2 2m 66m 738 rs0 PRI 16:21:55
My question is: WHY, given the almost 10-fold drop in network bandwidth (which represents the size of the write operations), is this change not being reflected in the data volume metrics?
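For illustration, a hedged mongo shell sketch of the before/after write pattern described in the question; the collection and field names are invented:
// Before: whole-document read-modify-write (resends the entire ~71K document)
var doc = db.bigdocs.findOne({ _id: docId });
doc.arrA.push(a);
doc.arrB.push(b);
doc.arrC.push(c);
db.bigdocs.update({ _id: docId }, doc);
// After: atomic adds, so only the new elements cross the network
db.bigdocs.update(
    { _id: docId },
    { $push: { arrA: a, arrB: b, arrC: c } }   // or $addToSet, depending on the semantics needed
);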
MultiProcess Perl program Timing out connection to MongoDB
I'm writing a migration program to transform the data in one database collection into another database collection using Perl and MongoDB. Millions of documents need to be transformed, and performance is very bad (it will take weeks to complete, which is not acceptable). So I thought I would use Parallel::TaskManager to create multiple processes to do the transformation in parallel. Performance starts OK, then rapidly tails off, and then I start getting the following errors:
update error: MongoDB::NetworkTimeout: Timed out while waiting for socket to become ready for reading at /usr/local/share/perl/5.18.2/Meerkat/Collection.pm line 322.
update error: MongoDB::NetworkTimeout: Timed out while waiting for socket to become ready for reading at /usr/local/share/perl/5.18.2/Meerkat/Collection.pm line 322.
So my suspicion is that this is due to spawned processes not letting go of sockets quickly enough. I'm not sure how to fix this, if in fact this is the problem.
What I've tried:
I reduced tcp_keepalive_time via sudo sysctl -w net.ipv4.tcp_keepalive_time=120 and restarted my mongod
I reduced the max_time_ms (this made matters worse)
Here are the details of my setup:
Single mongod, no replication or sharding. Both databases are on this server.
The Perl program iterates over the original database, does some processing on the data in each document, and writes to 3 collections in the new database.
Using MongoDB::Client to access the original database and Meerkat to write to the new database. write_safety is set to zero for both.
Not sure how to read this, but here is a segment of mongostat from the time the errors were occurring:
insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn time
*0 *0 *0 *0 0 1|0 0.0 0.3 0 20.4G 9.4G 0|0 1|35 79b 15k 39 11:10:37
*0 3 8 *0 0 11|0 0.0 0.3 0 20.4G 9.4G 0|0 2|35 5k 18k 39 11:10:38
*0 3 1 *0 1 5|0 0.1 0.3 0 20.4G 9.4G 0|0 1|35 2k 15m 39 11:10:39
*0 12 4 *0 1 13|0 0.1 0.3 0 20.4G 9.4G 0|0 2|35 9k 577k 43 11:10:40
*0 3 1 *0 3 5|0 0.1 0.3 0 20.4G 9.4G 0|0 1|34 2k 10m 43 11:10:41
*0 3 8 *0 1 10|0 0.1 0.3 0 20.4G 9.4G 0|0 2|34 5k 2m 43 11:10:42
*0 9 24 *0 0 29|0 0.1 0.3 0 20.4G 9.4G 0|0 5|34 13k 24k 43 11:10:43
*0 3 8 *0 0 10|0 0.1 0.3 0 20.4G 9.4G 0|0 5|35 4k 12m 43 11:10:44
*0 3 8 *0 0 11|0 0.1 0.3 0 20.4G 9.4G 0|0 5|35 5k 12m 42 11:10:45
*0 *0 *0 *0 0 2|0 0.1 0.3 0 20.4G 9.3G 0|0 4|35 211b 12m 42 11:10:46
Please let me know if you would like to see any additional information to help me diagnose this problem.
Dropping the number of processes running in parallel down to 3 from 8 (or more) seems to cut down the number of timeout errors, but at the cost of throughput.
None of the tuning suggestions helped, nor did bulk inserts. I continued to investigate, and the root of the problem was that my process was doing many "$addToSet" operations, which can become slow with large arrays. So I was consuming all the available sockets with slow updates. I restructured my documents so that I would not use arrays that could become large, and I returned to an acceptable insert rate.
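To illustrate why that pattern degrades, a hedged mongo shell sketch with made-up names (the original code was Perl): $addToSet has to compare the new element against every existing element, so each update gets slower as the array grows.
// Gets slower as "members" grows: the duplicate check touches every existing element
db.target.update(
    { _id: docId },
    { $addToSet: { members: newValue } },
    { upsert: true }
);
// The restructuring described above: one small document per element instead of one big array
db.target_members.insert({ parent: docId, value: newValue });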
Mongod resident memory usage low
I'm trying to debug some performance issues with a MongoDB configuration, and I noticed that the resident memory usage is sitting very low (around 25% of the system memory) despite the fact that there are occasionally large numbers of faults occurring. I'm surprised to see the usage so low given that MongoDB is so memory dependent. Here's a snapshot of top sorted by memory usage. It can be seen that no other process is using any significant memory:
top - 21:00:47 up 136 days, 2:45, 1 user, load average: 1.35, 1.51, 0.83
Tasks: 62 total, 1 running, 61 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.7%us, 5.2%sy, 0.0%ni, 77.3%id, 0.3%wa, 0.0%hi, 1.0%si, 2.4%st
Mem: 1692600k total, 1676900k used, 15700k free, 12092k buffers
Swap: 917500k total, 54088k used, 863412k free, 1473148k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2461 mongodb 20 0 29.5g 564m 492m S 22.6 34.2 40947:09 mongod
20306 ubuntu 20 0 24864 7412 1712 S 0.0 0.4 0:00.76 bash
20157 root 20 0 73352 3576 2772 S 0.0 0.2 0:00.01 sshd
609 syslog 20 0 248m 3240 520 S 0.0 0.2 38:31.35 rsyslogd
20304 ubuntu 20 0 73352 1668 872 S 0.0 0.1 0:00.00 sshd
1 root 20 0 24312 1448 708 S 0.0 0.1 0:08.71 init
20442 ubuntu 20 0 17308 1232 944 R 0.0 0.1 0:00.54 top
I'd like to at least understand why the memory isn't being better utilized by the server, and ideally to learn how to optimize either the server config or queries to improve performance.
UPDATE: It's fair that the memory usage looks high, which might lead to the conclusion it's another process. There are no other processes using any significant memory on the server; the memory appears to be consumed in the cache, but I'm not clear why that would be the case:
$ free -m
total used free shared buffers cached
Mem: 1652 1602 50 0 14 1415
-/+ buffers/cache: 172 1480
Swap: 895 53 842
UPDATE: You can see that the database is still page faulting:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 402 377 0 1167 446 0 24.2g 51.4g 3g 0 <redacted>:9.7% 0 0|0 1|0 217k 420k 457 mover PRI 03:58:43
10 295 323 0 961 592 0 24.2g 51.4g 3.01g 0 <redacted>:10.9% 0 14|0 1|1 228k 500k 485 mover PRI 03:58:44
10 240 220 0 698 342 0 24.2g 51.4g 3.02g 5 <redacted>:10.4% 0 0|0 0|0 164k 429k 478 mover PRI 03:58:45
25 449 359 0 981 479 0 24.2g 51.4g 3.02g 32 <redacted>:20.2% 0 0|0 0|0 237k 503k 479 mover PRI 03:58:46
18 469 337 0 958 466 0 24.2g 51.4g 3g 29 <redacted>:20.1% 0 0|0 0|0 223k 500k 490 mover PRI 03:58:47
9 306 238 1 759 325 0 24.2g 51.4g 2.99g 18 <redacted>:10.8% 0 6|0 1|0 154k 321k 495 mover PRI 03:58:48
6 301 236 1 765 325 0 24.2g 51.4g 2.99g 20 <redacted>:11.0% 0 0|0 0|0 156k 344k 501 mover PRI 03:58:49
11 397 318 0 995 395 0 24.2g 51.4g 2.98g 21 <redacted>:13.4% 0 0|0 0|0 198k 424k 507 mover PRI 03:58:50
10 544 428 0 1237 532 0 24.2g 51.4g 2.99g 13 <redacted>:15.4% 0 0|0 0|0 262k 571k 513 mover PRI 03:58:51
5 291 264 0 878 335 0 24.2g 51.4g 2.98g 11 <redacted>:9.8% 0 0|0 0|0 163k 330k 513 mover PRI 03:58:52
It appears this was being caused by a large amount of inactive memory on the server that wasn't being cleared for Mongo's use. By looking at the result of:
cat /proc/meminfo
I could see a large amount of inactive memory. Running this command as a sudo user:
free && sync && echo 3 > /proc/sys/vm/drop_caches && echo "" && free
freed up the inactive memory, and over the next 24 hours I was able to see the resident memory of my Mongo instance increase to consume the rest of the memory available on the server. Credit to the following blog post for its instructions: http://tinylan.com/index.php/article/how-to-clear-inactive-memory-in-linux
MongoDB only uses as much memory as it needs, so if all of the data and indexes that are in MongoDB can fit inside what it's currently using, you won't be able to push that any higher. If the data set is larger than memory, there are a couple of considerations:
Check MongoDB itself to see how much data it thinks it's using by running mongostat and looking at resident memory.
Was MongoDB re/started recently? If it's cold, the data won't be in memory until it gets paged in (leading to more page faults initially that gradually settle). Check out the touch command for more information on "warming MongoDB up".
Check your readahead settings. If your system readahead is too high, MongoDB can't efficiently use the memory on the system. For MongoDB a good number to start with is a setting of 32 (that's 16 KB of readahead, assuming you have 512-byte blocks).
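For reference, a hedged example of the touch command mentioned above, as it exists in MMAPv1-era MongoDB (the collection name is illustrative; the command was removed in later server versions):
// Load a collection's documents and indexes into memory to warm the cache
db.runCommand({ touch: "records", data: true, index: true });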
I had the same issue: Windows Server 2008 R2, 16 GB RAM, Mongo 2.4.3. Mongo used only 2 GB of RAM and generated a lot of page faults. Queries were very slow. The disk was idle, memory was free. I found no other solution than upgrading to 2.6.5. It helped.