Mongod resident memory usage low - mongodb

I'm trying to debug some performance issues with a MongoDB configuration, and I noticed that resident memory usage is sitting very low (around 25% of system memory) even though large numbers of page faults occasionally occur. I'm surprised to see the usage so low given how memory-dependent MongoDB is.
Here's a snapshot of top sorted by memory usage. It can be seen that no other process is using any significant memory:
top - 21:00:47 up 136 days, 2:45, 1 user, load average: 1.35, 1.51, 0.83
Tasks: 62 total, 1 running, 61 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.7%us, 5.2%sy, 0.0%ni, 77.3%id, 0.3%wa, 0.0%hi, 1.0%si, 2.4%st
Mem: 1692600k total, 1676900k used, 15700k free, 12092k buffers
Swap: 917500k total, 54088k used, 863412k free, 1473148k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2461 mongodb 20 0 29.5g 564m 492m S 22.6 34.2 40947:09 mongod
20306 ubuntu 20 0 24864 7412 1712 S 0.0 0.4 0:00.76 bash
20157 root 20 0 73352 3576 2772 S 0.0 0.2 0:00.01 sshd
609 syslog 20 0 248m 3240 520 S 0.0 0.2 38:31.35 rsyslogd
20304 ubuntu 20 0 73352 1668 872 S 0.0 0.1 0:00.00 sshd
1 root 20 0 24312 1448 708 S 0.0 0.1 0:08.71 init
20442 ubuntu 20 0 17308 1232 944 R 0.0 0.1 0:00.54 top
I'd like to at least understand why the memory isn't being better utilized by the server, and ideally to learn how to optimize either the server config or queries to improve performance.
UPDATE:
It's fair to say that the memory usage looks high, which might lead to the conclusion that another process is responsible. However, there are no other processes using any significant memory on the server; the memory appears to be consumed by the cache, but I'm not clear why that would be the case:
$free -m
total used free shared buffers cached
Mem: 1652 1602 50 0 14 1415
-/+ buffers/cache: 172 1480
Swap: 895 53 842
UPDATE:
You can see that the database is still page faulting:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 402 377 0 1167 446 0 24.2g 51.4g 3g 0 <redacted>:9.7% 0 0|0 1|0 217k 420k 457 mover PRI 03:58:43
10 295 323 0 961 592 0 24.2g 51.4g 3.01g 0 <redacted>:10.9% 0 14|0 1|1 228k 500k 485 mover PRI 03:58:44
10 240 220 0 698 342 0 24.2g 51.4g 3.02g 5 <redacted>:10.4% 0 0|0 0|0 164k 429k 478 mover PRI 03:58:45
25 449 359 0 981 479 0 24.2g 51.4g 3.02g 32 <redacted>:20.2% 0 0|0 0|0 237k 503k 479 mover PRI 03:58:46
18 469 337 0 958 466 0 24.2g 51.4g 3g 29 <redacted>:20.1% 0 0|0 0|0 223k 500k 490 mover PRI 03:58:47
9 306 238 1 759 325 0 24.2g 51.4g 2.99g 18 <redacted>:10.8% 0 6|0 1|0 154k 321k 495 mover PRI 03:58:48
6 301 236 1 765 325 0 24.2g 51.4g 2.99g 20 <redacted>:11.0% 0 0|0 0|0 156k 344k 501 mover PRI 03:58:49
11 397 318 0 995 395 0 24.2g 51.4g 2.98g 21 <redacted>:13.4% 0 0|0 0|0 198k 424k 507 mover PRI 03:58:50
10 544 428 0 1237 532 0 24.2g 51.4g 2.99g 13 <redacted>:15.4% 0 0|0 0|0 262k 571k 513 mover PRI 03:58:51
5 291 264 0 878 335 0 24.2g 51.4g 2.98g 11 <redacted>:9.8% 0 0|0 0|0 163k 330k 513 mover PRI 03:58:52

It appears this was being caused by a large amount of inactive memory on the server that wasn't being released for Mongo's use.
Looking at the output of:
cat /proc/meminfo
I could see a large amount of Inactive memory. Running the following command as root:
free && sync && echo 3 > /proc/sys/vm/drop_caches && echo "" && free
freed up the inactive memory, and over the next 24 hours I watched the resident memory of my Mongo instance grow to consume the rest of the memory available on the server.
Credit to the following blog post for its instructions:
http://tinylan.com/index.php/article/how-to-clear-inactive-memory-in-linux
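If you're not already root, note that the redirection in that one-liner happens in your own shell, so just prefixing the echo with sudo won't work; an equivalent that works from a sudo-capable account (a sketch, adjust as needed) is:
free -m
grep -i inactive /proc/meminfo
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
grep -i inactive /proc/meminfo
free -m
The grep lines just make it easy to confirm that the Inactive figure in /proc/meminfo actually dropped.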

MongoDB only uses as much memory as it needs, so if all of the data and indexes in MongoDB fit within what it's currently using, you won't be able to push resident memory any higher.
If the data set is larger than memory, there are a couple of considerations:
Check MongoDB itself to see how much data it thinks it's using by running mongostat and looking at the resident memory (res) column.
Was MongoDB re/started recently? If it's cold then the data won't be in memory until it gets paged in (leading to more page faults initially that gradually settle). Check out the touch command for more information on "warming MongoDB up" (see the examples after this list).
Check your read ahead settings. If your system read ahead is too high then MongoDB can't use the memory on the system efficiently. A good number to start with for MongoDB is 32 (that's 16 KB of read ahead, assuming you have 512-byte blocks).
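For reference, here's roughly what those last two checks look like in practice; the collection and device names below are placeholders for your own, and touch only applies to the MMAPv1-era storage engine in use here:
// in the mongo shell: page a collection's data and indexes into memory
db.runCommand({ touch: "myCollection", data: true, index: true })
# on the Linux host: check, then lower, the block device read ahead
# (the value is in 512-byte sectors, so 32 sectors = 16 KB)
sudo blockdev --getra /dev/xvdb
sudo blockdev --setra 32 /dev/xvdb
A value set with blockdev --setra does not persist across reboots, so you'd normally also add it to a boot script or udev rule.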

I had the same issue: Windows Server 2008 R2, 16 GB RAM, Mongo 2.4.3. Mongo used only 2 GB of RAM and generated a lot of page faults. Queries were very slow even though the disk was idle and memory was free. I found no solution other than upgrading to 2.6.5, which helped.

Related

How to free memory after mongodump?

Is there a way to tell mongodump or mongod to free the currently used RAM?
I have an instance running a MongoDB server with a couple of databases totalling around 2GB. The instance has 5GB of RAM and I have set up a 5GB swap file. Every night a backup cron job runs mongodump. Every other night the OOM killer kills mongod: memory drops to ~30%, spikes back up to ~60% at backup time, and stays like that until the next backup spikes the memory again and the OOM killer kicks in.
Before this I had about 3.75GB and no swap, and mongod was getting killed every night and sometimes during the day while in use. I added more RAM and the swap file a few days ago, and it has improved, but mongod still gets killed every other day and memory after the backup sits at ~60%. And I'm paying for extra RAM that is only used for these backup spikes.
If I run mongostat I see that mongo increases the used RAM during the backup but doesn't free it afterward. Is there a way to free it, short of stopping and starting mongod?
Before Backup:
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time
*0 *0 *0 *0 0 3|0 0.0% 1.0% 0 1.10G 100M 0|0 1|0 212b 71.0k 3 May 3 16:14:41.461
During backup:
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time
*0 *0 *0 *0 1 1|0 0.0% 81.2% 0 2.61G 1.55G 0|0 2|0 348b 33.5m 7 May 3 16:16:01.464
After Backup
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time
*0 *0 *0 *0 0 2|0 0.0% 79.7% 0 2.65G 1.62G 0|0 1|0 158b 71.1k 4 May 3 16:29:18.015

ceph raw used is more than sum of used in all pools (ceph df detail)

First of all, sorry for my poor English.
In my Ceph cluster, when I run the ceph df detail command, it shows the following result:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 62 TiB 52 TiB 10 TiB 10 TiB 16.47
ssd 8.7 TiB 8.4 TiB 370 GiB 377 GiB 4.22
TOTAL 71 TiB 60 TiB 11 TiB 11 TiB 14.96
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
rbd-kubernetes 36 288 GiB 71.56k 865 GiB 1.73 16 TiB N/A N/A 71.56k 0 B 0 B
rbd-cache 41 2.4 GiB 208.09k 7.2 GiB 0.09 2.6 TiB N/A N/A 205.39k 0 B 0 B
cephfs-metadata 51 529 MiB 221 1.6 GiB 0 16 TiB N/A N/A 221 0 B 0 B
cephfs-data 52 1.0 GiB 424 3.1 GiB 0 16 TiB N/A N/A 424 0 B 0 B
So I have a question about this result.
As you can see, the sum of used storage across my pools is less than 1 TiB, but in the RAW STORAGE section the used space on the HDDs is 10 TiB, and it is growing every day. I think this is unusual and something is wrong with this Ceph cluster.
Also, FYI, the output of ceph osd dump | grep replicated is:
pool 36 'rbd-kubernetes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 244 pg_num_target 64 pgp_num_target 64 last_change 1376476 lfor 2193/2193/2193 flags hashpspool,selfmanaged_snaps,creating tiers 41 read_tier 41 write_tier 41 stripe_width 0 application rbd
pool 41 'rbd-cache' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 1376476 lfor 2193/2193/2193 flags hashpspool,incomplete_clones,selfmanaged_snaps,creating tier_of 36 cache_mode writeback target_bytes 1000000000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 decay_rate 0 search_last_n 0 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 51 'cephfs-metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 31675 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 52 'cephfs-data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 742334 flags hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
Ceph Version ceph -v
ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
Ceph OSD versions: ceph tell osd.* version returns the following for all OSDs:
osd.0: {
"version": "ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)"
}
Ceph status ceph -s
cluster:
id: 6a86aee0-3171-4824-98f3-2b5761b09feb
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-sn-03,ceph-sn-02,ceph-sn-01 (age 37h)
mgr: ceph-sn-01(active, since 4d), standbys: ceph-sn-03, ceph-sn-02
mds: cephfs-shared:1 {0=ceph-sn-02=up:active} 2 up:standby
osd: 63 osds: 63 up (since 41h), 63 in (since 41h)
task status:
scrub status:
mds.ceph-sn-02: idle
data:
pools: 4 pools, 384 pgs
objects: 280.29k objects, 293 GiB
usage: 11 TiB used, 60 TiB / 71 TiB avail
pgs: 384 active+clean
According to the provided data, you should evaluate the following considerations and scenarios:
Replication multiplies storage: a write is acknowledged once min_size copies have been written, and up to the full replication size copies are eventually stored. That means you should expect raw consumption of between min_size and the full replication size times the stored data.
Ceph also stores metadata and logs for housekeeping purposes, which obviously consumes storage.
If you run benchmarks via rados bench (or a similar interface) with the --no-cleanup parameter, the benchmark objects remain permanently stored in the cluster and consume storage.
These are a few of the possible explanations; a quick sanity check on the replication factor is shown below.
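As a quick sanity check with the numbers above: rbd-kubernetes reports 288 GiB STORED and is replicated with size 3, so roughly 288 GiB x 3 = 864 GiB of raw usage is expected, which matches the 865 GiB USED shown for that pool. Summing USED over all four pools still only comes to roughly 0.9 TiB, so the remaining ~10 TiB of raw usage has to come from something outside the pool accounting, such as the housekeeping data or leftover benchmark objects mentioned above. If leftover rados bench objects are a suspect, something like the following should show and remove them (using the rbd-kubernetes pool from your output, and assuming the default benchmark_data object prefix):
rados -p rbd-kubernetes ls | grep benchmark_data | head
rados -p rbd-kubernetes cleanup --prefix benchmark_data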

Is it possible to delete cache in Mongo?

My case is that I run several queries in Mongo and measure the response time. I know that Mongo uses memory-mapped files and caches some data in RAM.
If I create an index and run a query, the execution time is 2.07 seconds. When I run the query again, the execution time is 0.017 seconds.
This means that the file is mapped into memory by the OS on the first access, and the OS will keep it in cache until it decides to page it out.
Because I want to run some benchmarks, I need a way to remove the mapped data from memory.
free -g:
total used free shared buffers cached
Mem: 29 29 0 0 0 29
-/+ buffers/cache: 0 29
Swap: 0 0 0
I tried db.collection.getPlanCache().clear() but it didn't work.
Then I tried:
sudo service mongod stop
I reboot my machine
sync; echo 3 > /proc/sys/vm/drop_caches
sudo service mongod start
free -g:
total used free shared buffers cached
Mem: 29 0 0 0 0 0
-/+ buffers/cache: 0 29
Swap: 0 0 0
From the above it seems that the cache is now free. But when I execute the query again, the execution time is still 0.017 seconds. It seems that Mongo has managed to cache or preserve the data somewhere. Since I cleared the cache, the expected execution time would be 2.07 seconds.
Also mongostat gives:
insert query update delete getmore command flushes mapped vsize res faults qr|qw ar|aw netIn netOut conn set repl time
*0 *0 *0 *0 0 1|0 0 124.0G 248.9G 27.7G 1 0|0 0|0 79b 11k 13 rs0 PRI 12:37:06
*0 *0 *0 *0 1 5|0 0 124.0G 248.9G 27.7G 0 0|0 0|0 687b 11k 13 rs0 PRI 12:37:07
*0 *0 *0 *0 2 1|0 0 124.0G 248.9G 27.7G 0 0|0 0|0 173b 11k 13 rs0 PRI 12:37:08
*0 *0 *0 *0 1 5|0 0 124.0G 248.9G 27.7G 0 0|0 0|0 687b 11k 13 rs0 PRI 12:37:09
*0 *0 *0 *0 0 1|0 0 124.0G 248.9G 27.7G 0 0|0 0|0 79b 11k 13 rs0 PRI 12:37:10
Can anyone tell me how I can actually delete the cached results in Mongo?

No drop in mongodb write bandwidth after changing to atomic update

I recently changed how data was being updated in a large mongo collection. The average document size in this collection is 71K.
MongoDB Info:
[ec2-user@ip-10-0-1-179 mongodata]$ mongod --version
db version v3.0.7
git version: 6ce7cbe8c6b899552dadd907604559806aa2e9bd
Storage Engine: WiredTiger
Previously we were retrieving the entire document, adding a single element to 3 arrays and then writing the entire document back to Mongo.
The recent change I made was to create atomic ADD operations so that rather than writing the entire document back to Mongo we just called 3 ADD operations instead.
This has had a considerable impact on the network bandwidth into the mongo server (from 634 MB/s down to 72 MB/s on average).
But the disk volume metrics tell a VERY different story.
There was absolutely NO change in the data volume metrics (i.e. /var/mongodata/).
The journal volume metrics appear to show significantly INCREASED write bandwidth and IOPS (i.e. /var/mongodata/journal).
I can 'just about' justify the increased journal write IOPS as I am now performing multiple smaller operations instead of one large one.
Here is a current snapshot of mongostat (which doesn't suggest a huge number of inserts or updates):
insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn set repl time
1 910 412 *0 0 77|0 7.7 80.2 0 29.2G 21.9G 0|0 4|2 1m 55m 738 rs0 PRI 16:21:45
1 758 19 *0 0 51|0 7.3 80.0 0 29.2G 21.9G 0|1 2|22 82k 29m 738 rs0 PRI 16:21:47
*0 1075 164 *0 0 83|0 7.4 80.1 0 29.2G 21.9G 0|0 4|2 1m 36m 738 rs0 PRI 16:21:48
*0 1046 378 *0 0 77|0 7.3 80.1 0 29.2G 21.9G 0|0 4|2 629k 55m 738 rs0 PRI 16:21:49
*0 1216 167 *0 0 58|0 7.6 80.2 0 29.2G 21.9G 0|1 1|11 238k 43m 738 rs0 PRI 16:21:50
*0 1002 9 *0 0 59|0 1.1 79.7 1 29.2G 21.9G 0|1 1|22 105k 35m 738 rs0 PRI 16:21:51
*0 801 37 *0 0 275|0 0.7 79.9 0 29.2G 21.9G 0|2 13|1 949k 17m 738 rs0 PRI 16:21:52
1 2184 223 *0 0 257|0 0.9 80.1 0 29.2G 21.9G 0|0 3|2 825k 52m 738 rs0 PRI 16:21:53
*0 1341 128 *0 0 124|0 0.9 80.0 0 29.2G 21.9G 0|1 2|39 706k 55m 738 rs0 PRI 16:21:54
1 1410 379 *0 0 121|0 1.2 80.0 0 29.2G 21.9G 0|0 2|2 2m 66m 738 rs0 PRI 16:21:55
My question is: WHY, given the almost 10-fold drop in network bandwidth (which represents the size of the write operations), is this change not reflected in the data volume metrics?

MultiProcess Perl program Timing out connection to MongoDB

I'm writing a migration program to transform the data in one database collection into another database collection using Perl and MongoDB. Millions of documents need to be transformed, and performance is very bad (it would take weeks to complete, which is not acceptable). So I decided to use Parallel::TaskManager to create multiple processes to do the transformation in parallel. Performance starts out OK, then rapidly tails off, and then I start getting the following errors:
update error: MongoDB::NetworkTimeout: Timed out while waiting for socket to become ready for reading
at /usr/local/share/perl/5.18.2/Meerkat/Collection.pm line 322.
update error: MongoDB::NetworkTimeout: Timed out while waiting for socket to become ready for reading
at /usr/local/share/perl/5.18.2/Meerkat/Collection.pm line 322.
So my suspicion is that this is due to the spawned processes not letting go of sockets quickly enough. I'm not sure how to fix this, though, if that is in fact the problem.
What I've tried:
I reduced tcp_keepalive_time via sudo sysctl -w net.ipv4.tcp_keepalive_time=120 and restarted my mongod
I reduced the max_time_ms (this made matters worse)
Here are the details of my setup:
Single mongod, no replication or sharding.
Both databases are on this server.
The Perl program iterates over the original database, does some processing on the data in each document, and writes to 3 collections in the new database.
Using MongoDB::Client to access the original database and Meerkat to write to the new database; write_safety is set to zero for both.
Not sure how to read this but here is a segment of mongostat from the time the errors were occurring:
insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn time
*0 *0 *0 *0 0 1|0 0.0 0.3 0 20.4G 9.4G 0|0 1|35 79b 15k 39 11:10:37
*0 3 8 *0 0 11|0 0.0 0.3 0 20.4G 9.4G 0|0 2|35 5k 18k 39 11:10:38
*0 3 1 *0 1 5|0 0.1 0.3 0 20.4G 9.4G 0|0 1|35 2k 15m 39 11:10:39
*0 12 4 *0 1 13|0 0.1 0.3 0 20.4G 9.4G 0|0 2|35 9k 577k 43 11:10:40
*0 3 1 *0 3 5|0 0.1 0.3 0 20.4G 9.4G 0|0 1|34 2k 10m 43 11:10:41
*0 3 8 *0 1 10|0 0.1 0.3 0 20.4G 9.4G 0|0 2|34 5k 2m 43 11:10:42
*0 9 24 *0 0 29|0 0.1 0.3 0 20.4G 9.4G 0|0 5|34 13k 24k 43 11:10:43
*0 3 8 *0 0 10|0 0.1 0.3 0 20.4G 9.4G 0|0 5|35 4k 12m 43 11:10:44
*0 3 8 *0 0 11|0 0.1 0.3 0 20.4G 9.4G 0|0 5|35 5k 12m 42 11:10:45
*0 *0 *0 *0 0 2|0 0.1 0.3 0 20.4G 9.3G 0|0 4|35 211b 12m 42 11:10:46
Please let me know if you would like to see any additional information to help me diagnose this problem.
Dropping the number of processes running in parallel down to 3 from 8 (or more) seems to cut down the number of timeout errors, but at the cost of throughput.
None of the tuning suggestions helped, nor did bulk inserts.
I continued to investigate, and the root of the problem was that my process was doing many $addToSet operations, which can become slow with large arrays. So I was tying up all available sockets with slow updates. I restructured my documents so that I would not use arrays that could become large, and I returned to an acceptable insert rate.
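For anyone hitting the same wall, here is the general shape of the problem and of one way to restructure, with made-up collection and field names rather than my real schema; the first form has to check the whole array for an existing value and rewrite an ever-growing document on every update, while the second keeps each element as its own small document:
// slow once seenBy grows large: duplicate check plus rewrite of a big document
db.events.update(
    { _id: 123 },
    { $addToSet: { seenBy: "user42" } }
)
// one small document per (event, user) pair instead, so each update stays cheap
db.event_views.update(
    { eventId: 123, userId: "user42" },
    { $set: { seenAt: new Date() } },
    { upsert: true }
)
This isn't the only possible restructuring, but anything that avoids unbounded arrays keeps individual updates small and fast.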