MongoDB high CPU usage / long read time

I am a new user of MongoDB, and I am hoping to get pointed in the right direction. I will provide any further information I have missed as this question develops.
I am using a Perl program to upload documents to, and annotate/modify documents in, a MongoDB database via the MongoDB CPAN module. Indexes are (I believe) being used by this program, but the problem I have is that reading from MongoDB takes increasingly long. Based on mongotop, it takes ~500 ms to read and only 10-15 ms to write. After allowing the program to run for a considerable amount of time, the read time increases significantly, taking more than 3000 ms after many hours of running.
Monitoring the program while it's running using top, Perl starts out at around 10-20% CPU usage and MongoDB starts at 70-90% CPU usage. Within a few minutes of running, Perl drops below 5% and MongoDB sits at 90-95%. After running for a much longer period of time (12+ hours), MongoDB is at ~98% CPU usage while Perl is around 0.3%, and only pops up every 5-10 seconds in top.
Based on this trend, an indexing issue seems very likely, but I am not sure how to check this; all I know is that the appropriate indexes have at least been created, though they are not necessarily being used.
Additional information:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 19209
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 19209
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
As the program runs, I see that the indexSize and dataSize change (via db.stats() in the Mongo shell), which makes me think the indexes are at least being used to some degree.
Is this something that could be affected by the power of my computer? I am under the impression that indexing should make a lot of this process very manageable for the computer.

That sounds a lot like it could be doing a collection scan rather than using the index, i.e. as your collection grows, the reads get slower.
If you're using the find method, you can run explain on the resulting cursor to get information on how the query would execute.
Here's a trivial example:
use MongoDB;
use JSON::MaybeXS;

# Connect to the "test.foo" namespace and start from an empty collection.
my $coll = MongoDB->connect->ns("test.foo");
$coll->drop();

# Insert 1000 small documents, then query for one of them.
$coll->insert_one({x => $_}) for 1 .. 1000;
my $cursor = $coll->find({x => 42});

# Ask the server how it would execute the query.
my $diag = $cursor->explain;

# Pretty-print just the winning plan.
my $json = JSON::MaybeXS->new(
    allow_blessed => 1, convert_blessed => 1, pretty => 1, canonical => 1
);
print $json->encode($diag->{queryPlanner}{winningPlan});
Looking at just the 'winningPlan' part of the output you can see 'COLLSCAN':
{
    "direction" : "forward",
    "filter" : {
        "x" : {
            "$eq" : 42
        }
    },
    "stage" : "COLLSCAN"
}
Now I'll do it again, but first creating an index on 'x' before the insertions with $coll->indexes->create_one([x => 1]). You can see in the output that the query plan is now using the index (IXSCAN).
{
    "inputStage" : {
        "direction" : "forward",
        "indexBounds" : {
            "x" : [
                "[42, 42]"
            ]
        },
        "indexName" : "x_1",
        "indexVersion" : 2,
        "isMultiKey" : false,
        "isPartial" : false,
        "isSparse" : false,
        "isUnique" : false,
        "keyPattern" : {
            "x" : 1
        },
        "multiKeyPaths" : {
            "x" : []
        },
        "stage" : "IXSCAN"
    },
    "stage" : "FETCH"
}
There's a lot more you can discover from the full 'explain' output. You can watch a great video from MongoDB World 2016 to learn more about it: Deciphering Explain Output.
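If you want to check directly from the mongo shell instead of going through Perl, the same information is available there. A minimal sketch, where the collection and field names are placeholders for your own:

// Placeholder names; substitute your own collection and query.
db.mycollection.find({ x : 42 }).explain("executionStats")
// In the output, look at queryPlanner.winningPlan:
//   IXSCAN   - the query uses an index
//   COLLSCAN - the whole collection is scanned
// Comparing executionStats.totalDocsExamined with executionStats.nReturned
// also shows how much extra work each query is doing.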

Related

Mongo is logging slow queries but the profiling level is zero

I'm using MongoDB CE 4.4.
I have a 2.4 GB mongod.log file.
I executed
db.setProfilingLevel(0);
I obtained
{
    "was" : 0.0,
    "slowms" : 100.0,
    "sampleRate" : 1.0,
    "ok" : 1.0
}
But it is still logging slow queries... why?
Setting the profiling level to zero turns off recording data to the system.profile collection, but mongod will continue to log to the log file any operation slower than slowms. You cannot stop logging slow operations entirely, but you can set slowms to a very high value, e.g. db.setProfilingLevel(0, 5000000), which may have the same effect. In this example it will only log an operation if it takes more than 5,000,000 ms (i.e., 5,000 seconds).
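As a minimal sketch of that suggestion (the threshold value here is just an example), from the mongo shell:

// Keep the profiler off, but raise the slow-op threshold (in milliseconds)
// so that almost nothing gets written to the log.
db.setProfilingLevel(0, 5000000)

// Confirm the current settings.
db.getProfilingStatus()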

Mongos memory usage constantly increasing

We chose to deploy the mongos router in the same VM as our applications, but we're running into issues where the application gets OOM-killed because the mongos eats up a lot more RAM than we'd expect or want it to.
After a reboot, the mongos footprint is a bit under 2 GB, but from there it constantly requires more memory, about 500 MB per week. It went up to 4.5+ GB.
These are the stats for one of our mongos routers for the past 2 weeks, and it clearly looks like it's leaking memory...
So my question is: how do I investigate such behavior? We've not really been able to find explanations for why the router might require more RAM, or how to diagnose the behavior, or even how to set a memory usage limit on the mongos.
With db.serverStatus() on the mongos we can see the allocations:
"tcmalloc" : {
"generic" : {
"current_allocated_bytes" : 536925728,
"heap_size" : NumberLong("2530185216")
},
"tcmalloc" : {
"pageheap_free_bytes" : 848211968,
"pageheap_unmapped_bytes" : 213700608,
"max_total_thread_cache_bytes" : NumberLong(1073741824),
"current_total_thread_cache_bytes" : 819058352,
"total_free_bytes" : 931346912,
"central_cache_free_bytes" : 108358128,
"transfer_cache_free_bytes" : 3930432,
"thread_cache_free_bytes" : 819058352,
"aggressive_memory_decommit" : 0,
"pageheap_committed_bytes" : NumberLong("2316484608"),
"pageheap_scavenge_count" : 35286,
"pageheap_commit_count" : 64660,
"pageheap_total_commit_bytes" : NumberLong("28015460352"),
"pageheap_decommit_count" : 35286,
"pageheap_total_decommit_bytes" : NumberLong("25698975744"),
"pageheap_reserve_count" : 513,
"pageheap_total_reserve_bytes" : NumberLong("2530185216"),
"spinlock_total_delay_ns" : NumberLong("38522661660"),
"release_rate" : 1
}
},
------------------------------------------------
MALLOC: 536926304 ( 512.1 MiB) Bytes in use by application
MALLOC: + 848211968 ( 808.9 MiB) Bytes in page heap freelist
MALLOC: + 108358128 ( 103.3 MiB) Bytes in central cache freelist
MALLOC: + 3930432 ( 3.7 MiB) Bytes in transfer cache freelist
MALLOC: + 819057776 ( 781.1 MiB) Bytes in thread cache freelists
MALLOC: + 12411136 ( 11.8 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 2328895744 ( 2221.0 MiB) Actual memory used (physical + swap)
MALLOC: + 213700608 ( 203.8 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 2542596352 ( 2424.8 MiB) Virtual address space used
MALLOC:
MALLOC: 127967 Spans in use
MALLOC: 73 Thread heaps in use
MALLOC: 4096 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
But I can't say it's really helpful, at least to me.
In the server stats we can also see that the number of calls to killCursors is quite high (2909015), but I'm not sure how that would explain the steady increase in memory usage, since cursors are automatically killed after roughly 30 seconds and the number of calls made to the mongos is pretty much steady throughout the period.
So yeah, any idea on how to diagnose this, where to look, or what to look for?
Mongos version: 4.0.19
Edit: it seems our monitoring is based on virtual (VIRT) rather than resident (RES) memory, so the graph might not be very pertinent. However, we still ended up with 4+ GB of RES memory at some point.
Why would the router require more memory?
If there is any query in the sharded cluster where the system needs to do a scatter-gather, the merging activity is handled by the mongos itself.
For example, say I am running a query db.collectionname.find({something : 1}).
If this something field is not the shard key itself, then by default it will do a scatter-gather; use the explain plan to check the query. It does a scatter-gather because the mongos consults the config server and realises it has no routing information for that field. (This applies to a sharded collection.)
To make things worse, if you have sorting operations where an index cannot be used, even the sort now has to be done on the mongos itself. A sort has to hold enough memory to bring the pages together for the volume of data being sorted before the sort can run, and that memory stays blocked until the operation completes.
What should you do?
Based on your settings (your slowms setting, default 100 ms), check the logs and take a look at the slow queries in your system. If you see a lot of SHARD_MERGE stages and in-memory sorts taking place, then you have your culprit right there; see the sketch below.
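A rough sketch of that check from the mongos shell (collection and field names are placeholders for your own):

// Run against the mongos; placeholder names.
db.collectionname.find({ something : 1 }).sort({ someOtherField : 1 }).explain()
// In the output, a SHARD_MERGE (or SHARD_MERGE_SORT) stage means the mongos is
// merging results from several shards itself; a SORT stage means an in-memory
// sort for which no usable index was found.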
As a quick fix, increase the available swap space and make sure your settings are appropriate.
All the best.
Without access to the machine one can only speculate; with that said, I want to say this is somewhat expected behaviour from Mongo.
Mongo likes memory: it keeps many things in RAM in order to increase performance.
Many different things can cause this; I'll list a few:
MongoDB uses RAM to handle open connections, aggregations, server-side code, open cursors and more.
WiredTiger keeps multiple versions of records in its cache (Multi-Version Concurrency Control: read operations access the last committed version before their operation).
WiredTiger keeps checksums of the data in cache.
There are many more things that are cached / stored in memory by Mongo; one such example is the index tree for a collection.
If Mongo has the memory to store something, it will; that's why the RAM usage increases as you use it more and more. However, I personally do not think it's a memory leak. As I said, Mongo just likes RAM, a lot.
We chose to deploy the mongos router in the same VM as our applications.
This is a big no-no from personal experience; in general, because of Mongo's memory hunger, I would try to avoid this if possible.
To summarize, I don't think you have a memory leak (although it is possible); I just think that as more time passes, Mongo stores more things in RAM as they are being used.
You should be on the lookout for long-running queries, as those are the most likely culprit IMO; a quick way to spot them from the shell is sketched below.
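A minimal sketch for spotting long-running operations from the mongo shell (the 5-second threshold is just an example):

// List operations that have been active for more than 5 seconds.
db.currentOp({ "active" : true, "secs_running" : { "$gt" : 5 } })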

MongoDB read locks

I have a MongoDB collection with a custom _id and 500M+ documents. The size of the _id index is ≈25 GB and the whole collection is ≈125 GB. The server has 96 GB of RAM. Read activity consists only of range queries by _id. explain() shows that the queries use the index. Mongo works rather fast for some time after the load tests start and then slows down after a while. I can see a lot of entries like this in the log:
[conn116] getmore csdb.archive query: { _id: { $gt: 2812719756651008, $lt: 2812720361451008 } } cursorid:444942282445272280 ntoreturn:0 keyUpdates:0 numYields: 748 locks(micros) r:7885031 nreturned:40302 reslen:1047872 10329ms
A piece of db.currentOp():
"waitingForLock" : false,
"numYields" : 193,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(869051),
"w" : NumberLong(0)
},
"timeAcquiringMicros" : {
"r" : NumberLong(1369404),
"w" : NumberLong(0)
}
}
What is locks(micros) r? What can I do to cut it down?
What is locks(micros) r?
The amount of time that read locks were held (in microseconds).
R - Global read lock
W - Global write lock
r - Database specific read lock
w - Database specific write lock
What can I do to cut it down?
How does sharding affect concurrency?
Sharding improves concurrency by distributing collections over multiple mongod instances, allowing shard servers (i.e. mongos processes) to perform any number of operations concurrently to the various downstream mongod instances.
Diagnosing Performance Issues (Locks)
MongoDB uses a locking system to ensure data set consistency. However, if certain operations are long-running, or a queue forms, performance will slow as requests and operations wait for the lock. Lock-related slowdowns can be intermittent. To see if the lock has been affecting your performance, look to the data in the globalLock section of the serverStatus output. If globalLock.currentQueue.total is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that may be affecting performance.
If globalLock.totalTime is high relative to uptime, the database has existed in a lock state for a significant amount of time. If globalLock.ratio is also high, MongoDB has likely been processing a large number of long running queries. Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults and disk reads.
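As a small sketch, you can pull those globalLock numbers from the mongo shell and watch them while the load test runs:

// Queued and active readers/writers; a consistently high currentQueue.total
// means operations are waiting on locks.
var s = db.serverStatus();
printjson(s.globalLock.currentQueue);
printjson(s.globalLock.activeClients);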
How We Scale MongoDB (Vertically)
Sadly, MongoDB itself will usually become a bottleneck before the capacity of a server is exhausted. Write lock is almost always the biggest problem (though there are practical limits to how much IO capacity a single MongoDB process can take advantage of).

Why is MongoDB storage constantly increasing?

I have a single-host database which grew to 95% of disk space while I was not watching. To remedy the situation I created a process that automatically removes old records from the biggest collection, so data usage fell to about 40% of disk space. I figured I was safe as long as the data size didn't grow near the size of the preallocated files, but after a week I was proven wrong:
Wed Jan 23 18:19:22 [FileAllocator] allocating new datafile /var/lib/mongodb/xxx.101, filling with zeroes...
Wed Jan 23 18:25:11 [FileAllocator] done allocating datafile /var/lib/mongodb/xxx.101, size: 2047MB, took 347.8 secs
Wed Jan 23 18:25:14 [conn4243] serverStatus was very slow: { after basic: 0, middle of mem: 590, after mem: 590, after connections: 590, after extra info: 970, after counters: 970, after repl: 970, after asserts: 970, after dur: 1800, at end: 1800 }
This is the output of db.stats() (note that the numbers are in MB because of the scale argument):
> db.stats(1024*1024)
{
    "db" : "xxx",
    "collections" : 47,
    "objects" : 189307130,
    "avgObjSize" : 509.94713418348266,
    "dataSize" : 92064,
    "storageSize" : 131763,
    "numExtents" : 257,
    "indexes" : 78,
    "indexSize" : 29078,
    "fileSize" : 200543,
    "nsSizeMB" : 16,
    "ok" : 1
}
Question: What can I do to stop MongoDB from allocating new datafiles?
Running repair is difficult because I would have to install a new disk. Would running compact help? If yes, should I be running it regularly, and how can I tell when I should run it?
UPDATE: I guess I am missing something fundamental here... Could someone please elaborate on the connection between data files, extents, collections and databases, and how space is allocated when needed?
Upgrade to 2.2.2: 2.2.0 has an idempotency bug in replication and is no longer recommended for production.
See here for general info http://docs.mongodb.org/manual/faq/storage/#faq-disk-size
The only ways to recover space back from MongoDB are either to sync a new node over the network, in which case the documents are copied over to the new file system and stored anew without fragmentation, or to use the repair command, but for that you need double the disk space that you are currently using. With repair, the data files are copied, defragged, compacted and copied back over the originals. The compact command is badly named and only defrags: it doesn't give disk space back from Mongo to the OS.
Going forward, use the usePowerOf2Sizes option of the collMod command (new in 2.2.x): http://docs.mongodb.org/manual/reference/command/collMod/
If you use that option and insert, say, an 800-byte document, 1024 bytes will be allocated on disk. If you then delete that doc and insert a new one of, say, 900 bytes, it can fit in the 1024-byte space. Without this option enabled, the 800-byte doc might only have 850 bytes on disk, so when it's deleted and the 900-byte doc is inserted, new space has to be allocated. And if that is then deleted, you end up with two free spaces, 850 bytes and 950 bytes, which are never joined (unless compact or repair is used), so inserting a 1000-byte doc means allocating yet another chunk of disk. usePowerOf2Sizes helps this situation a lot by using standard bucket sizes.
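A sketch of those commands from the mongo shell, assuming the big collection is called "mycoll" (substitute your own name):

// Switch new record allocations to power-of-2 sizes (MongoDB 2.2+).
db.runCommand({ collMod : "mycoll", usePowerOf2Sizes : true })

// Defragment the collection in place; note this does NOT return space to the OS.
db.runCommand({ compact : "mycoll" })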

MongoDB not enough storage, 32 bit

I'm getting the "not enough storage" error when trying to insert data into my MongoDB. But I'm nowhere near the size limit, as seen in the stats.
> db.stats()
{
    "db" : "{my db name}",
    "collections" : 20,
    "objects" : 281092,
    "avgObjSize" : 806.4220539894412,
    "dataSize" : 226678788,
    "storageSize" : 470056960,
    "numExtents" : 95,
    "indexes" : 18,
    "indexSize" : 13891024,
    "fileSize" : 1056702464,
    "nsSizeMB" : 16,
    "ok" : 1
}
Journaling is on, but the journal file is only 262232 KB.
A data file of 524032 KB has been created, although dataSize is below even the smaller 262144 KB file.
The NS file is 16384 KB.
I've read in several places that this error is caused by the roughly 2 GB size limit on 32-bit builds, but then why am I getting this error when my dataSize is below that?
First of all, the constraint applies to fileSize, since the storage engine uses memory-mapped files. This is currently at about 1 GB for you. What is likely is that MongoDB is about to preallocate a new data file, which is probably going to be 1 GB in size (the default sizes are 64 MB, 128 MB, 256 MB, 512 MB, 1 GB, and from there on 2 GB per additional data file). You're probably at the point where you have the 512 MB file but not yet the 1 GB one.
Frankly, I think using MongoDB in a 32-bit environment is an absolute no-go, but if you're stuck with it you can try the --smallfiles option, which allocates smaller files of at most 512 MB.
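For example (the dbpath here is a placeholder; adjust it to your own setup), you could start mongod with:

mongod --dbpath /var/lib/mongodb --smallfiles

or put the equivalent setting in the old-style config file your init script uses:

smallfiles = true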