MongoDB degrading write performance over time

I am importing a lot of data (18 GB, 3 million documents) over time. Almost all of the data is indexed, so there is a lot of indexing going on. The system consists of a single client (a single process on a separate machine) establishing a single connection (using pymongo) and doing insertMany in batches of 1000 docs.
MongoDB setup:
single instance,
journaling enabled,
WiredTiger with default cache,
RHEL 7,
version 4.2.1
192GB RAM, 16 CPUs
1.5 TB SSD,
cloud machine.
When I start the server (after a full reboot) and insert the collection, it takes 1.5 hours. If the server has been running for a while inserting some other data (from a single client), and I then delete the collection and insert the same data again, it takes 6 hours (there is still sufficient disk space, more than 60% free, and nothing else is making connections to the db). It feels like the server's performance degrades over time; it may be OS specific. Any similar experience or ideas?
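One way to narrow this down is to time each batch instead of the whole import, so it becomes visible whether every batch is slower on the second run or whether throughput decays as the collection and its indexes grow. A minimal sketch, assuming the documents come from an iterable and the insert is passed in as a callable (with pymongo that could be `collection.insert_many`); the names `batched` and `timed_import` are made up for illustration and are not part of any library.

```python
import time
from itertools import islice


def batched(docs, size=1000):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def timed_import(docs, insert_batch, size=1000):
    """Insert documents in batches and return (batch_len, seconds) pairs.

    `insert_batch` is any callable that takes a list of documents. With
    pymongo it could be `collection.insert_many` (assumed wiring, shown
    here only as a hint).
    """
    timings = []
    for batch in batched(docs, size):
        start = time.perf_counter()
        insert_batch(batch)
        timings.append((len(batch), time.perf_counter() - start))
    return timings
```

Comparing the per-batch timings of the 1.5-hour run and the 6-hour run should show whether the slowdown is constant from the first batch or builds up during the import.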

I faced a similar issue; the problem was RAM.
After a full restart the server had all of its RAM free, but after the insertions the RAM was full. Dropping the collection and inserting the same data again might take longer because some RAM was still in use and less was free for mongod.
Try freeing up RAM and cache after you drop the collection, and check whether the same behaviour persists.
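To check whether memory really differs between the fast and the slow run, the kernel's own numbers can be recorded before each import. A small sketch, assuming a Linux host (the question mentions RHEL 7) where /proc/meminfo is readable; `parse_meminfo` and `read_meminfo` are illustrative helper names.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integers.

    Most values are reported in kB; a few counters such as
    HugePages_Total are plain numbers without a unit.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live memory statistics (Linux only)."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

Logging `MemFree`, `Cached` and, where the kernel provides it, `MemAvailable` right before both runs would confirm or rule out the RAM explanation.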

As you haven't provided any specific details, I would recommend you enable profiling; this will allow you to examine performance bottlenecks. At the mongo shell run:
db.setProfilingLevel(2)
Then run:
db.system.profile.find( { "millis": { "$gt": 10 } }, { "millis": 1, "command": 1 }) // find operations over 10 milliseconds
Once done, reset the profiling level:
db.setProfilingLevel(0)
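Since the import client in the question uses pymongo, the same three steps could be driven from Python. This is a sketch under the assumption that `db` behaves like a pymongo `Database` (it only needs `db.command(...)` and `db["system.profile"].find(...)`); the helper names `profiling` and `slow_ops` are invented for this example.

```python
from contextlib import contextmanager


@contextmanager
def profiling(db, level=2):
    """Turn the profiler on for the duration of a `with` block, then off."""
    db.command("profile", level)
    try:
        yield
    finally:
        # Always switch the profiler back off, even if the workload raised.
        db.command("profile", 0)


def slow_ops(db, min_millis=10):
    """Return profiled operations slower than `min_millis` milliseconds."""
    cursor = db["system.profile"].find(
        {"millis": {"$gt": min_millis}},
        {"millis": 1, "command": 1},
    )
    return list(cursor)
```

Usage would look like `with profiling(db): run_import()` followed by `slow_ops(db)`, where `run_import` stands for whatever workload is being examined. Level 2 records every operation, so it adds overhead of its own during a large import.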

Related

MongoDB: Queries running twice slow on NEW server compared to OLD server

I transferred current/old running DB into a new standalone server for MongoDB. To do this, I performed the following:
Took dump of data from OLD server
Restored data from the generated dump into NEW server
Configured the server for authentication
Issue:
I noticed that after performing the above, a few queries on the NEW server were running slowly, taking almost twice as long as they did on the OLD server.
Configurations:
The configurations of both servers are the same; however, the NEW server has 32 GB of RAM while the OLD server had 28 GB. The OLD server had other applications and servers running as well, while the NEW server is dedicated to this DB only.
CPU consumption is similar however RAM is heavily occupied in the OLD server while it is comparatively less occupied on NEW server.
Therefore, NEW server is better equipped in hardware and RAM consumption. Also NEW server is standalone dedicated to only this DB.
Question:
Why could my NEW server, even though it is standalone, be slower than the OLD one? How can I correct this?
MongoDB keeps the most recently used data in RAM. If you have created indexes for your queries and your working data set fits in RAM, MongoDB serves all queries from memory.
So your OLD DB has cached data in memory and whenever you are performing the query your results may be coming from memory.
Also, by default WiredTiger uses roughly 50% of memory for its internal cache (in MongoDB 3.4 and later: the larger of 50% of (RAM - 1 GB) or 256 MB).
Hence, the heavy memory consumption and better performance for your OLD DB.
Since you just restored the dump on your NEW DB, it doesn't have any recent data cached into the memory.
So whenever you perform a query it is hitting the disk and performing the I/O operations, which impacts your query performance.
If you fire a set of queries a couple of times, MongoDB will cache the data and indexes they touch, provided they fit in memory, and you will get better query performance on the NEW DB.
Here are some docs that may help you get more information
https://docs.mongodb.com/manual/core/wiredtiger/#wiredtiger-ram
https://docs.mongodb.com/manual/faq/fundamentals/#does-mongodb-handle-caching
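To put numbers on the cache size mentioned above, the documented default can be computed directly. A sketch assuming the rule stated in the WiredTiger documentation for MongoDB 3.4 and later (the larger of 50% of (RAM - 1 GB) or 256 MB) and assuming the setting has not been overridden with `storage.wiredTiger.engineConfig.cacheSizeGB`; the function name is illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes):
    """Default WiredTiger internal cache size for a given amount of RAM.

    Rule assumed here (MongoDB 3.4+ documentation):
    max(50% of (RAM - 1 GiB), 256 MiB).
    """
    return max((total_ram_bytes - GIB) // 2, 256 * MIB)


if __name__ == "__main__":
    for ram_gib in (28, 32):
        cache = default_wiredtiger_cache_bytes(ram_gib * GIB)
        print(f"{ram_gib} GiB RAM -> {cache / GIB:.1f} GiB cache by default")
```

Under that rule the 28 GB and 32 GB machines differ by only about 2 GB of default cache, which is consistent with the answer's point that a cold cache, not the hardware, explains the slower NEW server.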

MongoDB: blocked queries during write operation in a replica set

I am using MongoDB (3.0) with a replica set of 3 servers. I have been experiencing very slow queries for a week, and I have tried to find out what is wrong on my servers.
By using the db.currentOp() command I can see that queries are sometimes blocked on the secondaries when a "replication worker" is running. All the queries are waiting for a lock ("waitingForLock" : true), and it seems that the replication worker has taken this lock and has been running for several minutes (which seems pretty long).
To be more specific about my use case, I have multiple databases in the replica set, all containing the same collections but not the same amount of data (I use one database per client).
I use WiredTiger as the storage engine, which normally (as the docs claim) does not use global locks. So I was expecting queries on a specific collection to be slow while that collection is being updated, but I was not expecting all queries to be slow or blocked.
Has anyone experienced the same issue? Is there some limitation in MongoDB when reads are performed while other processes write to the database?
Furthermore, is there a way to tell MongoDB that I don't care about consistency for read operations (in order to avoid locks)?
Thanks.
Update :
Restarting the servers made the problems disappear. It seems that memory and CPU usage were growing (but were still very low) and that this led to a slow replication process, which held a lock and prevented queries from executing.
I still don't understand why we have this problem on this database. Maybe version 3.0.9 has a bug (I will upgrade to 3.0.12). Still, it takes one month for the database to become very slow, and only a restart of all the servers solves the problem. Our workload is mainly writes (with findAndModify). Does anyone know of a bug in MongoDB where intensive writes lead to performance decreasing over time?
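The db.currentOp() inspection described in the question can be automated so the blocked operations and the long-running operation holding the lock are listed side by side. A sketch that works on the document shape db.currentOp() returns (an "inprog" array whose entries carry fields such as "opid", "active", "secs_running" and "waitingForLock"); how that document is fetched is left out, and the function names are illustrative.

```python
def blocked_ops(current_op, min_secs=0):
    """Return operations waiting for a lock, longest-running first."""
    waiting = [
        op for op in current_op.get("inprog", [])
        if op.get("waitingForLock") and op.get("secs_running", 0) >= min_secs
    ]
    return sorted(waiting, key=lambda op: op.get("secs_running", 0), reverse=True)


def long_running(current_op, min_secs=60):
    """Return active operations running for at least `min_secs` seconds."""
    return [
        op for op in current_op.get("inprog", [])
        if op.get("active") and op.get("secs_running", 0) >= min_secs
    ]
```

Sampling this every few seconds on a secondary and logging both lists would show whether the same replication operation is the long runner each time queries pile up.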

Memory usage remains 99 % even after create index is done on MongoDB Collection

We are building indexes on a MongoDB collection. We have nearly 350 GB of data in the database, and it is deployed as a Windows service on AWS EC2.
We are doing the indexing for some experimentation. But every time I run the indexing command, memory usage goes to 99%, and even after the indexing is done it stays there until I restart the service.
The instance has 30 GB of RAM and an SSD drive. Right now the DB is set up as a standalone (not sharded yet), and we are using the latest version of MongoDB.
Any feedback related to this will be helpful.
Thanks,
Arpan
That's normal behavior for MongoDB.
MongoDB grabs all the RAM it can get to cache each accessed document as long as possible. When you add an index to a collection, each document needs to be read once to build the index, which causes MongoDB to load each document into RAM. It then keeps them in RAM in case you want to access them later. But MongoDB will not squat the RAM. When another process needs memory, MongoDB will willingly release it.
This is explained in the FAQ:
Does MongoDB require a lot of RAM?
Not necessarily. It’s certainly
possible to run MongoDB on a machine with a small amount of free RAM.
MongoDB automatically uses all free memory on the machine as its
cache. System resource monitors show that MongoDB uses a lot of
memory, but its usage is dynamic. If another process suddenly needs
half the server’s RAM, MongoDB will yield cached memory to the other
process.
Technically, the operating system’s virtual memory subsystem manages
MongoDB’s memory. This means that MongoDB will use as much free memory
as it can, swapping to disk as needed. Deployments with enough memory
to fit the application’s working data set in RAM will achieve the best
performance.
See also: FAQ: MongoDB Diagnostics for answers to additional questions
about MongoDB and Memory use.

MongoDB Stops Responding During Background Flush

Mongodb Background Flushing blocks all the requests:
Server: Windows server 2008 R2
CPU Usage: 10 %
Memory: 64G, Used 7%, 250MB for Mongod
Disk % Read/Write Time: less than 5% (According to Perfmon)
Mongodb Version: 2.4.6
Mongostat Normally:
insert:509 query:608 update:331 delete:*0 command:852|0 flushes:0 mapped:63.1g vsize:127g faults:6449 locked db:Radius:12.0%
Mongostat Before(maybe while) Flushing:
insert:1 query:4 update:3 delete:*0 command:7|0 flushes:0 mapped:63.1g vsize:127g faults:313 locked db:local:0.0%
And Mongostat After Flushing:
insert:1572 query:1849 update:1028 delete:*0 command:2673|0 flushes:1 mapped:63.1g vsize:127g faults:21065 locked db:.:99.0%
As you can see, when a flush happens the lock is at 99%, and at that point mongod stops responding to any read/write operation (mongotop and mongostat also stop). The flush takes about 7 to 8 seconds to complete and does not increase disk load beyond 10%.
Are there any suggestions?
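To correlate the stalls with flushes over a longer period, the mongostat samples can be parsed and checked automatically. A sketch that parses lines in the `key:value` form pasted above (not mongostat's native column layout); `parse_mongostat_line` and `is_lock_saturated` are illustrative names, and the 90% threshold is an arbitrary assumption.

```python
import re

# "locked db:Radius:12.0%" has a space in its key, so it is matched first.
LOCK_RE = re.compile(r"locked db:(?P<db>\S+):(?P<pct>[\d.]+)%")
FIELD_RE = re.compile(r"(?P<key>\w+):(?P<value>\S+)")


def parse_mongostat_line(line):
    """Parse one 'key:value ...' sample into a dict (values kept as strings)."""
    stats = {}
    lock = LOCK_RE.search(line)
    if lock:
        stats["locked_db"] = lock.group("db")
        stats["locked_pct"] = float(lock.group("pct"))
        line = line[:lock.start()]  # keep only the plain key:value fields
    for match in FIELD_RE.finditer(line):
        stats[match.group("key")] = match.group("value")
    return stats


def is_lock_saturated(stats, threshold=90.0):
    """True if the reported lock percentage is at or above `threshold`."""
    return stats.get("locked_pct", 0.0) >= threshold
```

Flagging every sample where `flushes` is non-zero and the lock is saturated would show how regularly the roughly 60-second flush cycle lines up with the pauses.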
Under Windows server 2008 R2 (and other versions of Windows I would suspect, although I don't know for sure), MongoDB's (2.4 and older) background flush process imposes a global lock, doing substantial blocking of reads and writes, and the length of the flush time tends to be proportional to the amount of memory MongoDB is using (both resident and system cache for memory-mapped files), even if very little actual write activity is going on. This is a phenomenon we ran into at our shop.
In one replica set where we were using MongoDB version 2.2.2, on a host with some 128 GBs of RAM, when most of the RAM was in use either as resident memory or as standby system cache, the flush time was reliably between 10 and 15 seconds under almost no load and could go as high as 30 to 40 seconds under load. This could cause Mongo to go into long pauses of unresponsiveness every minute. Our storage did not show signs of being stressed.
The basic problem, it seems, is that Windows handles flushing to memory-mapped files differently than Linux. Apparently, the process is synchronous under Windows and this has a number of side effects, although I don't understand the technical details well enough to comment.
MongoDB, Inc., is aware of this issue and is working on optimizations to address it. The problem is documented in a couple of tickets:
https://jira.mongodb.org/browse/SERVER-13444
https://jira.mongodb.org/browse/SERVER-12401
What to do?
The phenomenon is tied, to some degree, to the minimum latency of the disk subsystem as measured under low stress, so you might try experimenting with faster disks, if you can. Some improvements have been reported with this approach.
A strategy that worked for us in some limited degree is avoiding provisioning too much RAM. It happened that we really didn't need 128 GBs of RAM, so by dialing back on the RAM, we were able to reduce the flush time. Naturally, that wouldn't work for everyone.
The latest versions of MongoDB (2.6.0 and later) seem to handle the
situation better in that writes are still blocked during the long
flush but reads are able to proceed.
If you are working with a sharded cluster, you could try dividing the RAM by putting multiple shards on the same host. We didn't try this ourselves, but it seems like it might have worked. On the other hand, careful design and testing would be highly recommended in any such scenario to avoid compromising performance and/or high availability
We tried playing with syncdelay. Reducing it didn't help (the long flush times just happened more frequently). Increasing it helped a little (there was more time between flushes to get work done), but increasing it too much can exacerbate the problem severely. We boosted the syncdelay to five minutes (300 seconds), at one point, and were rewarded with a background flush of 20 minutes.
Some optimizations are in the works at MongoDB, Inc. These may be available soon.
In our case, to relieve the pressure on the primary host, we periodically rebooted one of the secondaries (clearing all memory) and then failed over to it. Naturally, there is some performance hit due to re-caching, and I think this only worked for us because our workload is write-heavy. Moreover, this technique is not in any sense a solution. But if high flush times are causing serious disruption, this may be one way to "reduce the fever", so to speak.
Consider running on Linux... :-)
Background flush by default does not block reads/writes. mongod flushes every 60s unless otherwise specified with the --syncdelay parameter. The flush uses an fsync() operation, which can be set to block writes while in-memory pages are flushed to disk. A blocked write could potentially block reads as well. Read more: http://docs.mongodb.org/manual/reference/command/fsync/
However, normally a flush should not take more than 1000ms (1 second). If it does, it is likely the amount of data flushing to disk is too large for your disk to handle.
Solution: upgrade to a faster disk like SSD, or decrease flush interval (try 30s, rather than the default 60s).
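The one-second rule of thumb can be checked against the server's own flush statistics. A sketch assuming an MMAPv1-era mongod (such as the 2.4.6 in the question) whose serverStatus output includes a `backgroundFlushing` section with `flushes`, `average_ms` and `last_ms`; the function name and the way the serverStatus document is obtained are left as assumptions.

```python
def flush_report(server_status, max_ms=1000):
    """Summarise background flush timings from a serverStatus document.

    Returns None when the document has no `backgroundFlushing` section,
    which is the case for storage engines that do not report it.
    """
    flushing = server_status.get("backgroundFlushing")
    if flushing is None:
        return None
    return {
        "flushes": flushing["flushes"],
        "average_ms": flushing["average_ms"],
        "last_ms": flushing["last_ms"],
        # Flag flushes that exceed the rule-of-thumb limit.
        "slow": flushing["last_ms"] > max_ms,
    }
```

A `last_ms` of 7000 to 8000, as described in the question, would be flagged as slow by this check.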

memory-safe mongodb tasks

MongoDB is running out of memory on a 96GB root server when adding a single index on a timestamp field for a 50GB collection.
Does MongoDB have any option to run a query or task in "safe-mode", e.g. without cutting the memory too much? It seems to be very touchy and can be crashed, e.g. by running some find queries with $lte/$gt on a non-indexed timestamp field.
I can't control it, but shouldn't there be a MongoDB config setting for "safety" which makes sure RAM is released once it exceeds a limit? Maybe even before it blocks other processes or is stopped by the OOM killer?
MongoDB does not use its own memory management. Instead it relies on the OS's LRU page cache. The OS is paging documents so heavily because mongod has used up the memory available to it: your working set is bigger than the RAM you have to spare for MongoDB, so MongoDB is page-faulting for most if not all of your data (a good reference for paging: http://en.wikipedia.org/wiki/Paging).
I would strongly recommend against restricting MongoDB in this case, since it will run even worse. However, especially on Linux, you can use ulimit on the user that runs mongod: http://docs.mongodb.org/manual/administration/ulimit/
Does MongoDB have any option to run a query or task in "safe-mode", e.g. without cutting the memory too much?
Not really.
It seems to be very touchy and can be crashed, e.g. by running some find queries with $lte/$gt on a non-indexed timestamp field.
Naturally this shouldn't cause an OOM exception for MongoDB, it could indicate a memory leak somewhere: http://docs.mongodb.org/manual/administration/ulimit/
If you limit the resident memory size on a system running MongoDB you risk allowing the operating system to terminate the mongod process under normal situations. Do not set this value. If the operating system (i.e. Linux) kills your mongod, with the OOM killer, check the output of serverStatus and ensure MongoDB is not leaking memory.
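The serverStatus check mentioned in the quote could be done by sampling the `mem` section over time. A rough sketch, assuming each sample is a serverStatus document whose `mem.resident` field is reported in megabytes; steady growth alone does not prove a leak (a warming cache looks the same), so this is only a first indicator, and the function names are made up.

```python
def resident_series(samples):
    """Extract resident memory (MB) from a list of serverStatus documents."""
    return [sample["mem"]["resident"] for sample in samples]


def resident_growth_mb(samples):
    """Growth in resident memory between the first and the last sample."""
    series = resident_series(samples)
    if len(series) < 2:
        return 0
    return series[-1] - series[0]


def keeps_growing(samples):
    """True if resident memory increased between every pair of samples."""
    series = resident_series(samples)
    return len(series) >= 2 and all(b > a for a, b in zip(series, series[1:]))
```

Samples taken at a fixed interval while the index build or the unindexed find is running would show whether resident memory levels off or climbs until the OOM killer steps in.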
It seems to be very touchy and can be crashed e.g. by running some find queries with $lte/$gt on a non-indexed timestamp field.
It's the OOM killer that's killing it because your mongod instance is swapping a lot of pages into RAM. You probably have a lot of processes contending for RAM. You can instruct Linux not to kill the mongod daemon as follows:
echo -17 | sudo tee /proc/<process id of mongod>/oom_adj
You can't control how much memory mongodb uses, unfortunately. I suggest looking at the background indexing docs on mongodb. And some more useful links :
See the related thread on stackoverflow
How do i limit the cache size?