Overview:
We are comparing performance of create/read/write/rw over two different architectures: Single Database vs Multiple Databases (15k-25k).
We prefer to use the Multi-DB architecture because that makes it easier to separate customers (customer = 1 company). However, due to performance degradation we fear this may not be a good solution.
Server Specification:
Single instance MongoDB server; 64GB RAM; 16 core; SSD HD
Test results:
Both test scenarios have the same total number of documents (and documents are roughly the same size). The variables are number of databases, collections per database and documents per collection.
All tests are conducted in parallel using 50 client threads (separate machine), with the exception of Read/Write, which uses 100 (50R/50W). 'directoryPerDB' is enabled.
(All times are in milliseconds per doc operation)
Test Creation Read Write Read/Write Notes
25000 DB 4 Coll 250 Doc 23ms 1-10ms 1-4ms 2-10ms Max 1400% CPU, noticeable "pauses" (CPU drops to 100%)
15000 DB 4 Coll 420 Doc 23ms 0.7-4ms 0.9-4ms 2-9ms Max 1400% CPU, noticeable "pauses" (CPU drops to 100%)
1 DB 4 Coll 125000 Doc 0.8ms 0.6ms 0.8ms 1.2-1.6ms Max 600% CPU, no pauses
Conclusion:
There seems to be noticeable performance degradation on a regular interval when the DB count is very high. It may be due to the sheer number of files (25000 DBs * 4 Colls * 2 files = 200k files) or some other bottleneck.
In the Single-DB test, the CPU stays around 600% and maintains that until completion. In the multi DB tests, the CPU (at peak performance) is somewhere between 800-1400%, but every so often the CPU drops to 100% and all operations are paused. This can be verified by watching the mongo log, as well as the logs from the test clients that are issuing R/W commands.
If it wasn't for these pauses, the Multi-DB architecture would be ~2x faster than Single-DB, however it appears there is some global contention that cannot be avoided.
I'm hoping someone might know what this global contention is and (if possible) how to solve it.
Related
What is the maximum theoretical number of parallel requests that we can squize from single mongodb instance before deciding to shard?
Considering the database and indexes fit in memory and all requests are find() queries fetching single document based on indexed field. The hosting OS is Ubuntu , the data partition is SSD. ulimits are set to max.
In my laptop with simple test on single instance I reach near 40k/sec , after that the avg execution times start to increase significantly, but wondering what can be the upper theoretical limit?
It depends. If your active dataset can fit in the memory - if most of the requests don't need to perform any disk I/O - then you can achieve 24k+ requests pretty easily. If not on a (bigger) single machine, then at least use a replica set cluster with multiple secondaries.
If an active dataset is much larger than the available RAM then you have the same problem as with any other database. The advantage of MongoDB's new engine WiredTiger (since v3.0) is a transparent compression - it can reduce the amount of data and I/O and thus improve performance - even despite the fact that compression adds CPU load.
For more performance it really helps:
if the most accessed documents are small so it takes less time to
load them, transfer them, and less time to deserialize in your app List item
If you use projections in find(), for the same reasons
if you use bulk operations to reduce networking I/O and context switches
Even MongoDB itself has an option to limit the maximum number of incoming connections. It defaults to 64k.
for more information you can refer link
We need to store 1 billion documents of 1KB each. Each shard is planned to have 8GB of RAM. The platform is Open Shift Red Hat Linux.
Initially we had 10 shards for 300 million. We started inserting documents with 2000 inserts/second. Everything went well till 250 million. After that the insert slowed down drastically to 300/400 insert per second.
The queries are also taking long time (more than 1 minute) even all the queries are covered queries.(Queries which need to scan all the indexes).
Hence we assumed, that 20 million per shard is the optimal value and hence we require 50 shards for the current hardware to achieve 1 billion.
Is this reasonable estimate or we can improve it (less shards) by tweaking mongo db parameters for better performance with the current hardware?
There are two compound indexes and one unique index(long).insertion is done using bulk write( with unordered option) with 10 threads and 200 records per (thread) bulk write using java script directly on the mongos.Shardkey is nodeId(prefix of compound index) which has cardinality upto 10k. For 300 million, the total index size comes to 45 GB.40 GB for the 2 compound indexes.Almost 9500 chunks are distributed across 10 nodes.One interesting fact is that if I increase RAM to 12 GB, the speed increases to 1500 inserts/sec.Is RAM limiting factor?
Update:
Using mongostat tool, we found that the flush(fysnc) takes more than 55 seconds to complete.MongoDB cluster runs on kubernetes based on RedHat OpenShift platform. It runs on Dell EMC server with NFS (EXT4 disk format).Is it a problem in the I/O that it supports only 2MB/second. It takes 60 seconds to write 2000 records per second and another 55 seconds to flush completely to disk.(during which all the operations of DB are blocked)
The disk utilization does not even reach 4 %.
Have you tried not sharding at all?
There's a common tendency to shard prematurely. I've seen a MongoDB consultant who suggested a rule of thumb, to not shard until your total data size is at least 2 TB. Your 1B documents of 1KB each should be around 1 TB. While it's only a rule of thumb, maybe it's worth trying.
If nothing else, it'll be much simpler to design the db without sharding and performance will be much more predictable.
Last time I get alert from MongoDB Atlas:
Disk I/O % utilization on Data Partition has gone above 70 on nvme2n1
But I have no any ideas how can I localize / query / index / part of code / problematic collection.
In what way can I perform any analyze to find out problem root-cause?
Not answer, but just seen that many people faced with similar problem.
In My case root cause was: we had collection with huge documents that contain array of data (in fact - list of coordinates with some metadata), and update it as many times, as coordinates we have (when adding new coordinates). + some additional operations.
As I know MongoDB cannot fetch just part of document, it fetch full document, and when we fetch many different and big documents, they are not fit into MongoDB in-memory cache, and each time access into hard disc, that lead to this issue.
So, we just split up this document on several, and this fixed issue. While we need frequent access to update/add this data, we keep it into different documents, and finally, after process done, we gather back all this documents into one big document, for "history check" purpose.
Recently, we met this alert on MongoDB Atlas Disk I/O % utilization on Data Partition has gone above 90 after the instance reboots maintenance. After a discussion with Atlas support guys, we clearly understand this metric.
Understanding Disk I/O % Utilization
The definition of Disk I/O % Utilization and Disk I/O % utilization on Data Partition per doc
Disk I/O % Utilization alerts indicate that the percentage of time during which requests are being issued reaches a specified threshold.
Disk I/O % utilization on Data Partition occurs if the percentage of time during which requests are being issued to any partition that contains the MongoDB collection data meets or exceeds the threshold.
Two traps in iostat: %util and svctm
Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.
This means if there was even just one I/O operation in progress for a given time period, the operating system would report 100% Disk Util, as the disk was in use 100% of that time.
Thus, the disk utilization percentage by itself is NOT an indicator of stress on the disk relative to its maximum IOPS capacity.
Having disk utilization at 100% does not in itself imply there is an issue. Disk utilization is the percentage of time requests are issued to any partition containing the MongoDB collection data. This includes requests from any process, not just MongoDB processes. Modern disk storage can sustain multiple I/O operations simultaneously, so having a ~100% utilization is not unusual, because it just means that the disk is constantly processing at least one operation during the 100% interval.
Conclusion
We should look at a combination of all the available disk-related metrics, as well as IOWait in the System CPU when diagnosing potential disk performance-related issues.
Possible actions to help resolve Disk Utilization % alerts
Optimize your queries
Create an Index to Support Read Operations
Pay attention to Query Selectivity and Covered Query
Use the Atlas Performance Advisor to view slow queries and suggested indexes.
Review Indexing Strategies for possible further indexing improvements.
Analyze Query Performance to review how your queries are using your indexes.
Analyze Profile to optimize the long execution time query
Increase hardware resources, such as instance size and IOPS on Atlas
Source: Mongo Doc
As the alert says, it is due to the high utilization of the disk. The most common cause of it is unoptimized queries with poor Query Targeting Ratio, or simply reading/writing a lot of documents from/to the disk in a relatively shorter time window.
In order to identify these queries, start with the Profiler and look for the operations with a poor Examined:Returned ratio. You can also refer to the Performance Advisor to see if it suggests any indexes on the inefficient operations. Since Profiler's window is limited to the last 24 hours, you can also refer to your logs to identify the Slow Queries.
Ultimately, the effort to solve this is tri-directional:
Optimizing the query execution with efficient indexing and filtering strategies
Keep a check on the volume of data being read/written in one go.
Increase the IOPS of the cluster
For official reference, checkout the documentation here.
Our Postgres DB (hosted on Google Cloud SQL with 1 CPU, 3.7 GB of RAM, see below) consists mostly of one big ~90GB table with about ~60 million rows. The usage pattern consists almost exclusively of appends and a few indexed reads near the end of the table. From time to time a few users get deleted, deleting a small percentage of rows scattered across the table.
This all works fine, but every few months an autovacuum gets triggered on that table, which significantly impacts our service's performance for ~8 hours:
Storage usage increases by ~1GB for the duration of the autovacuum (several hours), then slowly returns to the previous value (might eventually drop below it, due to the autovacuum freeing pages)
Database CPU utilization jumps from <10% to ~20%
Disk Read/Write Ops increases from near zero to ~50/second
Database Memory increases slightly, but stays below 2GB
Transaction/sec and ingress/egress bytes are also fairly unaffected, as would be expected
This has the effect of increasing our service's 95th latency percentile from ~100ms to ~0.5-1s during the autovacuum, which in turn triggers our monitoring. The service serves around ten requests per second, with each request consisting of a few simple DB reads/writes that normally have a latency of 2-3ms each.
Here are some monitoring screenshots illustrating the issue:
The DB configuration is fairly vanilla:
The log entry documenting this autovacuum process reads as follows:
system usage: CPU 470.10s/358.74u sec elapsed 38004.58 sec
avg read rate: 2.491 MB/s, avg write rate: 2.247 MB/s
buffer usage: 8480213 hits, 12117505 misses, 10930449 dirtied
tuples: 5959839 removed, 57732135 remain, 4574 are dead but not yet removable
pages: 0 removed, 6482261 remain, 0 skipped due to pins, 0 skipped frozen
automatic vacuum of table "XXX": index scans: 1
Any suggestions what we could tune to reduce the impact of future autovacuums on our service? Or are we doing something wrong?
If you can increase autovacuum_vacuum_cost_delay, your autovacuum would run slower and be less invasive.
However, it is usually the best solution to make it faster by setting autovacuum_vacuum_cost_limit to 2000 or so. Then it finishes faster.
You could also try to schedule VACUUMs of the table yourself at times when it hurts least.
But frankly, if a single innocuous autovacuum is enough to disturb your operation, you need more I/O bandwidth.
I am using 10 threads to do find-and-modify on large documents (one document at a time on each thread). The documents are about 600kB each and contain each a 10,000 elements array. The client and the MongoDB server are on the same LAN in a datacenter and are connected via a very fast connection. The client runs on small machine (CPU is fairly slow, 768 MB of RAM).
The problem is that every find-and-modify operation is extremely slow.
For example here is the timeline of an operation:
11:56:04: Calling FindAndModify on the client.
12:05:59: The operation is logged on the server ("responseLength" : 682598, "millis" : 7), so supposedly the server returns almost immediately.
12:38:39: FindAndModify returns on the client.
The CPU usage on the machine is only 10-20%, but the memory seems to be running low. The available bytes performance counter is around 40 MB, and the Private Bytes of the process that is calling MongoDB is 800 MB (which is more than the physical RAM on the machine). Also the Page/sec performance counter is hovering around 4,000.
My theory is that it took 33 minutes for the driver to deserialize the document, and that could be because the OS is swapping due to high memory usage caused by the deserialization of the large documents into CLR objects. Still, this does not really explain why it took 10 minutes between the call to FindAndModify and the server side execution.
How likely is it that this is the case? Do you have another theory? How can I verify this is the case?