Will increasing the system memory reduce the dirty cache % in MongoDB WiredTiger?

I have a MongoDB replica set with one primary, one secondary, and one arbiter.
The hardware specification of the primary and secondary is the same: 8 cores, 32 GB RAM, and a 700 GB SSD.
I recently moved the database from the MMAPv1 storage engine to WiredTiger.
According to the MongoDB documentation, page eviction starts when:
the WiredTiger cache is 80% used, or
the dirty cache percentage is more than 5%.
My resident memory is 13 GB. I can see our dirty cache percentage above 5% (around 7%) all the time, and our WiredTiger cache usage is more than 11 GB, which is around 80% of the configured WT cache size.
I can also see an increase in CPU usage because application threads are being pulled into cache eviction all the time.
I want to know: if I increase the box size to 16 cores and 64 GB RAM, will that fix the issue?
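For reference, the percentages above can be pulled straight from serverStatus. A minimal sketch with pymongo (the connection string is a placeholder), computing cache-used and cache-dirty percentages from the WiredTiger cache counters:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]

    configured = cache["maximum bytes configured"]
    used = cache["bytes currently in the cache"]
    dirty = cache["tracked dirty bytes in the cache"]

    print("cache used:  %.1f%%" % (100.0 * used / configured))   # eviction kicks in near 80%
    print("cache dirty: %.1f%%" % (100.0 * dirty / configured))  # dirty-triggered eviction near 5%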

Related

Meaning of ADX Cache utilization more than 100%

We see the Cache utilization dashboard for an ADX cluster on the Azure portal, but at times I have noticed that this utilization shows up as more than 100%. I am trying to understand how to interpret it. Say, for example, cache utilization shows up as 250%: does it mean that 100% of the memory cache is utilized and, beyond that, 150% of the disk cache is being utilized?
As explained in the documentation for the Cache Utilization metric:
[this is the] Percentage of allocated cache resources currently in use by the cluster.
Cache is the size of SSD allocated for user activity according to the defined cache policy.
An average cache utilization of 80% or less is a sustainable state for a cluster.
If the average cache utilization is above 80%, the cluster should be scaled up to a storage optimized pricing tier or scaled out to more instances. Alternatively, adapt the cache policy (fewer days in cache).
If cache utilization is over 100%, the size of data to be cached, according to the caching policy, is larger than the total size of cache on the cluster.
Utilization > 100% means that there's not enough room in the (SSD) cache to hold all the data that the policy indicates should be cached. If auto-scale is enabled then the cluster will be scaled-out as a result.
The cache applies an LRU eviction policy, so even when utilization exceeds 100% query performance will be as good as possible (though, of course, if queries constantly reference more data than the cache can hold, some performance degradation will be observed).
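To make the 250% example concrete, here is a small illustration with made-up numbers; the metric is, roughly, the amount of data the caching policy says should be hot divided by the SSD cache allocated to the cluster:

    # Hypothetical figures, only to illustrate how a >100% reading arises.
    allocated_cache_gb = 1000    # SSD cache available on the cluster
    hot_data_gb = 2500           # data the caching policy wants kept hot

    utilization = 100.0 * hot_data_gb / allocated_cache_gb
    print(f"cache utilization: {utilization:.0f}%")   # 250%: 1500 GB of "hot" data cannot fit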

On AWS RDS Postgres, what could cause disk latency to go up while iOPs / throughput go down?

I'm investigating an approximately 3-hour period of increased query latency on a production Postgres RDS instance (m4.xlarge, 400 GiB of gp2 storage).
The driver seems to be a spike in both read and write disk latencies: I see them going from a baseline of ~0.0005 up to a peak of 0.0136 for write latency and 0.0081 for read latency.
I also see an increase in disk queue depth from a baseline of around 2, to a peak of 14.
When there's a spike in disk latencies, I generally expect to see an increase in data being written to disk. But read IOPS, write IOPS, read throughput, and write throughput all went down (by approximately 50%) during the period when latency was elevated.
I also have server-side metrics on the total query volume I'm sending (measured in both queries per second and amount of data written: this is a write-heavy workload), and those metrics were flat during this time period.
I'm at a loss for what to investigate next. What are possible reasons that disk latency could increase while IOPS go down?
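One sanity check that ties these three metrics together is Little's law: average queue depth ≈ IOPS × average latency (latency in seconds). A rough sketch with hypothetical IOPS figures (the question only says IOPS dropped by about half) shows that lower IOPS plus higher latency is entirely consistent with the deeper queue observed:

    # Little's law: queue depth ~= IOPS * latency (latency in seconds).
    # IOPS numbers are hypothetical; latencies and queue depths are from the question.
    baseline_iops, baseline_latency = 4000, 0.0005
    incident_iops, incident_latency = 2000, 0.007    # ~50% fewer IOPS, blended read/write latency

    print(baseline_iops * baseline_latency)   # ~2, matching the baseline queue depth
    print(incident_iops * incident_latency)   # ~14, matching the peak queue depth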

GCE SSD persistent disk slower than Standard persistent disk

We are using GCE for a MongoDB replica set with three members. As our data is quite large, the initial sync for a new member takes quite a long time: in our case, 7 hours for copying records and then 30 hours to create indexes.
The database is stored on a separate disk with these properties (copy-paste from the GCE console):
Type: Standard persistent disk
Size: 2000 GB
Zone: us-central1-c
Sustained random IOPS limit - estimated (R/W): 1,500 / 3,000
Sustained throughput limit (MB/s) - estimated (R/W): 180 / 120
To speed things up, we tried adding an SSD disk:
Type: SSD persistent disk
Size: 1000 GB
Zone: us-central1-c
Sustained random IOPS limit - estimated (R/W): 15,000 / 15,000
Sustained throughput limit (MB/s) - estimated: 240 / 240
One would expect an SSD disk to be considerably faster than a Standard disk, but our results say otherwise. During the initial MongoDB sync the Standard disk was several times faster than the SSD. While it took 7 hours for the Standard disk to copy all the data, the SSD disk had copied just half of the data after 12 hours. Measuring with the Linux tool iostat, the Standard disk achieves around 80,000 kB_wrtn/s while the SSD disk manages only around 8,000. How is it possible that the SSD disk is 10 times slower than the Standard disk?
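When the numbers look this far off, it can help to take MongoDB out of the picture and benchmark the raw disks. A crude sequential-write comparison in Python (mount points are placeholders; this goes through the page cache, so a tool like fio with direct I/O would be more rigorous):

    import os, time

    def write_throughput_mb_s(path, total_mb=1024, chunk_mb=1):
        """Sequentially write total_mb of zeros to path and return MB/s."""
        chunk = b"\0" * (chunk_mb * 1024 * 1024)
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(total_mb // chunk_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())           # ensure the data actually reaches the disk
        return total_mb / (time.time() - start)

    # Placeholder mount points for the two persistent disks.
    print("standard pd: %.0f MB/s" % write_throughput_mb_s("/mnt/standard-pd/testfile"))
    print("ssd pd:      %.0f MB/s" % write_throughput_mb_s("/mnt/ssd-pd/testfile"))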

What is the max size of collection in mongodb

I would like to know what the maximum size of a collection in MongoDB is.
The MongoDB limitations documentation mentions that a single MMAPv1 database has a maximum size of 32 TB.
Does this mean the maximum size of a collection is 32 TB?
If I want to store more than 32 TB in one collection, what is the solution?
There are theoretical limits, as I will show below, but even the lower bound is pretty high. It is not easy to calculate the limits correctly, but the order of magnitude should be sufficient.
MMAPv1
The actual limit depends on a few things, like the length of shard names and the like (which adds up if you have a couple of hundred thousand of them), but here is a rough calculation with real-life data.
Each shard needs some space in the config database, which, like any other database, is limited to 32 TB on a single machine or in a replica set. On the servers I administer, the average size of an entry in config.shards is 112 bytes. Furthermore, each chunk needs about 250 bytes of metadata. Let us assume optimal chunk sizes of close to 64 MB.
We can have at most 500,000 chunks per shard. 500,000 * 250 bytes equals 125 MB of chunk information per shard. So, per shard, we have 125.000112 MB (the chunk metadata plus the 112-byte shard entry) if we max everything out. Dividing 32 TB by that value shows that we can have a maximum of slightly under 256,000 shards in a cluster.
Each shard in turn can hold 32 TB worth of data. 256,000 * 32 TB is 8.192 exabytes, or 8,192,000 terabytes. That would be the limit for our example.
Let's say it is 8 exabytes. As of now, this easily translates to "enough for all practical purposes". To give you an impression: all the data held by the Library of Congress (arguably one of the biggest libraries in the world in terms of collection size) is estimated at around 20 TB, including audio, video, and digital materials. You could fit that into our theoretical MongoDB cluster some 400,000 times. Note that this is the lower bound of the maximum size, using conservative values.
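For the record, the arithmetic above can be redone in a few lines (the 112-byte and 250-byte figures are the averages quoted above; decimal units are used, as in the text):

    config_db_limit = 32 * 10**12              # 32 TB config database limit, in bytes
    shard_entry = 112                          # average config.shards entry, bytes
    chunk_metadata = 250                       # metadata per chunk, bytes
    chunks_per_shard = 500_000                 # chunks per shard at ~64 MB per chunk

    per_shard_bytes = shard_entry + chunks_per_shard * chunk_metadata   # 125,000,112 bytes
    max_shards = config_db_limit // per_shard_bytes                     # slightly under 256,000
    max_data_tb = max_shards * 32                                       # ~8,192,000 TB, i.e. ~8.192 EB
    print(max_shards, max_data_tb)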
WiredTiger
Now for the good part: the WiredTiger storage engine does not have this limitation. The database size is not limited (since there is no limit on how many data files can be used), so we can have an unlimited number of shards. Even when those shards run on MMAPv1 and only our config servers run on WiredTiger, the size of a cluster becomes nearly unlimited. The limitation to 16.8M TB of RAM on a 64-bit system might cause problems somewhere and cause the indices of the config.shards collection to be swapped to disk, stalling the system. I can only guess, since my calculator refuses to work with numbers in that area (and I am too lazy to do it by hand), but I estimate the limit here to be somewhere in the two-digit yottabyte range (and the space needed to host it somewhere the size of Texas).
Conclusion
Do not worry about the maximum data size in a sharded environment. No matter what, it is more than enough by far, even with the most conservative approach. Use sharding, and you are done. By the way: even 32 TB is a hell of a lot of data; most clusters I know of hold less and shard because IOPS and RAM utilization exceeded a single node's capacity.

membase key-heavy memory usage vs redis

I recently ran a test with membase, incrementing 60 million keys, each 20-30 bytes in size, with values no larger than an integer. The cluster spanned three 16 GB boxes, with 15 GB dedicated to a single bucket (replication=1) in membase. The build is membase-server-community_x86_64_1.7.1.1 on 64-bit Ubuntu Lucid boxes.
Results:
Initially, 10 million keys resided in 3 GB of memory (3 million keys/GB).
At 60 million keys, they resided in 45 GB of memory (1.33 million keys/GB).
In comparison, redis handles 9-10 million keys/GB at 60 million keys, and this ratio of keys per GB is consistent regardless of the dataset size.
Question:
Membase does not seem to scale well when faced with key-heavy datasets. Is there any tuning/configuration that could help Membase in this use case?
Thanks.
P.S. I migrated from redis to membase because the latter seemed to offer more reliability against cache failure. However, this degradation of performance with large datasets is a bit too painful.
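For what it's worth, the workload described is easy to reproduce against membase's memcached-compatible port. A minimal sketch using the python-memcached client (the host, port, and key format are assumptions, sized to match the 20-30 byte keys in the test):

    import memcache

    # Assumed memcached-compatible endpoint exposed by the membase bucket.
    mc = memcache.Client(["10.0.0.1:11211"])

    for i in range(60_000_000):
        key = "user:counter:%012d" % i     # 25-byte keys, in the range used in the test
        if mc.incr(key) is None:           # incr returns None when the key does not exist yet
            mc.set(key, 1)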