GCE SSD persistent disk slower than Standard persistent disk - mongodb

We are using GCE for a MongoDB replica set with three members. As our data is quite large, the initial sync for a new member takes quite a long time: in our case, 7 hours to copy the records and then 30 hours to build the indexes.
The database is stored on a separate disk with these properties (copy-paste from the GCE console):
Type: Standard persistent disk
Size: 2000 GB
Zone: us-central1-c
Sustained random IOPS limit - estimated (R/W): 1,500 / 3,000
Sustained throughput limit (MB/s) - estimated (R/W): 180 / 120
To speed things up, we tried adding an SSD disk:
Type: SSD persistent disk
Size: 1000 GB
Zone: us-central1-c
Sustained random IOPS limit - estimated (R/W): 15,000 / 15,000
Sustained throughput limit (MB/s) - estimated: 240 / 240
One would expect the SSD disk to be considerably faster than a Standard disk, but our results are different. During the initial MongoDB sync the Standard disk was several times faster than the SSD. While the Standard disk copied all data in 7 hours, the SSD disk had copied only half of the data after 12 hours. Measuring with the Linux tool iostat, the Standard disk achieves around 80,000 kB_wrtn/s while the SSD disk achieves around 8,000 kB_wrtn/s. How is it possible that the SSD disk is 10 times slower than the Standard disk?
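The per-device write rates above can be watched with an iostat invocation along these lines (the flags and the sdb/sdc device names are illustrative; substitute the devices backing the Standard and SSD persistent disks):
    iostat -d 5              # kB_read/s and kB_wrtn/s for every block device, refreshed every 5 seconds
    iostat -dx sdb sdc 5     # extended stats (await, avgqu-sz, %util) for just the two data disks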

Related

How much more disk space does wal_level logical consume compared to replica?

I am setting the wal_level in my Postgres DB from the default replica to logical, so that I can use Debezium.
I am wondering how much more disk space that will take up over time.
A real life example would be great.
Something like:
1 month with roughly the same amount of data changes:
with replica, the disk space consumed grows by 100 MB
with logical, the disk space consumed grows by 200 MB
I just assume that wal_level logical will consume more space; is this even correct?
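For reference, the extra WAL volume can be measured directly rather than estimated. A minimal sketch, assuming PostgreSQL 10+ (older releases use pg_current_xlog_location() / pg_xlog_location_diff() instead), run under each wal_level for a comparable window of activity:
    psql -c "SELECT pg_current_wal_lsn();"
    # suppose this returned 0/3000060; after the window, the WAL generated since then is:
    psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3000060'));"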

Will increasing the system memory reduce the dirty cache % in MongoDB WiredTiger?

I have a MongoDB replica set with one primary, one secondary, and one arbiter.
The hardware specification of the primary and the secondary is the same: 8 cores, 32 GB RAM, and a 700 GB SSD.
I have recently moved the database from the MMAPv1 storage engine to WiredTiger.
According to the MongoDB documentation, page eviction will start when:
the WiredTiger cache is 80% used;
the dirty cache % is more than 5%.
My resident memory is 13 GB. Our dirty cache % is above 5% (around 7%) all the time, and our WiredTiger cache usage is more than 11 GB, which is around 80% of the configured WT cache size.
I can see an increase in CPU usage because application threads are being pulled into cache eviction all the time.
I want to know: if I increase the box size to 16 cores and 64 GB, is that going to fix the issue?
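A minimal way to read the counters in question from the command line (the serverStatus() field names below are as reported by recent WiredTiger builds; adjust if yours differ):
    mongo --quiet --eval '
      var c = db.serverStatus().wiredTiger.cache;
      print("configured cache bytes:   " + c["maximum bytes configured"]);
      print("bytes currently in cache: " + c["bytes currently in the cache"]);
      print("tracked dirty bytes:      " + c["tracked dirty bytes in the cache"]);
    '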

On AWS RDS Postgres, what could cause disk latency to go up while IOPS / throughput go down?

I'm investigating an approximately 3 hour period of increased query latency on a production Postgres RDS instance (m4.xlarge, 400 GiB of gp2 storage).
The driver seems to be a spike in both read and write disk latencies: I see them going from a baseline of ~0.0005 s up to a peak of 0.0136 s write latency / 0.0081 s read latency.
I also see an increase in disk queue depth from a baseline of around 2, to a peak of 14.
When there's a spike in disk latencies, I generally expect to see an increase in the amount of data being written to disk. But read IOPS, write IOPS, read throughput, and write throughput all went down (by approximately 50%) during the time when latency was elevated.
I also have server-side metrics on the total query volume I'm sending (measured in both queries per second and amount of data written: this is a write-heavy workload), and those metrics were flat during this time period.
I'm at a loss for what to investigate next. What are possible reasons that disk latency could increase while IOPS go down?
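One rough sanity check on the numbers themselves, assuming the latencies are in seconds (as RDS CloudWatch reports them) and applying Little's law (queue depth ≈ IOPS × per-I/O latency):
    baseline: 2 (queue depth) / 0.0005 s  ≈ 4,000 IOPS implied
    peak:    14 (queue depth) / ~0.011 s  ≈ 1,300 IOPS implied
So a higher queue depth together with lower IOPS is at least self-consistent: each I/O is taking far longer to complete rather than more work arriving, which also fits the flat client-side query metrics.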

Aerospike: bad latencies on AWS

We have Aerospike running on bare-metal machines in SoftLayer in a 2-node cluster. Our average profile size is 1.5 KB and, at peak, operations are around 6,000 ops/sec on each node. The latencies are all fine: at peak, only around 5% of operations exceed 1 ms.
We then planned to migrate to AWS, so we booted 2 i3.xlarge machines. We ran the benchmark with the 1.5 KB object size at 3x load; the results were satisfactory, around 4-5% (> 1 ms). Then we started actual processing: the latencies at peak jumped to 25-30% over 1 ms, and the maximum it can accommodate is about 5K ops/sec. So we added one more node and ran the benchmark again (4.5 KB object size and 3x load); the results were 2-4% (> 1 ms). After adding it to the cluster, the peak came down to 16-22%. We added one more node and the peak is now at 10-15%.
The version in AWS is aerospike-server-community-3.15.0.2; the version in SoftLayer is Aerospike Enterprise Edition 3.6.3.
Our config is as follows:
# Aerospike database configuration file.
service {
    user xxxxx
    group xxxxx
    run-as-daemon
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 8
    transaction-queues 8
    transaction-threads-per-queue 8
    proto-fd-max 15000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        port 13000
        address h1 reuse-address
    }
    heartbeat {
        mode mesh
        port 13001
        address h1
        mesh-seed-address-port h1 13001
        mesh-seed-address-port h2 13001
        mesh-seed-address-port h3 13001
        mesh-seed-address-port h4 13001
        interval 150
        timeout 10
    }
    fabric {
        port 13002
        address h1
    }
    info {
        port 13003
        address h1
    }
}
namespace XXXX {
    replication-factor 2
    memory-size 27G
    default-ttl 10d
    high-water-memory-pct 70
    high-water-disk-pct 60
    stop-writes-pct 90
    storage-engine device {
        device /dev/nvme0n1
        scheduler-mode noop
        write-block-size 128K
    }
}
What should be done to bring down the latencies in AWS?

This comes down to the difference in the performance characteristics of the SSDs in the i3 nodes compared to what you had on SoftLayer. If you ran Aerospike on a floppy disk you'd get 0.5 TPS.
Piyush's comment mentions ACT, the open-source tool Aerospike created to benchmark SSDs with real database workloads. The point of ACT is to find the sustained rate at which the SSD can be relied on to deliver the latency you want. Burst rates don't matter much for databases.
The performance engineering team at Aerospike has used ACT to find what the i3 1900G SSD can do, and published the results in a post. Its ACT rating is 4x, meaning that the full 1900G SSD can do 8Ktps reads, 4Ktps writes with the standard 1.5K object size and 128K block size, and stay at 95% < 1ms, 99% < 8ms, 99.9% < 64ms. This is not particularly good for an SSD. By comparison, a Micron 9200 PRO rates at 94.5x, nearly 24 times higher TPS load. What's more, with the i3.xlarge you're sharing half that drive with a neighbor. There's no way to cap the IOPS so that you each get half; there's only a partition of the storage. This means that you can expect latency spikes originating in the neighbor. The i3.2xlarge is the smallest instance that gives you the entire SSD.
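To make the multipliers concrete, ACT's standard 1x load is 2,000 reads/sec plus 1,000 writes/sec with 1.5 KB objects, which is what the 4x figure above scales from:
    4x          → 4 × (2,000 reads + 1,000 writes)/sec = 8,000 reads/sec + 4,000 writes/sec
    94.5x / 4x  → 94.5 / 4 ≈ 23.6, the "nearly 24 times" gap to the Micron 9200 PRO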
So, you take the ACT information and use it to do capacity planning. The main factors you need to know are the average object size (you can find that using the objsz histogram), the number of objects (again, available via asadm), and the peak read TPS and peak write TPS (how does the 60Ktps you mentioned split between reads and writes?).
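Both of those are available from the command line, for example (assuming a reasonably recent asadm; command names may vary slightly between versions):
    asadm -e "show distribution object_size"             # per-namespace object-size (objsz) histogram
    asadm -e "show statistics namespace like objects"    # object counts per namespace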
Check your logs for your cache-read-pct values. If they're in the range of 10% or higher you should raise your post-write-queue value to get better read latencies (and also reduce IOPS pressure on the drive).
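For example, post-write-queue is set per device in the namespace's storage-engine block; the 512 below is only an illustrative value (the default is 256 write blocks per device):
    storage-engine device {
        device /dev/nvme0n1
        scheduler-mode noop
        write-block-size 128K
        post-write-queue 512    # illustrative; default is 256 write blocks per device
    }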

Membase key-heavy memory usage vs Redis

I recently ran a test with Membase, incrementing 60 million keys, each key 20-30 bytes in size, with values no larger than an integer. The cluster spanned 3 boxes of 16 GB each, with 15 GB dedicated to a single bucket (replication=1) in Membase. The build is membase-server-community_x86_64_1.7.1.1 on 64-bit Ubuntu Lucid boxes.
Results:
Initially, 10 million keys resided on 3 GB of memory (≈3.3 million keys/GB).
At 60 million keys, the dataset resided on 45 GB of memory (≈1.33 million keys/GB).
In comparison, Redis handles 9-10 million keys/GB at 60 million keys. This ratio of keys per GB is consistent regardless of the dataset size.
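Converting those reported ratios into per-key memory cost makes the gap explicit:
    Membase: 45 GB / 60 million keys    ≈ 750 bytes per key
    Redis:    1 GB / 9-10 million keys  ≈ 100-110 bytes per key
That is roughly 7x the per-key footprint, for keys whose payload is only 20-30 bytes plus a small integer value.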
Question:
Membase does not seem to scale well when faced with key-heavy datasets. Is there any tuning/configuration that could help Membase in this use case?
Thanks.
P.S. I migrated from Redis to Membase because the latter seemed to offer more resilience against cache failure. However, this degradation in performance with large datasets is a bit too painful.