I was doing some tests to figure out the performance of replica sets in our environment. The setup consists of 1 primary and 1 secondary in the local data center and 1 secondary in a remote data center.
Each record consists of 1 field of 512 bytes. The numbers of inserts tested were 100,000 and 500,000.
During week 1, the inserts on the primary completed in the following times:
100,000 writes - 5 seconds
500,000 writes - 20 seconds
Week 2:
100,000 writes - 14 seconds
500,000 writes - 66 seconds
I can't figure out what could have caused the rate to drop so much. I have an oplog of 1 GB and journaling enabled. I am not concerned about replication lag, since there isn't much lag. There are no other I/O processes running in the environments where MongoDB is set up. I have also deleted files and restarted the machines, but I still notice this dip.
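For reference, a timing test along these lines can be sketched in the mongo shell as follows (database, collection, and field names are illustrative; the original test may have been driven differently):

```javascript
// Minimal sketch of the timing test: insert n documents with one 512-byte field
// and report the elapsed time. Names and batch size are illustrative.
function timeInserts(n) {
    var coll = db.getSiblingDB("perftest").records;
    coll.drop();
    var payload = new Array(513).join("x");   // 512-byte string value
    var batch = [];
    var start = new Date();
    for (var i = 0; i < n; i++) {
        batch.push({ data: payload });
        if (batch.length === 1000) {          // insert in batches of 1,000
            coll.insertMany(batch);
            batch = [];
        }
    }
    if (batch.length > 0) coll.insertMany(batch);
    print(n + " inserts took " + (new Date() - start) / 1000 + " s");
}

timeInserts(100000);
timeInserts(500000);
```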
Can anyone let me know what could be the cause?
Thanks,
Ganesh
If these are virtual machines, then you might have a "noisy neighbor". If you're using NAS or SAN storage, then write throughput can be affected by network traffic or by I/O load for other hosts sharing the NAS or SAN.
We need to store 1 billion documents of 1 KB each. Each shard is planned to have 8 GB of RAM. The platform is Red Hat OpenShift (Linux).
Initially we had 10 shards for 300 million documents. We started inserting documents at 2,000 inserts/second. Everything went well until 250 million; after that, the insert rate slowed drastically to 300-400 inserts per second.
Queries are also taking a long time (more than 1 minute), even though all of them are covered queries (queries that can be answered from the index alone, without fetching documents).
Hence we assumed that 20 million documents per shard is the optimal value, and therefore that we need 50 shards on the current hardware to reach 1 billion.
Is this a reasonable estimate, or can we improve it (fewer shards) by tweaking MongoDB parameters for better performance on the current hardware?
There are two compound indexes and one unique index (on a long). Insertion is done using bulk writes (with the unordered option), with 10 threads and 200 records per bulk write per thread, using JavaScript directly on the mongos. The shard key is nodeId (the prefix of a compound index), which has a cardinality of up to 10k. For 300 million documents, the total index size comes to 45 GB, of which 40 GB is for the 2 compound indexes. Almost 9,500 chunks are distributed across the 10 nodes. One interesting fact: if I increase RAM to 12 GB, the speed increases to 1,500 inserts/sec. Is RAM the limiting factor?
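For illustration, one of those unordered bulk writes would look roughly like this in the mongo shell (collection name, field names, and values are assumptions; only the shape of the call is the point):

```javascript
// Sketch of one unordered bulk write of 200 documents, as described above.
// Field names and values are illustrative; nodeId stands in for the shard key.
var ops = [];
for (var i = 0; i < 200; i++) {
    ops.push({
        insertOne: {
            document: {
                nodeId: NumberInt(Math.floor(Math.random() * 10000)),  // shard key, cardinality ~10k
                uniqueId: NumberLong(Date.now() * 1000 + i),           // value for the unique long index
                ts: new Date(),
                payload: "..."                                         // ~1 KB of data in the real workload
            }
        }
    });
}
db.getSiblingDB("mydb").events.bulkWrite(ops, { ordered: false });
```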
Update:
Using the mongostat tool, we found that the flush (fsync) takes more than 55 seconds to complete. The MongoDB cluster runs on Kubernetes on the Red Hat OpenShift platform, on a Dell EMC server with NFS storage (ext4 disk format). Is it a problem with the I/O that it supports only 2 MB/second? Writes proceed at 2,000 records per second for 60 seconds, and then it takes another 55 seconds to flush completely to disk (during which all DB operations are blocked).
The disk utilization does not even reach 4%.
Have you tried not sharding at all?
There's a common tendency to shard prematurely. I've seen a MongoDB consultant suggest a rule of thumb: don't shard until your total data size is at least 2 TB. Your 1 billion documents of 1 KB each come to around 1 TB. It's only a rule of thumb, but it may be worth trying.
If nothing else, it'll be much simpler to design the db without sharding and performance will be much more predictable.
I have one publisher with around 50 subscribers. Every so often (a few times a month), a binary file of about 30 MB is written to the database. At that point all subscribers receive this file, and I run into network bandwidth issues.
Is it possible to limit (in Postgres or OS) the bandwidth used by the logical replication per publisher/subscribers?
Is it possible to limit the bandwidth used during first sync?
At the PostgreSQL level, I can suggest trying to reduce the max_wal_senders parameter on the sending server (it defaults to 10).
Depending on the latency you can accept, you can limit the number of concurrent sending processes, down to as few as 1 process at a time.
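As a sketch, the relevant settings on the publishing server might look like this (values are illustrative; lowering max_wal_senders limits concurrency rather than per-sender bandwidth, and changing it requires a server restart):

```
# postgresql.conf on the publisher (illustrative values)
max_wal_senders = 2          # default is 10; only this many WAL sender processes
                             # can stream to subscribers at the same time
max_replication_slots = 50   # one slot per subscription can still exist; subscribers
                             # that cannot get a sender will keep retrying
```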
Currently we have one replica set of 3 members with 25 GB of data. Normal CPU load is 1.5 on both secondaries and 0.5 on the primary (reads happen on the secondary instances only), and normally about 1,200 users hit our website. We now plan to handle more traffic and expect about 5,000 concurrent users. Can you please suggest how many instances need to be added to my replica set?
Current infra in our replica set:
1. Primary instance
CPUs: 16
RAM: 32 GB
HDD: 100 GB
2. Secondary instance
CPUs: 8
RAM: 16 GB
HDD: 100 GB
3. Secondary instance
CPUs: 8
RAM: 16 GB
HDD: 100 GB
Assuming your application scales linearly with the number of users, CPU capacity should not be a problem (does it scale linearly? Only you can tell; we don't know what your application does).
The question is: how much do you expect your data to grow? When you currently have 25 GB of data and 16 GB of ram, 64% of your data fits into RAM. That likely means that many queries can be served directly from the RAM cache without hitting the hard drives. These queries are usually very fast. But when your working set increases further beyond the size of your RAM, you might experience some increased latency when accessing the data which now needs to be read from the hard drives (it depends, though: when your application interacts primarily with recent data and rarely with older data, you might not even notice much of a difference).
The solution to this is obvious: get more RAM. Should this not be an option (for example because the server reached the maximum RAM capacity the hardware allows), your next option is building a sharded cluster where each shard is responsible for serving an interval of your data.
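If it helps, here is a rough way to compare cache and data sizes on a running member, assuming the WiredTiger storage engine (a coarse indicator only, not a full working-set analysis):

```javascript
// Coarse comparison of WiredTiger cache usage vs. data size on this member.
var cache = db.serverStatus().wiredTiger.cache;
var GB = 1024 * 1024 * 1024;
var cacheMaxGB  = cache["maximum bytes configured"] / GB;
var cacheUsedGB = cache["bytes currently in the cache"] / GB;
var dataGB      = db.stats(GB).dataSize;   // data size of the current database, in GB
print("cache used: " + cacheUsedGB.toFixed(1) + " GB of " + cacheMaxGB.toFixed(1) +
      " GB; data size: " + dataGB.toFixed(1) + " GB");
```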
We are running MongoDB on a virtual machine (A3) on Azure. We are trying to estimate the running cost of MongoDB for the following scenario:
The scenario is that 100,000 customers each insert/update around 2K of data (time series data) every 5 minutes. We are using MongoDB on an A3 instance (4 cores) of Windows Server on Azure (which restricts us to 4 TB per shard).
When we estimated the running cost, it came out to approximately $34,000 per month, which includes MongoDB licensing, our MongoDB virtual machines, storage, backup storage, and the worker role.
This is far too costly. We have some ideas to bring the cost down, but we need some advice on them, as some of you may have already done this.
Two questions:
1- As of today, we estimate needing 28 MongoDB instances (with the 4 TB limit). I have read that we can increase the disk size from 4 TB to 64 TB on a Linux VM or Windows Server 2012. This may reduce the number of shards we need. Is running a MongoDB shard on a 64 TB disk possible in Azure?
You may ask why 28 instances...
2- We are calculating the number of shards required based on "number of inserts per core", which itself depends on the number of values inserted into MongoDB per message; each value is 82 bytes. We did some load testing, and it turns out that we can only run 8,000 inserts per second and that each core can handle approximately 193 inserts per second, resulting in a need for 41 cores (which is way too high). Dividing 41 cores by 4 gives 11 A3 instances -- which is another cost...
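To make the arithmetic explicit (all numbers are taken from our load test above):

```javascript
// Back-of-the-envelope core/instance estimate from the measured figures.
var targetInsertsPerSec = 8000;  // required aggregate insert rate
var insertsPerCore      = 193;   // measured per-core throughput
var coresPerA3          = 4;     // cores in an Azure A3 instance

var coresNeeded = targetInsertsPerSec / insertsPerCore;   // ~41.5, rounded to 41 above
var a3Instances = Math.ceil(coresNeeded / coresPerA3);    // 11 A3 instances
print("cores needed: " + coresNeeded.toFixed(1) + ", A3 instances: " + a3Instances);
```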
Looking for help to see whether our calculation is wrong or the way we have set things up is wrong.
Any help will be appreciated.
Regarding question 1 (running a MongoDB shard on a 64 TB disk in Azure):
According to the documentation, the maximum you can achieve is 16 TB: 16 data disks attached, at a maximum of 1 TB each. So technically the largest single disk you can attach is 1 TB, but you can build a RAID 0 stripe across the 16 attached disks and get 16 TB of storage. That (16 TB) is the maximum amount of storage you can officially get.
According to the Azure documentation, the A3 size can have a maximum of 8 data disks, so a maximum of 8 TB; an A4 can handle 16 disks. I would assume your bottleneck here is disk and not the number of cores, so I'm not convinced you need such a big cluster.
I have a Perl script (on Ubuntu 12.04 LTS) writing to 26 TCH (Tokyo Cabinet hash) files. The keys are roughly equally distributed. The writes become very slow after 3 million inserts (spread equally across all the files), and the speed drops from 240,000 inserts/min at the beginning to 14,000 inserts/min after 3 million inserts. Individually the shard files are no more than 150 MB, and overall their size comes to around 2.7 GB.
I run optimize on every TCH file after every 100K inserts to that file, with bnum set to 4 * num_records_then and options set to TLARGE, and I make sure xmsiz matches the size of bnum (as mentioned in "Why does tokyo tyrant slow down exponentially even after adjusting bnum?").
Even after this, the inserts start at high speed and then slowly fall from 240k inserts/min to 14k inserts/min. Could it be due to holding multiple TCH connections (26) in a single script? Or is there a configuration setting I'm missing? (Would disabling journaling help? The thread above says journaling affects performance only after the TCH file grows beyond 3-4 GB, and my shards are < 150 MB each.)
I would turn off journaling and measure what changes.
The cited thread talks about a 2-3 GB tch file, but if you sum the sizes of your 26 tch files, you are in the same league. For the filesystem, the total amount of data being written should be the relevant parameter.