Sublinear behavior (MongoDB cluster) - mongodb

I have the following setup:
Import CSV file (20GB) with 90 million rows -> data takes 9GB in MongoDB -> index on "2d" column -> additional integer-column index for sharding -> distribute data across 1, 2, 4, 6, 8, 16 shards.
Each shard machine in the cluster has 20GB disk space and 2GB RAM.
I generated a random query and benchmarked the execution time for each cluster configuration (see attachment).
Now my question:
Using 1, 2, 4, 6, and 8 shards I see a more or less linear decrease of runtime, as expected. With 8 shards I would assume that on each shard my data fits into memory. Therefore I thought there would be no improvement from 8 shards to 16 shards.
But from my benchmarks I observe a very strong sublinear decrease of runtime.
Do you have an idea how this behavior might be explained? Any suggestions or references to the manual are much appreciated!
Thanks in advance,
Lydia
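A quick way to check the "fits into memory" assumption is to divide the data size by the shard count. The sketch below uses only the figures from the question (9GB of data, 2GB RAM per shard) and assumes a perfectly even distribution. It ignores the 2d index, the shard key index and the operating system, whose memory needs are not given, so the real point at which everything fits in RAM may lie at a higher shard count than this suggests.

```python
# Figures taken from the question; nothing here is measured.
DATA_GB = 9.0            # size of the imported data in MongoDB
RAM_PER_SHARD_GB = 2.0   # RAM of each shard machine
SHARD_COUNTS = [1, 2, 4, 6, 8, 16]


def data_per_shard_gb(total_gb, shards):
    """Data held by each shard, assuming a perfectly even distribution."""
    return total_gb / shards


def report(total_gb=DATA_GB, ram_gb=RAM_PER_SHARD_GB, counts=SHARD_COUNTS):
    """Return (shards, GB per shard, GB per shard / RAM) for each configuration."""
    rows = []
    for n in counts:
        per_shard = data_per_shard_gb(total_gb, n)
        rows.append((n, per_shard, per_shard / ram_gb))
    return rows


for n, per_shard, ratio in report():
    print(f"{n:>2} shards: {per_shard:.2f} GB per shard, {ratio:.2f}x RAM")
```

A ratio below 1 only means the raw data is smaller than RAM; indexes compete for the same memory.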

Related

What is the minimum number of shards required for Mongo database to store 1 billion documents?

We need to store 1 billion documents of 1KB each. Each shard is planned to have 8GB of RAM. The platform is Open Shift Red Hat Linux.
Initially we had 10 shards for 300 million documents. We started inserting documents at 2000 inserts/second. Everything went well until 250 million. After that, the insert rate dropped drastically to 300-400 inserts per second.
The queries are also taking a long time (more than 1 minute) even though all of them are covered queries (queries which need to scan all the indexes).
Hence we assumed that 20 million per shard is the optimal value, so we would require 50 shards on the current hardware to reach 1 billion.
Is this a reasonable estimate, or can we improve on it (fewer shards) by tweaking MongoDB parameters for better performance on the current hardware?
There are two compound indexes and one unique index (long). Insertion is done using bulk writes (with the unordered option) with 10 threads and 200 records per (thread) bulk write, using JavaScript directly on the mongos. The shard key is nodeId (the prefix of a compound index), which has a cardinality of up to 10k. For 300 million documents, the total index size comes to 45 GB, of which 40 GB is for the 2 compound indexes. Almost 9500 chunks are distributed across 10 nodes. One interesting fact is that if I increase RAM to 12 GB, the speed increases to 1500 inserts/sec. Is RAM the limiting factor?
Update:
Using the mongostat tool, we found that the flush (fsync) takes more than 55 seconds to complete. The MongoDB cluster runs on Kubernetes on the Red Hat OpenShift platform. It runs on a Dell EMC server with NFS (EXT4 disk format). Is the problem in the I/O, given that it supports only 2MB/second? It takes 60 seconds to write at 2000 records per second and another 55 seconds to flush completely to disk (during which all DB operations are blocked).
The disk utilization does not even reach 4%.
Have you tried not sharding at all?
There's a common tendency to shard prematurely. I've seen a MongoDB consultant who suggested a rule of thumb, to not shard until your total data size is at least 2 TB. Your 1B documents of 1KB each should be around 1 TB. While it's only a rule of thumb, maybe it's worth trying.
If nothing else, it'll be much simpler to design the db without sharding and performance will be much more predictable.
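The sizes quoted in the question can be turned into a rough estimate. The sketch below assumes "1KB" means 1000 bytes and that index size grows linearly with document count (45 GB of indexes at 300 million documents, as reported). Both are simplifying assumptions, and the function names are made up for illustration.

```python
# Figures from the question; the linear extrapolation is an assumption.
DOCS_TARGET = 1_000_000_000
DOC_SIZE_BYTES = 1_000          # "1KB each", taken as 1000 bytes
INDEX_GB_MEASURED = 45.0        # total index size reported at 300M documents
DOCS_MEASURED = 300_000_000
RAM_PER_SHARD_GB = 8.0


def total_data_tb(docs, doc_bytes=DOC_SIZE_BYTES):
    """Raw document volume in terabytes (decimal units)."""
    return docs * doc_bytes / 1e12


def index_gb(docs):
    """Index size extrapolated linearly from the measured 300M-document point."""
    return INDEX_GB_MEASURED * docs / DOCS_MEASURED


def index_gb_per_shard(docs, shards):
    """Index size each shard holds, assuming an even distribution."""
    return index_gb(docs) / shards


print(f"data at 1B docs:      {total_data_tb(DOCS_TARGET):.2f} TB")
print(f"indexes at 1B docs:   {index_gb(DOCS_TARGET):.0f} GB")
for docs, shards in [(250_000_000, 10), (300_000_000, 10), (DOCS_TARGET, 50)]:
    per_shard = index_gb_per_shard(docs, shards)
    print(f"{docs:>13,} docs on {shards:>2} shards: "
          f"{per_shard:.2f} GB of index per shard vs {RAM_PER_SHARD_GB:.0f} GB RAM")
```

Under these assumptions the per-shard index size is already a large share of the 8 GB of RAM at the point where inserts slowed down, which fits the observation that adding RAM helped.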

Choosing between MongoDB and ElasticSearch - Scaling/Sharding

I'm currently deciding between MongoDB and Elasticsearch as a backend to a logging and analytics platform. I plan to use a cluster of 5 Intel Xeon Quad Core servers with 64GB RAM and a 500GB NVMe drive in each. With 1 replica set, it should support 1TB+ of data I'm guessing.
From what I've read on Elasticsearch, the recommended set-up for the above servers would be 5-10 shards, but shards cannot be increased in the future without a huge migration. So maybe I can add 5 more servers/nodes to the cluster for the same index, but not 10 or 20, because I can't create more shards to spread across the new nodes/servers - correct?
MongoDB appears to automatically manage sharding based on a key value and redistribute those shards as more nodes get added. So does that mean that I can add 50 more servers to the cluster in the future and MongoDB will happily spread the data from this one index across all the servers?
I basically only need 1TB of storage right now, but don't want to paint myself into a corner, should this 1 dataset end up growing to 100TB.
Without starting Elasticsearch with 100 shards at the beginning, which seems inefficient and bad practice, how can it scale past 5/10 servers for this single dataset?
As Val said, you would normally have time based indices, so you can easily (in a performant way) remove data after a certain retention period. So as your requirements change over time, you change your shard number (normally through an index template).
Current versions of Elasticsearch support a _split API, which does exactly what you are asking for: use 5 shards initially, but keep the option to go up by any factor that the index's number_of_routing_shards setting allows, so 5 -> 10 -> 30 would be one option (just as an example).
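The 5 -> 10 -> 30 path can be checked with a small sketch. It assumes the rule as I read it from the Elasticsearch documentation for index.number_of_routing_shards: a split target must be a multiple of the current shard count and a factor of the routing shard count. The value 30 for the routing shards is an illustrative assumption, and valid_split_targets is a hypothetical helper, not part of any Elasticsearch client.

```python
def valid_split_targets(current_shards, routing_shards):
    """Shard counts an index could be split to in one step.

    Assumed rule: the target must be a multiple of the current shard count
    and must divide the number of routing shards evenly.
    """
    return [
        target
        for target in range(current_shards + 1, routing_shards + 1)
        if target % current_shards == 0 and routing_shards % target == 0
    ]


ROUTING_SHARDS = 30  # illustrative value, 5 x 2 x 3

print(valid_split_targets(5, ROUTING_SHARDS))
print(valid_split_targets(10, ROUTING_SHARDS))
```

Once the shard count reaches the routing shard count, no further split is possible under this rule, so the upper bound has to be chosen when the index is created.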
If you have 5 primary shards and a replication factor of 1, you could still spread out the load over 10 nodes: writes go to the 5 primary and 5 replica shards; reads go to either a primary or its replica. Elasticsearch's write/read model is generally different from MongoDB's.
PS disclaimer: I work for Elastic now, but I have used MongoDB in production for 5 years as well.

behaviour of balancer in mongodb sharding

I was experimenting with mongo sharding. The collection has shard key as {policyId,startTime}.
policyId - Java UUID (limited values, let's say 50)
startTime - monotonically increasing time.
After inserting around 30M documents (32 GB) into the collection, the data distribution is as follows:
shard key: { "policyId" : 1, "startDate" : 1 }
unique: false
balancing: true
chunks:
sharda 63
shardb 138
During insertion sh.isBalancerRunning() was giving 'false' as result. When I stopped inserting more documents, balancer started moving chunks. After that I got even distribution of data.
Below are my concerns / Questions regarding balancer:
1. The balancer only seemed to become active and start moving chunks once insertion into the db had stopped. If I insert data for a longer duration, more chunks will be created and the data will become more skewed, and chunk migration will itself take longer to balance the shards. So how does mongo decide when to migrate chunks?
2. I noticed spikes in write latency when data was inserted after 20M docs. Does that mean the balancer is moving some of the chunks intermittently?
3. The Count API gives inconsistent results during chunk migration because the balancer copies chunks from one shard to another and then deletes the old chunk. Should we expect the Find API to also give incorrect results (duplicate docs)?
If possible, could anyone share documentation or a blog post on the mongo balancer for better understanding?
The assumption is wrong (i.e. that the balancer only becomes active and starts moving chunks once insertion into the db is stopped). The balancer process automatically migrates chunks when there is an uneven distribution of a sharded collection's chunks across the shards.
Migration is not a continuous or steady process. Automatic migration happens when it is required. For more details refer to https://docs.mongodb.com/v3.0/core/sharding-balancing/#sharding-migration-thresholds
Reads during a migration will not give incorrect results. No duplicate records should come back via the find API.
For more about the balancer refer to https://docs.mongodb.com/manual/core/sharding-balancer-administration/
About migration refer to https://docs.mongodb.com/v3.0/core/sharding-chunk-migration/
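The migration thresholds on the v3.0 page linked above can be written down as a small sketch: the balancer starts a round when the difference in chunk count between the fullest and the emptiest shard reaches a threshold that depends on the total number of chunks. The threshold values (2, 4, 8) are as I remember them from that page and may differ in other MongoDB versions. The function names are hypothetical.

```python
def migration_threshold(total_chunks):
    """Chunk-count difference that triggers balancing (v3.0-era values)."""
    if total_chunks < 20:
        return 2
    if total_chunks < 80:
        return 4
    return 8


def balancer_would_migrate(chunks_per_shard):
    """True if the spread between fullest and emptiest shard reaches the threshold."""
    total = sum(chunks_per_shard.values())
    spread = max(chunks_per_shard.values()) - min(chunks_per_shard.values())
    return spread >= migration_threshold(total)


# Distribution reported in the question.
distribution = {"sharda": 63, "shardb": 138}
print(balancer_would_migrate(distribution))
```

This only models the decision to start balancing. It says nothing about how quickly chunks move while heavy inserts are running.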
There are various things to consider:
Default chunk size - 64 MB
Cardinality - If cardinality is high enough that, over the period you store data for (assume 1 or more years), no single value accumulates more than 64 MB, then you don't have to worry. If not, you will probably have to increase the default chunk size.
Suppose you have 2 shards and the cardinality (hash key) is 100: then the data for 50 values will go to one shard and the data for the other 50 to the other. If you have range keys then 0-50 will go to one shard and 50-100 to the other.
Now suppose your current chunk with values A to F reaches 64 MB: this chunk will then be split, and data can be moved to another shard.
If your cardinality is low, then the data for value A alone can exceed 64 MB; the chunk cannot be split and is marked as a jumbo chunk.
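The split-versus-jumbo distinction can be illustrated with a toy model. This is a simplification for illustration only, not MongoDB's actual split logic: a chunk is treated as a list of (shard key value, size in MB) pairs, and it counts as splittable only if it contains more than one distinct shard key value.

```python
CHUNK_SIZE_MB = 64  # default chunk size mentioned above


def splittable(chunk):
    """A split point must fall between two distinct shard key values."""
    return len({key for key, _ in chunk}) > 1


def classify(chunk, max_mb=CHUNK_SIZE_MB):
    """Return 'ok', 'split' or 'jumbo' for a list of (key, size_mb) pairs."""
    size_mb = sum(size for _, size in chunk)
    if size_mb <= max_mb:
        return "ok"
    return "split" if splittable(chunk) else "jumbo"


print(classify([("A", 10)]))                    # small chunk
print(classify([("A", 40), ("F", 30)]))         # oversized, several key values
print(classify([("A", 40), ("A", 30)]))         # oversized, one key value only
```

With a compound shard key such as the { policyId, startTime } key from the question, the key would be the pair of both fields, so a low-cardinality prefix alone does not make a chunk unsplittable in this model.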

Migrating chunks from primary shard to the other single shard takes too long

Each chunk move takes about 30-40 mins.
The shard key is a random-looking but monotonically increasing integer string, i.e. a long sequence of digits. A "hashed" index is created for that field.
There are 150M documents each about 1.5Kb in size. The sharded collection has 10 indexes (some of them compound).
I have a total of ~11k chunks reported in sh.status(). So far I could only transfer 42 of them to the other shard.
The system consists of one mongos, one config server, one primary (mongod) shard and one other (mongod) shard, all on the same server, which has 8 cores and 32 GB RAM.
I know the ideal is to use separate machines, but none of the CPUs were utilized so I thought it was good for a start.
What is your comment?
What do I need to investigate?
Is it normal?
As the MongoDB documentation says: "Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. With sharding, you add more machines to support data growth and the demands of read and write operations."
You should definitely not have your shards on the same machine. It is useless. The point of sharding is to scale horizontally, so if you shard on the same machine you are just killing your throughput.
Your database will be faster without sharding if you have one machine.
To avoid data loss, before using sharding you should use RAID (not 0), then a replica set, and only then sharding.
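For a sense of scale, the figures in the question can be turned into a rough time estimate. The sketch assumes that an even balance means half of the ~11k chunks end up on the second shard and that chunks move one at a time at the reported 30-40 minutes each. Both are assumptions for illustration.

```python
# Figures from the question.
TOTAL_CHUNKS = 11_000
ALREADY_MOVED = 42
MINUTES_PER_MOVE = (30, 40)
SHARDS = 2


def chunks_left_to_move(total, shards, already_moved):
    """Chunks still to migrate until the second shard holds an even share."""
    target_on_other_shard = total // shards
    return max(target_on_other_shard - already_moved, 0)


def days_remaining(chunks, minutes_per_move):
    """Wall-clock days if chunks move strictly one after another."""
    return chunks * minutes_per_move / (60 * 24)


remaining = chunks_left_to_move(TOTAL_CHUNKS, SHARDS, ALREADY_MOVED)
low, high = (days_remaining(remaining, m) for m in MINUTES_PER_MOVE)
print(f"{remaining} chunks left, roughly {low:.0f} to {high:.0f} days")
```

At that rate the balancing would take months, which supports the advice above not to shard on a single machine.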

Can't map file memory-mongo requires 64 bit build for larger datasets

I have a sharded cluster in 3 systems.
While inserting I get the error message:
cant map file memory-mongo requires 64 bit build for larger datasets
I know that a 32-bit machine has a size limit of 2 GB.
I have a few questions to ask.
Is the 2 GB limit per system? My sharding is done across 3 systems, so would the total be only 2 GB, or 6 GB?
Even though sharding is set up properly, is all the data stored on a single system instead of being distributed across all three sharded systems?
Does sharding play any role in increasing the data size limit?
Does chunk size play any vital role in performance?
I would not recommend doing anything with 32-bit MongoDB beyond running it on a development machine where you perhaps cannot run 64-bit. Once you hit the limit, the file becomes unusable.
The documentation states "Use 64 bit for production. This is important as if you hit the mmap size limit (exact limit varies but less than 2GB) you will be unable to write to the database (analogous to a disk full condition)."
Sharding is all about scaling out your data set across multiple nodes so in answer to your question, yes you have increased the possible size of your data set. Remember though that namespaces and indexes also take up space.
You haven't specified where your mongos resides. Where are you seeing the error from - a mongod or the mongos? I suspect that it's the mongod, which would seem to indicate that all your data is going to the one mongod. I believe that you need to look at pre-splitting the chunks - http://docs.mongodb.org/manual/administration/sharding/#splitting-chunks.
If you have a mongos, what does sh.status() return? Are chunks spread across all mongod's?
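Why the distribution matters for the 2 GB versus 6 GB question can be shown with a little arithmetic. The sketch assumes a per-mongod limit of 2 GB (the documentation quoted above says the real limit is somewhat lower) and that the cluster stops accepting writes as soon as the fullest shard reaches it. The function name is made up for illustration.

```python
LIMIT_GB = 2.0  # upper bound for a 32-bit mongod; the real limit is lower


def max_total_data_gb(share_per_shard, limit_gb=LIMIT_GB):
    """Total data the cluster can hold before the fullest shard hits the limit.

    share_per_shard: fraction of all data that lands on each shard (sums to 1).
    """
    return limit_gb / max(share_per_shard)


even = [1 / 3, 1 / 3, 1 / 3]   # chunks spread evenly over 3 shards
skewed = [1.0, 0.0, 0.0]       # everything on one mongod

print(f"even distribution:   {max_total_data_gb(even):.1f} GB")
print(f"all on one shard:    {max_total_data_gb(skewed):.1f} GB")
```

So three shards only raise the ceiling towards 6 GB if the chunks are actually spread across all of them.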
For testing, I'd recommend a chunk size of 1 MB. In production, it's best to stick with the default of 64 MB unless you have a really important reason not to and you really know what you are doing. If your chunk size is too small, you will be performing splits far too often.