As the cluster grows, chunks are split. The docs say that "the balancer will not move chunks off an overloaded shard. This must happen manually." So will the extra chunks of a shard that has reached its maxSize limit be moved to another shard that hasn't exceeded its maxSize, or will they stay on the same shard until someone manually moves those extra bytes and chunks off it?
This is specific to the case where you have set a maxSize limit for a shard and that limit has been reached. The balancer will no longer migrate chunks to that shard, and it will remain "full" unless you manually move some chunks to another shard via sh.moveChunk(). The default behaviour is to have no maxSize set, so shards can use as much disk space as is available.
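For illustration, a manual migration in the mongo shell might look roughly like the following; the namespace, shard key value and destination shard name are placeholders:
// Move the chunk that contains { user_id: 500 } to a shard that still has headroom
sh.moveChunk("mydb.users", { user_id: 500 }, "shard0001")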
My scenario is that I have, for example, 2 servers, one with a bigger hard drive than the other. So if one is 500 GB and the other is 1 TB and the first fills up with data, what happens when I add more data to the servers? Will the balancer know that the first is full and transfer the extra data from the first server to the second?
MongoDB balances data between shards on the basis of logical chunks that are contiguous ranges of values based on the shard key you have selected. By default a chunk represents roughly 64MB of data.
MongoDB is unaware of the underlying disk configuration, so if the server with shardA has twice as much disk space as the server with shardB, the balancer is still only considering the number of chunks associated with each shard (not the actual disk usage). Ideally all shards should have similar configuration in terms of hardware and disk space.
If you use the maxSize option to limit the storage on a specific shard, this setting only controls whether the balancer will move chunks to that shard once the maxSize has been reached.
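As a point of reference, maxSize is stored in megabytes in the config database, so capping a shard at roughly 500 GB could be sketched like this (the shard id is a placeholder):
// maxSize is expressed in MB; 512000 MB is roughly 500 GB
db.getSiblingDB("config").shards.updateOne({ _id: "shard0000" }, { $set: { maxSize: 512000 } })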
For more information see Sharded Collection Balancing in the MongoDB documentation.
Related
My MongoDB sharded cluster's ingestion performance doesn't scale up when adding a new shard.
I have a small cluster setup with 1 mongos + 1 config replica set (3 nodes) + N shard replica sets (3 nodes each).
Mongos is on a dedicated Kubernetes node, and each mongod process hosting a shard has its own dedicated k8s node, while the config server processes run wherever they happen to be deployed.
The cluster is used mainly for GridFS file hosting, with a typical file being around 100 MB.
I am doing stress tests with 1, 2 and 3 shards to see if it scales properly, and it doesn't.
If I start a brand new cluster with 2 shards and run my test, it ingests files at (approx) twice the speed I had with 1 shard, but if I start the cluster with 1 shard, then perform the test, then add 1 more shard (total 2 shards), then perform the test again, the speed of ingestion is approximately the same as before with 1 shard.
Looking at where chunks go, when I start the cluster immediately with 2 shards the load is evenly balanced between shards.
If I start with 1 shard and add a second one after some insertions, then the chunks tend to all go to the old shard, and the balancer has to move them to the second shard later.
Quick facts:
chunksize 1024 MB
sharding key is GridFS file_id, hashed
This is due to how hashed sharding and balancing work.
In an empty collection (from Shard an Empty Collection):
The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates across the cluster.
So if you execute sh.shardCollection() on a cluster with x number of shards, it will create 2 chunks per shard and distribute them across the shards, totalling 2x chunks across the cluster. Since the collection is empty, moving the chunks around takes little time. Your ingestion will now be distributed evenly across the shards (assuming other things, e.g. good cardinality of the hashed field).
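As a sketch of that step for the GridFS case described above (the database name is a placeholder):
// On an empty collection this pre-creates 2 chunks per shard
// and distributes them across the cluster
sh.shardCollection("mydb.fs.chunks", { files_id: "hashed" })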
Now if you add a new shard after the chunks were created, that shard starts empty and the balancer will start to send chunks to it using the Migration Thresholds. In a populated collection, this process may take a while to finish.
If while the balancer is still moving chunks around (which may not be empty now) you do another ingestion, the cluster is now doing two different jobs at the same time: 1) ingestion, and 2) balancing.
When you're doing this with 1 shard and add another shard, it's likely that the chunks you're ingesting into are still located in shard 1 and haven't moved to the new shard yet, so most data will go into that shard.
Thus you should wait until the cluster is balanced after adding the new shard before doing another ingestion. After it's balanced, the ingestion load should be more evenly distributed.
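One rough way to check this from the mongos (the namespace is a placeholder, and the ns field assumes the pre-5.0 config.chunks schema):
// Is a balancing round currently in progress?
sh.isBalancerRunning()
// Compare chunk counts per shard for the collection
db.getSiblingDB("config").chunks.aggregate([
  { $match: { ns: "mydb.fs.chunks" } },
  { $group: { _id: "$shard", chunks: { $sum: 1 } } }
])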
Note: since your shard key is file_id, I'm assuming that each file is approximately the same size (~100 MB). If some files are much larger than others, some chunks will be busier than others as well.
I deployed a sharded cluster of two shards with MongoDB version 3.0.3.
Unfortunately, I chose a monotonic shard key just like:
{insertTime: 1}
When the data size was small and the write speed was slow, the balancer could balance the data between the two shards. But as the data size grew and our write speed became much faster, the balancing became very slow.
Now the disk of one of the two shards, called shard2, is close to its limit.
How can I solve this problem without stopping our service and application?
I strongly suggest that you change your shard key while it's not too late to do so, to avoid the predictable death of your cluster.
When a shard key increases monotonically, all write operations are sent to a single shard. Thus the chunk receiving the writes will grow and then split into 2 chunks, and you will continue to hammer the newest one until it splits again. At some point your cluster won't be balanced anymore, it will trigger chunk moves, and that will slow your cluster down even more.
MongoDB generates ObjectId values upon document creation to produce a unique identifier for the object. However, the most significant bits of data in this value represent a time stamp, which means that they increment in a regular and predictable pattern. Even though this value has high cardinality, when using this, any date, or other monotonically increasing number as the shard key, all insert operations will be storing data into a single chunk, and therefore, a single shard. As a result, the write capacity of this shard will define the effective write capacity of the cluster.
You do not benefit from the good parts of sharding with this shard key. It's actually worse in performance than a single node.
You should read this to select your new shard key and avoid the typical anti-patterns: http://docs.mongodb.org/manual/tutorial/choose-a-shard-key/
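For example, a hashed shard key is one common way to spread monotonically increasing values such as timestamps across shards; a rough sketch with a placeholder namespace:
// Hash the monotonically increasing field so writes spread across all shards
sh.shardCollection("mydb.events", { insertTime: "hashed" })
Keep in mind that in MongoDB 3.0 you cannot change the shard key of an existing sharded collection, so in practice this means creating a new collection with the better key and migrating the data into it.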
You could add a shard to the cluster to increase capacity.
From the docs:
You add shards to a sharded cluster after you create the cluster or any time that you need to add capacity to the cluster. If you have not created a sharded cluster, see Deploy a Sharded Cluster.
When adding a shard to a cluster, always ensure that the cluster has enough capacity to support the migration required for balancing the cluster without affecting legitimate production traffic.
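A minimal sketch of adding a shard from the mongos (the replica set name and host names are placeholders):
// Add a new shard backed by replica set "rs2"
sh.addShard("rs2/mongodb3.example.net:27017,mongodb4.example.net:27017,mongodb5.example.net:27017")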
My scenario is that I have, for example, 2 servers (shards), one with a bigger hard drive than the other. So if one is 500 GB and the other is 1 TB and the first fills up with data, what happens when I add more data to the servers? Will the balancer know that the first is full and transfer the extra data from the first server to the second?
No. The balancer will try to evenly partition the chunks on all shards.
First, your largest shard will not necessarily fill up first. Over time you will probably have a similar number of chunks and a similar amount of data on both shards. This is why it is recommended to have similar server specs.
Nevertheless, if you want to partition your chunks in a ratio of two to one, you can do one of the following:
Change the Maximum Storage Size for a Given Shard, using a 2:1 ratio
Use Tag Aware Sharding, which is the more manageable and predictable option (see the sketch after this list)
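A rough sketch of the tag-aware option, with placeholder shard names, namespace and key ranges; the ranges you choose are what determine the 2:1 split:
// Tag the shards
sh.addShardTag("shard0000", "small")
sh.addShardTag("shard0001", "large")
// Pin roughly one third of the key space to the small shard and the rest to the large one
sh.addTagRange("mydb.users", { user_id: MinKey }, { user_id: 1000 }, "small")
sh.addTagRange("mydb.users", { user_id: 1000 }, { user_id: MaxKey }, "large")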
I have a collection where the sharding key is a UUID (hexadecimal string). The collection is huge: 812 million documents, about 9600 chunks on 2 shards. For some reason I initially stored documents which had an integer instead of a UUID in the sharding key field. Later I deleted them completely, and now all of my documents are sharded by UUID. But I am now facing a problem with chunk distribution. While I had documents with an integer instead of a UUID, the balancer created about 2700 chunks for these documents and left all of them on one shard. When I deleted all these documents, the chunks were not deleted; they stayed empty, and they will always be empty because I only use UUIDs now. Since the balancer distributes chunks based on chunk count per shard, not document count or size, one of my shards takes 3 times more disk space than the other:
--- Sharding Status ---
db.click chunks:
set1 4863
set2 4784 // 2717 of them are empty
set1> db.click.count()
191488373
set2> db.click.count()
621237120
The sad thing here is that MongoDB does not provide commands to remove or merge chunks manually.
My main question is, would any of the following work to get rid of the empty chunks:
Stop the balancer. Connect to each config server, remove the ranges of the empty chunks from config.chunks, and also fix the minKey chunk so it ends at the beginning of the first non-empty chunk. Start the balancer.
Seems risky, but as far as I see, config.chunks is the only place where chunk information is stored.
Stop the balancer. Start a new mongod instance and connect it as a 3rd shard. Manually move all empty chunks to this new shard, then shut it down forever. Start the balancer.
Not sure, but as long as I don't use integer values in the sharding key again, all queries should run fine.
Some might read this and think that the empty chunks are occupying space. That's not the case - chunks themselves take up no space - they are logical ranges of shard keys.
However, chunk balancing across shards is based on the number of chunks, not the size of each chunk.
You might want to add your voice to this ticket: https://jira.mongodb.org/browse/SERVER-2487
Since the MongoDB balancer only balances the number of chunks across shards, having too many empty chunks in a collection can cause the shards to be balanced by chunk count but severely unbalanced by data size per shard (e.g., as shown by db.myCollection.getShardDistribution()).
You need to identify the empty chunks and merge them into chunks that have data. This will eliminate the empty chunks. This is all now documented in the MongoDB docs (at least 3.2 and above, maybe even prior to that).
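A rough sketch of that procedure, assuming a hypothetical shard key field u and illustrative boundary values taken from config.chunks; the chunks being merged must be contiguous and on the same shard, and the balancer should be stopped while doing this:
// 1. Confirm a suspect chunk is empty: "size" and "numObjects" come back as 0
db.getSiblingDB("db").runCommand({
  dataSize: "db.click",
  keyPattern: { u: 1 },
  min: { u: 1000 },
  max: { u: 5000 }
})
// 2. Merge the empty chunk into its adjacent chunk; bounds are the min of the
//    first chunk and the max of the last chunk being merged
db.adminCommand({
  mergeChunks: "db.click",
  bounds: [ { u: 1000 }, { u: "0003b3e2c0f1a2d4" } ]
})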
Reading the MongoDB documentation for indexes, I was left a little mystified and unsettled by this assertion found at http://docs.mongodb.org/manual/applications/indexes/#ensure-indexes-fit-ram:
If you have and use multiple collections, you must consider the size of all indexes on all collections. The indexes and the working set must be able to fit in RAM at the same time.
So how is this supposed to scale when new shard nodes are added? Suppose all my 576 nodes are limited to 8 GB, and I have 12 collections of 4 GB each (including their associated indexes) and 3 collections of 16 GB each (including indexes). How does sharding spread the data between nodes so that the 12 collections can be queried efficiently?
When sharding you spread the data across different shards. The mongos process routes queries to shards it needs to get data from. As such you only need to look at the data a shard is holding. To quote from When to Use Sharding:
You should consider deploying a sharded cluster, if:
your data set approaches or exceeds the storage capacity of a single node in your system.
the size of your system’s active working set will soon exceed the capacity of the maximum amount of RAM for your system.
Also note that the working set != whole collection. The working set is defined as:
The collection of data that MongoDB uses regularly. This data is typically (or preferably) held in RAM.
E.g. you have 1TB of data but typically only 50GB is used/queried. That subset is preferably held in RAM.
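To get a rough sense of how much RAM the indexes on a given shard actually need, you can add up index sizes per collection in the shell (the collection name is a placeholder):
// Total size in bytes of all indexes on one collection
db.myCollection.totalIndexSize()
// Per-index breakdown
db.myCollection.stats().indexSizes
// Sum index sizes across every collection in the current database
db.getCollectionNames().reduce(function (total, name) {
  return total + db.getCollection(name).totalIndexSize();
}, 0)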