MongoDB sharding - chunks do not have the same size

I am new to playing with MongoDB.
Because I have to store roughly 50 million documents, I had to set up a MongoDB sharded cluster with two replica sets.
The document looks like this:
{
"_id" : "predefined_unique_id",
"appNr" : "abcde",
"modifiedDate" : ISODate("2016-09-16T13:00:57.000Z"),
"size" : NumberLong(803),
"crc32" : NumberLong(538462645)
}
The shard key is appNr (it was selected for query performance reasons: all documents having the same appNr have to stay within one chunk).
Usually multiple documents have the same appNr.
After loading around two million records, I see the chunks are equally balanced; however, when running db.my_collection.getShardDistribution(), I get:
Shard rs0 at rs0/...
data : 733.97MiB docs : 5618348 chunks : 22
estimated data per chunk : 33.36MiB
estimated docs per chunk : 255379
Shard rs1 at rs1/...
data : 210.09MiB docs : 1734181 chunks : 19
estimated data per chunk : 11.05MiB
estimated docs per chunk : 91272
Totals
data : 944.07MiB docs : 7352529 chunks : 41
Shard rs0 contains 77.74% data, 76.41% docs in cluster, avg obj size on shard : 136B
Shard rs1 contains 22.25% data, 23.58% docs in cluster, avg obj size on shard : 127B
My question is: what settings should I change in order to get the data equally distributed between the shards? I would like to understand how the data gets split into chunks. I have defined a ranged shard key and a chunk size of 264 MB.

MongoDB uses the shard key associated with the collection to partition the data into chunks. A chunk consists of a subset of the sharded data. Each chunk has an inclusive lower and an exclusive upper range based on the shard key.
[Diagram: the shard key value space segmented into smaller ranges, or chunks.]
The mongos routes writes to the appropriate chunk based on the shard key value. MongoDB splits chunks when they grow beyond the configured chunk size. Both inserts and updates can trigger a chunk split.
The smallest range a chunk can represent is a single unique shard key
value. A chunk that only contains documents with a single shard key
value cannot be split.
Chunk size has a major impact on the shards.
The default chunk size in MongoDB is 64 megabytes. We can increase or reduce the chunk size, but any modification should be made only after considering the items below:
Small chunks lead to a more even distribution of data at the expense of more frequent migrations. This creates expense at the query routing (mongos) layer.
Large chunks lead to fewer migrations. This is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer. But, these efficiencies come at the expense of a potentially uneven distribution of data.
Chunk size affects the Maximum Number of Documents Per Chunk to Migrate.
Chunk size affects the maximum collection size when sharding an existing collection. Post-sharding, chunk size does not constrain collection size.
Considering this information and your shard key "appNr", the uneven distribution is most likely caused by the chunk size.
Try lowering the chunk size from the 264 MB you currently have and see whether the document distribution changes. Note that this is a trial-and-error approach and it may take a considerable amount of time and several iterations.
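If you go this route, a minimal sketch of lowering the global chunk size from a mongos shell, assuming a target of 32 MB (the value is only an example):
use config
db.settings.save( { _id: "chunksize", value: 32 } )   // size in MB; applies cluster-wide
The new size only takes effect when chunks are next split or migrated; existing chunks are not rewritten immediately.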
Reference : https://docs.mongodb.com/v3.2/core/sharding-data-partitioning/
Hope it Helps!

I'll post my findings here - maybe they will have some further use.
The MongoDB documentation says that "when a chunk grows beyond the specified chunk size" it gets split.
I think the documentation is not fully accurate or rather incomplete.
When mongo does auto-splitting, the splitVector command asks the primary shard for split points and then splits accordingly. This happens first when roughly 20% of the specified chunk size is reached and, if no split points are found, is retried at 40%, 60% and so on, so the split should not wait for the maximum size.
In my case this worked fine for the first half of the chunks, but for the second half the split happened only after the maximum chunk size was exceeded. I still have to investigate why the split didn't happen earlier, as I see no reason for this behaviour.
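While investigating this, it helps to look at the actual chunk boundaries the splits produced; a minimal sketch against the config database (the namespace is an example):
use config
// one document per chunk: its range ([min, max)) and the shard it currently lives on
db.chunks.find( { ns: "mydb.my_collection" }, { min: 1, max: 1, shard: 1 } ).sort( { min: 1 } )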
After the splitting into chunks, the balancer starts. It divides the chunks equally across the shards, without considering chunk size (a chunk with 0 documents is equal to a chunk with 100 documents in this regard). The chunks are moved in the order of their creation.
My problem was that the second half of the chunks was almost twice the size of the first half. Therefore, as the balancer always moved the first half of the chunk collection to the other shard, the cluster became unbalanced.
I found a much better explanation here.
In order to fix it, I changed the shard key to "hashed".
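Since the shard key of an already sharded collection cannot be changed in these versions, that meant re-creating the collection and sharding it on the hashed key, roughly like this (the namespace is an example):
db.my_collection.ensureIndex( { appNr: "hashed" } )
sh.shardCollection( "mydb.my_collection", { appNr: "hashed" } )
Documents with the same appNr still hash to the same value and therefore stay in the same chunk, but the hashed values spread far more evenly across the chunk ranges.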

Related

What happens if the chunk size goes beyond the limit [64 MB] for a single shard key in MongoDB

We have a sharded MongoDB cluster; the shard key is sellerId. We have nearly 20k sellers and we capture responses for sellers. Some sellers may have a huge response set. Now let's say sellerId 10001 has some very good listings and got millions of responses; in that case the single shard key value 10001 holds a huge amount of data and goes beyond the default size of 64 MB. As per the Mongo documentation there can be only one chunk per shard key value. What will happen with this chunk? Does the chunk size automatically increase?

behaviour of balancer in mongodb sharding

I was experimenting with mongo sharding. The collection's shard key is {policyId, startTime}.
policyId - a Java UUID (limited values, let's say 50)
startTime - monotonically increasing time.
After inserting around 30M (32 GB) documents into the collection, below is the data distribution:
shard key: { "policyId" : 1, "startDate" : 1 }
unique: false
balancing: true
chunks:
sharda 63
shardb 138
During insertion sh.isBalancerRunning() was returning 'false'. When I stopped inserting documents, the balancer started moving chunks. After that I got an even distribution of data.
Below are my concerns / questions regarding the balancer:
1. Is the balancer only active and moving chunks once insertion into the db has stopped? If I insert more data for a longer duration, more chunks will be created, the data will be more skewed, and chunk migration will itself take more time to balance the shards. So how does mongo decide when to migrate chunks?
2. I was able to notice spikes in write latency when data was being inserted beyond 20M docs. Does that mean the balancer was moving some of the chunks intermittently?
3. The count API gives inconsistent results during chunk migration because the balancer copies chunks from one shard to another and then deletes the old chunk. Should we expect the find API to also give incorrect results (duplicate docs)?
If possible, could anyone share documentation or a blog post about the mongo balancer for better understanding?
Your assumption (i.e. that the balancer is only active and moving chunks once insertion into the db has stopped) is wrong. The balancer process automatically migrates chunks when there is an uneven distribution of a sharded collection's chunks across the shards.
Migration is not a continuous or steady process; automatic migration happens when it is required. For more details refer to https://docs.mongodb.com/v3.0/core/sharding-balancing/#sharding-migration-thresholds
Reads during a migration will not give incorrect results. No duplicate records should come back via the find API.
For more about balancer refer https://docs.mongodb.com/manual/core/sharding-balancer-administration/
About migration refer https://docs.mongodb.com/v3.0/core/sharding-chunk-migration/
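For a quick look at what the balancer is doing at any moment, a small sketch using the standard shell helpers:
sh.getBalancerState()      // true if the balancer is enabled at all
sh.isBalancerRunning()     // true only while a migration round is actually in progress
sh.setBalancerState(true)  // re-enable the balancer if it was switched off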
There are various things to consider
Default chunk size - 64 MB
Cardinality - if the cardinality is high and, over a period of time, your data will not cause a single value to grow beyond 64 MB (assume you store 1 or more years of data), then you don't have to worry. If not, you will probably have to increase the default chunk size.
Suppose you have 2 shards and the cardinality (hashed key) is 100: then data for 50 of the values will go to one shard and data for the other 50 to the other. If you have a range key, then 0-50 will go to one shard and 50-100 to the other.
Now suppose your current chunk with values A to F reaches a size of 64 MB; this chunk will then be split and data will be moved to the other shard.
If your cardinality is low, the value A by itself can be more than 64 MB; such a chunk cannot be split and is marked as a jumbo chunk.
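To check whether you already have chunks in that state, a minimal sketch against the config database (the namespace is an example):
use config
db.chunks.find( { jumbo: true } ).pretty()            // chunks flagged jumbo: cannot be split or moved
db.chunks.find( { ns: "mydb.responses" } ).count()    // total number of chunks for one namespace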

MongoDB: One of the shards is not balanced like the other shards

I have a sharded cluster set up for my app, but unfortunately one of the shards is holding 17 GB of data while the others hold 3 GB on average. What could be the issue?
sh.status() gives me huge output. Shared here: https://www.dropbox.com/s/qqsucbm6q9egbhf/shard.txt?dl=0
The shard distribution details of the problematic collection are below.
mongos> db.MyCollection_1_100000.getShardDistribution()
Shard shard_0 at shard_0/mongo-11.2816.mongodbdns.com:27000,mongo-12.2816.mongodbdns.com:27000,mongo-13.2816.mongodbdns.com:27000,mongo-3.2816.mongodbdns.com:27003
data : 143.86MiB docs : 281828 chunks : 4
estimated data per chunk : 35.96MiB
estimated docs per chunk : 70457
Shard shard_1 at shard_1/mongo-10.2816.mongodbdns.com:27000,mongo-11.2816.mongodbdns.com:27002,mongo-19.2816.mongodbdns.com:27001,mongo-9.2816.mongodbdns.com:27005
data : 107.66MiB docs : 211180 chunks : 3
estimated data per chunk : 35.88MiB
estimated docs per chunk : 70393
Shard shard_2 at shard_2/mongo-14.2816.mongodbdns.com:27000,mongo-3.2816.mongodbdns.com:27000,mongo-4.2816.mongodbdns.com:27000,mongo-6.2816.mongodbdns.com:27002
data : 107.55MiB docs : 210916 chunks : 3
estimated data per chunk : 35.85MiB
estimated docs per chunk : 70305
Shard shard_3 at shard_3/mongo-14.2816.mongodbdns.com:27004,mongo-18.2816.mongodbdns.com:27002,mongo-6.2816.mongodbdns.com:27000,mongo-8.2816.mongodbdns.com:27000
data : 107.99MiB docs : 211506 chunks : 3
estimated data per chunk : 35.99MiB
estimated docs per chunk : 70502
Shard shard_4 at shard_4/mongo-12.2816.mongodbdns.com:27001,mongo-13.2816.mongodbdns.com:27001,mongo-17.2816.mongodbdns.com:27002,mongo-6.2816.mongodbdns.com:27003
data : 107.92MiB docs : 211440 chunks : 3
estimated data per chunk : 35.97MiB
estimated docs per chunk : 70480
Shard shard_5 at shard_5/mongo-17.2816.mongodbdns.com:27001,mongo-18.2816.mongodbdns.com:27001,mongo-19.2816.mongodbdns.com:27000
data : 728.64MiB docs : 1423913 chunks : 4
estimated data per chunk : 182.16MiB
estimated docs per chunk : 355978
Shard shard_6 at shard_6/mongo-10.2816.mongodbdns.com:27001,mongo-14.2816.mongodbdns.com:27005,mongo-3.2816.mongodbdns.com:27001,mongo-8.2816.mongodbdns.com:27003
data : 107.52MiB docs : 211169 chunks : 3
estimated data per chunk : 35.84MiB
estimated docs per chunk : 70389
Shard shard_7 at shard_7/mongo-17.2816.mongodbdns.com:27000,mongo-18.2816.mongodbdns.com:27000,mongo-19.2816.mongodbdns.com:27003,mongo-9.2816.mongodbdns.com:27003
data : 107.87MiB docs : 211499 chunks : 3
estimated data per chunk : 35.95MiB
estimated docs per chunk : 70499
Shard shard_8 at shard_8/mongo-19.2816.mongodbdns.com:27002,mongo-4.2816.mongodbdns.com:27002,mongo-8.2816.mongodbdns.com:27001,mongo-9.2816.mongodbdns.com:27001
data : 107.83MiB docs : 211154 chunks : 3
estimated data per chunk : 35.94MiB
estimated docs per chunk : 70384
Shard shard_9 at shard_9/mongo-10.2816.mongodbdns.com:27002,mongo-11.2816.mongodbdns.com:27003,mongo-12.2816.mongodbdns.com:27002,mongo-13.2816.mongodbdns.com:27002
data : 107.84MiB docs : 211483 chunks : 3
estimated data per chunk : 35.94MiB
estimated docs per chunk : 70494
Totals
data : 1.69GiB docs : 3396088 chunks : 32
Shard shard_0 contains 8.29% data, 8.29% docs in cluster, avg obj size on shard : 535B
Shard shard_1 contains 6.2% data, 6.21% docs in cluster, avg obj size on shard : 534B
Shard shard_2 contains 6.2% data, 6.21% docs in cluster, avg obj size on shard : 534B
Shard shard_3 contains 6.22% data, 6.22% docs in cluster, avg obj size on shard : 535B
Shard shard_4 contains 6.22% data, 6.22% docs in cluster, avg obj size on shard : 535B
Shard shard_5 contains 42% data, 41.92% docs in cluster, avg obj size on shard : 536B
Shard shard_6 contains 6.19% data, 6.21% docs in cluster, avg obj size on shard : 533B
Shard shard_7 contains 6.21% data, 6.22% docs in cluster, avg obj size on shard : 534B
Shard shard_8 contains 6.21% data, 6.21% docs in cluster, avg obj size on shard : 535B
Shard shard_9 contains 6.21% data, 6.22% docs in cluster, avg obj size on shard : 534B
I have 150+ similar collections where I have divided the data by user ids,
e.g. MyCollection_1_100000
MyCollection_100001_200000
MyCollection_200001_300000
Here I have put the data of user ids ranging from 1 to 100000 into MyCollection_1_100000, and likewise for the other collections.
The shard key for all 150+ collections is a sequential number, but it is hashed. It was applied in the following way:
db.MyCollection_1_100000.ensureIndex({"column": "hashed"})
sh.shardCollection("dbName.MyCollection_1_100000", { "column": "hashed" })
Please suggest corrective steps to get rid of the unbalanced shard problem.
Unsharded Collections
Shard 5 is the primary shard in your cluster, which means it holds all unsharded collections and therefore grows bigger in size. You should check for that. See here.
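A quick way to verify this, as a small sketch against the config database (run from a mongos):
use config
// which shard is the primary for each database; unsharded collections live on that shard
db.databases.find( {}, { _id: 1, primary: 1, partitioned: 1 } )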
Chunk Split
As Markus pointed out, distribution is done by chunk and not by document. Chunks may grow up to the defined chunk size; when they exceed the chunk size they are split and redistributed. In your case there seems to be at least one collection that has one more chunk than all the other shards. The reason could be either that the chunk has not yet reached its chunk limit (check db.settings.find( { _id:"chunksize" } ); the default size is 64 MB, see also here) or that the chunk cannot be split because the range it represents cannot be split further automatically. You should check the ranges using the sh.status(true) command (the output of the ranges is omitted for some collections in the large output you posted).
However, you may split the chunk manually, for example as sketched below.
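A minimal sketch, assuming the namespace from the question and an arbitrary example key value (with a hashed shard key, sh.splitAt would instead take one of the hashed NumberLong boundary values shown by sh.status(true)):
// split the chunk containing the matching document at its median point
sh.splitFind( "dbName.MyCollection_1_100000", { column: 4500 } )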
There is also a quite good answer on the dba forum.
Shard Key
If you have no unsharded collections, the problem may be the shard key itself. Mongo suggests using a shard key with high cardinality and a high degree of randomness. Without knowing the value range of your columns, I assume the cardinality is rather low (i.e. 1000 columns) compared to, let's say, a timestamp (one per entry, making for a LOT of different values).
Further, the data should be evenly distributed. So let's say you have 10 possible columns, but there are far more entries with one particular value for the column name; all those entries would be written to the same shard. For example:
entries.count({column: "A"}) = 10 -> shard 0
entries.count({column: "B"}) = 10 -> shard 1
...
entries.count({column: "F"}) = 100 -> shard 5
The sh.status() command should give you some more information about the chunks.
Using the object id or a timestamp - values that are monotonically increasing - will likewise lead to data being written to the same chunk.
So Mongo suggests using a compound key, which leads to a higher cardinality (value range of field1 x value range of field2). In your case you could combine the column name with a timestamp, for example as sketched below.
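A sketch of what that could look like for a new collection (the createdAt field and the namespace are hypothetical, only to illustrate the compound key):
db.MyCollection_new.ensureIndex( { column: 1, createdAt: 1 } )
sh.shardCollection( "dbName.MyCollection_new", { column: 1, createdAt: 1 } )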
But either way, you're out of luck with your current installation, as you cannot change the shard key afterwards.
DB Design
The verbose output you printed also indicates that you have several dbs/collections with the same schema or purpose, which look to me like a sort of manual partitioning. Is there a particular reason for this? This could also have an effect on the distribution of the data in the cluster, as every collection starts to be filled on the primary node. There is at least one collection with just a single chunk on the primary, and some with 3 or 4 chunks in total, all having at least one chunk on the primary (i.e. the z_best_times_*).
Preferably you should have only a single collection for one purpose, and probably use a compound shard key (i.e. a hashed timestamp in addition).

When I create sharded collections, should it create chunks across more than 1 shard?

I am trying to confirm whether the sharding and chunking behaviour is correct in my MongoDB instance.
We have 2 shards, each with a replica set, and have:
(a) Enabled sharding for my database using the sh.enableSharding() command
(b) Added a hashed index to a new collection via the db.X.ensureIndex() command
(c) Sharded the collection via the sh.shardCollection() command.
When I run sh.status(), I notice that only one of my two shards contains chunks, implying that my data is not distributed. I have added a couple of documents to test processing, but I still only see 1 chunk. Is this the correct behaviour? Intuitively, I would expect 1..n chunks in each shard.
Thanks in advance,
Steve Westwood
Mongo will only start to split data across chunks when the first chunk has reached a certain size. When there are fewer than 10 chunks it will split when a chunk grows above about 16 MB; when there are more chunks it will split them at 64 MB.
So I expect you don't have enough data to trigger a chunk split.
You can override these chunk size values with the chunkSize option to the mongos, which can be useful for testing.
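A sketch of what that could look like when starting the mongos (the config server string is a placeholder; 1 MB is chosen only to make splits easy to trigger while testing):
mongos --configdb cfgReplSet/cfg1.example.net:27019 --chunkSize 1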

Relation between shard keys and chunks in MongoDB sharded cluster?

I can't really understand the shard key concept in a MongoDB sharded cluster, as I've just started learning MongoDB.
Citing the MongoDB documentation:
A chunk is a contiguous range of shard key values assigned to a
particular shard. When they grow beyond the configured chunk size, a
mongos splits the chunk into two chunks.
It seems that chunk size is something related to a particular shard, not to the cluster itself. Am I right?
Speaking about the cardinality of a shard key:
Consider the use of a state field as a shard key:
The state key’s value
holds the US state for a given address document. This field has a low
cardinality as all documents that have the same value in the state
field must reside on the same shard, even if a particular state’s
chunk exceeds the maximum chunk size.
Since there are a limited number of possible values for the state field, MongoDB may distribute data unevenly among a small number of fixed chunks.
My question is how the shard key relates to the chunk size.
It seems to me that, having just two shard servers, it wouldn't be possible to distribute the data, because documents with the same value in the state field must reside on the same shard. With three documents with states like Arizona, Indiana and Maine, how is the data distributed among just two shards?
In order to understand the answer to your question you need to understand range based partitioning. If you have N documents they will be partitioned into chunks - the way the split points are determined is based on your shard key.
With shard key being some field in your document, all the possible values of the shard key will be considered and all the documents will be (logically) split into chunks/ranges, based on what value each document's shard key is.
In your example there are 50 possible values for "state" (okay, probably more like 52), so at most there can only be 52 chunks. The default chunk size is 64 MB. Now imagine that you are sharding a collection with ten million documents of 1 KB each. Each chunk should not contain more than about 65K documents, so ten million documents would need to be split into more than 150 chunks, but we only have 52 distinct values for the shard key! So your chunks are going to be very large. Why is that a problem? Well, in order to auto-balance chunks among shards the system needs to migrate chunks between shards, and if a chunk is too big, it can't be moved. And since it can't be split, you'll be stuck with an unbalanced cluster.
There is definitely a relationship between shard key and chunk size. You want to choose a shard key with a high level of cardinality; that is, a shard key that can have many possible values, as opposed to something like State, which is basically locked into only 50 possible values. Low-cardinality shard keys like that can result in chunks that consist of only one shard key value and thus cannot be split and moved to another shard in a balancing operation.
High cardinality of the shard key (like a person's phone number as opposed to their State or Zip Code) is essential to ensure an even distribution of data. Low-cardinality shard keys can lead to larger chunks (because you have more contiguous values that need to be kept together) that cannot be split.
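As a small sketch of how a low-cardinality field could be folded into a higher-cardinality compound shard key (the collection name is hypothetical):
// state alone has ~50 values; adding _id makes every document's key unique, so chunks stay splittable
sh.shardCollection( "mydb.addresses", { state: 1, _id: 1 } )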