Let's suppose that a sharded collection has a shard key named "skey"
and there is another field named "ikey" that is indexed but is not the shard key.
If the query looks like this:
db.collection.find({"ikey": "something"})
It will search the docs across all the shards because "ikey" is not the shard key.
At this point, how does the mongos know that it should search across all the shards? Where is that index information stored? On the config servers, or on each sharded mongod server?
Indices are stored on the individual shards. This goes for both the shard key index and all other indices.
The mongos holds a cache of the so-called key ranges of the shard key for each shard. The key ranges are stored on the config servers, which the mongos instances contact on certain occasions to retrieve those key ranges and their associated shards. The key ranges are basically a dictionary that maps ranges of shard key values to the name of the shard on which the documents with those shard key values live.
So if you query by shard key, mongos can identify the shards which hold the matching key range(s) and sends the query only to those shards. The shards then return the data to the mongos, which will sort the returned data, if necessary.
Mongos knows it has to send the query to all shards simply because the query does not contain the shard key for the respective collection.
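To make the routing decision concrete, here is a minimal sketch in plain JavaScript. It is not how mongos is actually implemented; the shard names, the range boundaries and the targetShards function are made up for illustration, and only equality filters on the shard key are modelled.

```javascript
// Toy copy of the key-range table that a mongos caches from the config servers.
// Shard names and boundaries are hypothetical.
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardC" },
];

// Return the names of the shards a query has to be sent to.
function targetShards(filter, shardKey = "skey") {
  if (!(shardKey in filter)) {
    // No shard key in the filter: any shard may hold matches, so ask all of them.
    return [...new Set(chunks.map((c) => c.shard))];
  }
  const value = filter[shardKey];
  // Each range includes its lower bound and excludes its upper bound.
  return chunks
    .filter((c) => value >= c.min && value < c.max)
    .map((c) => c.shard);
}

console.log(targetShards({ skey: 150 }));         // [ 'shardB' ]
console.log(targetShards({ ikey: "something" })); // [ 'shardA', 'shardB', 'shardC' ]
```

Note that the table says nothing about "ikey". The index on "ikey" only comes into play once the query arrives on each shard.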
Related
I have a collection with a lot of documents. I sharded that collection. I have 2 shard clusters. But all documents still reside in the primary shard cluster. Why is it not split across the different shards?
100 % of data is still in the primary shard
For my application I need to shard a fairly big collection; the entire collection will contain approx. 500 billion documents.
I have two potential fields which can be used as Sharding Key:
For inserting, either Sharding Key will distribute documents evenly throughout the cluster, so it does not matter which field I use as Sharding Key.
For queries it is different.
Field(1) is usually part of the query filter condition, thus a query would usually be processed on a single shard only.
Field(2) is typically not part of the query filter condition, thus a query would be processed over all shards, and typically several shards will contribute to the final query result.
Which one is the better field to be used as Sharding Key? I did not find anything in MongoDB documentation about that topic.
Both fields have the same range and very similar cardinality figures, so there won't be any difference there. Usually the number of documents returned by a query is very low (typically fewer than 20-30 documents).
In a sharded cluster the mongos router determines which shard is to be targeted for a read or write operation - based on the available shard key meta-data stored on the config servers.
For inserting, either Sharding Key will distribute documents evenly
throughout the cluster, so it does not matter which field I use as
Sharding Key.
When you insert a document it will have a shard key and the document will be stored on a designated shard.
Field(1) is usually part of the query filter condition, thus a query
would usually be processed on a single shard only.
The shard key's main purposes are (a) to distribute data evenly across shards in a cluster, and (b) to be able to query the data in such a way that the query targets a single shard.
For a query to target a single shard, the shard key must be part of the query's filter criteria. The mongos router will target the single shard using the shard key.
If the shard key is not part of the filter criteria it will be a scatter-gather operation, which is sent to every shard and is typically slower than a targeted query. It is important that the most important query operations of the application using the sharded collection are able to use the shard key.
Field(2) is typically not part of the query filter condition, thus a
query would be processed over all shards, and typically several shards
will contribute to the final query result.
When the shard key is not part of the query filter, the operation will span multiple shards (a scatter-gather operation) and will typically be slower. The mongos router cannot determine which shards hold the target data, so all the shards in the cluster are queried and their results are combined into the final result.
Which one is the better field to be used as Sharding Key?
It can be concluded that Field(1) should be used as the shard key.
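As a rough illustration of the difference, the following plain JavaScript sketch counts how many shards are contacted for each kind of filter. It assumes Field(1) is the shard key; the field names field1 and field2, the sample documents and the ownerOf placement rule are all hypothetical stand-ins.

```javascript
// Hypothetical cluster: three shards, documents placed by field1 (the shard key).
const shards = {
  shardA: [{ field1: 1, field2: "x" }, { field1: 2, field2: "y" }],
  shardB: [{ field1: 11, field2: "x" }, { field1: 12, field2: "z" }],
  shardC: [{ field1: 21, field2: "y" }, { field1: 22, field2: "x" }],
};

// Stand-in for the chunk metadata: 0-9 on shardA, 10-19 on shardB, the rest on shardC.
function ownerOf(field1) {
  if (field1 < 10) return "shardA";
  if (field1 < 20) return "shardB";
  return "shardC";
}

// Run an equality filter and report which shards had to be contacted.
function find(filter) {
  const contacted = "field1" in filter
    ? [ownerOf(filter.field1)] // targeted: the shard key tells us where to look
    : Object.keys(shards);     // scatter-gather: every shard is asked
  const matches = (doc) => Object.entries(filter).every(([k, v]) => doc[k] === v);
  // Gather step: concatenate the partial results from each contacted shard.
  const docs = contacted.flatMap((name) => shards[name].filter(matches));
  return { contacted, docs };
}

console.log(find({ field1: 12 }).contacted);  // [ 'shardB' ]
console.log(find({ field2: "x" }).contacted); // [ 'shardA', 'shardB', 'shardC' ]
```

Both queries return correct results. The difference is that the second one costs a round trip to every shard, even though only a handful of documents match.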
See the MongoDB documentation on shard keys and on choosing a shard key.
I am a bit confused about the sharding key in mongo.
Is it possible to use several sharding keys when you create a shard?
Shard key indexes are defined at the collection level and each collection within a database can only have a single shard key index. Within a sharded cluster you have the choice of sharding some or all collections.
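It is important to note that the shard key is fixed once the collection is sharded: in older MongoDB versions it cannot be modified afterwards. Newer releases relax this (MongoDB 4.4 added refineCollectionShardKey and 5.0 added reshardCollection), so check the documentation for your version.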
For more information see:
Deploy a Sharded Cluster
Considerations for Selecting Shard Keys
I have a scenario in which I don't know what the structure and fields of the collections in MongoDB will be. There will also be a separate DB per user (like a multi-tenant DB).
I'll be deploying a replicated sharded cluster in production. For scaling and better machine optimization, I'm applying sharding on a per-DB basis when each DB is created, and the collections under the same DB will be sharded to different shards. In this scenario I'm not sure which key would be the best choice, since the structure and fields of the collections created under each DB are unknown. Because the structure of the DBs and collections is unknown, I can't forecast which type of query will be used most of the time. So I want to select a shard key which fulfills all the criteria for shard key selection, such as: cardinality, query isolation, monotonically increasing values, write scaling, and easy divisibility.
What would be the solution in this scenario?
Also, what if I select all the fields of that collection for the shard key, along with a hashed _id field, as a compound key?
Once you create a shard key you cannot edit it.
So keep pumping the data into the collection; once you get clarity on the fields you can shard the collection at any time.
Rebalancing happens automatically after sharding.
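To picture what that means: a collection that is sharded late starts with all of its chunks on one shard, and chunks are then migrated until the shards are roughly even. The sketch below is only a toy model in plain JavaScript, not MongoDB's actual balancer policy; the rebalance function, the shard names and the threshold parameter are made up for illustration.

```javascript
// Toy balancer: move one chunk at a time from the fullest shard to the emptiest
// one until their chunk counts differ by at most `threshold`.
function rebalance(chunkCounts, threshold = 1) {
  const counts = { ...chunkCounts }; // work on a copy, leave the input untouched
  const names = Object.keys(counts);
  const migrations = [];
  if (names.length === 0) return { counts, migrations };
  for (;;) {
    const donor = names.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const recipient = names.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    if (counts[donor] - counts[recipient] <= threshold) break;
    counts[donor] -= 1;
    counts[recipient] += 1;
    migrations.push({ from: donor, to: recipient });
  }
  return { counts, migrations };
}

// Everything starts on shard0, as it would for a collection that was sharded late.
const result = rebalance({ shard0: 8, shard1: 0 });
console.log(result.counts);            // { shard0: 4, shard1: 4 }
console.log(result.migrations.length); // 4
```

The real balancer does this in the background, so the distribution evens out over time rather than instantly.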
I would like to shard my collection by range across MongoDB shards. My question is: if the shard key is a string field, how is a string-based shard key divided into different chunks for range-based sharding?
You can divide a string key across shards using tag-aware sharding (called zones since MongoDB 3.4). You create "tags" denoting the ranges of the key to assign to a specific shard. Mongo's balancer will handle the distribution of the data, and when you write a query on the key in question Mongo will know to target only that shard.
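String ranges work the same way as numeric ones: the boundaries are just compared as strings. Here is a small plain JavaScript sketch of the idea. The tag names, the boundaries, the shard names and the shardFor function are hypothetical, and in a real cluster the tags are set up with shell helpers such as sh.addShardTag() and sh.addTagRange(), which this sketch does not call.

```javascript
// Hypothetical tag ranges on a string shard key; each range is [min, max).
const tagRanges = [
  { min: "a", max: "h", tag: "EAST" },
  { min: "h", max: "q", tag: "CENTRAL" },
  { min: "q", max: "{", tag: "WEST" }, // "{" is the ASCII character right after "z"
];

// Hypothetical assignment of tags to shards.
const shardTags = { shard0: "EAST", shard1: "CENTRAL", shard2: "WEST" };

// Find the shard whose tag range covers the given key, or null if none does.
function shardFor(key) {
  // JavaScript compares strings code unit by code unit, which for plain ASCII
  // gives the same order as a byte-wise comparison.
  const range = tagRanges.find((r) => key >= r.min && key < r.max);
  if (!range) return null;
  return Object.keys(shardTags).find((s) => shardTags[s] === range.tag);
}

console.log(shardFor("alice")); // shard0
console.log(shardFor("henry")); // shard1
console.log(shardFor("zoe"));   // shard2
console.log(shardFor("Zoe"));   // null
```

The last call shows one thing to watch out for: with a plain byte-wise ordering, uppercase letters sort before lowercase ones, so the ranges have to cover every value the key can actually take.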
For more information see the following URL from the vendor. sharding-introduction/