Sharded collections and MongoDB

I've created a sharded collection on Cosmos DB (for use with the C# MongoDB driver) through the portal, using Data Explorer -> New Collection, and set the shard key at that point.
I've set the shard key to partitionId.
As an example, when I try to insert this document into a collection named "data":
db.data.insert({partitionId:"test"})
I receive the error: Command insert failed: document does not contain shard key.

Edit:
There seem to be issues when creating the sharded collection using the portal. Manually creating the sharded collection should work, see: https://stackoverflow.com/a/48202411/5405453
Original:
From the docs:
The shard key determines the distribution of the collection’s
documents among the cluster’s shards. The shard key is either an
indexed field or indexed compound fields that exists in every document
in the collection.
When you created the sharded collection, you specified a field to use as the shard key. Every document you insert afterwards has to contain that field.
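A minimal sketch of the rule from the quoted docs, as a client-side check that a document carries the shard key field before it is inserted. Note that hasShardKey is a hypothetical helper written for illustration, not part of any MongoDB driver; partitionId is the shard key from the question.

```javascript
// Hypothetical helper (not part of any MongoDB driver): returns true when
// the document has a defined value at the (possibly dotted) shard key path.
function hasShardKey(doc, shardKeyPath) {
  let current = doc;
  for (const part of shardKeyPath.split(".")) {
    if (current === null || typeof current !== "object" || !(part in current)) {
      return false;
    }
    current = current[part];
  }
  return current !== undefined;
}

// The document from the question, and one without the shard key field.
const withKey = hasShardKey({ partitionId: "test" }, "partitionId");
const withoutKey = hasShardKey({ name: "no key here" }, "partitionId");
console.log(withKey, withoutKey);
```

The document from the question passes this check, which fits the edit above: the error most likely comes from how the collection was created rather than from the document itself.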

Related

MongoDB Shard Key vs Query Index

I have set up my first mongodb sharded cluster and am finally at the stage where I create a db/collection and choose the shard key. I’ve read about how to choose an appropriate shard key and am likely going with a hashed index but I might be having some conceptual misunderstandings.
My documents are super simple and contain a document id (some natural number), a document version id (a natural number), and a string of the raw text itself. If I understand correctly from the documentation, I can choose to shard on the document id but this can lead to jumbo shards since the document id will be incremented and new documents will be added to the same shard. And so I could set the shard key as a hashed value of the document id.
My question is whether or not I can still continue to query by the document id? My brain is making me doubt this and making me think that the indexing of the documents is over the hashed shard key and not over the document id. I am hoping that the hashed shard key is used strictly for sharding and that I can set any key (i.e., document id) to be indexed. Is this correct?

Yes, you can still query by the value of the shard key.
If you are referring to _id, that will be automatically indexed with its natural value. Otherwise, you could explicitly create an index on the document id that is not hashed, in addition to the shard key index.
As long as you test for equality to a single value or an explicit list of values, the query should be handled by the minimum number of shards.
However, if you use a ranged test such as $gte, the query will have to be forwarded to every shard to be processed.
Using the hashed document id as the shard key will result in the creation of an index for the hashed value in addition to any other indexes.
There is a pretty good description of hashed sharding in the documentation.
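To make the equality-versus-range point concrete, here is a toy model of hashed routing. It is only a sketch under stated assumptions: three shards, a simple FNV-1a style hash standing in for MongoDB's real hash function, and a plain modulo instead of real chunk ranges. The names toyHash, shardFor and targetShards are made up for this example.

```javascript
// Toy model of hashed sharding. ASSUMPTIONS: 3 shards, and a simple
// FNV-1a style hash plus modulo stands in for MongoDB's real hashing
// and chunk ranges.
const SHARD_COUNT = 3;

function toyHash(value) {
  let h = 0x811c9dc5;
  for (const ch of String(value)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

function shardFor(docId) {
  return toyHash(docId) % SHARD_COUNT;
}

// Returns the sorted list of shard numbers a query on the document id must visit.
function targetShards(query) {
  if (typeof query === "object" && query !== null) {
    if (Array.isArray(query.$in)) {
      // Explicit list of values: only the shards those values hash to.
      return [...new Set(query.$in.map(shardFor))].sort((a, b) => a - b);
    }
    // Range operators such as $gte: hashing destroys ordering,
    // so every shard may hold matching documents.
    return [...Array(SHARD_COUNT).keys()];
  }
  // Plain equality: exactly one shard.
  return [shardFor(query)];
}

console.log(targetShards(42).length);            // a single shard
console.log(targetShards({ $in: [1, 2, 3] }));   // at most three shards
console.log(targetShards({ $gte: 100 }).length); // every shard
```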

Is it possible to create several sharding keys for collection in MongoDB?

I am a bit confused about shard keys in MongoDB.
Is it possible to use several shard keys when you shard a collection?

Shard key indexes are defined at the collection level and each collection within a database can only have a single shard key index. Within a sharded cluster you have the choice of sharding some or all collections.
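It is important to note that in MongoDB versions before 4.4, shard keys are immutable: once the shard key is created, it cannot be modified. MongoDB 4.4 and later can refine a shard key with refineCollectionShardKey, and 5.0 and later can change it with reshardCollection.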
For more information see:
Deploy a Sharded Cluster
Considerations for Selecting Shard Keys

MongoDB Shard key Selection

I have a scenario in which I don't know in advance what the structure and fields of the collections in MongoDB will be. There will also be a separate database per user (a multi-tenant setup).
I'll be deploying a replicated sharded cluster in production. For scaling and better machine utilization, I'm enabling sharding on a per-database basis when each database is created, and each collection under the same database will be sharded across different shards. In this scenario I'm not sure which key would be the best choice, since the structure and fields of the collections created under each database are unknown. Because of that, I can't forecast which type of query will be used most of the time. So I want to select a shard key that fulfills all the criteria for shard key selection, such as: cardinality, query isolation, monotonically increasing, write scaling, easily divisible.
What would be the solution in this scenario?
Also, what if I select all the fields in the collection as the shard key, together with a hashed _id field, as a compound key?

Once you create a shard key, you cannot edit it.
So keep pumping data into the unsharded collection; once you are clear on the fields, you can shard the collection at any time.
Rebalancing happens automatically after sharding.
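Once some data is in, one rough way to compare candidate fields is to look at their cardinality and at how large a share of the documents the most common value takes. The sketch below is only an illustration: keyStats is a hypothetical helper, and the sample documents and the tenant and status fields are invented, not taken from the question.

```javascript
// Hypothetical helper: summarize how well a field would spread documents.
// cardinality = number of distinct values; maxShare = fraction of documents
// holding the most frequent value (lower is better for distribution).
function keyStats(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const value = JSON.stringify(doc[field]);
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const maxCount = Math.max(0, ...counts.values());
  return {
    field,
    cardinality: counts.size,
    maxShare: docs.length === 0 ? 0 : maxCount / docs.length,
  };
}

// Invented sample data, for illustration only.
const sample = [
  { _id: 1, tenant: "a", status: "open" },
  { _id: 2, tenant: "a", status: "open" },
  { _id: 3, tenant: "b", status: "closed" },
  { _id: 4, tenant: "c", status: "open" },
];
const stats = ["_id", "tenant", "status"].map((f) => keyStats(sample, f));
console.log(stats);
```

In this sample, status has only two distinct values and one of them covers most documents, so it would be a poor candidate compared with the other two fields. This covers only the cardinality criterion; query isolation still depends on the queries you end up running.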

sharded collection's indexes need to start with the shard key?

I read through the sharding docs on the official MongoDB site.
However, I can't find an answer to these questions:
Do all of a sharded collection's indexes need to start with the shard key?
If I require a TTL index on a field of a sharded collection, and compound indexes are not supported for TTL, what can I do in this case? (field != shard key)

No. You can have any index on a sharded collection. However, queries which do not include the shard key will be sent to all shards. Each individual shard will then make use of any existing index and send its result back to the mongos query router, which in turn will sort the results, if required, and send the result set back to the client. Please read Routing Process in the MongoDB docs for further details.
The TTL removal is a background process which runs on a date field. Each of your shards will spawn said background process. So you can simply create the TTL index on the date field of your choice. Each individual shard will take care of the documents which are to be deleted.
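A toy model of what that means in practice: every shard runs its own pass over its own documents, and nothing needs to know the shard key. This is a sketch under assumptions, not how mongod implements it; ttlPass is a hypothetical function, and the createdAt field name, the 3600-second expiry and the shard names are invented for the example.

```javascript
// Toy model: each shard runs its own TTL pass over its local documents.
// ASSUMPTIONS: the field name createdAt and the 3600-second expiry are
// illustrative; the real TTL removal is a background task inside mongod.
// A matching index in the mongo shell would be created with:
//   db.collection.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
function ttlPass(shardDocs, field, expireAfterSeconds, now) {
  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  // Keep documents whose date field is missing or newer than the cutoff.
  return shardDocs.filter(
    (doc) => !(doc[field] instanceof Date) || doc[field].getTime() > cutoff
  );
}

const now = new Date("2024-01-01T12:00:00Z");
const shards = {
  shardA: [{ skey: 1, createdAt: new Date("2024-01-01T10:00:00Z") }],
  shardB: [
    { skey: 2, createdAt: new Date("2024-01-01T11:30:00Z") },
    { skey: 3 }, // no date field, so it never expires
  ],
};

// Every shard cleans up independently of the others.
for (const name of Object.keys(shards)) {
  shards[name] = ttlPass(shards[name], "createdAt", 3600, now);
}
console.log(shards.shardA.length, shards.shardB.length);
```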

Where the mongodb sharded collection index info stored?

Let's suppose that a sharded collection has a shard key named "skey"
and another field named "ikey" that is indexed but is not the shard key.
If the query looks like this,
db.collection.find({"ikey": "something"})
it will search the docs across all the shards, because "ikey" is not the shard key.
At this point, how does mongos know that it should search across all the shards? Where is that index information stored? On the config server, or on each sharded mongod server?

Indices are stored on the individual shards. This goes for both the shard key index and all other indices.
The mongos holds a cache of the so-called key ranges of the shard key for each shard. The key ranges are stored on the config servers, which the mongos instances contact on certain occasions to retrieve those key ranges and their associated shards. The key ranges are basically a dictionary that maps ranges of shard key values to the name of the shard on which documents with those shard key values live.
So if you query by shard key, mongos can identify the shards which hold the data and sends the query to the shards which hold the data with the defined key range(s). The shards then return the data to the mongos, which will sort the returned data, if necessary.
Mongos knows it has to send the query to all shards simply because the query does not contain the shard key for the respective collection.
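The routing decision described above can be sketched with a small lookup table. This is a toy model only: the three ranges, the numeric values for skey, the shard names and the route function are all invented for illustration, and it handles equality queries only.

```javascript
// Toy routing table, as a mongos might cache it. ASSUMPTIONS: a numeric
// shard key "skey", three hand-written ranges [min, max), and
// hypothetical shard names.
const chunkRanges = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

// Returns the shards that must receive an equality query.
function route(query) {
  if (!("skey" in query)) {
    // No shard key in the query: broadcast to every shard.
    return [...new Set(chunkRanges.map((c) => c.shard))];
  }
  const hit = chunkRanges.find(
    (c) => query.skey >= c.min && query.skey < c.max
  );
  return hit ? [hit.shard] : [];
}

const targeted = route({ skey: 150 });
const broadcast = route({ ikey: "something" });
console.log(targeted);  // only the shard owning the range that contains 150
console.log(broadcast); // every shard; each one then uses its local ikey index
```

Note that the table holds nothing about "ikey": the index on it exists only on the shards, so the router has no way to narrow the query down.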
