Insert data into a MongoDB sharded cluster? - mongodb

First, below is my sharded cluster, which I created with Ops Manager:
I have 2 mongos instances and 2 shards (each shard is configured as a replica set). I have not configured any shard key, meaning no sharded collections exist in my cluster.
When I use mongos to insert a database for testing purposes, the database is stored on only one shard.
What I want is that when I insert a database, the data is split and balanced across both shards, and that I can query through mongos to get accurate data.
Has anyone had the same issue?

Databases and collections are not sharded automatically: a sharded deployment can contain both unsharded and sharded data. Unsharded collections will be created on the primary shard for a given database.
If you want to shard a collection you need to take a few steps in the mongo shell connected to a mongos process for your sharded deployment:
Run sh.enableSharding(<database>) for a database (this is a one-off action per database)
Choose a shard key and shard the collection using sh.shardCollection(<namespace>, <key>)
See Shard a Collection in the MongoDB manual for specific steps.
It is important to choose a good shard key for your data distribution and use case. Poor choices of shard key may result in unequal data distribution or limit your sharding performance. The MongoDB documentation has more information on the considerations and options for choosing a shard key.
If you are not sure whether a collection is sharded, or want to see a summary of the current data distribution, you can use db.collection.getShardDistribution() in the mongo shell.
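As a minimal sketch of the steps above, assuming a hypothetical database `testdb` with a collection `items` and a hashed `_id` shard key (the names and the key choice are placeholders, not a recommendation for your data). The commands are wrapped in a function so they can be pasted into a mongo shell connected to a mongos and called with the shell's own `sh` and `db` objects:

```javascript
// Sketch only: "testdb", "items" and the hashed _id key are assumed placeholders.
// Call as shardForTesting(sh, db) from a mongo shell connected to a mongos.
function shardForTesting(sh, db) {
  // One-off per database: allow its collections to be sharded.
  sh.enableSharding("testdb");

  // Shard the collection on a hashed _id.
  sh.shardCollection("testdb.items", { _id: "hashed" });

  // Summarize how documents and chunks are spread across the shards.
  db.getSiblingDB("testdb").items.getShardDistribution();
}
```

A hashed key tends to spread inserts evenly, but range queries on that field can no longer be routed to a single shard, so treat it as a starting point for testing only.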

You need to define zone ranges so that data is stored on each shard according to its range.
The code below creates the zones:
For zone01:
sh.addShardTag("rs1", "zone01")
sh.addTagRange("myDB.col01", { num: 1 }, { num: 10 }, "zone01")
For zone02:
sh.addShardTag("rs2", "zone02")
sh.addTagRange("myDB.col01", { num: 11 }, { num: 20 }, "zone02")
For more details, see Manage Shard Zones in the MongoDB manual.
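One detail worth checking in the ranges above: a zone range includes its lower bound and excludes its upper bound, so `{ num: 1 }` to `{ num: 10 }` covers 1 through 9, and a document with `num: 10` falls in neither zone. The plain JavaScript sketch below only models that lookup (`zoneFor` and the `zones` array are illustrative names, not MongoDB APIs):

```javascript
// Illustrative model only: zoneFor and zones are made-up names, not MongoDB APIs.
// Each range is [min, max): lower bound inclusive, upper bound exclusive.
const zones = [
  { zone: "zone01", min: 1, max: 10 },
  { zone: "zone02", min: 11, max: 20 },
];

function zoneFor(num) {
  const match = zones.find((z) => num >= z.min && num < z.max);
  return match ? match.zone : null; // null: no zone range covers this value
}

console.log(zoneFor(5));  // zone01
console.log(zoneFor(10)); // null, because 10 is the excluded upper bound of zone01
console.log(zoneFor(11)); // zone02
```

Ending the first range at `{ num: 11 }` would close the gap. Zone ranges are defined on the shard key, so `myDB.col01` has to be sharded on `num` for them to apply. In MongoDB 3.4 and later the same helpers are also available as sh.addShardToZone() and sh.updateZoneKeyRange().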

Related

MongoDB sharding with repeated documents

I am new to MongoDB and wish to create a distributed database environment using docker-compose with MongoDB. I've created multiple Docker containers with shards to simulate multiple sites. However, I have a problem replicating the same set of documents into multiple shards.
For example I have a collection with a key that has value "A" and "B". I want to distribute this collection into 2 shards where
Shard 1 = A & B
Shard 2 = B only
However, when I run the balancer it distributes all A's into shard 1 and B's into shard 2. Is there any way I can do the sharding with repeated data or am I using the wrong approach for my problem?
You might be approaching sharding (horizontal scaling) incorrectly. What makes sharding in Mongo work is that the sharding key is chosen such that it results in shards which have a roughly even distribution of data, or a similar number of Mongo documents. A requirement of sharding which makes it work well is that queries would typically be directed to only a single shard. If you have queries which need to return some field having the different values of A and B, then it implies that this field should not be the sharding key. Queries can go across shards, but certain cross-shard operations, such as joins, can be very costly. In your particular case, perhaps some other field could be used as the sharding key.
Redundancy in MongoDB is provided by replica sets, not sharded clusters.
Each shard can be backed by a replica set with your desired number of nodes to provide the required redundancy level.
It is not possible to have the same document be (authoritatively) located in multiple shards.

MongoDB Shard chunks already have an identically

We have a MongoDB sharded cluster configured with 4 shards (A, B, C, D). Shard D was added after A, B, and C. Some collections balance correctly with D, but one collection has a problem with migration.
The logs on D show this message all the time:
W SHARDING [migrateThread] Cannot receive chunk [{ _id:
ObjectId('5ad5586b7ee7821b48139cfb') }, { _id:
ObjectId('5ad6d2d77ee78222283cc9d5') }) for collection products
because we already have an identically named collection with UUID
c16daf18-9412-437b-a1ba-a9e000e694ac, which differs from the donor's
UUID 25a21963-d9ba-4022-becc-648d4d39a68c. Manually drop the
collection on this shard if it contains data from a previous
incarnation of products.
I understand the error, but I don't know how to do that.
If I go to mongos and use status(), the products collection does not show up on shard D, but the logs say the contrary.
If I connect to shard D and run db.products.drop(), will this delete the collection only on D, or on all shards?
Yes, you need to connect directly to shard D and drop the collection there, which will NOT drop it from the other shards. Then, when the balancer runs again, it will be able to create the collection itself. Somehow the collection was there on the replica set before it was added as a shard. (You also might want to do a mongodump directly from the shard replica set, not through mongos, just in case there is data there that you want.)
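A rough sketch of that drop, to be run only after the mongodump mentioned above and only in a mongo shell connected directly to shard D's primary (not to mongos). `dropLocalCopy` is a hypothetical helper name; it reports the local UUID and document count before dropping, so you can compare the UUID with the one in the log message:

```javascript
// Sketch only: run against shard D's primary directly, never through mongos.
// dropLocalCopy is a hypothetical helper name, not a MongoDB API.
function dropLocalCopy(db, collName) {
  // Look up the collection as this shard sees it locally.
  const infos = db.getCollectionInfos({ name: collName });
  if (infos.length === 0) {
    return { found: false }; // nothing to drop on this shard
  }

  const coll = db.getCollection(collName);
  const summary = {
    found: true,
    uuid: String(infos[0].info.uuid), // compare with the UUID in the log
    documents: coll.count(),          // local documents only
  };

  // Drops the collection on this shard only.
  summary.dropped = coll.drop();
  return summary;
}
```

For example, after `use` of the affected database on shard D, calling `dropLocalCopy(db, "products")` returns the summary and leaves the other shards untouched.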

Randomly distribute collections across cluster

Requirements:
up to one billion of documents per chunk (single shard key)
tens of thousands of chunks (30k)
queries are run only in chunk scope - filtered by shard key
3 indexes - single: hashed shard key, compound: shard key + _id, compound: shard key + 3 fields
all access paths are write - insert, find and update, find and delete
What sharding strategy should I pick for MongoDb?
Mongo hash-based sharding with shard key (String)
Application level pseudo-sharding with each chunk going to its separate collection
Concerns about MongoDb:
Indexes won't fit in memory for a billion documents
All queries are write and Mongo is Master-Slave
Is option 1 a good idea?
With option 2, is it possible to randomly distribute collections across Mongo cluster?

MongoDB Shard key Selection

I have a scenario in which I don't know what the structure and fields of the collections in MongoDB will be. There will also be a separate DB per user (like a multi-tenant DB).
I'll be deploying a replicated sharded cluster in production. For scaling and better machine utilization, I'm applying sharding on a per-DB basis when each DB is created, and each collection under the same DB will be sharded across different shards. In this scenario I'm not sure which key would be the best choice, since the structure and fields of the collections created under each DB are unknown. Because the structure of the DBs and collections is unknown, I can't forecast which type of query will be used most of the time. So I want to select a shard key that fulfills all the criteria for shard key selection, such as: cardinality, query isolation, monotonically increasing, write scaling, easily divisible.
What would be the solution in this scenario?
Also, what if I select all the fields in the collection for the shard key, along with a hashed _id field, as a compound key?
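Once you choose a shard key, you cannot simply edit it afterwards. (MongoDB 4.4 added refineCollectionShardKey and 5.0 added reshardCollection, but in earlier versions the key is fixed once the collection is sharded.)
So keep loading data into the unsharded collection; once you have clarity on the fields and query patterns, you can shard the collection at any time.
After the collection is sharded, the balancer redistributes chunks automatically.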

Query against local MongoDB shard data only

I have a sharded collection, with a shard key "user id". I would like to perform a query where, instead of passing the shard key, I simply restrict the query to only the data on the local mongos shard.
Is this possible / advisable?
Furthermore, can it be used with findAndModify? This would allow me to perform atomic updates on local documents, without specifying a shard key in the query.
Edit
As stated in some answers and comments below, my understanding of mongos vs. mongod was a little skewed. I now appreciate that mongos doesn't hold the local data.
Does mongos have any "local" data?
No. Each mongos daemon routes queries to your shards and does not store any data itself, so there is no such concept as "local" documents stored by a mongos. The mongos interface provides a logical view of the entire sharded cluster and does not have affinity to a specific shard.
Based on the type of query/command you send to mongos, the query will be:
Directed: sent to a specific shard if the query uses the shard key
Targeted: sent to applicable shards if the query includes multiple shard key values (or uses a prefix subset of a compound shard key)
Scatter/gather: sent to all shards, if the query is not using the shard key
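To make the three cases concrete, here is a simplified model in plain JavaScript. `classifyRouting` is a hypothetical helper, not something mongos exposes; it only inspects top-level filter fields and ignores details such as hashed keys, so read it as an illustration of the rules above rather than the actual routing logic:

```javascript
// Simplified model of the three cases above; classifyRouting is a hypothetical
// helper, not a mongos API. It only looks at top-level filter fields.
function classifyRouting(shardKeyFields, filter) {
  // A value is a plain equality match unless it is an operator document
  // such as { $in: [...] } or { $gt: ... }.
  const isEquality = (v) =>
    v === null || typeof v !== "object" || Array.isArray(v) ||
    !Object.keys(v).some((k) => k.startsWith("$"));

  // Count how many leading shard key fields appear in the filter.
  let prefixLength = 0;
  while (prefixLength < shardKeyFields.length &&
         shardKeyFields[prefixLength] in filter) {
    prefixLength++;
  }

  if (prefixLength === 0) return "scatter/gather";

  const fullKey = prefixLength === shardKeyFields.length;
  const allEquality = shardKeyFields
    .slice(0, prefixLength)
    .every((f) => isEquality(filter[f]));

  return fullKey && allEquality ? "directed" : "targeted";
}

console.log(classifyRouting(["userId"], { userId: 42 }));              // directed
console.log(classifyRouting(["userId"], { userId: { $in: [1, 2] } })); // targeted
console.log(classifyRouting(["userId"], { status: "active" }));        // scatter/gather
```

On a real cluster, explain() on the query through mongos shows which shards were actually contacted.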
Should I read from shards directly?
No. It's technically possible to read data from the shards directly, but it is definitely not recommended, as you can get an inconsistent view of the data. For example, if there is a migration in progress, the data will temporarily exist on both the donor shard and the target shard. Similarly, copies of documents may be orphaned as the result of failed migrations.
A query through mongos correctly directs queries to the appropriate shard(s) and filters results based on the sharded cluster metadata.
Can I use findAndModify() on a sharded collection without a query based on a shard key?
No. For a sharded collection, findAndModify() requires a query based on the shard key. The shard key provides a guarantee that the requested document only exists on one shard.
Can I update sharded collections without going through mongos?
No. All updates to a sharded collection must go through mongos.
Please keep in mind that doing so is not advised, as traffic to a sharded cluster should go through a mongos service.
That being said, it's possible to query the shard itself if you're performing the query locally on the shard instance.
I've never tried to do that programmatically, but it may be worth a shot.
You can log in directly to the machine running the shard and open a mongo shell there. If you've never created a local user/password on it, I believe you can connect without credentials; otherwise, the mongod process on that specific shard must have its own user/pass, as those created via mongos are not recognised by the mongod shards.
Each shard knows only its own data files, so if, for example, you run a count() operation on one of your collections, you'll see that the result is only a portion of the actual collection size.
Your question is a little vague since you mix your English:
I simply restrict the query to only the data on the local mongos shard.
The shard will in fact be a mongod process, not a mongos process. However, your English can make sense if you have a mongos per shard, in which case it makes sense that you want to direct queries to the mongos on that shard so that it queries its local mongod data.
If you are considering circumventing mongos, then @Stennie's comment answers your question. However, if your English means something else, then I do not believe mongos currently has a command switch that lets you direct queries without a shard key.