MongoDB Shard chunks already have an identically - mongodb

We have a MongoDB sharded cluster configured with 4 shards (A, B, C, D). Shard D was added after A, B, and C. Some collections balance correctly with D, but one collection has a problem with migration.
The logs on D show this message all the time:
W SHARDING [migrateThread] Cannot receive chunk [{ _id:
ObjectId('5ad5586b7ee7821b48139cfb') }, { _id:
ObjectId('5ad6d2d77ee78222283cc9d5') }) for collection products
because we already have an identically named collection with UUID
c16daf18-9412-437b-a1ba-a9e000e694ac, which differs from the donor's
UUID 25a21963-d9ba-4022-becc-648d4d39a68c. Manually drop the
collection on this shard if it contains data from a previous
incarnation of products.
I understand the error, but I don't know how to do that.
If I go to mongos and run sh.status(), the collection products is not shown on shard D, but the logs say the contrary.
If I connect to shard D and run db.products.drop(), will this delete the collection only on D or on all shards?

Yes, you need to connect directly to shard D and drop the collection there, which will NOT drop it from the other shards. When the balancer runs again, it will be able to create the collection itself. Somehow the collection already existed on the replica set before it was added as a shard. You might also want to run mongodump directly against the shard's replica set (not the mongos) first, in case it holds data you want to keep.
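As a rough aid for finding which collection to drop, the warning itself can be parsed for the collection name and the two UUIDs. The sketch below is plain JavaScript; the function name parseUuidMismatch is hypothetical, and the pattern assumes the message wording shown in the question, which may differ between server versions.

```javascript
// Hypothetical helper: pull the collection name and both UUIDs out of the
// "identically named collection" warning quoted in the question.
function parseUuidMismatch(line) {
  const uuid = "([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})";
  const pattern = new RegExp(
    "for collection (\\S+) because we already have an identically named " +
      "collection with UUID " + uuid +
      ", which differs from the donor's UUID " + uuid
  );
  const match = pattern.exec(line);
  if (!match) return null;
  return { collection: match[1], localUuid: match[2], donorUuid: match[3] };
}

// The log message from the question, joined into a single line.
const logLine =
  "W SHARDING [migrateThread] Cannot receive chunk [{ _id: " +
  "ObjectId('5ad5586b7ee7821b48139cfb') }, { _id: " +
  "ObjectId('5ad6d2d77ee78222283cc9d5') }) for collection products " +
  "because we already have an identically named collection with UUID " +
  "c16daf18-9412-437b-a1ba-a9e000e694ac, which differs from the donor's " +
  "UUID 25a21963-d9ba-4022-becc-648d4d39a68c. Manually drop the " +
  "collection on this shard if it contains data from a previous " +
  "incarnation of products.";

const info = parseUuidMismatch(logLine);
console.log(info);

// Manual follow-up, run in a shell connected directly to shard D (not mongos):
//   db.getCollectionInfos({ name: "products" })  // inspect the local UUID
//   db.products.drop()                           // drops the copy on D only
```

The drop itself stays a manual step on purpose, so the local UUID can be checked and a dump taken before anything is deleted.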

Related

How to make data in a sharded collection move away from a shard [migrated]

I have a sharded cluster with 3 shards (shard1, shard2, shard3). Each shard is a 3-node replica set (rs1, rs2, rs3).
I have a db called activity that has a large sharded collection called items (i.e., activity.items). The data in this collection is split across shard{1,2,3}.
I have another db called app with a collection called users (i.e., app.users). This is not a sharded collection. It is housed on shard1.
I want the data from activity.items that currently resides on shard1 to no longer reside there. I don't want any activity.items data on shard1. It should move to either shard2 or shard3. Or, if it's easier, I can spin up a shard4.
Is this possible? Any high-level guidance on which commands or example docs to look at would be greatly appreciated. I'm open to alternative solutions that achieve my goal of moving the activity.items collection data away from shard1.
You can use the moveChunk command.
This old script moves everything to one shard (the primary), but you get the idea. In your case, filter the source on just the one shard, and for the destination write a loop that distributes those chunks evenly across the other shards. Of course, you can also move them all to a single shard and let the balancer distribute them.
database = 'activity'
namespace = database + '.items'   // full namespace of the sharded collection
sh.stopBalancer()
use config
primary = db.databases.findOne({_id: database}).primary
// move all chunks to primary
// (this filters config.chunks by ns; newer server versions key chunks by collection uuid instead)
db.chunks.find({ns: namespace, shard: {$ne: primary}}).forEach(function(chunk){
    print('moving chunk from', chunk.shard, 'to', primary, '::', tojson(chunk.min), '-->', tojson(chunk.max));
    sh.moveChunk(namespace, chunk.min, primary);
});
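The "loop" over destinations described above could look roughly like the following. This is only a sketch in plain JavaScript: planMoves is a hypothetical helper, the chunk documents are made-up sample data shaped like config.chunks entries, and the actual sh.moveChunk call is left as a comment because it needs a live cluster.

```javascript
// Hypothetical helper: plan a move for every chunk on `sourceShard`,
// spreading the chunks round-robin over `destinations`.
function planMoves(chunks, sourceShard, destinations) {
  if (destinations.length === 0) {
    throw new Error("at least one destination shard is required");
  }
  return chunks
    .filter((chunk) => chunk.shard === sourceShard)
    .map((chunk, i) => ({
      min: chunk.min,
      from: chunk.shard,
      to: destinations[i % destinations.length],
    }));
}

// Made-up chunk documents; in the shell these would come from config.chunks.
const sampleChunks = [
  { min: { _id: 0 }, max: { _id: 100 }, shard: "shard1" },
  { min: { _id: 100 }, max: { _id: 200 }, shard: "shard2" },
  { min: { _id: 200 }, max: { _id: 300 }, shard: "shard1" },
  { min: { _id: 300 }, max: { _id: 400 }, shard: "shard1" },
];

const moves = planMoves(sampleChunks, "shard1", ["shard2", "shard3"]);
for (const move of moves) {
  console.log(`${JSON.stringify(move.min)}: ${move.from} -> ${move.to}`);
}

// In the shell, with the balancer stopped, each planned move would become:
//   sh.moveChunk("activity.items", move.min, move.to);
```

Keep in mind that once the balancer is re-enabled it may migrate chunks back onto shard1 unless something, such as zones, keeps them away.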

MongoDB sharding with repeated documents

I am new to MongoDB and want to create a distributed database environment using docker-compose. I've created multiple Docker containers with shards to simulate multiple sites. However, I have a problem replicating the same set of documents into multiple shards.
For example, I have a collection with a key that has the values "A" and "B". I want to distribute this collection across 2 shards where
Shard 1 = A & B
Shard 2 = B only
However, when I run the balancer it distributes all A's into shard 1 and B's into shard 2. Is there any way I can do the sharding with repeated data or am I using the wrong approach for my problem?
You might be approaching sharding (horizontal scaling) incorrectly. Sharding in MongoDB works when the shard key is chosen so that the shards end up with a roughly even distribution of data, i.e. a similar number of documents each. It also works best when a typical query can be directed to a single shard. If you have queries that need to return documents with both values A and B of this field, then this field should probably not be the shard key. Queries can go across shards, but certain cross-shard operations, such as joins, can be very costly. In your particular case, perhaps some other field could be used as the shard key.
Redundancy in MongoDB is provided by replica sets, not sharded clusters.
Each shard can be backed by a replica set with your desired number of nodes to provide the required redundancy level.
It is not possible to have the same document be (authoritatively) located in multiple shards.

mongodb - one collection per shard

My system is multi-tenant, and I intend to apply database sharding and replica sets to it. This is new to me, so I have some questions:
Is it possible to confine each collection to a single shard? That is, instead of splitting a collection's documents across shards, I want to put one collection entirely on one shard and another collection entirely on another shard. My multi-tenant system is built on schema-per-tenant, so one collection represents one tenant. Putting each of them entirely on one shard would make aggregate queries more reliable within that tenant's scope.
If MongoDB does not support what question 1 asks, how can I correctly aggregate data across shards when a collection's documents are scattered?
I want to know the full extent of support provided by the DBMS instead of delegating the logic to the backend. Thank you very much.

Insert data to MongoDB Shard Cluster?

First, below is my sharded cluster, which I created with Ops Manager:
I have 2 mongos and 2 shards (each shard is configured as a replica set). I have not configured any shard key, meaning no sharded collections exist in my cluster.
When I use mongos to insert a database for testing purposes, the database is stored on only one shard.
What I want is that when I insert a database, the data is split and balanced across both shards, and that I can query through mongos to get accurate data.
Does anyone have the same issue?
Databases and collections are not sharded automatically: a sharded deployment can contain both unsharded and sharded data. Unsharded collections will be created on the primary shard for a given database.
If you want to shard a collection you need to take a few steps in the mongo shell connected to a mongos process for your sharded deployment:
Run sh.enableSharding(<database>) for a database (this is a one-off action per database)
Shard the collection, specifying your chosen shard key, using sh.shardCollection()
See Shard a Collection in the MongoDB manual for specific steps.
It is important to choose a good shard key for your data distribution and use case. Poor choices of shard key may result in unequal data distribution or limit your sharding performance. The MongoDB documentation has more information on the considerations and options for choosing a shard key.
If you are not sure whether a collection is sharded, or want to see a summary of the current data distribution, you can use db.collection.getShardDistribution() in the mongo shell.
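The two steps listed above can be sketched as a small function. The database name "test", the collection name "events", and the hashed _id key are placeholder assumptions, not recommendations; fakeSh is a stand-in that only records calls so the snippet runs without a cluster. In the mongo shell connected to a mongos, the built-in sh object would be passed instead.

```javascript
// `sh` is passed in so the same function can run in the shell (where `sh`
// is a global) or against a stand-in object, as done here.
function shardCollection(sh, database, collection, key) {
  const namespace = `${database}.${collection}`;
  sh.enableSharding(database);        // one-off action per database
  sh.shardCollection(namespace, key); // declares the shard key
  return namespace;
}

// Stand-in that only records calls; it does not talk to a cluster.
const calls = [];
const fakeSh = {
  enableSharding: (db) => calls.push(["enableSharding", db]),
  shardCollection: (ns, key) => calls.push(["shardCollection", ns, key]),
};

// Placeholder names and shard key, chosen only for illustration.
const shardedNamespace = shardCollection(fakeSh, "test", "events", { _id: "hashed" });
console.log(shardedNamespace, JSON.stringify(calls));
```

A hashed key is used here only because it is short to write; the right key depends on your queries and data distribution, as noted above.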
You need to implement zone ranges so that data is stored on each shard according to its range.
The code below helps you create zones.
For zone01:
sh.addShardTag("rs1", "zone01")
sh.addTagRange("myDB.col01", { num: 1 }, { num: 10 }, "zone01")
For zone02:
sh.addShardTag("rs2", "zone02")
sh.addTagRange("myDB.col01", { num: 11 }, { num: 20 }, "zone02")
The Manage Shard Zones documentation covers this in more detail.
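One detail worth checking with ranges like these: a zone range includes its lower bound and excludes its upper bound. The plain JavaScript sketch below models the two ranges from the example and looks up which zone a given num falls into. zoneFor is a hypothetical helper, not a shell command.

```javascript
// The two ranges from the example, modelled as [min, max) intervals.
const zoneRanges = [
  { zone: "zone01", min: 1, max: 10 },
  { zone: "zone02", min: 11, max: 20 },
];

// Hypothetical helper: return the zone whose range contains `num`,
// or null when no range covers it.
function zoneFor(ranges, num) {
  const hit = ranges.find((r) => num >= r.min && num < r.max);
  return hit ? hit.zone : null;
}

for (const num of [1, 9, 10, 11, 19, 20]) {
  console.log(num, zoneFor(zoneRanges, num));
}
```

With these bounds, num values 10 and 20 fall outside both zones, so such documents are not tied to either shard. If they should belong to a zone, the upper bounds would need to be 11 and 21.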

Query against local MongoDB shard data only

I have a sharded collection, with a shard key "user id". I would like to perform a query where, instead of passing the shard key, I simply restrict the query to only the data on the local mongos shard.
Is this possible / advisable?
Furthermore, can it be used with findAndModify? This would allow me to perform atomic updates on local documents, without specifying a shard key in the query.
Edit
As stated in some answers and comments below, my understanding of mongos vs. mongod was a little skewed. I now appreciate that mongos doesn't hold the local data.
Does mongos have any "local" data?
No. Each mongos daemon routes queries to your shards and does not store any data itself, so there is no such concept as "local" documents stored by a mongos. The mongos interface provides a logical view of the entire sharded cluster and does not have affinity to a specific shard.
Based on the type of query/command you send to mongos, the query will be:
Directed: sent to a specific shard if the query uses the shard key
Targeted: sent to applicable shards if the query includes multiple shard key values (or uses a prefix subset of a compound shard key)
Scatter/gather: sent to all shards, if the query is not using the shard key
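To make the three cases above concrete, here is a deliberately simplified model in plain JavaScript. classifyQuery is a hypothetical function; it only looks at which shard key fields a query mentions and whether it uses operators, and it is not how mongos actually decides routing.

```javascript
// True when the condition is a literal value (or only $eq), false when it
// uses operators such as $in or $gt that can match several key values.
function isEquality(cond) {
  if (cond === null || typeof cond !== "object" || Array.isArray(cond)) return true;
  const operators = Object.keys(cond).filter((k) => k.startsWith("$"));
  return operators.every((op) => op === "$eq");
}

// Toy classifier following the three categories listed above.
function classifyQuery(shardKeyFields, query) {
  let prefixLength = 0;
  let multiValue = false;
  for (const field of shardKeyFields) {
    if (!(field in query)) break; // only a leading prefix of the key counts
    prefixLength += 1;
    if (!isEquality(query[field])) multiValue = true;
  }
  if (prefixLength === 0) return "scatter/gather";
  if (prefixLength < shardKeyFields.length || multiValue) return "targeted";
  return "directed";
}

console.log(classifyQuery(["user_id"], { user_id: 42 }));
console.log(classifyQuery(["user_id"], { user_id: { $in: [1, 2] } }));
console.log(classifyQuery(["user_id"], { name: "alice" }));
console.log(classifyQuery(["tenant", "user_id"], { tenant: "acme" }));
```

The field names user_id, name, and tenant are examples only. The real decision also depends on the chunk ranges in the cluster metadata, which this model ignores.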
Should I read from shards directly?
No. It's technically possible to read data from the shards directly, but it is definitely not recommended, as you can get an inconsistent view of the data. For example, if there is a migration in progress, the data will temporarily exist on both the donor shard and the target shard. Similarly, copies of documents may be orphaned as the result of failed migrations.
A query through mongos correctly directs queries to the appropriate shard(s) and filters results based on the sharded cluster metadata.
Can I use findAndModify() on a sharded collection without a query based on a shard key?
No. For a sharded collection, findAndModify() requires a query based on the shard key. The shard key provides a guarantee that the requested document only exists on one shard.
Can I update sharded collections without going through mongos?
No. All updates to a sharded collection must go through mongos.
Please keep in mind that doing so is inadvisable, as traffic to a sharded cluster should go through a mongos service.
That being said, it's possible to query the shard itself if you're performing the query locally on the shard instance.
I've never tried to do that programmatically, but it may be worth a shot.
You can log in directly to the machine running the shard and open a mongo shell there. If you've never created a local user/password on it, I believe you can connect without credentials; otherwise, the mongod process on that specific shard must have its own user/password, as users created via the mongos are not recognised by the mongod shards.
Each shard knows only its own data files, so if, for example, you run a count() operation on one of your collections, you'll see that the result is only a portion of the actual collection size.
Your question is a little vague since you mix your terminology:
I simply restrict the query to only the data on the local mongos shard.
The shard will in fact be a mongod process, not a mongos process. However, your wording can make sense if you have a mongos per shard, in which case it makes sense that you want to direct the query to the mongos on that shard so that it queries its local mongod data.
If you are considering circumventing the mongos, then @Stennie's comment answers your question. However, if you mean something else, then I do not believe the mongos currently has a command switch that lets you direct queries without a shard key.