Is MongoDB sharding possible on collections? - mongodb

Is it possible to shard only collections? If yes, then how?
What is the difference between sharding a database and sharding collections?

MongoDB shards collections. You enable sharding on a database, but enabling sharding on the database alone will not distribute any data across shards. To distribute data across shards you need to tell MongoDB which collection to distribute. So you have to shard your collection, and then only that collection will be spread across the shards.
Remember, MongoDB distributes data on the basis of the collections that are sharded. If you have 2 collections in your database and you shard one of them, the data of the sharded collection will be spread across the shards, but the other collection will have all its data on one shard.
In plain language, MongoDB doesn't shard a whole database automatically. MongoDB sharding works at the collection level.
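The difference between a sharded and an unsharded collection can be pictured with a small toy model. This is only a sketch: the shard names, the `orders` and `settings` collections, the `customerId` key, and the modulo routing are all made up for illustration and are not how MongoDB actually assigns chunks.

```javascript
// Toy model only: this is NOT MongoDB's real chunk or balancer logic.
const shards = ["shard0", "shard1", "shard2"];

// Hypothetical metadata: one database with a primary shard and two collections.
const database = {
  primaryShard: "shard0",
  collections: {
    orders: { sharded: true, shardKey: "customerId" }, // explicitly sharded
    settings: { sharded: false },                      // never sharded
  },
};

// Decide which shard a document would live on in this toy model.
function pickShard(collectionName, doc) {
  const coll = database.collections[collectionName];
  if (!coll.sharded) {
    return database.primaryShard; // unsharded data stays on one shard
  }
  // Stand-in for a hashed shard key: numeric key modulo the shard count.
  return shards[doc[coll.shardKey] % shards.length];
}

// List the distinct shards that hold at least one of the given documents.
function placement(collectionName, docs) {
  const used = new Set(docs.map((doc) => pickShard(collectionName, doc)));
  return [...used].sort();
}

const orderDocs = [0, 1, 2, 3, 4, 5].map((id) => ({ customerId: id }));
const settingDocs = [0, 1, 2, 3, 4, 5].map((id) => ({ _id: id }));

console.log(placement("orders", orderDocs));     // [ 'shard0', 'shard1', 'shard2' ]
console.log(placement("settings", settingDocs)); // [ 'shard0' ]
```

Only the collection marked as sharded spreads over several shards; the other one never leaves the primary shard, even though both live in the same database.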

Related

How will MongoRepository.findAll() work on a sharded collection?

I have a collection which is sharded by 2 fields, and my requirement is to find all the data irrespective of the shard key.
I'm using Spring Data MongoDB.
Will MongoRepository.findAll() work on a sharded collection?
And will it fetch data from every shard of the sharded collection?
I'm using MongoRepository.findAll().
It should return all the data from every shard of the sharded collection.

MongoDB sharding with repeated documents

I am new to MongoDB and wish to create a distributed database environment using docker-compose with MongoDB. I've created multiple Docker containers with shards to simulate multiple sites. However, I have a problem replicating the same set of documents into multiple shards.
For example, I have a collection with a key that has the values "A" and "B". I want to distribute this collection into 2 shards where
Shard 1 = A & B
Shard 2 = B only
However, when I run the balancer it distributes all A's into shard 1 and B's into shard 2. Is there any way I can do the sharding with repeated data or am I using the wrong approach for my problem?
You might be approaching sharding (horizontal scaling) incorrectly. What makes sharding in MongoDB work is that the shard key is chosen so that it results in shards which have a roughly even distribution of data, i.e. a similar number of documents. A requirement which makes sharding work well is that queries would typically be directed to only a single shard. If you have queries which need to return documents with both values A and B for some field, then it implies that this field should not be the shard key. Queries can go across shards, but certain cross-shard operations, such as joins, can be very costly. In your particular case, perhaps some other field could be used as the shard key.
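To illustrate the point about even distribution, here is a small toy comparison of two candidate shard keys. The data set (90 documents of type "A", 10 of type "B"), the `userId` field, and the routing functions are hypothetical, and the modulo is only a stand-in for real hashing.

```javascript
// Toy model: counts how many documents land on each of two shards
// for two candidate shard keys. Not MongoDB's actual placement logic.
const docs = [];
for (let userId = 0; userId < 100; userId++) {
  // Hypothetical data set: 90 documents of type "A", 10 of type "B".
  docs.push({ userId, type: userId < 90 ? "A" : "B" });
}

// Candidate 1: low-cardinality key, "A" goes to shard 0 and "B" to shard 1.
const byType = (doc) => (doc.type === "A" ? 0 : 1);
// Candidate 2: high-cardinality key, spread with a modulo stand-in for hashing.
const byUserId = (doc) => doc.userId % 2;

// Count the documents each routing function sends to shard 0 and shard 1.
function countPerShard(route) {
  const counts = [0, 0];
  for (const doc of docs) counts[route(doc)] += 1;
  return counts;
}

console.log(countPerShard(byType));   // [ 90, 10 ]
console.log(countPerShard(byUserId)); // [ 50, 50 ]
```

A key with only two possible values can never split the data into more than two pieces, and the split follows whatever skew the values have.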
Redundancy in MongoDB is provided by replica sets, not sharded clusters.
Each shard can be backed by a replica set with your desired number of nodes to provide the required redundancy level.
It is not possible to have the same document be (authoritatively) located in multiple shards.

Insert data into a MongoDB sharded cluster?

First, below is my sharded cluster, which I created with Ops Manager:
I have 2 mongos and 2 shards (each shard is configured as a replica set). I have not configured any shard key, meaning no sharded collections exist in my cluster.
When I use mongos to insert a database for testing purposes, the database is stored on only one shard.
So I want that, when I insert a database, the data is split and stored evenly across both shards, and that I can query through mongos to get accurate data.
Does anyone have the same issue?
Databases and collections are not sharded automatically: a sharded deployment can contain both unsharded and sharded data. Unsharded collections will be created on the primary shard for a given database.
If you want to shard a collection you need to take a few steps in the mongo shell connected to a mongos process for your sharded deployment:
Run sh.enableSharding(<database>) for a database (this is a one-off action per database)
Choose a shard key using sh.shardCollection()
See Shard a Collection in the MongoDB manual for specific steps.
It is important to choose a good shard key for your data distribution and use case. Poor choices of shard key may result in unequal data distribution or limit your sharding performance. The MongoDB documentation has more information on the considerations and options for choosing a shard key.
If you are not sure whether a collection is sharded, or want to see a summary of the current data distribution, you can use db.collection.getShardDistribution() in the mongo shell.
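The steps above can be sketched as a small helper for the mongo shell. The function name and its parameters are made up for this example; `sh` is the shell's own sharding helper, passed in explicitly so the function can also be exercised with a stub outside the shell.

```javascript
// Sketch for the mongo shell connected to a mongos.
// Hypothetical helper: the name and parameters are not part of MongoDB.
function shardCollectionSketch(sh, dbName, collName, shardKey) {
  // Step 1: enable sharding for the database (one-off action per database).
  sh.enableSharding(dbName);
  // Step 2: shard the collection, which is where the shard key is chosen.
  sh.shardCollection(dbName + "." + collName, shardKey);
}
```

In the shell this might be called as `shardCollectionSketch(sh, "mydb", "orders", { customerId: 1 })`, where `mydb`, `orders` and `customerId` are placeholder names, followed by `db.getSiblingDB("mydb").orders.getShardDistribution()` to check the result.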
You need to implement zone ranges, so that data is stored on each shard according to the range it falls in.
The code below helps you create zones:
For zone01:
sh.addShardTag("rs1", "zone01")
sh.addTagRange("myDB.col01", { num: 1 }, { num: 10 }, "zone01")
For zone02:
sh.addShardTag("rs2", "zone02")
sh.addTagRange("myDB.col01", { num: 11 }, { num: 20 }, "zone02")
This will help you: Manage Shard Zones
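One detail worth checking with the two ranges above: zone ranges include the lower bound and exclude the upper bound. The toy lookup below mirrors those ranges in plain JavaScript (the `zoneFor` helper is hypothetical, not a MongoDB function) to show which `num` values end up without a zone.

```javascript
// Toy lookup mirroring the two tag ranges above.
// Ranges are treated as [min, max): lower bound included, upper bound excluded.
const zoneRanges = [
  { zone: "zone01", min: 1, max: 10 },
  { zone: "zone02", min: 11, max: 20 },
];

// Hypothetical helper: returns the zone covering `num`, or null if none does.
function zoneFor(num) {
  const match = zoneRanges.find((r) => num >= r.min && num < r.max);
  return match ? match.zone : null;
}

console.log(zoneFor(5));  // zone01
console.log(zoneFor(10)); // null
console.log(zoneFor(15)); // zone02
console.log(zoneFor(20)); // null
```

With these bounds, documents with `num` equal to 10 or 20 are not covered by either zone. If the intent is to cover 1 to 10 and 11 to 20 inclusive, the upper bounds would need to be 11 and 21.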

How does MongoDB distribute data across a cluster

I've read about sharding a collection in MongoDB. MongoDB lets me shard a collection explicitly by calling the shardCollection method. There I can choose whether I want it to be range-sharded or hash-sharded.
My question is, what would happen if I didn't call the shardCollection method, and I had say 100 nodes?
Would MongoDB keep the collections intact and distribute them across the cluster?
Would MongoDB keep all the collections in a single node?
Do I completely not understand how this works?
A database can have a mixture of sharded and unsharded collections. Sharded collections are partitioned and distributed across shards in the cluster. As at MongoDB 3.4, each database has a primary shard where the unsharded collections are stored. If your deployment has a number of databases this may result in some distribution of unsharded collections, but there is no balancing activity for unsharded data. For more information on expected behaviours, see the Sharding section in the MongoDB manual.
If you are interested in distribution of unsharded collections within a sharded database, there is a relevant feature request you can watch/upvote in the MongoDB issue tracker: SERVER-939: Ability to distribute collections in a single DB.

Mongodb - sharded and unsharded collections

I'm a bit confused as to how this works.
When sharding MySQL, we had some tables, usually small ones with reference data, whole in each shard. This was to enable joins.
If we have small collections in MongoDB, that we don't shard in a sharded setup, what happens to them? Do they get sent to each shard, or just stay in the first shard?
This strikes me as a potential bottleneck, if all processes in a heavily sharded system with many application servers were hitting one server.
In MongoDB with the autosharding feature, a sharded collection will be distributed roughly evenly across all the shards you have.
For collections that you do not shard, you can specify a primary shard on which they will reside. The primary shard is set for a specific database, so it is a per-database setting. It can be moved and can be different for different databases.
There is also the notion of shard tagging, with which you can influence where sharded collections are placed. Basically, you can constrain a collection, or part of a collection, to be stored on a specific set of shards. (Reference)