MongoDB sh.status() and db.collection.getShardDistribution() results are inconsistent about sharding

When I run sh.status() on my MongoDB server, it shows that the collections are sharded and that there are 3 shards, and it identifies the primary shard for each database.
When I run db.getCollection('ReportRow').getShardDistribution(), it returns "Collection reporting.ReportRow is not sharded", even though sh.status() shows that it is.
Any ideas why MongoDB would show this discrepancy?

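The shard instances exist, but the collection's data is not being sharded. Note that a database listed in sh.status() with sharding enabled can still contain unsharded collections: each collection has to be sharded individually with sh.shardCollection().
Also check that the shard key exists on the collection as an index: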
- db.collection.getIndexes()
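To make the check concrete: for a collection that already holds data, sh.shardCollection() needs an index whose key pattern starts with the shard key fields. The helper below is hypothetical (it is not part of the mongo shell); it takes an array shaped like getIndexes() output plus a shard key document, and the field names reportId and rowNum are made up. It is a simplified sketch: it compares field names and values (1, -1, "hashed") exactly and ignores other index options such as sparse or partial filters.

```javascript
// Hypothetical helper: true if some index in `indexes` (shaped like
// db.collection.getIndexes() output) has the shard key fields as a prefix.
// Simplification: names and values must match exactly; other index options
// (sparse, partial, collation, ...) are not checked.
function hasShardKeyIndex(indexes, shardKey) {
  const keyFields = Object.entries(shardKey);
  return indexes.some((idx) => {
    const idxFields = Object.entries(idx.key);
    if (idxFields.length < keyFields.length) return false;
    // Compare the leading index fields with the shard key fields, in order.
    return keyFields.every(
      ([field, value], i) => idxFields[i][0] === field && idxFields[i][1] === value
    );
  });
}

// Example input modelled on getIndexes() output (field names are illustrative).
const indexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { reportId: 1, rowNum: 1 }, name: "reportId_1_rowNum_1" },
];

console.log(hasShardKeyIndex(indexes, { reportId: 1 })); // true
console.log(hasShardKeyIndex(indexes, { rowNum: 1 }));   // false (not a prefix)
```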


Inserting data into a MongoDB sharded cluster?

Below is the sharded cluster I created with Ops Manager:
I have 2 mongos instances and 2 shards (each shard is configured as a replica set). I have not configured any shard key, meaning no sharded collections exist in my cluster.
When I insert a test database through mongos, all of its data is stored on only one shard.
I want the inserted data to be split and balanced across both shards, while still being able to query through mongos and get accurate results.
Has anyone had the same issue?
Databases and collections are not sharded automatically: a sharded deployment can contain both unsharded and sharded data. Unsharded collections will be created on the primary shard for a given database.
If you want to shard a collection you need to take a few steps in the mongo shell connected to a mongos process for your sharded deployment:
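- Run sh.enableSharding(<database>) for the database (this is a one-off action per database).
- Choose a shard key and shard the collection with sh.shardCollection(<database>.<collection>, <shard key>). For a collection that already contains data, an index on the shard key must exist first.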
See Shard a Collection in the MongoDB manual for specific steps.
It is important to choose a good shard key for your data distribution and use case. Poor choices of shard key may result in unequal data distribution or limit your sharding performance. The MongoDB documentation has more information on the considerations and options for choosing a shard key.
If you are not sure whether a collection is sharded, or want to see a summary of the current data distribution, you can use db.collection.getShardDistribution() in the mongo shell.
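As a rough illustration of what that summary tells you, the sketch below computes each shard's share of data and documents from per-shard statistics. The input shape and the field names dataSize and count are assumptions for this example (the real helper gathers its numbers from collection statistics on each shard); an unsharded collection like the one in the question would simply show 100% on its primary shard.

```javascript
// Hypothetical per-shard statistics -> percentage of data and documents,
// similar in spirit to the per-shard lines getShardDistribution() prints.
function shardDistribution(stats) {
  const totalSize = stats.reduce((sum, s) => sum + s.dataSize, 0);
  const totalDocs = stats.reduce((sum, s) => sum + s.count, 0);
  return stats.map((s) => ({
    shard: s.shard,
    dataPct: totalSize === 0 ? 0 : (100 * s.dataSize) / totalSize,
    docsPct: totalDocs === 0 ? 0 : (100 * s.count) / totalDocs,
  }));
}

// Made-up numbers for an unevenly balanced collection.
const stats = [
  { shard: "shard0", dataSize: 750, count: 30 },
  { shard: "shard1", dataSize: 250, count: 10 },
];
for (const row of shardDistribution(stats)) {
  console.log(`${row.shard}: ${row.dataPct.toFixed(1)}% data, ${row.docsPct.toFixed(1)}% docs`);
}
// shard0: 75.0% data, 75.0% docs
// shard1: 25.0% data, 25.0% docs
```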
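If you want to control which shard stores which range of data, you can define zone ranges on the shard key, so that documents are stored on the shard(s) associated with the zone covering their shard key value.
The code below creates two zones for a collection sharded on { num: 1 } (each range includes its lower bound and excludes its upper bound):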
For zone01:
sh.addShardTag("rs1", "zone01")
sh.addTagRange("myDB.col01", { num: 1 }, { num: 11 }, "zone01")
For zone02:
sh.addShardTag("rs2", "zone02")
sh.addTagRange("myDB.col01", { num: 11 }, { num: 21 }, "zone02")
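See Manage Shard Zones in the MongoDB manual for details. In MongoDB 3.4 and later, sh.addShardToZone() and sh.updateZoneKeyRange() are the current names for these helpers.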
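Because a zone range includes its lower bound and excludes its upper bound, ranges written as { num: 1 } to { num: 10 } and { num: 11 } to { num: 20 } leave 10 <= num < 11 outside both zones. The sketch below is a simulation of that rule for a single numeric shard key field, not the balancer's actual code; the zone names are the ones from the example.

```javascript
// Zone ranges for a collection sharded on { num: 1 }. Each range is
// half-open: min is inclusive, max is exclusive.
const gappyZones = [
  { zone: "zone01", min: 1, max: 10 },
  { zone: "zone02", min: 11, max: 20 },
];
const contiguousZones = [
  { zone: "zone01", min: 1, max: 11 },
  { zone: "zone02", min: 11, max: 21 },
];

// Returns the zone whose range contains `num`, or null when no zone applies
// (chunks outside every zone can be placed on any shard).
function zoneFor(zones, num) {
  const match = zones.find((z) => num >= z.min && num < z.max);
  return match ? match.zone : null;
}

console.log(zoneFor(gappyZones, 10));      // null (10 is excluded from zone01)
console.log(zoneFor(contiguousZones, 10)); // zone01
console.log(zoneFor(contiguousZones, 20)); // zone02
```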

MongoDB sharded cluster: $in vs. $or

Suppose I have a MongoDB sharded cluster with the shard key "my_key".
I need to find a batch of documents (about 10-500 items) in the collection, each with a different my_key.
For example:
db.test.find({my_key: {$in:[1,3,5,67,45,56...]}})
mongos knows where the chunks for each my_key range are stored.
Can mongos split my query into smaller queries sent only to the shards where the documents are stored, or will it send the query to ALL shards?
The same question applies to $or:
db.test.find({$or:[{my_key: 1},{my_key: 3},{my_key: 5}...]})
I ran some tests.
If the $in values all fall on one shard, mongos sends a SINGLE_SHARD query.
If the $in values span several shards, mongos sends a SHARD_MERGE query only to the shards that contain the needed data (not to the whole cluster).
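The behaviour observed in those tests can be modelled like this: mongos looks up each $in value in its cached chunk table and contacts only the shards that own at least one value. The chunk boundaries and shard names are made up, and SINGLE_SHARD / SHARD_MERGE are used only as labels matching the explain() stages reported above. This sketch models which shards are targeted; it does not claim that mongos rewrites the $in list per shard.

```javascript
// Hypothetical chunk table for a collection sharded on { my_key: 1 }.
// Each chunk covers [min, max); -Infinity/Infinity stand in for MinKey/MaxKey.
const chunks = [
  { min: -Infinity, max: 10, shard: "shardA" },
  { min: 10, max: 50, shard: "shardB" },
  { min: 50, max: Infinity, shard: "shardC" },
];

// Find the shard owning the chunk that contains `key`.
function shardFor(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Only shards that own at least one of the $in values are contacted.
function targetShards(values) {
  return new Set(values.map(shardFor));
}

// Label the plan with the explain() stage names reported in the answer.
function stageFor(values) {
  return targetShards(values).size === 1 ? "SINGLE_SHARD" : "SHARD_MERGE";
}

console.log(stageFor([1, 3, 5]));           // SINGLE_SHARD
console.log([...targetShards([1, 3, 45])]); // [ 'shardA', 'shardB' ]
console.log(stageFor([1, 3, 45]));          // SHARD_MERGE
```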

Do a sharded collection's indexes need to start with the shard key?

I read through the sharding docs on the official MongoDB site.
However, I can't find answers to these questions:
Do all of a sharded collection's indexes need to start with the shard key?
If I need a TTL index on a field of a sharded collection, and compound indexes are not supported for TTL, what can I do in this case? (The field is not the shard key.)
No. You can have any index on a sharded collection. The main exception is unique indexes: a unique index on a sharded collection must be prefixed by the shard key (the _id index aside), because each shard can only enforce uniqueness on its own data. However, queries which do not include the shard key will be sent to all shards. Each shard then uses any suitable local index and sends its results back to the mongos query router, which merges them (sorting if required) and returns the result set to the client. Please read Routing Process in the MongoDB docs for further details.
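To make the merge step concrete: when a broadcast query has a sort, each shard returns its results already sorted, and mongos only has to merge those sorted streams. Below is a minimal k-way merge sketch with made-up per-shard results and a hypothetical numeric sort field ts. It scans every shard's head on each step, which is fine for a sketch; a priority queue would scale better with many shards.

```javascript
// Merge per-shard result lists that are each sorted ascending by `field`.
function mergeSorted(shardResults, field) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    // Pick the shard whose next document has the smallest sort value.
    for (let i = 0; i < shardResults.length; i++) {
      const doc = shardResults[i][positions[i]];
      if (doc === undefined) continue;
      if (best === -1 || doc[field] < shardResults[best][positions[best]][field]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard's results are exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best]++;
  }
}

const shard0 = [{ ts: 1 }, { ts: 4 }, { ts: 9 }];
const shard1 = [{ ts: 2 }, { ts: 3 }, { ts: 10 }];
console.log(mergeSorted([shard0, shard1], "ts").map((d) => d.ts)); // [ 1, 2, 3, 4, 9, 10 ]
```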
TTL removal is done by a background process that deletes documents whose indexed date field has expired. Each of your shards runs this background process on its own data, so you can simply create the TTL index on the date field of your choice, and each individual shard will delete its own expired documents.
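A simplified model of that behaviour: each shard independently removes documents whose indexed date is older than expireAfterSeconds, with no coordination through mongos. The index itself would be created with something like db.collection.createIndex({ createdAt: 1 }, { expireAfterSeconds: 300 }); the field name createdAt, the shard names and the data are hypothetical. In reality the TTL monitor runs periodically (every 60 seconds by default), so deletion is not immediate, and the exact expiry boundary is simplified here.

```javascript
// Simplified TTL pass over one shard's documents: keep a document unless its
// `createdAt` Date is older than expireAfterSeconds before `now`.
// Documents whose field is not a Date are never expired.
function ttlPass(docs, expireAfterSeconds, now) {
  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  return docs.filter(
    (d) => !(d.createdAt instanceof Date) || d.createdAt.getTime() >= cutoff
  );
}

const now = new Date("2024-01-01T00:10:00Z");
const shards = {
  shardA: [
    { _id: 1, createdAt: new Date("2024-01-01T00:00:00Z") },
    { _id: 2, createdAt: new Date("2024-01-01T00:09:00Z") },
  ],
  shardB: [
    { _id: 3, createdAt: new Date("2024-01-01T00:01:00Z") },
    { _id: 4, createdAt: "not a date" },
  ],
};

// Each shard cleans up its own data independently.
for (const [name, docs] of Object.entries(shards)) {
  shards[name] = ttlPass(docs, 300, now);
}
console.log(shards.shardA.map((d) => d._id), shards.shardB.map((d) => d._id)); // [ 2 ] [ 4 ]
```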

Where is the index information for a MongoDB sharded collection stored?

Suppose a sharded collection has a shard key named "skey"
and another indexed field, not part of the shard key, named "ikey".
If I run a query like this:
db.collection.find({"ikey": "something"})
it will search the documents across all shards, because ikey is not the shard key.
At this point, how does mongos know that the query has to be sent to all shards? Where is that index information stored: on the config servers, or on each shard's mongod?
Indexes are stored on each individual shard. This goes for both the shard key index and all other indexes.
The mongos holds a cache of the so-called key ranges (chunks) of the shard key for each shard. The key ranges are stored on the config servers, which mongos instances contact on certain occasions (for example, when their cached metadata turns out to be stale) to retrieve those key ranges and their associated shards. The key ranges are basically a mapping from ranges of shard key values to the name of the shard on which documents with those shard key values live.
So if you query by shard key, mongos can identify the shards holding the matching key range(s) and send the query only to them. The shards then return the data to mongos, which merges and, if necessary, sorts it.
Mongos knows it has to send the query to all shards simply because the query does not contain the shard key for the respective collection.
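The cached key ranges can be pictured as a list of chunks sorted by their lower bound. This hypothetical sketch finds the owning shard for an skey value with a binary search over that list, and broadcasts to every shard when the query does not contain skey (as with the ikey query in the question). The field names come from the question; chunk boundaries and shard names are made up, and only simple equality on skey is modelled.

```javascript
// Cached routing table for a collection sharded on { skey: 1 }: chunks sorted
// by inclusive lower bound; each chunk ends where the next one starts, and the
// last chunk extends to MaxKey.
const routingTable = [
  { min: -Infinity, shard: "shard0" },
  { min: 100, shard: "shard1" },
  { min: 500, shard: "shard2" },
];
const allShards = ["shard0", "shard1", "shard2"];

// Binary search for the last chunk whose min <= key.
function ownerOf(key) {
  let lo = 0;
  let hi = routingTable.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (routingTable[mid].min <= key) lo = mid;
    else hi = mid - 1;
  }
  return routingTable[lo].shard;
}

// Decide which shards receive a query with simple equality predicates.
function targets(query) {
  if ("skey" in query) return [ownerOf(query.skey)];
  return allShards; // no shard key in the query: send it to every shard
}

console.log(targets({ skey: 250 }));         // [ 'shard1' ]
console.log(targets({ ikey: "something" })); // [ 'shard0', 'shard1', 'shard2' ]
```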

Query against local MongoDB shard data only

I have a sharded collection, with a shard key "user id". I would like to perform a query where, instead of passing the shard key, I simply restrict the query to only the data on the local mongos shard.
Is this possible / advisable?
Furthermore, can it be used with findAndModify? This would allow me to perform atomic updates on local documents, without specifying a shard key in the query.
Edit
As stated in some answers and comments below, my understanding of mongos vs. mongod was a little skewed. I now appreciate that mongos doesn't hold the local data.
Does mongos have any "local" data?
No. Each mongos daemon routes queries to your shards and does not store any data itself, so there is no such concept as "local" documents stored by a mongos. The mongos interface provides a logical view of the entire sharded cluster and does not have affinity to a specific shard.
Based on the type of query/command you send to mongos, the query will be:
Directed: sent to a specific shard if the query uses the shard key
Targeted: sent to applicable shards if the query includes multiple shard key values (or uses a prefix subset of a compound shard key)
Scatter/gather: sent to all shards if the query does not use the shard key
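The three categories can be sketched for a compound shard key. The shard key { userId: 1, ts: 1 } and the field names are hypothetical, and the classifier is deliberately simplified: it treats plain values as equality matches, and any other condition ($in, ranges, and so on) on the leading shard key field as "targeted". The real router works from the chunk ranges and may, for example, still hit every shard for a wide range.

```javascript
// Hypothetical compound shard key { userId: 1, ts: 1 }.
const shardKeyFields = ["userId", "ts"];

// Classify a query using the categories from the answer above (simplified).
function classify(query) {
  let pinned = 0; // leading shard key fields matched by a single plain value
  for (const field of shardKeyFields) {
    if (!(field in query)) break;
    const cond = query[field];
    const isPlainValue = cond === null || typeof cond !== "object" || cond instanceof Date;
    if (!isPlainValue) break; // $in, ranges, etc. on this field
    pinned++;
  }
  if (pinned === shardKeyFields.length) return "directed";
  // Any condition on the leading shard key field lets mongos narrow the shards.
  if (shardKeyFields[0] in query) return "targeted";
  return "scatter/gather";
}

console.log(classify({ userId: 7, ts: 3 }));         // directed
console.log(classify({ userId: 7 }));                // targeted (prefix of the key)
console.log(classify({ userId: { $in: [1, 2] } }));  // targeted (several key values)
console.log(classify({ ts: 3 }));                    // scatter/gather (not a prefix)
```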
Should I read from shards directly?
No. It's technically possible to read data from the shards directly, but it is definitely not recommended, as you can get an inconsistent view of the data. For example, if a migration is in progress, the data will temporarily exist on both the donor shard and the target shard. Similarly, copies of documents may be orphaned as the result of failed migrations.
Querying through mongos correctly directs queries to the appropriate shard(s) and filters results based on the sharded cluster metadata.
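A toy model of why reading the shards directly can mislead: after an interrupted or still-running migration, a document can physically exist on two shards, but the cluster metadata says only one of them owns its chunk. Adding up per-shard results counts it twice, while filtering by ownership (as queries routed through mongos do) counts it once. The ownership rule, shard names and documents are made up for illustration.

```javascript
// Chunk ownership according to the (hypothetical) cluster metadata.
const owner = (key) => (key < 100 ? "shardA" : "shardB");

// Physical contents of each shard. Document 150 was copied to shardB by a
// migration but not deleted from the donor shardA: it is an orphan there.
const shardData = {
  shardA: [{ key: 1 }, { key: 50 }, { key: 150 }],
  shardB: [{ key: 150 }, { key: 200 }],
};

// Naive: query each shard directly and add up everything it stores.
const directCount = Object.values(shardData).reduce((n, docs) => n + docs.length, 0);

// Routed: each shard only returns documents whose chunk it owns.
const routedCount = Object.entries(shardData).reduce(
  (n, [shard, docs]) => n + docs.filter((d) => owner(d.key) === shard).length,
  0
);

console.log(directCount, routedCount); // 5 4
```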
Can I use findAndModify() on a sharded collection without a query based on a shard key?
No. For a sharded collection, findAndModify() requires a query based on the shard key. The shard key provides a guarantee that the requested document only exists on one shard.
Can I update sharded collections without going through mongos?
No. All updates to a sharded collection must go through mongos.
Keep in mind that doing so is inadvisable, as traffic to a sharded cluster should go through a mongos.
That said, it's possible to query a shard directly if you run the query locally on the shard instance.
I've never tried doing that programmatically, but it may be worth a shot.
You can log in directly to the machine running the shard and open a mongo shell there. If you've never created a local user/password on it, I believe you can connect without credentials; otherwise, the mongod process on that specific shard must have its own user/password, because users created via mongos are not recognised by the individual mongod shards.
Because each shard only knows about its own data, if you run, for example, a count() on one of your collections there, the result will be only a portion of the actual collection size.
Your question is a little vague because it mixes up terms:
I simply restrict the query to only the data on the local mongos shard.
The shard will in fact be a mongod process, not a mongos process. However, your wording could make sense if you run one mongos per shard, in which case you would want to direct queries to the mongos on that host so that it queries the local mongod's data.
If you are considering circumventing mongos, then Stennie's comment answers your question. However, if you mean something else, I do not believe mongos currently has an option that lets you direct queries to a specific shard without a shard key.