MongoDB sharded cluster and chunks

I'm quite new to MongoDB and I'm wondering: is it normal that I don't have any chunks on a MongoDB sharded cluster?
Let me illustrate. I've got three shards:
mongos> use config
mongos> db.getSiblingDB("config").shards.find()
{ "_id" : "shard1", ... }
{ "_id" : "shard2", ... }
{ "_id" : "shard3", ... }
mongos>
I've got some databases, in particular one on shard1:
mongos> db.getSiblingDB("config").databases.find()
{ "_id" : "udev_prod", "partitioned" : false, "primary" : "shard1" }
But no chunks at all:
mongos> db.getSiblingDB("config").chunks.find()
mongos>
On top of that, if I connect to the udev_prod database and try to get the shard distribution of any collection, MongoDB tells me it's not sharded:
mongos> db.User.getShardDistribution()
Collection udev_prod.User is not sharded.
I think I'm missing something here, or it is not working correctly. Could someone tell me if this situation is "normal"?
Thanks a lot
Best Regards
Julien

This is the key piece from your find() on the databases collection:
"partitioned" : false
That means that sharding has not been enabled for the database. You need to enable sharding for the database first, and then shard a collection (picking a shard key) before any chunks are created. Otherwise the database just lives on a single shard; it's still usable, just not sharded.
There is a full tutorial available for setting up a sharded cluster with sharded collections; this is the section you want to start with.
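As a minimal sketch (the hashed _id shard key is just an example; pick a key that suits your queries), enabling sharding for your database and then for a collection looks like this from mongos:
mongos> sh.enableSharding("udev_prod")
mongos> sh.shardCollection("udev_prod.User", { "_id" : "hashed" })
Once the collection is sharded, chunks will appear in config.chunks and getShardDistribution() will report per-shard statistics.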


MongoDB memory engine vs Redis for caching writes

I have a server that processes users' page viewing history with MongoDB.
When a user views a page, documents are saved like this:
view_collection
{ "_id" : "60b212afb63a57d57a8f0006",
"pageId" : "gh42RzrRqYbp2Hj1y",
"userId" : "9Swh4jkYOPjWSgxjm",
"uniqueString" : "s",
"views" : {
"date" : ISODate("2021-01-14T14:39:20.378+0000"),
"viewsCount" : NumberInt(1)
}}
page_collection
{"_id" : "gh42RzrRqYbp2Hj1y", "views" : NumberInt(30) ,"lastVisitors" : ["9Swh4jkYOPjWSgxjm"]}
user_collection
{
    "_id" : "9Swh4jkYOPjWSgxjm",
    "statistics" : {
        "totalViewsCount" : NumberInt(1197)
    }
}
Everything is working fine, except that I want to find a way to cache the write operations going to the database.
I've been thinking about using Redis to cache the writes and then periodically looping through the Redis keys to insert the results into the database (but that seems complicated and would need a lot of coding). I also found that MongoDB has an in-memory storage engine, with which I might not need to rewrite everything from scratch; I could simply change some mongod configuration to get write caching working.
Redis is a much less featureful data store than MongoDB. If you don't need any of MongoDB's functionality on your data, you can put it in Redis for higher performance.
The MongoDB in-memory storage engine sounds like a premature optimization here.
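That said, if you do go the Redis route, the buffering idea does not need much code. Here is a minimal sketch in Node.js, assuming the ioredis and mongodb packages; all names are illustrative and error handling is omitted:
const Redis = require("ioredis");
const redis = new Redis();

// Called on every page view: increment a per-page counter in Redis
// instead of writing to MongoDB directly.
async function recordView(pageId) {
    await redis.hincrby("view_buffer", pageId, 1);
}

// Called periodically (e.g. from setInterval): flush buffered counts to MongoDB.
async function flushViews(db) {
    // Rotate the buffer first so concurrent increments land in a fresh hash.
    try {
        await redis.rename("view_buffer", "view_buffer_flushing");
    } catch (e) {
        return; // RENAME fails when the source key is missing, i.e. nothing to flush
    }
    const counts = await redis.hgetall("view_buffer_flushing");
    const ops = Object.entries(counts).map(([pageId, n]) => ({
        updateOne: {
            filter: { _id: pageId },
            update: { $inc: { views: parseInt(n, 10) } },
            upsert: true
        }
    }));
    if (ops.length) await db.collection("page_collection").bulkWrite(ops);
    await redis.del("view_buffer_flushing");
}
The trade-off is durability: counts sitting in Redis between flushes are lost if Redis goes down, which is often acceptable for view statistics.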

How to get notified when a new field is added to a MongoDB collection?

I have a GraphQL schema defined which needs to change at runtime whenever a new field is added to a MongoDB collection. For example, a collection has just two fields before:
person {
    "age" : "54",
    "name" : "Tony"
}
And later a new field, "height", is added:
person {
    "age" : "54",
    "name" : "Tony",
    "height" : "167"
}
I need to change my GraphQL schema and add height to it. So how do I get alerts or notifications from MongoDB?
MongoDB does not natively implement event messaging. Out of the box, you cannot be informed of database, collection, or document updates.
However, MongoDB has an 'operation log' (oplog) feature, which gives you access to a journal of every write operation on collections.
These logs are used for MongoDB replication, i.e. cluster synchronization. To activate the oplog you need to run MongoDB as a replica set, typically a primary and at least one secondary.
The oplog is built on the capped collection feature, which gives a collection an append-only structure that ensures fast writes and supports tailable cursors. The documentation says:
The oplog exists internally as a capped collection, so you cannot
modify its size in the course of normal operations.
MongoDB - Change the Size of the Oplog
And:
Capped collections are fixed-size collections that support
high-throughput operations that insert and retrieve documents based on
insertion order. Capped collections work in a way similar to circular
buffers: once a collection fills its allocated space, it makes room
for new documents by overwriting the oldest documents in the
collection.
MongoDB - Capped Collections
The documents in the oplog look like this:
{
    "ts" : Timestamp(1395663575, 1),
    "h" : NumberLong("-5872498803080442915"),
    "v" : 2,
    "op" : "i",
    "ns" : "wiktory.items",
    "o" : {
        "_id" : ObjectId("533022d70d7e2c31d4490d22"),
        "author" : "JRR Hartley",
        "title" : "Flyfishing"
    }
}
Eg: "op" : "i" means operation is an insertion and "o" is the object inserted.
The same way, you can be informed of update operations:
"op" : "u",
"ns" : "wiktory.items",
"o2" : {
"_id" : ObjectId("533022d70d7e2c31d4490d22")
},
"o" : {
"$set" : {
"outofprint" : true
}
}
Note that the operation log (you access it as a collection) is limited either in disk size or in number of entries (FIFO). This means that whenever your oplog consumers are slower than the oplog writers, you will eventually miss operation log entries, resulting in corrupted consumption results.
This is why MongoDB is poor at guaranteeing document tracking on heavily loaded clusters, and why messaging solutions such as Apache Kafka are used as a supplement for event tracking (e.g. tracking document updates).
To answer your question: in a reasonably loaded environment, you might want to take a look at the JavaScript Meteor project, which allows you to trigger events based on changes in query results, and relies on the MongoDB oplog.
Credits: oplog examples are from The MongoDB Oplog.
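For reference, here is a minimal sketch of tailing the oplog from the legacy mongo shell; it assumes you are connected to a replica set member, and the namespace is the one from the examples above:
var oplog = db.getSiblingDB("local").oplog.rs;
var cursor = oplog.find({ "ns" : "wiktory.items", "op" : { $in : [ "i", "u" ] } })
                  .addOption(DBQuery.Option.tailable)
                  .addOption(DBQuery.Option.awaitData);
while (cursor.hasNext()) {
    printjson(cursor.next());
}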
As of MongoDB 3.6 you can subscribe to a change stream and listen for "update" operations. More details here:
https://stackoverflow.com/a/47184757/5103354
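A minimal sketch in the mongo shell, assuming a replica set and the person collection from the question:
var changeStream = db.person.watch([
    { $match : { operationType : { $in : [ "insert", "update" ] } } }
]);
while (changeStream.hasNext()) {
    printjson(changeStream.next());
}
For updates, each change event carries updateDescription.updatedFields, which lists the new or modified fields; that is exactly what you need to detect a newly added field like height.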

Are all shards' explain stages the same?

I have three shard servers and one mongos server. I connect to the mongos server to run some queries. When I look at the explain output from these queries, it always gives me the same number of explain stages for each shard server. I wonder whether I can get a different number of explain execution stages for different shard servers. I tried creating an index on one shard but not on the other two, but it still gives me the same number of stages in the explain output.
It is possible to have "SINGLE_SHARD" as a stage, e.g.:
{
    "queryPlanner" : {
        "mongosPlannerVersion" : 1,
        "winningPlan" : {
            "stage" : "SINGLE_SHARD",
            ...
It means that the query was sent to a single shard (as opposed to when you don't filter on the shard key and the query has to be sent to multiple shards).
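As a sketch (the collection and shard key are hypothetical), a query that filters on the shard key can be routed to one shard, while a query that doesn't is scattered to all shards and merged:
mongos> db.orders.find({ "customerId" : 12345 }).explain().queryPlanner.winningPlan.stage
SINGLE_SHARD
mongos> db.orders.find({ "status" : "open" }).explain().queryPlanner.winningPlan.stage
SHARD_MERGE
To see per-shard differences (such as your one-sided index), look at the per-shard plans under winningPlan.shards in the explain output rather than at the top-level stage.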

Mongo sharding not removing sharded collection data from the source shard

I have MongoDB 3.2.6 installed on 5 machines which together form a sharded cluster consisting of 2 shards (each a replica set with a primary-secondary-arbiter configuration).
I also have a database with a very large collection (~50M records, 200GB). It was imported through mongos, which put it on the primary shard along with the other collections.
I generated a hashed index on that collection, which will be my shard key.
After that I sharded the collection with:
> use admin
> db.runCommand( { enablesharding : "my-database" } )
> use my-database
> sh.shardCollection("my-database.my-collection", { "_id": "hashed" } )
The command returned:
{ "collectionsharded" : "my-database.my-collection", "ok" : 1 }
And it actually started to shard. The shard distribution looks like this:
> db.getCollection("my-collection").getShardDistribution()
Totals
data : 88.33GiB docs : 45898841 chunks : 2825
Shard my-replica-1 contains 99.89% data, 99.88% docs in cluster, avg obj size on shard : 2KiB
Shard my-replica-2 contains 0.1% data, 0.11% docs in cluster, avg obj size on shard : 2KiB
This all looks OK, but the problem is that when I count my-collection through mongos, I see the number increasing.
When I log in to the primary shard's replica set (my-replica-1) I see that the number of records in my-collection is not decreasing, although the number in my-replica-2 is increasing (which is expected), so I guess MongoDB is not removing chunks from the source shard while migrating them to the second shard.
Does anyone know if this is normal, and if not, why it is happening?
EDIT: Actually, it has now started to decrease on my-replica-1, although the count through mongos still grows (sometimes it goes down a little and then back up). Maybe this is normal behaviour when migrating a large collection, I don't know.
Ivan
According to the documentation here, you are observing a valid situation.
When a document is moved from shard A to shard B, it is counted twice until shard A receives confirmation that the relocation was successful.
On a sharded cluster, db.collection.count() can result in an
inaccurate count if orphaned documents exist or if a chunk migration
is in progress.
To avoid these situations, on a sharded cluster, use the $group stage
of the db.collection.aggregate() method to $sum the documents. For
example, the following operation counts the documents in a collection:
db.collection.aggregate(
    [
        { $group: { _id: null, count: { $sum: 1 } } }
    ]
)
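Applied to the collection in question (getCollection is used here because the hyphenated name is not a valid shell identifier):
db.getCollection("my-collection").aggregate(
    [
        { $group: { _id: null, count: { $sum: 1 } } }
    ]
)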

MongoDB: Is it harmful to issue a command to enable sharding on a collection which already has sharding enabled?

One of our projects is using MongoDB and our database's collections are sharded. Our collections get created dynamically, so we had to write code inside our application layer (written in PHP) to tell MongoDB to shard the collections. Hence I am looking at two options:
Is there any way to identify if a collection is sharded via PHP? [or]
Is there any problem in telling MongoDB to shard a collection which is already sharded?
1] You can identify whether the collection is sharded or not by parsing the string output of db.printShardingStatus() (a config.collections alternative is sketched after the example below).
Check this to learn how to capture MongoDB's printed output.
2] There is no issue in asking MongoDB to shard a collection which is already sharded.
Mongo will never set "ok" to 1 if it is already sharded.
db.runCommand({ enablesharding : "sharded_db_name" })
{ "ok" : 0, "errmsg" : "already enabled" }
1. I'm not entirely sure on this as I don't use the PHP driver; however, in the shell, when one runs sh.status(), it is possible to see which collections are sharded (a collStats-based alternative is sketched at the end of this answer). For example, below I have sharded the "tweets" collection in the "twitter" database:
databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "twitter", "partitioned" : true, "primary" : "shard0000" }
        twitter.tweets chunks:
            shard0001    11
            shard0002    11
            shard0000    12
There's further sharding documentation here.
PHP driver documentation is here.
2. You can't shard an already-sharded collection; I'm not really sure why you would need to, it doesn't make sense.
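Regarding 1., another option (a sketch; the namespace is the one from the example above) is to check the sharded flag that collStats reports when run through mongos; the PHP driver can issue the same command:
mongos> db.getSiblingDB("twitter").tweets.stats().sharded
true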