MongoDB write operation in a sharded setting - mongodb

I am new to MongoDB. While going through a tutorial, a question came to my mind: in a sharded environment, during a read operation, mongos first checks the config server for the details of which shard it has to query. But what about during a write operation? Does it first check which shard it has to write to?
Thanks in advance,
Kitty

I am going to answer based on the current stable release of MongoDB, v3.2.
The config servers store the cluster's metadata in the config database. The mongos instances cache this data and use it to route reads and writes to shards.
MongoDB only writes data to the config servers when the metadata changes, such as:
After a chunk migration, or
After a chunk split.
MongoDB reads data from the config server in the following cases:
A new mongos starts for the first time, or an existing mongos restarts.
After a change in the cluster metadata, such as after a chunk migration.
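So for the original question: a write is routed the same way as a read. The mongos consults its cached metadata (not the config servers) and sends the write directly to the shard that owns the chunk for that shard-key value. A minimal sketch in the mongo shell, with made-up database, collection, and key names:

// connected to a mongos, not to a shard directly
sh.enableSharding("demo")
sh.shardCollection("demo.users", { userId: 1 })

// mongos uses its cached chunk map to route this insert to the
// one shard that owns the chunk containing userId 42
db.getSiblingDB("demo").users.insert({ userId: 42, name: "kitty" })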
See also: Sharded Cluster Mechanics.

Related

In MongoDB, can I run the compact command without shutting down each instance?

In our server structure, the primary, secondary, and arbiter each run on separate physical machines.
The MongoDB version is 4.2.3.
Because too many documents had accumulated in a specific collection, some of them were deleted, oldest first.
However, deleting the documents did not release the storage space.
Upon checking, I found that MongoDB's mechanism retains the freed bytes as reusable space even after documents are deleted.
I also found out that the unused disk space can be freed with the compact command on the WiredTiger engine.
Currently, all clients connected to the DB query using the arbiter's IP and port.
Since the DB uses only replication, not sharding, if I run the compact command on each instance independently, I expect that even while an instance is locked, the arbiter will distribute queries to the instances that are currently available.
Is this possible?
Or should I shut down each instance, run it standalone, run the compact command, and then reconfigure the PSA (primary-secondary-arbiter) set?
You may upgrade your MongoDB to the latest version, 4.4. From the documentation of compact:
Blocking
Changed in version 4.4.
Starting in v4.4, on WiredTiger, compact only blocks the following metadata operations:
db.collection.drop
db.collection.createIndex and db.collection.createIndexes
db.collection.dropIndex and db.collection.dropIndexes
compact does not block MongoDB CRUD Operations for the database it is currently operating on.
Before v4.4, compact blocked all operations for the database it was compacting, including MongoDB CRUD Operations, and was therefore recommended for use only during scheduled maintenance periods.
Starting in v4.4, the compact command is appropriate for use at any time.
To anyone looking for the answer with 4.4: please see this bug and the documentation entry, as the compact routine still forces the node into a recovery state if you are running a replica set (and I assume this is the default use case for most projects).
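For what it's worth, the rolling approach looks roughly like this in the mongo shell; this is only a sketch, and the database and collection names are placeholders:

// run this while connected directly to one secondary at a time
// (compact is issued against the database that holds the collection)
db.getSiblingDB("myDatabase").runCommand({ compact: "myCollection" })

// confirm the member has returned to SECONDARY state before
// moving on to the next node
rs.status().members.forEach(function (m) { print(m.name, m.stateStr) })

For the primary, step it down first (rs.stepDown()) and compact it last.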

About MongoDB: after adding a shard, does the router server need to restart?

I built a MongoDB sharding environment and wanted to test the performance of data migration.
I inserted one billion documents into a collection on Replica Set A.
Then I added another shard, Replica Set B.
MongoDB started to balance chunks between those shards.
After balancing finished, I found that I couldn't look up some of the data.
That data had been moved to Replica Set B, and only after restarting all the mongos router services was I able to query it.
Is this a normal and inevitable procedure, or is there a way to reload the whole system (through a mongo shell command or anything else)?
Thank you !!!
I found a command that seems to help reload the router config:
db.adminCommand({"flushRouterConfig":1});
2017-05-18: After testing, it works!
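Note that flushRouterConfig only refreshes the mongos you send it to, so in a setup with several routers you would run it against each one. A small sketch (hostnames are placeholders):

// refresh the cached routing table on every mongos
["router1.example.net:27017", "router2.example.net:27017"].forEach(function (host) {
  var conn = new Mongo(host)
  printjson(conn.getDB("admin").runCommand({ flushRouterConfig: 1 }))
})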

Hadoop MongoInputFormat Read Preference does not work with a MongoDB Shard

I have a Spark job (PySpark) running a query against Mongo. I intended to direct the query to a secondary.
I have a mongo input uri like below
'mongo.input.uri': 'mongodb://'+host+':27017/'+dbName+'.'+collection+'?readPreference=secondary'
As per my understanding, passing readPreference=secondary as an option in the mongo input URI is the way to make it read from a secondary (ref: https://github.com/mongodb/mongo-hadoop/wiki/Configuration-Reference).
However, when I run the job, I see a spike in faults in my primary nodes' monitoring. When I checked the logs on the primary nodes, it was confirmed that the query ran against the primary instead of a secondary.
What did I do wrong? Did I place the configuration incorrectly? Does it only work for a plain replica set, and not for a sharded cluster?
Note:
The MongoDB deployment is sharded, running version 2.6.x.
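One way to narrow this down, independent of the Hadoop connector, is to issue the same kind of query from the mongo shell and check which server answers; in the 2.6-era shell, the explain() output includes a "server" field (the collection name below is a placeholder):

// request a secondary read and inspect which member served it
var res = db.myCollection.find().readPref("secondary").explain()
printjson(res)  // look at the "server" field (per shard, when via mongos)

If the shell also reports the primary, the problem is in the cluster configuration rather than in the connector settings.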

One Shard with Multiple Mongos

Can we have this type of configuration?
Two servers, each running the following:
1. Mongo Config Server.
2. Mongo Router.
3. Application.
In total, 4 EC2 servers:
First server: running the web application & mongos.
Second server: running the web application & mongos.
Third server: running the first shard with the complete DB (say, for example, Demo).
Fourth server: running the second shard with the complete DB (say, for example, Demo).
Should both mongos instances point to one shard named Shard1?
Yes, you can have multiple mongos instances running against a single shard. Think of the mongos instances as clients for the sharded cluster, which have to run as daemon processes in order to keep metadata and heartbeats up to date.
Edit: as for having a complete DB, this is only possible for a single DB at a time. You can have one DB on shard1 and the other DB on shard2, for example, but you can never have a single complete DB on two shards. To achieve the goal of having db1 on shard1 and db2 on shard2, you simply make the respective shard the primary shard of the respective database and don't shard any collection. Please read the docs for the movePrimary command for details.
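A minimal sketch of that in the mongo shell, run against a mongos, with placeholder database and shard names:

// make shard1 the primary shard for db1, and shard2 for db2;
// as long as no collection in them is sharded, each DB then
// lives entirely on its primary shard
db.adminCommand({ movePrimary: "db1", to: "shard1" })
db.adminCommand({ movePrimary: "db2", to: "shard2" })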
A bit OOT:
However, running a single config server is strongly advised against, and for a good reason. If the single config server goes down or gets corrupted, your cluster will be impossible to use, and recreating the sharded cluster will not be an easy task; it's going to be a lengthy process. So please, use three config servers.

mongodb single DB replication

I have a working MongoDB replica set made up of 3 servers.
It stores two DBs, and I wonder whether it is possible to replicate only one of the DBs without running more than one MongoDB instance (one per DB).
Here is a sketch of the "problem"
       Server1   Server2   Server3
DB1       X         X         X
DB2       X         X
An X marks a server that the DB has to be replicated to.
Thanks
I don't believe it is possible.
Unlike sharding, where you specify down to the collection level what gets sharded, with replica sets you define that a given MongoDB instance is part of a replica set. As only one node in a replica set can be the master at any given time, in the scenario you describe there would be a problem if, e.g., Server1 went down and Server3 was promoted to master, as DB2 could then no longer be written to.
I had a similar problem and found a quite easy solution in JavaScript, to be executed in a mongo shell.
Source code available here:
http://www.suenkel.de/blog/2012/02/mongodb-replicate-one-database-or-collection/
By opening a tailable cursor on the oplog of the master server, each operation can be applied to another server (of course, you can filter by the namespace of the collections or even the databases...).
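Reduced to a sketch for the legacy mongo shell (the namespace filter is a placeholder, and this assumes a replica-set member, where the oplog lives in local.oplog.rs):

// tail the oplog and filter by namespace; each returned entry
// can then be applied to the target server
var oplog = db.getSiblingDB("local").oplog.rs
var cur = oplog.find({ ns: /^db1\./ })
               .addOption(DBQuery.Option.tailable | DBQuery.Option.awaitData)
while (cur.hasNext()) {
  printjson(cur.next())  // replace with code that applies the entry
}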
According to the current MongoDB Replica Set architecture, you can't have a single Replica Set in which some members hold only part of the databases or collections.
However, if you have the requirement of replicating a single database or collection in real time to another location, I ended up with the following workaround:
Use directoryPerDB to separate the desired database's files (create a new replica with this option enabled if you don't have it already).
Copy the directory of the desired database to the new location.
Deploy a new Replica Set with this single database.
Write a simple script and use Change Streams to perform the replication for you (a sketch follows below).
As I said, you will end up with another Replica Set dedicated to this database, but the replication is done in real time and both Replica Sets hold the data in a consistent way (you have to perform your write operations on the first Replica Set, though).
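A rough sketch of step 4 in the mongo shell (hostnames, database, and collection names are placeholders, and only inserts are handled; updates and deletes would be handled the same way):

// watch the source collection and replay inserts on the second
// Replica Set; point the target connection at its primary
var target = new Mongo("target-primary.example.net:27017").getDB("mydb")

var changes = db.getSiblingDB("mydb").mycoll
                .watch([], { fullDocument: "updateLookup" })

while (changes.hasNext()) {
  var ev = changes.next()
  if (ev.operationType === "insert") {
    target.mycoll.insertOne(ev.fullDocument)
  }
  // handle "update", "replace" and "delete" events similarly
}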