I am managing my PostgreSQL cluster in docker-compose and I have connected to master(coordinator) node via external client. For a given table(for example companies) and a given node number(for example worker 2 or other identifier), can I view sharded tables(for example companies_000005},companies_0000010} and their rows on worker 2) in master node without directly connecting to the individual node?
To see some information about the shards (such as shard sizes or which node the shard is on), you can use the following query with Citus 10 and later:
SELECT * FROM citus_shards;
Also, accessing the shards directly is not a suggested pattern, and it prevents certain checks/enforcements that Citus does around distributed locking and correctness etc. With that in mind, why do you need to access the shards directly?
Related
I have a few questions about MongoDB standalone and Replica sets, I don't really get it.
When should I use either of them
Why all the replica sets tutorials show 3 connections, is there a reason?
Can I create a replica set for 1 instance only? and in that case how is it different than the standalone mongodb instance?
How to Migrate data from a standalone instance to replica sets?
All these questions I'm asking because recently I was trying to implement transactions and sessions can only start on "replica sets" I don't really get the difference at all.
When should I use either of them?
Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability with multiple copies of data on different database servers. Replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
To keep your data safe
High (24*7) availability of data
Disaster recovery
No downtime for maintenance (like backups, index rebuilds, compaction) Read scaling (extra copies to read from)
Replica set is transparent to the application
Why all the replica sets tutorials show 3 connections, is there a reason?
The basic implementation to take full advantage of replication specifies you
should have at least one primary node with two secondary nodes. So the
examples are always with 3 nodes. Not only this if from 3 the
Primary node goes down you still have 2 nodes (mongoDB will assign
using arbiter rule) and one primary and one secondary for high availability
Can I create a replica set for 1 instance only? and in that case how is it different than the standalone mongodb instance?
It does not make sense to have single instance with mongo replication.
How to Migrate data from a standalone instance to replica sets?
Convert a Standalone to a Replica Set . Your existing data will be migrated to all replication instances once they are up and running when converted to replication sets from standalone.
Technically is it supported to begin with just one shard for a shard cluster? So we can be ready for adding additional one(s) at anytime, at the same time save the cost of additional shard(s) before we really need it(them)?
To go further, is it possible to have a shard running on one single instance, instead of having to be based off of a 3 instance replica set?
From here, sharding is:
A database architecture that partitions data by key ranges and
distributes the data among two or more database instances.
A shard will be either a replica set or a standalone mongod instance. It is possible for you to use a single machine by using different ports to establish distinct communication endpoints for the config, mongod and mongos processes on the single machine. Also, yes, you may add a shard at a later time when you need to expand.
However, the point of providing sharding is to support horizontal scaling. Additionally, the point of sharded clustering is to provide failover and redundancy support. By using a single shard on a single server, you are losing the benefits of scaling and certainly failover.
The recommended production architecture includes:
Three config servers on separate machines for each sharded cluster.
Two or more replica sets as shards.
One or more query routers (mongos); typically, one mongos instance per application server.
Peruse the Sharded Cluster Requirements section in the documentation to get a feel for whether or not your environment needs sharding and sharded clusters since there is complexity in establishing such an architecture.
I want to make continuous backup of my Sharded Cluster on a single MongoDB server somewhere else.
So, is it possible to create Replica Set with Sharded Cluster (mongos instance) and single MongoDB server?
Did anyone experience creating Replica Sets with two Sharded Clusters or with one Sharded Cluster and one Single Server?
How does it work?
By the way, the best (and for now, the only) way to continuously backup Sharded Cluster is by using MongoDB Management Service (MMS).
I were also facing the same issue sometime back. I wanted to take replica of all sharded cluster into one MongoDB. But, i didn't found any solution for this scenario, and I think this is true, because -
If you configure the multiple shard server (say 2 shard server) with
one replica set, then this will not work because in a replica set (say
rs0) only 1-primary member is possible. And In this scenario, we will
have multiple primary depend on number of shard server.
To take the backup of your whole sharded cluster, you must stop all writes to the cluster. You can refer to MongoDB documentation on this - http://docs.mongodb.org/manual/tutorial/backup-sharded-cluster-with-database-dumps/
I would like to create a development project on EC2 cluster. Current design suggest accessing mongo database files stored on EBS volume. If that is possible to run distributed computing and access same files in /data/db/ simultaneously from different nodes?
No, that will not work. You cannot access the same mongodb database files from different processes on different nodes.
The way you use mongoDB in a distributed environment is with replica sets and sharding. In both cases you have mongodb instances running on each node. Replica sets duplicate the same data across all the nodes in the set, for data redundancy and fault tolerance. Sharding allows you to distribute different sets of data on different nodes to provide horizontal scaling. Large production environments use both replica sets and sharding.
Best place to read up on all of this is:
http://docs.mongodb.org/manual/administration/replica-sets/
http://docs.mongodb.org/manual/sharding/
http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/
A MongoDB instance can have different roles:
Config server
Router (mongos)
Data server
Arbiter server (for replica sets)
I know that db.serverStatus() can be used to see if an instance is a router, the process value is mongos.
But for config servers, arbiters and data nodes the process value is mongod.
Is there a simple way of distinguishing between these instance types?
I want to bring attention to one particular important issue with this question: sharding is and horizontal dimension ( several replicasets where data is distributed to ) and replicaset is a high availability solution which is represented by the composition of different mongod nodes!
So you actually what you are trying to figure out is:
ReplicaSet nodes roles
Shard Nodes members
In the case of a replicaSet what you might be interested in knowing is each node role. You can easily get the information without needing to connect to all the nodes of the replicaset, just run the command:
db.isMaster()
with this you will get the node members and roles of each member.
For shard node members first of all you should never try to connect directly to the config servers. These are their to manage the distribution of chunks, chunk splits and other configuration data, relevant only for the shard cluster functionality. Avoid using those ip's to connect to from your application.
So if you want to have a clear view of which members compose your shard cluster, how many shards you have etc, you need to run command:
db.printShardStatus()
or
sh.status()
Please review the documentation here
Cheers,
N.