I am scaling multiple schemas across multiple machines, using PostgreSQL as the backend, and I want to look up which machine a given schema resides on. For example, I have two machines, M1 and M2, each with a database installed: the tenant1 schema lives in DB1 on M1 and the tenant2 schema lives in DB2 on M2. Both tenants use the same application server. I want some solution that sits in the middle and performs the lookup and the caching/pooling of connections. Is it possible to get this done using a cluster, where every node holds some portion of the data? Or do I have to write a program like PgPool-II that appears as the database server and looks up the schema for me?
I would suggest one of two things.
You could use PgPool or application-level connection pooling to make the routing decision.
If you need it to be transparent to the database clients, you could set up schemas with PL/Proxy procedures to handle the sharding. This would allow your "front-end" database servers to coordinate queries across the storage shards (each in a different partition).
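For the application-level option, here is a minimal sketch of what the lookup/pooling middle layer could look like. Everything here is illustrative: the tenant map, hostnames, and credentials are made up, and in practice the map could live in a small catalog database or a cache.

```python
import psycopg2
from psycopg2 import pool

# Hypothetical tenant -> machine/database lookup table.
TENANT_MAP = {
    "tenant1": {"host": "M1", "dbname": "DB1"},
    "tenant2": {"host": "M2", "dbname": "DB2"},
}

# One connection pool per backend database, created lazily and cached.
_pools = {}

def get_connection(tenant):
    """Look up which machine holds the tenant's schema and return a
    pooled connection with search_path pointed at that schema."""
    target = TENANT_MAP[tenant]
    key = (target["host"], target["dbname"])
    if key not in _pools:
        _pools[key] = pool.SimpleConnectionPool(
            1, 10, host=target["host"], dbname=target["dbname"],
            user="app", password="secret")
    conn = _pools[key].getconn()
    with conn.cursor() as cur:
        # Scope unqualified table names to the tenant's schema.
        cur.execute("SELECT set_config('search_path', %s, false)", (tenant,))
    return conn
```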
I have an Aurora PostgreSQL cluster in AWS with one DB instance in it. The database is used only for writes, not for reads; I use it as a backup database. I know Aurora has a read-scaling policy and that I can create multiple DB instances in the cluster to improve read performance, but that doesn't benefit my case (write only). My question is: is there any benefit for me in spinning up multiple DB instances in the cluster? Aurora PostgreSQL is single-master, which means only one instance can take writes, so if I deploy multiple instances/replicas, they are basically useless. Do I understand that correctly?
Yes, you are correct. In your case there is no reason to launch additional instances in the cluster.
In the future, you may be able to use Aurora multi-master to give you more performance for writes. This is only available for MySQL 5.6 at the moment. See https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html
Suppose my master server has 10-15 databases registered, and my slave server holds only one or two of them. Is there any way to sync just those two databases between master and slave? While configuring replication, the settings appear to apply to the whole server only. So, is there any configuration to specify which databases to replicate?
This will be confusing for some due to poor terminology choices by the PostgreSQL folks, but please bear with me...
We need to be able to support multiple PostgreSQL (PG) clusters, and cluster them across multiple servers using, e.g., repmgr - for example, to support both server availability and PITR for each PG cluster. A single PG cluster per server is too expensive in many cases, so we multi-tenant (small) customers on separate PG clusters for data separation, recovery, etc., but we also want to be able to support HA via replication/fail-over.
The closest analogy for a PG cluster is a SQL Server instance - each can host multiple DBs, has its own port, etc. Like SQL Server, you can run multiple instances (PG clusters) on the same server and set up replication for each.
Basic repmgr setup is no problem - that seems fairly clear in the single PG cluster model. But, is there any recommended/supported approach to multiple PG clusters using repmgr? I can kind of imagine faking repmgr into thinking each PG cluster is in effect a separate repmgr cluster (with separate repmgr.conf, connection info/port). But, I'm not yet sure that will work.
I'd typically expect to fail-over all PG clusters on the same server - not one at a time.
I recognize this may not be the best idea in all cases, but am mostly exploring what's possible. I have some alternatives, but this is closest to our current single-node model.
To clarify, I need to support many thousands of customers across many server clusters. Ideally, each cluster uses the same repmgr DB (in the main PG cluster, e.g.), and essentially stands alone from the other server clusters.
Thanks...
Answering my own question, but I hope someone eventually posts a better answer, as I otherwise quite like repmgr. In the end, it appears repmgr just isn't suitable for multiple PG clusters (instances), as there is an implied relationship between the repmgr cluster connection strings and the PG cluster (port). Thus, you'd essentially have to create a separate repmgr environment (DB) for every clustered PG cluster/instance, losing a lot of the operational simplicity that repmgr brings to the table.
I will investigate a more generic solution using Corosync/Pacemaker/etc., as at least in that case the virtual cluster IP handling is built into the solution and doesn't require additional software/resources to pull off.
I'm sure I'm probably over-simplifying things, but it seems like repmgr was tantalizingly close to solving much of the problem, had it allowed the repmgr DB to be fully independent of the PG cluster and let each repmgr cluster specify its own connection info, not (only) the connection info for the repmgr DB itself.
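For the record, the "separate repmgr environment per PG cluster" approach described above would look roughly like this: one repmgr.conf (and one repmgr DB and repmgrd daemon) per PG cluster, with all ports and paths below purely illustrative.

```
# /etc/repmgr/cluster1/repmgr.conf - PG cluster listening on port 5432
node_id=1
node_name='m1-cluster1'
conninfo='host=m1 port=5432 dbname=repmgr user=repmgr'
data_directory='/var/lib/postgresql/cluster1/data'

# /etc/repmgr/cluster2/repmgr.conf - PG cluster listening on port 5433
node_id=1
node_name='m1-cluster2'
conninfo='host=m1 port=5433 dbname=repmgr user=repmgr'
data_directory='/var/lib/postgresql/cluster2/data'
```

Each repmgrd instance is then started with -f pointing at its own file, which is exactly the duplicated operational machinery described above.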
I would like to create a development project on an EC2 cluster. The current design suggests accessing MongoDB database files stored on an EBS volume. Is it possible to run distributed computing and access the same files in /data/db/ simultaneously from different nodes?
No, that will not work. You cannot access the same MongoDB database files from different processes on different nodes.
The way you use MongoDB in a distributed environment is with replica sets and sharding. In both cases you have MongoDB instances running on each node. Replica sets duplicate the same data across all the nodes in the set, for data redundancy and fault tolerance. Sharding allows you to distribute different sets of data on different nodes to provide horizontal scaling. Large production environments use both replica sets and sharding.
Best place to read up on all of this is:
http://docs.mongodb.org/manual/administration/replica-sets/
http://docs.mongodb.org/manual/sharding/
http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/
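As a quick illustration of the client side (all hostnames made up), connecting to a replica set and sharding a collection through mongos could look like this with pymongo:

```python
from pymongo import MongoClient

# Connect to a replica set; the driver discovers the members and
# follows primary elections automatically (hypothetical hostnames).
client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")
db = client["mydb"]
db.events.insert_one({"user_id": 42, "action": "login"})  # routed to the primary

# For sharding, connect to a mongos router instead and shard
# individual collections by a shard key.
mongos = MongoClient("mongodb://mongos1:27017")
mongos.admin.command("enableSharding", "mydb")
mongos.admin.command("shardCollection", "mydb.events",
                     key={"user_id": "hashed"})
```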
I have a working MongoDB replica set made up of 3 servers.
It stores two DBs, and I wonder whether it is possible to replicate only one of them without running more than one MongoDB instance (one per DB).
Here is a sketch of the "problem":

        Server1   Server2   Server3
DB1        X         X         X
DB2        X         X

An X marks a server the DB has to be replicated to.
Thanks.
I don't believe it is possible.
Unlike sharding, where you specify down to the collection level what gets sharded, with replica sets you're defining that a given MongoDB instance is part of a replica set. As only one node in a replica set can be the master at any given time, based on the scenario you are talking about, then there would be a problem if e.g. Server1 went down and Server3 was promoted to master - as DB2 would then not be able to be written to.
I had a similar problem and found a fairly easy solution in JavaScript, to be executed in a mongo shell.
Source code available here:
http://www.suenkel.de/blog/2012/02/mongodb-replicate-one-database-or-collection/
By opening a tailable cursor on the oplog of the master server, each operation can be applied to another server (and of course you can filter by the namespace of the collections, or even the databases...).
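The linked script is mongo-shell JavaScript, but the same idea in Python/pymongo looks roughly like this; the hostnames and the database name DB2 are placeholders, and applyOps is just the shortest way to replay entries in a sketch:

```python
import pymongo
from pymongo import MongoClient, CursorType

source = MongoClient("mongodb://server1:27017")  # replica set primary
target = MongoClient("mongodb://server3:27017")  # server receiving the copy

oplog = source.local.oplog.rs

# Start tailing from the most recent oplog entry.
last = oplog.find().sort("$natural", pymongo.DESCENDING).limit(1).next()
ts = last["ts"]

# Tailable cursor over the oplog, filtered to inserts/updates/deletes
# whose namespace belongs to DB2.
cursor = oplog.find(
    {"ts": {"$gt": ts},
     "ns": {"$regex": r"^DB2\."},
     "op": {"$in": ["i", "u", "d"]}},
    cursor_type=CursorType.TAILABLE_AWAIT)

while cursor.alive:
    for entry in cursor:
        # Replay each filtered operation on the target server.
        target.admin.command("applyOps", [entry])
```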
With the current MongoDB replica set architecture, you can't have a single replica set in which some members carry only part of the databases or collections.
However, if you have a requirement to replicate a single database or collection in real time to another location, I ended up with the following workaround:
1. Use directoryPerDB to separate the desired database files (create a new replica with this option enabled if you don't have it already).
2. Copy the directory of the desired database to the new location.
3. Deploy a new replica set with this single database.
4. Write a simple script and use Change Streams to perform the replication for you (see the sketch below).

As I said, you will end up with another replica set dedicated to this database, but replication is done in real time and both replica sets have the data in a consistent way (you have to perform your write operations on the first replica set, though).
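A minimal sketch of step 4, assuming pymongo and made-up connection strings; it watches one database on the source replica set and replays each change on the dedicated one:

```python
from pymongo import MongoClient

# Hypothetical connection strings for the source replica set and the
# new single-database replica set.
source = MongoClient("mongodb://src1,src2,src3/?replicaSet=rs0")["mydb"]
target = MongoClient("mongodb://dst1,dst2,dst3/?replicaSet=rs1")["mydb"]

# Watch every collection in the source database and replay each change.
with source.watch(full_document="updateLookup") as stream:
    for change in stream:
        coll = target[change["ns"]["coll"]]
        op = change["operationType"]
        if op == "insert":
            coll.insert_one(change["fullDocument"])
        elif op in ("update", "replace"):
            coll.replace_one({"_id": change["documentKey"]["_id"]},
                             change["fullDocument"], upsert=True)
        elif op == "delete":
            coll.delete_one({"_id": change["documentKey"]["_id"]})
```

In production you would also persist the stream's resume token so the script can restart without missing events.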