I have installed a Postgres cluster using the Zalando operator.
I have also enabled pgbouncer for the replicas and the master.
But I would like to combine or load balance the replica and master connections,
so that read requests are routed to the read replicas and write requests are routed to the master.
Can anyone help me achieve this?
Thanks in advance.
I tried enabling pgbouncer.
pgbouncer gets enabled either for the master or for the slaves.
But I need a single endpoint that routes read requests to the slaves and write requests to the master.
There is no safe way to distinguish reading and writing statements in PostgreSQL. pgPool tries to do that, but I think any such solution is flaky. You will have to teach your application to direct reads and writes to different data sources.
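As a rough illustration of "directing reads and writes to different data sources", here is a minimal sketch. The class name `ReadWriteRouter` and the two connection factories are hypothetical and not part of any library; in practice each factory would open a connection to one of the two pgbouncer endpoints that already exist (one for the master, one for the replicas).

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol


class Connection(Protocol):
    """Minimal shape of a DB-API style connection used by this sketch."""

    def close(self) -> None: ...


class ReadWriteRouter:
    """Hands out connections from two separately configured data sources.

    The caller decides whether a unit of work is a read or a write;
    nothing here inspects the SQL text.
    """

    def __init__(
        self,
        connect_primary: Callable[[], Connection],
        connect_replica: Callable[[], Connection],
    ) -> None:
        # Both factories are assumptions: e.g. lambdas wrapping your driver's
        # connect() call with the master and replica endpoints respectively.
        self._connect_primary = connect_primary
        self._connect_replica = connect_replica

    @contextmanager
    def _session(self, factory: Callable[[], Connection]) -> Iterator[Connection]:
        conn = factory()
        try:
            yield conn
        finally:
            # Always release the connection, even if the caller raised.
            conn.close()

    def writer(self):
        """Context manager yielding a connection to the master."""
        return self._session(self._connect_primary)

    def reader(self):
        """Context manager yielding a connection to a replica."""
        return self._session(self._connect_replica)
```

Application code would then write `with router.writer() as conn:` around anything that modifies data and `with router.reader() as conn:` around pure reads, so the decision is made where the intent is known.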
I don't think Pgbouncer provides any out-of-the-box way to load balance read and write queries. An alternative is to use pgpool as the connection pooler. Pgpool has a setting called load_balance_mode; when you turn it on, it tries to load balance queries, sending write queries to the master and read queries to the replicas. You can read more about load_balance_mode here
Related
I am trying to create a connection pooling system with load balancing. From what I understand, PgBouncer doesn't have a load balancing option, and all I can do is create a file with all the users and passwords and configure the DBs/clusters. But with this option I cannot direct connections to a specific cluster. I'll explain: inserts should go to the primary and selects should go to the slave. What is possible is to let user "user1" connect to the cluster on port 5432 to DB "database123".
How can I redirect queries to standby with other tools?
I tried to do this with pgpool but for some reason the standby is always on "waiting" status --> Cannot configure pgpool with master and slave nodes
It is impossible to tell from an SQL statement if it will modify data or not. What about SELECT delete_my_data();?
So all tools that try to figure that out by looking at the SQL statement are potentially problematic.
The best you can do is to write your application so that it uses two data sources: one for reading and one for writing, and you determine what goes where.
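To make the problem with statement inspection concrete, here is a small sketch of a deliberately naive classifier. The function `looks_read_only` is hypothetical and only stands in for "look at the SQL text"; it is not how pgpool or any other tool actually decides.

```python
import re

# Deliberately naive rule: a statement is "read-only" if it starts with
# SELECT or SHOW. This only looks at the text, not at what gets executed.
_READ_PREFIX = re.compile(r"^\s*(select|show)\b", re.IGNORECASE)


def looks_read_only(sql: str) -> bool:
    """Guess from the SQL text alone whether a statement only reads data."""
    return _READ_PREFIX.match(sql) is not None


statements = [
    "SELECT * FROM accounts",
    "UPDATE accounts SET balance = 0",
    "SELECT delete_my_data();",  # calls a function that may modify data
]

# Map each statement to the classifier's verdict.
verdicts = {sql: looks_read_only(sql) for sql in statements}
```

The third statement is classified as read-only and would be sent to a standby, where a function that modifies data would fail because the standby only accepts read-only queries. The text of the statement simply does not carry enough information.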
Assuming I have 2 postgres servers (1 master and 1 slave) and I'm using Patroni for high availability
1) I intend to have a three-machine etcd cluster. Is it OK to use the 2 postgres machines also for etcd + another server, or is it preferable to use machines that are not used by Postgres?
2) What are my options of directing the read request to the slave and the write requests to the master without using pgpool?
Thanks!
1) Yes, it is the best practice to run etcd on the two PostgreSQL machines.
2) The only safe way to do that is in your application. The application has to be taught to read from one database connection and write to another.
There is no safe way to distinguish a writing query from a non-writing one; consider
SELECT delete_some_rows();
The application also has to be aware that changes will not be visible immediately on the replica.
Streaming replication is of limited use when it comes to scaling...
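Since changes are not visible immediately on the replica, one common application-side workaround is to keep a session's reads on the primary for a short time after that session has written. The sketch below assumes this "pin after write" approach; the class `StickyPrimaryPolicy` and the `pin_seconds` value are hypothetical, and a fixed window is only a guess at the replication lag, not a guarantee.

```python
import time
from typing import Callable, Optional


class StickyPrimaryPolicy:
    """Chooses where a session's reads go, pinning to the primary after a write."""

    def __init__(
        self,
        pin_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # pin_seconds is an assumption about how long replay usually takes;
        # real lag can exceed it under load.
        self._pin_seconds = pin_seconds
        self._clock = clock
        self._last_write: Optional[float] = None

    def record_write(self) -> None:
        """Call this right after the session commits a write on the primary."""
        self._last_write = self._clock()

    def target_for_read(self) -> str:
        """Return 'primary' while inside the pin window, otherwise 'replica'."""
        if self._last_write is None:
            return "replica"
        if self._clock() - self._last_write < self._pin_seconds:
            return "primary"
        return "replica"
```

This also shows why replicas give limited scaling: every read that must see the latest data still lands on the primary.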
I've read all the docs on the Google Cloud SQL site, and I now understand how to create and manage Read Replicas, but I have not seen any information about how to use them.
Does Google automatically load-balance connections between all instances?
Do I have to manually connect to a specific Read Replica to avoid hitting the Master? If so, do I have to manage reconnecting on replica failure myself?
Does Google automatically load-balance connections between all instances?
No, it doesn't. Each instance is independent. You can connect to replicas and use them to read while using the master to read/write, but you need to design that logic into your application.
Do I have to manually connect to a specific Read Replica to avoid hitting the Master? If so, do I have to manage reconnecting on replica failure myself?
Yes, you have to connect to a specific read replica. Right now you can't even save and reuse the instance IP like you can do with compute engine instances (sigh, I hope they fix this soon....).
There is now a failover replica option that you can use so you don't need to connect to the read replica yourself, but it only activates on failure, it is not a load balancer.
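Because the reconnect logic has to live in the application, a sketch of it may help. The function below tries the replicas in random order and falls back to the master if none accepts a connection. The name `connect_for_read`, the `connect` callable and the host names are all hypothetical; `retry_on` should be set to whatever exception your database driver raises on a failed connection.

```python
import random
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_for_read(
    replica_hosts: Sequence[str],
    primary_host: str,
    connect: Callable[[str], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    rng: Optional[random.Random] = None,
) -> T:
    """Open a read connection, preferring replicas and falling back to the master.

    `connect` is an assumption: any callable that takes a host and returns a
    connection, e.g. a small wrapper around your driver's connect function.
    """
    rng = rng or random.Random()
    candidates = list(replica_hosts)
    # Shuffle so that clients spread themselves across the replicas.
    rng.shuffle(candidates)
    for host in candidates:
        try:
            return connect(host)
        except retry_on:
            # This replica is unreachable; try the next one.
            continue
    # No replica was reachable: read from the master rather than fail.
    return connect(primary_host)
```

This spreads new connections across replicas, but it is not a load balancer in the usual sense: it does not look at the load on each instance, and existing connections stay where they are.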
Read replica can be used by setting up ProxySQL. You can configure ProxySQL to distribute the database queries. Here is a community tutorial providing more details on architecture and configuration example.
How do I use Read Replicas?
Use them for disaster recovery or to migrate your database to
another region by promoting a read replica to become a primary
database.
https://cloud.google.com/sql/docs/postgres/replication/cross-region-replicas
Use them for separating read workloads from production workloads. This blog post covers using Read Replicas for analytics workloads:
Use Cloud SQL Read Replicas to separate your analytics and production workloads
Cloud SQL does not provide load balancing between replicas.
ref: https://cloud.google.com/sql/docs/sqlserver/replication
I have some nightly jobs that are running on EC2 and the number of machines is scaled by the number of messages in SQS. My process requires reads from a Postgres RDS database. Now these are the issues I am facing.
Not able to scale beyond a certain number because of the unavailability of connections.
I tried creating a connection pool using pgbouncer, and tried different settings as well, but the result set is missing a lot of data.
Make your postgresql RDS install multi AZ. Then you can make read replicas on demand and scale read performance with your load.
To answer the comments:
Some extra "plumbing" is required to make the connections to the read replica. Maybe route53 dynamically updated records as the scaling happens or something like haproxy
The reason I mention multi AZ is that this would help prevent downtime during an auto scaling event bringing up the read replica
It would be simpler (but more costly) to permanently bring up a read replica and use DNS round robin to share the load
See https://aws.amazon.com/blogs/aws/amazon-rds-announcing-read-replicas/ for information on read replicas
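On the client side, DNS round robin could look roughly like the sketch below: resolve one name that publishes several replica addresses and cycle through them. The helper names and the idea of a single DNS name for all replicas are assumptions for illustration; the name would be whatever record you maintain in route53.

```python
import itertools
import socket
from typing import Callable, Iterator, List


def resolve_ipv4(hostname: str, port: int) -> List[str]:
    """Return the distinct IPv4 addresses published for hostname, sorted."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


def round_robin(
    hostname: str,
    port: int,
    resolver: Callable[[str, int], List[str]] = resolve_ipv4,
) -> Iterator[str]:
    """Yield the resolved addresses in rotation, forever.

    The name is resolved once, so replicas added later are only picked up
    when this function is called again.
    """
    addresses = resolver(hostname, port)
    if not addresses:
        raise LookupError(f"no addresses found for {hostname}")
    return itertools.cycle(addresses)
```

Each worker would call `next()` on the iterator before opening a connection. A proxy such as haproxy does the same job outside the application and can also drop replicas that fail health checks, which this sketch does not do.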
I am trying to set up a very simple cluster of 2 ejabberd nodes. However, when I follow the official ejabberd documentation and use the join_cluster argument of the ejabberdctl script, I always end up with a multi-master cluster where both mnesia databases have replicated data.
Is it possible to set up an ejabberd cluster in master-slave mode? And if yes, what am I missing?
In my understanding, a slave gets the data replicated but is simply not active. The slave needs the data to be able to take over the task of the master at some point.
It seems to me that the core of the setup you describe is not about disabling replication but about not sending traffic to the slave, no?
In that case, it is just a matter of configuring your load balancing mechanism to route the traffic according to your preference.