I've read all the docs on the Google Cloud SQL site, and I now understand how to created and manage Read Replicas, but I have not seen any information about how to use them,
Does Google automatically load-balance connections between all instances?
Do I have to manually connect to a specific Read Replica to avoid hitting the Master? If so, do I have to manage reconnecting on replica failure myself?
Does Google automatically load-balance connections between all instances?
No, it doesn't. Each instance is independent. You can connect to replicas and use them to read while using the master to read/write, but you need to design that logic into your application
Do I have to manually connect to a specific Read Replica to avoid hitting the Master? If so, do I have to manage reconnecting on replica failure myself?
Yes, you have to connect to a specific read replica. Right now you can't even save and reuse the instance IP like you can do with compute engine instances (sigh, I hope they fix this soon....).
There is now a failover replica option that you can use so you don't need to connect to the read replica yourself, but it only activates on failure, it is not a load balancer.
Read replica can be used by setting up ProxySQL. You can configure ProxySQL to distribute the database queries. Here is a community tutorial providing more details on architecture and configuration example.
How do I use Read Replicas?
Use them for disaster recovery or to migrate your database to
another region by promoting a read replica to become a primary
database.
https://cloud.google.com/sql/docs/postgres/replication/cross-region-replicas
Use them for separating read workloads from production workloads. This blog post covers using Read Replicas for analytics workloads:
Use Cloud SQL Read Replicas to separate your analytics and production workloads
Cloud SQL does not provide load balancing between replicas1
ref:https://cloud.google.com/sql/docs/sqlserver/replication
Related
I have installed Postgres cluster using zalando operator.
Also enabled pgbouncer for replicas and master.
But I would like to combine or load balance replicase and master connections,
So that read requests can be routed to read replicas and write requests can be routed to master.
Can anyone help me out in achieving this.
Thanks in advance.
Tried enabling pgbouncer.
pgbouncer is getting enabled either to master or to slave.
But I need a single point where it can route read requests to slaves and write requests to master.
There is no safe way to distinguish reading and writing statements in PostgreSQL. pgPool tries to do that, but I think any such solution is flaky. You will have to teach your application to direct reads and writes to different data sources.
I don't think Pgbouncer provides any out of the box way to load balance read and write queries. An alternative to that is the use of pgpool as a connection pooler. Pgpool provides a mode known as load_balance_mode which you can turn it on and it will try to load balance queries and send write queries to master and read queries to replica. You can read more about the load_balance_mode here
I am having a replica lag issue with documentDB. Where I am trying to write some data from a collection and read the same at the same time. But because I am using a distributed system, I am not able to read the already written data from the replica sets.
Here's the cluster design.
.
So, is it possible to read from the primary instance in nodejs or is it possible to read from a specific instance?
How big is the replication lag? It might be worth investigating the cause for the lag, maybe bigger instances are needed or queries have to be optimized.
If your application can't tolerate eventual consistency or read after write consistency is required, then use readPreference: primaryPreferred to instruct the driver to read from the Primary instance when available. However, in this case, the replicas will not be used to scale horizontally the read traffic.
Amazon DocumentDB has other endpoints too:
reader endpoint - points to replica instances, it's found in the configuration section of the cluster (console or aws cli describe-db-clusters command)
instance endpoint - each instance has its own endpoint, it's found in the instances section (console or aws cli describe-db-instances command)
The best practice is to connect as replica set, using the readPreference parameter to adjust the preference. Instance endpoints can be useful when, for example, there's a need for large analytics queries and a bigger instance is deployed, temporarily, to run them.
I've been configuring pods in Kubernetes to hold a mongodb and golang image each with a service to load-balance. The major issue I am facing is data replication between databases. Replication controllers/replicasets do not seem to do what the name implies, but rather is a blank-slate copy instead of a replica of existing/currently running pods. I cannot seem to find any examples or clear answers on how Kubernetes addresses this, or does it even?
For example, data insertions being sent by the Go program are going to automatically load balance to one of X replicated instances of mongodb by the service. This poses problems since they will all be maintaining separate documents without any relation to one another once Kubernetes begins to balance the connections among other pods. Is there a way to address this in Kubernetes, or does it require a complete re-write of the Go code to expect data replication among numerous available databases?
Sorry, I'm relatively new to Kubernetes and couldn't seem to find much information regarding this.
You're right, a replica set is not a replica of another container, it's just a container with the same configuration spun up within the same logical unit.
A replica set (or deployment, which is the resource you should be using now) will have multiple pods, and it's up to you, the operator, to configure the mongodb part.
I would recommend reading this example of how to set up a replica set with multiple mongodb containers:
https://medium.com/google-cloud/mongodb-replica-sets-with-kubernetes-d96606bd9474#.e8y706grr
I have some nightly jobs that are running on EC2 and the number of machines is scaled by the number of messages in SQS. My process requires reads from a Postgres RDS database. Now these are the issues I am facing.
Not able to scale beyond a certain number because of the unavailability of connections.
I tried creating a connection pool using pgbouncer, and tried with different settings as well, but it's missing a lot of data on the resultant set.
Make your postgresql RDS install multi AZ. Then you can make read replicas on demand and scale read performance with your load.
To answer the comments:
Some extra "plumbing" is required to make the connections to the read replica. Maybe route53 dynamically updated records as the scaling happens or something like haproxy
The reason I mention multi AZ is that this would help prevent downtime during an auto scaling event bringing up the read replica
It would be simpler (but more costly) to permanently bring up a read replica and use DNS round robin to share the load
See https://aws.amazon.com/blogs/aws/amazon-rds-announcing-read-replicas/ for information on read replicas
I am trying to setup a very simple cluster of 2 ejabberd nodes. However, while trying to go through the official ejabberd documentation and using the join_cluster argument that comes along with the ejabberdctl script, I always end up with a multi-master cluster where both the mnesia databases have replicated data.
Is it possible to set up a ejabberd cluster in master-slave mode? And if yes, then what I am I missing?
In my understanding, a slave get the data replicated but would simply not be active. The slave needs the data to be able to take over the task of the master at some point.
It seems to means that the core of the setup you describe is not about disabling replication but about not sending traffic to the slave, no ?
In that case, this is just a matter of configuring your load balancing mechanism to route the traffic accordingly to your preference..