I am looking at setting up either an Aurora PostgreSQL or an RDS PostgreSQL instance in AWS.
I would like the DB instance to run in 2 different regions with real-time replication between them. I would also like no downtime for rehydration, patching, etc.
Based on what I have read and discussed with colleagues so far, I am under the impression that Aurora PostgreSQL is the option to choose, because RDS needs a few minutes of downtime for rehydration, while Aurora supports real-time replication of the DB instance across regions.
Is my understanding correct and are there any other factors that I should be aware of?
No RDS product supports "real-time" replication across regions. Cross-region replication is always asynchronous.
You can expect to see a higher level of lag time for any Read Replica that is in a different AWS Region than the source instance, due to the longer network channels between regional data centers.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.XRgn
Additionally, cross-region replicas for Aurora/Postgres are not yet available.
Cross-region replicas are only available for Aurora/MySQL... but a cross-region replica is not for zero downtime or failover anyway -- it's only for geo/latency-based read scale-out or disaster recovery, because replication is one-way: once you promote the replica, the original master has to be abandoned.
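To make that replicate-then-promote DR flow concrete, here's a minimal boto3 sketch for an engine that does support cross-region read replicas; all identifiers and the source ARN are made up:

    # Hypothetical sketch: create a cross-region read replica, then
    # promote it for disaster recovery. All identifiers are made up.
    import boto3

    # Client in the destination region, where the replica will live.
    rds = boto3.client("rds", region_name="eu-west-1")

    # When the source is in another region, it must be referenced by ARN.
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="mydb-replica-eu",
        SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:123456789012:db:mydb",
    )

    # Promotion is one-way: after this, replication stops and the old
    # master has to be abandoned.
    rds.promote_read_replica(DBInstanceIdentifier="mydb-replica-eu")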
If, when you said "region," you were actually referring to availability zones, then that is much more straightforward, since the backing store of Aurora instances is replicated across 3 availability zones within the region, and replication is synchronous. All replicas in a single region can be synchronous, even in different AZs, since they all share the same replicated storage.
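If availability zones are indeed what you meant, here's a hedged sketch of adding an Aurora reader in a different AZ with boto3; the cluster and instance names are made up, and the new reader attaches to the cluster's shared storage rather than copying data:

    # Hypothetical sketch: add an Aurora PostgreSQL reader in another AZ.
    # It shares the cluster's replicated storage volume, which is what
    # keeps replication within the region synchronous.
    import boto3

    rds = boto3.client("rds", region_name="us-east-1")
    rds.create_db_instance(
        DBInstanceIdentifier="mydb-reader-1",    # made-up name
        DBClusterIdentifier="mydb-cluster",      # existing Aurora cluster
        Engine="aurora-postgresql",
        DBInstanceClass="db.r5.large",
        AvailabilityZone="us-east-1b",           # different AZ from the writer
    )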
I have some experience with AWS RDS MySQL multi-AZ (HA). I'm looking at GCP Cloud SQL Postgres HA for a new project.
I'm trying to figure how certain maintenance operations work but can't figure it out from the Cloud SQL docs.
How much unavailability does a failover cause?
How much unavailability does a CPU/memory upgrade cause?
After a failover, is it important to eventually "failback" to the original primary instance? Or can I leave it running on the standby instance indefinitely? (The Cloud SQL HA failover diagram makes it seem like the two instances aren't totally symmetric.)
Just FYI, the answers for AWS RDS:
Failover: usually under 70 seconds of unavailability before my application is able to issue queries again.
This is for planned failovers. (For unplanned failovers, it may take a little longer for RDS to detect that the primary instance is unresponsive before it actually initiates the failover.)
A lot of the failover lag is likely due to DNS. Using the AWS RDS Proxy service may reduce that time (they claim by ~80%). The Cloud SQL HA failover diagram shows both instances sharing a virtual IP, which might mean no DNS lag?
CPU/memory upgrade: I think AWS can accomplish this with a single failover's worth of unavailability. It upgrades the standby instance (no unavailability), performs a failover, then upgrades the other instance.
On RDS, I think the two instances that are part of the HA setup are symmetric. So if you fail over to the standby, it's fine to leave it that way; there's no need (as far as RDS is concerned) to fail over back to the original.
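If you want to measure the failover downtime yourself, here's a rough sketch of such a test: force a Multi-AZ failover with boto3 and time how long connections fail. The endpoint, credentials, and instance name are made up, and it's shown with psycopg2 since your new project is Postgres; any driver works the same way. Only run this against a test instance.

    # Hypothetical sketch: trigger a planned Multi-AZ failover and time
    # the outage from the client's point of view.
    import time
    import boto3
    import psycopg2

    boto3.client("rds").reboot_db_instance(
        DBInstanceIdentifier="mydb",
        ForceFailover=True,   # reboot onto the standby, not in place
    )

    start = time.time()
    while True:
        try:
            psycopg2.connect(
                host="mydb.abc123.us-east-1.rds.amazonaws.com",
                dbname="postgres", user="app", password="...",
                connect_timeout=2,
            ).close()
            break
        except psycopg2.OperationalError:
            time.sleep(1)

    print(f"downtime: {time.time() - start:.0f}s")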
To answer your following questions:
As you mentioned, the duration of the unavailability varies depending on whether it is a planned (manual) failover or an unplanned one. It's best to test by manually initiating a failover so you can see how long your instance takes to recover; usually it takes a minute or so (a scripted version of this test is sketched below). For unplanned failovers, the docs state that when failover occurs, any existing connections to the primary instance and read replicas are closed, and it takes approximately 2-3 minutes for connections to be reestablished.
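For scripting that planned failover instead of clicking through the console, the Cloud SQL Admin API has an instances.failover method; here's a hedged sketch with the Google API discovery client. The project and instance names are made up, and the API surface is worth double-checking against the current docs:

    # Hypothetical sketch: trigger a planned Cloud SQL failover so you
    # can measure your application's downtime. Uses application default
    # credentials.
    from googleapiclient import discovery

    service = discovery.build("sqladmin", "v1")
    service.instances().failover(
        project="my-project",
        instance="my-ha-instance",
        body={},
    ).execute()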
To address this question, you need to understand the requirements for your instance to allow failover:
The primary instance must be in a normal operating state (not stopped, undergoing maintenance, or performing a long-running Cloud SQL instance operation such as a backup, import or export operation).
That means failover is not available while you are upgrading your instance: changing your hardware specs (CPU/memory) will incur downtime, so you should plan ahead when making these changes.
To understand the importance of failback, here's an excerpt from this link:
High availability solutions continuously replicate data to a remote site or cloud. In the event that a primary system goes down, the remote, secondary system can be spun up and users are rerouted. This process is commonly referred to as “failover,” and it reduces downtime to seconds or minutes.
However, failover isn’t a permanent state. Once primary servers are up and running, data and applications must be restored so normal operations can resume. This process is known as failback, and it is very important from a DR testing standpoint. Here’s why: Not all replication technology is created equally when it comes to failback. In some cases, failing back to production servers can be painfully slow.
UPDATE 1:
HA on Cloud SQL provisions your standby instance with the same specs as your primary; that's why you'll get billed double the price of a non-HA instance. Also, the importance of failback is not limited to any one cloud provider. It is simply good practice to make sure that all operations return to your primary instance instead of leaving them on a standby instance. In that case, failback (on Cloud SQL specifically) is really necessary to make sure that everything is back to normal after an outage.
UPDATE 2:
If you don't fail back, then when there's an outage in the zone where your standby instance is running (you can't control which zone your standby comes from), you won't be able to fail over, as the operation will be blocked. (See the docs.)
Unfortunately, there's pretty much no way around it, as downtime is required whenever you change hardware: the procedure requires the instance to restart. Here's a link to see how long it would take.
Additional resources: https://severalnines.com/database-blog/achieving-mysql-failover-failback-google-cloud-platform-gcp
I have a PostgreSQL DB on Amazon RDS. I need replication available in a different AWS Region for high availability. I read the Postgres docs here. However, I'm not sure if the replication slots are also replicated (along with the LSNs).
Can someone shed some light on this? Also, if the replication slots are not duplicated on the RDS replica (in the different region), how do I manage a region failure?
In PostgreSQL, replication slots are not replicated. You can, however, create replication slots on standby servers, if you want to use cascading replication.
There is no need to replicate a replication slot: a slot only exists to keep the server it lives on from removing WAL that its consumer still needs, so after promoting a replica you simply create new slots on the promoted server for its own downstream standbys.
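A minimal psycopg2 sketch of creating such a slot on a standby for a cascading replica; the connection details and slot name are made up:

    # Hypothetical sketch: create a physical replication slot on a
    # standby so a cascading replica downstream of it never loses WAL.
    import psycopg2

    conn = psycopg2.connect(host="standby.example.com", dbname="postgres",
                            user="replication_admin", password="...")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("SELECT pg_create_physical_replication_slot('cascade_slot')")
        # Slots are visible (and meaningful) only on this server.
        cur.execute("SELECT slot_name, active, restart_lsn FROM pg_replication_slots")
        print(cur.fetchall())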
I have a few questions about MongoDB standalone instances and replica sets; I don't really get it.
When should I use either of them?
Why all the replica sets tutorials show 3 connections, is there a reason?
Can I create a replica set for 1 instance only? And in that case, how is it different from a standalone MongoDB instance?
How to Migrate data from a standalone instance to replica sets?
I'm asking all these questions because I was recently trying to implement transactions, and sessions can only be started on replica sets. I don't really get the difference at all.
When should I use either of them?
Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability with multiple copies of data on different database servers. Replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
To keep your data safe
High (24*7) availability of data
Disaster recovery
No downtime for maintenance (like backups, index rebuilds, compaction)
Read scaling (extra copies to read from; see the sketch after this list)
A replica set is transparent to the application
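As a small illustration of the read-scaling and transparency points, here's a PyMongo sketch; the host names and replica set name are made up:

    # Hypothetical sketch: the driver routes reads to secondaries when
    # available and writes to the primary; the application code doesn't
    # need to know which node is which.
    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://node1:27017,node2:27017,node3:27017/",
        replicaSet="rs0",
        readPreference="secondaryPreferred",
    )
    count = client.mydb.mycoll.count_documents({})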
Why all the replica sets tutorials show 3 connections, is there a reason?
The basic implementation that takes full advantage of replication has at least one primary node with two secondary nodes, which is why the examples always use 3 nodes. Beyond that, if the primary out of the 3 goes down, the remaining 2 members hold an election among themselves, leaving you one primary and one secondary for continued high availability.
Can I create a replica set for 1 instance only? and in that case how is it different than the standalone mongodb instance?
It does not make sense to have a single instance with MongoDB replication.
How to Migrate data from a standalone instance to replica sets?
Convert a Standalone to a Replica Set. Once the conversion is done and the new members are up and running, your existing data is copied to them automatically via initial sync; a sketch of the steps follows.
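A hedged sketch of the conversion with PyMongo, assuming each mongod has already been restarted with --replSet rs0; the host names are made up:

    # Hypothetical sketch: initiate the set from the former standalone;
    # its existing data reaches the new members via initial sync.
    from pymongo import MongoClient

    # directConnection is needed in PyMongo 4+ to talk to a single node
    # before the set exists.
    client = MongoClient("node1", 27017, directConnection=True)
    client.admin.command("replSetInitiate", {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "node1:27017"},
            {"_id": 1, "host": "node2:27017"},
            {"_id": 2, "host": "node3:27017"},
        ],
    })
    print(client.admin.command("replSetGetStatus")["myState"])

Once the set is initiated, replica-set-only features such as multi-document transactions (sessions started with client.start_session()) become available, which was your original motivation.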
Is there some kind of native Postgres tool they use, or is it a custom one? Are the replicas always in sync or do they drift apart from time to time?
With Multi-AZ RDS, replication is synchronous. And since AWS likes to be in full control of its software, it's most likely a customised replication mechanism (but I couldn't tell you for sure).
We have a MongoDB sharded cluster currently deployed on EC2 instances in Amazon. The shards are also replica sets. The instances use EBS with provisioned IOPS.
We have about 30 million documents in a collection. Our queries count all the documents in the collection that match the filters, and we have indexes on almost all of the queryable fields. This drives RAM to 100% usage; our working set exceeds the size of the RAM. We think the slow response of our queries is caused by EBS being slow, so we are thinking of migrating to the new SSD-backed instances.
C3 is available
http://aws.typepad.com/aws/2013/11/a-generation-of-ec2-instances-for-compute-intensive-workloads.html
I2 is coming soon
http://aws.typepad.com/aws/2013/11/coming-soon-the-i2-instance-type-high-io-performance-via-ssd.html
Our only concern is that SSD storage is ephemeral, meaning the data will be gone once the instance stops, terminates, or fails. How can we address this? How do we automate backups? Is it a good idea to migrate to SSD to improve the performance of our queries? Do we still need to set up a sharded cluster?
Working with ephemeral disks is a risk, but if you have your replication set up correctly it shouldn't be a huge concern. I'm assuming you've set up a three-node replica set, correct? And that you have three nodes for your config servers?
I can speak of this from experience, as the company I'm at is set up this way. To help mitigate risk I'm moving towards a backup strategy that involves a hidden replica. With this setup I can shut down the hidden replica and one of the config servers (after first stopping the balancer) and take a complete copy of the data files (replica and config server) to get a valid backup. If AWS went down in my availability zone, I'd still have a daily backup available on S3 to restore from.
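A hedged sketch of what adding such a hidden member looks like with PyMongo on recent MongoDB versions; the host names are made up, and older versions manage the config slightly differently:

    # Hypothetical sketch: add a hidden, priority-0 member that
    # replicates everything but never serves clients or gets elected,
    # then quiesce it for a file-level backup.
    from pymongo import MongoClient

    client = MongoClient("mongodb://node1:27017/?replicaSet=rs0")
    config = client.admin.command("replSetGetConfig")["config"]
    config["version"] += 1
    config["members"].append({
        "_id": 3,
        "host": "backup-node:27017",
        "priority": 0,   # can never become primary
        "hidden": True,  # invisible to client reads
    })
    client.admin.command("replSetReconfig", config)

    # On the hidden member: block writes, copy the data files to S3,
    # then unlock.
    hidden = MongoClient("backup-node", 27017, directConnection=True)
    hidden.admin.command("fsync", lock=True)
    # ... snapshot the dbpath here ...
    hidden.admin.command("fsyncUnlock")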
Hope this helps.