CloudSQL Failover Data Inconsistency - google-cloud-sql

I have set high availability (multi-zone) in Cloud SQL, but is there a possibility that data inconsistency will occur when a failover occurs?

Related

Debezium: Reduce Load consumption of WalSenderMain activity

I'm currently using Debezium for our PostgreSQL databases, which run on AWS Aurora (RDS). As you know, Debezium makes use of a publication (default name: 'dbz_publication').
One thing I've noticed is that in AWS the WALSenderMain (write-ahead log) activity is constant, with a load of 1 AAS (Average Active Sessions), i.e. 100%.
Drilling down further, this appears to be caused by continuous queries against the publication table.
Note that in the configuration I've set 'publication.autocreate.mode' to 'filtered' rather than publishing all the tables.
Is this normal? Is there any way for me to fine-tune Debezium to lower the WALSenderMain activity? Thanks.

Should I create a read replica or scale the DB instance?

I'm using AWS RDS for PostgreSQL.
I recently traced an application error to the database connections being maxed out. I temporarily scaled the instance up to support more connections, since AWS's default behaviour is to calculate the maximum number of connections from the instance's memory.
Most of the connections come from clients reading data, so should I create a read replica instead of scaling the server? I'm thinking of this in terms of best practices, cost, and effort.
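To make the memory-based connection limit concrete, here is a rough sketch. It assumes the default RDS for PostgreSQL parameter group sets max_connections to LEAST({DBInstanceClassMemory/9531392}, 5000); check the actual parameter group, since this may differ by engine version. The function name is hypothetical, and DBInstanceClassMemory is somewhat lower than the nominal RAM of the instance class, so the result is an upper estimate.

```python
def estimated_max_connections(instance_memory_bytes: int,
                              bytes_per_connection: int = 9_531_392,
                              cap: int = 5000) -> int:
    """Estimate the default max_connections from instance memory.

    Assumption: the default formula is
    LEAST(DBInstanceClassMemory / 9531392, 5000).
    """
    return min(instance_memory_bytes // bytes_per_connection, cap)


if __name__ == "__main__":
    # Nominal memory sizes in GiB; real DBInstanceClassMemory is a bit lower.
    for gib in (2, 8, 64):
        print(gib, "GiB ->", estimated_max_connections(gib * 1024**3))
```

Scaling up raises this ceiling only in proportion to memory, which is why many read-only clients can exhaust it quickly.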
It depends on how high your IOPS are, i.e. whether your app is read-intensive in addition to hitting the connection limit. Read IOPS are directly tied to the EBS storage if you are using general purpose storage.
So if you are on general purpose storage, first look at your Read IOPS: if they are on average higher than STORAGE_IN_GB*3, go for a read replica.
Otherwise it is only a connection issue, which can be fixed by scaling the instance up vertically.
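The rule of thumb in this answer can be written as a small check. This is only a sketch: the function name and the sample numbers are made up, and the factor 3 presumably corresponds to the baseline of 3 IOPS per GB on General Purpose (gp2) volumes.

```python
def should_add_read_replica(avg_read_iops: float,
                            storage_gb: float,
                            iops_per_gb: float = 3.0) -> bool:
    """Return True when average read IOPS exceed STORAGE_IN_GB * 3."""
    baseline_iops = storage_gb * iops_per_gb
    return avg_read_iops > baseline_iops


if __name__ == "__main__":
    # 100 GB of storage gives a baseline of 300 IOPS.
    print(should_add_read_replica(avg_read_iops=450, storage_gb=100))
    print(should_add_read_replica(avg_read_iops=200, storage_gb=100))
```

If the check is False, the reads are not saturating storage, and the problem is only the connection count.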

High availability feature within DB2 on cloud

As per the documentation, the high availability feature in DB2 on cloud offers an additional redundant node within the same data center (availability zone) only. Why can't HA be provided at least across different AZs within the same region?
As Gilbert said, this is due to latency. The nodes are placed in the same datacenter because the HA replication is synchronous. They are kept on different power and networking pods to provide a level of isolation while still keeping them physically close.
For further physical isolation, there is the Disaster Recovery feature, where a node is added in a different datacenter altogether. This replication is asynchronous and the failovers are triggered manually by the user.

Instance and storage reliability

I'm new to Google Cloud SQL.
Usually, all cloud instances have ephemeral storage. In case of a shutdown, crash, reboot, maintenance and so on, data stored on an ephemeral disk is lost.
In other words, in the cloud world, data loss is expected.
What about Google Cloud SQL? Is my data stored on a persistent and redundant disk? What happens to my data in case of crashes, maintenance, reboots and so on?
I know that backups are needed (as always), but is data loss something to expect, as with any cloud instance? Are HA and read replicas mandatory even if my application doesn't need a 99.99% SLA? For example, if Cloud SQL fails for a few minutes and then comes back online (with all of my data), that is not an issue.
What do you think?
tl;dr: are Cloud SQL instances stored on persistent disks, or on storage with RAID or similar systems, to prevent data loss?
A Google Cloud SQL instance actually runs on a Google Compute Engine VM. That means the database is stored on a persistent disk [1]. I would expect these disks to be very safe; however, the recommendation is to create backups regularly [2] (although data loss is not expected).
On the other hand, HA is not mandatory. In fact, you have to explicitly request this feature when you create an instance [3].
It looks like you already saw the SLA of Cloud SQL, but check it in case you didn’t yet [4].
[1] https://cloud.google.com/docs/compare/data-centers/storage#block_storage
[2] https://cloud.google.com/sql/docs/mysql/backup-recovery/backups#what_backups_provide
[3] https://cloud.google.com/sql/docs/mysql/configure-ha
[4] https://cloud.google.com/sql/sla

Is Google Cloud SQL high availability really improving reliability?

I want to create a Google Cloud SQL instance but I am not sure about choosing high availability or not.
From what I understand, the failover switch is not instant and can take a few minutes, and the cost is roughly 2x the cost of a regular instance.
The failover is triggered only in case of a zone outage, not in case of DB issues. Since the monthly uptime is at least 99.95%, the maximum possible outage is about 21 minutes per month. A failover can take up to 5 minutes, and we can assume the 21 minutes of downtime would not happen in a single event. Is there therefore a real need to subscribe to High Availability?
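The 21-minute figure follows directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 30-day month (the function name is just for illustration):

```python
def max_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime allowed per month at a given uptime percentage."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 30 days = 43,200 minutes; 0.05% of that is about 21.6 minutes.
    print(f"{max_downtime_minutes(99.95):.1f} minutes")
```

So a single failover of up to 5 minutes would consume roughly a quarter of that monthly budget.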
A full zone outage is probably quite rare, so if you don't care about it, an HA instance might indeed not be needed.
One advantage of HA is that failover can be faster than restart. We've experienced cases when the primary instance gets "stuck" and a restart would take up to 30 minutes (GCP ticket). In such cases it's faster to failover to an HA instance.
(Before October 2019, HA failover instances could also be used for read queries, and thus avoid the need for an additional read replica. With the change from binlog-based replication to disk-based replication this is not the case anymore.)
HA Failover is not just for a full zone outage. It kicks in whenever the primary instance stops responding for more than a minute.
The fact that it is quicker than a restart, more reliable than a restart, and automatic means it keeps your outages much shorter when MySQL crashes.
Also, don't you need HA for the SLA to apply? Without HA you're not multi-zone, and therefore you can't meet the definition of "Downtime":
"Downtime" means (ii) with respect to Cloud SQL Second Generation for
MySQL, Cloud SQL for SQL Server, and Cloud SQL for PostgreSQL: all
connection requests to a Multi-zone Instance fail.
https://cloud.google.com/sql/sla