Does Google Cloud SQL high availability really improve reliability? - google-cloud-sql

I want to create a Google Cloud SQL instance, but I am not sure whether to choose high availability or not.
From what I understand, the failover switch can take a few minutes (it is not instantaneous), and the cost is roughly 2x that of a regular instance.
The failover is triggered only in case of a zone outage, not in case of database issues. Since the monthly uptime is at least 99.95%, that means a maximum of about 21 minutes of outage per month. A failover can take up to 5 minutes, and we can assume those 21 minutes of downtime do not happen in a single event, so is there a real need to subscribe to high availability?
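For reference, here is the back-of-the-envelope calculation behind that 21-minute figure (assuming a 30-day month):

```python
# Downtime budget implied by a 99.95% monthly uptime SLA, assuming a 30-day month.
minutes_per_month = 30 * 24 * 60                     # 43,200 minutes
allowed_downtime = minutes_per_month * (1 - 0.9995)  # 0.05% of the month
print(round(allowed_downtime, 1))                    # -> 21.6 minutes
```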

A full zone outage is probably quite rare, so if you don't care about that scenario, an HA instance might indeed not be needed.
One advantage of HA is that a failover can be faster than a restart. We've experienced cases where the primary instance got "stuck" and a restart would have taken up to 30 minutes (per a GCP support ticket). In such cases it's faster to fail over to the standby.
(Before October 2019, HA failover instances could also be used for read queries, avoiding the need for an additional read replica. With the change from binlog-based replication to disk-based replication, this is no longer the case.)

HA failover is not just for a full zone outage. It kicks in whenever the primary instance stops responding for more than a minute.
The fact that it is quicker than a restart, more reliable than a restart, and automatic means it keeps your outages much shorter when MySQL crashes.
Also, don't you need HA for the SLA to apply? Without HA you're not a multi-zone instance, and therefore you can't meet the definition of "Downtime":
"Downtime" means (ii) with respect to Cloud SQL Second Generation for
MySQL, Cloud SQL for SQL Server, and Cloud SQL for PostgreSQL: all
connection requests to a Multi-zone Instance fail.
https://cloud.google.com/sql/sla

Related

High availability feature within DB2 on cloud

As per the documentation, the high availability feature in DB2 on Cloud offers an additional redundant node within the same data center (availability zone) only. Why can't HA be provided at least across different AZs within the same region?
As Gilbert said, this is due to latency. The nodes are placed in the same datacenter because the HA replication is synchronous. They are kept on different power and networking pods to provide a level of isolation while still keeping them physically close.
For further physical isolation, there is the Disaster Recovery feature, where a node is added in a different datacenter altogether. This replication is asynchronous and the failovers are triggered manually by the user.

Google Cloud SQL: Does editing an instance's 'Connectivity' settings (PostgreSQL) cause DB downtime?

I have a PostgreSQL 11 instance in production for a web application.
Now I want to modify the connectivity settings to use a private IP and disable the public connection.
Will this change cause any downtime for our DB?
My own guesses are:
GCP only changes the network configuration of this instance, so we get zero downtime.
GCP reconfigures the host running this PostgreSQL instance, which causes some downtime for our DB, something we don't want in production.
Thanks in advance!
Given that you are using Cloud SQL for PostgreSQL, consider the following from the documentation:
For most instance settings, Cloud SQL applies the change immediately and connectivity to the instance is unaffected.
Changing the number of CPUs, memory size, or the zone of the instance results in the instance going offline for several minutes. You should plan to make these kinds of changes when your application can handle an outage of this length.
Update
As per the documentation, besides changing the number of CPUs, memory size, and zone of the instance, configuring an existing instance to use private IP, or changing the network it is connected to, causes the instance to be restarted. This causes a few minutes of downtime.
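For reference, a minimal sketch of what that change could look like through the Cloud SQL Admin API's instances.patch method, using the google-api-python-client (the project, instance, and network names are placeholders, and Application Default Credentials are assumed); the restart described above still applies:

```python
from googleapiclient import discovery

# Sketch: switch an existing Cloud SQL instance to private IP only via instances.patch.
# NOTE: applying this change restarts the instance (a few minutes of downtime).
service = discovery.build("sqladmin", "v1beta4")  # uses Application Default Credentials

body = {
    "settings": {
        "ipConfiguration": {
            "ipv4Enabled": False,  # disable the public IP
            "privateNetwork": "projects/my-project/global/networks/my-vpc",  # placeholder VPC
        }
    }
}

request = service.instances().patch(
    project="my-project",    # placeholder project ID
    instance="my-instance",  # placeholder instance name
    body=body,
)
print(request.execute())
```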

Temporarily shut down Redshift to reduce the bill

Amazon says the following on Redshift billing
"Node usage hours are billed for each hour your data warehouse cluster is running in an Available state. If you no longer wish to be charged for your data warehouse cluster, you must terminate it to avoid being billed for additional node hours."
This means that if I create a cluster, I'll be billed 24/7 whether I use it or not, because the cluster doesn't have any state like "Suspended". Is there a way to shut down the whole Redshift cluster when not in use so that I'll be billed only for the hours when I want to use it?
Edit: From Tomasz's reply, it sounds like shutting down the cluster for the weekend amounts to backing up the whole database on Friday evening and restoring it on Sunday evening. This doesn't sound good. What does Amazon really mean when they say "PAY ONLY FOR THE HOURS YOU USE"?
Can you tell me how long it will take to back up and restore a data warehouse of around 100 GB? Can I automatically associate security groups with the cluster after restoring, from Java code?
You can create a manual snapshot of the cluster when you have finished work and then delete the cluster.
You will pay for S3 storage, but that is much less than for a running Redshift cluster.
The next day, just restore the cluster from the latest snapshot. You will have to add security groups to the new cluster, probably with the Java API:
The new cluster will be associated only with the default security and
parameter groups. If the original cluster was associated with any
other security or parameter group, you will need to manually associate
those groups with the new cluster.
The easiest way to create a snapshot is from the console, but you will probably want to do it automatically using the CLI or the Java SDK.
Creating a snapshot of a 3-node cluster filled to about 80% took me roughly 5 minutes (it's so quick because snapshots are incremental). 100 GB is much less than my setup, so it should be even faster. The restore shouldn't take long either.
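The question asks about Java, but as a rough illustration of the same snapshot / delete / restore cycle (shown here in Python with boto3; the cluster, snapshot, and security group identifiers are placeholders), it could look something like this:

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

CLUSTER_ID = "my-warehouse"          # placeholder cluster identifier
SNAPSHOT_ID = "my-warehouse-friday"  # placeholder snapshot identifier

# Friday evening: snapshot the cluster, then delete it (skipping the automatic
# final snapshot, since we just took one manually).
redshift.create_cluster_snapshot(
    SnapshotIdentifier=SNAPSHOT_ID,
    ClusterIdentifier=CLUSTER_ID,
)
redshift.get_waiter("snapshot_available").wait(SnapshotIdentifier=SNAPSHOT_ID)
redshift.delete_cluster(
    ClusterIdentifier=CLUSTER_ID,
    SkipFinalClusterSnapshot=True,
)

# Sunday evening: restore from the snapshot, then re-attach the security groups,
# because the restored cluster only gets the default ones.
redshift.restore_from_cluster_snapshot(
    ClusterIdentifier=CLUSTER_ID,
    SnapshotIdentifier=SNAPSHOT_ID,
)
redshift.get_waiter("cluster_available").wait(ClusterIdentifier=CLUSTER_ID)
redshift.modify_cluster(
    ClusterIdentifier=CLUSTER_ID,
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
)
```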
UPDATE: A lot has changed in the intervening years; in particular, restore from snapshot is now quite fast. Your cluster becomes available in a few minutes, and you can run queries while the restore continues in the background. The total time for a complete restore of 100 GB would now be measured in minutes (varying by node type and count).
What does Amazon really mean when they say "PAY ONLY FOR THE HOURS YOU USE"?
You pay for a full hour for any partial hour used.
Can you tell me how much time will it take to backup/restore a data warehouse of size around 100GB?
Snapshots are incremental, which is what makes them fast (as Tomasz mentioned). Shutting down a cluster is fairly quick, about half an hour. However, restoring from a snapshot is very slow; I'd estimate around 3 hours to restore 100 GB.
If you really want to be able to bring a database cluster up and down quickly, you might be better off using another analytic DB (e.g. the free editions of Greenplum or Vertica) with the data stored on EBS volumes. It'd be a lot more work to manage, though; that's the tradeoff.
You can now pause and resume a Redshift cluster (from both the Console and the CLI).
Check out the announcement:
https://aws.amazon.com/blogs/big-data/lower-your-costs-with-the-new-pause-and-resume-actions-on-amazon-redshift/
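For example, a minimal boto3 sketch of the pause/resume actions (the cluster identifier is a placeholder):

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Pause the cluster when you stop work; compute charges stop, storage is still billed.
redshift.pause_cluster(ClusterIdentifier="my-warehouse")

# Resume it when you need it again.
redshift.resume_cluster(ClusterIdentifier="my-warehouse")
```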

Does MongoDB require at least 2 server instances to prevent the loss of data?

I have decided to start developing a little web application in my spare time so I can learn about MongoDB. I was planning to get an Amazon AWS micro instance and start the development and the alpha stage there. However, I stumbled across a question here on Stack Overflow that concerned me:
But for durability, you need to use at least 2 mongodb server
instances as master/slave. Otherwise you can lose the last minute of
your data.
Is that true? Can't I just have my box with everything installed on it (Apache, PHP, MongoDB) and rely on the data being correctly stored? At least, there must be a config option in MongoDB to make it behave reliably even if installed on a single box - isn't there?
The information you have on master/slave setups is outdated. A single-server MongoDB instance running with journaling enabled is a durable data store, so for use cases where you don't need replica sets, or while you're still in the development stage, journaling will work well.
However if you're in production, we recommend using replica sets. For the bare minimum set up, you would ideally run three (or more) instances of mongod, a 'primary' which receives reads and writes, a 'secondary' to which the writes from the primary are replicated, and an arbiter, a single instance of mongod that allows a vote to take place should the primary become unavailable. This 'automatic failover' means that, should your primary be unable to receive writes from your application at a given time, the secondary will become the primary and take over receiving data from your app.
You can read more about journaling here and replication here, and you should definitely familiarize yourself with the documentation in general in order to get a better sense of what MongoDB is all about.
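As a small illustration (a PyMongo sketch; the connection string and database name are placeholders), you can ask a single journaled server to acknowledge a write only once it has been written to the journal:

```python
from pymongo import MongoClient

# Sketch: on a single mongod with journaling enabled, request journaled
# acknowledgement so a write is only confirmed once it is in the journal.
client = MongoClient("mongodb://localhost:27017", w=1, journal=True)

db = client.myapp                       # placeholder database name
db.events.insert_one({"msg": "hello"})  # acknowledged only after the journal write
```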
Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
In some cases, you can use replication to increase read capacity. Clients have the ability to send read and write operations to different servers. You can also maintain copies in different data centers to increase the locality and availability of data for distributed applications.
Replication in MongoDB
A replica set is a group of mongod instances that host the same data set. One mongod, the primary, receives all write operations. All other instances, secondaries, apply operations from the primary so that they have the same data set.
The primary accepts all write operations from clients. A replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency. To support replication, the primary logs all changes to its data sets in its oplog. See primary for more information.
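As a small illustration (a PyMongo sketch; the hosts and replica set name are placeholders), an application connects to the replica set as a whole, and the driver follows whichever member is currently primary:

```python
from pymongo import MongoClient, ReadPreference

# Sketch: connect to a replica set rather than a single server. The driver
# discovers the members, sends writes to the primary, and re-routes
# automatically if a failover elects a new primary.
client = MongoClient(
    "mongodb://db1.example.com:27017,db2.example.com:27017",
    replicaSet="rs0",  # placeholder replica set name
)

db = client.myapp
db.events.insert_one({"msg": "hello"})  # always goes to the primary

# Reads can optionally be spread to secondaries to increase read capacity.
events_ro = db.get_collection(
    "events", read_preference=ReadPreference.SECONDARY_PREFERRED
)
print(events_ro.count_documents({}))
```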

Wait for transactional replication in ADO.NET or TSQL

My web app uses ADO.NET against SQL Server 2008. Database writes happen against a primary (publisher) database, but reads are load balanced across the primary and a secondary (subscriber) database. We use SQL Server's built-in transactional replication to keep the secondary up-to-date. Most of the time, the couple of seconds of latency is not a problem.
However, I do have a case where I'd like to block until the transaction is committed at the secondary site. Blocking for a few seconds is OK, but returning a stale page to the user is not. Is there any way in ADO.NET or TSQL to specify that I want to wait for the replication to complete? Or can I, from the publisher, check the replication status of the transaction without manually connecting to the secondary server.
[edit]
99.9% of the time, the data in the subscriber is "fresh enough". But there is one operation that invalidates it, and I can't read from the publisher every time on the off chance that it has become invalid. If I can't solve this problem with transactional replication, can you suggest an alternate architecture?
There's no such solution for SQL Server, but here's how I've worked around it in other environments.
Use three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of subscribers. No writes go here, only reads. Used for the vast majority of OLTP reads.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, especially in the case you're experiencing.
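In ADO.NET this is just three connection strings in configuration plus a small routing helper. A rough sketch of the idea (shown in Python with pyodbc for brevity; server and database names are placeholders):

```python
import pyodbc

# Sketch: three connection strings, picked per query based on how fresh the
# data needs to be. Server and database names are placeholders.
CONN_STRINGS = {
    "realtime":  "DRIVER={ODBC Driver 17 for SQL Server};SERVER=publisher;DATABASE=app;Trusted_Connection=yes",
    "near":      "DRIVER={ODBC Driver 17 for SQL Server};SERVER=subscriber-pool;DATABASE=app;Trusted_Connection=yes",
    "reporting": "DRIVER={ODBC Driver 17 for SQL Server};SERVER=reporting-pool;DATABASE=app;Trusted_Connection=yes",
}

def run_query(sql, params=(), freshness="near"):
    """Route a read: must-be-fresh reads -> 'realtime' (publisher),
    ordinary OLTP reads -> 'near', heavy reports -> 'reporting'."""
    with pyodbc.connect(CONN_STRINGS[freshness]) as conn:
        cur = conn.cursor()
        if params:
            cur.execute(sql, params)
        else:
            cur.execute(sql)
        return cur.fetchall()

# The page that must never be stale reads from the publisher:
rows = run_query("SELECT * FROM Orders WHERE OrderId = ?", (42,), freshness="realtime")
```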
You are describing a synchronous mirroring situation. Replication cannot, by definition, support your requirement. Replication must wait for a transaction to commit before reading it from the log and delivering it to the distributor and from there to the subscriber, which means replication by definition has a window of opportunity for data to be out of sync.
If you have a requirement that an operation read the authoritative copy of the data, then you should make that decision in the client and ensure you read from the publisher in that case.
While you can, in theory, validate whether a certain transaction was distributed to the subscriber or not, you should not base your design on it. Transactional replication makes no latency guarantee, by design, so you cannot rely on a 'perfect day' operation mode.