Severe delays in cloud SQL responses - google-cloud-sql

In the past 4-5 hours there have been tens of simple read queries that took 40-70 seconds to return results from the Cloud SQL DB. Usually they take 50 ms or so. Is there some ongoing issue? I can provide DB IDs and specific times if needed.
Thanks.

Between 11.00PST and 11.30PST there was an issue that interrupted many Cloud SQL instances. The problem should now be resolved.
We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority for the Google Cloud Platform, and we are making continuous improvements to make our systems better.
To be kept informed of other Google Cloud SQL issues and launches, please join google-cloud-sql-announce@googlegroups.com
https://groups.google.com/forum/#!forum/google-cloud-sql-announce
(from Joe Faith in another thread)

Related

Cloud SQL MySQL - Stuck in failover operation in progress

My Cloud SQL MySQL 5.7.37 highly available instance is stuck in a "Failover operation in progress. This may take a few minutes. While this operation is running, you may continue to view information about the instance" state. It is a fairly small database and it has been stuck like this for 5 hours. The failover is not available, so no DB queries can be executed and our system is currently down.
No commands can be executed on the DB since it is in an updating state, the error log is empty, and the operations log only contains this update and successful backups.
Does anyone have any suggestions? I am not paying for Google Support so I can't get support directly from them (which I think is terrible since this is a fully managed service).
Best,
Carl-Fredrik

Does the performance of Firestore transactions change significantly when those transactions are dispatched from within Google Cloud?

At the moment, I have many Firebase functions whose performance I'm not happy with when testing locally. The slowest part of these functions are the Firestore transactions, even though they are only relatively short series of gets and sets. Atomizing these series as single transactions and/or batching has not improved the performance sufficiently.
So, before I try any other strategies, I wanted to do some research on the relative performance of transacting with Firestore when calling from within a Cloud Function, i.e., from within Google Cloud, and from without. I haven't found anything that quite answers my question yet. Any recommendations or answers?
Transactions involve a network round trip between the client machine running the SDK code that executes the transaction and the Google Cloud backend that hosts the Firestore data. It stands to reason that reducing the network latency between those machines will reduce the time it takes to execute the transaction. The only way to know for sure if there is an improvement in your particular case is to perform some benchmarking.
Your best case scenario likely involves the client and backend in the same Google Cloud region. So, if you are using a Cloud Function in the same region as your Firestore instance, then that should, in theory, be the best option. Again, only benchmarking will tell how much of an improvement that will be, if any. Whether or not that is "significant" is dependent on your benchmark observations (and your expectations of what is "significant").
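The benchmarking suggested above can be sketched roughly as follows. This is a minimal timing harness, not Firestore-specific code: `benchmark` and `fake_transaction` are hypothetical names, and `fake_transaction` is only a placeholder for whatever function actually runs your transaction. The idea is to run the same harness once locally and once from a Cloud Function in the same region, then compare the numbers.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(operation: Callable[[], object], runs: int = 50, warmup: int = 5) -> Dict[str, float]:
    """Call `operation` repeatedly and summarise its latency in milliseconds."""
    if runs < 2:
        raise ValueError("runs must be at least 2")

    # Warm-up calls are not timed, so connection setup does not skew the result.
    for _ in range(warmup):
        operation()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Approximate 95th percentile: index into the sorted samples.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_transaction() -> None:
    # Hypothetical stand-in: replace with the code that runs your real transaction.
    time.sleep(0.002)


if __name__ == "__main__":
    print(benchmark(fake_transaction, runs=20, warmup=2))
```

Comparing the median and the tail (p95, max) from both environments gives a more honest picture than a single timed call, since individual round trips can vary a lot.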

Google Cloud SQL Postgres - randomly slow queries from Google Compute / Kubernetes

I've been testing Google Cloud SQL with PostgreSQL, but I have random queries taking ~3s instead of a few ms.
Troubleshooting I did:
The queries themselves aren't problems, rerunning the same query will work.
Indexes are properly set. The database is also very small, so this shouldn't happen even without any indexes.
The Kubernetes container is connecting to the database through SQL Proxy (I followed this https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine). It is not the problem though as I tried to connect directly to the database, with the same issue.
I configured net.ipv4.tcp_keepalive_time to 60 to make sure the connections weren't dropping.
I also have a pool of connections that are never disconnected, to make sure it wasn't from that.
When I run queries directly through my local PostgreSQL client, I never have the problem.
I don't have this issue when developing locally either and connecting to my local database.
What I'm getting at is: I feel there's some weird connection/link issue between my Google Compute instances and my Google SQL instance that I can't seem to figure out.
Any idea?
Edit:
I also noticed these logs in my Cloud SQL instance every 30s:
ERROR: recovery is not in progress
HINT: Recovery control functions can only be executed during recovery.
STATEMENT: SELECT pg_is_xlog_replay_paused(), current_timestamp
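A note on the keepalive item in the troubleshooting list above: on Linux, net.ipv4.tcp_keepalive_time only affects sockets that have SO_KEEPALIVE enabled, and a per-socket TCP_KEEPIDLE value overrides it. The sketch below shows the per-socket equivalent using Python's standard socket module. `enable_keepalive` and its default values are hypothetical, chosen only to mirror the 60-second setting in the question. Real database drivers usually manage their own sockets, and libpq-based clients expose comparable connection parameters (keepalives, keepalives_idle), so treat this as an illustration of the mechanism rather than a fix.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive for one socket and tune it where the platform allows."""
    # Without SO_KEEPALIVE the kernel never sends probes, whatever the sysctl says.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning options below are not available on every platform, hence the guards.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe (per-socket tcp_keepalive_time).
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Number of unanswered probes before the connection is considered dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        enable_keepalive(s)
        print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    finally:
        s.close()
```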
That's an interesting problem you are facing. So my knowledge on Kubernetes isn't that great, but I do have a general understanding so let's see if I can provide some suggestions.
To start with, the API that you linked to in your question does mention that it is still in beta, so I believe there may still be performance issues left to patch.
Secondly, from what I understand, Kubernetes is a great tool for handling stateless workloads. Thus, handling data where state is required for queries would be a slow operation. This article (although not entirely related) does explain some of the pitfalls of Kubernetes (not all the questions are relevant)
Thirdly, could you explain your use case a little bit? Do you really need to use Kubernetes, or would another tool like a powerful Compute Engine instance or a Dataflow job resolve the issue? Are you making your database queries through a programming language or an application call?
Thanks, and do let me know!

Google Cloud SQL very slow from time to time

It's been almost 3 months since I switched my platform to Google Cloud (Compute Engine + Cloud SQL + Cloud Storage).
I am very happy with it, but from time to time I notice big latency on the Cloud SQL server. My Compute Engine VMs and my Cloud SQL instance are all in the same location (us-1 datacenter).
Since my Java backend makes a lot of SQL queries to generate a server response, the response times may vary from 250-300ms (normal) up to 2s!
In the console, I notice absolutely nothing: no CPU peaks, no read/write peaks, no backup running, nothing. No alert. Last time it happened, it lasted for a few days and then the response times went suddenly better than ever.
I am pretty sure Google works on the infrastructure behind the scenes... But no way to point that out.
So here are my questions:
Has anybody else ever noticed the same kind of problem?
It is really annoying for me because my web pages get very slow and I have absolutely no control over it. Plus I lose a lot of time because I generally never first suspect a hardware problem or maintenance, but instead something that we introduced in our app. Is it normal or do I have a problem on my SQL instance?
Is there anywhere I can have visibility over what Google is doing on the hardware? I know there are maintenance alerts, but for my zone the list always seems to be empty when it happens.
The only option I have for now is to wait and that is really not acceptable.
I suspect that Google does some sort of IO throttling and their algorithm is not very sophisticated. We have a build server which slows down to a crawl if we do more than two builds within an hour. The build that normally takes 15 minutes will run for more than an hour and we usually terminate it and re-run manually later. This question describes a similar problem and the recommended solution is to use larger volumes as they come with more IO allowance.

Cloud SQL errors

Various errors started occurring in Google SQL. The system is saying temporarily unavailable, but it's been quite a while. Looks like 1 in 10 queries now gives a 500/502 error. Here is an example stacktrace http://pastebin.com/MNk06PT4
This is a follow-up from Severe delays in cloud SQL responses. It could be the same issue. Same conditions, google cloud engine connected to a cloud SQL, no zone preference. Hope that sheds more light on the issue.
Between 11.00PST and 11.30PST there was an issue that interrupted many Cloud SQL instances. The problem should now be resolved.
We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority for the Google Cloud Platform, and we are making continuous improvements to make our systems better.
To be kept informed of other Google Cloud SQL issues and launches, please join google-cloud-sql-announce@googlegroups.com
https://groups.google.com/forum/#!forum/google-cloud-sql-announce
Seems all Google Cloud SQL is down, any expected recovery time?
https://cloud.google.com/console
Error: Server Error
The server encountered an error and could not complete your request.
Please try again in 30 seconds.
Best regards
Sergio
Also I noticed that you are using the deprecated JDBC driver that has much lower performance than using the MySQL wire protocol natively. See https://developers.google.com/cloud-sql/docs/external for information on connecting using the standard drivers. That will help latency as well as consistency of performance.
I also got this error using CodeIgniter v2.1 and v3 on App Engine.
It happens when using $autoload['libraries'] = array('database');
Then after a few random page refreshes this error pops up.
After changing the following in database.php:
'pconnect' => TRUE,
into
'pconnect' => FALSE,
This error is gone in my application. Now both version 2.1 and 3 are working for me.
Maybe there is a similar setting in the framework or code you're using.
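Separately from the connection settings discussed above, intermittent failures like the 500/502 errors in this thread are often handled on the client with a bounded retry and exponential backoff. The sketch below is a generic illustration, not tied to any Cloud SQL driver: `retry` and `TransientError` are hypothetical names, and which real exceptions count as transient is an assumption you would have to map from your own driver or framework.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, e.g. 'temporarily unavailable'."""


def retry(operation: Callable[[], T],
          attempts: int = 4,
          base_delay_s: float = 0.1,
          retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on the given exception types with backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Exponential backoff (0.1s, 0.2s, 0.4s, ...) plus random jitter so
            # many clients do not retry at exactly the same moment.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))

    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query() -> str:
        # Hypothetical query that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientError("temporarily unavailable")
        return "ok"

    print(retry(flaky_query), "after", state["calls"], "calls")
```

Retries only make sense for operations that are safe to repeat, such as reads. They mask short outages but do nothing for a sustained one, so the attempt count is kept small.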