we are investigating performance issues on our application database and since turning on Azure Automatic Tuning am getting 97% waits on
Related
My Cloud SQL Mysql 5.7.37 Highly available instance is stuck in a "Failover operation in progress. This may take a few minutes. While this operation is running, you may continue to view information about the instance" process. It is a fairly small database and it has been stuck like this for 5 hours and the failover is not available so no DB queries can be executed, hence our system is currently down.
No commands on the DB can be executed since it is in an updating process, the error log is empty and the operations log only contain this update and successfull backups.
Does anyone have any suggestions? I am not paying for Google Support so I cant get support directly from them (which I think is terrible since this a fully managed service).
Best,
Carl-Fredrik
I'm doing some load testing on a microservice application. Collected the percentile statistics and plotted them. The application is running in a shared K8s cluster. The thing I am not quite understanding is why is there a latency spike in the start? Is this an issue with a cold boot?
Locust plot showing RT over time
Is this an issue with a cold boot?
Yes, this is the most likely explanation. There's no way of knowing without digging into your application and its logs though.
Most applications, especially ones that do automatic scaling, perform very poorly when suddenly hit with a large amount of load. If your actual expected user load does not have this behaviour, then maybe a slower ramp-up is more appropriate.
If you havent already read this, then maybe have a look at https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps
It's been almost 3 months I have switched my platform to Google Cloud (Compute Engine + Cloud SQL + Cloud Storage).
I am very happy with it but from time to time I noticed big latency on the Cloud SQL server. My VMs from Compute Engine and my Cloud SQL instance are all on the same location (us-1) datacenter.
Since my Java backend makes a lot of SQL queries to generate a server response, the response times may vary from 250-300ms (normal) up to 2s!
In the console, I notice absolutely nothing: no CPU peaks, no read/write peaks, no backup running, nothing. No alert. Last time it happened, it lasted for a few days and then the response times went suddenly better than ever.
I am pretty sure Google works on the infrastructure behind the scenes... But no way to point that out.
So here's my questions:
Has anybody else ever had noticed the same kind of problem?
It is really annoying for me because my web pages get very slow and I have absolutely no control over it. Plus I loose a lot of time because I generally never first suspect a hardware problem / maintenance but instead something that we introduced in our app. Is it normal or do I have a problem on my SQL instance?
Is there anywhere I can have visibility over what's Google doing on the hardware? I know there are maintenance alerts, but for my zone it seems always empty when it happen.
The only option I have for now is to wait and that is really not acceptable.
I suspect that Google does some sort of IO throttling and their algorithm is not very sophisticated. We have a build server which slows down to a crawl if we do more than two builds within an hour. The build that normally takes 15 minutes will run for more than an hour and we usually terminate it and re-run manually later. This question describes a similar problem and the recommended solution is to use larger volumes as they come with more IO allowance.
I am running various tests that spend a lot of time in the database.
I'd like to keep it all in memory and have it not touch the db, hopefully that would speed things up. Like using sqlite3's in-memory option. I don't need persistence/durability/whatnot, everything is immediately discarded after the test.
Is that possible? I tried tweaking my postgres memory-related vars (as in the solution below), but that doesn't seem to affect the number of db writes it performs, and I couldn't find anything that looks like an 'in-memory' option.
https://dba.stackexchange.com/questions/18484/tuning-postgresql-for-large-amounts-of-ram
I wrote a detailed post on this some time ago:
Optimise PostgreSQL for fast testing
You may find it informative; it covers options for making PostgreSQL run without durability and other tweaks that're useful for running tests.
You do not actually need in-memory operation. If PostgreSQL is set not to flush changes to disk then in practice there'll be little difference for DBs that fit in RAM, and for DBs that don't fit in RAM it won't crash.
You should test with the same database engine you're using in production. Testing with SQLite, Derby, H2, etc then deploying live on PostgreSQL doesn't make tons of sense... as any Heroku/Rails user can tell you from experience.
We have a production system that uses MongoDB as it's data storage and writes a lot of the actions it completes there.
I'd like to run some reports to see what is going on and if this was a MSSQL DB I'd have a replicated server setup so I didn't cause any locks that might effect the live system.
Is this necessary in MongoDB?
I was considering adding a Hidden server that could be used to run queries from, but I haven't investigated that in any detail.
Obviously any queries you run for reporting are going to add load to the server. Depending on what types of queries your reports are running will affect how big of an impact this will have. It is definitely possible to set up a dedicated secondary for the sole purpose of reporting. You could then make a direct connection to that secondary and do slaveOk queries to run your reports. This page has information on how to set up a hidden replica set member: http://www.mongodb.org/display/DOCS/Replica+Set+Configuration#ReplicaSetConfiguration-Memberoptions.