My Cloud SQL MySQL 5.7.37 high-availability instance is stuck in a "Failover operation in progress. This may take a few minutes. While this operation is running, you may continue to view information about the instance" state. It is a fairly small database, it has been stuck like this for 5 hours, and the instance is unavailable during the failover, so no DB queries can be executed and our system is currently down.
No commands can be executed on the DB since it is in an updating state; the error log is empty, and the operations log only contains this update and successful backups.
Does anyone have any suggestions? I am not paying for Google support, so I can't get help directly from them (which I think is terrible, given that this is a fully managed service).
Best,
Carl-Fredrik
I've got a database on Google Cloud SQL that is used by our application running on Kubernetes in GKE.
The MySQL instance is running 5.6, and I need to upgrade it to 5.7, so I tried using the new migration jobs.
I've set up the connection profile and all the required permissions for the source DB, then followed the instructions to create a continuous migration.
The job says it's running, migrating the ~450 GB database. After about a day, it's still running, the storage used seems to have stopped growing, and the replication delay is at 0. The source database is not currently in use (that's why I'm using it to try this out before doing the same with a more important DB).
According to this, if the dump phase is done, I should be able to promote the instance, but the promote button remains greyed out, and there's no way to check the running state (it only says "running", and I don't see any way to check if it's dumping, on CDC, or anything else).
The documentation seems a bit lacking, and I couldn't find anything by googling around. Has anyone been using this?
In short, my questions are:
Why can't I promote the instance?
How can I check which phase the migration is in?
Here's a screencap of my job:
link because SO doesn't let me embed images yet
Thanks.
P.S.: the tag that the documentation says should be used on Stack Overflow is google-cloud-database-migration-service, but it's too long and Stack Overflow doesn't allow it, so I used google-cloud-sql instead :/
I am seeing an issue like this, but possibly more frustrating. After a week migrating a 2 TB database, storage resets to near zero and the full dump restarts, without any errors or any indication of what happened.
We're a team of 4 data scientists who use Amazon RDS PostgreSQL for analysis purposes. We're looking for a way to automatically start/stop the instance based on usage rather than time.
For example, there are clearly solutions for starting and stopping automatically during regular business hours (Stopping an Amazon RDS DB Instance Temporarily).
However, this doesn't quite work for us because we all have different schedules and don't necessarily adhere to a standard one. I would like a script that checks whether the DB has been used in, say, the past 30 minutes and, if not, stops the instance. Then, if someone tries to connect to the DB while it's stopped, it would start automatically. My intuition tells me the latter is harder than the former, but I'm not sure. Is this possible?
To do this you would need a CloudWatch alarm, which relies on metrics that CloudWatch already collects for RDS, such as the number of database connections or CPU utilization.
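Here's a minimal sketch of creating such an alarm with boto3; the alarm name, instance identifier, and SNS topic ARN are all placeholders you'd replace with your own:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Fire when there have been zero connections for 30 minutes
# (six consecutive 5-minute periods). All names/ARNs are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName="rds-idle-30min",
    Namespace="AWS/RDS",
    MetricName="DatabaseConnections",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "analytics-postgres"}],
    Statistic="Maximum",
    Period=300,                # 5-minute evaluation windows
    EvaluationPeriods=6,       # 6 windows = 30 minutes
    Threshold=0,
    ComparisonOperator="LessThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:stop-rds"],
)
```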
This alarm could trigger a Lambda function that stops your RDS instance. Be aware that a stopped RDS instance is automatically restarted after it has been stopped for 7 days.
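A sketch of the Lambda function behind that alarm might look like this (again, the instance identifier is a placeholder):

```python
import boto3

rds = boto3.client("rds")
DB_INSTANCE_ID = "analytics-postgres"  # placeholder identifier

def lambda_handler(event, context):
    # Invoked via the alarm's SNS topic. Only stop the instance if it
    # is actually running; stop_db_instance fails in other states.
    desc = rds.describe_db_instances(DBInstanceIdentifier=DB_INSTANCE_ID)
    status = desc["DBInstances"][0]["DBInstanceStatus"]
    if status == "available":
        rds.stop_db_instance(DBInstanceIdentifier=DB_INSTANCE_ID)
        return {"action": "stopped"}
    return {"action": "skipped", "status": status}
```

Starting the instance back up on an incoming connection is harder, as you suspected: a stopped instance has no endpoint to receive the connection attempt, so you'd need a separate trigger (a scheduled check, a chat command, or a small "wake up" endpoint) to call start_db_instance.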
Alternatively, if you're able to use it, you could look into Aurora Serverless with the PostgreSQL-compatible version. That option automatically handles the stop/start behavior when no one is using the database.
I've been testing Google Cloud SQL with PostgreSQL, but I have random queries taking ~3 s instead of a few ms.
Troubleshooting I did:
The queries themselves aren't the problem; rerunning the same query works fine.
Indexes are properly set. The database is also very small; it shouldn't behave like this even without any indexes.
The Kubernetes container connects to the database through the Cloud SQL Proxy (I followed this: https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine). That is not the problem though, as I tried connecting directly to the database and saw the same issue.
I configured net.ipv4.tcp_keepalive_time to 60 to make sure the connections weren't being dropped (see the client-side sketch after this list).
I also have a pool of connections that are never disconnected, to make sure the problem wasn't coming from there.
When I run queries directly through my local PostgreSQL client, I never have the problem.
I don't have this issue when developing locally either and connecting to my local database.
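For completeness, the same keepalive behavior can be set per-connection from the client side instead of via sysctl; a minimal psycopg2 sketch with placeholder host and credentials (the keepalives* options are standard libpq connection parameters):

```python
import psycopg2

# Per-connection TCP keepalives, equivalent in effect to the
# net.ipv4.tcp_keepalive_time sysctl but scoped to this connection.
# Host and credentials are placeholders.
conn = psycopg2.connect(
    host="10.0.0.5",         # Cloud SQL IP or proxy address
    dbname="mydb",
    user="myuser",
    password="secret",
    keepalives=1,            # enable TCP keepalives
    keepalives_idle=60,      # seconds idle before the first probe
    keepalives_interval=10,  # seconds between probes
    keepalives_count=3,      # failed probes before dropping the connection
)
```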
What I'm getting at is: I feel there's some weird connection issue between my Google Compute Engine instances and my Cloud SQL instance that I can't figure out.
Any idea?
Edit:
I also noticed these logs on my Cloud SQL instance every 30 seconds:
ERROR: recovery is not in progress
HINT: Recovery control functions can only be executed during recovery.
STATEMENT: SELECT pg_is_xlog_replay_paused(), current_timestamp
That's an interesting problem you're facing. My knowledge of Kubernetes isn't that great, but I have a general understanding, so let's see if I can offer some suggestions.
To start with, the API that you linked to in your question mentions that it is still in beta, so I believe there are still performance issues waiting to be patched.
Secondly, from what I understand, Kubernetes is a great tool for handling stateless workloads, so handling data where state is required for queries can be a slow operation. This article (although not entirely related) explains some of the pitfalls of Kubernetes (not all of the questions are relevant).
Thirdly, could you explain your use case a little more? Do you really need to use Kubernetes, or would another tool like a powerful Compute Engine instance or a Dataflow job resolve the issue? Are you making your database queries through a programming language or an application call?
Thanks, and do let me know!
Usually you can't use MongoDB from Lambda because Lambda functions are stateless and MongoDB operations require a connection, so you suffer a large performance hit from setting up a DB connection each time a function runs.
A solution I have thought of is to use mLab's REST API (http://docs.mlab.com/data-api/); that way I don't need to open a new connection each time my Lambda function is called.
One problem I can see is that mLab's REST service could become a bottleneck, plus I'm relying on it never going down.
Thoughts?
I have a couple of alternative suggestions for you on this, only because I've never used mLab.
Set up http://restheart.org/ and have it sit between your Lambda microservices and your MongoDB instance. I've used this with pretty decent success on another project. It does come with the downside of now having an EC2 instance to maintain; however, setting up RESTHeart is pretty easy, and the crew maintaining and supporting it is great.
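To illustrate that option, a Lambda would then talk to MongoDB over plain HTTP instead of opening a driver connection; a minimal sketch assuming RESTHeart's default /db/collection/id resource layout and a hypothetical internal endpoint:

```python
import requests

# Hypothetical RESTHeart endpoint fronting the MongoDB instance.
BASE = "http://restheart.internal:8080"

def get_item(item_id):
    # Plain HTTP request: the Lambda never opens a MongoDB connection.
    resp = requests.get(f"{BASE}/mydb/items/{item_id}", timeout=5)
    resp.raise_for_status()
    return resp.json()
```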
You can set up a Lambda function that pays the cost of connecting and keeps a connection open. All of your other microservices can then call that Lambda function for the data they need. If it is hit frequently, you will not have to pay the cost of the DB connection as often. However, that first connection can be pretty brutal, so you may need something keeping it warm. You may also run into connections never getting properly closed and eventually running out.
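The usual way to implement that reuse is to create the client outside the handler, so warm invocations share it; a minimal pymongo sketch with a hypothetical MONGODB_URI environment variable and placeholder database/collection names:

```python
import os
from pymongo import MongoClient

# Created once per container, outside the handler, so warm Lambda
# invocations reuse the client (and its connection pool) instead of
# reconnecting. The URI and names below are placeholders.
client = MongoClient(os.environ["MONGODB_URI"])
db = client["mydb"]

def lambda_handler(event, context):
    doc = db.items.find_one({"_id": event["id"]})
    return {"found": doc is not None}
```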
Those two options aside, if mLab is hosting your DB, you have already put a lot of faith in their ability to keep a system alive. If you don't trust them to keep an API up, that lack of faith should also extend to their ability to keep your DB alive.
We have a production system that uses MongoDB as its data store and writes a lot of the actions it completes there.
I'd like to run some reports to see what is going on. If this were an MSSQL DB, I'd have a replicated server set up so I didn't cause any locks that might affect the live system.
Is this necessary in MongoDB?
I was considering adding a hidden replica set member that could be used to run queries from, but I haven't investigated that in any detail.
Obviously any queries you run for reporting are going to add load to the server; how big an impact depends on the types of queries your reports run. It is definitely possible to set up a dedicated secondary for the sole purpose of reporting. You could then make a direct connection to that secondary and run slaveOk queries for your reports. This page has information on how to set up a hidden replica set member: http://www.mongodb.org/display/DOCS/Replica+Set+Configuration#ReplicaSetConfiguration-Memberoptions.
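If it helps, here is a rough sketch of adding such a hidden member via a replica set reconfig (pymongo; the host names are placeholders, and replSetGetConfig/replSetReconfig are the standard admin commands):

```python
from pymongo import MongoClient

# Connect to the current primary (placeholder host).
client = MongoClient("mongodb://primary.example.com:27017")

# Fetch the current config and append a hidden, priority-0 member
# dedicated to reporting. Hidden members receive replicated writes
# but are invisible to normal clients, so reporting queries must
# connect to the member directly.
config = client.admin.command("replSetGetConfig")["config"]
config["version"] += 1
config["members"].append({
    "_id": max(m["_id"] for m in config["members"]) + 1,
    "host": "reporting.example.com:27017",  # placeholder reporting node
    "priority": 0,   # can never be elected primary
    "hidden": True,
})
client.admin.command({"replSetReconfig": config})
```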