Starting and Stopping PostgreSQL Amazon RDS Instance Automatically Based on Usage - postgresql

We're a team of 4 data scientists that use Amazon RDS PostgreSQL for analysis purposes. So we're looking for a way to automatically start/stop the instance automatically but based on usage as opposed to time.
For example, there are clearly solutions for starting and stopping automatically during regular business hours (Stopping an Amazon RDS DB Instance Temporarily).
However, this doesn't quite work for us because we all have different schedules and don't necessarily adhere to a standard schedule. I would like a script that basically checks whether the DB has been used in the past, say 30 minutes, and if not turn off the instance. Then, if someone tries to connect to the DB but it's turned off, then automatically turn it on. My intuition tells me that the latter is harder than the former, but I'm not sure. Is this possible?

To do this you would need to use a CloudWatch Alarm, to do this you would rely on metrics that are available to CloudWatch such as number of connections or CPU Utilization.
This alarm could trigger a Lambda function that will stop your RDS instance, be aware that an RDS instance will restart once it has been off for 7 days.
Alternatively if you're able to use it you could look into Aurora Serverless with the PostgreSQL compatible version. this option would automatically handle the stop/start functionality when no one is using it.

Related

Cloud SQL Mysql - Stuck in failover operation in progress

My Cloud SQL Mysql 5.7.37 Highly available instance is stuck in a "Failover operation in progress. This may take a few minutes. While this operation is running, you may continue to view information about the instance" process. It is a fairly small database and it has been stuck like this for 5 hours and the failover is not available so no DB queries can be executed, hence our system is currently down.
No commands on the DB can be executed since it is in an updating process, the error log is empty and the operations log only contain this update and successfull backups.
Does anyone have any suggestions? I am not paying for Google Support so I cant get support directly from them (which I think is terrible since this a fully managed service).
Best,
Carl-Fredrik

LISTEN/NOTIFY on AWS Aurora/postgres

I am hoping to implement a trigger that fires a notify on table change in PostgreSQL, but I am unable to find any definite answer to if it would work in a cluster.
Does PostgreSQL listen/notify work on AWS Aurora (both provisioned and serverless), and can I expect all clients to receive notifications regardless of what instance they are connected to?
If yes, are there any major performance concerns?
Thanks :)
The best AWS Documentation implies - YES, it does work
Specifically for Aurora Serverless, there is an advisory that "LISTEN/NOTIFY" is NOT RECOMMENDED. This implies that it's possible with Aurora Serverless but not advisable. It also implies that normal RDS LISTEN/NOTIFY does work and is ok to use.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html
Currently, certain SQL queries and commands, such as cursors with hold
(DECLARE name ... CURSOR WITH HOLD FOR query); session-level advisory
locks; temporary relations; and asynchronous notifications (LISTEN,
NOTIFY commands) prevent scaling and are not recommended
My own quick test on AWS shows - Yes, it does work
Launch Instance with Public Access
...
Allow Traffic
Run command in dbeaver
Note: This test doesn't show a successful receipt of a notification. That will need a quick console application to test that.

Google Cloud SQL Postgres - randomly slow queries from Google Compute / Kubernetes

I've been testing Google Cloud SQL with Postgresql, but I have random queries taking ~3s instead of a few ms.
Troubleshooting I did:
The queries themselves aren't problems, rerunning the same query will work.
Indexes are properly set. The database is also very very small, it shouldn't do this, even if there weren't any index.
The Kubernetes container is connecting to the database through SQL Proxy (I followed this https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine). It is not the problem though as I tried to connect directly to the database, with the same issue.
I configured net.ipv4.tcp_keepalive_time to 60 to make sure the connection weren't dropping.
I also have a pool of connection that are never disconnected to make sure it wasn't from that.
When I run queries directly through my local Postgresql client, I never have the problem.
I don't have this issue when developing locally either and connecting to my local database.
What I'm getting at is: I feel there's some weird connection/link issue between my Google Compute instances and my Google SQL instance that I can't seem to figure out.
Any idea?
Edit:
I also noticed these logs in my SQL Cloud instance every 30s:
ERROR: recovery is not in progress
HINT: Recovery control functions can only be executed during recovery.
STATEMENT: SELECT pg_is_xlog_replay_paused(), current_timestamp
That's an interesting problem you are facing. So my knowledge on Kubernetes isn't that great, but I do have a general understanding so let's see if I can provide some suggestions.
To start with, the API that you linked to in your question does mention that it is still in beta. So I do believe there would still be issues to patch in maximizing speed performance.
Secondly, from what I understand, Kubernetes is a great tool for handling stateless workloads. Thus, handling data where state is required for queries would be a slow operation. This article (although not entirely related) does explain some of the pitfalls of Kubernetes (not all the questions are relevant)
Thirdly, could you explain your use case a little bit? Do you really need to use Kubernetes or will another tool like a powerful Compute Engine Instance or or a Dataflow job resolve the the issue? Are you making your database queries through a programming language or an application call?
Thanks, and do let me know!

Is there a way to enable an SQL log to see/optimize my queries using CloudSQL

I started my test of using a Google's CloudSQL instance with a desktop based application, so far I am impressed with a performance, even it is laggy, it does the job, so my next step is to see what simple modifications can do to my application most intended to reduce Access to the database and optimize if there is something more to do.
How can I do log the sql commands send to the database in order to check what queries are being sent. My app uses ODBC drivers in Windows.
Regards
What you probably want is to turn on the general log. Unfortunately, that requires SUPER privileges and that was removed some time ago (announcement). We are going to provide a way to tweak parameters like that via the Cloud SQL API. For now, the best solution is to use a setup a local server and use the logging on that one. If you really want it on production ping us on the google-cloud-sql-discuss Google group and we'll enable the SUPER for your instance.

pgpool-II for Postgres - Is it what I need?

I just stumbled upon pgpool-II in my search for clustering my Postgres DB (just getting ready to deploy a web app in a couple months). I still have the shakes from excitement, but I'm nervous, as each time I find something this excellent I am soon let down. Have you any experience with pgpool-II, and will it help me run my database in multiple VMs, and later in multiple physical servers altogether? Is it all I need for backing up, load balancing, and providing a higher availability for my DB server!?
Also, is it easy to use the parallel query function (for instance, in Django or through Pythons psycopg2)? This would be most excellent for providing reporting and aggregation!
One last thing: It seems to work between Postgres and psycopg2. Is this a correct understanding of it, so I can use psycopg2 the same as normal, without regard for pgpool-II?
pgpool-II works fine for what it claims to do. And it fits between your application and the database the way you expect it to; just point psycopg2 toward it instead of directly at the database and off you go.
The main thing you have to note is that while it supports many different types of features--replication, load balancing, parallel query--you can't use them all at once. It sounds like you may be under the impression you can do that, and it doesn't work that way. The documentation is not all that clear on this subject (the English version at least, I can't speak to the original Japanese one).
For example, if you run pgpool-II in its "Master/Slave" mode, so that it supports load-balancing for scaling reads, you have to use another program to actually do the replication between those nodes. Slony was the supported replication solution to put underneath of there in earlier PostgreSQL versions, as of pgpool-II 3.0 and PostgreSQL 9.0 you can also use the soon to be released Streaming Replication/Hot Standby features of that new version as well.
pgpool-II is a useful component and you can use it in a lot of interesting ways, but I doubt it will be "all you need" for every requirement you hope to achieve with it.