PostgreSQL query is very slow every fifth/sixth time it runs

We have a fairly powerful PostgreSQL instance running in a Kubernetes cluster, with quite low actual usage. Every fifth, sixth or seventh time we run a query, it is significantly slower (5 s instead of 100 ms). An example query is below, but we see the same problem with several other queries as well. The query planner (EXPLAIN ANALYSE) shows the same execution plan every time.
SELECT count(*) AS "count"
FROM "administrative_areas"."flurstuecke" AS "flurstuecke"
WHERE "flurstuecke"."flur_id" = 8554
Our storage is based on NetApp and integrated into Kubernetes with Trident. We don't see any high IOPS values, high CPU load or high memory consumption.
Does anyone have an idea, or can anyone point us in the right direction?
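To put numbers on the "every fifth/sixth run" pattern, it can help to time many consecutive runs and flag the outliers. This is a minimal sketch, assuming you can wrap the query in a Python callable (for example a small function that executes the count query on an open driver cursor and fetches the result). The names `time_runs` and `flag_slow_runs` and the 10x threshold are hypothetical choices, not part of any library.

```python
import statistics
import time
from typing import Callable, List


def time_runs(run_query: Callable[[], object], runs: int = 20) -> List[float]:
    """Call run_query() `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def flag_slow_runs(durations: List[float], factor: float = 10.0) -> List[int]:
    """Return the indices of runs slower than `factor` times the median run.

    The 10x factor is an arbitrary assumption; the gap described in the
    question (5 s vs 100 ms) is 50x, so it is caught comfortably.
    """
    median = statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > factor * median]


if __name__ == "__main__":
    # Hypothetical timings shaped like the ones described: 100 ms normally,
    # 5 s on the fifth and eleventh runs.
    sample = [0.1] * 4 + [5.0] + [0.1] * 5 + [5.0]
    print(flag_slow_runs(sample))  # [4, 10]
```

Using the median rather than the mean as the baseline keeps the occasional 5 s run from inflating the threshold it is being compared against.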

Related

High CPU usage on Cloud SQL causing timeouts

We have a Postgres database with billions of records in it.
We have one client that uses our older API to query the database once a day, fetching thousands of records; I would say close to the top end of the thousands.
The API currently runs on Compute Engine behind a load balancer, and during the allotted time I spin up 6 instances of it to help handle the load.
What I have found is that CPU usage on Cloud SQL maxes out at 100% while most of the other stats are fine; it's just the CPU.
This basically renders our API useless, as we can't accept connections and the whole thing falls over.
What can we do to help this?
(Charts not reproduced: CPU utilisation, connections, read/writes, memory usage.)
Most of the other charts show readings well within what we would expect.
I don't really want to beef up the CPU if it isn't the actual underlying problem.
One further thing to note: we have developed a new endpoint specifically for this client, but they have not put it in place yet, and there is no guarantee that it will reduce the DB load.
High CPU usage can most definitely cause dropped or ignored connections. The database engine and the underlying OS are fighting for resources and can't respond to connections in time.
While you could add more CPU, the CPU you have looks (usually) sufficient, except during the periods where it hits 100%. I'd suggest instead finding out why the query is consuming so much CPU and optimizing it.
You might be interested in something like Cloud SQL Insights to help debug the query.
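If the pg_stat_statements extension is enabled on the instance, one way to find the query that is eating the CPU is to rank statements by cumulative execution time. This is a sketch under stated assumptions: execution time is only a proxy for CPU (it also includes I/O waits), the SQL assumes PostgreSQL 13 or later, where the column is named `total_exec_time` (older versions call it `total_time`), and `top_statements` is a hypothetical helper that works on rows you have already fetched.

```python
from typing import Dict, List

# Run on the instance (requires the pg_stat_statements extension).
# All rows are fetched so that each statement's share is relative to every
# recorded statement, not just a top-N subset.
TOP_STATEMENTS_SQL = """
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC;
"""


def top_statements(rows: List[Dict], n: int = 3) -> List[Dict]:
    """Return the n statements with the highest total_exec_time, each annotated
    with its share of the total execution time across all rows given."""
    total = sum(r["total_exec_time"] for r in rows)
    ranked = sorted(rows, key=lambda r: r["total_exec_time"], reverse=True)[:n]
    return [
        {**r, "share": r["total_exec_time"] / total if total else 0.0}
        for r in ranked
    ]
```

A single statement with a large share is the natural first candidate for the optimization suggested above; Cloud SQL Insights presents similar top-query information in the console.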

GCP CloudSQL (PostgreSQL) Crash During Stored Procedure Execution and Failover

I have a stored procedure in GCP CloudSQL (PostgreSQL v9.0.23). It works fine in lower environments, but when it runs in Production (with significantly more volume), it crashes the DB itself, which results in a failover.
When we checked the metrics, we found that memory usage is above 90% just before the crash (15 GB of the instance's 16 GB). Reads/writes are also very high, at over 1,000 ops per second.
The stored procedure runs some SELECT and INSERT statements. Any suggestions for improving this situation would help.
Thanks in advance.
Since the Cloud SQL instance runs smoothly with a small workload but crashes under the more intensive Production workload, the issue appears to be the instance size. I would suggest increasing the instance size to suit your needs.
You also mentioned that memory usage is 15 GB out of 16 GB, which is nearly 94%. According to this document, your Cloud SQL instance is not covered by the Cloud SQL SLA if memory usage stays above 90% for more than 6 hours, so I would suggest keeping memory usage below 90%. I would also suggest keeping CPU utilization within the limits mentioned in this document. To know when your instance reaches a threshold, set up a monitoring alert for those metrics as described here.
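The arithmetic behind the 94% figure and the 6-hour condition can be sketched as follows. This is only an illustration: it assumes memory samples exported from your monitoring as (timestamp, percent) pairs, it does not model how Cloud SQL itself measures the SLA window, and `memory_pct` and `longest_run_above` are hypothetical helpers.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def memory_pct(used_gb: float, total_gb: float) -> float:
    """Memory usage as a percentage of the instance's total memory."""
    return 100.0 * used_gb / total_gb


def longest_run_above(samples: List[Tuple[datetime, float]],
                      threshold_pct: float = 90.0) -> timedelta:
    """Longest stretch (first to last sample) in which consecutive samples all
    exceeded threshold_pct. Samples must be sorted by timestamp."""
    longest = timedelta(0)
    run_start = None
    for ts, pct in samples:
        if pct > threshold_pct:
            if run_start is None:
                run_start = ts  # a new above-threshold stretch begins
            longest = max(longest, ts - run_start)
        else:
            run_start = None  # a dip below the threshold ends the stretch
    return longest


if __name__ == "__main__":
    print(memory_pct(15, 16))  # 93.75
    t0 = datetime(2024, 1, 1)
    # Hypothetical hourly samples, all at 94%, spanning 7 hours.
    hourly = [(t0 + timedelta(hours=h), 94.0) for h in range(8)]
    print(longest_run_above(hourly) > timedelta(hours=6))  # True
```

A monitoring alert, as suggested above, does the same job continuously; the sketch just makes the condition explicit.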
If increasing your instance size doesn't help, I would recommend creating a support ticket with Google Cloud Support so that they can investigate in detail.

Automatic vacuum of table "cloudsqladmin.public.heartbeat"

We're experiencing frequent outages in our back end that seem to correlate with peaks of high CPU usage on our Cloud SQL Postgres instance (v9.6).
Looking at cloudsql.googleapis.com/postgres.log, those CPU peaks also seem to correlate with the database running an automatic vacuum of the table cloudsqladmin.public.heartbeat.
We haven't found any documentation on what this table is or why autovacuum runs on it so often (our own tables don't seem to be affected).
Is this normal? Should we tune the autovacuum settings? Thanks in advance.
Looking at your graphs, there is no correlation between the CPU and the cloudsqladmin.public.heartbeat autovacuum.
Let's start with what the cloudsqladmin.public.heartbeat table is: it is used by the Cloud SQL High Availability process, which is explained in more detail here:
Each second, the primary instance writes to a system database as a heartbeat signal.
So the table is used internally to keep track of your instance's health. The autovacuum is triggered based on the doc David shared.
Now, if the vacuum process were generating the CPU spikes, you would see them constantly (every minute or even every second), not only at the times shown in your graphs.
So, straight answers to your questions:
Is this normal? Yes. The autovacuum and the cloudsqladmin.public.heartbeat table are completely normal parts of Cloud SQL's internals, and they should not impact the instance in any way.
Should we tune the autovacuum settings? There's no need. As mentioned, this process is not what is driving the instance's CPU. You can hide the log entries containing "cloudsqladmin.public.heartbeat" and analyze the remaining ones from the time the spike occurred.
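Hiding the heartbeat entries and keeping only what was logged around the spike can be sketched like this. It assumes the log has already been exported and parsed into (timestamp, message) pairs; the five-minute window and the function name `entries_near_spike` are hypothetical choices.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

HEARTBEAT_TABLE = "cloudsqladmin.public.heartbeat"


def entries_near_spike(entries: Iterable[Tuple[datetime, str]],
                       spike: datetime,
                       window: timedelta = timedelta(minutes=5)
                       ) -> List[Tuple[datetime, str]]:
    """Keep log entries within `window` of the spike (before or after),
    dropping any that mention the Cloud SQL heartbeat table."""
    return [
        (ts, msg)
        for ts, msg in entries
        if abs(ts - spike) <= window and HEARTBEAT_TABLE not in msg
    ]
```

Whatever survives the filter (slow statements, connection errors, a backup starting) is what deserves a closer look for that spike.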
It is also worth checking whether a backup was triggered at the same time (Cloud SQL > Instance Details > Backups), but of course that's a different topic from the one described here :).
Here's a recommendation that seems very relevant to your situation: https://www.netiq.com/documentation/cloud-manager-2-5/ncm-install/data/vacuum.html

What happens when AWS RDS CPU utilisation reaches 100%?

I would like to know what happens when AWS RDS CPU utilisation reaches 100%.
Do the database requests fail, or are they put on hold until CPU utilisation drops below 100%?
I'm using RDS Postgres. Thanks in advance for your help.
Your query performance will degrade. Further queries will fail.
If your RDS instance is the sole database for your application, your entire application could come to a standstill.
You will need to figure out whether the CPU is peaking because of high overall load or because a single query is consuming all the resources.
If it's under heavy load and the workload is read-heavy, adding another replica might help. If it's write-heavy, you may need to scale up to the next instance size or consider sharding your datasets.
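The decision path in this answer (one dominant query vs. general load, then read-heavy vs. write-heavy) can be written down as a rough triage function. The thresholds (a query taking half the load counts as dominant; 80% reads counts as read-heavy) and the name `suggest_action` are hypothetical; the inputs would come from your own metrics, for example the ReadIOPS and WriteIOPS CloudWatch metrics for RDS.

```python
def suggest_action(top_query_share: float, read_iops: float, write_iops: float,
                   dominant_share: float = 0.5,
                   read_heavy_fraction: float = 0.8) -> str:
    """Rough triage for a CPU-bound RDS instance. Thresholds are assumptions."""
    # One query using most of the resources: tune it before adding hardware.
    if top_query_share >= dominant_share:
        return "optimize the dominant query"
    # Otherwise it's general load: look at the read/write mix.
    total_iops = read_iops + write_iops
    read_fraction = read_iops / total_iops if total_iops else 0.0
    if read_fraction >= read_heavy_fraction:
        return "add a read replica"
    # Not clearly read-heavy: treat it as write-heavy.
    return "scale up the instance class or consider sharding"
```

Checking for a dominant query first mirrors the answer's ordering: a replica or a bigger instance only hides a single runaway query.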
This can lead to a number of issues, namely:
If the CPU stays at 100% consistently, there is a good chance your instance will crash. If this is a Multi-AZ instance, automatic failover can reduce the downtime caused by an unexpected reboot to around 2 minutes. If it is a Single-AZ instance, the downtime can be significantly longer.
Your DB instance won't accept new connections even though it has not reached 'max_connections'. There is also a good chance that some existing transactions will roll back due to the performance degradation.
Sustained 100% CPU may lead to memory pressure, i.e. very high swap usage and low freeable memory, and eventually an instance crash.
The workload will hit a ceiling, and read and write IOPS won't increase any further.

Scaling DB2 to increase tps

I'd like to brainstorm with you all about the scaling options DB2 has. I hope you can help me resolve the problem.
I need to scale my DB2 database to handle flash-crowd transaction spikes. Right now it can serve only around 200+ transactions per second (in application terms, not database TPS) before it stalls completely and runs out of CPU.
What do you think: if I want to reach 2,000+ (10 times the current rate), what options do I have to scale my database?
Recently I read about the pureScale feature. It looks promising, but it's not a flexible solution, in that it can only be deployed on IBM System X, and ours isn't. Are there other solutions like pureScale that take a shared-everything approach?
The second option may be database partitioning. Can database partitioning, i.e. a shared-nothing approach, help resolve my problem? Can it add processing power to my system?
Thanks and regards,
Fritz
Before you worry about how to scale up (more hardware in 1 server) or out (more servers), look at how to tune your database. Buying your way out of a performance problem is almost always more expensive than spending time to find and fix the performance problem.
Assuming the process(es) consuming CPU on your database server belong to the database engine, high CPU activity combined with low I/O activity indicates that you're doing a LOT of reads, but they are all served from memory. Scanning a huge table is still inefficient, even if that table is stored completely in memory (buffer pools).
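The "lots of reads, all in memory" diagnosis can be checked with two numbers: the buffer pool hit ratio, computed as (1 - physical reads / logical reads) x 100, and the logical reads per transaction. A hit ratio close to 100% together with a very high logical-reads-per-transaction figure fits the in-memory scanning described above. This sketch assumes you have already collected the counters for an interval (in DB2, monitor elements such as POOL_DATA_L_READS and POOL_DATA_P_READS); the helper names and sample figures are hypothetical.

```python
def bufferpool_hit_ratio(logical_reads: int, physical_reads: int) -> float:
    """Percentage of page requests satisfied from the buffer pool."""
    if logical_reads == 0:
        return 0.0  # no reads recorded in this interval
    return 100.0 * (1 - physical_reads / logical_reads)


def logical_reads_per_txn(logical_reads: int, transactions: int) -> float:
    """Average number of pages read (from memory or disk) per transaction."""
    return logical_reads / transactions


if __name__ == "__main__":
    # Hypothetical counters for one monitoring interval.
    print(bufferpool_hit_ratio(1_000_000, 1_000))
    print(logical_reads_per_txn(1_000_000, 200))  # 5000.0
```

A high hit ratio on its own looks healthy; it is the reads-per-transaction figure that reveals whether each transaction is touching far more pages than it should.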
Find the SQL statements that are using the most CPU. Look at the explain plans, and figure out how to make them more efficient. There are LOTS of resources on the web for database performance tuning.