Google Cloud PostgreSQL: Utilization remains at 100%

I am using Google Cloud PostgreSQL and its CPU utilization is at 100%. I upgraded the instance to 2 cores, so it now runs with 2 vCPUs and 3.75 GB of RAM, but it is still using 100% of its CPU. I then upgraded again to 6 cores and 12 GB of RAM, and there is still no change in CPU utilization. Here are some metrics:
Any thoughts on why this is happening and how I can figure out a solution?
I have checked the number of queries running on PostgreSQL: there are fewer than 100, and execution time is less than 30 seconds. The PostgreSQL version is 9.6.

I debug this daily now, so I'll share how I approach the problem.
First of all, install the pg_stat_statements extension so the server stores execution statistics for all SQL statements it runs.
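A minimal sketch of enabling it on a self-managed server (managed services such as Cloud SQL handle the preload flag differently, typically through instance flags, so check their documentation there):
-- Preload the module (requires a restart), then create the extension in the target database:
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- ...restart PostgreSQL...
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;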
After that, it's easy...
This query will show most "expensive" queries:
SELECT substring(query, 1, 50) AS short_query,
       round(total_time::numeric, 2) AS total_time,
       calls,
       round(mean_time::numeric, 2) AS mean,
       round(max_time::numeric, 2) AS max_time,
       round((100 * total_time / sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu,
       query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
And this one to reset the statistics, useful when you want to debug a specific period:
SELECT pg_stat_statements_reset();
In order to see which queries are running currently on the server:
SELECT usename, pid, client_addr, query, query_start, NOW() - query_start AS elapsed
FROM pg_stat_activity
WHERE state <> 'idle'
-- AND EXTRACT(EPOCH FROM (NOW() - query_start)) > 1
ORDER BY elapsed DESC;
If you have a better way to debug performance, please tell me!
Also, if any GCP engineers are reading this, please expose more metrics that would help us trace the problem. For example, per-process CPU on the server could tell us which database/schema is consuming the most CPU.
EDIT:
Google released Query Insights, and it's useful when you don't want to get your hands dirty!
I still use pg_stat_statements!

Related

slow query fetch time

I am using GCP Cloud SQL (MySQL 8.0.18) and I am trying to fetch only 5000 rows with this query:
SELECT * FROM client_1079.c_crmleads ORDER BY LeadID DESC LIMIT 5000;
but the execution seems to take a long time to fetch the data.
Here are the timing details:
Affected rows: 0  Found rows: 5,000  Warnings: 0  Duration for 1 query: 0.797 sec. (+ 117.609 sec. network)
Instance configuration: 8 vCPUs, 20 GB RAM, 410 GB SSD
[screenshot of the GCP Cloud SQL instance]
I am also facing issues with a high table_open_cache and high RAM utilization.
How do I reduce open_table_cache, and how can I improve instance performance?
It looks like the amount of data retrieved is quite large, and the time spent sending that data from the SQL instance to your app is the reason for the observed latency.
You may want to review your use case and retrieve less information, parallelize queries, or improve the SQL instance's I/O performance (which is related to the database disk size).
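As a hedged illustration of the "retrieve less information" suggestion, you could select only the columns the application needs and page through the rows with keyset pagination instead of pulling 5000 full rows at once (the column names other than LeadID and the values are placeholders):
SELECT LeadID, created_at, status   -- only the columns the app actually uses (placeholder names)
FROM client_1079.c_crmleads
WHERE LeadID < 1000000              -- last LeadID seen on the previous page (placeholder value)
ORDER BY LeadID DESC
LIMIT 500;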

TimescaleDB jobs not running for a while now

I have TimescaleDB version 2.2.1 configured on a few servers (all single-node deployments) and it has been working great for the most part.
On one of the servers, however, which is a relatively less powerful machine with a large 100 TB mounted NAS drive, the compression jobs I scheduled seemed to stop working once I set them up for a large DB.
It did work for the smaller databases earlier, but when I created the hypertables on the largest DB (total size of 13 TB, with one table at 9.7 TB by itself) and set up the appropriate compression policy, it just never triggered, even after I manually altered the job with the alter_job command. The same thing happened to the other DBs (with TimescaleDB enabled): the scheduled jobs stopped working on them too around the same time (the last successful finish date is 29th Sept, i.e. 20 days ago).
I have tried calling the job manually, and it only compresses one chunk at a time, so for now I have had to compress the chunks manually as a quick fix.
Can anyone please help? I cannot seem to find any resource regarding this.
SELECT compress_chunk(i, if_not_compressed => true)
FROM show_chunks('oa_odds_historic', older_than => INTERVAL '10 day') i;
SELECT alter_job(job_id, next_start => now())
-- select *
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression';
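To check when each compression job last ran successfully and whether recent runs failed, the job stats view can also help (a sketch, assuming TimescaleDB 2.x):
SELECT js.job_id, j.proc_name, js.last_run_status,
       js.last_successful_finish, js.next_start, js.total_failures
FROM timescaledb_information.job_stats js
JOIN timescaledb_information.jobs j USING (job_id)
WHERE j.proc_name = 'policy_compression';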
server spec:
timescaledb jobs:
Thanks Jonatasdp.
Unfortunately, it was due to not having enough background workers: timescaledb.max_background_workers, which I increased to accommodate the total number of jobs (= the number of hypertables with compression policies enabled) on the server.
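For reference, a minimal sketch of that change on a self-managed instance (the values are examples; both settings require a restart, and managed services usually expose them as instance flags instead):
-- Allow enough TimescaleDB background workers for all scheduled jobs,
-- and keep the global worker limit large enough to accommodate them.
ALTER SYSTEM SET timescaledb.max_background_workers = 16;   -- example: >= number of scheduled jobs
ALTER SYSTEM SET max_worker_processes = 24;                 -- example: background workers + parallel workers + headroom
-- restart PostgreSQL afterwards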

PostgreSQL query is very slow every fifth/sixth time it runs

We have a quite powerful PostgreSQL instance running in a Kubernetes cluster with actually quite low usage. We are facing the problem that every fifth/sixth/seventh time we run a query, it is significantly slower (5 s instead of 100 ms). An example query is shown below, but we see the same problem with several other queries as well. The query planner (EXPLAIN ANALYZE) shows the same execution plan every time.
SELECT count(*) AS "count"
FROM "administrative_areas"."flurstuecke" AS "flurstuecke"
WHERE "flurstuecke"."flur_id" = 8554
Our storage is NetApp-based and integrated into Kubernetes with Trident. We cannot see any high IOPS values, high CPU load, or memory consumption.
Does anyone have an idea or can point us in the right direction?
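One low-effort check that might narrow this down: run the statement with buffer statistics during both a fast and a slow occurrence; if the slow runs show "shared read" (disk) where the fast ones show "shared hit" (cache), the extra seconds are likely storage latency rather than planning. A sketch:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) AS "count"
FROM "administrative_areas"."flurstuecke" AS "flurstuecke"
WHERE "flurstuecke"."flur_id" = 8554;
-- compare the "Buffers: shared hit/read" numbers between fast and slow runs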

Measuring load per database in Postgres using 'active' processes in pg_stat_activity?

I am trying to measure the load that various databases living on the same Postgres server are incurring, to determine how to best split them up across multiple servers. I devised this query:
select
now() as now,
datname as database,
usename as user,
count(*) as processes
from pg_stat_activity
where state = 'active'
and waiting = 'f'
and query not like '%from pg_stat_activity%'
group by
datname,
usename;
But there were surprisingly few active processes!
Digging deeper, I ran a simple query that returns 20k rows and took 5 seconds to complete, according to the client I ran it from. When I queried pg_stat_activity during that time, the process was idle! I repeated this experiment several times.
The Postgres documentation says active means
The backend is executing a query.
and idle means
The backend is waiting for a new client command.
Is it really more nuanced than that? Why was the process running my query not active when I checked?
If this approach is flawed, what alternatives are there for measuring load at a per-database granularity, other than periodically sampling the number of active processes?
Your expectations regarding active, idle and idle in transaction are right. The only explanation I can think of is a huge delay in displaying the data client-side: the query had indeed finished on the server and the session was idle, yet you hadn't seen the result in the client.
Regarding the load measurement: I would not rely much on the number of active sessions. It is pure luck to catch a fast query in the active state. For example, you could check pg_stat_activity every second and see one active session, while between measurements one database was queried 10 times and another once, yet none of those executions would be seen, because they completed between samples. Those 10+1 active states (although they mean one database is queried ten times more often) should not be read as load at all, because the cluster is so lightly loaded that you can't even catch the executions. Conversely, you can catch many active sessions and it still would not mean the server is actually loaded.
So at least add now() - query_start to your query to catch longer queries. Or, even better, save the execution time of some frequent queries and measure whether it degrades over time. Or select the pid and check the resources consumed by that pid.
Btw, for longer queries look into pg_stat_statements: watching how their statistics change over time can give you some idea of how the load changes.
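A sketch of such a sampling query with elapsed time included (assuming PostgreSQL 9.6+, where the waiting column was replaced by wait_event):
SELECT datname AS database,
       usename AS "user",
       pid,
       state,
       now() - query_start AS elapsed,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()   -- exclude this monitoring query itself
ORDER BY elapsed DESC;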

PostgreSQL I/O utilization

New Relic shows I/O spikes of up to 100% reads.
The rest of the time it's close to 0.
The spikes don't coincide with our cron jobs (database backup, analyze, etc.), and there is almost no match with autovacuum (pg_stat_all_tables.last_autovacuum) either.
Querying pg_stat_activity also shows nothing but the simple queries sent by our Ruby application:
select *
from pg_stat_activity
where query_start between '2017-09-25 09:00:00' and '2017-09-25 11:00:00';
And these queries are not even among the slow queries in our log file.
Please advise how to identify which processes cause such I/O load.
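If pg_stat_statements is installed, its block I/O counters can point to the statements doing the most disk reads (blk_read_time is only populated when track_io_timing is on); a sketch:
SELECT substring(query, 1, 60) AS short_query,
       calls,
       shared_blks_read,   -- blocks read from disk (cache misses)
       blk_read_time       -- ms spent reading, requires track_io_timing = on
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;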