I am using GCP Cloud SQL (MySQL 8.0.18) and I am trying to execute a query that returns only 5,000 rows:
SELECT * FROM client_1079.c_crmleads ORDER BY LeadID DESC LIMIT 5000;
but the query is taking a very long time to return the data.
Here are the timing details:
Affected rows: 0 Found rows: 5,000 Warnings: 0 Duration for 1 query: 0.797 sec. (+ 117.609 sec. network)
Instance configuration: 8 vCPUs, 20 GB RAM, 410 GB SSD.
[screenshot of the GCP Cloud SQL instance]
I am also facing issues with a high table_open_cache value and high RAM utilization.
How do I reduce table_open_cache, and how can I improve instance performance?
It looks like the size of the data retrieved is quite large, and the time spent sending that data from the SQL instance to your app is the reason for the latency you observe.
You may want to review your use case and retrieve less information, parallelize queries, or improve the SQL instance's I/O performance (which is related to the DB disk size).
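One rough way to cut down the amount of data on the wire is to select only the columns the application actually needs and to fetch in smaller pages instead of 5,000 wide rows at once. A minimal sketch, using hypothetical column names since the real schema of c_crmleads isn't shown:
-- First page: only the needed columns, 1,000 rows at a time.
SELECT LeadID, FirstName, Email, CreatedAt
FROM client_1079.c_crmleads
ORDER BY LeadID DESC
LIMIT 1000;
-- Next page: continue from the last LeadID the client has already seen
-- (95000 is a placeholder value).
SELECT LeadID, FirstName, Email, CreatedAt
FROM client_1079.c_crmleads
WHERE LeadID < 95000
ORDER BY LeadID DESC
LIMIT 1000;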
I know that count(*) in Postgres is generally slow; however, I have a database where it's super slow. I'm talking about minutes, even hours.
There are approximately 40M rows in the table, and the table consists of 29 columns (most of them are text, 4 are double precision). There is an index on one column which should be unique, and I've already run VACUUM FULL. It took around one hour to complete, but with no observable results.
The database uses a dedicated server with 32 GB RAM. I set shared_buffers to 8GB and work_mem to 80MB, but saw no speed improvement. I'm aware there are techniques to get an approximate count or to use an external table to keep the count, but I'm not interested in the count specifically; I'm more concerned about performance in general, since right now it's awful. When I run the count there are no CPU peaks or anything. Could someone point me to where to look? Can it be that the data is structured so badly that 40M rows are too much for Postgres to handle?
I am using Google Cloud PostgreSQL, which is at 100% CPU utilization. I upgraded the instance to use 2 cores. Now the instance is running on 2 CPUs and 3.75 GB of RAM, yet it is still using 100% of the CPU. I then upgraded the instance again, to 6 cores and 12 GB of RAM, but there is still no change in CPU utilization. Here are some stats metrics:
Any thoughts on why this is happening, and how can I figure out a solution?
I have checked the number of queries running on PostgreSQL. The number of queries is less than 100 and execution time is less than 30 seconds. The PostgreSQL version is 9.6.
I've been doing this daily now, so I'll share how I debug this problem.
First of all, install the pg_stat_statements extension so it stores execution statistics for all SQL statements executed on the server.
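Roughly, enabling it looks like this; on a managed instance such as Cloud SQL the module is typically already preloaded, so only the CREATE EXTENSION step should be needed, while a self-managed server also needs the shared_preload_libraries change and a restart:
-- postgresql.conf (self-managed servers only; requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'
-- Then, in the database you want to inspect:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;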
After that, it's easy...
This query will show the most "expensive" queries:
SELECT substring(query, 1, 50) AS short_query,
round(total_time::numeric, 2) AS total_time,
calls,
round(mean_time::numeric, 2) AS mean,
round(max_time::numeric, 2) AS max_time,
round((100 * total_time / sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu,
query
FROM pg_stat_statements
ORDER BY total_time DESC LIMIT 10
And this one to reset the statistics, useful when you want to debug a specific period:
SELECT pg_stat_statements_reset()
To see which queries are currently running on the server:
SELECT pid, usename, client_addr, query, query_start, NOW() - query_start AS elapsed
FROM pg_stat_activity
WHERE state <> 'idle'
-- AND EXTRACT(EPOCH FROM (NOW() - query_start)) > 1
ORDER BY elapsed DESC;
If you have a better way to debug the performance please tell me!
Also, if any GCP engineers are reading this, please enable more metrics that can help us trace the problem. For example, per-process CPU on the server could tell us which DB/schema is taking too much CPU.
EDIT:
Google released Query Insights, and it's useful when you don't want to get your hands dirty!
I still use pg_stat_statements!
I have a Redshift cluster with 3 nodes. Every now and then, with users running queries against it, we end up in this unpleasant situation where some queries run for way longer than expected (even simple ones, exceeding 15 minutes), and the cluster storage starts increasing to the point that, if you don't terminate the long-running queries, it reaches 100% storage occupied.
I wonder why this may happen. My experience is varied: sometimes it's been a single query doing this, and sometimes it's been several concurrent queries running at the same time.
One specific scenario where we saw this happen involved LISTAGG. The return type of LISTAGG is varchar(65535), and while Redshift optimizes away the implicit trailing blanks when the value is stored to disk, the full width is required in memory during processing.
If you have a query that returns a million rows, you end up with 1,000,000 rows times 65,535 bytes per LISTAGG value, which is about 65 GB. That can quickly get you into the situation you describe, with queries taking unexpectedly long or failing with "Disk Full" errors.
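For illustration, a query shaped roughly like the one below is the kind of thing that can trigger this; the table and column names are made up, but any LISTAGG over a large result set behaves the same way, because each aggregated value occupies the full varchar(65535) in memory while the query runs, even if the actual string is only a few bytes long.
-- Each order_list value is treated as varchar(65535) during execution.
SELECT customer_id,
       LISTAGG(order_id, ',') WITHIN GROUP (ORDER BY order_date) AS order_list
FROM orders
GROUP BY customer_id;
If that grouped result has a million rows, the memory math above applies even though most of the concatenated strings would fit in a far smaller varchar.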
My team discussed this a bit more on our team blog the other day.
This typically happens when a poorly constructed query spills too much data to disk. For instance, the user accidentally specifies a Cartesian product (every row from tblA joined to every row of tblB).
If this happens regularly, you can implement a QMR (query monitoring rule) that limits the amount of disk spill before a query is aborted.
QMR Documentation: https://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-query-monitoring-rules.html
QMR Rule Candidates query: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminScripts/wlm_qmr_rule_candidates.sql
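As a quick check before writing rules, something along these lines (a rough sketch using the standard system tables; the LIMIT is arbitrary) shows which recent queries had steps that spilled to disk, which are the natural candidates for a QMR rule:
-- Recent queries with at least one disk-based step, ordered by the working
-- memory their disk-based steps reported.
SELECT q.query,
       TRIM(q.querytxt) AS querytxt,
       q.starttime,
       SUM(s.workmem) / (1024.0 * 1024.0) AS workmem_mb
FROM svl_query_summary s
JOIN stl_query q ON q.query = s.query
WHERE s.is_diskbased = 't'
GROUP BY q.query, q.querytxt, q.starttime
ORDER BY workmem_mb DESC
LIMIT 20;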
I have an entry plan instance of DB2 Warehouse on Cloud that I'm looking to use for development of a streaming application.
If I keep the data at <= 1 GB, it will cost me $50/month. I'm worried that I could easily fill the database with 20 GB, and the cost would jump to $1000/month.
Is there a way that I can limit the amount of data in my DB2 Warehouse on Cloud to < 1GB?
As per this link
Db2 Warehouse pricing plans
You will not be charged anything as long as your data usage does not exceed 1 GB. From 1 GB to 20 GB the price will vary based on the data used.
You should be able to see the current % of usage at any time in your console. Other than that I am not aware of any method to automatically restrict the usage to less than 1 GB at this time.
One complicating factor is data compression, which determines the actual amount of data stored; it can vary based on the type of data being stored.
We have created an RDS Postgres instance (m4.xlarge) with 200 GB of storage (Provisioned IOPS). We are trying to upload data from the company data mart into 23 tables in RDS using DataStage. However, the uploads are quite slow: it takes about 6 hours to load 400K records.
Then I started tuning the following parameters according to Best Practices for Working with PostgreSQL (a quick query to verify them is sketched after the list):
autovacuum 0
checkpoint_completion_target 0.9
checkpoint_timeout 3600
maintenance_work_mem {DBInstanceClassMemory/16384}
max_wal_size 3145728
synchronous_commit off
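(As a quick sanity check, whether these settings actually took effect on the running instance can be confirmed with a query along these lines; the names match the parameters listed above.)
-- Show the values the running instance is actually using.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('autovacuum', 'checkpoint_completion_target', 'checkpoint_timeout',
               'maintenance_work_mem', 'max_wal_size', 'synchronous_commit');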
Other than these, I also turned off Multi-AZ and backups. SSL is enabled, though; I'm not sure whether that changes anything. However, after all the changes there is still not much improvement. DataStage is already uploading data in parallel with ~12 threads. Write IOPS is around 40/sec. Is this value normal? Is there anything else I can do to speed up the data transfer?
In PostgreSQL, you're going to wait one full round trip (network latency) for every INSERT statement you send. This latency is the latency between the database and the machine the data is being loaded from.
In AWS you have many options to improve performance.
For starters, you can load your raw data onto an EC2 instance and import from there; however, you will likely not be able to use your DataStage tool unless it can be installed directly on the EC2 instance.
You can configure DataStage to use batch processing, where each INSERT statement actually contains many rows; generally, the more rows per statement, the faster the load.
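A minimal sketch of what such a batched INSERT looks like (the table and column names are made up):
-- One statement, one round trip, many rows, instead of one round trip per row.
INSERT INTO staging_leads (id, name, amount)
VALUES (1, 'alpha', 10.5),
       (2, 'beta', 20.0),
       (3, 'gamma', 30.25);
PostgreSQL's COPY command is usually faster still, for the same reason: it streams many rows per round trip, so it's worth checking whether the load tool can emit COPY instead of individual INSERTs.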
Disable data compression, and make sure you've done everything you can to minimize latency between the two endpoints.