I'm running Postgres 11.
I have a table with 1.000.000 (1 million) rows and each row has a size of 40 bytes (it contains 5 columns). That is equal to 40MB.
When I execute (directly executed on the DB via DBeaver, DataGrid ect.- not called via Node, Python ect.):
SELECT * FROM TABLE
it takes 40 secs first time (is this not very slow even for the first time).
The CREATE statement of my tables:
CREATE TABLE public.my_table_1 (
c1 int8 NOT NULL GENERATED ALWAYS AS IDENTITY,
c2 int8 NOT NULL,
c3 timestamptz NULL,
c4 float8 NOT NULL,
c5 float8 NOT NULL,
CONSTRAINT my_table_1_pkey PRIMARY KEY (id)
);
CREATE INDEX my_table_1_c3_idx ON public.my_table_1 USING btree (c3);
CREATE UNIQUE INDEX my_table_1_c2_idx ON public.my_table_1 USING btree (c2);
On 5 random tables: EXPLAIN (ANALYZE, BUFFERS) select * from [table_1...2,3,4,5]
Seq Scan on table_1 (cost=0.00..666.06 rows=34406 width=41) (actual time=0.125..7.698 rows=34406 loops=1)
Buffers: shared read=322
Planning Time: 15.521 ms
Execution Time: 10.139 ms
Seq Scan on table_2 (cost=0.00..9734.87 rows=503187 width=41) (actual time=0.103..57.698 rows=503187 loops=1)
Buffers: shared read=4703
Planning Time: 14.265 ms
Execution Time: 74.240 ms
Seq Scan on table_3 (cost=0.00..3486217.40 rows=180205440 width=41) (actual time=0.022..14988.078 rows=180205379 loops=1)
Buffers: shared hit=7899 read=1676264
Planning Time: 0.413 ms
Execution Time: 20781.303 ms
Seq Scan on table_4 (cost=0.00..140219.73 rows=7248073 width=41) (actual time=13.638..978.125 rows=7247991 loops=1)
Buffers: shared hit=7394 read=60345
Planning Time: 0.246 ms
Execution Time: 1264.766 ms
Seq Scan on table_5 (cost=0.00..348132.60 rows=17995260 width=41) (actual time=13.648..2138.741 rows=17995174 loops=1)
Buffers: shared hit=82 read=168098
Planning Time: 0.339 ms
Execution Time: 2730.355 ms
When I add a LIMIT 1.000.000 to table_5 (it contains 1.7 million rows)
Limit (cost=0.00..19345.79 rows=1000000 width=41) (actual time=0.007..131.939 rows=1000000 loops=1)
Buffers: shared hit=9346
-> Seq Scan on table_5(cost=0.00..348132.60 rows=17995260 width=41) (actual time=0.006..68.635 rows=1000000 loops=1)
Buffers: shared hit=9346
Planning Time: 0.048 ms
Execution Time: 164.133 ms
When I add a WHERE clause between 2 dates (I'm monitored the query below with DataDog software and the results are here (max.~ 31K rows/sec when fetching): https://www.screencast.com/t/yV0k4ShrUwSd):
Seq Scan on table_5 (cost=0.00..438108.90 rows=17862027 width=41) (actual time=0.026..2070.047 rows=17866766 loops=1)
Filter: (('2018-01-01 00:00:00+04'::timestamp with time zone < matchdate) AND (matchdate < '2020-01-01 00:00:00+04'::timestamp with time zone))
Rows Removed by Filter: 128408
Buffers: shared hit=168180
Planning Time: 14.820 ms
Execution Time: 2673.171 ms
All tables has an unique index on the c3 column.
The size of the database is like 500GB in total.
The server has 16 cores and 112GB M2 memory.
I have tried to optimize Postgres system variables - Like: WorkMem(1GB), shared_buffer(50GB), effective_cache_size (20GB) - But it doesn't seems to change anything (I know the settings has been applied - because I can see a big difference in the amount of idle memory the server has allocated).
I know the database is too big for all data to be in memory. But is there anything I can do to boost the performance / speed of my query?
Make sure CreatedDate is indexed.
Make sure CreatedDate is using the date column type. This will be more efficient on storage (just 4 bytes), performance, and you can use all the built in date formatting and functions.
Avoid select * and only select the columns you need.
Use YYYY-MM-DD ISO 8601 format. This has nothing to do with performance, but it will avoid a lot of ambiguity.
The real problem is likely that you have thousands of tables with which you regularly make unions of hundreds of tables. This indicates a need to redesign your schema to simplify your queries and get better performance.
Unions and date change checks suggest a lot of redundancy. Perhaps you've partitioned your tables by date. Postgres has its own built in table partitioning which might help.
Without more detail that's all I can say. Perhaps ask another question about your schema.
Without seeing EXPLAIN (ANALYZE, BUFFERS), all we can do is speculate.
But we can do some pretty good speculation.
Cluster the tables on the index on CreatedDate. This will allow the data to be accessed more sequentially, allowing more read-ahead (but this might not help much for some kinds of storage). If the tables have high write load, they may not stay clustered and so you would have recluster them occasionally. If they are static, this could be a one-time event.
Get more RAM. If you want to perform as if all the data was in memory, then get all the data into memory.
Get faster storage, like top-notch SSD. It isn't as fast as RAM, but much faster than HDD.
Related
We have a large table (265M rows), which has about 50 columns in total, mostly either integers, dates, or varchars. The table has a primary key defined on an autoincremented column.
A query loads temp table with Pk values (say, 10000-20000 rows) and then the large table is queried by being joined to the temp table.
Average size of the row in the large table is fairly consistent and is around 280 bytes.
Here is the query plan it produces, when running:
Nested Loop (cost=0.57..115712.21 rows=13515 width=372) (actual time=0.016..54797.581 rows=11838 loops=1)
Buffers: shared hit=49960 read=9261, local hit=53
-> Seq Scan on t_ids ti (cost=0.00..188.15 rows=13515 width=4) (actual time=0.006..6.993 rows=11838 loops=1)
Buffers: local hit=53
-> Index Scan using test_pk on test t (cost=0.57..8.55 rows=1 width=368) (actual time=4.624..4.624 rows=1 loops=11838)
Index Cond: (pk = ti.pk)
Buffers: shared hit=49960 read=9261
Planning Time: 0.128 ms
Execution Time: 54801.600 ms
... where test is the large table (actually, clustered by pk) and t_ids is the temporary table.
It seems to be doing the right thing - scanning temp table and hitting the large table on the pk index, 11k times... But it is sooo slow....
Any suggestions on what can be tried to make it run faster are gretly appreciated!
My SysAdmin colleague told me that the Postgres'hosts use NVME disk.
How can I check that with Linux command?
Why does the planner/optimizer seem to get it wrong when I set random_page_cost =1.1
To bench it, I created table t_1(id int):
CREATE TABLE t_1 (id int);
In the id column, I inserted random numbers between 1 and 10,000,000. I inserted 10 millions rows.
Then:
CREATE INDEX ON t_1(id);
SET random_page_cost =1.1;
The query, using a Seq Scan, looks like this:
EXPLAIN (ANALYZE,BUFFERS) SELECT * from t_1 where id > 1600000;
And the result like this:
Seq Scan on t_1 (cost=0.00..169248.00 rows=8361406 width=4) (actual time=0.010..917.250 rows=8399172 loops=1)
Filter: (id > 1600000)
Rows Removed by Filter: 1600828
Buffers: shared hit=44248
Planning Time: 0.079 ms
Execution Time: 1301.160 ms
(6 lignes)
The query, using a Seq Scan, looks like that:
EXPLAIN (ANALYZE,BUFFERS) SELECT * from t_1 where id > 1700000;
And the result like that:
Index Only Scan using t_1_id_idx on t_1 (cost=0.43..166380.65 rows=8258658 width=4) (actual time=0.027..1309.272 rows=8300257 loops=1)
Index Cond: (id > 1700000)
Heap Fetches: 0
Buffers: shared hit=3176619
Planning Time: 0.070 ms
Execution Time: 1698.532 ms
(6 lignes)
On the internet, I read that it is recommended to set the random_page_cost ~= seq_page_cost because random accesses are as fast as sequential accesses with NVME disks.
Using an index requires reading random pages from disk; whereas reading the full table gets to read pages sequentially.
By decreasing the random_page_cost parameter, I notice that the planner favors the use of Indexes as expected, but to my surprise the execution time deteriorates.
My manager ask me to bench the random_page_cost parameter. How can I bench random_page_cost ?
I have a table with 4707838 rows. When I run the following query on this table it takes around 9 seconds to execute.
SELECT json_agg(
json_build_object('accessorId',
p."accessorId",
'mobile',json_build_object('enabled', p.mobile,'settings',
json_build_object('proximityAccess', p."proximity",
'tapToAccess', p."tapToAccess",
'clickToAccessRange', p."clickToAccessRange",
'remoteAccess',p."remote")
),'
card',json_build_object('enabled',p."card"),
'fingerprint',json_build_object('enabled',p."fingerprint"))
) AS permissions
FROM permissions AS p
WHERE p."accessPointId"=99
The output of explain analyze is as follows:
Aggregate (cost=49860.12..49860.13 rows=1 width=32) (actual time=9011.711..9011.712 rows=1 loops=1)
Buffers: shared read=29720
I/O Timings: read=8192.273
-> Bitmap Heap Scan on permissions p (cost=775.86..49350.25 rows=33991 width=14) (actual time=48.886..8704.470 rows=36556 loops=1)
Recheck Cond: ("accessPointId" = 99)
Heap Blocks: exact=29331
Buffers: shared read=29720
I/O Timings: read=8192.273
-> Bitmap Index Scan on composite_key_accessor_access_point (cost=0.00..767.37 rows=33991 width=0) (actual time=38.767..38.768 rows=37032 loops=1)
Index Cond: ("accessPointId" = 99)
Buffers: shared read=105
I/O Timings: read=32.592
Planning Time: 0.142 ms
Execution Time: 9012.719 ms
This table has a btree index on accessorId column and composite index on (accessorId,accessPointId).
Can anyone tell me what could be the reason for this query to be slow even though it uses an index?
Over 90% of the time is waiting to get data from disk. At 3.6 ms per read, that is pretty fast for a harddrive (suggesting that much of the data was already in the filesystem cache, or that some of the reads brought in neighboring data that was also eventually required--that is sequential reads not just random reads) but slow for a SSD.
If you set enable_bitmapscan=off and clear the cache (or pick a not recently used "accessPointId" value) what performance do you get?
How big is the table? If you are reading a substantial fraction of the table and think you are not getting as much benefit from sequential reads as you should be, you can try making your OSes readahead settings more aggressive. On Linux that is something like sudo blockdev --setra ...
You could put all columns referred to by the query into the index, to enable index-only scans. But given the number of columns you are using that might be impractical. You could want "accessPointId" to be the first column in the index. By the way, is the index currently used really on (accessorId,accessPointId)? It looks to me like "accessPointId" is really the first column in that index, not the 2nd one.
You could cluster the table by an index which has "accessPointId" as the first column. That would group the related records together for faster access. But note it is a slow operation and takes a strong lock on the table while it is running, and future data going into the table won't be clustered, only the current data.
You could try to increase effective_io_concurrency so that you can have multiple io requests outstanding at a time. How effective this is will depend on your hardware.
I have a table named snapshots with a column named data in jsonb format
An Index is created on snapshots table
create index on snapshots using(( data->>'creator' ));
The following query was using index initially but not after couple of days
SELECT id, data - 'sections' - 'sharing' AS data FROM snapshots WHERE data->>'creator' = 'abc#email.com' ORDER BY xmin::text::bigint DESC
below is the output by running explain analyze
Sort (cost=19.10..19.19 rows=35 width=77) (actual time=292.159..292.163 rows=35 loops=1)
Sort Key: (((xmin)::text)::bigint) DESC
Sort Method: quicksort Memory: 30kB
-> Seq Scan on snapshots (cost=0.00..18.20 rows=35 width=77) (actual time=3.500..292.104 rows=35 loops=1)
Filter: ((data ->> 'creator'::text) = 'abc#email.com'::text)
Rows Removed by Filter: 152
Planning Time: 0.151 ms
Execution Time: 292.198 ms
A table with 187 rows is very small. For very small tables, a sequential scan is the most efficient strategy.
What is surprising here is the long duration of the query execution (292 milliseconds!). Unless you have incredibly lame or overloaded storage, this must mean that the table is extremely bloated – it is comparatively large, but almost all pages are empty, with only 187 live rows. You should rewrite the table to compact it:
VACUUM (FULL) snapshots;
Then the query will become must faster.
The below query is taking more time to run. How can I optimize the below query to run for more records? I have run Explain Analyze for this query. Attached the output for the same.
This was the existing query created as a View and taking a long time (more than hours) to return the result.
I have done vacuum, analyze and reindex on these 2 tables but no luck.
select st_tr.step_trail_id,
st_tr.test_id,
st_tr.trail_id,
st_tr.step_name,
filter.regular_expression as filter_expression,
filter.order_of_occurrence as filter_order,
filter.match_type as filter_match_type,
null as begins_with,
null as ends_with,
null as input_source,
null as pattern_expression,
null as pattern_matched,
null as pattern_status,
null as pattern_order,
'filter' as record_type
from tab_report_step st_tr,
tab_report_filter filter
where st_tr.st_tr_id = filter.st_tr_id)
Query plan:
Hash Join (cost=446852.58..1176380.76 rows=6353676 width=489) (actual time=16641.953..47270.831 rows=6345360 loops=1)
Buffers: shared hit=1 read=451605 dirtied=5456 written=5424, temp read=154080 written=154074
-> Seq Scan on tab_report_filter filter (cost=0..24482.76 rows=6353676 width=161) (actual time=0.041..8097.233 rows=6345360 loops=1)
Buffers: shared read=179946 dirtied=4531 written=4499
-> Hash (cost=318817.7..318817.7 rows=4716070 width=89) (actual time=16627.291..16627.291 rows=4709040 loops=1)
Buffers: shared hit=1 read=271656 dirtied=925 written=925, temp written=47629
-> Seq Scan on tab_report_step st_tr (cost=0..318817.7 rows=4716070 width=89) (actual time=0.059..10215.484 rows=4709040 loops=1)
Buffers: shared hit=1 read=271656 dirtied=925 written=925
You have not run VACUUM on these tables. Perhaps VACUUM (FULL), but certainly not VACUUM.
There are two things that can be improved:
Make sure that no pages have to be dirtied or written while you read them. That is most likely because this is the first time you read the rows, and PostgreSQL sets hint bits.
Running VACUUM (without FULL) would have fix that. Also, if you repeat the experiment, you shouldn't get those dirtied and written buffers any more.
Give the query more memory by increasing work_mem. The hash does not fit in work_mem and spills to disk, which causes extra disk reads and writes, which is bad for performance.
Since you join two big tables with no restricting WHERE conditions and have a lot of result rows, this query will never be fast.