postgres function update large table and deletion performance - postgresql

postgres version: 9.3
postgres.conf: all default configurations
I have 2 tables, A and B,both have 1 million rows.
There is a postgres function that will execute every 2 seconds, it will update Table A where ids in an array(array size = 20), and then delete the rows in Table B.
DB function shows as below:
CREATE OR REPLACE FUNCTION test_function (ids NUMERIC[])
RETURNS void AS $$
BEGIN
UPDATE A a
SET status = 'begin', end_time = (NOW() AT TIME ZONE 'UTC')
WHERE a.id = ANY (ids);
DELETE FROM B b
WHERE b.aid = ANY (ids)
AND b.status = 'end';
END;
$$ LANGUAGE plpgsql;
Analysis shows as below:
explain(ANALYZE,BUFFERS,VERBOSE) select test_function('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Result (cost=0.00..0.26 rows=1 width=0) (actual time=14030.435..14030.436 rows=1 loops=1)
Output: test_function('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::numeric[])
Buffers: shared hit=24297 read=26137 dirtied=20
Total runtime: 14030.444 ms
(4 rows)
My Question is:
In the production environment, why this function need to execute at most 7 seconds before success;
When this function is executing, this process will eats up to 60%. --> This is the key problem
EDIT:
Analyze each single sql:
explain(ANALYZE,VERBOSE,BUFFERS) UPDATE A a SET status = 'begin',
end_time = (now()) WHERE a.id = ANY
('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Update on public.A a (cost=0.45..99.31 rows=20 width=143) (actual time=1.206..1.206 rows=0 loops=1)
Buffers: shared hit=206 read=27 dirtied=30
-> Index Scan using A_pkey on public.a a (cost=0.45..99.31 rows=20 width=143) (actual time=0.019..0.116 rows=19 loops=1)
Output: id, start_time, now(), 'begin'::character varying(255), xxxx... ctid
Index Cond: (t.id = ANY('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::integer[]))
Buffers: shared hit=75 read=11
Trigger test_trigger: time=5227.111 calls=1
Total runtime: 5228.357 ms
(8 rows)
explain(ANALYZE,BUFFERS,VERBOSE) DELETE FROM
B b WHERE tq.aid = ANY
('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Delete on B b (cost=0.00..1239.11 rows=20 width=6) (actual time=6.013..6.013 rows=0 loops=1)
Buffers: shared hit=448
-> Seq Scan on B b (cost=0.00..1239.11 rows=20 width=6) (actual time=6.011..6.011 rows=0 loops=1)
Output: ctid
Filter: (b.aid = ANY ('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::bigint[]))
Rows Removed by Filter: 21743
Buffers: shared hit=448
Total runtime: 6.029 ms
(8 rows)
CPU usage
Before calling:
Afer frequently operations:

Related

Update statement on PostgreSQL not using primary key index to update

I have a stored procedure on PostgreSQL like this:
create or replace procedure new_emp_sp (f_name varchar, l_name varchar, age integer, threshold integer, dept varchar)
language plpgsql
as $$
declare
new_emp_count integer;
begin
INSERT INTO employees (id, first_name, last_name, age)
VALUES (nextval('emp_id_seq'),
random_string(10),
random_string(20),
age);
select count(*) into new_emp_count from employees where age > threshold;
update dept_employees set emp_count = new_emp_count where id = dept;
end; $$
I have enabled auto_explain module and set log_min_duration to 0 so that it logs everything.
I have an issue with the update statement in the procedure. From the auto_explain logs I see that it is not using the primary key index to update the table:
-> Seq Scan on dept_employees (cost=0.00..1.05 rows=1 width=14) (actual time=0.005..0.006 rows=1 loops=1)
Filter: ((id)::text = 'ABC'::text)
Rows Removed by Filter: 3
This worked as expected until a couple of hours ago and I used to get a log like this:
-> Index Scan using dept_employees_pkey on dept_employees (cost=0.15..8.17 rows=1 width=48) (actual time=0.010..0.011 rows=1 loops=1)
Index Cond: ((id)::text = 'ABC'::text)
Without the procedure, if I run the statement standalone like this:
explain analyze update dept_employees set emp_count = 123 where id = 'ABC';
The statement correctly uses the primary key index:
Update on dept_employees (cost=0.15..8.17 rows=1 width=128) (actual time=0.049..0.049 rows=0 loops=1)
-> Index Scan using dept_employees_pkey on dept_employees (cost=0.15..8.17 rows=1 width=128) (actual time=0.035..0.036 rows=1 loops=1)
Index Cond: ((id)::text = 'ABC'::text)
I can't figure out what has gone wrong especially because it worked perfectly just a couple of hours ago.
It is faster to scan N rows sequentially than to scan N rows using an index. So for small tables Postgres may decide that a sequence scan is faster than an index scan.
PL/pgSQL can cache prepared statements and execution plans, so you're probably getting a cached execution plan from when the table was smaller.

Postgres function slower than same ad hoc query

I have had several cases where a Postgres function that returns a table result from a query is much slower than running the actual query. Why is that?
This is one example, but I've found that function is slower than just the query in many cases.
create function trending_names(date_start timestamp with time zone, date_end timestamp with time zone, gender_filter character, country_filter text)
returns TABLE(name_id integer, gender character, country text, score bigint, rank bigint)
language sql
as
$$
select u.name_id,
n.gender,
u.country,
count(u.rank) as score,
row_number() over (order by count(u.rank) desc) as rank
from babynames.user_scores u
inner join babynames.names n on u.name_id = n.id
where u.created_at between date_start and date_end
and u.rank > 0
and n.gender = gender_filter
and u.country = country_filter
group by u.name_id, n.gender, u.country
$$;
This is the query plan for a select from the function:
Function Scan on trending_names (cost=0.25..10.25 rows=1000 width=84) (actual time=1118.673..1118.861 rows=2238 loops=1)
Buffers: shared hit=216509 read=29837
Planning Time: 0.078 ms
Execution Time: 1119.083 ms
Query plan from just running the query. This takes less than half the time.
WindowAgg (cost=44834.98..45593.32 rows=43334 width=25) (actual time=383.387..385.223 rows=2238 loops=1)
Planning Time: 2.512 ms
Execution Time: 387.403 ms
Buffers: shared hit=100446 read=50220
-> Sort (cost=44834.98..44943.31 rows=43334 width=17) (actual time=383.375..383.546 rows=2238 loops=1)
Sort Method: quicksort Memory: 271kB
Sort Key: (count(u.rank)) DESC
Buffers: shared hit=100446 read=50220
-> HashAggregate (cost=41064.22..41497.56 rows=43334 width=17) (actual time=381.088..381.906 rows=2238 loops=1)
" Group Key: u.name_id, u.country, n.gender"
Buffers: shared hit=100446 read=50220
-> Hash Join (cost=5352.15..40630.88 rows=43334 width=13) (actual time=60.710..352.646 rows=36271 loops=1)
Hash Cond: (u.name_id = n.id)
Buffers: shared hit=100446 read=50220
-> Index Scan using user_scores_rank_ix on user_scores u (cost=0.43..35077.55 rows=76796 width=11) (actual time=24.193..287.393 rows=69770 loops=1)
-> Hash (cost=5005.89..5005.89 rows=27667 width=6) (actual time=36.420..36.420 rows=27472 loops=1)
Rows Removed by Filter: 106521
Index Cond: (rank > 0)
Filter: ((created_at >= '2021-01-01 00:00:00+00'::timestamp with time zone) AND (country = 'sv'::text) AND (created_at <= now()))
Buffers: shared hit=99417 read=46856
Buffers: shared hit=1029 read=3364
Buckets: 32768 Batches: 1 Memory Usage: 1330kB
-> Seq Scan on names n (cost=0.00..5005.89 rows=27667 width=6) (actual time=0.022..24.447 rows=27472 loops=1)
Rows Removed by Filter: 21559
Filter: (gender = 'f'::bpchar)
Buffers: shared hit=1029 read=3364
I'm also confused on why it does a Seq scan on names n in the last step since names.id is the primary key and gender is indexed.

PostgreSQL slow order

I have table (over 100 millions records) on PostgreSQL 13.1
CREATE TABLE report
(
id serial primary key,
license_plate_id integer,
datetime timestamp
);
Indexes (for test I create both of them):
create index report_lp_datetime_index on report (license_plate_id, datetime);
create index report_lp_datetime_desc_index on report (license_plate_id desc, datetime desc);
So, my question is why query like
select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75)
order by datetime desc
limit 100
Is very slow (~10sec). But query without order statement is fast (milliseconds).
Explain:
explain (analyze, buffers, format text) select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34, 75,374,57123)
limit 100
Limit (cost=0.57..400.38 rows=100 width=316) (actual time=0.037..0.216 rows=100 loops=1)
Buffers: shared hit=103
-> Index Scan using report_lp_id_idx on report r (cost=0.57..44986.97 rows=11252 width=316) (actual time=0.035..0.202 rows=100 loops=1)
Index Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Buffers: shared hit=103
Planning Time: 0.228 ms
Execution Time: 0.251 ms
explain (analyze, buffers, format text) select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75,374,57123)
order by datetime desc
limit 100
Limit (cost=44193.63..44193.88 rows=100 width=316) (actual time=4921.030..4921.047 rows=100 loops=1)
Buffers: shared hit=11455 read=671
-> Sort (cost=44193.63..44221.76 rows=11252 width=316) (actual time=4921.028..4921.035 rows=100 loops=1)
Sort Key: datetime DESC
Sort Method: top-N heapsort Memory: 128kB
Buffers: shared hit=11455 read=671
-> Bitmap Heap Scan on report r (cost=151.18..43763.59 rows=11252 width=316) (actual time=54.422..4911.927 rows=12148 loops=1)
Recheck Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Heap Blocks: exact=12063
Buffers: shared hit=11455 read=671
-> Bitmap Index Scan on report_lp_id_idx (cost=0.00..148.37 rows=11252 width=0) (actual time=52.631..52.632 rows=12148 loops=1)
Index Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Buffers: shared hit=59 read=4
Planning Time: 0.427 ms
Execution Time: 4921.128 ms
You seem to have rather slow storage, if reading 671 8kB-blocks from disk takes a couple of seconds.
The way to speed this up is to reorder the table in the same way as the index, so that you can find the required rows in the same or adjacent table blocks:
CLUSTER report_lp_id_idx USING report_lp_id_idx;
Be warned that rewriting the table in this way causes downtime – the table will not be available while it is being rewritten. Moreover, PostgreSQL does not maintain the table order, so subsequent data modifications will cause performance to gradually deteriorate, so that after a while you will have to run CLUSTER again.
But if you need this query to be fast no matter what, CLUSTER is the way to go.
Your two indices do exactly the same thing, so you can remove the second one, it's useless.
To optimize your query, the order of the fields inside the index must be reversed:
create index report_lp_datetime_index on report (datetime,license_plate_id);
BEGIN;
CREATE TABLE foo (d INTEGER, i INTEGER);
INSERT INTO foo SELECT random()*100000, random()*1000 FROM generate_series(1,1000000) s;
CREATE INDEX foo_d_i ON foo(d DESC,i);
COMMIT;
VACUUM ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100;
Limit (cost=0.42..343.92 rows=100 width=8) (actual time=0.076..9.359 rows=100 loops=1)
-> Index Only Scan Backward using foo_d_i on foo (cost=0.42..40976.43 rows=11929 width=8) (actual time=0.075..9.339 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9016
Heap Fetches: 0
Planning Time: 0.339 ms
Execution Time: 9.387 ms
Note the index is not used to optimize the WHERE clause. It is used here as a compact and fast way to store references to the rows ordered by date DESC, so the ORDER BY can do an index-only scan and avoid sorting. By adding column id to the index, an index-only scan can be performed to test the condition on id, without hitting the table for every row. Since there is a low LIMIT value it does not need to scan the whole index, it only scans it in date DESC order until it finds enough rows satisfying the WHERE condition to return the result.
It will be faster if you create the index in date DESC order, this could be useful if you use ORDER BY date DESC + LIMIT in other queries too.
You forget that OP's table has a third column, and he is using SELECT *. So that wouldn't be an index-only scan.
Easy to work around. The optimum way to do this query would be an index-only scan to filter on WHERE conditions, then LIMIT, then hit the table to get the rows. For some reason if "select *" is used postgres takes the id column from the table instead of taking it from the index, which results in lots of unnecessary heap fetches for rows whose id is rejected by the WHERE condition.
Easy to work around, by doing it manually. I've also added another bogus column to make sure the SELECT * hits the table.
EXPLAIN (ANALYZE,buffers) SELECT * FROM foo
JOIN (SELECT d,i FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100) f USING (d,i)
ORDER BY d DESC LIMIT 100;
Limit (cost=0.85..1281.94 rows=1 width=17) (actual time=0.052..3.618 rows=100 loops=1)
Buffers: shared hit=453
-> Nested Loop (cost=0.85..1281.94 rows=1 width=17) (actual time=0.050..3.594 rows=100 loops=1)
Buffers: shared hit=453
-> Limit (cost=0.42..435.44 rows=100 width=8) (actual time=0.037..2.953 rows=100 loops=1)
Buffers: shared hit=53
-> Index Only Scan using foo_d_i on foo foo_1 (cost=0.42..51936.43 rows=11939 width=8) (actual time=0.037..2.935 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9010
Heap Fetches: 0
Buffers: shared hit=53
-> Index Scan using foo_d_i on foo (cost=0.42..8.45 rows=1 width=17) (actual time=0.005..0.005 rows=1 loops=100)
Index Cond: ((d = foo_1.d) AND (i = foo_1.i))
Buffers: shared hit=400
Execution Time: 3.663 ms
Another option is to just add the primary key to the date,license_plate index.
SELECT * FROM foo JOIN (SELECT id FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100) f USING (id) ORDER BY d DESC LIMIT 100;
Limit (cost=1357.98..1358.23 rows=100 width=17) (actual time=3.920..3.947 rows=100 loops=1)
Buffers: shared hit=473
-> Sort (cost=1357.98..1358.23 rows=100 width=17) (actual time=3.919..3.931 rows=100 loops=1)
Sort Key: foo.d DESC
Sort Method: quicksort Memory: 32kB
Buffers: shared hit=473
-> Nested Loop (cost=0.85..1354.66 rows=100 width=17) (actual time=0.055..3.858 rows=100 loops=1)
Buffers: shared hit=473
-> Limit (cost=0.42..509.41 rows=100 width=8) (actual time=0.039..3.116 rows=100 loops=1)
Buffers: shared hit=73
-> Index Only Scan using foo_d_i_id on foo foo_1 (cost=0.42..60768.43 rows=11939 width=8) (actual time=0.039..3.093 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9010
Heap Fetches: 0
Buffers: shared hit=73
-> Index Scan using foo_pkey on foo (cost=0.42..8.44 rows=1 width=17) (actual time=0.006..0.006 rows=1 loops=100)
Index Cond: (id = foo_1.id)
Buffers: shared hit=400
Execution Time: 3.972 ms
Edit
After thinking about it... since the LIMIT restricts the output to 100 rows ordered by date desc, wouldn't it be nice if we could get the 100 most recent rows for each license_plate_id, put all that into a top-n sort, and only keep the best 100 for all license_plate_ids? That would avoid reading and throwing away a lot of rows from the index. Even if that's much faster than hitting the table, it will still load up these index pages in RAM and clog up your buffers with stuff you don't actually need to keep in cache. Let's use LATERAL JOIN:
EXPLAIN (ANALYZE,BUFFERS)
SELECT * FROM foo
JOIN (SELECT d,i FROM
(VALUES (1),(2),(4),(5),(6),(7),(8),(10),(15),(22),(34),(75)) idlist
CROSS JOIN LATERAL
(SELECT d,i FROM foo WHERE i=idlist.column1 ORDER BY d DESC LIMIT 100) f2
ORDER BY d DESC LIMIT 100
) f3 USING (d,i)
ORDER BY d DESC LIMIT 100;
It's even faster: 2ms, and it uses the index on (license_plate_id,date) instead of the other way around. Also, and this is important, since each subquery in the lateral hits only the index pages that contain rows that will actually be selected, while the previous queries hit much more index pages. So you save on RAM buffers.
If you don't need the index on (date,license_plate_id) and don't want to keep a useless index, that could be interesting since this query doesn't use it. On the other hand, if you need the index on (date,license_plate_id) for something else and want to keep it, then... maybe not.
Please post results for the winning query 🔥

Postgres optimization failing to filter window function partitions early

In some cases, PostgreSQL does not filter out window function partitions until they are calculated, while in a very similar scenario PostgreSQL filters row before performing window function calculation.
Tables used for minimal STR - log is the main data table, each row contains either increment or absolute value. Absolute value resets the current counter with a new base value. Window functions need to process all logs for a given account_id to calculate the correct running total. View uses a subquery to ensure that underlying log rows are not filtered by ts, otherwise, this would break the window function.
CREATE TABLE account(
id serial,
name VARCHAR(100)
);
CREATE TABLE log(
id serial,
absolute int,
incremental int,
account_id int,
ts timestamp,
PRIMARY KEY(id),
CONSTRAINT fk_account
FOREIGN KEY(account_id)
REFERENCES account(id)
);
CREATE FUNCTION get_running_total_func(
aggregated_total int,
absolute int,
incremental int
) RETURNS int
LANGUAGE sql IMMUTABLE CALLED ON NULL INPUT AS
$$
SELECT
CASE
WHEN absolute IS NOT NULL THEN absolute
ELSE COALESCE(aggregated_total, 0) + incremental
END
$$;
CREATE AGGREGATE get_running_total(integer, integer) (
sfunc = get_running_total_func,
stype = integer
);
Slow view:
CREATE VIEW test_view
(
log_id,
running_value,
account_id,
ts
)
AS
SELECT log_running.* FROM
(SELECT
log.id,
get_running_total(
log.absolute,
log.incremental
)
OVER(
PARTITION BY log.account_id
ORDER BY log.ts RANGE UNBOUNDED PRECEDING
),
account.id,
ts
FROM log log JOIN account account ON log.account_id=account.id
) AS log_running;
CREATE VIEW
postgres=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE account_id=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on log_running (cost=12734.02..15981.48 rows=1 width=20) (actual time=7510.851..16122.404 rows=20 loops=1)
Filter: (log_running.id_1 = 1)
Rows Removed by Filter: 99902
-> WindowAgg (cost=12734.02..14732.46 rows=99922 width=32) (actual time=7510.830..14438.783 rows=99922 loops=1)
-> Sort (cost=12734.02..12983.82 rows=99922 width=28) (actual time=7510.628..9312.399 rows=99922 loops=1)
Sort Key: log.account_id, log.ts
Sort Method: external merge Disk: 3328kB
-> Hash Join (cost=143.50..2042.24 rows=99922 width=28) (actual time=169.941..5431.650 rows=99922 loops=1)
Hash Cond: (log.account_id = account.id)
-> Seq Scan on log (cost=0.00..1636.22 rows=99922 width=24) (actual time=0.063..1697.802 rows=99922 loops=1)
-> Hash (cost=81.00..81.00 rows=5000 width=4) (actual time=169.837..169.865 rows=5000 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 240kB
-> Seq Scan on account (cost=0.00..81.00 rows=5000 width=4) (actual time=0.017..84.639 rows=5000 loops=1)
Planning Time: 0.199 ms
Execution Time: 16127.275 ms
(15 rows)
Fast view - only change is account.id -> log.account_id (!):
CREATE VIEW test_view
(
log_id,
running_value,
account_id,
ts
)
AS
SELECT log_running.* FROM
(SELECT
log.id,
get_running_total(
log.absolute,
log.incremental
)
OVER(
PARTITION BY log.account_id
ORDER BY log.ts RANGE UNBOUNDED PRECEDING
),
log.account_id,
ts
FROM log log JOIN account account ON log.account_id=account.id
) AS log_running;
CREATE VIEW
postgres=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE account_id=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on log_running (cost=1894.96..1895.56 rows=20 width=20) (actual time=34.718..45.958 rows=20 loops=1)
-> WindowAgg (cost=1894.96..1895.36 rows=20 width=28) (actual time=34.691..45.307 rows=20 loops=1)
-> Sort (cost=1894.96..1895.01 rows=20 width=24) (actual time=34.367..35.925 rows=20 loops=1)
Sort Key: log.ts
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.28..1894.53 rows=20 width=24) (actual time=0.542..34.066 rows=20 loops=1)
-> Index Only Scan using account_pkey on account (cost=0.28..8.30 rows=1 width=4) (actual time=0.025..0.054 rows=1 loops=1)
Index Cond: (id = 1)
Heap Fetches: 1
-> Seq Scan on log (cost=0.00..1886.03 rows=20 width=24) (actual time=0.195..32.937 rows=20 loops=1)
Filter: (account_id = 1)
Rows Removed by Filter: 99902
Planning Time: 0.297 ms
Execution Time: 47.300 ms
(14 rows)
Is it a bug in PostgreSQL implementation? It seems that this change in view definition shouldn't affect performance at all, PostgreSQL should be able to filter data before applying window function for all data set.

Generic execution plan in query language (SQL) function

I am trying to figure out an issue related to poor performance in a parameterized SQL function. The documentation is relatively clear about execution plans in PL/PGSQL functions - the query planner plans the executions the first 5 times and then may cache the plan after that within the same session.
However, I have been unable to find similar documentation related to SQL functions and, from some investigating, it appears that the query planner surprisingly always uses a generic plan.
Here's a simple example:
-- Postgres 11.1
-- Test table and some data
create table test (a int);
insert into test select 1 from generate_series(1,1000000);
insert into test values (2);
create index on test(a);
analyze test;
-- A SQL Function
create function test_f(val int, cnt out bigint) AS $$
SELECT count(*) FROM test where a = val;
$$ LANGUAGE SQL;
-- A similar PL/PGSQL Function
create function test_f_plpgsql (val int, cnt out bigint) AS $$
BEGIN
SELECT count(*) FROM test where a = val INTO cnt;
END
$$ LANGUAGE PLPGSQL;
-- Show plans
LOAD 'auto_explain';
SET auto_explain.log_analyze = on;
SET client_min_messages = log;
SET auto_explain.log_nested_statements = on;
SET auto_explain.log_min_duration = 0;
The PL/PGSQL function uses a non-generic plan, and knows to use an index-only scan.
select * from test_f_plpgsql(2);
LOG: duration: 0.326 ms plan:
Query Text: SELECT count(*) FROM test where a = val
Aggregate (cost=4.45..4.46 rows=1 width=8) (actual time=0.180..0.202 rows=1 loops=1)
-> Index Only Scan using test_a_idx on test (cost=0.42..4.44 rows=1 width=0) (actual time=0.096..0.135 rows=1 loops=1)
Index Cond: (a = 2)
Heap Fetches: 1
LOG: duration: 1.250 ms plan:
Query Text: select * from test_f_plpgsql(2);
Function Scan on test_f_plpgsql (cost=0.25..0.26 rows=1 width=8) (actual time=1.116..1.152 rows=1 loops=1)
cnt
-----
1
(1 row)
The SQL function, on the other hand, uses a generic plan, which poorly chooses a full table scan.
select * from test_f(2);
LOG: duration: 18.716 ms plan:
Query Text:
SELECT count(*) FROM test where a = val;
Partial Aggregate (cost=10675.01..10675.02 rows=1 width=8) (actual time=18.639..18.665 rows=1 loops=1)
-> Parallel Seq Scan on test (cost=0.00..9633.34 rows=416667 width=0) (actual time=18.621..18.628 rows=0 loops=1)
Filter: (a = $1)
Rows Removed by Filter: 273008
LOG: duration: 28.304 ms plan:
Query Text:
SELECT count(*) FROM test where a = val;
Partial Aggregate (cost=10675.01..10675.02 rows=1 width=8) (actual time=28.234..28.248 rows=1 loops=1)
-> Parallel Seq Scan on test (cost=0.00..9633.34 rows=416667 width=0) (actual time=28.199..28.208 rows=1 loops=1)
Filter: (a = $1)
Rows Removed by Filter: 129222
LOG: duration: 45.913 ms plan:
Query Text:
SELECT count(*) FROM test where a = val;
Finalize Aggregate (cost=11675.22..11675.23 rows=1 width=8) (actual time=42.370..42.377 rows=1 loops=1)
-> Gather (cost=11675.01..11675.22 rows=2 width=8) (actual time=42.288..45.787 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=10675.01..10675.02 rows=1 width=8) (actual time=29.579..29.597 rows=1 loops=3)
-> Parallel Seq Scan on test (cost=0.00..9633.34 rows=416667 width=0) (actual time=29.553..29.560 rows=0 loops=3)
Filter: (a = $1)
Rows Removed by Filter: 333333
LOG: duration: 47.128 ms plan:
Query Text: select * from test_f(2);
Function Scan on test_f (cost=0.25..0.26 rows=1 width=8) (actual time=47.058..47.073 rows=1 loops=1)
cnt
-----
1
(1 row)
Is there a way to force the SQL function to use a non-generic plan?