Postgres index not used

Postgres 9.5
I have a table with a jsonb column:
CREATE TABLE public.test
(
objectstate jsonb
)
and index on it:
CREATE INDEX "test.type"
ON public.test
USING btree
((objectstate ->> 'type'::text) COLLATE pg_catalog."default");
I also have a function returning dependent types. It's more complex in reality, so I'll give a simplified example:
CREATE OR REPLACE FUNCTION testfunc(sxtype text)
RETURNS text AS
$BODY$
BEGIN
return '{type1, type2}';
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
Now, here is what I've got:
select testfunc('type1') gives me '{type1, type2}'
The next query works well and DOES use the index:
select * from test where objectstate->>'type' = ANY('{type1, type2}'::text[])
But when I try to combine them, the index is not used:
select * from test
where objectstate->>'type' = ANY((select testfunc('type1'))::text[])
The weird thing is that the next query DOES use the index again! (But I can't use this workaround everywhere.)
select * from test
where objectstate->>'type' = ANY((select testfunc('type1'))::text[])
order by objectstate->>'type'
explain analyze gives me:
"Seq Scan on test (cost=0.26..530872.27 rows=2238634 width=743) (actual time=1107.155..7992.825 rows=129 loops=1)"
" Filter: ((test ->> 'type'::text) = ANY (($0)::text[]))"
" Rows Removed by Filter: 4063727"
" InitPlan 1 (returns $0)"
" -> Result (cost=0.00..0.26 rows=1 width=0) (actual time=0.718..0.718 rows=1 loops=1)"
"Planning time: 0.319 ms"
"Execution time: 7992.870 ms"
and when ORDER BY is applied:
"Index Scan using "test.type" on test (cost=0.70..545058.44 rows=2238634 width=743) (actual time=0.645..0.740 rows=129 loops=1)"
" Index Cond: ((objectstate ->> 'type'::text) = ANY (($0)::text[]))"
" InitPlan 1 (returns $0)"
" -> Result (cost=0.00..0.26 rows=1 width=0) (actual time=0.617..0.617 rows=1 loops=1)"
"Planning time: 0.300 ms"
"Execution time: 0.782 ms"
Any ideas how I can force Postgres to use the index without adding ORDER BY?

It may not be a full answer, but it seems you can change the function definition from VOLATILE to IMMUTABLE:
CREATE OR REPLACE FUNCTION testfunc(sxtype text)
RETURNS text AS
$BODY$
BEGIN
return '{type1, type2}';
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE
COST 100;
With a VOLATILE function, Postgres does not apply such optimizations: a VOLATILE function may change data, and its result is not predictable, so the planner cannot evaluate it ahead of time. More in the documentation: https://www.postgresql.org/docs/9.5/static/sql-createfunction.html
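As a hedged illustration (a sketch based on the question's schema, not benchmarked here): once the function is IMMUTABLE, a call with constant arguments can be folded into a constant at plan time, so the subselect should no longer be needed:
select * from test
where objectstate->>'type' = ANY(testfunc('type1')::text[]);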

Related

Update statement on PostgreSQL not using primary key index to update

I have a stored procedure on PostgreSQL like this:
create or replace procedure new_emp_sp (f_name varchar, l_name varchar, age integer, threshold integer, dept varchar)
language plpgsql
as $$
declare
new_emp_count integer;
begin
INSERT INTO employees (id, first_name, last_name, age)
VALUES (nextval('emp_id_seq'),
random_string(10),
random_string(20),
age);
select count(*) into new_emp_count from employees where age > threshold;
update dept_employees set emp_count = new_emp_count where id = dept;
end; $$
I have enabled the auto_explain module and set log_min_duration to 0 so that it logs everything.
I have an issue with the update statement in the procedure. From the auto_explain logs I can see that it is not using the primary key index to update the table:
-> Seq Scan on dept_employees (cost=0.00..1.05 rows=1 width=14) (actual time=0.005..0.006 rows=1 loops=1)
Filter: ((id)::text = 'ABC'::text)
Rows Removed by Filter: 3
This worked as expected until a couple of hours ago and I used to get a log like this:
-> Index Scan using dept_employees_pkey on dept_employees (cost=0.15..8.17 rows=1 width=48) (actual time=0.010..0.011 rows=1 loops=1)
Index Cond: ((id)::text = 'ABC'::text)
Without the procedure, if I run the statement standalone like this:
explain analyze update dept_employees set emp_count = 123 where id = 'ABC';
The statement correctly uses the primary key index:
Update on dept_employees (cost=0.15..8.17 rows=1 width=128) (actual time=0.049..0.049 rows=0 loops=1)
-> Index Scan using dept_employees_pkey on dept_employees (cost=0.15..8.17 rows=1 width=128) (actual time=0.035..0.036 rows=1 loops=1)
Index Cond: ((id)::text = 'ABC'::text)
I can't figure out what has gone wrong especially because it worked perfectly just a couple of hours ago.
It is faster to scan N rows sequentially than to scan the same N rows using an index, so for small tables Postgres may decide that a sequential scan is faster than an index scan.
PL/pgSQL can cache prepared statements and execution plans, so you're probably getting a cached execution plan from when the table was smaller.
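A quick way to test that theory (a sketch; the argument values below are made up): cached plans are per-session, so starting a new connection, or discarding the session's plan cache, should force a re-plan:
-- release all cached query plans in the current session
DISCARD PLANS;
-- re-run the procedure and check the auto_explain output again
CALL new_emp_sp('John', 'Doe', 30, 25, 'ABC');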

How to optimize Row Level Security in Postgresql

I have a Postgres (13.2) based API with RLS enabled (I use PostGraphile), and it's extremely slow.
The user sends a JWT from Google OAuth. Access to tables is based on roles (there are two: person and admin) plus RLS.
I have two tables for user auth: person and person_groups.
CREATE TABLE IF NOT EXISTS myschema.person_groups (
id serial PRIMARY KEY,
person_id citext NOT NULL REFERENCES myschema.person (id),
google_id text NOT NULL REFERENCES myschema_private.person_account (google_id),
group_id serial NOT NULL REFERENCES myschema.groups (id),
updated_at timestamp DEFAULT now(),
CONSTRAINT unq_person_id_group_id UNIQUE (person_id, group_id)
);
CREATE INDEX persongroups_google_group_idx ON myschema.person_groups (google_id, group_id);
For the RLS check I have a function defined as:
CREATE OR REPLACE FUNCTION myschema.is_in_group (group_id int[])
RETURNS boolean
AS $$
SELECT
CASE WHEN current_setting('role', FALSE) = 'admin' THEN
TRUE
WHEN EXISTS (
SELECT
1
FROM
myschema.person_groups
WHERE
person_groups.group_id = ANY ($1) AND person_groups.google_id = current_setting('user.sub', TRUE)) THEN
TRUE
ELSE
FALSE
END
$$
LANGUAGE SQL
STABLE
STRICT
SECURITY DEFINER;
I have a table, gate_enterlog, which the user wants to access.
RLS for this table is:
CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog
FOR SELECT TO person
USING (myschema.is_in_group (ARRAY[6, 1]));
If I use such code:
BEGIN;
SET local ROLE person;
SET local "user.sub" TO 'yyy';
EXPLAIN ANALYZE VERBOSE
SELECT COUNT(id) FROM myschema.gate_enterlog;
COMMIT;
I end up with:
Aggregate (cost=23369.00..23369.01 rows=1 width=8) (actual time=2897.487..2897.487 rows=1 loops=1)
Output: count(id)
-> Seq Scan on myschema.gate_enterlog (cost=0.00..23297.08 rows=28769 width=4) (actual time=2897.484..2897.484 rows=0 loops=1)
Output: id, person_id, checkpoint_time, direction, place
Filter: is_in_group('{6,1}'::integer[])
Rows Removed by Filter: 86308
Planning Time: 0.626 ms
Execution Time: 2897.567 ms
If I replace the RLS policy with an unconditional one:
CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog FOR SELECT TO person USING (TRUE);
Aggregate (cost=1935.85..1935.86 rows=1 width=8) (actual time=17.671..17.672 rows=1 loops=1)
Output: count(id)
-> Seq Scan on myschema.gate_enterlog (cost=0.00..1720.08 rows=86308 width=4) (actual time=0.008..7.364 rows=86308 loops=1)
Output: id, person_id, checkpoint_time, direction, place
Planning Time: 0.594 ms
Execution Time: 17.737 ms
Do you have any thoughts on how I can optimize RLS so Postgres would "remember" that the user has privileges to access the table?
My only idea is to fall back to USING (TRUE) for SELECT and check access once before running the query, but before going that way I hope somebody can give me a hint about what I did wrong.
I figured it out somehow. It seems that, for some reason, boolean functions aren't optimized. I changed my auth function to:
CREATE OR REPLACE FUNCTION myschema.auth_group (group_id int[])
RETURNS SETOF int
AS $$
BEGIN
IF current_setting('role', FALSE) = 'admin' THEN
RETURN QUERY SELECT 1;
ELSIF EXISTS (SELECT 1 FROM myschema.person_groups
WHERE person_groups.google_id = current_setting('user.sub', TRUE) AND person_groups.group_id = ANY ($1)) THEN
RETURN QUERY SELECT 1;
END IF;
END;
$$
LANGUAGE plpgsql
STABLE STRICT
SECURITY DEFINER;
CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog
FOR SELECT TO person USING (EXISTS (SELECT myschema.auth_group (ARRAY[6, 1])));
With this function the planner is efficient:
Aggregate (cost=1827.97..1827.98 rows=1 width=8) (actual time=6.005..6.006 rows=1 loops=1)
Output: count(gate_enterlog.id)
InitPlan 1 (returns $0)
-> ProjectSet (cost=0.00..5.27 rows=1000 width=4) (actual time=0.158..0.159 rows=0 loops=1)
Output: auth_group(current_setting('role'::text, false), current_setting('user.sub'::text, true), '{6,1}'::integer[])
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
-> Seq Scan on myschema.gate_enterlog (cost=0.00..1720.08 rows=43154 width=4) (actual time=6.002..6.002 rows=0 loops=1)
Output: gate_enterlog.id, gate_enterlog.person_id, gate_enterlog.checkpoint_time, gate_enterlog.direction, gate_enterlog.place
Filter: $0
Rows Removed by Filter: 86308
Planning Time: 0.500 ms
Execution Time: 6.100 ms
The cost is pretty much the same as with USING (TRUE) in the RLS policy.
The accepted answer technically works, but the SETOF return type is not the actual fix here. Unless the contents of the function need optimization, you can keep the return type as is. The part that actually boosts performance is using SELECT when calling the function, which makes the subquery be evaluated and cached once rather than invoked for every row. To give an example, take the following functions used with RLS:
create or replace function utils.my_rls_check() returns boolean
LANGUAGE plpgsql
as $$
BEGIN
return true;
END;
$$;
CREATE OR REPLACE FUNCTION utils.my_other_rls_check() RETURNS UUID AS $$
SELECT '00000000-0000-0000-0000-000000000000'::UUID;
$$ LANGUAGE sql STABLE;
This would be slow:
select count(*) from app_public.entities
WHERE utils.my_rls_check();
select count(*) from app_public.entities
WHERE check_id = utils.my_other_rls_check();
This would be the fix to boost the performance:
select count(*) from app_public.entities
WHERE (SELECT utils.my_rls_check());
select count(*) from app_public.entities
WHERE check_id = (SELECT utils.my_other_rls_check());
The example functions are simplistic, but they are enough to test the difference in performance, as long as the table being tested has a reasonable amount of data, e.g. 1 million rows or more.
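Applying the same trick to the original boolean function from the question would look like this (a sketch, not benchmarked here; the existing policy would have to be dropped first):
CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog
FOR SELECT TO person
USING ((SELECT myschema.is_in_group(ARRAY[6, 1])));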

PostgreSQL: Same request is slower with plpgsql language compared to sql

I'm new to PostgreSQL and I'm facing an issue with table function performance. What I need is the equivalent of a stored procedure in MSSQL. After some research I found that a table function is the way to go, so I followed an example to create my function using plpgsql.
Comparing execution times, the function was 2 times slower than calling the query directly (the query inside the function is exactly the same).
After digging a little, I found that using the SQL language in my function improves the execution time a lot (it becomes exactly the same as calling the query directly). From what I've read, plpgsql adds a little overhead, but the difference is too big to be explained by that.
Since I'm not using any plpgsql functionality, this solution is fine for me and totally makes sense. However, I'd like to understand the difference. Comparing the execution plans, the plpgsql version does a HashAggregate and sequential scans, while the SQL version does a GroupAggregate with some pre-sorting... I did use auto_explain, as suggested by Laurenz Albe, and I added both execution plans at the end.
Why such a difference in the execution plan of the same query, with the only difference being the language? Moreover, even the result of the SUM (see my query below) differs noticeably. I know I'm using floating-point values, so the result can vary a little between calls, but in this case the difference between the query and the function is around ~3, which is unexpected (~10001 vs ~9998).
Below the code to reproduce the problem using 2 tables and 2 functions.
Note that I'm using PostgreSQL 12.
Any explanations are appreciated :) Thanks.
-- Step 1: Create database
-- Step 2: Create tables
-- table1
CREATE TABLE public.table1(area real, code text COLLATE pg_catalog."default");
-- table 2
CREATE TABLE public.table2(code text COLLATE pg_catalog."default" NOT NULL, surface real, CONSTRAINT table2_pkey PRIMARY KEY (code));
-- Step 3: create functions
-- plpgsql
CREATE OR REPLACE FUNCTION public.test_function()
RETURNS TABLE(code text, value real)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
BEGIN
RETURN QUERY
SELECT table2.code, (case when (sum(area) * surface) IS NULL then 0 else (sum(area) * surface) end) AS value
FROM table1
INNER JOIN table2 on table1.code = table2.code
GROUP BY table2.code, surface
;
END;
$BODY$;
-- sql
CREATE OR REPLACE FUNCTION public.test_function2()
RETURNS TABLE(code text, value real)
LANGUAGE SQL
AS $BODY$
SELECT table2.code, (case when (sum(area) * surface) IS NULL then 0 else (sum(area) * surface) end) AS value
FROM table1
INNER JOIN table2 on table1.code = table2.code
GROUP BY table2.code, surface
$BODY$;
-- Step 4: insert some random data
-- table 2
INSERT INTO table2(code, surface) VALUES ('AAAAA', 1);
INSERT INTO table2(code, surface) VALUES ('BBBBB', 1);
INSERT INTO table2(code, surface) VALUES ('CCCCC', 1);
INSERT INTO table2(code, surface) VALUES ('DDDDD', 1);
INSERT INTO table2(code, surface) VALUES ('EEEEE', 1);
-- table1 (will take some time, this simulate my current query with 10 millions rows)
DO
$$
DECLARE random_code text;
DECLARE code_count int := (SELECT COUNT(*) FROM table2);
BEGIN
FOR i IN 1..10000000 LOOP
random_code := (SELECT code FROM table2 OFFSET floor(random() * code_count) LIMIT 1);
INSERT INTO public.table1(area, code) VALUES (random() / 100, random_code);
END LOOP;
END
$$;
-- Step 5: compare
SELECT * FROM test_function();
SELECT * FROM test_function2(); -- 2 times faster
Execution plan for test_function (plpgsql version)
2021-04-14 11:52:10.335 GMT [5056] LOG: duration: 3808.919 ms plan:
Query Text: SELECT table2.code, (case when (sum(area) * surface) IS NULL then 0 else (sum(area) * surface) end) AS value
FROM table1
INNER JOIN table2 on table1.code = table2.code
GROUP BY table2.code, surface
HashAggregate (cost=459899.03..459918.08 rows=1270 width=40) (actual time=3808.908..3808.913 rows=5 loops=1)
Group Key: table2.code
Buffers: shared hit=34 read=162130
-> Hash Join (cost=38.58..349004.15 rows=14785984 width=40) (actual time=215.340..2595.247 rows=10000000 loops=1)
Hash Cond: (table1.code = table2.code)
Buffers: shared hit=34 read=162130
-> Seq Scan on table1 (cost=0.00..310022.84 rows=14785984 width=10) (actual time=215.294..1036.615 rows=10000000 loops=1)
Buffers: shared hit=33 read=162130
-> Hash (cost=22.70..22.70 rows=1270 width=36) (actual time=0.019..0.020 rows=5 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 17kB
Buffers: shared hit=1
-> Seq Scan on table2 (cost=0.00..22.70 rows=1270 width=36) (actual time=0.013..0.014 rows=5 loops=1)
Buffers: shared hit=1
2021-04-14 11:52:10.335 GMT [5056] CONTEXT: PL/pgSQL function test_function() line 3 at RETURN QUERY
Execution plan for test_function2 (sql version)
2021-04-14 11:54:24.122 GMT [5056] LOG: duration: 1513.001 ms plan:
Query Text:
SELECT table2.code, (case when (sum(area) * surface) IS NULL then 0 else (sum(area) * surface) end) AS value
FROM table1
INNER JOIN table2 on table1.code = table2.code
GROUP BY table2.code, surface
Finalize GroupAggregate (cost=271918.31..272252.77 rows=1270 width=40) (actual time=1484.846..1512.998 rows=5 loops=1)
Group Key: table2.code
Buffers: shared hit=96 read=162098
-> Gather Merge (cost=271918.31..272214.67 rows=2540 width=40) (actual time=1484.840..1512.990 rows=15 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=96 read=162098
-> Sort (cost=270918.29..270921.46 rows=1270 width=40) (actual time=1435.897..1435.899 rows=5 loops=3)
Sort Key: table2.code
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=96 read=162098
-> Partial HashAggregate (cost=270840.11..270852.81 rows=1270 width=40) (actual time=1435.857..1435.863 rows=5 loops=3)
Group Key: table2.code
Buffers: shared hit=74 read=162098
-> Hash Join (cost=38.58..240035.98 rows=6160827 width=40) (actual time=204.916..1022.133 rows=3333333 loops=3)
Hash Cond: (table1.code = table2.code)
Buffers: shared hit=74 read=162098
-> Parallel Seq Scan on table1 (cost=0.00..223771.27 rows=6160827 width=10) (actual time=204.712..486.850 rows=3333333 loops=3)
Buffers: shared hit=65 read=162098
-> Hash (cost=22.70..22.70 rows=1270 width=36) (actual time=0.155..0.156 rows=5 loops=3)
Buckets: 2048 Batches: 1 Memory Usage: 17kB
Buffers: shared hit=3
-> Seq Scan on table2 (cost=0.00..22.70 rows=1270 width=36) (actual time=0.142..0.143 rows=5 loops=3)
Buffers: shared hit=3
2021-04-14 11:54:24.122 GMT [5056] CONTEXT: SQL function "test_function2" statement 1
First, a general discussion of how to get execution plans in such a case.
To get to the bottom of that, activate auto_explain and track function execution in postgresql.conf:
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 0
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_nested_statements = on
track_functions = 'pl'
Then restart the database. Don't do this on a busy production database, as it will log a lot and add considerable overhead!
Reset the database statistics with
SELECT pg_stat_reset();
Now the execution plans of all the SQL statements inside your functions will be logged, and PostgreSQL keeps track of function execution times.
Look at the execution plans and execution times of the statements when called from the SQL function and from the PL/pgSQL function and see if you can spot a difference. Then look at pg_stat_user_functions to compare the functions' execution times.
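For example, a quick way to read those statistics (pg_stat_user_functions is a standard view; it is only populated while track_functions is enabled as above):
SELECT schemaname, funcname, calls, total_time, self_time
FROM pg_stat_user_functions
ORDER BY total_time DESC;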
Explanation in the current case, after looking at the execution plans
The query run from PL/pgSQL is not parallelized. Due to a limitation in the implementation, queries run with RETURN QUERY never are. This also explains the small difference in the SUM results: the parallel plan adds up the real values in a different order, and floating-point addition is not associative, so the rounding comes out slightly differently.

PgSQL: Assigning a column value to a variable makes query parameter unbound

When running the code below:
drop table if exists demo;
drop table if exists demo_test;
drop table if exists demo_result;
create table demo as select md5(v::text) from generate_series(1, 1000000) v;
create index on demo (md5 text_pattern_ops);
analyze demo;
create table demo_test
as select left(md5(v::text), 5) || '%' as "patt" from generate_series(2000000, 2000010) v;
create table demo_result (row text);
load 'auto_explain';
set auto_explain.log_min_duration to 0;
set auto_explain.log_analyze to true;
set auto_explain.log_nested_statements to true;
do $$
declare
row record;
pattern text;
begin
for row in select patt from demo_test loop
pattern = row.patt; -- <--- CRUCIAL LINE
insert into demo_result select * from demo where md5 like pattern;
end loop;
end$$;
PostgreSQL generates the following query plan:
2017-10-02 17:03:48 CEST [18038-23] app=psql barczynski#barczynski LOG: duration: 0.021 ms plan:
Query Text: insert into demo_result select * from demo where md5 like pattern
Insert on demo_result (cost=0.42..8.45 rows=100 width=33) (actual time=0.021..0.021 rows=0 loops=1)
-> Index Only Scan using demo_md5_idx on demo (cost=0.42..8.45 rows=100 width=33) (actual time=0.018..0.018 rows=1 loops=1)
Index Cond: ((md5 ~>=~ '791cc'::text) AND (md5 ~<~ '791cd'::text))
Filter: (md5 ~~ '791cc%'::text)
Heap Fetches: 1
But after removing the pattern variable and inlining row.patt in the where condition:
insert into demo_result select * from demo where md5 like row.patt;
PostgreSQL treats the parameter as a bind parameter:
2017-10-02 17:03:02 CEST [17901-23] app=psql barczynski#barczynski LOG: duration: 89.636 ms plan:
Query Text: insert into demo_result select * from demo where md5 like row.patt
Insert on demo_result (cost=0.00..20834.00 rows=5000 width=33) (actual time=89.636..89.636 rows=0 loops=1)
-> Seq Scan on demo (cost=0.00..20834.00 rows=5000 width=33) (actual time=47.255..89.628 rows=1 loops=1)
Filter: (md5 ~~ $4)
Rows Removed by Filter: 999999
I understand that the latter plan uses a sequential scan, because PostgreSQL cannot assume that the bind parameter doesn't start with a wildcard.
My question is: why does the extra assignment switch the bind parameter on and off?
The difference is in the data available to the optimizer at the time it looks at the query.
With the first query, the bound value is available for the optimizer to look at. So it sees that there is no leading wildcard and it knows that the index can be used, effectively planning:
insert into demo_result select * from demo where md5 like '791cc%';
The second query gives the planner no idea what the pattern will look like, so it's not able to assume that the index is any good.
I suspect that if you had a pattern with a leading wildcard ('%791cc') you would see the same query plan for both approaches, as a sequential scan would be used in both cases.
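A sketch to test that suspicion, reusing the question's setup (assumed to be already in place); with leading-wildcard patterns, both the variable and the inlined variant should fall back to a sequential scan:
do $$
declare
row record;
begin
-- same loop as in the question, but each pattern now has a leading wildcard
for row in select '%' || patt as patt from demo_test loop
insert into demo_result select * from demo where md5 like row.patt;
end loop;
end$$;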

SQL function very slow compared to query without function wrapper

I have this PostgreSQL 9.4 query that runs very fast (~12ms):
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email,
customers.name,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = 2
ORDER BY
auth_web_events.id DESC;
But if I embed it into a function, the query runs very slowly, seemingly going through every record. What am I missing? I have ~1M rows of data, and I want to simplify my database layer by storing the large queries in functions and views.
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS
$func$
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email AS user,
customers.name AS customer,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = $1
ORDER BY
auth_web_events.id DESC;
$func$ LANGUAGE SQL;
The query plan is:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)"
" Sort Key: auth_web_events.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using auth_user_pkey on auth_user (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)"
" Index Cond: (id = 2)"
" -> Index Scan using customers_id_idx on customers (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)"
" Index Cond: (id = auth_user.customer_id_fk)"
"Planning time: 0.369 ms"
"Execution time: 61.965 ms"
I'm calling the function this way:
SELECT * from get_web_events_by_userid(2)
The query plan for the function:
"Function Scan on get_web_events_by_userid (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)"
"Planning time: 0.038 ms"
"Execution time: 279107.175 ms"
EDIT: I just changed the parameters, and the issue persists.
EDIT 2: Query plan for Erwin's answer:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)"
" Sort Key: w.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)"
" -> Index Scan using auth_user_pkey on auth_user u (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)"
" Index Cond: (id = 2)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events w (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using customers_id_idx on customers c (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)"
" Index Cond: (id = u.customer_id_fk)"
"Planning time: 0.541 ms"
"Execution time: 0.101 ms"
While rewriting your function I realized that you added column aliases here:
SELECT
...
auth_user.email AS user,
customers.name AS customer,
... which wouldn't do anything to begin with, since those aliases are invisible outside the function and not referenced inside the function, so they would be ignored. For documentation purposes, better to use a comment.
But it also makes your query invalid, because user is a completely reserved word and cannot be used as a column alias unless double-quoted.
Oddly, in my tests the function seems to work with the invalid alias, probably because it is ignored (?). But I am not sure this couldn't have side effects.
Your function rewritten (otherwise equivalent):
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
RETURNS TABLE (
id int
, time_stamp timestamptz
, description text
, origin text
, userlogin text
, customer text
, client_ip inet
)
LANGUAGE sql STABLE AS
$func$
SELECT w.id
, w.time_stamp
, w.description
, w.origin
, u.email -- AS user -- make this a comment!
, c.name -- AS customer
, w.client_ip
FROM public.auth_user u
JOIN public.auth_web_events w ON w.user_id_fk = u.id
JOIN public.customers c ON c.id = u.customer_id_fk
WHERE u.id = $1 -- reverted the logic here
ORDER BY w.id DESC
$func$;
Obviously, the STABLE keyword changed the outcome. Function volatility should not be an issue in the test situation you describe: the setting does not normally benefit a single, isolated function call. Read the details in the manual. Also, standard EXPLAIN does not show query plans for what's going on inside functions; you can employ the additional module auto_explain for that:
Postgres query plan of a UDF invocation written in pgpsql
You have a very odd data distribution:
The auth_web_events table has 100000000 records, auth_user has 2 records, customers has 1 record.
Since you didn't declare otherwise, the function assumes an estimate of 1000 rows to be returned, but your function actually returns only 2 rows. If all your calls return (in the vicinity of) 2 rows, just declare that with an added ROWS 2. That might change the query plan for the VOLATILE variant as well (even if STABLE is the right choice anyway here).
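For instance (a sketch; this adjusts the estimate on the rewritten function above without recreating it):
ALTER FUNCTION get_web_events_by_userid(int) ROWS 2;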
You will get better performance by making this query dynamic and using plpgsql.
CREATE OR REPLACE FUNCTION get_web_events_by_userid(uid int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS $$
BEGIN
RETURN QUERY EXECUTE
'SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email,   -- no AS user: reserved word, and aliases are ignored here anyway
customers.name,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = $1
ORDER BY
auth_web_events.id DESC'
USING uid;  -- pass the value as a parameter instead of concatenating it into the string
END;
$$ LANGUAGE plpgsql;
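Usage is unchanged. Because EXECUTE plans the statement fresh on each call with the actual parameter value visible, the planner can pick the index every time, which is the point of this answer:
SELECT * FROM get_web_events_by_userid(2);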