What I mean is this: in PostgreSQL (v15.1) I have a table foo created in the following way.
create table foo (
id integer primary key generated by default as identity,
id_mod_7 int generated always as (id % 7) stored
);
create index on foo (id_mod_7, id);
insert into foo (id) select generate_series(1, 10000);
If I query this table with a predicate that doesn't use a literal constant but rather uses a function, a sequential scan is used:
explain analyze
select count(1) from foo where id_mod_7 = extract(dow from current_date);
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Aggregate (cost=245.12..245.13 rows=1 width=8) (actual time=7.218..7.219 rows=1 loops=1)
-> Seq Scan on foo (cost=0.00..245.00 rows=50 width=0) (actual time=0.020..7.028 rows=1428 loops=1)
Filter: ((id_mod_7)::numeric = EXTRACT(dow FROM CURRENT_DATE))
Rows Removed by Filter: 8572
Planning Time: 0.178 ms
Execution Time: 7.281 ms
However, if I query this table with a predicate that does use a literal constant, an index scan is used:
explain analyze
select count(1) from foo where id_mod_7 = 6;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=48.84..48.85 rows=1 width=8) (actual time=0.321..0.322 rows=1 loops=1)
-> Index Only Scan using foo_id_mod_7_id_idx on foo (cost=0.29..45.27 rows=1428 width=0) (actual time=0.022..0.214 rows=1428 loops=1)
Index Cond: (id_mod_7 = 6)
Heap Fetches: 0
Planning Time: 0.106 ms
Execution Time: 0.397 ms
I thought maybe I could fool it into using the index if I used the caching (alleged?) properties of Common Table Expressions (CTE), but to no avail:
explain analyze
with param as (select extract(dow from current_date) as dow)
select count(1) from foo join param on id_mod_7 = dow;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Aggregate (cost=245.12..245.13 rows=1 width=8) (actual time=5.830..5.831 rows=1 loops=1)
-> Seq Scan on foo (cost=0.00..245.00 rows=50 width=0) (actual time=0.025..5.668 rows=1428 loops=1)
Filter: ((id_mod_7)::numeric = EXTRACT(dow FROM CURRENT_DATE))
Rows Removed by Filter: 8572
Planning Time: 0.234 ms
Execution Time: 5.894 ms
It's not fatal, but I'm just trying to understand what's going on here. Thanks!
P.S. and just to avoid confusion, it's not the table column that is being computed within the SQL query. It's the value in the predicate expression that is (or would be) computed within the SQL query.
Like I said, I've tried using a CTE because I believed the CTE would be cached or materialized and expected an index scan, but unfortunately still got a sequential scan.
This is because extract() returns a numeric value, but the column is an integer. You can see this effect in the execution plan: (id_mod_7)::numeric = ...; the column has to be cast to numeric so it can be compared with the value returned by the extract() function.
You need to cast the result of the extract() function to an int:
select count(*)
from foo
where id_mod_7 = extract(dow from current_date)::int
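With the cast in place the comparison is integer = integer, so the planner can use the (id_mod_7, id) index again. A quick way to verify (a sketch; output omitted):
explain analyze
select count(*)
from foo
where id_mod_7 = extract(dow from current_date)::int;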
I have a table feed_item_likes_dislikes in PostgreSQL (feed_item_id, user_id, vote) where
feed_item_id is a uuid
user_id is an integer
vote = TRUE means a like
vote = FALSE means a dislike
vote = NULL means the user originally liked or disliked but came back and removed the vote by unvoting
I have another table feed_item_likes_dislikes_aggregate (feed_item_id, likes, dislikes) in which I want to maintain the total number of likes and dislikes per post
when the user adds a new like in the feed_item_likes_dislikes table with
INSERT INTO feed_item_likes_dislikes VALUES('54d67b62-9b71-a6bc-d934-451c1eaae3bc', 1, TRUE);
I want to update the total number of likes in the aggregate table. Similar cases need to be handled for dislikes and for when a user unvotes something by setting vote to NULL.
Users may also change a like to a dislike and vice versa, and in every case the total number of likes and dislikes for that post needs to be maintained.
I wrote the following trigger function to accomplish this
CREATE OR REPLACE FUNCTION update_votes() RETURNS trigger AS $$
DECLARE
feed_item_id_val uuid;
likes_val integer;
dislikes_val integer;
BEGIN
IF (TG_OP = 'DELETE') THEN
-- when a row is deleted, store feed_item_id of the deleted row so that we can update its likes and dislikes count
feed_item_id_val:=OLD.feed_item_id;
ELSIF (TG_OP = 'UPDATE') OR (TG_OP='INSERT') THEN
feed_item_id_val:=NEW.feed_item_id;
END IF;
    -- get total number of likes and dislikes for the given feed_item_id
    SELECT COUNT(*) FILTER (WHERE vote = TRUE) AS likes,
           COUNT(*) FILTER (WHERE vote = FALSE) AS dislikes
    INTO likes_val, dislikes_val
    FROM feed_item_likes_dislikes
    WHERE feed_item_id = feed_item_id_val;
    -- update the aggregate count for only this feed_item_id
    INSERT INTO feed_item_likes_dislikes_aggregate (feed_item_id, likes, dislikes)
    VALUES (feed_item_id_val, likes_val, dislikes_val)
    ON CONFLICT (feed_item_id) DO UPDATE SET likes = likes_val, dislikes = dislikes_val;
RETURN NULL; -- result is ignored since this is an AFTER trigger
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER update_votes_trigger AFTER INSERT OR UPDATE OR DELETE ON feed_item_likes_dislikes FOR EACH ROW EXECUTE PROCEDURE update_votes();
But when I do a bulk insert into feed_item_likes_dislikes, sometimes the total number of likes and dislikes is incorrect.
Can someone kindly tell me how I can fix this?
Update 1
I tried creating a view, but it takes a lot of time on my production data set; here is the db fiddle: https://www.db-fiddle.com/f/2ZAkjQhUydMaV9o5xvLgMT/17
Query #1
EXPLAIN ANALYZE SELECT f.feed_item_id,pubdate,link,guid,title,summary,author,feed_id,COALESCE(likes, 0) AS likes,COALESCE(dislikes, 0) AS dislikes,COALESCE(bullish, 0) AS bullish,COALESCE(bearish, 0) AS bearish FROM feed_items f LEFT JOIN likes_dislikes_aggregate l ON f.feed_item_id = l.feed_item_id LEFT JOIN bullish_bearish_aggregate b ON f.feed_item_id = b.feed_item_id ORDER BY pubdate DESC, f.feed_item_id DESC LIMIT 10;
QUERY PLAN
Limit (cost=112.18..112.21 rows=10 width=238) (actual time=0.257..0.260 rows=10 loops=1)
-> Sort (cost=112.18..112.93 rows=300 width=238) (actual time=0.257..0.257 rows=10 loops=1)
Sort Key: f.pubdate DESC, f.feed_item_id DESC
Sort Method: top-N heapsort Memory: 27kB
-> Hash Left Join (cost=91.10..105.70 rows=300 width=238) (actual time=0.162..0.222 rows=100 loops=1)
Hash Cond: (f.feed_item_id = b.feed_item_id)
-> Hash Left Join (cost=45.55..59.35 rows=300 width=222) (actual time=0.080..0.114 rows=100 loops=1)
Hash Cond: (f.feed_item_id = l.feed_item_id)
-> Seq Scan on feed_items f (cost=0.00..13.00 rows=300 width=206) (actual time=0.004..0.011 rows=100 loops=1)
-> Hash (cost=43.05..43.05 rows=200 width=32) (actual time=0.069..0.069 rows=59 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Subquery Scan on l (cost=39.05..43.05 rows=200 width=32) (actual time=0.037..0.052 rows=59 loops=1)
-> HashAggregate (cost=39.05..41.05 rows=200 width=32) (actual time=0.036..0.046 rows=59 loops=1)
Group Key: feed_item_likes_dislikes.feed_item_id
-> Seq Scan on feed_item_likes_dislikes (cost=0.00..26.60 rows=1660 width=17) (actual time=0.003..0.008 rows=95 loops=1)
-> Hash (cost=43.05..43.05 rows=200 width=32) (actual time=0.064..0.064 rows=63 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Subquery Scan on b (cost=39.05..43.05 rows=200 width=32) (actual time=0.029..0.044 rows=63 loops=1)
-> HashAggregate (cost=39.05..41.05 rows=200 width=32) (actual time=0.028..0.038 rows=63 loops=1)
Group Key: feed_item_bullish_bearish.feed_item_id
-> Seq Scan on feed_item_bullish_bearish (cost=0.00..26.60 rows=1660 width=17) (actual time=0.002..0.007 rows=93 loops=1)
Planning Time: 0.140 ms
Execution Time: 0.328 ms
The attempt to keep a running aggregate is always riddled with traps, and is almost always not worth the effort. The solution is not to store aggregates but to derive them as needed. You do this by creating a VIEW rather than a table. This removes all the additional processing, especially so in this case, as your trigger basically contains the query needed to generate the view. (see demo here)
create or replace VIEW likes_dislikes_aggregate as
select id
, count(*) filter(where vote) as likes
, count(*) filter(where not vote) as dislikes
, count(*) filter(where vote is null) as no_vote
from likes_dislikes
group by id;
No trigger, no additional code; everything works through the view with standard DML. Notice that the view is basically nothing but your count query, without the trigger overhead and maintenance.
SELECT COUNT(*) FILTER(WHERE vote=TRUE) AS likes, COUNT(*) FILTER(WHERE vote=FALSE) AS dislikes INTO likes_val, dislikes_val FROM feed_item_likes_dislikes WHERE feed_item_id=feed_item_id_val;
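If you prefer the names from the question, the same view can be written directly against feed_item_likes_dislikes; a sketch, untested against your schema (you would drop or rename the existing aggregate table first, since the view cannot share its name):
create or replace view feed_item_likes_dislikes_aggregate as
select feed_item_id
     , count(*) filter (where vote) as likes
     , count(*) filter (where not vote) as dislikes
from feed_item_likes_dislikes
group by feed_item_id;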
I have a table (over 100 million records) on PostgreSQL 13.1
CREATE TABLE report
(
id serial primary key,
license_plate_id integer,
datetime timestamp
);
Indexes (for the test I created both of them):
create index report_lp_datetime_index on report (license_plate_id, datetime);
create index report_lp_datetime_desc_index on report (license_plate_id desc, datetime desc);
So, my question is why a query like
select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75)
order by datetime desc
limit 100
is very slow (~10 sec), while the same query without the ORDER BY clause is fast (milliseconds).
Explain:
explain (analyze, buffers, format text) select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34, 75,374,57123)
limit 100
Limit (cost=0.57..400.38 rows=100 width=316) (actual time=0.037..0.216 rows=100 loops=1)
Buffers: shared hit=103
-> Index Scan using report_lp_id_idx on report r (cost=0.57..44986.97 rows=11252 width=316) (actual time=0.035..0.202 rows=100 loops=1)
Index Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Buffers: shared hit=103
Planning Time: 0.228 ms
Execution Time: 0.251 ms
explain (analyze, buffers, format text) select * from report r
where r.license_plate_id in (1,2,4,5,6,7,8,10,15,22,34,75,374,57123)
order by datetime desc
limit 100
Limit (cost=44193.63..44193.88 rows=100 width=316) (actual time=4921.030..4921.047 rows=100 loops=1)
Buffers: shared hit=11455 read=671
-> Sort (cost=44193.63..44221.76 rows=11252 width=316) (actual time=4921.028..4921.035 rows=100 loops=1)
Sort Key: datetime DESC
Sort Method: top-N heapsort Memory: 128kB
Buffers: shared hit=11455 read=671
-> Bitmap Heap Scan on report r (cost=151.18..43763.59 rows=11252 width=316) (actual time=54.422..4911.927 rows=12148 loops=1)
Recheck Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Heap Blocks: exact=12063
Buffers: shared hit=11455 read=671
-> Bitmap Index Scan on report_lp_id_idx (cost=0.00..148.37 rows=11252 width=0) (actual time=52.631..52.632 rows=12148 loops=1)
Index Cond: (license_plate_id = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75,374,57123}'::integer[]))
Buffers: shared hit=59 read=4
Planning Time: 0.427 ms
Execution Time: 4921.128 ms
You seem to have rather slow storage, if reading 671 8kB-blocks from disk takes a couple of seconds.
The way to speed this up is to reorder the table in the same way as the index, so that you can find the required rows in the same or adjacent table blocks:
CLUSTER report USING report_lp_id_idx;
Be warned that rewriting the table in this way causes downtime – the table will not be available while it is being rewritten. Moreover, PostgreSQL does not maintain the table order, so subsequent data modifications will cause performance to gradually deteriorate, so that after a while you will have to run CLUSTER again.
But if you need this query to be fast no matter what, CLUSTER is the way to go.
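One follow-up step worth noting (per the PostgreSQL documentation on CLUSTER): the planner records statistics about the physical ordering of the table, so refresh them after the rewrite. A sketch, assuming the table name report:
ANALYZE report;  -- refresh planner statistics (e.g. physical-order correlation) after CLUSTER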
Your two indices do exactly the same thing, so you can remove the second one; it's useless.
To optimize your query, the order of the fields inside the index must be reversed:
create index report_datetime_lp_index on report (datetime, license_plate_id);
BEGIN;
CREATE TABLE foo (d INTEGER, i INTEGER);
INSERT INTO foo SELECT random()*100000, random()*1000 FROM generate_series(1,1000000) s;
CREATE INDEX foo_d_i ON foo(d DESC,i);
COMMIT;
VACUUM ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100;
Limit (cost=0.42..343.92 rows=100 width=8) (actual time=0.076..9.359 rows=100 loops=1)
-> Index Only Scan Backward using foo_d_i on foo (cost=0.42..40976.43 rows=11929 width=8) (actual time=0.075..9.339 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9016
Heap Fetches: 0
Planning Time: 0.339 ms
Execution Time: 9.387 ms
Note the index is not used to optimize the WHERE clause. It is used here as a compact and fast way to store references to the rows ordered by date DESC, so the ORDER BY can use an index-only scan and avoid sorting. By adding column i (the stand-in for license_plate_id) to the index, an index-only scan can be performed to test the condition on i without hitting the table for every row. Since there is a low LIMIT value, it does not need to scan the whole index; it only scans it in date DESC order until it finds enough rows satisfying the WHERE condition to return the result.
It will be faster if you create the index in date DESC order; this could be useful if you use ORDER BY date DESC + LIMIT in other queries too.
You forget that OP's table has a third column, and he is using SELECT *. So that wouldn't be an index-only scan.
Easy to work around. The optimal way to execute this query would be an index-only scan to filter on the WHERE conditions, then LIMIT, then hit the table to get the rows. For some reason, if "select *" is used, Postgres takes the id column from the table instead of taking it from the index, which results in lots of unnecessary heap fetches for rows that are rejected by the WHERE condition.
Easy to work around, by doing it manually. I've also added another bogus column to make sure the SELECT * hits the table.
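The plans below also reference an id primary key (foo_pkey) and wider rows, so the demo table was presumably extended along these lines; the exact statements aren't shown in the original, this is just a sketch:
ALTER TABLE foo ADD COLUMN id serial PRIMARY KEY;  -- appears later as foo_pkey
ALTER TABLE foo ADD COLUMN bogus text;             -- the "bogus" column mentioned above
VACUUM ANALYZE foo;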
EXPLAIN (ANALYZE,buffers) SELECT * FROM foo
JOIN (SELECT d,i FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100) f USING (d,i)
ORDER BY d DESC LIMIT 100;
Limit (cost=0.85..1281.94 rows=1 width=17) (actual time=0.052..3.618 rows=100 loops=1)
Buffers: shared hit=453
-> Nested Loop (cost=0.85..1281.94 rows=1 width=17) (actual time=0.050..3.594 rows=100 loops=1)
Buffers: shared hit=453
-> Limit (cost=0.42..435.44 rows=100 width=8) (actual time=0.037..2.953 rows=100 loops=1)
Buffers: shared hit=53
-> Index Only Scan using foo_d_i on foo foo_1 (cost=0.42..51936.43 rows=11939 width=8) (actual time=0.037..2.935 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9010
Heap Fetches: 0
Buffers: shared hit=53
-> Index Scan using foo_d_i on foo (cost=0.42..8.45 rows=1 width=17) (actual time=0.005..0.005 rows=1 loops=100)
Index Cond: ((d = foo_1.d) AND (i = foo_1.i))
Buffers: shared hit=400
Execution Time: 3.663 ms
Another option is to just add the primary key to the date,license_plate index.
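The plan below references an index foo_d_i_id whose creation isn't shown above; presumably something along these lines (a sketch following the demo's column names):
CREATE INDEX foo_d_i_id ON foo(d DESC, i, id);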
SELECT * FROM foo JOIN (SELECT id FROM foo WHERE i IN (1,2,4,5,6,7,8,10,15,22,34,75) ORDER BY d DESC LIMIT 100) f USING (id) ORDER BY d DESC LIMIT 100;
Limit (cost=1357.98..1358.23 rows=100 width=17) (actual time=3.920..3.947 rows=100 loops=1)
Buffers: shared hit=473
-> Sort (cost=1357.98..1358.23 rows=100 width=17) (actual time=3.919..3.931 rows=100 loops=1)
Sort Key: foo.d DESC
Sort Method: quicksort Memory: 32kB
Buffers: shared hit=473
-> Nested Loop (cost=0.85..1354.66 rows=100 width=17) (actual time=0.055..3.858 rows=100 loops=1)
Buffers: shared hit=473
-> Limit (cost=0.42..509.41 rows=100 width=8) (actual time=0.039..3.116 rows=100 loops=1)
Buffers: shared hit=73
-> Index Only Scan using foo_d_i_id on foo foo_1 (cost=0.42..60768.43 rows=11939 width=8) (actual time=0.039..3.093 rows=100 loops=1)
Filter: (i = ANY ('{1,2,4,5,6,7,8,10,15,22,34,75}'::integer[]))
Rows Removed by Filter: 9010
Heap Fetches: 0
Buffers: shared hit=73
-> Index Scan using foo_pkey on foo (cost=0.42..8.44 rows=1 width=17) (actual time=0.006..0.006 rows=1 loops=100)
Index Cond: (id = foo_1.id)
Buffers: shared hit=400
Execution Time: 3.972 ms
Edit
After thinking about it... since the LIMIT restricts the output to 100 rows ordered by date desc, wouldn't it be nice if we could get the 100 most recent rows for each license_plate_id, put all of that into a top-N sort, and keep only the best 100 across all license_plate_ids? That would avoid reading and throwing away a lot of rows from the index. Reading and discarding index rows is much faster than hitting the table, but it still loads those index pages into RAM and clogs up your buffers with data you don't actually need to keep in cache. Let's use a LATERAL JOIN:
EXPLAIN (ANALYZE,BUFFERS)
SELECT * FROM foo
JOIN (SELECT d,i FROM
(VALUES (1),(2),(4),(5),(6),(7),(8),(10),(15),(22),(34),(75)) idlist
CROSS JOIN LATERAL
(SELECT d,i FROM foo WHERE i=idlist.column1 ORDER BY d DESC LIMIT 100) f2
ORDER BY d DESC LIMIT 100
) f3 USING (d,i)
ORDER BY d DESC LIMIT 100;
It's even faster: 2 ms, and it uses the index on (license_plate_id, date) instead of the other way around. Also, and this is important, each subquery in the lateral join hits only the index pages that contain rows that will actually be selected, while the previous queries hit many more index pages. So you save on RAM buffers.
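For the foo demo table, the analogous index with the plate id leading (the one this lateral query relies on) isn't part of the earlier setup either; a sketch, assuming the demo column names:
CREATE INDEX foo_i_d ON foo (i, d DESC);  -- analog of the OP's (license_plate_id, datetime) index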
If you don't need the index on (date,license_plate_id) and don't want to keep a useless index, that could be interesting since this query doesn't use it. On the other hand, if you need the index on (date,license_plate_id) for something else and want to keep it, then... maybe not.
Please post results for the winning query 🔥
postgres version: 9.3
postgresql.conf: all default configuration
I have 2 tables, A and B; both have 1 million rows.
There is a Postgres function that executes every 2 seconds; it updates table A where the ids are in an array (array size = 20), and then deletes the corresponding rows from table B.
The DB function is shown below:
CREATE OR REPLACE FUNCTION test_function (ids NUMERIC[])
RETURNS void AS $$
BEGIN
UPDATE A a
SET status = 'begin', end_time = (NOW() AT TIME ZONE 'UTC')
WHERE a.id = ANY (ids);
DELETE FROM B b
WHERE b.aid = ANY (ids)
AND b.status = 'end';
END;
$$ LANGUAGE plpgsql;
The analysis is shown below:
explain(ANALYZE,BUFFERS,VERBOSE) select test_function('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Result (cost=0.00..0.26 rows=1 width=0) (actual time=14030.435..14030.436 rows=1 loops=1)
Output: test_function('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::numeric[])
Buffers: shared hit=24297 read=26137 dirtied=20
Total runtime: 14030.444 ms
(4 rows)
My questions are:
Why does this function need up to 7 seconds to complete in the production environment?
While this function is executing, the process eats up to 60% CPU. --> This is the key problem
EDIT:
Analysis of each individual SQL statement:
explain(ANALYZE,VERBOSE,BUFFERS) UPDATE A a SET status = 'begin',
end_time = (now()) WHERE a.id = ANY
('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Update on public.A a (cost=0.45..99.31 rows=20 width=143) (actual time=1.206..1.206 rows=0 loops=1)
Buffers: shared hit=206 read=27 dirtied=30
-> Index Scan using A_pkey on public.a a (cost=0.45..99.31 rows=20 width=143) (actual time=0.019..0.116 rows=19 loops=1)
Output: id, start_time, now(), 'begin'::character varying(255), xxxx... ctid
Index Cond: (t.id = ANY('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::integer[]))
Buffers: shared hit=75 read=11
Trigger test_trigger: time=5227.111 calls=1
Total runtime: 5228.357 ms
(8 rows)
explain(ANALYZE,BUFFERS,VERBOSE) DELETE FROM
B b WHERE tq.aid = ANY
('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}');
QUERY PLAN
Delete on B b (cost=0.00..1239.11 rows=20 width=6) (actual time=6.013..6.013 rows=0 loops=1)
Buffers: shared hit=448
-> Seq Scan on B b (cost=0.00..1239.11 rows=20 width=6) (actual time=6.011..6.011 rows=0 loops=1)
Output: ctid
Filter: (b.aid = ANY ('{2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}'::bigint[]))
Rows Removed by Filter: 21743
Buffers: shared hit=448
Total runtime: 6.029 ms
(8 rows)
CPU usage
Before calling:
After frequent operations:
I have this PostgreSQL 9.4 query that runs very fast (~12ms):
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email,
customers.name,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = 2
ORDER BY
auth_web_events.id DESC;
But if I embed it into a function, the query runs very slowly, apparently scanning every record. What am I missing? I have ~1M rows of data and I want to simplify my database layer by storing the large queries in functions and views.
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS
$func$
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email AS user,
customers.name AS customer,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = $1
ORDER BY
auth_web_events.id DESC;
$func$ LANGUAGE SQL;
The query plan is:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)"
" Sort Key: auth_web_events.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using auth_user_pkey on auth_user (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)"
" Index Cond: (id = 2)"
" -> Index Scan using customers_id_idx on customers (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)"
" Index Cond: (id = auth_user.customer_id_fk)"
"Planning time: 0.369 ms"
"Execution time: 61.965 ms"
I'm calling the function this way:
SELECT * from get_web_events_by_userid(2)
The query plan for the function:
"Function Scan on get_web_events_by_userid (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)"
"Planning time: 0.038 ms"
"Execution time: 279107.175 ms"
EDIT: I just changed the parameters, and the issue persists.
EDIT 2: Query plan for Erwin's answer:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)"
" Sort Key: w.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)"
" -> Index Scan using auth_user_pkey on auth_user u (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)"
" Index Cond: (id = 2)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events w (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using customers_id_idx on customers c (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)"
" Index Cond: (id = u.customer_id_fk)"
"Planning time: 0.541 ms"
"Execution time: 0.101 ms"
user
While rewriting your function I realized that you added column aliases here:
SELECT
...
auth_user.email AS user,
customers.name AS customer,
... which wouldn't do anything to begin with, since those aliases are invisible outside the function and are not referenced inside it, so they would be ignored. For documentation purposes, better to use a comment.
But it also makes your query invalid, because user is a completely reserved word and cannot be used as column alias unless double-quoted.
Oddly, in my tests the function seems to work with the invalid alias. Probably because it is ignored (?). But I am not sure this couldn't have side effects.
Your function rewritten (otherwise equivalent):
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
RETURNS TABLE (
id int
, time_stamp timestamptz
, description text
, origin text
, userlogin text
, customer text
, client_ip inet
)
LANGUAGE sql STABLE AS
$func$
SELECT w.id
, w.time_stamp
, w.description
, w.origin
, u.email -- AS user -- make this a comment!
, c.name -- AS customer
, w.client_ip
FROM public.auth_user u
JOIN public.auth_web_events w ON w.user_id_fk = u.id
JOIN public.customers c ON c.id = u.customer_id_fk
WHERE u.id = $1 -- reverted the logic here
ORDER BY w.id DESC
$func$;
Obviously, the STABLE keyword changed the outcome. Function volatility should not be an issue in the test situation you describe. The setting does not normally profit a single, isolated function call. Read details in the manual. Also, standard EXPLAIN does not show query plans for what's going on inside functions. You could employ the additional module auto-explain for that:
Postgres query plan of a UDF invocation written in plpgsql
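For example, loading auto_explain in a session and enabling nested-statement logging makes the plans of the queries executed inside the function show up in the server log. A sketch (parameter names as in the auto_explain documentation):
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_nested_statements = on;  -- include statements executed inside functions
SET auto_explain.log_analyze = on;            -- include actual row counts and timings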
You have a very odd data distribution:
auth_web_events table has 100000000 records, auth_user->2 records, customers-> 1 record
Since you didn't define otherwise, the function assumes an estimate of 1000 rows to be returned. But your function is actually returning only 2 rows. If all your calls only return (in the vicinity of) 2 rows, just declare that with an added ROWS 2. Might change the query plan for the VOLATILE variant as well (even if STABLE is the right choice anyway here).
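A sketch of that: the row estimate can be attached to the existing function without recreating it, or declared directly in CREATE FUNCTION:
ALTER FUNCTION get_web_events_by_userid(int) ROWS 2;
-- or, inside the definition itself:
-- ... LANGUAGE sql STABLE ROWS 2 AS $func$ ... $func$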
You will get better performance by making this query dynamic and using plpgsql.
CREATE OR REPLACE FUNCTION get_web_events_by_userid(uid int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS $$
BEGIN
RETURN QUERY EXECUTE
'SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email,
customers.name AS customer,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = $1
ORDER BY
auth_web_events.id DESC'
USING uid;
END;
$$ LANGUAGE plpgsql;