Reduce the execution time in PostgreSQL

Here is my query. It takes 60141 ms to execute, and I want to reduce that, but I don't know what to do. I have posted the EXPLAIN (BUFFERS, ANALYZE) output of the query below. Please help with this.
EXPLAIN (BUFFERS, ANALYZE) SELECT id
FROM activitylog
WHERE (url = '/staff/save/117' OR url = '/staff/create/117')
  AND timestamp > '1990-01-01 00:00:00'
  AND userid IN (SELECT id
                 FROM users
                 WHERE companyid = (SELECT companyid
                                    FROM users
                                    WHERE id = '150'))
ORDER BY timestamp DESC
Output:
Sort  (cost=934879.83..934879.83 rows=1 width=12) (actual time=63918.947..63918.948 rows=4 loops=1)
  Sort Key: activitylog."timestamp"
  Sort Method: quicksort  Memory: 25kB
  Buffers: shared hit=168161 read=561433
  InitPlan 1 (returns $0)
    ->  Index Scan using "usersPrimary" on users users_1  (cost=0.14..8.16 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
          Index Cond: (id = 150)
          Buffers: shared hit=2
  ->  Nested Loop  (cost=0.00..934871.66 rows=1 width=12) (actual time=63918.693..63918.917 rows=4 loops=1)
        Join Filter: (activitylog.userid = users.id)
        Rows Removed by Join Filter: 400
        Buffers: shared hit=168158 read=561433
        ->  Seq Scan on users  (cost=0.00..10.53 rows=25 width=4) (actual time=0.018..0.085 rows=101 loops=1)
              Filter: (companyid = $0)
              Rows Removed by Filter: 114
              Buffers: shared hit=10
        ->  Materialize  (cost=0.00..934860.39 rows=2 width=16) (actual time=120.024..632.858 rows=4 loops=101)
              Buffers: shared hit=168148 read=561433
              ->  Seq Scan on activitylog  (cost=0.00..934860.38 rows=2 width=16) (actual time=12122.376..63918.564 rows=4 loops=1)
                    Filter: (("timestamp" > '2019-01-02 19:19:12.649837+00'::timestamp with time zone) AND (((url)::text = '/jobs/save/81924'::text) OR ((url)::text = '/jobs/create/81924'::text)))
                    Rows Removed by Filter: 11935833
                    Buffers: shared hit=168148 read=561433
Planning time: 0.806 ms
Execution time: 63919.748 ms
Thanks in advance.

Try this one. Rewriting the OR as an IN list and expressing the subquery as a JOIN inside EXISTS gives the planner more freedom to use indexes and can reduce execution time:
SELECT id
FROM activitylog
WHERE url IN ('/staff/save/117', '/staff/create/117')
  AND timestamp > '1990-01-01 00:00:00'
  AND EXISTS
      (SELECT 1
       FROM users AS u
       JOIN users ur ON ur.companyid = u.companyid
       WHERE ur.id = '150'
         AND u.id = activitylog.userid)
ORDER BY timestamp DESC
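If the planner still chooses a sequential scan on activitylog after this rewrite, supporting indexes for the EXISTS probe may help as well. A minimal sketch, assuming you are free to add indexes (the index names here are made up):
CREATE INDEX activitylog_userid_idx ON activitylog (userid);  -- speeds the correlated probe on userid
CREATE INDEX users_companyid_idx ON users (companyid);        -- finds a company's users without a seq scan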

You should rewrite the query using IN instead of OR, as Pranesh Janarthanan's answer suggests, because OR is a performance killer.
In addition, you need indexes to avoid the expensive sequential scan on activitylog:
CREATE INDEX ON activitylog (timestamp);
CREATE INDEX ON activitylog (url);
Which of these indexes you need depends on how selective the respective conditions are.
Try with each of these indexes and with both (which may give you a “Bitmap And”) and keep what works best.
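Another variant worth adding to that test is a composite index, which can serve the url equality and the timestamp range condition in a single scan; a sketch with an arbitrary index name:
-- Equality column first, range column second:
CREATE INDEX activitylog_url_timestamp_idx ON activitylog (url, timestamp);
With url first, each value of the IN list maps to a contiguous part of the index that is already ordered by timestamp.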

You should make sure you have the right indexes. Also, let's optimize the query:
SELECT id
FROM activitylog
WHERE url IN ('/staff/save/117', '/staff/create/117')
  AND timestamp > '1990-01-01 00:00:00'
  AND (
        userid = 150
        OR EXISTS
           (SELECT 1
            FROM users workmate
            JOIN users u150
              ON workmate.companyid = u150.companyid
             AND u150.id = 150
             AND activitylog.userid = workmate.id)
      )
ORDER BY timestamp DESC

Related

Postgres not using index with ORDER BY and LIMIT when LIMIT is above X

I have been trying to debug an issue with Postgres where it decides not to use an index when LIMIT is above a specific value.
For example, I have a table of 150k rows, and when searching with a LIMIT of 286 it uses the index, while with a LIMIT above 286 it does not.
LIMIT 286 uses index
db=# explain (analyze, buffers) SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy=z.id) WHERE z.int_col=2000 AND z.string_col='temp_string' ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 286;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.56..5024.12 rows=286 width=810) (actual time=0.030..0.992 rows=286 loops=1)
  Buffers: shared hit=921
  ->  Nested Loop  (cost=0.56..16968.23 rows=966 width=810) (actual time=0.030..0.977 rows=286 loops=1)
        Join Filter: (r.id_tempy = z.id)
        Rows Removed by Join Filter: 624
        Buffers: shared hit=921
        ->  Index Scan using tempz_tempx_name_type_id_idx on tempx r  (cost=0.42..14357.69 rows=173878 width=373) (actual time=0.016..0.742 rows=910 loops=1)
              Buffers: shared hit=919
        ->  Materialize  (cost=0.14..2.37 rows=1 width=409) (actual time=0.000..0.000 rows=1 loops=910)
              Buffers: shared hit=2
              ->  Index Scan using tempy_string_col_idx on tempy z  (cost=0.14..2.37 rows=1 width=409) (actual time=0.007..0.008 rows=1 loops=1)
                    Index Cond: (string_col = 'temp_string'::text)
                    Filter: (int_col = 2000)
                    Buffers: shared hit=2
Planning Time: 0.161 ms
Execution Time: 1.032 ms
(16 rows)
vs.
LIMIT 287 does a sort
db=# explain (analyze, buffers) SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy=z.id) WHERE z.int_col=2000 AND z.string_col='temp_string' ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 287;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=4976.86..4977.58 rows=287 width=810) (actual time=49.802..49.828 rows=287 loops=1)
  Buffers: shared hit=37154
  ->  Sort  (cost=4976.86..4979.27 rows=966 width=810) (actual time=49.801..49.813 rows=287 loops=1)
        Sort Key: r.name, r.type, r.id
        Sort Method: top-N heapsort  Memory: 506kB
        Buffers: shared hit=37154
        ->  Nested Loop  (cost=0.42..4932.59 rows=966 width=810) (actual time=0.020..27.973 rows=51914 loops=1)
              Buffers: shared hit=37154
              ->  Seq Scan on tempy z  (cost=0.00..12.70 rows=1 width=409) (actual time=0.006..0.008 rows=1 loops=1)
                    Filter: ((int_col = 2000) AND (string_col = 'temp_string'::text))
                    Rows Removed by Filter: 2
                    Buffers: shared hit=1
              ->  Index Scan using tempx_id_tempy_idx on tempx r  (cost=0.42..4340.30 rows=57959 width=373) (actual time=0.012..17.075 rows=51914 loops=1)
                    Index Cond: (id_tempy = z.id)
                    Buffers: shared hit=37153
Planning Time: 0.258 ms
Execution Time: 49.907 ms
(17 rows)
Update:
This is Postgres 11, and VACUUM ANALYZE is run daily. Also, I have already tried using a CTE to remove the filter, but the problem is specifically the sorting:
->  Sort  (cost=4976.86..4979.27 rows=966 width=810) (actual time=49.801..49.813 rows=287 loops=1)
      Sort Key: r.name, r.type, r.id
      Sort Method: top-N heapsort  Memory: 506kB
      Buffers: shared hit=37154
Update 2:
After running VACUUM ANALYZE the database starts using the index for some hours and then it goes back to not using it.
It turns out that I can force Postgres to avoid sorting by running SET enable_sort TO OFF;. This makes the estimated cost of sorting very high, which causes the planner to do the index scan instead.
I am not really sure why Postgres thinks the index scan is so costly (cost=0.42..14357.69) yet estimates sorting as cheaper and ends up choosing it. It is also very odd that immediately after a VACUUM ANALYZE it estimates the costs correctly, but after some hours it goes back to sorting.
With sorting disabled the plan is still not optimal, since it materializes and loads data into memory, but it is still faster than sorting.
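A transaction-scoped version of that experiment keeps the setting from leaking into the rest of the session; a sketch using the same query:
BEGIN;
SET LOCAL enable_sort TO off;  -- applies only until COMMIT/ROLLBACK
SELECT *
FROM tempz.tempx AS r
INNER JOIN tempz.tempy AS z ON r.id_tempy = z.id
WHERE z.int_col = 2000
  AND z.string_col = 'temp_string'
ORDER BY r.name ASC, r.type ASC, r.id ASC
LIMIT 287;
COMMIT;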

Postgres function slower than same ad hoc query

I have had several cases where a Postgres function that returns a table result from a query is much slower than running the query directly. Why is that?
This is one example, but I've found the function to be slower than the plain query in many cases.
create function trending_names(date_start timestamp with time zone, date_end timestamp with time zone, gender_filter character, country_filter text)
returns TABLE(name_id integer, gender character, country text, score bigint, rank bigint)
language sql
as
$$
select u.name_id,
n.gender,
u.country,
count(u.rank) as score,
row_number() over (order by count(u.rank) desc) as rank
from babynames.user_scores u
inner join babynames.names n on u.name_id = n.id
where u.created_at between date_start and date_end
and u.rank > 0
and n.gender = gender_filter
and u.country = country_filter
group by u.name_id, n.gender, u.country
$$;
This is the query plan for a select from the function:
Function Scan on trending_names  (cost=0.25..10.25 rows=1000 width=84) (actual time=1118.673..1118.861 rows=2238 loops=1)
  Buffers: shared hit=216509 read=29837
Planning Time: 0.078 ms
Execution Time: 1119.083 ms
Query plan from just running the query. This takes less than half the time.
WindowAgg  (cost=44834.98..45593.32 rows=43334 width=25) (actual time=383.387..385.223 rows=2238 loops=1)
  Buffers: shared hit=100446 read=50220
  ->  Sort  (cost=44834.98..44943.31 rows=43334 width=17) (actual time=383.375..383.546 rows=2238 loops=1)
        Sort Key: (count(u.rank)) DESC
        Sort Method: quicksort  Memory: 271kB
        Buffers: shared hit=100446 read=50220
        ->  HashAggregate  (cost=41064.22..41497.56 rows=43334 width=17) (actual time=381.088..381.906 rows=2238 loops=1)
              Group Key: u.name_id, u.country, n.gender
              Buffers: shared hit=100446 read=50220
              ->  Hash Join  (cost=5352.15..40630.88 rows=43334 width=13) (actual time=60.710..352.646 rows=36271 loops=1)
                    Hash Cond: (u.name_id = n.id)
                    Buffers: shared hit=100446 read=50220
                    ->  Index Scan using user_scores_rank_ix on user_scores u  (cost=0.43..35077.55 rows=76796 width=11) (actual time=24.193..287.393 rows=69770 loops=1)
                          Index Cond: (rank > 0)
                          Filter: ((created_at >= '2021-01-01 00:00:00+00'::timestamp with time zone) AND (country = 'sv'::text) AND (created_at <= now()))
                          Rows Removed by Filter: 106521
                          Buffers: shared hit=99417 read=46856
                    ->  Hash  (cost=5005.89..5005.89 rows=27667 width=6) (actual time=36.420..36.420 rows=27472 loops=1)
                          Buckets: 32768  Batches: 1  Memory Usage: 1330kB
                          Buffers: shared hit=1029 read=3364
                          ->  Seq Scan on names n  (cost=0.00..5005.89 rows=27667 width=6) (actual time=0.022..24.447 rows=27472 loops=1)
                                Filter: (gender = 'f'::bpchar)
                                Rows Removed by Filter: 21559
                                Buffers: shared hit=1029 read=3364
Planning Time: 2.512 ms
Execution Time: 387.403 ms
I'm also confused about why the last step does a Seq Scan on names n, since names.id is the primary key and gender is indexed.
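One way to see the plan actually chosen inside the function is the auto_explain contrib module, which can log plans for nested statements; a sketch, assuming superuser access, with parameter values taken from the plan above:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;          -- log every statement
SET auto_explain.log_analyze = true;            -- include actual times and row counts
SET auto_explain.log_nested_statements = true;  -- cover statements run inside functions
-- The plan used inside the function call is then written to the server log:
SELECT * FROM trending_names('2021-01-01', now(), 'f', 'sv');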

Postgres optimization failing to filter window function partitions early

In some cases, PostgreSQL does not filter out window function partitions until after they are calculated, while in a very similar scenario it filters rows before performing the window function calculation.
Tables used for a minimal reproduction: log is the main data table; each row contains either an incremental or an absolute value, and an absolute value resets the running counter to a new base value. The window function needs to process all logs for a given account_id to calculate the correct running total. The view uses a subquery to ensure that the underlying log rows are not filtered by ts, since that would break the window function.
CREATE TABLE account(
  id serial,
  name VARCHAR(100)
);
CREATE TABLE log(
  id serial,
  absolute int,
  incremental int,
  account_id int,
  ts timestamp,
  PRIMARY KEY(id),
  CONSTRAINT fk_account
    FOREIGN KEY(account_id)
    REFERENCES account(id)
);
CREATE FUNCTION get_running_total_func(
  aggregated_total int,
  absolute int,
  incremental int
) RETURNS int
LANGUAGE sql IMMUTABLE CALLED ON NULL INPUT AS
$$
SELECT
  CASE
    -- an absolute value resets the running total; otherwise add the increment
    WHEN absolute IS NOT NULL THEN absolute
    ELSE COALESCE(aggregated_total, 0) + incremental
  END
$$;
CREATE AGGREGATE get_running_total(integer, integer) (
  sfunc = get_running_total_func,
  stype = integer
);
Slow view:
CREATE VIEW test_view
(
  log_id,
  running_value,
  account_id,
  ts
)
AS
SELECT log_running.* FROM
  (SELECT
     log.id,
     get_running_total(
       log.absolute,
       log.incremental
     )
     OVER(
       PARTITION BY log.account_id
       ORDER BY log.ts RANGE UNBOUNDED PRECEDING
     ),
     account.id,
     ts
   FROM log log JOIN account account ON log.account_id = account.id
  ) AS log_running;
CREATE VIEW
postgres=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE account_id=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on log_running  (cost=12734.02..15981.48 rows=1 width=20) (actual time=7510.851..16122.404 rows=20 loops=1)
  Filter: (log_running.id_1 = 1)
  Rows Removed by Filter: 99902
  ->  WindowAgg  (cost=12734.02..14732.46 rows=99922 width=32) (actual time=7510.830..14438.783 rows=99922 loops=1)
        ->  Sort  (cost=12734.02..12983.82 rows=99922 width=28) (actual time=7510.628..9312.399 rows=99922 loops=1)
              Sort Key: log.account_id, log.ts
              Sort Method: external merge  Disk: 3328kB
              ->  Hash Join  (cost=143.50..2042.24 rows=99922 width=28) (actual time=169.941..5431.650 rows=99922 loops=1)
                    Hash Cond: (log.account_id = account.id)
                    ->  Seq Scan on log  (cost=0.00..1636.22 rows=99922 width=24) (actual time=0.063..1697.802 rows=99922 loops=1)
                    ->  Hash  (cost=81.00..81.00 rows=5000 width=4) (actual time=169.837..169.865 rows=5000 loops=1)
                          Buckets: 8192  Batches: 1  Memory Usage: 240kB
                          ->  Seq Scan on account  (cost=0.00..81.00 rows=5000 width=4) (actual time=0.017..84.639 rows=5000 loops=1)
Planning Time: 0.199 ms
Execution Time: 16127.275 ms
(15 rows)
Fast view - only change is account.id -> log.account_id (!):
CREATE VIEW test_view
(
  log_id,
  running_value,
  account_id,
  ts
)
AS
SELECT log_running.* FROM
  (SELECT
     log.id,
     get_running_total(
       log.absolute,
       log.incremental
     )
     OVER(
       PARTITION BY log.account_id
       ORDER BY log.ts RANGE UNBOUNDED PRECEDING
     ),
     log.account_id,
     ts
   FROM log log JOIN account account ON log.account_id = account.id
  ) AS log_running;
CREATE VIEW
postgres=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE account_id=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on log_running  (cost=1894.96..1895.56 rows=20 width=20) (actual time=34.718..45.958 rows=20 loops=1)
  ->  WindowAgg  (cost=1894.96..1895.36 rows=20 width=28) (actual time=34.691..45.307 rows=20 loops=1)
        ->  Sort  (cost=1894.96..1895.01 rows=20 width=24) (actual time=34.367..35.925 rows=20 loops=1)
              Sort Key: log.ts
              Sort Method: quicksort  Memory: 26kB
              ->  Nested Loop  (cost=0.28..1894.53 rows=20 width=24) (actual time=0.542..34.066 rows=20 loops=1)
                    ->  Index Only Scan using account_pkey on account  (cost=0.28..8.30 rows=1 width=4) (actual time=0.025..0.054 rows=1 loops=1)
                          Index Cond: (id = 1)
                          Heap Fetches: 1
                    ->  Seq Scan on log  (cost=0.00..1886.03 rows=20 width=24) (actual time=0.195..32.937 rows=20 loops=1)
                          Filter: (account_id = 1)
                          Rows Removed by Filter: 99902
Planning Time: 0.297 ms
Execution Time: 47.300 ms
(14 rows)
Is this a bug in the PostgreSQL implementation? It seems that this change in the view definition shouldn't affect performance at all; PostgreSQL should be able to filter the data before applying the window function in both cases.

Need to reduce the query optimization time in postgres

Use case: I need to find the index and totalCount of a particular id in the table.
I have a table ann_details with 60 million records, and based on a where condition I need to retrieve the rows along with the index of that id.
Query:
with a as (
  select an.id, row_number() over (partition by created_at) as rn
  from annotation an
  where (an.layer_id = '47afb169-aed2-4378-ab13-897836275da3' or an.job_id = '' or an.task_id = '')
    and an.category_id in (10019)
)
select (select count(1) from a) as totalCount,
       rn - 1 as index
from a
where a.id = '47afb169-aed2-4378-ab13-897836275da3_a93f0758-8fe0-4c76-992f-0be17e5618bf_484484101';
Output:
totalCount | index
-----------+---------
   1797124 | 1791143
Execution Time: 5 sec 487 ms
EXPLAIN ANALYZE output:
CTE Scan on a  (cost=872778.54..907545.00 rows=7722 width=16) (actual time=5734.572..5735.989 rows=1 loops=1)
  Filter: ((id)::text = '47afb169-aed2-4378-ab13-897836275da3_a93f0758-8fe0-4c76-992f-0be17e5618bf_484484101'::text)
  Rows Removed by Filter: 1797123
  CTE a
    ->  WindowAgg  (cost=0.68..838031.38 rows=1544318 width=97) (actual time=133.660..3831.998 rows=1797124 loops=1)
          ->  Index Only Scan using test_index_test_2 on annotation an  (cost=0.68..814866.61 rows=1544318 width=89) (actual time=133.647..2660.009 rows=1797124 loops=1)
                Index Cond: (category_id = 10019)
                Filter: (((layer_id)::text = '47afb169-aed2-4378-ab13-897836275da3'::text) OR ((job_id)::text = ''::text) OR ((task_id)::text = ''::text))
                Rows Removed by Filter: 3773007
                Heap Fetches: 101650
  InitPlan 2 (returns $1)
    ->  Aggregate  (cost=34747.15..34747.17 rows=1 width=8) (actual time=2397.391..2397.392 rows=1 loops=1)
          ->  CTE Scan on a a_1  (cost=0.00..30886.36 rows=1544318 width=0) (actual time=0.017..2156.210 rows=1797124 loops=1)
Planning time: 0.487 ms
Execution time: 5771.080 ms
Index:
CREATE INDEX test_index_test_2 ON public.annotation USING btree (category_id,created_at,layer_id,job_id,task_id,id);
From the application we will pass one of job_id, task_id, or layer_id, and the other two will be passed as empty strings.
I need help optimizing the query so that it responds within 2 seconds.
Query Plan: https://explain.depesz.com/s/mXme
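About 2.4 seconds of the runtime goes to InitPlan 2, which scans the CTE a second time just to compute totalCount. A window aggregate can produce the count in the same pass; a sketch of that rewrite (same filters, untested against this data):
with a as (
  select an.id,
         row_number() over (partition by created_at) as rn,
         count(*) over () as total_count  -- counted in the same pass, no second CTE scan
  from annotation an
  where (an.layer_id = '47afb169-aed2-4378-ab13-897836275da3' or an.job_id = '' or an.task_id = '')
    and an.category_id in (10019)
)
select total_count as totalCount,
       rn - 1 as index
from a
where a.id = '47afb169-aed2-4378-ab13-897836275da3_a93f0758-8fe0-4c76-992f-0be17e5618bf_484484101';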

Postgres Query Optimization w/ simple join

I have the following query:
SELECT "person_dimensions"."dimension"
FROM "person_dimensions"
join users
on users.id = person_dimensions.user_id
where users.team_id = 2
The following is the result of EXPLAIN ANALYZE:
Nested Loop  (cost=0.43..93033.84 rows=452 width=11) (actual time=1245.321..42915.426 rows=827 loops=1)
  ->  Seq Scan on person_dimensions  (cost=0.00..254.72 rows=13772 width=15) (actual time=0.022..9.907 rows=13772 loops=1)
  ->  Index Scan using users_pkey on users  (cost=0.43..6.73 rows=1 width=4) (actual time=2.978..3.114 rows=0 loops=13772)
        Index Cond: (id = person_dimensions.user_id)
        Filter: (team_id = 2)
        Rows Removed by Filter: 1
Planning time: 0.396 ms
Execution time: 42915.678 ms
Indexes exist on person_dimensions.user_id and users.team_id, so it is unclear why this seemingly simple query takes so long.
Maybe it has something to do with team_id not being usable in the join condition? Any ideas how to speed this up?
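One experiment that can isolate the join from the team filter is an EXISTS that probes users per row; a sketch (equivalent result, since users.id is the primary key):
SELECT pd.dimension
FROM person_dimensions pd
WHERE EXISTS (SELECT 1
              FROM users u
              WHERE u.id = pd.user_id
                AND u.team_id = 2);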
EDIT:
I tried this query:
SELECT "person_dimensions"."dimension"
FROM "person_dimensions"
join users ON users.id = person_dimensions.user_id
WHERE users.id IN (2337,2654,3501,56,4373,1060,3170,97,4629,41,3175,4541,2827)
which contains the ids returned by the subquery:
SELECT id FROM users WHERE team_id = 2
The result was 380ms versus 42s as above. I could use this as a workaround, but I am really curious as to what is going on here...
I rebooted my DB server yesterday, and when it came back up this same query was performing as expected with a completely different query plan that used expected indices:
QUERY PLAN
Hash Join  (cost=1135.63..1443.45 rows=84 width=11) (actual time=0.354..6.312 rows=835 loops=1)
  Hash Cond: (person_dimensions.user_id = users.id)
  ->  Seq Scan on person_dimensions  (cost=0.00..255.17 rows=13817 width=15) (actual time=0.002..2.764 rows=13902 loops=1)
  ->  Hash  (cost=1132.96..1132.96 rows=214 width=4) (actual time=0.175..0.175 rows=60 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 11kB
        ->  Bitmap Heap Scan on users  (cost=286.07..1132.96 rows=214 width=4) (actual time=0.032..0.157 rows=60 loops=1)
              Recheck Cond: (team_id = 2)
              Heap Blocks: exact=68
              ->  Bitmap Index Scan on index_users_on_team_id  (cost=0.00..286.02 rows=214 width=0) (actual time=0.021..0.021 rows=82 loops=1)
                    Index Cond: (team_id = 2)
Planning time: 0.215 ms
Execution time: 6.474 ms
Does anyone have any ideas why a reboot was required for the planner to pick this up? Could it be that manual vacuums that hadn't been done in a while were needed, or something like that? Recall that I did run ANALYZE on the relevant tables before the reboot, and it didn't change anything.
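To check whether skipped (auto)vacuums or stale statistics explain the plan flip, the statistics views record when each table was last vacuumed and analyzed; a sketch:
-- Last manual and automatic vacuum/analyze per table:
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('users', 'person_dimensions');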