Search results using ts_vectors and ST_Distance - postgresql

We have a page where we show a list of results, and the results must be relevant given 2 factors:
keyword similarity
location
We are using PostgreSQL with PostGIS and tsvector columns; however, we don't know how to combine the scores coming out of ts_rank and ST_Distance to get the "best" search results, and the queries seem to take between 30 seconds and 1 minute.
SELECT
    ts_rank_cd(doc_vectors, plainto_tsquery('Uber '), 1 | 4 | 32) AS rank,
    ts_headline('english', short_job_description, plainto_tsquery('Uber '), 'MaxWords=80,MinWords=50'),
    -- a bunch of fields omitted...
    org.logo
FROM jobs.job AS job
LEFT OUTER JOIN jobs.organization AS org
    ON job.organization_id = org.id
WHERE job.is_expired = 0
  AND deleted_at IS NULL
  AND doc_vectors @@ plainto_tsquery('Uber ')
ORDER BY rank DESC
OFFSET 80 LIMIT 20;
Do you guys have suggestions for us?
EXPLAIN (ANALYZE, BUFFERS) for same Query:
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=886908.73..886908.81 rows=30 width=1108) (actual time=20684.508..20684.518 rows=30 loops=1)
Buffers: shared hit=1584 read=825114
-> Sort (cost=886908.68..889709.48 rows=1120318 width=1108) (actual time=20684.502..20684.509 rows=50 loops=1)
Sort Key: job.created_at DESC
Sort Method: top-N heapsort Memory: 75kB
Buffers: shared hit=1584 read=825114
-> Hash Left Join (cost=421.17..849692.52 rows=1120318 width=1108) (actual time=7.012..18887.816 rows=1111019 loops=1)
Hash Cond: (job.organization_id = org.id)
Buffers: shared hit=1581 read=825114
-> Seq Scan on job (cost=0.00..846329.53 rows=1120318 width=1001) (actual time=0.052..17866.594 rows=1111019 loops=1)
Filter: ((deleted_at IS NULL) AND (is_expired = 0) AND (is_hidden = 0))
Rows Removed by Filter: 196298
Buffers: shared hit=1564 read=824989
-> Hash (cost=264.41..264.41 rows=12541 width=107) (actual time=6.898..6.899 rows=12541 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 1037kB
Buffers: shared hit=14 read=125
-> Seq Scan on organization org (cost=0.00..264.41 rows=12541 width=107) (actual time=0.021..3.860 rows=12541 loops=1)
Buffers: shared hit=14 read=125
Planning time: 2.223 ms
Execution time: 20684.682 ms
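A hedged sketch of one way to combine the two signals, assuming things not in the original post (an id column and a geography column location on jobs.job, plus a hard-coded search origin): turn distance into a 0-1 score and take a weighted sum with the text rank.

-- Sketch only: blend the text rank with an inverse-distance score.
-- Assumptions (not from the original post): jobs.job has an "id" column
-- and a geography column "location"; the search origin is hard-coded.
SELECT
    job.id,
    0.7 * ts_rank_cd(doc_vectors, plainto_tsquery('Uber'), 1 | 4 | 32)
  + 0.3 * (1.0 / (1.0 + ST_Distance(job.location,
                                    ST_MakePoint(-73.98, 40.74)::geography) / 1000.0)) AS score
FROM jobs.job AS job
WHERE job.is_expired = 0
  AND deleted_at IS NULL
  AND doc_vectors @@ plainto_tsquery('Uber')
ORDER BY score DESC
LIMIT 20;
-- ST_Distance on geography returns meters, so the /1000.0 makes the score
-- decay per kilometre; tune the 0.7/0.3 weights to taste.

Separately from scoring, the plan above spends its time in a sequential scan over roughly 1.1M rows; if doc_vectors is a tsvector column, a GIN index on it (CREATE INDEX ON jobs.job USING gin (doc_vectors);) should let the @@ predicate use an index, which will likely matter more than the ranking formula.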

Efficiently querying view with left join

PostgreSQL 14.6 on x86_64-pc-linux-gnu, compiled by gcc, 64-bit
I have an organizations table and a (much smaller) partner_members table that associates some organizations with a partner_member_id.
There is also a convenience view to list organizations with their (potential) partner IDs, defined like this:
select
o.id,
o.name,
o.email,
o.created,
p.member_id AS partner_member_id
from organizations o
left join partner_members p on o.id = p.organization_id
However, this leads to an admin query that queries this view ending up like this:
select count(*) OVER (),"id","name","email","created"
from (
select
o.id,
o.name,
o.email,
o.created,
p.member_id AS partner_member_id
from organizations o
left join partner_members p on o.id = p.organization_id
) _
where ("name" ilike '%example#example.com%')
or ("email" ilike '%example#example.com%')
or ("partner_member_id" ilike '%example#example.com%')
or ("id" ilike '%example#example.com%')
order by "created" desc
offset 0 limit 50;
… which is super slow, since the partner_member_id constraint isn't "pushed down" into the subquery, which means the filtering happens way too late.
Is there a way to make a query like this efficient, or is this convenience view a no-go here?
Here is the plan:
Limit (cost=12842.32..12848.77 rows=50 width=74) (actual time=2344.828..2385.234 rows=0 loops=1)
Buffers: shared hit=5246, temp read=3088 written=3120
-> WindowAgg (cost=12842.32..12853.80 rows=89 width=74) (actual time=2344.826..2385.232 rows=0 loops=1)
Buffers: shared hit=5246, temp read=3088 written=3120
-> Gather Merge (cost=12842.32..12852.69 rows=89 width=66) (actual time=2344.822..2385.226 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5246, temp read=3088 written=3120
-> Sort (cost=11842.30..11842.39 rows=37 width=66) (actual time=2322.988..2323.050 rows=0 loops=3)
Sort Key: o.created DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5246, temp read=3088 written=3120
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Hash Left Join (cost=3368.61..11841.33 rows=37 width=66) (actual time=2322.857..2322.917 rows=0 loops=3)
Hash Cond: ((o.id)::text = p.organization_id)
Filter: (((o.name)::text ~~* '%example#example.com%'::text) OR ((o.email)::text ~~* '%example#example.com%'::text) OR (p.member_id ~~* '%example#example.com%'::text) OR ((o.id)::text ~~* '%example#example.com%'::text))
Rows Removed by Filter: 73800
Buffers: shared hit=5172, temp read=3088 written=3120
-> Parallel Seq Scan on organizations o (cost=0.00..4813.65 rows=92365 width=66) (actual time=0.020..200.111 rows=73800 loops=3)
Buffers: shared hit=3890
-> Parallel Hash (cost=1926.05..1926.05 rows=71005 width=34) (actual time=108.608..108.610 rows=40150 loops=3)
Buckets: 32768 Batches: 4 Memory Usage: 2432kB
Buffers: shared hit=1216, temp written=620
-> Parallel Seq Scan on partner_members p (cost=0.00..1926.05 rows=71005 width=34) (actual time=0.028..43.757 rows=40150 loops=3)
Buffers: shared hit=1216
Planning:
Buffers: shared hit=24
Planning Time: 1.837 ms
Execution Time: 2385.319 ms
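A hedged side note on the filters themselves: a btree index can never serve ILIKE '%…%', but trigram GIN indexes from the pg_trgm extension can. Because the OR spans columns from both sides of the join, these indexes are a building block rather than a guaranteed fix (the query may also need to be rewritten as a UNION of per-table filters for the planner to use them):

-- Sketch: trigram GIN indexes make leading-wildcard ILIKE indexable at all.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX organizations_name_trgm ON organizations USING gin (name gin_trgm_ops);
CREATE INDEX organizations_email_trgm ON organizations USING gin (email gin_trgm_ops);
CREATE INDEX partner_members_member_id_trgm ON partner_members USING gin (member_id gin_trgm_ops);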

Postgres not using index with ORDER BY and LIMIT when LIMIT is above X

I have been trying to debug an issue with Postgres where it decides not to use an index when LIMIT is above a specific value.
For example, I have a table of 150k rows; when searching with a LIMIT of 286 it uses the index, while with a LIMIT above 286 it does not.
LIMIT 286 uses index
db=# explain (analyze, buffers) SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy=z.id) WHERE z.int_col=2000 AND z.string_col='temp_string' ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 286;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..5024.12 rows=286 width=810) (actual time=0.030..0.992 rows=286 loops=1)
Buffers: shared hit=921
-> Nested Loop (cost=0.56..16968.23 rows=966 width=810) (actual time=0.030..0.977 rows=286 loops=1)
Join Filter: (r.id_tempy = z.id)
Rows Removed by Join Filter: 624
Buffers: shared hit=921
-> Index Scan using tempz_tempx_name_type_id_idx on tempx r (cost=0.42..14357.69 rows=173878 width=373) (actual time=0.016..0.742 rows=910 loops=1)
Buffers: shared hit=919
-> Materialize (cost=0.14..2.37 rows=1 width=409) (actual time=0.000..0.000 rows=1 loops=910)
Buffers: shared hit=2
-> Index Scan using tempy_string_col_idx on tempy z (cost=0.14..2.37 rows=1 width=409) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: (string_col = 'temp_string'::text)
Filter: (int_col = 2000)
Buffers: shared hit=2
Planning Time: 0.161 ms
Execution Time: 1.032 ms
(16 rows)
vs.
LIMIT 287 doing sort
db=# explain (analyze, buffers) SELECT * FROM tempz.tempx AS r INNER JOIN tempz.tempy AS z ON (r.id_tempy=z.id) WHERE z.int_col=2000 AND z.string_col='temp_string' ORDER BY r.name ASC, r.type ASC, r.id ASC LIMIT 287;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4976.86..4977.58 rows=287 width=810) (actual time=49.802..49.828 rows=287 loops=1)
Buffers: shared hit=37154
-> Sort (cost=4976.86..4979.27 rows=966 width=810) (actual time=49.801..49.813 rows=287 loops=1)
Sort Key: r.name, r.type, r.id
Sort Method: top-N heapsort Memory: 506kB
Buffers: shared hit=37154
-> Nested Loop (cost=0.42..4932.59 rows=966 width=810) (actual time=0.020..27.973 rows=51914 loops=1)
Buffers: shared hit=37154
-> Seq Scan on tempy z (cost=0.00..12.70 rows=1 width=409) (actual time=0.006..0.008 rows=1 loops=1)
Filter: ((int_col = 2000) AND (string_col = 'temp_string'::text))
Rows Removed by Filter: 2
Buffers: shared hit=1
-> Index Scan using tempx_id_tempy_idx on tempx r (cost=0.42..4340.30 rows=57959 width=373) (actual time=0.012..17.075 rows=51914 loops=1)
Index Cond: (id_tempy = z.id)
Buffers: shared hit=37153
Planning Time: 0.258 ms
Execution Time: 49.907 ms
(17 rows)
Update:
This is Postgres 11, and VACUUM ANALYZE is run daily. Also, I have already tried using a CTE to remove the filter, but the problem is specifically the sorting:
-> Sort (cost=4976.86..4979.27 rows=966 width=810) (actual time=49.801..49.813 rows=287 loops=1)
Sort Key: r.name, r.type, r.id
Sort Method: top-N heapsort Memory: 506kB
Buffers: shared hit=37154
Update 2:
After running VACUUM ANALYZE the database starts using the index for some hours and then it goes back to not using it.
It turns out that I can force Postgres to avoid doing any sort if I run SET enable_sort TO OFF;. This raises the cost of sorting very high, which causes the planner to do an index scan instead.
I am not really sure why Postgres thinks the index scan is so costly (cost=0.42..14357.69) and that sorting is cheaper, and ends up choosing the sort. It is also very weird that immediately after a VACUUM ANALYZE it estimates the costs correctly, but after some hours it goes back to sorting.
With sort off, the plan is still not optimal, as it materializes and loads things into memory, but it is still faster than sorting.
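If forcing the planner ends up being the stopgap, scoping the override to a single transaction keeps it from affecting everything else on the connection; a minimal sketch using the query from above:

BEGIN;
-- SET LOCAL lasts only until COMMIT/ROLLBACK, so the rest of the
-- session keeps the default planner settings.
SET LOCAL enable_sort = off;
SELECT * FROM tempz.tempx AS r
INNER JOIN tempz.tempy AS z ON r.id_tempy = z.id
WHERE z.int_col = 2000 AND z.string_col = 'temp_string'
ORDER BY r.name ASC, r.type ASC, r.id ASC
LIMIT 287;
COMMIT;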

Postgresql Limit 1 causing large number of loops

We have a transactions table, approx. 10m rows and growing. Each customer we have specifies many rules which group certain transactions together based on locations, associated products, sale customer, etc. From these rules we produce reports each night which allow them to see the price the customer is paying for products vs. their purchase prices from different price lists. These lists change daily, and for each date on a transaction we have to find either the customer's set yearly price or the price that was effective at the date of the transaction.
These price lists can change historically, and do all the time, as do the new historic transactions which get added, so within each financial year we have to keep regenerating these reports.
We are having a problem with the two types of price list/price joins we have to do. The first is on the set yearly price list.
I have removed the queries which bring the transactions in and put them into a table called transaction_data_6787.
EXPLAIN analyze
SELECT *
FROM transaction_data_6787 t
inner JOIN LATERAL
(
SELECT p."Price"
FROM "Prices" p
INNER JOIN "PriceLists" pl on p."PriceListId" = pl."Id"
WHERE (pl."CustomerId" = 20)
AND (pl."Year" = 2020)
AND (pl."PriceListTypeId" = 2)
AND p."ProductId" = t.product_id
limit 1
) AS prices ON true
Nested Loop (cost=0.70..133877.20 rows=5394 width=165) (actual time=0.521..193.638 rows=5394 loops=1)
  -> Seq Scan on transaction_data_6787 t (cost=0.00..159.94 rows=5394 width=145) (actual time=0.005..0.593 rows=5394 loops=1)
  -> Limit (cost=0.70..24.77 rows=1 width=20) (actual time=0.035..0.035 rows=1 loops=5394)
-> Nested Loop (cost=0.70..24.77 rows=1 width=20) (actual time=0.035..0.035 rows=1 loops=5394)
-> Index Scan using ix_prices_covering on "Prices" p (cost=0.42..8.44 rows=1 width=16) (actual time=0.006..0.015 rows=23 loops=5394)
Index Cond: (("ProductId" = t.product_id))
-> Index Scan using ix_pricelists_covering on "PriceLists" pl (cost=0.28..8.30 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=122443)
Index Cond: (("Id" = p."PriceListId") AND ("CustomerId" = 20) AND ("PriceListTypeId" = 2))
Filter: ("Year" = 2020)
Rows Removed by Filter: 0
Planning Time: 0.307 ms
Execution Time: 193.982 ms
If I remove the LIMIT 1, the execution time drops to 3 ms and the 122443 loops on ix_pricelists_covering don't happen. The reason we are doing a lateral join is that the price query is dynamically built: sometimes, when not joining on the annual price list, we join on the effective price lists instead. That looks like the below:
EXPLAIN analyze
SELECT *
FROM transaction_data_6787 t
inner JOIN LATERAL
(
SELECT p."Price"
FROM "Prices" p
INNER JOIN "PriceLists" pl on p."PriceListId" = pl."Id"
WHERE (pl."CustomerId" = 20)
AND (pl."PriceListTypeId" = 1)
AND p."ProductId" = t.product_id
and pl."ValidFromDate" <= t.transaction_date
ORDER BY pl."ValidFromDate" desc
limit 1
) AS prices ON true
This is killing our performance: some queries are taking 20+ seconds. When we don't ORDER BY date DESC / LIMIT 1 it completes in milliseconds, but then we could get duplicate prices back.
We are happy to rewrite if there is a better way of joining to the most recent record. We have thousands of price lists and 100k of prices, and there could be hundreds if not thousands of effective prices for each transaction; we need to ensure we get the one which was most recently effective for a product at the date of the transaction.
I have found that if I denormalise the price lists/prices into a single table and add an index with ValidFromDate DESC, it seems to eliminate the loops, but I am hesitant to denormalise and then have to maintain that data: these reports can be run ad hoc as well as in batch jobs, so we would have to maintain that data in real time.
Updated Explain/Analyze:
I've added below the query which joins on the prices that need to be the most recently effective at the transaction date. I see now that when the <= date clause and LIMIT 1 are removed, it actually spins up multiple workers, which is why it seems faster.
I am still seeing the slower query doing a large number of loops, 200k+ (when LIMIT 1 / the <= date clause is included).
Maybe the better question is: what can we do instead of a lateral join that will let us join the effective prices for transactions in the most efficient/performant way possible? I am hoping to avoid denormalising and maintaining that data, but if it is the only way, we'll do it. If there is a way to rewrite this without denormalising, I'd really appreciate any insight.
Nested Loop (cost=14.21..76965.60 rows=5394 width=10) (actual time=408.948..408.950 rows=0 loops=1)
Output: t.transaction_id, pr."Price"
Buffers: shared hit=688022
-> Seq Scan on public.transaction_data_6787 t (cost=0.00..159.94 rows=5394 width=29) (actual time=0.018..0.682 rows=5394 loops=1)
Output: t.transaction_id
Buffers: shared hit=106
-> Limit (cost=14.21..14.22 rows=1 width=10) (actual time=0.075..0.075 rows=0 loops=5394)
Output: pr."Price", pl."ValidFromDate"
Buffers: shared hit=687916
-> Sort (cost=14.21..14.22 rows=1 width=10) (actual time=0.075..0.075 rows=0 loops=5394)
Output: pr."Price", pl."ValidFromDate"
Sort Key: pl."ValidFromDate" DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=687916
-> Nested Loop (cost=0.70..14.20 rows=1 width=10) (actual time=0.074..0.074 rows=0 loops=5394)
Output: pr."Price", pl."ValidFromDate"
Inner Unique: true
Buffers: shared hit=687916
-> Index Only Scan using ix_prices_covering on public."Prices" pr (cost=0.42..4.44 rows=1 width=10) (actual time=0.007..0.019 rows=51 loops=5394)
Output: pr."ProductId", pr."ValidFromDate", pr."Id", pr."Price", pr."PriceListId"
Index Cond: (pr."ProductId" = t.product_id)
Heap Fetches: 0
Buffers: shared hit=17291
-> Index Scan using ix_pricelists_covering on public."PriceLists" pl (cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=273678)
Output: pl."Id", pl."Name", pl."CustomerId", pl."ValidFromDate", pl."PriceListTypeId"
Index Cond: ((pl."Id" = pr."PriceListId") AND (pl."CustomerId" = 20) AND (pl."PriceListTypeId" = 1))
Filter: (pl."ValidFromDate" <= t.transaction_date)
Rows Removed by Filter: 0
Buffers: shared hit=670625
Planning Time: 1.254 ms
Execution Time: 409.088 ms
Gather (cost=6395.67..7011.99 rows=68 width=10) (actual time=92.481..92.554 rows=0 loops=1)
Output: t.transaction_id, pr."Price"
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=1466 read=2
-> Hash Join (cost=5395.67..6005.19 rows=28 width=10) (actual time=75.126..75.129 rows=0 loops=3)
Output: t.transaction_id, pr."Price"
Inner Unique: true
Hash Cond: (pr."PriceListId" = pl."Id")
Join Filter: (pl."ValidFromDate" <= t.transaction_date)
Rows Removed by Join Filter: 41090
Buffers: shared hit=1466 read=2
Worker 0: actual time=64.707..64.709 rows=0 loops=1
Buffers: shared hit=462
Worker 1: actual time=72.545..72.547 rows=0 loops=1
Buffers: shared hit=550 read=1
-> Merge Join (cost=5374.09..5973.85 rows=3712 width=18) (actual time=26.804..61.492 rows=91226 loops=3)
Output: t.transaction_id, t.transaction_date, pr."Price", pr."PriceListId"
Merge Cond: (pr."ProductId" = t.product_id)
Buffers: shared hit=1325 read=2
Worker 0: actual time=17.677..51.590 rows=83365 loops=1
Buffers: shared hit=400
Worker 1: actual time=24.995..59.395 rows=103814 loops=1
Buffers: shared hit=488 read=1
-> Parallel Index Only Scan using ix_prices_covering on public."Prices" pr (cost=0.42..7678.38 rows=79544 width=29) (actual time=0.036..12.136 rows=42281 loops=3)
Output: pr."ProductId", pr."ValidFromDate", pr."Id", pr."Price", pr."PriceListId"
Heap Fetches: 0
Buffers: shared hit=989 read=2
Worker 0: actual time=0.037..9.660 rows=36873 loops=1
Buffers: shared hit=285
Worker 1: actual time=0.058..13.459 rows=47708 loops=1
Buffers: shared hit=373 read=1
-> Sort (cost=494.29..507.78 rows=5394 width=29) (actual time=9.037..14.700 rows=94555 loops=3)
Output: t.transaction_id, t.product_id, t.transaction_date
Sort Key: t.product_id
Sort Method: quicksort Memory: 614kB
Worker 0: Sort Method: quicksort Memory: 614kB
Worker 1: Sort Method: quicksort Memory: 614kB
Buffers: shared hit=336
Worker 0: actual time=6.608..12.034 rows=86577 loops=1
Buffers: shared hit=115
Worker 1: actual time=8.973..14.598 rows=107126 loops=1
Buffers: shared hit=115
-> Seq Scan on public.transaction_data_6787 t (cost=0.00..159.94 rows=5394 width=29) (actual time=0.020..2.948 rows=5394 loops=3)
Output: t.transaction_id, t.product_id, t.transaction_date
Buffers: shared hit=318
Worker 0: actual time=0.017..2.078 rows=5394 loops=1
Buffers: shared hit=106
Worker 1: actual time=0.027..2.976 rows=5394 loops=1
Buffers: shared hit=106
-> Hash (cost=21.21..21.21 rows=30 width=8) (actual time=0.145..0.145 rows=35 loops=3)
Output: pl."Id", pl."ValidFromDate"
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=53
Worker 0: actual time=0.137..0.138 rows=35 loops=1
Buffers: shared hit=18
Worker 1: actual time=0.149..0.150 rows=35 loops=1
Buffers: shared hit=18
-> Bitmap Heap Scan on public."PriceLists" pl (cost=4.59..21.21 rows=30 width=8) (actual time=0.067..0.114 rows=35 loops=3)
Output: pl."Id", pl."ValidFromDate"
Recheck Cond: (pl."CustomerId" = 20)
Filter: (pl."PriceListTypeId" = 1)
Rows Removed by Filter: 6
Heap Blocks: exact=15
Buffers: shared hit=53
Worker 0: actual time=0.068..0.108 rows=35 loops=1
Buffers: shared hit=18
Worker 1: actual time=0.066..0.117 rows=35 loops=1
Buffers: shared hit=18
-> Bitmap Index Scan on "IX_PriceLists_CustomerId" (cost=0.00..4.58 rows=41 width=0) (actual time=0.049..0.049 rows=41 loops=3)
Index Cond: (pl."CustomerId" = 20)
Buffers: shared hit=8
Worker 0: actual time=0.053..0.054 rows=41 loops=1
Buffers: shared hit=3
Worker 1: actual time=0.048..0.048 rows=41 loops=1
Buffers: shared hit=3
Planning Time: 2.236 ms
Execution Time: 92.814 ms
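One rewrite worth trying instead of the lateral, sketched under the assumption that transaction_id uniquely identifies a row in transaction_data_6787: let DISTINCT ON keep just the newest effective price per transaction, so the planner can hash- or merge-join the whole set (as in the fast parallel plan above) instead of running a LIMIT 1 probe per row.

-- Sketch of a DISTINCT ON rewrite: one pass over the join, keeping only
-- the most recently effective price per transaction.
-- Assumes transaction_id is unique per transaction row.
SELECT DISTINCT ON (t.transaction_id)
       t.transaction_id,
       p."Price"
FROM transaction_data_6787 t
JOIN "Prices" p ON p."ProductId" = t.product_id
JOIN "PriceLists" pl ON pl."Id" = p."PriceListId"
WHERE pl."CustomerId" = 20
  AND pl."PriceListTypeId" = 1
  AND pl."ValidFromDate" <= t.transaction_date
ORDER BY t.transaction_id, pl."ValidFromDate" DESC;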

Query taking nested loop instead of hash join

The query below uses a nested loop and runs for 21 minutes; after disabling nested loops it finishes in under 1 minute. Table stats are up to date and VACUUM is run on the tables. Is there any way to figure out why Postgres picks a nested loop instead of the more efficient hash join?
Also, to disable nested loops, is it better to set enable_nestloop to off or to increase random_page_cost? I think setting enable_nestloop to off would stop plans from using a nested loop even in places where it would be efficient. What would be a better alternative? Please advise.
SELECT DISTINCT ON (three.quebec_delta)
last_value(three.reviewed_by_nm) OVER wnd AS reviewed_by_nm,
last_value(three.reviewer_specialty_nm) OVER wnd AS reviewer_specialty_nm,
last_value(three.kilo) OVER wnd AS kilo,
last_value(three.review_reason_dscr) OVER wnd AS review_reason_dscr,
last_value(three.review_notes) OVER wnd AS review_notes,
last_value(three.seven_uniform_charlie) OVER wnd AS seven_uniform_charlie,
last_value(three.di_audit_source_system_cd) OVER wnd AS di_audit_source_system_cd,
last_value(three.di_audit_update_dtm) OVER wnd AS di_audit_update_dtm,
three.quebec_delta
FROM
ods_authorization.quebec_foxtrot seven_uniform_foxtrot
JOIN ods_authorization.golf echo ON seven_uniform_foxtrot.four = echo.oscar
JOIN ods_authorization.papa three ON echo.five = three.quebec_delta
AND three.xray = '0'::bpchar
WHERE
seven_uniform_foxtrot.two_india >= (zulu() - '2 years'::interval)
AND lima(three.kilo, 'ADVISOR'::character varying)::text = 'ADVISOR'::text
WINDOW wnd AS (PARTITION BY three.quebec_delta ORDER BY three.seven_uniform_charlie DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
Plan running for 21 minutes and using a nested loop:
Unique (cost=550047.63..550257.15 rows=5238 width=281) (actual time=1295000.966..1296128.356 rows=319863 loops=1)
-> WindowAgg (cost=550047.63..550244.06 rows=5238 width=281) (actual time=1295000.964..1296013.046 rows=461635 loops=1)
-> Sort (cost=550047.63..550060.73 rows=5238 width=326) (actual time=1295000.929..1295089.796 rows=461635 loops=1)
Sort Key: three.quebec_delta, three.seven_uniform_charlie DESC
Sort Method: quicksort Memory: 197021kB
-> Nested Loop (cost=1001.12..549724.06 rows=5238 width=326) (actual time=8.274..1292470.826 rows=461635 loops=1)
-> Gather (cost=1000.56..527782.84 rows=24896 width=391) (actual time=4.287..12701.687 rows=3484699 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=0.56..524293.24 rows=10373 width=391) (actual time=3.492..400998.923 rows=1161566 loops=3)
-> Parallel Seq Scan on papa three (cost=0.00..436912.84 rows=10373 width=326) (actual time=1.554..2455.626 rows=1161566 loops=3)
Filter: ((xray = 'november'::bpchar) AND ((lima_sierra(kilo, 'two_zulu'::character varying))::text = 'two_zulu'::text))
Rows Removed by Filter: 501723
-> Index Scan using five_tango on golf echo (cost=0.56..8.42 rows=1 width=130) (actual time=0.342..0.342 rows=1 loops=3484699)
Index Cond: (five_hotel = three.quebec_delta)
-> Index Scan using lima_alpha on quebec_foxtrot seven_uniform_foxtrot (cost=0.56..0.88 rows=1 width=65) (actual time=0.366..0.366 rows=0 loops=3484699)
Index Cond: (four = echo.oscar)
Filter: (two_india >= (zulu() - 'two_two'::interval))
Rows Removed by Filter: 1
Planning time: 0.777 ms
Execution time: 1296183.259 ms
Plan after setting enable_nestloop to off and work_mem to 8GB. I get the same plan when increasing random_page_cost to 1000.
Unique (cost=5933437.24..5933646.68 rows=5236 width=281) (actual time=19898.050..20993.124 rows=319980 loops=1)
-> WindowAgg (cost=5933437.24..5933633.59 rows=5236 width=281) (actual time=19898.049..20879.655 rows=461769 loops=1)
-> Sort (cost=5933437.24..5933450.33 rows=5236 width=326) (actual time=19898.022..19978.839 rows=461769 loops=1)
Sort Key: three.quebec_delta, three.seven_uniform_charlie DESC
Sort Method: quicksort Memory: 197056kB
-> Hash Join (cost=1947451.87..5933113.80 rows=5236 width=326) (actual time=11616.323..17931.146 rows=461769 loops=1)
Hash Cond: (echo.oscar = seven_uniform_foxtrot.four)
-> Gather (cost=438059.74..4423656.32 rows=24897 width=391) (actual time=1909.685..7291.289 rows=3484833 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=437059.74..4420166.62 rows=10374 width=391) (actual time=1904.546..7385.948 rows=1161611 loops=3)
Hash Cond: (echo.five = three.quebec_delta)
-> Parallel Seq Scan on golf echo (cost=0.00..3921922.09 rows=8152209 width=130) (actual time=0.003..1756.576 rows=6531668 loops=3)
-> Parallel Hash (cost=436930.07..436930.07 rows=10374 width=326) (actual time=1904.354..1904.354 rows=1161611 loops=3)
Buckets: 4194304 (originally 32768) Batches: 1 (originally 1) Memory Usage: 1135200kB
-> Parallel Seq Scan on papa three (cost=0.00..436930.07 rows=10374 width=326) (actual time=0.009..963.728 rows=1161611 loops=3)
Filter: ((xray = 'november'::bpchar) AND ((lima(kilo, 'two_zulu'::character varying))::text = 'two_zulu'::text))
Rows Removed by Filter: 502246
-> Hash (cost=1476106.74..1476106.74 rows=2662831 width=65) (actual time=9692.517..9692.517 rows=2685656 loops=1)
Buckets: 4194304 Batches: 1 Memory Usage: 287171kB
-> Seq Scan on quebec_foxtrot seven_uniform_foxtrot (cost=0.00..1476106.74 rows=2662831 width=65) (actual time=0.026..8791.556 rows=2685656 loops=1)
Filter: (two_india >= (zulu() - 'two_two'::interval))
Rows Removed by Filter: 9984069
Planning time: 0.742 ms
Execution time: 21218.770 ms
Try an index on papa(lima_sierra(kilo, 'two_zulu'::character varying)) and ANALYZE the table. With that index in place, PostgreSQL collects statistics on the expression, which should improve the estimate, so that you don't get a nested loop join.
If you just replace COALESCE(r_cd, 'ADVISOR') = 'ADVISOR' with (r_cd = 'ADVISOR' OR r_cd IS NULL), that might use the current table statistics to improve the estimates enough to change the plan.
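Rendered as DDL, the first suggestion would look roughly like the sketch below, keeping the question's obfuscated names (an expression index requires the expression to be IMMUTABLE, which a plain COALESCE wrapper is):

-- Sketch: expression index so ANALYZE collects statistics on the
-- expression itself, improving the row estimate that currently
-- pushes the planner into a nested loop.
CREATE INDEX papa_kilo_expr_idx
    ON ods_authorization.papa (lima_sierra(kilo, 'two_zulu'::character varying));
ANALYZE ods_authorization.papa;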

PostgreSQL 9.4 Index speed very slow on Select

The box running PostgreSQL 9.4 is a 32-core system at 3.5 GHz per core, with 128 GB of RAM and mirrored Samsung Pro 850 SSD drives, on FreeBSD with ZFS. There is no reason for performance this poor from PostgreSQL!
psql (9.4.5)
forex=# \d pair_data;
Table "public.pair_data"
Column | Type | Modifiers
------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('pair_data_id_seq'::regclass)
pair_name | character varying(6) |
pair_price | numeric(9,6) |
ts | timestamp without time zone |
Indexes:
"pair_data_pkey" PRIMARY KEY, btree (id)
"date_idx" gin (ts)
"pair_data_gin_idx1" gin (pair_name)
"pair_data_gin_idx2" gin (ts)
With this select:
forex=# explain (analyze, buffers) select pair_name as name, pair_price as price, ts as date from pair_data where pair_name = 'EURUSD' order by ts desc limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=201442.55..201442.57 rows=10 width=22) (actual time=10870.395..10870.430 rows=10 loops=1)
Buffers: shared hit=40 read=139150 dirtied=119
-> Sort (cost=201442.55..203787.44 rows=937957 width=22) (actual time=10870.390..10870.401 rows=10 loops=1)
Sort Key: ts
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=40 read=139150 dirtied=119
-> Bitmap Heap Scan on pair_data (cost=9069.17..181173.63 rows=937957 width=22) (actual time=614.976..8903.913 rows=951858 loops=1)
Recheck Cond: ((pair_name)::text = 'EURUSD'::text)
Rows Removed by Index Recheck: 13624055
Heap Blocks: exact=33464 lossy=105456
Buffers: shared hit=40 read=139150 dirtied=119
-> Bitmap Index Scan on pair_data_gin_idx1 (cost=0.00..8834.68 rows=937957 width=0) (actual time=593.701..593.701 rows=951858 loops=1)
Index Cond: ((pair_name)::text = 'EURUSD'::text)
Buffers: shared hit=40 read=230
Planning time: 0.387 ms
Execution time: 10871.419 ms
(16 rows)
Or this select:
forex=# explain (analyze, buffers)
with intervals as (
    select start, start + interval '4hr' as end
    from generate_series('2015-12-01 15:00', '2016-01-19 16:00', interval '4hr') as start
)
select distinct
    intervals.start as date,
    min(pair_price) over w as low,
    max(pair_price) over w as high,
    first_value(pair_price) over w as open,
    last_value(pair_price) over w as close
from intervals
join pair_data mb
    on mb.pair_name = 'EURUSD'
    and mb.ts >= intervals.start
    and mb.ts < intervals.end
window w as (partition by intervals.start order by mb.ts asc
             rows between unbounded preceding and unbounded following)
order by intervals.start;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=64634864.43..66198331.09 rows=815054 width=23) (actual time=1379732.924..1384602.952 rows=82 loops=1)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3332596 written=17050
CTE intervals
-> Function Scan on generate_series start (cost=0.00..12.50 rows=1000 width=8) (actual time=0.135..1.801 rows=295 loops=1)
-> Sort (cost=64634851.92..64895429.70 rows=104231111 width=23) (actual time=1379732.918..1381724.179 rows=951970 loops=1)
Sort Key: intervals.start, (min(mb.pair_price) OVER (?)), (max(mb.pair_price) OVER (?)), (first_value(mb.pair_price) OVER (?)), (last_value(mb.pair_price) OVER (?))
Sort Method: external sort Disk: 60808kB
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3332596 written=17050
-> WindowAgg (cost=41474743.35..44341098.90 rows=104231111 width=23) (actual time=1341744.405..1365946.672 rows=951970 loops=1)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=3324995 written=9449
-> Sort (cost=41474743.35..41735321.12 rows=104231111 width=23) (actual time=1341743.502..1343723.884 rows=951970 loops=1)
Sort Key: intervals.start, mb.ts
Sort Method: external sort Disk: 32496kB
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1154778 written=7975
-> Nested Loop (cost=9098.12..21180990.32 rows=104231111 width=23) (actual time=271672.696..1337526.628 rows=951970 loops=1)
Join Filter: ((mb.ts >= intervals.start) AND (mb.ts < intervals."end"))
Rows Removed by Join Filter: 279879180
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1150716 written=3913
-> CTE Scan on intervals (cost=0.00..20.00 rows=1000 width=16) (actual time=0.142..4.075 rows=295 loops=1)
-> Materialize (cost=9098.12..190496.52 rows=938080 width=15) (actual time=2.125..1962.153 rows=951970 loops=295)
Buffers: shared hit=8 read=139210 dirtied=22, temp read=1150716 written=3913
-> Bitmap Heap Scan on pair_data mb (cost=9098.12..181225.12 rows=938080 width=15) (actual time=622.818..11474.987 rows=951970 loops=1)
Recheck Cond: ((pair_name)::text = 'EURUSD'::text)
Rows Removed by Index Recheck: 13623989
Heap Blocks: exact=33485 lossy=105456
Buffers: shared hit=8 read=139210 dirtied=22
-> Bitmap Index Scan on pair_data_gin_idx1 (cost=0.00..8863.60 rows=938080 width=0) (actual time=601.158..601.158 rows=951970 loops=1)
Index Cond: ((pair_name)::text = 'EURUSD'::text)
Buffers: shared hit=8 read=269
Planning time: 0.454 ms
Execution time: 1384653.385 ms
(31 rows)
The pair_data table only has:
forex=# select count(*) from pair_data;
count
----------
21833886
(1 row)
Why is this doing heap scans when there are indexes? I do not understand what is going on with the query plan. Does anyone have an idea of where the problem might be?
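A hedged note rather than a definitive diagnosis: GIN indexes on plain scalar columns cannot return rows in ts order, so both queries have to bitmap-scan all ~950k EURUSD rows and then sort them. A composite btree index matching the equality filter and the sort would let the first query read just its 10 rows:

-- Sketch: equality column first, then the sort column.
-- Serves WHERE pair_name = '...' ORDER BY ts DESC LIMIT 10 directly.
CREATE INDEX pair_data_name_ts_idx ON pair_data (pair_name, ts DESC);

The "lossy=105456" heap blocks in both plans also suggest work_mem is too small for the bitmaps to stay exact, which is where the "Rows Removed by Index Recheck: 13624055" comes from.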