How can I "think better" when reading a PostgreSQL query plan? - postgresql

I spent over an hour today puzzling over a query plan that I couldn't understand. The query was an UPDATE and it just wouldn't finish. It looked totally stuck, yet pg_locks showed it wasn't waiting for anything. Now, I don't consider myself the best or worst query plan reader, but I find this one exceptionally difficult. How does one read these? Is there a methodology that the Pg aces follow in order to pinpoint the error?
I plan on asking another question as to how to work around this issue, but right now I'm speaking specifically about how to read these types of plans.
QUERY PLAN
--------------------------------------------------------------------------------------------
Nested Loop Anti Join (cost=47680.88..169413.12 rows=1 width=77)
  Join Filter: ((co.fkey_style = v.chrome_styleid) AND (co.name = o.name))
  -> Nested Loop (cost=5301.58..31738.10 rows=1 width=81)
        -> Hash Join (cost=5301.58..29722.32 rows=229 width=40)
              Hash Cond: ((io.lot_id = iv.lot_id) AND ((io.vin)::text = (iv.vin)::text))
              -> Seq Scan on options io (cost=0.00..20223.32 rows=23004 width=36)
                    Filter: (name IS NULL)
              -> Hash (cost=4547.33..4547.33 rows=36150 width=24)
                    -> Seq Scan on vehicles iv (cost=0.00..4547.33 rows=36150 width=24)
                          Filter: (date_sold IS NULL)
        -> Index Scan using options_pkey on options co (cost=0.00..8.79 rows=1 width=49)
              Index Cond: ((co.fkey_style = iv.chrome_styleid) AND (co.code = io.code))
  -> Hash Join (cost=42379.30..137424.09 rows=16729 width=26)
        Hash Cond: ((v.lot_id = o.lot_id) AND ((v.vin)::text = (o.vin)::text))
        -> Seq Scan on vehicles v (cost=0.00..4547.33 rows=65233 width=24)
        -> Hash (cost=20223.32..20223.32 rows=931332 width=44)
              -> Seq Scan on options o (cost=0.00..20223.32 rows=931332 width=44)
(17 rows)
The issue with this query plan - I believe I understand - is probably best said by RhodiumToad (he is definitely better at this, so I'll bet on his explanation being better) of irc://irc.freenode.net/#postgresql:
oh, that plan is potentially disastrous
the problem with that plan is that it's running a hugely expensive hashjoin for each row
the problem is the rows=1 estimate from the other join and
the planner thinks it's ok to put a hugely expensive query in the inner path of a nestloop where the outer path is estimated to return only one row.
since, obviously, by the planner's estimate the expensive part will only be run once
but this has an obvious tendency to really mess up in practice
the problem is that the planner believes its own estimates
ideally, the planner needs to know the difference between "estimated to return 1 row" and "not possible to return more than 1 row"
but it's not at all clear how to incorporate that into the existing code
He goes on to say:
it can affect any join, but usually joins against subqueries are the most likely
Now when I read this plan the first thing I noticed was the Nested Loop Anti Join, which had a cost of 169,413 (I'll stick to upper bounds). This Anti Join breaks down into the result of a Nested Loop at a cost of 31,738 and the result of a Hash Join at a cost of 137,424. Since 137,424 is much greater than 31,738, I concluded the problem was the Hash Join.
Then I proceeded to EXPLAIN ANALYZE the Hash Join segment outside of the query. It executed in 7 seconds. I made sure there were indexes on (lot_id, vin), and (co.code, and v.code); there were. I disabled seq_scan and hashjoin individually and noticed a speed increase of less than 2 seconds. Not nearly enough to account for why it wasn't progressing after an hour.
But after all this, I'm totally wrong! Yes, it was the slower part of the query, but the real culprit was the rows=1 bit (I presume it was on the Nested Loop Anti Join). Is it a bug (or a missing capability) in the planner that it mis-estimates the number of rows? How am I supposed to read this plan and come to the same conclusion RhodiumToad did?
Is it simply rows=1 that is supposed to trigger me figuring this out?
I did run VACUUM FULL ANALYZE on all of the tables involved, and this is PostgreSQL 8.4.

Seeing through issues like this requires some experience of where things can go wrong. But to find issues in query plans, try to validate the produced plan from the inside out: check whether the row-count estimates are sane and whether the cost estimates match the time actually spent. By the way, the two cost numbers aren't lower and upper bounds: the first is the estimated cost to produce the first row of output, and the second is the estimated total cost. See the EXPLAIN documentation for details; there is also some planner documentation available. It also helps to know how the different access methods work. As a starting point, Wikipedia has information on nested loop, hash and merge joins.
In your example, you'd start with:
              -> Seq Scan on options io (cost=0.00..20223.32 rows=23004 width=36)
                    Filter: (name IS NULL)
Run EXPLAIN ANALYZE SELECT * FROM options WHERE name IS NULL; and see if the number of returned rows matches the estimate. Being off by a factor of 2 isn't usually a problem; you're trying to spot order-of-magnitude differences.
Then check whether EXPLAIN ANALYZE SELECT * FROM vehicles WHERE date_sold IS NULL; returns the expected number of rows.
Then go up one level to the hash join:
        -> Hash Join (cost=5301.58..29722.32 rows=229 width=40)
              Hash Cond: ((io.lot_id = iv.lot_id) AND ((io.vin)::text = (iv.vin)::text))
See if EXPLAIN ANALYZE SELECT * FROM vehicles AS iv INNER JOIN options io ON (io.lot_id = iv.lot_id) AND ((io.vin)::text = (iv.vin)::text) WHERE iv.date_sold IS NULL AND io.name IS NULL; results in 229 rows.
Up one more level adds INNER JOIN options co ON (co.fkey_style = iv.chrome_styleid) AND (co.code = io.code) and is expected to return only one row. This is probably where the issue is, because if the actual number of rows goes from 1 to 100, the total cost estimate of traversing the inner loop of the containing nested loop is off by a factor of 100.
The underlying mistake that the planner is making is probably that it assumes the two predicates for joining in co are independent of each other and multiplies their selectivities, while in reality they may be heavily correlated, so the selectivity is closer to MIN(s1, s2) than to s1*s2.
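As a sketch of that third step, the three-way join can be checked the same way as the levels below it. The query only combines the join and filter conditions already shown in the plan; the point is to compare the actual row count it reports with the planner's rows=1 estimate for the Nested Loop.

```sql
-- Validate the "rows=1" estimate of the inner Nested Loop.
-- Table names, aliases and conditions are taken from the plan above.
EXPLAIN ANALYZE
SELECT *
FROM vehicles AS iv
INNER JOIN options AS io
        ON io.lot_id = iv.lot_id
       AND io.vin::text = iv.vin::text
INNER JOIN options AS co
        ON co.fkey_style = iv.chrome_styleid
       AND co.code = io.code
WHERE iv.date_sold IS NULL
  AND io.name IS NULL;
```

If the actual row count comes out far above 1, the expensive Hash Join on the inner side of the Anti Join is being repeated that many times, which would match RhodiumToad's diagnosis.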

Did you ANALYZE the tables? And what does pg_stats have to say about these tables? The query plan is based on the stats, so these have to be accurate. And which version do you use? 8.4?
The costs can be calculated from the stats, the number of relpages, the number of rows, and the settings in postgresql.conf for the Planner Cost Constants.
work_mem is also involved: it might be too low and push the planner towards a seqscan, killing performance...
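A minimal sketch of where to look, assuming the table names from the plan in the question (options and vehicles). pg_stats, pg_class and SHOW are standard PostgreSQL; which of these values matters most for this particular plan is a guess.

```sql
-- Per-column statistics the planner uses for selectivity estimates.
SELECT tablename, attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE tablename IN ('options', 'vehicles');

-- Table sizes as the planner sees them (pages and estimated row counts).
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('options', 'vehicles');

-- Some of the planner cost constants, plus work_mem.
SHOW seq_page_cost;
SHOW random_page_cost;
SHOW cpu_tuple_cost;
SHOW work_mem;
```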

Related

Faster Postgres Counting with where condition

I need to count the total number of rows in a table with a where clause. My application can tolerate some level of inaccuracy.
SELECT count(*) AS "count" FROM "Orders" AS "Order" WHERE "Order"."orderType" = 'online' AND "Order"."status" = 'paid';
But clearly, this is a very slow query. I came across this answer but that returns the count of all rows in the table.
What's a faster method of counting when I have a where clause? I'm using sequelize's ORM, so any relevant method in sequelize would also help.
So, doing EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) AS "count" FROM "Orders" AS "Order" WHERE "Order"."orderType" = 'online' AND "Order"."status" != 'paid'; returns me the following:
Aggregate (cost=47268.10..47268.11 rows=1 width=8) (actual time=719.722..719.723 rows=1 loops=1)
  Buffers: shared hit=32043
  -> Seq Scan on "Orders" "Order" (cost=0.00..47044.35 rows=89501 width=0) (actual time=0.011..674.316 rows=194239 loops=1)
        Filter: (((status)::text <> 'paid'::text) AND (("orderType")::text = 'online'::text))
        Rows Removed by Filter: 830133
        Buffers: shared hit=32043
Planning time: 0.069 ms
Execution time: 719.755 ms
My application can tolerate some level of inaccuracy.
This is pretty hard to capitalize on in PostgreSQL. It is fairly easy to get an approximate answer, but hard to put a limit on how approximate that answer is. There will always be cases where the estimate can be very wrong, and it is hard to know when that is occurring without doing the work needed to get an exact answer.
In your query plan, it is off by a factor of 2.17. Is that good enough?
(cost=0.00..47044.35 rows=89501 width=0) (actual time=0.011..674.316 rows=194239 loops=1)
Or, can you put bounds on tolerable inaccuracy in some other dimension? Like "accurate as of some point in the last hour"? With that kind of tolerance, you could make a materialized view to partially summarize the data, like:
create materialized view order_counts as
SELECT "orderType", "status", count(*) AS "count" FROM "Orders"
group by 1,2;
and then pull the counts out of that with your WHERE clause (and possibly resummarize them). The effectiveness of this depends on the number of combinations of "orderType" and "status" being much less than the total number of rows in the main table. You would have to set up a scheduled job to refresh the matview periodically. PostgreSQL will not automatically rewrite your original query to use the matview; you have to rewrite it yourself.
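As a rough sketch, pulling a count out of the order_counts matview defined above and refreshing it could look like this. The filter values are the ones from the question; how often to refresh is up to the scheduled job.

```sql
-- Resummarize the pre-aggregated counts instead of scanning "Orders".
-- coalesce() covers the case where no combination matches.
SELECT coalesce(sum("count"), 0) AS "count"
FROM order_counts
WHERE "orderType" = 'online'
  AND "status" <> 'paid';

-- Run periodically (e.g. from a scheduled job) to bring the counts up to date.
REFRESH MATERIALIZED VIEW order_counts;
```

The result is only as fresh as the last refresh, which is the "accurate as of some point in the last hour" kind of tolerance.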
You have shown us two different queries, status = 'paid' and status != 'paid'. Is one of those a mistake, or do they reflect variation in the queries you actually want to run? What other things might vary in this pool of similar queries? You should be able to get some speedup using indexes, but which index in particular will depend on your queries. For the equality query, you can include "status" in the index. For the inequality query, that wouldn't do much good, so instead you could use a partial index WHERE status<>'paid'. (But then that index wouldn't be useful if 'paid' was changed to 'delinquent', for example.)
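A sketch of the two index variants, with hypothetical index names. Whether either one actually beats the sequential scan depends on how selective the conditions are, so it is worth checking with EXPLAIN (ANALYZE, BUFFERS) afterwards.

```sql
-- For the equality query: both filter columns in one index.
-- (orders_type_status_idx is a made-up name.)
CREATE INDEX orders_type_status_idx
    ON "Orders" ("orderType", "status");

-- For the inequality query: a partial index covering only unpaid rows.
-- (orders_type_unpaid_idx is a made-up name.)
CREATE INDEX orders_type_unpaid_idx
    ON "Orders" ("orderType")
    WHERE "status" <> 'paid';
```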

Postgres doing a sort on simple join

I have two tables in my database (address and person_address). address has a PK on address_id. person_address has a PK on (address_id, person_id, usage_code).
When joining these two tables on address_id, my expectation is that the PK index is used in both cases. However, Postgres is adding sort and materialize steps to the plan, which slows down the execution of the query. I have tried dropping indexes (person_address had an index on address_id) and analyzing stats, without success.
I would appreciate any help on how to isolate this situation, since those queries run slower than expected in our production environment.
This is the query:
select *
from person_addresses pa
join address a
on pa.address_id = a.address_id
This is the plan :
Merge Join (cost=1506935.96..2416648.39 rows=16033774 width=338)
  Merge Cond: (pa.address_id = ((a.address_id)::numeric))
  -> Index Scan using person_addresses_pkey on person_addresses pa (cost=0.43..592822.76 rows=5256374 width=104)
  -> Materialize (cost=1506935.53..1526969.90 rows=4006874 width=234)
        -> Sort (cost=1506935.53..1516952.71 rows=4006874 width=234)
              Sort Key: ((a.address_id)::numeric)
              -> Seq Scan on address a (cost=0.00..163604.74 rows=4006874 width=234)
Thanks.
Edit 1. After the comment, I checked the data types and found a discrepancy. Fixing the data type changed the plan to the following:
Hash Join (cost=343467.18..881125.47 rows=5256374 width=348)
  Hash Cond: (pa.address_id = a.address_id)
  -> Seq Scan on person_addresses pa (cost=0.00..147477.74 rows=5256374 width=104)
  -> Hash (cost=159113.97..159113.97 rows=4033697 width=244)
        -> Seq Scan on address_normalization a (cost=0.00..159113.97 rows=4033697 width=244)
The performance improvement is evident in the plan, but I am wondering if the sequential scans are expected without any filters.
So there are two questions here:
why did Postgres choose the (expensive) "Merge Join" in the first query?
The reason for this is that it could not use the more efficient "Hash Join", because the hash values of integer and numeric values would be different. But the Merge Join requires that the values are sorted, and that's where the "Sort" step comes from in the first execution plan. Given the number of rows, a "Nested Loop" would have been even more expensive.
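A quick way to spot this kind of mismatch, sketched with the table names from the question's query; information_schema.columns is standard. The cast in the Merge Cond, (a.address_id)::numeric, is the hint in the plan that the two columns do not share a type.

```sql
-- Compare the declared types of the join column on both sides.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE column_name = 'address_id'
  AND table_name IN ('address', 'person_addresses');
```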
The second question is:
I am wondering if the sequential scans are expected without any filters
Yes they are expected. The query retrieves all matching rows from both tables and that is done most efficiently by scanning all rows. An index scan requires about 2-3 I/O operations per row that has to be retrieved. A sequential scan usually requires less than one I/O operation as one block (which is the smallest unit the database reads from the disk) contains multiple rows.
You can run explain (analyze, buffers) to see how many "logical reads" each step takes.
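Applied to the query from the question, that would be something like the following. Note that ANALYZE actually executes the query, so it takes as long as the real thing.

```sql
-- Shows actual timings plus shared-buffer hits and reads per plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM person_addresses pa
JOIN address a
  ON pa.address_id = a.address_id;
```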

Group by too slow on Amazon RDS Postgres

I am running Postgres 9.4.4 on an Amazon RDS db.r3.4xlarge instance
- 16CPUs, 122GB Memory.
I recently came across one of the queries which needed a fairly straight forward aggregation on a large table (~270 million records). The query takes over 5 hours to execute.
The joining column and the grouping column on the large table have indexes defined. I have tried experimenting with work_mem and temp_buffers by setting each to 1GB, but it didn't help much.
Here's the query and the execution plan. Any leads will be highly appreciated.
explain SELECT
largetable.column_group,
MAX(largetable.event_captured_dt) AS last_open_date,
.....
FROM largetable
LEFT JOIN smalltable
ON smalltable.column_b = largetable.column_a
WHERE largetable.column_group IS NOT NULL
GROUP BY largetable.column_group
Here is the execution plan -
GroupAggregate (cost=699299968.28..954348399.96 rows=685311 width=38)
  Group Key: largetable.column_group
  -> Sort (cost=699299968.28..707801354.23 rows=3400554381 width=38)
        Sort Key: largetable.column_group
        -> Merge Left Join (cost=25512.78..67955201.22 rows=3400554381 width=38)
              Merge Cond: (largetable.column_a = smalltable.column_b)
              -> Index Scan using xcrmstg_largetable_launch_id on largetable (cost=0.57..16241746.24 rows=271850823 width=34)
                    Filter: (column_a IS NOT NULL)
              -> Sort (cost=25512.21..26127.21 rows=246000 width=4)
                    Sort Key: smalltable.column_b
                    -> Seq Scan on smalltable (cost=0.00..3485.00 rows=246000 width=4)
You say the joining key and the grouping key on the large table are indexed, but you don't mention the joining key on the small table.
The merges and sorts are a big source of slowness. However, I'm also worried that you're returning ~700,000 rows of data. Is that really useful to you? What's the situation where you need to return that much data, but a 5 hour wait is too long? If you don't need all that data coming out, then filtering as early as possible is by far and away the largest speed gain you'll realize.
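Two things that might be worth trying on the small table, sketched with a hypothetical index name. The duplicate check is an extra suggestion, not something the question confirms: it is prompted only by the plan estimating about 3.4 billion joined rows from about 272 million rows in largetable, which would happen if column_b values repeat.

```sql
-- Index the join key on the small table (index name is made up).
CREATE INDEX smalltable_column_b_idx ON smalltable (column_b);

-- Check whether column_b repeats; each duplicate multiplies the joined rows.
SELECT column_b, count(*) AS copies
FROM smalltable
GROUP BY column_b
HAVING count(*) > 1
ORDER BY copies DESC
LIMIT 20;
```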

Optimizing a Large PostGIS Query

I currently have a large table mivehdetailedtrajectory (25B rows) and a small table cell_data_tower (400 rows) that I need to join using PostGIS. Specifically, I need to run this query:
SELECT COUNT(traj.*), tower.id
FROM cell_data_tower tower LEFT OUTER JOIN mivehdetailedtrajectory traj
ON ST_Contains(tower.geom, traj.location)
GROUP BY tower.id
ORDER BY tower.id;
It errors out, complaining that it can't write to disk. This seemed weird for a SELECT, so I ran EXPLAIN:
NOTICE: gserialized_gist_joinsel: jointype 1 not supported
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Sort (cost=28905094882.25..28905094883.25 rows=400 width=120)
  Sort Key: tower.id
  -> HashAggregate (cost=28905094860.96..28905094864.96 rows=400 width=120)
        -> Nested Loop Left Join (cost=0.00..28904927894.80 rows=33393232 width=120)
              Join Filter: ((tower.geom && traj.location) AND _st_contains(tower.geom, traj.location))
              -> Seq Scan on cell_data_tower tower (cost=0.00..52.00 rows=400 width=153)
              -> Materialize (cost=0.00..15839886.96 rows=250449264 width=164)
                    -> Seq Scan on mivehdetailedtrajectory traj (cost=0.00..8717735.64 rows=250449264 width=164)
I don't understand why Postgres thinks it should materialize the inner table. Also, I don't understand the plan in general, to be honest. It seems like it should keep the cell_data_tower table in memory and iterate over the mivehdetailedtrajectory table. Any thoughts on how I can optimize this so that it (a) runs and (b) does so in a reasonable amount of time? Specifically, it seems like this should be doable in less than 1 day.
Edit: Postgres version 9.3
Queries that need a lot of memory are those rare places where correlated subqueries perform better (LATERAL JOIN should work too but those are beyond me). Also please note you didn't select tower.id so your result wouldn't be too useful.
SELECT tower.id, (SELECT COUNT(traj.*)
FROM mivehdetailedtrajectory traj
WHERE ST_Contains(tower.geom, traj.location))
FROM cell_data_tower tower
ORDER BY tower.id;
Try running it with LIMIT 1 first. The total runtime should be the runtime for one tower * number of towers.
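One way to time a single tower, as a sketch: here the LIMIT is pushed into a subquery so that only one tower row ever reaches the correlated count. That is a variation on the query above, not the exact statement suggested, and traj_count is a made-up column alias.

```sql
-- Time the count for one tower only; multiply by the number of towers
-- (about 400) for a rough estimate of the total runtime.
EXPLAIN ANALYZE
SELECT tower.id,
       (SELECT count(*)
        FROM mivehdetailedtrajectory traj
        WHERE ST_Contains(tower.geom, traj.location)) AS traj_count
FROM (SELECT id, geom
      FROM cell_data_tower
      ORDER BY id
      LIMIT 1) AS tower;
```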
I don't have a database as big as yours, only 80M rows. But in my case I created a LinkID field to know where each geom is, and I calculate which LinkID is the closest when I insert a new record.
When I found out that a single LinkID lookup takes 30 ms, and that doing that 80M times would take about 27 days, I switched to precalculating those values.
Also, I don't keep all the records; I only keep one month at any time.

How to influence the Postgres Query Analyzer when dealing with text search and geospatial data

I have a quite serious performance issue with the following statement that I can't fix myself.
Given Situation
I have a Postgres 8.4 database with PostGIS 1.4 installed
I have a geospatial table with ~9 million entries. This table has a (PostGIS) geometry column and a tsvector column
I have a GIST index on the geometry and a VNAME index on the vname column
Table is ANALYZE'd
I want to execute a to_tsquery text search within a subset of these geometries that should give me all affected ids back.
The area to search in will limit the 9 million datasets to approximately 100,000, and the result set of the ts_query inside this area will most likely give an output of 0..1000 entries.
Problem
The query analyzer decides that it wants to do a Bitmap Index Scan on the vname first, and then aggregates and puts a filter on the geometry (and other conditions I have in this statement)
Query Analyzer output:
Aggregate (cost=12.35..12.62 rows=1 width=510) (actual time=5.616..5.616 rows=1 loops=1)
  -> Bitmap Heap Scan on mxgeom g (cost=8.33..12.35 rows=1 width=510) (actual time=5.567..5.567 rows=0 loops=1)
        Recheck Cond: (vname @@ '''hemer'' & ''hauptstrasse'':*'::tsquery)
        Filter: (active AND (geom && '0107000020E6100000010000000103000000010000000B0000002AFFFF5FD15B1E404AE254774BA8494096FBFF3F4CC11E40F37563BAA9A74940490200206BEC1E40466F209648A949404DF6FF1F53311F400C9623C206B2494024EBFF1F4F711F404C87835954BD4940C00000B0E7CA1E4071551679E0BD4940AD02004038991E40D35CC68418BE49408EF9FF5F297C1E404F8CFFCB5BBB4940A600006015541E40FAE6468054B8494015040060A33E1E4032E568902DAE49402AFFFF5FD15B1E404AE254774BA84940'::geometry) AND (mandator_id = ANY ('{257,1}'::bigint[])))
        -> Bitmap Index Scan on gis_vname_idx (cost=0.00..8.33 rows=1 width=0) (actual time=5.566..5.566 rows=0 loops=1)
              Index Cond: (vname @@ '''hemer'' & ''hauptstrasse'':*'::tsquery)
which causes a LOT of I/O - AFAIK It would be smarter to limit the geometry first, and do the vname search after.
Attempted Solutions
To achieve the desired behaviour I tried the following:
I put the geom ## AREA into a subselect -> Did not change the execution plan
I created a temporary view with the desired area subset -> Did not change the execution plan
I created a temporary table of the desired area -> Takes 4~6 seconds to create, so that made it even worse.
Btw, sorry for not posting the actual query: I think my boss would really be mad at me if I did, also I'm looking more for theoretical pointers for someone to fix my actual query. Please ask if you need further clarification
EDIT
Richard had a very good point: You can achieve the desired behaviour of the query planner with the WITH statement. The bad thing is that this temporary table (or CTE) messes up the vname index, thus making the query return nothing in some cases.
I was able to fix this with creating a new vname on-the-fly with to_tsvector(), but this is (too) costly - around 300 - 500ms per query.
My Solution
I ditched the vname search and went with a simple LIKE('%query_string%') (10-20 ms/query), but this is only fast in my given environment. YMMV.
There have been some improvements in statistics handling for tsvector (and I think PostGIS too, but I don't use it). If you've got the time, it might be worth trying again with a 9.1 release and see what that does for you.
However, for this single query you might want to look at the WITH construct.
http://www.postgresql.org/docs/8.4/static/queries-with.html
If you put the geometry part as the WITH clause, it will be evaluated first (guaranteed) and then that result set will be filtered by the following SELECT. It might end up slower though, you won't know until you try.
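A rough sketch of that shape, since the real query was not posted. The table and column names (mxgeom, geom, vname, active, mandator_id) come from the plan in the question; the id column and the polygon coordinates are placeholders. Note that the "evaluated first" behaviour holds for the 8.4 release discussed here; from PostgreSQL 12 on, a CTE like this may be inlined unless it is written as WITH area AS MATERIALIZED (...).

```sql
-- Step 1: restrict to the search area (placeholder polygon, SRID 4326).
WITH area AS (
    SELECT id, vname
    FROM mxgeom
    WHERE active
      AND mandator_id IN (257, 1)
      AND geom && ST_GeomFromText(
            'POLYGON((7.5 51.3, 7.9 51.3, 7.9 51.5, 7.5 51.5, 7.5 51.3))',
            4326)
)
-- Step 2: run the text search only on the rows inside that area.
SELECT id
FROM area
WHERE vname @@ to_tsquery('hemer & hauptstrasse:*');
```

The text-search index on vname cannot be used inside the second step, because the filter runs over the CTE result rather than the table.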
It might be that an adjustment to work_mem would help too. You can do this per session ("SET work_mem = ...") but be careful with setting it too high: concurrent queries can quickly burn through all your RAM.
http://www.postgresql.org/docs/8.4/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
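For example, a per-session change might look like this. The 64MB value is only an illustration, not a recommendation for this workload.

```sql
-- Raise work_mem for the current session only (example value).
SET work_mem = '64MB';
SHOW work_mem;

-- ... run the query under test here ...

-- Go back to the server default afterwards.
RESET work_mem;
```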