I am running Postgres 9.4.4 on an Amazon RDS db.r3.4xlarge instance
- 16CPUs, 122GB Memory.
I recently came across a query that needs a fairly straightforward aggregation on a large table (~270 million records). The query takes over 5 hours to execute.
The join column and the grouping column on the large table have indexes defined. I have tried experimenting with work_mem and temp_buffers by setting each to 1GB, but it didn't help much.
Here's the query and the execution plan. Any leads would be highly appreciated.
explain SELECT
largetable.column_group,
MAX(largetable.event_captured_dt) AS last_open_date,
.....
FROM largetable
LEFT JOIN smalltable
ON smalltable.column_b = largetable.column_a
WHERE largetable.column_group IS NOT NULL
GROUP BY largetable.column_group
Here is the execution plan -
GroupAggregate (cost=699299968.28..954348399.96 rows=685311 width=38)
Group Key: largetable.column_group
-> Sort (cost=699299968.28..707801354.23 rows=3400554381 width=38)
Sort Key: largetable.column_group
-> Merge Left Join (cost=25512.78..67955201.22 rows=3400554381 width=38)
Merge Cond: (largetable.column_a = smalltable.column_b)
-> Index Scan using xcrmstg_largetable_launch_id on largetable (cost=0.57..16241746.24 rows=271850823 width=34)
Filter: (column_a IS NOT NULL)
-> Sort (cost=25512.21..26127.21 rows=246000 width=4)
Sort Key: smalltable.column_b
-> Seq Scan on smalltable (cost=0.00..3485.00 rows=246000 width=4)
You say the joining key and the grouping key on the large table are indexed, but you don't mention the joining key on the small table.
The merges and sorts are a big source of slowness. However, I'm also worried that you're returning ~700,000 rows of data. Is that really useful to you? What's the situation where you need to return that much data, but a 5-hour wait is too long? If you don't need all that data coming out, then filtering as early as possible is far and away the largest speed gain you'll realize.
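If the join key on the small table really is unindexed, here is a minimal sketch of what I'd try first (table and column names are taken from your query, so adjust them to your real schema):
-- Index the small table's join key so the planner doesn't have to
-- sort it for the merge join.
CREATE INDEX smalltable_column_b_idx ON smalltable (column_b);
-- Refresh statistics so the planner notices the new index.
ANALYZE smalltable;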
I'm trying to get distinct values from a nested field in a JSONB column, but it takes about 2 minutes on a 400K-row table.
The original query used DISTINCT, but then I read that GROUP BY works better, so I tried that too. No luck - still extremely slow.
Adding an index did not help either:
create index "orders_financial_status_index" on orders ((data ->'data'->> 'financial_status'));
EXPLAIN ANALYZE gave this result:
HashAggregate (cost=13431.16..13431.22 rows=4 width=32) (actual time=123074.941..123074.943 rows=4 loops=1)
Group Key: ((data -> 'data'::text) ->> 'financial_status'::text)
-> Seq Scan on orders (cost=0.00..12354.14 rows=430809 width=32) (actual time=2.993..122780.325 rows=434080 loops=1)
Planning time: 0.119 ms
Execution time: 123074.979 ms
It's worth mentioning that there are no null values in this column, and currently there are 4 unique values.
What should I do in order to query the distinct values faster?
No index will make this faster, because the query has to scan the whole table.
As you can see, the sequential scan uses almost all the time; the hash aggregate is fast.
Still, I would not drop the index, because it allows PostgreSQL to estimate the number of groups accurately and choose the more efficient hash aggregate rather than sorting the rows. You can try without the index to be sure.
However, two minutes for half a million rows is not very fast. Do you have slow storage? Is the table bloated? If the latter, VACUUM (FULL) should improve things.
You can speed up the query by reducing I/O. Load the table into RAM with pg_prewarm; then processing should be considerably faster.
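A minimal sketch of the pg_prewarm approach (pg_prewarm ships as a contrib extension in PostgreSQL 9.4 and later; orders is the table from the question):
-- One-time setup of the extension.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- Read the whole table into shared buffers so the sequential scan
-- is served from memory instead of disk.
SELECT pg_prewarm('orders');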
I have two tables in my database (address and person_address). address has a PK on address_id; person_address has a PK on (address_id, person_id, usage_code).
When joining these two tables on address_id, my expectation was that the PK index would be used in both cases. However, Postgres is adding sort and materialize steps to the plan, which slows down the execution of the query. I have tried dropping indexes (person_address had an index on address_id) and analyzing stats, without success.
I would appreciate any help in isolating this issue, since these queries run slower than expected in our production environment.
This is the query:
select *
from person_addresses pa
join address a
on pa.address_id = a.address_id
This is the plan :
Merge Join (cost=1506935.96..2416648.39 rows=16033774 width=338)
Merge Cond: (pa.address_id = ((a.address_id)::numeric))
-> Index Scan using person_addresses_pkey on person_addresses pa (cost=0.43..592822.76 rows=5256374 width=104)
-> Materialize (cost=1506935.53..1526969.90 rows=4006874 width=234)
-> Sort (cost=1506935.53..1516952.71 rows=4006874 width=234)
Sort Key: ((a.address_id)::numeric)
-> Seq Scan on address a (cost=0.00..163604.74 rows=4006874 width=234)
Thanks.
Edit 1: After the comment, I checked the data types and found a discrepancy. Fixing the data type changed the plan to the following:
Hash Join (cost=343467.18..881125.47 rows=5256374 width=348)
Hash Cond: (pa.address_id = a.address_id)
-> Seq Scan on person_addresses pa (cost=0.00..147477.74 rows=5256374 width=104)
-> Hash (cost=159113.97..159113.97 rows=4033697 width=244)
-> Seq Scan on address_normalization a (cost=0.00..159113.97 rows=4033697 width=244)
The performance improvement is evident in the plan, but I am wondering if the sequential scans are expected without any filters.
So there are two questions here:
Why did Postgres choose the (expensive) "Merge Join" in the first query?
The reason is that it could not use the more efficient "Hash Join", because the hash values of integer and numeric values are different. A Merge Join, however, requires the values to be sorted, and that's where the "Sort" step in the first execution plan comes from. Given the number of rows, a "Nested Loop" would have been even more expensive.
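For illustration, assuming person_addresses.address_id had been declared numeric while address.address_id is an integer (which is what the ::numeric cast in the first plan suggests), the fix might have looked like this:
-- Hypothetical type fix: align the column types so both sides of the
-- join hash the same way. This rewrites the table and takes a lock.
ALTER TABLE person_addresses
    ALTER COLUMN address_id TYPE integer;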
The second question is:
I am wondering if the sequential scans are expected without any filters
Yes, they are expected. The query retrieves all matching rows from both tables, and that is done most efficiently by scanning all rows. An index scan needs roughly 2-3 I/O operations per row retrieved, while a sequential scan usually needs less than one I/O operation per row, because one block (the smallest unit the database reads from disk) contains multiple rows.
You can run EXPLAIN (ANALYZE, BUFFERS) to see how many "logical reads" each step performs.
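For example, using the join from the question:
-- Shows shared hit/read counters per plan node, i.e. how much of the
-- data came from the buffer cache versus disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM person_addresses pa
JOIN address a ON pa.address_id = a.address_id;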
I currently have a large table mivehdetailedtrajectory (25B rows) and a small table cell_data_tower (400 rows) that I need to join using PostGIS. Specifically, I need to run this query:
SELECT COUNT(traj.*), tower.id
FROM cell_data_tower tower LEFT OUTER JOIN mivehdetailedtrajectory traj
ON ST_Contains(tower.geom, traj.location)
GROUP BY tower.id
ORDER BY tower.id;
It errors out, complaining that it can't write to disk. This seemed weird for a SELECT, so I ran EXPLAIN:
NOTICE: gserialized_gist_joinsel: jointype 1 not supported
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Sort (cost=28905094882.25..28905094883.25 rows=400 width=120)
Sort Key: tower.id
-> HashAggregate (cost=28905094860.96..28905094864.96 rows=400 width=120)
-> Nested Loop Left Join (cost=0.00..28904927894.80 rows=33393232 width=120)
Join Filter: ((tower.geom && traj.location) AND _st_contains(tower.geom, traj.location))
-> Seq Scan on cell_data_tower tower (cost=0.00..52.00 rows=400 width=153)
-> Materialize (cost=0.00..15839886.96 rows=250449264 width=164)
-> Seq Scan on mivehdetailedtrajectory traj (cost=0.00..8717735.64 rows=250449264 width=164)
I don't understand why Postgres thinks it should materialize the inner table. To be honest, I don't understand the plan in general. It seems like it should keep the cell_data_tower table in memory and iterate over the mivehdetailedtrajectory table. Any thoughts on how I can optimize this to (a) run, and (b) do so in a reasonable amount of time? Specifically, it seems like this should be doable in less than a day.
Edit: Postgres version 9.3
Queries that need a lot of memory are one of those rare places where correlated subqueries perform better (a LATERAL JOIN should work too, but those are beyond me). Also, please note you didn't select tower.id, so your result wouldn't be too useful.
SELECT tower.id, (SELECT COUNT(traj.*)
FROM mivehdetailedtrajectory traj
WHERE ST_Contains(tower.geom, traj.location))
FROM cell_data_tower tower
ORDER BY tower.id;
Try running it with LIMIT 1 first. The total runtime should be roughly the runtime for one tower times the number of towers.
I don't have a database as big as yours, only 80M rows. But in my case I created a LinkID field to record where each geom belongs, and I calculate the closest LinkID when I insert a new record.
When I found out that a single LinkID lookup took 30 ms, and that doing it 80M times would take 27 days, I switched to pre-calculating those values.
Also, I don't keep all the records; I only keep a month's worth at any time.
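A rough sketch of that pre-calculation idea applied to the tables from the question (the tower_id column is hypothetical, and the one-off UPDATE over 25B rows is itself expensive, so treat this as a direction rather than a recipe):
-- Hypothetical: store the containing tower once, at load time, instead
-- of evaluating ST_Contains for every row at query time.
ALTER TABLE mivehdetailedtrajectory ADD COLUMN tower_id integer;

UPDATE mivehdetailedtrajectory traj
SET tower_id = tower.id
FROM cell_data_tower tower
WHERE ST_Contains(tower.geom, traj.location);

-- The aggregation then becomes a plain GROUP BY on the new column.
SELECT tower_id, COUNT(*)
FROM mivehdetailedtrajectory
GROUP BY tower_id
ORDER BY tower_id;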
I have a table with an index:
Table:
Participates (player_id integer, other...)
Indexes:
"index_participates_on_player_id" btree (player_id)
The table contains 400 million rows.
I execute the same query twice:
Query: explain analyze select * from participates where player_id=149294217;
First time:
Index Scan using index_participates_on_player_id on participates (cost=0.57..19452.86 rows=6304 width=58) (actual time=261.061..2025.559 rows=332 loops=1)
Index Cond: (player_id = 149294217)
Total runtime: 2025.644 ms
(3 rows)
Second time:
Index Scan using index_participates_on_player_id on participates (cost=0.57..19452.86 rows=6304 width=58) (actual time=0.030..0.479 rows=332 loops=1)
Index Cond: (player_id = 149294217)
Total runtime: 0.527 ms
(3 rows)
So the first execution has a large actual time. How can I speed up the first execution?
UPDATE
Sorry, I meant: how can I accelerate the first query?
Why is the index scan so slow the first time?
The difference in execution time is probably because the second time through, the table/index data from the first run of the query is in the shared buffers cache, so the subsequent run takes less time, since it doesn't have to go all the way to disk for that information.
Edit:
Regarding the slowness of the original query, does the table have a lot of dead tuples? Those can slow things down considerably. If so, VACUUM ANALYZE the table.
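A quick way to check for dead tuples, assuming you can query the pg_stat_user_tables statistics view:
-- Compare dead vs. live tuples for the table in question.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'participates';
-- If n_dead_tup is large relative to n_live_tup:
VACUUM ANALYZE participates;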
Another factor can be if there are long-ish idle transactions on the server (i.e. several minutes or more). Due to the nature of MVCC this can also slow even index-based queries down quite a bit.
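To spot such transactions (assuming PostgreSQL 9.2 or later, where pg_stat_activity has a state column):
-- List sessions that have been sitting idle inside a transaction.
SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;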
Also, what the query planner expects for the results vs. the actual numbers is quite different, so you may want to run ANALYZE on the table beforehand to update the stats.
1.) Take a look at http://www.postgresql.org/docs/9.3/static/runtime-config-resource.html and check out the tuning options for using more memory. This can speed up your search, but there is no guarantee (it depends on the answer above)!
2.) Move part of your tables/indexes to a faster tablespace, for example one based on SSDs.
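A minimal sketch of that idea (the tablespace name and directory are placeholders; the directory must already exist and be owned by the postgres OS user):
CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';
-- Move the hot index onto the faster storage.
ALTER INDEX index_participates_on_player_id SET TABLESPACE fast_ssd;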
I spent over an hour today puzzling over a query plan that I couldn't understand. The query was an UPDATE and it just wouldn't run at all. It appeared totally stuck: pg_locks showed it wasn't waiting for anything either. Now, I don't consider myself the best or worst query plan reader, but I find this one exceptionally difficult. I'm wondering: how does one read these? Is there a methodology that the Pg aces follow to pinpoint the error?
I plan on asking another question as to how to work around this issue, but right now I'm speaking specifically about how to read these types of plans.
QUERY PLAN
--------------------------------------------------------------------------------------------
Nested Loop Anti Join (cost=47680.88..169413.12 rows=1 width=77)
Join Filter: ((co.fkey_style = v.chrome_styleid) AND (co.name = o.name))
-> Nested Loop (cost=5301.58..31738.10 rows=1 width=81)
-> Hash Join (cost=5301.58..29722.32 rows=229 width=40)
Hash Cond: ((io.lot_id = iv.lot_id) AND ((io.vin)::text = (iv.vin)::text))
-> Seq Scan on options io (cost=0.00..20223.32 rows=23004 width=36)
Filter: (name IS NULL)
-> Hash (cost=4547.33..4547.33 rows=36150 width=24)
-> Seq Scan on vehicles iv (cost=0.00..4547.33 rows=36150 width=24)
Filter: (date_sold IS NULL)
-> Index Scan using options_pkey on options co (cost=0.00..8.79 rows=1 width=49)
Index Cond: ((co.fkey_style = iv.chrome_styleid) AND (co.code = io.code))
-> Hash Join (cost=42379.30..137424.09 rows=16729 width=26)
Hash Cond: ((v.lot_id = o.lot_id) AND ((v.vin)::text = (o.vin)::text))
-> Seq Scan on vehicles v (cost=0.00..4547.33 rows=65233 width=24)
-> Hash (cost=20223.32..20223.32 rows=931332 width=44)
-> Seq Scan on options o (cost=0.00..20223.32 rows=931332 width=44)
(17 rows)
The issue with this query plan - I believe I understand it - is probably best explained by RhodiumToad (he is definitely better at this than I am, so I'll bet on his explanation being better) on irc://irc.freenode.net/#postgresql:
oh, that plan is potentially disastrous
the problem with that plan is that it's running a hugely expensive hashjoin for each row
the problem is the rows=1 estimate from the other join and
the planner thinks it's ok to put a hugely expensive query in the inner path of a nestloop where the outer path is estimated to return only one row.
since, obviously, by the planner's estimate the expensive part will only be run once
but this has an obvious tendency to really mess up in practice
the problem is that the planner believes its own estimates
ideally, the planner needs to know the difference between "estimated to return 1 row" and "not possible to return more than 1 row"
but it's not at all clear how to incorporate that into the existing code
He goes on to say:
it can affect any join, but usually joins against subqueries are the most likely
Now, when I read this plan, the first thing I noticed was the Nested Loop Anti Join, which had a cost of 169,413 (I'll stick to upper bounds). This Anti Join breaks down into the result of a Nested Loop at a cost of 31,738 and the result of a Hash Join at a cost of 137,424. Since 137,424 is much greater than 31,738, I figured the problem was the Hash Join.
Then I proceeded to EXPLAIN ANALYZE the Hash Join segment outside of the query. It executed in 7 seconds. I made sure there were indexes on (lot_id, vin) and (co.code, v.code) -- there were. I disabled seq_scan and hashjoin individually and noticed a speed increase of less than 2 seconds. Not nearly enough to account for why it wasn't progressing after an hour.
But after all this, I'm totally wrong! Yes, it was the slower part of the query, but only because of the rows=1 estimate (I presume the one on the Nested Loop Anti Join). Is this a bug (or a limitation) in the planner, mis-estimating the number of rows? How am I supposed to read the plan to come to the same conclusion RhodiumToad did?
Is it simply the rows=1 that should have tipped me off?
I did run VACUUM FULL ANALYZE on all of the tables involved, and this is PostgreSQL 8.4.
Seeing through issues like this requires some experience with where things can go wrong. To find issues in a query plan, try to validate the produced plan from the inside out: check whether the row-count estimates are sane and whether the cost estimates match the time actually spent. By the way, the two cost estimates aren't lower and upper bounds; the first is the estimated cost to produce the first row of output, the second is the estimated total cost (see the EXPLAIN documentation for details; there is also some planner documentation available). It also helps to know how the different access methods work. As a starting point, Wikipedia has information on nested loop, hash, and merge joins.
In your example, you'd start with:
-> Seq Scan on options io (cost=0.00..20223.32 rows=23004 width=36)
Filter: (name IS NULL)
Run EXPLAIN ANALYZE SELECT * FROM options WHERE name IS NULL; and see if the number of returned rows matches the estimate. Being off by a factor of 2 usually isn't a problem; you're trying to spot order-of-magnitude differences.
Then see whether EXPLAIN ANALYZE SELECT * FROM vehicles WHERE date_sold IS NULL; returns the expected number of rows.
Then go up one level to the hash join:
-> Hash Join (cost=5301.58..29722.32 rows=229 width=40)
Hash Cond: ((io.lot_id = iv.lot_id) AND ((io.vin)::text = (iv.vin)::text))
See if EXPLAIN ANALYZE SELECT * FROM vehicles AS iv INNER JOIN options io ON (io.lot_id = iv.lot_id) AND ((io.vin)::text = (iv.vin)::text) WHERE iv.date_sold IS NULL AND io.name IS NULL; results in 229 rows.
Up one more level adds INNER JOIN options co ON (co.fkey_style = iv.chrome_styleid) AND (co.code = io.code) and is expected to return only one row. This is probably where the issue is, because if the actual number of rows is 100 rather than 1, the total cost estimate of traversing the inner loop of the containing nested loop is off by a factor of 100.
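Written out as a statement you can run (a sketch assembled from the joins described above, so double-check the join conditions against your schema):
EXPLAIN ANALYZE
SELECT *
FROM vehicles AS iv
JOIN options io
  ON io.lot_id = iv.lot_id
 AND (io.vin)::text = (iv.vin)::text
JOIN options co
  ON co.fkey_style = iv.chrome_styleid
 AND co.code = io.code
WHERE iv.date_sold IS NULL
  AND io.name IS NULL;
-- The planner expects 1 row at this level; compare with the actual count.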
The underlying mistake the planner is making is probably that it assumes the two predicates used to join co are independent of each other and multiplies their selectivities, while in reality they may be heavily correlated, in which case the combined selectivity is closer to MIN(s1, s2) than to s1*s2.
Did you ANALYZE the tables? And what does pg_stats have to say about these tables? The query plan is based on the stats, so they have to be OK. And what version do you use? 8.4?
The costs are calculated from the stats - the number of relpages, the number of rows - and the settings in postgresql.conf for the Planner Cost Constants.
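To see the numbers the planner works from (pg_class and pg_stats are standard system catalogs/views; adjust the table names):
-- Page and row counts that the cost estimates are based on.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('options', 'vehicles');
-- Per-column statistics used for selectivity estimates.
SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename IN ('options', 'vehicles');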
work_mem is also involved; it might be too low and force the planner to do a seqscan, killing performance...