PostgreSQL poorly planned query runs too long

I have a complex query, greatly simplified below, running on "PostgreSQL 11.9 on aarch64-unknown-linux-gnu, compiled by aarch64-unknown-linux-gnu-gcc (GCC) 7.4.0, 64-bit" on an AWS Aurora Serverless 2xlarge instance (8 cores, 64 GB RAM).
I have the following...
mv_journey, a materialized view with ~550M rows. Each row describes a journey with an origin and a destination plus some measures (how long the journey took, etc.). Columns from_id and from_region identify the origin, and to_id and to_region the destination.
place_from and place_to, which are calculated by a function, fn_location_get, in an initial step of a CTE. They contain id and region (which map to from_id/from_region and to_id/to_region, respectively), plus rollup levels derived from the region, e.g. country and continent. Typically these return between ~100 and ~20,000 rows.
Later in that CTE, I use place_from and place_to to filter the 550M mv_journey rows and group them to create a rollup report on journeys, e.g. from country to country.
The simplified query is something like this.
WITH place_from AS (
select *
from fn_location_get(...)
), place_to AS (
select *
from fn_location_get(...)
)
select [many dimension columns...]
, [a few aggregated measure columns]
from mv_journey j
inner join place_from o on j.from_id = o.id
and j.from_region = o.region
inner join place_to d on j.to_id = d.id
and j.to_region = d.region
where service_type_id = ?
group by [many dimension columns...]
I have indexes on mv_journey
CREATE INDEX idx_mv_journey_from ON mv_journey (from_id, from_region);
CREATE INDEX idx_mv_journey_to ON mv_journey (to_id, to_region);
When I run the query (using SET LOCAL work_mem = '2048MB' to invoke quicksorts) with a small number of rows in place_from (92) and a large number in place_to (~18,000), the query runs in about 25 seconds, with the following query plan (which includes the steps in the CTE that generate place_from and place_to).
"GroupAggregate (cost=530108.64..530129.64 rows=30 width=686) (actual time=13097.187..25408.707 rows=92 loops=1)"
" Group Key: [many dimension columns...]"
" CTE place_from"
" -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396) (actual time=34.275..34.331 rows=92 loops=1)"
" CTE place_to"
" -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396) (actual time=96.287..97.428 rows=18085 loops=1)"
" -> Sort (cost=530088.14..530088.22 rows=30 width=622) (actual time=12935.329..13295.468 rows=1871349 loops=1)"
" Sort Key: [many dimension columns...]"
" Sort Method: quicksort Memory: 826782kB"
" -> Merge Join (cost=529643.68..530087.41 rows=30 width=622) (actual time=4708.780..6021.449 rows=1871349 loops=1)"
" Merge Cond: ((j.to_id = d.id) AND (j.to_region = d.region))"
" -> Sort (cost=529573.85..529719.16 rows=58124 width=340) (actual time=4583.265..4788.625 rows=1878801 loops=1)"
" Sort Key: j.to_id, j.to_region"
" Sort Method: quicksort Memory: 623260kB"
" -> Nested Loop (cost=0.57..524974.25 rows=58124 width=340) (actual time=34.324..3079.815 rows=1878801 loops=1)"
" -> CTE Scan on place_from o (cost=0.00..20.00 rows=1000 width=320) (actual time=34.277..34.432 rows=92 loops=1)"
" -> Index Scan using idx_mv_journey_from on mv_journey j (cost=0.57..524.37 rows=58 width=60) (actual time=0.018..30.022 rows=20422 loops=92)"
" Index Cond: ((from_id = o.id) AND (from_region = o.region))"
" Filter: (service_type_id = 'ALL'::text)"
" Rows Removed by Filter: 81687"
" -> Sort (cost=69.83..72.33 rows=1000 width=320) (actual time=125.505..223.780 rows=1871350 loops=1)"
" Sort Key: d.id, d.region"
" Sort Method: quicksort Memory: 3329kB"
" -> CTE Scan on place_to d (cost=0.00..20.00 rows=1000 width=320) (actual time=96.292..103.677 rows=18085 loops=1)"
"Planning Time: 0.546 ms"
"Execution Time: 25501.827 ms"
The problem is that when I swap the from/to locations, i.e. a large number of rows in place_from (~18,000) and a small number in place_to (92), the query takes forever. By the way, mv_journey is expected to match about the same number of rows in both cases; there are not more records in one direction than the other.
I have not once got this second query to complete; it runs for hours until pgAdmin 4 loses the connection to the server, so I cannot even get an EXPLAIN ANALYZE for it. However, I do have the EXPLAIN:
"GroupAggregate (cost=474135.40..474152.90 rows=25 width=686)"
" Group Key: [many dimension columns...]"
" CTE place_from"
" -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396)"
" CTE place_to"
" -> Function Scan on fn_location_get (cost=0.25..10.25 rows=1000 width=396)"
" -> Sort (cost=474114.90..474114.96 rows=25 width=622)"
" Sort Key: [many dimension columns...]"
" -> Merge Join (cost=473720.23..474114.31 rows=25 width=622)"
" Merge Cond: ((j.to_id = d.id) AND (j.to_region = d.region))## Heading ##"
" -> Sort (cost=473650.40..473779.18 rows=51511 width=340)"
" Sort Key: j.to_id, j.to_region"
" -> Nested Loop (cost=0.57..469619.00 rows=51511 width=340)"
" -> CTE Scan on place_from o (cost=0.00..20.00 rows=1000 width=320)"
" -> Index Scan using idx_mv_journey_from on mv_journey j (cost=0.57..469.08 rows=52 width=60)"
" Index Cond: ((from_id = o.id) AND (from_region = o.region))"
" Filter: (service_type_id = 'ALL'::text)"
" -> Sort (cost=69.83..72.33 rows=1000 width=320)"
" Sort Key: d.id, d.region"
" -> CTE Scan on place_to d (cost=0.00..20.00 rows=1000 width=320)"
My assumption was that if I had the equivalent indexes on both sides of the from/to, then Postgres would use the mirror-opposite query plan, doing a merge join for the origin and a nested loop join using idx_mv_journey_to for the destination.
But it looks like the query planner's row count estimates are way off in both queries. It seems only luck that the first query performs so well despite that.
I have tried the following, none of which worked:
swap the inner join statements so the destination join is first
ALTER TABLE mv_journey ALTER COLUMN to_id SET STATISTICS 1000; ANALYZE mv_journey
ALTER TABLE mv_journey ALTER COLUMN from_id SET STATISTICS 1000; ANALYZE mv_journey
I guess the plan is made before the CTEs are executed, and that's why the planner has no idea what will come out of the fn_location_get calls that create the place_from and place_to sets?
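For what it's worth, the rows=1000 in the function scans above is just PostgreSQL's default estimate for a set-returning function. It can be overridden with a fixed ROWS setting on the function, though a fixed value obviously can't adapt when the calls return anywhere from ~100 to ~20,000 rows. A sketch (the 5000 is an arbitrary illustrative value):
-- the argument list can be omitted on PostgreSQL 10+ if the function name is unique
ALTER FUNCTION fn_location_get ROWS 5000;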
fn_location_get is a complicated function with its own recursive CTE and I don't want to bring its logic out of the function and into this CTE.
What's the best way out of this mess?

The most straightforward approach would be to create two temp tables as the result of the function calls, manually ANALYZE them, then run the query against the temp tables rather than the function calls.

I sort of worked out an answer in writing the question... don't use a CTE, but use temp tables instead.
DROP TABLE IF EXISTS place_from;
CREATE TEMP TABLE place_from AS
select *
from fn_location_get(...);
DROP TABLE IF EXISTS place_to;
CREATE TEMP TABLE place_to AS
select *
from fn_location_get(...);
select [many dimension columns...]
, [a few aggregated measure columns]
from mv_journey j
inner join place_from o on j.from_id = o.id
and j.from_region = o.region
inner join place_to d on j.to_id = d.id
and j.to_region = d.region
where service_type_id = ?
group by [many dimension columns...]
I thought this worked because by the time the query plan for the reporting select is done, the temp tables' row counts are known, and a better query plan can be made.
However, the row estimates are still inaccurate: good enough to choose the right plan, but inaccurate.
"GroupAggregate (cost=200682.98..200706.78 rows=34 width=686) (actual time=21233.486..33200.052 rows=92 loops=1)"
" Group Key: [many dimension columns...]"
" -> Sort (cost=200682.98..200683.07 rows=34 width=622) (actual time=21077.807..21443.739 rows=1802571 loops=1)"
" Sort Key: [many dimension columns...]"
" Sort Method: quicksort Memory: 800480kB"
" -> Merge Join (cost=200555.00..200682.12 rows=34 width=622) (actual time=4820.798..6106.722 rows=1802571 loops=1)"
" Merge Cond: ((from_id = o.id) AND (from_region = o.region))"
" -> Sort (cost=199652.79..199677.24 rows=9779 width=340) (actual time=4794.354..5003.954 rows=1810023 loops=1)"
" Sort Key: j.from_id, j.from_region"
" Sort Method: quicksort Memory: 603741kB"
" -> Nested Loop (cost=0.57..199004.67 rows=9779 width=340) (actual time=0.044..3498.767 rows=1810023 loops=1)"
" -> Seq Scan on place_to d (cost=0.00..11.90 rows=190 width=320) (actual time=0.006..0.078 rows=92 loops=1)"
" -> Index Scan using idx_mv_journey_to on mv_journey j (cost=0.57..1046.82 rows=51 width=60) (actual time=0.020..35.055 rows=19674 loops=92)"
" Index Cond: ((j.to_id = d.id) AND (j.to_region = d.region))"
" Filter: (service_type_id = 'ALL'::text)"
" Rows Removed by Filter: 78697"
" -> Sort (cost=902.20..920.02 rows=7125 width=320) (actual time=26.434..121.106 rows=1802572 loops=1)"
" Sort Key: o.id, o.region"
" Sort Method: quicksort Memory: 3329kB"
" -> Seq Scan on place_from o (cost=0.00..446.25 rows=7125 width=320) (actual time=0.016..4.205 rows=18085 loops=1)"
"Planning Time: 0.792 ms"
"Execution Time: 33286.461 ms"
UPDATE: After adding a manual ANALYZE after each CREATE, as jjanes suggests, the estimates are now as expected.
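That is, the temp-table version above with an ANALYZE added after each CREATE:
DROP TABLE IF EXISTS place_from;
CREATE TEMP TABLE place_from AS
select *
from fn_location_get(...);
ANALYZE place_from;
DROP TABLE IF EXISTS place_to;
CREATE TEMP TABLE place_to AS
select *
from fn_location_get(...);
ANALYZE place_to;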

Related

Efficient way of calculating overlap area in PostGIS

I am working on calculating the overlap of two tables/layers in a PostGIS database. One set is a grid of hexagons, and for each hexagon I want to calculate the fraction of overlap with another set of polygons. The multipolygon set also has a few polygons that overlap each other, so I need to dissolve/union those. Previously I did this in FME, but it kept running out of memory for some of the larger polygons. That's why I want to do it in the database (and PostGIS should be very much capable of that).
Here is what I have so far; it works, and memory is no longer running out:
EXPLAIN ANALYZE
WITH rh_union AS (
SELECT (ST_Dump(ST_Union(rh.geometry))).geom AS geometry
FROM relevant_habitats rh
WHERE rh.assessment_area_id=1
)
SELECT h.receptor_id,
SUM(ROUND((ST_Area(ST_Intersection(rhu.geometry, h.geometry)) / ST_Area(h.geometry))::numeric,3)) AS frac_overlap
FROM rh_union rhu, hexagons h
WHERE h.zoom_level=1 AND ST_Intersects(rhu.geometry, h.geometry)
GROUP BY h.receptor_id
So I first union the polygons and then break (dump) the result back into single polygons. Then I calculate the overlay of the hexagons with those polygons. Then I calculate the (sum of all the small pieces of) area.
Now, my question is:
is this an efficient way of doing this? Or would there be a better way?
(And a side question: is it correct to first ST_Union and then ST_Dump?)
--Update with EXPLAIN ANALYZE
Output for a single area:
"QUERY PLAN"
"GroupAggregate (cost=1996736.77..15410052.20 rows=390140 width=36) (actual time=571.303..1137.657 rows=685 loops=1)"
" Group Key: h.receptor_id"
" -> Sort (cost=1996736.77..1998063.55 rows=530712 width=188) (actual time=571.090..620.379 rows=806 loops=1)"
" Sort Key: h.receptor_id"
" Sort Method: external merge Disk: 42152kB"
" -> Nested Loop (cost=55.53..1848314.51 rows=530712 width=188) (actual time=382.769..424.643 rows=806 loops=1)"
" -> Result (cost=55.25..1321.51 rows=1000 width=32) (actual time=382.550..383.696 rows=65 loops=1)"
" -> ProjectSet (cost=55.25..61.51 rows=1000 width=32) (actual time=382.544..383.652 rows=65 loops=1)"
" -> Aggregate (cost=55.25..55.26 rows=1 width=32) (actual time=381.323..381.325 rows=1 loops=1)"
" -> Index Scan using relevant_habitats_pkey on relevant_habitats rh (cost=0.28..28.75 rows=12 width=130244) (actual time=0.026..0.048 rows=12 loops=1)"
" Index Cond: (assessment_area_id = 94)"
" -> Index Scan using idx_hexagons_geometry_gist on hexagons h (cost=0.29..1846.45 rows=53 width=156) (actual time=0.315..0.624 rows=12 loops=65)"
" Index Cond: (geometry && (((st_dump((st_union(rh.geometry))))).geom))"
" Filter: (((zoom_level)::integer = 1) AND st_intersects((((st_dump((st_union(rh.geometry))))).geom), geometry))"
" Rows Removed by Filter: 19"
"Planning Time: 0.390 ms"
"Execution Time: 1372.443 ms"
Update 2: now the output of EXPLAIN ANALYZE for the second select (SELECT h.receptor_id ...) with the CTE replaced by a (temp) table:
"QUERY PLAN"
"GroupAggregate (cost=2691484.47..20931829.74 rows=390140 width=36) (actual time=29.455..927.945 rows=685 loops=1)"
" Group Key: h.receptor_id"
" -> Sort (cost=2691484.47..2693288.89 rows=721768 width=188) (actual time=28.382..31.514 rows=806 loops=1)"
" Sort Key: h.receptor_id"
" Sort Method: quicksort Memory: 336kB"
" -> Nested Loop (cost=0.29..2488035.20 rows=721768 width=188) (actual time=0.189..27.852 rows=806 loops=1)"
" -> Seq Scan on rh_union rhu (cost=0.00..23.60 rows=1360 width=32) (actual time=0.016..0.050 rows=65 loops=1)"
" -> Index Scan using idx_hexagons_geometry_gist on hexagons h (cost=0.29..1828.89 rows=53 width=156) (actual time=0.258..0.398 rows=12 loops=65)"
" Index Cond: (geometry && rhu.geometry)"
" Filter: (((zoom_level)::integer = 1) AND st_intersects(rhu.geometry, geometry))"
" Rows Removed by Filter: 19"
"Planning Time: 0.481 ms"
"Execution Time: 928.583 ms"
You want a metric describing the extent to which polygons in one table overlap polygons in another table.
This query returns the ids of overlapping polygons within the same table and an overlap indicator as a percentage.
SELECT
CONCAT(a.polyid,' ',b.polyid) AS intersecting_polys,
CONCAT(a.attribute_x,' ',b.attribute_x) AS attribute,
ROUND((100 * (ST_Area(ST_Intersection(a.geom, b.geom))) / ST_Area(a.geom))::NUMERIC,0) AS pc_overlap
FROM your_schema.your_table AS a
LEFT JOIN your_schema.your_table AS b
ON (a.geom && b.geom AND ST_Relate(a.geom, b.geom, '2********'))
WHERE a.polyid != b.polyid
;
Note that the term
ON (a.geom && b.geom AND ST_Relate(a.geom, b.geom, '2********')) is used instead of ST_Covers. You might want to experiment to see which is correct for your use case.
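For comparison, the ST_Covers form of the same join condition is sketched below. The semantics differ: the '2********' pattern only requires the two interiors to share some area, while ST_Covers(a.geom, b.geom) requires b to lie entirely within a.
LEFT JOIN your_schema.your_table AS b
ON (a.geom && b.geom AND ST_Covers(a.geom, b.geom))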

Postgres Explain Plans Are Different for the same query with different values

I have databases running Postgres 9.56 on Heroku.
I'm running the following SQL with different parameter values, but getting very different performance.
Query 1
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
FROM tk_seat s
LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
WHERE DATE_PART('year', t.departure)= '2017'
AND t.trip_status = 'BOOKABLE'
AND t.route_id = '278'
AND s.seat_status_type != 'NONE'
AND s.operator_id = '15'
GROUP BY DATE_TRUNC('MONTH', t.departure)
ORDER BY DATE_TRUNC('MONTH', t.departure)
Query 2
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
FROM tk_seat s
LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
WHERE DATE_PART('year', t.departure)= '2017'
AND t.trip_status = 'BOOKABLE'
AND t.route_id = '150'
AND s.seat_status_type != 'NONE'
AND s.operator_id = '15'
GROUP BY DATE_TRUNC('MONTH', t.departure)
ORDER BY DATE_TRUNC('MONTH', t.departure)
The only difference is the t.route_id value.
So I ran EXPLAIN on both, and the plans are very different.
For Query 1
"GroupAggregate (cost=279335.17..279335.19 rows=1 width=298)"
" Group Key: (date_trunc('MONTH'::text, t.departure))"
" -> Sort (cost=279335.17..279335.17 rows=1 width=298)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" -> Nested Loop (cost=0.00..279335.16 rows=1 width=298)"
" Join Filter: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_trip t (cost=0.00..5951.88 rows=1 width=12)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (route_id = '278'::bigint) AND (date_part('year'::text, departure) = '2017'::double precision))"
" -> Seq Scan on tk_seat s (cost=0.00..271738.35 rows=131594 width=298)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
For Query 2
"Sort (cost=278183.94..278183.95 rows=1 width=298)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" -> HashAggregate (cost=278183.92..278183.93 rows=1 width=298)"
" Group Key: date_trunc('MONTH'::text, t.departure)"
" -> Hash Join (cost=5951.97..278183.88 rows=7 width=298)"
" Hash Cond: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_seat s (cost=0.00..271738.35 rows=131594 width=298)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" -> Hash (cost=5951.88..5951.88 rows=7 width=12)"
" -> Seq Scan on tk_trip t (cost=0.00..5951.88 rows=7 width=12)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (route_id = '150'::bigint) AND (date_part('year'::text, departure) = '2017'::double precision))"
My question is: why, and how can I make them the same? The first query gives me very bad performance.
Query 1 Analyze
"GroupAggregate (cost=274051.28..274051.31 rows=1 width=8) (actual time=904682.606..904684.283 rows=7 loops=1)"
" Group Key: (date_trunc('MONTH'::text, t.departure))"
" -> Sort (cost=274051.28..274051.29 rows=1 width=8) (actual time=904682.432..904682.917 rows=13520 loops=1)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" Sort Method: quicksort Memory: 1018kB"
" -> Nested Loop (cost=0.42..274051.27 rows=1 width=8) (actual time=1133.925..904676.254 rows=13520 loops=1)"
" Join Filter: (s.trip_id = t.trip_id)"
" Rows Removed by Join Filter: 42505528"
" -> Index Scan using tk_trip_route_id_idx on tk_trip t (cost=0.42..651.34 rows=1 width=12) (actual time=0.020..2.720 rows=338 loops=1)"
" Index Cond: (route_id = '278'::bigint)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (date_part('year'::text, departure) = '2017'::double precision))"
" Rows Removed by Filter: 28"
" -> Seq Scan on tk_seat s (cost=0.00..271715.83 rows=134728 width=8) (actual time=0.071..2662.102 rows=125796 loops=338)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" Rows Removed by Filter: 6782294"
"Planning time: 1.172 ms"
"Execution time: 904684.570 ms"
Query 2 Analyze
"Sort (cost=275018.88..275018.89 rows=1 width=8) (actual time=2153.843..2153.843 rows=9 loops=1)"
" Sort Key: (date_trunc('MONTH'::text, t.departure))"
" Sort Method: quicksort Memory: 25kB"
" -> HashAggregate (cost=275018.86..275018.87 rows=1 width=8) (actual time=2153.833..2153.834 rows=9 loops=1)"
" Group Key: date_trunc('MONTH'::text, t.departure)"
" -> Hash Join (cost=2797.67..275018.82 rows=7 width=8) (actual time=2.472..2147.093 rows=36565 loops=1)"
" Hash Cond: (s.trip_id = t.trip_id)"
" -> Seq Scan on tk_seat s (cost=0.00..271715.83 rows=134728 width=8) (actual time=0.127..2116.153 rows=125796 loops=1)"
" Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
" Rows Removed by Filter: 6782294"
" -> Hash (cost=2797.58..2797.58 rows=7 width=12) (actual time=1.853..1.853 rows=1430 loops=1)"
" Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 78kB"
" -> Bitmap Heap Scan on tk_trip t (cost=32.21..2797.58 rows=7 width=12) (actual time=0.176..1.559 rows=1430 loops=1)"
" Recheck Cond: (route_id = '150'::bigint)"
" Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (date_part('year'::text, departure) = '2017'::double precision))"
" Rows Removed by Filter: 33"
" Heap Blocks: exact=333"
" -> Bitmap Index Scan on tk_trip_route_id_idx (cost=0.00..32.21 rows=1572 width=0) (actual time=0.131..0.131 rows=1463 loops=1)"
" Index Cond: (route_id = '150'::bigint)"
"Planning time: 0.211 ms"
"Execution time: 2153.972 ms"
You can - possibly - make them the same if you hint Postgres not to use nested loops:
SET enable_nestloop = 'off';
You can make it permanent by setting it at the database or role level, inside a function definition, or in the server configuration:
ALTER DATABASE postgres
SET enable_nestloop = 'off';
ALTER ROLE lkaminski
SET enable_nestloop = 'off';
CREATE FUNCTION add(integer, integer) RETURNS integer
AS 'select $1 + $2;'
LANGUAGE SQL
SET enable_nestloop = 'off'
IMMUTABLE
RETURNS NULL ON NULL INPUT;
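If you don't want to change it permanently, it can also be disabled for a single transaction only, so the rest of the session keeps the default planner settings:
BEGIN;
SET LOCAL enable_nestloop = 'off';
-- run the query here
COMMIT;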
As for why: you changed the search condition, and the planner estimates that it will get 1 row from tk_trip instead of 7, so it switches plans because a nested loop looks better. Sometimes those estimates are wrong and you get a slower execution time. But if you "force" it not to use nested loops, then for a different parameter it could be slower to use the second plan instead of the first one (with the nested loop).
You can make the planner's estimates more accurate by increasing how many statistics it gathers for the column. It might help.
ALTER TABLE tk_trip ALTER COLUMN route_id SET STATISTICS 1000;
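Note that the new statistics target only takes effect the next time the table is analyzed, so follow it with:
ANALYZE tk_trip;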
As a side note: your LEFT JOIN is actually an INNER JOIN, because you have put the conditions for that table inside WHERE instead of ON. You should get a different plan (and result) if you move them over to ON, assuming you actually wanted a LEFT JOIN.
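For example, a true LEFT JOIN version of the first query would keep the tk_trip conditions in the ON clause, along the lines of this sketch (note it then also produces a row for seats with no matching trip, grouped under a NULL month, which may or may not be what you want):
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
FROM tk_seat s
LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
    AND DATE_PART('year', t.departure) = '2017'
    AND t.trip_status = 'BOOKABLE'
    AND t.route_id = '278'
WHERE s.seat_status_type != 'NONE'
AND s.operator_id = '15'
GROUP BY DATE_TRUNC('MONTH', t.departure)
ORDER BY DATE_TRUNC('MONTH', t.departure)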

PostgreSQL Query Too Slow

I use PostgreSQL to store my data, and I also created an index to speed up the query time.
After I created the index on the table, the query ran very fast, about 1.5 s per query.
But a few days later, the query was running far too slowly, about 20-28 s per query.
I tried dropping the index and then re-creating it, and the query runs fast again.
Could you help me resolve this issue, or do you have any idea about this problem?
P.S.: here is the query:
SELECT category,
video_title AS title,
event_count AS contentView,
VOD_GROUPBY_ANDSORT.rank
FROM
(SELECT VOD_SORTBY_VIEW.category,
VOD_SORTBY_VIEW.video_title,
VOD_SORTBY_VIEW.event_count,
ROW_NUMBER() OVER(PARTITION BY VOD_SORTBY_VIEW.category
ORDER BY VOD_SORTBY_VIEW.event_count DESC) AS RN,
DENSE_RANK() OVER(
ORDER BY VOD_SORTBY_VIEW.category ASC) AS rank
FROM
(SELECT VOD.category AS category,
VOD.video_title,
SUM(INV.event_count) AS event_count
FROM
(SELECT content_hash.hash_value,
VODCT.category,
VODCT.video_title
FROM
(SELECT vod_content.content_id,
vod_content.category,
vod_content.video_title
FROM vod_content
WHERE vod_content.category IS NOT NULL) VODCT
LEFT JOIN content_hash ON content_hash.content_value = VODCT.content_id) VOD
LEFT JOIN inventory_stats INV ON INV.hash_value = VOD.hash_value
WHERE TIME BETWEEN '2017-02-06 08:00:00'::TIMESTAMP AND '2017-03-06 08:00:00'::TIMESTAMP
GROUP BY VOD.category,
VOD.video_title ) VOD_SORTBY_VIEW ) VOD_GROUPBY_ANDSORT
WHERE RN <= 3
AND event_count > 100
ORDER BY category
And here is the Analyze:
"QUERY PLAN"
"Subquery Scan on vod_groupby_andsort (cost=368586.86..371458.16 rows=6381 width=63) (actual time=19638.213..19647.468 rows=177 loops=1)"
" Filter: ((vod_groupby_andsort.rn <= 3) AND (vod_groupby_andsort.event_count > 100))"
" Rows Removed by Filter: 10246"
" -> WindowAgg (cost=368586.86..370596.77 rows=57426 width=71) (actual time=19638.199..19646.856 rows=10423 loops=1)"
" -> WindowAgg (cost=368586.86..369735.38 rows=57426 width=63) (actual time=19638.194..19642.030 rows=10423 loops=1)"
" -> Sort (cost=368586.86..368730.43 rows=57426 width=55) (actual time=19638.185..19638.984 rows=10423 loops=1)"
" Sort Key: vod_sortby_view.category, vod_sortby_view.event_count DESC"
" Sort Method: quicksort Memory: 1679kB"
" -> Subquery Scan on vod_sortby_view (cost=350535.62..362084.01 rows=57426 width=55) (actual time=16478.589..19629.400 rows=10423 loops=1)"
" -> GroupAggregate (cost=350535.62..361509.75 rows=57426 width=55) (actual time=16478.589..19628.381 rows=10423 loops=1)"
" Group Key: vod_content.category, vod_content.video_title"
" -> Sort (cost=350535.62..353135.58 rows=1039987 width=51) (actual time=16478.570..19436.741 rows=1275817 loops=1)"
" Sort Key: vod_content.category, vod_content.video_title"
" Sort Method: external merge Disk: 76176kB"
" -> Hash Join (cost=95179.29..175499.62 rows=1039987 width=51) (actual time=299.040..807.418 rows=1275817 loops=1)"
" Hash Cond: (inv.hash_value = content_hash.hash_value)"
" -> Bitmap Heap Scan on inventory_stats inv (cost=48708.84..114604.81 rows=1073198 width=23) (actual time=133.873..269.249 rows=1397466 loops=1)"
" Recheck Cond: ((""time"" >= '2017-02-06 08:00:00'::timestamp without time zone) AND (""time"" <= '2017-03-06 08:00:00'::timestamp without time zone))"
" Heap Blocks: exact=11647"
" -> Bitmap Index Scan on inventory_stats_pkey (cost=0.00..48440.54 rows=1073198 width=0) (actual time=132.113..132.113 rows=1397466 loops=1)"
" Index Cond: ((""time"" >= '2017-02-06 08:00:00'::timestamp without time zone) AND (""time"" <= '2017-03-06 08:00:00'::timestamp without time zone))"
" -> Hash (cost=46373.37..46373.37 rows=7766 width=66) (actual time=165.125..165.125 rows=13916 loops=1)"
" Buckets: 16384 (originally 8192) Batches: 1 (originally 1) Memory Usage: 1505kB"
" -> Nested Loop (cost=1.72..46373.37 rows=7766 width=66) (actual time=0.045..159.441 rows=13916 loops=1)"
" -> Seq Scan on content_hash (cost=0.00..389.14 rows=8014 width=43) (actual time=0.007..2.185 rows=16365 loops=1)"
" -> Bitmap Heap Scan on vod_content (cost=1.72..5.73 rows=1 width=72) (actual time=0.009..0.009 rows=1 loops=16365)"
" Recheck Cond: (content_id = content_hash.content_value)"
" Filter: (category IS NOT NULL)"
" Rows Removed by Filter: 0"
" Heap Blocks: exact=15243"
" -> Bitmap Index Scan on vod_content_pkey (cost=0.00..1.72 rows=1 width=0) (actual time=0.007..0.007 rows=1 loops=16365)"
" Index Cond: (content_id = content_hash.content_value)"
"Planning time: 1.665 ms"
"Execution time: 19655.693 ms"
You probably need to vacuum and analyze your tables more aggressively, especially if you're doing a lot of deletes and updates.
When a row is deleted or updated, it isn't removed, it's just marked obsolete. vacuum cleans up these dead rows.
analyze updates the statistics about your data used by the query planner.
Normally these are run by the autovacuum daemon. It's possible this has been disabled, or it's not running frequently enough.
See this blog about Slow PostgreSQL Performance and the PostgreSQL docs about Routine Vacuuming for more details.
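If you want to check whether autovacuum has been keeping up, pg_stat_user_tables records the last (auto)vacuum and (auto)analyze times and the dead-row count, and a manual pass is harmless. A sketch, using the table names from your query:
SELECT relname, last_autovacuum, last_autoanalyze, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('inventory_stats', 'vod_content', 'content_hash');
VACUUM (VERBOSE, ANALYZE) inventory_stats;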
Here is an attempt at a condensed version of the query. I'm not saying it's any faster. Also since I can't run it, there might be some issues.
The left joins were converted to inner joins since the time value from the second join is required.
Also, I'm curious what the purpose of the dense_rank function is. Looks like you are getting the top 3 titles for each category and then giving the ones in the same category all the same number based on alphanumeric sort? The category already gives them a unique common identifier.
SELECT category, video_title AS title, event_count AS contentView,
DENSE_RANK() OVER(ORDER BY v.category ASC) AS rank
FROM (
SELECT c.category, c.video_title,
SUM(i.event_count) AS event_count,
ROW_NUMBER() OVER(PARTITION BY category ORDER BY sum(i.event_count) DESC) AS rn
FROM vod_content c
JOIN content_hash h ON h.content_value = c.content_id
JOIN inventory_stats i ON i.hash_value = h.hash_value
where c.category is not null
and i.time BETWEEN '2017-02-06 08:00:00'::TIMESTAMP AND '2017-03-06 08:00:00'::TIMESTAMP
GROUP BY c.category, c.video_title
) v
where rn <= 3 and event_count > 100
ORDER BY category

PostgreSQL complex summation query

I have the following tables:
video (id, name)
keyframe (id, name, video_id) /*video_id has fk on video.id*/
detector (id, concepts)
score (detector_id, keyframe_id, score) /*detector_id has fk on detector .id and keyframe_id has fk on keyframe.id*/
In essence, a video has multiple keyframes associated with it, and each keyframe has been scored by all detectors. Each detector has a string of concepts it will score the keyframes on.
Now, I want to find, in a single query if possible, the following:
Given an array of detector id's (say, max 5), return the top 10 videos that have the best score on those detectors combined. Scoring them by averaging the keyframe scores per video per detector, and then summing the detector scores.
Example:
For a video with 3 associated keyframes with the following scores for 2 detectors:
detector_id | keyframe_id | score
          1 |           1 | 0.0281
          1 |           2 | 0.0012
          1 |           3 | 0.0269
          2 |           1 | 0.1341
          2 |           2 | 0.9726
          2 |           3 | 0.7125
This would give a score of the video:
sum(avg(0.0281, 0.0012, 0.0269), avg(0.1341, 0.9726, 0.7125))
Eventually I want the following result:
video_id | score
       1 | 0.417328
       2 | ...
It has to be something like this I think, but I'm not quite there yet:
select
(select
(select sum(avg_score) summed_score
from
(select
avg(s.score) avg_score
from score s
where s.detector_id = ANY(array[1,2,3,4,5]) and s.keyframe_id = kf.id) x)
from keyframe kf
where kf.video_id = v.id) y
from video v
My score table is pretty big (100M rows), so I'd like it to be as fast as possible (all other options I tried take minutes to complete). I have a total of about 3000 videos, 500 detectors, and about 15 keyframes per video.
If it's not possible to do this in less than ~2s, then I am also open to ways of restructuring the database schema's. There will probably be no inserts/deletions in the database at all.
EDIT:
Thanks to GabrielsMessanger I have an answer, here is the query plan:
EXPLAIN (analyze, verbose)
SELECT
v_id, sum(fd_avg_score)
FROM (
SELECT
v.id as v_id, k.id as k_id, d.id as d_id,
avg(s.score) as fd_avg_score
FROM
video v
JOIN keyframe k ON k.video_id = v.id
JOIN score s ON s.keyframe_id = k.id
JOIN detector d ON d.id = s.detector_id
WHERE
d.id = ANY(ARRAY[1,2,3,4,5]) /*here goes detector's array*/
GROUP BY
v.id,
k.id,
d.id
) sub
GROUP BY
v_id
;
"GroupAggregate (cost=1865513.09..1910370.09 rows=200 width=12) (actual time=52141.684..52908.198 rows=2991 loops=1)"
" Output: v.id, sum((avg(s.score)))"
" Group Key: v.id"
" -> GroupAggregate (cost=1865513.09..1893547.46 rows=1121375 width=20) (actual time=52141.623..52793.184 rows=1121375 loops=1)"
" Output: v.id, k.id, d.id, avg(s.score)"
" Group Key: v.id, k.id, d.id"
" -> Sort (cost=1865513.09..1868316.53 rows=1121375 width=20) (actual time=52141.613..52468.062 rows=1121375 loops=1)"
" Output: v.id, k.id, d.id, s.score"
" Sort Key: v.id, k.id, d.id"
" Sort Method: external merge Disk: 37232kB"
" -> Hash Join (cost=11821.18..1729834.13 rows=1121375 width=20) (actual time=120.706..51375.777 rows=1121375 loops=1)"
" Output: v.id, k.id, d.id, s.score"
" Hash Cond: (k.video_id = v.id)"
" -> Hash Join (cost=11736.89..1711527.49 rows=1121375 width=20) (actual time=119.862..51141.066 rows=1121375 loops=1)"
" Output: k.id, k.video_id, s.score, d.id"
" Hash Cond: (s.keyframe_id = k.id)"
" -> Nested Loop (cost=4186.70..1673925.96 rows=1121375 width=16) (actual time=50.878..50034.247 rows=1121375 loops=1)"
" Output: s.score, s.keyframe_id, d.id"
" -> Seq Scan on public.detector d (cost=0.00..11.08 rows=5 width=4) (actual time=0.011..0.079 rows=5 loops=1)"
" Output: d.id, d.concepts"
" Filter: (d.id = ANY ('{1,2,3,4,5}'::integer[]))"
" Rows Removed by Filter: 492"
" -> Bitmap Heap Scan on public.score s (cost=4186.70..332540.23 rows=224275 width=16) (actual time=56.040..9961.040 rows=224275 loops=5)"
" Output: s.detector_id, s.keyframe_id, s.score"
" Recheck Cond: (s.detector_id = d.id)"
" Rows Removed by Index Recheck: 34169904"
" Heap Blocks: exact=192845 lossy=928530"
" -> Bitmap Index Scan on score_index (cost=0.00..4130.63 rows=224275 width=0) (actual time=49.748..49.748 rows=224275 loops=5)"
" Index Cond: (s.detector_id = d.id)"
" -> Hash (cost=3869.75..3869.75 rows=224275 width=8) (actual time=68.924..68.924 rows=224275 loops=1)"
" Output: k.id, k.video_id"
" Buckets: 16384 Batches: 4 Memory Usage: 2205kB"
" -> Seq Scan on public.keyframe k (cost=0.00..3869.75 rows=224275 width=8) (actual time=0.003..33.662 rows=224275 loops=1)"
" Output: k.id, k.video_id"
" -> Hash (cost=46.91..46.91 rows=2991 width=4) (actual time=0.834..0.834 rows=2991 loops=1)"
" Output: v.id"
" Buckets: 1024 Batches: 1 Memory Usage: 106kB"
" -> Seq Scan on public.video v (cost=0.00..46.91 rows=2991 width=4) (actual time=0.005..0.417 rows=2991 loops=1)"
" Output: v.id"
"Planning time: 2.136 ms"
"Execution time: 52914.840 ms"
Disclaimer:
My final answer is based on comments and an extended chat discussion with the author. One thing that should be noted: every keyframe_id is assigned to only one video.
Original answer:
Isn't this as simple as the following query?
SELECT
v_id, sum(fd_avg_score)
FROM (
SELECT
v.id as v_id, k.id as k_id, s.detector_id as d_id,
avg(s.score) as fd_avg_score
FROM
video v
JOIN keyframe k ON k.video_id = v.id
JOIN score s ON s.keyframe_id = k.id
WHERE
s.detector_id = ANY(ARRAY[1,2,3,4,5]) /*here goes detector's array*/
GROUP BY
v.id,
k.id,
detector_id
) sub
GROUP BY
v_id
LIMIT 10
;
First, in the subquery we join videos with their keyframes and keyframes with scores. We calculate the avg score per video, per keyframe of those videos, and per detector (as you said). Lastly, in the outer query we sum the avg scores per video.
Performance
As the author noted, he has PRIMARY KEYs on the id column of every table, and also has a composite index on score(detector_id, keyframe_id). This could be sufficient to run this query quickly.
But while testing, the author needed further optimizations. So two things:
Remember to always perform VACUUM ANALYZE on tables, especially if you insert 100M rows (like the score table). So perform at least VACUUM ANALYZE score.
To try to optimize more, we can change the composite index on score(detector_id, keyframe_id) to a composite index on score(detector_id, keyframe_id, score). It may allow PostgreSQL to use an Index Only Scan while calculating the avg value.
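A sketch of that index change (the existing index appears as score_index in the plan above; the new name is arbitrary):
DROP INDEX score_index;
CREATE INDEX score_detector_keyframe_score_idx
    ON score (detector_id, keyframe_id, score);
-- keep the visibility map fresh so index-only scans can actually be used
VACUUM ANALYZE score;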

Is it possible to answer queries on a view before fully materializing the view?

In short: DISTINCT, MIN, and MAX on the left-hand side of a LEFT JOIN should be answerable without doing the join.
I'm using a SQL array type (on Postgres 9.3) to condense several rows of data into a single row, and then a view to return the unnested, normalized view. I do this to save on index costs, as well as to get Postgres to compress the data in the array.
Things work pretty well, but some queries that could be answered without unnesting and materializing/exploding the view are quite expensive because they are deferred till after the view is materialized. Is there any way to solve this?
Here is the basic table:
CREATE TABLE mt_count_by_day
(
run_id integer NOT NULL,
type character varying(64) NOT NULL,
start_day date NOT NULL,
end_day date NOT NULL,
counts bigint[] NOT NULL,
CONSTRAINT mt_count_by_day_pkey PRIMARY KEY (run_id, type)
)
An index on ‘type’ just for good measure:
CREATE INDEX runinfo_mt_count_by_day_type_idx on runinfo.mt_count_by_day (type);
Here is the view that uses generate_series and unnest
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT mt_count_by_day.run_id,
mt_count_by_day.type,
mt_count_by_day.brand,
generate_series(mt_count_by_day.start_day::timestamp without time zone, mt_count_by_day.end_day - '1 day'::interval, '1 day'::interval) AS row_date,
unnest(mt_count_by_day.counts) AS row_count
FROM runinfo.mt_count_by_day;
What if I want to do distinct on the ‘type' column?
explain analyze select distinct(type) from mt_count_by_day;
"HashAggregate (cost=9566.81..9577.28 rows=1047 width=19) (actual time=171.653..172.019 rows=1221 loops=1)"
" -> Seq Scan on mt_count_by_day (cost=0.00..9318.25 rows=99425 width=19) (actual time=0.089..99.110 rows=99425 loops=1)"
"Total runtime: 172.338 ms"
Now what happens if I do the same on the view?
explain analyze select distinct(type) from v_mt_count_by_day;
"HashAggregate (cost=1749752.88..1749763.34 rows=1047 width=19) (actual time=58586.934..58587.191 rows=1221 loops=1)"
" -> Subquery Scan on v_mt_count_by_day (cost=0.00..1501190.38 rows=99425000 width=19) (actual time=0.114..37134.349 rows=68299959 loops=1)"
" -> Seq Scan on mt_count_by_day (cost=0.00..506940.38 rows=99425000 width=597) (actual time=0.113..24907.147 rows=68299959 loops=1)"
"Total runtime: 58587.474 ms"
Is there a way to get postgres to recognize that it can solve this without first exploding the view?
Here we can see for comparison we are counting the number of rows matching criteria in the table vs the view. Everything works as expected. Postgres filters down the rows before materializing the view. Not quite the same, but this property is what makes our data more manageable.
explain analyze select count(*) from mt_count_by_day where type = 'SOCIAL_GOOGLE'
"Aggregate (cost=157.01..157.02 rows=1 width=0) (actual time=0.538..0.538 rows=1 loops=1)"
" -> Bitmap Heap Scan on mt_count_by_day (cost=4.73..156.91 rows=40 width=0) (actual time=0.139..0.509 rows=122 loops=1)"
" Recheck Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
" -> Bitmap Index Scan on runinfo_mt_count_by_day_type_idx (cost=0.00..4.72 rows=40 width=0) (actual time=0.098..0.098 rows=122 loops=1)"
" Index Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
"Total runtime: 0.625 ms"
explain analyze select count(*) from v_mt_count_by_day where type = 'SOCIAL_GOOGLE'
"Aggregate (cost=857.11..857.12 rows=1 width=0) (actual time=6.827..6.827 rows=1 loops=1)"
" -> Bitmap Heap Scan on mt_count_by_day (cost=4.73..357.11 rows=40000 width=597) (actual time=0.124..5.294 rows=15916 loops=1)"
" Recheck Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
" -> Bitmap Index Scan on runinfo_mt_count_by_day_type_idx (cost=0.00..4.72 rows=40 width=0) (actual time=0.082..0.082 rows=122 loops=1)"
" Index Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
"Total runtime: 6.885 ms"
Here is the code required to reproduce this:
CREATE TABLE base_table
(
run_id integer NOT NULL,
type integer NOT NULL,
start_day date NOT NULL,
end_day date NOT NULL,
counts bigint[] NOT NULL,
CONSTRAINT match_check CHECK (end_day > start_day AND (end_day - start_day) = array_length(counts, 1)),
CONSTRAINT base_table_pkey PRIMARY KEY (run_id, type)
);
--Just because...
CREATE INDEX base_type_idx on base_table (type);
CREATE OR REPLACE VIEW v_foo AS
SELECT m.run_id,
m.type,
t.row_date::date,
t.row_count
FROM base_table m
LEFT JOIN LATERAL ROWS FROM (
unnest(m.counts),
generate_series(m.start_day, m.end_day-1, interval '1d')
) t(row_count, row_date) ON true;
insert into base_table
select a.run_id, a.type, '20120101'::date as start_day, '20120401'::date as end_day, b.counts from (SELECT N AS run_id, L as type
FROM
generate_series(1, 10000) N
CROSS JOIN
generate_series(1, 7) L
ORDER BY N, L) a, (SELECT array_agg(generate_series)::bigint[] as counts FROM generate_series(1, 91) ) b
And the results on 9.4.1:
explain analyze select distinct type from base_table;
"HashAggregate (cost=6750.00..6750.03 rows=3 width=4) (actual time=51.939..51.940 rows=3 loops=1)"
" Group Key: type"
" -> Seq Scan on base_table (cost=0.00..6600.00 rows=60000 width=4) (actual time=0.030..33.655 rows=60000 loops=1)"
"Planning time: 0.086 ms"
"Execution time: 51.975 ms"
explain analyze select distinct type from v_foo;
"HashAggregate (cost=1356600.01..1356600.04 rows=3 width=4) (actual time=9215.630..9215.630 rows=3 loops=1)"
" Group Key: m.type"
" -> Nested Loop Left Join (cost=0.01..1206600.01 rows=60000000 width=4) (actual time=0.112..7834.094 rows=5460000 loops=1)"
" -> Seq Scan on base_table m (cost=0.00..6600.00 rows=60000 width=764) (actual time=0.009..42.694 rows=60000 loops=1)"
" -> Function Scan on t (cost=0.01..10.01 rows=1000 width=0) (actual time=0.091..0.111 rows=91 loops=60000)"
"Planning time: 0.132 ms"
"Execution time: 9215.686 ms"
Generally, the Postgres query planner does "inline" views to optimize the whole query. Per documentation:
One application of the rewrite system is in the realization of views.
Whenever a query against a view (i.e., a virtual table) is made, the
rewrite system rewrites the user's query to a query that accesses the
base tables given in the view definition instead.
But I don't think Postgres is smart enough to conclude that it can reach the same result from the base table without exploding rows.
You can try this alternative query with a LATERAL join. It's cleaner:
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT m.run_id, m.type, m.brand
, m.start_day + c.rn - 1 AS row_date
, c.row_count
FROM runinfo.mt_count_by_day m
LEFT JOIN LATERAL unnest(m.counts) WITH ORDINALITY c(row_count, rn) ON true;
It also makes clear that one of (end_day, start_day) is redundant.
Using LEFT JOIN because that might allow the query planner to ignore the join from your query:
SELECT DISTINCT type FROM v_mt_count_by_day;
Else (with a CROSS JOIN or INNER JOIN) it must evaluate the join to see whether rows from the first table are eliminated.
BTW, it's:
SELECT DISTINCT type ...
not:
SELECT DISTINCT(type) ...
Note that this returns a date instead of the timestamp in your original. Easier, and I guess it's what you want anyway?
Requires Postgres 9.3+. Details:
PostgreSQL unnest() with element number
ROWS FROM in Postgres 9.4+
To explode both columns in parallel safely:
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT m.run_id, m.type, m.brand
, t.row_date::date, t.row_count
FROM runinfo.mt_count_by_day m
LEFT JOIN LATERAL ROWS FROM (
unnest(m.counts)
, generate_series(m.start_day, m.end_day, interval '1d')
) t(row_count, row_date) ON true;
The main benefit: This would not derail into a Cartesian product if the two SRF don't return the same number of rows. Instead, NULL values would be padded.
Again, I can't say whether this would help the query planner with a faster plan for DISTINCT type without testing.