I have sensor data in a table keyed by timestamp, with multiple values per row stored in an array. E.g.:
CREATE TABLE test_raw (
ts timestamp without time zone NOT NULL,
"values" real[]
);
INSERT INTO test_raw VALUES
('2020-7-14 00:00:00', ARRAY[1, 10]),
('2020-7-14 00:01:00', ARRAY[2, 20, 30]),
('2020-7-14 00:20:00', ARRAY[3, NULL, 30, 40]),
('2020-7-14 00:23:00', ARRAY[9, NULL, 50, 80]),
('2020-7-14 00:10:00', ARRAY[3, 30, 40]),
('2020-7-14 00:11:00', ARRAY[3, 30, NULL, 50])
;
The array corresponds to different metrics collected by a device, e.g., values[1] might be temperature, values[2] might be humidity, etc. The full schema has additional columns (e.g. device ID) that indicate what the array contains.
I'd now like to create an aggregate/rollup table that has, say, the average over 10 minutes. If values were a scalar and not an array, I'd write the following view (which I'd use to populate the rollup table):
CREATE VIEW test_raw_10m AS
SELECT
floor(extract(epoch FROM ts)/600)*600 as ts,
AVG(value) /* scalar value! */
FROM test_raw
GROUP BY 1; /* group by the bucket expression, not the raw ts column */
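As an aside (not part of the original question): on PostgreSQL 14 or later, date_bin() expresses the same 10-minute bucket directly, yielding a timestamp instead of epoch seconds. A sketch, still assuming the hypothetical scalar column:
SELECT
date_bin('10 minutes', ts, TIMESTAMP '2020-01-01') AS ts_bucket,
AVG(value) /* scalar value! */
FROM test_raw
GROUP BY 1;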
But it's not so simple with a values array. I saw the answer to a very closely related question: Pairwise array sum aggregate function?
This leads me to the following, which seems overly complicated:
WITH test_raw_10m AS (
SELECT floor(extract(epoch FROM ts)/600)*600 as ts, "values"
FROM test_raw
)
SELECT
t.ts,
ARRAY( SELECT
AVG(value) as value
FROM test_raw_10m tt, UNNEST(tt."values") WITH ORDINALITY x(value, rn)
WHERE tt.ts = t.ts
GROUP by x.rn
ORDER by x.rn) AS "values"
FROM test_raw_10m AS t
GROUP BY ts
ORDER by ts
;
My question: Is there a better way to do this?
For completeness, here's the result given the above sample data:
ts | values
------------+----------------
1594684800 | {1.5,15,30}
1594685400 | {3,30,40,50}
1594686000 | {6,NULL,40,60}
(3 rows)
and here's the query plan:
QUERY PLAN
-------------------------------------------------------------------------------------------
Group (cost=119.37..9490.26 rows=200 width=40)
Group Key: t.ts
CTE test_raw_10m
-> Seq Scan on test_raw (cost=0.00..34.00 rows=1200 width=40)
-> Sort (cost=85.37..88.37 rows=1200 width=8)
Sort Key: t.ts
-> CTE Scan on test_raw_10m t (cost=0.00..24.00 rows=1200 width=8)
SubPlan 2
-> Sort (cost=46.57..46.82 rows=100 width=16)
Sort Key: x.rn
-> HashAggregate (cost=42.00..43.25 rows=100 width=16)
Group Key: x.rn
-> Nested Loop (cost=0.00..39.00 rows=600 width=12)
-> CTE Scan on test_raw_10m tt (cost=0.00..27.00 rows=6 width=32)
Filter: (ts = t.ts)
-> Function Scan on unnest x (cost=0.00..1.00 rows=100 width=12)
The following query is significantly faster on my real dataset if I do partial updates by changing FROM test_raw to something like FROM test_raw WHERE ts >= <some timestamp> (in both queries):
SELECT bucket as ts, ARRAY_AGG(v)
FROM (
SELECT to_timestamp(floor(extract(epoch FROM ts)/600)*600) as bucket, AVG("values"[i]) AS v
FROM (SELECT ts, generate_subscripts("values", 1) AS i, "values" FROM test_raw) AS foo
GROUP BY bucket, i
ORDER BY bucket, i
) bar
GROUP BY bucket;
I believe the ORDER BY bucket, i is not necessary, but I'm not sure (see the sketch after the plan below).
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=228027.62..241630.12 rows=200 width=40) (actual time=0.948..1.209 rows=3 loops=1)
Group Key: (to_timestamp((floor((date_part('epoch'::text, foo.ts) / '600'::double precision)) * '600'::double precision)))
-> GroupAggregate (cost=228027.62..241027.62 rows=40000 width=20) (actual time=0.826..1.099 rows=11 loops=1)
Group Key: (to_timestamp((floor((date_part('epoch'::text, foo.ts) / '600'::double precision)) * '600'::double precision))), foo.i
-> Sort (cost=228027.62..231027.62 rows=1200000 width=44) (actual time=0.773..0.870 rows=20 loops=1)
Sort Key: (to_timestamp((floor((date_part('epoch'::text, foo.ts) / '600'::double precision)) * '600'::double precision))), foo.i
Sort Method: quicksort Memory: 19kB
-> Subquery Scan on foo (cost=0.00..33031.00 rows=1200000 width=44) (actual time=0.165..0.619 rows=20 loops=1)
-> ProjectSet (cost=0.00..6031.00 rows=1200000 width=44) (actual time=0.131..0.312 rows=20 loops=1)
-> Seq Scan on test_raw (cost=0.00..22.00 rows=1200 width=40) (actual time=0.034..0.070 rows=6 loops=1)
Planning Time: 0.525 ms
Execution Time: 1.504 ms
(12 rows)
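On the ordering question: array_agg does not guarantee the order of its inputs unless an ORDER BY is written inside the aggregate call, so rather than relying on the subquery's ORDER BY, a safer variant (a sketch, not from the original post) exposes i and makes the element order explicit:
SELECT bucket AS ts, ARRAY_AGG(v ORDER BY i) AS "values"
FROM (
SELECT to_timestamp(floor(extract(epoch FROM ts)/600)*600) AS bucket,
i,
AVG("values"[i]) AS v
FROM (SELECT ts, generate_subscripts("values", 1) AS i, "values" FROM test_raw) AS foo
GROUP BY bucket, i
) bar
GROUP BY bucket;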
I just noticed PostgreSQL behavior (checked on versions 13 and 14) that surprised me. I have a simple table volume with an id and a unique text column name. A second table dir has 3 columns: id, volume_id and path. This table is partitioned by hash on the volume_id column. Here is the full table schema with sample data:
CREATE TABLE dir (
id BIGSERIAL,
volume_id BIGINT,
path TEXT
) PARTITION BY HASH (volume_id);
CREATE TABLE dir_0
PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 0);
CREATE TABLE dir_1
PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 1);
CREATE TABLE dir_2
PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 2);
CREATE TABLE volume(
id BIGINT,
name TEXT UNIQUE
);
INSERT INTO volume (id, name) VALUES (1, 'vol1'), (2, 'vol2'), (3, 'vol3');
INSERT INTO dir (volume_id, path) SELECT i % 3 + 1, 'path_' || i FROM generate_series(1,1000) AS i;
Now, given the volume name, I need to find all the rows from the dir table on that volume. I can do that in 2 different ways.
Query #1
EXPLAIN ANALYZE
SELECT * FROM dir AS d
INNER JOIN volume AS v ON d.volume_id = v.id
WHERE v.name = 'vol1';
Which produces this query plan:
QUERY PLAN
Hash Join (cost=1.05..31.38 rows=333 width=37) (actual time=0.186..0.302 rows=333 loops=1)
Hash Cond: (d.volume_id = v.id)
-> Append (cost=0.00..24.00 rows=1000 width=24) (actual time=0.006..0.154 rows=1000 loops=1)
-> Seq Scan on dir_0 d_1 (cost=0.00..6.34 rows=334 width=24) (actual time=0.006..0.032 rows=334 loops=1)
-> Seq Scan on dir_1 d_2 (cost=0.00..6.33 rows=333 width=24) (actual time=0.006..0.029 rows=333 loops=1)
-> Seq Scan on dir_2 d_3 (cost=0.00..6.33 rows=333 width=24) (actual time=0.004..0.026 rows=333 loops=1)
-> Hash (cost=1.04..1.04 rows=1 width=13) (actual time=0.007..0.007 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on volume v (cost=0.00..1.04 rows=1 width=13) (actual time=0.003..0.004 rows=1 loops=1)
Filter: (name = 'vol1'::text)
Rows Removed by Filter: 2
Planning Time: 0.500 ms
Execution Time: 0.364 ms
As you can see, this query leads to a sequential scan on all 3 partitions of the dir table.
Alternatively, we can write this query like this:
Query #2
EXPLAIN ANALYZE
SELECT * FROM dir AS d
WHERE volume_id = (SELECT id FROM volume AS v WHERE v.name = 'vol1');
In that case we get the following query plan:
QUERY PLAN
Append (cost=1.04..27.54 rows=1000 width=24) (actual time=0.010..0.066 rows=333 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on volume v (cost=0.00..1.04 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=1)
Filter: (name = 'vol1'::text)
Rows Removed by Filter: 2
-> Seq Scan on dir_0 d_1 (cost=0.00..7.17 rows=334 width=24) (never executed)
Filter: (volume_id = $0)
-> Seq Scan on dir_1 d_2 (cost=0.00..7.16 rows=333 width=24) (never executed)
Filter: (volume_id = $0)
-> Seq Scan on dir_2 d_3 (cost=0.00..7.16 rows=333 width=24) (actual time=0.004..0.037 rows=333 loops=1)
Filter: (volume_id = $0)
Planning Time: 0.063 ms
Execution Time: 0.093 ms
Here we can see that the scans of partitions dir_0 and dir_1 carry the never executed annotation.
View on DB Fiddle
My question is:
Why in the first case is there no partition pruning? Postgres already knows that the volume.name column is unique and that it will translate into a single volume_id. I would like to get a good intuition on when partition pruning can happen during query execution.
To get partition pruning with a hash join, you'd need to add a condition on d.volume_id to your query. No inference is made from the join with volume.
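For example (a sketch, not part of the original answer), repeating the lookup as a direct condition on the partition key keeps the join and gives the executor an init-plan parameter it can prune on, just as in your second query:
EXPLAIN ANALYZE
SELECT * FROM dir AS d
INNER JOIN volume AS v ON d.volume_id = v.id
WHERE v.name = 'vol1'
  AND d.volume_id = (SELECT id FROM volume WHERE name = 'vol1');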
Your second query shows partition pruning; the "never executed" means that the query executor pruned the scan of certain partitions.
An alternative method that forces a nested loop join and prunes partitions would be:
SELECT *
FROM volume AS v
CROSS JOIN LATERAL (SELECT * FROM dir
WHERE dir.volume_id = v.id
OFFSET 0) AS d
WHERE v.name = 'vol1';
OFFSET 0 prevents anything except a nested loop join. Unlike your query, this also works if volume.name is not unique.
In some cases, PostgreSQL does not filter out window function partitions until they are calculated, while in a very similar scenario PostgreSQL filters rows before performing the window function calculation.
Tables used for a minimal reproduction: log is the main data table; each row contains either an incremental or an absolute value. An absolute value resets the current counter to a new base value (a worked example follows the aggregate definition below). The window function needs to process all log rows for a given account_id to calculate the correct running total. The view uses a subquery to ensure that the underlying log rows are not filtered by ts; otherwise this would break the window function.
CREATE TABLE account(
id serial,
name VARCHAR(100)
);
CREATE TABLE log(
id serial,
absolute int,
incremental int,
account_id int,
ts timestamp,
PRIMARY KEY(id),
CONSTRAINT fk_account
FOREIGN KEY(account_id)
REFERENCES account(id)
);
CREATE FUNCTION get_running_total_func(
aggregated_total int,
absolute int,
incremental int
) RETURNS int
LANGUAGE sql IMMUTABLE CALLED ON NULL INPUT AS
$$
SELECT
CASE
WHEN absolute IS NOT NULL THEN absolute
ELSE COALESCE(aggregated_total, 0) + incremental
END
$$;
CREATE AGGREGATE get_running_total(integer, integer) (
sfunc = get_running_total_func,
stype = integer
);
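To illustrate the intended semantics on a few hypothetical rows (an added example, not from the original post): an absolute value resets the total, an incremental value adds to it.
SELECT ts,
       get_running_total(absolute, incremental)
         OVER (ORDER BY ts RANGE UNBOUNDED PRECEDING) AS running_value
FROM (VALUES
        ('2021-01-01 00:00'::timestamp, NULL::int, 5),
        ('2021-01-01 00:01', NULL, 3),
        ('2021-01-01 00:02', 100, NULL),
        ('2021-01-01 00:03', NULL, 2)
     ) AS sample(ts, absolute, incremental);
-- expected running_value: 5, 8, 100, 102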
Slow view:
CREATE VIEW test_view
(
log_id,
running_value,
account_id,
ts
)
AS
SELECT log_running.* FROM
(SELECT
log.id,
get_running_total(
log.absolute,
log.incremental
)
OVER(
PARTITION BY log.account_id
ORDER BY log.ts RANGE UNBOUNDED PRECEDING
),
account.id,
ts
FROM log log JOIN account account ON log.account_id=account.id
) AS log_running;
CREATE VIEW
postgres=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE account_id=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on log_running (cost=12734.02..15981.48 rows=1 width=20) (actual time=7510.851..16122.404 rows=20 loops=1)
Filter: (log_running.id_1 = 1)
Rows Removed by Filter: 99902
-> WindowAgg (cost=12734.02..14732.46 rows=99922 width=32) (actual time=7510.830..14438.783 rows=99922 loops=1)
-> Sort (cost=12734.02..12983.82 rows=99922 width=28) (actual time=7510.628..9312.399 rows=99922 loops=1)
Sort Key: log.account_id, log.ts
Sort Method: external merge Disk: 3328kB
-> Hash Join (cost=143.50..2042.24 rows=99922 width=28) (actual time=169.941..5431.650 rows=99922 loops=1)
Hash Cond: (log.account_id = account.id)
-> Seq Scan on log (cost=0.00..1636.22 rows=99922 width=24) (actual time=0.063..1697.802 rows=99922 loops=1)
-> Hash (cost=81.00..81.00 rows=5000 width=4) (actual time=169.837..169.865 rows=5000 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 240kB
-> Seq Scan on account (cost=0.00..81.00 rows=5000 width=4) (actual time=0.017..84.639 rows=5000 loops=1)
Planning Time: 0.199 ms
Execution Time: 16127.275 ms
(15 rows)
Fast view - the only change is account.id -> log.account_id (!):
CREATE VIEW test_view
(
log_id,
running_value,
account_id,
ts
)
AS
SELECT log_running.* FROM
(SELECT
log.id,
get_running_total(
log.absolute,
log.incremental
)
OVER(
PARTITION BY log.account_id
ORDER BY log.ts RANGE UNBOUNDED PRECEDING
),
log.account_id,
ts
FROM log log JOIN account account ON log.account_id=account.id
) AS log_running;
CREATE VIEW
postgres=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE account_id=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on log_running (cost=1894.96..1895.56 rows=20 width=20) (actual time=34.718..45.958 rows=20 loops=1)
-> WindowAgg (cost=1894.96..1895.36 rows=20 width=28) (actual time=34.691..45.307 rows=20 loops=1)
-> Sort (cost=1894.96..1895.01 rows=20 width=24) (actual time=34.367..35.925 rows=20 loops=1)
Sort Key: log.ts
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.28..1894.53 rows=20 width=24) (actual time=0.542..34.066 rows=20 loops=1)
-> Index Only Scan using account_pkey on account (cost=0.28..8.30 rows=1 width=4) (actual time=0.025..0.054 rows=1 loops=1)
Index Cond: (id = 1)
Heap Fetches: 1
-> Seq Scan on log (cost=0.00..1886.03 rows=20 width=24) (actual time=0.195..32.937 rows=20 loops=1)
Filter: (account_id = 1)
Rows Removed by Filter: 99902
Planning Time: 0.297 ms
Execution Time: 47.300 ms
(14 rows)
Is this a bug in the PostgreSQL implementation? It seems that this change in the view definition shouldn't affect performance at all; PostgreSQL should be able to filter the data before applying the window function to the whole data set.
My app employs a multilevel hierarchical structure. There are many PL/pgSQL functions in the app that use the same type of selection: "select entities according to a list and all their child entities". I created a recursive view trying to avoid redundancy. The problem is, if I understand correctly, that PostgreSQL (12.3, compiled by Visual C++ build 1914, 64-bit) selects all entities first and only then filters the records.
Here is a simplified example.
drop view if exists v;
drop table if exists t;
create table t
(
id int primary key,
parent_id int
);
insert into t (id, parent_id)
select s, (s - 1) * random()
from generate_series(1, 100000) as s;
create recursive view v (start_id, id, pid) as
select id, id, parent_id
from t
union all
select v.start_id, t.id, t.parent_id
from t
inner join v on v.id = t.parent_id;
explain (analyze)
select *
from v
where start_id = 10
order by start_id, id;
explain (analyze)
select *
from v
where start_id in (10, 11, 12, 20, 100)
order by start_id, id;
Is there a better solution? Any help is greatly appreciated.
Here is the query plan I got on my computer:
Sort (actual time=3809.581..3812.541 rows=29652 loops=1)
" Sort Key: v.start_id, v.id"
Sort Method: quicksort Memory: 2158kB
-> CTE Scan on v (actual time=0.044..3795.424 rows=29652 loops=1)
" Filter: (start_id = ANY ('{10,11,12,20,100}'::integer[]))"
Rows Removed by Filter: 1069171
CTE v
-> Recursive Union (actual time=0.028..3411.325 rows=1098823 loops=1)
-> Seq Scan on t (actual time=0.025..19.465 rows=100000 loops=1)
-> Merge Join (actual time=74.631..127.916 rows=41618 loops=24)
Merge Cond: (t_1.parent_id = v_1.id)
-> Sort (actual time=46.021..59.589 rows=99997 loops=24)
Sort Key: t_1.parent_id
Sort Method: external merge Disk: 1768kB
-> Seq Scan on t t_1 (actual time=0.016..11.797 rows=100000 loops=24)
-> Materialize (actual time=23.542..42.088 rows=65212 loops=24)
-> Sort (actual time=23.188..29.740 rows=45385 loops=24)
Sort Key: v_1.id
Sort Method: quicksort Memory: 25kB
-> WorkTable Scan on v v_1 (actual time=0.017..7.412 rows=45784 loops=24)
Planning Time: 0.260 ms
Execution Time: 3819.152 ms
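For reference, a sketch of the usual workaround (an assumption about the goal, not part of the original post): seed the recursion with the wanted ids instead of enumerating every subtree and filtering afterwards, so only the requested subtrees are walked:
with recursive sub as (
    select id as start_id, id, parent_id as pid
    from t
    where id in (10, 11, 12, 20, 100)
    union all
    select sub.start_id, t.id, t.parent_id
    from t
    join sub on t.parent_id = sub.id
)
select *
from sub
order by start_id, id;
To keep this reusable across the PL/pgSQL functions, the same query can be wrapped in a set-returning SQL function that takes the id list as an int[] parameter.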
I'm working with a table where each row has a timestamp, and that timestamp is unique for a given set of other column values:
CREATE TEMPORARY TABLE time_series (
id SERIAL PRIMARY KEY,
created TIMESTAMP WITH TIME ZONE NOT NULL,
category TEXT,
value INT
);
CREATE UNIQUE INDEX ON time_series (created, category);
INSERT INTO time_series (created, category, value)
VALUES ('2000-01-01 00:00:00Z', 'foo', 1),
('2000-01-01 06:00:00Z', 'bar', 5),
('2000-01-01 12:00:00Z', 'bar', 5),
('2000-01-02 00:00:00Z', 'bar', 5),
('2000-01-02 12:34:45Z', 'bar', 2),
('2000-01-03 00:00:00Z', 'bar', 3),
('2000-01-04 00:00:00Z', 'bar', 3),
('2000-01-04 11:11:11Z', 'foo', 4),
('2000-01-04 22:22:22Z', 'bar', 5),
('2000-01-04 23:23:23Z', 'bar', 4),
('2000-01-05 00:00:00Z', 'foo', 1),
('2000-01-05 23:23:23Z', 'bar', 4);
The timestamps are not spaced uniformly. My task, given an arbitrary start and end datetime, is to get the entries between those datetimes and the entries immediately before and after that range. Basically, how do I simplify this query:
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created DESC
LIMIT 1)
UNION
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-02 06:00:00Z'
AND created < '2000-01-04 12:00:00Z')
UNION
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created
LIMIT 1)
ORDER BY created;
created value
2000-01-02 00:00:00+00 5
2000-01-02 12:34:45+00 2
2000-01-03 00:00:00+00 3
2000-01-04 00:00:00+00 3
2000-01-04 22:22:22+00 5
The use case is getting the data points to display a graph: I know the datetimes of the left and right edges of the graph, but they will not in general align exactly with created datetimes, so in order to display a graph all the way to the edge I need a data point to either side of the range.
Fiddle
Non-solutions:
I can not simply select the whole range, because it might be huge.
I can not select some arbitrarily long period outside of the given range, because that data set might again be huge or whichever period I select might not be enough to get the next readings.
EDITED:
You can combine UNION ALL with ORDER BY, LIMIT and bounding WHERE clauses.
Something like this:
APPROACH 1:
SELECT created,
value
FROM (SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created DESC LIMIT 1
) AS lb
UNION ALL
SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-02 06:00:00Z'
AND created < '2000-01-04 12:00:00Z'
UNION ALL
SELECT created,
value
FROM (SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created ASC LIMIT 1
) AS ub
ORDER BY 1;
EXPLAIN ANALYZE from approach 1:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3.60..3.61 rows=3 width=12) (actual time=0.228..0.237 rows=5 loops=1)
Sort Key: time_series.created
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=3.55..3.58 rows=3 width=12) (actual time=0.182..0.195 rows=5 loops=1)
Group Key: time_series.created, time_series.value
-> Append (cost=1.16..3.53 rows=3 width=12) (actual time=0.073..0.163 rows=5 loops=1)
-> Limit (cost=1.16..1.16 rows=1 width=12) (actual time=0.070..0.073 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=12) (actual time=0.065..0.067 rows=1 loops=1)
Sort Key: time_series.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series (cost=0.00..1.15 rows=1 width=12) (actual time=0.026..0.035 rows=2 loops=1)
Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
-> Seq Scan on time_series time_series_1 (cost=0.00..1.18 rows=1 width=12) (actual time=0.007..0.016 rows=3 loops=1)
Filter: ((created >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (created < '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 7
-> Limit (cost=1.16..1.16 rows=1 width=12) (actual time=0.051..0.054 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=12) (actual time=0.047..0.049 rows=1 loops=1)
Sort Key: time_series_2.created
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_2 (cost=0.00..1.15 rows=1 width=12) (actual time=0.009..0.016 rows=2 loops=1)
Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
Planning time: 0.388 ms
Execution time: 0.438 ms
(25 rows)
Another similar approach can be used.
APPROACH 2:
SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= (SELECT created
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created DESC LIMIT 1)
AND created <= (SELECT created
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created ASC LIMIT 1
);
EXPLAIN ANALYZE from approach 2:
--------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on time_series (cost=2.33..3.50 rows=1 width=12) (actual time=0.143..0.157 rows=6 loops=1)
Filter: ((created >= $0) AND (created < $1) AND (category = 'bar'::text))
Rows Removed by Filter: 4
InitPlan 1 (returns $0)
-> Limit (cost=1.16..1.16 rows=1 width=8) (actual time=0.066..0.069 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=8) (actual time=0.061..0.062 rows=1 loops=1)
Sort Key: time_series_1.created
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_1 (cost=0.00..1.15 rows=1 width=8) (actual time=0.008..0.015 rows=2 loops=1)
Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
InitPlan 2 (returns $1)
-> Limit (cost=1.16..1.16 rows=1 width=8) (actual time=0.041..0.044 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=8) (actual time=0.038..0.039 rows=1 loops=1)
Sort Key: time_series_2.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_2 (cost=0.00..1.15 rows=1 width=8) (actual time=0.007..0.013 rows=2 loops=1)
Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
Planning time: 0.392 ms
Execution time: 0.288 ms
Since the bounds are computed with LIMIT 1, the query will run fast.
APPROACH 3:
WITH a as (
SELECT created,
value,
lag(created, 1) OVER (ORDER BY created desc) AS ub,
lag(created, -1) OVER (ORDER BY created desc) AS lb
FROM time_series
WHERE category = 'bar'
) SELECT created,
value
FROM a
WHERE ub>='2000-01-02 06:00:00Z'
AND lb<'2000-01-04 12:00:00Z'
ORDER BY created
EXPLAIN ANALYZE from approach 3:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1.19..1.20 rows=1 width=12) (actual time=0.174..0.181 rows=5 loops=1)
Sort Key: a.created
Sort Method: quicksort Memory: 25kB
CTE a
-> WindowAgg (cost=1.14..1.16 rows=1 width=28) (actual time=0.075..0.107 rows=7 loops=1)
-> Sort (cost=1.14..1.14 rows=1 width=12) (actual time=0.056..0.067 rows=7 loops=1)
Sort Key: time_series.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series (cost=0.00..1.12 rows=1 width=12) (actual time=0.018..0.030 rows=7 loops=1)
Filter: (category = 'bar'::text)
Rows Removed by Filter: 3
-> CTE Scan on a (cost=0.00..0.03 rows=1 width=12) (actual time=0.088..0.131 rows=5 loops=1)
Filter: ((ub >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (lb < '2000-01-04 12:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 2
Planning time: 0.175 ms
Execution time: 0.247 ms
(16 rows)
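On a real, large table (the stated concern), all three approaches filter on category and order or range-scan on created, so an index shaped like the following (an addition, not in the original post) would let the LIMIT 1 probes and the in-range scan avoid sequential scans:
CREATE INDEX ON time_series (category, created);
With it, each bounding subquery becomes a single index descent (backward for the "immediately before" row), and the middle part of the range becomes an index range scan; the existing unique index on (created, category) cannot serve these lookups as efficiently because created comes first.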
I'm trying to take advantage of partitioning in one case:
I have a table "events" which is partitioned by list on the column "dt_pk", which is a foreign key to the table "dates".
-- Schema
drop schema if exists test cascade;
create schema test;
-- Tables
create table if not exists test.dates (
id bigint primary key,
dt date not null
);
create sequence test.seq_events_id;
create table if not exists test.events
(
id bigint not null,
dt_pk bigint not null,
content_int bigint,
foreign key (dt_pk) references test.dates(id) on delete cascade,
primary key (dt_pk, id)
)
partition by list (dt_pk);
-- Partitions
create table test.events_1 partition of test.events for values in (1);
create table test.events_2 partition of test.events for values in (2);
create table test.events_3 partition of test.events for values in (3);
-- Fill tables
insert into test.dates (id, dt)
select id, dt
from (
select 1 id, '2020-01-01'::date as dt
union all
select 2 id, '2020-01-02'::date as dt
union all
select 3 id, '2020-01-03'::date as dt
) t;
do $$
declare
dts record;
begin
for dts in (
select id
from test.dates
) loop
for k in 1..10000 loop
insert into test.events (id, dt_pk, content_int)
values (nextval('test.seq_events_id'), dts.id, (1 + floor(random() * 1000000))::bigint); -- random value between 1 and 1,000,000
end loop;
commit;
end loop;
end;
$$;
vacuum analyze test.dates, test.events;
I want to run select like this:
select *
from test.events e
join test.dates d on e.dt_pk = d.id
where d.dt between '2020-01-02'::date and '2020-01-03'::date;
But in this case partition pruning doesn't work. That's clear: I don't have a constant for the partition key. But from the documentation I know that there is partition pruning at execution time, which works with a value obtained from a subquery:
Partition pruning can be performed not only during the planning of a
given query, but also during its execution. This is useful as it can
allow more partitions to be pruned when clauses contain expressions
whose values are not known at query planning time, for example,
parameters defined in a PREPARE statement, using a value obtained from
a subquery, or using a parameterized value on the inner side of a
nested loop join.
So I rewrote my query like this, expecting partition pruning:
select *
from test.events e
where e.dt_pk in (
select d.id
from test.dates d
where d.dt between '2020-01-02'::date and '2020-01-03'::date
);
But explain for this select says:
Hash Join (cost=1.07..833.07 rows=20000 width=24) (actual time=3.581..15.989 rows=20000 loops=1)
Hash Cond: (e.dt_pk = d.id)
-> Append (cost=0.00..642.00 rows=30000 width=24) (actual time=0.005..6.361 rows=30000 loops=1)
-> Seq Scan on events_1 e (cost=0.00..164.00 rows=10000 width=24) (actual time=0.005..1.104 rows=10000 loops=1)
-> Seq Scan on events_2 e_1 (cost=0.00..164.00 rows=10000 width=24) (actual time=0.005..1.127 rows=10000 loops=1)
-> Seq Scan on events_3 e_2 (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.097 rows=10000 loops=1)
-> Hash (cost=1.04..1.04 rows=2 width=8) (actual time=0.006..0.006 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on dates d (cost=0.00..1.04 rows=2 width=8) (actual time=0.004..0.004 rows=2 loops=1)
Filter: ((dt >= '2020-01-02'::date) AND (dt <= '2020-01-03'::date))
Rows Removed by Filter: 1
Planning Time: 0.206 ms
Execution Time: 17.237 ms
So we read all partitions. I even tried to force the planner to use a nested loop join, because I read in the documentation about a "parameterized value on the inner side of a nested loop join", but it didn't work:
set enable_hashjoin to off;
set enable_mergejoin to off;
And again:
Nested Loop (cost=0.00..1443.05 rows=20000 width=24) (actual time=9.160..25.252 rows=20000 loops=1)
Join Filter: (e.dt_pk = d.id)
Rows Removed by Join Filter: 30000
-> Append (cost=0.00..642.00 rows=30000 width=24) (actual time=0.008..6.280 rows=30000 loops=1)
-> Seq Scan on events_1 e (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.105 rows=10000 loops=1)
-> Seq Scan on events_2 e_1 (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.047 rows=10000 loops=1)
-> Seq Scan on events_3 e_2 (cost=0.00..164.00 rows=10000 width=24) (actual time=0.007..1.082 rows=10000 loops=1)
-> Materialize (cost=0.00..1.05 rows=2 width=8) (actual time=0.000..0.000 rows=2 loops=30000)
-> Seq Scan on dates d (cost=0.00..1.04 rows=2 width=8) (actual time=0.004..0.004 rows=2 loops=1)
Filter: ((dt >= '2020-01-02'::date) AND (dt <= '2020-01-03'::date))
Rows Removed by Filter: 1
Planning Time: 0.202 ms
Execution Time: 26.516 ms
Then I noticed that every example of "partition pruning at execution time" I have seen uses only an = condition, not IN.
And it really works that way:
explain (analyze) select * from test.events e where e.dt_pk = (select id from test.dates where id = 2);
Append (cost=1.04..718.04 rows=30000 width=24) (actual time=0.014..3.018 rows=10000 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on dates (cost=0.00..1.04 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
Filter: (id = 2)
Rows Removed by Filter: 2
-> Seq Scan on events_1 e (cost=0.00..189.00 rows=10000 width=24) (never executed)
Filter: (dt_pk = $0)
-> Seq Scan on events_2 e_1 (cost=0.00..189.00 rows=10000 width=24) (actual time=0.004..2.009 rows=10000 loops=1)
Filter: (dt_pk = $0)
-> Seq Scan on events_3 e_2 (cost=0.00..189.00 rows=10000 width=24) (never executed)
Filter: (dt_pk = $0)
Planning Time: 0.135 ms
Execution Time: 3.639 ms
And here is my final question: does partition pruning at execution time work only with a subquery returning one item, or is there a way to get the advantages of partition pruning with a subquery returning a list?
And why doesn't it work with a nested loop join? Did I misunderstand these words:
This includes values from subqueries and values from execution-time
parameters such as those from parameterized nested loop joins.
Or "parameterized nested loop joins" is something different from regular nested loop joins?
There is no partition pruning in your nested loop join because the partitioned table is on the outer side, which is always scanned completely. The inner side is scanned with the join key from the outer side as parameter (hence parameterized scan), so if the partitioned table were on the inner side of the nested loop join, partition pruning could happen.
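The LATERAL ... OFFSET 0 rewrite shown in the earlier answer should therefore work here as well (a sketch for this schema, not verified): it forces a nested loop with test.events on the inner, parameterized side, where runtime pruning can happen:
select d.dt, e.*
from test.dates d
cross join lateral (
    select *
    from test.events
    where dt_pk = d.id
    offset 0
) as e
where d.dt between '2020-01-02'::date and '2020-01-03'::date;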
Partition pruning with IN lists can take place if the list values are known at plan time:
EXPLAIN (COSTS OFF)
SELECT * FROM test.events WHERE dt_pk IN (1, 2);
QUERY PLAN
---------------------------------------------------
Append
-> Seq Scan on events_1
Filter: (dt_pk = ANY ('{1,2}'::bigint[]))
-> Seq Scan on events_2
Filter: (dt_pk = ANY ('{1,2}'::bigint[]))
(5 rows)
But no attempts are made to flatten a subquery, and PostgreSQL doesn't use partition pruning, even if you force the partitioned table to be on the inner side (enable_material = off, enable_hashjoin = off, enable_mergejoin = off):
EXPLAIN (ANALYZE)
SELECT * FROM test.events WHERE dt_pk IN (SELECT 1 UNION SELECT 2);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.06..2034.09 rows=20000 width=24) (actual time=0.057..15.523 rows=20000 loops=1)
Join Filter: (events_1.dt_pk = (1))
Rows Removed by Join Filter: 40000
-> Unique (cost=0.06..0.07 rows=2 width=4) (actual time=0.026..0.029 rows=2 loops=1)
-> Sort (cost=0.06..0.07 rows=2 width=4) (actual time=0.024..0.025 rows=2 loops=1)
Sort Key: (1)
Sort Method: quicksort Memory: 25kB
-> Append (cost=0.00..0.05 rows=2 width=4) (actual time=0.006..0.009 rows=2 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
-> Append (cost=0.00..642.00 rows=30000 width=24) (actual time=0.012..4.334 rows=30000 loops=2)
-> Seq Scan on events_1 (cost=0.00..164.00 rows=10000 width=24) (actual time=0.011..1.057 rows=10000 loops=2)
-> Seq Scan on events_2 (cost=0.00..164.00 rows=10000 width=24) (actual time=0.004..0.641 rows=10000 loops=2)
-> Seq Scan on events_3 (cost=0.00..164.00 rows=10000 width=24) (actual time=0.002..0.594 rows=10000 loops=2)
Planning Time: 0.531 ms
Execution Time: 16.567 ms
(16 rows)
I am not certain, but it may be because the tables are so small. You might want to try with bigger tables.
If you care more about getting it working than about the fine details, and you haven't tried this yet: you can rewrite the query to something like
explain analyze select *
from test.dates d
join test.events e on e.dt_pk = d.id
where
d.dt between '2020-01-02'::date and '2020-01-03'::date
and e.dt_pk in (extract(day from '2020-01-02'::date)::int,
extract(day from '2020-01-03'::date)::int);
which will give the expected pruning. (Note that this relies on the sample data, where dates.id happens to equal the day of the month.)