We have a large table in Redshift that stores our AWS billing files for querying. We use interleaved sort keys because not all queries filter on the same columns.
Running this query
select "column", type, encoding, distkey, sortkey, "notnull"
from pg_table_def
where tablename = 'accountbillingflat'
and sortkey <> 0
order by sortkey;
gives this result:
Now if we execute the following query
SELECT
payeraccountid,
linkedaccountid,
billingmonth,
updatedon
FROM
accountbillingflat
WHERE
linkedaccountid IN (
'123'
)
AND billingmonth IN (
201603
)
GROUP BY
payeraccountid,
linkedaccountid,
billingmonth,
updatedon
ORDER BY
billingmonth desc,
updatedon desc
The plan for query execution is
XN Merge (cost=1000002059236.65..1000002059237.77 rows=445 width=44)
-> XN Network (cost=1000002059236.65..1000002059237.77 rows=445 width=44)
-> XN Sort (cost=1000002059236.65..1000002059237.77 rows=445 width=44)
-> XN HashAggregate (cost=2059217.08..2059217.08 rows=445 width=44)
-> XN Seq Scan on accountbillingflat (cost=0.00..2045174.64 rows=1404244 width=44)
The actual execution is
Looking into the details of the sequential scan, we see
which is strange, as both linkedaccountid and billingmonth are sort keys.
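One thing worth checking (a sketch of a diagnostic, not a confirmed diagnosis): interleaved sort keys degrade as new rows are appended, and a plain VACUUM does not restore them. The interleaved_skew column of svv_interleaved_columns indicates how unbalanced the interleaving has become, and VACUUM REINDEX rebuilds it:

```sql
-- Check skew of the interleaved sort key; values much larger than 1.0
-- suggest a reindex is needed. The table id comes from stv_tbl_perm.
SELECT tbl, col, interleaved_skew, last_reindex
FROM svv_interleaved_columns
WHERE tbl = (SELECT DISTINCT id FROM stv_tbl_perm
             WHERE name = 'accountbillingflat');

-- Re-sort and re-balance the interleaved sort key
-- (this can take a long time on large tables).
VACUUM REINDEX accountbillingflat;
```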
I have four tables in Postgres (about 50 million rows each). I would like to join the tables in pairs (via merge joins) and then combine the results, sorted by a column.
The EXPLAIN query is as follows:
EXPLAIN (
SELECT i23.*, s23.score
FROM bp7_dataprep.id_23mer i23 JOIN bp7_dataprep.score_23mer s23 ON i23.seq = s23.seq
UNION
SELECT i9.*, s9.score
FROM bp7_dataprep.id_9mer i9 JOIN bp7_dataprep.score_9mer s9 ON i9.seq = s9.seq
ORDER BY id
) ;
EXPLAIN shows the output as:
Sort (cost=183017409154.40..183633517319.38 rows=246443265992 width=72)
Sort Key: i23.id
-> Unique (cost=93490775307.52..95955207967.44 rows=246443265992 width=72)
-> Sort (cost=93490775307.52..94106883472.50 rows=246443265992 width=72)
Sort Key: i23.seq, i23.id, s23.score
-> Append (cost=183991975.33..6428574120.56 rows=246443265992 width=72)
-> Gather (cost=183991975.33..248110189.48 rows=563676608 width=65)
Workers Planned: 2
-> Merge Join (cost=183990975.33..191741528.68 rows=234865253 width=65)
Merge Cond: ((i23.seq)::text = (s23.seq)::text)
-> Sort (cost=59046049.02..59633212.15 rows=234865253 width=57)
Sort Key: i23.seq
-> Parallel Seq Scan on id_23mer i23 (cost=0.00..8730486.53 rows=234865253 width=57)
-> Materialize (cost=124944926.31..127763309.35 rows=563676608 width=40)
-> Sort (cost=124944926.31..126354117.83 rows=563676608 width=40)
Sort Key: s23.seq
-> Seq Scan on score_23mer s23 (cost=0.00..12187639.08 rows=563676608 width=40)
-> Merge Join (cost=1.14..3716031271.15 rows=245879589384 width=51)
Merge Cond: ((s9.seq)::text = (i9.seq)::text)
-> Index Scan using score_9mer_seq_idx on score_9mer s9 (cost=0.57..12329705.30 rows=220649248 width=18)
-> Materialize (cost=0.57..15507927.70 rows=220649248 width=43)
-> Index Scan using id_9mer_seq_idx on id_9mer i9 (cost=0.57..14956304.58 rows=220649248 width=43)
The above query takes too long to run. Any suggestions to improve it?
DDL of four tables are as follows:
CREATE TABLE
id_23mer
(
seq CHARACTER VARYING NOT NULL,
id CHARACTER VARYING NOT NULL
);
CREATE TABLE
id_9mer
(
seq CHARACTER VARYING NOT NULL,
id CHARACTER VARYING NOT NULL
);
CREATE TABLE
score_23mer
(
seq CHARACTER VARYING NOT NULL,
score DOUBLE PRECISION
);
CREATE TABLE
score_9mer
(
seq CHARACTER VARYING NOT NULL,
score DOUBLE PRECISION
);
Note: I have an index on the seq column of each table.
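One observation from the plan above: most of the estimated cost sits in the Sort and Unique nodes that UNION adds to deduplicate roughly 246 billion estimated rows. If duplicates across the two branches are impossible by construction, or acceptable, UNION ALL avoids that sort entirely (a sketch, not a tested rewrite):

```sql
-- UNION ALL skips the deduplication sort that dominates the plan.
-- Only valid if rows duplicated across the two branches are acceptable.
SELECT i23.*, s23.score
FROM bp7_dataprep.id_23mer i23
JOIN bp7_dataprep.score_23mer s23 ON i23.seq = s23.seq
UNION ALL
SELECT i9.*, s9.score
FROM bp7_dataprep.id_9mer i9
JOIN bp7_dataprep.score_9mer s9 ON i9.seq = s9.seq
ORDER BY id;
```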
I have two tables in a Postgres 11 database:
client table
--------
client_id integer
client_name character_varying
file table
--------
file_id integer
client_id integer
file_name character_varying
The client table is not partitioned, the file table is partitioned by client_id (partition by list). When a new client is inserted into the client table, a trigger creates a new partition for the file table.
The file table has a foreign key constraint referencing the client table on client_id.
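For reference, a minimal sketch of the schema as described, using declarative list partitioning (partition names and values here are hypothetical):

```sql
CREATE TABLE client (
    client_id   integer PRIMARY KEY,
    client_name character varying
);

CREATE TABLE file (
    file_id   integer,
    client_id integer REFERENCES client (client_id),
    file_name character varying
) PARTITION BY LIST (client_id);

-- One partition per client, e.g. created by the trigger for client_id = 1:
CREATE TABLE file_p1 PARTITION OF file FOR VALUES IN (1);
CREATE TABLE file_pdefault PARTITION OF file DEFAULT;
```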
When I execute this SQL (where c.client_id = 1), everything seems fine:
explain
select *
from client c
join file f using (client_id)
where c.client_id = 1;
Partition pruning is used, only the partition file_p1 is scanned:
Nested Loop (cost=0.00..3685.05 rows=100001 width=82)
-> Seq Scan on client c (cost=0.00..1.02 rows=1 width=29)
Filter: (client_id = 1)
-> Append (cost=0.00..2684.02 rows=100001 width=57)
-> Seq Scan on file_p1 f (cost=0.00..2184.01 rows=100001 width=57)
Filter: (client_id = 1)
But when I use a where clause like "where c.client_name = 'test'", the database scans all partitions and does not recognize that client_name 'test' corresponds to client_id 1:
explain
select *
from client c
join file f using (client_id)
where c.client_name = 'test';
Execution plan:
Hash Join (cost=1.04..6507.57 rows=100001 width=82)
Hash Cond: (f.client_id = c.client_id)
-> Append (cost=0.00..4869.02 rows=200002 width=57)
-> Seq Scan on file_p1 f (cost=0.00..1934.01 rows=100001 width=57)
-> Seq Scan on file_p4 f_1 (cost=0.00..1934.00 rows=100000 width=57)
-> Seq Scan on file_pdefault f_2 (cost=0.00..1.00 rows=1 width=556)
-> Hash (cost=1.02..1.02 rows=1 width=29)
-> Seq Scan on client c (cost=0.00..1.02 rows=1 width=29)
Filter: ((name)::text = 'test'::text)
So for this SQL, all partitions of the file table are scanned.
So must every select filter on the column the table is partitioned by? Is the database not able to derive the partition pruning criteria?
Edit:
To add some information:
In the past, I have been working with Oracle databases most of the time.
The execution plan there would be something like
Do a full table scan on the client table with the client name to find out the client_id.
Do a "PARTITION LIST" access to the file table, where SQL Developer states PARTITION_START = KEY and PARTITION_STOP = KEY to indicate that the exact partition is not known when the execution plan is calculated, but that access will be restricted to a list of partitions, determined by the client_id found in the client table.
This is what I would have expected in Postgresql as well.
The documentation states that dynamic partition pruning is possible
(...) During actual execution of the query plan. Partition pruning may also be performed here to remove partitions using values which are only known during actual query execution. This includes values from subqueries and values from execution-time parameters such as those from parameterized nested loop joins.
If I understand it correctly, it applies to prepared statements or to queries with subqueries that provide the partition key value as a parameter. Use EXPLAIN ANALYZE to see dynamic pruning (my sample data contains a million rows in three partitions):
explain analyze
select *
from file
where client_id = (
select client_id
from client
where client_name = 'test');
Append (cost=25.88..22931.88 rows=1000000 width=14) (actual time=0.091..96.139 rows=333333 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on client (cost=0.00..25.88 rows=6 width=4) (actual time=0.040..0.042 rows=1 loops=1)
Filter: (client_name = 'test'::text)
Rows Removed by Filter: 2
-> Seq Scan on file_p1 (cost=0.00..5968.66 rows=333333 width=14) (actual time=0.039..70.026 rows=333333 loops=1)
Filter: (client_id = $0)
-> Seq Scan on file_p2 (cost=0.00..5968.68 rows=333334 width=14) (never executed)
Filter: (client_id = $0)
-> Seq Scan on file_p3 (cost=0.00..5968.66 rows=333333 width=14) (never executed)
Filter: (client_id = $0)
Planning Time: 0.423 ms
Execution Time: 109.189 ms
Note that scans for partitions p2 and p3 were never executed.
To answer your exact question: the partition pruning in queries with joins, as described in the question, is not implemented in Postgres (yet?).
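If the application can afford two round trips, a workaround (a sketch, assuming the client_name lookup returns exactly one id) is to resolve the id first and then filter the partitioned table by a literal or parameter, which allows plan-time pruning:

```sql
-- Step 1: resolve the name to an id.
SELECT client_id FROM client WHERE client_name = 'test';

-- Step 2: filter the partitioned table by that id (here assumed to be 1);
-- with a literal value the planner prunes to the single partition.
SELECT * FROM file WHERE client_id = 1;
```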
Say I have the following tables and indices:
create table inbound_messages(id int, user_id int, received_at timestamp);
create table outbound_messages(id int, user_id int, sent_at timestamp);
create index on inbound_messages(user_id, received_at);
create index on outbound_messages(user_id, sent_at);
Now I want to pull out the last 20 messages for a user, either inbound or outbound, in a specific time range. I can do the following, and from the explain it looks like PG walks both indices backward in 'parallel', so it minimises the number of rows it needs to scan.
explain
select *
from (select id, user_id, received_at as time from inbound_messages
      union all
      select id, user_id, sent_at as time from outbound_messages) x
where user_id = 5
  and time between '2018-01-01' and '2020-01-01'
order by user_id, time desc
limit 20;
Limit (cost=0.32..16.37 rows=2 width=16)
-> Merge Append (cost=0.32..16.37 rows=2 width=16)
Sort Key: inbound_messages.received_at DESC
-> Index Scan Backward using inbound_messages_user_id_received_at_idx on inbound_messages (cost=0.15..8.17 rows=1 width=16)
Index Cond: ((user_id = 5) AND (received_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (received_at <= '2020-01-01 00:00:00'::timestamp without time zone))
-> Index Scan Backward using outbound_messages_user_id_sent_at_idx on outbound_messages (cost=0.15..8.17 rows=1 width=16)
Index Cond: ((user_id = 5) AND (sent_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (sent_at <= '2020-01-01 00:00:00'::timestamp without time zone))
For example, it could do something crazy like find all the matching rows in memory and then sort them. Let's say there were millions of matching rows; then this could take a long time. But because it walks the indices in the same order we want the results in, this is a fast operation. It looks like the 'Merge Append' operation is done lazily and doesn't actually materialize all the matching rows.
Now we can see Postgres supports this operation for two distinct tables. However, is it possible to force Postgres to use this optimisation for a single table?
Let's say I wanted the last 20 inbound messages for user_id = 6 or user_id = 7.
explain select * from inbound_messages where user_id in (6,7) order by received_at desc limit 20;
Then we get a query plan that does a bitmap heap scan, and then does an in-memory sort. So if there are millions of messages that match then it will look at millions of rows even though theoretically it could use the same Merge trick to only look at a few rows.
Limit (cost=15.04..15.09 rows=18 width=16)
-> Sort (cost=15.04..15.09 rows=18 width=16)
Sort Key: received_at DESC
-> Bitmap Heap Scan on inbound_messages (cost=4.44..14.67 rows=18 width=16)
Recheck Cond: (user_id = ANY ('{6,7}'::integer[]))
-> Bitmap Index Scan on inbound_messages_user_id_received_at_idx (cost=0.00..4.44 rows=18 width=0)
Index Cond: (user_id = ANY ('{6,7}'::integer[]))
We could think of just adding (received_at) as an index on the table and then it will do the same backwards scan. However, if we have a large number of users then we are missing out on a potentially large speedup because we are scanning lots of index entries that would not match the query.
The following approach should work as a way of forcing Postgres to use the "merge append" plan when you are interested in most recent messages for two users from the same table.
[Note: I tested this on YugabyteDB (which is based on Postgres), so I expect the same to apply to Postgres also.]
explain select * from (
(select * from inbound_messages where user_id = 6 order by received_at DESC)
union all
(select * from inbound_messages where user_id = 7 order by received_at DESC)
) AS result order by received_at DESC limit 20;
which produces:
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.01..3.88 rows=20 width=16)
-> Merge Append (cost=0.01..38.71 rows=200 width=16)
Sort Key: inbound_messages.received_at DESC
-> Index Scan Backward using inbound_messages_user_id_received_at_idx on inbound_messages (cost=0.00..17.35 rows=100 width=16)
Index Cond: (user_id = 6)
-> Index Scan Backward using inbound_messages_user_id_received_at_idx on inbound_messages inbound_messages_1 (cost=0.00..17.35 rows=100 width=16)
Index Cond: (user_id = 7)
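A related pattern (an untested sketch) that generalizes to an arbitrary list of users, without writing one UNION ALL branch per user, is a LATERAL join against a VALUES list; each lateral subquery can walk the (user_id, received_at) index backward with its own LIMIT:

```sql
SELECT m.*
FROM (VALUES (6), (7)) AS u(user_id)
CROSS JOIN LATERAL (
    SELECT *
    FROM inbound_messages
    WHERE user_id = u.user_id
    ORDER BY received_at DESC
    LIMIT 20        -- per-user limit: at most 20 per user can be needed
) m
ORDER BY m.received_at DESC
LIMIT 20;
```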
I have a table in Redshift with a few billion rows which looks like this
CREATE TABLE channels (
fact_key TEXT NOT NULL distkey,
job_key BIGINT,
channel_key TEXT NOT NULL
)
diststyle key
compound sortkey(job_key, channel_key);
When I query by job_key + channel_key my seq scan is properly restricted by the full sortkey if I use specific values for channel_key in my query.
EXPLAIN
SELECT * FROM channels scd
WHERE scd.job_key = 1 AND scd.channel_key IN ('1234', '1235', '1236', '1237')
XN Seq Scan on channels scd (cost=0.00..3178474.92 rows=3428929 width=77)
Filter: ((((channel_key)::text = '1234'::text) OR ((channel_key)::text = '1235'::text) OR ((channel_key)::text = '1236'::text) OR ((channel_key)::text = '1237'::text)) AND (job_key = 1))
However if I query against channel_key by using IN + a subquery Redshift does not use the sortkey.
EXPLAIN
SELECT * FROM channels scd
WHERE scd.job_key = 1 AND scd.channel_key IN (select distinct channel_key from other_channel_list where job_key = 14 order by 1)
XN Hash IN Join DS_DIST_ALL_NONE (cost=3.75..3540640.36 rows=899781 width=77)
Hash Cond: (("outer".channel_key)::text = ("inner".channel_key)::text)
-> XN Seq Scan on channels scd (cost=0.00..1765819.40 rows=141265552 width=77)
Filter: (job_key = 1)
-> XN Hash (cost=3.75..3.75 rows=1 width=402)
-> XN Subquery Scan "IN_subquery" (cost=0.00..3.75 rows=1 width=402)
-> XN Unique (cost=0.00..3.74 rows=1 width=29)
-> XN Seq Scan on other_channel_list (cost=0.00..3.74 rows=1 width=29)
Filter: (job_key = 14)
Is it possible to get this to work? My ultimate goal is to turn this into a view so pre-defining my list of channel_keys won't work.
Edit to provide more context:
This is part of a larger query and the results of this get hash joined to some other data. If I hard-code the channel_keys then the input to the hash join is ~2 million rows. If I use the IN condition with the subquery (nothing else changes) then the input to the hash join is 400 million rows. The total query time goes from ~40 seconds to 15+ minutes.
Does this give you a better plan than the subquery version?
with other_channels as (
select distinct channel_key from other_channel_list where job_key = 14 order by 1
)
SELECT *
FROM channels scd
JOIN other_channels ocd on scd.channel_key = ocd.channel_key
WHERE scd.job_key = 1
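Another option worth trying (a sketch, and like the CTE it cannot live inside a plain view) is to materialize the small channel list into a temp table first, so the join input is tiny and fully distributed:

```sql
-- Hypothetical workaround: materialize the list once per session.
CREATE TEMP TABLE tmp_channels
DISTSTYLE ALL
AS
SELECT DISTINCT channel_key
FROM other_channel_list
WHERE job_key = 14;

SELECT scd.*
FROM channels scd
JOIN tmp_channels t ON scd.channel_key = t.channel_key
WHERE scd.job_key = 1;
```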
I'm using a partitioned Postgres table, following the documentation's rule-based approach, with a partitioning scheme based on date ranges (my date column is an epoch integer).
The problem is that a simple query to select the row with the minimum value of the partitioning column does not use the indices:
First, some settings to coerce postgres to do what I want:
SET constraint_exclusion = on;
SET enable_seqscan = off;
The query on a single partition works:
explain (SELECT * FROM urls_0 ORDER BY date_created ASC LIMIT 1);
Limit (cost=0.00..0.05 rows=1 width=38)
-> Index Scan using urls_date_created_idx_0 on urls_0 (cost=0.00..436.68 rows=8099 width=38)
However, the same query on the entire table is seq scanning:
explain (SELECT * FROM urls ORDER BY date_created ASC LIMIT 1);
Limit (cost=50000000274.88..50000000274.89 rows=1 width=51)
-> Sort (cost=50000000274.88..50000000302.03 rows=10859 width=51)
Sort Key: public.urls.date_created
-> Result (cost=10000000000.00..50000000220.59 rows=10859 width=51)
-> Append (cost=10000000000.00..50000000220.59 rows=10859 width=51)
-> Seq Scan on urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15133 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15132 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_15131 urls (cost=10000000000.00..10000000016.90 rows=690 width=88)
-> Seq Scan on urls_0 urls (cost=10000000000.00..10000000152.99 rows=8099 width=38)
Finally, a lookup by date_created does work correctly with constraint exclusion and index scans:
explain (SELECT * FROM urls where date_created = 1212)
Result (cost=10000000000.00..10000000052.75 rows=23 width=45)
-> Append (cost=10000000000.00..10000000052.75 rows=23 width=45)
-> Seq Scan on urls (cost=10000000000.00..10000000018.62 rows=3 width=88)
Filter: (date_created = 1212)
-> Index Scan using urls_date_created_idx_0 on urls_0 urls (cost=0.00..34.12 rows=20 width=38)
Index Cond: (date_created = 1212)
Does anyone know how to use partitioning so that this type of query will use an index scan?
PostgreSQL 9.1 knows how to optimize this out of the box.
In 9.0 or earlier, you need to decompose the query manually, by unioning the per-partition subqueries, each with its own ORDER BY/LIMIT.
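The manual decomposition for 9.0 and earlier looks something like this (a sketch assuming the partition names from the plan above; only three branches shown):

```sql
SELECT * FROM (
    (SELECT * FROM urls_0     ORDER BY date_created ASC LIMIT 1)
    UNION ALL
    (SELECT * FROM urls_15131 ORDER BY date_created ASC LIMIT 1)
    UNION ALL
    (SELECT * FROM urls_15132 ORDER BY date_created ASC LIMIT 1)
    -- ... one branch per partition ...
) t
ORDER BY date_created ASC
LIMIT 1;
```

Each branch satisfies its ORDER BY/LIMIT from the partition's own index, so the outer sort only has to order one row per partition.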