We are using the recommended method defined here for performing "upserts":
http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html
It is taking almost two minutes to load a file of just 150 rows. Almost all of this time is spent on this delete operation:
DELETE FROM measurement
USING measurement_temp
WHERE measurement.measurement_tag_id = measurement_temp.measurement_tag_id
  AND measurement.date_time = measurement_temp.date_time;
Even if the temp table is empty, this operation still takes nearly two minutes to complete.
It is still this slow after running a full vacuum on all tables.
Both of the columns in the predicate are part of the compound sort key, and the measurement_tag_id is the distribution key, so it's not clear to me why Redshift is taking so long.
The schema for this table looks like this:
create table measurement(
measurement_tag_id integer not null distkey,
date_time timestamp not null,
value_avg decimal(8,3),
value_min decimal(8,3),
value_max decimal(8,3),
value_std_dev decimal(8,3),
failed_qa_rule_id integer,
primary key(measurement_tag_id, date_time),
foreign key (measurement_tag_id) references measurement_tag(measurement_tag_id),
foreign key (failed_qa_rule_id) references qa_rule(qa_rule_id)
)
compound sortkey(measurement_tag_id, date_time);
Here is the query plan for the DELETE:
XN Hash Join DS_DIST_NONE (cost=9.30..3368957513.17 rows=451945 width=6)
Hash Cond: (("outer".date_time = "inner".date_time) AND ("outer".measurement_tag_id = "inner".measurement_tag_id))
-> XN Seq Scan on measurement (cost=0.00..26844282.88 rows=2684428288 width=18)
-> XN Hash (cost=6.20..6.20 rows=620 width=12)
-> XN Seq Scan on measurement_temp (cost=0.00..6.20 rows=620 width=12)
The equivalent SELECT returns almost instantly. Here's its query plan:
explain
select * from measurement as m
join measurement_temp as mt on m.measurement_tag_id = mt.measurement_tag_id and m.date_time = mt.date_time
XN Merge Join DS_DIST_NONE (cost=0.00..40266436.05 rows=451945 width=144)
Merge Cond: (("outer".measurement_tag_id = "inner".measurement_tag_id) AND ("outer".date_time = "inner".date_time))
-> XN Seq Scan on measurement m (cost=0.00..26844282.88 rows=2684428288 width=68)
-> XN Seq Scan on measurement_temp mt (cost=0.00..6.20 rows=620 width=76)
So the DELETE performs a Hash Join, while the SELECT uses a Merge Join, which is much faster.
Any ideas how to speed this up? Most of the time, there isn't anything to delete (and it's still this slow), so I could add a SELECT query first to check whether it's necessary to delete anything, but it is still going to be slow for re-loading over existing data.
I've put together a query which works; I'm just wanting to learn how I can optimise it. The idea is that, given a particular row in table A, the query takes its geometry and finds in table B the closest matching geometry, filtered by certain criteria.
SELECT a.id,
closest_pt.dist,
closest_pt.name,
closest_pt.meters
FROM "hex-hex-uk" a
CROSS JOIN lateral
(
SELECT a.id,
b.name AS name,
a.geom <-> b.way AS dist,
st_distance(a.geom, b.way, FALSE) AS meters
FROM "osm-polygons-uk" b
WHERE (
b.landuse='industrial'
OR b.man_made='works')
AND st_area(b.way, FALSE)>15000
ORDER BY a.geom <-> b.way
LIMIT 1) AS closest_pt
WHERE a.id='abc'
Currently the query executes in 30-90 ms, but I need to perform millions of these lookups. I tried swapping
a.id='abc' for a.id IN ('abc','def','ghi',...) and looking up 10000 at a time, but it takes 10 mins+, which doesn't really add up.
Here's the query plan as it stands:
" -> Index Scan using ""hex-hex-uk_id_idx"" on ""hex-hex-uk"" a (cost=0.43..8.45 rows=1 width=168) (actual time=0.029..0.046 rows=1 loops=1)"
" Index Cond: ((id)::text = '89195c849a3ffff'::text)"
" -> Limit (cost=0.28..536.88 rows=1 width=43) (actual time=33.009..33.062 rows=1 loops=1)"
" -> Index Scan using ""idx_osm-polygons-uk_geom"" on ""osm-polygons-uk"" b (cost=0.28..4935623.77 rows=9198 width=43) (actual time=32.992..33.001 rows=1 loops=1)"
" Order By: (way <-> a.geom)"
" Filter: (((landuse = 'industrial'::text) OR (man_made = 'works'::text)) AND (st_area((way)::geography, false) > '15000'::double precision))"
" Rows Removed by Filter: 7"
"Planning Time: 0.142 ms"
"Execution Time: 33.311 ms"
What would be the process for trying to optimise a query like this? I learn best by example hence I think it makes sense to post on here rather than just reading about optimisation techniques.
Thanks!
CREATE TABLE "osm-polygons-uk" (id bigint,name text,landuse text, man_made text,way geometry);
CREATE INDEX "idx_osm-polygons-uk_geom" ON "osm-polygons-uk" USING gist (way);
ALTER TABLE "osm-polygons-uk" ADD PRIMARY KEY (id);
CREATE TABLE "hex-hex-uk" (id varchar(15), geom geometry);
CREATE UNIQUE INDEX ON "hex-hex-uk" (id);
Some great tips above. The comment about the indexed materialized view led me to create a table with only the filtered data. It cut the number of rows down from 1 million to ~20000 and executed in a couple of seconds.
From then I tweaked the original query and it ended up blasting through 2400000 rows in a couple of minutes. A huge improvement from the original 13 hours it was going to take to run!
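For reference, the filtered helper table could be built along these lines (a sketch under assumptions: the original DDL wasn't posted, and the table/index names here simply match the query below):

```sql
-- Sketch (assumed DDL): materialise only the polygons matching the filter,
-- so the nearest-neighbour search no longer has to skip filtered rows.
CREATE TABLE tmp_industrial AS
SELECT b.id, b.name, b.way
FROM "osm-polygons-uk" b
WHERE (b.landuse = 'industrial' OR b.man_made = 'works')
  AND ST_Area(b.way, false) > 15000;

-- A GiST index on the geometry lets ORDER BY (geom <-> way) use an index scan.
CREATE INDEX idx_tmp_industrial_way ON tmp_industrial USING gist (way);
```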
SELECT a.id, closest_pt.name, ST_Distance(a.geom, closest_pt.way, false) as meters
FROM "hex-hex-uk" a
CROSS JOIN LATERAL
(SELECT
id,
b.name as name,
a.geom <-> b.way as dist,
b.way as way
FROM "tmp_industrial" b
ORDER BY dist ASC
LIMIT 1) AS closest_pt WHERE a.id IN ('abc','def','ghi',...);
Thanks for the tips, it gives me a bit of a guide as to how to go about debugging query performance.
I have two tables in a Postgres 11 database:
client table
--------
client_id integer
client_name character varying
file table
--------
file_id integer
client_id integer
file_name character varying
The client table is not partitioned, the file table is partitioned by client_id (partition by list). When a new client is inserted into the client table, a trigger creates a new partition for the file table.
The file table has a foreign key constraint referencing the client table on client_id.
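A minimal sketch of this layout (assumed names and types; the real DDL and trigger logic weren't posted):

```sql
-- Sketch of the schema described above (not the original DDL).
CREATE TABLE client (
    client_id   integer PRIMARY KEY,
    client_name character varying
);

CREATE TABLE file (
    file_id   integer,
    client_id integer REFERENCES client (client_id),
    file_name character varying
) PARTITION BY LIST (client_id);

-- The trigger on client would create one partition per new client, e.g.:
-- CREATE TABLE file_p1 PARTITION OF file FOR VALUES IN (1);
```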
When I execute this SQL (where c.client_id = 1), everything seems fine:
explain
select *
from client c
join file f using (client_id)
where c.client_id = 1;
Partition pruning is used, only the partition file_p1 is scanned:
Nested Loop (cost=0.00..3685.05 rows=100001 width=82)
-> Seq Scan on client c (cost=0.00..1.02 rows=1 width=29)
Filter: (client_id = 1)
-> Append (cost=0.00..2684.02 rows=100001 width=57)
-> Seq Scan on file_p1 f (cost=0.00..2184.01 rows=100001 width=57)
Filter: (client_id = 1)
But when I use a where clause like "where c.client_name = 'test'", the database scans all partitions and does not recognize that client_name 'test' corresponds to client_id 1:
explain
select *
from client c
join file f using (client_id)
where c.client_name = 'test';
Execution plan:
Hash Join (cost=1.04..6507.57 rows=100001 width=82)
Hash Cond: (f.client_id = c.client_id)
-> Append (cost=0.00..4869.02 rows=200002 width=57)
-> Seq Scan on file_p1 f (cost=0.00..1934.01 rows=100001 width=57)
-> Seq Scan on file_p4 f_1 (cost=0.00..1934.00 rows=100000 width=57)
-> Seq Scan on file_pdefault f_2 (cost=0.00..1.00 rows=1 width=556)
-> Hash (cost=1.02..1.02 rows=1 width=29)
-> Seq Scan on client c (cost=0.00..1.02 rows=1 width=29)
Filter: ((name)::text = 'test'::text)
So for this SQL, all partitions of the file table are scanned.
So should every select filter on the column the tables are partitioned by? Is the database not able to derive the partition pruning criteria?
Edit:
To add some information:
In the past, I have been working with Oracle databases most of the time.
The execution plan there would be something like
Do a full table scan on client table with the client name to find out the client_id.
Do a "PARTITION LIST" access to the file table, where SQL Developer states PARTITION_START = KEY and PARTITION_STOP = KEY to indicate the exact partition is not known when calculating the execution plan, but the access will be done to only a list of paritions, which are calculated on the client_id found in the client table.
This is what I would have expected in Postgresql as well.
The documentation states that dynamic partition pruning is possible
(...) During actual execution of the query plan. Partition pruning may also be performed here to remove partitions using values which are only known during actual query execution. This includes values from subqueries and values from execution-time parameters such as those from parameterized nested loop joins.
If I understand it correctly, this applies to prepared statements or to queries with subqueries that provide the partition key value as a parameter. Use EXPLAIN ANALYZE to see dynamic pruning (my sample data contains a million rows in three partitions):
explain analyze
select *
from file
where client_id = (
select client_id
from client
where client_name = 'test');
Append (cost=25.88..22931.88 rows=1000000 width=14) (actual time=0.091..96.139 rows=333333 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on client (cost=0.00..25.88 rows=6 width=4) (actual time=0.040..0.042 rows=1 loops=1)
Filter: (client_name = 'test'::text)
Rows Removed by Filter: 2
-> Seq Scan on file_p1 (cost=0.00..5968.66 rows=333333 width=14) (actual time=0.039..70.026 rows=333333 loops=1)
Filter: (client_id = $0)
-> Seq Scan on file_p2 (cost=0.00..5968.68 rows=333334 width=14) (never executed)
Filter: (client_id = $0)
-> Seq Scan on file_p3 (cost=0.00..5968.66 rows=333333 width=14) (never executed)
Filter: (client_id = $0)
Planning Time: 0.423 ms
Execution Time: 109.189 ms
Note that scans for partitions p2 and p3 were never executed.
To answer your exact question: the partition pruning in queries with joins as described in the question is not implemented in Postgres (yet?).
We have an images table containing around ~25 million records, and when I query the table based on the values from several joins, the planner's estimates are quite different from the actual row counts. We have other queries that are roughly the same without all of the joins, and they are much faster. I would like to know what steps I can take to debug and optimize the query. Also, is it better to have one index covering all columns included in the join and the where clause, or multiple indexes, one for each join column, and then another with all of the fields in the where clause?
The query:
EXPLAIN ANALYZE
SELECT "images".* FROM "images"
INNER JOIN "locations" ON "locations"."id" = "images"."location_id"
INNER JOIN "users" ON "images"."creator_id" = "users"."id"
INNER JOIN "user_groups" ON "users"."id" = "user_groups"."user_id"
WHERE "images"."deleted_at" IS NULL
AND "user_groups"."group_id" = 7
AND "images"."creator_type" = 'User'
AND "images"."status" = 2
AND "locations"."active" = TRUE
ORDER BY date_uploaded DESC
LIMIT 50
OFFSET 0;
The explain:
Limit (cost=25670.61..25670.74 rows=50 width=585) (actual time=1556.250..1556.278 rows=50 loops=1)
-> Sort (cost=25670.61..25674.90 rows=1714 width=585) (actual time=1556.250..1556.264 rows=50 loops=1)
Sort Key: images.date_uploaded
Sort Method: top-N heapsort Memory: 75kB
-> Nested Loop (cost=1.28..25613.68 rows=1714 width=585) (actual time=0.097..1445.777 rows=160886 loops=1)
-> Nested Loop (cost=0.85..13724.04 rows=1753 width=585) (actual time=0.069..976.326 rows=161036 loops=1)
-> Nested Loop (cost=0.29..214.87 rows=22 width=8) (actual time=0.023..0.786 rows=22 loops=1)
-> Seq Scan on user_groups (cost=0.00..95.83 rows=22 width=4) (actual time=0.008..0.570 rows=22 loops=1)
Filter: (group_id = 7)
Rows Removed by Filter: 5319
-> Index Only Scan using users_pkey on users (cost=0.29..5.40 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=22)
Index Cond: (id = user_groups.user_id)
Heap Fetches: 18
-> Index Scan using creator_date_uploaded_Where_pub_not_del on images (cost=0.56..612.08 rows=197 width=585) (actual time=0.062..40.992 rows=7320 loops=22)
Index Cond: ((creator_id = users.id) AND ((creator_type)::text = 'User'::text) AND (status = 2))
-> Index Scan using locations_pkey on locations (cost=0.43..6.77 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=161036)
Index Cond: (id = images.location_id)
Filter: active
Rows Removed by Filter: 0
Planning time: 1.694 ms
Execution time: 1556.352 ms
We are running Postgres 9.4 on an RDS db.m4.large instance.
As for the query itself, the only thing you can do is skip the users table. From EXPLAIN you can see that it only does an Index Only Scan without actually touching the table. So, technically, your query could look like this:
SELECT images.* FROM images
INNER JOIN locations ON locations.id = images.location_id
INNER JOIN user_groups ON images.creator_id = user_groups.user_id
WHERE images.deleted_at IS NULL
AND user_groups.group_id = 7
AND images.creator_type = 'User'
AND images.status = 2
AND locations.active = TRUE
ORDER BY date_uploaded DESC
OFFSET 0 LIMIT 50
The rest is about indexes. locations seems to have very little data, so optimization there will gain you nothing. user_groups, on the other hand, could benefit from an index ON (user_id) WHERE group_id = 7 or ON (group_id, user_id). This should remove some extra filtering on table content.
-- Option 1
CREATE INDEX ix_usergroups_userid_groupid7
ON user_groups (user_id)
WHERE group_id = 7;
-- Option 2
CREATE INDEX ix_usergroups_groupid_userid
ON user_groups (group_id, user_id);
Of course, the biggest thing here is images. Currently, the planner does an index scan on creator_date_uploaded_Where_pub_not_del, which I suspect does not fully match the requirements. Here, multiple options come to mind depending on your usage pattern, from one where the search parameters are rather common:
-- Option 1
CREATE INDEX ix_images_creatorid_typeuser_status2_notdel
ON images (creator_id)
WHERE creator_type = 'User' AND status = 2 AND deleted_at IS NULL;
to one with completely dynamic parameters:
-- Option 2
CREATE INDEX ix_images_status_creatortype_creatorid_notdel
ON images (status, creator_type, creator_id)
WHERE deleted_at IS NULL;
The first index is preferable as it is smaller (values are filtered-out rather than indexed).
To summarize, unless you are limited by memory (or other factors), I would add indexes on user_groups and images. Correct choice of indexes must be confirmed empirically, as multiple options are usually available and the situation depends on statistical distribution of data.
Here's a different approach:
I think one of the problems is that you are doing the joins 1714 times and then returning just the first 50 results. We probably want to avoid extra joins as soon as possible.
For this, we'll try an index with date_uploaded first and then filter by the rest of the columns. We also add creator_id to get an index-only scan:
CREATE INDEX ix_images_sort_test
ON images (date_uploaded desc, creator_id)
WHERE creator_type = 'User' AND status = 2 AND deleted_at IS NULL;
You may also use the generic (unfiltered) version, but it should perform somewhat worse: since the first column is date_uploaded, we will need to read the whole index while filtering on the rest of the columns.
CREATE INDEX ix_images_sort_test
ON images (date_uploaded desc, status, creator_type, creator_id)
WHERE deleted_at IS NULL;
The pity here is that you are also filtering by group_id, which is in another table. Even so, it may be worth trying this approach.
Also, verify that all joined tables have an index on the foreign key.
So, add an index for user_groups as (user_id, group_id)
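Something along these lines (a sketch; the index name is made up):

```sql
-- Covering index: the user_groups lookup can then be resolved from the index alone.
CREATE INDEX ix_user_groups_userid_groupid
ON user_groups (user_id, group_id);
```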
Also, as Boris noticed, you may remove the "Users" join.
I have a table in Redshift with a few billion rows which looks like this
CREATE TABLE channels (
fact_key TEXT NOT NULL distkey,
job_key BIGINT,
channel_key TEXT NOT NULL
)
diststyle key
compound sortkey(job_key, channel_key);
When I query by job_key + channel_key my seq scan is properly restricted by the full sortkey if I use specific values for channel_key in my query.
EXPLAIN
SELECT * FROM channels scd
WHERE scd.job_key = 1 AND scd.channel_key IN ('1234', '1235', '1236', '1237')
XN Seq Scan on channels scd (cost=0.00..3178474.92 rows=3428929 width=77)
Filter: ((((channel_key)::text = '1234'::text) OR ((channel_key)::text = '1235'::text) OR ((channel_key)::text = '1236'::text) OR ((channel_key)::text = '1237'::text)) AND (job_key = 1))
However, if I query channel_key using IN plus a subquery, Redshift does not use the sortkey.
EXPLAIN
SELECT * FROM channels scd
WHERE scd.job_key = 1 AND scd.channel_key IN (select distinct channel_key from other_channel_list where job_key = 14 order by 1)
XN Hash IN Join DS_DIST_ALL_NONE (cost=3.75..3540640.36 rows=899781 width=77)
Hash Cond: (("outer".channel_key)::text = ("inner".channel_key)::text)
-> XN Seq Scan on channels scd (cost=0.00..1765819.40 rows=141265552 width=77)
Filter: (job_key = 1)
-> XN Hash (cost=3.75..3.75 rows=1 width=402)
-> XN Subquery Scan "IN_subquery" (cost=0.00..3.75 rows=1 width=402)
-> XN Unique (cost=0.00..3.74 rows=1 width=29)
-> XN Seq Scan on other_channel_list (cost=0.00..3.74 rows=1 width=29)
Filter: (job_key = 14)
Is it possible to get this to work? My ultimate goal is to turn this into a view so pre-defining my list of channel_keys won't work.
Edit to provide more context:
This is part of a larger query and the results of this get hash joined to some other data. If I hard-code the channel_keys then the input to the hash join is ~2 million rows. If I use the IN condition with the subquery (nothing else changes) then the input to the hash join is 400 million rows. The total query time goes from ~40 seconds to 15+ minutes.
Does this give you a better plan than the subquery version?
with other_channels as (
select distinct channel_key from other_channel_list where job_key = 14 order by 1
)
SELECT *
FROM channels scd
JOIN other_channels ocd on scd.channel_key = ocd.channel_key
WHERE scd.job_key = 1
All uuid columns below use the native Postgres uuid column type.
I have a lookup table where the uuid (version 4, so as random as can feasibly be) is the primary key. I regularly pull a sequence of rows, say 10,000, from this lookup table.
I then wish to use that set of uuids retrieved from the lookup table to query other tables, typically two others (tables A and B). The uuids in tables A and B are not primary keys, but the uuid columns in A and B have UNIQUE constraints (btree indices).
Currently I'm not doing this merging with a JOIN of any kind, just simply:
Query lookup table, get uuids.
Query table A using uuids from (1)
Query table B using uuids from (1)
The issue is that queries (2) and (3) are surprisingly slow: for around 4,000 rows from tables A and B, particularly table A, they typically take around 30-50 seconds. Table A has around 60M rows.
Dealing with just table A: EXPLAIN ANALYZE reports an "Index Scan" on the uuid column of table A, with an Index Cond in the output.
I've experimented with various WHERE clauses:
uuid = ANY ('{...}')
uuid = ANY (VALUES (...))
uuid = 'uuid1' OR uuid = 'uuid2' etc.
And I've experimented with indexing table A's uuid column as both btree and hash.
By far the fastest (which is still relatively slow) is a btree index combined with "uuid = ANY ('{...}')" in the WHERE clause.
Various opinions I've read:
Actually doing a proper JOIN, e.g. a LEFT OUTER JOIN, across the three tables.
That the use of uuid version 4 is the problem, it being a randomly generated id as opposed to a sequence-based id.
Possibly experimenting with work_mem.
Anyway. Wondered if anyone else had any other suggestions?
Table: "lookup"
uuid: type uuid. not null. plain storage.
datetime_stamp: type bigint. not null. plain storage.
harvest_date_stamp: type bigint. not null. plain storage.
state: type smallint. not null. plain storage.
Indexes:
"lookup_pkey" PRIMARY KEY, btree (uuid)
"lookup_32ff3898" btree (datetime_stamp)
"lookup_6c8369bc" btree (harvest_date_stamp)
"lookup_9ed39e2e" btree (state)
Has OIDs: no
Table: "article_data"`
id: type integer. not null default nextval('article_data_id_seq'::regclass). plain storage.
title: text.
text: text.
insertion_date: date
harvest_date: timestamp with time zone.
uuid: uuid.
Indexes:
"article_data_pkey" PRIMARY KEY, btree (id)
"article_data_uuid_key" UNIQUE CONSTRAINT, btree (uuid)
Has OIDs: no
Both lookup and article_data have around 65m rows. Two queries:
SELECT uuid FROM lookup WHERE state = 200 LIMIT 4000;
OUTPUT FROM EXPLAIN (ANALYZE, BUFFERS):
Limit (cost=0.00..4661.02 rows=4000 width=16) (actual time=0.009..1.036 rows=4000 loops=1)
Buffers: shared hit=42
-> Seq Scan on lookup (cost=0.00..1482857.00 rows=1272559 width=16) (actual time=0.008..0.777 rows=4000 loops=1)
Filter: (state = 200)
Rows Removed by Filter: 410
Buffers: shared hit=42
Total runtime: 1.196 ms
(7 rows)
Question: Why does this do a sequence scan and not an index scan when there is a btree on state?
SELECT article_data.id, article_data.uuid, article_data.title, article_data.text
FROM article_data
WHERE uuid = ANY ('{f0d5e665-4f21-4337-a54b-cf0b4757db65,..... 3999 more uuid's ....}'::uuid[]);
OUTPUT FROM EXPLAIN (ANALYZE, BUFFERS):
Index Scan using article_data_uuid_key on article_data  (cost=5.56..34277.00 rows=4000 width=581) (actual time=0.063..66029.031 rows=4000 loops=1)
  Index Cond: (uuid = ANY ('{f0d5e665-4f21-4337-a54b-cf0b4757db65,5618754f-544b-4700-9d24-c364fd0ba4e9,958e37e3-6e6e-4b2a-b854-48e88ac1fdb7,ba56b483-59b2-4ae5-ae44-910401f3221b,aa4aca60-a320-4ed3-b7b4-829e6ca63592,05f1c0b9-1f9b-4e1c-8f41-07545d694e6b,7aa4dee9-be17-49df-b0ca-d6e63b0dc023,e9037826-86c4-4bbc-a9d5-6977ff7458af,db5852bf-a447-4a1d-9673-ead2f7045589,6704d89 .......}'::uuid[]))
  Buffers: shared hit=16060 read=4084 dirtied=292
Total runtime: 66041.443 ms
(4 rows)
Question: Why is this so slow, even though it's reading from disk?
Without seeing your table structure and the output of explain analyze..., I'd expect an inner join on the lookup table to give the best performance. (My table_a has about 10 million rows.)
select *
from table_a
inner join
-- Brain dead way to get about 1000 rows
-- from a renamed scratch table.
(select test_uuid from lookup_table
where test_id < 10000) t
on table_a.test_uuid = t.test_uuid;
"Nested Loop (cost=0.72..8208.85 rows=972 width=36) (actual time=0.041..11.825 rows=999 loops=1)"
" -> Index Scan using uuid_test_2_test_id_key on lookup_table (cost=0.29..39.30 rows=972 width=16) (actual time=0.015..0.414 rows=999 loops=1)"
" Index Cond: (test_id Index Scan using uuid_test_test_uuid_key on table_a (cost=0.43..8.39 rows=1 width=20) (actual time=0.010..0.011 rows=1 loops=999)"
" Index Cond: (test_uuid = lookup_table.test_uuid)"
"Planning time: 0.503 ms"
"Execution time: 11.953 ms"