Avoid using Nested Loop Join while using a non Equi join condition - postgresql

Postgres is using a Nested Loop Join algorithm when I use a non equi join condition in my update query. I understand that the Nested Loop Join can be very costly as the right relation is scanned once for every row found in the left relation as per
[https://www.postgresql.org/docs/8.3/planner-optimizer.html]
The update query and the execution plan is below.
Query
explain analyze
UPDATE target_tbl tgt
set descr = stage.descr,
prod_name = stage.prod_name,
item_name = stage.item_name,
url = stage.url,
col1_name = stage.col1_name,
col2_name = stage.col2_name,
col3_name = stage.col3_name,
col4_name = stage.col4_name,
col5_name = stage.col5_name,
col6_name = stage.col6_name,
col7_name = stage.col7_name,
col8_name = stage.col8_name,
flag = stage.flag
from tbl1 stage
where tgt.col1 = stage.col1
and tgt.col2 = stage.col2
and coalesce(tgt.col3, 'col3'::text) = coalesce(stage.col3, 'col3'::text)
and coalesce(tgt.col4, 'col4'::text) = coalesce(stage.col4, 'col4'::text)
and stage.row_number::int >= 1::int
and stage.row_number::int < 50001::int;
Execution Plan
Update on target_tbl tgt (cost=0.56..3557.91 rows=1 width=813) (actual time=346153.460..346153.460 rows=0 loops=1)
-> Nested Loop (cost=0.56..3557.91 rows=1 width=813) (actual time=4.326..163876.029 rows=50000 loops=1)
-> Seq Scan on tbl1 stage (cost=0.00..2680.96 rows=102 width=759) (actual time=3.060..2588.745 rows=50000 loops=1)
Filter: (((row_number)::integer >= 1) AND ((row_number)::integer < 50001))
-> Index Scan using tbl_idx on target_tbl tgt (cost=0.56..8.59 rows=1 width=134) (actual time=3.152..3.212 rows=1 loops=50000)
Index Cond: ((col1 = stage.col1) AND (col2 = stage.col2) AND (COALESCE(col3, 'col3'::text) = COALESCE(stage.col3, 'col3'::text)) AND (COALESCE(col4, 'col4'::text) = COALESCE(stage.col4, 'col4'::text)))
Planning time: 17.700 ms
Execution time: 346157.168 ms
Is there any way to avoid the nested loop join during the execution of the above query?
Or is there a way that can help me to reduce the cost of the the nested loop scan, currently it takes 6-7 minutes to update just 50000 records?

PostgreSQL can choose a different join strategy in that case. The reason why it doesn't is the gross mis-estimate in the sequential scan: 102 instead of 50000.
Fix that problem, and things will get better:
ANALYZE tbl1;
If that is not enough, collect more detailed statistics:
ALTER TABLE tbl1 ALTER row_number SET STATISTICS 1000;
ANALYZE tbl1;
All this assumes that row_number is an integer and the type cast is redundant. If you made the mistake to use a different data type, an index is your only hope:
CREATE INDEX ON tbl1 ((row_number::integer));
ANALYZE tbl1;

I understand that the Nested Loop Join can be very costly as the right relation is scanned once for every row found in the left relation
But the "right relation" here is an index scan, not a scan of the full table.
You can get it to stop using the index by changing the leading column of the join condition to something like where tgt.col1+0 = stage.col1 .... Upon doing that, it will probably change to a hash join or a merge join, but you will have to try it and see if it does. Also, the new plan may not actually be faster. (And fixing the estimation problem would be preferable, if that works)
Or is there a way that can help me to reduce the cost of the the
nested loop scan, currently it takes 6-7 minutes to update just 50000
records?
Your plan shows that over half the time is spent on the update itself, so probably reducing the cost of just the nested loop scan can have only a muted impact on the overall time. Do you have a lot of indexes on the table? The maintenance of those indexes might be a major bottleneck.

Related

restriction in second position - index not used - why?

I have created the below example and do not understand why the planner does not use index i2 for the query. As can be seen in pg_stats, it understands that column uniqueIds contains unique values. it also understands that column fourOtherIds contains only 4 different values. Shouldn't a search of index i2 then be by far the fastest way? Looking for uniqueIds in only four different index leaves of fourOtherIds? What is wrong with my understanding of how an index works? Why does it think using i1 makes more sense here, even though it has to filter out 333.333 rows? In my understanding it should use i2 to find the one row (or few rows, as there is no unique constraint) that has uniqueIds 4000 first and then apply where fourIds = 1 as a filter.
create table t (fourIds int, uniqueIds int,fourOtherIds int);
insert into t ( select 1,*,5 from generate_series(1 ,1000000));
insert into t ( select 2,*,6 from generate_series(1000001,2000000));
insert into t ( select 3,*,7 from generate_series(2000001,3000000));
insert into t ( select 4,*,8 from generate_series(3000001,4000000));
create index i1 on t (fourIds);
create index i2 on t (fourOtherIds,uniqueIds);
analyze t;
select n_distinct,attname from pg_stats where tablename = 't';
/*
n_distinct|attname |
----------+------------+
4.0|fourids |
-1.0|uniqueids |
4.0|fourotherids|
*/
explain analyze select * from t where fourIds = 1 and uniqueIds = 4000;
/*
QUERY PLAN |
--------------------------------------------------------------------------------------------------------------------------+
Gather (cost=1000.43..22599.09 rows=1 width=12) (actual time=0.667..46.818 rows=1 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Index Scan using i1 on t (cost=0.43..21598.99 rows=1 width=12) (actual time=25.227..39.852 rows=0 loops=3)|
Index Cond: (fourids = 1) |
Filter: (uniqueids = 4000) |
Rows Removed by Filter: 333333 |
Planning Time: 0.107 ms |
Execution Time: 46.859 ms |
*/
Not every conceivable optimization has been implemented. You are looking for a variant of an index skip scan AKA a loose index scan. PostgreSQL does not automatically implement those (yet--people were working on it but I don't know if they still are. Also, I think I've read that one of the 3rd party extensions/forks, citus maybe, has implemented it). You can emulate one yourself using a recursive CTE, but that would be quite annoying to do.

Time Consuming SQL Update Statement

In Postgresql (version 9.2), I need to update a table with values from another table. The UPDATE statement below works and completes quickly on a small data set (1K records). With large amount of records (600K+), the statement has not completed after more than two hours. I don't know if it is taking a long time or is simply hung.
UPDATE training_records r SET cid =
(SELECT cid_main FROM account_events e
WHERE e.user_ekey = r.ekey
AND e.type = 't'
AND r.enroll_date < e.time
ORDER BY e.time ASC LIMIT 1)
WHERE r.cid IS NULL;
Is there a problem with this statement? Is there a more efficient way to do this?
About the operation: training_records holds course enrollment records for member accounts (id by ekey) that are organized in groups. cid is the group id. account_events holds account changing events including transfers between groups (e.type='t'), where cid_main would be the group id before the transfer. I am trying to retroactively patch the newly added cid column in training_records so it accurately reflects the group membership when the course was enrolled. There could be multiple transfers, so I am picking the group id (cid_main) from the earliest transfer after the time of enrollment. Hope this makes sense.
The table training_records has close to 700K records, and account_events has 560K+ records.
Output of EXPLAIN {command above}
Update on training_records r (cost=0.00..13275775666.76 rows=664913 width=74)
-> Seq Scan on training_records r (cost=0.00..13275775666.76 rows=664913 width=74)
Filter: (cid IS NULL)
SubPlan 1
-> Limit (cost=19966.15..19966.16 rows=1 width=12)
-> Sort (cost=19966.15..19966.16 rows=1 width=12)
Sort Key: e."time"
-> Seq Scan on account_events e (cost=0.00..19966.15 rows=1 width=12)
Filter: ((r.enroll_date < "time") AND (user_ekey = r.ekey) AND (type = 't'::bpchar))
(9 rows)
One more udpate:
Adding an additional condition in WHERE, I limited the number of records from training_records to about 10K. The update took about 15 minutes. If the time is even to close to being linear to the number of records of this one table, 700K records would take about 17+ hours.
Thank you for your help!
Update: It took close to 9 hours, but the original command completed.
Try to transform it to something that does not force a nested loop join:
UPDATE training_records r
SET cid = e.cid_main
FROM account_events e
WHERE e.user_ekey = r.ekey
AND e.type = 't'
AND r.enroll_date < e.time
AND NOT EXISTS (SELECT 1 FROM account_events e1
WHERE e1.user_ekey = r.ekey
AND e1.type = 't'
AND r.enroll_date < e1.time
AND e1.time < e.time)
AND r.cid IS NULL;
The statement actually isn't equivalent: if there is no matching account_events row, your statement will update cid to NULL, while my statement will not update that row.

How to get Postgresql total cost time from explain

I have an sql query on postgresql 9.5, but it takes too long time. And I run the explain query:
DELETE FROM source v1
WHERE id < (SELECT MAX(id)
FROM source v2
WHERE v2.ent_id = v1.ent_id
AND v2.name = v1.name
);
And ex plain is
Delete on source v1 (cost=0.00..1764410287608.21 rows=2891175 width=6)');
-> Seq Scan on source v1 (cost=0.00..1764410287608.21 rows=2891175 width=6)');
Filter: (id < (SubPlan 2))');
SubPlan 2');
-> Result (cost=203424.76..203424.77 rows=1 width=0)');
InitPlan 1 (returns $2)');
-> Limit (cost=0.43..203424.76 rows=1 width=8)');
-> Index Scan Backward using source_id_ix on source v2 (cost=0.43..813697.74 rows=4 width=8)');
Index Cond: (id IS NOT NULL)');
Filter: (((ent_id)::text = (v1.ent_id)::text) AND ((name)::text = (v1.name)::text))');
My table has about 8.000.000 records. And I could not get the result for days. And I could not calculate how many times will take? is there any way for a new solution?
There is no really good way to predict execution time.
As a very rough rule of thumb, you can compare a cost of 1 to the time of reading one 8 KB page from disk during a sequential scan, but that will often be off by more than an order of magnitude.
To solve the underlying problem, try
DELETE FROM source AS v1
WHERE EXISTS (SELECT 1
FROM source AS v2
WHERE (v1.ent_id, v1.name) = (v2.ent_id, v2.name)
AND v2.id > v1.id);
The problem with your query is that it has to execute an expensive subselect for every row found, while mine can perform a semijoin. Look at my query's execution plan.

Is this the right way to create a partial index in Postgres?

We have a table with 4 million records, and for a particular frequently used use-case we are only interested in records with a particular salesforce userType of 'Standard' which are only about 10,000 out of 4 million. The other usertype's that could exist are 'PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess' and 'CsnOnly'.
So for this use case I thought creating a partial index would be better, as per the documentation.
So I am planning to create this partial index to speed up queries for records with a usertype of 'Standard' and prevent the request from the web from getting timed out:
CREATE INDEX user_type_idx ON user_table(userType)
WHERE userType NOT IN
('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly');
The lookup query will be
select * from user_table where userType='Standard';
Could you please confirm if this is the right way to create the partial index? It would of great help.
Postgres can use that but it does so in a way that is (slightly) less efficient than an index specifying where user_type = 'Standard'.
I created a small test table with 4 million rows, 10.000 of them having the user_type 'Standard'. The other values were randomly distributed using the following script:
create table user_table
(
id serial primary key,
some_date date not null,
user_type text not null,
some_ts timestamp not null,
some_number integer not null,
some_data text,
some_flag boolean
);
insert into user_table (some_date, user_type, some_ts, some_number, some_data, some_flag)
select current_date,
case (random() * 4 + 1)::int
when 1 then 'PowerPartner'
when 2 then 'CSPLitePortal'
when 3 then 'CustomerSuccess'
when 4 then 'PowerCustomerSuccess'
when 5 then 'CsnOnly'
end,
clock_timestamp(),
42,
rpad(md5(random()::text), (random() * 200 + 1)::int, md5(random()::text)),
(random() + 1)::int = 1
from generate_series(1,4e6 - 10000) as t(i)
union all
select current_date,
'Standard',
clock_timestamp(),
42,
rpad(md5(random()::text), (random() * 200 + 1)::int, md5(random()::text)),
(random() + 1)::int = 1
from generate_series(1,10000) as t(i);
(I create tables that have more than just a few columns as the planner's choices are also driven by the size and width of the tables)
The first test using the index with NOT IN:
create index ix_not_in on user_table(user_type)
where user_type not in ('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly');
explain (analyze true, verbose true, buffers true)
select *
from user_table
where user_type = 'Standard'
Results in:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on stuff.user_table (cost=139.68..14631.83 rows=11598 width=139) (actual time=1.035..2.171 rows=10000 loops=1)
Output: id, some_date, user_type, some_ts, some_number, some_data, some_flag
Recheck Cond: (user_table.user_type = 'Standard'::text)
Buffers: shared hit=262
-> Bitmap Index Scan on ix_not_in (cost=0.00..136.79 rows=11598 width=0) (actual time=1.007..1.007 rows=10000 loops=1)
Index Cond: (user_table.user_type = 'Standard'::text)
Buffers: shared hit=40
Total runtime: 2.506 ms
(The above is a typical execution time after I ran the statement about 10 times to eliminate caching issues)
As you can see the planner uses a Bitmap Index Scan which is a "lossy" scan that needs an extra step to filter out false positives.
When using the following index:
create index ix_standard on user_table(id)
where user_type = 'Standard';
This results in the following plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_standard on stuff.user_table (cost=0.29..443.16 rows=10267 width=139) (actual time=0.011..1.498 rows=10000 loops=1)
Output: id, some_date, user_type, some_ts, some_number, some_data, some_flag
Buffers: shared hit=313
Total runtime: 1.815 ms
Conclusion:
Your index is used but an index on only the type that you are interested in is a bit more efficient.
The runtime is not that much different. I executed each explain about 10 times, and the average for the ix_standard index was slightly below 2ms and the average of the ix_not_in index was slightly above 2ms - so not a real performance difference.
But in general the Index Scan will scale better with increasing table sizes than the Bitmap Index Scan will do. This is basically due to the "Recheck Condition" - especially if not enough work_mem is available to keep the bitmap in memory (for larger tables).
For the index to be used, the WHERE condition must be used in the query as you wrote it.
PostgreSQL has some ability to make deductions, but it won't be able to infer that userType = 'Standard' is equivalent to the condition in the index.
Use EXPLAIN to find out if your index can be used.

Deducing estimated time taken with EXPLAIN

The following is the EXPLAIN output of a query with set enable_seqscan = true.
Hash Join (cost=1028288.04..278841855100.04 rows=429471108 width=125)
Hash Cond: ((u.destination)::text = (n.mid)::text)
-> Nested Loop (cost=0.00..278587474234.17 rows=429471108 width=112)
Join Filter: (((u.destination)::text <> (u2.mid)::text) AND ("position"((u2.path_name)::text, (suffix(u.path_name))::text) = 0) AND (((prefix((u.path_name)::text))::text = (prefix((u2.path_name)::text))::text) OR ((prefix((u.path_name)::text))::text = 'common'::text)))
-> Seq Scan on unresolved u2 (cost=0.00..2780546.32 rows=117608632 width=79)
-> Index Scan using unresolved__mid on unresolved u (cost=0.00..1864.44 rows=492 width=53)
Index Cond: ((u.mid)::text = (u2.destination)::text)
-> Hash (cost=488335.24..488335.24 rows=27237024 width=33)
-> Seq Scan on name n (cost=0.00..488335.24 rows=27237024 width=33)
(9 rows)
The following is the EXPLAIN output of the same query but with set enable_seqscan = false.
Hash Join (cost=102089128.45..279381508122.13 rows=429471108 width=125)
Hash Cond: ((u.destination)::text = (n.mid)::text)
-> Nested Loop (cost=0.00..279026066415.86 rows=429471108 width=112)
Join Filter: (((u.destination)::text <> (u2.mid)::text) AND ("position"((u2.path_name)::text, (suffix(u.path_name))::text) = 0) AND (((prefix((u.path_name)::text))::text = (prefix((u2.path_name)::text))::text) OR ((prefix((u.path_name)::text))::text = 'common'::text)))
-> Index Scan using unresolved__destination on unresolved u2 (cost=0.00..441372728.01 rows=117608632 width=79)
-> Index Scan using unresolved__mid on unresolved u (cost=0.00..1864.44 rows=492 width=53)
Index Cond: ((u.mid)::text = (u2.destination)::text)
-> Hash (cost=101549175.65..101549175.65 rows=27237024 width=33)
-> Index Scan using name_pkey on name n (cost=0.00..101549175.65 rows=27237024 width=33)
(9 rows)
I would like to know how long would the query take. It's been running for about 10 hours now. Is the estimated time deduced from the 'cost' in the first row, in the case of the latter it is '279381508122.13 ms' which is 8.8 years?! :-(
The numbers do not correspond to time. They are relative numbers only. From the documentation (Using Explain):
The costs are measured in arbitrary
units determined by the planner's cost
parameters (see Section 18.6.2).
Traditional practice is to measure the
costs in units of disk page fetches;
that is, seq_page_cost is
conventionally set to 1.0 and the
other cost parameters are set relative
to that. (The examples in this section
are run with the default cost
parameters.)
In any event, the Nested Loop due to a somewhat vague join condition appears to be killing your performance. Hard to tell without seeing the original query and table/index structures, but you might find benefit in creating a functional index on unresolved, assuming "prefix()" is an IMMUTABLE function:
CREATE INDEX idx_path_name_prefix ON unresolved (prefix(path_name));