Why does PostgreSQL do a whole index scan when the condition is FALSE?

I noticed a slowdown when the query runs: from 5ms to 200ms (+44ms for JIT).
https://explain.depesz.com/s/lZYf#l12
The plan is similar with JIT off.
The underlined expression is NULL, so the whole filter expression can never be true.
Why does PostgreSQL waste 227ms here? What did I do wrong?
EXPLAIN ( ANALYSE, FORMAT JSON, VERBOSE, settings, buffers )
WITH
  _app_period AS ( select ?::tstzrange ),
  ready AS (
    SELECT
      min( lower( o.app_period ) ) OVER ( PARTITION BY agreement_id ) <# (select * from _app_period) AS new_order,
      max( upper( o.app_period ) ) OVER ( PARTITION BY agreement_id ) <# (select * from _app_period) AS del_order,
      o.*
    FROM "order_bt" o
    LEFT JOIN acc_ready( 'Usage',   (select * from _app_period), o ) acc_u ON acc_u.ready
    LEFT JOIN acc_ready( 'Invoice', (select * from _app_period), o ) acc_i ON acc_i.ready
    LEFT JOIN agreement a ON a.id = o.agreement_id
    LEFT JOIN xcheck c    ON c.doc_id = o.id AND c.doctype = 'OrderDetail'
    WHERE o.sys_period #> sys_time() AND o.app_period && app_period()
  )
SELECT * FROM ready
UPDATE:
The server version is PostgreSQL 13.1.
Is the second execution faster?
No. The result is reproducible every time.
Perhaps sys_time() is expensive - what is that function?
It is a STABLE function that does select coalesce( biconf( 'sys_time' )::timestamptz, now() ). app_period() is a STABLE SQL function and does a similar thing.
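For context, a minimal sketch of what such a function could look like (biconf() is the OP's own config-lookup helper, not a built-in):
-- sketch only: biconf() is assumed to exist as the OP's config-lookup function
CREATE FUNCTION sys_time() RETURNS timestamptz
LANGUAGE sql STABLE AS $$
    SELECT coalesce( biconf( 'sys_time' )::timestamptz, now() );
$$;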
Are you sure that the expression is NULL for all rows?
Yes. I checked the result of app_period(): it is NULL, so it does not matter how many rows are in the table. o.app_period && NULL yields NULL for every row.
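As a quick illustration of the three-valued logic involved (the range literal here is arbitrary):
-- && with a NULL range yields NULL, not FALSE, so every row is filtered out
SELECT tstzrange( '2021-01-01', '2021-02-01' ) && NULL::tstzrange AS result;
-- result is NULL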
Does the execution time change if you replace the expression with a literal NULL?
Yes, changing the condition to WHERE o.sys_period #> sys_time() AND o.app_period && NULL reduces the time to 0.08ms. The plan changes.
Do you have indexes on o.sys_period and o.app_period?
Yes. I have: "order_id_sys_period_app_period_excl" EXCLUDE USING gist (id WITH =, sys_period WITH &&, app_period WITH &&)
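For reference, the GiST index behind that exclusion constraint is roughly what a manual definition like the one below would create (a sketch; the index name is made up, and btree_gist is needed so that id can take part in a GiST index):
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE INDEX order_bt_id_sys_app_gist
    ON "order_bt" USING gist ( id, sys_period, app_period );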
And what happens when you execute the query without the CTE?
Without the CTE many things are inlined and the time is reduced to 0.5ms, yet a similar condition is used for the Index Scan (and now it is fast).
When I put (select * from _app_period) everywhere, the query also runs fast: 15ms. The filter is planned with a parameter $3: (o.app_period && $3) AND (o.sys_period #> sys_time())
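A stripped-down sketch of that workaround, keeping only the parts relevant to the filter (the ? placeholder mirrors the parameter in the original query):
WITH _app_period AS ( select ?::tstzrange )
SELECT o.*
FROM "order_bt" o
WHERE o.sys_period #> sys_time()
  AND o.app_period && (select * from _app_period)  -- planned as an InitPlan parameter ($3 in the filter)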

Related

JOIN vs subquery: why does the subquery win on performance when it should not?

Recently I asked: Why select from function is slow?
But now when I LEFT JOIN this function it takes 11500ms.
When I rewrite the LEFT JOIN as a subquery it takes only 111ms:
SELECT
    (SELECT next_ots
     FROM order_total_suma( next_range ) next_ots
     WHERE next_ots.order_id = ots.order_id
       AND next_ots.consumed_period #> (ots.o).billed_to
    ) AS next_suma,                                    --<< this took only 111ms. See plan
    ots.*
FROM (
    SELECT
        tstzrange(
            NULLIF( (ots.o).billed_to, 'infinity' ),
            NULLIF( (ots.o).billed_to + p.interval, 'infinity' )
        ) AS next_range,
        ots.*
    FROM order_total_suma() ots
    LEFT JOIN period p ON p.id = (ots.o).period_id
) ots
--LEFT JOIN order_total_suma( next_range ) next_ots ON next_ots.order_id = 6154
--       AND next_ots.consumed_period #> (ots.o).billed_to   --<< this is fine. Plan is not posted
--LEFT JOIN order_total_suma( next_range ) next_ots ON next_ots.order_id = ots.order_id
--       AND next_ots.consumed_period #> (ots.o).billed_to   --<< this takes 11500ms. See plan
WHERE ots.order_id IN ( 6154, 10805 )
Plans are attached.
While googling I found this blog post:
In most cases, joins are also a better solution than subqueries — Postgres will even internally “rewrite” a subquery, creating a join, whenever possible, but this of course increases the time it takes to come up with the query plan
and many SO posts like this one say:
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.
So why is LEFT JOINing a function significantly slower compared to a subquery?
Is there a way to make the LEFT JOIN take as little time as the subquery?
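One rewrite worth trying, not taken from the post itself: an explicit LEFT JOIN LATERAL, which lets the function call reference ots per row just like the correlated subquery does (a sketch reusing the query above):
SELECT next_ots.*, ots.*
FROM (
    SELECT
        tstzrange(
            NULLIF( (ots.o).billed_to, 'infinity' ),
            NULLIF( (ots.o).billed_to + p.interval, 'infinity' )
        ) AS next_range,
        ots.*
    FROM order_total_suma() ots
    LEFT JOIN period p ON p.id = (ots.o).period_id
) ots
LEFT JOIN LATERAL (
    SELECT *
    FROM order_total_suma( ots.next_range ) x
    WHERE x.order_id = ots.order_id
      AND x.consumed_period #> (ots.o).billed_to
) next_ots ON true
WHERE ots.order_id IN ( 6154, 10805 );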

Postgres left joining 3 tables with a condition

I have a query like this:
SELECT x.id
FROM x
LEFT JOIN (
    SELECT a.id FROM a
    WHERE [condition1] [condition2]
) AS A USING (id)
LEFT JOIN (
    SELECT b.id FROM b
    WHERE [condition1] [condition3]
) AS B USING (id)
LEFT JOIN (
    SELECT c.id FROM c
    WHERE [condition1] [condition4]
) AS C USING (id)
WHERE [condition1]
As you can see, [condition1] is common to the subqueries and the outer query.
When, in general, might it be worth removing [condition1] from the subqueries (since the result is the same) for performance reasons? Please don't give answers like "run it and see": there is a lot of data and it keeps changing, so we need good worst-case behaviour.
I have tried to do some tests, but they are far from conclusive. Will Postgres figure out that the condition applies to the subqueries as well and propagate it?
Examples for condition1:
WHERE a.id NOT IN (SELECT id FROM {ft_geom_in}) (this is slow, I know, this is just for example)
WHERE a.id > x
It is difficult to give a clear general answer, since much depends on the actual data model (especially indexes) and queries (conditions).
However, in many cases it makes sense to place condition1 in joined subqueries.
This applies particularly when condition2 excludes far fewer rows than condition1.
In such cases, the filter on condition1 may significantly reduce the number of checks of condition2.
On the other hand, it seems unlikely that the presence of condition1 in subqueries could substantially slow down the query.
Simple tests do not give general answers, but might serve as an illustration.
create table x (id integer, something text);
create table a (id integer, something text);
insert into x select i, i::text from generate_series (1, 1000000) i;
insert into a select i, i::text from generate_series (1, 1000000) i;
Query A: condition2 excludes few rows.
A1: with condition1
explain analyse
select x.id
from x
left join (
    select id from a
    where id < 500000 and length(something) > 1
) as a using (id)
where id < 500000;
Average execution time: ~620 ms
A2: without condition1
explain analyse
select x.id
from x
left join (
    select id from a
    where length(something) > 1
) as a using (id)
where id < 500000;
Average execution time: ~810 ms
Query B: condition2 excludes many rows.
B1: with condition1
explain analyse
select x.id
from x
left join (
    select id from a
    where id < 500000 and length(something) = 1
) as a using (id)
where id < 500000;
Average execution time: ~220 ms
B2: without condition1
explain analyse
select x.id
from x
left join (
    select id from a
    where length(something) = 1
) as a using (id)
where id < 500000;
Average execution time: ~230 ms
Note that the queries do not need subqueries at all. Queries with plain left joins and the conditions in a common WHERE clause should be a little faster. For example, this is the equivalent of query B1:
explain analyse
select x.id
from x
left join a using (id)
where x.id < 500000
  and a.id < 500000
  and length(a.something) = 1;
Average execution time: ~210 ms
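As a side note, the toy tables above have no indexes at all; if one wanted the id < 500000 predicates to use index scans, a minimal addition (not part of the original test) would be:
-- plain btree indexes on the join/filter column, then refresh statistics
create index on x (id);
create index on a (id);
analyze x;
analyze a;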

Using EXISTS as a column in TSQL

Is it possible to use the value of EXISTS as part of a query?
(Please note: unfortunately due to client constraints, I need SQLServer 2005 compatible answers!)
So when returning a set of results, one of the columns is a boolean value which states whether the subquery would return any rows.
For example, I want to return a list of usernames and whether a different table contains any rows for each user. The following is not syntactically correct, but hopefully gives you an idea of what I mean...
SELECT T1.[UserName],
       (EXISTS (SELECT *
                FROM [AnotherTable] T2
                WHERE T1.[UserName] = T2.[UserName])
       ) AS [RowsExist]
FROM [UserTable] T1
Where the resultant set contains a column called [UserName] and boolean column called [RowsExist].
The obvious solution is to use a CASE, such as below, but I wondered if there was a better way of doing it...
SELECT T1.[UserName],
       (CASE (SELECT COUNT(*)
              FROM [AnotherTable] T2
              WHERE T1.[UserName] = T2.[UserName]
             )
        WHEN 0 THEN CAST(0 AS BIT)
        ELSE CAST(1 AS BIT) END
       ) AS [RowsExist]
FROM [UserTable] T1
Your second query isn't valid syntax.
SELECT T1.[UserName],
       CASE
           WHEN EXISTS (SELECT *
                        FROM [AnotherTable] T2
                        WHERE T1.[UserName] = T2.[UserName]) THEN CAST(1 AS BIT)
           ELSE CAST(0 AS BIT)
       END AS [RowsExist]
FROM [UserTable] T1
Is generally fine and will be implemented as a semi join.
The article Subqueries in CASE Expressions discusses this further.
In some cases, though, a COUNT query can actually perform better, as discussed here.
I like the other guy's SQL better, but I just wrote this:
with bla as (
    select t2.username, isPresent = CAST(1 AS BIT)
    from t2
    group by t2.username
)
select t1.*, isPresent = isnull(bla.isPresent, CAST(0 AS BIT))
from t1
left join bla on t1.username = bla.username
From what you wrote here, I would alter your first query into something like this:
SELECT T1.[UserName],
       ISNULL(
           (
               SELECT TOP 1 1
               FROM [AnotherTable]
               WHERE EXISTS
               (
                   SELECT 1
                   FROM [AnotherTable] AS T2
                   WHERE T1.[UserName] = T2.[UserName]
               )
           ), 0)
FROM [UserTable] T1
But actually, if you use TOP 1 1 you do not need EXISTS; you could also write:
SELECT T1.[UserName],
       ISNULL(
           (
               SELECT TOP 1 1
               FROM [AnotherTable] AS T2
               WHERE T1.[UserName] = T2.[UserName]
           ), 0)
FROM [UserTable] T1

T-SQL conditional join based on parameter value

I need an inner join based on the value of a parameter in a stored procedure. I'm also using a function to split values out of a comma-separated string. My code is as follows:
Select *
from view_Project as vp
join inline_split_me(@earmark) as e on (vp.EarmarkId LIKE e.Value and @earmark IS NOT NULL)
If @earmark is NULL I don't want this join to happen at all; otherwise, with a string of '%' or '119' or '119,120,121', the join should happen, and it does yield the proper results. I thought I could just use the and @earmark is not null to express that, but it is not returning the proper results. Commenting the join line out and running the same sproc with NULL as the @earmark parameter gives me all rows, whereas keeping the join and passing NULL gives me no rows. I've been fiddling with this for some time; any help would be appreciated.
Here is the FUNCTION:
CREATE FUNCTION [inline_split_me](@param nvarchar(MAX))
RETURNS TABLE AS
RETURN(SELECT ltrim(rtrim(convert(nvarchar(4000),
                   substring(@param, Number,
                             charindex(N',' COLLATE SQL_Latin1_General_CP1_CI_AS,
                                       @param + convert(nvarchar(MAX), N','),
                                       Number) -
                             Number)
              ))) AS Value
       FROM APM_Numbers
       WHERE Number <= convert(int, len(@param))
         AND substring(convert(nvarchar(MAX), N',') + @param, Number, 1) =
             N',' COLLATE SQL_Latin1_General_CP1_CI_AS)
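For illustration, a call like this (hypothetical input, assuming the APM_Numbers helper table covers the needed range of positions) returns one row per comma-separated token:
SELECT Value FROM inline_split_me(N'119,120,121');
-- expected: three rows with Value = '119', '120', '121'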
Got it, thanks Cade Roux and others
if (@earmark = '%')
    select *
    from view_Project as vp
    where vp.EarmarkId like @earmark
else
    select *
    from view_Project as vp
    where @earmark is null or vp.EarmarkId in (select Value from inline_split_me(@earmark))
The INNER JOIN is your problem. A LEFT JOIN will always return the rows on the left side, even when @earmark is NULL and the join condition can never be true.
Select *
from view_Project as vp
LEFT join inline_split_me(@earmark) as e on (vp.EarmarkId LIKE e.Value and @earmark IS NOT NULL)
You could fool around with a UNION to manufacture rows to join against when @earmark is NULL:
Select *
from view_Project as vp
INNER join (
    SELECT Value -- , other columns here ...
    FROM inline_split_me(@earmark) as e
    UNION ALL
    SELECT DISTINCT EarmarkId AS Value -- , NULL, NULL, etc.
    FROM view_Project
    WHERE @earmark IS NULL
) AS e
ON vp.EarmarkId LIKE e.Value
But frankly, I would just use conditional logic:
IF @earmark IS NULL
    Select *
    from view_Project as vp
ELSE
    Select *
    from view_Project as vp
    INNER join inline_split_me(@earmark) as e on (vp.EarmarkId LIKE e.Value and @earmark IS NOT NULL)
If you can get away from LIKE:
Select *
from view_Project as vp
WHERE @earmark IS NULL OR vp.EarmarkId IN (
    SELECT Value FROM inline_split_me(@earmark)
)
...as vp join inline_split_me(@earmark) as...
should default to an inner join, which means the query only returns rows if matches are found between the two tables. (Double-check by explicitly saying inner join.)
Does the function call return no (zero) rows when @earmark is NULL? If so, then no rows should be returned from the query.
I know this question is pretty old, but I was researching a similar issue, came across this, and came up with a totally different solution that worked like a charm.
Use a LEFT JOIN, but then add a filter in your WHERE clause so that if the parameter is not NULL, the join match cannot be NULL either. That functionally results in a conditional INNER JOIN.
SELECT *
FROM A
LEFT JOIN B
    ON A.KEY = B.KEY
WHERE
    (@JOIN_B IS NOT NULL AND B.KEY IS NOT NULL)
    OR @JOIN_B IS NULL
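Applied to this question's tables, the same pattern might look like this (a sketch using the OP's view_Project and inline_split_me):
SELECT vp.*
FROM view_Project AS vp
LEFT JOIN inline_split_me(@earmark) AS e
    ON vp.EarmarkId LIKE e.Value
WHERE (@earmark IS NOT NULL AND e.Value IS NOT NULL)
   OR @earmark IS NULL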

TSQL Update Query behaving unexpectedly

I have a nested select query that returns the proper number of rows. The query builds a recordset and compares it to a table, returning the records in the query that are not in the table.
I converted the select query into an update query, trying to populate the table with the rows returned from the query. When I run the update query it reports zero rows to update. I don't understand why, because the select query returns records and I am using the same code in the update query.
Thanks
Select Query: (This is returning several records)
Select *
From
    (SELECT DISTINCT
            ProductClass, SalProductClass.[Description], B.Branch, B.BranchDesc, B.Salesperson, B.Name,
            CAST(0 AS FLOAT) AS Rate, 'N' AS Split
     FROM (SELECT SalBranch.Branch, SalBranch.[Description] AS BranchDesc, A.Salesperson, A.Name
           FROM (SELECT DISTINCT
                        Salesperson, Name
                 FROM SalSalesperson
                ) A
           CROSS JOIN SalBranch
          ) B
     CROSS JOIN SalProductClass
    ) C
Left Outer Join RateComm On
    RateComm.ProductClass = C.ProductClass and
    RateComm.Branch = C.Branch And RateComm.Salesperson = C.Salesperson
Where RateComm.ProductClass is Null
Update Query: (This is returning zero records)
UPDATE RateComm
SET RateComm.ProductClass = C.ProductClass, RateComm.ProdClassDesc = C.ProdClassDesc,
    RateComm.Branch = C.Branch, RateComm.BranchDesc = C.BranchDesc, RateComm.Salesperson = C.Salesperson,
    RateComm.Name = C.Name, RateComm.Rate = C.Rate, RateComm.Split = C.Split
FROM (SELECT DISTINCT
             ProductClass, SalProductClass.[Description] AS ProdClassDesc, B.Branch, B.BranchDesc, B.Salesperson, B.Name,
             CAST(0 AS FLOAT) AS Rate, 'N' AS Split
      FROM (SELECT SalBranch.Branch, SalBranch.[Description] AS BranchDesc, A.Salesperson, A.Name
            FROM (SELECT DISTINCT
                         Salesperson, Name
                  FROM SalSalesperson
                 ) A
            CROSS JOIN SalBranch
           ) B
      CROSS JOIN SalProductClass
     ) C
LEFT OUTER JOIN RateComm ON C.ProductClass = RateComm.ProductClass AND
                            C.Salesperson = RateComm.Salesperson AND C.Branch = RateComm.Branch
WHERE RateComm.ProductClass IS NULL
It's difficult to update what doesn't exist. Have you tried an INSERT query instead?
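A sketch of that INSERT, reusing the OP's own anti-join (the target column list is assumed from the SET clause above):
INSERT INTO RateComm (ProductClass, ProdClassDesc, Branch, BranchDesc, Salesperson, [Name], Rate, Split)
SELECT C.ProductClass, C.ProdClassDesc, C.Branch, C.BranchDesc, C.Salesperson, C.Name, C.Rate, C.Split
FROM (SELECT DISTINCT
             ProductClass, SalProductClass.[Description] AS ProdClassDesc, B.Branch, B.BranchDesc, B.Salesperson, B.Name,
             CAST(0 AS FLOAT) AS Rate, 'N' AS Split
      FROM (SELECT SalBranch.Branch, SalBranch.[Description] AS BranchDesc, A.Salesperson, A.Name
            FROM (SELECT DISTINCT Salesperson, Name FROM SalSalesperson) A
            CROSS JOIN SalBranch
           ) B
      CROSS JOIN SalProductClass
     ) C
LEFT OUTER JOIN RateComm ON C.ProductClass = RateComm.ProductClass AND
                            C.Salesperson = RateComm.Salesperson AND C.Branch = RateComm.Branch
WHERE RateComm.ProductClass IS NULL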