Postgres subquery execution steps - postgresql

Suppose I have a query which says
Select * from (
select coalesce(mytable.created_date,mytable1.created_date) as created_date,...
from mytable
left join mytable1 ON (mytable.id=mytable1.id)
--Other Joins and tables here
) as foo
where created_date > CURRENT_DATE
Will Postgres select only the rows where created_date is > CURRENT_DATE for inner query joins where I am joining many tables?
Or will it take all rows from mytable, join them with the other tables in the inner query, and only then check for created_date > CURRENT_DATE?
Is my previous query the same as
select coalesce(mytable.created_date,mytable1.created_date),... from mytable
left join mytable1 ON (mytable.id=mytable1.id)
--Other Joins and tables here
WHERE
coalesce(mytable.created_date,mytable1.created_date) > CURRENT_DATE

As you can see when you use EXPLAIN, the optimizer can “flatten” such subqueries, so that the execution plans for these two queries will be the same.
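In other words, the optimizer is able to push the WHERE condition into the subquery and down to the join, so that it is applied as early as possible instead of after the whole inner query has been computed.
Moreover, if the condition is on a column of mytable1 alone (for example mytable1.created_date > CURRENT_DATE instead of the coalesce), PostgreSQL will deduce that the condition can never be true for the NULL-extended rows of the left join and perform an inner join rather than an outer join.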
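A minimal way to check this yourself is to run EXPLAIN on both forms and compare the plans. The table definitions below are hypothetical stand-ins (the real tables in the question have more columns and more joins), so treat this as a sketch rather than a reproduction of the asker's schema:

```sql
-- Hypothetical minimal tables; only the columns used in the question are defined.
CREATE TABLE mytable  (id int PRIMARY KEY, created_date date);
CREATE TABLE mytable1 (id int PRIMARY KEY, created_date date);

-- Form 1: the filter is written outside the derived table.
EXPLAIN
SELECT *
FROM (
    SELECT coalesce(mytable.created_date, mytable1.created_date) AS created_date
    FROM mytable
    LEFT JOIN mytable1 ON mytable.id = mytable1.id
) AS foo
WHERE created_date > CURRENT_DATE;

-- Form 2: the filter is written directly against the joined tables.
EXPLAIN
SELECT coalesce(mytable.created_date, mytable1.created_date) AS created_date
FROM mytable
LEFT JOIN mytable1 ON mytable.id = mytable1.id
WHERE coalesce(mytable.created_date, mytable1.created_date) > CURRENT_DATE;
```

If the subquery is flattened, both statements should show the same join node with the coalesce condition attached as a filter. The exact plan depends on table sizes, statistics and the PostgreSQL version.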

Related

How to update joined table using condition

I'm having an issue with a simple update statement. I'm new to postgresql and I'm still stuck on MS Sql Server syntax.
What I want to do is to update all records from table1 which are not present in table2. Table1 and Table2 have a 1-to-1 relation. The join column is "colx" in my example.
On Ms SQL Server I would have something like this:
UPDATE table1 set col1='some value' from table1 t1 LEFT JOIN table2 t2 on t1.colx=t2.colx WHERE t2.colx IS NULL
or
UPDATE table1 set col1='some value' from table1 t1 where not exists (select 1 from table2 t2 where t1.colx=t2.colx)
My issue is when performing the same on PostgreSql it updates all records from table1, not only the records matching the condition (e.g. I was expecting 4 records to be updated, but all records from table1 are updated instead).
I checked the join condition with a select statement for all possible approaches and I get the expected result (e.g. 4 records).
Is there anything I'm missing?
Your question is not very clear about the requirement.
What I understood is you want to update the value of col1 in table1 for those records which are not present in the table2.
You can try it this way in Postgresql:
UPDATE table1 t1 set col1='some value' where not exists(select 1 from table2 where colx=t1.colx)
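The reason the SQL Server form updates everything is that in PostgreSQL the target table should not be repeated in the FROM list unless a self-join is intended: "from table1 t1" is a second, independent reference to table1, and nothing links it to the row being updated. A minimal sketch with hypothetical sample data (the values are assumptions for illustration only):

```sql
-- Hypothetical sample data: colx 3 and 4 have no match in table2.
CREATE TABLE table1 (colx int PRIMARY KEY, col1 text);
CREATE TABLE table2 (colx int PRIMARY KEY);
INSERT INTO table1 (colx, col1) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO table2 (colx) VALUES (1), (2);

-- The target table is named once; the subquery is correlated to it via the alias t1.
UPDATE table1 AS t1
SET col1 = 'some value'
WHERE NOT EXISTS (SELECT 1 FROM table2 AS t2 WHERE t2.colx = t1.colx);

-- Check which rows changed.
SELECT colx, col1 FROM table1 ORDER BY colx;
```

With this data only the rows with colx 3 and 4 receive 'some value'; rows 1 and 2 keep their original col1.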

Should I do ORDER BY twice when selecting from subquery?

I have an SQL query (code below) which selects some rows from a subquery. In the subquery I perform an ORDER BY.
The question is: will the order of the subquery be preserved in the parent query?
Is there some spec/document or something which proves that?
SELECT sub.id, sub.name, ot.field
FROM (SELECT t.id, t.name
FROM table t
WHERE t.something > 10
ORDER BY t.id
LIMIT 25
) sub
LEFT JOIN other_table ot ON ot.table_id = sub.id
/**order by id?**/
"Will order of subquery be preserved in parent query?"
It might happen, but you cannot rely on it.
For example, if the optimizer decides to use a hash join between your derived table and other_table then the order of the derived table will not be preserved.
If you want a guaranteed sort order, then you have to use an order by in the outer query as well.
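Applied to the query from the question, that could look like the sketch below. The name some_table is a hypothetical replacement, since "table" is a reserved word and cannot be used unquoted:

```sql
SELECT sub.id, sub.name, ot.field
FROM (SELECT t.id, t.name
      FROM some_table t          -- hypothetical name; "table" is a reserved word
      WHERE t.something > 10
      ORDER BY t.id              -- still needed: decides which 25 rows LIMIT keeps
      LIMIT 25
     ) sub
LEFT JOIN other_table ot ON ot.table_id = sub.id
ORDER BY sub.id;                 -- guarantees the order of the final result
```

The inner ORDER BY is not redundant here: together with LIMIT it determines which rows are selected, while the outer ORDER BY is the only thing that fixes the order of the output.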

Multiple Selection in Datagrip (Running a CTE that's not in your nested query)

I'm looking for a way to run a nested correlated query that requires a CTE created above the script. For example if I had:
with first_cte as (
select *
from a_table
where 1=1
)
select * from
(select
column_1,
column_2,
column_3
from b_table b
inner join first_cte f on f.user_id = b.user_id
where 1=1) x
If I just wanted to test the nested query, it will say that first_cte doesn't exist. Is there a way to highlight the CTE so that it will run when I'm testing nested queries?
I'm using PostgreSQL btw. Thanks!!!
There are Execute selection and Execution options features in DataGrip.
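Independent of any IDE feature, a CTE only exists within the statement that defines it, so whatever gets executed has to include the WITH clause. One possible workaround, sketched here with the names from the question, is to temporarily make the nested query the main query of the statement:

```sql
-- The WITH clause must be part of the executed statement.
with first_cte as (
    select *
    from a_table
    where 1=1
)
-- The nested query, promoted to the main query for testing.
select
    column_1,
    column_2,
    column_3
from b_table b
inner join first_cte f on f.user_id = b.user_id
where 1=1;
```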

Does SQL execute subqueries fully?

Imagine I have this SQL query and the table2 is HUGE.
select product_id, count(product_id)
from table1
where table2_ptr_id in (select id
from table2
where author is not null)
Will SQL first execute the subquery and load all of table2 into memory? For example, if table1 has 10 rows and table2 has 10 million rows, will it be better to join first and then filter? Or is the DB smart enough to optimize this query as it is written?
You have to EXPLAIN the query to know what it is doing.
However, your query will likely perform better in PostgreSQL if you rewrite it to
SELECT product_id
FROM table1
WHERE EXISTS (SELECT 1
FROM table2
WHERE table2.id = table1.table2_ptr_id
AND table2.author IS NOT NULL);
Then PostgreSQL can use a semi-join, which will probably perform much better with a huge table2.
Remark: the count in your query doesn't make any sense to me.
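To see what the planner actually does with each form, both can be passed to EXPLAIN and the join nodes compared. This sketch assumes table1 and table2 exist as described in the question. The last statement is only a guess at the intent behind the count, namely a count per product_id:

```sql
-- Plan for the original IN form.
EXPLAIN
SELECT product_id
FROM table1
WHERE table2_ptr_id IN (SELECT id FROM table2 WHERE author IS NOT NULL);

-- Plan for the EXISTS form.
EXPLAIN
SELECT product_id
FROM table1
WHERE EXISTS (SELECT 1
              FROM table2
              WHERE table2.id = table1.table2_ptr_id
                AND table2.author IS NOT NULL);

-- Assumption: a count per product was intended, which requires a GROUP BY.
SELECT product_id, count(*)
FROM table1
WHERE EXISTS (SELECT 1
              FROM table2
              WHERE table2.id = table1.table2_ptr_id
                AND table2.author IS NOT NULL)
GROUP BY product_id;
```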

How does COUNT(*) behave in an inner join

Take this query:
SELECT c.CustomerID, c.AccountNumber, COUNT(*) AS CountOfOrders,
SUM(s.TotalDue) AS SumOfTotalDue
FROM Sales.Customer AS c
INNER JOIN Sales.SalesOrderheader AS s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;
I expected COUNT(*) to count the rows in Sales.Customer but to my surprise it counts the number of rows in the joined table.
Any idea why this is? Also, is there a way to be explicit in specifying which table COUNT() should operate on?
Query Processing Order...
The FROM clause is processed before the SELECT clause. That is, by the time SELECT comes into play, there is only one (virtual) table it is selecting from: the individual tables after they have been joined (JOIN), filtered (WHERE), etc.
If you just want to count over the one table, then you might try a couple of things...
COUNT(DISTINCT table1.id)
Or turn the table you want to count into a sub-query with count() inside of it
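Both suggestions can be sketched against the tables from the question. The table and column names are taken from the question as written; the aliases JoinedRows, DistinctCustomers and o are hypothetical:

```sql
-- Option 1: COUNT(*) counts joined rows, COUNT(DISTINCT ...) counts one table's keys.
SELECT COUNT(*) AS JoinedRows,
       COUNT(DISTINCT c.CustomerID) AS DistinctCustomers
FROM Sales.Customer AS c
INNER JOIN Sales.SalesOrderheader AS s ON c.CustomerID = s.CustomerID;

-- Option 2: aggregate the orders in a subquery, then join the result to the customers.
SELECT c.CustomerID, c.AccountNumber, o.CountOfOrders, o.SumOfTotalDue
FROM Sales.Customer AS c
INNER JOIN (
    SELECT CustomerID,
           COUNT(*) AS CountOfOrders,
           SUM(TotalDue) AS SumOfTotalDue
    FROM Sales.SalesOrderheader
    GROUP BY CustomerID
) AS o ON o.CustomerID = c.CustomerID
ORDER BY c.CustomerID;
```

In the second form it is explicit which table each COUNT operates on, because the aggregate runs before the join.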