How does COUNT(*) behave in an inner join - tsql

Take this query:
SELECT c.CustomerID, c.AccountNumber, COUNT(*) AS CountOfOrders,
SUM(s.TotalDue) AS SumOfTotalDue
FROM Sales.Customer AS c
INNER JOIN Sales.SalesOrderheader AS s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;
I expected COUNT(*) to count the rows in Sales.Customer but to my surprise it counts the number of rows in the joined table.
Any idea why this is? Also, is there a way to be explicit in specifying which table COUNT() should operate on?

Query Processing Order...
The FROM clause is processed before the SELECT clause -- which is to say -- by the time SELECT comes into play, there is only one (virtual) table it is selecting from -- namely, the individual tables after their joined (JOIN), filtered (WHERE), etc.
If you just want to count over the one table, then you might try a couple of things...
COUNT(DISTINCT table1.id)
Or turn the table you want to count into a sub-query with count() inside of it

Related

Using LIMIT Statement in INNER JOIN (postgreSQL)

I am having trouble using the LIMIT Statement. I would really appreciate your help.
I am trying to INNER JOIN three tables and use the LIMIT statement to only query a few lines because the tables are so huge.
So, basically, this is what I am trying to accomplish:
SELECT *
FROM ((scheme1.table1
INNER JOIN scheme1.table2
ON scheme1.table1.column1 = scheme1.table2.column1 LIMIT 1)
INNER JOIN scheme1.table3
ON scheme1.table1.column1 = scheme1.table3.column1)
LIMIT 1;
I get an syntax error on the LIMIT from the first INNER JOIN. Why? How can I limit the results I get from each of the INNER JOINS. If I only use the second "LIMIT 1" at the bottom, I will query the entire table.
Thanks a lot!
LIMIT can only be applied to queries, not to a table reference. So you need to use a complete SELECT query for table2 in order to be able to use the LIMIT clause:
SELECT *
FROM schema1.table1 as t1
INNER JOIN (
select *
from schema1.table2
order by ???
limit 1
) as t2 ON t1.column1 = t2.column1
INNER JOIN schema1.table3 as t3 on ON t1.column1 = t3.column1
order by ???
limit 1;
Note that LIMIT without an ORDER BY typically makes no sense as results of a query have no inherent sort order. You should think about applying the necessary ORDER BY in the derived table (aka sub-query) and the outer query to get consistent and deterministic results.

Should I do ORDER BY twice when selecting from subquery?

I have SQL query (code below) which selects some rows from subquery. In subquery I perform ORDER BY.
The question is: will order of subquery be preserved in parent query?
Is there some spec/document or something which proves that?
SELECT sub.id, sub.name, ot.field
FROM (SELECT t.id, t.name
FROM table t
WHERE t.something > 10
ORDER BY t.id
LIMIT 25
) sub
LEFT JOIN other_table ot ON ot.table_id = sub.id
/**order by id?**/```
will order of subquery be preserved in parent query
It might happen, but you can not rely on that.
For example, if the optimizer decides to use a hash join between your derived table and other_table then the order of the derived table will not be preserved.
If you want a guaranteed sort order, then you have to use an order by in the outer query as well.

how to fetch data quickly in join query?

I have 3 tables users, orders and comments every tables has 10087250,24949600 and 26532000 much records, I made this query to counts comments on every order but it is taking more than half an hour to execute, how to speed up this query.
Note: there is already index on foreig_key columns.
select users.user_name, orders.id, count(comments.order_id)
from orders
inner join users on users.id=orders.user_id
inner join comments on orders.id=comments.order_id
group by comments.order_id, users.user_name, orders.id
limit 2;
For the first - probably yuo need ORDER BY clause to use it with LIMIT
If you need most commented pair you can ORDER BY count DESC
The second things comments.order_id = orders.id. Why do you use both for GROUP?
group by comments.order_id, users.user_name, orders.id
May be you can help something like this:
WITH grouped AS (
SELECT order_id AS id, count(*)
FROM comments
GROUP BY 1
ORDER BY 2 DESC
LIMIT 2
)
SELECT u.user_name, g.id, g.count
FROM grouped AS g
JOIN orders AS o ON
o.id = g.id
JOIN users AS u ON
u.id = o.user_id
This allows to avoid join all tables before filtering and grouping
You can try to use temporary tables before aggregating the records. This might help to reduce the query time. Something like this...
CREATE TEMPORARY TABLE temp_table(
...
);
INSERT INTO temp_table
SELECT users.user_name, orders.id, comments.order_id
FROM orders INNER JOIN users ON users.id = orders.user_id INNER JOIN comments ON orders.id = comments.order_id;
SELECT user_name, id, count(order_id) FROM temp_table group by order_id, user_name, id;
I think you need to reduce a unneccessary join between orders and comments tables. All you want to get from table comments is how many comments of an order, so you need to do denormalization.
It means you need to add a comments_count column into orders table, and when every a comment is added to an order, just increase it or decrease it if a comment of order is deleted.
After you add new comments_count column, you need to update comments_count for each order.
Then you can just load orders table and you already have comments count for each order.

Postgres: left join with order by and limit 1

I have the situation:
Table1 has a list of companies.
Table2 has a list of addresses.
Table3 is a N relationship of Table1 and Table2, with fields 'begin' and 'end'.
Because companies may move over time, a LEFT JOIN among them results in multiple records for each company.
begin and end fields are never NULL. The solution to find the latest address is use a ORDER BY being DESC, and to remove older addresses is a LIMIT 1.
That works fine if the query can bring only 1 company. But I need a query that brings all Table1 records, joined with their current Table2 addresses. Therefore, the removal of outdated data must be done (AFAIK) in LEFT JOIN's ON clause.
Any idea how I can build the clause to not create duplicated Table1 companies and bring latest address?
Use a dependent subquery with max() function in a join condition.
Something like in this example:
SELECT *
FROM companies c
LEFT JOIN relationship r
ON c.company_id = r.company_id
AND r."begin" = (
SELECT max("begin")
FROM relationship r1
WHERE c.company_id = r1.company_id
)
INNER JOIN addresses a
ON a.address_id = r.address_id
demo: http://sqlfiddle.com/#!15/f80c6/2
Since PostgreSQL 9.3 there is JOIN LATERAL (https://www.postgresql.org/docs/9.4/queries-table-expressions.html) that allows to make a sub-query to join, so it solves your issue in an elegant way:
SELECT * FROM companies c
JOIN LATERAL (
SELECT * FROM relationship r
WHERE c.company_id = r.company_id
ORDER BY r."begin" DESC LIMIT 1
) r ON TRUE
JOIN addresses a ON a.address_id = r.address_id
The disadvantage of this approach is the indexes of the tables inside LATERAL do not work outside.
I managed to solve it using Windows Function:
WITH ranked_relationship AS(
SELECT
*
,row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) as dt_last_addr
FROM relationship
)
SELECT
company.*
address.*,
dt_last_addr as dt_relationship
FROM
company
LEFT JOIN ranked_relationship as relationship
ON relationship.fk_company = company.pk_company AND dt_last_addr = 1
LEFT JOIN address ON address.pk_address = relationship.fk_address
row_number() creates an int counter for each record, inside each window based to fk_company. For each window, the record with latest date comes first with rank 1, then dt_last_addr = 1 makes sure the JOIN happens only once for each fk_company, with the record with latest address.
Window Functions are very powerful and few ppl use them, they avoid many complex joins and subqueries!

Intersect of Select Statements based on a particular column

I have a Q about INTERSECT clause between two select statements in Sql server 2008.
Select 1 a,b,c ..... INTERSECT Select 2 a,b,c....
Here, the datasets of the two queries should exactly match to return the common elements.
But, I want only column a of both select statements to match.
If the values of column a in both the queries have same values, the entire row should appear in the result set.
Can i Do that and How ??
Thanks,
Marcus..
The best thing to do is to look at the queries itself. DO they need an INTERSECT, of is it possible to make a join with it
for example.
An INTERSECT looks like this
select columnA
from tableA
INTERSECT
select columnAreference
from tableB
Your result would have all columns that are in BOTH tables.. so a join would be more usefull
select columnA
from tableA a
inner join tableB b
on b.columnAReference = a.columnA
If you look into the execution plan you'll see that the INTERSECT will do a left semi join and the inner join will do a, like expected, an inner join. A left semi join isn't something you can tell the query optimizer to do, BUT IT IS FASTER!!!! A left semi join will only return 1 row from the left table, where a normal join will return them all. In this particular case it will be faster.
So an INTERSECT isn't a bad thing which should be eliminated with an INNER JOIN construction, sometimes it will perform even better.
However, to give you the best answer, i will need some more details about your query :)
select * from table1 t1 inner join Table2 t2
on t1.col1=t2.col1