Transact SQL ON EXISTS statement - tsql

I've got a Transact SQL problem which I don't understand.
I have 2 tables tblMedewerker2 and tblMedewerker3.
tblMedewerker2 has got the following values for employeenumber :129, 143,144,145,146,147,169.
tblMedewerker3 has got the following values for employeenumber: 129, 143,144,145,146,147, 166,167,168.
They contain 7 and 9 rows respectively, so the values are unique.
The following query yields 63 rows :
select
a.employeenumber as emp_a
, b.employeenumber as emp_b
, isnull(a.employeenumber, b.employeenumber) as single_employeenumber
from tblMedewerker2 a
full join
tblMedewerker3 b
on exists
(
select a.employeenumber from tblMedewerker2
union
select b.employeenumber from tblmedewerker3
)
whereas this query yields 10 rows:
select
a.employeenumber as emp_a
, b.employeenumber as emp_b
, isnull(a.employeenumber, b.employeenumber) as single_employeenumber
from tblMedewerker2 a
full join
tblMedewerker3 b
on exists
(
select a.employeenumber from tblMedewerker2
intersect
select b.employeenumber from tblmedewerker3
)
Why would the first query turn the SQL into some sort of CROSS JOIN ?
I would say the exists just gives back a TRUE or a FALSE. So why the difference in numbers of records in both queries ?
Thanks !
Rgds
BB

It comes down to how all JOINs work. Let's think about a simple INNER JOIN:
SELECT a.id, b.id
FROM
a
INNER JOIN
b
ON a.id = b.id
This is saying "Compare every row to every row in the other table. When the ON condition is true, include the rows joined together in the result"
Now consider the following valid query:
SELECT a.id, b.id
FROM
a
INNER JOIN
b
ON 1 = 1
Again, the way it works is as described above. "Compare every row to every row in the other table. When the ON condition is true, include the rows joined together in the result". In this case the ON condition is true for all the comparisons.
So if the left table had 7 rows and the right table has 9, you'll get 63 rows. (I put it in a SQL Fiddle for you here to see for yourself: http://sqlfiddle.com/#!18/87097/17)
Your ON EXISTS condition in the first query is always true: for every pair of rows, the UNION subquery returns at least one row (the values themselves), so EXISTS never comes back false. It's very similar to the always-true ON condition in my example above. The fact that it's a FULL JOIN in the first query doesn't matter. Whether it's a LEFT JOIN, INNER JOIN or FULL JOIN, it will return 63 rows.
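In your second query the ON condition is true only in a limited set of circumstances: when the INTERSECT is non-empty, which happens only when a.employeenumber and b.employeenumber are equal. That gives 6 matched pairs, and the FULL JOIN then adds the unmatched rows from each side (169 from tblMedewerker2 and 166, 167, 168 from tblMedewerker3), for 10 rows in total.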
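To see both counts for yourself, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, and it uses INNER JOIN plus two anti-join counts instead of FULL JOIN, since older SQLite builds lack FULL JOIN. The helper count_pairs is a hypothetical name of my own.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblMedewerker2 (employeenumber INTEGER)")
conn.execute("CREATE TABLE tblMedewerker3 (employeenumber INTEGER)")
conn.executemany("INSERT INTO tblMedewerker2 VALUES (?)",
                 [(n,) for n in (129, 143, 144, 145, 146, 147, 169)])
conn.executemany("INSERT INTO tblMedewerker3 VALUES (?)",
                 [(n,) for n in (129, 143, 144, 145, 146, 147, 166, 167, 168)])

JOIN_TEMPLATE = """
SELECT COUNT(*)
FROM tblMedewerker2 a
INNER JOIN tblMedewerker3 b
ON EXISTS (
    SELECT a.employeenumber FROM tblMedewerker2
    {set_operator}
    SELECT b.employeenumber FROM tblMedewerker3
)
"""

def count_pairs(set_operator):
    """Count the (a, b) row pairs for which the ON EXISTS condition is true."""
    query = JOIN_TEMPLATE.format(set_operator=set_operator)
    return conn.execute(query).fetchone()[0]

union_pairs = count_pairs("UNION")          # subquery is never empty
intersect_pairs = count_pairs("INTERSECT")  # non-empty only when a = b

# Rows a FULL JOIN would add on top of the matched pairs.
left_only = conn.execute(
    "SELECT COUNT(*) FROM tblMedewerker2 WHERE employeenumber NOT IN "
    "(SELECT employeenumber FROM tblMedewerker3)").fetchone()[0]
right_only = conn.execute(
    "SELECT COUNT(*) FROM tblMedewerker3 WHERE employeenumber NOT IN "
    "(SELECT employeenumber FROM tblMedewerker2)").fetchone()[0]

print(union_pairs)                               # 63
print(intersect_pairs + left_only + right_only)  # 10
```

The UNION variant pairs every row with every row (7 x 9), while the INTERSECT variant only keeps the 6 equal pairs; the remaining 4 rows of the 10 are the unmatched ones a FULL JOIN preserves.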
As a side note, your second query can be simplified to an ordinary ON clause comparing the employee numbers, because the EXISTS ... INTERSECT condition is true exactly when the two values are equal. You can write your second query as:
select
a.employeenumber as emp_a
, b.employeenumber as emp_b
, isnull(a.employeenumber, b.employeenumber) as single_employeenumber
from tblMedewerker2 a
full join
tblMedewerker3 b
on a.employeenumber = b.employeenumber

As mentioned in my comment above, another solution (rather than an answer to the cartesian product and set intersection part of your question) might be based on a different approach. This is just sketched out and not tested (it's late here and I'm tired):
Generate a CTE of employee IDs and LEFT JOIN it to each table:
WITH EmployeeNumbers AS
(
SELECT DISTINCT employeenumber
FROM
(SELECT employeenumber FROM tblMedewerker2
UNION ALL
SELECT employeenumber FROM tblMedewerker3
) AS p
)
SELECT
t2.employeenumber AS empA,
t3.employeenumber AS empB,
ISNULL(t2.employeenumber, t3.employeenumber) AS single_employeenumber
-- an alternative to the above line
-- EN.employeenumber AS single_employeenumber
FROM
EmployeeNumbers AS EN
LEFT JOIN
tblMedewerker2 AS T2 ON EN.employeenumber = T2.employeenumber
LEFT JOIN
tblMedewerker3 AS T3 ON EN.employeenumber = T3.employeenumber
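For what it's worth, the CTE idea can be tried out quickly. This is a rough sketch that assumes SQLite (via Python's sqlite3) instead of SQL Server and uses a plain UNION to deduplicate the employee numbers, which has the same effect as SELECT DISTINCT over a UNION ALL.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; data taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblMedewerker2 (employeenumber INTEGER)")
conn.execute("CREATE TABLE tblMedewerker3 (employeenumber INTEGER)")
conn.executemany("INSERT INTO tblMedewerker2 VALUES (?)",
                 [(n,) for n in (129, 143, 144, 145, 146, 147, 169)])
conn.executemany("INSERT INTO tblMedewerker3 VALUES (?)",
                 [(n,) for n in (129, 143, 144, 145, 146, 147, 166, 167, 168)])

rows = conn.execute("""
WITH EmployeeNumbers AS (
    -- UNION removes duplicates, so each employee number appears once
    SELECT employeenumber FROM tblMedewerker2
    UNION
    SELECT employeenumber FROM tblMedewerker3
)
SELECT t2.employeenumber AS emp_a,
       t3.employeenumber AS emp_b,
       en.employeenumber AS single_employeenumber
FROM EmployeeNumbers AS en
LEFT JOIN tblMedewerker2 AS t2 ON en.employeenumber = t2.employeenumber
LEFT JOIN tblMedewerker3 AS t3 ON en.employeenumber = t3.employeenumber
ORDER BY en.employeenumber
""").fetchall()

for row in rows:
    print(row)  # one row per distinct employee number, None where a side is missing
```

With the data from the question this yields one row per distinct employee number, with None (NULL) on the side where the number does not occur.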

Related

How to get unique rows by one column but sort by the second

Here is an example query with several joins.
SELECT DISTINCT ON(a.id_1) 1, a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
ORDER BY a.id_1 desc
In this case the query works, and the result is sorted by the unique values of id_1. But I need to sort by the column a.name. In that case, PostgreSQL fails with ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions.
The following query can serve as a solution to the problem:
SELECT *
FROM(
SELECT DISTINCT ON(a.id_1) a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
) AS t
ORDER BY t.name DESC
But in reality the database is very large and such a query is not optimal. Are there other ways to sort by the selected column while keeping the rows unique on id_1?

More Efficient Way to Join Three Tables Together in Postgres

I am attempting to link three tables together in postgres.
All three tables are generated from subqueries. The first table is linked to the second table by the variable call_sign as a FULL JOIN (because I want the superset of entries from both tables). The third table has an INNER JOIN with the second table also on call_sign (but theoretically could have been linked to the first table)
The query runs but is quite slow and I feel will become even slower as I add more data. I realize that there are certain things that I can do to speed things up - like not pulling unnecessary data in the subqueries and not converting text to numbers on the fly. But is there a better way to structure the JOINs between these three tables?
Any advice would be appreciated because I am a novice in postgres.
Here is the code:
select
(CASE
WHEN tmp1.frequency_assigned is NULL
THEN tmp2.lower_frequency
ELSE tmp1.frequency_assigned END) as master_frequency,
(CASE
WHEN tmp1.call_sign is NULL
THEN tmp2.call_sign
ELSE tmp1.call_sign END) as master_call_sign,
(CASE
WHEN tmp1.entity_type is NULL
THEN tmp2.entity_type
ELSE tmp1.entity_type END) as master_entity_type,
(CASE
WHEN tmp1.licensee_id is NULL
THEN tmp2.licensee_id
ELSE tmp1.licensee_id END) as master_licensee_id,
(CASE
WHEN tmp1.entity_name is NULL
THEN tmp2.entity_name
ELSE tmp1.entity_name END) as master_entity_name,
tmp3.market_name
FROM
(select cast(replace(frequency_assigned, ',','.') as decimal) AS frequency_assigned,
frequency_upper_band,
f.uls_file_number,
f.call_sign,
entity_type,
licensee_id,
entity_name
from combo_fr f INNER JOIN combo_en e
ON f.call_sign=e.call_sign
ORDER BY frequency_assigned DESC) tmp1
FULL JOIN
(select cast(replace(lower_frequency, ',','.') as decimal) AS lower_frequency,
upper_frequency,
e.uls_file_number,
mf.call_sign,
entity_type,
licensee_id,
entity_name
FROM market_mf mf INNER JOIN combo_en e
ON mf.call_sign=e.call_sign
ORDER BY lower_frequency DESC) tmp2
ON tmp1.call_sign=tmp2.call_sign
INNER JOIN
(select en.call_sign,
mk.market_name
FROM combo_mk mk
INNER JOIN combo_en en
ON mk.call_sign=en.call_sign) tmp3
ON tmp2.call_sign=tmp3.call_sign
ORDER BY master_frequency DESC;
You'll want to unwind those subqueries and do it all in one join, if you can. Something like:
select <whatever you need>
from combo_fr f
JOIN combo_en e ON f.call_sign=e.call_sign
JOIN market_mf mf ON mf.call_sign=e.call_sign
JOIN combo_mk mk ON mk.call_sign=e.call_sign
I can't completely grok what you're doing, but some of the join clauses might have to become LEFT JOINs in order to deal with places where the call sign does or does not appear.
After creating indexes on call_sign for all four involved tables, try this:
WITH nodup AS (
SELECT call_sign FROM market_mf
EXCEPT SELECT call_sign FROM combo_fr
) SELECT
CAST(REPLACE(u.master_frequency_string, ',','.') AS DECIMAL)
AS master_frequency,
u.call_sign AS master_call_sign,
u.entity_type AS master_entity_type,
u.licensee_id AS master_licensee_id,
u.entity_name AS master_entity_name,
combo_mk.market_name
FROM (SELECT frequency_assigned AS master_frequency_string, call_sign,
entity_type, licensee_id, entity_name
FROM combo_fr
UNION ALL SELECT lower_frequency, call_sign,
entity_type, licensee_id, entity_name
FROM market_mf INNER JOIN nodup USING (call_sign)
) AS u
INNER JOIN combo_en USING (call_sign)
INNER JOIN combo_mk USING (call_sign)
ORDER BY 1 DESC;
I post this because this is the simplest way to understand what you need.
If there are no call_sign values which appear in both market_mf and
combo_fr, WITH nodup ... and INNER JOIN nodup ... can be omitted.
I am making the assumption that call_sign is unique in both combo_fr and market_mf ( = there are no two records in each table with the same value), even if there can be values which can appear in both tables.
It is very unfortunate that you order by a computed column, and that the computation is so silly. A sure optimization would be to convert the frequency strings once and for all in the table itself. The steps would be:
(1) add numeric frequency columns to your tables
(2) populate them with the values converted from the current text columns
(3) convert new values directly into the new columns, by inputting them with a locale which has the desired decimal separator.
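Steps (1) and (2) could look roughly like this. The sketch assumes SQLite through Python's sqlite3 rather than PostgreSQL (where the new column would more likely be NUMERIC than REAL); the column name frequency_assigned_num and the sample call signs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a cut-down combo_fr with the frequency stored as comma-decimal text.
conn.execute("CREATE TABLE combo_fr (call_sign TEXT, frequency_assigned TEXT)")
conn.executemany("INSERT INTO combo_fr VALUES (?, ?)",
                 [("KA100", "451,5"), ("KB200", "462,125")])

# Step 1: add a numeric column next to the text column (hypothetical name).
conn.execute("ALTER TABLE combo_fr ADD COLUMN frequency_assigned_num REAL")

# Step 2: convert the existing text values once, instead of on every query.
conn.execute("""
UPDATE combo_fr
SET frequency_assigned_num = CAST(REPLACE(frequency_assigned, ',', '.') AS REAL)
""")

# Queries can now sort on the stored number directly.
rows = conn.execute("""
SELECT call_sign, frequency_assigned_num
FROM combo_fr
ORDER BY frequency_assigned_num DESC
""").fetchall()
print(rows)  # [('KB200', 462.125), ('KA100', 451.5)]
```

Once the numeric column exists, the ORDER BY no longer needs the CAST(REPLACE(...)) expression at query time.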

Can't solve this SQL query

I have a difficulty dealing with a SQL query. I use PostgreSQL.
The task is: show the customers that have placed at least one order that contains products from 3 different categories. The result should have 2 columns: CustomerID and the number of orders. I have written this code, but I don't think it's correct.
select SalesOrderHeader.CustomerID,
count(SalesOrderHeader.SalesOrderID) AS amount_of_orders
from SalesOrderHeader
inner join SalesOrderDetail on
(SalesOrderHeader.SalesOrderID=SalesOrderDetail.SalesOrderID)
inner join Product on
(SalesOrderDetail.ProductID=Product.ProductID)
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
from Product
group by ProductCategoryID
having count(DISTINCT ProductCategoryID)>=3)
group by SalesOrderHeader.CustomerID;
Here are the database tables needed for the query:
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
Is never going to give you a result as an ID (SalesOrderDetailID) will never logically match a COUNT (count(ProductCategoryID)).
This should get you the output I think you want.
SELECT soh.CustomerID, COUNT(soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
INNER JOIN SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Product p ON sod.ProductID = p.ProductID
GROUP BY soh.CustomerID
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3
Try this :
select CustomerID,count(*) as amount_of_order from
SalesOrderHeader join
(
select SalesOrderID,count(distinct ProductCategoryID) CategoryCount
from SalesOrderDetail JOIN Product using (ProductId)
group by 1
) CatCount using (SalesOrderId)
group by 1
having bool_or(CategoryCount>=3) -- at least one order with CategoryCount>=3
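To check the idea of counting categories per order on a tiny data set, here is a sketch under a few assumptions: SQLite via Python's sqlite3 instead of PostgreSQL, cut-down tables with invented rows, and HAVING MAX(category_count) >= 3 swapped in for bool_or, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: minimal versions of the three tables, with made-up sample data.
conn.executescript("""
CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER, CustomerID INTEGER);
CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, ProductID INTEGER);
CREATE TABLE Product (ProductID INTEGER, ProductCategoryID INTEGER);

INSERT INTO Product VALUES (1, 1), (2, 2), (3, 3);
-- Customer 1: order 10 spans three categories, order 11 spans one.
-- Customer 2: three categories overall, but never within a single order.
INSERT INTO SalesOrderHeader VALUES (10, 1), (11, 1), (20, 2), (21, 2);
INSERT INTO SalesOrderDetail VALUES
    (10, 1), (10, 2), (10, 3),
    (11, 1),
    (20, 1), (20, 2),
    (21, 3);
""")

rows = conn.execute("""
SELECT h.CustomerID, COUNT(*) AS amount_of_orders
FROM SalesOrderHeader h
JOIN (
    -- number of distinct categories in each order
    SELECT d.SalesOrderID,
           COUNT(DISTINCT p.ProductCategoryID) AS category_count
    FROM SalesOrderDetail d
    JOIN Product p ON p.ProductID = d.ProductID
    GROUP BY d.SalesOrderID
) c ON c.SalesOrderID = h.SalesOrderID
GROUP BY h.CustomerID
HAVING MAX(c.category_count) >= 3   -- at least one qualifying order
""").fetchall()
print(rows)  # [(1, 2)]
```

Customer 2 is left out because the three categories are spread over two orders, which is the difference between counting per order and counting per customer.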

Get maximum value of an aggregate function

I want to only return the row where the count(object) is the highest, so I have written this query
select klantnr, count(objectnaam)
from klanten inner join deelnames using(klantnr)
inner join reizen using(reisnr)
inner join bezoeken using(reisnr)
where objectnaam = 'Maan'
group by klantnr
Now, I can't do
select max(count(objectnaam))
How would I go about solving this problem?
I have tried by using a subquery which is equally invalid
select max(select count(objectnaam) from ....)
I think I need a subquery in the from, so I have rewritten the query like this which I think is closer to the actual answer but still not right, as now it returns the maximum value of all rows.
select klantnr, max(c)
FROM(
select klantnr, count(objectnaam) as c
from klanten inner join deelnames using(klantnr)
inner join reizen using(reisnr)
inner join bezoeken using(reisnr)
where objectnaam = 'Maan'
group by klantnr) as F
group by klantnr
thanks for any help you can give me!
You did not provide the structure of the tables, so you will probably have to modify the following query. It also needs a PostgreSQL version that supports WITH queries (9.x or later will do).
WITH t AS (
SELECT klantnr, COUNT(objectnaam) AS c
FROM klanten
WHERE objectnaam = 'Maan'
GROUP BY klantnr
ORDER BY c DESC
LIMIT 1
)
SELECT * FROM t
INNER JOIN deelnames USING(klantnr)
INNER JOIN reizen USING(reisnr)
INNER JOIN bezoeken USING(reisnr);
See http://www.postgresql.org/docs/9.3/static/queries-with.html for how to use WITH queries.
I have found a simpler solution:
select klantnr,count (klantnr)
from bezoeken natural join deelnames
where objectnaam ='Maan'
group by klantnr
order by count desc
limit 1
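The ORDER BY ... LIMIT 1 approach can be tried on a toy data set. This sketch assumes SQLite via Python's sqlite3 instead of PostgreSQL, a guessed schema (deelnames links klantnr to reisnr, bezoeken links reisnr to objectnaam), an explicit join instead of NATURAL JOIN, and an alias aantal for the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: schema guessed from the USING clauses in the question.
conn.executescript("""
CREATE TABLE deelnames (klantnr INTEGER, reisnr INTEGER);
CREATE TABLE bezoeken (reisnr INTEGER, objectnaam TEXT);

INSERT INTO deelnames VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO bezoeken VALUES (1, 'Maan'), (2, 'Maan'), (2, 'Mars');
""")

top = conn.execute("""
SELECT d.klantnr, COUNT(b.objectnaam) AS aantal
FROM deelnames d
JOIN bezoeken b ON b.reisnr = d.reisnr
WHERE b.objectnaam = 'Maan'
GROUP BY d.klantnr
ORDER BY aantal DESC   -- highest count first
LIMIT 1                -- keep only the top row
""").fetchone()
print(top)  # (1, 2)
```

Keep in mind that LIMIT 1 returns a single row even when several customers tie for the highest count.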

Join table variable vs join view

I have a stored procedure which is running quite slow. Therefore I want to extract some of the query in a separate view.
My code looks something like this:
DECLARE @tmpTable TABLE(..)
INSERT INTO @tmpTable (..) *query* (returns 3000 rows)
Select ... from table1
inner join table2
inner join table3
inner join @tmpTable
...
I then extract (copy-paste) the *query* and put it in a view - i.e. vView.
Doing this will then give me a different result:
Select ... from table1
inner join table2
inner join table3
inner join vView
...
Why? I can see that vView and @tmpTable both return 3000 rows, so they should match (I also ran an EXCEPT query to check).
Any comments would be much appreciated, as I feel quite stuck with this.
EDITED:
This is the full query for getting the result (using @tmpTable or vView gives me different results, although they appear the same):
select dep.sid as depsid, dep.[name], COUNT(b.sid) as possiblelogins, count(ls.clientsid) as logins
from department dep
inner join relationship r on dep.sid=r.primarysid and r.relationshiptypeid=27 and r.validto is null
inner join [user] u on r.secondarysid=u.sid
inner join relationship r2 on u.sid=r2.secondarysid and r2.validto is null and r2.relationshiptypeid in (1,37)
inner join client c on r2.primarysid=c.sid
inner join ***@tmpTable or vView*** b on b.sid = c.sid
left outer join (select distinct clientsid from logonstatistics) as ls on b.sid=ls.clientsid
GROUP BY dep.sid, dep.[name],dep.isdepartment
HAVING dep.isdepartment=1
You may not need the view/table at all if you change the query as follows.
It joins on to client c and appears to be there only to JOIN onto logonstatistics.
--remove inner join ***@tmpTable or vView*** b on b.sid = c.sid
--change JOIN
left outer join (select distinct clientsid from logonstatistics) as ls on c.sid=ls.clientsid
And change COUNT(b.sid) to COUNT(c.sid) in the SELECT clause
Otherwise, if you get different results, I can see two possible causes:
Table and view have different data. Have you run a line-by-line comparison?
One has NULL, one has a value (especially for the sid column, which will affect the JOIN).
Finally, when you say "different results", do you mean you get x2 or x3 rows? A different COUNT? What?