SQL GROUP BY HAVING issue

I have two tables of records and I need to find all of the matches between them. The tables use different primary key values, but the data points are exactly the same. I need a fast query that shows me which records in the first table are duplicated in the second. Here is an example of what I am trying to do:
DECLARE @Table1 TABLE (ID INT, Value INT)
DECLARE @Table2 TABLE (ID INT, Value INT)
INSERT INTO @Table1 VALUES (1, 500)
INSERT INTO @Table1 VALUES (2, 500)
INSERT INTO @Table2 VALUES (3, 500)
INSERT INTO @Table2 VALUES (4, 500)
SELECT MAX(x.T1ID)
,MAX(x.T2ID)
FROM (
SELECT T1ID = t1.ID
,T2ID = 0
,t1.Value
FROM @Table1 t1
UNION ALL
SELECT T1ID = 0
,T2ID = t2.ID
,t2.Value
FROM @Table2 t2
) x
GROUP BY x.Value
HAVING COUNT(*) >= 2
The problem with this code is that it correlates record 2 in table 1 with record 4 in table 2. I really need it to correlate record 1 in table 1 with record 3 in table 2. I tried the following:
SELECT MIN(x.T1ID)
,MIN(x.T2ID)
FROM (
SELECT T1ID = t1.ID
,T2ID = 0
,t1.Value
FROM @Table1 t1
UNION ALL
SELECT T1ID = 0
,T2ID = t2.ID
,t2.Value
FROM @Table2 t2
) x
GROUP BY x.Value
HAVING COUNT(*) >= 2
This code does not work either. It returns 0,0.
Is there a way to return the MIN value greater than 0 for both tables?

I might have answered my own question. This seems to work. Are there any reasons why I should not do this?
SELECT MIN(t1.ID)
,MIN(t2.ID)
FROM @Table1 t1
INNER JOIN @Table2 t2 ON t1.Value = t2.Value
GROUP BY t1.Value

If you want to see the records in table1 that have matches in table2, then:
select *
from @Table1 T1
where exists (select * from @Table2 T2
where T2.Value = T1.Value
-- you would put the complete join clause that defines a match here
)
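To answer the MIN question directly, here is a minimal sketch that keeps the question's UNION ALL shape but uses NULL instead of 0 as the filler column. MIN ignores NULLs, so it returns the smallest real ID from each table. The HAVING clause is also tightened (an assumption about what counts as a match) so that a value must appear in both tables, rather than just twice in one of them. One reason to prefer this over the INNER JOIN version on large tables: the join produces every pairing of rows that share a Value (n × m rows) before grouping, while the UNION ALL version passes each row through the aggregate only once.

```sql
-- Sketch: same UNION ALL shape as the question, but with NULL instead of 0
-- as the filler, so MIN() only sees real IDs from each table.
DECLARE @Table1 TABLE (ID INT, Value INT);
DECLARE @Table2 TABLE (ID INT, Value INT);
INSERT INTO @Table1 VALUES (1, 500), (2, 500);
INSERT INTO @Table2 VALUES (3, 500), (4, 500);

SELECT MIN(x.T1ID) AS T1ID      -- NULL fillers are ignored by MIN
      ,MIN(x.T2ID) AS T2ID
      ,x.Value
FROM (
    SELECT T1ID = t1.ID, T2ID = CAST(NULL AS INT), t1.Value
    FROM @Table1 t1
    UNION ALL
    SELECT T1ID = CAST(NULL AS INT), T2ID = t2.ID, t2.Value
    FROM @Table2 t2
) x
GROUP BY x.Value
-- COUNT(column) skips NULLs, so this keeps only values present in BOTH tables
HAVING COUNT(x.T1ID) > 0
   AND COUNT(x.T2ID) > 0;
-- Expected for the sample data: T1ID = 1, T2ID = 3, Value = 500
```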

Related

Delete duplicate rows with different values in columns

I couldn't find my case on the Internet. How can I delete duplicates when the same values appear in different columns?
I have a table with a lot of values, for example:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
|89417978|89417980|
|89417979|89417980|
I need to exclude duplicates and leave in the answer only:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
min/max does not work here, as the values may be different.
I tried union/join on the table and excluding results with temporary tables, but I keep ending up back where I started.
Assuming id1 and id2 together form the primary key, you could try this:
DECLARE @tbl table (id1 int, id2 int )
INSERT INTO @tbl
SELECT 89417980, 89417978
UNION SELECT 89417980, 89417979
UNION SELECT 89417978, 89417980
UNION SELECT 89417979, 89417980
SELECT * FROM @tbl
;WITH CTE AS (--Get comparable value as "cs"
SELECT
IIF(id1 > id2, CHECKSUM(id1, id2), CHECKSUM(id2,id1)) as cs
, id1
, id2
, ROW_NUMBER() OVER (order by id1, id2) as rn
FROM @tbl
)
, CTE2 AS ( --Get rows to keep: one per pair. No HAVING COUNT(*) > 1 filter, so a pair without a mirrored copy keeps its single row
SELECT MAX (rn) as rn
FROM CTE
GROUP BY cs
)
DELETE tbl -- Delete all except the rows to keep
FROM @tbl tbl
WHERE NOT EXISTS(SELECT 1
FROM CTE2
JOIN CTE ON CTE.rn = CTE2.rn
WHERE CTE.id1 = tbl.id1
AND CTE.id2 = tbl.id2
)
SELECT * FROM @tbl
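A simpler sketch that avoids CHECKSUM entirely (CHECKSUM can return the same value for two different pairs, which would group unrelated rows together): delete a row only when its mirror image exists, and always delete the orientation with the smaller value first, which matches the rows kept in the desired output. It assumes a pair never appears twice in the same orientation; exact duplicates would need a separate pass. The table name @pairs is just for illustration.

```sql
-- Sketch: remove a row only if its mirrored pair exists,
-- always keeping the orientation where id1 > id2.
DECLARE @pairs TABLE (id1 INT, id2 INT);
INSERT INTO @pairs VALUES
    (89417980, 89417978),
    (89417980, 89417979),
    (89417978, 89417980),
    (89417979, 89417980);

DELETE p
FROM @pairs AS p
WHERE p.id1 < p.id2                 -- only ever remove the "smaller first" copy
  AND EXISTS (SELECT 1
              FROM @pairs AS m
              WHERE m.id1 = p.id2   -- the mirrored row exists
                AND m.id2 = p.id1);

SELECT id1, id2 FROM @pairs ORDER BY id1, id2;
-- Expected: (89417980, 89417978) and (89417980, 89417979)
```

Rows that have no mirror are left alone, so unlike a GROUP BY on a checksum, nothing is deleted just because it happens to be unique.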

Is there a shortcut to deleting all in one table not in another?

Are there any shortcuts for deleting everything in one table that does not exist in the second?
I know I can do this:
DECLARE @Table1 TABLE (ID INT)
DECLARE @Table2 TABLE (ID INT)
INSERT INTO @Table1 VALUES (1),(2),(3),(4)
INSERT INTO @Table2 VALUES (3),(4)
DELETE t1
FROM @Table1 t1
WHERE NOT EXISTS (SELECT 1 FROM @Table2 t2 WHERE t2.ID = t1.ID)
SELECT * FROM @Table1
However, I have over 600 columns, so you can see why I might be reluctant to go that route if there's another way. What I WANT to do would look like this:
DECLARE @Table1 TABLE (ID INT)
DECLARE @Table2 TABLE (ID INT)
INSERT INTO @Table1 VALUES (1),(2),(3),(4)
INSERT INTO @Table2 VALUES (3),(4)
DELETE @Table1
EXCEPT SELECT * FROM @Table2
That EXCEPT has been very handy in dealing with this project I'm working on, but I guess it's limited.
Please use this:
DELETE FROM @Table1 WHERE BINARY_CHECKSUM(*) NOT IN(SELECT BINARY_CHECKSUM(*) FROM @Table2);
But be careful: BINARY_CHECKSUM is a hash, so in rare cases two different rows produce the same checksum (worth checking especially if your table contains float data types), and such a row will not be deleted. The checksum is deterministic, so running the delete a second time will not remove it either.
Sure:
DELETE t1
FROM @Table1 t1
LEFT JOIN @Table2 t2 ON t2.ID = t1.ID
WHERE t2.ID IS NULL
My first answer covered the case where the t1 and t2 tables have the same structure and the corresponding columns are compared when deciding what to delete.
OK, now the other situation: your @Table1 column [ID] can be matched against any (unknown) column of @Table2. You can solve the 600+ column problem using XML:
DELETE FROM @Table1 WHERE CONVERT(NVARCHAR, [ID]) NOT IN
(
SELECT
[col].[value]('(.)[1]', 'NVARCHAR(MAX)')
FROM
(
SELECT [xml] = (CONVERT(XML, (SELECT * FROM @Table2 FOR XML PATH('t2'))))
) AS [t2]
CROSS APPLY [t2].[xml].[nodes]('t2/*') AS [tab]([col])
);
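If the goal is to compare every column without typing 600 of them, another option is to build the NOT EXISTS predicate from the catalog and run it as dynamic SQL. This is a sketch under several assumptions: SQL Server 2017 or later (for STRING_AGG), both tables have the same column names, every column can be compared with = (no xml, text or image columns), and temp tables are used instead of table variables, because a table variable declared in the outer batch is not visible inside sp_executesql. #Target and #Source are hypothetical names. The extra IS NULL test per column makes NULLs match each other, which is how EXCEPT treats them.

```sql
-- Sketch (SQL Server 2017+): delete rows of #Target that have no identical row in #Source,
-- generating the column-by-column comparison from the catalog instead of by hand.
-- #Target and #Source are hypothetical stand-ins for the real 600-column tables.
CREATE TABLE #Target (ID INT, Name VARCHAR(20));
CREATE TABLE #Source (ID INT, Name VARCHAR(20));
INSERT INTO #Target VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, NULL);
INSERT INTO #Source VALUES (3, 'c'), (4, NULL);

DECLARE @predicate NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- One "(s.[col] = t.[col] OR (s.[col] IS NULL AND t.[col] IS NULL))" term per column,
-- joined with AND; the CAST to NVARCHAR(MAX) avoids the 4000-character limit.
SELECT @predicate = STRING_AGG(
           CAST(N'(s.' + QUOTENAME(c.name) + N' = t.' + QUOTENAME(c.name)
                + N' OR (s.' + QUOTENAME(c.name) + N' IS NULL AND t.'
                + QUOTENAME(c.name) + N' IS NULL))' AS NVARCHAR(MAX)),
           N' AND ')
FROM tempdb.sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'tempdb..#Target');

SET @sql = N'DELETE t FROM #Target AS t WHERE NOT EXISTS '
         + N'(SELECT 1 FROM #Source AS s WHERE ' + @predicate + N');';

-- Temp tables created by the caller are visible inside sp_executesql.
EXEC sys.sp_executesql @sql;

SELECT * FROM #Target;   -- expected: (3, 'c') and (4, NULL)
```

For real (permanent) tables, the same query works against sys.columns with OBJECT_ID('dbo.YourTable'), where the table name is again a placeholder.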

sql recursion: find tree given middle node

I need to get the tree of related nodes given a certain node, which is not necessarily the top node. I've got a solution using two CTEs, since I am struggling to squeeze it all into one CTE :). Does anybody have a sleek solution that avoids using two CTEs? Here is some code that I was playing with:
DECLARE @temp AS TABLE (ID INT, ParentID INT)
INSERT INTO @temp
SELECT 1 ID, NULL AS ParentID
UNION ALL
SELECT 2, 1
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 3
UNION ALL
SELECT 5, 4
UNION ALL
SELECT 6, NULL
UNION ALL
SELECT 7, 6
UNION ALL
SELECT 8, 7
DECLARE @startNode INT = 4
;WITH TheTree (ID,ParentID)
AS (
SELECT ID, ParentID
FROM @temp
WHERE ID = @startNode
UNION ALL
SELECT t.id, t.ParentID
FROM @temp t
JOIN TheTree tr ON t.ParentID = tr.ID
)
SELECT * FROM TheTree
;WITH Up(ID,ParentID)
AS (
SELECT t.id, t.ParentID
FROM @temp t
WHERE t.ID = @startNode
UNION ALL
SELECT t.id, t.ParentID
FROM @temp t
JOIN Up c ON t.id = c.ParentID
)
--SELECT * FROM Up
,TheTree (ID,ParentID)
AS (
SELECT ID, ParentID
FROM Up
WHERE ParentID is null
UNION ALL
SELECT t.id, t.ParentID
FROM @temp t
JOIN TheTree tr ON t.ParentID = tr.ID
)
SELECT * FROM TheTree
Meh. This avoids using two CTEs, but the result is a brute-force kludge that hardly qualifies as "sleek", as it won’t be efficient if your table is at all sizeable. It will:
Recursively build all possible hierarchies
As you build them, flag the target NodeId as you find it
Return only the targeted tree
I threw in column “TreeNumber” on the off-chance the TargetId appears in multiple hierarchies, or if you’d ever have multiple values to check in one pass. “Depth” was added to make the output a bit more legible.
A more complex solution like @John’s might do, and more and subtler tricks could be done with more detailed table structures.
DECLARE @startNode INT = 4
;WITH cteAllTrees (TreeNumber, Depth, ID, ParentID, ContainsTarget)
AS (
SELECT
row_number() over (order by ID) TreeNumber
,1
,ID
,ParentID
,case
when ID = @startNode then 1
else 0
end ContainsTarget
FROM @temp
WHERE ParentId is null
UNION ALL
SELECT
tr.TreeNumber
,tr.Depth + 1
,t.id
,t.ParentID
,case
when tr.ContainsTarget = 1 then 1
when t.ID = @startNode then 1
else 0
end ContainsTarget
FROM @temp t
INNER JOIN cteAllTrees tr
ON t.ParentID = tr.ID
)
SELECT
TreeNumber
,Depth
,ID
,ParentId
from cteAllTrees
where TreeNumber in (select TreeNumber from cteAllTrees where ContainsTarget = 1)
order by
TreeNumber
,Depth
,ID
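A less brute-force way to stay within one CTE relies on SQL Server allowing more than one recursive member in a single recursive CTE. In this sketch, rows tagged 'U' walk up from the start node to the root; once the root is reached, a second recursive member tags rows 'D' and walks back down, so only the tree that contains the start node is visited. The Dir column is an addition of this sketch, not part of the question's schema.

```sql
-- Sketch: one recursive CTE with two recursive members.
-- 'U' rows walk up from the start node; once the root is reached,
-- 'D' rows walk back down and collect the whole tree.
DECLARE @temp TABLE (ID INT, ParentID INT);
INSERT INTO @temp VALUES
    (1, NULL), (2, 1), (3, 2), (4, 3), (5, 4),
    (6, NULL), (7, 6), (8, 7);

DECLARE @startNode INT = 4;

WITH Tree (ID, ParentID, Dir) AS (
    -- anchor: the start node, heading up
    SELECT ID, ParentID, CAST('U' AS CHAR(1))
    FROM @temp
    WHERE ID = @startNode
    UNION ALL
    -- recursive member 1: the parent of every 'U' row
    SELECT p.ID, p.ParentID, CAST('U' AS CHAR(1))
    FROM @temp p
    JOIN Tree t ON p.ID = t.ParentID
    WHERE t.Dir = 'U'
    UNION ALL
    -- recursive member 2: children of the root (reached going up) and of every 'D' row
    SELECT c.ID, c.ParentID, CAST('D' AS CHAR(1))
    FROM @temp c
    JOIN Tree t ON c.ParentID = t.ID
    WHERE t.Dir = 'D'
       OR (t.Dir = 'U' AND t.ParentID IS NULL)
)
SELECT ID, ParentID
FROM Tree
WHERE Dir = 'D' OR ParentID IS NULL   -- the root plus everything below it
ORDER BY ID;
-- Expected for @startNode = 4: IDs 1, 2, 3, 4, 5
```

If the start node is itself a root (for example 6), the anchor row already has a NULL ParentID, so the downward member starts immediately and returns 6, 7 and 8.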
Here is a technique where you can select the entire hierarchy, a specific node with all its children, and even a filtered list and how they roll.
Note: See the comments next to the DECLAREs
Declare @YourTable table (id int,pt int,name varchar(50))
Insert into @YourTable values
(1,null,'1'),(2,1,'2'),(3,1,'3'),(4,2,'4'),(5,2,'5'),(6,3,'6'),(7,null,'7'),(8,7,'8')
Declare @Top int = null --<< Sets top of Hier Try 2
Declare @Nest varchar(25) = '|-----' --<< Optional: Added for readability
Declare @Filter varchar(25) = '' --<< Empty for All or try 4,6
;with cteP as (
Select Seq = cast(1000+Row_Number() over (Order by name) as varchar(500))
,ID
,pt
,Lvl=1
,name
From @YourTable
Where IsNull(@Top,-1) = case when @Top is null then isnull(pt,-1) else ID end
Union All
Select Seq = cast(concat(p.Seq,'.',1000+Row_Number() over (Order by r.name)) as varchar(500))
,r.ID
,r.pt
,p.Lvl+1
,r.name
From @YourTable r
Join cteP p on r.pt = p.ID)
,cteR1 as (Select *,R1=Row_Number() over (Order By Seq) From cteP)
,cteR2 as (Select A.Seq,A.ID,R2=Max(B.R1) From cteR1 A Join cteR1 B on (B.Seq like A.Seq+'%') Group By A.Seq,A.ID )
Select Distinct
A.R1
,B.R2
,A.ID
,A.pt
,A.Lvl
,name = Replicate(@Nest,A.Lvl-1) + A.name
From cteR1 A
Join cteR2 B on A.ID=B.ID
Join (Select R1 From cteR1 where IIF(@Filter='',1,0)+CharIndex(concat(',',ID,','),concat(',',@Filter+','))>0) F on F.R1 between A.R1 and B.R2
Order By A.R1

JOIN 2 Tables get Unique values from 2 columns

I have a (I think) complicated problem and have no idea how to solve it in SQL (I've been at it the whole day). I have turned the logic around a couple of times, and something is always missing.
There is a join between 2 tables that hold different FK references to a 3rd table.
How do I join those 2 tables so that I am sure all FK combinations are present and all are unique?
I need the 2 FK columns combined into one so I can later join to the 3rd table. NULLs are possible. GROUP BY is not possible, since I need to know where each record comes from (I need Id_1 and Id_2 in the result).
Here is the sample code:
DECLARE @T1 TABLE (Id int, CommonId int, FK_Id_1 int)
DECLARE @T2 TABLE (Id int,CommonId int, FK_Id_2 int)
INSERT INTO @T1 VALUES (1,1,1)
INSERT INTO @T1 VALUES (2,1,2)
INSERT INTO @T1 VALUES (3,2,3)
INSERT INTO @T1 VALUES (4,3,NULL)
INSERT INTO @T1 VALUES (5,4,NULL)
INSERT INTO @T2 VALUES (11,1,1)
INSERT INTO @T2 VALUES (12,2,2)
INSERT INTO @T2 VALUES (13,2,3)
INSERT INTO @T2 VALUES (14,4,5)
SELECT t1.Id as Id_1,t2.Id as Id_2, t1.CommonId, t1.FK_Id_1, t2.FK_Id_2,
COUNT(t1.FK_Id_1) OVER (PARTITION BY t1.FK_Id_1) AS T1_RANK,
COUNT(t2.FK_Id_2) OVER (PARTITION BY t2.FK_Id_2) AS T2_RANK
FROM @T1 t1
FULL JOIN @T2 t2 on t1.CommonId = t2.CommonId
ORDER BY CommonId
This query is returning this:
Id_1 Id_2 CommonId FK_Id_1 FK_Id_2 T1_RANK T2_RANK
----------- ----------- ----------- ----------- ----------- ----------- -----------
1 11 1 1 1 1 2
2 11 1 2 1 1 2
3 12 2 3 2 2 1
3 13 2 3 3 2 1
4 NULL 3 NULL NULL 0 0
5 14 4 NULL 5 0 1
and i need somehow to make it look like this:
Id_1 Id_2 CommonId FK_Id
----------- ----------- ----------- -----------
1 11 1 1
2 11 1 2
3 12 2 2
3 13 2 3
4 NULL 3 NULL
5 14 4 5
I tried something like SELECT COALESCE(FK_Id_1,FK_Id_2) AS FK_Id, but this always gives T1 priority. I am thinking of some way to switch the priority depending on the duplicated values.
I have an ugly solution that looks like this, but I am looking for better ideas.
;WITH tmp as (
SELECT t1.Id as Id_1,t2.Id as Id_2, t1.CommonId, t1.FK_Id_1, t2.FK_Id_2,
COUNT(t1.FK_Id_1) OVER (PARTITION BY t1.FK_Id_1) AS T1_RANK,
COUNT(t2.FK_Id_2) OVER (PARTITION BY t2.FK_Id_2) AS T2_RANK
FROM @T1 t1
FULL JOIN @T2 t2 on t1.CommonId = t2.CommonId)
SELECT Id_1, Id_2, CommonId,
CASE
WHEN T1_RANK > T2_RANK THEN COALESCE(FK_Id_2,FK_Id_1)
WHEN T2_RANK > T1_RANK THEN COALESCE(FK_Id_1,FK_Id_2)
ELSE COALESCE(FK_Id_1,FK_Id_2) -- equal ranks: without an ELSE the CASE would return NULL
END AS FK_Id
FROM tmp
ORDER BY CommonId
I don't know if I explained the whole situation correctly. I must join the tables, because I have other columns coming only from T1 and T2 (I cannot UNION and then DISTINCT, because that would also select the NULLs).
Just pick the CommonId values, and then full join to both tables.
The query below reproduces your desired result except for the second row (Id_1 = 2, Id_2 = 11), where it returns FK_Id 1 instead of 2, because ISNULL prefers t2.FK_Id_2 whenever it is not NULL.
;WITH cte AS (
SELECT CommonId FROM @T1
UNION SELECT CommonId FROM @T2
)
SELECT t1.Id AS Id_1, t2.Id AS Id_2, cte.CommonId, ISNULL(t2.FK_Id_2, t1.FK_Id_1) AS FK_Id
FROM cte
FULL OUTER JOIN @T1 t1 ON cte.CommonId = t1.CommonId
FULL OUTER JOIN @T2 t2 ON cte.CommonId = t2.CommonId
Beware that the result of FK_Id changes with the argument order in the ISNULL expression:
ISNULL(t2.FK_Id_2, t1.FK_Id_1) is not the same as ISNULL(t1.FK_Id_1, t2.FK_Id_2)
In my opinion this alternative version fits your request better, since it takes both FK options.
;WITH cte AS (
SELECT CommonId FROM @T1
UNION SELECT CommonId FROM @T2
)
SELECT t1.Id AS Id_1, t2.Id AS Id_2, cte.CommonId, ISNULL(t2.FK_Id_2, t1.FK_Id_1) AS FK_Id--, cte.CommonId, *
FROM cte
FULL OUTER JOIN @T1 t1 ON cte.CommonId = t1.CommonId
FULL OUTER JOIN @T2 t2 ON cte.CommonId = t2.CommonId
UNION
SELECT t1.Id AS Id_1, t2.Id AS Id_2, cte.CommonId, ISNULL(t1.FK_Id_1, t2.FK_Id_2) AS FK_Id--, cte.CommonId, *
FROM cte
FULL OUTER JOIN @T1 t1 ON cte.CommonId = t1.CommonId
FULL OUTER JOIN @T2 t2 ON cte.CommonId = t2.CommonId
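The argument-order caveat can be checked on a single row. In the sample data, the row Id_1 = 2 / Id_2 = 11 has both FK columns filled (FK_Id_1 = 2, FK_Id_2 = 1), so ISNULL simply returns whichever argument comes first; the order only stops mattering when one side is NULL. This is also why ISNULL(t2.FK_Id_2, t1.FK_Id_1) yields FK_Id 1 for that row while the desired output shows 2. The variable names below are just for illustration.

```sql
-- Values from the sample row Id_1 = 2 / Id_2 = 11: FK_Id_1 = 2, FK_Id_2 = 1.
DECLARE @FK_Id_1 INT = 2, @FK_Id_2 INT = 1, @Missing INT = NULL;

SELECT ISNULL(@FK_Id_2, @FK_Id_1) AS t2_first    -- 1: the first argument is not NULL, so it wins
      ,ISNULL(@FK_Id_1, @FK_Id_2) AS t1_first    -- 2
      ,ISNULL(@Missing, @FK_Id_1) AS fallback;   -- 2: the second argument is used only when the first is NULL
```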

TSQL difficult join issue

I'm struggling with a problem in T-SQL: I need to get the top 10 results for each user from a table that might contain more than 10 results per user.
My natural (and procedurally minded) approach is "for each user in table T select the top 10 results ordered by date".
Each time I try to formulate the question in my mind in a set based approach, I keep running into the term "foreach".
Is it possible to do something like this:
SELECT *
FROM table AS t1
INNER JOIN (
SELECT TOP 10 *
FROM table AS t2
WHERE t2.id = t1.id
ORDER BY date DESC
)
Or even
SELECT ( SELECT TOP 10 *
FROM table AS t2
WHERE t2.id = t1.id
ORDER BY date )
FROM table AS t1
Or is there another solution to this using temp tables that I should think about?
EDIT:
Just to be perfectly clear: I need the top 10 results for each user in the table, i.e. 10 * N where N = number of users.
EDIT:
In response to a suggestion made by RBarryYoung, I'm having an issue, which is best demonstrated with code:
CREATE TABLE #temp (id INT, date DATETIME)
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
SELECT *
FROM #temp AS t1
CROSS APPLY (
SELECT TOP 1 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
DROP TABLE #temp
Running this, you can see that this doesn't limit the results to the TOP 1... Am I doing something wrong here?
EDIT:
It seems my last example provided a bit of confusion. Here is an example showing what I want to do:
CREATE TABLE #temp (id INT, date DATETIME)
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (2, GETDATE())
SELECT *
FROM #temp AS t1
CROSS APPLY
(
SELECT TOP 2 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
DROP TABLE #temp
This outputs:
1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
2 2009-08-26 09:05:56.583 2 2009-08-26 09:05:56.583
If I use distinct:
SELECT DISTINCT t1.id
FROM #temp AS t1
CROSS APPLY
(
SELECT TOP 2 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
I get
1
2
I need
1
1
2
Does anyone know if this is possible?
EDIT:
The following code will do this
WITH RowTable AS
(
SELECT
id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
FROM #temp
)
SELECT *
FROM RowTable
WHERE RowNum <= 2;
I posted in the comments, but there is no code formatting, so it doesn't look very nice.
Yes, there are several different good ways to do this in 2005 and 2008. The one most similar to what you are already trying is with CROSS APPLY:
SELECT T2.*
FROM (
SELECT DISTINCT ID FROM [table]
) AS t1
CROSS APPLY (
SELECT TOP 10 *
FROM [table] AS t2
WHERE t2.id = t1.id
ORDER BY date DESC
) AS t2
ORDER BY T2.id, date DESC
This then returns the ten most recent entries in [table] (or as many as exist, up to 10), for each distinct [id]. Assuming that [id] corresponds to a user, this should be exactly what you are asking for.
(edit: slight changes because I did not take into account that T1 and T2 were the same tables and thus there will be multiple duplicate t1.IDs matching multiple duplicate T2.ids.)
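A self-contained version of the CROSS APPLY answer, run against the asker's second example but with a table variable instead of #temp and fixed timestamps instead of GETDATE() (the timestamps are made up so that the ordering is deterministic). Driving the APPLY from SELECT DISTINCT id is what makes it run once per user rather than once per row, which is why the asker's version returned duplicate rows. TOP (2) stands in for TOP (10).

```sql
-- Sketch: top 2 rows per id with CROSS APPLY, driven by one outer row per id.
-- The timestamps are made-up sample values.
DECLARE @temp TABLE (id INT, date DATETIME);
INSERT INTO @temp (id, date) VALUES
    (1, '2009-08-26T09:00:00'),
    (1, '2009-08-26T10:00:00'),
    (1, '2009-08-26T11:00:00'),
    (2, '2009-08-26T09:00:00');

SELECT top2.id, top2.date
FROM (SELECT DISTINCT id FROM @temp) AS t1   -- one outer row per id, not one per row
CROSS APPLY (
    SELECT TOP (2) *                          -- TOP (10) in the real query
    FROM @temp AS t2
    WHERE t2.id = t1.id
    ORDER BY t2.date DESC
) AS top2
ORDER BY top2.id, top2.date DESC;
-- Expected ids: 1, 1, 2 (the 11:00 and 10:00 rows for id 1, the 09:00 row for id 2)
```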
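select userid, foo, rownum
from (
select userid, foo, row_number() over (partition by userid order by foo) as rownum
from [table]
) as numbered -- rownum is a column alias, so it can only be filtered in an outer query
where rownum <= 10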
It is possible, however using nested queries will be slower.
The following will also find the results you are looking for:
SELECT TOP 10 *
FROM [table] as t1
INNER JOIN [table] as t2
ON t1.id = t2.id
ORDER BY t1.date DESC
I believe this SO question will answer your question. It's not answering exactly the same question, but I think the solution will work for you too.
Here's a trick I use to do this "top-N-per-group" type of query:
SELECT t1.id
FROM [table] t1 LEFT OUTER JOIN [table] t2
ON (t1.user_id = t2.user_id AND (t1.date < t2.date
OR (t1.date = t2.date AND t1.id < t2.id)))
GROUP BY t1.id, t1.user_id
HAVING COUNT(t2.id) < 10 -- fewer than 10 newer rows for the same user
ORDER BY t1.user_id, COUNT(t2.id);
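A runnable sketch of the self-join trick on made-up data (the @log table and its values are hypothetical), keeping the 2 most recent rows per user. Each row is joined to the newer rows of the same user, with id as a tie-breaker for equal dates, and is kept when fewer than 2 such rows exist. Counting t2.id rather than * matters for the ordering: with a LEFT JOIN, COUNT(*) returns 1 both for a row with no newer rows and for a row with exactly one.

```sql
-- Sketch: keep the 2 most recent rows per user by counting newer rows with a self-join.
-- @log and its columns (id, user_id, date) are hypothetical sample data.
DECLARE @log TABLE (id INT PRIMARY KEY, user_id INT, date DATETIME);
INSERT INTO @log VALUES
    (1, 10, '2009-08-26T09:00:00'),
    (2, 10, '2009-08-26T10:00:00'),
    (3, 10, '2009-08-26T11:00:00'),
    (4, 20, '2009-08-26T09:00:00');

SELECT t1.id, t1.user_id, t1.date
FROM @log AS t1
LEFT OUTER JOIN @log AS t2
    ON t2.user_id = t1.user_id
   AND (t2.date > t1.date                           -- t2 is newer than t1 ...
        OR (t2.date = t1.date AND t2.id > t1.id))   -- ... or ties are broken by id
GROUP BY t1.id, t1.user_id, t1.date
HAVING COUNT(t2.id) < 2                             -- fewer than 2 newer rows: in the top 2
ORDER BY t1.user_id, COUNT(t2.id);                  -- newest first within each user
-- Expected: ids 3 and 2 for user 10, then id 4 for user 20
```

Row 1 is dropped because two newer rows (2 and 3) exist for user 10. On large tables this join grows quadratically per user, so ROW_NUMBER or CROSS APPLY is usually the cheaper choice in SQL Server 2005 and later.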