Finding duplicate rows but skip the last result? - tsql

I am trying to find duplicate rows in my DB, like this:
SELECT emailid, COUNT(emailid) AS NumOccurrences
FROM users
GROUP BY emailid HAVING ( COUNT(emailid) > 1 )
This returns each emailid and the number of matches found. Now what I want to do is compare the ID column to another table I have and set a column there with the count.
The other table has a column named duplicates, which should contain the number of duplicates from the select. So let's say we have 3 rows with the same emailid; the duplicates column then has a "3" in all 3 rows. What I want is a "2" in the first 2 rows and nothing (or 0) in the last of the 3 matching rows.
Is this possible?
Update:
I now have a temporary table, which looks like this:
mailid | rowcount | AmountOfDups
643921 | 1 | 3
643921 | 2 | 3
643921 | 3 | 3
Now, how can I make sure that only the first 2 rows (matched by mailid) get updated in the other table? The other table has a mailid column as well.

SELECT ...
ROW_NUMBER() OVER (PARTITION BY email ORDER BY emailid DESC) AS RN
FROM ...
...is a great starting point for such a problem. Never underestimate the power of ROW_NUMBER()!
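A minimal sketch of the ROW_NUMBER() idea, assuming a hypothetical users table with a surrogate id column (the question's schema is not fully shown) and using Python's sqlite3 module with SQLite 3.25+ as a stand-in for SQL Server. Ordering each email group by id DESC makes the last row rn = 1, so "every duplicate except the last" is simply rn > 1:

```python
import sqlite3

# Sketch: SQLite (3.25+ for window functions) stands in for SQL Server.
# The users table and its id/email columns are hypothetical.
def rows_to_flag(conn):
    """Return (id, email, rn) for every duplicate row except the last one per email."""
    sql = """
        SELECT id, email, rn
        FROM (
            SELECT id, email,
                   ROW_NUMBER() OVER (PARTITION BY email ORDER BY id DESC) AS rn
            FROM users
        ) AS numbered
        WHERE rn > 1          -- rn = 1 is the highest id, i.e. the last row
        ORDER BY email, id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("a",), ("b",), ("a",), ("a",), ("c",)])
print(rows_to_flag(conn))  # [(1, 'a', 3), (3, 'a', 2)]
```

The rows returned are exactly the ones that should get a non-zero duplicates value; the last 'a' row (id 4) and the single 'b' and 'c' rows are left out.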

Using SQL Server 2005+, you could try something like this (full example):
DECLARE @Table TABLE(
ID INT IDENTITY(1,1),
Email VARCHAR(20)
)
INSERT INTO @Table (Email) SELECT 'a'
INSERT INTO @Table (Email) SELECT 'b'
INSERT INTO @Table (Email) SELECT 'c'
INSERT INTO @Table (Email) SELECT 'a'
INSERT INTO @Table (Email) SELECT 'b'
INSERT INTO @Table (Email) SELECT 'a'
-- Emails that occur more than once, with their total count
; WITH Duplicates AS (
SELECT Email,
COUNT(ID) TotalDuplicates
FROM @Table
GROUP BY Email
HAVING COUNT(ID) > 1
)
-- Number each duplicate row within its email group, lowest ID first
, Counts AS (
SELECT t.ID,
ROW_NUMBER() OVER(PARTITION BY t.Email ORDER BY t.ID) EmailID,
d.TotalDuplicates
FROM @Table t INNER JOIN
Duplicates d ON t.Email = d.Email
)
-- The last row of each group gets 0; all earlier rows get TotalDuplicates - 1
SELECT ID,
CASE
WHEN EmailID = TotalDuplicates
THEN 0
ELSE TotalDuplicates - 1
END Dups
FROM Counts
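To check the logic without a SQL Server instance, here is the same query as a sketch on SQLite 3.25+ (via Python's sqlite3). The table name email_table is hypothetical; the rows are the six from the example above:

```python
import sqlite3

# Sketch: the answer's CTE query run on SQLite 3.25+ instead of SQL Server.
# The table name email_table is hypothetical; columns mirror the table variable.
QUERY = """
WITH Duplicates AS (
    SELECT Email, COUNT(ID) AS TotalDuplicates
    FROM email_table
    GROUP BY Email
    HAVING COUNT(ID) > 1
),
Counts AS (
    SELECT t.ID,
           ROW_NUMBER() OVER (PARTITION BY t.Email ORDER BY t.ID) AS EmailID,
           d.TotalDuplicates
    FROM email_table AS t
    INNER JOIN Duplicates AS d ON t.Email = d.Email
)
SELECT ID,
       CASE WHEN EmailID = TotalDuplicates THEN 0   -- last row of the group
            ELSE TotalDuplicates - 1 END AS Dups
FROM Counts
ORDER BY ID
"""


def duplicate_flags(conn):
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_table (ID INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany("INSERT INTO email_table (Email) VALUES (?)",
                 [(e,) for e in "abcaba"])   # a, b, c, a, b, a
print(duplicate_flags(conn))  # [(1, 2), (2, 1), (4, 2), (5, 0), (6, 0)]
```

Note that 'b', which occurs twice, gets 1 on its first row: the value is TotalDuplicates - 1 rather than a fixed 2.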

Related

Left join all orders onto 1 row per client

I have a table Orders with the following columns:
OrderID, ClientID, BankNumber, Adres, Name;
I want to write a query that returns each distinct ClientID, Name and Adres on one row, together with all of that client's orders and their corresponding bank account numbers on the same row. This is my example:
ClientID Adres Name order1 Banknumber Order2 Banknumber order3 Banknumber
First, a query cannot return a result set with an unbounded number of columns, but you could combine the orders and show them in one column.
If you are on SQL Azure or SQL Server 2017+, you can use STRING_AGG like this:
select customer.Id, customer.Name, orderSummary.orderData
from Customers as customer
outer apply (select STRING_AGG(CAST(orderID AS varchar(20)) + '-' + banknumber, ', ') as orderData
from orders where customerId = customer.Id) orderSummary
You can look at this post for more answers
How to concatenate text from multiple rows into a single text string in SQL server?
And Microsoft's documentation on subqueries:
https://technet.microsoft.com/en-us/library/ms189575(v=sql.105).aspx
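The same one-column idea can be sketched with another string aggregate. Below, SQLite's group_concat(X, separator) stands in for STRING_AGG (an illustration only; this is not SQL Server syntax), and grouping on ClientID, Name, Adres of the question's Orders table replaces the answer's hypothetical Customers table. The sample rows are made up for the sketch:

```python
import sqlite3

# Sketch: SQLite's group_concat(X, separator) plays the role of STRING_AGG.
# Column names follow the question's Orders table; the rows are sample data.
def orders_per_client(conn):
    sql = """
        SELECT ClientID, Name, Adres,
               group_concat(CAST(OrderID AS TEXT) || '-' || BankNumber, ', ') AS orderData
        FROM Orders
        GROUP BY ClientID, Name, Adres
        ORDER BY ClientID
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, ClientID INTEGER,"
             " BankNumber TEXT, Adres TEXT, Name TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", [
    (1, 11, "111", "xyz1", "xyz"),
    (100, 11, "111", "xyz1", "xyz"),
    (2, 22, "112", "xyz2", "xyzz"),
])
for row in orders_per_client(conn):
    print(row)
```

group_concat does not guarantee the order of the concatenated items, so client 11 may come out as "1-111, 100-111" or "100-111, 1-111".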
Here is a working sample; hope it works for you.
You need to build the Order1, Order2, Order3... and BankNumber1, BankNumber2... columns dynamically; I have hard-coded them in my example.
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1
create table #t1(OrderID Int, ClientID int, BankNumber varchar(50), [Address] varchar(50), Name varchar(50))
insert into #t1
select 1,11,'111','xyz1','xyz'
union all
select 2,22,'112','xyz2','xyzz'
union all
select 3,33,'113','xyz3','xyzzz'
union all
select 100,11,'111','xyz1','xyz'
union all
select 200,22,'112','xyz2','xyzz'
union all
select 300,33,'113','xyz3','xyzzz'
;with cte as
(
select OrderID,ClientID,BankNumber,Address,Name,ROW_NUMBER()over (partition by clientid order by orderid asc) RN
from #t1
)
select ClientID
,max([Order 1]) Order1
,max([Order 2]) Order2
,max([BankNumber 1]) BankNumber1
,max([BankNumber 2]) BankNumber2
from
(
select ClientID,Address,Name,OrderID,BankNumber,'Order '+cast(rn as varchar(10)) OrderSeq
,'BankNumber '+cast(rn as varchar(10)) BankNumberSeq
from cte
) as ST
pivot(max(OrderID) for OrderSeq in ([Order 1],[Order 2])) as pt1
pivot(max(BankNumber) for BankNumberSeq in ([BankNumber 1],[BankNumber 2])) as pt2
group by ClientID
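An alternative to the double PIVOT is conditional aggregation: number each client's orders with ROW_NUMBER() and pick each slot with MAX(CASE ...). The sketch below runs it on SQLite 3.25+ through Python's sqlite3 with the answer's sample rows; the same MAX(CASE ...) pattern is also valid T-SQL. Like the answer, it hard-codes two order slots:

```python
import sqlite3

# Sketch: conditional aggregation (MAX(CASE ...)) instead of two PIVOTs,
# run on SQLite 3.25+. Only two order slots are hard-coded, as in the answer.
QUERY = """
WITH cte AS (
    SELECT ClientID, OrderID, BankNumber,
           ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY OrderID) AS rn
    FROM t1
)
SELECT ClientID,
       MAX(CASE WHEN rn = 1 THEN OrderID END)    AS Order1,
       MAX(CASE WHEN rn = 1 THEN BankNumber END) AS BankNumber1,
       MAX(CASE WHEN rn = 2 THEN OrderID END)    AS Order2,
       MAX(CASE WHEN rn = 2 THEN BankNumber END) AS BankNumber2
FROM cte
GROUP BY ClientID
ORDER BY ClientID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (OrderID INTEGER, ClientID INTEGER,"
             " BankNumber TEXT, Address TEXT, Name TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", [
    (1, 11, "111", "xyz1", "xyz"), (2, 22, "112", "xyz2", "xyzz"),
    (3, 33, "113", "xyz3", "xyzzz"), (100, 11, "111", "xyz1", "xyz"),
    (200, 22, "112", "xyz2", "xyzz"), (300, 33, "113", "xyz3", "xyzzz"),
])
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(11, 1, '111', 100, '111'), (22, 2, '112', 200, '112'), (33, 3, '113', 300, '113')]
```

Each extra slot is one more pair of MAX(CASE WHEN rn = n ...) lines, which is also what a dynamic-SQL version would generate.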

Comparing data between rows within a table

I have the following table structure:
table 1
ID SOURCE_ID NAME
1 1 A
2 1 B
3 2 B
4 2 C
5 2 A
I need to pick the names that are common across all SOURCE_IDs, so I expect names A and B, as they are present in both SOURCE_ID 1 and 2.
The following query gives me the expected output:
SELECT DISTINCT NAME
FROM TABLE1 A, TABLE1 B
WHERE A.NAME = B.NAME AND A.SOURCE_ID != B.SOURCE_ID
Now the table changes to include a new record with ID 6:
table 1
ID SOURCE_ID NAME
1 1 A
2 1 B
3 2 B
4 2 C
5 2 A
6 3 A
The only name common to all three SOURCE_IDs (1, 2, 3) is A.
My query no longer returns the correct output once new records are added.
Please provide a query that keeps working correctly when new records are inserted.
Have a look at something like this:
DECLARE @Table TABLE(
SOURCE_ID INT,
NAME VARCHAR(20)
)
INSERT INTO @Table SELECT 1,'A'
INSERT INTO @Table SELECT 1,'B'
INSERT INTO @Table SELECT 2,'B'
INSERT INTO @Table SELECT 2,'C'
INSERT INTO @Table SELECT 2,'A'
--INSERT INTO @Table SELECT 3,'A'
-- Number of distinct sources each name appears in
;WITH DistinctCount AS (
SELECT NAME,
COUNT(DISTINCT SOURCE_ID) Cnt
FROM @Table
GROUP BY NAME
)
-- Keep only the names that appear in every source
SELECT *
FROM DistinctCount
WHERE Cnt = (SELECT COUNT(DISTINCT SOURCE_ID) FROM @Table)
With the 6th insert commented out, this returns A and B; with it included, it returns only A.
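The same query can be sanity-checked on SQLite through Python's sqlite3 (a sketch; table1 and the rows are the question's sample data), first without and then with the 6th record:

```python
import sqlite3

# Sketch: the DistinctCount query from the answer, run on SQLite.
QUERY = """
WITH DistinctCount AS (
    SELECT NAME, COUNT(DISTINCT SOURCE_ID) AS Cnt
    FROM table1
    GROUP BY NAME
)
SELECT NAME
FROM DistinctCount
WHERE Cnt = (SELECT COUNT(DISTINCT SOURCE_ID) FROM table1)  -- present in every source
ORDER BY NAME
"""


def common_names(conn):
    return [name for (name,) in conn.execute(QUERY)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (SOURCE_ID INTEGER, NAME TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "A"), (1, "B"), (2, "B"), (2, "C"), (2, "A")])
before = common_names(conn)            # ['A', 'B']
conn.execute("INSERT INTO table1 VALUES (3, 'A')")
after = common_names(conn)             # ['A']
print(before, after)
```

Because the target count is recomputed from the data, adding a new SOURCE_ID automatically raises the bar every name has to meet.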

Most effective way to get value if select count(*) = 1 with grouping

Let's say I have a table with ID int, VALUE string:
ID | VALUE
1 abc
2 abc
3 def
4 abc
5 abc
6 abc
If I run select value, count(*) on this table with group by value, I get:
VALUE | COUNT
abc 5
def 1
Now the tricky part: if the count is 1, I need to get that row's ID from the first table. Should I be using a CTE, i.e. create a result set with an ID column set to null and then run update b.ID = a.ID where count = 1?
Or is there another easier way?
EDIT:
I want to have result table like this:
ID VALUE count
null abc 5
3 def 1
If your ID values are unique, you can simply check whether max(id) = min(id). If so, use either one; otherwise, return null. Like this:
Select Case When Min(id) = Max(id) Then Min(id) Else Null End As Id,
Value, Count(*) As [Count]
From YourTable
Group By Value
Since you are already performing an aggregate, adding the MIN and MAX functions is not likely to take any noticeable extra time. I encourage you to give this a try.
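A quick sketch of the MIN/MAX approach on SQLite via Python's sqlite3, using the question's sample data; YourTable is the answer's placeholder name, and the alias id_if_unique is chosen here for clarity:

```python
import sqlite3

# Sketch: the MIN(id) = MAX(id) trick on SQLite; YourTable is a placeholder name.
QUERY = """
SELECT CASE WHEN MIN(id) = MAX(id) THEN MIN(id) ELSE NULL END AS id_if_unique,
       value,
       COUNT(*) AS cnt
FROM YourTable
GROUP BY value
ORDER BY value
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO YourTable (value) VALUES (?)",
                 [(v,) for v in ["abc", "abc", "def", "abc", "abc", "abc"]])
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(None, 'abc', 5), (3, 'def', 1)]
```

When a group has exactly one row, MIN and MAX both return that row's id; with unique ids, any larger group has MIN < MAX and yields NULL, which is the result table the question asks for.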
The way I would do it would indeed be a CTE:
WITH grp AS (SELECT value, COUNT(*) AS [count] FROM MyTable GROUP BY value HAVING COUNT(*) = 1)
SELECT MyTable.ID, grp.value, grp.[count] FROM MyTable
JOIN grp ON grp.value = MyTable.value
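When using GROUP BY, you can filter the groups with a HAVING clause placed after the GROUP BY. ID is not part of the grouping, so it has to be wrapped in an aggregate; for a group with exactly one row, MIN(ID) is simply that row's ID:
SELECT MIN([ID]) AS [ID], [VALUE]
FROM YourTable
GROUP BY [VALUE]
HAVING COUNT(*) = 1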
Edit: with regard to your edited question, this uses a join and a UNION ALL:
CREATE TABLE #table
(ID int IDENTITY,
VALUE varchar(3))
INSERT INTO #table (VALUE)
VALUES('abc'),('abc'),('def'),('abc'),('abc'),('abc')
SELECT * FROM (
SELECT Null as ID,VALUE, COUNT(*) as [Count]
FROM #table
GROUP BY VALUE
HAVING COUNT(*) > 1
UNION ALL
SELECT t.ID,t.VALUE,p.Count FROM
#table t
JOIN
(SELECT VALUE, COUNT(*) as [Count]
FROM #table
GROUP BY VALUE
HAVING COUNT(*) = 1) p
ON t.VALUE=p.VALUE
) a
DROP TABLE #table
Maybe not the most efficient, but something like this works (the COUNT(*) condition has to go in HAVING, because aggregates are not allowed in WHERE):
SELECT MAX(Id) AS ID, Value FROM YourTable GROUP BY Value HAVING COUNT(*) = 1

display unique row from two tables

I have two tables (one for quarter one, one for quarter two), each of which contains the employees who received a bonus in that quarter. Every employee has a unique id in the company.
I want to get all employees who had a bonus in either Q1 or Q2, without duplicate employees. Both Id and Amount are required.
Below is my solution; I want to find out if there is a better one.
declare @q1 table (
EmployeeID int identity(1,1) primary key not null,
amount int
)
declare @q2 table (
EmployeeID int identity(1,1) primary key not null,
amount int
)
insert into @q1
(amount)
select 1
insert into @q1
(amount)
select 2
select * from @q1
insert into @q2
(amount)
select 1
insert into @q2
(amount)
select 11
insert into @q2
(amount)
select 22
select * from @q2
My Solution:
;with both as
(
select EmployeeID
from @q1
union
select EmployeeID
from @q2
)
select a.EmployeeID, a.amount
from @q1 as a
where a.EmployeeID in (select EmployeeID from both)
union all
select b.EmployeeID, b.amount
from @q2 as b
where b.EmployeeID in (select EmployeeID from both) and b.EmployeeID NOT in (select EmployeeID from @q1)
Result:
EmployeeID, Amount
1 1
2 2
3 22
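In the solution above, the both CTE never filters anything: every EmployeeID in either quarter table is by definition in their union, so both IN (select EmployeeID from both) predicates are always true. A minimal sketch of the simplified query, run on SQLite via Python's sqlite3 with tables named q1/q2 standing in for the two quarter tables:

```python
import sqlite3

# Sketch: q1/q2 are SQLite tables standing in for the two quarter table variables.
QUERY = """
SELECT EmployeeID, amount FROM q1
UNION ALL
SELECT EmployeeID, amount FROM q2
WHERE EmployeeID NOT IN (SELECT EmployeeID FROM q1)   -- only employees missing from Q1
ORDER BY EmployeeID
"""

conn = sqlite3.connect(":memory:")
for name, amounts in (("q1", [1, 2]), ("q2", [1, 11, 22])):
    conn.execute(f"CREATE TABLE {name} (EmployeeID INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(f"INSERT INTO {name} (amount) VALUES (?)", [(a,) for a in amounts])
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 1), (2, 2), (3, 22)]
```

This reproduces the expected result: Q1 amounts win for employees present in both quarters, and Q2 only contributes employees that Q1 lacks.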
SELECT EmployeeID, SUM(amount) AS TotalBonus
FROM
(SELECT EmployeeID, amount
from @q1
UNION ALL
SELECT EmployeeID, amount
from @q2) AS allq
GROUP BY EmployeeID
The subquery UNIONs both tables together. The GROUP BY gives you one row per employee, and the SUM means that if someone got a bonus in both quarters, you get the total. I'm guessing that's the right thing for you.
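A sketch of this UNION ALL + SUM approach on SQLite (via Python's sqlite3), with the question's sample amounts and without a Name column, since the question's tables do not have one. Note that employee 2 comes out as 13 (2 + 11) here, not 2 as in the question's expected result, because amounts from both quarters are added:

```python
import sqlite3

# Sketch: q1/q2 are SQLite tables standing in for the two quarter table variables.
QUERY = """
SELECT EmployeeID, SUM(amount) AS TotalBonus
FROM (SELECT EmployeeID, amount FROM q1
      UNION ALL                      -- keep duplicates so SUM sees every bonus
      SELECT EmployeeID, amount FROM q2) AS allq
GROUP BY EmployeeID
ORDER BY EmployeeID
"""

conn = sqlite3.connect(":memory:")
for name, amounts in (("q1", [1, 2]), ("q2", [1, 11, 22])):
    conn.execute(f"CREATE TABLE {name} (EmployeeID INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(f"INSERT INTO {name} (amount) VALUES (?)", [(a,) for a in amounts])
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 2), (2, 13), (3, 22)]
```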
Try this one (assuming an EmployeeList table that holds all employees):
SELECT EmployeeID
FROM EmployeeList
WHERE EmployeeID IN
(SELECT EmployeeID From QuarterOne
UNION
SELECT EmployeeID From QuarterTwo)
Or by using LEFT JOINs:
SELECT a.EmployeeID
FROM EmployeeList a LEFT JOIN QuarterOne b
ON a.EmployeeID = b.EmployeeID
LEFT JOIN QuarterTwo c
ON a.EmployeeID = c.EmployeeID
WHERE b.EmployeeID IS NOT NULL OR c.EmployeeID IS NOT NULL
Either query returns every EmployeeID that has a record in either quarter.
Try:
SELECT COALESCE(q1.EmployeeID, q2.EmployeeID) AS EmployeeID -- whichever side matched
, COALESCE(q1.amount, q2.amount) AS amount -- the Q1 amount wins if both exist
FROM @q1 AS q1
FULL OUTER JOIN @q2 AS q2
ON q1.EmployeeID = q2.EmployeeID

Split one column into multiple columns in SQL Server 2008?

Table name: Table1
id name
1 1-aaa-14 milan road
2 23-abcde-lsd road
3 2-mnbvcx-welcoome street
I want the result like this:
Id name name1 name2
1 1 aaa 14 milan road
2 23 abcde lsd road
3 2 mnbvcx welcoome street
This function ought to give you what you need.
--Drop Function Dbo.Part
Create Function Dbo.Part
(@Value Varchar(8000)
,@Part Int
,@Sep Char(1)='-'
)Returns Varchar(8000)
As Begin
Declare @Start Int
Declare @Finish Int
Set @Start=1
Set @Finish=CharIndex(@Sep,@Value,@Start)
While (@Part>1 And @Finish>0)Begin
Set @Start=@Finish+1
Set @Finish=CharIndex(@Sep,@Value,@Start)
Set @Part=@Part-1
End
If @Part>1 Set @Start=Len(@Value)+1 -- Not found
If @Finish=0 Set @Finish=Len(@Value)+1 -- Last token on line
Return SubString(@Value,@Start,@Finish-@Start)
End
Go
Usage:
Select ID
,Dbo.Part(Name,1,Default)As Name
,Dbo.Part(Name,2,Default)As Name1
,Dbo.Part(Name,3,Default)As Name2
From Dbo.Table1
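To check the token-walking logic outside SQL Server, here is a rough Python transcription of Dbo.Part (a sketch, not the author's code). Python's str.find returns -1 where CHARINDEX returns 0, indices are 0-based, and T-SQL's Len ignores trailing spaces while Python's len does not, so values ending in spaces may behave slightly differently:

```python
def part(value, n, sep="-"):
    """Rough Python transcription of Dbo.Part: return the n-th (1-based)
    sep-delimited token of value, or '' if there are fewer than n tokens."""
    start = 0
    finish = value.find(sep, start)      # -1 when absent (CHARINDEX returns 0)
    while n > 1 and finish >= 0:         # skip one token per iteration
        start = finish + 1
        finish = value.find(sep, start)
        n -= 1
    if n > 1:                            # ran out of separators: token not found
        return ""
    if finish < 0:                       # last token on the line
        finish = len(value)
    return value[start:finish]


name = "1-aaa-14 milan road"
print([part(name, i) for i in (1, 2, 3, 4)])  # ['1', 'aaa', '14 milan road', '']
```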
It's rather compute-intensive, so if Table1 is very large, you ought to write the results to another table, which you could refresh from time to time (perhaps once a day, at night).
Better yet, you could create a trigger that automatically updates Table2 whenever a change is made to Table1. Assuming that column ID is the primary key:
Create Table Dbo.Table2(
ID Int Constraint PK_Table2 Primary Key,
Name Varchar(8000),
Name1 Varchar(8000),
Name2 Varchar(8000))
Go
Create Trigger Trigger_Table1 on Dbo.Table1 After Insert,Update,Delete
As Begin
-- Deleted/Inserted can hold many rows, so match with IN rather than =
If Exists(Select * From Deleted)
Delete From Dbo.Table2 Where ID In (Select ID From Deleted)
If Exists(Select * From Inserted)
Insert Dbo.Table2(ID, Name, Name1, Name2)
Select ID
,Dbo.Part(Name,1,Default)
,Dbo.Part(Name,2,Default)
,Dbo.Part(Name,3,Default)
From Inserted
End
Now, do your data manipulation (Insert, Update, Delete) on Table1, but do your Select statements on Table2 instead.
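A minimal sketch of the same "maintain Table2 from triggers" idea, runnable from Python with SQLite as a stand-in. SQL Server's inserted/deleted pseudo-tables are set-based, whereas SQLite triggers are row-level and handle one event each, so three triggers replace the single T-SQL one. The Python function registered as part is a hypothetical, simplified stand-in for Dbo.Part:

```python
import sqlite3


def part(value, n, sep="-"):
    """Hypothetical stand-in for Dbo.Part: the n-th sep-delimited token, or ''."""
    tokens = value.split(sep)
    return tokens[n - 1] if n <= len(tokens) else ""


conn = sqlite3.connect(":memory:")
conn.create_function("part", 2, part)   # callable from SQL as part(text, n)
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Name TEXT, Name1 TEXT, Name2 TEXT);

CREATE TRIGGER table1_ins AFTER INSERT ON Table1 BEGIN
    INSERT INTO Table2
    VALUES (NEW.ID, part(NEW.Name, 1), part(NEW.Name, 2), part(NEW.Name, 3));
END;

CREATE TRIGGER table1_upd AFTER UPDATE ON Table1 BEGIN
    DELETE FROM Table2 WHERE ID = OLD.ID;
    INSERT INTO Table2
    VALUES (NEW.ID, part(NEW.Name, 1), part(NEW.Name, 2), part(NEW.Name, 3));
END;

CREATE TRIGGER table1_del AFTER DELETE ON Table1 BEGIN
    DELETE FROM Table2 WHERE ID = OLD.ID;
END;
""")

conn.execute("INSERT INTO Table1 VALUES (1, '1-aaa-14 milan road')")
conn.execute("INSERT INTO Table1 VALUES (2, '23-abcde-lsd road')")
conn.execute("UPDATE Table1 SET Name = '2-mnbvcx-welcoome street' WHERE ID = 2")
conn.execute("DELETE FROM Table1 WHERE ID = 1")
rows = conn.execute("SELECT * FROM Table2 ORDER BY ID").fetchall()
print(rows)  # [(2, '2', 'mnbvcx', 'welcoome street')]
```

As in the answer, the update trigger simply deletes the old derived row and inserts a fresh one rather than updating columns in place.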
The below solution uses a recursive CTE for splitting the strings, and PIVOT for displaying the parts in their own columns.
WITH Table1 (id, name) AS (
SELECT 1, '1-aaa-14 milan road' UNION ALL
SELECT 2, '23-abcde-lsd road' UNION ALL
SELECT 3, '2-mnbvcx-welcoome street'
),
cutpositions AS (
SELECT
id, name,
rownum = 1,
startpos = 1,
nextdash = CHARINDEX('-', name + '-')
FROM Table1
UNION ALL
SELECT
id, name,
rownum + 1,
nextdash + 1,
CHARINDEX('-', name + '-', nextdash + 1)
FROM cutpositions c
WHERE nextdash < LEN(name)
)
SELECT
id,
[1] AS name,
[2] AS name1,
[3] AS name2
/* add more columns here */
FROM (
SELECT
id, rownum,
part = SUBSTRING(name, startpos, nextdash - startpos)
FROM cutpositions
) s
PIVOT ( MAX(part) FOR rownum IN ([1], [2], [3] /* extend the list here */) ) x
Without additional modifications, this query can split names consisting of up to 100 parts (that's the default maximum recursion depth, which can be changed with OPTION (MAXRECURSION n)), but it only displays the first 3 of them. You can easily extend it to display as many parts as you want; just follow the instructions in the comments.
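The same recursive-split idea can be sketched on SQLite through Python's sqlite3. SQLite has no CHARINDEX with a start position, so this variant (an adaptation, not the answer's exact query) peels the text before the next dash off a running "rest" string at each recursion step, and uses MAX(CASE ...) instead of PIVOT:

```python
import sqlite3

# Sketch: each recursion step peels off the text before the next dash
# from a running "rest" string (SQLite has no CHARINDEX start position).
QUERY = """
WITH RECURSIVE cut(id, rownum, part, rest) AS (
    SELECT id, 0, NULL, name || '-' FROM Table1      -- trailing dash ends the last part
    UNION ALL
    SELECT id, rownum + 1,
           substr(rest, 1, instr(rest, '-') - 1),     -- text before the next dash
           substr(rest, instr(rest, '-') + 1)         -- text after it
    FROM cut
    WHERE rest <> ''
)
SELECT id,
       MAX(CASE WHEN rownum = 1 THEN part END) AS name,
       MAX(CASE WHEN rownum = 2 THEN part END) AS name1,
       MAX(CASE WHEN rownum = 3 THEN part END) AS name2
FROM cut
GROUP BY id
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [
    (1, "1-aaa-14 milan road"),
    (2, "23-abcde-lsd road"),
    (3, "2-mnbvcx-welcoome street"),
])
for row in conn.execute(QUERY):
    print(row)
```

Because the appended dash guarantees that every non-empty rest still ends in a dash, the recursion always terminates once the last part has been taken.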
select T.id,
substring(T.Name, 1, D1.Pos-1) as Name,
substring(T.Name, D1.Pos+1, D2.Pos-D1.Pos-1) as Name1,
substring(T.Name, D2.Pos+1, len(T.name)) as Name2
from Table1 as T
cross apply (select charindex('-', T.Name, 1)) as D1(Pos)
cross apply (select charindex('-', T.Name, D1.Pos+1)) as D2(Pos)
Testing performance of suggested solutions
Setup:
create table Table1
(
id int identity primary key,
Name varchar(50)
)
go
insert into Table1
select '1-aaa-14 milan road' union all
select '23-abcde-lsd road' union all
select '2-mnbvcx-welcoome street'
go 10000
If you will always have exactly 2 dashes, you can do the following by using PARSENAME, which splits on dots (hence the REPLACE of dashes with dots):
--testing table
CREATE TABLE #test(id INT, NAME VARCHAR(1000))
INSERT #test VALUES(1, '1-aaa-14 milan road')
INSERT #test VALUES(2, '23-abcde-lsd road')
INSERT #test VALUES(3, '2-mnbvcx-welcoome street')
SELECT id,PARSENAME(name,3) AS name,
PARSENAME(name,2) AS name1,
PARSENAME(name,1)AS name2
FROM (
SELECT id,REPLACE(NAME,'-','.') NAME
FROM #test)x
If the name column can contain dots, you first have to replace them with another character and then turn them back into dots at the end, for example by using a tilde as a substitute for the dot:
INSERT #test VALUES(3, '5-mnbvcx-welcoome street.')
SELECT id,REPLACE(PARSENAME(name,3),'~','.') AS name,
REPLACE(PARSENAME(name,2),'~','.') AS name1,
REPLACE(PARSENAME(name,1),'~','.') AS name2
FROM (
SELECT id,REPLACE(REPLACE(NAME,'.','~'),'-','.') NAME
FROM #test)x
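To see why the tilde round trip is needed, here is a plain-Python sketch of the same string transformations (str.split stands in for PARSENAME here; this only illustrates the escaping idea and assumes the data never contains the placeholder character itself):

```python
def split_on_dashes(name, placeholder="~"):
    """Mimic the REPLACE + PARSENAME trick: protect dots, turn dashes into dots,
    split on dots, then turn the placeholder back into dots."""
    protected = name.replace(".", placeholder)   # '.' -> '~' so real dots survive
    dotted = protected.replace("-", ".")         # '-' -> '.' as PARSENAME expects
    parts = dotted.split(".")                    # stand-in for PARSENAME(..., n)
    return [p.replace(placeholder, ".") for p in parts]


print(split_on_dashes("5-mnbvcx-welcoome street."))
# ['5', 'mnbvcx', 'welcoome street.']
```

Without the first REPLACE, the trailing dot would be treated as a fourth separator and shift every part by one position.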