How to generate a RowID in SQL Server 2000 without using an identity column - tsql

Let me frame my question. Say I have:
Name
A
B
C
A
D
B
What I want is
ID Name
1 A
2 B
3 C
4 A
5 D
6 B
If I write
SELECT name, (SELECT COUNT(*) FROM #t AS i2 WHERE i2.Name <= i1.Name) As rn FROM #t AS i1
it will work fine if all the names are distinct/unique. What if they are not (as in this example)?
Even NEWID() does not do the trick, as it changes over time.
I am using SQL Server 2000.
Please help.

Here are two ways of solving it:
1.
DECLARE @t TABLE ([ID] [int] IDENTITY(1,1), name CHAR)
INSERT @t VALUES ('b')
INSERT @t VALUES ('a')
INSERT @t VALUES ('c')
INSERT @t VALUES ('b')
SELECT * FROM @t
2.
DECLARE @t2 TABLE (name CHAR)
INSERT @t2 (name) VALUES ('b')
INSERT @t2 (name) VALUES ('a')
INSERT @t2 (name) VALUES ('c')
INSERT @t2 (name) VALUES ('b')
SELECT ID = ROW_NUMBER() OVER (ORDER BY b), name
FROM (SELECT name, null b FROM @t2) temp
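Note that ROW_NUMBER() in option 2 requires SQL Server 2005 or later. For SQL Server 2000 itself, a minimal sketch is to copy the rows into a temp table with the IDENTITY() function; #numbered is a hypothetical name, and without an ORDER BY the numbering order is not guaranteed to match insertion order:
-- SQL Server 2000 sketch: SELECT ... INTO with IDENTITY() numbers the rows
-- without touching the source table (#t from the question).
SELECT IDENTITY(int, 1, 1) AS ID, name
INTO #numbered
FROM #t

SELECT ID, name FROM #numbered ORDER BY ID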

Related

How to populate a column in a table with a string of concatenated values from a column in another table

I'm trying to populate a column in a table with a string of concatenated values from a column in another table. There are numerous solutions suggested, such as How to concatenate text from multiple rows into a single text string in SQL Server, which has 47 answers, but none of them are working for me.
Table @tbl1:
DECLARE @tbl1 TABLE ([Id] INT, [Value] VARCHAR(10))
INSERT INTO @tbl1 ([Id]) VALUES (1),(2),(3)
[Id] [Value]
1 NULL
2 NULL
3 NULL
Table @tbl2:
DECLARE @tbl2 TABLE ([Id] INT, [Value] VARCHAR(10))
INSERT INTO @tbl2 ([Id],[Value]) VALUES (1,'A'),(3,'B'),(1,'C'),(2,'D'),(2,'E'),(3,'F'),(1,'G')
[Id] [Value]
1 A
3 B
1 C
2 D
2 E
3 F
1 G
I'm seeking the syntax to update the records in table #tbl1 to this:
[Id] [Value]
1 ACG
2 DE
3 BF
This doesn't work:
UPDATE [t1]
SET [t1].[Value] = COALESCE([t1].[Value],'') + [t2].[Value]
FROM @tbl1 AS [t1]
LEFT JOIN @tbl2 AS [t2] ON [t1].[Id] = [t2].[Id]
Result:
[Id] [Value]
1 A
2 D
3 B
This syntax produces the same result:
UPDATE [t1]
SET [t1].[Value] = [t2].[Val]
FROM @tbl1 AS [t1]
OUTER APPLY (
SELECT COALESCE([tb2].[Value],[t1].[Value]) AS [Val]
FROM @tbl2 AS [tb2]
WHERE [tb2].[Id] = [t1].[Id]
) AS [t2]
Changing SET to SELECT (below), as in most of the accepted answers, results in the error messages Invalid object name 't1' and Incorrect syntax near 'SELECT'. Expecting SET.
UPDATE [t1]
SELECT [t1].[Value] = COALESCE([t1].[Value],'') + [t2].[Value]
FROM @tbl1 AS [t1]
LEFT JOIN @tbl2 AS [t2] ON [t1].[Id] = [t2].[Id]
My experiments with XML PATH, based upon other Stack Overflow responses (How to concatenate text from multiple rows into a single text string in SQL Server), also produce syntax errors or incorrect results.
Can someone offer the correct syntax?
You have to group the rows, use string_agg to get the values together, and then run the update:
select @@version;
DECLARE @tbl1 TABLE ([Id] INT, [Value] VARCHAR(10))
INSERT INTO @tbl1 ([Id]) VALUES (1),(2),(3)
DECLARE @tbl2 TABLE ([Id] INT, [Value] VARCHAR(10))
INSERT INTO @tbl2 ([Id],[Value]) VALUES (1,'A'),(3,'B'),(1,'C'),(2,'D'),(2,'E'),(3,'F'),(1,'G')
;with grouped_data as (
select tbl1.Id, STRING_AGG(tbl2.[Value], '') as value_aggregated
from @tbl1 tbl1
inner join @tbl2 tbl2 on tbl1.Id=tbl2.Id
group by tbl1.id
)
update tbl1 set [Value]=value_aggregated
from @tbl1 tbl1
inner join grouped_data gd on gd.Id=tbl1.id
select * from @tbl1
You can check it running on this DB Fiddle
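If STRING_AGG is not available (it was added in SQL Server 2017), a minimal sketch of the same update using the older FOR XML PATH('') concatenation, assuming the @tbl1/@tbl2 table variables from the question:
-- Correlated FOR XML PATH('') subquery builds the concatenated string per Id.
-- Plain letter values need no extra XML-escaping handling here.
UPDATE t1
SET [Value] = (
    SELECT t2.[Value] AS [text()]
    FROM @tbl2 AS t2
    WHERE t2.[Id] = t1.[Id]
    ORDER BY t2.[Value]
    FOR XML PATH('')
)
FROM @tbl1 AS t1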

How to pivot a table to a view on matching-length delimited cells

Disclaimer: I'm dealing with a rather old legacy system so any comments telling me about poor design are redundant, although I do genuinely appreciate any such sentiment. There is a new version that solves most legacy problems but we still have to maintain the old system, so basically, we have to manage for now.
I have a table that looks like this (yes, that is a single column, I know):
And I need a view (for reporting purposes) that will dynamically process the data in said table and return this:
The values are \n-delimited (shudder) and you can assume there will always be the same number of values in each cell (9 in the example, although other databases could have 4 or 12 or any number), although I suppose having NULL-insertion in the event of missing values couldn't hurt. They will also always be in a matching order (as in the example, 'AUD', 'Australian Dollar', and '$' are all the first values in their respective cells, and so on).
I've found various approaches to splitting a single cell out into a view, but nothing that covers merging data in such a way as I require. Sitting at home with a cold has not helped my research capabilities. Help me StackOverflow, you're my only hope!
Bonus points for tidy, relatively readable SQL examples, although I'm anticipating messiness as a natural by-product of the hackish nature of my required solution.
Something like this. I didn't take the time to build out the tables, but it should be fairly obvious where you can replace my variables with your rows. You will also want to do a replace of char(10) where I have used commas. You could package it up in a table-valued function and then call it from a view.
declare @xml1 xml
declare @xml2 xml
declare @xml3 xml
declare @c1 nvarchar(250)
declare @c2 nvarchar(250)
declare @c3 nvarchar(250)
set @c1 = N'AUD,CAD,EUR,GBP,JPY,NZD,USD,KES,CHF';
set @c2 = N'Australian Dollar,Canadian Dollar,Euro,Pound Sterling,Yen,New Zealand Dollar,United States Dollar,Kenyan Shilling, Swiss Franc';
set @c3 = N'$,$,C,L,Y,$,$,K,F';
-- you'd use replace(@c1, char(10), '</r><r>') etc for \n-delimited data
set @xml1 = N'<root><r>' + replace(@c1,',','</r><r>') + '</r></root>';
set @xml2 = N'<root><r>' + replace(@c2,',','</r><r>') + '</r></root>';
set @xml3 = N'<root><r>' + replace(@c3,',','</r><r>') + '</r></root>';
select code.code, name.name, symbol.symbol
from
(select ROW_NUMBER() over (order by @@rowcount) as ck,
c.value('.','varchar(max)') as [code]
from @xml1.nodes('//root/r') as a(c)) as code
inner join
(select ROW_NUMBER() over (order by @@rowcount) as nk,
n.value('.','varchar(max)') as [name]
from @xml2.nodes('//root/r') as a(n)) as name on code.ck = name.nk
inner join
(select ROW_NUMBER() over (order by @@rowcount) as sk,
s.value('.','varchar(max)') as [symbol]
from @xml3.nodes('//root/r') as a(s)) as symbol on symbol.sk = name.nk
You can run this as a single script in SSMS for verification that it works. No schema necessary.
Using Jeff Moden's Tally Ho! CSV splitter:
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
and inline CTE data like this
with
data as (select Num,Currencies from (values
(1,'AUD'+char(10)+'CAD'+char(10)+'USD'+char(10)+'KES')
,(2,'Australian DOllar'+char(10)+'Canadian Dollar'+char(10)+'US Dollar'+char(10)+'Kenyan Shilling')
,(3,'$'+char(10)+'$'+char(10)+'$'+char(10)+'k')
)data(Num,Currencies)
),
The rest of the solution, continuing the same WITH statement, is as simple as this:
map as (select * from (values
(1,'Code')
,(2,'Name')
,(3,'Symbol')
)map(Num,Col )
)
select
ItemNumber
,max(Code) as Code
,max(Name) as Name
,max(Symbol) as Symbol
from (
select
map.Num
,map.Col
,c.Item
,c.ItemNumber
from data
join map
on map.Num = data.Num
cross apply dbo.DelimitedSplit8K(data.Currencies,char(10)) c
) t
pivot (max(Item) for Col in (Code,Name,Symbol)) pvt
group by ItemNumber
to give us:
ItemNumber Code Name Symbol
-------------- ---- -------------------- ---------------
1 AUD Australian DOllar $
2 CAD Canadian Dollar $
3 USD US Dollar $
4 KES Kenyan Shilling k
Hope this helps. Run it all together, or replace the table variable with a temp table.
Sample Data:
IF OBJECT_ID(N'tempdb..#table') > 0
BEGIN
DROP TABLE #table
END
DECLARE @table TABLE(ATTRIBUTEVALUE VARCHAR(MAX))
INSERT INTO @table
SELECT
'AFN
ALL
DZD
USD
EUR
AOA
XCD
XCD
ARS'
INSERT INTO @table
SELECT
'Afghanistan
Albania
Algeria
American Samoa
Andorra
Angola
Anguilla
Antigua and Barbuda
Argentina'
INSERT INTO @table
SELECT
'AF
AL
DZ
AS
AD
AO
AI
AG
AR'
Query:
IF OBJECT_ID(N'tempdb..#TEMP') > 0
BEGIN
DROP TABLE #TEMP
END
DECLARE @StartLoop INT
DECLARE @EndLoop INT
DECLARE @Code TABLE (ID INT IDENTITY(1, 1),
Code VARCHAR(250))
DECLARE @Name TABLE (ID INT IDENTITY(1, 1),
Name VARCHAR(250))
DECLARE @Symbol TABLE (ID INT IDENTITY(1, 1),
Symbol VARCHAR(250))
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID,
*
INTO #Temp
FROM @table
SELECT @StartLoop = MIN(ID),
@EndLoop = MAX(ID)
FROM #Temp
WHILE @StartLoop <= @EndLoop
BEGIN
DECLARE @WorkingString VARCHAR(MAX)
SELECT @WorkingString = ATTRIBUTEVALUE + CHAR(10) + ' '
FROM #Temp
WHERE ID = @StartLoop
--print @WorkingString
WHILE CHARINDEX(CHAR(10), @WorkingString) > 0
BEGIN
DECLARE @SearchCharacter INT
DECLARE @WorkingStringLength INT
DECLARE @TempStringLength INT
DECLARE @TempString VARCHAR(MAX)
SET @WorkingStringLength = LEN(@WorkingString)
SET @SearchCharacter = CHARINDEX(CHAR(10), @WorkingString)
SET @TempString = SUBSTRING(@WorkingString, 1, @SearchCharacter - 1)
SET @TempStringLength = LEN(@TempString)
SET @WorkingString = SUBSTRING(@WorkingString, @SearchCharacter + 1, @WorkingStringLength)
SET @TempString = REPLACE(@TempString, CHAR(13), '')
IF @StartLoop = 1
BEGIN
INSERT INTO @Code
SELECT @TempString
END
IF @StartLoop = 2
BEGIN
INSERT INTO @Name
SELECT @TempString
END
IF @StartLoop = 3
BEGIN
INSERT INTO @Symbol
SELECT @TempString
END
END
SET @StartLoop = @StartLoop + 1
END
SELECT Code,
Name,
Symbol
FROM @Code AS c
JOIN @Name AS n
ON c.ID = n.ID
JOIN @Symbol AS s
ON s.ID = n.ID
Cleanup:
IF OBJECT_ID(N'tempdb..#TEMP') > 0
BEGIN
DROP TABLE #TEMP
END
IF OBJECT_ID(N'tempdb..#table') > 0
BEGIN
DROP TABLE #table
END
Because I needed a view, this ended up being my solution:
CREATE FUNCTION [dbo].[CurrencyTableGenerator]()
RETURNS
@CurrencyTable TABLE(
Code NVARCHAR(250)
,Name NVARCHAR(250)
,Symbol NVARCHAR(250)
)
AS
BEGIN
DECLARE @xml1 XML
DECLARE @xml2 XML
DECLARE @xml3 XML
DECLARE @c1 NVARCHAR(250)
DECLARE @c2 NVARCHAR(250)
DECLARE @c3 NVARCHAR(250)
SET @c1 = (SELECT ...)
SET @c2 = (SELECT ...)
SET @c3 = (SELECT ...)
SET @xml1 = N'<root><r>' + REPLACE(@c1, CHAR(10), '</r><r>') + '</r></root>';
SET @xml2 = N'<root><r>' + REPLACE(@c2, CHAR(10), '</r><r>') + '</r></root>';
SET @xml3 = N'<root><r>' + REPLACE(@c3, CHAR(10), '</r><r>') + '</r></root>';
INSERT INTO @CurrencyTable
SELECT Code.Code, Name.Name, Symbol.Symbol
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS ck,
c.value('.', 'VARCHAR(250)') AS [Code]
FROM @xml1.nodes('//root/r') AS a(c)) AS Code
INNER JOIN
(SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS nk,
n.value('.', 'VARCHAR(250)') AS [Name]
FROM @xml2.nodes('//root/r') AS a(n)) AS Name ON Code.ck = Name.nk
INNER JOIN
(SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS sk,
s.value('.', 'VARCHAR(250)') AS [Symbol]
FROM @xml3.nodes('//root/r') AS a(s)) AS Symbol ON Symbol.sk = Name.nk
RETURN
END
GO
CREATE VIEW [dbo].[CurrencyView]
AS
SELECT * FROM [dbo].[CurrencyTableGenerator]()
GO
Thanks to RThomas for the function.
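As a quick sanity check, assuming the function and view above have been created (with the elided SET @c1/@c2/@c3 queries filled in), the view can be queried like any other:
SELECT Code, Name, Symbol
FROM dbo.CurrencyView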

Converting traditional IF EXISTS UPDATE ELSE INSERT into MERGE is not working?

I am going to use MERGE to insert into or update a table depending on whether the row exists or not. This is my query:
declare @t table
(
id int,
name varchar(10)
)
insert into @t values(1,'a')
MERGE INTO @t t1
USING (SELECT id FROM @t WHERE ID = 2) t2 ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (2, 'b');
select * from @t;
The result is,
id name
1 a
I think it should be,
id name
1 a
2 b
You have your USING part slightly messed up; that's where you put what you want to match against (although in this case you're only using id):
declare @t table
(
id int,
name varchar(10)
)
insert into @t values(1,'a')
MERGE INTO @t t1
USING (SELECT 2, 'b') AS t2 (id, name) ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (2, 'b');
select * from @t;
As Mikhail pointed out, your query in the USING clause doesn't contain any rows.
If you want to do an upsert, put the new data into the USING clause:
MERGE INTO @t t1
USING (SELECT 2 as id, 'b' as name) t2 ON (t1.id = t2.id) --This no longer has an artificial dependency on @t
WHEN MATCHED THEN
UPDATE SET name = t2.name
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (t2.id, t2.name);
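For comparison, here is a minimal sketch of the traditional IF EXISTS ... UPDATE ... ELSE INSERT pattern the question title refers to, assuming the same @t table variable and the same hard-coded values (it must run in the same batch as the DECLARE):
IF EXISTS (SELECT 1 FROM @t WHERE id = 2)
BEGIN
    -- the row is already there: update it
    UPDATE @t SET name = 'b' WHERE id = 2
END
ELSE
BEGIN
    -- the row is missing: insert it
    INSERT INTO @t (id, name) VALUES (2, 'b')
END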
This query won't return anything:
SELECT id FROM @t WHERE ID = 2
because there are no rows in the table with ID = 2, so there is nothing to merge into the table.
Besides, in the MATCHED clause you are updating the ID field on which you are joining the table; I think that's forbidden.
For each DML operation you have to commit (marking the end of a successful transaction); only then will you be able to see the latest data.
For example:
GO
BEGIN TRANSACTION;
GO
DELETE FROM HumanResources.JobCandidate
WHERE JobCandidateID = 13;
GO
COMMIT TRANSACTION;
GO

where exists - all - group by?

I use SQL Server 2008 R2.
I have a weird problem, as follows. I have a table with two columns, Field1 and Field2, and I need to write a query like:
SELECT DISTINCT Field1
FROM MYTABLE
WHERE Field2 IN (96,102)
In this query, WHERE Field2 IN (96,102) gives me rows that have 96 or 102 or both.
What I actually want is to return only the Field1 values that contain 96 and 102 at the same time.
Is there any suggestion? Please keep it result oriented...
I have made a sqlfiddle for this..
create table a (id int, val int)
go
insert into a select 1, 22
insert into a select 1, 122
insert into a select 2, 22
insert into a select 3, 122
insert into a select 4, 22
insert into a select 4, 122
then select like this
select count(distinct id), id
from a
where val in (22, 122)
group by id
having count(id) > 1
EDIT: count(distinct id) will only show distinct counts..
EDIT:
Here's a sqlfiddle example (thanks to Mark Kremers):
http://sqlfiddle.com/#!3/df201/1
create table mytable (field1 int, field2 int)
go
insert into mytable values (199201, 84)
insert into mytable values (199201, 96)
insert into mytable values (199201, 102)
insert into mytable values (199201, 103)
insert into mytable values (581424, 96)
insert into mytable values (581424, 84)
insert into mytable values (581424, 106)
insert into mytable values (581424, 122)
insert into mytable values (687368, 79)
insert into mytable values (687368, 96)
insert into mytable values (687368, 102)
insert into mytable values (687368, 104)
insert into mytable values (687368, 106)
Here's the query:
select distinct a.field1 from
( select field1 from mytable where field2=96) a
inner join
( select field1 from mytable where field2=102) b
on a.field1 = b.field1
And here are the results:
FIELD1
199201
687368
Finally, here's a simplified version of the query (thanks to pst):
select distinct a.field1 from mytable a
inner join mytable b
on a.field1 = b.field1
where a.field2=96 and b.field2=102
Use a self-join? Not the most tidy, but I think it works well for 2 values
SELECT *
FROM T R1
JOIN T R2 -- join table with itself
ON R1.F1 = R2.F1 -- where the first field is the same
WHERE R1.F2 = 96 AND R2.F2 = 102 -- and each has one of the required values
(T = Table, Rx = Relation Alias, Fx = Field)
If there can be an arbitrary number of required values, this can be solved as
CREATE TABLE #T (id int, val int)
GO
INSERT INTO #T (id, val)
VALUES
(1, 22), (1, 22), -- no, only 22 (but 2 records)
(2, 22), (2, 122), -- yes, both values (only)
(3, 122), -- no, only 122
(4, 22), (4,122), -- yes, both values ..
(4, 444), (4, null), -- and extra values
(5, 555) -- no, neither value
GO
-- Using DISTINCT over filtered results first, as
-- SQL Server 2008 does not support HAVING COUNT(DISTINCT F1, F2)
SELECT id
FROM (SELECT DISTINCT id, val
FROM #T
WHERE val IN (22, 122)) AS R1
GROUP BY id
HAVING COUNT(id) >= 2 -- or 3 or ..
GO
-- Or a similar variation, as can COUNT(DISTINCT ..)
-- in the SELECT of a GROUP BY
SELECT id
FROM (SELECT id, COUNT(DISTINCT val) as ct
FROM #T
WHERE val IN (22, 122)
GROUP BY id) AS R1
WHERE ct >= 2 -- or 3 or ..
GO
For larger IN (..) sizes, say above 20 values, it may be advisable to use a separate table or table-value and a JOIN for performance reasons.
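A minimal sketch of that approach, assuming a @required table variable holding the wanted values and the #T table from the example above:
-- Join against a table of required values and demand a full match count.
DECLARE @required TABLE (val int)
INSERT INTO @required (val) VALUES (22), (122)   -- add as many values as needed

SELECT t.id
FROM #T AS t
JOIN @required AS r ON r.val = t.val
GROUP BY t.id
HAVING COUNT(DISTINCT t.val) = (SELECT COUNT(*) FROM @required)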
Try this, based on your original query:
SELECT DISTINCT Field1
FROM MYTABLE
WHERE rtrim(ltrim(cast(Field2 as varchar))) IN ('96','102')

TSQL: Remove duplicates based on max(date)

I am searching for a query that selects the maximum date (a datetime column) per id and keeps that row's id and row_id. The desire is to DELETE all the other rows from the source table.
Source Data
id date row_id(unique)
1 11/11/2009 1
1 12/11/2009 2
1 13/11/2009 3
2 1/11/2009 4
Expected Survivors
1 13/11/2009 3
2 1/11/2009 4
What query would I need to achieve the results I am looking for?
Tested on PostgreSQL:
delete from table where (id, date) not in (select id, max(date) from table group by id);
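T-SQL has no (id, date) NOT IN (...) row-value syntax, so on SQL Server roughly the same idea could be sketched with EXISTS, assuming the table and column names used elsewhere in the answers:
-- Delete every row for which a later date exists for the same id,
-- which leaves only the max(date) row per id.
DELETE s
FROM yourTable AS s
WHERE EXISTS (
    SELECT 1
    FROM yourTable AS k
    WHERE k.id = s.id
      AND k.[date] > s.[date]
)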
There are various ways of doing this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare each row in your table to the ones you want to keep
- Delete any that don't match
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].id = [source].id
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
DELETE
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
DELETE
[source]
FROM
yourTable AS [source]
WHERE
NOT EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) != [source].date)
Because you are using SQL Server 2000, you're not able to use the ROW_NUMBER() OVER technique to set up a sequence and identify the top row for each unique id.
So, your proposed technique is to use a datetime column to get the top 1 row to remove duplicates. That might work, but there is a possibility that you might still get duplicates having the same datetime value. But that's easy enough to check for.
First check the assumption that all rows are unique based on the id and date columns:
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
This example returns a value of 2, indicating that you will still end up with duplicates even after using the date column to remove duplicates. If the query returns no rows, then you have proven that your proposed technique will work.
When de-duping production data, I think one should take some precautions and test before and after. You should create a table to hold the rows you plan to remove so you can recover them easily if you need to after the delete statement has been executed.
Also, it's a good idea to know beforehand how many rows you plan to remove so you can verify the count before and after - and you can gauge the magnitude of the delete operation. Based on how many rows will be affected, you can plan when to run the operation.
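A minimal sketch of that precaution, using the #TestTable example above (#TestTable_backup is a hypothetical name, and the keep-the-latest-date-per-id rule mirrors the delete queries above):
-- Keep a copy of the rows that are about to be deleted so they can be restored.
SELECT t.*
INTO #TestTable_backup
FROM #TestTable AS t
WHERE EXISTS (
    SELECT 1
    FROM #TestTable AS k
    WHERE k.thisid = t.thisid
      AND k.thisdate > t.thisdate
)

-- Counts for before/after verification.
SELECT (SELECT COUNT(*) FROM #TestTable) AS total_rows,
       (SELECT COUNT(*) FROM #TestTable_backup) AS rows_to_delete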
To test before the de-duping process, find the occurrences.
-- Get occurrences of duplicates
SELECT COUNT(*) AS thiscount
FROM
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
That gives you the ids that appear in more than one row. Capture the rows from this query into a temporary table and then run a query using SUM to get the total number of rows that are not unique based on your key.
To get the number of rows you plan to delete, you need the count of rows that are duplicate based on your unique key, and the number of distinct rows based on your unique key. You subtract the distinct rows from the count of occurrences. All that is pretty straightforward - so I'll leave you to it.
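As a sketch of that arithmetic against the #TestTable example (rows to delete = occurrences in duplicated groups minus the number of distinct keys in those groups):
SELECT SUM(cnt) - COUNT(*) AS rows_to_delete
FROM (
    SELECT thisid, COUNT(*) AS cnt
    FROM #TestTable
    GROUP BY thisid
    HAVING COUNT(*) > 1
) AS dups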
Try this
declare @t table (id int, dt DATETIME, rowid INT IDENTITY(1,1))
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
Query:
delete from @t where rowid not in(
select t.rowid from @t t
inner join(
select MAX(dt)maxdate
from @t
group by id) X
on t.dt = X.maxdate )
select * from @t
Output:
id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4
delete from temp where row_id not in (
select t.row_id from temp t
right join
(select id,MAX(dt) as dt from temp group by id) d
on t.dt = d.dt and t.id = d.id)
I have tested this answer..
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
select * from @t
;WITH T AS(
select dense_rank() over(partition by id order by dt desc) NO, dt, id, rowid from @t )
DELETE T WHERE NO>1