The simplest way to remove duplicate rows in t-sql [closed] - tsql

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
What is the simplest way to remove duplicates from a table in a T-SQL query?
The columns are A and B.
One-liners are most welcome.

How about something like this:
DECLARE @TABLE TABLE(
A VARCHAR(10),
B VARCHAR(10)
)
INSERT INTO @TABLE VALUES
('1','1'),
('1','2'),
('2','2'),
('1','1'),
('1','2'),
('2','2')
SELECT *
FROM @TABLE
-- Number the copies within each (A, B) group; every row numbered above 1 is a duplicate
;WITH Vals AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY A,B ORDER BY A,B) ROWID
FROM @TABLE
)
-- Deleting through the CTE deletes the underlying rows of @TABLE
DELETE
FROM Vals
WHERE ROWID > 1
SELECT *
FROM @TABLE
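If you want to try the ROW_NUMBER idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. It is SQLite rather than T-SQL: it assumes SQLite 3.25 or later (for window functions), and because SQLite cannot delete through a CTE, the duplicate rows are matched back to the table by rowid. The table name t is illustrative.

```python
import sqlite3

# Sketch only: SQLite, not SQL Server. Assumes SQLite >= 3.25 (window
# functions). SQLite cannot DELETE through a CTE, so the numbered rows
# are matched back to the base table by rowid instead.
DEDUPE_SQL = """
DELETE FROM t
WHERE rowid IN (
    SELECT rid FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY rowid) AS rn
        FROM t
    )
    WHERE rn > 1  -- every copy after the first one in its (a, b) group
)
"""


def dedupe_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a TEXT, b TEXT)")
    conn.executemany(
        "INSERT INTO t (a, b) VALUES (?, ?)",
        [("1", "1"), ("1", "2"), ("2", "2"), ("1", "1"), ("1", "2"), ("2", "2")],
    )
    conn.execute(DEDUPE_SQL)
    return conn.execute("SELECT a, b FROM t ORDER BY a, b").fetchall()


print(dedupe_demo())  # [('1', '1'), ('1', '2'), ('2', '2')]
```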

From the answer to "How to remove duplicate values from MySQL table" (this assumes an id column and keeps the lowest id for each name):
DELETE a FROM tbl a
LEFT JOIN
(
SELECT MIN(id) AS id, name
FROM tbl
GROUP BY name
) b ON a.id = b.id AND a.name = b.name
WHERE b.id IS NULL
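The same keep-the-lowest-id idea as a sketch in SQLite via Python's sqlite3 module (the table tbl and its sample rows are hypothetical). SQLite's DELETE has no JOIN clause, so the LEFT JOIN ... IS NULL anti-join is written as NOT IN. That is only safe because MIN(id) is never NULL for a non-null id: if the subquery returned a NULL, NOT IN would match nothing and no rows would be deleted.

```python
import sqlite3

# Sketch in SQLite: DELETE has no JOIN clause there, so the
# LEFT JOIN ... WHERE b.id IS NULL anti-join becomes NOT IN against
# the ids that should survive (the lowest id for each name).
KEEP_MIN_ID_SQL = """
DELETE FROM tbl
WHERE id NOT IN (SELECT MIN(id) FROM tbl GROUP BY name)
"""


def keep_min_id_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO tbl (id, name) VALUES (?, ?)",
        [(1, "x"), (2, "y"), (3, "x"), (4, "y"), (5, "z")],  # hypothetical rows
    )
    conn.execute(KEEP_MIN_ID_SQL)
    return conn.execute("SELECT id, name FROM tbl ORDER BY id").fetchall()


print(keep_min_id_demo())  # [(1, 'x'), (2, 'y'), (5, 'z')]
```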

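The following SQL will do the job:
DECLARE @TEMP AS TABLE(a VARCHAR(100), b VARCHAR(100))
-- Copy one row per (a, b) pair into the table variable
INSERT INTO @TEMP (a, b)
SELECT a, b FROM My_Table
GROUP BY a, b
-- Empty the original table and copy the distinct rows back
DELETE My_Table
INSERT INTO My_Table (a, b)
SELECT a, b
FROM @TEMP
SELECT * FROM My_Table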
http://sqlfiddle.com/#!3/5fdb8/1
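A sketch of the copy-out / empty / copy-back approach in SQLite through Python's sqlite3 module, assuming a TEMP table in place of the T-SQL table variable and SELECT DISTINCT in place of GROUP BY a, b. The one change worth making in any dialect is to run the DELETE and the INSERT in a single transaction, so a failure between them cannot leave the table empty.

```python
import sqlite3


def rebuild_distinct(conn: sqlite3.Connection) -> None:
    """Copy the distinct (a, b) rows out, empty my_table, copy them back."""
    # A TEMP table stands in for the T-SQL table variable; SELECT DISTINCT
    # gives the same one-row-per-pair result as GROUP BY a, b.
    conn.execute("CREATE TEMP TABLE staging AS SELECT DISTINCT a, b FROM my_table")
    # The DELETE and INSERT below share one transaction: the with-block
    # commits if both succeed and rolls back if either raises.
    with conn:
        conn.execute("DELETE FROM my_table")
        conn.execute("INSERT INTO my_table (a, b) SELECT a, b FROM staging")
    conn.execute("DROP TABLE staging")


def rebuild_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (a TEXT, b TEXT)")
    conn.executemany(
        "INSERT INTO my_table (a, b) VALUES (?, ?)",
        [("1", "1"), ("1", "1"), ("2", "3"), ("2", "3"), ("2", "4")],
    )
    conn.commit()
    rebuild_distinct(conn)
    return conn.execute("SELECT a, b FROM my_table ORDER BY a, b").fetchall()


print(rebuild_demo())  # [('1', '1'), ('2', '3'), ('2', '4')]
```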

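Try this to list each duplicated (A, B) pair with its number of copies (it finds the duplicates, but does not delete them):
SELECT A, B, COUNT(*)
FROM [table]
GROUP BY A, B
HAVING COUNT(*) > 1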
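A quick sketch (SQLite via Python's sqlite3 module, with hypothetical table and sample rows) showing what this kind of query returns: one row per duplicated pair plus its number of copies, which is handy for checking before you delete anything.

```python
import sqlite3

# Lists each duplicated (a, b) pair with its number of copies; nothing is deleted.
FIND_DUPES_SQL = """
SELECT a, b, COUNT(*) AS copies
FROM my_table
GROUP BY a, b
HAVING COUNT(*) > 1
ORDER BY a, b
"""


def find_dupes_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (a TEXT, b TEXT)")
    conn.executemany(
        "INSERT INTO my_table (a, b) VALUES (?, ?)",
        [("1", "1"), ("1", "1"), ("1", "2"), ("2", "2"), ("2", "2"), ("2", "2")],
    )
    return conn.execute(FIND_DUPES_SQL).fetchall()


print(find_dupes_demo())  # [('1', '1', 2), ('2', '2', 3)]
```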

Related

Delete duplicate rows with different values in columns

I couldn't find my case on the Internet. How can I delete duplicates when the same values appear in different (swapped) columns?
I have a table with a lot of values, for example:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
|89417978|89417980|
|89417979|89417980|
I need to exclude the duplicates and keep only these rows in the result:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
min/max does not work here, as the values may be different.
I tried UNIONs, self-joins and excluding results via temporary tables, but I always end up back where I started.
Assuming id1 and id2 are the primary key columns, you could try this:
DECLARE @tbl table (id1 int, id2 int )
INSERT INTO @tbl
SELECT 89417980, 89417978
UNION SELECT 89417980, 89417979
UNION SELECT 89417978, 89417980
UNION SELECT 89417979, 89417980
SELECT * FROM @tbl
;WITH CTE AS (--Get comparable value as "cs": the same for (x, y) and (y, x)
-- note: CHECKSUM can collide for different pairs
SELECT
IIF(id1 > id2, CHECKSUM(id1, id2), CHECKSUM(id2,id1)) as cs
, id1
, id2
, ROW_NUMBER() OVER (order by id1, id2) as rn
FROM @tbl
)
, CTE2 AS ( --Get rows to keep: one per cs group
-- (no HAVING COUNT(*) > 1 here, otherwise pairs without a mirror would be deleted)
SELECT MAX (rn) as rn
FROM CTE
GROUP BY cs
)
DELETE tbl -- Delete all except the rows to keep
FROM @tbl tbl
WHERE NOT EXISTS(SELECT 1
FROM CTE2
JOIN CTE ON CTE.rn = CTE2.rn
WHERE CTE.id1 = tbl.id1
AND CTE.id2 = tbl.id2
)
SELECT * FROM @tbl
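A different technique, sketched in SQLite via Python's sqlite3 module: compare the pairs exactly instead of through CHECKSUM. A row is deleted only if its mirror image exists and it is the id1 < id2 orientation, which reproduces the expected result above. It assumes each (id1, id2) orientation appears at most once, as the primary key guarantees. The extra (1, 5) row is hypothetical and shows that rows without a mirror survive.

```python
import sqlite3

# Swapped-in technique, not the CHECKSUM answer: compare the pair exactly.
# A row is deleted only when its mirror (id2, id1) also exists and it is
# the id1 < id2 orientation, so the id1 > id2 row survives and rows
# without a mirror are never touched.
DROP_MIRRORS_SQL = """
DELETE FROM pairs
WHERE id1 < id2
  AND EXISTS (
      SELECT 1 FROM pairs AS p2
      WHERE p2.id1 = pairs.id2 AND p2.id2 = pairs.id1
  )
"""


def drop_mirrors_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pairs (id1 INTEGER, id2 INTEGER, PRIMARY KEY (id1, id2))"
    )
    conn.executemany(
        "INSERT INTO pairs (id1, id2) VALUES (?, ?)",
        [
            (89417980, 89417978),
            (89417980, 89417979),
            (89417978, 89417980),
            (89417979, 89417980),
            (1, 5),  # hypothetical extra row with no mirror
        ],
    )
    conn.execute(DROP_MIRRORS_SQL)
    return conn.execute("SELECT id1, id2 FROM pairs ORDER BY id1, id2").fetchall()


print(drop_mirrors_demo())  # [(1, 5), (89417980, 89417978), (89417980, 89417979)]
```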

T-SQL - Pivot/Crosstab - variable number of values

I have a simple data set that looks like this:
Name Code
A A-One
A A-Two
B B-One
C C-One
C C-Two
C C-Three
I want to output it so it looks like this:
Name Code1 Code2 Code3 Code4 Code...n ...
A A-One A-Two
B B-One
C C-One C-Two C-Three
For each of the 'Name' values, there can be an undetermined number of 'Code' values.
I have been looking at various examples of PIVOT SQL (including simple PIVOT queries and SQL using the XML function?), but I have not been able to figure this out, or even to tell whether it is possible.
I would appreciate any help or pointers.
Thanks!
Try it like this:
DECLARE @tbl TABLE([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO @tbl VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
SELECT p.*
FROM
(
SELECT *
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM @tbl
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (Code1,Code2,Code3,Code4,Code5 /*add as many as you need*/)
)p;
This line
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
uses a ROW_NUMBER partitioned by Name to create numbered column names (Code1, Code2, ...) within each name. The rest is a simple PIVOT.
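SQLite has no PIVOT operator, so this sketch (Python's sqlite3 module, SQLite 3.25+ assumed) shows the same numbering trick with conditional aggregation instead: ROW_NUMBER() numbers the codes per name and MAX(CASE ...) picks the code for each position. Note that ORDER BY code sorts alphabetically, so C-Three comes before C-Two, exactly as in the T-SQL version; order by a sequence column instead if your table has one.

```python
import sqlite3

# SQLite has no PIVOT operator, so this uses conditional aggregation:
# ROW_NUMBER() numbers the codes within each name, and each codeN column
# picks the code at position N with MAX(CASE ...). Missing positions are NULL.
PIVOT_SQL = """
SELECT name,
       MAX(CASE WHEN rn = 1 THEN code END) AS code1,
       MAX(CASE WHEN rn = 2 THEN code END) AS code2,
       MAX(CASE WHEN rn = 3 THEN code END) AS code3
FROM (
    SELECT name, code,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY code) AS rn
    FROM codes
)
GROUP BY name
ORDER BY name
"""

ROWS = [("A", "A-One"), ("A", "A-Two"), ("B", "B-One"),
        ("C", "C-One"), ("C", "C-Two"), ("C", "C-Three")]


def pivot_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE codes (name TEXT, code TEXT)")
    conn.executemany("INSERT INTO codes (name, code) VALUES (?, ?)", ROWS)
    return conn.execute(PIVOT_SQL).fetchall()


for row in pivot_demo():
    print(row)
# ('A', 'A-One', 'A-Two', None)
# ('B', 'B-One', None, None)
# ('C', 'C-One', 'C-Three', 'C-Two')
```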
UPDATE: a dynamic approach that adapts to the maximum number of codes per group
CREATE TABLE TblTest([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO TblTest VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
DECLARE @cols VARCHAR(MAX);
WITH GetMaxCount(mc) AS(SELECT TOP 1 COUNT([Code]) FROM TblTest GROUP BY [Name] ORDER BY COUNT([Code]) DESC)
SELECT @cols=STUFF(
(
SELECT CONCAT(',Code',Nmbr)
FROM
(SELECT TOP((SELECT mc FROM GetMaxCount)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) t(Nmbr)
ORDER BY Nmbr -- keep the list in Code1, Code2, ... order
FOR XML PATH('')
),1,1,'');
DECLARE @sql VARCHAR(MAX)=
'SELECT p.*
FROM
(
SELECT *
,CONCAT(''Code'',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM TblTest
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (' + @cols + ')
)p;';
EXEC(@sql);
GO
DROP TABLE TblTest;
As you can see, the only part that has to change to reflect the actual number of columns is the list in PIVOT's IN() clause.
You can create a string that looks like Code1,Code2,Code3,...,CodeN, build the statement dynamically and run it with EXEC().
I'd prefer the first approach: dynamically created SQL is very powerful, but it can be a pain in the neck too.
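The dynamic variant can be sketched the same way in Python with sqlite3 (an illustration of the idea, not the T-SQL code above): count the largest group, generate one MAX(CASE ...) column per position, then execute the generated statement, which plays the role of EXEC(@sql). The column list is built from integers only, so no table data is spliced into the SQL text. The helper name dynamic_pivot is hypothetical.

```python
import sqlite3

ROWS = [("A", "A-One"), ("A", "A-Two"), ("B", "B-One"),
        ("C", "C-One"), ("C", "C-Two"), ("C", "C-Three")]


def dynamic_pivot(conn: sqlite3.Connection) -> tuple:
    # Step 1: the largest group decides how many codeN columns are needed.
    (max_count,) = conn.execute(
        "SELECT COALESCE(MAX(n), 0) FROM (SELECT COUNT(*) AS n FROM codes GROUP BY name)"
    ).fetchone()
    # Step 2: build the column list from integers only, so no table data
    # is ever spliced into the SQL text.
    cols = [f"MAX(CASE WHEN rn = {i} THEN code END) AS code{i}"
            for i in range(1, max_count + 1)]
    select_list = ", ".join(["name"] + cols)
    # Step 3: run the generated statement (the EXEC(@sql) step in T-SQL).
    cur = conn.execute(
        f"""
        SELECT {select_list}
        FROM (
            SELECT name, code,
                   ROW_NUMBER() OVER (PARTITION BY name ORDER BY code) AS rn
            FROM codes
        )
        GROUP BY name
        ORDER BY name
        """
    )
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


def dynamic_pivot_demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE codes (name TEXT, code TEXT)")
    conn.executemany("INSERT INTO codes (name, code) VALUES (?, ?)", ROWS)
    before, _ = dynamic_pivot(conn)
    # Give A two more codes: the generated statement grows a code4 column.
    conn.executemany("INSERT INTO codes (name, code) VALUES (?, ?)",
                     [("A", "A-Three"), ("A", "A-Four")])
    after, rows = dynamic_pivot(conn)
    return before, after, rows


before, after, rows = dynamic_pivot_demo()
print(before)  # ['name', 'code1', 'code2', 'code3']
print(after)   # ['name', 'code1', 'code2', 'code3', 'code4']
```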

SQL Server : Update order, Why does it work and can I trust it?

I am using SQL Server 2012
There is a "magic query" that I don't understand: using a variable, I update a temporary table and let each row use the value already calculated for the previous row.
It sets rolMul to the running product of the ids up to the current row.
Can I trust this method?
Why does it work in the first place?
If I can't trust it what alternatives can I use?
-- Create data to work on
select * into #Temp from (
select 1 as id, null as rolMul ) A
insert into #temp select 2 as id, null as rolMul
insert into #temp select 3 as id, null as rolMul
insert into #temp select 4 as id, null as rolMul
insert into #temp select 5 as id, null as rolMul
------Here is the magic I don't understand why it's working -----
declare @rolMul int = 1
update #temp set @rolMul = "rolMul" = @rolMul * id from #temp
select * from #temp
-- you can see it did what I wanted: each row holds the product of all ids so far
drop table #temp
What bothers me is:
Why does it work? Can I trust it to work?
What about the order? What if the table was not ordered:
select * into #Temp from (
select 3 as id, null as rolMul ) A
insert into #temp select 1 as id, null as rolMul
insert into #temp select 5 as id, null as rolMul
insert into #temp select 2 as id, null as rolMul
insert into #temp select 4 as id, null as rolMul
declare @rolMul int = 1
update #temp set @rolMul = "rolMul" = @rolMul * id from #temp
select * from #temp order by id
drop table #Temp
go
If I can't trust it what alternatives can I use?
As of SQL Server 2012, you can use an efficient rolling sum of logarithms.
WITH tempcte AS (
SELECT
id,
rolmul,
ROUND(EXP(SUM(LOG(id)) OVER (ORDER BY id)), 0) AS setval -- round before the implicit float-to-int conversion truncates
FROM #Temp
)
UPDATE tempcte
SET rolmul = setval;
SQL Server 2012 adds ORDER BY (and ROWS/RANGE framing) to the OVER clause of aggregate functions such as SUM. Ole Michelsen shows with a brief example how this efficiently solves the running total problem.
The product law of logarithms says that the log of the product of two numbers is equal to the sum of the log of each number.
This identity allows us to use the fast sum to calculate multiplications at similar speed. Take the log before the sum and take the exponent of the result, and you have your answer!
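Written out, the identity is below. For example, ids 1 to 5 give ln 1 + ln 2 + ln 3 + ln 4 + ln 5 = ln 120, and exponentiating returns 120. It only holds for positive values, because the logarithm is undefined at zero and for negative numbers.

```latex
\ln\Big(\prod_{i=1}^{n} x_i\Big) = \sum_{i=1}^{n} \ln x_i
\quad\Longrightarrow\quad
\prod_{i=1}^{n} x_i = \exp\Big(\sum_{i=1}^{n} \ln x_i\Big), \qquad x_i > 0
```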
SQL Server gives you LOG and EXP to calculate the natural logarithm (base e) and its exponential. It doesn't matter what base you use as long as you are consistent.
The updatable common table expression is necessary because windowed functions can only appear in the SELECT or ORDER BY clauses, not in the SET clause of an UPDATE statement.
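The query is reliably correct for small numbers of rows, but it overflows very quickly: with ids 1, 2, 3, ... the int column in this example overflows at the 13th row (13! = 6,227,020,800 > 2,147,483,647), and 64 rows of 2 would bust even a bigint!
In theory this should produce the correct result as long as the ids are unique and positive (LOG raises an error for zero or negative input). In practice, I think your set of ids will always be small :-)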
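Here is a sketch of the log-sum trick in SQLite via Python's sqlite3 module, fed with the unordered ids from the question. Two assumptions: SQLite may be built without LN/EXP, so Python's math.log and math.exp are registered under the hypothetical names py_ln and py_exp; and SQLite cannot update through a CTE, so the values are written back by id. ROUND is there because the exponent comes back as a float that may sit just below the exact integer, and truncating it could be off by one.

```python
import math
import sqlite3


def running_products(ids: list) -> list:
    conn = sqlite3.connect(":memory:")
    # SQLite only ships LN/EXP when built with math functions, so Python's
    # math.log and math.exp are registered under hypothetical names.
    conn.create_function("py_ln", 1, math.log)
    conn.create_function("py_exp", 1, math.exp)
    conn.execute("CREATE TABLE temp_ids (id INTEGER PRIMARY KEY, rolmul INTEGER)")
    conn.executemany("INSERT INTO temp_ids (id) VALUES (?)", [(i,) for i in ids])
    # exp(running sum of ln(id)) = running product of id; ROUND removes the
    # float error that int() would otherwise truncate away.
    products = conn.execute(
        """
        SELECT id, ROUND(py_exp(SUM(py_ln(id)) OVER (ORDER BY id)))
        FROM temp_ids
        """
    ).fetchall()
    # SQLite cannot UPDATE through a CTE, so write the values back by id.
    conn.executemany(
        "UPDATE temp_ids SET rolmul = ? WHERE id = ?",
        [(int(p), i) for i, p in products],
    )
    return conn.execute("SELECT id, rolmul FROM temp_ids ORDER BY id").fetchall()


print(running_products([3, 1, 5, 2, 4]))
# [(1, 1), (2, 2), (3, 6), (4, 24), (5, 120)]
```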

postgresql where clause behavior

I wrote two queries that I thought should return the same result:
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
WHERE table2.value = '1')
AS result1 ORDER BY id1)
AS result2;
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
)
AS result1 ORDER BY id1)
AS result2
WHERE value = '1';
The only difference is that one has the WHERE clause inside the SELECT DISTINCT ON subquery, and the other has it outside, in the SELECT COUNT query. But the results were not the same. I don't understand why the position of the WHERE clause should make a difference in this case. Can anyone explain? Or is there a better way to phrase this question?
Here's a good way to look at this:
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a;
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a
WHERE value = 2;
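DISTINCT ON keeps one row per id, and when the ORDER BY does not decide between rows with the same id, which row survives is unpredictable. With the WHERE inside, the filter runs first, so every id that has at least one row with value = '1' is counted. With the WHERE outside, DISTINCT ON has already picked one row per id, and the id is only counted if the row it happened to keep has value = '1'. It is behavior by design.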
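The effect can be reproduced in SQLite with Python's sqlite3 module. SQLite has no DISTINCT ON, so this sketch uses ROW_NUMBER() ... = 1 in its place, with ORDER BY value DESC to make the surviving row deterministic; the table src and its rows are hypothetical. Filtering first counts both ids that have a '1'; filtering afterwards counts only the id whose surviving row happens to be '1'.

```python
import sqlite3

# SQLite has no DISTINCT ON; ROW_NUMBER() ... = 1 plays its role here.
# ORDER BY value DESC pins down which row survives per id (PostgreSQL's
# DISTINCT ON (id1) ... ORDER BY id1 leaves that choice unpredictable).
FILTER_BEFORE = """
SELECT COUNT(*) FROM (
    SELECT id, value,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY value DESC) AS rn
    FROM src
    WHERE value = '1'  -- filter first, then keep one row per id
) WHERE rn = 1
"""

FILTER_AFTER = """
SELECT COUNT(*) FROM (
    SELECT id, value,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY value DESC) AS rn
    FROM src
) WHERE rn = 1 AND value = '1'  -- keep one row per id, then filter
"""


def compare_filters() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER, value TEXT)")
    conn.executemany(
        "INSERT INTO src (id, value) VALUES (?, ?)",
        [(1, "1"), (1, "2"), (2, "1"), (3, "2")],  # hypothetical sample rows
    )
    before = conn.execute(FILTER_BEFORE).fetchone()[0]
    after = conn.execute(FILTER_AFTER).fetchone()[0]
    return before, after


print(compare_filters())  # (2, 1)
```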

PostgreSQL Record Reordering using Update with a Sub-Select

I found this solution on the SQL Server forum on how to reorder records in a table.
UPDATE SomeTable
SET rankcol = SubQuery.Sort_Order
FROM
(
SELECT IDCol, Row_Number() OVER (ORDER BY ValueCOL) as SORT_ORDER
FROM SomeTable
) SubQuery
INNER JOIN SomeTable ON
SubQuery.IDCol = SomeTable.IDCol
When I try doing the same on PostgreSQL, I get an error message -
ERROR: table name "sometable" specified more than once
Any help will be appreciated.
Thanks!
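You don't need to join SomeTable explicitly, how cool is that? :) In PostgreSQL the table being updated can be referenced directly in the WHERE clause; listing it again in FROM is what triggers the "specified more than once" error.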
UPDATE SomeTable
SET rankcol = SubQuery.Sort_Order
FROM
(
SELECT IDCol, Row_Number() OVER (ORDER BY ValueCOL) as SORT_ORDER
FROM SomeTable
) SubQuery
where SubQuery.IDCol = SomeTable.IDCol
Remark: PostgreSQL folds unquoted identifiers to lower case (which is why the error message says "sometable"), so it is better to use lower-case names like row_number, sort_order, id_col, etc.
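For comparison, the same statement runs almost unchanged on SQLite 3.33 or later, which adopted the PostgreSQL-style UPDATE ... FROM. A minimal sketch via Python's sqlite3 module, with hypothetical lower-case names and sample rows:

```python
import sqlite3

# Assumption: SQLite >= 3.33, whose UPDATE ... FROM follows the same
# PostgreSQL-style form as the answer above: the target table is not
# repeated in FROM, and the WHERE clause ties the subquery to it.
RERANK_SQL = """
UPDATE some_table
SET rank_col = sub.sort_order
FROM (
    SELECT id_col, row_number() OVER (ORDER BY value_col) AS sort_order
    FROM some_table
) AS sub
WHERE sub.id_col = some_table.id_col
"""


def rerank_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE some_table "
        "(id_col INTEGER PRIMARY KEY, value_col TEXT, rank_col INTEGER)"
    )
    conn.executemany(
        "INSERT INTO some_table (id_col, value_col) VALUES (?, ?)",
        [(1, "pear"), (2, "apple"), (3, "fig")],
    )
    conn.execute(RERANK_SQL)
    return conn.execute(
        "SELECT id_col, value_col, rank_col FROM some_table ORDER BY rank_col"
    ).fetchall()


print(rerank_demo())  # [(2, 'apple', 1), (3, 'fig', 2), (1, 'pear', 3)]
```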