Deleting duplicate entry from table - db2

Suppose I have a table as follows (on DB2 9.7.2):
COL1        COL2       COL3
----------- ---------- ----------
          3          4 xyz
          3          4 xyz
Now I want to write a query that deletes only one of these two identical records. How can I achieve this?
I can think of:
delete from <table>;
or
delete from <table> where col1=3;
but both of these queries delete both records, whereas I want to keep one of them.

If LIMIT doesn't work, this will:
DELETE FROM (SELECT * FROM tbl WHERE col = 3 FETCH FIRST ROW ONLY)

Can't you use a limit clause?
DELETE FROM <table> WHERE <column>=3 LIMIT 1

This is something that served my purpose:
DELETE FROM tabA M
WHERE M.tabAky IN (SELECT tabAky
                   FROM (SELECT tabAky,
                                ROW_NUMBER() OVER (PARTITION BY tabAcol1,
                                                                tabAcol2,
                                                                tabAcoln)
                         FROM tabA a) AS X (tabAky, ROWNUM)
                   WHERE ROWNUM > 1);

Try this:
DELETE FROM (SELECT ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS rn FROM tbl) AS A WHERE A.rn > 1
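
For anyone who wants to try the "keep one row per group of identical rows" idea without a DB2 instance, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not DB2 syntax: SQLite's implicit rowid is assumed here to play the role that the key column tabAky plays in the answer above, and the table name tbl and the extra (5, 6, 'abc') row are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table mirroring the question's sample data,
# plus one non-duplicated row to show that it is left alone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(3, 4, "xyz"), (3, 4, "xyz"), (5, 6, "abc")],
)

# Keep the lowest rowid in each group of identical rows, delete the rest.
# rowid is SQLite-specific and stands in for a unique key column.
conn.execute(
    """
    DELETE FROM tbl
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM tbl GROUP BY col1, col2, col3
    )
    """
)

remaining = conn.execute(
    "SELECT col1, col2, col3 FROM tbl ORDER BY col1"
).fetchall()
print(remaining)
```

On DB2 itself you would need a real unique column (or one of the fullselect-based deletes shown above), since this sketch relies on SQLite's rowid.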

Related

Selecting row(s) that have distinct count (one) of certain column

I have following dataset:
org system_id punch_start_tb1 punch_start_tb2
CG 100242 2022-08-16T00:08:00Z 2022-08-16T03:08:00Z
LA 250595 2022-08-16T00:00:00Z 2022-08-16T03:00:00Z
LB 300133 2022-08-15T04:00:00Z 2022-08-16T04:00:00Z
LB 300133 2022-08-16T04:00:00Z 2022-08-15T04:00:00Z
MO 400037 2022-08-15T14:00:00Z 2022-08-15T23:00:00Z
MO 400037 2022-08-15T23:00:00Z 2022-08-15T14:00:00Z
I am trying to filter the data so that the result contains only rows whose system_id occurs exactly once (count of system_id = 1).
So the expected outcome would be only the following two rows:
org system_id punch_start_tb1 punch_start_tb2
CG 100242 2022-08-16T00:08:00Z 2022-08-16T03:08:00Z
LA 250595 2022-08-16T00:00:00Z 2022-08-16T03:00:00Z
I tried GROUP BY with a HAVING clause, but without success.

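You can try the query below. It counts the rows per system_id with COUNT(*) OVER and keeps only the groups of size one (ROW_NUMBER() with RN = 1 would also return one row from each duplicated system_id):
SELECT org, system_id, punch_start_tb1, punch_start_tb2
FROM
(
    SELECT org, system_id, punch_start_tb1, punch_start_tb2
    , COUNT(*) OVER (PARTITION BY system_id) AS cnt
    FROM <TableName>
) X
WHERE cnt = 1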

The CTE returns each org that has only one record; joining it back to the main table on the org column then returns the full rows.
;WITH CTE AS (
select org
from <table_name>
group by org
Having count(1) = 1
)
select t.*
from cte
inner join <table_name> t on cte.org = t.org

You can try this (use min because we have only one row):
select MIN(org), system_id, MIN(punch_start_tb1), MIN(punch_start_tb2)
from <table_name>
group by system_id
Having count(1) = 1
Or use @Meyssam Toluie's answer, grouping by system_id.
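
The GROUP BY / HAVING COUNT = 1 approach from the answers above can be checked against the question's sample data with a small sketch. This uses Python's standard sqlite3 module purely as a convenient test bed; the table name punches is hypothetical, and the grouping is done on system_id, which is what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "punches" is a made-up table name; the columns follow the question.
conn.execute(
    "CREATE TABLE punches (org TEXT, system_id INTEGER, "
    "punch_start_tb1 TEXT, punch_start_tb2 TEXT)"
)
conn.executemany(
    "INSERT INTO punches VALUES (?, ?, ?, ?)",
    [
        ("CG", 100242, "2022-08-16T00:08:00Z", "2022-08-16T03:08:00Z"),
        ("LA", 250595, "2022-08-16T00:00:00Z", "2022-08-16T03:00:00Z"),
        ("LB", 300133, "2022-08-15T04:00:00Z", "2022-08-16T04:00:00Z"),
        ("LB", 300133, "2022-08-16T04:00:00Z", "2022-08-15T04:00:00Z"),
        ("MO", 400037, "2022-08-15T14:00:00Z", "2022-08-15T23:00:00Z"),
        ("MO", 400037, "2022-08-15T23:00:00Z", "2022-08-15T14:00:00Z"),
    ],
)

# Find the system_ids that occur exactly once, then join back to the
# table to recover the complete rows.
singles = conn.execute(
    """
    SELECT p.org, p.system_id, p.punch_start_tb1, p.punch_start_tb2
    FROM punches AS p
    JOIN (
        SELECT system_id
        FROM punches
        GROUP BY system_id
        HAVING COUNT(*) = 1
    ) AS once ON once.system_id = p.system_id
    ORDER BY p.system_id
    """
).fetchall()

for row in singles:
    print(row)
```

Only the CG and LA rows come back, matching the expected outcome in the question.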

How to select multiple columns from a table using GROUP BY (based on one column), HAVING and COUNT in a Hive query

Requirement:
Group by column A and return the records having count > 1.
E.g.:
SELECT count(sk), id, sk
FROM table x
GROUP BY id
HAVING COUNT(sk) > 1
But I am not able to select sk in the select statement. Is there any other way to do this? How can I use PARTITION BY on this input and output set attached here?

You can do something like this:
select * from (
    SELECT count(sk) over (partition by id) as cnt, id, sk
    FROM table x) a
where a.cnt > 1
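
To see what the windowed count returns, here is a small sketch run against SQLite through Python's sqlite3 module rather than Hive. The sample rows are invented, the table is simply called x, and window functions need SQLite 3.25 or later, so treat this as an illustration of the technique and not as Hive-tested code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data: id 1 occurs twice, id 2 once, id 3 three times.
conn.execute("CREATE TABLE x (id INTEGER, sk TEXT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

# COUNT(...) OVER (PARTITION BY id) attaches the group size to every row,
# so sk stays selectable without appearing in a GROUP BY.
repeated = conn.execute(
    """
    SELECT id, sk, cnt
    FROM (
        SELECT id, sk, COUNT(sk) OVER (PARTITION BY id) AS cnt
        FROM x
    ) AS a
    WHERE a.cnt > 1
    ORDER BY id, sk
    """
).fetchall()

for row in repeated:
    print(row)
```

The row for id 2 is filtered out because its partition holds a single record.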

How to Select the Maximum value occurring 2 times?

Suppose I have a table of values looking like this:
Sample_Number |
-------------------
1 |
1 |
2 |
3 |
3 |
4 |
5 |
How can I write a SELECT statement to return the maximum sample number that occurs exactly 2 times? In the sample data the value I am looking for would be 3.
I imagine there could be a number of answers to this - I am especially interested in a solution with no inner selects and that uses the Having clause (if this is possible).

You can use this query:
SELECT TOP 1 Sample_Number As MaxSampleNumberThatOccursTwice
FROM dbo.TableName
GROUP BY Sample_Number
HAVING COUNT(*) = 2
ORDER BY Sample_Number DESC

I'm sure there's an easier way to do this, but you can do it by pulling all of the Sample_Numbers with exactly two entries, and pulling the MAX() of those values:
;With Cte As
(
Select Sample_Number
From Test
Group By Sample_Number
Having Count(Sample_Number) = 2
)
Select Max(Sample_Number)
From Cte

;with cte
as
(select sample_number
from #temp
group by Sample_Number
having
count(Analysis_ID)=2
)
select max(sample_number) from cte

I would use a subselect (most databases require an alias on the derived table):
SELECT MAX(sample_number)
FROM (SELECT sample_number
      FROM TAB1
      GROUP BY sample_number
      HAVING COUNT(sample_number) = 2
     ) t

Group by Sample_Number, get the count of each group, and select only the groups whose count is 2:
select Sample_Number, count(*) as cnt from someTable
group by Sample_Number
having count(*) = 2
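
The GROUP BY ... HAVING COUNT(*) = 2 idea from the answers above, combined with a descending sort to pick the largest value, can be tried on the question's data with this sketch. It uses SQLite via Python's sqlite3 module, so LIMIT 1 takes the place of T-SQL's TOP 1; the table name samples is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "samples" is a hypothetical table holding the question's values.
conn.execute("CREATE TABLE samples (sample_number INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?)",
    [(n,) for n in (1, 1, 2, 3, 3, 4, 5)],
)

# Keep only values that occur exactly twice, then take the largest one.
# No inner select is needed; HAVING does the filtering.
row = conn.execute(
    """
    SELECT sample_number
    FROM samples
    GROUP BY sample_number
    HAVING COUNT(*) = 2
    ORDER BY sample_number DESC
    LIMIT 1
    """
).fetchone()

# fetchone() returns None when no value occurs exactly twice.
max_twice = row[0] if row is not None else None
print(max_twice)
```

For the sample data this yields 3, the value the question is looking for.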

PostgreSQL Removing duplicates

I am working on a Postgres query to remove duplicates from a table. The following table is dynamically generated, and I want to write a select query that removes a record if its value in the first column duplicates that of another row.
The table looks something like this:
1st col  2nd col
4        62
6        34
5        26
5        12
I want to write a select query that removes either row 3 or row 4.

There is no need for an intermediate table:
delete from df1
where ctid not in (select min(ctid)
from df1
group by first_column);
If you are deleting many rows from a large table, the approach with an intermediate table is probably faster.
If you just want to get unique values for one column, you can use:
select distinct on (first_column) *
from the_table
order by first_column;
Or simply
select first_column, min(second_column)
from the_table
group by first_column;

select count(first) as cnt, first, min(second) as second
from df1
group by first
having count(first) = 1
If you want to keep one of the rows (sorry, I initially missed that you wanted that):
select first, min(second)
from df1
group by first
Where the table's name is df1 and the columns are named first and second.
You can actually leave off the count(first) as cnt if you want.
At the risk of stating the obvious, once you know how to select the data you want (or don't want), deleting the records in any of a dozen ways is simple.
If you want to replace the table or make a new table you can just use create table as for the deletion:
create table tmp as
select count(first) as cnt, first, min(second) as second
from df1
group by first
having count(first) = 1;
drop table df1;
create table df1 as select * from tmp;
or using DELETE FROM:
DELETE FROM df1 WHERE first NOT IN (SELECT first FROM tmp);
You could also use select into, etc, etc.

If you want to SELECT unique rows:
SELECT * FROM ztable u
WHERE NOT EXISTS ( -- There is no other record
SELECT * FROM ztable x
WHERE x.id = u.id -- with the same id
AND x.ctid < u.ctid -- , but with a different(lower) "internal" rowid
); -- so u.* must be unique
If you want to SELECT the other rows, which were suppressed in the previous query:
SELECT * FROM ztable nu
WHERE EXISTS ( -- another record exists
SELECT * FROM ztable x
WHERE x.id = nu.id -- with the same id
AND x.ctid < nu.ctid -- , but with a different(lower) "internal" rowid
);
If you want to DELETE records, making the table unique (but keeping one record per id):
DELETE FROM ztable d
WHERE EXISTS ( -- another record exists
SELECT * FROM ztable x
WHERE x.id = d.id -- with the same id
AND x.ctid < d.ctid -- , but with a different(lower) "internal" rowid
);
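
The EXISTS-based delete above can be tried out with a small sketch in SQLite via Python's sqlite3 module. As an assumption made only for this sketch, SQLite's rowid stands in for PostgreSQL's ctid, and the second column is named val; the data is the four-row table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Columns: id is the "1st col" of the question, val (assumed name) the 2nd.
conn.execute("CREATE TABLE ztable (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO ztable VALUES (?, ?)",
    [(4, 62), (6, 34), (5, 26), (5, 12)],
)

# Delete every row for which an earlier row (lower rowid) with the same id
# exists. rowid is SQLite-specific and plays the part of ctid here.
conn.execute(
    """
    DELETE FROM ztable
    WHERE EXISTS (
        SELECT 1
        FROM ztable AS x
        WHERE x.id = ztable.id
          AND x.rowid < ztable.rowid
    )
    """
)

kept = conn.execute("SELECT id, val FROM ztable ORDER BY rowid").fetchall()
print(kept)
```

The first row inserted for each id survives, so of the two rows with id 5 only (5, 26) remains.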

So basically I did this:
create temp table t1 as
select first, min(second) as second
from df1
group by first;
select * from df1
inner join t1 on t1.first = df1.first and t1.second = df1.second;
It's a satisfactory answer. Thanks for your help, @Hack-R.

T-SQL how to count the number of duplicate rows then print the outcome?

I have a table ProductNumberDuplicates_backups, which has two columns named ProductID and ProductNumber. There are some duplicate ProductNumbers. How can I count the distinct number of products, then print out the outcome like "() products was backup."? Because this is inside a stored procedure, I have to use a variable @numrecord for the distinct number of rows. My code looks like this:
set @numrecord= select distinct ProductNumber
from ProductNumberDuplicates_backups where COUNT(*) > 1
group by ProductID
having Count(ProductNumber)>1
Print cast(@numrecord as varchar)+' product(s) were backed up.'
Obviously the error is after the = sign, as a select cannot follow it. I've searched for similar cases, but they are just select statements. Please help. Many thanks!

Try:
select @numrecord = count(distinct ProductNumber)
from ProductNumberDuplicates_backups
Print cast(@numrecord as varchar)+' product(s) were backed up.'

begin tran
create table ProductNumberDuplicates_backups (
ProductNumber int
)
insert ProductNumberDuplicates_backups(ProductNumber)
select 1
union all
select 2
union all
select 1
union all
select 3
union all
select 2
select * from ProductNumberDuplicates_backups
declare @numRecord int
select @numRecord = count(ProductNumber) from
(select ProductNumber, ROW_NUMBER()
over (partition by ProductNumber order by ProductNumber) RowNumber
from ProductNumberDuplicates_backups) p
where p.RowNumber > 1
print cast(@numRecord as varchar) + ' product(s) were backed up.'
rollback
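
The two answers above count different things: the first counts distinct product numbers, the second counts the surplus copies. A small sketch makes the difference visible. It runs on SQLite through Python's sqlite3 module, not on SQL Server, so a Python variable takes the place of the T-SQL variable and the PRINT statement; the sample values are the ones inserted in the second answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductNumberDuplicates_backups (ProductNumber INTEGER)")
conn.executemany(
    "INSERT INTO ProductNumberDuplicates_backups VALUES (?)",
    [(n,) for n in (1, 2, 1, 3, 2)],
)

# Number of distinct product numbers (what the first answer computes).
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT ProductNumber) FROM ProductNumberDuplicates_backups"
).fetchone()[0]

# Number of surplus copies, i.e. rows beyond the first of each product number
# (equivalent to counting rows with RowNumber > 1 in the second answer).
extra_copies = conn.execute(
    "SELECT COUNT(*) - COUNT(DISTINCT ProductNumber) "
    "FROM ProductNumberDuplicates_backups"
).fetchone()[0]

message = f"{distinct_count} product(s) were backed up."
print(message)
print(f"{extra_copies} surplus duplicate row(s)")
```

Which of the two numbers belongs in the message depends on whether the procedure reports products or duplicate rows.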
