How to delete duplicate rows without unique ID - postgresql

Id SleepDay TotalMinutesAsleep TotalTimeInBed
8378563200 4/20/2016 381 409
8378563200 4/21/2016 396 417
8378563200 4/22/2016 441 469
8378563200 4/23/2016 565 591
8378563200 4/24/2016 458 492
8378563200 4/25/2016 388 402 ---> this is the duplicate
8378563200 4/25/2016 388 402
8378563200 4/26/2016 550 584
8378563200 4/27/2016 531 600
This is part of my table. How can I delete the duplicate row? I used a CTE, but it deleted all records of id 8378563200 on 4/25/2016.

Use:
DELETE
FROM table1
WHERE ctid IN (SELECT ctid
               FROM (SELECT ctid,
                            ROW_NUMBER() OVER (
                                PARTITION BY Id, SleepDay, TotalMinutesAsleep, TotalTimeInBed) AS rn
                     FROM table1) t
               WHERE rn > 1);
Replace table1 with your own table name.

Without column(s) to identify a unique row?
Then you could use ctid.
ctid
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. A primary key should be used to identify logical rows.
For example:
delete
from SleepLogs log1
using SleepLogs log2
where log2.Id = log1.Id
  and log2.SleepDay = log1.SleepDay
  and log2.TotalMinutesAsleep = log1.TotalMinutesAsleep
  and log2.TotalTimeInBed = log1.TotalTimeInBed
  and log2.ctid < log1.ctid;
1 rows affected
select * from SleepLogs
id sleepday totalminutesasleep totaltimeinbed
8378563200 2016-04-20 381 409
8378563200 2016-04-21 396 417
8378563200 2016-04-22 441 469
8378563200 2016-04-23 565 591
8378563200 2016-04-24 458 492
8378563200 2016-04-25 388 402
8378563200 2016-04-26 550 584
8378563200 2016-04-27 531 600
Test on db<>fiddle here
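The self-join delete above can also be tried without a PostgreSQL server. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module, SQLite's rowid stands in for PostgreSQL's ctid (both identify a physical row, but they are different features), and the table name sleep_logs and its column names are hypothetical.

```python
import sqlite3

# Hypothetical table holding a few rows from the question; SQLite's implicit
# rowid plays the role of PostgreSQL's ctid here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sleep_logs ("
    "id INTEGER, sleep_day TEXT, "
    "total_minutes_asleep INTEGER, total_time_in_bed INTEGER)"
)
conn.executemany(
    "INSERT INTO sleep_logs VALUES (?, ?, ?, ?)",
    [
        (8378563200, "2016-04-24", 458, 492),
        (8378563200, "2016-04-25", 388, 402),
        (8378563200, "2016-04-25", 388, 402),  # exact duplicate of the previous row
        (8378563200, "2016-04-26", 550, 584),
    ],
)

# Delete every row for which an identical row with a smaller rowid exists,
# so the first copy in each group of duplicates survives.
deleted = conn.execute(
    """
    DELETE FROM sleep_logs
    WHERE EXISTS (
        SELECT 1
        FROM sleep_logs AS earlier
        WHERE earlier.id = sleep_logs.id
          AND earlier.sleep_day = sleep_logs.sleep_day
          AND earlier.total_minutes_asleep = sleep_logs.total_minutes_asleep
          AND earlier.total_time_in_bed = sleep_logs.total_time_in_bed
          AND earlier.rowid < sleep_logs.rowid
    )
    """
).rowcount

remaining = conn.execute(
    "SELECT sleep_day, COUNT(*) FROM sleep_logs GROUP BY sleep_day ORDER BY sleep_day"
).fetchall()
print(deleted, remaining)
```

Because the comparison uses =, rows that hold NULL in any of the compared columns never match each other and would not be treated as duplicates by this kind of query.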

Related

Create new columns using a column value and fill values from another column

I have the following table in PostgreSQL
id type name
146 INN Ofloxacin
146 TRADE_NAME Ocuflox
146 TRADE_NAME Ofloxacin
146 TRADE_NAME Tarivid i.v.
146 TRADE_NAME Tarivid 400
147 TRADE_NAME Mictral
147 TRADE_NAME Neggram
543 INN Amphetamine
543 INN Amfetamine
543 TRADE_NAME Adzenys xr-odt
543 TRADE_NAME Adzenys er
543 TRADE_NAME Dyanavel xr
I would like to create two new columns, trade_name and inn, and fill them with the respective values from the column 'name' (copying the INN value over, or concatenating the INN values when there are several). I am expecting the following output:
id trade_name inn
146 Ocuflox Ofloxacin
146 Ofloxacin Ofloxacin
146 Tarivid i.v. Ofloxacin
146 Tarivid 400 Ofloxacin
147 Mictral Ofloxacin
147 Neggram Ofloxacin
543 Adzenys xr-odt Amphetamine | Amfetamine
543 Adzenys er Amphetamine | Amfetamine
543 Dyanavel xr Amphetamine | Amfetamine
Any help is highly appreciated.
You can get a result set of distinct ids and then join that back to the same table twice: once to get the TRADE_NAME records and once to get the INN records:
SELECT ids.id,
       tradenames.name as trade_name,
       inns.name as inn
FROM
    (SELECT DISTINCT id FROM yourtable) as ids
    LEFT OUTER JOIN yourtable as tradenames
        ON ids.id = tradenames.id
        AND tradenames.type = 'TRADE_NAME'
    LEFT OUTER JOIN yourtable as inns
        ON ids.id = inns.id
        AND inns.type = 'INN';
You might also be able to pull this off with a pivot, but I think that would be overkill for the two output columns you are after.
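One thing to keep in mind is that a join like the one above returns one row per trade name/INN pair, so an id with two INN rows (such as 543) comes back with each trade name repeated rather than with the INN values concatenated as in the expected output. A possible way to concatenate them is to aggregate the INN rows per id before joining. In PostgreSQL the aggregate would be string_agg(name, ' | '); the runnable sketch below is an assumption-based stand-in that uses Python's built-in sqlite3 module with SQLite's group_concat, a hypothetical table name drug_names, and a subset of the question's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the question.
conn.execute("CREATE TABLE drug_names (id INTEGER, type TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO drug_names VALUES (?, ?, ?)",
    [
        (146, "INN", "Ofloxacin"),
        (146, "TRADE_NAME", "Ocuflox"),
        (146, "TRADE_NAME", "Tarivid 400"),
        (147, "TRADE_NAME", "Mictral"),
        (543, "INN", "Amphetamine"),
        (543, "INN", "Amfetamine"),
        (543, "TRADE_NAME", "Adzenys er"),
        (543, "TRADE_NAME", "Dyanavel xr"),
    ],
)

# Aggregate the INN names per id first, then attach them to each trade name.
# group_concat is SQLite's counterpart to PostgreSQL's string_agg.
result = conn.execute(
    """
    SELECT t.id, t.name AS trade_name, i.inn
    FROM drug_names AS t
    LEFT JOIN (
        SELECT id, group_concat(name, ' | ') AS inn
        FROM drug_names
        WHERE type = 'INN'
        GROUP BY id
    ) AS i ON i.id = t.id
    WHERE t.type = 'TRADE_NAME'
    ORDER BY t.id, t.name
    """
).fetchall()
for row in result:
    print(row)
```

With the sample rows as given, id 147 has no INN row, so its inn column comes back as NULL (None in Python). The order of the concatenated names is not guaranteed unless the aggregate is given an explicit ordering.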

How to sum the total of each row?

Data is
cases e_id
NULL 2820
3 3107
5 2987
66 2987
18 503
26 503
1 503
108 503
32 503
4 503
Expectation
For each unique e_id, sum the cases and show the total in an extra count column on the far right.
cases e_id count
NULL 2820 0
3 3107 3
5 2987 71
66 2987 71
18 503 189
26 503 189
1 503 189
108 503 189
32 503 189
4 503 189
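That can be done using a window function. Wrapping the sum in coalesce returns 0 instead of NULL for an e_id whose cases are all NULL, as in the expected output:
select cases, e_id, coalesce(sum(cases) over (partition by e_id), 0) as count
from the_table;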
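A quick way to check the window-function approach against the question's data is sketched below. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first version with window functions); the result ordering by e_id and cases is chosen only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (cases INTEGER, e_id INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [(None, 2820), (3, 3107), (5, 2987), (66, 2987),
     (18, 503), (26, 503), (1, 503), (108, 503), (32, 503), (4, 503)],
)

# SUM ... OVER (PARTITION BY e_id) repeats the per-e_id total on every row.
# COALESCE turns the NULL total of an all-NULL group into 0.
totals = conn.execute(
    """
    SELECT cases, e_id,
           COALESCE(SUM(cases) OVER (PARTITION BY e_id), 0) AS "count"
    FROM the_table
    ORDER BY e_id, cases
    """
).fetchall()
for row in totals:
    print(row)
```

Unlike GROUP BY, the window function keeps all ten input rows and only adds the total alongside them.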

SELECT FROM VALUES used a bit like a CASE statement - but possibly more powerful

I just found myself writing the code below, which works.
Interesting, but is it necessarily the best method?
The syntax allows the TRY_CAST to be performed only once.
Note that "Atextfield" can contain both valid and invalid numbers.
SELECT *
FROM call
WHERE
    EXISTS ( SELECT 1
             FROM ( VALUES( TRY_CAST(call.[Atextfield] AS int) )
                  ) AS Table1(num)
             WHERE
                 (Table1.num BETWEEN 124 AND 140 )
                 OR (Table1.num BETWEEN 143 AND 146 )
                 OR (Table1.num BETWEEN 148 AND 149 )
                 OR (Table1.num BETWEEN 160 AND 169 )
                 OR (Table1.num BETWEEN 181 AND 189 )
           )
;
2. Could this be rewritten as follows?
SELECT *
FROM [call]
WHERE TRY_CAST([call].AtextField AS TINYINT) BETWEEN 124 AND 189
AND TRY_CAST([call].AtextField AS TINYINT) NOT IN (141,142,147)
AND TRY_CAST([call].AtextField AS TINYINT) NOT BETWEEN 150 AND 159
AND TRY_CAST([call].AtextField AS TINYINT) NOT BETWEEN 170 AND 180
Note I'm new to CASE in t-sql...
2A. Is the TRY_CAST(...) evaluated more than once?
Which of the above will be quicker?
Is there a better way to write this?
Is the first method useful when the criteria get more involved and complex?
Is this an acceptable approach?
Harvey
There's no need to use exists or 1 = CASE...
Just put your logic in the where clause directly. I'd probably do something like this:
SELECT *
FROM [call]
WHERE TRY_CAST([call].AtextField AS TINYINT) BETWEEN 124 AND 189
AND TRY_CAST([call].AtextField AS TINYINT) NOT IN (141,142,147)
AND TRY_CAST([call].AtextField AS TINYINT) NOT BETWEEN 150 AND 159
AND TRY_CAST([call].AtextField AS TINYINT) NOT BETWEEN 170 AND 180
Cross Apply Method:
SELECT *
FROM [call]
CROSS APPLY (SELECT TRY_CAST([call].AtextField AS TINYINT)) CA(intField)
WHERE intField BETWEEN 124 AND 189
AND intField NOT IN (141,142,147)
AND intField NOT BETWEEN 150 AND 159
AND intField NOT BETWEEN 170 AND 180
My guess is that your query and mine will perform pretty similarly. If you want to check performance, run the following first, then run each query and record the logical reads and times.
SET STATISTICS IO ON
SET STATISTICS TIME ON

SQL Query to fetch count for Parent-Child data relationship

I need to count the descendants of a question in a parent-child relationship.
QuestionID ParentQuestionID
207 NULL
208 NULL
209 207
210 208
211 209
212 210
For example, question id 207 has child id 209, and 209 has child id 211, so in total 207 has two descendants and I want the count returned to be 2. How can I do that? Can someone help?
Try this:
;with cte as
(
    select QuestionID, ParentQuestionID, 0 as lvl
    from questiontable
    where QuestionID = 207
    union all
    select q.QuestionID, q.ParentQuestionID, lvl+1
    from questiontable q
    inner join cte c on c.QuestionID = q.ParentQuestionID
)
select count(*) from cte
where QuestionID <> 207
You can use a parameter instead of hard-coded value 207 to make it dynamic for any QuestionID.
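As an illustration of the parameterised form, here is a small sketch that runs the same recursive query against the question's sample rows. It assumes Python's built-in sqlite3 module instead of SQL Server (so the CTE is introduced with WITH RECURSIVE), and the helper name count_descendants and the parameter name :root are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questiontable (QuestionID INTEGER, ParentQuestionID INTEGER)"
)
conn.executemany(
    "INSERT INTO questiontable VALUES (?, ?)",
    [(207, None), (208, None), (209, 207), (210, 208), (211, 209), (212, 210)],
)


def count_descendants(question_id):
    """Count all direct and indirect children of the given question."""
    (count,) = conn.execute(
        """
        WITH RECURSIVE cte(QuestionID, ParentQuestionID, lvl) AS (
            -- anchor: the starting question itself
            SELECT QuestionID, ParentQuestionID, 0
            FROM questiontable
            WHERE QuestionID = :root
            UNION ALL
            -- recursive step: children of rows found so far
            SELECT q.QuestionID, q.ParentQuestionID, c.lvl + 1
            FROM questiontable AS q
            JOIN cte AS c ON c.QuestionID = q.ParentQuestionID
        )
        SELECT COUNT(*) FROM cte WHERE QuestionID <> :root
        """,
        {"root": question_id},
    ).fetchone()
    return count


print(count_descendants(207))  # prints 2
```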

Recursive CTE with multiple valid same parent child relationships

I am working on an equipment inventory application. The piece of equipment is my top level, and it contains assemblies, sub-assemblies and parts. I am trying to use a recursive CTE to display the parent/child relationships. The issue I am having is that some assemblies can contain multiple identical sub-assemblies, meaning there is no difference in the part numbers. This causes my query to not show the correct relationship based on my order by statement. This is the first time I have used a CTE, so I have been relying heavily on what I learned on the web.
PartNumberID 174 is used twice in this assembly.
Sample Table
equipmentID parentPartNumberID partNumberID
17 1 281
17 281 156
17 156 161
17 161 224
17 281 174
17 174 192
17 192 56
17 174 193
17 281 174
17 174 192
17 192 56
17 174 193
17 281 283
17 283 183
17 283 277
17 283 173
Results of Query
PARENT CHILD PARTLEVEL HIERARCHY
1 281 0 281
281 156 1 281.156
156 161 2 281.156.161
161 224 3 281.156.161.224
281 174 1 281.174
281 174 1 281.174
174 192 2 281.174.192
174 192 2 281.174.192
192 56 3 281.174.192.56
192 56 3 281.174.192.56
174 193 2 281.174.193
174 193 2 281.174.193
281 283 1 281.283
283 173 2 281.283.173
283 183 2 281.283.183
283 277 2 281.283.277
As you can see, the hierarchy is created correctly, but it is not being returned in the correct order because there is nothing unique about these 2 assemblies for the order by statement to use.
The Code:
with parts (PARENT, CHILD, PARTLEVEL, HIERARCHY) as (
    select parentPartNumberID,
           --- Used to get rid of duplicates
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY partNumberID ORDER BY partNumberID) > 1
                THEN NULL
                ELSE partNumberID END AS partNumberID,
           0,
           CAST(partNumberID as nvarchar) as HIERARCHY
    FROM db.tbl_ELEMENTS
    WHERE parentPartNumberID = 1 and equipmentID = 17
    UNION ALL
    SELECT parts1.parentPartNumberID,
           --- Used to get rid of duplicates
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY parts1.partNumberID ORDER BY parts1.partNumberID) > 1
                THEN 10000 + parts1.partNumberID
                ELSE parts1.partNumberID END,
           PARTLEVEL + 1,
           cast(parts.hierarchy + '.' + CAST(parts1.partNumberID as nvarchar) as nvarchar)
    from dbo.tbl_BOM_Elements as parts1
    inner join parts on parts1.parentPartNumberID = parts.CHILD
    where id = 17
)
select CASE WHEN PARENT > 10000
            THEN PARENT - 10000
            ELSE PARENT END AS PARENT,
       CASE WHEN CHILD > 10000
            THEN CHILD - 10000
            ELSE CHILD END AS CHILD,
       PARTLEVEL, HIERARCHY
from parts
order by hierarchy
I tried to create a unique ID to order but was not successful. Any suggestions would be greatly appreciated.
I'll start by just answering the part about getting a sequential id.
If you have control over the schema, you could just add a unique Id to your source table. Having a surrogate primary key would be pretty typical here.
You could instead use a second CTE before the recursive one and add the row numbers there using ROW_NUMBER() OVER (ORDER BY equipmentID, parentPartNumberID, partNumberID). Then build your recursive CTE off of that rather than the source table directly.
Better might be to use the first CTE to instead GROUP BY equipmentID, parentPartNumberID, partNumberID and add a COUNT(1) field. This would let you use the count in your hierarchy rather than getting the duplicates. Something like 281.283.277x2 or whatever.
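To make the GROUP BY idea a bit more concrete, here is a rough sketch rather than a drop-in fix. It assumes Python's built-in sqlite3 module in place of SQL Server, a hypothetical table name bom loaded with the question's sample rows, and hypothetical names grouped, qty and label for the first CTE and its extra columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns and rows follow the question's sample table.
conn.execute(
    "CREATE TABLE bom (equipmentID INTEGER, parentPartNumberID INTEGER, partNumberID INTEGER)"
)
edges = [(1, 281), (281, 156), (156, 161), (161, 224),
         (281, 174), (174, 192), (192, 56), (174, 193),
         (281, 174), (174, 192), (192, 56), (174, 193),
         (281, 283), (283, 183), (283, 277), (283, 173)]
conn.executemany("INSERT INTO bom VALUES (17, ?, ?)", edges)

tree = conn.execute(
    """
    WITH RECURSIVE grouped AS (
        -- collapse identical rows and remember how many there were
        SELECT equipmentID, parentPartNumberID, partNumberID,
               COUNT(*) AS qty,
               partNumberID ||
                   CASE WHEN COUNT(*) > 1 THEN 'x' || COUNT(*) ELSE '' END AS label
        FROM bom
        GROUP BY equipmentID, parentPartNumberID, partNumberID
    ),
    parts(parent, child, partlevel, hierarchy) AS (
        SELECT parentPartNumberID, partNumberID, 0, label
        FROM grouped
        WHERE parentPartNumberID = 1 AND equipmentID = 17
        UNION ALL
        SELECT g.parentPartNumberID, g.partNumberID, p.partlevel + 1,
               p.hierarchy || '.' || g.label
        FROM grouped AS g
        JOIN parts AS p ON g.parentPartNumberID = p.child
        WHERE g.equipmentID = 17
    )
    SELECT parent, child, partlevel, hierarchy
    FROM parts
    ORDER BY hierarchy
    """
).fetchall()
for row in tree:
    print(row)
```

Each parent/child pair now appears once, so the order by has nothing ambiguous to sort. Note that qty counts matching rows in the source table, so the children of a repeated sub-assembly (192 and 193 under 174 in the sample) also carry an x2 suffix even though each copy of 174 contains them once.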