Get unique values across two columns - postgresql

I have a table that looks like this:
 id | col_1 | col_2
----+-------+-------
  1 |    12 |    15
  2 |    12 |    16
  3 |    12 |    17
  4 |    13 |    18
  5 |    14 |    18
  6 |    14 |    19
  7 |    15 |    19
  8 |    16 |    20
I know if I do something like this, it will return all unique values from col_1:
select distinct(col_1) from table;
Is there a way I can get the distinct values across two columns? So my output would only be:
12
13
14
15
16
17
18
19
20
That is, it would take the distinct values from col_1, add col_2's distinct values, and remove any duplicates between the two lists (such as 15, which appears in both col_1 and col_2).

You can use a UNION
select col_1
from the_table
union
select col_2
from the_table;
UNION implies a DISTINCT operation, so the above is the same as:
select distinct col
from (
    select col_1 as col
    from the_table
    union all
    select col_2 as col
    from the_table
) x;

You will need to use a UNION:
select col_1 from table
union
select col_2 from table;
You will not need DISTINCT here because a UNION automatically removes duplicates for you (as opposed to a UNION ALL).
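To make the difference concrete, here is a quick illustration against the sample data above (a sketch assuming the table is named the_table, as in the first answer):
-- UNION ALL keeps every row from both selects: 16 rows for the sample data
select col_1 as col from the_table
union all
select col_2 from the_table;
-- UNION removes duplicates across the combined result: 9 rows
-- (12, 13, 14, 15, 16, 17, 18, 19, 20)
select col_1 as col from the_table
union
select col_2 from the_table;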

Related

Difference of top two values while GROUP BY

Suppose I have the following SQL Table:
 id | score
----+-------
  1 |  4433
  1 |   678
  1 |  1230
  1 |   414
  5 |  8899
  5 |   123
  6 |  2345
  6 |   567
  6 |  2323
Now I wanted to do a GROUP BY id operation wherein the score column would be modified as follows: take the absolute difference between the top two highest scores for each id.
For example, the result for the above table should be:
 id | score
----+-------
  1 |  3203
  5 |  8776
  6 |    22
How can I perform this query in PostgreSQL?
Using ROW_NUMBER along with pivoting logic we can try:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY score DESC) rn
    FROM yourTable
)
SELECT id,
       ABS(MAX(score) FILTER (WHERE rn = 1) -
           MAX(score) FILTER (WHERE rn = 2)) AS score
FROM cte
GROUP BY id;
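For comparison, an equivalent sketch (same assumed yourTable) uses LEAD to pair each group's top score directly with its runner-up and keeps only the top row per id:
WITH ranked AS (
    SELECT id,
           score,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY score DESC) AS rn,
           LEAD(score)  OVER (PARTITION BY id ORDER BY score DESC) AS next_score
    FROM yourTable
)
SELECT id,
       ABS(score - next_score) AS score   -- NULL when an id has only one row, as in the FILTER version
FROM ranked
WHERE rn = 1;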

Pivot sum in PostgreSQL

I'm using PostgreSQL 4.2, and I have the following code:
SELECT ID, Num1, Num2 FROM Tab1
WHERE ID IN(1,2,3,4.....50);
Returns results as:
 ID | Num1 | Num2
----+------+------
  1 |  100 |    0
  2 |   50 |    1
  3 |   30 |    2
  4 |  110 |    3
  5 |   33 |    4
  6 |   46 |    5
  7 |   36 |    6
  8 |   19 |    7
  9 |   20 |    8
 10 |   31 |    9
 11 |   68 |   10
 12 |  123 |   11
 13 |  588 |    0
 14 |  231 |    1
 15 |  136 |    2
I want a pivoted sum that returns results for pairs of the numbers in the IN clause, like this:
     | ID  | Meaning
-----+-----+------------
Num1 | 150 | 1+2 (num1)
Num2 |   1 | 1+2 (num2)
Num1 | 140 | 3+4 (num1)
Num2 |   5 | 3+4 (num2)
Num1 |  79 | 5+6 (num1)
Num2 |   9 | 5+6 (num2)
.........................
How can I do that?
PostgreSQL 4.2? No, there is no such version; you must be confusing another application's version with your PostgreSQL version:
In 1996, the project was renamed to PostgreSQL to reflect its support
for SQL. The online presence at the website PostgreSQL.org began on
October 22, 1996.[26] The first PostgreSQL release formed version 6.0
on January 29, 1997.
(Wikipedia.) You might want to run the query: select version();
However, any supported version is sufficient for this request. The SQL needed to produce the required summations is simple once you understand the LEAD (and LAG) window functions. LEAD permits access to the following row. Using that, the query:
select id, nid, n1s, n2s
  from ( select id
              , coalesce(lead(id) over (order by id),0) nid
              , num1 + coalesce(lead(num1) over (order by id),0) n1s
              , num2 + coalesce(lead(num2) over (order by id),0) n2s
              , row_number() over() rn
           from tab1
       ) i
 where rn%2 = 1;
This provides all the data the presentation layer needs to format the result as desired. However, that is not the result requested; that requires some SQL gymnastics.
We begin the gymnastics by wrapping the above in a CTE, adding a little setup for the show to follow. The main event then breaks the results into two sets in order to add the syntactic-sugar tags before bringing them back together. So, for the big show:
with joiner(id, nid, n1s, n2s, rn) as
     ( select *
         from ( select id
                     , coalesce(lead(id) over (order by id),0)
                     , num1 + coalesce(lead(num1) over (order by id),0)
                     , num2 + coalesce(lead(num2) over (order by id),0)
                     , row_number() over() rn
                  from tab1
              ) i
        where rn%2 = 1
     )
select "Name", "Sum", "Meaning"
  from ( select 'Num1' as "Name"
              , n1s as "Sum"
              , concat(id::text, case when nid = 0
                                      then null
                                      else '+' || nid::text
                                 end
                      ) || ' (num1)' as "Meaning"
              , rn
           from joiner
         union all
         select 'Num2'
              , n2s
              , concat(id::text, case when nid = 0
                                      then null
                                      else '+' || nid::text
                                 end
                      ) || ' (num2)'
              , rn
           from joiner
       ) j
 order by rn, "Name";
Note: I used "Sum" instead of ID for the column title, since that column does not contain an id at all; the ids appear only in the "Meaning" column.
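As a side note, the pairing step itself could also be sketched with integer division on the row number instead of LEAD (same assumed tab1 table); the Num1/Num2 pivot via UNION ALL would still be layered on top, as shown above:
select min(id)::text || case when count(*) = 2
                             then '+' || max(id)::text
                             else '' end as ids
     , sum(num1) as n1s
     , sum(num2) as n2s
  from ( select id, num1, num2
              -- rows 1 and 2 get pair_no 0, rows 3 and 4 get pair_no 1, and so on
              , (row_number() over (order by id) - 1) / 2 as pair_no
           from tab1
       ) p
 group by pair_no
 order by pair_no;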

Find unique entities with multiple UUID identifiers in redshift

Having an event table with multiple types of UUIDs per user, we would like to come up with a way to stitch all those UUIDs together to get the highest possible definition of a single user.
For example:
UUID1 | UUID2
------+------
    1 | a
    1 | a
    2 | a
    2 | b
    3 | c
    4 | c
There are 2 users here, the first one with uuid1={1,2} and uuid2={a,b}, the second one with uuid1={3,4} and uuid2={c}. These chains could potentially be very long. There are no intersections (i.e. 1c doesn't exist) and all rows are timestamp ordered.
Is there a way in redshift to generate these unique "guest" identifiers without creating an immense query with many joins?
Thanks in advance!
Create test data table
-- DROP TABLE uuid_test;
CREATE TEMP TABLE uuid_test AS
SELECT 1 row_id, 1::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 2 row_id, 1::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 3 row_id, 2::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 4 row_id, 2::int uuid1, 'b'::char(1) uuid2
UNION ALL SELECT 5 row_id, 3::int uuid1, 'c'::char(1) uuid2
UNION ALL SELECT 6 row_id, 4::int uuid1, 'c'::char(1) uuid2
UNION ALL SELECT 7 row_id, 4::int uuid1, 'd'::char(1) uuid2
UNION ALL SELECT 8 row_id, 5::int uuid1, 'e'::char(1) uuid2
UNION ALL SELECT 9 row_id, 6::int uuid1, 'e'::char(1) uuid2
UNION ALL SELECT 10 row_id, 6::int uuid1, 'f'::char(1) uuid2
UNION ALL SELECT 11 row_id, 7::int uuid1, 'f'::char(1) uuid2
UNION ALL SELECT 12 row_id, 8::int uuid1, 'g'::char(1) uuid2
UNION ALL SELECT 13 row_id, 8::int uuid1, 'h'::char(1) uuid2
;
The actual problem is solved by using strict ordering to find every place where the unique user changes, capturing that as a lookup table and then applying it to the original data.
-- Create lookup table with a from-to range of IDs for each unique user
WITH unique_user AS (
    -- Calculate the end of the id range using LEAD() to look ahead
    -- Use an inline MAX() to find the ending ID for the last entry
    SELECT row_id AS from_id
         , NVL(LEAD(row_id,1) OVER (ORDER BY row_id)-1, (SELECT MAX(row_id) FROM uuid_test)) AS to_id
         , unique_uuid
    -- Mark unique user change when there is discontinuity in either UUID
    FROM (SELECT row_id
               , CASE WHEN NVL(LAG(uuid1,1) OVER (ORDER BY row_id), 0) <> uuid1
                       AND NVL(LAG(uuid2,1) OVER (ORDER BY row_id), '') <> uuid2
                      THEN MD5(uuid1||uuid2)
                      ELSE NULL END unique_uuid
          FROM uuid_test) t
    WHERE unique_uuid IS NOT NULL
    ORDER BY row_id
)
-- Apply the unique user value to each row using a range join to the lookup table
SELECT a.row_id, a.uuid1, a.uuid2, b.unique_uuid
FROM uuid_test AS a
JOIN unique_user AS b
  ON a.row_id BETWEEN b.from_id AND b.to_id
ORDER BY a.row_id
;
Here's the output
 row_id | uuid1 | uuid2 | unique_uuid
--------+-------+-------+----------------------------------
      1 |     1 | a     | efaa153b0f682ae5170a3184fa0df28c
      2 |     1 | a     | efaa153b0f682ae5170a3184fa0df28c
      3 |     2 | a     | efaa153b0f682ae5170a3184fa0df28c
      4 |     2 | b     | efaa153b0f682ae5170a3184fa0df28c
      5 |     3 | c     | 5fcfcb7df376059d0075cb892b2cc37f
      6 |     4 | c     | 5fcfcb7df376059d0075cb892b2cc37f
      7 |     4 | d     | 5fcfcb7df376059d0075cb892b2cc37f
      8 |     5 | e     | 18a368e1052b5aa0388ef020dd9a1e20
      9 |     6 | e     | 18a368e1052b5aa0388ef020dd9a1e20
     10 |     6 | f     | 18a368e1052b5aa0388ef020dd9a1e20
     11 |     7 | f     | 18a368e1052b5aa0388ef020dd9a1e20
     12 |     8 | g     | 321fcc2447163a81d470b9353e394121
     13 |     8 | h     | 321fcc2447163a81d470b9353e394121
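If you then want one row per stitched guest rather than one per event, a small roll-up sketch could look like the following (the unique_user CTE from the query above would need to be repeated in, or materialized for, this statement):
-- One row per stitched "guest", with the row_id range and event count it covers
SELECT b.unique_uuid
     , MIN(a.row_id) AS first_row
     , MAX(a.row_id) AS last_row
     , COUNT(*)      AS event_count
FROM uuid_test AS a
JOIN unique_user AS b
  ON a.row_id BETWEEN b.from_id AND b.to_id
GROUP BY b.unique_uuid
ORDER BY first_row;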

T-SQL query to remove duplicates from large tables using join

I am new to T-SQL queries and have been trying different solutions to remove duplicate rows from a fairly large table (over 270,000 rows).
The table looks something like:
TableA
-----------
RowID int not null identity(1,1) primary key,
Col1 varchar(50) not null,
Col2 int not null,
Col3 varchar(50) not null
The rows for this table are not perfect duplicates because of the existence of the RowID identity field.
The second table that I need to join with:
TableB
-----------
RowID int not null identity(1,1) primary key,
Col1 int not null,
Col2 varchar(50) not null
In TableA I have something like:
1 | gray  |  4 | Angela
2 | red   |  6 | Diana
3 | black |  6 | Alina
4 | black | 11 | Dana
5 | gray  |  4 | Angela
6 | red   | 12 | Dana
7 | red   |  6 | Diana
8 | black | 11 | Dana
And in TableB:
1 | 6 | klm
2 | 11 | lmi
Second column from TableB (Col1) is foreign key inside TableA (Col2).
I need to remove ONLY the duplicates from TableA that have Col2 = 6, ignoring the other duplicates, so the table would become:
1 | gray  |  4 | Angela
2 | red   |  6 | Diana
4 | black |  6 | Alina
5 | black | 11 | Dana
6 | gray  |  4 | Angela
7 | red   | 12 | Dana
8 | black | 11 | Dana
I tried using
DELETE FROM TableA a inner join TableB b on a.Col2=b.Col1
WHERE a.RowId NOT IN (SELECT MIN(RowId) FROM TableA GROUP BY RowId, Col1, Col2, Col3) and b.Col2="klm"
but I still get some of the duplicates that I need to remove.
What is the best way to remove not perfect duplicate rows using join?
Well, MIN would only ever return one value per group, and grouping by the primary key (RowID) gives you every row back (also, the RowIDs in your expected output don't match the example data). So this does not do what you want:
DELETE FROM TableA a
inner join TableB b
        on a.Col2 = b.Col1
WHERE a.RowId NOT IN (SELECT MIN(RowId)
                      FROM TableA
                      GROUP BY RowId, Col1, Col2, Col3)
  and b.Col2 = "klm"
These would be the rows to delete:
select *
  from ( select *
              , row_number() over (partition by Col1, Col3 order by RowID) as rn
           from TableA a
          where a.Col2 = 6
       ) tt
 where tt.rn > 1
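To actually remove those rows while keeping the first row of each duplicate group, a sketch using the question's column names (note the DELETE alias FROM ... JOIN syntax and the single-quoted string literal) could be:
DELETE a
FROM TableA AS a
INNER JOIN TableB AS b
        ON a.Col2 = b.Col1
WHERE b.Col2 = 'klm'
  -- keep the lowest RowID of each (Col1, Col2, Col3) group
  AND a.RowID NOT IN (SELECT MIN(RowID)
                      FROM TableA
                      GROUP BY Col1, Col2, Col3);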
Another solution is:
WITH CTE AS (
    SELECT t.RowID, t.Col1, t.Col2, t.Col3,
           RN = ROW_NUMBER() OVER (PARTITION BY t.Col1, t.Col2, t.Col3 ORDER BY t.RowID)
    FROM TableA t
)
DELETE FROM CTE WHERE RN > 1;

Sql join and remove distinct in two separate column

I have a table, ordered:
 form_id | procedure_id
---------+--------------
     101 | 24
     101 | 23
     101 | 22
     102 | 7
     102 | 6
     102 | 3
     102 | 2
And I have another table, performed:
 form_id | procedure_id
---------+--------------
     101 | 42
     101 | 45
     102 | 5
     102 | 3
     102 | 7
     102 | 12
     102 | 13
Expected output
 form_id | o_procedure_id | p_procedure_id
---------+----------------+----------------
     101 | 24             | 42
     101 | 23             | 45
     101 | 22             | NULL
     102 | 7              | 7
     102 | 6              | 5
     102 | 3              | 3
     102 | 2              | 12
     102 | NULL           | 13
I tried the below query:
with ranked as
(select dense_rank() over (partition by po.form_id order by po.procedure_id) rn1,
        dense_rank() over (partition by po.form_id order by pp.procedure_id) rn2,
        po.form_id,
        po.procedure_id,
        pp.procedure_id
   from ordered po,
        performed pp
  where po.form_id = pp.form_id)
select ranked.* from ranked
--where rn1=1 or rn2=1
The above query repeats the ordered and performed procedure IDs for every combination within a form_id.
How do I get the expected output?
I wasn't quite sure how you would want to handle multiple null values and/or null values on both sides of your tables. My example therefore assumes the first table is leading and includes all entries, while the second table might have holes. The query ain't pretty, but I suppose it does what you expect it to:
select test1_sub.form_id, test1_sub.process_id as pid_1, test2_sub.process_id as pid_2
  from (
        select form_id,
               process_id,
               rank() over (partition by form_id order by process_id asc nulls last)
          from test1
       ) as test1_sub
  left join (
        select form_id,
               process_id,
               rank() over (partition by form_id order by process_id asc nulls last)
          from test2
       ) as test2_sub
    on test1_sub.form_id = test2_sub.form_id
   and test1_sub.rank = test2_sub.rank;
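If rows that exist only in the second table should also appear (like the 102 / NULL / 13 row in the expected output), a full outer join sketch along the same lines (same assumed test1/test2 tables; the pairing within a form_id is still essentially arbitrary) would be:
select coalesce(t1.form_id, t2.form_id) as form_id,
       t1.process_id as pid_1,
       t2.process_id as pid_2
  from (
        select form_id,
               process_id,
               row_number() over (partition by form_id order by process_id) as rn
          from test1
       ) as t1
  full outer join (
        select form_id,
               process_id,
               row_number() over (partition by form_id order by process_id) as rn
          from test2
       ) as t2
    on t1.form_id = t2.form_id
   and t1.rn = t2.rn
 order by coalesce(t1.form_id, t2.form_id), coalesce(t1.rn, t2.rn);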