Find missing entries in link table - tsql

I use SQL-Server 2008 and have the following tables.
users
userid
1
2
3
objects
objectid | category
9 | A
8 | B
7 | A
6 | C
userobjects
userid | objectid
1 | 9
3 | 7
3 | 6
As you can see, userobjects is a link table. Unfortunately it is missing some entries. I could include them with a script but I wonder if there is a solution in sql.
For every userid and every objectid that belongs to category 'A' there should be an entry in userobjects. So what I wanted to have is this:
userid | objectid
1 | 9
1 | 7
2 | 9
2 | 7
3 | 9
3 | 7
3 | 6

I could include them with a script but I wonder if there is a solution
in sql.
This is a select query using UNION:
select u.userId, o.objectId
from objects o cross join users u
where o.category = 'A'
union
select u.userId, o.objectId
from users u join userobjects uj on u.userid = uj.userId
join objects o on uj.objectid = o.objectid
where o.category <> 'A'
order by userId, objectId desc
--RESULTS
userId objectId
1 9
1 7
2 9
2 7
3 9
3 7
3 6
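Since the question asks to fill in the missing rows rather than only list them, the cross join above can also feed an INSERT. The following is a minimal sketch, not a tested SQL Server script: it uses Python's built-in sqlite3 module as a stand-in database, and the helper names `build_demo` and `fill_missing_links` are made up for illustration. The `INSERT ... SELECT ... WHERE NOT EXISTS` pattern itself is standard SQL and should carry over to T-SQL with little or no change.

```python
import sqlite3


def build_demo():
    """Create an in-memory database holding the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (userid INTEGER PRIMARY KEY);
        CREATE TABLE objects (objectid INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE userobjects (userid INTEGER, objectid INTEGER);
        INSERT INTO users VALUES (1), (2), (3);
        INSERT INTO objects VALUES (9, 'A'), (8, 'B'), (7, 'A'), (6, 'C');
        INSERT INTO userobjects VALUES (1, 9), (3, 7), (3, 6);
    """)
    return conn


def fill_missing_links(conn):
    """Insert every (userid, category-'A' objectid) pair that is not linked yet."""
    conn.execute("""
        INSERT INTO userobjects (userid, objectid)
        SELECT u.userid, o.objectid
        FROM users u
        CROSS JOIN objects o
        WHERE o.category = 'A'
          AND NOT EXISTS (
              SELECT 1
              FROM userobjects uo
              WHERE uo.userid = u.userid
                AND uo.objectid = o.objectid
          )
    """)


conn = build_demo()
fill_missing_links(conn)
for row in conn.execute(
    "SELECT userid, objectid FROM userobjects ORDER BY userid, objectid DESC"
):
    print(row)
```

Because of the NOT EXISTS check, running the insert a second time adds nothing, so it is safe to repeat.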

You can do it without a UNION using a LEFT join and a non-traditional INNER JOIN to objects
SELECT DISTINCT u.userid,
o.objectid
FROM users u
LEFT JOIN userobjects uj
ON u.userid = uj.userid
INNER JOIN objects o
ON uj.objectid = o.objectid
OR ( o.category = 'A' )
ORDER BY u.userid,
o.objectid DESC


Join data from 3 tables

table_1
id customer_id
---------------
1 1
2 2
3 1
4 1
5 3
6 4
table_2
id id_table1 device_mac
-------------------------------------
1 1 aa:bb:cc:dd:ee:ff
2 1 11:22:33:44:55:66
3 2 1a:2a:3a:4a:5a:6a
4 3 2b:3b:4b:5b:6b:7b
5 4 3c:4c:5c:6c:7c:8c
6 2 4d:5d:6d:7d:8d:9d
table_3
id device_mac device_name
---------------------------------------
1 aa:bb:cc:dd:ee:ff loc1
2 11:22:33:44:55:66 loc2
3 1a:2a:3a:4a:5a:6a loc3
4 2b:3b:4b:5b:6b:7b loc4
5 3c:4c:5c:6c:7c:8c loc5
6 4d:5d:6d:7d:8d:9d loc6
I need to get the details below by customer_id, using Python and a Postgres database.
Example: get details with customer_id = 1
table1_id count(table_2) device_names
1 2 [loc1, loc2]
3 1 [loc4]
4 1 [loc5]
I tried with individual queries using Python (pseudocode):
select id from table_1 where customer_id = 1;
for t1_id in the ids returned from table_1:
    select * from table_2 where id_table1 = t1_id
    for t2_data in the rows returned from table_2:
        select * from table_3 where device_mac = t2_data.device_mac
        # generate expected rows
Can I just do this in a single query?
Join the tables and aggregate:
SELECT t1.id,
COUNT(*) count,
STRING_AGG(t3.device_name, ',' ORDER BY t3.device_name) device_names
FROM table_1 t1
INNER JOIN table_2 t2 ON t2.id_table1 = t1.id
INNER JOIN table_3 t3 ON t3.device_mac = t2.device_mac
WHERE t1.customer_id = 1
GROUP BY t1.id
If you are getting duplicate device_names you may use DISTINCT:
STRING_AGG(DISTINCT t3.device_name, ',' ORDER BY t3.device_name) device_names
Results:
id | count | device_names
1 | 2 | loc1,loc2
3 | 1 | loc4
4 | 1 | loc5
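To show how the single aggregated query replaces the nested loops on the Python side, here is a hedged sketch. It runs against an in-memory SQLite database rather than Postgres, so it uses SQLite's GROUP_CONCAT in place of STRING_AGG and sorts the names in Python. The function names `build_demo` and `devices_by_customer` and the alias `device_count` are assumptions for illustration, and splitting on commas assumes device names never contain a comma.

```python
import sqlite3

QUERY = """
    SELECT t1.id,
           COUNT(*) AS device_count,
           GROUP_CONCAT(t3.device_name) AS device_names
    FROM table_1 t1
    INNER JOIN table_2 t2 ON t2.id_table1 = t1.id
    INNER JOIN table_3 t3 ON t3.device_mac = t2.device_mac
    WHERE t1.customer_id = ?
    GROUP BY t1.id
    ORDER BY t1.id
"""


def build_demo():
    """Create an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_1 (id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE table_2 (id INTEGER PRIMARY KEY, id_table1 INTEGER, device_mac TEXT);
        CREATE TABLE table_3 (id INTEGER PRIMARY KEY, device_mac TEXT, device_name TEXT);
        INSERT INTO table_1 VALUES (1, 1), (2, 2), (3, 1), (4, 1), (5, 3), (6, 4);
        INSERT INTO table_2 VALUES
            (1, 1, 'aa:bb:cc:dd:ee:ff'), (2, 1, '11:22:33:44:55:66'),
            (3, 2, '1a:2a:3a:4a:5a:6a'), (4, 3, '2b:3b:4b:5b:6b:7b'),
            (5, 4, '3c:4c:5c:6c:7c:8c'), (6, 2, '4d:5d:6d:7d:8d:9d');
        INSERT INTO table_3 VALUES
            (1, 'aa:bb:cc:dd:ee:ff', 'loc1'), (2, '11:22:33:44:55:66', 'loc2'),
            (3, '1a:2a:3a:4a:5a:6a', 'loc3'), (4, '2b:3b:4b:5b:6b:7b', 'loc4'),
            (5, '3c:4c:5c:6c:7c:8c', 'loc5'), (6, '4d:5d:6d:7d:8d:9d', 'loc6');
    """)
    return conn


def devices_by_customer(conn, customer_id):
    """Return (table1_id, device_count, [device_names]) tuples for one customer."""
    rows = conn.execute(QUERY, (customer_id,)).fetchall()
    # GROUP_CONCAT gives one comma-separated string per group; turn it into a sorted list.
    return [(t1_id, n, sorted(names.split(","))) for t1_id, n, names in rows]


conn = build_demo()
for row in devices_by_customer(conn, 1):
    print(row)
```

With a Postgres driver the shape would be the same: one parameterized query, one fetch, and no per-row round trips.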

Find unique entities with multiple UUID identifiers in redshift

We have an event table with multiple types of UUIDs per user, and we would like to stitch all those UUIDs together to get the highest possible definition of a single user.
For example:
UUID1 | UUID2
1 a
1 a
2 a
2 b
3 c
4 c
There are 2 users here, the first one with uuid1={1,2} and uuid2={a,b}, the second one with uuid1={3,4} and uuid2={c}. These chains could potentially be very long. There are no intersections (i.e. 1c doesn't exist) and all rows are timestamp ordered.
Is there a way in redshift to generate these unique "guest" identifiers without creating an immense query with many joins?
Thanks in advance!
Create test data table
-- DROP TABLE uuid_test;
CREATE TEMP TABLE uuid_test AS
SELECT 1 row_id, 1::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 2 row_id, 1::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 3 row_id, 2::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 4 row_id, 2::int uuid1, 'b'::char(1) uuid2
UNION ALL SELECT 5 row_id, 3::int uuid1, 'c'::char(1) uuid2
UNION ALL SELECT 6 row_id, 4::int uuid1, 'c'::char(1) uuid2
UNION ALL SELECT 7 row_id, 4::int uuid1, 'd'::char(1) uuid2
UNION ALL SELECT 8 row_id, 5::int uuid1, 'e'::char(1) uuid2
UNION ALL SELECT 9 row_id, 6::int uuid1, 'e'::char(1) uuid2
UNION ALL SELECT 10 row_id, 6::int uuid1, 'f'::char(1) uuid2
UNION ALL SELECT 11 row_id, 7::int uuid1, 'f'::char(1) uuid2
UNION ALL SELECT 12 row_id, 8::int uuid1, 'g'::char(1) uuid2
UNION ALL SELECT 13 row_id, 8::int uuid1, 'h'::char(1) uuid2
;
The actual problem is solved by using strict ordering to find every place where the unique user changes, capturing that as a lookup table and then applying it to the original data.
-- Create lookup table with a from-to range of IDs for each unique user
WITH unique_user AS (
-- Calculate the end of the id range using LEAD() to look ahead
-- Use an inline MAX() to find the ending ID for the last entry
SELECT row_id AS from_id
, NVL(LEAD(row_id,1) OVER (ORDER BY row_id)-1, (SELECT MAX(row_id) FROM uuid_test) ) AS to_id
, unique_uuid
-- Mark a unique user change when both UUIDs differ from the previous row
FROM (SELECT row_id
,CASE WHEN NVL(LAG(uuid1,1) OVER (ORDER BY row_id), 0) <> uuid1
AND NVL(LAG(uuid2,1) OVER (ORDER BY row_id), '') <> uuid2
THEN MD5(uuid1||uuid2)
ELSE NULL END unique_uuid
FROM uuid_test) t
WHERE unique_uuid IS NOT NULL
ORDER BY row_id
)
-- Apply the unique user value to each row using a range join to the lookup table
SELECT a.row_id, a.uuid1, a.uuid2, b.unique_uuid
FROM uuid_test AS a
JOIN unique_user AS b
ON a.row_id BETWEEN b.from_id AND b.to_id
ORDER BY a.row_id
;
Here's the output
row_id | uuid1 | uuid2 | unique_uuid
--------+-------+-------+----------------------------------
1 | 1 | a | efaa153b0f682ae5170a3184fa0df28c
2 | 1 | a | efaa153b0f682ae5170a3184fa0df28c
3 | 2 | a | efaa153b0f682ae5170a3184fa0df28c
4 | 2 | b | efaa153b0f682ae5170a3184fa0df28c
5 | 3 | c | 5fcfcb7df376059d0075cb892b2cc37f
6 | 4 | c | 5fcfcb7df376059d0075cb892b2cc37f
7 | 4 | d | 5fcfcb7df376059d0075cb892b2cc37f
8 | 5 | e | 18a368e1052b5aa0388ef020dd9a1e20
9 | 6 | e | 18a368e1052b5aa0388ef020dd9a1e20
10 | 6 | f | 18a368e1052b5aa0388ef020dd9a1e20
11 | 7 | f | 18a368e1052b5aa0388ef020dd9a1e20
12 | 8 | g | 321fcc2447163a81d470b9353e394121
13 | 8 | h | 321fcc2447163a81d470b9353e394121
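The boundary rule the query relies on can be restated outside SQL. The sketch below is a plain Python illustration of that rule only, not Redshift code: a new user starts on the first row and on any row where both uuid1 and uuid2 differ from the previous row. The function name `assign_users` is made up for illustration, and the rows are assumed to arrive already in timestamp order, as the question states.

```python
def assign_users(rows):
    """Label each (uuid1, uuid2) row with a running user index.

    rows must already be in timestamp order. A new user starts on the
    first row and whenever BOTH identifiers differ from the previous row.
    """
    labels = []
    current = -1
    prev = None
    for uuid1, uuid2 in rows:
        if prev is None or (uuid1 != prev[0] and uuid2 != prev[1]):
            current += 1
        labels.append(current)
        prev = (uuid1, uuid2)
    return labels


# Same 13 rows as the uuid_test table above.
test_rows = [
    (1, "a"), (1, "a"), (2, "a"), (2, "b"),
    (3, "c"), (4, "c"), (4, "d"),
    (5, "e"), (6, "e"), (6, "f"), (7, "f"),
    (8, "g"), (8, "h"),
]
print(assign_users(test_rows))
```

Like the SQL, this only works because the chains do not interleave. If rows from different users could be mixed in the ordering, a different technique such as a union-find over the identifiers would be needed.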

How does one print depth-level in a Postgres query that uses RECURSIVE to select descendants?

I have a table persons that contains a column for parent_id, which refers to another row in the same table. Assume this is the logical hierarchy:
P1
P2 P3 P4
P5 P6 P7 P8 P9 P10
I have written a query that prints all parents of a given node, along with the height above the node, and it seems to work fine:
WITH
RECURSIVE ancestors AS (
SELECT id, parent_id
FROM persons
WHERE id = 8
UNION
SELECT p.id, p.parent_id
FROM persons p
INNER JOIN ancestors
ON
p.id = ancestors.parent_id
)
SELECT persons.id, persons.name,
ROW_NUMBER() over () as height
FROM ancestors
INNER JOIN persons
ON
ancestors.id = persons.id
WHERE
persons.id <> 8
Result:
id | name | height
-------+-------------+---------
3 | P3 | 1
1 | P1 | 2
(2 rows)
I now want to write a query that similarly prints all descendants, along with depth. Here's the query so far (same as above with id and parent_id swapped in the UNION join):
WITH
RECURSIVE descendants AS (
SELECT id, parent_id
FROM persons
WHERE id = 1
UNION
SELECT p.id, p.parent_id
FROM persons p
INNER JOIN descendants
ON
p.parent_id = descendants.id
)
SELECT persons.id, persons.name,
ROW_NUMBER() over () as depth
FROM descendants
INNER JOIN persons
ON
descendants.id = persons.id
WHERE
persons.id <> 1
This gives the following result:
id | name | depth
-------+-------------+---------
2 | P2 | 1
3 | P3 | 2
4 | P4 | 3
5 | P5 | 4
6 | P6 | 5
7 | P7 | 6
8 | P8 | 7
9 | P9 | 8
10 | P10 | 9
(9 rows)
Clearly, the depth is all wrong. ROW_NUMBER() isn't doing what I want. How do I go about this?
I've thought about using a counter within the recursive part of the query itself, which increments every time it is run, but I'm not sure if there's a way to achieve that.
Use an additional integer column with values incremented at each recursive step.
WITH RECURSIVE descendants AS (
SELECT id, parent_id, 0 AS depth
FROM persons
WHERE id = 1
UNION
SELECT p.id, p.parent_id, d.depth + 1
FROM persons p
INNER JOIN descendants d
ON p.parent_id = d.id
)
SELECT p.id, p.name, depth
FROM descendants d
INNER JOIN persons p
ON d.id = p.id
WHERE p.id <> 1;
id | name | depth
----+------+-------
2 | P2 | 1
3 | P3 | 1
4 | P4 | 1
5 | P5 | 2
6 | P6 | 2
7 | P7 | 2
8 | P8 | 2
9 | P9 | 2
10 | P10 | 2
(9 rows)
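The same counter column also gives a reliable height for the ancestors query, which otherwise depends on ROW_NUMBER() happening to follow recursion order. Below is a small sketch that runs both directions against SQLite (which also supports WITH RECURSIVE) through Python's sqlite3 module. The parent assignments for P5 to P10 are an assumption based on the diagram; the question only confirms that P8's parent is P3. The function name `build_demo` is hypothetical.

```python
import sqlite3

DESCENDANTS_SQL = """
    WITH RECURSIVE descendants(id, depth) AS (
        SELECT id, 0 FROM persons WHERE id = :root
        UNION ALL
        SELECT p.id, d.depth + 1
        FROM persons p
        JOIN descendants d ON p.parent_id = d.id
    )
    SELECT p.id, p.name, d.depth
    FROM descendants d
    JOIN persons p ON p.id = d.id
    WHERE p.id <> :root
    ORDER BY d.depth, p.id
"""

ANCESTORS_SQL = """
    WITH RECURSIVE ancestors(id, parent_id, height) AS (
        SELECT id, parent_id, 0 FROM persons WHERE id = :start
        UNION ALL
        SELECT p.id, p.parent_id, a.height + 1
        FROM persons p
        JOIN ancestors a ON p.id = a.parent_id
    )
    SELECT p.id, p.name, a.height
    FROM ancestors a
    JOIN persons p ON p.id = a.id
    WHERE p.id <> :start
    ORDER BY a.height
"""


def build_demo():
    """Create the persons hierarchy; children of P2..P4 are assumed from the diagram."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO persons VALUES (?, ?, ?)",
        [
            (1, "P1", None),
            (2, "P2", 1), (3, "P3", 1), (4, "P4", 1),
            (5, "P5", 2), (6, "P6", 2),
            (7, "P7", 3), (8, "P8", 3),
            (9, "P9", 4), (10, "P10", 4),
        ],
    )
    return conn


conn = build_demo()
print(conn.execute(DESCENDANTS_SQL, {"root": 1}).fetchall())
print(conn.execute(ANCESTORS_SQL, {"start": 8}).fetchall())
```

The sketch uses UNION ALL because the depth column already makes each recursive row distinct in a tree; UNION, as in the answer, additionally removes duplicate rows.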

Grouped LIMIT 10 in Postgresql

I have a query:
select
a.kli,
b.term_desc,
count(distinct(a.adic)) as count,
a.partner_id
from
ad_delivery.sgmt_kli_adic a
join wand.wandterms b on a.kli = b.term_code
join wand.wandterms c on b.term_desc=c.term_desc
join dwh.sgmt_clients e on a.partner_id::varchar = e.partner_id
join dwh.schema_names f on e.partner_id::integer = f.partner_id::integer
where
a.partner_id::integer in (f.partner_id)
and c.class_code = 969
group by a.partner_id, b.term_desc, a.kli
order by partner_id, count desc;
This returns counts for certain terms per partner_id. I want to show only the top 10 terms for each of the ~40 partner_id values, ordered by count descending.
The query results look like:
db=# SELECT * FROM xxx;
pid | term_desc | count
----+------------+------
4 | termdesc1 | 3434
4 | termdesc2 | 235
4 | termdesc3 | 367
4 | termdesc4 | 4533
5 | termdesc1 | 235
5 | termdesc2 | 567
5 | termdesc3 | 344
5 | termdesc4 | 56
(10k+ rows)
You could add a rank column and then filter the result by the rank in an outer query:
select kli, term_desc, count, partner_id
from (
select
a.kli,
b.term_desc,
count(distinct(a.adic)) as count,
a.partner_id,
RANK() OVER (PARTITION BY a.partner_id ORDER BY count(distinct(a.adic)) DESC) AS r
from
ad_delivery.sgmt_kli_adic a
join wand.wandterms b on a.kli = b.term_code
join wand.wandterms c on b.term_desc=c.term_desc
join dwh.sgmt_clients e on a.partner_id::varchar = e.partner_id
join dwh.schema_names f on e.partner_id::integer = f.partner_id::integer
where
a.partner_id::integer in (f.partner_id)
and c.class_code = 969
group by a.partner_id, b.term_desc, a.kli
) ranked
where r < 11
order by partner_id, count desc;
I have not tested the code, however the trick is to rank each grouped row within its partner_id by the count, descending, and then keep only the rows with a rank lower than 11. The filter has to sit in an outer query because a window function's result cannot be referenced in the WHERE or HAVING clause of the same query level. With RANK(), ties at the cutoff can return more than 10 rows for a group; use ROW_NUMBER() instead if you need at most 10.
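For a self-contained illustration of per-group top-N filtering, here is a sketch that can be run locally. It is not the original Postgres query: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), a hypothetical table `term_counts` holding the already-aggregated sample rows from the question, a column named `cnt` instead of `count`, and ROW_NUMBER() so that each group yields at most N rows.

```python
import sqlite3

TOP_N_SQL = """
    SELECT pid, term_desc, cnt
    FROM (
        SELECT pid, term_desc, cnt,
               ROW_NUMBER() OVER (PARTITION BY pid ORDER BY cnt DESC) AS r
        FROM term_counts
    ) AS ranked
    WHERE r <= ?
    ORDER BY pid, cnt DESC
"""


def build_demo():
    """Load the sample (pid, term_desc, count) rows shown in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term_counts (pid INTEGER, term_desc TEXT, cnt INTEGER)")
    conn.executemany(
        "INSERT INTO term_counts VALUES (?, ?, ?)",
        [
            (4, "termdesc1", 3434), (4, "termdesc2", 235),
            (4, "termdesc3", 367), (4, "termdesc4", 4533),
            (5, "termdesc1", 235), (5, "termdesc2", 567),
            (5, "termdesc3", 344), (5, "termdesc4", 56),
        ],
    )
    return conn


def top_n_per_partner(conn, n):
    """Return the n highest-count rows for each pid."""
    return conn.execute(TOP_N_SQL, (n,)).fetchall()


conn = build_demo()
for row in top_n_per_partner(conn, 2):
    print(row)
```

The window function is computed in the inner query and filtered in the outer one, which is the same structure the full Postgres query would need.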

TSQL A recursive update?

I'm wondering whether a recursive update exists in T-SQL (using a CTE).
ID parentID value
-- -------- -----
1 NULL 0
2 1 0
3 2 0
4 3 0
5 4 0
6 5 0
Is it possible to update the column value recursively, e.g. using a CTE, from ID = 6 up to the topmost row?
Yes, it should be. MSDN gives an example:
USE AdventureWorks;
GO
WITH DirectReports(EmployeeID, NewVacationHours, EmployeeLevel)
AS
(SELECT e.EmployeeID, e.VacationHours, 1
FROM HumanResources.Employee AS e
WHERE e.ManagerID = 12
UNION ALL
SELECT e.EmployeeID, e.VacationHours, EmployeeLevel + 1
FROM HumanResources.Employee as e
JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID
)
UPDATE HumanResources.Employee
SET VacationHours = VacationHours * 1.25
FROM HumanResources.Employee AS e
JOIN DirectReports AS d ON e.EmployeeID = d.EmployeeID;
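Applied to the table in the question, the idea is to let the recursive CTE collect the chain of IDs from the starting row up to the root and then update exactly those rows. The sketch below is an illustration under assumptions, not T-SQL: it runs on SQLite through Python's sqlite3 module, the table name `nodes` and the helper names are hypothetical (the question does not name the table), and in SQL Server the CTE would be written without the RECURSIVE keyword.

```python
import sqlite3

UPDATE_CHAIN_SQL = """
    WITH RECURSIVE chain(id, parentID) AS (
        -- anchor: the row to start from
        SELECT id, parentID FROM nodes WHERE id = ?
        UNION ALL
        -- recursive step: walk up to the parent of the previous row
        SELECT n.id, n.parentID
        FROM nodes n
        JOIN chain c ON n.id = c.parentID
    )
    UPDATE nodes
    SET value = ?
    WHERE id IN (SELECT id FROM chain)
"""


def build_demo():
    """Create the six-row parent chain from the question, all values 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parentID INTEGER, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(1, None, 0), (2, 1, 0), (3, 2, 0), (4, 3, 0), (5, 4, 0), (6, 5, 0)],
    )
    return conn


def update_up_to_root(conn, start_id, new_value):
    """Set value on start_id and on every ancestor up to the topmost row."""
    conn.execute(UPDATE_CHAIN_SQL, (start_id, new_value))


conn = build_demo()
update_up_to_root(conn, 3, 1)
print(conn.execute("SELECT id, value FROM nodes ORDER BY id").fetchall())
```

Starting from ID 3 touches only rows 3, 2 and 1; starting from ID 6 would update the whole chain.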