PostgreSQL: recursively join a second table - postgresql

I'm struggling with recursion in PostgreSQL. I need to join a first table with a second one, and then recursively join within the second table. I looked at quite a number of examples, but most are about finding the parent records within a single table, and this has left me utterly confused.
Here's a minimal example with tables thing and category. Records in thing may or may not have a category:
id
name
category
1
a5
3
2
passat
2
3
apple
NULL
Records in category may have one or more parents in the same table:
id
name
parent_category
1
vehicle
NULL
2
car
1
3
coupe
2
The result I'm looking for is the combination of all things with their categories, as well as the category level (1 for the direct parent, 2 for the level above).
thing_name
category_name
level
a5
coupe
1
a5
car
2
a5
vehicle
3
passat
car
1
passat
vehicle
2
apple
NULL
NULL
I have a DB Fiddle here: https://www.db-fiddle.com/f/b7V8ddragZZ9x2RsMkdFYn/5
CREATE TABLE category (
id INT,
name TEXT,
parent_category INT
);
INSERT INTO category VALUES (1, 'vehicle', null);
INSERT INTO category VALUES (2, 'car', 1);
INSERT INTO category VALUES (3, 'coupe', 2);
CREATE TABLE thing (
id INT,
name TEXT,
category INT
);
INSERT INTO thing VALUES (1, 'a5', 3);
INSERT INTO thing VALUES (2, 'passat', 2);
INSERT INTO thing VALUES (3, 'apple', null);

Use a CTE to join the tables, giving you a tree-like view of combined thing_categories, which you can then use with a normal recursive CTE.
with recursive join_thing_category as (
select thing.id as thing_id,
thing.name as thing_name,
thing.category as thing_category,
category.id as category_id,
category.name as category_name,
category.parent_category as parent_category
from thing left join category on thing.category=category.id
),
recursive_part(n) as (
select thing_id, thing_name, thing_category, category_id, category_name, parent_category, (0*parent_category) + 1 as level from join_thing_category
union all
select 1, thing_name, thing_category, cat.id category_id, cat.name, cat.parent_category as parent, level+1 as level from recursive_part rp cross join category cat
where cat.id=rp.parent_category
)
select thing_name, category_name, level from recursive_part order by 1, 2, 3 limit 1024;
thing_name
category_name
level
a5
car
2
a5
coupe
1
a5
vehicle
3
apple
passat
car
1
passat
vehicle
2
View on DB Fiddle
The (0*parent_category) + 1 as level bit is so that things with no category get NULL as their level instead of 1.

Related

SQL SELECT Parent-Child with SORT (...again)

UPDATED:
I have a simple, one level parent child relation table, with following columns:
ID_Asset| Parent_ID_Asset | ProductTitle
I need output grouped by Parent followed by children, and also sorted by Parent and Children Name. My attempts in the fiddle.
See here for details: https://rextester.com/PPCHG20007
Desired order:
9 8 NULL ADONIS Server
7 16 8 ADONIS Designer
8 20 8 ADONIS Portal Module “Control & Release” Package XS
Parent first, than children, while ProductTitle ordered alphabetically.
Thanx all for hints so far.
I would do conditional ordering instead :
select t.*
from table t
order by (case when parent_id is null then id else parent_id end), ProductTitle;
I am assuming you need to sort the data based on parent-child relation.
As far as I understand, you need to sort on the name of the root products, and in between them to show the sub-products, ordered by name, and in between them to show their sub-products, etc.
I guess you are using a recursive cte. You can define a "hierarchy sorting" helper which is a padded number in the current level, and for each level deep, add a suffix with the padded number in the current level, etc.
Something like that:
declare #Products table(ID int, Parent_ID int, ProductTitle varchar(100))
insert into #Products values
(1, NULL, 'ADONIS'),
(2, NULL, 'BACARAT'),
(3, 1, 'Portal Module'),
(4, 1, 'Alhambra'),
(5, NULL, 'ZULU'),
(6, 2, 'Omega')
; with cte as (
select ID, Parent_ID, ProductTitle, FORMAT(ROW_NUMBER() over(order by ProductTitle), '0000') as SortingHelper
from #Products
where Parent_ID is null
union all
select p.ID, p.Parent_ID, p.ProductTitle, cte.SortingHelper + '.' + FORMAT(ROW_NUMBER() over(order by p.ProductTitle), '0000') as SortingHelper
from #Products p
inner join cte on cte.ID = p.Parent_ID
)
select ID, Parent_ID, ProductTitle
from cte
order by SortingHelper
I think this is the ordering you are looking for. This joins each row to its parent (if it exists). Then if there is a TP (parent) record then that is the parent title, ID etc, otherwise the current record must be the parent. I've shown the parent name in the query results for clarity. Then it sorts by
Parent name (so parents and children are together, but in parent name order)
Parent ID (in case two or more parents have the same name/title, this will keep children with the correct parent)
A flag which is 0 for parents, 1 for children, so the parent comes first
The current record name, which will sort children by name/title order
Code is
Select T.*,
isnull(TP.ProductTitle, T.ProductTitle) as ParentName -- This is the parent name, shown for reference
from test T
left outer join Test TP on TP.ID_Asset = T.Parent_ID_Asset --This is the parent record, if it exists
ORDER BY isnull(TP.ProductTitle, T.ProductTitle), --ParentName sort
isnull(TP.ID_Asset, T.ID_Asset), --if two parents have the same title, this makes sure they group with their correct children
Case when T.Parent_ID_Asset is null then 0 else 1 end, --this makes sure the parent comes before the child
T.ProductTitle --Child Sort

How to query the data in a join table by two sets of joined records?

I've got three tables: users, courses, and grades, the latter of which joins users and courses with some metadata like the user's score for the course. I've created a SQLFiddle, though the site doesn't appear to be working at the moment. The schema looks like this:
CREATE TABLE users(
id INT,
name VARCHAR,
PRIMARY KEY (ID)
);
INSERT INTO users VALUES
(1, 'Beth'),
(2, 'Alice'),
(3, 'Charles'),
(4, 'Dave');
CREATE TABLE courses(
id INT,
title VARCHAR,
PRIMARY KEY (ID)
);
INSERT INTO courses VALUES
(1, 'Biology'),
(2, 'Algebra'),
(3, 'Chemistry'),
(4, 'Data Science');
CREATE TABLE grades(
id INT,
user_id INT,
course_id INT,
score INT,
PRIMARY KEY (ID)
);
INSERT INTO grades VALUES
(1, 2, 2, 89),
(2, 2, 1, 92),
(3, 1, 1, 93),
(4, 1, 3, 88);
I'd like to know how (if possible) to construct a query which specifies some users.id values (1, 2, 3) and courses.id values (1, 2, 3) and returns those users' grades.score values for those courses
| name | Algebra | Biology | Chemistry |
|---------|---------|---------|-----------|
| Alice | 89 | 92 | |
| Beth | | 93 | 88 |
| Charles | | | |
In my application logic, I'll be receiving an array of user_ids and course_ids, so the query needs to select those users and courses dynamically by primary key. (The actual data set contains millions of users and tens of thousands of courses—the examples above are just a sample to work with.)
Ideally, the query would:
use the course titles as dynamic attributes/column headers for the users' score data
sort the row and column headers alphabetically
include empty/NULL cells if the user-course pair has no grades relationship
I suspect I may need some combination of JOINs and Postgresql's crosstab, but I can't quite wrap my head around it.
Update: learning that the terminology for this is "dynamic pivot", I found this SO answer which appears to be trying to solve a related problem in Postgres with crosstab()
I think a simple pivot query should work here, since you only have 4 courses in your data set to pivot.
SELECT t1.name,
MAX(CASE WHEN t3.title = 'Biology' THEN t2.score ELSE NULL END) AS Biology,
MAX(CASE WHEN t3.title = 'Algebra' THEN t2.score ELSE NULL END) AS Algebra,
MAX(CASE WHEN t3.title = 'Chemistry' THEN t2.score ELSE NULL END) AS Chemistry,
MAX(CASE WHEN t3.title = 'Data Science' THEN t2.score ELSE NULL END) AS Data_Science
FROM users t1
LEFT JOIN grades t2
ON t1.id = t2.user_id
LEFT JOIN courses t3
ON t2.course_id = t3.id
GROUP BY t1.name
Follow the link below for a running demo. I used MySQL because, as you have noticed, SQLFiddle seems to be perpetually busted the other databases.
SQLFiddle

Find all records NOT in any blocked range where blocked ranges are in a table

I have a table TaggedData with the following fields and data
ID GroupID Tag MyData
** ******* *** ******
1 Texas AA01 Peanut Butter
2 Texas AA15 Cereal
3 Ohio AA05 Potato Chips
4 Texas AA08 Bread
I have a second table of BlockedTags as follows:
ID StartTag EndTag
** ******** ******
1 AA00 AA04
2 AA15 AA15
How do I select from this to return all data matching a given GroupId but NOT in any blocked range (inclusive)? For the data given if the GroupId is Texas, I don't want to return Cereal because it matches the second range. It should only return Bread.
I did try left joins based queries but I'm not even that close.
Thanks
create table TaggedData (
ID int,
GroupID varchar(16),
Tag char(4),
MyData varchar(50))
create table BlockedTags (
ID int,
StartTag char(4),
EndTag char(4)
)
insert into TaggedData(ID, GroupID, Tag, MyData)
values (1, 'Texas', 'AA01', 'Peanut Butter')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (2, 'Texas' , 'AA15', 'Cereal')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (3, 'Ohio ', 'AA05', 'Potato Chips')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (4, 'Texas', 'AA08', 'Bread')
insert into BlockedTags(ID, StartTag, EndTag)
values (1, 'AA00', 'AA04')
insert into BlockedTags(ID, StartTag, EndTag)
values (2, 'AA15', 'AA15')
select t.* from TaggedData t
left join BlockedTags b on t.Tag between b.StartTag and b.EndTag
where b.ID is null
Returns:
ID GroupID Tag MyData
----------- ---------------- ---- --------------------------------------------------
3 Ohio AA05 Potato Chips
4 Texas AA08 Bread
(2 row(s) affected)
So, to match on given GroupID you change the query like that:
select t.* from TaggedData t
left join BlockedTags b on t.Tag between b.StartTag and b.EndTag
where b.ID is null and t.GroupID=#GivenGroupID
I Prefer the NOT EXISTS simply because it gives you more readability, usability and better performance usually in large data (several cases get better execution plans):
would be like this:
SELECT * from TaggedData
WHERE GroupID=#GivenGroupID
AND NOT EXISTS(SELECT 1 FROM BlockedTags WHERE Tag BETWEEN StartTag ANDEndTag)

T-SQL query, multiple values in a field

I have two tables in a database. The first table tblTracker contains many columns, but the column of particular interest is called siteAdmin and each row in that column can contain multiple loginIDs of 5 digits like 21457, 21456 or just one like 21444. The next table users contains columns like LoginID, fname, and lname.
What I would like to be able to do is take the loginIDs contained in tblTracker.siteAdmin and return fname + lname from users. I can successfully do this when there is only one loginID in the row such as 21444 but I cannot figure out how to do this when there is more than one like 21457, 21456.
Here is the SQL statement I use for when there is one loginID in that column
SELECT b.FName + '' '' + b.LName AS siteAdminName,
FROM tblTracker a
LEFT OUTER JOIN users b ON a.siteAdmin= b.Login_Id
However this doesn't work when it tries to join a siteAdmin with more than one LoginID in it
Thanks!
I prefer the number table approach to split a string in TSQL
For this method to work, you need to do this one time table setup:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
#SplitOn char(1) --REQUIRED, the character to split the #List string on
,#List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(#SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT #SplitOn + #List + #SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = #SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
ListValue
-----------------------
1
2
3
4
5
6777
(6 row(s) affected)
Your can now use a CROSS APPLY to split every row in your table like:
DECLARE #users table (LoginID int, fname varchar(5), lname varchar(5))
INSERT INTO #users VALUES (1, 'Sam', 'Jones')
INSERT INTO #users VALUES (2, 'Don', 'Smith')
INSERT INTO #users VALUES (3, 'Joe', 'Doe')
INSERT INTO #users VALUES (4, 'Tim', 'White')
INSERT INTO #users VALUES (5, 'Matt', 'Davis')
INSERT INTO #users VALUES (15,'Sue', 'Me')
DECLARE #tblTracker table (RowID int, siteAdmin varchar(50))
INSERT INTO #tblTracker VALUES (1,'1,2,3')
INSERT INTO #tblTracker VALUES (2,'2,3,4')
INSERT INTO #tblTracker VALUES (3,'1,5')
INSERT INTO #tblTracker VALUES (4,'1')
INSERT INTO #tblTracker VALUES (5,'5')
INSERT INTO #tblTracker VALUES (6,'')
INSERT INTO #tblTracker VALUES (7,'8,9,10')
INSERT INTO #tblTracker VALUES (8,'1,15,3,4,5')
SELECT
t.RowID, u.LoginID, u.fname+' '+u.lname AS YourAdmin
FROM #tblTracker t
CROSS APPLY dbo.FN_ListToTable(',',t.siteAdmin) st
LEFT OUTER JOIN #users u ON st.ListValue=u.LoginID --to get all rows even if missing siteAdmin
--INNER JOIN #users u ON st.ListValue=u.LoginID --to remove rows without any siteAdmin
ORDER BY t.RowID,u.fname,u.lname
OUTPUT:
RowID LoginID YourAdmin
----------- ----------- -----------
1 2 Don Smith
1 3 Joe Doe
1 1 Sam Jones
2 2 Don Smith
2 3 Joe Doe
2 4 Tim White
3 5 Matt Davis
3 1 Sam Jones
4 1 Sam Jones
5 5 Matt Davis
7 NULL NULL
7 NULL NULL
7 NULL NULL
8 3 Joe Doe
8 5 Matt Davis
8 1 Sam Jones
8 15 Sue Me
8 4 Tim White
(18 row(s) affected)

Find duplicate row "details" in table

OrderId OrderCode Description
-------------------------------
1 Z123 Stuff
2 ABC999 Things
3 Z123 Stuff
I have duplicates in a table like the above. I'm trying to get a report of which Orders are duplicates, and what Order they are duplicates of, so I can figure out how they got into the database.
So ideally I'd like to get an output something like;
OrderId IsDuplicatedBy
-------------------------
1 3
3 1
I can't work out how to code this in SQL.
You can use the same table twice in one query and join on the fields you need to check against. T1.OrderID <> T2.OrderID is needed to not find a duplicate for the same row.
declare #T table (OrderID int, OrderCode varchar(10), Description varchar(50))
insert into #T values
(1, 'Z123', 'Stuff'),
(2, 'ABC999', 'Things'),
(3, 'Z123', 'Stuff')
select
T1.OrderID,
T2.OrderID as IsDuplicatedBy
from #T as T1
inner join #T as T2
on T1.OrderCode = T2.OrderCode and
T1.Description = T2.Description and
T1.OrderID <> T2.OrderID
Result:
OrderID IsDuplicatedBy
1 3
3 1