partition over two columns - db2

I want to partition by two columns (PROJECT_ID and CATEGORY_NAME), and I'm having trouble writing the correct syntax. My query below works, but when I add a second column to the OVER clause it no longer behaves correctly. The recursive query concatenates rows per project_id, combining name_first and name_last into a list of admins. I need to include CATEGORY_NAME in the partition because some admins in the list work in different categories ('INVISION' and 'INSIGHT') but under the same project_id. The first subquery
SELECT
RowNumber() over (PARTITION BY F13.DIM_PROJECT_ID, F13.CATEGORY_NAME ORDER BY F13.PROJECT_NAME),
F13.DIM_PROJECT_ID.....etc.
extracts the correct data; I'm just unsure how to pull that data out partitioned by both project and category. I'm using DB2.
with
t1(rowNum, PROJECT_ID, NAME_LAST, NAME_FIRST, POINT_OF_CONTACT, PROJECT_NAME, CATEGORY_NAME) as
(
SELECT
RowNumber() over (PARTITION BY F13.DIM_PROJECT_ID, F13.CATEGORY_NAME ORDER BY F13.PROJECT_NAME),
F13.DIM_PROJECT_ID,
F2P.NAME_LAST,
F2P.NAME_FIRST,
REPLACE(F2P.POINT_OF_CONTACT, ',', ' |') AS POINT_OF_CONTACT,
F13.PROJECT_NAME,
F2H.CATEGORY_NAME
FROM FACT_TABLE AS F13
INNER JOIN ADMIN AS F2P ON F13.DIM_PROJECT_ID = F2P.DIM_PROJECT_ID
LEFT JOIN HOURS AS F2H ON F13.DIM_PROJECT_ID = F2H.DIM_PROJECT_ID
WHERE F2H.CATEGORY_NAME = ('INVISION')
group by
F13.DIM_PROJECT_ID,
F13.PROJECT_NAME,
F2P.NAME_LAST,
F2P.NAME_FIRST,
F2P.POINT_OF_CONTACT,
F2H.CATEGORY_NAME
) ,
t2(PROJECT_ID, LIST, POINT_OF_CONTACT, PROJECT_NAME, CATEGORY_NAME, cnt) AS
( SELECT PROJECT_ID,
VARCHAR(NAME_FIRST CONCAT ' ' CONCAT NAME_LAST, 6000),
POINT_OF_CONTACT,
PROJECT_NAME,
CATEGORY_NAME,
1
FROM t1
WHERE rowNum = 1
UNION ALL
SELECT t2.PROJECT_ID,
t2.list || ' | ' || t1.NAME_FIRST CONCAT ' ' CONCAT t1.NAME_LAST,
t1.POINT_OF_CONTACT,
t1.PROJECT_NAME,
t1.CATEGORY_NAME,
t2.cnt + 1
FROM t2, t1
WHERE t2.project_id = t1.project_id
AND t2.cnt + 1 = t1.rowNum )
SELECT PROJECT_ID,
PROJECT_NAME,
POINT_OF_CONTACT,
CATEGORY_NAME,
list
FROM t2
WHERE ( PROJECT_ID, cnt ) IN (
SELECT PROJECT_ID, MAX(rowNum)
FROM t1
GROUP BY PROJECT_ID )
The results contain duplicates, but only when the second column (category_name) is included in the partition clause.

I figured it out. I added an ID for category, partitioned by project_id and category_id, and matched on both columns in the recursive step and in the final filter.
with
t1(rowNum, PROJECT_ID, NAME_LAST, NAME_FIRST, POINT_OF_CONTACT, PROJECT_NAME, CATEGORY_ID, CATEGORY_NAME) as
(
SELECT
RowNumber() over (PARTITION BY F13.DIM_PROJECT_ID, F13.CATEGORY_ID ORDER BY F13.PROJECT_NAME, F13.CATEGORY_NAME),
F13.DIM_PROJECT_ID,
F2P.NAME_LAST,
F2P.NAME_FIRST,
REPLACE(F2P.POINT_OF_CONTACT, ',', ' |') AS POINT_OF_CONTACT,
F13.PROJECT_NAME,
F13.CATEGORY_ID,
F13.CATEGORY_NAME
FROM FACT_TABLE AS F13
INNER JOIN ADMIN AS F2P ON F13.DIM_PROJECT_ID = F2P.DIM_PROJECT_ID
LEFT JOIN HOURS AS F2H ON F13.DIM_PROJECT_ID = F2H.DIM_PROJECT_ID
WHERE F13.CATEGORY_NAME = ('INVISION')
group by
F13.DIM_PROJECT_ID,
F13.PROJECT_NAME,
F2P.NAME_LAST,
F2P.NAME_FIRST,
F2P.POINT_OF_CONTACT,
F13.CATEGORY_ID,
F13.CATEGORY_NAME
) ,
t2(PROJECT_ID, LIST, POINT_OF_CONTACT, PROJECT_NAME, CATEGORY_ID, CATEGORY_NAME, cnt) AS
( SELECT PROJECT_ID,
VARCHAR(NAME_FIRST CONCAT ' ' CONCAT NAME_LAST, 6000),
POINT_OF_CONTACT,
PROJECT_NAME,
CATEGORY_ID,
CATEGORY_NAME,
1
FROM t1
WHERE rowNum = 1
UNION ALL
SELECT t2.PROJECT_ID,
t2.list || ' | ' || t1.NAME_FIRST CONCAT ' ' CONCAT t1.NAME_LAST,
t1.POINT_OF_CONTACT,
t1.PROJECT_NAME,
t1.CATEGORY_ID,
t1.CATEGORY_NAME,
t2.cnt + 1
FROM t2, t1
WHERE t2.project_id = t1.project_id
AND t2.category_id = t1.category_id
AND t2.cnt + 1 = t1.rowNum )
SELECT PROJECT_ID,
PROJECT_NAME,
POINT_OF_CONTACT,
CATEGORY_ID,
CATEGORY_NAME,
list
FROM t2
WHERE ( PROJECT_ID, CATEGORY_ID, cnt ) IN (
SELECT PROJECT_ID, CATEGORY_ID, MAX(rowNum)
FROM t1
GROUP BY PROJECT_ID, CATEGORY_ID )
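A minimal sketch of the same idea, runnable with Python's built-in sqlite3 module rather than DB2. The admins table and its sample rows are hypothetical, and SQLite's syntax differs slightly from DB2 (WITH RECURSIVE, || for concatenation). What it tries to illustrate: once the row number restarts per (project_id, category_id), the recursive step and the final filter have to match on both columns too, otherwise lists from different categories get mixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: project 1 has admins in two categories.
conn.executescript("""
CREATE TABLE admins (project_id INT, category_id INT, name_first TEXT, name_last TEXT);
INSERT INTO admins VALUES
  (1, 10, 'Ann', 'Lee'),
  (1, 10, 'Bob', 'Ray'),
  (1, 20, 'Cy',  'Poe'),
  (2, 10, 'Dee', 'Fox');
""")

query = """
WITH RECURSIVE
t1 AS (
  -- row numbers restart for every (project_id, category_id) pair
  SELECT ROW_NUMBER() OVER (PARTITION BY project_id, category_id
                            ORDER BY name_last, name_first) AS rn,
         project_id, category_id,
         name_first || ' ' || name_last AS full_name
  FROM admins
),
t2(project_id, category_id, list, cnt) AS (
  SELECT project_id, category_id, full_name, 1
  FROM t1
  WHERE rn = 1
  UNION ALL
  -- append the next name, matching on BOTH partition columns
  SELECT t2.project_id, t2.category_id,
         t2.list || ' | ' || t1.full_name, t2.cnt + 1
  FROM t2
  JOIN t1 ON t1.project_id = t2.project_id
         AND t1.category_id = t2.category_id
         AND t1.rn = t2.cnt + 1
)
SELECT project_id, category_id, list
FROM t2
WHERE (project_id, category_id, cnt) IN (
  SELECT project_id, category_id, MAX(rn)
  FROM t1
  GROUP BY project_id, category_id)
ORDER BY project_id, category_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each (project_id, category_id) pair ends up with exactly one row holding its complete list.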

Related

group by 2 fields oracle sql inner join

I can't get the syntax correct to be able to group by two fields: as_of_date and ISSUERID. Thanks!
select as_of_date, count(distinct(issuer_id)) from
crd_own.ml_corp_index_data_monthly tb1
INNER JOIN pm_own.esg_credit_factors tb2
ON tb1.TICKER = tb2.ISSUER_TICKER
AND trunc(tb1.DATADATE, 'month') = trunc(tb2.AS_OF_DATE, 'month')
where INDEXNAME ='IG'
and DATADATE = '31-DEC-17'
group by as_of_date, ISSUERID
order by as_of_date asc
You do not have the same "non-aggregating" columns in both the select and group by clauses:
SELECT
as_of_date
, COUNT( DISTINCT (issuer_id) )
...
GROUP BY
as_of_date
, ISSUERID -- <<< this is the problem
You need to either include ISSUERID in the select clause:
SELECT
as_of_date
, ISSUERID
, COUNT( DISTINCT (issuer_id) )
...
GROUP BY
as_of_date
, ISSUERID
ORDER BY
as_of_date ASC
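Or remove ISSUERID from the GROUP BY clause completely. In general, every non-aggregating column in the select list must be repeated in the GROUP BY clause: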
SELECT
-- non-aggregating columns
as_of_date
, ISSUERID
-- aggregating columns
, COUNT( DISTINCT (issuer_id) )
FROM ...
WHERE ...
GROUP BY
-- repeat all non-aggregating columns here
as_of_date
, ISSUERID
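To make the difference between the two groupings concrete, here is a small sketch using Python's built-in sqlite3 module instead of Oracle. The holdings table and its rows are made up for illustration. Note that SQLite is more lenient than Oracle about non-grouped columns, so this only shows the shape of each result, not the Oracle error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: issuer A appears twice on the first date.
conn.executescript("""
CREATE TABLE holdings (as_of_date TEXT, issuer_id TEXT);
INSERT INTO holdings VALUES
  ('2017-12-31', 'A'),
  ('2017-12-31', 'A'),
  ('2017-12-31', 'B'),
  ('2018-01-31', 'A');
""")

# Grouping by date only: one row per date, counting distinct issuers.
per_date = conn.execute("""
  SELECT as_of_date, COUNT(DISTINCT issuer_id)
  FROM holdings
  GROUP BY as_of_date
  ORDER BY as_of_date
""").fetchall()

# Grouping by date and issuer: one row per pair, so both columns
# appear in the select list and in the GROUP BY clause.
per_pair = conn.execute("""
  SELECT as_of_date, issuer_id, COUNT(*)
  FROM holdings
  GROUP BY as_of_date, issuer_id
  ORDER BY as_of_date, issuer_id
""").fetchall()

print(per_date)
print(per_pair)
```

If the goal is the number of distinct issuers per date, the first form (grouping by the date alone) is the one to use.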

sql recursion: find tree given middle node

I need to get a tree of related nodes given a certain node, which is not necessarily the top node. I have a solution that uses two CTEs, since I am struggling to squeeze it all into one CTE :). Does anybody have a sleek solution that avoids using two CTEs? Here is some code that I was playing with:
DECLARE @temp AS TABLE (ID INT, ParentID INT)
INSERT INTO @temp
SELECT 1 ID, NULL AS ParentID
UNION ALL
SELECT 2, 1
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 3
UNION ALL
SELECT 5, 4
UNION ALL
SELECT 6, NULL
UNION ALL
SELECT 7, 6
UNION ALL
SELECT 8, 7
DECLARE @startNode INT = 4
-- single CTE: returns only the start node and its descendants
;WITH TheTree (ID,ParentID)
AS (
SELECT ID, ParentID
FROM @temp
WHERE ID = @startNode
UNION ALL
SELECT t.id, t.ParentID
FROM @temp t
JOIN TheTree tr ON t.ParentID = tr.ID
)
SELECT * FROM TheTree
-- two CTEs: walk up to the root first, then back down the whole tree
;WITH Up(ID,ParentID)
AS (
SELECT t.id, t.ParentID
FROM @temp t
WHERE t.ID = @startNode
UNION ALL
SELECT t.id, t.ParentID
FROM @temp t
JOIN Up c ON t.id = c.ParentID
)
--SELECT * FROM Up
,TheTree (ID,ParentID)
AS (
SELECT ID, ParentID
FROM Up
WHERE ParentID is null
UNION ALL
SELECT t.id, t.ParentID
FROM @temp t
JOIN TheTree tr ON t.ParentID = tr.ID
)
SELECT * FROM TheTree
thanks
Meh. This avoids using two CTEs, but the result is a brute force kludge that hardly qualifies as "sleek" as it won’t be efficient if your table is at all sizeable. It will:
Recursively build all possible hierarchies
As you build them, flag the target NodeId as you find it
Return only the targeted tree
I threw in column “TreeNumber” on the off-chance the TargetId appears in multiple hierarchies, or if you’d ever have multiple values to check in one pass. “Depth” was added to make the output a bit more legible.
A more complex solution like @John’s might do, and more and subtler tricks could be done with more detailed table structures.
DECLARE @startNode INT = 4
;WITH cteAllTrees (TreeNumber, Depth, ID, ParentID, ContainsTarget)
AS (
SELECT
row_number() over (order by ID) TreeNumber
,1
,ID
,ParentID
,case
when ID = @startNode then 1
else 0
end ContainsTarget
FROM @temp
WHERE ParentId is null
UNION ALL
SELECT
tr.TreeNumber
,tr.Depth + 1
,t.id
,t.ParentID
,case
when tr.ContainsTarget = 1 then 1
when t.ID = @startNode then 1
else 0
end ContainsTarget
FROM @temp t
INNER JOIN cteAllTrees tr
ON t.ParentID = tr.ID
)
SELECT
TreeNumber
,Depth
,ID
,ParentId
from cteAllTrees
where TreeNumber in (select TreeNumber from cteAllTrees where ContainsTarget = 1)
order by
TreeNumber
,Depth
,ID
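The brute-force approach above can be tried out with Python's built-in sqlite3 module. This is only a sketch under a few assumptions: the nodes table name is hypothetical, SQLite stands in for SQL Server, and each tree is identified by the ID of its root rather than by a ROW_NUMBER value, which keeps the anchor query simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same sample hierarchy as in the question: two trees rooted at 1 and 6.
conn.executescript("""
CREATE TABLE nodes (id INT, parent_id INT);
INSERT INTO nodes VALUES
  (1, NULL), (2, 1), (3, 2), (4, 3), (5, 4),
  (6, NULL), (7, 6), (8, 7);
""")

query = """
WITH RECURSIVE all_trees(root_id, depth, id, parent_id) AS (
  -- anchor: every root starts its own tree
  SELECT id, 1, id, parent_id
  FROM nodes
  WHERE parent_id IS NULL
  UNION ALL
  -- recursive step: attach children, carrying the root id along
  SELECT tr.root_id, tr.depth + 1, n.id, n.parent_id
  FROM nodes n
  JOIN all_trees tr ON n.parent_id = tr.id
)
SELECT id, parent_id, depth
FROM all_trees
-- keep only the tree(s) that contain the start node
WHERE root_id IN (SELECT root_id FROM all_trees WHERE id = :start)
ORDER BY depth, id
"""
tree = conn.execute(query, {"start": 4}).fetchall()
for row in tree:
    print(row)
```

As with the T-SQL version, every hierarchy is built before the filter is applied, so this is unlikely to scale well on a large table.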
Here is a technique where you can select the entire hierarchy, a specific node with all its children, and even a filtered list and how they roll.
Note: See the comments next to the DECLAREs
Declare @YourTable table (id int,pt int,name varchar(50))
Insert into @YourTable values
(1,null,'1'),(2,1,'2'),(3,1,'3'),(4,2,'4'),(5,2,'5'),(6,3,'6'),(7,null,'7'),(8,7,'8')
Declare @Top int = null --<< Sets top of Hier Try 2
Declare @Nest varchar(25) = '|-----' --<< Optional: Added for readability
Declare @Filter varchar(25) = '' --<< Empty for All or try 4,6
;with cteP as (
Select Seq = cast(1000+Row_Number() over (Order by name) as varchar(500))
,ID
,pt
,Lvl=1
,name
From @YourTable
Where IsNull(@Top,-1) = case when @Top is null then isnull(pt,-1) else ID end
Union All
Select Seq = cast(concat(p.Seq,'.',1000+Row_Number() over (Order by r.name)) as varchar(500))
,r.ID
,r.pt
,p.Lvl+1
,r.name
From @YourTable r
Join cteP p on r.pt = p.ID)
,cteR1 as (Select *,R1=Row_Number() over (Order By Seq) From cteP)
,cteR2 as (Select A.Seq,A.ID,R2=Max(B.R1) From cteR1 A Join cteR1 B on (B.Seq like A.Seq+'%') Group By A.Seq,A.ID )
Select Distinct
A.R1
,B.R2
,A.ID
,A.pt
,A.Lvl
,name = Replicate(@Nest,A.Lvl-1) + A.name
From cteR1 A
Join cteR2 B on A.ID=B.ID
Join (Select R1 From cteR1 where IIF(@Filter='',1,0)+CharIndex(concat(',',ID,','),concat(',',@Filter+','))>0) F on F.R1 between A.R1 and B.R2
Order By A.R1

t-sql WITH on WITH

I have to run a query on top of a WITH query, something like:
; WITH @table1
(
SELECT id, x from ... WHERE....
UNION ALL
SELECT id, x from ... WHERE...
)
WITH @table2
(
SELECT DISTINCT tbl_x.*,ROW_NUMBER() OVER (order by id) as RowNumber
WHERE id in ( SELECT id from @table1)
)
SELECT * FROM @table2 WHERE RowNumber > ... and ...
So I have to use a WITH on top of a WITH and then SELECT from the second WITH. How can I do that?
You can define multiple CTEs after the WITH keyword by separating each CTE with a comma.
WITH T1 AS
(
SELECT id, x from ... WHERE....
UNION ALL
SELECT id, x from ... WHERE...
)
, T2 AS
(
SELECT DISTINCT tbl_x.*, ROW_NUMBER() OVER (order by id) as RowNumber
FROM tbl_x
WHERE id in ( SELECT id from T1 )
)
SELECT * FROM T2 WHERE RowNumber > ... and ...
https://web.archive.org/web/20210927200924/http://www.4guysfromrolla.com/webtech/071906-1.shtml
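A small runnable sketch of chaining two CTEs, using Python's built-in sqlite3 module in place of SQL Server. The items table, its rows, and the filter conditions are placeholders invented for the example; the point is only the comma between the two CTE definitions and the second CTE reading from the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table.
conn.executescript("""
CREATE TABLE items (id INT, x TEXT);
INSERT INTO items VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
""")

query = """
WITH t1 AS (
  SELECT id, x FROM items WHERE id <= 2
  UNION ALL
  SELECT id, x FROM items WHERE id >= 4
),
t2 AS (
  -- the second CTE can reference the first one by name
  SELECT id, x, ROW_NUMBER() OVER (ORDER BY id) AS rn
  FROM t1
)
SELECT id, x, rn
FROM t2
WHERE rn BETWEEN 2 AND 3
ORDER BY rn
"""
page = conn.execute(query).fetchall()
print(page)
```

Only one WITH keyword is written; every further CTE is introduced by a comma.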

concatenating single column in TSQL

I am using SSMS 2008 and trying to concatenate the values of one column across rows, grouped by another column. I have two columns, people_id and address_desc. They look like this:
address_desc                         people_id
-----------------------------------  ------------------------------------
Murfreesboro, TN 37130               F15D1135-9947-4F66-B778-00E43EC44B9E
11 Mohawk Rd., Burlington, MA 01803  C561918F-C2E9-4507-BD7C-00FB688D2D6E
Unknown, UN 00000                    C561918F-C2E9-4507-BD7C-00FB688D2D6E
Jacksonville, NC 28546               FC7C78CD-8AEA-4C8E-B93D-010BF8E4176D
Memphis, TN 38133                    8ED8C601-5D35-4EB7-9217-012905D6E9F1
44 Maverick St., Fitchburg, MA       8ED8C601-5D35-4EB7-9217-012905D6E9F1
Now I want to concatenate the address_desc values per people_id. So the first person here should just display "Murfreesboro, TN 37130" for address_desc, but the second person should have one line instead of two, reading "11 Mohawk Rd., Burlington, MA 01803;Unknown, UN 00000" for address_desc.
How do I do this? I tried using a CTE, but it gave me an ambiguity error:
WITH CTE ( people_id, address_list, address_desc, length )
AS ( SELECT people_id, CAST( '' AS VARCHAR(8000) ), CAST( '' AS VARCHAR(8000) ), 0
FROM dbo.address_view
GROUP BY people_id
UNION ALL
SELECT p.people_id, CAST( address_list +
CASE WHEN length = 0 THEN '' ELSE ', ' END + c.address_desc AS VARCHAR(8000) ),
CAST( c.address_desc AS VARCHAR(8000)), length + 1
FROM CTE c
INNER JOIN dbo.address_view p
ON c.people_id = p.people_id
WHERE p.address_desc > c.address_desc )
SELECT people_id, address_list
FROM ( SELECT people_id, address_list,
RANK() OVER ( PARTITION BY people_id ORDER BY length DESC )
FROM CTE ) D ( people_id, address_list, rank )
WHERE rank = 1 ;
Here was my initial SQL query:
SELECT a.address_desc, a.people_id
FROM dbo.address_view a
INNER JOIN (SELECT people_id
FROM dbo.address_view
GROUP BY people_id
HAVING COUNT(*) > 1) t
ON a.people_id = t.people_id
order by a.people_id
You can use FOR XML PATH('') like this:
DECLARE @TestData TABLE
(
address_desc NVARCHAR(100) NOT NULL
,people_id UNIQUEIDENTIFIER NOT NULL
);
INSERT @TestData
SELECT 'Murfreesboro, TN 37130', 'F15D1135-9947-4F66-B778-00E43EC44B9E'
UNION ALL
SELECT '11 Mohawk Rd., Burlington, MA 01803', 'C561918F-C2E9-4507-BD7C-00FB688D2D6E'
UNION ALL
SELECT 'Unknown, UN 00000', 'C561918F-C2E9-4507-BD7C-00FB688D2D6E'
UNION ALL
SELECT 'Memphis, TN 38133', '8ED8C601-5D35-4EB7-9217-012905D6E9F1'
UNION ALL
SELECT '44 Maverick St., Fitchburg, MA', '8ED8C601-5D35-4EB7-9217-012905D6E9F1';
SELECT a.people_id,
(SELECT SUBSTRING(
(SELECT ';'+b.address_desc
FROM @TestData b
WHERE a.people_id = b.people_id
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
,2
,4000)
) GROUP_CONCATENATE
FROM @TestData a
GROUP BY a.people_id
Results:
people_id GROUP_CONCATENATE
------------------------------------ ------------------------------------------------------
F15D1135-9947-4F66-B778-00E43EC44B9E Murfreesboro, TN 37130
C561918F-C2E9-4507-BD7C-00FB688D2D6E 11 Mohawk Rd., Burlington, MA 01803;Unknown, UN 00000
8ED8C601-5D35-4EB7-9217-012905D6E9F1 Memphis, TN 38133;44 Maverick St., Fitchburg, MA
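For comparison, the same grouping can be sketched with Python's built-in sqlite3 module. This swaps in a different technique: SQLite has no FOR XML PATH, so the example uses SQLite's GROUP_CONCAT aggregate instead. The shortened people_id values are placeholders, and since SQLite does not promise an order for the concatenated values, the sketch sorts them in Python before comparing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows modelled on the question; ids shortened for readability.
conn.executescript("""
CREATE TABLE address_view (address_desc TEXT, people_id TEXT);
INSERT INTO address_view VALUES
  ('Murfreesboro, TN 37130',              'F15D'),
  ('11 Mohawk Rd., Burlington, MA 01803', 'C561'),
  ('Unknown, UN 00000',                   'C561'),
  ('Memphis, TN 38133',                   '8ED8'),
  ('44 Maverick St., Fitchburg, MA',      '8ED8');
""")

# One row per person, with that person's addresses joined by ';'.
cursor = conn.execute("""
  SELECT people_id, GROUP_CONCAT(address_desc, ';')
  FROM address_view
  GROUP BY people_id
""")

# Split again and sort, because the concatenation order is unspecified.
people = {pid: sorted(joined.split(";")) for pid, joined in cursor}
for pid, addresses in people.items():
    print(pid, addresses)
```

On SQL Server 2008 itself, the FOR XML PATH('') query shown above remains the applicable approach.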

How to use ROW_NUMBER in the following procedure?

I have the following stored procedure, which returns A, B, and the count in descending order. I am trying to use ROW_NUMBER so I can page the records, but I want row number 1 to be the record with the highest count. For example, if I return a table with 3 records whose counts are 30, 20, and 10, then row number 1 should correspond to count 30, row number 2 to count 20, and row number 3 to count 10. dbo.f_GetCount is a function that returns a count.
create procedure dbo.Test
@A nvarchar(300) = NULL,
@B nvarchar(10) = NULL
as
select @A = nullif(@A,'')
,@B = nullif(@B,'');
select h.A
,hrl.B
,dbo.f_GetCount(hrl.A,h.B) as cnt
from dbo.hrl
inner join dbo.h
on h.C = hrl.C
where(@A is null
or h.A like '%'+@A+'%'
)
and (@B is null
or hrl.B = @B
)
group by hrl.B
,h.A
order by cnt desc;
WITH q AS
(
SELECT h.A, hrl.B,
dbo.f_GetCount(hrl.A,h.B) as cnt
FROM dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE (@A IS NULL OR h.A like '%' + @A + '%')
AND (@B IS NULL OR hrl.B = @B)
GROUP BY hrl.B, h.A
)
SELECT q.*, ROW_NUMBER() OVER (ORDER BY cnt DESC) AS rn
FROM q
ORDER BY rn
To retrieve first 10 rows, use:
WITH q AS
(
SELECT h.A, hrl.B,
dbo.f_GetCount(hrl.A,h.B) as cnt
FROM dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE (@A IS NULL OR h.A like '%' + @A + '%')
AND (@B IS NULL OR hrl.B = @B)
GROUP BY hrl.B, h.A
)
SELECT TOP 10 q.*,
ROW_NUMBER() OVER (ORDER BY cnt DESC, A, B) AS rn
FROM q
ORDER BY cnt DESC, A, B
To retrieve rows between 11 and 20, use:
-- a CTE has to be declared at the start of the statement,
-- so the numbered rows go into a second CTE instead of a derived table
WITH q AS
(
SELECT h.A, hrl.B,
dbo.f_GetCount(hrl.A,h.B) as cnt
FROM dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE (@A IS NULL OR h.A like '%' + @A + '%')
AND (@B IS NULL OR hrl.B = @B)
GROUP BY hrl.B, h.A
),
qq AS
(
SELECT q.*,
ROW_NUMBER() OVER (ORDER BY cnt DESC, A, B) AS rn
FROM q
)
SELECT *
FROM qq
WHERE rn BETWEEN 11 AND 20
ORDER BY cnt DESC, A, B
I would use a sub-query to get the values of the function into the result, and then the ROW_NUMBER ranking function, like so:
select
ROW_NUMBER() over (order by t.cnt desc) as RowId, t.*
from
(
SELECT
h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
FROM
dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE
(@A IS NULL OR h.A like '%' + @A + '%') AND
(@B IS NULL OR hrl.B = @B)
GROUP BY
hrl.B, h.A
) as t
order by
1
If you wanted only a certain section of results (say, for paging), then you would need another subquery, and then filter on the row number:
select
t.*
from
(
select
ROW_NUMBER() over (order by t.cnt desc) as RowId, t.*
from
(
SELECT
h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
FROM
dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE
(@A IS NULL OR h.A like '%' + @A + '%') AND
(@B IS NULL OR hrl.B = @B)
GROUP BY
hrl.B, h.A
) as t
) as t
where
t.RowId between 1 and 10
order by
t.RowId
Note that in this query, you could put ROW_NUMBER anywhere in the select list, since you are no longer reliant on using the "order by 1" syntax for the order by statement.
There is a subtle issue when this query is called multiple times. If the counts are not unique across groups, the order in which tied records are returned is not guaranteed to be consistent between calls. To address this, change the ROW_NUMBER function to also order by the fields that make up the group being counted.
In this case, it would be A and B, resulting in:
select
t.*
from
(
select
ROW_NUMBER() over (order by t.cnt desc, t.A, t.B) as RowId, t.*
from
(
SELECT
h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
FROM
dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE
(@A IS NULL OR h.A like '%' + @A + '%') AND
(@B IS NULL OR hrl.B = @B)
GROUP BY
hrl.B, h.A
) as t
) as t
where
t.RowId between 1 and 10
order by
t.RowId
This orders the results consistently between calls even when several groups share the same count (assuming the same set of data).
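The effect of the tie-breaker columns can be checked with a small sketch using Python's built-in sqlite3 module instead of SQL Server. The counts table is hypothetical and stands in for the grouped result of the procedure (A, B and the value returned by dbo.f_GetCount); two of its rows deliberately share the same count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-computed counts; ('x','1') and ('x','2') are tied at 20.
conn.executescript("""
CREATE TABLE counts (a TEXT, b TEXT, cnt INT);
INSERT INTO counts VALUES
  ('x', '2', 20),
  ('y', '1', 30),
  ('x', '1', 20),
  ('z', '1', 10);
""")

def fetch_page(first_row, last_row):
    """Return rows whose row number lies in [first_row, last_row]."""
    return conn.execute("""
      SELECT rn, a, b, cnt
      FROM (
        -- a and b break ties between equal counts, so rn is repeatable
        SELECT ROW_NUMBER() OVER (ORDER BY cnt DESC, a, b) AS rn, a, b, cnt
        FROM counts
      ) AS t
      WHERE rn BETWEEN ? AND ?
      ORDER BY rn
    """, (first_row, last_row)).fetchall()

page1 = fetch_page(1, 2)
page2 = fetch_page(3, 4)
print(page1)
print(page2)
```

Without a and b in the ORDER BY of the window, the two tied rows could swap row numbers between calls, so a row might show up on both pages or on neither.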
SELECT h.A, hrl.B,
dbo.f_GetCount(hrl.A,h.B) as cnt,
ROW_NUMBER() over (order by cnt desc) as row_num
FROM dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE (@A IS NULL OR h.A like '%' + @A + '%')
AND (@B IS NULL OR hrl.B = @B)
GROUP BY hrl.B, h.A
ORDER BY cnt desc
This should do the trick. I don't have SSMS in front of me to test, but you MAY have to substitute the usage of 'cnt' in the ROW_NUMBER's order by clause with a second call to the function, but this should give you the general idea.