Use WHERE statement in OVER() - postgresql

I'm trying to create a query that gives me a row number for each returned record. I can do that for all records present in the database. The problem is that I need the row numbers for a query that has a WHERE clause (WHERE posts.status = 'published').
My original query looks like this:
SELECT
posts.*,
row_number() over (ORDER BY posts.score DESC) as position
FROM posts
However, adding a WHERE clause inside OVER() throws a syntax error:
SELECT
posts.*,
row_number() over (
WHERE posts.status = 'published'
ORDER BY posts.score DESC
) as position
FROM posts

SELECT posts.*, row_number() over (ORDER BY posts.score DESC) as position
FROM posts
WHERE posts.status = 'published'
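The reason this works is that WHERE is applied before window functions are evaluated, so row_number() only counts the rows that survive the filter. A minimal sketch to check that behaviour, using Python's built-in sqlite3 module instead of PostgreSQL (an assumption made only so the example is self-contained; it needs SQLite 3.25 or newer for window functions) and a made-up posts table:

```python
import sqlite3

def published_positions(rows):
    """Number only the published posts, highest score first."""
    con = sqlite3.connect(":memory:")
    # Hypothetical minimal schema; the real posts table has more columns.
    con.execute("CREATE TABLE posts (id INTEGER, score INTEGER, status TEXT)")
    con.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT id, row_number() OVER (ORDER BY score DESC) AS position
        FROM posts
        WHERE status = 'published'
        ORDER BY position
        """
    ).fetchall()
    con.close()
    return result

# Made-up sample data: two drafts and two published posts.
sample = [(1, 10, "draft"), (2, 20, "published"),
          (3, 30, "draft"), (4, 40, "published")]
print(published_positions(sample))  # [(4, 1), (2, 2)]
```

The drafts never receive a number, because they are removed before the window function runs.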

I'm not quite sure what you are after; maybe show an example of the expected output. Here is an example of one approach:
create table posts(id int, score int, status text);
insert into posts values(1, 1, 'x');
insert into posts values(2, 2, 'published');
insert into posts values(3, 3, 'x');
insert into posts values(4, 4, 'x');
SELECT x.id, x.score, x.status
,CASE WHEN x.status = 'published' THEN null ELSE x.position END AS position
FROM (SELECT posts.*,
row_number() OVER (ORDER BY posts.score DESC)
-SUM(CASE WHEN status = 'published' THEN 1 ELSE 0 END)
OVER (ORDER BY posts.score DESC) as position
FROM posts
) x
ORDER BY x.score DESC
Result:
4 4 x 1
3 3 x 2
2 2 published
1 1 x 3
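The arithmetic behind that query can be sketched outside SQL: each row's position is its plain row number minus the count of 'published' rows seen so far, and the 'published' rows themselves get no position. This is an illustrative Python version, not the answer's code; the function name and the dict layout are assumptions.

```python
def positions_skipping(posts, skipped_status="published"):
    """Return (id, position) pairs ordered by score, highest first.

    Rows whose status equals skipped_status get None and do not
    consume a position number.
    """
    ordered = sorted(posts, key=lambda p: p["score"], reverse=True)
    skipped_so_far = 0  # plays the role of the running SUM(CASE ...)
    result = []
    for row_number, post in enumerate(ordered, start=1):
        if post["status"] == skipped_status:
            skipped_so_far += 1
            result.append((post["id"], None))
        else:
            result.append((post["id"], row_number - skipped_so_far))
    return result

# Same four rows as the INSERT statements above.
posts = [
    {"id": 1, "score": 1, "status": "x"},
    {"id": 2, "score": 2, "status": "published"},
    {"id": 3, "score": 3, "status": "x"},
    {"id": 4, "score": 4, "status": "x"},
]
print(positions_skipping(posts))  # [(4, 1), (3, 2), (2, None), (1, 3)]
```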

Related

How to collapse overlapping date periods with acceptable gaps using T-SQL?

We want to group our members' enrollments into "continuous enrollments," allowing for a gap of up to 45 days. I know how to use LEAD to determine if an enrollment should be grouped with the next, but I don't know how to group them. Would it be more appropriate to add 45 to the term date and subtract 45 from the effective date, then check for overlapping date periods? My goal is to have a SQL view that returns the results similar to the final query below. Thank you for your help.
SELECT '101' AS MemID, '2021-01-01' AS EffDate, '2021-01-31' AS TermDate INTO #T1 UNION
SELECT '101', '2021-02-01', '2021-02-28' UNION
SELECT '101', '2021-03-01', '2021-03-31' UNION
SELECT '101', '2021-06-01', '2021-06-30' UNION
SELECT '999', '2021-01-01', '2021-01-15' UNION
SELECT '999', '2021-09-01', '2021-09-28' UNION
SELECT '999', '2021-10-01', '2021-10-31'
SELECT *
, LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS LeadEffDate
, DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate))) AS DaysToNextEnrollment
, CASE WHEN (DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate)))) <= 45 THEN 1 ELSE 0 END AS CombineWithNextRecord
FROM #T1
-- result objective
SELECT 101 AS MemID, '2021-01-01' AS EffDate, '2021-03-31' AS TermDate UNION
SELECT 101, '2021-06-01', '2021-06-30' UNION
SELECT 999, '2021-01-01', '2021-01-15' UNION
SELECT 999, '2021-09-01', '2021-10-31'

I think you are really close. Your question is very similar to "TSQL - creating from-to date table while ignoring in-between steps with conditions", with a logic difference in what you want to consider to be the same group.
My basic approach is to use the LAG() function to find the previous row's MemID and TermDate, and to combine that with your 45-day rule to decide where a new group starts. A running SUM over those "new group" flags then gives each group a number, and finally I take the first EffDate and the last TermDate of each group.
Here is my response to that question, modified for your situation.
SELECT
a4.MemID
, CONVERT (DATE, a4.First_EffDate) AS [EffDate]
, CONVERT (DATE, a4.TermDate) AS [TermDate]
FROM (
SELECT
a3.MemID
, a3.EffDate
, a3.TermDate
, a3.MemID_group
, FIRST_VALUE (a3.EffDate) OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate) AS [First_EffDate]
, ROW_NUMBER () OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate DESC) AS [Row_number]
FROM (
SELECT
a2.MemID
, a2.EffDate
, a2.TermDate
, a2.Previous_MemID
, a2.Previous_TermDate
, a2.New_group
, SUM (a2.New_group) OVER (ORDER BY a2.MemID, a2.EffDate) AS [MemID_group]
FROM (
SELECT
a1.MemID
, a1.EffDate
, a1.TermDate
, a1.Previous_MemID
, a1.Previous_TermDate
---------------------------------------------------------------------------------
-- new group if the MemID is different from the previous row OR
-- if the MemID is the same as the previous row AND it has been more than 45 days
-- between the TermDate of the previous row and the EffDate of the current row
,
IIF((a1.MemID <> a1.Previous_MemID)
OR (
a1.MemID = a1.Previous_MemID
AND DATEDIFF (DAY, a1.Previous_TermDate, a1.EffDate) > 45
)
, 1
, 0) AS [New_group]
---------------------------------------------------------------------------------
FROM (
SELECT
MemID
, EffDate
, TermDate
, LAG (MemID) OVER (ORDER BY MemID, EffDate) AS [Previous_MemID]
, LAG (TermDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS [Previous_TermDate]
FROM #T1
) a1
) a2
) a3
) a4
WHERE a4.[Row_number] = 1;
Here is the dbfiddle.
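To make the grouping rule easier to follow, here is the same idea as a small Python sketch: sort by member and effective date, then start a new group whenever the member changes or the gap exceeds 45 days. It is an illustration rather than a translation of the query above. The function name is made up, and it measures the gap from the latest TermDate seen in the group, where the SQL uses the previous row's TermDate (the two agree on the sample data).

```python
from datetime import date

def collapse_enrollments(rows, max_gap_days=45):
    """rows: iterable of (mem_id, eff_date, term_date) tuples."""
    merged = []
    for mem_id, eff, term in sorted(rows):
        last = merged[-1] if merged else None
        # Same member and a gap of at most max_gap_days: extend the group.
        if last and last[0] == mem_id and (eff - last[2]).days <= max_gap_days:
            last[2] = max(last[2], term)
        else:
            merged.append([mem_id, eff, term])
    return [tuple(group) for group in merged]

# The sample rows loaded into #T1 in the question.
enrollments = [
    ("101", date(2021, 1, 1), date(2021, 1, 31)),
    ("101", date(2021, 2, 1), date(2021, 2, 28)),
    ("101", date(2021, 3, 1), date(2021, 3, 31)),
    ("101", date(2021, 6, 1), date(2021, 6, 30)),
    ("999", date(2021, 1, 1), date(2021, 1, 15)),
    ("999", date(2021, 9, 1), date(2021, 9, 28)),
    ("999", date(2021, 10, 1), date(2021, 10, 31)),
]
for group in collapse_enrollments(enrollments):
    print(group)
```

On this data the sketch yields four continuous enrollments, matching the "result objective" in the question.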

Checking Slowly Changing Dimension 2

I have a table that looks like this:
A slowly changing dimension type 2, according to Kimball.
Key is just a surrogate key, a key to make rows unique.
As you can see there are three rows for product A.
The timelines for this product are OK. Over time, the description of the product changes.
From 1-1-2020 up until 4-1-2020 the description of this product was ProdA1.
From 5-1-2020 up until 12-2-2020 the description of this product was ProdA2 etc.
If you look at product B, you see there are gaps in the timeline.
We use DB2 V12 for z/OS. How can I check whether there are gaps in the timelines for each and every product?
I tried this, but it doesn't work:
with selectie (key, tel) as
(select product, count(*)
from PROD_TAB
group by product
having count(*) > 1)
Select * from
PROD_TAB A
inner join selectie B
on A.product = B.product
Where not exists
(SELECT 1 from PROD_TAB C
WHERE A.product = C.product
AND A.END_DATE + 1 DAY = C.START_DATE
)
Does anyone know the answer?

The following query returns all gaps for all products.
The idea is to enumerate (RN column) all periods inside each product by START_DATE and join each record with its next period record.
WITH
/*
MYTAB (PRODUCT, DESCRIPTION, START_DATE, END_DATE) AS
(
SELECT 'A', 'ProdA1', DATE('2020-01-01'), DATE('2020-01-04') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA2', DATE('2020-01-05'), DATE('2020-02-12') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA3', DATE('2020-02-13'), DATE('2020-12-31') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB1', DATE('2020-01-05'), DATE('2020-01-09') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB2', DATE('2020-01-12'), DATE('2020-03-14') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB3', DATE('2020-03-15'), DATE('2020-04-18') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB4', DATE('2020-04-16'), DATE('2020-05-03') FROM SYSIBM.SYSDUMMY1
)
,
*/
MYTAB_ENUM AS
(
SELECT
T.*
, ROW_NUMBER() OVER (PARTITION BY PRODUCT ORDER BY START_DATE) RN
FROM MYTAB T
)
SELECT A.PRODUCT, A.END_DATE + 1 START_DT, B.START_DATE - 1 END_DT
FROM MYTAB_ENUM A
JOIN MYTAB_ENUM B ON B.PRODUCT = A.PRODUCT AND B.RN = A.RN + 1
WHERE A.END_DATE + 1 <> B.START_DATE
AND A.END_DATE < B.START_DATE;
The result is:
|PRODUCT|START_DT |END_DT |
|-------|----------|----------|
|B |2020-01-10|2020-01-11|
A possibly more efficient way, which avoids the self-join:
WITH MYTAB2 AS
(
SELECT
T.*
, LAG(END_DATE) OVER (PARTITION BY PRODUCT ORDER BY START_DATE) END_DATE_PREV
FROM MYTAB T
)
SELECT PRODUCT, END_DATE_PREV + 1 START_DATE, START_DATE - 1 END_DATE
FROM MYTAB2
WHERE END_DATE_PREV + 1 <> START_DATE
AND END_DATE_PREV < START_DATE;
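For readers without a Db2 system at hand, the LAG variant can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite (3.25 or newer) stands in for Db2, dates are stored as ISO text, and `date(x, '+1 day')` replaces Db2's date arithmetic. The two WHERE conditions are folded into the equivalent "previous end date plus one day is before the start date".

```python
import sqlite3

def find_gaps(rows):
    """rows: (product, start_date, end_date) with ISO 'YYYY-MM-DD' strings."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytab (product TEXT, start_date TEXT, end_date TEXT)"
    )
    con.executemany("INSERT INTO mytab VALUES (?, ?, ?)", rows)
    gaps = con.execute(
        """
        WITH mytab2 AS (
            SELECT product, start_date,
                   LAG(end_date) OVER (PARTITION BY product
                                       ORDER BY start_date) AS end_date_prev
            FROM mytab
        )
        SELECT product,
               date(end_date_prev, '+1 day') AS gap_start,
               date(start_date, '-1 day')    AS gap_end
        FROM mytab2
        WHERE date(end_date_prev, '+1 day') < start_date
        ORDER BY product, start_date
        """
    ).fetchall()
    con.close()
    return gaps

# Same sample periods as the commented-out MYTAB block above.
periods = [
    ("A", "2020-01-01", "2020-01-04"),
    ("A", "2020-01-05", "2020-02-12"),
    ("A", "2020-02-13", "2020-12-31"),
    ("B", "2020-01-05", "2020-01-09"),
    ("B", "2020-01-12", "2020-03-14"),
    ("B", "2020-03-15", "2020-04-18"),
    ("B", "2020-04-16", "2020-05-03"),
]
print(find_gaps(periods))  # [('B', '2020-01-10', '2020-01-11')]
```

The overlapping periods of product B (ProdB3 and ProdB4) are not reported, because only true gaps satisfy the condition.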

Thanks Mark, I will try this one of these days.
I had never heard of LAG in DB2 V12 for z/OS.
I will read about it.
Thanks

TSQL - LEAD for Next Different Row

Is there a way to use the lead function so that I get the next row where something has changed, as opposed to the next row where it is the same?
In this example, the RowType can be 'in' or 'out'. For each 'in' I need to know the next RowNumber where it has become 'out'. I have been playing with the lead function because it is really fast, but I haven't been able to get it working. What I really need is to partition by a RowType that isn't the one in the current row.
select
RowNumber
,RowType --In this case I am only interested in RowType = 'In'
, Lead(RowNumber)
OVER (partition by "RowType = out" --This is the bit I am stuck on--
order by RowNumber ASC) as NextOutFlow
from table
order by RowNumber asc
Thanks in advance for any help

Rather than using lead() I would use an outer apply that returns the next row with type out for all rows with type in:
select RowNumber, RowType, nextOut
from your_table t
outer apply (
select min(RowNumber) as nextOut
from your_table
where RowNumber > t.RowNumber and RowType='Out'
) oa
where RowType = 'In'
order by RowNumber asc
Given sample data like:
RowNumber RowType
1 in
2 out
3 in
4 in
5 out
6 in
This would return:
RowNumber RowType nextOut
1 in 2
3 in 5
4 in 5
6 in NULL
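If a window function is still preferred, one alternative (not the outer apply above, but a different technique) is a conditional MIN over a frame that starts one row after the current row. The sketch below runs it through Python's built-in sqlite3 module as a stand-in for SQL Server; the table name `t` is an assumption, and the comparison uses lowercase 'in'/'out' because SQLite string comparison is case-sensitive by default.

```python
import sqlite3

def next_out_rows(rows):
    """rows: (RowNumber, RowType) pairs; returns (RowNumber, nextOut) for 'in' rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (RowNumber INTEGER, RowType TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT RowNumber, nextOut
        FROM (
            SELECT RowNumber, RowType,
                   -- smallest 'out' RowNumber among the rows after this one
                   MIN(CASE WHEN RowType = 'out' THEN RowNumber END)
                       OVER (ORDER BY RowNumber
                             ROWS BETWEEN 1 FOLLOWING
                                      AND UNBOUNDED FOLLOWING) AS nextOut
            FROM t
        )
        WHERE RowType = 'in'
        ORDER BY RowNumber
        """
    ).fetchall()
    con.close()
    return result

# Same sample data as above.
sample_rows = [(1, "in"), (2, "out"), (3, "in"), (4, "in"), (5, "out"), (6, "in")]
print(next_out_rows(sample_rows))  # [(1, 2), (3, 5), (4, 5), (6, None)]
```

The frame clause does the work that the question hoped to get from PARTITION BY: it looks only at later rows and ignores the ones that are not 'out'.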

I think this will work.
If you used a bit field for in/out, you would get better performance.
;with cte1 as
(
SELECT [inden], [OnOff]
, lag([OnOff]) over (order by [inden]) as [lagOnOff]
FROM [OnOff]
), cte2 as
(
select [inden], [OnOff], [lagOnOff]
, lead([inden]) over (order by [inden]) as [Leadinden]
from cte1
where [OnOff] <> [lagOnOff]
or [lagOnOff] is null
)
select [inden], [OnOff], [lagOnOff], [Leadinden]
from cte2
where [OnOff] = 'true'
The following is probably slower, but it may work well if you have the right indexes:
select t1.rowNum as 'rowNumIn', min(t2.rownum) as 'nextRowNumOut'
from [table] t1
join [table] t2
on t1.rowType = 'In'
and t2.rowType = 'Out'
and t2.rowNum > t1.rowNum
and t2.rowNum < t1.rowNum + 1000 -- if you can constrain it
group by t1.rowNum

ROW_NUMBER always returns 1 for each row in SQL Server

(SELECT
(SELECT ROW_NUMBER() OVER (order by t.NotificationID)) as RowNumber,
[NotificationID],[ProjectID],[TeamMemberID],[OperationType],
[Hours],[Occurance],[Period],[NotificationText],
[NotificationRecipientIDs],[NotificationRecipientClienitsIDs]
FROM tblIA_Notifications t
WHERE IsDeleted = 0 AND IsActive = 1
)
The above query always returns row number 1 for each row. The problem appears when I wrap the statement in the outer SELECT; if I remove the outer SELECT statement, it works fine.
I don't understand the behavior.

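You are getting row number 1 for each row because the nested SELECT around ROW_NUMBER() has no FROM clause, so it is evaluated once per row over a single-row set, and the first row of a one-row set is always number 1.
Try this: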
SELECT ROW_NUMBER() OVER (order by t.NotificationID) as RowNumber,
[NotificationID],
[ProjectID],
[TeamMemberID],
[OperationType],
[Hours],
[Occurance],
[Period],
[NotificationText],
[NotificationRecipientIDs],
[NotificationRecipientClienitsIDs]
FROM tblIA_Notifications t
WHERE IsDeleted = 0
AND IsActive = 1

Try this...
SELECT ROW_NUMBER() OVER (order by T.COLUMN_NAME) as RowNumber FROM [dbo].[TABLE_NAME] T

T-SQL group by partition

I have the table below in SQL Server 2008. Please help me get the expected output.
Thanks.
CREATE TABLE [dbo].[Test]([Category] [varchar](10) NULL,[Value] [int] NULL,
[Weightage] [int] NULL,[Rn] [smallint] NULL ) ON [PRIMARY]
insert into Test values ('Cat1',310,674,1),('Cat1',783,318,2),('Cat1',310,96,3),('Cat1',109,917,4),('Cat2',441,397,1),('Cat2',637,725,2),('Cat2',460,742,3),('Cat2',542,583,4),('Cat2',601,162,5),('Cat2',45,719,6),('Cat2',46,305,7),('Cat3',477,286,1),('Cat3',702,484,2),('Cat3',797,836,3),('Cat3',541,890,4),('Cat3',750,962,5),('Cat3',254,407,6),('Cat3',136,585,7),('Cat3',198,477,8),('Cat4',375,198,1),('Cat4',528,351,2),('Cat4',845,380,3),('Cat4',716,131,4),('Cat4',781,919,5)

For the per-category average Weightage:
SELECT
Category,
AVG(Value),
SUM(CASE WHEN RN<4 THEN Weightage ELSE 0 END) / (NULLIF(SUM(CASE WHEN RN<4 THEN 1 ELSE 0 END), 0))
FROM
MyTable
GROUP BY
Category
For the average Weightage over the whole set:
SELECT
M.Category,
AVG(Value),
foo.AvgWeightage
FROM
MyTable M
CROSS JOIN
(SELECT AVG(Weightage) As AvgWeightage FROM MyTable WHERE Rn < 4) foo
GROUP BY
M.Category, foo.AvgWeightage

Simple:)
SELECT Category,
AVG(Value) AS AvgValue,
AVG(CASE WHEN RN< 4 THEN (Weightage) END ) AS AvgWeightage
FROM Test
GROUP BY Category
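This works because AVG ignores NULLs: the CASE expression has no ELSE, so rows with Rn >= 4 yield NULL and drop out of the average. A small sketch to check that, using Python's built-in sqlite3 module as a stand-in for SQL Server and only the Cat1 rows from the question. One difference to keep in mind: SQL Server's AVG over int columns returns an int, while SQLite returns a floating-point value.

```python
import sqlite3

def category_averages(rows):
    """rows: (Category, Value, Weightage, Rn) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Test (Category TEXT, Value INTEGER, "
        "Weightage INTEGER, Rn INTEGER)"
    )
    con.executemany("INSERT INTO Test VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT Category,
               AVG(Value) AS AvgValue,
               -- no ELSE branch: rows with Rn >= 4 become NULL and are skipped
               AVG(CASE WHEN Rn < 4 THEN Weightage END) AS AvgWeightage
        FROM Test
        GROUP BY Category
        ORDER BY Category
        """
    ).fetchall()
    con.close()
    return result

# The four Cat1 rows from the question's INSERT statement.
cat1 = [("Cat1", 310, 674, 1), ("Cat1", 783, 318, 2),
        ("Cat1", 310, 96, 3), ("Cat1", 109, 917, 4)]
for category, avg_value, avg_weightage in category_averages(cat1):
    print(category, avg_value, round(avg_weightage, 2))
```

Here AvgValue is taken over all four rows, while AvgWeightage uses only the three rows with Rn below 4, that is (674 + 318 + 96) / 3.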

Try this
SELECT AvgValue.Category, AvgValue.AvgValue, AvgWeight.Weight
FROM(
(SELECT c.Category,
AVG(c.Value) AS AvgValue
FROM Test c
GROUP BY Category) AvgValue
INNER JOIN
(SELECT Category, AVG(Weightage) AS Weight
FROM Test
WHERE Rn < 4
GROUP BY Category) AvgWeight
ON AvgValue.Category = AvgWeight.Category)