How to calculate longest streak in SQL?

How to calculate longest streak in SQL? - tsql

I have
TABLE EMPLOYEE - ID,DATE,IsPresent
I want to calculate longest streak for a employee presence.The Present bit will be false for days he didnt come..So I want to calculate the longest number of days he came to office for consecutive dates..I have the Date column field is unique...So I tried this way -
Select Id,Count(*) from Employee where IsPresent=1
But the above doesnt work...Can anyone guide me towards how I can calculate streak for this?....I am sure people have come across this...I tried searching online but...didnt understand it well...please help me out..

EDIT Here's a SQL Server version of the query:
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(datediff(day, LowerDate, UpperDate) + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
SQL Server Version of the test data:
create table T (EmployeeId int
, "DATE" date not null
, IsPresent bit not null
, constraint T_PK primary key (EmployeeId, "DATE")
)
insert into T values (1, '2000-01-01', 1);
insert into T values (2, '2000-01-01', 0);
insert into T values (3, '2000-01-01', 0);
insert into T values (3, '2000-01-02', 1);
insert into T values (3, '2000-01-03', 1);
insert into T values (3, '2000-01-04', 0);
insert into T values (3, '2000-01-05', 1);
insert into T values (3, '2000-01-06', 1);
insert into T values (3, '2000-01-07', 0);
insert into T values (4, '2000-01-01', 0);
insert into T values (4, '2000-01-02', 1);
insert into T values (4, '2000-01-03', 1);
insert into T values (4, '2000-01-04', 1);
insert into T values (4, '2000-01-05', 1);
insert into T values (4, '2000-01-06', 1);
insert into T values (4, '2000-01-07', 0);
insert into T values (5, '2000-01-01', 0);
insert into T values (5, '2000-01-02', 1);
insert into T values (5, '2000-01-03', 0);
insert into T values (5, '2000-01-04', 1);
insert into T values (5, '2000-01-05', 1);
insert into T values (5, '2000-01-06', 1);
insert into T values (5, '2000-01-07', 0);
Sorry, this is written in Oracle, so substitute the appropriate SQL Server date arithmetic.
Assumptions:
Date is either a Date value or
DateTime with time component of
00:00:00.
The primary key is
(EmployeeId, Date)
All fields are not null
If a date is missing for the employee, they were not present. (Used to handle the beginning and ending of the data series, but also means that missing dates in the middle will break streaks. Could be a problem depending on requirements.
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = second_day."DATE" - 1
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = second_day."DATE" - 1
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(UpperDate - LowerDate + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
Test Data:
create table T (EmployeeId number(38)
, "DATE" date not null check ("DATE" = trunc("DATE"))
, IsPresent number not null check (IsPresent in (0, 1))
, constraint T_PK primary key (EmployeeId, "DATE")
)
/
insert into T values (1, to_date('2000-01-01', 'YYYY-MM-DD'), 1);
insert into T values (2, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-03', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-04', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
insert into T values (4, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (4, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-03', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-04', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-03', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-04', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-07', 'YYYY-MM-DD'), 0);

groupby is missing.
To select total man-days (for everyone) attendance of the whole office.
Select Id,Count(*) from Employee where IsPresent=1
To select man-days attendance per employee.
Select Id,Count(*)
from Employee
where IsPresent=1
group by id;
But that is still not good because it counts the total days of attendance and NOT the length of continuous attendance.
What you need to do is construct a temp table with another date column date2. date2 is set to today. The table is the list of all days an employee is absent.
create tmpdb.absentdates as
Select id, date, today as date2
from EMPLOYEE
where IsPresent=0
order by id, date;
So the trick is to calculate the date difference between two absent days to find the length of continuously present days.
Now, fill in date2 with the next absent date per employee. The most recent record per employee will not be updated but left with value of today because there is no record with greater date than today in the database.
update tmpdb.absentdates
set date2 =
select min(a2.date)
from
tmpdb.absentdates a1,
tmpdb.absentdates a2
where a1.id = a2.id
and a1.date < a2.date
The above query updates itself by performing a join on itself and may cause deadlock query so it is better to create two copies of the temp table.
create tmpdb.absentdatesX as
Select id, date
from EMPLOYEE
where IsPresent=0
order by id, date;
create tmpdb.absentdates as
select *, today as date2
from tmpdb.absentdatesX;
You need to insert the hiring date, presuming the earliest date per employee in the database is the hiring date.
insert into tmpdb.absentdates a
select a.id, min(e.date), today
from EMPLOYEE e
where a.id = e.id
Now update date2 with the next later absent date to be able to perform date2 - date.
update tmpdb.absentdates
set date2 =
select min(x.date)
from
tmpdb.absentdates a,
tmpdb.absentdatesX x
where a.id = x.id
and a.date < x.date
This will list the length of days an emp is continuously present:
select id, datediff(date2, date) as continuousPresence
from tmpdb.absentdates
group by id, continuousPresence
order by id, continuousPresence
But you only want to longest streak:
select id, max(datediff(date2, date) as continuousPresence)
from tmpdb.absentdates
group by id
order by id
However, the above is still problematic because datediff does not take into account holidays and weekends.
So we depend on the count of records as the legitimate working days.
create tmpdb.absentCount as
Select a.id, a.date, a.date2, count(*) as continuousPresence
from EMPLOYEE e, tmpdb.absentdates a
where e.id = a.id
and e.date >= a.date
and e.date < a.date2
group by a.id, a.date
order by a.id, a.date;
Remember, every time you use an aggregator like count, ave
yo need to groupby the selected item list because it is common sense that you have to aggregate by them.
Now select the max streak
select id, max(continuousPresence)
from tmpdb.absentCount
group by id
To list the dates of streak:
select id, date, date2, continuousPresence
from tmpdb.absentCount
group by id
having continuousPresence = max(continuousPresence);
There may be some mistakes (sql server tsql) above but this is the general idea.

Try this:
select
e.Id,
e.date,
(select
max(e1.date)
from
employee e1
where
e1.Id = e.Id and
e1.date < e.date and
e1.IsPresent = 0) StreakStartDate,
(select
min(e2.date)
from
employee e2
where
e2.Id = e.Id and
e2.date > e.date and
e2.IsPresent = 0) StreakEndDate
from
employee e
where
e.IsPresent = 1
Then finds out the longest streak for each employee:
select id, max(datediff(streakStartDate, streakEndDate))
from (<use subquery above>)
group by id
I'm not fully sure this query has correct syntax because I havn't database just now.
Also notice streak start and streak end columns contains not the first and last day when employee was present, but nearest dates when he was absent. If dates in table have approximately equal distance, this does not means, otherwise query become little more complex, because we need to finds out nearest presence dates. Also this improvements allow to handle situation when the longest streak is first or last streak.
The main idea is for each date when employee was present find out streak start and streak end.
For each row in table when employee was present, streak start is maximum date that is less then date of current row when employee was absent.

Here is an alternate version, to handle missing days differently. Say that you only record a record for work days, and being at work Monday-Friday one week and Monday-Friday of the next week counts as ten consecutive days. This query assumes that missing dates in the middle of a series of rows are non-work days.
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(datediff(day, LowerDate, UpperDate) + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
go
with NumberedRows as (select EmployeeId
, "DATE"
, IsPresent
, row_number() over (partition by EmployeeId
order by "DATE") as RN
-- , min("DATE") over (partition by EmployeeId, IsPresent) as MinDate
-- , max("DATE") over (partition by EmployeeId, IsPresent) as MaxDate
from T)
, LowerBound as (select SecondRow.EmployeeId
, SecondRow.RN
, row_number() over (partition by SecondRow.EmployeeId
order by SecondRow.RN) as LowerBoundRN
from NumberedRows SecondRow
left outer join NumberedRows FirstRow
on FirstRow.IsPresent = 1
and FirstRow.EmployeeId = SecondRow.EmployeeId
and FirstRow.RN + 1 = SecondRow.RN
where FirstRow.EmployeeId is null
and SecondRow.IsPresent = 1)
, UpperBound as (select FirstRow.EmployeeId
, FirstRow.RN
, row_number() over (partition by FirstRow.EmployeeId
order by FirstRow.RN) as UpperBoundRN
from NumberedRows FirstRow
left outer join NumberedRows SecondRow
on SecondRow.IsPresent = 1
and FirstRow.EmployeeId = SecondRow.EmployeeId
and FirstRow.RN + 1 = SecondRow.RN
where SecondRow.EmployeeId is null
and FirstRow.IsPresent = 1)
select LB.EmployeeId, max(UB.RN - LB.RN + 1)
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.LowerBoundRN = UB.UpperBoundRN
group by LB.EmployeeId

I did this once to determine consecutive days that a fire fighter had been on shift at least 15 minutes.
Your case is a bit more simple.
If you wanted to assume that no employee came more than 32 consecutive times, you could just use a Common Table Expression. But a better approach would be to use a temp table and a while loop.
You will need a column called StartingRowID. Keep joining from your temp table to the employeeWorkDay table for the next consecutive employee work day and insert them back into the temp table. When ##Row_Count = 0, you have captured the longest streak.
Now aggregate by StartingRowID to get the first day of the longest streak. I'm running short on time, or I would include some sample code.

Related

Optimise query: count latest win/lose streak for all teams

I'm not an expert in data-warehousing nor analytics, so I give birth to a monster-query that I'd like to optimise (if possible).
The problem is: I need to display the stagings table for a given tournament. This table should display team id, total score, team position (i.e. bucket for same-score teams), order and latest (!) win-lose strike.
I.e. for the following sequence (most recent first) WWLLLWLD I should get 2W
W = win, L = lose, D = draft
Schema
create table matches (
id integer primary key,
stage_id integer not null,
scheduled_at timestamp not null,
winner_id integer null,
status text not null default 'finished' -- just to give extra context
);
create table teams (
id integer primary key
);
create table match_teams (
match_id integer,
team_id integer,
constraint fk_mt_m foreign key (match_id) references matches(id),
constraint fk_mt_t foreign key (team_id) references teams(id)
);
insert into teams(id) values(1),(2);
insert into matches(id, stage_id, scheduled_at, winner_id) values
(1, 1, now() - interval '1 day', 1),
(2, 1, now() - interval '2 days', 1),
(3, 1, now() - interval '3 days', 2),
(4, 1, now() - interval '4 days', 1),
(5, 1, now() - interval '5 days', null);
insert into match_teams(match_id, team_id) values
(1, 1),
(1, 2),
(2, 1),
(2, 2),
(3, 1),
(3, 2),
(4, 1),
(4, 2),
(5, 1),
(5, 2);
Query itself:
with v_mto as (
SELECT
m.id,
m."stage_id",
mt."team_id",
m."scheduled_at",
(
case
when m."winner_id" IS NULL then 0
when (m."winner_id" = mt."team_id") then 1
else -1
end
) win
FROM matches m
INNER JOIN match_teams mt ON m.id = mt."match_id"
WHERE m.status = 'finished'
ORDER BY "stage_id", "team_id", "scheduled_at" desc
),
v_lag as (
select
"stage_id",
"team_id",
win,
lag(win, 1, win) over (partition by "stage_id", "team_id" order by "scheduled_at" desc ) lag_win,
first_value(win) over (partition by "stage_id", "team_id" order by "scheduled_at" desc ) first_win
from v_mto
)
select
"stage_id",
"team_id",
v_lag.win,
count(1)
from v_lag
where v_lag.win = v_lag.lag_win and v_lag.win = v_lag.first_win
group by 1, 2, 3
-- This is the query for the final table (on a screenshot)
-- with team_scores as (
-- select
-- m."tournamentStageId",
-- "teamId",
-- sum(
-- -- each win gives 3 score, each draft gives 1 score
-- coalesce((m."winner_id" = mt."team_id")::integer, 0) * 3
-- +
-- (m."winner_id" IS NULL)::int
-- ) as score
-- from matches m
-- inner join match_teams mt on m.id = mt."matchId"
-- where m.status = 1
-- group by m."tournamentStageId", "teamId")
-- select
-- "tournamentStageId",
-- "teamId",
-- t.name,
-- score,
-- dense_rank() over (partition by "tournamentStageId" order by score desc) rank,
-- row_number() over (partition by "tournamentStageId" order by t.name) position
-- -- total number of wins/losses/drafts to be added (the "score" column from the screenshot)
-- from team_scores ts
-- inner join teams t on t.id = ts."teamId"
-- order by "tournamentStageId", rank, position
I've created a sandbox for those who is brave enough to get a deep dive into the task: https://www.db-fiddle.com/f/6jsFFnxQMKwNQWznR3VXHC/2
Also, I've already crafted the part that creates a list of teams together with scores and points, so the attached query will be used as a joined one or sub-select.
Query plan on the real database and query (some indexes, probably, are missing, but that's ok for this moment):
GroupAggregate (cost=24862.28..29423.68 rows=3 width=24)
" Group Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"", v_lag.win"
-> Incremental Sort (cost=24862.28..29423.61 rows=3 width=16)
" Sort Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"", v_lag.win"
" Presorted Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"""
-> Subquery Scan on v_lag (cost=22581.67..29423.47 rows=3 width=16)
Filter: ((v_lag.win = v_lag.lag_win) AND (v_lag.lag_win = v_lag.first_win))
-> WindowAgg (cost=22581.67..27468.67 rows=130320 width=32)
-> Subquery Scan on v_mto (cost=22581.67..24210.67 rows=130320 width=24)
-> Sort (cost=22581.67..22907.47 rows=130320 width=28)
" Sort Key: m.""computerGameId"", m.""tournamentStageId"", mt.""teamId"", m.""scheduledAt"" DESC"
-> Hash Join (cost=3863.39..8391.38 rows=130320 width=28)
" Hash Cond: (mt.""matchId"" = m.id)"
-> Seq Scan on match_teams mt (cost=0.00..2382.81 rows=137281 width=8)
-> Hash (cost=2658.10..2658.10 rows=65623 width=24)
-> Seq Scan on matches m (cost=0.00..2658.10 rows=65623 width=24)
Filter: (status = 1)
Thanks everyone for help and suggestions!
The final result:
P.S. it is possible to convert the first query (v_mto) as materialised view or de-normalise win into the match_teams table, as this piece will be used in different queries to build match/game stats.

So, the original query is wrong - gives incorrect result for the standings results.
I've moved to row_number math to solve this task.
The final query (with scores) looks like this:
create materialized view vm_tournament_stage_standings as
with v_mto as (SELECT m.id,
m."computerGameId",
m."tournamentStageId",
mt."teamId",
m."scheduledAt",
(
case
when m."winnerId" IS NULL then 'D'
when m."winnerId" = mt."teamId" then 'W'
else 'L'
end
) win
FROM matches m
INNER JOIN match_teams mt ON
m.id = mt."matchId"
WHERE m.status = 1),
v_streaks as (select "computerGameId",
"tournamentStageId",
"teamId",
row_number()
over grp_ord_matches
- row_number()
over (partition by "computerGameId", "tournamentStageId", "teamId", win order by "scheduledAt" desc ) streak_index,
win
from v_mto
window grp_ord_matches as (partition by "computerGameId", "tournamentStageId", "teamId" order by "scheduledAt" desc)),
v_streak as (select "computerGameId",
"tournamentStageId",
"teamId",
count(1) || win as streak
from v_streaks
where streak_index = 0
group by "computerGameId", "tournamentStageId", "teamId", "win"),
team_scores as (select m."tournamentStageId",
"teamId",
sum((m."winnerId" = mt."teamId")::int) as wins,
sum((m."winnerId" is null)::int) draws,
sum((m."winnerId" <> mt."teamId")::int) loses,
sum(
coalesce((m."winnerId" = mt."teamId")::integer, 0) * 3
+
(m."winnerId" IS NULL)::int
) as score
from matches m
inner join match_teams mt on m.id = mt."matchId"
where m.status = 1
group by m."tournamentStageId", "teamId")
select ts."tournamentStageId",
ts."teamId",
score,
wins,
draws,
loses,
vs.streak as streak,
dense_rank() over (partition by ts."tournamentStageId" order by score desc) rank,
row_number() over (partition by ts."tournamentStageId" order by t.name) position
from team_scores ts
inner join teams t on t.id = ts."teamId"
inner join v_streak vs on vs."teamId" = t.id and vs."tournamentStageId" = ts."tournamentStageId"
order by "tournamentStageId", rank, position

TSQL Get Item Price History from Item Price Changes

I have a table of item price changes, and I want to use it to create a table of item prices for each date (between the item's launch and end dates).
Here's some code to create the date:-
declare #Item table (item_id int, item_launch_date date, item_end_date date);
insert into #Item Values (1,'2001-01-01','2016-01-01'), (2,'2001-01-01','2016-01-01')
declare #ItemPriceChanges table (item_id int, item_price money, my_date date);
INSERT INTO #ItemPriceChanges VALUES (1, 123.45, '2001-01-01'), (1, 345.34, '2001-01-03'), (2, 34.34, '2001-01-01'), (2,23.56 , '2005-01-01'), (2, 56.45, '2016-05-01'), (2, 45.45, '2017-05-01'); ;
What I'd like to see is something like this:-
item_id date price
------- ---- -----
1 2001-01-01 123.45
1 2001-01-02 123.45
1 2001-01-03 345.34
1 2001-01-04 345.34
etc.
2 2001-01-01 34.34
2 2001-01-02 34.34
etc.
Any suggestions on how to write the query?
I'm using SQL Server 2016.
Added:
I also have a calendar table called "dim_calendar" with one row per day. I had hoped to use a windowing function, but the nearest I can find is lead() and it doesn't do what I thought it would do:-
select
i.item_id,
c.day_date,
ipc.item_price as item_price_change,
lead(item_price,1,NULL) over (partition by i.item_id ORDER BY c.day_date) as item_price
from dim_calendar c
inner join #Item i
on c.day_date between i.item_launch_date and i.item_end_date
left join #ItemPriceChanges ipc
on i.item_id=ipc.item_id
and ipc.my_date=c.day_date
order by
i.item_id,
c.day_date;
Thanks

I wrote this prior to your edit. Note that your sample output suggests that an item can have two prices on the day of the price change. The following assumes that an item can only have one price on a price change day and that is the new price.
declare #Item table (item_id int, item_launch_date date, item_end_date date);
insert into #Item Values (1,'2001-01-01','2016-01-01'), (2,'2001-01-01','2016-01-01')
declare #ItemPriceChange table (item_id int, item_price money, my_date date);
INSERT INTO #ItemPriceChange VALUES (1, 123.45, '2001-01-01'), (1, 345.34, '2001-01-03'), (2, 34.34, '2001-01-01'), (2,23.56 , '2005-01-01'), (2, 56.45, '2016-05-01'), (2, 45.45, '2017-05-01');
SELECT * FROM #ItemPriceChange
-- We need a table variable holding all possible date points for the output
DECLARE #DatePointList table (DatePoint date);
DECLARE #StartDatePoint date = '01-Jan-2001';
DECLARE #MaxDatePoint date = GETDATE();
DECLARE #DatePoint date = #StartDatePoint;
WHILE #DatePoint <= #MaxDatePoint BEGIN
INSERT INTO #DatePointList (DatePoint)
SELECT #DatePoint;
SET #DatePoint = DATEADD(DAY,1,#DatePoint);
END;
-- We can use a CTE to sequence the price changes
WITH ItemPriceChange AS (
SELECT item_id, item_price, my_date, ROW_NUMBER () OVER (PARTITION BY Item_id ORDER BY my_date ASC) AS SeqNo
FROM #ItemPriceChange
)
-- With the price changes sequenced, we can derive from and to dates for each price and use a join to the table of date points to produce the output. Also, use an inner join back to #item to only return rows for dates that are within the start/end date of the item
SELECT ItemPriceDate.item_id, DatePointList.DatePoint, ItemPriceDate.item_price
FROM #DatePointList AS DatePointList
INNER JOIN (
SELECT ItemPriceChange.item_id, ItemPriceChange.item_price, ItemPriceChange.my_date AS from_date, ISNULL(ItemPriceChange_Next.my_date,#MaxDatePoint) AS to_date
FROM ItemPriceChange
LEFT OUTER JOIN ItemPriceChange AS ItemPriceChange_Next ON ItemPriceChange_Next.item_id = ItemPriceChange.item_id AND ItemPriceChange.SeqNo = ItemPriceChange_Next.SeqNo - 1
) AS ItemPriceDate ON DatePointList.DatePoint >= ItemPriceDate.from_date AND DatePointList.DatePoint < ItemPriceDate.to_date
INNER JOIN #item AS item ON item.item_id = ItemPriceDate.item_id AND DatePointList.DatePoint BETWEEN item.item_launch_date AND item.item_end_date
ORDER BY ItemPriceDate.item_id, DatePointList.DatePoint;

#AlphaStarOne Perfect! I've modified it to use a Windowing function rather than a self-join, but what you've suggested works. Here's my implementation of that in case anyone else needs it:
SELECT
ipd.item_id,
dc.day_date,
ipd.item_price
FROM dim_calendar dc
INNER JOIN (
SELECT
item_id,
item_price,
my_date AS from_date,
isnull(lead(my_date,1,NULL) over (partition by item_id ORDER BY my_date),getdate()) as to_date
FROM #ItemPriceChange ipc1
) AS ipd
ON dc.day_date >= ipd.from_date
AND dc.day_date < ipd.to_date
INNER JOIN #item AS i
ON i.item_id = ipd.item_id
AND dc.day_date BETWEEN i.item_launch_date AND i.item_end_date
ORDER BY
ipd.item_id,
dc.day_date;

TSQL - Replace Cursor

I found in our database a cursor statement and I would like to replace it.
Declare #max_date datetime
Select #max_date = max(finished) From Payments
Declare #begin_date datetime = '2015-02-01'
Declare #end_of_last_month datetime
While #begin_date <= #max_date
Begin
SELECT #end_of_last_month = CAST(DATEADD(DAY, -1 , DATEFROMPARTS(YEAR(#begin_date),MONTH(#begin_date),1)) AS DATE) --AS end_of_last_month
Insert Into #table(Customer, ArticleTypeID, ArticleType, end_of_month, month, year)
Select Count(distinct (customerId)), prod.ArticleTypeID, at.ArticleType, #end_of_last_month, datepart(month, #end_of_last_month), datepart(year, #end_of_last_month)
From Customer cust
Inner join Payments pay ON pay.member_id = m.member_id
Inner Join Products prod ON prod.product_id = pay.product_id
Inner Join ArticleType at ON at.ArticleTypeID = prod.ArticleTypeID
Where #end_of_last_month between begin_date and expire_date
and completed = 1
Group by prod.ArticleTypeID, at.ArticleType
order by prod.ArticleTypeID, at.ArticleType
Set #begin_date = DATEADD(month, 1, #begin_date)
End
It groups all User per Month where the begin- and expire date in the actual Cursormonth.
Notes:
The user has different payment types, for e.g. 1 Month, 6 Month and so on.
Is it possible to rewrite the code - my problem is only the identification at the where clause (#end_of_last_month between begin_date and expire_date)
How can I handle this with joins or cte's?

What you need first, if not already is a numbers table
Using said Numbers table you can create a dynamic list of dates for "end_of_Last_Month" like so
;WITH ctexAllDates
AS (
SELECT end_of_last_month = DATEADD(DAY, -1, DATEADD(MONTH, N.N -1, #begin_date))
FROM
dbo.Numbers N
WHERE
N.N <= DATEDIFF(MONTH, #begin_date, #max_date) + 1
)
select * FROM ctexAllDates
Then combine with your query like so
;WITH ctexAllDates
AS (
SELECT end_of_last_month = DATEADD(DAY, -1, DATEADD(MONTH, N.N -1, #begin_date))
FROM
dbo.Numbers N
WHERE
N.N <= DATEDIFF(MONTH, #begin_date, #max_date) + 1
)
INSERT INTO #table
(
Customer
, ArticleTypeID
, ArticleType
, end_of_month
, month
, year
)
SELECT
COUNT(DISTINCT (customerId))
, prod.ArticleTypeID
, at.ArticleType
, A.end_of_last_month
, DATEPART(MONTH, A.end_of_last_month)
, DATEPART(YEAR, A.end_of_last_month)
FROM
Customer cust
INNER JOIN Payments pay ON pay.member_id = m.member_id
INNER JOIN Products prod ON prod.product_id = pay.product_id
INNER JOIN ArticleType at ON at.ArticleTypeID = prod.ArticleTypeID
LEFT JOIN ctexAllDates A ON A.end_of_last_month BETWEEN begin_date AND expire_date
WHERE completed = 1
GROUP BY
prod.ArticleTypeID
, at.ArticleType
, A.end_of_last_month
ORDER BY
prod.ArticleTypeID
, at.ArticleType;

Use WHERE statement in OVER()

I'm trying to create a query, which will give me a row_number for all the returned records. I can do that for all records present in the database. The problem is, i need to somehow retrieve a row number for a query with WHERE statement inside (WHERE posts.status = 'published').
My original query looks like that:
SELECT
posts.*,
row_number() over (ORDER BY posts.score DESC) as position
FROM posts
However, adding a where statement inside over() throws syntax error:
SELECT
posts.*,
row_number() over (
WHERE posts.status = 'published'
ORDER BY posts.score DESC
) as position
FROM posts

SELECT posts.*, row_number() over (ORDER BY posts.score DESC) as position
FROM posts
WHERE posts.status = 'published'

Not quite sure what you are after. Maybe show an example of expected output. Here is an an example of an approach:
create table posts(id int, score int, status text);
insert into posts values(1, 1, 'x');
insert into posts values(2, 2, 'published');
insert into posts values(3, 3, 'x');
insert into posts values(4, 4, 'x');
SELECT x.id, x.score, x.status
,CASE WHEN x.status = 'published' THEN null ELSE x.position END
FROM (SELECT posts.*,
row_number() OVER (ORDER BY posts.score DESC)
-SUM(CASE WHEN status = 'published' THEN 1 ELSE 0 END)
OVER (ORDER BY posts.score DESC) as position
FROM posts
) x
Result:
4 4 x 1
3 3 x 2
2 2 published
1 1 x 3

Dealing with periods and dates without using cursors

I would like to solve this issue avoiding to use cursors (FETCH).
Here comes the problem...
1st Table/quantity
------------------
periodid periodstart periodend quantity
1 2010/10/01 2010/10/15 5
2st Table/sold items
-----------------------
periodid periodstart periodend solditems
14343 2010/10/05 2010/10/06 2
Now I would like to get the following view or just query result
Table Table/stock
-----------------------
periodstart periodend itemsinstock
2010/10/01 2010/10/04 5
2010/10/05 2010/10/06 3
2010/10/07 2010/10/15 5
It seems impossible to solve this problem without using cursors, or without using single dates instead of periods.
I would appreciate any help.
Thanks

DECLARE #t1 TABLE (periodid INT,periodstart DATE,periodend DATE,quantity INT)
DECLARE #t2 TABLE (periodid INT,periodstart DATE,periodend DATE,solditems INT)
INSERT INTO #t1 VALUES(1,'2010-10-01T00:00:00.000','2010-10-15T00:00:00.000',5)
INSERT INTO #t2 VALUES(14343,'2010-10-05T00:00:00.000','2010-10-06T00:00:00.000',2)
DECLARE #D1 DATE
SELECT #D1 = MIN(P) FROM (SELECT MIN(periodstart) P FROM #t1
UNION ALL
SELECT MIN(periodstart) FROM #t2) D
DECLARE #D2 DATE
SELECT #D2 = MAX(P) FROM (SELECT MAX(periodend) P FROM #t1
UNION ALL
SELECT MAX(periodend) FROM #t2) D
;WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),
L1 AS (SELECT 1 AS c FROM L0 A CROSS JOIN L0 B),
L2 AS (SELECT 1 AS c FROM L1 A CROSS JOIN L1 B),
L3 AS (SELECT 1 AS c FROM L2 A CROSS JOIN L2 B),
L4 AS (SELECT 1 AS c FROM L3 A CROSS JOIN L3 B),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS i FROM L4),
Dates AS(SELECT DATEADD(DAY,i-1,#D1) AS D FROM Nums where i <= 1+DATEDIFF(DAY,#D1,#D2)) ,
Stock As (
SELECT D ,t1.quantity - ISNULL(t2.solditems,0) AS itemsinstock
FROM Dates
LEFT OUTER JOIN #t1 t1 ON t1.periodend >= D and t1.periodstart <= D
LEFT OUTER JOIN #t2 t2 ON t2.periodend >= D and t2.periodstart <= D ),
NStock As (
select D,itemsinstock, ROW_NUMBER() over (order by D) - ROW_NUMBER() over (partition by itemsinstock order by D) AS G
from Stock)
SELECT MIN(D) AS periodstart, MAX(D) AS periodend, itemsinstock
FROM NStock
GROUP BY G, itemsinstock
ORDER BY periodstart

Hopefully a little easier to read than Martin's. I used different tables and sample data, hopefully extrapolating the right info:
CREATE TABLE [dbo].[Quantity](
[PeriodStart] [date] NOT NULL,
[PeriodEnd] [date] NOT NULL,
[Quantity] [int] NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[SoldItems](
[PeriodStart] [date] NOT NULL,
[PeriodEnd] [date] NOT NULL,
[SoldItems] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO Quantity (PeriodStart,PeriodEnd,Quantity)
SELECT '20100101','20100115',5
INSERT INTO SoldItems (PeriodStart,PeriodEnd,SoldItems)
SELECT '20100105','20100107',2 union all
SELECT '20100106','20100108',1
The actual query is now:
;WITH Dates as (
select PeriodStart as DateVal from SoldItems union select PeriodEnd from SoldItems union select PeriodStart from Quantity union select PeriodEnd from Quantity
), Periods as (
select d1.DateVal as StartDate, d2.DateVal as EndDate
from Dates d1 inner join Dates d2 on d1.DateVal < d2.DateVal left join Dates d3 on d1.DateVal < d3.DateVal and d3.DateVal < d2.DateVal where d3.DateVal is null
), QuantitiesSold as (
select StartDate,EndDate,COALESCE(SUM(si.SoldItems),0) as Quantity
from Periods p left join SoldItems si on p.StartDate < si.PeriodEnd and si.PeriodStart < p.EndDate
group by StartDate,EndDate
)
select StartDate,EndDate,q.Quantity - qs.Quantity
from QuantitiesSold qs inner join Quantity q on qs.StartDate < q.PeriodEnd and q.PeriodStart < qs.EndDate
And the result is:
StartDate EndDate (No column name)
2010-01-01 2010-01-05 5
2010-01-05 2010-01-06 3
2010-01-06 2010-01-07 2
2010-01-07 2010-01-08 4
2010-01-08 2010-01-15 5
Explanation: I'm using three Common Table Expressions. The first (Dates) is gathering all of the dates that we're talking about, from the two tables involved. The second (Periods) selects consecutive values from the Dates CTE. And the third (QuantitiesSold) then finds items in the SoldItems table that overlap these periods, and adds their totals together. All that remains in the outer select is to subtract these quantities from the total quantity stored in the Quantity Table

John, what you could do is a WHILE loop. Declare and initialise 2 variables before your loop, one being the start date and the other being end date. Your loop would then look like this:
WHILE(#StartEnd <= #EndDate)
BEGIN
--processing goes here
SET #StartEnd = #StartEnd + 1
END
You would need to store your period definitions in another table, so you could retrieve those and output rows when required to a temporary table.
Let me know if you need any more detailed examples, or if I've got the wrong end of the stick!

Damien,
I am trying to fully understand your solution and test it on a large scale of data, but I receive following errors for your code.
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near 'Dates'.
Msg 102, Level 15, State 1, Line 22
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Line 25
Incorrect syntax near ','.

Damien,
Based on your solution I also wanted to get a neat display for StockItems without overlapping dates. How about this solution?
CREATE TABLE [dbo].[SoldItems](
[PeriodStart] [datetime] NOT NULL,
[PeriodEnd] [datetime] NOT NULL,
[SoldItems] [int] NOT NULL
) ON [PRIMARY]
INSERT INTO SoldItems (PeriodStart,PeriodEnd,SoldItems)
SELECT '20100105','20100106',2 union all
SELECT '20100105','20100108',3 union all
SELECT '20100115','20100116',1 union all
SELECT '20100101','20100120',10
;WITH Dates as (
select PeriodStart as DateVal from SoldItems
union
select PeriodEnd from SoldItems
union
select PeriodStart from Quantity
union
select PeriodEnd from Quantity
), Periods as (
select d1.DateVal as StartDate, d2.DateVal as EndDate
from Dates d1
inner join Dates d2 on d1.DateVal < d2.DateVal
left join Dates d3 on d1.DateVal < d3.DateVal and
d3.DateVal < d2.DateVal where d3.DateVal is null
), QuantitiesSold as (
select StartDate,EndDate,SUM(si.SoldItems) as Quantity
from Periods p left join SoldItems si on p.StartDate < si.PeriodEnd and si.PeriodStart < p.EndDate
group by StartDate,EndDate
)
select StartDate,EndDate, qs.Quantity
from QuantitiesSold qs
where qs.quantity is not null

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How to calculate longest streak in SQL? - tsql

Related

Optimise query: count latest win/lose streak for all teams

TSQL Get Item Price History from Item Price Changes

TSQL - Replace Cursor

Use WHERE statement in OVER()

Dealing with periods and dates without using cursors

Categories

Resources