I have the following Tables with the following data:
CREATE TABLE TestSource (
InstrumentID int,
ProviderID int,
KPI1 int,
Col2 varchar(255),
KPI3 int
);
CREATE TABLE TestTarget (
InstrumentID int,
ProviderID int,
KPI1 int,
Col2 varchar(255),
KPI3 int
);
INSERT INTO TestSource (InstrumentID,ProviderID,KPI1,Col2,KPI3)
VALUES (123, 27, 1, 'ABC', 10.0 ),
(1234, 27, 2, 'DEF', 10.0 ),
(345, 27, 1, NULL, 0.00 );
INSERT INTO TestTarget (InstrumentID,ProviderID,KPI1,Col2,KPI3)
VALUES (123, 27, 1, 'ABC', 10.0 ),
(1234, 27, 2, 'DEF', 10.0 ),
(345, 27, 1, 'ABC', 0.0 );
I'm trying to compare the values between tables. Here's the query logic I am currently using:
DECLARE @Result NVARCHAR(max)
;WITH
compare_source (InstrumentID,ProviderID,
/*** Source columns to compare ***/
Col1Source, Col2Source,Col3Source
)
as (
select InstrumentID
,ProviderID
,KPI1
--,ISNULL(Col2,'NA') as Col2
,Col2
,KPI3
from TestSource
group by
InstrumentID
,ProviderID
,KPI1
,Col2
,KPI3
),
compare_target (InstrumentID,ProviderID,
/*** Target columns to compare ***/
Col1Target,Col2Target,Col3Target
)
as
(
select
InstrumentID
,ProviderID
,KPI1
--,1
,Col2
,KPI3
from TestTarget
group by
InstrumentID
,ProviderID
,KPI1
,Col2
,KPI3
)
SELECT @Result = STRING_AGG ('InstrumentID = ' + CONVERT(VARCHAR,InstrumentID)
+ ', Col1: ' + CONVERT(VARCHAR,Col1Source) + ' vs ' + CONVERT(VARCHAR,Col1Target)
+ ', Col2: ' + CONVERT(VARCHAR,Col2Source) + ' vs ' + CONVERT(VARCHAR,Col2Target)
+ ', Col3: ' + CONVERT(VARCHAR,Col3Source) + ' vs ' + CONVERT(VARCHAR,Col3Target)
, CHAR(13) + CHAR(10)
)
FROM
(
select
s.InstrumentID
,s.Col1Source
,t.Col1Target
,s.Col2Source
,t.Col2Target
,s.Col3Source
,t.Col3Target
from compare_source s
left join compare_target t on t.InstrumentID = s.InstrumentID and t.ProviderID = s.ProviderID
where not exists
(
select 1 from compare_target t where
s.InstrumentID = t.InstrumentID AND
( s.Col1Source = t.Col1Target ) OR (ISNULL(s.Col1Source, t.Col1Target) IS NULL) AND
( s.Col2Source = t.Col2Target ) OR (ISNULL(s.Col2Source, t.Col2Target) IS NULL) AND
( s.Col3Source = t.Col3Target ) OR (ISNULL(s.Col3Source, t.Col3Target) IS NULL)
)
) diff
PRINT @Result
When there are no NULL values in my tables, the comparison works well. However, as soon as I insert NULLs into either table, my comparison logic breaks down and no longer reports the differences between the tables' values.
I know that I could easily apply ISNULL to the columns in the individual SELECTs; however, I'd like to keep the query as generic as possible and do the comparison and NULL checks only in the WHERE clause of the final NOT EXISTS.
I've also tried the following comparison logic, without success:
(
select 1 from compare_target t where
s.InstrumentID = t.InstrumentID AND
( s.Col1Source = t.Col1Target OR (s.Col1Source IS NULL AND t.Col1Target IS NULL) ) AND
( s.Col2Source = t.Col2Target OR (s.Col2Source IS NULL AND t.Col2Target IS NULL) ) AND
( s.Col3Source = t.Col3Target OR (s.Col3Source IS NULL AND t.Col3Target IS NULL) )
)
Another issue is that my query cannot distinguish between number formats (for example, it treats the value 0.00 as equivalent to 0.0).
I'm not totally certain what I am missing.
Any help to put me on the right path would be great.
Well, the two problems I see are these:
The WHERE clause at the bottom needs extra parentheses to group each OR before it is combined with the ANDs, so that the order of precedence is correct (AND binds more tightly than OR):
select 1 from compare_target t where
s.InstrumentID = t.InstrumentID AND
(( s.Col1Source = t.Col1Target ) OR (ISNULL(s.Col1Source, t.Col1Target) IS NULL)) AND
(( s.Col2Source = t.Col2Target ) OR (ISNULL(s.Col2Source, t.Col2Target) IS NULL)) AND
(( s.Col3Source = t.Col3Target ) OR (ISNULL(s.Col3Source, t.Col3Target) IS NULL))
When you make that change, the one row that is returned has a NULL value in the Col2Source column. So when you build the string that you pass to STRING_AGG, it has a NULL in the middle of it, and the entire string becomes NULL. You will therefore need to use ISNULL either in the subquery in your FROM clause or within the STRING_AGG() call, or, I suppose, right where you had it commented out.
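Both points can be checked on a cut-down version of the question's data. The following is only a minimal sketch, not the asker's query: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (so || replaces + for concatenation and COALESCE replaces ISNULL), and the table names src and tgt and the single compared column are assumptions made for brevity. The precedence of AND over OR and the "NULL in, NULL out" behaviour of concatenation are the same in both engines (in SQL Server under the default CONCAT_NULL_YIELDS_NULL setting).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER, col2 TEXT);
CREATE TABLE tgt (id INTEGER, col2 TEXT);
INSERT INTO src VALUES (123, 'ABC'), (345, NULL);
INSERT INTO tgt VALUES (123, 'ABC'), (345, 'ABC');
""")

# Each OR is wrapped in its own parentheses before being ANDed with the key match.
DIFF_SQL = """
SELECT s.id, {message}
FROM src s
JOIN tgt t ON t.id = s.id
WHERE NOT EXISTS (
    SELECT 1 FROM tgt x
    WHERE x.id = s.id
      AND (s.col2 = x.col2 OR (s.col2 IS NULL AND x.col2 IS NULL))
)
"""

def find_differences(message_expr):
    """Run the diff query with the given SQL expression as the message column."""
    return conn.execute(DIFF_SQL.format(message=message_expr)).fetchall()

# Concatenating the raw columns: the NULL in col2 turns the whole message into NULL.
raw = find_differences("'Col2: ' || s.col2 || ' vs ' || t.col2")

# Replacing NULL with a placeholder first keeps the message readable.
safe = find_differences(
    "'Col2: ' || COALESCE(s.col2, 'NULL') || ' vs ' || COALESCE(t.col2, 'NULL')"
)

print(raw)
print(safe)
```

The difference row is found in both runs; only the second one produces a message that survives the NULL.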
Preamble:
I'm working in an environment (Knowage) that only lets me run SELECTs.
$P{something} is a parameter reference that Knowage replaces with its value before running the query on the MSSQL engine.
What I have to do:
I have to use a parameter to filter the data, including an entry that returns everything. Currently I can only get filtered data; I cannot get 'all'.
My query looks like this:
select ClusterDRG, t.N as N_cur, u.N as N_pas, t.SubStabilimento
from(
--- take the top 10 DRGs of the current period and compare them with the values of the same DRGs in the previous period (regardless of whether they were in the top 10 then)
SELECT count(ProgrSdo) as N
,ClusterDRG, TipologiaDRG as tipodrg
, ROW_NUMBER() OVER(PARTITION BY anno,tipologiaDRG ORDER BY count(progrsdo) desC) as Ordine
, case when [Anno]=$P{lista_anni} then 'Attuale'
when [Anno]=$P{lista_anni}-1 then 'Precedente'
else cast([Anno] as varchar (4)) end as Anno
FROM
MY_TABLE
where left(Mese,2) <=$P{lista_mesi}
and ANNO = $P{lista_anni} -- the current year
and SubStabilimento = $P{lista_stabilimenti} -- here is my issue
and codicepresidio=111111
group by ClusterDRG, TipologiaDRG, anno, SubStabilimento
) t
left join
(
SELECT count(ProgrSdo) as N
,ClusterDRG, TipologiaDRG as tipodrg
, case when [Anno]=$P{lista_anni} then 'Attuale'
when [Anno]=$P{lista_anni}-1 then 'Precedente'
else cast([Anno] as varchar (4)) end as Anno
, ROW_NUMBER() OVER(PARTITION BY anno,tipologiaDRG ORDER BY count(progrsdo) desC) as Ordine
FROM MY_TABLE
where left(Mese,2) <= $P{lista_mesi}
and ANNO = $P{lista_anni}-1 -- the previous year
and codicepresidio=111111
and SubStabilimento = $P{lista_stabilimenti} -- here is my issue
group by ClusterDRG, TipologiaDRG, anno, SubStabilimento
)u on t.ClusterDRG=u.ClusterDRG and t.TipoDRG=u.TipoDRG and t.SubStabilimento=u.SubStabilimento
My parameters have these values:
$P{lista_mesi} := 5
$P{lista_anni} := 2018
$P{lista_stabilimenti} := **here is my issue**
I want to use $P{lista_stabilimenti} to filter by a single entity (this already works) or to return all entities when a specific value is set.
So if I have $P{lista_stabilimenti} := 'stab1'
I get
clust1, 123, 122, stab1
clust2, 789, 456, stab1
and if I have $P{lista_stabilimenti} := 'ALL' (this is the behaviour I'm trying to achieve; the actual value in my code is 'TUTTI')
I get
clust1, 123, 122, stab1
clust2, 789, 456, stab1
clust1, 321, 221, stab2
clust2, 987, 654, stab2
clust5, 963, 258, stab3
I tried replacing the condition and SubStabilimento = $P{lista_stabilimenti}
with
[...]
and SubStabilimento in ( case when exists ( select SubStabilimento from Knowage_L15_COLLEGATA where SubStabilimento = $P{lista_stabilimenti}) then $P{lista_stabilimenti}
when $P{lista_stabilimenti} like 'TUTTI' then (
select STUFF(
(
SELECT ',' + SubStabilimento
FROM MY_TABLE v
where CodicePresidio = '111111'
group by SubStabilimento
FOR XML PATH('')
),
1, 1, '')
)
end)
[...]
With this I get a string like stab1,stab2,stab3, but I'm not able to feed it to the IN clause.
After a pizza my brain started working again; my solution looks like this:
[...]
and SubStabilimento LIKE ( select case when exists (
select SubStabilimento
from MY_TABLE
where SubStabilimento = $P{lista_stabilimenti})
then $P{lista_stabilimenti}
when $P{lista_stabilimenti} like 'TUTTI'
then '%' end as SubStab )
[...]
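The idea behind this solution is that LIKE with the pattern '%' matches every non-NULL value, so the same predicate serves both the single-entity case and the "all" case. Here is a minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for SQL Server; the table contents and the rows_for helper are hypothetical, and the named parameter :p plays the role of $P{lista_stabilimenti}.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (ClusterDRG TEXT, SubStabilimento TEXT);
INSERT INTO my_table VALUES
  ('clust1', 'stab1'), ('clust2', 'stab1'),
  ('clust1', 'stab2'), ('clust5', 'stab3');
""")

def rows_for(stabilimento):
    """Return rows for one SubStabilimento, or all rows when 'TUTTI' is passed."""
    sql = """
    SELECT ClusterDRG, SubStabilimento
    FROM my_table
    -- 'TUTTI' becomes the match-everything pattern; any other value is matched as-is
    WHERE SubStabilimento LIKE CASE WHEN :p = 'TUTTI' THEN '%' ELSE :p END
    ORDER BY SubStabilimento, ClusterDRG
    """
    return conn.execute(sql, {"p": stabilimento}).fetchall()

print(rows_for("stab1"))
print(rows_for("TUTTI"))
```

Two caveats apply to this pattern in general: rows whose SubStabilimento is NULL are never matched by LIKE '%', and a parameter value that itself contains % or _ is treated as a pattern. A predicate of the form (:p = 'TUTTI' OR SubStabilimento = :p) would avoid the second issue.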
I am importing a large text file that consists of several 'reports'. Each report consists of several rows of data. The only way to tell that a new report starts is that its first line starts with "XX". All rows that follow belong to that master row with XX. I am trying to add a grouping ID so that I can work with the data and parse it into the database.
CREATE TABLE RawData(
ID int IDENTITY(1,1) NOT NULL
,Grp1 int NULL
,Grp2 int NULL
,Rowdata varchar(max) NULL
)
INSERT INTO RawData(Rowdata) VALUES ('XX Monday')
INSERT INTO RawData(Rowdata) VALUES ('Tues day')
INSERT INTO RawData(Rowdata) VALUES ('We d ne s day')
INSERT INTO RawData(Rowdata) VALUES ('Thurs day')
INSERT INTO RawData(Rowdata) VALUES ('F r i d day')
INSERT INTO RawData(Rowdata) VALUES ('XX January')
INSERT INTO RawData(Rowdata) VALUES ('Feb r u a')
INSERT INTO RawData(Rowdata) VALUES ('XX Sun d a y')
INSERT INTO RawData(Rowdata) VALUES ('Sat ur day')
I need to write a script that will update the Grp1 field based on where the "XX" line is at. When I am finished I'd like the table to look like this:
ID Grp1 Grp2 RowData
1 1 1 XX Monday
2 1 2 Tues day
3 1 3 We d ne s day
4 1 4 Thurs day
5 1 5 F r i d day
6 2 1 XX January
7 2 2 Feb r u a
8 3 1 XX Sun d a y
9 3 2 Sat ur day
I know that for the Grp2 field I can use DENSE_RANK. The issue I am having is how to fill in all the values for Grp1. I can do an update on the rows where I see the 'XX', but that does not fill in the values for the rows below them.
Thank you for any advice/help.
This should do the trick
-- sample data
DECLARE @RawData TABLE
(
ID int IDENTITY(1,1) NOT NULL
,Grp1 int NULL
,Grp2 int NULL
,Rowdata varchar(max) NULL
);
INSERT INTO @RawData(Rowdata)
VALUES ('XX Monday'),('Tues day'),('We d ne s day'),('Thurs day'),('F r i d day'),
('XX January'),('Feb r u a'),('XX Sun d a y'),('Sat ur day');
-- solution
WITH rr AS
(
SELECT ID, thisVal = ROW_NUMBER() OVER (ORDER BY ID)
FROM @RawData
WHERE RowData LIKE 'XX %'
),
makeGrp1 AS
(
SELECT
ID,
Grp1 = (SELECT MAX(thisVal) FROM rr WHERE r.id >= rr.id),
RowData
FROM @RawData r
)
SELECT
ID,
Grp1,
Grp2 = ROW_NUMBER() OVER (PARTITION BY Grp1 ORDER BY ID),
RowData
FROM makeGrp1;
UPDATE: below is the code to update your @RawData table; I just re-read the requirement. I'm leaving the original solution in place, as it will help you better understand how my update works:
-- sample data
DECLARE @RawData TABLE
(
ID int IDENTITY(1,1) NOT NULL
,Grp1 int NULL
,Grp2 int NULL
,Rowdata varchar(max) NULL
);
INSERT INTO @RawData(Rowdata)
VALUES ('XX Monday'),('Tues day'),('We d ne s day'),('Thurs day'),('F r i d day'),
('XX January'),('Feb r u a'),('XX Sun d a y'),('Sat ur day');
-- Solution to update the @RawData table
WITH rr AS
(
SELECT ID, thisVal = ROW_NUMBER() OVER (ORDER BY ID)
FROM @RawData
WHERE RowData LIKE 'XX %'
),
makeGroups AS
(
SELECT
ID,
Grp1 = (SELECT MAX(thisVal) FROM rr WHERE r.id >= rr.id),
Grp2 = ROW_NUMBER()
OVER (PARTITION BY (SELECT MAX(thisVal) FROM rr WHERE r.id >= rr.id) ORDER BY ID)
FROM @RawData r
)
UPDATE @RawData
SET Grp1 = mg.Grp1, Grp2 = mg.Grp2
FROM makeGroups mg
JOIN @RawData rd ON mg.ID = rd.ID;
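The grouping logic itself boils down to two steps: Grp1 is a running count of the 'XX' header rows seen so far, and Grp2 is a row number restarted within each Grp1. The sketch below checks that on the question's sample rows. It is only an illustration under stated assumptions: it runs on Python's built-in sqlite3 module rather than SQL Server, it needs an SQLite build with window-function support (3.25 or later), and it uses a running SUM in place of the correlated MAX subquery shown above.

```python
import sqlite3

sample_rows = ["XX Monday", "Tues day", "We d ne s day", "Thurs day", "F r i d day",
               "XX January", "Feb r u a", "XX Sun d a y", "Sat ur day"]

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY auto-assigns 1, 2, 3, ... like the IDENTITY column.
conn.execute("CREATE TABLE RawData (ID INTEGER PRIMARY KEY, Rowdata TEXT)")
conn.executemany("INSERT INTO RawData (Rowdata) VALUES (?)",
                 [(r,) for r in sample_rows])

groups = conn.execute("""
    WITH flagged AS (
        SELECT ID,
               -- running count of header rows up to and including this row
               SUM(CASE WHEN Rowdata LIKE 'XX %' THEN 1 ELSE 0 END)
                   OVER (ORDER BY ID) AS Grp1
        FROM RawData
    )
    SELECT ID, Grp1,
           -- position of the row inside its report
           ROW_NUMBER() OVER (PARTITION BY Grp1 ORDER BY ID) AS Grp2
    FROM flagged
    ORDER BY ID
""").fetchall()

for row in groups:
    print(row)
```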
;with cte0 as (
Select *,Flag = case when RowData like 'XX%' then 1 else 0 end
From RawData )
Update RawData
Set Grp1 = B.Grp1
,Grp2 = B.Grp2
From RawData U
Join (
Select ID
,Grp1 = Sum(Flag) over (Order by ID)
,Grp2 = Row_Number() over (Partition By (Select Sum(Flag) From cte0 Where ID<=a.ID) Order by ID)
From cte0 A
) B on U.ID=B.ID
Select * from RawData
The updated RawData then matches the desired output shown in the question.
I have a table:
ID as int, ParentId as int, FreeFromTerxt as varchar(max), ActiveUntil as DateTime
As an example, within this table I have two records.
1, 100, 'Some text', '2015-11-30 12:10:09.0000000'
2, 100, 'New text', null
What I am trying to do is get the currently active record, which in the case above would be record 1. To do that I just select with the following criterion:
ActiveUntil > GETDATE()
This works great, but if I change the first date to 2015-10-30, I need to get the NULL record instead, as that record then takes precedence.
So I changed the condition to:
((ActiveUntil is NULL) OR (ActiveUntil > GETDATE()))
But this does not work.
Here is an example with UNION ALL:
DECLARE @t TABLE ( d DATETIME )
INSERT INTO @t
VALUES ( NULL ),
( '2015-11-30' )
SELECT TOP 1 *
FROM ( SELECT * , 1 AS ordering
FROM @t
WHERE d > GETDATE()
UNION ALL
SELECT * , 2 AS ordering
FROM @t
WHERE d IS NULL
) t
ORDER BY ordering, d
For 2015-11-30 it returns 2015-11-30. For 2015-10-30 it returns null.
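The same prioritisation (prefer a record that is still active, fall back to the NULL record) can also be expressed as one filtered query with an ORDER BY that ranks NULL last. The sketch below is an illustration under stated assumptions rather than the answer's exact query: it uses Python's built-in sqlite3 module, so LIMIT 1 stands in for TOP 1 and the current time is passed in as a parameter instead of calling GETDATE(); the table name records and the current_record helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (ID INTEGER, ActiveUntil TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(1, "2015-11-30 12:10:09"), (2, None)])

def current_record(now):
    """Return the active record at time `now` (an ISO 'YYYY-MM-DD HH:MM:SS' string)."""
    return conn.execute("""
        SELECT ID, ActiveUntil
        FROM records
        WHERE ActiveUntil IS NULL OR ActiveUntil > :now
        -- dated records that are still active rank before the open-ended NULL record
        ORDER BY CASE WHEN ActiveUntil IS NULL THEN 2 ELSE 1 END, ActiveUntil
        LIMIT 1
    """, {"now": now}).fetchone()

print(current_record("2015-11-01 00:00:00"))  # record 1 is still active
print(current_record("2015-12-01 00:00:00"))  # record 1 expired, NULL record wins
```

ISO-formatted timestamps compare correctly as strings, which is why a plain text column is enough for this sketch.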
Try it like this:
((ActiveUntil is NULL) OR (CONVERT(char(10), ActiveUntil ,126)) > GETDATE())
See the MSDN documentation for CAST and CONVERT. Style 126 is the ISO 8601 format (yyyy-mm-ddThh:mi:ss.mmm); converting to char(10) keeps only the yyyy-mm-dd part. Or you can use CAST:
((ActiveUntil is NULL) OR (CAST(ActiveUntil as Date) > GETDATE()))
I have
TABLE EMPLOYEE - ID,DATE,IsPresent
I want to calculate the longest presence streak for an employee. The IsPresent bit will be false for days he didn't come in, so I want to calculate the longest run of consecutive dates on which he came to the office. The Date column is unique. I tried it this way:
Select Id,Count(*) from Employee where IsPresent=1
But the above doesn't work. Can anyone guide me on how to calculate the streak? I'm sure people have come across this before; I tried searching online but didn't understand it well. Please help me out.
EDIT Here's a SQL Server version of the query:
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(datediff(day, LowerDate, UpperDate) + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
SQL Server Version of the test data:
create table T (EmployeeId int
, "DATE" date not null
, IsPresent bit not null
, constraint T_PK primary key (EmployeeId, "DATE")
)
insert into T values (1, '2000-01-01', 1);
insert into T values (2, '2000-01-01', 0);
insert into T values (3, '2000-01-01', 0);
insert into T values (3, '2000-01-02', 1);
insert into T values (3, '2000-01-03', 1);
insert into T values (3, '2000-01-04', 0);
insert into T values (3, '2000-01-05', 1);
insert into T values (3, '2000-01-06', 1);
insert into T values (3, '2000-01-07', 0);
insert into T values (4, '2000-01-01', 0);
insert into T values (4, '2000-01-02', 1);
insert into T values (4, '2000-01-03', 1);
insert into T values (4, '2000-01-04', 1);
insert into T values (4, '2000-01-05', 1);
insert into T values (4, '2000-01-06', 1);
insert into T values (4, '2000-01-07', 0);
insert into T values (5, '2000-01-01', 0);
insert into T values (5, '2000-01-02', 1);
insert into T values (5, '2000-01-03', 0);
insert into T values (5, '2000-01-04', 1);
insert into T values (5, '2000-01-05', 1);
insert into T values (5, '2000-01-06', 1);
insert into T values (5, '2000-01-07', 0);
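As a cross-check for the query, the expected answer for this test data can be computed outside the database. The following is a small sketch in plain Python, not part of the original answer; the longest_streaks function and the compact pattern encoding of the test rows are assumptions introduced here. Like the query, it treats a streak as consecutive calendar dates with IsPresent = 1 and returns nothing for an employee who was never present.

```python
from datetime import date, timedelta
from itertools import groupby

def longest_streaks(rows):
    """rows: iterable of (employee_id, day, is_present); returns {employee_id: streak}."""
    # Keep only present days, ordered by employee and then by date.
    present = sorted((emp, day) for emp, day, ok in rows if ok)
    result = {}
    for emp, days in groupby(present, key=lambda r: r[0]):
        best = run = 0
        previous = None
        for _, day in days:
            # Extend the run only if this day directly follows the previous present day.
            if previous is not None and day - previous == timedelta(days=1):
                run += 1
            else:
                run = 1
            best = max(best, run)
            previous = day
        result[emp] = best
    return result

# The test data above, one character per day starting at 2000-01-01.
patterns = {1: "1", 2: "0", 3: "0110110", 4: "0111110", 5: "0101110"}
start = date(2000, 1, 1)
rows = [(emp, start + timedelta(days=i), flag == "1")
        for emp, pattern in patterns.items()
        for i, flag in enumerate(pattern)]

streaks = longest_streaks(rows)
print(streaks)
```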
Sorry, this is written in Oracle, so substitute the appropriate SQL Server date arithmetic.
Assumptions:
Date is either a Date value or a DateTime with a time component of 00:00:00.
The primary key is (EmployeeId, Date).
All fields are NOT NULL.
If a date is missing for the employee, they were not present. (This is used to handle the beginning and end of the data series, but it also means that missing dates in the middle will break streaks, which could be a problem depending on requirements.)
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = second_day."DATE" - 1
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = second_day."DATE" - 1
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(UpperDate - LowerDate + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
Test Data:
create table T (EmployeeId number(38)
, "DATE" date not null check ("DATE" = trunc("DATE"))
, IsPresent number not null check (IsPresent in (0, 1))
, constraint T_PK primary key (EmployeeId, "DATE")
)
/
insert into T values (1, to_date('2000-01-01', 'YYYY-MM-DD'), 1);
insert into T values (2, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-03', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-04', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
insert into T values (4, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (4, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-03', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-04', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-03', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-04', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
The GROUP BY is missing.
To select the total man-days of attendance for the whole office (everyone together):
Select Count(*) from Employee where IsPresent=1
To select the man-days of attendance per employee:
Select Id,Count(*)
from Employee
where IsPresent=1
group by id;
But that is still not enough, because it counts the total days of attendance and NOT the length of continuous attendance.
What you need to do is construct a temp table listing all the days on which an employee was absent, with an additional date column, date2, initially set to today.
create tmpdb.absentdates as
Select id, date, today as date2
from EMPLOYEE
where IsPresent=0
order by id, date;
So the trick is to calculate the date difference between two absent days to find the length of continuously present days.
Now, fill in date2 with the next absent date per employee. The most recent record per employee will not be updated but left with value of today because there is no record with greater date than today in the database.
update tmpdb.absentdates
set date2 =
select min(a2.date)
from
tmpdb.absentdates a1,
tmpdb.absentdates a2
where a1.id = a2.id
and a1.date < a2.date
The above query updates the table while joining it to itself, which may cause a deadlock, so it is better to create two copies of the temp table.
create tmpdb.absentdatesX as
Select id, date
from EMPLOYEE
where IsPresent=0
order by id, date;
create tmpdb.absentdates as
select *, today as date2
from tmpdb.absentdatesX;
You need to insert the hiring date, presuming the earliest date per employee in the database is the hiring date.
insert into tmpdb.absentdates a
select a.id, min(e.date), today
from EMPLOYEE e
where a.id = e.id
Now update date2 with the next later absent date to be able to perform date2 - date.
update tmpdb.absentdates
set date2 =
select min(x.date)
from
tmpdb.absentdates a,
tmpdb.absentdatesX x
where a.id = x.id
and a.date < x.date
This will list the lengths of the periods during which an employee was continuously present:
select id, datediff(day, date, date2) as continuousPresence
from tmpdb.absentdates
group by id, datediff(day, date, date2)
order by id, continuousPresence
But you only want the longest streak:
select id, max(datediff(day, date, date2)) as continuousPresence
from tmpdb.absentdates
group by id
order by id
However, the above is still problematic because datediff does not take into account holidays and weekends.
So we depend on the count of records as the legitimate working days.
create tmpdb.absentCount as
Select a.id, a.date, a.date2, count(*) as continuousPresence
from EMPLOYEE e, tmpdb.absentdates a
where e.id = a.id
and e.date >= a.date
and e.date < a.date2
group by a.id, a.date, a.date2
order by a.id, a.date;
Remember, every time you use an aggregate such as COUNT or AVG, you need to GROUP BY the non-aggregated columns in the select list, because those are the columns you are aggregating by.
Now select the max streak
select id, max(continuousPresence)
from tmpdb.absentCount
group by id
To list the dates of streak:
select id, date, date2, continuousPresence
from tmpdb.absentCount
group by id
having continuousPresence = max(continuousPresence);
There may be some mistakes above (with respect to SQL Server T-SQL), but this is the general idea.
Try this:
select
e.Id,
e.date,
(select
max(e1.date)
from
employee e1
where
e1.Id = e.Id and
e1.date < e.date and
e1.IsPresent = 0) StreakStartDate,
(select
min(e2.date)
from
employee e2
where
e2.Id = e.Id and
e2.date > e.date and
e2.IsPresent = 0) StreakEndDate
from
employee e
where
e.IsPresent = 1
Then find the longest streak for each employee:
select id, max(datediff(day, StreakStartDate, StreakEndDate) - 1) as LongestStreak
from (<use subquery above>) s
group by id
I'm not fully sure this query has correct syntax because I don't have a database at hand right now.
Also notice that the streak start and streak end columns do not contain the first and last days on which the employee was present, but the nearest dates on which he was absent (hence the - 1 above). If the dates in the table are evenly spaced, this does not matter; otherwise the query becomes a little more complex, because we need to find the nearest presence dates instead. That improvement would also handle the situation where the longest streak is the first or last streak.
The main idea is, for each date on which the employee was present, to find the streak start and the streak end.
For each row where the employee was present, the streak start is the latest date before the current row's date on which the employee was absent.
Here is an alternate version that handles missing days differently. Say that you only store a row for work days, and that being at work Monday-Friday one week and Monday-Friday the next week counts as ten consecutive days. This query assumes that missing dates in the middle of a series of rows are non-work days.
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(datediff(day, LowerDate, UpperDate) + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
go
with NumberedRows as (select EmployeeId
, "DATE"
, IsPresent
, row_number() over (partition by EmployeeId
order by "DATE") as RN
-- , min("DATE") over (partition by EmployeeId, IsPresent) as MinDate
-- , max("DATE") over (partition by EmployeeId, IsPresent) as MaxDate
from T)
, LowerBound as (select SecondRow.EmployeeId
, SecondRow.RN
, row_number() over (partition by SecondRow.EmployeeId
order by SecondRow.RN) as LowerBoundRN
from NumberedRows SecondRow
left outer join NumberedRows FirstRow
on FirstRow.IsPresent = 1
and FirstRow.EmployeeId = SecondRow.EmployeeId
and FirstRow.RN + 1 = SecondRow.RN
where FirstRow.EmployeeId is null
and SecondRow.IsPresent = 1)
, UpperBound as (select FirstRow.EmployeeId
, FirstRow.RN
, row_number() over (partition by FirstRow.EmployeeId
order by FirstRow.RN) as UpperBoundRN
from NumberedRows FirstRow
left outer join NumberedRows SecondRow
on SecondRow.IsPresent = 1
and FirstRow.EmployeeId = SecondRow.EmployeeId
and FirstRow.RN + 1 = SecondRow.RN
where SecondRow.EmployeeId is null
and FirstRow.IsPresent = 1)
select LB.EmployeeId, max(UB.RN - LB.RN + 1)
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.LowerBoundRN = UB.UpperBoundRN
group by LB.EmployeeId
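The row-based variant can be sanity-checked the same way: because streaks are counted over consecutive rows rather than consecutive calendar dates, a weekend with no rows does not break a streak. Below is a minimal sketch in plain Python; the longest_row_streaks function and the sample rows are hypothetical and only illustrate the assumption that missing dates are non-work days.

```python
from itertools import groupby

def longest_row_streaks(rows):
    """rows: (employee_id, iso_date, is_present) tuples; returns {employee_id: streak}."""
    result = {}
    # Sorting the tuples orders them by employee, then by ISO date (chronologically).
    for emp, emp_rows in groupby(sorted(rows), key=lambda r: r[0]):
        # Split this employee's rows into runs of equal is_present and keep present runs.
        run_lengths = [
            sum(1 for _ in run)
            for present, run in groupby(emp_rows, key=lambda r: r[2])
            if present
        ]
        if run_lengths:
            result[emp] = max(run_lengths)
    return result

# Hypothetical data: employee 1 works Mon-Fri, then the following Monday
# (no rows exist for the weekend), is absent on Tuesday and back on Wednesday.
rows = [
    (1, "2000-01-03", 1), (1, "2000-01-04", 1), (1, "2000-01-05", 1),
    (1, "2000-01-06", 1), (1, "2000-01-07", 1),
    (1, "2000-01-10", 1), (1, "2000-01-11", 0), (1, "2000-01-12", 1),
    (2, "2000-01-03", 0),
]

streaks = longest_row_streaks(rows)
print(streaks)
```

Here the Friday-to-Monday gap is bridged, so employee 1 gets a streak of six rows, while a date-based count would stop at five.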
I did this once to determine the consecutive days on which a firefighter had been on shift for at least 15 minutes.
Your case is a bit simpler.
If you could assume that no employee was present more than 100 consecutive times (the default recursion limit, which OPTION (MAXRECURSION n) can raise), you could just use a recursive Common Table Expression. But a better approach would be to use a temp table and a WHILE loop.
You will need a column called StartingRowID. Keep joining from your temp table to the employeeWorkDay table to find the next consecutive employee work day, and insert those rows back into the temp table. When @@ROWCOUNT = 0, you have captured the longest streak.
Now aggregate by StartingRowID to get the first day of the longest streak. I'm running short on time, or I would include some sample code.