Related
everyone. I am a beginner of Postgresql. Recently I met with one question.
I have one table named 'sales'.
create table sales
(
cust varchar(20),
prod varchar(20),
day integer,
month integer,
year integer,
state char(2),
quant integer
);
insert into sales values ('Bloom', 'Pepsi', 2, 12, 2001, 'NY', 4232);
insert into sales values ('Knuth', 'Bread', 23, 5, 2005, 'PA', 4167);
insert into sales values ('Emily', 'Pepsi', 22, 1, 2006, 'CT', 4404);
insert into sales values ('Emily', 'Fruits', 11, 1, 2000, 'NJ', 4369);
insert into sales values ('Helen', 'Milk', 7, 11, 2006, 'CT', 210);
insert into sales values ('Emily', 'Soap', 2, 4, 2002, 'CT', 2549);
insert into sales values ('Bloom', 'Eggs', 30, 11, 2000, 'NJ', 559);
....
There are 498 rows in total.
Here is the overview of this table:
Now I want to compute the maximum and minimum sales quantities for each product, along with their corresponding customer (who purchased the product), dates (i.e., dates of those maximum and minimum sales quantities) and the state in which the sale transaction took place.
And the average sales quantity for the corresponding products.
The combined one should be like this:
It should have 10 rows because there are 10 distinct products in total.
I have tried:
select prod,
max(quant),
cust as MAX_CUST
from sales
group by prod;
but it returned an error and said the cust should be in the group by. But I only want to classify by the type of product.
What's more, how can I horizontally combine the max_q and its customer, date, state with min_q and its customer, date, state and also the AVG_Q by their product name?
I feel really confused!
You can use analytic function ROW_NUMBER to rank records by increasing/decreasing sales for each product in a subquery, and then do conditional aggregation:
SELECT
prod product,
MAX(CASE WHEN rn2 = 1 THEN quant END) max_quant,
MAX(CASE WHEN rn2 = 1 THEN cust END) max_cust,
MAX(CASE WHEN rn2 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) max_date,
MAX(CASE WHEN rn2 = 1 THEN state END) max_state,
MAX(CASE WHEN rn1 = 1 THEN quant END) min_quant,
MAX(CASE WHEN rn1 = 1 THEN cust END) min_cust,
MAX(CASE WHEN rn1 = 1 THEN TO_DATE(year || '-' || month || '-' || day, 'YYYY-MM-DD') END) min_date,
MAX(CASE WHEN rn1 = 1 THEN state END) min_state,
avg_quant
FROM (
SELECT
s.*,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant) rn1,
ROW_NUMBER() OVER(PARTITION BY prod ORDER BY quant DESC) rn2,
AVG(quant) OVER(PARTITION BY prod) avg_quant
FROM sales s
) x
WHERE rn1 = 1 OR rn2 = 1
GROUP BY prod, avg_quant
With two aggregate function (min, max) applied on a column and selecting respective row is not that straight forward. if u wanted only one aggregate function u could do something like example below with dense rank (window function).
SELECT prod, quant cust,
dense_rank() OVER (PARTITION BY prod ORDER BY quant DESC) AS c_rank
FROM sales WHERE c_rank < 2;
this will give you rows for a product with maximum quant. you can do same for minimum quant. it will more complicated to do both in same query, you can do it in simple way of creating on the fly tables for each case and joining them as show below.
with max_quant as (
SELECT prod, quant cust,
dense_rank() OVER (PARTITION BY prod ORDER BY quant DESC) AS c_rank
FROM sales WHERE c_rank < 2
),
min_quant as (
SELECT prod, quant cust,
dense_rank() OVER (PARTITION BY prod ORDER BY quant DESC) AS c_rank
FROM sales WHERE c_rank < 2
),
avg_quant as (
select prod, avg(quant) as avg_quant from sales group by prod
)
select mx.prod, mx.quant, mx.cust, mn.quant, mn.cust, ag.avg_quant
from max_quant mx
join min_quant mn on mn.prod = mx.prod
join avg_quant ag on ag.prod = mx.prod;
you cant use a group by to select min/max here as you want to get the complete row for the min/max value of quant which is not possible directly with group by.
I have a TSQL query that works when executed through SSMS but when using the same query in an SSRS dataset it does not produce any results.
I removed the where clause in the TSQL statement which then displayed the results in the ssrs report but i need the where cluase in the TSQL statement to produce the output.
I have changed the where clause data type using cast/convert/int/varchar/date without any success.
Does anybody have a solution?
Although i tried filtering in the SSRS report itself which also did not work, i don't want to use this technique and the results will be output in csv.
Any ideas, thoughts, solutions appreciated - Thanks in advance.
Update: The report has no parameters and is designed to drop a csv based on contents of the sql below when subscribed to run;
select TenancyRef, FullName, Telephone1, Telephone2,convert(varchar (2),weekucpayday) wkpayday, substring(convert(varchar,getdate(),103),1,2) daypayday --,Currentdate, paydate,
from (
SELECT CAST(RTRIM(ZZ.propref) AS nvarchar) + CAST(RTRIM(ZZ.tensuffix) AS nvarchar) + CAST(RTRIM(ZZ.checkdigit) AS nvarchar) AS TenancyRef, RTRIM(ZZ.perstitle) + ' ' + RTRIM(ZZ.perfnames)
+ ' ' + RTRIM(ZZ.persname) AS FullName,
ZZ.Mobile_Number AS Telephone1,
ZZ.Home_Number AS Telephone2,
ZZ.tenind AS Status,
ZZ.ArrearsBalance,
a.ucdaycode,
ZZ.tenaltkey,
cast(cast(a.ucdaycode as varchar) +'/' + cast(month(GETDATE()) as varchar) +'/' + cast(YEAR(GETDATE()) as varchar ) as date) as paydate,
datename(dw,convert(date,(cast(a.ucdaycode as varchar) +'/' + cast(month(GETDATE()) as varchar) +'/' + cast(YEAR(GETDATE()) as varchar )),103)) payday,
datepart(dd,(case when datepart(dw,cast(DATEADD(dd, a.ucdaycode, DATEADD(month, DATEDIFF(month, 0, GETDATE()), -1)) as date)) = 6 then
cast(DATEADD(dd, a.ucdaycode, DATEADD(month, DATEDIFF(month, 0, GETDATE()), -2)) as date)
when datepart(dw,cast(DATEADD(dd, a.ucdaycode, DATEADD(month, DATEDIFF(month, 0, GETDATE()), -1)) as date)) = 7 then
cast(DATEADD(dd, a.ucdaycode, DATEADD(month, DATEDIFF(month, 0, GETDATE()), -3)) as date)
else cast(DATEADD(dd, a.ucdaycode, DATEADD(month, DATEDIFF(month, 0, GETDATE()), -1)) as date)
end)) as weekucpayday,
cast(DATEADD(dd, a.ucdaycode, DATEADD(month, DATEDIFF(month, 0, GETDATE()), -1)) as date) AS ucpayday
, convert(nvarchar,GETDATE(),1) Currentdate
FROM [ucdetail ] AS a INNER JOIN
(SELECT Z.propref, Z.tensuffix, Z.checkdigit, Z.tenind, Z.tenaltkey, Z.ArrearsBalance, Z.paymthdcode, T3.perstitle, T3.perfnames, T3.persname, T1.contdetails AS Mobile_Number,
T2.contdetails AS Home_Number
FROM (SELECT T.propref, T.tensuffix, T.tenaltkey, T.tenind, T.checkdigit, occupancy.relncode, occupancy.occend, occupancy.perref AS OccPerref, T.tenarrsbal AS ArrearsBalance, T.paymthdcode
FROM tenancy AS T LEFT OUTER JOIN
occupancy ON T.tenaltkey = occupancy.tenaltkey
WHERE (occupancy.relncode = 'T1') AND (GETDATE() < occupancy.occend) OR
(occupancy.relncode = 'T1') AND (occupancy.occend IS NULL)) AS Z LEFT OUTER JOIN
(SELECT perref, contdetails
FROM [contdetails ] AS C1
WHERE (UPPER(commtype) = 'T') AND (UPPER(commdesccode) = 'MOB') AND (UPPER(activeind) = 'Y') AND (UPPER(preferred) = 'Y')) AS T1 ON Z.OccPerref = T1.perref LEFT OUTER JOIN
(SELECT perref, contdetails
FROM [contdetails ] AS C2
WHERE (UPPER(commtype) = 'T') AND (UPPER(commdesccode) = 'HOME') AND (UPPER(activeind) = 'Y') AND (UPPER(preferred) = 'Y')) AS T2 ON Z.OccPerref = T2.perref LEFT OUTER JOIN
(SELECT perstitle, perfnames, persname, perref
FROM person) AS T3 ON Z.OccPerref = T3.perref) AS ZZ ON a.tenaltkey = ZZ.tenaltkey
WHERE ZZ.tenaltkey not in (select postref from tranmisc where tranmisc.acctcode = 'ATBEN') and ZZ.paymthdcode != 'DD' and (ZZ.Mobile_Number IS NOT NULL OR
ZZ.Home_Number IS NOT NULL) AND (a.status = 'Active') AND (ZZ.tenind = 'C') AND (a.ucdaycode <> 0) --AND (a.ucdaycode = DATEPART(dd, GETDATE()))
) AS UC where convert(int,UC.weekucpayday) = convert(int,(DATEPART(dd,getdate())))
I found in our database a cursor statement and I would like to replace it.
Declare #max_date datetime
Select #max_date = max(finished) From Payments
Declare #begin_date datetime = '2015-02-01'
Declare #end_of_last_month datetime
While #begin_date <= #max_date
Begin
SELECT #end_of_last_month = CAST(DATEADD(DAY, -1 , DATEFROMPARTS(YEAR(#begin_date),MONTH(#begin_date),1)) AS DATE) --AS end_of_last_month
Insert Into #table(Customer, ArticleTypeID, ArticleType, end_of_month, month, year)
Select Count(distinct (customerId)), prod.ArticleTypeID, at.ArticleType, #end_of_last_month, datepart(month, #end_of_last_month), datepart(year, #end_of_last_month)
From Customer cust
Inner join Payments pay ON pay.member_id = m.member_id
Inner Join Products prod ON prod.product_id = pay.product_id
Inner Join ArticleType at ON at.ArticleTypeID = prod.ArticleTypeID
Where #end_of_last_month between begin_date and expire_date
and completed = 1
Group by prod.ArticleTypeID, at.ArticleType
order by prod.ArticleTypeID, at.ArticleType
Set #begin_date = DATEADD(month, 1, #begin_date)
End
It groups all User per Month where the begin- and expire date in the actual Cursormonth.
Notes:
The user has different payment types, for e.g. 1 Month, 6 Month and so on.
Is it possible to rewrite the code - my problem is only the identification at the where clause (#end_of_last_month between begin_date and expire_date)
How can I handle this with joins or cte's?
What you need first, if not already is a numbers table
Using said Numbers table you can create a dynamic list of dates for "end_of_Last_Month" like so
;WITH ctexAllDates
AS (
SELECT end_of_last_month = DATEADD(DAY, -1, DATEADD(MONTH, N.N -1, #begin_date))
FROM
dbo.Numbers N
WHERE
N.N <= DATEDIFF(MONTH, #begin_date, #max_date) + 1
)
select * FROM ctexAllDates
Then combine with your query like so
;WITH ctexAllDates
AS (
SELECT end_of_last_month = DATEADD(DAY, -1, DATEADD(MONTH, N.N -1, #begin_date))
FROM
dbo.Numbers N
WHERE
N.N <= DATEDIFF(MONTH, #begin_date, #max_date) + 1
)
INSERT INTO #table
(
Customer
, ArticleTypeID
, ArticleType
, end_of_month
, month
, year
)
SELECT
COUNT(DISTINCT (customerId))
, prod.ArticleTypeID
, at.ArticleType
, A.end_of_last_month
, DATEPART(MONTH, A.end_of_last_month)
, DATEPART(YEAR, A.end_of_last_month)
FROM
Customer cust
INNER JOIN Payments pay ON pay.member_id = m.member_id
INNER JOIN Products prod ON prod.product_id = pay.product_id
INNER JOIN ArticleType at ON at.ArticleTypeID = prod.ArticleTypeID
LEFT JOIN ctexAllDates A ON A.end_of_last_month BETWEEN begin_date AND expire_date
WHERE completed = 1
GROUP BY
prod.ArticleTypeID
, at.ArticleType
, A.end_of_last_month
ORDER BY
prod.ArticleTypeID
, at.ArticleType;
I was trying to solve the issue of returning sum for each date in a specific date range, no matter if data is present for a day or not.
I have found that the best way would be to use a table pre-populated with all dates, select date range and union it with my data.
For some reason couldn't get left join to work, but union looks like working almost perfectly. The only issue is that it returns duplicates for dates where data is present in my data table.
SELECT NULL AS Visitors
,D.DATE AS Day
FROM Database.support.dates D
WHERE D.DATE BETWEEN #Start_date
AND #End_date
UNION
SELECT count(V.id) AS Visitors
,DATEADD(day, 0, DATEDIFF(day, 0, V.CreateTime)) AS Day
FROM Database.Clients.Clients V
WHERE V.CreateTime BETWEEN #Start_date
AND #End_date
AND V.WADID = #WADID
AND (
V.WAPID = #WAPID
OR #WAPID IS NULL
)
GROUP BY DATEADD(day, 0, DATEDIFF(day, 0, V.CreateTime))
ORDER BY Day DESC
Was researching it whole last day and still can't get it working :/
I think that left joining your calendar table to your current query actually was the right thing to do. That being said, you can remedy your current situation by simply aggregating on the day, e.g. using MAX. By default, NULL values for each day would be ignored, so long as there is a non null visitor count present:
WITH cte AS (
SELECT NULL AS Visitors, D.DATE AS Day
FROM Database.support.dates D
WHERE D.DATE BETWEEN #Start_date AND #End_date
UNION
SELECT COUNT(V.id), DATEADD(day, 0, DATEDIFF(day, 0, V.CreateTime))
FROM Database.Clients.Clients V
WHERE V.CreateTime BETWEEN #Start_date AND #End_date AND
V.WADID = #WADID AND (V.WAPID = #WAPID OR #WAPID IS NULL)
GROUP BY DATEADD(day, 0, DATEDIFF(day, 0, V.CreateTime))
)
SELECT Day, MAX(Visitors) AS Visitors -- filter off unwanted NULL values
FROM cte
GROUP BY Day
ORDER BY Day DESC;
I have
TABLE EMPLOYEE - ID,DATE,IsPresent
I want to calculate longest streak for a employee presence.The Present bit will be false for days he didnt come..So I want to calculate the longest number of days he came to office for consecutive dates..I have the Date column field is unique...So I tried this way -
Select Id,Count(*) from Employee where IsPresent=1
But the above doesnt work...Can anyone guide me towards how I can calculate streak for this?....I am sure people have come across this...I tried searching online but...didnt understand it well...please help me out..
EDIT Here's a SQL Server version of the query:
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(datediff(day, LowerDate, UpperDate) + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
SQL Server Version of the test data:
create table T (EmployeeId int
, "DATE" date not null
, IsPresent bit not null
, constraint T_PK primary key (EmployeeId, "DATE")
)
insert into T values (1, '2000-01-01', 1);
insert into T values (2, '2000-01-01', 0);
insert into T values (3, '2000-01-01', 0);
insert into T values (3, '2000-01-02', 1);
insert into T values (3, '2000-01-03', 1);
insert into T values (3, '2000-01-04', 0);
insert into T values (3, '2000-01-05', 1);
insert into T values (3, '2000-01-06', 1);
insert into T values (3, '2000-01-07', 0);
insert into T values (4, '2000-01-01', 0);
insert into T values (4, '2000-01-02', 1);
insert into T values (4, '2000-01-03', 1);
insert into T values (4, '2000-01-04', 1);
insert into T values (4, '2000-01-05', 1);
insert into T values (4, '2000-01-06', 1);
insert into T values (4, '2000-01-07', 0);
insert into T values (5, '2000-01-01', 0);
insert into T values (5, '2000-01-02', 1);
insert into T values (5, '2000-01-03', 0);
insert into T values (5, '2000-01-04', 1);
insert into T values (5, '2000-01-05', 1);
insert into T values (5, '2000-01-06', 1);
insert into T values (5, '2000-01-07', 0);
Sorry, this is written in Oracle, so substitute the appropriate SQL Server date arithmetic.
Assumptions:
Date is either a Date value or
DateTime with time component of
00:00:00.
The primary key is
(EmployeeId, Date)
All fields are not null
If a date is missing for the employee, they were not present. (Used to handle the beginning and ending of the data series, but also means that missing dates in the middle will break streaks. Could be a problem depending on requirements.
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = second_day."DATE" - 1
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = second_day."DATE" - 1
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(UpperDate - LowerDate + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
Test Data:
create table T (EmployeeId number(38)
, "DATE" date not null check ("DATE" = trunc("DATE"))
, IsPresent number not null check (IsPresent in (0, 1))
, constraint T_PK primary key (EmployeeId, "DATE")
)
/
insert into T values (1, to_date('2000-01-01', 'YYYY-MM-DD'), 1);
insert into T values (2, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-03', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-04', 'YYYY-MM-DD'), 0);
insert into T values (3, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (3, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
insert into T values (4, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (4, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-03', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-04', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (4, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-01', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-02', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-03', 'YYYY-MM-DD'), 0);
insert into T values (5, to_date('2000-01-04', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-05', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-06', 'YYYY-MM-DD'), 1);
insert into T values (5, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
groupby is missing.
To select total man-days (for everyone) attendance of the whole office.
Select Id,Count(*) from Employee where IsPresent=1
To select man-days attendance per employee.
Select Id,Count(*)
from Employee
where IsPresent=1
group by id;
But that is still not good because it counts the total days of attendance and NOT the length of continuous attendance.
What you need to do is construct a temp table with another date column date2. date2 is set to today. The table is the list of all days an employee is absent.
create tmpdb.absentdates as
Select id, date, today as date2
from EMPLOYEE
where IsPresent=0
order by id, date;
So the trick is to calculate the date difference between two absent days to find the length of continuously present days.
Now, fill in date2 with the next absent date per employee. The most recent record per employee will not be updated but left with value of today because there is no record with greater date than today in the database.
update tmpdb.absentdates
set date2 =
select min(a2.date)
from
tmpdb.absentdates a1,
tmpdb.absentdates a2
where a1.id = a2.id
and a1.date < a2.date
The above query updates itself by performing a join on itself and may cause deadlock query so it is better to create two copies of the temp table.
create tmpdb.absentdatesX as
Select id, date
from EMPLOYEE
where IsPresent=0
order by id, date;
create tmpdb.absentdates as
select *, today as date2
from tmpdb.absentdatesX;
You need to insert the hiring date, presuming the earliest date per employee in the database is the hiring date.
insert into tmpdb.absentdates a
select a.id, min(e.date), today
from EMPLOYEE e
where a.id = e.id
Now update date2 with the next later absent date to be able to perform date2 - date.
update tmpdb.absentdates
set date2 =
select min(x.date)
from
tmpdb.absentdates a,
tmpdb.absentdatesX x
where a.id = x.id
and a.date < x.date
This will list the length of days an emp is continuously present:
select id, datediff(date2, date) as continuousPresence
from tmpdb.absentdates
group by id, continuousPresence
order by id, continuousPresence
But you only want to longest streak:
select id, max(datediff(date2, date) as continuousPresence)
from tmpdb.absentdates
group by id
order by id
However, the above is still problematic because datediff does not take into account holidays and weekends.
So we depend on the count of records as the legitimate working days.
create tmpdb.absentCount as
Select a.id, a.date, a.date2, count(*) as continuousPresence
from EMPLOYEE e, tmpdb.absentdates a
where e.id = a.id
and e.date >= a.date
and e.date < a.date2
group by a.id, a.date
order by a.id, a.date;
Remember, every time you use an aggregator like count, ave
yo need to groupby the selected item list because it is common sense that you have to aggregate by them.
Now select the max streak
select id, max(continuousPresence)
from tmpdb.absentCount
group by id
To list the dates of streak:
select id, date, date2, continuousPresence
from tmpdb.absentCount
group by id
having continuousPresence = max(continuousPresence);
There may be some mistakes (sql server tsql) above but this is the general idea.
Try this:
select
e.Id,
e.date,
(select
max(e1.date)
from
employee e1
where
e1.Id = e.Id and
e1.date < e.date and
e1.IsPresent = 0) StreakStartDate,
(select
min(e2.date)
from
employee e2
where
e2.Id = e.Id and
e2.date > e.date and
e2.IsPresent = 0) StreakEndDate
from
employee e
where
e.IsPresent = 1
Then finds out the longest streak for each employee:
select id, max(datediff(streakStartDate, streakEndDate))
from (<use subquery above>)
group by id
I'm not fully sure this query has correct syntax because I havn't database just now.
Also notice streak start and streak end columns contains not the first and last day when employee was present, but nearest dates when he was absent. If dates in table have approximately equal distance, this does not means, otherwise query become little more complex, because we need to finds out nearest presence dates. Also this improvements allow to handle situation when the longest streak is first or last streak.
The main idea is for each date when employee was present find out streak start and streak end.
For each row in table when employee was present, streak start is maximum date that is less then date of current row when employee was absent.
Here is an alternate version, to handle missing days differently. Say that you only record a record for work days, and being at work Monday-Friday one week and Monday-Friday of the next week counts as ten consecutive days. This query assumes that missing dates in the middle of a series of rows are non-work days.
with LowerBound as (select second_day.EmployeeId
, second_day."DATE" as LowerDate
, row_number() over (partition by second_day.EmployeeId
order by second_day."DATE") as RN
from T second_day
left outer join T first_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and first_day.IsPresent = 1
where first_day.EmployeeId is null
and second_day.IsPresent = 1)
, UpperBound as (select first_day.EmployeeId
, first_day."DATE" as UpperDate
, row_number() over (partition by first_day.EmployeeId
order by first_day."DATE") as RN
from T first_day
left outer join T second_day
on first_day.EmployeeId = second_day.EmployeeId
and first_day."DATE" = dateadd(day, -1, second_day."DATE")
and second_day.IsPresent = 1
where second_day.EmployeeId is null
and first_day.IsPresent = 1)
select LB.EmployeeID, max(datediff(day, LowerDate, UpperDate) + 1) as LongestStreak
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.RN = UB.RN
group by LB.EmployeeId
go
with NumberedRows as (select EmployeeId
, "DATE"
, IsPresent
, row_number() over (partition by EmployeeId
order by "DATE") as RN
-- , min("DATE") over (partition by EmployeeId, IsPresent) as MinDate
-- , max("DATE") over (partition by EmployeeId, IsPresent) as MaxDate
from T)
, LowerBound as (select SecondRow.EmployeeId
, SecondRow.RN
, row_number() over (partition by SecondRow.EmployeeId
order by SecondRow.RN) as LowerBoundRN
from NumberedRows SecondRow
left outer join NumberedRows FirstRow
on FirstRow.IsPresent = 1
and FirstRow.EmployeeId = SecondRow.EmployeeId
and FirstRow.RN + 1 = SecondRow.RN
where FirstRow.EmployeeId is null
and SecondRow.IsPresent = 1)
, UpperBound as (select FirstRow.EmployeeId
, FirstRow.RN
, row_number() over (partition by FirstRow.EmployeeId
order by FirstRow.RN) as UpperBoundRN
from NumberedRows FirstRow
left outer join NumberedRows SecondRow
on SecondRow.IsPresent = 1
and FirstRow.EmployeeId = SecondRow.EmployeeId
and FirstRow.RN + 1 = SecondRow.RN
where SecondRow.EmployeeId is null
and FirstRow.IsPresent = 1)
select LB.EmployeeId, max(UB.RN - LB.RN + 1)
from LowerBound LB
inner join UpperBound UB
on LB.EmployeeId = UB.EmployeeId
and LB.LowerBoundRN = UB.UpperBoundRN
group by LB.EmployeeId
I did this once to determine consecutive days that a fire fighter had been on shift at least 15 minutes.
Your case is a bit more simple.
If you wanted to assume that no employee came more than 32 consecutive times, you could just use a Common Table Expression. But a better approach would be to use a temp table and a while loop.
You will need a column called StartingRowID. Keep joining from your temp table to the employeeWorkDay table for the next consecutive employee work day and insert them back into the temp table. When ##Row_Count = 0, you have captured the longest streak.
Now aggregate by StartingRowID to get the first day of the longest streak. I'm running short on time, or I would include some sample code.