TSQL, counting coherent days of holiday - tsql

I hope someone can help me on this one. :-)
I want to count coherent (consecutive) periods of holiday to see whether anyone had more than three days of holiday in a row. In other words, counting the total number of days is not enough; the days have to be consecutive. In the example data below I show three people, each with their own days of holiday. Person 1234 has two periods of two days in a row, so this person has no period above three days, since there is a gap (the 3rd) between the two periods. Persons 1235 and 1236 each have one period above three days. The time of day in the timestamps is irrelevant, so the data can be treated as plain dates.
What I have:
ID    Start
1234  2022-01-01 00:00:00
1234  2022-01-02 00:00:00
1234  2022-01-04 06:50:00
1234  2022-01-05 06:50:00
1235  2022-01-04 06:50:00
1235  2022-01-05 06:50:00
1235  2022-01-06 00:00:00
1236  2022-01-01 00:00:00
1236  2022-01-02 00:00:00
1236  2022-01-03 06:50:00
1236  2022-01-04 06:50:00
1236  2022-01-05 06:50:00
1236  2022-01-08 00:00:00
What I hope to get:
ID    N holidays > 3 days
1234  0
1235  1
1236  1
Anyways, any help will be appreciated!
Kind regards,
Jacob

This is a "gaps and islands" problem. You first need to group the data into "islands", which in your case are groups of consecutive holidays, and then summarize them in your final result set.
Side note: your question requests greater than 3 days, but your expected output uses greater than or equal to 3 so I used that instead.
DROP TABLE IF EXISTS #Holiday;
DROP TABLE IF EXISTS #ConsecutiveHoliday
CREATE TABLE #Holiday (ID INT,StartDateTime DATETIME)
INSERT INTO #Holiday
VALUES (1234,'2022-01-01 00:00:00')
,(1234,'2022-01-02 00:00:00')
,(1234,'2022-01-04 06:50:00')
,(1234,'2022-01-05 06:50:00')
,(1235,'2022-01-04 06:50:00')
,(1235,'2022-01-05 06:50:00')
,(1235,'2022-01-06 00:00:00')
,(1236,'2022-01-01 00:00:00')
,(1236,'2022-01-02 00:00:00')
,(1236,'2022-01-03 06:50:00')
,(1236,'2022-01-04 06:50:00')
,(1236,'2022-01-05 06:50:00')
,(1236,'2022-01-08 00:00:00');
WITH cte_Previous AS (
SELECT A.ID,B.StartDate
,IsHolidayConsecutive = CASE WHEN DATEADD(day,-1,StartDate) /*Current day minus 1*/ = LAG(StartDate) OVER (PARTITION BY ID ORDER BY StartDate) /*Previous holiday date*/
THEN 0
ELSE 1
END
FROM #Holiday AS A
CROSS APPLY (SELECT StartDate = CAST(StartDateTime AS DATE)) AS B
),
cte_Groups AS (
SELECT *,GroupID = SUM(IsHolidayConsecutive) OVER (PARTITION BY ID ORDER BY StartDate)
FROM cte_Previous
)
/*Groups of holidays taken consecutively*/
SELECT ID
,StartDate = MIN(StartDate)
,EndDate = MAX(StartDate)
,NumOfDays = COUNT(*)
INTO #ConsecutiveHoliday
FROM cte_Groups
GROUP BY ID,GroupID
ORDER BY ID,StartDate
/*See list of consecutive holidays taken*/
SELECT *
FROM #ConsecutiveHoliday
/*Formatted result*/
SELECT ID
,[N holidays >= 3 days] = COUNT(CASE WHEN NumOfDays >= 3 THEN 1 END)
FROM #ConsecutiveHoliday
GROUP BY ID
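As a cross-check outside the database, the same gaps-and-islands idea can be sketched in Python. This is only an illustration under stated assumptions: the function names `island_lengths` and `count_long_holidays` are made up for this sketch, the timestamps are already truncated to dates, and duplicate dates for a person are dropped before counting.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question, already truncated to dates.
holidays = [
    (1234, date(2022, 1, 1)), (1234, date(2022, 1, 2)),
    (1234, date(2022, 1, 4)), (1234, date(2022, 1, 5)),
    (1235, date(2022, 1, 4)), (1235, date(2022, 1, 5)), (1235, date(2022, 1, 6)),
    (1236, date(2022, 1, 1)), (1236, date(2022, 1, 2)), (1236, date(2022, 1, 3)),
    (1236, date(2022, 1, 4)), (1236, date(2022, 1, 5)), (1236, date(2022, 1, 8)),
]

def island_lengths(days):
    """Return the length of each run of consecutive dates (hypothetical helper)."""
    lengths = []
    previous = None
    for day in sorted(set(days)):  # assumption: duplicate dates are ignored
        if previous is not None and day - previous == timedelta(days=1):
            lengths[-1] += 1       # previous day was also a holiday: extend the island
        else:
            lengths.append(1)      # first row or a gap: start a new island
        previous = day
    return lengths

def count_long_holidays(rows, min_days=3):
    """Per ID, count the islands that last at least min_days days."""
    result = {}
    for person, group in groupby(sorted(rows), key=lambda r: r[0]):
        lengths = island_lengths(d for _, d in group)
        result[person] = sum(1 for n in lengths if n >= min_days)
    return result

long_holidays = count_long_holidays(holidays)
print(long_holidays)  # {1234: 0, 1235: 1, 1236: 1}
```

The `min_days=3` default follows the "greater than or equal to 3" reading used in the answer; pass `min_days=4` for a strict "more than three days".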

Related

How can I get weekly sales for every salesman

I have a table like below (tablename: sales)
sales_datetime       sales  salesman
2022-08-01 09:00:00  100    John
2022-08-01 11:00:00  200    John
2022-08-02 10:00:00  100    Peter
2022-08-02 13:00:00  300    John
2022-08-04 14:00:00  300    Peter
2022-08-05 12:00:00  100    John
2022-08-05 16:00:00  200    John
From that table I want to build a sales summary covering a 5-day period for each salesman. The summary table I want looks like this:
periode     total_sales  salesman
2022-08-01  300          John
2022-08-01  0            Peter
2022-08-02  300          John
2022-08-02  100          Peter
2022-08-03  0            John
2022-08-03  0            Peter
2022-08-04  0            John
2022-08-04  300          Peter
2022-08-05  300          John
2022-08-05  0            Peter
I have created the following query (PostgreSQL), but the results are not what I want. Assume today is 2022-08-05.
with dateseries as
(select generate_series(current_date-'4 days'::interval,
current_date::date,
'1 day'::interval)::date as periode)
select d.periode,coalesce(sum(s.sales),0) as total_sales,s.salesman from dateseries d
left outer join sales s
on d.periode=s.sales_datetime::date
group by d.periode, s.salesman order by d.periode
Results:
periode     total_sales  salesman
2022-08-01  300          John
2022-08-02  300          John
2022-08-02  100          Peter
2022-08-03  0            (NULL)
2022-08-04  300          Peter
2022-08-05  300          John
Any advice would be great. Thank you.
Step by step: first aggregate the daily sales per salesperson (aggregated_sales CTE), then create a list of days to report (days CTE) and a list of salesmen (salesmen CTE), and finally query the sales for each day/salesman pair.
with aggregated_sales as
(
select sales_datetime::date sales_date, sum(sales) sales, salesman
from sales group by sales_datetime::date, salesman
),
days(sales_date) as
(
select d::date
from generate_series('2022-08-01', '2022-08-08', interval '1 day') d
),
salesmen (salesman) as
(
select distinct salesman from sales
)
select sales_date, coalesce(sales, 0) sales, salesman
from (select * from days cross join salesmen) fl
left outer join aggregated_sales ags using (sales_date, salesman);
The query may be shorter if CTEs are inlined yet I think that clarity and readability are more important than mere size.
In order to "make a summary sales for 5 days period for each salesman" replace generate_series('2022-08-01', '2022-08-08', interval '1 day') with generate_series(current_date - 4, current_date, interval '1 day').
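The shape of that approach (every day paired with every salesman, then filled from the aggregated totals with a default of 0) can be sketched in Python to check it against the desired table. The function name `daily_summary` and the `days_back` parameter are assumptions made for this sketch; they do not come from the query above.

```python
from datetime import date, datetime, timedelta
from itertools import product

# Sample rows from the question: (sales_datetime, sales, salesman).
sales = [
    (datetime(2022, 8, 1, 9), 100, "John"),
    (datetime(2022, 8, 1, 11), 200, "John"),
    (datetime(2022, 8, 2, 10), 100, "Peter"),
    (datetime(2022, 8, 2, 13), 300, "John"),
    (datetime(2022, 8, 4, 14), 300, "Peter"),
    (datetime(2022, 8, 5, 12), 100, "John"),
    (datetime(2022, 8, 5, 16), 200, "John"),
]

def daily_summary(rows, today, days_back=4):
    """Return (day, total_sales, salesman) for every day/salesman pair."""
    # Like the aggregated_sales CTE: total per (day, salesman).
    totals = {}
    for ts, amount, salesman in rows:
        key = (ts.date(), salesman)
        totals[key] = totals.get(key, 0) + amount
    # Like the days CTE: the reporting window, oldest day first.
    days = [today - timedelta(days=n) for n in range(days_back, -1, -1)]
    # Like the salesmen CTE: distinct names.
    salesmen = sorted({salesman for _, _, salesman in rows})
    # Cross join, then a "left join" that falls back to 0 when there is no match.
    return [(day, totals.get((day, s), 0), s) for day, s in product(days, salesmen)]

summary = daily_summary(sales, today=date(2022, 8, 5))
for row in summary:
    print(*row)
```

Because the day/salesman grid is built before the totals are looked up, pairs with no sales still appear with 0, which is what the original query with a single left join could not produce.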
You wrote: "the results were not same as I want. Assume today is 2022-08-05".
Please note that '2022-08-05'::date - '5 days'::interval will give you 2022-07-31, and not 2022-08-01 as you assume. Because of that, I think you meant it to be current_date - '4 days'::interval.
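The off-by-one is easy to confirm with plain date arithmetic. The sketch below is in Python rather than SQL, and the helper name `window` is made up for the illustration; it only shows that an inclusive range ending today needs an offset of 4 to cover 5 days.

```python
from datetime import date, timedelta

today = date(2022, 8, 5)

def window(end, days_back):
    """Inclusive list of dates from (end - days_back) through end."""
    start = end - timedelta(days=days_back)
    return [start + timedelta(days=n) for n in range(days_back + 1)]

five_back = window(today, 5)
four_back = window(today, 4)

print(five_back[0], len(five_back))  # 2022-07-31 6
print(four_back[0], len(four_back))  # 2022-08-01 5
```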
With that out of the way, here is one possible query:
with sales_by_date as (
select
salesman,
sales_datetime::date,
sum(sales) total_sales
from sales
where
-- assuming you need to have totals for salesmen that had sales in specified period only
sales_datetime::date between current_date-'4 days'::interval and current_date
group by
salesman,
sales_datetime::date),
dateseries as (
select
distinct salesman,
generate_series(current_date-'4 days'::interval, current_date, '1 day'::interval)::date as periode
from sales_by_date)
select
d.periode,
coalesce(s.total_sales, 0) total_sales,
d.salesman
from dateseries d
left join sales_by_date s
on d.periode = s.sales_datetime
and d.salesman = s.salesman
order by d.periode, d.salesman;
But you still have to figure out some requirements for this problem. E.g. what if for the specified period there are no sales at all in the sales table?

how to filter my database with only "Monday" queries?

I am trying to extract only Mondays from a timestamp column (in time, date, month format) in my database (I will do a count on it afterwards). I converted my dates to text and was able to get all the days in text format.
select to_char (payment_date, 'dy') as days from payment;
However, when I try to add a WHERE clause to filter the days, it gives an error.
select to_char (payment_date, 'dy') as days
from payment
where days.payment_date = 'mon';
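The WHERE clause fails because a column alias defined in the SELECT list (days) cannot be referenced in WHERE, and days.payment_date treats days as a table name. Filter on the expression itself instead. You might want something like this: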
Generate a table with all of the dates in the first half of 2022.
CREATE TABLE dat AS
SELECT the_day FROM GENERATE_SERIES
('2022-01-01'::TIMESTAMPTZ, '2022-06-30'::TIMESTAMPTZ, '1 DAY') AS t(the_day);
and then run:
SELECT
the_day::DATE,
EXTRACT(ISODOW FROM the_day),
to_char(the_day, 'Day')
FROM
dat
WHERE
EXTRACT(ISODOW FROM the_day) = 1;
Result:
the_day extract to_char
2022-01-03 1 Monday
2022-01-10 1 Monday
2022-01-17 1 Monday
2022-01-24 1 Monday
2022-01-31 1 Monday
2022-02-07 1 Monday
2022-02-14 1 Monday
2022-02-21 1 Monday
2022-02-28 1 Monday
2022-03-07 1 Monday
...
... snipped for brevity
...
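or similarly, filtering on the day name. Note that to_char blank-pads 'Day' and 'DAY' to 9 characters, so this comparison works for WEDNESDAY (exactly 9 characters), while a shorter name such as MONDAY needs the FM prefix, e.g. to_char(the_day, 'FMDAY') = 'MONDAY':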
SELECT
the_day::DATE,
EXTRACT(ISODOW FROM the_day),
to_char(the_day, 'DAY')
FROM
dat
WHERE
to_char(the_day, 'DAY') = 'WEDNESDAY';
Result:
the_day extract to_char
2022-01-05 3 WEDNESDAY
2022-01-12 3 WEDNESDAY
2022-01-19 3 WEDNESDAY
2022-01-26 3 WEDNESDAY
2022-02-02 3 WEDNESDAY
2022-02-09 3 WEDNESDAY
...
... snipped for brevity
...
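The Monday list above can be double-checked outside the database. The following Python sketch is only a sanity check, not part of the answer's SQL; the helper name `days_between` is hypothetical. It relies on `isoweekday()`, which numbers Monday as 1 through Sunday as 7, the same convention as ISODOW.

```python
from datetime import date, timedelta

def days_between(first, last):
    """Yield every date from first through last, inclusive."""
    for offset in range((last - first).days + 1):
        yield first + timedelta(days=offset)

# Keep only the Mondays in the first half of 2022 (isoweekday() == 1).
mondays = [d for d in days_between(date(2022, 1, 1), date(2022, 6, 30))
           if d.isoweekday() == 1]

print(len(mondays))                    # 26
print([str(d) for d in mondays[:3]])   # ['2022-01-03', '2022-01-10', '2022-01-17']
```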

Using the SUM OVER clause, how to check sum over period only when output is not greater than a certain value, otherwise use current month value?

Sample data:
select date, agent, sales
from agentsales
date agent sales
2021-01-03 00:00:00.000 Agent A 10
2021-02-05 00:00:00.000 Agent A 15
2021-03-10 00:00:00.000 Agent A 10
2021-01-05 00:00:00.000 Agent B 5
2021-02-06 00:00:00.000 Agent B 28
2021-03-10 00:00:00.000 Agent B 5
2021-01-02 00:00:00.000 Agent C 35
2021-02-04 00:00:00.000 Agent C 25
2021-03-08 00:00:00.000 Agent C 15
2021-01-01 00:00:00.000 Agent D 5
2021-02-02 00:00:00.000 Agent D 35
2021-03-10 00:00:00.000 Agent D 31
I want to get the counts of agents who have crossed 30 sales, such that if they have never crossed a total of 30 sales then consider sum over current and previous months, otherwise only current month.
Expected output:
YrMon Count_Agent_more_than_30_sales
Jan21 1
Feb21 2
Mar21 2
Logic:
Jan21 - 1 since only C has crossed 30 sales
Feb21 - 2 since B and D have crossed 30 sales. Agent D has crossed the 30 mark in the month, and B has crossed over period for first time. C is not considered as it previously crossed the 30 mark.
Mar21 - 2 since A and D have crossed 30 sales. Agent A has crossed mark over period for 1st time. D has crossed for the month. B is not considered as periodic case was already considered in last month. C is not considered as it already crossed 30 mark last month.
As mentioned above, I want to get the counts of agents who have crossed 30 sales, such that if they have never crossed a total of 30 sales then consider sum over current and previous months, otherwise only current month.
My query to calculate sum over period:
;WITH CTE AS (SELECT CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR) YRMON, AGENT, SUM(SALES) SALES
FROM AgentSales
GROUP BY CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR), AGENT
)
SELECT *, SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON) SUMOVERPERIOD FROM CTE
ORDER BY 2,1
YRMON AGENT SALES SUMOVERPERIOD
2021 1 Agent A 10 10
2021 2 Agent A 15 25
2021 3 Agent A 10 35
2021 1 Agent B 5 5
2021 2 Agent B 28 33
2021 3 Agent B 5 38
2021 1 Agent C 35 35
2021 2 Agent C 25 60
2021 3 Agent C 15 75
2021 1 Agent D 5 5
2021 2 Agent D 35 40
2021 3 Agent D 31 71
Now I am trying to apply the logic on the calculated sum:
;WITH CTE AS (SELECT CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR) YRMON, AGENT, SUM(SALES) SALES
FROM AgentSales
GROUP BY CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR), AGENT
)
SELECT *, SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON) SUMOVERPERIOD,
CASE WHEN SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON)>30 THEN 1 ELSE 0 END AS CALC
FROM CTE
ORDER BY 2,1
YRMON AGENT SALES SUMOVERPERIOD CALC
2021 1 Agent A 10 10 0
2021 2 Agent A 15 25 0
2021 3 Agent A 10 35 1
2021 1 Agent B 5 5 0
2021 2 Agent B 28 33 1
2021 3 Agent B 5 38 1
2021 1 Agent C 35 35 1
2021 2 Agent C 25 60 1
2021 3 Agent C 15 75 1
2021 1 Agent D 5 5 0
2021 2 Agent D 35 40 1
2021 3 Agent D 31 71 1
This query is always considering sum over current and previous period.
How to check whether the sales has previously crossed the 30 sales mark and for such cases to exclude doing the sum over period? For example can we apply LAG on the result of the SUM OVER column?
Please check whether one of these fits your needs (I think the description is confusing).
Option 1
-- If you want to count only the first time [agent] crossed 30 sales
;With MyCTE01 as (
SELECT
[date] = DATEADD(DAY, 1, EOMONTH([date], -1)), -- first day of the row's month, so DATENAME/YEAR below report the correct month
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
),
MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
-- The idea of using "and S - [sales] < 30" instead of ROW_NUMBER came from @Charlieface, but it is better to do the work on the DATE data type and not on a string
WHERE S > 30 and S - [sales] < 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
Option 2
-- If you want to count every [agent] whose running total has crossed 30 sales so far
;With MyCTE01 as (
SELECT
[date] = DATEADD(DAY, 1, EOMONTH([date], -1)),
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
)
,MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
WHERE S > 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
Option 3
-- If you want to count only the first time [agent] crossed 30 sales, or any month in which the sales are over 30
;With MyCTE01 as (
SELECT
[date] = DATEADD(DAY,1,EOMONTH([date], -1)),
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
)
,MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
-- The idea of using "and S - [sales] < 30" instead of ROW_NUMBER came from @Charlieface, but it is better to do the work on the DATE data type and not on a string
WHERE (S > 30 and S - [sales] < 30) or sales > 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
DDL+DML
USE tempdb
GO
DROP TABLE IF EXISTS [AgentSales]
GO
CREATE TABLE [AgentSales](id INT IDENTITY(1,1), [date] DATE, agent VARCHAR(100), sales INT)
GO
INSERT [AgentSales]([date],[agent],[sales]) VALUES
('2021-01-03 00:00:00.000','Agent A', 10),
('2021-02-05 00:00:00.000','Agent A', 15),
('2021-03-10 00:00:00.000','Agent A',10),
('2021-01-05 00:00:00.000','Agent B',5 ),
('2021-02-06 00:00:00.000','Agent B',28),
('2021-03-10 00:00:00.000','Agent B',5 ),
('2021-01-02 00:00:00.000','Agent C',35),
('2021-02-04 00:00:00.000','Agent C',25),
('2021-03-08 00:00:00.000','Agent C',15),
('2021-01-01 00:00:00.000','Agent D',5 ),
('2021-02-02 00:00:00.000','Agent D',35),
('2021-03-10 00:00:00.000','Agent D',31)
GO
SELECT [id],[date],[agent],[sales]
FROM [AgentSales]
GO
It looks like this should work for you.
You need to pre-aggregate the sales per agent and month, then take a running sum of that aggregate.
Then check whether each row crossed the mark in this month by comparing the month's sales with the running sum.
SELECT
YrMon = FORMAT(Month, 'yyyy MM'),
Count_Agent_more_than_30_sales =
COUNT(CASE WHEN SumOverPeriod >= 30 AND SumOverPeriod - sales < 30 OR sales >= 30 THEN 1 END)
FROM (
SELECT
Month = EOMONTH(date),
agent,
sales = SUM(sales),
SumOverPeriod = SUM(SUM(sales)) OVER (PARTITION BY agent ORDER BY EOMONTH(date)
ROWS UNBOUNDED PRECEDING)
FROM AgentSales
GROUP BY EOMONTH(date), agent
) sales
GROUP BY Month;
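To see that this condition reproduces the expected output, here is a small Python sketch of the same logic run on the sample data. It is a cross-check only; the function name `count_crossings` and the `threshold` parameter are assumptions made for the illustration.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (date, agent, sales).
rows = [
    (date(2021, 1, 3), "Agent A", 10), (date(2021, 2, 5), "Agent A", 15),
    (date(2021, 3, 10), "Agent A", 10), (date(2021, 1, 5), "Agent B", 5),
    (date(2021, 2, 6), "Agent B", 28), (date(2021, 3, 10), "Agent B", 5),
    (date(2021, 1, 2), "Agent C", 35), (date(2021, 2, 4), "Agent C", 25),
    (date(2021, 3, 8), "Agent C", 15), (date(2021, 1, 1), "Agent D", 5),
    (date(2021, 2, 2), "Agent D", 35), (date(2021, 3, 10), "Agent D", 31),
]

def count_crossings(rows, threshold=30):
    """Per (year, month), count agents that cross the threshold in that month."""
    # Pre-aggregate: sales per (agent, month).
    monthly = defaultdict(int)
    for day, agent, sales in rows:
        monthly[(agent, (day.year, day.month))] += sales

    running = defaultdict(int)                        # running total per agent
    counts = {month: 0 for _, month in monthly}       # every month starts at 0
    for (agent, month), sales in sorted(monthly.items()):
        before = running[agent]                       # SumOverPeriod - sales
        running[agent] += sales                       # SumOverPeriod
        first_crossing = before < threshold <= running[agent]
        big_month = sales >= threshold
        if first_crossing or big_month:
            counts[month] += 1
    return dict(sorted(counts.items()))

crossings = count_crossings(rows)
print(crossings)  # {(2021, 1): 1, (2021, 2): 2, (2021, 3): 2}
```

An agent is counted either in the month its running total first reaches the threshold, or in any month whose own sales reach it, which matches the Jan21 = 1, Feb21 = 2, Mar21 = 2 walkthrough in the question.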

T-SQL Dynamic Date based on Today's Month

My fiscal year begins on April 1 and I need to include 1 full year of historical data plus current fiscal year as of today. In DAX this looks like:
DATESBETWEEN(Calendar_Date
,IF(MONTH(TODAY()) < 4
,DATE(YEAR(TODAY())-2, 4, 1)
,DATE(YEAR(TODAY())-1, 4, 1)
)
,DATE(TODAY())
)
I need to create this same range as a filter in a T-SQL query, preferably in the "WHERE" clause, but I am totally new to sql and have been unsuccessful in finding a solution online. Any help from more experienced people would be much appreciated!
If you just want to compute this start date and use it in a WHERE filter, it is fairly straightforward date arithmetic, and you already have the logic in your DAX code:
declare @dates table(d date);
insert into @dates values
 ('20190101')
,('20190601')
,('20200213')
,('20201011')
,('20190101')
,(getdate())
;
select d
      -- shift back 3 months so April maps to January, take the start of the year before that, then add 3 months
      ,dateadd(month,3,dateadd(year,datediff(year,0,dateadd(month,-3,d))-1,0)) as TraditionalMethod
      ,case when month(d) < 4
            then datetime2fromparts(year(d)-2,4,1,0,0,0,0,0)
            else datetime2fromparts(year(d)-1,4,1,0,0,0,0,0)
       end as YourDAXTranslated
from @dates;
Which outputs:
d           TraditionalMethod        YourDAXTranslated
2019-01-01  2017-04-01 00:00:00.000  2017-04-01 00:00:00
2019-06-01  2018-04-01 00:00:00.000  2018-04-01 00:00:00
2020-02-13  2018-04-01 00:00:00.000  2018-04-01 00:00:00
2020-10-11  2019-04-01 00:00:00.000  2019-04-01 00:00:00
2019-01-01  2017-04-01 00:00:00.000  2017-04-01 00:00:00
2021-07-22  2020-04-01 00:00:00.000  2020-04-01 00:00:00
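The rule behind these values (go back two years when today falls before April, otherwise one year, then take April 1) can be checked with a short Python sketch. The names `range_start` and `in_range` are made up for this illustration and are not part of the T-SQL above.

```python
from datetime import date

def range_start(today, fiscal_start_month=4):
    """Start of the previous fiscal year, relative to today."""
    years_back = 2 if today.month < fiscal_start_month else 1
    return date(today.year - years_back, fiscal_start_month, 1)

def in_range(d, today):
    """True when d falls between the range start and today, inclusive."""
    return range_start(today) <= d <= today

print(range_start(date(2019, 1, 1)))    # 2017-04-01
print(range_start(date(2019, 6, 1)))    # 2018-04-01
print(range_start(date(2021, 7, 22)))   # 2020-04-01
print(in_range(date(2020, 3, 31), date(2021, 7, 22)))  # False
```

April itself is the boundary case worth testing: for a date in April the start is April 1 of the previous year.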
However, I would suggest that you may be better served by creating a Dates Table to which you apply filters and from which you join to your transactional data to return the values you require. In an appropriately configured environment this will make full use of available indexes and should provide very good performance.
A very basic tally table approach to generate such a Dates Table is as follows, which returns all dates and their fiscal year start dates for 2015-01-01 to 2042-05-18:
with t as (select t from(values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) as t(t))
,d as (select dateadd(day,row_number() over (order by (select null))-1,'20150101') as d from t,t t2,t t3,t t4)
select d as DateValue
,case when month(d) < 4
then datetime2fromparts(year(d)-1,4,1,0,0,0,0,0)
else datetime2fromparts(year(d),4,1,0,0,0,0,0)
end as FinancialYearStart
from d
order by DateValue;
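The date range quoted above follows from the size of the tally: ten rows cross-joined four times give 10^4 rows, one per day starting at 2015-01-01. A quick Python check of that arithmetic, with a hypothetical `financial_year_start` helper mirroring the CASE expression:

```python
from datetime import date, timedelta

first = date(2015, 1, 1)
row_count = 10 ** 4                           # 10 rows cross-joined four times
last = first + timedelta(days=row_count - 1)  # row_number() starts at 1, hence the -1

def financial_year_start(d):
    """April 1 of the fiscal year that contains d (fiscal year starts in April)."""
    year = d.year - 1 if d.month < 4 else d.year
    return date(year, 4, 1)

print(last)                                     # 2042-05-18
print(financial_year_start(date(2015, 1, 1)))   # 2014-04-01
print(financial_year_start(date(2015, 4, 1)))   # 2015-04-01
```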

How can I group by 2 fields and having by an interval type?

I've got this table:
TABLE T (
id int,
month int,
hours interval
);
and I want to group by id and month, and add the hours.
For example:
id month hours
-------------------
1 1 08:00:00
1 1 09:00:00
1 2 10:00:00
1 2 11:00:00
I want:
1 1 17:00:00
1 2 21:00:00
I tried this:
SELECT * FROM T
GROUP BY T.id , T.month
HAVING SUM( SELECT EXTRACT ( epoch FROM T.hours ) / 3600 );
but it doesn't work and I can't fix it.
SELECT
    id,
    month,
    sum(extract(epoch from hours) / 3600) AS total_hours  -- total as a number of hours
FROM
    T
GROUP BY
    id,
    month;
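The grouping itself is a plain sum per (id, month). As an illustration only, the Python sketch below reproduces the expected totals with `timedelta` values standing in for the interval column, and also shows the numeric-hours form that the query returns. The variable names are made up for this sketch.

```python
from collections import defaultdict
from datetime import timedelta

# Sample rows from the question: (id, month, hours).
rows = [
    (1, 1, timedelta(hours=8)),
    (1, 1, timedelta(hours=9)),
    (1, 2, timedelta(hours=10)),
    (1, 2, timedelta(hours=11)),
]

# Sum the durations per (id, month); timedelta() is the zero duration.
totals = defaultdict(timedelta)
for id_, month, hours in rows:
    totals[(id_, month)] += hours

# The same totals expressed as a number of hours (epoch seconds / 3600).
hours_as_number = {key: value.total_seconds() / 3600 for key, value in totals.items()}

for (id_, month), total in sorted(totals.items()):
    print(id_, month, total)   # 1 1 17:00:00, then 1 2 21:00:00
```

If the result should stay an interval, as in the expected output, PostgreSQL can also sum the column directly with sum(hours) instead of converting to epoch seconds.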