Calculate service date of a truck with time and service day data - tsql

We have some trucks that serve destinations on different days. In the SQL Server table, the data looks like this:
TruckCode ServiceHour Monday Tuesday Wednesday Thursday Friday Saturday
Route1 17:00 1 0 1 0 1 0
Route2 09:30 1 1 1 1 1 1
Route3 14:30 0 1 0 1 0 0
Explanation of the table:
The Route1 truck serves on Monday, Wednesday and Friday, and the serving time is 17:00.
The Route2 truck serves every day except Sunday, and the serving time is 09:30.
The Route3 truck serves on Tuesday and Thursday, and the serving time is 14:30.
I have orders, and each order has a truck code, an order date and an order time. I want to calculate the date and time when each order will be served. There is no service on Sunday.
For example, an order placed on Monday at 17:05 for Route1 will be served on Wednesday at 17:00, because Monday's 17:00 slot passed 5 minutes earlier and Tuesday is not a serving day for Route1.
I am working on a solution based on CHARINDEX, as follows.
I concatenate the day flags into one string. For Route1, for example, this gives 10101001010100: the first character is 1, so the route serves on Monday; the second character is 0, so it does not serve on Tuesday; and so on. The seven-character Monday-to-Sunday pattern 1010100 appears twice, so the search can continue into the following week.
Then I get the day number (Monday = 1) with ((DATEPART(dw, @orderInsertDate) + @@DATEFIRST - 2) % 7 + 1)
and the CHARINDEX call is:
CHARINDEX('1', '10101001010100', ((DATEPART(dw, @orderInsertDate) + @@DATEFIRST - 2) % 7 + 1) +
(CASE WHEN ServeTime < GETDATE() THEN 1 ELSE 0 END)
)
I need to compare the time as well because, for example, a Route1 order inserted at 16:30 on Monday is served the same day, but an order inserted at 17:30 on Monday is served on the following Wednesday. I still haven't managed to calculate the right value, but I think CHARINDEX will solve it.
How can I calculate that?
Thanks in advance.
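You can check the doubled-string idea outside the database. Below is a minimal Python sketch. It assumes the seven-character Monday-to-Sunday flag string from the question. It also assumes that an order placed exactly at the serving time is still served that day, which is the same `>=` comparison the answer below uses. Python's `str.find` is 0-based, whereas CHARINDEX is 1-based. The function name `next_service` is hypothetical.

```python
from datetime import datetime, time, timedelta


def next_service(order_ts: datetime, pattern: str, serve_time: time) -> datetime:
    """Return the datetime at which an order will be served.

    pattern: seven '0'/'1' flags for Monday..Sunday (e.g. '1010100' for Route1).
    """
    weekly = pattern * 2               # doubled, like '10101001010100', so the search can wrap
    day_index = order_ts.weekday()     # Monday = 0 ... Sunday = 6
    # Same-day service only if the serving time has not passed yet.
    start = day_index if order_ts.time() <= serve_time else day_index + 1
    pos = weekly.find("1", start)      # 0-based, unlike CHARINDEX
    if pos == -1:
        raise ValueError("route has no serving days")
    service_date = order_ts.date() + timedelta(days=pos - day_index)
    return datetime.combine(service_date, serve_time)


# Route1 order on Monday 2024-01-01 at 17:05 -> Wednesday 2024-01-03 at 17:00
print(next_service(datetime(2024, 1, 1, 17, 5), "1010100", time(17, 0)))
```

Because the search can start at index 7 at most (a Sunday order after the serving time), two copies of the week are always enough to find the next serving day.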

This requires normalized data, but I think it works. Trust me, it is worth normalizing the data.
It does not wrap to the next week, but you could check for an empty result (NULL) and wrap to the first delivery day.
declare @R table (id int identity primary key, name varchar(20), tm time not null);
insert into @R (name, tm) values ('Route1', '17:00');
-- one row per serving day; day numbers as DATEPART(dw) returns them with Sunday = 1 (Monday = 2, Wednesday = 4, Friday = 6)
declare @D table (fk int, d tinyint not null);
insert into @D (fk, d) values (1, 2), (1, 4), (1, 6);
declare @dd int = 2;                        -- order day (Monday)
declare @tt time = cast('17:05' as time);   -- order time
select top (1)
r.name, r.tm, d.d, @dd as dOr, @tt as tOr
from @R r
join @D d
on d.fk = r.id
where r.id = 1
and (d.d > @dd
or (d.d = @dd and r.tm >= @tt))   -- parentheses keep the route filter applied to both branches
order by d.d;
name tm d dOr tOr
-------------------- ---------------- ---- ----------- ----------------
Route1 17:00:00.0000000 4 2 17:05:00.0000000
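Here is a minimal Python sketch of the same filter. It also adds the wrap-around suggested above: if no serving day is left this week, it falls back to the route's first serving day next week. It assumes the day numbers used in the sample data (Monday = 2, Wednesday = 4, Friday = 6, matching DATEPART(dw) with Sunday as the first day). `next_service_slot` is a hypothetical name.

```python
from datetime import time


def next_service_slot(serving_days, serve_time, order_day, order_time):
    """Return (day_number, weeks_ahead) of the next delivery for one route.

    Day numbers follow the answer's sample data: Sunday = 1, Monday = 2 ... Saturday = 7.
    """
    if not serving_days:
        raise ValueError("route has no serving days")
    # Same filter as the WHERE clause: a later day, or the same day if the time has not passed.
    this_week = [
        d for d in serving_days
        if d > order_day or (d == order_day and serve_time >= order_time)
    ]
    if this_week:
        return min(this_week), 0
    # Nothing left this week (the SQL returns no row): wrap to the first delivery day.
    return min(serving_days), 1


print(next_service_slot([2, 4, 6], time(17, 0), 2, time(17, 5)))
```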

Related

How can I, in T-SQL, examine date intervals to remove overlapping intervals before adding totals together

I am running an analysis on medication prescribing practices. We want to identify whether someone has been on a class of medications for 60 days out of a 90-day quarter. We have a start and end date for each prescription, and the bounds of the quarter (e.g., 4/1/2022 – 6/30/2022). For each prescription I’ve calculated the number of days between the start and end date (only including days that fall within the bounds of the quarter). There are many instances in which multiple drugs within the same class are prescribed: someone might try one antidepressant, not like it, and be given another in the same class.
My original strategy was just to total up the number of days for each class of medication and see if it’s 60 or over. The days don’t have to be consecutive, but if prescriptions overlap, days during the overlap period shouldn’t count twice (which they would in a simple sum).
For instance, in the data table below, patient 1 (row 1) should be included because they are over 60 days. Patient 2 (rows 2 and 3) should also be included because the non-overlapping total (57 + 8) within the same med class gets them over 60 days. However, patient 3 should NOT be included: even though 32 + 32 is over 60, the intervals overlap, so they were really on the medication class for only 32 days. This is a case where someone might be on two different antidepressants simultaneously.
Summing the days in each interval is not sufficient; I also need some way to check whether intervals overlap, and only add the days of an interval that fall outside every other interval for the same medication class.
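Row num | Patid | Med class | Start date | End date   | Interval
--------|-------|-----------|------------|------------|---------
1       | 1     | A         | 2022-04-28 | 2022-09-12 | 63
2       | 2     | B         | 2022-05-03 | 2022-06-29 | 57
3       | 2     | B         | 2022-04-21 | 2022-04-29 | 8
4       | 3     | A         | 2022-01-19 | 2022-05-03 | 32
5       | 3     | A         | 2022-01-19 | 2022-05-03 | 32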
I’m having a hard time figuring out how to do this. Note that I'm limited to using SQL.
Here is the code that produced the data above. I would embed it in another query to generate a total interval, but I need to deal with the overlap issue first.
DECLARE @startdt DATE;
DECLARE @enddt DATE;
SET @startdt = '20220401';
SET @enddt = '20220630';
--for q4 fy2022-23 (4/1/2022-6/30/2022)
SELECT DISTINCT
rx.patid, d.medication_category as medcat, start_date, end_date,
-- case statement to capture days within quarter only
CASE WHEN start_date<@startdt and end_date>@enddt then 90
WHEN start_date<@startdt and end_date>=@startdt then datediff(d,@startdt,end_date)
WHEN start_date>=@startdt and end_date>@enddt then datediff(d,start_date,@enddt)
ELSE datediff(d,start_date,end_date)
END as interval
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname=d.drugname
-- prescriptions that overlap the quarter
WHERE start_date <= @enddt and end_date >= @startdt
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
You can accomplish what you want by generating a calendar table (using a Common Table Expression) of individual days within the test range, joining those days with the prescriptions with overlapping days, and then counting distinct days for each patient and medication category combination.
Something like:
DECLARE @startdt DATE = '2022-04-01';
DECLARE @enddt DATE = '2022-06-30';
DECLARE @threshold INT = 60;
WITH Days AS (
SELECT @startdt AS Day
UNION ALL
SELECT DATEADD(day, 1, Day)
FROM Days
WHERE Day < @enddt
)
SELECT
rx.patid, d.medication_category as medcat,
COUNT(DISTINCT DD.Day) AS days_medicated,
MIN(DD.Day) AS start_date,
MAX(DD.Day) AS end_date
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname = d.drugname
INNER JOIN Days DD
ON DD.Day BETWEEN rx.start_date AND rx.end_date
WHERE rx.start_date <= @enddt AND @startdt <= rx.end_date
GROUP BY rx.patid, d.medication_category
HAVING COUNT(DISTINCT DD.Day) >= @threshold
ORDER BY rx.patid, start_date;
If using SQL Server 2022 or later, the Days generator can be simplified by using the new GENERATE_SERIES() function:
WITH Days AS (
SELECT DATEADD(day, S.value, @startdt) AS Day
FROM GENERATE_SERIES(0, DATEDIFF(day, @startdt, @enddt)) S
)
See this db<>fiddle for an example with some sample data.
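The key idea is to expand each prescription into individual days and count the distinct days. The sketch below does this in Python with one set per patient and class. It assumes inclusive start and end dates (as BETWEEN treats them), so each clipped interval counts one more day than the DATEDIFF value in the question's table. `days_covered` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date, timedelta


def days_covered(prescriptions, q_start, q_end):
    """Count distinct in-quarter days per (patid, medcat).

    prescriptions: iterable of (patid, medcat, start_date, end_date), dates inclusive.
    Overlapping prescriptions in the same class contribute each day only once.
    """
    covered = defaultdict(set)
    for patid, medcat, start, end in prescriptions:
        day = max(start, q_start)   # clip to the quarter
        last = min(end, q_end)
        while day <= last:
            covered[(patid, medcat)].add(day)
            day += timedelta(days=1)
    return {key: len(days) for key, days in covered.items()}


sample = [
    (1, "A", date(2022, 4, 28), date(2022, 9, 12)),
    (2, "B", date(2022, 5, 3), date(2022, 6, 29)),
    (2, "B", date(2022, 4, 21), date(2022, 4, 29)),
    (3, "A", date(2022, 1, 19), date(2022, 5, 3)),
    (3, "A", date(2022, 1, 19), date(2022, 5, 3)),
]
counts = days_covered(sample, date(2022, 4, 1), date(2022, 6, 30))
qualifying = sorted(key for key, n in counts.items() if n >= 60)
print(counts)
print(qualifying)
```

With the question's sample rows, patients 1 and 2 reach the 60-day threshold. Patient 3 has only 33 distinct days and does not, which matches the expected outcome.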
I would do this using a date (calendar) table; then it's pretty easy.
If you don't already have a date table, this link is one of many that describe how to create one easily: https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/
Here's the script from that link (in case the link dies):
DECLARE @StartDate date = '20100101';
DECLARE @CutoffDate date = DATEADD(DAY, -1, DATEADD(YEAR, 30, @StartDate));
;WITH seq(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, @StartDate, @CutoffDate)
),
d(d) AS
(
SELECT DATEADD(DAY, n, @StartDate) FROM seq
),
src AS
(
SELECT
TheDate = CONVERT(date, d),
TheDay = DATEPART(DAY, d),
TheDayName = DATENAME(WEEKDAY, d),
TheWeek = DATEPART(WEEK, d),
TheISOWeek = DATEPART(ISO_WEEK, d),
TheDayOfWeek = DATEPART(WEEKDAY, d),
TheMonth = DATEPART(MONTH, d),
TheMonthName = DATENAME(MONTH, d),
TheQuarter = DATEPART(Quarter, d),
TheYear = DATEPART(YEAR, d),
TheFirstOfMonth = DATEFROMPARTS(YEAR(d), MONTH(d), 1),
TheLastOfYear = DATEFROMPARTS(YEAR(d), 12, 31),
TheDayOfYear = DATEPART(DAYOFYEAR, d)
FROM d
)
SELECT *
INTO MyDateTable
FROM src
ORDER BY TheDate
OPTION (MAXRECURSION 0);
Now that you have your new date table, you can join to it to get the list of dates between each prescription's start and end date, and count the distinct dates per patient and medication category, something like:
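SELECT rx.patid, d.medication_category AS medcat,
COUNT(DISTINCT dt.TheDate) AS days_medicated   -- overlapping days are counted once
FROM rx
INNER JOIN MyDateTable dt ON dt.TheDate BETWEEN rx.start_date AND rx.end_date
INNER JOIN Drug_names_categories d ON rx.drugname=d.drugname
WHERE dt.TheDate BETWEEN '20220401' AND '20220630'   -- only days inside the quarter
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
GROUP BY rx.patid, d.medication_category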
Obviously this is a simple example, but you could easily extend it to include all the details you need. The point is that you now have a list (or a distinct list) of dates that you can work with easily.
You could also simplify the date-range filter by referencing the TheQuarter and TheYear columns. If this is a common task, consider extending the date table with a compound YearQuarter column (e.g. 2023Q1 or 202301).
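A compound key like the one suggested could be derived as shown below. This Python sketch assumes calendar quarters, as DATEPART(QUARTER, ...) returns them. The two formats, 2023Q1 and year * 100 + quarter, are hypothetical choices. Note that under this scheme the question's fiscal "q4 fy2022-23" (April to June 2022) is calendar quarter 2022Q2, so a fiscal-quarter column would need its own offset.

```python
from datetime import date


def year_quarter_keys(d):
    """Return ('YYYYQn', YYYY * 100 + n) for the calendar quarter containing d."""
    quarter = (d.month - 1) // 3 + 1
    return f"{d.year}Q{quarter}", d.year * 100 + quarter


print(year_quarter_keys(date(2023, 2, 15)))
print(year_quarter_keys(date(2022, 6, 30)))
```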

Exclude weekend between two timestamps in a query in Db2

I have two columns, each with a timestamp:
column_a column_b
2021-08-03 13:22:29 2021-08-09 15:51:59
I want to calculate the difference in hours, but exclude any weekend days that fall on or between the two timestamps.
I have tried TIMESTAMPDIFF and HOURS_BETWEEN, but these still include the weekend.
UPDATE:
My solution was to:
create a function to calculate the number of days between the two dates, excluding weekends, taken from here:
How to get the WORKING day diff in db2 without saturdays and sundays?
then, in my SELECT, use Db2's native DATEDIFF(8,xxx,yyy) to get the total number of hours and subtract from it the value returned by the function * 24 (to convert days to hours).
WITH
MYTAB (A, B) AS
(
VALUES ('2021-08-03 13:22:29'::TIMESTAMP, '2021-08-09 15:51:59'::TIMESTAMP)
)
, MYTAB2 (A, B) AS
(
SELECT
CASE WHEN DAYOFWEEK(A) IN (1, 7) THEN DATE (A) + 1 DAY ELSE A END
, CASE WHEN DAYOFWEEK(B) IN (1, 7) THEN B - (MIDNIGHT_SECONDS (B) + 1) SECOND ELSE B END
FROM MYTAB
)
, R (TS) AS
(
SELECT V.TS
FROM MYTAB2 T, TABLE (VALUES T.A, T.B) V (TS)
WHERE T.A <= T.B
UNION ALL
SELECT DATE (R.TS) + 1 DAY
FROM R, MYTAB2 T
WHERE DATE (R.TS) + 1 DAY < DATE (T.B)
)
SELECT
COALESCE
(
-- Seconds between start and end
(DAYS (MAX (TS)) - DAYS (MIN (TS))) * 86400
+ MIDNIGHT_SECONDS (MAX (TS)) - MIDNIGHT_SECONDS (MIN (TS))
-- SUM of seconds for weekend days is subtracted
- SUM (CASE WHEN DAYOFWEEK (TS) IN (1, 7) THEN 86400 ELSE 0 END)
, 0
) / 3600 AS HRS
FROM R
The idea is to first construct the following list with a recursive common table expression:
2021-08-03 13:22:29 (start; would become 2021-08-04 00:00:00 if it fell on a weekend)
2021-08-04 00:00:00
...
2021-08-08 00:00:00
2021-08-09 15:51:59 (end; would become 2021-08-08 23:59:59 if it fell on a weekend)
That is: one timestamp for each day between the start and end timestamps. The start and end timestamps are adjusted first:
If start is on a weekend, change it to the start of the next day.
If end is on a weekend, change it to the end of the previous day.
The final calculation is simple: we subtract the sum of seconds for all weekend days in the list from the difference in seconds between start and end.
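To check results, here is a Python sketch that computes the same quantity slightly differently. Instead of adjusting the endpoints and subtracting whole weekend days, it walks day by day, clips each weekday to the start/end window, and sums the seconds. It assumes naive timestamps with no time-zone or DST handling. `weekday_seconds` is a hypothetical name.

```python
from datetime import datetime, timedelta


def weekday_seconds(start, end):
    """Seconds between start and end that fall on Monday..Friday.

    Walks one calendar day at a time and clips each weekday to [start, end].
    """
    if end <= start:
        return 0
    total = 0
    day = datetime(start.year, start.month, start.day)  # midnight of the start day
    while day < end:
        next_day = day + timedelta(days=1)
        if day.weekday() < 5:  # Monday = 0 ... Friday = 4; skip Saturday and Sunday
            total += (min(end, next_day) - max(start, day)).total_seconds()
        day = next_day
    return int(total)


secs = weekday_seconds(datetime(2021, 8, 3, 13, 22, 29), datetime(2021, 8, 9, 15, 51, 59))
print(secs, secs / 3600)
```

For the example in the question this gives 354570 seconds, roughly 98.49 hours. That is the total of 6 days 2:29:30 minus the 48 hours of Saturday 7 and Sunday 8 August.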

Dynamically change StartDate on a query depending on ClientId

I want to return the sum of daily spend since the beginning of the current insertion order (invoice) for a number of clients. Unfortunately, each client has a different start date for the current insertion order.
I have no problem pulling the start date for each client, but I don't see how to create a sort of lookup against a table holding the start date associated with each client.
Let's say I have a table IO:
ClientId StartDate
1 2014-10-01
2 2014-10-04
3 2014-09-17
...
And another table with the DailySpend for each Client:
Date Client Spend
2014-10-01 1 2325
2014-10-01 2 195
2014-10-01 3 434
2014-10-02 1 43624
...
Now I simply want to check, for each client, how much we spent from the start date of the current insertion order until yesterday.
Maybe something like this:
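SELECT a.ClientId,
Sum(b.Spend) AS TotalSpend
FROM [IO] a
JOIN DailySpend b
ON a.ClientId = b.Client
AND b.[Date] >= a.StartDate   -- only spend since the current insertion order started
WHERE b.[Date] <= Dateadd(dd, -1, Cast(Getdate() AS DATE))   -- up to and including yesterday
GROUP BY a.ClientId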
select *
from IO
join DailySpend
on IO.ClientId = DailySpend.Client
and DailySpend.Date >= IO.StartDate
and datediff(dd, DailySpend.Date, getdate()) >= 1
select DailySpend.Client, sum(DailySpend.Spend)
from IO
join DailySpend
on IO.ClientId = DailySpend.Client
and DailySpend.Date >= IO.StartDate
and datediff(dd, DailySpend.Date, getdate()) >= 1
group by DailySpend.Client
Mind the argument order in the datediff: datediff(dd, startdate, enddate) returns enddate minus startdate, so datediff(dd, DailySpend.Date, getdate()) >= 1 keeps only rows dated yesterday or earlier.
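Here is the same join-and-filter logic as a small Python sketch. It assumes the start dates and daily spend are already loaded as tuples, and that "until yesterday" means dates up to and including the day before `today`. `spend_since_start` is a hypothetical helper.

```python
from datetime import date, timedelta


def spend_since_start(io_rows, spend_rows, today):
    """Sum spend per client from its insertion-order start date through yesterday.

    io_rows: (client_id, start_date); spend_rows: (date, client, spend).
    """
    start_by_client = dict(io_rows)
    yesterday = today - timedelta(days=1)
    totals = {client: 0 for client in start_by_client}
    for day, client, spend in spend_rows:
        start = start_by_client.get(client)
        if start is not None and start <= day <= yesterday:
            totals[client] += spend
    return totals


io = [(1, date(2014, 10, 1)), (2, date(2014, 10, 4)), (3, date(2014, 9, 17))]
daily = [
    (date(2014, 10, 1), 1, 2325),
    (date(2014, 10, 1), 2, 195),
    (date(2014, 10, 1), 3, 434),
    (date(2014, 10, 2), 1, 43624),
]
print(spend_since_start(io, daily, today=date(2014, 10, 3)))
```

Client 2's current insertion order starts on 2014-10-04, so its 2014-10-01 spend is excluded, just as the `>= StartDate` join condition would exclude it.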

T-SQL - Data Islands and Gaps - How do I summarise transactional data by month?

I'm trying to query some transactional data to establish the CurrentProductionHours value for each Report at the end of each month.
Provided there has been a transaction for each report in each month, that's pretty straightforward: I can use something along the lines of the code below to partition transactions by month and then pick out the rows where TransactionByMonth = 1 (effectively, the last transaction for each report in each month).
SELECT
ReportId,
TransactionId,
CurrentProductionHours,
ROW_NUMBER() OVER (PARTITION BY [ReportId], [CalendarYear], [MonthOfYear]
ORDER BY TransactionTimestamp desc
) AS TransactionByMonth
FROM
tblSource
The problem I have is that there will not necessarily be a transaction for every report every month. When that's the case, I need to carry forward the last known CurrentProductionHours value to the month with no transaction, as this indicates that there has been no change. The value may need to be carried forward for several consecutive months.
Source Data:
ReportId TransactionTimestamp CurrentProductionHours
1 2014-01-05 13:37:00 14.50
1 2014-01-20 09:15:00 15.00
1 2014-01-21 10:20:00 10.00
2 2014-01-22 09:43:00 22.00
1 2014-02-02 08:50:00 12.00
Target Results:
ReportId Month Year ProductionHours
1 1 2014 10.00
2 1 2014 22.00
1 2 2014 12.00
2 2 2014 22.00
I should also mention that I have a date table available, which can be referenced if required.
** UPDATE 05/03/2014 **
I now have a query which generates results as shown in the example below, but I'm left with islands of data (months where a transaction existed) and gaps in between. My question is still similar but in some ways a little more generic: what is the best way to fill the gaps between data islands if you have the dataset below as a starting point?
ReportId Month Year ProductionHours
1 1 2014 10.00
1 2 2014 12.00
1 3 2014 NULL
2 1 2014 22.00
2 2 2014 NULL
2 3 2014 NULL
Any advice about how to tackle this would be greatly appreciated!
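The gap-filling rule itself takes the last non-NULL value for the same report, in month order. It can be sketched in Python as follows. The sketch assumes the rows fit in memory and that a report with no earlier value should stay NULL (None). `fill_forward` is a hypothetical name.

```python
def fill_forward(rows):
    """rows: (report_id, month, year, hours or None).

    Returns the rows ordered by report and month, with each None replaced by the
    last known value for the same report (None if there is no earlier value).
    """
    filled = []
    last_value = {}
    for report_id, month, year, hours in sorted(rows, key=lambda r: (r[0], r[2], r[1])):
        if hours is None:
            hours = last_value.get(report_id)
        else:
            last_value[report_id] = hours
        filled.append((report_id, month, year, hours))
    return filled


islands = [
    (1, 1, 2014, 10.00), (1, 2, 2014, 12.00), (1, 3, 2014, None),
    (2, 1, 2014, 22.00), (2, 2, 2014, None), (2, 3, 2014, None),
]
for row in fill_forward(islands):
    print(row)
```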
Try this:
;with a as
(
select dateadd(m, datediff(m, 0, min(TransactionTimestamp))+1,0) minTransactionTimestamp,
max(TransactionTimestamp) maxTransactionTimestamp from tblSource
), b as
(
select minTransactionTimestamp TT, maxTransactionTimestamp
from a
union all
select dateadd(m, 1, TT), maxTransactionTimestamp
from b
where tt < maxTransactionTimestamp
), c as
(
select distinct t.ReportId, b.TT from tblSource t
cross apply b
)
select c.ReportId,
month(dateadd(m, -1, c.TT)) Month,
year(dateadd(m, -1, c.TT)) Year,
x.CurrentProductionHours
from c
cross apply
(select top 1 CurrentProductionHours from tblSource
where TransactionTimestamp < c.TT
and ReportId = c.ReportId
order by TransactionTimestamp desc) x
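The idea behind this answer is to take, for each report and month, the latest transaction at or before the end of that month. It can be sketched in Python to check expected results. The sketch assumes transactions are (ReportId, timestamp, hours) tuples. Like the CROSS APPLY above, it omits months before a report's first transaction. `month_end_hours` is a hypothetical name.

```python
from datetime import datetime


def month_end_hours(transactions):
    """transactions: (report_id, timestamp, hours).

    Returns {(report_id, month, year): hours} holding the latest value known at the
    end of each month, carried forward through months with no transactions.
    """
    if not transactions:
        return {}
    ordered = sorted(transactions, key=lambda t: t[1])
    reports = sorted({t[0] for t in ordered})
    year, month = ordered[0][1].year, ordered[0][1].month
    last = (ordered[-1][1].year, ordered[-1][1].month)
    latest, result, i = {}, {}, 0
    while (year, month) <= last:
        # Absorb every transaction dated in or before this month.
        while i < len(ordered) and (ordered[i][1].year, ordered[i][1].month) <= (year, month):
            report_id, _, hours = ordered[i]
            latest[report_id] = hours
            i += 1
        for report_id in reports:
            if report_id in latest:   # skip months before a report's first transaction
                result[(report_id, month, year)] = latest[report_id]
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


source = [
    (1, datetime(2014, 1, 5, 13, 37), 14.50),
    (1, datetime(2014, 1, 20, 9, 15), 15.00),
    (1, datetime(2014, 1, 21, 10, 20), 10.00),
    (2, datetime(2014, 1, 22, 9, 43), 22.00),
    (1, datetime(2014, 2, 2, 8, 50), 12.00),
]
print(month_end_hours(source))
```

For the question's source data this reproduces the target results, including report 2 carried forward to February at 22.00.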
A similar approach, but using a Cartesian product to obtain all the combinations of report IDs and months in the first step.
A second step adds to that Cartesian product the maximum timestamp from the source table where the month is less than or equal to the month in the current row.
Finally, it joins the source table to that intermediate result by report ID and timestamp to obtain the latest source-table row for every report ID/month.
;
WITH allcombinations -- Cartesian (reportid X yearmonth)
AS ( SELECT reportid ,
yearmonth
FROM ( SELECT DISTINCT
reportid
FROM tblSource
) a
JOIN ( SELECT DISTINCT
DATEPART(yy, transactionTimestamp)
* 100 + DATEPART(MM,
transactionTimestamp) yearmonth
FROM tblSource
) b ON 1 = 1
),
maxdates --add correlated max timestamp where the month is less or equal to the month in current record
AS ( SELECT a.* ,
( SELECT MAX(transactionTimestamp)
FROM tblSource t
WHERE t.reportid = a.reportid
AND DATEPART(yy, t.transactionTimestamp)
* 100 + DATEPART(MM,
t.transactionTimestamp) <= a.yearmonth
) maxtstamp
FROM allcombinations a
)
-- join previous data to the source table by reportid and timestamp
SELECT distinct m.reportid ,
m.yearmonth ,
t.CurrentProductionHours
FROM maxdates m
JOIN tblSource t ON t.transactionTimestamp = m.maxtstamp and t.reportid=m.reportid
ORDER BY m.reportid ,
m.yearmonth

SQL query to convert date ranges to per day records

Requirements
I have a data table that stores data as date ranges.
Each record is allowed to overlap previous record(s) (each record has a CreatedOn datetime column).
A new record can define its own date range if it needs to, and hence can overlap several older records.
Each new overlapping record overrides the settings of the older records it overlaps.
Result set
What I need is per-day data for any date range, taking the overlapping records into account. It should return one record per day with the corresponding data for that particular day.
To convert ranges to days I was thinking of a numbers/dates table and a user-defined function (UDF) to get the data for each day in the range, but I wonder whether there's any other (as in better, or even faster) way of doing this, since I'm using the latest SQL Server 2008 R2.
Stored data
Imagine my stored data looks like this
ID | RangeFrom | RangeTo | Starts | Ends | CreatedOn (not providing data)
---|-----------|----------|--------|-------|-----------
1 | 20110101 | 20110331 | 07:00 | 15:00
2 | 20110401 | 20110531 | 08:00 | 16:00
3 | 20110301 | 20110430 | 06:00 | 14:00 <- overrides both partially
Results
If I wanted to get data from 1st January 2011 to 31st May 2011, the resulting table should look like the following (obvious rows omitted):
DayDate | Starts | Ends
--------|--------|------
20110101| 07:00 | 15:00 <- defined by record ID = 1
20110102| 07:00 | 15:00 <- defined by record ID = 1
... many rows omitted for obvious reasons
20110301| 06:00 | 14:00 <- defined by record ID = 3
20110302| 06:00 | 14:00 <- defined by record ID = 3
... many rows omitted for obvious reasons
20110501| 08:00 | 16:00 <- defined by record ID = 2
20110502| 08:00 | 16:00 <- defined by record ID = 2
... many rows omitted for obvious reasons
20110531| 08:00 | 16:00 <- defined by record ID = 2
Actually, since you are working with dates, a Calendar table would be more helpful.
Declare @StartDate date
Declare @EndDate date
;With Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
Select ...
From Calendar
Left Join MyTable
On Calendar.[Date] Between MyTable.RangeFrom And MyTable.RangeTo
Option ( Maxrecursion 0 );
Addition
Missed the part about the trumping rule in your original post:
Set DateFormat MDY;
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110501';
-- This first CTE is obviously to represent
-- the source table
With SampleData As
(
Select 1 As Id
, Cast('20110101' As date) As RangeFrom
, Cast('20110331' As date) As RangeTo
, Cast('07:00' As time) As Starts
, Cast('15:00' As time) As Ends
, CURRENT_TIMESTAMP As CreatedOn
Union All Select 2, '20110401', '20110531', '08:00', '16:00', DateAdd(s,1,CURRENT_TIMESTAMP )
Union All Select 3, '20110301', '20110430', '06:00', '14:00', DateAdd(s,2,CURRENT_TIMESTAMP )
)
, Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
, RankedData As
(
Select C.[Date]
, S.Id
, S.RangeFrom, S.RangeTo, S.Starts, S.Ends
, Row_Number() Over( Partition By C.[Date] Order By S.CreatedOn Desc ) As Num
From Calendar As C
Join SampleData As S
On C.[Date] Between S.RangeFrom And S.RangeTo
)
Select [Date], Id, RangeFrom, RangeTo, Starts, Ends
From RankedData
Where Num = 1
Option ( Maxrecursion 0 );
In short, I rank all the sample data that overlaps the same date, preferring the newer rows (latest CreatedOn).
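The ranking step can be mirrored in Python to check the expected output. For each calendar day, the sketch collects the ranges covering it and keeps the one with the latest CreatedOn, which is the row `Row_Number() ... Order By CreatedOn Desc` = 1 selects. It assumes CreatedOn values that sort in creation order; plain integers stand in for timestamps here. `per_day_settings` is hypothetical.

```python
from datetime import date, time, timedelta


def per_day_settings(ranges, start, end):
    """Return {day: winning_range} where the newest covering range wins.

    ranges: (id, range_from, range_to, starts, ends, created_on).
    """
    result = {}
    day = start
    while day <= end:
        covering = [r for r in ranges if r[1] <= day <= r[2]]
        if covering:
            # Equivalent of Row_Number() Over(Partition By day Order By CreatedOn Desc) = 1
            result[day] = max(covering, key=lambda r: r[5])
        day += timedelta(days=1)
    return result


ranges = [
    (1, date(2011, 1, 1), date(2011, 3, 31), time(7), time(15), 1),
    (2, date(2011, 4, 1), date(2011, 5, 31), time(8), time(16), 2),
    (3, date(2011, 3, 1), date(2011, 4, 30), time(6), time(14), 3),
]
days = per_day_settings(ranges, date(2011, 1, 1), date(2011, 5, 31))
print(days[date(2011, 3, 1)][0], days[date(2011, 5, 1)][0])
```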
Why do it all in DB when you can do it better in memory
This is the solution I eventually used; it seemed the most reasonable in terms of data transferred, speed and resources:
get the actual range definitions from the DB to the mid tier (a smaller amount of data)
generate an in-memory calendar for the requested date range (faster than in the DB)
apply those DB definitions to it (much easier and faster than in the DB)
And that's it. I realised that complicating certain things in the DB is not worth it when you have executable in-memory code that can do the same manipulation faster and more efficiently.
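Here is a minimal Python sketch of that in-memory approach. It assumes the range definitions have already been fetched as tuples. It builds a day-keyed calendar for the requested period, then applies the definitions oldest-first so that newer ranges overwrite older ones. The helper name and tuple layout are hypothetical.

```python
from datetime import date, time, timedelta


def build_calendar(definitions, start, end):
    """Return {day: (starts, ends, id)} for start..end; newer definitions win.

    definitions: (id, range_from, range_to, starts, ends, created_on).
    Days not covered by any definition are left out.
    """
    calendar = {}
    for rec_id, range_from, range_to, starts, ends, _ in sorted(definitions, key=lambda r: r[5]):
        day = max(range_from, start)
        last = min(range_to, end)
        while day <= last:
            calendar[day] = (starts, ends, rec_id)  # later (newer) records overwrite earlier ones
            day += timedelta(days=1)
    return dict(sorted(calendar.items()))


defs = [
    (1, date(2011, 1, 1), date(2011, 3, 31), time(7), time(15), 1),
    (2, date(2011, 4, 1), date(2011, 5, 31), time(8), time(16), 2),
    (3, date(2011, 3, 1), date(2011, 4, 30), time(6), time(14), 3),
]
cal = build_calendar(defs, date(2011, 1, 1), date(2011, 5, 31))
print(cal[date(2011, 3, 2)])
```

Applying definitions in CreatedOn order means each day ends up with the newest covering record. This needs no per-day ranking, only one pass over each definition's days.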