PostgreSQL: Comparing a time overlap across rows

I'm creating a query in our EHR. I am looking for patients who have accounts within 48 hours of each other, to make sure they aren't coming back in with the same symptoms.
To work with, I have the following columns in two tables:
visit table
'patnum', which is the unique number attached to each visit
'mr_num', which is the unique number attached to each patient
er_log table
'patnum', which is the unique number attached to each visit
arrivaldt and arrivaltime, which I will combine into a timestamp; this is when the patient presented
dischargedate and dischargetime, which I will combine into a timestamp; this is when the patient was discharged
What are some methods for comparing the arrival and discharge times across all of an mr_num's accounts and finding those with less than 48 hours of difference?
Thanks!
Edit: my explanation may be a little vague.
Example: a patient with MR num '334455' is a frequent flyer at the ER. She has come in 25 times in the last 7 months. Each of those visits has a unique patnum, an arrival date, and a discharge date. I am trying to find whether any of her visits happened within 48 hours of each other by comparing one visit's dates with every other visit's arrival date. The query should report the visits that are within 48 hours of each other.

I'm assuming that dischargetime and arrivaltime are of type timestamp to simplify the code, but you can always do the conversion on your own.
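For instance, a minimal sketch of that conversion, assuming arrivaldt/dischargedate are of type date and arrivaltime/dischargetime of type time (adjust if the columns are stored as text); the result could be used as a subquery or view in place of er_log below:
-- build full timestamps from the separate date and time columns
SELECT patnum,
       arrivaldt + arrivaltime       AS arrivaltime,
       dischargedate + dischargetime AS dischargetime
FROM er_log;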
Previous answer
Use an INNER JOIN to connect with the table containing your times. Assuming that a patient can only be discharged after presenting, you can subtract arrivaltime from dischargetime to get a result of type INTERVAL, and return only those rows where they differ by less than 2 days, i.e. 48 hours.
SELECT
v.mr_num, v.patnum
FROM
visit v
INNER JOIN er_log e USING ( patnum )
WHERE
v.mr_num = ?
AND e.dischargetime - e.arrivaltime < interval '2 days'
To get a deeper understanding you could look into the SQL Fiddle I've created.
Current answer
Changes in the code after the comment and the edited question: this will scan all visits for a particular patient and return those that happened within 2 days of the end of any previous visit. You may want to adjust the logic a bit in the line where the comparison takes place,
AND e1.dischargetime - e2.arrivaltime < interval '2 days'
and decide whether you base your times on arrivaltime or dischargetime, both for the current visit being compared and for the visits in the lookup.
SELECT
v1.mr_num, v1.patnum
FROM
visit v1
INNER JOIN er_log e1 USING ( patnum )
WHERE
v1.mr_num = ?
AND EXISTS (
SELECT 1
FROM visit v2
INNER JOIN er_log e2 USING ( patnum )
WHERE
v1.mr_num = v2.mr_num
AND v1.patnum IS DISTINCT FROM v2.patnum
AND e1.dischargetime - e2.arrivaltime < interval '2 days'
)

What you will need to do is do TWO inner joins to the ER_LOG table. I'm going to borrow the previous answer and extend it:
SELECT
v1.mr_num, v1.patnum, v2.patnum,
(e1.dischargetime - e2.arrivaltime) as elapsed
FROM
visit v1
INNER JOIN er_log e1 on (v1.patnum = e1.patnum)
INNER JOIN visit v2 on (v1.mr_num = v2.mr_num and v1.patnum <> v2.patnum)
INNER JOIN er_log e2 on (v2.patnum = e2.patnum)
WHERE
v1.mr_num = ?
AND e1.dischargetime - e2.arrivaltime < interval '2 days'
You must join against the er_log table twice to get the two different visits so that you can compare times. You must also join against visit again to establish visits for the same patient (mr_num), and you exclude the identical visit by making sure the patnum values are different.
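As an alternative sketch (not from either answer above, and only comparing each visit with the immediately preceding one for the same patient), a window function can do the lookup in a single pass, assuming the combined timestamps already exist:
-- flag visits whose arrival is within 48 hours of the previous visit's discharge
SELECT mr_num, patnum, arrivaltime, prev_discharge
FROM (
    SELECT v.mr_num, v.patnum, e.arrivaltime,
           LAG(e.dischargetime) OVER (PARTITION BY v.mr_num
                                      ORDER BY e.arrivaltime) AS prev_discharge
    FROM visit v
    INNER JOIN er_log e USING ( patnum )
) sub
WHERE arrivaltime - prev_discharge < interval '48 hours';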

Related

A T-SQL process to identify the total duration or number of days of all "cases" within a specified time period. This is a challenge

I could really do with some help and intend to be active in this community and help others in return. I have been a SQL developer using MS SQL Server for the last two years, but I've hit a roadblock on this one.
Imagine the scenario: you have a number of "Accommodation Providers". Each has a certain "Service Capacity". We have a dataset with a number of concurrent "Placements", which can be any duration from a day to several years. We would like to know the "Occupancy Rate", calculated as
Occupancy = Placement Days (all days in all placements within the period) / (Capacity x Days in Period) x 100
I have changed names of fields/tables and am showing some made-up sample data here.
We have one dataset in a table (tPL) for "Placements". There are many thousands of records, going back 7 years
e.g.
tbl_Placements tPL:
[Provider Name]  [Name of Client]  [Vacancy Filled Date]  [Vacancy End Date]  [Placement_Length_in_Days]
Accommodation1   John Smith        2018-08-04             2018-08-12          8
Accommodation1   Jane Smith        2019-01-28             2019-04-09          294
tbl_Month_Year tMY:
Month_Year
2018-03-01
2018-04-01
2018-05-01
2018-06-01
2018-07-01
2018-08-01
2018-09-01
2018-10-01
2018-11-01
2018-12-01
2019-01-01
2019-02-01
2019-03-01
2019-04-01
2019-05-01
and lastly
tbl_Service_Capacity tSC:
[Provider Name] [Service Capacity]
Accommodation 1 12
Accommodation 2 4
Dividing by the service capacity is the easy part. Where I'm struggling is calculating the total number of "Placement Days" in a given period such as a month or quarter.
If you consider that Accommodation1, 2, and 3 can have multiple concurrent and overlapping placements of different lengths, which can start and finish at any time, how can I calculate the total number of days in all placements that fall within a given time period (e.g. a quarter or a month), to then calculate the occupancy percentage? The code below is an attempt. I'm presuming all months to be 30 days here, which I know is wrong, and I know the logic for calculating the number of days is wrong. To be honest, I'm almost totally fried and I just can't seem to get this done, hence I'm asking for help.
Am I going about this the wrong way by joining on a date table? Has anyone come up against this before? Also, if you would like me to give more information or clarify, I'm happy to do so.
Any help you can give will be hugely appreciated!
Please see the code below. I've tried it a few different ways, but sadly did not save the older versions to show. They didn't work, though. I've done something similar in the past to see how many "open cases" there were at any given point in time. That inspired the code here and went like this:
SELECT TOP (1000) tMY.Month_Year, COUNT(*) AS ActiveCases
FROM tbl_Casework AS tblCW
LEFT OUTER JOIN tbl_Month_Year AS tMY
  ON tMY.Month_Year >= tblCW.Start_Date
 AND tMY.Month_Year <= DATEADD(day, 31 - DATEPART(day, ISNULL(tblCW.End_Date, GETDATE())), ISNULL(tblCW.End_Date, GETDATE()))
GROUP BY tMY.Month_Year
This definitely worked, but was just a count of "how many cases were open at some point during each month?"
SELECT tMY.Month_Year
,tPL.[Accommodation Provider]
,tSC.[Service_capacity_Total]
-- if started before month began and closed at or after end of month / or still open
,(sum(case when (datediff(day, tPL.[Vacancy Filled Date], [tMY].[MonthYear])<0 AND
(datediff(day, [tMY].[Month_Year], tPL.[Vacancy End Date])>=30) OR tPL.[Vacancy End Date] is null) then 30
-- if started after month began and closed during month
,sum(case when (datediff(day, tPL.[Vacancy Filled Date], [tMY].[MonthYear])>=0 AND
datediff(day, [tMY].[Month_Year], tPL.[Vacancy End Date])<=30) then tPL.[Placement_Length_in_Days]
-- if started before and closed after month - take filled date to end of month
,sum(case when datediff(day, [tMY].[Month_Year], tPL.[Vacancy End Date])>=30 AND datediff(day, tPL.[Vacancy Filled Date], [tMY].[Month_Year])<0 then
datediff(day, tPL.[Vacancy Filled Date], DATEADD(DAY, 30, tMY.Month_Year)) END) / (tSC.[Service_capacity]*30)*100 As [Occupancy Rate]
FROM [tbl_Placements] tPL
inner join tbl_Service_Capacity tSC on tSC.[Service Name] = tPL.[Accommodation Provider]
left outer join tbl_Month_Year tMY ON tMY.MonthYear >= [Vacancy Filled Date] and tMY.MonthYear <= DATEADD(day, 30, tPL.[Vacancy Filled Date])
WHERE tPL.[Vacancy Filled Date] >= '20160501' and tMY.MonthYear < (getdate()-30) AND tSC.[Service Capacity] IS NOT NULL
group by tMY.MonthYear, tPL.[Service Name], tSC.[Service Capacity]--, tPL.[Client Name]
order by tMY.MonthYear Asc
The code runs but I get crazy occupancy rates at 300% or 3% so the figures must be incorrect. The only part I'm sure of is taking the [Placement_Length_in_Days] when it starts and finishes within the time period. The calculations here are wrong, I'm sure of that.
To give you a quick shot, you might try this:
DECLARE @tbl_Placements TABLE
(
[Provider Name] VARCHAR(100),
[Name of Client] VARCHAR(100),
[Vacancy Filled Date] DATE,
[Vacancy End Date] DATE
);
INSERT INTO @tbl_Placements
VALUES ('Accommodation1', 'John Smith', '2018-08-04', '2018-08-12'),
('Accommodation1', 'Jane Smith ', '2019-01-28', '2019-04-09');
SELECT
p.[Provider Name], p.[Name of Client],
DATEADD(DAY, A.Nmbr - 1, p.[Vacancy Filled Date]) AS OccupiedAt
FROM
@tbl_Placements p
CROSS APPLY
(SELECT TOP (DATEDIFF(DAY, p.[Vacancy Filled Date], p.[Vacancy End Date]) + 1)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
master..spt_values) A(Nmbr);
The idea in short:
We use CROSS APPLY to create a joined set per row.
We use a computed TOP clause to get the right count of rows back.
We create a numbers table on the fly, simply by querying any table with enough rows (here I took master..spt_values). We do not need the actual table's content, just a counter we get from ROW_NUMBER().
We return the set together with a running day starting with the first day of occupation and ending with the last day of occupation.
Hint: This would be much easier if you had an existing physical numbers/dates table in your database. You would simply inner join that table with a BETWEEN in the ON clause.
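A sketch of that variant, assuming a physical calendar table named dbo.Calendar with a TheDate column (both names are made up here):
-- one row per occupied day, driven by a persistent calendar table
SELECT p.[Provider Name], p.[Name of Client], c.TheDate AS OccupiedAt
FROM @tbl_Placements p
INNER JOIN dbo.Calendar c
    ON c.TheDate BETWEEN p.[Vacancy Filled Date] AND p.[Vacancy End Date];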
You might read this.

Why are these two SQL date functions not providing the same answer

I've actually already solved the problem, but I'm trying to understand why the problem occurred because as far as I can see it has no reason to happen.
I have a rather large query that I run to prepare a table with some often used combinations. Generally, it only contains 2 years of data. Occasionally I will reconstruct it. While doing this I tweaked the query to add more information, but suddenly the result no longer matched up to the old query. Comparing the old to the new I noticed several missing orders. Amazingly, even after removing the tweaked parts the results still didn't match up.
I ultimately tracked the problem down to my WHERE clause, which was different from how I did it last time.
The orderdate column I filter on has type (datetime, null).
One of the orders that was omitted had this as date:
2018-12-23 20:58:52.383
An order that was included had this as date:
2019-01-28 15:20:49.107
It looks exactly the same to me.
The entire query is the same, except for the WHERE clause. My original where was:
WHERE DATEPART(yyyy,tbOrder.[OrderDate]) >= DATEPART(yyyy,GETDATE()-2)
My new where is now:
WHERE tborder.[OrderDate] >= DATEADD(yy, DATEDIFF(yy, 0, GETDATE())-2, 0)
Any help in understanding why the original where clause drops some lines would be greatly appreciated.
Because you are doing two different things. First predicate,
WHERE DATEPART(yyyy,tbOrder.[OrderDate]) >= DATEPART(yyyy,GETDATE()-2)
This takes all order dates whose year is greater than or equal to the year of two days ago. Notice that the -2 is inside the brackets, so it subtracts two days, not two years.
Second predicate,
WHERE tborder.[OrderDate] >= DATEADD( yy, DATEDIFF( yy, 0, GETDATE() ) - 2, 0)
This takes all order dates on or after January 1st two years ago. DATEDIFF(yy, 0, GETDATE()) returns the difference in years between today and the initial value of the date datatype, which is 1900-01-01. Subtract 2 from that, then add the result back to 1900-01-01. The second expression is therefore of the form:
1898 + ( 201X - 1900 )
where I simplified 1900 - 2 = 1898, which is January 1st of the year 201X - 2.
The two expressions return completely different things, so it shouldn't be a surprise the results are different. The first one returns the current year as a number (or the year of the day before yesterday to be precise). The second one returns January 1st two years ago.
You can put both expressions in a SELECT query to see what they return:
select DATEPART(yyyy,GETDATE()-2), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 2, 0)
The result is:
2019 2017-01-01 00:00:00.000
Both expressions are more complex than they need to be. The first condition will also harm performance because DATEPART(yyyy,tbOrder.[OrderDate]) prevents the server from using any indexes that cover OrderDate.
The question doesn't explain what you actually want to return. If you want to return all rows in the current year you can use:
Where
OrderDate >=DATEFROMPARTS( YEAR(GETDATE()) ,1,1) and
OrderDate < DATEFROMPARTS( YEAR(GETDATE()) + 1,1,1)
The same can be used to find rows two years in the past:
Where
OrderDate >= DATEFROMPARTS( YEAR(GETDATE()) -2 ,1,1) and
OrderDate < DATEFROMPARTS(YEAR(GETDATE()) - 1,1,1)
All rows since January 1st two years ago:
Where OrderDate >= DATEFROMPARTS( YEAR(GETDATE()) -2 ,1,1)
All those queries can take advantage of indexes that cover OrderDate.
Date range queries become a lot easier if you use a Calendar table. A Calendar table is a table that contains e.g. 50 or 100 years' worth of dates, with extra columns for month, month day, week number, day of week, quarter, semester, month and day names, holidays, business reporting periods, formatted short and long dates, etc.
This makes yearly, monthly or weekly queries as easy as joining with the Calendar table and filtering based on the month or period you want.
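A minimal sketch of such a Calendar table (the column set here is illustrative and far smaller than a production calendar table; only the columns the queries below rely on are created):
-- a bare-bones calendar table with date, year, quarter, and month
CREATE TABLE Calendar (
    [Date]    DATE PRIMARY KEY,
    [Year]    INT NOT NULL,
    [Quarter] INT NOT NULL,
    [Month]   INT NOT NULL
);

-- populate roughly 50 years of dates from an ad-hoc numbers source
INSERT INTO Calendar ([Date], [Year], [Quarter], [Month])
SELECT d, DATEPART(year, d), DATEPART(quarter, d), DATEPART(month, d)
FROM (
    SELECT TOP (18262)
           DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1,
                   CONVERT(DATE, '2000-01-01')) AS d
    FROM master..spt_values a
    CROSS JOIN master..spt_values b
) AS x;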
In this case, retrieving rows two years in the past would look like:
From Orders inner Join Calendar on OrderDate=Calendar.Date
Where Calendar.Year=YEAR(GETDATE())-2
That may not look so impressive, but what about Q2 two years ago?
From Orders inner Join Calendar on OrderDate=Calendar.Date
Where Calendar.Year=YEAR(GETDATE())-2 and Quarter=2
Two years ago, same quarter
From Orders inner Join Calendar on OrderDate=Calendar.Date
Where Calendar.Year=YEAR(GETDATE())-2 and Quarter=DATEPART(q,GETDATE())
Retrieving totals for the current quarter for the last two years:
SELECT Calendar.Year, Calendar.Quarter, SUM(Total) AS QuarterTotal
From Orders inner Join Calendar on OrderDate=Calendar.Date
Where Calendar.Year > YEAR(GETDATE())-2 and Quarter=DATEPART(q,GETDATE())
GROUP BY Calendar.Year, Calendar.Quarter

Year over year monthly sales

I am using SQL Server 2008 R2. Here is the query I have that returns monthly sales totals by zip code, per store.
select
left(a.Zip, 5) as ZipCode,
s.Store,
datename(month,s.MovementDate) as TheMonth,
datepart(year,s.MovementDate) as TheYear,
datepart(mm,s.MovementDate) as MonthNum,
sum(s.Dollars) as Sales,
count(*) as [TxnCount],
count(distinct s.AccountNumber) as NumOfAccounts
from
dbo.DailySales s
inner join
dbo.Accounts a on a.AccountNumber = s.AccountNumber
where
s.SaleType = 3
and s.MovementDate > '1/1/2016'
and isnull(a.Zip, '') <> ''
group by
left(a.Zip, 5),
s.Store,
datename(month, s.MovementDate),
datepart(year, s.MovementDate),
datepart(mm, s.MovementDate)
Now I'd like to add columns that compare sales, TxnCount, and NumOfAccounts to the same month the previous year for each zip code and store. I also would like each zip code/store combo to have a record for every month in the range; so zeros if null.
I do have a calendar table that I tried to use to get all months, but I ran into problems because of my "where" statements.
I know that both of these issues (comparing to previous year and including all dates in a date range) have been asked and answered before, and I've gotten them to work before myself, but this particular one has me running in circles. Any help would be appreciated.
I hope this is clear enough.
Thanks,
Tim
Treat the query you have above as a data source. Run it as a CTE for the period you want to report, plus the same period shifted back 12 months (to get the historical data). Call it SalesPerMonth.
Then write a query that gets all the months you need from your calendar table as another CTE. These are the reporting months, not the previous year. Call it MonthsToReport.
Get a list of every valid zip code / store combo, probably a SELECT DISTINCT from the SalesPerMonth CTE; this gives you only combos that have at least one sale in the period (or the historical period, since you probably also want ones that sold last year but not this year). Another CTE: StoreZip.
Finally, your main query cross joins the StoreZip results with MonthsToReport; this gives you the one row per StoreZip/month combo you are looking for. Left join twice to the SalesPerMonth data, once for the month and once for the data one year previous. Use ISNULL to change any null records (no data) to zero.
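A skeletal sketch of that shape, assuming a hypothetical calendar-table column name (CalendarDate) and an extra year of history pulled into SalesPerMonth (neither detail comes from the question):
;WITH SalesPerMonth AS (
    -- the original aggregate, with the date filter pushed back an extra 12 months
    select left(a.Zip, 5) as ZipCode, s.Store,
           datepart(year, s.MovementDate) as TheYear,
           datepart(mm, s.MovementDate) as MonthNum,
           sum(s.Dollars) as Sales,
           count(*) as TxnCount,
           count(distinct s.AccountNumber) as NumOfAccounts
    from dbo.DailySales s
    inner join dbo.Accounts a on a.AccountNumber = s.AccountNumber
    where s.SaleType = 3
      and s.MovementDate > '1/1/2015'
      and isnull(a.Zip, '') <> ''
    group by left(a.Zip, 5), s.Store,
             datepart(year, s.MovementDate), datepart(mm, s.MovementDate)
),
MonthsToReport AS (
    -- reporting months only, taken from the calendar table
    select distinct datepart(year, CalendarDate) as TheYear,
                    datepart(mm, CalendarDate) as MonthNum
    from dbo.Calendar
    where CalendarDate >= '1/1/2016' and CalendarDate < getdate()
),
StoreZip AS (
    select distinct ZipCode, Store from SalesPerMonth
)
select m.TheYear, m.MonthNum, sz.ZipCode, sz.Store,
       isnull(cur.Sales, 0) as Sales,
       isnull(cur.TxnCount, 0) as TxnCount,
       isnull(cur.NumOfAccounts, 0) as NumOfAccounts,
       isnull(prev.Sales, 0) as SalesLastYear,
       isnull(prev.TxnCount, 0) as TxnCountLastYear,
       isnull(prev.NumOfAccounts, 0) as NumOfAccountsLastYear
from MonthsToReport m
cross join StoreZip sz
left join SalesPerMonth cur
       on cur.ZipCode = sz.ZipCode and cur.Store = sz.Store
      and cur.TheYear = m.TheYear and cur.MonthNum = m.MonthNum
left join SalesPerMonth prev
       on prev.ZipCode = sz.ZipCode and prev.Store = sz.Store
      and prev.TheYear = m.TheYear - 1 and prev.MonthNum = m.MonthNum;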
Instead of CTEs, you could also do it as separate queries, storing the results in Temp tables instead. This may work better for large amounts of data.

No Records Returned in Left Join with Inequality in MS Access

Description
Hi,
I have a query using the same table twice in a left join with an inequality, but it does not produce any records, even though I am using a left join. I use MS Access 2013.
Code
The code is:
SELECT DCT01A.*,
DCT01B.*
FROM utb_DCT_01_DailyConversionTrends AS DCT01A
LEFT JOIN utb_DCT_01_DailyConversionTrends AS DCT01B
ON DCT01A.[Hour] = DCT01B.[Hour]
AND DCT01A.[WeekDay] = DCT01B.[WeekDay]
AND DCT01A.[Specification] = DCT01B.[Specification]
AND INT(DCT01A.[Date]) > INT(DCT01B.[Date])
Data
I am expecting (at the moment, though this will change later) that an inner join would result in no records produced. This is because this is only test data and the dates only span two days.
Hour and WeekDay refer to the hour of the day and the day number within the week. Specification is an ID. Date is the date, which includes a time component.
Goal
My goal with this query is to find all previous data for the same week day and hour, without picking up the current record again (hence the inequality).
I realise I can simply run the inequality as >= instead, and then remove the = records afterwards. So I do have a simple workaround; I just can't understand why it won't work when written as above.
Thank you all for looking at this.
Would this work for you?
SELECT * FROM
(
SELECT DCT01A.*,
DCT01B.*
FROM utb_DCT_01_DailyConversionTrends AS DCT01A
LEFT JOIN utb_DCT_01_DailyConversionTrends AS DCT01B
ON DCT01A.[Hour] = DCT01B.[Hour]
AND DCT01A.[WeekDay] = DCT01B.[WeekDay]
AND DCT01A.[Specification] = DCT01B.[Specification]
AND INT(DCT01A.[Date]) >= INT(DCT01B.[Date])
)
WHERE
DCT01A.[Date] <> DCT01B.[Date]
Kindest regards..

Daily counts with TSQL?

I have a site where I record client metrics in a SQL Server 2008 db on every link clicked. I have already written the query to get the daily total clicks, however I want to find out how many times the user clicked within a given timespan (ie. within 5 seconds).
The idea here is to lock out incoming IP addresses that are trying to scrape content. It would be assumed that if more than 5 "clicks" is detected within 5 seconds or the number of daily clicks from a given IP address exceeds some value, that this is a scraping attempt.
I have tried a few variations of the following:
-- when a user clicked more than 5 times in 5 seconds
SELECT DATEADD(SECOND, DATEDIFF(SECOND, 0, ClickTimeStamp), 0) as ClickTimeStamp, COUNT(UserClickID) as [Count]
FROM UserClicks
WHERE DATEDIFF(SECOND, 0, ClickTimeStamp) = 5
GROUP BY IPAddress, ClickTimeStamp
This one in particular returns the following error:
Msg 535, Level 16, State 0, Line 3 The datediff function resulted in
an overflow. The number of dateparts separating two date/time
instances is too large. Try to use datediff with a less precise
datepart.
So once again, I want to use the seconds datepart; I believe I'm on the right track, but I'm not quite getting it.
Help appreciated. Thanks.
-- UPDATE --
Great suggestions, and they helped me see that the approach is wrong. The check is going to be made on every click. What I should do, for a given timestamp, is check whether 5 clicks have been recorded from the same IP address in the last 5 seconds. So it would be something like: count the number of clicks with a timestamp > GetDate() - 5 seconds.
Trying the following still isn't giving me an accurate figure.
SELECT COUNT(*)
FROM UserClicks
WHERE ClickTimeStamp >= GetDate() - DATEADD(SECOND, -5, GetDate())
Hoping my syntax is good; I only have Oracle to test this on. I'm going to assume you have an ID column called user_id that is unique to that user (is it user_click_id? It's helpful to include table create statements in these questions when you can).
You'll have to perform a self join on this one. The logic is: take UserClicks and join it onto itself on user_id = user_id where the difference in ClickTimeStamp is between 0 and 5 seconds. Then count from the subselect.
select uc1.user_id, uc1.ClickTimeStamp, uc2.ClickTimeStamp
from UserClicks uc1
left join UserClicks uc2
on uc2.user_id = uc1.user_id
and datediff(second, uc1.ClickTimeStamp, uc2.ClickTimeStamp) <= 5
and datediff(second, uc1.ClickTimeStamp, uc2.ClickTimeStamp) > 0
This select statement should give you the user_id/clicktimestamp and one row for every record from the same user that is between 0 and 5 seconds after that clicktimestamp. Now it's just a matter of counting all user_id/clicktimestamp combinations and highlighting the ones with 5 or more. Take the above query, turn it into a subselect, and pull counts from it:
select a.user_id, a.ClickTimeStamp, count(1)
from
(select uc1.user_id, uc1.ClickTimeStamp
from UserClicks uc1
left join UserClicks uc2
on uc2.user_id = uc1.user_id
and datediff(second, uc1.ClickTimeStamp, uc2.ClickTimeStamp) <= 5
and datediff(second, uc1.ClickTimeStamp, uc2.ClickTimeStamp) > 0) a
group by a.user_id, a.ClickTimeStamp
having count(1) >= 5
Wish I could verify my syntax on a MS machine....there might be some typo's in there, but the logic should be good.
An answer for your UPDATE: the problem is in the third line of
SELECT COUNT(*)
FROM UserClicks
WHERE ClickTimeStamp >= GetDate() - DATEADD(SECOND, -5, GetDate())
GetDate() - DATEADD(SECOND, -5, GetDate()) is saying "take the current date time and subtract (the current date time minus five seconds)". I'm not entirely sure what kind of value this produces, but it won't be the one you want.
You still want some kind of time period, perhaps like so:
SELECT count(*)
from UserClicks
where IPAddress = @IPAddress
and ClickTimeStamp between dateadd(second, -5, getdate()) and getdate()
I'm a bit uncomfortable using getdate() there--if you have a specific datetime value (accurate to the second), you should probably use it.
Assuming log entries are only entered for current activity -- that is, whenever a new row is inserted, the logged time is for that point in time and never for any prior point in time -- then you should only need to review data for a set period of time, and not have to review "all data" as you are doing now.
Next question is: how frequently do you make this check? If you are concerned with clicks per second, then something between "once per hour" and "once every 24 hours" seems reasonable.
Next up: define your interval. "All clicks per IPAddress within 5 seconds" could go two ways: set window (00-04, 05-09, 10-14, etc.), or sliding window (00-04, 01-05, 02-06, etc.). Probably irrelevant with a 5 second window, but perhaps more relevant for longer periods (clicks per "day").
With that, the general approach I'd take is:
Start with earliest point in time you care about (1 hour ago, 24 hours ago)
Set up "buckets", means by which time windows can be identified (00:00:00 - 00:00:04, 00:00:05 - 00:00:09, etc.). This could be done as a temp table.
For all events, calculate number of elapsed seconds since your earliest point
For each bucket, count number of events that hit that bucket, grouped by IPAddress (inner join on the temp table on seconds between lowValue and highValue)
Identify those that exceed your threshold (having count(*) > X), and defenestrate them.
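A hedged sketch of the set-window version under these assumptions: a one-hour review period, 5-second buckets generated on the fly (rather than in a temp table), and the UserClicks/IPAddress/ClickTimeStamp names from the question:
DECLARE @start DATETIME = DATEADD(HOUR, -1, GETDATE());  -- earliest point in time we care about

;WITH Numbers AS (
    -- 720 five-second buckets cover one hour
    SELECT TOP (720) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
    FROM master..spt_values
),
Buckets AS (
    SELECT DATEADD(SECOND, n * 5, @start)     AS BucketStart,
           DATEADD(SECOND, n * 5 + 5, @start) AS BucketEnd
    FROM Numbers
)
SELECT b.BucketStart, c.IPAddress, COUNT(*) AS Clicks
FROM Buckets b
INNER JOIN UserClicks c
        ON c.ClickTimeStamp >= b.BucketStart
       AND c.ClickTimeStamp <  b.BucketEnd
GROUP BY b.BucketStart, c.IPAddress
HAVING COUNT(*) > 5;   -- more than 5 clicks in one 5-second window
A sliding window would instead compare each click against the 5 seconds preceding it, which is closer to the self-join answer above.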