Getting the nearest instances from the past and the future - postgresql

I've got a table that contains appointment records, and the user selects a date range (begin_date, end_date). I would like to get the appointments that fall in this date range, as well as the closest instances in the past and future that fall outside it (i.e., the previous occurrence and the next occurrence).
I think the best way to approach this is using CTEs and self joins, but I'm open to another strategy. This is my current query:
WITH present AS
(SELECT appt.ewssubject, appt.ewsstart::DATE, appt.ewsend::DATE
FROM appointment appt
WHERE (appt.ewsstart, appt.ewsend) OVERLAPS (begin_date::DATE, end_date::DATE))
SELECT
present.ewssubject, present.ewsstart, present.ewsend,
past.ewssubject AS pastsubject, past.ewsstart::DATE AS paststart,past.ewsend::DATE AS pastend,
future.ewssubject AS futuresubject, future.ewsstart::date AS futurestart, future.ewsend::date AS futureend
FROM present
LEFT JOIN appointment AS past USING (ewssubject)
LEFT JOIN appointment AS future USING (ewssubject)
WHERE
present.ewssubject = past.ewssubject AND
present.ewssubject = future.ewssubject AND
past.ewsend < present.ewsstart AND
future.ewsstart > present.ewsend
ORDER BY present.ewsstart ASC
I'm getting a huge list of appointments, and there are a lot of repeats -- like so:
subject start end last_start last_end next_start next_end
DINNER 2015-09-18 2015-09-18 2015-09-17 2015-09-17 2015-09-19 2015-09-19
DINNER 2015-09-18 2015-09-18 2015-09-17 2015-09-17 2015-09-19 2015-09-19
... // more repeats! :(
All I want to do is reduce the number of duplicates, like this format:
subject start end last_start last_end next_start next_end
DINNER 2015-09-18 2015-09-18 2015-09-17 2015-09-17 2015-09-19 2015-09-19
DINNER 2015-09-21 2015-09-21 2015-09-18 2015-09-18 2015-10-02 2015-10-02
... // and so on
n.b. An appointment can span multiple days.
How can I fix my query? Or is there another one I can write?

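You didn't give details about your data, so I'm not sure whether this is a good fit, but you can probably use window functions. Compute lag/lead over the whole table first and filter to the selected range in an outer query, so the previous and next occurrences can fall outside the range:
select *
from (
    select
        ewssubject, ewsstart, ewsend,
        lag(ewsstart)  over (partition by ewssubject order by ewsstart) as prior_start,
        lag(ewsend)    over (partition by ewssubject order by ewsstart) as prior_end,
        lead(ewsstart) over (partition by ewssubject order by ewsstart) as next_start,
        lead(ewsend)   over (partition by ewssubject order by ewsstart) as next_end
    from appointment
) t
-- begin_date and end_date are the user's selected range, as in the question
where (ewsstart, ewsend) overlaps (begin_date::date, end_date::date)
order by ewsstart;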
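A minimal, self-contained sketch of the lag/lead approach, using Python's built-in sqlite3 module with SQLite (3.25 or later, for window functions) standing in for PostgreSQL. The appointment rows are hypothetical, and the simple inclusive overlap test only approximates PostgreSQL's OVERLAPS rather than reproducing its exact boundary rules. In this sketch the window functions run in an inner query and the date-range filter is applied outside it, so each selected appointment still sees neighbours that lie outside the range.

```python
import sqlite3

# Hypothetical appointments; SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointment (ewssubject TEXT, ewsstart TEXT, ewsend TEXT)")
conn.executemany(
    "INSERT INTO appointment VALUES (?, ?, ?)",
    [
        ("DINNER", "2015-09-17", "2015-09-17"),
        ("DINNER", "2015-09-18", "2015-09-18"),
        ("DINNER", "2015-09-21", "2015-09-21"),
        ("DINNER", "2015-10-02", "2015-10-02"),
    ],
)

QUERY = """
SELECT * FROM (
    -- lag/lead see every appointment of the subject, not just the filtered ones
    SELECT ewssubject, ewsstart, ewsend,
           LAG(ewsstart)  OVER (PARTITION BY ewssubject ORDER BY ewsstart) AS prior_start,
           LAG(ewsend)    OVER (PARTITION BY ewssubject ORDER BY ewsstart) AS prior_end,
           LEAD(ewsstart) OVER (PARTITION BY ewssubject ORDER BY ewsstart) AS next_start,
           LEAD(ewsend)   OVER (PARTITION BY ewssubject ORDER BY ewsstart) AS next_end
    FROM appointment
) AS t
-- inclusive overlap test with the selected range, applied after the window functions
WHERE ewsstart <= :end_date AND ewsend >= :begin_date
ORDER BY ewsstart
"""


def appointments_with_neighbours(begin_date, end_date):
    """Return appointments overlapping [begin_date, end_date] with their previous/next occurrence."""
    params = {"begin_date": begin_date, "end_date": end_date}
    return conn.execute(QUERY, params).fetchall()


for row in appointments_with_neighbours("2015-09-18", "2015-09-21"):
    print(row)
```

Each appointment in the range comes back once, with its previous and next occurrence in the same row; ISO-formatted date strings compare correctly as text, which is why plain `<=`/`>=` works here.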

Related

How to get this query logic (instead of using Checksum)?

I have been struggling to get the right data using a checksum for the last 15+ days, and now I am trying to find another way.
I am trying to output any data that changed between the previous day's file and today's file in the HOUR (not the minute) of the punch card's punch_start, caused by an unexpected time-zone hour change.
Please see the sample data below.
Dataset1 (Yesterday's file):
chcecksum person_id applied_date punch_start punch_end punch_hours
-1552866149 650067 2022-09-04 2022-09-04T20:11:00Z 2022-09-04T22:52:00Z 2.68333333333333
-1367087212 650067 2022-09-04 2022-09-04T22:52:00Z 2022-09-04T23:26:00Z 0.566666666666667
Dataset2 (Today's file):
chcecksum person_id applied_date punch_start punch_end punch_hours
-1564056421 650067 2022-09-04 2022-09-04T20:11:00Z 2022-09-04T22:52:00Z 2.683333333
-1470176798 650067 2022-09-04 2022-09-04T20:52:00Z 2022-09-04T23:26:00Z 0.566666667
So what I am trying to do is: if there is any change of HOUR (as in this example) on punch_start only, notify me (or select those rows).
In this case, the second entry changed from 22:52:00Z to 20:52:00Z.
A checksum does not work, because any other change, such as 2.683333333 to 2.68333 (without a change of punch_start), still produces a different checksum value.
The challenge is finding a unique ID for the corresponding entries of the two datasets, and that has been a struggle for me.
I have been using something like the following to create a unique ID for each entry:
,concat(
[person_id],
[applied_date] ,
[punch_hours],
datepart(minute, convert(datetime, cast([punch_start] as datetime), 112))
)
But it still gives me a lot of duplicates, because if somebody works from
9:00 AM -- 12:00 PM &
1:00 PM -- 5:00 PM on the same day,
it creates duplicates, since they work on the same [applied_date] with the same [punch_hours] and the same [min].
How do we tackle this?
Have you looked at using EXCEPT?
-- Prep data
select *
INTO #yesterday
from (values
(-1552866149 ,650067 , '2022-09-04', cast('2022-09-04T20:11:00Z' as datetime), cast('2022-09-04T22:52:00Z' as datetime) , 2.68333333333333 ),
(-1367087212 ,650067 , '2022-09-04', cast('2022-09-04T22:52:00Z' as datetime), cast('2022-09-04T23:26:00Z' as datetime) , 0.566666666666667)
)t1(chcecksum ,person_id ,applied_date ,punch_start ,punch_end ,punch_hours)
select *
INTO #today
from (values
(-1564056421 , 650067 ,'2022-09-04', cast('2022-09-04T20:11:00Z' as datetime), cast('2022-09-04T22:52:00Z' as datetime), 2.683333333),
(-1470176798 , 650067 ,'2022-09-04', cast('2022-09-04T20:52:00Z' as datetime), cast('2022-09-04T23:26:00Z' as datetime), 0.566666667)
)t2(chcecksum ,person_id ,applied_date ,punch_start ,punch_end ,punch_hours)
-- output
select
person_id,
applied_date,
punch_end,
Round(punch_hours, 4) as punch_hours, -- hope this is acceptable
datepart(HH, punch_start) as punch_start_hour, -- only looking for changes to HOUR
format(punch_start, 'yyyy-MM-dd XX:mm') as punch_start_hourless -- mask the hour with XX so the rest of the datetime can still be compared
from #yesterday
except
select
person_id,
applied_date,
punch_end,
Round(punch_hours, 4) as punch_hours,
datepart(HH, punch_start) as punch_start_hour,
format(punch_start, 'yyyy-MM-dd XX:mm') as punch_start_hourless
from #today
Wrap the 'output' query in this if you want to get the original values back (minus the checksum):
SELECT
person_id
,applied_date
,Cast(REPLACE(punch_start_hourless, 'XX', RIGHT('0' + CAST(punch_start_hour AS varchar(2)), 2)) as Datetime) as punch_start -- zero-pad hours below 10
,punch_end
,punch_hours
FROM (
-- insert query from above
) sub
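The same EXCEPT idea can be tried outside SQL Server with a short sketch using Python's sqlite3 module. SQLite has no DATEPART or FORMAT, so strftime() takes their place here: '%H' extracts the hour, and a format string containing a literal XX masks it. Plain tables named yesterday and today stand in for the #temp tables; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = "chcecksum, person_id, applied_date, punch_start, punch_end, punch_hours"
for table in ("yesterday", "today"):
    conn.execute(f"CREATE TABLE {table} ({cols})")

conn.executemany(
    "INSERT INTO yesterday VALUES (?, ?, ?, ?, ?, ?)",
    [
        (-1552866149, 650067, "2022-09-04", "2022-09-04T20:11:00Z", "2022-09-04T22:52:00Z", 2.68333333333333),
        (-1367087212, 650067, "2022-09-04", "2022-09-04T22:52:00Z", "2022-09-04T23:26:00Z", 0.566666666666667),
    ],
)
conn.executemany(
    "INSERT INTO today VALUES (?, ?, ?, ?, ?, ?)",
    [
        (-1564056421, 650067, "2022-09-04", "2022-09-04T20:11:00Z", "2022-09-04T22:52:00Z", 2.683333333),
        (-1470176798, 650067, "2022-09-04", "2022-09-04T20:52:00Z", "2022-09-04T23:26:00Z", 0.566666667),
    ],
)

# Both sides drop the checksum, round punch_hours, and split punch_start
# into its hour and an hour-masked remainder.
SIDE = """
SELECT person_id, applied_date, punch_end,
       ROUND(punch_hours, 4)                   AS punch_hours,
       strftime('%H', punch_start)             AS punch_start_hour,
       strftime('%Y-%m-%d XX:%M', punch_start) AS punch_start_hourless
FROM {table}
"""

# Rows from yesterday's file with no identical counterpart in today's file.
changed = conn.execute(
    SIDE.format(table="yesterday") + " EXCEPT " + SIDE.format(table="today")
).fetchall()
print(changed)
```

Only the second punch comes back, with hour 22 and the masked value 2022-09-04 XX:52. Swapping the two SELECTs around lists the same punch as it appears in today's file (hour 20) instead.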
You can use a FULL OUTER JOIN to identify rows that exist in one table but not in the other:
select *
from Dataset1 d1
full outer join Dataset2 d2 on d1.person_id = d2.person_id
and d1.applied_date = d2.applied_date
and d1.punch_start = d2.punch_start
-- keep only the rows without a match on the other side
where d1.person_id is null or d2.person_id is null
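SQLite only gained FULL OUTER JOIN in version 3.39, so this sketch (Python's sqlite3 module, with hypothetical tables dataset1 and dataset2 holding the question's rows) emulates it with two LEFT JOINs combined by UNION ALL. Because punch_start is part of the join key, a punch whose start hour changed fails to match and shows up once from each side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("dataset1", "dataset2"):
    conn.execute(f"CREATE TABLE {table} (person_id, applied_date, punch_start, punch_end)")

conn.executemany(
    "INSERT INTO dataset1 VALUES (?, ?, ?, ?)",
    [
        (650067, "2022-09-04", "2022-09-04T20:11:00Z", "2022-09-04T22:52:00Z"),
        (650067, "2022-09-04", "2022-09-04T22:52:00Z", "2022-09-04T23:26:00Z"),
    ],
)
conn.executemany(
    "INSERT INTO dataset2 VALUES (?, ?, ?, ?)",
    [
        (650067, "2022-09-04", "2022-09-04T20:11:00Z", "2022-09-04T22:52:00Z"),
        (650067, "2022-09-04", "2022-09-04T20:52:00Z", "2022-09-04T23:26:00Z"),
    ],
)

JOIN_KEY = """d1.person_id = d2.person_id
          AND d1.applied_date = d2.applied_date
          AND d1.punch_start = d2.punch_start"""

# First half: rows only in dataset1; second half: rows only in dataset2.
unmatched = conn.execute(f"""
    SELECT d1.person_id, d1.punch_start AS old_start, d2.punch_start AS new_start
    FROM dataset1 d1 LEFT JOIN dataset2 d2 ON {JOIN_KEY}
    WHERE d2.person_id IS NULL
    UNION ALL
    SELECT d2.person_id, d1.punch_start, d2.punch_start
    FROM dataset2 d2 LEFT JOIN dataset1 d1 ON {JOIN_KEY}
    WHERE d1.person_id IS NULL
""").fetchall()
print(unmatched)
```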

Get postgres query log statement and duration as one record

I have log_min_duration_statement=0 in config.
When I check log file, sql statement and duration are saved into different rows.
(I'm not sure what I have wrong, but the statement and duration are not saved together the way this answer describes.)
As I understand it, the session_line_num of a duration record always equals the session_line_num of the relevant statement + 1, within the same session of course.
Is this correct? Is the query below reliable for getting each statement together with its duration in one row?
(csv log imported into postgres_log table):
WITH
sql_cte AS(
SELECT session_id, session_line_num, message AS sql_statement
FROM postgres_log
WHERE
message LIKE 'statement%'
)
,durat_cte AS (
SELECT session_id, session_line_num, message AS duration
FROM postgres_log
WHERE
message LIKE 'duration%'
)
SELECT
t1.session_id,
t1.session_line_num,
t1.sql_statement,
t2.duration
FROM sql_cte t1
LEFT JOIN durat_cte t2
ON t1.session_id = t2.session_id AND t1.session_line_num + 1 = t2.session_line_num;
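To see what this self-join returns, here is a sketch that runs the query above (with an ORDER BY added) against a few hypothetical rows in a postgres_log table, via Python's sqlite3 module. It only demonstrates the pairing when the +1 assumption holds; it does not show that PostgreSQL's csvlog always writes the duration line immediately after its statement. A statement without a matching duration line still appears, with NULL for duration, because of the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postgres_log (session_id TEXT, session_line_num INTEGER, message TEXT)")
# Hypothetical log lines: session 'b' has a statement without a duration line.
conn.executemany(
    "INSERT INTO postgres_log VALUES (?, ?, ?)",
    [
        ("a", 1, "statement: SELECT 1"),
        ("a", 2, "duration: 0.050 ms"),
        ("a", 3, "statement: SELECT 2"),
        ("a", 4, "duration: 0.030 ms"),
        ("b", 1, "statement: SELECT 3"),
    ],
)

QUERY = """
WITH
sql_cte AS (
    SELECT session_id, session_line_num, message AS sql_statement
    FROM postgres_log
    WHERE message LIKE 'statement%'
),
durat_cte AS (
    SELECT session_id, session_line_num, message AS duration
    FROM postgres_log
    WHERE message LIKE 'duration%'
)
SELECT t1.session_id, t1.session_line_num, t1.sql_statement, t2.duration
FROM sql_cte t1
LEFT JOIN durat_cte t2
       ON t1.session_id = t2.session_id
      AND t1.session_line_num + 1 = t2.session_line_num
ORDER BY t1.session_id, t1.session_line_num
"""

for row in conn.execute(QUERY):
    print(row)
```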

Trying to isolate the hours given a date range (YYYY-MM-DD HH:MM:SS) in SQL and group them by specific hour intervals regardless of the date

I am struggling to extract information out of the database I created in SQL. The views work great and all data is displayed, but I am trying to isolate the following:
Isolate time frames from 07:00:00 to 09:00:00.
Still new to coding, so help is appreciated.
SELECT ch.name,
t.date,
t.amount,
t.card AS "Credit Card",
t.id_merchant,
m.name AS "Merchant",
mc.name AS "merchant category"
FROM transaction AS t
JOIN credit_card AS cc
ON (t.card = cc.card)
JOIN card_holder AS ch
ON (cc.cardholder_id = ch.id)
JOIN merchant AS m
ON (t.id_merchant = m.id)
JOIN merchant_category AS mc
ON (m.id_merchant_category = mc.id);
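One way to isolate a time-of-day window regardless of the date is to compare only the time part of each timestamp, then group by the hour. This is a sketch with Python's sqlite3 module and a hypothetical, cut-down "transaction" table (only date and amount), where SQLite's time() and strftime('%H', ...) extract the time and the hour; in PostgreSQL, t.date::time and EXTRACT(HOUR FROM t.date) play the same roles. BETWEEN is inclusive, so a transaction at exactly 09:00:00 would be counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "transaction" is an SQL keyword, so the table name is quoted.
conn.execute('CREATE TABLE "transaction" (id INTEGER PRIMARY KEY, date TEXT, amount REAL)')
conn.executemany(
    'INSERT INTO "transaction" (date, amount) VALUES (?, ?)',
    [
        ("2018-01-01 07:15:00", 10.0),
        ("2018-02-03 07:45:00", 4.0),
        ("2018-02-03 08:59:59", 5.0),
        ("2018-03-04 10:30:00", 99.0),  # outside the window
        ("2018-05-05 06:59:59", 1.0),   # outside the window
    ],
)

# Keep 07:00:00 to 09:00:00 on any day, then group by the hour of day.
rows = conn.execute("""
    SELECT strftime('%H', t.date) AS hour_of_day,
           COUNT(*)               AS n_transactions,
           SUM(t.amount)          AS total_amount
    FROM "transaction" AS t
    WHERE time(t.date) BETWEEN '07:00:00' AND '09:00:00'
    GROUP BY hour_of_day
    ORDER BY hour_of_day
""").fetchall()
print(rows)
```

In the question's query, the same WHERE condition would go after the last JOIN, using whatever time extraction the actual database provides.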

How to select first and last records between certain date parameters?

I need a query that extracts only the first and last instances between two date parameters.
I have a table recording financial information, with a financialyearenddate field, linked to the Company table via companyID. Each company is also linked to the programme table and can have multiple programmes. I have a report that pulls the financials for each company on a certain programme, which I have adjusted to pull only the first and last instances (using MIN and MAX). However, I need the first instance after a certain date parameter and the last instance before a certain date parameter.
Example: Company ABloggs has financials for 1999, 2000, 2001, 2004, 2006, 2007 and 2009, but the programme ran from 2001 to 2007, so I only want the first and last financial records between those years, i.e. the 2001 and 2007 records. Any help appreciated.
At the moment I am using two queries because I needed the data in a hurry, but I need a single query that returns only records whose financial year end dates fall between the parameters, and only for companies with a minimum of two GVA records.
Query1:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(
SELECT
ccx_companyname,
MAX(ccx_financialyearenddate) AS LatestDate
FROM Filteredccx_gva AS Filteredccx_gva_1
GROUP BY ccx_companyname
) AS min_1
INNER JOIN Filteredccx_gva AS gva
ON min_1.ccx_companyname = gva.ccx_companyname AND
min_1.LatestDate = gva.ccx_financialyearenddate
WHERE (gva.ccx_status = ACTUAL)
Query2:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(
SELECT
ccx_companyname,
MIN(ccx_financialyearenddate) AS FirstDate
FROM Filteredccx_gva AS Filteredccx_gva_1
GROUP BY ccx_companyname
) AS MAX_1
INNER JOIN Filteredccx_gva AS gva
ON MAX_1.ccx_companyname = gva.ccx_companyname AND
MAX_1.FirstDate = gva.ccx_financialyearenddate
WHERE (gva.ccx_status = ACTUAL)
Can't you just add a WHERE clause using the first and last date parameters? Something like this:
SELECT <companyId>, MIN(<date>), MAX(<date>)
FROM <table>
WHERE <date> BETWEEN @firstDate AND @lastDate
GROUP BY <companyId>
A self-contained example using table variables, with the programme's start and end years as the bounds:
declare @programme table (ccx_companyname varchar(max), start_year int, end_year int);
insert @programme values
('ABloggs', 2001, 2007);
declare @companies table (ccx_companyname varchar(max), ccx_financialyearenddate int);
insert @companies values
('ABloggs', 1999)
,('ABloggs', 2000)
,('ABloggs', 2001)
,('ABloggs', 2004)
,('ABloggs', 2006)
,('ABloggs', 2007)
,('ABloggs', 2009);
select c.ccx_companyname, min(ccx_financialyearenddate), max(ccx_financialyearenddate)
from @companies c
join @programme p on c.ccx_companyname = p.ccx_companyname
where c.ccx_financialyearenddate >= p.start_year and c.ccx_financialyearenddate <= p.end_year
group by c.ccx_companyname
having count(*) > 1;
You can combine your two original queries into a single query by including the MIN and MAX aggregates in the same GROUP BY query of the derived table. Also including COUNT(*) with HAVING COUNT(*) > 1 ensures each company has at least two dates. So the query should look like this:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(SELECT
ccx_companyname,
ccx_status,
MIN(ccx_financialyearenddate) AS FirstDate,
MAX(ccx_financialyearenddate) AS LastDate,
COUNT(*) AS NumDates
FROM Filteredccx_gva AS Filteredccx_gva_1
WHERE (ccx_status = ACTUAL)
GROUP BY ccx_companyname, ccx_status
HAVING COUNT(*) > 1
) AS MinMax
INNER JOIN Filteredccx_gva AS gva
ON MinMax.ccx_companyname = gva.ccx_companyname AND
(MinMax.FirstDate = gva.ccx_financialyearenddate OR
MinMax.LastDate = gva.ccx_financialyearenddate)
WHERE (gva.ccx_status = MinMax.ccx_status)
ORDER BY gva.ccx_companyname, gva.ccx_financialyearenddate
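A runnable sketch of the combined query with the date parameters applied as well, using Python's sqlite3 module and a cut-down, hypothetical gva table (years stored as integers, only the columns the GVA sum needs, and no status filter). The year bounds go into the derived table's WHERE clause, so MIN and MAX pick the first and last records inside the programme years, and HAVING COUNT(*) > 1 drops companies with fewer than two records in that window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE gva (
    ccx_companyname TEXT, ccx_financialyearenddate INTEGER,
    ccx_totalwages INTEGER, ccx_netprofitbeforetax INTEGER, ccx_depreciation INTEGER)""")
rows = [("ABloggs", year, 100, 20, 5) for year in (1999, 2000, 2001, 2004, 2006, 2007, 2009)]
rows += [("OnlyOne", 1998, 50, 5, 1), ("OnlyOne", 2005, 50, 5, 1)]  # one record in range
conn.executemany("INSERT INTO gva VALUES (?, ?, ?, ?, ?)", rows)

QUERY = """
SELECT gva.ccx_companyname,
       gva.ccx_financialyearenddate,
       gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA
FROM (
    SELECT ccx_companyname,
           MIN(ccx_financialyearenddate) AS FirstDate,
           MAX(ccx_financialyearenddate) AS LastDate
    FROM gva
    WHERE ccx_financialyearenddate BETWEEN :first_year AND :last_year
    GROUP BY ccx_companyname
    HAVING COUNT(*) > 1
) AS MinMax
JOIN gva
  ON MinMax.ccx_companyname = gva.ccx_companyname
 AND gva.ccx_financialyearenddate IN (MinMax.FirstDate, MinMax.LastDate)
ORDER BY gva.ccx_companyname, gva.ccx_financialyearenddate
"""

result = conn.execute(QUERY, {"first_year": 2001, "last_year": 2007}).fetchall()
print(result)
```

ABloggs returns its 2001 and 2007 rows, matching the example in the question, while the hypothetical OnlyOne company is filtered out by the HAVING clause.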

TSQL CTE Error: Incorrect syntax near ')'

I am developing a TSQL stored proc using SSMS 2008 and am receiving the above error while generating a CTE. I want to add logic to this SP to return every day, not just the days with data. How do I do this? Here is my SP so far:
ALTER Proc [dbo].[rpt_rd_CensusWithChart]
@program uniqueidentifier = NULL,
@office uniqueidentifier = NULL
AS
DECLARE @a_date datetime
SET @a_date = case when MONTH(GETDATE()) >= 7 THEN '7/1/' + CAST(YEAR(GETDATE()) AS VARCHAR(30))
ELSE '7/1/' + CAST(YEAR(GETDATE())-1 AS VARCHAR(30)) END
if exists (
select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#ENROLLEES')
) DROP TABLE #ENROLLEES;
if exists (
select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#DISCHARGES')
) DROP TABLE #DISCHARGES;
declare @sum_enrollment int
set @sum_enrollment =
(select sum(1)
from enrollment_view A
join enrollment_info_expanded_view C on A.enrollment_id = C.enroll_el_id
where
(@office is NULL OR A.group_profile_id = @office)
AND (@program is NULL OR A.program_info_id = @program)
and (C.pe_end_date IS NULL OR C.pe_end_date > @a_date)
AND C.pe_start_date IS NOT NULL and C.pe_start_date < @a_date)
select
A.program_info_id as [Program code],
A.[program_name],
A.profile_name as Facility,
A.group_profile_id as Facility_code,
A.people_id,
1 as enrollment_id,
C.pe_start_date,
C.pe_end_date,
LEFT(datename(month,(C.pe_start_date)),3) as a_month,
day(C.pe_start_date) as a_day,
@sum_enrollment as sum_enrollment
into #ENROLLEES
from enrollment_view A
join enrollment_info_expanded_view C on A.enrollment_id = C.enroll_el_id
where
(@office is NULL OR A.group_profile_id = @office)
AND (@program is NULL OR A.program_info_id = @program)
and (C.pe_end_date IS NULL OR C.pe_end_date > @a_date)
AND C.pe_start_date IS NOT NULL and C.pe_start_date >= @a_date
;WITH #ENROLLEES AS (
SELECT '7/1/11' AS dt
UNION ALL
SELECT DATEADD(d, 1, pe_start_date) as dt
FROM #ENROLLEES s
WHERE DATEADD(d, 1, pe_start_date) <= '12/1/11')
The most obvious issue (and probably the one that causes the error message too) is the absence of the actual statement to which the last CTE is supposed to pertain. I presume it should be a SELECT statement, one that would combine the result set of the CTE with the data from the #ENROLLEES table.
And that's where another issue emerges.
You see, apart from the fact that a name starting with a single # is hardly advisable for anything that is not a local temporary table (and a CTE is indeed not a table), you've also given your CTE a name that already belongs to an existing table (more precisely, the #ENROLLEES temporary table mentioned above), which is also the table you are going to pull data from. You should definitely not reuse an existing table's name for a CTE, or you will not be able to join that table with the CTE because of the name conflict.
It also appears that, based on its code, the last CTE represents an unfinished implementation of the logic you say you want to add to the SP. I can suggest some idea, but before I go on I'd like you to realise that there are actually two different requests in your post. One is about finding the cause of the error message, the other is about code for a new logic. Generally you are probably better off separating such requests into distinct questions, and so you might be in this case as well.
Anyway, here's my suggestion:
1. build a complete list of dates you want to be accounted for in the result set (that's what the CTE will be used for);
2. left-join that list with the #ENROLLEES table to pick data for the existing dates and some defaults or NULLs for the non-existing ones.
It might be implemented like this:
… /* all your code up until the last WITH */
;
WITH cte AS (
SELECT CAST('7/1/11' AS date) AS dt
UNION ALL
SELECT DATEADD(d, 1, dt) as dt
FROM cte
WHERE dt < '12/1/11'
)
SELECT
cte.dt,
tmp.[Program code],
tmp.[program_name],
… /* other columns as necessary; you might also consider
enveloping some or all of the "tmp" columns in ISNULLs,
like in
ISNULL(tmp.[Program code], '(none)') AS [Program code]
to provide default values for absent data */
FROM cte
LEFT JOIN #ENROLLEES tmp ON cte.dt = tmp.pe_start_date
OPTION (MAXRECURSION 0) -- the date list needs more than the default 100 recursion levels
;
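The same build-a-calendar-then-LEFT-JOIN pattern can be tried in SQLite through Python's sqlite3 module. This sketch uses a cut-down, hypothetical enrollees table with only people_id and pe_start_date, and counts enrollments per day instead of listing the columns, so days without data show a count of 0 rather than disappearing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollees (people_id INTEGER, pe_start_date TEXT)")
conn.executemany(
    "INSERT INTO enrollees VALUES (?, ?)",
    [(1, "2011-07-01"), (2, "2011-07-01"), (3, "2011-07-03")],
)

rows = conn.execute("""
    WITH RECURSIVE cte(dt) AS (
        SELECT date('2011-07-01')              -- first day of the range
        UNION ALL
        SELECT date(dt, '+1 day') FROM cte     -- next day
        WHERE dt < '2011-12-01'                -- stop once the last day is reached
    )
    SELECT cte.dt, COUNT(e.people_id) AS enrollments  -- 0 when no row matches
    FROM cte
    LEFT JOIN enrollees AS e ON cte.dt = e.pe_start_date
    GROUP BY cte.dt
    ORDER BY cte.dt
""").fetchall()

print(len(rows), rows[:3])
```

COUNT(e.people_id) ignores the NULLs produced by the LEFT JOIN, which is what turns a missing day into a 0 instead of a 1.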