Report - SQL group by week number and year - tsql

I have a table temperatures with columns mac_address varchar(255), tm datetime,
temperature float.
I'd like to create a report with 3 parameters: mac_address, week_number and year.
The report should show the maximum temperature for a given mac_address in a given week_number (01, ... 50, ...) of a given year. There may be more than one row for a given week_number and year.
The SQL Query for the dataset could be something like
select max(temperature), mac_address, tm
from temperatures
group by mac_address
having mac_address = @mac_address and week_number = @week_number and year = @year
Do you know how to construct the query correctly? Maybe I will need 3 more datasets.
For the @mac_address parameter it is easy. I will use
select distinct mac_address from temperatures
But how can I do it for the @week_number and @year parameters? Is the only option to add the values for the dropdown list manually?
The possible result may be:
When the user selects the parameters
mac_address 001, week_number 47 and year 2019
max_temperature | tm
27.8 | t1
27.8 | t2
27.8 | t3
Now it returns 3 rows. Most of the time there will be only one row.

From what I understand from your question, you want to dynamically search by weeknumber, mac_address and year returning the highest temperature for each day matching in that range. A query like the following should do what you are after.
declare @macaddress varchar(255) = '2', @WeekNumber int = 36, @year int = 2019
Select
Date = tm,
Temperature = max(temperature)
from
Temps t
where
year(tm)=@year
and mac_address=@macaddress
and datepart(week,tm) = @WeekNumber
group by
tm
If you want the mac_address included in the results, simply add to the Select and Group By sections.
Here's the SQL fiddle with a test setup.

For a data set to build the year parameter you could use something like this, which will get you 11 years (the current year plus five on either side). You can expand as needed:
;WITH years AS (
SELECT YEAR(DATEADD(YY,-5,GETDATE())) AS yr
UNION ALL
SELECT yr + 1
FROM years
WHERE yr < YEAR(DATEADD(YY,5,GETDATE()))
)
SELECT *
FROM years
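For the weeks parameter you could hard-code them or use something like the following. Note that DATEPART(week, ...) can return 53 for dates at the very end of a year, so raise the upper bound if you need that week: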
;WITH weeks AS (
SELECT 1 AS wk
UNION ALL
SELECT wk + 1
FROM weeks
WHERE wk < 52
)
SELECT *
FROM weeks
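As a quick cross-check of what those two parameter lists should contain, here is a minimal Python sketch. The function names are hypothetical, and it uses ISO week numbering (comparable to DATEPART(ISO_WEEK, ...), not to DATEPART(week, ...)), so treat it as an illustration rather than a drop-in equivalent.

```python
from datetime import date


def year_choices(today, span=5):
    """Years from `span` back to `span` ahead, inclusive (2 * span + 1 values)."""
    return list(range(today.year - span, today.year + span + 1))


def iso_weeks_in_year(year):
    """Number of ISO weeks in `year`; 28 December always falls in the last ISO week."""
    return date(year, 12, 28).isocalendar()[1]


def week_choices(year):
    """Week numbers 1..52 or 1..53, depending on the year (ISO numbering)."""
    return list(range(1, iso_weeks_in_year(year) + 1))


years = year_choices(date(2019, 11, 20))
print(years[0], years[-1], len(years))  # 2014 2024 11
print(len(week_choices(2019)), len(week_choices(2020)))  # 52 53
```

The point of the sketch is that the number of weeks depends on the year and on the numbering scheme, so a fixed 1 to 52 list may miss a valid value.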
In your main SQL data set you would want to use something like:
-- the table has no week_number or year column, so derive both from tm
select max(temperature), mac_address, tm
from temperatures
where mac_address = @mac_address
and datepart(week, tm) = @week_number
and year(tm) = @year
group by mac_address, tm
Edit: remove tm from the SELECT and GROUP BY to get a single maximum for the whole week
select max(temperature), mac_address
from temperatures
where mac_address = @mac_address
and datepart(week, tm) = @week_number
and year(tm) = @year
group by mac_address
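The question's sample output lists every timestamp at which the weekly maximum occurred, which the aggregate alone does not return. As a sketch of that logic outside SQL, the Python below filters readings by device, year and week and then keeps the rows that hit the maximum. The function name and the sample readings are made up for illustration, and it uses ISO years and weeks, which can differ from year(tm) and DATEPART(week, tm) around New Year.

```python
from datetime import datetime


def weekly_max_rows(readings, mac_address, year, week_number):
    """Return (max_temperature, tm) for every reading that hits the weekly maximum.

    `readings` is an iterable of (mac_address, tm, temperature) tuples.
    Assumption: year and week follow ISO numbering.
    """
    in_week = [
        (tm, temp)
        for mac, tm, temp in readings
        if mac == mac_address
        and tuple(tm.isocalendar())[:2] == (year, week_number)
    ]
    if not in_week:
        return []
    top = max(temp for _, temp in in_week)
    return [(top, tm) for tm, temp in in_week if temp == top]


# Hypothetical sample data: two readings tie for the maximum in week 47 of 2019.
readings = [
    ("001", datetime(2019, 11, 18, 8, 0), 27.8),
    ("001", datetime(2019, 11, 20, 14, 30), 27.8),
    ("001", datetime(2019, 11, 21, 9, 15), 25.1),
    ("001", datetime(2019, 11, 26, 9, 0), 30.0),  # week 48, ignored
    ("002", datetime(2019, 11, 19, 9, 0), 31.2),  # other device, ignored
]

for row in weekly_max_rows(readings, "001", 2019, 47):
    print(row)
```

In SQL the same two-step shape would be a filtered subquery for the maximum joined back to the filtered rows.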

Related

How can I, in T-SQL, examine date intervals to remove overlapping intervals before adding totals together

I am running an analysis on medication prescribing practices. We want to identify whether someone has been on a class of medications for 60 days out of a 90 day quarter. We have a start and end date for each prescription, and the bounds of the quarter (e.g., 4/1/2022 – 6/30/2022). For each prescription I’ve calculated the number of days between the start and end date (only including days that fall within the bounds of the quarter). There are many instances in which multiple drugs within the same class are prescribed: someone might try one antidepressant but not like it, and so be given another in the same class.
My original strategy was just to total up number of days for each class of medication and see if it’s 60 or over. The days don’t have to be consecutive, but if they overlap, days during an overlap period shouldn’t count twice (which they would in a simple sum).
For instance in the data table below, patient 1 in row 1 should be included as they are over 60 days. Patient 2 should also get in (rows 2 and 3) because the non-overlapping total (57+8) within the same med class gets them to over 60 days. However, patient 3 should NOT get in, even though the total of 32 + 32 is over 60 because the intervals overlap. This means that they were really on the medication class for only 32 days – this is an instance where someone might be on two different antidepressants simultaneously.
It’s not sufficient to just sum the days in the interval, but I also have to include some way to examine whether the intervals are overlapping and only add days if an interval for a given medication class falls outside another interval for that same class.
Row num | Patid | Med class | Start date | End date   | Interval
1       | 1     | A         | 2022-04-28 | 2022-09-12 | 63
2       | 2     | B         | 2022-05-03 | 2022-06-29 | 57
3       | 2     | B         | 2022-04-21 | 2022-04-29 | 8
4       | 3     | A         | 2022-01-19 | 2022-05-03 | 32
5       | 3     | A         | 2022-01-19 | 2022-05-03 | 32
I’m having a hard time figuring out how to do this. Note, I'm limited to just using SQL for this.
Code that produced the above data. I would embed this in another query to generate a total interval but need to deal with the overlap issue.
DECLARE @startdt DATE;
DECLARE @enddt DATE;
SET @startdt='4/1/2022'
SET @enddt='6/30/2022'
--for q4 fy2022-23 (4/1/2022-6/30/2022)
SELECT DISTINCT
rx.patid, d.medication_category as medcat, start_date, end_date,
-- case statement to capture days within quarter only
CASE WHEN start_date<@startdt and end_date>@enddt then 90
WHEN start_date<@startdt and end_date>=@startdt then datediff(d,@startdt,end_date)
WHEN start_date>=@startdt and end_date>@enddt then datediff(d,start_date,@enddt)
ELSE datediff(d,start_date,end_date)
END as interval
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
You can accomplish what you want by generating a calendar table (using a Common Table Expression) of individual days within the test range, joining those days with the prescriptions with overlapping days, and then counting distinct days for each patient and medication category combination.
Something like:
DECLARE @startdt DATE = '2022-04-01';
DECLARE @enddt DATE = '2022-06-30';
DECLARE @threshold INT = 60;
WITH Days AS (
SELECT @startdt AS Day
UNION ALL
SELECT DATEADD(day, 1, Day)
FROM Days
WHERE Day < @enddt
)
SELECT
rx.patid, d.medication_category as medcat,
COUNT(DISTINCT DD.Day) AS days_medicated,
MIN(DD.Day) AS start_date,
MAX(DD.Day) AS end_date
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname = d.drugname
INNER JOIN Days DD
ON DD.Day BETWEEN rx.start_date AND rx.end_date
WHERE rx.start_date <= @enddt AND @startdt <= rx.end_date
GROUP BY rx.patid, d.medication_category
HAVING COUNT(DISTINCT DD.Day) >= @threshold
ORDER BY rx.patid, start_date;
If using SQL Server 2022 or later, the Days generator can be simplified by using the new GENERATE_SERIES() function:
WITH Days AS (
SELECT DATEADD(day, S.value, @startdt) AS Day
FROM GENERATE_SERIES(0, DATEDIFF(day, @startdt, @enddt)) S
)
See this db<>fiddle for an example with some sample data.
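To sanity-check the distinct-day idea against the rows in the question, here is a minimal Python sketch of the same logic. The function name is hypothetical. It counts both end dates of each prescription, as the BETWEEN join does, so each interval comes out one day longer than the DATEDIFF figures in the question's table.

```python
from datetime import date, timedelta


def days_covered(intervals, window_start, window_end):
    """Count distinct calendar days covered by any interval, clipped to the window.

    Both ends of every interval are inclusive.
    """
    covered = set()
    for start, end in intervals:
        day = max(start, window_start)
        last = min(end, window_end)
        while day <= last:
            covered.add(day)  # a set keeps overlapping days from counting twice
            day += timedelta(days=1)
    return len(covered)


Q_START, Q_END = date(2022, 4, 1), date(2022, 6, 30)

# Prescriptions from the question, keyed by (patid, med class).
prescriptions = {
    (1, "A"): [(date(2022, 4, 28), date(2022, 9, 12))],
    (2, "B"): [
        (date(2022, 5, 3), date(2022, 6, 29)),
        (date(2022, 4, 21), date(2022, 4, 29)),
    ],
    (3, "A"): [(date(2022, 1, 19), date(2022, 5, 3))] * 2,
}

totals = {key: days_covered(ivs, Q_START, Q_END) for key, ivs in prescriptions.items()}
qualifying = sorted(key for key, n in totals.items() if n >= 60)
print(totals)
print(qualifying)
```

Patients 1 and 2 reach the 60-day threshold, while patient 3's two identical prescriptions collapse to a single 33-day span.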
I would do this using a date/calendar table, then it's pretty easy.
If you don't already have a date table, this link is one of many that describe how to create one easily ( https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/ )
Here's the script from this link (in case the link dies)
DECLARE @StartDate date = '20100101';
DECLARE @CutoffDate date = DATEADD(DAY, -1, DATEADD(YEAR, 30, @StartDate));
;WITH seq(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, @StartDate, @CutoffDate)
),
d(d) AS
(
SELECT DATEADD(DAY, n, @StartDate) FROM seq
),
src AS
(
SELECT
TheDate = CONVERT(date, d),
TheDay = DATEPART(DAY, d),
TheDayName = DATENAME(WEEKDAY, d),
TheWeek = DATEPART(WEEK, d),
TheISOWeek = DATEPART(ISO_WEEK, d),
TheDayOfWeek = DATEPART(WEEKDAY, d),
TheMonth = DATEPART(MONTH, d),
TheMonthName = DATENAME(MONTH, d),
TheQuarter = DATEPART(Quarter, d),
TheYear = DATEPART(YEAR, d),
TheFirstOfMonth = DATEFROMPARTS(YEAR(d), MONTH(d), 1),
TheLastOfYear = DATEFROMPARTS(YEAR(d), 12, 31),
TheDayOfYear = DATEPART(DAYOFYEAR, d)
FROM d
)
SELECT *
INTO MyDateTable
FROM src
ORDER BY TheDate
OPTION (MAXRECURSION 0);
Now that you have your new date table you can join to it to get the list of dates that are within the start and end date, something like
SELECT COUNT(DISTINCT dt.TheDate)
FROM rx
INNER JOIN MyDateTable dt ON dt.TheDate BETWEEN rx.start_date AND rx.end_date
INNER JOIN Drug_names_categories d ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
Obviously this is a simple example, but you could easily extend it to include all the details you need. The point is that you now have a list of dates, or a distinct list of dates, which you can work with easily.
You could also simplify the date range filter by referencing the TheQuarter and TheYear columns. If this is a common task, consider extending the date table with a compound YearQuarter column (e.g. 2023Q1/202301 etc.)

Extract days of daterange grouped by month in PostgreSQL

I have a pickupDate and returnDate in my OrderHistory table. I want to extract the sum of rental days of all OrderHistory entries, grouped/ordered by month. A CTE seems to be the solution, but I don't get how to implement it in my query, since the CTEs I saw were referring to themselves where it says "FROM cte".
I tried something like this:
SELECT
SUM((EXTRACT (DAY FROM("OrderHistory"."returnDate")-("OrderHistory"."pickupDate")))) as traveltime
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
M
ORDER BY
M
But the outcome doesn't split bookings between two months (e.g. pickupDate = 27 March 2022 and returnDate = 3 April 2022); it assigns the whole 7 days to March, since the query groups by the month of the pickup date. It should show 4 days in March and 3 in April.
Sorry for the probably very stupid question but I am a beginner. (my code is written in postgresql btw)
See "PostgreSQL naming conventions" and "Are PostgreSQL column names case-sensitive?": use legal, lower-case names exclusively so double-quoting is not needed.
Final result in db fiddle
Add daterange column.
alter table order_history add column date_ranges daterange;
with a(m_begin, m_end, pickup_date) as
(select date_trunc('month', pickup_date)::date,
(date_trunc('month', pickup_date) + interval '1 month - 1 day')::date,
pickup_date from order_history)
update order_history set date_ranges =
daterange(a.m_begin, a.m_end,'[]') from a
where a.pickup_date = order_history.pickup_date;
then final query:
WITH A AS(
select
pickup_date,
return_date,
return_date - pickup_date as total,
case when return_date <@ date_ranges then (return_date - pickup_date)
else ( date_trunc('month', pickup_date) + interval '1 month - 1 day')::date - pickup_date
end partial_mth
from order_history),
b as (SELECT *, a.total - partial_mth parital_not_mth FROM a)
select *,
case when to_char(pickup_date,'YYYY-MM') = to_char(return_date,'YYYY-MM')
then
sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM')) +
sum(parital_not_mth) over (partition by to_char(return_date,'YYYY-MM'))
else sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM'))
end
from b;
After trying different things I think I found the best answer to my question, that I want to share with the community:
WITH hier as (
SELECT
"OrderHistory"."pickupDate" as start_date
, "OrderHistory"."returnDate" as end_date
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
1, 2, 3
ORDER BY
3
), calendar as (
select date '2022-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select
to_char(calendar_date::date, 'YYYY-MM')
, count(*) as tage_gebucht
from calendar
inner join hier on calendar.calendar_date between start_date and end_date
where calendar_date between '2022-01-01' and '2022-12-31'
group by 1
order by 1;
I think this is the simplest solution I came up with.
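For comparison, here is a minimal Python sketch of the per-month split. The function name is hypothetical, and it assumes one particular counting convention: the pickup date is not counted and the return date is, which reproduces the 4 + 3 split asked for in the question. A calendar join with BETWEEN counts both end dates, so it would give 5 and 3 for the same booking.

```python
from collections import Counter
from datetime import date, timedelta


def rental_days_by_month(pickup, ret):
    """Split a rental into days per 'YYYY-MM' month.

    Assumption: the pickup date is excluded and the return date is included,
    so the total always equals (ret - pickup).days.
    """
    counts = Counter()
    day = pickup + timedelta(days=1)
    while day <= ret:
        counts[day.strftime("%Y-%m")] += 1
        day += timedelta(days=1)
    return dict(counts)


print(rental_days_by_month(date(2022, 3, 27), date(2022, 4, 3)))
# {'2022-03': 4, '2022-04': 3}
```

Whichever convention you pick, apply it consistently, otherwise the monthly totals will not add up to the per-booking totals.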

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name Start End
---------------------------------------
Project 1 2020-08-01 2020-09-10
Project 2 2020-01-01 2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select date_trunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p.end), '1 day'::interval) days
where extract(DOW from days) not IN (0, 6)
group by d_month
where p.start is from the aliased main query and the dates are hard-coded for now, this correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
However subqueries can't return multiple values. The date range for the query is dynamic, so I would either need to somehow return the query as:
Name Start End 2020-08-01 2020-09-01 ...
-------------------------------------------------------------------------
Project 1 2020-08-01 2020-09-10 21 8
Project 2 2020-01-01 2025-01-01 21 10
Or simply return the whole subquery as JSON, but that doesn't seem to be working either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
ddate date not null primary key,
is_work_day boolean default true
);
insert into calendar
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
'2000-01-01'::timestamp,
'2099-12-31'::timestamp,
interval '1 day'
) as gs(ts);
Assuming a calendar table is not within scope, you can do this:
with bounds as (
select min(start) as first_start, max("end") as last_end
from my_projects
), cal as (
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from bounds
cross join generate_series(
first_start,
last_end,
interval '1 day'
) as gs(ts)
), bymonth as (
select p.name, p.start, p.end,
date_trunc('month', c.ddate) as month_start,
count(*) as work_days
from my_projects p
join cal c on c.ddate between p.start and p.end
where c.is_work_day
group by p.name, p.start, p.end, month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
|| jsonb_object_agg('name', name)
|| jsonb_object_agg('start', start)
|| jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Doing a pivot from rows to columns in SQL is usually a bad idea, so the query produces json for you.
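To check the expected numbers independently of the database, here is a small Python sketch that counts Monday-to-Friday days per month for a project clipped to a reporting window. The function name is hypothetical and holidays are ignored, as in the question's subquery.

```python
from collections import Counter
from datetime import date, timedelta


def work_days_by_month(start, end, window_start, window_end):
    """Count Monday-to-Friday days per month in the overlap of project and window."""
    counts = Counter()
    day = max(start, window_start)
    last = min(end, window_end)
    while day <= last:
        if day.weekday() < 5:  # Monday is 0, Saturday is 5, Sunday is 6
            counts[day.replace(day=1).isoformat()] += 1
        day += timedelta(days=1)
    return dict(counts)


window = (date(2020, 8, 1), date(2020, 9, 14))
project_1 = work_days_by_month(date(2020, 8, 1), date(2020, 9, 10), *window)
project_2 = work_days_by_month(date(2020, 1, 1), date(2025, 1, 1), *window)
print(project_1)  # {'2020-08-01': 21, '2020-09-01': 8}
print(project_2)  # {'2020-08-01': 21, '2020-09-01': 10}
```

The dictionaries have the same shape as the JSON object built by the query, with one key per month, which is why no pivot is needed.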

Extracting data that is relevant to the financial year as set by the parameter date

I have a student_table and in this table there is a column student_financial_aid_type and the next column is date_ , so the value of student_financial_aid_type e.g. = 'direct' and the date_ 1/04/2018. I have used CTE tables and I have a parameter date at the beginning of the code, so that I get the number of students as of that day. e.g. my parameter date is 20/04/2019.
My financial year runs from april to march eg 1/04/18 - 31/3/19.
My question is this: where the data indicates that the student received some form of financial aid in the financial year, I want an output column that says either 'Y' or 'N'. Using the example above, the date 1/04/2018 is not in the financial year of the parameter date (20/04/19); it is in the previous financial year (1/04/18 - 31/3/19), so I would want 'N' in the output column, because the student did not receive any financial aid in the financial year of the parameter date. However, if I change the parameter date to 2/06/18, then the date the student received the financial aid (1/04/18) is in the same financial year as the parameter date, so the output column should now show 'Y'. Whatever I do has to be dynamic and respond to the parameter date, as that is the value that I as the user will be changing as and when needed.
I have tried using date_part and I have managed to have the month number of the date that the student received the payout, from this point on I was thinking of using the month number as an indicator to what FY year it falls in, but I am not sure how to go about this.
WITH
parameter_date as (
select '2019-04-26':: date p_date),
student_cohort as (select * from (
SELECT Distinct
ms.studentid,ss.student_admission_date,ms.graduation_date
FROM master_student_table ms
left join student_semeter ss on ms.student_id=ss.student_id ,
parameter_date, p
AND ss.student_admission_date <= p_date -- i.e. began studies less than or equal to p_date
AND (ms.graduation_date is null or ms.graduation_date > p_date)) -- i.e. student finished studies more than p_date or IS NULL
)x ),
student_finance as (select * from (
select date_part('month', st.date_::date) date_part,
st.date_, st.studentid, st.student_financial_aid_type
from student_table st
left join student_cohort s on st.studentid = s.studentid
where st.student_financial_aid_type in ('direct' , 'indirect')
) x )
select distinct
s.student_id,
s.graduation_date,
s.admissiondate_date,
sf.date_,
-- this is what I would like it to be -- case when sf.date is in the same
--financial year as the parameter_date
--then 'Y' else 'N' end was_financial_aid_received_in_the_fy,
sf.date_part
from
cohort s
left join student_finance sf on s.student_id = sf.student_id and
sf.student_financial_aid_type = 'direct'
left join student_finance sf1 on s.student_id = sf1.student_id and
sf1.student_financial_aid_type = 'indirect'
I would love for the output column 'was_financial_aid_received_in_the_fy' from the case statement, to have 'Y' if the sf.date_ that the student received financial aid is in the same FY year as the parameter_date and 'N' if this isn't the case
Thank you very much for all your help
I think this question basically boils down to the following:
Given a parameter date, figure out the financial year for that date.
Figure out if other dates fall in this financial year.
This is a great place to use dateranges, one of my favorite types. We can figure out the financial year from the parameter date and use a daterange to represent it. If the parameter date is before April, the financial year should be from April 1 of the previous year (inclusive) to April 1 of this year (exclusive). If the parameter date is in April or later, the financial year should be April 1 of this year (inclusive) to April 1 of next year (exclusive).
Here's a query that should demonstrate how to do this:
WITH parameter_date as (
select '2019-04-26'::date p_date
), fiscal_year as (
select daterange(
make_date(case when date_part('month', p_date)<4
THEN date_part('year', p_date)::int-1
ELSE date_part('year', p_date)::int END,
4, 1),
make_date(case when date_part('month', p_date)<4
THEN date_part('year', p_date)::int
ELSE date_part('year', p_date)::int+1 END,
4, 1),
'[)') as f_year
FROM parameter_date
),
test_data as (
select test_date::date from (values
('2019-04-01'),
('2018-04-01'),
('2019-03-02'),
('2020-12-01'),
('2017-05-26'),
('2020-02-27'),
('2020-04-01')
) v(test_date)
)
select test_date,
CASE WHEN test_date <@ fiscal_year.f_year THEN 'Y' ELSE 'N' END as in_f_year
from test_data, fiscal_year;
test_date | in_f_year
------------+-----------
2019-04-01 | Y
2018-04-01 | N
2019-03-02 | N
2020-12-01 | N
2017-05-26 | N
2020-02-27 | Y
2020-04-01 | N
(7 rows)
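The same rule can be written as a short Python sketch, which may help when checking edge cases such as April 1 itself. The function names and the start_month parameter are hypothetical. The range is half-open, like the '[)' daterange in the query.

```python
from datetime import date


def fiscal_year_bounds(p_date, start_month=4):
    """Return (first day of the FY, first day of the next FY) for p_date."""
    start_year = p_date.year if p_date.month >= start_month else p_date.year - 1
    return date(start_year, start_month, 1), date(start_year + 1, start_month, 1)


def in_same_fiscal_year(d, p_date):
    """'Y' if d falls in the fiscal year that contains p_date, else 'N'."""
    start, end = fiscal_year_bounds(p_date)
    return "Y" if start <= d < end else "N"


p_date = date(2019, 4, 26)
for d in (date(2019, 4, 1), date(2018, 4, 1), date(2020, 2, 27), date(2020, 4, 1)):
    print(d, in_same_fiscal_year(d, p_date))
```

With the question's second example, a parameter date of 2018-06-02 puts 2018-04-01 in the same financial year, so the function returns 'Y' for it.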

Select lines whose date-field is in a given month and year

My SQL table looks like this:
id (int) | date (date) | text1 (varchar) | text2 (varchar)
I want to select the lines whose date suits a given month and year, regardless of the day.
Both month and year are given in the select-statement as integers.
So the missing thing is the where-clause. Perhaps extract() is the thing I'm looking for, but I don't know how to use it with the two integers, e.g. 2011 and 02.
You can use extract:
SELECT * FROM yourtable
WHERE EXTRACT(month FROM "date") = 2
AND EXTRACT(year FROM "date") = 2011
But in this case you could also use a date range, which can make use of an index on the date column:
SELECT * FROM yourtable
WHERE "date" >= '2011-02-01' AND "date" < '2011-03-01'
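Since the month and year arrive as two integers, the range bounds have to be computed somewhere. Here is a minimal Python sketch of that step; the function name is hypothetical. It returns a half-open range and handles the December rollover.

```python
from datetime import date


def month_bounds(year, month):
    """Half-open range [first day of the month, first day of the next month)."""
    start = date(year, month, 1)
    if month == 12:
        end = date(year + 1, 1, 1)
    else:
        end = date(year, month + 1, 1)
    return start, end


start, end = month_bounds(2011, 2)
print(start.isoformat(), end.isoformat())  # 2011-02-01 2011-03-01
```

The two values can then be passed as parameters to the range query, in place of the hard-coded literals.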