I have an Azure SQL database that includes an upload date and birth date. I've added a column called "AgeFlag" which I'd like to equal "Over 40" if upload date - birth date is >= 40, and "Under 40" otherwise. I think this means I need an update statement with an IF statement, but I am uncertain how to proceed:
UPDATE datasetitems SET ageflag = SELECT IF((datediff(year, d.timestamp, di.birthdate)>40,'Over 40','Under 40') FROM datasetitems di JOIN datasets d ON di.datasetid = d.datasetid);
Maybe this would be easier with a temporary table to do the age calculation?
A CASE expression rather than IF will do the job of your pseudo code. Here's an example, reversing the dates as @Seekwell74 noticed:
UPDATE datasetitems
SET ageflag = CASE WHEN DATEDIFF(YEAR, di.birthdate, d.timestamp) > 40 THEN 'Over 40' ELSE 'Under 40' END
FROM datasetitems di
JOIN datasets d ON di.datasetid = d.datasetid;
However, your age calculation is wrong. DATEDIFF counts the number of year boundaries between the dates, not the interval in years. For example, DATEDIFF will result in 1 year between dates 2017-12-31 and 2018-01-01.
Below is another method to calculate a person's age:
UPDATE datasetitems
SET ageflag = CASE WHEN (CAST(CONVERT(char(8), d.timestamp, 112) AS int) - CAST(CONVERT(char(8), di.birthdate, 112) AS int)) / 10000 >= 40 THEN 'Over 40' ELSE 'Under 40' END
FROM datasetitems di
JOIN datasets d ON di.datasetid = d.datasetid;
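To see why the YYYYMMDD subtraction gives completed years while a year-boundary count does not, here is a minimal Python sketch of the same arithmetic. The function names are mine, not part of any library, and it assumes the upload date is not earlier than the birth date.

```python
from datetime import date

def age_in_years(birthdate: date, as_of: date) -> int:
    """Completed years between birthdate and as_of, using the YYYYMMDD integer trick."""
    def as_int(d: date) -> int:
        # Same value that CONVERT(char(8), d, 112) cast to int would produce
        return d.year * 10000 + d.month * 100 + d.day
    # Assumes as_of >= birthdate, so floor division matches SQL integer division
    return (as_int(as_of) - as_int(birthdate)) // 10000

def year_boundaries(birthdate: date, as_of: date) -> int:
    """What DATEDIFF(YEAR, ...) counts: year boundaries crossed, not elapsed years."""
    return as_of.year - birthdate.year

# One day apart, but one year boundary crossed
one_day_age = age_in_years(date(2017, 12, 31), date(2018, 1, 1))
one_day_boundaries = year_boundaries(date(2017, 12, 31), date(2018, 1, 1))
```

The day before a 40th birthday still yields 39 with the first function, whereas the boundary count already says 40.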
Perhaps use a CASE expression:
UPDATE di
SET ageflag =
case
when datediff(year, di.birthdate, d.timestamp) > 40
then 'Over 40'
else 'Under 40'
end
FROM datasetitems di JOIN datasets d
ON di.datasetid = d.datasetid;
Also, I think your date parameters for datediff are reversed?
Use this
UPDATE datasetitems
SET ageflag = case when abs(datediff(year, d.timestamp, di.birthdate))>40 then 'Over 40' else 'Under 40' end
FROM datasets d join datasetitems di ON di.datasetid = d.datasetid;
I am running an analysis on medication prescribing practices. We want to identify whether someone has been on a class of medications for 60 days out of a 90 day quarter. We have a start and end date for each prescription, and the bounds of the quarter (e.g., 4/1/2022 – 6/30/2022). For each prescription I’ve calculated the number of days between the start and end date (only including days that fall within the bounds of the quarter). There are many instances in which multiple drugs within the same class are prescribed: someone might try one antidepressant but not like it, and so be given another in the same class.
My original strategy was just to total up number of days for each class of medication and see if it’s 60 or over. The days don’t have to be consecutive, but if they overlap, days during an overlap period shouldn’t count twice (which they would in a simple sum).
For instance in the data table below, patient 1 in row 1 should be included as they are over 60 days. Patient 2 should also get in (rows 2 and 3) because the non-overlapping total (57+8) within the same med class gets them to over 60 days. However, patient 3 should NOT get in, even though the total of 32 + 32 is over 60 because the intervals overlap. This means that they were really on the medication class for only 32 days – this is an instance where someone might be on two different antidepressants simultaneously.
It’s not sufficient to just sum the days in the interval, but I also have to include some way to examine whether the intervals are overlapping and only add days if an interval for a given medication class falls outside another interval for that same class.
Row num | Patid | Med class | Start date | End date | Interval
1 | 1 | A | 2022-04-28 | 2022-09-12 | 63
2 | 2 | B | 2022-05-03 | 2022-06-29 | 57
3 | 2 | B | 2022-04-21 | 2022-04-29 | 8
4 | 3 | A | 2022-01-19 | 2022-05-03 | 32
5 | 3 | A | 2022-01-19 | 2022-05-03 | 32
I’m having a hard time figuring out how to do this. Note, I'm limited to just using SQL for this.
Code that produced the above data. I would embed this in another query to generate a total interval but need to deal with the overlap issue.
DECLARE @startdt DATE;
DECLARE @enddt DATE;
SET @startdt='4/1/2022'
SET @enddt='6/30/2022'
--for q4 fy2022-23 (4/1/2022-6/30/2022)
SELECT DISTINCT
rx.patid, d.medication_category as medcat, start_date, end_date,
-- case statement to capture days within quarter only
CASE WHEN start_date<@startdt and end_date>@enddt then 90
WHEN start_date<@startdt and end_date>=@startdt then datediff(d,@startdt,end_date)
WHEN start_date>=@startdt and end_date>@enddt then datediff(d,start_date,@enddt)
ELSE datediff(d,start_date,end_date)
END as interval
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
You can accomplish what you want by generating a calendar table (using a Common Table Expression) of individual days within the test range, joining those days with the prescriptions with overlapping days, and then counting distinct days for each patient and medication category combination.
Something like:
DECLARE @startdt DATE = '2022-04-01';
DECLARE @enddt DATE = '2022-06-30';
DECLARE @threshold INT = 60;
WITH Days AS (
SELECT @startdt AS Day
UNION ALL
SELECT DATEADD(day, 1, Day)
FROM Days
WHERE Day < @enddt
)
SELECT
rx.patid, d.medication_category as medcat,
COUNT(DISTINCT DD.Day) AS days_medicated,
MIN(DD.Day) AS start_date,
MAX(DD.Day) AS end_date
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname = d.drugname
INNER JOIN Days DD
ON DD.Day BETWEEN rx.start_date AND rx.end_date
WHERE rx.start_date <= @enddt AND @startdt <= rx.end_date
GROUP BY rx.patid, d.medication_category
HAVING COUNT(DISTINCT DD.Day) >= @threshold
ORDER BY rx.patid, start_date;
If using SQL Server 2022 or later, the Days generator can be simplified by using the new GENERATE_SERIES() function:
WITH Days AS (
SELECT DATEADD(day, S.value, @startdt) AS Day
FROM GENERATE_SERIES(0, DATEDIFF(day, @startdt, @enddt)) S
)
See this db<>fiddle for an example with some sample data.
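If you want to sanity-check the distinct-day idea outside the database, here is a rough Python sketch of the same logic: expand each prescription into the days it covers inside the quarter, collect them in a set, and count. The function name and the way intervals are passed in are my own assumptions. Note that it counts both end days (like the BETWEEN join), so single intervals come out one day longer than the DATEDIFF figures in the question.

```python
from datetime import date, timedelta

def days_covered(intervals, window_start, window_end):
    """Count distinct days covered by any (start, end) interval, clipped to the window."""
    covered = set()
    for start, end in intervals:
        day = max(start, window_start)   # clip to the start of the quarter
        last = min(end, window_end)      # clip to the end of the quarter
        while day <= last:
            covered.add(day)             # a set ignores days already seen
            day += timedelta(days=1)
    return len(covered)

quarter_start, quarter_end = date(2022, 4, 1), date(2022, 6, 30)

# Patient 2 from the question: two non-overlapping prescriptions in class B
patient2 = days_covered(
    [(date(2022, 5, 3), date(2022, 6, 29)), (date(2022, 4, 21), date(2022, 4, 29))],
    quarter_start, quarter_end)

# Patient 3 from the question: two fully overlapping prescriptions in class A
patient3 = days_covered(
    [(date(2022, 1, 19), date(2022, 5, 3)), (date(2022, 1, 19), date(2022, 5, 3))],
    quarter_start, quarter_end)
```

Patient 2 clears the 60-day threshold, while patient 3's overlapping prescriptions are only counted once and stay well under it.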
I would do this using a date/calendar table, then it's pretty easy.
If you don't already have a date table, this link is one of many that describe how to create one easily ( https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/ )
Here's the script from this link (in case the link dies)
DECLARE @StartDate date = '20100101';
DECLARE @CutoffDate date = DATEADD(DAY, -1, DATEADD(YEAR, 30, @StartDate));
;WITH seq(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, @StartDate, @CutoffDate)
),
d(d) AS
(
SELECT DATEADD(DAY, n, @StartDate) FROM seq
),
src AS
(
SELECT
TheDate = CONVERT(date, d),
TheDay = DATEPART(DAY, d),
TheDayName = DATENAME(WEEKDAY, d),
TheWeek = DATEPART(WEEK, d),
TheISOWeek = DATEPART(ISO_WEEK, d),
TheDayOfWeek = DATEPART(WEEKDAY, d),
TheMonth = DATEPART(MONTH, d),
TheMonthName = DATENAME(MONTH, d),
TheQuarter = DATEPART(Quarter, d),
TheYear = DATEPART(YEAR, d),
TheFirstOfMonth = DATEFROMPARTS(YEAR(d), MONTH(d), 1),
TheLastOfYear = DATEFROMPARTS(YEAR(d), 12, 31),
TheDayOfYear = DATEPART(DAYOFYEAR, d)
FROM d
)
SELECT *
INTO MyDateTable
FROM src
ORDER BY TheDate
OPTION (MAXRECURSION 0);
Now that you have your new date table, you can join to it to get the list of dates that fall between the start and end date, something like
SELECT COUNT(DISTINCT dt.TheDate)
FROM rx
INNER JOIN MyDateTable dt ON dt.TheDate BETWEEN rx.start_date AND rx.end_date
INNER JOIN Drug_names_categories d ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
Obviously this is a simple example, but you could easily extend it to include all the details you need. The point is that you now have a list of dates, or a distinct list of dates, which you can work with easily.
You could also simplify the date range filter by referencing the TheQuarter and TheYear columns. If this is a common task, consider extending the date table with a compound YearQuarter column (e.g. 2023Q1/202301 etc.)
I have a postgres query like this
select application.status as status, count(*) as "current_month" from application
where to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
and date_part('year',application.created) = date_part('year', CURRENT_DATE)
and application.job_status != 'expired'
group by application.status
It returns the table below, which has the number of applications grouped by status for the current month. However, I want to subtract the total count of a separate but related query from the internal review number only. I want to count the number of rows with type = abc within the same table and for the same date range, and then subtract that amount from the internal review number (type is a separate field). current_month_desired is how it should look.
status | current_month | current_month_desired
fail | 22 | 22
internal_review | 95 | 22
pass | 146 | 146
UNTESTED: but maybe...
The intent here is to use a conditional aggregate: a CASE expression inside SUM that only counts the rows you need. This way, the subtraction is not needed in the first place.
SELECT application.status as status
, sum(case when application.type = 'abc'
and application.status = 'internal_review' then 0
else 1 end) as "current_month"
FROM application
WHERE to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
and date_part('year',application.created) = date_part('year', CURRENT_DATE)
and application.job_status != 'expired'
GROUP BY application.status
How to find the distribution of credit cards by year, and completed transaction. Group these credit cards into three buckets: less than 10 transactions, between 10 and 30 transactions, more than 30 transactions?
The first method I tried was the width_bucket function in PostgreSQL, but the documentation says that it only creates equidistant buckets, which is not what I want in this case. Because of that, I turned to case statements. However, I'm not sure how to use the case statement with a group by.
This is the data I am working with:
table 1 - credit_cards table
credit_card_id
year_opened
table 2 - transactions table
transaction_id
credit_card_id - matches credit_cards.credit_card_id
transaction_status ("complete" or "incomplete")
This is what I have gotten so far:
SELECT
CASE WHEN transaction_count < 10 THEN “Less than 10”
WHEN transaction_count >= 10 and transaction_count < 30 THEN “10 <= transaction count < 30”
ELSE transaction_count>=30 THEN “Greater than or equal to 30”
END as buckets
count(*) as ct.transaction_count
FROM credit_cards c
INNER JOIN transactions t
ON c.credit_card_id = t.credit_card_id
WHERE t.status = “completed”
GROUP BY v.year_opened
GROUP BY buckets
ORDER BY buckets
Expected output
credit card count | year opened | transaction count bucket
23421 | 2002 | Less than 10
etc
You can specify the bin sizes in width_bucket by specifying a sorted array of the lower bound of each bin.
In your case, it would be array[10,30]: anything less than 10 gets bin 0, values from 10 to 29 get bin 1, and 30 or more gets bin 2.
WITH a AS (select generate_series(5,35) cnt)
SELECT cnt, width_bucket(cnt, array[10,30])
FROM a;
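For what it's worth, the threshold-array form behaves like a right-bisect over the sorted lower bounds. A small Python sketch of that rule (the names are mine, and this only mimics the behaviour described above, it is not how PostgreSQL implements it):

```python
from bisect import bisect_right

# Sorted lower bounds of bins 1 and 2; anything below the first bound falls in bin 0
THRESHOLDS = [10, 30]

def bucket(count: int) -> int:
    """Number of thresholds that are <= count, i.e. the bin index."""
    return bisect_right(THRESHOLDS, count)

# Same range as generate_series(5, 35) in the query above
bins = {cnt: bucket(cnt) for cnt in range(5, 36)}
```

The boundary values are the interesting ones: 9 lands in bin 0, 10 and 29 in bin 1, and 30 in bin 2.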
To figure this out you need to count transactions per credit card in order to figure out the right bucket, then you need to count the credit cards per bucket per year. There are a couple of different ways to get the final result. One way is to first join up all your data and compute the first level of aggregate values. Then compute the final level of aggregate values:
with t1 as (
select year_opened
, c.credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from credit_cards c
join transactions t
on t.credit_card_id = c.credit_card_id
where t.transaction_status = 'complete'
group by year_opened
, c.credit_card_id
)
select count(*) credit_card_count
, year_opened
, buckets
from t1
group by year_opened
, buckets;
However, it may be more performant to first calculate the first level of aggregate data on the transactions table before joining it to the credit cards table:
select count(*) credit_card_count
, year_opened
, buckets
from credit_cards c
join (select credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from transactions
where transaction_status = 'complete'
group by credit_card_id) t
on t.credit_card_id = c.credit_card_id
group by year_opened
, buckets;
If you prefer to unroll the above query and use Common Table Expressions, you can do that too (I find this easier to read/follow along):
with bkt as (
select credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from transactions
where transaction_status = 'complete'
group by credit_card_id
)
select count(*) credit_card_count
, year_opened
, buckets
from credit_cards c
join bkt t
on t.credit_card_id = c.credit_card_id
group by year_opened
, buckets;
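To make the two levels of aggregation easier to follow, here is the same idea as a small Python sketch: first count completed transactions per card, then count cards per year and bucket. The sample rows are made up for illustration, and the function names are my own.

```python
from collections import Counter

def bucket_label(n: int) -> str:
    """Same buckets as the CASE expression in the queries above."""
    if n < 10:
        return "Less than 10"
    if n < 30:
        return "Between [10 and 30)"
    return "Greater than or equal to 30"

def cards_per_year_and_bucket(credit_cards, transactions):
    """credit_cards: (credit_card_id, year_opened); transactions: (credit_card_id, status)."""
    year_opened = dict(credit_cards)
    # Level 1: completed transactions per card (cards with none drop out, like an inner join)
    per_card = Counter(
        card_id for card_id, status in transactions
        if status == "complete" and card_id in year_opened
    )
    # Level 2: number of cards per (year_opened, bucket)
    return Counter(
        (year_opened[card_id], bucket_label(n)) for card_id, n in per_card.items()
    )

# Hypothetical sample data
cards = [(1, 2002), (2, 2002), (3, 2003)]
txns = ([(1, "complete")] * 5 + [(2, "complete")] * 12
        + [(2, "incomplete")] * 2 + [(3, "complete")] * 31)
result = cards_per_year_and_bucket(cards, txns)
```

As with the inner joins above, a card with no completed transactions does not appear in any bucket; you would need an outer join (or an explicit zero count here) if those cards should land in "Less than 10".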
Not sure if this is what you are looking for.
WITH cte
AS (
SELECT c.year_opened
,c.credit_card_id
,count(*) AS transaction_count
FROM credit_cards c
INNER JOIN transactions t ON c.credit_card_id = t.credit_card_id
WHERE t.transaction_status = 'complete'
GROUP BY c.year_opened
,c.credit_card_id
)
SELECT cte.year_opened AS "year opened"
,SUM(CASE
WHEN transaction_count < 10
THEN 1
ELSE 0
END) AS "Less than 10"
,SUM(CASE
WHEN transaction_count >= 10
AND transaction_count < 30
THEN 1
ELSE 0
END) AS "10 <= transaction count < 30"
,SUM(CASE
WHEN transaction_count >= 30
THEN 1
ELSE 0
END) AS "Greater than or equal to 30"
FROM cte
GROUP BY cte.year_opened
and the output would be as below.
year opened | Less than 10 | 10 <= transaction count < 30 | Greater than or equal to 30
2002 | 23421 | |
I'm trying to calculate the number of active transactions on the first of each month at a specific time. I can calculate that for a specific day for the time I want. For example I can run:
SELECT
A.country_code,
A.transaction_type,
COUNT (*)
FROM table A
JOIN table_history h ON h.listing_id = A.listing_id
WHERE h.start_date <= '2017-12-01 08:01:03' and NVL(h.end_date, '31-DEC-2099') >= '2017-12-01 08:01:03'
GROUP BY
A.country_code,
A.transaction_type
That code works fine for the first of December. However, I want to expand that to get all active transactions at that specific time (08:01:03) for every first of the month.
Thank you
Managed to sort it out by bringing in a Dim Date table and creating a complicated join:
SELECT
A.country_code,
A.transaction_type,
COUNT (*)
FROM table A
JOIN table_history h ON h.listing_id = A.listing_id
JOIN dim_date dd on h.start_date <= DATEADD(m,1,DATEADD(hour,8,dd.date)) and NVL(h.end_date, '31-DEC-2099') >= DATEADD(m,1,DATEADD(hour,8,dd.date))
WHERE dd.month_day_no = 1
--WHERE h.start_date <= '2017-12-01 08:01:03' and NVL(h.end_date, '31-DEC-2099') >= '2017-12-01 08:01:03'
GROUP BY
A.country_code,
A.transaction_type
You can cast a timestamp as a time to get just the time and use extract to get the day of the month:
where
cast (start_date as time) = '08:01:03' and
extract (day from start_date) = 1
I'm new to SQL and proceeded much by trial and error as well as searching books and the internet. I have to repeat a query for the sum over monthly data for five years and I'd like to insert the results for every month as a column in a table. I tried adding new columns for every month
alter table add column, insert etc.
but I can't get it right. Here's the code I used for jan and feb07:
CREATE TABLE "TVD_db"."lebendetiere"
(nuar text,
ak text,
sex text,
jan07 text,
feb07 text,
märz07 text,
april07 text,
mai07 text,
juni07 text,
juli07 text,
aug07 text,
sept07 text,
okt07 text,
nov07 text,
dez07 text,
jan08 text,
....
dez11 text);
INSERT INTO "TVD_db"."lebendetiere" (nuar, ak, sex, jan07)
SELECT
"AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END AS AK,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END AS sex,
COUNT("AUFENTHALTE"."tierid")
FROM "TVD_db"."AUFENTHALTE"
WHERE DATE("AUFENTHALTE"."gueltigvon") <= DATE('2007-01-01')
AND DATE("AUFENTHALTE"."gueltigbis") >= DATE('2007-01-01')
GROUP BY "AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END
ORDER BY "AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END
;
--until here it works fine
UPDATE "TVD_db"."lebendetiere" SET "feb07"= --this is the part I cant get right...
(SELECT
COUNT("AUFENTHALTE"."tierid")
FROM "TVD_db"."AUFENTHALTE"
WHERE DATE("AUFENTHALTE"."gueltigvon") <= DATE('2007-02-01')
AND DATE("AUFENTHALTE"."gueltigbis") >= DATE('2007-02-01')
GROUP BY "AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-02-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE ('2007-02-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END
ORDER BY "AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END);
Has anyone a solution or do I have to make a table for every month and then join the results?
After reading your post thoroughly, here is a complete redesign that should hold some insight for beginners in the field of SQL / PostgreSQL.
I would advise not to use mixed case identifiers in PostgreSQL. Use lower case exclusively, then you don't have to double-quote them and your code is much easier to read. You also avoid a lot of possible confusion.
Use table aliases to make your code more readable.
Column names in the SELECT statement for the INSERT are irrelevant. That's why I commented them out (avoids possible naming conflicts).
Use ordinal numbers in GROUP BY and ORDER BY to further simplify.
Don't use a separate column for every new month. Use a column identifying the month and add a row per month.
If you actually need the design with one column per month, then you need a large CASE statement or a pivot query. Refer to the tablefunc extension. But this is complicated stuff for an SQL newbie. I really think, you want a row per month.
I use generate_series() to generate one row per month between Jan 2007 and Dec 2011.
With my changed design, you don't need extra UPDATEs. It's all done in one INSERT.
I simplified quite a couple of other things. Here is what I would propose instead:
CREATE TABLE tvd_db.lebendetiere(
nuar text
,alterskat integer
,sex text
,datum date
,anzahl integer
);
INSERT INTO tvd_db.lebendetiere (nuar, alterskat, sex, datum, anzahl)
SELECT a.nuar
,CASE WHEN m.m - a.gebdat::date < 365 THEN 1 -- age in days relative to each month
WHEN m.m - a.gebdat::date > 730 THEN 3
ELSE 2 END -- AS alterskat
,CASE WHEN a.isweiblich = 'T' THEN 'female' ELSE 'male' END -- AS sex
,m.m
,count(*) -- AS anzahl
FROM tvd_db.aufenthalte a
CROSS JOIN (
SELECT generate_series('2007-01-01'::date
,'2011-12-01'::date, interval '1 month')::date
) m(m)
WHERE a.gueltigvon <= m.m
AND a.gueltigbis >= m.m
GROUP BY a.nuar, 2, 3, m.m
ORDER BY a.nuar, 2, 3, m.m;
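If it helps to see what the generated month rows look like, here is a small Python sketch that builds the same list of month starts and applies the age rule from the question (under 365 days is category 1, over 730 days is category 3, otherwise 2). The function names are mine and only serve as illustration.

```python
from datetime import date

def month_starts(first: date, last: date):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def age_category(gebdat: date, reference: date) -> int:
    """Age category on the reference date, following the rule in the question."""
    days = (reference - gebdat).days
    if days < 365:
        return 1
    if days > 730:
        return 3
    return 2

# One entry per month from Jan 2007 to Dec 2011, as in the generate_series call
months = list(month_starts(date(2007, 1, 1), date(2011, 12, 1)))

# The same animal moves through the categories as the reference month advances
categories = [age_category(date(2006, 6, 1), m)
              for m in (date(2007, 1, 1), date(2008, 1, 1), date(2009, 1, 1))]
```

With one row per month, the age category has to be evaluated against each month's date rather than a fixed one, which is why the reference date is a parameter here.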