How do I return the daily average for an entire month?
select count(distinct people_id)
from #enrollments_PreviousMonth
where program_modifier_id = 'E1AA7A36-0500-4BAE-A0AA-D9E0BC91A6F3'
AND actual_date between '4/1/13' and '5/1/13'
Above is a portion of my current code. I want to get a distinct count of people_id for each day in April. Then I want to average these counts for the month of April. For example, if the count was 764 on April 1 and 763 on April 2, then I would sum 764 and 763 = 1527. Likewise, I would sum every day in April. Finally I would divide by number of days in April to get my daily average. What's the most efficient TSQL to accomplish this? Is there a CTE I could use for this or some other standard SQL operator?
WITH list AS (
select actual_date,count(distinct people_id) cnt
from #enrollments_PreviousMonth
where program_modifier_id = 'E1AA7A36-0500-4BAE-A0AA-D9E0BC91A6F3'
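AND actual_date >= '20130401' AND actual_date < '20130501' -- April only; BETWEEN would also include May 1
GROUP BY actual_date )
SELECT AVG(CAST(cnt AS float)) FROM list -- cast so AVG does not truncate to an integer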
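Note that averaging the per-day counts only divides by the days that actually have rows, while the question asks to divide by the number of days in April. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server to compare the two; the table name `enrollments` and the sample rows are hypothetical. SQLite's AVG always returns a float, whereas in SQL Server AVG over an int column returns an int, so a cast is needed there.

```python
import sqlite3

def daily_average(rows, days_in_month=30):
    """Return (average over days with data, average over all calendar days).

    rows: iterable of (people_id, actual_date 'YYYY-MM-DD') tuples (hypothetical data).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE enrollments (people_id INTEGER, actual_date TEXT)")
    con.executemany("INSERT INTO enrollments VALUES (?, ?)", rows)
    query = """
        WITH daily AS (
            SELECT actual_date, COUNT(DISTINCT people_id) AS cnt
            FROM enrollments
            WHERE actual_date >= '2013-04-01' AND actual_date < '2013-05-01'  -- April only
            GROUP BY actual_date
        )
        SELECT AVG(cnt), SUM(cnt) * 1.0 / ? FROM daily
    """
    avg_days_with_data, avg_calendar_days = con.execute(query, (days_in_month,)).fetchone()
    con.close()
    return avg_days_with_data, avg_calendar_days

sample = [
    (1, "2013-04-01"), (2, "2013-04-01"), (2, "2013-04-01"),  # person 2 appears twice on April 1
    (1, "2013-04-02"),
    (3, "2013-05-01"),  # excluded: May 1 is outside April
]
print(daily_average(sample))
# (1.5, 0.1)
```

On this sample the first figure ignores the 28 April days with no rows, while the second treats them as zero.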
You might want to use something like this
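-- each distinct (person, day) pair adds 1 to that day's count, so COUNT(*) over the pairs equals the sum of the daily counts
SELECT ROUND(CAST(COUNT(*) AS float) / DATEDIFF(day, '20130401', '20130501'), 2) AS average -- divide by the 30 days of April
FROM (
SELECT DISTINCT people_id, CAST(actual_date AS date) AS actual_day
FROM #enrollments_PreviousMonth
WHERE program_modifier_id = 'E1AA7A36-0500-4BAE-A0AA-D9E0BC91A6F3'
AND actual_date >= '20130401' AND actual_date < '20130501'
) AS daily_people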
SELECT AVG(CONVERT(FLOAT, distinctPeopleByDate))
FROM
(
select actual_date, count(distinct people_id) as distinctPeopleByDate
from #enrollments_PreviousMonth
where program_modifier_id = 'E1AA7A36-0500-4BAE-A0AA-D9E0BC91A6F3'
and actual_date >= '20130401' and actual_date < '20130501'
group by actual_date
) x
I have a pickupDate and a returnDate in my OrderHistory table. I want to extract the sum of rental days of all OrderHistory entries, grouped and ordered by month. A CTE seems to be the solution, but I don't see how to implement it in my query, since the CTEs I saw were referring to themselves (where it says "FROM cte").
I tried something like this:
SELECT
SUM((EXTRACT (DAY FROM("OrderHistory"."returnDate")-("OrderHistory"."pickupDate")))) as traveltime
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
M
ORDER BY
M
But the result doesn't split bookings between two months (e.g. pickupDate = 27 March 2022 and returnDate = 3 April 2022); instead it assigns all 7 days to March, since the pickupDate falls in March. It should show 4 days in March and 3 in April.
Sorry if this is a very basic question, but I am a beginner. (My code is written in PostgreSQL, by the way.)
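The split being asked for can be sketched in plain Python before translating it to SQL. This assumes the counting convention implied by the example: each day after the pickup date, up to and including the return date, counts as one rental day, which gives 4 March days and 3 April days for 27 March to 3 April. The function name `split_by_month` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def split_by_month(pickup: date, return_date: date) -> dict:
    """Count rental days per 'YYYY-MM', counting each day after pickup through the return day."""
    days = Counter()
    day = pickup + timedelta(days=1)  # the pickup day itself is not counted
    while day <= return_date:
        days[day.strftime("%Y-%m")] += 1
        day += timedelta(days=1)
    return dict(days)

print(split_by_month(date(2022, 3, 27), date(2022, 4, 3)))
# {'2022-03': 4, '2022-04': 3}
```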
PostgreSQL naming conventions
Are PostgreSQL column names case-sensitive?
Use legal, lower-case names exclusively so double-quoting is not needed.
Final result in db fiddle
Add daterange column.
alter table order_history add column date_ranges daterange;
with a(m_begin, m_end, pickup_date) as
(select date_trunc('month', pickup_date)::date,
(date_trunc('month', pickup_date) + interval '1 month - 1 day')::date,
pickup_date from order_history)
update order_history set date_ranges =
daterange(a.m_begin, a.m_end,'[]') from a
where a.pickup_date = order_history.pickup_date;
Then the final query:
WITH A AS(
select
pickup_date,
return_date,
return_date - pickup_date as total,
case when return_date <@ date_ranges then (return_date - pickup_date)
else ( date_trunc('month', pickup_date) + interval '1 month - 1 day')::date - pickup_date
end partial_mth
from order_history),
b as (SELECT *, a.total - partial_mth partial_not_mth FROM a)
select *,
case when to_char(pickup_date,'YYYY-MM') = to_char(return_date,'YYYY-MM')
then
sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM')) +
sum(partial_not_mth) over (partition by to_char(return_date,'YYYY-MM'))
else sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM'))
end
from b;
After trying different things, I think I found the best answer to my question, which I want to share with the community:
WITH hier as (
SELECT
"OrderHistory"."pickupDate" as start_date
, "OrderHistory"."returnDate" as end_date
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
-- no GROUP BY here: grouping on (start_date, end_date, M) would merge separate bookings that share the same dates
), calendar as (
select date '2022-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select
to_char(calendar_date::date, 'YYYY-MM')
, count(*) as tage_gebucht
from calendar
inner join hier on calendar.calendar_date between start_date and end_date
where calendar_date between '2022-01-01' and '2022-12-31'
group by 1
order by 1;
I think this is the simplest solution I came up with.
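The calendar-join idea can be tried locally with Python's sqlite3 module as a stand-in for PostgreSQL. SQLite has no generate_series by default, so a recursive CTE builds the calendar instead; the table `order_history` with lower-case columns and the sample bookings are assumptions. The join uses `calendar_date > pickup_date AND calendar_date <= return_date`, which is the convention that yields 4 March days and 3 April days for the question's example; an inclusive BETWEEN also counts the pickup day, giving 5 and 3 when the dates carry no time component.

```python
import sqlite3

def rental_days_per_month(bookings):
    """bookings: list of (pickup_date, return_date) as 'YYYY-MM-DD' strings (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE order_history (pickup_date TEXT, return_date TEXT)")
    con.executemany("INSERT INTO order_history VALUES (?, ?)", bookings)
    query = """
        WITH RECURSIVE calendar(calendar_date) AS (
            SELECT date('2022-01-01')
            UNION ALL
            SELECT date(calendar_date, '+1 day') FROM calendar
            WHERE calendar_date < '2022-12-31'
        )
        SELECT strftime('%Y-%m', c.calendar_date) AS month, COUNT(*) AS days_booked
        FROM calendar c
        JOIN order_history o
          ON c.calendar_date > o.pickup_date    -- pickup day itself not counted
         AND c.calendar_date <= o.return_date   -- return day counted
        GROUP BY month
        ORDER BY month
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(rental_days_per_month([("2022-03-27", "2022-04-03")]))
# [('2022-03', 4), ('2022-04', 3)]
```

Because every booking row joins to its own calendar days, two bookings with identical dates are both counted.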
From the DB (PostgreSQL) I want to get, per month (for all months), the percentage of stock items with a certain condition. So the total for a whole month is 100%, and each condition gets a percentage of that. I've been trying all kinds of 'partition by' queries, but I can't quite get it right.
In the example there would be an extra column, and each row would hold the percentage for its month. So for the first row, the value of the new column would be 25/506*100.
What I have right now, which works, is:
select to_char(created_at, 'YYYY-MM') as maand, count(si.id) as aantal,
case
when condition_id=1 then 'Nieuw'
when condition_id=2 then 'Als nieuw'
when condition_id=3 then 'Goed'
when condition_id=4 then 'Redelijk'
when condition_id=5 then 'Matig'
else 'Onbepaald'
end
from stock_items si
group by maand, condition_id
order by maand desc, condition_id asc
maand    aantal  case       new column
-------  ------  ---------  -----------
2022-01  25      Nieuw      25/506*100
2022-01  234     Als nieuw  234/506*100
2022-01  127     Goed       127/506*100
2022-01  16      Redelijk   16/506*100
2022-01  104     Matig      104/506*100
2021-12  456     Nieuw      other month
I hope it's all clear. Thanks!
I got what I wanted. I realised I want it slightly differently, but this is the answer to my question.
select
to_char(created_at, 'YYYY-MM') as maand,
count(id) as aantal,
round((count(id) / (sum(count(id)) over (partition by to_char(created_at, 'YYYY-MM'))) * 100), 2) as percentage,
case
when condition_id=1 then 'Nieuw'
when condition_id=2 then 'Als nieuw'
when condition_id=3 then 'Goed'
when condition_id=4 then 'Redelijk'
when condition_id=5 then 'Matig'
else 'Onbepaald'
end
from stock_items
group by maand, condition_id
order by maand desc, condition_id asc
Just wrap it in a CTE:
with a as (
select to_char(created_at, 'YYYY-MM') as maand, count(si.id) as aantal,
case
when condition_id=1 then 'Nieuw'
when condition_id=2 then 'Als nieuw'
when condition_id=3 then 'Goed'
when condition_id=4 then 'Redelijk'
when condition_id=5 then 'Matig'
else 'Onbepaald'
end as case
from stock_items si
group by maand, condition_id
order by maand desc, condition_id asc)
select a.*, aantal * 100 / sum(aantal) over (PARTITION BY maand) as aantal_rate from a;
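The pattern both answers rely on, counting per (month, condition) first and then dividing by a window SUM partitioned by month, can be checked with a small sqlite3 sketch standing in for PostgreSQL. It rebuilds the question's January 2022 counts (25, 234, 127, 16, 104, total 506) as hypothetical rows in a `stock_items` table; multiplying by 100.0 keeps the division in floating point.

```python
import sqlite3

CONDITION_COUNTS = {1: 25, 2: 234, 3: 127, 4: 16, 5: 104}  # January 2022 counts from the question

def monthly_percentages():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stock_items (id INTEGER PRIMARY KEY, created_at TEXT, condition_id INTEGER)")
    rows = [("2022-01-15", cond) for cond, n in CONDITION_COUNTS.items() for _ in range(n)]
    con.executemany("INSERT INTO stock_items (created_at, condition_id) VALUES (?, ?)", rows)
    query = """
        WITH a AS (
            SELECT strftime('%Y-%m', created_at) AS maand, condition_id, COUNT(id) AS aantal
            FROM stock_items
            GROUP BY maand, condition_id
        )
        SELECT maand, condition_id, aantal,
               ROUND(aantal * 100.0 / SUM(aantal) OVER (PARTITION BY maand), 2) AS percentage
        FROM a
        ORDER BY maand DESC, condition_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

for row in monthly_percentages():
    print(row)  # first row: ('2022-01', 1, 25, 4.94)
```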
I'm currently writing an application that shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count  Year  Month
-----  ----  -----
100    2021  1
50     2021  2
75     2021  3
While this is okay on its own, I want it to display the running total over time rather than just the number of events in each month, so the desired output is:
Count  Year  Month
-----  ----  -----
100    2021  1
150    2021  2
225    2021  3
I've read in various places that I should use a window function via SQLAlchemy's over() function. However, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, as adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want, for each month, the number of events up to and including that month. You first need to count the events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single SELECT statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly_counts
but I'm not sure how to do that in SQLAlchemy.
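The two-step version (count per month, then a running SUM ordered by year and month) can be tried with sqlite3 as a stand-in for PostgreSQL. The `events` table, its `event_date` column and the rows are hypothetical, built to reproduce the question's monthly counts of 100, 50 and 75; SQLite has no EXTRACT, so strftime does the year/month split.

```python
import sqlite3

def running_totals(monthly_counts):
    """monthly_counts: {(year, month): n}; inserts n hypothetical events per month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, event_date TEXT)")
    for (year, month), n in monthly_counts.items():
        con.executemany(
            "INSERT INTO events (event_date) VALUES (?)",
            [(f"{year:04d}-{month:02d}-01",)] * n,
        )
    query = """
        SELECT year, month, SUM(events_count) OVER (ORDER BY year, month) AS total_so_far
        FROM (
            SELECT CAST(strftime('%Y', event_date) AS INTEGER) AS year,
                   CAST(strftime('%m', event_date) AS INTEGER) AS month,
                   COUNT(id) AS events_count
            FROM events
            WHERE event_date IS NOT NULL
            GROUP BY year, month
        ) AS monthly_counts
        ORDER BY year, month
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(running_totals({(2021, 1): 100, (2021, 2): 50, (2021, 3): 75}))
# [(2021, 1, 100), (2021, 2, 150), (2021, 3, 225)]
```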
I am trying to build a cohort analysis for monthly retention but am having trouble getting the month number column right. The month number is supposed to indicate the month(s) in which a user transacted, i.e. 0 for the registration month, 1 for the first month after the registration month, 2 for the second, and so on up to the last month. Currently, however, it returns negative month numbers in some cells.
It should be like this table:
cohort_month  total_users  month_number  percentage
------------  -----------  ------------  ----------
January       100          0             40
January       341          1             90
January       115          2             90
February      103          0             73
February      100          1             40
March         90           0             90
Here is the SQL:
with cohort_items as (
select
extract(month from insert_date) as cohort_month,
msisdn as user_id
from mfscore.t_um_user_detail where extract(year from insert_date)=2020
order by 1, 2
),
user_activities as (
select
A.sender_msisdn,
extract(month from A.insert_date)-C.cohort_month as month_number
from mfscore.t_wm_transaction_logs A
left join cohort_items C ON A.sender_msisdn = C.user_id
where extract(year from A.insert_date)=2020
group by 1, 2
),
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.sender_msisdn = C.user_id
group by 1, 2
)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
I think the RANK window function is the right solution. The idea is to assign a rank to the months of user activity for each user, ordered by year and month.
Something like:
WITH activity_per_user AS (
SELECT
user_id,
event_date,
RANK() OVER (PARTITION BY user_id ORDER BY DATE_PART('year', event_date) , DATE_PART('month', event_date) ASC) AS month_number
FROM user_activities_table
)
RANK numbering starts at 1, so you may want to subtract 1.
Then, you can group by user_id and month_number to get the number of interactions for each user per month from the subscription (adapt to your use case accordingly).
SELECT
user_id,
month_number,
COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
Here is the documentation:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_RANK.html
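One thing worth checking before relying on RANK here: rows that tie (several activities in the same month) share a rank, and RANK then skips numbers, whereas DENSE_RANK numbers the months consecutively. The sqlite3 sketch below stands in for Redshift, with hypothetical rows in `user_activities_table` for one user with two events in January and one in February. Either function numbers only months with activity, so a month without activity gets no number.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_activities_table (user_id INTEGER, event_date TEXT)")
con.executemany(
    "INSERT INTO user_activities_table VALUES (?, ?)",
    [(1, "2020-01-05"), (1, "2020-01-20"), (1, "2020-02-03")],  # hypothetical events
)
rows = con.execute("""
    SELECT event_date,
           RANK()       OVER (PARTITION BY user_id ORDER BY strftime('%Y-%m', event_date)) AS rank_no,
           DENSE_RANK() OVER (PARTITION BY user_id ORDER BY strftime('%Y-%m', event_date)) AS dense_rank_no
    FROM user_activities_table
    ORDER BY event_date
""").fetchall()
con.close()
for row in rows:
    print(row)
# ('2020-01-05', 1, 1)
# ('2020-01-20', 1, 1)
# ('2020-02-03', 3, 2)
```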
I'm currently trying to get the first and last day of every year. I have data going back to 1950, and for each year I want the first date in the dataset through the last date in the dataset (note that the last date of a year might not be December 31st, and likewise the first might not be January 1st).
Initially I thought I could use a CTE and call DATEPART with the day of the year selection, but this wouldn't partition appropriately. I also tried a CTE self-join, but since the last day or first day of the year might be different, this also yields inaccurate results.
For instance, using the below actually generates some MINs in the MAX and vice versa, though in theory it should only grab the MAX date for the year and the MIN date for the year:
;WITH CT AS(
SELECT Points
, Date
, DATEPART(DY,Date) DA
FROM Table
WHERE DATEPART(DY,Date) BETWEEN 363 AND 366
OR DATEPART(DY,Date) BETWEEN 1 AND 3
)
SELECT MIN(c.Date) MinYear
, MAX(c.Date) MaxYear
FROM CT c
GROUP BY YEAR(c.Date)
You want something like this for the first day of the year:
dateadd(year, datediff(year,0, c.Date), 0)
and this for the last day of the year:
--first day of next year -1
dateadd(day, -1, dateadd(year, datediff(year,0, c.Date) + 1, 0))
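These expressions work because the integer 0 as a SQL Server datetime means 1900-01-01: DATEDIFF(year, 0, d) counts the year boundaries crossed since then, and adding that many years back onto 1900-01-01 lands on 1 January of d's year; adding one more year and subtracting a day gives 31 December. The Python sketch below mirrors that arithmetic with the standard datetime module; the helper names are hypothetical, and it only illustrates the T-SQL expressions rather than replacing them.

```python
from datetime import date, timedelta

BASE = date(1900, 1, 1)  # what the integer 0 means as a SQL Server datetime

def datediff_year(start: date, end: date) -> int:
    # DATEDIFF(year, start, end) counts year boundaries crossed: the difference of the year parts
    return end.year - start.year

def first_day_of_year(d: date) -> date:
    # dateadd(year, datediff(year, 0, d), 0)
    return BASE.replace(year=BASE.year + datediff_year(BASE, d))

def last_day_of_year(d: date) -> date:
    # dateadd(day, -1, dateadd(year, datediff(year, 0, d) + 1, 0))
    first_of_next = BASE.replace(year=BASE.year + datediff_year(BASE, d) + 1)
    return first_of_next - timedelta(days=1)

print(first_day_of_year(date(1950, 6, 15)), last_day_of_year(date(1950, 6, 15)))
# 1950-01-01 1950-12-31
```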
Try this for getting the first day of the year, the last day of the year, and the first day of the next year:
SELECT
DATEADD(yy, DATEDIFF(yy,0,getdate()), 0) AS Start_Of_Year,
dateadd(yy, datediff(yy,-1, getdate()), -1) AS Last_Day_Of_Year,
DATEADD(yy, DATEDIFF(yy,0,getdate()) + 1, 0) AS FirstOf_the_NextYear
So, putting this into your query:
;WITH CT AS(
SELECT Points
, Date
, DATEPART(DY,Date) DA
FROM Table
WHERE DATEPART(DY,Date) BETWEEN
DATEPART(day,DATEADD(yy, DATEDIFF(yy,0,getdate()), 0)) AND
DATEPART(day,dateadd(yy, datediff(yy,-1, getdate()), -1))
)
SELECT MIN(c.Date) MinYear
, MAX(c.Date) MaxYear
FROM CT c
GROUP BY YEAR(c.Date)
I should refrain from developing in the evenings because I solved it, and it's actually quite simple:
SELECT MIN(Date)
, MAX(Date)
FROM Table
GROUP BY YEAR(Date)
I can put these values into a CTE and then JOIN on the dates and get what I need:
;WITH CT AS(
SELECT MIN(Date) Mi
, MAX(Date) Ma
FROM Table
GROUP BY YEAR(Date)
)
SELECT c.Mi
, m.Points
, c.Ma
, f.Points
FROM CT c
INNER JOIN Table m ON c.Mi = m.Date
INNER JOIN Table f ON c.Ma = f.Date
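The final approach (MIN/MAX per year in a CTE, then joining back to pick up the Points on those dates) can be exercised with sqlite3 as a stand-in for SQL Server. The table is called `readings` here because `Table` is a reserved word, and the rows are hypothetical; SQLite has no YEAR(), so strftime('%Y', ...) does the grouping.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (Date TEXT, Points INTEGER)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("1950-01-03", 10), ("1950-06-01", 20), ("1950-12-29", 30),  # 1950 starts Jan 3, ends Dec 29
     ("1951-01-01", 40), ("1951-12-31", 50)],
)
rows = con.execute("""
    WITH ct AS (
        SELECT MIN(Date) AS mi, MAX(Date) AS ma
        FROM readings
        GROUP BY strftime('%Y', Date)
    )
    SELECT c.mi, m.Points, c.ma, f.Points
    FROM ct c
    JOIN readings m ON c.mi = m.Date   -- row on the first date of the year
    JOIN readings f ON c.ma = f.Date   -- row on the last date of the year
    ORDER BY c.mi
""").fetchall()
con.close()
for row in rows:
    print(row)
# ('1950-01-03', 10, '1950-12-29', 30)
# ('1951-01-01', 40, '1951-12-31', 50)
```

If several rows share a year's first or last date, the joins return one row per combination, so you may need to decide which Points value to keep.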