Postgres calculate growth rate over two month - postgresql

I would like to calculate the growth rate for customers for the following data.
month   | customers
--------+----------
01-2015 |  1
02-2015 | 10
03-2015 | 10
06-2015 | 15
I have used the following formula to calculate the growth rate. It works only for one-month intervals, and it cannot produce the expected output because of the gap between the 3rd and 6th months shown in the table above:
select
    month, total,
    (total::float / lag(total) over (order by month) - 1) * 100 growth
from (
    select to_char(created, 'yyyy-mm') as month, count(id) total
    from customers
    group by month
) s
order by month;
I think this can be done by creating a date range and grouping by that range.
I expect two separate outputs:
1) the growth rate with an exact one-month difference;
2) the growth rate over a two-month interval instead of a single month, i.e. for the data above, sum the customers over each two-month bucket and group by that bucket instead of by month.

Still not sure about the second part. Here's the growth from your adapted query, along with a two-month column:
select
    month, total,
    (total::float / lag(total) over (order by m) - 1) * 100 growth,
    m, m2
from (
    select created,
           (sum(customers) over (order by m))::float total,
           customers, m, m2,
           to_char(created, 'yyyy-mm') as month
    from customers c
    right outer join (
        select generate_series('2015-01-01', '2015-06-01', '1 month'::interval) m
    ) m1 on m = c.created
    left outer join (
        select generate_series('2015-01-01', '2015-06-01', '2 month'::interval) m2
    ) m2 on m2 = m
    order by m
) s
order by m;
Basically, the answer is to use generate_series.
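For the second requirement, here is a hedged sketch of the bucket approach (untested; it assumes the same customers(id, created) table as above and that months without rows count as zero): fill the month gaps with generate_series, collapse the gap-filled months into two-month buckets, then apply the same lag() growth formula per bucket.

-- hedged sketch for requirement 2 (untested): assumes customers(id, created)
-- and that months without rows count as 0; window 2015-01 .. 2015-06
with months as (
    select m::date as month,
           count(c.id) as total
    from generate_series('2015-01-01'::date, '2015-06-01', interval '1 month') m
    left join customers c on date_trunc('month', c.created) = m
    group by m
), buckets as (
    select min(month)        as bucket_start,
           sum(total)::float as total
    from (
        select month, total,
               (row_number() over (order by month) - 1) / 2 as bucket   -- 0,0,1,1,2,2
        from months
    ) b
    group by bucket
)
select bucket_start, total,
       (total / nullif(lag(total) over (order by bucket_start), 0) - 1) * 100 as growth
from buckets
order by bucket_start;

The nullif() just guards against division by zero when a bucket has no customers at all.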

Related

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application which shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
    count(Event.id).label('count'),
    extract('year', Event.date).label('year'),
    extract('month', Event.date).label('month')
).filter(
    Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count | Year | Month
------+------+------
  100 | 2021 |     1
   50 | 2021 |     2
   75 | 2021 |     3
While this is okay on its own, I want it to display the total number over time, not just the number of events that month, so the desired output should be:
Count | Year | Month
------+------+------
  100 | 2021 |     1
  150 | 2021 |     2
  225 | 2021 |     3
I read in various places that I should use a window function via SqlAlchemy's over function; however, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
    count(Event.id).over(
        order_by=(
            extract('year', Event.date),
            extract('month', Event.date)
        ),
        partition_by=Event.date
    ).label('count'),
    extract('year', Event.date).label('year'),
    extract('month', Event.date).label('month')
).filter(
    Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, since adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
    extract('year', Event.date).label('year'),
    extract('month', Event.date).label('month'),
    func.sum(func.count(Event.id)).over(order_by=(
        extract('year', Event.date),
        extract('month', Event.date)
    )).label('count'),
).filter(
    Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want the number of events up to and including each month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single select statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER (ORDER BY year, month)
FROM (
    SELECT extract(year FROM events.date) AS year
         , extract(month FROM events.date) AS month
         , COUNT(events.id) AS events_count
    FROM events
    GROUP BY 1, 2
) sub
But I'm not sure how to do that in SqlAlchemy.

Postgres find where dates are NOT overlapping between two tables

I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
id | unique | start_date | end_date   | data
---+--------+------------+------------+-----
 1 | a      | 2019-01-01 | 2019-01-31 | X
 2 | a      | 2019-02-01 | 2019-02-28 | Y
 3 | b      | 2019-01-01 | 2019-06-30 | Y
Plan Table:
id | item_unique | start_date | end_date
---+-------------+------------+-----------
 1 | a           | 2019-01-01 | 2019-01-10
 2 | a           | 2019-01-15 | 'infinity'
I am trying to find a way to produce the following
Missing:
item_unique | from       | to
------------+------------+-----------
a           | 2019-01-11 | 2019-01-14
b           | 2019-01-01 | 2019-06-30
step-by-step demo: db<>fiddle
WITH excepts AS (
    SELECT
        item,
        generate_series(start_date, end_date, interval '1 day') gs
    FROM items

    EXCEPT

    SELECT
        item,
        generate_series(
            start_date,
            CASE WHEN end_date = 'infinity'
                 THEN (SELECT MAX(end_date) as max_date FROM items)
                 ELSE end_date END,
            interval '1 day'
        )
    FROM plan
)
SELECT
    item,
    MIN(gs::date) AS start_date,
    MAX(gs::date) AS end_date
FROM (
    SELECT
        *,
        SUM(same_day) OVER (PARTITION BY item ORDER BY gs)
    FROM (
        SELECT
            item,
            gs,
            COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) as same_day
        FROM excepts
    ) s
) s
GROUP BY item, sum
ORDER BY 1, 2
Finding the missing days is quite simple and is done within the WITH clause:
Generate all days of each date range and subtract (EXCEPT) the expanded day list of the second table; all dates that do not occur in the second table are kept. The infinity end date is a little tricky, so I replaced the infinity occurrence with the maximum end date of the first table. This avoids expanding an infinite list of dates.
The more interesting part is re-aggregating this list, which happens outside the WITH clause:
The lag() window function fetches the previous date. If the gap to the previous date is more than one day, the expression yields 1. (A time-change issue occurred here: this is why I test for a two-day difference rather than a one-day difference; between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time.)
These 0 and 1 values are summed cumulatively. Every gap greater than one day starts a new interval (the days in between are covered).
This yields a groupable column which can be used to aggregate and find the min and max date of each interval.
I tried something with date ranges, which seems to be a better approach, especially for avoiding the expansion of long date lists, but I didn't come up with a proper solution. Maybe someone else will?
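For the record, here is a hedged sketch of that range idea (untested; it assumes PostgreSQL 14 or later for range_agg/multiranges and reuses the item/start_date/end_date names from the query above): aggregate each table into a datemultirange per item, subtract the planned coverage from the item coverage, and unnest what remains.

-- hedged sketch, requires PostgreSQL 14+ (range_agg, datemultirange); not part of the fiddle above
select item,
       lower(gap)     as start_date,
       upper(gap) - 1 as end_date
from (
    select i.item,
           unnest(i.covered - coalesce(p.covered, '{}'::datemultirange)) as gap
    from (
        -- coverage of the item table, as half-open date ranges
        select item, range_agg(daterange(start_date, end_date + 1)) as covered
        from items
        group by item
    ) i
    left join (
        -- coverage of the plan table; 'infinity' becomes an unbounded upper end
        select item,
               range_agg(daterange(start_date,
                                   case when end_date = 'infinity' then null
                                        else end_date + 1 end)) as covered
        from plan
        group by item
    ) p using (item)
) gaps
order by 1, 2;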

If today's results are blank, show the totals from yesterday

My code computes an accumulated total of revenue over a period of time. If a single day is blank (no revenue for that day), I need it to show the total from the day before: CASE WHEN (today is blank) THEN yesterday's data ELSE today's total.
I am not sure what the syntax is on this one.
select distinct
    date_trunc('day', admit_date) as admit_date,
    revenue,
    sum(revenue) over (order by admit_date) as running_rev
from dailyrev
order by admit_date
Expected Results:
Day 1: $100
Day 2: $200
Day 3: (no data so show Day 2 data) $200
Maybe this is what you need:
SELECT admit_date,
       prev_revs[cardinality(prev_revs)] AS adj_revenue,
       sum(prev_revs[cardinality(prev_revs)])
           OVER (ORDER BY admit_date) AS running_sum
FROM (SELECT date_trunc('day', admit_date) AS admit_date,
             array_remove(array_agg(revenue)
                              OVER (order by admit_date),
                          NULL) AS prev_revs
      FROM dailyrev) AS q
ORDER BY admit_date;
Unfortunately, PostgreSQL doesn't yet support the IGNORE NULLS clause; otherwise this would have been simpler.
I am not sure if this is what you want, but try this:
SELECT
    gs.date::date AS admit_date,
    (SELECT revenue FROM dailyrev WHERE admit_date::date = gs.date) AS revenue,
    (SELECT SUM(revenue) FROM dailyrev WHERE admit_date::date <= gs.date) AS accumulated_total
FROM
    generate_series(
        (SELECT MIN(admit_date::date) FROM dailyrev),
        (SELECT MAX(admit_date::date) FROM dailyrev),
        INTERVAL '1 day'
    ) AS gs(date)
ORDER BY gs.date::date;
Yes, it does not look that nice, but..
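Another hedged sketch, in case all you need is the running total carried across blank days: fill the calendar with generate_series, left-join the per-day revenue, and let the running SUM carry over the days with no rows (table and column names are taken from the question; untested).

-- hedged sketch: fill every calendar day so the running total carries across blank days
select cal.day::date as admit_date,
       coalesce(r.revenue, 0) as revenue,
       sum(coalesce(r.revenue, 0)) over (order by cal.day) as running_rev
from generate_series(
         (select min(admit_date::date) from dailyrev),
         (select max(admit_date::date) from dailyrev),
         interval '1 day'
     ) as cal(day)
left join (
    select date_trunc('day', admit_date)::date as day,
           sum(revenue) as revenue
    from dailyrev
    group by 1
) r on r.day = cal.day::date
order by admit_date;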

Extracting data that is relevant to the financial year as set by the parameter date

I have a student_table, and in this table there is a column student_financial_aid_type and the next column is date_. For example, student_financial_aid_type = 'direct' and date_ = 1/04/2018. I have used CTEs, and I have a parameter date at the beginning of the code so that I get the number of students as of that day, e.g. my parameter date is 20/04/2019.
My financial year runs from April to March, e.g. 1/04/18 - 31/3/19.
My question: I want an output column that says either 'Y' or 'N' to indicate whether the student received some form of financial aid in the financial year of the parameter date. Using the example above, because the date 1/04/2018 is not in the financial year of the parameter date (20/04/19) but in the previous financial year (1/04/18 - 31/3/19), I would want 'N' in the output column, since in the financial year of the parameter date the student did not receive any financial aid. However, if I change the parameter date to 2/06/18, then the date the student received the financial aid (1/04/18) is in the same financial year as the parameter date, so the output column should now have 'Y'. Whatever the approach, it has to be dynamic and respond to the parameter date, as that is what I, as the user, will be changing.
I have tried using date_part, and I have managed to get the month number of the date the student received the payout. From there I was thinking of using the month number as an indicator of which financial year it falls in, but I am not sure how to go about this.
WITH
parameter_date as (
    select '2019-04-26'::date p_date),
student_cohort as (select * from (
    SELECT Distinct
        ms.studentid, ss.student_admission_date, ms.graduation_date
    FROM master_student_table ms
    left join student_semeter ss on ms.student_id = ss.student_id,
        parameter_date p
    WHERE ss.student_admission_date <= p_date   -- i.e. began studies on or before p_date
    AND (ms.graduation_date is null or ms.graduation_date > p_date)   -- i.e. finished studies after p_date, or still studying
) x ),
student_finance as (select * from (
    select date_part('month', st.date_::date) date_part,
           st.date_, st.studentid, st.student_financial_aid_type
    from student_table st
    left join student_cohort s on st.studentid = s.studentid
    where st.student_financial_aid_type in ('direct', 'indirect')
) x )
select distinct
    s.studentid,
    s.graduation_date,
    s.student_admission_date,
    sf.date_,
    -- this is what I would like it to be:
    -- case when sf.date_ is in the same financial year as the parameter_date
    -- then 'Y' else 'N' end was_financial_aid_received_in_the_fy,
    sf.date_part
from
    student_cohort s
left join student_finance sf on s.studentid = sf.studentid
    and sf.student_financial_aid_type = 'direct'
left join student_finance sf1 on s.studentid = sf1.studentid
    and sf1.student_financial_aid_type = 'indirect'
I would love for the output column was_financial_aid_received_in_the_fy from the case statement to have 'Y' if the sf.date_ on which the student received financial aid is in the same financial year as the parameter_date, and 'N' if this isn't the case.
Thank you very much for all your help
I think this question basically boils down to the following:
Given a parameter date, figure out the financial year for that date.
Figure out if other dates fall in this financial year.
This is a great place to use dateranges, one of my favorite types. We can figure out the financial year from the parameter date and use a daterange to represent it. If the parameter date is before April, the financial year runs from April 1 of the previous year (inclusive) to April 1 of this year (exclusive); otherwise (April or later), it runs from April 1 of this year (inclusive) to April 1 of next year (exclusive).
Here's a query that should demonstrate how to do this:
WITH parameter_date as (
select '2019-04-26'::date p_date
), fiscal_year as (
select daterange(
make_date(case when date_part('month', p_date)<4
THEN date_part('year', p_date)::int-1
ELSE date_part('year', p_date)::int END,
4, 1),
make_date(case when date_part('month', p_date)<4
THEN date_part('year', p_date)::int
ELSE date_part('year', p_date)::int+1 END,
4, 1),
'[)') as f_year
FROM parameter_date
),
test_data as (
select test_date::date from (values
('2019-04-01'),
('2018-04-01'),
('2019-03-02'),
('2020-12-01'),
('2017-05-26'),
('2020-02-27'),
('2020-04-01')
) v(test_date)
)
select test_date,
CASE WHEN test_date <@ fiscal_year.f_year THEN 'Y' ELSE 'N' END as in_f_year
from test_data, fiscal_year;
test_date | in_f_year
------------+-----------
2019-04-01 | Y
2018-04-01 | N
2019-03-02 | N
2020-12-01 | N
2017-05-26 | N
2020-02-27 | Y
2020-04-01 | N
(7 rows)
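Applied to your original query, the same containment test would look roughly like this (only a sketch; it assumes the fiscal_year CTE above is added to your WITH list and that sf.date_ is a date, so cast it first if it is a timestamp):

-- sketch only: fiscal_year is the CTE from the demo above
select distinct
    s.studentid,
    sf.date_,
    case when sf.date_ <@ fy.f_year then 'Y' else 'N' end
        as was_financial_aid_received_in_the_fy
from student_cohort s
left join student_finance sf on s.studentid = sf.studentid
    and sf.student_financial_aid_type = 'direct'
cross join fiscal_year fy;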

Customize query of postgresql

I am using a PostgreSQL database. I have two queries, but I don't want to run multiple queries; is it possible to manage this with a single query?
Query 1 :
select coalesce(sum("dummy"), 0) as sum
from generate_series('2014-09-09 00:00:00'::timestamp, '2014-09-09 23:59:59', '1 minute') minutes(minute)
LEFT JOIN report ON minutes.minute = date_trunc('minute', report.fetchdate)
    AND fetchdate >= '2014-09-09 00:00:00' AND fetchdate <= '2014-09-09 23:59:00'
    AND entity_id = '0'
group by minute
order by minute
OUTPUT: the total count of the dummy field for each minute of the day, i.e. each day has 24*60 = 1440 records.
Note: this query is used for a single day.
Query2 :
select date(day) as day, coalesce(sum("dummy"), 0) as sum
from generate_series('2014-09-06 00:00:01'::date, '2014-09-12 23:59:59'::date, '1 day'::interval) days(day)
LEFT JOIN report ON days.day = date_trunc('day', report.fetchdate) AND entity_id = '0'
group by day
order by day
OUTPUT: the total count of the dummy field for each day between 2014-09-06 and 2014-09-12, i.e. 7 records in total (dates 6, 7, 8, 9, 10, 11, 12).
Note: this query is used for more than one day.
Required output:
1) the total count of the dummy field for each day between the specified dates (the output of the 2nd query);
2) the maximum per-minute count of each day.
Ex:
Suppose I search for any two days. I need the data broken down into single dates, with data for each minute of each date, and whenever a particular minute has the maximum count of the dummy field for that day, that maximum should be shown as the "maximum call" for that day.
select
    date_trunc('day', minute) as day,
    sum(minute_sum) as day_sum,
    max(minute_sum) as max_minute_sum
from (
    select
        minute,
        coalesce(sum("dummy"), 0) as minute_sum
    from
        generate_series(
            '2014-09-06'::timestamp,
            '2014-09-13'::timestamp - interval '1 minute',
            '1 minute'
        ) minutes(minute)
    left join
        report on minutes.minute = date_trunc('minute', report.fetchdate)
        and entity_id = '0'
    group by minute
) s
group by 1
order by 1