Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function - postgresql

I'm writing an application that shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count  Year  Month
------------------
100    2021  1
50     2021  2
75     2021  3
While this is okay on its own, I want it to display the running total over time, not just the number of events in each month, so the desired output should be:
Count  Year  Month
------------------
100    2021  1
150    2021  2
225    2021  3
I read in various places that I should use a window function via SQLAlchemy's over function, but I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, because adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')

I'm not 100% sure what you want, but I'm assuming you want, for each month, the number of events up to and including that month. You first need to count the events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single select statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly
I'm not sure how to do that in SQLAlchemy, though.
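The two-step logic described above (count per month, then keep a running sum over the months in order) can be sketched in plain Python. This only illustrates what the window function computes; it is not SQLAlchemy code, and the function name `cumulative_monthly_counts` and the sample dates are made up for the example.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_monthly_counts(dates):
    """Return (year, month, running_total) tuples in chronological order."""
    # Step 1: count events per (year, month), skipping missing dates,
    # like GROUP BY year, month with a "date IS NOT NULL" filter.
    per_month = Counter((d.year, d.month) for d in dates if d is not None)
    # Step 2: running sum over the months in order,
    # like SUM(COUNT(...)) OVER (ORDER BY year, month).
    months = sorted(per_month)
    totals = accumulate(per_month[m] for m in months)
    return [(y, m, total) for (y, m), total in zip(months, totals)]


# Hypothetical data matching the monthly counts in the question: 100, 50, 75.
events = (
    [date(2021, 1, 15)] * 100
    + [date(2021, 2, 15)] * 50
    + [date(2021, 3, 15)] * 75
    + [None]
)
result = cumulative_monthly_counts(events)
print(result)
```

Note that the running sum has no partition: partitioning by the event date, as in the failing query, would restart the total for every distinct date.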

Related

Mixing DISTINCT with GROUP_BY Postgres

I am trying to get a list of:
all months in a specified year that,
have at least 2 unique rows based on their date
and ignore specific column values
where I got to is:
SELECT DATE_PART('month', "orderDate") AS month, count(*)
FROM public."Orders"
WHERE "companyId" = 00001 AND "orderNumber" != 1 AND DATE_PART('year', "orderDate") = '2020' AND "orderNumber" IS NOT NULL
GROUP BY month
HAVING COUNT ("orderDate") > 2
The HAVING COUNT clause sort of works in place of DISTINCT, insofar as I can be reasonably sure that it filters down to the data required.
However, being able to use DISTINCT based on a given date within a month would return a more reliable result. Is this possible with Postgres?
A sample line of data from the table:
Sample Input
"2018-12-17 20:32:00+00"
"2019-02-26 14:38:00+00"
"2020-07-26 10:19:00+00"
"2020-10-13 19:15:00+00"
"2020-10-26 16:42:00+00"
"2020-10-26 19:41:00+00"
"2020-11-19 20:21:00+00"
"2020-11-19 21:22:00+00"
"2020-11-23 21:10:00+00"
"2021-01-02 12:51:00+00"
Without the HAVING COUNT clause, this produces:
month  count
------------
7      1
10     2
11     3
Month 7 can be discarded easily as only 1 record.
Month 10 is the issue: we have two records. But from the data above, those records are from the same day. Similarly, month 11 only has 2 distinct records by day.
The output should therefore be ideally:
month  count
------------
11     2
We have only two distinct dates from the 2020 data, and they are from month 11 (November)
I think you just want to take the distinct count of dates for each month:
SELECT
DATE_PART('month', "orderDate") AS month,
COUNT(DISTINCT "orderDate"::date) AS count
FROM "Orders"
WHERE
"companyId" = 1 AND
"orderNumber" != 1 AND
DATE_PART('year', "orderDate") = '2020'
GROUP BY
DATE_PART('month', "orderDate")
HAVING
COUNT(DISTINCT "orderDate"::date) >= 2;
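As a rough cross-check of the distinct-date idea, here is a minimal Python sketch that counts distinct calendar days per month for the sample timestamps in the question. The function name `months_with_distinct_days` is made up, the time zone suffix is dropped for simplicity, and the companyId and orderNumber filters are not applied because those columns are not shown in the sample, so on the raw sample October also ends up with two distinct days.

```python
from collections import defaultdict
from datetime import datetime


def months_with_distinct_days(timestamps, year, minimum=2):
    """Map month -> number of distinct days, keeping months with >= minimum."""
    days_by_month = defaultdict(set)
    for ts in timestamps:
        if ts.year == year:
            # A set keeps each calendar day once, like COUNT(DISTINCT ...::date).
            days_by_month[ts.month].add(ts.date())
    return {
        month: len(days)
        for month, days in sorted(days_by_month.items())
        if len(days) >= minimum
    }


# Sample timestamps from the question (time zone suffix omitted).
sample = [
    datetime.fromisoformat(s)
    for s in [
        "2018-12-17 20:32:00",
        "2019-02-26 14:38:00",
        "2020-07-26 10:19:00",
        "2020-10-13 19:15:00",
        "2020-10-26 16:42:00",
        "2020-10-26 19:41:00",
        "2020-11-19 20:21:00",
        "2020-11-19 21:22:00",
        "2020-11-23 21:10:00",
        "2021-01-02 12:51:00",
    ]
]
distinct_2020 = months_with_distinct_days(sample, 2020)
print(distinct_2020)
```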

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name Start End
---------------------------------------
Project 1 2020-08-01 2020-09-10
Project 2 2020-01-01 2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select date_trunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p.end), '1 day'::interval) days
where extract(DOW from days) not IN (0, 6)
group by d_month
where p.start is from the aliased main query and the dates are hard-coded for now, this correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
However subqueries can't return multiple values. The date range for the query is dynamic, so I would either need to somehow return the query as:
Name Start End 2020-08-01 2020-09-01 ...
-------------------------------------------------------------------------
Project 1 2020-08-01 2020-09-10 21 8
Project 2 2020-01-01 2025-01-01 21 10
Or simply return the whole subquery as JSON, but that doesn't seem to be working either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
ddate date not null primary key,
is_work_day boolean default true
);
insert into calendar
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
'2000-01-01'::timestamp,
'2099-12-31'::timestamp,
interval '1 day'
) as gs(ts);
Assuming a calendar table is not within scope, you can do this:
with bounds as (
select min(start) as first_start, max("end") as last_end
from my_projects
), cal as (
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from bounds
cross join generate_series(
first_start,
last_end,
interval '1 day'
) as gs(ts)
), bymonth as (
select p.name, p.start, p.end,
date_trunc('month', c.ddate) as month_start,
count(*) as work_days
from my_projects p
join cal c on c.ddate between p.start and p.end
where c.is_work_day
group by p.name, p.start, p.end, month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
|| jsonb_object_agg('name', name)
|| jsonb_object_agg('start', start)
|| jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Doing a pivot from rows to columns in SQL is usually a bad idea, so the query produces json for you.
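To check the numbers such a query should produce, the per-month working-day count can be sketched in plain Python. This is only an illustration under the same assumption as the SQL (Saturday and Sunday are the only non-working days); the function name `work_days_by_month` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def work_days_by_month(start, end):
    """Count Monday-to-Friday days per month between start and end, inclusive."""
    counts = Counter()
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            # Key by the first day of the month, like date_trunc('month', ...).
            counts[day.replace(day=1).isoformat()] += 1
        day += timedelta(days=1)
    return dict(counts)


# Project 1 from the question: 2020-08-01 to 2020-09-10.
project_1 = work_days_by_month(date(2020, 8, 1), date(2020, 9, 10))
print(project_1)
```

The resulting dictionary has one key per month, which is the same shape as the JSON object the query builds for each project.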

How to Display Last rolling 12 months in correct month order

I have an SSRS report that shows customer sales for the year and I have been asked to change it to the last 13 rolling months. I have changed my where clause to be:
WHERE (#First12Months.FirstSaleDate BETWEEN DATEADD(MM,-13,#ReportDate) AND (#ReportDate))
( #ReportDate is the last day of the month that needs to be displayed on the right of the matrix.)
This where clause pulls the correct data but it is still displaying in my monthsort order and I need to change this to the last 12 months so that the newest month is on the right and the oldest month is on the left. I cannot work out how to do the sort.
My old sort is MonthSort which gives each month a number where April is 1 through to March = 12:
CASE WHEN Month(#First12Months.FirstSaleDate)<=3 THEN MONTH(#First12Months.FirstSaleDate)+9 ELSE MONTH(#First12Months.FirstSaleDate)-3 END AS MonthSort
but of course this is now incorrect as I need the month from #ReportDate to be number 13 and each month before that chronologically to be 1 number less.
I found this post which is the only one that seems to come close to what I need but unfortunately I simply don't understand what it is saying.
Dynamic table/output each month for report
How do I tell the MonthSort column which number to allocate to the months to get the correct sort order for a rolling 13 months?
As your data is in rows and your SSRS displays it in columns you can do the following:
Add a sorting column to your sql query that uses an analytical function in order to give the (dense) rank of the month. That rank can then be used as a sorting criteria in SSRS.
Assuming your month column is called month, your query could look like this:
select t.*, dense_rank() over (order by month) rnk from t
That order could also be done descending like this:
select t.*, dense_rank() over (order by month desc) rnk from t
Let's have an example:
with t as (
select 2134 sales, cast('20190101' as date) month union all
select 3456 sales, cast('20190201' as date) month union all
select 234 sales, cast('20190301' as date) month union all
select 4567 sales, cast('20190401' as date) month union all
select 5678 sales, cast('20190501' as date) month union all
select 234 sales, cast('20190601' as date) month union all
select 756 sales, cast('20190701' as date) month union all
select 9 sales, cast('20190801' as date) month union all
select 24356134 sales, cast('20190901' as date) month union all
select 2456134 sales, cast('20191001' as date) month union all
select 234 sales, cast('20191101' as date) month union all
select 675 sales, cast('20191201' as date) month union all
select 86 sales, cast('20200101' as date) month union all
select 786 sales, cast('20200201' as date) month union all
select 715 sales, cast('20200301' as date) month union all
select 156 sales, cast('20200401' as date) month union all
select 123 sales, cast('20200501' as date) month union all
select 687 sales, cast('20200601' as date) month union all
select 45 sales, cast('20200701' as date) month
)
, t1 as (
select sales, month from t where t.month > dateadd(MONTH, -12, getdate())
)
select t1.*, DENSE_RANK() over (order by datefromparts(year(month), month([month]), 1)) rnk from t1
will return
I guess, your first query retrieving the data is correct. Changing the columns' order would have to be done in the SSRS report.
For sorting tablix (table elements in SSRS) have a look here
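For readers less familiar with dense_rank, here is a small Python sketch of what the ranking column contains: every distinct month gets the next consecutive number in chronological order, and rows sharing a month share a rank. The helper name `dense_rank` and the sample months are made up for the illustration.

```python
from datetime import date


def dense_rank(values):
    """Return the 1-based dense rank of each value, in input order."""
    # Distinct values sorted ascending get ranks 1, 2, 3, ... with no gaps.
    rank_of = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [rank_of[v] for v in values]


# Hypothetical month column with a repeated month and a year boundary.
months = [date(2019, 12, 1), date(2020, 1, 1), date(2019, 12, 1), date(2020, 2, 1)]
ranks = dense_rank(months)
print(ranks)
```

Because the rank is computed on full dates rather than on month numbers, December 2019 sorts before January 2020.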
Posting my own answer:
I have managed to work out a formula that calculates a position number for each column to replace my MonthSort column:
select case when MONTH(saledate) BETWEEN MONTH(dateadd(mm,-11,#ReportDate)) AND 12 THEN((MONTH(saledate)+1)-MONTH(#ReportDate)+11) ELSE ((month(saledate)+1)+month(#ReportDate)+11) end as position
from table
WHERE (saledate BETWEEN DATEADD(MM,-11,#ReportDate) AND (#ReportDate))
This doesn't quite give me what I wanted. I wanted 12 months, but I couldn't work out how to differentiate between the same month this year and last year. For example, if the report date is 30/6/2020, a 12-month parameter returns two June months (one each for 2019 and 2020) and places both in position 1, when June 2019 should be in position 1 and June 2020 in position 13. It works well with 11 months. If anyone can help with getting it to 12 months I would be grateful.
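One way to tell the two Junes apart is to include the year in the calculation, counting whole months between the sale date and the report date. The sketch below is a Python illustration of that arithmetic, not T-SQL; the function name `rolling_position` and the `window` parameter are assumptions made for the example.

```python
from datetime import date


def rolling_position(sale_date, report_date, window=13):
    """Position 1..window of the sale month, with the report month last.

    Returns None when the sale month falls outside the rolling window.
    """
    # Whole months between the two dates, using year and month together
    # so the same month in different years gets different values.
    months_back = (report_date.year - sale_date.year) * 12 + (
        report_date.month - sale_date.month
    )
    if not 0 <= months_back < window:
        return None
    return window - months_back


report = date(2020, 6, 30)
june_2019 = rolling_position(date(2019, 6, 15), report)
june_2020 = rolling_position(date(2020, 6, 15), report)
print(june_2019, june_2020)
```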
I am posting another answer of my own because it is a different way of achieving what I needed as well as the simplest - I was making this issue far more complicated than it needed to be.
In my second effort I added an extra column called SaleYear using: YEAR(SaleDate). I already had a column for MONTH(SaleDate) that I was using in a case when to achieve the Apr to March sort.
I restricted the data within the SQL query to the last 13 months using a where clause:
WHERE (SaleDate BETWEEN DATEADD(MM,-13,#ReportDate) AND DATEADD(minute, - 1, #ReportDate + 1))
And in the SSRS report I added 'Year' as a parent column group to the 'Month' column group. In the column group I added sorting by Year and then by month.
Because I had already restricted the data in the sql query to the last 13 months I have the last 13 rolling months in the correct order.
This is the cleanest and simplest answer.

CASE expressions with MAX aggregate functions Oracle

Using Oracle, I have selected the title_id with its associated month of publication with:
SELECT title_id,
CASE EXTRACT(month FROM pubdate)
WHEN 1 THEN 'Jan'
WHEN 2 THEN 'Feb'
WHEN 3 THEN 'Mar'
WHEN 4 THEN 'Apr'
WHEN 5 THEN 'May'
WHEN 6 THEN 'Jun'
WHEN 7 THEN 'Jul'
WHEN 8 THEN 'Aug'
WHEN 9 THEN 'Sep'
WHEN 10 THEN 'Oct'
WHEN 11 THEN 'Nov'
ELSE 'Dec'
END MONTH
FROM TITLES;
Using the statement:
SELECT MAX(Most_Titles)
FROM (SELECT count(title_id) Most_Titles, month
FROM (SELECT title_id, extract(month FROM pubdate) AS MONTH FROM titles) GROUP BY month);
I was able to determine the month with the maximum number of books published.
Is there a way to join the two statements so that I can associate the month's text equivalent with the maximum number of titles?
In order to convert a month to a string, I wouldn't use a CASE statement, I'd just use a TO_CHAR. And you can use analytic functions to rank the results to get the month with the most books published.
SELECT num_titles,
to_char( publication_month, 'Mon' ) month_str
FROM (SELECT count(title_id) num_titles,
trunc(pubdate, 'MM') publication_month,
rank() over (order by count(title_id) desc) rnk
FROM titles
GROUP BY trunc(pubdate, 'MM'))
WHERE rnk = 1
A couple of additional caveats
If there are two months that are tied with the most publications, this query will return both rows. If you want Oracle to arbitrarily pick one, you can use the row_number analytic function rather than rank.
If the PUBDATE column in your table only has dates of midnight on the first of the month where the book is published, you can eliminate the trunc on the PUBDATE column.
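The tie behaviour mentioned in the first caveat can be illustrated with a short Python sketch that mirrors the query: group by year and month, find the highest count, and return every month that reaches it. The function name `busiest_months` and the publication dates are invented for the example.

```python
import calendar
from collections import Counter
from datetime import date


def busiest_months(pub_dates):
    """Return (num_titles, month_abbreviation) for every top-ranked month."""
    # Group by (year, month), like GROUP BY trunc(pubdate, 'MM').
    counts = Counter((d.year, d.month) for d in pub_dates)
    if not counts:
        return []
    top = max(counts.values())
    # Keep all ties, like WHERE rnk = 1 with rank().
    return [
        (n, calendar.month_abbr[month])
        for (year, month), n in sorted(counts.items())
        if n == top
    ]


# Hypothetical publication dates: March 2020 and July 2021 tie with two titles.
pubdates = [
    date(2020, 3, 1),
    date(2020, 3, 15),
    date(2020, 7, 4),
    date(2021, 7, 1),
    date(2021, 7, 9),
]
top_months = busiest_months(pubdates)
print(top_months)
```

Returning only the first element of the list would correspond to the row_number variant, which picks one of the tied months arbitrarily.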

deriving calendar month from week number

I've had a hunt around for something similar to this but can't find anything.
I have a query that provides the number of transactions that have occurred each day and need to group by year, month, week BUT of course some months span multiple week numbers, eg. Sept. & Oct. 2009.
Take for example week 39 last year (September & October). Thursday is the 1st October therefore 4 days of that week fall in Oct., therefore the volume of transactions for the last 3 days of Sept. should be added to the first week of October's totals? Clear?
For example:
VOLUME     TRANSACTION  YEAR  MONTH      WEEK
---------------------------------------------
1264.1730  53           2009  September  37
2739.7200  109          2009  September  38
522.5500   21           2009  October    39
1196.6450  51           2009  September  39
2827.9550  113          2009  October    40
2730.4050  110          2009  October    41
3763.7200  154          2009  October    42
3425.6250  137          2009  October    43
3551.8100  143          2009  November   44
2788.0150  113          2009  November   45
The problem is that the calendar is awkward, and there's not much you can do about it. As far as I can see, you have three choices:
1. Group by year and month. Display the week or weeks in the result but don't group by them.
2. Group by year and weeks. Display the month or months in the result but don't group by them.
3. Group by year, month, week, and accept that some of the groups contain less than one week's data. (i.e. what you have now)
From your description it seems like you want option 2:
SELECT year, MIN(month), week, SUM(transaction)
FROM Table1
GROUP BY year, week
Something like this would do:
-- For weeks starting Sunday and ending Saturday, the US default:
SET DATEFIRST 7
-- Alternatively, for weeks starting Saturday and ending Friday:
--SET DATEFIRST 6
SELECT
[Date]
, DATENAME(WEEKDAY,[Date]) AS [DayOfWeek]
, DATEADD(DAY,1-DATEPART(WEEKDAY,[Date]),[Date]) AS WeekStarting
, DATEADD(DAY,7-DATEPART(WEEKDAY,[Date]),[Date]) AS WeekEnding
FROM (
SELECT CONVERT(DATETIME,'20100124') UNION ALL
SELECT CONVERT(DATETIME,'20100125') UNION ALL
SELECT CONVERT(DATETIME,'20100126') UNION ALL
SELECT CONVERT(DATETIME,'20100127') UNION ALL
SELECT CONVERT(DATETIME,'20100128') UNION ALL
SELECT CONVERT(DATETIME,'20100129') UNION ALL
SELECT CONVERT(DATETIME,'20100130') UNION ALL
SELECT CONVERT(DATETIME,'20100131') UNION ALL
SELECT CONVERT(DATETIME,'20100201') UNION ALL
SELECT CONVERT(DATETIME,'20100202') UNION ALL
SELECT CONVERT(DATETIME,'20100203') UNION ALL
SELECT CONVERT(DATETIME,'20100204') UNION ALL
SELECT CONVERT(DATETIME,'20100205') UNION ALL
SELECT CONVERT(DATETIME,'20100206')
) a ([Date])
Then, convert your week start or end date to a month:
SELECT *
, WeekStartingMonthStart = DATEADD(DAY,1-DAY(WeekStarting),WeekStarting)
, WeekStartingMonthEnd = DATEADD(DAY,-1,DATEADD(MONTH,1,DATEADD(DAY,1-DAY(WeekStarting),WeekStarting)))
, WeekEndingMonthStart = DATEADD(DAY,1-DAY(WeekEnding),WeekEnding)
, WeekEndingMonthEnd = DATEADD(DAY,-1,DATEADD(MONTH,1,DATEADD(DAY,1-DAY(WeekEnding),WeekEnding)))
FROM (
SELECT
[Date]
, DATENAME(WEEKDAY,[Date]) AS [DayOfWeek]
, DATEADD(DAY,1-DATEPART(WEEKDAY,[Date]),[Date]) AS WeekStarting
, DATEADD(DAY,7-DATEPART(WEEKDAY,[Date]),[Date]) AS WeekEnding
FROM (
SELECT CONVERT(DATETIME,'20100124') UNION ALL
SELECT CONVERT(DATETIME,'20100125') UNION ALL
SELECT CONVERT(DATETIME,'20100126') UNION ALL
SELECT CONVERT(DATETIME,'20100127') UNION ALL
SELECT CONVERT(DATETIME,'20100128') UNION ALL
SELECT CONVERT(DATETIME,'20100129') UNION ALL
SELECT CONVERT(DATETIME,'20100130') UNION ALL
SELECT CONVERT(DATETIME,'20100131') UNION ALL
SELECT CONVERT(DATETIME,'20100201') UNION ALL
SELECT CONVERT(DATETIME,'20100202') UNION ALL
SELECT CONVERT(DATETIME,'20100203') UNION ALL
SELECT CONVERT(DATETIME,'20100204') UNION ALL
SELECT CONVERT(DATETIME,'20100205') UNION ALL
SELECT CONVERT(DATETIME,'20100206')
) a ([Date])
) a
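The same date arithmetic (find the week's start and end, then the month each of them falls in) can be sketched in Python for comparison. This assumes weeks run Sunday to Saturday, matching the SET DATEFIRST 7 setting above; the helper names `week_bounds` and `month_start` are made up for the illustration.

```python
from datetime import date, timedelta


def week_bounds(d):
    """Return (week_start, week_end) for a Sunday-to-Saturday week containing d."""
    # date.weekday() is Monday=0 .. Sunday=6, so shift to Sunday=0.
    days_since_sunday = (d.weekday() + 1) % 7
    week_start = d - timedelta(days=days_since_sunday)
    week_end = week_start + timedelta(days=6)
    return week_start, week_end


def month_start(d):
    """First day of the month that d falls in."""
    return d.replace(day=1)


# 2010-01-31 is a Sunday, so its week runs into February.
start, end = week_bounds(date(2010, 1, 31))
print(start, end, month_start(start), month_start(end))
```

Grouping by `month_start(start)` assigns a week that straddles two months to the earlier one, and grouping by `month_start(end)` assigns it to the later one.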