CASE expressions with MAX aggregate functions Oracle - oracle10g

Using Oracle, I have selected each title_id with its associated month of publication:
SELECT title_id,
CASE EXTRACT(month FROM pubdate)
WHEN 1 THEN 'Jan'
WHEN 2 THEN 'Feb'
WHEN 3 THEN 'Mar'
WHEN 4 THEN 'Apr'
WHEN 5 THEN 'May'
WHEN 6 THEN 'Jun'
WHEN 7 THEN 'Jul'
WHEN 8 THEN 'Aug'
WHEN 9 THEN 'Sep'
WHEN 10 THEN 'Oct'
WHEN 11 THEN 'Nov'
ELSE 'Dec'
END MONTH
FROM TITLES;
Using the statement:
SELECT MAX(Most_Titles)
FROM (SELECT count(title_id) Most_Titles, month
FROM (SELECT title_id, extract(month FROM pubdate) AS MONTH FROM titles) GROUP BY month);
I was able to determine the month with the maximum number of books published.
Is there a way to join the two statements so that I can associate the month's text equivalent with the maximum number of titles?

To convert a month to a string, I wouldn't use a CASE expression; I'd just use TO_CHAR. You can then use analytic functions to rank the results and get the month with the most books published.
SELECT num_titles,
to_char( publication_month, 'Mon' ) month_str
FROM (SELECT count(title_id) num_titles,
trunc(pubdate, 'MM') publication_month,
rank() over (order by count(title_id) desc) rnk
FROM titles
GROUP BY trunc(pubdate, 'MM'))
WHERE rnk = 1
A couple of additional caveats:
If there are two months that are tied with the most publications, this query will return both rows. If you want Oracle to arbitrarily pick one, you can use the row_number analytic function rather than rank.
If the PUBDATE column in your table only has dates of midnight on the first of the month where the book is published, you can eliminate the trunc on the PUBDATE column.
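The difference between rank and row_number on a tie can be seen in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption; it needs SQLite 3.25 or later for window functions), replaces trunc(pubdate, 'MM') with strftime, and runs on made-up titles rows, so treat it as an illustration rather than the Oracle query itself.

```python
import calendar
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title_id TEXT, pubdate TEXT)")
# Hypothetical sample rows: June 1991 and October 1991 tie with two titles each.
conn.executemany("INSERT INTO titles VALUES (?, ?)", [
    ("T1", "1991-06-12"), ("T2", "1991-06-30"),
    ("T3", "1991-10-05"), ("T4", "1991-10-21"),
    ("T5", "1994-01-02"),
])


def busiest_months(conn, keep_ties=True):
    """Return (month abbreviation, number of titles) for the busiest month(s)."""
    # RANK keeps every tied month; ROW_NUMBER keeps exactly one, chosen arbitrarily.
    ranking = "RANK" if keep_ties else "ROW_NUMBER"
    rows = conn.execute(f"""
        SELECT month_start, num_titles
        FROM (SELECT month_start, num_titles,
                     {ranking}() OVER (ORDER BY num_titles DESC) AS rnk
              FROM (SELECT strftime('%Y-%m-01', pubdate) AS month_start,
                           COUNT(title_id) AS num_titles
                    FROM titles
                    GROUP BY month_start))
        WHERE rnk = 1
        ORDER BY month_start
    """).fetchall()
    return [(calendar.month_abbr[int(m[5:7])], n) for m, n in rows]


print(busiest_months(conn))                   # both tied months
print(busiest_months(conn, keep_ties=False))  # a single month
```

Like the trunc-based query, this groups by year and month together, so June 1991 and June 1992 would count as different months.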


Mixing DISTINCT with GROUP_BY Postgres

I am trying to get a list of:
all months in a specified year that,
have at least 2 unique rows based on their date
and ignore specific column values
where I got to is:
SELECT DATE_PART('month', "orderDate") AS month, count(*)
FROM public."Orders"
WHERE "companyId" = 00001 AND "orderNumber" != 1 and DATE_PART('year', ("orderDate")) = '2020' AND "orderNumber" != NULL
GROUP BY month
HAVING COUNT ("orderDate") > 2
The HAVING COUNT condition sort of works in place of DISTINCT, insofar as I can be reasonably sure that it filters the data down to what is required.
However, being able to use DISTINCT based on a given date within a month would return a more reliable result. Is this possible with Postgres?
A sample line of data from the table:
Sample Input
"2018-12-17 20:32:00+00"
"2019-02-26 14:38:00+00"
"2020-07-26 10:19:00+00"
"2020-10-13 19:15:00+00"
"2020-10-26 16:42:00+00"
"2020-10-26 19:41:00+00"
"2020-11-19 20:21:00+00"
"2020-11-19 21:22:00+00"
"2020-11-23 21:10:00+00"
"2021-01-02 12:51:00+00"
Without the HAVING COUNT condition, this produces:
month  count
7      1
10     2
11     3
Month 7 can be discarded easily as only 1 record.
Month 10 is the issue: we have two records. But from the data above, those records are from the same day. Similarly, month 11 only has 2 distinct records by day.
The output should therefore be ideally:
month  count
11     2
We have only two distinct dates from the 2020 data, and they are from month 11 (November)
I think you just want to take the distinct count of dates for each month:
SELECT
DATE_PART('month', orderDate) AS month,
COUNT(DISTINCT orderDate::date) AS count
FROM Orders
WHERE
companyId = 1 AND
orderNumber != 1 AND
DATE_PART('year', orderDate) = '2020'
GROUP BY
DATE_PART('month', orderDate)
HAVING
COUNT(DISTINCT orderDate::date) >= 2;
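To check the distinct-day count against the sample timestamps, here is a minimal sketch using Python's built-in sqlite3 module in place of Postgres (an assumption). The order_number values are made up: the question only shows dates, so order_number = 1 on the 2020-10-13 row stands in for whatever the real WHERE clause filters out, and the companyId filter is not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number INTEGER, order_date TEXT)")
# Timestamps from the question, stored as UTC text. Order numbers are hypothetical.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (2, "2018-12-17 20:32:00"), (2, "2019-02-26 14:38:00"),
    (2, "2020-07-26 10:19:00"), (1, "2020-10-13 19:15:00"),
    (2, "2020-10-26 16:42:00"), (3, "2020-10-26 19:41:00"),
    (2, "2020-11-19 20:21:00"), (3, "2020-11-19 21:22:00"),
    (4, "2020-11-23 21:10:00"), (2, "2021-01-02 12:51:00"),
])

# Count distinct calendar days per month and keep months with at least two.
distinct_days = conn.execute("""
    SELECT CAST(strftime('%m', order_date) AS INTEGER) AS month,
           COUNT(DISTINCT date(order_date)) AS day_count
    FROM orders
    WHERE order_number != 1
      AND strftime('%Y', order_date) = '2020'
    GROUP BY month
    HAVING COUNT(DISTINCT date(order_date)) >= 2
    ORDER BY month
""").fetchall()
print(distinct_days)
```

With these assumed rows, month 10 drops out because its two remaining orders fall on the same day, and only month 11 (two distinct days) is returned.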

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application which shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count  Year  Month
100    2021  1
50     2021  2
75     2021  3
While this is okay on its own, I want it to display the running total over time, not just the number of events that month, so the desired output should be:
Count  Year  Month
100    2021  1
150    2021  2
225    2021  3
I read in various places that I should use a window function via SQLAlchemy's over function. However, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, as adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want the number of events up to and including that month, for each month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that within a single SELECT statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly_counts
but I'm not sure how to do that in SQLAlchemy.
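The two-step version can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL (an assumption; window functions need SQLite 3.25 or later), with strftime standing in for extract, a hypothetical event_date column, and generated rows that reproduce the 100 / 50 / 75 monthly counts from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, event_date TEXT)")
# Made-up rows: 100 events in January, 50 in February, 75 in March 2021.
monthly = {"2021-01": 100, "2021-02": 50, "2021-03": 75}
conn.executemany(
    "INSERT INTO events (event_date) VALUES (?)",
    [(f"{ym}-15",) for ym, n in monthly.items() for _ in range(n)],
)

# Inner query: events per month. Outer query: running sum of those counts.
running_totals = conn.execute("""
    SELECT year, month,
           SUM(events_count) OVER (ORDER BY year, month) AS total_so_far
    FROM (SELECT CAST(strftime('%Y', event_date) AS INTEGER) AS year,
                 CAST(strftime('%m', event_date) AS INTEGER) AS month,
                 COUNT(id) AS events_count
          FROM events
          GROUP BY year, month)
    ORDER BY year, month
""").fetchall()
print(running_totals)
```

The window has an ORDER BY but no PARTITION BY, which is what makes the sum accumulate across all months instead of restarting for each date.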

How to Display Last rolling 12 months in correct month order

I have an SSRS report that shows customer sales for the year and I have been asked to change it to the last 13 rolling months. I have changed my where clause to be:
WHERE (#First12Months.FirstSaleDate BETWEEN DATEADD(MM,-13,#ReportDate) AND (#ReportDate))
( #ReportDate is the last day of the month that needs to be displayed on the right of the matrix.)
This WHERE clause pulls the correct data, but the columns are still displayed in my MonthSort order. I need to change this to the last 12 months, with the newest month on the right and the oldest month on the left, and I cannot work out how to do the sort.
My old sort is MonthSort which gives each month a number where April is 1 through to March = 12:
CASE WHEN Month(#First12Months.FirstSaleDate)<=3 THEN MONTH(#First12Months.FirstSaleDate)+9 ELSE MONTH(#First12Months.FirstSaleDate)-3 END AS MonthSort
but of course this is now incorrect as I need the month from #ReportDate to be number 13 and each month before that chronologically to be 1 number less.
I found this post which is the only one that seems to come close to what I need but unfortunately I simply don't understand what it is saying.
Dynamic table/output each month for report
How do I tell the MonthSort column which number to allocate to the months to get the correct sort order for a rolling 13 months?
As your data is in rows and your SSRS displays it in columns you can do the following:
Add a sorting column to your sql query that uses an analytical function in order to give the (dense) rank of the month. That rank can then be used as a sorting criteria in SSRS.
Assuming your month column is called month, your query could look like this:
select t.*, dense_rank() over (order by month) rnk from t
That order could also be done descending like this:
select t.*, dense_rank() over (order by month desc) rnk from t
Let's have an example:
with t as (
select 2134 sales, cast('20190101' as date) month union all
select 3456 sales, cast('20190201' as date) month union all
select 234 sales, cast('20190301' as date) month union all
select 4567 sales, cast('20190401' as date) month union all
select 5678 sales, cast('20190501' as date) month union all
select 234 sales, cast('20190601' as date) month union all
select 756 sales, cast('20190701' as date) month union all
select 9 sales, cast('20190801' as date) month union all
select 24356134 sales, cast('20190901' as date) month union all
select 2456134 sales, cast('20191001' as date) month union all
select 234 sales, cast('20191101' as date) month union all
select 675 sales, cast('20191201' as date) month union all
select 86 sales, cast('20200101' as date) month union all
select 786 sales, cast('20200201' as date) month union all
select 715 sales, cast('20200301' as date) month union all
select 156 sales, cast('20200401' as date) month union all
select 123 sales, cast('20200501' as date) month union all
select 687 sales, cast('20200601' as date) month union all
select 45 sales, cast('20200701' as date) month
)
, t1 as (
select sales, month from t where t.month > dateadd(MONTH, -12, getdate())
)
select t1.*, DENSE_RANK() over (order by datefromparts(year(month), month([month]), 1)) rnk from t1
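This returns the rows of t1 with an extra rnk column that numbers the months in chronological order, starting at 1.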
I guess, your first query retrieving the data is correct. Changing the columns' order would have to be done in the SSRS report.
For sorting tablix (table elements in SSRS) have a look here
Posting my own answer:
I have managed to work out a formula that calculates a position number for each column to replace my MonthSort column:
select case when MONTH(saledate) BETWEEN MONTH(dateadd(mm,-11,#ReportDate)) AND 12 THEN((MONTH(saledate)+1)-MONTH(#ReportDate)+11) ELSE ((month(saledate)+1)+month(#ReportDate)+11) end as position
from table
WHERE (saledate BETWEEN DATEADD(MM,-11,#ReportDate) AND (#ReportDate))
This doesn't quite give me what I wanted. I wanted 12 months, but I couldn't work out how to differentiate between the same month this year and last year. For example, if the report date is 30/6/2020, a 12-month parameter gives two June months (one each for 2019 and 2020) but places both of them in position 1, when June 2019 should be in position 1 and June 2020 in position 13. It works well with 11 months. If anyone can help with getting it to 12 months, I would be grateful.
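One way to tell June 2019 from June 2020 is to count whole months between the sale date and the report date using the year as well as the month. The sketch below shows the arithmetic in Python; rolling_position and its window parameter are hypothetical names, not part of the report, and the same expression could be ported to T-SQL with YEAR() and MONTH().

```python
from datetime import date


def rolling_position(sale_date, report_date, window=13):
    """Column position in a rolling window; the report month gets position `window`."""
    # Whole calendar months between the sale month and the report month.
    months_back = ((report_date.year - sale_date.year) * 12
                   + (report_date.month - sale_date.month))
    if not 0 <= months_back < window:
        return None  # outside the rolling window
    return window - months_back


report = date(2020, 6, 30)
print(rolling_position(date(2019, 6, 10), report))  # June 2019
print(rolling_position(date(2020, 6, 15), report))  # June 2020
```

Because the year enters the calculation, the two June months land at opposite ends of the window instead of sharing a position.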
I am posting another answer of my own because it is a different way of achieving what I needed as well as the simplest - I was making this issue far more complicated than it needed to be.
In my second effort I added an extra column called SaleYear using: YEAR(SaleDate). I already had a column for MONTH(SaleDate) that I was using in a case when to achieve the Apr to March sort.
I restricted the data within the SQL query to the last 13 months using a where clause:
WHERE (SaleDate BETWEEN DATEADD(MM,-13,#ReportDate) AND DATEADD(minute, - 1, #ReportDate + 1))
And in the SSRS report I added 'Year' as a parent column group to the 'Month' column group. In the column group I added sorting by Year and then by month.
Because I had already restricted the data in the sql query to the last 13 months I have the last 13 rolling months in the correct order.
This is the cleanest and simplest answer.

Postgres return multiple identical results based on the number of years

I have a Postgres query that contains subqueries and outputs only 12 rows of data, exactly how I'd like it to look. Each row represents a monthly average of the data over all the years in the database, one row for each month.
The query:
SELECT
to_char(to_timestamp(to_char(subquery2.month, '999'), 'MM'), 'Mon') AS Month,
subquery2.month AS Month_num,
round(AVG(subquery2.all_use_tot), 2) AS Average_Withdrawals,
round(AVG(subquery2.all_cu_tot), 2) AS Average_Consumptive_Use
FROM
(SELECT
subquery1.month,
subquery1.year,
sum(subquery1.liv_tot) AS all_use_tot,
sum(subquery1.CU_total) AS all_cu_tot
FROM
(SELECT
EXTRACT('Month' FROM withdrawals.begintime) AS month,
EXTRACT(YEAR FROM withdrawals.begintime)::int AS year,
tdx_ic_use_type.use_type_code AS Use_type,
sum(withdrawals.withdrawalvalue_mgd) AS liv_tot,
(tdx_ic_use_type.cufactor)*(sum(withdrawals.withdrawalvalue_mgd)) AS CU_total
FROM
medford.tdx_ic_use_type,
medford.withdrawalsites
JOIN medford.subshed2 ON ST_Contains(medford.subshed2.the_geom, medford.withdrawalsites.the_geom)
JOIN medford.withdrawals ON medford.withdrawals.ic_site_id = medford.withdrawalsites.ic_site_id
WHERE
subshed2.id = '${param.shed_id}' AND
withdrawalsites.ic_pr_use_type_id = tdx_ic_use_type.use_type_code AND
withdrawals.intervalcodes_id = 2
GROUP BY
year,
extract('Month' from withdrawals.begintime),
tdx_ic_use_type.cufactor,
Use_type
ORDER BY
year, extract('Month' from withdrawals.begintime) ASC
) AS subquery1
GROUP BY
subquery1.year,
subquery1.month
) AS subquery2
GROUP BY
subquery2.month
ORDER BY
subquery2.month
What the output looks like:
"Jan" 1 1426.50 472.65
"Feb" 2 1449.00 482.10
"Mar" 3 1459.50 485.55
"Apr" 4 1470.00 489.00
"May" 5 1480.50 492.45
"Jun" 6 1491.00 495.90
"Jul" 7 1489.50 493.35
"Aug" 8 1512.00 502.80
"Sep" 9 1510.50 500.25
"Oct" 10 1533.00 509.70
"Nov" 11 1543.50 513.15
"Dec" 12 1542.00 510.60
The month column I've extracted from a datetime column in the database. I'm using the results of this query and storing it in an array, but I'd like to repeat these exact results once for each year that exists in the table. So, if there are 2 years of data in the database, there should be 24 output rows, or, 12 of the same rows repeated twice. If there are 3 years of data in the database, there should be 36 output rows, or 12 of the same rows repeated 3 times.
How do I accomplish this in my query? Is there a way to loop a query based on the values in a column (i.e. the number of years present in a datetime field)?
You should change group by month to group by year, month in your query.
(But this will give you only results where there are data for the given month.)
For iterating through months:
select a::date
from generate_series(
'2011-06-01'::date,
'2012-06-01',
'1 month'
) s(a)
(based on this: Writing a function in SQL to loop through a date range in a UDF)
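For readers who want the same month series outside the database, here is a rough Python equivalent of that generate_series call. month_starts is a hypothetical helper; it assumes both bounds are meant as whole months and, like generate_series, includes both ends.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of every month from start's month to end's month, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        # Step to the next month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


series = list(month_starts(date(2011, 6, 1), date(2012, 6, 1)))
print(len(series), series[0], series[-1])
```

Joining a series like this against the grouped results is one way to get a row for every month, including months that have no data.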

deriving calendar month from week number

I've had a hunt around for something similar to this but can't find anything.
I have a query that provides the number of transactions that have occurred each day and need to group by year, month, week BUT of course some months span multiple week numbers, eg. Sept. & Oct. 2009.
Take for example week 39 last year (September & October). Thursday is the 1st October therefore 4 days of that week fall in Oct., therefore the volume of transactions for the last 3 days of Sept. should be added to the first week of October's totals? Clear?
For example:
VOLUME     TRANSACTION  YEAR  MONTH      WEEK
1264.1730  53           2009  September  37
2739.7200  109          2009  September  38
522.5500   21           2009  October    39
1196.6450  51           2009  September  39
2827.9550  113          2009  October    40
2730.4050  110          2009  October    41
3763.7200  154          2009  October    42
3425.6250  137          2009  October    43
3551.8100  143          2009  November   44
2788.0150  113          2009  November   45
The problem is that the calendar is awkward, and there's not much you can do about it. As far as I can see, you have three choices:
1. Group by year and month. Display the week or weeks in the result but don't group by them.
2. Group by year and weeks. Display the month or months in the result but don't group by them.
3. Group by year, month, week, and accept that some of the groups contain less than one week's data. (i.e. what you have now)
From your description it seems like you want option 2:
SELECT year, MIN(month), week, SUM(transaction)
FROM Table1
GROUP BY year, week
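Here is a quick check of that query against the sample rows from the question, sketched with Python's built-in sqlite3 module rather than the asker's database (an assumption). The column is renamed to transactions because TRANSACTION is a keyword in SQLite, and the VOLUME column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (transactions INTEGER, year INTEGER, month TEXT, week INTEGER)"
)
# Rows copied from the question (VOLUME omitted).
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (53, 2009, "September", 37), (109, 2009, "September", 38),
    (21, 2009, "October", 39), (51, 2009, "September", 39),
    (113, 2009, "October", 40), (110, 2009, "October", 41),
    (154, 2009, "October", 42), (137, 2009, "October", 43),
    (143, 2009, "November", 44), (113, 2009, "November", 45),
])

# One row per (year, week); the split week 39 is merged into a single total.
weekly = conn.execute("""
    SELECT year, MIN(month), week, SUM(transactions)
    FROM table1
    GROUP BY year, week
    ORDER BY year, week
""").fetchall()
for row in weekly:
    print(row)
```

One thing to watch: MIN on a month name compares text, so week 39 is labelled October only because "October" sorts before "September" alphabetically. If the month is available as a number or a date, taking MIN or MAX of that would make the choice explicit.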
Something like this would do:
-- For weeks starting Sunday and ending Saturday, the US default:
SET DATEFIRST 7
-- Alternatively, for weeks starting Saturday and ending Friday:
--SET DATEFIRST 6
SELECT
[Date]
, DATENAME(WEEKDAY,[Date]) AS [DayOfWeek]
, DATEADD(DAY,1-DATEPART(WEEKDAY,[Date]),[Date]) AS WeekStarting
, DATEADD(DAY,7-DATEPART(WEEKDAY,[Date]),[Date]) AS WeekEnding
FROM (
SELECT CONVERT(DATETIME,'20100124') UNION ALL
SELECT CONVERT(DATETIME,'20100125') UNION ALL
SELECT CONVERT(DATETIME,'20100126') UNION ALL
SELECT CONVERT(DATETIME,'20100127') UNION ALL
SELECT CONVERT(DATETIME,'20100128') UNION ALL
SELECT CONVERT(DATETIME,'20100129') UNION ALL
SELECT CONVERT(DATETIME,'20100130') UNION ALL
SELECT CONVERT(DATETIME,'20100131') UNION ALL
SELECT CONVERT(DATETIME,'20100201') UNION ALL
SELECT CONVERT(DATETIME,'20100202') UNION ALL
SELECT CONVERT(DATETIME,'20100203') UNION ALL
SELECT CONVERT(DATETIME,'20100204') UNION ALL
SELECT CONVERT(DATETIME,'20100205') UNION ALL
SELECT CONVERT(DATETIME,'20100206')
) a ([Date])
Then, convert your week start or end date to a month:
SELECT *
, WeekStartingMonthStart = DATEADD(DAY,1-DAY(WeekStarting),WeekStarting)
, WeekStartingMonthEnd = DATEADD(DAY,-1,DATEADD(MONTH,1,DATEADD(DAY,1-DAY(WeekStarting),WeekStarting)))
, WeekEndingMonthStart = DATEADD(DAY,1-DAY(WeekEnding),WeekEnding)
, WeekEndingMonthEnd = DATEADD(DAY,-1,DATEADD(MONTH,1,DATEADD(DAY,1-DAY(WeekEnding),WeekEnding)))
FROM (
SELECT
[Date]
, DATENAME(WEEKDAY,[Date]) AS [DayOfWeek]
, DATEADD(DAY,1-DATEPART(WEEKDAY,[Date]),[Date]) AS WeekStarting
, DATEADD(DAY,7-DATEPART(WEEKDAY,[Date]),[Date]) AS WeekEnding
FROM (
SELECT CONVERT(DATETIME,'20100124') UNION ALL
SELECT CONVERT(DATETIME,'20100125') UNION ALL
SELECT CONVERT(DATETIME,'20100126') UNION ALL
SELECT CONVERT(DATETIME,'20100127') UNION ALL
SELECT CONVERT(DATETIME,'20100128') UNION ALL
SELECT CONVERT(DATETIME,'20100129') UNION ALL
SELECT CONVERT(DATETIME,'20100130') UNION ALL
SELECT CONVERT(DATETIME,'20100131') UNION ALL
SELECT CONVERT(DATETIME,'20100201') UNION ALL
SELECT CONVERT(DATETIME,'20100202') UNION ALL
SELECT CONVERT(DATETIME,'20100203') UNION ALL
SELECT CONVERT(DATETIME,'20100204') UNION ALL
SELECT CONVERT(DATETIME,'20100205') UNION ALL
SELECT CONVERT(DATETIME,'20100206')
) a ([Date])
) a
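The same week-boundary arithmetic can be sketched in Python, which also makes it easy to try the "month that owns most of the week" rule from the question. week_bounds and majority_month are hypothetical helpers, and week_starts_on uses Python's weekday numbering (Monday = 0, Sunday = 6) rather than DATEFIRST values.

```python
from datetime import date, timedelta


def week_bounds(d, week_starts_on=6):
    """Return (first day, last day) of the week containing d. Default: Sunday to Saturday."""
    offset = (d.weekday() - week_starts_on) % 7  # days since the week started
    start = d - timedelta(days=offset)
    return start, start + timedelta(days=6)


def majority_month(d, week_starts_on=6):
    """Return (year, month) holding at least 4 of the 7 days of d's week."""
    start, _ = week_bounds(d, week_starts_on)
    # The 4th day of a 7-day week always lies in the month that owns the majority.
    middle = start + timedelta(days=3)
    return middle.year, middle.month


print(week_bounds(date(2010, 1, 27)))                       # Sunday-based week
print(majority_month(date(2009, 10, 1), week_starts_on=0))  # Monday-based week
print(majority_month(date(2009, 10, 1)))                    # Sunday-based week
```

The result depends on the week convention: with Monday-based weeks, the week of Thursday 1 October 2009 has four days in October, while with Sunday-based weeks four of its days fall in September.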