NTILE FUNCTION IN EDB - postgresql

What would be the equivalent of the following query in an EDB database?
SELECT
salesman_id,
sales,
year,
NTILE(4) OVER(
PARTITION BY year
ORDER BY sales DESC
) quartile
FROM
salesman_performance
WHERE
year = 2016 OR year = 2017;
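NTILE is a standard SQL window function that PostgreSQL supports, and EDB Postgres Advanced Server is built on PostgreSQL, so the query above would be expected to run there unchanged (an assumption worth checking against your EDB version). To illustrate what the query computes, here is a minimal sketch using Python's built-in sqlite3 module, whose SQLite engine (3.25 or later) also implements NTILE. The sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salesman_performance (salesman_id INTEGER, sales INTEGER, year INTEGER)"
)
# Invented sample data: eight salesmen per year.
rows = [(i, 1000 * i, 2016) for i in range(1, 9)]
rows += [(i, 500 * i, 2017) for i in range(1, 9)]
conn.executemany("INSERT INTO salesman_performance VALUES (?, ?, ?)", rows)

# Same window definition as the question: four buckets per year, best sales first.
quartiles = conn.execute("""
    SELECT salesman_id, sales, year,
           NTILE(4) OVER (PARTITION BY year ORDER BY sales DESC) AS quartile
    FROM salesman_performance
    WHERE year IN (2016, 2017)
    ORDER BY year, sales DESC
""").fetchall()

for row in quartiles:
    print(row)
```

With eight rows per year, each quartile receives two salesmen, and the numbering restarts for each year because of PARTITION BY.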

Related

Postgresql: Find the average in millions, extract the year from (yyyy-mm-dd) date column, and use where

SELECT EXTRACT(year FROM date) as year from table,
AVG(employment_1000s)/1000 as avg_employment_millions FROM table
WHERE table.state = 'California'
AND table.industry = 'Gov'
AND year BETWEEN 2005 AND 2006;
ERROR: syntax error at or near "/"
LINE 2: AVG(employment_1000s)/1000 as avg_employment_millions FROM b...
                             ^
SQL state: 42601
Character: 72
How can I fix the error and write a valid query?
You have two from clauses and that's confusing the parser.
SELECT EXTRACT(year FROM date) as year from table,
                                       ^^^^^^^^^^
AVG(employment_1000s)/1000 as avg_employment_millions FROM table
                                                      ^^^^^^^^^^
WHERE table.state = 'California'
AND table.industry = 'Gov'
AND year BETWEEN 2005 AND 2006;
Remove the first one.
SELECT
EXTRACT(year FROM date) as year,
AVG(employment_1000s)/1000 as avg_employment_millions
FROM "table"
WHERE state = 'California'
AND industry = 'Gov'
AND year BETWEEN 2005 AND 2006;
However, that still won't work, for two reasons. First, you're using an aggregate function, avg, without a group by. Second, you can't reference a column alias defined in the select list, such as year, in a where clause.
You have to group by year (I assume), and repeat the extraction.
SELECT
EXTRACT(year FROM date) as year,
AVG(employment_1000s)/1000 as avg_employment_millions
FROM states
WHERE state = 'California'
AND industry = 'Gov'
AND extract(year from date) between 2005 and 2006
GROUP BY EXTRACT(year FROM date)
Or do the grouping in a CTE and then select from that.
with california_gov_years as (
SELECT
EXTRACT(year FROM date) as year,
AVG(employment_1000s)/1000 as avg_employment_millions
FROM states
WHERE state = 'California'
AND industry = 'Gov'
group by year
)
select *
from california_gov_years
where year between 2005 and 2006
Demonstration.
Note that year and date are SQL keywords. Avoid using them as column names, because they can cause confusion. They also don't describe what the date represents. Consider, for example, created_on instead.
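The grouped query can be tried out without a PostgreSQL server. The following is a minimal sketch using Python's built-in sqlite3 module. It makes two substitutions that are assumptions of this example: SQLite has no EXTRACT, so strftime('%Y', ...) stands in for it, and the date column is renamed to the hypothetical observed_on to avoid the keyword. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE states (
    state TEXT, industry TEXT, observed_on TEXT, employment_1000s REAL)""")
conn.executemany("INSERT INTO states VALUES (?, ?, ?, ?)", [
    ("California", "Gov", "2005-01-01", 2000.0),
    ("California", "Gov", "2005-07-01", 3000.0),
    ("California", "Gov", "2006-01-01", 4000.0),
    ("California", "Gov", "2007-01-01", 9000.0),  # outside the year range
    ("Nevada",     "Gov", "2005-01-01", 100.0),   # different state
])

# The year expression is repeated in WHERE and GROUP BY, mirroring the
# PostgreSQL answer, where the select-list alias cannot be used in WHERE.
result = conn.execute("""
    SELECT CAST(strftime('%Y', observed_on) AS INTEGER) AS obs_year,
           AVG(employment_1000s) / 1000 AS avg_employment_millions
    FROM states
    WHERE state = 'California'
      AND industry = 'Gov'
      AND CAST(strftime('%Y', observed_on) AS INTEGER) BETWEEN 2005 AND 2006
    GROUP BY CAST(strftime('%Y', observed_on) AS INTEGER)
    ORDER BY obs_year
""").fetchall()
print(result)
```

Each output row is one year with its average employment in millions. The 2007 and Nevada rows are filtered out before grouping.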

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application which shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count Year Month
100 2021 1
50 2021 2
75 2021 3
While this is okay on its own, I want it to display the running total over time, not just the number of events in each month, so the desired output should be:
Count Year Month
100 2021 1
150 2021 2
225 2021 3
I read in various places that I should use a window function via SQLAlchemy's over function. However, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, since adding event.id to the group by would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want, for each month, the number of events up to and including that month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single select statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month) AS total_so_far
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly_counts
but I'm not sure how to do that in SQLAlchemy.
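The two-step form (count per month, then a running sum over those counts) can be checked against the numbers in the question. This is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL or SQLAlchemy. Here strftime stands in for extract, which SQLite lacks, and the table is filled with invented rows that reproduce the 100 / 50 / 75 monthly counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, date TEXT)")

# Invented data: 100 events in January, 50 in February, 75 in March 2021.
monthly_counts = {"2021-01-15": 100, "2021-02-15": 50, "2021-03-15": 75}
for day, n in monthly_counts.items():
    conn.executemany("INSERT INTO events (date) VALUES (?)", [(day,)] * n)

# Inner query: events per (year, month). Outer query: running sum of those counts.
running = conn.execute("""
    SELECT year, month,
           SUM(events_count) OVER (ORDER BY year, month) AS total_so_far
    FROM (
        SELECT CAST(strftime('%Y', date) AS INTEGER) AS year,
               CAST(strftime('%m', date) AS INTEGER) AS month,
               COUNT(id) AS events_count
        FROM events
        WHERE date IS NOT NULL
        GROUP BY 1, 2
    ) AS monthly
    ORDER BY year, month
""").fetchall()
print(running)
```

The window has an ORDER BY but no PARTITION BY, so the sum accumulates across all months instead of restarting. Partitioning by the raw event date, as in the failing query, would put each date in its own window.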

How to Display Last rolling 12 months in correct month order

I have an SSRS report that shows customer sales for the year and I have been asked to change it to the last 13 rolling months. I have changed my where clause to be:
WHERE (#First12Months.FirstSaleDate BETWEEN DATEADD(MM,-13,#ReportDate) AND (#ReportDate))
( #ReportDate is the last day of the month that needs to be displayed on the right of the matrix.)
This where clause pulls the correct data, but it is still displayed in my MonthSort order. I need to change this to the last 12 months so that the newest month is on the right and the oldest month is on the left, and I cannot work out how to do the sort.
My old sort is MonthSort which gives each month a number where April is 1 through to March = 12:
CASE WHEN Month(#First12Months.FirstSaleDate)<=3 THEN MONTH(#First12Months.FirstSaleDate)+9 ELSE MONTH(#First12Months.FirstSaleDate)-3 END AS MonthSort
but of course this is now incorrect as I need the month from #ReportDate to be number 13 and each month before that chronologically to be 1 number less.
I found this post which is the only one that seems to come close to what I need but unfortunately I simply don't understand what it is saying.
Dynamic table/output each month for report
How do I tell the MonthSort column which number to allocate to the months to get the correct sort order for a rolling 13 months?
As your data is in rows and your SSRS displays it in columns you can do the following:
Add a sorting column to your SQL query that uses an analytic function to give the (dense) rank of the month. That rank can then be used as a sort criterion in SSRS.
Assuming your month column is called month, your query could look like this:
select t.*, dense_rank() over (order by month) rnk from t
That order could also be done descending like this:
select t.*, dense_rank() over (order by month desc) rnk from t
Let's have an example:
with t as (
select 2134 sales, cast('20190101' as date) month union all
select 3456 sales, cast('20190201' as date) month union all
select 234 sales, cast('20190301' as date) month union all
select 4567 sales, cast('20190401' as date) month union all
select 5678 sales, cast('20190501' as date) month union all
select 234 sales, cast('20190601' as date) month union all
select 756 sales, cast('20190701' as date) month union all
select 9 sales, cast('20190801' as date) month union all
select 24356134 sales, cast('20190901' as date) month union all
select 2456134 sales, cast('20191001' as date) month union all
select 234 sales, cast('20191101' as date) month union all
select 675 sales, cast('20191201' as date) month union all
select 86 sales, cast('20200101' as date) month union all
select 786 sales, cast('20200201' as date) month union all
select 715 sales, cast('20200301' as date) month union all
select 156 sales, cast('20200401' as date) month union all
select 123 sales, cast('20200501' as date) month union all
select 687 sales, cast('20200601' as date) month union all
select 45 sales, cast('20200701' as date) month
)
, t1 as (
select sales, month from t where t.month > dateadd(MONTH, -12, getdate())
)
select t1.*, DENSE_RANK() over (order by datefromparts(year(month), month([month]), 1)) rnk from t1
will return
I guess your first query retrieving the data is correct. Changing the order of the columns would have to be done in the SSRS report.
For sorting tablix (table elements in SSRS) have a look here
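The dense_rank idea can be seen on a few rows that cross a year boundary. The following is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. The table t and its rows are invented, and months are stored as ISO date strings so that text ordering matches chronological ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (sales INTEGER, month TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (675, "2019-12-01"),
    (86,  "2020-01-01"),
    (40,  "2020-01-01"),  # second row in the same month shares its rank
    (786, "2020-02-01"),
])

# Rows in the same month get the same rank, and ranks have no gaps.
ranked = conn.execute("""
    SELECT month, sales, DENSE_RANK() OVER (ORDER BY month) AS rnk
    FROM t
    ORDER BY month, sales
""").fetchall()

for row in ranked:
    print(row)
```

December 2019 receives rank 1 and January 2020 rank 2, which is the ordering a month-number sort alone would get wrong.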
Posting my own answer:
I have managed to work out a formula that calculates a position number for each column to replace my MonthSort column:
select case when MONTH(saledate) BETWEEN MONTH(dateadd(mm,-11,#ReportDate)) AND 12 THEN((MONTH(saledate)+1)-MONTH(#ReportDate)+11) ELSE ((month(saledate)+1)+month(#ReportDate)+11) end as position
from table
WHERE (saledate BETWEEN DATEADD(MM,-11,#ReportDate) AND (#ReportDate))
This doesn't quite give me what I wanted. I wanted 12 months but couldn't work out how to differentiate between the same month this year and the same month last year. For example, if the report date is 30/6/2020, then with a 12 month parameter it gives two June months (one each for 2019 and 2020) but places both of them in position 1, when June 2019 should be in position 1 and June 2020 should be in position 13. It works well with 11 months. If anyone can help with getting it to 12 months I would be grateful.
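One way to tell the two June months apart is to base the position on the number of whole calendar months between the sale date and the report date, taking the year into account, rather than on the month number alone. The sketch below shows the arithmetic in Python. The function name month_position and the window parameter are hypothetical names for this illustration, not part of the original report.

```python
from datetime import date

def month_position(sale_date, report_date, window=13):
    """Return 1 for the oldest month in the window and `window` for the report month."""
    # Calendar months between the two dates, counting years as 12 months each.
    months_back = (
        (report_date.year - sale_date.year) * 12
        + (report_date.month - sale_date.month)
    )
    return window - months_back

report = date(2020, 6, 30)
print(month_position(date(2019, 6, 1), report))   # oldest month in the window
print(month_position(date(2020, 1, 20), report))  # a month in between
print(month_position(date(2020, 6, 15), report))  # the report month
```

In T-SQL, the months_back value should correspond to what DATEDIFF(month, saledate, report date) returns, so the same expression could serve as the sort column. That mapping is an assumption to verify against your data.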
I am posting another answer of my own because it is a different way of achieving what I needed as well as the simplest - I was making this issue far more complicated than it needed to be.
In my second effort I added an extra column called SaleYear using: YEAR(SaleDate). I already had a column for MONTH(SaleDate) that I was using in a case when to achieve the Apr to March sort.
I restricted the data within the SQL query to the last 13 months using a where clause:
WHERE (SaleDate BETWEEN DATEADD(MM,-13,#ReportDate) AND DATEADD(minute, - 1, #ReportDate + 1))
And in the SSRS report I added 'Year' as a parent column group to the 'Month' column group. In the column group I added sorting by Year and then by month.
Because I had already restricted the data in the sql query to the last 13 months I have the last 13 rolling months in the correct order.
This is the cleanest and simplest answer.

Redshift - Find average sales by month

I have the query below, which finds the count of sales by month.
select to_char(sale_date,'Mon') as mon,
count (*) as "Sales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1
I am trying to find the average number of sales per month. How could I modify the above query to get this output? I am using Redshift.
You may query a CTE and take the average:
WITH cte AS (
select to_char(sale_date,'Mon') as mon, count (*) as "Sales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1
)
SELECT AVG(Sales)
FROM cte;
Note that ideally you should be grouping by year and month, because a given month can belong to more than one year. If you wanted to keep your current query, but include the average over all months, then you could try:
select
to_char(sale_date,'Mon') as mon,
extract(year from sale_date) as year,
count (*) as "Sales",
avg(count(*)) over () "AvgSales"
from sales
where to_char(sale_date,'yyyy-mm-dd') between '2018-10-01' and '2018-12-01'
group by 1, 2;
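The CTE approach (count per month, then average the counts) can be checked on a small data set. This is a minimal sketch using Python's built-in sqlite3 module rather than Redshift. Here strftime('%Y-%m', ...) stands in for to_char and groups by year and month together, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT)")

# Invented data: 4 sales in October, 6 in November, 2 on 1 December 2018.
sample = ["2018-10-05"] * 4 + ["2018-11-10"] * 6 + ["2018-12-01"] * 2
conn.executemany("INSERT INTO sales VALUES (?)", [(d,) for d in sample])

# Step 1 (CTE): one row per year-month with its sales count.
# Step 2: average those monthly counts.
avg_sales = conn.execute("""
    WITH monthly AS (
        SELECT strftime('%Y-%m', sale_date) AS year_month,
               COUNT(*) AS sales_count
        FROM sales
        WHERE sale_date BETWEEN '2018-10-01' AND '2018-12-01'
        GROUP BY 1
    )
    SELECT AVG(sales_count) FROM monthly
""").fetchone()[0]
print(avg_sales)
```

The three monthly counts are 4, 6 and 2, so the average is taken over three rows, not over the twelve underlying sales.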

Insert subquery date according to day

I would like to insert, via a subquery, a date chosen according to its day of the week. In addition, each date can only be used four times. Once a date has been used four times, the fifth row should use the next date that falls on the same day, in other words the same weekday of the following week. For example, move from Monday 6 JUNE 2016 to Monday 13 JUNE 2016 (you may check the calendar).
I have a query that gets a list of dates based on presentationdatestart and presentationdateend from the presentation table:
select a.presentationid,
a.presentationday,
to_char (a.presentationdatestart + delta, 'DD-MM-YYYY', 'NLS_CALENDAR=GREGORIAN') list_date
from presentation a,
(select level - 1 as delta
from dual
connect by level - 1 <= (select max (presentationdateend - presentationdatestart)
from presentation))
where a.presentationdatestart + delta <= a.presentationdateend
and a.presentationday = to_char(a.presentationdatestart + delta, 'fmDay')
order by a.presentationdatestart + delta,
a.presentationid; --IMPORTANT!!!--
For example,
presentationday presentationdatestart presentationdateend
Monday 01-05-2016 04-06-2016
Tuesday 01-05-2016 04-06-2016
Wednesday 01-05-2016 04-06-2016
Thursday 01-05-2016 04-06-2016
The query result will list all possible dates between 01-05-2016 until 04-06-2016:
Monday 02-05-2016
Tuesday 03-05-2016
Wednesday 04-05-2016
Thursday 05-05-2016
....
Monday 30-05-2016
Tuesday 31-05-2016
Wednesday 01-06-2016
Thursday 02-06-2016 (20 rows)
This is my INSERT query :
insert into CSP600_SCHEDULE (studentID,
studentName,
projectTitle,
supervisorID,
supervisorName,
examinerID,
examinerName,
exavailableID,
availableday,
availablestart,
availableend,
availabledate)
select '2013816591',
'mong',
'abc',
'1004',
'Sue',
'1002',
'hazlifah',
2,
'Monday', -- BASED ON THIS DAY
'12:00:00',
'2:00:00',
to_char (a.presentationdatestart + delta, 'DD-MM-YYYY', 'NLS_CALENDAR=GREGORIAN') list_date -- FOR AVAILABLEDATE
from presentation a,
(select level - 1 as delta
from dual
connect by level - 1 <= (select max (presentationdateend - presentationdatestart)
from presentation))
where a.presentationdatestart + delta <= a.presentationdateend
and a.presentationday = to_char(a.presentationdatestart + delta, 'fmDay')
order by a.presentationdatestart + delta,
a.presentationid;
This query successfully added 20 rows, because there were 20 possible dates. I would like to modify the query to insert based on availableDay, with each date used at most four times, each time for a different studentID.
Possible outcome in CSP600_SCHEDULE (I am removing unrelated columns to ease readability):
StudentID StudentName availableDay availableDate
2013 abc Monday 01-05-2016
2014 def Monday 01-05-2016
2015 ghi Monday 01-05-2016
2016 klm Monday 01-05-2016
2010 nop Tuesday 02-05-2016
2017 qrs Tuesday 02-05-2016
2018 tuv Tuesday 02-05-2016
2019 wxy Tuesday 02-05-2016
.....
2039 rrr Monday 09-05-2016
.....
You may check the calendar :)
I think what you're asking for is to list your students and then batch them up in groups of 4 - each batch is then allocated to a date. Is that right?
In which case something like this should work (I'm using a list of tables as the student names just so I don't need to insert any data into a custom table) :
WITH students AS
(SELECT table_name
FROM all_tables
WHERE rownum < 100
)
SELECT
table_name
,SYSDATE + (CEIL(rownum/4) -1)
FROM
students
;
I hope that helps you
...okay, following your comments, I think this might be a better solution :
WITH students AS
(SELECT table_name student_name
FROM all_tables
WHERE rownum < 100
)
, dates AS
(SELECT TRUNC(sysdate) appointment_date from dual UNION
SELECT TRUNC(sysdate+2) from dual UNION
SELECT TRUNC(sysdate+4) from dual UNION
SELECT TRUNC(sysdate+6) from dual UNION
SELECT TRUNC(sysdate+8) from dual UNION
SELECT TRUNC(sysdate+10) from dual UNION
SELECT TRUNC(sysdate+12) from dual UNION
SELECT TRUNC(sysdate+14) from dual
)
SELECT
s.student_name
,d.appointment_date
FROM
--get a list of students each with a sequential row number, ordered by student name
(SELECT
student_name
,ROW_NUMBER() OVER (ORDER BY student_name) rn
FROM students
) s
--get a list of available dates with a sequential row number, ordered by date
,(SELECT
appointment_date
,ROW_NUMBER() OVER (ORDER BY appointment_date) rn
FROM dates
) d
WHERE 1=1
--allocate the first four students to date rownumber1, next four students to date rownumber 2...
AND CEIL(s.rn/4) = d.rn
;
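The batching rule in the query above (students 1 to 4 go to the first date, 5 to 8 to the second, and so on) can be sketched outside the database to check the logic. The following Python example combines it with the asker's requirement that dates fall on a given weekday within the presentation range. The function names and the short student list are hypothetical, chosen only for this illustration.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def dates_on_weekday(start, end, weekday_name):
    """List every date from start to end (inclusive) that falls on weekday_name."""
    target = WEEKDAYS.index(weekday_name)  # date.weekday() uses Monday == 0
    span = (end - start).days
    all_days = (start + timedelta(days=offset) for offset in range(span + 1))
    return [d for d in all_days if d.weekday() == target]

def assign_dates(student_ids, available_dates, per_date=4):
    """Pair each student with a date, moving to the next date after per_date uses."""
    # Zero-based equivalent of CEIL(s.rn / 4) = d.rn in the SQL above.
    # Raises IndexError if there are more students than available slots.
    return [(sid, available_dates[i // per_date])
            for i, sid in enumerate(student_ids)]

mondays = dates_on_weekday(date(2016, 5, 1), date(2016, 6, 4), "Monday")
schedule = assign_dates(["2013", "2014", "2015", "2016", "2010", "2017"], mondays)
for student_id, slot in schedule:
    print(student_id, slot.strftime("%d-%m-%Y"))
```

The first four students share the first Monday in the range, and the fifth and sixth move on to the Monday of the following week.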