MAX(COUNT(...)) - ERROR: calls to aggregate functions cannot be nested - postgresql

I have two tables - ticket(..,event_name,..) and event(..,name,..).
Task:
Write a query that displays the name of the event and the number of tickets for the event for which the largest number of tickets was sold.
The code below gets an error: ERROR: calls to aggregate functions cannot be nested
SELECT name, COUNT(event_name) as ticket_count
FROM event
INNER JOIN ticket
ON event.name = ticket.event_name
GROUP BY name
HAVING COUNT(event_name) = MAX(COUNT(event_name));
I know I can't use MAX(COUNT()), but what should I write instead to get similar logic?
I only have the hint from my lecturer :)
COUNT(...)= (SELECT MAX(...) FROM (...))

You will have to rank the events based on the count, e.g. using the window function dense_rank().
select event_name, ticket_count
from (
  select event_name,
         count(*) as ticket_count,
         dense_rank() over (order by count(*) desc) as rnk
  from ticket
  group by event_name
) t
where rnk = 1;
The window function dense_rank() is applied after the group by and calculates the rank based on the number of tickets for each event. Because of the order by ... DESC, the event with the highest number of tickets gets a rank of 1.
If two events share the same highest number of tickets, both will be listed. If you don't want that, use row_number() instead of dense_rank(); which of the tied events is then returned is arbitrary unless you add a tie-breaker to the order by.
Note that I also removed the event table from the query as it is not needed for this.
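To make the tie behaviour concrete, here is a minimal sketch that runs the query above against a small in-memory database. It uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained; window functions need SQLite 3.25 or newer), and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: events "A" and "B" tie with two tickets each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, event_name TEXT)")
conn.executemany(
    "INSERT INTO ticket (event_name) VALUES (?)",
    [("A",), ("A",), ("B",), ("B",), ("C",)],
)


def top_events(rank_function):
    """Run the answer's query with the given ranking function.

    rank_function is only ever one of two fixed names below, never user input.
    """
    query = f"""
        SELECT event_name, ticket_count
        FROM (
            SELECT event_name,
                   COUNT(*) AS ticket_count,
                   {rank_function}() OVER (ORDER BY COUNT(*) DESC) AS rnk
            FROM ticket
            GROUP BY event_name
        ) t
        WHERE rnk = 1
        ORDER BY event_name
    """
    return conn.execute(query).fetchall()


dense_rank_rows = top_events("dense_rank")  # every event tied for first place
row_number_rows = top_events("row_number")  # exactly one of the tied events

print(dense_rank_rows)
print(row_number_rows)
```

With this data, dense_rank() keeps both tied events, while row_number() keeps a single one of them.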

The simplest solution I can think of is to make use of ordering and LIMIT 1.
SELECT
name, COUNT(event_name) as ticket_count
FROM event
LEFT JOIN ticket ON event.name = ticket.event_name
GROUP BY
name
ORDER BY
ticket_count DESC
LIMIT 1;
(using LEFT JOIN as you may have events without tickets I guess)
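Neither answer spells out the lecturer's hint from the question, COUNT(...) = (SELECT MAX(...) FROM (...)). One possible way to fill it in is sketched below: the inner derived table computes the per-event counts, and the outer HAVING compares each event's count with their maximum. The sketch runs on Python's sqlite3 module rather than PostgreSQL so that it is self-contained; the sample rows and the names per_event and cnt are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (name TEXT PRIMARY KEY);
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, event_name TEXT);
    INSERT INTO event (name) VALUES ('A'), ('B'), ('C');
    INSERT INTO ticket (event_name) VALUES ('A'), ('A'), ('B'), ('B'), ('C');
""")

# The aggregates are no longer nested: COUNT runs in the derived table,
# MAX runs over its result, and HAVING compares against that single value.
query = """
    SELECT event.name, COUNT(ticket.event_name) AS ticket_count
    FROM event
    INNER JOIN ticket ON event.name = ticket.event_name
    GROUP BY event.name
    HAVING COUNT(ticket.event_name) = (
        SELECT MAX(cnt)
        FROM (
            SELECT COUNT(*) AS cnt
            FROM ticket
            GROUP BY event_name
        ) AS per_event
    )
    ORDER BY event.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Unlike ORDER BY ... LIMIT 1, this form returns every event that is tied for the highest count.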

Related

Converting counts inside query result tables to percentages of total

I have a table and want to calculate the percentage of total by store_id which each (category_id, store_id) subtotal represents. My code is below:
WITH
example_table (name, store_id)
AS
(
select name, store_id
from category
join film_category using (category_id)
join film using (film_id)
join inventory using (film_id)
join rental using (inventory_id)
)
SELECT name, store_id, cast(count(*) as numeric)/(SELECT count(*) FROM example_table)
FROM example_table
GROUP BY name, store_id
ORDER BY name, store_id
This code actually works, as in, it doesn't throw an error, only they're not the results I'm looking for. Here each of the subtotals is divided by the total across both stores and all 16 names. Instead, I want the subtotals divided by their respective store totals or divided by their respective name totals.
I'm wondering how to perform calculations on those subtotals in general.
Thanks in advance,
I believe you need to explore the possibilities of using aggregate functions combined with an OVER(PARTITION BY ...) e.g.
SELECT DISTINCT
       name, store_id, store_id_count, name_count
FROM (
  select name, store_id
       , count(*) over(partition by store_id) as store_id_count
       , count(*) over(partition by name) as name_count
  from category
  join film_category using (category_id)
  join film using (film_id)
  join inventory using (film_id)
  join rental using (inventory_id)
) AS example_table
When you use an aggregate function with the over clause, you get the wanted counts on each row of the result, and it seems that this is what you need here. Note that select distinct has been used simply to reduce the final number of rows returned; you might still need a group by, but I am not sure if you do.
Once you have the needed values within the derived table (aliased as example_table), it should be a simple matter of some arithmetic in the overall select clause.
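As a rough illustration of that final arithmetic, the sketch below groups first to get the (name, store_id) subtotals and then divides each subtotal by a windowed SUM over its store or its name. This is one possible completion, not necessarily what the answerer had in mind. It runs on Python's sqlite3 module, and the single rentals table with its rows is a hypothetical stand-in for the joined category/film/inventory/rental tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened data: one row per rental, already joined to a name.
conn.execute("CREATE TABLE rentals (name TEXT, store_id INTEGER)")
conn.executemany(
    "INSERT INTO rentals VALUES (?, ?)",
    [("Action", 1), ("Action", 1), ("Action", 1), ("Comedy", 1),
     ("Action", 2), ("Comedy", 2)],
)

# Inner query: subtotal per (name, store_id).
# Outer query: divide each subtotal by the total of its store / of its name.
query = """
    SELECT name, store_id, subtotal,
           subtotal * 1.0 / SUM(subtotal) OVER (PARTITION BY store_id) AS share_of_store,
           subtotal * 1.0 / SUM(subtotal) OVER (PARTITION BY name) AS share_of_name
    FROM (
        SELECT name, store_id, COUNT(*) AS subtotal
        FROM rentals
        GROUP BY name, store_id
    ) AS subtotals
    ORDER BY name, store_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Multiplying by 1.0 forces floating-point division; in PostgreSQL a cast to numeric, as in the question, would serve the same purpose.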

ROW_NUMBER() OVER PARTITION optimization

I have the following query:
SELECT *
FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Price ASC) as RowNum
from Offers) r
where RowNum = 1
Offers table contains about 10 million records. But there are only ~4000 distinct codes there. So I need to get the row with the lowest price for each code and there will be only 4000 rows in the result.
I have an index on the (Code, Price) columns with all other columns in the INCLUDE clause.
The query runs for 2 minutes. If I look at the execution plan, I see an index scan with 10M actual rows, so I guess it scans the whole index to get the needed values.
Why does MSSQL scan the whole index? Is it because the subquery needs all the data? How can I avoid this scan? Is there a SQL hint to process only the first row in each partition?
Is there another way to optimize such query?
After trying multiple different solutions, I've found the fastest query with CROSS APPLY statement:
SELECT C.*
FROM (SELECT DISTINCT Code from Offers) A
CROSS APPLY (SELECT TOP 1 *
             FROM Offers B
             WHERE A.Code = B.Code
             ORDER by Price) C
It takes ~1 second to run.
Try creating an index on ( Code, Price ) without including the other columns and then (assuming that there is a unique Id column):
select L.*
from Offers as L inner join
  ( select Id,
           Row_Number() over ( partition by Code order by Price ) as RN
    from Offers ) as R on R.Id = L.Id and R.RN = 1
An index scan on a smaller index ought to help.
A second guess would be to get the Id of the row with the lowest Price for each Code explicitly: get the distinct Code values, get the Id of the top 1 row with Min( Price ) for each Code (top 1 to avoid problems with duplicate prices), then join with Offers to get the complete rows. Again, the more compact index should help.
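The three steps of that second guess could look roughly like the sketch below. It is written against Python's sqlite3 module so that it runs as-is, which means LIMIT 1 stands in for T-SQL's TOP 1; the lowercase table and column names, the index name, and the sample rows are assumptions, and the sketch says nothing about how SQL Server would plan the equivalent query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offers (id INTEGER PRIMARY KEY, code TEXT, price REAL);
    -- Compact index on (code, price) only, as the answer suggests.
    CREATE INDEX offers_code_price ON offers (code, price);
    INSERT INTO offers (id, code, price) VALUES
        (1, 'X', 10.0), (2, 'X', 7.5), (3, 'X', 7.5),
        (4, 'Y', 3.0), (5, 'Y', 4.0);
""")

# Step 1: distinct codes. Step 2: id of the cheapest row per code, with id
# as a tie-breaker for duplicate prices. Step 3: join back for full rows.
query = """
    SELECT o.id, o.code, o.price
    FROM (SELECT DISTINCT code FROM offers) AS c
    JOIN offers AS o
      ON o.id = (
          SELECT i.id
          FROM offers AS i
          WHERE i.code = c.code
          ORDER BY i.price, i.id
          LIMIT 1
      )
    ORDER BY o.code
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Code 'X' has two rows at the lowest price; the id tie-breaker makes the choice between them deterministic.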
Not sure if you'll get any significant performance gains, but you may want to try the WITH TIES clause
Example
Select Top 1 with Ties *
From Offers
Order By Row_Number() over (Partition By Code Order By Price)

select last of an item for each user in postgres

I want to get the last entry for each user, but the customer_id is a hash ('ASAG#...') and ordering by customer_id destroys the query. Is there an alternative?
Select Distinct On (l.customer_id)
l.customer_id
,l.created_at
,l.text
From likes l
Order By l.customer_id, l.created_at Desc
Your current query already appears to be working, q.v. here:
Demo
I don't know why your current query is not generating the results you would expect. It should return one distinct record for every customer, namely the most recent one, given your ORDER BY clause.
In any case, if it does not do what you want, an alternative would be to use ROW_NUMBER() here with a partition by user. The inner query assigns a row number to each of a user's records, with the value 1 going to that user's most recent record. Then the outer query retains only the latest record for each user.
SELECT
t.customer_id,
t.created_at,
t.text
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) rn
FROM likes
) t
WHERE t.rn = 1
To speed up the inner query which uses ROW_NUMBER() you can try adding a composite index on the customer_id and created_at columns:
CREATE INDEX yourIdx ON likes (customer_id, created_at);

PostgreSQL RANK() function over aggregated column

I'm constructing a fairly complex query in which I try to load users with their aggregated points together with their rank. I found the RANK() function, which could help me achieve this, but I can't get it working.
Here's the query that is working without RANK:
SELECT users.*, SUM(received_points.count) AS pts
FROM users
LEFT JOIN received_points ON received_points.user_id = users.id AND ...other joining conditions...
GROUP BY users.id
ORDER BY pts DESC NULLS LAST
Now I would also like to select the rank, but using the RANK function this way does not work:
SELECT users.*, SUM(received_points.count) AS pts,
RANK() OVER (ORDER BY pts DESC NULLS LAST) AS position
FROM users
LEFT JOIN received_points ON received_points.user_id = users.id AND ...other joining conditions...
GROUP BY users.id
ORDER BY pts DESC NULLS LAST
It reports: PG::UndefinedColumn: ERROR: column "pts" does not exist
I guess I have the whole concept of window functions wrong. How can I select the rank of a user sorted by an aggregated value like pts in the example above?
I know I can assign ranks manually afterwards, but what if I also want to filter the rows by users.name in the query and still get the user's rank in the general (unfiltered) leaderboard? I don't know if I'm being clear...
As Marth suggested in his comment:
You can't use pts here as the alias doesn't exist yet (you can't reference an alias in the same SELECT it's defined). RANK() OVER (ORDER BY SUM(received_points.count) DESC NULLS LAST) should work fine.
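A minimal sketch of that fix, run on Python's sqlite3 module instead of PostgreSQL so that it is self-contained. The sample users and points are hypothetical and the elided joining conditions from the question are left out. NULLS LAST is also left out, because SQLite already sorts NULLs last under DESC; PostgreSQL does not, so keep it there. The second query, which wraps the ranked result in a derived table before filtering by name, is my own assumption about how to keep the unfiltered rank and is not part of the quoted comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE received_points (user_id INTEGER, count INTEGER);
    INSERT INTO users (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO received_points (user_id, count) VALUES (1, 5), (1, 5), (2, 30);
""")

# Repeat the aggregate inside the window's ORDER BY instead of using the
# alias pts, which is not visible within the same SELECT list.
ranked = """
    SELECT users.id, users.name,
           SUM(received_points.count) AS pts,
           RANK() OVER (ORDER BY SUM(received_points.count) DESC) AS position
    FROM users
    LEFT JOIN received_points ON received_points.user_id = users.id
    GROUP BY users.id
"""
leaderboard = conn.execute(ranked + " ORDER BY position").fetchall()

# Filtering outside the derived table keeps the rank from the full leaderboard.
filtered = conn.execute(
    f"SELECT id, name, pts, position FROM ({ranked}) AS lb WHERE name = ?",
    ("ann",),
).fetchall()

print(leaderboard)
print(filtered)
```

The filter has to sit outside the ranked query: a WHERE inside it would be applied before the window function, so the rank would be computed over the filtered rows only.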

SQL SELECT in another table with most recent date

I have a list of Matter data in Table1 that I need to query, as well as get the most recent Invoice Number in Table2 that is tied to the original Matter. I'm having extreme difficulty in joining these tables together and only getting one result for each Matter as I only want the most recent Invoice #.
Any and all help would be greatly appreciated.
Table1
Table2
RESULT
The following assigns a number to each invoice row in descending order of date within its Matter, and selects only the most recent. Note that this assumes InvoiceDate is stored as a date, datetime, or something else that sorts chronologically, and that when two invoices share the same date, returning either one is fine. If you need to return both invoices in the event of ties, replace row_number with rank.
Select * from Table1 a
inner join
(Select *
, row_number() over (partition by MatterID order by InvoiceDate desc) as RN
from Table2) b
on a.MatterID = b.MatterID and b.RN = 1