Postgres, aggregate function and ORDER BY - postgresql

The statement works for me:
SELECT e.id, e.title, array_agg(d.start_date) date, array_agg(d.id) ids
FROM event e JOIN event_date d ON e.id = d.event_id
GROUP BY e.id
I receive the results
id
title
dates
ids
1
First Event
{2022-06-05,2022-10-05}
{1,2}
2
Second Event
{2022-07-05}
{3}
I want to order events by start_date. For that, I add ORDER BY d.start_date DESC to the statement:
SELECT e.id, e.title, array_agg(d.start_date) date, array_agg(d.id) ids
FROM event e JOIN event_date d ON e.id = d.event_id
GROUP BY e.id
ORDER BY d.start_date DESC
And I receive the error message:
ERROR: column "d.start_date" must appear in the GROUP BY clause or be
used in an aggregate function LINE 4: ORDER BY d.start_date DESC
I don't understand it. array_agg is an aggregate function. How to solve this issue?

it seems to work
SELECT e.id, e.title, array_agg(d.start_date ORDER BY d.start_date ASC) AS dates
FROM event e JOIN event_date d ON e.id = d.event_id
GROUP BY e.id
ORDER BY dates ASC

Related

Sub query in SELECT - ungrouped column from outer query

I have to calculate the ARPU (Revenue / # users) but I got this error:
subquery uses ungrouped column "usage_records.date" from outer query
LINE 7: WHERE created_at <= date_trunc('day', usage_records.d... ^
Expected results:
Revenue(day) = SUM(quantity_eur) for that day
Users Count (day) = Total signed up users before that day
Postgresql (Query)
SELECT
date_trunc('day', usage_records.date) AS day,
SUM(usage_records.quantity_eur) as Revenue,
( SELECT
COUNT(users.id)
FROM users
WHERE created_at <= date_trunc('day', usage_records.date)
) as users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY DAY
ORDER BY DAY asc
your subquery (executed for each row ) cointain a column nont mentioned in group by but not involeved in aggregation ..
this produce error
but you could refactor your query using a contional also for this value
SELECT
date_trunc('day', usage_records.date) AS day,
SUM(usage_records.quantity_eur) as Revenue,
sum( case when created_at <= date_trunc('day', usage_records.date)
AND users.id is not null
then 1 else 0 end ) users_count
FROM users
INNER JOIN ownerships ON (ownerships.user_id = users.id)
INNER JOIN profiles ON (profiles.id = ownerships.profile_id)
INNER JOIN usage_records ON (usage_records.profile_id = profiles.id)
GROUP BY DAY
ORDER BY DAY asc

Can't solve this SQL query

I have a difficulty dealing with a SQL query. I use PostgreSQL.
The query says: Show the customers that have done at least an order that contains products from 3 different categories. The result will be 2 columns, CustomerID, and the amount of orders. I have written this code but I don't think it's correct.
select SalesOrderHeader.CustomerID,
count(SalesOrderHeader.SalesOrderID) AS amount_of_orders
from SalesOrderHeader
inner join SalesOrderDetail on
(SalesOrderHeader.SalesOrderID=SalesOrderDetail.SalesOrderID)
inner join Product on
(SalesOrderDetail.ProductID=Product.ProductID)
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
from Product
group by ProductCategoryID
having count(DISTINCT ProductCategoryID)>=3)
group by SalesOrderHeader.CustomerID;
Here are the database tables needed for the query:
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
Is never going to give you a result as an ID (SalesOrderDetailID) will never logically match a COUNT (count(ProductCategoryID)).
This should get you the output I think you want.
SELECT soh.CustomerID, COUNT(soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
INNER JOIN SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Product p ON sod.ProductID = p.ProductID
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3
GROUP BY soh.CustomerID
Try this :
select CustomerID,count(*) as amount_of_order from
SalesOrder join
(
select SalesOrderID,count(distinct ProductCategoryID) CategoryCount
from SalesOrderDetail JOIN Product using (ProductId)
group by 1
) CatCount using (SalesOrderId)
group by 1
having bool_or(CategoryCount>=3) -- At least on CategoryCount>=3

postgresql complex query joing same table

I would like to get those customers from a table 'transactions' which haven't created any transactions in the last 6 Months.
Table:
'transactions'
id, email, state, paid_at
To visualise:
|------------------------ time period with all transactions --------------------|
|-- period before month transactions > 0) ---|---- curr month transactions = 0 -|
I guess this is doable with a join showing only those that didn't have any transactions on the right side.
Example:
Month = November
The conditions for the left side should be:
COUNT(l.id) > 0
l.paid_at < '2013-05-01 00:00:00'
Conditions for the right side:
COUNT(r.id) = 0
r.paid_at BETWEEN '2013-05-01 00:00:00' AND '2013-11-30 23:59:59'
Is join the right approach?
Answer
SELECT
C .email
FROM
transactions C
WHERE
(
C .email NOT IN (
SELECT DISTINCT
email
FROM
transactions
WHERE
paid_at >= '2013-05-01 00:00:00'
AND paid_at <= '2013-11-30 23:59:59'
)
AND
C .email IN (
SELECT DISTINCT
email
FROM
transactions
WHERE
paid_at <= '2013-05-01 00:00:00'
)
)
AND c.paid_at <= '2013-11-30 23:59:59'
There are a couple of ways you could do this. Use a subquery to get distinct customer ids for transactions in the last 6 months, and then select customers where their id isn't in the subquery.
select c.id, c.name
from customer c
where c.id not in (select distinct customer_id from transaction where dt between <start> and <end>);
Or, use a left join from customer to transaction, and filter the results to have transaction id null. A left join includes all rows from the left-hand table, even when there are no matching rows in the right-hand table. Explanation of left joins here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
select c.id, c.name
from customer c
left join transaction t on c.id = t.customer_id
and t.dt between <start> and <end>
where t.id is null;
The left join approach is likely to be faster.

How to display rollup data in new column?

I have the following query which returns the number of android questions per each day on StackOverflow in the year of 2011. I want to get the sum of all the questions asked during the year 2011. For this I am using ROLLUP.
select
year(p.CreationDate) as [Year],
month(p.CreationDate) as [Month],
day(p.CreationDate) as [Day],
count(*) as [QuestionsAskedToday]
from Posts p
inner join PostTags pt on p.id = pt.postid
inner join Tags t on t.id = pt.tagid
where
t.tagname = 'android' and
p.CreationDate > '2011-01-01 00:00:00'
group by year(p.CreationDate), month(p.CreationDate),day(p.CreationDate)
​with rollup
order by year(p.CreationDate), month(p.CreationDate) desc,day(p.CreationDate) desc​
This is the output:
The sum of all questions asked on each day in 2011 is being displayed in the QuestionsAskedToday column itself.
Is there a way to display the rollup in a new column with an alias?
Link to the query
To show this as a column rather than a row you can use SUM(COUNT(*)) OVER () instead of ROLLUP. (Online Demo)
SELECT YEAR(p.CreationDate) AS [Year],
MONTH(p.CreationDate) AS [Month],
DAY(p.CreationDate) AS [Day],
COUNT(*) AS [QuestionsAskedToday],
SUM(COUNT(*)) OVER () AS [Total]
FROM Posts p
INNER JOIN PostTags pt
ON p.id = pt.postid
INNER JOIN Tags t
ON t.id = pt.tagid
WHERE t.tagname = 'android'
AND p.CreationDate > '2011-01-01 00:00:00'
GROUP BY YEAR(p.CreationDate),
MONTH(p.CreationDate),
DAY(p.CreationDate)
ORDER BY YEAR(p.CreationDate),
MONTH(p.CreationDate) DESC,
DAY(p.CreationDate) DESC
You could take an approach like this: Example
SELECT
YEAR(p.CreationDate) AS 'Year'
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 0
THEN CAST(MONTH(p.CreationDate) AS VARCHAR(2))
ELSE 'Totals:'
END AS 'Month'
, CASE
WHEN GROUPING(DAY(p.CreationDate)) = 0
THEN CAST(DAY(p.CreationDate) AS VARCHAR(2))
ELSE 'Totals:'
END AS [DAY]
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 0
AND GROUPING(DAY(p.CreationDate)) = 0
THEN COUNT(1)
END AS 'QuestionsAskedToday'
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 1
OR GROUPING(DAY(p.CreationDate)) = 1
THEN COUNT(1)
END AS 'Totals'
FROM Posts AS p
INNER JOIN PostTags AS pt ON p.id = pt.postid
INNER JOIN Tags AS t ON t.id = pt.tagid
WHERE t.tagname = 'android'
AND p.CreationDate >= '2011-01-01'
GROUP BY ROLLUP(YEAR(p.CreationDate)
, MONTH(p.CreationDate)
, DAY(p.CreationDate))
ORDER BY YEAR(p.CreationDate)
, MONTH(p.CreationDate) DESC
, DAY(p.CreationDate) DESC​​​​​​​
If this is what you wanted, the same technique can be applied to Years as well to total them in the new column, or their own column, if you want to query for multiple years and aggregate them.

SQL Join Statement Issue

I'm tring to grab all fields from the latest Cash record, and then all fields from the related TransactionInfo record. I can't quite get this to work yet:
select t.*, top 1 c.* from Cash c
inner join TransactionInfo t
on c.TransactionID = t.id
order by c.createdOn desc
select top 1 *
from Cash c
inner join TransactionInfo t on c.TransactionID = t.id
order by createdOn desc
What's that top 1 doing there? If you only want one row then the TOP(1) must come first:
SELECT TOP(1) t.*, c.*
FROM Cash c
INNER JOIN TransactionInfo t
ON c.TransactionID = t.id
ORDER BY c.createdOn DESC
select t.,c.
from (Select top 1 * from Cash order by createdOn desc
) c
inner join TransactionInfo t
on c.TransactionID = t.id
order by createdOn desc
DOn;t use select * especially with a join, it wastes server resources.
SELECT c.*, t.* FROM cash c, transactioninfo t
WHERE c.infoid = t.id AND c.createdOn = (SELECT max(createdOn) FROM cash WHERE infoId = t.id) ORDER BY transactiontabledate desc
You need to find the record with the latest date from the cash table for each transactionId and use that also to filter it out in your query.