The following query (tested with PostgreSQL 11.1) evaluates, for each customer/product combination, the following elements:
(A) the sum of sales value that the customer spent on this product
(B) the sum of sales value that the customer spent in the parent category of this product
It then divides A by B to obtain a metric called loyalty.
select
pp.customer, pp.product, pp.category,
pp.sales_product / pc.sales_category as loyalty
from (
select
t.household_key as customer,
t.product_id as product,
p.commodity as category,
sum(t.sales_value) as sales_product
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, t.product_id, p.commodity
) pp
left join (
select
t.household_key as customer,
p.commodity as category,
sum(t.sales_value) as sales_category
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, p.commodity
) pc on pp.customer = pc.customer and pp.category = pc.category
;
Results are of this form:
customer product category loyalty
---------------------------------------------
1 tomato food 0.01
1 beef food 0.02
1 toothpaste hygiene 0.04
1 toothbrush hygiene 0.03
My question is, instead of having to rely on two sub-queries which are then left-joined, would it be feasible with a single query using window functions instead?
I've tried something like the following, but it doesn't work: PostgreSQL reports that column "t.sales_value" must appear in the GROUP BY clause or be used in an aggregate function. I don't see what can be done to fix this.
-- does not work
select
t.household_key as customer,
t.product_id as product,
p.commodity as category,
sum(t.sales_value) as sales_product,
sum(t.sales_value) over (partition by t.household_key, p.commodity) as sales_category
from transaction_data t
left join product p on p.product_id = t.product_id
group by t.household_key, t.product_id, p.commodity;
I don't know how to do this without using either a join or a subquery, but here is one way to do this with a subquery, using analytic functions:
WITH cte AS (
SELECT
t.household_key AS customer,
t.product_id AS product,
p.commodity as category,
SUM(t.sales_value) OVER (PARTITION BY t.household_key, t.product_id, p.commodity)
AS sales_product,
SUM(t.sales_value) OVER (PARTITION BY t.household_key, p.commodity)
AS sales_category
FROM transaction_data t
LEFT JOIN product p
ON p.product_id = t.product_id
)
SELECT
t.customer,
t.product,
t.category,
MAX(t.sales_product) / MAX(t.sales_category) AS loyalty
FROM cte t
GROUP BY
t.customer,
t.product,
t.category;
The trick here is to make a single pass over your joined tables and use the analytic SUM with two different partitions: one on three columns (customer, product, category) for the product total, and one on two columns (customer, category) for the category total. Every row in a (customer, product, category) group then carries the same two windowed values, so we can aggregate by those three columns and simply take MAX of each value to collapse the group to one row.
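Below is a small check. It runs against SQLite through Python's sqlite3 module rather than PostgreSQL 11, an assumption made so the sketch is self-contained, and the toy rows are invented. It runs the CTE answer with its comma and alias fixes. As an alternative not given in the answer, it also runs a single grouped query in which a window function wraps the aggregate, sum(sum(...)) over (partition by ...). PostgreSQL accepts this form too. Window functions are evaluated after GROUP BY, so the inner SUM is the per-product total and the outer windowed SUM adds those totals per customer and category. That is the piece the question's failing attempt was missing.

```python
import sqlite3

# Hypothetical toy data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, commodity TEXT);
CREATE TABLE transaction_data (household_key INTEGER, product_id INTEGER, sales_value REAL);
INSERT INTO product VALUES (1, 'food'), (2, 'food'), (3, 'hygiene');
-- customer 1: 3.0 on product 1 and 1.0 on product 2 (food), 5.0 on product 3 (hygiene)
INSERT INTO transaction_data VALUES (1, 1, 1.0), (1, 1, 2.0), (1, 2, 1.0), (1, 3, 5.0);
""")

# The answer's approach: windowed sums in a CTE, collapsed per group with MAX.
CTE_SQL = """
WITH cte AS (
  SELECT t.household_key AS customer, t.product_id AS product, p.commodity AS category,
         SUM(t.sales_value) OVER (PARTITION BY t.household_key, t.product_id, p.commodity) AS sales_product,
         SUM(t.sales_value) OVER (PARTITION BY t.household_key, p.commodity) AS sales_category
  FROM transaction_data t
  LEFT JOIN product p ON p.product_id = t.product_id
)
SELECT c.customer, c.product, c.category,
       MAX(c.sales_product) / MAX(c.sales_category) AS loyalty
FROM cte c
GROUP BY c.customer, c.product, c.category
ORDER BY 2
"""

# Alternative: one grouped query; the window SUM runs over the per-group SUMs.
NESTED_SQL = """
SELECT t.household_key AS customer, t.product_id AS product, p.commodity AS category,
       SUM(t.sales_value)
         / SUM(SUM(t.sales_value)) OVER (PARTITION BY t.household_key, p.commodity) AS loyalty
FROM transaction_data t
LEFT JOIN product p ON p.product_id = t.product_id
GROUP BY t.household_key, t.product_id, p.commodity
ORDER BY 2
"""

cte_rows = conn.execute(CTE_SQL).fetchall()
nested_rows = conn.execute(NESTED_SQL).fetchall()
```

Both queries should yield loyalty 0.75 and 0.25 for the two food products and 1.0 for the only hygiene product.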
Related
I have a products table and a corresponding ratings table, which contains a foreign key product_id, a grade (int), and a type, which is an enum accepting the values robustness and price_quality_ratio.
The grades accept values from 1 to 10. For example, what would the query look like if I wanted to filter the products whose minimum grade for robustness is 7 and minimum grade for price_quality_ratio is 8?
You can join twice, once per rating. The inner joins eliminate the products that fail any of the rating criteria:
select p.*
from products p
inner join rating r1
on r1.product_id = p.product_id
and r1.type = 'robustness'
and r1.rating >= 7
inner join rating r2
on r2.product_id = p.product_id
and r2.type = 'price_quality_ratio'
and r2.rating >= 8
Another option is to use conditional aggregation. This requires only one join, followed by a GROUP BY; the rating criteria are checked in the HAVING clause.
select p.product_id, p.product_name
from products p
inner join rating r
on r.product_id = p.product_id
and r.type in ('robustness', 'price_quality_ratio')
group by p.product_id, p.product_name
having
min(case when r.type = 'robustness' then r.rating end) >= 7
and min(case when r.type = 'price_quality_ratio' then r.rating end) >= 8
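To compare the two answers side by side, here is a sketch that runs both queries on SQLite through Python's sqlite3 module. This is an assumption: SQLite has no enum type, so type is plain TEXT, and all rows are hypothetical. Table and column names follow the answers (products, rating, rating.rating).

```python
import sqlite3

# Hypothetical sample data; "type" stands in for the enum as plain TEXT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE rating (product_id INTEGER, type TEXT, rating INTEGER);
INSERT INTO products VALUES (1, 'passes both'), (2, 'low price_quality_ratio'),
                            (3, 'no ratings'), (4, 'mixed robustness grades');
INSERT INTO rating VALUES
  (1, 'robustness', 9), (1, 'price_quality_ratio', 8),
  (2, 'robustness', 7), (2, 'price_quality_ratio', 5),
  (4, 'robustness', 9), (4, 'robustness', 5), (4, 'price_quality_ratio', 10);
""")

# Approach 1: one inner join per rating type, threshold in each ON clause.
JOIN_SQL = """
SELECT p.product_id
FROM products p
INNER JOIN rating r1
  ON r1.product_id = p.product_id AND r1.type = 'robustness' AND r1.rating >= 7
INNER JOIN rating r2
  ON r2.product_id = p.product_id AND r2.type = 'price_quality_ratio' AND r2.rating >= 8
ORDER BY p.product_id
"""

# Approach 2: one join, then conditional aggregation checked in HAVING.
AGG_SQL = """
SELECT p.product_id
FROM products p
INNER JOIN rating r
  ON r.product_id = p.product_id AND r.type IN ('robustness', 'price_quality_ratio')
GROUP BY p.product_id, p.product_name
HAVING MIN(CASE WHEN r.type = 'robustness' THEN r.rating END) >= 7
   AND MIN(CASE WHEN r.type = 'price_quality_ratio' THEN r.rating END) >= 8
ORDER BY p.product_id
"""

join_ids = [row[0] for row in conn.execute(JOIN_SQL)]
agg_ids = [row[0] for row in conn.execute(AGG_SQL)]
```

Product 4 shows where the two approaches differ. The joins ask whether some robustness rating reaches 7, while MIN(...) >= 7 requires every robustness rating to reach it. With at most one rating per type, both return the same products.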
The JOIN proposed by @GMB would have been my first suggestion as well. If maintaining many rX.rating conditions gets too complicated, you can also use a nested query:
SELECT *
FROM (
SELECT p.*, r1.rating as robustness, r2.rating as price_quality_ratio
FROM products p
JOIN rating r1 ON (r1.product_id = p.product_id AND r1.type = 'robustness')
JOIN rating r2 ON (r2.product_id = p.product_id AND r2.type = 'price_quality_ratio')
) AS tmp
WHERE robustness >= 7
AND price_quality_ratio >= 8
-- ORDER BY price_quality_ratio DESC, robustness DESC -- etc
I have two tables, Transactions and Tickets. In Tickets I have the Ticket_Number, the name of the Category (Theater, Cinema, Concert), and the Price of the ticket. In Transactions I also have the Ticket_Number. What I want is to get the SUM of money for each Category and then, from that data, select the Category with the most money.
I already managed to get the SUM for each category, but I am stuck here:
SELECT category, SUM (Tickets.Price) AS Price
FROM Tickets,Transactions
WHERE Tickets.ticket_num=Transactions.ticket_num
GROUP BY Category
ORDER BY Price DESC;
I know I could add LIMIT 1, but that isn't correct, because two or more categories can have the same total.
Use ROW_NUMBER to number the aggregated rows in descending order of total price, then keep only the row with the highest total:
WITH cte AS (
SELECT category, SUM(t1.Price) AS Price,
ROW_NUMBER() OVER (ORDER BY SUM(t1.Price) DESC) rn
FROM Tickets t1
INNER JOIN Transactions t2
ON t1.ticket_num = t2.ticket_num
GROUP BY Category
)
SELECT category, Price
FROM cte
WHERE rn = 1
ORDER BY Price DESC;
Note that if you want to capture all categories tied for the highest total, replace ROW_NUMBER in the CTE above with RANK and keep everything else the same.
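The tie behaviour can be checked with a quick sketch. It uses SQLite through Python's sqlite3 module as a stand-in (an assumption; ROW_NUMBER and RANK behave the same way here). The hypothetical data makes Theater and Cinema tie at 30. The helper top_categories is illustrative only.

```python
import sqlite3

# Hypothetical data: Theater = 30, Cinema = 10 + 20 = 30, Concert = 15.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tickets (ticket_num INTEGER PRIMARY KEY, Category TEXT, Price REAL);
CREATE TABLE Transactions (ticket_num INTEGER);
INSERT INTO Tickets VALUES (1, 'Theater', 30), (2, 'Cinema', 10), (3, 'Cinema', 20), (4, 'Concert', 15);
INSERT INTO Transactions VALUES (1), (2), (3), (4);
""")

def top_categories(rank_fn):
    # rank_fn is interpolated into the SQL text, so pass only trusted
    # function names such as "ROW_NUMBER" or "RANK".
    sql = f"""
    WITH cte AS (
      SELECT Category, SUM(t1.Price) AS Price,
             {rank_fn}() OVER (ORDER BY SUM(t1.Price) DESC) AS rn
      FROM Tickets t1
      INNER JOIN Transactions t2 ON t1.ticket_num = t2.ticket_num
      GROUP BY Category
    )
    SELECT Category, Price FROM cte WHERE rn = 1 ORDER BY Category
    """
    return conn.execute(sql).fetchall()

row_number_top = top_categories("ROW_NUMBER")  # one row; which tied category wins is arbitrary
rank_top = top_categories("RANK")              # every category tied for first
```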
What you are looking for is the window function DENSE_RANK(), which handles ties properly.
RANK() also works for your case, but if you want to extend it to the top N places with ties (where N > 1), DENSE_RANK() is the way to go.
SELECT Category, Price
FROM (
SELECT
Category,
SUM(ti.Price) AS Price,
DENSE_RANK() OVER (ORDER BY SUM(ti.Price) DESC) AS rnk
FROM Tickets ti
INNER JOIN Transactions tr ON
ti.ticket_num = tr.ticket_num
GROUP BY Category
) t
WHERE rnk = 1
I've also replaced the old-style (and not recommended) comma-separated list of tables in the FROM clause with a proper INNER JOIN clause, and assigned aliases to the tables.
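The RANK versus DENSE_RANK distinction only shows up once you ask for more than first place. The sketch below uses SQLite through Python's sqlite3 module (an assumption) with hypothetical totals of 30, 30, 15 and 5, and asks each function for the top 2. The helper top_n is illustrative only.

```python
import sqlite3

# Hypothetical totals: Theater 30, Cinema 30, Concert 15, Opera 5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tickets (ticket_num INTEGER PRIMARY KEY, Category TEXT, Price REAL);
CREATE TABLE Transactions (ticket_num INTEGER);
INSERT INTO Tickets VALUES (1, 'Theater', 30), (2, 'Cinema', 10), (3, 'Cinema', 20),
                           (4, 'Concert', 15), (5, 'Opera', 5);
INSERT INTO Transactions VALUES (1), (2), (3), (4), (5);
""")

def top_n(rank_fn, n):
    # rank_fn is interpolated into the SQL text; pass only "RANK" or "DENSE_RANK".
    sql = f"""
    SELECT Category, Price FROM (
      SELECT ti.Category, SUM(ti.Price) AS Price,
             {rank_fn}() OVER (ORDER BY SUM(ti.Price) DESC) AS rnk
      FROM Tickets ti
      INNER JOIN Transactions tr ON ti.ticket_num = tr.ticket_num
      GROUP BY ti.Category
    ) t
    WHERE rnk <= ?
    ORDER BY Category
    """
    return [row[0] for row in conn.execute(sql, (n,)).fetchall()]

rank_top2 = top_n("RANK", 2)         # ranks 1, 1, 3, 4: rank 2 is skipped
dense_top2 = top_n("DENSE_RANK", 2)  # ranks 1, 1, 2, 3: Concert is second place
```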
You can use rank() to rank the sums of the prices, more expensive first.
SELECT category,
price
FROM (SELECT category,
sum(tickets.price) price,
rank() OVER (ORDER BY sum(tickets.price) DESC) r
FROM tickets
INNER JOIN transactions
ON transactions.ticket_num = tickets.ticket_num
GROUP BY category) x
WHERE r = 1;
I also took the liberty to rewrite your join from the ancient comma style to a modern, clearer version.
I am trying to calculate a percentile with the percentile_cont() function in PostgreSQL, using common table expressions. The goal is to find the top 1% of accounts with regard to their balances (called amount here). My logic is to find the 99th percentile and then return those whose account balances are greater than 99% of their peers (thus finding the top 1 percent).
Here is my query
--ranking subquery works fine
with ranking as(
select a.lname,sum(c.amount) as networth from customer a
inner join
account b on a.customerid=b.customerid
inner join
transaction c on b.accountid=c.accountid
group by a.lname order by sum(c.amount)
)
select lname, networth, percentile_cont(0.99) within group
order by networth over (partition by lname) from ranking ;
I keep getting the following error:
ERROR: syntax error at or near "order"
LINE 2: ...ame, networth, percentile_cont(0.99) within group order by n..
I thought that perhaps I forgot a closing parenthesis or something similar, but I can't figure out where. I know it could have something to do with the ORDER keyword, but I am not sure what to do. Can you please help me fix this error?
This tripped me up, too.
It turns out percentile_cont is not supported in PostgreSQL 9.3, only in 9.4 and later.
https://www.postgresql.org/docs/9.4/static/release-9-4.html
So on 9.3 you have to compute it by hand; for the median, something like this works:
with ordered_purchases as (
select
price,
row_number() over (order by price) as row_id,
(select count(1) from purchases) as ct
from purchases
)
select avg(price) as median
from ordered_purchases
where row_id between ct/2.0 and ct/2.0 + 1
That query is courtesy of https://www.periscopedata.com/blog/medians-in-sql (section: "Median on Postgres").
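The median query can be tried with a short script. It uses SQLite through Python's sqlite3 module as a stand-in (an assumption; the query itself needs only ROW_NUMBER, COUNT and AVG). For an odd count, the BETWEEN range covers exactly one row_id. For an even count, it covers the two middle rows, which AVG then averages. The median helper and its inputs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (price REAL)")

# The median query from the answer, unchanged apart from layout.
MEDIAN_SQL = """
WITH ordered_purchases AS (
  SELECT price,
         ROW_NUMBER() OVER (ORDER BY price) AS row_id,
         (SELECT COUNT(1) FROM purchases) AS ct
  FROM purchases
)
SELECT AVG(price) AS median
FROM ordered_purchases
WHERE row_id BETWEEN ct / 2.0 AND ct / 2.0 + 1
"""

def median(prices):
    # Reload the table with the given prices, then run the median query.
    conn.execute("DELETE FROM purchases")
    conn.executemany("INSERT INTO purchases VALUES (?)", [(p,) for p in prices])
    return conn.execute(MEDIAN_SQL).fetchone()[0]

odd_median = median([5, 1, 3])      # ct = 3: only row_id 2 lies in [1.5, 2.5]
even_median = median([4, 1, 3, 2])  # ct = 4: row_ids 2 and 3 lie in [2, 3]
```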
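You are missing the parentheses in the within group (order by x) part.
Once that is fixed, PostgreSQL will still reject an OVER clause here, because percentile_cont is an ordered-set aggregate and cannot be used as a window function. Partitioning by lname would not help anyway, since the CTE has only one row per lname. Compute the 99th percentile once over the whole CTE and filter against it instead:
with ranking
as (
select a.lname,
sum(c.amount) as networth
from customer a
inner join account b on a.customerid = b.customerid
inner join transaction c on b.accountid = c.accountid
group by a.lname
)
select lname,
networth
from ranking
where networth >= (
select percentile_cont(0.99) within group (order by networth)
from ranking
)
order by networth desc;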
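percentile_cont interpolates between values instead of returning an existing row. The plain-Python sketch below mirrors that rule: sort, take position fraction * (n - 1), and interpolate linearly between the two neighbouring values, as PostgreSQL describes for continuous percentiles. It is not PostgreSQL's implementation. It shows which threshold the 99th percentile yields and who is at or above it. The names and amounts are hypothetical.

```python
import math

def percentile_cont(fraction, values):
    """Continuous percentile: linear interpolation between the two sorted
    values that bracket position fraction * (n - 1)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        return None  # like the SQL aggregate over zero rows
    pos = fraction * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

# Hypothetical net worths 1.0 .. 100.0 for 100 customers.
networth = {f"customer{i}": float(i) for i in range(1, 101)}
threshold = percentile_cont(0.99, networth.values())
top = sorted(name for name, amount in networth.items() if amount >= threshold)
```

With these numbers the threshold is about 99.01, which matches nobody's exact net worth, and only customer100 clears it.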
I want to point out that you don't need a subquery for this:
select c.lname, sum(t.amount) as networth,
percentile_cont(0.99) within group (order by sum(t.amount)) over (partition by lname)
from customer c inner join
account a
on c.customerid = a.customerid inner join
transaction t
on a.accountid = t.accountid
group by c.lname
order by networth;
Also, when using table aliases (which you should always do), table abbreviations are much easier to follow than arbitrary letters.
Select p.prodCode,
p.description,
p.unit,
SUM(sd.quantity) "Total quantity"
FROM salesDetail sd
RIGHT JOIN product p
ON p.prodCode = sd.prodCode
GROUP BY p.prodCode
ORDER BY 4 DESC
Help! My script is not running. I need the total quantity for every product, but my GROUP BY is not working.
Compute the sum of the quantity per product in a separate subquery, and then join this back to the original product table:
SELECT t1.prodCode,
t1.description,
t1.unit,
t2.total_quantity
FROM product t1
INNER JOIN
(
SELECT p.prodCode, SUM(sd.quantity) total_quantity
FROM product p
LEFT JOIN salesDetail sd
ON p.prodCode = sd.prodCode
GROUP BY p.prodCode
) t2
ON t1.prodCode = t2.prodCode
Note that I replaced the RIGHT JOIN with a LEFT JOIN by switching the order of the joined tables in the subquery.
Update:
If you absolutely need to use a RIGHT JOIN, then just replace the subquery with this:
SELECT p.prodCode, SUM(sd.quantity) total_quantity
FROM salesDetail sd
RIGHT JOIN product p
ON p.prodCode = sd.prodCode
GROUP BY p.prodCode
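Here is a runnable check of the subquery approach. It uses SQLite through Python's sqlite3 module as a stand-in for the asker's database, which is an assumption. SQLite is more permissive about GROUP BY than PostgreSQL, so it would not reproduce the original error. Table and column names come from the question; the rows are made up.

```python
import sqlite3

# Hypothetical rows; product C has no sales at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (prodCode TEXT PRIMARY KEY, description TEXT, unit TEXT);
CREATE TABLE salesDetail (prodCode TEXT, quantity INTEGER);
INSERT INTO product VALUES ('A', 'Apple', 'kg'), ('B', 'Bread', 'loaf'), ('C', 'Cheese', 'kg');
INSERT INTO salesDetail VALUES ('A', 2), ('A', 3), ('B', 1);
""")

# The answer's shape: aggregate per product in a subquery, then join back
# to product to pick up the descriptive columns.
TOTALS_SQL = """
SELECT t1.prodCode, t1.description, t1.unit, t2.total_quantity
FROM product t1
INNER JOIN (
  SELECT p.prodCode, SUM(sd.quantity) AS total_quantity
  FROM product p
  LEFT JOIN salesDetail sd ON p.prodCode = sd.prodCode
  GROUP BY p.prodCode
) t2 ON t1.prodCode = t2.prodCode
ORDER BY t1.prodCode
"""

rows = conn.execute(TOTALS_SQL).fetchall()
```

Because of the LEFT JOIN, a product with no sales still appears, with a NULL total. COALESCE(SUM(sd.quantity), 0) would report 0 instead. ORDER BY t1.prodCode is used here only to make the output order deterministic.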
I am having difficulty with a SQL query. I use PostgreSQL.
The task is: show the customers who have placed at least one order containing products from 3 different categories. The result should have 2 columns: CustomerID and the number of orders. I have written this code, but I don't think it's correct.
select SalesOrderHeader.CustomerID,
count(SalesOrderHeader.SalesOrderID) AS amount_of_orders
from SalesOrderHeader
inner join SalesOrderDetail on
(SalesOrderHeader.SalesOrderID=SalesOrderDetail.SalesOrderID)
inner join Product on
(SalesOrderDetail.ProductID=Product.ProductID)
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
from Product
group by ProductCategoryID
having count(DISTINCT ProductCategoryID)>=3)
group by SalesOrderHeader.CustomerID;
Here are the database tables needed for the query:
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
That condition is never going to give you a meaningful result, because an ID (SalesOrderDetailID) will never logically match a COUNT (count(ProductCategoryID)).
This should get you the output I think you want.
SELECT soh.CustomerID, COUNT(DISTINCT soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
INNER JOIN SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Product p ON sod.ProductID = p.ProductID
GROUP BY soh.CustomerID
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3
Try this:
select CustomerID, count(*) as amount_of_order from
SalesOrderHeader join
(
select SalesOrderID, count(distinct ProductCategoryID) CategoryCount
from SalesOrderDetail JOIN Product using (ProductId)
group by 1
) CatCount using (SalesOrderId)
group by 1
having bool_or(CategoryCount >= 3) -- at least one order with CategoryCount >= 3
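Below is a runnable sketch of the per-order approach, using SQLite through Python's sqlite3 module (an assumption). SQLite has no bool_or, so MAX(CategoryCount >= 3) = 1 stands in for it, since comparisons evaluate to 0 or 1 there. The tables follow the question's names but contain only the columns the query needs, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE SalesOrderDetail (SalesOrderDetailID INTEGER PRIMARY KEY, SalesOrderID INTEGER, ProductID INTEGER);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductCategoryID INTEGER);
INSERT INTO Product VALUES (1, 10), (2, 20), (3, 30), (4, 10);
-- Customer 100: order 1 spans 3 categories, order 2 only one.
-- Customer 200: orders 3 and 4 span 2 categories each (3 overall, never in one order).
INSERT INTO SalesOrderHeader VALUES (1, 100), (2, 100), (3, 200), (4, 200);
INSERT INTO SalesOrderDetail VALUES
  (1, 1, 1), (2, 1, 2), (3, 1, 3),
  (4, 2, 4),
  (5, 3, 1), (6, 3, 2),
  (7, 4, 2), (8, 4, 3);
""")

# Per-order distinct category counts, then one row per customer.
PER_ORDER_SQL = """
SELECT CustomerID, COUNT(*) AS amount_of_order
FROM SalesOrderHeader
JOIN (
  SELECT SalesOrderID, COUNT(DISTINCT ProductCategoryID) AS CategoryCount
  FROM SalesOrderDetail JOIN Product USING (ProductID)
  GROUP BY 1
) CatCount USING (SalesOrderID)
GROUP BY 1
HAVING MAX(CategoryCount >= 3) = 1  -- SQLite substitute for bool_or(...)
"""

rows = conn.execute(PER_ORDER_SQL).fetchall()
```

Customer 100 qualifies and its count covers both of its orders. Customer 200 buys from three categories overall but never within a single order, so the per-order subquery leaves it out.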