How to get fields and added in group by in PostreSQL8.4? - postgresql

I am selecting column used in group by and count, and query looks something like
SELECT s.country, count(*) AS posts_ct
FROM store s
JOIN store_post_map sp ON sp.store_id = s.id
GROUP BY 1;
However, I want to select some more fields, like store name or store address from store table where count is max, but I don't to include that in group by clause.

For instance, to get the stores with the highest post-count per country:
SELECT DISTINCT ON (s.country)
s.country, s.store_id, s.name, sp.post_ct
FROM store s
JOIN (
SELECT store_id, count(*) AS post_ct
FROM store_post_map
GROUP BY store_id
) sp ON sp.store_id = s.id
ORDER BY s.country, sp.post_ct DESC
Add any number of columns from store to the SELECT list.
Details about this query style in this related answer:
Select first row in each GROUP BY group?
Reply to comment
This produces the count per country and picks (one of) the store(s) with the highest post-count:
SELECT DISTINCT ON (s.country)
s.country, s.store_id, s.name
,sum(post_ct) OVER (PARTITION BY s.country) AS post_ct_for_country
FROM store s
JOIN (
SELECT store_id, count(*) AS post_ct
FROM store_post_map
GROUP BY store_id
) sp ON sp.store_id = s.id
ORDER BY s.country, sp.post_ct DESC;
This works because the window function sum() is applied before DISTINCT ON per definition.

Related

Query to select by number of associated objects

I have two tables that look like the following:
Orders
------
id
tracking_number
ShippingLogs
------
tracking_number
created_at
stage
I would like to select the IDs of Orders that have ONLY ONE ShippingLog associated with it, and the stage of the ShippingLog must be error. If it has two ShippingLog entries, I don't want it. If it has one ShippingLog bug its stage is shipped, I don't want it.
This is what I have, and it doesn't work, and I know why (it finds the log with the error, but has no way of knowing if there are others). I just don't really know how to get it the way I need it.
SELECT DISTINCT
orders.id, shipping_logs.created_at, COUNT(shipping_logs.*)
FROM
orders
JOIN
shipping_logs ON orders.tracking_number = shipping_logs.tracking_number
WHERE
shipping_logs.created_at BETWEEN '2021-01-01 23:40:00'::timestamp AND '2021-01-26 23:40:00'::timestamp AND shipping_logs.stage = 'error'
GROUP BY
orders.id, shipping_logs.created_at
HAVING
COUNT(shipping_logs.*) = 1
ORDER BY
orders.id, shipping_logs.created_at DESC;
If you want to retain every column from the join of the two tables given your requirements, then I would suggest using COUNT here as an analytic function:
WITH cte AS (
SELECT o.id, sl.created_at,
COUNT(*) OVER (PARTITION BY o.id) num_logs,
COUNT(*) FILTER (WHERE sl.stage <> 'error')
OVER (PARTITION BY o.id) non_error_cnt
FROM orders o
INNER JOIN shipping_logs sl ON sl.tracking_number = o.tracking_number
WHERE sl.created_at BETWEEN '2021-01-01 23:40:00'::timestamp AND
'2021-01-26 23:40:00'::timestamp
)
SELECT id AS order_id, created_at
FROM cte
WHERE num_logs = 1 AND non_error_cnt = 0
ORDER BY id, created_at DESC;

can you use max in this query?

From this table, I'm trying to determine the nation (s) that have the highest number of teams (a nation X has a team if it has at least one athlete from that country X).
driver(id,name, team, country)
This solution restores all countries in descending order. Would it be possible to ensure that only the one (s) with the most team (s) return and not all of them? I think you should use the 'max' command but I'm not sure.
SELECT (country) ,count(distinct team)
FROM driver
GROUP BY country
order by count(distinct team) DESC;
I would use your query as a CTE and then select from it like this -
WITH t AS
(
SELECT country, count(distinct team) cnt
FROM driver
GROUP BY country
)
SELECT country, cnt FROM t
WHERE cnt = (SELECT max(cnt) FROM t);
You can combine this with a window function:
with counts as (
SELECT country,
count(distinct team) as num_teams,
dense_rank() over (order by count(distinct team) desc) as rnk
FROM driver
GROUP BY country
)
select country, num_teams
from counts
where rnk = 1;
If you are using Postgres 14, you can use fetch first with the option with ties:
SELECT country,
count(distinct team) as num_teams
FROM driver
GROUP BY country
order by count(distinct team) desc
fetch first 1 rows with ties
If two countries have the same highest number of drivers, this would return both. Without the with ties option (which was introduced in Postgres 14) only one of them would be returned.

How to get the MAX(SUM of values) to find the category with the biggest total? PostgreSQL

I have two tables. One is Transactions and the other is Tickets. In Tickets I have the Ticket_Number,the name of the Category(Theater,Cinema,Concert), the Price of the Ticket. In Transactions I also have the Ticket_Number. What i want to do is to Get a SUM of money for each Category, and then with that data I want to Select the Category with the most money.
I already managed to get the SUM for each category but I am stuck here
SELECT category, SUM (Tickets.Price) AS Price
FROM Tickets,Transactions
WHERE Tickets.ticket_num=Transactions.ticket_num
GROUP BY Category
ORDER BY Price DESC;
I know i can add LIMIT 1 but I know it's not correct because 2 or more values can be the same
Using ROW_NUMBER to generate a sequence based on the sum of the price. Then, restrict to only the matching aggregated row with the highest total price.
WITH cte AS (
SELECT category, SUM(t1.Price) AS Price,
ROW_NUMBER() OVER (ORDER BY SUM(t1.Price) DESC) rn
FROM Tickets t1
INNER JOIN Transactions t2
ON t1.ticket_num = t2.ticket_num
GROUP BY Category
)
SELECT category, Price
FROM cte
WHERE rn = 1
ORDER BY Price DESC;
Note that if you want to capture all categories tied for the highest price, should a tie occur, then replace ROW_NUMBER in the above CTE with RANK, keeping everything else the same.
What you are looking for is a window function DENSE_RANK() which will handle ties properly.
RANK() will also work for your case, but if you would like to extend it to get TOP N places with ties (where N > 1), dense rank is the way to go.
SELECT Category, Price
FROM (
SELECT
Category,
SUM(ti.Price) AS Price,
DENSE_RANK() OVER (ORDER BY SUM(ti.Price) DESC) AS rnk
FROM Tickets ti
INNER JOIN Transactions tr ON
ti.ticket_num = tr.ticket_num
GROUP BY Category
) t
WHERE rnk = 1
I've also replaced the old style and not recommended joining of tables as comma separated list in FROM clause to a proper INNER JOIN clause and assigned aliases to tables.
You can use rank() to rank the sums of the prices, more expensive first.
SELECT category,
price
FROM (SELECT category,
sum(tickets.price) price,
rank() OVER (ORDER BY sum(tickets.price) DESC) r
FROM tickets
INNER JOIN transactions
ON transactions.ticket_num = tickets.ticket_num
GROUP BY category) x
WHERE r = 1;
I also took the liberty to rewrite your join from the ancient comma style to a modern, clearer version.

Can't solve this SQL query

I have a difficulty dealing with a SQL query. I use PostgreSQL.
The query says: Show the customers that have done at least an order that contains products from 3 different categories. The result will be 2 columns, CustomerID, and the amount of orders. I have written this code but I don't think it's correct.
select SalesOrderHeader.CustomerID,
count(SalesOrderHeader.SalesOrderID) AS amount_of_orders
from SalesOrderHeader
inner join SalesOrderDetail on
(SalesOrderHeader.SalesOrderID=SalesOrderDetail.SalesOrderID)
inner join Product on
(SalesOrderDetail.ProductID=Product.ProductID)
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
from Product
group by ProductCategoryID
having count(DISTINCT ProductCategoryID)>=3)
group by SalesOrderHeader.CustomerID;
Here are the database tables needed for the query:
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
Is never going to give you a result as an ID (SalesOrderDetailID) will never logically match a COUNT (count(ProductCategoryID)).
This should get you the output I think you want.
SELECT soh.CustomerID, COUNT(soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
INNER JOIN SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Product p ON sod.ProductID = p.ProductID
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3
GROUP BY soh.CustomerID
Try this :
select CustomerID,count(*) as amount_of_order from
SalesOrder join
(
select SalesOrderID,count(distinct ProductCategoryID) CategoryCount
from SalesOrderDetail JOIN Product using (ProductId)
group by 1
) CatCount using (SalesOrderId)
group by 1
having bool_or(CategoryCount>=3) -- At least on CategoryCount>=3

How should I add fields without adding them to a GROUP BY?

I have a SQL statement that works as-is. I get an area name and the minimum value within that area. next, I need to add in a key so I can actually do something with the results. The key is necessary since names and values are unlikely to be unique.
select g.name, min(g.rndval) from
(
select p.rndval, a.name, p.id
from points p, areas a
where ST_WITHIN(p.geom, a.geom)
) AS g
group by g.name
When I add the Id field to the group by, the query returns multiple rows for each area, as expected since it's grouping by the name and id combination, and the results are no longer what I need. How should I add in the id field (p.id in the inner select)?
You can try:
WITH cte AS
( select p.rndval, a.name, p.id
from points p, areas a
where ST_WITHIN(p.geom, a.geom)
), cte_aggregated AS
(
SELECT name, min(rndval) AS min_value
FROM cte
GROUP BY name
)
SELECT DISTINCT c.rndval, c.name, c.id
FROM cte c
JOIN cte_aggregated ca
ON c.rndval = ca.min_value
AND c.name = ca.name;
You can solve this quite elegantly with a window function:
select name, rndval as min, id
from (
select a.name, p.rndval, p.id, rank() over (partition by a.name order by p.rndval) as rnk
from points p
join areas a on ST_Within(p.geom, a.geom)) as g
where rnk = 1;