GROUP BY - How to create 3 groups for a column? - PostgreSQL

Say I have a table of products with the fields id, number_of_product, price.
Let's say price ranges from min = 100 to max = 1000.
How do I create 3 groups for this column (PostgreSQL): 100-400, 400-600, 600-1000?
PS - it would also be nice to know how to split the column into 3 equal parts.
SELECT COUNT(id),
COUNT(number_of_product),
!!!! price - ?!
FROM Scheme.Table
GROUP BY PRICE

You can try the following query:
with p as (
    select
        *,
        min(price) over() as min_price,
        (max(price) over() - min(price) over()) / 3 as step
    from products
)
select
    id, number_of_product, price,
    case
        when price < min_price + step then 'low_price'
        when price < min_price + 2 * step then 'mid_price'
        else 'high_price'
    end as category
from p
order by price;

To do this quickly, you can use a case statement to set the groups.
CASE WHEN price BETWEEN 100 AND 400 THEN 1
     WHEN price BETWEEN 400 AND 600 THEN 2
     WHEN price BETWEEN 600 AND 1000 THEN 3
     ELSE 0
END
You would group on this.
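A minimal sketch of grouping on that expression, reusing the table and column names from the question:
SELECT CASE WHEN price BETWEEN 100 AND 400 THEN 1
            WHEN price BETWEEN 400 AND 600 THEN 2
            WHEN price BETWEEN 600 AND 1000 THEN 3
            ELSE 0
       END AS price_group,
       COUNT(id) AS products
FROM Scheme.Table
GROUP BY 1
ORDER BY 1;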
For splitting into equal parts, you would use the NTILE window function to group.
NTILE(3) OVER (
    ORDER BY price
)
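NTILE assigns a bucket number per row, so grouping on it needs a subquery; a sketch for three equal-sized parts, again assuming the question's table:
SELECT price_group,
       COUNT(*)   AS products,
       MIN(price) AS min_price,
       MAX(price) AS max_price
FROM (
    SELECT id, price, NTILE(3) OVER (ORDER BY price) AS price_group
    FROM Scheme.Table
) t
GROUP BY price_group
ORDER BY price_group;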

Related

How to insert AVG in a BETWEEN clause?

For example, let's take Northwind. I want to use a CASE clause to create categories by comparing units_in_stock with its AVG value, placing that value in multiple BETWEEN clauses. This is what I have got:
SELECT product_name, unit_price, units_in_stock,
CASE
WHEN units_in_stock > (SELECT AVG(units_in_stock) + 10 FROM products) THEN 'many'
WHEN units_in_stock BETWEEN (SELECT AVG(units_in_stock) - 10 FROM products) AND (SELECT AVG(units_in_stock) + 10 FROM products) THEN 'average'
ELSE 'low'
END AS amount
FROM products
ORDER BY units_in_stock;
According to Analyze tool in pgAdmin AVG(units_in_stock) was calculated three times. Is there a way to reduce amount of calculations?
You can use window functions instead of a subquery. Also, there is no need to use BETWEEN in the second WHEN condition; values that are greater than the average + 10 are handled by the first branch and never reach the second branch.
I would phrase this as:
SELECT product_name, unit_price, units_in_stock,
CASE
WHEN units_in_stock > AVG(units_in_stock) OVER() + 10 THEN 'many'
WHEN units_in_stock >= AVG(units_in_stock) OVER() - 10 THEN 'average'
ELSE 'low'
END AS amount
FROM products
ORDER BY units_in_stock;
I would expect the database to optimize the query so that the window average is only computed once. If that's not the case, an alternative would be to compute the average in a subquery first:
SELECT product_name, unit_price, units_in_stock,
CASE
WHEN units_in_stock > avg_units_in_stock + 10 THEN 'many'
WHEN units_in_stock >= avg_units_in_stock - 10 THEN 'average'
ELSE 'low'
END AS amount
FROM (SELECT p.*, AVG(units_in_stock) OVER() avg_units_in_stock FROM products p) p
ORDER BY units_in_stock;

PostgreSQL: select column and exclude from group

This question has probably been asked in different formats, but I could not find the answer.
I have table orders
date, quantity_ordered, unit_cost_cents, product_model_number, title
I would like to:
SELECT
    model_number,
    title,
    SUM(unit_cost_cents / 100.00 * quantity_ordered) as total
FROM orders
GROUP BY model_number
HAVING SUM(quantity_ordered) > 0
ORDER BY total DESC
But it requires grouping by the title as well.
My problem is that the title changes over time. I'd like to preserve the titles and simply display/select the most recent title, without grouping by title, which would change the numbers.
You can use a subquery to fetch the latest title:
SELECT
    model_number,
    (select max(title) from orders
     where model_number = o.model_number   -- keep the title from the same model
       and date = (select max(date) from orders where model_number = o.model_number)
    ) title,
    SUM(unit_cost_cents / 100.00 * quantity_ordered) as total
FROM orders o
GROUP BY model_number
HAVING SUM(quantity_ordered) > 0
ORDER BY total DESC
I used select max(title) instead of select title to make sure that the subquery will not return more than one row (just in case).
SELECT
    o.model_number
    , om.title
    , SUM(o.unit_cost_cents / 100.00 * o.quantity_ordered) as total
FROM orders o
JOIN (SELECT model_number, title,
             row_number() OVER (PARTITION BY model_number ORDER BY date DESC) AS rn
      FROM orders) om
  ON om.model_number = o.model_number AND om.rn = 1
GROUP BY 1, 2
HAVING SUM(o.quantity_ordered) > 0
ORDER BY 3 DESC
;
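In PostgreSQL, DISTINCT ON is another way to pick the most recent title per model; a sketch under the same column names, joined back to the aggregated totals:
SELECT t.model_number, lt.title, t.total
FROM (
    SELECT model_number,
           SUM(unit_cost_cents / 100.00 * quantity_ordered) AS total
    FROM orders
    GROUP BY model_number
    HAVING SUM(quantity_ordered) > 0
) t
JOIN (
    SELECT DISTINCT ON (model_number) model_number, title
    FROM orders
    ORDER BY model_number, date DESC   -- the newest row per model wins
) lt ON lt.model_number = t.model_number
ORDER BY t.total DESC;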

SQL to select users into groups based on group percentage

To keep this simple, let's say I have a table with 100 records that include:
userId
pointsEarned
I would like to group these 100 records (or whatever the total is based on other criteria) into several groups as follows:
Group 1, 15% of total records
Group 2, 25% of total records
Group 3, 10% of total records
Group 4, 10% of total records
Group 5, 40% (remaining of total records, percentage doesn't really matter)
In addition to the above, there will be a minimum of 3 groups and a maximum of 5 groups, with varying percentages that always total 100%. If it makes it easier, the last group will always be the remainder not picked in the other groups.
I'd like the results to be as follows:
groupNbr
userId
pointsEarned
To do this sort of breakup, you need a way to rank the records so that you can decide which group they belong in. If you do not want to randomise the group allocation, and userId is a contiguous number, then using userId would be sufficient. However, you probably can't guarantee that, so you need to create some sort of ranking, then use that to split your data into groups. Here is a simple example.
Declare @Total int
Set @Total = (Select COUNT(*) From dataTable)

Select case
           when ranking <= 0.15 * @Total then 1
           when ranking <= 0.4 * @Total then 2
           when ranking <= 0.5 * @Total then 3
           when ranking <= 0.6 * @Total then 4
           else 5
       end as groupNbr,
       userId,
       pointsEarned
FROM (Select userId, pointsEarned, ROW_NUMBER() OVER (ORDER BY userId) as ranking From dataTable) A
If you need to randomise which group data end up in, then you need to allocate a random number to each row first, and then rank them by that random number and then split as above.
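For instance, a minimal sketch of that randomised ranking (ORDER BY NEWID() on SQL Server; on PostgreSQL you would use ORDER BY random() instead):
Select userId, pointsEarned,
       ROW_NUMBER() OVER (ORDER BY NEWID()) as ranking  -- random order instead of userId
From dataTable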
If you need to make the splits more flexible, you could design a split table that has columns like minPercentage, maxPercentage, groupNbr, fill it with the splits and do something like this
Declare @Total int
Set @Total = (Select COUNT(*) From dataTable)

Select S.groupNbr,
       B.userId,
       B.pointsEarned
FROM (Select ranking * 100.0 / @Total as rankPercent, userId, pointsEarned
      FROM (Select userId, pointsEarned, ROW_NUMBER() OVER (ORDER BY userId) as ranking From dataTable) A
     ) B
inner join splitTable S on S.minPercentage <= B.rankPercent and S.maxPercentage >= B.rankPercent
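With the percentages from the question (15/25/10/10/40), the hypothetical splitTable could be filled like this; note that with <= and >= the boundaries overlap at the exact edges, so you may prefer a strict inequality on one side:
-- hypothetical contents of splitTable for a 15/25/10/10/40 split
INSERT INTO splitTable (minPercentage, maxPercentage, groupNbr) VALUES
    (0,  15,  1),
    (15, 40,  2),
    (40, 50,  3),
    (50, 60,  4),
    (60, 100, 5);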

How to normalize group by count results?

How can the results of a "group by" count be normalized by the count's sum?
For example, given:
User    Rating (1-5)
------------------------
1       3
1       4
1       2
3       5
4       3
3       2
2       3
The result will be:
User    Count   Percentage
----------------------------
1       3       .42  (= 3/7)
2       1       .14  (= 1/7)
3       2       .28  (= 2/7)
4       1       .14  (= 1/7)
So for each user the number of ratings they provided is given as the percentage of the total ratings provided by everyone.
SELECT DISTINCT ON ("user") "user",
       count(*) OVER (PARTITION BY "user") AS cnt,
       (count(*) OVER (PARTITION BY "user"))::numeric / count(*) OVER () AS percentage
FROM ratings;  -- table name assumed; "user" is quoted because it is a reserved word, and the cast avoids integer division
The count(*) OVER (PARTITION BY user) is a so-called window function. Window functions let you perform some operation over a "window" created by some "partition" which is here made over the user id. In plain and simple English: the partitioned count(*) is calculated for each distinct user value, so in effect it counts the number of rows for each user value.
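If one row per user is all you need, a plain GROUP BY with a window SUM over the per-user counts gives the same numbers; a minimal sketch, assuming the ratings live in a table called ratings:
SELECT "user",
       count(*) AS cnt,
       count(*)::numeric / sum(count(*)) OVER () AS percentage
FROM ratings
GROUP BY "user"
ORDER BY "user";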
Without using a window function or variables, you will need to cross join a grouped subquery with a second subquery that totals the counts, then select again to return a subset you can work with.
SELECT
    B.UserID,
    B.UserCount,
    C.CountAll
FROM
    (
        SELECT CountAll = SUM(UserCount)
        FROM
            (
                SELECT UserCount = COUNT(*)
                FROM MyTable
                GROUP BY UserID
            ) AS A
    ) AS C
CROSS JOIN
    (
        SELECT UserID, UserCount = COUNT(*)
        FROM MyTable
        GROUP BY UserID
    ) AS B
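The percentage itself can then be taken from the two counts; a hedged sketch building on the same idea (multiplying by 1.0 so the division is not done in integers):
SELECT
    B.UserID,
    B.UserCount,
    B.UserCount * 1.0 / C.CountAll AS Percentage
FROM
    (SELECT COUNT(*) AS CountAll FROM MyTable) AS C
CROSS JOIN
    (SELECT UserID, COUNT(*) AS UserCount FROM MyTable GROUP BY UserID) AS B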

Summing Multiple Records by maxdate

I have a table with the following data
Bldg   Suit   SQFT    Date
1      1      1,000   9/24/2012
1      1      1,500   12/31/2011
1      2      800     8/31/2012
1      2      500     10/1/2005
I want to write a query that sums the SQFT of the most recent (max date) record for each suit, so the desired result would be 1,800, and it must be in one cell/row. This will ultimately be part of a subquery; I am just not getting what I expect with the queries I have written so far.
Thanks in advance.
You can use the following:
select sum(t1.sqft) Total
from yourtable t1
inner join
(
    select max(dt) mxdt, suit, bldg
    from yourtable
    group by suit, bldg
) t2
    on t1.dt = t2.mxdt
   and t1.bldg = t2.bldg
   and t1.suit = t2.suit
; With Data As
(
Select Bldg, Suit, SQFT, Row_Number() Over (Partition By Bldg, Suit Order By Date DESC) As RowID
From YourTableNameHere
)
Select Bldg, Sum(SQFT) As TotalSQFT
From Data
Where RowId = 1
Group By Bldg
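If a single grand total is wanted rather than one row per building, the same CTE works with the GROUP BY dropped; a small variant of the query above:
; With Data As
(
    Select SQFT, Row_Number() Over (Partition By Bldg, Suit Order By Date DESC) As RowID
    From YourTableNameHere
)
Select Sum(SQFT) As TotalSQFT
From Data
Where RowID = 1;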