How to insert AVG in a BETWEEN clause? - postgresql

For example, let's take Northwind. I want to use a CASE expression to create categories by comparing units_in_stock with its AVG value, placing this value in multiple BETWEEN clauses. This is what I've got:
SELECT product_name, unit_price, units_in_stock,
CASE
WHEN units_in_stock > (SELECT AVG(units_in_stock) + 10 FROM products) THEN 'many'
WHEN units_in_stock BETWEEN (SELECT AVG(units_in_stock) - 10 FROM products) AND (SELECT AVG(units_in_stock) + 10 FROM products) THEN 'average'
ELSE 'low'
END AS amount
FROM products
ORDER BY units_in_stock;
According to the Analyze tool in pgAdmin, AVG(units_in_stock) was calculated three times. Is there a way to reduce the number of calculations?

You can use window functions instead of a subquery. Also, there is no need for BETWEEN in the second WHEN condition; values that are greater than the average + 10 are handled by the first branch and never reach the second one.
I would phrase this as:
SELECT product_name, unit_price, units_in_stock,
CASE
WHEN units_in_stock > AVG(units_in_stock) OVER() + 10 THEN 'many'
WHEN units_in_stock >= AVG(units_in_stock) OVER() - 10 THEN 'average'
ELSE 'low'
END AS amount
FROM products
ORDER BY units_in_stock;
I would expect the database to optimize the query so that the window average is only computed once. If that's not the case, an alternative would be to compute the average in a subquery first:
SELECT product_name, unit_price, units_in_stock,
CASE
WHEN units_in_stock > avg_units_in_stock + 10 THEN 'many'
WHEN units_in_stock >= avg_units_in_stock - 10 THEN 'average'
ELSE 'low'
END AS amount
FROM (SELECT p.*, AVG(units_in_stock) OVER() avg_units_in_stock FROM products p) p
ORDER BY units_in_stock;
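If the planner does repeat the window aggregate, you can also compute the average exactly once with a scalar subquery and a cross join (a variant sketch; avg_units is just an alias introduced here):
SELECT product_name, unit_price, units_in_stock,
CASE
WHEN units_in_stock > a.avg_units + 10 THEN 'many'
WHEN units_in_stock >= a.avg_units - 10 THEN 'average'
ELSE 'low'
END AS amount
FROM products
CROSS JOIN (SELECT AVG(units_in_stock) AS avg_units FROM products) a
ORDER BY units_in_stock;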

Related

GROUP BY - How to create 3 groups for a column?

Say I have a table of products; its fields are id, number_of_product, price.
Let's say price has min = 100, max = 1000*
How to create 3 groups for this column (PostgreSQL) - 100-400, 400-600, 600-1000*
*PS - it would be nice to know how to split into 3 equal parts.
SELECT COUNT(id),
COUNT(number_of_product),
!!!! price - ?!
FROM Scheme.Table
GROUP BY PRICE
You can try the following query:
with p as (
select
*,
min(price) over() min_price,
(max(price) over() - min(price) over()) / 3 step
from products
) select
id, product, price,
case
when price < min_price + step then 'low_price'
when price < min_price + 2 * step then 'mid_price'
else 'high'
end as category
from p
order by price;
PostgreSQL fiddle
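With the question's sample bounds, min = 100 and max = 1000, the step is (1000 - 100) / 3 = 300, so the branches resolve to 100-399 (low_price), 400-699 (mid_price) and 700-1000 (high). Note that these equal-width groups differ from the 100-400 / 400-600 / 600-1000 split in the question, which is not equal-width.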
To do this quickly, you can use a case statement to set the groups.
CASE
WHEN price BETWEEN 100 AND 400 THEN 1
WHEN price BETWEEN 400 AND 600 THEN 2
WHEN price BETWEEN 600 AND 1000 THEN 3
ELSE 0
END
You would group on this expression; note that a value on a shared boundary, such as 400, falls into the first branch that matches. A full query is sketched below.
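For example, assuming the question's table is named products with the fields id, number_of_product, price:
SELECT
CASE WHEN price BETWEEN 100 AND 400 THEN 1
     WHEN price BETWEEN 400 AND 600 THEN 2
     WHEN price BETWEEN 600 AND 1000 THEN 3
     ELSE 0 END AS price_group,
COUNT(id) AS product_rows,
SUM(number_of_product) AS total_products
FROM products
GROUP BY 1
ORDER BY 1;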
For splitting into equal parts, you would use the NTILE window function to group.
NTILE(3) OVER (
ORDER BY price
)
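Since a window function can't be referenced directly in GROUP BY, a sketch (again assuming a products table) assigns the bucket in a subquery first:
SELECT price_group, MIN(price) AS from_price, MAX(price) AS to_price, COUNT(id) AS product_rows
FROM (
SELECT id, price, NTILE(3) OVER (ORDER BY price) AS price_group
FROM products
) t
GROUP BY price_group
ORDER BY price_group;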

How to simplify this UNION query (remove the UNION)?

I have a table with players, let's call it player.
Let's say they have 3 columns: userId (UUID in a varchar(255)), levelNumber (integer) and a column through a one-to-one relation with FetchType.Lazy, let's say facebookProfile.
I need to retrieve the rankings "around" the player, so 9 players above the given player and 9 players below the given player, to have a total of 19 players (with my player in the middle).
Some time ago I just came up with this idea:
(select * from player where current_level >= :levelNumber + 1 and (not userid = :userIdToIgnore) order by current_level asc limit 9)
union
(select * from player where current_level <= :levelNumber - 1 and (not userid = :userIdToIgnore) order by current_level desc limit 9)
You get the idea.
Is there any way to simplify this so it doesn't use the UNION?
I'm asking because I need to convert this to a JPQL query, so it won't be a nativeQuery.
This is all because native queries lead to the N+1 problem, and I have trouble with lazy loading (the facebookProfile column) and multiple selects later. That's why I need to simplify the algorithm so I can use JPQL.
I think you can do this with window functions and conditional expressions. Note that the row numbers have to be computed per group (above / below the given level), hence the partition by; without it, rows from the other group would shift the numbering:
select *
from (
select p.*,
case when current_level >= :levelNumber + 1 then row_number() over(partition by current_level >= :levelNumber + 1 order by current_level asc) end rn1,
case when current_level <= :levelNumber - 1 then row_number() over(partition by current_level >= :levelNumber + 1 order by current_level desc) end rn2
from player p
where userid <> :userIdToIgnore and (current_level >= :levelNumber + 1 or current_level <= :levelNumber - 1)
) t
where rn1 between 1 and 9 or rn2 between 1 and 9
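Note that this returns only the 18 neighbours: the requesting player is excluded by the userid filter, so their own row has to be fetched separately and placed in the middle to get the 19-row window described above. Adding an outer order by current_level returns the slice in leaderboard order.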

(postgres) Different types of data having their own limits on different columns

I have a CTE that I want to grab data from, but I want different types of data, with the same limit, from the same CTE according to different rules.
Example: fruit_cte -> (id::integer, name::text, q1::boolean, q2::boolean)
I could do something like:
SELECT * FROM (SELECT 1 as query_num, * FROM fruit_cte WHERE q1 ORDER BY name LIMIT 100) as ABC
UNION ALL
SELECT * FROM (SELECT 2 as query_num, * FROM fruit_cte WHERE q2 ORDER BY name LIMIT 100) as ABC
UNION ALL
SELECT * FROM (SELECT -1 as query_num, * FROM fruit_cte WHERE NOT q1 AND NOT q2 ORDER BY name LIMIT 100) as ABC
But this is very costly, and it would be nice to tie this up into one select. Is this even possible?
The last select is a nice-to-have, to get the data that doesn't meet the requirements, but it's possible to go without it.
PG version 11+
You could get it all in a single scan, without querying the CTE three times, by using window functions instead.
SELECT type, id, name, q1, q2
FROM (
SELECT
CASE WHEN q1 THEN 1 WHEN q2 THEN 2 ELSE -1 END AS type,
ROW_NUMBER() OVER (
PARTITION BY CASE WHEN q1 THEN 1 WHEN q2 THEN 2 ELSE -1 END
ORDER BY name
) AS row_number,
id,
name,
q1,
q2
FROM ...
) AS t
WHERE row_number <= 100
The row_number() will count rows, ordered by name, keeping a separate tally for every type.
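Here is a self-contained sketch of the same idea; the fruit_cte body below is hypothetical sample data, only there to make it runnable:
WITH fruit_cte AS (
SELECT * FROM (VALUES
(1, 'apple', true, false),
(2, 'pear', false, true),
(3, 'plum', false, false)
) AS v(id, name, q1, q2)
)
SELECT type, id, name, q1, q2
FROM (
SELECT
CASE WHEN q1 THEN 1 WHEN q2 THEN 2 ELSE -1 END AS type,
ROW_NUMBER() OVER (
PARTITION BY CASE WHEN q1 THEN 1 WHEN q2 THEN 2 ELSE -1 END
ORDER BY name
) AS row_number,
id, name, q1, q2
FROM fruit_cte
) AS t
WHERE row_number <= 100;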

Aggregated values depending on another field

I have a table with a date-time and multiple properties, some of which I group by and some of which I aggregate; the query is something like "get me revenue per customer last week".
Now I want to see the change between the requested period and the previous one, so I will have 2 columns, revenue and previous_revenue.
Right now I'm requesting the rows of the requested period plus the rows of the previous period, and for each aggregated field I add a CASE statement that returns the value, or 0 if the row is not in the period that I want.
That leads to as many CASEs as there are aggregated fields, but always with the same conditional statement.
I'm wondering if there is a better design for this use case...
SELECT
customer,
SUM(
CASE WHEN TIMESTAMP_CMP('2016-07-01 00:00:00', ft.date_hour) > 0 THEN
revenue
ELSE 0 END
) AS revenue,
SUM(
CASE WHEN TIMESTAMP_CMP('2016-07-01 00:00:00', ft.date_hour) < 0 THEN
revenue
ELSE 0 END
) AS previous_revenue
FROM _revenue ft -- assumed table name; the FROM clause was missing
WHERE ft.date_hour >= '2016-06-01 00:00:00'
AND ft.date_hour <= '2016-07-31 23:59:59'
GROUP BY customer
(In my real use case I have many columns which make it even more ugly)
First, I'd suggest refactoring out the timestamps and precalculating the current and previous periods for later use. This is not strictly necessary to solve your problem, though:
create temporary table _period as
select
'2016-07-01 00:00:00'::timestamp as curr_period_start
, '2016-07-31 23:59:59'::timestamp as curr_period_end
, '2016-06-01 00:00:00'::timestamp as prev_period_start
, '2016-06-30 23:59:59'::timestamp as prev_period_end
;
Now, a possible design that avoids repeating the timestamps and CASE statements is to group by the periods first and then do a FULL OUTER JOIN of that table with itself:
with _aggregate as (
select
case
when date_hour between prev_period_start and prev_period_end then 'previous'
when date_hour between curr_period_start and curr_period_end then 'current'
end::varchar(20) as period
, customer
-- < other columns to group by go here >
, sum(revenue) as revenue
-- < other aggregates go here >
from
_revenue, _period
where
date_hour between prev_period_start and curr_period_end
group by 1, 2
)
select
customer
, current_period.revenue as revenue
, previous_period.revenue as previous_revenue
from
(select * from _aggregate where period = 'previous') previous_period
full outer join (select * from _aggregate where period = 'current') current_period
using(customer) -- All columns which have been group by must go into the using() clause:
-- e.g. using(customer, some_column, another_column)
;
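One caveat: after the FULL OUTER JOIN, a customer who has rows in only one period gets NULL for the other period's revenue, so the final select of the query above can be hardened with coalesce():
select
customer
, coalesce(current_period.revenue, 0) as revenue
, coalesce(previous_period.revenue, 0) as previous_revenue
from
(select * from _aggregate where period = 'previous') previous_period
full outer join (select * from _aggregate where period = 'current') current_period
using(customer);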

How to normalize group by count results?

How can the results of a "group by" count be normalized by the count's sum?
For example, given:
User    Rating (1-5)
----------------------
1       3
1       4
1       2
3       5
4       3
3       2
2       3
The result will be:
User    Count   Percentage
---------------------------
1       3       .42 (=3/7)
2       1       .14 (=1/7)
3       2       .28 (...)
4       1       .14
So for each user the number of ratings they provided is given as the percentage of the total ratings provided by everyone.
SELECT DISTINCT ON ("user") "user",
count(*) OVER (PARTITION BY "user") AS cnt,
count(*) OVER (PARTITION BY "user")::numeric / count(*) OVER () AS percentage
FROM ratings; -- assumed table name; "user" is quoted because it is a reserved word
The count(*) OVER (PARTITION BY "user") is a so-called window function. Window functions let you perform an operation over a "window" created by some "partition", which here is made over the user id. In plain and simple English: the partitioned count(*) is calculated for each distinct user value, so in effect it counts the number of rows per user. The ::numeric cast matters: dividing one integer count by another would otherwise be integer division and always yield 0.
Without using a window function or variables, you will need to cross join a grouped subquery with a second, summed-up subquery, then select again to get a result set you can work with.
SELECT
B.UserID,
B.UserCount,
A.CountAll
FROM
(
SELECT
SUM(UserCount) AS CountAll
FROM
(
SELECT
COUNT(*) AS UserCount
FROM
MyTable
GROUP BY
UserID
) AS UserCounts
) AS A
CROSS JOIN (
SELECT
UserID,
COUNT(*) AS UserCount
FROM
MyTable
GROUP BY
UserID
) AS B
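The percentage itself is then one more expression in the outer SELECT; the ::numeric cast avoids integer division:
B.UserCount::numeric / A.CountAll AS Percentage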