calculating median to output similar results to over partition - postgresql

I have a large table. Here is a snippet of what it looks like:
name class brand rating
12 d 1 3.8
22 d 1 3.9
33 a 2 1.1
12 d 1 2.3
12 a 3 3.4
44 b 1 9.8
22 c 2 3.0
I calculated the average rating like this:
select avg(rating) over(partition by name,class,brand) as avg_rating from df
I'm aware that Postgres doesn't have a median function, but I would like to calculate the median of that column and get the output in the same structure as my window function for the average.
For an even number of rows, I would like the average of the two middle values.

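To get the median, use percentile_cont, an ordered-set aggregate. When the row count is even, percentile_cont(0.5) interpolates between the two middle values, which gives their average. The query below returns a single median for the whole table:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY rating) FROM df;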
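The question asks for a result shaped like the window-function average, with the value repeated on every row of its partition. PostgreSQL does not accept OVER on ordered-set aggregates such as percentile_cont, so one way to get that shape is to compute the median per group and join it back. This is only a sketch, assuming the table df and the columns from the question, that rating is a numeric or floating-point column, and that name, class and brand contain no NULLs (USING would not match them). The alias median_rating is made up for illustration.

```sql
-- Median per (name, class, brand), repeated on every row of the group,
-- mirroring the shape of avg(rating) OVER (PARTITION BY name, class, brand).
SELECT d.*, m.median_rating
FROM df AS d
JOIN (
    SELECT name, class, brand,
           -- interpolates between the two middle ratings for even-sized groups
           percentile_cont(0.5) WITHIN GROUP (ORDER BY rating) AS median_rating
    FROM df
    GROUP BY name, class, brand
) AS m USING (name, class, brand);  -- assumes the grouping columns are never NULL
```

With the sample rows, the group (12, d, 1) holds the ratings 3.8 and 2.3, so both of its rows would carry the midpoint of those two values.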

Related

Find the difference between a column in two rows, grouped by one column and sorted by another

I need to find the sequential difference in the worth column between consecutive rows, grouped by the brand column and ordered by the bill_id column, and also the average of that difference, all in a single query.
I have this data:
brand bill_id worth
Moto 1 2550
Samsung 1 3430
Samsung 2 3450
Moto 2 2500
Moto 3 2530
Expected Output
brand bill_id worth net_diff avg_diff
Moto 1 2550 0 00
Moto 2 2560 10 5
Moto 3 2540 -20 -5
Samsung 1 3430 0 0
Samsung 2 3450 20 10
With the following data :
CREATE TABLE T (brand VARCHAR(16), bill_id INT, worth DECIMAL(16,2));
INSERT INTO T VALUES
('Moto', 1, 2550),
('Samsung', 1, 3430),
('Samsung', 2, 3450),
('Moto', 2, 2500),
('Moto', 3, 2530);
One possible solution could be :
WITH
T0 AS
(
SELECT *, worth - COALESCE(LAG(worth) OVER(PARTITION BY brand ORDER BY bill_id), worth) AS net_diff
FROM T
)
SELECT *, AVG(net_diff) OVER(PARTITION BY brand ORDER BY bill_id) AS avg_diff
FROM T0;
But I do not understand the formula behind the AVG values in your example...
It appears that by average you mean half the difference between two consecutive bill_id rows for a brand. You can get this by applying the lag() function twice (using the answer from @SQLpro as a base), arriving at:
with bill_net(brand, bill_id, worth,net_diff) as
( select billing.*, worth - coalesce(lag(worth) over(partition by brand order by bill_id), worth)
from billing
)
select brand, bill_id, worth, net_diff, coalesce(round(((net_diff - lag(net_diff) over(partition by brand order by bill_id))/2.0),2),0.00) as avg_diff
from bill_net;
NOTE: Because the input data and the expected output are inconsistent, this does not produce exactly your expected results.

Calculate percentage difference between two rows

I have this query that produced the table below.
select season,
guildname,
count(guildname) as mp_count,
(count(guildname)/600::float)*100 as grank
from mp_rankings
group by season, guildname
order by grank desc
season guildname mp_count grank
10 LEGENDS 56 9.33333333333333
9 LEGENDS 54 9
10 EVERGLADE 50 8.33333333333333
9 Mystic 46 7.66666666666667
10 Mystic 42 7
9 EVERGLADE 39 6.5
10 100 36 6
9 PARABELLUM 33 5.5
10 PARABELLUM 29 4.83333333333333
9 100 29 4.83333333333333
I wanted to create a new column that calculates the percentage difference between the two seasons using identical guildnames. For example:
season guildname mp_count grank prev_season_percent_diff
10 LEGENDS 56 9.33333333333333 0.33%
10 EVERGLADE 50 8.33333333333333 1.83%
The resulting table will only show the current season (which is the highest season value, 10 in this case) and adds a new column prev_season_percent_diff, which is the current season's grank minus the previous season's grank.
How can I achieve this?
Use a Common Table Expression ("CTE") for the grouped result and join it to itself to calculate the difference from the previous season:
with summary as (
select
season,
guildname,
count(*) as mp_count, -- simplified equivalent expression
count(*) / 6.0 as grank -- same as (count / 600) * 100; 6.0 avoids integer division
from mp_rankings
group by season, guildname
)
select
a.season,
a.guildname,
a.mp_count,
a.grank,
a.grank - b.grank as prev_season_percent_diff -- current grank minus previous season's grank
from summary a
left join summary b on b.guildname = a.guildname
and b.season = a.season - 1
where a.season = (select max(season) from summary)
order by a.grank desc
If you actually want a % in the result, round the difference and concatenate a % to it, for example round(a.grank - b.grank, 2) || '%'.

Calculate Subtotals using fixed categories in postgresql query

I have a query that returns the count and sum of certain fields on my tables, and also a total. It goes like this:
example:
with foo as(
select s.subcat as subcategory,
sum(s.cost) as costs,
round(((sum(s.cost) / (s.tl)::numeric )*100),2)|| ' %' as pct_cost
from (select ...big query)s group by s.subcat
)
select * from foo
union
select 'Total costs' as subcategory,
sum(costs) as costs,
null as pct_cost
from foo
order by...
Category          Cost   Percentage
x_subcategory 1   5      0.5%
x_subcategory 2   1      0.1%
x_subcategory 3   18     1.8%
y_subcategory 1   7      0.7%
y_subcategory 2   10     1.0%
...               ...    ...
Total             41     5.8%
What I need for another report is the totals by category. I have to assign these categories based on the subcategory name. The question is how to partition the result so that I get something like this:
Category          Cost   Percentage
x_subcategory 1   5      0.5%
x_subcategory 2   1      0.1%
x_subcategory 3   18     1.8%
X category        24     2.4%
y_subcategory 1   7      0.7%
y_subcategory 2   10     1.0%
Y category        17     1.7%
With GROUP BY and GROUP BY GROUPING SETS I don't get what I want, and with PARTITION I'm getting syntax errors. I'm able to use it in simpler queries, but this one turned out to be very complicated, and I wonder whether it's possible to achieve this in a single PostgreSQL query.
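One possible shape for this, offered as a sketch rather than a tested answer to the real query: derive a category key from the subcategory name, then group by GROUPING SETS so that each category gets its detail rows plus one subtotal row. Everything in the sample CTE is an assumption made for illustration: the inline rows stand in for the elided big query, tl = 1000 is inferred from the percentages shown, and taking the first letter of the name as the category key (cat_key) is a guess at the naming rule.

```sql
WITH sample(subcat, cost, tl) AS (      -- hypothetical stand-in for the big query
    VALUES
        ('x_subcategory 1',  5, 1000),
        ('x_subcategory 2',  1, 1000),
        ('x_subcategory 3', 18, 1000),
        ('y_subcategory 1',  7, 1000),
        ('y_subcategory 2', 10, 1000)
),
tagged AS (
    -- assumed rule: the category is the first letter of the subcategory name
    SELECT upper(left(subcat, 1)) AS cat_key, subcat, cost, tl
    FROM sample
)
SELECT
    -- GROUPING(subcat) = 1 marks the per-category subtotal row
    CASE WHEN GROUPING(subcat) = 1 THEN cat_key || ' category' ELSE subcat END AS category,
    sum(cost) AS cost,
    round(sum(cost) / max(tl)::numeric * 100, 1) || ' %' AS percentage
FROM tagged
GROUP BY GROUPING SETS ((cat_key, subcat), (cat_key))
-- detail rows first within each category, then its subtotal
ORDER BY cat_key, GROUPING(subcat), subcat;
```

Sorting on GROUPING(subcat) is what places each subtotal directly after its own subcategories instead of collecting all subtotals at the end.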

KDB Select rows from a table based on one of its column while comparing it to another table

I have table1 as below.
num value
1 10
2 15
3 20
table2
ver value
1.0 5
2.0 15
3.0 18
Output should be as below. I need to select all rows from table1 such that table1.value <= table2.value.
num value
1 10
2 15
I tried this, it's not working.
select from table1 where value <= (exec value from table2)
From a logical point of view what you're asking kdb to compare is:
10 15 20<=5 15 18
Because these are equal lengths, kdb assumes you mean pairwise comparison, aka
10<=5
15<=15
20<=18
to which it would return
q)10 15 20<=5 15 18
010b
What you actually seem to mean (based on your expected output) is 10 15 20<=max(5 15 18). So in that case you would want:
q)t1:([]num:1 2 3;val:10 15 20)
q)t2:([]ver:1 2 3.;val:5 15 18)
q)select from t1 where val<=exec max val from t2
num val
-------
1 10
2 15
As an aside, you can't/shouldn't have a column called value as it clashes with a keyword
value is a keyword so don't assign to it.
Assuming you want all values from table1 with value less than the max value in table2 you could do:
q)table1:([]num:til 3;val:10 15 20)
q)table2:([]ver:`float$til 3;val:5 15 18)
q)select from table1 where val<=max table2`val
num val
-------
0 10
1 15

Insert rownumber repeatedly in records in t-sql

I want to add a row number to my records that counts up to a fixed number and then starts over. Example output:
RowNumber ID Name
1 20 a
2 21 b
3 22 c
1 23 d
2 24 e
3 25 f
1 26 g
2 27 h
3 28 i
1 29 j
2 30 k
I would rather use row_number() over (partition by ... order by ...), but my real records have no column whose values would produce a 1-3 row number.
I have already tried looping over each record to insert a 1-3 row count, but the loop hurts the performance of the query. The query will be used for an RDL report, so its performance must be as good as possible.
Any suggestions are welcome. Thanks.
Have you tried taking row_number() modulo 3?
SELECT
((row_number() over (order by ID) - 1) % 3) + 1 as RowNumber, -- cycles 1, 2, 3, 1, 2, 3, ...
ID,
Name
FROM your_table -- placeholder table name