Calculate percentage difference between two rows - postgresql

I have this query that produced the table below.
select season,
guildname,
count(guildname) as mp_count,
(count(guildname)/600::float)*100 as grank
from mp_rankings
group by season, guildname
order by grank desc
season
guildname
mp_count
grank
10
LEGENDS
56
9.33333333333333
9
LEGENDS
54
9
10
EVERGLADE
50
8.33333333333333
9
Mystic
46
7.66666666666667
10
Mystic
42
7
9
EVERGLADE
39
6.5
10
100
36
6
9
PARABELLUM
33
5.5
10
PARABELLUM
29
4.83333333333333
9
100
29
4.83333333333333
I wanted to create a new column that calculates the percentage difference between the two seasons using identical guildnames. For example:
season
guildname
mp_count
grank
prev_season_percent_diff
10
LEGENDS
56
9.33333333333333
0.33%
10
EVERGLADE
50
8.33333333333333
1.83%
The resulting table will only show the current season (which is the highest season value, 10 in this case) and adds a new column prev_season_percent_diff, which is the current season's grank minus the previous season's grank.
How can I achieve this?

Use a Common Table Expression ("CTE") for the grouped result and join it to itself to calculate the difference to the previous season:
with summary as (
select
season,
guildname,
count(*) as mp_count, -- simplified equivalent expression
count(*)/6 as grank -- simplified equivalent expression
from mp_rankings
group by season, guildname
)
select
a.season,
a.guildname,
a.mp_count,
a.grank,
a.mp_count - b.mp_count as prev_season_percent_diff
from summary a
left join summary b on b.guildname = a.guildname
and b.season = a.season - 1
where a.season = (select max(season) from summary)
order by a.grank desc
If you actually want a % in the result, concatenate a % to the difference calculation.

Related

KDB Select rows from a table based on one of its column while comparing it to another table

I have table1 as below.
num
value
1
10
2
15
3
20
table2
ver
value
1.0
5
2.0
15
3.0
18
Output should be as below. I need to select all rows from table1 such that table1.value <= table2.value.
num
value
1
10
2
15
I tried this, it's not working.
select from table1 where value <= (exec value from table2)
From a logical point of view what you're asking kdb to compare is:
10 15 20<=5 15 18
Because these are equal lengths, kdb assumes you mean pairwise comparison, aka
10<=5
15<=15
20<=18
to which it would return
q)10 15 20<=5 15 18
010b
What you actually seem to mean (based on your expected output) is 10 15 20<=max(5 15 18). So in that case you would want:
q)t1:([]num:1 2 3;val:10 15 20)
q)t2:([]ver:1 2 3.;val:5 15 18)
q)select from t1 where val<=exec max val from t2
num val
-------
1 10
2 15
As an aside, you can't/shouldn't have a column called value as it clashes with a keyword
value is a keyword so don't assign to it.
Assuming you want all values from table1 with value less than the max value in table2 you could do:
q)table1:([]num:til 3;val:10 15 20)
q)table2:([]ver:`float$til 3;val:5 15 18)
q)select from table1 where val<=max table2`val
num val
-------
0 10
1 15

calculating median to output similar results to over partition

i have a large table, here is a snippet of how it looks like
name class brand rating
12 d 1 3.8
22 d 1 3.9
33 a 2 1.1
12 d 1 2.3
12 a 3 3.4
44 b 1 9.8
22 c 2 3.0
i calculated for the average of the rating doing the below
select avg(rating) over(partition by name,class,brand) as avg_rating from df
i'm aware that postgres doesn't have a median function but i would like to calculate for that column and have the output in a similar structure to that of my window function for average
in case of even number of rows, i would like the average number between the middle two numbers
To get the median, you should use percentile_cont
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY rating) FROM df;

duplicating table columns in KDB

Consider the code below:
q)tab:flip `items`sales`prices!(`nut`bolt`cam`cog;6 8 0 3;10 20 15 20)
q)tab
items sales prices
------------------
nut 6 10
bolt 8 20
cam 0 15
cog 3 20
I would like to duplicate the prices column. I can write a query like this:
q)update prices_copy: prices from tab
I also can write a query like this:
q)select items, sales, prices, prices_copy: first prices by items from tab
Both would work. I would like to know how the "by" version would work and the motivation for writing each version. I cannot help but think the "by" version is more thinking in rows.
Your initial query would be ideally what you want for your duplicate column requirement.
The by creates groups of the column items in your example and collapses every other column in the select query according to the indices calculated from grouping items. More info on by here - http://code.kx.com/wiki/Reference/select and http://code.kx.com/wiki/JB:QforMortals2/queries_q_sql#The_by_Phrase
In your example, the column items is already unique and so no collapsing into groups is actually performed, however, the by will create nested lists from the other columns (i.e. lists of lists). The use of first will just un-nest the items column, thus collapsing it to a normal (long-typed) vector.
When the grouping is finished the by columns are used as the key column[s] of the result and you will see this by the use of a vertical line to the right hand side of the key column[s]. All other columns within the select query are placed to the right hand side of the key.
The logic of the by version coincidentally creates a copy of prices. But by changes the order:
q)ungroup select sales, prices by items from tab
items sales prices
------------------
bolt 8 20
cam 0 15
cog 3 20
nut 6 10
q)tab
items sales prices
------------------
nut 6 10
bolt 8 20
cam 0 15
cog 3 20
The by version works only because items is unique. For a tab with multiple values for item eg. 8#tab, the query only produces 4 values for prices_copy.
q)select items, sales, prices, prices_copy: first prices by items from 8#tab
items| items sales prices prices_copy
-----| ----------------------------------
bolt | bolt bolt 8 8 20 20 20
cam | cam cam 0 0 15 15 15
cog | cog cog 3 3 20 20 20
nut | nut nut 6 6 10 10 10
There is a fundamental difference between a simple update and update by queries.
Let's explore it by adding an extra column brand to the table
tab2:flip `items`sales`prices`brand!(`nut`bolt`cam`cog`nut`bolt`cam`cog;6 8 0 3 1 2 3 4;10 20 15 20 30 40 50 60;`b1`b1`b1`b1`b2`b2`b2`b2)
The following will now simply copy the column :
asc update prices_copy: prices from tab2
However, the following query is copying the first item price regardless of the brand and updating it for all other brands of same item.
asc ungroup select sales, prices,brand, prices_copy: first prices by items from tab2
items sales prices brand prices_copy
------------------------------------
bolt 2 40 b2 20
bolt 8 20 b1 20 //b2 price
cam 0 15 b1 15 //b2 price
cam 3 50 b2 15
cog 3 20 b1 20
cog 4 60 b2 20 //b2 price
nut 1 30 b2 10 //b2 price
nut 6 10 b1 10
update by might be useful in the case where you want to copy the max price of the items regardless of the brand or some other aggregation query.
asc ungroup select sales, prices,brand, prices_copy: max prices by items from tab2
items sales prices brand prices_copy
------------------------------------
bolt 2 40 b2 40
bolt 8 20 b1 40 //max price in bolts regardless of the brand
cam 0 15 b1 50
cam 3 50 b2 50
cog 3 20 b1 60
cog 4 60 b2 60
nut 1 30 b2 30
nut 6 10 b1 30

TSQL cumulative column from previous row [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Calculate a Running Total in SqlServer
I need to use values from previous row inorder to generate a cumulative value as shown below. Always for each Code for the year 2000 the starting Base is 100.
I need to ahieve this using tsql code.
id Code Yr Rate Base
1 4 2000 5 100
2 4 2001 7 107 (100+7)
3 4 2002 4 111 (107+4)
4 4 2003 8 119 (111+8)
5 4 2004 10 129 (119+10)
6 5 2000 2 100
7 5 2001 3 103 (100+3)
8 5 2002 8 111 (103+8)
9 5 2003 5 116 (111+5)
10 5 2004 4 120 (116+4)
OK. We have table like this
CREATE Table MyTbl(id INT PRIMARY KEY IDENTITY(1,1), Code INT, Yr INT, Rate INT)
And we would like to calculate cumulative value by Code.
So we can use query like this:
1) recursion (requires more resources, but outputs the result as in the example)
with cte as
(SELECT *, ROW_NUMBER()OVER(PARTITION BY Code ORDER BY Yr ASC) rn
FROM MyTbl),
recursion as
(SELECT id,Code,Yr,Rate,rn, CAST(NULL as int) as Tmp_base, CAST('100' as varchar(25)) AS Base FROM cte
WHERE rn=1
UNION ALL
SELECT cte.id,cte.Code,cte.Yr,cte.Rate,cte.rn,
CAST(recursion.Base as int),
CAST(recursion.Base+cte.Rate as varchar(25))
FROM recursion JOIN cte ON recursion.Code=cte.Code AND recursion.rn+1=cte.rn
)
SELECT id,Code,Yr,Rate,
CAST(Base as varchar(10))+ISNULL(' ('+ CAST(Tmp_base as varchar(10))+'+'+CAST(Rate as varchar(10))+')','') AS Base
FROM recursion
ORDER BY 1
OPTION(MAXRECURSION 0)
2) or we can use a faster query without using recursion. but the result is impossible to generate the strings like '107 (100+7)' (only strings like '107')
SELECT *,
100 +
(SELECT ISNULL(SUM(rate),0) /*we need to calculate only the sum in subquery*/
FROM MyTbl AS a
WHERE
a.Code=b.Code /*the year in subquery equals the year in main query*/
AND a.Yr<b.Yr /*main feature in our subquery*/
) AS base
FROM MyTbl AS b

Extract Unique Time Slices in Oracle

I use Oracle 10g and I have a table that stores a snapshot of data on a person for a given day. Every night an outside process adds new rows to the table for any person whose had any changes to their core data (stored elsewhere). This allows a query to be written using a date to find out what a person 'looked' like on some past day. A new row is added to the table even if only a single aspect of the person has changed--the implication being that many columns have duplicate values from slice to slice since not every detail changed in each snapshot.
Below is a data sample:
SliceID PersonID StartDt Detail1 Detail2 Detail3 Detail4 ...
1 101 08/20/09 Red Vanilla N 23
2 101 08/31/09 Orange Chocolate N 23
3 101 09/15/09 Yellow Chocolate Y 24
4 101 09/16/09 Green Chocolate N 24
5 102 01/10/09 Blue Lemon N 36
6 102 01/11/09 Indigo Lemon N 36
7 102 02/02/09 Violet Lemon Y 36
8 103 07/07/09 Red Orange N 12
9 104 01/31/09 Orange Orange N 12
10 104 10/20/09 Yellow Orange N 13
I need to write a query that pulls out time slices records where some pertinent bits, not the whole record, have changed. So, referring to the above, if I only want to know the slices in which Detail3 has changed from its previous value, then I would expect to only get rows having SliceID 1, 3 and 4 for PersonID 101 and SliceID 5 and 7 for PersonID 102 and SliceID 8 for PersonID 103 and SliceID 9 for PersonID 104.
I'm thinking I should be able to use some sort of Oracle Hierarchical Query (using CONNECT BY [PRIOR]) to get what I want, but I have not figured out how to write it yet. Perhaps YOU can help.
Thanks you for your time and consideration.
Here is my take on the LAG() solution, which is basically the same as that of egorius, but I show my workings ;)
SQL> select * from
2 (
3 select sliceid
4 , personid
5 , startdt
6 , detail3 as new_detail3
7 , lag(detail3) over (partition by personid
8 order by startdt) prev_detail3
9 from some_table
10 )
11 where prev_detail3 is null
12 or ( prev_detail3 != new_detail3 )
13 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
8 103 07-JUL-09 N
9 104 31-JAN-09 N
7 rows selected.
SQL>
The point about this solution is that it hauls in results for 103 and 104, who don't have slice records where detail3 has changed. If that is a problem we can apply an additional filtration, to return only rows with changes:
SQL> with subq as (
2 select t.*
3 , row_number () over (partition by personid
4 order by sliceid ) rn
5 from
6 (
7 select sliceid
8 , personid
9 , startdt
10 , detail3 as new_detail3
11 , lag(detail3) over (partition by personid
12 order by startdt) prev_detail3
13 from some_table
14 ) t
15 where t.prev_detail3 is null
16 or ( t.prev_detail3 != t.new_detail3 )
17 )
18 select sliceid
19 , personid
20 , startdt
21 , new_detail3
22 , prev_detail3
23 from subq sq
24 where exists ( select null from subq x
25 where x.personid = sq.personid
26 and x.rn > 1 )
27 order by sliceid
28 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
SQL>
edit
As egorius points out in the comments, the OP does want hits for all users, even if they haven't changed, so the first version of the query is the correct solution.
In addition to OMG Ponies' answer: if you need to query slices for all persons, you'll need partition by:
SELECT s.sliceid
, s.personid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (
PARTITION BY t.personid ORDER BY t.startdt
) prev_val
FROM t) s
WHERE (s.prev_val IS NULL OR s.prev_val != s.detail3)
I think you'll have better luck with the LAG function:
SELECT s.sliceid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t) s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)
Subquery Factoring alternative:
WITH slices AS (
SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t)
SELECT s.sliceid
FROM slices s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)