KDB Table Pivot reverse engineering problem - kdb

In KDB, I have a simple table that I am trying to pivot. Even following https://code.kx.com/q/kb/pivoting-tables/ I am still lost; I am a newbie to KDB.
I have a table
t:([]sym:`IBM`FB`TESLA`IBM;exchange:`A`A`B`B;OB_p1:4#100.1;OB_p2:100.2 100.2 100.3 100.4;OB_p3:100.5 100.6 100.7 100.8;OB_p4:100.8 100.8 100.8 100.9)
sym exchange OB_p1 OB_p2 OB_p3 OB_p4
IBM A 100.1 100.2 100.5 100.8
FB A 100.1 100.2 100.6 100.8
TESLA B 100.1 100.3 100.7 100.8
IBM B 100.1 100.4 100.8 100.9
I am trying to create the following table:
exchange OB    IBM   FB    TESLA
--------------------------------
A        OB_p1 100.1 100.1
A        OB_p2 100.2 100.2
A        OB_p3 100.5 100.6
A        OB_p4 100.8 100.8
B        OB_p1 100.1       100.1
B        OB_p2 100.4       100.3
B        OB_p3 100.8       100.7
B        OB_p4 100.9       100.8
I have tried the pivot-table recommendation on https://code.kx.com/q/kb/pivoting-tables/ with no luck. I was wondering if anyone could kindly assist with this task?

Sample table:
t:flip `sym`exchange`OB_p1`OB_p2`OB_p3`OB_p4!flip (
(`IBM;`A;100.1;100.2;100.5;100.8);
(`FB;`A;100.1;100.2;100.6;100.8);
(`TESLA;`B;100.1;100.3;100.7;100.8);
(`IBM;`B;100.1;100.4;100.8;100.9))
Gives:
sym exchange OB_p1 OB_p2 OB_p3 OB_p4
--------------------------------------
IBM A 100.1 100.2 100.5 100.8
FB A 100.1 100.2 100.6 100.8
TESLA B 100.1 100.3 100.7 100.8
IBM B 100.1 100.4 100.8 100.9
unpivot function taken from https://gist.github.com/rianoc/a14b832f12908c4785e2297995db1e76
unpivot:{[tab;baseCols;pivotCols;kCol;vCol]
 base:?[tab;();0b;{x!x}(),baseCols];
 newCols:{[k;v;t;p] flip (k;v)!(count[t]#p;t p)}[kCol;vCol;tab] each pivotCols;
 baseCols xasc raze {[b;n] b,'n}[base] each newCols
 }
Calling:
t:unpivot[t;`sym`exchange;`OB_p1`OB_p2`OB_p3`OB_p4;`OB;`val]
Gives:
sym exchange OB val
--------------------------
FB A OB_p1 100.1
FB A OB_p2 100.2
FB A OB_p3 100.6
FB A OB_p4 100.8
IBM A OB_p1 100.1
IBM A OB_p2 100.2
IBM A OB_p3 100.5
IBM A OB_p4 100.8
IBM B OB_p1 100.1
IBM B OB_p2 100.4
IBM B OB_p3 100.8
IBM B OB_p4 100.9
TESLA B OB_p1 100.1
TESLA B OB_p2 100.3
TESLA B OB_p3 100.7
TESLA B OB_p4 100.8
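For intuition, here is a hypothetical pure-Python sketch of the same unpivot (melt) manipulation on the sample rows; the names `unpivot`, `base_cols`, etc. are illustrative and not part of the q code above.

```python
# Illustrative sketch: one output row per (input row, pivot column) pair,
# sorted by the base columns, mirroring the q unpivot above.
rows = [
    {"sym": "IBM",   "exchange": "A", "OB_p1": 100.1, "OB_p2": 100.2, "OB_p3": 100.5, "OB_p4": 100.8},
    {"sym": "FB",    "exchange": "A", "OB_p1": 100.1, "OB_p2": 100.2, "OB_p3": 100.6, "OB_p4": 100.8},
    {"sym": "TESLA", "exchange": "B", "OB_p1": 100.1, "OB_p2": 100.3, "OB_p3": 100.7, "OB_p4": 100.8},
    {"sym": "IBM",   "exchange": "B", "OB_p1": 100.1, "OB_p2": 100.4, "OB_p3": 100.8, "OB_p4": 100.9},
]

def unpivot(rows, base_cols, pivot_cols, k_col, v_col):
    out = []
    for r in rows:
        for p in pivot_cols:
            new = {c: r[c] for c in base_cols}  # keep the base columns
            new[k_col] = p                      # pivot column name goes in kCol
            new[v_col] = r[p]                   # its value goes in vCol
            out.append(new)
    return sorted(out, key=lambda r: [r[c] for c in base_cols])

long_rows = unpivot(rows, ["sym", "exchange"],
                    ["OB_p1", "OB_p2", "OB_p3", "OB_p4"], "OB", "val")
print(long_rows[0])  # {'sym': 'FB', 'exchange': 'A', 'OB': 'OB_p1', 'val': 100.1}
```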
piv function taken from https://code.kx.com/q/kb/pivoting-tables/; the f from the link is simplified for this use case when naming the pivoted columns:
piv:{[t;k;p;v;f;g]
 v:(),v;
 G:group flip k!(t:.Q.v t)k;
 F:group flip p!t p;
 count[k]!g[k;P;C]xcols 0!key[G]!flip(C:f[v]P:flip value flip key F)!raze
  {[i;j;k;x;y]
   a:count[x]#x 0N;
   a[y]:x y;
   b:count[x]#0b;
   b[y]:1b;
   c:a i;
   c[k]:first'[a[j]#'where'[b j]];
   c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]}
f:{[v;P]P[;0]}
g:{[k;P;c]k,c}
Calling:
piv[t;`exchange`OB;(),`sym;(),`val;f;g]
Gives:
exchange OB   | FB    IBM   TESLA
--------------| -----------------
A        OB_p1| 100.1 100.1
A        OB_p2| 100.2 100.2
A        OB_p3| 100.6 100.5
A        OB_p4| 100.8 100.8
B        OB_p1|       100.1 100.1
B        OB_p2|       100.4 100.3
B        OB_p3|       100.8 100.7
B        OB_p4|       100.9 100.8
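The pivot step itself can be sketched in plain Python, for intuition: key on (exchange, OB) and spread sym out into one column per symbol, leaving a null where a sym has no value. The sample data here is a hand-picked subset of the long table above.

```python
# Illustrative sketch of the pivot: group long rows by the key columns
# and turn the sym values into columns.
long_rows = [
    ("A", "OB_p1", "IBM", 100.1), ("A", "OB_p1", "FB", 100.1),
    ("A", "OB_p2", "IBM", 100.2), ("A", "OB_p2", "FB", 100.2),
    ("B", "OB_p1", "TESLA", 100.1), ("B", "OB_p1", "IBM", 100.1),
]
syms = ["FB", "IBM", "TESLA"]

pivoted = {}
for exchange, ob, sym, val in long_rows:
    # start every key with a null entry per sym, then fill what we have
    row = pivoted.setdefault((exchange, ob), dict.fromkeys(syms))
    row[sym] = val

print(pivoted[("A", "OB_p1")])  # {'FB': 100.1, 'IBM': 100.1, 'TESLA': None}
```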

Rian's answer is a more general approach, but this should help you understand the manipulation:
q)ungroup exec((`FB`IBM`TESLA!3#0n),sym!flip(OB_p1;OB_p2;OB_p3;OB_p4))by exchange,OB:count[i]#enlist`OB_p1`OB_p2`OB_p3`OB_p4 from t
exchange OB    FB    IBM   TESLA
--------------------------------
A        OB_p1 100.1 100.1
A        OB_p2 100.2 100.2
A        OB_p3 100.6 100.5
A        OB_p4 100.8 100.8
B        OB_p1       100.1 100.1
B        OB_p2       100.4 100.3
B        OB_p3       100.8 100.7
B        OB_p4       100.9 100.8
In a functional form it would be:
q)c:cols[t]where cols[t]like"*OB*";
q)s:exec distinct sym from t;
q)ungroup ?[t;();`exchange`OB!(`exchange;(#;(count;`i);enlist enlist c));(,;s!count[s]#0n;(!;`sym;(flip;enlist,c)))]
exchange OB    IBM   FB    TESLA
--------------------------------
A        OB_p1 100.1 100.1
A        OB_p2 100.2 100.2
A        OB_p3 100.5 100.6
A        OB_p4 100.8 100.8
B        OB_p1 100.1       100.1
B        OB_p2 100.4       100.3
B        OB_p3 100.8       100.7
B        OB_p4 100.9       100.8

Related

Rank rows per group based on numeric column values in PostgreSQL

I have following table in PostgreSQL 11.0
drug_id synonym score
96165807064 chembl490421 0.667
96165807064 querciformolide a 1.0
96165807064 querciformolide b 1.0
96165807066 chembl196832 1.0
96165807066 cpiylcsbeicghy-uhfffaoysa-n 0.875
96165807066 schembl1752046 0.938
96165807066 stk694847 0.75
96165807066 molport-006-827-808 0.812
96165807066 akos016348681 0.625
96165807066 akos004112738 0.688
96165807066 mcule-5237395512 0.562
I would like to add a column with 'rank' group by drug_id based on the score column.
Following is the expected output
drug_id synonym score rank
96165807064 querciformolide a 1.0 1
96165807064 querciformolide b 1.0 1
96165807064 chembl490421 0.667 2
96165807066 chembl196832 1.0 1
96165807066 schembl1752046 0.938 2
96165807066 cpiylcsbeicghy-uhfffaoysa-n 0.875 3
96165807066 molport-006-827-808 0.812 4
96165807066 stk694847 0.75 5
96165807066 akos004112738 0.688 6
96165807066 akos016348681 0.625 7
96165807066 mcule-5237395512 0.562 8
I am using following query:
SELECT DISTINCT
    drug_id,
    synonym,
    score,
    DENSE_RANK() OVER (
        PARTITION BY drug_id
        ORDER BY score
    ) rank_number
FROM tbl
ORDER BY drug_id, score DESC;
I am not getting the expected output with the above query.
drug_id synonym score rank_number
96165807064 querciformolide a 1.0 2
96165807064 querciformolide b 1.0 2
96165807064 chembl490421 0.667 1
96165807066 chembl196832 1.0 15
96165807066 schembl1752046 0.938 14
96165807066 cpiylcsbeicghy-uhfffaoysa-n 0.875 13
96165807066 molport-006-827-808 0.812 12
96165807066 stk694847 0.75 11
96165807066 akos004112738 0.688 10
96165807066 akos016348681 0.625 9
96165807066 mcule-5237395512 0.562 8
You can use the following query; the fix is to order the window by score descending (ordering by drug_id inside the partition would be redundant, since each partition holds a single drug_id):
SELECT
    t.drug_id,
    t.synonym,
    t.score,
    DENSE_RANK() OVER (
        PARTITION BY t.drug_id
        ORDER BY t.score DESC
    ) rank
FROM test t;
I created a sql fiddle to show the query working.
https://www.db-fiddle.com/f/p9ANUghi8TxLgXrhUHsUaY/3
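For intuition, the DENSE_RANK semantics (equal scores share a rank, and no rank numbers are skipped) can be sketched in plain Python on a subset of the sample data:

```python
# Illustrative sketch of DENSE_RANK() OVER (PARTITION BY drug_id ORDER BY score DESC).
from itertools import groupby
from operator import itemgetter

rows = [
    (96165807064, "chembl490421", 0.667),
    (96165807064, "querciformolide a", 1.0),
    (96165807064, "querciformolide b", 1.0),
    (96165807066, "chembl196832", 1.0),
    (96165807066, "schembl1752046", 0.938),
]

ranked = []
rows.sort(key=itemgetter(0))                      # PARTITION BY drug_id
for drug_id, grp in groupby(rows, key=itemgetter(0)):
    grp = sorted(grp, key=itemgetter(2), reverse=True)  # ORDER BY score DESC
    # dense rank: number the DISTINCT scores, highest first
    distinct_scores = sorted({r[2] for r in grp}, reverse=True)
    rank = {s: i + 1 for i, s in enumerate(distinct_scores)}
    ranked += [(drug_id, syn, score, rank[score]) for drug_id, syn, score in grp]

print(ranked[0])  # (96165807064, 'querciformolide a', 1.0, 1)
```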

How to consistently sum lists of values contained in a table?

I have the following two tables:
t1:([]sym:`AAPL`GOOG; histo_dates1:(2000.01.01+til 10;2000.01.01+til 10);histo_values1:(til 10;5+til 10));
t2:([]sym:`AAPL`GOOG; histo_dates2:(2000.01.05+til 5;2000.01.06+til 4);histo_values2:(til 5; 2+til 4));
What I want is to sum the histo_values of each symbol across the histo_dates, such that the resulting table would look like this:
t:([]sym:`AAPL`GOOG; histo_dates:(2000.01.01+til 10;2000.01.01+til 10);histo_values:(0 1 2 3 4 6 8 10 12 9;5 6 7 8 9 12 14 16 18 14))
So the resulting dates histo_dates should be the union of histo_dates1 and histo_dates2, and histo_values should be the sum of histo_values1 and histo_values2 across dates.
EDIT:
I insist on the union of the dates, as I want the resulting histo_dates to be the union of both histo_dates1 and histo_dates2.
There are a few ways. One would be to ungroup to remove nesting, join the tables, aggregate on sym/date and then regroup on sym:
q)0!select histo_dates:histo_dates1, histo_values:histo_values1 by sym from select sum histo_values1 by sym, histo_dates1 from ungroup[t1],cols[t1]xcol ungroup[t2]
sym histo_dates histo_values
-------------------------------------------------------------------------------------------------------------------------------------------
AAPL 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 0 1 2 3 4 6 8 10 12 9
GOOG 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 5 6 7 8 9 12 14 16 18 14
A possibly faster way would be to make each row a dictionary and then key the tables on sym and add them:
q)select sym:s, histo_dates:key each v, histo_values:value each v from (1!select s, d!'v from `s`d`v xcol t1)+(1!select s, d!'v from `s`d`v xcol t2)
sym histo_dates histo_values
-------------------------------------------------------------------------------------------------------------------------------------------
AAPL 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 0 1 2 3 4 6 8 10 12 9
GOOG 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 5 6 7 8 9 12 14 16 18 14
Another option would be to use a plus join pj:
q)0!`sym xgroup 0!pj[ungroup `sym`histo_dates`histo_values xcol t1;2!ungroup `sym`histo_dates`histo_values xcol t2]
sym histo_dates histo_values
-------------------------------------------------------------------------------------------------------------------------------------------
AAPL 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 0 1 2 3 4 6 8 10 12 9
GOOG 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 5 6 7 8 9 12 14 16 18 14
See here for more on plus joins: https://code.kx.com/v2/ref/pj/
EDIT:
To explicitly make sure the result has the union of the dates, you could use a union join:
q)0!`sym xgroup select sym,histo_dates,histo_values:hv1+hv2 from 0^uj[2!ungroup `sym`histo_dates`hv1 xcol t1;2!ungroup `sym`histo_dates`hv2 xcol t2]
sym histo_dates histo_values
-------------------------------------------------------------------------------------------------------------------------------------------
AAPL 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 0 1 2 3 4 6 8 10 12 9
GOOG 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 5 6 7 8 9 12 14 16 18 14
Another way:
// rename the columns to be common names, ungroup the tables, and place the key on `sym and `histo_dates
q){2!ungroup `sym`histo_dates`histo_values xcol x} each (t1;t2)
// add them together (or use pj in place of +), group on `sym
`sym xgroup (+) . {2!ungroup `sym`histo_dates`histo_values xcol x} each (t1;t2)
// and to test this matches t, remove the key from the resulting table
q)t~0!`sym xgroup (+) . {2!ungroup `sym`histo_dates`histo_values xcol x} each (t1;t2)
1b
Another possible way using functional amend
//Column join the histo_dates* columns and get the distinct dates - drop idx
//Using a functional apply use the idx to determine which values to plus
//Join the two tables using sym as the key - Find the idx of common dates
(enlist `idx) _select sym,histo_dates:distinct each (histo_dates1,'histo_dates2),
histovalues:{#[x;z;+;y]}'[histo_values1;histo_values2;idx],idx from
update idx:(where each histo_dates1 in' histo_dates2) from ((1!t1) uj 1!t2)
One possible problem with this is that deriving idx depends on the date columns being sorted, which is usually the case.
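The union-join idea above can be sketched in plain Python for intuition (AAPL row only; the dict names are illustrative): map each series' dates to values, take the union of the dates, and add values with missing entries treated as 0.

```python
# Illustrative sketch: union of dates, summing values, missing -> 0.
from datetime import date, timedelta

d0 = date(2000, 1, 1)
aapl1 = {d0 + timedelta(days=i): i for i in range(10)}     # histo_dates1/values1
aapl2 = {d0 + timedelta(days=4 + i): i for i in range(5)}  # histo_dates2/values2

merged = {d: aapl1.get(d, 0) + aapl2.get(d, 0)
          for d in sorted(set(aapl1) | set(aapl2))}

print(list(merged.values()))  # [0, 1, 2, 3, 4, 6, 8, 10, 12, 9]
```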

KDB+ How to join data for particular dates

I have the following table containing some time-series data about some countries:
t1 : ([]dates:"d"$4+til 6) cross ([]country:`PT`AR`MR`LT; category1:1+til 4)
dates country category1
----------------------------
2000.01.05 PT 1
2000.01.05 AR 2
2000.01.05 MR 3
2000.01.05 LT 4
2000.01.06 PT 1
2000.01.06 AR 2
2000.01.06 MR 3
2000.01.06 LT 4
2000.01.07 PT 1
2000.01.07 AR 2
2000.01.07 MR 3
2000.01.07 LT 4
..
I have another table containing some complementary data for t1, but that are only valid from a certain point in time, as follows:
t2 : (([]validFrom:"d"$(0;6)) cross ([]country:`PT`AR`MR`LT)),'([]category2:1000*(1+til 8))
validFrom country category2
----------------------------
2000.01.01 PT 1000
2000.01.01 AR 2000
2000.01.01 MR 3000
2000.01.01 LT 4000
2000.01.07 PT 5000
2000.01.07 AR 6000
2000.01.07 MR 7000
2000.01.07 LT 8000
My question is: how do I join t1 and t2 to get the category2 column only for dates in t1 that are "compliant" with the validFrom dates in t2, such that the resulting table would look like this:
dates country category1 category2
--------------------------------------
2000.01.05 PT 1 1000
2000.01.05 AR 2 2000
2000.01.05 MR 3 3000
2000.01.05 LT 4 4000
2000.01.06 PT 1 1000
2000.01.06 AR 2 2000
2000.01.06 MR 3 3000
2000.01.06 LT 4 4000
2000.01.07 PT 1 5000
2000.01.07 AR 2 6000
2000.01.07 MR 3 7000
2000.01.07 LT 4 8000
..
You may use an asof join (aj) to get the most recent category2 from t2 by date:
aj[`country`dates;t1;`dates xasc `dates xcol t2]
Just remember to rename the validFrom column to dates in t2 (`dates xcol) and sort it by dates (`dates xasc).
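The asof lookup itself can be sketched in plain Python, for intuition: for each (country, date) in t1, pick category2 from the row with the most recent validFrom <= date. Only the PT rows are shown; the names are illustrative.

```python
# Illustrative sketch of the asof-join lookup via binary search.
from bisect import bisect_right
from datetime import date

# per-country (validFrom, category2) pairs, sorted by validFrom
t2 = {
    "PT": [(date(2000, 1, 1), 1000), (date(2000, 1, 7), 5000)],
}

def asof(country, d):
    pairs = t2[country]
    # index of the last validFrom <= d
    i = bisect_right([v for v, _ in pairs], d) - 1
    return pairs[i][1] if i >= 0 else None

print(asof("PT", date(2000, 1, 6)))  # 1000
print(asof("PT", date(2000, 1, 7)))  # 5000
```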

postgresql average of numrange

I have a Postgresql table with following data
Power_range;unit; Date
[0.055,0.065];un_MW_el;14.01.1985
[0.02,0.02];un_MW_el;22.08.1985
[0.075,0.085];un_MW_el;09.04.1986
[0.055,0.055];un_MW_el;01.08.1986
[0.065,0.065];un_MW_el;19.01.1987
[0.075,0.075];un_MW_el;16.04.1987
[0.055,0.055];un_MW_el;15.05.1987
How can we query to list the average of numrange for each year/row?
The end result should be something like
0.060;1985
0.02;1985
0.08;1986
0.055;1986
0.065;1987
0.075;1987
0.055;1987
I'm not sure if there is a built-in avg function for ranges, but you can easily mock it up with arithmetic:
t=# select ((lower(power_range) + upper(power_range))/2)::float(2),extract(year from d) avg From rt;
float4 | avg
--------+------
0.06 | 1985
0.02 | 1985
0.08 | 1986
0.055 | 1986
0.065 | 1987
0.075 | 1987
0.055 | 1987
(7 rows)
where rt is created from your sample:
t=# create table rt(Power_range numrange, unit text, d date);
CREATE TABLE
t=# copy rt from stdin delimiter ';';
https://www.postgresql.org/docs/current/static/functions-range.html
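The "average" of a numrange here is just the midpoint of its bounds, mirroring (lower(power_range) + upper(power_range)) / 2. A small Python sketch, using exact decimal arithmetic to avoid float rounding:

```python
# Illustrative sketch: midpoint of a range's bounds.
from fractions import Fraction

def range_avg(lower, upper):
    # compute exactly on decimal strings, then convert back to float
    return float((Fraction(str(lower)) + Fraction(str(upper))) / 2)

print(range_avg(0.055, 0.065))  # 0.06
```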

Summary and Details section of a record in same report

I have table like below
ParkingLot Vehicle City Two/Four Owner Date Fee
p1 v1 c1 Two xxx 01-OCT-14 10
p1 v1 c1 Two yyy 01-OCT-14 11
p1 v1 c1 Four zzz 01-OCT-14 12
p1 v1 c2 Two aaa 01-OCT-14 13
p1 v1 c2 Two yyy 01-OCT-14 11
p1 v1 c2 Four ddd 01-OCT-14 18
p1 v2 c1 Two fff 01-OCT-14 20
p1 v2 c1 Two yyy 01-OCT-14 10
p1 v2 c1 Four hhh 01-OCT-14 10
p1 v2 c2 Two xxx 01-OCT-14 54
p1 v2 c2 Two iii 01-OCT-14 10
p1 v2 c2 Four zzz 01-OCT-14 66
p1 v1 c1 Two xxx 02-OCT-14 66
p1 v1 c1 Two yyy 02-OCT-14 2
p1 v1 c1 Four zzz 02-OCT-14 44
p1 v1 c2 Two aaa 02-OCT-14 11
p1 v1 c2 Two yyy 02-OCT-14 11
p1 v1 c2 Four ddd 02-OCT-14 18
p1 v2 c1 Two fff 02-OCT-14 44
p1 v2 c1 Two yyy 02-OCT-14 10
p1 v2 c1 Four hhh 02-OCT-14 88
p1 v2 c2 Two xxx 02-OCT-14 54
p1 v2 c2 Two iii 02-OCT-14 10
p1 v2 c2 Four zzz 02-OCT-14 33
..........
This data i need in Crystal reports in below format
SUMMARY
P1
v1
ParkingLot Vehicle City 01-OCT-14 02-OCT-14
p1 v1 c1 33 112
p1 v1 c2 42 40
p1 v1 Total 66 152
v2
ParkingLot Vehicle City 01-OCT-14 02-OCT-14
p1 v2 c1 40 142
p1 v2 c2 130 97
p1 v2 Total 170 239
DETAILS
v1
ParkingLot Vehicle City Two/Four Owner 01-OCT-14 02-OCT-14
p1 v1 c1 Two xxx 10 66
p1 v1 c1 Two yyy 11 2
p1 v1 c1 Two Total 21 68
p1 v1 c1 Four zzz 12 44
p1 v1 c1 Four Total 12 44
p1 v1 c1 ALL Total 33 112
p1 v1 c2 Two aaa 13 11
p1 v1 c2 Two yyy 11 11
p1 v1 c2 Two Total 24 22
p1 v1 c2 Four ddd 18 18
p1 v1 c2 Four Total 18 18
p1 v1 c1 ALL Total 42 40
p1 v1 ALL ALL Total 66 152
v2
ParkingLot Vehicle City Two/Four Owner 01-OCT-14 02-OCT-14
p1 v2 c1 Two fff 20 44
p1 v2 c1 Two yyy 10 10
p1 v2 c1 Two Total 30 54
p1 v2 c1 Four hhh 10 88
p1 v2 c1 Four Total 10 88
p1 v2 c1 ALL Total 40 142
p1 v2 c2 Two xxx 54 54
p1 v2 c2 Two iii 10 10
p1 v2 c2 Two Total 64 64
p1 v2 c2 Four zzz 66 33
p1 v2 c2 Four Total 66 33
p1 v2 c2 ALL Total 130 97
p1 v2 ALL ALL Total 170 239
At first I tried making a subreport for the details section and a main report for the summary, and I successfully got the desired result without cross-tabs. But since there are many parking lots (p1, p2, p3, ...), the subreport is called for each and every detail section of P, which will hurt performance.
Please let me know how to do this in a single report, without cross-tabs.
Thanks in advance
Use the same idea of a subreport for the details section and a main report for the summary, but group your subreport by Vehicle; you will get the desired result even with hundreds of different Vehicles. Place that subreport in the report footer, so it shows up after your summary.