Pass a table as condition with where clause in kdb - kdb

I have a table t:
t:([] sym:`GOOG`AMZN; px:10 20; vol:100 200);
Is it possible to pass a sub-table as a where clause condition to the table?
The query below throws a type error:
select from t where ([] sym:enlist `GOOG; px:enlist 10)

Yes, it is possible:
q)select from t where([]sym;px) in ([] sym:enlist `GOOG; px:enlist 10)
sym  px vol
-----------
GOOG 10 100
Update: however, if t is large this should be much faster:
q)([] sym:enlist `GOOG; px:enlist 10)#2!t
sym  px| vol
-------| ---
GOOG 10| 100
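The same idea extends to a condition table with several rows, without naming its columns by hand. This is only a sketch: the extra MSFT row and the names c and r are made up for illustration, and it assumes every column of c also exists in t.

```q
/ Hypothetical sample data: t as in the question plus one extra row
t:([] sym:`GOOG`AMZN`MSFT; px:10 20 30; vol:100 200 300)
/ c is the sub-table of conditions; each row is one (sym;px) pair to match
c:([] sym:`GOOG`MSFT; px:10 30)
/ Take c's columns from t, test each row for membership in c, keep the matches
r:t where (cols[c]#t) in c
```

Here r should hold the GOOG and MSFT rows of t, with all three columns.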

Related

Fby q/kdb aggregation function

I want to apply an fby using count distinct, i.e.
select from t where 1=(count distinct; column) fby another_column
How can I do this?
To add to what's already up there, you could do this in fewer characters using #:
q)n:100
// Create a table where all entries for sym=`a are size 10
q)show t:update size:10 from ([]sym:n#`a`b`c`d;size:n?200) where sym = `a
sym size
--------
a   10
b   28
c   51
d   64
a   10
b   43
...
// Use count distinct# to select from the table as per your requirements
q)select from t where 1=(count distinct#;size) fby sym
sym size
--------
a   10
a   10
a   10
a   10
a   10
0N! is a great operator for checking the operation of these queries 'in-flight'.
Using it we can see that count distinct fails on its own: it tries to count the function distinct itself, which returns 1.
q)select from t where 1=(0N!count distinct; size) fby sym
1
'type
[0] select from t where 1=(0N!count distinct; size) fby sym
However with dyadic # we can create a handy projection
q)select from t where 1=(0N!(count distinct#); size) fby sym
##[?:]
sym size
--------
a   10
a   10
a   10
a   10
...
I've had to wrap the projection in brackets here to prevent 0N! getting sucked into the count distinct# projection. In k-speak this effectively translates to 'count the result of the distinct operator applied to whatever the second argument to # is'. Quite handy for code-golfing.
It'd be easier to know for sure if you provide a sample table and desired output, but one of the following will likely help.
The simplest solution would be to use an anonymous function:
select from t where 1=({count distinct x};c1) fby c2
Alternatively you can use this syntax which I first saw used in Nick Psaris' new book Fun Q:
select from t where 1=(count distinct ::;c1) fby c2
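Since the question gave no sample table, here is a small made-up one to show what the anonymous-function version returns. The table contents and the name r are assumptions for illustration only.

```q
/ Hypothetical sample table: c2 is the grouping column, c1 the value column
t:([] c2:`a`a`b`b`c; c1:1 1 2 3 4)
/ Keep only rows whose c2 group holds exactly one distinct c1 value
r:select from t where 1=({count distinct x};c1) fby c2
```

Group a has one distinct value (1), b has two (2 and 3) and c has one (4), so r keeps the two a rows and the c row.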

Pivot table with multiple value columns in KDB+

I would like to transform the following two row table generated by:
tb: ([] time: 2010.01.01 2010.01.01; side:`Buy`Sell; price:100 101; size:30 50)
time side price size
--------------------------------
2010.01.01 Buy 100 30
2010.01.01 Sell 101 50
To the table below with single row:
tb1: ([] enlist time: 2010.01.01; enlist price_buy:100; enlist price_sell:101; enlist size_buy:30; enlist size_sell:50)
time price_buy price_sell size_buy size_sell
-----------------------------------------------------
2010.01.01 100 101 30 50
What is the most efficient way to achieve this?
(select price_buy:price, size_buy:size by time from tb where side = `Buy) lj select price_sell:price, size_sell:size by time from tb where side = `Sell
time      | price_buy size_buy price_sell size_sell
----------| ---------------------------------------
2010.01.01| 100       30       101        50
If you wanted to avoid 2 select statements:
raze each select `price_buy`price_sell!(side!price)#/:`Buy`Sell, `size_buy`size_sell!(side!size)#/:`Buy`Sell by time from tb
As an additional note, having a date column labeled time can be misleading. Typical financial tables in kdb have the format date time sym etc
Edit: Functional form for dynamic column generation:
{x[0] lj x[1]}[{?[`tb;enlist (=;`side;enlist `$x);(enlist `time)!enlist `time;(`$("price",x;"size",x))!(`price;`size)]} each ("Sell";"Buy")]
time      | priceSell sizeSell priceBuy sizeBuy
----------| -----------------------------------
2010.01.01| 101       50       100      30
The general pivot function on the Kx website can do this, see https://code.kx.com/q/kb/pivoting-tables/
q)piv[tb;(),`time;(),`side;`price`size;{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]};{x,z}]
time      | Buyprice Sellprice Buysize Sellsize
----------| -----------------------------------
2010.01.01| 100      101       30      50
I have a pivot function on GitHub, but it doesn't support multiple columns:
.math.st.pivot: {[t;rc;cf;ff]
P: asc distinct t cf;
Pcol: `$string[P] cross "_",/:string key ff;
t: ?[t;();rc!rc;key[ff]!{({[x;y;z] z each y#group x}[;;z];x;y)}[cf]'[key ff;value ff]];
t: ![t;();0b; Pcol! raze {((';#);x;$[-11h=type y;enlist;::] y)}'[key ff]'[P] ];
![t;();0b;key ff]
};
But you can left join to achieve expected result:
.math.st.pivot[tb;enlist`time;`side;enlist[`price]!enlist first]
lj .math.st.pivot[tb;enlist`time;`side;enlist[`size]!enlist first]
Looks like adding support for multiple columns is a good idea.

KDB query returns 2 columns instead of 1 for max filter

I want to create a report showing the max price for each symbol, so I wrote the following query, which works fine on PROD but fails on UAT. Is this query appropriate or not?
select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier
The above query returns 2 columns for each symbol instead of 1. The following is the result of the inner query, i.e. select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31)
t:([]time:8#2019.03.11D09:00+"v"$0 4 8 10;sym:8#`GOOG`GOOG`MSFT`MSFT;src:8#`L`O`N`O;price:36.01 35.01 35.5 31.1 39.01 38.01 33.5 32.1;size:8#1427 708 7810 1100)
time                          sym  src price
--------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L   36.01
2019.03.11D09:00:04.000000000 GOOG O   35.01
2019.03.11D09:00:08.000000000 MSFT N   35.5
2019.03.11D09:00:10.000000000 MSFT O   31.1
2019.03.11D09:00:00.000000000 GOOG L   39.01
2019.03.11D09:00:04.000000000 GOOG O   38.01
2019.03.11D09:00:08.000000000 MSFT N   33.5
2019.03.11D09:00:10.000000000 MSFT O   32.1
And the output of select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier is:
t[0,2,4,7]
time                          sym  src price
--------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L   36.01
2019.03.11D09:00:08.000000000 MSFT N   35.5
2019.03.11D09:00:00.000000000 GOOG L   39.01
2019.03.11D09:00:10.000000000 MSFT O   32.1
I suspect that something is missing from the dataset you have provided in the question. The results of your inner query are all floats with fractional parts and size is a long, so it doesn't make sense that size=(max;price) returns any results.
To answer your question in the most general sense, the query to get the max price by sym is
select from t where price=(max;price) fby sym
Applying this to the inner result you have provided
q)select from t where price=(max;price) fby sym
time                          sym  src price size
-------------------------------------------------
2019.03.11D09:00:08.000000000 MSFT N   35.5  7810
2019.03.11D09:00:00.000000000 GOOG L   39.01 1427

How to apply a function to an entire column?

I have the following table from a JDBC connection in Q.
q)r
some_int this created_at updated_at ..
-----------------------------------------------------------------------------..
1231231 "ASD" 2016.02.11D14:16:29.743260000 2016.02.11D14:16:29...
13312 "TSM" 2016.02.11D14:16:29.743260000 2016.02.11D14:16:29...
I would like to apply the following function to the first column.
deviation:{a:avg x; sqrt avg (x*x)-a*a}
This works for arrays.
q)l
1 2 3 4
q)deviation l
1.118034
How can I apply deviation on a column in a table? It seems my approach does not work:
q)select deviation(some_int) from r
'rank
UPDATE:
I cannot explain the following:
q)select avg(some_int) from r
some_int
---------
1005341
q)select min(some_int) from r
some_int
---------
812361
q)select max(some_int) from r
some_int
---------
1184014
q)select sum(some_int) from r
some_int
---------
You need to enlist the result if it is an atom since table columns must be lists, not atoms. Normally kdb can do this for you but often not when you're performing your own custom aggregations. For example, even if you define a function sum2 to be an exact copy of sum:
q)sum2:sum
kdb only recognises sum as an aggregation and enlists its result automatically; it does not do so for sum2:
q)select sum col1 from ([]col1:1 2 3 4)
col1
----
10
q)select sum2 col1 from ([]col1:1 2 3 4)
'rank
So you need to enlist in the second case:
q)select enlist sum2 col1 from ([]col1:1 2 3 4)
col1
----
10
UPDATE:
To answer your second question: it looks like the sum of your numbers has overflowed the range of an int. You'd need to convert them to long and then sum.
q)select sum col1 from ([]col1:2147483645 1i)
col1
----------
2147483646
Above is the largest finite int. Adding one more gives infinity for an int
q)select sum col1 from ([]col1:2147483645 1 1i)
col1
----
0W
Adding one more than that overflows to the int null, which shows as a blank
q)select sum col1 from ([]col1:2147483645 1 1 1i)
col1
----
Solution is to cast to long before summing (or make them long in the first place)
q)select sum `long$col1 from ([]col1:2147483645 1 1 1i)
col1
----------
2147483648
You get a rank error because the function does not return a list. Since the function returns a single number, presumably you just want that number as the answer? In which case you can simply index into the table (or use exec) to get the column vector and apply the function to it:
deviation r`some_int
Otherwise, if you want to keep a table as the answer, enlist the result:
select enlist deviation some_int from r
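For comparison, the indexing, exec and select forms can be tried side by side on a small stand-in table. The four-row r below is a made-up substitute for the JDBC table, and d1, d2 and d3 are just illustrative names.

```q
deviation:{a:avg x; sqrt avg (x*x)-a*a}
/ Hypothetical stand-in for the JDBC table r, with a single numeric column
r:([] some_int:1 2 3 4)
d1:deviation r`some_int                      / index out the column vector, then apply
d2:exec deviation some_int from r            / exec hands back the atom itself, not a table
d3:select enlist deviation some_int from r   / enlist makes the atom a valid one-row column
```

d1 and d2 are the same float atom, while d3 is a one-row table, so which form to use depends on whether the caller wants a number or a table.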

kdb: dynamically denormalize a table (convert key values to column names)

I have a table like this:
q)t:([sym:(`EURUSD`EURUSD`AUDUSD`AUDUSD);server:(`S01`S02`S01`S02)];volume:(20;10;30;50))
q)t
sym    server| volume
-------------| ------
EURUSD S01   | 20
EURUSD S02   | 10
AUDUSD S01   | 30
AUDUSD S02   | 50
I need to de-normalize it to display the data nicely. The resulting table should look like this:
sym   | S01 S02
------| -------
EURUSD| 20  10
AUDUSD| 30  50
How do I dynamically convert the original table using distinct values from server column as column names for the new table?
Thanks!
Basically you want a 'pivot' table. The following page has a very good solution for your problem:
http://code.kx.com/q/cookbook/pivoting-tables/
Here are the commands to get the required table:
q) P:asc exec distinct server from t
q) exec P#(server!volume) by sym:sym from t
One tricky thing about pivoting a table: the keys of the dictionary must be of type symbol, otherwise it won't generate the pivot table structure.
E.g. in the following table, we have a column dt of type date.
t:([sym:(`EURUSD`EURUSD`AUDUSD`AUDUSD);dt:(0 1 0 1+.z.d)];volume:(20;10;30;50))
Now if we want to pivot it with the dates as columns, it generates a structure like:
q)P:asc exec distinct dt from t
q)exec P#(dt!volume) by sym:sym from t
(`s#flip (enlist `sym)!enlist `s#`AUDUSD`EURUSD)!((`s#2018.06.22 2018.06.23)!30j, 50j;(`s#2018.06.22 2018.06.23)!20j, 10j)
To get the dates as the columns, the dt column has to be cast to symbol:
q)show P:asc exec distinct `$string dt from t
`s#`2018.06.22`2018.06.23
q)exec P#((`$string dt)!volume) by sym:sym from t
sym   | 2018.06.22 2018.06.23
------| ---------------------
AUDUSD| 30         50
EURUSD| 20         10