KDB query returns 2 rows instead of 1 for max filter - kdb

I want to create a report that shows the max price for each symbol, so I wrote the following query. It works fine on PROD but fails on UAT, so I would like to know whether the query is appropriate.
select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier
The above query returns 2 rows for each symbol instead of 1. The following is the result of the inner query, i.e. select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31):
t:([]time:8#2019.03.11D09:00+"v"$0 4 8 10;sym:8#`GOOG`GOOG`MSFT`MSFT;src:8#`L`O`N`O;price:36.01 35.01 35.5 31.1 39.01 38.01 33.5 32.1;size:8#1427 708 7810 1100)
time sym src price
--------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L 36.01
2019.03.11D09:00:04.000000000 GOOG O 35.01
2019.03.11D09:00:08.000000000 MSFT N 35.5
2019.03.11D09:00:10.000000000 MSFT O 31.1
2019.03.11D09:00:00.000000000 GOOG L 39.01
2019.03.11D09:00:04.000000000 GOOG O 38.01
2019.03.11D09:00:08.000000000 MSFT N 33.5
2019.03.11D09:00:10.000000000 MSFT O 32.1
And the output of select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier is:
t[0,2,4,7]
time sym src price
---------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L 36.01
2019.03.11D09:00:08.000000000 MSFT N 35.5
2019.03.11D09:00:00.000000000 GOOG L 39.01
2019.03.11D09:00:10.000000000 MSFT O 32.1

I suspect that something is missing from the dataset you have provided in the question. The prices in your inner query result are all floats with fractional parts, whereas size is a long, so it doesn't make sense that size=(max;price) returns any results.
To answer your question in the most general sense, the max price by sym can be selected with:
select from t where price=(max;price) fby sym
Applying this to the inner result you have provided
q)select from t where price=(max;price) fby sym
time sym src price size
-------------------------------------------------
2019.03.11D09:00:08.000000000 MSFT N 35.5 7810
2019.03.11D09:00:00.000000000 GOOG L 39.01 1427
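To make the fby step easier to follow, here is a minimal sketch on a small made-up table (t1, gm, r1 and r2 are illustrative names, not taken from the question). It assumes only the built-in fby, max and qSQL select/exec.

```q
/ hypothetical four-row table, two quotes per symbol
t1:([]sym:`GOOG`GOOG`MSFT`MSFT;src:`L`O`N`O;price:36.01 39.01 35.5 31.1)
/ (max;price) fby sym yields one value per row: the maximum of that row's sym group
gm:exec (max;price) fby sym from t1
/ keeping rows whose price equals the group maximum leaves one row per sym (barring ties)
r1:select from t1 where price=(max;price) fby sym
/ when only the maximum itself is needed, a plain aggregation avoids fby
r2:select max price by sym from t1
```

Note that the grouping column passed to fby has to be the one you want a single row for (sym here); grouping by a different column gives one maximum per value of that column instead.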

Related

query for selecting N records

I have a table tab that has columns date, sym and value and is sorted from the oldest date to the most recent.
I am trying to select the last N records for each sym and am not sure how to write the query. I know that I can select based on date being within a range, but I need the last N per sym regardless of whether the sym appeared on consecutive dates.
You could do this with fby and the virtual row number column i:
https://code.kx.com/q/ref/fby/
q){ select from tab where ({y in x#y}[x];i) fby sym }[-2]
date sym time src price size
------------------------------------------------------------
2014.04.21 AAPL 2014.04.21D16:29:03.253000000 N 24.98 3561
2014.04.21 AAPL 2014.04.21D16:29:03.558000000 N 24.98 2733
2014.04.21 CSCO 2014.04.21D16:28:56.265000000 O 35.6 8390
2014.04.21 CSCO 2014.04.21D16:29:44.572000000 L 35.61 2286
2014.04.21 DELL 2014.04.21D16:29:35.374000000 L 29.57 1444
2014.04.21 DELL 2014.04.21D16:29:39.979000000 N 29.56 216
2014.04.21 GOOG 2014.04.21D16:29:50.569000000 N 41.87 722
2014.04.21 GOOG 2014.04.21D16:29:58.633000000 O 41.9 437
Edit: A faster way would be to use a functional select with the 5th argument n (the number of records) for each sym.
raze{
//[table;where;by;cols;rows]
?[tab;enlist (in;`sym;enlist x);0b;();y]
}[;-2]'[distinct tab[`sym]]
https://code.kx.com/q/basics/funsql/
Matt's suggestions using an fby and functional select are best if you want all columns in the table returned. If you only need the date, sym & price columns returned you could use
q)ungroup select -2#date,-2#price by sym from trade
sym date price
----------------------
APPL 2021.03.13 111.77
APPL 2021.03.13 111.85
CAT 2021.03.13 246
CAT 2021.03.13 246.27
GOOG 2021.03.13 206.24
GOOG 2021.03.13 206.21
NYSE 2021.03.13 60.67
NYSE 2021.03.13 60.97
Note that this can become tedious when selecting a large number of columns. In those cases it's better to stick with Matt's suggestions.
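As a rough illustration of the fby and ungroup approaches, the sketch below runs both on a tiny made-up table. The names tab2, n, last2 and alt2 are assumptions for the example, and the value column is called val here to stay clear of the q keyword value.

```q
/ hypothetical table, already sorted from oldest to newest date
tab2:([]date:2021.03.10+0 0 1 1 2 2;sym:`A`B`A`B`A`B;val:1 2 3 4 5 6f)
n:2
/ fby hands each sym group's row numbers to the function, which flags those among the group's last n
last2:select from tab2 where ({y in x#y}[neg n];i) fby sym
/ same idea via select-by and ungroup; columns come back as sym, date, val
alt2:ungroup select date:(neg n)#date,val:(neg n)#val by sym from tab2
```

The fby version keeps the original row order and all columns, while the ungroup version returns rows grouped by sym and only the columns that were listed.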

Pass a table as condition with where clause in kdb

I have a table t:
t:([] sym:`GOOG`AMZN; px:10 20; vol:100 200);
Is it possible to pass a sub-table as a where clause condition to the table?
Below query throws type error:
select from t where ([] sym:enlist `GOOG; px:enlist 10)
Yes, it is possible:
q)select from t where([]sym;px) in ([] sym:enlist `GOOG; px:enlist 10)
sym px vol
-----------
GOOG 10 100
Update: however, if t is large this should be much faster:
q)([] sym:enlist `GOOG; px:enlist 10)#2!t
sym px| vol
-------| ---
GOOG 10| 100
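The two forms can be compared side by side on a slightly larger made-up example. This is only a sketch: t3, flt, m1 and m2 are illustrative names, and it assumes every row of the filter table has a match in t3.

```q
/ hypothetical data: three rows, two of which match the filter table
t3:([]sym:`GOOG`AMZN`GOOG;px:10 20 30;vol:100 200 300)
flt:([]sym:`GOOG`AMZN;px:10 20)
/ ([]sym;px) rebuilds a two-column table from t3's columns; in then tests each row for membership in flt
m1:select from t3 where ([]sym;px) in flt
/ keyed lookup: key t3 on its first two columns and take the rows named by flt
m2:flt#2!t3
```

The first form returns an unkeyed table and keeps every matching row. The second returns a keyed table, and because it is a key lookup it would presumably return only the first match if the sym/px pairs in t3 were not unique.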

Pivot table with multiple value columns in KDB+

I would like to transform the following two row table generated by:
tb: ([] time: 2010.01.01 2010.01.01; side:`Buy`Sell; price:100 101; size:30 50)
time side price size
--------------------------------
2010.01.01 Buy 100 30
2010.01.01 Sell 101 50
To the table below with single row:
tb1: ([] enlist time: 2010.01.01; enlist price_buy:100; enlist price_sell:101; enlist size_buy:30; enlist size_sell:50)
time price_buy price_sell size_buy size_sell
-----------------------------------------------------
2010.01.01 100 101 30 50
What is the most efficient way to achieve this?
(select price_buy:price, size_buy:size by time from tb where side = `Buy) lj select price_sell:price, size_sell:size by time from tb where side = `Sell
time | price_buy size_buy price_sell size_sell
----------| ---------------------------------------
2010.01.01| 100 30 101 50
If you wanted to avoid 2 select statements:
raze each select `price_buy`price_sell!(side!price)#/:`Buy`Sell, `size_buy`size_sell!(side!size)#/:`Buy`Sell by time from tb
As an additional note, having a date column labeled time can be misleading. Typical financial tables in kdb have the format date, time, sym, etc.
Edit: Functional form for dynamic column generation:
{x[0] lj x[1]}[{?[`tb;enlist (=;`side;enlist `$x);(enlist `time)!enlist `time;(`$("price",x;"size",x))!(`price;`size)]} each ("Sell";"Buy")]
time | priceSell sizeSell priceBuy sizeBuy
----------| -----------------------------------
2010.01.01| 101 50 100 30
The general pivot function on the Kx website can do this, see https://code.kx.com/q/kb/pivoting-tables/
q)piv[tb;(),`time;(),`side;`price`size;{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]};{x,z}]
time | Buyprice Sellprice Buysize Sellsize
----------| -----------------------------------
2010.01.01| 100 101 30 50
I have a pivot function on GitHub, but it doesn't support multiple columns:
.math.st.pivot: {[t;rc;cf;ff]
P: asc distinct t cf;
Pcol: `$string[P] cross "_",/:string key ff;
t: ?[t;();rc!rc;key[ff]!{({[x;y;z] z each y#group x}[;;z];x;y)}[cf]'[key ff;value ff]];
t: ![t;();0b; Pcol! raze {((';#);x;$[-11h=type y;enlist;::] y)}'[key ff]'[P] ];
![t;();0b;key ff]
};
But you can left join to achieve expected result:
.math.st.pivot[tb;enlist`time;`side;enlist[`price]!enlist first]
lj .math.st.pivot[tb;enlist`time;`side;enlist[`size]!enlist first]
Looks like adding support for multiple columns is a good idea.
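For comparison, here is a hedged sketch of the same idea using the exec ... by pivot idiom described on the Kx pivoting page, applied once per value column and then joined. The names tb4, P, pp, ps and wide are made up for the example, and the lower-casing of the side names is just one way to arrive at the price_buy style of column name.

```q
tb4:([]time:2010.01.01 2010.01.01;side:`Buy`Sell;price:100 101;size:30 50)
/ distinct pivot values, sorted so the generated column order is stable
P:asc exec distinct side from tb4
/ per time, build a dictionary from generated column names to the value found for each side
pp:exec (`$"price_",/:lower string P)!(side!price)P by time:time from tb4
ps:exec (`$"size_",/:lower string P)!(side!size)P by time:time from tb4
/ both results are keyed on time, so a left join lines them up
wide:pp lj ps
```

With more value columns the two exec lines could be generated in a loop, at which point the general piv function is probably the better tool.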

Split single column to multiple columns in KDB

I want to split values in a column to multiple columns after applying a complex function.
For example, for the following trade table t, I want to split sym into two separate columns, sym and src. However, the function I would be applying would be slightly more complex.
q)t:([] time:10:01:01 10:01:03 10:01:04;sym:`goog.l`vod.l`apple.o;qty:100 200 150)
time sym qty
--------------------
10:01:01 goog.l 100
10:01:03 vod.l 200
10:01:04 apple.o 150
If your table is very big and the sym column is very repetitive (which it looks like it will be if it's tick data) then the following will be much quicker:
f:{` vs'x}
@[t;`col1`col2;:;flip .Q.fu[f]t`sym]
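The speed-up comes from .Q.fu, which applies the function to the distinct values only and then maps the results back to every row. A small sketch of that behaviour, using a made-up table with repeated syms (t5, direct, viafu and r5 are illustrative names, and col1/col2 are placeholders as in the answer), with amend (@) attaching the new columns:

```q
/ hypothetical table with repeated syms
t5:([]time:10:01:01 10:01:03 10:01:04 10:01:05;sym:`goog.l`vod.l`goog.l`vod.l;qty:100 200 150 50)
f:{` vs'x}
/ plain application splits every row; .Q.fu applies f to the distinct syms only and maps the results back
direct:f t5`sym
viafu:.Q.fu[f]t5`sym
/ attach the two halves as new columns (col1 and col2 are placeholder names)
r5:@[t5;`col1`col2;:;flip viafu]
```

Both calls should give the same list of pairs; the difference is only in how many times f has to do its work, which matters when f is expensive and the column is highly repetitive.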
You can create a table of sym and src by splitting on ., creating a dictionary then using flip to create a table:
q)show r:exec flip`sym`src!flip` vs/:sym from t
sym src
---------
goog l
vod l
apple o
This can be joined to the original table using join-each (,'):
q)t,'r
time sym qty src
----------------------
10:01:01 goog 100 l
10:01:03 vod 200 l
10:01:04 apple 150 o
If column order is important then this can be fixed with xcols:
q)`time`sym`src xcols t,'r
time sym src qty
----------------------
10:01:01 goog l 100
10:01:03 vod l 200
10:01:04 apple o 150
One way to get this done is:
q)update sym:sym[;0] , mkt:sym[;1] from update ` vs/:sym from t
time sym qty mkt
----------------------
10:01:01 goog 100 l
10:01:03 vod 200 l
10:01:04 apple 150 o
If you are not interested in any columns other than the one that needs splitting, then:
q)exec {`s`mkt!` vs x}each sym from t
s mkt
---------
goog l
vod l
apple o
Another option would be:
q)(,'/)(t;flip`sym`src!exec flip ` vs'sym from t)
time sym qty src
----------------------
10:01:01 goog 100 l
10:01:03 vod 200 l
10:01:04 apple 150 o

Append columns to empty table - Q/KDB+

I'm pulling data from a source that returns tick data for stocks (timespan + float prices).
I need to build 1 table that has the tick data for each stock, while inserting new timespan index values for each one. Example:
AAPL:
t0 101.20
t3 102.10
GOOG:
t1 850.50
t2 860.10
Table:
AAPL GOOG
t0 101.20 NA
t1 NA 850.50
t2 NA 860.10
t3 102.10 NA
There would be many symbols, so I can't just manually type AAPL, GOOG etc.
While it would be possible to set up a table like the one you have described, it would not be advisable. You would be better off adding a column that records the stock for each tick, sym in this case:
t sym price
-------------------------------------------
2018.02.05D14:11:09.241245000 AAPL 101.7808
2018.02.05D14:11:09.241246000 GOOG 103.0177
2018.02.05D14:11:09.241246000 AAPL 107.8503
2018.02.05D14:11:09.241247000 GOOG 105.3471
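A minimal sketch of how such a table might be built from per-symbol data, assuming the source hands back one small table per symbol. The dictionary feeds and the names long6, S and wide6 are hypothetical, and plain timespans stand in for t0..t3. The last two lines show that the wide layout from the question can still be derived when needed.

```q
/ hypothetical per-symbol feeds, keyed by symbol; timespans stand in for t0..t3
feeds:`AAPL`GOOG!(([]t:0D00:00:00 0D00:00:03;price:101.2 102.1);([]t:0D00:00:01 0D00:00:02;price:850.5 860.1))
/ tag each feed with its symbol, stack them, and sort by time
long6:`t xasc raze {[s;d] update sym:s from d}'[key feeds;value feeds]
/ a wide view can still be derived on demand; missing ticks show as nulls
S:asc exec distinct sym from long6
wide6:exec S#sym!price by t:t from long6
```

Because the symbols are data rather than column names, adding a new stock only adds rows, and nothing has to be typed out per symbol.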