Rolling window in multi groups - kdb

I have the following trade table:
time ticker side price qty
--------------------------
2018.01.01T13:00:20 AAPL BUY 10.0 100
2018.01.01T13:01:30 AAPL SELL 12.0 300
2018.01.01T13:01:45 AAPL BUY 11.0 500
2018.01.01T13:02:13 AAPL BUY 10.5 100
2018.01.01T13:05:00 AAPL SELL 13.0 200
I need a rolling window function with a lookback of 1 minute to separate the buys and sells of a stock:
time ticker BUYs SELLs TOTAL
--------------------------------
2018.01.01T13:00:20 AAPL 1 0 1
2018.01.01T13:01:30 AAPL 0 1 1
2018.01.01T13:01:45 AAPL 1 1 2
2018.01.01T13:02:13 AAPL 1 1 2
2018.01.01T13:05:00 AAPL 0 1 1
I have decided to use the wj function because a rolling window suits my purpose. However, I can't get it to work:
w: -00:01 00:00 +:/ select time from table
wj[w;'ticker'time;table;(table;(count;ticker);(count;ticker))]
So at the very least I want to count every buy/sell first and group them later. But I cannot even get the initial query to run without getting a type error.
Can someone point me in the right direction?
Additional Question
I would now have to perform a rolling sum/count over several accounts, which are not known until runtime.
time ticker side price qty account
----------------------------------
2018.01.01T13:00:20 AAPL BUY 10.0 100 ACCT123
2018.01.01T13:01:30 AAPL SELL 12.0 300 ACCT456
2018.01.01T13:01:45 AAPL BUY 11.0 500 ACCT789
2018.01.01T13:02:13 AAPL BUY 10.5 100 ERRORACCT123
2018.01.01T13:05:00 AAPL SELL 13.0 200 TESTACCT123
I know I can pivot the table to:
time ticker side price qty ACCT123 ACCT456 ACCT789 ERRORACCT123 TESTACCT123
---------------------------------
but can I use the rolling function to sum the sizes over a 1-minute lookback period?

The window w is required to be a pair of lists:
w: -00:01 00:00 +\: exec time from t
You'll also need to use wj1 as you only want to consider rows on or after entry to the window.
http://code.kx.com/q/ref/joins/#wj-wj1-window-join
q)table,'exec side from wj1[w;`ticker`time;table;(table;({`BUY`SELL!count each (group x)`BUY`SELL};`side))]
The monadic lambda:
{`BUY`SELL!count each (group x)`BUY`SELL}
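uses group to return the indices of the BUY and SELL values in each window. Indexing the grouped dictionary with `BUY`SELL ensures that both keys are present in every result, with a count of 0 when a side does not occur in the window.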
exec creates a table:
q)exec side from wj1[w;`ticker`time;table;(table;({{`BUY`SELL!count each x`BUY`SELL}group x};`side))]
BUY SELL
--------
1 0
0 1
1 1
2 1
0 1
And then we use join each to get the final result:
q)update TOTAL:BUY+SELL from table,'exec side from wj1[w;`ticker`time;table;(table;({`BUY`SELL!count each (group x)`BUY`SELL};`side))]
time ticker side price qty BUY SELL TOTAL
------------------------------------------------------------------
2018.01.01D13:00:20.000000000 AAPL BUY 10 100 1 0 1
2018.01.01D13:01:30.000000000 AAPL SELL 12 300 0 1 1
2018.01.01D13:01:45.000000000 AAPL BUY 11 500 1 1 2
2018.01.01D13:02:13.000000000 AAPL BUY 10.5 100 2 1 3
2018.01.01D13:05:00.000000000 AAPL SELL 13 200 0 1 1
For summing quantities by side, it is easier to do the following:
First update two new columns using vector conditional and then sum these using wj1.
http://code.kx.com/q/ref/lists/#vector-conditional
q)wj1[w;`ticker`time;table;(update BUYQUANTITY:?[`BUY=side;qty;0],SELLQUANTITY:?[`SELL=side;qty;0]from table;(sum;`BUYQUANTITY);(sum;`SELLQUANTITY))]
time ticker side price qty BUYQUANTITY SELLQUANTITY
----------------------------------------------------------------------------
2018.01.01D13:00:20.000000000 AAPL BUY 10 100 100 0
2018.01.01D13:01:30.000000000 AAPL SELL 12 300 0 300
2018.01.01D13:01:45.000000000 AAPL BUY 11 500 500 300
2018.01.01D13:02:13.000000000 AAPL BUY 10.5 100 600 300
2018.01.01D13:05:00.000000000 AAPL SELL 13 200 0 200
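The window logic behind the two wj1 queries above can be cross-checked outside q. The following is a minimal plain-Python sketch, not the q implementation: the function name rolling_side_stats is hypothetical, and it assumes both window bounds are inclusive and that only rows of the same ticker inside [time - 1 minute, time] are considered, which is how the wj1 results above behave on this data.

```python
from datetime import datetime, timedelta

# Trade rows copied from the question: (time, ticker, side, price, qty).
trades = [
    (datetime(2018, 1, 1, 13, 0, 20), "AAPL", "BUY", 10.0, 100),
    (datetime(2018, 1, 1, 13, 1, 30), "AAPL", "SELL", 12.0, 300),
    (datetime(2018, 1, 1, 13, 1, 45), "AAPL", "BUY", 11.0, 500),
    (datetime(2018, 1, 1, 13, 2, 13), "AAPL", "BUY", 10.5, 100),
    (datetime(2018, 1, 1, 13, 5, 0), "AAPL", "SELL", 13.0, 200),
]


def rolling_side_stats(rows, lookback=timedelta(minutes=1)):
    """Hypothetical helper: per row, count trades and sum qty per side
    over the window [time - lookback, time] for the same ticker."""
    out = []
    for t, ticker, _side, _price, _qty in rows:
        counts = {"BUY": 0, "SELL": 0}
        qtys = {"BUY": 0, "SELL": 0}
        for t2, ticker2, side2, _price2, qty2 in rows:
            # Assumption: both ends of the window are inclusive.
            if ticker2 == ticker and t - lookback <= t2 <= t:
                counts[side2] += 1
                qtys[side2] += qty2
        out.append((counts["BUY"], counts["SELL"], qtys["BUY"], qtys["SELL"]))
    return out


# Each tuple is (BUY count, SELL count, BUY quantity, SELL quantity).
stats = rolling_side_stats(trades)
```

The nested loop is quadratic and only meant to make the window definition explicit; wj1 does the same job in a single vectorised call.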

w: -00:01 00:00 +\: exec time from table
exec returns the time column as a plain list (select would return a table), so w becomes the pair of start-time and end-time lists that the window join expects. You must also use \: (each-left) so that each of the two offsets is added to the whole list of times.
wj[w;`ticker`time;table;(table;(count;`ticker);(count;`ticker))]
w defines the time intervals: a pair of lists of times or timestamps.
The column names in the window join must also be passed in as symbols, using ` (backtick).

Related

Add condition to where clause in q/kdb+

Table Tab
minThreshold maxThreshold point
-------------------------------
1000 10000 10
wClause,:enlist((';~:;<);`qty;Tab[`minThreshold])
I am trying to incorporate the maxThreshold column into the where clause, so that both conditions apply:
qty >= minThreshold
qty <= maxThreshold
something like
wClause,:enlist((';~:;<);`qty;Tab[`minThreshold]);Tab[`maxThreshold])
q)Tab:([] minThreshold:500 1000;maxThreshold:700 2000;point:5 10)
q)Tab
minThreshold maxThreshold point
-------------------------------
500 700 5
1000 2000 10
q)select from Tab where minThreshold>=900,maxThreshold<=2500
minThreshold maxThreshold point
-------------------------------
1000 2000 10
q)parse"select from Tab where minThreshold>=900,maxThreshold<=2500"
?
`Tab
,(((';~:;<);`minThreshold;900);((';~:;>);`maxThreshold;2500))
0b
()
q)?[Tab;((>=;`minThreshold;900);(<=;`maxThreshold;2500));0b;()]
minThreshold maxThreshold point
-------------------------------
1000 2000 10
See the whitepaper for more information on functional selects:
https://code.kx.com/q/wp/parse-trees/
Is your problem that
(1) you have a Where phrase that works for functional qSQL and you want to extend it, or
(2) you want to select rows of a table where the value of a quantity falls within an upper and lower bound?
If (2) you can use Join Each to get the bounds for each row, and within to test the quantity.
q)show t:([]lwr:1000 900 150;upr:10000 25000 500;qty:10 1000 450)
lwr upr qty
---------------
1000 10000 10
900 25000 1000
150 500 450
q)select from t where qty within' lwr{x,y}'upr
lwr upr qty
--------------
900 25000 1000
150 500 450
Above we use {x,y} because in qSQL queries comma does not denote Join.
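As a cross-check of the per-row bounds test, here is a minimal plain-Python sketch of the same selection. It is an illustration only, using the lwr/upr/qty rows from the example above, and it relies on within being inclusive at both ends.

```python
# Rows copied from the q table t above.
rows = [
    {"lwr": 1000, "upr": 10000, "qty": 10},
    {"lwr": 900, "upr": 25000, "qty": 1000},
    {"lwr": 150, "upr": 500, "qty": 450},
]

# Keep a row when its qty lies inside that row's own [lwr, upr] range,
# mirroring `qty within' lwr{x,y}'upr` (inclusive on both ends).
selected = [r for r in rows if r["lwr"] <= r["qty"] <= r["upr"]]
```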

kdb/q -- cumulative sum by symbol, but with a cap

For example, in the table below I want to run a cumulative sum on the "val" column, grouped by the symbol column, but I want to cap the cumulative sum at the value in the "cap" column. If the cumulative sum exceeds the cap, I cap it at that value, and the next value is then added on top of the capped value.
Example: given the inputs date, sym, val and cap, I want to produce the output in the "cumval" column.
date sym val cap cumval
-----------------------------------
2020.01.01 AAPL 100 200 100
2020.01.02 AAPL 100 200 200
2020.01.03 AAPL 100 200 200
2020.01.04 AAPL -100 200 100
2020.01.01 MSFT 100 300 100
2020.01.02 MSFT 100 300 200
2020.01.03 MSFT 100 300 300
2020.01.04 MSFT 100 400 400
You'll need to use a custom accumulator to achieve your result. The built-in sums function is a binary function applied with the Scan accumulator \ (it is equivalent to +\). To add the cap logic you'll need to scan with a ternary function instead: {z&x+y}\ will work. Because q evaluates right to left, z&x+y is the lesser of z and x+y. The first parameter is the initial value, zero in your case. The second parameter is the list of values being accumulated, and the third parameter is the list of cap values.
q)show t:([]date:2020.01.01 2020.01.02 2020.01.03 2020.01.04 2020.01.01 2020.01.02 2020.01.03 2020.01.04;sym:`AAPL`AAPL`AAPL`AAPL`MSFT`MSFT`MSFT`MSFT;val:100 100 100 -100 100 100 100 100;cap:200 200 200 200 300 300 300 400)
date sym val cap
------------------------
2020.01.01 AAPL 100 200
2020.01.02 AAPL 100 200
2020.01.03 AAPL 100 200
2020.01.04 AAPL -100 200
2020.01.01 MSFT 100 300
2020.01.02 MSFT 100 300
2020.01.03 MSFT 100 300
2020.01.04 MSFT 100 400
q)update cumval:{z&x+y}\[0;val;cap] by sym from t
date sym val cap cumval
-------------------------------
2020.01.01 AAPL 100 200 100
2020.01.02 AAPL 100 200 200
2020.01.03 AAPL 100 200 200
2020.01.04 AAPL -100 200 100
2020.01.01 MSFT 100 300 100
2020.01.02 MSFT 100 300 200
2020.01.03 MSFT 100 300 300
2020.01.04 MSFT 100 400 400
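To make the scan step explicit, here is a minimal plain-Python sketch of the same capped accumulation. The helper name capped_cumsum is hypothetical, and the grouping by sym is done by hand to stand in for the by clause.

```python
from collections import defaultdict


def capped_cumsum(vals, caps, start=0):
    """Hypothetical helper mirroring {z&x+y}\\[0;val;cap]:
    each step is min(cap, previous result + val)."""
    out, acc = [], start
    for v, c in zip(vals, caps):
        acc = min(c, acc + v)
        out.append(acc)
    return out


# (sym, val, cap) rows copied from the table above, in date order per sym.
rows = [
    ("AAPL", 100, 200), ("AAPL", 100, 200), ("AAPL", 100, 200), ("AAPL", -100, 200),
    ("MSFT", 100, 300), ("MSFT", 100, 300), ("MSFT", 100, 300), ("MSFT", 100, 400),
]

# Group vals and caps by sym, standing in for `by sym`.
by_sym = defaultdict(lambda: ([], []))
for sym, val, cap in rows:
    by_sym[sym][0].append(val)
    by_sym[sym][1].append(cap)

result = {sym: capped_cumsum(vals, caps) for sym, (vals, caps) in by_sym.items()}
```

Note that the cap is applied to the running result at every step, which is why the AAPL value falls from 200 to 100 on the last row rather than from 300 to 200.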

Create rolling calculation based unique entries (Q/KDB+)

I have a table:
q)data:([]dt:2017.01.05D19:45:00.238248239 2017.01.05D20:46:00.282382392 2017.01.05D21:47:00.232842342 2017.01.05D22:48:00.835838442 2017.01.05D20:49:00.282382392;sym:`AAPL`GOOG`AAPL`BBRY`GOOG;price:101.20 800.20 102.30 2.20 800.50;shares:500 100 500 900 100)
q)data
dt sym price shares
2017.01.05D19:45:00.238248239 AAPL 101.20 500
2017.01.05D20:46:00.282382392 GOOG 800.20 100
2017.01.05D21:47:00.232842342 AAPL 102.30 500
2017.01.05D22:48:00.835838442 BBRY 2.20 900
2017.01.05D20:49:00.282382392 GOOG 800.50 100
I need to create a column containing the sum of price*shares for the latest observation of each individual ticker.
To demonstrate using the above data, we're looking for:
data:
dt sym price shares index
2017.01.05D19:45:00.238248239 AAPL 101.20 500 50,600
2017.01.05D20:46:00.282382392 GOOG 800.20 100 130,620
2017.01.05D21:47:00.232842342 AAPL 102.30 500 131,170
2017.01.05D22:48:00.835838442 BBRY 2.20 900 133,150
2017.01.05D20:49:00.282382392 GOOG 800.50 100 133,180
To further clarify, at row 1, only 1 symbol is included, at row 2, 2 symbols, then 2 again, then 3, then 3 again at row 5.
Answered in a different thread: Apply formula to current and previous rows only (Q/KDB)
Slight variation on Jonathon's solution, using a vector conditional:
q)delete dict from update index:?[all flip distinct[sym]in/: key'[dict]; {sum[x]*sum[y]%sum z} ./: flip each value each dict;0N] from update dict:@[;;:;]\[()!();sym;flip (price;shares;divisor)] from data
dt sym price shares divisor index
----------------------------------------------------------------
2018.02.05D22:47:22.175914000 AAPL 101.2 500 2
2018.02.05D22:21:10.175914000 GOOG 800.2 500 1
2018.02.05D22:58:00.175914000 AAPL 102.3 500 2
2018.02.05D22:19:27.175914000 BBRY 2.2 500 1 339262.5
Given the question has changed considerably since the initial posting and my previous answer, here is an updated solution:
q)delete ind from update index:sum@'ind from (update ind:@\[()!();sym;:;shares*price] from data) where i>=max(first;i)fby sym
dt sym price shares index
------------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500
2017.01.05D20:46:00.282382392 GOOG 800.2 100
2017.01.05D21:47:00.232842342 AAPL 102.3 500
2017.01.05D22:48:00.835838442 BBRY 2.2 900 133150
2017.01.05D20:49:00.282382392 GOOG 800.5 100 133180
Or without the other initial condition that it should only be populated once all tickers have ticked:
q)delete ind from update index:sum@'ind from update ind:@\[()!();sym;:;shares*price] from data
dt sym price shares index
------------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500 50600
2017.01.05D20:46:00.282382392 GOOG 800.2 100 130620
2017.01.05D21:47:00.232842342 AAPL 102.3 500 131170
2017.01.05D22:48:00.835838442 BBRY 2.2 900 133150
2017.01.05D20:49:00.282382392 GOOG 800.5 100 133180
(Note these are only minor modifications to the solution I posted yesterday, updated for the changed requirements in the question)
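The idea behind both queries is to scan over the rows while keeping a dictionary of the latest value per sym, then sum that dictionary at each row. A minimal plain-Python sketch of that idea follows; the function name running_latest_sum and its require_all flag are hypothetical, with require_all standing in for the "only once all tickers have ticked" condition.

```python
def running_latest_sum(syms, values, require_all=False):
    """Hypothetical helper: at each row, sum the most recent value seen
    for every sym so far. With require_all=True, emit None until every
    sym in the data has appeared at least once."""
    latest = {}
    universe = set(syms)
    out = []
    for s, v in zip(syms, values):
        latest[s] = v  # overwrite the previous value for this sym
        if require_all and set(latest) != universe:
            out.append(None)
        else:
            out.append(sum(latest.values()))
    return out


# Columns copied from the data table in the question.
syms = ["AAPL", "GOOG", "AAPL", "BBRY", "GOOG"]
prices = [101.20, 800.20, 102.30, 2.20, 800.50]
shares = [500, 100, 500, 900, 100]
notional = [p * s for p, s in zip(prices, shares)]

index = running_latest_sum(syms, notional)
gated = running_latest_sum(syms, notional, require_all=True)
```

Values are floats, so compare them with a tolerance or after rounding rather than with exact equality.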
q)delete ind from update index:sum@'ind from (update ind:@\[()!();sym;:;shares*price%divisor] from data) where i>=max(first;i)fby sym
dt sym price shares divisor index
--------------------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500 2
2017.01.05D20:45:00.282382392 GOOG 800.2 500 1
2017.01.05D21:45:00.232842342 AAPL 102.3 500 2
2017.01.05D22:45:00.835838442 BBRY 2.2 500 1 426775
This gives a slightly different answer from the one you got, because it computes sum(shares*price%divisor) rather than summing each column individually.
A slightly more messy and complicated version that gets the same answer as you seem to be expecting:
q)delete ind from update index:sum'[ind[;;0]]*sum'[ind[;;1]]%sum'[ind[;;2]] from (update ind:@\[()!();sym;:;shares,'price,'divisor] from data) where i>=max(first;i)fby sym
dt sym price shares divisor index
----------------------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500 2
2017.01.05D20:45:00.282382392 GOOG 800.2 500 1
2017.01.05D21:45:00.232842342 AAPL 102.3 500 2
2017.01.05D22:45:00.835838442 BBRY 2.2 500 1 339262.5

How to generate datatable by iterating through multiple lists? (KDB)

I have a function quotes[ticker;startDate;endDate], and a function indexConstituents[index;startDate;endDate] that yield the below:
daterange: 2017.12.05,2017.12.06;
quotes'[AAPL;daterange]
date time sym price
2017.12.05 09:45 AAPL 101.20
2017.12.06 09:45 AAPL 102.30
quotes'[GOOG;daterange]
date time sym price
2017.12.05 10:00 GOOG 800.50
quotes'[BBRY;daterange]
date time sym price
2017.12.06 11:15 BBRY 02.10
and
indexConstituents'[DJIA;daterange]
date sym shares divisor
2017.12.05 AAPL 20 2
2017.12.05 GOOG 5 1
2017.12.06 AAPL 10 1.5
2017.12.06 BBRY 100 1
I need a way to run the indexConstituents function as normal to yield a list of constituents over a set of days (as in the second table above), then fetch the data from table 1 for each constituent. Finally, I need to join the data from both tables to yield the below:
data:
date time sym price shares divisor
2017.12.05 09:45 AAPL 101.20 20 2
2017.12.06 09:45 AAPL 102.30 10 1.5
2017.12.05 10:00 GOOG 800.50 5 1
2017.12.06 11:15 BBRY 02.10 100 1
Code for the first two tables:
([] date:2017.12.05,2017.12.06; time:09:45,09:45; sym:`AAPL,`AAPL; price:101.20,102.30)
([] date:2017.12.05,2017.12.05,2017.12.06,2017.12.06; sym:`AAPL,`GOOG,`AAPL,`BBRY; shares:20f,5f,10f,100f; divisor:2f,1f,1.5f,1f)
I think the best approach is to assign the resultant table from indexConstituents'[DJIA;daterange] to a variable, so that we can then pull out the sym column and apply distinct to it.
You can then use that list of syms as your first argument to the quotes.
Finally join the two resultant tables together.
idx:indexConstituents'[DJIA;daterange];
q:quotes\:/:[distinct idx`sym;daterange];
q lj 2!idx
Hope this helps!
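The final join step can be cross-checked with a minimal plain-Python sketch of a left join keyed on (date, sym). This is an illustration only: the rows are copied from the example tables in the question, dates and times are kept as strings for simplicity, and it does not call the quotes or indexConstituents functions themselves.

```python
# Quote rows from the three quotes tables in the question.
quotes = [
    {"date": "2017.12.05", "time": "09:45", "sym": "AAPL", "price": 101.20},
    {"date": "2017.12.06", "time": "09:45", "sym": "AAPL", "price": 102.30},
    {"date": "2017.12.05", "time": "10:00", "sym": "GOOG", "price": 800.50},
    {"date": "2017.12.06", "time": "11:15", "sym": "BBRY", "price": 2.10},
]

# Constituent rows from the indexConstituents table in the question.
constituents = [
    {"date": "2017.12.05", "sym": "AAPL", "shares": 20.0, "divisor": 2.0},
    {"date": "2017.12.05", "sym": "GOOG", "shares": 5.0, "divisor": 1.0},
    {"date": "2017.12.06", "sym": "AAPL", "shares": 10.0, "divisor": 1.5},
    {"date": "2017.12.06", "sym": "BBRY", "shares": 100.0, "divisor": 1.0},
]

# Key the constituents on (date, sym), the role played by 2!idx.
lookup = {(c["date"], c["sym"]): c for c in constituents}

# Left join: every quote row is kept; unmatched rows get None.
joined = []
for q in quotes:
    extra = lookup.get((q["date"], q["sym"]), {})
    joined.append({**q, "shares": extra.get("shares"), "divisor": extra.get("divisor")})
```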

Apply formula to current and previous rows only (Q/KDB)

I have a formula that I'd like to apply row-by-row, such that only the current and previous rows on any given row are included in calculation. Consider this data:
data:([]dt:2017.01.05D19:45:00.238248239 2017.01.05D20:46:00.282382392 2017.01.05D21:47:00.232842342 2017.01.05D22:48:00.835838442 2017.01.05D20:49:00.282382392;sym:`AAPL`GOOG`AAPL`BBRY`GOOG;price:101.20 800.20 102.30 2.20 800.50;shares:500 100 500 900 100)
data:
dt sym price shares
2017.01.05D19:45:00.238248239 AAPL 101.20 500
2017.01.05D20:46:00.282382392 GOOG 800.20 100
2017.01.05D21:47:00.232842342 AAPL 102.30 500
2017.01.05D22:48:00.835838442 BBRY 2.20 900
2017.01.05D20:49:00.282382392 GOOG 800.50 100
The formula select sum price from data where i=(last;i)fby sym would yield the result I need, however it would only yield 1 datapoint. I need that calculation done at every row of the dataset.
Scan ("\") applies this behavior, but unfortunately I don't know how to do that when using select statements.
Not entirely sure what you want but the following uses the latest price for each sym to calculate the sum rp:
q)update rp:sum each @\[()!();sym;:;price] from data
dt sym price shares rp
-----------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500 101.2
2017.01.05D20:46:00.282382392 GOOG 800.2 100 901.4
2017.01.05D21:47:00.232842342 AAPL 102.3 500 902.5
2017.01.05D22:48:00.835838442 BBRY 2.2 900 904.7
2017.01.05D20:49:00.282382392 GOOG 800.5 100 905
Which gives the same answer for the final data point as you have given above.
You can also get the last price at each index, like so:
{[x;y] exec sum price from x where i<=y, i=(last;i) fby sym}[data]each til count data
101.2 901.4 902.5 904.7 905
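The second approach recomputes the answer from scratch for every row index: restrict to rows up to that index, keep the last row per sym, and sum. A minimal plain-Python sketch of that brute-force variant follows; the helper name sum_latest_upto is hypothetical.

```python
def sum_latest_upto(syms, prices, y):
    """Hypothetical helper: sum the last price per sym among rows 0..y,
    mirroring `where i<=y, i=(last;i) fby sym`."""
    last_idx = {}
    for i in range(y + 1):
        last_idx[syms[i]] = i  # later rows overwrite earlier ones
    return sum(prices[i] for i in last_idx.values())


# Columns copied from the data table in the question.
syms = ["AAPL", "GOOG", "AAPL", "BBRY", "GOOG"]
prices = [101.20, 800.20, 102.30, 2.20, 800.50]

rp = [sum_latest_upto(syms, prices, y) for y in range(len(syms))]
```

This does quadratic work overall, whereas the scan-based version carries the dictionary forward and touches each row once, so the scan is the better fit for larger tables.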