How to generate a data table by iterating through multiple lists? (KDB)

I have a function quotes[ticker;startDate;endDate], and a function indexConstituents[index;startDate;endDate] that yield the below:
daterange: 2017.12.05,2017.12.06;
quotes'[AAPL;daterange]
date time sym price
2017.12.05 09:45 AAPL 101.20
2017.12.06 09:45 AAPL 102.30
quotes'[GOOG;daterange]
date time sym price
2017.12.05 10:00 GOOG 800.50
quotes'[BBRY;daterange]
date time sym price
2017.12.06 11:15 BBRY 02.10
and
indexConstituents'[DJIA;daterange]
date sym shares divisor
2017.12.05 AAPL 20 2
2017.12.05 GOOG 5 1
2017.12.06 AAPL 10 1.5
2017.12.06 BBRY 100 1
I need a way to run the indexConstituents function as normal to yield a list of constituents over a set of days (as in the second table above), then fetch the data from table 1 for each constituent. Finally, I need to join the data from both tables to yield the below:
data:
date time sym price shares divisor
2017.12.05 09:45 AAPL 101.20 20 2
2017.12.06 09:45 AAPL 102.30 10 1.5
2017.12.05 10:00 GOOG 800.50 5 1
2017.12.06 11:15 BBRY 02.10 100 1
Code for the first two tables:
([] date:2017.12.05,2017.12.06; time:09:45,09:45; sym:`AAPL,`AAPL; price:101.20,102.30)
([] date:2017.12.05,2017.12.05,2017.12.06,2017.12.06; sym:`AAPL,`GOOG,`AAPL,`BBRY; shares:20f,5f,10f,100f; divisor:2f,1f,1.5f,1f)

I think the best approach is to assign the resultant table from indexConstituents'[DJIA;daterange] to a variable, so that we can then pull out the sym column and apply distinct to it.
You can then use that list of syms as your first argument to the quotes.
Finally join the two resultant tables together.
idx:indexConstituents'[DJIA;daterange];
q:quotes\:/:[distinct idx`sym;daterange];
q lj 2!idx
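The join step can be tried in isolation with literal tables. This is only a sketch: qt and data are hypothetical names, and qt stands in for the combined output of quotes, which is assumed here to already be one flat table. If quotes returns one table per call, the each-left/each-right form gives a nested list of tables, which would need flattening (for example with raze) before the join.

```q
/ literal stand-ins for the combined output of quotes and indexConstituents
qt:([] date:2017.12.05 2017.12.06 2017.12.05 2017.12.06; time:09:45 09:45 10:00 11:15; sym:`AAPL`AAPL`GOOG`BBRY; price:101.2 102.3 800.5 2.1)
idx:([] date:2017.12.05 2017.12.05 2017.12.06 2017.12.06; sym:`AAPL`GOOG`AAPL`BBRY; shares:20 5 10 100f; divisor:2 1 1.5 1f)
/ 2!idx keys idx on its first two columns (date and sym)
/ lj then looks up shares and divisor for each quote row by that key
data:qt lj 2!idx
```

Keying on both date and sym matters here because the same sym can have different shares and divisor values on different dates.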
Hope this helps!

Related

query for selecting N records

I have a table tab that has cols date,sym,value and is sorted from oldest date to the recent.
I am trying to select the last N records for each sym and am not sure how to write the query. I know I can select on a date range, but I need the result per sym, regardless of whether the values appear on consecutive dates.
You could do this with fby and the virtual row number column i:
https://code.kx.com/q/ref/fby/
q){ select from tab where ({y in x#y}[x];i) fby sym }[-2]
date sym time src price size
------------------------------------------------------------
2014.04.21 AAPL 2014.04.21D16:29:03.253000000 N 24.98 3561
2014.04.21 AAPL 2014.04.21D16:29:03.558000000 N 24.98 2733
2014.04.21 CSCO 2014.04.21D16:28:56.265000000 O 35.6 8390
2014.04.21 CSCO 2014.04.21D16:29:44.572000000 L 35.61 2286
2014.04.21 DELL 2014.04.21D16:29:35.374000000 L 29.57 1444
2014.04.21 DELL 2014.04.21D16:29:39.979000000 N 29.56 216
2014.04.21 GOOG 2014.04.21D16:29:50.569000000 N 41.87 722
2014.04.21 GOOG 2014.04.21D16:29:58.633000000 O 41.9 437
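To see what the fby filter is doing, here is a minimal sketch on a small made-up table (the table contents and the names val and r are assumptions for illustration only):

```q
tab:([] date:2021.01.01+til 6; sym:`A`B`A`B`A`A; val:10 20 30 40 50 60)
/ i is the virtual row index; within each sym group the lambda flags
/ the rows whose index is among the last 2 indices of that group
r:select from tab where ({x in -2#x};i) fby sym
```

Sym A occupies rows 0, 2, 4 and 5, so only rows 4 and 5 survive for it, while B has just two rows and keeps both.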
Edit: A faster way would be to use a functional select with the 5th argument n (number of records) for each sym.
raze{
//[table;where;by;cols;rows]
?[tab;enlist (in;`sym;enlist x);0b;();y]
}[;-2]'[distinct tab[`sym]]
https://code.kx.com/q/basics/funsql/
Matt's suggestions using an fby and functional select are best if you want all columns in the table returned. If you only need the date, sym & price columns returned you could use
q)ungroup select -2#date,-2#price by sym from trade
sym date price
----------------------
APPL 2021.03.13 111.77
APPL 2021.03.13 111.85
CAT 2021.03.13 246
CAT 2021.03.13 246.27
GOOG 2021.03.13 206.24
GOOG 2021.03.13 206.21
NYSE 2021.03.13 60.67
NYSE 2021.03.13 60.97
Note that this can become tedious when selecting a large number of columns. In those cases it's better to stick with Matt's suggestions.

How do I get this output from MySQL

I have a table called nasdaq_transactions that looks like this:
Ticker Close Date
GOOG 1195.06 08/15/2018
AAPL 215.15 08/15/2018
MSFT 104.56 08/15/2018
GOOG 1198.11 08/16/2018
AAPL 216.1 08/16/2018
MSFT 105.1 08/16/2018
GOOG 1200.96 08/17/2018
AAPL 217.58 08/17/2018
MSFT 107.58 08/17/2018
I want to build a query that gives this output:
Ticker 08/15/2018 08/16/2018 08/17/2018
GOOG 1195.06 1198.11 1200.96
AAPL 215.15 216.1 217.58
MSFT 104.56 105.1 107.58
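There are several ways to transform data from multiple rows into columns.
In SQL Server you can use the PIVOT operator to turn the rows into columns (this assumes Date is stored as text in the format shown):
select * from
(
select Ticker, [Date], [Close]
from nasdaq_transactions
) src
pivot
(
max([Close])
for [Date] in ([08/15/2018], [08/16/2018], [08/17/2018])
) piv;
Note that MySQL has no PIVOT operator. There the same result needs conditional aggregation: one expression such as max(case when `Date` = '08/15/2018' then `Close` end) per date column, grouped by Ticker.
This is sample code; adapt it to your own table.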

Rolling window in multi groups

I have the following trade table:
time ticker side price qty
--------------------------
2018.01.01T13:00:20 AAPL BUY 10.0 100
2018.01.01T13:01:30 AAPL SELL 12.0 300
2018.01.01T13:01:45 AAPL BUY 11.0 500
2018.01.01T13:02:13 AAPL BUY 10.5 100
2018.01.01T13:05:00 AAPL SELL 13.0 200
I need a rolling window function with a lookback of 1 minute to separate the buys and sells of a stock:
time ticker BUYs SELLs TOTAL
--------------------------------
2018.01.01T13:00:20 AAPL 1 0 1
2018.01.01T13:01:30 AAPL 0 1 1
2018.01.01T13:01:45 AAPL 1 1 2
2018.01.01T13:02:13 AAPL 1 1 2
2018.01.01T13:05:00 AAPL 0 1 1
I decided to use the wj function because a rolling window suits my purpose. However, I can't get it to work:
w: -00:01 00:00 +:/ select time from table
wj[w;'ticker'time;table;(table;(count;ticker);(count;ticker))]
At a minimum I want to count every buy/sell first and group them later, but I cannot even get the initial query to run without getting a type error.
Can someone point me in the right direction?
Additional Question
I would now have to perform a rolling sum/count over several accounts, which are not known until runtime.
time ticker side price qty account
----------------------------------
2018.01.01T13:00:20 AAPL BUY 10.0 100 ACCT123
2018.01.01T13:01:30 AAPL SELL 12.0 300 ACCT456
2018.01.01T13:01:45 AAPL BUY 11.0 500 ACCT789
2018.01.01T13:02:13 AAPL BUY 10.5 100 ERRORACCT123
2018.01.01T13:05:00 AAPL SELL 13.0 200 TESTACCT123
I know I can pivot the table to:
time ticker side price qty ACCT123 ACCT456 ACCT789 ERRORACCT123 TESTACCT123
---------------------------------
but can I use the rolling function to sum the sizes over a 1-minute lookback period?
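The window w is required to be a pair of lists (the window start times and the window end times):
w: -00:01 00:00 +\: exec time from table
You'll also need to use wj1, as you only want to consider rows on or after entry to the window. wj would also include the prevailing row as of the window start.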
http://code.kx.com/q/ref/joins/#wj-wj1-window-join
q)table,'exec side from wj1[w;`ticker`time;table;(table;({`BUY`SELL!count each (group x)`BUY`SELL};`side))]
The monadic lambda:
{`BUY`SELL!count each (group x)`BUY`SELL}
Uses group to return the indices of BUY and SELL values and also ensures that BUY and SELL are present in all keys.
exec creates a table:
q)exec side from wj1[w;`ticker`time;table;(table;({{`BUY`SELL!count each x`BUY`SELL}group x};`side))]
BUY SELL
--------
1 0
0 1
1 1
2 1
0 1
And then we use join each to get the final result:
q)update TOTAL:BUY+SELL from table,'exec side from wj1[w;`ticker`time;table;(table;({`BUY`SELL!count each (group x)`BUY`SELL};`side))]
time ticker side price qty BUY SELL TOTAL
------------------------------------------------------------------
2018.01.01D13:00:20.000000000 AAPL BUY 10 100 1 0 1
2018.01.01D13:01:30.000000000 AAPL SELL 12 300 0 1 1
2018.01.01D13:01:45.000000000 AAPL BUY 11 500 1 1 2
2018.01.01D13:02:13.000000000 AAPL BUY 10.5 100 2 1 3
2018.01.01D13:05:00.000000000 AAPL SELL 13 200 0 1 1
For summing quantities depending on side, it is easier to do the following:
First update two new columns using vector conditional and then sum these using wj1.
http://code.kx.com/q/ref/lists/#vector-conditional
q)wj1[w;`ticker`time;table;(update BUYQUANTITY:?[`BUY=side;qty;0],SELLQUANTITY:?[`SELL=side;qty;0]from table;(sum;`BUYQUANTITY);(sum;`SELLQUANTITY))]
time ticker side price qty BUYQUANTITY SELLQUANTITY
----------------------------------------------------------------------------
2018.01.01D13:00:20.000000000 AAPL BUY 10 100 100 0
2018.01.01D13:01:30.000000000 AAPL SELL 12 300 0 300
2018.01.01D13:01:45.000000000 AAPL BUY 11 500 500 300
2018.01.01D13:02:13.000000000 AAPL BUY 10.5 100 600 300
2018.01.01D13:05:00.000000000 AAPL SELL 13 200 0 200
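The vector conditional used to build the two helper columns can be checked on plain lists first. A small sketch using the side and qty values from the question (buyQty and sellQty are hypothetical names):

```q
side:`BUY`SELL`BUY`BUY`SELL
qty:100 300 500 100 200
/ ?[cond;a;b] picks from a where cond is true and from b otherwise, item by item
buyQty:?[`BUY=side;qty;0]    / 100 0 500 100 0
sellQty:?[`SELL=side;qty;0]  / 0 300 0 0 200
```

Because the non-matching side contributes 0, a plain sum over each window then gives the per-side quantity.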
w: -00:01 00:00 +\: exec time from table
Using exec returns the time column as a plain list, and each-left (\:) adds each of the two offsets to that whole list, giving a pair of lists: the window start times and the window end times.
wj[w;`ticker`time;table;(table;(count;`ticker);(count;`ticker))]
w defines the time intervals: a pair of time or timestamp lists.
The column names in the window join must also be passed in as symbols, using `.

Create rolling calculation based unique entries (Q/KDB+)

I have a table:
q)data:([]dt:2017.01.05D19:45:00.238248239 2017.01.05D20:46:00.282382392 2017.01.05D21:47:00.232842342 2017.01.05D22:48:00.835838442 2017.01.05D20:49:00.282382392;sym:`AAPL`GOOG`AAPL`BBRY`GOOG;price:101.20 800.20 102.30 2.20 800.50;shares:500 100 500 900 100)
q)data
dt sym price shares
2017.01.05D19:45:00.238248239 AAPL 101.20 500
2017.01.05D20:46:00.282382392 GOOG 800.20 100
2017.01.05D21:47:00.232842342 AAPL 102.30 500
2017.01.05D22:48:00.835838442 BBRY 2.20 900
2017.01.05D20:49:00.282382392 GOOG 800.50 100
I need to create a column containing the sum of price*shares for the latest observation of each individual ticker.
To demonstrate using the above data, we're looking for:
data:
dt sym price shares index
2017.01.05D19:45:00.238248239 AAPL 101.20 500 50,600
2017.01.05D20:46:00.282382392 GOOG 800.20 100 130,620
2017.01.05D21:47:00.232842342 AAPL 102.30 500 131,170
2017.01.05D22:48:00.835838442 BBRY 2.20 900 133,150
2017.01.05D20:49:00.282382392 GOOG 800.50 100 133,180
To further clarify, at row 1, only 1 symbol is included, at row 2, 2 symbols, then 2 again, then 3, then 3 again at row 5.
Answered in a different thread: Apply formula to current and previous rows only (Q/KDB)
Slight variation on Jonathon's solution, using a vector conditional:
q)delete dict from update index:?[all flip distinct[sym]in/: key'[dict]; {sum[x]*sum[y]%sum z} ./: flip each value each dict;0N] from update dict:@[;;:;]\[()!();sym;flip (price;shares;divisor)] from data
dt sym price shares divisor index
----------------------------------------------------------------
2018.02.05D22:47:22.175914000 AAPL 101.2 500 2
2018.02.05D22:21:10.175914000 GOOG 800.2 500 1
2018.02.05D22:58:00.175914000 AAPL 102.3 500 2
2018.02.05D22:19:27.175914000 BBRY 2.2 500 1 339262.5
Given the question has changed considerably since the initial posting and my previous answer, here is an updated solution:
q)delete ind from update index:sum@'ind from (update ind:@\[()!();sym;:;shares*price] from data) where i>=max(first;i)fby sym
dt sym price shares index
------------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500
2017.01.05D20:46:00.282382392 GOOG 800.2 100
2017.01.05D21:47:00.232842342 AAPL 102.3 500
2017.01.05D22:48:00.835838442 BBRY 2.2 900 133150
2017.01.05D20:49:00.282382392 GOOG 800.5 100 133180
Or without the other initial condition that it should only be populated once all tickers have ticked:
q)delete ind from update index:sum@'ind from update ind:@\[()!();sym;:;shares*price] from data
dt sym price shares index
------------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500 50600
2017.01.05D20:46:00.282382392 GOOG 800.2 100 130620
2017.01.05D21:47:00.232842342 AAPL 102.3 500 131170
2017.01.05D22:48:00.835838442 BBRY 2.2 900 133150
2017.01.05D20:49:00.282382392 GOOG 800.5 100 133180
(Note these are only minor modifications to the solution I posted yesterday, updated for the changed requirements in the question)
q)delete ind from update index:sum@'ind from (update ind:@\[()!();sym;:;shares*price%divisor] from data) where i>=max(first;i)fby sym
dt sym price shares divisor index
--------------------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500 2
2017.01.05D20:45:00.282382392 GOOG 800.2 500 1
2017.01.05D21:45:00.232842342 AAPL 102.3 500 2
2017.01.05D22:45:00.835838442 BBRY 2.2 500 1 426775
Slightly different answer to what you got, this is doing sum(shares*price%divisor) rather than summing each individually.
A slightly more messy and complicated version that gets the same answer as you seem to be expecting:
q)delete ind from update index:sum'[ind[;;0]]*sum'[ind[;;1]]%sum'[ind[;;2]] from (update ind:@\[()!();sym;:;shares,'price,'divisor] from data) where i>=max(first;i)fby sym
dt sym price shares divisor index
----------------------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500 2
2017.01.05D20:45:00.282382392 GOOG 800.2 500 1
2017.01.05D21:45:00.232842342 AAPL 102.3 500 2
2017.01.05D22:45:00.835838442 BBRY 2.2 500 1 339262.5

Apply formula to current and previous rows only (Q/KDB)

I have a formula that I'd like to apply row-by-row, such that only the current and previous rows on any given row are included in calculation. Consider this data:
data:([]dt:2017.01.05D19:45:00.238248239 2017.01.05D20:46:00.282382392 2017.01.05D21:47:00.232842342 2017.01.05D22:48:00.835838442 2017.01.05D20:49:00.282382392;sym:`AAPL`GOOG`AAPL`BBRY`GOOG;price:101.20 800.20 102.30 2.20 800.50;shares:500 100 500 900 100)
data:
dt sym price shares
2017.01.05D19:45:00.238248239 AAPL 101.20 500
2017.01.05D20:46:00.282382392 GOOG 800.20 100
2017.01.05D21:47:00.232842342 AAPL 102.30 500
2017.01.05D22:48:00.835838442 BBRY 2.20 900
2017.01.05D20:49:00.282382392 GOOG 800.50 100
The formula select sum price from data where i=(last;i)fby sym would yield the result I need, however it would only yield 1 datapoint. I need that calculation done at every row of the dataset.
Scan ("\") applies this behavior, but unfortunately I don't know how to do that when using select statements.
Not entirely sure what you want but the following uses the latest price for each sym to calculate the sum rp:
q)update rp:sum each @\[()!();sym;:;price] from data
dt sym price shares rp
-----------------------------------------------------
2017.01.05D19:45:00.238248239 AAPL 101.2 500 101.2
2017.01.05D20:46:00.282382392 GOOG 800.2 100 901.4
2017.01.05D21:47:00.232842342 AAPL 102.3 500 902.5
2017.01.05D22:48:00.835838442 BBRY 2.2 900 904.7
2017.01.05D20:49:00.282382392 GOOG 800.5 100 905
Which gives the same answer for the final data point as you have given above.
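The running dictionary behind this can be inspected on plain lists. A minimal sketch with the sym and price values from the question, using amend (@ with :) under scan; d and rp are hypothetical names:

```q
sym:`AAPL`GOOG`AAPL`BBRY`GOOG
price:101.2 800.2 102.3 2.2 800.5
/ each step assigns the current price under the current sym in the running
/ dictionary, so every intermediate state holds the latest price per sym seen so far
d:@\[()!();sym;:;price]
/ summing each intermediate dictionary gives the rolling total of latest prices
rp:sum each d
```

Here d is a list of five dictionaries, one per row, and the last one holds a single entry each for AAPL, GOOG and BBRY.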
You can also get the last price at each index, like so:
{[x;y] exec sum price from x where i<=y, i=(last;i) fby sym}[data]each til count data
101.2 901.4 902.5 904.7 905