how to handle type F in a table in Q/KDB - kdb

I have started to learn q/KDB since a while, therefore forgive me in advance for trivial question but I am facing the following problem I don't know how to solve.
I have a table named "res" showing, side, summation of orders and average_price of some simbols
sym side | sum_order avg_price
----------| -------------------
ALPHA B | 95109 9849.73
ALPHA S | 91662 9849.964
BETA B | 47 9851.638
BETA S | 60 9853.383
with these types
c | t f a
---------| -----
sym | s p
side | s
sum_order| f
avg_price| f
I would like to calculate close and open positions, average point, made by close position, and average price of the open position.
I have used this query which I believe it is pretty bizarre (I am sure there will be a more professional way to do it) but it works as expected
position_summary:select
close_position:?[prev[sum_order]>sum_order;sum_order;prev[sum_order]],
average_price:avg_price-prev[avg_price],
open_pos:prev[sum_order]-sum_order,
open_wavgprice:?[sum_order>next[sum_order];avg_price;next[avg_price]][0]
by sym from res
giving me the following table
sym | close_position average_price open_pos open_wavgprice
----------| ----------------------------------------------------
ALPHA | 91662 0.2342456 3447 9849.73
BETA | 47 1.745035 -13 9853.38
and types are
c | t f a
--------------| -----
sym | s s
close_position| F
average_price | F
open_pos | F
open_wavgprice| f
Now my problem starts here, imagine I join position_summary table with another table appending another column "current_price" of type f
What I want to do is to determinate the points of the open positions.
I have tried this way:
select
?[open_pos>0;open_price-open_wavgprice;open_wavgprice-open]
from position_summary
but I got 'type error,
surely because sum_order is type F and open_wavgprice and current_price are f. I have search on internet by I did not find much about F type.
First: how can I handle this ? I have tried "cast" or use "raze" but no effects and moreover I am not sure if they are right on this particular occasion.
Second: is there a better way to use "if-then" during query tables (for example, in plain English :if this row of this column then take the previous / next of another column or the second or third of previous /next column)
Thank you for you help

Let me rephrase your question using a slightly simpler table:
q)show res:([sym:`A`A`B`B;side:`B`S`B`S]size:95 91 47 60;price:49.7 49.9 51.6 53.3)
sym side| size price
--------| ----------
A B | 95 49.7
A S | 91 49.9
B B | 47 51.6
B S | 60 53.3
You are trying to find the closing position for each symbol using a query like this:
q)show summary:select close:?[prev[size]>size;size;prev[size]] by sym from res
sym| close
---| -----
A | 91
B | 47
The result seems to have one number in each row of the "close" column, but in fact it has two. You may notice an extra space before each number in the display above or you can display the first row
q)first 0!summary
sym | `A
close| 0N 91
and see that the first row in the "close" column is 0N 91. Since the missing values such as 0N are displayed as a space, it was hard to see them in the earlier display.
It is not hard to understand how you've got these two values. Since you select by sym, each column gets grouped by symbol and for the symbol A, you have
q)show size:95 91
95 91
and
q)prev size
0N 95
that leads to
q)?[prev[size]>size;size;prev[size]]
0N 91
(Recall that 0N is smaller than any other integer.)
As a side note, ?[a>b;b;a] is element-wise minimum and can be written as a & b in q, so your conditional expression could be written as
q)size & prev size
0N 91
Now we can see why ? gave you the type error
q)close:exec close from summary
q)close
91
47
While the display is deceiving, "close" above is a list of two vectors:
q)first close
0N 91
and
q)last close
0N 47
The vector conditional does not support that:
q)?[close>0;10;20]
'type
[0] ?[close>0;10;20]
^
One can probably cure that by using each:
q)?[;10;20]each close>0
20 10
20 10
But I don't think this is what you want. Your problem started when you computed the summary table. I would expect the closing position to be the sum of "B" orders minus the sum of "S" orders that can be computed as
q)select close:sum ?[side=`B;size;neg size] by sym from res
sym| close
---| -----
A | 4
B | -13
Now you should be able to fix the rest of the columns in your summary query. Just make sure that you use an aggregation function such as sum in the expression for every column.

Type F means the "cell" in the column contains a vector of floats rather than an atom. So your column is actually a vector of vectors rather than a flat vector.
In your case you have a vector of size 1 in each cell, so in your case you could just do:
select first each close_position, first each average_price.....
which will give you a type f.
I'm not 100% on what you were trying to do in the first query, and I don't have a q terminal to hand to check but you could put this into your query:
select close_position:?[prev[sum_order]>sum_order;last sum_order; last prev[sum_order].....
i.e. get the last sum_order in the list.

Related

Sum in range until value change

I'am trying to use this formula to make it work
=ARRAYFORMULA(IF(ISDATE_STRICT(S2:S) ; (MATCH(MAX(AB2:AB),AB2:AB;0)-1) ; "" ))
If there is a date in Column "S" I want it to display the sum of the blanks that would appear if in Column "S" is text
=ARRAYFORMULA(IF(ISDATE_STRICT(S2:S) ; ArrayFormula(MATCH(FALSE ; ISBLANK(AB2:AB) ; 0)-1) ; "" ))
I've tried this one as well but I only get 0's as a result.
Any idea how I can make it work?
Here is the sample sheet.
https://docs.google.com/spreadsheets/d/19f5phXeAwXwrKbWz7njgbznmurOav72GUuo_5IGcbls/edit?usp=sharing
in Q2 use:
=ARRAYFORMULA(IF(ISBLANK(
I1:INDEX(I:I; ROWS(I:I)-1));
{N2:INDEX(N:N; ROWS(N:N))\
I1:INDEX(N:N; ROWS(N:N)-1)};
I1:INDEX(O:O; ROWS(O:O)-1)))
in X2 use:
=INDEX(LAMBDA(x; IFNA(VLOOKUP(x; QUERY(VLOOKUP(ROW(x);
IF(ISDATE_STRICT(x); {ROW(x)\x}); 2; 1);
"select Col1,count(Col1) group by Col1"); 2; 0)-1))
(Q2:INDEX(Q:Q; MAX((Q:Q<>"")*ROW(Q:Q)))))
UPDATE:
we start with column Q. we can take a range Q2:Q but that range contains a lot of empty rows. the next best thing is to check the last non-empty row and set it as the end of the range resulting in Q2:Q73. but static 73 won't do in case the dataset would grow or shrink so to get 73 dynamically we take the MAX of multiplication of Q:Q not being empty and row number of that case eg. Q:Q<>"" will output only TRUE or FALSE so what we are getting is
...
TRUE * 72 = 1 * 72 = 72
TRUE * 73 = 1 * 73 = 73
FALSE * 74 = 0 * 74 = 0
...
so the formula for getting Q2:Q73 is:
=Q2:INDEX(Q:Q; MAX((Q:Q<>"")*ROW(Q:Q)))
it could also be:
=INDEX(INDIRECT("Q2:Q"&MAX((Q:Q<>"")*ROW(Q:Q))))
but it's just long to type... next, we use the new LAMBDA function that allows us to reference cell/range/formula with a placeholder. simple LAMBDA syntax is:
=LAMBDA(x; x)(A1)
where x is A1 and we can do whatever we want with the 2nd (x) argument of LAMBDA like for example:
=LAMBDA(a, a+a*120-a/a)(A1)
you can think of it as:
LAMBDA(A1, A1+A1*120-A1/A1)(A1)
or as just:
=A1+A1*120-A1/A1
the issue here is that we repeat A1 4 times but with LAMBDA we do it only once. also, imagine if we would have 100 characters long formula instead of A1 so the final formula with lambda would be 300 characters shorter compared to "old way" formula.
back to our formula... x is the representation of Q2:Q73. now let's focus on VLOOKUP. basically, the idea here is that IF Q column contains a date we return that date, otherwise we return the last date from above. simply put:
=ARRAYFORMULA(VLOOKUP(ROW(Q2:Q73);
IF(ISDATE_STRICT(Q2:Q73); {ROW(Q2:Q73)\Q2:Q73}); 2; 1))
as you can see Y2, Y3 and Y4 are the same so all we need to do is to count them up and later take away one to exclude Q2 but include just Q3 and Q4 eg. 3-1=2. for that we use simple QUERY where the output is:
date count
30.06.2022 3
so all we need to do is to pair up dates from Q column to QUERY output for that we use the outer VLOOKUP where the output is as follows:
3
#N/A
#N/A
9
#N/A
#N/A
...
now is the right time for that -1 correction while we have these errors coz ERROR-1=ERROR and 3-1=2 so after this -1 correction the output is:
2
#N/A
#N/A
8
#N/A
#N/A
...
and all we need to do now is to hide errors with IFERROR and the output is column X

95% Confidence Interval for a weighted column in KDB+/Q

I have a table like the following where each row corresponds to an execution:
table:([]name:`account1`account1`account1`account2`account2`account1`account1`account1`account1`account2;
Pnl:13.7,13.2,74.1,57.8,29.9,15.9,7.8,-50.4,2.3,-16.2;
markouts:.01,.002,-.003,-.02,.004,.001,-.008,.04,.011,.09;
notional:1370,6600,-24700,-2890,7475,15900,-975,-1260,210,-180)
I'd like to create a 95% confidence interval of Pnl for `account1. The problem is, Pnl is the product of markouts and notional values, so it's weighted and the mean wouldn't be a simple mean. I'm pretty sure the standard deviation calculation would also be a bit different than normal.
Is there a way to still do this in KDB? I'm not really sure how to go about this. Any advice is greatly appreciated!
statistics isn't my strong point but most of this can be done with some keywords for the standard calculation:
q)select { avg[x] + -1 1* 1.960*sdev[x]%sqrt count x } Pnl by name from table
name | Pnl
--------| ------------------
account1| -15.90856 37.76571
account2| -18.45611 66.12278
https://code.kx.com/q/ref/avg/#avg
https://code.kx.com/q/ref/sqrt/
https://code.kx.com/q/ref/dev/#sdev
As shown on the kx ref, the sdev calculation is as follows which you could use as a base to create your own to suit what you want/expect.
{sqrt var[x]*count[x]%-1+count x}
There is also wavg if you want to do weighted average:
https://code.kx.com/q/ref/avg/#wavg
Edit: Assuming this can work by substituting in weighted calculations, here's a weighted sdev I've thrown together wsdev:
table:update weight:2 6 3 5 2 4 5 6 7 3 from table;
wsdev:{[w;v] sqrt (sum ( (v-wavg[w;v]) xexp 2) *w)%-1+sum w }
// substituting avg and sdev above
w95CI:{[w;v] wavg[w;v] + -1 1* 1.960*wsdev[w;v]%sqrt count v };
select w95CI[weight;Pnl] by name from table
name | Pnl
--------| ------------------
account1| -19.70731 28.47701
account2| -8.201463 68.24146

Understanding how to read each-right and each-left combined in kdb

From q for mortals, i'm struggling to understand how to read this, and understand it logically.
1 2 3,/:\:10 20
I understand the result is a cross product when in full form: raze 1 2 3,/:\:10 20.
But reading from left to right, I'm currently lost at understanding what this yields (in my head)
\:10 20
combined with 1 2 3,/: ??
Help in understanding how to read this clearly (in words or clear logic) would be appreciated.
I found myself saying the following in my head whilst I program the syntax in q. q works from right to left.
Internal Monologue -> Join the string on the right onto each of the strings on the left
code -> "ABC",\:"-D"
result -> "A-D"
"B-D"
"C-D"
I think that's an easy way to understand it. 'join' can be replaced with whatever...
Internal Monologue -> Does the string on the right match any of the strings on the left
code -> ("Cat";"Dog";"CAT";"dog")~\:"CAT"
result -> 0010b
Each-right is the same concept and combining them is straightforward also;
Internal Monologue -> Does each of the strings on the right match each of the strings on the left
code -> ("Cat";"Dog";"CAT";"dog")~\:/:("CAT";"Dog")
result -> 0010b
0100b
So in your example 1 2 3,/:\:10 20 - you're saying 'Join each of the elements on the right to each of the elements on the left'
Hope this helps!!
EDIT To add a real world example.... - consider the following table
q)show tab:([] upper syms:10?`2; names:10?("Robert";"John";"Peter";"Jenny"); amount:10?til 10)
syms names amount
--------------------
CF "Peter" 8
BP "Robert" 1
IC "John" 9
IN "John" 5
NM "Peter" 4
OJ "Jenny" 6
BJ "Robert" 6
KH "John" 1
HJ "Peter" 8
LH "John" 5
q)
I you want to get all records where the name is Robert, you can do; select from tab where names like "Robert"
But if you want to get the results where the name is either Robert or John, then it is a perfect scenario to use our each-left and each-right.
Consider the names column - it's a list of strings (a list where each element is a list of chars). What we want to ask is 'does any of the strings in the names column match any of the strings we want to find'... that translates to (namesList)~\:/:(list;of;names;to;find). Here's the steps;
q)(tab`names)~\:/:("Robert";"John")
0100001000b
0011000101b
From that result we want a compiled list of booleans where each element is true of it is true for Robert OR John - for example, if you look at index 1 of both lists, it's 1b for Robert and 0b for John - in our result, the value at index 1 should be 1b. Index 2 should be 1b, index3 should be 1b, index4 should be 0b etc... To do this, we can apply the any function (or max or sum!). The result is then;
q)any(tab`names)~\:/:("Robert";"John")
0111001101b
Putting it all together, we get;
q)select from tab where any names~\:/:("Robert";"John")
syms names amount
--------------------
BP "Robert" 1
IC "John" 9
IN "John" 5
BJ "Robert" 6
KH "John" 1
LH "John" 5
q)
Firstly, q is executed (and hence generally read) right to left. This means that it's interpreting the \: as a modifier to be applied to the previous function, which itself is a simple join modified by the /: adverb. So the way to read this is "Apply join each-right to each of the left-hand arguments."
In this case, you're applying the two adverbs to the join - \:10 20 on its own has no real meaning here.
I find it helpful to also look at the converse case 1 2 3,\:/:10 20, running that code produces a 2x6 matrix, which I'd describe more like "apply join each-left to each of the right hand arguments" ... I hope that makes sense.
An alternative syntax which also might help is ,/:\:[1 2 3;10 20] - this might be useful as it makes it very clear what the function you're applying is, and is equivalent to your in-place notation.

Joining multiple times in kdb

I have two tables
table 1 (orders) columns: (date,symbol,qty)
table 2 (marketData) columns: (date,symbol,close price)
I want to add the close for T+0 to T+5 to table 1.
{[nday]
value "temp0::update date",string[nday],":mdDates[DateInd+",string[nday],"] from orders";
value "temp::temp0 lj 2! select date",string[nday],":date,sym,close",string[nday],":close from marketData";
table1::temp
} each (1+til 5)
I'm sure there is a better way to do this, but I get a 'loop error when I try to run this function. Any suggestions?
See here for common errors. Your loop error is because you're setting views with value, not globals. Inside a function value evaluates as if it's outside the function so you don't need the ::.
That said there's lots of room for improvement, here's a few pointers.
You don't need the value at all in your case. E.g. this line:
First line can be reduced to (I'm assuming mdDates is some kind of function you're just dropping in to work out the date from an integer, and DateInd some kind of global):
{[nday]
temp0:update date:mdDates[nday;DateInd] from orders;
....
} each (1+til 5)
In this bit it just looks like you're trying to append something to the column name:
select date",string[nday],":date
Remember that tables are flipped dictionaries... you can mess with their column names via the keys, as illustrated (very noddily) below:
q)t:flip `a`b!(1 2; 3 4)
q)t
a b
---
1 3
2 4
q)flip ((`$"a","1"),`b)!(t`a;t`b)
a1 b
----
1 3
2 4
You can also use functional select, which is much neater IMO:
q)?[t;();0b;((`$"a","1"),`b)!(`a`b)]
a1 b
----
1 3
2 4
Seems like you wanted to have p0 to p5 columns with prices corresponding to date+0 to date+5 dates.
Using adverb over to iterate over 0 to 5 days :
q)orders:([] date:(2018.01.01+til 5); sym:5?`A`G; qty:5?10)
q)data:([] date:20#(2018.01.01+til 10); sym:raze 10#'`A`G; price:20?10+10.)
q)delete d from {c:`$"p",string[y]; (update d:date+y from x) lj 2!(`d`sym,c )xcol 0!data}/[ orders;0 1 2 3 4]
date sym qty p0 p1 p2 p3 p4
---------------------------------------------------------------
2018.01.01 A 0 10.08094 6.027448 6.045174 18.11676 1.919615
2018.01.02 G 3 13.1917 8.515314 19.018 19.18736 6.64622
2018.01.03 A 2 6.045174 18.11676 1.919615 14.27323 2.255483
2018.01.04 A 7 18.11676 1.919615 14.27323 2.255483 2.352626
2018.01.05 G 0 19.18736 6.64622 11.16619 2.437314 4.698096

kdb+/q: Apply iterative procedure with updated variable to a column

Consider the following procedure f:{[x] ..} with starting value a:0:
Do something with x and a. The output is saved as the new version of a, and the output is returned by the function
For the next input x, redo the procedure but now with the new a.
For a single value x, this procedure is easily constructed. For example:
a:0;
f:{[x] a::a+x; :a} / A simple example (actual function more complicated)
However, how do I make such a function such that it also works when applied on a table column?
I am clueless how to incorporate this step for 'intermediate saving of a variable' in a function that can be applied on a column at once. Is there a special technique for this? E.g. when I use a table column in the example above, it will simply calculate a+x with a:0 for all rows, opposed to also updating a at each iteration.
No need to use global vars for this - can use scan instead - see here.
Example --
Generate a table -
q)t:0N!([] time:5?.z.p; sym:5?`3; price:5?100f; size:5?10000)
time sym price size
-----------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496
2010.10.30D10:15:14.157553088 obp 31.59526 1728
2005.11.01D21:15:54.022395584 dhc 34.10485 5486
2005.03.06D21:05:07.403334368 mho 86.17972 2318
Example with a simple accumilator - note, the function has access to the other args if needed (see next example):
q)update someCol:{[a;x;y;z] (a+1)}\[0;time;price;size] from t
time sym price size someCol
-------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 1
2007.05.21D04:26:13.021438816 llm 7.347808 496 2
2010.10.30D10:15:14.157553088 obp 31.59526 1728 3
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 4
2005.03.06D21:05:07.403334368 mho 86.17972 2318 5
Say you wanted to get cumilative size:
q)update cuSize:{[a;x;y;z] (a+z)}\[0;time;price;size] from t
time sym price size cuSize
------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496 4490
2010.10.30D10:15:14.157553088 obp 31.59526 1728 6218
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 11704
2005.03.06D21:05:07.403334368 mho 86.17972 2318 14022
If you wanted more than one var passed through the scan, can pack more values into the first var, by giving it a more complex structure:
q)update cuPriceAndSize:{[a;x;y;z] (a[0]+y;a[1]+z)}\[0 0;time;price;size] from t
time sym price size cuPriceAndSize
--------------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 29.07093 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496 36.41874 4490
2010.10.30D10:15:14.157553088 obp 31.59526 1728 68.014 6218
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 102.1188 11704
2005.03.06D21:05:07.403334368 mho 86.17972 2318 188.2986 14022
#MdSalih solution is correct, I am just explaining here what could be the possible reason with global variable in your case and solution for that.
q) t:([]id: 1 2)
q)a:1
I think you might have been using it like this:
q) select k:{x:x+a;a::a+1;:x} id from t
output:
k
--
1
2
And a value is 2 which means function executed only once. Reason is we passed full id column list to function and (+) is atomic which means it operates on full list at once. In following ex. 2 will get added to all items in list.
q) 2 + (1;3;5)
Correct way to use it is 'each':
q)select k:{x:x+a;a::a+1;:x} each id from t
output:
k
--
2
3