KDB projection with multiple variables - kdb

I have a function on a KDB server, test[date;sym;starttime;endtime], and I want to run it for a list of symbols, each with a specific date, start time and end time. For example, Test[2014.07.02,IBM,09:30:00,"11:00:25.325"] is one such row of the list which I want to pass to the "test" function. I understand the each-right/each-left iterators in KDB (x f/: y), but how do I pass a list of specific values for all the input arguments? Please see below for the input list:
Date Symbol Starttime Endtime
2014.07.02 IBM 09:30:45 15:59:59.2
2014.07.03 AAPL 09:40:50 13:52:19.125
I will appreciate any help in this regard.
Thanks,

Here is my understanding of your question:
You have a list of inputs L:(inputs1;inputs2;...), where each item is a list (date;symbol;starttime;endtime), and you want to apply the 'test' function to each input list in L.
For this, KDB provides the 'dot' (apply) operator.
Ex:
q) f:{[a;b;c] a+b+c}
q) f . (1 2 3)
6
For a list of inputs:
q) f ./: ((1 2 3);(4 5 6))
6 15
In your case, it would be:
q)test ./:L
Reference: https://code.kx.com/q/ref/apply/
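If the inputs start out as a table rather than a list of rows, one possible way to build L is sketched below. The body of test here is a hypothetical stand-in (the real server-side function is not shown in the question), and the table name t and its column names are assumptions.

```q
/ hypothetical stand-in for the real server-side function
test:{[d;s;st;et] (d;s;et-st)}

/ the input rows from the question, held as a table (column names assumed)
t:([] date:2014.07.02 2014.07.03; sym:`IBM`AAPL; starttime:09:30:45.000 09:40:50.000; endtime:15:59:59.200 13:52:19.125)

/ turn the table into one (date;sym;starttime;endtime) list per row
L:flip value flip t

/ apply test to each row's argument list
res:test ./: L
```

Each item of res is the result of one call, in the same order as the rows of t.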

Related

Make a list with the quarter and year based on a date range of quarters KDB+/Q

I have a list of date ranges for the past 8 quarters given by the below function
q) findLastYQuarters:{reverse("d"$(-3*til y)+m),'-1+"d"$(-3*-1+til y)+m:3 bar"m"$x}[currentDate;8]
q) findLastYQuarters
2020.01.01 2020.03.31
2020.04.01 2020.06.30
2020.07.01 2020.09.30
2020.10.01 2020.12.31
2021.01.01 2021.03.31
2021.04.01 2021.06.30
2021.07.01 2021.09.30
2021.10.01 2021.12.31
I need to produce a separate list that labels each item in this list by a specific format; the second list would need to be
1Q20,2Q20,3Q20,4Q20,1Q21,2Q21,3Q21,4Q21
This code needs to be able to run on its own, so how can I take the first list as an input and produce the second list? I thought about casting the latter date in the range to a month, dividing it by 3 to get the quarter, and extracting the year, but I couldn't figure out how to actually implement that. Any advice would be much appreciated!
I'm sure there are many ways to solve this; a function like f defined below would do the trick:
q)f:{`$string[1+mod[`month$d;12]%3],'"Q",/:string[`year$d:x[;0]][;2 3]}
q)lyq
2020.01.01 2020.03.31
2020.04.01 2020.06.30
2020.07.01 2020.09.30
2020.10.01 2020.12.31
2021.01.01 2021.03.31
2021.04.01 2021.06.30
2021.07.01 2021.09.30
2021.10.01 2021.12.31
q)f lyq
`1Q20`2Q20`3Q20`4Q20`1Q21`2Q21`3Q21`4Q21
Figured it out.
/ keep only the end date of each quarter
crop:findLastYQuarters[;1];
/ split "yyyy.mm" into year and month, then build e.g. "1Q20"
labelingFunc:{[r] temp:"." vs string "m"$r; string[("J"$temp 1) div 3],"Q",temp[0] 2 3};
labels:labelingFunc each crop;
labels

kdb - how to create sum a list of dynamic columns using functional select

I want to be able to construct (+; (+; `a; `b); `c) given a list of `a`b`c
Similarly, if I have a list of `a`b`c`d, I want to be able to construct another level of nesting, and so on and so forth.
I've been trying to use scan but I can't get it right
q)fsum:(+;;)/
enlist[+;;]/
q)fsum `a`b`c`d
+
(+;(+;`a;`b);`c)
`d
If you only want the raw parse tree output, one way is to form the equivalent string and use parse. This isn't recommended for more complex examples, but in this case it is clear.
q){parse "+" sv string reverse x}[`a`b`c`d]
+
`d
(+;`c;(+;`b;`a))
If you are looking to use this in a functional select, we can use +/ instead of adding each column individually, like how you specified in your example
q)parse"+/[(a;b;c;d)]"
(/;+)
(enlist;`a;`b;`c;`d)
q)f:{[t;c] ?[t;();0b;enlist[`res]!enlist (+/;(enlist,c))]};
q)t:([]a:1 2 3;b:4 5 6;c:7 8 9;d:10 11 12)
q)f[t;`a`b`c]
res
---
12
15
18
q)f[t;`a`b]
res
---
5
7
9
q)f[t;`a`b`c]~?[t;();0b;enlist[`res]!enlist (+;(+;`a;`b);`c)]
1b
You can also get the sum by indexing the table directly to return a list of the columns' values and summing over these. We use (), to turn any input into a list; otherwise, given a single column name, it would sum the values in that column and return only a single value
q)f:{[t;c] sum t (),c}
q)f[t;`a`b`c]
12 15 18
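For the left-nested form the question asks for, one possible sketch is to fold the column names with over and a small lambda that wraps the accumulated tree and the next name in (+;x;y). The names nest and tree are illustrative only, not from the original posts.

```q
/ fold column names into a left-nested parse tree: (+;(+;`a;`b);`c) and so on
nest:{(+;x;y)}/

tree:nest[`a`b`c`d]

/ use the generated tree in a functional select
t:([]a:1 2 3;b:4 5 6;c:7 8 9;d:10 11 12)
r:?[t;();0b;enlist[`res]!enlist nest[`a`b`c]]
```

Here nest[`a`b`c] should match the hand-written (+;(+;`a;`b);`c), so the select returns the same res column as the +/ version above.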

KDB - Automatic function argument behavior with Iterators

I'm struggling to understand the behavior of the arguments in the below scan function. I understand the EWMA calc and have made an Excel worksheet to match in an attempt to understand it, but the kdb syntax is throwing me off in terms of what (and when) x, y and z are. I've referenced Q for Mortals, books and https://code.kx.com/q/ref/over/ and I do understand what's going on in the simpler examples provided.
I understand the EWMA formula based on the Excel calc but how is that translated into the function below?
x = constant, y= passed in values (but also appears to be prior result?) and z= (prev period?)
ewma: {{(y*1-x)+(z*x)} [x]\[y]};
ewma [.25; 15 20 25 30 35f]
15 16.25 18.4375 21.32813 24.74609
Rearranging terms makes it easier to read, but if I were to write this in Excel, I would incorrectly reference the y value column in the addition operator instead of correctly referencing the prev EWMA value.
ewma: {{y+x*z-y} [x]\[y]};
ewma [.25; 15 20 25 30 35f]
15 16.25 18.4375 21.32813 24.74609
[Image: EWMA in Excel formula for auditing]
0N! is useful in these cases for seeing which values are passed. Simply add it to the start of the function to display a variable in the console. E.g. to show what z is being passed in on each run:
q)ewma: {{0N!z;(y*1-x)+(z*x)} [x]\[y]};
q)ewma [.25; 15 20 25 30 35f]
20f
25f
30f
35f
//Or multiple at once
q)ewma: {{0N!(x;y;z);(y*1-x)+(z*x)} [x]\[y]};
q)ewma [.25; 15 20 25 30 35f]
0.25 15 20
0.25 16.25 25
0.25 18.4375 30
0.25 21.32812 35
Edit:
To see why z is holding the outer 'y' values, it is best to think about the simplified example below, which uses just x and y.
//two parameters specified in beginning.
//x initialised as 1 then takes the function result for next run
//y takes value of next value in list
q){0N!(x;y);x+y}\[1;2 3 4]
1 2
3 3
6 4
3 6 10
//in this example only one parameter is passed
//but q takes first value in list as x in this special case
q){0N!(x;y);x+y}\[1 2 3 4]
1 2
3 3
6 4
1 3 6 10
A similar thing is happening in your example. x is fixed by the projection [x] rather than being passed to the iterator, so it keeps the same value on every run.
The inner function's y is initialised with the first value of the outer y list (15f in this case), as in the simplified example above, and z takes the second value of the list on the first run. After that, y takes the result of the previous run and z takes the next value in the list, until the whole list has been passed to the function.
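The same calculation can be sketched with named parameters, which may make the roles easier to follow. The names step, ewma2, alpha, prev and cur are illustrative only; the logic is intended to mirror the ewma function from the question.

```q
/ alpha is fixed by projection; prev is the previous EWMA, cur the next input value
step:{[alpha;prev;cur] (prev*1-alpha)+cur*alpha}

/ projecting on alpha leaves a two-argument function, which scan runs over vals
/ the first item of vals seeds prev, as in the single-list scan example above
ewma2:{[alpha;vals] step[alpha]\[vals]}

res:ewma2[0.25;15 20 25 30 35f]
```

With these names, the inner y of the original corresponds to prev and the inner z to cur.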

Efficient method to query percentile in a list

I've come across the requirement to collect the percentiles from a list a few times:
Within what percentile is a certain number?
What is the nth percentile in a list?
I have written these methods to solve the issue:
/for 1:
percentileWithinThreshold:{[threshold;list] (100 * count where list <= threshold) % count list};
/for 2:
thresholdForPercentile:{[percentile;list] (asc list)[-1 + "j"$((percentile % 100) * count list)]};
They work well for both use cases, but I was thinking this is too common a use case, so Q probably already offers something out of the box that does the same. Any idea if something like that already exists?
'100 xrank' generates percentiles.
q) 100 xrank 1 2 3 4
0 25 50 75
Solution for your second requirement:
q) f:{ y (100 xrank y:asc y) bin x}
Also, note that the result of your second function will not always be the same as the xrank-based one. The reason is that 'xrank' floors the fractional index, which is the usual convention when calculating percentiles, whereas your function rounds the index and then subtracts 1, which ensures that the output is always less than or equal to the value at the input percentile. For example:
q) thresholdForPercentile[63;til 21] / output 12
q) f[63;til 21] / output 13
For the first requirement, there is no built-in function. However, you could improve your function if you keep your input list sorted, because in that case you can use the 'bin' function, which runs faster on big lists.
q) percentileWithinThreshold:{[threshold;list] (100 * 1+list bin threshold) % count list};
Remember that 'bin' can throw a type error if one argument is of float type and the other is an integer, so make sure to cast them consistently inside the function.
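As a small check that the two versions agree, here is a sketch with a made-up list. The names pctWithin, pctWithinSorted and l are illustrative only; both arguments are kept as longs to sidestep any type mismatch with bin.

```q
/ original approach: count the items at or below the threshold
pctWithin:{[threshold;list] (100*count where list<=threshold)%count list}

/ bin-based approach: expects the list to be sorted ascending already
pctWithinSorted:{[threshold;sorted] (100*1+sorted bin threshold)%count sorted}

l:3 1 4 1 5 9 2 6
a:pctWithin[4;l]
b:pctWithinSorted[4;asc l]
```

Five of the eight values are at or below 4, so both a and b come out as 62.5.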
qtln:{[x;y;z]cf:(0 1;1%2 2;0 0;1 1;1%3 3;3%8 8) z-4;n:count y:asc y;?[hf<1;first y;last y]^y[hf-1]+(h-hf)*y[hf]-y -1+hf:floor h:cf[0]+x*n+1f-sum cf}
qtl:qtln[;;8];

Understanding how to read each-right and each-left combined in kdb

From Q for Mortals, I'm struggling to understand how to read this, and understand it logically.
1 2 3,/:\:10 20
I understand the result is a cross product when in full form: raze 1 2 3,/:\:10 20.
But reading from left to right, I'm currently lost at understanding what this yields (in my head)
\:10 20
combined with 1 2 3,/: ??
Help in understanding how to read this clearly (in words or clear logic) would be appreciated.
I found myself saying the following in my head whilst I program the syntax in q. q works from right to left.
Internal Monologue -> Join the string on the right onto each of the strings on the left
code -> "ABC",\:"-D"
result -> "A-D"
"B-D"
"C-D"
I think that's an easy way to understand it. 'join' can be replaced with whatever...
Internal Monologue -> Does the string on the right match any of the strings on the left
code -> ("Cat";"Dog";"CAT";"dog")~\:"CAT"
result -> 0010b
Each-right is the same concept and combining them is straightforward also;
Internal Monologue -> Does each of the strings on the right match each of the strings on the left
code -> ("Cat";"Dog";"CAT";"dog")~\:/:("CAT";"Dog")
result -> 0010b
0100b
So in your example 1 2 3,/:\:10 20 - you're saying 'Join each of the elements on the right to each of the elements on the left'
Hope this helps!!
EDIT To add a real world example.... - consider the following table
q)show tab:([] upper syms:10?`2; names:10?("Robert";"John";"Peter";"Jenny"); amount:10?til 10)
syms names amount
--------------------
CF "Peter" 8
BP "Robert" 1
IC "John" 9
IN "John" 5
NM "Peter" 4
OJ "Jenny" 6
BJ "Robert" 6
KH "John" 1
HJ "Peter" 8
LH "John" 5
q)
If you want to get all records where the name is Robert, you can do: select from tab where names like "Robert"
But if you want to get the results where the name is either Robert or John, then it is a perfect scenario to use our each-left and each-right.
Consider the names column: it's a list of strings (a list where each element is a list of chars). What we want to ask is 'does any of the strings in the names column match any of the strings we want to find', which translates to (namesList)~\:/:(list;of;names;to;find). Here are the steps:
q)(tab`names)~\:/:("Robert";"John")
0100001000b
0011000101b
From that result we want a single list of booleans where each element is true if it is true for Robert OR John. For example, at index 1 it's 1b for Robert and 0b for John, so the value at index 1 in our result should be 1b. Index 2 should be 1b, index 3 should be 1b, index 4 should be 0b, etc. To do this, we can apply the any function (or max or sum!). The result is then:
q)any(tab`names)~\:/:("Robert";"John")
0111001101b
Putting it all together, we get;
q)select from tab where any names~\:/:("Robert";"John")
syms names amount
--------------------
BP "Robert" 1
IC "John" 9
IN "John" 5
BJ "Robert" 6
KH "John" 1
LH "John" 5
q)
Firstly, q is executed (and hence generally read) right to left. This means that it's interpreting the \: as a modifier to be applied to the previous function, which itself is a simple join modified by the /: adverb. So the way to read this is "Apply join each-right to each of the left-hand arguments."
In this case, you're applying the two adverbs to the join - \:10 20 on its own has no real meaning here.
I find it helpful to also look at the converse case 1 2 3,\:/:10 20. Running that code produces a 2-item list (one item per right-hand argument), each item holding three pairs, which I'd describe more like "apply join each-left to each of the right-hand arguments". I hope that makes sense.
An alternative syntax which also might help is ,/:\:[1 2 3;10 20] - this might be useful as it makes it very clear what the function you're applying is, and is equivalent to your in-place notation.
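To compare the two orderings side by side, the shapes can be inspected directly. This is a small sketch; the variable names a and b are illustrative only.

```q
/ for each left item, join it to each right item: 3 items, each holding 2 pairs
a:1 2 3,/:\:10 20

/ for each right item, join each left item to it: 2 items, each holding 3 pairs
b:1 2 3,\:/:10 20

shapeA:(count a;count each a)
shapeB:(count b;count each b)

/ razing the first form gives the cross product; the second form is its transpose
isCross:raze[a]~1 2 3 cross 10 20
isFlip:b~flip a
```

So the outer iterator (the rightmost one in the expression) decides which argument the top-level items correspond to.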