Explanation of code that constructs correlation matrix - kdb

I am referring to this answer.
The code to construct a correlation matrix given a table of columns is
u cor/:\:u:flip t where t is a table.
Reading right to left, I understand up till u:flip t. May I please ask for an explanation on what the rest of the code does?
Thanks

If you substitute cor with something that gives more visual output, such as join with two vectors, it is easier to see what the derived function cor/:\: is doing
q)"123","abc" // simple join
"123abc"
q)"123",/:"abc" // join left arg to each item of right arg
"123a"
"123b"
"123c"
q)"123",/:\:"abc" // join each item of left arg to each item of right
"1a" "1b" "1c"
"2a" "2b" "2c"
"3a" "3b" "3c"
Back to a simple example of cor
q)show t:([]a:3?1.0;b:3?1.0;c:3?1.0)
a b c
-------------------------------
0.7935513 0.6377554 0.3573039
0.2037285 0.03845637 0.02547383
0.7757617 0.8972357 0.688089
q)u cor/:\:u:flip t
| a b c
-| -----------------------------
a| 1 0.9474878 0.8529413
b| 0.9474878 1 0.975085
c| 0.8529413 0.975085 1
q)show data:value flip t; // extract the data for clarity
0.7935513 0.2037285 0.7757617
0.6377554 0.03845637 0.8972357
0.3573039 0.02547383 0.688089
q)cor[data 0;]each data // first row cor each row
1 0.9474878 0.8529413
q)cor[data 1;]each data // second row cor each row
0.9474878 1 0.975085
q)cor[data 2;]each data // last row cor each row
0.8529413 0.975085 1
q){cor[x]each data}each data // all at once
1 0.9474878 0.8529413
0.9474878 1 0.975085
0.8529413 0.975085 1
q)data cor/:\:data // derived function much nicer
1 0.9474878 0.8529413
0.9474878 1 0.975085
0.8529413 0.975085 1
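As a cross-check outside q, the same nesting can be sketched in Python. This is only an illustration of what ,/:\: does with the "123" and "abc" example above; the helper names each_right and each_left_each_right are made up for this sketch and are not q or library functions.

```python
def each_right(f, x, ys):
    # q: x f/: ys  -> apply f between the whole of x and each item of ys
    return [f(x, y) for y in ys]

def each_left_each_right(f, xs, ys):
    # q: xs f/:\: ys -> for each item of xs, run each-right over ys
    return [each_right(f, x, ys) for x in xs]

def join(a, b):
    # stand-in for q's join (,) on strings
    return a + b

rows = each_right(join, "123", "abc")
grid = each_left_each_right(join, "123", "abc")
print(rows)  # ['123a', '123b', '123c']
for row in grid:
    print(row)
```

Replacing join with a correlation function gives the matrix: one row per item on the left, one column per item on the right.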

If you are working with correlation matrices, it is worth reading up on what they are; that gives some context for the inputs, the outputs and the code.
https://www.displayr.com/what-is-a-correlation-matrix/?msclkid=f68768aeab8e11ecbca30d34e2ba880f
In this case, we are finding the correlation between the column dictionary u:flip t and itself.
The rest of the query consists of the function cor and two kdb+ iterators, each right /: and each left \:.
https://code.kx.com/q/ref/cor/?msclkid=748e645bab8d11ecab715a544f547398
https://code.kx.com/q/wp/iterators/?msclkid=d2172906ab8c11ecac2b46902bbe505d
Each right applies the function between the whole left-hand argument and each item of the right-hand argument
q)1 ,/: 10 20 30
1 10
1 20
1 30
while each left applies the function between each item of the left-hand argument and the whole right-hand argument
q)1 2 3 ,\: 10
1 10
2 10
3 10
If we use both together, as illustrated below, we join (,) each element of the left-hand list (\:) with each element of the right-hand list (/:)
q)1 2 3,/:\:10 20 30
1 10 1 20 1 30
2 10 2 20 2 30
3 10 3 20 3 30
Then u cor/:\:u:flip t can be understood as taking each element of u and finding its correlation with every element of u, which is what cor/:\: achieves.
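For readers who want to check the idea outside q, here is a minimal Python sketch of "every column against every column". It assumes the columns are held in a dict of lists (playing the role of u:flip t) and uses a hand-written Pearson correlation; pearson and cor_matrix are names invented for this sketch, and the sample values are the table t from the answer above.

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation of two equal-length sequences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cor_matrix(columns):
    # Outer loop plays the role of each left, inner loop of each right
    return {r: {c: pearson(columns[r], columns[c]) for c in columns}
            for r in columns}

u = {"a": [0.7935513, 0.2037285, 0.7757617],
     "b": [0.6377554, 0.03845637, 0.8972357],
     "c": [0.3573039, 0.02547383, 0.688089]}
m = cor_matrix(u)
for name, row in m.items():
    print(name, [round(v, 4) for v in row.values()])
```

The result is symmetric with ones on the diagonal, like the q output.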

Related

read function name and args from a table row and iterate and execute it and store the output to a single table

I have a data.csv which looks like below having a function name and a dictionary.
function,args
fun1,(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`AAPL;0.8;10;1)
fun2,(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`MSFT`ZAK;0.8;10;1)
fun3,(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`NAFK;0.8;10;1)
I read the data with
tab:("S*";enlist ",") 0: `:data.csv
Now, I want to iterate all rows from the table like below and call them and save all 3 results to a single table res
fun1 [(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`AAPL;0.8;10;1)]
fun2 [(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`MSFT`ZAK;0.8;10;1)]
fun3 [(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`NAFK;0.8;10;1)]
Below is my code snippet to iterate over f1[args], f2[args] and f3[args] and combine all 3 results into a single table. I used a loop here, but is there something better than a loop?
cnt:(count table); //get count of table
ino:0; //initialize out counter to 0
tab::flip (`date`sym`ric!(`date$();`symbol$();`symbol$())); //create a global table so it can hold iteration data
//perform iteration where f1[args],f2[args],f3[args]=tab
while[ino<cnt;
data:exec .[first function;args] from table where i=ino;
upsert[`tab;data];
ino:ino+1
]
//tab now has all the iteration data of f1 f2 f3
tab
If your inputs are correctly ordered for all functions, the following simple example should work
q)f1:{x+y+z+2};f2:{x*y*z*22};f3:{x%y%z%42};
q)tab:([]func:`f1`f2`f3;args:`x`y`z!/:3 cut til 9)
q)tab
func args
-----------------
f1 `x`y`z!0 1 2
f2 `x`y`z!3 4 5
f3 `x`y`z!6 7 8
q)update res:func .'get'[args]from tab
func args res
---------------------------
f1 `x`y`z!0 1 2 5
f2 `x`y`z!3 4 5 1320
f3 `x`y`z!6 7 8 0.1632653
NB: if your loaded args are strings, you'll want to parse them first.
For example, taking the above again:
q)tab:update .Q.s1'[args]from tab
q)tab
func args
-------------------
f1 "`x`y`z!0 1 2"
f2 "`x`y`z!3 4 5"
f3 "`x`y`z!6 7 8"
q)meta tab
c | t f a
----| -----
func| s
args| C
q)tab:update'[reval;parse]'[args]from tab
q)tab
func args
-----------------
f1 `x`y`z!0 1 2
f2 `x`y`z!3 4 5
f3 `x`y`z!6 7 8
q)meta tab
c | t f a
----| -----
func| s
args|
q)update res:func .'get'[args]from tab
func args res
---------------------------
f1 `x`y`z!0 1 2 5
f2 `x`y`z!3 4 5 1320
f3 `x`y`z!6 7 8 0.1632653
reval in the above will try to stop anything dodgy being run, but I would avoid parsing code straight from files where possible
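The same "look up the function by name, apply it to that row's arguments, collect the results" pattern can be sketched in Python. This is an illustration rather than a translation of the q above: the registry dict and the rows list are assumptions made for the sketch, and f1, f2 and f3 mirror the toy functions from the answer (q evaluates x%y%z%42 right to left).

```python
def f1(x, y, z):
    return x + y + z + 2

def f2(x, y, z):
    return x * y * z * 22

def f3(x, y, z):
    # q's x%y%z%42 is x % (y % (z % 42))
    return x / (y / (z / 42))

# Hypothetical registry: only names listed here can be called from the file
registry = {"f1": f1, "f2": f2, "f3": f3}

# Hypothetical rows as they might look after loading and parsing the CSV
rows = [("f1", {"x": 0, "y": 1, "z": 2}),
        ("f2", {"x": 3, "y": 4, "z": 5}),
        ("f3", {"x": 6, "y": 7, "z": 8})]

# One pass over the rows instead of a counter-driven while loop
results = [{"func": name, "args": args, "res": registry[name](**args)}
           for name, args in rows]
for r in results:
    print(r["func"], r["res"])
```

Looking names up in an explicit registry, instead of evaluating text from the file, is one way to limit what a data file can execute.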

How do I calculate a rolling 30 day window in KDB?

I have a keyed table of the form:
t | ar av mr mv
-----------------------------| ----------------------------------------
2016.01.04D09:51:00.000000000| -0.001061315 513 -0.01507338 576
2016.01.04D11:37:00.000000000| -0.0004846135 618 -0.001100514 583
2016.01.04D12:04:00.000000000| -0.0009708739 1619 -0.001653045 1000
I want to calculate the 30 day rolling correlation ar cor mr.
I'm stuck trying to create a self join with wj, but I'm not getting anywhere. Is this the way to do it?
You could do something like:
/-Function which creates the rolling windows (w:window size, s:list)
q)f:{[w;s] (w-1)_({ 1_x,y }\[w#0;s])}
/-e.g.
q)f[3;til 5]
0 1 2
1 2 3
2 3 4
/-Apply cor to each rolling window of 30 rows (30 observations, not 30 calendar days) as below:
q)ar:exec ar from t;
q)mr:exec mr from t;
q)cor'[f[30;ar]; f[30; mr]]
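To make the difference between a 30-row window and a 30-day window concrete, here is a hedged Python sketch of both. All names (pearson, windows, rolling_cor_rows, rolling_cor_days) are invented for the sketch; the time-based version assumes each row should be correlated over all rows in the preceding 30 days, which may or may not be the definition wanted in the question.

```python
from datetime import datetime, timedelta
from math import sqrt

def pearson(x, y):
    # Pearson correlation; None when it is undefined (zero variance)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def windows(w, s):
    # Same result as f[w;s] above: every run of w consecutive items
    s = list(s)
    return [s[i:i + w] for i in range(len(s) - w + 1)]

def rolling_cor_rows(w, xs, ys):
    # Row-count windows, like cor'[f[w;ar];f[w;mr]]
    return [pearson(a, b) for a, b in zip(windows(w, xs), windows(w, ys))]

def rolling_cor_days(times, xs, ys, days=30):
    # Time-based windows: rows with timestamp in (t - days, t]
    span = timedelta(days=days)
    out = []
    for t in times:
        idx = [j for j, u in enumerate(times) if t - span < u <= t]
        out.append(pearson([xs[j] for j in idx], [ys[j] for j in idx]))
    return out

print(windows(3, range(5)))  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
times = [datetime(2016, 1, 1), datetime(2016, 1, 2),
         datetime(2016, 1, 3), datetime(2016, 3, 1)]
by_days = rolling_cor_days(times, [1, 2, 3, 10], [2, 4, 6, 0])
print(by_days)
```

The time-based loop is quadratic in the number of rows, so it is only meant to show the window definition, not to be fast.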

How to traverse M*N grid in KDB

How can I traverse an m*n grid in q, where each move goes up, down or diagonally,
and find how many possible ways the end point can be reached?
Like below:
0
|
------- ------
| | |
( 0 1) (1 1) (1 0)
| . |
------ ----- ------ -----
| | . | |
( 0 1) (1 0) ( 1 1) (2 0)
....
(2 2) ..................... (2 2)
One way of doing it is to use .z.s to recursively call the initial function with different arguments, summing the results to give the total number of paths.
f:{
// When you reach a wall, there is only one way to corner so return valid path
if[any 1=(x;y);:1];
// Otherwise spawn 3 paths - one up, one right and one diagonally
:.z.s[x-1;y] + .z.s[x;y-1] + .z.s[x-1;y-1]
}
q)f[2;2]
3
q)f[2;3]
5
q)f[3;3]
13
If you are travelling along the edges and not the squares you can change the first line to:
if[any 0=(x;y);:1];
A closed form solution is just finding the Delannoy Number, which could be implemented something like this when you are travelling along edges.
d:{
k:1+min(x;y);
f:{prd 1+til x};
comb:{[f;m;n] f[m] div f[n]*f[m-n]}[f];
(sum/) (2 xexp til k) * prd (x;y) comb/:\: til k
}
q)d[3;3]
63f
This is much quicker for larger boards, as I think the complexity of the first solution is O(3^(m+n)) while the complexity of the second is O(m*n)
q)\t f[7;7]
13
q)\t f[10;10]
1924
q)\t d[7;7]
0
q)\t d[100;100]
1
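Both approaches can be cross-checked with a short Python sketch. It uses the edge-based base case (0 instead of 1) so that the recursion and the closed form count the same thing; paths and delannoy are names chosen for the sketch. The recursion is memoized here, which is an addition to the q version and keeps the number of distinct calls to roughly m*n.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def paths(x, y):
    # Edge-based base case: along a wall there is exactly one way home
    if x == 0 or y == 0:
        return 1
    # Otherwise three choices: one step in x, one in y, or one diagonal
    return paths(x - 1, y) + paths(x, y - 1) + paths(x - 1, y - 1)

def delannoy(m, n):
    # Closed form: sum over k of 2^k * C(m, k) * C(n, k)
    return sum(2 ** k * comb(m, k) * comb(n, k)
               for k in range(min(m, n) + 1))

print(paths(3, 3), delannoy(3, 3))  # 63 63
```

With the square-based base case used in f, the counts shift by one in each dimension, so f[3;3] corresponds to paths(2, 2).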

How can I calculate a correlation matrix?

I have a table with columns a, b, c. Can I calculate the correlation matrix of cor[a;a], cor[a;b], cor[a;c] using functional form somehow?
?[table; (); 0b; (`aa`ab`ac)!((cor; `a; `a); (cor; `a; `b); (cor; `a; `c))]
How can I generate the list of the last argument?
(cor; a;b)
q)show t:([]a:5?1.0;b:5?1.0;c:5?1.0)
a b c
------------------------------
0.389056 0.949975 0.6919531
0.391543 0.439081 0.4707883
0.08123546 0.5759051 0.6346716
0.9367503 0.5919004 0.9672398
0.2782122 0.8481567 0.2306385
q)u cor/:\:u:flip t
| a b c
-| --------------------------------
a| 1 -0.1328262 0.6671159
b| -0.1328262 1 -0.1830702
c| 0.6671159 -0.1830702 1
So manually typed out form:
q)t:([] a:10?10; b:10?10; c:10?10)
q)?[t;();0b;`aa`ab`ac!((cor;`a;`a);(cor;`a;`b);(cor;`a;`c))]
aa ab ac
-----------------------
1 -0.2530506 0.7966834
If you wanted to generate the last argument, assuming you wanted all permutations of first column combined with all columns:
q)a:{(`$raze'[string x])!(cor),/:x}{x[0],/:x}cols t;
q)?[t;();0b;a]
aa ab ac
-----------------------
1 -0.2530506 0.7966834
If you wanted all column permutations:
q)a:{(`$raze'[string x])!(cor),/:x}{x cross x}cols t
q)?[t;();0b;a]
aa ab ac ba bb bc ca cb cc
----------------------------------------------------------------------
1 -0.2530506 0.7966834 -0.2530506 1 -0.268787 0.7966834 -0.268787 1
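The construction of the last argument is just "build name pairs, then turn each pair into a result name and a (cor; x; y) triple". A Python sketch of that bookkeeping, with spec as a made-up helper and the string "cor" standing in for the q function:

```python
from itertools import product

cols = ["a", "b", "c"]

def spec(pairs):
    # Result name -> (function, col1, col2), mirroring `aa`ab!((cor;`a;`a);...)
    return {x + y: ("cor", x, y) for x, y in pairs}

# First column against every column, like {x[0],/:x}
first_vs_all = [(cols[0], c) for c in cols]
# Every ordered pair of columns, like {x cross x}
all_pairs = list(product(cols, repeat=2))

print(list(spec(first_vs_all)))  # ['aa', 'ab', 'ac']
print(list(spec(all_pairs)))
```

Concatenating names this way only stays unambiguous while column names cannot collide (for example, single-letter names as here).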

Creating open-high-low-close (ohlc) bars from tick data in Matlab

I have a CSV file 'XPQ12.csv' of futures tick data in the following form:
20090312 30:14.0 717.25 1 E
20090312 30:15.0 718.47 1 E
20090312 30:17.0 717.25 1 E
20090312 30:32.0 718.42 1 E
20090312 30:49.0 715.32 1 E
20090312 30:58.0 717.57 1 E
20090312 31:06.0 716.65 3 E
20090312 31:12.0 718.35 2 E
20090312 31:45.0 721.14 1 E
20090312 31:52.0 719.24 1 E
20090312 32:11.0 717.02 6 E
20090312 32:29.0 717.14 1 E
20090312 32:35.0 717.34 1 E
20090312 32:55.0 717.26 1 E
(The first column is the yearmonthdate, the second column is the minute:second:tenthofsecond, the third column is the price, the fourth column is the number of contracts traded, and the fifth indicates if the trade was electronic or in a pit). In my actual data set, I may have thousands of price quotes within any given minute.
I read the file using the following code:
fid = fopen('C:\Program Files\MATLAB\R2013a\XPQ12.csv','r');
[c] = fscanf(fid, '%d,%d:%d.%d,%f,%d,%c')
Which outputs:
20090312
30
14
0
717.25
1
69
20090312
30
15
0
718.47
3
69
.
.
.
(the 69s are the ASCII code for E, I believe)
Now I want to cut this up into one minute ohlc bars, so that for each minute, I record what the first, highest, lowest, and last price was within that minute. I'd really like to know the best way to go about this.
My original idea was to store the sequence of minutes in a vector d, and while working through the data, each time the number at the end of d changed I would record the corresponding price as an open, record the previous price as a close for the last bar, and find the largest and smallest prices within each open and close.
c(2) is the first minute, so I said:
d(1)=c(2);
and then noting that I'd always be counting by 7 before getting to the next minute, I said:
Nrows = numel(textread('XPQ12.csv','%1c%*[^\n]')); % counts rows in file
for i=1:Nrows
if mod(i-2,7)== 0;
d(end+1)=c(i);
end
end
which should fill up d with all the minutes:
30
30
30
30
30
30
31
31
31
31
32
32
32
32
in the case of the example data. I'm kind of lost what to do from here, or if what I'm doing is on the right track.
From where you are:
Minutes = c(2:7:end);
Prices = c(5:7:end);
% Trim both vectors to a common length in case the last record is incomplete
if (length(Prices)>length(Minutes))
    Prices=Prices(1:length(Minutes));
elseif (length(Prices)<length(Minutes))
    Minutes=Minutes(1:length(Prices));
end
% Where the minute wraps from 59 back to 0, add 60 so minutes keep increasing
OverflowValues=1+find(Minutes(2:end)==0 & Minutes(1:end-1)==59);
for v=length(OverflowValues):-1:1
    Minutes(OverflowValues(v):end)=Minutes(OverflowValues(v):end)+60;
end
% Take the distinct minutes only after the wrap-around correction
MinuteValues=unique(Minutes);
Highs=zeros(1,length(MinuteValues));
Lows=zeros(1,length(MinuteValues));
First=zeros(1,length(MinuteValues));
Last=zeros(1,length(MinuteValues));
for v=1:length(MinuteValues)
    Highs(v) = max(Prices(Minutes==MinuteValues(v)));
    Lows(v) = min(Prices(Minutes==MinuteValues(v)));
    First(v) = Prices(find(Minutes==MinuteValues(v),1,'first'));
    Last(v) = Prices(find(Minutes==MinuteValues(v),1,'last'));
end
Using textread would make this easier for you, as mentioned.
(If you are lost at this stage, I don't think accumarray, as mentioned in the comments, is the best place to start!)
By the way, this assumes that the time column holds only minutes (wrapping from 59 back to 0) and that there is no hours field in there somewhere. Otherwise this won't work at all.
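The grouping logic itself is language-independent, so here is a minimal Python sketch of the same idea for comparison. It assumes comma-separated lines in the layout implied by the fscanf format string, ticks already in time order, and bars keyed on (date, minute) since the sample has no hours field; parse_tick and ohlc_bars are names invented for the sketch.

```python
def parse_tick(line):
    # "20090312,30:14.0,717.25,1,E" -> (("20090312", 30), 717.25)
    date, clock, price, _size, _venue = line.strip().split(",")
    minute = int(clock.split(":")[0])
    return (date, minute), float(price)

def ohlc_bars(lines):
    bars = {}  # insertion-ordered: one entry per (date, minute)
    for line in lines:
        key, price = parse_tick(line)
        bar = bars.get(key)
        if bar is None:
            # First tick of the minute opens the bar
            bars[key] = {"open": price, "high": price,
                         "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick seen so far
    return bars

sample = [
    "20090312,30:14.0,717.25,1,E", "20090312,30:15.0,718.47,1,E",
    "20090312,30:17.0,717.25,1,E", "20090312,30:32.0,718.42,1,E",
    "20090312,30:49.0,715.32,1,E", "20090312,30:58.0,717.57,1,E",
    "20090312,31:06.0,716.65,3,E", "20090312,31:12.0,718.35,2,E",
    "20090312,31:45.0,721.14,1,E", "20090312,31:52.0,719.24,1,E",
    "20090312,32:11.0,717.02,6,E", "20090312,32:29.0,717.14,1,E",
    "20090312,32:35.0,717.34,1,E", "20090312,32:55.0,717.26,1,E",
]
bars = ohlc_bars(sample)
for key, bar in bars.items():
    print(key, bar)
```

If the real file spans more than an hour, the key would need an hour component as well, for the same reason given in the note above.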