How do I apply transformations to groups in KDB?

I have a table of the form:
sym date o h l c v d sp ao ah al ac av
-------------------------------------------------------------------------------------------------------
A 1999.11.18 45.5 50 40 44 4.47399e+07 0 1 30.10473 33.08212 26.46569 29.11226 4.47399e+07
A 1999.11.19 42.94 43 39.81 40.38 1.08971e+07 0 1 28.41092 28.45062 26.33998 26.71712 1.08971e+07
A 1999.11.22 41.31 44 40.06 44 4705200 0 1 27.33244 29.11226 26.50539 29.11226 4705200
I'm trying to bring the previous close down to today:
select sym, date, c, prev c from daily
but this doesn't respect the sym ticker groups. How do I apply this transformation at the ticker level?
Edit:
Also, is there a way that I can enforce a sort on date with this schema?

This is a bit messy, but is this roughly what you are looking for?
q)t: ([] sym: `a`b`a`b`a; date: 2021.01.01 2021.01.01 2021.01.02 2021.01.02 2021.01.03; c: 10 11 8 9 10)
q)t
sym date c
-----------------
a 2021.01.01 10
b 2021.01.01 11
a 2021.01.02 8
b 2021.01.02 9
a 2021.01.03 10
q)ungroup select date, c, prevClose: prev c by sym from `date xasc t
sym date c prevClose
---------------------------
a 2021.01.01 10
a 2021.01.02 8 10
a 2021.01.03 10 8
b 2021.01.01 11
b 2021.01.02 9 11
If not, could you give some example output?
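Applied to the daily table from the question, the same pattern would look like this (a rough sketch, assuming the column names in your schema):
q)ungroup select date, c, prevClose: prev c by sym from `date xasc daily
The `date xasc also covers your edit: sorting on date before taking prev enforces date order within each sym group.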

Why is my conforming dictionary not getting turned into a table?

Let's say I have a table:
m:([] t: raze 3#'(2021.01.04+til 5); sym:15#`A`B`C; c: til 15)
t sym c
-----------------
2021.01.04 A 0
2021.01.04 B 1
2021.01.04 C 2
2021.01.05 A 3
2021.01.05 B 4
When I try to pivot it:
exec t!c by sym:sym from m
sym|
---| -----------------------------------------------------------------
A | 2021.01.04 2021.01.05 2021.01.06 2021.01.07 2021.01.08!0 3 6 9 12
B | 2021.01.04 2021.01.05 2021.01.06 2021.01.07 2021.01.08!1 4 7 10 13
C | 2021.01.04 2021.01.05 2021.01.06 2021.01.07 2021.01.08!2 5 8 11 14
I'd expect to get a table back, with columns sym, but I don't. What am I doing wrong?
If you're after a pivot with columns of sym, you would want the following:
q)exec sym!c by t:t from m
t | A B C
----------| --------
2021.01.04| 0 1 2
2021.01.05| 3 4 5
2021.01.06| 6 7 8
2021.01.07| 9 10 11
2021.01.08| 12 13 14
It's because your column names have to be symbols:
q)exec(`$string t)!c by sym:sym from m
sym| 2021.01.04 2021.01.05 2021.01.06 2021.01.07 2021.01.08
---| ------------------------------------------------------
A | 0 3 6 9 12
B | 1 4 7 10 13
C | 2 5 8 11 14
These would be terrible column names though, so I would use .Q.id:
q).Q.id exec(`$string t)!c by sym:sym from m
sym| a20210104 a20210105 a20210106 a20210107 a20210108
---| -------------------------------------------------
A | 0 3 6 9 12
B | 1 4 7 10 13
C | 2 5 8 11 14
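If you then want sym back as an ordinary column rather than a key, unkeying with 0! is one more step (a small sketch built on the pivot above):
q)0!.Q.id exec(`$string t)!c by sym:sym from m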
It sounds like this isn't what you actually want though, so maybe Matthew's answer is more relevant. My answer just explains why it didn't look like what you thought.

How do I write a function that consumes columns in a query in KDB?

I've got a query of the form:
select lR cor mR from j
This works fine, and I get a single number.
When I try to use my custom function:
center:{x - avg raze x}
fit1:{
  xc: 0^center[x];
  yc: 0^center[y];
  beta: raze yc lsq xc;
  intercept: (avg raze y) - (beta * avg raze x);
  raze beta
  }
select fit1[lR;mR] from j
I get a column of numbers, as it appears to have applied the function row-wise, rather than column wise.
Two questions:
How can I go about fixing this?
Is it possible to see the source code of cor so I can learn, or is it closed?
Edit
t:([] date:tt; x:xx; y:yy)
date x y
--------------
2021.01.01 0 4
2021.01.02 1 4
2021.01.03 2 4
2021.01.04 3 4
2021.01.05 4 4
2021.01.06 5 4
2021.01.07 6 4
2021.01.08 7 4
2021.01.09 8 4
2021.01.10 9 4
select fit1[x;y] from t
Updated now with a proper example:
center:{x - avg raze x}
fit1:{
  xc: 0^center[x];
  yc: 0^center[y];
  beta: raze yc lsq xc;
  intercept: (avg raze y) - (beta * avg raze x);
  raze beta
  }
tt: 2021.01.01 + til 10
xx: 9h$til 10
yy: ((xx)*3) + 4
t:([] date:tt; x:xx; y:yy)
date x y
---------------
2021.01.01 0 4
2021.01.02 1 7
2021.01.03 2 10
2021.01.04 3 13
2021.01.05 4 16
2021.01.06 5 19
2021.01.07 6 22
2021.01.08 7 25
2021.01.09 8 28
2021.01.10 9 31
q)fit1[enlist xx;enlist yy]
,3f
q)select fit1[x;y] from t
y
----
-4.5
-3.5
-2.5
-1.5
-0.5
0.5
1.5
2.5
3.5
4.5
I don't think there is enough information provided to give a definitive fix for your issue, but note that your function creates an intercept variable which it doesn't use in the final return, so that line is currently pointless. Your custom function is also not designed to return an atom the way cor does.
Some keywords show their underlying k code if you just enter the keyword in the q console. If you get the keyword back then it is hidden, but for cor and cov the equivalent code is provided in the docs:
https://code.kx.com/q/ref/cor/
https://code.kx.com/q/ref/cov/
To answer your first question you can do:
q)fit1[j`lR;j`mR]
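Building on your own working call fit1[enlist xx;enlist yy], the same idea works inside a query by enlisting the columns, so fit1 receives each column as a one-row matrix rather than a plain vector (a sketch against your example table t):
q)exec fit1[enlist x;enlist y] from t
This should return the same ,3f as the standalone call, because exec hands the whole column vectors to fit1 in a single application.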

How to join tables in Matlab (2018) by matching time intervals?

I have two tables A and B. I want to join them based on their validity time intervals.
A has product quality (irregular times) and B has hourly settings during the production period. I need to create a table like C that includes the parameters p1 and p2 for all of A's RefDates that fall in the time range between B's ValidFrom and ValidTo.
A
RefDate result
'11-Oct-2017 00:14:00' 17
'11-Oct-2017 00:14:00' 19
'11-Oct-2017 00:20:00' 5
'11-Oct-2017 01:30:00' 25
'11-Oct-2017 01:30:00' 18
'11-Oct-2017 03:03:00' 28
B
ValidFrom ValidTo p1 p2
'11-Oct-2017 00:13:00' '11-Oct-2017 01:12:59' 2 1
'11-Oct-2017 01:13:00' '11-Oct-2017 02:12:59' 3 1
'11-Oct-2017 02:13:00' '11-Oct-2017 03:12:59' 4 5
'11-Oct-2017 03:13:00' '11-Oct-2017 04:12:59' 6 1
'11-Oct-2017 04:13:00' '11-Oct-2017 05:12:59' 7 9
I need to get something like this.
C
RefDate res p1 p2
'11-Oct-2017 00:14:00' 17 2 1
'11-Oct-2017 00:14:00' 19 2 1
'11-Oct-2017 00:20:00' 5 2 1
'11-Oct-2017 01:30:00' 25 3 1
'11-Oct-2017 01:30:00' 18 3 1
'11-Oct-2017 03:03:00' 28 4 5
I know how to do this in SQL, and I think I have figured out how to do it row by row in MATLAB, but that is horribly slow. The data set is rather large. I just assume there must be a more elegant way that I couldn't find.
Something that caused many of my approaches to fail is that the RefDate column is not unique.
Edit:
The real tables have thousands of rows and hundreds of variables.
C (in reality)
RefDate res res2 ... res200 p1 p2 ... p1000
11-Oct-2017 00:14:00 17 2 1
11-Oct-2017 00:14:00 19 2 1
11-Oct-2017 00:20:00 5 2 1
11-Oct-2017 01:30:00 25 3 1
11-Oct-2017 01:30:00 18 3 1
11-Oct-2017 03:03:00 28 4 5
This can actually be done in a single line of code. Assuming your ValidTo value always ends immediately before the ValidFrom in the next row (which it does in your example), you only need to use your ValidFrom values. First, convert those and your RefDate values to serial date numbers using datenum. Then use the discretize function to bin the RefDate values using the ValidFrom values as the edges, which will give you the row index in B that contains each time in A. Then use that index to extract the p1 and p2 values and append them to A:
>> C = [A B(discretize(datenum(A.RefDate), datenum(B.ValidFrom)), 3:end)]
C =
RefDate result p1 p2
______________________ ______ __ __
'11-Oct-2017 00:14:00' 17 2 1
'11-Oct-2017 00:14:00' 19 2 1
'11-Oct-2017 00:20:00' 5 2 1
'11-Oct-2017 01:30:00' 25 3 1
'11-Oct-2017 01:30:00' 18 3 1
'11-Oct-2017 03:03:00' 28 4 5
The above solution should work for any number of columns pN in B.
If there are any times in A that don't fall in any of the ranges in B, you will have to break the solution into multiple lines so you can check whether or not the index returned from discretize contains NaN values. Assuming you want to exclude those rows from C, this would be the new solution:
index = discretize(datenum(A.RefDate), datenum(B.ValidFrom));
C = [A(~isnan(index), :) B(index(~isnan(index)), 3:end)];
The following code does exactly what you are asking for:
% convert to datetime
A.RefDate = datetime(A.RefDate);
B.ValidFrom = datetime(B.ValidFrom);
B.ValidTo = datetime(B.ValidTo);
% for each row in A, find the matching row in B
i = cellfun(@find, arrayfun(@(x) (x >= B.ValidFrom) & (x <= B.ValidTo), A.RefDate, 'UniformOutput', false), 'UniformOutput', false);
% find rows in A that where not matched
j = cellfun(@isempty, i, 'UniformOutput', false);
% build the result
C = [B(cell2mat(i),:) A(~cell2mat(j),:)];
% display output
C

kdb lookup: get value from table that is mapped to smallest key larger than x

Assuming I have a dict
d:flip(100 200 400 800 1600; 1 3 4 6 10)
how can I create a lookup function that returns the value of the smallest key that is larger than x? Given a table
tbl:flip `sym`val!(`a`b`c`d; 50 280 1200 1800)
I would like to do something like
{[x] : update new:fun[x[`val]] from x} each tbl
to end up at a table like this
tbl:flip `sym`val`new!(`a`b`c`d; 50 280 1200 1800; 1 4 10 0N)
sym val new
a 50 1
b 280 4
c 1200 10
d 1800
Stepped dictionaries may help:
http://code.kx.com/q/cookbook/temporal-data/#stepped-attribute
q)d:`s#-0W 100 200 400 800 1600!1 3 4 6 10 0N
q)d 50 280 1200 1800
1 4 10 0N
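To attach that as a column on the example table (a quick sketch, reusing the stepped dictionary d and tbl as defined above):
q)update new:d val from tbl
This gives the same 1 4 10 0N as the lookup d 50 280 1200 1800.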
I think you will want to use binr to return the next element greater than or equal to x. Note that you should use a sorted list for this to work correctly. For the examples above, after converting d to a dictionary with d:(!). flip d, I came up with:
q)k:asc key d
q)d k k binr tbl`val
1 4 10 0N
q)update new:d k k binr val from tbl
sym val new
------------
a 50 1
b 280 4
c 1200 10
d 1800
Here k binr tbl`val returns, for each val, the index of the first key that is greater than or equal to it, and indexing k with that gives the dictionary keys to look up: k k binr tbl`val.
Edit: if the value in the table needs to be mapped to a key strictly greater than x (not equal to it), you could try:
q)show tbl:update val:100 from tbl where i=0
sym val
--------
a 100
b 280
c 1200
d 1800
q)update new:d k (k-1) binr val from tbl
sym val new
------------
a 100 3
b 280 4
c 1200 10
d 1800

How to sum across a row in KDB/Q

I have a table rCom which has various columns. I would like to sum across each row.
for example:
Date TypeA TypeB TypeC TypeD
date1 40.5 23.1 45.1 65.2
date2 23.3 32.2 56.1 30.1
How can I write a q query to add a 'Total' column that sums across each row?
Why not just the following?
update Total: TypeA+TypeB+TypeC+TypeD from rCom
Sum will work just fine:
q)flip`a`b`c!3 3#til 9
a b c
-----
0 3 6
1 4 7
2 5 8
q)update d:sum(a;b;c) from flip`a`b`c!3 3#til 9
a b c d
--------
0 3 6 9
1 4 7 12
2 5 8 15
The sum keyword also has map-reduce support, which will be better for a huge table.
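Applied to the rCom schema in the question (column names taken from your example), that pattern would be:
q)update Total: sum (TypeA;TypeB;TypeC;TypeD) from rCom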
One quick point regarding summing across rows: you should be careful about nulls in one column resulting in a null result for the sum. Borrowing @WooiKent Lee's example, we put a null into the first position of the a column. Notice how our sum now becomes null:
q)wn:.[flip`a`b`c!3 3#til 9;(0;`a);first 0#] //with null
q)update d:sum (a;b;c) from wn
a b c d
--------
3 6
1 4 7 12
2 5 8 15
This is a direct effect of the way nulls in q are treated. If you sum across a simple list, the nulls are ignored
q)sum 1 2 3 0N
6
However, a sum across a general list will not display this behavior
q)sum (),/:1 2 3 0N
,0N
So, for your table situation, you might want to fill in with a zero beforehand
q)update d:sum 0^(a;b;c) from wn
a b c d
--------
3 6 9
1 4 7 12
2 5 8 15
Or alternatively, make it such that you are actually summing across simple lists rather than general lists:
q)update d:sum each flip (a;b;c) from wn
a b c d
--------
3 6 9
1 4 7 12
2 5 8 15
For a more complete reference on null treatment, please see the reference website.
This is what worked:
select Answer:{[x;y;z;a] x+y+z+a }'[TypeA;TypeB;TypeC;TypeD] from
([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)
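For comparison, the sum approach from the other answers gives the same totals on that inline table without a custom lambda (a quick sketch):
q)select Answer: sum (TypeA;TypeB;TypeC;TypeD) from ([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)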