Accumulate all values at every point in time by symbol - kdb

I have this table:
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:(1000;900;1300;800;1200;900;600;800;400;300;200;100))
Each sym has a leavesQty at various times. I want to extend the table so that every row holds the sum, across all syms, of the most recent leavesQty of each sym at that time.
For this example, the result should be:
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:(1000;900;1300;800;1200;900;600;800;400;300;200;100);accLeavesQty:(1000;1900;3200;3000;2900;2600;2300;2200;1800;1300;1100;600))

You add this column with a single update statement if you use fby:
q)update accLeavesQty:sums (deltas;leavesQty) fby sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600

First, take the deltas of leavesQty for each symbol so you can see how each symbol's value changes over time. After that, take a cumulative sum of the resulting column.
q)update sums accLeavesQty from update accLeavesQty:deltas leavesQty by sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600
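To make the deltas-then-sums reasoning concrete outside q, here is a minimal Python sketch. The function name acc_leaves_qty is hypothetical and a plain dict stands in for the by-sym grouping; it illustrates the idea and is not a translation of how q evaluates the query.

```python
from itertools import accumulate


def acc_leaves_qty(syms, leaves_qty):
    """Running total of the latest leavesQty across all symbols."""
    last_seen = {}  # most recent leavesQty per symbol
    deltas = []
    for sym, qty in zip(syms, leaves_qty):
        # Per-symbol delta: change versus that symbol's previous value (0 if unseen).
        deltas.append(qty - last_seen.get(sym, 0))
        last_seen[sym] = qty
    # Cumulative sum of the per-symbol deltas, in row order.
    return list(accumulate(deltas))


syms = list("abcacccbbabc")
leaves_qty = [1000, 900, 1300, 800, 1200, 900, 600, 800, 400, 300, 200, 100]
result = acc_leaves_qty(syms, leaves_qty)
print(result)
```

Because each delta replaces a symbol's old contribution with its new one, the running sum always equals the total of the latest values.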

Another method uses a scan, which iterates over the rows while carrying an accumulator:
update accLeavesQty:sum each @[;;:;]\[()!();sym;leavesQty] from execs
It keeps a running dictionary of the latest leavesQty for each sym and then sums the values of each dictionary. Without the sum each, you can see the intermediate dictionaries:
q)update accLeavesQty:@[;;:;]\[()!();sym;leavesQty] from execs
time sym leavesQty accLeavesQty
---------------------------------------
0 a 1000 (,`a)!,1000
1 b 900 `a`b!1000 900
2 c 1300 `a`b`c!1000 900 1300
3 a 800 `a`b`c!800 900 1300
4 c 1200 `a`b`c!800 900 1200
5 c 900 `a`b`c!800 900 900
6 c 600 `a`b`c!800 900 600
7 b 800 `a`b`c!800 800 600
8 b 400 `a`b`c!800 400 600
9 a 300 `a`b`c!300 400 600
10 b 200 `a`b`c!300 200 600
11 c 100 `a`b`c!300 200 100
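The running-dictionary idea can be sketched in Python as follows. The names running_snapshots and totals are hypothetical, and the copy made at each step is only there to mirror the per-row dictionaries shown above.

```python
def running_snapshots(syms, leaves_qty):
    """Return, for each row, a dict of the latest leavesQty per symbol."""
    state = {}
    snapshots = []
    for sym, qty in zip(syms, leaves_qty):
        # Build a new dict so earlier snapshots are not mutated.
        state = {**state, sym: qty}
        snapshots.append(state)
    return snapshots


syms = list("abcacccbbabc")
leaves_qty = [1000, 900, 1300, 800, 1200, 900, 600, 800, 400, 300, 200, 100]
snapshots = running_snapshots(syms, leaves_qty)
# Summing each snapshot's values gives the accumulated quantity per row.
totals = [sum(snapshot.values()) for snapshot in snapshots]
print(totals)
```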

Indexing a Structure in matlab

I was under the impression that structures in MATLAB were similar to query tables in SQL, but I have a feeling I might be wrong.
I have a rather large dataset consisting of many entries and many fields. Ideally I want to index the structure, pulling out only the data I am interested in. Here is an example of the dataset
Cond Type Stime ETime
2 10 1 900
2 10 1 900
2 10 1 900
3 1 901 1800
3 1 901 1800
4 1 1801 2700
8 1 901 1800
8 1 901 1800
9 1 901 1800
9 1 901 1800
12 1 901 1800
12 1 901 1800
13 10 1 900
13 10 1 900
13 10 1 900
16 1 901 1800
16 1 901 1800
17 10 1 900
17 10 1 900
17 10 1 900
19 10 1 900
19 10 1 900
19 10 1 900
20 10 1 900
20 10 1 900
20 10 1 900
22 1 901 1800
22 1 901 1800
25 10 1 900
25 10 1 900
25 10 1 900
27 1 901 1800
27 1 901 1800
28 1 901 1800
28 1 901 1800
30 1 1801 2700
31 1 901 1800
31 1 901 1800
32 10 1 900
32 10 1 900
32 10 1 900
35 10 1 900
35 10 1 900
35 10 1 900
What I want to do is pull specific entries for analysis. For example, I want all entries where Type is equal to 10, or all Cond from 1:20 that have ETime == 900.
I can do this by the following
idx = find([stats.Type] == 10);
[stats(idx).Stime]
but for multiple types I need a for loop as trying to use a vector throws an error.
idx = find([stats.Type] == 1:10); % Does not work
% must use this
temp = [];
for aa = 1:10
idx = find([stats.Type] == aa);
temp = horzcat(idx,temp);
end
[stats(temp).Stime]
Is this the wrong way to use structures? Is there an easier method to index a structure to pull data of interest?
This answer proposes using table indexing instead of struct indexing, which is a bit of a side-step to directly answering the question. However, my comments on this post were deemed useful so I've formalised as an answer...
If you use struct2table then you can interact with it as a table, which is generally much more intuitive.
Structures are useful if your fields have different numbers of elements (i.e. you couldn't form a consistent height table). In almost all other areas, I find tables are easier to use.
With tables you can use:
Logical indexing
Sorting (including sortrows by column name)
The family of "join" operations
Dot notation for accessing table columns by name, as you do for accessing struct fields, or select multiple columns by name using myTable( :, {'col1','col2'} ).
You don't need weird syntactic tricks like [stats.Type] to group outputs, you can just do stats.Type
I would then use ismember to compare multiple items against a table column...
idx = ismember( stats.Type, 1:10 );
Unless you need the indices, you can skip using find for speed, and just directly index using idx.
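For readers who want the membership-mask idea in a language-neutral form, here is a small Python sketch. The list of dicts is only a stand-in for the struct array, the four rows are copied from the sample data, and the range 2..10 is an illustrative choice so that the mask is not all true on this sample.

```python
# A few rows copied from the sample data; a list of dicts stands in for the struct array.
stats = [
    {"Cond": 2, "Type": 10, "Stime": 1, "ETime": 900},
    {"Cond": 3, "Type": 1, "Stime": 901, "ETime": 1800},
    {"Cond": 4, "Type": 1, "Stime": 1801, "ETime": 2700},
    {"Cond": 25, "Type": 10, "Stime": 1, "ETime": 900},
]

# Membership test against several values at once (the role ismember plays above).
wanted_types = set(range(2, 11))
type_mask = [row["Type"] in wanted_types for row in stats]
stimes = [row["Stime"] for row, keep in zip(stats, type_mask) if keep]

# Conditions combine into one logical mask: Cond in 1..20 and ETime == 900.
combined_mask = [1 <= row["Cond"] <= 20 and row["ETime"] == 900 for row in stats]
selected_conds = [row["Cond"] for row, keep in zip(stats, combined_mask) if keep]

print(stimes, selected_conds)
```

The mask is used directly to filter, so no separate step is needed to convert it to indices.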

kdb how to pass a column name into a function

As a simplifying example, I have
tbl:flip `sym`v1`v2!(`a`b`c`d; 50 280 1200 1800; 40 190 1300 1900)
and I'd like to pass a column name into a function like
f:{[t;c];:update v3:2 * c from t;}
In this form it doesn't work. Any suggestions on how I can make this happen?
Thanks
Another option is to use the functional form of the update statement.
https://code.kx.com/q/ref/funsql/#functional-sql
q)tbl:flip `sym`v1`v2!(`a`b`c`d; 50 280 1200 1800; 40 190 1300 1900)
q)parse"update v3:2*x from t"
!
`t
()
0b
(,`v3)!,(*;2;`x)
q){![x;();0b;enlist[`v3]!enlist(*;2;y)]} [tbl;`v2]
sym v1 v2 v3
------------------
a 50 40 80
b 280 190 380
c 1200 1300 2600
d 1800 1900 3800
One option to achieve this is using @ amend:
q){[t;c;n] @[t;n;:;2*t c]}[tbl;`v1;`v3]
sym v1 v2 v3
------------------
a 50 40 100
b 280 190 560
c 1200 1300 2400
d 1800 1900 3600
This updates the column c in table t saving the new value as column n. You could also alter this to allow you to pass in custom functions too:
{[t;c;n;f] @[t;n;:;f t c]}[tbl;`v1;`v3;{2*x}]
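The same pattern (source column name, target column name, and an optional function, all passed as arguments) can be sketched in Python with a dict of columns standing in for the table. The helper name add_derived_column is hypothetical.

```python
def add_derived_column(table, src, dst, func=lambda v: 2 * v):
    """Return a copy of table with column dst computed from column src."""
    # The column is looked up by name at call time, so any column can be passed in.
    return {**table, dst: [func(v) for v in table[src]]}


tbl = {
    "sym": ["a", "b", "c", "d"],
    "v1": [50, 280, 1200, 1800],
    "v2": [40, 190, 1300, 1900],
}
doubled_v1 = add_derived_column(tbl, "v1", "v3")
halved_v2 = add_derived_column(tbl, "v2", "v3", func=lambda v: v // 2)
print(doubled_v1["v3"], halved_v2["v3"])
```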

kdb vlookup. get value from table that is mapped to smallest val larger than x

Assuming I have a dict
d:flip(100 200 400 800 1600; 1 3 4 6 10)
how can I create a lookup function that returns the value of the smallest key that is larger than x? Given a table
tbl:flip `sym`val!(`a`b`c`d; 50 280 1200 1800)
I would like to do something like
{[x] : update new:fun[x[`val]] from x} each tbl
to end up at a table like this
tbl:flip `sym`val`new!(`a`b`c`d; 50 280 1200 1800; 1 4 10 0N)
sym val new
a 50 1
b 280 4
c 1200 10
d 1800
Stepped dictionaries may help:
http://code.kx.com/q/cookbook/temporal-data/#stepped-attribute
q)d:`s#-0W 100 200 400 800 1600!1 3 4 6 10 0N
q)d 50 280 1200 1800
1 4 10 0N
I think you will want to use binr to return the next element greater than or equal to x. Note that you should use a sorted list for this to work correctly. For the examples above, converting d to a dictionary with d:(!). flip d I came up with:
q)k:asc key d
q)d k k binr tbl`val
1 4 10 0N
q)update new:d k k binr val from tbl
sym val new
------------
a 50 1
b 280 4
c 1200 10
d 1800
Here k k binr tbl`val yields the dictionary keys to look up.
Edit: if the value in the table needs to be mapped to a value greater than x but not equal to, you could try:
q)show tbl:update val:100 from tbl where i=0
sym val
--------
a 100
b 280
c 1200
d 1800
q)update new:d k (k-1) binr val from tbl
sym val new
------------
a 100 3
b 280 4
c 1200 10
d 1800
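The two variants (greater than or equal, and strictly greater) correspond to left and right binary search. A minimal Python sketch using the standard bisect module follows; the function name lookup and its strict flag are hypothetical, and None plays the role of the null 0N.

```python
from bisect import bisect_left, bisect_right

keys = [100, 200, 400, 800, 1600]  # must be sorted ascending
values = [1, 3, 4, 6, 10]


def lookup(x, strict=False):
    """Value mapped to the smallest key >= x (or > x when strict is True)."""
    # bisect_left finds the first key >= x; bisect_right finds the first key > x.
    idx = bisect_right(keys, x) if strict else bisect_left(keys, x)
    # Past the last key there is nothing to map to.
    return values[idx] if idx < len(keys) else None


non_strict = [lookup(v) for v in (50, 280, 1200, 1800)]
strict = [lookup(v, strict=True) for v in (100, 280, 1200, 1800)]
print(non_strict, strict)
```

With val 100, the non-strict lookup returns 1 (key 100 itself) while the strict lookup returns 3 (key 200), matching the two q results above.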

kdb how to aj with the first time of appearance

Here is my problem:
I have two tables:
q)t1:([]sym:1 5;x: 90 90)
q)t2:([]sym: 2 3 4 6 7 8; y: 100 200 300 400 500 600)
If I do aj[`sym;t2;t1], all 6 rows in the result table will contain x with value 90.
But what I want is the value 90 in column x only in the rows with sym 2 and 6, i.e. the first sym in t2 that appears after each sym in t1.
In other words, I want the result table to be like this:
q)([]sym:2 3 4 6 7 8; y: 100 200 300 400 500 600; x:90 0N 0N 90 0N 0N)
sym y x
----------
2 100 90
3 200
4 300
6 400 90
7 500
8 600
Could anyone tell me how I can achieve this? Thank you so much!
Not sure if aj can be used in this sense. This might give you what you need:
q)t2 lj 1!update sym:{x x binr y}[t2.sym;sym] from t1
sym y x
----------
2 100 90
3 200
4 300
6 400 90
7 500
8 600
This uses binr to find the first sym in t2 that is greater than or equal to each sym in t1, then joins only on those rows.
EDIT: note also that binr is >= ..... If you need strictly greater than you could use:
q)t2 lj 1!update sym:{x 1+x bin y}[t2.sym;sym] from t1
sym y x
----------
2 100 90
3 200
4 300
6 400 90
7 500
8 600
Alternatively, use aj to find, for each row of t2, the index of the last t1 row whose sym is less than or equal to it, then use a vector conditional to keep x only where that index has increased, i.e.
select sym, y, x:?[c>prev c;x;0N] from aj[`sym; t2; update c:i from t1]
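The binr-based approach can be sketched in Python with bisect_left: each t1 row is assigned to the first t2 sym that is greater than or equal to it, and every other t2 row stays empty. Lists of tuples stand in for the tables, None stands in for the null, and keeping the first value when two t1 rows land on the same t2 row is an assumption of this sketch.

```python
from bisect import bisect_left

t1 = [(1, 90), (5, 90)]  # (sym, x)
t2_syms = [2, 3, 4, 6, 7, 8]  # sorted ascending
t2_y = [100, 200, 300, 400, 500, 600]

x_col = [None] * len(t2_syms)
for sym, x in t1:
    # First t2 row whose sym is >= this t1 sym.
    idx = bisect_left(t2_syms, sym)
    # Skip t1 rows that fall after the last t2 sym; keep the first value on a clash.
    if idx < len(t2_syms) and x_col[idx] is None:
        x_col[idx] = x

joined = list(zip(t2_syms, t2_y, x_col))
print(joined)
```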

How to write matrices from matlab to .xlsx with special formatting tables

I have a problem exporting matrices from MATLAB to Excel. The export itself works, but I need some special formatting.
I made matrices A and B and wrote them to an .xlsx document.
filename = 'example.xlsx';
A;
sheet = 1;
xlRange = 'A9';
xlswrite(filename,A,sheet,xlRange)
B;
xlRange2= 'B9';
xlswrite(filename,B,sheet,xlRange2)
And I get the example.xlsx file with this formatting:
400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53
I need this kind of formatting:
400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
100 11.23
200 12.43
300 13.89
400 14.54
500 15.21
600 16.23
700 17.53
The steps are at 500, 1000, 1500, 2000, 2500... How can I insert one empty row at each step to get this kind of formatting?
This code provides the cell as required for xlswrite:
M=[400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53];
gaps=[500, 1000, 1500, 2000, 2500];
%calculates a group indx. 0 is below first gap, 1 between first and second etc..
group=sum(bsxfun(@ge,M(:,1),gaps),2);
%whenever group increases a line must be jumped, calculate indices
index=cumsum(ones(size(M,1),1)+[0;diff(group)>0]);
%allocate empty cell
X=cell(max(index),size(M,2));
%fill data
X(index,:)=num2cell(M);
xlswrite('a.xlsx',X)
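The index arithmetic above (group number per row, then a cumulative sum that skips one slot whenever the group increases) can be sketched in Python as follows. The function name rows_with_gaps is hypothetical, None marks an empty row, and the sample rows are a shortened subset of the data; writing the result to a spreadsheet is left out.

```python
from itertools import accumulate


def rows_with_gaps(rows, gaps):
    """Insert one empty row (None) each time the first column crosses a threshold."""
    if not rows:
        return []
    # Group index: how many thresholds the first column has reached.
    group = [sum(first >= g for g in gaps) for first, _ in rows]
    # Each row advances by 1, plus 1 more when its group is higher than the previous row's.
    steps = [1] + [1 + (cur > prev) for prev, cur in zip(group, group[1:])]
    index = list(accumulate(steps))  # 1-based target position of each row
    out = [None] * index[-1]
    for pos, row in zip(index, rows):
        out[pos - 1] = row
    return out


rows = [(400, 4.56), (500, 5.12), (600, 6.76), (1000, 10.12), (1100, 11.23)]
spaced = rows_with_gaps(rows, gaps=[500, 1000, 1500])
print(spaced)
```

As in the MATLAB version, the empty row lands just before the first row that reaches each threshold.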