Pivot table in kdb+/q - kdb

I'm trying to pivot some trade data in KDB/q. Although my data are only slightly different from the working example on the website (see the general pivot function: http://code.kx.com/q/cookbook/pivoting-tables/),
I can't get the function to work, even after several hours of trying (I'm very new to KDB).
Put simply, I'm trying to go from this table:
q)5# trades_agg
date sym time exchange buysell| shares
--------------------------------------| ------
2009.01.05 aaca 09:30 BATS B | 484
2009.01.05 aaca 09:30 BATS S | 434
2009.01.05 aaca 09:30 NASDAQ B | 235
2009.01.05 aaca 09:30 NASDAQ S | 429
2009.01.05 aaca 09:30 NYSE B | 309
to this one:
date sym time | BATSsharesB BATSsharesS NASDAQsharesB ...
----------------------| -----------------------------------------------
2009.01.05 aaca 09:30 | 484 434 235 ...
... | ...
I'll provide a working example to illustrate things:
// Create data
qpd:5*2*4*"i"$16:00-09:30
date:raze(100*qpd)#'2009.01.05+til 5
sym:(raze/)5#enlist qpd#'100?`4
sym:(neg count sym)?sym
time:"t"$raze 500#enlist 09:30:00+15*til qpd
time+:(count time)?1000
exchange:raze 500#enlist raze(qpd div 3)#enlist`NYSE`NASDAQ`BATS
buysell:raze 500#enlist raze(qpd div 2)#enlist`B`S
shares:(500*qpd)?100
trades:([]date;sym;time;exchange;buysell;shares)
//I then aggregate the data into equal sized buckets
trades_agg: select sum shares by date, sym, time: 15 xbar time.minute, exchange, buysell from trades
// pivot function from the code.kx.com website
piv:{[t;k;p;v;f;g]
v:(),v;
G:group flip k!(t:.Q.v t)k;
F:group flip p!t p;
count[k]!g[k;P;C]xcols 0!key[G]!flip(C:f[v]P:flip value flip key F)!raze
{[i;j;k;x;y]
a:count[x]#x 0N;
a[y]:x y;
b:count[x]#0b;
b[y]:1b;
c:a i;
c[k]:first'[a[j]#'where'[b j]];
c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]}
I subsequently apply this pivot function to the example with the functions f and g set to their default (::) values but I get an error message:
piv[`trades_agg;`date`sym`time;`exchange`buysell;`shares;(::);(::)]
Even when I use the suggested f and g functions it doesn't work:
f:{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]}
g:{[k;P;c]k,(raze/)flip flip each 5 cut'10 cut raze reverse 10 cut asc c}
I don't get why this is not working correctly since it is so close to the example on the website.

This is a self-contained version that's easier to use:
tt:1000#0!trades_agg
piv:{[t;k;p;v]
/ controls new columns names
f:{[v;P]`${raze " " sv x} each string raze P[;0],'/:v,/:\:P[;1]};
v:(),v; k:(),k; p:(),p; / make sure args are lists
G:group flip k!(t:.Q.v t)k;
F:group flip p!t p;
key[G]!flip(C:f[v]P:flip value flip key F)!raze
{[i;j;k;x;y]
a:count[x]#x 0N;
a[y]:x y;
b:count[x]#0b;
b[y]:1b;
c:a i;
c[k]:first'[a[j]#'where'[b j]];
c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]};
q)piv[`tt;`date`sym`time;`exchange`buysell;enlist `shares]
date sym time | BATS shares B BATS shares S NASDAQ shares B NASDAQ sha..
---------------------| ------------------------------------------------------..
2009.01.05 adkk 09:30| 577 359 499 452 ..
2009.01.05 adkk 09:45| 882 501 339 467 ..
2009.01.05 adkk 10:00| 620 513 411 128 ..
2009.01.05 adkk 10:15| 501 544 272 544 ..
2009.01.05 adkk 10:30| 291 594 363 331 ..
2009.01.05 adkk 10:45| 867 500 498 536 ..
2009.01.05 adkk 11:00| 624 632 694 493 ..
2009.01.05 adkk 11:15| 99 704 600 299 ..
2009.01.05 adkk 11:30| 269 394 280 392 ..
2009.01.05 adkk 11:45| 635 744 758 597 ..
2009.01.05 adkk 12:00| 562 354 498 405 ..
2009.01.05 adkk 12:15| 416 437 303 492 ..
2009.01.05 adkk 12:30| 447 699 370 302 ..
2009.01.05 adkk 12:45| 336 647 512 245 ..
2009.01.05 adkk 13:00| 692 457 497 553 ..

Your table is keyed so unkey it:
trades_agg:0!select sum shares by date, sym, time: 15 xbar time.minute,exchange,buysell from trades
And define your g as:
g:{[k;P;c]k,c}
Best way to figure out what the f/g needs to be is to define it with a breakpoint and then investigate the variables
g:{[k;P;c]break}

I found it difficult to understand the original piv function in Ryan's answer, so I updated it by adding some comments + more readable variable names HTH
piv:{[table; rows; columns; vals]
/ make sure args are lists
vals: (),vals;
rows: (),rows;
columns: (),columns;
/ Get columns of table corresponding to those of row labels and calculate groups
/ group returns filteredValues dict whose keys are the unique row labels and vals are the row indices of each group e.g. (0 1 3; 2 4; ...)
rowGroups: group rows#table;
rowGroupIdxs: value rowGroups;
rowValues: key[rowGroups];
/ Similarly, get columns of table corresponding to those of column labels and calculate groups
colGroups: group columns#table;
colGroupIdxs: value colGroups;
colValues: key colGroups;
getPivotCol: {[rowGroupStartIdx; nonSingleRowGroups; nonSingleRowGroupsIdx; vals; colGroupIdxs]
/ vals: the list of values for this particular value-column combination
/ colGroupIdxs: the list of indices for this particular column group
/ We only care about vals that should belong in this pivot column - we need to filter out vals not part of this column group
filteredValues: count[vals]#vals[0N];
filteredValues[colGroupIdxs]: vals[colGroupIdxs];
/ Equivalent to filteredValues <> 0N
hasValue: count[vals]#0b;
hasValue[colGroupIdxs]: 1b;
/ Seed off pivot column with the first (filtered) value of each row group
/ This will be correct for row groups of size 1 as no aggregation needs to occur
pivotCol: filteredValues[rowGroupStartIdx];
/ Otherwise, for the row groups larger than 1, get the first (filtered) value
pivotCol[nonSingleRowGroupsIdx]: first'[filteredValues[nonSingleRowGroups]#'where'[hasValue[nonSingleRowGroups]]];
pivotCol
}
/ Groups with more than 1 row (these are the ones that will need aggregating)
nonSingleRowGroupsIdx: where 1 <> count'[rowGroupIdxs];
/ Get resulting pivot column for each combination of column and value fields
pivotCols: raze getPivotCol[rowGroupIdxs[;0]; rowGroupIdxs[nonSingleRowGroupsIdx]; nonSingleRowGroupsIdx] /:\: [table[vals]; colGroupIdxs]
/ Columns names are the cross-product of column and value fields
colNames:`${raze "" sv vals} each string raze (flip value flip colValues),'/:vals;
/ Finally, stitch together row and column headings with pivot data to obtain final table
rowValues!flip colNames!pivotCols
};
I also made a small change to formatting of columns names for my needs btw

Related

How can I efficiently convert the output of one KDB function into three table columns?

I have a function that takes as input some of the values in a table and returns a tuple if you will - three separate return values, which I want to transpose into the output of a query. Here's a simplified example of what I want to achieve:
multiplier:{(x*2;x*3;x*3)};
select twoX:multiplier[price][0]; threeX:multiplier[price][1]; fourX:multiplier[price][2] from data;
The above basically works (I think I've got the syntax right for the simplified example - if not then hopefully my intention is clear), but is inefficient because I'm calling the function three times and throwing away most of the output each time. I want to rewrite the query to only call the function once, and I'm struggling.
Update
I think I missed a crucial piece of information in my explanation of the problem which affects the outcome - I need to get other data in the query alongside the output of my function. Here's a hopefully more realistic example:
multiplier:{(x*2;x*3;x*4)};
select average:avg price, total:sum price, twoX:multiplier[sum price][0]; threeX:multiplier[sum price][1]; fourX:multiplier[sum price][2] by category from data;
I'll have a go at adapting your answers to fit this requirement anyway, and apologies for missing this bit of information. The real function if a proprietary and fairly complex algorithm and the real query has about 30 output columns, hence the attempt at simplifying the example :)
If you're just looking for the results themselves you can extract (exec) as lists, create dictionary and then flip the dictionary into a table:
q)exec flip`twoX`threeX`fourX!multiplier[price] from ([]price:til 10)
twoX threeX fourX
-----------------
0 0 0
2 3 4
4 6 8
6 9 12
8 12 16
10 15 20
12 18 24
14 21 28
16 24 32
18 27 36
If you need other columns from the original table too then its trickier but you could join the tables sideways using ,'
q)t:([]price:til 10)
q)t,'exec flip`twoX`threeX`fourX!multiplier[price] from t
An apply # can also achieve what you want. Here data is just a table with 10 random prices. # is then used to apply the multiplier function to the price column while also assigning a column name to each of the three resulting lists:
q)data:([] price:10?100)
q)multiplier:{(x*2;x*3;x*3)}
q)#[data;`twoX`threeX`fourX;:;multiplier data`price]
price twoX threeX fourX
-----------------------
80 160 240 240
24 48 72 72
41 82 123 123
0 0 0 0
81 162 243 243
10 20 30 30
36 72 108 108
36 72 108 108
16 32 48 48
17 34 51 51

Arrays in J: indexing from one into another

Using J I am trying to do something similar to the following example shown on page 128 of Mastering Dyalog APL by Bernard Legrand (2009). I have not been able to find a direct conversion of this code into J, which is what I want.
Here's the example:
BHCodes ← 83 12 12 83 43 66 50 81 12 83 14 66 etc...
BHAmounts ← 609 727 458 469 463 219 431 602 519 317 663 631...
13.3.2 - First Question
We would like to focus on some selected countries (14, 43, 50, 37, and 66) and
calculate the total amount of their sales. Let’s first identify which
items of BHCodes are relevant:
Selected ← 14 43 50 37 66
BHCodes ∊ Selected
0 0 0 0 1 1 1 0 0 0 1 1 0 1 0 ⇦ Identifies sales in the selected countries only.
Then we can apply this filter to the amounts, and add them up:
(BHCodes ∊ Selected) / BHAmounts
463 219 431 663 631 421
+/ (BHCodes ∊ Selected) / BHAmounts
2828
+/ (BHCodes e. Selected) # BHAmounts
For your purposes here, APL's ∊ is J's e. (Member (In)) and APL's / is J's # (Copy).
Notes:
APL's ∊ and J's e. are not completely equivalent as APL's ∊ looks for every element in its left argument among the elements of its right argument, while J's e. looks for every major cell. of its left argument in the major cells of its right argument.
APL's / and J's # are not completely equivalent as APL's / operates along the trailing axis while J's # operates along the leading axis. APL does have ⌿ though, which operates along the leading axis. There are more nuances, but they are not relevant here.

Replace values within matlab matrix using column values from another matrix

I have a big matrix (8656x25960) with some speckle noise within it. I used the findpeaks tool in order to find in what columns I indeed have peaks above a certain threshold. The output of the findspeaks tool is a matrix containing all of the bad columns, for example -
loc =
Columns 1 through 6
30 51 155 307 333 338
Columns 7 through 12
642 955 1409 1567 1728 1730
Columns 13 through 18
2332 2546 2615 2685 2806 2995
Columns 19 through 24
3002 3122 3124 3164 3690 4176
Columns 25 through 30
4430 4475 4539 5142 5155 5244
Columns 31 through 36
5246 5941 5943 6114 6486 6922
Columns 37 through 42
7165 7169 7460 7587 7647 8944
Columns 43 through 44
12754 13693
How can I use those columns numbers with the original matrix and replace the values of this 'bad' column with the value 0 (for example).
Hoping I'm clear enough.
For row vector Ioc simply use indexing:
yourmatrix(:,Ioc) = 0;

How to match and copy time-data from one matrix to another?

In MATLAB (R2015a), I have two large matrices that can be simplified as:
A = [ 24 10 830; 24 15 830; 150 17 945; 231 40 1130; 231 45 1130]
(note that in A, column 3 is the time for the event cataloged in column 1) and
B = [24 13; 150 29; 231 43]
How can I match and copy the time-data from column 3 in matrix A, to a matching and filtered event column in matrix B?
For example, I want the value 24 from first column in B to be matched with value 24 from first column in A, and then copy the corresponding time-data in A's third column (for 24 it's 830, for 150 it's 945 etc.) into B. This should result into our new B with the time-data from A:
B = [24 13 830; 150 29 945; 231 43 1130]
I relatively new to MATLAB so any help is much appreciated!
First find the location of the elements in the first row of B in the first row of A using the ismember function. And then use those locations to construct the new matrix.
[~,Locb] = ismember(B(:,1),A(:,1));
Bnew = [B A(Locb,3)]
Bnew =
24 13 830
150 29 945
231 43 1130
This is a fast way that comes to my mind. There might be some singularities that needed to be checked more thoroughly.

MATLAB - extract selected rows in a table based on some criterion

Let's say I have a table like this:
post user date
____ ____ ________________
1 A 12.01.2014 13:05
2 B 15.01.2014 20:17
3 A 16.01.2014 05:22
I want to create a smaller table (but not delete the original one!) containing all posts of - for example - user A including the dates that those were posted on.
When looking at MATLAB's documentation (see the very last part for deleting rows) I discovered that MATLAB allows you to create a mask for a table based on some criterion. So in my case if I do something like this:
postsA = myTable.user == 'A'
I get a nice mask vector as follows:
>> postsA =
1
0
1
where the 1s are obviously those rows in myTable, which satisfy the rule I have given.
In the documention I have pointed at above rows are deleted from the original table:
postsNotA = myTable.user ~= 'A' % note that I have to reverse the criterion since I'm choosing stuff that will be removed
myTable(postsNotA,:) = [];
I would however - as stated above - like to not touch my original table. One possible solution here is to create an empty table with two columns:
post date
____ ____
interate through all rows of my original table, while also looking at the current value of my mask vector postsA and if it's equal to 1, copy the two of the columns in that row that I'm interested in and concatenate this shrunk row to my smaller table. What I'd like to know is if there is a more or less 1-2 lines long solution for this problem?
Assuming myTable is your original table.
You can just do
myTable(myTable.user == 'A',:)
Sample Code:
user = ['A';'B';'A';'C';'B'];
Age = [38;43;38;40;49];
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
T = table(user,Age,Height,Weight,BloodPressure)
T(T.user=='A',:)
Gives:
T =
user Age Height Weight BloodPressure
____ ___ ______ ______ _________________________
A 38 71 176 124 93
B 43 69 163 109 77
A 38 64 131 125 83
C 40 67 133 117 75
B 49 64 119 122 80
ans =
user Age Height Weight BloodPressure
____ ___ ______ ______ _________________________
A 38 71 176 124 93
A 38 64 131 125 83