Arrays in J: indexing from one into another - filtering

Using J I am trying to do something similar to the following example shown on page 128 of Mastering Dyalog APL by Bernard Legrand (2009). I have not been able to find a direct conversion of this code into J, which is what I want.
Here's the example:
BHCodes ← 83 12 12 83 43 66 50 81 12 83 14 66 etc...
BHAmounts ← 609 727 458 469 463 219 431 602 519 317 663 631...
13.3.2 - First Question
We would like to focus on some selected countries (14, 43, 50, 37, and 66) and
calculate the total amount of their sales. Let’s first identify which
items of BHCodes are relevant:
Selected ← 14 43 50 37 66
BHCodes ∊ Selected
0 0 0 0 1 1 1 0 0 0 1 1 0 1 0 ⇦ Identifies sales in the selected countries only.
Then we can apply this filter to the amounts, and add them up:
(BHCodes ∊ Selected) / BHAmounts
463 219 431 663 631 421
+/ (BHCodes ∊ Selected) / BHAmounts
2828

+/ (BHCodes e. Selected) # BHAmounts
For your purposes here, APL's ∊ is J's e. (Member (In)) and APL's / is J's # (Copy).
Notes:
APL's ∊ and J's e. are not completely equivalent as APL's ∊ looks for every element in its left argument among the elements of its right argument, while J's e. looks for every major cell. of its left argument in the major cells of its right argument.
APL's / and J's # are not completely equivalent as APL's / operates along the trailing axis while J's # operates along the leading axis. APL does have ⌿ though, which operates along the leading axis. There are more nuances, but they are not relevant here.

Related

How can I efficiently convert the output of one KDB function into three table columns?

I have a function that takes as input some of the values in a table and returns a tuple if you will - three separate return values, which I want to transpose into the output of a query. Here's a simplified example of what I want to achieve:
multiplier:{(x*2;x*3;x*3)};
select twoX:multiplier[price][0]; threeX:multiplier[price][1]; fourX:multiplier[price][2] from data;
The above basically works (I think I've got the syntax right for the simplified example - if not then hopefully my intention is clear), but is inefficient because I'm calling the function three times and throwing away most of the output each time. I want to rewrite the query to only call the function once, and I'm struggling.
Update
I think I missed a crucial piece of information in my explanation of the problem which affects the outcome - I need to get other data in the query alongside the output of my function. Here's a hopefully more realistic example:
multiplier:{(x*2;x*3;x*4)};
select average:avg price, total:sum price, twoX:multiplier[sum price][0]; threeX:multiplier[sum price][1]; fourX:multiplier[sum price][2] by category from data;
I'll have a go at adapting your answers to fit this requirement anyway, and apologies for missing this bit of information. The real function if a proprietary and fairly complex algorithm and the real query has about 30 output columns, hence the attempt at simplifying the example :)
If you're just looking for the results themselves you can extract (exec) as lists, create dictionary and then flip the dictionary into a table:
q)exec flip`twoX`threeX`fourX!multiplier[price] from ([]price:til 10)
twoX threeX fourX
-----------------
0 0 0
2 3 4
4 6 8
6 9 12
8 12 16
10 15 20
12 18 24
14 21 28
16 24 32
18 27 36
If you need other columns from the original table too then its trickier but you could join the tables sideways using ,'
q)t:([]price:til 10)
q)t,'exec flip`twoX`threeX`fourX!multiplier[price] from t
An apply # can also achieve what you want. Here data is just a table with 10 random prices. # is then used to apply the multiplier function to the price column while also assigning a column name to each of the three resulting lists:
q)data:([] price:10?100)
q)multiplier:{(x*2;x*3;x*3)}
q)#[data;`twoX`threeX`fourX;:;multiplier data`price]
price twoX threeX fourX
-----------------------
80 160 240 240
24 48 72 72
41 82 123 123
0 0 0 0
81 162 243 243
10 20 30 30
36 72 108 108
36 72 108 108
16 32 48 48
17 34 51 51

Is graycomatrix's NumLevels and GrayLimits the same thing MATLAB

Ive been looking at implementing GLCM within MATLAB using graycomatrix. There are two arguments that I have discovered (NumLevels and GrayLimits) but in in my research and implementation they seem to achieve the same result.
GrayLimits specified bins between a range set [low high], causing a restricted set of gray levels.
NumLevels declares the number of gray levels in an image.
Could someone please explain the difference between these two arguments, as I don't understand why there would be two arguments that achieve the same result.
From the documentation:
'GrayLimits': Range used scaling input image into gray levels, specified as a two-element vector [low high]. If N is the number of gray levels (see parameter 'NumLevels') to use for scaling, the range [low high] is divided into N equal width bins and values in a bin get mapped to a single gray level.
'NumLevels': Number of gray levels, specified as an integer.
Thus the first parameter sets the input gray level range to be used (defaults to the min and max values in the image), and the second parameter sets the number of unique gray levels considered (and thus the size of the output matrix, defaults to 8, or 2 for binary images).
For example:
>> graycomatrix(img,'NumLevels',8,'GrayLimits',[0,255])
ans =
17687 1587 81 31 7 0 0 0
1498 7347 1566 399 105 8 0 0
62 1690 3891 1546 298 38 1 0
12 335 1645 4388 1320 145 4 0
2 76 305 1349 4894 959 18 0
0 16 40 135 965 7567 415 0
0 0 0 2 15 421 2410 0
0 0 0 0 0 0 0 0
>> graycomatrix(img,'NumLevels',8,'GrayLimits',[0,127])
ans =
1 9 0 0 0 0 0 0
7 17670 1431 156 50 31 23 15
1 1369 3765 970 350 142 84 92
0 128 1037 1575 750 324 169 167
0 46 361 836 1218 747 335 260
0 16 163 330 772 1154 741 547
0 10 74 150 370 787 1353 1208
0 4 67 136 294 539 1247 21199
>> graycomatrix(img,'NumLevels',4,'GrayLimits',[0,255])
ans =
28119 2077 120 0
2099 11470 1801 5
94 1829 14385 433
0 2 436 2410
As you can see, these parameters modify the output in different ways:
In the first case above, the range [0,255] was mapped to columns/rows 1-8, putting 32 different input grey values into each.
In the second case, the smaller range [0,127] was mapped to 8 indices, putting 16 different input grey values into each, and putting the remaining grey values 128-255 into the 8th index.
In the third case, the range [0,255] was mapped to 4 indices, putting 64 different input grey values into each.

Replace values within matlab matrix using column values from another matrix

I have a big matrix (8656x25960) with some speckle noise within it. I used the findpeaks tool in order to find in what columns I indeed have peaks above a certain threshold. The output of the findspeaks tool is a matrix containing all of the bad columns, for example -
loc =
Columns 1 through 6
30 51 155 307 333 338
Columns 7 through 12
642 955 1409 1567 1728 1730
Columns 13 through 18
2332 2546 2615 2685 2806 2995
Columns 19 through 24
3002 3122 3124 3164 3690 4176
Columns 25 through 30
4430 4475 4539 5142 5155 5244
Columns 31 through 36
5246 5941 5943 6114 6486 6922
Columns 37 through 42
7165 7169 7460 7587 7647 8944
Columns 43 through 44
12754 13693
How can I use those columns numbers with the original matrix and replace the values of this 'bad' column with the value 0 (for example).
Hoping I'm clear enough.
For row vector Ioc simply use indexing:
yourmatrix(:,Ioc) = 0;

Pivot table in kdb+/q

I'm trying to pivot some trade data in KDB/q. Although my data are only slightly different from the working example on the website (see the general pivot function: http://code.kx.com/q/cookbook/pivoting-tables/),
I can't get the function to work, even after several hours of trying (I'm very new to KDB).
Put simply, I'm trying to go from this table:
q)5# trades_agg
date sym time exchange buysell| shares
--------------------------------------| ------
2009.01.05 aaca 09:30 BATS B | 484
2009.01.05 aaca 09:30 BATS S | 434
2009.01.05 aaca 09:30 NASDAQ B | 235
2009.01.05 aaca 09:30 NASDAQ S | 429
2009.01.05 aaca 09:30 NYSE B | 309
to this one:
date sym time | BATSsharesB BATSsharesS NASDAQsharesB ...
----------------------| -----------------------------------------------
2009.01.05 aaca 09:30 | 484 434 235 ...
... | ...
I'll provide a working example to illustrate things:
// Create data
qpd:5*2*4*"i"$16:00-09:30
date:raze(100*qpd)#'2009.01.05+til 5
sym:(raze/)5#enlist qpd#'100?`4
sym:(neg count sym)?sym
time:"t"$raze 500#enlist 09:30:00+15*til qpd
time+:(count time)?1000
exchange:raze 500#enlist raze(qpd div 3)#enlist`NYSE`NASDAQ`BATS
buysell:raze 500#enlist raze(qpd div 2)#enlist`B`S
shares:(500*qpd)?100
trades:([]date;sym;time;exchange;buysell;shares)
//I then aggregate the data into equal sized buckets
trades_agg: select sum shares by date, sym, time: 15 xbar time.minute, exchange, buysell from trades
// pivot function from the code.kx.com website
piv:{[t;k;p;v;f;g]
v:(),v;
G:group flip k!(t:.Q.v t)k;
F:group flip p!t p;
count[k]!g[k;P;C]xcols 0!key[G]!flip(C:f[v]P:flip value flip key F)!raze
{[i;j;k;x;y]
a:count[x]#x 0N;
a[y]:x y;
b:count[x]#0b;
b[y]:1b;
c:a i;
c[k]:first'[a[j]#'where'[b j]];
c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]}
I subsequently apply this pivot function to the example with the functions f and g set to their default (::) values but I get an error message:
piv[`trades_agg;`date`sym`time;`exchange`buysell;`shares;(::);(::)]
Even when I use the suggested f and g functions it doesn't work:
f:{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]}
g:{[k;P;c]k,(raze/)flip flip each 5 cut'10 cut raze reverse 10 cut asc c}
I don't get why this is not working correctly since it is so close to the example on the website.
This is a self-contained version that's easier to use:
tt:1000#0!trades_agg
piv:{[t;k;p;v]
/ controls new columns names
f:{[v;P]`${raze " " sv x} each string raze P[;0],'/:v,/:\:P[;1]};
v:(),v; k:(),k; p:(),p; / make sure args are lists
G:group flip k!(t:.Q.v t)k;
F:group flip p!t p;
key[G]!flip(C:f[v]P:flip value flip key F)!raze
{[i;j;k;x;y]
a:count[x]#x 0N;
a[y]:x y;
b:count[x]#0b;
b[y]:1b;
c:a i;
c[k]:first'[a[j]#'where'[b j]];
c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]};
q)piv[`tt;`date`sym`time;`exchange`buysell;enlist `shares]
date sym time | BATS shares B BATS shares S NASDAQ shares B NASDAQ sha..
---------------------| ------------------------------------------------------..
2009.01.05 adkk 09:30| 577 359 499 452 ..
2009.01.05 adkk 09:45| 882 501 339 467 ..
2009.01.05 adkk 10:00| 620 513 411 128 ..
2009.01.05 adkk 10:15| 501 544 272 544 ..
2009.01.05 adkk 10:30| 291 594 363 331 ..
2009.01.05 adkk 10:45| 867 500 498 536 ..
2009.01.05 adkk 11:00| 624 632 694 493 ..
2009.01.05 adkk 11:15| 99 704 600 299 ..
2009.01.05 adkk 11:30| 269 394 280 392 ..
2009.01.05 adkk 11:45| 635 744 758 597 ..
2009.01.05 adkk 12:00| 562 354 498 405 ..
2009.01.05 adkk 12:15| 416 437 303 492 ..
2009.01.05 adkk 12:30| 447 699 370 302 ..
2009.01.05 adkk 12:45| 336 647 512 245 ..
2009.01.05 adkk 13:00| 692 457 497 553 ..
Your table is keyed so unkey it:
trades_agg:0!select sum shares by date, sym, time: 15 xbar time.minute,exchange,buysell from trades
And define your g as:
g:{[k;P;c]k,c}
Best way to figure out what the f/g needs to be is to define it with a breakpoint and then investigate the variables
g:{[k;P;c]break}
I found it difficult to understand the original piv function in Ryan's answer, so I updated it by adding some comments + more readable variable names HTH
piv:{[table; rows; columns; vals]
/ make sure args are lists
vals: (),vals;
rows: (),rows;
columns: (),columns;
/ Get columns of table corresponding to those of row labels and calculate groups
/ group returns filteredValues dict whose keys are the unique row labels and vals are the row indices of each group e.g. (0 1 3; 2 4; ...)
rowGroups: group rows#table;
rowGroupIdxs: value rowGroups;
rowValues: key[rowGroups];
/ Similarly, get columns of table corresponding to those of column labels and calculate groups
colGroups: group columns#table;
colGroupIdxs: value colGroups;
colValues: key colGroups;
getPivotCol: {[rowGroupStartIdx; nonSingleRowGroups; nonSingleRowGroupsIdx; vals; colGroupIdxs]
/ vals: the list of values for this particular value-column combination
/ colGroupIdxs: the list of indices for this particular column group
/ We only care about vals that should belong in this pivot column - we need to filter out vals not part of this column group
filteredValues: count[vals]#vals[0N];
filteredValues[colGroupIdxs]: vals[colGroupIdxs];
/ Equivalent to filteredValues <> 0N
hasValue: count[vals]#0b;
hasValue[colGroupIdxs]: 1b;
/ Seed off pivot column with the first (filtered) value of each row group
/ This will be correct for row groups of size 1 as no aggregation needs to occur
pivotCol: filteredValues[rowGroupStartIdx];
/ Otherwise, for the row groups larger than 1, get the first (filtered) value
pivotCol[nonSingleRowGroupsIdx]: first'[filteredValues[nonSingleRowGroups]#'where'[hasValue[nonSingleRowGroups]]];
pivotCol
}
/ Groups with more than 1 row (these are the ones that will need aggregating)
nonSingleRowGroupsIdx: where 1 <> count'[rowGroupIdxs];
/ Get resulting pivot column for each combination of column and value fields
pivotCols: raze getPivotCol[rowGroupIdxs[;0]; rowGroupIdxs[nonSingleRowGroupsIdx]; nonSingleRowGroupsIdx] /:\: [table[vals]; colGroupIdxs]
/ Columns names are the cross-product of column and value fields
colNames:`${raze "" sv vals} each string raze (flip value flip colValues),'/:vals;
/ Finally, stitch together row and column headings with pivot data to obtain final table
rowValues!flip colNames!pivotCols
};
I also made a small change to formatting of columns names for my needs btw

Tidying up a list

I'm fairly sure there should be an elegant solution to this (in MATLAB), but I just can't think of it right now.
I have a list with [classIndex, start, end], and I want to collapse consecutive class indices into one group like so:
This
1 1 40
2 46 53
2 55 55
2 57 64
2 67 67
3 68 91
1 94 107
Should turn into this
1 1 40
2 46 67
3 68 91
1 94 107
How do I do that?
EDIT
Never mind, I think I got it - it's almost like fmarc's solution, but gets the indices right
a=[ 1 1 40
2 46 53
2 55 55
2 57 64
2 67 67
3 68 91
1 94 107];
d = diff(a(:,1));
startIdx = logical([1;d]);
endIdx = logical([d;1]);
b = [a(startIdx,1),a(startIdx,2),a(endIdx,3)];
Here is one solution:
Ad = find([1; diff(A(:,1))]~=0);
output = A(Ad,:);
output(:,3) = A([Ad(2:end)-1; Ad(end)],3);
clear Ad
One way to do it if the column in question is numeric:
Build the differences along the id-column. Consecutive identical items will have zero here:
diffind = diff(a(:,1)');
Use that to index your array, using logical indexing.
b = a([true [diffind~=0]],:);
Since the first item is always included and the difference vector starts with the difference from first to second element, we need to prepend one true value to the list.