Transform matrix without loop - matlab

I have oldMat, which is a ranking of equity tickers. The column number represents the respective rank, e.g. the first column holds the highest rank, the second column the second-highest rank, and so on. The integers within oldMat represent the number of the individual equity ticker. The number 3 in oldMat(3,2,1) means that the third equity ticker is ranked second in the third period (rows represent different periods).
Now, I need to transform oldMat in the following way: The column numbers now represent the individual equity tickers. The integers now represent the rank that individual equity tickers hold at specific periods. For example, the number 2 in newMat(3,3,1) means that the third equity ticker is ranked second in the third period.
I used a for-loop in order to solve that problem, but I am pretty sure there exists a more efficient way to achieve this result. Here's my code:
% Define oldMat
oldMat(:,:,1) = ...
[NaN, NaN, NaN, NaN, NaN, NaN; ...
1, 3, 4, 6, 2, 5; ...
6, 3, 4, 1, 2, 5; ...
2, 3, 6, 1, 4, 5; ...
5, 4, 6, 2, 3, 1; ...
5, 1, 2, 3, 6, 4; ...
4, 5, 1, 3, 6, 2; ...
4, 1, 6, 5, 2, 3];
oldMat(:,:,2) = ...
[NaN, NaN, NaN, NaN, NaN, NaN; ...
NaN, NaN, NaN, NaN, NaN, NaN; ...
1, 6, 3, 4, 2, 5; ...
6, 3, 2, 1, 4, 5; ...
2, 6, 3, 4, 1, 5; ...
5, 2, 1, 6, 3, 4; ...
5, 1, 3, 6, 2, 4; ...
4, 1, 5, 6, 3, 2];
% Pre-allocate newMat
newMat = nan(size(oldMat));
% Transform oldMat to newMat
for runNum = 1 : size(newMat,3)
    for colNum = 1 : size(newMat,2)
        for rowNum = 1 : size(newMat,1)
            if ~isnan(oldMat(rowNum, colNum, runNum))
                newMat(rowNum, oldMat(rowNum, colNum, runNum), runNum) = colNum;
            end
        end
    end
end

Looks like a classic case for sub2ind. You want to create a set of linear indices that address the second dimension of the new matrix and set those elements equal to the column number. First create a grid of 3D coordinates with meshgrid, then use the values in oldMat as the subscript into the second dimension of the output and set the addressed elements equal to the column number. Make sure that you don't pass any NaN values along, or sub2ind will complain. You can use isnan to filter these values out:
% Initialize new matrix
newMat = nan(size(oldMat));
% Generate a grid of coordinates
[X,Y,Z] = meshgrid(1:size(newMat,2), 1:size(newMat,1), 1:size(newMat,3));
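% Find the NaN entries and drop them from the grids
% (deleting by mask turns X, Y and Z into row vectors)
mask = isnan(oldMat);
X(mask) = []; Y(mask) = []; Z(mask) = [];
% Use the ticker numbers in oldMat as column subscripts and store the rank (X) there
newMat(sub2ind(size(oldMat), Y, oldMat(~mask).', Z)) = X;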
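To see the idea on something small enough to check by hand, here is a minimal 2-D sketch of the same technique. The matrix P and the variable names are made up for illustration, and ndgrid is used in place of meshgrid purely so that the first output is the row subscript; this is one way to write it, not the only one.

```matlab
% Hypothetical 3-period, 3-ticker example: P(row, rank) = ticker number
P = [NaN NaN NaN; ...
     1   3   2;  ...
     3   1   2];
R = nan(size(P));                          % R(row, ticker) = rank

% Row subscript and rank (column) subscript of every element of P
[rowIdx, rankIdx] = ndgrid(1:size(P,1), 1:size(P,2));

% Only non-NaN entries are valid subscripts for sub2ind
valid = ~isnan(P);

% Linear index of (row, ticker) for every valid entry, then store the rank there
lin = sub2ind(size(P), rowIdx(valid), P(valid));
R(lin) = rankIdx(valid);
```

In row 2, ticker 3 sits in the second column of P, so R(2,3) ends up as 2; the all-NaN first row is left untouched.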

Related

Octave function to get groups of consecutive columns in matrix

I am trying to find an efficient way of extracting groups of n consecutive columns in a matrix. Example:
A = [0, 1, 2, 3, 4; 0, 1, 2, 3, 4; 0, 1, 2, 3, 4];
n = 3;
should produce an output similar to this:
answer = cat(3, ...
[0, 1, 2; 0, 1, 2; 0, 1, 2], ...
[1, 2, 3; 1, 2, 3; 1, 2, 3], ...
[2, 3, 4; 2, 3, 4; 2, 3, 4]);
I know this is possible using a for loop, such as the following code snippet:
answer = zeros([3, 3, 3]);
for i = 1:3
  answer(:, :, i) = A(:, i:i+2);
endfor
However, I am trying to avoid using a for loop in this case - is there any possibility to vectorize this operation as well (using indexed expressions)?
Using just indexing
ind = reshape(1:size(A,1)*n, [], n) + reshape((0:size(A,2)-n)*size(A,1), 1, 1, []);
result = A(ind);
The index ind is built using linear indexing and implicit expansion.
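Splitting that one-liner into its two parts may make it easier to follow. The sketch below uses the example A and n from the question; the names rows, base and offsets are introduced here for illustration only. It assumes implicit expansion is available (Octave, or MATLAB R2016b and later); on older MATLAB releases bsxfun(@plus, base, offsets) should do the same job.

```matlab
A = [0, 1, 2, 3, 4; 0, 1, 2, 3, 4; 0, 1, 2, 3, 4];
n = 3;
rows = size(A, 1);

% Linear indices of the first window (columns 1..n): [1 4 7; 2 5 8; 3 6 9]
base = reshape(1:rows*n, rows, n);

% Moving one column to the right adds rows to a linear index;
% the offsets 0, 3, 6 are laid out along the third dimension
offsets = reshape((0:size(A,2)-n) * rows, 1, 1, []);

% Implicit expansion combines the 3x3 base with the 1x1x3 offsets into 3x3x3
ind = base + offsets;
result = A(ind);
```

Each page result(:, :, k) is then the window A(:, k:k+n-1).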
Using the Image Package / Image Processing Toolbox
result = reshape(im2col(A, [size(A,1) n], 'sliding'), size(A,1), n, []);
Most of the work here is done by the im2col function with the 'sliding' option.

Octave/MATLAB: Using a matrix to access elements in a matrix without loops

Consider the two matrices:
>> columns = [1,3,2,4]
and
>> WhichSet =
[2, 2, 1, 2;
1, 1, 2, 1;
1, 2, 1, 2;
2, 1, 2, 2]
My intent is to do the following:
>> result = [WhichSet(1,columns(1)), WhichSet(2,columns(2)), WhichSet(3,columns(3)), WhichSet(4,columns(4))]
result = [2,2,2,2]
without any loops.
Because of how indexing works, you cannot just plug them in as they are; you need linear indexing.
Your desired linear indices are:
ind = sub2ind(size(WhichSet), 1:size(WhichSet,1), columns);
Then
out=WhichSet(ind);
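For a 2-D matrix, what sub2ind computes here can also be written out by hand, which shows where the numbers come from. This is a sketch using the data from the question; nRows is a name introduced for illustration.

```matlab
columns  = [1, 3, 2, 4];
WhichSet = [2, 2, 1, 2; ...
            1, 1, 2, 1; ...
            1, 2, 1, 2; ...
            2, 1, 2, 2];
nRows = size(WhichSet, 1);

% Column-major storage: element (r, c) lives at linear index r + (c-1)*nRows
ind = (1:nRows) + (columns - 1) * nRows;   % [1 10 7 16]
result = WhichSet(ind);                    % [2 2 2 2]
```

Using sub2ind is usually the more readable choice; the manual formula is only meant to make the column-major layout visible.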

Group and sum elements that are the same within a vector

Let's say I have a vector that looks like this (the numbers will always be > 0)...
[1, 2, 1, 4, 1, 2, 4, 3]
I need a vectorized implementation that sums the numbers together and uses the original number as the index to store the number. So if I run it I would get...
% step 1
[1+1+1, 2+2, 3, 4+4]
% step 2
[3, 4, 3, 8]
I have already implemented this using for loops, but I feel like there is a vectorized way to achieve this. I am still quite new at vectorizing functions so any help is appreciated.
This sounds like a job for accumarray:
v = [1, 2, 1, 4, 1, 2, 4, 3];
result = accumarray(v(:), v(:)).'
result =
3 4 3 8
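One behaviour to be aware of: because the values themselves serve as output positions, any integer between 1 and max(v) that does not occur in v still gets a slot, filled with 0. A small sketch with a made-up input:

```matlab
v = [1, 4, 4];                        % hypothetical input; the values 2 and 3 never occur
result = accumarray(v(:), v(:)).';    % positions 2 and 3 receive no contributions
% result is [1 0 0 8]
```

Whether those zeros are wanted depends on how the result is used afterwards.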
Other approaches:
Using histcounts:
x = [1, 2, 1, 4, 1, 2, 4, 3];
u = unique(x);
result = u.*histcounts(x, [u inf]);
Using bsxfun (may be more memory-intensive):
x = [1, 2, 1, 4, 1, 2, 4, 3];
u = unique(x);
result = u .* sum(bsxfun(@eq, x(:), u(:).'), 1);

cut vector according to NaN values

data_test is a vector that is populated by numbers with some NaN.
data_test = [NaN, 2, 3, 4, NaN,NaN,NaN, 12 ,44, 34, NaN,5,NaN];
I would like to cut data_test at the NaNs and create a cell array containing the pieces of data_test that lie between the NaNs.
data_cell{1}=[2 3 4];
data_cell{2}=[12 44 34];
data_cell{3}=[5];
At this point I need to filter these values (this part is OK; just as an example, the filtered values will be the values of data_test + 1)
data_cell{1} -> data_cell_filt{1}
data_cell{2} -> data_cell_filt{2}
data_cell{3} -> data_cell_filt{3}
and put back the filtered values in data_test.
data_cell_filt{1}
data_cell_filt{2} -> data_test
data_cell_filt{3}
in order that data_test is
data_test = [NaN, 3, 4, 5, NaN,NaN,NaN, 13 ,45, 35, NaN, 6, NaN];
PS: data_test in my case has ~20000 elements.
You can do it easily with a loop or use arrayfun like this:
A = [NaN, 2, 3, 4, NaN, NaN, NaN, 13, 45, 35, NaN, 6, NaN]
i1 = find(diff(isnan(A))==-1)+1 %// Index where clusters of numbers begin
i2 = find(diff(isnan(A))==1) %// Index where clusters of numbers end
data_cell_filt = arrayfun(@(x,y) A(x:y), i1, i2, 'UniformOutput', false) %// one cell per run of numbers
One approach with accumarray, cumsum and diff:
%// find the index of regular numbers
idx = find(~isnan(data_test))
%// group the numbers which are adjacent, to some index number
idx1 = cumsum([1,diff(idx)~=1])
%// put all those numbers of same index number into a cell
out = accumarray(idx1.',data_test(idx).',[],@(x) {x.'})
Sample run:
data_test = [NaN, 2, 3, 4, NaN,NaN,NaN, 12 ,44, 34, NaN,5,NaN];
>> celldisp(out)
out{1} =
2 3 4
out{2} =
12 44 34
out{3} =
5
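The question also asks for the filtered pieces to be written back into data_test. A possible sketch that builds on the accumarray approach above: because idx already lists the non-NaN positions in order, the filtered pieces can be concatenated and assigned in one step. The "+ 1" filter is the placeholder from the question, and the names grp, pieces and filtered are introduced here for illustration.

```matlab
data_test = [NaN, 2, 3, 4, NaN, NaN, NaN, 12, 44, 34, NaN, 5, NaN];

% Positions of the regular numbers and a group label for each run
idx = find(~isnan(data_test));
grp = cumsum([1, diff(idx) ~= 1]);

% One cell per run, each holding a row vector
pieces = accumarray(grp.', data_test(idx).', [], @(x) {x.'});

% Placeholder filter from the question: add 1 to every value
filtered = cellfun(@(p) p + 1, pieces, 'UniformOutput', false);

% Concatenate the runs in order and write them back to the non-NaN positions
data_test(idx) = [filtered{:}];
```

This relies on the filter returning a piece of the same length as its input; a filter that changes the length would need a different write-back step.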
Convolution-based approach:
ind = isnan(data_test);
t = conv(2*ind-1, [-1 1], 'same'); %// convolution is like correlation but flips 2nd input
starts = find(t==2); %// indices of where a run of non-NaN's starts, minus 1
ends = find(t==-2); %// indices of where it ends
result = mat2cell(data_test(~ind), 1, ends-starts); %// pick non-NaN's and split

MATLAB, frequency table with a class of interval of size 2

The following data show the number of errors per book for 20 publishers:
2, 5, 2, 8, 2, 3, 5, 6, 1, 0, 2, 0, 1, 5, 0, 0, 4, 5, 1, 2
Now I want to compute a frequency table with class intervals of size 2, together with relative frequencies, using MATLAB.
I can make a frequency table with the command tabulate(x), but I cannot find any reference that explains how to compute a frequency table with class intervals of size 2.
You can use histc, which lets you specify the edges of the histogram bins. It doesn't compute relative frequencies or print a table, though; you have to do that yourself:
% error data
e = [2, 5, 2, 8, 2, 3, 5, 6, 1, 0, 2, 0, 1, 5, 0, 0, 4, 5, 1, 2];
% bin edges
be = 0 :2: ceil(max(e) / 2) * 2;
% absolute frequencies
af = histc(e, be);
% relative frequencies
rf = af / sum(af);
% print table
fprintf(' Value Count Percent\n')
fprintf(' %d-%d\t %d\t %5.2f%%\n', [be; be + 1; af; rf * 100])
The result is:
Value Count Percent
0-1 7 35.00%
2-3 6 30.00%
4-5 5 25.00%
6-7 1 5.00%
8-9 1 5.00%
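As a cross-check that does not depend on histc (which recent MATLAB documentation flags as not recommended), the same counts can be obtained with accumarray. This is only a sketch under the assumption that the data are non-negative integers and the bins start at 0; the names w, bin and lo are introduced here for illustration.

```matlab
% error data from the question
e = [2, 5, 2, 8, 2, 3, 5, 6, 1, 0, 2, 0, 1, 5, 0, 0, 4, 5, 1, 2];
w = 2;                           % class width

% bin number of every observation: 0-1 -> 1, 2-3 -> 2, 4-5 -> 3, ...
bin = floor(e / w) + 1;

% absolute frequencies: add 1 to the slot of each observation's bin
af = accumarray(bin(:), 1).';

% relative frequencies
rf = af / numel(e);

% lower edge of each class, for printing
lo = (0:numel(af)-1) * w;
fprintf(' %d-%d\t %d\t %5.2f%%\n', [lo; lo + w - 1; af; rf * 100])
```

For this data set af comes out as 7, 6, 5, 1, 1, matching the table above.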