Apply function to all rows - matlab

I have a function, ranker, that takes a vector and assigns numerical ranks to it in ascending order. For example,
ranker([5 1 3 600]) = [3 1 2 4] or
ranker([42 300 42 42 1 42]) = [3.5 6 3.5 3.5 1 3.5] .
I am using a matrix, variable_data, and I want to apply the ranker function to each row of variable_data. This is my current solution, but I feel there is a way to vectorize it and have it be at least as fast :p
variable_ranks = nan(size(variable_data));
for i=1:1:numel(nmac_ids)
variable_ranks(i,:) = ranker(abs(variable_data(i,:)));
end

If you place the matrix rows into a cell array, you can then apply a function to each cell.
Consider this simple example of applying the SORT function to each row
a = rand(10,3);
b = cell2mat( cellfun(@sort, num2cell(a,2), 'UniformOutput',false) );
%# same as: b = sort(a,2);
You can even do this:
b = cell2mat( arrayfun(@(i) sort(a(i,:)), 1:size(a,1), 'UniformOutput',false)' );
Again, your version with the for loop is probably faster.

With collaboration from Amro and Jonas
variable_ranks = tiedrank(variable_data')';
Ranker has been replaced by the Matlab function in the Stat toolbox (sorry for those who don't have it),
[R,TIEADJ] = tiedrank(X) computes the ranks of the values in the vector X. If any X values are tied, tiedrank computes their average rank. The return value TIEADJ is an adjustment for ties required by the nonparametric tests signrank and ranksum, and for the computation of Spearman's rank correlation.
TIEDRANK will compute along columns in Matlab 7.9.0 (R2009b), however it is undocumented. So by transposing the input matrix, rows turn into columns and tiedrank ranks each of them. The second transpose then restores the original orientation of the data. In essence, it is a very classy hack :p
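For those without the Statistics toolbox, the same average-rank behaviour can be approximated with core functions only. The following is a minimal sketch, not the original ranker: the sample data is made up, and it keeps a loop over rows while using sort, unique and accumarray inside each row.

```matlab
% Toolbox-free sketch of row-wise ranking with average ranks for ties.
% The sample data below is hypothetical; the second row is the example from the question.
variable_data = [5 1 3 600 7 8; 42 300 42 42 1 42];
[nr, nc] = size(variable_data);
variable_ranks = zeros(nr, nc);
for i = 1:nr
    [s, idx] = sort(variable_data(i, :));            % ascending values and their original positions
    [~, ~, grp] = unique(s);                         % tied values share one group id
    avg = accumarray(grp(:), (1:nc).', [], @mean);   % mean sorted position per group
    variable_ranks(i, idx) = reshape(avg(grp), 1, []);  % scatter ranks back to original positions
end
disp(variable_ranks)
```

If tiedrank is available, comparing against tiedrank(variable_data')' is a quick sanity check. Note that the question ranks abs(variable_data); wrap the input in abs if that is what you need.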

One way would be to rewrite ranker to take array input
sizeData = size(variable_data);
[sortedData,almostRanks] = sort(abs(variable_data),2);
[rowIdx,colIdx] = ndgrid(1:sizeData(1),1:sizeData(2));
linIdx = sub2ind(sizeData,rowIdx,almostRanks);
variable_ranks = variable_data;
variable_ranks(linIdx) = colIdx;
%# break ties by finding subsequent equal entries in sorted data
[rr,cc] = find(diff(sortedData,1,2) == 0);
ii = sub2ind(sizeData,rr,cc);
ii2 = sub2ind(sizeData,rr,cc+1);
ii = sub2ind(sizeData,rr,almostRanks(ii));
ii2 = sub2ind(sizeData,rr,almostRanks(ii2));
variable_ranks(ii) = variable_ranks(ii2);
EDIT
Instead, you can just use TIEDRANK from TMW (thanks, @Amro):
variable_ranks = tiedrank(variable_data')';

I wrote a function that does this; it's on the FileExchange as tiedrank_(X,dim). It looks like this:
%[Step 0a]: force dim to be 1, and compress everything else into a single
%dimension. We will reverse this process at the end.
if dim > 1
otherDims = 1:length(size(X));
otherDims(dim) = [];
perm = [dim otherDims];
X = permute(X,perm);
end
originalSiz = size(X);
X = reshape(X,originalSiz(1),[]);
siz = size(X);
%[Step 1]: sort and get sorting indices
[X,Ind] = sort(X,1);
%[Step 2]: create matrix [D], which has +1 at the start of consecutive runs
% and -1 at the end, with zeros elsewhere.
D = zeros(siz,'int8');
D(2:end-1,:) = diff(X(1:end-1,:) == X(2:end,:));
D(1,:) = X(1,:) == X(2,:);
D(end,:) = -( X(end,:) == X(end-1,:) );
clear X
%[Step 3]: calculate the averaged rank for each consecutive run
[a,~] = find(D);
a = reshape(a,2,[]);
h = sum(a,1)/2;
%[Step 4]: insert the troublesome ranks in the relevant places
L = zeros(siz);
L(D==1) = h;
L(D==-1) = -h;
L = cumsum(L);
L(D==-1) = h; %cumsum set these ranks to zero, but we wanted them to be h
clear D h
%[Step 5]: insert the simple ranks (i.e. the ones that didn't clash)
[L(~L),~] = find(~L);
%[Step 6]: assign the ranks to the relevant position in the matrix
Ind = bsxfun(@plus,Ind,(0:siz(2)-1)*siz(1)); %equivalent to using sub2ind + repmat
r(Ind) = L;
%[Step 0b]: As promised, we reinstate the correct dimensional shape and order
r = reshape(r,originalSiz);
if dim > 1
r = ipermute(r,perm);
end
I hope that helps someone.

Related

Vectorize MATLAB code

Let's say we have three m-by-n matrices of equal size: A, B, C.
Every column in C represents a time series.
A is the running maximum (over a fixed window length) of each time series in C.
B is the running minimum (over a fixed window length) of each time series in C.
Is there a way to determine T in a vectorized way?
[nrows, ncols] = size(A);
T = zeros(nrows, ncols);
for row = 2:nrows %loop over the rows (except row #1).
for col = 1:ncols %loop over the columns.
if C(row, col) > A(row-1, col)
T(row, col) = 1;
elseif C(row, col) < B(row-1, col)
T(row, col) = -1;
else
T(row, col) = T(row-1, col);
end
end
end
This is what I've come up with so far:
T = zeros(m, n);
T(C > circshift(A,1)) = 1;
T(C < circshift(B,1)) = -1;
Well, the trouble was the dependency in the ELSE part of the conditional statement. So, after a long mental workout, here's a way I came up with to vectorize the hell outta everything.
This approach is based on mapping. We get the column-wise runs (islands) of 1s in the 2D mask for the ELSE part and give every element of a run the same tag. Then, for each run, we go to the element just before its start (start-1) in the same column and store that value. Finally, indexing into those stored start-1 values with the tags, which act as mapping indices, gives us all the elements that are to be set in the output.
Here's the implementation -
%// Store sizes
[m1,n1] = size(A);
%// Masks corresponding to three conditions
mask1 = C(2:m1,:) > A(1:m1-1,:);
mask2 = C(2:m1,:) < B(1:m1-1,:);
mask3 = ~(mask1 | mask2);
%// All but mask3 set values as output
out = [zeros(1,n1) ; mask1 + (-1*(~mask1 & mask2))];
%// Proceed if any element in mask3 is set
if any(mask3(:))
%// Row vectors for appending onto matrices for matching up sizes
mask_appd = false(1,n1);
row_appd = zeros(1,n1);
%// Get 2D mapped indices
df = diff([mask_appd ; mask3],[],1)==1;
cdf = cumsum(df,1);
offset = cumsum([0 max(cdf(:,1:end-1),[],1)]);
map_idx = bsxfun(@plus,cdf,offset);
map_idx(map_idx==0) = 1;
%// Extract the values to be used for setting into new places
A1 = out([df ; false(1,n1)]);
%// Map with the indices obtained earlier and set at places from mask3
newval = [row_appd ; A1(map_idx)];
mask3_appd = [mask_appd ; mask3];
out(mask3_appd) = newval(mask3_appd);
end
Doing this vectorized is rather difficult because the current row's output depends on the previous row's output. Doing vectorized operations usually means that each element should stand out on its own using some relationship that is independent of the other elements that surround it.
I don't have any input on how you would achieve this without a for loop but I can help you reduce your operations down to one instead of two. You can do the assignment vectorized per row, but I can't see how you'd do it all in one shot.
As such, try something like this instead:
[nrows, ncols] = size(A);
T = zeros(nrows, ncols);
for row = 2:nrows
out = T(row-1,:); %// Change - Make a copy of the previous row
out(C(row,:) > A(row-1,:)) = 1; %// Set those elements of C
%// in the current row that are larger
%// than the previous row of A to 1
out(C(row,:) < B(row-1,:)) = -1; %// Same logic but for B now and it's
%// less than and the value is -1 instead
T(row,:) = out; %// Assign to the output
end
I'm currently figuring out how to do this without any loops whatsoever. I'll keep you posted.
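A quick way to gain confidence in the row-wise loop is to compare it against the element-by-element loop from the question on made-up data. This is only a sketch: the window length win and the random-walk series are assumptions chosen for illustration, with A and B built as a trailing running maximum and minimum.

```matlab
% Hypothetical test data: columns of C are time series, A/B are running max/min.
nrows = 50; ncols = 4; win = 5;
C = cumsum(randn(nrows, ncols));
A = zeros(nrows, ncols);
B = zeros(nrows, ncols);
for row = 1:nrows
    first = max(1, row - win + 1);            % start of the trailing window
    A(row, :) = max(C(first:row, :), [], 1);
    B(row, :) = min(C(first:row, :), [], 1);
end

% Reference: double loop from the question
T_ref = zeros(nrows, ncols);
for row = 2:nrows
    for col = 1:ncols
        if C(row, col) > A(row-1, col)
            T_ref(row, col) = 1;
        elseif C(row, col) < B(row-1, col)
            T_ref(row, col) = -1;
        else
            T_ref(row, col) = T_ref(row-1, col);
        end
    end
end

% Row-wise version: one loop, vectorized across columns
T = zeros(nrows, ncols);
for row = 2:nrows
    out = T(row-1, :);                        % carry the previous row forward
    out(C(row, :) > A(row-1, :)) = 1;
    out(C(row, :) < B(row-1, :)) = -1;
    T(row, :) = out;
end
disp(isequal(T, T_ref))
```

The two versions agree here because A >= B everywhere, so both conditions can never hold for the same element. If A could fall below B, the if/elseif would give 1 while the row-wise version would give -1, since its second assignment overwrites the first.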

Generate random 2D matrix with unique rows in octave/matlab

I want to generate a 2D matrix (1000x3) with random values in the range of 1 to 10 in Octave. Using randi(10,1000,3) will generate a matrix that may contain repeated rows. But I want the rows to be unique (unrepeated). Is there any way I can do that?
You can do that easily by getting the cartesian product to create all possibilities and shuffle the array as follows. To create the cartesian product, you will need my custom cartprod.m function that generates a cartesian product.
C = cartprod(1:10,1:10,1:10);
The following line then shuffles the cartesian product C.
S = C(randperm( size(C,1) ),:);
Notes:
Every row in S is unique, and you can verify that size( unique( S, 'rows' ), 1 ) == 1000.
I should note that this code works on Matlab 2015a. I haven't tested it in Octave, which is what OP seems to be using. I've been told the syntax is pretty much identical though.
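Since cartprod.m is a custom function that is not shown here, the same idea can be sketched with the built-in ndgrid, which exists in both MATLAB and Octave. The variable names c1, c2, c3 are just for illustration; this is a stand-in for cartprod, not a copy of it.

```matlab
% Build all 10^3 triples with ndgrid, then shuffle the rows.
[c1, c2, c3] = ndgrid(1:10, 1:10, 1:10);
C = [c1(:) c2(:) c3(:)];              % 1000-by-3, every row distinct
S = C(randperm(size(C, 1)), :);       % random row order
disp(size(unique(S, 'rows'), 1))      % number of distinct rows
```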
You can generate all possible three-item sequences drawn from 1 through 10, with replacement, using the following function:
function result = nchoosek_replacement(n, k)
%// Edge cases: just return an empty matrix
if k < 1 || n < 1
result = [];
return
end
reps = n^(k-1);
result = zeros(n^k, k);
cur_col = repmat(1:n, reps, 1);
result(:,1) = cur_col(:);
%// Base case: when k is 1, just return the
%// fully populated matrix 'result'
if k == 1
return
end
%// Recursively generate a matrix that will
%// be used to populate columns 2:end
next = nchoosek_replacement(n, k-1);
%// Repeatedly use the matrix above to
%// populate the matrix 'result'
for i = 1:n
cur_range = (i-1)*reps+1:i*reps;
result(cur_range, 2:end) = next;
end
end
With this function defined, you can now generate all possible sequences. In this case there are exactly 1000 so they could simply be shuffled with randperm. A more general approach is to sample from them with randsample, which would also allow for smaller matrices if desired:
max_value = 10;
row_size = 3;
num_rows = 1000;
possible = nchoosek_replacement(max_value, row_size);
indices = randsample(size(possible, 1), num_rows);
data = possible(indices, :);
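When n^k is too large to enumerate comfortably, one alternative is to sample distinct linear indices and decode each one into a row with ind2sub. This is a sketch under the assumption that the two-argument form randperm(n, k) is available (it is in recent MATLAB and Octave releases); num_rows is set to 200 here only to show that smaller samples work too.

```matlab
max_value = 10;
row_size  = 3;
num_rows  = 200;
% Distinct linear indices into a max_value-by-...-by-max_value grid
idx = randperm(max_value^row_size, num_rows);
% Decode each linear index into row_size subscripts, one column per subscript
cols = cell(1, row_size);
[cols{:}] = ind2sub(max_value * ones(1, row_size), idx(:));
data = [cols{:}];                     % num_rows-by-row_size, rows are unique
```

Because the linear indices are distinct and the decoding is one-to-one, the rows of data cannot repeat.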

Find size of matrix, without using `size` in MATLAB

Suppose I want to find the size of a matrix, but can't use any functions such as size, numel, and length. Are there any neat ways to do this? I can think of a few versions using loops, such as the one below, but is it possible to do this without loops?
function sz = find_size(m)
sz = [0, 0];
for ii = m' %' or m(:,1)' (probably faster)
sz(1) = sz(1) + 1;
end
for ii = m % or m(1,:)
sz(2) = sz(2) + 1;
end
end
And for the record: This is not a homework, it's out of curiosity. Although the solutions to this question would never be useful in this context, it is possible that they provide new knowledge in terms of how certain functions/techniques can be used.
Here is a more generic solution
function sz = find_size(m)
sz = [];
m(f(end), f(end));
function r = f(e)
r=[];
sz=[sz e];
end
end
Which
Works for arrays, cell arrays and arrays of objects
Its time complexity is constant and independent of matrix size
Does not use any MATLAB functions
Is easy to adapt to higher dimensions
For non-empty matrices you can use:
sz = [sum(m(:,1)|1) sum(m(1,:)|1)];
But to cover empty matrices we need more function calls
sz = sqrt([sum(sum(m*m'|1)) sum(sum(m'*m|1))]);
or more lines
n=m&0;
n(end+1,end+1)=1;
[I,J]=find(n);
sz=[I,J]-1;
Which both work fine for m=zeros(0,0), m=zeros(0,10) and m=zeros(10,0).
Incremental indexing and a try-catch statement work:
function sz = find_size(m)
sz = [0 0];
isError = false;
while ~isError
try
b = m(sz(1) + 1, :);
sz(1) = sz(1) + 1;
catch
isError = true;
end
end
isError = false;
while ~isError
try
b = m(:, sz(2) + 1);
sz(2) = sz(2) + 1;
catch
isError = true;
end
end
end
A quite general solution is:
[ sum(~sum(m(:,[]),2)) sum(~sum(m([],:),1)) ]
It accepts empty matrices (with 0 columns, 0 rows, or both), as well as complex, NaN or inf values.
It is also very fast: for a 1000 × 1000 matrix it takes about 22 microseconds on my old laptop (a for loop with 1e5 repetitions takes 2.2 seconds, measured with tic, toc).
How this works:
The keys to handling empty matrices in a unified way are:
empty indexing (that is, indexing with []);
the fact that summing along an empty dimension gives zeros.
Let r and c be the (possibly zero) numbers of rows and columns of m. m(:,[]) is an r × 0 empty vector. This holds even if r or c are zero. In addition, this empty indexing automatically provides insensitivity to NaN, inf or complex values in m (and probably accounts for the small computation time as well).
Summing that r × 0 vector along its second dimension (sum(m(:,[]),2)) produces a vector of r × 1 zeros. Negating and summing this vector gives r.
The same procedure is applied for the number of columns, c, by empty-indexing in the first dimension and summing along that dimension.
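The explanation above can be checked on a few inputs. This is a small sketch: the test matrices are made up, and size and numel appear only in the test harness, not in the expression being tested.

```matlab
% Hypothetical test inputs, including empty matrices and NaN/Inf/complex values
tests = {zeros(0,0), zeros(0,10), zeros(10,0), [1 NaN; Inf 2i; 3 4]};
results = zeros(numel(tests), 2);
for k = 1:numel(tests)
    m = tests{k};
    % r-by-0 and 0-by-c slices: summing over the empty dimension yields zeros
    results(k, :) = [sum(~sum(m(:,[]),2)) sum(~sum(m([],:),1))];
    fprintf('%d x %d\n', results(k, 1), results(k, 2));
end
```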
The find command has a neat option to get the last K elements:
I = find(X,K,'last') returns at most the last K indices corresponding to the nonzero entries of the array X.
To get the size, ask for the last k=1 elements. For example,
>> x=zeros(256,4);
>> [numRows,numCols] = find(x|x==0, 1, 'last')
numRows =
256
numCols =
4
>> numRows0 = size(x,1), numCols0 = size(x,2)
numRows0 =
256
numCols0 =
4
You can use find with the single output argument syntax, which will give you numel:
>> numEl = find(x|x==0, 1, 'last')
numEl =
1024
>> numEl0 = numel(x)
numEl0 =
1024
Another straightforward, but less interesting solution uses whos (thanks for the reminder Navan):
s=whos('x'); s.size
Finally, there is format debug.

General method to find submatrix in matlab matrix

I am looking for a 'good' way to find a matrix (pattern) in a larger matrix (arbitrary number of dimensions).
Example:
total = rand(3,4,5);
sub = total(2:3,1:3,3:4);
Now I want this to happen:
loc = matrixFind(total, sub)
In this case loc should become [2 1 3].
For now I am just interested in finding one single point (if it exists) and am not worried about rounding issues. It can be assumed that sub 'fits' in total.
Here is how I could do it for 3 dimensions, however it just feels like there is a better way:
total = rand(3,4,5);
sub = total(2:3,1:3,3:4);
loc = [];
for x = 1:size(total,1)-size(sub,1)+1
for y = 1:size(total,2)-size(sub,2)+1
for z = 1:size(total,3)-size(sub,3)+1
block = total(x:x+size(sub,1)-1,y:y+size(sub,2)-1,z:z+size(sub,3)-1);
if isequal(sub,block)
loc = [x y z]
end
end
end
end
I hope to find a workable solution for an arbitrary number of dimensions.
Here is a low-performance, but (supposedly) arbitrary-dimensional function. It uses find to create a list of (linear) indices of potential matching positions in total and then just checks if the appropriately sized subblock of total matches sub.
function loc = matrixFind(total, sub)
%matrixFind find position of array in another array
% initialize result
loc = [];
% pre-check: do all elements of sub exist in total?
elements_in_both = intersect(sub(:), total(:));
if numel(elements_in_both) < numel(unique(sub))
% if not, return nothing
return
end
% select a pivot element
% Improvement: use least common element in total for less iterations
pivot_element = sub(1);
% determine linear index of all occurrences of pivot_element in total
starting_positions = find(total == pivot_element);
% prepare cell arrays for variable length subscript vectors
[subscripts, subscript_ranges] = deal(cell([1, ndims(total)]));
for k = 1:length(starting_positions)
% fill subscript vector for starting position
[subscripts{:}] = ind2sub(size(total), starting_positions(k));
% add offsets according to size of sub per dimension
for m = 1:length(subscripts)
subscript_ranges{m} = subscripts{m}:subscripts{m} + size(sub, m) - 1;
end
% is subblock of total equal to sub
if isequal(total(subscript_ranges{:}), sub)
loc = [loc; cell2mat(subscripts)]; %#ok<AGROW>
end
end
end
This is based on doing all possible shifts of the original matrix total and comparing the upper-leftmost-etc sub-matrix of the shifted total with the sought pattern sub. Shifts are generated using strings, and are applied using circshift.
Most of the work is done vectorized. Only one level of loops is used.
The function finds all matchings, not just the first. For example:
>> total = ones(3,4,5,6);
>> sub = ones(3,3,5,6);
>> matrixFind(total, sub)
ans =
1 1 1 1
1 2 1 1
Here is the function:
function sol = matrixFind(total, sub)
nd = ndims(total);
sizt = size(total).';
max_sizt = max(sizt);
sizs = [ size(sub) ones(1,nd-ndims(sub)) ].'; % in case there are
% trailing singletons
if any(sizs>sizt)
error('Incorrect dimensions')
end
allowed_shift = (sizt-sizs);
max_allowed_shift = max(allowed_shift);
if max_allowed_shift>0
shifts = dec2base(0:(max_allowed_shift+1)^nd-1,max_allowed_shift+1).'-'0';
filter = all(bsxfun(@le,shifts,allowed_shift));
shifts = shifts(:,filter); % possible shifts of matrix "total", along
% all dimensions
else
shifts = zeros(nd,1);
end
for dim = 1:nd
d{dim} = 1:sizt(dim); % vectors with subindices per dimension
end
g = cell(1,nd);
[g{:}] = ndgrid(d{:}); % grid of subindices per dimension
gc = cat(nd+1,g{:}); % concatenated grid
accept = repmat(permute(sizs,[2:nd+1 1]), [sizt; 1]); % acceptable values
% of subindices in order to compare with matrix "sub"
ind_filter = find(all(gc<=accept,nd+1));
sol = [];
for shift = shifts
total_shifted = circshift(total,-shift);
if all(total_shifted(ind_filter)==sub(:))
sol = [ sol; shift.'+1 ];
end
end
For an arbitrary number of dimensions, you might try convn.
C = convn(total,reshape(sub(end:-1:1),size(sub)),'valid'); % flip dimensions of sub to be correlation
[~,indmax] = max(C(:));
% thanks to Eitan T for the next line
cc = cell(1,ndims(total)); [cc{:}] = ind2sub(size(C),indmax); subs = [cc{:}]
Thanks to Eitan T for the suggestion to use comma-separated lists for a generalized ind2sub.
Finally, you should test the result with isequal because this is not a normalized cross correlation, meaning that larger numbers in a local subregion will inflate the correlation value potentially giving false positives. If your total matrix is very inhomogeneous with regions of large values, you might need to search other maxima in C.
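One way to act on that advice is to walk through the correlation values from largest to smallest and accept the first candidate that passes isequal. The following is a sketch built on the example from the question; the names order, start, ranges and szs are introduced here for illustration.

```matlab
total = rand(3,4,5);
sub = total(2:3,1:3,3:4);
nd = ndims(total);
szs = [size(sub) ones(1, nd - ndims(sub))];     % pad with trailing singleton sizes
% Flip sub so that the convolution acts as a correlation
C = convn(total, reshape(sub(end:-1:1), size(sub)), 'valid');
[~, order] = sort(C(:), 'descend');             % candidates, strongest first
loc = [];
cc = cell(1, nd);
for k = order.'
    [cc{:}] = ind2sub(size(C), k);
    start = [cc{:}];                            % candidate corner subscripts
    ranges = arrayfun(@(s, n) s:s+n-1, start, szs, 'UniformOutput', false);
    if isequal(total(ranges{:}), sub)           % exact check guards against false positives
        loc = start;
        break
    end
end
disp(loc)
```

For this example loc should come out as [2 1 3]. In the worst case the loop visits every position, so the convolution only serves to order the candidates.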

Removing a random number of columns from a matrix

I need to take away a random number of columns from an arbitrarily large matrix. I've put my attempt below, but I'm certain that there is a better way.
function new = reduceMatrices(original, colsToTakeAway)
a = colsToTakeAway(1);
b = colsToTakeAway(2);
c = colsToTakeAway(3);
x = original(1:a-1);
y = original(a+1:b-1);
z = original(b+1:c-1);
if c == size(original, 2);
new = [x,y,z];
elseif (c+1) == size(original, 2);
new = [x,y,z,c+1]
else
new = [x,y,z,c+1:size(original, 2)];
end
Here's one approach. First, generate a row vector of random numbers with numcols elements, where numcols is the number of columns in the original matrix:
rc = rand(1,numcols)
Next make a vector of 1s and 0s from this, for example
lv = rc>0.75
which will produce something like
0 1 1 0 1
and you can use Matlab's logical indexing feature to write
original(:,lv)
which will return only those columns of original which correspond to the 1s in lv.
It's not entirely clear from your question how you want to make the vector of column selections, but this should give you some ideas.
function newM = reduceMatrices(original, colsToTakeAway)
% define the columns to keep := cols \ colsToTakeAway
colsToKeep = setdiff(1:size(original,2), colsToTakeAway);
newM = original(:, colsToKeep);
end
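If the columns to drop should themselves be chosen at random, a short sketch could look like the following. The test matrix and the choice of dropping between 1 and numcols-1 columns are assumptions made for illustration; the deletion uses assignment of [] instead of setdiff.

```matlab
original = magic(6);                       % made-up test matrix
numcols = size(original, 2);
k = randi(numcols - 1);                    % random count between 1 and numcols-1
colsToTakeAway = randperm(numcols, k);     % k distinct column indices
reduced = original;
reduced(:, colsToTakeAway) = [];           % assigning [] deletes those columns
disp(size(reduced))
```

Both this and the setdiff version keep the remaining columns in their original order.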