General method to find submatrix in matlab matrix - matlab

I am looking for a 'good' way to find a matrix (pattern) in a larger matrix (arbitrary number of dimensions).
Example:
total = rand(3,4,5);
sub = total(2:3,1:3,3:4);
Now I want this to happen:
loc = matrixFind(total, sub)
In this case loc should become [2 1 3].
For now I am just interested in finding one single point (if it exists) and am not worried about rounding issues. It can be assumed that sub 'fits' in total.
Here is how I could do it for 3 dimensions, however it just feels like there is a better way:
total = rand(3,4,5);
sub = total(2:3,1:3,3:4);
loc = [];
for x = 1:size(total,1)-size(sub,1)+1
for y = 1:size(total,2)-size(sub,2)+1
for z = 1:size(total,3)-size(sub,3)+1
block = total(x:x+size(sub,1)-1,y:y+size(sub,2)-1,z:z+size(sub,3)-1);
if isequal(sub,block)
loc = [x y z]
end
end
end
end
I hope to find a workable solution for an arbitrary number of dimensions.

Here is a low-performance but (supposedly) arbitrary-dimensional function. It uses find to create a list of (linear) indices of potential matching positions in total and then checks whether the appropriately sized subblock of total matches sub.
function loc = matrixFind(total, sub)
%matrixFind find position of array in another array
% initialize result
loc = [];
% pre-check: do all elements of sub exist in total?
elements_in_both = intersect(sub(:), total(:));
if numel(elements_in_both) < numel(unique(sub))
% if not, return nothing
return
end
% select a pivot element
% Improvement: use least common element in total for less iterations
pivot_element = sub(1);
% determine linear index of all occurrences of pivot_element in total
starting_positions = find(total == pivot_element);
% prepare cell arrays for variable length subscript vectors
[subscripts, subscript_ranges] = deal(cell([1, ndims(total)]));
for k = 1:length(starting_positions)
% fill subscript vector for starting position
[subscripts{:}] = ind2sub(size(total), starting_positions(k));
% add offsets according to size of sub per dimension
for m = 1:length(subscripts)
subscript_ranges{m} = subscripts{m}:subscripts{m} + size(sub, m) - 1;
end
% is subblock of total equal to sub
if isequal(total(subscript_ranges{:}), sub)
loc = [loc; cell2mat(subscripts)]; %#ok<AGROW>
end
end
end

This is based on doing all possible shifts of the original matrix total and comparing the upper-leftmost-etc sub-matrix of the shifted total with the sought pattern sub. Shifts are generated using strings, and are applied using circshift.
Most of the work is done vectorized. Only one level of loops is used.
The function finds all matchings, not just the first. For example:
>> total = ones(3,4,5,6);
>> sub = ones(3,3,5,6);
>> matrixFind(total, sub)
ans =
1 1 1 1
1 2 1 1
Here is the function:
function sol = matrixFind(total, sub)
nd = ndims(total);
sizt = size(total).';
max_sizt = max(sizt);
sizs = [ size(sub) ones(1,nd-ndims(sub)) ].'; % in case there are
% trailing singletons
if any(sizs>sizt)
error('Incorrect dimensions')
end
allowed_shift = (sizt-sizs);
max_allowed_shift = max(allowed_shift);
if max_allowed_shift>0
shifts = dec2base(0:(max_allowed_shift+1)^nd-1,max_allowed_shift+1).'-'0';
filter = all(bsxfun(@le,shifts,allowed_shift));
shifts = shifts(:,filter); % possible shifts of matrix "total", along
% all dimensions
else
shifts = zeros(nd,1);
end
for dim = 1:nd
d{dim} = 1:sizt(dim); % vectors with subindices per dimension
end
g = cell(1,nd);
[g{:}] = ndgrid(d{:}); % grid of subindices per dimension
gc = cat(nd+1,g{:}); % concatenated grid
accept = repmat(permute(sizs,[2:nd+1 1]), [sizt; 1]); % acceptable values
% of subindices in order to compare with matrix "sub"
ind_filter = find(all(gc<=accept,nd+1));
sol = [];
for shift = shifts
total_shifted = circshift(total,-shift);
if all(total_shifted(ind_filter)==sub(:))
sol = [ sol; shift.'+1 ];
end
end
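To make the shift-generation step concrete, here is a small sketch for a hypothetical 2-D case in which the first dimension may shift by at most 1 and the second by at most 2 (these values, and the names candidates and keep, are assumptions chosen for illustration). Note that subtracting '0' from the output of dec2base only yields correct digits while the largest allowed shift is at most 9, because larger bases use letters.

```matlab
% Illustrative values only: 2 dimensions, allowed shifts of 1 and 2.
nd = 2;
allowed_shift = [1; 2];
m = max(allowed_shift);

% Count from 0 to (m+1)^nd-1 in base m+1; each column is one digit vector.
candidates = dec2base(0:(m+1)^nd-1, m+1).' - '0';

% Keep only the digit vectors that respect the per-dimension limit.
keep = all(bsxfun(@le, candidates, allowed_shift), 1);
shifts = candidates(:, keep);
disp(shifts)
```

Each remaining column is one shift vector to pass (negated) to circshift.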

For an arbitrary number of dimensions, you might try convn.
C = convn(total,reshape(sub(end:-1:1),size(sub)),'valid'); % flip dimensions of sub to be correlation
[~,indmax] = max(C(:));
% thanks to Eitan T for the next line
cc = cell(1,ndims(total)); [cc{:}] = ind2sub(size(C),indmax); subs = [cc{:}]
Thanks to Eitan T for the suggestion to use comma-separated lists for a generalized ind2sub.
Finally, you should test the result with isequal because this is not a normalized cross correlation, meaning that larger numbers in a local subregion will inflate the correlation value potentially giving false positives. If your total matrix is very inhomogeneous with regions of large values, you might need to search other maxima in C.
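One way to follow that advice is to walk through the candidates in order of decreasing correlation and accept the first one that passes isequal. The following is only a sketch of that idea, not part of the original answer; the variable names order, subsz and ranges are made up for illustration.

```matlab
total = rand(3,4,5);
sub = total(2:3,1:3,3:4);

C = convn(total, reshape(sub(end:-1:1), size(sub)), 'valid');
[~, order] = sort(C(:), 'descend');      % strongest correlation first

subsz = size(sub);
subsz(end+1:ndims(total)) = 1;           % pad trailing singleton dimensions

loc = [];
cc = cell(1, ndims(total));
for k = order.'
    [cc{:}] = ind2sub(size(C), k);       % candidate position as subscripts
    ranges = arrayfun(@(s, n) s:s+n-1, [cc{:}], subsz, 'UniformOutput', false);
    if isequal(total(ranges{:}), sub)    % exact check rules out false positives
        loc = [cc{:}];
        break
    end
end
disp(loc)
```

In the worst case this checks every position, so it is no faster than brute force, but when the true match has a high correlation it tends to stop early.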

Related

Generate a random sparse matrix with N non-zero-elements

I've written a function that generates a sparse matrix of size nxd
and puts in each column 2 non-zero values.
function [M] = generateSparse(n,d)
M = sparse(d,n);
sz = size(M);
nnzs = 2;
val = ceil(rand(nnzs,n));
inds = zeros(nnzs,d);
for i=1:n
ind = randperm(d,nnzs);
inds(:,i) = ind;
end
points = (1:n);
nnzInds = zeros(nnzs,d);
for i=1:nnzs
nnzInd = sub2ind(sz, inds(i,:), points);
nnzInds(i,:) = nnzInd;
end
M(nnzInds) = val;
end
However, I'd like to be able to give the function another parameter num-nnz which will make it choose randomly num-nnz cells and put there 1.
I can't use sprand as it requires a density, and I need the number of non-zero entries to be independent of the matrix size. A density, by definition, depends on the matrix size.
I am a bit confused on how to pick the indices and fill them... I did with a loop which is extremely costly and would appreciate help.
EDIT:
Everything has to be sparse. A big enough matrix will crash in memory if I don't do it in a sparse way.
You seem close!
You could pick num_nnz random (unique) integers between 1 and the number of elements in the matrix, then assign the value 1 to the elements at those linear indices.
To pick the random unique integers, use randperm. To get the number of elements in the matrix use numel.
M = sparse(d, n); % create dxn sparse matrix
num_nnz = 10; % number of non-zero elements
idx = randperm(numel(M), num_nnz); % get unique random indices
M(idx) = 1; % Assign 1 to those indices
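A variation on the same idea, offered as a sketch rather than as the answer's method: convert the linear indices to row/column subscripts and build the matrix in a single sparse call, which avoids indexed assignment into an existing sparse matrix. The sizes below are arbitrary example values.

```matlab
d = 1000;        % number of rows (example value)
n = 2000;        % number of columns (example value)
num_nnz = 10;    % number of non-zero elements

idx = randperm(d*n, num_nnz);     % unique random linear indices
[r, c] = ind2sub([d n], idx);     % convert to row/column subscripts
M = sparse(r, c, 1, d, n);        % build the d-by-n sparse matrix directly
```

Because the indices from randperm are unique, nnz(M) equals num_nnz.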

Find the consecutive positive and negative elements for the entire array using matlab

I have a channel from which I take around 1 million samples; it contains both positive and negative values. My intention is to find consecutive positive and negative values (they don't have to be the same), and once such a pair is found I can perform some operations on it. My code is given below. chA is the channel from which I take my input values. The code gives me only a single value, 43.2600, whereas it should give an array of numbers, since many samples are consecutive positive and negative values.
consider the array as [0,1,-3,4,5,6,7,8,9,-19]
for i = 1:1000000 % loops strats from 1 and ends at 1000000
if (chA(i)<0) && (chA((i+1) >0)) % if i = 1, i+1 = -3 <it satisfy the condition>
tan = ((chA(i+1))- chA(i)); %calculate it
deltaOfTime = tan/i; %store the value here in the vector deltaOfTime
end
now in the next iteration it should be able to find out the next consecutive positive and negative value which is 9,-19
I think this is what you are trying to do...
origVec=[0,1,-3,4,5,6,7,8,9,-19];
yTemp=origVec(:); %make a column vector
yTemp = [NaN; yTemp; NaN]; %NaN pad
iTemp = (1:numel(yTemp)).'; %Get index array
% keep only the first of any adjacent pairs of equal values (including NaN).
yFinite = ~isnan(yTemp);
iNeq = [true;((yTemp(1:end-1) ~= yTemp(2:end)) & ...
(yFinite(1:end-1) | yFinite(2:end)))];
iTemp = iTemp(iNeq);
% take the sign of the first sample derivative
s = sign(diff(yTemp(iTemp)));
% find local maxima
iMax = [false;diff(s)<0];
iPk = iTemp(iMax)-1;
pksAndFollowingIdx = [iPk.';iPk.'+1]; %get neighbouring +ve and -ve values
deltaOfTime = diff(origVec(pksAndFollowingIdx))./iPk.'; %take difference between consecutive positive and negative values
That is, if your original code was supposed to be something more like:
for i = 1:10-1 % loop through array???
if (origVec(i)>0) && (origVec(i+1) <0) % check neighbouring +ve THEN -ve values???
tan12 = ((origVec(i+1))- origVec(i)); %calculate difference???
deltaOfTime(i) = tan12/i % deltaOfTime, not sure how this is "delta of time"???
end
end
You should save each value you calculate rather than overwriting it on every iteration:
deltaOfTime = zeros(1,1000000);
for i = 1:1000000 % loop starts at 1 and ends at 1000000
if (chA(i)<0) && (chA(i+1)>0) % negative sample followed by a positive one
tan = ((chA(i+1))- chA(i)); %calculate it
deltaOfTime(i) = tan/i; %store the value here in the vector deltaOfTime
end
end
However, there are better ways to calculate the transitions: you would not need to loop through your signal or pre-allocate a large deltaOfTime vector.
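A loop-free version could look like the sketch below. It assumes, as in the question's code, that a transition means a negative sample immediately followed by a positive one and that the difference is divided by the sample index; the example vector is the one given in the question.

```matlab
chA = [0, 1, -3, 4, 5, 6, 7, 8, 9, -19];

% Indices i where chA(i) is negative and chA(i+1) is positive.
i = find(chA(1:end-1) < 0 & chA(2:end) > 0);

% One value per transition, computed for all transitions at once.
deltaOfTime = (chA(i+1) - chA(i)) ./ i;
```

For this vector the only such transition is from -3 to 4 at index 3. Swapping the two comparisons finds positive-to-negative transitions instead.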
Here is a way to avoid pre-allocating. However, it might be slower because the array grows inside the loop:
deltaOfTime = [];
for i = 1:1000000 % loop starts at 1 and ends at 1000000
if (chA(i)<0) && (chA(i+1)>0) % negative sample followed by a positive one
tan = ((chA(i+1))- chA(i)); %calculate it
deltaOfTime = cat(2,deltaOfTime,tan/i); %append the value to the vector deltaOfTime
end
end
Another try to correct all the bugs in the code:
deltaOfTime = [];
for i = 1:length(chA)-1
if (chA(i)<0) && (chA(i+1)>0)
temp = ((chA(i+1))- chA(i));
deltaOfTime = cat(2,deltaOfTime,temp/i);
end
end
Fixed the if-statement, as well as the loop bound, which would give you an error if your array is exactly 1 million long.
Note: avoid using variable names of existing function e.g. tan.
Note2: Are you sure that you do not want both the definition of tan and deltaOfTime to be inside the if-statement?

Index Exceeds Matrix Dimensions Error

I'm currently working on creating a histogram of the altitudes at which a type of atmospheric instability happens. To be specific, it occurs when the value of what we call N^2 is less than zero. This is where the problem comes in. I am trying to plot the occurrence frequency against the altitudes.
load /data/matlabst/DavidBloom/N_square_Ri_number_2005.mat
N_square(N_square > 0) = 0;
N_square = abs(N_square);
k = (1:87);
H = 7.5;
p0 = 101325;
nbins = (500);
N_square(N_square==0)=[];
Alt = zeros(1,578594);
PresNew = squeeze(N_square(:,:,k,:));
for lati = 1:32
for long = 1:64
for t = 1:1460
for k = 1:87
Alt(1,:) = -log((PresNew)/p0)*H;
end
end
end
end
So, let me explain what I am doing. I'm loading a file with all these different variables (the linked image shows the variables it contains). Next, I take the 4-D matrix N_square and set all values greater than zero to 0. Then I take the absolute value of the leftover negative values. I then define several variables and move on to the next filtering.
N_square(N_square==0)=[];
The goal of this one was to discard all values of N_square that were 0. I think this is where the problem begins. Jumping down to the for loop, I am then taking the 3rd dimension of N_square and converting pressure to altitude.
My concern is that when I run this, PresNew = squeeze(N_square(:,:,k,:)); is giving me the error.
Error in PlottingN_2 (line 10)
PresNew = squeeze(N_square(:,:,k,:));
And I have no idea why.
Any thoughts or suggestions on how I could avoid this catastrophe and make my code simpler? Thanks.
When you remove random elements from a multi-dimensional array, they are removed but it can no longer be a valid multi-dimensional array because it has holes in it. Because of this, MATLAB will collapse the result into a vector, and you can't index into the third dimension of a vector like you're trying.
data = magic(3);
% 8 1 6
% 3 5 7
% 4 9 2
% Remove all values < 2
data(data < 2) = []
% 8 3 4 5 9 6 7 2
data(2,3)
% Index exceeds matrix dimensions.
The solution is to remove the 0 values after your indexing (i.e. within your loop).
Alt = zeros(1,578594);
for lati = 1:32
for long = 1:64
for t = 1:1460
for k = 1:87
% Index into 4D matrix
PresNew = N_square(:,:,k,:);
% NOW remove the 0 values
PresNew(PresNew == 0) = [];
Alt(1,:) = -log((PresNew)/p0)*H;
end
end
end
end
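If the array's shape has to survive the filtering step, an alternative (not from the answer above) is to mark unwanted values as NaN instead of deleting them. This is only a sketch on the same magic(3) example; whether NaN placeholders are acceptable in the later altitude calculation is an assumption you would need to check.

```matlab
data = magic(3);
%   8 1 6
%   3 5 7
%   4 9 2

% Mark values < 2 as NaN instead of deleting them.
data(data < 2) = NaN;

sz = size(data);     % still [3 3], so 2-D indexing keeps working
val = data(2,3);     % element in row 2, column 3
```

The placeholders can then be skipped where needed, for example with data(~isnan(data)).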

Vectorize MATLAB code

Let's say we have three m-by-n matrices of equal size: A, B, C.
Every column in C represents a time series.
A is the running maximum (over a fixed window length) of each time series in C.
B is the running minimum (over a fixed window length) of each time series in C.
Is there a way to determine T in a vectorized way?
[nrows, ncols] = size(A);
T = zeros(nrows, ncols);
for row = 2:nrows %loop over the rows (except row #1).
for col = 1:ncols %loop over the columns.
if C(row, col) > A(row-1, col)
T(row, col) = 1;
elseif C(row, col) < B(row-1, col)
T(row, col) = -1;
else
T(row, col) = T(row-1, col);
end
end
end
This is what I've come up with so far:
T = zeros(m, n);
T(C > circshift(A,1)) = 1;
T(C < circshift(B,1)) = -1;
Well, the trouble was the dependency with the ELSE part of the conditional statement. So, after a long mental work-out, here's a way I summed up to vectorize the hell-outta everything.
Now, this approach is based on mapping. We get column-wise runs or islands of 1s corresponding to the 2D mask for the ELSE part and assign them the same tags. Then, we go to the start-1 along each column of each such run and store that value. Finally, indexing into each such start-1 with those tagged numbers, which would work as mapping indices would give us all the elements that are to be set in the new output.
Here's the implementation to fulfill all those aspirations -
%// Store sizes
[nrows,n1] = size(A); %// nrows is used by the masks below
%// Masks corresponding to three conditions
mask1 = C(2:nrows,:) > A(1:nrows-1,:);
mask2 = C(2:nrows,:) < B(1:nrows-1,:);
mask3 = ~(mask1 | mask2);
%// All but mask3 set values as output
out = [zeros(1,n1) ; mask1 + (-1*(~mask1 & mask2))];
%// Proceed if any element in mask3 is set
if any(mask3(:))
%// Row vectors for appending onto matrices for matching up sizes
mask_appd = false(1,n1);
row_appd = zeros(1,n1);
%// Get 2D mapped indices
df = diff([mask_appd ; mask3],[],1)==1;
cdf = cumsum(df,1);
offset = cumsum([0 max(cdf(:,1:end-1),[],1)]);
map_idx = bsxfun(@plus,cdf,offset);
map_idx(map_idx==0) = 1;
%// Extract the values to be used for setting into new places
A1 = out([df ; false(1,n1)]);
%// Map with the indices obtained earlier and set at places from mask3
newval = [row_appd ; A1(map_idx)];
mask3_appd = [mask_appd ; mask3];
out(mask3_appd) = newval(mask3_appd);
end
Doing this vectorized is rather difficult because the current row's output depends on the previous row's output. Doing vectorized operations usually means that each element should stand out on its own using some relationship that is independent of the other elements that surround it.
I don't have any input on how you would achieve this without a for loop but I can help you reduce your operations down to one instead of two. You can do the assignment vectorized per row, but I can't see how you'd do it all in one shot.
As such, try something like this instead:
[nrows, ncols] = size(A);
T = zeros(nrows, ncols);
for row = 2:nrows
out = T(row-1,:); %// Change - Make a copy of the previous row
out(C(row,:) > A(row-1,:)) = 1; %// Set those elements of C
%// in the current row that are larger
%// than the previous row of A to 1
out(C(row,:) < B(row-1,:)) = -1; %// Same logic but for B now and it's
%// less than and the value is -1 instead
T(row,:) = out; %// Assign to the output
end
I'm currently figuring out how to do this without any loops whatsoever. I'll keep you posted.
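One loop-free possibility, offered as a sketch and not taken from either answer, is to treat the problem as a forward fill: record +1 or -1 wherever one of the two conditions fires, then let every other row copy the most recent such event in its column. The sample data, the window length of 5, and the names S, up, down, rowIdx and last are assumptions for illustration; cummax requires R2014b or later, and movmax/movmin (used only to build sample data) require R2016a or later.

```matlab
% Sample data (assumed): random walks with running max/min over 5 samples.
rng(0);
C = cumsum(randn(50, 4));
A = movmax(C, [4 0]);
B = movmin(C, [4 0]);

[nrows, ncols] = size(C);

% Events: +1 where C exceeds the previous row of A, -1 where it falls
% below the previous row of B (the +1 test takes priority, as in the loop).
up   = [false(1, ncols); C(2:end,:) > A(1:end-1,:)];
down = [false(1, ncols); C(2:end,:) < B(1:end-1,:)] & ~up;
S = zeros(nrows, ncols);
S(up) = 1;
S(down) = -1;

% For every element, find the row of the most recent event in its column.
% Elements with no event so far point at row 1, where S is 0.
rowIdx = repmat((1:nrows).', 1, ncols);
rowIdx(S == 0) = 1;
last = cummax(rowIdx, 1);

% Forward fill: copy the value of the most recent event.
T = S(sub2ind([nrows ncols], last, repmat(1:ncols, nrows, 1)));
```

The result should match the double loop from the question, since each row without an event simply inherits the last value set above it.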

Apply function to all rows

I have a function, ranker, that takes a vector and assigns numerical ranks to it in ascending order. For example,
ranker([5 1 3 600]) = [3 1 2 4] or
ranker([42 300 42 42 1 42]) = [3.5 6 3.5 3.5 1 3.5] .
I am using a matrix, variable_data, and I want to apply the ranker function to each row of variable_data. This is my current solution, but I feel there is a way to vectorize it and have it be equally fast :p
variable_ranks = nan(size(variable_data));
for i=1:1:numel(nmac_ids)
variable_ranks(i,:) = ranker(abs(variable_data(i,:)));
end
If you place the matrix rows into a cell array, you can then apply a function to each cell.
Consider this simple example of applying the SORT function to each row
a = rand(10,3);
b = cell2mat( cellfun(@sort, num2cell(a,2), 'UniformOutput',false) );
%# same as: b = sort(a,2);
You can even do this:
b = cell2mat( arrayfun(@(i) sort(a(i,:)), 1:size(a,1), 'UniformOutput',false)' );
Again, your version with the for loop is probably faster.
With collaboration from Amro and Jonas
variable_ranks = tiedrank(variable_data')';
Ranker has been replaced by the Matlab function in the Stat toolbox (sorry for those who don't have it),
[R,TIEADJ] = tiedrank(X) computes the ranks of the values in the vector X. If any X values are tied, tiedrank computes their average rank. The return value TIEADJ is an adjustment for ties required by the nonparametric tests signrank and ranksum, and for the computation of Spearman's rank correlation.
TIEDRANK will compute along columns in Matlab 7.9.0 (R2009b); however, this is undocumented. So by transposing the input matrix, rows turn into columns and tiedrank ranks them. The second transpose then restores the orientation of the input. In essence, this is a very classy hack :p
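For readers without the Statistics toolbox, the average-rank behaviour described above can be approximated per row with unique and accumarray. This is only a sketch under the assumption that ranker assigns tied values the mean of the ranks they span, as in the question's examples; it still loops over rows, and the sample matrix is made up.

```matlab
variable_data = [5 1 3 600 7 8; 42 300 42 42 1 42];   % example input

variable_ranks = nan(size(variable_data));
for i = 1:size(variable_data, 1)
    % ic maps each element to its position among the sorted unique values.
    [~, ~, ic] = unique(variable_data(i, :));
    ic = ic(:);

    % counts(k) = how many elements share the k-th unique value.
    counts = accumarray(ic, 1);

    % Average rank of each group of ties: last rank minus half the span.
    avgRank = cumsum(counts) - (counts - 1) / 2;

    variable_ranks(i, :) = avgRank(ic).';
end
disp(variable_ranks)
```

To mirror the question's loop exactly, pass abs(variable_data(i, :)) to unique instead.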
One way would be to rewrite ranker to take array input
sizeData = size(variable_data);
[sortedData,almostRanks] = sort(abs(variable_data),2);
[rowIdx,colIdx] = ndgrid(1:sizeData(1),1:sizeData(2));
linIdx = sub2ind(sizeData,rowIdx,almostRanks);
variable_ranks = variable_data;
variable_ranks(linIdx) = colIdx;
%# break ties by finding subsequent equal entries in sorted data
[rr,cc] = find(diff(sortedData,1,2) == 0);
ii = sub2ind(sizeData,rr,cc);
ii2 = sub2ind(sizeData,rr,cc+1);
ii = sub2ind(sizeData,rr,almostRanks(ii));
ii2 = sub2ind(sizeData,rr,almostRanks(ii2));
variable_ranks(ii) = variable_ranks(ii2);
EDIT
Instead, you can just use TIEDRANK from TMW (thanks, @Amro):
variable_rank = tiedrank(variable_data')';
I wrote a function that does this, it's on the FileExchange tiedrank_(X,dim). And it looks like this...
%[Step 0a]: force dim to be 1, and compress everything else into a single
%dimension. We will reverse this process at the end.
if dim > 1
otherDims = 1:length(size(X));
otherDims(dim) = [];
perm = [dim otherDims];
X = permute(X,perm);
end
originalSiz = size(X);
X = reshape(X,originalSiz(1),[]);
siz = size(X);
%[Step 1]: sort and get sorting indices
[X,Ind] = sort(X,1);
%[Step 2]: create matrix [D], which has +1 at the start of consecutive runs
% and -1 at the end, with zeros elsewhere.
D = zeros(siz,'int8');
D(2:end-1,:) = diff(X(1:end-1,:) == X(2:end,:));
D(1,:) = X(1,:) == X(2,:);
D(end,:) = -( X(end,:) == X(end-1,:) );
clear X
%[Step 3]: calculate the averaged rank for each consecutive run
[a,~] = find(D);
a = reshape(a,2,[]);
h = sum(a,1)/2;
%[Step 4]: insert the troublesome ranks in the relevant places
L = zeros(siz);
L(D==1) = h;
L(D==-1) = -h;
L = cumsum(L);
L(D==-1) = h; %cumsum set these ranks to zero, but we wanted them to be h
clear D h
%[Step 5]: insert the simple ranks (i.e. the ones that didn't clash)
[L(~L),~] = find(~L);
%[Step 6]: assign the ranks to the relevant position in the matrix
Ind = bsxfun(@plus,Ind,(0:siz(2)-1)*siz(1)); %equivalent to using sub2ind + repmat
r(Ind) = L;
%[Step 0b]: As promised, we reinstate the correct dimensional shape and order
r = reshape(r,originalSiz);
if dim > 1
r = ipermute(r,perm);
end
I hope that helps someone.