Finding similar rows in MATLAB

I have a matrix with a large number of rows. I have another matrix that I will loop through one row at a time. For each row in the second matrix, I need to look for similar rows in the first matrix. Once all the similar rows are found, I need to know the row numbers of the similar rows. These rows will almost never be exact, so ismember does not work.
Also, the solution would preferably (not necessarily, however) give some way to set a level of similarity that would trigger the code to say it is similar and give me the row number.
Is there any way to do this? I've looked around, and I can't find anything.

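You could use cosine similarity, which is the cosine of the angle between two vectors. Vectors that point in similar directions (in your case, a row and your comparison vector) give a value close to 1, orthogonal vectors give 0, and opposite vectors give -1. Note that it compares direction only, so two rows that differ just by a positive scale factor count as identical.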
function d = cosSimilarity(u, v)
d = dot(u,v)/(norm(u)*norm(v));
end
To apply this function to all pairs of rows in the matrices M and V you could use nested for loops. Hardly the most elegant, but it will work:
numRowsM = size(M, 1);
numRowsV = size(V, 1);
similarThresh = 0.9;
for m = 1:numRowsM
    for v = 1:numRowsV
        similarity = cosSimilarity(V(v,:), M(m,:));
        % Notify about similar rows, reporting both row numbers
        if similarity > similarThresh
            disp(['Row ' num2str(m) ' of M is similar to row ' num2str(v) ' of V'])
        end
    end
end
Instead of nested for loops, there are definitely other ways. You could start by looking at the solution from this question, which will help you avoid the loop by converting the rows of the matrix into cells of a cell array and then applying the function with cellfun.
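One loop-free alternative (not the cellfun approach mentioned above) is to scale every row to unit length and take a single matrix product. The sketch below assumes small made-up matrices for M and V and that no row is all zeros, since a zero row would produce NaN; the names Mn, Vn, S, vIdx and mIdx are illustrative only.

```matlab
% Hypothetical example data; M and V play the same roles as in the loop above.
M = [1 0; 0 1; 1 1];
V = [2 0.1; 0 3];
similarThresh = 0.9;

% Scale every row to unit length (bsxfun keeps this working before R2016b).
Mn = bsxfun(@rdivide, M, sqrt(sum(M.^2, 2)));
Vn = bsxfun(@rdivide, V, sqrt(sum(V.^2, 2)));

% S(v, m) is the cosine similarity between V(v,:) and M(m,:).
S = Vn * Mn.';

% Row numbers of every pair whose similarity exceeds the threshold.
[vIdx, mIdx] = find(S > similarThresh);
```

Each pair (vIdx(k), mIdx(k)) names a row of V and a row of M that are similar. S has size(V,1)-by-size(M,1) entries, so for very large matrices it may be worth processing V in blocks.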

Related

How to save matrices from for loop into another matrix

I have a 5-by-200 matrix where the i:50:200, i=1:50 are related to each other, so for example the matrix columns 1,51,101,151 are related to each other, and columns 49,99,149,199 are also related to each other.
I want to use a for-loop to create another matrix that re-sorts the previous matrix based on this relationship.
My code is
values=zeros(5,200);
for j=1:50
for m=1:4:200
a=factor_mat(:,j:50:200)
values(:,m)=a
end
end
However, the code does not work.
Here's what's happening. Let's say we're on the first iteration of the outer loop, so j == 1. This effectively gives you:
j = 1;
for m=1:4:200
a=factor_mat(:,j:50:200)
values(:,m)=a;
end
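So you're creating the same submatrix for a (j doesn't change) 50 times and storing it at different places in the values matrix. This isn't really what you want to do. On top of that, values(:,m) is a single column while a has four columns, so the assignment fails with a dimension mismatch error.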
To create each 4-column submatrix once and store them in 50 different places, you need to use j to tell you which of the 50 you're currently processing:
for j=1:50
a=factor_mat(:,j:50:200);
m=j*4; %// This gives us the **end** of the current range
values(:,m-3:m)=a;
end
I've used a little trick here, because the indices of Matlab arrays start at 1 rather than 0. I've calculated the index of the last column we want to insert. For the first group, this is column 4. Since j == 1, j * 4 == 4. Then I subtract 3 to find the first column index.
That will fix the problem you have with your loops. But loops aren't very Matlab-ish. They used to be very slow; now they're adequate. But they're still not the cool way to do things.
To do this without loops, you can use reshape and permute:
a=reshape(factor_mat,[],50,4);
b=permute(a,[1,3,2]);
values=reshape(b,[],200);
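As a quick sanity check, the loop and the reshape/permute version can be compared on test data. This is only a sketch: factor_mat is filled with made-up distinct values so that any misplaced column would show up, and the names valuesLoop, valuesVec, sameResult and firstGroup are illustrative.

```matlab
% Hypothetical test data: every entry is distinct.
factor_mat = reshape(1:1000, 5, 200);

% Loop version: group j collects columns j, j+50, j+100, j+150.
valuesLoop = zeros(5, 200);
for j = 1:50
    valuesLoop(:, 4*j-3:4*j) = factor_mat(:, j:50:200);
end

% Loop-free version: split into 4 blocks of 50 columns, then interleave them.
valuesVec = reshape(permute(reshape(factor_mat, [], 50, 4), [1, 3, 2]), [], 200);

sameResult = isequal(valuesLoop, valuesVec);
firstGroup = isequal(valuesVec(:, 1:4), factor_mat(:, [1 51 101 151]));
```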

for or while loops in matlab

I've just started using for loops in MATLAB in programming class and the basic stuff is going fine. However, I've been asked to "Use loops to create a 3 x 5 matrix in which the value of each element is its row number to the power of its column number divided by the sum of its row number and column number", for example the value of element (2,3) is 2^3 / (2+3) = 1.6.
So what sort of looping do I need to use to enable me to start new lines to form a matrix?
Since you need to know the row and column numbers (and only because you have to use loops), for-loops are a natural choice. This is because a for-loop will automatically keep track of your row and column number for you if you set it up right. More specifically, you want a nested for loop, i.e. one for loop within another. The outer loop might loop through the rows and the inner loop through the columns for example.
As for starting new lines in a matrix: growing a matrix inside a loop is bad practice. You should pre-allocate your matrix instead, which can noticeably improve performance for large matrices. Pre-allocation is most commonly done using the zeros function.
e.g.
num_rows = 3;
num_cols = 5;
M = zeros(num_rows,num_cols); %// Preallocation of memory so you don't grow your matrix in your loop
for row = 1:num_rows
for col = 1:num_cols
M(row,col) = (row^col)/(row+col);
end
end
But the most efficient way to do it is probably not to use loops at all but do it in one shot using ndgrid:
[R, C] = ndgrid(1:num_rows, 1:num_cols);
M = (R.^C)./(R+C);
The command bsxfun is very helpful for such problems. It will do all the looping and preallocation for you.
e.g.:
bsxfun(@(x,y) x.^y./(x+y), (1:3)', 1:5)
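On MATLAB R2016b or later, implicit expansion makes the same computation possible with plain operators. A minimal sketch, assuming such a release; rows and cols are illustrative names:

```matlab
% Requires R2016b or later (implicit expansion of a column against a row).
rows = (1:3).';   % column vector of row numbers
cols = 1:5;       % row vector of column numbers

% Element (r, c) is r^c / (r + c); the two vectors expand to a 3-by-5 grid.
M = (rows .^ cols) ./ (rows + cols);
```

On older releases the bsxfun or ndgrid versions remain the way to go.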

Determining if any duplicate rows in two matrices in MatLab

Introduction to problem:
I'm modelling a system where I have a matrix X=([0,0,0];[0,1,0],...) where each row represents a point in 3D space. I then choose a random row, r, take all following rows and rotate them around the point represented by r, and make a new matrix from these rows, X_rot. I now want to check whether any of the rows of X_rot is equal to any of the rows of X (i.e. two vertices on top of each other), and if that is the case refuse the rotation and try again.
Actual question:
Until now i have used the following code:
X_sim=[X;X_rot];
if numel(unique(X_sim,'rows'))==numel(X_sim);
X(r+1:N+1,:,:)=X_rot;
end
This works, but it takes up over 50% of my running time, and I was wondering whether anybody here knows a more efficient way to do it, since I don't need all the information that I get from unique.
P.S. if it matters then i typically have between 100 and 1000 rows in X.
Best regards,
Morten
Additional:
My x-matrix contains N+1 rows and i have 12 different rotational operations that i can apply to the sub-matrix x_rot:
step=ceil(rand()*N);
r=ceil(rand()*12);
x_rot=x(step+1:N+1,:);
x_rot=bsxfun(@minus,x_rot,x(step,:));
x_rot=x_rot*Rot(:,:,:,r);
x_rot=bsxfun(@plus,x_rot,x(step,:));
Two possible approaches (I don't know if they are faster than using unique):
Use pdist2 (requires the Statistics Toolbox):
d = pdist2(X, X_rot, 'hamming'); %// 0 if rows are equal, greater than 0 if they differ in any coordinate.
%// Any distance function will do, so try those available and choose fastest
result = any(d(:)==0);
Use bsxfun:
d = squeeze(any(bsxfun(@ne, X, permute(X_rot, [3 2 1])), 2));
result = any(d(:)==0);
result is 1 if there is a row of X equal to some row of X_rot, and 0 otherwise.
How about ismember(X_rot, X, 'rows')?
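To turn that suggestion into the accept/reject test from the question, the logical vector returned by ismember can be collapsed with any. A minimal sketch on made-up vertices; hit and overlap are illustrative names, and the comparison is exact, so it assumes the rotated coordinates carry no floating-point rounding error:

```matlab
X     = [0 0 0; 0 1 0; 1 1 0];   % hypothetical existing vertices
X_rot = [1 0 0; 0 1 0];          % second row coincides with X(2,:)

% One logical entry per row of X_rot: true if that row also occurs in X.
hit = ismember(X_rot, X, 'rows');

% True if at least one rotated vertex lands on an existing vertex.
overlap = any(hit);
```

If overlap is true the rotation would be refused; otherwise X_rot can be written back into X.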

Assigning the different row to another matrix after comparing two matrices

i have two matrices
r=10,000x2
q=10,000x2
I have to find the rows of q in which one or both values (it is a two-column matrix) differ from r, and put them in another matrix. This is what I am trying right now. I cannot use isequal because I want to know which rows are not equal, and this code gives me the individual elements that differ rather than the complete rows.
Can anyone help, please?
if r(:,:)~=q(:,:)
IN= find(registeredPts(:,:)~=q(:,:))
end
You can probably do this using ismember. Is this what you want? Here you get the values from q in rows that are different from r.
q=[1,2;3,4;5,6]
r=[1,2;3,5;5,6]
x = q(sum(ismember(q,r),2) < 2,:)
x =
3 4
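What this does:
ismember(q, r) creates an array with 1's for the elements of q that occur somewhere in r, and 0's in the remaining positions. sum(.., 2) adds up the columns within each row. If the sum is less than 2, that row is included in the new array. Be aware that ismember tests membership anywhere in r rather than at the same position, so a value that merely appears in another row of r still counts as a match.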
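If the intent is to compare each row of q with the row of r at the same position, a direct element-wise comparison avoids false matches. The sketch below uses made-up data chosen so that the two approaches disagree; x and viaIsmember are illustrative names:

```matlab
q = [1 2; 3 4; 5 6];
r = [1 2; 3 5; 4 6];   % hypothetical data: rows 2 and 3 differ from q

% Position-wise test: a row is reported if any of its elements differs.
x = q(any(q ~= r, 2), :);

% The element-wise ismember test finds nothing here, because 4 and 5
% both occur somewhere in r even though not at the same position.
viaIsmember = q(sum(ismember(q, r), 2) < 2, :);
```

Here x contains the rows [3 4] and [5 6], while viaIsmember is empty.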
Update
If the values might differ some due to floating point arithmetic, check out ismemberf from the file exchange. I haven't tested it myself, but it looks good.
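Newer MATLAB releases (R2015a and later) also ship a built-in ismembertol, which can compare whole rows within a tolerance. A minimal sketch on made-up data, assuming such a release; tol and found are illustrative names, and 'DataScale', 1 is used so that tol acts as an absolute tolerance:

```matlab
q = [1 2; 3 4];
r = [1 + 1e-9, 2; 3, 4.5];   % hypothetical data: row 1 matches up to rounding
tol = 1e-6;                  % assumed tolerance, adjust to your data

% found(k) is true if row k of q is within tol of some row of r.
found = ismembertol(q, r, tol, 'ByRows', true, 'DataScale', 1);

% Rows of q that have no approximate match in r.
x = q(~found, :);
```

Like ismember with 'rows', this checks whether a row occurs anywhere in r, not whether it matches the row at the same position.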

Compact MATLAB matrix indexing notation

I've got an n-by-k sized matrix, containing k numbers per row. I want to use these k numbers as indexes into a k-dimensional matrix. Is there any compact way of doing so in MATLAB or must I use a for loop?
This is what I want to do (in MATLAB pseudo code), but in a more MATLAB-ish way:
for row=1:1:n
finalTable(row) = kDimensionalMatrix(indexmatrix(row, 1),...
indexmatrix(row, 2),...,indexmatrix(row, k))
end
If you want to avoid having to use a for loop, this is probably the cleanest way to do it:
indexCell = num2cell(indexmatrix, 1);
linearIndexMatrix = sub2ind(size(kDimensionalMatrix), indexCell{:});
finalTable = kDimensionalMatrix(linearIndexMatrix);
The first line puts each column of indexmatrix into separate cells of a cell array using num2cell. This allows us to pass all k columns as a comma-separated list into sub2ind, a function that converts subscripted indices (row, column, etc.) into linear indices (each matrix element is numbered from 1 to N, N being the total number of elements in the matrix). The last line uses these linear indices to replace your for loop. A good discussion about matrix indexing (subscript, linear, and logical) can be found here.
Some more food for thought...
The tendency to shy away from for loops in favor of vectorized solutions is something many MATLAB users (myself included) have become accustomed to. However, newer versions of MATLAB handle looping much more efficiently. As discussed in this answer to another SO question, using for loops can sometimes result in faster-running code than you would get with a vectorized solution.
I'm certainly NOT saying you shouldn't try to vectorize your code anymore, only that every problem is unique. Vectorizing will often be more efficient, but not always. For your problem, the execution speed of for loops versus vectorized code will probably depend on how big the values n and k are.
To treat the elements of the vector indexmatrix(row, :) as separate subscripts, you need the elements as a cell array. So, you could do something like this
subsCell = num2cell( indexmatrix( row, : ) );
finalTable( row ) = kDimensionalMatrix( subsCell{:} );
To expand subsCell as a comma-separated-list, unfortunately you do need the two separate lines. However, this code is independent of k.
Convert your sub-indices into linear indices in a hacky way:
ksz = size(kDimensionalMatrix);
cksz = cumprod([ 1 ksz(1:end-1)] );
lidx = ( indexmatrix - 1 ) * cksz' + 1;
% lidx is now (n)x1 linear indices into kDimensionalMatrix, one index per row of indexmatrix
% access all n values:
selectedValues = kDimensionalMatrix( lidx );
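The cumprod arithmetic and sub2ind should give identical linear indices. A small check on a made-up 2-by-3-by-4 array (k = 3), with viaSub2ind and viaCumprod as illustrative names:

```matlab
% Hypothetical data: element values equal their own linear index.
kDimensionalMatrix = reshape(1:24, 2, 3, 4);
indexmatrix = [1 1 1; 2 3 4; 1 2 3];

% Approach 1: sub2ind with one cell per column of subscripts.
indexCell = num2cell(indexmatrix, 1);
viaSub2ind = sub2ind(size(kDimensionalMatrix), indexCell{:});

% Approach 2: manual strides, 1 + sum((sub - 1) .* stride) for each row.
ksz = size(kDimensionalMatrix);
cksz = cumprod([1 ksz(1:end-1)]);
viaCumprod = (indexmatrix - 1) * cksz.' + 1;

finalTable = kDimensionalMatrix(viaCumprod);
```

One practical difference: sub2ind raises an error for out-of-range subscripts, whereas the manual arithmetic silently produces an index that may point at the wrong element.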
Cheers!