Finding closest (not equal) rows in two matrices matlab - matlab

I have two matrices A(m X 3) and B(n X 3); where m >> n.
Numbers in B have close or equal values to numbers in A.
I want to find the rows of A that are closest to the rows of B, so that at the end of the search A is reduced to (n X 3).
There are two main issues:
Only a complete row from A can be compared to a complete row in B, where the numbers in each column of A and B vary independently.
Numbers in A and B may differ only in the third decimal place (e.g. 20.101 and 20.103).
I hope I am clear in asking my question.
Does anybody know about any function already present in matlab for this thing?

Depending on how you look at the task, here are two different approaches
Minimum Distance to Each Row in Second Matrix
Two ways to look at this: (1) closest point in A for each point in B, or (2) closest point in B for each point in A.
Closest point in A
For each point in B you can find the closest point in A (e.g. Euclidean distance), as requested in comments:
% Calculate all MxN high-dimensional (3D space) distances at once
distances = squeeze(sum(bsxfun(@minus,B,permute(A,[3 2 1])).^2,2));
% Find closest row in A for each point in B
[~,ik] = min(distances,[],2)
Make an array the size of B containing these closest points in A:
Anew = A(ik,:)
This will implicitly throw out any points in A that are too far from points in B, as long as each point in B does have a match in A. If each point in B does not necessarily have a "match" (point at an acceptable distance) in A, then it is necessary to actively reject points based on distances, resulting in an output that would be shorter than B. This solution seems out of scope.
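If some points in B might not have an acceptable match, one possible way to reject them is to threshold the minimum distance. The following is only a sketch: the tolerance maxDist and the tiny test matrices are hypothetical and not part of the original question.

```matlab
% Hypothetical data: the first row of B has a near match in A, the second does not
A = [20.101 5.000 1.000; 7.500 2.250 9.000; 3.000 3.000 3.000];
B = [20.103 5.001 1.000; 50.000 50.000 50.000];
maxDist = 0.01;   % assumed acceptable Euclidean distance

% n-by-m matrix of squared distances between rows of B and rows of A
sqDist = squeeze(sum(bsxfun(@minus, B, permute(A, [3 2 1])).^2, 2));

% Closest row of A for each row of B, plus the squared distance to it
[minSq, ik] = min(sqDist, [], 2);

% Keep only the matches that are close enough
keep = sqrt(minSq) <= maxDist;
Anew = A(ik(keep), :);
```

With this data only the first row of B finds a partner, so Anew has one row instead of two.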
Closest point in B
Compute the (squared) Euclidean distance from each point (row) in A to each point in B and identify the closest point in B:
distances = squeeze(sum(bsxfun(@minus,A,permute(B,[3 2 1])).^2,2));
[~,ik] = min(distances,[],2)
Make an array the size of A containing these closest points in B:
Anew = B(ik,:)
The size of Anew in this approach is the same as A.
Merging Similar Points in First Matrix
Another approach is to use the undocumented _mergesimpts function.
Consider this test data:
>> B = randi(5,4,3)
B =
1 4 4
2 3 4
1 3 4
3 4 5
>> tol = 0.001;
>> A = repmat(B,3,1) + tol * rand(size(B,1)*3,3)
A =
1.0004 4.0005 4.0000
2.0004 3.0005 4.0008
1.0004 3.0009 4.0002
3.0008 4.0005 5.0004
1.0006 4.0004 4.0007
2.0008 3.0007 4.0004
1.0009 3.0007 4.0007
3.0010 4.0005 5.0004
1.0002 4.0003 4.0007
2.0001 3.0001 4.0007
1.0007 3.0006 4.0004
3.0001 4.0003 5.0000
Merge similar rows in A according to a specified tolerance, tol:
>> builtin('_mergesimpts',A,tol,'average')
ans =
1.0004 4.0004 4.0005
1.0007 3.0007 4.0005
2.0005 3.0005 4.0006
3.0006 4.0004 5.0003
Merge similar rows, using B to get expected numbers
>> builtin('_mergesimpts',[A; B],tol,'first')
ans =
1 3 4
1 4 4
2 3 4
3 4 5

To replace each row of A by the closest row of B
You can use pdist2 to compute distance between rows, and then the second output of min to find the index of the minimum-distance row:
[~, ind] = min(pdist2(B,A,'euclidean')); %// or specify some other distance
result = B(ind,:);
The advantage of this approach is that pdist2 lets you specify other distance functions, or even define your own. For example, to use L1 distance, change the first line to
[~, ind] = min(pdist2(B,A,'cityblock'));
To retain rows of A which are closest to rows of B
Use pdist2 as above. For each row of A compute the minimum distance to rows of B. Retain the n rows of A with lowest value of that minimum distance:
[~, ii] = sort(min(pdist2(B,A,'euclidean'))); %// or use some other distance
result = A(ii(1:n),:);
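pdist2 belongs to the Statistics and Machine Learning Toolbox. If that toolbox is not available, the same "retain the n closest rows" idea can be sketched with bsxfun alone. The test matrices below are made up for illustration, and the final sort is an extra assumption that simply keeps the retained rows in their original order.

```matlab
% Hypothetical data: rows 1 and 3 of A are near rows of B, rows 2 and 4 are far away
A = [1.0004 4.0005 4.0000; 9 9 9; 2.0004 3.0005 4.0008; 8 1 7];
B = [1 4 4; 2 3 4];
n = size(B, 1);

% m-by-n matrix of Euclidean distances between rows of A and rows of B
diffs = bsxfun(@minus, permute(A, [1 3 2]), permute(B, [3 1 2]));  % m-by-n-by-3
D = sqrt(sum(diffs.^2, 3));

% For each row of A, the distance to its nearest row in B, sorted ascending
[~, order] = sort(min(D, [], 2));

% Retain the n rows of A with the smallest minimum distance
result = A(sort(order(1:n)), :);
```

Like the pdist2 version, this does not guarantee one retained row per row of B; it only keeps the n rows of A that lie nearest to anything in B.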

Try this code -
%% Create data
m=7;
n=4;
TOL = 0.0005;
A = rand(m,3)/100;
B = rand(n,3)/100;
B(2,:) = A(5,:); % For testing that the matching part of the second row from B must be the fifth row from A
%% Interesting part
B2 = repmat(reshape(B',1,3,n),[m 1]);
closeness_matrix = abs(bsxfun(@minus, A, B2));
closeness_matrix(closeness_matrix<TOL)=0;
closeness_matrix_mean = mean(closeness_matrix,2); % Assuming that the "closeness" for the triplets on each row can be measured by the mean value of them
X1 = squeeze(closeness_matrix_mean);
[minval,minind] = min(X1,[],1);
close_indices = minind';
A_final = A(close_indices,:)

Related

Comparing matrices of different size in matlab and storing values that are close

I have two matrices A and B. A(:,1) corresponds to an x-coordinate, A(:,2) corresponds to a y-coordinate, and A(:,3) corresponds to a certain radius. All three values in a row describe the same circle. Now let's say...
A =
[1,4,3]
[8,8,7]
[3,6,3]
B =
[1,3,3]
[1, 92,3]
[4,57,8]
[5,62,1]
[3,4,6]
[9,8,7]
What I need is to be able to loop through matrix A and determine if there are any rows in matrix B that are "similar" as in the x value is within a range (-2,2) of the x value of A (Likewise with the y-coordinate and radius).If it satisfies all three of these conditions, it will be added to a new matrix with the values that were in A. So for example I would need the above data to return...
ans =
[1,4,3]
[8,8,7]
Please help and thank you in advance to anyone willing to take the time!
You can use ismembertol.
result = A(ismembertol(A,B,2,'ByRows',1,'DataScale',1),:)
Manual method
A = [1,4,3;
8,8,7;
3,6,3];
B = [1,3,3;
1,92,3;
4,57,8;
5,62,1;
3,4,6;
9,8,7]; % example matrices
t = 2; % desired threshold
m = any(all(abs(bsxfun(@minus, A, permute(B, [3 2 1])))<=t, 2), 3);
result = A(m,:);
The key is using permute to move the first dimension of B to the third dimension. Then bsxfun computes the element-wise differences for all pairs of rows in the original matrices. A row of A should be selected if all of its absolute column differences with respect to any row of B are at most the desired threshold t. The resulting variable m is a logical index which is used for selecting those rows.
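To make the role of permute more concrete, the one-liner can be unrolled into its intermediate arrays. This is just the same computation split into steps, using the example matrices from the question; the names D and rowOK are introduced here for illustration.

```matlab
A = [1 4 3; 8 8 7; 3 6 3];
B = [1 3 3; 1 92 3; 4 57 8; 5 62 1; 3 4 6; 9 8 7];
t = 2;   % threshold

% permute(B,[3 2 1]) is 1-by-3-by-6, so D(i,:,k) holds |A(i,:) - B(k,:)|
D = abs(bsxfun(@minus, A, permute(B, [3 2 1])));   % 3-by-3-by-6

% rowOK(i,1,k) is true when every column of A(i,:) is within t of B(k,:)
rowOK = all(D <= t, 2);                            % 3-by-1-by-6

% A row of A is kept when it is close to at least one row of B
m = any(rowOK, 3);                                 % 3-by-1 logical
result = A(m, :);
```

For this data the first two rows of A are kept, which agrees with the output requested in the question.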
Using pdist2 (Statistics and Machine Learning Toolbox)
m = any(pdist2(A, B, 'chebychev')<=t, 2);
result = A(m,:);
The pdist2 function with the chebychev option computes the maximum coordinate difference (Chebychev distance, or L∞ metric) between pairs of rows.
With for loop
It should work:
A = [1,4,3;
8,8,7;
3,6,3]
B = [1,3,3;
1,92,3;
4,57,8;
5,62,1;
3,4,6;
9,8,7]
out = zeros(0, size(A,2)); % stays empty if no row of A matches
index = 1;
for i = 1:size(A,1)
C = abs(B - A(i,:)); % implicit expansion, needs R2016b or later
if any(max(C,[],2)<=2)
out(index,:) = A(i,:);
index = index + 1;
end
end
For each row of A, this computes the absolute difference between B and that row, then checks whether there exists a row of B for which the maximum difference is at most 2.
Without for loop
ind = any(max(abs(B - permute(A,[3 2 1])),[],2)<=2);
out = A(ind(:),:);

Shifting repeating rows to a new column in a matrix

I am working with a n x 1 matrix, A, that has repeating values inside it:
A = [0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4]
which correspond to an n x 1 matrix of B values:
B = [2;4;6;8;10; 3;5;7;9;11; 4;6;8;10;12; 5;7;9;11;13]
I am attempting to produce a generalised code to place each repetition into a separate column and store it into Aa and Bb, e.g.:
Aa = [0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4]
Bb = [2 3 4 5
4 5 6 7
6 7 8 9
8 9 10 11
10 11 12 13]
Essentially, each repetition from A and B needs to be copied into the next column and then deleted from the first column
So far I have managed to identify how many repetitions there are and copy the entire column over to the next column and then the next for the amount of repetitions there are but my method doesn't shift the matrix rows to columns as such.
clc;clf;close all
A = [0;1;2;3;4;0;1;2;3;4;0;1;2;3;4;0;1;2;3;4];
B = [2;4;6;8;10;3;5;7;9;11;4;6;8;10;12;5;7;9;11;13];
desiredCol = 1; %next column to go to
destinationCol = 0; %column to start on
n = length(A);
for i = 2:1:n-1
if A == 0;
A = [ A(:, 1:destinationCol)...
A(:, desiredCol+1:destinationCol)...
A(:, desiredCol)...
A(:, destinationCol+1:end) ];
end
end
A = [...] retrieved from Move a set of N-rows to another column in MATLAB
Any hints would be much appreciated. If you need further explanation, let me know!
Thanks!
Given our discussion in the comments, all you need is to use reshape which converts a matrix of known dimensions into an output matrix with specified dimensions provided that the number of elements match. You wish to transform a vector which has a set amount of repeating patterns into a matrix where each column has one of these repeating instances. reshape creates a matrix in column-major order where values are sampled column-wise and the matrix is populated this way. This is perfect for your situation.
Assuming that you already know how many "repeats" you're expecting, we call this An, you simply need to reshape your vector so that it has T = n / An rows where n is the length of the vector. Something like this will work.
n = numel(A); T = n / An;
Aa = reshape(A, T, []);
Bb = reshape(B, T, []);
The third parameter is a pair of empty brackets, which tells MATLAB to infer how many columns there will be given that there are T rows. Technically, this would simply be An columns, but it's nice to show you how flexible MATLAB can be.
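As a quick check, here is the reshape approach applied to the vectors from the question, assuming the number of repeats An = 4 is known in advance:

```matlab
A = [0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4];
B = [2;4;6;8;10; 3;5;7;9;11; 4;6;8;10;12; 5;7;9;11;13];
An = 4;                   % assumed known number of repeats

T = numel(A) / An;        % rows per column, 5 here
Aa = reshape(A, T, []);   % each repetition of A becomes one column
Bb = reshape(B, T, []);   % B is cut at the same positions
```

Because reshape fills column by column, the first five elements of each vector become the first column, the next five the second column, and so on.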
If you already know the repeated subvector and the number of times it repeats, then it is relatively straightforward:
First make your new A matrix with the repmat function.
Then remap your B vector to the same size as your new A matrix.
% Given that you already have the repeated subvector Asub, and the number
% of times it repeats; An:
Asub = [0;1;2;3;4];
An = 4;
lengthAsub = length(Asub);
Anew = repmat(Asub, [1,An]);
% If you can assume that the number of elements in B is equal to the number
% of elements in A:
numberColumns = size(Anew, 2);
newB = zeros(size(Anew));
for i = 1:numberColumns
indexStart = (i-1) * lengthAsub + 1;
indexEnd = indexStart + lengthAsub - 1; % each column holds lengthAsub elements
newB(:,i) = B(indexStart:indexEnd);
end
If you don't know what is in your original A vector, but you do know it is repetitive, if you assume that the pattern has no repeats you can use the find function to find when the first element is repeated:
lengthAsub = find(A(2:end) == A(1), 1);
Asub = A(1:lengthAsub);
An = length(A) / lengthAsub
Hopefully this fits in with your data: the only reason it would not is if your subvector within A is a pattern which does not have unique numbers, such as:
A = [0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0;]
It is worth noting that, intuitively, you might expect lengthAsub = find(A(2:end) == A(1), 1) - 1; but the subtraction is not necessary, because searching in A(2:end) already shifts the index by one.
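For patterns in which the first value recurs inside the pattern, one possible alternative is to test candidate lengths directly. This is a different technique from the find-based one above, sketched under the assumption that A consists of whole repetitions of a single pattern; it looks for the smallest length T whose tiling reproduces A.

```matlab
% The problematic vector from above: the first value (0) recurs inside the pattern
A = [0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0];
n = numel(A);

lengthAsub = n;   % fallback: treat the whole vector as one pattern
for T = 1:n-1
    % T is a candidate only if it divides n and tiling A(1:T) rebuilds A
    if mod(n, T) == 0 && isequal(A, repmat(A(1:T), n / T, 1))
        lengthAsub = T;
        break
    end
end
Asub = A(1:lengthAsub);
An = n / lengthAsub;
```

This costs more than a single find call, but it does not depend on the first element being unique within the pattern.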

Search for 1-D sequence in multidimensional array in Matlab

I have an array with n dimensions, and I have a sequence along one dimension at a certain location on all other dimensions. How do I find the location of this sequence? Preferably without loops.
I use MATLAB. I know what dimension it should be in, but the sequence isn't necessarily there. find and == don't work. I could make an N-D find function using cross-correlation, but I'm guessing this is already implemented and I just don't know what function to call.
example:
ND = rand(10,10,10,10);
V = ND(randi(10),randi(10),randi(10),:);
[I1, I2, I3] = find(ND==V);
Edit: The sequence to be found spans the entire dimension it is on; I did not mention this in my original formulation of the problem. Knedlsepp's solution solves exactly the problem I had, but Luis' solution solves a more general problem for when the sequence doesn't necessarily span the entire dimension.
As there are multiple ways to interpret your question, I will clarify: This approach assumes a 1D sequence of size: numel(V) == size(ND, dimToSearch). So, for V = [1,2] and ND = [1,2,1,2] it is not applicable. If you want this functionality go with Luis Mendo's answer, if not this will likely be faster.
This will be a perfect opportunity to use bsxfun:
We start with some example data:
ND = rand(10,10,10,10);
V = ND(3,2,:,3);
If you don't have the vector V given in the correct dimension (in this case [1,1,10,1]) you can reshape it in the following way:
dimToSearch = 3;
Vdims = ones(1, ndims(ND));
Vdims(dimToSearch) = numel(V);
V = reshape(V, Vdims);
Now we generate a cell that will hold the indices of the matches:
I = cell(1, ndims(ND));
At this point we compute the size of ND if it were collapsed along the dimension dimToSearch (we compute dimToSearch according to V, as at this point it will have the correct dimensions):
dimToSearch = find(size(V)>1);
collapsedDims = size(ND);
collapsedDims(dimToSearch) = 1;
Finally the part where we actually look for the pattern:
[I{:}] = ind2sub(collapsedDims, find(all(bsxfun(@eq, ND, V), dimToSearch)));
This is done in the following way: bsxfun(#eq, ND, V) will implicitly repmat the array V so it has the same dimensions as ND and do an equality comparison. After this we do a check with all to see if all the entries in the dimension dimToSearch are equal. The calls to find and ind2sub will then generate the correct indices to your data.
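Putting the steps together on a small deterministic array makes the result easy to verify by hand. The array below is made up for illustration; all of its entries are distinct, so only one thread can match.

```matlab
ND = reshape(1:24, [2 3 4]);   % hypothetical 2-by-3-by-4 array with distinct entries
dimToSearch = 2;
V = ND(2, :, 3);               % the thread along dimension 2 at position (2, :, 3)

% Orient V along dimToSearch
Vdims = ones(1, ndims(ND));
Vdims(dimToSearch) = numel(V);
V = reshape(V, Vdims);

% Size of ND once dimToSearch is collapsed to a singleton
collapsedDims = size(ND);
collapsedDims(dimToSearch) = 1;

% Compare every thread with V and convert the matches to subscripts
I = cell(1, ndims(ND));
[I{:}] = ind2sub(collapsedDims, find(all(bsxfun(@eq, ND, V), dimToSearch)));
```

Here I{1} and I{3} give the position of the thread in the other dimensions, while I{2} is always 1 because the searched dimension has been collapsed.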
Let d be the dimension along which to search. I'm assuming that the sought sequence V may be shorter than size(ND,d). So the sequence may appear once, more than once, or never along each dimension-d- "thread".
The following code uses num2cell to reshape ND into a cell array such that each dimension-d-thread is in a different cell. Then strfind is applied to each cell to determine matches with V, and the result is a cell array with the same dimensions as ND, but where the dimension d is a singleton. The contents of each cell tell the d-dimension-positions of the matches, if any.
Credit goes to @knedlsepp for his suggestion to use num2cell, which greatly simplified the code.
ND = cat(3, [1 2 1 2; 3 4 5 6],[2 1 0 5; 0 0 1 2] ); %// example. 2x4x2
V = 1:2; %// sought pattern. It doesn't matter if it's a row, or a column, or...
d = 2; %// dimension along which to search for pattern V
result = cellfun(@(x) strfind(x(:).', V(:).'), num2cell(ND,d), 'UniformOutput', 0);
This gives
ND(:,:,1) =
1 2 1 2
3 4 5 6
ND(:,:,2) =
2 1 0 5
0 0 1 2
V =
1 2
result{1,1,1} =
1 3 %// V appears twice (at cols 1 and 3) in 1st row, 1st slice
result{2,1,1} =
[] %// V doesn't appear in 2nd row, 1st slice
result{1,1,2} =
[] %// V doesn't appear in 1st row, 2nd slice
result{2,1,2} =
3 %// V appears once (at col 3) in 2nd row, 2nd slice
One not very optimal way of doing it:
dims = size(ND);
Vrep = repmat(V, [dims(1), dims(2), dims(3), 1]);
ND_V_dist = sqrt(sum((ND-Vrep).^2, 4)); % Euclidean distance along dim 4; comparing squares would ignore sign
iI = find(ND_V_dist==0);
[I1, I2, I3] = ind2sub([dims(1), dims(2), dims(3)], iI);

Loop through matrix, find row values that meet constraint, create new column

I am sure something similar has been answered before, but I am new to MATLAB and currently stuck with a very simple problem. I have a matrix M and I would like to do the following: Loop through all the values of a column C and if any of these values are not equal to some value, x copy the corresponding values over into another column in the matrix (call it Z), while leaving those values that did not satisfy the condition alone.
I have tried the following, but it's not doing anything:
rows = size(M,1);
for i = 1:rows
if M(:,x) ~= 0
then M(:,Z) = M(:,x)
end
end
Looking at your comments, this is the gist of what you want to accomplish:
Given a matrix A, choose two columns in this matrix: x and Z.
Given a value d, we wish to find all values within x that are different than some value, d.
Take these locations in x and copy them over to the corresponding locations in Z.
Let's do this step by step. Given your example in your comments:
% // Step #1
A = [1 2 3; 4 5 6; 7 8 10];
x = 1; % //Column x
Z = 3; % //Column Z
d = 1; % //Value to compare to
% //Step #2
loc = A(:,x) ~= d;
%//Step #3
A(loc, Z) = A(loc, x);
Since you're new to MATLAB, let's go through this slowly. The first step is pretty basic MATLAB syntax. It's choosing your parameters as well as some basic set up. The second step will give us a logical array that tells us which rows in column x are not equal to d. The last step will find these corresponding rows in column Z and copy these values over from column x over to Z.
The great thing about MATLAB is that it does these kinds of vectorized operations natively. The best thing for optimization is to try to avoid for loops as much as you can. Only use for loops if you have to repeat heavy computations, or situations where you absolutely cannot avoid for loops.
As a bonus, let's see how MATLAB displays each step that we've talked above (except the first one as it's redundant).
% //Step #2
loc = A(:,x) ~= d
>> loc
loc =
0
1
1
%// Step #3
A(loc, Z) = A(loc, x)
>> A =
1 2 3
4 5 4
7 8 7
Note that loc is a logical array such that 0 is false, and 1 is true. As such, rows 2 and 3 satisfy the condition that these values are not equal to d.
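For comparison, the loop attempted in the question can be made to work by testing one row at a time instead of the whole column. The sketch below uses the same example values and runs both versions side by side; the names Aloop and Avec are introduced here only to keep the two results apart.

```matlab
A = [1 2 3; 4 5 6; 7 8 10];
x = 1;   % source column
Z = 3;   % destination column
d = 1;   % value to compare against

% Loop version: inspect each row individually
Aloop = A;
for i = 1:size(Aloop, 1)
    if Aloop(i, x) ~= d
        Aloop(i, Z) = Aloop(i, x);
    end
end

% Vectorized version with a logical index
Avec = A;
loc = Avec(:, x) ~= d;
Avec(loc, Z) = Avec(loc, x);
```

Both versions produce the same matrix. The original attempt did nothing because if M(:,x) ~= 0 tests an entire column at once, and an if condition on an array is true only when every element is true.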
Also, FWIW, welcome to StackOverflow!

Update only one matrix element for iterative computation

I have a 3x3 matrix, A. I also compute a value, g, as the maximum eigen value of A. I am trying to change the element A(3,3) = 0 for all values from zero to one in 0.10 increments and then update g for each of the values. I'd like all of the other matrix elements to remain the same.
I thought a for loop would be the way to do this, but I do not know how to update only one element in a matrix without storing this update as one increasingly larger matrix. If I call the element at A(3,3) = p (thereby creating a new matrix Atry) I am able (below) to get all of the values from 0 to 1 that I desired. I do not know how to update Atry to get all of the values of g that I desire. The state of the code now will give me the same value of g for all iterations, as expected, as I do not know how to to update Atry with the different values of p to then compute the values for g.
Any suggestions on how to do this or suggestions for jargon or phrases for me to web search would be appreciated.
A = [1 1 1; 2 2 2; 3 3 0];
g = max(eig(A));
% This below is what I attempted to achieve my solution
clear all
p(1) = 0;
Atry = [1 1 1; 2 2 2; 3 3 p];
g(1) = max(eig(Atry));
for i=1:100;
p(i+1) = p(i)+ 0.01;
% this makes a one giant matrix, not many
%Atry(:,i+1) = Atry(:,i);
g(i+1) = max(eig(Atry));
end
This will also accomplish what you want to do:
A = @(x) [1 1 1; 2 2 2; 3 3 x];
p = 0:0.01:1;
g = arrayfun(@(x) eigs(A(x),1), p);
Breakdown:
Define A as an anonymous function. This means that the command A(x) will return your matrix A with the (3,3) element equal to x.
Define all steps you want to take in vector p
Then "loop" through all elements in p by using arrayfun instead of an actual loop.
The function looped over by arrayfun is not max(eig(A)) but eigs(A,1), which returns the single eigenvalue of largest magnitude. For this matrix the largest-magnitude eigenvalue is also the maximum one, so the result is the same, but eigs is designed to compute only the eigenvalues you ask for instead of computing all of them and discarding the rest. For large sparse matrices this is much faster; for a 3x3 matrix the difference is negligible.
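Since eigs(A,1) selects by magnitude rather than by value, it may be worth confirming that the two approaches agree before swapping one for the other. The sketch below compares them over the same range of p; the name Afun and the diagonal matrix D are hypothetical additions used only for this check.

```matlab
Afun = @(x) [1 1 1; 2 2 2; 3 3 x];   % same family of matrices as above
p = 0:0.01:1;

gEig  = arrayfun(@(x) max(eig(Afun(x))), p);   % maximum eigenvalue
gEigs = arrayfun(@(x) eigs(Afun(x), 1), p);    % largest-magnitude eigenvalue
maxGap = max(abs(gEig - gEigs));               % should be at rounding level here

% A made-up matrix where the two notions differ
D = diag([-5 2 1]);
largestMagnitude = eigs(D, 1);   % selects -5
largestValue = max(eig(D));      % selects 2
```

If a matrix can have a negative eigenvalue that dominates in magnitude, max(eig(A)) is the safer choice for "the maximum eigenvalue".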
First, you say 0.1 increments in the text of your question, but your code suggests you are actually interested in 0.01 increments? I'm going to operate under the assumption you mean 0.01 increments.
Now, with that out of the way, let me state what I believe you are after given my interpretation of your question. You want to iterate over the matrix A, where for each iteration you increase A(3, 3) by 0.01. Given that you want all values from 0 to 1, this implies 101 iterations. For each iteration, you want to calculate the maximum eigenvalue of A, and store all these eigenvalues in some vector (which I will call gVec). If this is correct, then I believe you just want the following:
% Specify the "Current" A
CurA = [1 1 1; 2 2 2; 3 3 0];
% Pre-allocate the values we want to iterate over for element (3, 3)
A33Vec = (0:0.01:1)';
% Pre-allocate a vector to store the maximum eigenvalues
gVec = NaN * ones(length(A33Vec), 1);
% Loop over A33Vec
for i = 1:1:length(A33Vec)
% Obtain the version of A that we want for the current i
CurA(3, 3) = A33Vec(i);
% Obtain the maximum eigen value of the current A, and store in gVec
gVec(i, 1) = max(eig(CurA));
end
EDIT: Probably best to paste this code into your matlab editor. The stack-overflow automatic text highlighting hasn't done it any favors :-)
EDIT: Go with Rody's solution (+1) - it is much better!