Proper way to normalize my distance matrices (Matlab)

I have a question about a comparison I want to make between two distance matrices. Let's say that I have my ground truth matrix:
gt = [1 0 0 0 1;
0 1 0 0 1;
0 0 1 0 0;
0 0 0 1 0];
and then I have two other extracted matrices:
v1 = [0.6136 0.1012 0.1146 0.1647 0.7445;
0.2264 0.7457 -0.0015 -0.0093 1.0026;
-0.0107 0.1975 1.1219 0.1699 0.1926;
-0.0019 0.0564 0.1560 0.7723 0.0565];
v2 = [0.8209 0.1390 0.1538 0.0203 0.9997;
0.2295 0.7720 -0.0028 -0.0112 1.0329;
-0.0167 0.2593 0.8172 0.2227 0.2501;
-0.0000 0.0549 0.1561 1.2728 0.0569];
Then I want to compute the distance matrix between the columns of each of the above matrices and the columns of the ground truth matrix gt. The way I am getting this distance is dist1 = pdist2(gt', v1', 'euclidean'); and dist2 = pdist2(gt', v2', 'euclidean');. However, the two resulting distance matrices are not comparable, right? Since the value ranges of v1 and v2 are different, I need to apply some kind of normalization in order to be able to draw conclusions from the result (please correct me if I am wrong).
However, I am not sure whether this should happen before or after I compute the distance matrices, and what type of normalization to use. The negative values play a penalizing role (which is why I suspect the normalization might have to be applied after computing the distance matrices; otherwise my first choice would be to normalize v1 and v2 before taking their distance to gt), so their effect should be preserved after the normalization as well.
Can you please give some feedback on this: how, and what type of, normalization should I apply?
Thanks
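(For illustration only, and just one of several possible choices: a sketch that z-scores each column of v1 and v2 before calling pdist2, using bsxfun for compatibility with older MATLAB versions. Note that centering changes which entries end up negative, so whether this preserves the intended penalizing effect is something to verify for your data.)
% Sketch: column-wise z-scoring before computing the distances.
v1n = bsxfun(@rdivide, bsxfun(@minus, v1, mean(v1,1)), std(v1,0,1));
v2n = bsxfun(@rdivide, bsxfun(@minus, v2, mean(v2,1)), std(v2,0,1));
dist1 = pdist2(gt', v1n', 'euclidean');
dist2 = pdist2(gt', v2n', 'euclidean');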

Related

Finding equal rows in Matlab

I have a matrix suppX in Matlab with size GxN and a matrix A with size MxN. I would like your help to construct a matrix Xresponse with size GxM with Xresponse(g,m)=1 if the row A(m,:) is equal to the row suppX(g,:) and zero otherwise.
Let me explain better with an example.
suppX=[1 2 3 4;
5 6 7 8;
9 10 11 12]; %GxN
A=[1 2 3 4;
1 2 3 4;
9 10 11 12;
1 2 3 4]; %MxN
Xresponse=[1 1 0 1;
0 0 0 0;
0 0 1 0]; %GxM
I have written a code that does what I want.
Xresponsemy = zeros(size(suppX,1), size(A,1));
for x = 1:size(suppX,1)
    Xresponsemy(x,:) = ismember(A, suppX(x,:), 'rows').';
end
My code uses a loop. I would like to avoid this because in my real case this piece of code is part of another big loop. Do you have suggestions without looping?
One way to do this would be to treat each row of the matrices as a vector in N-dimensional space and find the L2 norm (i.e., the Euclidean distance) between each pair of vectors. Then, check if the distance is 0; if it is, you have a match. Specifically, you can create a matrix such that element (i,j) contains the distance between row i in one matrix and row j in the other matrix.
You can then solve your problem by modifying the resulting distance matrix so that 1 means the two rows are identical and 0 otherwise.
This post should be of interest: Efficiently compute pairwise squared Euclidean distance in Matlab.
I would specifically look at the answer by Shai Bagon that uses matrix multiplication and broadcasting. You would then modify it so that you find distances that would be equal to 0:
nA = sum(A.^2, 2); % squared norms of A's rows
nB = sum(suppX.^2, 2); % squared norms of suppX's rows
Xresponse = bsxfun(@plus, nB, nA.') - 2 * suppX * A.';
Xresponse = Xresponse == 0;
We get:
Xresponse =
3×4 logical array
1 1 0 1
0 0 0 0
0 0 1 0
Note on floating-point accuracy
Because you are using ismember in your implementation, it's implicit to me that you expect all values to be integer. In this case, you can very much compare directly with the zero distance without loss of accuracy. If you intend to move to floating-point, you should always compare with some small threshold instead of 0, like Xresponse = Xresponse <= 1e-10; or something to that effect. I don't believe that is needed for your scenario.
Here's an alternative to @rayryeng's answer: reduce each row of the two matrices to a unique identifier using the third output of unique with the 'rows' input flag, and then compare the identifiers with singleton expansion (broadcasting) using bsxfun:
[~, ~, w] = unique([A; suppX], 'rows');
Xresponse = bsxfun(@eq, w(1:size(A,1)).', w(size(A,1)+1:end));
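As a quick sanity check with the example suppX and A from the question, this reproduces the expected logical matrix:
suppX = [1 2 3 4; 5 6 7 8; 9 10 11 12];
A = [1 2 3 4; 1 2 3 4; 9 10 11 12; 1 2 3 4];
[~, ~, w] = unique([A; suppX], 'rows');
Xresponse = bsxfun(@eq, w(1:size(A,1)).', w(size(A,1)+1:end))
% Xresponse =
%   1  1  0  1
%   0  0  0  0
%   0  0  1  0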

Matlab 2-D density plot

I am trying to do a density plot for data containing two columns with different ranges. The RMSD column is in the range [0, 2] and the Angle column is in the range [0, 200].
My data in the file is like this:
0.0225370 37.088
0.1049553 35.309
0.0710002 33.993
0.0866880 34.708
0.0912664 33.011
0.0932054 33.191
0.1083590 37.276
0.1104145 34.882
0.1027977 34.341
0.0896688 35.991
0.1047578 36.457
0.1215936 38.914
0.1105484 35.051
0.0974138 35.533
0.1390955 33.601
0.1333878 32.133
0.0933365 35.714
0.1200465 33.038
0.1155794 33.694
0.1125247 34.522
0.1181806 37.890
0.1291700 38.871
I want:
1. Both the x and y axes to be binned at 1/10th of their range.
2. The 0 of both axes to start at the same point.
3. The number of elements in each grid cell of the matrix to be printed like this, and a density plot made based on those counts:
         0     0.1   0.2   (RMSD)
   0     0     1     3
  20     2     0     4
  40     1     0     5
  60     0     0     2
(Angle)
I can find ways to do 1-D binning, but I am stumped about how to make a density plot from those values and haven't even dared to attempt 2-D binning + plotting.
Thanks for the help
I think you want hist3. Assuming you want to specify bin edges (not bin centers), use
result = hist3(data, 'Edges', {[0 .1 .2], [0 20 40 60]}).';
where data denotes your data.
From the linked documentation:
hist3(X,'Edges',edges), where edges is a two-element cell array of numeric vectors with monotonically non-decreasing values, uses a 2-D grid of bins with edges at edges{1} in the first dimension and at edges{2} in the second. The (i,j)th bin includes the value X(k,:) if
edges{1}(i) <= X(k,1) < edges{1}(i+1)
edges{2}(j) <= X(k,2) < edges{2}(j+1)
With your example data this gives
result =
0 0 0
8 14 0
0 0 0
0 0 0
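As for turning the binned counts into an actual density plot, this part is not from the original answer, but one straightforward option is to display the count matrix as an image, for example:
% Sketch: visualize the 2-D histogram counts as a density plot.
% 'result' is the count matrix from hist3 above (angle bins x RMSD bins).
imagesc([0 0.1 0.2], [0 20 40 60], result);  % x = RMSD edges, y = Angle edges
axis xy;                                     % put the origin at the bottom-left
xlabel('RMSD'); ylabel('Angle');
colorbar;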
For those who don't have the Statistics and Machine Learning Toolbox to run the bivariate histogram (hist3), it may be more practical to use an alternative to solve the 2-D histogram problem. The following function generates the same output:
function N = hist3_alt(x, y, edgesX, edgesY)
% 2-D histogram counts without hist3: bin x first, then bin y within each x-bin.
N = zeros(length(edgesY)-1, length(edgesX)-1);
[~, ~, binX] = histcounts(x, edgesX);                  % assign each x value to an x-bin
for ii = 1:numel(edgesX)-1
    N(:,ii) = (histcounts(y(binX==ii), edgesY))';      % y-histogram of the points in x-bin ii
end
end
It's simple and efficient. Then you could run the function like this:
N = hist3_alt(x,y,[0:0.1:2],[0:20:200])
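Here x and y are assumed to be the two data columns; if they still live in the text file, they can be loaded first (the file name below is just a placeholder):
data = load('rmsd_angle.txt');   % assumed file name: two columns, RMSD then Angle
x = data(:,1);                   % RMSD
y = data(:,2);                   % Angle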

isequal() and == used to compare matrices not working properly matlab

I am trying to write a method that checks if a matrix is orthogonal and returns TRUE if it is or FALSE if it isn't. My problem is that isequal() is not working the way I want it to. Basically I can do the check in two ways, based on two formulas:
ONE way is to check whether the transpose of matrix R is equal to the inverse of matrix R. If they are equal, the matrix is orthogonal (R' = inv(R)).
ANOTHER way is to check whether matrix R times the transpose of matrix R equals the identity matrix (R*R' = I). If yes, the matrix is orthogonal. I have mostly been using isequal(), but it keeps yielding false. Can someone look at my code and tell me why this would be so?
I use Z = orth(randn(3,3)) to generate a random orthogonal matrix and call my method as isortho(Z).
function R = isortho(r)
% isortho(R) returns true if R is an orthogonal matrix, otherwise false.
if ismatrix(r) && size(r,1)==size(r,2) % checks if input is a square matrix
    '------'
    trans = transpose(r)
    inverted = inv(r)
    isequal(trans, inverted)
    trans == inverted
    isequal(transpose(r), inv(r)) % METHOD ONE
    i = size(r,1);
    I = eye(i) % creating identity matrix based on size of r
    r*transpose(r)
    r*transpose(r) == I % METHOD TWO
    % check if r times the transpose of r equals the identity matrix
    if (r*transpose(r) == I)
        R = 'True';
    else
        R = 'False';
    end
end
end
this is my output:
>> isortho(Z)
ans =
------
trans =
-0.2579 -0.7291 -0.6339
0.8740 0.1035 -0.4747
0.4117 -0.6765 0.6106
inverted =
-0.2579 -0.7291 -0.6339
0.8740 0.1035 -0.4747
0.4117 -0.6765 0.6106
ans = ////isequal(trans,inverted) which yielded 0 false
0
ans = ////trans==inverted
0 1 0
1 0 0
0 1 1
ans = ////isequal(transpose(r),inv(r))
0
I =
1 0 0
0 1 0
0 0 1
ans =
1.0000 0 0.0000
0 1.0000 0.0000
0.0000 0.0000 1.0000
ans =
1 1 0
1 1 0
0 0 1
ans =
False
>>
Could someone help me fix this or tell me why isequal() is failing when the matrices inverted and trans appear to be the same?
As stated in the comments, you are running into computer precision issues. For more detail see Why is 24.0000 not equal to 24.0000 in MATLAB? and http://matlabgeeks.com/tips-tutorials/floating-point-comparisons-in-matlab/. This is not a Matlab specific thing, it's a computer thing, and you just have to deal with it.
In your case, you are trying to see whether two things are equal, but the two things are the result of a lot of floating point operations. So they will virtually never be exactly the same, but should always be very close. So, set a tolerance, say 1e-12, and say that the two things are equal if some measure of their difference is below that tolerance, e.g.:
norm(r.'-inv(r))<tol
This finds the 2-norm of the difference between the two matrices; if it is less than tol, the expression evaluates to 1, or true.
If I set tol=1e-12, then everything works well. If I set tol=1e-15, everything works well. But if I set tol=1e-16, then everything stops working! This is because the amount of computer precision error is larger than 1e-16, so the answer to norm(r.'-inv(r)) cannot be accurate to that tolerance. The smallest amount Matlab can distinguish between on my computer is roughly 2.2x10^(-16), so you have to ensure that your tolerance is set well above this value. Setting tol too large will, of course, mean you say some non-orthogonal matrices are orthogonal, but I would not expect tol=1e-14 to give you any significant issues.
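Putting the pieces together, a minimal tolerance-based version of the check might look like this (a sketch only; it returns a logical instead of the 'True'/'False' strings used in the question):
function tf = isortho(r, tol)
% ISORTHO  True if r is (numerically) an orthogonal matrix.
if nargin < 2
    tol = 1e-12;   % tolerance well above machine precision (eps ~ 2.2e-16)
end
tf = ismatrix(r) && size(r,1) == size(r,2) && ...
     norm(r*r.' - eye(size(r,1))) < tol;
end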

Adjacency matrix from Euclidean distance matrix given in matlab

I wanted to know how to create an adjacency matrix from a Euclidean distance matrix I've created before.
For example:
Edm = [0 7.7466 7.7534 0 3.7296 2.8171;
7.7466 0 0.0068 7.7466 4.0170 4.9295;
7.7534 0.0068 0 7.7534 4.0239 4.9364;
0 7.7466 7.7534 0 3.7296 2.8171;
3.7296 4.0170 4.0239 3.7296 0 0.9125;
2.8171 4.9295 4.9364 2.8171 0.9125 0 ]
Edm shows the connectivity of nodes 1-6 based on their pairwise Euclidean distances. The diagonal must be 0 because the distance from a node to itself is zero.
Is there a way for me to retrieve an adjacency matrix with the 2 nearest neighbors of each node from the Edm above?
I can't get Mohsen's answer to work, so here's my (more cumbersome) suggestion:
sz = size(Edm,1);
n = 2; % Number of desired smallest distances
E = Edm + diag(Inf(1,sz));
[~, mm] = sort(E);
mmi = mm(1:n,:)'; % n smallest distances (in your example, n = 2)
Edm_idx = sparse(mmi(:),repmat(1:sz,1,n),1,sz,sz);
Adj = full(Edm.*Edm_idx);
Note that there are off-diagonal values in Edm that are 0. If these are supposed to be Inf (as in not connected), you must account for that as well.
Set the diagonal to Inf and use bsxfun to compare the elements in each column with the minimum value in that column:
E = Edm + diag(Inf(1,size(Edm,1)));
A = bsxfun(@eq, E, min(E));

Why sprank(A) and A\b report different rank in matlab?

I have a point set P and I construct its adjacency matrix A by k-nearest neighbors. Each row of A is [...+1...-1...] and indicates a pair of neighboring points. The size of A is 48348 x 8058 and sprank(A) is 8058. But when I do the following, it gives me a warning: "Warning: Rank deficient, rank = 8055, tol = 8.307912e-10."
a=A*b;
c=A\a;
and norm(c-b) is quite large. It seems something is wrong with the adjacency matrix A, but I can't figure it out. Thanks in advance!
sprank only looks at the sparsity pattern of your matrix: it returns the structural rank, an upper bound on the rank based purely on which entries are non-zero. A\b, on the other hand, is reporting the actual numerical rank of the matrix, which indicates how many rows of your matrix are linearly independent. For example, for the following matrix:
A = [-1 1 0 0;
0 1 -1 0;
1 0 -1 0;
0 0 1 -1]
sprank(A) is 4 but rank(A) is only 3 because you can write the third row as a linear combination of the other rows, specifically A(2,:) - A(1,:).
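To see the difference directly on this example (note that sprank expects a sparse matrix, hence the sparse conversion):
A = [-1 1 0 0;
      0 1 -1 0;
      1 0 -1 0;
      0 0 1 -1];
sprank(sparse(A))   % structural rank (non-zero pattern only): returns 4
rank(A)             % numerical rank: returns 3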
The issue that you need to address is either in how you're computing A (if you expect that to generate a system of linearly independent equations) or you need to find a way to use A that doesn't require factorizing a rank deficient matrix.