How do I create a simliarity matrix in MATLAB? - matlab

I am working towards comparing multiple images. I have these image data as column vectors of a matrix called "images." I want to assess the similarity of images by first computing their Eucledian distance. I then want to create a matrix over which I can execute multiple random walks. Right now, my code is as follows:
% clear
% clc
% close all
%
% load tea.mat;
images = Input.X;
M = zeros(size(images, 2), size (images, 2));
for i = 1:size(images, 2)
for j = 1:size(images, 2)
normImageTemp = sqrt((sum((images(:, i) - images(:, j))./256).^2));
%Need to accurately select the value of gamma_i
gamma_i = 1/10;
M(i, j) = exp(-gamma_i.*normImageTemp);
end
end
My matrix M however, ends up having a value of 1 along its main diagonal and zeros elsewhere. I'm expecting "large" values for the first few elements of each row and "small" values for elements with column index > 4. Could someone please explain what is wrong? Any advice is appreciated.

Since you're trying to compute a Euclidean distance, it looks like you have an error in where your parentheses are placed when you compute normImageTemp. You have this:
normImageTemp = sqrt((sum((...)./256).^2));
%# ^--- Note that this parenthesis...
But you actually want to do this:
normImageTemp = sqrt(sum(((...)./256).^2));
%# ^--- ...should be here
In other words, you need to perform the element-wise squaring, then the summation, then the square root. What you are doing now is summing elements first, then squaring and taking the square root of the summation, which essentially cancel each other out (or are actually the equivalent of just taking the absolute value).
Incidentally, you can actually use the function NORM to perform this operation for you, like so:
normImageTemp = norm((images(:, i) - images(:, j))./256);

The results you're getting seem reasonable. Recall the behavior of the exp(-x). When x is zero, exp(-x) is 1. When x is large exp(-x) is zero.
Perhaps if you make M(i,j) = normImageTemp; you'd see what you expect to see.

Consider this solution:
I = Input.X;
D = squareform( pdist(I') ); %'# euclidean distance between columns of I
M = exp(-(1/10) * D); %# similarity matrix between columns of I
PDIST and SQUAREFORM are functions from the Statistics Toolbox.
Otherwise consider this equivalent vectorized code (using only built-in functions):
%# we know that: ||u-v||^2 = ||u||^2 + ||v||^2 - 2*u.v
X = sum(I.^2,1);
D = real( sqrt(bsxfun(#plus,X,X')-2*(I'*I)) );
M = exp(-(1/10) * D);
As was explained in the other answers, D is the distance matrix, while exp(-D) is the similarity matrix (which is why you get ones on the diagonal)

there is an already implemented function pdist, if you have a matrix A, you can directly do
Sim= squareform(pdist(A))

Related

Octave Matrices: Why are these two costfunctions acting differently?

Knowing that X is a mx3 matrix and theta is a 3x1 matrix, I calculated the cost function of logistic regression as follows:
h = sigmoid(theta'*X');
J = ((-y)*log(h)-(1-y)*log(1-h))/m;
grad(1) = (h'-y)'*X(:,1);
grad(2) = (h'-y)'*X(:,2);
grad(3) = (h'-y)'*X(:,3);
The output is the picture attached:
That's explicitly not the correct result.
When I do
h = sigmoid(X*theta);
J = ((-y)'*log(h)-(1-y)'*log(1-h))/m;
grad = (X'*(h - y))/m;
I get the right result:
For me, these two codes are the same - and yes, I checked the matrices sizes in the first code.
Could somebody help me understand while one is giving one input and the other a different output? Somehow, the first code is giving lots of cost at theta values...
This is because you're not paying attention to the dimensionality of your inputs and outputs. (which in turn is because your code is not properly commented/structured.). Assuming y has the same orientation as X in terms of observations, then:
In the first case you have:
h = sigmoid(theta'*X'); # h is a 1xm horizontal vector
J = ((-y)*log(h)-(1-y)*log(1-h))/m; # J is an mxm matrix
In the second case you have:
h = sigmoid(X*theta); # h is an mx1 vector
J = ((-y)'*log(h)-(1-y)'*log(1-h))/m; # J is a 1xm horizontal vector
This is also the reason you get multiple printouts of that "Cost at test theta" printout. My guess is you're calling "sum" somewhere down the line, to sum over m observations, but because J was an mxm matrix instead of a vector, you ended up with a vector in an fprintf statement, which has the effect of printing that statement as many times as there are elements in your vector. Is m=12 by any chance?

How to Deal with Edge Cases: For Loops and Modulo

I'm trying to apply bare-bones image processing to images like this: My for-loop does exactly what I want it to: it allows me to find the pixels of highest intensity, and also remember the coordinates of that pixel. However, the code breaks whenever it encounters a multiple of rows – which in this case is equal to 18.
For example, the length of this image (rows * columns of image) is 414. So there are 414/18 = 23 cases where the program fails (i.e., the number of columns).
Perhaps there is a better way to accomplish my goal, but this is the only way I could think of sorting an image by pixel intensity while also knowing the coordinates of each pixel. Happy to take suggestions of alternative code, but it'd be great if someone had an idea of how to handle the cases where mod(x,18) = 0 (i.e., when the index of the vector is divisible by the total # of rows).
image = imread('test.tif'); % feed program an image
image_vector = image(:); % vectorize image
[sortMax,sortIndex] = sort(image_vector, 'descend'); % sort vector so
%that highest intensity pixels are at top
max_sort = [];
[rows,cols] = size(image);
for i=1:length(image_vector)
x = mod(sortIndex(i,1),rows); % retrieve original coordinates
% of pixels from matrix "image"
y = floor(sortIndex(i,1)/rows) +1;
if image(x,y) > 0.5 * max % filter out background noise
max_sort(i,:) = [x,y];
else
continue
end
end
You know that MATLAB indexing starts at 1, because you do +1 when you compute y. But you forgot to subtract 1 from the index first. Here is the correct computation:
index = sortIndex(i,1) - 1;
x = mod(index,rows) + 1;
y = floor(index/rows) + 1;
This computation is performed by the function ind2sub, which I recommend you use.
Edit: Actually, ind2sub does the equivalent of:
x = rem(sortIndex(i,1) - 1, rows) + 1;
y = (sortIndex(i,1) - x) / rows + 1;
(you can see this by typing edit ind2sub. rem and mod are the same for positive inputs, so x is computed identically. But for computing y they avoid the floor, I guess it is slightly more efficient.
Note also that
image(x,y)
is the same as
image(sortIndex(i,1))
That is, you can use the linear index directly to index into the two-dimensional array.

Vectorize function that finds an array of nearest values

I am still wrapping my head around vectorization and I'm having a difficult time trying to resolve the following function I made...
for i = 1:size(X, 1)
min_n = inf;
for j=1:K
val = X(i,:)' - centroids(j,:)';
diff = val'*val;
if (diff < min_n)
idx(i) = j;
min_n = diff;
end
end
end
X is an array of (x,y) coordinates...
2 5
5 6
...
...
centroids in this example is limited to 3 rows. It is also in (x,y) format as shown above.
For every pair in X I am computing the closest pair of centroids. I then store the index of the centroid in idx.
So idx(i) = j means that I am storing the index j of the centroid at index i, where i corresponds to the index of X. This means the closest centroid to pair X(i, :) is at idx(i).
Can I possibly simplify this via vectorization? I struggle with just vectorizing the inner loop.
Here are three options. But please note that the disadvantage of vectorization, as compared to your double loops, is that it stores all the difference operation results at once, which means that if your matrices have many rows, you might run out of memory. On the other hand, the vectorized approach is probably much faster.
Option 1
If you have access to Statistics and Machine Learning Toolbox, you can use the function pdist2 to get all the pairwise distances between rows of two matrices. Then, the min function gives you the minimum of each column of the result. Its first returned value are the minimal values, and its second are the indices, which is what you need for idx:
diff = pdist2(centroids,X);
[~,idx] = min(diff);
Option 2
If you don't have access to the toolbox, you can use bsxfun. This will let you compute the difference operation between the two matrices even if their dimensions don't agree. All you need to do is to use shiftdim to reshape X' to have size [1,size(X,2),size(X,1)], and then reshapedX and and centroids are compatible with their dimensions (see documentation of bsxfun). This lets you take the difference between their values. The result is a three dimensional array, which you need to sum along the second dimension to get the norm of the differences between rows. At this point you can proceed as in option 1.
reshapedX = shiftdim(X',-1);
diff = bsxfun(#minus,centroids,reshapedX);
diff = squeeze(sum(diff.^2,2));
[~,idx] = min(diff);
Note: Starting in the Matlab version 2016b, the bsxfun is used implicitly and you do not need to call it anymore. So the line with bsxfun can be replaced with the simpler line diff = centroids-reshapedX.
Option 3
Use the function dsearchn, which performs exactly what you need:
idx = dsearchn(centroids,X);
it could be done using pdist2 - pairwise distances between rows of two matrices:
% random data
X = rand(500,2);
centroids = rand(3,2);
% pairwise distances
D = pdist2(X,centroids);
% closest centroid index for each X coordinates
[~,idx] = min(D,[],2)
% plot
scatter(centroids(:,1),centroids(:,2),300,(1:size(centroids,1))','filled');
hold on;
scatter(X(:,1),X(:,2),30,idx);
legend('Centroids','data');

How do I write correlation coefficient manually in matlab?

The following is a function that takes two equal sized vectors X and Y, and is supposed to return a vector containing single correlation coefficients for image correspondence. The function is supposed to work similarly to the built in corr(X,Y) function in matlab if given two equal sized vectors. Right now my code is producing a vector containing multiple two-number vectors instead of a vector containing single numbers. How do I fix this?
function result = myCorr(X, Y)
meanX = mean(X);
meanY = mean(Y);
stdX = std(X);
stdY = std(Y);
for i = 1:1:length(X),
X(i) = (X(i) - meanX)/stdX;
Y(i) = (Y(i) - meanY)/stdY;
mult = X(i) * Y(i);
end
result = sum(mult)/(length(X)-1);
end
Edit: To clarify I want myCorr(X,Y) above to produce the same output at matlab's corr(X,Y) when given equal sized vectors of image intensity values.
Edit 2: Now the format of the output vector is correct, however the values are off by a lot.
I recommend you use r=corrcoef(X,Y) it will give you a normalized r value you are looking for in a 2x2 matrix and you can just return the r(2,1) entry as your answer. Doing this is equivalent to
r=(X-mean(X))*(Y-mean(Y))'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))
However, if you really want to do what you mentioned in the question you can also do
r=(X)*(Y)'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))

Vectorizing Dependent Nested Loops

I am looking for an efficient way to calculate the pairwise distances between all points in a coordinate matrix of size (nodeCount x 2) using MATLAB. I do not desire to calculate the pairwise distance twice (for example, between nodes 1-2 and between nodes 2-1). I have constructed an outer 'for' loop that increments through each node with an inner loop that evaluates only nodes of a higher index number. The result is an upper triangular matrix populated by the nodal separation distances. I would like to vectorize these computations, or at a minimum increase the efficiency of this operation. Any help would be appreciated.
gap = 10;
for s = 1:(nodeCount);
for ss = s+1:(nodeCount);
if abs(nodeCoord(s,1)-nodeCoord(ss,1)) < gap;
sep(s,ss) = sqrt((nodeCoord(s,1)-nodeCoord(ss,1))^2+(nodeCoord(s,2)-nodeCoord(ss,2))^2);
end
end
end
The loop is not really dependent that perspect. I guess you want to find the distance to all other coordinates try this:
xCoord = [1;2;3;4;5];
yCoord = [1;2;3;4;5]:
xSquare = bsxfun(#(x,y) power((x-y),2),xCoord,xCoord.');
ySquare = bsxfun(#(x,y) power((x-y),2),yCoord,yCoord.');
dist = sqrt(xSquare+ySquare);
xCoord = [1;2;3;4;5];
yCoord = [1;2;3;4;5];
dist = sqrt(pdist2(xCoord,yCoord,'euclidean'));
Can use the function pdist2
Rather than trying to use the fact that the lower triangular elements are not needed, as they are zeros in the output, I think you are better off using a technique that's based on fast matrix multiplication as discussed in this very smart solution to get the full matrix of euclidean distances. To get the desired output of upper triangular matrix, you can wrap the output with triu.
The code that follows next is a slightly modified version of it on the terms that we are calculating the distances among the same pair of coordinates from nodeCoord.
Code
numA = size(nodeCoord,1);
helpA = ones(numA,6);
helpB = ones(numA,6);
for idx = 1:2
sqA_idx = nodeCoord(:,idx).^2;
helpA(:,3*idx-1:3*idx) = [-2*nodeCoord(:,idx), sqA_idx ];
helpB(:,3*idx-2:3*idx-1) = [sqA_idx , nodeCoord(:,idx)];
end
sep = triu(sqrt(helpA(:,1:3) * helpB(:,1:3)')<gap).* sqrt(helpA * helpB');
pdist(nodeCoord) does it in a fast way, but returning the data in a vector. Mapping it back to a matrix costs about the same time as computing the distances:
sep3=zeros(nodeCount,nodeCount);
sep3(tril(true(nodeCount),-1))=pdist(nodeCoord);
sep3=sep3+sep3.';
If you are happy with a lower triangular matrix, you can leave out the last line.