How do I write a correlation coefficient manually in MATLAB?

The following is a function that takes two equal-sized vectors X and Y and is supposed to return a single correlation coefficient, for use in image correspondence. The function is supposed to work like MATLAB's built-in corr(X,Y) when given two equal-sized vectors. Right now my code is producing a vector containing multiple two-element vectors instead of a vector containing single numbers. How do I fix this?
function result = myCorr(X, Y)
    meanX = mean(X);
    meanY = mean(Y);
    stdX = std(X);
    stdY = std(Y);
    for i = 1:1:length(X)
        X(i) = (X(i) - meanX)/stdX;
        Y(i) = (Y(i) - meanY)/stdY;
        mult = X(i) * Y(i);
    end
    result = sum(mult)/(length(X)-1);
end
Edit: To clarify, I want myCorr(X,Y) above to produce the same output as MATLAB's corr(X,Y) when given equal-sized vectors of image intensity values.
Edit 2: Now the format of the output vector is correct; however, the values are off by a lot.

I recommend you use r = corrcoef(X,Y). It will give you the normalized r value you are looking for in a 2x2 matrix, and you can just return the r(2,1) entry as your answer. For row vectors X and Y, doing this is equivalent to
r=(X-mean(X))*(Y-mean(Y))'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))
However, if you really want to do what you mentioned in the question, you can also do
r=(X)*(Y)'/(sqrt(sum((X-mean(X)).^2))*sqrt(sum((Y-mean(Y)).^2)))
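For completeness, here is a minimal sketch of how the original loop could be fixed so that it matches corr(X,Y) for equal-length column vectors: accumulate the products instead of overwriting mult on every iteration.
function result = myCorr(X, Y)
    % Pearson correlation of two equal-length vectors (sample, N-1 normalization)
    meanX = mean(X);  meanY = mean(Y);
    stdX  = std(X);   stdY  = std(Y);
    acc = 0;
    for i = 1:length(X)
        % accumulate the product of the standardized samples
        acc = acc + ((X(i) - meanX)/stdX) * ((Y(i) - meanY)/stdY);
    end
    result = acc/(length(X) - 1);
end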

Related

Octave Matrices: Why are these two cost functions acting differently?

Knowing that X is an mx3 matrix and theta is a 3x1 matrix, I calculated the cost function of logistic regression as follows:
h = sigmoid(theta'*X');
J = ((-y)*log(h)-(1-y)*log(1-h))/m;
grad(1) = (h'-y)'*X(:,1);
grad(2) = (h'-y)'*X(:,2);
grad(3) = (h'-y)'*X(:,3);
The output is in the attached picture. That is clearly not the correct result.
When I do
h = sigmoid(X*theta);
J = ((-y)'*log(h)-(1-y)'*log(1-h))/m;
grad = (X'*(h - y))/m;
I get the right result:
To me, these two pieces of code are the same, and yes, I checked the matrix sizes in the first version.
Could somebody help me understand why one gives one output and the other a different output? Somehow, the first version gives a very large cost at the test theta values...
This is because you're not paying attention to the dimensionality of your inputs and outputs (which in turn is because your code is not properly commented/structured). Assuming y has the same orientation as X in terms of observations, then:
In the first case you have:
h = sigmoid(theta'*X'); # h is a 1xm horizontal vector
J = ((-y)*log(h)-(1-y)*log(1-h))/m; # J is an mxm matrix
In the second case you have:
h = sigmoid(X*theta); # h is an mx1 vector
J = ((-y)'*log(h)-(1-y)'*log(1-h))/m; # J is a 1x1 scalar
This is also the reason you get multiple printouts of that "Cost at test theta" line. My guess is you're calling sum somewhere down the line to sum over the m observations, but because J was an mxm matrix instead of a vector, you ended up passing a vector to an fprintf statement, which has the effect of printing that statement as many times as there are elements in the vector. Is m = 12 by any chance?
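A quick way to see this is to check the sizes directly. A minimal sketch, with made-up data purely to illustrate the shapes:
m = 12;                          % hypothetical number of observations
X = rand(m, 3);                  % m x 3 design matrix
y = randi([0 1], m, 1);          % m x 1 labels
theta = zeros(3, 1);
sigmoid = @(z) 1 ./ (1 + exp(-z));

h1 = sigmoid(theta' * X');       % 1 x m
J1 = ((-y) * log(h1) - (1 - y) * log(1 - h1)) / m;
size(J1)                         % m x m: a whole matrix, not a cost

h2 = sigmoid(X * theta);         % m x 1
J2 = ((-y)' * log(h2) - (1 - y)' * log(1 - h2)) / m;
size(J2)                         % 1 x 1: the scalar cost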

Numerical derivative of a vector

I have a problem with the numerical derivative of a vector x (Nx1) with respect to another vector t (time) that is the same size as x.
I do the following (x is chosen to be sine function as an example):
t=t0:ts:tf;
x=sin(t);
xd=diff(x)/ts;
but the answer xd is (N-1)x1, and I figured out that it does not compute the derivative corresponding to the first element of x.
Is there any other way to compute this derivative?
You are looking for the numerical gradient I assume.
t0 = 0;
ts = pi/10;
tf = 2*pi;
t = t0:ts:tf;
x = sin(t);
dx = gradient(x)/ts
The purpose of this function is actually a different one (vector fields), but it offers what diff doesn't: input and output vectors of equal length.
gradient calculates the central difference between data points. For an array, matrix, or vector with N values in each row, the ith value is defined by
dx(i) = (x(i+1) - x(i-1)) / 2
The gradient at the end points, where i=1 and i=N, is calculated with
a single-sided difference between the endpoint value and the next
adjacent value within the row. If two or more outputs are specified,
gradient also calculates central differences along other dimensions.
Unlike the diff function, gradient returns an array with the same
number of elements as the input.
I know I'm a little late to the game here, but you can also get an approximation of the numerical derivative by taking the derivative of the (cubic) spline polynomial that runs through your data:
function dy = splineDerivative(x,y)
    % the spline has continuous first and second derivatives
    pp = spline(x,y); % could also use pp = pchip(x,y);
    [breaks,coefs,K,r,d] = unmkpp(pp);
    % pre-allocate the derivative coefficient matrix
    dCoeff = zeros(K,r-1);
    % Columns are ordered from highest to lowest power. Both spline and pchip
    % return coefficient matrices with 4 columns (one row per piece), ordered
    % from 3rd to zeroth power. (Thanks to the anonymous person who suggested
    % this edit.)
    dCoeff(:, 1) = 3 * coefs(:, 1); % d(ax^3)/dx = 3ax^2
    dCoeff(:, 2) = 2 * coefs(:, 2); % d(ax^2)/dx = 2ax
    dCoeff(:, 3) = 1 * coefs(:, 3); % d(ax^1)/dx = a
    dpp = mkpp(breaks,dCoeff,d);
    dy = ppval(dpp,x);
end
The spline polynomial is always guaranteed to have continuous first and second derivatives at each point. I have not tested and compared this against using pchip instead of spline, but that might be another option, as it too has continuous first derivatives (but not second derivatives) at every point.
The advantage of this is that there is no requirement that the step size be even.
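As a hypothetical usage example with unevenly spaced sample points (where this approach is handy):
t  = sort(2*pi*rand(1, 50));    % uneven time grid
x  = sin(t);
dx = splineDerivative(t, x);    % approximates cos(t) at the sample points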
There are some options to work around your issue.
First: you can make your domain larger. Instead of N, use N+1 grid points.
Second: depending on the endpoint of interest, you can use
Forward difference: F(x + dx) - F(x)
Backward difference: F(x) - F(x - dx)
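For example, a minimal sketch that combines one-sided differences at the two endpoints with central differences in the interior (which is exactly what gradient(x)/ts gives for an evenly spaced t):
t  = 0:pi/10:2*pi;
x  = sin(t);
ts = t(2) - t(1);

xd          = zeros(size(x));
xd(1)       = (x(2) - x(1)) / ts;                 % forward difference at the first point
xd(end)     = (x(end) - x(end-1)) / ts;           % backward difference at the last point
xd(2:end-1) = (x(3:end) - x(1:end-2)) / (2*ts);   % central differences in the interior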

Calculating the covariance of 1000 5x5 matrices in MATLAB

I have 1000 5x5 matrices (Xm) like this:
Each $(x_{ij})_m$ is a point estimate drawn from a distribution. I'd like to calculate the covariance of each $x_{ij}$, where $i=1..n$ and $j=1..n$, in the direction of the red arrow (i.e. across the 1000 matrices, along the third dimension).
For example, the variance of $X_m$ is var(X,0,3), which gives a 5x5 matrix of variances. Can I calculate the covariance in the same way?
Attempt at answer
So far I've done this:
for m=1:1000
    Xm_new(m,:)=reshape(Xm(:,:,m)',25,1);
end
cov(Xm_new)
spy(Xm_new) gives me this unusual looking sparse matrix:
If you look at cov (edit cov in the command window) you might see why it doesn't support multi-dimensional arrays. It performs a transpose and a matrix multiplication of the input matrices: xc'*xc. Neither operation supports multi-dimensional arrays, and I guess whoever wrote the function decided not to do the work to generalize it (it still might be good to contact The MathWorks, however, and make a feature request).
In your case, if we take the basic code from cov and make a few assumptions, we can write a covariance function M-file that supports 3-D arrays:
function x = cov3d(x)
% Based on Matlab's cov, version 5.16.4.10
[m,n,p] = size(x);
if m == 1
    x = zeros(n,n,p,class(x));
else
    x = bsxfun(@minus,x,sum(x,1)/m);
    for i = 1:p
        xi = x(:,:,i);
        x(:,:,i) = xi'*xi;
    end
    x = x/(m-1);
end
Note that this simple code assumes that x is a series of 2-D matrices stacked up along the third dimension, and that the normalization flag is 0 (the default in cov). It could be expanded to multiple dimensions like var with a bit of work. In my timings, it's over 10 times faster than a function that calls cov(x(:,:,i)) in a for loop.
Yes, I used a for loop. There may or may not be faster ways to do this, but in this case for loops are going to be faster than most schemes, especially when the size of your array is not known a priori.
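As a hypothetical usage example (each slice treated as a data matrix with observations in rows), with a quick sanity check against the built-in cov:
x = randn(100, 5, 20);        % 20 stacked data matrices, 100 observations x 5 variables each
C = cov3d(x);                 % 5x5x20: one covariance matrix per slice
C1 = cov(x(:,:,1));           % built-in, first slice only
max(max(abs(C(:,:,1) - C1)))  % should be ~0, up to floating-point error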
The answer below also works for a rectangular slice xi = x(:,:,i):
function xy = cov3d(x)
[m,n,p] = size(x);
if m == 1
    xy = zeros(n,n,p,class(x));
else
    xc = bsxfun(@minus,x,sum(x,1)/m);
    for i = 1:p
        xci = xc(:,:,i);
        xy(:,:,i) = xci'*xci;
    end
    xy = xy/(m-1);
end
My answer is very similar to horchler's; however, horchler's code does not work with rectangular slices xi, because xi'*xi has different dimensions from xi, so the in-place assignment x(:,:,i) = xi'*xi fails.

Can someone explain how to graph this sum in MATLAB using contourf?

I'm going to start off by stating that, yes, this is homework (my first homework question on stackoverflow!). But I don't want you to solve it for me, I just want some guidance!
The equation in question is this:
$$\phi(x,y) = \phi_1 - \sum_{n=1}^{N} \frac{2}{n\pi}\,\frac{(\phi_2-\phi_1)\bigl(\cos(n\pi)-1\bigr)}{e^{n\pi}-e^{-n\pi}}\Bigl[\bigl(1-e^{-n\pi}\bigr)e^{n\pi y}+\bigl(e^{n\pi}-1\bigr)e^{-n\pi y}\Bigr]\sin(n\pi x)$$
I'm told to take N = 50, phi1 = 300, phi2 = 400, 0<=x<=1, and 0<=y<=1, and to let x and y be vectors of 100 equally spaced points, including the end points.
So the first thing I did was set those variables, and used x = linspace(0,1) and y = linspace(0,1) to make the correct vectors.
The question is: "Write a MATLAB script file called potential.m which calculates phi(x,y) and makes a filled contour plot versus x and y using the built-in function contourf (see the help command in MATLAB for examples). Make sure the figure is labeled properly. (Hint: the top and bottom portions of your domain should be hotter, at about 400 degrees, versus the left and right sides, which should be at 300 degrees.)"
However, previously, I've calculated phi using either x or y as a constant. How am I supposed to calculate it where both are variables? Do I hold x steady, while running through every number in the vector of y, assigning that to a matrix, incrementing x to the next number in its vector after running through every value of y again and again? And then doing the same process, but slowly incrementing y instead?
If so, I've been using a loop that increments to the next row every time it loops through all 100 values. If I did it that way, I would end up with a massive matrix that has 200 rows and 100 columns. How would I use that in the linspace function?
If that's correct, this is how I'm finding my matrix:
clear
clc
format compact
x = linspace(0,1);
y = linspace(0,1);
N = 50;
phi1 = 300;
phi2 = 400;
phi = 0;
sum = 0;
for j = 1:100
    for i = 1:100
        for n = 1:N
            sum = sum + ((2/(n*pi))*(((phi2-phi1)*(cos(n*pi)-1))/((exp(n*pi))-(exp(-n*pi))))*((1-(exp(-n*pi)))*(exp(n*pi*y(i)))+((exp(n*pi))-1)*(exp(-n*pi*y(i))))*sin(n*pi*x(j)));
        end
        phi(j,i) = phi1 - sum;
    end
end
for j = 1:100
    for i = 1:100
        for n = 1:N
            sum = sum + ((2/(n*pi))*(((phi2-phi1)*(cos(n*pi)-1))/((exp(n*pi))-(exp(-n*pi))))*((1-(exp(-n*pi)))*(exp(n*pi*y(j)))+((exp(n*pi))-1)*(exp(-n*pi*y(j))))*sin(n*pi*x(i)));
        end
        phi(j+100,i) = phi1 - sum;
    end
end
This is the definition of contourf. I think I have to use contourf(X,Y,Z):
contourf(X,Y,Z), contourf(X,Y,Z,n), and contourf(X,Y,Z,v) draw filled contour plots of Z using X and Y to determine the x- and y-axis limits. When X and Y are matrices, they must be the same size as Z and must be monotonically increasing.
Here is the new code:
N = 50;
phi1 = 300;
phi2 = 400;
[x, y, n] = meshgrid(linspace(0,1),linspace(0,1),1:N);
f = phi1-((2./(n.*pi)).*(((phi2-phi1).*(cos(n.*pi)-1))./((exp(n.*pi))-(exp(-n.*pi)))).*((1-(exp(-1.*n.*pi))).*(exp(n.*pi.*y))+((exp(n.*pi))-1).*(exp(-1.*n.*pi.*y))).*sin(n.*pi.*x));
g = sum(f,3);
[x1,y1] = meshgrid(linspace(0,1),linspace(0,1));
contourf(x1,y1,g)
Vectorize the code. For example you can write f(x,y,n) with:
[x y n] = meshgrid(-1:0.1:1,-1:0.1:1,1:10);
f=exp(x.^2-y.^2).*n ;
f is now a 3-D array; just sum over the right dimension...
g=sum(f,3);
in order to use contourf, we'll take only the 2D part of x,y:
[x1 y1] = meshgrid(-1:0.1:1,-1:0.1:1);
contourf(x1,y1,g)
The reason your code takes so long to calculate the phi matrix is that you didn't pre-allocate the array. The error about size happens because phi is not 100x100. But instead of fixing those things, there's an even better way...
MATLAB is a MATrix LABoratory so this type of equation is pretty easy to compute using matrix operations. Hints:
Instead of looping over the values, rows, or columns of x and y, construct matrices to represent all the possible input combinations. Check out meshgrid for this.
You're still going to need a loop to sum over n = 1:N. But for each value of n, you can evaluate your equation for all x's and y's at once (using the matrices from hint 1). The key to making this work is using element-by-element operators, such as .* and ./.
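For example, a minimal sketch of both hints combined, using the constants assumed in the question (N = 50, phi1 = 300, phi2 = 400):
N = 50;  phi1 = 300;  phi2 = 400;
[X, Y] = meshgrid(linspace(0,1), linspace(0,1));  % hint 1: every (x,y) combination at once
phi = phi1 * ones(size(X));
for n = 1:N                                       % hint 2: loop only over the series index
    c = (2/(n*pi)) * (phi2-phi1) * (cos(n*pi)-1) / (exp(n*pi) - exp(-n*pi));
    phi = phi - c * ((1-exp(-n*pi))*exp(n*pi*Y) + (exp(n*pi)-1)*exp(-n*pi*Y)) .* sin(n*pi*X);
end
contourf(X, Y, phi), colorbar
xlabel('x'), ylabel('y'), title('\phi(x,y)')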
Using matrix operations like this is The Matlab Way. Learn it and love it. (And get frustrated when using most other languages that don't have them.)
Good luck with your homework!

How do I create a similarity matrix in MATLAB?

I am working towards comparing multiple images. I have the image data as column vectors of a matrix called "images". I want to assess the similarity of the images by first computing their Euclidean distance. I then want to create a matrix over which I can execute multiple random walks. Right now, my code is as follows:
% clear
% clc
% close all
%
% load tea.mat;
images = Input.X;
M = zeros(size(images, 2), size (images, 2));
for i = 1:size(images, 2)
    for j = 1:size(images, 2)
        normImageTemp = sqrt((sum((images(:, i) - images(:, j))./256).^2));
        %Need to accurately select the value of gamma_i
        gamma_i = 1/10;
        M(i, j) = exp(-gamma_i.*normImageTemp);
    end
end
My matrix M, however, ends up having a value of 1 along its main diagonal and zeros elsewhere. I'm expecting "large" values for the first few elements of each row and "small" values for elements with column index > 4. Could someone please explain what is wrong? Any advice is appreciated.
Since you're trying to compute a Euclidean distance, it looks like you have an error in where your parentheses are placed when you compute normImageTemp. You have this:
normImageTemp = sqrt((sum((...)./256).^2));
%# ^--- Note that this parenthesis...
But you actually want to do this:
normImageTemp = sqrt(sum(((...)./256).^2));
%# ^--- ...should be here
In other words, you need to perform the element-wise squaring, then the summation, then the square root. What you are doing now is summing elements first, then squaring and taking the square root of the summation, which essentially cancel each other out (or are actually the equivalent of just taking the absolute value).
Incidentally, you can actually use the function NORM to perform this operation for you, like so:
normImageTemp = norm((images(:, i) - images(:, j))./256);
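Putting that fix into the question's loop, a minimal sketch (keeping the rest of the code and the placeholder gamma_i = 1/10 as-is):
images = Input.X;                 % as in the question
nImg = size(images, 2);
M = zeros(nImg, nImg);
gamma_i = 1/10;                   % still needs to be chosen properly
for i = 1:nImg
    for j = 1:nImg
        d = norm((images(:, i) - images(:, j)) ./ 256);  % Euclidean distance between images i and j
        M(i, j) = exp(-gamma_i * d);
    end
end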
The results you're getting seem reasonable. Recall the behavior of exp(-x): when x is zero, exp(-x) is 1; when x is large, exp(-x) approaches zero.
Perhaps if you set M(i,j) = normImageTemp; you'd see what you expect to see.
Consider this solution:
I = Input.X;
D = squareform( pdist(I') );  % Euclidean distance between columns of I
M = exp(-(1/10) * D);         % similarity matrix between columns of I
PDIST and SQUAREFORM are functions from the Statistics Toolbox.
Otherwise consider this equivalent vectorized code (using only built-in functions):
% we know that: ||u-v||^2 = ||u||^2 + ||v||^2 - 2*u.v
X = sum(I.^2,1);
D = real( sqrt(bsxfun(@plus,X,X')-2*(I'*I)) );
M = exp(-(1/10) * D);
As was explained in the other answers, D is the distance matrix, while exp(-D) is the similarity matrix (which is why you get ones on the diagonal).
There is an already-implemented function, pdist. If you have a matrix A, you can directly do
Sim = squareform(pdist(A))
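Note that pdist treats the rows of its input as observations, so for the column-vector images in the question you would pass the transpose, e.g.:
Sim = squareform(pdist(images'));  % pairwise Euclidean distances between the image columns
M = exp(-(1/10) * Sim);            % turn distances into similarities, as in the answer above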