Summation using for loop in MATLAB

J = 0;
sumTerm = 0;
for i = 1:m
    sumTerm = sumTerm + ((theta(1) + theta(2)*X(i)) - y(i)).^2;
end
J = (1/(2*m)) * sumTerm;
Is this the right way to do the summation?

How about this:
J = 0.5 * sum(((theta(1)*ones(size(X))+theta(2)*X)-y).^2)/m
Or, as @rayryeng pointed out, you can even drop the ones:
J = 0.5 * sum(((theta(1)+theta(2)*X)-y).^2)/m

That's correct, but you'll want to implement it vectorized instead of with loops. You can use linear algebra to compute the sum for you. To compute theta(1) + theta(2)*X(i) - y(i) for each term, first build a matrix of points whose first column is all ones and whose second column contains your single feature / data points. The difference between the prediction line's output and the true output for every data point is then simply X*theta - y, which produces a vector of differences, one per data point. This assumes that your array of points and theta are both column vectors, which should be the right structure, since it looks like you're implementing the cost function for univariate linear regression from Andrew Ng's Machine Learning course.
You can then take the dot product of this vector with itself to get the sum of squared differences, and divide by 2*m when you're done:
vec = [ones(m,1) X]*theta - y;
J = (vec.'*vec) / (2*m);
The reason to pursue a linear algebra solution is that native matrix and vector operations in MATLAB are very fast; if you can express your computation in terms of linear algebra, it will generally be the fastest way to get MATLAB to compute it.
For example, see this post on why MATLAB's matrix multiplication is among the fastest when benchmarked against other platforms: Why is MATLAB so fast in matrix multiplication?
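For illustration, here is a minimal sketch with made-up data (m, X, y, and theta below are placeholders, not your actual variables) that checks the loop and the vectorized cost agree:

% Hypothetical sample data for a quick check
m = 100;                       % number of training examples
X = linspace(0, 10, m).';      % single feature as a column vector
y = 3 + 2*X + randn(m, 1);     % noisy line
theta = [1; 1];                % column vector of parameters

% Loop version
sumTerm = 0;
for i = 1:m
    sumTerm = sumTerm + ((theta(1) + theta(2)*X(i)) - y(i)).^2;
end
J_loop = (1/(2*m)) * sumTerm;

% Vectorized version
vec = [ones(m,1) X]*theta - y;
J_vec = (vec.'*vec) / (2*m);

disp(abs(J_loop - J_vec));     % should be close to zero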

Related

MATLAB: Faster method of finding Discrete Fourier Transform

I'm trying to write code in MATLAB that does basically the same as the built-in fft function, i.e. computes the discrete Fourier transform of any given input vector.
The transform is given by
X(k) = sum_{n=1}^{N} x(n) * exp(-j*2*pi*(k-1)*(n-1)/N),   1 <= k <= N.
Now I created my own code to do this, but when I compare computation times it is about a factor of 200 slower than the built-in function. Obviously I would like to reduce this.
Below is the computational part of my code, where y is the output vector.
N = length(input_vector);
for k = 1:N
    y(k) = 0;
    for n = 1:N
        term = input_vector(n)*exp(-2*pi*1i*(n-1)*(k-1)/N);
        y(k) = y(k) + term;
    end
end
Now I think the computation is heavy because of the for loops and the line y(k) = y(k) + term, since this runs at every iteration. I reckon I should be able to speed this up either by using vector/matrix notation or by using functions with dummy variables and then iterating those functions, but I don't know how to start.
Any help or suggestions would be much appreciated.
Using implicit expansion you can greatly reduce the computation time of your algorithm:
% Vector length
N = length(input_vector);
% Vectorized DFT algorithm
y = sum(input_vector.*exp(-2*pi*1i*[0:N-1].'*[0:N-1]/N),2);
There are, however, two downsides:
The vectorization will consume a lot of memory (since an N-by-N matrix has to be created).
It won't be faster than the built-in function.
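As a quick sanity check (a minimal sketch; input_vector here is made-up test data assumed to be a row vector), you can compare the vectorized result against the built-in fft:

% Hypothetical test data (row vector)
input_vector = rand(1, 256);

N = length(input_vector);
y = sum(input_vector.*exp(-2*pi*1i*(0:N-1).'*(0:N-1)/N), 2);

% y comes out as a column vector, while fft of a row vector returns a row,
% so transpose the fft result before comparing
disp(norm(y - fft(input_vector).'));   % should be close to zero (round-off only)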

Calculating covariance in Matlab for large dataset and different mean

So I'm trying to implement an EM algorithm to train a Gaussian class-conditional model for classifying data. I'm stuck in the M-step at the moment because I can't figure out how to calculate the covariance matrix.
The problem is that I have a big data set, and using a for loop to go through each point would be way too slow. I also can't use the covariance function cov(), because I need to use a mean which I calculated using this formula (mu symbol one).
Is there a way to adjust cov() to use the mean I want? Or is there another way I could do this without for loops?
Edit: Forgot to explain what the data matrix is like. It's n-by-3, where each row is a data point.
It technically needs to work for the general n-by-m case, but n is usually really big (1000 or more) while m is relatively small.
You can calculate your covariance matrix manually. Let data be the matrix containing all your variables (for example, [x y]) and mu your custom mean; then proceed as follows:
n = size(data,1);
data_dem = data - (ones(n,1) * mu);
cov_mat = (data_dem.' * data_dem) ./ (n - 1);
Notice that I used Bessel's correction (n-1 instead of n) because the MATLAB cov function uses it, unless you specify the normalization argument as 1:
cov_mat = cov(x,y,1);
C = cov(___,w) specifies the normalization weight for any of the previous syntaxes. When w = 0 (default), C is normalized by the number of observations - 1. When w = 1, it is normalized by the number of observations.
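For example, a minimal sketch with made-up data showing that the manual formula reproduces cov() when mu is chosen to be the sample mean (with a custom mu you would simply plug in your own value):

% Hypothetical n-by-3 data set
n = 1000;
data = randn(n, 3);

% Use the sample mean here only so that the result can be compared with cov()
mu = mean(data, 1);

data_dem = data - (ones(n,1) * mu);
cov_mat  = (data_dem.' * data_dem) ./ (n - 1);

disp(norm(cov_mat - cov(data)));   % should be close to zero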

Vectorizing Arithmetic Operations

I am trying to improve the performance of my code by converting some iterations into matrix operations in MATLAB. One of these is the following snippet, and I need to figure out how I can avoid using loops in the operation.
Here gamma_ic and bow are two-dimensional matrices; c and z are the variables set by the outer iterations.
for z = 1:maxNumber
    for c = 1:K
        n = 0;
        for y2 = 1:number_documents
            n = n + (gamma_ic(y2,c)*bow(y2,z));
        end
        mu(z,c) = n / 2.3;
    end
end
Appreciate your assistance.
Edit: Added the loops for c and z. The iteration runs up to the maximum indices of gamma_ic and bow. Also added mu, another two-dimensional matrix, to show how n is used.
This should work for you to get mu, which seems to be the desired output:
mu = bow(1:number_documents,1:maxNumber).'*gamma_ic(1:number_documents,1:K)./2.3
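A minimal check with made-up dimensions and random data (the sizes below are placeholders) that the one-liner matches the triple loop:

% Hypothetical dimensions and data
number_documents = 50;
maxNumber = 10;
K = 4;
gamma_ic = rand(number_documents, K);
bow      = rand(number_documents, maxNumber);

% Loop version
mu_loop = zeros(maxNumber, K);
for z = 1:maxNumber
    for c = 1:K
        n = 0;
        for y2 = 1:number_documents
            n = n + (gamma_ic(y2,c)*bow(y2,z));
        end
        mu_loop(z,c) = n / 2.3;
    end
end

% Vectorized version (indexing omitted since the full matrices are used here)
mu_vec = bow.'*gamma_ic./2.3;

disp(max(abs(mu_loop(:) - mu_vec(:))));   % should be close to zero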

How to calculate matrix entries efficently using Matlab

I have a cell array myBasis of sparse matrices B_1, ..., B_n.
I want to evaluate in MATLAB the matrix Q(i,j) = trace(B_i^T * B_j).
Therefore, I wrote the following code:
for i = 1:n
    for j = 1:n
        B = myBasis{i};
        C = myBasis{j};
        Q(i,j) = trace(B'*C);
    end
end
This already takes 68 seconds when n = 1226 and each B_i has 50 rows and 50 columns.
Is there any chance to speed this up? Usually I move for-loops out of my MATLAB code into a C++ file, but I have no experience handling a cell array of sparse matrices in C++.
As noted by Inox, Q is symmetric, and therefore you only need to explicitly compute half the entries.
Computing trace(B.'*C) is equivalent to B(:).'*C(:):
trace(B.'*C) = sum_i [B.'*C](i,i) = sum_i sum_j B(i,j) * C(i,j)
which is the sum of element-wise products and therefore equivalent to B(:).'*C(:).
When you explicitly compute trace(B.'*C), you actually compute the full k-by-k matrix B.'*C only to use its diagonal later on. As far as I know, MATLAB does not optimize this calculation to avoid computing all the entries.
Here's a way
Q = zeros(n);                        % preallocate the result
for ii = 1:n
    B = myBasis{ii};
    for jj = ii:n
        C = myBasis{jj};
        t = full( B(:).'*C(:) );     % equivalent to trace(B'*C)
        Q(ii,jj) = t;
        Q(jj,ii) = t;                % exploit symmetry
    end
end
PS,
It is best not to use i and j as variable names in Matlab.
PPS,
You should notice that the ' operator in MATLAB is not the plain matrix transpose but the Hermitian (complex conjugate) transpose; for the actual transpose you need to use .'. In most cases complex numbers are not involved and there is no difference between the two operators, but once complex data is introduced, confusing the two makes debugging quite a mess...
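A tiny illustration of the difference on complex data (for real matrices the two operators give the same result):

A = [1+2i, 3-1i];
disp(A');     % complex conjugate transpose: [1-2i; 3+1i]
disp(A.');    % plain transpose:             [1+2i; 3-1i]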
Well, a couple of thoughts
1) Basic stuff: A'*B = (B'*A)' and trace(A) = trace(A'). This trick alone cuts your calculations by almost 50%: your Q matrix is symmetric, so you only need to calculate n(n+1)/2 terms (and not n²).
2) To calculate the trace you don't need every entry of B'*C, just the diagonal. Nevertheless, I don't know if it's easy to write MATLAB code for that which is actually faster than just computing B'*C (MATLAB is pretty fast with matrix operations); see the sketch below.
But I would definitely implement (1).
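For point (2), here is a minimal sketch (with made-up sparse test matrices) of computing only the diagonal of B'*C via element-wise products; whether it beats a plain trace(B'*C) will depend on size and sparsity:

% Hypothetical sparse test matrices
B = sprand(50, 50, 0.1);
C = sprand(50, 50, 0.1);

% Only the diagonal entries of B'*C are needed for the trace:
d = sum(conj(B).*C, 1);            % d(j) = [B'*C](j,j)
t = full(sum(d));                  % equals trace(B'*C)

disp(abs(t - full(trace(B'*C))));  % should be close to zero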

Matlab inverse of large matrix

This is the equation I am trying to solve:
h = (X'*X)^-1*X'*y
where X is a matrix and y is a vector ((X'*X)^-1 is the inverse of X-transpose times X). I have coded this in MATLAB as:
h = (X'*X)\X'*y
which I believe is correct. The problem is that X is around 10000-by-10000, and trying to calculate that inverse crashes MATLAB on even the most powerful computer I can find (16 cores, 24 GB RAM). Is there any way to split this up, or a library designed for doing such large inversions?
Thank you.
That looks like a pseudoinverse. Are you perhaps looking for just:
h = X \ y;
I generated a random 10,000 by 10,000 matrix X and a random 10,000 by 1 vector y.
I just broke the computation up step by step (code shown below):
Computed the transpose and held it in matrix K.
Then computed matrix A by multiplying K by X.
Computed vector b by multiplying K by y.
Lastly, used the backslash operator on A and b to solve.
I didn't have a problem with the computation. It took a while, but breaking the operations up into the smallest groups possible helped prevent the computer from being overwhelmed. However, it could also depend on the composition of the matrix you are using (i.e. sparse, decimals, etc.).
X = randi(2000, [10000, 10000]);
y = randi(2000, 10000, 1);
K = X';       % transpose of X
A = K*X;      % X'*X
b = K*y;      % X'*y
S = A\b;      % solve (X'*X)*S = X'*y
If you have multiple machines at your disposal, and you can recast your problem into the form h = X\y as proposed by @Ben, then you could use distributed arrays. This demo shows how you can do that.
Jordan, your equation is exactly the definition of the Moore-Penrose matrix inverse.
Check: http://mathworld.wolfram.com/Moore-PenroseMatrixInverse.html
Directly using h = X \ y; should help.
Or check MATLAB's pinv: h = pinv(X)*y.
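On a small made-up problem (the sizes below are arbitrary placeholders) you can check that backslash, the normal-equations form, and pinv all agree up to round-off before scaling up:

% Hypothetical small least-squares problem
X = randn(200, 50);
y = randn(200, 1);

h1 = X \ y;               % backslash, as suggested above
h2 = (X'*X) \ (X'*y);     % normal equations (grouping X'*y keeps the right-hand side a vector)
h3 = pinv(X) * y;         % Moore-Penrose pseudoinverse

disp([norm(h1 - h2), norm(h1 - h3)]);   % both should be close to zero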