Gradient descent with multiple variables without matrices - matlab

I'm new to Matlab and Machine Learning, and I tried to write a gradient descent function without using matrix operations.
m is the number of examples in my training set
n is the number of features for each example
The function gradientDescentMulti takes 5 arguments:
X: an m x n matrix
y: an m-dimensional vector
theta: an n-dimensional vector
alpha: a real number (the learning rate)
nb_iters: an integer (the number of iterations)
I already have a solution using matrix multiplication
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    for iter = 1:num_iters
        gradJ = 1/m * (X'*X*theta - X'*y);
        theta = theta - alpha * gradJ;
    end
end
The result after iterations:
theta =
1.0e+05 *
3.3430
1.0009
0.0367
Now I'm trying to do the same without matrix multiplication; this is the function:
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    n = size(X, 2); % number of features
    for iter = 1:num_iters
        new_theta = zeros(1, n);
        %// for each feature, find the new theta
        for t = 1:n
            S = 0;
            for example = 1:m
                h = 0;
                for example_feature = 1:n
                    h = h + (theta(example_feature) * X(example, example_feature));
                end
                S = S + ((h - y(example)) * X(example, n)); %// Sum each feature for this example
            end
            new_theta(t) = theta(t) - alpha * (1/m) * S; %// Calculate new theta for this feature
        end
        %// only at the end of the iteration, update all theta simultaneously
        theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)
    end
end
The result: all the thetas are the same :/
theta =
1.0e+04 *
3.5374
3.5374
3.5374

If you look at the gradient update rule, it may be more efficient to compute the hypothesis for all of your training examples first, then subtract the ground truth value of each training example from it and store these differences in an array or vector. Once you do this, you can compute the update rule very easily. It doesn't appear that you're doing this in your code.
As such, I rewrote the code so that a separate array stores the difference between the hypothesis and the ground truth value for each training example. Once I have this, I compute the update rule for each feature separately:
for iter = 1 : num_iters
    %// Compute hypothesis differences with ground truth first
    h = zeros(1, m);
    for t = 1 : m
        %// Compute hypothesis
        for tt = 1 : n
            h(t) = h(t) + theta(tt)*X(t,tt);
        end
        %// Compute difference between hypothesis and ground truth
        h(t) = h(t) - y(t);
    end
    %// Now update parameters
    new_theta = zeros(1, n);
    %// for each feature, find the new theta
    for tt = 1 : n
        S = 0;
        %// For each sample, compute products of hypothesis difference
        %// and the right feature of the sample and accumulate
        for t = 1 : m
            S = S + h(t)*X(t,tt);
        end
        %// Compute gradient descent step
        new_theta(tt) = theta(tt) - (alpha/m)*S;
    end
    theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)
end
When I do this, I get the same answers as using the matrix formulation.
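As a quick sanity check (my own sketch, not part of the original answer, and assuming the loop above has been placed back inside the body of gradientDescentMulti), you can run both versions on a small random problem and compare the resulting theta vectors; they should agree to machine precision:
m = 20; n = 3;
X = [ones(m, 1), rand(m, n - 1)];   %// small design matrix with an intercept column
y = rand(m, 1);
alpha = 0.01; num_iters = 100;
%// loop-based version (the rewritten function above)
theta_loop = gradientDescentMulti(X, y, zeros(n, 1), alpha, num_iters);
%// matrix-based version from the question
theta_mat = zeros(n, 1);
for iter = 1:num_iters
    theta_mat = theta_mat - alpha * (1/m) * (X' * X * theta_mat - X' * y);
end
max(abs(theta_loop - theta_mat))    %// should be on the order of eps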

Related

How to compute CDF from a given PMF in Matlab

For a given PMF p = f(\theta) for \theta between 0 and 2\pi, I computed the CDF in Matlab as
theta = 0:2*pi/n:2*pi
for i = 1:n
    cdf(i) = trapz(theta(1:i), p(1:i));
end
and the result is verified.
I tried to do the same with cumsum as cdf = cumsum(p)*(2*pi)/n, but the result is wrong. Why?
How can I compute the CDF if the given PMF is in 2D, as p = f(\theta, \phi)? Can I do it without going into detail as explained here?
In the 1D case you can use cumsum to get a vectorized version of the loop (assuming that both theta and p are column vectors). Note that plain cumsum(p)*(2*pi)/n corresponds to a rectangle rule, whereas trapz uses the trapezoidal rule, which is why the two results differ:
n = 10;
theta = linspace(0, 2*pi, n).';
p = rand(n,1);
cdf = [0; 0.5 * cumsum((p(1:n-1) + p(2:n)) .* diff(theta(1:n)))];
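As a quick check (my own addition), this vectorized cdf matches the loop form from the question, applied to the same n sample points defined above:
cdf_loop = zeros(n, 1);
for i = 1:n
    cdf_loop(i) = trapz(theta(1:i), p(1:i));   % trapz over a single sample gives 0, matching cdf(1)
end
max(abs(cdf - cdf_loop))   % should be ~0 (up to round-off)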
In the 2D case, cumsum is applied twice, once in the vertical and once in the horizontal direction:
nthet = 10;
nphi = 10;
theta = linspace(0, 2*pi, nthet).'; % as column vector
phi = linspace(0, pi, nphi); % as row vector
p = rand(nthet, nphi);
cdf1 = 0.5 * cumsum((p(1:end-1, :) + p(2:end, :)) .* diff(theta), 1);
cdf2 = 0.5 * cumsum((cdf1(:, 1:end-1) + cdf1(:, 2:end)) .* diff(phi), 2);
cdf = zeros(nthet, nphi);
cdf(2:end, 2:end) = cdf2;
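As a sanity check (my addition, using the variables defined above), the last element of cdf should match the full double integral computed directly with trapz:
total = trapz(theta, trapz(phi, p, 2));   % full integral over theta and phi
abs(cdf(end, end) - total)                % should be close to zero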

What is the error in the iterative implementation of the gradient descent algorithm below?

I have attempted to implement an iterative version of the gradient descent algorithm, but it is not working correctly. The vectorized implementation of the same algorithm works correctly.
Here is the iterative implementation :
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);
    % initialize the hypothesis vector
    h = zeros(nrows, 1);
    % initialize the temporary theta vector
    theta_temp = zeros(ncols, 1);
    % run gradient descent for the specified number of iterations
    count = 1;
    while count <= iterations
        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end
        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end
        % update the gradient with the factor
        fact = alpha / nrows;
        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end
        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end
        % update the count
        count += 1;
    end
end
And below is the vectorized implementation of the same algorithm :
function [theta, theta_all, J_cost] = gradientDescent(X, y, theta, alpha)
    % set the learning rate
    learn_rate = alpha;
    % set the number of iterations
    n = 1500;
    % number of training examples
    m = length(y);
    % initialize the theta_new vector
    l = length(theta);
    theta_new = zeros(l,1);
    % initialize the cost vector
    J_cost = zeros(n,1);
    % initialize the vector to store all the calculated theta values
    theta_all = zeros(n,2);
    % perform gradient descent for the specified number of iterations
    for i = 1 : n
        % calculate the hypothesis
        hypothesis = X * theta;
        % calculate the error
        err = hypothesis - y;
        % calculate the gradient
        grad = X' * err;
        % calculate the new theta
        theta_new = (learn_rate/m) .* grad;
        % update the old theta
        theta = theta - theta_new;
        % update the cost
        J_cost(i) = computeCost(X, y, theta);
        % store the calculated theta value
        if i < n
            index = i + 1;
            theta_all(index,:) = theta';
        end
    end
Link to the dataset can be found here
The filename is ex1data1.txt
ISSUES
For an initial theta = [0, 0] (this is a vector!), a learning rate of 0.01, and 1500 iterations, I get the optimal theta as:
theta0 = -3.6303
theta1 = 1.1664
The above is the output for the vectorized implementation which I know I have implemented correctly (it passed all the test cases on Coursera).
However, when I implemented the same algorithm using the iterative method (1st code I mentioned) the theta values I get are (alpha = 0.01, iterations = 1500):
theta0 = -0.20720
theta1 = -0.77392
This implementation fails the test cases, so I know the implementation is incorrect.
However, I am unable to understand where I am going wrong, as the iterative code does the same job and the same multiplications as the vectorized one. When I traced the output of one iteration of both codes by hand (on pen and paper!), the values came out the same, but they differed when I ran them in Octave.
Any help would be greatly appreciated, especially if you could point out where I went wrong and what exactly caused the failure.
Points to consider
The implementation of the hypothesis is correct; I tested it and both codes gave the same results, so no issues there.
I printed the gradient vector in both codes and realised that the error lies there, because the outputs were very different!
Additionally, here is the code for pre-processing the data :
function [X, y] = fileReader(filename)
    % load the dataset
    dataset = load(filename);
    % get the dimensions of the dataset
    nrows = size(dataset, 1);
    ncols = size(dataset, 2);
    % generate the X matrix from the dataset
    X = dataset(:, 1 : ncols - 1);
    % generate the y vector
    y = dataset(:, ncols);
    % append 1's to the X matrix
    X = [ones(nrows, 1), X];
end
What is going wrong with the first code is that the theta_temp and h vectors are not being initialised properly. For the very first iteration (when count equals 1) your code runs properly, because for that particular iteration h and theta_temp have been initialised to 0. However, since these are temporary vectors for each iteration of gradient descent, they are not re-initialised to zero vectors for the subsequent iterations. That is, in iteration 2, the values computed into h(i) and theta_temp(i) are simply added on top of the old values. Because of that, the code does not work properly. You need to reset these vectors to zero at the beginning of each iteration, and then it will work correctly. Here is my implementation of your code (the first one; observe the changes):
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);
    % run gradient descent for the specified number of iterations
    count = 1;
    while count <= iterations
        % initialize the hypothesis vector
        h = zeros(nrows, 1);
        % initialize the temporary theta vector
        theta_temp = zeros(ncols, 1);
        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end
        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end
        % update the gradient with the factor
        fact = alpha / nrows;
        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end
        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end
        % update the count
        count += 1;
    end
end
I ran the code and it gave the same theta values that you mentioned. However, I wonder how you concluded that the output of the hypothesis vector was the same in both cases, since this was clearly one of the reasons the first code failed!
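To see concretely why the missing re-initialisation matters, here is a tiny illustration of my own (not from the original post): without resetting h at the top of the while loop, the second pass adds a second copy of X*theta on top of the values left over from the first pass.
X = [1 2; 1 3];
theta = [1; 1];
h = zeros(2, 1);      % initialised only once, as in the original code
for iter = 1:2
    for i = 1:2
        for j = 1:2
            h(i) = h(i) + theta(j) * X(i, j);
        end
    end
end
h   % returns [6; 8] = 2 * X * theta, instead of X * theta = [3; 4]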

Double integral of the equation in Matlab

How can I implement this equation,
Q = \int_0^1 \int_0^1 |A|^2 \, dx \, dy \cdot \int_0^1 \int_0^1 |B|^2 \, dx \, dy ,
in Matlab, where A and B are m x m matrices?
for example:
A = [3.45 32.54 78.2; 8.4 33.1 4.66; 68.2 9.336 33.87 ]
B = [6.45 36.54 28.24; 85.4 323.1 74.66; 98.2 93.336 38.55 ]
my code:
f1 = @(A) (abs(A)).^2;
f2 = @(B) (abs(B)).^2;
Q = integral2(f1, 0, 1, 0, 1) * integral2(f2, 0, 1, 0, 1);
But when I run the code I get the error "Too many input arguments."
What is the problem in the code?
After clarification of your question, let me change my post.
(The immediate error occurs because integral2 expects a function handle of two variables, f(x,y), while f1 and f2 accept only one argument. More fundamentally, integral2 integrates functions, not data that has already been sampled.)
What you are after is numerical integration of a function that was already sampled on a fixed grid, with the function values stored in the two-dimensional M-by-M matrices A and B. I suppose that you have the associated grid points as well; suppose they are denoted xc and yc. Then, if you have sufficiently fine sampling of a smooth function, the integral is approximated by:
xc = linspace(0,1,M);
yc = linspace(0,1,M);
Q = trapz(xc,trapz(yc, abs(A).^2)) * trapz(xc,trapz(yc, abs(B).^2 ));
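For instance, with the 3-by-3 example matrices from the question, a minimal sketch (my own, assuming the samples lie on a uniform grid over [0,1] x [0,1]) would be:
A = [3.45 32.54 78.2; 8.4 33.1 4.66; 68.2 9.336 33.87];
B = [6.45 36.54 28.24; 85.4 323.1 74.66; 98.2 93.336 38.55];
M = size(A, 1);
xc = linspace(0, 1, M);   % grid points along x
yc = linspace(0, 1, M);   % grid points along y
Q = trapz(xc, trapz(yc, abs(A).^2)) * trapz(xc, trapz(yc, abs(B).^2))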
To test that, I made a simple example that evaluates the area of a circle, i.e. \int_0^{2\pi} \int_0^{r} r' \, dr' \, d\phi = \pi r^2. To do that with the trapezoidal method, with N samples for r and M samples for phi, we have:
r = 2; % Pick a value for r
M = 100; % Pick M samples for the angular coordinate from 0 to 2*pi
N = 101; % Pick N samples for the radial coordinate from 0 to r
phic = linspace(0,2*pi,M); % take M samples uniformly for example
rc = linspace(0,r,N); % take N samples uniformly for example
integrand = repmat(rc,M,1); % Make MxN matrix, phi along rows, r along columns
I = trapz(rc, trapz(phic, integrand));
So for the case r = 2 it indeed gives 4*pi, i.e. pi*r^2.

Least-squares minimization of a cost function

I am aiming to minimize the cost function below over W:
J = (E)^2
E = A - W .* B
such that W(n+1) = W(n) - (u/2) * delJ
delJ = gradient of J = -2 * E .* B
u = step_size = 0.2
where:
- A, B are the STFT matrices of 2 audio signals (the dimension is 257x4000 for a 16 s audio with window size = 256, 75% overlap, nfft = 512)
- W is a matrix constructed from a [257x1] vector repeated 4000 times (so that it becomes a [257x4000] matrix)
I have written my customized function below. The problem is that the elements in A and B are so small (~e-20) that even after 1000 iterations no change happens to g.
I am certainly missing something; any help, or a pointer to a link that explains the whole process for a newcomer, would be appreciated.
[M,N] = size(A);
E = @(x) A - repmat(x,1,N) .* B; % Error function
J = @(x) E(x) .^ 2;              % Cost function
G = @(x) -2 * E(x) .* B;         % Gradient function
alpha = .2;       % Step size
maxiter = 500;    % Max iterations
dwmin = 1e-6;     % Min change in w
tolerence = 1e-6; % Gradient norm tolerance
gnorm = inf;
w = rand(M,1);
dw = inf;
for i = 1:maxiter
    g = G(w);
    gnorm = norm(g);
    wnew = w - (alpha/2)*g(:,1);
    dw = norm(wnew-w)
    if or(dw < dwmin, gnorm < tolerence)
        break
    end
end
w = wnew;
A and B always contain positive real numbers.
Your problem is actually a series of independent problems. If we index each row of A and B and each element of w with i, then minimizing the sum of squares across the error matrix
A - repmat(w, 1, N) .* B
is the same as minimizing the sum of squares across the error vector
A(i, :) - w(i) * B(i, :)
for all rows separately. The latter problem can be solved using one of the least-squares operators of Matlab, in particular mrdivide or /:
for i = 1 : M
    w(i) = A(i, :) / B(i, :);
end
As far as I can tell, there is no way to further vectorize this calculation.
In any case, there is no need to use gradient descent or any other form of optimization algorithm.
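Put together, a minimal sketch of my own (assuming A and B are the M x N STFT magnitude matrices from the question), including the expansion of w back into the W used in the cost function:
[M, N] = size(A);
w = zeros(M, 1);                 % one least-squares coefficient per row
for i = 1 : M
    w(i) = A(i, :) / B(i, :);
end
W = repmat(w, 1, N);             % expand to the 257x4000 form used in the cost
J = sum(sum((A - W .* B).^2))    % resulting sum-of-squares cost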

Vectorizing MATLAB function

I have a double summation over m = 1:M and n = 1:N for a polar point with coordinates rho, phi, z:
\sum_{n=1}^{N} \sum_{m=1}^{M} \cos(n z) \, J_{m-1}(n \rho) \, \cos(m \phi)
I have written a vectorized version of it:
N = 10;
M = 10;
n = 1:N;
m = 1:M;
rho = 1;
phi = 1;
z = 1;
summ = cos (n*z) * besselj(m'-1, n*rho) * cos(m*phi)';
Now I need to rewrite this function to accept vectors (columns) of coordinates rho, phi, z. I tried arrayfun, cellfun, and a simple for loop - they are too slow for me. I know about "MATLAB array manipulation tips and tricks", but as a MATLAB beginner I can't understand repmat and the other functions.
Can anybody suggest vectorized solution?
I think your code is already well vectorized (for n and m). If you want this function to also accept an array of rho/phi/z values, I suggest you simply process the values in a for-loop, as I doubt any further vectorization will bring significant improvements (plus the code will be harder to read).
Having said that, in the code below I tried to vectorize the part where you compute the various components being multiplied, {row N} * {matrix N*M} * {col M} = {scalar}, by making a single call to the BESSELJ and COS functions (I place each row/matrix/column slice in the third dimension). Their multiplication is still done in a loop (ARRAYFUN to be exact):
%# parameters
N = 10; M = 10;
n = 1:N; m = 1:M;
num = 50;
rho = 1:num; phi = 1:num; z = 1:num;
%# straightforward FOR-loop
tic
result1 = zeros(1,num);
for i=1:num
    result1(i) = cos(n*z(i)) * besselj(m'-1, n*rho(i)) * cos(m*phi(i))';
end
toc
%# vectorized computation of the components
tic
a = cos( bsxfun(@times, n, permute(z(:),[3 2 1])) );
b = besselj(m'-1, reshape(bsxfun(@times,n,rho(:))',[],1)');
b = permute(reshape(b',[length(m) length(n) length(rho)]), [2 1 3]);
c = cos( bsxfun(@times, m, permute(phi(:),[3 2 1])) );
result2 = arrayfun(@(i) a(:,:,i)*b(:,:,i)*c(:,:,i)', 1:num);
toc
%# make sure the two results are the same
assert( isequal(result1,result2) )
I did another benchmark test using the TIMEIT function (which gives fairer timings). The result agrees with the previous one:
0.0062407 %# elapsed time (seconds) for my solution
0.015677 %# elapsed time (seconds) for the FOR-loop solution
Note that as you increase the size of the input vectors, the two methods will start to have similar timings (the FOR-loop even wins on some occasions).
You need to create two matrices, say m_ and n_ so that by selecting element i,j of each matrix you get the desired index for both m and n.
Most MATLAB functions accept matrices and vectors and compute their results element by element. So to produce a double sum, you compute all elements of the sum in parallel by f(m_, n_) and sum them.
In your case (note that the .* operator performs element-wise multiplication of matrices)
N = 10;
M = 10;
n = 1:N;
m = 1:M;
rho = 1;
phi = 1;
z = 1;
% N rows x M columns for each matrix
% n_ - all columns are identical
% m_ - all rows are identical
n_ = repmat(n', 1, M);
m_ = repmat(m , N, 1);
element_nm = cos (n_*z) .* besselj(m_-1, n_*rho) .* cos(m_*phi);
sum_all = sum( element_nm(:) );
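If you also need to evaluate this for vectors of rho, phi, z (as discussed in the first answer), a small sketch of my own is to wrap the element-wise sum in an anonymous function and loop over the coordinates; rho_v, phi_v, and z_v below are hypothetical coordinate vectors:
N = 10; M = 10;
n_ = repmat((1:N)', 1, M);
m_ = repmat(1:M, N, 1);
% double sum for one scalar (rho, phi, z) triple
sum_nm = @(rho, phi, z) sum(sum( cos(n_*z) .* besselj(m_-1, n_*rho) .* cos(m_*phi) ));
% apply it to vectors of coordinates with a plain loop
num = 50;
rho_v = linspace(0, 1, num);
phi_v = linspace(0, 2*pi, num);
z_v = linspace(0, 1, num);
result = zeros(1, num);
for i = 1:num
    result(i) = sum_nm(rho_v(i), phi_v(i), z_v(i));
end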