Least square minimization of a Cost function - matlab

I am aiming to minimize the below cost function over W
J = (E)^2
E = A - W .* B
Such that W(n+1) = W(n) - (u/2) * delJ
delJ = gradient of J = -2 * E .* B
u = step_size=0.2
where:
- A, B are STFT matrix of 2 audio signals (dimension is 257x4000 for a 16s audio with window size = 256 , 75% overlap, nfft=512)
- W is a matrix constructed with [257x1] vector repeated 4000 times (so that it become 257x4000] matrix
I have written my customize function as below. The problem is, the elements in A and B are so small (~e-20) that even after 1000 iteration that no change happens to g.
I am certainly missing something, if anybody can help or guide me to some link which explains the whole process for a new person.
[M,N] = size(A);
E =#(x) A - repmat(x,1,N) .* B; % Error Function
J = #(x) E(x) .^ 2; % Cost Function
G = #(x) -2 * E(x) .* B; % Gradiant Function
alpha = .2; % Step size
maxiter = 500; % Max iteration
dwmin = 1e-6; % Min change in gradiation
tolerence = 1e-6; % Max Gradiant norm
gnorm = inf;
w = rand(M,1);
dw = inf;
for i = 1:maxiter
g = G(w);
gnorm = norm(g);
wnew = w - (alpha/2)*g(:,1);
dw = norm(wnew-w)
if or(dw < dwmin, gnorm < tolerence)
break
end
end
w = wnew;
A & B are always positive real numbered vectors.

Your problem is actually a series of independent problems. If we index each row of A and B and each element of w with i, then minimizing the sum of squares across the error matrix
A - repmat(w, 1, N) .* B
is the same as minimizing the the sum of squares across the error vector
A(i, :) - w(i) * B(i, :)
for all rows separately. The latter problem can be solved using one of the least-squares operators of Matlab, in particular mrdivide or /:
for i = 1 : M
w(i) = A(i, :) / B(i, :);
end
As far as I can tell, there is no way to further vectorize this calculation.
In any case, there is no need to use a gradient descent or other form of optimization algorithm.

Related

Gradient Descent with multiple variable without Matrix

I'm new with Matlab and Machine Learning and I tried to make a gradient descent function without using matrix.
m is the number of example on my training set
n is the number of feature for each example
The function gradientDescentMulti takes 5 arguments:
X mxn Matrix
y m-dimensional vector
theta : n-dimensional vector
alpha : a real number
nb_iters : a real number
I already have a solution using matrix multiplication
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
for iter = 1:num_iters
gradJ = 1/m * (X'*X*theta - X'*y);
theta = theta - alpha * gradJ;
end
end
The result after iterations:
theta =
1.0e+05 *
3.3430
1.0009
0.0367
But now, I tried to do the same without matrix multiplication, this is the function:
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
n = size(X, 2); % number of features
for iter = 1:num_iters
new_theta = zeros(1, n);
%// for each feature, found the new theta
for t = 1:n
S = 0;
for example = 1:m
h = 0;
for example_feature = 1:n
h = h + (theta(example_feature) * X(example, example_feature));
end
S = S + ((h - y(example)) * X(example, n)); %// Sum each feature for this example
end
new_theta(t) = theta(t) - alpha * (1/m) * S; %// Calculate new theta for this example
end
%// only at the end of the function, update all theta simultaneously
theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)
end
end
The result, all the theta are the same :/
theta =
1.0e+04 *
3.5374
3.5374
3.5374
If you look at the gradient update rule, it may be more efficient to actually compute the hypothesis of all of your training examples first, then subtract this with the ground truth value of each training example and store these into an array or vector. Once you do this, you can then compute the update rule very easily. To me, it doesn't appear that you're doing this in your code.
As such, I rewrote the code, but I have a separate array that stores the difference in the hypothesis of each training example and ground truth value. Once I do this, I compute the update rule for each feature separately:
for iter = 1 : num_iters
%// Compute hypothesis differences with ground truth first
h = zeros(1, m);
for t = 1 : m
%// Compute hypothesis
for tt = 1 : n
h(t) = h(t) + theta(tt)*X(t,tt);
end
%// Compute difference between hypothesis and ground truth
h(t) = h(t) - y(t);
end
%// Now update parameters
new_theta = zeros(1, n);
%// for each feature, find the new theta
for tt = 1 : n
S = 0;
%// For each sample, compute products of hypothesis difference
%// and the right feature of the sample and accumulate
for t = 1 : m
S = S + h(t)*X(t,tt);
end
%// Compute gradient descent step
new_theta(tt) = theta(tt) - (alpha/m)*S;
end
theta = new_theta'; %// Transpose new_theta (horizontal vector) to theta (vertical vector)
end
When I do this, I get the same answers as using the matrix formulation.

Finding optimal weight factor for SOR

I am using the SOR method and need to find the optimal weight factor. I think a good way to go about this is to run my SOR code with a number of omegas from 0 to 2, then store the number of iterations for each of these. Then I can see which iteration is the lowest and which omega it corresponds to. Being a novice programer, however, I am unsure how to go about this.
Here is my SOR code:
function [x, l] = SORtest(A, b, x0, TOL,w)
[m n] = size(A); % assigning m and n to number of rows and columns of A
l = 0; % counter variable
x = [0;0;0]; % introducing solution matrix
max_iter = 200;
while (l < max_iter) % loop until max # of iters.
l = l + 1; % increasing counter variable
for i=1:m % looping through rows of A
sum1 = 0; sum2 = 0; % intoducing sum1 and sum2
for j=1:i-1 % looping through columns
sum1 = sum1 + A(i,j)*x(j); % computing sum using x
end
for j=i+1:n
sum2 = sum2 + A(i,j)*x0(j); % computing sum using more recent values in x0
end
x(i) =(1-w)*x0(i) + w*(-sum1-sum2+b(i))/A(i,i); % assigning elements to the solution matrix.
end
if abs(norm(x) - norm(x0)) < TOL % checking tolerance
break
end
x0 = x; % assigning x to x0 before relooping
end
end
That's pretty easy to do. Simply loop through values of w and determine what the total number of iterations is at each w. Once the function finishes, check to see if this is the current minimum number of iterations required to get a solution. If it is, then update what the final solution would be. Once we iterate over all w, the result would be the solution vector that produced the smallest number of iterations to converge. Bear in mind that SOR has the w such that it does not include w = 0 or w = 2, or 0 < w < 2, so we can't include 0 or 2 in the range. As such, do something like this:
omega_vec = 0.01:0.01:1.99;
final_x = x0;
min_iter = intmax;
for w = omega_vec
[x, iter] = SORtest(A, b, x0, TOL, w);
if iter < min_iter
min_iter = iter;
final_x = x;
end
end
The loop checks to see if the total number of iterations at each w is less than the current minimum. If it is, log this and also record what the solution vector was. The final solution vector that was the minimum over all w will be stored in final_x.

simplifying the formula for EM

I want to compute below formula in Matlab (E-step of EM for Multinomial Mixture Model),
g and θ are matrix , θ and λ have below constrains:
but count of m is more than 1593 and when compute product of θ, number get very small and Matlab save it with zero.
Anyone can simplifying the g formula or use other tricks to solve this problem?
update:
data:
data.txt
(after downloads, change file extension to 'mat')
code:
function EM(data)
%% initialize
K=2;
[N M]=size(data);
g=zeros(N,K);
landa=ones(K,1) .* 0.5;
theta = rand(M, K);
theta = bsxfun(#rdivide, theta, sum(theta,1))';
%% EM
for i=1:10
%% E Step
for n=1:N
normalize=0;
for k=1:K
g(n,k)=landa(k) * prod(theta(k,:) .^ data(n,:));
normalize=normalize + landa(k) * prod(theta(k,:) .^ data(n,:));
end
g(n,:)=g(n,:) ./ normalize;
end
%% M Step
for k=1:K
landa(k)=sum(g(:,k)) / N ;
for m=1:M
theta(k,m)=(sum(g(:,k) .* data(:,m)) + 1) / (sum(g(:,k) .* sum(data,2)) + M);
end
end
end
end
You can use computations on logarithms instead of the actual values to avoid underflow problems.
To start things, we slightly reformat the E step code:
for n = 1 : N
for k = 1 : K
g(n, k) = lambda(k) * prod(theta(k, :) .^ data(n, :));
end
end
g = bsxfun(#rdivide, g, sum(g, 2));
So instead of accumulating the denominator in an extra variable normalize, we do the normalization in one step after both loops.
Now we introduce a variable lg with contains the logarithm of g:
for n = 1 : N
for k = 1 : K
lg(n, k) = log(lambda(k)) + sum(log(theta(k, :)) .* data(n, :));
end
end
g = exp(lg);
g = bsxfun(#rdivide, g, sum(g, 2));
So far, nothing is achieved. The underflow is just moved from within the loop to the conversion from lg to g via the exponential afterwards.
But, in the next line there is the normalization step, which means that the correct value of g is not really necessary: All that is important is that different values have the correct ratios between them. This means we can divide all values that jointly enter a normalization by an arbitrary constant, without changing the end result. On the logarithmic scale, this means subtracting something, and we choose this something to be the arithmetic mean of lg (corresponding to the harmonic mean of g):
lg = bsxfun(#minus, lg, mean(lg, 2));
g = exp(lg);
g = bsxfun(#rdivide, g, sum(g, 2));
Via the subtraction, logarithmic values are moved from something like -2000, which doesn't survive the exponential, to something like +50 or -30. Values of g now are sensible, and can be easily normalized to reach the correct end result.

Infinite while loop in Matlab, counter doesn't exit loop

I'm writing a function of the Gauss Seidel method of solving a linear system of equations of the form Ax=b, x being the unknown we are looking for.
I am having a problem with the while loop in my function, it seems that it runs infinitely. I can't seem to figure out why.
This is my function for creating the coefficient matrix A and the column vectors x and b, all with the same number of rows of course. No problem with this one.
function [A, b, x0] = test_system(n)
u = ones(n, 1);
A = spdiags([u 4*u u], [-1 0 1], n, n);
b = zeros(n, 1);
b(1) = 3;
b(2 : 2 : end-2) = -2;
b(3 : 2 : end-1) = 2;
b(end) = -3;
x0 = ones(n, 1);
This is my function for solving the system. I have included all of it just in case, but I believe the real problem is within the while loop at the very end which runs infinitely when I execute the function. The counter doesn't break away from it either. I can't really see what its problem is. Any clues?
Be gentle, I'm new at Matlab :)
function [x] = GaussSeidel(A,b,x0,tol)
% implementation of the GaussSeidel iterative method
% for solving a linear system of equations Ax = b
%INPUTS:
% A: coefficient matrix
% b: column vector of constants
% x0: setup for the unknown vector (using vector of ones)
% tol: result must be within 'tol' of correct answer.
%OUTPUTS:
% x: unknown
%check that A is a matrix
if ~(ismatrix(A))
error('A is not a matrix');
end
%check that A is square
[m,n] = size(A);
if m ~= n
error('Matrix A is not square');
end
%check that b is a column vector
if ~(iscolumn(b))
error('b is not a column vector');
end
%check that x0 is a column vector
if ~(iscolumn(x0))
error('x0 is not a column vector');
end
%check that A, b and x0 agree in size
[rowA,colA] = size(A);
[rowb,colb] = size(b);
[rowx0,colx0] = size(x0);
if ~isequal(colA,rowb)||~isequal(rowb,rowx0)
error('matrix dimensions of A, b and xo do not agree');
end
%check that A and b have real entries
if ~isreal(A) || ~isreal(b)
error('matrix A or vector b do not have real entries');
end
%check that the provided tolerance is positive
if tol <= 0
error('tolerance must be positive');
end
%check that A is strictly diagonally dominant
absoluteA = abs(A);
row_sum=sum(absoluteA,2);
diagonal=diag(absoluteA);
if ~all(2*diagonal > row_sum)
warning('matrix A is not strictly diagonally dominant');
end
L = tril(A,-1);
U = triu(A,+1);
D = diag(diag(A));
x = x0;
M1 = inv(D).*L;
M2 = inv(D).*U;
M3 = D\b;
k = 0; %iterations counter
disp(size(M1));
disp(size(M2));
disp(size(M3));
disp(size(x));
while (norm(A*x - b) > tol)
for i=1:n
x(i) = - M1(i,:).*x - M2(i,:).*x + M3(i,:);
end
k=k+1;
if(k >= 10e4)
error('too many iterations carried out');
end
end
end %end function
Coding Discussion
The source of the coding error is the use of element-wise operations instead of matrix-matrix and matrix-vector operations.
These operations produce zero-matrices, so no progress is made.
The M matrices should be defined by
M1 = inv(D)*L; % Note that D\L is more efficient
M2 = inv(D)*U; % Note that D\U is more efficient
M3 = D\b;
and the iterator performing the update should be
x(i) = - M1(i,:)*x - M2(i,:)*x + M3(i,:);
Method Discussion
I also think it is worth mentioning that the code is currently implementing the Jacobi Method since the updater has the form
while a Gauss-Seidel update has the form
.
I don’t have 50 reputation so I can not comment this.
The line if(k >= 10e4), I think this does not make what you think it would. 10e4 is 100,000 and 1e4 is 10,000. This will be the reason why you think your counter does not work. Matlab ist still running because it’s running further than you think it will. Also I run in the same Problem knedlsepp already pointed out.

Vectorizing MATLAB function

I have double summation over m = 1:M and n = 1:N for polar point with coordinates rho, phi, z:
I have written vectorized notation of it:
N = 10;
M = 10;
n = 1:N;
m = 1:M;
rho = 1;
phi = 1;
z = 1;
summ = cos (n*z) * besselj(m'-1, n*rho) * cos(m*phi)';
Now I need to rewrite this function for accepting vectors (columns) of coordinates rho, phi, z. I tried arrayfun, cellfun, simple for loop - they work too slow for me. I know about "MATLAB array manipulation tips and tricks", but as MATLAB beginner I can't understand repmat and other functions.
Can anybody suggest vectorized solution?
I think your code is already well vectorized (for n and m). If you want this function to also accept an array of rho/phi/z values, I suggest you simply process the values in a for-loop, as I doubt any further vectorization will bring significant improvements (plus the code will be harder to read).
Having said that, in the code below, I tried to vectorize the part where you compute the various components being multiplied {row N} * { matrix N*M } * {col M} = {scalar}, by making a single call to the BESSELJ and COS functions (I place each of the row/matrix/column in the third dimension). Their multiplication is still done in a loop (ARRAYFUN to be exact):
%# parameters
N = 10; M = 10;
n = 1:N; m = 1:M;
num = 50;
rho = 1:num; phi = 1:num; z = 1:num;
%# straightforward FOR-loop
tic
result1 = zeros(1,num);
for i=1:num
result1(i) = cos(n*z(i)) * besselj(m'-1, n*rho(i)) * cos(m*phi(i))';
end
toc
%# vectorized computation of the components
tic
a = cos( bsxfun(#times, n, permute(z(:),[3 2 1])) );
b = besselj(m'-1, reshape(bsxfun(#times,n,rho(:))',[],1)'); %'
b = permute(reshape(b',[length(m) length(n) length(rho)]), [2 1 3]); %'
c = cos( bsxfun(#times, m, permute(phi(:),[3 2 1])) );
result2 = arrayfun(#(i) a(:,:,i)*b(:,:,i)*c(:,:,i)', 1:num); %'
toc
%# make sure the two results are the same
assert( isequal(result1,result2) )
I did another benchmark test using the TIMEIT function (gives more fair timings). The result agrees with the previous:
0.0062407 # elapsed time (seconds) for the my solution
0.015677 # elapsed time (seconds) for the FOR-loop solution
Note that as you increase the size of the input vectors, the two methods will start to have similar timings (the FOR-loop even wins on some occasions)
You need to create two matrices, say m_ and n_ so that by selecting element i,j of each matrix you get the desired index for both m and n.
Most MATLAB functions accept matrices and vectors and compute their results element by element. So to produce a double sum, you compute all elements of the sum in parallel by f(m_, n_) and sum them.
In your case (note that the .* operator performs element-wise multiplication of matrices)
N = 10;
M = 10;
n = 1:N;
m = 1:M;
rho = 1;
phi = 1;
z = 1;
% N rows x M columns for each matrix
% n_ - all columns are identical
% m_ - all rows are identical
n_ = repmat(n', 1, M);
m_ = repmat(m , N, 1);
element_nm = cos (n_*z) .* besselj(m_-1, n_*rho) .* cos(m_*phi);
sum_all = sum( element_nm(:) );