I am trying to compute log(N(x | mu, Sigma)) in MATLAB, where
x is the data vector (dimensions D x 1), mu is the mean (dimensions D x 1), and Sigma is the covariance matrix (dimensions D x D).
My present implementation is
function [loggaussian] = logmvnpdf(x, mu, Sigma)
    [D,~] = size(x);
    const = -0.5 * D * log(2*pi);
    term1 = -0.5 * ((x - mu)' * (inv(Sigma) * (x - mu)));
    term2 = -0.5 * logdet(Sigma);
    loggaussian = const + term1 + term2;
end

function y = logdet(A)
    y = log(det(A));
end
For some cases I get a warning:
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = NaN
I know you will point out that my data is not consistent, but I need to implement the function so that I get the best approximation instead of a warning. How do I ensure that I always get a value?
I think the warning comes from using inv(Sigma). According to the documentation, you should avoid using inv where it can be replaced by \ (mldivide). This will give you both better speed and accuracy.
For your code, instead of inv(Sigma) * (x - mu), use Sigma \ (x - mu).
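For concreteness, here is a minimal sketch of that one change applied to the function above (the logdet helper is inlined for brevity; everything else is unchanged):

f = imread-free sketch:
function [loggaussian] = logmvnpdf(x, mu, Sigma)
    [D,~] = size(x);
    const = -0.5 * D * log(2*pi);
    % mldivide solves the linear system directly instead of forming inv(Sigma)
    term1 = -0.5 * ((x - mu)' * (Sigma \ (x - mu)));
    term2 = -0.5 * log(det(Sigma));
    loggaussian = const + term1 + term2;
end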
The following approach should be (a little) less sensitive to ill-conditioning of the covariance matrix:
function logpdf = logmvnpdf (x, mu, K)
    n = length (x);
    R = chol (K);                               % K = R' * R, with R upper triangular
    const = 0.5 * n * log (2 * pi);
    term1 = 0.5 * sum (((R') \ (x - mu)) .^ 2); % quadratic form via a triangular solve
    term2 = sum (log (diag (R)));               % equals 0.5 * log(det(K))
    logpdf = - (const + term1 + term2);
end
If K is singular or near-singular, you can still get warnings (or errors) when calling chol.
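If you truly need the function to always return a value, one option (my own sketch, not part of the original answer) is to use the two-output form of chol, which signals failure through its second output instead of raising an error, and then retry with a slightly regularized covariance:

[R, p] = chol(K);
if p > 0
    % K was not positive definite; add a small diagonal 'jitter' and retry.
    % The 1e-10 scale is an arbitrary illustrative choice, not a recommendation.
    jitter = 1e-10 * trace(K) / length(K);
    R = chol(K + jitter * eye(size(K)));
end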
I'm trying to implement a vectorized version of regularized logistic regression. I have found a post that explains the regularized version, but I don't understand it.
To make it easy I will copy the code below:
hx = sigmoid(X * theta);
m = length(X);
J = (sum(-y' * log(hx) - (1 - y') * log(1 - hx)) / m) + lambda * sum(theta(2:end).^2) / (2*m);
grad = ((hx - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m;
I understand the first part of the cost equation. If I'm correct, it could be represented as:
J = ((-y' * log(hx)) - ((1-y)' * log(1-hx))) / m;
The problem is the regularization term. Let's look at it in more detail:
Dimensions:
X = (m x (n+1))
theta = ((n+1) x 1)
I don't understand why he leaves the first element of theta (theta_0) out of the equation, when in theory the regularization term is:
(lambda / (2*m)) * sum of theta_j^2 for j = 1 .. n
and it has to take into account all the thetas.
For the gradient descent, I think this equation is equivalent:
L = eye(length(theta));
L(1,1) = 0;
grad = (1/m) * X' * (hx - y) + (lambda/m) * (L * theta);
In MATLAB, indices begin at 1, while in the mathematical formula they begin at 0 (the indices in the formula you mentioned also start at 0).
So theta_0 in the formula is theta(1) in MATLAB, and in the formula too, the first element of theta is meant to be left out of the regularization sum.
And as for your second question, you're right! It is an equivalent, cleaner equation!
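To see the equivalence concretely, here is a quick numeric check (a sketch with made-up data, not from the original post) that the two gradient forms agree:

m = 5; n = 3; lambda = 0.1;
X = [ones(m, 1) rand(m, n)];       % m x (n+1), first column is the intercept
y = double(rand(m, 1) > 0.5);
theta = rand(n + 1, 1);
sigmoid = @(z) 1 ./ (1 + exp(-z));
hx = sigmoid(X * theta);

% Form 1: mask theta_0 with a 0/1 vector
grad1 = ((hx - y)' * X / m)' + lambda .* theta .* [0; ones(n, 1)] ./ m;

% Form 2: mask theta_0 with the matrix L
L = eye(n + 1); L(1,1) = 0;
grad2 = (1/m) * X' * (hx - y) + (lambda/m) * (L * theta);

max(abs(grad1 - grad2))            % ~0, up to floating-point round-off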
For the sake of generality, I want MATLAB to compute the 1st and 2nd derivatives of the associated function f(x) automatically (in case I change f(x) = sin(6x) to, say, f(x) = sin(8x)).
I know there are built-in commands diff() and syms, but I cannot figure out how to combine them with the index i in the for loop. This is the key problem I am struggling with.
How should I change the following code? I am using MATLAB R2019b.
n = 10;
h = (2.0 * pi) / (n - 1);
for i = 1 : n
    x(i) = 0.0 + (i - 1) * h;
    f(i) = sin(6 * x(i));
    dfe(i) = 6 * cos(6 * x(i));      % first derivative
    ddfe(i) = -36 * sin(6 * x(i));   % second derivative
end
You can simply use subs and double to do that. For your case:
% x is given here
n = 10;
h = (2.0 * pi) / (n - 1);
syms 'y';
g = sin(6 * y);
for i = 1 : n
    x(i) = 0.0 + (i - 1) * h;
    f(i) = double(subs(g,y,x(i)));
    dfe(i) = double(subs(diff(g),y,x(i)));    % first derivative
    ddfe(i) = double(subs(diff(g,2),y,x(i))); % second derivative
end
Per @Daivd's comment, you can vectorize the loop as well:
% x is given here
n = 10;
h = (2.0 * pi) / (n - 1);
syms 'y';
g = sin(6 * y);
x = 0.0 + ((1:n) - 1) * h;
f = double(subs(g,y,x));
dfe = double(subs(diff(g),y,x)); % first derivative
ddfe = double(subs(diff(g,2),y,x)); % second derivative
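If speed matters, a further option (my own sketch, not from the original answer) is to convert the symbolic expressions to numeric function handles once with matlabFunction, so no symbolic substitution happens during evaluation:

syms y
g = sin(6 * y);
gf   = matlabFunction(g);           % numeric handle, roughly @(y) sin(6*y)
dgf  = matlabFunction(diff(g));     % first derivative
ddgf = matlabFunction(diff(g, 2));  % second derivative

n = 10;
h = (2.0 * pi) / (n - 1);
x = ((1:n) - 1) * h;
f = gf(x);       % the generated handles are element-wise, so x can be a vector
dfe = dgf(x);
ddfe = ddgf(x);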
I have been studying data science and ML topics for a while, and I always get stuck at one point that causes great confusion for me.
In courses like Andrew Ng's, the error between the predicted value and the true value from e.g. linear regression is defined as:
error = predicted_value - y
In some other tutorials/courses, the error is presented as:
error = y - predicted_value
Also, for instance, on Udacity's data science Nanodegree, the gradient descent weights update is given by:
error = y - predicted_value
W_new = W + learn_rate * np.matmul(error, X)
At the same time, in several other books/courses, the same procedure is given by :
error = predicted_value - y
W_new = W - learn_rate * np.matmul(error, X)
Could someone help me out with those different notations?
Thank you!
EDIT
Following @bottaio's answer, I got the following:
First case :
# compute errors
y_pred = np.matmul(X, W) + b
error = y_pred - y
# compute steps
W_new = W - learn_rate * np.matmul(error, X)
b_new = b - learn_rate * error.sum()
return W_new, b_new
Second case :
# compute errors
y_pred = np.matmul(X, W) + b
error = y - y_pred
# compute steps
W_new = W + learn_rate * np.matmul(error, X)
b_new = b + learn_rate * error.sum()
return W_new, b_new
Running the first and second cases, I get:
Third case :
# compute errors
y_pred = np.matmul(X, W) + b
error = y_pred - y
# compute steps
W_new = W + learn_rate * np.matmul(error, X)
b_new = b + learn_rate * error.sum()
return W_new, b_new
Running the third case, I get:
That's exactly the intuition I'm trying to achieve.
What's the relation between using error = y - y_pred and having to use the positive step W_new = W + learn_rate * np.matmul(error, X) instead of W_new = W - learn_rate * np.matmul(error, X)?
Thank you for all the support!!!!!
error = predicted_value - y
error' = y - predicted_value = -error
W + lr * matmul(error', X) = W + lr * matmul(-error, X) = W - lr * matmul(error, X)
These two expressions are two ways of looking at the same thing; either way, you propagate the error backwards.
To be honest, the second states more clearly what is going on under the hood: the error is just the difference between what the model predicted and the ground truth (which explains predicted - y), and a gradient descent step changes the weights in the direction opposite to the gradient (which explains the minus).
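A minimal numeric check of that equivalence (written here in MATLAB, with made-up data; np.matmul(error, X) corresponds to X' * error below):

m = 4; n = 2; lr = 0.01;
X = rand(m, n); W = rand(n, 1); y = rand(m, 1);
pred = X * W;
step1 = W - lr * X' * (pred - y);   % error = predicted - y, minus in the update
step2 = W + lr * X' * (y - pred);   % error = y - predicted, plus in the update
max(abs(step1 - step2))             % exactly 0: both conventions take the same step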
2nd-order ODE to solve in MATLAB:
( (a + f(t))·d²x/dt² + (b/2 + k(t))·dx/dt ) · dx/dt - g(t) = 0
Boundary condition:
dx/dt(0) = v0
where
t is the time,
x is the position,
dx/dt is the velocity,
d²x/dt² is the acceleration,
a, b, v0 are constants,
f(t), k(t) and g(t) are KNOWN functions of t
(I do not write them out because they are quite big)
As an example, using symbolic variables:
syms t y
%% --- Initial conditions ---
phi = 12.5e-3;
v0 = 300;
e = 3e-3;
ro = 1580;
E = 43e9;
e_r = 0.01466;
B = 0.28e-3;
%% --- Intermediate calculations ---
v_T = sqrt(((1 + e_r) * 620e6) /E) - sqrt(E/ro) * e_r;
R_T = v_T * t;
m_acc = pi * e * ro *(R_T^2);
v_L = sqrt (E/ro);
R_L = v_L * t;
z = 2 * R_L;
E_4 = B * ((e_r^2)* B * (0.9^(z/B)-1)) /(log(0.9));
E_1 = E * e * pi * e_r^2 * (-phi* (phi - 2*v_T*t)) /16;
E_2 = pi * R_T^2 * 10e9;
E_3 = pi * R_T^2 * 1e6 * e;
%% Resolution of the problem
g_t = -diff(E_1 + E_2 + E_3, t);
f(t,y) = (g_t - (pi*v_T*e*ro/2 + E_4) * y^2) / (y * (8.33e-3 + m_acc));
fun = matlabFunction(f);
[T,Y] = ode45(fun, [0 1], v0);
How can I rewrite this so that I also get x, given that y = dx/dt? I'm new to MATLAB and any help is very welcome!
First, you should use subs to evaluate a symbolic function. Another approach is to use matlabFunction to convert all symbolic expressions to anonymous functions, as suggested by Horchler.
Second, you're integrating the ODE as if it is 1st order in dx/dt. If you're interested in x(t) as well as dx/dt(t), then you'll have to modify the function like so:
fun = @(t,y) [y(2);
              ( subs(g) - (b/2 + subs(k))*y(2)*y(2) ) / ( y(2) * (a + subs(f))) ];
and of course, provide an initial value for x0 = x(0) as well as v0 = dx/dt(0).
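The call could then look like this (x0 = 0 is just a placeholder; use whatever initial position applies to your problem):

x0 = 0;                               % assumed initial position (placeholder)
[T, Y] = ode45(fun, [0 1], [x0; v0]);
% Y(:,1) is x(t), Y(:,2) is dx/dt(t)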
Third, the absolute value of the parameters is hardly ever a real concern. IEEE 754 double-precision floating point format can effortlessly represent numbers between 2.225073858507201e-308 and 1.797693134862316e+308 (realmin and realmax, respectively). So for the coefficients you gave (O(10^14)), this is absolutely not a problem. You might lose a few digits of precision if you don't take precautions (rescale to [-1 +1], reformulate the problem in different units, ...), but the relative error due to this is more than likely tiny and insignificant compared to the algorithmic error made by ode45.
<RANDOM_OPINIONATED_RANT>
Fourth, WHY do you use symbolic math for this purpose?! You are doing a numerical integration, meaning, there is no analytic solution anyway. Why bother with symbolics then? Doing the integration with symbolics (through vpa even) is going to be dozens, hundreds, yes, often even thousands of times slower than keeping (or re-implementing) everything numerical (which some would argue is already slow in MATLAB compared to a bare-metal approach).
Yes, of course, for this specific, individual, isolated use case it may not matter much, but for the future I'd strongly advise you to learn to:
use symbolics for derivations, proving theorems, simplifying expressions, ...
use numerics to implement any algorithm or function from which actual numbers are expected.
In other words, symbolics for drafting, numerics for crunching. And exactly zero symbolics should appear in any good implementation of any algorithm.
Although it's possible to mix them to some extent, that does not mean it is a good idea to do so. In fact, it almost never is. And the few isolated cases where it is the only viable option are not a vindication of the approach.
They are rare, isolated cases after all, far from the abundant norm.
For me it bears resemblance to the evil eval, with similar reasons why it Should. Be. Avoided.
</RANDOM_OPINIONATED_RANT>
With the full code, it's easy to come up with a complete solution:
% Initial conditions
phi = 12.5e-3;
v0 = 300;
x0 = 0; % (my assumption)
e = 3e-3;
ro = 1580;
E = 43e9;
e_r = 0.01466;
B = 0.28e-3;
% Intermediate calculations
v_T = sqrt(((1 + e_r) * 620e6) /E) - sqrt(E/ro) * e_r;
R_T = @(t) v_T * t;
m_acc = @(t) pi * e * ro * (R_T(t)^2);
v_L = sqrt(E/ro);
R_L = @(t) v_L * t;
z = @(t) 2 * R_L(t);
E_4 = @(t) B * ((e_r^2) * B * (0.9^(z(t)/B) - 1)) / (log(0.9));
% UNUSED
%{
E_1 = @(t) -phi * E * e * pi * e_r^2 * (phi - 2*v_T*t) / 16;
E_2 = @(t) pi * R_T(t)^2 * 10e9;
E_3 = @(t) pi * R_T(t)^2 * 1e6 * e;
%}
% Resolution of the problem
g_t = @(t) -( phi * E * e * pi * e_r^2 * v_T / 8 + ... % dE_1/dt
              pi * 10e9 * 2 * R_T(t) * v_T + ...       % dE_2/dt
              pi * 1e6 * e * 2 * R_T(t) * v_T );       % dE_3/dt
% The derivative of Z = [x(t); x'(t)] equals Z' = [x'(t); x''(t)]
f = @(t,y) [y(2);
            (g_t(t) - (0.5*pi*v_T*e*ro + E_4(t)) * y(2)^2) / (y(2) * (8.33e-3 + m_acc(t)))];
% Which is readily integrated
[T,Y] = ode45(f, [0 1], [x0 v0]);
% Plot solutions
figure(1)
plot(T, Y(:,1))
xlabel('t [s]'), ylabel('position [m]')
figure(2)
plot(T, Y(:,2))
xlabel('t [s]'), ylabel('velocity [m/s]')
Results:
Note that I've not used symbolics anywhere, except to double-check my hand-derived derivatives.
I want to write my own 2-dimensional DFT function with as few loops as possible.
What I am trying to implement is the 2-D Discrete Fourier Transform:
F(u,v) = sum over x = 0..M-1, y = 0..N-1 of f(x,y) * exp(-2*pi*1i*(u*x/M + v*y/N))
Using the separability of the transform (more precisely, of the exponential kernel), we can write this as the product of two 1-dimensional DFTs. We can compute the exponential terms for the rows (the matrix wM below) and the columns (the matrix wN below) of the transform, and then carry out the summation as the matrix product F = wM * original_matrix * wN.
Here is the code I wrote:
f = imread('cameraman.tif');
[M, N, ~] = size(f);
wM = zeros(M, M);
wN = zeros(N, N);
for u = 0 : (M - 1)
    for x = 0 : (M - 1)
        wM(u+1, x+1) = exp(-2 * pi * 1i / M * x * u);
    end
end
for v = 0 : (N - 1)
    for y = 0 : (N - 1)
        wN(y+1, v+1) = exp(-2 * pi * 1i / N * y * v);
    end
end
F = wM * im2double(f) * wN;
The first thing is that I don't want to use two loops that run MxM and NxN times. If I used a huge matrix (or image), that would be a problem. Is there any way to make this code faster (for example, by eliminating the loops)?
The second thing is displaying the Fourier transform result. I use the code below to display the transform:
% // "log" method
fl = log(1 + abs(F));
fm = max(fl(:));
imshow(im2uint8(fl / fm))
and
% // "abs" method
fa = abs(F);
fm = max(fa(:));
imshow(fa / fm)
When I use the "abs" method, I see only a black figure, nothing else. What do you think is wrong with the "abs" method?
And the last thing: when I compare the transform result of my own function with MATLAB's fft2() function, mine displays a darker figure than MATLAB's result. What am I missing here? An implementation mistake?
The transform result of my own function:
The transform result of MATLAB fft2() function:
I am happy you solved your problem, but unfortunately your answer is not completely right. Indeed it does the job, but as I commented, im2double will normalize everything to 1, therefore showing the scaled result you have. What you want (if you are looking for performance) is not doing im2double and then multiplying by 255, but directly casting to double(). (As for the "abs" method showing a black image: the DC coefficient of the transform is orders of magnitude larger than all the others, so after dividing by the maximum, everything else maps to nearly black; that is exactly why the log scaling is used for display.)
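A minimal sketch of the suggested change (assuming the wM and wN matrices from the question are still in scope):

f = imread('cameraman.tif');
F = wM * double(f) * wN;   % cast directly to double, no [0,1] normalization
% Sanity check against the built-in transform; the difference should be
% round-off only:
max(max(abs(F - fft2(double(f)))))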
You can eliminate loops by using meshgrid.
For example:
M = 1024;
tic
[ mX, mY ] = meshgrid( 0 : M - 1, 0 : M - 1 );
wM1 = exp( -2 * pi * 1i / M .* mX .* mY );
toc
tic
for u = 0 : (M - 1)
    for x = 0 : (M - 1)
        wM2( u + 1, x + 1 ) = exp( -2 * pi * 1i / M * x * u );
    end
end
toc
all( wM1( : ) == wM2( : ) )
The timing on my system was:
Elapsed time is 0.130923 seconds.
Elapsed time is 0.493163 seconds.
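As a side note (my own sketch, not part of the original answer), the same matrix can be built even more compactly with an outer product of the index vectors, since the exponent depends only on the product u*x:

M = 1024;
u = (0:M-1).';                           % column vector of frequency indices
x = 0:M-1;                               % row vector of sample indices
wM3 = exp(-2 * pi * 1i / M * (u * x));   % u * x is the M x M grid of index products
% Compare against the meshgrid version above (equal up to round-off):
max(abs(wM3(:) - wM1(:)))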