DerivativeCheck fails with minFunc - matlab

I'm trying to train a single layer of an autoencoder using minFunc, and while the cost function appears to decrease, when enabled, the DerivativeCheck fails. The code I'm using is as close to textbook values as possible, though extremely simplified.
The loss function I'm using is the squared-error:
$ J(W; x) = \frac{1}{2}||a^{l} - x||^2 $
with $a^{l}$ equal to $\sigma(W^{T}x)$, where $\sigma$ is the sigmoid function. The gradient should therefore be:
$ \delta = (a^{l} - x)*a^{l}(1 - a^{l}) $
$ \nabla_{W} = \delta(a^{l-1})^T $
Note, that to simplify things, I've left off the bias altogether. While this will cause poor performance, it shouldn't affect the gradient check, as I'm only looking at the weight matrix. Additionally, I've tied the encoder and decoder matrices, so there is effectively a single weight matrix.
The code I'm using for the loss function is (edit: I've vectorized the loop I had and cleaned code up a little):
% loss function passed to minFunc
function [ loss, grad ] = calcLoss(theta, X, nHidden)
[nInstances, nVars] = size(X);
% we get the variables a single vector, so need to roll it into a weight matrix
W = reshape(theta(1:nVars*nHidden), nVars, nHidden);
Wp = W; % tied weight matrix
% encode each example (nInstances)
hidden = sigmoid(X*W);
% decode each sample (nInstances)
output = sigmoid(hidden*Wp);
% loss function: sum(-0.5.*(x - output).^2)
% derivative of loss: -(x - output)*f'(o)
% if f is sigmoid, then f'(o) = output.*(1-output)
diff = X - output;
error = -diff .* output .* (1 - output);
dW = hidden*error';
loss = 0.5*sum(diff(:).^2, 2) ./ nInstances;
% need to unroll gradient matrix back into a single vector
grad = dW(:) ./ nInstances;
end
Below is the code I use to run the optimizer (for a single time, as the runtime is fairly long with all training samples):
examples = 5000;
fprintf('loading data..\n');
images = readMNIST('train-images-idx3-ubyte', examples) / 255.0;
data = images(:, :, 1:examples);
% each row is a different training sample
X = reshape(data, examples, 784);
% initialize weight matrix with random values
% W: (R^{784} -> R^{10}), W': (R^{10} -> R^{784})
numHidden = 10; % NOTE: this is extremely small to speed up DerivativeCheck
numVisible = 784;
low = -4*sqrt(6./(numHidden + numVisible));
high = 4*sqrt(6./(numHidden + numVisible));
W = low + (high-low)*rand(numVisible, numHidden);
% run optimization
options = {};
options.Display = 'iter';
options.GradObj = 'on';
options.MaxIter = 10;
mfopts.MaxFunEvals = ceil(options.MaxIter * 2.5);
options.DerivativeCheck = 'on';
options.Method = 'lbfgs';
[ x, f, exitFlag, output] = minFunc(@calcLoss, W(:), options, X, numHidden);
The results I get with DerivativeCheck on are generally less than 0, but greater than 0.1. I've tried similar code using batch gradient descent, and get slightly better results (some are < 0.0001, but certainly not all).
I'm not sure if I made either a mistake with my math or code. Any help would be greatly appreciated!
update
I discovered a small typo in my code (which doesn't appear in the code below) causing exceptionally bad performance. Unfortunately, I'm still getting less-than-good results. For example, a comparison between the two gradients:
calculate check
0.0379 0.0383
0.0413 0.0409
0.0339 0.0342
0.0281 0.0282
0.0322 0.0320
with differences of up to 0.04, which I'm assuming is still failing.
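One way to judge such a comparison is to look at a relative rather than an absolute difference. The sketch below applies one such measure to the five pairs listed above. The measure and the variable names are assumptions for illustration, not what minFunc reports, and because the listed values are rounded to four decimals the result is only a rough indication.

```matlab
% Gradient pairs copied from the comparison above (rounded to 4 decimals).
calcGrad  = [0.0379; 0.0413; 0.0339; 0.0281; 0.0322];
checkGrad = [0.0383; 0.0409; 0.0342; 0.0282; 0.0320];

% Largest absolute difference between the two columns.
maxAbsDiff = max(abs(calcGrad - checkGrad));

% Relative error: norm of the difference scaled by the size of the gradients
% (an assumed, commonly used measure; not minFunc's own output).
relErr = norm(calcGrad - checkGrad) / (norm(calcGrad) + norm(checkGrad));

fprintf('max abs diff = %g, relative error = %g\n', maxAbsDiff, relErr);
```

A relative measure makes results comparable between parameters whose gradients differ by orders of magnitude.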

Okay, I think I might have solved the problem. Generally the differences in the gradients are < 1e-4, though I do have at least one which is 6e-4. Does anyone know if this is still acceptable?
To get this result, I rewrote the code without tying the weight matrices (I'm not sure if tying them will always cause the derivative check to fail). I've also included biases, as they didn't complicate things too badly.
Something else I realized when debugging is that it's really easy to make a mistake in the code. For example, it took me a while to catch:
grad_W1 = error_h*X';
instead of:
grad_W1 = X*error_h';
While the difference between these two lines is just the transpose of grad_W1, because of the requirement of packing/unpacking the parameters into a single vector, there's no way for Matlab to complain about grad_W1 being the wrong dimensions.
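A cheap guard against this kind of mistake is to assert the shape of each gradient before packing it. The following is a minimal sketch with made-up sizes and placeholder values; only the names grad_W1, error_h and X come from the lines above, everything else is hypothetical.

```matlab
% Hypothetical sizes, chosen only for illustration.
nVisible = 4; nHidden = 3; nInstances = 5;
X       = ones(nVisible, nInstances);   % one training example per column
error_h = ones(nHidden,  nInstances);   % hidden-layer deltas (placeholder values)

grad_W1 = X * error_h';                 % nVisible x nHidden, same shape as W1
wrong   = error_h * X';                 % nHidden x nVisible, i.e. transposed

% Both have the same number of elements, so packing them with (:) hides the bug.
assert(numel(grad_W1) == numel(wrong));

% An explicit shape check before packing catches it.
assert(isequal(size(grad_W1), [nVisible, nHidden]), 'grad_W1 has the wrong shape');
assert(~isequal(size(wrong),  [nVisible, nHidden]));
```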
I've also included my own derivative check, which gives slightly different answers than minFunc's (my derivative check gives differences that are all below 1e-4).
fwdprop.m:
function [ hidden, output ] = fwdprop(W1, bias1, W2, bias2, X)
hidden = sigmoid(bsxfun(@plus, W1'*X, bias1));
output = sigmoid(bsxfun(@plus, W2'*hidden, bias2));
end
calcLoss.m:
function [ loss, grad ] = calcLoss(theta, X, nHidden)
[nVars, nInstances] = size(X);
[W1, bias1, W2, bias2] = unpackParams(theta, nVars, nHidden);
[hidden, output] = fwdprop(W1, bias1, W2, bias2, X);
err = output - X;
delta_o = err .* output .* (1.0 - output);
delta_h = W2*delta_o .* hidden .* (1.0 - hidden);
grad_W1 = X*delta_h';
grad_bias1 = sum(delta_h, 2);
grad_W2 = hidden*delta_o';
grad_bias2 = sum(delta_o, 2);
loss = 0.5*sum(err(:).^2);
grad = packParams(grad_W1, grad_bias1, grad_W2, grad_bias2);
end
unpackParams.m:
function [ W1, bias1, W2, bias2 ] = unpackParams(params, nVisible, nHidden)
mSize = nVisible*nHidden;
W1 = reshape(params(1:mSize), nVisible, nHidden);
offset = mSize;
bias1 = params(offset+1:offset+nHidden);
offset = offset + nHidden;
W2 = reshape(params(offset+1:offset+mSize), nHidden, nVisible);
offset = offset + mSize;
bias2 = params(offset+1:end);
end
packParams.m:
function [ params ] = packParams(W1, bias1, W2, bias2)
params = [W1(:); bias1; W2(:); bias2(:)];
end
checkDeriv.m:
function [check] = checkDeriv(X, theta, nHidden, epsilon)
[nVars, nInstances] = size(X);
[W1, bias1, W2, bias2] = unpackParams(theta, nVars, nHidden);
[hidden, output] = fwdprop(W1, bias1, W2, bias2, X);
err = output - X;
delta_o = err .* output .* (1.0 - output);
delta_h = W2*delta_o .* hidden .* (1.0 - hidden);
grad_W1 = X*delta_h';
grad_bias1 = sum(delta_h, 2);
grad_W2 = hidden*delta_o';
grad_bias2 = sum(delta_o, 2);
check = zeros(size(theta, 1), 2);
grad = packParams(grad_W1, grad_bias1, grad_W2, grad_bias2);
for i = 1:size(theta, 1)
Jplus = calcHalfDeriv(X, theta(:), i, nHidden, epsilon);
Jminus = calcHalfDeriv(X, theta(:), i, nHidden, -epsilon);
calcGrad = (Jplus - Jminus)/(2*epsilon);
check(i, :) = [calcGrad grad(i)];
end
end
calcHalfDeriv.m:
function [ loss ] = calcHalfDeriv(X, theta, i, nHidden, epsilon)
theta(i) = theta(i) + epsilon;
[nVisible, nInstances] = size(X);
[W1, bias1, W2, bias2] = unpackParams(theta, nVisible, nHidden);
[hidden, output] = fwdprop(W1, bias1, W2, bias2, X);
err = output - X;
loss = 0.5*sum(err(:).^2);
end
Update
Okay, I've also figured out why tying the weights was causing issues. I wanted to go down to just [ W1; bias1; bias2 ] since W2 = W1'. This way I could simply recreate W2 by looking at W1. However, because the values of $\theta$ are changed by epsilon, this was in effect changing both matrices at the same time. The proper solution is to simply pass W1 as a separate parameter while at the same time reducing $\theta$.
Update 2
Okay, this is what I get for posting too late at night. While the first update does indeed cause things to pass correctly, it's not the correct solution.
I think the correct thing to do is to actually calculate the gradients for W1 and W2, and then set the final gradient of W1 to grad_W1 + grad_W2' (transposed, since W2 = W1'). The hand-waving argument is that since the weight matrix is acting to both encode and decode, its weights must be affected by both gradients. I haven't thought through the actual theoretical ramifications of this yet, however.
If I run this using my own derivative check, it passes the 10e-4 threshold. It does much better than before with minFunc's derivative check, though still worse than if I don't tie the weights.
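The idea of adding both contributions can be tried out on a tiny tied autoencoder without biases. The sketch below is an illustration under assumed sizes, data and names (none of it is the code used above): it backpropagates as if the encoder and decoder were separate, sums the two gradients, and compares the result with a central-difference estimate.

```matlab
% Small, deterministic toy problem (sizes and values are illustrative assumptions).
nVars = 4; nHidden = 3; nInstances = 5;
X = reshape(mod(1:nVars*nInstances, 7) / 7, nVars, nInstances);  % one example per column
W = 0.5 * reshape(sin(1:nVars*nHidden), nVars, nHidden);         % tied weights, W2 = W'

sigmoid = @(z) 1 ./ (1 + exp(-z));
unroll  = @(w) reshape(w, nVars, nHidden);
% Loss of the tied autoencoder without biases: encode with W', decode with W.
lossFn  = @(w) 0.5 * sum(sum((sigmoid(unroll(w) * sigmoid(unroll(w)' * X)) - X).^2));

% Analytic gradient: backpropagate as if encoder and decoder were separate ...
hidden  = sigmoid(W' * X);
output  = sigmoid(W * hidden);
delta_o = (output - X) .* output .* (1 - output);
delta_h = (W' * delta_o) .* hidden .* (1 - hidden);
gradEnc = X * delta_h';        % contribution through the encoder (W')
gradDec = delta_o * hidden';   % contribution through the decoder (W)
grad    = gradEnc + gradDec;   % ... then add both contributions

% Central-difference check of every entry.
epsilon = 1e-5;
numGrad = zeros(numel(W), 1);
for i = 1:numel(W)
    e = zeros(numel(W), 1);
    e(i) = epsilon;
    numGrad(i) = (lossFn(W(:) + e) - lossFn(W(:) - e)) / (2 * epsilon);
end
maxDiff = max(abs(numGrad - grad(:)));
fprintf('max difference: %g\n', maxDiff);
```

Perturbing one entry of the tied matrix changes both the encoder and the decoder, which is why the numerical estimate only matches the sum of the two contributions.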

How to perform adaptive step size using Runge-Kutta fourth order (Matlab)?

The estimated step size hstep seems to take a long time and many iterations to converge.
I tried it with this first ODE.
Basically, you take the difference between one RK4 step with step size h and the result obtained with step size h/2. Note that to reach the same time value, you have to take two steps of h/2, so that together they also cover h.
frhs=@(x,y) x.^2*y;
Is my code correct?
clear all;close all;clc
c=[]; i=1; U_saved=[]; y_array=[]; y_array_alt=[];
y_arr=1; y_arr_2=1;
frhs=@(x,y) 20*cos(x);
tol=0.001;
y_ini= 1;
y_ini_2= 1;
c=abs(y_ini-y_ini_2)
hc=1
all_y_values=[];
for m=1:500
if (c>tol || m==1)
fprintf('More')
y_arr
[Unew]=vpa(Runge_Kutta(0,y_arr,frhs,hc))
if (m>1)
y_array(m)=vpa(Unew);
y_array=y_array(logical(y_array));
end
[Unew_alt]=Runge_Kutta(0,y_arr_2,frhs,hc/2);
[Unew_alt]=vpa(Runge_Kutta(hc/2,Unew_alt,frhs,hc/2))
if (m>1)
y_array_alt(m)=vpa(Unew_alt);
y_array_alt=y_array_alt(logical(y_array_alt));
end
fprintf('More')
%y_array_alt(m)=vpa(Unew_alt);
c=vpa(abs(Unew_alt-Unew) )
hc=abs(tol/c)^0.25*hc
if (c<tol)
fprintf('Less')
y_arr=vpa(y_array(end) )
y_arr_2=vpa(y_array_alt(end) )
[Unew]=Runge_Kutta(0,y_arr,frhs,hc)
all_y_values(m)=Unew;
[Unew_alt]=Runge_Kutta(0,y_arr_2,frhs,hc/2);
[Unew_alt]=Runge_Kutta(hc/2,Unew_alt,frhs,hc/2)
c=vpa( abs(Unew_alt-Unew) )
hc=abs(tol/c)^0.2*hc
end
end
end
all_y_values
A better structure for the time loop has only one place where the time step is computed.
x_array = [x0]; y_array = [y0]; h = h_init;
x = x0; y = y0;
while x < x_end
[y_new, err] = RK4_step_w_error(x,y,rhs,h);
factor = abs(tol/err)^0.2;
if factor >= 1
y = y_new; y_array(end+1) = y;
x = x+h; x_array(end+1) = x;
end
h = factor*h;
end
For the data given in the code
rhs = @(x,y) 20*cos(x);
x0 = 0; y0 = 1; x_end = 6.5; tol = 1e-3; h_init = 1;
this gives the result against the exact solution
The computed points lie exactly on the exact solution; for the segments between them one would need to use a "dense output" interpolation. Or, as a first improvement, just include the middle value from the half-step computation.
function [ y_next, err] = RK4_step_w_error(x,y,rhs,h)
y2 = RK4_step(x,y,rhs,h);
y1 = RK4_step(x,y,rhs,h/2);
y1 = RK4_step(x+h/2,y1,rhs,h/2);
y_next = y1;
err = (y2-y1)/15;
end
function y_next = RK4_step(x,y,rhs,h)
k1 = h*rhs(x,y);
k2 = h*rhs(x+h/2,y+k1/2);
k3 = h*rhs(x+h/2,y+k2/2);
k4 = h*rhs(x+h,y+k3);
y_next = y + (k1+2*k2+2*k3+k4)/6;
end
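The factor 15 in the error estimate follows from a short Richardson-type argument. The sketch below uses $y_h$ for the result of one step of size $h$ (y2 in the code) and $y_{h/2}$ for the result of two steps of size $h/2$ (y1 in the code), and assumes that the same leading error constant $C$ applies to both half steps, which holds to leading order.

```latex
\begin{align*}
y_h     &= y(x+h) + C h^5 + O(h^6) \\
y_{h/2} &= y(x+h) + 2\,C \left(\tfrac{h}{2}\right)^5 + O(h^6)
         = y(x+h) + \tfrac{1}{16}\, C h^5 + O(h^6) \\
y_h - y_{h/2} &= \tfrac{15}{16}\, C h^5 + O(h^6) \\
y_{h/2} - y(x+h) &\approx \frac{y_h - y_{h/2}}{15}
\end{align*}
```

So (y2-y1)/15 estimates the error of the more accurate value y1, and subtracting that estimate from y1 removes the leading error term.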
Revision 1
The error returned is the actual step error. The error that is required for the step size control, however, is the unit step error or error density, which is the step error divided by h:
function [ y_next, err] = RK4_step_w_error(x,y,rhs,h)
y2 = RK4_step(x,y,rhs,h);
y1 = RK4_step(x,y,rhs,h/2);
y1 = RK4_step(x+h/2,y1,rhs,h/2);
y_next = y1;
err = (y2-y1)/15/h;
end
Changing the example to a simple bi-stable model oscillating between two branches of stable equilibria
rhs = @(x,y) 3*y-y^3 + 3*cos(x);
x0 = 0; y0 = 1; x_end = 13.5; tol = 5e-3; h_init = 5e-2;
gives plots of solution, error (against an ode45 integration) and step sizes
Red crosses are the step sizes of rejected steps.
Revision 2
The error in the function values can be used as an error guide for the extrapolation value, which is of 5th order, making the method a 5th order method in extrapolation mode. As it uses the 4th order error to predict the 5th order optimal step size, a caution factor is recommended. The code changes in the appropriate places to
factor = 0.75*abs(tol/err)^0.2;
...
function [ y_next, err] = RK4_step_w_error(x,y,rhs,h)
y2 = RK4_step(x,y,rhs,h);
y1 = RK4_step(x,y,rhs,h/2);
y1 = RK4_step(x+h/2,y1,rhs,h/2);
y_next = y1+(y1-y2)/15;
err = (y1-y2)/15;
end
In the plots the step size is appropriately larger, but the error shows sharper and larger spikes; this version of the method is apparently less stable.

Thresholding a signal using hysteresis to reduce noise

I'm using simple thresholding on noisy data to detect zeros/edges in signals. Since the noise can be pretty strong, I'm using hysteresis to improve the results. While this helps a lot, it also slows things down a lot. Is there a way to improve this, or maybe even a better way to calculate it? Currently I'm using the straightforward loop approach:
% generate a signal
t = linspace(0, 10, 1000);
y = sin(2 * pi * t);
% constants
threshold = 0;
hyst = 0.1;
% thresholding
yth = zeros(size(y));
for i = 2:length(y)
if yth(i - 1) > 0.5
yth(i) = y(i) > (threshold - hyst);
else
yth(i) = y(i) > (threshold + hyst);
end
end
In comparison to yth = y > threshold this is much slower.
An improvement (25% time reduction; only for large inputs; on R2017b) can be gained by precomputing both possibilities for yth(i):
function yth = idea1()
yth = false(size(y)); % change #1 - `false` vs `zeros`
c1 = y > (th - hyst); % change #2 - precompute c1, c2
c2 = y > (th + hyst);
for k = 2:numel(y)
if yth(k - 1)
yth(k) = c1(k);
else
yth(k) = c2(k);
end
end
end
Usually, a big improvement can be gained by vectorization. To vectorize this problem we must understand the switching logic, so let us put it into words:
Start from Mode 2.
Mode 1: Take a value from c1. As long as c1 contains true values - take those. When a false value is encountered switch to Mode 2.
Mode 2: Take a value from c2. As long as c2 contains false values - take those. When a true value is encountered switch to Mode 1.
So if we could find the transition locations, we are practically done.
After some trial and error, while I was unable to get rid of the loop in a way that improves performance, I did reach the conclusion that it was possible (idea2). Furthermore, looking at the RLE-encoded correct yth, I came up with some pretty good approximations for it (idea3 and idea4) - though these will require tweaking for other inputs.
Perhaps somebody could use it to find a more clever implementation with fewer redundant computations. My full code is provided below. The RLE encoding algorithm was adapted from this answer, and RLE decoding from here.
function q48637952
% generate a signal
t = linspace(0, 10, 1000).';
y = sin(2 * pi * t);
% constants
th = 0;
hyst = 0.1;
%% Comparison:
% Correctness:
R = {originalIdea(), idea1(), idea2()};
assert(isequal(R{:}));
% Runtime:
T = [timeit(@originalIdea,1), timeit(@idea1,1), timeit(@idea2,1)];
disp(T);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function yth = originalIdea()
yth = zeros(size(y));
for i = 2:length(y)
if yth(i - 1) > 0.5
yth(i) = y(i) > (th - hyst);
else
yth(i) = y(i) > (th + hyst);
end
end
end
function yth = idea1()
yth = false(size(y)); % change #1 - `false` vs `zeros`
c1 = y > (th - hyst); % change #2 - precompute c1, c2
c2 = y > (th + hyst);
for k = 2:numel(y)
if yth(k - 1)
yth(k) = c1(k);
else
yth(k) = c2(k);
end
end
end
function yth = idea2()
c = [y > (th - hyst), y > (th + hyst)];
% Using Run-length encoding:
[v1,l1] = rlEncode(c(:,1));
[v2,l2] = rlEncode(c(:,2));
rle = cat(3,[v1 l1 cumsum(l1)],[v2 l2 cumsum(l2)]);
% rle(:,:,2) is similar to rle(:,:,1) with small changes:
% - col1 is circshifted by 1 and
% - col2 has the 1st and last elements switched
newenc = reshape([rle(1:2:end,3,2), rle(1:2:end,3,1)].', [], 1);
yth = rlDecode(rle(:,1,2), [newenc(1); diff(newenc(1:end-1))]);
end
function yth = idea3()
% Approximation of yth, should be almost indistinguishable with high resolution
yth = [0 0 repmat(repelem([1,0],50), 1, ceil(numel(y)/(2*50)) )].';
% The amount of zeros at the beginning as well as the value "50" are problem-specific
% and need to be computed using signal-specific considerations
yth = yth(1:1000);
end
function yth = idea4()
% Another approximation
yth = circshift(y > th, 1);
% The value "1" is problem-specific
end
end % q48637952
%% RLE (de)compression:
function [vals, lens] = rlEncode(vec)
J = find(diff([vec(1)-1; vec(:)]));
vals = vec(J);
lens = diff([J; numel(vec)+1]);
end
function vec = rlDecode(vals, lens)
l = cumsum([ 1; lens(:) ]);
z = zeros(1, l(end)-1);
z(l(1:end-1)) = 1;
vec = vals(cumsum(z));
end
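The switching logic described earlier can also be written without a loop by noting that only samples outside the hysteresis band change the state, while samples inside it keep the previous state. The sketch below is one possible formulation, not part of the code above: the names setEv, resetEv, ev and vals are made up for illustration, it assumes hyst >= 0 and no NaN values, and it has not been timed, so no speed-up is claimed.

```matlab
% Same test signal and constants as in the question.
t = linspace(0, 10, 1000).';
y = sin(2 * pi * t);
th = 0; hyst = 0.1;

% Reference: loop with precomputed comparisons (idea1 above).
ref = false(size(y));
c1 = y > (th - hyst);
c2 = y > (th + hyst);
for k = 2:numel(y)
    if ref(k - 1)
        ref(k) = c1(k);
    else
        ref(k) = c2(k);
    end
end

% Loop-free variant: only samples outside the hysteresis band change the state.
setEv   = y > (th + hyst);        % forces the output to true
resetEv = y <= (th - hyst);       % forces the output to false
setEv(1) = false;                 % the loop never assigns element 1 ...
resetEv(1) = true;                % ... so treat it as a reset to false
ev   = setEv | resetEv;           % positions where the state is (re)defined
vals = setEv(ev);                 % state chosen at each of those positions
yth  = vals(cumsum(double(ev)));  % carry the most recent state forward

assert(isequal(yth, ref));
```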

Comparing the function fminunc with the BFGS method for logistic regression

I'm constructing an algorithm that uses the BFGS method to find the parameters in a logistic regression for a binary dataset in Octave.
Now I'm struggling with something I believe is an overfitting problem. I ran the algorithm on several datasets and it actually converges to the same results as Octave's fminunc function. However, for a specific type of dataset the algorithm converges to very high parameter values, in contrast to fminunc, which gives reasonable values for these parameters. After adding a regularization term, my algorithm did converge to the same values as fminunc.
This specific type of dataset has data that can be completely separated by a straight line. My question is: why is this a problem for the BFGS method but not for fminunc? How does this function avoid the issue without regularization? Could I implement this in my algorithm?
The code of my algorithm is the following:
function [beta] = Log_BFGS(data, L_0)
clc
close
%************************************************************************
%************************************************************************
%Loading the data:
[n, e] = size(data);
d = e - 1;
n; %Number of observations.
d; %Number of features.
Y = data(:, e); %Labels' values
X_o = data(:, 1:d);
X = [ones(n, 1) X_o]; %Features values
%Initials conditions:
beta_0 = zeros(e, 1);
beta = [];
beta(:, 1) = beta_0;
N = 600; %Max iterations
Tol = 1e-10; %Tolerance
error = .1;
L = L_0; %Regularization parameter
B = eye(e);
options = optimset('GradObj', 'on', 'MaxIter', 600);
[beta_s] = fminunc(@(t)(costFunction(t, X, Y, L)), beta_0, options);
disp('Beta obtained with the fminunc function');
disp("--------------");
disp(beta_s)
k = 1;
a_0 = 1;
% Define the sigmoid function
h = inline('1.0 ./ (1.0 + exp(-z))');
while (error > Tol && k < N)
beta_k = beta(:, k);
x_0 = X*beta_k;
h_0 = h(x_0);
beta_r = [0 ; beta(:, k)(2:e, :)];
g_k = ((X)'*(h_0 - Y) + L*beta_r)/n;
d_k = -pinv(B)*g_k;
a = 0.1; %I'll implement an Armijo line search here (soon)
beta(:, k+1) = beta(:, k) + a*d_k;
beta_k_1 = beta(:, k+1);
x_1 = X*beta_k_1;
h_1 = h(x_1);
beta_s = [0 ; beta(:, k+1)(2:e, :)];
g_k_1 = (transpose(X)*(h_1 - Y) + L*beta_s)/n;
s_k = beta(:, k+1) - beta(:, k);
y_k = g_k_1 - g_k;
B = B - B*s_k*s_k'*B/(s_k'*B*s_k) + y_k*y_k'/(s_k'*y_k);
k = k + 1;
error = norm(d_k);
endwhile
%Accuracy of the logistic model:
p = zeros(n, 1);
for j = 1:n
if (1./(1. + exp(-1.*(X(j, :)*beta(:, k)))) >= 0.5)
p(j) = 1;
else
p(j) = 0;
endif
endfor
R = mean(double(p == Y));
beta = beta(:, k);
%Showing the results:
disp("Estimation of logistic regression model Y = 1/(1 + e^(beta*X)),")
disp("using the algorithm BFGS =")
disp("--------------")
disp(beta)
disp("--------------")
disp("with a convergence error in the last iteration of:")
disp(error)
disp("--------------")
disp("and a total number of")
disp(k-1)
disp("iterations")
disp("--------------")
if k == N
disp("The maximum number of iterations was reached before obtaining the desired error")
else
disp("The desired error was reached before reaching the maximum of iterations")
endif
disp("--------------")
disp("The precision of the logistic regression model is given by (max 1.0):")
disp("--------------")
disp(R)
disp("--------------")
endfunction
The results I got for the dataset are shown in the following picture. If you need the data used in this situation, please let me know.
[Image: Results of the algorithm]
Check the objectives!
The values of the solution vector are nice, but the whole optimization is driven by the objective. You say that fminunc gives "reasonable" values for these parameters, but "reasonable" is not defined within this model.
It is quite possible that both your low-value and your high-value solutions yield pretty much the same objective value. And that is all these solvers care about (when no regularization term is used).
So the important question is: is there a unique solution (which would rule out these results)? Only when your dataset has full rank! So maybe your data is rank-deficient and you obtain two equally good solutions. Of course there might be slight differences due to numerical issues, which are always a source of errors, especially in more complex optimization algorithms.
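To make "check the objectives" concrete, the following sketch evaluates the unregularized logistic cost for several parameter vectors. The toy data and all names are hypothetical and are not the dataset from the question. On this linearly separable toy set, scaling up a separating parameter vector keeps lowering the cost, so solutions with very different magnitudes can have almost the same, tiny objective value.

```matlab
% Toy data that a straight line separates perfectly (illustrative assumption).
X = [ones(4, 1), [-2; -1; 1; 2]];   % intercept column plus one feature
Y = [0; 0; 1; 1];

% Unregularized logistic cost, written with the signs s = 2*Y - 1
% so that it can be evaluated stably with log1p.
s = 2 * Y - 1;
cost = @(b) mean(log1p(exp(-s .* (X * b))));

b = [0; 1];                          % one separating parameter vector
costs = [cost(b), cost(10 * b), cost(100 * b)];
disp(costs);
```

Evaluating the same cost at the parameters returned by fminunc and by the BFGS implementation shows directly whether the two solutions really differ in terms of the objective.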

How does an integral image influence the result of local binary pattern or center-symmetric local binary pattern?

I know this is not strictly about code errors and development, but I would like to know if someone can understand this code for the integral image and local binary pattern, and tell me how it affects the resulting histograms.
Before using the integral image the output histogram is normal, but after applying the integral image method I found that most of the histogram changed to zeros. To clarify, the expected benefit of using an integral image is to speed up the LBP method. I haven't seen this before because I'm trying it for the first time. Could anybody who knows about this help me, please?
This is the code for each method:
Integral image
function [outimg] = integral( image )
[y,x] = size(image);
outimg = zeros(y+1,x+1);
disp(y);
for a = 1:y+1
for b = 1:x+1
rx = b-1;
ry = a-1;
while ry>=1
while rx>=1
outimg(a,b) = outimg(a,b)+image(ry,rx);
rx = rx-1;
end
rx = b-1;
ry = ry-1;
end
% outimg(a,b) = outimg(a,b)-image(a,b);
end
end
% outimg(1,1) = image(1,1);
disp('end loop');
end
CS-LBP
function h = CSLBP(I)
%% this function takes patch or image as input and return Histogram of
%% CSLBP operator.
h = zeros(1,16);
[y,x] = size(I);
T = 0.1; % threshold given by authors in their paper
for i = 2:y-1
for j = 2:x-1
% keeping I(j,i) as center we compute CSLBP
% N0 - N4
a = ((I(i,j+1) - I(i, j-1) > T ) * 2^0 );
b = ((I(i+1,j+1) - I(i-1, j-1) > T ) * 2^1 );
c = ((I(i+1,j) - I(i-1, j) > T ) * 2^2 );
d = ((I(i+1,j-1) - I(i - 1, j + 1) > T ) * 2^3 );
e = a+b+c+d;
h(e+1) = h(e+1) + 1;
end
end
end
MATLAB has a built-in function for creating integral images, integralImage(). If you don't want to use the Computer Vision System Toolbox, you can achieve the same result by calling:
IntIm = cumsum(cumsum(double(I)),2);
Add padding if needed. You should also check that the image is not saturated, which sometimes happens: the cumulative sum quickly grows far beyond the range of uint8 and uint16. I even had it happen with a double once!
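As a sketch of the cumsum approach with padding, the following builds a zero-padded integral image (the same layout as the (y+1)-by-(x+1) output of the integral function in the question) and uses it to sum a rectangular block with four look-ups. The test image and the block indices are made up for illustration.

```matlab
I = uint8(magic(4) * 15);            % small test image (illustrative values)

% Zero-padded integral image: S(a, b) holds the sum of I(1:a-1, 1:b-1).
S = zeros(size(I) + 1);
S(2:end, 2:end) = cumsum(cumsum(double(I), 1), 2);

% Sum over the block I(r1:r2, c1:c2) from four look-ups.
r1 = 2; r2 = 3; c1 = 2; c2 = 4;
boxSum = S(r2 + 1, c2 + 1) - S(r1, c2 + 1) - S(r2 + 1, c1) + S(r1, c1);

% Compare with the direct sum, computed in double to avoid uint8 saturation.
directSum = sum(sum(double(I(r1:r2, c1:c2))));
assert(boxSum == directSum);
```

Note that the integral image contains block sums, not pixel intensities, so an operator such as CS-LBP with a fixed threshold will behave very differently if it is applied to S directly instead of to values looked up through S.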

MATLAB - Exponential Curve Fitting without Toolbox

I have data points (x, y) that I need to fit an exponential function to,
y = A + B * exp(C * x),
but I can neither use the Curve Fitting Toolbox nor the Optimization Toolbox.
User rayryeng was good enough to help me with working code:
x = [0 0.0036 0.0071 0.0107 0.0143 0.0178 0.0214 0.0250 0.0285 0.0321 0.0357 0.0392 0.0428 0.0464 0.0464];
y = [1.3985 1.3310 1.2741 1.2175 1.1694 1.1213 1.0804 1.0395 1.0043 0.9691 0.9385 0.9080 0.8809 0.7856 0.7856];
M = [ones(numel(x),1), x(:)]; %// Ensure x is a column vector
lny = log(y(:)); %// Ensure y is a column vector and take ln
X = M\lny; %// Solve for parameters
A = exp(X(1)); %// Solve for A
b = X(2); %// Get b
xval = linspace(min(x), max(x));
yval = A*exp(b*xval);
plot(x,y,'r.',xval,yval,'b');
However, this code only fits the equation without offset
y = A * exp(B * x).
How can I extend this code to fit the three-parameter equation?
In another attempt, I managed to fit the function using fminsearch:
function [xval, yval] = curve_fitting_exponential_1_optimized(x,y,xval)
start_point = rand(1, 3);
model = @expfun;
est = fminsearch(model, start_point);
function [sse, FittedCurve] = expfun(params)
A = params(1);
B = params(2);
C = params(3);
FittedCurve = A + B .* exp(-C * x);
ErrorVector = FittedCurve - y;
sse = sum(ErrorVector .^ 2);
end
yval = est(1)+est(2) * exp(-est(3) * xval);
end
The problem here is that the result depends on the randomly chosen starting point, so I don't get a stable solution. But since I need the function for automation, I need something stable. How can I get a stable solution?
How to adapt rayryeng's code for three parameters?
rayryeng used the strategy to linearize a nonlinear equation so that standard regression methods can be applied. See also Jubobs' answer to a similar question.
This strategy no longer works if there is a non-zero offset A. We can fix the situation by getting a rough estimate of the offset. As rubenvb mentioned in the comments, we could estimate A by min(y), but then the logarithm gets applied to a zero. Instead, we could leave a bit of space between our guess of A and the minimum of the data, say half its range. Then we subtract A from the data and use rayryeng's method:
x = x(:); % bring the data into the standard, more
y = y(:); % convenient format of column vectors
Aguess = min(y) - (max(y) - min(y)) / 2;
guess = [ones(size(x)), -x] \ log(y - Aguess);
Bguess = exp(guess(1));
Cguess = guess(2);
For the given data, this results in
Aguess = 0.4792
Bguess = 0.9440
Cguess = 21.7609
Unlike in the two-parameter case, we cannot expect this to be a good fit. Its SSE is 0.007331.
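The guess rests on the following linearization, written here for the model with the sign convention used in expfun, y = A + B * exp(-C * x). It is only valid where y - A > 0, which is why the guess for A is placed below min(y).

```latex
\begin{align*}
y &= A + B\, e^{-C x} \\
y - A &= B\, e^{-C x} \\
\ln(y - A) &= \ln B - C\, x
\end{align*}
```

With A fixed at Aguess, this is a linear least-squares problem in the unknowns ln B and C, which is what the regression of log(y - Aguess) on the columns [1, -x] solves; Bguess is then recovered with exp.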
How to get a stable solution?
This guess is however useful as a starting point for the nonlinear optimization:
start_point = [Aguess, Bguess, Cguess];
est = fminsearch(@expfun, start_point);
Aest = est(1);
Best = est(2);
Cest = est(3);
Now the optimization arrives at a stable estimate, because the computation is deterministic:
Aest = -0.1266
Best = 1.5106
Cest = 10.2314
The SSE of this estimate is 0.004041.
This is what the data (blue dots) and fitted curves (green: guess, red: optimized) look like:
Here is the whole function in all its glory - special thanks to A. Donda!
function [xval, yval] = curve_fitting_exponential_1_optimized(x,y,xval)
x = x(:); % bring the data into the standard, more convenient format of column vectors
y = y(:);
Aguess = min(y) - (max(y)-min(y)) / 2;
guess = [ones(size(x)), -x] \ log(y - Aguess);
Bguess = exp(guess(1));
Cguess = guess(2);
start_point = [Aguess, Bguess, Cguess];
est = fminsearch(@expfun, start_point);
function [sse, FittedCurve] = expfun(params)
A = params(1);
B = params(2);
C = params(3);
FittedCurve = A + B .* exp(-C * x);
ErrorVector = FittedCurve - y;
sse = sum(ErrorVector .^ 2);
end
yval = est(1)+est(2) * exp(-est(3) * xval);
end