Creating and training a perceptron in MATLAB for gender classification

I am coding a perceptron to learn to categorize gender in pictures of faces. I am very very new to MATLAB, so I need a lot of help. I have a few questions:
I am trying to code for a function:
function [y] = testset(x,w)
%y = sign(sigma(x*w-threshold))
where y is the predicted result, x is the training/testing set given as a very large matrix, and w is the weight on the equation. The part after the % is what I am trying to write, but I do not know how to express it in MATLAB code. Any ideas out there?
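For what it's worth, here is a minimal sketch of that comment line in MATLAB (the threshold value of 0 is an assumption of mine; see the discussion of the bias in the answers below):
function y = testset(x,w)
% x: N-by-d matrix of samples (one per row), w: d-by-1 weight vector
% x*w already performs the sum over weighted inputs (the sigma in the comment)
threshold = 0; % assumed placeholder; this is really the bias (see answers below)
y = sign(x*w - threshold); % one +1/-1 prediction per row
end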
I am trying to code a second function:
function [err] = testerror(x,w,y)
%err = sigma(max(0,-w*x*y))
w, x, and y have the same values as stated above, and err is my function of error, which I am trying to minimize through the steps of the perceptron.
I am trying to create a step in my perceptron to lower the error rate by using gradient descent on my original equation. Does anyone know how I can increment w using gradient descent in order to minimize the error function, using an if/then statement?
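For reference, a hedged sketch of the classic perceptron-style update (the learning rate eta is my assumption, and it presumes X holds one sample per row with true labels Y in {-1,+1}; it is not the asker's code):
eta = 0.1; % assumed learning rate
for j = 1:size(X,1)
pred = sign(X(j,:)*w); % current prediction for sample j
if pred ~= Y(j) % misclassified?
w = w + eta * Y(j) * X(j,:)'; % move w toward the correct side
end
end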
I can put up the code I have up till now if that would help you answer these questions.
Thank you!
edit--------------------------
OK, so I am still working on the code for this, and would like to put it up when I have something more complete. My biggest question right now is:
I have the following function:
function [y] = testset(x,w)
y = sign(sum(x*w-threshold))
Now I know that I am supposed to put a threshold in, but cannot figure out what I am supposed to use as the threshold! Any ideas out there?
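One common way to sidestep choosing a fixed threshold (a sketch, assuming you are free to change the data layout) is to absorb it into the weights by appending a constant 1 to every sample, so that the last learned weight plays the role of the threshold:
Xaug = [X, ones(size(X,1),1)]; % append a bias column of ones
w = zeros(size(Xaug,2),1); % the last entry of w acts as -threshold
y = sign(Xaug*w); % no explicit threshold needed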
edit----------------------------
This is what I have so far. Changes still need to be made, but I would appreciate input, especially regarding structure, and advice on the changes that are needed!
function [y] = Perceptron_Aviva(X,w)
y = sign(sum(X*w-1));
end
function [err] = testerror(X,w,y)
err = sum(max(0,-w*X*y));
end
%function [w] = perceptron(X,Y,w_init)
%w = w_init;
%end
%------------------------------
% input samples
X = X_train;
% output class [-1,+1];
Y = y_train;
% init weight vector
w_init = zeros(size(X,1),1); % column vector (zeros(n) alone would create an n-by-n matrix)
w = w_init;
%---------------------------------------------
loopcounter = 0;
err = 1; % initialize so the while condition can be evaluated on the first pass
while abs(err) > 0.1 && loopcounter < 100
loopcounter = loopcounter + 1;
for j=1:size(X,1)
approx_y(j) = Perceptron_Aviva(X(j),w(j))
err = testerror(X(j),w(j),approx_y(j))
if err > 0 %wrong (structure is correct, test is wrong)
w(j) = w(j) - 0.1 %wrong
elseif err < 0 %wrong
w(j) = w(j) + 0.1 %wrong
end
% -----------
% if sign(w'*X(:,j)) ~= Y(j) %wrong decision?
% w = w + X(:,j) * Y(j); %then add (or subtract) this point to w
end
end

You can read this question I asked some time ago.
It uses MATLAB code and a perceptron function:
function [w] = perceptron(X,Y,w_init)
w = w_init;
for iteration = 1 : 100 %<- in practice, use some stopping criterion!
for ii = 1 : size(X,2) %cycle through training set
if sign(w'*X(:,ii)) ~= Y(ii) %wrong decision?
w = w + X(:,ii) * Y(ii); %then add (or subtract) this point to w
end
end
sum(sign(w'*X)~=Y)/size(X,2) %show misclassification rate
end
and it is called from code (@Itamar Katz) like this (random data):
% input samples
X1=[rand(1,100);rand(1,100);ones(1,100)]; % class '-1'
X2=[rand(1,100);1+rand(1,100);ones(1,100)]; % class '+1'
X=[X1,X2];
% output class [-1,+1];
Y=[-ones(1,100),ones(1,100)];
% init weight vector
w=[.5 .5 .5]';
% call perceptron
wtag=perceptron(X,Y,w);
% predict
ytag=wtag'*X;
% plot prediction over original data
figure;hold on
plot(X1(1,:),X1(2,:),'b.')
plot(X2(1,:),X2(2,:),'r.')
plot(X(1,ytag<0),X(2,ytag<0),'bo')
plot(X(1,ytag>0),X(2,ytag>0),'ro')
legend('class -1','class +1','pred -1','pred +1')
I guess this can give you an idea of how to build the functions you described.
For the error, compare the expected result with the actual result (the class).
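For example, with the same layout as the snippet above (one sample per column of X), the comparison is a one-liner:
misclassification_rate = sum(sign(w'*X) ~= Y) / size(X,2); % fraction of wrong predictions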

Assume your dataset is X, the datapoints, and Y, the labels of the classes.
f=newp(X,Y)
creates a perceptron.
If you want to create an MLP then:
f=newff(X,Y,NN)
where NN is the network architecture, i.e. an array that designates the number of neurons at each hidden layer. For example
NN=[5 3 2]
will correspond to a network with 5 neurons at the first hidden layer, 3 at the second, and 2 at the third.

Well, what you call the threshold is the bias in machine learning nomenclature. It should be left as an input for the user because it is used during training.
Also, I wonder why you are not using the built-in MATLAB functions, i.e. newp or newff, e.g.
ff=newp(X,Y)
Then you can set the properties of the object ff to do the job, e.g. selecting gradient descent and so on.
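For instance, something along these lines (a sketch using the old Neural Network Toolbox API; newer releases replace newp/newff with perceptron/feedforwardnet):
ff = newp(X,Y); % create the perceptron
ff.trainParam.epochs = 100; % example property: cap the number of training epochs
ff = train(ff,X,Y); % train on the data
pred = sim(ff,X); % predict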

Related

How to interpret the regression plot obtained at the end of neural network regression for multiple outputs?

I have trained my neural network model using the MATLAB NN Toolbox. My network has multiple inputs and multiple outputs, 6 and 7 respectively, to be precise. I would like to clarify a few questions based on it:
The final regression plot shown at the end of the training shows a very good accuracy, R~0.99. However, since I have multiple outputs, I am confused as to which scatter plot it represents. Shouldn't we have 7 target-vs-predicted plots, one for each output variable?
To my knowledge, R^2 is a better measure of the accuracy of the model, whereas MATLAB reports R in its plot. Do I treat that R as R^2, or should I square the reported R value to obtain R^2?
I have generated the MATLAB script containing the weights, biases and activation functions as the final result of the training. So shouldn't I be able to simply give my raw data as input and obtain the corresponding predicted output? I gave the exact same training set, using the indices MATLAB chose for training (to cross-check), and plotted the predicted output vs the actual output, but the result is not good at all. Definitely not along the lines of R~0.99. Am I doing anything wrong?
code:
function [y1] = myNeuralNetworkFunction_2(x1)
%MYNEURALNETWORKFUNCTION neural network simulation function.
% X = [torque T_exh lambda t_Spark N EGR];
% Y = [O2R CO2R HC NOX CO lambda_out T_exh2];
% Generated by Neural Network Toolbox function genFunction, 17-Dec-2018 07:13:04.
%
% [y1] = myNeuralNetworkFunction(x1) takes these arguments:
% x = Qx6 matrix, input #1
% and returns:
% y = Qx7 matrix, output #1
% where Q is the number of samples.
%#ok<*RPMT0>
% ===== NEURAL NETWORK CONSTANTS =====
% Input 1
x1_step1_xoffset = [-24;235.248;0.75;-20.678;550;0.799];
x1_step1_gain = [0.00353982300884956;0.00284355877067267;6.26959247648903;0.0275865874012055;0.000366568914956012;0.0533831576137729];
x1_step1_ymin = -1;
% Layer 1
b1 = [1.3808996210168685;-2.0990163849711894;0.9651733083552595;0.27000953282929346;-1.6781835509820286;-1.5110463684800366;-3.6257438832309905;2.1569498669085361;1.9204156230460485;-0.17704342477904209];
IW1_1 = [-0.032892214008082517 -0.55848270745152429 -0.0063993424771670616 -0.56161004933654057 2.7161844536020197 0.46415317073346513;-0.21395624254052176 -3.1570133640176681 0.71972178875396853 -1.9132557838515238 1.3365248285282931 -3.022721627052706;-1.1026780445896862 0.2324603066452392 0.14552308208231421 0.79194435276493658 -0.66254679969168417 0.070353201192052434;-0.017994515838487352 -0.097682677816992206 0.68844109281256027 -0.001684535122025588 0.013605622123872989 0.05810686279306107;0.5853667840629273 -2.9560683084876329 0.56713425120259764 -2.1854386350040116 1.2930115031659106 -2.7133159265497957;0.64316656469750333 -0.63667017646313084 0.50060179040086761 -0.86827897068177973 2.695456517458648 0.16822164719859456;-0.44666821007466739 4.0993786464616679 -0.89370838440321498 3.0445073606237933 -3.3015566360833453 -4.492874075961689;1.8337574137485424 2.6946232855369989 1.1140472073136622 1.6167763205944321 1.8573696127039145 -0.81922672766933646;-0.12561950922781362 3.0711045035224349 -0.6535751823440773 2.0590707752473199 -1.3267693770634292 2.8782780742777794;-0.013438026967107483 -0.025741311825949621 0.45460734966889638 0.045052447491038108 -0.21794568374100454 0.10667240367191703];
% Layer 2
b2 = [-0.96846557414356171;-0.2454718918618051;-0.7331628718025488;-1.0225195290982099;0.50307202195645395;-0.49497234988401961;-0.21817117469133171];
LW2_1 = [-0.97716474643411022 -0.23883775971686808 0.99238069915206006 0.4147649511973347 0.48504023209224734 -0.071372217431684551 0.054177719330469304 -0.25963474838320832 0.27368380212104881 0.063159321947246799;-0.15570858147605909 -0.18816739764334323 -0.3793600124951475 2.3851961990944681 0.38355142531334563 -0.75308427071748985 -0.1280128732536128 -1.361052031781103 0.6021878865831336 -0.24725687748503239;0.076251356114485525 -0.10178293627600112 0.10151304376762409 -0.46453434441403058 0.12114876632815359 0.062856969143306296 -0.0019628163322658364 -0.067809039768745916 0.071731544062023825 0.65700427778446913;0.17887084584125315 0.29122649575978238 0.37255802759192702 1.3684190468992126 0.60936238465090853 0.21955911453674043 0.28477957899364675 -0.051456306721251184 0.6519451272106177 -0.64479205028051967;0.25743349663436799 2.0668075180209979 0.59610776847961111 -3.2609682919282603 1.8824214917530881 0.33542869933904396 0.03604272669356564 -0.013842766338427388 3.8534510207741826 2.2266745660915586;-0.16136175574939746 0.10407287099228898 -0.13902245286490234 0.87616472446622717 -0.027079111747601223 0.024812287505204988 -0.030101536834009103 0.043168268669541855 0.12172932035587079 -0.27074383434206573;0.18714562505165402 0.35267726325386606 -0.029241400610813449 0.53053853235049087 0.58880054832728757 0.047959541165126809 0.16152268183097709 0.23419456403348898 0.83166785128608967 -0.66765237856750781];
% Output 1
y1_step1_ymin = -1;
y1_step1_gain = [0.114200879346771;0.145581598485951;0.000139011547272197;0.000456244862967996;2.05816254143146e-05;5.27704485488127;0.00284355877067267];
y1_step1_xoffset = [-0.045;1.122;2.706;17.108;493.726;0.75;235.248];
% ===== SIMULATION ========
% Dimensions
Q = size(x1,1); % samples
% Input 1
x1 = x1';
xp1 = mapminmax_apply(x1,x1_step1_gain,x1_step1_xoffset,x1_step1_ymin);
% Layer 1
a1 = tansig_apply(repmat(b1,1,Q) + IW1_1*xp1);
% Layer 2
a2 = repmat(b2,1,Q) + LW2_1*a1;
% Output 1
y1 = mapminmax_reverse(a2,y1_step1_gain,y1_step1_xoffset,y1_step1_ymin);
y1 = y1';
end
% ===== MODULE FUNCTIONS ========
% Map Minimum and Maximum Input Processing Function
function y = mapminmax_apply(x,settings_gain,settings_xoffset,settings_ymin)
y = bsxfun(@minus,x,settings_xoffset);
y = bsxfun(@times,y,settings_gain);
y = bsxfun(@plus,y,settings_ymin);
end
% Sigmoid Symmetric Transfer Function
function a = tansig_apply(n)
a = 2 ./ (1 + exp(-2*n)) - 1;
end
% Map Minimum and Maximum Output Reverse-Processing Function
function x = mapminmax_reverse(y,settings_gain,settings_xoffset,settings_ymin)
x = bsxfun(@minus,y,settings_ymin);
x = bsxfun(@rdivide,x,settings_gain);
x = bsxfun(@plus,x,settings_xoffset);
end
The above is the automatically generated code. The plot I generated to cross-check the first variable is below:
% X and Y are input and output - same as above
X_train = X(results.info1.train.indices,:);
y_train = Y(results.info1.train.indices,:);
out_train = myNeuralNetworkFunction_2(X_train);
scatter(y_train(:,1),out_train(:,1))
To answer your question about R: Yes, you should square R to get the R^2 value. In this case, they will be very close since R is very close to 1.
The graphs give the correlation between the estimated and real (target) values, so R is the strength of the correlation. You can square it to find the R-square.
The graph you drew and the one MATLAB gave are not graphs of the same variables; the ranges and scales of the axes are very different.
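For a single output variable this is easy to check directly (a sketch, reusing y_train and out_train from the plotting snippet above):
R = corrcoef(y_train(:,1),out_train(:,1)); % 2-by-2 correlation matrix
R2 = R(1,2)^2; % square the off-diagonal R to get R^2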
First of all, is the problem you are trying to solve a regression problem? Or is it a classification problem with 7 classes converted to numeric? I assume this is a classification problem, as you are trying to get the success rate for each class.
As for your first question: according to the literature, it is recommended to use the "All: R" value. If you want the success rate for each of your classes, you need to compute the metrics that are valid for classification problems: precision, recall, F-measure, FP rate, TP rate, etc. There is plenty of MATLAB documentation on this (help roc) where you can look at the details. All the values I mentioned, which I think are what you actually want, are obtained from the confusion matrix.
There is a good example of this.
[x,t] = simpleclass_dataset;
net = patternnet(10);
net = train(net,x,t);
y = net(x);
[c,cm,ind,per] = confusion(t,y)
I hope you will see what you want from the "nntraintool" window that appears when you run the code.
Your other questions have already been answered. Alternatively, you can consider using a machine learning algorithm with open source software such as Weka.

Input equations into Matlab for Simulink Function

I am currently working on an assignment where I need to create two different controllers in Matlab/Simulink for a robotic exoskeleton leg. The idea behind this is to compare both of them and see which controller is better at assisting a human wearing it. I am having a lot of trouble putting specific equations into a Matlab function block to then run in Simulink to get results for an AFO (adaptive frequency oscillator). The link has the equations I'm trying to put in and the following is the code I have so far:
function [pos_AFO, vel_AFO, acc_AFO, offset, omega, phi, ampl, phi1] = LHip(theta, eps, nu, dt, AFO_on)
t = 0;
% syms j
% M = 6;
% j = sym('j', [1 M]);
if t == 0
omega = 3*pi/2;
theta = 0;
phi = pi/2;
ampl = 0;
else
omega = omega*(t-1) + dt*(eps*offset*cos(phi1));
theta = theta*(t-1) + dt*(nu*offset);
phi = phi*(t-1) + dt*(omega + eps*offset*cos(phi*core(t-1)));
phi1 = phi*(t-1) + dt*(omega + eps*offset*cos(phi*core(t-1)));
ampl = ampl*(t-1) + dt*(nu*offset*sin(phi));
offset = theta - theta*(t-1) - sym(ampl*sin(phi), [1 M]);
end
pos_AFO = (theta*(t-1) + symsum(ampl*(t-1)*sin(phi* (t-1))))*AFO_on; %symsum needs input argument for index M and range
vel_AFO = diff(pos_AFO)*AFO_on;
acc_AFO = diff(vel_AFO)*AFO_on;
end
https://www.pastepic.xyz/image/pg4mP
Essentially, I don't know how to do the subscripts, the sigma, or the (t+1) function. Any help is appreciated, as this is due next week.
You are looking to find the result of an adaptive process, so your algorithm needs to consider time as it progresses. There is no (t-1) operator as such; it is just mathematical notation telling you that you need to reuse an old value to calculate a new value.
omega_old = 0;
theta_old = 0;
% initialize the rest of your variables
for t = 1:N
omega(t) = omega_old + 0; % <- replace 0 with the rest of your omega calculation
theta(t) = theta_old + 0; % <- replace 0 with the rest of your theta calculation
% more code .....
% remember your old values for the next iteration
omega_old = omega(t);
theta_old = theta(t);
end
I think you forgot to apply the modulo operation to phi judging by the original formula you linked. As a general rule, design your code in small pieces, make sure the output of each piece makes sense and then combine all pieces and make sure the overall result is correct.
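The wrap-around could look like this inside the loop (a sketch; dt*omega_old stands in for whatever your full phase update term is):
phi(t) = mod(phi_old + dt*omega_old, 2*pi); % keep the phase in [0, 2*pi)
phi_old = phi(t);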

How to generate the desired oscillation graph? [MATLAB]

I have a mathematical equation that describes a dynamical system as
The parameters are defined as follows
k1=1; S=1; Kd=1; p=2; tau=10; k2=1; ET=1; Km=1;
I coded the system as
y(1) = 1; % based on the y-axes starting point in the last figure
y(2) = y(1) + k1*S*Kd^p/(Kd^p + y(1)^p) - k2*ET*y(1)/(Km + y(1)); % to avoid errors
for t=1:100
y(t+1) = y(t+1) + (k1*S*Kd^p/(Kd^p + y(t)^p) - k2*ET*y(t+1)/(Km + y(t+1)));
end
plot(y);
Note that, for simplicity, I did not use tau=10 and instead used a delay of 1 rather than 10 (because I am not sure how to insert a delay of 10).
And obtained the following result
However, I need to obtain this
Can anyone help me rectify the mistake in my code?
Thanking you in advance.
If we assume that Y(t) = 0 for t < 0, then your code could be modified to produce a similar plot. However, it looks like the plot you are looking to generate uses different initial conditions. If you're just looking to measure Tc, then it appears that the signal stabilizes with the period you're looking for.
k1=1; S=1; Kd=1; p=2; tau=10; k2=1; ET=1; Km=1;
% time step size (tau MUST be divisible by dt to ensure proper array indexing)
dt = 0.01;
% time series
t = -10:dt:100;
% initialize y to all zeros so that y(t)=0 for all t<0 (initial condition)
y = zeros(size(t));
% Find starting and ending indexes to iterate from t=0 to t=100-dt
idx0 = find(t == 0);
idx1 = numel(t)-1;
% initial condition y(0) = 1
y(idx0) = 1;
for n = idx0:idx1
% The indexing used here ensures the following equivalences.
% y(n+1) = y(t+dt)
% y(n) = y(t)
% y(n - round(tau/dt)) = y(t-tau)
%
% Note that (y(t+dt)-y(t))/dt is approximately y'(t)
% Solving for y(t+dt) we get the following formula
y(n+1) = y(n) + dt*((k1*S*Kd^p/(Kd^p + y(n - round(tau/dt))^p) - k2*ET*y(n)/(Km + y(n))));
end
% plot y(t) for t > 0
plot(t(t>0),y(t>0));
Result
Seeing as things stabilize, we can take the values from one of the periods and use those as the initial conditions.
Edit: To elaborate, the function contains a delay of 10 which means that instead of just a single initial condition at y(0), we also need to initialize all values from t=-10 to 0. In the code posted in this answer I arbitrarily assumed that y(t) = 0 for t < 0 and y(0) = 1 because I don't know otherwise. Once we run the code and see that the signal becomes periodic we can borrow the values from one of these periods to use those as the initial conditions.
From the diagram you posted we can use our intuition to guess that, before time 0, the signal probably looks something like the region highlighted in the figure below.
If, rather than using zero to initialize y at y < 0, we copy the values in the red highlighted region, then we get a plot that is more like what you desire.
To get the plot shown above I ran the script once, then found the indices in y for the part I wanted to use as initial conditions, then copied those into a new array.
init_cond = y(7004:8004);
Then I changed script to use this array as the initial condition and changed the initial y values to
y = zeros(size(t));
y(1:1001) = init_cond;
and ran the modified script again.
Edit 2: The built-in function dde23 appears to be applicable for your problem. To see an example run the command edit ddex1 in the command window.
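A minimal sketch of that approach for this system (the constant history y(t)=0 for t<0 together with y(0)=1 mirrors the assumption used above; the initial value is set through ddeset):
k1=1; S=1; Kd=1; p=2; tau=10; k2=1; ET=1; Km=1;
ddefun = @(t,y,Z) k1*S*Kd^p/(Kd^p + Z(1)^p) - k2*ET*y/(Km + y); % Z(1) is y(t-tau)
opts = ddeset('InitialY',1); % y(0) = 1
sol = dde23(ddefun,tau,0,[0 100],opts); % history: y(t) = 0 for t < 0
plot(sol.x,sol.y);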

Something's wrong with my Logistic Regression?

I'm trying to verify if my implementation of Logistic Regression in Matlab is good. I'm doing so by comparing the results I get via my implementation with the results given by the built-in function mnrfit.
The dataset D,Y that I have is such that each row of D is an observation in R^2 and the labels in Y are either 0 or 1. Thus, D is a matrix of size (n,2), and Y is a vector of size (n,1).
Here's how I do my implementation:
I first normalize my data and augment it to include the offset:
d = 2; %dimension of data
M = mean(D) ;
centered = D-repmat(M,n,1) ;
devs = sqrt(sum(centered.^2)) ;
normalized = centered./repmat(devs,n,1) ;
X = [normalized,ones(n,1)];
I will be doing my calculations on X.
Second, I define the gradient and hessian of the likelihood of Y|X:
function grad = gradient(w)
grad = zeros(1,d+1) ;
for i=1:n
grad = grad + (Y(i)-sigma(w'*X(i,:)'))*X(i,:) ;
end
end
function hess = hessian(w)
hess = zeros(d+1,d+1) ;
for i=1:n
hess = hess - sigma(w'*X(i,:)')*sigma(-w'*X(i,:)')*X(i,:)'*X(i,:) ;
end
end
with sigma being a MATLAB function encoding the sigmoid function z --> 1/(1+exp(-z)).
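(For completeness, one way to write it is as an anonymous function:)
sigma = @(z) 1./(1 + exp(-z)); % sigmoid, applied element-wise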
Third, I run the Newton algorithm on gradient to find the roots of the gradient of the likelihood. I implemented it myself. It behaves as expected as the norm of the difference between the iterates goes to 0. I wrote it based on this script.
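Given the two functions above, the Newton step itself can be written compactly (a sketch; the starting point and stopping tolerance are assumptions, not the asker's code):
w = zeros(d+1,1); % assumed starting point
while true
step = hessian(w) \ gradient(w)'; % solve H*step = grad'
w = w - step; % Newton update
if norm(step) < 1e-10, break; end % assumed stopping tolerance
end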
I verified that the gradient at the wOPT returned by my Newton implementation is null:
gradient(wOPT)
ans =
1.0e-15 *
0.0139 -0.0021 0.2290
and that the hessian has strictly negative eigenvalues
eig(hessian(wOPT))
ans =
-7.5459
-0.0027
-0.0194
Here's the wOPT I get with my implementation:
wOPT =
-110.8873
28.9114
1.3706
the offset being the last element. In order to plot the decision line, I should convert the slope wOPT(1:2) using M and devs. So I set:
my_offset = wOPT(end);
my_slope = wOPT(1:d)'.*devs + M ;
and I get:
my_slope =
1.0e+03 *
-7.2109 0.8166
my_offset =
1.3706
Now, when I run B=mnrfit(D,Y+1), I get
B =
-1.3496
1.7052
-1.0238
The offset is stored in B(1).
I get very different values. I would like to know what I am doing wrong. I have some doubt about the normalization and 'un-normalization' process. But I'm not sure, may be I'm doing something else wrong.
Additional Info
When I type:
B=mnrfit(normalized,Y+1)
I get
-1.3706
110.8873
-28.9114
which is a rearranged version of the opposite of my wOPT. It contains exactly the same elements.
It seems likely that my scaling back of the learnt parameters is wrong; otherwise, it would have given the same result as B=mnrfit(D,Y+1).
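That suspicion looks right: the model was fit on (D - M)./devs, so the weights map back to the original coordinates by dividing (not multiplying) by devs, and the centering by M shifts the offset. A sketch of the inverse transform, using the variable names above:
slope_D = wOPT(1:d) ./ devs'; % divide, don't multiply, by the scales
offset_D = wOPT(end) - M * slope_D; % the mean shift moves into the offset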

Neural Network convergence speed (Levenberg-Marquardt) (MATLAB)

I was trying to approximate a function (single input and single output) with an ANN. Using the MATLAB toolbox, I could see that with 5 or more neurons in the hidden layer I can achieve a very nice result, so I am trying to do it manually.
Calculations:
As the network has only one input and one output, the partial derivative of the error (e = d - o, where 'd' is the desired output and 'o' is the actual output) with respect to a weight which connects a hidden neuron j to the output neuron is -hj (where hj is the output of hidden neuron j);
The partial derivative of the error with respect to the output bias is -1;
The partial derivative of the error with respect to a weight which connects the input to a hidden neuron j is -woj*f'*i, where woj is the output weight of hidden neuron j, f' is the tanh() derivative and 'i' is the input value;
Finally, the partial derivative of the error with respect to the hidden layer bias is the same as above (with respect to the input weight) except that there is no input term:
-woj*f'
The problem is:
the MATLAB algorithm always converges faster and better. I can achieve the same curve as MATLAB does, but my algorithm requires many more epochs.
I've tried removing the pre- and post-processing functions from the MATLAB algorithm. It still converges faster.
I've also tried creating and configuring the network and extracting the weight/bias values before training, so I could copy them into my algorithm to see if it converges faster, but nothing changed (is the weight/bias initialization inside the create/configure functions or the train function?).
Does the MATLAB algorithm have some kind of optimizations inside the code?
Or may be this difference only in the organization of the training set and weight/bias initialization?
In case one wants to look my code, here is the main loop which makes the training:
Err2 = N;
epochs = 0;
%compare MSE of error2
while ((Err2/N > 0.0003) && (u < 10000000) && (epochs < 100))
epochs = epochs+1;
Err = 0;
%input->hidden weight vector
wh = w(1:hidden_layer_len);
%hidden->output weigth vector
wo = w((hidden_layer_len+1):(2*hidden_layer_len));
%hidden bias
bi = w((2*hidden_layer_len+1):(3*hidden_layer_len));
%output bias
bo = w(length(w));
%start forward propagation
for i=1:N
%take next input value
x = t(i);
%propagate to hidden layer
neth = x*wh + bi;
%propagate through neurons
ij = tanh(neth)';
%propagate to output layer
neto = ij*wo + bo;
%propagate to output (purelin)
output(i) = neto;
%calculate difference from target (error)
error(i) = yp(i) - output(i);
%Backpropagation:
%tanh derivative
fhd = 1 - tanh(neth').*tanh(neth');
%jacobian matrix
J(i,:) = [-x*wo'.*fhd -ij -wo'.*fhd -1];
%SSE (sum square error)
Err = Err + 0.5*error(i)*error(i);
end
%calculate next error with updated weights and compare with old error
%start error2 from error1 + 1 to enter while loop
Err2 = Err+1;
%while error2 is > than old error and Mu (u) is not too large
while ((Err2 > Err) && (u < 10000000))
%Weight update
w2 = w - (((J'*J + u*eye(3*hidden_layer_len+1))^-1)*J')*error';
%New Error calculation
%New weights to propagate
wh = w2(1:hidden_layer_len);
wo = w2((hidden_layer_len+1):(2*hidden_layer_len));
%new bias to propagate
bi = w2((2*hidden_layer_len+1):(3*hidden_layer_len));
bo = w2(length(w2)); % output bias from the updated weight vector
%calculate error2
Err2 = 0;
for i=1:N
%forward propagation again
x = t(i);
neth = x*wh + bi;
ij = tanh(neth)';
neto = ij*wo + bo;
output(i) = neto;
error2(i) = yp(i) - output(i);
%Error2 (SSE)
Err2 = Err2 + 0.5*error2(i)*error2(i);
end
%compare MSE from error2 with a minimum
%if greater still runing
if (Err2/N > 0.0003)
%compare with old error
if (Err2 <= Err)
%if less, update weights and decrease Mu (u)
w = w2;
u = u/10;
else
%if greater, increment Mu (u)
u = u*10;
end
end
end
end
It's not easy to know the exact implementation of the Levenberg-Marquardt algorithm in MATLAB. You may try to run the algorithm one iteration at a time and see if it is identical to your algorithm. You can also try other implementations, such as http://www.mathworks.com/matlabcentral/fileexchange/16063-lmfsolve-m--levenberg-marquardt-fletcher-algorithm-for-nonlinear-least-squares-problems, to see if the performance can be improved. For simple learning problems, convergence speed may be a matter of learning rate. You might simply increase the learning rate to get faster convergence.
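For the weight/bias initialization question: it happens in configure (or init), not in train, so one way to copy the exact starting point into your own code is roughly this (a sketch; feedforwardnet replaces newff in newer releases, and t/yp are the input/target names from the code above):
net = feedforwardnet(5,'trainlm'); % 5 hidden neurons, Levenberg-Marquardt
net = configure(net,t,yp); % configure also initializes weights and biases
w0 = getwb(net); % flat vector of all initial weights and biases
net.trainParam.epochs = 1; % train one epoch at a time to compare iterations
net = train(net,t,yp);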