Error Backpropagation - Neural network - matlab

I am trying to write a code for error back-propagation for neural network but my code is taking really long time to execute. I know that training of Neural network takes long time but it is taking long time for a single iteration as well.
Multi-class classification problem!
Total number of training set = 19978
Number of inputs = 513
Number of hidden units = 345
Number of classes = 10
Below is my entire code:
X=horzcat(ones(19978,1),inputMatrix); %Adding bias
M=floor(0.66*(513+10)); %Taking two-third of imput+output
Wji=rand(513,M);
aj=X*Wji;
zj=tanh(aj); %Hidden Layer output
Wkj=rand(M,10);
ak=zj*Wkj;
akTranspose = ak';
ykTranspose=softmax(akTranspose); %For multi-class classification
yk=ykTranspose'; %Final output
error=0;
%Initializing target variables
t = zeros(19978,10);
t(1:2000,1)=1;
t(2001:4000,2)=1;
t(4001:6000,3)=1;
t(6001:8000,4)=1;
t(8001:10000,5)=1;
t(10001:12000,6)=1;
t(12001:14000,7)=1;
t(14001:16000,8)=1;
t(16001:18000,9)=1;
t(18001:19778,10)=1;
errorArray=zeros(100000,1); %Stroing error values to keep track of error iteration
errorDiff=zeros(100000,1);
for nIterations=1:5
errorOld=error;
aj=X*Wji; %Forward propagating in each iteration
zj=tanh(aj);
ak=zj*Wkj;
akTranspose = ak';
ykTranspose=softmax(akTranspose);
yk=ykTranspose';
error=0;
%Calculating error
for n=1:19978 %for 19978 training samples
for k=1:10 %for 10 classes
error = error + t(n,k)*log(yk(n,k)); %using cross entropy function
end
end
error=-error;
Ediff = error-errorOld;
errorArray(nIterations,1)=error;
errorDiff(nIterations,1)=Ediff;
%Calculating dervative of error wrt weights wji
derEWji=zeros(513,345);
derEWkj=zeros(345,10);
for i=1:513
for j=1:M;
derErrorTemp=0;
for k=1:10
for n=1:19978
derErrorTemp=derErrorTemp+Wkj(j,k)*(yk(n,k)-t(n,k));
Calculating derivative of E wrt Wkj%
derEWkj(j,k) = derEWkj(j,k)+(yk(n,k)-t(n,k))*zj(n,j);
end
end
for n=1:19978
Calculating derivative of E wrt Wji
derEWji(i,j) = derEWji(i,j)+(1-(zj(n,j)*zj(n,j)))*derErrorTemp;
end
end
end
eta = 0.0001; %learning rate
Wji = Wji - eta.*derEWji; %updating weights
Wkj = Wkj - eta.*derEWkj;
end

for-loop is very time-consuming in Matlab even with the help of JIT. Try to modify your code by vectorize them rather than organizing them in a 3-loop or even 4-loop. For example,
for n=1:19978 %for 19978 training samples
for k=1:10 %for 10 classes
error = error + t(n,k)*log(yk(n,k)); %using cross entropy function
end
end
can be changed to:
error = sum(sum(t.*yk)); % t and yk are both n*k arrays that you construct
You may try to do similar jobs for the rest of your code. Use dot product or multiplication operations on arrays for different cases.

Related

How to interprete the regression plot obtained at the end of neural network regression for multiple outputs?

I have trained my Neural network model using MATLAB NN Toolbox. My network has multiple inputs and multiple outputs, 6 and 7 respectively, to be precise. I would like to clarify few questions based on it:-
The final regression plot showed at the end of the training shows a very good accuracy, R~0.99. However, since I have multiple outputs, I am confused as to which scatter plot does it represent? Shouldn't we have 7 target vs predicted plots for each of the output variable?
According to my knowledge, R^2 is a better method of commenting upon the accuracy of the model, whereas MATLAB reports R in its plot. Do I treat that R as R^2 or should I square the reported R value to obtain R^2.
I have generated the Matlab Script containing weight, bias and activation functions, as a final Result of the training. So shouldn't I be able to simply give my raw data as input and obtain the corresponding predicted output. I gave the exact same training set using the indices Matlab chose for training (to cross check), and plotted the predicted output vs actual output, but the result is not at all good. Definitely, not along the lines of R~0.99. Am I doing anything wrong?
code:
function [y1] = myNeuralNetworkFunction_2(x1)
%MYNEURALNETWORKFUNCTION neural network simulation function.
% X = [torque T_exh lambda t_Spark N EGR];
% Y = [O2R CO2R HC NOX CO lambda_out T_exh2];
% Generated by Neural Network Toolbox function genFunction, 17-Dec-2018 07:13:04.
%
% [y1] = myNeuralNetworkFunction(x1) takes these arguments:
% x = Qx6 matrix, input #1
% and returns:
% y = Qx7 matrix, output #1
% where Q is the number of samples.
%#ok<*RPMT0>
% ===== NEURAL NETWORK CONSTANTS =====
% Input 1
x1_step1_xoffset = [-24;235.248;0.75;-20.678;550;0.799];
x1_step1_gain = [0.00353982300884956;0.00284355877067267;6.26959247648903;0.0275865874012055;0.000366568914956012;0.0533831576137729];
x1_step1_ymin = -1;
% Layer 1
b1 = [1.3808996210168685;-2.0990163849711894;0.9651733083552595;0.27000953282929346;-1.6781835509820286;-1.5110463684800366;-3.6257438832309905;2.1569498669085361;1.9204156230460485;-0.17704342477904209];
IW1_1 = [-0.032892214008082517 -0.55848270745152429 -0.0063993424771670616 -0.56161004933654057 2.7161844536020197 0.46415317073346513;-0.21395624254052176 -3.1570133640176681 0.71972178875396853 -1.9132557838515238 1.3365248285282931 -3.022721627052706;-1.1026780445896862 0.2324603066452392 0.14552308208231421 0.79194435276493658 -0.66254679969168417 0.070353201192052434;-0.017994515838487352 -0.097682677816992206 0.68844109281256027 -0.001684535122025588 0.013605622123872989 0.05810686279306107;0.5853667840629273 -2.9560683084876329 0.56713425120259764 -2.1854386350040116 1.2930115031659106 -2.7133159265497957;0.64316656469750333 -0.63667017646313084 0.50060179040086761 -0.86827897068177973 2.695456517458648 0.16822164719859456;-0.44666821007466739 4.0993786464616679 -0.89370838440321498 3.0445073606237933 -3.3015566360833453 -4.492874075961689;1.8337574137485424 2.6946232855369989 1.1140472073136622 1.6167763205944321 1.8573696127039145 -0.81922672766933646;-0.12561950922781362 3.0711045035224349 -0.6535751823440773 2.0590707752473199 -1.3267693770634292 2.8782780742777794;-0.013438026967107483 -0.025741311825949621 0.45460734966889638 0.045052447491038108 -0.21794568374100454 0.10667240367191703];
% Layer 2
b2 = [-0.96846557414356171;-0.2454718918618051;-0.7331628718025488;-1.0225195290982099;0.50307202195645395;-0.49497234988401961;-0.21817117469133171];
LW2_1 = [-0.97716474643411022 -0.23883775971686808 0.99238069915206006 0.4147649511973347 0.48504023209224734 -0.071372217431684551 0.054177719330469304 -0.25963474838320832 0.27368380212104881 0.063159321947246799;-0.15570858147605909 -0.18816739764334323 -0.3793600124951475 2.3851961990944681 0.38355142531334563 -0.75308427071748985 -0.1280128732536128 -1.361052031781103 0.6021878865831336 -0.24725687748503239;0.076251356114485525 -0.10178293627600112 0.10151304376762409 -0.46453434441403058 0.12114876632815359 0.062856969143306296 -0.0019628163322658364 -0.067809039768745916 0.071731544062023825 0.65700427778446913;0.17887084584125315 0.29122649575978238 0.37255802759192702 1.3684190468992126 0.60936238465090853 0.21955911453674043 0.28477957899364675 -0.051456306721251184 0.6519451272106177 -0.64479205028051967;0.25743349663436799 2.0668075180209979 0.59610776847961111 -3.2609682919282603 1.8824214917530881 0.33542869933904396 0.03604272669356564 -0.013842766338427388 3.8534510207741826 2.2266745660915586;-0.16136175574939746 0.10407287099228898 -0.13902245286490234 0.87616472446622717 -0.027079111747601223 0.024812287505204988 -0.030101536834009103 0.043168268669541855 0.12172932035587079 -0.27074383434206573;0.18714562505165402 0.35267726325386606 -0.029241400610813449 0.53053853235049087 0.58880054832728757 0.047959541165126809 0.16152268183097709 0.23419456403348898 0.83166785128608967 -0.66765237856750781];
% Output 1
y1_step1_ymin = -1;
y1_step1_gain = [0.114200879346771;0.145581598485951;0.000139011547272197;0.000456244862967996;2.05816254143146e-05;5.27704485488127;0.00284355877067267];
y1_step1_xoffset = [-0.045;1.122;2.706;17.108;493.726;0.75;235.248];
% ===== SIMULATION ========
% Dimensions
Q = size(x1,1); % samples
% Input 1
x1 = x1';
xp1 = mapminmax_apply(x1,x1_step1_gain,x1_step1_xoffset,x1_step1_ymin);
% Layer 1
a1 = tansig_apply(repmat(b1,1,Q) + IW1_1*xp1);
% Layer 2
a2 = repmat(b2,1,Q) + LW2_1*a1;
% Output 1
y1 = mapminmax_reverse(a2,y1_step1_gain,y1_step1_xoffset,y1_step1_ymin);
y1 = y1';
end
% ===== MODULE FUNCTIONS ========
% Map Minimum and Maximum Input Processing Function
function y = mapminmax_apply(x,settings_gain,settings_xoffset,settings_ymin)
y = bsxfun(#minus,x,settings_xoffset);
y = bsxfun(#times,y,settings_gain);
y = bsxfun(#plus,y,settings_ymin);
end
% Sigmoid Symmetric Transfer Function
function a = tansig_apply(n)
a = 2 ./ (1 + exp(-2*n)) - 1;
end
% Map Minimum and Maximum Output Reverse-Processing Function
function x = mapminmax_reverse(y,settings_gain,settings_xoffset,settings_ymin)
x = bsxfun(#minus,y,settings_ymin);
x = bsxfun(#rdivide,x,settings_gain);
x = bsxfun(#plus,x,settings_xoffset);
end
The above one is the automatically generated code. The plot which I generated to cross-check the first variable is below:-
% X and Y are input and output - same as above
X_train = X(results.info1.train.indices,:);
y_train = Y(results.info1.train.indices,:);
out_train = myNeuralNetworkFunction_2(X_train);
scatter(y_train(:,1),out_train(:,1))
To answer your question about R: Yes, you should square R to get the R^2 value. In this case, they will be very close since R is very close to 1.
The graphs give the correlation between the estimated and real (target) values. So R is the strenght of the correlation. You can square it to find the R-square.
The graph you draw and matlab gave are not the graph of the same variables. The ranges or scales of the axes are very different.
First of all, is the problem you are trying to solve a regression problem? Or is it a classification problem with 7 classes converted to numeric? I assume this is a classification problem, as you are trying to get the success rate for each class.
As for your first question: According to the literature it is recommended to use the value "All: R". If you want to get the success rate of each of your classes, Precision, Recall, F-measure, FP rate, TP Rate, etc., which are valid in classification problems. values ​​you need to reach. There are many matlab documents for this (help ROC) and you can look at the details. All the values ​​I mentioned and which I think you actually want are obtained from the confusion matrix.
There is a good example of this.
[x,t] = simpleclass_dataset;
net = patternnet(10);
net = train(net,x,t);
y = net(x);
[c,cm,ind,per] = confusion(t,y)
I hope you will see what you want from the "nntraintool" window that appears when you run the code.
Your other questions have already been answered. Alternatively, you can consider using a machine learning algorithm with open source software such as Weka.

Matlab SVM example

I am trying to implement SVM for classification. The goal is to output the correct grid of origin of a power signal (.wav file). The grids are titled A-I and there are 93 total signals for the training set and 49 practice signals. I have a 93x10x36 matrix of feature vectors. Does anyone know why I get the errors shown? TrainCorrectGrid and Training_Cepstrum1 both have 93 rows so I don't understand what the problem is. Any help is greatly appreciated.
My code is shown here:
clc; clear; close all;
load('avg_fft_feature (4).mat'); %training feature vectors
load('practice_fft_Mag_all (2).mat'); %practice feauture vectors
load('practice_GridOrigin.mat'); %correct grids of origin for practice data
load PracticeCorrectGrid.mat;
load Training_Cepstrum1;
load Practice_Cepstrum1a;
load fSet1.mat %load in correct practice grids
TrainCorrectGrid=['A';'A';'A';'A';'A';'A';'A';'A';'A';'B';'B';'B';'B';'B';'B';'B';'B';'B';'B';'C';'C';'C';'C';'C';'C';'C';'C';'C';'C';'C';'D';'D';'D';'D';'D';'D';'D';'D';'D';'D';'D';'E';'E';'E';'E';'E';'E';'E';'E';'E';'E';'E';'F';'F';'F';'F';'F';'F';'F';'F';'G';'G';'G';'G';'G';'G';'G';'G';'G';'G';'G';'H';'H';'H';'H';'H';'H';'H';'H';'H';'H';'H';'I';'I';'I';'I';'I';'I';'I';'I';'I';'I';'I'];
%[results,u] = multisvm(avg_fft_feature, TrainCorrectGrid, avg_fft_feature_practice);%avg_fft_feature);
[results,u] = multisvm(Training_Cepstrum1(93,:,1), TrainCorrectGrid, Practice_Cepstrum1a(49,:,1));
disp('Grids of Origin (SVM)');
%Display SVM Results
for i = 1:numel(u)
str = sprintf('%d: %s', i, u(i));
disp(str);
end
%Display Percent Correct
numCorrect = 0;
for i = 1:numel(u)
%if (strcmp(TrainCorrectGrid(i,1), u(i))==1); %compare training to
%training
if (strcmp(PracticeCorrectGrid(i,1), u(i))==1); %compare practice data to training
numCorrect = numCorrect + 1;
end
end
numberOfElements = numel(u);
percentCorrect = numCorrect / numberOfElements * 100;
% percentCorrect = round(percentCorrect, 2);
dispPercent = sprintf('Percent Correct = %0.3f%%', percentCorrect);
disp(dispPercent);
error shown here
The multisvm function is shown here:
function [result, u] = multisvm(TrainingSet,GroupTrain,TestSet)
%Models a given training set with a corresponding group vector and
%classifies a given test set using an SVM classifier according to a
%one vs. all relation.
%
%This code was written by Cody Neuburger cneuburg#fau.edu
%Florida Atlantic University, Florida USA and slightly modified by Renny Varghese
%This code was adapted and cleaned from Anand Mishra's multisvm function
%found at http://www.mathworks.com/matlabcentral/fileexchange/33170-multi-class-support-vector-machine/
u=unique(GroupTrain);
numClasses=length(u);
result = zeros(length(TestSet(:,1)),1);
%build models
for k=1:numClasses
%Vectorized statement that binarizes Group
%where 1 is the current class and 0 is all other classes
G1vAll=(GroupTrain==u(k));
models(k) = svmtrain(TrainingSet,G1vAll);
end
%classify test cases
for j=1:size(TestSet,1)
for k=1:numClasses
if(svmclassify(models(k),TestSet(j,:)))
break;
end
end
result(j) = k;
end
mapValues = 'ABCDEFGHI';
u = mapValues(result);
You state that Training_Cepstrum1 has size [93,10,36]. But when you call multisvm, you are only passing in Training_Cepstrum1(93,:,1) which has size [1,10]. Since TrainCorrectGrid has size [93,1], there is a mismatch in the number of rows.
It looks like you make the same error when passing in Practice_Cepstrum1a.
Try replacing your call to multisvm with
[results,u] = multisvm(Training_Cepstrum1(:,:,1), TrainCorrectGrid, Practice_Cepstrum1a(:,:,1));
This way Training_Cepstrum1(:,:,1) has size [93,10], the same number of rows as TrainCorrectGrid.

Neural Network convergence speed (Levenberg-Marquardt) (MATLAB)

I was trying to approximate a function (single input and single output) with an ANN. Using MATLAB toolbox I could see that with 5 or more neurons in the hidden layer, I can achieve a very nice result. So I am trying to do it manually.
Calculations:
As the network has only one input and one output, the partial derivative of the error (e=d-o, where 'd' is the desired output and 'o' is the actual output) in respect to a weigth which connects a hidden neuron j to the output neuron, will be -hj (where hj is the output of a hidden neuron j);
The partial derivative of the error in respect to output bias will be -1;
The partial derivative of the error in respect to a weight which connects the input to a hidden neuron j will be -woj*f'*i, where woj is the hidden neuron j output weigth, f' is the tanh() derivative and 'i' is the input value;
Finally, the partial derivative of the error in respect to hidden layer bias will be the same as above (in respect to input weight) except that here we dont have the input:
-woj*f'
The problem is:
the MATLAB algorithm always converge faster and better. I can achieve the same curve as MATLAB does, but my algorithm requires much more epochs.
I've tried to remove pre and postprocessing functions from MATLAB algorithm. It still converges faster.
I've also tried to create and configure the network, and extract weight/bias values before training so I could copy them to my algorithm to see if it converges faster but nothing changed (is the weight/bias initialization inside create/configure or train function?).
Does the MATLAB algorithm have some kind of optimizations inside the code?
Or may be this difference only in the organization of the training set and weight/bias initialization?
In case one wants to look my code, here is the main loop which makes the training:
Err2 = N;
epochs = 0;
%compare MSE of error2
while ((Err2/N > 0.0003) && (u < 10000000) && (epochs < 100))
epochs = epochs+1;
Err = 0;
%input->hidden weight vector
wh = w(1:hidden_layer_len);
%hidden->output weigth vector
wo = w((hidden_layer_len+1):(2*hidden_layer_len));
%hidden bias
bi = w((2*hidden_layer_len+1):(3*hidden_layer_len));
%output bias
bo = w(length(w));
%start forward propagation
for i=1:N
%take next input value
x = t(i);
%propagate to hidden layer
neth = x*wh + bi;
%propagate through neurons
ij = tanh(neth)';
%propagate to output layer
neto = ij*wo + bo;
%propagate to output (purelin)
output(i) = neto;
%calculate difference from target (error)
error(i) = yp(i) - output(i);
%Backpropagation:
%tanh derivative
fhd = 1 - tanh(neth').*tanh(neth');
%jacobian matrix
J(i,:) = [-x*wo'.*fhd -ij -wo'.*fhd -1];
%SSE (sum square error)
Err = Err + 0.5*error(i)*error(i);
end
%calculate next error with updated weights and compare with old error
%start error2 from error1 + 1 to enter while loop
Err2 = Err+1;
%while error2 is > than old error and Mu (u) is not too large
while ((Err2 > Err) && (u < 10000000))
%Weight update
w2 = w - (((J'*J + u*eye(3*hidden_layer_len+1))^-1)*J')*error';
%New Error calculation
%New weights to propagate
wh = w2(1:hidden_layer_len);
wo = w2((hidden_layer_len+1):(2*hidden_layer_len));
%new bias to propagate
bi = w2((2*hidden_layer_len+1):(3*hidden_layer_len));
bo = w2(length(w));
%calculate error2
Err2 = 0;
for i=1:N
%forward propagation again
x = t(i);
neth = x*wh + bi;
ij = tanh(neth)';
neto = ij*wo + bo;
output(i) = neto;
error2(i) = yp(i) - output(i);
%Error2 (SSE)
Err2 = Err2 + 0.5*error2(i)*error2(i);
end
%compare MSE from error2 with a minimum
%if greater still runing
if (Err2/N > 0.0003)
%compare with old error
if (Err2 <= Err)
%if less, update weights and decrease Mu (u)
w = w2;
u = u/10;
else
%if greater, increment Mu (u)
u = u*10;
end
end
end
end
It's not easy to know the exact implementation of the Levenberg Marquardt algorithm in Matlab. You may try to run the algorithm one iteration at a time, and see if it is identical to your algorithm. You can also try other implementations, such as, http://www.mathworks.com/matlabcentral/fileexchange/16063-lmfsolve-m--levenberg-marquardt-fletcher-algorithm-for-nonlinear-least-squares-problems, to see if the performance can be improved. For simple learning problems, convergence speed may be a matter of learning rate. You might simply increase the learning rate to get faster convergence.

matlab fitting function in genetic algorithm (ga function) doesn't accept output

Hi everyone i'm new to matlab and i have some problem with the ga function.
I must find the best integer values of input and output delays for a net applied to a time series problem using genetic algorithm.
I write a fitting function that using 2 variables as input and output delays create the net and return the performance value.
I want 2 delays between 1 and 10 so i use the ga function like this
[x,fval,exitflag,output,population,scores] = ga(#rendimentoRete, 2, [], [], [], [], [1 1], [10 10], [], [1 2])
And i have this error "Output argument "out" (and maybe others) not assigned during call to"D:\Documents\MATLAB\rendimentoRete.m>rendimentoRete"."
This is my fitting function
(if someone prefer this is the same function on pastebin with Syntax Highlighting http://pastebin.com/5iwzwhi0)
function out = rendimentoRete(iDelay, oDelay)
if nargin < 2
return;
end
%%carico le variabili dal workspace in modo che la funzione di fitness
%%conservi come varibili di ingresso solo i delay
load baseSet.mat;
inputSeries = tonndata(first_period,false,false);
targetSeries = tonndata(first_usd_ise,false,false);
% Create a Nonlinear Autoregressive Network with External Input
inputDelays = 1:iDelay;
feedbackDelays = 1:oDelay;
hiddenLayerSize = 8;
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
% Prepare the Data for Training and Simulation
% The function PREPARETS prepares timeseries data for a particular network,
% shifting time by the minimum amount to fill input states and layer states.
% Using PREPARETS allows you to keep your original time series data unchanged, while
% easily customizing it for networks with differing numbers of delays, with
% open loop or closed loop feedback modes.
[inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
% Setup Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
for j=1:10
[net,tr] = train(net,inputs,targets,inputStates,layerStates);
end
% Test the Network
outputs = net(inputs,inputStates,layerStates);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs); %%alt -> MSE
%prova1
out = performance;
%{
%prova2
%converto l'output a un vettore e approssimo i risultati ai valori 1 e -1
%convert outputs cell to array
outputs_array = cell2mat(outputs);
%convertiamo outputs_array
for k=1:(268-outputDelay)
if outputs_array(k)>0
outputs_array(k)=1;
else
outputs_array(k)=-1;
end
end
out = performance + calcoloMape(outputDelay, first_usd_ise, outputs_array) + calcoloPgcp(outputDelay, first_usd_ise, outputs_array);
%}
end
with the load of baseSet.mat i'm loading first_period and first_usd_ise because they are always the same.
I can't figure out what to do because when i use the function stand alone gives me good result.
Someone can explain me where i mistake.
According to the documentation for MATLAB's ga, the signature of the fitness function is described as
...should accept a row vector of length nvars and return a scalar value.
That means that your function signature has to change from one with two input variables like
function out = rendimentoRete(iDelay, oDelay)
to a signature with just one input variable like
function out = rendimentoRete(inputVars)
where inputVars is a vector whereby you can obtain the two variables as
iDelay = inputVars(1);
oDelay = inputVars(2);
Modify your function signature and first two lines to the above (you can remove the margin check), try it out, and see what happens!

creating a train perceptron in MATLAB for gender clasiffication

I am coding a perceptron to learn to categorize gender in pictures of faces. I am very very new to MATLAB, so I need a lot of help. I have a few questions:
I am trying to code for a function:
function [y] = testset(x,w)
%y = sign(sigma(x*w-threshold))
where y is the predicted results, x is the training/testing set put in as a very large matrix, and w is weight on the equation. The part after the % is what I am trying to write, but I do not know how to write this in MATLAB code. Any ideas out there?
I am trying to code a second function:
function [err] = testerror(x,w,y)
%err = sigma(max(0,-w*x*y))
w, x, and y have the same values as stated above, and err is my function of error, which I am trying to minimize through the steps of the perceptron.
I am trying to create a step in my perceptron to lower the percent of error by using gradient descent on my original equation. Does anyone know how I can increment w using gradient descent in order to minimize the error function using an if then statement?
I can put up the code I have up till now if that would help you answer these questions.
Thank you!
edit--------------------------
OK, so I am still working on the code for this, and would like to put it up when I have something more complete. My biggest question right now is:
I have the following function:
function [y] = testset(x,w)
y = sign(sum(x*w-threshold))
Now I know that I am supposed to put a threshold in, but cannot figure out what I am supposed to put in as the threshold! any ideas out there?
edit----------------------------
this is what I have so far. Changes still need to be made to it, but I would appreciate input, especially regarding structure, and advice for making the changes that need to be made!
function [y] = Perceptron_Aviva(X,w)
y = sign(sum(X*w-1));
end
function [err] = testerror(X,w,y)
err = sum(max(0,-w*X*y));
end
%function [w] = perceptron(X,Y,w_init)
%w = w_init;
%end
%------------------------------
% input samples
X = X_train;
% output class [-1,+1];
Y = y_train;
% init weigth vector
w_init = zeros(size(X,1));
w = w_init;
%---------------------------------------------
loopcounter = 0
while abs(err) > 0.1 && loopcounter < 100
for j=1:size(X,1)
approx_y(j) = Perceptron_Aviva(X(j),w(j))
err = testerror(X(j),w(j),approx_y(j))
if err > 0 %wrong (structure is correct, test is wrong)
w(j) = w(j) - 0.1 %wrong
elseif err < 0 %wrong
w(j) = w(j) + 0.1 %wrong
end
% -----------
% if sign(w'*X(:,j)) ~= Y(j) %wrong decision?
% w = w + X(:,j) * Y(j); %then add (or subtract) this point to w
end
you can read this question I did some time ago.
I uses a matlab code and a function perceptron
function [w] = perceptron(X,Y,w_init)
w = w_init;
for iteration = 1 : 100 %<- in practice, use some stopping criterion!
for ii = 1 : size(X,2) %cycle through training set
if sign(w'*X(:,ii)) ~= Y(ii) %wrong decision?
w = w + X(:,ii) * Y(ii); %then add (or subtract) this point to w
end
end
sum(sign(w'*X)~=Y)/size(X,2) %show misclassification rate
end
and it is called from code (#Itamar Katz) like (random data):
% input samples
X1=[rand(1,100);rand(1,100);ones(1,100)]; % class '+1'
X2=[rand(1,100);1+rand(1,100);ones(1,100)]; % class '-1'
X=[X1,X2];
% output class [-1,+1];
Y=[-ones(1,100),ones(1,100)];
% init weigth vector
w=[.5 .5 .5]';
% call perceptron
wtag=perceptron(X,Y,w);
% predict
ytag=wtag'*X;
% plot prediction over origianl data
figure;hold on
plot(X1(1,:),X1(2,:),'b.')
plot(X2(1,:),X2(2,:),'r.')
plot(X(1,ytag<0),X(2,ytag<0),'bo')
plot(X(1,ytag>0),X(2,ytag>0),'ro')
legend('class -1','class +1','pred -1','pred +1')
I guess this can give you an idea to make the functions you described.
To the error compare the expected result with the real result (class)
Assume your dataset is X, the datapoins, and Y, the labels of the classes.
f=newp(X,Y)
creates a perceptron.
If you want to create an MLP then:
f=newff(X,Y,NN)
where NN is the network architecture, i.e. an array that designates the number of neurons at each hidden layer. For example
NN=[5 3 2]
will correspond to an network with 5 neurons at the first layers, 3 at the second and 2 a the third hidden layer.
Well what you call threshold is the Bias in machine learning nomenclature. This should be left as an input for the user because it is used during training.
Also, I wonder why you are not using the builtin matlab functions. i.e newp or newff. e.g.
ff=newp(X,Y)
Then you can set the properties of the object ff to do your job for selecting gradient descent and so on.