I tried to implement Widrow-Nguyen weight initialization in MATLAB 2014a to compare its performance against a purely random weight initialization technique.
a = -1;
b = 1;

% WIDROW weights for input layer to hidden layer 1
sum_sq_wts = 0;
for k = 1:30
    iw(:,:) = zeros(num_input, nodes_hidden_layer);
    for i = 1:num_input
        for j = 1:nodes_hidden_layer
            iw(i,j) = (b-a)*rand(1,1) + a;
            sum_sq_wts = sum_sq_wts + (iw(i,j)*iw(i,j));
        end
    end
    norm = sqrt(sum_sq_wts);
    beta = 0.7*nodes_hidden_layer.^(1/num_input);
    for i = 1:num_input
        for j = 1:nodes_hidden_layer
            iw(i,j) = beta*iw(i,j)/norm;
        end
    end
    IW{k} = iw';
end
% WIDROW weights for hidden layer 1 to output layer
sum_sq_wts = 0;
for k = 1:30
    lw(:,:) = zeros(nodes_hidden_layer, 1);
    for i = 1:nodes_hidden_layer
        for j = 1:1
            iw(i,j) = (b-a)*rand(1,1) + a;
            sum_sq_wts = sum_sq_wts + iw(i,j)*iw(i,j);
        end
    end
    norm = sqrt(sum_sq_wts);
    beta = 0.7*nodes_hidden_layer.^(1/num_input);
    for i = 1:nodes_hidden_layer
        for j = 1:1
            lw(i,j) = beta*lw(i,j)/norm;
        end
    end
    LW{k} = lw';
end

WidNgu{1,1} = IW;
WidNgu{1,2} = LW;
I am generating 30 different sets of Widrow-Nguyen weights in the code above. The problem is that a neural network trained with these weights achieves a lower performance value than one trained with a random set of weights. The problem I used for training was a simple function approximation problem.
Even more interesting: the first weight set generated above at times performs better than the random-weight approach, but the remaining 29 sets I create always perform poorly.
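For reference, MATLAB's Neural Network Toolbox ships its own Nguyen-Widrow initializer, initnw, which could serve as a baseline to compare my hand-rolled weights against. A minimal sketch (here inputs and targets stand for my function-approximation data):
net = feedforwardnet(nodes_hidden_layer);  % same hidden layer size as above
net.initFcn = 'initlay';
net.layers{1}.initFcn = 'initnw';          % Nguyen-Widrow initialization for layer 1
net = configure(net, inputs, targets);     % inputs/targets: the function-approximation data
net = init(net);
refIW = net.IW{1,1};                       % toolbox-generated input-to-hidden weights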
Where have I gone wrong?
I am having some trouble understanding the standardisation process in this KNN classifier: basically, I need to know what is happening during standardisation. If someone could help, it would be greatly appreciated. I can see that variables holding the mean and the standard deviation of the train examples are being created, but what actually goes on after that is what I am having difficulty with.
classdef myknn
    methods(Static)

        % fit takes the train examples, the train labels,
        % and the number of nearest neighbours k.
        function m = fit(train_examples, train_labels, k)
            % start of standardisation process
            m.mean = mean(train_examples{:,:}); % mean of each feature
            m.std = std(train_examples{:,:});   % standard deviation of each feature
            for i = 1:size(train_examples,1)
                train_examples{i,:} = train_examples{i,:} - m.mean;
                train_examples{i,:} = train_examples{i,:} ./ m.std;
            end
            % end of standardisation process
            m.train_examples = train_examples;
            m.train_labels = train_labels;
            m.k = k;
        end

        function predictions = predict(m, test_examples)
            predictions = categorical;
            for i = 1:size(test_examples,1)
                fprintf('classifying example %i/%i\n', i, size(test_examples,1));
                this_test_example = test_examples{i,:};
                % start of standardisation process
                this_test_example = this_test_example - m.mean;
                this_test_example = this_test_example ./ m.std;
                % end of standardisation process
                this_prediction = myknn.predict_one(m, this_test_example);
                predictions(end+1) = this_prediction;
            end
        end

        function prediction = predict_one(m, this_test_example)
            distances = myknn.calculate_distances(m, this_test_example);
            neighbour_indices = myknn.find_nn_indices(m, distances);
            prediction = myknn.make_prediction(m, neighbour_indices);
        end

        function distances = calculate_distances(m, this_test_example)
            distances = [];
            for i = 1:size(m.train_examples,1)
                this_training_example = m.train_examples{i,:};
                this_distance = myknn.calculate_distance(this_training_example, this_test_example);
                distances(end+1) = this_distance;
            end
        end

        function distance = calculate_distance(p, q)
            % Euclidean distance between two feature vectors
            differences = q - p;
            squares = differences .^ 2;
            total = sum(squares);
            distance = sqrt(total);
        end

        function neighbour_indices = find_nn_indices(m, distances)
            [sorted, indices] = sort(distances);
            neighbour_indices = indices(1:m.k); % indices of the k closest training examples
        end

        function prediction = make_prediction(m, neighbour_indices)
            neighbour_labels = m.train_labels(neighbour_indices);
            prediction = mode(neighbour_labels); % majority vote among the k neighbours
        end

    end
end
Standardization is the process of normalizing each feature in your training examples so that each feature has a mean of zero and a standard deviation of one. The procedure is to find the mean and the standard deviation of each feature; after that, we take each feature, subtract its corresponding mean, and divide by its corresponding standard deviation.
That can clearly be seen in this code:
m.mean = mean(train_examples{:,:}); % mean of each feature
m.std = std(train_examples{:,:});   % standard deviation of each feature
for i = 1:size(train_examples,1)
    train_examples{i,:} = train_examples{i,:} - m.mean;
    train_examples{i,:} = train_examples{i,:} ./ m.std;
end
m.mean remembers the mean of each feature, while m.std remembers the standard deviation of each feature. Take note that you must remember both of these when you want to perform classification at test time. That can be seen in your predict method, where each test example has the per-feature mean of the training examples subtracted from it and is then divided by the per-feature standard deviation of the training examples.
function predictions = predict(m, test_examples)
    predictions = categorical;
    for i = 1:size(test_examples,1)
        fprintf('classifying example %i/%i\n', i, size(test_examples,1));
        this_test_example = test_examples{i,:};
        % start of standardisation process
        this_test_example = this_test_example - m.mean;
        this_test_example = this_test_example ./ m.std;
        % end of standardisation process
        this_prediction = myknn.predict_one(m, this_test_example);
        predictions(end+1) = this_prediction;
    end
end
Take note that we're using m.mean and m.std on the test examples, and these quantities come from the training examples.
My post on standardization should provide some more context. It achieves the same effect as the code you have provided, but in a more vectorized fashion: How does this code for standardizing data work?
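As a rough illustration of what that vectorized version looks like (a sketch, assuming train_examples is a table as in your code):
X = train_examples{:,:};     % extract the numeric data from the table
mu = mean(X);                % per-feature means (1 x num_features)
sigma = std(X);              % per-feature standard deviations
X_std = (X - mu) ./ sigma;   % implicit expansion (R2016b and later)
% On older MATLAB releases, use bsxfun instead:
% X_std = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
train_examples{:,:} = X_std; % write the standardized values back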
I am trying to implement an SVM for classification. The goal is to output the correct grid of origin of a power signal (a .wav file). The grids are titled A-I, and there are 93 total signals in the training set and 49 practice signals. I have a 93x10x36 matrix of feature vectors. Does anyone know why I get the errors shown? TrainCorrectGrid and Training_Cepstrum1 both have 93 rows, so I don't understand what the problem is. Any help is greatly appreciated.
My code is shown here:
clc; clear; close all;

load('avg_fft_feature (4).mat');      % training feature vectors
load('practice_fft_Mag_all (2).mat'); % practice feature vectors
load('practice_GridOrigin.mat');      % correct grids of origin for practice data
load PracticeCorrectGrid.mat;
load Training_Cepstrum1;
load Practice_Cepstrum1a;
load fSet1.mat                        % load in correct practice grids

TrainCorrectGrid = ['A';'A';'A';'A';'A';'A';'A';'A';'A';'B';'B';'B';'B';'B';'B';'B';'B';'B';'B';'C';'C';'C';'C';'C';'C';'C';'C';'C';'C';'C';'D';'D';'D';'D';'D';'D';'D';'D';'D';'D';'D';'E';'E';'E';'E';'E';'E';'E';'E';'E';'E';'E';'F';'F';'F';'F';'F';'F';'F';'F';'G';'G';'G';'G';'G';'G';'G';'G';'G';'G';'G';'H';'H';'H';'H';'H';'H';'H';'H';'H';'H';'H';'I';'I';'I';'I';'I';'I';'I';'I';'I';'I';'I'];

%[results,u] = multisvm(avg_fft_feature, TrainCorrectGrid, avg_fft_feature_practice);
[results,u] = multisvm(Training_Cepstrum1(93,:,1), TrainCorrectGrid, Practice_Cepstrum1a(49,:,1));
disp('Grids of Origin (SVM)');

% Display SVM results
for i = 1:numel(u)
    str = sprintf('%d: %s', i, u(i));
    disp(str);
end

% Display percent correct
numCorrect = 0;
for i = 1:numel(u)
    %if (strcmp(TrainCorrectGrid(i,1), u(i))==1) % compare training to training
    if (strcmp(PracticeCorrectGrid(i,1), u(i))==1) % compare practice data to training
        numCorrect = numCorrect + 1;
    end
end
numberOfElements = numel(u);
percentCorrect = numCorrect / numberOfElements * 100;
% percentCorrect = round(percentCorrect, 2);
dispPercent = sprintf('Percent Correct = %0.3f%%', percentCorrect);
disp(dispPercent);
error shown here
The multisvm function is shown here:
function [result, u] = multisvm(TrainingSet, GroupTrain, TestSet)
% Models a given training set with a corresponding group vector and
% classifies a given test set using an SVM classifier according to a
% one-vs-all relation.
%
% This code was written by Cody Neuburger (cneuburg@fau.edu),
% Florida Atlantic University, Florida, USA, and slightly modified by Renny Varghese.
% It was adapted and cleaned from Anand Mishra's multisvm function found at
% http://www.mathworks.com/matlabcentral/fileexchange/33170-multi-class-support-vector-machine/

u = unique(GroupTrain);
numClasses = length(u);
result = zeros(length(TestSet(:,1)), 1);

% Build one model per class
for k = 1:numClasses
    % Vectorized statement that binarizes GroupTrain,
    % where 1 is the current class and 0 is all other classes
    G1vAll = (GroupTrain == u(k));
    models(k) = svmtrain(TrainingSet, G1vAll);
end

% Classify test cases
for j = 1:size(TestSet,1)
    for k = 1:numClasses
        if (svmclassify(models(k), TestSet(j,:)))
            break;
        end
    end
    result(j) = k;
end

mapValues = 'ABCDEFGHI';
u = mapValues(result);
You state that Training_Cepstrum1 has size [93,10,36]. But when you call multisvm, you are only passing in Training_Cepstrum1(93,:,1) which has size [1,10]. Since TrainCorrectGrid has size [93,1], there is a mismatch in the number of rows.
It looks like you make the same error when passing in Practice_Cepstrum1a.
Try replacing your call to multisvm with
[results,u] = multisvm(Training_Cepstrum1(:,:,1), TrainCorrectGrid, Practice_Cepstrum1a(:,:,1));
This way Training_Cepstrum1(:,:,1) has size [93,10], the same number of rows as TrainCorrectGrid.
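A quick sanity check along these lines (a sketch; adjust the names to your workspace) can catch this kind of mismatch before the call:
trainFeat = Training_Cepstrum1(:,:,1);   % 93x10 slice of the feature cube
testFeat  = Practice_Cepstrum1a(:,:,1);  % 49x10 slice
assert(size(trainFeat,1) == size(TrainCorrectGrid,1), ...
    'Expected one label per training row (%d rows vs %d labels)', ...
    size(trainFeat,1), size(TrainCorrectGrid,1));
[results, u] = multisvm(trainFeat, TrainCorrectGrid, testFeat);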
I'm trying to set up a custom neural network, but when I train it, it doesn't train: the training process performs 0 iterations! I don't get any errors, though, just 0 iterations, and I have no idea why. (The architecture might seem odd to you; it is supposed to be a custom PNN. But before we can even discuss whether it makes sense or not, I would like to be able to train it...)
Here is the code:
net = network;
net.trainFcn = 'trainlm';
net.performFcn = 'mse';
net.numInputs = 1;
net.numLayers = (2*nbclasses) + 1;   % (one pattern layer + one summation layer per class) + competition layer
net.inputConnect(1:nbclasses,:) = 1; % connect the input to all pattern layers
for i = 1:nbclasses                  % connect the pattern layers to their corresponding summation layers
    net.layerConnect(i+nbclasses,i) = 1;
    net.layers{i}.size = size(tr_feature,1);
    net.layers{i}.transferFcn = 'radbas';
end
for i = (nbclasses+1):(nbclasses*2)  % connect all summation layers to the competition layer
    net.layers{i}.size = 1;
    net.layerConnect(net.numLayers,i) = 1;
end
net.layers{net.numLayers}.transferFcn = 'compet';
net.outputConnect(1,end) = 1;
net.view;
[net, tr] = train(net, tr_feature', tr_true');
% tr_feature is an 800x2 data matrix, tr_true is the 800x1 vector of corresponding labels
Any idea?
Thanks in advance!
I have implemented a Naive Bayes classifier for multiple classes, but the problem is that my error rate stays the same while I increase the training data set. I have been debugging this over and over but wasn't able to figure out why it's happening, so I thought I'd post it here to find out whether I am doing anything wrong.
% Naive Bayes classifier
% This function splits the data 80:20 into train and test sets, then from the
% 80% uses increments of 5, 10, 15, 20, 30 percent as training data to
% understand the error rate.
% The goal is to compare against the plots in the Stanford paper
% http://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf
function [tPercent] = naivebayes(file, iter, percent)
    dm = load(file);
    for i = 1:iter
        % Get a shuffled index common to test and train data
        idx = randperm(size(dm.data,1));
        % Using the same idx for data and labels
        shuffledMatrix_data = dm.data(idx,:);
        shuffledMatrix_label = dm.labels(idx,:);
        percent_data_80 = round((0.8) * length(shuffledMatrix_data));
        % Doing the 80-20 split
        train = shuffledMatrix_data(1:percent_data_80,:);
        test = shuffledMatrix_data(percent_data_80+1:length(shuffledMatrix_data),:);
        % Getting the label data from the 80:20 split
        train_labels = shuffledMatrix_label(1:percent_data_80,:);
        test_labels = shuffledMatrix_label(percent_data_80+1:length(shuffledMatrix_data),:);
        % Getting the array of percents [5 10 15 ...]
        percent_tracker = zeros(length(percent), 2);
        for pRows = 1:length(percent)
            percentOfRows = round((percent(pRows)/100) * length(train));
            new_train = train(1:percentOfRows,:);
            new_train_label = train_labels(1:percentOfRows);
            % Get the unique labels in the training set
            numClasses = size(unique(new_train_label),1);
            classMean = zeros(numClasses, size(new_train,2));
            classStd = zeros(numClasses, size(new_train,2));
            priorClass = zeros(numClasses, size(2,1));
            % Compute the per-class means and stds along with the priors
            for kclass = 1:numClasses
                classMean(kclass,:) = mean(new_train(new_train_label == kclass,:));
                classStd(kclass,:) = std(new_train(new_train_label == kclass,:));
                priorClass(kclass,:) = length(new_train(new_train_label == kclass))/length(new_train);
            end
            error = 0;
            p = zeros(numClasses,1);
            % Calculate the posterior for each test row for each class
            for testRow = 1:length(test)
                c = 0; k = 0;
                for class = 1:numClasses
                    temp_p = normpdf(test(testRow,:), classMean(class,:), classStd(class,:));
                    p(class, 1) = sum(log(temp_p)) + (log(priorClass(class)));
                end
                % Take the max of the posterior
                [c,k] = max(p(1,:));
                if test_labels(testRow) ~= k
                    error = error + 1;
                end
            end
            avgError = error/length(test);
            percent_tracker(pRows,:) = [avgError percent(pRows)];
            tPercent = percent_tracker;
            plot(percent_tracker)
        end
    end
end
Here is the dimensionality of my data:
x =
    data: [768x8 double]
    labels: [768x1 double]
I am using the Pima data set from UCI.
What results does your implementation give on the training data itself? Does it fit the training data at all?
It's hard to be sure, but there are a couple of things that I noticed:
It is important for every class to have training data. You can't really train a classifier to recognize a class if there is no training data for it.
If possible, the number of training examples shouldn't be skewed towards some of the classes. For example, in 2-class classification, if the training and cross-validation examples for class 1 constitute only 5% of the data, then a function that always returns class 2 will have an error of 5%. Did you try checking precision and recall separately (see the sketch after this list)?
You're trying to fit a normal distribution to each feature in a class and then use it for the posterior probabilities. I'm not sure how that plays out in terms of smoothing. Could you try to re-implement it with simple counting and see if it gives any different results?
It could also be that the features are highly redundant and the naive Bayes method overcounts probabilities.
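A minimal sketch of that per-class check, assuming predicted is a vector of predicted class ids collected over the test rows and test_labels holds the true ids:
for c = 1:numClasses
    tp = sum(predicted == c & test_labels == c); % true positives for class c
    fp = sum(predicted == c & test_labels ~= c); % false positives
    fn = sum(predicted ~= c & test_labels == c); % false negatives
    precision = tp / max(tp + fp, 1);
    recall = tp / max(tp + fn, 1);
    fprintf('class %d: precision %.3f, recall %.3f\n', c, precision, recall);
end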
I am trying to recreate the results reported in "Reducing the dimensionality of data with neural networks" by autoencoding the Olivetti face dataset with an adapted version of the MNIST digits MATLAB code, but I am having some difficulty. No matter how much tweaking I do to the number of epochs, the rates, or the momentum, the stacked RBMs enter the fine-tuning stage with a large amount of error and consequently fail to improve much during fine-tuning. I am also experiencing a similar problem on another real-valued dataset.
For the first layer I am using an RBM with a smaller learning rate (as described in the paper) and with
negdata = poshidstates*vishid' + repmat(visbiases,numcases,1);
I'm fairly confident I am following the instructions found in the supporting material, but I cannot achieve the correct errors.
Is there something I am missing? See below for the code I'm using for real-valued visible-unit RBMs, and for the whole deep training. The rest of the code can be found here.
rbmvislinear.m:
epsilonw  = 0.001; % Learning rate for weights
epsilonvb = 0.001; % Learning rate for biases of visible units
epsilonhb = 0.001; % Learning rate for biases of hidden units
weightcost = 0.0002;
initialmomentum = 0.5;
finalmomentum = 0.9;

[numcases numdims numbatches] = size(batchdata);

if restart == 1,
    restart = 0;
    epoch = 1;

    % Initializing symmetric weights and biases.
    vishid = 0.1*randn(numdims, numhid);
    hidbiases = zeros(1,numhid);
    visbiases = zeros(1,numdims);

    poshidprobs = zeros(numcases,numhid);
    neghidprobs = zeros(numcases,numhid);
    posprods = zeros(numdims,numhid);
    negprods = zeros(numdims,numhid);
    vishidinc = zeros(numdims,numhid);
    hidbiasinc = zeros(1,numhid);
    visbiasinc = zeros(1,numdims);
    sigmainc = zeros(1,numhid);
    batchposhidprobs = zeros(numcases,numhid,numbatches);
end

for epoch = epoch:maxepoch,
    fprintf(1,'epoch %d\r',epoch);
    errsum = 0;
    for batch = 1:numbatches,
        if (mod(batch,100)==0)
            fprintf(1,' %d ',batch);
        end

        %%%%%%%%% START POSITIVE PHASE %%%%%%%%%
        data = batchdata(:,:,batch);
        poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1)));
        batchposhidprobs(:,:,batch) = poshidprobs;
        posprods = data' * poshidprobs;
        poshidact = sum(poshidprobs);
        posvisact = sum(data);
        %%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%
        poshidstates = poshidprobs > rand(numcases,numhid);

        %%%%%%%%% START NEGATIVE PHASE %%%%%%%%%
        negdata = poshidstates*vishid' + repmat(visbiases,numcases,1); % + randn(numcases,numdims) if not using the mean
        neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1)));
        negprods = negdata'*neghidprobs;
        neghidact = sum(neghidprobs);
        negvisact = sum(negdata);
        %%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%
        err = sum(sum( (data-negdata).^2 ));
        errsum = err + errsum;

        if epoch > 5,
            momentum = finalmomentum;
        else
            momentum = initialmomentum;
        end;

        %%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%
        vishidinc = momentum*vishidinc + ...
            epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);
        visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);
        hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);

        vishid = vishid + vishidinc;
        visbiases = visbiases + visbiasinc;
        hidbiases = hidbiases + hidbiasinc;
        %%%%%%%%% END OF UPDATES %%%%%%%%%
    end
    fprintf(1, '\nepoch %4i error %f \n', epoch, errsum);
end
dofacedeepauto.m:
clear all
close all
maxepoch=200; %In the Science paper we use maxepoch=50, but it works just fine.
numhid=2000; numpen=1000; numpen2=500; numopen=30;
fprintf(1,'Pretraining a deep autoencoder. \n');
fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch);
load fdata
%makeFaceData;
[numcases numdims numbatches]=size(batchdata);
fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid);
restart=1;
rbmvislinear;
hidrecbiases=hidbiases;
save mnistvh vishid hidrecbiases visbiases;
maxepoch=50;
fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen);
batchdata=batchposhidprobs;
numhid=numpen;
restart=1;
rbm;
hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases;
save mnisthp hidpen penrecbiases hidgenbiases;
fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2);
batchdata=batchposhidprobs;
numhid=numpen2;
restart=1;
rbm;
hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases;
save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2;
fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen);
batchdata=batchposhidprobs;
numhid=numopen;
restart=1;
rbmhidlinear;
hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases;
save mnistpo hidtop toprecbiases topgenbiases;
backpropface;
Thanks for your time
Silly me, I had forgotten to change the back-propagation fine-tuning script (backprop.m). One has to change the output layer (where the faces get reconstructed) to use real-valued units, i.e.
dataout = w7probs*w8;
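(For anyone else adapting the script: if I remember the original MNIST backprop.m correctly, the corresponding line there applies a logistic nonlinearity, dataout = 1./(1 + exp(-w7probs*w8)); for real-valued pixels the logistic is dropped, making the reconstruction layer linear as above.)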