Let me explain the background before I get to the problem. My task is to take in an image and label it as containing a smile or not. The training files are named, for example, 100a.jpg and 100b.jpg, where 'a' marks an image without a smile and 'b' marks an image with a smile. I'm building a 3-layer network: layer 1 = input nodes, layer 2 = hidden layer, layer 3 = output node.
The general algorithm is to:
Take in an image and resize it to 24x20.
Apply a forward propagation from the input nodes to the hidden layer.
Apply a forward propagation from the hidden layer to the output node.
Then apply a backward propagation from the output node to the hidden layer. (Formula1)
Then apply a backward propagation from the hidden layer to the input nodes. (Formula2)
Formula 1 (output node to hidden layer): thetaLayer23 := thetaLayer23 - alpha * (a3 - y) * a2
Formula 2 (hidden layer to input nodes): thetaLayer12 := thetaLayer12 - alpha * a1 * ((a3 - y) * thetaLayer23(2:end) .* a2(2:end) .* (1 - a2(2:end)))'
where a1 is the bias-augmented input vector, a2 the bias-augmented hidden activations, a3 the network output, and .* denotes element-wise multiplication.
Now the problem, quite simply, is that my code never converges, so I never obtain weight vectors that I can use to test the network. The problem is I have no clue why this is happening. Here is the error I display; it is clearly not converging:
Training done full cycle
0.5015
Training done full cycle
0.5015
Training done full cycle
0.5015
Training done full cycle
0.5038
Training done full cycle
0.5038
Training done full cycle
0.5038
Training done full cycle
0.5038
Training done full cycle
0.5038
Here is my MATLAB code:
function [thetaLayer12,thetaLayer23] = trainSystem()
    % This is just the directory where I read the images from
    files = dir('train1/*jpg');
    filelength = length(files);
    % Here I create my weights between the input layer and the hidden layer,
    % and then from the hidden layer to the output node. The value 481 is
    % used because there are 480 input nodes + 1 bias node. The value 200 is
    % the number of hidden-layer nodes.
    thetaLayer12 = unifrnd(-1, 1, [481,200]);
    thetaLayer23 = unifrnd(-1, 1, [201,1]);
    % Learning rate
    alpha = 0.00125;
    % Initialize convergence error
    globalError = 100;
    while (globalError > 0.001)
        globalError = 0;
        % Run through all the files in my training set. 400 files to be exact.
        for i = 1:filelength
            % Here we find out if the image has a smile in it or not.
            % Images are labeled 1a.jpg, 1b.jpg, where images with an 'a' in
            % the name have no smile and images with a 'b' have a smile.
            y = isempty(strfind(files(i).name,'a'));
            % We read in the image
            imageBig = imread(strcat('train1/',files(i).name));
            % We resize the image to 24x20
            image = imresize(imageBig,[24 20]);
            % I then take the 2D image and map it to a 1D vector
            inputNodes = reshape(image,480,1);
            % A bias value of 1 is added to the top of the vector
            inputNodes = [1;inputNodes];
            % Forward propagation is applied from the input layer to the
            % hidden layer
            outputLayer2 = logsig(double(inputNodes')* thetaLayer12);
            % Here we then add a bias value to the hidden-layer nodes
            inputNodes2 = [1;outputLayer2'];
            % Here we then do a forward propagation from the hidden layer to
            % the output node to obtain a single value.
            finalResult = logsig(double(inputNodes2')* thetaLayer23);
            % Backward propagation is then applied to the weights between the
            % output node and the hidden layer.
            thetaLayer23 = thetaLayer23 - alpha*(finalResult - y)*inputNodes2;
            % Backward propagation is then applied to the weights between the
            % hidden layer and the input nodes.
            thetaLayer12 = thetaLayer12 - (((alpha*(finalResult-y)*thetaLayer23(2:end))'*inputNodes2(2:end))*(1-inputNodes2(2:end))*double(inputNodes'))';
            % I sum the error across each iteration over all the images in
            % the folder
            globalError = globalError + abs(finalResult-y);
            if (i == 400)
                disp('Training done full cycle');
            end
        end
        % I take the average error
        globalError = globalError / filelength;
        disp(globalError);
    end
end
Any help would seriously be appreciated!
The success of training any machine learning algorithm depends heavily on the number of training examples you use. You never said exactly how many training examples you have, but in the case of face detection a huge number of examples would probably be needed (if it would work at all).
Think of it this way: a computer scientist shows you two arrays of pixel intensity values. He tells you which one has a smile in it and which does not. Then he shows you two more and asks you to tell him which one has a smile in it.
Fortunately we can work around this to some extent. You can use an autoencoder or a dictionary learner like sparse coding to find higher-level structure in the data. Instead of the computer scientist showing you pixel intensities, he could show you edges or even body parts. You could then use this as input to your neural network, though a significant number of training examples would probably still be needed (fewer than before).
That analogy was inspired by a talk given by professor Ng of Stanford on unsupervised feature learning.
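For concreteness, the autoencoder route could look roughly like this in MATLAB. This is only a sketch: it assumes the Neural Network Toolbox function trainAutoencoder (R2015b or later) and a matrix X holding one vectorized image per column; hiddenSize is an illustrative choice, not a recommendation from the original post.

% Sketch only: learn higher-level features with an autoencoder, then feed
% those features (instead of raw pixels) into the classifier.
% Assumes X is 480 x numImages (one vectorized 24x20 image per column).
hiddenSize = 100;                          % illustrative feature count
autoenc = trainAutoencoder(X, hiddenSize, ...
    'L2WeightRegularization', 0.004, ...
    'SparsityRegularization', 4, ...
    'SparsityProportion', 0.15);
features = encode(autoenc, X);             % hiddenSize x numImages
% 'features' now plays the role of the edge/part-like inputs described
% above and can replace the raw pixel vectors as network input.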
Related
Encouraged by some success in MNIST classification, I wanted to solve a "real" problem with some neural networks.
The task seems quite easy:
We have:
some x-values (e.g. 1:1:100)
some y-values (e.g. x^2)
I want to train a network with 1 input (for one x-value), one hidden layer, and 1 output (for one y-value).
Here is my basic procedure:
Slice my x-values into batches (e.g. 10 elements per batch)
For each batch, calculate the outputs of the net, then apply backpropagation to obtain the weight and bias updates
After each batch, average the calculated weight and bias updates and actually update the weights and biases
Repeat steps 1-3 multiple times
This procedure worked fine for MNIST, but for the regression it totally fails.
I am wondering if I am doing something fundamentally wrong.
I tried different batch sizes, up to averaging over ALL x-values.
Basically the network does not train well. After manually tweaking the weights and biases (with 2 hidden neurons) I can approximate my y = f(x) quite well, but when the network is supposed to learn the parameters itself, it fails.
When I have just one x/y pair and train the network on it, it trains well for that one specific pair.
Maybe somebody has a hint for me. Am I misunderstanding regression with neural networks?
So far I assume the code itself is okay, since it worked for MNIST and it works for the one-x/y-pair example. I rather think my overall approach (see above) may not be suitable for regression.
Thanks,
Jim
ps: I will post some code tomorrow...
Here comes the code (MATLAB). As I said, it's one hidden layer with two hidden neurons:
% init hyper-parameters
hidden_neurons = 2;
input_neurons = 1;
output_neurons = 1;
learning_rate = 0.5;
batchsize = 50;
sigmoid = @(z) 1./(1+exp(-z)); % logistic activation, defined here so the script is self-contained

% load data (d and v_start hold my raw x- and y-values)
training_data = d(1:100)/100;
training_labels = v_start(1:100)/255;

% init weights
init_randomly = 1;
if init_randomly
    % initialize weights and biases with random numbers between -0.5 and +0.5
    w1 = rand(hidden_neurons,input_neurons)-0.5;
    b1 = rand(hidden_neurons,1)-0.5;
    w2 = rand(output_neurons,hidden_neurons)-0.5;
    b2 = rand(output_neurons,1)-0.5;
else
    % initialize with manually determined values
    w1 = [10;-10];
    b1 = [-3;-0.5];
    w2 = [0.2 0.2];
    b2 = 0;
end

for epochs = 1:2000 % looping over some epochs
    for i = 1:batchsize:length(training_data) % slice training data into batches
        batch_data = training_data(i:min(i+batchsize-1,length(training_data))); % generating training batch
        batch_labels = training_labels(i:min(i+batchsize-1,length(training_data))); % generating training label batch
        % initialize weight updates for next batch
        w2_update = 0;
        b2_update = 0;
        w1_update = 0;
        b1_update = 0;
        for k = 1:length(batch_data) % looping over one single batch
            % extract training sample
            x = batch_data(k); % one single training sample
            y = batch_labels(k); % expected output of that training sample
            % forward pass
            z1 = w1*x+b1; % weighted sum of first layer
            a1 = sigmoid(z1); % activation of first layer (sigmoid)
            z2 = w2*a1+b2; % weighted sum of second layer
            a2 = z2; % activation of second layer (linear)
            % backward pass
            delta_2 = (a2-y); % delta of second layer assuming quadratic cost; the derivative of the linear unit is 1 everywhere
            delta_1 = (w2'*delta_2).*(a1.*(1-a1)); % delta of first layer
            % accumulate the weight and bias updates, averaging over the batch
            w2_update = w2_update + (delta_2*a1') * (1/length(batch_data));
            b2_update = b2_update + delta_2 * (1/length(batch_data));
            w1_update = w1_update + (delta_1*x') * (1/length(batch_data));
            b1_update = b1_update + delta_1 * (1/length(batch_data));
        end
        % actually update the weights; the updated weights will be used in
        % the next batch
        w2 = w2 - learning_rate * w2_update;
        b2 = b2 - learning_rate * b2_update;
        w1 = w1 - learning_rate * w1_update;
        b1 = b1 - learning_rate * b1_update;
    end
end
Here is the outcome with random initialization, showing the expected output, the output before training, and the output after training:
[Figure: training with random init]
One can argue that the blue line is already closer than the black one; in that sense the network has already optimized the results. But I am not satisfied.
Here is the result with my manually tweaked values:
[Figure: training with pre-init]
The black line is not bad for just two hidden neurons, but my expectation was that a black line like that would be the outcome of training starting from random initialization.
Any suggestions what I am doing wrong?
Thanks!
Ok, after some research I found some interesting points:
The function I tried to learn seems particularly hard to learn (not sure why)
With the same setup I tried to learn some 3rd-degree polynomials, which was successful (cost < 1e-6)
Randomizing the order of training samples seems to improve learning (for the polynomial and for my initial function); a minimal sketch is given after this list. I know this is well known in the literature, but I always skipped that part in implementation, so I learned for myself how important it is
For learning "curvy/wiggly" functions, I found sigmoid works better than ReLU (the output layer is still "linear", as suggested for regression)
A learning rate of 0.1 worked fine for the curve fitting I finally wanted to perform
A larger batch size smoothes the cost-vs-epochs plot (surprise...)
Initializing weights between -5 and +5 worked better than between -0.5 and +0.5 for my application
In the end I got quite convincing results for what I intended to learn with the network :)
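For reference, the per-epoch shuffling mentioned in the list is a small change; here is a minimal sketch using the variable names from the code above:

for epochs = 1:2000
    % draw a new random order of the samples at the start of each epoch
    perm = randperm(length(training_data));
    training_data   = training_data(perm);
    training_labels = training_labels(perm);
    % ... batch loop exactly as before ...
end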
Have you tried a much smaller learning rate? Generally, learning rates around 0.001 are a good starting point; 0.5 is in most cases way too large.
Also note that your predefined weights sit in an extremely flat region of the sigmoid function (sigmoid(10) ≈ 1, sigmoid(-10) ≈ 0), with the derivative at both positions close to 0. That means that backpropagating from such a position (or getting to such a position) is extremely difficult. For exactly that reason, some people prefer ReLUs over sigmoid, since a ReLU has a "dead" region only for negative activations.
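You can verify how flat that region is with a one-liner; the sigmoid's derivative at z = 10 comes out to roughly 4.5e-5:

sig = @(z) 1./(1 + exp(-z));  % logistic sigmoid
g = sig(10)*(1 - sig(10))     % sig'(z) = sig(z)*(1-sig(z)); prints ~4.5e-5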
Also, am I correct in seeing that you only have 100 training samples? You could try a smaller batch size, or increase the number of samples you take. And don't forget to shuffle your samples after each epoch; plenty of reasons are given, for example here.
I'm trying to classify a test set using a GMM. I have a training set (an n*4 matrix) with labels {1,2,3}, where n is the number of training examples, each having 4 properties. I also have a test set (m*4) to be classified.
My goal is to get a probability matrix (m*3) giving, for each test example, the probability of each label P(label | x_test), just like soft clustering.
First, I create a GMM with k = 9 components over the whole training set. I know that in some papers the authors create one GMM per label (a sketch of that variant is given after the fitting call below), but I want to model the data from all of the classes together.
GMModel = fitgmdist(trainset,k_component,'RegularizationValue',0.1,'Start','plus');
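For comparison, the per-label approach mentioned above might look roughly like this. This is only a sketch: k_per_class and the variable names GMMs, like, and predicted are illustrative, not from the original post.

k_per_class = 3;                       % illustrative component count per class
labels = [1 2 3];
GMMs = cell(1, numel(labels));
for c = 1:numel(labels)
    % fit one GMM per class on that class's training rows
    GMMs{c} = fitgmdist(trainset(trainset_label==labels(c),:), k_per_class, ...
        'RegularizationValue', 0.1, 'Start', 'plus');
end
% score test examples by the class-conditional likelihood p(x | label)
like = zeros(size(testset,1), numel(labels));
for c = 1:numel(labels)
    like(:,c) = pdf(GMMs{c}, testset);
end
[~, predicted] = max(like, [], 2);     % pick the most likely label per row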
My problem is that I want to estimate the relationship P(component | label) between components and labels. So I wrote the code below, but I'm not sure if it's right:
idx_ex_of_c1 = find(trainset_label==1);
idx_ex_of_c2 = find(trainset_label==2);
idx_ex_of_c3 = find(trainset_label==3);
[~,~,post] = cluster(GMModel,trainset);
cita_c_k = zeros(3,k_component);
for id_k = 1:k_component
    cita_c_k(1,id_k) = sum(post(idx_ex_of_c1,id_k))/numel(idx_ex_of_c1);
    cita_c_k(2,id_k) = sum(post(idx_ex_of_c2,id_k))/numel(idx_ex_of_c2);
    cita_c_k(3,id_k) = sum(post(idx_ex_of_c3,id_k))/numel(idx_ex_of_c3);
end
cita_c_k is a 3*9 matrix that stores these relationships; idx_ex_of_c1 holds the indices of the examples whose label is '1' in the training set.
For the testing process, I first apply the GMModel to the test set:
[P,~] = posterior(GMModel,testset); % P is a m*9 matrix
And then sum over all components:
P_testset = P*cita_c_k';
[a,b] = max(P_testset,[],2);
imagesc(b);
The result is OK, but not good enough. Can anyone give me some tips?
Thanks!
You can take the following steps:
Increase the target error and/or use an optimal network size in training, though over-training and increasing the network size usually won't help
Most important: shuffle the training data while training, and use only representative data points for each label to train (ignore data points that may belong to more than one label)
SEPARABILITY
Verify the separability of the data by checking correlations between the properties (a quick check is sketched below):
The correlation of all data within a label (X) should be high (near one)
The cross-correlation of data in label (X) with data in a label (!=X) should be low (near zero)
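A quick way to run that check in MATLAB; this sketch assumes the trainset/trainset_label variables from the question, with one example per row (the Statistics Toolbox corr function correlates columns, hence the transposes):

X1 = trainset(trainset_label==1,:);    % examples of label 1
X2 = trainset(trainset_label==2,:);    % examples of label 2
% correlate examples (rows) against each other, so transpose before corr
C11 = corr(X1', X1');                  % within-label correlations
C12 = corr(X1', X2');                  % cross-label correlations
fprintf('within label 1: %.2f, across labels 1/2: %.2f\n', ...
    mean(C11(:)), mean(C12(:)));       % want: near 1, near 0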
If you observe that data points within a label have low correlation while data points across labels have high correlation, it calls the selection of properties into question (some properties may simply not make the data separable). If so, do the following:
Add more relevant properties to the data points and remove less relevant ones (PCA is one technique for this)
Train on derived parameters, like the top frequency component, rather than on the direct data points
Use a time-delay network to train on time series (always)
I'm trying to train a neural net that is 289x300x1, i.e. an input vector of 289 elements, 300 hidden neurons, and 1 class output.
So
net = feedforwardnet(300);
net = train(net,X,y,'useParallel','yes','showResources','yes');
gives this error:
Error using nn7/perfsJEJJ>calc_Y_trainPerfJeJJ (line 37)
Error detected on worker 2. Requested 87301x87301 (56.8GB) array exceeds maximum array size preference.
X is an array of size 289x2040 with elements of type double.
y is an array of size 1x2040, also of type double.
I don't understand why MATLAB wants so much memory for such a small task. The weights that need to be stored amount to 289 * 300 doubles at 8 bytes each, which is roughly 0.7 MB.
And how can I solve it?
It is probably due to a combination of a few things:
The number of neurons in your hidden layer is rather large. Are you sure 300 hidden neurons are what you need? Consider breaking the problem down to fewer features; a dimensionality reduction may be fruitful, but I'm just speculating. From experience, a network with 300 hidden neurons should still be fine; I only bring this up because that hidden layer size is rather large.
You have too many inputs going in for training at once. You have 2040 points going in, and that's perhaps why it's breaking. Try breaking the dataset up into chunks of a given size, then incrementally training the network on each chunk.
Let's assume that you can't fix point #1, but you can address point #2. Something like this comes to mind:
chunk_size = 200; %// Declare chunk size
num_chunks = ceil(size(X,2)/chunk_size); %// Get total number of chunks
net = feedforwardnet(300); %// Initialize NN

%// For each chunk, extract out a section of the data, then train the
%// network. Retrain on the original network until we run out of data.
for ii = 1 : num_chunks
    %// Cap off the final chunk in case the data isn't evenly divisible
    %// by the chunk size
    if ii*chunk_size > size(X,2)
        max_val = size(X,2);
    else
        max_val = ii*chunk_size;
    end
    %// Specify portion of data to extract
    interval = (ii-1)*chunk_size + 1 : max_val;
    %// Train the NN on this data
    net = train(net, X(:,interval), y(interval), 'useParallel','yes', 'showResources','yes');
end
As such, break up your data into chunks, train your neural network on each chunk separately, and update the network as you go. You can do this because neural network training basically implements stochastic gradient descent, where the parameters are updated each time a new input sample is provided.
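One more note, offered as an educated guess rather than something stated in the original answer: 87301 = 289*300 + 300 + 300 + 1 is exactly the number of learnable parameters in this network, and 87301^2 doubles is about 56.8 GB, which suggests the huge array is the Jacobian product built by feedforwardnet's default Levenberg-Marquardt ('trainlm') trainer. If that is the cause, a memory-light training function sidesteps it:

% Sketch: use scaled conjugate gradient instead of the default
% Levenberg-Marquardt, which builds a numWeights x numWeights matrix.
net = feedforwardnet(300, 'trainscg');
net = train(net, X, y, 'useParallel','yes', 'showResources','yes');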
I seem to have stumbled upon a small problem while developing an Optical Character Recognition engine. I have trained a k-nearest-neighbour classifier on MNIST images and tested it, and it seems to work fine. However, when I input images of different dimensions, it seems unable to classify the input image correctly.
Any suggestions on how to work around this problem ?
I] KNN Classifier -
The code for KNN classification is:
% herein, I resize the binary image 'b' to have the same dimensions as the
% training set 'trainingImages', as the input and training images
% should have the same no. of columns / dimensions
b = imresize(b, size(trainingImages));
% now I try to classify the input image 'b' against the set of training
% images and training labels.
cls = knnclassify(b, trainingImages, trainingLabels, 3, 'euclidean');
cls is now the classification vector. However, this almost always returns the incorrect classification of 1, regardless of the input image.
On the other hand, when I classify the set of MNIST test images, I get a VERY high level of accuracy! The code for the same is as follows -
class = knnclassify(testImg, trainingImages, trainingLabels, 3, 'euclidean');
Right now the main problem is that no matter what input image I give it to predict, it mostly gives me a wrong result (varying across images), even for very different images. It seems like it is not working correctly. Could someone help me figure out where the problem is? I couldn't find any explanation in existing sources on the internet. Thanks in advance.
I believe I solved the problem which I listed above.
The problems were:
Like Dhanushka said, I was converting the original input image's dimensions to match the dimensions of the whole training image set (which in the case of MNIST was 60000 * 784, i.e. 60000 digits with 784 features each [28 * 28]).
Thus, I simply changed the dimensions of the input image to 28*28.
Pre-processing the input image.
I was simply converting the image to a binary image and trying to classify that against the MNIST training image set. This was an incomplete procedure.
When I additionally detected the edges of the binary input image (Canny, Prewitt or Zerocross, whichever suits you better) and used this for classification, I got an extremely accurate prediction!
NOTE: In KNN classification, you will have to arrive at the number of neighbours (k) to consider by trial and error. I managed to arrive at the following conclusions:
3 neighbours are generally enough for synthetic images
1 neighbour is mostly suitable for handwritten images
The code for the same is as follows:
% herein, I resize the binary image 'b', as the input and training images
% should have the same no. of columns / dimensions
b = imresize(b, [28 28]); % resize the binary image b to 28*28
b = edge(b, 'canny');     % apply Canny edge detection to the resized binary image
b = b(:)';                % convert 'b' to a vector using b(:), then transpose
                          % with the " ' " operator, so 'b' now has the same
                          % no. of dimensions/columns as the MNIST training set
% now I try to classify the input image 'b' against the set of training
% images and training labels.
cls = knnclassify(b, trainingImages, trainingLabels, 3, 'euclidean');
I need to classify a dataset using a MATLAB MLP and show the classification.
The dataset looks like this:
[Image: the dataset]
What I have done so far is:
I have created a neural network containing one hidden layer (two neurons; maybe someone could give me suggestions on how many neurons are suitable for my example) and an output layer (one neuron).
I have used several different learning methods, such as Delta-bar-Delta, backpropagation (both with and without momentum), and Levenberg-Marquardt.
This is the code I used in MATLAB (Levenberg-Marquardt example):
net = newff(minmax(Input),[2 1],{'logsig' 'logsig'},'trainlm');
net.trainParam.epochs = 10000;
net.trainParam.goal = 0;
net.trainParam.lr = 0.1;
[net tr outputs] = train(net,Input,Target);
The following shows the hidden-neuron classification boundaries generated by MATLAB on the data. I am a little bit confused, because the network should produce a nonlinear result, yet the two boundary lines below appear to be linear:
[Image: classification boundaries]
The code for generating the above plot is:
figure(1)
plotpv(Input,Target);
hold on
plotpc(net.IW{1},net.b{1});
hold off
I also need to plot the output function of the output neuron, but I am stuck on this step. Can anyone give me some suggestions?
Thanks in advance.
Regarding the number of neurons in the hidden layer: for such a small example, two are more than enough. The only way to know the optimum for sure is to test with different numbers. In this FAQ you can find a rule of thumb that may be useful: http://www.faqs.org/faqs/ai-faq/neural-nets/
For the output function, it is often useful to compute it in steps:
First, given the input vector x, the pre-activation of the neurons in the hidden layer is y = Wx + b, where W is the weight matrix from the input neurons to the hidden layer and b is the bias vector.
Second, you apply the activation function g of the hidden layer to the resulting vector of the previous step, z = g(y).
Finally, the output is the dot product h = z . v + n (followed by the output neuron's activation, logsig in your network), where v is the weight vector from the hidden layer to the output neuron and n the bias. In the case of more than one output neuron, you repeat this for each one.
I've never used the MATLAB MLP functions, so I don't know offhand how to get the weights in this case, but I'm sure the network stores them somewhere. Edit: searching the documentation, I found these properties:
net.IW numLayers-by-numInputs cell array of input weight values
net.LW numLayers-by-numLayers cell array of layer weight values
net.b numLayers-by-1 cell array of bias values
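Putting those steps together for the network in the question, here is a sketch that evaluates the output neuron over a grid of 2-D inputs. Assumptions: two input features, the logsig/logsig newff network from the question, and no extra input/output preprocessing; adjust the grid limits to your input ranges.

% Sketch: plot the output function of the output neuron over the input plane.
W1 = net.IW{1,1};   % input -> hidden weights (2x2)
b1 = net.b{1};      % hidden biases (2x1)
W2 = net.LW{2,1};   % hidden -> output weights (1x2)
b2 = net.b{2};      % output bias (scalar)

[x1, x2] = meshgrid(linspace(0,1,100), linspace(0,1,100)); % illustrative input range
out = zeros(size(x1));
for i = 1:numel(x1)
    z = logsig(W1*[x1(i); x2(i)] + b1);  % step 1+2: hidden activations
    out(i) = logsig(W2*z + b2);          % step 3: output neuron
end
figure(2)
surf(x1, x2, out, 'EdgeColor', 'none'); % output surface of the output neuron
xlabel('input 1'); ylabel('input 2'); zlabel('network output');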