How to use training data and learning rate in perceptron learning procedure?

How to use training data and learning rate in perceptron learning procedure? - neural-network

I am implementing perceptron learning algorithm in python and unable to decide if I need to add a value of 1 to each training data or to the bias when working on weights.
For example if the training data is -
[7.627531214,2.759262235]
[5.332441248,2.088626775]
[6.922596716,1.77106367]
[8.675418651,-0.242068655]
[7.673756466,3.508563011]
Do I need to add value of 1 to the training data as below and why?-
[7.627531214,2.759262235,1]
[5.332441248,2.088626775,1]
[6.922596716,1.77106367,1]
[8.675418651,-0.242068655,1]
[7.673756466,3.508563011,1]
Instead of adding value 1 to the training data, can I not add variable (for example bias), assign it value 1 to use it with weights. For example
min_weight = 0
max_weight = 5
bias = 1
weights = [bias, min_weight, max_weight]
Do we need to implement learning rate in perceptron and if yes then can I use delta rule and dotproduct method for learning rate in perception learning procedure?

Related

How to apply the learned model in Matlab after cross-validation

Once the classifier is trained and tested using cross-validation approach, how does one use the results to validate on an unseen data especially during free running stage / deployment stage? How does one use the learned model? the following code trains and tests the data X using cross-validation. How am I supposed to use the learned model after the line pred = predict(svmModel, X(istest,:)); is computed?
part = cvpartition(Y,'Holdout',0.5);
istrain = training(part); % Data for fitting
istest = test(part); % Data for quality assessment
balance_Train=tabulate(Y(istrain))
NumbTrain = sum(istrain); % Number of observations in the training sample
NumbTest = sum(istest);
svmModel = fitcsvm(X(istrain,:),Y(istrain), 'KernelFunction','rbf');
pred = predict(svmModel, X(istest,:));
% compute the confusion matrix
cmat = confusionmat(Y(istest),pred);
acc = 100*sum(diag(cmat))./sum(cmat(:))

The clue's in the name:
predict
Predict labels using support vector machine (SVM) classifier
Syntax
label = predict(SVMModel,X)
[label,score] = predict(SVMModel,X)
Description
label = predict(SVMModel,X) returns a vector of predicted class labels
for the predictor data in the table or matrix X, based on the trained
support vector machine (SVM) classification model SVMModel. The
trained SVM model can either be full or compact.
In the code in your question, the code from pred = ... onwards is there to evaluate the predictions made by your svmModel object. However you can take the same object and use it to make predictions with further input dataset(s) - or, better, train a second model using all the data, and use that model for making actual predictions on new, unknown inputs.
You seem to be unclear on the role of (cross-)validation in model building. You should build your deployment model using the whole dataset (X, as per your comment), because as a rule more data always gives you a better model. To estimate how good this deployment model will be, you build one or more models from subsets of X and test each model against the rest of X that wasn't in that model's training subset. If you only do this once, this is called holdout validation; if you use multiple subsets and average the outcomes it's cross-validation.
If it's important to you for some reason that the deployed model is exactly the same one that you used to obtain your validation results, then you can deploy the model that was trained on the training partition of your holdout. But as I said, more training data usually results in a better model.

How to use Deep Neural Networks for regression?

I wrote this script (Matlab) for classification using Softmax. Now I want to use same script for regression by replacing the Softmax output layer with a Sigmoid or ReLU activation function. But I wasn't able to do that.
X=houseInputs ;
T=houseTargets;
%Train an autoencoder with a hidden layer of size 10 and a linear transfer function for the decoder. Set the L2 weight regularizer to 0.001, sparsity regularizer to 4 and sparsity proportion to 0.05.
hiddenSize = 10;
autoenc1 = trainAutoencoder(X,hiddenSize,...
'L2WeightRegularization',0.001,...
'SparsityRegularization',4,...
'SparsityProportion',0.05,...
'DecoderTransferFunction','purelin');
%%
%Extract the features in the hidden layer.
features1 = encode(autoenc1,X);
%Train a second autoencoder using the features from the first autoencoder. Do not scale the data.
hiddenSize = 10;
autoenc2 = trainAutoencoder(features1,hiddenSize,...
'L2WeightRegularization',0.001,...
'SparsityRegularization',4,...
'SparsityProportion',0.05,...
'DecoderTransferFunction','purelin',...
'ScaleData',false);
features2 = encode(autoenc2,features1);
%%
softnet = trainSoftmaxLayer(features2,T,'LossFunction','crossentropy');
%Stack the encoders and the softmax layer to form a deep network.
deepnet = stack(autoenc1,autoenc2,softnet);
%Train the deep network on the wine data.
deepnet = train(deepnet,X,T);
%Estimate the deep network, deepnet.
y = deepnet(X);

Regression is a different problem from classification. You have to change your loss function to something that fits with a regression e.g. mean square error and of course change the number of neuron to one (you will only ouput 1 value on your last layer).

It is possible to use a Neural Network to perform a regression task but it might be an overkill for many tasks. True regression means to perform a mapping of one set of continuous inputs to another set of continuous outputs:
f: x -> ý
Changing the architecture of a neural network to make it perform a regression task is usually fairly simple. Instead of mapping the continuous input data to a specific class as it is done using the Softmax function as in your case, you have to make the network use only a single output node.
This node will just sum the outputs of the the previous layer (last hidden layer) and multiply the summed activations by 1. During the training process this output ý will be compared to the correct ground-truth value y that comes with your dataset. As a loss function you may use the Root-means-squared-error (RMSE).
Training such a network will result in a model that maps an arbitrary number of independent variables x to a dependent variable ý, which basically is a regression task.
To come back to your Matlab implementation, it would be incorrect to change the current Softmax output layer to be an activation function such as a Sigmoid or ReLU. Instead your would have to implement a custom RMSE output layer for your network, which is fed with the sum of activations coming from the last hidden layer of your network.

Gradient checking in backpropagation

I'm trying to implement gradient checking for a simple feedforward neural network with 2 unit input layer, 2 unit hidden layer and 1 unit output layer. What I do is the following:
Take each weight w of the network weights between all layers and perform forward propagation using w + EPSILON and then w - EPSILON.
Compute the numerical gradient using the results of the two feedforward propagations.
What I don't understand is how exactly to perform the backpropagation. Normally, I compare the output of the network to the target data (in case of classification) and then backpropagate the error derivative across the network. However, I think in this case some other value have to be backpropagated, since in the results of the numerical gradient computation are not dependent of the target data (but only of the input), while the error backpropagation depends on the target data. So, what is the value that should be used in the backpropagation part of gradient check?

Backpropagation is performed after computing the gradients analytically and then using those formulas while training. A neural network is essentially a multivariate function, where the coefficients or the parameters of the functions needs to be found or trained.
The definition of a gradient with respect to a specific variable is the rate of change of the function value. Therefore, as you mentioned, and from the definition of the first derivative we can approximate the gradient of a function, including a neural network.
To check if your analytical gradient for your neural network is correct or not, it is good to check it using the numerical method.
For each weight layer w_l from all layers W = [w_0, w_1, ..., w_l, ..., w_k]
For i in 0 to number of rows in w_l
For j in 0 to number of columns in w_l
w_l_minus = w_l; # Copy all the weights
w_l_minus[i,j] = w_l_minus[i,j] - eps; # Change only this parameter
w_l_plus = w_l; # Copy all the weights
w_l_plus[i,j] = w_l_plus[i,j] + eps; # Change only this parameter
cost_minus = cost of neural net by replacing w_l by w_l_minus
cost_plus = cost of neural net by replacing w_l by w_l_plus
w_l_grad[i,j] = (cost_plus - cost_minus)/(2*eps)
This process changes only one parameter at a time and computes the numerical gradient. In this case I have used the (f(x+h) - f(x-h))/2h, which seems to work better for me.
Note that, you mentiond: "since in the results of the numerical gradient computation are not dependent of the target data", this is not true. As when you find the cost_minus and cost_plus above, the cost is being computed on the basis of
The weights
The target classes
Therefore, the process of backpropagation should be independent of the gradient checking. Compute the numerical gradients before backpropagation update. Compute the gradients using backpropagation in one epoch (using something similar to above). Then compare each gradient component of the vectors/matrices and check if they are close enough.

Whether you want to do some classification or have your network calculate a certain numerical function, you always have some target data. For example, let's say you wanted to train a network to calculate the function f(a, b) = a + b. In that case, this is the input and target data you want to train your network on:
a b Target
1 1 2
3 4 7
21 0 21
5 2 7
...
Just as with "normal" classification problems, the more input-target pairs, the better.

Improve the accuracy performance on SVM

I am working on people detecting using two different features HOG and LBP. I used SVM to train the positive and negative samples. Here, I wanna ask how to improve the accuracy of SVM itself? Because, everytime I added up more positives and negatives sample, the accuracy is always decreasing. Currently my positive samples are 1500 and negative samples are 700.
%extract features
[fpos,fneg] = features(pathPos, pathNeg);
%train SVM
HOG_featV = loadingV(fpos,fneg); % loading and labeling each training example
fprintf('Training SVM..\n');
%L = ones(length(SV),1);
T = cell2mat(HOG_featV(2,:));
HOGtP = HOG_featV(3,:)';
C = cell2mat(HOGtP); % each row of P correspond to a training example
%extract features from LBP
[LBPpos,LBPneg] = LBPfeatures(pathPos, pathNeg);
LBP_featV = loadingV(LBPpos, LBPneg);
LBPlabel = cell2mat(LBP_featV(2,:));
LBPtP = LBP_featV(3,:);
M = cell2mat(LBPtP)'; % each row of P correspond to a training example
featureVector = [C M];
model = svmlearn(featureVector, T','-t 2 -g 0.3 -c 0.5');
Anyone knows how to find best C and Gamma value for improving SVM accuracy?
Thank you,

To find best C and Gamma value for improving SVM accuracy you typically perform cross-validation. In sum you can leave-one-out (1 sample) and test the VBM for that sample using the different parameters (2 parameters define a 2d grid). Typically you would test each decade of the parameters for a certain range. For example: C = [0.01, 0.1, 1, ..., 10^9]; G= [1^-5, 1^-4, ..., 1000]. This should also improve your SVM accuracy by optimizing the hyper-parameters.
By looking again to your question it seems you are using the svmlearn of the machine learning toolbox (statistics toolbox) of Matlab. Therefore you have already built-in functions for cross-validation. Give a look at: http://www.mathworks.co.uk/help/stats/support-vector-machines-svm.html

I followed ASantosRibeiro's method to optimize the parameters before and it works well.
In addition, you could try to add more negative samples until the proportion of the negative and positive reach 2:1. The reason is that when you implement real-time application, you should scan the whole image step by step and commonly the negative extracted samples would be much more than the people-contained samples.
Thus, add more negative training samples is a quite straightforward but effective way to improve overall accuracy(Both false positive and true negative).

Connecting perceptrons with output of previous ones?

Because of the help I received and researched here I was able to create a simple perceptron in C#, code of which goes like:
int Input1 = A;
int Input2 = B;
//weighted sum
double WSum = A * W1 + B * W2 + Bias;
//get the sign: -1 for negative, +1 for positive
int Sign=Math.Sign(WSum);
double error = Desired - Sign;
//updating weights
W1 += error * Input1 * 0.1; //0.1 being a learning rate
W2 += error * Input2 * 0.1;
return Sign;
I do not use Sigmoid here and just get -1 or 1.
I would have two questions:
1) Is that correct that my weights get values like -5 etc? When input is e.g. 100,50 it goes like: W1+=error*100*0.1
2) I want to proceed deeper and create more connected neurons - I guess I would need at least two to provide inputs to the third. Is that correct that the third will be fed with values just -1..1? I am aiming to a simple pattern recognition but so far do not understand how it should work.

It is perfectly valid that the values of your weights range from -Infinity to +Infinity. You should always use real numbers instead of integers (so as mentioned above, double will work. 32 bit floats precision is perfectly sufficient for neural networks).
Moreover, you should decay your learning rate with every learning step, e.g. reduce it by a factor of 0.99 after each update. Else, your algorithm will oscillate when approaching an optimum.
If you want to go "deeper", you will need to implement a Multilayer Perceptron (MLP). There exists a proof that a neural network with simple threshold neurons and multiple layers alsways has an equivalent with only 1 layer. This is why several decades ago the research community temporarily abandoned the idea of artificial neural networks. 1986, Geoffrey Hinton made the Backpropagation algorithm popular. With it you can train MLPs with multiple hidden layers.
To solve non-linear problems like XOR or other complex problems like pattern recognition, you need to apply a non-linear activation function. Have a look at the logistic sigmoid activation function for a start. f(x) = 1. / (1. + exp(-x)). When doing this you should normalize your input as well as your output values to the range [0.0; 1.0]. This is especially important for the output neurons since the output of the logistic sigmoid activation function is defined in exactly this range.
A simple Python implementation of feed-forward MLPs using arrays can be found in this answer.
Edit: You also need at least 1 hidden layer to solve e.g. XOR.

Try to set your weights as double.
Also i think it's much better to work with arrays, especially in neural networks and perceptron is the only way.
And you will need some for or while loops to succeed what you want.