Is it better to individually normalize all inputs for a neural network?

I'm working on a neural network in Keras with TensorFlow as the backend, and my model takes 5 inputs, all normalized to the range 0 to 1. The inputs' units vary: some are in m/s, some in meters, some in m/s². So, for example, one input could vary from 0 m/s to 30 m/s, while another could vary from 5 m to 200 m in the training dataset.
Is it better to individually and independently normalize all inputs so that I have different scales for each unit/input? Or would normalizing all inputs to one scale (mapping 0-200 to 0-1 for the example above) be better for accuracy?

Normalize each input individually. If you normalize everything by dividing by 200, some inputs will affect your network less than others: an input that varies between 0 and 30 ends up on a 0-0.15 scale after dividing by 200, while an input that varies between 0 and 200 ends up on a 0-1 scale. The 0-30 input then spans a much smaller range, which effectively tells your network that it is less relevant than the 0-200 one.
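To illustrate (a small NumPy sketch with made-up numbers, not your actual data):

import numpy as np

# Toy training matrix: 4 samples, 2 features with very different ranges,
# e.g. a speed in 0-30 m/s and a distance in 5-200 m.
X_train = np.array([[ 0.0,   5.0],
                    [10.0,  50.0],
                    [20.0, 120.0],
                    [30.0, 200.0]])

# Per-feature (column-wise) min-max normalization: each column spans 0-1.
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
print((X_train - col_min) / (col_max - col_min))

# Dividing everything by one global value instead squashes the first
# feature into 0-0.15 while only the second spans the full 0-1 range.
print(X_train / 200.0)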

Related

Can I normalise subsets of training data for a neural network?

Say I have a training set with 50 vectors. I split this set into 5 sets each with 10 vectors and then I scale the vectors in each subset and normalise the subsets. Then I train my ANN with each vector from each subset.
After training is complete, I group my test set into subsets of 10 vectors each, scale the features of the vectors in each subset and normalise each subset and then feed it to the neural network to attempt to classify it.
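Concretely, the scaling I am describing would look roughly like this sketch (NumPy with made-up data, just to illustrate the per-subset statistics):

import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 8))             # 50 training vectors, 8 features each (made up)

# Split into 5 subsets of 10 vectors and standardise each subset
# with its own mean and standard deviation.
scaled_subsets = []
for sub in np.split(train, 5):
    mu, sigma = sub.mean(axis=0), sub.std(axis=0)
    scaled_subsets.append((sub - mu) / sigma)

train_scaled = np.vstack(scaled_subsets)     # each block used different statistics
# (The test set would be split into subsets of 10 and scaled the same way.)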
Is this the right approach? Is it right to scale and normalise each subset, each with its own minimum, maximum, mean and standard deviation?

How to take the difference between the resulting and the correct bucket of a one hot vector into account?

Hi, I am using TensorFlow at my university to try to classify the steering angles of a simulation program using only the images the simulation produces.
The steering angles are values from -1 to 1, and I separated them into 50 "buckets", so the first value of my prediction vector means that the predicted steering angle is between -1 and -0.96.
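In other words, each bucket is 2/50 = 0.04 wide; a minimal sketch of this bucketing (NumPy, with illustrative names, not my actual code) looks like:

import numpy as np

NUM_BUCKETS = 50                         # buckets of width 2 / 50 = 0.04 over [-1, 1]

def angle_to_one_hot(angle):
    # Map an angle in [-1, 1] to its bucket index, clamping 1.0 into the last bucket.
    idx = min(int((angle + 1.0) / 2.0 * NUM_BUCKETS), NUM_BUCKETS - 1)
    one_hot = np.zeros(NUM_BUCKETS, dtype=np.float32)
    one_hot[idx] = 1.0
    return one_hot

print(angle_to_one_hot(-0.98))           # lands in the first bucket, [-1, -0.96)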
The following shows the classification and optimization functions I am using.
# Cross-entropy over the 50 buckets; the named arguments avoid the logits/labels ordering pitfall.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)
y is a vector with 49 zeros and a single 1 for the correct bucket. My question is:
How do I take into account that if, for example, the correct bucket is at index 25, a prediction of 26 is much better than a prediction of 48?
I didn't post the actual network since it is just a couple of conv2d and maxpool layers with a fully connected layer at the end.
Since you are applying cross entropy (negative log likelihood), you are penalizing the system based only on the predicted output and the ground truth.
Say your system predicted different scores over the 50 output classes and the highest one was class 25, but your ground truth is class 26. Your system will take the value predicted for class 26 and adapt the parameters so that this output is the highest the next time it sees this input; how close class 25 was to class 26 plays no role.
You could do two basic things:
Change your y and prediction to be scalars in the range -1..1 and make the loss function something like (y - prediction)**2. A very different model, but perhaps more reasonable than the one-hot.
Keep the one-hot target and loss, but use y = target*w, where w is a constant matrix that is mostly zeros, with 1s on the diagonal and smaller values on the neighbouring diagonals (e.g. y(i) = target(i) * 1. + target(i-1) * .5 + target(i+1) * .5 + ...). It is kind of gross, but it should converge to something reasonable; a sketch of this smoothing is shown below.
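Here is a minimal NumPy sketch of that second option (the 0.5 smoothing weights and all names are illustrative, not from the original code):

import numpy as np

NUM_BUCKETS = 50

# Constant smoothing matrix: 1s on the diagonal, 0.5 on the
# immediately adjacent diagonals, zeros elsewhere.
w = np.eye(NUM_BUCKETS) + 0.5 * np.eye(NUM_BUCKETS, k=1) + 0.5 * np.eye(NUM_BUCKETS, k=-1)

# One-hot target for the correct bucket, e.g. index 25.
target = np.zeros(NUM_BUCKETS, dtype=np.float32)
target[25] = 1.0

# Soft target: mass on bucket 25 and its neighbours, so near misses
# are penalized less than far-off predictions.
y_soft = target @ w
print(y_soft[24:27])   # [0.5, 1.0, 0.5]
# (Optionally renormalize y_soft to sum to 1 before feeding it to softmax cross entropy.)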

How can I efficiently find the accuracy of a classifier

Even with a simple classifier like the nearest neighbour I cannot seem to judge its accuracy and thus cannot improve it.
For example with the code below:
% Index of the nearest training example for each test example
IDX = knnsearch(train_image_feats, test_image_feats);
predicted_categories = cell([size(test_image_feats, 1), 1]);
for i = 1:size(IDX, 1)
    predicted_categories{i} = train_labels(IDX(i));
end
Here train_image_feats is a 300-by-256 matrix where each row represents an image; test_image_feats has the same structure. train_labels holds the label corresponding to each row of the training matrix.
The book I am following simply said that the above method achieves an accuracy of 19%.
How did the author come to this conclusion? Is there any way to judge the accuracy of my results be it with this classifier or other?
The author then uses another method of feature extraction and says it improved accuracy by 30%.
How can I find the accuracy? Be it graphically or just via a simple percentage.
Accuracy when doing machine learning and classification is usually calculated by comparing your predicted outputs from your classifier in comparison to the ground truth. When you're evaluating the classification accuracy of your classifier, you will have already created a predictive model using a training set with known inputs and outputs. At this point, you will have a test set with inputs and outputs that were not used to train the classifier. For the purposes of this post, let's call this the ground truth data set. This ground truth data set helps assess the accuracy of your classifier when you are providing inputs to this classifier that it has not seen before. You take your inputs from your test set, and run them through your classifier. You get outputs for each input and we call the collection of these outputs the predicted values.
For each predicted value, you compare to the associated ground truth value and see if it is the same. You add up all of the instances where the outputs match up between the predicted and the ground truth. Adding all of these values up, and dividing by the total number of points in your test set yields the fraction of instances where your model accurately predicted the result in comparison to the ground truth.
In MATLAB, this is really simple to calculate. Suppose the categories for your model are enumerated from 1 to N, where N is the total number of labels you are classifying with. Let groundTruth be your vector of ground truth labels, and let predictedLabels be the labels generated by your classifier. The accuracy is simply calculated by:
accuracy = sum(groundTruth == predictedLabels) / numel(groundTruth);
accuracyPercentage = 100*accuracy;
The first line of code calculates the accuracy of your model as a fraction. The second line expresses it as a percentage by multiplying the first result by 100. You can use either one when assessing accuracy; one is just normalized to [0,1] while the other runs from 0% to 100%. What groundTruth == predictedLabels does is compare each element of groundTruth with the corresponding element of predictedLabels: if the ith value of groundTruth matches the ith value of predictedLabels, it outputs a 1, otherwise a 0. This produces a vector of 0s and 1s, so we simply add up all of the 1s with the sum operation and then divide by the total number of points in the test set to obtain the final accuracy of the classifier.
With a toy example, supposing I had 4 labels, and my groundTruth and predictedLabels vectors were this:
groundTruth = [1 2 3 2 3 4 1 1 2 3 3 4 1 2 3];
predictedLabels = [1 2 2 4 4 4 1 2 3 3 4 1 2 3 3];
The accuracy using the above vectors gives us:
>> accuracy
accuracy =
0.4000
>> accuracyPercentage
accuracyPercentage =
40
This means that we have a 40% accuracy, or an accuracy of 0.40. In this example, the predictive model was only able to accurately classify 40% of the test set when each test input was put through the classifier. This makes sense, because between our predicted outputs and the ground truth only 6 of the 15 outputs, or 40%, match up: the 1st, 2nd, 6th, 7th, 10th and 15th elements. There are other ways of evaluating a classifier, such as ROC curves, but for plain accuracy in machine learning, this is what is usually done.

normalization and non-normalization in Neural Network modeling in MATLAB

I have a data set with 5 inputs and one output. I want to train a neural network model on this data set in MATLAB. When I input the data without normalization, the MSE is very large, around 1e+3. But when I normalize the input data, the output error becomes around 1e-4. So, as I understand it, normalization is an important step.
My 2 questions:
1- My real target (output) before the training process is in the range [0, 1000] or [50, 800], but after normalization the neural network gives me a value in the range [0, 1]. I mean, I cannot get any value in my real range [0, 1000] or [50, 800]. How can I convert the network's output back to its correct target range of [0, 1000] or [50, 800]? Is it logical to do this? When my real (target) output should be in [0, 1000] or [50, 800], what can I do with values in the range [0, 1]?
2- I want to test the trained NN with one new input pattern. Since I normalized the input data in the training phase, this new input pattern should also be normalized, yes? How? My training data consisted of around 1000 samples and I normalized them with (x-min)/(max-min). Should I normalize the single new input pattern with the same min and max?
Well, assuming the normalization is linear (it probably is), you can take the outputs and invert the normalization: multiply by the original target range (max - min) and add the minimum back, which for a [0, 1000] target simply means multiplying by 1000 (and rounding if the targets are integers).
You can also play around with different transfer functions. The sigmoid has many desirable properties, but it cannot produce anything larger than 1 (which is sort of necessary if your outputs run up to 1000). I think the last layer often has a linear transfer function, but since your outputs are this large, it may not be sufficient in this particular case.
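To illustrate the idea (a NumPy sketch rather than MATLAB, with made-up data and names): store the training min and max once, reuse them to normalize any new input pattern, and invert them to map the network's output back to the real target range.

import numpy as np

# Made-up training data: 1000 samples, 5 inputs, targets in [0, 1000].
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 200, size=(1000, 5))
t_train = rng.uniform(0, 1000, size=(1000, 1))

# Statistics computed once, on the training data only.
x_min, x_max = X_train.min(axis=0), X_train.max(axis=0)
t_min, t_max = t_train.min(), t_train.max()

X_norm = (X_train - x_min) / (x_max - x_min)     # inputs in [0, 1]
t_norm = (t_train - t_min) / (t_max - t_min)     # targets in [0, 1]

# (1) Map a normalized network output back to the real target range.
y_pred_norm = 0.42                               # placeholder for a network output
y_pred_real = y_pred_norm * (t_max - t_min) + t_min

# (2) A new input pattern must be normalized with the *training* min and max.
x_new = rng.uniform(0, 200, size=(1, 5))
x_new_norm = (x_new - x_min) / (x_max - x_min)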

RBF neural network parameters size

I want to define a function approximation with an RBF neural network in MATLAB.
An RBF network needs three parameters: "unit centers", "sigma" and "weight". I have a dataset with 1000 records and 10 features.
First question: should these three parameters be in array format, or can they be in matrix format?
Second question: I defined the "unit centers" by k-means clustering over the dataset, which gives three cluster centers.
For the "sigma" and "weight" parameters, should I define a matrix of the same size as the "unit centers"?
The unit centers are a 3×10 matrix. Should the other two RBF parameters also be assigned a 3×10 size, or can I define them as 1×10 or 2×10?
The centers are of course in the form of a matrix: you have 10 features, you calculate the centers by distances in these 10 dimensions, and you have more than one center, so it is a matrix of shape (#centers, #features).
Sigma is just a single number for each center, so it has shape (#centers, 1), i.e. a 1D array.
The weights depend on the size of the hidden layer (the centers); with one output neuron they have shape (#centers, 1), again a 1D array.
One last thing to mention: the number of centers is small compared to your input size of 1000. Try 100, 200 or even 500 centers if you do not get good accuracy on the test set.
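To illustrate the shapes (a Python/NumPy sketch with scikit-learn's KMeans and made-up data, rather than your MATLAB setup):

import numpy as np
from sklearn.cluster import KMeans

n_samples, n_features, n_centers = 1000, 10, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, n_features))        # 1000 records, 10 features

# Unit centers from k-means: shape (#centers, #features) = (3, 10).
centers = KMeans(n_clusters=n_centers, n_init=10, random_state=0).fit(X).cluster_centers_

# One sigma per center: shape (#centers, 1); here just a fixed width.
sigmas = np.full((n_centers, 1), 1.0)

# One weight per center for a single output neuron: shape (#centers, 1).
weights = rng.normal(size=(n_centers, 1))

# RBF forward pass: hidden activations (n_samples, n_centers), then the output.
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)    # (1000, 3)
hidden = np.exp(-(dists ** 2) / (2.0 * sigmas.T ** 2))                 # (1000, 3)
output = hidden @ weights                                              # (1000, 1)
print(centers.shape, sigmas.shape, weights.shape, output.shape)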