Data Normalization: features with the same unit but different ranges - neural-network

In my neural network all features have the same unit (centimeters),
but the ranges of some features are very different, e.g.:
range of feature1: 5cm to 6cm
range of feature2: 12.5cm to 15cm
range of feature3: 5.5cm to 12.5cm
Is normalization useful in this case?

Yes, of course it's useful: you have to normalize your inputs and outputs; there is not much choice in that.
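For example, here is a minimal sketch (plain NumPy, with made-up sample values) of min-max scaling, which maps each centimeter feature to [0, 1] regardless of its original range:

    import numpy as np

    # Hypothetical samples: columns are feature1, feature2, feature3 (all in cm)
    X = np.array([
        [5.0, 12.5,  5.5],
        [5.5, 14.0,  9.0],
        [6.0, 15.0, 12.5],
    ])

    # Min-max scaling: each column is mapped to [0, 1] independently
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    X_scaled = (X - col_min) / (col_max - col_min)

    print(X_scaled)  # every feature now spans the same [0, 1] range

Whether you use min-max scaling or z-scores is mostly a matter of your activation function and training method; the point is that the features end up on comparable scales.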

Related

Applying Feature Scaling in a Neural Network

I have two questions:
Do I have to apply feature scaling to ALL features in a neural network (and in deep learning too)?
How can I scale categorical features in a dataset for a neural network (if needed)?
It depends on what you are trying to do. You could:
Use one-hot encoding for categorical features, to create numeric values.
For numerical features, divide each value by the maximum value, for example; then you get values in [0, 1], which is a good range to feed to a neural network.
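As an illustration of both points, here is a minimal pandas sketch (the column names and values are made up):

    import pandas as pd

    # Hypothetical dataset: one categorical and one numerical column
    df = pd.DataFrame({
        "color": ["red", "green", "blue", "green"],  # categorical
        "length_cm": [5.0, 12.5, 15.0, 7.5],         # numerical
    })

    # One-hot encode the categorical feature into 0/1 indicator columns
    encoded = pd.get_dummies(df, columns=["color"])

    # Scale the numerical feature into [0, 1] by dividing by its maximum
    encoded["length_cm"] = encoded["length_cm"] / encoded["length_cm"].max()

    print(encoded)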

Training data range for Neural Network

Is it better for a neural network to use a smaller range of training data, or does it not matter? For example, if I want to train an ANN with angles (float values), should I pass those values in degrees [0; 360], in radians [0; 6.28], or should all values be normalized to the range [0; 1]? Does the range of the training data affect the ANN's learning quality?
My neural network has 6 input neurons, 1 hidden layer, and I am using a symmetric sigmoid activation function (tanh).
For the neural network it doesn't matter whether the data is normalised.
However, the performance of the training method can vary a lot.
In a nutshell: the methods typically favour variables with larger values, which can send the training method off-track.
Crucial for most NN training methods is that all dimensions of the training data have the same domain. If all your variables are angles, it doesn't matter whether they are in [0,1), [0,2*pi) or [0,360), as long as they share the same domain. However, you should avoid having one variable for the angle in [0,2*pi) and another variable for a distance in mm, where the distance can be much larger than 2000000 mm.
Two cases where an algorithm might suffer in these cases:
(a) regularisation: if the weights of the NN are forced to be small, a tiny change in a weight attached to the large-domain variable has a much larger impact than one attached to a small-domain variable.
(b) gradient descent: if the step size is limited, you get similar effects.
Recommendation: all variables should have the same domain size; whether it is [0,1] or [0,2*pi] or ... doesn't matter.
Addition: for many domains, "z-score normalisation" works extremely well.
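For reference, here is a minimal NumPy sketch of z-score normalisation on hypothetical data with very different domains (an angle in radians and a distance in mm):

    import numpy as np

    # Hypothetical training data: column 0 is an angle in [0, 2*pi),
    # column 1 is a distance in mm with a much larger domain
    X = np.array([
        [0.1, 1500000.0],
        [3.1,  900000.0],
        [5.9, 2000000.0],
        [1.6,  400000.0],
    ])

    # z-score normalisation: zero mean, unit standard deviation per column
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

    print(X_norm.mean(axis=0))  # approximately 0 for every column
    print(X_norm.std(axis=0))   # approximately 1 for every column

After this step both variables live on the same scale, so neither one dominates the regularisation term or the gradient steps.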
The range of the data points affects the way you train a model. Suppose the range of values for features in the data set is not normalized. Then, depending on your data, you may end up having elongated ellipses for the data points in the feature space, and the learning model will have a very hard time learning the manifold on which the data points lie (learning the underlying distribution). Also, in most cases the data points are sparsely spread in the feature space if not normalized (see this). So, the take-home message is to normalize the features when possible.
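To make the "elongated ellipse" point concrete, here is a small hypothetical sketch comparing per-feature spread before and after normalisation; a large ratio between the per-feature standard deviations corresponds to a stretched-out point cloud in feature space:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 2-D data: feature 0 spans [0, 1], feature 1 spans [0, 1000],
    # so the point cloud is extremely elongated along feature 1
    X = np.column_stack([rng.uniform(0, 1, 500), rng.uniform(0, 1000, 500)])
    print(X.std(axis=0))        # roughly [0.29, 290]: a ~1000x spread ratio

    # After z-score normalisation the cloud is roughly isotropic
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
    print(X_norm.std(axis=0))   # [1, 1]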

Can I use more than ten inputs with a single-layer neural network to separate data into two categories?

I have pattern data with 12 input values, and I want to separate these data into two categories. Can anyone tell me whether this is possible with a single-layer neural network with 12 inputs plus a bias term? I implemented it in MATLAB, but I am unsure about the best initial weight values (range) and learning rate. Can you please guide me on these points?
Is a single layer enough?
Whether a single hidden layer suffices to correctly label your input data depends on the complexity of your data. You should empirically try different topologies (combinations of layers and number of neurons) until you discover a setting that works for you.
What are the best weight ranges?
The recommended weight range depends on the activation function you intend to use. For the sigmoid function, it is a small interval centered around 0, e.g. [-0.1, 0.1].
What is the ideal learning rate?
The learning rate is often set to a small value such as 0.03, but if the data is easily learned by your network you can often increase it drastically, e.g. to 0.3. Check out this discussion on how learning rates affect the learning process: https://stackoverflow.com/a/11415434/1149632
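To make the two points above concrete, here is a minimal NumPy sketch (the question uses MATLAB, but the idea carries over) of initialising the weights of a single unit in a small interval around 0 and applying a learning rate in a per-example update; the sizes, values and helper names are hypothetical:

    import numpy as np

    rng = np.random.default_rng(42)

    n_inputs = 12           # 12 input values, plus a bias term
    learning_rate = 0.03    # small, conservative starting point

    # Initial weights drawn uniformly from a small interval around 0
    weights = rng.uniform(-0.1, 0.1, size=n_inputs + 1)  # index 0 is the bias

    def predict(x, weights):
        # Single unit with a tanh activation; x has n_inputs entries
        return np.tanh(weights[0] + np.dot(weights[1:], x))

    def update(x, target, weights, lr=learning_rate):
        # One gradient step on a single example (squared-error loss)
        y = predict(x, weights)
        grad_z = (target - y) * (1.0 - y ** 2)  # derivative of tanh
        weights[0] += lr * grad_z               # bias update
        weights[1:] += lr * grad_z * x          # input weight updates
        return weights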
A side note
You should search the Web for a few pointers and tips, and post more to-the-point questions on Stack Overflow.
Check this out:
http://www.willamette.edu/~gorr/classes/cs449/intro.html

Applying z-score (zero mean, unit std) before scaling to [0,1]?

I'm currently using a neural network for classification of a dataset. Of course, before doing classification either the data points or the features should be normalized. The toolbox which I'm using for the neural network requires all values to be in the range [0,1].
Does it make sense to first apply z-score (zero mean and unit standard deviation) and then to scale to range [0,1]?
Second, should I normalize along the feature vectors or the data points (either applying z-score or to range [0,1])?
You certainly need to normalize, however, some of these questions will depend on your application.
First: scaling does not change the result of the z-score, since the z-score is expressed in units of the standard deviation and is therefore invariant to such rescaling. However, z-scores are not bounded, so if you decide to use them you will need to rescale afterwards to get the values into the range [0,1].
Second: I don't understand the distinction between normalizing the features vs. the data points. Your choice of basis is up to you. Whatever data points you plan to feed into your algorithm need to be normalized to [0,1]; how you get them into that range depends heavily on your context.
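As a concrete sketch of the "z-score first, then rescale to [0,1]" pipeline (plain NumPy, applied per feature column; the values are made up):

    import numpy as np

    # Hypothetical data: rows are samples, columns are features
    X = np.array([
        [5.0, 12.5],
        [5.5, 14.0],
        [6.0, 15.0],
        [5.2, 13.0],
    ])

    # Step 1: z-score per feature (zero mean, unit standard deviation)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: z-scores are unbounded, so rescale each feature to [0, 1]
    Z01 = (Z - Z.min(axis=0)) / (Z.max(axis=0) - Z.min(axis=0))

    print(Z01.min(axis=0), Z01.max(axis=0))  # [0, 0] and [1, 1]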

Pattern recognition techniques that allow input as sequences of different lengths

I am trying to classify water end-use events, expressed as time-series sequences, into appropriate categories (e.g. toilet, tap, shower, etc.). My first attempt using an HMM shows quite a promising result, with an average accuracy of 80%. I wonder if there are any other techniques that allow the training input to be time-series sequences of different lengths, as an HMM does, rather than an extracted feature vector for each sequence. I have tried Conditional Random Fields (CRF) and SVMs; however, as far as I know, these two techniques require the input as a pre-computed feature vector, and the length of all input vectors must be the same for training purposes. I am not sure if I am right or wrong at this point. Any help would be appreciated.
Thanks, Will
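For reference, the variable-length training the question already uses for HMMs typically looks like the following in code; this is a minimal sketch assuming the hmmlearn package, which takes the sequences concatenated together plus a list of per-sequence lengths (the flow values are made up):

    import numpy as np
    from hmmlearn import hmm

    # Three hypothetical water end-use events of different lengths,
    # each a sequence of 1-D flow measurements
    event1 = np.array([[0.2], [0.5], [0.4]])
    event2 = np.array([[1.1], [1.3], [1.2], [1.0], [0.9]])
    event3 = np.array([[0.3], [0.6]])

    # Concatenate the sequences and record their individual lengths
    X = np.concatenate([event1, event2, event3])
    lengths = [len(event1), len(event2), len(event3)]

    # In practice one model per category (toilet, tap, shower, ...) is fitted;
    # here a single Gaussian HMM serves as the sketch
    model = hmm.GaussianHMM(n_components=2, n_iter=50, random_state=0)
    model.fit(X, lengths)

    # Score a new sequence of yet another length under this model
    print(model.score(np.array([[0.25], [0.45], [0.5], [0.4]])))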