Do we need to normalize input segment of training set only? - neural-network

I want to know whether the data normalization that is required must be applied to the whole training set, both inputs and outputs, or whether normalizing the input segment alone is enough.

Whether you should normalize the output depends on the type of neurons you use in your neural network and on the type of output you expect. Find out the range of possible outputs for the given type of cell, and check that your target outputs fall within this range. If not, you will need to normalize.
The 'standard' neural network uses the sigmoid function, which outputs a value between 0 and 1, so if the desired output doesn't fall in this range, you'll need to normalize.
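As a minimal sketch of the point above (the values here are made up for illustration): if your targets fall outside the sigmoid's (0, 1) range, a simple min-max rescaling brings them into range, and the same mapping can be inverted after training.

```python
import numpy as np

# Hypothetical target values that lie outside the sigmoid's (0, 1) range.
targets = np.array([12.0, 45.0, 3.0, 78.0])

# Min-max normalization squashes them into [0, 1] so a sigmoid
# output unit can actually produce them.
t_min, t_max = targets.min(), targets.max()
normalized = (targets - t_min) / (t_max - t_min)

# After training, invert the same transformation to recover
# predictions on the original scale.
denormalized = normalized * (t_max - t_min) + t_min
```

Because the map is affine, inverting it recovers the original targets exactly.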

Related

Lasagne layer whose output shape depends on the input value and not its input shape

I am working on a project with lasagne and theano and need to create a custom layer.
The output of this layer does not, however, depend on the shape of the input, but on the values of the input...
I know that keras (only with the tensorflow backend) offers the possibility of lambda layers, and I managed to write an expression which allowed me to have the output depending on the values of the input. But I don't know how or even if it is possible to do so using lasagne and theano.
For example: if my input tensor has a fixed size of 100 values, but I know that at the end there could be some 0 values, which do not influence the output of the network at all, how can I remove those values and let only the values with information go further to the next layer?
I would like to minimize the space requirements of the network :)
Is there a possibility to have a layer in lasagne like that? If so, how should I write the get_output_shape_for() method?
If not, I'll switch to keras and tensorflow :D
Thanks in advance!
Thanks to Jan Schlüter for providing me with the answer here:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/lasagne-users/ucjNayfhSu0
To summarize:
1) Yes, it is possible to have a lasagne layer whose output shape depends on the input values (instead of the input shape) and
2) You must write "None" in the dimensions which do not have a fixed compile-time shape (i.e., the dimensions that depend on the input values).
Regarding the example:
You can compute the output shape first, create a new tensor whose length equals the number of non-zero entries in the original tensor, and then fill the new tensor with the non-zero values (e.g. using the theano.tensor.set_subtensor function). However, I don't know whether this is the optimal way to achieve this result...
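To illustrate the idea outside of theano (this is plain numpy, not lasagne API), here is the zero-padding example from the question: the result's length depends on the input's values, which is exactly why `get_output_shape_for()` must report `None` for that axis.

```python
import numpy as np

# A fixed-size input of 100 values, padded with zeros at the end
# (the data here is made up for illustration).
x = np.zeros(100)
x[:7] = [3, 1, 4, 1, 5, 9, 2]

# Keep only the non-zero entries: the output's length depends on
# the input's *values*, not its shape, so no fixed compile-time
# shape exists for this axis.
y = x[x != 0]
print(y.shape)  # (7,) for this particular input
```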

Temperature prediction using artificial neural network

I am using four parameters for the prediction: temperature, rainfall, humidity and date. I am trying to predict a single parameter, temperature, and to use the backpropagation algorithm for training. What might be the best network structure for this purpose?
You could start by setting up a Multilayer Perceptron with 4 input nodes, a single hidden layer (with multiple nodes) and one output node.
Train your network by feeding it your training set (for example as .csv) so that the first input node receives the temperature value, the second the rainfall value, and so on.
Note that you can't use a date as input directly! Convert your date into a numeric value, for example by using just the month of the year {1,..,12}, the week {1,..,52} or the day of the year {1,..,365}.
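Using the standard library, the date conversions suggested above look like this (the example date is arbitrary):

```python
from datetime import date

# Convert a date into numeric features a network can use.
d = date(2015, 6, 15)  # an arbitrary example date

month = d.month                      # 1..12
week = d.isocalendar()[1]            # 1..52 (53 in some ISO years)
day_of_year = d.timetuple().tm_yday  # 1..365 (366 in leap years)
```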
I would also normalize the input values to the range of your activation function. So if you use the logistic function, normalize your data to the range [0,1]; for tanh, to [-1,1]; and so on.
Your output value will be in the same range, so you have to denormalize it afterwards. It's important that you choose a bijective (invertible) function for the normalization process.
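For the tanh case, a minimal sketch of such a bijective normalize/denormalize pair (the temperature values are invented for illustration):

```python
import numpy as np

# Illustrative temperatures; scale to [-1, 1] for a tanh output unit.
temps = np.array([-5.0, 10.0, 25.0, 40.0])
lo, hi = temps.min(), temps.max()

def normalize(x):
    # Affine map [lo, hi] -> [-1, 1]; affine maps are bijective,
    # so the network's output can be mapped back uniquely.
    return 2 * (x - lo) / (hi - lo) - 1

def denormalize(y):
    # Exact inverse of normalize().
    return (y + 1) / 2 * (hi - lo) + lo

scaled = normalize(temps)
restored = denormalize(scaled)
```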

sigmoid - back propagation neural network

I'm trying to create a sample neural network that can be used for credit scoring. Since this is a complicated structure for me, I'm trying to learn on small networks first.
I created a network using backpropagation - input layer (2 nodes), 1 hidden layer (2 nodes + 1 bias), output layer (1 node) - which uses sigmoid as the activation function for all layers. I'm trying to test it first using a^2 + b^2 = c^2, which means my inputs would be a and b, and the target output would be c.
My problem is that my input and target output values are real numbers which can range over (-∞, +∞). So when I pass these values to my network, my error function would be something like (target - network output). Would that be correct or accurate? I mean, I'm taking the difference between the network output (which ranges from 0 to 1) and the target output (which is a large number).
I've read that the solution would be to normalise first, but I'm not really sure how to do this. Should I normalise both the input and target output values before feeding them to the network? Which normalisation function is best to use, since I've read about several different methods? After getting the optimized weights and using them to test some data, I'm getting an output value between 0 and 1 because of the sigmoid function. Should I revert the computed values to their un-normalised/original form? Or should I only normalise the target output and not the input values? This has had me stuck for weeks, as I'm not getting the desired outcome and am not sure how to incorporate the normalisation idea into my training algorithm and testing.
Thank you very much!!
So, to answer your questions:
The sigmoid function squashes its input into the interval (0, 1). It's usually useful in classification tasks because you can interpret its output as the probability of a certain class. Your network performs a regression task (you need to approximate a real-valued function), so it's better to use a linear activation function after your last hidden layer (which in your case is also the first :) ).
I would also advise you not to use the sigmoid function as the activation function in your hidden layers. It's much better to use tanh or ReLU nonlinearities. A detailed explanation (as well as some useful tips if you want to keep sigmoid as your activation) can be found here.
It's also important to understand that the architecture of your network is not well suited to the task you are trying to solve. You can get a sense of what different networks can learn here.
Regarding normalization: the main reason you should normalize your data is to avoid giving any spurious prior knowledge to your network. Consider two variables: age and income. The first varies from, e.g., 5 to 90; the second from, e.g., 1,000 to 100,000. The mean absolute value is much bigger for income than for age, so due to the linear transformations in your model, the ANN treats income as more important at the beginning of training (because of the random initialization). Now suppose you are trying to solve a task where you need to classify whether a given person has grey hair :) Is income truly the more important variable for this task?
There are a lot of rules of thumb for how to normalize your input data. One is to squash all inputs into the [0, 1] interval. Another is to transform every variable to have mean = 0 and sd = 1. I usually use the second method when the distribution of a given variable is similar to a normal distribution, and the first in other cases.
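The two rules of thumb look like this in numpy (the age and income values are invented, echoing the example above):

```python
import numpy as np

# Illustrative data: ages roughly normal, incomes heavily skewed.
age = np.array([22.0, 35.0, 47.0, 51.0, 63.0])
income = np.array([1000.0, 2500.0, 4000.0, 20000.0, 100000.0])

# Rule 1: squash into [0, 1] (min-max) -- a reasonable default
# when the distribution is far from normal.
income_01 = (income - income.min()) / (income.max() - income.min())

# Rule 2: standardize to mean 0, sd 1 (z-score) -- preferable when
# the variable looks roughly normally distributed.
age_z = (age - age.mean()) / age.std()
```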
As for normalizing the output: it's usually also useful when you are solving a regression task (especially in the multiple-regression case), but it's not as crucial as for the inputs.
Remember to keep the parameters needed to restore the original scale of your inputs and outputs. Also remember to compute them on the training set only, and then apply them to the training, test and validation sets alike.
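A minimal sketch of that last point, with invented values: the mean and standard deviation are computed on the training set only, then reused for every other split and for restoring the original units.

```python
import numpy as np

train = np.array([10.0, 20.0, 30.0, 40.0])
test = np.array([15.0, 55.0])  # may fall outside the training range

# Compute normalization parameters on the training set ONLY...
mean, std = train.mean(), train.std()

# ...then apply the very same parameters to every split.
train_norm = (train - mean) / std
test_norm = (test - mean) / std

# Keep (mean, std) around to restore the original units later.
restored = test_norm * std + mean
```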

Backpropagation and training set for dummies

I'm at the very beginning of studying neural networks, but my scarce skills (or lack of intelligence) don't allow me to understand from popular articles how to correctly prepare a training set for the backpropagation training method (or its limitations). For example, I want to train the simplest two-layer perceptron to solve XOR with backpropagation (e.g., modify the random initial weights of the 4 synapses of the first layer and the 4 of the second). The simple XOR function has two inputs and one output: {0,0}=>0, {0,1}=>1, {1,0}=>1, {1,1}=>0. But neural network theory says that "backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient". Does that mean backpropagation can't be applied if, in the training set, the number of inputs is not strictly equal to the number of outputs, and that this restriction cannot be avoided? Or does it mean that, if I want to use backpropagation for classification tasks such as XOR (i.e., the number of inputs is bigger than the number of outputs), theory says it's always necessary to rework the training set in a matching way (input => desired output): {0,0}=>{0,0}, {0,1}=>{1,1}, {1,0}=>{1,1}, {1,1}=>{0,0}?
Thanks for any help in advance!
Does that mean backpropagation can't be applied if, in the training set, the number of inputs is not strictly equal to the number of outputs?
If by "output" you mean the class in a classification task, then no, I don't think so.
backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient
I think it means that every input should have a desired output, not that each input needs a different output.
In a real-life problem, like handwritten digit classification (MNIST), there are around 60,000 training examples (inputs), but only 10 classes (the digits 0-9).
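For the XOR case from the question, the training set in array form shows that the input and output *dimensions* differ (2 vs 1), but every input row still has exactly one desired output, which is all backpropagation requires:

```python
import numpy as np

# The XOR training set: four input rows and four matching target rows.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Inputs are 2-dimensional, outputs 1-dimensional, yet the number
# of training *examples* (rows) is the same on both sides.
assert X.shape[0] == y.shape[0]  # 4 examples each
```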

Confusion with inputs and targets for a neural network

Recently I've posted many questions regarding a character recognition program that I am making. I thought I had it fully working until today. I think the problem has to do with my training of the network. What follows is an explanation of how I think the training and simulation procedure goes.
Given these two images
targets
inputs
I want to train the network to recognize the letter D. Note that before this is done, I've processed the images into a binary matrix. For training I use
[net,tr] = train(net,inputs,targets);
where instead of inputs I used targets, because I want to train the network to recognize all the letters in the target image.
I then run
outputs = sim(net,inputs);
where inputs is the image with the letter "D", or an image with any other letter from ABCD. The basic premise is that I want to train the network to recognize all the letters in ABCD, then choose any letter A, B, C, or D and see whether the network recognizes the chosen letter.
Question:
Am I correct with the training procedure?
Well, it greatly depends on how you implemented your neural network. Judging by your question, though, I guess you didn't implement it yourself but used some ready-made API.
Anyway, you should first understand the tools you use before you use them (here, neural networks).
A neural network takes an input and performs linear or non-linear transformations of the input and returns an output.
Inputs and outputs are always numeric values. However they may represent any kind of data.
Inputs can be:
Pixels of an image
Real valued or integer attributes
Categories
etc.
In your case the inputs are the pixels of your character images (your binary matrices).
Outputs can be:
Classes (if you're doing classification)
Values (if you're doing regression)
Next value in a time series (if you're doing time series prediction)
In your case, you're doing classification (predicting which character the inputs represent) so your output is a class.
For you to understand how the network is trained, I'll first explain how to use it once it's trained and then what it implies for the training phase.
So once you've trained your network, you will give it the binary matrix representing your image and it will output the class (the character), which will be (for example): 0 for A, 1 for B, 2 for C, and 3 for D. In other words, you have:
Input: binary matrix (image)
Output: 0,1,2 or 3 (depending on which character the network recognizes in the image)
The training phase consists in telling the network which output you would like for each input.
The type of data used during the training phase is the same as the one being used in the "prediction phase". Hence, for the training phase:
Inputs: binary matrices [A,B,C,D] (One for each letter! Very important !)
Targets: corresponding classes [0,1,2,3]
This way, you're telling the network to learn that if you give it the image of A it should output 0; if you give it the image of B it should output 1; and so on.
Note: You were mistaken because you thought of the "inputs" as the inputs you wanted to give the network after the training phase, when they are actually the inputs given to the network during the training phase.
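The training-data layout described above can be sketched like this (the original uses MATLAB's `train`/`sim`; this is a framework-neutral Python sketch with randomly generated stand-in images, not the asker's actual data):

```python
import numpy as np

# Hypothetical 3x3 binary "images" standing in for the letters
# A, B, C, D (real character images would be larger).
rng = np.random.default_rng(0)
images = rng.integers(0, 2, size=(4, 3, 3))

# Inputs: one flattened binary matrix per row -- one row per letter.
inputs = images.reshape(4, -1)

# Targets: the class index for each input row.
targets = np.array([0, 1, 2, 3])  # 0=A, 1=B, 2=C, 3=D

# Training pairs line up row-for-row: inputs[i] -> targets[i].
assert inputs.shape[0] == targets.shape[0]
```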