Neural Networks and correlation between input and output - matlab

I am trying to fit some inputs to predict an output in MATLAB using fitnet neural networks, but as a preprocessing step prior to training I want to find which candidate input vector correlates the most with the output.
In the figure below, the output (in yellow) has five input candidates, and I need to choose from these only. What command should I use in MATLAB, and how should I prepare the data (repeated around 1000 times) so that I can get a clear correlation between each input candidate and the output?

To find the correlation between a given feature and the target variable you can use R = corrcoef(A,B), but... do not do it!
This process makes no sense and will probably be harmful to the whole pipeline. You would be removing information from your data, so that only features which have an independent, linear relation to the target variable persist. Then you would apply a highly non-linear model which exploits co-occurrences and correlations between features. These two steps are completely incompatible. The only valid relation is this: if your data is so simple that it can pretty much be modeled with a linear model, then a neural net will work as well. But then there is no point in using a neural net in the first place; just apply linear regression.

Consequently: do not perform feature selection unless you have to. Try to build a good model without it, and if you do have to remove some features (maybe obtaining them is an expensive process?), use post-hoc analysis of the trained model to remove features which it does not use. Do not split your problem into multiple, independent processes if you do not have to (unless you can show that this decomposition does not harm the process; in the case of feature selection + regressor this is not true, as you cannot construct a valid supervision signal for feature selection without a trained regressor).
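For completeness, the mechanics of that corrcoef check would look roughly like this in MATLAB (a minimal sketch; X and y are placeholder names, assuming each candidate input is a column of X and y is the output vector of the same length):

% X: N-by-5 matrix, one candidate input per column (assumed layout)
% y: N-by-1 output vector
for k = 1:size(X, 2)
    R = corrcoef(X(:, k), y);   % 2-by-2 correlation matrix
    fprintf('candidate %d vs output: r = %.3f\n', k, R(1, 2));
end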

Related

Torch implementation of multi-output-layer neural network

I am going to build a neural network whose architecture has more than one output layer. More specifically, it is designed to run parallel procedures on top of a series of convolutional layers. One branch computes classification results (softmax-like); the other produces regression results. However, I'm stuck designing the model as well as choosing loss functions (criterions).
I. Should I use the Torch container nn.Parallel() or nn.Concat() for the branch layers on top of the conv layers (nn.Sequential())? What is the difference, apart from the data format?
II. Because of the two kinds of output, a classification loss function and a regression loss function are to be combined linearly. I am wondering whether nn.MultiCriterion() or nn.ParallelCriterion() should be chosen with respect to the chosen container, or whether I have to write a custom criterion class.
III. Could anyone who has done similar work tell me whether Torch needs additional customization to implement backprop for training? I am concerned about data structure issues with Torch containers.
Concat and Parallel differ in that each module in Concat gets the entire output of the last layer as input, while each module in Parallel takes only a slice of the output of the last layer. For your purpose you need Concat, not Parallel, since both branches need to operate on the entire output of your sequential network.
Based on the source code of MultiCriterion and ParallelCriterion, they do practically the same thing. The important difference is that in the case of MultiCriterion you provide multiple loss functions but only one target, and they are all computed against that target. Given that you have a classification and a regression task, I assume you have different targets, so you need ParallelCriterion(false), where false enables the multi-target mode (if the argument is true, ParallelCriterion seems to behave identically to MultiCriterion). The target is then expected to be a table of targets for the individual criterions.
If you use Concat and ParallelCriterion, Torch should be able to compute gradients properly for you. They both implement updateGradInput, which properly merges the gradients of the individual blocks.

When I should use transformation of inputs variable in neural network?

I have run an ANN in MATLAB to predict a variable based on several response variables. All variables have numerical values. I could not get desirable results, although I changed the number of hidden neurons several times, ran the model many times, and so on. My question is: should I apply a transformation to the input variables to get better results? How can I know which transformation I should choose? Thanks for any help.
I strongly advise you to use some methods from time series analysis, like lagged correlation or windowed lagged correlation (with statistical tests). You can find them in most statistical packages (e.g. in R). From one small picture it's hard to deduce whether your prediction is lagged or not. Testing a large amount of data can help you reveal true dependencies and avoid trusting spurious correlations.
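In MATLAB you can sketch a simple lagged correlation check with nothing more than corrcoef (a minimal sketch, assuming x is the candidate input series and y is the target series, both column vectors of equal length; maxLag is an arbitrary choice):

maxLag = 20;                       % assumed maximum lag to test
lags = -maxLag:maxLag;
r = zeros(size(lags));
for i = 1:numel(lags)
    lag = lags(i);
    if lag >= 0
        a = x(1:end-lag);  b = y(1+lag:end);   % x leads y by 'lag' steps
    else
        a = x(1-lag:end);  b = y(1:end+lag);   % y leads x
    end
    C = corrcoef(a, b);
    r(i) = C(1, 2);
end
plot(lags, r), xlabel('lag'), ylabel('correlation')

A pronounced peak away from lag 0 suggests the relation between input and target is shifted in time.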

i need a way to train a neural network other than backpropagation

This is an on-going venture and some details are purposefully obfuscated.
I have a box that has several inputs and one output. The output voltage changes as the input voltages are changed. The desirability of the output sequence cannot be evaluated until many states pass and a look back process is evaluated.
I want to design a neural network that takes a number of outputs from the box as input and produce the correct input settings for the box to produce the optimal next output.
I cannot train this network using backpropagation. How do I train this network?
A genetic algorithm would be a good candidate here. A chromosome could encode the weights of the neural network. After evaluation, you assign a fitness value to each chromosome based on its performance. Chromosomes with a higher fitness value have a higher chance to reproduce, helping to generate better-performing chromosomes in the next generation.
Encoding the weights is a relatively simple solution; more complex encodings could even define the topology of the network.
You might find some additional helpful information here:
http://en.wikipedia.org/wiki/Neuroevolution
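As a hedged sketch of that weight-encoding idea in MATLAB (assuming the Global Optimization Toolbox's ga function is available, and that evaluateBox is a hypothetical function that runs your box/look-back process and returns a score to maximize; ga minimizes, so the fitness negates the score):

net = fitnet(10);                                    % 10 hidden neurons, arbitrary
net = configure(net, exampleInputs, exampleOutputs); % size the layers to sample data
nWeights = numel(getwb(net));                        % chromosome length
fitness = @(w) -evaluateBox(setwb(net, w(:)));       % one chromosome = one weight vector
wBest = ga(fitness, nWeights);                       % evolve the weights
net = setwb(net, wBest(:));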
Hill climbing is the simplest optimization algorithm to implement. Just randomly modify the weights and see if the network does better; if not, reset them and try again. It's also generally faster than genetic algorithms. However, it is prone to getting stuck in local optima, so try running it several times and selecting the best result.
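A minimal hill-climbing sketch in MATLAB, again with a hypothetical evaluateBox score to maximize (getwb/setwb convert a network to and from a flat weight vector):

net = fitnet(10);
net = configure(net, exampleInputs, exampleOutputs);
w = getwb(net);
bestScore = evaluateBox(setwb(net, w));
for iter = 1:5000
    wTry = w + 0.1 * randn(size(w));     % random perturbation of the weights
    score = evaluateBox(setwb(net, wTry));
    if score > bestScore                 % keep only improvements
        w = wTry;
        bestScore = score;
    end
end
net = setwb(net, w);

Restarting from several random initial weight vectors and keeping the best run is the cheap remedy for local optima mentioned above.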

Neural networks: classification using Encog

I'm trying to get started using neural networks for a classification problem. I chose to use the Encog 3.x library as I'm working on the JVM (in Scala). Please let me know if this problem is better handled by another library.
I've been using resilient backpropagation. I have 1 hidden layer, and e.g. 3 output neurons, one for each of the 3 target categories. So ideal outputs are either 1/0/0, 0/1/0 or 0/0/1. Now, the problem is that the training tries to minimize the error, e.g. turn 0.6/0.2/0.2 into 0.8/0.1/0.1 if the ideal output is 1/0/0. But since I'm picking the highest value as the predicted category, this doesn't matter for me, and I'd want the training to spend more effort in actually reducing the number of wrong predictions.
So I learnt that I should use a softmax function as the output (although it is unclear to me whether this becomes a 4th layer or I should just replace the activation function of the 3rd layer with softmax), and then have the training reduce the cross entropy. Now I think that this cross entropy needs to be calculated either over the entire network or over the entire output layer, but the ErrorFunction that one can customize calculates the error on a neuron-by-neuron basis (it reads arrays of ideal and actual output values, and writes an array of error values). So how does one actually do cross-entropy minimization using Encog (or which other JVM-based library should I choose)?
I'm also working with Encog, but in Java, though I don't think it makes a real difference. I have a similar problem, and as far as I know you have to write your own function that minimizes cross entropy.
And as I understand it, softmax should just replace your 3rd layer.
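For reference, the quantity to minimize is easy to write down; here is a small MATLAB-style sketch of softmax plus cross entropy for one sample (just the math, not Encog API):

% z:     raw outputs of the final layer (3-by-1)
% ideal: one-hot target, e.g. [1; 0; 0]
p = exp(z - max(z));                 % subtract max for numerical stability
p = p / sum(p);                      % softmax: class probabilities
xent = -sum(ideal .* log(p));        % cross-entropy loss for this sample

Note that xent depends on all three outputs at once through the softmax normalization, which is why it cannot be expressed as an independent, neuron-by-neuron error.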

Neural Network Input Bias in MATLAB

In Matlab (Neural Network Toolbox + Image Processing Toolbox), I have written a script to extract features from images and construct a "feature vector". My problem is that some features have more data than others. I don't want these features to have more significance than others with less data.
For example, I might have a feature vector made up of 9 elements:
hProjection = [12,45,19,10];
vProjection = [3,16,90,19];
area = 346;
featureVector = [hProjection, vProjection, area];
If I construct a Neural Network with featureVector as my input, the area only makes up 10% of the input data and is less significant.
I'm using a feed-forward back-propagation network with a tansig transfer function (pattern-recognition network).
How do I deal with this?
When you present your input data to the network, each element of your feature vector is fed to the input layer as an attribute of its own.
The only bias you have to worry about is the scale of each feature (i.e. we usually normalize the features to the [0,1] range).
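A minimal MATLAB sketch of that scaling step, assuming the Neural Network Toolbox layout with one feature per row and one sample per column (X and Xnew are placeholder names):

[Xn, ps] = mapminmax(X, 0, 1);            % scale each feature row to [0, 1]
XnewN   = mapminmax('apply', Xnew, ps);   % reuse the same mapping on new data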
Also, if you believe that the features are dependent/correlated, you might want to apply some kind of attribute selection technique. In your case it depends on the meaning of the hProj/vProj features...
EDIT:
It just occurred to me that, as an alternative to feature selection, you can use a dimensionality reduction technique (PCA/SVD, factor analysis, ICA, ...). For example, factor analysis can be used to extract a set of latent hidden variables on which the hProj/vProj features depend. So instead of those 8 features, you can get 2 features such that the original 8 are a linear combination of the new two (plus some error term). Refer to this page for a complete example.
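A short sketch of the dimensionality-reduction route in MATLAB (pca is in the Statistics and Machine Learning Toolbox; keeping 2 components is just for illustration, and F is a placeholder for a matrix with one observation per row and the 8 projection values as columns):

[coeff, score] = pca(F);          % principal component analysis
Freduced = score(:, 1:2);         % keep the first two components
% Freduced would then replace the original projection features in the
% feature vector fed to the network.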