Working with a constant as input to a neural network

I am creating a neural network to do some prediction, and I have an input that is constant across all individual data points. My approach was to add a column holding the constant value in every row and pass it along with the rest of the independent variables. Is there a more straightforward way of passing constants to the model?

Related

How to use multiple labels as targets in Neural Net Pattern Recognition Toolbox?

I am trying to use the Neural Net Pattern Recognition toolbox in MATLAB to recognize different classes in my dataset. I have a 21392 x 4 table in which I would like to use columns 1-3 as predictors; the 4th column holds the labels, with 14 different categories (strings like Angry, Sad, Happy, Neutral, etc.). It seems that the Neural Net Pattern Recognition toolbox, unlike the MATLAB Classification Learner toolbox, doesn't allow me to import the table and automatically extract the predictors and responses from it. Moreover, I am unable to specify the inputs and targets to the neural network manually, as that option isn't showing up.
I looked into examples like the Iris, Wine, and Cancer datasets, but all of them have only 2-3 output classes (encoded in binary like 000, 010, 011, etc.), and their labels are not strings, unlike mine (Angry, Sad, Happy, Neutral, etc., 14 classes in total). I would like to know how I can use my table as input to the Neural Net Pattern Recognition toolbox, or otherwise any way in which I can extract the data from my table and use it in the toolbox. I am new to using the toolbox, so any help in this regard would be highly appreciated. Thanks!
The first step is to pull the predictors and labels out of the table, since the toolbox works on plain arrays rather than tables. Assuming the table is named my_table, brace indexing extracts each part (table2array would fail here, because the numeric predictor columns and the string label column cannot be concatenated into a single numeric array):
inputs = my_table{:, 1:3}; % numeric predictors, 21392 x 3
labels = my_table{:, 4};   % string labels, 21392 x 1
The toolbox also expects the data in column format, where each column is one data point and each row is a feature, so both arrays need to be transposed:
inputs = inputs'; % now of dimensions 3 x 21392
labels = labels'; % now of dimensions 1 x 21392
The string labels (categorical data) can be one-hot encoded by first converting them to integer class indices with categorical and double, then to one-of-N target vectors with ind2vec:
my_table_vector = ind2vec(double(categorical(labels)));
Now my_table_vector (the final targets) and inputs (the final predictors) can easily be fed to the neural network and used for classification/prediction of the target labels.

Avoiding Dummy variable trap and neural network

I know that categorical data should be one-hot encoded before training a machine learning algorithm. I also know that for multivariate linear regression I need to exclude one of the encoded variables to avoid the so-called dummy variable trap.
Ex: if I have a categorical feature "size" with the values "small", "medium", "large", then one-hot encoded it would look something like:
small  medium  large  other-feature
  0      1       0        2999
So to avoid the dummy variable trap I need to remove one of the three columns, for example the "small" column.
Should I do the same when training a neural network? Or is this purely a multivariate regression issue?
Thanks.
As stated here, the dummy variable trap needs to be avoided (one category of each categorical feature removed after encoding, but before training) on the input of algorithms that consider all the predictors together, as a linear combination. Such algorithms are:
Linear/multilinear regression
Logistic regression
Discriminant analysis
Neural networks that don't employ weight decay
If you remove a category from the input of a neural network that does employ weight decay, the network will instead become biased in favor of the omitted category.
Even though no information is lost when omitting one category after encoding a feature, the other algorithms will have to infer the contribution of the omitted category indirectly, through the combination of all the other categories, making them do more computation for the same result.
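As a small illustration, here is a sketch in Python (assuming pandas; the data and column names are made up) comparing the full encoding with the reduced one:
import pandas as pd

df = pd.DataFrame({"size": ["small", "medium", "large", "medium"],
                   "other_feature": [2999, 1500, 4200, 3100]})

# Full one-hot encoding: one column per category (fine for neural
# networks with weight decay).
full = pd.get_dummies(df, columns=["size"])
print(full.columns.tolist())     # ['other_feature', 'size_large', 'size_medium', 'size_small']

# One category dropped: avoids the dummy variable trap for models that
# take a linear combination of all predictors.
reduced = pd.get_dummies(df, columns=["size"], drop_first=True)
print(reduced.columns.tolist())  # ['other_feature', 'size_medium', 'size_small']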

Activation function for output layer for regression models in Neural Networks

I have been experimenting with neural networks these days, and I have come across a general question regarding the activation function to use. This might be a well-known fact, but I couldn't understand it properly. A lot of the examples and papers I have seen work on classification problems, and they use either sigmoid (in the binary case) or softmax (in the multi-class case) as the activation function in the output layer, which makes sense. But I haven't seen any activation function used in the output layer of a regression model.
So my question is: is it by choice that we don't use any activation function in the output layer of a regression model, because we don't want the activation function to limit or put restrictions on the value? The output value can be any number, as big as thousands, so activation functions like sigmoid or tanh won't make sense. Or is there some other reason? Or can we actually use some activation function made for this kind of problem?
For a linear-regression type of problem, you can simply create the output layer without any activation function, as we are interested in numerical values without any transformation.
More info:
https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
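As a minimal sketch (assuming Keras; the layer sizes and input dimension are arbitrary), the output layer is just a Dense layer with no activation, which Keras treats as linear:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(8,)),  # hidden layer
    Dense(1)  # output layer: no activation, i.e. linear and unbounded
])
model.compile(optimizer='adam', loss='mse')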
For classification, you can use sigmoid, tanh, softmax, etc.
If you have, say, a sigmoid as the activation function in the output layer of your NN, you will never get any value less than 0 or greater than 1.
Basically, if the data you're trying to predict is distributed within that range, you might approach it with a sigmoid function and test whether your prediction performs well on your training set.
Even more generally, when predicting data you should come up with the function that represents your data in the most effective way.
Hence, if your real data does not fit a sigmoid function well, you have to think of another function (e.g. some polynomial function, a periodic function, or some combination of them), but you should also always care about how easily you can build your cost function and evaluate its derivatives.
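For instance, a short sketch of that bounded case (again assuming Keras, with arbitrary layer sizes, and with the targets already min-max scaled into [0, 1], which a sigmoid output requires):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(8,)),
    Dense(1, activation='sigmoid')  # predictions confined to (0, 1)
])
model.compile(optimizer='adam', loss='mse')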
Just use a linear activation function without limiting the output value range unless you have some reasonable assumption about it.

Self organizing maps and learning vector quantization

Self-organizing maps are better suited to clustering (dimension reduction) than to classification, yet SOMs are used with learning vector quantization (LVQ) for fine-tuning. But LVQ is a supervised learning method, so to use SOMs with LVQ, LVQ has to be provided with a labelled training data set. Since SOMs only do clustering, not classification, and thus cannot produce labelled data, how can a SOM be used as an input for LVQ?
Does LVQ fine-tune the clusters found by the SOM?
Before being used with LVQ, should the SOM's output be put through another classification algorithm, so that the inputs get labels and these labelled inputs can be used in LVQ?
It must be clear that supervised learning differs from unsupervised learning in that, in the former, the target values are known.
Therefore, the output of supervised models is a prediction.
The output of unsupervised models, instead, is a label whose meaning we don't know yet; for this reason, after clustering, it is necessary to profile each of those new labels.
Having said so, you could label the dataset using an unsupervised learning technique such as a SOM. Then you should profile each class, in order to be sure you understand the meaning of each one.
At this point, you can pursue two different paths, depending on your final objective:
1. use this new variable as a means of dimensionality reduction;
2. use the new dataset, featuring the additional variable that represents the class, as labelled data that you then try to predict using LVQ (see the sketch below).
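A rough sketch of the second path in Python (on made-up data, assuming the minisom package for the SOM; LVQ1 is written out by hand, as it is only a few lines):
import numpy as np
from minisom import MiniSom

X = np.random.rand(500, 4)  # hypothetical unlabelled data

# 1. Cluster with a SOM: each sample gets "labelled" by its winning node.
som = MiniSom(3, 3, 4, sigma=0.5, learning_rate=0.5)
som.train_random(X, 1000)
labels = np.array([np.ravel_multi_index(som.winner(x), (3, 3)) for x in X])
# (In practice you would now profile the clusters to understand them.)

# 2. LVQ1: prototypes start at the class means, then get pulled toward
# samples of their own class and pushed away from others.
classes = np.unique(labels)
protos = np.array([X[labels == c].mean(axis=0) for c in classes])
lr = 0.05
for epoch in range(20):
    for x, y in zip(X, labels):
        j = np.argmin(np.linalg.norm(protos - x, axis=1))  # nearest prototype
        if classes[j] == y:
            protos[j] += lr * (x - protos[j])  # pull toward the sample
        else:
            protos[j] -= lr * (x - protos[j])  # push away

# Classify a new point by its nearest prototype.
x_new = np.random.rand(4)
pred = classes[np.argmin(np.linalg.norm(protos - x_new, axis=1))]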
Hope this can be useful!

Neural Network Categorical Data Implementation

I've been learning to work with neural networks as a hobby project, but I am at a complete loss as to how to handle categorical data. I read the article http://visualstudiomagazine.com/articles/2013/07/01/neural-network-data-normalization-and-encoding.aspx, which explains normalization of the input data and how to preprocess categorical data using effects encoding. I understand the concept of breaking the categories into vectors, but I have no idea how to actually implement this.
For example, if I'm using countries as categorical data (e.g. Finland, Thailand, etc.), would I condense the resulting vector into a single number to be fed to a single input, or would I have a separate input for each component of the vector? With the latter, if there are 196 different countries, I would need 196 different inputs just to process this one piece of data. If a lot of different categorical data is being fed to the network, I can see this becoming really unwieldy very fast.
Is there something I'm missing? How exactly is categorical data mapped to neuron inputs?
Neural network inputs
As a rule of thumb: different classes and categories should have their own input signals.
Why you can't encode it with a single input
Since a neural network acts on its input values through activation functions, a higher input value results in a higher activation input, which makes the neuron more likely to fire.
Unless you want to tell the network that Thailand is "better" than Finland, you must not encode the country input signal as InputValue(Finland) = 24, InputValue(Thailand) = 140.
How it should be encoded
Each country deserves its own input signal so that they contribute equally to activating the neurons.
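As a small sketch of what this looks like in practice (assuming scikit-learn; the country list is made up): one-hot encoding really does mean one input signal per country, but encoders return a sparse matrix by default, so the representation stays manageable even with 196 countries.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

countries = np.array([["Finland"], ["Thailand"], ["Finland"], ["Norway"]])

encoder = OneHotEncoder()                  # returns a sparse matrix by default
onehot = encoder.fit_transform(countries)

print(encoder.categories_[0])  # ['Finland' 'Norway' 'Thailand']
print(onehot.toarray())
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
# Each column is one input signal; no country is "bigger" than another.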