Avoiding Dummy variable trap and neural network - neural-network

I know that categorical data should be one-hot encoded before training a machine learning algorithm. I also know that for multivariate linear regression I need to exclude one of the encoded variables to avoid the so-called dummy variable trap.
Example: if I have a categorical feature "size" with the values "small", "medium", "large", then the one-hot encoded data would look something like:
small  medium  large  other-feature
0      1       0      2999
So to avoid the dummy variable trap I need to remove any one of the 3 columns, for example the column "small".
Should I do the same when training a neural network? Or is this purely for multivariate regression?
Thanks.

As stated here, the dummy variable trap needs to be avoided (one category of each categorical feature removed after encoding but before training) on the input of algorithms that consider all the predictors together, as a linear combination. Such algorithms are:
Linear/multilinear regression
Logistic regression
Discriminant analysis
Neural networks that don't employ weight decay
If you remove a category from the input of a neural network that employs weight decay, it will become biased in favor of the omitted category instead.
Even though no information is lost when omitting one category after encoding a feature, other algorithms will have to infer the correlation of the omitted category indirectly through a combination of all the other categories, making them do more computation for the same result.
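A minimal sketch of why the trap matters for the linear case, assuming NumPy is available and using made-up toy data for the "size" feature from the question. It only illustrates the collinearity between a full one-hot encoding and an intercept column; it does not model weight decay:

```python
import numpy as np

# Hypothetical toy data: the "size" feature from the question.
sizes = ["small", "medium", "large", "medium", "small"]
categories = ["small", "medium", "large"]

# Full one-hot encoding: one column per category.
one_hot = np.array([[1.0 if s == c else 0.0 for c in categories] for s in sizes])

# Design matrices with an intercept column, as used by linear regression.
intercept = np.ones((len(sizes), 1))
full = np.hstack([intercept, one_hot])            # 4 columns
reduced = np.hstack([intercept, one_hot[:, 1:]])  # "small" dropped, 3 columns

# The one-hot columns sum to the intercept column, so `full` is rank deficient,
# while `reduced` has full column rank.
rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
print(full.shape[1], rank_full)
print(reduced.shape[1], rank_reduced)
```

With the full encoding there are four columns but only three independent directions, so an unregularized linear model has no unique solution; dropping one category removes that redundancy.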

Related

How is Cross Entropy Loss Converted to a Scalar During Optimization?

I have a basic beginner question about how neural networks are defined, and I am learning in the context of the Keras library. Following the MNIST hello world program, I have defined this network:
model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(RESHAPED,), activation='softmax'))
My understanding is that this creates a neural network with two layers. In this case RESHAPED is 784 and NB_CLASSES is 10, so the network will have one input layer with 784 neurons and one output layer with 10 neurons.
Then I added this:
model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
I have read up on the formula for categorical cross entropy, but it appears to be calculated per output node. My question is: during training, how are the values of the cross entropy combined to create a scalar-valued objective function? Is it just an average?
Keras computes the mean of the per-instance loss values, possibly weighted (see sample_weight_mode argument if you're interested).
Here's the reference to the source code: training.py. As you can see, the result value goes through K.mean(...), which ensures the result is a scalar.
In general, however, it is possible to reduce the losses differently, e.g. with a plain sum, but that usually performs worse, so the mean is preferable (see this question).
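The reduction can be sketched in plain NumPy. This is an illustration of the idea, not the Keras source: the function name, the clipping constant and the toy batch are assumptions made for the example. The cross entropy is first summed over the output nodes of each sample, and the per-sample values are then averaged into one scalar:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross entropy: sum over output nodes, one value per row."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Hypothetical batch of 2 samples with 3 classes (one-hot targets).
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

per_sample = categorical_crossentropy(y_true, y_pred)  # shape (2,)
loss = per_sample.mean()                               # scalar objective
print(per_sample.shape, float(loss))
```

A weighted variant would multiply `per_sample` by a vector of sample weights before averaging.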

What is the difference between `fitclinear` and `fitcsvm`?

The MATLAB documentation says fitclinear uses SVM or logistic regression, and fitcsvm is also an SVM.
Also, fitclinear is usually faster. Why?
What is the difference between the two?
As stated in the description of fitclinear:
For reduced computation time on a high-dimensional data set that
includes many predictor variables, train a linear classification model
by using fitclinear. For low- through medium-dimensional predictor
data sets, see Alternatives for Lower-Dimensional Data.
fitcsvm is listed among these alternatives for lower-dimensional data.
In other words, fitclinear is best used with high-dimensional data, while fitcsvm should be used for low- through medium-dimensional predictor data sets.

Backpropagation and training set for dummies

I'm at the very beginning of studying neural networks, but my scarce skills or lack of intelligence do not allow me to understand from popular articles how to correctly prepare a training set for the backpropagation training method (or what its limitations are). For example, I want to train the simplest two-layer perceptron to solve XOR with backpropagation (e.g. modify random initial weights for 4 synapses in the first layer and 4 in the second). The simple XOR function has two inputs and one output: {0,0}=>0, {0,1}=>1, {1,0}=>1, {1,1}=>0. But neural network theory says that "backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient". Does it mean that backpropagation can't be applied if the number of inputs in the training set is not strictly equal to the number of outputs, and that this restriction cannot be avoided? Or does it mean that, if I want to use backpropagation for classification tasks such as XOR (i.e. where the number of inputs is bigger than the number of outputs), it is always necessary to remake the training set in a similar way (input=>desired output): {0,0}=>{0,0}, {0,1}=>{1,1}, {1,0}=>{1,1}, {1,1}=>{0,0}?
Thanks for any help in advance!
"Does it mean that backpropagation can't be applied if the number of inputs in the training set is not strictly equal to the number of outputs?"
If by output you mean "the class" in a classification task, I don't think so.
"backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient"
I think it means that every input example must have a known desired output, not that every input needs a different output.
In a real-life problem like handwritten digit classification (MNIST), there are 60,000 training examples (inputs), but they are classified into only 10 digits.
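To make this concrete, here is a minimal NumPy sketch of backpropagation on XOR with two inputs and a single output per example. The hidden layer size, learning rate, seed and iteration count are assumptions chosen for illustration, and convergence to the correct XOR outputs is not guaranteed for every random initialization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training set: 4 examples, 2 inputs each, 1 desired output each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # shape (4, 2)
Y = np.array([[0], [1], [1], [0]], dtype=float)              # shape (4, 1)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))  # input -> hidden weights (2 hidden units, assumed)
b1 = np.zeros(2)
W2 = rng.normal(size=(2, 1))  # hidden -> output weights
b2 = np.zeros(1)

lr = 0.5  # assumed learning rate
for _ in range(5000):
    # Forward pass
    H = sigmoid(X @ W1 + b1)    # hidden activations, shape (4, 2)
    out = sigmoid(H @ W2 + b2)  # network outputs, shape (4, 1)
    # Backward pass: gradient of 0.5 * mean((out - Y) ** 2)
    d_out = (out - Y) * out * (1 - out) / len(X)
    d_H = (d_out @ W2.T) * H * (1 - H)
    # Gradient descent updates
    W2 -= lr * (H.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_H)
    b1 -= lr * d_H.sum(axis=0)

print(out.shape)  # one output row per training example
```

The only shape requirement is that `Y` has one row per row of `X`; the number of columns (2 inputs versus 1 output) is free to differ.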

Neural Network Categorical Data Implementation

I've been learning to work with neural networks as a hobby project, but am at a complete loss with how to handle categorical data. I read the article http://visualstudiomagazine.com/articles/2013/07/01/neural-network-data-normalization-and-encoding.aspx, which explains normalization of the input data and explains how to preprocess categorical data using effects encoding. I understand the concept of breaking the categories into vectors, but have no idea how to actually implement this.
For example, if I'm using countries as categorical data (e.g. Finland, Thailand, etc), would I process the resulting vector into a single number to be fed to a single input, or would I have a separate input for each component of the vector? Under the latter, if there are 196 different countries, that would mean I would need 196 different inputs just to process this particular piece of data. If a lot of different categorical data is being fed to the network, I can see this becoming really unwieldy very fast.
Is there something I'm missing? How exactly is categorical data mapped to neuron inputs?
Neural network inputs
As a rule of thumb: different classes and categories should have their own input signals.
Why you can't encode it with a single input
Since a neural network acts upon the input values through activation functions, a higher input value will result in a higher activation input.
A higher input value will make the neuron more likely to fire.
Unless you want to tell the network that Thailand is "better" than Finland, you should not encode the country input signal as InputValue(Finland) = 24, InputValue(Thailand) = 140.
How it should be encoded
Each country deserves its own input signal so that they contribute equally to activating the neurons.
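A minimal sketch of that encoding, assuming NumPy and a hypothetical, shortened country list (a real list would have around 196 entries, giving 196 inputs). Each country maps to its own position in the input vector:

```python
import numpy as np

# Hypothetical, shortened country list for illustration only.
countries = ["Finland", "Thailand", "Brazil", "Kenya"]
index = {name: i for i, name in enumerate(countries)}

def one_hot(country):
    """Return a vector with one input per country; only the matching one is 1."""
    vec = np.zeros(len(countries))
    vec[index[country]] = 1.0
    return vec

print(one_hot("Thailand"))  # [0. 1. 0. 0.]
```

Every country then contributes an input of the same magnitude, so no ordering between countries is implied.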

Time series classification MATLAB

My task is to classify time-series data using MATLAB and any neural network framework.
Describing the task more specifically:
It is a problem from the computer vision field: a scene boundary detection task.
The source data are 4 arrays of neighbouring-frame histogram correlations from the video stream.
Based on this data, we have to classify the time series into 2 classes:
"scene break"
"no scene break"
So the network input is 4 double values for each source data entry, and the output is one binary value. An example of the source data is shown below:
0.997894,0.999413,0.982098,0.992164
0.998964,0.999986,0.999127,0.982068
0.993807,0.998823,0.994008,0.994299
0.225917,0.000000,0.407494,0.400424
0.881150,0.999427,0.949031,0.994918
The problem is that the pattern recognition tools from the MATLAB Neural Network Toolbox (like patternnet) treat the source data as independent entries. But I strongly believe that the results will be precise only if the net makes its decision based on the history of previous correlations.
I also did not manage to get a valid response from the recurrent nets intended for time series analysis (like delaynet and narxnet).
narxnet and delaynet return lousy results, and it looks like these types of networks are not meant to solve classification tasks. I am not including any code here, since it is almost entirely autogenerated by the MATLAB Neural Network Toolbox GUI.
I would appreciate any help, especially advice on which tool is better suited to my task.
I am not sure how difficult this classification problem is.
Given your sample, a feed-forward neural network with 4 inputs and 1 output is sufficient.
If you insist on using historical inputs, you simply pre-process your input d as follows.
Your new input D(t) (a vector at time t) is composed of d(t), a 1x4 vector at time t; d(t-1), a 1x4 vector at time t-1; ... and d(t-k), a 1x4 vector at time t-k.
If t-k < 0, just treat d(t-k) as 0.
So you have a 1x(4(k+1)) vector as input, and 1 output.
As Dan mentioned, you need to find a good k.
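The pre-processing described above can be sketched in Python with NumPy (used here instead of MATLAB purely for illustration). The function name `add_history` is hypothetical, and the sample rows are the first three from the question:

```python
import numpy as np

def add_history(d, k):
    """Stack d(t), d(t-1), ..., d(t-k) into one row per time step.

    d has shape (T, 4); the result has shape (T, 4 * (k + 1)).
    Time steps before the start of the series are filled with zeros.
    """
    T, n = d.shape
    padded = np.vstack([np.zeros((k, n)), d])  # k zero rows in front
    # Lag j occupies columns j*n .. (j+1)*n and holds d(t - j).
    lags = [padded[k - j : k - j + T] for j in range(k + 1)]
    return np.hstack(lags)

# First three rows of the sample data from the question.
d = np.array([[0.997894, 0.999413, 0.982098, 0.992164],
              [0.998964, 0.999986, 0.999127, 0.982068],
              [0.993807, 0.998823, 0.994008, 0.994299]])
D = add_history(d, k=2)
print(D.shape)  # (3, 12)
```

Each row of `D` can then be fed to an ordinary feed-forward classifier; trying a few values of `k` on held-out data is one way to pick it.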
Speaking of the weights, I think additional pre-processing such as a windowing method on the input is not necessary, since the neural network would be trained to assign weights to each input dimension.
It is a bit messy, though, since the neural network would consider each input dimension independently. That means you lose the information that the four values are neighbouring correlations.
One possible solution is to have the pre-processing extract neighbourhood features, e.g. using the mean and std as two features representative of the originals.
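As a rough sketch of that last idea, assuming NumPy and two sample rows taken from the question, the four correlations of each time step can be summarized by their mean and standard deviation:

```python
import numpy as np

# One row per time step: the 4 neighbouring-frame correlations (from the question).
d = np.array([[0.997894, 0.999413, 0.982098, 0.992164],
              [0.225917, 0.000000, 0.407494, 0.400424]])

# Two summary features per row instead of the four raw values.
features = np.column_stack([d.mean(axis=1), d.std(axis=1)])
print(features.shape)  # (2, 2)
```

Whether these two features are enough to separate "scene break" from "no scene break" would have to be checked on the real data.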