Neural Network Categorical Data Implementation

I've been learning to work with neural networks as a hobby project, but I am at a complete loss as to how to handle categorical data. I read the article http://visualstudiomagazine.com/articles/2013/07/01/neural-network-data-normalization-and-encoding.aspx, which explains normalization of the input data and how to preprocess categorical data using effects encoding. I understand the concept of breaking the categories into vectors, but have no idea how to actually implement this.
For example, if I'm using countries as categorical data (e.g. Finland, Thailand, etc), would I process the resulting vector into a single number to be fed to a single input, or would I have a separate input for each component of the vector? Under the latter, if there are 196 different countries, that would mean I would need 196 different inputs just to process this particular piece of data. If a lot of different categorical data is being fed to the network, I can see this becoming really unwieldy very fast.
Is there something I'm missing? How exactly is categorical data mapped to neuron inputs?

Neural network inputs
As a rule of thumb: different classes and categories should have their own input signals.
Why you can't encode it with a single input
Since a neural network acts on its input values through activation functions, a higher input value feeds a higher activation into the neuron, making it more likely to fire.
Unless you want to tell the network that Thailand is "better" than Finland, you should not encode the country input signal as InputValue(Finland) = 24, InputValue(Thailand) = 140.
How it should be encoded
Each country deserves its own input signal so that each contributes equally to activating the neurons.
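For illustration, here is a minimal Python sketch of both schemes (the country list and helper names are made up for this example; effects encoding follows the form described in the linked article, with C-1 inputs and the last category coded as all -1s):

import numpy as np

countries = ["Finland", "Thailand", "Brazil"]  # a real list could have 196 entries

def one_hot(category, categories):
    # One input per category: 1 in the matching slot, 0 elsewhere.
    vec = np.zeros(len(categories))
    vec[categories.index(category)] = 1.0
    return vec

def effects_encode(category, categories):
    # Effects encoding: C-1 inputs; the last category is all -1s.
    vec = np.zeros(len(categories) - 1)
    i = categories.index(category)
    if i == len(categories) - 1:
        vec[:] = -1.0
    else:
        vec[i] = 1.0
    return vec

one_hot("Thailand", countries)       # [0., 1., 0.] -> three input neurons
effects_encode("Brazil", countries)  # [-1., -1.]   -> two input neurons

Each component of the resulting vector feeds its own input neuron, so no country is numerically "bigger" than another.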

Related

Avoiding Dummy variable trap and neural network

I know that categorical data should be one-hot encoded before training a machine learning algorithm. I also know that for multivariate linear regression I need to exclude one of the encoded variables to avoid the so-called dummy variable trap.
For example, if I have a categorical feature "size" with values "small", "medium", "large", then one-hot encoded I would have something like:
small  medium  large  other-feature
  0      1       0        2999
So to avoid the dummy variable trap I need to remove one of the 3 columns, for example the column "small".
Should I do the same for training a neural network? Or is this purely for multivariate regression?
Thanks.
As stated here, the dummy variable trap needs to be avoided (one category of each categorical feature removed after encoding but before training) in the input of algorithms that consider all the predictors together, as a linear combination. Such algorithms include:
Linear/multilinear regression
Logistic regression
Discriminant analysis
Neural networks that don't employ weight decay
If you remove a category from the input of a neural network that employs weight decay, it will instead become biased in favor of the omitted category.
Even though no information is lost when omitting one category after encoding a feature, other algorithms will have to infer the correlation of the omitted category indirectly through the combination of all the other categories, making them do more computation for the same result.
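As a concrete sketch, pandas can do the encoding and the drop in one step (reusing the "size" feature from the question):

import pandas as pd

sizes = pd.Series(["small", "medium", "large", "medium"], name="size")

full = pd.get_dummies(sizes)                      # one column per category
reduced = pd.get_dummies(sizes, drop_first=True)  # drops the alphabetically first category, "large"

print(reduced)  # only the "medium" and "small" columns remain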

Input values of an ANN constructed with keras framework (using theano)

I want to construct a neural network which will be trained on data I create. My question is: what form should these data have? In other words, does Keras allow neural networks that take strings/characters as input? If not, and it is only able to accept numbers, in what range should the input/output be?
The only condition for your input data, i.e. features, is that they should be numerical. There isn't really any constraint on range, but it's always a good idea to do feature scaling, normalization, etc. so that the model trains reliably. Neural networks and other machine learning methods cannot accept strings (characters, words) directly, so you first need to convert strings to numbers. There are many ways to do that; the most common techniques include bag of words, tf-idf features, word embeddings, etc.
Following tutorials (using scikit) might be a good starting point:
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words
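As a minimal sketch, a bag-of-words conversion with scikit-learn turns strings into the kind of numeric matrix a Keras model can accept (the sentences are made up):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the tap is running", "the shower is cold"]  # made-up example strings

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs).toarray()  # numeric matrix of word counts

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X)  # each row is now a numeric feature vector for a network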

Pattern recognition techniques that allow input as sequences of different lengths

I am trying to classify water end-use events, expressed as time-series sequences, into appropriate categories (e.g. toilet, tap, shower, etc). My first attempt using an HMM shows quite promising results, with an average accuracy of 80%. I just wonder if there are any other techniques that accept training input as time-series sequences of different lengths, as HMMs do, rather than an extracted feature vector for each sequence. I have tried Conditional Random Fields (CRF) and SVMs; however, as far as I know, these two techniques require a pre-computed feature vector as input, and all input vectors must have the same length for training. I am not sure if I am right or wrong on this point. Any help would be appreciated.
Thanks, Will

How to use created "net" neural network object for prediction?

I used ntstool to create a NAR (nonlinear autoregressive) net object, by training on a 1x1247 input vector (daily stock prices for 6 years).
I have finished all the steps and saved the resulting net object to workspace.
Now I am clueless about how to use this object to predict y(t) for, say, t = 2000 (I trained the model for t = 1:1247).
In some other threads, people recommended using the sim(net, t) function; however, this gives me the same result for any value of t (likewise with the net(t) function).
I am not familiar with the specific neural net commands, but I think you are approaching this problem in the wrong way. Typically you want to model the evolution in time. You do this by specifying a certain window, say 3 months.
What you are training on now is a single input vector, which has no information about evolution in time. The reason you always get the same prediction is that you only used a single point for training (even though it is 1247-dimensional, it is still one point).
You probably want to make input vectors of this nature (for simplicity, assume you are working with months):
[month1 month2; month2 month3; month3 month4]
This example contains 2 training points with the evolution of 3 months. Note that they overlap.
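Here is a sketch of the same windowing idea in Python (the window length of 3 is just for illustration; in Matlab the NAR network's delay settings play this role):

import numpy as np

def make_windows(series, w):
    # Overlapping (window, next value) pairs: predict series[i+w] from series[i:i+w].
    X = np.array([series[i:i + w] for i in range(len(series) - w)])
    y = np.array([series[i + w] for i in range(len(series) - w)])
    return X, y

prices = np.arange(10.0)          # stand-in for the 1x1247 price vector
X, y = make_windows(prices, w=3)  # X[0] = [0, 1, 2] predicts y[0] = 3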
Use the Network
After the network is trained and validated, the network object can be used to calculate the network response to any input. For example, if you want to find the network response to the fifth input vector in the housing data set, you can use the following:
a = net(houseInputs(:,5))
a =
34.3922
If you try this command, your output might be different, depending on the state of your random number generator when the network was initialized. Below, the network object is called to calculate the outputs for a concurrent set of all the input vectors in the housing data set. This is the batch mode form of simulation, in which all the input vectors are placed in one matrix. This is much more efficient than presenting the vectors one at a time.
a = net(houseInputs);
Each time a neural network is trained, it can result in a different solution due to different initial weight and bias values and different divisions of the data into training, validation, and test sets. As a result, different neural networks trained on the same problem can give different outputs for the same input. To ensure that a neural network of good accuracy has been found, retrain it several times.
There are several other techniques for improving upon initial solutions if higher accuracy is desired. For more information, see Improve Neural Network Generalization and Avoid Overfitting.

Time series classification MATLAB

My task is to classify time-series data with use of MATLAB and any neural-network framework.
Describing task more specifically:
It is a problem from the computer vision field: a scene boundary detection task.
The source data are 4 arrays of neighbouring-frame histogram correlations from the video stream.
Based on this data, we have to classify this time series into 2 classes:
"scene break"
"no scene break"
So the network input is 4 double values for each source data entry, and the output is one binary value. An example of the source data is shown below:
0.997894,0.999413,0.982098,0.992164
0.998964,0.999986,0.999127,0.982068
0.993807,0.998823,0.994008,0.994299
0.225917,0.000000,0.407494,0.400424
0.881150,0.999427,0.949031,0.994918
The problem is that the pattern-recognition tools from the Matlab Neural Toolbox (like patternnet) treat the source data as independent entries. But I strongly believe that the results will be precise only if the net takes decisions based on the history of previous correlations.
I also did not manage to get a valid response from the recurrent nets that serve time-series analysis (like delaynet and narxnet).
narxnet and delaynet return lousy results, and it looks like these types of networks are not supposed to solve classification tasks. I am not inserting any code here because it is almost totally autogenerated with the Matlab Neural Toolbox GUI.
I would appreciate any help, especially advice on which tool fits my task better.
I am not sure how difficult this problem is to classify.
Given your sample, a feed-forward neural network with 4 inputs and 1 output should be sufficient.
If you insist on using historical inputs, you simply pre-process your input d so that your new input D(t) (a vector at time t) is the concatenation of d(t), d(t-1), ..., d(t-k), where each d(t-i) is a 1x4 vector. If an index t-i is negative, just treat that entry as 0. So you have a 1x(4(k+1)) vector as input, and 1 output.
As Dan mentioned, you need to find a good k.
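A sketch of that pre-processing in Python (assuming d is an N x 4 array of correlation entries and k is the chosen history length):

import numpy as np

def history_vector(d, t, k):
    # Concatenate d(t), d(t-1), ..., d(t-k) into one 1x(4*(k+1)) vector.
    parts = []
    for i in range(k + 1):
        if t - i >= 0:
            parts.append(d[t - i])
        else:
            parts.append(np.zeros(d.shape[1]))  # treat entries before t = 0 as 0
    return np.concatenate(parts)

d = np.random.rand(100, 4)       # 100 time steps of 4 correlations
D = history_vector(d, t=2, k=3)  # 16-dimensional network input at t = 2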
Speaking of the weights, I think additional pre-processing such as a windowing method on the input is not necessary, since the neural network will be trained to assign weights to each input dimension.
That sounds a bit messy, though, since the neural network would consider each input dimension independently. It means you lose the information that the four correlations are neighbours.
One possible solution is for the pre-processing to extract neighbourhood features, e.g. using the mean and standard deviation as two features representative of the originals.
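For instance (a hypothetical two-feature summary of each 1x4 entry):

import numpy as np

d = np.random.rand(100, 4)  # 100 entries of 4 neighbouring correlations

# Replace each 1x4 entry by its mean and standard deviation, so the network
# sees the neighbourhood as a whole rather than 4 independent inputs.
features = np.column_stack([d.mean(axis=1), d.std(axis=1)])  # shape (100, 2)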