How to use multiple labels as targets in Neural Net Pattern Recognition Toolbox? - matlab

I am trying to use the Neural Net Pattern Recognition toolbox in MATLAB for recognizing different types of classes in my dataset. I have a 21392 x 4 table, with columns 1-3 being the predictors I would like to use, and the 4th column holding the labels with 14 different categories (strings like Angry, Sad, Happy, Neutral etc.). It seems that the Neural Net Pattern Recognition toolbox, unlike the MATLAB Classification Learner toolbox, doesn't allow me to import the table and automatically extract the predictors and responses from it. Moreover, I am unable to specify the inputs and targets to the neural network manually, as the table isn't showing up in the options.
I looked into the examples like the Iris dataset, Wine dataset, Cancer dataset etc., but all of them only have 2-3 classes as outputs (one-hot encoded as binary vectors like 100, 010, 001 etc.), and their labels are not strings, unlike mine (Angry, Sad, Happy, Neutral etc.; 14 classes in total). I would like to know how I can use my table as input to the Neural Net Pattern Recognition toolbox, or otherwise, any way in which I can extract the data from my table and use it in the toolbox. I am new to using the toolbox, so any help in this regard would be highly appreciated. Thanks!

The first step in using the Neural Net Pattern Recognition Toolbox is to convert the table to numeric arrays, as neural networks work only with numeric arrays, not other datatypes directly. Since the 4th column holds string labels while columns 1-3 are numeric, table2array cannot be applied to the whole table at once; extract the predictors and labels separately. Considering the table as my_table:
inputs = table2array(my_table(:, 1:3)); % 21392 x 3 numeric predictors
labels = my_table{:, 4}; % 21392 x 1 labels
Next, it is imperative to mention that the inputs and targets need to be transposed, as the toolbox expects the data in column format: each column is one datapoint and each row is a feature. This can easily be accomplished using:
inputs = inputs'; %(now of dimensions 3x21392)
labels = labels'; %(now of dimensions 1x21392)
The string labels can be one-hot encoded by converting them to a categorical array, then to numeric indices with double, and finally to indicator vectors with ind2vec:
my_table_vector = ind2vec(double(categorical(labels)));
Now my_table_vector (the final targets) and inputs (the final predictors) can easily be fed to the neural network and used for classification/prediction of the target labels.
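For reference, here is a minimal end-to-end sketch of the same workflow done programmatically with patternnet (the network behind the toolbox GUI); the hidden layer size of 10 is only an illustrative default:
inputs = table2array(my_table(:, 1:3))'; % 3 x 21392 predictors
labels = categorical(my_table{:, 4}); % the 14 emotion categories
targets = full(ind2vec(double(labels)')); % 14 x 21392 one-hot targets
net = patternnet(10); % one hidden layer with 10 neurons (illustrative)
net = train(net, inputs, targets); % train with default settings
scores = net(inputs); % 14 x 21392 class scores
names = categories(labels); % category names in index order
predicted = names(vec2ind(scores)); % map scores back to label strings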

Related

Input values of an ANN constructed with keras framework (using theano)

I want to construct a neural network which will be trained on data I create. My question is: what form should these data have? In other words, does Keras allow neural networks that take strings/characters as input? If not, and it is only able to accept numbers, in what range should the input/output be?
The only condition for your input data, i.e. the features, is that they should be numerical. There isn't really any constraint on range, but it's always a good idea to do feature scaling, normalization etc. to make sure the model won't get confused. Neural networks and other machine learning methods cannot accept strings (characters, words) directly, so you first need to convert the strings to numbers. There are many ways to do that; the most common techniques include bag of words, tf-idf features, word embeddings etc.
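Since this thread is otherwise MATLAB-centric, here is a minimal sketch of the bag-of-words/tf-idf idea, assuming MATLAB's Text Analytics Toolbox (in Python, scikit-learn's CountVectorizer and TfidfVectorizer cover the same ground):
docs = tokenizedDocument(["the cat sat"; "the dog barked"]); % toy strings
bag = bagOfWords(docs); % word-count (bag-of-words) model
X = full(tfidf(bag)); % numeric tf-idf matrix, one row per string
Each row of X is then a numeric feature vector you can feed to a network.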
The following tutorials (using scikit-learn) might be a good starting point:
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words

Training SVM classifier in MATLAB with numeric+text data

I want to train an SVM classifier in MATLAB for threat detection. The training data is in an Excel file and contains both numeric and text fields/columns. When I export this data to MATLAB, it is either in table or cell format. How do I convert it to matrix format?
P.S.: Using the xlsread function does not import the text data.
There are 4 types of attributes in data: numerical, discrete, nominal and ordinal. First run a statistical analysis for each feature in your dataset to get the basic statistics such as mean, median, max, min and variable type, and, if it is nominal or ordinal, the distinct values. You then have a pretty good idea what you are dealing with. Then, according to the variable type, you can decide which vectorization to use: if it is a numerical variable, you can divide it into different classes and apply feature scaling; if it is an ordinal variable, you can give it a logical order; if it is a nominal variable, you can give each value an identifying numeric code. Here you are just checking how much each feature contributes to the final prediction.
My advice: use the Weka GUI tool to visualize the data. Then you can pre-process the data column by column.
You need to transform your text fields into numeric ones using dummy variables or another technique, or drop them entirely if they actually are IDs (e.g. patient name for medical data, record number, respondent uuid for a survey, etc.).
This would actually be easier in R or Python+pandas, but in MATLAB you will need to perform the encoding yourself, working from the cell array towards a matrix. Or you can try this toolbox.
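A minimal sketch of the dummy-variable route, assuming the Statistics and Machine Learning Toolbox; text_col, num_cols and y are hypothetical names for the imported columns and the class labels:
g = categorical(text_col); % map strings to categories
D = dummyvar(g); % n x k 0/1 indicator matrix, one column per category
X = [num_cols, D]; % numeric design matrix ready for training
mdl = fitcsvm(X, y); % binary SVM; use fitcecoc(X, y) for multiclass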

Neural Network Categorical Data Implementation

I've been learning to work with neural networks as a hobby project, but am at a complete loss as to how to handle categorical data. I read the article http://visualstudiomagazine.com/articles/2013/07/01/neural-network-data-normalization-and-encoding.aspx, which explains normalization of the input data and how to preprocess categorical data using effects encoding. I understand the concept of breaking the categories into vectors, but have no idea how to actually implement this.
For example, if I'm using countries as categorical data (e.g. Finland, Thailand, etc), would I process the resulting vector into a single number to be fed to a single input, or would I have a separate input for each component of the vector? Under the latter, if there are 196 different countries, that would mean I would need 196 different inputs just to process this particular piece of data. If a lot of different categorical data is being fed to the network, I can see this becoming really unwieldy very fast.
Is there something I'm missing? How exactly is categorical data mapped to neuron inputs?
Neural network inputs
As a rule of thumb: different classes and categories should have their own input signals.
Why you can't encode it with a single input
Since a neural network acts on the input values through activation functions, a higher input value feeds a larger value into the activation function.
A higher input value will make the neuron more likely to fire.
So unless you want to tell the network that Thailand is "better" than Finland, you should not encode the country input signal as InputValue(Finland) = 24, InputValue(Thailand) = 140.
How it should be encoded
Each country deserves its own input signal so that they contribute equally to activating the neurons.
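As a concrete sketch in MATLAB (country names are illustrative), each country becomes its own 0/1 input:
countries = categorical({'Finland'; 'Thailand'; 'Finland'}); % toy data
onehot = full(ind2vec(double(countries)')); % k x n one-hot columns
With 196 countries this does mean 196 inputs, but they are sparse 0/1 signals, which is the standard approach and usually not a problem in practice.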

Pattern recognition techniques that allow input as sequences of different length

I am trying to classify water end-use events, expressed as time-series sequences, into appropriate categories (e.g. toilet, tap, shower, etc.). My first attempt using an HMM shows quite a promising result, with an average accuracy of 80%. I just wonder if there are any other techniques that allow the training input to be time-series sequences of different lengths, as HMMs do, rather than an extracted feature vector per sequence. I have tried Conditional Random Fields (CRF) and SVMs; however, as far as I know, these two techniques require the input as a pre-computed feature vector, and the length of all input vectors must be the same for training purposes. I am not sure if I am right or wrong at this point. Any help would be appreciated.
Thanks, Will

Time series classification MATLAB

My task is to classify time-series data using MATLAB and any neural-network framework.
Describing the task more specifically:
It is a problem from the computer-vision field: a scene boundary detection task.
The source data are 4 arrays of neighbouring-frame histogram correlations from the video stream.
Based on this data, we have to classify this timeseries with 2 classes:
"scene break"
"no scene break"
So the network input is 4 double values for each source data entry, and the output is one binary value. An example of the source data is shown below:
0.997894,0.999413,0.982098,0.992164
0.998964,0.999986,0.999127,0.982068
0.993807,0.998823,0.994008,0.994299
0.225917,0.000000,0.407494,0.400424
0.881150,0.999427,0.949031,0.994918
The problem is that pattern-recognition tools from the Matlab Neural Toolbox (like patternnet) treat the source data as independent entries. But I have a strong belief that the results will be precise only if the net takes its decision based on the history of previous correlations.
However, I also did not manage to get a valid response from the recurrent nets that serve time-series analysis (like delaynet and narxnet).
narxnet and delaynet return lousy results, and it looks like these types of networks are not supposed to solve classification tasks. I am not inserting any code here, as it is almost totally autogenerated with the Matlab Neural Toolbox GUI.
I would appreciate any help, especially some advice on which tool fits my task better.
I am not sure how difficult this problem is to classify.
Given your sample, a feed-forward neural network with 4 inputs and 1 output is sufficient.
If you insist on using historical inputs, you simply pre-process your input d such that your new input D(t) (a vector at time t) is composed of d(t), a 1x4 vector at time t; d(t-1), a 1x4 vector at time t-1; ...; and d(t-k), a 1x4 vector at time t-k.
If t-k < 0, just treat those entries as 0.
So you have a 1x(4(k+1)) vector as input, and 1 output.
As Dan mentioned, you need to find a good k.
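A minimal sketch of that windowing step, assuming d is an n x 4 matrix of the correlation rows (all names are illustrative):
n = size(d, 1); % number of time steps
k = 2; % history length; this is the parameter to tune
D = zeros(n, 4 * (k + 1)); % one windowed row per time step
for t = 1:n
    for i = 0:k
        if t - i >= 1 % inside the series
            D(t, 4*i+1 : 4*i+4) = d(t-i, :); % frame at time t-i
        end % entries before the start stay 0
    end
end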
Speaking of the weights, I think additional pre-processing like the windowing method on the input is not strictly necessary, since the neural network will be trained to assign weights to each input dimension anyway.
It does sound a bit messy, though, since the neural network considers each input dimension independently. That means you lose the information that the four values are neighbouring correlations.
One possible solution is to have the pre-processing extract neighbourhood features, e.g. using the mean and standard deviation as two features representative of the originals.
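For instance, a one-line sketch of that summarisation, again assuming d is n x 4:
features = [mean(d, 2), std(d, 0, 2)]; % n x 2: per-row mean and std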