I want to do classification using a multilayer perceptron (MLP) trained with the backpropagation algorithm.
I have 5 classes, and any input belongs to exactly one class (no multi-label).
Ex: C1 C2 C3 C4 C5
Input 1 belongs to only C2
Input 2 belongs to only C5
How should I represent the output layer for each input?
Input      Output layer
input1 :   0 1 0 0 0
input2 :   0 0 0 0 1
or
take only a single neuron in the output layer:
Input      Output layer
input1 :   0.4
input2 :   1.0
if <=0.2 C1
if <=0.4 C2
if <=0.6 C3
if <=0.8 C4
if <=1.0 C5
Is there any better method?
Thanks,
Atish
You should represent your 5 classes by 5 binary outputs. This is known as 1-of-C encoding, one-hot encoding, dummy variables, indicators, ...
Then you need a softmax activation function in the output layer, which will give you class probabilities as outputs. In addition, you should use the cross-entropy (CE) error function. Softmax + CE gives you the same output-layer gradient as identity + SSE: dE/da_i = y_i - t_i. Softmax + CE has been used for up to 20,000 classes on the ImageNet dataset.
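To see the dE/da_i = y_i - t_i identity concretely, here is a small numpy sketch (the pre-activation values are made up for illustration) that computes softmax + CE and confirms the analytic gradient against a numerical one:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))   # shift by max for numerical stability
    return e / e.sum()

# hypothetical pre-activations for 5 classes; one-hot target = class 2
a = np.array([0.5, 2.0, -1.0, 0.3, 0.1])
t = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

y = softmax(a)
ce = -np.sum(t * np.log(y))     # cross-entropy error

# analytic gradient of CE w.r.t. the pre-activations: dE/da_i = y_i - t_i
grad = y - t

# numerical check of that gradient by central differences
eps = 1e-6
num_grad = np.zeros_like(a)
for i in range(len(a)):
    ap, am = a.copy(), a.copy()
    ap[i] += eps
    am[i] -= eps
    num_grad[i] = (-np.sum(t * np.log(softmax(ap)))
                   + np.sum(t * np.log(softmax(am)))) / (2 * eps)
```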
In most neural-network setups, you are better off configuring one output node per class. :)
Consider the following hypothetical neural network:
$o_1$ is the output of neuron 1.
$o_2$ is the output of neuron 2.
$w_1$ is the weight of connection between 1 and 3.
$w_2$ is the weight of connection between 2 and 3.
So the input to neuron 3 is $i =o_1w_1 +o_2w_2$
Let the activation function of neuron 3 be sigmoid function.
$f(x) = \dfrac{1}{1+e^{-x}}$ and the threshold value of neuron 3 be $\theta$.
Therefore, output of neuron 3 will be $f(i)$ if $i\geq\theta$ and $0$ if $i\lt\theta$.
Am I correct?
Thresholds are used for binary (threshold) neurons, whereas biases are used for sigmoid (and pretty much all modern) neurons. Your understanding of the threshold is correct, but it applies to neurons whose output is either 1 or 0, which is not very useful for learning (optimization), since the step function is not differentiable. With a sigmoid neuron, you simply add the bias (previously the threshold, moved to the other side of the equation), so your output would be f(weight * input + bias). All the sigmoid function is doing (for the most part) is limiting your output to a value between 0 and 1.
I do not think this is the place to ask this sort of question; you will find a lot of NN resources online. For your simple case, each link has a weight, so basically the input of neuron 3 is:
Neuron3Input = Neuron1Output * WeightOfLinkNeuron1To3 + Neuron2Output * WeightOfLinkNeuron2To3 + Bias
Then, to get the output, just apply the activation function: Neuron3Output = F_Activation(Neuron3Input)
O3 = F(O1 * W1 + O2 * W2 + Bias)
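As a runnable sketch of that formula (the concrete values for the outputs, weights and bias are made up, not from the question):

```python
import math

def sigmoid(x):
    """Sigmoid activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# illustrative values, not from the thread
o1, o2 = 0.5, 0.8      # outputs of neurons 1 and 2
w1, w2 = 0.4, -0.3     # weights of the links into neuron 3
bias = 0.1

# O3 = F(O1 * W1 + O2 * W2 + Bias)
neuron3_input = o1 * w1 + o2 * w2 + bias
neuron3_output = sigmoid(neuron3_input)
```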
I am trying to use F1 scores for model selection in multiclass classification.
I am calculating them class-wise and averaging over them:
(F1(class1) + F1(class2) + F1(class3)) / 3 = F1(total)
However, in some cases I get NaN values for the F1 score. Here is an example:
Let true_label = [1 1 1 2 2 2 3 3 3] and pred_label = [2 2 2 2 2 2 3 3 3].
Then the confusion matrix looks like:
C =[0 3 0; 0 3 0; 0 0 3]
Which means when I calculate the precision (to calculate the F1 score) for the first class, I obtain: 0/(0+0+0), which is not defined or NaN.
Firstly, am I making a mistake in calculating F1 scores or precisions here?
Secondly, how should I treat these cases in model selection? Should I ignore them, or should I just set the F1 score for this class to 0 (reducing the total F1 score for this model)?
Any help would be greatly appreciated!
You need to avoid the division by zero for the precision in order to report meaningful results. You might find this answer useful, in which you explicitly report a poor outcome. Additionally, this implementation suggests an alternate way to differentiate in your reporting between good and poor outcomes.
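A minimal sketch of a zero-safe macro-averaged F1 (the function name and the `zero_as` parameter are my own, not from any library; scikit-learn's `f1_score` offers a similar `zero_division` argument):

```python
import numpy as np

def macro_f1(true_label, pred_label, n_classes, zero_as=0.0):
    """Macro-averaged F1 with explicit handling of undefined per-class scores."""
    # confusion matrix: rows = true class, cols = predicted class (labels 1..n)
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_label, pred_label):
        C[t - 1, p - 1] += 1
    f1s = []
    for k in range(n_classes):
        tp = C[k, k]
        fp = C[:, k].sum() - tp
        fn = C[k, :].sum() - tp
        if tp == 0:
            # precision and/or recall is zero or undefined -> score as `zero_as`
            f1s.append(zero_as)
        else:
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            f1s.append(2 * precision * recall / (precision + recall))
    return sum(f1s) / n_classes

# the example from the question: class 1 is never predicted
true_label = [1, 1, 1, 2, 2, 2, 3, 3, 3]
pred_label = [2, 2, 2, 2, 2, 2, 3, 3, 3]
score = macro_f1(true_label, pred_label, 3)
```

With `zero_as=0.0` the undefined precision for class 1 contributes 0 to the average, which penalizes the model for never predicting that class rather than silently dropping it.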
I have approximately 5000 integer vectors (=SIZE) that look like:
[1 0 4 2 0 1 3 ...]
They all have the same length N = 32, and their values range from 0 to 4, but let's say [0, MAX].
I created a NN that takes these vectors as inputs and outputs a binary array corresponding to one of the desired outputs (number of possible outputs = M):
for instance [0 1 0 0 ... 0] => 2nd output, array length = M
I used a multilayer perceptron in Neuroph with those integer values, but it did not converge.
So I am guessing the problem is either using integer values or using an MLP with 3 layers: input, hidden and output.
Can you advise me on the network structure? Which type of NN is suitable? Should I remodel the input and output to simplify the learning process? I have been thinking about Gray encoding for the integer inputs.
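The remodelling I have in mind would look something like this (a plain numpy sketch, not Neuroph; N and MAX taken from above, the example vector padded with zeros):

```python
import numpy as np

MAX = 4   # largest value an element can take
N = 32    # vector length

# example input vector, padded to length N for illustration
vec = np.array([1, 0, 4, 2, 0, 1, 3] + [0] * (N - 7))

# option 1: rescale to [0, 1] so all inputs sit in the sigmoid's useful range
scaled = vec / MAX

# option 2: one-hot encode each position, giving N * (MAX + 1) binary inputs
one_hot = np.zeros((N, MAX + 1))
one_hot[np.arange(N), vec] = 1.0
flat_input = one_hot.ravel()   # this flat binary vector feeds the input layer
```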
I would like to perform simple LDA on my small data set (65 x 8). I have 65 instances (samples), 8 features (attributes) and 4 classes. Is there any MATLAB code for LDA? As far as I know, the MATLAB toolbox does not have an LDA function, so I need to write my own code. Any help?
I found this code on the web:
load Data;                                  % loads the 65 x 9 matrix Data
All_data = Data(:,1:8);
All_data_label = Data(:,9);

% random ~20% test split
testing_ind = [];
for i = 1:length(Data)
    if rand > 0.8
        testing_ind = [testing_ind, i];
    end
end
training_ind = setxor(1:length(Data), testing_ind);

[ldaClass,err,P,logp,coeff] = classify(All_data(testing_ind,:), ...
    All_data(training_ind,:), All_data_label(training_ind,:), 'linear');
[ldaResubCM,grpOrder] = confusionmat(All_data_label(testing_ind,:), ldaClass)
Then I got these results:
ldaClass =
3
2
3
2
1
4
3
3
1
2
1
1
2
err =
0.2963
P =
0.0001 0.0469 0.7302 0.2229
0.1178 0.5224 0.3178 0.0419
0.0004 0.2856 0.4916 0.2224
0.0591 0.6887 0.1524 0.0998
0.8327 0.1637 0.0030 0.0007
0.0002 0.1173 0.3897 0.4928
0.0000 0.0061 0.7683 0.2255
0.0000 0.0241 0.5783 0.3976
0.9571 0.0426 0.0003 0.0000
0.2719 0.5569 0.1630 0.0082
0.9999 0.0001 0.0000 0.0000
0.9736 0.0261 0.0003 0.0000
0.0842 0.6404 0.2634 0.0120
coeff =
4x4 struct array with fields:
type
name1
name2
const
linear
ldaResubCM =
4 0 0 0
0 3 1 0
0 1 1 0
0 0 2 1
grpOrder =
1
2
3
4
So I have 65 instances, 8 attributes and 4 classes (1, 2, 3, 4), and I don't know how to interpret these results. Any help?
The interpretation of the results derives directly from the documentation of classify.
classify trains a classifier based on the training data and labels (second and third argument), and applies the classifier to the test data (first argument).
ldaClass gives the classes chosen for the test data points, based on the classifier that has been trained using the training data points and labels.
err is the training error rate, the fraction of training data points that are incorrectly classified using the classifier which was trained using that data. The training error rate underestimates the error to be expected on independent test data.
P gives the posterior probabilities. I.e. for each test data point (rows) it gives for each class (columns) the probability that the data point belongs to that class. Probabilities sum to 1 across classes (for each row). The definite classification in ldaClass derives from the posterior probabilities such that for each test data point the class with the highest probability is chosen: [~, ind] = max(P') results in ind = ldaClass'.
coeff contains details about the trained classifier. In order to use this, you have to study in detail how the classifier works.
confusionmat compares the classes assigned by the classifier to the test data with the known true classes and tabulates the results in a confusion matrix. Each row corresponds to the true class of a test data point, each column to the class assigned by the classifier. Numbers on the diagonal indicate correct classifications; in your result, the test error is 1 - sum(diag(ldaResubCM)) / sum(ldaResubCM(:)) = 0.308. In particular, the confusion matrix shows you that of the 4 test data points that belong to class two, three have been classified correctly and one incorrectly (as belonging to class three).
grpOrder just gives the explicit class labels for the four classes numbered 1 to 4; in your case, indices and labels are identical.
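You can reproduce these relationships outside MATLAB as well; here is a quick numpy sketch using three of your posted rows of P and your posted confusion matrix:

```python
import numpy as np

# three rows of the posted posterior matrix P (rows = test points, cols = classes)
P = np.array([[0.0001, 0.0469, 0.7302, 0.2229],
              [0.1178, 0.5224, 0.3178, 0.0419],
              [0.8327, 0.1637, 0.0030, 0.0007]])

# predicted class = column with the highest posterior (1-based, as in MATLAB)
pred = P.argmax(axis=1) + 1

# test error from the posted confusion matrix (rows = true, cols = predicted)
C = np.array([[4, 0, 0, 0],
              [0, 3, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 2, 1]])
test_error = 1 - np.trace(C) / C.sum()
```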
I have some basic questions regarding multivariate models. In the ARFIT toolbox, the demo file ardem.m shows the working of a 2nd-order bivariate (v1, v2) AR model. The coefficient matrices
A1 = [ 0.4 1.2; 0.3 0.7 ]
A2 = [ 0.35 -0.3; -0.4 -0.5 ]
are concatenated into
A = [ A1 A2 ]
Then the transpose of A is taken, so the result is a 4 x 2 matrix.
My question is: there should be only 4 coefficients, viz. 2 for variable v1 and 2 for variable v2, so why are there 8 coefficients? Is the equation format
v(k,:) = a11*v1(k-1)+a12*v1(k-2) + a21*v2(k-1)+ a22*v2(k-2)
where a11 = 0.4, a12=1.2, a21=0.3 and a22=0.7.
I think I am missing something in my understanding. Can somebody please explain the correct representation?
The matrices A1 and A2 contain transfer coefficients that describe the contribution of states at times k-1 and k-2, respectively, to the state at time k. Since this is a bivariate process, we are following two variables which can influence each other, and both A1 and A2 are 2 x 2. Writing v1 = v(k,1) and v2 = v(k,2):
v1(k) = A1(1,1)*v1(k-1) + A1(1,2)*v2(k-1) + A2(1,1)*v1(k-2) + A2(1,2)*v2(k-2)
and similarly for v2(k). Then collectively A1 and A2 contain 8 elements. If the two processes were independent then A1 and A2 would be diagonal and would collectively contain only 4 nonzero elements.
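If it helps, the recursion is easy to simulate directly; in this numpy sketch the initial values are made up and the noise term is omitted for clarity:

```python
import numpy as np

# the coefficient matrices from ardem.m
A1 = np.array([[0.4, 1.2], [0.3, 0.7]])
A2 = np.array([[0.35, -0.3], [-0.4, -0.5]])

T = 10
v = np.zeros((T, 2))
v[0] = [1.0, 0.5]    # made-up initial states
v[1] = [0.2, -0.1]

for k in range(2, T):
    # v(k) = A1 * v(k-1) + A2 * v(k-2); a noise term would normally be added
    v[k] = A1 @ v[k - 1] + A2 @ v[k - 2]
```

The matrix-vector products expand exactly into the componentwise equation above, using all 8 coefficients.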
By the way, this is not really a MATLAB question, so I don't think this is the right forum for it.