normalization command before using classifier in matlab - matlab

I am using normalization command:
normA = Xtrain - min(Xtrain(:));
Xtrain = normA ./ max(normA(:));
normB = Xtest - min(Xtest(:));
Xtest = normB ./ max(normB(:));
to normalize my data before using a classifier (decision tree), but every time I get very poor accuracy, around 55.00. Meanwhile, I get an accuracy of 93.88 without the normalization. Can anyone tell me exactly what the problem is and what I have to do?
This is my code:
load('train_and_test_data.mat')
Xtrain= Xtrain(:, 2:42);
Xtest= Xtest(:,2:42);
normA = Xtrain - min(Xtrain(:));
Xtrain = normA ./ max(normA(:));
normB = Xtest - min(Xtest(:));
Xtest = normB ./ max(normB(:));
Mdl = fitctree(Xtrain ,Ytrain);
y =Mdl.predict(Xtest); %test
Conf_Mat = confusionmat(Ytest,y)
This is a small sample of the data I am using, before normalization:
1 0 0 0 0
17 4 2 2 0
38 20 17 0 0
11 2 2 0 0
2 1 1 0 0
11 1 4 0 0
8 5 1 1 1
21 1 16 0 0
27 12 11 0 0
13 11 2 1 0
12 3 2 2 1

You are not normalizing the training and the test set using the same transformation.
normA = Xtrain - min(Xtrain(:));
Xtrain = normA ./ max(normA(:));
normB = Xtest - min(Xtest(:));
Xtest = normB ./ max(normB(:));
You are subtracting a different amount and dividing by a different amount, so the inputs from your test set are not comparable to the inputs from your training set. Instead, normalize your test data using the same transformation as the training data:
normB = Xtest - min(Xtrain(:));
Xtest = normB ./ max(normA(:));
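One thing to watch for: in the script from the question, Xtrain has already been overwritten by the time the test set is normalized, so min(Xtrain(:)) evaluated at that point would be 0 rather than the original minimum. A minimal sketch that avoids this by storing the constants first (the small matrices and the names trainMin and trainRange are made up for illustration):

```matlab
% Small made-up matrices standing in for the real Xtrain / Xtest
Xtrain = [1 0 0; 17 4 2; 38 20 17];
Xtest  = [11 2 2; 2 1 1];

% Compute the scaling constants from the training set only,
% before Xtrain is overwritten
trainMin   = min(Xtrain(:));
trainRange = max(Xtrain(:)) - trainMin;

% Apply the same shift and scale to both sets
Xtrain = (Xtrain - trainMin) ./ trainRange;
Xtest  = (Xtest  - trainMin) ./ trainRange;
```

With this, the training set spans [0, 1], while test values may fall slightly outside that range if the test set contains values beyond the training minimum or maximum.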

Related

Set length of a column vector equal to the length of a submatrix

I am trying to use the convhull function in a loop and for that I need to split matrices into submatrices of different sizes. Here is the code I am using:
x1=data(:,5); % x centre location
y1=data(:,16); % y centre location
z1=phi*90; % phi angle value
n=300;
%Create regular grid across data space
[X,Y] = meshgrid(linspace(min(x1),max(x1),n), linspace(min(y1),max(y1),n));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% PLOT USING SCATTER - TRYING TO ISOLATE SOME REGIONS %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
c=z1>10 & z1 < 20;
c=c.*1;
j=1;
for i=1:length(z1)
if z1(i)< 20 && z1(i)> 10
c(i) = 1;
else
c(i)= 0;
end
end
C=[c c c];
C = ~C;
elementalLengthA = cellfun('length',regexp(sprintf('%i',all(C,2)),'1+','match'));
elementalStartA = regexp(sprintf('%i',all(C,2)),'1+','start');
result = cell(length(elementalLengthA),1);
for i = 1:length(elementalLengthA)
result(i) = {C(elementalStartA(i):elementalStartA(i)+elementalLengthA(i)-1,:)};
length(x1(i))=length(cell2mat(result(i)));
length(y1(i))=length(cell2mat(result(i)));
end
My for loop doesn't work properly and I get this error:
??? Subscript indices must either be real positive integers or logicals.
My matrix C is an n-by-3 matrix whose rows are all ones or all zeros. With the result(i) line I am splitting C into submatrices of consecutive rows of ones. Let's say
c = [1 1 1;
0 0 0;
0 0 0;
1 1 1;
1 1 1;
1 1 1;
0 0 0;
1 1 1;
1 1 1;]
Then
>> cell2mat(result(1))
ans =
1 1 1
>> cell2mat(result(2))
ans =
1 1 1
1 1 1
1 1 1
>> cell2mat(result(3))
ans =
1 1 1
1 1 1
Now x1 and y1 are two nx1 column vectors, and I want to split them according to the lengths of the C submatrices, so length(x1(1)) should be 1, length(x1(2))=3, length(x1(3))=2, and the same for the y vector.
Is it possible to do that?
EDIT:
Just to make it more clear
For instance
x1 =
1
2
3
4
5
6
7
8
9
and
y1 =
2
4
6
8
10
12
14
16
18
I want to get this as an output:
x1(1)=[1], x1(2)=[4 5 6]' and x1(3)=[8 9]'
y1(1)=[2], y1(2)=[8 10 12]' and y1(3)=[16 18]'
Thanks
Dorian
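A possible way to get the split described above, as a sketch rather than a tested fix for the full script: since the pieces have different lengths, they cannot be stored as x1(1), x1(2), and so on, but they can go into cell arrays. The names mask, runStart, runEnd, xParts and yParts are made up here, and mask stands for one column of the example c.

```matlab
% Example data from the question: 1 marks rows that belong to a region
mask = logical([1 0 0 1 1 1 0 1 1]');
x1 = (1:9)';
y1 = 2*(1:9)';

% Find where each run of ones starts and ends
d = diff([0; mask; 0]);
runStart = find(d == 1);
runEnd   = find(d == -1) - 1;

% Collect the pieces in cell arrays, since they have different lengths
nRuns  = numel(runStart);
xParts = cell(nRuns, 1);
yParts = cell(nRuns, 1);
for k = 1:nRuns
    xParts{k} = x1(runStart(k):runEnd(k));
    yParts{k} = y1(runStart(k):runEnd(k));
end
```

Each xParts{k} and yParts{k} can then be passed to convhull inside the same loop.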

How to plot a DET curve from results provided by Weka?

I am facing a classification problem with 4 classes. I used Weka for the classification and got a result in this form:
Correctly Classified Instances        3860       96.5    %
Incorrectly Classified Instances       140        3.5    %
Kappa statistic                          0.9533
Mean absolute error                      0.0178
Root mean squared error                  0.1235
Relative absolute error                  4.7401 %
Root relative squared error             28.5106 %
Total Number of Instances             4000
=== Detailed Accuracy By Class ===
               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.98      0.022     0.936       0.98     0.957       0.998      A
               0.92      0.009     0.973       0.92     0.946       0.997      B
               0.991     0.006     0.982       0.991    0.987       1          C
               0.969     0.01      0.971       0.969    0.97        0.998      D
Weighted Avg.  0.965     0.012     0.965       0.965    0.965       0.998
=== Confusion Matrix ===
   a    b    c    d   <-- classified as
 980   17    1    2 |   a = A
  61  920    1   18 |   b = B
   0    0  991    9 |   c = C
   6    9   16  969 |   d = D
My goal now is to draw the Detection Error Trade-off (DET) curve from the results provided by Weka.
I found MATLAB code that lets me draw the DET curve. Here are some lines of code from it:
Ntrials_True = 1000;
True_scores = randn(Ntrials_True,1);
Ntrials_False = 1000;
mean_False = -3;
stdv_False = 1.5;
False_scores = stdv_False * randn(Ntrials_False,1) + mean_False;
%-----------------------
% Compute Pmiss and Pfa from experimental detection output scores
[P_miss,P_fa] = Compute_DET(True_scores,False_scores);
The code of the function Compute_DET is:
function [Pmiss, Pfa] = Compute_DET(true_scores, false_scores)
num_true = max(size(true_scores));
num_false = max(size(false_scores));
total=num_true+num_false;
Pmiss = zeros(num_true+num_false+1, 1); %preallocate for speed
Pfa = zeros(num_true+num_false+1, 1); %preallocate for speed
scores(1:num_false,1) = false_scores;
scores(1:num_false,2) = 0;
scores(num_false+1:total,1) = true_scores;
scores(num_false+1:total,2) = 1;
scores=DETsort(scores);
sumtrue=cumsum(scores(:,2),1);
sumfalse=num_false - ([1:total]'-sumtrue);
Pmiss(1) = 0;
Pfa(1) = 1.0;
Pmiss(2:total+1) = sumtrue ./ num_true;
Pfa(2:total+1) = sumfalse ./ num_false;
return
But I have trouble interpreting the different parameters. For example, what is the significance of mean_False and stdv_False, and how do they correspond to the parameters of Weka?
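For what it is worth, in the snippet quoted above True_scores and False_scores are synthetic: randn draws them from normal distributions, and mean_False and stdv_False are just the mean and standard deviation chosen for the simulated non-target scores. They do not appear to correspond to anything in the Weka summary. A DET curve needs one score per instance, so here is a rough sketch, assuming per-instance scores for one class against the rest are available (the variable names and values below are made up), of how miss and false-alarm rates follow from such scores:

```matlab
% Hypothetical scores: in practice these would be the classifier's
% per-instance scores (e.g. predicted probability of class A),
% not values drawn from randn.
targetScores    = [0.9 0.8 0.75 0.4];   % instances that truly are class A
nonTargetScores = [0.6 0.3 0.2 0.1];    % instances of the other classes

% Use every observed score as a candidate decision threshold
thresholds = sort([targetScores nonTargetScores]);
Pmiss = zeros(size(thresholds));
Pfa   = zeros(size(thresholds));
for k = 1:numel(thresholds)
    t = thresholds(k);
    Pmiss(k) = mean(targetScores < t);      % targets rejected at threshold t
    Pfa(k)   = mean(nonTargetScores >= t);  % non-targets accepted at threshold t
end
```

The real score vectors would take the place of True_scores and False_scores in the call to Compute_DET.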

How to use train in neural networks for Matlab R2009b

I have input matrix as:
input =
1 0 0 1 1
1 0 0 0 1
1 0 0 0 1
1 0 0 0 1
0 0 1 0 0
0 1 1 1 0
0 1 1 1 0
and
T = [eye(10) eye(10) eye(10) eye(10)];
The neural network that I created is:
net = newff(input,T,[35], {'logsig'})
%net.performFcn = 'sse';
net.divideParam.trainRatio = 1; % training set [%]
net.divideParam.valRatio = 0; % validation set [%]
net.divideParam.testRatio = 0; % test set [%]
net.trainParam.goal = 0.001;
It works fine up to this point, but the problem arises when I call the train function:
[net tr] = train(net,input,T);
and the following error shows up in the MATLAB window:
??? Error using ==> network.train at 145
Targets are incorrectly sized for network.
Matrix must have 5 columns.
Error in ==> test at 103
[net tr] = train(net,input,T);
I've also tried input' and T'. Any help is appreciated.
If you look at MATLAB's official documentation of train, you'll notice that T must have the same number of columns as the input matrix, which is 5 in your case. Instead, try:
T = ones(size(input, 1));
or
T = [1, size(input, 1) - 1];
and see if this works.
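As a quick check of the sizes involved, here is a sketch that only inspects dimensions and does not call newff or train. Note that ones(size(input, 1)) is 7-by-7 for this input, so it does not have 5 columns either. The last line builds a placeholder target with one column per input column; its values are made up and only illustrate the shape, not real labels.

```matlab
input = [1 0 0 1 1;
         1 0 0 0 1;
         1 0 0 0 1;
         1 0 0 0 1;
         0 0 1 0 0;
         0 1 1 1 0;
         0 1 1 1 0];

% Each column of the input is one sample, so there are 5 samples
nSamples = size(input, 2);

% Target from the question: 10 rows, 40 columns
T_question = [eye(10) eye(10) eye(10) eye(10)];
colsQuestion = size(T_question, 2);

% ones(n) builds an n-by-n matrix, here 7-by-7
T_try = ones(size(input, 1));
colsTry = size(T_try, 2);

% Placeholder target with one column per sample (values are made up)
T_ok = ones(1, nSamples);
```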

Neural Network : Train Y= X1+X2 Poor performance: How to train small erratic pattern for regression

I was trying out MATLAB's NN functions before testing my own hand-coded network, training it on y = x1+x2.
But see how it performed:
>> net = newfit([1 2 3 4 5 0 1 2 5;1 2 3 4 5 1 1 1 1],[2 4 6 8 10 0 2 3 6],15);
>> net = train(net,[1 2 3 4 5 0 1 2 5;1 2 3 4 5 1 1 1 1],[2 4 6 8 10 0 2 3 6]);
>> sim(net,[1;4])
ans =
12.1028
>> sim(net,[4;4])
ans =
8.0000
>> sim(net,[4;1])
ans =
3.0397
>> sim(net,[2;2])
ans =
5.1659
>> sim(net,[3;3])
ans =
10.3024
Can anyone explain what is wrong with this training data? Is it not enough to estimate y = x1+x2, or is the network just over-specialized? I believe it is a regression problem. Now I do not know what to expect from my own network. On the basis of what criterion does this NN converge when it produces such poor results? Is there any way to know what function it maps to (I know of none)? My own network would not even converge, because it uses the sum squared error as the loop break condition. So how should I deal with such a training pattern?
However, I have another awesome training pattern which I am unable to train.
Can anyone train the following data set? Will it work/converge?
0 0 -------> 0
0 1 -------> 1000
1000 0 ----> 1
1 1 -------> 0
I have been using f(x)=x in the output layer with the backpropagation algorithm, but for this pattern the code never seems to converge.
By calling
net = newfit([1 2 3 4 5 0 1 2 5;1 2 3 4 5 1 1 1 1],[2 4 6 8 10 0 2 3 6],15);
you create an ANN with a hidden layer of size 15, which is probably too much for your problem. Besides, your training set is too small.
Here is working code (it will take a while on old computers). I'll let you analyze it and diff it against yours; please ask if you need further explanation:
% Prepare input and target vectors
a = perms(1:9);
x = a(:, 1);
y = a(:, 2);
z = x + y;
input = [x y];
% Create ANN
net = newfit(input',z',2);
% Learn
net.trainParam.epochs = 1000;
net = train(net, input', z');
Results are virtually perfect:
>> sim(net,[1;4])
ans =
5.0002
>> sim(net,[4;4])
ans =
7.9987
>> sim(net,[4;1])
ans =
4.9998
>> sim(net,[2;2])
ans =
4.0024
>> sim(net,[3;3])
ans =
5.9988
PS: NEWFIT is obsoleted in R2010b NNET 7.0. Last used in R2010a NNET 6.0.4.
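A side note on the training set above: perms(1:9) has 362880 rows, but its first two columns contain only 72 distinct (x, y) pairs, each repeated many times, and none of them has x equal to y. If training time is a concern, a possible variant (a sketch, not verified to give the same quality of fit) keeps each pair once; the name pairs is made up here:

```matlab
% Build the same (x, y) pairs as above, but keep each one only once
a = perms(1:9);                       % 362880 rows
pairs = unique(a(:, 1:2), 'rows');    % distinct (x, y) pairs only
x = pairs(:, 1);
y = pairs(:, 2);
z = x + y;
input = [x y];
```

The resulting input and z can be passed to newfit and train exactly as in the code above.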

Solving a MATLAB equation

I have the following equation:
((a^3)-(4*a^2))+[1 0 2;-1 4 6;-1 1 1] = 0
How do I solve this in MATLAB?
Here is one possibility:
% A^3 - 4*A^2 + [1 0 2;-1 4 6;-1 1 1] = 0
% 1) Change base to diagonalize the constant term
M = [1 0 2;-1 4 6;-1 1 1];
[V, L] = eig(M);
% 2) Solve three equations "on the diagonal", i.e. find a root of
% x^3 - 4*x^2 + eigenvalue = 0 for each eigenvalue of M
% (in this example, for each eigenvalue I choose the 3rd root,
% which happens to be real)
roots1 = roots([1 -4 0 L(1,1)]); r1 = roots1(3);
roots2 = roots([1 -4 0 L(2,2)]); r2 = roots2(3);
roots3 = roots([1 -4 0 L(3,3)]); r3 = roots3(3);
% 3) Build matrix solution and transform with inverse change of base
SD = diag([r1, r2, r3]);
A = V*SD/V % This is your solution (V*SD/V is equivalent to V*SD*inv(V))
% The residual should be practically zero
% (named residual to avoid shadowing MATLAB's built-in error function)
residual = A^3 - 4*A^2 + M
norm(residual)
(The error is actually of the order of 10^-14.)
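The reason the diagonal approach works can be written out as follows. This is a sketch that assumes M is diagonalizable and looks only for solutions A that share M's eigenvectors; other solutions may exist. Here \Lambda is the matrix L returned by eig, D corresponds to SD in the code, and d_i are the chosen roots r1, r2, r3.

```latex
\begin{align*}
M &= V \Lambda V^{-1}, \qquad
A = V D V^{-1}, \qquad
D = \operatorname{diag}(d_1, d_2, d_3) \\
A^3 - 4A^2 + M
  &= V D^3 V^{-1} - 4\, V D^2 V^{-1} + V \Lambda V^{-1} \\
  &= V \left( D^3 - 4D^2 + \Lambda \right) V^{-1} \\
A^3 - 4A^2 + M = 0
  &\iff d_i^3 - 4 d_i^2 + \lambda_i = 0, \qquad i = 1, 2, 3
\end{align*}
```

Each scalar cubic has up to three roots, so different choices of root per eigenvalue give different matrices A that satisfy the equation.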