Matlab SVM linear binary classification failure - matlab

I'm trying to implement a simple linear binary SVM classifier in MATLAB, but I get strange results.
I have two classes, g = {-1; 1}, described by two predictors, varX and varY. In fact, varY alone is enough to separate the dataset into two distinct classes (at about varY = 0.38), but I keep varX as a random variable because I will need it for other work.
Using the code below (adapted from the MATLAB examples) I get a wrong classifier. The linear classifier should be close to a horizontal line at about varY = 0.38, as the 2D plot of the points shows.
The line that should separate the two classes is not displayed.
What am I doing wrong?
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
d = 0.005; % Step size of the grid
[x1Grid,x2Grid] = meshgrid(min(m3(:,1)):d:max(m3(:,1)),...
min(m3(:,2)):d:max(m3(:,2)));
xGrid = [x1Grid(:),x2Grid(:)]; % The grid
[~,scores2] = predict(SVMmodel_testm,xGrid); % The scores
figure();
h(1:2)=gscatter(m3(:,1), m3(:,2), g,'br','ox');
hold on
% Support vectors
h(3) = plot(m3(SVMmodel_testm.IsSupportVector,1),m3(SVMmodel_testm.IsSupportVector,2),'ko','MarkerSize',10);
% Decision boundary
contour(x1Grid,x2Grid,reshape(scores2(:,1),size(x1Grid)),[0 0],'k');
xlabel('varX'); ylabel('varY');
set(gca,'Color',[0.5 0.5 0.5]);
hold off

A common problem with SVMs, or any classification method for that matter, is unnormalized data. One of your dimensions spans 0 to 1 and the other only about 0.32 to 0.44. This causes an imbalance between the features. Common practice is to normalize the features, for example by their standard deviation. Try this code:
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
m3(:,2) = m3(:,2)./std(m3(:,2));
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
Note the second-to-last line, which divides varY by its standard deviation.
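As a minimal sketch of the same idea applied to every predictor, the snippet below z-scores each column (subtract the column mean, divide by the column standard deviation) using base MATLAB only. The deterministic varX is an assumption standing in for rand(26,1), and the names mu, sigma and m3z are illustrative. fitcsvm also has a 'Standardize' name-value option that can be set to true to have the function do the scaling itself.

```matlab
% Same layout as m3: column 1 = varX, column 2 = varY
varX = linspace(0, 1, 26)';   % assumption: deterministic stand-in for rand(26,1)
varY = [0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189; ...
        0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528; ...
        0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201];
m3 = [varX, varY];

mu    = mean(m3, 1);          % per-column mean
sigma = std(m3, 0, 1);        % per-column standard deviation

% z-score every column so both predictors have mean 0 and std 1
m3z = bsxfun(@rdivide, bsxfun(@minus, m3, mu), sigma);

% m3z can then be passed to fitcsvm in place of m3
```

Any new points to be classified, including the plotting grid, have to be scaled with the same mu and sigma before calling predict.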

Related

How to interpret the regression plot obtained at the end of neural network regression for multiple outputs?

I have trained my neural network model using the MATLAB NN Toolbox. My network has multiple inputs and multiple outputs, 6 and 7 respectively. I would like to clarify a few questions about it:
The final regression plot shown at the end of training indicates very good accuracy, R~0.99. However, since I have multiple outputs, I am confused about which scatter plot it represents. Shouldn't there be 7 target vs. predicted plots, one per output variable?
As far as I know, R^2 is a better measure of model accuracy, whereas MATLAB reports R in its plot. Do I treat that R as R^2, or should I square the reported R value to obtain R^2?
I have generated the MATLAB script containing the weights, biases and activation functions as the final result of the training. So shouldn't I be able to simply give my raw data as input and obtain the corresponding predicted output? I gave it the exact same training set, using the indices MATLAB chose for training (to cross-check), and plotted the predicted output vs. the actual output, but the result is not good at all, definitely not along the lines of R~0.99. Am I doing anything wrong?
code:
function [y1] = myNeuralNetworkFunction_2(x1)
%MYNEURALNETWORKFUNCTION neural network simulation function.
% X = [torque T_exh lambda t_Spark N EGR];
% Y = [O2R CO2R HC NOX CO lambda_out T_exh2];
% Generated by Neural Network Toolbox function genFunction, 17-Dec-2018 07:13:04.
%
% [y1] = myNeuralNetworkFunction(x1) takes these arguments:
% x = Qx6 matrix, input #1
% and returns:
% y = Qx7 matrix, output #1
% where Q is the number of samples.
%#ok<*RPMT0>
% ===== NEURAL NETWORK CONSTANTS =====
% Input 1
x1_step1_xoffset = [-24;235.248;0.75;-20.678;550;0.799];
x1_step1_gain = [0.00353982300884956;0.00284355877067267;6.26959247648903;0.0275865874012055;0.000366568914956012;0.0533831576137729];
x1_step1_ymin = -1;
% Layer 1
b1 = [1.3808996210168685;-2.0990163849711894;0.9651733083552595;0.27000953282929346;-1.6781835509820286;-1.5110463684800366;-3.6257438832309905;2.1569498669085361;1.9204156230460485;-0.17704342477904209];
IW1_1 = [-0.032892214008082517 -0.55848270745152429 -0.0063993424771670616 -0.56161004933654057 2.7161844536020197 0.46415317073346513;-0.21395624254052176 -3.1570133640176681 0.71972178875396853 -1.9132557838515238 1.3365248285282931 -3.022721627052706;-1.1026780445896862 0.2324603066452392 0.14552308208231421 0.79194435276493658 -0.66254679969168417 0.070353201192052434;-0.017994515838487352 -0.097682677816992206 0.68844109281256027 -0.001684535122025588 0.013605622123872989 0.05810686279306107;0.5853667840629273 -2.9560683084876329 0.56713425120259764 -2.1854386350040116 1.2930115031659106 -2.7133159265497957;0.64316656469750333 -0.63667017646313084 0.50060179040086761 -0.86827897068177973 2.695456517458648 0.16822164719859456;-0.44666821007466739 4.0993786464616679 -0.89370838440321498 3.0445073606237933 -3.3015566360833453 -4.492874075961689;1.8337574137485424 2.6946232855369989 1.1140472073136622 1.6167763205944321 1.8573696127039145 -0.81922672766933646;-0.12561950922781362 3.0711045035224349 -0.6535751823440773 2.0590707752473199 -1.3267693770634292 2.8782780742777794;-0.013438026967107483 -0.025741311825949621 0.45460734966889638 0.045052447491038108 -0.21794568374100454 0.10667240367191703];
% Layer 2
b2 = [-0.96846557414356171;-0.2454718918618051;-0.7331628718025488;-1.0225195290982099;0.50307202195645395;-0.49497234988401961;-0.21817117469133171];
LW2_1 = [-0.97716474643411022 -0.23883775971686808 0.99238069915206006 0.4147649511973347 0.48504023209224734 -0.071372217431684551 0.054177719330469304 -0.25963474838320832 0.27368380212104881 0.063159321947246799;-0.15570858147605909 -0.18816739764334323 -0.3793600124951475 2.3851961990944681 0.38355142531334563 -0.75308427071748985 -0.1280128732536128 -1.361052031781103 0.6021878865831336 -0.24725687748503239;0.076251356114485525 -0.10178293627600112 0.10151304376762409 -0.46453434441403058 0.12114876632815359 0.062856969143306296 -0.0019628163322658364 -0.067809039768745916 0.071731544062023825 0.65700427778446913;0.17887084584125315 0.29122649575978238 0.37255802759192702 1.3684190468992126 0.60936238465090853 0.21955911453674043 0.28477957899364675 -0.051456306721251184 0.6519451272106177 -0.64479205028051967;0.25743349663436799 2.0668075180209979 0.59610776847961111 -3.2609682919282603 1.8824214917530881 0.33542869933904396 0.03604272669356564 -0.013842766338427388 3.8534510207741826 2.2266745660915586;-0.16136175574939746 0.10407287099228898 -0.13902245286490234 0.87616472446622717 -0.027079111747601223 0.024812287505204988 -0.030101536834009103 0.043168268669541855 0.12172932035587079 -0.27074383434206573;0.18714562505165402 0.35267726325386606 -0.029241400610813449 0.53053853235049087 0.58880054832728757 0.047959541165126809 0.16152268183097709 0.23419456403348898 0.83166785128608967 -0.66765237856750781];
% Output 1
y1_step1_ymin = -1;
y1_step1_gain = [0.114200879346771;0.145581598485951;0.000139011547272197;0.000456244862967996;2.05816254143146e-05;5.27704485488127;0.00284355877067267];
y1_step1_xoffset = [-0.045;1.122;2.706;17.108;493.726;0.75;235.248];
% ===== SIMULATION ========
% Dimensions
Q = size(x1,1); % samples
% Input 1
x1 = x1';
xp1 = mapminmax_apply(x1,x1_step1_gain,x1_step1_xoffset,x1_step1_ymin);
% Layer 1
a1 = tansig_apply(repmat(b1,1,Q) + IW1_1*xp1);
% Layer 2
a2 = repmat(b2,1,Q) + LW2_1*a1;
% Output 1
y1 = mapminmax_reverse(a2,y1_step1_gain,y1_step1_xoffset,y1_step1_ymin);
y1 = y1';
end
% ===== MODULE FUNCTIONS ========
% Map Minimum and Maximum Input Processing Function
function y = mapminmax_apply(x,settings_gain,settings_xoffset,settings_ymin)
y = bsxfun(@minus,x,settings_xoffset);
y = bsxfun(@times,y,settings_gain);
y = bsxfun(@plus,y,settings_ymin);
end
% Sigmoid Symmetric Transfer Function
function a = tansig_apply(n)
a = 2 ./ (1 + exp(-2*n)) - 1;
end
% Map Minimum and Maximum Output Reverse-Processing Function
function x = mapminmax_reverse(y,settings_gain,settings_xoffset,settings_ymin)
x = bsxfun(@minus,y,settings_ymin);
x = bsxfun(@rdivide,x,settings_gain);
x = bsxfun(@plus,x,settings_xoffset);
end
The code above is the automatically generated function. The plot I generated to cross-check the first output variable is below:
% X and Y are input and output - same as above
X_train = X(results.info1.train.indices,:);
y_train = Y(results.info1.train.indices,:);
out_train = myNeuralNetworkFunction_2(X_train);
scatter(y_train(:,1),out_train(:,1))
To answer your question about R: yes, you should square R to get the R^2 value. In this case the two will be very close, since R is very close to 1.
The graphs show the correlation between the estimated and the real (target) values, so R is the strength of that correlation. You can square it to find R-squared.
The graph you drew and the one MATLAB gave do not show the same variables; the ranges and scales of the axes are very different.
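A small sketch of how R and its square could be computed separately for each output column, using base MATLAB's corrcoef. The arrays y_true and y_pred are made-up stand-ins for y_train and out_train from the question (Q samples by number of outputs), not real results.

```matlab
% Hypothetical stand-ins for y_train and out_train (10 samples x 3 outputs)
t = (1:10)';
y_true = [t, t.^2, sin(t)];
y_pred = y_true + 0.05 * [cos(t), t, -cos(2*t)];   % small deterministic error

nOut = size(y_true, 2);
R = zeros(1, nOut);
for k = 1:nOut
    c = corrcoef(y_true(:,k), y_pred(:,k));  % 2x2 correlation matrix
    R(k) = c(1,2);                           % correlation of target vs. prediction
end
R2 = R.^2;                                   % squared correlation per output
```

Plotting scatter(y_true(:,k), y_pred(:,k)) for each k then gives one target vs. predicted plot per output, each with its own R(k).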
First of all, is the problem you are trying to solve a regression problem, or is it a classification problem with 7 classes converted to numeric values? I assume it is a classification problem, since you are trying to get the success rate for each class.
As for your first question: according to the literature, it is recommended to use the "All: R" value. If you want the success rate of each class, the values you need are precision, recall, F-measure, FP rate, TP rate, and so on, which apply to classification problems. There is plenty of MATLAB documentation on this (help ROC) where you can look up the details. All the values I mentioned, which I think are what you actually want, are obtained from the confusion matrix.
There is a good example of this:
[x,t] = simpleclass_dataset;
net = patternnet(10);
net = train(net,x,t);
y = net(x);
[c,cm,ind,per] = confusion(t,y)
I hope you will see what you want from the "nntraintool" window that appears when you run the code.
Your other questions have already been answered. Alternatively, you can consider using a machine learning algorithm with open source software such as Weka.

Matlab implementation of Perceptron - can't seem to fix plotting

This is my first go at ML (and MATLAB), and I'm following "Learning From Data" by Yaser S. Abu-Mostafa.
I'm trying to implement the Perceptron algorithm. After working through the pseudocode and looking at other people's solutions (I went through other threads too), I still can't fix my problem.
The algorithm separates the data fine; it works. However, I want to plot a single separating line, and instead the plot looks as if the '-1' cluster is split into two or more clusters.
This is the code:
iterations = 100;
dim = 3;
X1=[rand(1,dim);rand(1,dim);ones(1,dim)]; % class '-1' (matches Y below)
X2=[rand(1,dim);1+rand(1,dim);ones(1,dim)]; % class '+1'
X=[X1,X2];
Y=[-ones(1,dim),ones(1,dim)];
w=[0,0,0]';
% call perceptron
wtag=weight(X,Y,w,iterations);
% predict
ytag=wtag'*X;
% plot prediction over original data
figure;hold on
plot(X1(1,:),X1(2,:),'b.')
plot(X2(1,:),X2(2,:),'r.')
plot(X(1,ytag<0),X(2,ytag<0),'bo')
plot(X(1,ytag>0),X(2,ytag>0),'ro')
legend('class -1','class +1','pred -1','pred +1')
%Why don't I get just one line?
plot(X,Y);
The weight function (Perceptron):
function [w] = weight(X,Y,w_init,iterations)
%WEIGHT Summary of this function goes here
% Detailed explanation goes here
w = w_init;
for iteration = 1 : iterations %<- was 100!
for ii = 1 : size(X,2) %cycle through training set
if sign(w'*X(:,ii)) ~= Y(ii) %wrong decision?
w = w + X(:,ii) * Y(ii); %then add (or subtract) this point to w
end
end
sum(sign(w'*X)~=Y)/size(X,2); %show misclassification rate
end
I don't think the problem is in the second function, but I added it regardless.
I'm pretty sure the algorithm separates the data into more than one cluster, but I can't tell why. Most of the learning I've done so far was math and theory rather than actual coding, so I'm probably missing something obvious.
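For reference, a hedged sketch of how a single separating line can be derived from a weight vector laid out like wtag, that is [w1; w2; bias] with the constant 1 in the third row of X. The numeric values of w are made up for illustration. The boundary is the set of points where w'*[x1; x2; 1] equals 0, so it can be solved for x2 as a function of x1.

```matlab
% Hypothetical weight vector in the same layout as wtag: [w1; w2; bias]
w = [1; -2; 0.5];

% Boundary: w(1)*x1 + w(2)*x2 + w(3) = 0
%       =>  x2 = -(w(1)*x1 + w(3)) / w(2)     (requires w(2) ~= 0)
x1 = linspace(0, 1, 50);
x2 = -(w(1) * x1 + w(3)) / w(2);

% Every point on the line should give a (numerically) zero score
scores = w' * [x1; x2; ones(1, numel(x1))];
```

Calling plot(x1, x2, 'k-') while hold on is active draws this as one line over the scatter plot. If w(2) is zero the boundary is vertical and has to be drawn at x1 = -w(3)/w(1) instead.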

(matlab) MLP with relu and softmax not working with mini-batch SGD and produces similar predictions on MNIST dataset

I implemented a multilayer perceptron with 1 hidden layer on the MNIST dataset. The activation function in the hidden layer is leaky (0.01) ReLU and the output layer has a softmax activation function. The learning method is mini-batch SGD. The network structure is 784*30*10. The problem is that the predictions the network makes are quite similar for every input sample; that is, the model always tends to think the image is one particular digit. Thanks to @Lemm Ras for pointing out the label-data mismatch in the previous data_shuffle function, which is now fixed. But after some batches of training, the predictions are still rather similar, which is confusing.
Another issue is that the update values are too small compared with the original weights. In the MLP code I added the variables 'cc' and 'dd' to record the ratio between each weight update and the corresponding weight:
cc=W_OUTPUT_Update./W_OUTPUT;
dd=W_MLP_Update./W_MLP;
During debugging, the magnitude of cc is 10^-4 (0.0001) and dd is also 10^-4. This might be the reason the accuracy doesn't seem to improve much.
After several days of debugging I have no idea why this happens or how to solve it; I have been stuck for a week. Can someone help me, please?
The screenshot is the value of A2 after softmax function.
[dimension, images, labels, labels_matrix, train_amount, test_labels_matrix, test_images, test_labels, test_amount] = load_mnist_data(); %initialize str
images=images(:,1:10000); % for debugging, get part of whole data set
labels=labels(1:10000,1);
labels_matrix=labels_matrix(:,1:10000);
test_images=test_images(:,1:500);
test_labels=test_labels(1:500,1);
train_amount=10000;
test_amount=500;
% initialize the structure
[ W_MAD, W_MLP, W_OUTPUT] = initialize_structure(dimension, train_amount, test_amount);
epoch=100;
correct_rate=zeros(1,epoch); %record testing accuracy
corr=zeros(1,epoch); %record training accuracy
lr=0.2;
lamda=0;
batch_size=50;
for i=1:epoch
sprintf('MLP in iteration %d over %d', i, epoch)
%shuffle data
[labels_shuffled labels_matrix_shuffled images_shuffled]=shuffle_data(labels, labels_matrix,images);
[ cor, W_MLP, W_OUTPUT ] = train_mlp_relu(lr, leaky, lamda, momentum_gamma, batch_size,W_MLP, W_OUTPUT, W_MAD, power, images_shuffled, train_amount, labels_shuffled, labels_matrix_shuffled);
corr(i)=cor/train_amount;
% test
correct_rate(i) = structure_test( W_MAD, W_MLP, W_OUTPUT, test_images, test_labels, test_amount );
end
% plot results
plot(1:epoch,correct_rate);
Here's the MLP training function; please ignore the L2 regularization parameter lamda, which is currently set to 0.
%MLP with batch size batch_size
cor=0;
%leaky=(1/batch_size);
leaky=0.001;
for i=1:train_amount/batch_size
batch_images=images(:,batch_size*(i-1)+1:batch_size*i);
batch_labels=labels_matrix(:,batch_size*(i-1)+1:batch_size*i);
%from MAD to MLP
V1=W_MLP'*batch_images;
V1(1,:)=1; %set bias unit as 1
V1_dirivative=ones(size(V1));
V1_dirivative(find(V1<0))=leaky;
A1=relu(V1,leaky); % A stands for activation
V2=W_OUTPUT'* A1;
A2=softmax(V2);
%write these scope control codes into functions.
%train error
[val idx]=max(A2);
idx=idx-1; %because the row index idx runs from 1 to 10 while the labels run from 0 to 9.
res=labels(batch_size*(i-1)+1:batch_size*i)-idx';
cor=cor+sum(res(:)==0);
%softmax loss, due to relu applied nodes that has
%contribution to activate neurons has gradient 1; while <0 nodes
%has no contribution
delta_softmax=-(1/batch_size)*(batch_labels-A2);
delta_output=W_OUTPUT*delta_softmax.*V1_dirivative;
%update
W_OUTPUT_Update=lr*(1/batch_size)*A1*delta_softmax'+lamda*W_OUTPUT;
cc=W_OUTPUT_Update./W_OUTPUT;
W_MLP_Update=lr*(1/batch_size)*batch_images*delta_output'+lamda*W_MLP;
dd=W_MLP_Update./W_MLP;
k=mean(A2,2);
W_OUTPUT=W_OUTPUT-W_OUTPUT_Update;
W_MLP=W_MLP-W_MLP_Update;
end
end
Here is the softmax function:
function [ val ] = softmax( val )
val=exp(val);
val=val./repmat(sum(val),10,1);
end
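As a side note on the softmax above, and not necessarily related to the problem described, exp(val) overflows to Inf for large inputs and the division then yields NaN. A common variant subtracts each column's maximum before exponentiating, which leaves the result mathematically unchanged. The sketch below uses illustrative names (V, Vshift, E, P) and made-up inputs.

```matlab
% Two columns that differ only by a constant shift of 999
V = [1000 1; 1001 2; 1002 3];

% Naive version, as in the function above: exp(1000) is Inf, Inf/Inf is NaN
naive = exp(V) ./ repmat(sum(exp(V)), 3, 1);

% Stable version: subtract the column-wise maximum first
Vshift = bsxfun(@minus, V, max(V, [], 1));   % largest entry per column becomes 0
E = exp(Vshift);                             % all values now lie in (0, 1]
P = bsxfun(@rdivide, E, sum(E, 1));          % each column sums to 1
```

Because softmax is invariant to adding a constant to a column, both columns of P come out identical here, while the naive result contains NaN in its first column.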
labels_matrix is the target output matrix for A2 and is created as:
labels_matrix=full(sparse(labels+1,1:train_amount,1));
test_labels_matrix=full(sparse(test_labels+1,1:test_amount,1));
And Relu:
function [ val ] = relu( val,leaky )
val(find(val<0))=leaky*val(find(val<0));
end
Data shuffle
%this version is wrong, due to it only shuffles label and data without doing the same shuffling on the 'labels_matrix' which is used to calculate MLP's delta in output layer. It destroyed the link between data and label.
% function [ label, data ] = shuffle_data( label, data )
% [row column]=size(data);
% array=randperm(column);
% data=data(:,array);
% label=label(array);
% %if shuffle respect to row then use the code below
% %data=data(randperm(row),:);
% end
function [ label, label_matrix, data ] = shuffle_data( label, label_matrix, data )
[row column]=size(data);
array=randperm(column);
data=data(:,array);
label=label(array);
label_matrix=label_matrix(:, array);
%if shuffle respect to row then use the code below
%data=data(randperm(row),:);
end
Data loading:
function [ dimension, images, labels, labels_matrix, train_amount, test_labels_matrix, test_images, test_labels, test_amount] = load_mnist_data()
%%load training and testing data, labels
data_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/train-images.idx3-ubyte';
label_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/train-labels.idx1-ubyte';
test_data_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/t10k-images.idx3-ubyte';
test_label_location='C:\Users\yz39g15\Documents\MATLAB\common\mnist test\for the report/modify/t10k-labels.idx1-ubyte';
images = loadMNISTImages(data_location);
labels = loadMNISTLabels(label_location);
test_images=loadMNISTImages(test_data_location);
test_labels=loadMNISTLabels(test_label_location);
%%data centralization
[dimension train_amount]=size(images);
[dimension test_amount]=size(test_images);
%%complete normalization
%%transform labels from index to matrix in order to apply square loss function in output layer
labels_matrix=full(sparse(labels+1,1:train_amount,1));
test_labels_matrix=full(sparse(test_labels+1,1:test_amount,1));
end
When you shuffle the images, the data-label association is lost. Since this association must survive, you need to enforce the same shuffling for both data and labels.
To do so you could, for instance, create an external shuffled index list, shuffled = randperm(N), with N the number of images, and then pass to the training method either that list or the elements of images and labels addressed by it.
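A minimal sketch of that idea on toy data, where one permutation is applied to the data, the label vector and the label matrix alike. The small arrays are made up so that the pairing is easy to check by eye: column k of images starts with the value k, and its label is also k.

```matlab
N = 6;
images        = [1:N; 10*(1:N)];   % hypothetical 2 x N data, column k belongs to sample k
labels        = (1:N)';            % label k belongs to column k
labels_matrix = eye(N);            % stand-in target matrix, column k belongs to sample k

shuffled = randperm(N);            % one shared permutation

images_s        = images(:, shuffled);
labels_s        = labels(shuffled);
labels_matrix_s = labels_matrix(:, shuffled);

% Row index of the single 1 in each shuffled target column
[rowOfOne, ~] = find(labels_matrix_s);
```

After shuffling, images_s(1,:) still equals labels_s', and rowOfOne equals labels_s, so all three stay aligned.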

Normalization 3D Image according to Slices in MATLAB

I have a matrix of size 256x192x80. I want to normalize all slices (80 is the number of slices) without using a for loop.
The way I'm doing it with a for loop is below (im_dom_raw is our matrix):
normalized_raw = zeros(size(im_dom_raw));
for a=1:80
slice_raw = im_dom_raw(:,:,a);
slice_raw = slice_raw-min(slice_raw(:));
slice_raw = slice_raw/(max(slice_raw(:)));
normalized_raw(:,:,a) = slice_raw;
end
The code below implements your normalization approach without using loops. It's based on bsxfun.
% Subtract each slice's minimum so that every slice starts at 0
slices_raw = bsxfun(@minus,im_dom_raw,min(min(im_dom_raw)));
% Normalize all values with respect to the slice maximum (with input from @Daniel)
normalized_raw2 = bsxfun(@rdivide,slices_raw,max(max(slices_raw)));
% A slightly faster approach would be
%normalized_raw2 = bsxfun(@times,slices_raw,max(max(slices_raw)).^-1);
% ... but its result can differ slightly from your approach due to floating-point rounding
% Comparison to your previous loop based implementation
sum(abs(normalized_raw(:)-normalized_raw2(:)))
The last line of code outputs
ans =
0
Which (thanks to @Daniel) means that both approaches yield exactly the same results.
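On MATLAB R2016b or later, the same per-slice normalization can be sketched without bsxfun, because arithmetic operators expand singleton dimensions implicitly. The small array A is a made-up stand-in for im_dom_raw.

```matlab
A = reshape(1:24, 2, 3, 4);               % hypothetical 2 x 3 x 4 stand-in for im_dom_raw

mn = min(min(A, [], 1), [], 2);           % 1 x 1 x 4: minimum of every slice
shifted = A - mn;                         % implicit expansion over rows and columns

mx = max(max(shifted, [], 1), [], 2);     % 1 x 1 x 4: maximum of every shifted slice
normalized = shifted ./ mx;               % every slice now spans [0, 1]
```

A slice whose values are all equal has a maximum of 0 after shifting, so the division produces NaN for it; such slices would need separate handling.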

MATLAB: Naive Bayes with Univariate Gaussian

I am trying to implement a Naive Bayes classifier using a dataset published by the UCI machine learning team. I am new to machine learning and am trying to understand techniques I can use for my work-related problems, so I thought it better to understand the theory first.
I am using the Pima dataset (Link to Data - UCI-ML), and my goal is to build a Naive Bayes univariate Gaussian classifier for the K-class problem (the data only covers K=2). I have split the data and calculated the mean, the standard deviation and the prior for each class, but now I am stuck because I am not sure what to do next or how. I have a feeling that I should be calculating the posterior probability.
Here is my code. I am using percent as a vector because I want to see the behavior as I increase the training data size from the 80:20 split. Basically, if you pass [10 20 30 40], it takes each of those percentages of the training part of the 80:20 split, so 10 uses 10% of the 80% as training data.
function[classMean] = naivebayes(file, iter, percent)
dm = load(file);
for i=1:iter
idx = randperm(size(dm.data,1))
%Using same idx for data and labels
shuffledMatrix_data = dm.data(idx,:);
shuffledMatrix_label = dm.labels(idx,:);
percent_data_80 = round((0.8) * length(shuffledMatrix_data));
%Doing 80-20 split
train = shuffledMatrix_data(1:percent_data_80,:);
test = shuffledMatrix_data(percent_data_80+1:length(shuffledMatrix_data),:);
train_labels = shuffledMatrix_label(1:percent_data_80,:)
test_labels = shuffledMatrix_label(percent_data_80+1:length(shuffledMatrix_data),:); % labels, not data
%Getting the array of percents
for pRows = 1:length(percent)
percentOfRows = round((percent(pRows)/100) * length(train));
new_train = train(1:percentOfRows,:)
new_trin_label = shuffledMatrix_label(1:percentOfRows)
%get unique labels in training
numClasses = size(unique(new_trin_label),1)
classMean = zeros(numClasses,size(new_train,2));
for kclass=1:numClasses
classMean(kclass,:) = mean(new_train(new_trin_label == kclass,:))
std(new_train(new_trin_label == kclass,:))
priorClassforK = length(new_train(new_trin_label == kclass))/length(new_train)
priorClassforK_1 = 1 - priorClassforK
end
end
end
end
First, compute the probability of every class label from frequency counts. Then, for a given sample and a given class in your data set, compute the conditional probability of every feature value given that class. Multiply the conditional probabilities of all features in the sample together, and multiply the result by the probability of the class label under consideration. Finally, compare the resulting values across all class labels and choose the label of the class with the maximum value (the Bayes classification rule).
To compute the conditional probabilities, you can simply use the normal distribution's density function, with the per-class mean and standard deviation of each feature.
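The steps above can be sketched as follows, under a few assumptions: the tiny data set and all variable names are made up, classes are labelled 1 and 2, and the products are computed as sums of logarithms, a common choice to avoid numerical underflow that does not change which class wins.

```matlab
% Tiny hypothetical data set: 2 features, classes labelled 1 and 2
Xtrain = [1.0 2.0; 1.2 1.8; 0.8 2.2; 3.0 0.5; 3.2 0.7; 2.8 0.3];
ytrain = [1; 1; 1; 2; 2; 2];
Xtest  = [1.1 2.1; 2.9 0.4];

classes = unique(ytrain);
K = numel(classes);
D = size(Xtrain, 2);

% Per-class mean, standard deviation and prior (frequency count)
mu = zeros(K, D); sigma = zeros(K, D); prior = zeros(K, 1);
for k = 1:K
    rows = (ytrain == classes(k));
    mu(k,:)    = mean(Xtrain(rows,:), 1);
    sigma(k,:) = std(Xtrain(rows,:), 0, 1);
    prior(k)   = sum(rows) / numel(ytrain);
end

% log prior + sum over features of the log Gaussian density
logPost = zeros(size(Xtest, 1), K);
for k = 1:K
    z = bsxfun(@rdivide, bsxfun(@minus, Xtest, mu(k,:)), sigma(k,:));
    logLik = bsxfun(@minus, -0.5 * z.^2, log(sigma(k,:))) - 0.5 * log(2*pi);
    logPost(:,k) = log(prior(k)) + sum(logLik, 2);
end

% Bayes rule: pick the class with the largest (log) posterior score
[~, idx] = max(logPost, [], 2);
ypred = classes(idx);
```

Here the first test point lies near the class 1 mean and the second near the class 2 mean, so ypred comes out as [1; 2]. The scores in logPost are unnormalized; dividing by the evidence is not needed to pick the maximum.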