I'm using the LIBSVM toolbox for Matlab. My problem is a binary classification problem with labels 1 (True) and 0 (False). I train my system with this function:
svmstruct = svmtrain(TrainTargets, TrainInputs,['-t 2 ' '-g ' SIGMA ' -c ' P ' -q' ' -b 1']);
and test my test accuracy with this function :
[TestOutputs, ~, ~] = svmpredict(TestTargets, TestInputs, svmstruct,'-b 1 -q');
Now I want to use the trained SVM model on out-of-sample data, so I use this function:
[OUT, ~, Prob_Out] = svmpredict(zeros(size(Outsample_DATA,1),1), Outsample_DATA, svmstruct,'-q -b 1');
For my first trained model (I have trained SVM models with different parameters) I get this output; the out-of-sample data set is the same in both cases: [Prob_Out OUT]
0.8807 0.1193 0
0.8717 0.1283 0
0.0860 0.9140 1.0000
0.7846 0.2154 0
0.7685 0.2315 0
0.7916 0.2084 0
0.0326 0.9674 1.0000
0.7315 0.2685 0
0.3550 0.6450 1.0000
For the second one I get this:
0.4240 0.5760 0
0.4090 0.5910 0
0.7601 0.2399 1.0000
0.5000 0.5000 1.0000
0.4646 0.5354 0
0.4589 0.5411 0
Suppose I want to find class 1 using these probabilities. In the first group a sample belongs to class 1 when column 2 is larger than column 1, but in the second group a sample belongs to class 1 when column 1 is larger than column 2.
The structure of the two out-of-sample data sets is the same. What is the problem?
Thanks.
PS.
When I check the svmstruct parameters after training, in one of these models Label is [0;1] and in the other it is [1;0]!
As you have already noticed, the difference is due to the different mapping of the labels.
LIBSVM uses its own labels internally and therefore needs a mapping between the internal labels and the labels you provided.
The labels in this mapping are generated in the order in which the labels appear in the training data. So if the label of the first element in your training data changes, the label mapping changes too.
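One way to make your post-processing independent of that ordering: the model returned by LIBSVM stores the mapping in its Label field, and the columns of the probability matrix follow that order. A sketch, reusing the variable names from the question:

```matlab
% Untested sketch, reusing the question's variable names.
% The columns of Prob_Out follow the order of svmstruct.Label,
% so look up the column that corresponds to class 1 explicitly.
[OUT, ~, Prob_Out] = svmpredict(zeros(size(Outsample_DATA,1),1), ...
                                Outsample_DATA, svmstruct, '-q -b 1');
class1_col = find(svmstruct.Label == 1); % column holding P(class == 1)
P_class1   = Prob_Out(:, class1_col);    % consistent across both models
```

This way the same code gives the class-1 probability regardless of whether Label is [0;1] or [1;0].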
Would greatly appreciate some help with the following challenge:
I am importing a fact table from a database into a Matlab table. The fact table consists of a sequence of observations across several categories, as follows:
SeqNo Cat Observation
1 A 0.3
1 B 0.5
1 C 0.6
2 B 0.9
2 C 1.0
3 A 1.2
3 C 1.5
I now need to delinearize the fact table and create a matrix (or another table) with the categories as columns, i.e. something like this:
Seq A B C
1 0.3 0.5 0.6
2 NaN 0.9 1.0
3 1.2 NaN 1.5
I played around with findgroups and the split-apply-combine workflow, but no luck. In the end I had to resort to SPSS Modeler to create a properly structured CSV file for import, but I would need to achieve this fully in Matlab or Simulink.
Any help would be most welcome.
% Import table
T = readtable('excelTable.xlsx');
obs_Array = T.Observation;
% Extract unique elements from the SeqNo and Cat columns.
% The elements of seqNo_values already specify the row of the
% new matrix; the index of each element in cat_values does the
% same for the columns.
seqNo_values = unique(T.SeqNo);
cat_values = unique(T.Cat);
numRows = numel(seqNo_values);
numCols = numel(cat_values);
% Initialize a new NaN matrix:
reformatted_matrix = NaN(numRows, numCols);
% Magic numbers:
seqNo_ColNum = 1;
cat_ColNum = 2;
for i = 1:numel(obs_Array)
    % Convert to array for ease of indexing
    target_row = table2array(T(i, seqNo_ColNum));
    target_cat = cell2mat(table2array(T(i, cat_ColNum)));
    target_col = find([cat_values{:}] == target_cat);
    reformatted_matrix(target_row, target_col) = obs_Array(i);
end
reformatted_matrix
Output:
reformatted_matrix =
0.3000 0.5000 0.6000
NaN 0.9000 1.0000
1.2000 NaN 1.5000
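For reference, the whole pivot can also be done in one call with unstack, which fills missing category/sequence combinations with NaN for numeric data. A sketch, assuming T has the variables SeqNo, Cat and Observation as above:

```matlab
% Sketch: pivot the long fact table into wide form with unstack.
T = readtable('excelTable.xlsx');
wide = unstack(T, 'Observation', 'Cat'); % one column per category, NaN where missing
```

This is essentially the split-apply-combine workflow the question mentions, packaged into a single function.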
I am trying to write a method that checks if a matrix is orthogonal and returns TRUE if it is or FALSE if it isn't. My problem is that my isequal() is not working how I want it to. Basically I can do the check in two ways, based on two formulas:
ONE way is to check whether the transpose of matrix R is equal to the inverse of matrix R. If they are equal, then it is orthogonal. (R' = inv(R))
ANOTHER way is to check whether matrix R times the transpose of matrix R equals the identity matrix. (R'R = I) If yes, then the matrix is orthogonal. I have mostly been using isequal(), but it keeps yielding false. Can someone look at my code and tell me why this would be so?
I use Z = orth(randn(3,3)) to generate a random orthogonal matrix and I call my method as isortho(Z).
function R = isortho(r)
% isortho(R) returns true if R is an orthogonal matrix, otherwise false.
if ismatrix(r) && size(r,1) == size(r,2) % check that the input is a square matrix
    '------'
    trans = transpose(r)
    inverted = inv(r)
    isequal(trans, inverted)
    trans == inverted
    isequal(transpose(r), inv(r)) % METHOD ONE
    i = size(r,1);
    I = eye(i) % create an identity matrix based on the size of r
    r*transpose(r)
    r*transpose(r) == I % METHOD TWO
    % check if r times the transpose of r equals the identity matrix
    if (r*transpose(r) == I)
        R = 'True';
    else
        R = 'False';
    end
end
end
this is my output:
>> isortho(Z)
ans =
------
trans =
-0.2579 -0.7291 -0.6339
0.8740 0.1035 -0.4747
0.4117 -0.6765 0.6106
inverted =
-0.2579 -0.7291 -0.6339
0.8740 0.1035 -0.4747
0.4117 -0.6765 0.6106
ans = ////isequal(trans,inverted) which yielded 0 false
0
ans = ////trans==inverted
0 1 0
1 0 0
0 1 1
ans = ////isequal(transpose(r),inv(r))
0
I =
1 0 0
0 1 0
0 0 1
ans =
1.0000 0 0.0000
0 1.0000 0.0000
0.0000 0.0000 1.0000
ans =
1 1 0
1 1 0
0 0 1
ans =
False
>>
Could someone help me fix this or tell me why isequal() is failing when the matrices inverted and trans appear to be the same?
As stated in the comments, you are running into computer precision issues. For more detail see Why is 24.0000 not equal to 24.0000 in MATLAB? and http://matlabgeeks.com/tips-tutorials/floating-point-comparisons-in-matlab/. This is not a Matlab specific thing, it's a computer thing, and you just have to deal with it.
In your case, you are trying to see whether two things are equal, but the two things are the result of a lot of floating point operations. So they will virtually never be exactly the same, but should always be very close. So, set a tolerance, say 1e-12, and say that the two things are equal if some measure of their difference is below that tolerance, e.g.:
norm(r.'-inv(r))<tol
Which finds the 2-norm of the difference between the two matrices, and then if it is less that tol, this will evaluate to 1, or true.
If I set tol=1e-12, everything works well. If I set tol=1e-15, everything still works. But if I set tol=1e-16, everything stops working! This is because the amount of floating-point error is larger than 1e-16, so the result of norm(r.'-inv(r)) cannot be accurate to that tolerance. The smallest spacing Matlab can distinguish on my computer (eps) is roughly 2.2x10^(-16), so you have to ensure that your tolerance is set well above this value. Setting tol too large will, of course, mean you call some non-orthogonal matrices orthogonal, but I would not expect tol=1e-14 to give you any significant issues.
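Putting that together, a tolerance-based version of the function could look like this. A sketch, not the asker's original method; the optional tol argument is my own addition:

```matlab
function tf = isortho(R, tol)
% isortho(R) returns true if R is orthogonal to within tolerance tol.
if nargin < 2
    tol = 1e-12; % well above eps (~2.2e-16)
end
tf = ismatrix(R) && size(R,1) == size(R,2) && ...
     norm(R*R.' - eye(size(R,1))) < tol;
end
```

Returning a logical (true/false) rather than the strings 'True'/'False' also lets callers use the result directly in if statements.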
I have encountered a problem in MatLab as I attempt to run a loop. For each iteration in the loop eigenvalues and eigenvectors for a 3x3 matrix are calculated (the matrix differs with each iteration). Further, each iteration should always yield one eigenvector of the form [0 a 0], where only the middle-value, a, is non-zero.
I need to obtain the index of the column of the eigenvector-matrix where this occurs. To do this I set up the following loop within my main-loop (where the matrix is generated):
for i = 1:3
if (eigenvectors(1,i)==0) && (eigenvectors(3,i)==0)
index_sh = i
end
end
The problem is that the eigenvector matrix in question will sometimes have an output of the form:
eigenvectors =
-0.7310 -0.6824 0
0 0 1.0000
0.6824 -0.7310 0
and in this case my code works well, and I get index_sh = 3. However, sometimes the matrix is of the form:
eigenvectors =
0.0000 0.6663 0.7457
-1.0000 0.0000 0.0000
-0.0000 -0.7457 0.6663
And in this case, MatLab does not assign any value to index_sh even though I want index_sh to be equal to 1 in this case.
If anyone knows how I can tackle this problem, so that MatLab assigns a value also when the zeros are written as 0.0000 I would be very grateful!
The problem is, very likely, that those "0.0000" are not exactly 0. To solve that, choose a tolerance and use it when comparing with 0:
tol = 1e-6;
index_sh = find(abs(eigenvectors(1,:))<tol & abs(eigenvectors(3,:))<tol);
In your code:
for ii = 1:3
    if abs(eigenvectors(1,ii))<tol && abs(eigenvectors(3,ii))<tol
        index_sh = ii
    end
end
Or, instead of a tolerance, you could choose the column whose first- and third-row entries are closest to 0:
[~, index_sh] = min(abs(eigenvectors(1,:)) + abs(eigenvectors(3,:)));
I would like to perform simple LDA on my small data set (65x8). I have 65 instances (samples), 8 features (attributes) and 4 classes. Is there any Matlab code for LDA? As far as I know, the Matlab toolbox does not have an LDA function, so I need to write my own code. Any help?
I found this code on the web:
load Data;
All_data = Data(:,1:8);
All_data_label = Data(:,9);
testing_ind = [];
for i = 1:length(Data)
    if rand > 0.8
        testing_ind = [testing_ind, i];
    end
end
training_ind = setxor(1:length(Data), testing_ind);
[ldaClass, err, P, logp, coeff] = classify(All_data(testing_ind,:), ...
    All_data(training_ind,:), All_data_label(training_ind,:), 'linear');
[ldaResubCM, grpOrder] = confusionmat(All_data_label(testing_ind,:), ldaClass)
Then I got these results:
ldaClass =
3
2
3
2
1
4
3
3
1
2
1
1
2
err =
0.2963
P =
0.0001 0.0469 0.7302 0.2229
0.1178 0.5224 0.3178 0.0419
0.0004 0.2856 0.4916 0.2224
0.0591 0.6887 0.1524 0.0998
0.8327 0.1637 0.0030 0.0007
0.0002 0.1173 0.3897 0.4928
0.0000 0.0061 0.7683 0.2255
0.0000 0.0241 0.5783 0.3976
0.9571 0.0426 0.0003 0.0000
0.2719 0.5569 0.1630 0.0082
0.9999 0.0001 0.0000 0.0000
0.9736 0.0261 0.0003 0.0000
0.0842 0.6404 0.2634 0.0120
coeff =
4x4 struct array with fields:
type
name1
name2
const
linear
ldaResubCM =
4 0 0 0
0 3 1 0
0 1 1 0
0 0 2 1
grpOrder =
1
2
3
4
So I have 65 instances, 8 attributes and 4 classes (1, 2, 3, 4), but I don't know how to interpret these results. Any help?
The interpretation of the results derives directly from the documentation of classify.
classify trains a classifier based on the training data and labels (second and third argument), and applies the classifier to the test data (first argument).
ldaClass gives the classes chosen for the test data points, based on the classifier that has been trained using the training data points and labels.
err is the training error rate, the fraction of training data points that are incorrectly classified using the classifier which was trained using that data. The training error rate underestimates the error to be expected on independent test data.
P gives the posterior probabilities. I.e. for each test data point (rows) it gives for each class (columns) the probability that the data point belongs to that class. Probabilities sum to 1 across classes (for each row). The definite classification in ldaClass derives from the posterior probabilities such that for each test data point the class with the highest probability is chosen: [~, ind] = max(P') results in ind = ldaClass'.
coeff contains details about the trained classifier. In order to use this, you have to study in detail how the classifier works.
confusionmat compares the classes assigned by the classifier to the test data with the known true classes, and tabulates the results in a confusion matrix. Each row corresponds to the true class of a test data point, each column to the class assigned by the classifier. Numbers on the diagonal indicate correct classifications; in your result, the test error is 1 - sum(diag(ldaResubCM)) / sum(ldaResubCM(:)) = 0.308. In particular, the confusion matrix shows you that of the 4 test data points that belong to class two, 3 have been classified correctly and 1 incorrectly (as belonging to class three).
grpOrder just gives the explicit class labels for the four classes numbered 1 to 4; in your case, indices and labels are identical.
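As a quick sanity check on the relation between P and ldaClass described above (a sketch; it relies on the class labels here being exactly 1 to 4):

```matlab
% The hard classification is the row-wise argmax of the posteriors.
[~, predicted] = max(P, [], 2);
isequal(predicted, ldaClass) % true here, since grpOrder is 1:4
```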
I have one column which contains the group ID of each participant. There are three groups so every number in this column is 1, 2 or 3.
Then I have a second column which contains response scores for each participant. I want to calculate the mean/median response score within each group.
I have managed to do this by looping through every row but I sense this is a slow and suboptimal solution. Could someone please suggest a better way of doing things?
grpstats is a good function to use (see its documentation). This is the list of built-in statistics:
'mean' Mean
'sem' Standard error of the mean
'numel' Count, or number, of non-NaN elements
'gname' Group name
'std' Standard deviation
'var' Variance
'min' Minimum
'max' Maximum
'range' Range
'meanci' 95% confidence interval for the mean
'predci' 95% prediction interval for a new observation
and it also accepts function handles (e.g. @mean, @skewness):
>> groups = [1 1 1 2 2 2 3 3 3]';
>> data = [0 0 1 0 1 1 1 1 1]';
>> grpstats(data, groups, {'mean'})
ans =
0.3333
0.6667
1.0000
>> [mea, med] = grpstats(data, groups, {'mean', @median})
mea =
0.3333
0.6667
1.0000
med =
0
1
1
This is a good place to use accumarray (documentation and blog post):
result = accumarray(groupIDs, data, [], @median);
You can of course give a row or column of a matrix instead of a variable called groupIDs and another for data. If you'd prefer the mean instead of the median, use @mean as the 4th argument.
Note: the documentation notes that you should sort the input parameters if you need to rely on the order of the output. I'll leave that exercise for another day though.
Use logical indexing. For example, say your data is in a matrix m where the first column is the group ID and the second column the response scores. Then
mean(m(m(:,1)==1,2))
median(m(m(:,1)==1,2))
will give you the mean and median of the response scores for group 1, and similarly for the other groups.
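The same logical-indexing idea can be applied to all three groups at once with arrayfun (a sketch, assuming m is laid out as described above):

```matlab
% Per-group mean and median of the scores in column 2, grouped by column 1.
groupMeans   = arrayfun(@(g) mean(m(m(:,1)==g, 2)),   1:3);
groupMedians = arrayfun(@(g) median(m(m(:,1)==g, 2)), 1:3);
```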