fitcdiscr function in MatLab - matlab

I'm attempting to use the fitcdiscr class to reproduce SAS results, but I'm evidently not using the function correctly. I've been through the MatLab documentation on the subject repeatedly, but can't seem to figure out what I'm specifying incorrectly. I'll first post the data, then the SAS code that is specifying the correct linear discriminant functions, then the MatLab code that I'm struggling with. Any help would be appreciated.
Data:
a 191 131 53
a 185 134 50
a 200 137 52
a 173 127 50
a 171 128 49
a 160 118 47
a 188 134 54
a 186 129 51
a 174 131 52
a 163 115 47
b 186 107 49
b 211 122 49
b 201 144 47
b 242 131 54
b 184 108 43
b 211 118 51
b 217 122 49
b 223 127 51
b 208 125 50
b 199 124 46
SAS:
options ls=78;
title "Discrimant Analysis - Insect Data";
data insect;
infile "C:\Users\Zach\Documents\Classes\Penn State\STAT 505\Data\insect.txt";
input species $ joint1 joint2 aedeagus;
run;
data test;
input joint1 joint2 aedeagus;
cards;
194 124 49
;
proc discrim data=insect pool=test crossvalidate testdata=test testout=a;
class species;
var joint1 joint2 aedeagus;
run;
proc print;
run;
MatLab:
path = 'C:\Users\Zach\Documents\Classes\Penn State\STAT 505\Data';
file = 'insect.txt';
fid = fopen(fullfile(path,file));
X = textscan(fid, '%s%f%f%f','Delimiter');
fclose(fid); clear fid
data = cell2mat(X(:,2:4));%data
grps = X{:,1};%group a or b
prednames = {'joint 1', 'joint 2', 'aedeagus'};
M = fitcdiscr(data,grps,'PredictorNames',prednames,...
'Prior','uniform','DiscrimType','linear');
The linear functions are specified using variants of
M.Coeffs(1,2).Linear and M.Coeffs(1,2).Const
The two functions I'm expecting are:
da = -247 -1.4x1 + 1.5x2 10.9x3
db = -193 - .74x1 + 1.1x2 + 8.3x3
Thanks for the help.
Edit:
Using "Applied Multivariate Statistical Analysis, 6th ed" by Johnson and Wichern, section 11.3, I've produced the following code that gives me the SAS results:
xbar = M.Mu;
S = M.Sigma;
Sinv = inv(S);
for ii = 1:2
d0(ii,:) = -1/2*xbar(ii,:)*Sinv*xbar(ii,:)';
d(ii,:) = xbar(ii,:)*Sinv;
end
I'm still left wondering just what fitcdiscr is doing.

Related

How can I create a not-equally-spaced sequence of numbers in MATLAB?

I want to create a not-equally-spaced sequence of numbers in MATLAB starting from 24 and ending to 511.The Sequence uses 32 and 33 alternately as the increment. Thus, the sequence would be as below : [24 56 89 121 154 186 219 251 284 316 349 381 414 446 479 511] Notice that :
24+32=56
56+33=89
89+32=121
121+33=154
...
I just wonder how to modify my own codes or to write new codes to have the answer. My own codes are below:
t_3233=0;
for k=24:(32+t_3233):511
t_3233
k
if t_3233==1
t_3233=0;
else if t_3233==0
t_3233=1;
end
end
end
In this particular case you can use:
len = 16;
vector = round(linspace(24,511,len))

Importing CSV into Matlab

Chemical composition of a certain material
Hi,
I am trying to import the below mentioned data in CSV format in matlab, which is [1000x10] in dimensions.
HCL;H2SO4;CH4; SULPHUR;CHLORINE;S2O3;SO2;NH3;CO2;O2
144 2 3 141 140 6 7 137 136 10 11 133
13 131 130 16 17 127 126 20 21 123 122 24
25 119 118 28 29 115 114 32 33 111 110 36
108 38 39 105 104 42 43 101 100 46 47 97
96 50 51 93 92 54 55 89 88 58 59 85
61 83 82 64 65 79 78 68 69 75 74 72
73 71 70 76 77 67 66 80 81 63 62 84
60 86 87 57 56 90 91 53 52 94 95 49
48 98 99 45 44 102 103 41 40 106 107 37
109 35 34 112 113 31 30 116 117 27 26 120
121 23 22 124 125 19 18 128 129 15 14 132
12 134 135 9 8 138 139 5 4 142 143 1
I am able to import this data through my code
fid = fopen(uigetfile('.csv'),'rt');
FileName = fopen(fid);
headers = fgets(fid); %get first line
headers = textscan(headers,'%s','delimiter',';'); %read first line
format = repmat('%f',1,size(headers{1,1},1)); %count columns n makeformat string
data = textscan(fid,format,'delimiter',';'); %read rest of the file
data = [data{:}];
I am getting data in matrix form in variable data [1000x10] and name of all the components like HCL, H2SO4 in a cell array named headers{1x1}.
Now I have two questions like the built in import feature in matlab you have flexibility to import data as separate column vectors, numeric matrix,cell array and table format. Is it possible to do as such through code, like i get column vectors with their name HCL with [1000x1] and H2sO4 with [1000x1] in my workspace after import and so on all the column vectors with their names with [1000x1]dimensions.
if yes then help me please...?
If above mentioned is not possible then i can do alternatively that now I have names of column vectors in headers cell array, how I can extract those name and use those names as column vector names through code and I can assign data from data matrix [1000x10] to each column vector with their corresponding names.
like if i say
x = headers {1*1}{1*1}; i will get x = "HCL"
x = genvarname(x); I will get x= x0x22HCL0x2 BUT
I want that x get replaced with HCL.and then I assign
HCL = data(:,1) and same like this other variables H2SO4,SULPHUR, CHLORINE.
You can say i try to implement the import feature of column vector through my code.
Kindly help me to solve this issue. thanks
Have you tried the built-in readtable function?
You can access each column of the table by using the named column header.
If you'd like, you can use the two data types to create a table in MatLab. I'm not terribly familiar with its use, but it seems to be well documented. I'm sure someone else can expand upon this.
Edit:
After re-reading your question, I think this is closer to what you are after.
n=10;
what='HCL';%change this to any of the strings you interested in
numstr = repmat('%f',1,n);
hdrstr = repmat('%s',1,n);
headers = textscan(headers,hdrstr,'delimiter',';');
headers = headers(1,:)
data = cell2mat(textscan(fid,numstr,'delimiter',';'));
datout = data(:,strcmp(headers,what));%datout will be 1000x1 HCL data
Depending on what you want to do, you can loop through these appropriately
I know this is not what you asked for, but I would convert to a struct:
x=cell2struct(num2cell(data),headers,2)
reason is simple, selecting for example the third row with individual variables is not possible. With a struct simply use x(3)
If at some point you need the vectors you originally asked for and you can't use the strcut, use [x.HCL]

Error in for loop in matlab

I have an error in the following forloop. I know because the end value of the first for is going to be changed and it is not acceptable for Matlab to change in inside iteration. But would you have any idea how to overcome to it? By the way I used while, but does not help me at all. Data are as follow:
D = [
2.39484592826072e-05 286
4.94140791861196e-05 161
5.07906972800045e-05 163
0.000103133134300751 141
0.000142755898501384 136
0.000143741615840070 152
0.000188072960663613 177
0.000203545320971960 1
0.000269110781516704 296
0.000333161025069404 293
0.000351184122591795 167
0.000393661764751196 299
0.000469154814856272 173
0.000516662289403544 181
0.000537612407901054 156
0.000698464342131732 246
0.000848447859349023 66
0.000875283151707512 75
0.00102377583629824 68
0.00110034589129900 277
0.00110693756077989 129
0.00120680501123819 87
0.00151080017572355 78
0.00159156469379168 248
0.00190852817897233 270
0.00192106167039306 133
0.00224677708557380 258
0.00246430115488258 264
0.00288772180685041 255
0.00299392149856582 81
0.00315341807121748 242
0.00327625233716732 27
0.00362308575885149 124
0.00434568780796603 220
0.00443389247698617 239
0.00470947127244510 60
0.00474015278667278 23
0.00481651908877289 230
0.00487750364266560 53
0.00510342992049100 56
0.00513758569662983 228
0.00515453564704144 121
0.00515656244518627 232
0.00526922882200147 8
0.00547349131456174 50
0.00553337871530176 117
0.00569159206242299 18
0.00620144292620718 13
0.00630382865700000 119
0.00755647842280271 92
0.00983041839684126 40
0.00997057619578698 98
0.0102611966834032 44
0.0103337998140422 100
0.0105132461082006 37
0.0106952804631761 109
0.0107424055503829 208
0.0109630950142485 111
0.0115094667290339 105
0.0119529682389369 107];
ymin= D(:,1);
mean_value = 0.00773867192661190;
criteria = min(ymin);
kk = 1;
diff = 60;
and here is the code that I would have an error for the changing size_D which is expected.
while criteria < mean_value
if isempty(B)
ind_crt = find(min(ymin));
B(kk,:) = D(ind_crt,:);
D(ind_crt,:) = [];
kk = kk + 1;
end
criteria = min(min(D));
size_D = size(D,1);
for ii=1:size_D
if D(ii,1) == criteria
size_B = size(B,1);
for jj = 1:size_B
if abs(D(ii,2) - B(jj,2)) > diff
B(kk,:) = D(ii,:);
D(ii,:)= [];
kk = kk + 1;
end
size_D = size_D -1;
criteria = min(min(D))
end
end
end
end
Update:
Here is the error:
Attempted to access D(59,1); index out of bounds because
size(D)=[58,2].
Error in local_minima (line 50)
if D(ii,1) == criteria
Replace your for loop by a while loop, so that the code in the loop is run only if the condition ii<=size_D is verified:
ii=0;
while ii<=size_D
ii=ii+1;
%loop code
instead of the
for ii=1:size_D
%loop code

read/load parts of the irregular file by Matlab

I would like to partly load a PTX file by matlab (please see the following example)
I need to read and write the first two row (2 numbers) into 2 variables say a and b. And read and write the data from 5th row to the end into a matrix
Thanks for your help
114
221
1 0 0
1 0 0 0
-5.566405 -7.161944 -1.144557 0.197208 24 29 35
-5.560656 -7.154540 -1.137673 0.222400 29 32 39
-5.559846 -7.153491 -1.131895 0.254002 37 40 49
-5.560894 -7.154833 -1.126452 0.305013 51 54 63
-5.560084 -7.153783 -1.120633 0.290013 72 76 88
-5.561128 -7.155119 -1.115189 0.243214 105 113 134
-5.563203 -7.157782 -1.109926 0.227604 130 143 177
-5.569191 -7.165479 -1.105504 0.201602 121 140 173
-7.833616 -10.078705 -1.546952 0.130007 94 112 134
Look at the tdfread function in order to get the data into Matlab. It should be something like datafile = tdfread(filename, '\t'). Once you have that, index into the variable returned from that function like
a = datafile(1, 1);
b = datafile(2, 1);
data = datafile(5:end, :);

Selecting elements of a vector based on two vectors of starting and ending positions matlab

I would appreciate your help with the following problem in matlab:
I have a vector and I would like to select parts of it based on the following two vector of start and end indices of parts:
aa = [1 22 41 64 83 105 127 147 170 190 212 233]
bb = [21 40 63 82 104 126 146 169 189 211 232 252]
Basically I would like to perform some function on V(1:21), V(22:40),... V(233:252).
I have tried V(aa:bb) or V(aa(t):bb(t)) where t = 1:12 but I get only V(1:21), probably because V(22:40) has 19 elements compared to V(1:21) which has 22 elements.
Is there a fast way of programming this?
Put your selection in a cell array, and apply your function to each cell:
aa = [1 22 41 64 83 105 127 147 170 190 212 233]
bb = [21 40 63 82 104 126 146 169 189 211 232 252]
V = rand(252,1); % some sample data
selV = arrayfun(#(t) V(aa(t):bb(t)), 1:12,'uniformoutput',false);
result = cellfun(#yourfunction,selV)
% or
result = cellfun(#(selVi) yourfunction(selVi), selV);
If the function you want to apply has scalar output to every vector input, this should give you an 1x12 array. If the function gives vector output, you'll have to include the uniformoutput parameter:
result = cellfun(#(selVi) yourfunction(selVi), selV,'uniformoutput',false);
which gives you a 1x12 cell array.
If you want to run this in a highly condensed form, you can write (in two lines, for clarity)
aa = [1 22 41 64 83 105 127 147 170 190 212 233]
bb = [21 40 63 82 104 126 146 169 189 211 232 252]
V = rand(252,1); % some sample data borrowed from #Gunther
%# create an anonymous function that accepts start/end of range as input
myFunctionHandle = #(low,high)someFunction(V(low:high));
%# calculate result
%# if "someFunction" returns a scalar, you can drop the 'Uni',false part
%# from arrayfun
result = arrayfun(myFunctionHandle(low,high),aa,bb,'uni',false)
Note that this may run more slowly than an explicit loop at the moment, but arrayfun is likely to be multithreaded in a future release.