Libsvm Classification MATLAB - matlab

I used 1~200 data as trainning data, 201~220 as testing data
format likes: 3 class(class 1,class 2, class 3) and 20 features
2 1:100 2:96 3:88 4:94 5:96 6:94 7:72 8:68 9:69 10:70 11:76 12:70 13:73 14:71 15:74 16:76 17:78 18:81 19:76 20:76
2 1:96 2:100 3:88 4:88 5:90 6:98 7:71 8:66 9:63 10:74 11:75 12:66 13:71 14:68 15:74 16:78 17:78 18:85 19:77 20:76
2 1:88 2:88 3:100 4:96 5:91 6:89 7:70 8:70 9:68 10:74 11:76 12:71 13:73 14:74 15:79 16:77 17:73 18:80 19:78 20:78
2 1:94 2:87 3:96 4:100 5:92 6:88 7:76 8:73 9:71 10:70 11:74 12:67 13:71 14:71 15:76 16:77 17:71 18:80 19:73 20:73
2 1:96 2:90 3:91 4:93 5:100 6:92 7:74 8:67 9:67 10:75 11:75 12:67 13:74 14:73 15:77 16:77 17:75 18:82 19:76 20:74
2 1:93 2:98 3:90 4:88 5:92 6:100 7:73 8:66 9:65 10:73 11:78 12:69 13:73 14:72 15:75 16:74 17:75 18:83 19:79 20:77
3 1:73 2:71 3:73 4:76 5:74 6:73 7:100 8:79 9:79 10:71 11:65 12:58 13:67 14:73 15:74 16:72 17:60 18:63 19:64 20:60
3 1:68 2:66 3:70 4:73 5:68 6:67 7:78 8:100 9:85 10:77 11:57 12:57 13:58 14:62 15:68 16:64 17:59 18:57 19:57 20:59
3 1:69 2:64 3:70 4:72 5:69 6:65 7:78 8:85 9:100 10:70 11:56 12:63 13:62 14:61 15:64 16:69 17:56 18:55 19:55 20:51
3 1:71 2:74 3:74 4:70 5:76 6:73 7:71 8:73 9:71 10:100 11:58 12:58 13:59 14:60 15:58 16:65 17:57 18:57 19:63 20:57
1 1:77 2:75 3:76 4:73 5:75 6:79 7:66 8:56 9:56 10:59 11:100 12:77 13:84 14:79 15:82 16:80 17:82 18:82 19:81 20:82
1 1:70 2:66 3:71 4:67 5:67 6:70 7:63 8:57 9:62 10:58 11:77 12:100 13:84 14:75 15:76 16:78 17:73 18:72 19:87 20:80
1 1:73 2:72 3:73 4:71 5:74 6:74 7:68 8:58 9:61 10:59 11:84 12:84 13:100 14:86 15:88 16:91 17:81 18:81 19:84 20:86
1 1:71 2:69 3:75 4:71 5:73 6:73 7:74 8:61 9:61 10:60 11:79 12:75 13:86 14:100 15:90 16:88 17:74 18:79 19:81 20:82
1 1:74 2:74 3:80 4:76 5:78 6:76 7:73 8:66 9:64 10:59 11:81 12:76 13:88 14:90 15:100 16:93 17:74 18:83 19:81 20:85
1 1:76 2:77 3:77 4:76 5:78 6:75 7:73 8:64 9:68 10:65 11:80 12:78 13:91 14:88 15:93 16:100 17:79 18:79 19:82 20:83
1 1:78 2:78 3:73 4:71 5:75 6:75 7:61 8:58 9:57 10:56 11:82 12:73 13:81 14:74 15:74 16:80 17:100 18:85 19:80 20:85
1 1:81 2:85 3:79 4:80 5:82 6:82 7:63 8:56 9:55 10:57 11:82 12:72 13:81 14:79 15:83 16:79 17:85 18:100 19:83 20:79
1 1:76 2:77 3:78 4:75 5:76 6:79 7:65 8:57 9:57 10:63 11:81 12:87 13:84 14:81 15:81 16:82 17:80 18:83 19:100 20:87
1 1:76 2:76 3:78 4:73 5:75 6:78 7:60 8:59 9:51 10:57 11:82 12:80 13:86 14:82 15:85 16:83 17:85 18:79 19:87 20:100
Then, I write code to classify them:
% read the data set
[image_label, image_features] = libsvmread(fullfile('D:\...'));
[N D] = size(image_features);
% Determine the train and test index
trainIndex = zeros(N,1);
trainIndex(1:200) = 1;
testIndex = zeros(N,1);
testIndex(201:N) = 1;
trainData = image_features(trainIndex==1,:);
trainLabel = image_label(trainIndex==1,:);
testData = image_features(testIndex==1,:);
testLabel = image_label(testIndex==1,:);
% Train the SVM
model = svmtrain(trainLabel, trainData, '-c 1 -g 0.05 -b 1');
% Use the SVM model to classify the data
[predict_label, accuracy, prob_values] = svmpredict(testLabel, testData, model, '-b 1');
But the final result for predict_label are all class 1, so the accuracy is 50%, which that it cannot get the correct predict label for class 2 and 3.
Is there something wrong from the format of data, or the code that I implemented?
Please help me, thanks very much.

To elaborate a bit more about the problem, there are at least three problems here:
You just check one values of parameters C (c) and Gamma (g) - behaviour of SVM is heavily dependant on the good choice of these parameters, so it is a common approach to use a grid search using cross validation testing for selecting the best ones.
Data scale also plays an important role here, if some of the dimensions are much bigger then the rest, you will bias the whole classifier, in order to deal with it there are at least two basic approaches: 1. Scale linearly each dimension to some interval (like [0,1] or [-1,1]) or normalize the data by transformation through Sigma^(-1/2) where Sigma is a data covariance matrix
Label imbalance - SVM works best when you have exactly the same amount of points in each class. Once it is not true, you should use the class weighting scheme in order to get valid results.
After fixing these three issues you should get reasonable results.

My guess is that you'd want to tune your parameters.
Make a loop over your -c and -g values (typically logarithimically, eg -c 10^(-3:5) ) and pick the one that is best.
That said, it is advisable to normalize your data, eg. scale it such that all values are between 0 and 1.

Related

Having trouble in using nlinfit function in MATLAB

Kindly please help me with the problem as I need to use nlinfit function for fitting unknown parameters but it is showing some error. Although yesterday I was getting some values for parameters to be fitted but now if I am running it is having some issue for the function output to be used in fitted with NaN answer for last iteration only. X data is a concatenated matrix of three columns as independent variable and yk is dependent variable, taua is a matrix of initial guesses of number of parameters to be fitted.
function [yk]=activity_coefficientE(taua,x)
T=523;
alpha12=0.3; alpha13=0.3; alpha21=0.3; alpha23=0.3; alpha31=0.3; alpha32=0.3;
alpha18=0.2; alpha81=0.2; alpha28=0.2; alpha82=0.2; alpha38=0.2; alpha83=0.3;
alpha19=0.2; alpha91=0.2; alpha29=0.2; alpha92=0.2; alpha39=0.2; alpha93=0.2;
alpha110=0.2;alpha101=0.2;alpha210=0.2;alpha102=0.2;alpha310=0.2;alpha103=0.2;
alpha113=0.2;alpha131=0.2;alpha213=0.2;alpha132=0.2;alpha313=0.2;alpha133=0.2;
alpha114=0.2;alpha141=0.2;alpha214=0.2;alpha142=0.2;alpha314=0.2;alpha143=0.2;
alpha115=0.2;alpha151=0.2;alpha215=0.2;alpha152=0.2;alpha315=0.2;alpha153=0.2;
alpha117=0.2;alpha171=0.2;alpha217=0.2;alpha172=0.2;alpha317=0.2;alpha173=0.2;
alpha118=0.2;alpha181=0.2;alpha218=0.2;alpha182=0.2;alpha318=0.2;alpha183=0.2;
alpha810=0.2;alpha915=0.2;alpha1314=0.2;alpha108=0.2;alpha159=0.2;alpha1413=0.2;
alpha1718=0.2;alpha1817=0.2;
tau12=0; tau13=0; tau21=0; tau23=0; tau31=0; tau32=0;
%taua=randi([-5,5],1,112)
tau18=taua(1)+taua(57)/T;
tau81=taua(2)+taua(58)/T;
tau28=taua(3)+taua(59)/T;
tau82=taua(4)+taua(60)/T;
tau38=taua(5)+taua(61)/T;
tau83=taua(6)+taua(62)/T;
tau19=taua(7)+taua(63)/T;
tau91=taua(8)+taua(64)/T;
tau29=taua(9)+taua(65)/T;
tau92=taua(10)+taua(66)/T;
tau39=taua(11)+taua(67)/T;
tau93=taua(12)+taua(68)/T;
tau110=taua(13)+taua(69)/T;
tau101=taua(14)+taua(70)/T;
tau210=taua(15)+taua(71)/T;
tau102=taua(16)+taua(72)/T;
tau310=taua(17)+taua(73)/T;
tau103=taua(18)+taua(74)/T;
tau113=taua(19)+taua(75)/T;
tau131=taua(20)+taua(76)/T;
tau213=taua(21)+taua(77)/T;
tau132=taua(22)+taua(78)/T;
tau313=taua(23)+taua(79)/T;
tau133=taua(24)+taua(80)/T;
tau114=taua(25)+taua(81)/T;
tau141=taua(26)+taua(82)/T;
tau214=taua(27)+taua(83)/T;
tau142=taua(28)+taua(84)/T;
tau314=taua(29)+taua(85)/T;
tau143=taua(30)+taua(86)/T;
tau115=taua(31)+taua(87)/T;
tau151=taua(32)+taua(88)/T;
tau215=taua(33)+taua(89)/T;
tau152=taua(34)+taua(90)/T;
tau315=taua(35)+taua(91)/T;
tau153=taua(36)+taua(92)/T;
tau117=taua(37)+taua(93)/T;
tau171=taua(38)+taua(94)/T;
tau217=taua(39)+taua(95)/T;
tau172=taua(40)+taua(96)/T;
tau317=taua(41)+taua(97)/T;
tau173=taua(42)+taua(98)/T;
tau118=taua(43)+taua(99)/T;
tau181=taua(44)+taua(100)/T;
tau218=taua(45)+taua(101)/T;
tau182=taua(46)+taua(102)/T;
tau318=taua(47)+taua(103)/T;
tau183=taua(48)+taua(104)/T;
tau810=taua(49)+taua(105)/T;
tau108=taua(50)+taua(106)/T;
tau915=taua(51)+taua(107)/T;
tau159=taua(52)+taua(108)/T;
tau1314=taua(53)+taua(109)/T;
tau1413=taua(54)+taua(110)/T;
tau1718=taua(55)+taua(111)/T;
tau1817=taua(56)+taua(112)/T;
G12=exp(-(tau12*alpha12));
G21=exp(-(tau21*alpha21));
G13=exp(-(tau13*alpha13));
G31=exp(-(tau31*alpha31));
G23=exp(-(tau23*alpha23));
G32=exp(-(tau32*alpha32));
G18=exp(-(tau18*alpha18));
G81=exp(-(tau81*alpha81));
G28=exp(-(tau28*alpha28));
G82=exp(-(tau82*alpha82));
G38=exp(-(tau38*alpha83));
G83=exp(-(tau83*alpha83));
G19=exp(-(tau19*alpha19));
G91=exp(-(tau91*alpha91));
G29=exp(-(tau29*alpha29));
G92=exp(-(tau92*alpha92));
G39=exp(-(tau39*alpha39));
G93=exp(-(tau93*alpha93));
G110=exp(-(tau110*alpha110));
G101=exp(-(tau101*alpha101));
G210=exp(-(tau210*alpha210));
G102=exp(-(tau102*alpha102));
G310=exp(-(tau310*alpha310));
G103=exp(-(tau103*alpha103));
G113=exp(-(tau113*alpha113));
G131=exp(-(tau131*alpha131));
G213=exp(-(tau213*alpha213));
G132=exp(-(tau132*alpha132));
G313=exp(-(tau313*alpha313));
G133=exp(-(tau133*alpha133));
G114=exp(-(tau114*alpha114));
G141=exp(-(tau141*alpha141));
G214=exp(-(tau214*alpha214));
G142=exp(-(tau142*alpha142));
G314=exp(-(tau314*alpha314));
G143=exp(-(tau143*alpha143));
G115=exp(-(tau115*alpha115));
G151=exp(-(tau151*alpha151));
G215=exp(-(tau215*alpha215));
G152=exp(-(tau152*alpha152));
G315=exp(-(tau315*alpha315));
G153=exp(-(tau153*alpha153));
G117=exp(-(tau117*alpha117));
G171=exp(-(tau171*alpha171));
G217=exp(-(tau217*alpha217));
G172=exp(-(tau172*alpha172));
G317=exp(-(tau317*alpha317));
G173=exp(-(tau173*alpha173));
G118=exp(-(tau118*alpha118));
G181=exp(-(tau181*alpha181));
G218=exp(-(tau218*alpha218));
G182=exp(-(tau182*alpha182));
G318=exp(-(tau318*alpha318));
G183=exp(-(tau183*alpha183));
G810=exp(-(tau810*alpha810));
G108=exp(-(tau108*alpha108));
G915=exp(-(tau915*alpha915));
G159=exp(-(tau159*alpha159));
G1314=exp(-(tau1314*alpha1314));
G1413=exp(-(tau1413*alpha1413));
G1718=exp(-(tau1718*alpha1718));
G1817=exp(-(tau1817*alpha1817));
%calculating mole fractions of ionic species
x1=x(:,1);
x2=x(:,2);
x3=x(:,3);
%x1=[0.1577 0.1492 0.1462 0.1366 0.1299 0.1180 0.0863 0.0761 0.0550 ];
%x2=[0.8278 0.7945 0.7678 0.7450 0.6979 0.6309 0.4611 0.4114 0.2952 ];
%x3=[0.0145 0.0563 0.0860 0.1184 0.1722 0.2511 0.4526 0.5125 0.6498 ];
A=[0.0674243 0.0773881 0.0843400 0.0865343 0.0899223 0.0882858 0.0715087 0.0643867 0.0483658];
B=[0.0141081 0.0479814 0.0643151 0.0737477 0.0820756 0.0838701 0.0701576 0.0634457 0.0479639];
C=[0.0565665 0.0450072 0.0387724 0.0313828 0.02506094 0.0186280 0.0092734 0.0073438 0.0041595 ];
D=[0.0336447 0.0267694 0.0230611 0.0186659 0.0149058 0.0110795 0.0055157 0.0043679 0.0024739 ];
E=[0.0008148 0.0008756 0.00087131 0.0008794 0.0008711 0.0008441 0.0007384 0.0006997 0.0005980 ];
N=length(A);
x1n=zeros(N,1);x2n=zeros(N,1);x3n=zeros(N,1);
X1=zeros(N,1);X2=zeros(N,1);X3=zeros(N,1);X4=zeros(N,1);X5=zeros(N,1);X6=zeros(N,1);X7=zeros(N,1);
X12=zeros(N,1);X16=zeros(N,1);
for i=1:N
x1n(i)=(x1(i)-A(i)-D(i)-2*E(i)-C(i)+3*B(i))
x2n(i)=(x2(i)-A(i)-C(i)-D(i))
x3n(i)=(x3(i)-B(i))
X1(i)=(x1n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X2(i)=(x2n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X3(i)=(x3n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X4(i)=(A(i)+D(i)+E(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X5(i)=(C(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X6(i)=(A(i)-B(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X7(i)=(B(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X12(i)=(E(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X16(i)=(C(i)+D(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
end
yc4=X4./(X4+X5);
yc5=X5./(X4+X5);
yc6=X6./(X6+X7+X12+X16);
yc7=X7./(X6+X7+X12+X16);
yc12=X12./(X6+X7+X12+X16);
yc16=X16./(X6+X7+X12+X16);
alpha14=yc6.*alpha18+yc7.*alpha19+yc12.*alpha113+yc16.*alpha117;
%alpha41=alpha14;
alpha24=yc6.*alpha28+yc7.*alpha29+yc12.*alpha213+yc16.*alpha217;
%alpha42=alpha24;
alpha34=yc6.*alpha38+yc7.*alpha39+yc12.*alpha313+yc16.*alpha317;
%alpha43=alpha34;
alpha15=yc6.*alpha110+yc7.*alpha115+yc12.*alpha114+yc16.*alpha118;
%alpha51=alpha15;
alpha25=yc6.*alpha210+yc7.*alpha215+yc12.*alpha214+yc16.*alpha218;
%alpha52=alpha25;
alpha35=yc6.*alpha310+yc7.*alpha315+yc12.*alpha314+yc16.*alpha318;
%alpha53=alpha35;
alpha16=yc4.*alpha81+yc5.*alpha101;
%alpha61=alpha16;
alpha26=yc4.*alpha82+yc5.*alpha102;
%alpha62=alpha26;
alpha36=yc4.*alpha83+yc5.*alpha103;
%alpha63=alpha36;
alpha17=yc4.*alpha91+yc5.*alpha151;
%alpha71=alpha17;
alpha27=yc4.*alpha92+yc5.*alpha152;
%alpha72=alpha27;
alpha37=yc4.*alpha93+yc5.*alpha153;
%alpha73=alpha37;
alpha112=yc4.*alpha131+yc5.*alpha141;
%alpha121=alpha112;
alpha212=yc4.*alpha132+yc5.*alpha142;
%alpha122=alpha212;
alpha312=yc4.*alpha133+yc5.*alpha143;
%alpha123=alpha312;
alpha116=yc4.*alpha171+yc5.*alpha181;
%alpha161=alpha116;
alpha216=yc4.*alpha172+yc5.*alpha182;
%alpha162=alpha216;
alpha316=yc4.*alpha173+yc5.*alpha183;
%alpha163=alpha316;
alpha46=yc5.*alpha810;
%alpha64=alpha46;
alpha47=yc5.*alpha915;
%alpha74=alpha47;
alpha412=yc5.*alpha1314;
%alpha124=alpha412;
alpha416=yc5.*alpha1718;
%alpha164=alpha416;
alpha56=yc4.*alpha108;
%alpha65=alpha56;
alpha57=yc4.*alpha159;
%alpha75=alpha57;
alpha512=yc4.*alpha1413;
%alpha125=alpha512;
alpha516=yc4.*alpha1817;
%alpha165=alpha516;
G14=yc6.*G18+yc7.*G19+yc12.*G113+yc16.*G117;
%G41=G14;
G24=yc6.*G28+yc7.*G29+yc12.*G213+yc16.*G217;
%G42=G24;
G34=yc6.*G38+yc7.*G39+yc12.*G313+yc16.*G317;
%G43=G34;
G15=yc6.*G110+yc7.*G115+yc12.*G114+yc16.*G118;
%G51=G15;
G25=yc6.*G210+yc7.*G215+yc12.*G214+yc16.*G218;
%G52=G25;
G35=yc6.*G310+yc7.*G315+yc12.*G314+yc16.*G318;
%G53=G35;
G16=yc4.*G81+yc5.*G101;
%G61=G16;
G26=yc4.*G82+yc5.*G102;
%G62=G26;
G36=yc4.*G83+yc5.*G103;
%G63=G36;
G17=yc4.*G91+yc5.*G151;
%G71=G17;
G27=yc4.*G92+yc5.*G152;
%G72=G27;
G37=yc4.*G93+yc5.*G153;
%G73=G37;
G112=yc4.*G131+yc5.*G141;
%G121=G112;
G212=yc4.*G132+yc5.*G142;
%G122=G212;
G312=yc4.*G133+yc5.*G143;
%G123=G312;
G116=yc4.*G171+yc5.*G181;
%G161=G116;
G216=yc4.*G172+yc5.*G182;
%G162=G216;
G316=yc4.*G173+yc5.*G183;
%G163=G316;
G46=yc5.*G810;
%G64=G46;
G47=yc5.*G915;
%G74=G47;
G412=yc5.*G1314;
%G124=G412;
G416=yc5.*G1718;
%G164=G416;
G56=yc4.*G108;
%G65=G56;
G57=yc4.*G159;
%G75=G57;
G512=yc4.*G1413;
%G125=G512;
G516=yc4.*G1817;
%G165=G516;
tau14=-log(G14)./alpha14;
%tau41=tau14;
tau24=-log(G24)./alpha24;
%tau42=tau24;
tau34=-log(G34)./alpha34;
%tau43=tau34;
tau15=-log(G15)./alpha15;
%tau51=tau15;
tau25=-log(G25)./alpha25;
%tau52=tau25;
tau35=-log(G35)./alpha35;
%tau53=tau35;
tau16=-log(G16)./alpha16;
%tau61=tau16;
tau26=-log(G26)./alpha26;
%tau62=tau26;
tau36=-log(G36)./alpha36;
%tau63=tau36;
tau17=-log(G17)./alpha17;
%tau71=tau17;
tau27=-log(G27)./alpha27;
%tau72=tau27;
tau37=-log(G37)./alpha37;
%tau73=tau37;
tau112=-log(G112)./alpha112;
%tau121=tau112;
tau212=-log(G212)./alpha212;
%tau122=tau212;
tau312=-log(G312)./alpha312;
%tau123=tau312;
tau116=-log(G116)./alpha116;
%tau161=tau116;
tau216=-log(G216)./alpha216;
%tau162=tau216;
tau316=-log(G316)./alpha316;
%tau163=tau316;
tau46=-log(G46)./alpha46;
%tau64=tau46;
tau47=-log(G47)./alpha47;
%tau74=tau47;
tau412=-log(G412)./alpha412;
%tau124=tau412;
tau416=-log(G416)./alpha416;
%tau164=tau416;
tau56=-log(G56)./alpha56;
%tau65=tau56;
tau57=-log(G57)./alpha57;
%tau75=tau57;
tau512=-log(G512)./alpha512;
%tau125=tau512;
tau516=-log(G516)./alpha516;
%tau165=tau516;
ln_y1_1=G12.*X2.*tau12+ G31.*X3.*tau13+ G14.*X4.*tau14+G15.*X5.*tau15+G16.*X6.*tau16+G17.*X7.*tau17+G112.*X12.*tau112+G116.*X16.*tau116;
ln_y1_2=G12.*X2+ G13.*X3+ G14.*X4+G15.*X5+G16.*X6+G17.*X7+G112.*X12+G116.*X16;
ln_y2_1=G21.*X1.*tau12+ G32.*X3.*tau32+ G24.*X4.*tau24+G25.*X5.*tau25+G26.*X6.*tau26+G27.*X7.*tau27+G212.*X12.*tau212+G216.*X16.*tau216;
ln_y2_2=G12.*X1+ G23.*X3+G24.*X4+G25.*X5+G26.*X6+G27.*X7+G212.*X12+G216.*X16;
ln_y3_1=G13.*X1.*tau13+ G23.*X3.*tau23+ G34.*X4.*tau34+G35.*X5.*tau35+G36.*X6.*tau36+G37.*X7.*tau37+G312.*X12.*tau312+G316.*X16.*tau316;
ln_y3_2=G13.*X1+ G23.*X3+ G34.*X4+G35.*X5+G36.*X6+G37.*X7+G312.*X12+G316.*X16;
ln_y4_1=G14.*X1.*tau14+G24.*X2.*tau24+G34.*X3.*tau34+G46.*X6.*tau46+G47.*X7.*tau47+G412.*X12.*tau412+G416.*X16.*tau416;
ln_y4_2=G14.*X1+G24.*X2+G34.*X3+G46.*X6+G47.*X7+G412.*X12+G416.*X16;
ln_y5_1=G15.*X1.*tau15+G25.*X2.*tau25+G35.*X3.*tau35+G56.*X6.*tau56+G57.*X7.*tau57+G512.*X12.*tau512+G516.*X16.*tau516;
ln_y5_2=G15.*X1+G25.*X2+G35.*X3+G56.*X6+G57.*X7+G512.*X12+G516.*X16;
ln_y6_1=G16.*X1.*tau16+G26.*X2.*tau26+G36.*X3.*tau36+G46.*X4.*tau46+G56.*X5.*tau56;
ln_y6_2=G16.*X1+G26.*X2+G36.*X3+G46.*X4+G56.*X5;
ln_y7_1=G17.*X1.*tau17+G27.*X2.*tau27+G37.*X3.*tau37+G47.*X4.*tau47+G57.*X5.*tau57;
ln_y7_2=G17.*X1+G27.*X2+G37.*X3+G47.*X4+G57.*X5;
ln_y12_1=G112.*X1.*tau112+G212.*X2.*tau212+G312.*X3.*tau312+G412.*X4.*tau412+G512.*X5.*tau512;
ln_y12_2=G112.*X1+G212.*X2+G312.*X3+G412.*X4+G512.*X5;
ln_y16_1=G116.*X1.*tau116+G216.*X2.*tau216+G316.*X3.*tau316+G416.*X4.*tau416+G516.*X5.*tau516;
ln_y16_2=G116.*X1+G216.*X2+G316.*X3+G416.*X4+G516.*X5;
ln_y1_3=(((X2.*G12)./ln_y2_2).*(tau12-(ln_y2_1)./(ln_y2_2)))+(((X3.*G13)./ln_y3_2).*(tau13-(ln_y3_1)./(ln_y3_2)));
ln_y1_4=(((X6.*G16)./ln_y6_2).*(tau16- (ln_y6_1./ln_y6_2))) + (((X7.*G17)./ln_y7_2).*(tau17- (ln_y7_1./ln_y7_2)))+(((X12.*G12)./ln_y12_2).*(tau112- (ln_y12_1./ln_y12_2)))+(((X16.*G16)./ln_y16_2).*(tau116- (ln_y16_1./ln_y16_2)));
ln_y1_5=(((X4.*G14)./ln_y4_2).*(tau14- (ln_y4_1./ln_y4_2))) + (((X5.*G15)./ln_y5_2).*(tau15- (ln_y5_1./ln_y5_2)));
yk=exp((ln_y1_1./ln_y1_2) + ln_y1_3 + ln_y1_4+ ln_y1_5) % activity coefficient for H2O
end
........................................
Another function where above function to be called.....
% calling the function act_coeff to estimate the binary interaction parameters
for i=1:112
filename = 'EagelsDATA.xlsx'; %reading VLE data from excel file
Data = xlsread(filename);
x(:,1) = Data([10:15 17:19],16);
x(:,2) = Data([10:15 17:19],1);
x(:,3)= Data([10:15 17:19],2);
taua=(randi([-5,5],1,112));
yk=[0.0606 (values calculated from above function and will be used for fitting)
0.4327
0.6545
0.9417
1.2570
1.6881
1.9108
1.7777
1.3821]
% taua =[ -2 3 4 -3 -4 1 4 -2 4 -4 -1 4 5 -3 3 2 -5 3 -4
% 1 4 1 5 -1 -1 -3 2 -3 4 3 4 2 5 4 -2 4 3 -1
% 1 0 -5 -5 -5 -3 4 2 1 4 0 2 -3 -4 5 0 -3 2 5
% 1 0 5 1 -3 5 4 1 5 2 3 2 0 -5 -4 -2 1 -2 5
%-5 5 -2 -2 4 1 -1 3 -1 1 5 -1 0 -1 4 5 5 1 4
% 1 0 4 -4 4 0 -1 -2 -5 -3 -4 -5
% -5 0 -2 0 -5] (random values for which yk was calculted from the command
taua= randi([-5,5],1,112))
try % try-catch used to continue the loop without stopping on encountering an error
[taua1]= nlinfit(x,yk,#activity_coefficientE,taua)
catch exception
continue
end
end
I am not able to attach excel sheet here so data from excel sheet is as:
x =[0.1577 0.1492 0.1462 0.1366 0.1299 0.1180 0.0863 0.0761 0.0550; column 1
0.8278 0.7945 0.7678 0.7450 0.6979 0.6309 0.4611 0.4114 0.2952 ; column 2
0.0145 0.0563 0.0860 0.1184 0.1722 0.2511 0.4526 0.5125 0.6498 ]; column 3
I found 3 major problems with what you did.
Problem #1 - errors
The reason you get the error is because your function "activity_coefficientE" can sometimes return NaN or inf values. My suggestion is to look for these values and set the value of "yk" to a large value so that the optimizer in "nlinfit" will stay away from coefficients that produce infinite or NaN values. This is the code at the bottom of the function so that you avoid crashes:
if any(~isfinite(yk))
yk = 10 * ones(size(yk));
end
Problem #2 - random initial guesses
The trouble with using random numbers for your initial conditions is that every time you run it you get a different answer, so sometimes it works and sometimes it doesn't. If you set the random number generator seed, you can get the same random numbers each time you run the script. If you change you seed, you can get a different set of random numbers. I shortened your main script to this, where I try 100 different random seeds (and store the results of each attempt) to see what answers result:
for i=1:100
rng(i)
taua = randi([-5,5],1,112);
taua1(i, :) = nlinfit(x,yk,#activity_coefficientE,taua);
end
Each row of "taua1" is a set of 111 coefficients.
Problem #3 - Trying to fit 9 points with 112 coefficients
Every time nlinfit is called, you get this warning:
Warning: Rank deficient
because you have more coefficients (112) that you are asking nlinfit to find than data points you are fitting (9). It's like trying to find the 2nd order equation that best fits 2 points, there are an infinite number of solutions. When curve fitting you should have more data points than coefficients to make sure you're not fitting noise. You need more data points in "yk" and "x" and/or fewer coefficients to fit. I've done a lot of curve fitting and I've never seen an equation with 112 coefficients, so I am thinking that you are not solving the problem correctly. Perhaps the 112 coefficients aren't really independent or there are 112 data points and 9 coefficients that you want to find.
For completeness, here is my edited version of the activity_coefficientE.m function that I created to work on this solution. In general, I never see Matlab code with this many variables with similar names. Much of this code could be greatly simplified by using vector operations. Most of my changes involve formatting, adding semicolons, and the checks for non-finite values at the end.
function yk=activity_coefficientE(taua,x)
T=523;
alpha12=0.3; alpha13=0.3; alpha21=0.3; alpha23=0.3; alpha31=0.3; alpha32=0.3;
alpha18=0.2; alpha81=0.2; alpha28=0.2; alpha82=0.2; alpha38=0.2; alpha83=0.3;
alpha19=0.2; alpha91=0.2; alpha29=0.2; alpha92=0.2; alpha39=0.2; alpha93=0.2;
alpha110=0.2;alpha101=0.2;alpha210=0.2;alpha102=0.2;alpha310=0.2;alpha103=0.2;
alpha113=0.2;alpha131=0.2;alpha213=0.2;alpha132=0.2;alpha313=0.2;alpha133=0.2;
alpha114=0.2;alpha141=0.2;alpha214=0.2;alpha142=0.2;alpha314=0.2;alpha143=0.2;
alpha115=0.2;alpha151=0.2;alpha215=0.2;alpha152=0.2;alpha315=0.2;alpha153=0.2;
alpha117=0.2;alpha171=0.2;alpha217=0.2;alpha172=0.2;alpha317=0.2;alpha173=0.2;
alpha118=0.2;alpha181=0.2;alpha218=0.2;alpha182=0.2;alpha318=0.2;alpha183=0.2;
alpha810=0.2;alpha915=0.2;alpha1314=0.2;alpha108=0.2;alpha159=0.2;alpha1413=0.2;
alpha1718=0.2;alpha1817=0.2;
tau12=0; tau13=0; tau21=0; tau23=0; tau31=0; tau32=0;
tau18=taua(1)+taua(57)/T;
tau81=taua(2)+taua(58)/T;
tau28=taua(3)+taua(59)/T;
tau82=taua(4)+taua(60)/T;
tau38=taua(5)+taua(61)/T;
tau83=taua(6)+taua(62)/T;
tau19=taua(7)+taua(63)/T;
tau91=taua(8)+taua(64)/T;
tau29=taua(9)+taua(65)/T;
tau92=taua(10)+taua(66)/T;
tau39=taua(11)+taua(67)/T;
tau93=taua(12)+taua(68)/T;
tau110=taua(13)+taua(69)/T;
tau101=taua(14)+taua(70)/T;
tau210=taua(15)+taua(71)/T;
tau102=taua(16)+taua(72)/T;
tau310=taua(17)+taua(73)/T;
tau103=taua(18)+taua(74)/T;
tau113=taua(19)+taua(75)/T;
tau131=taua(20)+taua(76)/T;
tau213=taua(21)+taua(77)/T;
tau132=taua(22)+taua(78)/T;
tau313=taua(23)+taua(79)/T;
tau133=taua(24)+taua(80)/T;
tau114=taua(25)+taua(81)/T;
tau141=taua(26)+taua(82)/T;
tau214=taua(27)+taua(83)/T;
tau142=taua(28)+taua(84)/T;
tau314=taua(29)+taua(85)/T;
tau143=taua(30)+taua(86)/T;
tau115=taua(31)+taua(87)/T;
tau151=taua(32)+taua(88)/T;
tau215=taua(33)+taua(89)/T;
tau152=taua(34)+taua(90)/T;
tau315=taua(35)+taua(91)/T;
tau153=taua(36)+taua(92)/T;
tau117=taua(37)+taua(93)/T;
tau171=taua(38)+taua(94)/T;
tau217=taua(39)+taua(95)/T;
tau172=taua(40)+taua(96)/T;
tau317=taua(41)+taua(97)/T;
tau173=taua(42)+taua(98)/T;
tau118=taua(43)+taua(99)/T;
tau181=taua(44)+taua(100)/T;
tau218=taua(45)+taua(101)/T;
tau182=taua(46)+taua(102)/T;
tau318=taua(47)+taua(103)/T;
tau183=taua(48)+taua(104)/T;
tau810=taua(49)+taua(105)/T;
tau108=taua(50)+taua(106)/T;
tau915=taua(51)+taua(107)/T;
tau159=taua(52)+taua(108)/T;
tau1314=taua(53)+taua(109)/T;
tau1413=taua(54)+taua(110)/T;
tau1718=taua(55)+taua(111)/T;
tau1817=taua(56)+taua(112)/T;
G12=exp(-(tau12*alpha12));
G21=exp(-(tau21*alpha21));
G13=exp(-(tau13*alpha13));
G31=exp(-(tau31*alpha31));
G23=exp(-(tau23*alpha23));
G32=exp(-(tau32*alpha32));
G18=exp(-(tau18*alpha18));
G81=exp(-(tau81*alpha81));
G28=exp(-(tau28*alpha28));
G82=exp(-(tau82*alpha82));
G38=exp(-(tau38*alpha83));
G83=exp(-(tau83*alpha83));
G19=exp(-(tau19*alpha19));
G91=exp(-(tau91*alpha91));
G29=exp(-(tau29*alpha29));
G92=exp(-(tau92*alpha92));
G39=exp(-(tau39*alpha39));
G93=exp(-(tau93*alpha93));
G110=exp(-(tau110*alpha110));
G101=exp(-(tau101*alpha101));
G210=exp(-(tau210*alpha210));
G102=exp(-(tau102*alpha102));
G310=exp(-(tau310*alpha310));
G103=exp(-(tau103*alpha103));
G113=exp(-(tau113*alpha113));
G131=exp(-(tau131*alpha131));
G213=exp(-(tau213*alpha213));
G132=exp(-(tau132*alpha132));
G313=exp(-(tau313*alpha313));
G133=exp(-(tau133*alpha133));
G114=exp(-(tau114*alpha114));
G141=exp(-(tau141*alpha141));
G214=exp(-(tau214*alpha214));
G142=exp(-(tau142*alpha142));
G314=exp(-(tau314*alpha314));
G143=exp(-(tau143*alpha143));
G115=exp(-(tau115*alpha115));
G151=exp(-(tau151*alpha151));
G215=exp(-(tau215*alpha215));
G152=exp(-(tau152*alpha152));
G315=exp(-(tau315*alpha315));
G153=exp(-(tau153*alpha153));
G117=exp(-(tau117*alpha117));
G171=exp(-(tau171*alpha171));
G217=exp(-(tau217*alpha217));
G172=exp(-(tau172*alpha172));
G317=exp(-(tau317*alpha317));
G173=exp(-(tau173*alpha173));
G118=exp(-(tau118*alpha118));
G181=exp(-(tau181*alpha181));
G218=exp(-(tau218*alpha218));
G182=exp(-(tau182*alpha182));
G318=exp(-(tau318*alpha318));
G183=exp(-(tau183*alpha183));
G810=exp(-(tau810*alpha810));
G108=exp(-(tau108*alpha108));
G915=exp(-(tau915*alpha915));
G159=exp(-(tau159*alpha159));
G1314=exp(-(tau1314*alpha1314));
G1413=exp(-(tau1413*alpha1413));
G1718=exp(-(tau1718*alpha1718));
G1817=exp(-(tau1817*alpha1817));
%calculating mole fractions of ionic species
x1=x(:,1);
x2=x(:,2);
x3=x(:,3);
A=[0.0674243 0.0773881 0.0843400 0.0865343 0.0899223 0.0882858 0.0715087 0.0643867 0.0483658];
B=[0.0141081 0.0479814 0.0643151 0.0737477 0.0820756 0.0838701 0.0701576 0.0634457 0.0479639];
C=[0.0565665 0.0450072 0.0387724 0.0313828 0.02506094 0.0186280 0.0092734 0.0073438 0.0041595 ];
D=[0.0336447 0.0267694 0.0230611 0.0186659 0.0149058 0.0110795 0.0055157 0.0043679 0.0024739 ];
E=[0.0008148 0.0008756 0.00087131 0.0008794 0.0008711 0.0008441 0.0007384 0.0006997 0.0005980 ];
N=length(A);
x1n=zeros(N,1);x2n=zeros(N,1);x3n=zeros(N,1);
X1=zeros(N,1);X2=zeros(N,1);X3=zeros(N,1);X4=zeros(N,1);X5=zeros(N,1);X6=zeros(N,1);X7=zeros(N,1);
X12=zeros(N,1);X16=zeros(N,1);
for i=1:N
x1n(i)=(x1(i)-A(i)-D(i)-2*E(i)-C(i)+3*B(i));
x2n(i)=(x2(i)-A(i)-C(i)-D(i));
x3n(i)=(x3(i)-B(i));
X1(i)=(x1n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X2(i)=(x2n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X3(i)=(x3n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X4(i)=(A(i)+D(i)+E(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X5(i)=(C(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X6(i)=(A(i)-B(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X7(i)=(B(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X12(i)=(E(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X16(i)=(C(i)+D(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
end
yc4=X4./(X4+X5);
yc5=X5./(X4+X5);
yc6=X6./(X6+X7+X12+X16);
yc7=X7./(X6+X7+X12+X16);
yc12=X12./(X6+X7+X12+X16);
yc16=X16./(X6+X7+X12+X16);
alpha14=yc6.*alpha18+yc7.*alpha19+yc12.*alpha113+yc16.*alpha117;
alpha24=yc6.*alpha28+yc7.*alpha29+yc12.*alpha213+yc16.*alpha217;
alpha34=yc6.*alpha38+yc7.*alpha39+yc12.*alpha313+yc16.*alpha317;
alpha15=yc6.*alpha110+yc7.*alpha115+yc12.*alpha114+yc16.*alpha118;
alpha25=yc6.*alpha210+yc7.*alpha215+yc12.*alpha214+yc16.*alpha218;
alpha35=yc6.*alpha310+yc7.*alpha315+yc12.*alpha314+yc16.*alpha318;
alpha16=yc4.*alpha81+yc5.*alpha101;
alpha26=yc4.*alpha82+yc5.*alpha102;
alpha36=yc4.*alpha83+yc5.*alpha103;
alpha17=yc4.*alpha91+yc5.*alpha151;
alpha27=yc4.*alpha92+yc5.*alpha152;
alpha37=yc4.*alpha93+yc5.*alpha153;
alpha112=yc4.*alpha131+yc5.*alpha141;
alpha212=yc4.*alpha132+yc5.*alpha142;
alpha312=yc4.*alpha133+yc5.*alpha143;
alpha116=yc4.*alpha171+yc5.*alpha181;
alpha216=yc4.*alpha172+yc5.*alpha182;
alpha316=yc4.*alpha173+yc5.*alpha183;
alpha46=yc5.*alpha810;
alpha47=yc5.*alpha915;
alpha412=yc5.*alpha1314;
alpha416=yc5.*alpha1718;
alpha56=yc4.*alpha108;
alpha57=yc4.*alpha159;
alpha512=yc4.*alpha1413;
alpha516=yc4.*alpha1817;
G14=yc6.*G18+yc7.*G19+yc12.*G113+yc16.*G117;
G24=yc6.*G28+yc7.*G29+yc12.*G213+yc16.*G217;
G34=yc6.*G38+yc7.*G39+yc12.*G313+yc16.*G317;
G15=yc6.*G110+yc7.*G115+yc12.*G114+yc16.*G118;
G25=yc6.*G210+yc7.*G215+yc12.*G214+yc16.*G218;
G35=yc6.*G310+yc7.*G315+yc12.*G314+yc16.*G318;
G16=yc4.*G81+yc5.*G101;
G26=yc4.*G82+yc5.*G102;
G36=yc4.*G83+yc5.*G103;
G17=yc4.*G91+yc5.*G151;
G27=yc4.*G92+yc5.*G152;
G37=yc4.*G93+yc5.*G153;
G112=yc4.*G131+yc5.*G141;
G212=yc4.*G132+yc5.*G142;
G312=yc4.*G133+yc5.*G143;
G116=yc4.*G171+yc5.*G181;
G216=yc4.*G172+yc5.*G182;
G316=yc4.*G173+yc5.*G183;
G46=yc5.*G810;
G47=yc5.*G915;
G412=yc5.*G1314;
G416=yc5.*G1718;
G56=yc4.*G108;
G57=yc4.*G159;
G512=yc4.*G1413;
G516=yc4.*G1817;
tau14=-log(G14)./alpha14;
tau24=-log(G24)./alpha24;
tau34=-log(G34)./alpha34;
tau15=-log(G15)./alpha15;
tau25=-log(G25)./alpha25;
tau35=-log(G35)./alpha35;
tau16=-log(G16)./alpha16;
tau26=-log(G26)./alpha26;
tau36=-log(G36)./alpha36;
tau17=-log(G17)./alpha17;
tau27=-log(G27)./alpha27;
tau37=-log(G37)./alpha37;
tau112=-log(G112)./alpha112;
tau212=-log(G212)./alpha212;
tau312=-log(G312)./alpha312;
tau116=-log(G116)./alpha116;
tau216=-log(G216)./alpha216;
tau316=-log(G316)./alpha316;
tau46=-log(G46)./alpha46;
tau47=-log(G47)./alpha47;
tau412=-log(G412)./alpha412;
tau416=-log(G416)./alpha416;
tau56=-log(G56)./alpha56;
tau57=-log(G57)./alpha57;
tau512=-log(G512)./alpha512;
tau516=-log(G516)./alpha516;
ln_y1_1=G12.*X2.*tau12+ G31.*X3.*tau13+ G14.*X4.*tau14+G15.*X5.*tau15+G16.*X6.*tau16+G17.*X7.*tau17+G112.*X12.*tau112+G116.*X16.*tau116;
ln_y1_2=G12.*X2+ G13.*X3+ G14.*X4+G15.*X5+G16.*X6+G17.*X7+G112.*X12+G116.*X16;
ln_y2_1=G21.*X1.*tau12+ G32.*X3.*tau32+ G24.*X4.*tau24+G25.*X5.*tau25+G26.*X6.*tau26+G27.*X7.*tau27+G212.*X12.*tau212+G216.*X16.*tau216;
ln_y2_2=G12.*X1+ G23.*X3+G24.*X4+G25.*X5+G26.*X6+G27.*X7+G212.*X12+G216.*X16;
ln_y3_1=G13.*X1.*tau13+ G23.*X3.*tau23+ G34.*X4.*tau34+G35.*X5.*tau35+G36.*X6.*tau36+G37.*X7.*tau37+G312.*X12.*tau312+G316.*X16.*tau316;
ln_y3_2=G13.*X1+ G23.*X3+ G34.*X4+G35.*X5+G36.*X6+G37.*X7+G312.*X12+G316.*X16;
ln_y4_1=G14.*X1.*tau14+G24.*X2.*tau24+G34.*X3.*tau34+G46.*X6.*tau46+G47.*X7.*tau47+G412.*X12.*tau412+G416.*X16.*tau416;
ln_y4_2=G14.*X1+G24.*X2+G34.*X3+G46.*X6+G47.*X7+G412.*X12+G416.*X16;
ln_y5_1=G15.*X1.*tau15+G25.*X2.*tau25+G35.*X3.*tau35+G56.*X6.*tau56+G57.*X7.*tau57+G512.*X12.*tau512+G516.*X16.*tau516;
ln_y5_2=G15.*X1+G25.*X2+G35.*X3+G56.*X6+G57.*X7+G512.*X12+G516.*X16;
ln_y6_1=G16.*X1.*tau16+G26.*X2.*tau26+G36.*X3.*tau36+G46.*X4.*tau46+G56.*X5.*tau56;
ln_y6_2=G16.*X1+G26.*X2+G36.*X3+G46.*X4+G56.*X5;
ln_y7_1=G17.*X1.*tau17+G27.*X2.*tau27+G37.*X3.*tau37+G47.*X4.*tau47+G57.*X5.*tau57;
ln_y7_2=G17.*X1+G27.*X2+G37.*X3+G47.*X4+G57.*X5;
ln_y12_1=G112.*X1.*tau112+G212.*X2.*tau212+G312.*X3.*tau312+G412.*X4.*tau412+G512.*X5.*tau512;
ln_y12_2=G112.*X1+G212.*X2+G312.*X3+G412.*X4+G512.*X5;
ln_y16_1=G116.*X1.*tau116+G216.*X2.*tau216+G316.*X3.*tau316+G416.*X4.*tau416+G516.*X5.*tau516;
ln_y16_2=G116.*X1+G216.*X2+G316.*X3+G416.*X4+G516.*X5;
ln_y1_3=(((X2.*G12)./ln_y2_2).*(tau12-(ln_y2_1)./(ln_y2_2)))+(((X3.*G13)./ln_y3_2).*(tau13-(ln_y3_1)./(ln_y3_2)));
ln_y1_4=(((X6.*G16)./ln_y6_2).*(tau16- (ln_y6_1./ln_y6_2))) + (((X7.*G17)./ln_y7_2).*(tau17- (ln_y7_1./ln_y7_2)))+(((X12.*G12)./ln_y12_2).*(tau112- (ln_y12_1./ln_y12_2)))+(((X16.*G16)./ln_y16_2).*(tau116- (ln_y16_1./ln_y16_2)));
ln_y1_5=(((X4.*G14)./ln_y4_2).*(tau14- (ln_y4_1./ln_y4_2))) + (((X5.*G15)./ln_y5_2).*(tau15- (ln_y5_1./ln_y5_2)));
yk=exp((ln_y1_1./ln_y1_2) + ln_y1_3 + ln_y1_4+ ln_y1_5)'; % activity coefficient for H2O
if any(~isfinite(yk))
yk = 10 * ones(size(yk));
end

Get matrix from one text file

i need get matrix from attached text.
this file:
stk.v.11.0
WrittenBy STK_v11.2.0
BEGIN Attitude
NumberOfAttitudePoints 1441
BlockingFactor 20
InterpolationOrder 1
CentralBody Earth
ScenarioEpoch 1 Sep 2019 07:30:00.000000
Epoch in JDate format: 2458727.81250000000000
Epoch in YYDDD format: 19244.31250000000000
Time of first point: 1 Sep 2019 07:30:00.000000000 UTCG = 2458727.81250000000000 JDate = 19244.31250000000000 YYDDD
CoordinateAxes Fixed
AttitudeTimeQuaternions
0.00000000000000e+00 -6.31921733422396e-01 -3.17293118154861e-01 -3.46458799928973e-01 6.16414065342263e-01
6.00000000000000e+01 -6.43812442721383e-01 -3.36172942616126e-01 -3.26612615262581e-01 6.04828480481266e-01
1.20000000000000e+02 -6.55079207620880e-01 -3.54631351046725e-01 -3.06354155948717e-01 5.92667670562959e-01
1.80000000000000e+02 -6.65707851270427e-01 -3.72650894083426e-01 -2.85703135087044e-01 5.79946623834615e-01
i need seprate this 4 row of matrix and asset it to one matrix variable...
in fact i have bigger matrix than that (that near 1400 rows) i must extract from text file and store it in one variable of matrix...
Use readmatrix and specify the number of lines to skip to get to the matrix data. If the number is fixed at 25 as in your example, then:
>> a = readmatrix('textfile.txt','NumHeaderLines',25)
a =
0 -0.6319 -0.3173 -0.3465 0.6164
60.0000 -0.6438 -0.3362 -0.3266 0.6048
120.0000 -0.6551 -0.3546 -0.3064 0.5927
180.0000 -0.6657 -0.3727 -0.2857 0.5799

Analyze weather data stored in csv

I have some weather data stored in a csv file in the form of: „id, date, temperature, rainfall“, with id being the weather station and, obviously, date being the date of measurement. The file contains the data of 3 different stations over a period of 10 years.
What I'd like to do is analyze the data of each station and each year. For example: I'd like to calculate day-to-day differences in temperature [abs((n+1)-n)] for each station and each year.
I thought while-loops could be a possibility, with the loop calculating something as long as the id value is equal to the one in the next row.
But I’ve no idea how to do it.
Best regards
If you still need assistance, I would consider importing the .csv file data using "readtable". So long as only the first row are text, MATLAB will create a 'table' variable (this shouldn't be an issue for a .csv file). The individual columns can be accessed via "tablename.header" and can be reestablished as double data type (ex variable_1=tablename.header). You can then concatenate your dataset as you like. As for sorting by date and station id, I would advocate using "sortrows". For example, if the station id is the first column, sortrow(data,1) will sort "data" by the station id. sortrow(data, [1 2]) will sort "data" by the first column, then by the second column. From there, you can write an if statement to compare the station id's and perform the required calculations. I hope my brief answer is somewhat helpful.
A basic code structure would be:
path=['copy and paste file path here']; % show matlab where to look
data=readtable([path '\filename.csv'], 'ReadVariableNames',1); % read the file from csv format to table
variable1=data.header1 % general example of making double type variable from table
variable2=data.header2
variable3=data.header3
double_data=[variable1 variable2 variable3]; % concatenates the three columns together
sorted_data=sortrows(double_data, [1 2]); % sorts double_data by column 1 then column 2
It always helps to have actual data to work on and specifics as to what kind of output format is expected. Basically, ins and outs :) With the little info provided, I figured I would generate random data for you in the first section, and then calculate some stats in the second. I include the loop as an example since that's what you asked, but I highly recommend using vectorized calculations whenever available, such as the one done in summary stats.
%% example for weather stations
% generation of random data to correspond to what your csv file looks like
rng(1); % keeps the random seed for testing purposes
nbDates = 1000; % number of days of data
nbStations = 3; % number of weather stations
measureDates = repmat((now()-(nbDates-1):now())',nbStations,1); % nbDates days of data ending today
stationIds = kron((1:nbStations)',ones(nbDates,1)); % assuming 3 weather stations with IDs [1,2,3]
temp = rand(nbStations*nbDates,1)*70+30; % temperatures are in F and vary between 30 and 100 degrees
rain = max(rand(nbStations*nbDates,1)*40-20,0); % rain fall is 0 approximately half the time, and between 0mm and 20mm the rest of the time
csv = table(measureDates, stationIds, temp, rain);
clear measureDates stationIds temps rain;
% augment the original dataset as needed
years = year(csv.measureDates);
data = [csv,array2table(years)];
sorted = sortrows( data, {'stationIds', 'measureDates'}, {'ascend', 'ascend'} );
% example looping through your data
for i = 1 : size( sorted, 1 )
fprintf( 'Id=%d, year=%d, temp=%g, rain=%g', sorted.stationIds( i ), sorted.years( i ), sorted.temp( i ), sorted.rain( i ) );
if( i > 1 && sorted.stationIds( i )==sorted.stationIds( i-1 ) && sorted.years( i )==sorted.years( i-1 ) )
fprintf( ' => absolute difference with day before: %g', abs( sorted.temp( i ) - sorted.temp( i-1 ) ) );
end
fprintf( '\n' ); % new line
end
% depending on the statistics you wish to do, other more efficient ways of
% accessing summary stats might be accessible, for example:
grpstats( data ...
, {'stationIds','years'} ... % group by categories
, {'mean','min','max','meanci'} ... % statistics we want
, 'dataVars', {'temp','rain'} ... % variables on which to calculate stats
) % doesn't require data to be sorted or any looping
This produces one line printed for each row of data (and only calculates difference in temperature when there is no year or station change). It also produces some summary stats at the end, here's what I get:
stationIds years GroupCount mean_temp min_temp max_temp meanci_temp mean_rain min_rain max_rain meanci_rain
__________ _____ __________ _________ ________ ________ ________________ _________ ________ ________ ________________
1_2016 1 2016 82 63.13 30.008 99.22 58.543 67.717 6.1181 0 19.729 4.6284 7.6078
1_2017 1 2017 365 65.914 30.028 99.813 63.783 68.045 5.0075 0 19.933 4.3441 5.6708
1_2018 1 2018 365 65.322 30.218 99.773 63.275 67.369 4.7039 0 19.884 4.0615 5.3462
1_2019 1 2019 188 63.642 31.16 99.654 60.835 66.449 5.9186 0 19.864 4.9834 6.8538
2_2016 2 2016 82 65.821 31.078 98.144 61.179 70.463 4.7633 0 19.688 3.4369 6.0898
2_2017 2 2017 365 66.002 30.054 99.896 63.902 68.102 4.5902 0 19.902 3.9267 5.2537
2_2018 2 2018 365 66.524 30.072 99.852 64.359 68.69 4.9649 0 19.812 4.2967 5.6331
2_2019 2 2019 188 66.481 30.249 99.889 63.647 69.315 5.2711 0 19.811 4.3234 6.2189
3_2016 3 2016 82 61.996 32.067 98.802 57.831 66.161 4.5445 0 19.898 3.1523 5.9366
3_2017 3 2017 365 63.914 30.176 99.902 61.932 65.896 4.8879 0 19.934 4.246 5.5298
3_2018 3 2018 365 63.653 30.137 99.991 61.595 65.712 5.3728 0 19.909 4.6943 6.0514
3_2019 3 2019 188 64.201 30.078 99.8 61.319 67.082 5.3926 0 19.874 4.4541 6.3312

downsampling rate with movement data (first point equal from the original matrix)

I was wondering if the procedure applied trying to download the sample rate was the appropriate as follows the instruction: y = downsample(x,n)
downsamp_rate = 40;
downsampled_data = downsample(X,downsamp_rate);
.. because my doubt relays in why the first column from both matrices is exactly the same (the original matrix and the sample donwloaded)maintaining the same data....
then the other data have already transformed to a lower sample rate.
Thank you so much!
Best!
edited: Sample data. I pasted the data but I can upload de .mat files.
Original data.
column 1 column 2 column 3
-0,593600000000000 -0,592699999999996 -0,591899999999995
2,42180000000000 2,41010000000000 2,40360000000000
1,78550000000000 1,79020000000000 1,79530000000000
-1,30590000000000 -1,31520000000000 -1,31530000000000
-0,707800000000003 -0,712699999999999 -0,727700000000003
-0,986500000000001 -0,996000000000002 -1,00460000000000
-0,989699999999999 -0,989699999999999 -0,989699999999999
1,23500000000000 1,22970000000000 1,21880000000000
0,122899999999998 0,127899999999997 0,128899999999998
0,938300000000003 0,937500000000002 0,936200000000004
0,248600000000004 0,248500000000002 0,248700000000002
-0,381499999999996 -0,393199999999999 -0,393699999999997
0,294099999999997 0,279299999999999 0,271299999999997
-0,223200000000001 -0,223699999999999 -0,227299999999997
0,0879999999999992 0,117300000000004 0,122500000000003
-0,167899999999999 -0,170999999999999 -0,174800000000003
-0,687499999999996 -0,697199999999998 -0,701600000000002
-0,681700000000002 -0,682200000000000 -0,683000000000000
1,19659999999999 1,19670000000000 1,19490000000000
-0,565500000000008 -0,565199999999999 -0,557400000000008
Downsampled data
column 1 column 2 column 3
-0,593600000000000 0,821900000000003 0,936300000000001
2,42180000000000 1,14610000000000 -0,255400000000000
1,78550000000000 2,86550000000000 3,66890000000000
-1,30590000000000 7,01950000000000 12,9564000000000
-0,707800000000003 3,05920000000000 0,852999999999998
-0,986500000000001 -0,372200000000000 -0,951000000000002
-0,989699999999999 -0,988000000000000 -1,21730000000000
1,23500000000000 5,79700000000000 3,40880000000000
0,122899999999998 5,32230000000000 5,19260000000000
0,938300000000003 4,88130000000000 7,55900000000000
0,248600000000004 4,79290000000000 2,96620000000000
-0,381499999999996 -0,400000000000000 0,641500000000000
0,294099999999997 -0,131400000000004 -1,20040000000000
-0,223200000000001 1,49610000000000 1,59030000000000
0,0879999999999992 0,418700000000000 -0,0114999999999976
-0,167899999999999 0,0149999999999983 -0,857500000000000
-0,687499999999996 -0,593100000000002 0,119700000000000
-0,681700000000002 -0,170000000000003 0,126799999999999
1,19659999999999 1,17670000000000 1,15780000000000
-0,565500000000008 8,89019999999999 6,58569999999999
A possible for your output is a periodic input signal with a period length of downsamp_rate-1. To give a short demonstration:
>> X=repmat(1:39,1,10);
>> downsampled_data = downsample(X,downsamp_rate);
>> downsampled_data
downsampled_data =
Columns 1 through 9
1 2 3 4 5 6 7 8 9
Column 10
10
Thus, take a look at your rows 40,41,42. I assume the first value is identical to your row 1,2,3

Fails when using sequentialfs in MATLAB?

I am trying to use sequentialfs on a logistic regression in order to determine the variables to include. I have tried to modify the answer from here Sequential feature selection Matlab in order to make it work, but the handle part is tricky to me!
I am using
[b, dev, stats] = glmfit(X_train,y_train,'binomial','link','logit')
in order to fit a model, and then I use
y_hat = glmval(b,X_test,'logit');
to evaluate the model output. I have tried to make a handle
f = #(X_test, y_test)...
(sum(y_test ~= round(glmval(b,X_test,'logit'))))
Which just says that I have "Too many input arguments", when I use
sequentialfs(f,X,y)
Can anyone help identify the relevant variables? A data set is below, where all variables will be deemed relevant, but the real problem is of a much larger scale which leads to overparametrization of the model.
X=
0,0305780001742986 0,0293310740486058 0,0289653631914407 0,0313646568650811 0,0308948854477814 0,0298740323895053
0,0221062699789144 0,0213746063391538 0,0196872068542263 0,0209915418572080 0,0206064713419377 0,0198587113064423
0,0275588428138312 0,0273957214651399 0,0291622596392042 0,0313729847567230 0,0314439993783026 0,0307137185244424
0,0262451954064372 0,0230629198767100 0,0251192876802874 0,0243459679829053 0,0235752627390268 0,0245208219450122
0,0232074343987232 0,0272778269415268 0,0288725913067116 0,0274565324127577 0,0283032894902223 0,0287391056368953
0,0237589488887855 0,0251929947662669 0,0209989755767701 0,0146662148369109 0,0193830305676060 0,0198900627308170
0,0275142606053146 0,0311683593689258 0,0287109246912083 0,0307544961383919 0,0293964246310913 0,0291758079280418
0,0284337063240141 0,0249820611584738 0,0261664330153102 0,0270312804219022 0,0269178494606530 0,0273467522864029
0,0279150521116060 0,0314021886143824 0,0291020356476994 0,0278247389567505 0,0294200854382611 0,0306460336509255
0,0270622115094110 0,0317795784913586 0,0278283619288299 0,0307757941373800 0,0292513615541838 0,0283900407512898
0,0270275432108930 0,0330384417745352 0,0323886885104962 0,0330255939800101 0,0329789138848656 0,0333091935226094
0,0263417729025468 0,0243442097895390 0,0253328659546050 0,0270828343149025 0,0262845355278762 0,0257915212526289
0,0247503544929709 0,0282150822748136 0,0282408722769508 0,0306907484707723 0,0284025718962319 0,0280291257508206
0,0282116130443164 0,0259317921438547 0,0316179116969559 0,0300579055064814 0,0315134680888256 0,0299693403497154
0,0221040769734790 0,0232354360252008 0,0231588261739581 0,0240414200785524 0,0209509517094598 0,0223174875964419
0,0264965936690525 0,0312918915850473 0,0297480867085914 0,0349060220702562 0,0307640365732823 0,0299291946182921
0,0283027112468824 0,0288419304885060 0,0275801208398665 0,0239401671924088 0,0263296648119700 0,0260497226349653
0,0298063469363023 0,0253535298515575 0,0245113899712628 0,0158158669753461 0,0228044538675689 0,0221885556611738
0,0276409442724517 0,0283430130710139 0,0303893043659674 0,0314511518633802 0,0315673022208602 0,0302375851656905
0,0270849987059202 0,0312381323489334 0,0301662309393833 0,0290482017036615 0,0299207348490636 0,0295746114571195
0,0225683074444599 0,0297455473987182 0,0241145950178924 0,0233857691372279 0,0259911200866772 0,0248345888658664
0,0267870191916224 0,0254332496269710 0,0270915983261551 0,0263567311441536 0,0267470932454238 0,0279742674970829
0,0271933786309775 0,0274722435013798 0,0249484244285920 0,0301299698898670 0,0255527349811283 0,0263901147510067
0,0262114755708772 0,0168593285634855 0,0205147916736994 0,0227484518393022 0,0187327306255277 0,0197581601499656
0,0314796048983783 0,0281771847088388 0,0318159664952504 0,0325586660052902 0,0315277330661507 0,0324593871738733
0,0265694239044569 0,0239306067986609 0,0263341531447523 0,0277011505766640 0,0274385429019891 0,0258861916796922
0,0248731179403750 0,0253801474063385 0,0258606627949811 0,0234446644496543 0,0262626821946271 0,0265908368467206
0,0268335079060221 0,0327877888534457 0,0292050848084788 0,0286811931594028 0,0288058286572012 0,0297311873407772
0,0289655102183149 0,0297912585631799 0,0289955796469846 0,0301374575223736 0,0286017508461882 0,0294957275181626
0,0291599221376467 0,0284752300175276 0,0294302899166185 0,0291600417637315 0,0304544724881292 0,0299842921064845
0,0306574107981617 0,0282562949634004 0,0290757741762068 0,0266759231631069 0,0276079132984815 0,0283490628634946
0,0251419690541645 0,0190208495552799 0,0219740565645893 0,0229700198654972 0,0231913971059666 0,0218104715761714
0,0279646047804379 0,0213712975401601 0,0266486742241442 0,0282351537040943 0,0256010069497723 0,0254151479565295
0,0291819765619095 0,0274841050997730 0,0277287252877124 0,0253910521460239 0,0270103968729900 0,0280719282161061
0,0267034888169756 0,0230641558482702 0,0254891453796317 0,0246876229024146 0,0232209026694445 0,0255548967304529
0,0282525579813869 0,0275995431014968 0,0263616638576222 0,0282844384170093 0,0262147272435204 0,0257109782177451
0,0293098464976821 0,0323503780295293 0,0266630772869637 0,0228075737924046 0,0263296732877124 0,0254313353506367
y=
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
The function you use for sequentialfs should take in four variables. Yours only accepts two:
criterion = fun(XTRAIN,ytrain,XTEST,ytest)
This is because, even though you give only two variables, X and y, sequentialfs splits these into a training and testing subset (straight from the docs):
Starting from an empty feature set, sequentialfs creates candidate
feature subsets by sequentially adding each of the features not yet
selected. For each candidate feature subset, sequentialfs performs
10-fold cross-validation by repeatedly calling fun with different
training subsets of X and y, XTRAIN and ytrain, and test subsets of X
and y, XTEST and ytest
So the function passed for sequentialfs must take both the training and test subsets, e.g.:
function criterion = my_function(X_train,y_train,X_test,y_test)
[b, dev, stats] = glmfit(X_train,y_train,'binomial','link','logit')
y_hat = glmval(b,X_test,'logit');
criterion = sum(y_test ~= round(y_hat));
end
If this function is on your path you can pass it as #my_function, you don't have to make an anonymous function to get a handle.