An issue with argument "sortv" of function seqIplot() - traminer

I'm trying to plot individual sequences with the seqIplot() function in TraMineR. These sequences represent work trajectories, reported by a school's former graduates through a web questionnaire.
Using argument "sortv", I'd like to sort my sequences according to the order of the levels of one covariate, the year of graduation, named "PROMO".
"PROMO" is a factor variable contained in a data frame named "covariates.seq", gathering covariates together:
str(covariates.seq)
'data.frame': 733 obs. of 6 variables:
 $ ID_SQ           : Factor w/ 733 levels "1","2","3","5",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ SEXE            : Factor w/ 2 levels "Féminin","Masculin": 1 1 1 1 2 1 1 2 2 1 ...
 $ PROMO           : Factor w/ 6 levels "1997","1998",..: 1 2 2 4 4 3 2 2 2 2 ...
 $ DEPARTEMENT     : Factor w/ 10 levels "BC","GCU","GE",..: 1 4 7 8 7 9 9 7 7 4 ...
 $ NIVEAU_ADMISSION: Factor w/ 2 levels "En Premier Cycle",..: NA 1 1 1 1 1 NA 1 1 1 ...
 $ FILIERE_SECTION : Factor w/ 4 levels "Cursus Classique",..: NA 4 2 NA 1 1 NA NA 4 3 ...
I'm also using "SEXE", the graduates' gender, as a grouping variable. To plot the individual sequences this way, I use the following command:
seqIplot(sequences, group = covariates.seq$SEXE,
sortv = covariates.seq$PROMO,
cex.axis = 0.7, cex.legend = 0.7)
I expected that, by using a process time axis (with the year of graduation as sequence-dependent origin), sorting the sequences according to the order of the levels of "PROMO" would give a plot with groups of sequences from the longest (for the older graduates) to the shortest (for the younger graduates).
But I've got an issue: in the output plot, the sequences don't appear to be correctly sorted according to the levels of "PROMO". Indeed, by using "sortv = covariates.seq$PROMO" as in the command above, the plot doesn't show groups of sequences from the longest to the shortest, as expected. It looks like the plot obtained without using the argument "sortv" (see Figures below).
[Figure: sequence index plot without the "sortv" argument]
[Figure: sequence index plot using "sortv = covariates.seq$PROMO"]
Note that I have 733 individual sequences in my object "sequences", created as follows:
labs <- c("En poste", "Au chômage (d'au moins 6 mois)",
          "Autre situation (d'au moins 6 mois)",
          "En poursuite d'études (thèse ou hors thèse)",
          "En reprise d'études / formation (d'au moins 6 mois)")
codes <- c("En poste", "Au chômage", "Autre situation",
           "En poursuite d'études", "En reprise d'études / formation")
sequences <- seqdef(situations, alphabet = labs, states = codes,
                    left = NA, right = "DEL", missing = NA,
                    cnames = as.character(seq(0, 7400/365, 1/365)),
                    xtstep = 365)
The values of the covariates are sorted in the same order as the individual sequences. The covariate "PROMO" doesn't contain any missing value.
Something's going wrong, but what?
Thank you in advance for your help,
Best,
Arnaud.

Using a factor as sortv argument in seqIplot works fine as illustrated by the example below:
sdc <- c("aabbccdd","bbbccc","aaaddd","abcabcab")
sd <- seqdecomp(sdc, sep="")
seq <- seqdef(sd)
fac <- factor(c("2000","2001","2001","2000"))
par(mfrow=c(1,3))
seqIplot(seq, with.legend=FALSE)
seqIplot(seq, sortv=fac, with.legend=FALSE)
seqlegend(seq)

Having trouble in using nlinfit function in MATLAB

Please help me with this problem. I need to use the nlinfit function to fit unknown parameters, but it raises an error. Yesterday I was getting values for the fitted parameters, but when I run it now there is a problem with the function output used in the fit: it returns NaN, for the last iteration only. x is a matrix of three concatenated columns holding the independent variables, yk is the dependent variable, and taua holds the initial guesses for the parameters to be fitted.
function [yk]=activity_coefficientE(taua,x)
T=523;
alpha12=0.3; alpha13=0.3; alpha21=0.3; alpha23=0.3; alpha31=0.3; alpha32=0.3;
alpha18=0.2; alpha81=0.2; alpha28=0.2; alpha82=0.2; alpha38=0.2; alpha83=0.3;
alpha19=0.2; alpha91=0.2; alpha29=0.2; alpha92=0.2; alpha39=0.2; alpha93=0.2;
alpha110=0.2;alpha101=0.2;alpha210=0.2;alpha102=0.2;alpha310=0.2;alpha103=0.2;
alpha113=0.2;alpha131=0.2;alpha213=0.2;alpha132=0.2;alpha313=0.2;alpha133=0.2;
alpha114=0.2;alpha141=0.2;alpha214=0.2;alpha142=0.2;alpha314=0.2;alpha143=0.2;
alpha115=0.2;alpha151=0.2;alpha215=0.2;alpha152=0.2;alpha315=0.2;alpha153=0.2;
alpha117=0.2;alpha171=0.2;alpha217=0.2;alpha172=0.2;alpha317=0.2;alpha173=0.2;
alpha118=0.2;alpha181=0.2;alpha218=0.2;alpha182=0.2;alpha318=0.2;alpha183=0.2;
alpha810=0.2;alpha915=0.2;alpha1314=0.2;alpha108=0.2;alpha159=0.2;alpha1413=0.2;
alpha1718=0.2;alpha1817=0.2;
tau12=0; tau13=0; tau21=0; tau23=0; tau31=0; tau32=0;
%taua=randi([-5,5],1,112)
tau18=taua(1)+taua(57)/T;
tau81=taua(2)+taua(58)/T;
tau28=taua(3)+taua(59)/T;
tau82=taua(4)+taua(60)/T;
tau38=taua(5)+taua(61)/T;
tau83=taua(6)+taua(62)/T;
tau19=taua(7)+taua(63)/T;
tau91=taua(8)+taua(64)/T;
tau29=taua(9)+taua(65)/T;
tau92=taua(10)+taua(66)/T;
tau39=taua(11)+taua(67)/T;
tau93=taua(12)+taua(68)/T;
tau110=taua(13)+taua(69)/T;
tau101=taua(14)+taua(70)/T;
tau210=taua(15)+taua(71)/T;
tau102=taua(16)+taua(72)/T;
tau310=taua(17)+taua(73)/T;
tau103=taua(18)+taua(74)/T;
tau113=taua(19)+taua(75)/T;
tau131=taua(20)+taua(76)/T;
tau213=taua(21)+taua(77)/T;
tau132=taua(22)+taua(78)/T;
tau313=taua(23)+taua(79)/T;
tau133=taua(24)+taua(80)/T;
tau114=taua(25)+taua(81)/T;
tau141=taua(26)+taua(82)/T;
tau214=taua(27)+taua(83)/T;
tau142=taua(28)+taua(84)/T;
tau314=taua(29)+taua(85)/T;
tau143=taua(30)+taua(86)/T;
tau115=taua(31)+taua(87)/T;
tau151=taua(32)+taua(88)/T;
tau215=taua(33)+taua(89)/T;
tau152=taua(34)+taua(90)/T;
tau315=taua(35)+taua(91)/T;
tau153=taua(36)+taua(92)/T;
tau117=taua(37)+taua(93)/T;
tau171=taua(38)+taua(94)/T;
tau217=taua(39)+taua(95)/T;
tau172=taua(40)+taua(96)/T;
tau317=taua(41)+taua(97)/T;
tau173=taua(42)+taua(98)/T;
tau118=taua(43)+taua(99)/T;
tau181=taua(44)+taua(100)/T;
tau218=taua(45)+taua(101)/T;
tau182=taua(46)+taua(102)/T;
tau318=taua(47)+taua(103)/T;
tau183=taua(48)+taua(104)/T;
tau810=taua(49)+taua(105)/T;
tau108=taua(50)+taua(106)/T;
tau915=taua(51)+taua(107)/T;
tau159=taua(52)+taua(108)/T;
tau1314=taua(53)+taua(109)/T;
tau1413=taua(54)+taua(110)/T;
tau1718=taua(55)+taua(111)/T;
tau1817=taua(56)+taua(112)/T;
G12=exp(-(tau12*alpha12));
G21=exp(-(tau21*alpha21));
G13=exp(-(tau13*alpha13));
G31=exp(-(tau31*alpha31));
G23=exp(-(tau23*alpha23));
G32=exp(-(tau32*alpha32));
G18=exp(-(tau18*alpha18));
G81=exp(-(tau81*alpha81));
G28=exp(-(tau28*alpha28));
G82=exp(-(tau82*alpha82));
G38=exp(-(tau38*alpha83));
G83=exp(-(tau83*alpha83));
G19=exp(-(tau19*alpha19));
G91=exp(-(tau91*alpha91));
G29=exp(-(tau29*alpha29));
G92=exp(-(tau92*alpha92));
G39=exp(-(tau39*alpha39));
G93=exp(-(tau93*alpha93));
G110=exp(-(tau110*alpha110));
G101=exp(-(tau101*alpha101));
G210=exp(-(tau210*alpha210));
G102=exp(-(tau102*alpha102));
G310=exp(-(tau310*alpha310));
G103=exp(-(tau103*alpha103));
G113=exp(-(tau113*alpha113));
G131=exp(-(tau131*alpha131));
G213=exp(-(tau213*alpha213));
G132=exp(-(tau132*alpha132));
G313=exp(-(tau313*alpha313));
G133=exp(-(tau133*alpha133));
G114=exp(-(tau114*alpha114));
G141=exp(-(tau141*alpha141));
G214=exp(-(tau214*alpha214));
G142=exp(-(tau142*alpha142));
G314=exp(-(tau314*alpha314));
G143=exp(-(tau143*alpha143));
G115=exp(-(tau115*alpha115));
G151=exp(-(tau151*alpha151));
G215=exp(-(tau215*alpha215));
G152=exp(-(tau152*alpha152));
G315=exp(-(tau315*alpha315));
G153=exp(-(tau153*alpha153));
G117=exp(-(tau117*alpha117));
G171=exp(-(tau171*alpha171));
G217=exp(-(tau217*alpha217));
G172=exp(-(tau172*alpha172));
G317=exp(-(tau317*alpha317));
G173=exp(-(tau173*alpha173));
G118=exp(-(tau118*alpha118));
G181=exp(-(tau181*alpha181));
G218=exp(-(tau218*alpha218));
G182=exp(-(tau182*alpha182));
G318=exp(-(tau318*alpha318));
G183=exp(-(tau183*alpha183));
G810=exp(-(tau810*alpha810));
G108=exp(-(tau108*alpha108));
G915=exp(-(tau915*alpha915));
G159=exp(-(tau159*alpha159));
G1314=exp(-(tau1314*alpha1314));
G1413=exp(-(tau1413*alpha1413));
G1718=exp(-(tau1718*alpha1718));
G1817=exp(-(tau1817*alpha1817));
%calculating mole fractions of ionic species
x1=x(:,1);
x2=x(:,2);
x3=x(:,3);
%x1=[0.1577 0.1492 0.1462 0.1366 0.1299 0.1180 0.0863 0.0761 0.0550 ];
%x2=[0.8278 0.7945 0.7678 0.7450 0.6979 0.6309 0.4611 0.4114 0.2952 ];
%x3=[0.0145 0.0563 0.0860 0.1184 0.1722 0.2511 0.4526 0.5125 0.6498 ];
A=[0.0674243 0.0773881 0.0843400 0.0865343 0.0899223 0.0882858 0.0715087 0.0643867 0.0483658];
B=[0.0141081 0.0479814 0.0643151 0.0737477 0.0820756 0.0838701 0.0701576 0.0634457 0.0479639];
C=[0.0565665 0.0450072 0.0387724 0.0313828 0.02506094 0.0186280 0.0092734 0.0073438 0.0041595 ];
D=[0.0336447 0.0267694 0.0230611 0.0186659 0.0149058 0.0110795 0.0055157 0.0043679 0.0024739 ];
E=[0.0008148 0.0008756 0.00087131 0.0008794 0.0008711 0.0008441 0.0007384 0.0006997 0.0005980 ];
N=length(A);
x1n=zeros(N,1);x2n=zeros(N,1);x3n=zeros(N,1);
X1=zeros(N,1);X2=zeros(N,1);X3=zeros(N,1);X4=zeros(N,1);X5=zeros(N,1);X6=zeros(N,1);X7=zeros(N,1);
X12=zeros(N,1);X16=zeros(N,1);
for i=1:N
x1n(i)=(x1(i)-A(i)-D(i)-2*E(i)-C(i)+3*B(i))
x2n(i)=(x2(i)-A(i)-C(i)-D(i))
x3n(i)=(x3(i)-B(i))
X1(i)=(x1n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X2(i)=(x2n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X3(i)=(x3n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X4(i)=(A(i)+D(i)+E(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X5(i)=(C(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X6(i)=(A(i)-B(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X7(i)=(B(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X12(i)=(E(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
X16(i)=(C(i)+D(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)))
end
yc4=X4./(X4+X5);
yc5=X5./(X4+X5);
yc6=X6./(X6+X7+X12+X16);
yc7=X7./(X6+X7+X12+X16);
yc12=X12./(X6+X7+X12+X16);
yc16=X16./(X6+X7+X12+X16);
alpha14=yc6.*alpha18+yc7.*alpha19+yc12.*alpha113+yc16.*alpha117;
%alpha41=alpha14;
alpha24=yc6.*alpha28+yc7.*alpha29+yc12.*alpha213+yc16.*alpha217;
%alpha42=alpha24;
alpha34=yc6.*alpha38+yc7.*alpha39+yc12.*alpha313+yc16.*alpha317;
%alpha43=alpha34;
alpha15=yc6.*alpha110+yc7.*alpha115+yc12.*alpha114+yc16.*alpha118;
%alpha51=alpha15;
alpha25=yc6.*alpha210+yc7.*alpha215+yc12.*alpha214+yc16.*alpha218;
%alpha52=alpha25;
alpha35=yc6.*alpha310+yc7.*alpha315+yc12.*alpha314+yc16.*alpha318;
%alpha53=alpha35;
alpha16=yc4.*alpha81+yc5.*alpha101;
%alpha61=alpha16;
alpha26=yc4.*alpha82+yc5.*alpha102;
%alpha62=alpha26;
alpha36=yc4.*alpha83+yc5.*alpha103;
%alpha63=alpha36;
alpha17=yc4.*alpha91+yc5.*alpha151;
%alpha71=alpha17;
alpha27=yc4.*alpha92+yc5.*alpha152;
%alpha72=alpha27;
alpha37=yc4.*alpha93+yc5.*alpha153;
%alpha73=alpha37;
alpha112=yc4.*alpha131+yc5.*alpha141;
%alpha121=alpha112;
alpha212=yc4.*alpha132+yc5.*alpha142;
%alpha122=alpha212;
alpha312=yc4.*alpha133+yc5.*alpha143;
%alpha123=alpha312;
alpha116=yc4.*alpha171+yc5.*alpha181;
%alpha161=alpha116;
alpha216=yc4.*alpha172+yc5.*alpha182;
%alpha162=alpha216;
alpha316=yc4.*alpha173+yc5.*alpha183;
%alpha163=alpha316;
alpha46=yc5.*alpha810;
%alpha64=alpha46;
alpha47=yc5.*alpha915;
%alpha74=alpha47;
alpha412=yc5.*alpha1314;
%alpha124=alpha412;
alpha416=yc5.*alpha1718;
%alpha164=alpha416;
alpha56=yc4.*alpha108;
%alpha65=alpha56;
alpha57=yc4.*alpha159;
%alpha75=alpha57;
alpha512=yc4.*alpha1413;
%alpha125=alpha512;
alpha516=yc4.*alpha1817;
%alpha165=alpha516;
G14=yc6.*G18+yc7.*G19+yc12.*G113+yc16.*G117;
%G41=G14;
G24=yc6.*G28+yc7.*G29+yc12.*G213+yc16.*G217;
%G42=G24;
G34=yc6.*G38+yc7.*G39+yc12.*G313+yc16.*G317;
%G43=G34;
G15=yc6.*G110+yc7.*G115+yc12.*G114+yc16.*G118;
%G51=G15;
G25=yc6.*G210+yc7.*G215+yc12.*G214+yc16.*G218;
%G52=G25;
G35=yc6.*G310+yc7.*G315+yc12.*G314+yc16.*G318;
%G53=G35;
G16=yc4.*G81+yc5.*G101;
%G61=G16;
G26=yc4.*G82+yc5.*G102;
%G62=G26;
G36=yc4.*G83+yc5.*G103;
%G63=G36;
G17=yc4.*G91+yc5.*G151;
%G71=G17;
G27=yc4.*G92+yc5.*G152;
%G72=G27;
G37=yc4.*G93+yc5.*G153;
%G73=G37;
G112=yc4.*G131+yc5.*G141;
%G121=G112;
G212=yc4.*G132+yc5.*G142;
%G122=G212;
G312=yc4.*G133+yc5.*G143;
%G123=G312;
G116=yc4.*G171+yc5.*G181;
%G161=G116;
G216=yc4.*G172+yc5.*G182;
%G162=G216;
G316=yc4.*G173+yc5.*G183;
%G163=G316;
G46=yc5.*G810;
%G64=G46;
G47=yc5.*G915;
%G74=G47;
G412=yc5.*G1314;
%G124=G412;
G416=yc5.*G1718;
%G164=G416;
G56=yc4.*G108;
%G65=G56;
G57=yc4.*G159;
%G75=G57;
G512=yc4.*G1413;
%G125=G512;
G516=yc4.*G1817;
%G165=G516;
tau14=-log(G14)./alpha14;
%tau41=tau14;
tau24=-log(G24)./alpha24;
%tau42=tau24;
tau34=-log(G34)./alpha34;
%tau43=tau34;
tau15=-log(G15)./alpha15;
%tau51=tau15;
tau25=-log(G25)./alpha25;
%tau52=tau25;
tau35=-log(G35)./alpha35;
%tau53=tau35;
tau16=-log(G16)./alpha16;
%tau61=tau16;
tau26=-log(G26)./alpha26;
%tau62=tau26;
tau36=-log(G36)./alpha36;
%tau63=tau36;
tau17=-log(G17)./alpha17;
%tau71=tau17;
tau27=-log(G27)./alpha27;
%tau72=tau27;
tau37=-log(G37)./alpha37;
%tau73=tau37;
tau112=-log(G112)./alpha112;
%tau121=tau112;
tau212=-log(G212)./alpha212;
%tau122=tau212;
tau312=-log(G312)./alpha312;
%tau123=tau312;
tau116=-log(G116)./alpha116;
%tau161=tau116;
tau216=-log(G216)./alpha216;
%tau162=tau216;
tau316=-log(G316)./alpha316;
%tau163=tau316;
tau46=-log(G46)./alpha46;
%tau64=tau46;
tau47=-log(G47)./alpha47;
%tau74=tau47;
tau412=-log(G412)./alpha412;
%tau124=tau412;
tau416=-log(G416)./alpha416;
%tau164=tau416;
tau56=-log(G56)./alpha56;
%tau65=tau56;
tau57=-log(G57)./alpha57;
%tau75=tau57;
tau512=-log(G512)./alpha512;
%tau125=tau512;
tau516=-log(G516)./alpha516;
%tau165=tau516;
ln_y1_1=G12.*X2.*tau12+ G31.*X3.*tau13+ G14.*X4.*tau14+G15.*X5.*tau15+G16.*X6.*tau16+G17.*X7.*tau17+G112.*X12.*tau112+G116.*X16.*tau116;
ln_y1_2=G12.*X2+ G13.*X3+ G14.*X4+G15.*X5+G16.*X6+G17.*X7+G112.*X12+G116.*X16;
ln_y2_1=G21.*X1.*tau12+ G32.*X3.*tau32+ G24.*X4.*tau24+G25.*X5.*tau25+G26.*X6.*tau26+G27.*X7.*tau27+G212.*X12.*tau212+G216.*X16.*tau216;
ln_y2_2=G12.*X1+ G23.*X3+G24.*X4+G25.*X5+G26.*X6+G27.*X7+G212.*X12+G216.*X16;
ln_y3_1=G13.*X1.*tau13+ G23.*X3.*tau23+ G34.*X4.*tau34+G35.*X5.*tau35+G36.*X6.*tau36+G37.*X7.*tau37+G312.*X12.*tau312+G316.*X16.*tau316;
ln_y3_2=G13.*X1+ G23.*X3+ G34.*X4+G35.*X5+G36.*X6+G37.*X7+G312.*X12+G316.*X16;
ln_y4_1=G14.*X1.*tau14+G24.*X2.*tau24+G34.*X3.*tau34+G46.*X6.*tau46+G47.*X7.*tau47+G412.*X12.*tau412+G416.*X16.*tau416;
ln_y4_2=G14.*X1+G24.*X2+G34.*X3+G46.*X6+G47.*X7+G412.*X12+G416.*X16;
ln_y5_1=G15.*X1.*tau15+G25.*X2.*tau25+G35.*X3.*tau35+G56.*X6.*tau56+G57.*X7.*tau57+G512.*X12.*tau512+G516.*X16.*tau516;
ln_y5_2=G15.*X1+G25.*X2+G35.*X3+G56.*X6+G57.*X7+G512.*X12+G516.*X16;
ln_y6_1=G16.*X1.*tau16+G26.*X2.*tau26+G36.*X3.*tau36+G46.*X4.*tau46+G56.*X5.*tau56;
ln_y6_2=G16.*X1+G26.*X2+G36.*X3+G46.*X4+G56.*X5;
ln_y7_1=G17.*X1.*tau17+G27.*X2.*tau27+G37.*X3.*tau37+G47.*X4.*tau47+G57.*X5.*tau57;
ln_y7_2=G17.*X1+G27.*X2+G37.*X3+G47.*X4+G57.*X5;
ln_y12_1=G112.*X1.*tau112+G212.*X2.*tau212+G312.*X3.*tau312+G412.*X4.*tau412+G512.*X5.*tau512;
ln_y12_2=G112.*X1+G212.*X2+G312.*X3+G412.*X4+G512.*X5;
ln_y16_1=G116.*X1.*tau116+G216.*X2.*tau216+G316.*X3.*tau316+G416.*X4.*tau416+G516.*X5.*tau516;
ln_y16_2=G116.*X1+G216.*X2+G316.*X3+G416.*X4+G516.*X5;
ln_y1_3=(((X2.*G12)./ln_y2_2).*(tau12-(ln_y2_1)./(ln_y2_2)))+(((X3.*G13)./ln_y3_2).*(tau13-(ln_y3_1)./(ln_y3_2)));
ln_y1_4=(((X6.*G16)./ln_y6_2).*(tau16- (ln_y6_1./ln_y6_2))) + (((X7.*G17)./ln_y7_2).*(tau17- (ln_y7_1./ln_y7_2)))+(((X12.*G12)./ln_y12_2).*(tau112- (ln_y12_1./ln_y12_2)))+(((X16.*G16)./ln_y16_2).*(tau116- (ln_y16_1./ln_y16_2)));
ln_y1_5=(((X4.*G14)./ln_y4_2).*(tau14- (ln_y4_1./ln_y4_2))) + (((X5.*G15)./ln_y5_2).*(tau15- (ln_y5_1./ln_y5_2)));
yk=exp((ln_y1_1./ln_y1_2) + ln_y1_3 + ln_y1_4+ ln_y1_5) % activity coefficient for H2O
end
The other script, from which the function above is called:
% calling the function act_coeff to estimate the binary interaction parameters
for i=1:112
filename = 'EagelsDATA.xlsx'; %reading VLE data from excel file
Data = xlsread(filename);
x(:,1) = Data([10:15 17:19],16);
x(:,2) = Data([10:15 17:19],1);
x(:,3)= Data([10:15 17:19],2);
taua=(randi([-5,5],1,112));
yk=[0.0606 % values calculated from the function above; they will be used for fitting
0.4327
0.6545
0.9417
1.2570
1.6881
1.9108
1.7777
1.3821]
% taua =[ -2 3 4 -3 -4 1 4 -2 4 -4 -1 4 5 -3 3 2 -5 3 -4
% 1 4 1 5 -1 -1 -3 2 -3 4 3 4 2 5 4 -2 4 3 -1
% 1 0 -5 -5 -5 -3 4 2 1 4 0 2 -3 -4 5 0 -3 2 5
% 1 0 5 1 -3 5 4 1 5 2 3 2 0 -5 -4 -2 1 -2 5
%-5 5 -2 -2 4 1 -1 3 -1 1 5 -1 0 -1 4 5 5 1 4
% 1 0 4 -4 4 0 -1 -2 -5 -3 -4 -5
% -5 0 -2 0 -5] (random values for which yk was calculated, from the command
% taua = randi([-5,5],1,112))
try % try-catch used to continue the loop without stopping on encountering an error
[taua1]= nlinfit(x,yk,@activity_coefficientE,taua)
catch exception
continue
end
end
I cannot attach the Excel sheet here, so the data from it are as follows (each row below is one column of x):
x =[0.1577 0.1492 0.1462 0.1366 0.1299 0.1180 0.0863 0.0761 0.0550; % column 1
0.8278 0.7945 0.7678 0.7450 0.6979 0.6309 0.4611 0.4114 0.2952 ; % column 2
0.0145 0.0563 0.0860 0.1184 0.1722 0.2511 0.4526 0.5125 0.6498 ]; % column 3
I found 3 major problems with what you did.
Problem #1 - errors
You get the error because your function "activity_coefficientE" can sometimes return NaN or Inf values. My suggestion is to check for these values and, when they occur, set "yk" to a large value so that the optimizer in "nlinfit" stays away from coefficients that produce infinite or NaN values. Adding this code at the bottom of the function avoids the crashes:
if any(~isfinite(yk))
yk = 10 * ones(size(yk));
end
Problem #2 - random initial guesses
The trouble with using random numbers for your initial conditions is that every time you run it you get a different answer, so sometimes it works and sometimes it doesn't. If you set the random number generator seed, you get the same random numbers each time you run the script. If you change your seed, you get a different set of random numbers. I shortened your main script to this, where I try 100 different random seeds (and store the results of each attempt) to see what answers result:
for i=1:100
rng(i)
taua = randi([-5,5],1,112);
taua1(i, :) = nlinfit(x,yk,@activity_coefficientE,taua);
end
Each row of "taua1" is a set of 112 coefficients.
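To see why fixing the seed makes the runs repeatable, here is a minimal sketch using only base MATLAB. The variable names guessA, guessB and guessC are made up for illustration and do not appear in the scripts above:

```matlab
% Re-seeding with the same value reproduces the same initial guesses.
rng(1);
guessA = randi([-5,5], 1, 112);
rng(1);
guessB = randi([-5,5], 1, 112);
sameGuess = isequal(guessA, guessB);   % true: same seed, same draw

% A different seed starts a different random stream, so the
% 112 guesses will generally differ from guessA.
rng(2);
guessC = randi([-5,5], 1, 112);
```

Recording the seed next to each fitted row is what lets you rerun exactly the attempts that failed or succeeded.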
Problem #3 - Trying to fit 9 points with 112 coefficients
Every time nlinfit is called, you get this warning:
Warning: Rank deficient
because you are asking nlinfit to find more coefficients (112) than you have data points to fit (9). It's like trying to find the 2nd-order equation that best fits 2 points: there are an infinite number of solutions. When curve fitting you should have more data points than coefficients to make sure you're not fitting noise. You need more data points in "yk" and "x" and/or fewer coefficients to fit. I've done a lot of curve fitting and I've never seen an equation with 112 coefficients, so I am thinking that you are not solving the problem correctly. Perhaps the 112 coefficients aren't really independent, or perhaps there are really 112 data points and 9 coefficients that you want to find.
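The two-point analogy can be checked directly. The sketch below is a toy example, not your model: the points and the two coefficient sets are invented for illustration. It shows a quadratic with 3 coefficients fitted to 2 points, where the design matrix has rank 2 and two different coefficient vectors both reproduce the data exactly:

```matlab
% Two data points, three unknown coefficients of y = c(1)*x^2 + c(2)*x + c(3).
x = [1; 2];
y = [3; 5];
V = [x.^2, x, ones(size(x))];   % 2-by-3 design matrix
r = rank(V);                    % 2, which is less than the 3 coefficients

% Two different coefficient sets, both with zero residual:
c1 = [0;  2; 1];                % y = 2x + 1
c2 = [1; -1; 3];                % y = x^2 - x + 3
res1 = V*c1 - y;
res2 = V*c2 - y;
```

With 112 coefficients and 9 points the same thing happens on a larger scale, so the fitted values depend on the starting guess rather than on the data.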
For completeness, here is my edited version of the activity_coefficientE.m function that I created to work on this solution. In general, I never see Matlab code with this many variables with similar names. Much of this code could be greatly simplified by using vector operations. Most of my changes involve formatting, adding semicolons, and the checks for non-finite values at the end.
function yk=activity_coefficientE(taua,x)
T=523;
alpha12=0.3; alpha13=0.3; alpha21=0.3; alpha23=0.3; alpha31=0.3; alpha32=0.3;
alpha18=0.2; alpha81=0.2; alpha28=0.2; alpha82=0.2; alpha38=0.2; alpha83=0.3;
alpha19=0.2; alpha91=0.2; alpha29=0.2; alpha92=0.2; alpha39=0.2; alpha93=0.2;
alpha110=0.2;alpha101=0.2;alpha210=0.2;alpha102=0.2;alpha310=0.2;alpha103=0.2;
alpha113=0.2;alpha131=0.2;alpha213=0.2;alpha132=0.2;alpha313=0.2;alpha133=0.2;
alpha114=0.2;alpha141=0.2;alpha214=0.2;alpha142=0.2;alpha314=0.2;alpha143=0.2;
alpha115=0.2;alpha151=0.2;alpha215=0.2;alpha152=0.2;alpha315=0.2;alpha153=0.2;
alpha117=0.2;alpha171=0.2;alpha217=0.2;alpha172=0.2;alpha317=0.2;alpha173=0.2;
alpha118=0.2;alpha181=0.2;alpha218=0.2;alpha182=0.2;alpha318=0.2;alpha183=0.2;
alpha810=0.2;alpha915=0.2;alpha1314=0.2;alpha108=0.2;alpha159=0.2;alpha1413=0.2;
alpha1718=0.2;alpha1817=0.2;
tau12=0; tau13=0; tau21=0; tau23=0; tau31=0; tau32=0;
tau18=taua(1)+taua(57)/T;
tau81=taua(2)+taua(58)/T;
tau28=taua(3)+taua(59)/T;
tau82=taua(4)+taua(60)/T;
tau38=taua(5)+taua(61)/T;
tau83=taua(6)+taua(62)/T;
tau19=taua(7)+taua(63)/T;
tau91=taua(8)+taua(64)/T;
tau29=taua(9)+taua(65)/T;
tau92=taua(10)+taua(66)/T;
tau39=taua(11)+taua(67)/T;
tau93=taua(12)+taua(68)/T;
tau110=taua(13)+taua(69)/T;
tau101=taua(14)+taua(70)/T;
tau210=taua(15)+taua(71)/T;
tau102=taua(16)+taua(72)/T;
tau310=taua(17)+taua(73)/T;
tau103=taua(18)+taua(74)/T;
tau113=taua(19)+taua(75)/T;
tau131=taua(20)+taua(76)/T;
tau213=taua(21)+taua(77)/T;
tau132=taua(22)+taua(78)/T;
tau313=taua(23)+taua(79)/T;
tau133=taua(24)+taua(80)/T;
tau114=taua(25)+taua(81)/T;
tau141=taua(26)+taua(82)/T;
tau214=taua(27)+taua(83)/T;
tau142=taua(28)+taua(84)/T;
tau314=taua(29)+taua(85)/T;
tau143=taua(30)+taua(86)/T;
tau115=taua(31)+taua(87)/T;
tau151=taua(32)+taua(88)/T;
tau215=taua(33)+taua(89)/T;
tau152=taua(34)+taua(90)/T;
tau315=taua(35)+taua(91)/T;
tau153=taua(36)+taua(92)/T;
tau117=taua(37)+taua(93)/T;
tau171=taua(38)+taua(94)/T;
tau217=taua(39)+taua(95)/T;
tau172=taua(40)+taua(96)/T;
tau317=taua(41)+taua(97)/T;
tau173=taua(42)+taua(98)/T;
tau118=taua(43)+taua(99)/T;
tau181=taua(44)+taua(100)/T;
tau218=taua(45)+taua(101)/T;
tau182=taua(46)+taua(102)/T;
tau318=taua(47)+taua(103)/T;
tau183=taua(48)+taua(104)/T;
tau810=taua(49)+taua(105)/T;
tau108=taua(50)+taua(106)/T;
tau915=taua(51)+taua(107)/T;
tau159=taua(52)+taua(108)/T;
tau1314=taua(53)+taua(109)/T;
tau1413=taua(54)+taua(110)/T;
tau1718=taua(55)+taua(111)/T;
tau1817=taua(56)+taua(112)/T;
G12=exp(-(tau12*alpha12));
G21=exp(-(tau21*alpha21));
G13=exp(-(tau13*alpha13));
G31=exp(-(tau31*alpha31));
G23=exp(-(tau23*alpha23));
G32=exp(-(tau32*alpha32));
G18=exp(-(tau18*alpha18));
G81=exp(-(tau81*alpha81));
G28=exp(-(tau28*alpha28));
G82=exp(-(tau82*alpha82));
G38=exp(-(tau38*alpha83));
G83=exp(-(tau83*alpha83));
G19=exp(-(tau19*alpha19));
G91=exp(-(tau91*alpha91));
G29=exp(-(tau29*alpha29));
G92=exp(-(tau92*alpha92));
G39=exp(-(tau39*alpha39));
G93=exp(-(tau93*alpha93));
G110=exp(-(tau110*alpha110));
G101=exp(-(tau101*alpha101));
G210=exp(-(tau210*alpha210));
G102=exp(-(tau102*alpha102));
G310=exp(-(tau310*alpha310));
G103=exp(-(tau103*alpha103));
G113=exp(-(tau113*alpha113));
G131=exp(-(tau131*alpha131));
G213=exp(-(tau213*alpha213));
G132=exp(-(tau132*alpha132));
G313=exp(-(tau313*alpha313));
G133=exp(-(tau133*alpha133));
G114=exp(-(tau114*alpha114));
G141=exp(-(tau141*alpha141));
G214=exp(-(tau214*alpha214));
G142=exp(-(tau142*alpha142));
G314=exp(-(tau314*alpha314));
G143=exp(-(tau143*alpha143));
G115=exp(-(tau115*alpha115));
G151=exp(-(tau151*alpha151));
G215=exp(-(tau215*alpha215));
G152=exp(-(tau152*alpha152));
G315=exp(-(tau315*alpha315));
G153=exp(-(tau153*alpha153));
G117=exp(-(tau117*alpha117));
G171=exp(-(tau171*alpha171));
G217=exp(-(tau217*alpha217));
G172=exp(-(tau172*alpha172));
G317=exp(-(tau317*alpha317));
G173=exp(-(tau173*alpha173));
G118=exp(-(tau118*alpha118));
G181=exp(-(tau181*alpha181));
G218=exp(-(tau218*alpha218));
G182=exp(-(tau182*alpha182));
G318=exp(-(tau318*alpha318));
G183=exp(-(tau183*alpha183));
G810=exp(-(tau810*alpha810));
G108=exp(-(tau108*alpha108));
G915=exp(-(tau915*alpha915));
G159=exp(-(tau159*alpha159));
G1314=exp(-(tau1314*alpha1314));
G1413=exp(-(tau1413*alpha1413));
G1718=exp(-(tau1718*alpha1718));
G1817=exp(-(tau1817*alpha1817));
%calculating mole fractions of ionic species
x1=x(:,1);
x2=x(:,2);
x3=x(:,3);
A=[0.0674243 0.0773881 0.0843400 0.0865343 0.0899223 0.0882858 0.0715087 0.0643867 0.0483658];
B=[0.0141081 0.0479814 0.0643151 0.0737477 0.0820756 0.0838701 0.0701576 0.0634457 0.0479639];
C=[0.0565665 0.0450072 0.0387724 0.0313828 0.02506094 0.0186280 0.0092734 0.0073438 0.0041595 ];
D=[0.0336447 0.0267694 0.0230611 0.0186659 0.0149058 0.0110795 0.0055157 0.0043679 0.0024739 ];
E=[0.0008148 0.0008756 0.00087131 0.0008794 0.0008711 0.0008441 0.0007384 0.0006997 0.0005980 ];
N=length(A);
x1n=zeros(N,1);x2n=zeros(N,1);x3n=zeros(N,1);
X1=zeros(N,1);X2=zeros(N,1);X3=zeros(N,1);X4=zeros(N,1);X5=zeros(N,1);X6=zeros(N,1);X7=zeros(N,1);
X12=zeros(N,1);X16=zeros(N,1);
for i=1:N
x1n(i)=(x1(i)-A(i)-D(i)-2*E(i)-C(i)+3*B(i));
x2n(i)=(x2(i)-A(i)-C(i)-D(i));
x3n(i)=(x3(i)-B(i));
X1(i)=(x1n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X2(i)=(x2n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X3(i)=(x3n(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X4(i)=(A(i)+D(i)+E(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X5(i)=(C(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X6(i)=(A(i)-B(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X7(i)=(B(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X12(i)=(E(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
X16(i)=(C(i)+D(i)/(x1n(i)+x2n(i)+x3n(i)+2*A(i)+4*B(i)+2*C(i)+2*D(i)+2*E(i)));
end
yc4=X4./(X4+X5);
yc5=X5./(X4+X5);
yc6=X6./(X6+X7+X12+X16);
yc7=X7./(X6+X7+X12+X16);
yc12=X12./(X6+X7+X12+X16);
yc16=X16./(X6+X7+X12+X16);
alpha14=yc6.*alpha18+yc7.*alpha19+yc12.*alpha113+yc16.*alpha117;
alpha24=yc6.*alpha28+yc7.*alpha29+yc12.*alpha213+yc16.*alpha217;
alpha34=yc6.*alpha38+yc7.*alpha39+yc12.*alpha313+yc16.*alpha317;
alpha15=yc6.*alpha110+yc7.*alpha115+yc12.*alpha114+yc16.*alpha118;
alpha25=yc6.*alpha210+yc7.*alpha215+yc12.*alpha214+yc16.*alpha218;
alpha35=yc6.*alpha310+yc7.*alpha315+yc12.*alpha314+yc16.*alpha318;
alpha16=yc4.*alpha81+yc5.*alpha101;
alpha26=yc4.*alpha82+yc5.*alpha102;
alpha36=yc4.*alpha83+yc5.*alpha103;
alpha17=yc4.*alpha91+yc5.*alpha151;
alpha27=yc4.*alpha92+yc5.*alpha152;
alpha37=yc4.*alpha93+yc5.*alpha153;
alpha112=yc4.*alpha131+yc5.*alpha141;
alpha212=yc4.*alpha132+yc5.*alpha142;
alpha312=yc4.*alpha133+yc5.*alpha143;
alpha116=yc4.*alpha171+yc5.*alpha181;
alpha216=yc4.*alpha172+yc5.*alpha182;
alpha316=yc4.*alpha173+yc5.*alpha183;
alpha46=yc5.*alpha810;
alpha47=yc5.*alpha915;
alpha412=yc5.*alpha1314;
alpha416=yc5.*alpha1718;
alpha56=yc4.*alpha108;
alpha57=yc4.*alpha159;
alpha512=yc4.*alpha1413;
alpha516=yc4.*alpha1817;
G14=yc6.*G18+yc7.*G19+yc12.*G113+yc16.*G117;
G24=yc6.*G28+yc7.*G29+yc12.*G213+yc16.*G217;
G34=yc6.*G38+yc7.*G39+yc12.*G313+yc16.*G317;
G15=yc6.*G110+yc7.*G115+yc12.*G114+yc16.*G118;
G25=yc6.*G210+yc7.*G215+yc12.*G214+yc16.*G218;
G35=yc6.*G310+yc7.*G315+yc12.*G314+yc16.*G318;
G16=yc4.*G81+yc5.*G101;
G26=yc4.*G82+yc5.*G102;
G36=yc4.*G83+yc5.*G103;
G17=yc4.*G91+yc5.*G151;
G27=yc4.*G92+yc5.*G152;
G37=yc4.*G93+yc5.*G153;
G112=yc4.*G131+yc5.*G141;
G212=yc4.*G132+yc5.*G142;
G312=yc4.*G133+yc5.*G143;
G116=yc4.*G171+yc5.*G181;
G216=yc4.*G172+yc5.*G182;
G316=yc4.*G173+yc5.*G183;
G46=yc5.*G810;
G47=yc5.*G915;
G412=yc5.*G1314;
G416=yc5.*G1718;
G56=yc4.*G108;
G57=yc4.*G159;
G512=yc4.*G1413;
G516=yc4.*G1817;
tau14=-log(G14)./alpha14;
tau24=-log(G24)./alpha24;
tau34=-log(G34)./alpha34;
tau15=-log(G15)./alpha15;
tau25=-log(G25)./alpha25;
tau35=-log(G35)./alpha35;
tau16=-log(G16)./alpha16;
tau26=-log(G26)./alpha26;
tau36=-log(G36)./alpha36;
tau17=-log(G17)./alpha17;
tau27=-log(G27)./alpha27;
tau37=-log(G37)./alpha37;
tau112=-log(G112)./alpha112;
tau212=-log(G212)./alpha212;
tau312=-log(G312)./alpha312;
tau116=-log(G116)./alpha116;
tau216=-log(G216)./alpha216;
tau316=-log(G316)./alpha316;
tau46=-log(G46)./alpha46;
tau47=-log(G47)./alpha47;
tau412=-log(G412)./alpha412;
tau416=-log(G416)./alpha416;
tau56=-log(G56)./alpha56;
tau57=-log(G57)./alpha57;
tau512=-log(G512)./alpha512;
tau516=-log(G516)./alpha516;
ln_y1_1=G12.*X2.*tau12+ G31.*X3.*tau13+ G14.*X4.*tau14+G15.*X5.*tau15+G16.*X6.*tau16+G17.*X7.*tau17+G112.*X12.*tau112+G116.*X16.*tau116;
ln_y1_2=G12.*X2+ G13.*X3+ G14.*X4+G15.*X5+G16.*X6+G17.*X7+G112.*X12+G116.*X16;
ln_y2_1=G21.*X1.*tau12+ G32.*X3.*tau32+ G24.*X4.*tau24+G25.*X5.*tau25+G26.*X6.*tau26+G27.*X7.*tau27+G212.*X12.*tau212+G216.*X16.*tau216;
ln_y2_2=G12.*X1+ G23.*X3+G24.*X4+G25.*X5+G26.*X6+G27.*X7+G212.*X12+G216.*X16;
ln_y3_1=G13.*X1.*tau13+ G23.*X3.*tau23+ G34.*X4.*tau34+G35.*X5.*tau35+G36.*X6.*tau36+G37.*X7.*tau37+G312.*X12.*tau312+G316.*X16.*tau316;
ln_y3_2=G13.*X1+ G23.*X3+ G34.*X4+G35.*X5+G36.*X6+G37.*X7+G312.*X12+G316.*X16;
ln_y4_1=G14.*X1.*tau14+G24.*X2.*tau24+G34.*X3.*tau34+G46.*X6.*tau46+G47.*X7.*tau47+G412.*X12.*tau412+G416.*X16.*tau416;
ln_y4_2=G14.*X1+G24.*X2+G34.*X3+G46.*X6+G47.*X7+G412.*X12+G416.*X16;
ln_y5_1=G15.*X1.*tau15+G25.*X2.*tau25+G35.*X3.*tau35+G56.*X6.*tau56+G57.*X7.*tau57+G512.*X12.*tau512+G516.*X16.*tau516;
ln_y5_2=G15.*X1+G25.*X2+G35.*X3+G56.*X6+G57.*X7+G512.*X12+G516.*X16;
ln_y6_1=G16.*X1.*tau16+G26.*X2.*tau26+G36.*X3.*tau36+G46.*X4.*tau46+G56.*X5.*tau56;
ln_y6_2=G16.*X1+G26.*X2+G36.*X3+G46.*X4+G56.*X5;
ln_y7_1=G17.*X1.*tau17+G27.*X2.*tau27+G37.*X3.*tau37+G47.*X4.*tau47+G57.*X5.*tau57;
ln_y7_2=G17.*X1+G27.*X2+G37.*X3+G47.*X4+G57.*X5;
ln_y12_1=G112.*X1.*tau112+G212.*X2.*tau212+G312.*X3.*tau312+G412.*X4.*tau412+G512.*X5.*tau512;
ln_y12_2=G112.*X1+G212.*X2+G312.*X3+G412.*X4+G512.*X5;
ln_y16_1=G116.*X1.*tau116+G216.*X2.*tau216+G316.*X3.*tau316+G416.*X4.*tau416+G516.*X5.*tau516;
ln_y16_2=G116.*X1+G216.*X2+G316.*X3+G416.*X4+G516.*X5;
ln_y1_3=(((X2.*G12)./ln_y2_2).*(tau12-(ln_y2_1)./(ln_y2_2)))+(((X3.*G13)./ln_y3_2).*(tau13-(ln_y3_1)./(ln_y3_2)));
ln_y1_4=(((X6.*G16)./ln_y6_2).*(tau16- (ln_y6_1./ln_y6_2))) + (((X7.*G17)./ln_y7_2).*(tau17- (ln_y7_1./ln_y7_2)))+(((X12.*G12)./ln_y12_2).*(tau112- (ln_y12_1./ln_y12_2)))+(((X16.*G16)./ln_y16_2).*(tau116- (ln_y16_1./ln_y16_2)));
ln_y1_5=(((X4.*G14)./ln_y4_2).*(tau14- (ln_y4_1./ln_y4_2))) + (((X5.*G15)./ln_y5_2).*(tau15- (ln_y5_1./ln_y5_2)));
yk=exp((ln_y1_1./ln_y1_2) + ln_y1_3 + ln_y1_4+ ln_y1_5)'; % activity coefficient for H2O
if any(~isfinite(yk))
yk = 10 * ones(size(yk));
end
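As a sketch of the vector-operations remark above: the 56 assignments tau18 ... tau1817 all follow the pattern taua(k) + taua(k+56)/T, so they collapse to one line, and the matching G values follow in another. The vector names tau, alpha and G and the placeholder values of taua are assumptions for illustration, not part of the function above:

```matlab
T = 523;
taua = linspace(-5, 5, 112);      % arbitrary placeholder parameters

% One vector replaces tau18, tau81, ..., tau1817:
% tau(k) = taua(k) + taua(k+56)/T for k = 1..56.
tau = taua(1:56) + taua(57:112) / T;

% One vector of alpha values replaces alpha18, alpha81, ...
% (0.2 everywhere here for simplicity; the function sets alpha83 to 0.3).
alpha = 0.2 * ones(1, 56);
G = exp(-tau .* alpha);           % replaces G18, G81, ..., G1817

% Spot-check against the scalar form used in the function.
tau18 = taua(1) + taua(57) / T;
G18   = exp(-(tau18 * 0.2));
```

Indexing into tau and G by position instead of by name would then require a fixed mapping from species pairs to indices, which is a design choice left open here.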

With gtsummary, is it possible to have N on a separate row to the column name?

gtsummary by default puts the number of observations in a by group beside the label for that group. This increases the width of the table... with many groups or large N, the table would quickly become very wide.
Is it possible to get gtsummary to report N on a separate row beneath the label? E.g.
> data(mtcars)
> mtcars %>%
+ select(mpg, cyl, vs, am) %>%
+ tbl_summary(by = am) %>%
+ as_tibble()
# A tibble: 6 x 3
`**Characteristic**` `**0**, N = 19` `**1**, N = 13`
<chr> <chr> <chr>
1 mpg 17.3 (14.9, 19.2) 22.8 (21.0, 30.4)
2 cyl NA NA
3 4 3 (16%) 8 (62%)
4 6 4 (21%) 3 (23%)
5 8 12 (63%) 2 (15%)
6 vs 7 (37%) 7 (54%)
would become
# A tibble: 7 x 3
`**Characteristic**` `**0**` `**1**`
<chr> <chr> <chr>
1 N = 19 N = 13
2 mpg 17.3 (14.9, 19.2) 22.8 (21.0, 30.4)
3 cyl NA NA
4 4 3 (16%) 8 (62%)
5 6 4 (21%) 3 (23%)
6 8 12 (63%) 2 (15%)
7 vs 7 (37%) 7 (54%)
(I only used as_tibble so that it was easy to show what I mean by editing it manually...)
Any idea?
Thanks!
Here is one way you could do this:
library(tidyverse)
library(gtsummary)
mtcars %>%
select(mpg, cyl, vs, am) %>%
# create a new variable to display N in table
mutate(total = 1) %>%
# this is just to reorder variables for table
select(total, everything()) %>%
tbl_summary(
by = am,
# this is to specify you only want N (and no percentage) for new total variable
statistic = total ~ "N = {N}") %>%
# this is a gtsummary function that allows you to edit the header
modify_header(all_stat_cols() ~ "**{level}**")
First, I am making a new variable that is just total observations (called total)
Then, I am customizing the way I want that variable statistic to be displayed
Then I am using gtsummary::modify_header() to remove N from the header
Additionally, if you use the flextable print engine, you can add a line break in the header itself:
mtcars %>%
  select(mpg, cyl, vs, am) %>%
  tbl_summary(by = am) %>%
  # modify_header() is a gtsummary function that allows you to edit the header;
  # the "\n" puts N on its own line beneath the group label
  modify_header(all_stat_cols() ~ "**{level}**\nN = {n}") %>%
  as_flex_table()
Good luck!
@kittykatstat already posted two fantastic solutions! I'll just add a slight variation :)
If you want to use the {gt} package to print the table and you're outputting to HTML, you can use the HTML tag <br> to add a line break in the header row (very similar to the \n solution already posted).
library(gtsummary)
mtcars %>%
  select(mpg, cyl, vs, am) %>%
  # in mtcars, am is coded 0 = automatic, 1 = manual
  dplyr::mutate(am = factor(am, labels = c("Automatic", "Manual"))) %>%
  tbl_summary(by = am) %>%
  # modify_header() is a gtsummary function that allows you to edit the header;
  # {n} is the number of observations in each group ({N} would be the overall total)
  modify_header(all_stat_cols() ~ "**{level}**<br>N = {n}")

Text file processing in Matlab

I have text output files from a program, all in a fixed format. I need to parse about 200 of them to extract some information. I tried 'textscan' in MATLAB, but it did not work. The input looks like this:
MOTIFS SUMMARY:
1) TTATAGCCGC (GCGGCTATAA) 1.986
2) AAACCGCCTC (GAGGCGGTTT) 1.865
DETAILED RESULTS:
1) TTATAGCCGC (GCGGCTATAA) 1.986
Matrix: MAT1 TTATAGCCGC
A 0.1249 0.177 0.7364 0.1189 0.7072 0.1149 0.09858 0.1096
C 0.0899 0.07379 0.1136 0.1298 0.08662 0.1293 0.7528 0.721
G 0.06828 0.1284 0.07195 0.1031 0.1352 0.6708 0.05556 0.0713
T 0.7169 0.6209 0.07802 0.6482 0.07096 0.08492 0.09305 0.09804
OCCURRENCES:
>GENE_1 1 TTATAGCCGC 1 561 +
>GENE_2 24 TAATAGCCGC 0.928699 762 -
>GENE_3 10 ATATAGCCGC 0.904905 185 -
>GENE_1 7 TTATAGCAGC 0.901785 726 +
**********
2) AAACCGCCTC (GAGGCGGTTT) 1.865
Matrix: MAT2 AAACCGCCTC
A 0.653 0.7401 0.7763 0.1323 0.09619 0.09134 0.07033 0.1383
C 0.1163 0.07075 0.09441 0.749 0.6347 0.1132 0.6559 0.6982
G 0.09136 0.09402 0.07385 0.04209 0.1799 0.7332 0.1241 0.07568
T 0.1393 0.09518 0.05541 0.07659 0.08921 0.06234 0.1497 0.08786
OCCURRENCES:
>GENE_1 21 AAACCGCCTC 1 963 +
>GENE_2 14 AAACGGCCTC 0.928198 212 +
>GENE_2 8 AAACCGTCTC 0.92009 170 +
>GENE_4 3 TAACCGCCTC 0.918883 370 +
**********
I am trying to count the unique occurrences (as unique() would) under each motif, append that count to the MOTIFS SUMMARY, and report a final average of the counts. My expected output is:
MOTIFS SUMMARY:
1) TTATAGCCGC (GCGGCTATAA) 1.986 3
2) AAACCGCCTC (GAGGCGGTTT) 1.865 3
AVERAGE OCCURRENCE: 3
For motif 1, the number of unique occurrences is 3 (GENE_1, GENE_2, GENE_3). Similarly, for motif 2 it is again 3 (GENE_1, GENE_2, GENE_4).
How can I treat the lines between OCCURRENCES and ********** as a block, so that I can regexp GENE_x to store the matches and count them?
Kindly help.
Thanks,
AP
You would do better to change the original text file so that it becomes legal MATLAB m-file code, and then just use the 'eval' function to run it.
Most of the job will be finding where to insert '=', '[' and ']', and '%' for the parts to ignore.
If all files are identical in format, then it will be easy.
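As an alternative to rewriting the files for eval, the block structure asked about in the question (lines between OCCURRENCES: and the row of asterisks) can be read with a small line-by-line state machine. The following is only a sketch, written in Python rather than MATLAB; the function name count_unique_genes and the SAMPLE string (the question's input with the matrix rows trimmed) are illustrative assumptions, and it assumes every file follows exactly the format shown.

```python
import re

# Trimmed copy of the input from the question (matrix rows omitted for brevity).
SAMPLE = """\
MOTIFS SUMMARY:
1) TTATAGCCGC (GCGGCTATAA) 1.986
2) AAACCGCCTC (GAGGCGGTTT) 1.865
DETAILED RESULTS:
1) TTATAGCCGC (GCGGCTATAA) 1.986
Matrix: MAT1 TTATAGCCGC
OCCURRENCES:
>GENE_1 1 TTATAGCCGC 1 561 +
>GENE_2 24 TAATAGCCGC 0.928699 762 -
>GENE_3 10 ATATAGCCGC 0.904905 185 -
>GENE_1 7 TTATAGCAGC 0.901785 726 +
**********
2) AAACCGCCTC (GAGGCGGTTT) 1.865
Matrix: MAT2 AAACCGCCTC
OCCURRENCES:
>GENE_1 21 AAACCGCCTC 1 963 +
>GENE_2 14 AAACGGCCTC 0.928198 212 +
>GENE_2 8 AAACCGTCTC 0.92009 170 +
>GENE_4 3 TAACCGCCTC 0.918883 370 +
**********
"""


def count_unique_genes(text):
    """Map each motif header line to the number of distinct genes in its block."""
    counts = {}
    header = None        # most recent "N) MOTIF (REVCOMP) score" line
    genes = None         # set of gene names while inside an OCCURRENCES block
    in_details = False   # the summary section at the top is skipped
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("DETAILED RESULTS"):
            in_details = True
        elif not in_details:
            continue
        elif re.match(r"\d+\)", line):
            header = line
        elif line.startswith("OCCURRENCES"):
            genes = set()                    # block opens
        elif line.startswith("*") and genes is not None:
            counts[header] = len(genes)      # block closes
            genes = None
        elif genes is not None and line.startswith(">"):
            genes.add(line[1:].split()[0])   # ">GENE_1 1 ..." -> "GENE_1"
    return counts


counts = count_unique_genes(SAMPLE)
print("MOTIFS SUMMARY:")
for header, n in counts.items():
    print(header, n)
average = sum(counts.values()) / len(counts)
print("AVERAGE OCCURRENCE:", average)
```

Using a set per block makes the "unique" part automatic, since repeated gene names are stored only once. To process the ~200 files, the same function could be applied to the contents of each file in turn.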

downsampling rate with movement data (first point equal from the original matrix)

I was wondering whether the procedure I applied to reduce the sample rate is appropriate. It follows the documented usage y = downsample(x,n):
downsamp_rate = 40;
downsampled_data = downsample(X,downsamp_rate);
My doubt lies in why the first column of both matrices (the original matrix and the downsampled one) is exactly the same, keeping the same data,
whereas the other data have already been transformed to a lower sample rate.
Thank you so much!
Best!
Edit: sample data. I pasted the data below, but I can also upload the .mat files.
Original data.
column 1 column 2 column 3
-0,593600000000000 -0,592699999999996 -0,591899999999995
2,42180000000000 2,41010000000000 2,40360000000000
1,78550000000000 1,79020000000000 1,79530000000000
-1,30590000000000 -1,31520000000000 -1,31530000000000
-0,707800000000003 -0,712699999999999 -0,727700000000003
-0,986500000000001 -0,996000000000002 -1,00460000000000
-0,989699999999999 -0,989699999999999 -0,989699999999999
1,23500000000000 1,22970000000000 1,21880000000000
0,122899999999998 0,127899999999997 0,128899999999998
0,938300000000003 0,937500000000002 0,936200000000004
0,248600000000004 0,248500000000002 0,248700000000002
-0,381499999999996 -0,393199999999999 -0,393699999999997
0,294099999999997 0,279299999999999 0,271299999999997
-0,223200000000001 -0,223699999999999 -0,227299999999997
0,0879999999999992 0,117300000000004 0,122500000000003
-0,167899999999999 -0,170999999999999 -0,174800000000003
-0,687499999999996 -0,697199999999998 -0,701600000000002
-0,681700000000002 -0,682200000000000 -0,683000000000000
1,19659999999999 1,19670000000000 1,19490000000000
-0,565500000000008 -0,565199999999999 -0,557400000000008
Downsampled data
column 1 column 2 column 3
-0,593600000000000 0,821900000000003 0,936300000000001
2,42180000000000 1,14610000000000 -0,255400000000000
1,78550000000000 2,86550000000000 3,66890000000000
-1,30590000000000 7,01950000000000 12,9564000000000
-0,707800000000003 3,05920000000000 0,852999999999998
-0,986500000000001 -0,372200000000000 -0,951000000000002
-0,989699999999999 -0,988000000000000 -1,21730000000000
1,23500000000000 5,79700000000000 3,40880000000000
0,122899999999998 5,32230000000000 5,19260000000000
0,938300000000003 4,88130000000000 7,55900000000000
0,248600000000004 4,79290000000000 2,96620000000000
-0,381499999999996 -0,400000000000000 0,641500000000000
0,294099999999997 -0,131400000000004 -1,20040000000000
-0,223200000000001 1,49610000000000 1,59030000000000
0,0879999999999992 0,418700000000000 -0,0114999999999976
-0,167899999999999 0,0149999999999983 -0,857500000000000
-0,687499999999996 -0,593100000000002 0,119700000000000
-0,681700000000002 -0,170000000000003 0,126799999999999
1,19659999999999 1,17670000000000 1,15780000000000
-0,565500000000008 8,89019999999999 6,58569999999999
A possible explanation for your output is a periodic input signal with a period length of downsamp_rate-1. To give a short demonstration:
>> X=repmat(1:39,1,10);
>> downsampled_data = downsample(X,downsamp_rate);
>> downsampled_data
downsampled_data =
Columns 1 through 9
1 2 3 4 5 6 7 8 9
Column 10
10
Thus, take a look at your rows 40, 41 and 42. I assume their first values are identical to those of your rows 1, 2 and 3.
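Independently of any periodicity, downsample(x,n) keeps the first sample and every n-th sample after it, so the first retained value always equals the first original value. A minimal sketch of that behaviour, written in Python rather than MATLAB and using list slicing as a stand-in for downsample (the helper name and its phase argument are illustrative), reproduces the demonstration above:

```python
def downsample(x, n, phase=0):
    """Keep every n-th sample of x, starting at 0-based index `phase`."""
    return x[phase::n]


downsamp_rate = 40

# Same signal as in the demonstration: the values 1..39 repeated 10 times.
x = list(range(1, 40)) * 10

downsampled = downsample(x, downsamp_rate)
print(downsampled)

# The first retained sample is always the first original sample.
print(downsampled[0] == x[0])
```

Because the period (39) is one less than the downsampling factor (40), each retained sample lands one step further into the cycle, which is why consecutive values come out.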

POS tagging in Scala

I tried to POS tag a sentence in Scala using the Stanford parser, as shown below:
val lp:LexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
lp.setOptionFlags("-maxLength", "50", "-retainTmpSubcategories")
val s = "I love to play"
val parse :Tree = lp.apply(s)
val taggedWords = parse.taggedYield()
println(taggedWords)
I got the error "type mismatch; found : java.lang.String required: java.util.List[_ <: edu.stanford.nlp.ling.HasWord]" on the line val parse :Tree = lp.apply(s)
I don't know whether this is the right way of doing it or not. Are there any other easy ways of POS tagging a sentence in Scala?
You might like to consider the FACTORIE toolkit (http://github.com/factorie/factorie). It is a general library for machine learning and graphical models that happens to include an extensive suite of natural language processing components (tokenization, token normalization, morphological analysis, sentence segmentation, part-of-speech tagging, named entity recognition, dependency parsing, mention finding, coreference).
Furthermore it is written entirely in Scala, and it is released under the Apache License.
Documentation is currently sparse, but will be improving in the coming months.
For example, once Maven-based installation is finished you can type at the command line:
bin/fac nlp --pos1 --parser1 --ner1
to launch a socket-listening multi-threaded NLP server. Then query it by piping plain text to its socket number:
echo "Mr. Jones took a job at Google in New York. He and his Australian wife moved from New South Wales on 4/1/12." | nc localhost 3228
The output is then
1 1 Mr. NNP 2 nn O
2 2 Jones NNP 3 nsubj U-PER
3 3 took VBD 0 root O
4 4 a DT 5 det O
5 5 job NN 3 dobj O
6 6 at IN 3 prep O
7 7 Google NNP 6 pobj U-ORG
8 8 in IN 7 prep O
9 9 New NNP 10 nn B-LOC
10 10 York NNP 8 pobj L-LOC
11 11 . . 3 punct O
12 1 He PRP 6 nsubj O
13 2 and CC 1 cc O
14 3 his PRP$ 5 poss O
15 4 Australian JJ 5 amod U-MISC
16 5 wife NN 6 nsubj O
17 6 moved VBD 0 root O
18 7 from IN 6 prep O
19 8 New NNP 9 nn B-LOC
20 9 South NNP 10 nn I-LOC
21 10 Wales NNP 7 pobj L-LOC
22 11 on IN 6 prep O
23 12 4/1/12 NNP 11 pobj O
24 13 . . 6 punct O
Of course there is a programmatic API to all this functionality as well.
import cc.factorie._
import cc.factorie.app.nlp._
val doc = new Document("Education is the most powerful weapon which you can use to change the world.")
DocumentAnnotatorPipeline(pos.POS1).process(doc)
for (token <- doc.tokens)
println("%-10s %-5s".format(token.string, token.posLabel.categoryValue))
will output:
Education NN
is VBZ
the DT
most RBS
powerful JJ
weapon NN
which WDT
you PRP
can MD
use VB
to TO
change VB
the DT
world NN
. .
I found a very simple way to do POS tagging in Scala
Step 1
Download the Stanford tagger, version 3.2.0, from the link below:
http://nlp.stanford.edu/software/stanford-postagger-2013-06-20.zip
Step 2
Add the stanford-postagger jar from the downloaded folder to your project, and also place the english-left3words-distsim.tagger file from the models folder in your project.
Then, with the code below, you can POS tag a sentence in Scala:
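import edu.stanford.nlp.tagger.maxent.MaxentTagger

// load the tagger model that was placed in the project
val tagger = new MaxentTagger(
  "english-left3words-distsim.tagger")
val art_con = "My name is Rahul"
// tagString returns the sentence with each word suffixed by _TAG
val tagged = tagger.tagString(art_con)
println(tagged)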
Output: My_PRP$ name_NN is_VBZ Rahul_NNP
I believe the API of the Stanford Parser has changed somewhat, as it does sometimes. apply now has the signature public Tree apply(java.util.List<? extends HasWord> words), and this is what you see in the error message.
What you should use now is parse, which has the signature public Tree parse(java.lang.String sentence).