I am trying to fit a linear regression model in MATLAB. My variables are train_fv, train_fv_labels, test_fv and test_fv_labels, with sizes 333x9, 333x1, 167x9 and 167x1 respectively. I want to train the model, predict the labels for test_fv, and compare them with the actual labels given in test_fv_labels.
My MATLAB code is below. I am using stepwise linear regression to get the best fit possible:
mdl = stepwiselm(train_fv,train_fv_labels,'PEnter',0.001,'verbose',1)
mdl1 = step(mdl,'upper','quadratic','verbose',1)
The output I get is as follows:
1. Adding x5, FStat = 83.3108, pValue = 7.06324e-18
2. Adding x1, FStat = 35.6014, pValue = 6.24096e-09
3. Adding x7, FStat = 41.0932, pValue = 5.0338e-10
4. Adding x5:x7, FStat = 33.3157, pValue = 1.81571e-08
5. Adding x1:x5, FStat = 14.1821, pValue = 0.000196729
mdl =
Linear regression model:
y ~ 1 + x1*x5 + x5*x7
Estimated Coefficients:
                   Estimate      SE            tStat      pValue
                   __________    __________    _______    __________
    (Intercept)    0.0014532     5.5229e-05    26.312     9.9458e-83
    x1             0.00071972    0.00011402    6.3121     8.9595e-10
    x5             -0.0021179    0.00018102    -11.7      1.1938e-26
    x7             0.0011401     0.00022498    5.0678     6.7473e-07
    x1:x5          -0.0015096    0.00040087    -3.7659    0.00019673
    x5:x7          -0.0049673    0.00077872    -6.3788    6.0915e-10
Number of observations: 333, Error degrees of freedom: 327
Root Mean Squared Error: 0.001
R-squared: 0.442, Adjusted R-Squared 0.434
F-statistic vs. constant model: 51.9, p-value = 1.65e-39
6. Adding x5^2, FStat = 63.1344, pValue = 3.17359e-14
mdl1 =
Linear regression model:
y ~ 1 + x1*x5 + x5*x7 + x5^2
Estimated Coefficients:
                   Estimate      SE            tStat      pValue
                   __________    __________    _______    __________
    (Intercept)    0.0011415     6.4043e-05    17.825     4.3107e-50
    x1             0.00071722    0.00010452    6.8618     3.4339e-11
    x5             -0.0018651    0.00016896    -11.039    2.7782e-24
    x7             0.0011951     0.00020635    5.7915     1.6426e-08
    x1:x5          -0.0019348    0.00037135    -5.2101    3.354e-07
    x5:x7          -0.0045341    0.00071592    -6.3332    7.9578e-10
    x5^2           0.0033789     0.00042525    7.9457     3.1736e-14
Number of observations: 333, Error degrees of freedom: 326
Root Mean Squared Error: 0.000921
R-squared: 0.533, Adjusted R-Squared 0.524
F-statistic vs. constant model: 61.9, p-value = 5.33e-51
So the mdl model has the formula y ~ 1 + x1*x5 + x5*x7, and mdl1 has the formula y ~ 1 + x1*x5 + x5*x7 + x5^2.
But when I try to predict the values for the test set with feval, I get an error. Why is this so?
test_fv_labels = feval(mdl1,test_fv);
Predictor data matrix must have 5 columns.
But if I use the predict function instead of feval, I do not get an error. Why is this so?
test_fv_labels = predict(mdl1,test_fv);
Please tell me where I am going wrong, and what the difference is between the predict and feval commands in MATLAB.
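One detail worth noting in the snippets above: both calls assign the predictions back to test_fv_labels, which overwrites the true labels needed for the comparison. The following is a minimal sketch of the evaluation step, assuming synthetic stand-in data with the same shapes as in the question (the data, the coefficients used to generate it, and the names test_pred and test_rmse are illustrative assumptions, not part of the original post). It relies on predict accepting a matrix with the same columns, in the same order, as the training matrix.

```matlab
% Synthetic stand-in data with the same shapes as in the question (assumption).
rng(0);
train_fv        = randn(333, 9);
train_fv_labels = 0.5*train_fv(:,1) - 0.3*train_fv(:,5) + 0.1*randn(333, 1);
test_fv         = randn(167, 9);
test_fv_labels  = 0.5*test_fv(:,1) - 0.3*test_fv(:,5) + 0.1*randn(167, 1);

% Stepwise fit, silenced here to keep the output short.
mdl = stepwiselm(train_fv, train_fv_labels, 'PEnter', 0.001, 'Verbose', 0);

% Pass the full 9-column test matrix, laid out like the training matrix.
test_pred = predict(mdl, test_fv);

% Store predictions in a new variable so the true labels stay available.
test_rmse = sqrt(mean((test_pred - test_fv_labels).^2));
fprintf('Test RMSE: %.4g\n', test_rmse);
```

Keeping predictions and true labels in separate variables also makes it easy to plot one against the other.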
I have a MATLAB table that contains information on students (numerical and categorical). A sample is given here:
School = {'GB'; 'UR'; 'GB'; 'GB'; 'UR'};
School = categorical(School);
Age = [14;14;12;16;19];
Relationship = {'yes'; 'yes'; 'no'; 'no'; 'yes'};
Relationship = categorical(Relationship);
Status = {'ft'; 'pt'; 'ft'; 'ft'; 'ft'};
Status = categorical(Status);
Father_Job = {'pol'; 'ser'; 'oth'; 'ele'; 'cle'};
Father_Job = categorical(Father_Job);
Health = [1;2;3;3;5];
Exam = {'pass'; 'pass'; 'fail'; 'fail'; 'pass'};
Exam = categorical(Exam);
T =
    School    Age    Relationship    Status    Father_Job    Health    Exam
    ______    ___    ____________    ______    __________    ______    ____
    GB        14     yes             ft        pol           1         pass
    UR        14     yes             pt        ser           2         pass
    GB        12     no              ft        oth           3         fail
    GB        16     no              ft        ele           3         fail
    UR        19     yes             ft        cle           5         pass
I want to use this data to predict and classify the pass/fail outcome of the exam. I am planning to use fitglm to fit a logistic regression and fitcnb to build a Naive Bayes classifier. I know that both methods handle categorical variables well in MATLAB, so there should be no problem using my table as it is.
However, I have a problem when I use cvpartition and crossvalind to perform a 10-fold cross-validation. When I try to create the indices of my folds, I get the following error:
Error using statslib.internal.grp2idx
Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts) is not supported. Use a row subscript and a variable subscript.
My goal is to perform the following operations:
% Column 7 (Exam) is the response variable
X = T(:, 1:6);
Y = T(:, 7);
% Create indices of 5-fold cross-validation (here I get errors)
cvpart = cvpartition(Y,'KFold',5);
indices = crossvalind('Kfold',Y,5);
% Create my test and training sets
for i = 1:5
test = (indices == i);
train = ~test;
Xtrain = X(train,:);
Xtest = X(test,:);
Ytrain = Y(train,:);
Ytest = Y(test,:);
end
% Fit logistic model
mdl = fitglm(Xtrain,Ytrain,'Distribution','binomial')
Would anyone have a take on this, please? I know that I could convert the categorical variables to numerical ones, but I would rather not. Is there any way around this? Thank you.
I think your main problem is that your data set is just too small. You have n = 5, which isn't even enough to build a non-validated model.
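Separately from the sample size, the error message itself points at Y being a one-column table, which the grouping code cannot index. A minimal sketch of one way around it, assuming the issue is only the table-typed Y: pass the categorical column T.Exam to cvpartition and use its training and test methods to get logical row indices. The variable names Ttrain and Ttest are illustrative, and with only five rows MATLAB may warn that some folds do not contain both classes.

```matlab
% Rebuild the sample table from the question.
School       = categorical({'GB'; 'UR'; 'GB'; 'GB'; 'UR'});
Age          = [14; 14; 12; 16; 19];
Relationship = categorical({'yes'; 'yes'; 'no'; 'no'; 'yes'});
Status       = categorical({'ft'; 'pt'; 'ft'; 'ft'; 'ft'});
Father_Job   = categorical({'pol'; 'ser'; 'oth'; 'ele'; 'cle'});
Health       = [1; 2; 3; 3; 5];
Exam         = categorical({'pass'; 'pass'; 'fail'; 'fail'; 'pass'});
T = table(School, Age, Relationship, Status, Father_Job, Health, Exam);

% Pass the categorical column itself (T.Exam), not the one-column table T(:, 7).
cvpart = cvpartition(T.Exam, 'KFold', 5);

for i = 1:cvpart.NumTestSets
    trainIdx = training(cvpart, i);   % logical index of the training rows
    testIdx  = test(cvpart, i);       % logical index of the held-out rows
    Ttrain = T(trainIdx, :);          % still a table, categoricals intact
    Ttest  = T(testIdx, :);
    fprintf('Fold %d: %d training rows, %d test rows\n', ...
            i, height(Ttrain), height(Ttest));
    % Fit and evaluate the model for this fold here, inside the loop.
end
```

Note that the model fit belongs inside the loop; in the question it sits after the loop, so only the last fold would ever be used.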
I am trying to fit some data in Matlab to a Hill function of the form y = r^n/(r^n+K^n). I have data for r,y and I need to find K,n.
I tried two different approaches after reading the docs extensively: one uses fit from the Curve Fitting Toolbox and the other uses lsqcurvefit from the Optimization Toolbox. I haven't had any success with either. What am I missing?
Here is my xdata and ydata:
xdata = logspace(-2,2,101);
ydata = [0.0981 0.1074 0.1177 0.1289 0.1411 0.1545 0.1692 0.1852 0.2027 ...
0.2219 0.2428 0.2656 0.2905 0.3176 0.3472 0.3795 0.4146 0.4528 ...
0.4944 0.5395 0.5886 0.6418 0.6994 0.7618 0.8293 0.9022 0.9808 ...
1.0655 1.1566 1.2544 1.3592 1.4713 1.5909 1.7183 1.8537 1.9972 ...
2.1490 2.3089 2.4770 2.6532 2.8371 3.0286 3.2272 3.4324 3.6437 ...
3.8603 4.0815 4.3065 4.5344 4.7642 4.9950 5.2258 5.4556 5.6833 ...
5.9082 6.1292 6.3457 6.5567 6.7616 6.9599 7.1511 7.3347 7.5105 ...
7.6783 7.8379 7.9893 8.1324 8.2675 8.3946 8.5139 8.6257 8.7301 ...
8.8276 8.9184 9.0029 9.0812 9.1539 9.2212 9.2834 9.3408 9.3939 ...
9.4427 9.4877 9.5291 9.5672 9.6022 9.6343 9.6638 9.6909 9.7157 ...
9.7384 9.7592 9.7783 9.7957 9.8117 9.8263 9.8397 9.8519 9.8630 ...
9.8732 9.8826];
'Fit' code:
HillEqn = '@(x,xdata)xdata.^x(1)./(xdata.^x(1)+x(2).^x(1))';
startPoints = [1 1];
fit(xdata',ydata',HillEqn,'Start',startPoints)
Error message:
Error using fittype>iTestCustomModelEvaluation (line 726)
Expression @(x,xdata)xdata.^x(1)./(xdata.^x(1)+x(2).^x(1)) is not a valid MATLAB
expression, has non-scalar coefficients, or cannot be evaluated:
Undefined function 'imag' for input arguments of type 'function_handle'.
'lsqcurvefit' code:
fun = @(x,xdata) xdata.^x(1)./(xdata.^x(1)+x(2).^x(1));
x0 = [1,1]; % Initial Parameter Estimates
x = lsqcurvefit(fun,x0,xdata,ydata);
Error message:
Error using snls (line 47)
Objective function is returning undefined values at initial point. lsqcurvefit cannot
continue.
First, I think you need three parameters to start from, because the Hill function has a maximum of 1, while your data saturates near 10. So either normalize your data with ydata = ydata./max(ydata), or add a third parameter as a scale factor (which I did here just for the demonstration). This is how I did it:
startPoints = [1 1 1];
s = fitoptions('Method','NonlinearLeastSquares',... %
'Lower',[0 0 0 ],...
'Upper',[inf inf inf],...
'Startpoint',startPoints);
HillEqn = fittype( 'x.^a1./(x.^a1+a2.^a1)*a3','options',s);
[ffun,gofrr] = fit(xdata(:),ydata(:),HillEqn);
yfit=feval(ffun,xdata(:)); %Fitted function
plot(xdata,ydata,'-bx',xdata,yfit,'-ro');
ffun =
General model:
ffun(x) = x.^a1./(x.^a1+a2.^a1)*a3
Coefficients (with 95% confidence bounds):
a1 = 1.004 (1.004, 1.004)
a2 = 0.9977 (0.9975, 0.9979)
a3 = 9.979 (9.978, 9.979)
Side note:
In your case, what you really want to do is transform your data by taking Y = 1./ydata and X = 1./xdata, fit a line, and then invert again to get the answer back in the previous Hill function representation. This works because the fitted exponent is essentially 1 (a1 = 1.004), so 1/y is linear in 1/x and a polyfit of order 1 will do:
Y=1./ydata;
X = 1./xdata;
p=polyfit(X,Y,1);
plot(X,Y,'-bx',X,polyval(p,X),'-ro')
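To make the back-transformation concrete: for exponent 1 the model is y = a3*x/(x + K), so 1/y = 1/a3 + (K/a3)*(1/x). The slope of the fitted line is K/a3 and the intercept is 1/a3. A minimal sketch, assuming noise-free synthetic data with hypothetical values K = 1 and a3 = 10 (chosen only to resemble the fit above, not taken from the original data):

```matlab
% Synthetic Hill-type data with exponent 1 (illustrative values, not the original data).
x       = logspace(-2, 2, 101);
K_true  = 1;
a3_true = 10;
y       = a3_true * x ./ (x + K_true);

% Double-reciprocal transform: 1/y = (K/a3)*(1/x) + 1/a3.
p = polyfit(1./x, 1./y, 1);   % p(1) = slope, p(2) = intercept

% Recover the Hill parameters from the line.
a3_est = 1 / p(2);
K_est  = p(1) / p(2);
fprintf('a3 = %.4f, K = %.4f\n', a3_est, K_est);
```

With noisy data the reciprocal transform amplifies errors at small y, so the nonlinear fit is usually the safer estimate and the line fit a quick check or a source of starting values.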
I use mdl=NonLinearModel.fit to fit a non-linear model.
How can I get the F-statistic and p-value from the result (line: "F-statistic vs. constant model")?
Example code:
load carsmall
X = Weight;
y = MPG;
modelfun = 'y ~ b1 + b2*exp(-b3*x/1000)';
beta0 = [1 1 1];
mdl=NonLinearModel.fit(X,y,modelfun,beta0)
which produces the following:
mdl =
Nonlinear regression model:
y ~ b1 + b2*exp( - b3*x/1000)
Estimated Coefficients:
          Estimate    SE         tStat       pValue
          ________    _______    ________    __________
    b1    -17.725     31.321     -0.56594    0.57283
    b2    77.862      21.332     3.6499      0.00043735
    b3    0.21775     0.17176    1.2677      0.20814
Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 4.12
R-Squared: 0.743, Adjusted R-Squared 0.738
F-statistic vs. constant model: 132, p-value = 1.34e-27
You're going to need [Fstat,pval] = fTest(mdl);
However, this method is not publicly accessible, so using it requires changing the access permissions to public in a number of MATLAB's own files. I just ran MATLAB as root and manually changed the permission to public in every file that returned an error until I could get what I needed. It seems to work, but there is probably a better way to do it.
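One alternative that avoids touching protected files is to recompute the statistic from the model's public properties. This is a sketch under the assumption that the displayed value is the usual F-test against a constant model, F = ((SST - SSE)/(p - 1)) / (SSE/DFE); the displayed R-squared of 0.743 with 91 error degrees of freedom gives roughly 132 under that formula, which is consistent with the output above. fitnlm is used here in place of NonLinearModel.fit, and the names dfModel, Fstat and pval are illustrative.

```matlab
load carsmall
modelfun = 'y ~ b1 + b2*exp(-b3*x/1000)';
beta0    = [1 1 1];
mdl      = fitnlm(Weight, MPG, modelfun, beta0);

% Degrees of freedom: model terms beyond the constant, and residual.
dfModel = mdl.NumEstimatedCoefficients - 1;
dfError = mdl.DFE;

% F-statistic against a constant-only model, from the sums of squares.
Fstat = ((mdl.SST - mdl.SSE) / dfModel) / (mdl.SSE / dfError);

% Upper-tail probability of the F distribution.
pval = fcdf(Fstat, dfModel, dfError, 'upper');
fprintf('F = %.3g, p = %.3g\n', Fstat, pval);
```

If the printed values differ from the display line, the display may be using a different null model, so it is worth comparing the two on your own fit before relying on this.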
I have defined input values (time samples) and output values (concentration). I would like to fit a model in order to estimate the rate constants (here defined as K1, K2 and K3).
I have used lsqcurvefit for the optimization, but I am getting errors when I call the solver.
First, the values of the x variable (t) and the y variable (c_tot):
c_tot =[0,0,0,396.979609003375,503.769614648079,285.408414510699,137.309948090421,...
63.0089145454838,28.2076980338446,12.4169874862731,5.39698687726493,...
2.32247111168307,0.971427824475975,0.396298705702665,0.154518313562792,...
0.0563350826881436,0.0175309433420762,...
0.00589400762862266,0.00199918527022414];%Loading Sampled Mat file values for fitting the Estimates of K in it
t=[0 0.25 0.5 0.75 1 1.5 2 3 4 9 13 18 23 28 33 38 43 48 53];%time samples
Now the model which needs to be fitted:
%-----------Model to be fitted-------------------------------------
k1_r=0.014;%reference tissue rate constant
a1=17501;a2=28500;a3=65000;%Values of a1 a2 a3 of Arterial input function
b1=0.9;b2=0.2;b3=0.5;%Values of b1 b2 b3 of Arterial Input Function
td=0.3% Indicates delay time
tmax=0.8% maximum peak time concentration
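% A and B depend on K, so they are defined as functions of K and
% evaluated inside the model rather than computed once up front.
A = @(K) ((K(1)*K(2))/(k1_r*(K(2)+K(3))));
B = @(K) ((K(1)*K(3))/(k1_r*(K(2)+K(3))));
model=@(K,t)conv((a1 * exp(-b1 * (t - tmax))+...
a2 * exp(-b2 * (t - tmax))) +...
a3 * exp(-b3 * (t - tmax)),...
(A(K)*exp(-(K(2)+K(3))*t+B(K))),'same');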
Now I set initial estimates of the parameters (K1, K2, K3) that I would like to fit
and call the solver:
%----------------Assignment------------------------------------------------
K=[0.1 0.08 0.4];% Initial estimates of K1 K2 K3
%----------------Least-square curvefitting---------------------------------
K_est=lsqcurvefit(model,K,t,c_tot);
plot(K_est,'o');
xlabel('Time(mins)');
ylabel('Concentration(Bq/ml)');
This is the error I get:
Undefined function 'tomlablic' for input arguments of type 'double'.
Error in tomlabVersion (line 45)
[x1,x2,x3,x4,x5,x6,tomV,OS]=tomlablic(1);
Error in GetSolver (line 72)
[TomV,os,TV] = tomlabVersion;
Error in lsqnonlin (line 415)
Solver = GetSolver(checkType([],Prob.probType),...
Error in lsqcurvefit (line 298)
[x, f_k, r_k, ExitFlag, Output, Lambda, J_k, Result] =...
I am not experienced enough to know what I could do to perform the model fitting.
Would any other solver help me with this, or is there a global optimization toolbox with which I can fit the model from the initial values?
Any tips are welcome.
You are missing some TOMLAB files (looks very much like a missing license file).
Please make sure you have a complete installation of TOMLAB (which is sold by a company different from Mathworks) and try again.
As an alternative to TOMLAB, you can try Matlab's Global Optimization toolbox.
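The stack trace shows lsqcurvefit calling GetSolver and tomlabVersion, which suggests that a TOMLAB version of lsqcurvefit is on the path ahead of any other. A quick way to check which files MATLAB will actually call, as a sketch (the folder in the commented rmpath line is a hypothetical placeholder, not a real location):

```matlab
% List every lsqcurvefit on the path; the first entry is the one MATLAB calls.
paths = which('lsqcurvefit', '-all');
disp(paths);

% If a TOMLAB folder comes first and you have the Optimization Toolbox,
% removing that folder from the path lets the MathWorks version run instead.
% The folder below is a hypothetical placeholder:
% rmpath('C:\path\to\tomlab\optim');
```

If the list is empty or contains only TOMLAB entries, the Optimization Toolbox version is not installed and the TOMLAB installation has to be completed instead.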
First of all, thanks in advance.
I work in machine learning.
For a project, I have created a MATLAB function which returns several features of a signal in the frequency domain.
The function returns the signal's energy, sum of Fourier coefficients, entropy, pwr_at_DC, power at peak frequency, and peak/dominant frequency.
The error states 'too many output arguments'!
The code is this:
%signal = [120 111 117 109 94 104 125 161]; %for example consider this discrete signal.
%the function returns the signal's energy, sum of Fourier coefficients, entropy,
%pwr_at_DC, power at peak freq, and peak freq.
function [signalFeatures] = SigFreqAnalysis(signal)
NFFT = length(signal); %length of the signal
signal = signal - mean(signal); %remove DC component (avoid peak at 0 freq.)
FT = fft(signal,NFFT); %fourier transform n point
sEnergy = sum(abs(FT).^2)/NFFT; %spectral energy
SumCoeff = sum(abs(FT)); %total of all NFFT coefficients!
%[P,F] = periodogram(signal,[],NFFT,'power');
[P,F] = pwelch(signal,ones(NFFT,1),0,NFFT,'power'); %[P,F] - PSD of the signal
%P1=real(P1);
%Steps for Entropy: calc PSD ---> normalize P ---> entropy = -sum(P.*log2(P));
Pn=P/norm(P); log2Pn = log2(Pn + 1e-12);
Entropy = -sum(Pn.*log2Pn)/log2(length(Pn));
PdBW = 10*log10(P); pwr_at_DC = PdBW(F==0); % power in dBW
%the most important dominant frquency! ! !
[pks_dBW,locs] = findpeaks(PdBW,'NPEAKS',1,'SORTSTR','descend'); %peak/dominant!
%findpeak returns empty vector if no freq found!
if isempty(pks_dBW)
pks_dBW=0; pkFrq = 0; %if pks_dbs is 0 findpeak returns empty matrix;
else
pkFrq = F(locs); %this is the dominant/peak frequency of X axes!!
end
signalFeatures = [sEnergy SumCoeff Entropy pwr_at_DC pks_dBW pkFrq];
%return this vector.
end
The error is 'too many output arguments' in the findpeaks function!
Can anyone help me resolve this error?
thanks,
Adesh Shah
Your code that calls the function should look like this:
signal = [120 111 117 109 94 104 125 161]
a = SigFreqAnalysis(signal)
but you are probably calling it this way:
[a b] = SigFreqAnalysis(signal)
The function declares only one output, so asking for two gives 'too many output arguments'.
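A small illustration of the point, using a hypothetical one-output stand-in (the anonymous function oneOutput is not the original SigFreqAnalysis): a function that returns a single vector must be called with one output, and the individual features are then read by indexing.

```matlab
% Hypothetical stand-in that, like SigFreqAnalysis, returns one vector.
oneOutput = @(s) [sum(s.^2), max(s)];
signal = [120 111 117 109 94 104 125 161];

% Correct: request one output, then index into the returned vector.
features = oneOutput(signal);
energy   = features(1);
peak     = features(2);

% Incorrect: requesting two outputs from a one-output function raises an error.
try
    [a, b] = oneOutput(signal);
catch err
    disp(err.message);   % shows the message MATLAB reports for this call
end
```

If the call site already uses a single output, the same message can come from a call inside the function, so it is worth checking which line the error points to.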