I have a series of about 200 x/y data points and am using MATLAB to generate a model. I am trying to determine what order the polynomial fitted by fitlm should be. I tried starting at order 6, hoping that some of the higher-order coefficients wouldn't be significant, but I get the following:
Linear regression model:
    y ~ 1 + x1 + x1^2 + x1^3 + x1^4 + x1^5 + x1^6
Estimated Coefficients:
                    Estimate           SE     tStat         pValue
                 ___________   __________   _______   __________
    (Intercept)            0            0       NaN          NaN
    x1                 11897        462.8    25.706   2.1825e-64
    x1^2             -442.92       26.689   -16.596    4.438e-39
    x1^3               7.323      0.55975    13.083   1.8862e-28
    x1^4           -0.059949    0.0053902   -11.122    1.516e-22
    x1^5          0.00023784   2.4198e-05    9.8286   9.3122e-19
    x1^6         -3.6511e-07   4.1034e-08   -8.8978   4.0169e-16
Number of observations: 201, Error degrees of freedom: 195
Root Mean Squared Error: 1.36e+04
R-squared: 0.519, Adjusted R-Squared 0.506
F-statistic vs. constant model: 42, p-value = 3.1e-29
With a polynomial of order 4 (the model below goes up to x1^4), I get the following:
Linear regression model:
    y ~ 1 + x1 + x1^2 + x1^3 + x1^4
Estimated Coefficients:
                    Estimate           SE     tStat         pValue
                 ___________   __________   _______   __________
    (Intercept)   1.0011e+05       19.058    5252.9            0
    x1                -19.02       1.3004   -14.626   3.0955e-33
    x1^2             0.27502     0.026087    10.542   7.1559e-21
    x1^3          -0.0029912   0.00019381   -15.434   1.0751e-35
    x1^4         -2.1979e-06   4.7601e-07   -4.6174   7.0203e-06
Number of observations: 201, Error degrees of freedom: 196
Root Mean Squared Error: 52.4
R-squared: 1, Adjusted R-Squared 1
F-statistic vs. constant model: 5.8e+05, p-value = 0
Now, I notice that in both cases the p-values are very low (good, I suppose) and R-squared is above 0.5 (which I assume is also good).
So I am not really sure what to make of these results. I know that I should aim for lower-order polynomials, but how can I justify that choice?
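One way to justify a lower order is to let held-out data decide: fit each candidate order on part of the points, measure the error on the rest, and take the smallest order that is about as good as the best. The sketch below is a minimal illustration under stated assumptions: it uses base MATLAB's polyfit/polyval rather than fitlm, the data are hypothetical stand-ins for the real 201 points, and the odd/even split and 5% tolerance are arbitrary illustrative choices. The third polyfit output (mu) centres and scales x, which keeps powers such as x^6 from becoming numerically ill-conditioned; a coefficient reported as exactly 0 with SE 0, like the intercept in the order-6 output, typically means fitlm dropped that term because the design matrix was rank deficient.

```matlab
% Hypothetical data standing in for the real x/y points:
% a cubic trend plus a deterministic wiggle that no polynomial can follow.
x = linspace(0, 200, 201)';
y = 1e5 - 20*x + 0.3*x.^2 - 0.003*x.^3 + 50*sin(0.7*x);

trainIdx = 1:2:numel(x);   % fit on odd-indexed points
testIdx  = 2:2:numel(x);   % score on even-indexed points

maxOrder  = 8;
trainRMSE = zeros(maxOrder, 1);
testRMSE  = zeros(maxOrder, 1);
for n = 1:maxOrder
    % mu = [mean; std] of the training x; polyfit works on (x - mu(1)) / mu(2)
    [p, ~, mu] = polyfit(x(trainIdx), y(trainIdx), n);
    trainRMSE(n) = sqrt(mean((y(trainIdx) - polyval(p, x(trainIdx), [], mu)).^2));
    testRMSE(n)  = sqrt(mean((y(testIdx)  - polyval(p, x(testIdx),  [], mu)).^2));
end

% Smallest order whose held-out error is within 5% of the best one
bestOrder = find(testRMSE <= 1.05 * min(testRMSE), 1);
for n = 1:maxOrder
    fprintf('order %d: train RMSE %10.3f, test RMSE %10.3f\n', n, trainRMSE(n), testRMSE(n));
end
fprintf('Chosen order: %d\n', bestOrder);
```

The training RMSE can only fall as the order rises, so on its own it always favours the highest order; the held-out error stops improving once the real structure has been captured, which is the argument for stopping there.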
I know how to calculate the line parameter, defined as x below, for one layer over the given wavelength range of 50 to 550 um. Now I want to repeat this calculation for all 10 layers. All the other parameters remain constant while the temperature varies from layer 1 to layer 10. Any suggestion would be greatly appreciated.
wl=[100 200 300 400 500]; %5 wavelengths, 5 spectral lines
br=[0.12 0.56 0.45 0.67 0.89]; % broadening parameter for each wavelength
T=[101 102 103 104 105 106 107 108 109 110];% temperature for 10 layers
wlall=linspace(50,550,40);%all the wavelength in 50um to 550 um range
% x is defined as,
%(br*wl/(br*br + (wlall-wl)^2))*br;
%If I do a calculation for the first line
((br(1)*T(1)*wl(1))./(br(1)*br(1)*(T(1)) + (wlall(:)-wl(1)).^2))*br(1)*T(1)
%Now I'm going to calculate it for all the lines in the first layer
k= repmat(wlall,5,1);
for i=1:5;
kn(i,:)=(br(i)*T(1)* wl(i)./(br(i)*br(i)*T(1) + (k(i,:)-wl(i)).^2))*br(i)*T(1);
end
%The code above gives me the x parameter for all the wavelengths in the
%given range (50 to 550 um) in the first layer; its dimension is (5,40)
% I need only the maximum value of each column
an=(kn(:,:)');
[ll,mm]=sort(an,2,'descend');
vn=(ll(:,1))'
%Now my output has dimension (1,40): 1 for the first layer, and 40 for
%the maximum x parameter at each wavelength in the first layer
%Now I want to calculate the x parameter in all 10 layers, so T should vary
%from T(1) to T(10), and get the maximum in each column, so my output
%should have dimension (10,40)
You just need an extra 'for' loop over the values of 'T'. Here is an example:
clc; close all; clear all;
wl=[100 200 300 400 500]; %5 wavelengths, 5 spectral lines
br=[0.12 0.56 0.45 0.67 0.89]; % broadening parameter for each wavelength
T=[101 102 103 104 105 106 107 108 109 110];% temperature for 10 layers
wlall=linspace(50,550,40);%all the wavelength in 50um to 550 um range
% x is defined as,
%(br*wl/(br*br + (wlall-wl)^2))*br;
%If I do a calculation for the first line
((br(1)*T(1)*wl(1))./(br(1)*br(1)*(T(1)) + (wlall(:)-wl(1)).^2))*br(1)*T(1)
%Now I'm going to calculate it for all the lines in the first layer
k= repmat(wlall,5,1);
for index = 1:numel(T)
for i=1:5
kn(i,:, index)=(br(i)*T(index)* wl(i)./(br(i)*br(i)*T(index) + (k(i,:)- wl(i)).^2))*br(i)*T(index);
end
an(:, :, index) = transpose(kn(:, :, index));
vn(:, index) = max(an(:, :, index), [], 2);
end
vn = transpose(vn);
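The loop over T can also be dropped entirely. A minimal sketch, assuming MATLAB R2016b or later for implicit expansion: lines run along dimension 1, wavelengths along dimension 2 and layers along dimension 3, so one expression builds the full 5-by-40-by-10 array of x values, and max over dimension 1 picks the largest line contribution at each wavelength. The helper names brc, wlc and brT are my own, not from the question.

```matlab
wl = [100 200 300 400 500];          % 5 wavelengths, 5 spectral lines
br = [0.12 0.56 0.45 0.67 0.89];     % broadening parameter for each line
T  = 101:110;                        % temperature for 10 layers
wlall = linspace(50, 550, 40);       % 40 wavelengths in the 50-550 um range

brc = reshape(br, [], 1);            % 5x1: one row per spectral line
wlc = reshape(wl, [], 1);            % 5x1
brT = brc .* reshape(T, 1, 1, []);   % 5x1x10: br(i)*T(j), layers on dim 3

% Same formula as the loop, evaluated for every line/wavelength/layer at once:
% (br*T*wl ./ (br*br*T + (wlall - wl).^2)) * br*T  ->  5x40x10
kn = (brT .* wlc ./ (brc .* brT + (wlall - wlc).^2)) .* brT;

% Maximum over the 5 lines, then arrange as (layer, wavelength)
vn = squeeze(max(kn, [], 1)).';      % 10x40
```

Row j of vn should match row j of the loop result above; the trade-off is memory, since the whole 5x40x10 array is held at once.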
I have a vectorization problem with nlinfit.
Let A be the (n,p) matrix of observations and t the (1,p) explanatory variable.
For example,
t=[0 1 2 3 4 5 6 7]
and
A=[3.12E-04 7.73E-04 3.58E-04 5.05E-04 4.02E-04 5.20E-04 1.84E-04 3.70E-04
3.38E-04 3.34E-04 3.28E-04 4.98E-04 5.19E-04 5.05E-04 1.97E-04 2.88E-04
1.09E-04 3.64E-04 1.82E-04 2.91E-04 1.82E-04 3.62E-04 4.65E-04 3.89E-04
2.70E-04 3.37E-04 2.03E-04 1.70E-04 1.37E-04 2.08E-04 1.05E-04 2.45E-04
3.70E-04 3.34E-04 2.63E-04 3.21E-04 2.52E-04 2.81E-04 6.25E+09 2.51E-04
3.11E-04 3.68E-04 3.65E-04 2.71E-04 2.69E-04 1.49E-04 2.97E-04 4.70E-04
5.48E-04 4.12E-04 5.55E-04 5.94E-04 6.10E-04 5.44E-04 5.67E-04 4.53E-04
....
]
I want to estimate a linear model for each row of A without using this loop:
for i=1:7
ml{i}=fitlm(t',A(i,:)'); % row i of A is the response, t the predictor
end
Thanks for your help!
Luc
I believe that your problem is about understanding how fitlm works with a matrix input.
Let's work with MATLAB's hald example data:
>> load hald
>> Description
Description =
== Portland Cement Data ==
Multiple regression data
ingredients (%):
column1: 3CaO.Al2O3 (tricalcium aluminate)
column2: 3CaO.SiO2 (tricalcium silicate)
column3: 4CaO.Al2O3.Fe2O3 (tetracalcium aluminoferrite)
column4: 2CaO.SiO2 (beta-dicalcium silicate)
heat (cal/gm):
heat of hardening after 180 days
Source:
Woods,H., H. Steinour, H. Starke,
"Effect of Composition of Portland Cement on Heat Evolved
during Hardening," Industrial and Engineering Chemistry,
v.24 no.11 (1932), pp.1207-1214.
Reference:
Hald,A., Statistical Theory with Engineering Applications,
Wiley, 1960.
>> ingredients
ingredients =
7 26 6 60
1 29 15 52
11 56 8 20
11 31 8 47
7 52 6 33
11 55 9 22
3 71 17 6
1 31 22 44
2 54 18 22
21 47 4 26
1 40 23 34
11 66 9 12
10 68 8 12
>> heat
heat =
78.5000
74.3000
104.3000
87.6000
95.9000
109.2000
102.7000
72.5000
93.1000
115.9000
83.8000
113.3000
109.4000
This means that each column of the ingredients matrix holds the percentage of one ingredient in the cement:
>> sum(ingredients(1,:))
ans =
99 % so it is near 100%
The 13 rows are the 13 measurements of the product, and the heat vector holds the heat measured for each observation.
>> mdl = fitlm(ingredients,heat)
mdl =
Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4
Estimated Coefficients:
                   Estimate        SE      tStat     pValue
                   ________   _______   ________   ________
    (Intercept)      62.405    70.071     0.8906    0.39913
    x1               1.5511   0.74477     2.0827   0.070822
    x2              0.51017   0.72379    0.70486     0.5009
    x3              0.10191   0.75471    0.13503    0.89592
    x4             -0.14406   0.70905   -0.20317    0.84407
Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 2.45
R-squared: 0.982, Adjusted R-Squared 0.974
F-statistic vs. constant model: 111, p-value = 4.76e-07
So in your case, it does not make sense to fit each observation separately; t simply has to have the same number of elements as there are observations (rows of A):
mdl = fitlm(A,t)
Problem solved using splitapply and findgroups!
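Because every row of A is regressed on the same t, the per-row straight-line fits can also be done with a single least-squares solve instead of a loop. A minimal sketch using the backslash operator (a swapped-in technique, not fitlm and not the findgroups-based route mentioned above); it returns only the intercepts, slopes and R-squared, not fitlm's standard errors or p-values. A here holds just the first three rows from the question.

```matlab
t = [0 1 2 3 4 5 6 7];               % shared explanatory variable (1 x p)
% First three rows of A from the question (n x p)
A = [3.12E-04 7.73E-04 3.58E-04 5.05E-04 4.02E-04 5.20E-04 1.84E-04 3.70E-04
     3.38E-04 3.34E-04 3.28E-04 4.98E-04 5.19E-04 5.05E-04 1.97E-04 2.88E-04
     1.09E-04 3.64E-04 1.82E-04 2.91E-04 1.82E-04 3.62E-04 4.65E-04 3.89E-04];

% One design matrix [1, t] serves every row, so a single solve fits them all
X = [ones(numel(t), 1), t(:)];       % p x 2
coef = X \ A.';                      % 2 x n: row 1 = intercepts, row 2 = slopes
intercepts = coef(1, :).';
slopes     = coef(2, :).';

% Per-row R-squared, still without a loop
res   = A.' - X * coef;                          % p x n residuals
ssRes = sum(res.^2, 1);
ssTot = sum((A.' - mean(A.', 1)).^2, 1);
R2    = (1 - ssRes ./ ssTot).';
```

Each column of A.' is one row of A, so backslash solves all the least-squares problems together; the results should agree with polyfit(t, A(i,:), 1) row by row.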
I have a number of wavelengths and their corresponding absorbances.
First I entered the x and y values
x = [400 425 450 475 500 505 510 525];
y = [.24 .382 .486 .574 .608 .608 .602 .508];
To plot the points
plot(x, y, 'o')
Then I want to fit the data.
I'm not sure what degree of polynomial to choose, but since it's a plot of wavelength vs. absorbance, won't there already be a mathematical formula? Like how you know a plot of kinetic energy vs. velocity will be degree 2 because KE = (1/2)mv^2?
Alright, here is a solution that works with your data, using polyfit and polyval to evaluate a polynomial that passes through your data points.
The documentation for polyfit states that
In general, for n points, you can fit a polynomial of degree n-1 to
exactly pass through the points.
Since you have 8 data points, we can try a polynomial of degree 7 and see what it gives:
clear
clc
x = [400 425 450 475 500 505 510 525];
y = [.24 .382 .486 .574 .608 .608 .602 .508];
%// Get polynomial coefficients to fit the data
p = polyfit(x,y,7)
%// Create polynomial to plot
fFit = polyval(p,x);
plot(x,y,'o')
hold on
plot(x,fFit,'r--')
hold off
axis([400 525 0 .7]);
legend({'Data points' 'Fitted curve'},'Location','NorthWest')
This plots the data points together with the fitted curve.
So it does seem to work very well! If we look at the coefficients returned by polyfit:
p =
1.0e+05 *
Columns 1 through 6
0.0000 -0.0000 0.0000 -0.0000 0.0000 -0.0003
Columns 7 through 8
0.0401 -2.7206
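Maybe degree 7 was a bit overkill, since the first 5 coefficients display as 0.0000 at the 1.0e+05 scale (they are not really negligible, though, because they multiply powers of x up to 525^7), but anyhow it fits very well!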
Hope that helps!
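A degree-7 polynomial through 8 points is pure interpolation, so it will always pass through the data, and the quality of that fit alone says nothing about the right degree. A minimal sketch of one check, leave-one-out error (fit without each point, then predict that point), using the three-output form of polyfit so that x is centred and scaled; the range of degrees 1 to 5 is an arbitrary choice for illustration.

```matlab
x = [400 425 450 475 500 505 510 525];
y = [.24 .382 .486 .574 .608 .608 .602 .508];

% Degree 7 through 8 points: residuals are ~0 by construction
[p7, ~, mu7] = polyfit(x, y, 7);
interpResidual = max(abs(y - polyval(p7, x, [], mu7)));

% Leave-one-out RMSE for a few lower degrees
degrees = 1:5;
looRMSE = zeros(size(degrees));
for k = 1:numel(degrees)
    err = zeros(size(x));
    for j = 1:numel(x)
        keep = true(size(x));
        keep(j) = false;                          % drop point j
        [p, ~, mu] = polyfit(x(keep), y(keep), degrees(k));
        err(j) = y(j) - polyval(p, x(j), [], mu); % predict the dropped point
    end
    looRMSE(k) = sqrt(mean(err.^2));
end

fprintf('max residual, degree 7: %g\n', interpResidual);
fprintf('degree %d: leave-one-out RMSE %.4f\n', [degrees; looRMSE]);
```

A degree whose leave-one-out error is close to the smallest value, while staying low, is easier to defend than the degree that merely reproduces every point.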
I want to solve a linear programming problem in MATLAB. For this purpose, I am following this link: Linear Programming.
Here, a sample problem is given:
Find x that minimizes
f(x) = –5x1 – 4x2 – 6x3,
subject to
x1 – x2 + x3 ≤ 20
3x1 + 2x2 + 4x3 ≤ 42
3x1 + 2x2 ≤ 30
0 ≤ x1, 0 ≤ x2, 0 ≤ x3.
First, enter the coefficients
f = [-5; -4; -6];
A = [1 -1 1
3 2 4
3 2 0];
b = [20; 42; 30];
lb = zeros(3,1);
Next, call a linear programming routine.
[x,fval,exitflag,output,lambda] = linprog(f,A,b,[],[],lb);
My question is: what does this line mean?
lb = zeros(3,1);
Without this line, every problem I try in MATLAB is reported as infeasible. Can you help me understand this?
This line is not needed for ALL linear programs. Here you are dealing with a problem that has constraints on the minimum values of the solution:
0 ≤ x1, 0 ≤ x2, 0 ≤ x3
You have to set up these constraints in the parameters of your problem. The way to do so is by specifying lower bounds on the solution, which is the 6th argument of linprog (after f, A, b, Aeq and beq).
Without this line, the domain on which you search for a solution is not bounded, and exitflag has the value -3 after calling the function, which is precisely the error code for unbounded problems.
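To see the unboundedness concretely: without lb, it is enough to find a direction d with A*d <= 0 (so moving along d from a feasible point never breaks a constraint) and f'*d < 0 (so the objective keeps falling). The direction below is one I picked by hand for illustration, and the sketch uses only base MATLAB, so linprog itself is not called. Appending -x <= 0 as extra rows of A and b is an equivalent way of imposing lb = zeros(3,1), and the same direction immediately violates it.

```matlab
f = [-5; -4; -6];
A = [1 -1 1
     3  2 4
     3  2 0];
b = [20; 42; 30];

d = [-4; 6; 0];              % hand-picked direction (illustrative)
disp(A*d)                    % all entries <= 0, and b >= 0, so A*(t*d) <= b for t >= 0
disp(f'*d)                   % negative: f'*(t*d) decreases without bound

for t = [1 10 100 1000]
    x = t*d;                 % x = 0 is feasible, so every t*d is feasible too
    fprintf('t = %4d  feasible = %d  f''*x = %g\n', t, all(A*x <= b), f'*x);
end

% Same effect as lb = zeros(3,1): add -x <= 0 as three extra inequality rows
Aall = [A; -eye(3)];
ball = [b; zeros(3, 1)];
fprintf('d feasible once the bounds are added: %d\n', all(Aall*d <= ball));
```

With the bounds in place, x1 can no longer go negative, which cuts off this direction and makes the problem bounded.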
I am trying to understand how deconv works in Matlab.
Can anyone clarify it by explaining how the following is calculated?
[quotient,remainder]=deconv([1 2 8 4 4],[1 1 2 2])
quotient=
1 1
remainder=
0 0 5 0 2
I need to understand the step by step method of calculation.
Thank you.
Well, if you understand polynomial (long) division, you already have it. This result just says that
x^4 + 2x^3 + 8x^2 + 4x + 4
divided by
x^3 + x^2 + 2x + 2
equals
x + 1
with remainder
5x^2 + 2
The reason is that convolution is the same as polynomial multiplication, and thus deconvolution is polynomial division.
This is mentioned in deconv documentation:
If u and v are vectors of polynomial coefficients, convolving them is equivalent to multiplying the two polynomials, and deconvolution is polynomial division. The result of dividing v by u is quotient q and remainder r.
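The step-by-step calculation is ordinary long division on the coefficient vectors: divide the leading remaining coefficient by the leading coefficient of the divisor, subtract that multiple of the shifted divisor, and repeat. A minimal sketch in base MATLAB (my own loop, not necessarily how deconv is implemented internally) that prints each step and checks the identity u = conv(q, v) + r:

```matlab
u = [1 2 8 4 4];     % dividend: x^4 + 2x^3 + 8x^2 + 4x + 4
v = [1 1 2 2];       % divisor:  x^3 + x^2 + 2x + 2

nq = numel(u) - numel(v) + 1;    % number of quotient coefficients
q  = zeros(1, nq);
r  = u;                          % running remainder, same length as u
for k = 1:nq
    idx    = k:k+numel(v)-1;     % coefficients the shifted divisor lines up with
    q(k)   = r(k) / v(1);        % next quotient coefficient
    r(idx) = r(idx) - q(k) * v;  % subtract q(k) * x^(nq-k) * divisor
    fprintf('step %d: q = %s, r = %s\n', k, mat2str(q), mat2str(r));
end

[qRef, rRef] = deconv(u, v);     % built-in result for comparison
disp(isequal(q, qRef) && isequal(r, rRef))
disp(isequal(conv(q, v) + r, u))
```

The first step subtracts 1*x*(x^3 + x^2 + 2x + 2), leaving r = [0 1 6 2 4]; the second subtracts 1*(x^3 + x^2 + 2x + 2), leaving r = [0 0 5 0 2], that is 5x^2 + 2.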