Difference between lm and dynlm - linear-regression

I am trying to decide between using lm and dynlm for some regressions, and I have run some tests to check whether I get reliable results when using the lag operator.
However, I found a result that seemed unusual to me.
First, I ran the estimation in lm with the regressor already lagged (1), and then with the lag function applied inside the formula (2).
(1)
lm_PCLR_stack <- lm(IPCA_diff1 ~ IPCA_diff1_lag1 + U3_log - 1, data = final_data_clean)
summary(lm_PCLR_stack)
Call:
lm(formula = IPCA_diff1 ~ IPCA_diff1_lag1 + U3_log - 1, data = final_data_clean)
Residuals:
Min 1Q Median 3Q Max
-0.0067594 -0.0021942 -0.0001499 0.0017506 0.0123410
Coefficients:
Estimate Std. Error t value Pr(>|t|)
IPCA_diff1_lag1 0.5348701 0.0934555 5.723 1.53e-07 ***
U3_log 0.0030485 0.0007368 4.138 8.22e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.003406 on 85 degrees of freedom
Multiple R-squared: 0.8029, Adjusted R-squared: 0.7983
F-statistic: 173.1 on 2 and 85 DF, p-value: < 2.2e-16
(2)
lmlag_PCLR_stack <- lm(IPCA_diff1 ~ lag(IPCA_diff1, 1) + U3_log - 1, data = final_data_clean)
summary(lmlag_PCLR_stack)
Call:
lm(formula = IPCA_diff1 ~ lag(IPCA_diff1, 1) + U3_log - 1, data = final_data_clean)
Residuals:
Min 1Q Median 3Q Max
-0.0067577 -0.0021978 -0.0001549 0.0017687 0.0123458
Coefficients:
Estimate Std. Error t value Pr(>|t|)
lag(IPCA_diff1, 1) 0.5342741 0.0942942 5.666 2.00e-07 ***
U3_log 0.0030494 0.0007412 4.114 9.03e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.003426 on 84 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.8004, Adjusted R-squared: 0.7956
F-statistic: 168.4 on 2 and 84 DF, p-value: < 2.2e-16
So I repeated the process using dynlm. For the regression with the already-lagged regressor (3) the result was the same, but for the regression with the lag operator (4) the result was not only different but also rather strange.
(3)
dynlm_PCLR_stack <- dynlm(IPCA_diff1 ~ IPCA_diff1_lag1 + U3_log - 1, data = final_data_clean)
summary(dynlm_PCLR_stack)
Time series regression with "numeric" data:
Start = 1, End = 87
Call:
dynlm(formula = IPCA_diff1 ~ IPCA_diff1_lag1 + U3_log - 1, data = final_data_clean)
Residuals:
Min 1Q Median 3Q Max
-0.0067594 -0.0021942 -0.0001499 0.0017506 0.0123410
Coefficients:
Estimate Std. Error t value Pr(>|t|)
IPCA_diff1_lag1 0.5348701 0.0934555 5.723 1.53e-07 ***
U3_log 0.0030485 0.0007368 4.138 8.22e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.003406 on 85 degrees of freedom
Multiple R-squared: 0.8029, Adjusted R-squared: 0.7983
F-statistic: 173.1 on 2 and 85 DF, p-value: < 2.2e-16
(4)
dynlm_lag_PCLR_stack <- dynlm(IPCA_diff1 ~ L(IPCA_diff1) + (U3_log) - 1, data = final_data_clean)
summary(dynlm_lag_PCLR_stack)
essentially perfect fit: summary may be unreliable
Time series regression with "numeric" data:
Start = 1, End = 87
Call:
dynlm(formula = IPCA_diff1 ~ L(IPCA_diff1) + (U3_log) - 1, data = final_data_clean)
Residuals:
Min 1Q Median 3Q Max
-2.302e-18 -4.373e-19 -2.505e-19 -1.223e-19 3.053e-17
Coefficients:
Estimate Std. Error t value Pr(>|t|)
L(IPCA_diff1) 1.000e+00 9.088e-17 1.10e+16 <2e-16 ***
U3_log 1.575e-19 7.112e-19 2.21e-01 0.825
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.359e-18 on 85 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.217e+32 on 2 and 85 DF, p-value: < 2.2e-16
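To make the comparison concrete, here is a minimal sketch with a toy vector (purely illustrative, not my actual data) of what the different lag implementations return before the variables reach lm()/dynlm():
# toy vector, purely illustrative
x <- c(1, 2, 3, 4, 5)
stats::lag(x, 1)      # plain numeric vector: only a time attribute changes, the values are not shifted
stats::lag(ts(x), 1)  # ts object: the time index is shifted, so aligning it with x gives a true one-period lag
# dplyr::lag(x, 1)    # if dplyr is attached, it masks lag() and returns NA 1 2 3 4 (values shifted, padded with NA)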
Any tips on what might have happened? There are no NAs in the dataset.
Thanks in advance!

Related

Matlab calculating line parameter x for 10 layers in the given wavelength range

I know how to calculate the line parameter, defined as x below, for one layer over the given wavelength range of 50 to 550 um. Now I want to repeat this calculation for all 10 layers. All the other parameters remain constant while the temperature varies from layer 1 to 10. Any suggestion would be greatly appreciated.
wl=[100 200 300 400 500]; %5 wavelengths, 5 spectral lines
br=[0.12 0.56 0.45 0.67 0.89]; % broadening parameter for each wavelength
T=[101 102 103 104 105 106 107 108 109 110];% temperature for 10 layers
wlall=linspace(50,550,40);%all the wavelength in 50um to 550 um range
% x is defined as,
%(br*wl/(br*br + (wlall-wl)^2))*br;
%If I do a calculation for the first line
((br(1)*T(1)*wl(1))./(br(1)*br(1)*(T(1)) + (wlall(:)-wl(1)).^2))*br(1)*T(1)
%Now I'm going to calculate it for all the lines in the first layer
k= repmat(wlall,5,1);
for i=1:5;
kn(i,:)=(br(i)*T(1)*wl(i)./(br(i)*br(i)*T(1) + (k(i,:)-wl(i)).^2))*br(i)*T(1);
end
%The above code gives me the x parameter for all the wavelengths in the
%given range (50 to 550 um) in the first layer; the dimension is (5,40)
% I need only the maximum value of each column
an=(kn(:,:)');
[ll,mm]=sort(an,2,'descend');
vn=(ll(:,1))'
%Now my output has dimension (1,40): 1 is for the first layer, 40 is
%for the maximum x parameter corresponding to each wavelength in the first layer
%Now I want to calculate the x parameter in all 10 layers, so T should vary
%from T(1) to T(10) and get the
%maximum in each column, so my output should have dimension (10,40)
You just need to run an extra 'for' loop for each value of 'T'. Here is an example:
clc; close all; clear all;
wl=[100 200 300 400 500]; %5 wavelengths, 5 spectral lines
br=[0.12 0.56 0.45 0.67 0.89]; % broadening parameter for each wavelength
T=[101 102 103 104 105 106 107 108 109 110];% temperature for 10 layers
wlall=linspace(50,550,40);%all the wavelength in 50um to 550 um range
% x is defined as,
%(br*wl/(br*br + (wlall-wl)^2))*br;
%If I do a calculation for the first line
((br(1)*T(1)*wl(1))./(br(1)*br(1)*(T(1)) + (wlall(:)-wl(1)).^2))*br(1)*T(1)
%Now I'm going to calculate it for all the lines in the first layer
k= repmat(wlall,5,1);
for index = 1:numel(T)                % loop over the 10 layers (temperatures)
    for i = 1:5                       % loop over the 5 spectral lines
        kn(i,:,index) = (br(i)*T(index)*wl(i)./(br(i)*br(i)*T(index) + (k(i,:)-wl(i)).^2))*br(i)*T(index);
    end
    an(:,:,index) = transpose(kn(:,:,index));  % 40 x 5 slice for this layer
    vn(:,index) = max(an(:,:,index), [], 2);   % maximum over the 5 lines for each wavelength
end
vn = transpose(vn);                   % final size (10,40): layers x wavelengths

Matlab to calculate a spectral line parameter for each layer

I need to calculate a parameter defined as x (this is defined in my code below) for the given spectral lines in each layer. My atmospheric profile has 10 layers. I know how to calculate x for just one layer; then I get 5 values of x, one for each spectral line (or wavelength).
Suppose I want to do this for all 10 layers. Then my output should have 10 rows and 5 columns, i.e. size (10,5), where 10 represents the layer and 5 represents the spectral line. Any suggestion would be greatly appreciated.
wl=[100 200 300 400 500]; %5 wavelengths, 5 spectral lines
br=[0.12 0.56 0.45 0.67 0.89]; % broadening parameter for each wavelength
p=[1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 ]; % pressure for 10 layers
T=[101 102 103 104 105 106 107 108 109 110]; % temperature for 10 layers
%suppose I want to calculate a parameter, x, for all the layers
% x is defined as (wavelength*br*T)/p
%when I do the calculation for the first layer, I have to consider all the
%wavelengths, all the broadening parameters, and only the first value of
%pressure and only the first value of temperature
for i=1:5;
x(i)= (wl(i)*br(i)*T(1))/p(1);
end
% x is the x parameter for all the wavelengths in the first layer
%Now I want to calculate the x parameter for all the wavelengths in all 10
%layers
%my output should have 10 rows for 10 layers and 5 columns , size= (10,5)
You don't need loops in this case: the outer product of the 10-element column vector (T./p)' and the 5-element row vector (wl.*br) gives the full (10,5) result directly.
>> (T./p)'*(wl.*br)
ans =
1.0e+05 *
0.0121 0.1131 0.1364 0.2707 0.4495
0.0136 0.1269 0.1530 0.3037 0.5043
0.0155 0.1442 0.1738 0.3451 0.5729
0.0178 0.1664 0.2006 0.3982 0.6611
0.0210 0.1960 0.2362 0.4690 0.7788
0.0254 0.2374 0.2862 0.5682 0.9434
0.0321 0.2996 0.3611 0.7169 1.1904
0.0432 0.4032 0.4860 0.9648 1.6020
0.0654 0.6104 0.7358 1.4606 2.4253
0.1320 1.2320 1.4850 2.9480 4.8950

Interpreting Matlab fitlm

I have a series of 200 x/y data points and am using MATLAB to generate a model. I am trying to determine what order the polynomial function generated by fitlm should be. I tried starting at order 6, hoping that some of the higher-order coefficients wouldn't be significant, but I get the following:
Linear regression model:
y ~ 1 + x1 + x1^2 + x1^3 + x1^4 + x1^5 + x1^6
Estimated Coefficients:
Estimate SE tStat pValue
___________ __________ _______ __________
(Intercept) 0 0 NaN NaN
x1 11897 462.8 25.706 2.1825e-64
x1^2 -442.92 26.689 -16.596 4.438e-39
x1^3 7.323 0.55975 13.083 1.8862e-28
x1^4 -0.059949 0.0053902 -11.122 1.516e-22
x1^5 0.00023784 2.4198e-05 9.8286 9.3122e-19
x1^6 -3.6511e-07 4.1034e-08 -8.8978 4.0169e-16
Number of observations: 201, Error degrees of freedom: 195
Root Mean Squared Error: 1.36e+04
R-squared: 0.519, Adjusted R-Squared 0.506
F-statistic vs. constant model: 42, p-value = 3.1e-29
I get the following with a polynomial of order 5:
Linear regression model:
y ~ 1 + x1 + x1^2 + x1^3 + x1^4
Estimated Coefficients:
Estimate SE tStat pValue
___________ __________ _______ __________
(Intercept) 1.0011e+05 19.058 5252.9 0
x1 -19.02 1.3004 -14.626 3.0955e-33
x1^2 0.27502 0.026087 10.542 7.1559e-21
x1^3 -0.0029912 0.00019381 -15.434 1.0751e-35
x1^4 -2.1979e-06 4.7601e-07 -4.6174 7.0203e-06
Number of observations: 201, Error degrees of freedom: 196
Root Mean Squared Error: 52.4
R-squared: 1, Adjusted R-Squared 1
F-statistic vs. constant model: 5.8e+05, p-value = 0
Now, I noticed that in all cases the p-values are very low (good, I suppose) and R-squared is greater than 0.5 (which I assume is also good).
So I am not really sure what to make of these results. I know that I should aim for a lower-order polynomial, but how can I justify this?

How to plot a DET curve from results provided by Weka?

I am facing a classification problem with 4 classes. I used Weka for this classification and I get a result in this form:
Correctly Classified Instances 3860 96.5 %
Incorrectly Classified Instances 140 3.5 %
Kappa statistic 0.9533
Mean absolute error 0.0178
Root mean squared error 0.1235
Relative absolute error 4.7401 %
Root relative squared error 28.5106 %
Total Number of Instances 4000
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.98 0.022 0.936 0.98 0.957 0.998 A
0.92 0.009 0.973 0.92 0.946 0.997 B
0.991 0.006 0.982 0.991 0.987 1 C
0.969 0.01 0.971 0.969 0.97 0.998 D
Weighted Avg. 0.965 0.012 0.965 0.965 0.965 0.998
=== Confusion Matrix ===
a b c d <-- classified as
980 17 1 2 | a = A
61 920 1 18 | b = B
0 0 991 9 | c = C
6 9 16 969 | d = D
My goal now is to draw the DET (Detection Error Trade-off) curve from the results provided by Weka.
I found MATLAB code that allows me to draw the DET curve; here are some lines of code from this function:
Ntrials_True = 1000;
True_scores = randn(Ntrials_True,1);
Ntrials_False = 1000;
mean_False = -3;
stdv_False = 1.5;
False_scores = stdv_False * randn(Ntrials_False,1) + mean_False;
%-----------------------
% Compute Pmiss and Pfa from experimental detection output scores
[P_miss,P_fa] = Compute_DET(True_scores,False_scores);
The code of the function Compute_DET is:
function [Pmiss, Pfa] = Compute_DET(true_scores, false_scores)
num_true = max(size(true_scores));
num_false = max(size(false_scores));
total=num_true+num_false;
Pmiss = zeros(num_true+num_false+1, 1); %preallocate for speed
Pfa = zeros(num_true+num_false+1, 1); %preallocate for speed
scores(1:num_false,1) = false_scores;
scores(1:num_false,2) = 0;
scores(num_false+1:total,1) = true_scores;
scores(num_false+1:total,2) = 1;
scores=DETsort(scores);
sumtrue=cumsum(scores(:,2),1);
sumfalse=num_false - ([1:total]'-sumtrue);
Pmiss(1) = 0;
Pfa(1) = 1.0;
Pmiss(2:total+1) = sumtrue ./ num_true;
Pfa(2:total+1) = sumfalse ./ num_false;
return
But I have a problem understanding the meaning of the different parameters. For example, what is the significance of mean_False and stdv_False, and what is the correspondence with the parameters provided by Weka?

Find exact numbers of true positives in Weka

In Weka, I can easily find the TP Rate and the total number of correctly classified instances from the confusion matrix, but is there any way to see the exact number of TP and/or TN?
And do you know of any way to find these values in MATLAB ANFIS?
Since you are mentioning MATLAB, I'm assuming you are using the Java API to the Weka library to programmatically build classifiers.
In that case, you can evaluate the model using the weka.classifiers.Evaluation class, which provides all sorts of statistics.
Assuming you already have the weka.jar file on the Java class path (see the javaaddpath function), here is an example in MATLAB:
%# data
fName = 'C:\Program Files\Weka-3-7\data\iris.arff';
loader = weka.core.converters.ArffLoader();
loader.setFile( java.io.File(fName) );
data = loader.getDataSet();
data.setClassIndex( data.numAttributes()-1 );
%# classifier
classifier = weka.classifiers.trees.J48();
classifier.setOptions( weka.core.Utils.splitOptions('-C 0.25 -M 2') );
classifier.buildClassifier( data );
%# evaluation
evl = weka.classifiers.Evaluation(data);
pred = evl.evaluateModel(classifier, data, {''});
%# display
disp(classifier.toString())
disp(evl.toSummaryString())
disp(evl.toClassDetailsString())
disp(evl.toMatrixString())
%# confusion matrix and other stats
cm = evl.confusionMatrix();
%# number of TP/TN/FP/FN with respect to class=1 (Iris-versicolor)
tp = evl.numTruePositives(1);
tn = evl.numTrueNegatives(1);
fp = evl.numFalsePositives(1);
fn = evl.numFalseNegatives(1);
%# class=XX is a zero-based index which maps to the following class values
classValues = arrayfun(@(k)char(data.classAttribute.value(k-1)), ...
1:data.classAttribute.numValues, 'Uniform',false);
The output:
J48 pruned tree
------------------
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
| petalwidth <= 1.7
| | petallength <= 4.9: Iris-versicolor (48.0/1.0)
| | petallength > 4.9
| | | petalwidth <= 1.5: Iris-virginica (3.0)
| | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
| petalwidth > 1.7: Iris-virginica (46.0/1.0)
Number of Leaves : 5
Size of the tree : 9
Correctly Classified Instances 147 98 %
Incorrectly Classified Instances 3 2 %
Kappa statistic 0.97
Mean absolute error 0.0233
Root mean squared error 0.108
Relative absolute error 5.2482 %
Root relative squared error 22.9089 %
Coverage of cases (0.95 level) 98.6667 %
Mean rel. region size (0.95 level) 34 %
Total Number of Instances 150
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Iris-setosa
0.980 0.020 0.961 0.961 0.961 0.955 0.990 0.969 Iris-versicolor
0.960 0.010 0.980 0.980 0.980 0.955 0.990 0.970 Iris-virginica
Weighted Avg. 0.980 0.010 0.980 0.980 0.980 0.970 0.993 0.980
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 49 1 | b = Iris-versicolor
0 2 48 | c = Iris-virginica