I am using mle and mlecov to estimate the mean and standard deviation of a scalar noise signal n, which is assumed to be normally distributed, with the following models for the mean and standard deviation:
mean = @(x,y) k(1)+k(2)*x+k(3)*x.^2+k(4)*y+k(5)*y.^2;
sd   = @(x,y) k(6)+k(7)*x+k(8)*x.^2+k(9)*y+k(10)*y.^2;
where x is in the interval [0,3] and y is in the interval [0,pi/2] (thus, scaling does not immediately seem to be an issue). The data set of n, x and y values used for the MLE contains 10981 samples. Here are some graphs to show the sample qualitatively:
Figure 1. Histogram of the noise samples.
Figure 2. Scatter plot of the noise samples vs. the x and y samples respectively.
My goal is to compute the maximum likelihood estimates of the model parameters k(i), i=1,...,10, as well as their standard errors kSE(i) (given by the square roots of the diagonal elements of the asymptotic covariance matrix output by mlecov).
For the maximum likelihood estimation, I minimize the negative log-likelihood of the Normal model,
L(k) = sum_i [ log(sd(x_i,y_i)) + (n_i - mean(x_i,y_i))^2 / (2*sd(x_i,y_i)^2) ] + (N/2)*log(2*pi).
I also supply MATLAB with the analytical gradient of L(k(1),...,k(10)), which is used by mle and mlecov, so that numerical approximation of the gradient hopefully does not contribute to the numerical issue I am about to describe.
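For concreteness, here is a simplified sketch of this kind of setup (not the exact code; it assumes n, x and y are column vectors in the workspace, and the analytical gradient is omitted for brevity):
meanFun = @(k,x,y) k(1) + k(2)*x + k(3)*x.^2 + k(4)*y + k(5)*y.^2;
sdFun   = @(k,x,y) k(6) + k(7)*x + k(8)*x.^2 + k(9)*y + k(10)*y.^2;
% Negative log-likelihood of the Normal model; the covariates x and y are
% captured in the anonymous function, so only the response n is passed to
% mle/mlecov as "data" (the cens and freq arguments are ignored).
negloglik = @(k,dat,cens,freq) sum( log(sdFun(k,x,y)) ...
    + (dat - meanFun(k,x,y)).^2 ./ (2*sdFun(k,x,y).^2) ) + numel(dat)/2*log(2*pi);
k0   = [zeros(1,5) 0.01 zeros(1,4)];    % rough starting values (sd must start positive)
kHat = mle(n, 'nloglf', negloglik, 'start', k0);
acov = mlecov(kHat, n, 'nloglf', negloglik);
kSE  = sqrt(diag(acov));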
Numerical Issue
To demonstrate the issue, I present three scenarios.
Scenario 1. I directly run mle and mlecov on the sample data. This outputs the following Stata-like summary:
-----------------------------------------------------------------------------
Coeffs | Val. Std. Err. z P>|z| [95% Conf. Interval]
---------+-------------------------------------------------------------------
k1 | -0.0153 0.0014 -11.27 0.000 -0.0179 -0.0126
k2 | 0.0075 0.0016 4.79 0.000 0.0045 0.0106
k3 | 0.0045 0.0006 7.44 0.000 0.0033 0.0056
k4 | 0.0131 0.0023 5.57 0.000 0.0085 0.0177
k5 | -0.0101 0.0012 -8.45 0.000 -0.0125 -0.0078
k6 | 0.0114 0.0011 10.25 0.000 0.0092 0.0135
k7 | 0.0244 0.0011 21.86 0.000 0.0222 0.0266
k8 | -0.0001 0.0004 -0.34 0.732 -0.0010 0.0007
k9 | -0.0190 0.0018 -10.48 0.000 -0.0225 -0.0154
k10 | 0.0057 0.0009 6.32 0.000 0.0039 0.0074
-----------------------------------------------------------------------------
The "Val." column corresponds to the k(i) estimates and the "Std. Err." column corresponds to kSE(i). The "P>|z|" column gives the p-value for a single coefficient Wald test of the null hypothesis k(i)==0 (if this p-value is <0.05, we reject the null hypothesis and thus conclude that the coefficient k(i) may be significant at the 95% level).
Note that, to compute the asymptotic covariance matrix of the k(i) estimates, mlecov computes the Hessian H of L(k(1),...,k(10)) - for which I provide the analytic gradient. The condition number of the Hessian is cond(H) = 2.7437e3. The mlecov function performs a Cholesky factorization of the Hessian, which gives an upper-triangular matrix R with cond(R) = 52.38.
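These diagnostics can be reproduced approximately from the mlecov output, since the asymptotic covariance matrix is the inverse of the Hessian at the MLE (a sketch, using acov from the sketch above):
H = inv(acov);      % Hessian of the negative log-likelihood at the MLE (approximately)
cond(H)             % 2.7437e3 in this scenario
R = chol(H);        % upper-triangular Cholesky factor, H = R'*R
cond(R)             % 52.38 in this scenario; cond(R)^2 equals cond(H) in the 2-norm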
Scenario 2. I multiply all samples by 0.1 and thus run mle and mlecov on the sample data n*0.1, x*0.1 and y*0.1. This outputs the following summary:
-----------------------------------------------------------------------------
Coeffs | Val. Std. Err. z P>|z| [95% Conf. Interval]
---------+-------------------------------------------------------------------
k1 | -0.0010 0.0001 -7.39 0.000 -0.0013 -0.0008
k2 | 0.0063 0.0016 3.97 0.000 0.0032 0.0093
k3 | 0.0494 0.0060 8.21 0.000 0.0376 0.0611
k4 | 0.0023 0.0024 0.95 0.340 -0.0024 0.0070
k5 | -0.0462 0.0123 -3.75 0.000 -0.0704 -0.0221
k6 | 0.0014 0.0001 12.30 0.000 0.0012 0.0016
k7 | 0.0220 0.0011 20.86 0.000 0.0200 0.0241
k8 | 0.0078 0.0042 1.87 0.062 -0.0004 0.0160
k9 | -0.0228 0.0020 -11.27 0.000 -0.0267 -0.0188
k10 | 0.0747 0.0097 7.70 0.000 0.0557 0.0937
-----------------------------------------------------------------------------
The p-values have changed. Also, now cond(H) = 9.3831e5 (!!!) and cond(R) = 968.6616. Note that when I remove the second-order terms (x.^2 and y.^2) from the mean and standard deviation models, the problem disappears (i.e. the p-values stay the same and the k(i) values, except for the constant terms k(1) and k(6), are simply scaled by 0.1). Does this indicate a numerical issue?
Scenario 3. I decided to also try scaling n, x and y to the interval [-1,1] by dividing their samples by the largest element (i.e. n(i)=n(i)/max(abs(n)), x(i)=x(i)/max(abs(x)) and y(i)=y(i)/max(abs(y))). Running mle and mlecov on this scaled sample outputs the following summary:
-----------------------------------------------------------------------------
Coeffs | Val. Std. Err. z P>|z| [95% Conf. Interval]
---------+-------------------------------------------------------------------
k1 | -0.0347 0.0041 -8.40 0.000 -0.0428 -0.0266
k2 | 0.1193 0.0141 8.46 0.000 0.0917 0.1470
k3 | 0.0482 0.0164 2.94 0.003 0.0160 0.0803
k4 | -0.0002 0.0120 -0.02 0.987 -0.0238 0.0234
k5 | -0.0305 0.0103 -2.96 0.003 -0.0506 -0.0103
k6 | 0.0557 0.0035 16.11 0.000 0.0489 0.0624
k7 | 0.1131 0.0107 10.60 0.000 0.0922 0.1341
k8 | 0.1164 0.0128 9.13 0.000 0.0914 0.1414
k9 | -0.1132 0.0094 -11.99 0.000 -0.1317 -0.0947
k10 | 0.0583 0.0079 7.37 0.000 0.0428 0.0738
-----------------------------------------------------------------------------
The p-values have changed again! Now cond(H)=4.7550e3 (higher than Scenario 1 (unscaled) but lower than Scenario 2 (everything multiplied by 0.1)). Also, cond(R)=68.9565, which is only slightly higher than for Scenario 1.
My problem
The behavior I expected across the three analyses is that k(i) and kSE(i) would change but the p-values would remain the same - in other words, scaling the data should not make any model coefficient more or less statistically significant. Each of these rescalings is just a reparameterization that multiplies each coefficient k(i), and its standard error kSE(i), by the same factor, so the z statistics k(i)/kSE(i) - and hence the p-values - should be unchanged. This is contrary to the scenarios above, where the p-values change each time!
Please help me debug this numerical issue - or explain whether this is in fact the expected behavior and I have misunderstood something. Thank you for reading this long post and helping - I have tried to include all the relevant problem details here.
First, I assume you are controlling the random seed of the sampling so that it is the same in all scenarios.
That taken care of, I think it may have something to do with the optimization problem you're trying to solve.
I have firsthand experience that tiny numerical changes (in my case, scaling the log-likelihood function by a factor or, equivalently, adding copies of all the data points) will change your result when the objective function is not convex.
I would try to derive the analytical gradient and Hessian of the log-likelihood function with respect to all of the parameters.
This should give you an idea of whether the optimization problem is convex (the Hessian of the negative log-likelihood would have to be positive semidefinite everywhere).
If it is not convex, there are some things you can do to make it more likely that you find the true MLE:
Optimize the function many times (e.g. 1000 random starting points) and pick the estimate with the highest log-likelihood (see the sketch after this list)
Change the tolerance and number of steps of the optimizer
Try other optimizers, like trust-region searches or particle swarms
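A minimal multistart sketch for the first suggestion (assuming the negloglik, k0 and n from the sketch in the question):
bestNll = Inf;
bestK   = [];
for r = 1:1000
    kStart = k0 + 0.01*randn(size(k0));          % perturbed starting point
    try
        kTry = mle(n, 'nloglf', negloglik, 'start', kStart);
        nll  = negloglik(kTry, n, [], []);
        if nll < bestNll                         % lower NLL = higher log-likelihood
            bestNll = nll;
            bestK   = kTry;
        end
    catch
        % skip starting points where the optimizer fails (e.g. sd <= 0)
    end
end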
I would start by simulating a simpler version of this problem and building it up gradually to see where this behaviour starts happening. For example, start with just one parameter for the mean and one for the noise, and see what happens to the p-values then.
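A sketch of such a simplified simulation (constant mean and constant standard deviation, one parameter each; the true values are chosen arbitrarily), comparing the p-values with and without rescaling:
rng(1);
nSim  = 0.02 + 0.05*randn(10981,1);                    % true k = [0.02, 0.05]
nll1  = @(k,dat,c,f) sum(log(k(2)) + (dat-k(1)).^2/(2*k(2)^2)) + numel(dat)/2*log(2*pi);
kHat1 = mle(nSim, 'nloglf', nll1, 'start', [0 0.1]);   % k(2) must stay positive
se1   = sqrt(diag(mlecov(kHat1, nSim, 'nloglf', nll1)));
p1    = 2*(1 - normcdf(abs(kHat1(:)./se1)));           % p-values, original scale
kHat2 = mle(0.1*nSim, 'nloglf', nll1, 'start', [0 0.1]);
se2   = sqrt(diag(mlecov(kHat2, 0.1*nSim, 'nloglf', nll1)));
p2    = 2*(1 - normcdf(abs(kHat2(:)./se2)));           % should match p1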
Related
I need to calculate a parameter, x (defined in my code below), for the given spectral lines in each layer. My atmospheric profile has 10 layers. I know how to calculate x for just one layer; that gives 5 values of x, one for each spectral line (or wavelength).
Now suppose I want to do this for all 10 layers. Then my output should have 10 rows and 5 columns, i.e. size (10,5), where 10 is the number of layers and 5 the number of spectral lines. Any suggestion would be greatly appreciated.
wl=[100 200 300 400 500]; %5 wavelengths, 5 spectral lines
br=[0.12 0.56 0.45 0.67 0.89]; % broadening parameter for each wavelength
p=[1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 ]; % pressure for 10 layers
T=[101 102 103 104 105 106 107 108 109 110]; % temperature for 10 layers
%suppose I want to calculate a parameter, x, for all the layers
% x is defined as,( wavelength*br*T)/p
%when I do the calculation for the first layer,I have to consider all the
%wavelengths , all the broadening parameters and only the first value of
%pressure and only the first value of temperature
for i=1:5
    x(i) = (wl(i)*br(i)*T(1))/p(1);
end
% x is the x parameter for all the wavelengths in the first layer
%Now I want to calculate the x parameter for all the wavelengths in all 10
%layers
%my output should have 10 rows for 10 layers and 5 columns , size= (10,5)
You don't need a loop in this case: (T./p)' is a 10-by-1 column vector and wl.*br is a 1-by-5 row vector, so their product is exactly the 10-by-5 matrix you want:
>> (T./p)'*(wl.*br)
ans =
1.0e+05 *
0.0121 0.1131 0.1364 0.2707 0.4495
0.0136 0.1269 0.1530 0.3037 0.5043
0.0155 0.1442 0.1738 0.3451 0.5729
0.0178 0.1664 0.2006 0.3982 0.6611
0.0210 0.1960 0.2362 0.4690 0.7788
0.0254 0.2374 0.2862 0.5682 0.9434
0.0321 0.2996 0.3611 0.7169 1.1904
0.0432 0.4032 0.4860 0.9648 1.6020
0.0654 0.6104 0.7358 1.4606 2.4253
0.1320 1.2320 1.4850 2.9480 4.8950
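For comparison, a loop-based version that extends the question's code to all layers gives the same result (up to round-off):
x = zeros(10,5);                      % 10 layers (rows), 5 spectral lines (columns)
for i = 1:10                          % layer index
    for j = 1:5                       % spectral line index
        x(i,j) = (wl(j)*br(j)*T(i))/p(i);
    end
end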
import weka.core.Instances.*
filename = 'C:\Users\Girish\Documents\MATLAB\DRESDEN_NSC.csv';
loader = weka.core.converters.CSVLoader();
loader.setFile(java.io.File(filename));
data = loader.getDataSet();
data.setClassIndex(data.numAttributes()-1);
%% classification
classifier = weka.classifiers.trees.J48();
classifier.setOptions( weka.core.Utils.splitOptions('-C 0.25 -M 2') );
classifier.buildClassifier(data);
classifier.toString()
ev = weka.classifiers.Evaluation(data);
v(1) = java.lang.String('-t');
v(2) = java.lang.String(filename);
v(3) = java.lang.String('-split-percentage');
v(4) = java.lang.String('66');
prm = cat(1,v(1:4));
ev.evaluateModel(classifier, prm)
Result:
Time taken to build model: 0.04 seconds
Time taken to test model on training split: 0.01 seconds
=== Error on training split ===
Correctly Classified Instances 767 99.2238 %
Incorrectly Classified Instances 6 0.7762 %
Kappa statistic 0.9882
Mean absolute error 0.0087
Root mean squared error 0.0658
Relative absolute error 1.9717 %
Root relative squared error 14.042 %
Total Number of Instances 773
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.994 0.009 0.987 0.994 0.990 0.984 0.999 0.999 Nikon
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Sony
0.981 0.004 0.990 0.981 0.985 0.980 0.999 0.997 Canon
Weighted Avg. 0.992 0.004 0.992 0.992 0.992 0.988 1.000 0.999
=== Confusion Matrix ===
a b c <-- classified as
306 0 2 | a = Nikon
0 258 0 | b = Sony
4 0 203 | c = Canon
=== Error on test split ===
Correctly Classified Instances 358 89.9497 %
Incorrectly Classified Instances 40 10.0503 %
Kappa statistic 0.8482
Mean absolute error 0.0656
Root mean squared error 0.2464
Relative absolute error 14.8485 %
Root relative squared error 52.2626 %
Total Number of Instances 398
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.885 0.089 0.842 0.885 0.863 0.787 0.908 0.832 Nikon
0.993 0.000 1.000 0.993 0.997 0.995 0.997 0.996 Sony
0.796 0.060 0.841 0.796 0.818 0.749 0.897 0.744 Canon
Weighted Avg. 0.899 0.048 0.900 0.899 0.899 0.853 0.938 0.867
=== Confusion Matrix ===
a b c <-- classified as
123 0 16 | a = Nikon
0 145 1 | b = Sony
23 0 90 | c = Canon
import weka.core.Instances.*
filename = 'C:\Users\Girish\Documents\MATLAB\DRESDEN_NSC.csv';
loader = weka.core.converters.CSVLoader();
loader.setFile(java.io.File(filename));
data = loader.getDataSet();
data.setClassIndex(data.numAttributes()-1);
%% classification
classifier = weka.classifiers.trees.J48();
classifier.setOptions( weka.core.Utils.splitOptions('-C 0.1 -M 1') );
classifier.buildClassifier(data);
classifier.toString()
ev = weka.classifiers.Evaluation(data);
v(1) = java.lang.String('-t');
v(2) = java.lang.String(filename);
v(3) = java.lang.String('-split-percentage');
v(4) = java.lang.String('66');
prm = cat(1,v(1:4));
ev.evaluateModel(classifier, prm)
Result:
Time taken to build model: 0.04 seconds
Time taken to test model on training split: 0 seconds
=== Error on training split ===
Correctly Classified Instances 767 99.2238 %
Incorrectly Classified Instances 6 0.7762 %
Kappa statistic 0.9882
Mean absolute error 0.0087
Root mean squared error 0.0658
Relative absolute error 1.9717 %
Root relative squared error 14.042 %
Total Number of Instances 773
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.994 0.009 0.987 0.994 0.990 0.984 0.999 0.999 Nikon
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Sony
0.981 0.004 0.990 0.981 0.985 0.980 0.999 0.997 Canon
Weighted Avg. 0.992 0.004 0.992 0.992 0.992 0.988 1.000 0.999
=== Confusion Matrix ===
a b c <-- classified as
306 0 2 | a = Nikon
0 258 0 | b = Sony
4 0 203 | c = Canon
=== Error on test split ===
Correctly Classified Instances 358 89.9497 %
Incorrectly Classified Instances 40 10.0503 %
Kappa statistic 0.8482
Mean absolute error 0.0656
Root mean squared error 0.2464
Relative absolute error 14.8485 %
Root relative squared error 52.2626 %
Total Number of Instances 398
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.885 0.089 0.842 0.885 0.863 0.787 0.908 0.832 Nikon
0.993 0.000 1.000 0.993 0.997 0.995 0.997 0.996 Sony
0.796 0.060 0.841 0.796 0.818 0.749 0.897 0.744 Canon
Weighted Avg. 0.899 0.048 0.900 0.899 0.899 0.853 0.938 0.867
=== Confusion Matrix ===
a b c <-- classified as
123 0 16 | a = Nikon
0 145 1 | b = Sony
23 0 90 | c = Canon
I get the same result with both option sets, and it is also the result for the default options (-C 0.25 -M 2) of the J48 classifier.
Please help! I have been stuck on this for a long time and have tried different approaches, but nothing has worked for me.
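One hypothetical first check (assuming the same Weka-from-MATLAB setup as above, and not part of the original post) is to confirm that setOptions actually took effect on the classifier object before it is built:
classifier = weka.classifiers.trees.J48();
classifier.setOptions(weka.core.Utils.splitOptions('-C 0.1 -M 1'));
opts = classifier.getOptions();                    % current options as a Java String[]
disp(char(weka.core.Utils.joinOptions(opts)));     % the printed string should include -C 0.1 -M 1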
I am trying to classify vehicles in MATLAB. I need to reduce the dimensionality of the features to eliminate redundancy, and I am using PCA for this. Unfortunately, the pca function is not returning the expected results: the output seems truncated and I don't understand why.
A summary of this is as follows:
Components_matrix = [Areas_vector MajorAxisLengths_vector MinorAxisLengths_vector Perimeters_vector...
EquivDiameters_vector Extents_vector Orientations_vector Soliditys_vector]
The output is:
Components_matrix =
1.0e+03 *
1.4000 0.1042 0.0220 0.3352 0.0422 0.0003 0.0222 0.0006
2.7690 0.0998 0.0437 0.3973 0.0594 0.0005 0.0234 0.0007
1.7560 0.0853 0.0317 0.2610 0.0473 0.0005 0.0236 0.0008
1.0870 0.0920 0.0258 0.3939 0.0372 0.0003 0.0157 0.0005
0.7270 0.0583 0.0233 0.2451 0.0304 0.0004 0.0093 0.0006
1.2380 0.0624 0.0317 0.2436 0.0397 0.0004 0.0106 0.0007
Then I used the pca function as follows:
[COEFF, SCORE, LATENT] = pca(Components_matrix)
The displayed results are:
COEFF =
0.9984 -0.0533 -0.0057 -0.0177 0.0045
0.0162 0.1810 0.8788 0.0695 -0.3537
0.0099 -0.0218 -0.2809 0.8034 -0.2036
0.0514 0.9817 -0.1739 -0.0016 0.0468
0.0138 -0.0018 0.0616 0.4276 -0.3585
0.0001 -0.0008 -0.0025 0.0215 0.0210
0.0069 0.0158 0.3388 0.4070 0.8380
0.0001 -0.0011 0.0022 0.0198 0.0016
SCORE =
1.0e+03 *
-0.0946 0.0312 0.0184 -0.0014 -0.0009
1.2758 0.0179 -0.0086 -0.0008 0.0001
0.2569 -0.0642 0.0107 0.0016 0.0012
-0.4043 0.1031 -0.0043 0.0015 0.0003
-0.7721 -0.0299 -0.0079 -0.0017 0.0012
-0.2617 -0.0580 -0.0083 0.0008 -0.0020
LATENT =
1.0e+05 *
5.0614
0.0406
0.0014
0.0000
0.0000
I expected, for instance, COEFF and LATENT to be 8x8 and 8x1 matrices, respectively, but that is not what I get. Why is this, and how can it be rectified? Kindly help.
Your usage of pca() and MATLAB's output are correct. The issue is that you have more variables than samples: only 6 vehicles but 8 variables. If you have N samples and N or more variables, there are only N-1 principal components, because any further components would not be unique (the mean-centered data matrix has rank at most N-1). COEFF contains the eigenvectors of the covariance matrix of the input; SCORE(:,1) is the first principal component, SCORE(:,2) the second, and so on, of which there are only N-1 = 5 in total; and LATENT contains the eigenvalues of the covariance matrix, i.e. the amount of variance explained by each successive principal component - again, only N-1 = 5 of them.
There is a more detailed discussion of this here.
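A minimal sketch illustrating this with random data (6 samples, 8 variables; not the vehicle features from the question):
rng(0);                          % for reproducibility
X = randn(6, 8);                 % 6 samples, 8 variables
[COEFF, SCORE, LATENT] = pca(X);
size(COEFF)                      % 8-by-5: only N-1 = 5 components
size(SCORE)                      % 6-by-5
size(LATENT)                     % 5-by-1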
I have a number of wavelengths and their corresponding absorbances.
First I entered the x and y values
x = [400 425 450 475 500 505 510 525];
y = [.24 .382 .486 .574 .608 .608 .602 .508];
To plot the points
plot(x, y, 'o')
Then I want to fit the data.
I'm not sure what degree of polynomial to choose, but since it's a plot of wavelength vs. absorbance, won't there already be a mathematical formula? Like how you know a plot of kinetic energy vs. velocity will be degree 2 because KE = 1/2*m*v^2?
Alright so here is a solution that works fine with your data, using polyfit and polyval to evaluate a polynomial that passes through your data points.
In the doc for polyfit (here), it states that
In general, for n points, you can fit a polynomial of degree n-1 to
exactly pass through the points.
Since you have 8 data points, we can try using a polynomial of degree 7 and see what it gives:
clear
clc
x = [400 425 450 475 500 505 510 525];
y = [.24 .382 .486 .574 .608 .608 .602 .508];
%// Get polynomial coefficients to fit the data
p = polyfit(x,y,7)
%// Create polynomial to plot
fFit = polyval(p,x);
plot(x,y,'o')
hold on
plot(x,fFit,'r--')
hold off
axis([400 525 0 .7]);
legend({'Data points' 'Fitted curve'},'Location','NorthWest')
This produces a plot of the data points with the fitted degree-7 curve passing through them.
So it does look to work very well! If we look at the coefficients given by polyfit:
p =
1.0e+05 *
Columns 1 through 6
0.0000 -0.0000 0.0000 -0.0000 0.0000 -0.0003
Columns 7 through 8
0.0401 -2.7206
Maybe degree 7 was a bit overkill, since the first five coefficients display as 0 at this scale (i.e. they are very small), but anyhow it fits very well!
Hope that helps!
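A possible refinement (not part of the original answer): with x values in the hundreds, polyfit may warn that a degree-7 fit is badly conditioned. Centering and scaling x via the three-output form of polyfit usually helps, and a finer grid gives a smoother curve:
[p2,S,mu] = polyfit(x,y,7);              %// mu = [mean(x); std(x)] used for centering/scaling
xFine = linspace(min(x),max(x),200);     %// finer grid for a smooth curve
yFine = polyval(p2,xFine,S,mu);          %// evaluate with the same centering/scaling
plot(x,y,'o',xFine,yFine,'r--')
legend({'Data points' 'Fitted curve'},'Location','NorthWest')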
I used the following script to compute the p-values for the t-tests:
stats = regstats(Ratio,Delay,'quadratic');
stats.tstat.pval;
It shows a column of 3 numbers,
e.g.
0.0001
0.0002
0.0003
They should correspond to the coefficients of
y = a*x^2 + b*x + c
Then which p-value corresponds to which coefficient?
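One way to check the ordering empirically is to fit synthetic data with known coefficients and compare stats.tstat.beta with them; the entries of stats.tstat.pval are in the same order as the coefficients in stats.tstat.beta (a sketch with made-up data, not from the original question):
xSim = (1:100)';
ySim = 3*xSim.^2 + 2*xSim + 5 + randn(100,1);   % known a = 3, b = 2, c = 5
statsSim = regstats(ySim, xSim, 'quadratic');
disp(statsSim.tstat.beta)    % estimated coefficients, in regstats' term order
disp(statsSim.tstat.pval)    % p-values, in the same order as the coefficients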