I'm using a version of pmdarima that no longer includes the statsmodels ARIMA or ARMA classes. How do I interpret the SARIMAX results without (p,d,q)? - pmdarima

auto_arima(df1['Births'],seasonal=False).summary()
SARIMAX Results
Dep. Variable: y No. Observations: 120
Model: SARIMAX Log Likelihood -409.745
Date: Mon, 23 Aug 2021 AIC 823.489
Time: 06:55:06 BIC 829.064
Sample: 0 - 120 HQIC 825.753
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
intercept 39.7833 0.687 57.896 0.000 38.437 41.130
sigma2 54.1197 8.319 6.506 0.000 37.815 70.424
Ljung-Box (L1) (Q): 0.85 Jarque-Bera (JB): 2.69
Prob(Q): 0.36 Prob(JB): 0.26
Heteroskedasticity (H): 0.80 Skew: 0.26
Prob(H) (two-sided): 0.48 Kurtosis: 2.48
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
auto_arima(df1['Births'],seasonal=False)

Related

Matrix columns correlations, excluding self-correlation, Matlab

I have a couple of matrices (1800 x 27) that represent subjects and their recordings (the equivalent of 3 minutes for each of 27 subjects); each column represents a subject.
I need to compute intercorrelations between subjects, say correlating F to G, G to H, and H to F, for all 27 subjects.
I use the corr command, corr(B), where B is the matrix, and it returns something like this:
1 0.07 -0.05 0.10 0.04 0.12
0.07 1 -0.02 -0.08 0.17 0.03
-0.05 -0.02 1 0.04 0.16 0.13
0.10 -0.08 0.04 1 -0.04 0.34
0.04 0.18 0.16 -0.04 1 0.13
How can I adjust the code to exclude the self-correlations (e.g. F to F) so I don't get the "1" values that appear in every row/column?
I have to perform some transformations afterwards, such as the Fisher z-transformation, which returns Inf for each "1", and as a result I can't carry out the further calculations.
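One straightforward approach (a minimal sketch, assuming B is the 1800 x 27 data matrix from the question and that corr is available) is to overwrite the diagonal of the correlation matrix with NaN before the Fisher z-transformation, so atanh never sees the exact 1s:
C = corr(B);                       % 27-by-27 correlation matrix, one column of B per subject
C(logical(eye(size(C)))) = NaN;    % replace the diagonal self-correlations (the 1s) with NaN
Z = atanh(C);                      % Fisher z-transform; the NaNs stay NaN instead of becoming Inf
Any averaging done afterwards can then use NaN-aware options such as mean(Z, 'omitnan').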

Octave - why is surf not working but trisurf does?

I am able to plot a trisurf chart, but surf does not work.
What am I doing wrong?
pkg load statistics;
figure (1,'name','Matrix Map');
colormap('hot');
t = dlmread('C:\Map3D.csv');
tx =t(:,1);ty=t(:,2);tz=t(:,3);
tri = delaunay(tx,ty);
handle = surf(tx,ty,tz); #This does NOT work
#handle = trisurf(tri,tx,ty,tz); #This does work
error: surface: rows (Z) must be the same as length (Y) and columns (Z) must be the same as length (X)
My data is in a CSV (commas not shown here)
1 2 -0.32
2 2 0.33
3 2 0.39
4 2 0.09
5 2 0.14
1 2.5 -0.19
2 2.5 0.13
3 2.5 0.15
4 2.5 0.24
5 2.5 0.33
1 3 0.06
2 3 0.44
3 3 0.36
4 3 0.45
5 3 0.51
1 3.5 0.72
2 3.5 0.79
3 3.5 0.98
4 3.5 0.47
5 3.5 0.55
1 4 0.61
2 4 0.13
3 4 0.44
4 4 0.47
5 4 0.58
1 4.5 0.85
The surf error message is different in Matlab and in Octave.
Error message from Matlab:
Z must be a matrix, not a scalar or vector.
The problem is pretty clear here, since you specified Z (tz in your case) as a vector.
Error message from Octave:
surface: rows (Z) must be the same as length (Y) and columns (Z) must be the same as length (X)
This is where things go wrong: in your example, columns(Z) = 1 while length(X) = 26, and that mismatch is what triggers the error.
One consequence of this is that with surf you cannot have "holes" or undefined points on your grid. In your case you have an X-grid from 1 to 5 and a Y-grid from 2 to 4.5, but the point with coordinates (2, 4.5) is not defined.
@Luis Mendo: Matlab and Octave do allow the prototype surf(matrix_x, matrix_y, matrix_z), but the third argument matrix_z still has to be a matrix (not a scalar or vector). Apparently, a matrix with only one row or column is not considered a matrix.
To solve the issue, I suggest something like:
tx = 1:5; % tx is a vector of length 5
ty = 2:0.5:4.5; % ty is a vector of length 6
tz = [-0.32 0.33 0.39 0.09 0.14;
-0.19 0.13 0.15 0.24 0.33;
0.06 0.44 0.36 0.45 0.51;
0.72 0.79 0.98 0.47 0.55;
0.61 0.13 0.44 0.47 0.58;
0.85 0. 0. 0. 0.]; % tz is a matrix of size 6*5
surf(tx,ty,tz);
Note that I had to invent some values at the points where your grid was not defined; I used 0, but you can replace it with whatever value you prefer.
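If you would rather build the matrices from the CSV instead of typing them in by hand, here is a minimal sketch along the same lines, assuming the file lists the points x-within-y exactly as shown above and simply dropping the single incomplete point at (1, 4.5):
t  = dlmread('C:\Map3D.csv');
t  = t(1:25,:);                  % keep only the complete 5-by-5 part of the grid
tx = reshape(t(:,1), 5, []);     % X coordinates as a 5-by-5 matrix
ty = reshape(t(:,2), 5, []);     % Y coordinates as a 5-by-5 matrix
tz = reshape(t(:,3), 5, []);     % Z values, same size as tx and ty
surf(tx, ty, tz);
This only works because the points lie on a regular grid in a regular order; for genuinely scattered data, griddata (or the trisurf call you already have) is the more general route.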

classifier.setOptions( weka.core.Utils.splitOptions() ) takes only default values in MATLAB, even when other values are provided

import weka.core.Instances.*
filename = 'C:\Users\Girish\Documents\MATLAB\DRESDEN_NSC.csv';
loader = weka.core.converters.CSVLoader();
loader.setFile(java.io.File(filename));
data = loader.getDataSet();
data.setClassIndex(data.numAttributes()-1);
%% classification
classifier = weka.classifiers.trees.J48();
classifier.setOptions( weka.core.Utils.splitOptions('-C 0.25 -M 2') );
classifier.buildClassifier(data);
classifier.toString()
ev = weka.classifiers.Evaluation(data);
v(1) = java.lang.String('-t');
v(2) = java.lang.String(filename);
v(3) = java.lang.String('-split-percentage');
v(4) = java.lang.String('66');
prm = cat(1,v(1:4));
ev.evaluateModel(classifier, prm)
Result:
Time taken to build model: 0.04 seconds
Time taken to test model on training split: 0.01 seconds
=== Error on training split ===
Correctly Classified Instances 767 99.2238 %
Incorrectly Classified Instances 6 0.7762 %
Kappa statistic 0.9882
Mean absolute error 0.0087
Root mean squared error 0.0658
Relative absolute error 1.9717 %
Root relative squared error 14.042 %
Total Number of Instances 773
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.994 0.009 0.987 0.994 0.990 0.984 0.999 0.999 Nikon
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Sony
0.981 0.004 0.990 0.981 0.985 0.980 0.999 0.997 Canon
Weighted Avg. 0.992 0.004 0.992 0.992 0.992 0.988 1.000 0.999
=== Confusion Matrix ===
a b c <-- classified as
306 0 2 | a = Nikon
0 258 0 | b = Sony
4 0 203 | c = Canon
=== Error on test split ===
Correctly Classified Instances 358 89.9497 %
Incorrectly Classified Instances 40 10.0503 %
Kappa statistic 0.8482
Mean absolute error 0.0656
Root mean squared error 0.2464
Relative absolute error 14.8485 %
Root relative squared error 52.2626 %
Total Number of Instances 398
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.885 0.089 0.842 0.885 0.863 0.787 0.908 0.832 Nikon
0.993 0.000 1.000 0.993 0.997 0.995 0.997 0.996 Sony
0.796 0.060 0.841 0.796 0.818 0.749 0.897 0.744 Canon
Weighted Avg. 0.899 0.048 0.900 0.899 0.899 0.853 0.938 0.867
=== Confusion Matrix ===
a b c <-- classified as
123 0 16 | a = Nikon
0 145 1 | b = Sony
23 0 90 | c = Canon
import weka.core.Instances.*
filename = 'C:\Users\Girish\Documents\MATLAB\DRESDEN_NSC.csv';
loader = weka.core.converters.CSVLoader();
loader.setFile(java.io.File(filename));
data = loader.getDataSet();
data.setClassIndex(data.numAttributes()-1);
%% classification
classifier = weka.classifiers.trees.J48();
classifier.setOptions( weka.core.Utils.splitOptions('-C 0.1 -M 1') );
classifier.buildClassifier(data);
classifier.toString()
ev = weka.classifiers.Evaluation(data);
v(1) = java.lang.String('-t');
v(2) = java.lang.String(filename);
v(3) = java.lang.String('-split-percentage');
v(4) = java.lang.String('66');
prm = cat(1,v(1:4));
ev.evaluateModel(classifier, prm)
Result:
Time taken to build model: 0.04 seconds
Time taken to test model on training split: 0 seconds
=== Error on training split ===
Correctly Classified Instances 767 99.2238 %
Incorrectly Classified Instances 6 0.7762 %
Kappa statistic 0.9882
Mean absolute error 0.0087
Root mean squared error 0.0658
Relative absolute error 1.9717 %
Root relative squared error 14.042 %
Total Number of Instances 773
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.994 0.009 0.987 0.994 0.990 0.984 0.999 0.999 Nikon
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Sony
0.981 0.004 0.990 0.981 0.985 0.980 0.999 0.997 Canon
Weighted Avg. 0.992 0.004 0.992 0.992 0.992 0.988 1.000 0.999
=== Confusion Matrix ===
a b c <-- classified as
306 0 2 | a = Nikon
0 258 0 | b = Sony
4 0 203 | c = Canon
=== Error on test split ===
Correctly Classified Instances 358 89.9497 %
Incorrectly Classified Instances 40 10.0503 %
Kappa statistic 0.8482
Mean absolute error 0.0656
Root mean squared error 0.2464
Relative absolute error 14.8485 %
Root relative squared error 52.2626 %
Total Number of Instances 398
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.885 0.089 0.842 0.885 0.863 0.787 0.908 0.832 Nikon
0.993 0.000 1.000 0.993 0.997 0.995 0.997 0.996 Sony
0.796 0.060 0.841 0.796 0.818 0.749 0.897 0.744 Canon
Weighted Avg. 0.899 0.048 0.900 0.899 0.899 0.853 0.938 0.867
=== Confusion Matrix ===
a b c <-- classified as
123 0 16 | a = Nikon
0 145 1 | b = Sony
23 0 90 | c = Canon
I get the same result with both sets of split options, and it is the result for the default options (-C 0.25 -M 2) of the J48 classifier.
Please help! I have been stuck on this for a long time and have tried different approaches, but nothing has worked for me.
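One thing worth checking (a sketch based on an assumption about how the static Evaluation.evaluateModel(Classifier, String[]) call handles its option array, so please verify against your Weka version) is whether the options left over in prm are what actually get applied to J48; with no -C/-M in prm, the classifier may simply fall back to its defaults. Printing the options the object really holds, and passing the scheme-specific options inside the array, would look roughly like this:
% Show the options the J48 object holds after setOptions:
disp(char(weka.core.Utils.joinOptions(classifier.getOptions())))
% Sketch: include the scheme-specific options in the array given to evaluateModel,
% so they survive even if that call re-applies options to the classifier.
v(1) = java.lang.String('-t');
v(2) = java.lang.String(filename);
v(3) = java.lang.String('-split-percentage');
v(4) = java.lang.String('66');
v(5) = java.lang.String('-C');
v(6) = java.lang.String('0.1');
v(7) = java.lang.String('-M');
v(8) = java.lang.String('1');
prm = cat(1, v(1:8));
ev.evaluateModel(classifier, prm)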

Mystery degree of freedom in VAR coefficient stderr

I've been testing the vector autoregressive coefficient estimation function vgxvarx in Matlab's Econometrics toolbox. Once the coefficients are determined, vgxdisp gives you the choice of showing the standard errors estimated according to maximum likelihood or minimum bias. The only difference between the two is normalization by the number of observations versus the degrees of freedom, respectively. Since both are constants, you should be able to verify the two sets of standard errors by converting one into the other: just unnormalize by one constant and renormalize by the other.
I tried this and found that the minimum bias estimate of standard error seems to be off by one in the degrees of freedom. In the script below, I use vgxvarx to calculate VAR model coefficients and then request maximum likelihood and minimum bias estimates of their standard errors from vgxdisp (DoFAdj=false and true, respectively). To validate the two, I then convert the standard errors from ML to min bias by unnormalizing by the number of observations (nPoints) and renormalizing by degrees of freedom LESS ONE (found by trial and error). These scalings have to be square-rooted because they apply to variance and we're comparing standard errors.
I'm wondering if anyone can point out whether I am missing something basic that explains this mystery degree of freedom?
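Concretely, in terms of the quantities computed in the script below (nPoints from the data, NumParam from vgxcount, and stderr_ML standing in for the vector of ML standard errors that vgxdisp reports with DoFAdj=false), the conversion I expected versus the one that actually matches the DoFAdj=true output is:
% Expected: renormalize the ML standard errors by the usual degrees of freedom
stderr_minBias_expected = stderr_ML * sqrt( nPoints / (nPoints - NumParam) );
% Observed: the DoFAdj=true numbers only match when one extra degree is removed
stderr_minBias_matching = stderr_ML * sqrt( nPoints / (nPoints - NumParam - 1) );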
I originally posted this to Usenet. Here is a modification of the original code that sets the data inline so that it doesn't need to be obtained from http://www.econ.uiuc.edu/~econ472/eggs.txt.
clear variables
fnameDiary = [mfilename '.out.txt'];
if logical(exist(fnameDiary,'file'))
diary off
delete(fnameDiary)
end % if
diary(fnameDiary) % Also turns on diary
CovarType='full' % 'full'
nMaxLag=3
clf
tbChicEgg=table([
1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 ...
1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 ...
1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 ...
1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 ...
1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 ...
2002 2003 2004 2005 2006 ...
]',[
468491 449743 436815 444523 433937 389958 403446 423921 ...
389624 418591 438288 422841 476935 542047 582197 516497 ...
523227 467217 499644 430876 456549 430988 426555 398156 ...
396776 390708 383690 391363 374281 387002 369484 366082 ...
377392 375575 382262 394118 393019 428746 425158 422096 ...
433280 421763 404191 408769 394101 379754 378361 386518 ...
396933 400585 392110 384838 378609 364584 374000 370000 ...
373000 380000 356000 356000 353000 363000 371000 380000 ...
386000 388000 393000 410000 425000 437000 437000 444000 ...
444000 450000 454000 453000 453000 ...
]',[
3581 3532 3327 3255 3156 3081 3166 3443 3424 3561 3640 3840 ...
4456 5000 5366 5154 5130 5077 5032 5148 5404 5322 5323 5307 ...
5402 5407 5500 5442 5442 5542 5339 5358 5403 5345 5435 5474 ...
5540 5836 5777 5629 5704 5806 5742 5502 5461 5382 5377 5408 ...
5608 5777 5825 5625 5800 5656 5683 5700 5758 5867 5808 5600 ...
5675 5750 5892 5992 6158 6233 6367 6458 6650 6908 7058 7175 ...
7275 7292 7425 7500 7575 ...
]', ...
'VariableNames', {'year' 'chic' 'egg'} ...
);
seriesNames={'chic','egg'};
varChicEgg = vgxset( 'Series', seriesNames, 'n',2 );
chicEgg = table2array(tbChicEgg(:,seriesNames));
dChicEgg = diff(chicEgg);
dChicEgg = bsxfun( @minus, dChicEgg, mean(dChicEgg) ); % Make 0-mean
dChicEgg0 = dChicEgg(1:nMaxLag,:); % Presample-data
dChicEgg = dChicEgg(1+nMaxLag:end,:);
nPoints = length(dChicEgg)
yrs = table2array(tbChicEgg(1+nMaxLag:end,'year'));
yrs = yrs(1:nPoints);
subplot(3,1,1);
plotyy( yrs,dChicEgg(:,1) , yrs,dChicEgg(:,2) );
for DoFAdj = [false true]
% DoFAdj=1 means std err normalizes by df rather than n, where
% n=number of observations and df is n less the number of
% parameters estimated (from vgxdisp or vgxcount's NumParam)
[est.spec est.stdErr est.LLF est.W] = vgxvarx( ...
vgxset( varChicEgg, 'nAR',nMaxLag ), ...
dChicEgg, NaN, dChicEgg0, ...
'StdErrType', 'all', ...
'CovarType', CovarType ...
);
fprintf('-------------------------\nDoFAdj=%g\n',DoFAdj);
subplot(3,1,2+DoFAdj)
plotyy(yrs,est.W(:,1),yrs,est.W(:,2))
vgxdisp(est.spec,est.stdErr,'DoFAdj',DoFAdj);
end
fprintf('\nConvert ML stderr (DoFAdj=false) to min bias (DoFAdj=true):\n');
fprintf('Number of parameters: ')
[~,NumParam]=vgxcount(est.spec)
degreeFree = nPoints - NumParam
fprintf('\n');
stderr_ML_2_minBias=[
0.148195
21.1939
0.00104974
0.150127
0.160034
22.2911
0.0011336
0.157899
0.147694
20.9146
0.00104619
0.148148
6.43245e+07
381484
3227.54
] ...
* sqrt( nPoints / ( degreeFree - 1 ) );
for iParam = 1:length(stderr_ML_2_minBias)
disp(stderr_ML_2_minBias(iParam));
end
%--------------------------------------------------
diary off
% error('Stopping before return.');
return

Draw network or graph from matrix in matlab

How do I draw a sequence of frames of a network with the help of a transition matrix?
I have a matrix that denotes a graph, and the matrix changes with iterations. Can anyone give me an insight into which functions I can use to create the series of network frames?
original=[0.06 0.57 0.37 0 0;
0.57 0.06 0.37 0 0;
0.37 0.57 0.03 0.03 0;
0 0 0.03 0.13 0.84;
0 0 0 0.84 0.16];
Suppose the above is the matrix in question. Then the graph should look like this: [figure omitted]
This question is related to this earlier query and this one. But here's an answer specific to your situation.
Given a weighted adjacency matrix:
original = [0.06 0.57 0.37 0 0;
0.57 0.06 0.37 0 0;
0.37 0.57 0.03 0.03 0;
0 0 0.03 0.13 0.84;
0 0 0 0.84 0.16];
you can first define the number of nodes in the network:
N = size(original,1);
and then a corresponding set of coordinates on the perimeter of a circle:
coords = [cos(2*pi*(1:N)/N); sin(2*pi*(1:N)/N)]';
Then you can plot the graph using gplot:
gplot(original, coords)
and mark the vertices using text:
text(coords(:,1) - 0.1, coords(:,2) + 0.1, num2str((1:N)'), 'FontSize', 14)
Note that the gplot function does not weight the lines by connection strength; the matrix element (i,j) is treated as binary, indicating absence or presence of a link between nodes i and j.
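To get the sequence of frames the question asks for as the matrix changes over iterations, one simple sketch is to redraw with gplot inside a loop and capture each frame. Here matrices is a hypothetical cell array of adjacency matrices (one per iteration) that is not part of the original post, and the matrix powers are placeholder data only:
matrices = {original, original^2, original^3};   % placeholder sequence of adjacency matrices
N = size(original, 1);
coords = [cos(2*pi*(1:N)/N); sin(2*pi*(1:N)/N)]';
frames(numel(matrices)) = struct('cdata', [], 'colormap', []);   % preallocate movie frames
for k = 1:numel(matrices)
    gplot(matrices{k}, coords, '-o');                            % draw the k-th network
    text(coords(:,1) - 0.1, coords(:,2) + 0.1, num2str((1:N)'), 'FontSize', 14)
    axis equal
    axis off
    drawnow
    frames(k) = getframe(gcf);                                   % store the current figure as a frame
end
Afterwards, movie(frames) replays the captured sequence, and VideoWriter can save it to a video file.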