vgxset command: Q parameter for resulting model object?

Matlab's command for defining a vector time series model is vgxset, the formalism for which can be accessed by the command "doc vgxset". It says that the model parameter Q is "[a]n n-by-n symmetric innovations covariance matrix", with no description of what it is for. I assumed that it was the covariance of the noise sources that show up in the equations for each time series in the archetypal representation of a vector time series, e.g., http://faculty.chicagobooth.edu/john.cochrane/research/papers/time_series_book.pdf.
I could be off about something (I often am), but this doesn't seem to match results from actually issuing the command to estimate a model's parameters. You can access the code that illustrates such estimation via the command "doc vgxvarx":
load Data_VARMA22
[EstSpec, EstStdErrors] = vgxvarx(vgxar(Spec), Y, [], Y0);
The object EstSpec contains the model, and the Q matrix is:
0.0518 0.0071
0.0071 0.0286
I would have expected such a covariance matrix to have ones on the diagonal. Obviously, I misunderstand and/or mis-guessed the purpose of Q. However, if you actually pull up the code for vgxset ("edit vgxset"), the comments explicitly describe Q as an "[i]nnovations covariance matrix".
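(Side note, to make the diagonal point concrete: it is a correlation matrix, not a covariance matrix, that has ones on the diagonal. If EstSpec.Q really is a covariance, the corresponding correlation matrix can be obtained with corrcov:)
R = corrcov(EstSpec.Q)   % rescales a covariance matrix into the correlation matrix, which has unit diagonal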
I have 3 questions:
(1) What exactly is Q?
(2) Is there a Matlab reference document that I've failed to locate for fundamental parameters like this?
(3) If it isn't the covariance matrix for the noise sources, how does one actually supply the noise source covariances to the model?
Please note that this question is specifically about Matlab's command for setting up the model, and as such, does not belong in the more concept-oriented Cross Validated Stack Exchange forum. I have posted this to:
(1) vgxset command: Q parameter for resulting model object?
(2) http://groups.google.com/forum/#!topic/comp.soft-sys.matlab/tg59h1wkRCw

I will try to iterate toward an answer, but since there are so many branches of discussion, I prefer to lay it out directly in this format. In any case, this is a constructive process, which is what this forum is for...
Some preliminary "clarifications":
The output covariance in EstSpec.Q is quite similar before and after running vgxvarx. So the command is essentially reproducing what it was given.
An output covariance, or whatever else the Q parameter might mean, is almost never a "mask" of the parameters to use, i.e. an identity or a sparse zero-one input matrix. Whether you can set it to a diagonal matrix times some scalar is a different story. This is plainly a covariance, just as in other MATLAB commands.
Hence:
(2) Is there a Matlab reference document that I've failed to locate for fundamental parameters like this?
No. Matlab usually doesn't give further explanations for "non-popular" commands, and this one is, by some measure, "not popular", so I would not be surprised if the answer to this question is no.
Of course, the scholarly method is to check the provided references, in this case those listed under doc vartovec, which I have no idea how to track down without ordering the books, hunting for the right library, or scouring the whole internet in five minutes...
So the down-and-dirty method is better: check the code for the function by doing edit vgxvarx. Look at the commented section % Step 7 - Solve for parameters (line 515, Matlab R2014b). There, a Q matrix is calculated through a function called mvregress. At this point we both know this is the core function.
This mvregress function (line 62, Matlab R2014b) receives an input parameter called Covar0, which is described as a "D-by-D matrix to be used as the initial estimate for SIGMA".
This leads to the answer to (1).
(1) What exactly is Q?
The MATLAB code has dozens of switches, both user options and auto-triggered ones, so I am actually not sure which algorithm you are interested in, or, given your data, which ones actually get triggered :). Please read the previous answer, and place a breakpoint in the mvregress function at:
Covar=Covar+CovAdj; %Line 433, Matlab R2014b
and/or at:
Covar = (Covar + Resid'*Resid) / Count; % Line 439, Matlab R2014b
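One way to set those breakpoints from the command line (the line numbers match R2014b, as noted above, and will differ in other releases):
dbstop in mvregress at 433          % stop where Covar is updated with CovAdj
dbstop in mvregress at 439          % stop where Covar is averaged over Count
[EstSpec, EstStdErrors] = vgxvarx(vgxar(Spec), Y, [], Y0);   % re-run the estimation to hit them
dbclear all                         % remove the breakpoints when done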
With that, and as indicated by the mvregress help, the exact meaning of Q would be an "initial matrix for the estimate of the output covariance matrix". The final estimate is then simply obtained by averaging over Count...
But, for the provided data, setting:
Spec.Q=[1 0.1;0.1 1];
and then running vgxvarx, the parameter Covar never gets initialized from it!
Which, for this unfortunate case, leaves Q as simply an unused parameter.
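(The same assignment can also be made through vgxset itself; whether it ever reaches mvregress's Covar0 is exactly what the debugging above is meant to check:)
Spec = vgxset(Spec, 'Q', [1 0.1; 0.1 1]);   % set the innovations covariance field on an existing spec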
(3) If it isn't the covariance matrix for the noise sources, how does one actually supply actual noise source covariances to the model?
I've lost tons of man-hours trying to extract the correct information from pre-built Matlab commands. So my suggestion here is to stick to the concepts of system identification, and I would put my faith in one of the following alternatives:
Keep believing: dig a bit and debug inside the mvregress function, and check whether one of the EstMethods (i.e. cwls, ecm, mvn, around line 195) leads to a proper filling of the Covar0 parameter;
Stick with the vgxvarx command, but let the Q parameter go, and diagonalize/normalize the data properly, so that the algorithm sees the innovations as identically distributed Gaussian noise;
Send vgxvarx to hell and use arx (a sketch follows below). I am not sure about the current stability of vgxvarx, but I am quite sure arx should be more "stable" in this regard...
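A minimal sketch of the arx alternative (System Identification Toolbox); the model orders here are illustrative assumptions, not tuned values:
data = iddata(Y);             % treat Y as a multivariate time series with no exogenous input
na   = 2*ones(size(Y,2));     % AR order 2 for every output pair (illustrative)
sys  = arx(data, na);         % estimate a vector AR model
sys.NoiseVariance             % the estimated innovations covariance of the fitted model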
Good Luck,
hypfco.
EDIT
A huge and comprehensive comment; I don't have much to add:
Indeed, it is quite probable that vgxvarx was run on the Matlab data sample, which would explain the results;
I tried to use the Q parameter with vgxvarx, with no success so far. If any working code is found, it would be interesting to include it;
Implementing the noise transformation on the data should be really simple, of the form:
Y1=(Y-Y0)*L
with L the lower-triangular Cholesky factor of the inverse of the estimated covariance of Y, and Y0 the mean (a sketch is given after these notes);
I think the MA part is as critical as the AR part. Unless you have very good reasons, you usually cannot claim to have explained your data with Gaussian noise alone;
Given your very last comment, I really suggest you move to a better, more established command for AR, MA, ARMA and similar flavours. I am pretty sure they handle the multivariate case...
Again, Matlab doesn't impress me with that behaviour...
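A minimal sketch of that whitening step (it assumes the sample covariance of Y is well-conditioned; bsxfun keeps it compatible with pre-R2016b releases):
Y0 = mean(Y, 1);                   % sample mean of each series
C  = cov(Y);                       % sample covariance of Y
L  = chol(inv(C), 'lower');        % lower-triangular Cholesky factor of inv(C)
Y1 = bsxfun(@minus, Y, Y0) * L;    % whitened series: cov(Y1) is the identity up to numerical error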
Cheers...
hypfco

Related

Given a Cost Function, C(weights), that depends on expected and network outputs, how is C differentiated with respect to weights?

I'm building a Neural Network from scratch that categorizes values of x into 21 possible estimates for sin(x). I plan to use MSE as my loss function.
MSE of each minibatch = ||y(x) - a||^2, where y(x) is the vector of network outputs for the x-values in the minibatch, and a is the vector of expected outputs that correspond to each x.
After finding the loss, the column vector of all weights in the network is updated: the column vector of weight changes is built from the column vector of partial derivatives of C with respect to each weight.
∇C ≡ (∂C/∂w1, ∂C/∂w2, ...)^T and Δw = −η∇C, where η is the (positive) learning rate.
The problem is, to find the gradient of C, you have to differentiate with respect to each weight. What does that function even look like? It's not just the previously stated MSE, right?
Any help is appreciated. Also, apologies in advance if this question is misplaced, I wasn't sure if it belonged here or in a math forum.
Thank you.
(I might add that I have tried to find an answer to this online, but there are few examples that avoid using libraries to do the dirty work and that present the information clearly.)
http://neuralnetworksanddeeplearning.com/chap2.html
I had found this a while ago but only now realized its significance. The link describes δ(j,l) as an intermediary value used to arrive at the partial derivative of C with respect to the weights. I will post back here with a full answer if the link above answers my question, as I've seen a few posts similar to mine that have yet to be answered.
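For reference, a sketch of the chain-rule step that link formalizes, in its notation (it assumes a quadratic cost like the MSE above, up to a constant factor, and an activation σ, with a^l_j = σ(z^l_j) the activation of neuron j in layer l and L the output layer):
\delta^L_j = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j) = (a^L_j - y_j)\,\sigma'(z^L_j)   % error at the output layer
\delta^l_j = \Big(\textstyle\sum_k w^{l+1}_{kj}\,\delta^{l+1}_k\Big)\,\sigma'(z^l_j)             % error propagated back one layer
\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\,\delta^l_j                                     % the gradient entry for each weight
So the function being differentiated is still the stated cost; the derivative with respect to each weight is obtained by chaining through the activations rather than by writing C out explicitly in the weights.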

dfittool results interpretation

Does anyone know how to tell the difference between distributions (i.e. their goodness of fit) using the dfittool in Matlab? In a class I took forever ago, we learned about the log likelihood parameter and how to compare a pdf fitted to Gaussian vs gamma, etc. But right now, all the Matlab help files online just say "it means something". Any assistance would be appreciated. Basically, I need to interpret the "results" in "edit fit" of the dfittool. I want to be able to compare my dfits to each other from the results, so I can pick the best fit for my analysis. I don't know what the difference is between a log likelihood of -111 vs -105.
Example below:
Distribution: Normal
Log likelihood: -110.954
Domain: -Inf < y < Inf
Mean: 101.443
Variance: 436.332
Parameter Estimate Std. Err.
mu 101.443 4.17771
sigma 20.8886 3.04691
Estimated covariance of parameter estimates:
mu sigma
mu 17.4533 6.59643e-15
sigma 6.59643e-15 9.28366
Thank you!
(Log) likelihood is a measure of the fit of a distribution to data, so the simple answer is: the distribution with the largest likelihood is the one that fits best. However, what you get here as an output is the maximized likelihood, i.e. the likelihood at those parameter values where it is maximal. Different families of distributions might be differently "flexible", so that it is easier to get a larger likelihood with one of them in general, so this limits comparability. This holds especially if you compare families with different numbers of parameters. A fix for this is to use formal model comparison, e.g. using the Bayes factor, which however is considerably more complex mathematically, or its approximation, the Bayesian information criterion.
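To make that comparison concrete, here is a minimal sketch using fitdist on the underlying data vector (assumed here to be called y, with y > 0 so that the gamma fit is valid), computing the maximized log likelihoods and the BIC mentioned above:
pdNorm  = fitdist(y, 'Normal');
pdGamma = fitdist(y, 'Gamma');
LL  = [-negloglik(pdNorm), -negloglik(pdGamma)];                        % maximized log likelihoods
k   = [numel(pdNorm.ParameterValues), numel(pdGamma.ParameterValues)];  % number of fitted parameters
n   = numel(y);
BIC = -2*LL + k*log(n);                                                 % smaller BIC indicates the preferred fit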
More generally speaking, however, it is seldom a good idea to just randomly pick distributions and see how well they fit. It is better to have some at least partially theoretically motivated idea of why a distribution is a candidate. On the most basic level this means considering its range of definition (support): the normal distribution is defined on the whole real line, the gamma distribution only for nonnegative real numbers. This alone may let you rule one of them out based on basic properties of your data.

Matlab's VARMAX regression parameters/coefficients nX & b

I'm having a bit of trouble following the explanation of the parameters for vgxset. Being new to the field of time-series is probably part of my problem.
The vgxset help page (http://www.mathworks.com/help/econ/vgxset.html) says that it's for a generalized model structure, VARMAX, and I assume that I just use a portion of that for VARMA. I basically tried to figure out which parameters pertain to VARMA, as opposed to the additional parameters needed for VARMAX. I assumed (maybe wrongly) that nX and b pertain to the exogenous variables. Unfortunately, I haven't found much on the internet about the prevailing notational conventions for a VARMAX model, so it's hard to be sure.
The SAS page for VARMAX (http://support.sas.com/documentation/cdl/en/etsug/67525/HTML/default/viewer.htm#etsug_varmax_details02.htm) shows that if you have "r" exogenous inputs and k time series, and if you look back at "s" time steps' worth of exogenous inputs, then you need "s" matrices of coefficients, each (k)x(r) in size.
This doesn't seem to be consistent with the vgxset page, which simply provides an nX-vector "b" of regression parameters. So my assumption that nX and b pertain to the exogenous inputs seems wrong, yet I'm not sure what else they can refer to in a VARMAX model. Furthermore, in all 3 examples given, nX seems to be set to the 3rd argument "s" in VARMAX(p,q,s). Again, though, it's not entirely clear because in all the examples, p=s=2.
Would someone be so kind as to shed some light on VARMAX parameters "b" and "nX"?
On Saturday, May 16, 2015 at 6:09:20 AM UTC-4, Rick wrote:
Your assessment is generally correct: the "nX" and "b" parameters do indeed correspond to the exogenous input data "x(t)". The number of columns (i.e., time series) in x(t) is "nX", which is what SAS calls "r", and the coefficient vector "b" is its regression coefficient.
I think the distinction here, and perhaps your confusion, is that SAS incorporates exogenous data x(t) as what's generally called a "distributed lag structure", in which they specify an r-by-T predictor time series and allow this entire series to be lagged using lag operator polynomial notation, as are the AR and MA components of the model.
MATLAB's Econometrics Toolbox adopts a more classical regression component approach. Any exogenous data is included as a simple regression component and is not associated with a lag operator polynomial.
In this convention, if the user wants to include lags of x(t), then they would simply create the appropriate lags of x(t) and include them as additional series (i.e., additional columns of a larger multivariate exogenous/predictor matrix, say X(t)). See the utility function LAGMATRIX.
Note that both conventions are perfectly correct. Personally, I feel that the regression component approach is slightly more flexible since it does not require you to include "s" lags of all series in x(t).
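To make the regression-component convention concrete, a minimal sketch using LAGMATRIX (variable names and the number of lags are illustrative assumptions; check doc vgxvarx for the exact format in which the exogenous data is passed during estimation):
XLags = lagmatrix(x, 0:2);       % x(t), x(t-1), x(t-2) as separate columns; leading rows are NaN
XLags = XLags(3:end, :);         % drop the rows made NaN by the lagging
Spec  = vgxset('n', 2, 'nAR', 2, 'nX', size(XLags, 2));   % nX counts the predictor columns; b holds one coefficient per column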
Interesting. I'm still wrapping my brain around the use of regression to determine lag coefficients. It turns out that the multitude of online tutorials and hard-copy library texts I've looked at don't really explain the transition from the theoretical projection of new values onto past values to actual regression on sample data. Your description is making this more concrete. Thank you.
AFTERNOTE: In keeping with the best practice of which I've been advised, I am posting links to the fora in which I posed this question:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/341064
Matlab's VARMAX regression parameters/coefficients nX & b
https://stats.stackexchange.com/questions/152578/matlabs-varmax-regression-parameters-coefficients-nx-b

I used least square method but matlab return compeletly wrong answer

I must solve an overdetermined problem (more equations than unknowns), so I have to use the least squares method.
First I create the coefficient matrix; it is a 225*375 matrix. To invert it I use the pinv() function, and then I multiply by the load matrix.
My problem is plate bending under a uniform load with clamped edges. I expect at least a correct answer on my boundary (the deflection must be zero), but even on the boundary I get a wrong answer.
I have read in a book that sometimes an error occurs in the least squares method which should be corrected manually by the user, but I couldn't find any further explanation about it elsewhere.
First of all we need more data about your problem:
What's the model?
Where are the measurements coming from?
Yet a few notes on what I could figure out from your issue:
If you have constraints on the solution you should use constrained least squares. In MATLAB this is easily done (look at quadratic programming as well); see the sketch after this list.
Does the L2 error fit your problem? Maybe you should use a different error measure.
There's no bug in MATLAB's implementation. Using pinv gives the minimum-norm solution (minimizing both the solution vector's norm and the residual L2 norm) within the range of the given matrix. It might be that you either constructed the data in a wrong manner or that the model you're using isn't adequate.
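Regarding the constrained-least-squares note above, a minimal sketch with lsqlin from the Optimization Toolbox (the matrix names are assumptions about your setup):
% Minimize ||A*w - f||^2 subject to Aeq*w = beq, e.g. forcing the deflection to be
% exactly zero on the clamped boundary. A is your 225*375 coefficient matrix, f the
% load vector; Aeq and beq encode the boundary conditions.
w = lsqlin(A, f, [], [], Aeq, beq);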

Simple Sequential feature selection in Matlab

I have a noisy 40x3249 dataset and a 40x1 result set. I want to perform simple sequential feature selection on it in Matlab. The Matlab example is complicated and I can't follow it. Even a few examples on SO didn't help. I want to use a decision tree as the classifier to perform feature selection. Can someone please explain this in simple terms?
Also, is it a problem that my dataset has a very low number of observations compared to the number of features?
I am following this example: Sequential feature selection Matlab, and I am getting an error like this:
The pooled covariance matrix of TRAINING must be positive definite.
I've explained the error message you're getting in answers to your previous questions.
In general, it is a problem that you have many more variables than samples. This will prevent you using some techniques, such as the discriminant analysis you were attempting, but it's a problem anyway. The fact is that if you have that high a ratio of variables to samples, it is very likely that some combination of variables would perfectly classify your dataset even if they were all random numbers. That's true if you build a single decision tree model, and even more true if you are using a feature selection method to explicitly search through combinations of variables.
I would suggest you try some sort of dimensionality reduction method. If all of your variables are continuous, you could try PCA as suggested by #user1207217. Alternatively you could use a latent variable method for model-building, such as PLS (plsregress in MATLAB).
If you're still intent on using sequential feature selection with a decision tree on this dataset, then you should be able to modify the example in the question you linked to, replacing the call to classify with one to classregtree.
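For example, a minimal sketch of that substitution (it uses fitctree, the newer equivalent of classregtree, and assumes X is your 40x3249 data matrix and y a numeric 40x1 class vector; the overfitting caveats above still apply):
% Criterion: misclassification count of a decision tree on the held-out fold.
critfun = @(Xtrain, ytrain, Xtest, ytest) ...
    sum(ytest ~= predict(fitctree(Xtrain, ytrain), Xtest));
opts = statset('Display', 'iter');
[selected, history] = sequentialfs(critfun, X, y, 'cv', 5, 'options', opts);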
This error comes from the use of the classify function in that question, which performs LDA. The error occurs when the data is rank deficient (or in other words, some features are almost exactly correlated). To overcome this, you should project the data down to a lower-dimensional subspace. Principal component analysis can do this for you. See here for more details on how to use the pca function in Matlab's Statistics Toolbox.
[basis, scores, latent] = pca(X); % basis holds the principal directions as columns, latent their variances; X has observations as row vectors
indices = find(latent > eps(2*max(latent))); % keep components whose variance exceeds machine precision of the biggest one .. with a little extra tolerance (2x)
new_basis = basis(:, indices); % the relevant components, stored in "basis" as column vectors
X_new = X*new_basis; % inner products between the new basis functions spanning a subspace of the original space and the original feature vectors
This should get you automatic projections down into a relevant subspace. Note that your features won't have the same meaning as before, because they will be weighted combinations of the old features.
Extra note: If you don't want to change your feature representation, then instead of classify, you need to use something which works with rank deficient data. You could roll your own version of penalised discriminant analysis (which is quite simple), use support vector machines, or other classification functions which don't break with correlated features as LDA does (by virtue of requiring matrix inversion of the covariance estimate).
EDIT: P.S I haven't tested this, because I have rolled my own version of PCA in Matlab.