Matlab's VARMAX regression parameters/coefficients nX & b

I'm having a bit of trouble following the explanation of the parameters for vgxset. Being new to the field of time-series is probably part of my problem.
The vgxset help page (http://www.mathworks.com/help/econ/vgxset.html) says that it's for a generalized model structure, VARMAX, and I assume that I just use a portion of that for VARMA. I basically tried to figure out which parameters pertain to VARMA, as opposed to the additional parameters for VARMAX. I assumed (maybe wrongly) that nX and b pertain to the exogenous variables. Unfortunately, I haven't found much on the internet about the prevailing notational conventions for a VARMAX model, so it's hard to be sure.
The SAS page for VARMAX (http://support.sas.com/documentation/cdl/en/etsug/67525/HTML/default/viewer.htm#etsug_varmax_details02.htm) shows that if you have "r" exogenous inputs and "k" time series, and if you look back at "s" time steps' worth of exogenous inputs, then you need "s" matrices of coefficients, each k-by-r in size.
This doesn't seem to be consistent with the vgxset page, which simply provides an nX-vector "b" of regression parameters. So my assumption that nX and b pertain to the exogenous inputs seems wrong, yet I'm not sure what else they can refer to in a VARMAX model. Furthermore, in all 3 examples given, nX seems to be set to the 3rd argument "s" in VARMAX(p,q,s). Again, though, it's not entirely clear because in all the examples, p=s=2.
Would someone be so kind as to shed some light on VARMAX parameters "b" and "nX"?

On Saturday, May 16, 2015 at 6:09:20 AM UTC-4, Rick wrote:
Your assessment is generally correct, "nX" and "b" parameters do
indeed correspond to the exogenous input data "x(t)". The number of
columns (i.e., time series) in x(t) is "nX" and is what SAS calls
"r", and the coefficient vector "b" is its regression coefficient.
I think the distinction here, and perhaps your confusion, is that
SAS incorporates exogenous data x(t) as what's generally called a
"distributed lag structure", in which they specify an r-by-T
predictor time series and allow this entire series to be lagged
using lag operator polynomial notation, just as the AR and MA
components of the model are.
MATLAB's Econometrics Toolbox adopts a more classical regression
component approach. Any exogenous data is included as a simple
regression component and is not associated with a lag operator
polynomial.
In this convention, if the user wants to include lags of x(t), then
they would simply create the appropriate lags of x(t) and include
them as additional series (i.e., additional columns of a larger
multivariate exogenous/predictor matrix, say X(t)).
See the utility function LAGMATRIX.
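For concreteness, here is a minimal sketch of this convention, assuming the vgxset name-value interface from the linked doc page (the data and coefficient values are illustrative placeholders):
x = randn(100, 1);               % one raw exogenous series (placeholder data)
X = lagmatrix(x, 0:2);           % include lags 0..2 as separate columns
X = X(3:end, :);                 % drop the NaN rows introduced by lagging
% nX is then 3 (the number of columns of X) and b is a 3-by-1 regression
% coefficient vector; illustrative values only:
Spec = vgxset('n', 2, 'nAR', 2, 'nX', 3, 'b', [0.5; -0.2; 0.1]);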
Note that both conventions are perfectly correct. Personally, I feel
that the regression component approach is slightly more flexible,
since it does not require you to include "s" lags of all the series
in x(t).
Interesting. I'm still wrapping my brain around the use of regression to determine lag coefficients. It turns out that the multitude of online tutorials and hard-copy library texts that I've looked at haven't really given much of an explanatory transition between the theoretical projection of new values onto past values and actual regression using sample data. Your description is making this more concrete. Thank you.
AFTERNOTE: In keeping with the best practice of which I've been advised, I am posting links to the fora in which I posed this question:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/341064
Matlab's VARMAX regression parameters/coefficients nX & b
https://stats.stackexchange.com/questions/152578/matlabs-varmax-regression-parameters-coefficients-nx-b

Related

How do you determine how many variables are too many for a CCA?

I am running a CCA of some ecological data with ~50 sites and several hundred species. I know that you have to be careful when your number of explanatory variables approaches your number of samples. I have 23 explanatory variables, so this isn't a problem for me, but I have also heard that using too many explanatory variables can start to "un-constrain" the CCA.
Are there any guidelines about how many explanatory variables are appropriate? So far, I have just plotted them all and then removed the ones that appear to be redundant (leaving me with 8). Can I use the inertia values to help inform/justify this?
Thanks
This is the same question as asking "how many variables are too many for regression analysis?". Not "almost the same", but exactly the same: CCA is an ordination of the fitted values of a linear regression. In the most severe cases you can over-fit. In CCA this is evident when the first eigenvalues of CCA and (unconstrained) CA are almost identical and the ordinations look similar in the first dimensions (you can use Procrustes analysis to check this, as sketched below). The extreme case would be that residual variation disappears, but in ordination you focus on the first dimensions, and there the constraints can get lost much earlier than in later constrained axes or in the residuals.
More importantly: you must see CCA as a kind of regression analysis and have the same attitude toward constraints as toward explanatory (independent) variables in regression. If you have no prior hypothesis to study, you have all the problems of model selection in regression analysis plus the problems of multivariate ordination, but these are non-technical problems that should be handled somewhere other than Stack Overflow.
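A minimal sketch of that Procrustes check in MATLAB (Statistics Toolbox), assuming you already have site scores from your ordination software, since MATLAB has no built-in CCA/CA ordination; the score matrices below are placeholders:
scoresCCA = randn(50, 2);              % placeholder CCA site scores (n-by-2)
scoresCA  = randn(50, 2);              % placeholder CA site scores (n-by-2)
d = procrustes(scoresCA, scoresCCA);   % dissimilarity; d near 0 means the
                                       % two ordinations are nearly identical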

Matlab: Fit a custom function to xy-data with given x-y errors

I have been looking for a MATLAB function that can do a nonlinear total least squares fit, i.e., fit a custom function to data which has errors in all dimensions. The easiest case is x and y data points with different given standard deviations in x and y for every single point. This is a very common scenario in all natural sciences, and just because most people only know how to do a least squares fit with errors in y does not mean it wouldn't be extremely useful. I know the problem is far more complicated than the simple y-error case, which is probably why most people (not even physicists like myself) never learned how to properly do this with multidimensional errors.
I would expect that software like MATLAB could do it, but unless I'm bad at reading the otherwise mostly useful help pages, I think even a "full" MATLAB license doesn't provide such fitting functionality. Other tools like Origin, Igor, and SciPy use the freely available Fortran package ODRPACK95, for instance. There are a few contributions about total least squares or Deming fits on the File Exchange, but they're for linear fits only, which is of little use to me.
I'd be happy for any hint that can help me out
kind regards
First I should point out that I haven't practiced MATLAB much since I graduated last year (also as a Physicist). That being said, I remember using
lsqcurvefit()
in MATLAB to perform non-linear curve fits. Now, this may or may not work, depending on what you mean by "custom function". I'm assuming you want to fit some known expression similar to one of these:
y = A*sin(x)+B
y = A*e^(B*x) + C
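For the first form, a hedged sketch of the lsqcurvefit() route (Optimization Toolbox); the data and starting guess are placeholders, and note this minimizes errors in y only:
xdata = linspace(0, 2*pi, 50)';                      % placeholder data
ydata = 2*sin(xdata) + 1 + 0.1*randn(size(xdata));
model = @(p, x) p(1)*sin(x) + p(2);                  % p = [A; B]
p0 = [1; 0];                                         % initial guess
pFit = lsqcurvefit(model, p0, xdata, ydata);         % least squares in y only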
It is extremely difficult to perform a fit without knowing the functional form, e.g., as above. Ultimately, all mathematical functions can be approximated by polynomials over small enough intervals. This is something you might want to consider, as MATLAB does have lots of tools for polynomial regression.
In the end, I would actually recommend you write your own fit function. There are tons of examples for this online. The idea is to know the true solution's form, as above, and guess at the parameters A, B, C, .... Create an error (or cost) function which produces a quantitative error (deviation) between your data and the guessed solution. The problem is then reduced to minimizing that error, for which MATLAB has lots of built-in functionality.
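A minimal sketch of this approach with fminsearch, here with a per-point "effective variance" weighting (sigma_y^2 + (df/dx)^2 * sigma_x^2) as one rough way to fold in the x-errors the question asks about; the data are placeholders and this is an approximation, not a full orthogonal distance regression:
xdata = linspace(0, 2*pi, 50)';                 % placeholder data
ydata = 2*sin(xdata) + 1 + 0.1*randn(size(xdata));
sx = 0.05*ones(size(xdata));                    % per-point x standard deviations
sy = 0.10*ones(size(xdata));                    % per-point y standard deviations
f  = @(p, x) p(1)*sin(x) + p(2);                % model, p = [A; B]
df = @(p, x) p(1)*cos(x);                       % df/dx, used in the weighting
cost = @(p) sum((ydata - f(p, xdata)).^2 ./ ...
               (sy.^2 + (df(p, xdata).*sx).^2));   % effective-variance chi-squared
pFit = fminsearch(cost, [1; 0]);                % minimize from an initial guess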

vgxset command: Q parameter for resulting model object?

Matlab's command for defining a vector time series model is vgxset, the formalism for which can be accessed by the command "doc vgxset". It says that the model parameter Q is "[a]n n-by-n symmetric innovations covariance matrix", with no description of what it is for. I assumed that it was the covariance of the noise sources that show up in the equations for each time series in the archetypal representation of a vector time series, e.g., http://faculty.chicagobooth.edu/john.cochrane/research/papers/time_series_book.pdf.
I could be off about something (I often am), but this doesn't seem to match results from actually issuing the command to estimate a model's parameters. You can access the code that illustrates such estimation via the command "doc vgxvarx":
load Data_VARMA22                                            % sample data set: Spec, Y, Y0
[EstSpec, EstStdErrors] = vgxvarx(vgxar(Spec), Y, [], Y0);   % vgxar converts the VARMA spec to a pure VAR form
The object EstSpec contains the model, and the Q matrix is:
0.0518 0.0071
0.0071 0.0286
I would have expected a covariance matrix to have ones on the diagonal. Obviously, I misunderstood and/or mis-guessed the purpose of Q. However, if you actually pull up the code for vgxset ("edit vgxset"), the comments explicitly describe Q as an "[i]nnovations covariance matrix".
I have 3 questions:
(1) What exactly is Q?
(2) Is there a Matlab reference document that I've failed to locate for fundamental parameters like this?
(3) If it isn't the covariance matrix for the noise sources, how does one actually supply actual noise source covariances to the model?
Please note that this question is specifically about Matlab's command for setting up the model, and as such, does not belong in the more concept-oriented Cross Validated Stack Exchange forum. I have posted this to:
(1) vgxset command: Q parameter for resulting model object?
(2) http://groups.google.com/forum/#!topic/comp.soft-sys.matlab/tg59h1wkRCw
I will try to iterate to an answer; since there are so many branches of discussion, I prefer to address them directly in this format. In any case, this is a constructive process, as is the purpose of this forum...
Some preliminary clarifications:
The output covariance EstSpec.Q is quite similar before and after running the command vgxvarx. Thus the command is outputting what it expects from itself.
An output covariance (or whatever other meaning the Q parameter has) is almost never a "mask" of the parameters to use, i.e., an identity or a sparse zero-one input matrix. Whether you can assign it as a diagonal multiplied by some scalar is a different story. This is a covariance, plainly, just as in other MATLAB commands.
Hence:
(2) Is there a Matlab reference document that I've failed to locate for fundamental parameters like this?
No. Matlab usually doesn't give further explanation for "non-popular" commands, and this one is, by some measure, not popular, so I would not be surprised if the answer to this question is no.
Of course, the scholarly method is to check the provided references, in this case those listed under doc vartovec, which I have no idea where to find without ordering the books from the proper library or scouring the entire internet in five minutes...
Thus the hands-on method is better: check the code for the function by doing edit vgxvarx. Check the commented section % Step 7 - Solve for parameters (line 515, Matlab R2014b). There, a Q matrix is calculated through a function called mvregress. At this point, both of us know this is the core function.
This mvregress function (line 62, Matlab R2014b) receives an input parameter called Covar0, which is described as a "D-by-D matrix to be used as the initial estimate for SIGMA".
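For reference, a hedged sketch of passing that initial covariance directly to mvregress (Statistics Toolbox); the responses and design matrix below are placeholders:
Y = randn(100, 2);                                 % placeholder 2-D responses
X = [ones(100, 1) randn(100, 1)];                  % placeholder design matrix
[beta, Sigma] = mvregress(X, Y, 'covar0', eye(2)); % eye(2) as the initial SIGMA estimate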
This antecedent leads to the answer for (1).
(1) What exactly is Q?
The MATLAB code has dozens of switches, both option-driven and auto-triggered, so I am actually not sure which algorithm you are interested in, or which ones are actually "triggered" by your data :). Please read the previous answer, and place a debug breakpoint in the mvregress function at:
Covar=Covar+CovAdj; %Line 433, Matlab R2014b
and/or at:
Covar = (Covar + Resid'*Resid) / Count; % Line 439, Matlab R2014b
Given that, and as indicated by the mvregress help, the exact meaning of Q would be an "initial matrix for the estimate of the output covariance matrix". The averaging is simply done over Count...
But for the provided data, setting:
Spec.Q = [1 0.1; 0.1 1];
and then running vgxvarx, the parameter Covar never gets initialized!
Which, for this unfortunate case, makes Q simply an unused parameter.
(3) If it isn't the covariance matrix for the noise sources, how does one actually supply actual noise source covariances to the model?
I've lost tons of man-hours trying to gather the correct information from pre-built Matlab commands. Thus, my suggestion here is to stick to the concepts of system identification, and I would put my faith in one of the following alternatives:
Keep believing: dig a bit and debug inside the mvregress function, and check whether some of the estimation methods (i.e., 'cwls', 'ecm', 'mvn' under line 195) lead to a proper use of the Covar0 parameter;
Stick to the vgxvarx command, but let the Q parameter go, and diagonalize/normalize the data properly, so that the algorithm can treat the noise as identically distributed Gaussian noise;
Abandon vgxvarx and use arx (see the sketch after this list). I am not sure about the current stability of vgxvarx, but I am quite sure arx should be more "stable" in this regard...
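A hedged sketch of the arx alternative, assuming the System Identification Toolbox; the series and model order are placeholders:
y = randn(200, 1);            % placeholder observed series
data = iddata(y, [], 1);      % output-only data, sample time 1
sys = arx(data, 4);           % fit an AR model of order 4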
Good Luck,
hypfco.
EDIT
A huge and comprehensive comment; I don't have much to add.
Indeed, it is quite probable that vgxvarx was run on the Matlab sample data, which would explain the results.
I tried to use the Q parameter with vgxvarx, without success so far. If any working code is found, it would be interesting to include it.
The implementation of the noise transformation of the data should be really simple, of the form:
Y1 = (Y - Y0)*L
with L the lower-triangular Cholesky factor of the inverse of the calculated covariance of Y, and Y0 the mean (a sketch follows after these points).
I think the MA part is as critical as the AR part. Unless you have very good reasons, you usually cannot say you have explained your data in a Gaussian way.
From your very last comment, I really suggest you move to a better, more established command for doing AR, MA, ARMA and similar flavours. I am pretty sure they handle the multivariate case...
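A minimal sketch of that whitening step, assuming Y is a T-by-n data matrix (the implicit row expansion needs MATLAB R2016b+; use bsxfun on older releases):
Y0 = mean(Y, 1);              % 1-by-n sample mean of each series
C  = cov(Y);                  % n-by-n sample covariance
L  = chol(inv(C), 'lower');   % lower-triangular Cholesky factor of inv(C)
Y1 = (Y - Y0) * L;            % whitened data: cov(Y1) is approximately identity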
Again, Matlab doesn't impress me with this behaviour...
Cheers...
hypfco

Multi-parametric regression in MATLAB?

I have a curve which looks roughly/qualitatively like the curves displayed in those 3 images.
The only thing I know is that the first part of the curve is supposed to be linear (it's hardware-specific) and the second part is some sort of logarithmic curve (might be a combination of two logarithmic curves), i.e., a linlog camera. But I can't tell the mathematical structure of the equation, e.g., whether it looks like a*log(b)+c or a*(log(c+b))^2, etc. Is there a way to best fit/find out a good regression for this type of curve, and is there a certain way to do this specifically in MATLAB? :-) I've got the student version, i.e., all toolboxes etc.
fminsearch is a very general way to find best-fit parameters once you have decided on a parametric equation, and the Optimization Toolbox has a range of more sophisticated methods.
Comparing the merits of one parametric equation against another, however, is a deep topic. The main thing to be aware of is that you can always tweak the equation, adding another term or parameter or whatever, and get a better fit in terms of lower sum-squared-error or whatever other goodness-of-fit metric you decide is appropriate. That doesn't mean it's a good thing to keep adding parameters: your solution might be becoming overly complex. In the end the most reliable way to compare how well two different parametric models are doing is to cross-validate: optimize the parameters on a subset of the data, and evaluate only on data that the optimization procedure has not yet seen.
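A minimal holdout cross-validation sketch along those lines; the data, the two candidate models, and the split sizes are all placeholders (and the log model has no constraint handling, so a real fit may need bounds):
x = linspace(0.1, 10, 200)';  y = 2*log(x + 1) + 0.05*randn(size(x));
idx = randperm(numel(x));                            % random train/test split
tr = idx(1:150);  te = idx(151:end);
m1 = @(p, xx) p(1)*log(xx + p(2)) + p(3);            % candidate model 1
m2 = @(p, xx) p(1)*xx + p(2);                        % candidate model 2
p1 = fminsearch(@(p) sum((y(tr) - m1(p, x(tr))).^2), [1; 1; 0]);
p2 = fminsearch(@(p) sum((y(tr) - m2(p, x(tr))).^2), [1; 0]);
err1 = sum((y(te) - m1(p1, x(te))).^2);              % evaluate on held-out data
err2 = sum((y(te) - m2(p2, x(te))).^2);              % the smaller held-out error wins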
You can try the "function finder" on my curve fitting web site zunzun.com and see what it comes up with - it is free. If you have any trouble please email me directly and I'll do my best to help.
James Phillips
zunzun#zunzun.com

Solving a non-polynomial equation numerically

I've got a problem with an equation that I'm trying to solve numerically using both MATLAB and the Symbolic Toolbox. After going through several pages of MATLAB help, I picked up a few tricks and tried most of them, still without a satisfying result.
My goal is to solve a set of three non-polynomial equations in the angles q1, q2 and q3. Those variables represent joint angles in my industrial manipulator, and what I'm trying to achieve is to solve the inverse kinematics of this model. My set of equations looks like this: http://imgur.com/bU6XjNP
I'm solving it with
numeric::solve([z1,z2,z3], [q1=x1..x2,q2=x3..x4,q3=x5..x6], MultiSolutions)
changing the xn constants according to my needs. Yet I still get some odd results: the q1 variable is off by approximately 0.1 rad, with q2 and q3 off by ~0.01 rad. I don't have much experience with numeric solve, so I just need to know: is it supposed to look like that?
And, if not, what option do you suggest I should take next? Maybe transforming the equations to polynomial form, maybe using a different toolbox?
Or, if trying to do this in Matlab, how can you limit your solutions when using solve()? I'm thinking of an equivalent to the Symbolic Toolbox's assume() and assumeAlso().
I would be grateful for your help.
The numerical solution of a system of nonlinear equations is generally obtained by an iterative minimization process: finding the global minimum of the norm of the difference between the left- and right-hand sides of the equations. For example, fsolve essentially uses Newton iterations. Those methods perform a "deterministic" optimization: they start from an initial guess and then move through the space of unknowns, essentially following the opposite of the gradient, until the solution is found.
You then have two kinds of issues:
Local minima: the stopping rule of the iteration is related to the gradient of the functional. When the gradient becomes small, the iterations are stopped. But the gradient can become small at local minima, besides the desired global one. When the initial guess is far from the actual solution, you get stuck in a false solution.
Ill-conditioning: small variations in the data can be reflected in large variations of the unknowns. So, small numerical errors in the data (for example, machine rounding) can lead to large errors in the estimated unknowns.
Due to the above problems, the solution found by your numerical algorithm is likely to differ (even significantly) from the actual one.
I recommend that you perform a consistency test: choose a starting guess (for example, when using fsolve) very close to the actual solution and verify that your final result is accurate. You will then discover that, as you move the initial guess farther away from the actual solution, your result is likely to show some (even large) errors. Of course, the size of the errors depends on the nature of the system of equations; in some lucky cases they may remain very small.
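A small sketch of that consistency test with fsolve (Optimization Toolbox), on a toy two-equation system, since the actual manipulator equations are not reproduced here:
F = @(q) [sin(q(1)) + q(2)^2 - 1;      % z1(q) = 0 (illustrative equations)
          q(1) - cos(q(2))];           % z2(q) = 0
qNear = fsolve(F, [0.5; 0.5]);         % guess near a solution: accurate root
qFar  = fsolve(F, [10; -10]);          % a distant guess may converge to a
                                       % different root or a spurious point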