MATLAB Confidence Interval for Degrees of Freedom

I would like to calculate a Confidence Interval along with my Degrees of Freedom (DOF) estimation in Matlab. I am trying to run the following line of code:
[R, DoF, ciDOF] = copulafit('t', U); % fit the copula
The code line without the "ciDOF" output argument takes between 1 and 3 hours to run with my data. I tried to run the code with the "ciDOF" argument several times, but the calculation seems to take very long (I stopped it after 8 hours). No error message is generated.
Does anyone have experience with this argument and could kindly tell me how long I should expect the calculation to take (the size of my data is 167*19), and whether I have specified the "ciDOF" argument correctly?
Many thanks for the help!
Carolin

If your data matrix U is of size 167 x 19, then what you are asking for is a copula fit over 19 dimensions, i.e. a 19-variate joint distribution of 19 dependent variables.
This is almost certainly why it is taking so long: whether or not it is your intention, you are asking MATLAB to solve a minimization problem that takes 19 marginal distributions and comes up with the 19-variate joint distribution (the copula) in which each marginal distribution (represented by a 167 x 1 column vector) is uniform.
Most likely this is a limit of the MATLAB implementation, which iterates through many independent computations and then tries to combine them to fit the joint distribution's ideal conditions.
First and foremost -- and not to be insulting or insinuating -- you should definitely check that you really are trying to find a 19-variate copula. Also, just in case, make sure that your matrix U is oriented the proper way, because if it is transposed you could be asking for the solution to a 167-variate distribution.
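As a quick sanity check before committing to another multi-hour run, something along these lines should confirm the orientation (a minimal sketch; the variable names match the question):
% Rows should be observations, columns should be variables.
size(U)            % expect [167 19]: 167 samples of 19 variables
% If it prints [19 167] instead, transpose before fitting:
% U = U';
% Fit the t copula; the third output requests the confidence
% interval for the degrees of freedom (the slow part).
[R, DoF, ciDOF] = copulafit('t', U);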
But, if this is what you are actually trying to do, there is not really an easy way to predict how long it will or should take. Even with multiple dimensions, if your marginals are simple or already uniform, that would greatly reduce the copula computation. But, really, there is no way to tell.
Although this may seem like a cop-out, you may actually have better luck switching from MATLAB to R, especially if you have a lot of multivariate data, and you will probably find a lot more functionality in R than in MATLAB. R is freely available and comes with a graphical user interface (GUI), in case you aren't comfortable with command-line programming.
There are many more sources, but here is one PDF lecture on computing copula fits in R:
http://faculty.washington.edu/ezivot/econ589/copulasPowerpoint.pdf

Related

Randomness in fitclinear from Matlab

I am running a logistic regression using the matlab function fitclinear with the following parameters:
rng('default')  % reset the random number generator so results are reproducible
[Mdl,FitInfo] = fitclinear(X',y', 'Lambda','auto',...
'Learner','logistic',...
'ObservationsIn','columns',...   % X' is predictors-by-observations
'Regularization','ridge',...
'Solver','sgd',...               % stochastic gradient descent
'Verbose',1,...
'BatchSize',100,...
'LearnRate',0.1,...
'OptimizeLearnRate',true,...
'PassLimit',100,...
'ClassNames',[-1,1]);
And because I am working with recent and long historical data, I came to realize that training this logistic regression with the exact same X and y, even after setting the random generator to default to reproduce results, can yield two different results, i.e. two different sets of betas and a different bias.
Could anyone tell me what the reason behind this could be? Where could the randomness come from?
The system starts at a random starting point; from there, with the size of your system, there are many local minima which could still be good. The idea is that with larger frameworks we don't really care about finding the same global minimum, we care about getting decent results. Therefore, we can start at any random point and accept that our system is unlikely to end up with the best result, but rather in some location that gives us good results. This means that, given a large system, it is unlikely that any two training sequences will be the same.
https://stats.stackexchange.com/questions/203288/understanding-almost-all-local-minimum-have-very-similar-function-value-to-the
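One way to narrow down where the randomness enters is to reset the generator immediately before each of two identical fits and compare the coefficients (a minimal sketch with made-up data; if the two betas match, the generator seed accounts for all the randomness in the solver):
rng(1); X = randn(5, 1000); y = randi([0 1], 1, 1000);   % made-up data
rng(2); Mdl1 = fitclinear(X, y, 'ObservationsIn','columns', 'Learner','logistic', 'Solver','sgd');
rng(2); Mdl2 = fitclinear(X, y, 'ObservationsIn','columns', 'Learner','logistic', 'Solver','sgd');
isequal(Mdl1.Beta, Mdl2.Beta)   % true if the seed controls all SGD randomness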

Matlab: Fit a custom function to xy-data with given x-y errors

I have been looking for a MATLAB function that can do a nonlinear total least squares fit, i.e. fit a custom function to data which has errors in all dimensions. The simplest case is x-y data points with different given standard deviations in x and y for every single point. This is a very common scenario in all natural sciences, and just because most people only know how to do a least squares fit with errors in y does not mean it wouldn't be extremely useful. I know the problem is far more complicated than a simple y-error; this is probably why hardly anyone (not even physicists like myself) learned how to do this properly with multidimensional errors.
I would expect software like MATLAB to be able to do it, but unless I'm bad at reading the otherwise mostly useful help pages, even a 'full' MATLAB license doesn't seem to provide such fitting functionality. Other tools like Origin, Igor and SciPy use the freely available Fortran package "ODRPACK95", for instance. There are a few contributions about total least squares or Deming fits on the File Exchange, but they're for linear fits only, which is of little use to me.
I'd be happy for any hint that can help me out
kind regards
First I should point out that I haven't practiced MATLAB much since I graduated last year (also as a Physicist). That being said, I remember using
lsqcurvefit()
in MATLAB to perform non-linear curve fits. Now, this may or may not work, depending on what you mean by 'custom function'. I'm assuming you want to fit some known expression similar to one of these,
y = A*sin(x)+B
y = A*e^(B*x) + C
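For reference, a minimal lsqcurvefit sketch for the first form above (made-up data; requires the Optimization Toolbox):
model = @(p, x) p(1)*sin(x) + p(2);        % y = A*sin(x) + B
x = linspace(0, 2*pi, 40);
y = 2*sin(x) + 0.5 + 0.1*randn(size(x));   % made-up noisy data
p0 = [1, 0];                               % initial guess for [A, B]
pFit = lsqcurvefit(model, p0, x, y)        % returns the fitted [A, B]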
It is extremely difficult to perform a fit without knowing the form, e.g. as above. Ultimately, all mathematical functions can be approximated by polynomials on small enough intervals. This is something you might want to consider, as MATLAB does have lots of tools for polynomial regression, as sketched below.
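For instance, a basic polynomial regression takes two lines (a short sketch with made-up data; 3 is the chosen degree):
x = linspace(-1, 1, 30);
y = x.^3 - x + 0.05*randn(size(x));   % made-up data
p = polyfit(x, y, 3);                 % least-squares polynomial coefficients
yFit = polyval(p, x);                 % evaluate the fit at the data points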
In the end, I would actually recommend you write your own fit function. There are tons of examples of this online. The idea is to know the true solution's form, as above, and guess the parameters A, B, C, .... Create an error (or cost) function which produces a quantitative error (deviation) between your data and the guessed solution. The problem is then reduced to minimizing the error, for which MATLAB has lots of built-in functionality.
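A minimal sketch of that hand-rolled approach, using fminsearch with so-called effective-variance weighting (one common approximation to a full orthogonal-distance fit; the model, data and error bars below are all made up):
model = @(p, x) p(1)*sin(x) + p(2);          % guessed form y = A*sin(x) + B
dfdx  = @(p, x) p(1)*cos(x);                 % its derivative w.r.t. x
x  = linspace(0, 2*pi, 50);
y  = 2*sin(x) + 0.5 + 0.1*randn(size(x));    % made-up data
sx = 0.05*ones(size(x));                     % per-point x standard deviations
sy = 0.10*ones(size(x));                     % per-point y standard deviations
% Weight each residual by the combined variance sy^2 + (df/dx)^2*sx^2,
% which folds the x error into an effective y error.
cost  = @(p) sum((y - model(p, x)).^2 ./ (sy.^2 + dfdx(p, x).^2 .* sx.^2));
pBest = fminsearch(cost, [1, 0])             % initial guess [A, B]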

Naive Bayes classifier calculation

I'm trying to use a naive Bayes classifier to classify my dataset. My questions are:
1- Usually when we try to calculate the likelihood we use the formula:
P(c|x)= P(c|x1) * P(c|x2)*...P(c|xn)*P(c). But in some examples it says that in order to avoid getting very small results we use P(c|x)= exp(log(c|x1) + log(c|x2)+...log(c|xn) + logP(c)). Can anyone explain the difference between these two formulas to me, and are they both used to calculate the "likelihood", or is the second one used to calculate something called "information gain"?
2- In some cases when we try to classify our datasets, some joint probabilities are null. Some people use the "Laplace smoothing" technique in order to avoid null joints. Doesn't this technique influence the accuracy of our classification?
Thanks in advance for all your time. I'm just new to this algorithm and trying to learn more about it. So, are there any recommended papers I should read? Thanks a lot.
I'll take a stab at your first question, assuming you lost most of the P's in your second equation. I think the equation you are ultimately driving towards is:
log P(c|x) = log P(c|x1) + log P(c|x2) + ... + log P(c)
If so, the examples are pointing out that in many statistical calculations, it's often easier to work with the logarithm of a distribution function, as opposed to the distribution function itself.
Practically speaking, it's related to the fact that many statistical distributions involve an exponential function. For example, you can find where the maximum of a Gaussian distribution K*exp(-s_0*(x-x_0)^2) occurs by solving the mathematically less complex problem (if we're going through the whole formal process of taking derivatives and finding equation roots) of finding where the maximum of its logarithm, log(K) - s_0*(x-x_0)^2, occurs.
This leads to many places where "take the logarithm of both sides" is a standard step in an optimization calculation.
Also, computationally, when you are optimizing likelihood functions that may involve many multiplicative terms, adding logarithms of small floating-point numbers is less likely to cause numerical problems than multiplying small floating point numbers together is.
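A quick MATLAB illustration of that numerical point (the probabilities are made up): the direct product of many small likelihoods underflows to zero, while the sum of their logs stays perfectly representable.
p = 1e-20 * ones(1, 50);    % 50 tiny made-up per-feature likelihoods
direct   = prod(p)          % underflows to 0 (smallest double ~ 1e-308)
logSpace = sum(log(p))      % about -2302.6, fine for comparing classes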

Creating a stochastic time-series with given parameters

I would like to create a tool for generating a stochastic time series for which I can provide the distribution parameters (as for a normal distribution): the mean, standard deviation, skewness and kurtosis. There is a similar question here using R, but I am not able to interpret it and port it to MATLAB.
Does anyone know of something that can do this already? (I haven't been able to find anything.)
If not, what would be some good advice for starting something of my own? Any known useful functions? I would also like to be able to build upon it afterwards, for example: adding outliers, clusters of volatility, adjusting heteroscedasticity.
I realise that saying 'stochastic' and then in the same sentence 'given parameters' may seem odd, but it isn't - I want each time point to be random, but the parameters to describe, say, 10,000 time points.
If you're looking for the equivalent of the solution in R, MATLAB's Statistics Toolbox has limited support for the Johnson and Pearson distribution systems. In particular, the johnsrnd function produces random variates for the Johnson system. The pearsrnd function for the Pearson system, however, takes moments directly.
A big caveat: using moments to describe, fit, or produce random variates – often referred to as moment matching – is not robust and is poorly regarded by statisticians. Moments are not guaranteed to uniquely define a distribution unless you have the entire moment generating function.
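With that caveat in mind, a minimal pearsrnd sketch for the use case described, drawing 10,000 points from a Pearson-system distribution matching four target moments (the moment values are made up):
mu = 0; sigma = 1; skew = 0.5; kurt = 4;        % made-up target moments
r = pearsrnd(mu, sigma, skew, kurt, 10000, 1);  % 10,000 x 1 sample
% Check how close the sample moments come to the targets.
[mean(r), std(r), skewness(r), kurtosis(r)]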

Solving a non-polynomial equation numerically

I've got a problem with an equation that I am trying to solve numerically using both MATLAB and the Symbolic Toolbox. I've been through several pages of MATLAB help, picked up a few tricks and tried most of them, still without a satisfying result.
My goal is to solve a set of three non-polynomial equations in the angles q1, q2 and q3. Those variables represent joint angles in my industrial manipulator, and what I'm trying to achieve is to solve the inverse kinematics of this model. My set of equations looks like this: http://imgur.com/bU6XjNP
I'm solving it with
numeric::solve([z1,z2,z3], [q1=x1..x2,q2=x3..x4,q3=x5..x6], MultiSolutions)
Changing the xn constants according to my needs. Yet I still get some odd results: the q1 variable is off by approximately 0.1 rad, with q2 and q3 being off by ~0.01 rad. I don't have much experience with numeric solve, so I just need to know: is it supposed to look like that?
And, if not, what valid option do you suggest I take next? Maybe transforming the equations to polynomial form, maybe using a different toolbox?
Or, if trying to do this in MATLAB, how can you limit your solutions when using solve()? I'm thinking of an equivalent of the Symbolic Toolbox's assume() and assumeAlso().
I would be grateful for your help.
The numerical solution of a system of nonlinear equations is generally cast as an iterative minimization process: finding the global minimum of the norm of the difference between the left- and right-hand sides of the equations. For example, fsolve essentially uses Newton iterations. Those methods perform a "deterministic" optimization: they start from an initial guess and then move through the space of unknowns, essentially following the opposite of the gradient, until the solution is found.
You then have two kinds of issues:
Local minima: the stopping rule of the iteration is related to the gradient of the functional. When the gradient becomes small, the iterations are stopped. But the gradient can become small at local minima, besides the desired global one. When the initial guess is far from the actual solution, you get stuck at a false solution.
Ill-conditioning: small variations of the data can be reflected in large variations of the unknowns. So, small numerical errors in the data (for example, machine rounding) can lead to large variations of the unknowns.
Due to the above problems, the solution found by your numerical algorithm is likely to differ (even significantly) from the actual one.
I recommend that you make a consistency test: choose a starting guess (for example, when using fsolve) very close to the actual solution and verify that your final result is accurate. You will then discover that, as you move the initial guess farther away from the actual solution, your result is likely to show some (even large) errors. Of course, the magnitude of the errors depends on the nature of the system of equations; in some lucky cases, those errors may remain very small.
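A minimal sketch of that consistency test, on a made-up two-equation system whose roots are known (requires the Optimization Toolbox):
F = @(v) [v(1)^2 + v(2)^2 - 5; v(1)*v(2) - 2];   % roots at (1,2), (2,1), (-1,-2), (-2,-1)
opts = optimoptions('fsolve', 'Display', 'off');
near = fsolve(F, [1.1; 1.9], opts)   % close guess: recovers (1,2) accurately
far  = fsolve(F, [-10; 10],  opts)   % far guess: may land on another root or
                                     % stall at a local minimum of the norm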