matlab / xcov function - matlab

I looked into using xcov to calculate auto-covariances of a Tx1 time series vector. Using xcov in the first instance, it seemed as if this function would not devide by T as opposed to what the general formula for sample autocovariance would suggest: ?
Can someone confirm whether I need to devide by T to get a vector of sample auto-covariances? If I do not devide by T, the covariances seem to explode, as more and more terms are added. It is also not clear from the documentation of xcov.
For instance:
test = randn(10,1);
xcov(test)
Best
Marcel

Related

How to generate a 2D random vector in MATLAB?

I have a non-negative function f defined on a unit square S = [0,1] x [0,1] such that
My question is, how can I use MATLAB to generate a 2D random vector from S according to the probability density function f?
Rejection Sampling
The suggestion Luis Mendo made is very good because it applies to nearly all distribution functions. Based on this answer I wrote code for m.
An important point when using rejection sampling this way is that you must know the maximum of your pdf within the range. If you over-estimate the maximum your code will only run slower. If you under-estimate it it will create wrong numbers!
The idea is that you sample many uniform distributed points and accept depending on the probability density for the points.
pdf=#(x).5.*x(:,1)+3./2.*x(:,2);
maximum=2; %Right maximum for THIS EXAMPLE.
%If you are unable to determine the maximum of your
%function within the [0,1]x[0,1] range, please give an example.
result=[];
n=10;
while (size(result,1)<n)
%1. sample random point:
val=rand(1,2);
%2. Accept with probability pdf(val)/maximum
if rand<pdf(val)/maximum
%append to solution
result(end+1,:)=val;
end
end
I know that this solution is not a fast implementation, but I wanted to start with an implementation as simple as possible to make sure that the concept of rejection sampling becomes clear.
ICDF
Besides rejection sampling there is a different approach to address this issue on a more mathematical level, but you need to sit down and do some math first to end up with a better solution. For 1 dimensional distributions you typically sample using the ICDF (inverted cumulative density function) function simply using ICDF(rand(n,1)) to get random samples.
If you manage to do the math, you could instead for your PDF function define two functions ICDF1 (ICDF for the first dimension) and ICDF2 (ICDF for the second dimension) in matlab.
The first ICDF1 would map unifrom random distributed samples to sample values for the first dimension of your random distribution.
The second ICDF2 would map the output if ICDF1 and uniform distributed samples to your intended solution.
Here is some matlab code assuming you already defined ICDF1 and ICDF2
samples=ICDF1(rand(n,1));
samples(:,2)=ICDF2(samples,rand(n,1));
The great advantage of this solution is, that it does not reject any samples, being potentially much faster.

For loop equation into Octave / Matlab code

I'm using octave 3.8.1 which works like matlab.
I have an array of thousands of values I've only included three groupings as an example below:
(amp1=0.2; freq1=3; phase1=1; is an example of one grouping)
t=0;
amp1=0.2; freq1=3; phase1=1; %1st grouping
amp2=1.4; freq2=2; phase2=1.7; %2nd grouping
amp3=0.8; freq3=5; phase3=1.5; %3rd grouping
The Octave / Matlab code below solves for Y so I can plug it back into the equation to check values along with calculating values not located in the array.
clear all
t=0;
Y=0;
a1=[.2,3,1;1.4,2,1.7;.8,5,1.5]
for kk=1:1:length(a1)
Y=Y+a1(kk,1)*cos ((a1(kk,2))*t+a1(kk,3))
kk
end
Y
PS: I'm not trying to solve for Y since it's already solved for I'm trying to solve for Phase
The formulas located below are used to calculate Phase but I'm not sure how to put it into a for loop that will work in an array of n groupings:
How would I write the equation / for loop for finding the phase if I want to find freq=2.5 and amp=.23 and the phase is unknown I've looked online and it may require writing non linear equations which I'm not sure how to convert what I'm trying to do into such an equation.
phase1_test=acos(Y/amp1-amp3*cos(2*freq3*pi*t+phase3)/amp1-amp2*cos(2*freq2*pi*t+phase2)/amp1)-2*freq1*pi*t
phase2_test=acos(Y/amp2-amp3*cos(2*freq3*pi*t+phase3)/amp2-amp1*cos(2*freq1*pi*t+phase1)/amp2)-2*freq2*pi*t
phase3_test=acos(Y/amp3-amp2*cos(2*freq2*pi*t+phase2)/amp3-amp1*cos(2*freq1*pi*t+phase1)/amp3)-2*freq2*pi*t
Image of formula below:
I would like to do a check / calculate phases if given a freq and amp values.
I know I have to do a for loop but how do I convert the phase equation into a for loop so it will work on n groupings in an array and calculate different values not found in the array?
Basically I would be given an array of n groupings and freq=2.5 and amp=.23 and use the formula to calculate phase. Note: freq will not always be in the array hence why I'm trying to calculate the phase using a formula.
Ok, I think I finally understand your question:
you are trying to find a set of phase1, phase2,..., phaseN, such that equations like the ones you describe are satisfied
You know how to find y, and you supply values for freq and amp.
In Matlab, such a problem would be solved using, for example fsolve, but let's look at your problem step by step.
For simplicity, let me re-write your equations for phase1, phase2, and phase3. For example, your first equation, the one for phase1, would read
amp1*cos(phase1 + 2 freq1 pi t) + amp2*cos(2 freq2 pi t + phase2) + amp3*cos(2 freq3 pi t + phase3) - y = 0
Note that ampX (X is a placeholder for 1, 2, 3) are given, pi is a constant, t is given via Y (I think), freqX are given.
Hence, you are, in fact, dealing with a non-linear vector equation of the form
F(phase) = 0
where F is a multi-dimensional (vector) function taking a multi-dimensional (vector) input variable phase (comprised of phase1, phase2,..., phaseN). And you are looking for the set of phaseX, where all of the components of your vector function F are zero. N.B. F is a shorthand for your functions. Therefore, the first component of F, called f1, for example, is
f1 = amp1*cos(phase1+...) + amp2*cos(phase2+...) + amp3*cos(phase3+...) - y = 0.
Hence, f1 is a one-dimensional function of phase1, phase2, and phase3.
The technical term for what you are trying to do is find a zero of a non-linear vector function, or find a solution of a non-linear vector function. In Matlab, there are different approaches.
For a one-dimensional function, you can use fzero, which is explained at http://www.mathworks.com/help/matlab/ref/fzero.html?refresh=true
For a multi-dimensional (vector) function as yours, I would look into using fsolve, which is part of Matlab's optimization toolbox (which means I don't know how to do this in Octave). The function fsolve is explained at http://www.mathworks.com/help/optim/ug/fsolve.html
If you know an approximate solution for your phases, you may also look into iterative, local methods.
In particular, I would recommend you look into the Newton's Method, which allows you to find a solution to your system of equations F. Wikipedia has a good explanation of Newton's Method at https://en.wikipedia.org/wiki/Newton%27s_method . Newton iterations are very simple to implement and you should find a lot of resources online. You will have to compute the derivative of your function F with respect to each of your variables phaseX, which is very simple to compute since you're only dealing with cos() functions. For starters, have a look at the one-dimensional Newton iteration method in Matlab at http://www.math.colostate.edu/~gerhard/classes/331/lab/newton.html .
Finally, if you want to dig deeper, I found a textbook on this topic from the society for industrial and applied math: https://www.siam.org/books/textbooks/fr16_book.pdf .
As you can see, this is a very large field; Newton's method should be able to help you out, though.
Good luck!

Using Linear Prediction Over Time Series to Determine Next K Points

I have a time series of N data points of sunspots and would like to predict based on a subset of these points the remaining points in the series and then compare the correctness.
I'm just getting introduced to linear prediction using Matlab and so have decided that I would go the route of using the following code segment within a loop so that every point outside of the training set until the end of the given data has a prediction:
%x is the data, training set is some subset of x starting from beginning
%'unknown' is the number of points to extend the prediction over starting from the
%end of the training set (i.e. difference in length of training set and data vectors)
%x_pred is set to x initially
p = length(training_set);
coeffs = lpc(training_set, p);
for i=1:unknown
nextValue = -coeffs(2:end) * x_pred(end-unknown-1+i:-1:end-unknown-1+i-p+1)';
x_pred(end-unknown+i) = nextValue;
end
error = norm(x - x_pred)
I have three questions regarding this:
1) Does this appropriately do what I have described? I ask because my error seems rather large (>100) when predicting over only the last 20 points of a dataset that has hundreds of points.
2) Am I interpreting the second argument of lpc correctly? Namely, that it means the 'order' or rather number of points that you want to use in predicting the next point?
3) If this is there a more efficient, single line function in Matlab that I can call to replace the looping and just compute all necessary predictions for me given some subset of my overall data as a training set?
I tried looking through the lpc Matlab tutorial but it didn't seem to do the prediction as I have described my needs require. I have also been using How to use aryule() in Matlab to extend a number series? as a reference.
So after much deliberation and experimentation I have found the above approach to be correct and there does not appear to be any single Matlab function to do the above work. The large errors experienced are reasonable since I am using a linear prediction algorithm for a problem (i.e. sunspot prediction) that has inherent nonlinear behavior.
Hope this helps anyone else out there working on something similar.

Optimal choosing : MATLAB

Consider 2 sets
A = randi(1000,100,7);
B = randi(700,300,7);
I would like to find a function : B# = optimf(A,B) and gives me B# = {100x7} which is a collection of rows from B such that some attribute( eg. mean ) is minimum.
For eg: B# = optimf(A,B) such that mean(B#) - mean(A) is minimum.
Any ideas?
According to me Optimization is finding the best suitable value of a function. For example, if you have an equation and you need to find minimum value at which the equation satisfies criteria, then this is an optimization problem.
There is no such optimization function AFAIK for your function. But, you can take help of optimization algorithms like Least Square Errors
Or simply use some filter of MATLAB.
I hope it helps.
UPDATE
(Not a sophisticated solution but just works in some cases.)
Step 1-
Make a for-loop and select randomly some values from input vector. So you get a random subset.
Step 2
Define a cost function. A function that can measure how good is the subset. The function will take input as vector and gives output a numerical quantity such as % of quality.
Step 3
Go on taking these readings. Take the max value of output function and its corresponding vector. That should be a solution.
OR
use algorithms like ACO

Goodness of fit with MATLAB and chi-square test

I would like to measure the goodness-of-fit to an exponential decay curve. I am using the lsqcurvefit MATLAB function. I have been suggested by someone to do a chi-square test.
I would like to use the MATLAB function chi2gof but I am not sure how I would tell it that the data is being fitted to an exponential curve
The chi2gof function tests the null hypothesis that a set of data, say X, is a random sample drawn from some specified distribution (such as the exponential distribution).
From your description in the question, it sounds like you want to see how well your data X fits an exponential decay function. I really must emphasize, this is completely different to testing whether X is a random sample drawn from the exponential distribution. If you use chi2gof for your stated purpose, you'll get meaningless results.
The usual approach for testing the goodness of fit for some data X to some function f is least squares, or some variant on least squares. Further, a least squares approach can be used to generate test statistics that test goodness-of-fit, many of which are distributed according to the chi-square distribution. I believe this is probably what your friend was referring to.
EDIT: I have a few spare minutes so here's something to get you started. DISCLAIMER: I've never worked specifically on this problem, so what follows may not be correct. I'm going to assume you have a set of data x_n, n = 1, ..., N, and the corresponding timestamps for the data, t_n, n = 1, ..., N. Now, the exponential decay function is y_n = y_0 * e^{-b * t_n}. Note that by taking the natural logarithm of both sides we get: ln(y_n) = ln(y_0) - b * t_n. Okay, so this suggests using OLS to estimate the linear model ln(x_n) = ln(x_0) - b * t_n + e_n. Nice! Because now we can test goodness-of-fit using the standard R^2 measure, which matlab will return in the stats structure if you use the regress function to perform OLS. Hope this helps. Again I emphasize, I came up with this off the top of my head in a couple of minutes, so there may be good reasons why what I've suggested is a bad idea. Also, if you know the initial value of the process (ie x_0), then you may want to look into constrained least squares where you bind the parameter ln(x_0) to its known value.