Matlab: going from spectral density to variance

I am trying to understand the relationship between the spectral density of a time series and its variance. From what I understand, the integral of the spectral density should be equal to the variance. At least according to most lecture notes such as these.
I am struggling to replicate this finding, however. Let's say I generate a simple AR(1) series with an autoregressive coefficient of 0.9.
T = 1000;              % sample size
rho = 0.9;             % AR(1) coefficient
dat = zeros(T,1);
for ii = 2:T
    dat(ii) = rho*dat(ii-1) + randn;   % unit-variance Gaussian innovations
end
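As a sanity check (assuming unit-variance innovations, as in the loop above), the theoretical variance of this AR(1) is 1/(1 - rho^2):
theoVar = 1/(1 - rho^2);   % = 5.26 for rho = 0.9
var(dat)                   % the sample variance should be in this neighbourhood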
I then proceed to calculate the spectral density (autocov does the same thing as xcov in the Signal Processing Toolbox, which I don't have: it returns the autocovariances of the demeaned series, with the variance in the middle of the vector).
lag = 20;
autocovs = autocov(dat,lag);   % autocovariances at lags -20..20, variance in the middle
lags = -lag:1:lag;
wb = 0:pi/64:pi;               % frequency grid on [0, pi]
rT = sqrt(length(dat));
weight = 1 - abs(lags)/rT;
weight(abs(lags) > rT) = 0;    % Bartlett weights
sdb = zeros(size(wb));
for j = 1:length(wb)
    sdb(j) = real(sum(autocovs'.*weight.*exp(-1i*wb(j).*lags)))/(2*pi);
end
sdb is the estimated spectral density and certainly has the correct shape for an AR(1), with the mass concentrated at low frequencies. But the sum of the power spectrum is 54.5, while the variance of the simulated AR(1) series is around 5.
What am I missing? I understood the spectral density to describe how the variance of the series is distributed across the spectrum. I'm not sure whether I have misunderstood the theory or made a coding error. Any good references would be much appreciated.
Edit: I realized that obviously summing the "sdb" series is not taking the integral. To integrate over [-pi, pi], I should be summing sdb*(2*pi/130), or equivalently sdb*(pi/65), since I am only looking at the positive segment [0, pi] and sdb is symmetric for negative values. However, I still seem to get a number that is bigger than the variance (even after resimulating multiple times)... am I still missing something? The sdb line above becomes
sdb(j) = real(sum(autocovs'.*weight.*exp(-i*wb(j).*lags)))/(65);
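For reference, one explicit way to write that integral (a sketch, assuming the original /(2*pi) version of sdb evaluated on the grid wb = 0:pi/64:pi) is:
intSD = 2*trapz(wb, sdb);   % the density is symmetric, so double the integral over [0, pi]
intSD                       % compare with var(dat)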


How do I calculate an exponentially weighted moving mean using DSP signal processing toolbox

I am using MATLAB R2020a on macOS. I am trying to find the exponentially weighted moving mean of the cycle period of an ECG signal, and have used dsp.MovingAverage from the DSP System Toolbox, called as shown below. However, I am not sure how to specify how many elements of the vector to include in the weighted mean. At the moment, is it just weighting all of the elements and then finding the moving mean?
movavgExp = dsp.MovingAverage('Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
Whenever I set the 'WindowLength' property as specified in the DSP documentation, it produces a warning:
movavgExp = dsp.MovingAverage(10, 'Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
Warning: The WindowLength property is not relevant in this configuration of the System
object.
I would really appreciate any suggestions for this, thanks in advance!
From the Mathworks page for dsp.MovingAverage:
"Exponential weighting — The block multiplies the samples by a set of weighting factors. The magnitude of the weighting factors decreases exponentially as the age of the data increases, but the magnitude never reaches zero. To compute the average, the algorithm sums the weighted data."
So there is no real averaging window: all of the signal up to time t is used (exponentially weighted) for the mean value at that instant.
Of course older samples are weighted less than newer ones, and the parameter controlling that is the ForgettingFactor. I guess you could then define an "effective" averaging window as the number of samples whose weight is larger than a threshold.
Unfortunately it doesn't seem like dsp.MovingAverage can return the weights itself, but you can calculate them yourself. From the Mathworks page,
the weights follow the recursion w(N) = lambda*w(N-1) + 1, where w(N) is the weight for the Nth sample and lambda is your forgetting factor. Remember to initialize the weight for the first sample to 1, so you could have something like:
lambda = 0.1;             % your ForgettingFactor
w = zeros(length(x),1);   % where x is your signal
w(1) = 1;                 % initialize the weight for the first sample
for i = 2:length(x)
    w(i) = lambda*w(i-1) + 1;   % calculate the successive weights
end
To then get the effective averaging window for the N-th sample, I would normalize the weights from 1 to N with respect to their sum:
thr = 1.e-3; % your threshold, you'll probably have to play with this a bit
lengthAveragWdw = zeros(length(x),1);
for i = 1:length(x)
    wi = w(1:i);                         % weights used for the moving average up to the i-th sample
    wi = wi./sum(wi);                    % normalize the weights
    lengthAveragWdw(i) = sum(wi >= thr); % count the samples whose weight is above the threshold
end
where thr is a threshold value that you have to decide beforehand.
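For completeness, the exponentially weighted moving mean itself is obtained by calling the System object on your signal; a minimal sketch (where x is whatever column vector you feed it, e.g. your cycle periods) could look like:
movavgExp = dsp.MovingAverage('Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
y = movavgExp(x);   % y(i) is the exponentially weighted mean of x(1:i)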

Variance in random walk with Matlab

I'm new to the forum and a beginner in programming.
I have been given the task of programming a random walk in Matlab (1D or 2D) with a variance that I can adjust. I found code for the random walk, but I'm really confused about where to put the variance. I thought that the random walk always has the same variance (= t), so maybe I'm just lost in the math.
How do I control the variance?
For a simple random walk, consider using the Normal distribution with mean 0 (also called 'drift') and a non-zero variance. Notice that since the mean is zero and the distribution is symmetric, this is a symmetric random walk: on each step, the process is equally likely to go up or down, left or right, etc.
One easy way:
Step 1: Generate each step
Step 2: Get the cumulative sum
This can be done for any number of dimensions.
% MATLAB R2019a
drift = 0;
std = 1; % std = sqrt(variance)
pd = makedist('Normal',drift,std);
% One Dimension
nsteps = 50;
Z = random(pd,nsteps,1);
X = [0; cumsum(Z)];
plot(0:nsteps,X) % alternatively: stairs(0:nsteps,X)
And in two dimensions:
% Two Dimensions
nsteps = 100;
Z = random(pd,nsteps,2);
X = [zeros(1,2); cumsum(Z)];
% 2D Plot
figure, hold on, box on
plot(X(1,1),X(1,2),'gd','DisplayName','Start','MarkerFaceColor','g')
plot(X(:,1),X(:,2),'k-','HandleVisibility','off')
plot(X(end,1),X(end,2),'rs','DisplayName','Stop','MarkerFaceColor','r')
legend('show')
The variance affects the "volatility": a higher variance means a more "jumpy" process relative to a lower-variance one.
Note: I've intentionally avoided the Brownian motion-type implementation (scaling, step size decreasing in the limit, etc.) since OP specifically asked for a random walk. A Brownian motion implementation can link the variance to a time-index due to Gaussian properties.
The OP writes:
the random walk has always the same variance
This is true for the steps (each step typically has the same distribution). However, the variance of the process at a time step (or point in time) should be increasing with the number of steps (or as time increases).
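To see this numerically, here is a minimal sketch (reusing pd, nsteps, and std from the code above) that simulates many independent 1D walks and compares the sample variance at each step with step*variance:
% Sketch: the variance of the walk grows linearly with the step index
nwalks = 5000;                 % number of independent walks
Z = random(pd,nsteps,nwalks);  % one walk per column
X = cumsum(Z,1);
empVar = var(X,0,2);           % sample variance across walks at each step
plot(1:nsteps, empVar, 1:nsteps, (1:nsteps)*std^2)  % std (= sqrt(variance)) as defined above
legend('empirical variance','step * variance')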
Related:
MATLAB: plotting a random walk

Matlab: computing signal to noise ratio (SNR) of two highly correlated time domain signals

I'm working in the space of biosignal acquisition. I performed an experiment as detailed below, and am now trying to obtain some results from the data.
I have a text file of a signal in Matlab. I loaded the signal onto a waveform generator, then I recorded the generator output on an oscilloscope.
I imported the recorded signal from the oscilloscope back into Matlab.
The Pearson's correlation coefficient between the original signal and the oscilloscope signal is 0.9958 (obtained using the corrcoef function).
I want to compute the SNR of the oscilloscope signal (what I'm calling my signal plus whatever noise is introduced through the digital-to-analog conversion and vice versa). I have attached a snippet of the 2 signals for reference.
So my original signal is X and oscilloscope signal is X + N.
I used the snr function to compute SNR as follows.
snr(original, (oscilloscope - original))
The result I got was 20.44 dB.
This seems off to me, as I would have thought that with such a high correlation the SNR should be much higher.
Or is it not appropriate to try and compute SNR in this sort of situation?
All help is appreciated.
Thanks
Edit: Graph of a couple of results vs Sleutheye's simulated relationship
You might be surprised at how even such a moderate SNR can still result in fairly high correlations.
I ran an experiment to illustrate the approximate relation between correlation and the signal-to-noise-ratio estimate. Since I did not have your specific EEG signal, I just used a constant reference signal and some white Gaussian noise. Keep in mind that the relationship could be affected by the nature of the signal and noise, but it should give you an idea of what to expect. The simulation can be executed with the following code:
SNR = 10:1:40;                       % SNR values to test, in dB
M = 10000;                           % number of samples
C = zeros(size(SNR));
for i=1:length(SNR)
    x = ones(1,M);                                    % constant reference signal
    K = sqrt(sum(x.*x)/M)*power(10, -SNR(i)/20);      % noise standard deviation for this SNR
    z = x + K*randn(size(x));                         % noisy observation
    C(i) = xcorr(x,z,0)./sqrt(sum(x.*x)*sum(z.*z));   % normalized correlation at lag 0
end
figure(1);
hold off; plot(SNR, C);
corr0 = 0.9958;
hold on; plot([SNR(1) SNR(end)], [corr0 corr0], 'k:');
snr0 = 20.44;
hold on; plot([snr0 snr0], [min(C) max(C)], 'r:');
xlabel('SNR (dB)');
ylabel('Correlation');
The dotted black horizontal line highlights your 0.9958 correlation measurement, and the dotted red vertical line highlights your 20.44 dB SNR result.
I'd say that's a pretty good match!
In fact, for this specific case in my simulation (x = 1; z = x + N(0,σ)), if we denote C(x,z) to be the correlation between x and z, and σ as the noise standard deviation, we can actually show that:
C(x,z) = 1/sqrt(1 + σ^2), or equivalently SNR = -10*log10( 1/C(x,z)^2 - 1 ) dB.
Given a correlation value of 0.9958, this would yield an SNR of 20.79dB, which is consistent with your results.
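As a quick numerical check of that closed-form relation (a sketch under the same constant-signal assumption):
corr0 = 0.9958;
sigma2 = 1/corr0^2 - 1;        % implied noise variance for a unit-power signal
SNR_est = -10*log10(sigma2)    % implied SNR in dB; compare with the measured 20.44 dB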

Transforming draws in Matlab from Gaussian mixture to uniform

Consider the following draws for a 2x1 vector in Matlab with a probability distribution that is a mixture of two Gaussian components.
P=10^3; %number draws
v=1;
%First component
mu_a = [0,0.5];
sigma_a = [v,0;0,v];
%Second component
mu_b = [0,8.2];
sigma_b = [v,0;0,v];
%Combine
MU = [mu_a;mu_b];
SIGMA = cat(3,sigma_a,sigma_b);
w = ones(1,2)/2; %equal weight 0.5
obj = gmdistribution(MU,SIGMA,w);
%Draws
RV_temp = random(obj,P);%Px2
% Transform each component of RV_temp into a uniform in [0,1] by estimating the cdf.
RV1=ksdensity(RV_temp(:,1), RV_temp(:,1),'function', 'cdf');
RV2=ksdensity(RV_temp(:,2), RV_temp(:,2),'function', 'cdf');
Now, if we check whether RV1 and RV2 are uniformly distributed on [0,1] by doing
ecdf(RV1)
ecdf(RV2)
we can see that RV1 is uniformly distributed on [0,1] (the empirical cdf is close to the 45 degree line) while RV2 is not.
I don't understand why. It seems that the more distant mu_a(2) and mu_b(2) are, the worse the job done by ksdensity with a reasonable number of draws. Why?
When you have a mixture of N(0.5,v) and N(8.2,v), the range of the generated data is larger than if you had expectations which were closer, like the N(0,v) and N(0,v) you have in the other dimension. You then ask ksdensity to approximate a function using P points inside this range.
Like in standard linear interpolation, the denser the points, the better the approximation of the function (inside the range); the same applies here. Thus with N(0.5,v) and N(8.2,v), where the points are sparser, the approximation is worse than with N(0,v) and N(0,v), where the points are denser.
As a small side note, is there any reason you do not apply ksdensity directly to the bivariate data? Also, I cannot reproduce your comment where you say that 5e2 points are also good. Final comment: 1e3 is typically preferred over 10^3.
I think this is simply about the number of samples you're using. For the first component, the means of the two Gaussians are relatively close, hence a thousand samples are enough to obtain a cdf really close to the U[0,1] cdf. For the second component, though, the means are farther apart and you need more samples. With 100,000 samples, I obtained the following result:
With 1000 I obtained this:
This is clearly farther from the uniform cdf. Try increasing the number of samples to a million and check whether the result gets closer again.
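A minimal way to check this (a sketch reusing obj from the question) is to redraw with a larger P and compare the empirical cdf of the transformed second component with the 45-degree line:
P2 = 1e5;                          % many more draws than the original 1e3
RV_temp2 = random(obj,P2);
RV2b = ksdensity(RV_temp2(:,2), RV_temp2(:,2), 'function', 'cdf');
[f,u] = ecdf(RV2b);
plot(u, f, [0 1], [0 1], '--')     % empirical cdf against the 45-degree line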

entropy estimation using histogram of normal data vs direct formula (matlab)

Let's assume we have drawn n=10000 samples of the standard normal distribution.
Now I want to calculate its entropy using histograms to calculate the probabilities.
1) calculate probabilities (for example using matlab)
[p,x] = hist(samples,binnumbers);
area = (x(2)-x(1))*sum(p);
p = p/area;
(binnumbers is determined by some rule)
2) estimate entropy
H = -sum(p.*log2(p))
which gives 58.6488
Now when I use the direct formula to calculate the entropy of normal data
H = 0.5*log2(2*pi*exp(1)) = 2.0471
What am I doing wrong when using the histogram + entropy formula?
Thank you very much for any help!!
You are missing the dp term in the sum:
dp = x(2)-x(1);                  % bin width
area = sum(p)*dp;                % should be 1 if p was normalized as above
H = -sum( (p*dp) .* log2(p) );   % element-wise product, i.e. sum of p*log2(p)*dp
This should bring you close enough...
PS: be careful when you take log2(p), because sometimes you might have empty bins (p = 0 gives -Inf, and 0*-Inf gives NaN). You might find nansum useful.
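Putting it together, a minimal end-to-end sketch (using histcounts instead of the older hist, with a hypothetical choice of 100 bins) would be:
n = 10000;
samples = randn(n,1);
[p, edges] = histcounts(samples, 100, 'Normalization', 'pdf');   % bin heights normalized to a density
dp = edges(2) - edges(1);                                        % bin width
nz = p > 0;                                                      % skip empty bins to avoid log2(0)
H = -sum(p(nz)*dp .* log2(p(nz)))   % should come out close to 0.5*log2(2*pi*exp(1)) = 2.0471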