I'm new to the forum and a beginner in programming.
I have the task to program a random walk in Matlab (1D or 2D) with a variance that I can adjust. I found the code for the random walk, but I'm really confused where to put the variance. I thought that the random walk always has the same variance (= t) so maybe I'm just lost in the math.
How do I control the variance?
For a simple random walk, consider using the Normal distribution with mean 0 (also called 'drift') and a non-zero variance. Notice since the mean is zero and the distribution is symmetric, this is a symmetric random walk. On each step, the process is equally like to go up or down, left or right, etc.
One easy way:
Step 1: Generate each step
Step 2: Get the cumulative sum
This can be done for any number of dimensions.
% MATLAB R2019a
drift = 0;
std = 1; % std = sqrt(variance)
pd = makedist('Normal',drift,std);
% One Dimension
nsteps = 50;
Z = random(pd,nsteps,1);
X = [0; cumsum(Z)];
plot(0:nsteps,X) % alternatively: stairs(0:nsteps,X)
And in two dimensions:
% Two Dimensions
nsteps = 100;
Z = random(pd,nsteps,2);
X = [zeros(1,2); cumsum(Z)];
% 2D Plot
figure, hold on, box on
plot(X(1,1),X(1,1),'gd','DisplayName','Start','MarkerFaceColor','g')
plot(X(:,1),X(:,2),'k-','HandleVisibility','off')
plot(X(end,1),X(end,2),'rs','DisplayName','Stop','MarkerFaceColor','r')
legend('show')
The variance will affect the "volatility" so a higher variance means a more "jumpy" process relative to the lower variance.
Note: I've intentionally avoided the Brownian motion-type implementation (scaling, step size decreasing in the limit, etc.) since OP specifically asked for a random walk. A Brownian motion implementation can link the variance to a time-index due to Gaussian properties.
The OP writes:
the random walk has always the same variance
This is true for the steps (each step typically has the same distribution). However, the variance of the process at a time step (or point in time) should be increasing with the number of steps (or as time increases).
Related:
MATLAB: plotting a random walk
Related
I am using MATLAB R2020a with MacOS. I am trying to find the exponentially weighted moving mean of the cycle period of an ECG signal, and have used the dsp.MovingAverage function from the DSP signal processing toolbox, and called the commands shown. However, I am not sure how to specify how many of the elements of the vector to include in the weighted mean. At the moment, is it just adding a weight to all of the elements and then finding the moving mean?
movavgExp = dsp.MovingAverage('Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
Whenever I call the 'WindowLength' command as specified in the DSP documentation, it produces an error:
movavgExp = dsp.MovingAverage(10, 'Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
Warning: The WindowLength property is not relevant in this configuration of the System
object.
I would really appreciate any suggestions for this, thanks in advance!
From the Mathworks page for dsp.MovingAverage:
"Exponential weighting — The block multiplies the samples by a set of weighting factors. The magnitude of the weighting factors decreases exponentially as the age of the data increases, but the magnitude never reaches zero. To compute the average, the algorithm sums the weighted data."
So there is no real averaging window as you use all your signal up to time t (exponentially weighted) for the mean value at that instant.
Of course older samples are weighted less than newer ones, and the parameter for that is that ForgettingFactor. I guess you could then define an "effective" averaging window as the number of samples whose weight is larger than a threshold.
Unfortunately it doesn't seem like dsp.MovingAverage can return the weights itself, but you can calculate them yourself. From the Mathworks page,
where is the weight for the Nth sample and is your forgetting factor. Remember to initialize the weight for the first sample to 1, so that you could have something like:
w = zeros(length(x),1); % where x is your signal
w(1) = 1; % initialize the weight for the first sample
for i = 2:length(x)
w(i) = lambda*w(i-1) + 1; % calculate the successive weights
end
To have then the averaging window for the N-th sample I would probably then normalize the weights from 1 to N with respect to the their sum:
thr = 1.e-3; % your threshold, you'll probably have to play with this a bit
lengthAveragWdw = zeros(length(x),1);
for i = 1:length(x)
wi = w(1:i); % weights used to calculate the moving average up to the i-th sample
wi = wi./sum(wi); % normalize the weights
lengthAveragWdw(i) = sum(wi >= thr); % count the number of samples whose weight is greater than the threshold
end
where thr is a threshold value that you have to decide beforehand.
I am trying to understand the relationship between the spectral density of a time series and its variance. From what I understand, the integral of the spectral density should be equal to the variance. At least according to most lecture notes such as these.
I am struggling to replicate this finding however. Lets say I generate a simple AR(1) series with an autoregressive coefficient of 0.9.
T = 1000;
rho = 0.9;
dat = zeros(T,1);
for ii = 2:T
dat(ii) = rho*dat(ii-1)+randn;
end
I then proceed to calculate the spectral density (autocov does the same thing as xcov in the signal toolbox, which i don't have. it is the covariances of the demeaned series, with the variance in the middle of the vector)
lag = 20;
autocovs = autocov(dat,lag);
lags = -lag:1:lag;
wb = 0:pi/64:pi;
rT = sqrt(length(dat));
weight = 1-abs(lags)/rT;
weight(abs(lags)>rT) = 0; %bartlett weight
for j = 1:length(wb)
sdb(j) = real(sum(autocovs'.*weight.*exp(-i*wb(j).*lags)))/(2*pi);
end
sdb = sdb;
sdb is the power density function and is certainly the correct shape for an AR(1), weighted towards low frequencies:
enter image description here. But the sum of the power spectrum is 54.5, while the variance of the simulated AR(1) series is around 5.
What am I missing? I understood the spectral density to be how the variance of the series was distributed across the spectrum. I'm not sure if I have misunderstood the thoery or made a coding error. Any good references would be much appreciated.
Edit: I realized that obviously summing the "sdb" series is not taking the integral. To integrate between {-pi,pi}, I should be summing sdb*(2*pi/130), or equivalently sdb*(pi/65) since i am only looking at the {+pi} segment and sdb is symmetric for negative values. I still however seem to get a number that is bigger than the variance (even resimulating multiple times)... am I still missing something? The sdb line above becomes
sdb(j) = real(sum(autocovs'.*weight.*exp(-i*wb(j).*lags)))/(65);
Consider the following draws for a 2x1 vector in Matlab with a probability distribution that is a mixture of two Gaussian components.
P=10^3; %number draws
v=1;
%First component
mu_a = [0,0.5];
sigma_a = [v,0;0,v];
%Second component
mu_b = [0,8.2];
sigma_b = [v,0;0,v];
%Combine
MU = [mu_a;mu_b];
SIGMA = cat(3,sigma_a,sigma_b);
w = ones(1,2)/2; %equal weight 0.5
obj = gmdistribution(MU,SIGMA,w);
%Draws
RV_temp = random(obj,P);%Px2
% Transform each component of RV_temp into a uniform in [0,1] by estimating the cdf.
RV1=ksdensity(RV_temp(:,1), RV_temp(:,1),'function', 'cdf');
RV2=ksdensity(RV_temp(:,2), RV_temp(:,2),'function', 'cdf');
Now, if we check whether RV1 and RV2 are uniformly distributed on [0,1] by doing
ecdf(RV1)
ecdf(RV2)
we can see that RV1 is uniformly distributed on [0,1] (the empirical cdf is close to the 45 degree line) while RV2 is not.
I don't understand why. It seems that the more distant are mu_a(2)and mu_b(2), the worse the job done by ksdensity with a reasonable number of draws. Why?
When you have a mixture of N(0.5,v) and N(8.2,v) then the range of the generated data is larger than if you had expectation which were closer, like N(0,v) and N(0,v), as you have in the other dimension. Then you ask ksdensity to approximate a function using P points inside this range.
Like in standard linear interpolation, the denser the points the better approximation of the function (inside the range), this is the same case here. Thus in the N(0.5,v) and N(8.2,v) where the points are "sparse" (or sparser, is that a word?) the approximation is worse than in the N(0,v) and N(0,v) where the points are denser.
As a small side note, are there any reason that you do not apply ksdensity directly on the bivariate data? Also I cannot reproduce your comment where you say that 5e2points are also good. Final comment, 1e3 is typically prefered over 10^3.
I think this is simply about the number of samples you're using. For the first example, the means of the two Gaussians are relatively close, hence a thousand samples are enough to obtain a cdf really close the the U[0,1] cdf. On the second vector though, you have a higher difference, and need more samples. With 100000 samples, I obtained the following result:
With 1000 I obtained this:
Which is clearly farther from the Uniform cdf function. Try to increase the number of samples to a million and check if the result is again getting closer.
I've got an arbitrary probability density function discretized as a matrix in Matlab, that means that for every pair x,y the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probably density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, first term out of of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
%look into CDF to see which index the rand val corresponds to
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
I hope that this helps!
Chip
I don't believe matlab has built-in functionality for generating multivariate random variables with arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can be easily generated based on the cumulative distribution function, the CDF does not exist for multivariate distributions, so generating such numbers is much more messy (the main problem is the fact that 2 or more variables have correlation). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using matlab:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute things which can be parametrized with x and y (which are naturally the same size as you mesh), then you can get along without having to do any additional thinking.
You could also define a function for the integration:
function res=trapz2(xv,yv,A,arg)
if ~isscalar(arg) && any(size(arg)~=size(A))
error('Size of A and var must be the same!')
end
res=trapz(xv,trapz(yv,A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.
I'd like to make an array of random samples from a Gaussian distrubution.
Mean value is 0 and variance is 1.
If I take enough samples, I would think my maximum value of a sample can be 0+1=1.
However, I find that I get values like 4.2891 ...
My code:
x = 0+sqrt(1)*randn(100000,1);
mean(x)
var(x)
max(x)
This would give me a mean like 0, a var of 0.9937 but my maximum value is 4.2891?
Can anyone help me out why it does this?
As others have mentioned, there is no bound on the possible values that x can take on in a gaussian distribution. However, the farther x is from the mean, the less likely it is to be drawn.
To give you some intuition for what the variance actually means (for any distribution, not just the gaussian case), you can look at the 68-95-99.7 rule. The rule says:
about 68% of the population will be within one sigma of the mean
about 95% of the population will be within two sigma's of the mean
about 99.7% of the population will be within three sigma's of the mean
Here sigma = sqrt(var) is the standard deviation of the distribution.
So while in theory it is possible to draw any x from a gaussian distribution, in practice it is unlikely to draw anything past 5 or 6 standard deviations away for a population of 100000.
This will yield N random numbers using the gaussian normal distribution.
N = 100;
mu = 0;
sigma = 1;
Xs = normrnd(mu, sigma, N);
EDIT:
I just realized that your code is in fact equivalent to what I've written.
As others already pointed out: variance is not the maximum distance a sample can deviate from the mean! (It is just the average of the squares of those distances)