In my homework, I am required to depict that a method can generate an Gaussian Distribution. the matlab program is shown below:
n=100;
b=25;
len=200000;
X=rand(n,len);
x=sum(X-0.5)*b/n;
[ps2,t2]=hist(x,50);
ps2=ps2/len;
bar(t2,ps2,'y');
hold on;
sigma_2=b^2/(12*n);
R=normrnd(0,sqrt(sigma_2),1,len);
[ps2,t2]=hist(R,50);
ps2=ps2/len;
plot(t2,ps2,'bo-','linewidth',1.5);
x is the sum of n uniformly distributed variables multiplying by b/n. And x is gaussian distributed with zero-mean and sigma^2=b^2/12n.
Then I got the image where the two distribution matched.
However, when I substituted the t2 inside the normal distibution density function f(x)=exp(-x.^2/(2*sigma_2))/sqrt(2*pi*sigma_2), the output is quite larger than the first one, although the shape is similar.
I wander why this occurs?
Its because you did not normalize discrete histograms. We know that in a continuous distributions the integral of probability functions are one. For solving this issue you should divide histogram to its integral. An approximate integral of a discrete function is rectangular integral:
integral (f) = sum(f)* LengthStep
so you should change your code this way :
n=100;
b=25;
len=200000;
X=rand(n,len);
x=sum(X-0.5)*b/n;
[ps2,t2]=hist(x,50);
ps2=ps2/(sum(ps2)*(t2(2)-t2(1))); % normalize discrete distribution
bar(t2,ps2,'y');
hold on;
sigma_2=b^2/(12*n);
R=normrnd(0,sqrt(sigma_2),1,len);
[ps2,t2]=hist(R,50);
ps2=ps2/(sum(ps2)*(t2(2)-t2(1))); % normalize discrete distribution
plot(t2,ps2,'bo-','linewidth',1.5);
hold on
plot(t2,exp(-t2.^2/(2*sigma_2))/sqrt(2*pi*sigma_2),'r'); %plot continuous distribution
and this is the result :
Related
Suppose I have a continuous probability distribution, e.g., Normal, on a support A. Suppose that there is a Matlab code that allows me to draw random numbers from such a distribution, e.g., this.
I want to build a Matlab code to "approximate" this continuous probability distribution with a probability mass function spanning over r points.
This means that I want to write a Matlab code to:
(1) Select r points from A. Let us call these points a1,a2,...,ar. These points will constitute the new discretised support.
(2) Construct a probability mass function over a1,a2,...,ar. This probability mass function should "well" approximate the original continuous probability distribution.
Could you help by providing also an example? This is a similar question asked for Julia.
Here some of my thoughts. Suppose that the continuous probability distribution of interest is one-dimensional. One way to go could be:
(1) Draw 10^6 random numbers from the continuous probability distribution of interest and store them in a column vector D.
(2) Suppose that r=10. Compute the 10-th, 20-th,..., 90-th quantiles of D. Find the median point falling in each of the 10 bins obtained. Call these median points a1,...,ar.
How can I construct the probability mass function from here?
Also, how can I generalise this procedure to more than one dimension?
Update using histcounts: I thought about using histcounts. Do you think it is a valid option? For many dimensions I can use this.
clear
rng default
%(1) Draw P random numbers for standard normal distribution
P=10^6;
X = randn(P,1);
%(2) Apply histcounts
[N,edges] = histcounts(X);
%(3) Construct the new discrete random variable
%(3.1) The support of the discrete random variable is the collection of the mean values of each bin
supp=zeros(size(N,2),1);
for j=2:size(N,2)+1
supp(j-1)=(edges(j)-edges(j-1))/2+edges(j-1);
end
%(3.2) The probability mass function of the discrete random variable is the
%number of X within each bin divided by P
pmass=N/P;
%(4) Check if the approximation is OK
%(4.1) Find the CDF of the discrete random variable
CDF_discrete=zeros(size(N,2),1);
for h=2:size(N,2)+1
CDF_discrete(h-1)=sum(X<=edges(h))/P;
end
%(4.2) Plot empirical CDF of the original random variable and CDF_discrete
ecdf(X)
hold on
scatter(supp, CDF_discrete)
I don't know if this is what you're after but maybe it can help you. You know, P(X = x) = 0 for any point in a continuous probability distribution, that is the pointwise probability of X mapping to x is infinitesimal small, and thus regarded as 0.
What you could do instead, in order to approximate it to a discrete probability space, is to define some points (x_1, x_2, ..., x_n), and let their discrete probabilities be the integral of some range of the PDF (from your continuous probability distribution), that is
P(x_1) = P(X \in (-infty, x_1_end)), P(x_2) = P(X \in (x_1_end, x_2_end)), ..., P(x_n) = P(X \in (x_(n-1)_end, +infty))
:-)
I am not doing signal processing. But in my area, I will use the spectral density of a matrix of data. I get quite confused at a very detailed level.
%matrix H is given.
corr=xcorr2(H); %get the correlation
spec=fft2(corr); % Wiener-Khinchin Theorem
In matlab, xcorr2 will calculate the correlation function of this matrix. The lag will range from -N+1 to N-1. So if size of matrix H is N by N, then size of corr will be 2N-1 by 2N-1. For discretized data, I should use corr or half of corr?
Another problem is I think Wiener-Khinchin Theorem is basically for continuous function. I have always thought that Discretized FT is an approximation to Continuous FT, or you can say it is a tool to calculate Continuous FT. If you use matlab build in function 'fft', you should divide the final result by \delta x.
Any kind soul who knows this area well there to share some matlab code with me?
Basically, approximating a continuous FT by a Discretized FT is the same as approximating an integral by a finite sum.
We will first discuss the 1D case, then we'll discuss the 2D case.
Let's look at the Wiener-Kinchin theorem (for example here).
It states that :
"For the discrete-time case, the power spectral density of the function with discrete values x[n], is :
where
Is the autocorrelation function of x[n]."
1) You can see already that the sum is taken from -infty to +infty in the calculation of S(f)
2) Now considering the Matlab fft - You can see (command 'edit fft' in Matlab), that it is defined as :
X(k) = sum_{n=1}^N x(n)*exp(-j*2*pi*(k-1)*(n-1)/N), 1 <= k <= N.
which is exactly what you want to be done in order to calculate the power spectral density for a frequency f.
Note that, for continuous functions, S(f) will be a continuous function. For Discretized function, S(f) will be discrete.
Now that we know all that, it can easily be extended to the 2D case. Indeed, the structure of fft2 matches the structure of the right hand side of the Wiener-Kinchin Theorem for the 2D case.
Though, it will be necessary to divide your result by NxM, where N is the number of sample points in x and M is the number of sample points in y.
I've got an arbitrary probability density function discretized as a matrix in Matlab, that means that for every pair x,y the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probably density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, first term out of of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
%look into CDF to see which index the rand val corresponds to
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
I hope that this helps!
Chip
I don't believe matlab has built-in functionality for generating multivariate random variables with arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can be easily generated based on the cumulative distribution function, the CDF does not exist for multivariate distributions, so generating such numbers is much more messy (the main problem is the fact that 2 or more variables have correlation). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using matlab:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute things which can be parametrized with x and y (which are naturally the same size as you mesh), then you can get along without having to do any additional thinking.
You could also define a function for the integration:
function res=trapz2(xv,yv,A,arg)
if ~isscalar(arg) && any(size(arg)~=size(A))
error('Size of A and var must be the same!')
end
res=trapz(xv,trapz(yv,A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.
I have a matrix z(x,y)
This is an NxN abitary pdf constructed from a unique Kernel density estimation (i.e. not a usual pdf and it doesn't have a function). It is multivariate and can't be separated and is discrete data.
I wan't to construct a NxN matrix (F(x,y)) that is the cumulative distribution function in 2 dimensions of this pdf so that I can then randomly sample the F(x,y) = P(x < X ,y < Y);
Analytically I think the CDF of a multivariate function is the surface integral of the pdf.
What I have tried is using the cumsum function in order to calculate the surface integral and tested this with a multivariate normal against the analytical solution and there seems to be some discrepancy between the two:
% multivariate parameters
delta = 100;
mu = [1 1];
Sigma = [0.25 .3; .3 1];
x1 = linspace(-2,4,delta); x2 = linspace(-2,4,delta);
[X1,X2] = meshgrid(x1,x2);
% Calculate Normal multivariate pdf
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
% My attempt at a numerical surface integral
FN = cumsum(cumsum(F,1),2);
% Normalise the CDF
FN = FN./max(max(FN));
X = [X1(:) X2(:)];
% Analytic solution to a multivariate normal pdf
p = mvncdf(X,mu,Sigma);
p = reshape(p,delta,delta);
% Highlight the difference
dif = p - FN;
error = max(max(sqrt(dif.^2)));
% %% Plot
figure(1)
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
xlabel('x1'); ylabel('x2'); zlabel('Probability Density');
figure(2)
surf(X1,X2,FN);
xlabel('x1'); ylabel('x2');
figure(3);
surf(X1,X2,p);
xlabel('x1'); ylabel('x2');
figure(5)
surf(X1,X2,dif)
xlabel('x1'); ylabel('x2');
Particularly the error seems to be in the transition region which is the most important.
Does anyone have any better solution to this problem or see what I'm doing wrong??
Any help would be much appreciated!
EDIT: This is the desired outcome of the cumulative integration, The reason this function is of value to me is that when you randomly generate samples from this function on the closed interval [0,1] the higher weighted (i.e. the more likely) values appear more often in this way the samples converge on the expected value(s) (in the case of multiple peaks) this is desired outcome for algorithms such as particle filters, neural networks etc.
Think of the 1-dimensional case first. You have a function represented by a vector F and want to numerically integrate. cumsum(F) will do that, but it uses a poor form of numerical integration. Namely, it treats F as a step function. You could instead do a more accurate numerical integration using the Trapezoidal rule or Simpson's rule.
The 2-dimensional case is no different. Your use of cumsum(cumsum(F,1),2) is again treating F as a step function, and the numerical errors resulting from that assumption only get worse as the number of dimensions of integration increases. There exist 2-dimensional analogues of the Trapezoidal rule and Simpson's rule. Since there's a bit too much math to repeat here, take a look here:
http://onestopgate.com/gate-study-material/mathematics/numerical-analysis/numerical-integration/2d-trapezoidal.asp.
You DO NOT need to compute the 2-dimensional integral of the probability density function in order to sample from the distribution. If you are computing the 2-d integral, you are going about the problem incorrectly.
Here are two ways to approach the sampling problem.
(1) You write that you have a kernel density estimate. A kernel density estimate is a special case of a mixture density. Any mixture density can be sampled by first selecting one kernel (perhaps differently or equally weighted, same procedure applies), and then sampling from that kernel. (That applies in any number of dimensions.) Typically the kernels are some relatively simple distribution such as a Gaussian distribution so that it is easy to sample from it.
(2) Any joint density P(X, Y) is equal to P(X | Y) P(Y) (and equivalently P(Y | X) P(X)). Therefore you can sample from P(Y) (or P(X)) and then from P(X | Y). In order to sample from P(X | Y), you will need to integrate P(X, Y) along a line Y = y (where y is the sampled value of Y), but (this is crucial) you only need to integrate along that line; you don't need to integrate over all values of X and Y.
If you tell us more about your problem, I can help with the details.
I need to create a diagonal matrix containing the Fourier coefficients of the Gaussian wavelet function, but I'm unsure of what to do.
Currently I'm using this function to generate the Haar Wavelet matrix
http://www.mathworks.co.uk/matlabcentral/fileexchange/33625-haar-wavelet-transformation-matrix-implementation/content/ConstructHaarWaveletTransformationMatrix.m
and taking the rows at dyadic scales (2,4,8,16) as the transform:
M= 256
H = ConstructHaarWaveletTransformationMatrix(M);
fi = conj(dftmtx(M))/M;
H = fi*H;
H = H(4,:);
H = diag(H);
etc
How do I repeat this for Gaussian wavelets? Is there a built in Matlab function which will do this for me?
For reference I'm implementing the algorithm in section 4 of this paper:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04218361
I maybe would not being answering the question, but i will try to help you advance.
As far as i know, the Matlab Wavelet Toolbox only deal with wavelet operations and coefficients, increase or decrease resolution levels, and similar operations, but do not exposes the internal matrices serving to doing the transformations from signals and coefficients.
Hence i fear the answer to this question is no. Some time ago, i did this for some of the Hart Class wavelet, and i actually build the matrix from the scratch, and then i compared the coefficients obtained with the Built-in Matlab Wavelet Toolbox, hence ensuring your matrices are good enough for your algorithm. In my case, recursive parameter estimation for time varying models.
For the function ConstructHaarWaveletTransformationMatrix it is really simple to create the matrix, because the Hart Class could be really simple expressed as Kronecker products.
The Gaussian Wavelet case as i fear should be done from the scratch too...
THe steps i suggest would be;
Although MATLAB dont include explicitely the matrices, you can use the Matlab built-in functions to recover the Gaussian Wavelets, and thus compose the matrix for your algorithm.
Build every column of the matrix with every Gaussian Wavelet, for every resolution levels you are requiring (the dyadic scales). Use the Matlab Wavelets toolbox for recover the shapes.
After this, compare the coefficients obtained by you, with the coefficients of the toolbox. This way you will correct the order of the Matrix row.
Numerically, being fj the signal projection over Vj (the PHI signals space, scaling functions) at resolution level j, and gj the signal projection over Wj (the PSI signals space, mother functions) at resolution level j, we can write:
f=fj0+sum_{j0}^{j1-1}{gj}
Hence, both fj0 and gj will induce two matrices, lets call them PHIj and PSIj matrices:
f=PHIj0*cj0+sum_{j0}^{j1-1}{PSIj*dj}
The PHIj columns contain the scaled and shifted scaling wavelet signal (one, for j0 only) for the approximation projection (the Vj0 space), and the PSIj columns contain the scaled and shifted mother wavelet signals (several, from j0 to j1-1) for the detail projection (onto the Wj0 to Wj1-1 spaces).
Hence, the Matrix you need is:
PHI=[PHIj0 PSIj0... PSIj1]
Thus you can express you original signal as:
f=PHI*C
where C is a vector of approximation and detail coefficients, for the levels:
C=[cj0' dj0'...dj1']'
The first part, for addressing the PHI build can be achieved by writing:
function PHI=MakePhi(l,str,Jmin,Jmax)
% [PHI]=MakePhi(l,str,Jmin,Jmax)
%
% Build full PHI Wavelet Matrix for obtaining wavelet coefficients
% (extract)
%FILTER
[LO_R,HI_R] = wfilters(str,'r');
lf=length(LO_R);
%PHI BUILD
PHI=[];
laux=l([end-Jmax end-Jmax:end]);
PHI=[PHI MakeWMatrix('a',str,laux)];
for j=Jmax:-1:Jmin
laux=l([end-j end-j:end]);
PHI=[PHI MakeWMatrix('d',str,laux)];
end
the wfilters is a MATLAB built in function, giving the required signal for the approximation and or detail wavelet signals.
The MakeWMatrix function is:
function M=MakeWMatrix(typestr,str,laux)
% M=MakeWMatrix(typestr,str,laux)
%
% Build Wavelet Matrix for obtaining wavelet coefficients
% for a single level vector.
% (extract)
[LO_R,HI_R] = wfilters(str,'r');
if typestr=='a'
F_R=LO_R';
else
F_R=HI_R';
end
la=length(laux);
lin=laux(2); lout=laux(3);
M=MakeCMatrix(F_R,lin,lout);
for i=3:la-1
lin=laux(i); lout=laux(i+1);
Mi=MakeCMatrix(LO_R',lin,lout);
M=Mi*M;
end
and finally the MakeCMatrix is:
function [M]=MakeCMatrix(F_R,lin,lout)
% Convolucion Matrix
% (extract)
lf=length(F_R);
M=[];
for i=1:lin
M(:,i)=[zeros(2*(i-1),1) ;F_R ;zeros(2*(lin-i),1)];
end
M=[zeros(1,lin); M ;zeros(1,lin)];
[ltot,lin]=size(M);
lmin=floor((ltot-lout)/2)+1;
lmax=floor((ltot-lout)/2)+lout;
M=M(lmin:lmax,:);
This last matrix should include some interpolation routine for having better general results in each case.
I expect this solve part of your problem.....
Hyp