Integration of normal probability distribution function with random numbers - matlab

function Y=normpdf(X)
syms X
Y = normpdf(X);
int(Y,X,1,inf)
end
I need to integrate the normal PDF from 1 to infinity for the case of N = 100, where N is the total number of random numbers generated. I know I need to use randn() to generate the random numbers, but I don't know how to use it in this situation.

You could draw N = 100 random numbers with t = randn(N, 1);. First, sort them with t = sort(t); the integrated PDF, i.e. the cumulative distribution function (CDF), is then approximated from your samples by p = (1 : N) / N at the points t, as you can see with plot(t, p). It will overlap well with hold on, plot(t, normcdf(t), 'r').
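The integral of the standard normal PDF from 1 to infinity is the tail probability P(X > 1) = 1 - CDF(1), so the sorted-sample idea can be turned into a direct estimate. The following is a minimal sketch, not a definitive implementation: the seed and the variable names tail_est and tail_exact are assumptions made for illustration, and the reference value is written with erfc so that no toolbox function is needed.

```matlab
rng(0);                          % fixed seed, chosen here only for reproducibility
N = 100;                         % number of random samples
t = sort(randn(N, 1));           % sorted standard normal samples
p = (1:N)' / N;                  % empirical CDF evaluated at each sorted sample

% Monte Carlo estimate of the integral of the PDF from 1 to Inf:
% the fraction of samples that exceed 1
tail_est = mean(t > 1);

% Closed-form reference, equal to 1 - normcdf(1), written with erfc
tail_exact = 0.5 * erfc(1 / sqrt(2));

fprintf('estimate = %.3f, exact = %.4f\n', tail_est, tail_exact);
```

With only N = 100 samples the estimate is coarse (its standard error is roughly 0.04), so expect it to scatter around the exact value from run to run if the seed is changed.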

A perhaps more intuitive approach is to partition the x axis into bins in order to estimate the CDF:
N = 100; % number of samples
t = randn(N, 1); % random data
x = linspace(-10,10,200); % define bins
estim_cdf = mean(bsxfun(@le, t, x)); % estimate CDF
plot(x, estim_cdf);
hold on
plot(x, normcdf(x), 'r')
Note that @s.bandara's solution can be interpreted as the limiting case of this as the number of bins tends to infinity, and therefore it probably gives more accurate results.

Related

Vectors must be the same length error in Curve Fitting in Matlab

I'm having problems in curve fitting my randomized data for the function
Here is my code
N = 100;
mu = 5; stdev = 2;
x = mu+stdev*randn(N,1);
bin=mu-6*stdev:0.5:mu+6*stdev;
f=hist(x,bin);
plot(bin,f,'bo'); hold on;
x_ = x(1):0.1:x(end);
y_ = (1./sqrt(8.*pi)).*exp(-((x_-mu).^2)./8);
plot(x_,y_,'b-'); hold on;
It seems like I'm having vector size problems since it is giving me the error
Error using plot
Vectors must be the same length.
Note that I simplified y_ since mu and the standard deviation are known.
Well first of all some adjustments to your question:
You are not trying to do curve fitting. What you are trying to do (in my opinion) is to overlay a probability density function on a histogram obtained by drawing random points from the same distribution (a normal distribution with parameters (mu, sigma)). These two curves should indeed overlap, as they represent the same thing; one is analytical and the other is obtained numerically.
As noted in the hist documentation, hist is not recommended and you should use histogram instead.
First step: Generating your random data
Knowing the distribution is the Normal distribution, we can use MATLAB's random function to do that :
N = 150;
rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,N,1);
Second step: Plot the histogram
Because we don't just want a count of the elements in each bin, but a feel of the probability density function, we can use the 'Normalization' 'pdf' arguments
Nbins = 25;
f=histogram(r,Nbins,'Normalization','pdf');
hold on
Here I'd rather specify a number of bins than specifying the bins themselves, because you never know in advance how far from the mean your data is going to be.
Last step: overlay the probability density function over the histogram
The histogram being already consistent with a probability density function, it is sufficient to just overlay the density function:
x_ = linspace(min(r),max(r),100);
y_ = (1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'b-');
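To see why no extra scaling is needed with the 'pdf' normalization, one can check numerically that the normalized bin heights integrate to 1. This is only a sketch under stated assumptions: it uses histcounts (the non-plotting counterpart of histogram) and draws the data with randn instead of random, and the names dens, area and max_dev are hypothetical.

```matlab
rng('default');                       % for reproducibility
mu = 5; sigma = 2; N = 1500;
r = mu + sigma * randn(N, 1);         % same distribution, drawn with randn

% Bin heights normalized as a probability density, plus the bin edges
[dens, edges] = histcounts(r, 25, 'Normalization', 'pdf');
widths = diff(edges);
area = sum(dens .* widths);           % total area under the histogram

% Analytical density evaluated at the bin centers, for comparison
centers = edges(1:end-1) + widths / 2;
pdf_at_centers = exp(-(centers - mu).^2 / (2*sigma^2)) / sqrt(2*pi*sigma^2);
max_dev = max(abs(dens - pdf_at_centers));   % tends to shrink as N grows

fprintf('area = %.6f, max deviation = %.4f\n', area, max_dev);
```

Because the area is 1, the analytical curve can be drawn on the same axes without rescaling.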
[Figures: histogram with overlaid PDF for N = 150, for N = 1500, and for N = 150,000 with Nbins = 50]
If for some obscure reason you want to use the old hist() function
The old hist() function can't handle normalization, so you'll have to do it by hand, by normalizing your density function to fit your histogram:
N = 1500;
% rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,1,N);
Nbins = 50;
[~,centers]=hist(r,Nbins);
hist(r,Nbins); hold on
% Width of bins
Widths = diff(centers);
x_ = linspace(min(r),max(r),100);
y_ = N*mean(Widths)*(1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'r-');

How to generate data with a specific trend in MATLAB

I want to test the Akaike criterion (a criterion that indicates where a significant change occurs in a time series), but to do that I need to generate data that follow, for example, a sinusoidal trend, a linear trend with positive or negative slope, a constant trend, etc. So far I have done this only with random numbers:
%Implementation of the Akaike method for Earth sciences.
N=100;
data=zeros(N,1);
for i=1:N
data(i,1)=unifrnd(1,N);
end
%AIC=zeros(N-1,1);
data=rand(1,N);
for k=1:N
%y=datasample(data,k);
AIC(k,1)=k*log(var(data(1:k),1))+(N-k-1)*log(var(data(k+1:N),1));
end
AIC(1)=NaN;
%AIC(N-1)=[];AIC(N)=[];
%disp(AIC)
%plot(AIC)
subplot(2,1,1)
plot(data,'Marker','.')
subplot(2,1,2)
plot(AIC,'Marker','.')
So, how can I generate different data sets with different trends in MATLAB?
Thanks a lot in advance.
What you can do is first start off with a known curve, then add some noise or random values so that the signal does follow a trend but it is noisy. Given a set of independent values, use these to generate values for your sinusoidal curves, a line with a positive or negative slope and a constant value.
Something like this comes to mind:
X = 1 : N; % N is defined in your code
Y1 = sin(X) + rand(1, N); % Sinusoidal
slope1 = 2; intercept = 3;
Y2 = slope1*X + intercept + rand(1, N); % Line with a positive slope
slope2 = -1; intercept2 = 0.5;
Y3 = slope2*X + intercept2 + rand(1, N); % Line with a negative slope
B = 2;
Y4 = B*ones(1, N) + rand(1, N); % Constant line
rand is a MATLAB function that generates uniformly distributed floating-point values in the interval (0,1). Y1, Y2, Y3 and Y4 are the trends you want: they follow the curves defined above, but with random values added so that you don't get the exact trend; the noise is there to reduce how similar those signals are to the curves you defined. Increase the magnitude of the random values to decrease the similarity.
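Since rand is always positive, it shifts each curve upward by about 0.5 on average. If that offset is unwanted, zero-mean Gaussian noise from randn with an explicit amplitude is one alternative. The sketch below is illustrative only: noiseLevel, the sinusoid period, the change-point index k0 and the extra series Y5 are all hypothetical choices, not part of the original answer.

```matlab
rng(1);                                  % fixed seed, for reproducibility only
N = 100;
X = 1:N;
noiseLevel = 0.2;                        % hypothetical noise standard deviation
noise = @() noiseLevel * randn(1, N);    % fresh zero-mean Gaussian noise per call

Y1 = sin(2*pi*X/25) + noise();           % sinusoid with a period of 25 samples
Y2 =  2*X + 3   + noise();               % line with a positive slope
Y3 = -1*X + 0.5 + noise();               % line with a negative slope
Y4 =  2*ones(1, N) + noise();            % constant level

% A series whose variance jumps at index k0, i.e. a signal with a
% clear change point for a change-detection criterion to find
k0 = 60;                                 % hypothetical change-point index
Y5 = [0.1*randn(1, k0), 1.0*randn(1, N - k0)];
```

Raising noiseLevel makes the trends harder to recognize; setting it to 0 returns the exact curves.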

I need to do spectral clustering on a two-donut-shaped dataset (MATLAB)

I have tried for hours but I cannot find a solution.
I have a "two donuts" data sample (variable "X").
You can download the file from the link below:
donut dataset (rings.mat)
It is spread out in 2D as in the image below.
The first 250 points belong to the inner donut and the last 750 points to the outer donut.
I need to perform spectral clustering on it.
I built the similarity matrix "W" with a Gaussian similarity measure,
built the degree matrix from the sum of each row of "W",
and then computed the eigenvalues (E) and eigenvectors (V).
But the shape of "V" is not good.
What is wrong with my attempt?
I cannot figure it out.
load rings.mat
[D, N] = size(X); % data stored in X
%initial plot data
figure; hold on;
for i=1:N,
plot(X(1,i), X(2,i),'o');
end
% perform spectral clustering
W = zeros(N,N);
D = zeros(N,N);
sigma = 1;
for i=1:N,
for j=1:N,
xixj2 = (X(1,i)-X(1,j))^2 + (X(2,i)-X(2,j))^2 ;
W(i,j) = exp( -1*xixj2 / (2*sigma^2) ) ; % compute weight here
% if (i==j)
% W(i,j)=0;
% end;
end;
D(i,i) = sum(W(i,:)) ;
end;
L = D - W ;
normL = D^-0.5*L*D^-0.5;
[u,s,v] = svd(normL);
If you use the Laplacian like it is in your code (the "real" laplacian), then to cluster your points into two sets you will want the eigenvector corresponding to second smallest eigenvalue.
The intuitive idea is to connect all of your points to each other with springs, where the springs are stiffer if the points are near each other, and less stiff for points far away. The eigenvectors of the Laplacian are the modes of vibration if you hit your spring network with a hammer and watch it oscillate: smaller eigenvalues correspond to lower frequency "bulk" modes, and larger eigenvalues correspond to higher frequency oscillations. You want the eigenvector corresponding to the second smallest eigenvalue, which will be like the second mode in a drum, with a positive part clustered together and a negative part clustered together.
Now there is some confusion in the comments about whether to use the largest or smallest eigenvalue, and it is because the laplacian in the paper linked there by dave is slightly different, being the identity minus your laplacian. So there they want the largest ones, whereas you want the smallest. The clustering in the paper is also a bit more advanced, and better, but not as easy to implement.
Here is your code, modified to work:
load rings.mat
[D, N] = size(X); % data stored in X
%initial plot data
figure; hold on;
for i=1:N,
plot(X(1,i), X(2,i),'o');
end
% perform spectral clustering
W = zeros(N,N);
D = zeros(N,N);
sigma = 0.3; % <--- Changed to be smaller
for i=1:N,
for j=1:N,
xixj2 = (X(1,i)-X(1,j))^2 + (X(2,i)-X(2,j))^2 ;
W(i,j) = exp( -1*xixj2 / (2*sigma^2) ) ; % compute weight here
% if (i==j)
% W(i,j)=0;
% end;
end;
D(i,i) = sum(W(i,:)) ;
end;
L = D - W ;
normL = D^-0.5*L*D^-0.5;
[u,s,v] = svd(normL);
% New code below this point
cluster1 = find(u(:,end-1) >= 0);
cluster2 = find(u(:,end-1) < 0);
figure
plot(X(1,cluster1),X(2,cluster1),'.b')
hold on
plot(X(1,cluster2),X(2,cluster2),'.r')
hold off
title(sprintf('sigma=%g',sigma)) % %g prints non-integer values such as 0.3 compactly
Here is the result:
Now notice that I changed sigma to be smaller - from 1.0 to 0.3. When I left it at 1.0, I got the following result:
which I assume is because with sigma=1, the points in the inner cluster were able to "pull" on the outer cluster (which they are about distance 1 away from) enough so that it was more energetically favorable to split both circles in half like a solid vibrating drum, rather than have two different circles.
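The same steps can be written without the double loop. The sketch below is a hedged illustration rather than the answer's code: since rings.mat is not available here, it generates a synthetic stand-in (two noisy concentric rings with assumed radii 1 and 2), and it uses eig with an explicit sort instead of svd. Whether the sign split actually separates the rings depends on sigma and on the ring spacing, as discussed above.

```matlab
rng(0);                                         % fixed seed, for reproducibility only
% Synthetic stand-in for rings.mat (hypothetical data): two noisy concentric rings
n1 = 100; n2 = 300;
th  = 2*pi*rand(1, n1 + n2);
rad = [1 + 0.05*randn(1, n1), 2 + 0.05*randn(1, n2)];
X = [rad .* cos(th); rad .* sin(th)];           % 2-by-N, same layout as the question
N = size(X, 2);

sigma = 0.3;
sq = sum(X.^2, 1);                              % squared norm of each point
D2 = max(bsxfun(@plus, sq', sq) - 2*(X'*X), 0); % pairwise squared distances
W  = exp(-D2 / (2*sigma^2));                    % Gaussian similarity matrix
d  = sum(W, 2);                                 % node degrees (row sums of W)
Dinv  = diag(1 ./ sqrt(d));
normL = eye(N) - Dinv * W * Dinv;               % equals D^-0.5 * (D - W) * D^-0.5
normL = (normL + normL') / 2;                   % remove round-off asymmetry

[V, E] = eig(normL);
[evals, order] = sort(diag(E));                 % eigenvalues in ascending order
v2 = V(:, order(2));                            % eigenvector of the second smallest
labels = v2 >= 0;                               % sign split into two clusters
```

Sorting the eigenvalues explicitly avoids relying on the output order of the decomposition, and plotting X(:, labels) against X(:, ~labels) shows the resulting split.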

Defining windows to find multiple slopes

I need to define several windows over an experimental plot in which slopes can be found. For example, x runs from 0 to 400. I want to find the derivative over each interval of 50 in x (i.e. 0 to 50, 50 to 100, and so on), and then average all the derivatives (8 derivatives in this example). Thanks for any help!
Assuming you have a vector y of measurements and want to compute the derivative by taking the difference between entry 1 and 50, 51 and 100, and so on, you could do the following:
% generate a signal
x=1:400;
y = x.^2;
nSamples = length(y);
% define number of segments and window size
N = 8;
Winsize = ceil(nSamples/N);
% preallocate the vector of slopes and compute the slopes
slopes = zeros(1,N);
for ii=1:N
slopes(ii) = (y(min(nSamples,Winsize*ii))-y(1+Winsize*(ii-1)))/(x(min(nSamples,Winsize*ii))-x(1+Winsize*(ii-1)));
end
% take the average slope value
Averageslope = mean(slopes);
However, since you are using MATLAB anyway, you could also just take the average derivative of the whole vector, which should yield a much more accurate average when dealing with noisy data:
% generate a signal
x=1:400;
y = x.^2;
slope = mean(diff(y)./diff(x)); % element-wise ./ gives one slope per sample interval
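If each window's slope should use all the points inside the window rather than only its two end points, a least-squares line fit per window is one option. This is a minimal sketch under stated assumptions: the noisy linear test signal (true slope 3), the seed and the names winEdges and averageSlope are hypothetical and chosen only for illustration.

```matlab
rng(2);                                   % fixed seed, for reproducibility only
x = 0:400;
y = 3*x + 5 + 10*randn(size(x));          % hypothetical noisy data with true slope 3

winEdges = 0:50:400;                      % window boundaries: 0-50, 50-100, ...
nWin = numel(winEdges) - 1;               % 8 windows in this example
slopes = zeros(1, nWin);
for k = 1:nWin
    idx = x >= winEdges(k) & x <= winEdges(k+1);   % points inside window k
    c = polyfit(x(idx), y(idx), 1);                % straight-line fit, c(1) is the slope
    slopes(k) = c(1);
end
averageSlope = mean(slopes);              % average of the 8 window slopes
```

Neighbouring windows share their boundary sample here; use a strict inequality on one side if the windows should not overlap.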

Creating Gaussian random variable with MATLAB

Using the randn function, I want to create a Gaussian random variable X such that X ~ N(2,4) and plot the simulated PDF together with the theoretical curve.
MATLAB's randn generates realisations from a normal distribution with zero mean and a standard deviation of 1.
Samples from any other normal distribution can simply be generated via:
numSamples = 1000;
mu = 2;
sigma = 4; % standard deviation (use sigma = 2 if the 4 in N(2,4) denotes the variance)
samples = mu + sigma.*randn(numSamples, 1);
You can verify this by plotting the histogram:
figure;hist(samples(:));
See the matlab help.
N = 1000;
x = [-20:20];
samples = 2 + 4*randn(N, 1);
ySamples = histc(samples,x) / N;
yTheoretical = pdf('norm', x, 2, 4);
plot(x, yTheoretical, x, ySamples)
randn(N, 1) creates an N-by-1 vector.
histc is histogram count by bins given in x - you can use hist to plot the result immediately, but here we want to divide it by N.
pdf contains many useful PDFs, normal is just one example.
Remember this: X ~ N(mean, variance).
randn in MATLAB produces normally distributed random variables W with zero mean and unit variance.
To change the mean and variance and obtain the random variable X (with custom mean and variance), use this equation:
X = mean + standard_deviation*W
Please be aware that standard_deviation is the square root of the variance.
N = 1000;
x = [-20:20];
samples = 2 + sqrt(4)*randn(N, 1);
ySamples = histc(samples,x) / N;
yTheoretical = pdf('norm', x, 2, sqrt(4)); %put std_deviation not variance
plot(x, yTheoretical, x, ySamples)
A quick and easy way to achieve this using one line of code is to use :
mu = 2;
sigma = 2;
samples = normrnd(mu,sigma,M,N);
This will generate an M-by-N matrix sampled from a normal distribution with mean μ and standard deviation σ (here μ = 2 and σ = 2, i.e. variance 4).
For additional information, see normrnd.
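Putting the pieces together with randn only, the simulated density can be compared with the theoretical curve on the same axes. The following is a sketch, not a definitive implementation: it assumes the 4 in N(2,4) is the variance, uses histcounts with the 'pdf' normalization, and the bin edges, sample count and seed are hypothetical choices.

```matlab
rng(3);                                   % fixed seed, for reproducibility only
mu = 2;
variance = 4;                             % X ~ N(2, 4), second parameter read as variance
sigma = sqrt(variance);                   % randn is scaled by the standard deviation
N = 1e5;
samples = mu + sigma * randn(N, 1);

edges = -8:0.5:12;                        % bin edges covering about mu +/- 5 sigma
dens = histcounts(samples, edges, 'Normalization', 'pdf');
centers = edges(1:end-1) + diff(edges)/2;

% Theoretical normal density evaluated at the bin centers
theory = exp(-(centers - mu).^2 / (2*variance)) / sqrt(2*pi*variance);

plot(centers, dens, 'bo', centers, theory, 'r-');
legend('simulated', 'theoretical');
```

With the density normalization the two curves are directly comparable, and the agreement improves as N increases or as the bins are made wider.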