Check whether predicted values follow the Gaussian distribution or not using MATLAB? - matlab

I have used a Gaussian Process for my prediction. Now let us assume I have the predicted values stored in x, of size 1900 x 1. I want to check whether their distribution follows the Gaussian distribution or not. I need this in order to compare the distribution functions of the predicted values from other methods, like NN and KNN, and judge which one follows a smooth Gaussian or normal distribution.
How can I do this? It would be better if I can get the result in the form of numerical data. The code is written as follows:
m = mean(ypred); % mean of ypred
s = std(ypred); % stdev of ypred
pd = makedist('Normal','mu',m,'sigma',s); % make probability distribution with mu = m and sigma = s
[h,p] = kstest(ypred,'CDF',pd); % one-sample Kolmogorov-Smirnov test against that normal distribution
The ypred values are the output obtained from fitrgp in MATLAB. A sample of the ypred values is attached here.
Figure 2 is a residual QQ-plot of the measured and predicted values.

You can perform a one-sample Kolmogorov-Smirnov test:
x = 1 + 2.*randn(1000,1); % just some random normally distributed data; replace it with your actual 1900x1 vector
m = mean(x); % mean of x
s = std(x); % stdev of x
pd = makedist('Normal','mu',m,'sigma',s); % make probability distribution with mu = m and sigma = s
[h,p] = kstest(x,'CDF',pd); % one-sample Kolmogorov-Smirnov test against that normal distribution
Here p is the p-value of the test and h = 1 if the null hypothesis is rejected at a significance level of 0.05. Since the null hypothesis is "it follows a normal distribution", h = 0 means that the data is consistent with a normal distribution.
Since x in this example was sampled from a normal distribution, most likely h = 0 and p > 0.05. If you run the above code with
x = 1 + 2.*rand(1000,1); % sampled from uniform distribution
h will most likely be 1 and p < 0.05. Of course you can write the whole thing as a one-liner to avoid creating m, s and pd.
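For reference, a minimal one-liner sketch (assuming x is your 1900x1 prediction vector):
[h,p] = kstest(x,'CDF',makedist('Normal','mu',mean(x),'sigma',std(x)));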

Related

How to use the randn function in Matlab to create an array of values (range 0-10) of size 1,000 that follows a Gaussian distribution? [duplicate]

MATLAB has the function randn to draw from a normal distribution, e.g.
x = 0.5 + 0.1*randn()
draws a pseudorandom number from a normal distribution of mean 0.5 and standard deviation 0.1.
Given this, is the following MATLAB code equivalent to sampling from a normal distribution truncated at 0 and 1?
x = 0.5 + 0.1*randn(); % initial draw
while x <= 0 || x > 1
    x = 0.5 + 0.1*randn(); % redraw until the value falls in (0,1]
end
Using MATLAB's Probability Distribution Objects makes sampling from truncated distributions very easy.
You can use the makedist() and truncate() functions to define the distribution object and then modify (truncate) it, preparing it for the random() function, which generates random variates from it.
% MATLAB R2017a
pd = makedist('Normal','mu',0.5,'sigma',0.1) % Normal(mu,sigma)
pdt = truncate(pd,0,1) % truncated to the interval (0,1)
sample = random(pdt,numRows,numCols) % numRows x numCols array of samples from pdt
Once the object is created (here it is pdt, the truncated version of pd), you can use it in a variety of function calls.
To generate samples, random(pdt,m,n) produces an m x n array of samples from pdt.
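As a quick illustration, here is a sketch of a few of the other calls the truncated object supports (assuming pdt from above):
sample = random(pdt,1000,1); % 1000 x 1 draws from the truncated normal
p05 = pdf(pdt,0.5);          % density at x = 0.5
c075 = cdf(pdt,0.75);        % cumulative probability at x = 0.75
mTrunc = mean(pdt);          % mean of the truncated distribution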
Further, if you want to avoid the use of toolboxes, this answer from @Luis Mendo is correct (proof below).
figure, hold on
h = histogram(cr,'Normalization','pdf','DisplayName','@Luis Mendo samples'); % cr holds the samples generated with @Luis Mendo's approach below
X = 0:.01:1;
p = plot(X,pdf(pdt,X),'b-','DisplayName','Theoretical (w/ truncation)');
You need the following steps:
1. Draw a random value u from a uniform distribution on (0,1).
2. Assuming the normal distribution (with CDF F) is truncated at a and b, compute
u_bar = F(a)*u + F(b)*(1-u)
3. Apply the inverse of F:
epsilon = F^{-1}(u_bar)
epsilon is then a random value from the truncated normal distribution.
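Here is a minimal sketch of those steps in base MATLAB (no toolboxes), using erf/erfinv for F and its inverse; the mean, standard deviation and truncation bounds below are example values:
mu = 0.5; sigma = 0.1;                                % example mean and standard deviation
a = 0; b = 1;                                         % truncation bounds
F    = @(x) 0.5*(1 + erf((x - mu)./(sigma*sqrt(2)))); % normal CDF
Finv = @(u) mu + sigma*sqrt(2).*erfinv(2*u - 1);      % inverse normal CDF
u = rand(1e5,1);                                      % step 1: uniform draws
u_bar = F(a).*u + F(b).*(1 - u);                      % step 2: map into [F(a), F(b)]
epsilon = Finv(u_bar);                                % step 3: truncated normal samples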
Why don't you vectorize? It will probably be faster:
N = 1e5; % desired number of samples
m = .5; % desired mean of underlying Gaussian
s = .1; % desired std of underlying Gaussian
lower = 0; % lower value for truncation
upper = 1; % upper value for truncation
remaining = 1:N;
while remaining
    result(remaining) = m + s*randn(1,numel(remaining)); % (pre)allocates the first time
    remaining = find(result<=lower | result>upper);
end

Plot normalized uniform mixture

I need to reproduce the normalized density p(x) below, but the code given does not generate a normalized PDF.
clc, clear
% Create three distribution objects with different parameters
pd1 = makedist('Uniform','lower',2,'upper',6);
pd2 = makedist('Uniform','lower',2,'upper',4);
pd3 = makedist('Uniform','lower',5,'upper',6);
% Compute the pdfs
x = -1:.01:9;
pdf1 = pdf(pd1,x);
pdf2 = pdf(pd2,x);
pdf3 = pdf(pd3,x);
% Sum of uniforms
pdf = (pdf1 + pdf2 + pdf3);
% Plot the pdfs
figure;
stairs(x,pdf,'r','LineWidth',2);
If I normalize the mixture PDF by simply scaling it by its total sum, I get a different normalized probability compared with the original figure above.
pdf = pdf/sum(pdf);
Mixture
A mixture of two random variables means: with probability p use Distribution 1, and with probability 1-p use Distribution 2.
Based on your graph, it appears you are mixing the distributions rather than adding (convolving) them. The precise result depends very much on the mixing probabilities. As an example, I've chosen a = 0.25, b = 0.35, and c = 1-a-b.
For a mixture, the probability density function (PDF) is available analytically:
pdfMix = @(x) a.*pdf(pd1,x) + b.*pdf(pd2,x) + c.*pdf(pd3,x);
% MATLAB R2018b
pd1 = makedist('Uniform','lower',2,'upper',6);
pd2 = makedist('Uniform','lower',2,'upper',4);
pd3 = makedist('Uniform','lower',5,'upper',6);
a = 0.25;
b = 0.35;
c = 1 - a - b; % a + b + c = 1
pdfMix = @(x) a.*pdf(pd1,x) + b.*pdf(pd2,x) + c.*pdf(pd3,x);
Xrng = 0:.01:8;
plot(Xrng,pdfMix(Xrng))
xlabel('X')
ylabel('Probability Density Function')
Since the distributions being mixed are uniform you could also use the stairs() command: stairs(Xrng,pdfMix(Xrng)).
We can verify this is a valid PDF by ensuring the total area is 1.
integral(pdfMix,0,9)
ans = 1.0000
Convolution: Adding Random Variables
Adding the random variables together yields a different result. Again, this can be done empirically quite easily, and it is also possible to do it analytically. For example, convolving two Uniform(0,1) distributions yields a Triangular(0,1,2) distribution. The convolution of random variables is just a fancy way of saying we add them up, and there is a way to obtain the resulting PDF using integration if you're interested in analytical results.
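As a quick numerical sanity check of the claim that Uniform(0,1) plus Uniform(0,1) gives Triangular(0,1,2), one rough sketch is to approximate the convolution integral on a grid with conv():
dx = 0.001;
xGrid = 0:dx:1;
u = ones(size(xGrid));   % Uniform(0,1) PDF on its support
pdfSum = conv(u,u)*dx;   % discrete approximation of the convolution integral
xSum = 0:dx:2;           % support of the sum
plot(xSum,pdfSum)        % triangular shape peaking near x = 1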
N = 80000; % Number of samples
X1 = random(pd1,N,1); % Generate samples
X2 = random(pd2,N,1);
X3 = random(pd3,N,1);
X = X1 + X2 + X3; % Convolution
Notice the change of scale for the x-axis (Xrng = 0:.01:16;).
To obtain this, I generated 80k samples from each distribution with random() then added them up to obtain 80k samples of the desired convolution. Notice when I used histogram() I used the 'Normalization', 'pdf' option.
Xrng = 0:.01:16;
figure, hold on, box on
p(1) = plot(Xrng,pdf(pd1,Xrng),'DisplayName','X1 \sim U(2,6)')
p(2) = plot(Xrng,pdf(pd2,Xrng),'DisplayName','X2 \sim U(2,4)')
p(3) = plot(Xrng,pdf(pd3,Xrng),'DisplayName','X3 \sim U(5,6)')
h = histogram(X,'Normalization','pdf','DisplayName','X = X1 + X2 + X3')
% Cosmetics
legend('show','Location','northeast')
for k = 1:3
    p(k).LineWidth = 2.0;
end
title('X = X1 + X2 + X3 (80k samples)')
xlabel('X')
ylabel('Probability Density Function (PDF)')
You can obtain an estimate of the PDF by using fitdist() with a Kernel distribution and then calling the pdf() command on the resulting Kernel distribution object.
pd_kernel = fitdist(X,'Kernel')
figure, hold on, box on
h = histogram(X,'Normalization','pdf','DisplayName','X = X1 + X2 + X3')
pk = plot(Xrng,pdf(pd_kernel,Xrng),'b-') % Notice use of pdf command
legend('Empirical','Kernel Distribution','Location','northwest')
If you do this, you'll notice the resulting kernel is unbounded but you can correct this since you know the bounds using truncate(). You could also use the ksdensity() function, though the probability distribution object approach is probably more user friendly due to all the functions you have direct access to. You should be aware that the kernel is an approximation (you can see that clearly in the kernel plot). In this case, the integration to convolve 3 uniform distributions isn't too bad, so finding the PDF analytically is probably the preferred choice if the PDF is desired. Otherwise, empirical approaches (especially for generation), are probably sufficient though this depends on your application.
pdt_kernel = truncate(pd_kernel,9,16)
Generating samples from mixtures and convolutions is a different issue (but manageable).
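For instance, here is a minimal sketch of sampling from the mixture (as opposed to the convolution), assuming the pd1/pd2/pd3 objects and weights a, b, c defined above:
Nmix = 1e4;
u = rand(Nmix,1);                                     % pick a component for each sample
i1 = u < a; i2 = (u >= a) & (u < a+b); i3 = u >= a+b; % component membership
Xmix = zeros(Nmix,1);
Xmix(i1) = random(pd1,nnz(i1),1);                     % component 1 with probability a
Xmix(i2) = random(pd2,nnz(i2),1);                     % component 2 with probability b
Xmix(i3) = random(pd3,nnz(i3),1);                     % component 3 with probability c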

Generating a random number based off normal distribution in matlab

I am trying to generate a random number based on the traits of a normal distribution that I have (mean and standard deviation). I do NOT have the Statistics and Machine Learning Toolbox.
I know one way to do it would be to generate a random number r from 0 to 1 and find the value whose cumulative probability equals that random number. I can do this by entering the standard normal density function
f = @(y) (1/(1*2.50663))*exp(-((y).^2)/(2*1^2)) % 2.50663 is approximately sqrt(2*pi)
and solving for
r=integral(f,-Inf,z)
and then mapping from that z-value to the final answer X with the equation
z = (X - mu)/sigma
But as far as I know, there is no MATLAB command that allows you to solve for z where z is the limit of an integral. Is there a way to do this, or is there a better way to randomly generate this number?
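For what it's worth, here is a minimal sketch of that integral-inversion idea using fzero to invert the CDF numerically (base MATLAB only; mu and sigma below are just example values):
mu = 2.5; sigma = 2.0;                    % example mean and standard deviation
f = @(y) (1/sqrt(2*pi))*exp(-(y.^2)/2);   % standard normal PDF
r = rand();                               % uniform random number in (0,1)
F = @(z) integral(f,-Inf,z);              % CDF as a function of the upper limit
z = fzero(@(z) F(z) - r, 0);              % solve F(z) = r for z
X = mu + sigma*z;                         % map to the desired mean and standard deviation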
You can use the built-in randn function which yields random numbers pulled from a standard normal distribution with a zero mean and a standard deviation of 1. To alter this distribution, you can multiply the output of randn by your desired standard deviation and then add your desired mean.
% Define the distribution that you'd like to get
mu = 2.5;
sigma = 2.0;
% You can generate any size matrix of values
sz = [10000 1];
value = (randn(sz) * sigma) + mu;
% mean(value)
% 2.4696
%
% std(value)
% 1.9939
If you just want a single number from the distribution, you can use the no-input version of randn to yield a scalar
value = (randn * sigma) + mu;
Just for the fun of it, you can generate a Gaussian random variable using a uniform random generator:
Minus the logarithm of a uniform random variable on (0,1) has an exponential distribution.
The square root of that has a Rayleigh distribution.
Multiply by the cosine (or sine) of an independent uniform random variable on (0,2*pi) and the result is Gaussian; you need to multiply by sqrt(2) to normalize.
The obtained Gaussian variable is normalized (zero mean, unit standard deviation). If you need a specific mean and standard deviation, multiply by the latter and then add the former.
Example (normalized Gaussian):
m = 1; n = 1e5; % desired output size
x = sqrt(-2*log(rand(m,n))).*cos(2*pi*rand(m,n));
Check:
>> mean(x)
ans =
-0.001194631660594
>> std(x)
ans =
0.999770464360453
>> histogram(x,41)

empirical mean and variance plot in matlab with the normal distribution

I'm new to MATLAB programming. I have to write a script that generates a random sequence (x1, ..., xN) of size N following the normal distribution N(0, 1) and calculates the empirical mean mN and variance σN^2.
Then I should plot them.
This is my attempt:
function f = normal_distribution(n)
x =randn(n);
muem = 1./n .* (sum(x));
muem
%mean(muem)
vaem = 1./n .* (sum((x).^2));
vaem
hold on
plot(x,muem,'-')
grid on
plot(x,vaem,'*')
NB: these are the formulas that I have used:
I have obtained a figure and I don't know whether it is correct or not. Thanks for the help.
From your question, it seems what you want to do is calculate the mean and variance from a sample of size N (not an NxN matrix) drawn from a standard normal distribution. So you may want to use randn(n, 1) instead of randn(n). Also, as @ThP pointed out, it does not make sense to plot the mean and variance vs. x. What you could do instead is calculate means and variances for increasing sample sizes n1, n2, ..., nm, and then plot sample size vs. mean or variance, to see them converge to 0 and 1. See the code below:
function [] = plotMnV(nIter)
    means = zeros(nIter, 1);
    vars = zeros(nIter, 1);
    for pow = 1:nIter
        n = 2^pow;                     % sample size doubles each iteration
        x = randn(n, 1);
        means(pow) = 1./n * sum(x);    % empirical mean
        vars(pow) = 1./n * sum(x.^2);  % empirical variance (about the true mean 0)
    end
    plot(1:nIter, means, 'o-');
    hold on;
    plot(1:nIter, vars, '*-');
end
For example, plotMnV(20) gave me the plot below.

Random samples from Lognormal distribution

I have a parameter X that is lognormally distributed with mean 15 and standard deviation 0.48. For Monte Carlo simulation in MATLAB, I want to generate 40,000 samples from this distribution. How can this be done in MATLAB?
To generate an MxN matrix of lognormally distributed random numbers with parameters mu and sigma, use lognrnd (Statistics Toolbox):
result = lognrnd(mu,sigma,M,N);
If you don't have the Statistics Toolbox, you can equivalently use randn and then take the exponential. This exploits the fact that, by definition, the logarithm of a lognormal random variable is a normal random variable:
result = exp(mu+sigma*randn(M,N));
The parameters mu and sigma of the lognormal distribution are the mean and standard deviation of the associated normal distribution. To see how the mean and standard deviation of the lognormal distribution are related to the parameters mu and sigma, see the lognrnd documentation.
To generate random samples, you need the inverse cdf. Once you have it, generating samples is nothing more than my_icdf(rand(n, m)).
First get the cdf (by integrating the pdf) and then invert that function to get the inverse cdf.
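Here is a minimal sketch of that approach for this lognormal in base MATLAB (my_icdf is just an illustrative name; erfinv gives the inverse of the standard normal cdf, and mu/sigma are the parameters of the underlying normal, using the same mean/variance conversion as the answer below):
m = 15; v = 0.48;                                        % target mean and variance
mu = log(m/sqrt(1 + v/m^2));                             % parameters of the underlying normal
sigma = sqrt(log(1 + v/m^2));
my_icdf = @(u) exp(mu + sigma*sqrt(2)*erfinv(2*u - 1));  % lognormal inverse cdf
samples = my_icdf(rand(40000,1));                        % 40,000 lognormal samples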
You can convert between the mean and variance of the Lognormal distribution and its parameters (mu, sigma), which correspond to the associated Normal (Gaussian) distribution, using the standard conversion formulas.
The approach below uses the Probability Distribution Objects introduced in MATLAB 2013a. More specifically, it uses the makedist, random, and pdf functions.
% Notation
% if X~Lognormal(mu,sigma) then E[X] = m & Var(X) = v
m = 15; % Target mean of the Lognormal distribution
v = 0.48; % Target variance of the Lognormal distribution
getLmuh = @(m,v) log(m/sqrt(1+(v/(m^2))));
getLvarh = @(m,v) log(1 + (v/(m^2)));
mu = getLmuh(m,v);
sigma = sqrt(getLvarh(m,v));
% Generate Random Samples
pd = makedist('Lognormal',mu,sigma);
X = random(pd,1000,1); % Generates a 1000 x 1 vector of samples
You can verify the correctness via the mean and var functions and the distribution object:
>> mean(pd)
ans =
15
>> var(pd)
ans =
0.4800
Generating samples via the inverse transform is also made easy using the icdf (inverse CDF) function.
% Alternate way to generate X~Lognormal(mu,sigma)
U = rand(1000,1); % U ~ Uniform(0,1)
X = icdf(pd,U); % Inverse Transform
The following graphic was generated by the code below (MATLAB R2018a).
Xrng = [0:.01:20]';
figure, hold on, box on
h(1) = histogram(X,'DisplayName','Random Sample (N = 1000)');
h(2) = plot(Xrng,pdf(pd,Xrng),'b-','DisplayName','Theoretical PDF');
legend('show','Location','northwest')
title('Lognormal')
xlabel('X')
ylabel('Probability Density Function')
% Options
h(1).Normalization = 'pdf';
h(1).FaceColor = 'k';
h(1).FaceAlpha = 0.35;
h(2).LineWidth = 2;