How do I simulate returns from an empirically derived distribution in MATLAB (Or Python)?

How do I simulate returns from an empirically derived distribution in MATLAB (Or Python)? - matlab

I also have to keep in mind the skewness and the kurtosis of the distribution and these have to be reflected in the simulated values.
My empirical values are past stock returns (non-standard normal distribution).
Is there an existing package that will do this for me? All the packages I see online have only the first two moments.

What you're describing is using the method of moments to define a distribution. Such methods have generally fallen out of favor in statistics. However, you can check out pearsonrnd, which may work fine depending on your data.
Instead, I'd suggest directly finding the empirical CDF for the data using ecdf and use that in conjunction with inverse sampling to generate random variates. Here's a basic function that will do that:
function r=empiricalrnd(y,varargin)
%EMPIRICALRND Random values from an empirical distribution
% EMPIRICALRND(Y) returns a single random value from distribution described by the data
% in the vector Y.
%
% EMPIRICALRND(Y,M), EMPIRICALRND(Y,M,N), EMPIRICALRND(Y,[M,N]), etc. return random arrays.
[f,x] = ecdf(y);
r = rand(varargin{:});
r = interp1(f,x,r,'linear','extrap');
You can play with the options for interp1 if you like. And here's a quick test:
% Generate demo data for 100 samples of log-normal distribution
mu = 1;
sig = 1;
m = 1e2;
rng(1); % Set seed to make repeatable
y = lognrnd(mu,sig,m,1);
% Generate 1000 random variates from data
n = 1e3;
r = empiricalrnd(y,n,1);
% Plot CDFs
ecdf(y);
hold on;
ecdf(r);
x = 0:0.1:50;
plot(x,logncdf(x,mu,sig),'g');
legend('Empirical CDF','CDF of sampled data','Actual CDF');

Related

How to use the randn function in Matlab to create an array of values (range 0-10) of size 1,000 that follows a Gaussian distribution? [duplicate]

Matlab has the function randn to draw from a normal distribution e.g.
x = 0.5 + 0.1*randn()
draws a pseudorandom number from a normal distribution of mean 0.5 and standard deviation 0.1.
Given this, is the following Matlab code equivalent to sampling from a normal distribution truncated at 0 at 1?
while x <=0 || x > 1
x = 0.5 + 0.1*randn();
end

Using MATLAB's Probability Distribution Objects makes sampling from truncated distributions very easy.
You can use the makedist() and truncate() functions to define the object and then modify (truncate it) to prepare the object for the random() function which allows generating random variates from it.
% MATLAB R2017a
pd = makedist('Normal',0.5,0.1) % Normal(mu,sigma)
pdt = truncate(pd,0,1) % truncated to interval (0,1)
sample = random(pdt,numRows,numCols) % Sample from distribution `pdt`
Once the object is created (here it is pdt, the truncated version of pd), you can use it in a variety of function calls.
To generate samples, random(pdt,m,n) produces a m x n array of samples from pdt.
Further, if you want to avoid use of toolboxes, this answer from #Luis Mendo is correct (proof below).
figure, hold on
h = histogram(cr,'Normalization','pdf','DisplayName','#Luis Mendo samples');
X = 0:.01:1;
p = plot(X,pdf(pdt,X),'b-','DisplayName','Theoretical (w/ truncation)');

You need the following steps
1. Draw a random value from uniform distribution, u.
2. Assuming the normal distribution is truncated at a and b. get
u_bar = F(a)*u +F(b) *(1-u)
3. Use the inverse of F
epsilon= F^{-1}(u_bar)
epsilon is a random value for the truncated normal distribution.

Why don't you vectorize? It will probably be faster:
N = 1e5; % desired number of samples
m = .5; % desired mean of underlying Gaussian
s = .1; % desired std of underlying Gaussian
lower = 0; % lower value for truncation
upper = 1; % upper value for truncation
remaining = 1:N;
while remaining
result(remaining) = m + s*randn(1,numel(remaining)); % (pre)allocates the first time
remaining = find(result<=lower | result>upper);
end

Matlab fit poisson function to histogram

I am trying to fit a Poisson function to a histogram in Matlab: the example calls for using hist() (which is deprecated) so I want to use histogram() instead, especially as you cannot seem to normalize a hist(). I then want to apply a poisson function to it using poisspdf() or any other standard function (preferably no toolboxes!). The histogram is probability scaled, which is where the issue with the poisson function comes from AFAIK.
clear
clc
lambda = 5;
range = 1000;
rangeVec = 1:range;
randomData = poissrnd(lambda, 1, range);
histoFigure = histogram(randomData, 'Normalization', 'probability');
hold on
poissonFunction = poisspdf(randomData, lambda);
poissonFunction2 = poisspdf(histoFigure, lambda);
plot(poissonFunction)
plot(poissonFunction2)
I have tried multiple different methods of creating the poisson function + plotting and neither of them seems to work: the values within this function are not consistent with the histogram values as they differ by several decimals.
This is what the image should look like
however currently I can only get the bar graphs to show up correctly.

You're not specifing the x-data of you're curve. Then the sample number is used and since you have 1000 samples, you get the ugly plot. The x-data that you use is randomData. Using
plot(randomData, poissonFunction)
will lead to lines between different samples, because the samples follow each other randomly. To take each sample only once, you can use unique. It is important that the x and y values stay connected to each other, so it's best to put randomData and poissonFunction in 1 matrix, and then use unique:
d = [randomData;poissonFunction].'; % make 1000x2 matrix to find unique among rows
d = unique(d,'rows');
You can use d to plot the data.
Full code:
clear
clc
lambda = 5;
range = 1000;
rangeVec = 1:range;
randomData = poissrnd(lambda, 1, range);
histoFigure = histogram(randomData, 'Normalization', 'probability');
hold on
poissonFunction = poisspdf(randomData, lambda);
d = [randomData; poissonFunction].';
d = unique(d, 'rows');
plot(d(:,1), d(:,2))
With as result:

Check weather predicted values follow the gaussian distribution or not using matlab?

I have used Gaussian Process for my prediction. Now let us assume I have predicted value store in x of size 1900 X 1. Now I want to check whether its distribution follow the gaussian distribution or not . I need this in order to compare the distribution functions of other methods predicted values like NN,KNN in order to judge which one is following smooth gaussian or normal distribution functions
How I can Do this ? Better if I can get some result in the form of numerical data. the code is written as follows,
m = mean(ypred); % mean of r
s = std(ypred); % stdev of r
pd = makedist('Normal','mu',m,'sigma',s); % make probability distribution with mu = m and sigma = s
[h,p] = kstest(ypred,'CDF',pd); % calculate probability that it is a normal distribution
The ypred value is the output obtain from fitrgp of matlab. Sample of ypred values are attached here
The [figure]2 is a residual qq_plot of measured and predicted values.

You can make a One-sample Kolmogorov-Smirnov test:
x = 1 + 2.*randn(1000,1); % just some random normal distributed data, replace it with your actual 1900x1 vector.
m = mean(x); % mean of r
s = std(x); % stdev of r
pd = makedist('Normal','mu',m,'sigma',s); % make probability distribution with mu = m and sigma = s
[h,p] = kstest(x,'CDF',pd); % calculate probability that it is a normal distribution
Where p is the probability that it follows a normal distribution and h = 1 if the null-hypothesis is rejected with a significance of 0.05. Since the null-hypothesis is "it follows a normal distribution", h = 0means that it is normal distributed.
Since x was in this example was sampled from a normal distribution, most likely h = 0 and p > 0.05. If you run above code with
x = 1 + 2.*rand(1000,1); % sampled from uniform distribution
h will most likely be 1 and p<0.05. Of course you can write the whole thing as a one-liner to avoid creating m,s and pd.

Random samples from Lognormal distribution

I have a parameter X that is lognormally distributed with mean 15 and standard deviation 0.48. For monte carlo simulation in MATLAB, I want to generate 40,000 samples from this distribution. How could be done in MATLAB?

To generate an MxN matrix of lognornally distributed random numbers with parameter mu and sigma, use lognrnd (Statistics Toolbox):
result = lognrnd(mu,sigma,M,N);
If you don't have the Statistics Toolbox, you can equivalently use randn and then take the exponential. This exploits the fact that, by definition, the logarithm of a lognormal random variable is a normal random variable:
result = exp(mu+sigma*randn(M,N));
The parameters mu and sigma of the lognormal distribution are the mean and standard deviation of the associated normal distribution. To see how the mean and standard deviarion of the lognormal distribution are related to parameters mu, sigma, see lognrnd documentation.

To generate random samples, you need the inverted cdf. If you have done this, generating samples is nothing more than 'my_icdf(rand(n, m))'
First get the cdf (integrating the pdf) and then invert the function to get the inverted cdf.

You can convert between the mean and variance of the Lognormal distribution and its parameters (mu,sigma) which correspond to the associated Normal (Gaussian) distribution using the formulas.
The approach below uses the Probability Distribution Objects introduced in MATLAB 2013a. More specifically, it uses the makedist, random, and pdf functions.
% Notation
% if X~Lognormal(mu,sigma) the E[X] = m & Var(X) = v
m = 15; % Target mean for Lognormal distribution
v = 0.48; % Target variance Lognormal distribution
getLmuh=#(m,v) log(m/sqrt(1+(v/(m^2))));
getLvarh=#(m,v) log(1 + (v/(m^2)));
mu = getLmuh(m,v);
sigma = sqrt(getLvarh(m,v));
% Generate Random Samples
pd = makedist('Lognormal',mu,sigma);
X = random(pd,1000,1); % Generates a 1000 x 1 vector of samples
You can verify the correctness via the mean and var functions and the distribution object:
>> mean(pd)
ans =
15
>> var(pd)
ans =
0.4800
Generating samples via the inverse transform is also made easy using the icdf (inverse CDF) function.
% Alternate way to generate X~Lognormal(mu,sigma)
U = rand(1000,1); % U ~ Uniform(0,1)
X = icdf(pd,U); % Inverse Transform
The graphic generated by following code (MATLAB 2018a).
Xrng = [0:.01:20]';
figure, hold on, box on
h(1) = histogram(X,'DisplayName','Random Sample (N = 1000)');
h(2) = plot(Xrng,pdf(pd,Xrng),'b-','DisplayName','Theoretical PDF');
legend('show','Location','northwest')
title('Lognormal')
xlabel('X')
ylabel('Probability Density Function')
% Options
h(1).Normalization = 'pdf';
h(1).FaceColor = 'k';
h(1).FaceAlpha = 0.35;
h(2).LineWidth = 2;

simulating a random walk in matlab

i have variable x that undergoes a random walk according to the following rules:
x(t+1)=x(t)-1; probability p=0.3
x(t+1)=x(t)-2; probability q=0.2
x(t+1)=x(t)+1; probability p=0.5
a) i have to create this variable initialized at zero and write a for loop for 100 steps and that runs 10000 times storing each final value in xfinal
b) i have to plot a probability distribution of xfinal (a histogram) choosing a bin size and normalization!!* i have to report the mean and variance of xfinal
c) i have to recreate the distribution by application of the central limit theorem and plot the probability distribution on the same plot!
help would be appreciated in telling me how to choose the bin size and normalize the histogram and how to attempt part c)
your help is much appreciated!!
p=0.3;
q=0.2;
s=0.5;
numberOfSteps = 100;
maxCount = 10000;
for count=1:maxCount
x=0;
for i = 1:numberOfSteps
random = rand(1, 1);
if random <=p
x=x-1;
elseif random<=(p+q)
x=x-2;
else
x=x+1;
end
end
xfinal(count) = x;
end
[f,x]=hist(xfinal,30);
figure(1)
bar(x,f/sum(f));
xlabel('xfinal')
ylabel('frequency')
mean = mean(xfinal)
variance = var(xfinal)

For the first question, check the help for hist on mathworks homepage
[nelements,centers] = hist(data,nbins);
You do not select the bin size, but the number of bins. nelements gives the elements per bin and center is all the bin centers. So to say, it would be the same to call
hist(data,nbins);
as
[nelements,centers] = hist(data,nbins);
plot(centers,nelements);
except that the representation is different (line or pile). To normalize, simply divide nelements with sum(nelements)
For c, here i.i.d. variables it actually is a difference if the variables are real or complex. However for real variables the central limit theorem in short tells you that for a large number of samples the distribution will limit the normal distribution. So if the samples are real, you simply asssumes a normal distribution, calculates the mean and variance and plots this as a normal distribution. If the variables are complex, then each of the variables will be normally distributed which means that you will have a rayleigh distribution instead.

Mathworks is deprecating hist that is being replaced with histogram.
more details in this link
You are not applying the PDF function as expected, the expression Y doesn't work
For instance Y does not have the right X-axis start stop points. And you are using x as input to Y while x already used as pivot inside the double for loop.
When I ran your code Y generates a single value, it is not a vector but just a scalar.
This
bar(x,f/sum(f));
bringing down all input values with sum(f) division? no need.
On attempting to overlap the ideal probability density function, often one has to do additional scaling, to have both real and ideal visually overlapped.
MATLAB can do the scaling for us, and no need to modify input data /sum(f).
With a dual plot using yyaxis
You also mixed variance and standard deviation.
Instead try something like this
y2=1 / sqrt(2*pi*var1)*exp(-(x2-m1).^2 / (2*var1))
ok, the following solves your question(s)
codehere
clear all;
close all;
clc
p=0.3; % thresholds
q=0.2;
s=0.5;
n_step=100;
max_cnt=10000;
n_bin=30; % histogram amount bins
xf=zeros(1,max_cnt);
for cnt=1:max_cnt % runs loop
x=0;
for i = 1:n_step % steps loop
t_rand1 = rand(1, 1);
if t_rand1 <=p
x=x-1;
elseif t_rand1<=(p+q)
x=x-2;
else
x=x+1;
end
end
xf(cnt) = x;
end
% [f,x]=hist(xf,n_bin);
hf1=figure(1)
ax1=gca
yyaxis left
hp1=histogram(xf,n_bin);
% bar(x,f/sum(f));
grid on
xlabel('xf')
ylabel('frequency')
m1 = mean(xf)
var1 = var(xf)
s1=var1^.5 % sigma
%applying central limit theorem %finding the mean
n_x2=1e3 % just enough points
min_x2=min(hp1.BinEdges)
max_x2=max(hp1.BinEdges)
% quite same as
min_x2=hp1.BinLimits(1)
max_x2=hp1.BinLimits(2)
x2=linspace(min_x2,max_x2,n_x2)
y2=1/sqrt(2*pi*var1)*exp(-(x2-m1).^2/(2*var1));
% hold(ax1,'on')
yyaxis right
plot(ax1,x2,y2,'r','LineWidth',2)
.
.
.
note I have not used these lines
% Xp=-1; Xq=-2; Xs=1; mu=Xp.*p+Xq.*q+Xs.*s;
% muN=n_step.*mu;
%
% sigma=(Xp).^2.*p+(Xq).^2.*q+(Xs).^2.s; % variance
% sigmaN=n_step.(sigma-(mu).^2);
People ususally call sigma to variance^.5
This supplied script is a good start point to now take it to wherever you need it to go.