How to get the cumulative distribution function of a vector in MATLAB using cumsum?

I want to get the probability of observing a value X at least as large as x_i, i.e. a cumulative distribution function (CDF) of the form P(X>=x_i).
I've tried to do it in Matlab with this code.
Let's assume the data is in the column vector p1.
xp1 = linspace(min(p1), max(p1)); %range of bins
histp1 = histc(p1(:), xp1); %histogram of data
probp1 = histp1/sum(histp1); %PDF (probability distribution function)
figure;plot(probp1, 'o')
Now I want to calculate the CDF,
sorncount = flipud(histp1);
cumsump1 = cumsum(sorncount);
normcumsump1 = cumsump1/max(cumsump1);
cdf = flipud(normcumsump1);
figure;plot(xp1, cdf, 'ok');
I'm wondering whether anyone can help me to know if I'm ok or am I doing something wrong?

Your code works correctly, but is a bit more complicated than it could be. Since probp1 has been normalized to have sum equal to 1, the maximum of its cumulative sum is guaranteed to be 1, so there is no need to divide by this maximum. This shortens the code a bit:
xp1 = linspace(min(p1), max(p1)); %range of bins
histp1 = histc(p1(:), xp1); %count for each bin
probp1 = histp1/sum(histp1); %PDF (probability distribution function)
cdf = flipud(cumsum(flipud(probp1))); %CDF (unconventional, of P(X>=a) kind)
As Raab70 noted, most of the time CDF is understood as P(X<=a), in which case you don't need flipud: taking cumsum(probp1) is all that's needed.
Also, I would probably use probp1(end:-1:1) instead of flipud(probp1), so that the vector is flipped no matter if it's a row or column.
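A small worked example may make the two conventions concrete. The data vector and bin edges below are made up for illustration; the approach itself is just histc, cumsum and reverse indexing as discussed above.

```matlab
% Hypothetical sample data (assumption: any numeric vector would do)
p1 = [1; 2; 2; 3; 3; 3; 4; 4; 4; 4];

edges  = (1:4).';                     % one bin per distinct value
counts = histc(p1(:), edges);         % counts per bin: [1; 2; 3; 4]
counts = counts(:);                   % force a column regardless of orientation
prob   = counts / sum(counts);        % per-bin probabilities, sums to 1

cdfLE = cumsum(prob);                 % P(X <= x_i): [0.1; 0.3; 0.6; 1.0]
cdfGE = cumsum(prob(end:-1:1));       % accumulate from the largest bin down
cdfGE = cdfGE(end:-1:1);              % P(X >= x_i): [1.0; 0.9; 0.7; 0.4]
```

Because the probabilities already sum to 1, neither curve needs a further division by its maximum.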

Related

Verify Law of Large Numbers in MATLAB

The problem:
If a large number of fair N-sided dice are rolled, the average of the simulated rolls is likely to be close to the mean of 1,2,...N i.e. the expected value of one die. For example, the expected value of a 6-sided die is 3.5.
Given N, simulate 1e8 N-sided dice rolls by creating a vector of 1e8 uniformly distributed random integers. Return the difference between the mean of this vector and the mean of integers from 1 to N.
My code:
function dice_diff = loln(N)
% the mean of integer from 1 to N
A = 1:N
meanN = sum(A)/N;
% I do not have any idea what I am doing here!
V = randi(1e8);
meanvector = V/1e8;
dice_diff = meanvector - meanN;
end
First of all, make sure every time you ask a question that it is as clear as possible, to make it easier for other users to read.
If you check how randi works, you can see this:
R = randi(IMAX,N) returns an N-by-N matrix containing pseudorandom
integer values drawn from the discrete uniform distribution on 1:IMAX.
randi(IMAX,M,N) or randi(IMAX,[M,N]) returns an M-by-N matrix.
randi(IMAX,M,N,P,...) or randi(IMAX,[M,N,P,...]) returns an
M-by-N-by-P-by-... array. randi(IMAX) returns a scalar.
randi(IMAX,SIZE(A)) returns an array the same size as A.
So, if you want to use randi in your problem, you have to use it like this:
V=randi(N, 1e8,1);
and you need some more changes:
function dice_diff = loln(N)
%the mean of integer from 1 to N
A = 1:N;
meanN = mean(A);
V = randi(N, 1e8,1);
meanvector = mean(V);
dice_diff = meanvector - meanN;
end
For future problems, try using the command
help randi
and MATLAB will explain how the function randi (or any other function) works.
Make sure to check that the code above gives the desired result.
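As a quick sanity check of the points above, here is a minimal sketch. It assumes a smaller sample (1e6 instead of 1e8) to keep the run short, and uses the closed form (N+1)/2 for the mean of 1:N; the variable names are illustrative only.

```matlab
N = 6;                         % number of sides (example value)
numRolls = 1e6;                % assumption: fewer rolls than 1e8, for speed

% A single size argument gives a square matrix; two give rows-by-columns.
szSquare = size(randi(N, 3));            % 3-by-3
szColumn = size(randi(N, numRolls, 1));  % numRolls-by-1

V = randi(N, numRolls, 1);     % simulated rolls, integers in 1..N
expected = (N + 1) / 2;        % mean of 1:N in closed form
dice_diff = mean(V) - expected; % should be close to zero
```

With 1e6 rolls of a 6-sided die the difference is typically of the order of 1e-3; more rolls shrink it further.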
As pointed out, take a closer look at the use of randi(). From the general case
X = randi([LowerInt,UpperInt],NumRows,NumColumns); % UpperInt > LowerInt
you can adapt to dice rolling by
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
as an example. Exchanging NumRolls and NumSamplePaths yields a matrix with the same shape as Rolls.' (that is, transpose(Rolls)), with one sample path per row instead of per column.
According to the Law of Large Numbers, the updated sample average after each roll should converge to the true mean, ExpVal (short for expected value), as the number of rolls (trials) increases. Notice that as NumRolls gets larger, the sample mean converges to the true mean. The image below shows this for two sample paths.
To get the sample mean for each number of dice rolls, I used arrayfun() with
CumulativeAvg1 = arrayfun(@(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
which is equivalent to using the cumulative sum, cumsum(), to get the same result.
CumulativeAvg1 = (cumsum(Rolls(:,1))./(1:NumRolls).'); % equivalent
% MATLAB R2019a
% Create Dice
NumSides = 6; % positive nonzero integer
NumRolls = 200;
NumSamplePaths = 2;
% Roll Dice
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
% Output Statistics
ExpVal = mean(1:NumSides);
CumulativeAvg1 = arrayfun(@(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
CumulativeAvgError1 = CumulativeAvg1 - ExpVal;
CumulativeAvg2 = arrayfun(@(jj)mean(Rolls(1:jj,2)),[1:NumRolls]);
CumulativeAvgError2 = CumulativeAvg2 - ExpVal;
% Plot
figure
subplot(2,1,1), hold on, box on
plot(1:NumRolls,CumulativeAvg1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvg2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(ExpVal,'k-')
title('Average')
xlabel('Number of Trials')
ylim([1 NumSides])
subplot(2,1,2), hold on, box on
plot(1:NumRolls,CumulativeAvgError1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvgError2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(0,'k-')
title('Error')
xlabel('Number of Trials')
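The cumsum form also extends to all sample paths at once. The sketch below uses a small fixed matrix of rolls (made up so the result can be checked by hand) and bsxfun so that it does not depend on implicit expansion.

```matlab
Rolls = [1 6; 3 2; 5 4];               % fixed rolls: 3 trials, 2 sample paths
NumRolls = size(Rolls, 1);

% Running mean of every column: cumulative sum divided by the trial index.
CumulativeAvg = bsxfun(@rdivide, cumsum(Rolls, 1), (1:NumRolls).');

% Same result for the first path via the arrayfun form.
CheckPath1 = arrayfun(@(jj) mean(Rolls(1:jj, 1)), 1:NumRolls).';
```

Here CumulativeAvg is [1 6; 2 4; 3 4], and its first column matches CheckPath1.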

Monte Carlo simulation for approximating delta in Matlab

I need to code the Monte Carlo algorithm for approximating delta to Matlab and calculate confidence intervals:
but for some reason my code doesn't work, any ideas why?
randn('state', 100)
%Problem and method parameters
S=10; E=9; sigma=0.1; r=0.06; T=1;
Dt=1e-3; N=T/Dt; M=2^17;h=10^(-4);
delta = zeros(M,1);
for i = 1:M
Sfinal = S*exp((r-0.5*sigma^2)*T+sigma*sqrt(T).*randn(M,1));
S_h = (S+h)*exp((r-0.5*sigma^2)*T+sigma*sqrt(T).*randn(M,1));
delta(i) = exp(-r*T).*(max(Sfinal-E,0)-max(S_h-E,0))/h;
end
aM=mean(delta);
bM=std(delta);
conf=[aM-1.96*bM/sqrt(M),aM+1.96*bM/sqrt(M)]
The error message is
"Unable to perform assignment because the left and right sides have a different number of elements."
Any help is appreciated!
You do not need to explicitly write the for loop since you have already vectorized it. In other words, Sfinal and S_h are vectors of length M, and their i-th entries correspond to S_i and S^h_i in the image. Since the right hand side of the delta expression evaluates to a vector of length M, which contains all the values of delta, you should assign that vector directly to delta, not delta(i).
One more thing: the pseudo-code in the image seems to suggest that the same random number should be used for calculating both S_i and S^h_i. This is not the case in your code, since you call randn separately for calculating Sfinal and S_h. I think you should generate the random samples once, save them, and then use it for both calculations.
Here's the code:
randn('state', 100)
%Problem and method parameters
S=10; E=9; sigma=0.1; r=0.06; T=1;
Dt=1e-3; N=T/Dt; M=2^17;h=10^(-4);
xi = randn(M,1);
Sfinal = S*exp((r-0.5*sigma^2)*T+sigma*sqrt(T).*xi);
S_h = (S+h)*exp((r-0.5*sigma^2)*T+sigma*sqrt(T).*xi);
delta = exp(-r*T).*(max(S_h-E,0)-max(Sfinal-E,0))/h; %forward difference (V(S+h)-V(S))/h; the operands were swapped in the question
aM=mean(delta);
bM=std(delta);
conf=[aM-1.96*bM/sqrt(M),aM+1.96*bM/sqrt(M)]
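A reference value can help judge whether a Monte Carlo estimate is plausible. The sketch below assumes the payoff is a European call without dividends (which max(S-E,0) suggests) and compares the closed-form Black-Scholes delta N(d1) with a forward-difference estimate that reuses the same random draws for both paths. The names deltaBS, deltaMC, growth, payoff0 and payoffH are illustrative, not from the original code.

```matlab
% Parameters taken from the question
S = 10; E = 9; sigma = 0.1; r = 0.06; T = 1;
M = 2^17; h = 1e-4;

% Closed-form call delta N(d1), written with erfc so no toolbox is needed
% (assumption: European call, no dividends).
d1 = (log(S/E) + (r + 0.5*sigma^2)*T) / (sigma*sqrt(T));
deltaBS = 0.5 * erfc(-d1 / sqrt(2));

% Forward-difference Monte Carlo estimate with common random numbers.
xi = randn(M, 1);
growth  = exp((r - 0.5*sigma^2)*T + sigma*sqrt(T).*xi);
payoff0 = max(S*growth - E, 0);          % payoff started from S
payoffH = max((S + h)*growth - E, 0);    % payoff started from S + h
deltaMC = mean(exp(-r*T) .* (payoffH - payoff0) / h);
```

For these parameters d1 is about 1.70, so the closed-form delta is roughly 0.956, and a forward-difference estimate should land near it.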

A moving average with different functions and varying time-frames

I have matrix time-series data for 8 variables with about 2500 points (~10 years of Mon-Fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Let's say frames = [100 252 504 756]. I would like to calculate the four functions above over each of the (time-)frames, on a daily basis, so the return for day 300 in the case of the 100-day frame would be [mean variance skewness kurtosis] from the period day 201 to day 300 (100 days in total), and so on.
I know this means I would get an array output, and that the first frame-length days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as @PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(@minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
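One possible route, offered as a sketch rather than a vetted solution: the sum of squared deviations in a window equals sum(x.^2) - (sum(x))^2/W, so two calls to filter (one on X, one on X.^2) give the moving sample variance without a loop. This identity can lose precision when the mean is large relative to the spread, so the snippet compares the result against the loop version.

```matlab
T = 100; N = 5; WindowLength = 10;
X = randn(T, N);

kernel = ones(1, WindowLength);
S1 = filter(kernel, 1, X);        % rolling sum of x
S2 = filter(kernel, 1, X.^2);     % rolling sum of x.^2
VarFilt = (S2 - S1.^2 / WindowLength) / (WindowLength - 1);
VarFilt(1:WindowLength-1, :) = nan;   % incomplete windows

% Reference: the explicit loop from the answer
VarLoop = nan(T, N);
for t = WindowLength:T
    VarLoop(t, :) = var(X(t-WindowLength+1:t, :));
end

valid = WindowLength:T;
maxDiff = max(max(abs(VarFilt(valid, :) - VarLoop(valid, :))));
```

For data like this maxDiff is at the level of floating-point rounding error.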
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
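The last two points can be sketched together as follows. The data, the frames vector and the function handle fun are placeholders chosen so the output is easy to verify by hand; swapping fun for @var, @skewness or @kurtosis (the latter two need the Statistics Toolbox) would follow the same pattern.

```matlab
X = (1:6).' * [1 10];            % small deterministic 6-by-2 example
frames = [2 3];                  % hypothetical window lengths
fun = @(block) mean(block, 1);   % maps a W-by-N block to a 1-by-N row

out = cell(size(frames));
for f = 1:numel(frames)
    W = frames(f);
    R = nan(size(X));            % first W-1 rows stay NaN
    for t = W:size(X, 1)
        R(t, :) = fun(X(t-W+1:t, :));
    end
    out{f} = R;                  % same row indexing as X
end
```

Here out{1}(2,:) is [1.5 15] and out{2}(6,:) is [5 50], while rows before each window length remain NaN.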
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window+1,1);
st_dev = zeros(length(returnsHF)-window+1,1);
skew = zeros(length(returnsHF)-window+1,1);
kurt = zeros(length(returnsHF)-window+1,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assigning the values to the zero-matrices
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window +1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')

Why is the vector coming out of the 'trapz' function NaN?

I am trying to calculate the inverse Fourier transform of the vector xrecw. For some reason I get a vector of NaNs.
Please help!
t = -2:1/100:2;
x = ((2/5)*sin(5*pi*t))./((1/25)-t.^2);
w = -20*pi:0.01*pi:20*pi;
Hw = (exp(j*pi.*(w./(10*pi)))./(sinc(w./(10*pi)))).*(heaviside(w+5*pi)-heaviside(w-5*pi));%low pass filter
xzohw = 0;
for q=1:20:400
xzohw = xzohw + x(q).*(2./w).*sin(0.1.*w).*exp(-j.*w*0.2*((q-1)/20)+0.5);%calculating fourier transform of xzoh
end
xzohw = abs(xzohw);
xrecw = abs(xzohw.*Hw);%filtering the fourier transform high frequencies
xrect=0;
for q=1:401
xrect(q) = (1/(2*pi)).*trapz(xrecw.*exp(j*w*t(q))); %inverse fourier transform
end
xrect = abs(xrect);
plot(t,xrect)
Here's a direct answer to your question of "why" there is a NaN. If you run your code, the NaN comes from dividing by zero in line 7, where xzohw is computed: at w = 0 the factor 2./w is Inf, sin(0.1.*w) is 0, and Inf*0 evaluates to NaN. Notice that w contains zero:
>> find(w==0)
ans =
2001
and you can see in line 7 that you divide by the elements of w with the (2./w) factor.
A quick fix (although it is not a guarantee that your code will do what you want) is to use a step size for w that never lands on zero. Since pi is irrational, a grid that starts at -20*pi and advances in increments of 0.01 cannot hit 0 exactly, so you can try:
w = -20*pi:0.01:20*pi;
Using this, your code produces a plot which might resemble what you're looking for. In order to do better, we might need more details on exactly what you're trying to do, or what these variables represent.
Hope this helps!
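The mechanism can be reproduced on a tiny grid. The sketch below is illustrative only: it uses a made-up w that contains 0 exactly, shows the NaN appearing, and patches that sample with the limit of (2/w)*sin(0.1*w) as w goes to 0, which is 0.2. It also passes w to trapz, since trapz(Y) on its own assumes unit spacing between samples.

```matlab
w = -1:0.5:1;                          % small grid that contains w = 0 exactly
raw = (2 ./ w) .* sin(0.1 .* w);       % Inf * 0 at w == 0 gives NaN
hasNaN = any(isnan(raw));              % true

fixed = raw;
fixed(w == 0) = 0.2;                   % limit of (2/w)*sin(0.1*w) as w -> 0
area = trapz(w, fixed);                % finite, and scaled by the spacing of w
```

A single NaN sample is enough to turn the whole trapz result into NaN, which is why every element of the output vector was affected.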

Not sure how the hist function in MATLAB works

I am not very sure how the hist function in MATLAB works. I seem to have a few problems with it.
Basically, in the code below, I am trying to run the rotation-invariant Uniform Local Binary Pattern (LBP) code. I have no problem with the LBP code; the problem is with the hist function (indicated in the code below).
The problem is that the range I should get is 0:9, but when I apply the histogram function I get values greater than 9, such as 35, 27 and even values such as 178114. I am not sure how to correct it.
I2 = imread('test.png');
RIUniformHist=[];
m=size(I2,1);
n=size(I2,2);
for i=1:10:m
for j=1:10:n
for k=i+1:i+8
for l=j+1:j+8
J0=I2(k,l);
I3(k-1,l-1)=I2(k-1,l-1)>J0;
I3(k-1,l)=I2(k-1,l)>J0;
I3(k-1,l+1)=I2(k-1,l+1)>J0;
I3(k,l+1)=I2(k,l+1)>J0;
I3(k+1,l+1)=I2(k+1,l+1)>J0;
I3(k+1,l)=I2(k+1,l)>J0;
I3(k+1,l-1)=I2(k+1,l-1)>J0;
I3(k,l-1)=I2(k,l-1)>J0;
LBP=I3(k-1,l-1)*2^7+I3(k-1,l)*2^6+I3(k-1,l+1)*2^5+I3(k,l+1)*2^4+I3(k+1,l+1)*2^3+I3(k+1,l)*2^2+I3(k+1,l-1)*2^1+I3(k,l-1)*2^0;
bits = bitand(LBP, 2.^(7:-1:0))>0;
if nnz(diff(bits([1:end, 1]))) <= 2
RIULBP(k,l)=abs(I3(k-1,l-1)-I3(k-1,l))+ abs(I3(k-1,l)-I3(k-1,l+1))+ abs(I3(k-1,l+1)-I3(k,l+1))+ abs(I3(k,l+1)-I3(k+1,l+1))+abs(I3(k+1,l+1)-I3(k+1,l))+abs(I3(k+1,l)-I3(k+1,l-1))+abs(I3(k+1,l-1)-I3(k,l-1));
else
RIULBP(k,l)=9;
end
end
end
RIULBP=uint8(RIULBP);
RIULBPv=reshape(RIULBP,1,size(RIULBP,1)*size(RIULBP,2));
RIUHist=hist(RIULBPv,0:9); % problem
RIUniformHist = [RIUniformHist RIUHist];
end
end
The vector returned by
RIUHist=hist(data, bins)
is the count of how many elements of data are nearest to each point in the bins vector. So if you have a value of 178114, that just means that there were 178114 elements of data that were nearest to the matching entry in bins.
You can use
[RIUHist, binsOut] = hist(data)
to let MATLAB choose the bins (by default it uses 10 equally spaced bins) or
[RIUHist, binsOut] = hist(data, binCount)
to let MATLAB choose the bins, but force a certain number of bins (I often use 100 or 200).
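A tiny example, with made-up data, shows why the output is not limited to the 0:9 range: the entries of the result are counts, one per bin center, not data values.

```matlab
data = [0 0 1 9 9 9];             % made-up values, all within 0..9
centers = 0:9;                    % bin centers, as in the question
counts = hist(data, centers);     % counts per center: [2 1 0 0 0 0 0 0 0 3]
total = sum(counts);              % 6, the number of elements in data
```

With a large image block, a count such as 178114 simply means that many pixels fell into that bin.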