Not sure how the hist function in MATLAB works - matlab

I am not very sure how the hist function in MATLAB works. I seem to have few problems with it.
Bascially, in the code below, i am trying to run the rotation invariant Uniform Local Binary Pattern(LBP) code. I have no problem with the LBP code but the problem is with hist function(indicated in the code below).
The problem is that the range i should get is from 0:9 but when i apply the histogram function i get values greater than 9 such as 35, 27 and even values such as 178114.Not very sure how to correct it.
I2 = imread('test.png');
RIUniformHist=[];
m=size(I2,1);
n=size(I2,2);
for i=1:10:m
for j=1:10:n
for k=i+1:i+8
for l=j+1:j+8
J0=I2(k,l);
I3(k-1,l-1)=I2(k-1,l-1)>J0;
I3(k-1,l)=I2(k-1,l)>J0;
I3(k-1,l+1)=I2(k-1,l+1)>J0;
I3(k,l+1)=I2(k,l+1)>J0;
I3(k+1,l+1)=I2(k+1,l+1)>J0;
I3(k+1,l)=I2(k+1,l)>J0;
I3(k+1,l-1)=I2(k+1,l-1)>J0;
I3(k,l-1)=I2(k,l-1)>J0;
LBP=I3(k-1,l-1)*2^7+I3(k-1,l)*2^6+I3(k-1,l+1)*2^5+I3(k,l+1)*2^4+I3(k+1,l+1)*2^3+I3(k+1,l)*2^2+I3(k+1,l-1)*2^1+I3(k,l-1)*2^0;
bits = bitand(LBP, 2.^(7:-1:0))>0;
if nnz(diff(bits([1:end, 1]))) <= 2
RIULBP(k,l)=abs(I3(k-1,l-1)-I3(k-1,l))+ abs(I3(k-1,l)-I3(k-1,l+1))+ abs(I3(k-1,l+1)-I3(k,l+1))+ abs(I3(k,l+1)-I3(k+1,l+1))+abs(I3(k+1,l+1)-I3(k+1,l))+abs(I3(k+1,l)-I3(k+1,l-1))+abs(I3(k+1,l-1)-I3(k,l-1));
else
RIULBP(k,l)=9;
end
end
end
RIULBP=uint8(RIULBP);
RIULBPv=reshape(RIULBP,1,size(RIULBP,1)*size(RIULBP,2));
RIUHist=hist(RIULBPv,0:9); % problem
RIUniformHist = [RIUniformHist RIUHist];
end
end

The vector returned by
RIUHist=hist(data, bins)
is the count of how many elements of data are nearest the point identified by the bins vector. So if you have a value of 178114, that juts means that there were 178114 elements of data that were nearest to the matching index in bins.
You can use
[RIUHist, binsOut] = hist(data)
to let Matlab choose the bins (I believe it uses 20 bins) or
[RIUHist, binsOut] = hist(data, binCount)
To let Matlab choose the bins, but force a certain number of bins (I often use 100 or 200).

Related

Verify Law of Large Numbers in MATLAB

The problem:
If a large number of fair N-sided dice are rolled, the average of the simulated rolls is likely to be close to the mean of 1,2,...N i.e. the expected value of one die. For example, the expected value of a 6-sided die is 3.5.
Given N, simulate 1e8 N-sided dice rolls by creating a vector of 1e8 uniformly distributed random integers. Return the difference between the mean of this vector and the mean of integers from 1 to N.
My code:
function dice_diff = loln(N)
% the mean of integer from 1 to N
A = 1:N
meanN = sum(A)/N;
% I do not have any idea what I am doing here!
V = randi(1e8);
meanvector = V/1e8;
dice_diff = meanvector - meanN;
end
First of all, make sure everytime you ask a question that it is as clear as possible, to make it easier for other users to read.
If you check how randi works, you can see this:
R = randi(IMAX,N) returns an N-by-N matrix containing pseudorandom
integer values drawn from the discrete uniform distribution on 1:IMAX.
randi(IMAX,M,N) or randi(IMAX,[M,N]) returns an M-by-N matrix.
randi(IMAX,M,N,P,...) or randi(IMAX,[M,N,P,...]) returns an
M-by-N-by-P-by-... array. randi(IMAX) returns a scalar.
randi(IMAX,SIZE(A)) returns an array the same size as A.
So, if you want to use randi in your problem, you have to use it like this:
V=randi(N, 1e8,1);
and you need some more changes:
function dice_diff = loln(N)
%the mean of integer from 1 to N
A = 1:N;
meanN = mean(A);
V = randi(N, 1e8,1);
meanvector = mean(V);
dice_diff = meanvector - meanN;
end
For future problems, try using the command
help randi
And matlab will explain how the function randi (or other function) works.
Make sure to check if the code above gives the desired result
As pointed out, take a closer look at the use of randi(). From the general case
X = randi([LowerInt,UpperInt],NumRows,NumColumns); % UpperInt > LowerInt
you can adapt to dice rolling by
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
as an example. Exchanging NumRolls and NumSamplePaths will yield Rolls.', or transpose(Rolls).
According to the Law of Large Numbers, the updated sample average after each roll should converge to the true mean, ExpVal (short for expected value), as the number of rolls (trials) increases. Notice that as NumRolls gets larger, the sample mean converges to the true mean. The image below shows this for two sample paths.
To get the sample mean for each number of dice rolls, I used arrayfun() with
CumulativeAvg1 = arrayfun(#(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
which is equivalent to using the cumulative sum, cumsum(), to get the same result.
CumulativeAvg1 = (cumsum(Rolls(:,1))./(1:NumRolls).'); % equivalent
% MATLAB R2019a
% Create Dice
NumSides = 6; % positive nonzero integer
NumRolls = 200;
NumSamplePaths = 2;
% Roll Dice
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
% Output Statistics
ExpVal = mean(1:NumSides);
CumulativeAvg1 = arrayfun(#(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
CumulativeAvgError1 = CumulativeAvg1 - ExpVal;
CumulativeAvg2 = arrayfun(#(jj)mean(Rolls(1:jj,2)),[1:NumRolls]);
CumulativeAvgError2 = CumulativeAvg2 - ExpVal;
% Plot
figure
subplot(2,1,1), hold on, box on
plot(1:NumRolls,CumulativeAvg1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvg2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(ExpVal,'k-')
title('Average')
xlabel('Number of Trials')
ylim([1 NumSides])
subplot(2,1,2), hold on, box on
plot(1:NumRolls,CumulativeAvgError1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvgError2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(0,'k-')
title('Error')
xlabel('Number of Trials')

defining the X values for a code

I have this task to create a script that acts similarly to normcdf on matlab.
x=linspace(-5,5,1000); %values for x
p= 1/sqrt(2*pi) * exp((-x.^2)/2); % THE PDF for the standard normal
t=cumtrapz(x,p); % the CDF for the standard normal distribution
plot(x,t); %shows the graph of the CDF
The problem is when the t values are assigned to 1:1000 instead of -5:5 in increments. I want to know how to assign the correct x values, that is -5:5,1000 to the t values output? such as when I do t(n) I get the same result as normcdf(n).
Just to clarify: the problem is I cannot simply say t(-5) and get result =1 as I would in normcdf(1) because the cumtrapz calculated values are assigned to x=1:1000 instead of -5 to 5.
Updated answer
Ok, having read your comment; here is how to do what you want:
x = linspace(-5,5,1000);
p = 1/sqrt(2*pi) * exp((-x.^2)/2);
cdf = cumtrapz(x,p);
q = 3; % Query point
disp(normcdf(q)) % For reference
[~,I] = min(abs(x-q)); % Find closest index
disp(cdf(I)) % Show the value
Sadly, there is no matlab syntax which will do this nicely in one line, but if you abstract finding the closest index into a different function, you can do this:
cdf(findClosest(x,q))
function I = findClosest(x,q)
if q>max(x) || q<min(x)
warning('q outside the range of x');
end
[~,I] = min(abs(x-q));
end
Also; if you are certain that the exact value of the query point q exists in x, you can just do
cdf(x==q);
But beware of floating point errors though. You may think that a certain range outght to contain a certain value, but little did you know it was different by a tiny roundoff erorr. You can see that in action for example here:
x1 = linspace(0,1,1000); % Range
x2 = asin(sin(x1)); % Ought to be the same thing
plot((x1-x2)/eps); grid on; % But they differ by rougly 1 unit of machine precision
Old answer
As far as I can tell, running your code does reproduce the result of normcdf(x) well... If you want to do exactly what normcdf does them use erfc.
close all; clear; clc;
x = linspace(-5,5,1000);
cdf = normcdf(x); % Result of normcdf for comparison
%% 1 Trapezoidal integration of normal pd
p = 1/sqrt(2*pi) * exp((-x.^2)/2);
cdf1 = cumtrapz(x,p);
%% 2 But error function IS the integral of the normal pd
cdf2 = (1+erf(x/sqrt(2)))/2;
%% 3 Or, even better, use the error function complement (works better for large negative x)
cdf3 = erfc(-x/sqrt(2))/2;
fprintf('1: Mean error = %.2d\n',mean(abs(cdf1-cdf)));
fprintf('2: Mean error = %.2d\n',mean(abs(cdf2-cdf)));
fprintf('3: Mean error = %.2d\n',mean(abs(cdf3-cdf)));
plot(x,cdf1,x,cdf2,x,cdf3,x,cdf,'k--');
This gives me
1: Mean error = 7.83e-07
2: Mean error = 1.41e-17
3: Mean error = 00 <- Because that is literally what normcdf is doing
If your goal is not not to use predefined matlab funcitons, but instead to calculate the result numerically (i.e. calculate the error function) then it's an interesting challange which you can read about for example here or in this stats stackexchange post. Just as an example, the following piece of code calculates the error function by implementing eq. 2 form the first link:
nerf = #(x,n) (-1)^n*2/sqrt(pi)*x.^(2*n+1)./factorial(n)/(2*n+1);
figure(1); hold on;
temp = zeros(size(x)); p =[];
for n = 0:20
temp = temp + nerf(x/sqrt(2),n);
if~mod(n,3)
p(end+1) = plot(x,(1+temp)/2);
end
end
ylim([-1,2]);
title('\Sigma_{n=0}^{inf} ( 2/sqrt(pi) ) \times ( (-1)^n x^{2*n+1} ) \div ( n! (2*n+1) )');
p(end+1) = plot(x,cdf,'k--');
legend(p,'n = 0','\Sigma_{n} 0->3','\Sigma_{n} 0->6','\Sigma_{n} 0->9',...
'\Sigma_{n} 0->12','\Sigma_{n} 0->15','\Sigma_{n} 0->18','normcdf(x)',...
'location','southeast');
grid on; box on;
xlabel('x'); ylabel('norm. cdf approximations');
Marcin's answer suggests a way to find the nearest sample point. It is easier, IMO, to interpolate. Given x and t as defined in the question,
interp1(x,t,n)
returns the estimated value of the CDF at x==n, for whatever value of n. But note that, for values outside the computed range, it will extrapolate and produce unreliable values.
You can define an anonymous function that works like normcdf:
my_normcdf = #(n)interp1(x,t,n);
my_normcdf(-5)
Try replacing x with 0.01 when you call cumtrapz. You can either use a vector or a scalar spacing for cumtrapz (https://www.mathworks.com/help/matlab/ref/cumtrapz.html), and this might solve your problem. Also, have you checked the original x-values? Is the problem with linspace (i.e. you are not getting the correct x vector), or with cumtrapz?

How to get cumulative distribution functions of a vector in Matlab using cumsum?

I want to get the probability to get a value X higher than x_i, which means the cumulative distribution functions CDF. P(X>=x_i).
I've tried to do it in Matlab with this code.
Let's assume the data is in the column vector p1.
xp1 = linspace(min(p1), max(p1)); %range of bins
histp1 = histc(p1(:), xp1); %histogram od data
probp1 = histp1/sum(histp1); %PDF (probability distribution function)
`figure;plot(probp1, 'o') `
Now I want to calculate the CDF,
sorncount = flipud(histp1);
cumsump1 = cumsum(sorncount);
normcumsump1 = cumsump1/max(cumsump1);
cdf = flipud(normcumsump1);
figure;plot(xp1, cdf, 'ok');
I'm wondering whether anyone can help me to know if I'm ok or am I doing something wrong?
Your code works correctly, but is a bit more complicated than it could be. Since probp1 has been normalized to have sum equal to 1, the maximum of its cumulative sum is guaranteed to be 1, so there is no need to divide by this maximum. This shortens the code a bit:
xp1 = linspace(min(p1), max(p1)); %range of bins
histp1 = histc(p1(:), xp1); %count for each bin
probp1 = histp1/sum(histp1); %PDF (probability distribution function)
cdf = flipud(cumsum(flipud(histp1))); %CDF (unconventional, of P(X>=a) kind)
As Raab70 noted, most of the time CDF is understood as P(X<=a), in which case you don't need flipud: taking cumsum(histp1) is all that's needed.
Also, I would probably use histp1(end:-1:1) instead of flipud(histp1), so that the vector is flipped no matter if it's a row or column.

A moving average with different functions and varying time-frames

I have a matrix time-series data for 8 variables with about 2500 points (~10 years of mon-fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Lets say frames = [100 252 504 756] - I would like calculate the four functions above on over each of the (time-)frames, on a daily basis - so the return for day 300 in the case with 100 day-frame, would be [mean variance skewness kurtosis] from the period day201-day300 (100 days in total)... and so on.
I know this means I would get an array output, and the the first frame number of days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as #PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(#minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window,1);
st_dev = zeros(length(returnsHF)-window,1);
skew = zeros(length(returnsHF)-window,1);
kurt = zeros(length(returnsHF)-window,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assinging the values to the zero-matrices
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window +1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')

Using Negative Values in Matlab

In order to find best fit (thru polyfit), i am getting negative p value but matlab is not accepting it (Subscript indices must either be real positive integers or logicals.). Is there any way that I can use it ? I can't think of alternative method. I'll always get negative values.
EDIT:
I am trying to flattening baseline of a curve, for that. I am running for loop to have fit from 1 to 3 order. And then I am using smallest normr s value to to find the best fit and then subtract it from the whole curve to get baseline straight. I tried with few curves it works well but not with all of the data because of above describes issue.
part of the code I am working on:
for i=1:3
[p,s]=polyfit(x,y,i);
a=s.normr;
b(i,1)=p(1);
normr(i,1)=a;
ind=find(b==min(b));
mn=b(ind,1);
Yflat=y-mn(1)*(x-mean(x));
ca{2,2}=Yflat;
clear a b normr p s rte ind ind2 Yflat
end
When I translate an image into negative coordinates,
I usually record an offset e.g.
offset = [ -5, -8.5 ]
and save the intensity values in matrix begin with (1, 1) as usual,
But when comes to calculation, let the coordinates array add up with the offset
e.g. [ actualX, actualY ] = [ x, y ] + offset ;
It may need extra efforts, but it works.
Good Luck!
The code below (your code from the comments + initialization of x, y) executes. What is the problem?
x = 1:50;
y = randn(size(x));
for i=1:3
[p,s]=polyfit(x,y,i);
a=s.normr;
b(i,1)=p(1);
normr(i,1)=a;
ind=find(b==min(b));
mn=b(ind,1);
Yflat=y-mn(1)*(x-mean(x));
ca{2,2}=Yflat;
end