Shannon entropy for a vector - matlab

I'd like to calculate the Shanon entropy of a vector (psi) over the time period. According to this reference,
I can calculate the entropy for every single element of psi using a loop that computes the entropy at every point. What I wan't to understand is how to set up the probability of psi(tk) lying in a certain bin. and how to set up the total number of bins.
I tried using Matlab's histogram command that will generate the suitable bins (" [N,edges] = histcounts(psi)") but I don't know how to proceed from there. How do I get the probability of each element being in the xth bin?
here is my current code:
% get the number of bins
[N,edges] = histcounts(psi)
%// Compute probability
h = hist(psi);
pdf = h / length(psi);
%// Set any entries that are 0 to 1 so that log calculation equals 0.
pdf(pdf == 0) = 1;
e=[];
%// Calculate entropy
for i=1:length(N)
e(i) = -sum(pdf(i).*log2(pdf(i)));
end
any ideas?

Related

How to use the randn function in Matlab to create an array of values (range 0-10) of size 1,000 that follows a Gaussian distribution? [duplicate]

Matlab has the function randn to draw from a normal distribution e.g.
x = 0.5 + 0.1*randn()
draws a pseudorandom number from a normal distribution of mean 0.5 and standard deviation 0.1.
Given this, is the following Matlab code equivalent to sampling from a normal distribution truncated at 0 at 1?
while x <=0 || x > 1
x = 0.5 + 0.1*randn();
end
Using MATLAB's Probability Distribution Objects makes sampling from truncated distributions very easy.
You can use the makedist() and truncate() functions to define the object and then modify (truncate it) to prepare the object for the random() function which allows generating random variates from it.
% MATLAB R2017a
pd = makedist('Normal',0.5,0.1) % Normal(mu,sigma)
pdt = truncate(pd,0,1) % truncated to interval (0,1)
sample = random(pdt,numRows,numCols) % Sample from distribution `pdt`
Once the object is created (here it is pdt, the truncated version of pd), you can use it in a variety of function calls.
To generate samples, random(pdt,m,n) produces a m x n array of samples from pdt.
Further, if you want to avoid use of toolboxes, this answer from #Luis Mendo is correct (proof below).
figure, hold on
h = histogram(cr,'Normalization','pdf','DisplayName','#Luis Mendo samples');
X = 0:.01:1;
p = plot(X,pdf(pdt,X),'b-','DisplayName','Theoretical (w/ truncation)');
You need the following steps
1. Draw a random value from uniform distribution, u.
2. Assuming the normal distribution is truncated at a and b. get
u_bar = F(a)*u +F(b) *(1-u)
3. Use the inverse of F
epsilon= F^{-1}(u_bar)
epsilon is a random value for the truncated normal distribution.
Why don't you vectorize? It will probably be faster:
N = 1e5; % desired number of samples
m = .5; % desired mean of underlying Gaussian
s = .1; % desired std of underlying Gaussian
lower = 0; % lower value for truncation
upper = 1; % upper value for truncation
remaining = 1:N;
while remaining
result(remaining) = m + s*randn(1,numel(remaining)); % (pre)allocates the first time
remaining = find(result<=lower | result>upper);
end

Matlab: generate random numbers from custom made probability density function

I have a dataset with 3-hourly precipitation amounts for the month of January in the period 1977-1983 (see attachment). However, I want to generate precipitation data for the period 1984-1990 based upon these data. Therefore, I was wondering if it would be possible to create a custom made probability density function of the precipitation amounts (1977-1983) and from this, generate random numbers (precipitation data) for the desired period (1984-1990).
Is this possible in Matlab and could someone help me by doing so?
Thanks in advance!
A histogram will give you an estimate of the PDF -- just divide the bin counts by the total number of samples. From there you can estimate the CDF by integrating. Finally, you can choose a uniformly distributed random number between 0 and 1 and estimate the argument of the CDF that would yield that number. That is, if y is the random number you choose, then you want to find x such that CDF(x) = y. The value of x will be a random number with the desired PDF.
If you have 'Statistics and Machine Learning Toolbox', you can evaluate the PDF of the data with 'Kernel Distribution' method:
Percip_pd = fitdist(Percip,'Kernel');
Then use it for generating N random numbers from the same distribution:
y = random(Percip_pd,N,1);
Quoting #AnonSubmitter85:
"estimate the CDF by integrating. Finally, you can choose a uniformly
distributed random number between 0 and 1 and estimate the argument of
the CDF that would yield that number. That is, if y is the random
number you choose, then you want to find x such that CDF(x) = y. The
value of x will be a random number with the desired PDF."
%random sampling
N=10; %number of resamples
pdf = normrnd(0, 1, 1,100); %your pdf
s = cumsum(pdf); %its cumulative distribution
r = rand(N,1); %random numbers between 0 and 1
for ii=1:N
inds = find(s>r(ii));
indeces(ii)=inds(1); %find first value greater than the random number
end
resamples = pdf(indeces) %the resamples

Normal distribution with the minimum values in Matlab

I tried to generate 1000 the random values in normal distribution by the normrnd function.
A = normrnd(4,1,[1000 1]);
I would like to set the minimum value is 2. However, that function just can define the mean and sd. How can I set the minimum value is 2 ?
You can't. Gaussian or normally distributed numbers are in a bell curve, with the tails tailing off to infinity. What you can do is "censor" them by eliminating every number beyond a cut-off.
Since you choose mean = 4 and sigma = 1, you will end up ~95% elements of A fall within range [2,6]. The number of elements whose values smaller than 2 is about 2.5%. If you consider this figure is small, you can wrap these elements to a minimum value. For example:
A = normrnd(4,1,[1000 1]);
A(A < 2) = A(A<2) + 2 - min(A(A<2))
Of course, it is technically not gaussian distribution. However if you have total control of mean and sigma, you can get a "more gaussian like" distribution by adding an offset to A:
A = A + 2 - min(A)
Note: This assumes you can have an arbitrarily set standard deviation, which may not be the case
As others have said, you cannot specify a lower bound for a true Gaussian. However, you can generate a Gaussian and estimate 1-p percent of values to be above and then ignore p percent of values (which will fall outside your cutoff).
For example, in the following code, I am generating a Gaussian where 95% of data-points fall above 2. Then I am removing all points below 2, knowing that 5% of data will be removed.
This is a solution because setting as p gets closer to zero, your chances of getting uncensored sample data that follows your Gaussian curve and is entirely above your cutoff goes to 100% (Really it's defined by the p/n ratio, but if n is fixed this is true).
n = 1000; % number of samples
cutoff = 2; % Cutoff point for min-value
mu = 4; % Mean
p = .05; % Percentile you would like to cutoff
z = -sqrt(2) * erfcinv(p*2); % compute z score
sigma = (cutoff - mu)/z; % compute standard deviation
A = normrnd(mu,sigma,[n 1]);
I would recommend removing values below the cutoff rather than re-attributing them to the lower bound of your distribution, but that is up to you.
A(A<cutoff) = []; % removes all values of A less than cutoff
If you want to be symmetrical (which you should to prevent sample skew) the following should work.
A(A>(2*mu-cutoff)) = [];

Matlab: trying to estimate multifractal spectrum from time series by histogram box-counting

I am using the approach from this Yale page on fractals:
http://classes.yale.edu/fractals/MultiFractals/Moments/TSMoments/TSMoments.html
which is also expounded on this set of lecture slides (slide 32):
http://multiscale.emsl.pnl.gov/docs/multifractal.pdf
The idea is that you get a dataset, and examine it through many histograms with increasing numbers of bars i.e. resolution. Once resolution is high enough, some bars take the value zero. At each of these stages, we take the number of results that fall into each histogram bin (neglecting any zero-valued bins), divide it by the total size of the dataset, and raise this to a power, q. This operation gives the 'partition function' for a given moment and bin size. Quoting the above linked tutorial: "Provides a selective characterization of the nonhomogeneity of the
measure, positive q’s accentuating the densest regions and negative q’s the smoothest regions."
So I'm using the histogram function in Matlab, looping over bin sizes, summing over all the non-zero bin contents, and so forth. But my output array of partition functions is just a bunch of 1s. I can't see what's going wrong, can anybody else?
Data for intel, cisco, apple and others is available on the same Yale website: yale.edu/fractals/MultiFractals/Finf(a)/Finf(a).html
N.B. intel refers to the intel stock price I was originally using as the dataset.
lower = 1; %set lowest level of histogram resolution (bin size)
upper = 300; %set highest level of histogram resolution (bin size)
qlow = -20; %set lowest moment exponent
qhigh = 20; %set highet moment exponent
qstep = 0.25; %set step size between moment exponents
qn= ((qhigh-qlow)/qstep) + 1; %calculates number of steps given qlow, qhigh, qstep
qvalues= linspace(qlow, qhigh, qn); %creates a vector of q values given above parameters
m = min(intel); %find the maximum of the dataset
M = max(intel); %find the minimum of the dataset
for Q = 1:length(qvalues) %loop over moment exponents q
for k = lower:upper %loop over bin sizes
counts = hist(intel, k); %unpack all k histogram height values into 'counts'
counts(counts==0) = []; %delete all zero values in ''counts
Zq = counts ./ length(intel);
Zq = Zq .^ qvalues(Q);
Zq = sum(Zq);
partitions(k-(lower-1), Q) = Zq; %store Zq in the kth row and the Qth column of 'partitions'
end
end
Your code seems to be generally bug-free but I made some changes since you perform needless repetitions over loops (I moved the outer loop inside and "vectorized" it since all moment calculations can be performed simultaneously for a given histogram. Also, it is building the histogram that takes longest).
intel = cumsum(randn(64,1)); % <-- mock random walk
Ni =length(intel);
%figure, plot(intel)
lower = 1; %set lowest level of histogram resolution (bin size)
upper = 300; %set highest level of histogram resolution (bin size)
qlow = -20; %set lowest moment exponent
qhigh = 20; %set highet moment exponent
qstep = 0.25; %set step size between moment exponents
qn= ((qhigh-qlow)/qstep) + 1; %calculates number of steps given qlow, qhigh, qstep
qvalues= linspace(qlow, qhigh, qn); %creates a vector of q values given above parameters
m = min(intel); %find the maximum of the dataset
M = max(intel); %find the minimum of the dataset
partitions = zeros(upper-lower+1,length(qvalues));
for k = lower:upper %loop over bin sizes
% (1) Select a bin size r and partition [m,M] into intervals of size r:
% [m, m+r), [m+r, m+2r), ..., [m+kr, M], where m+kr < M <= m+(k+1)r.
% Call these bins B0, ..., Bk.
edges = linspace(m,M,k+1);
edges(end)=Inf;
% (2) For each j, 0 <= j <= k, count the number of xi that lie in bin Bj. Call this number nj. Ignore all nj that equal 0 after all the xi have been counted..
counts = histc(intel, edges); %unpack all k histogram height values into 'counts'
counts(counts==0) = []; %delete all zero values in ''counts
% (3) Now compute the qth moment, Mrq = (n0/N)q + ... + (nk/N)q, where the sum is over all nonzero ni.
% Zq = counts/Ni;
partitions(k, :) = sum( (counts/Ni) .^ qvalues); %store Zq in the kth row and the Qth column of 'partitions'
end
figure, hold on
loglog(1./[1:k]', partitions(:,1),'g.-')
loglog(1./[1:k]', partitions(:,80),'b.-')
loglog(1./[1:k]', partitions(:,160),'r.-')
% (4) Perform linear regressions here to get alpha(r) ....

Coefficient Thresholding by keeping largest coefficients

I have been doing some experimentations using some transformation e.g DCT on image data in Matlab.
example in DCT using 512x512 px lena image:
x = double(imread('lenna.bmp'));
R = dct2(x);
Then, i want to threshold the transform coefficients by keeping 100000 largest coefficients of R and set the remaining to zeros.
How can i do that?
Use prctile to find the value that is exceeded or equalled by exactly 100000 entries of R. Then use that value as a threshold, that is, set all lower values to zero:
threshold = prctile(R(:),(1-1e5/numel(R))*100); %// compute threshold
R(R<threshold) = 0; %// set values below the threshold to zero