I got the following results after using [h, bins] = hist(H) in MATLAB:
h =
221 20 6 4 1 1 2 0 0 1
bins =
Columns 1 through 7
8.2500 24.7500 41.2500 57.7500 74.2500 90.7500 107.2500
Columns 8 through 10
123.7500 140.2500 156.7500
How do I know the full range of values? I expected the values to go up to 255, that is, [0,255]. But if we work out the ranges of the ten bins from the centers above, we get the following:
0-16.5
16.5-33
33-49.5
49.5-66
66-82.5
82.5-99
99-115.5
115.5-132
132-148.5
148.5-165
So, did I get this range only due to having only 10 bins?
Thanks.
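Yes, 10 bins is the default of hist, and they are spread evenly between min(H) and max(H), so the upper limit of 165 is simply the largest value in your data. If you know you might have values in [0,255], you can force whatever bin positions you want, for example:
[h, bins] = hist(H,0:255)
will create 256 bins, one centered on each integer value in [0,255].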
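As a minimal sketch of the difference, assuming a small made-up vector H (the real data is not shown in the question) whose values run from 0 to 165, the default call spreads its 10 bins between min(H) and max(H), while passing 0:255 as centers covers the full expected range:

```matlab
% H is made-up stand-in data; only its minimum (0) and maximum (165) matter here.
H = [0 3 3 10 42 165];

% Default: 10 equally wide bins between min(H) and max(H).
[h10, c10] = hist(H);

% Explicit centers: one bin per integer value 0..255.
[h256, c256] = hist(H, 0:255);

disp([min(H) max(H)])   % the range actually present in the data
disp(c10([1 end]))      % first and last default bin centers
disp(numel(h256))       % number of bins when centers are given explicitly
```

With this stand-in data the default centers start at 8.25 and end at 156.75, the same as in the question, which suggests the original H also spans 0 to 165.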
I know how to generate random numbers in a certain range in MATLAB. What I am trying to do now is generate random numbers in a range where some values are more likely than others.
For example: how could I use MATLAB to generate random numbers between 0 and 2, where 50% of them will be less than 0.5?
To get numbers between 0 and 2 I would use (2-0)*rand+0. How can I do this but have a certain percentage of the generated numbers be less than 0.5? Is there a way to do this using the rand function?
Here is a suggestion:
N = 10; % how many random numbers to generate
bounds = [0 0.5 1 2]; % define the ranges
prob = cumsum([0.5 0.3 0.2]); % define the probabilities
% pick a random range with probability from 'prob':
s = size(bounds,2)-cumsum(bsxfun(@lt,rand(N,1),prob),2);
% pick a random number in this range:
b = rand(1,N).*(bounds(s(:,end)+1)-bounds(s(:,end)))+bounds(s(:,end))
Here the k-th entry of [0.5 0.3 0.2] is the probability of drawing a number between bounds(k) and bounds(k+1); prob holds the cumulative sums of these probabilities. Basically, we first draw a range with the defined probability, and then draw a number uniformly from that range. So we are interested only in b, but need s along the way (mainly for creating many numbers in a vectorized manner).
so we get:
b =
Columns 1 through 5
0.5297 0.15791 0.88636 0.34822 0.062666
Columns 6 through 10
0.065076 0.54618 0.0039101 0.21155 0.82779
Or, for N = 100000, we can plot the resulting values (figure not shown) and see how they are distributed between the 3 ranges in bounds.
You can use a multinomial distribution to draw the ranges, and then compute the random numbers. Here's how:
N = 10;
bounds = [0 0.5 1 2]; % define the ranges
d = diff(bounds);
% pick N random ranges from a multinomial distribution:
s = mnrnd(N,[0.5 0.3 0.2]);
% pick a random number in this range:
b = rand(1,N).*repelem(d,s)+repelem(bounds(1:end-1),s)
so you get s (the counts shown here sum to 100, i.e. they come from a run with N = 100):
s =
50 39 11
that says you take 50 values from the first range, 39 from the second, and so on...
And you got the result in b:
b =
Columns 1 through 5
0.28212 0.074551 0.18166 0.035787 0.33316
Columns 6 through 10
0.12404 0.93468 1.9808 1.4522 1.6955
So basically it works the same as the first method I posted here, but it may be more accurate and/or readable. Also, I didn't test which method is faster.
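A third option, sketched here as an assumption rather than something taken from the answers above, is to map uniform random numbers through the piecewise-linear inverse CDF with interp1. It needs no toolbox; p and cumProb are illustrative names.

```matlab
N       = 100000;              % how many random numbers to generate
bounds  = [0 0.5 1 2];         % range limits, as in the answers above
p       = [0.5 0.3 0.2];       % probability of each range
cumProb = [0 cumsum(p)];       % cumulative probability at each bound

% A uniform number u in (0,1) is mapped linearly from
% [cumProb(k), cumProb(k+1)] onto [bounds(k), bounds(k+1)],
% so range k is hit with probability p(k).
b = interp1(cumProb, bounds, rand(1, N));

% Empirical share of values in each range; should be close to p.
share = [mean(b < 0.5), mean(b >= 0.5 & b < 1), mean(b >= 1)]
```

Because the mapping is linear inside each range, the values are uniform within a range, just as in the two methods above.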
I have a set of data that I wish to approximate via random sampling in a non-parametric manner, e.g.:
eventl=
4
5
6
8
10
11
12
24
32
In order to accomplish this, I initially bin the data up to a certain value:
binsize = 5;
nbins = 20;
[bincounts,ind] = histc(eventl,1:binsize:binsize*nbins);
Then populate a matrix with all possible numbers covered by the bins which the approximation can choose:
sizes = transpose(1:binsize*nbins);
I then use the bin counts as weights for selection. For example, the bin count for (1-5) is 2, so the weight for choosing 1, 2, 3, 4 or 5 is 2, whilst the count for (16-20) is 0, so 16, 17, 18, 19 or 20 can never be chosen. To do this, I simply take the bin counts and replicate them across the bin size:
w = repelem(bincounts,binsize);
To then perform weighted number selection, I use:
[~,R] = histc(rand(1,1),cumsum([0;w(:)./sum(w)]));
R = sizes(R);
For some reason this approach is unable to approximate the data. It was my understanding that, with sufficient sampling depth, the binned version of R would be identical to the binned version of eventl. However, there is significant variation, and data is often found in bins whose weights were 0.
Could anybody suggest a better method to do this or point out the error?
For a better method, I suggest randsample:
values = [1 2 3 4 5 6 7 8]; %# values from which you want to pick
numberOfElements = 1000; %# how many values you want to pick
weights = [2 2 2 2 2 1 1 1]; %# weights given to the values (1-5 are twice as likely as 6-8)
sample = randsample(values, numberOfElements, true, weights);
Note that even with 1000 samples, the distribution does not exactly correspond to the weights, so if you only pick 20 samples, the histogram may look rather different.
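If randsample is not available (it ships with the Statistics Toolbox), the same weighted draw can be sketched with cumsum and a comparison. This is only an illustration using the data from the question. It assumes histcounts with edges 1:binsize:(binsize*nbins+1) in place of histc, and cumW, idx and n are made-up names.

```matlab
eventl  = [4 5 6 8 10 11 12 24 32];        % data from the question
binsize = 5;
nbins   = 20;

edges  = 1:binsize:(binsize*nbins + 1);    % 21 edges -> 20 bins of width 5
counts = histcounts(eventl, edges);        % observations per bin
w      = repelem(counts, binsize);         % one weight per candidate value
values = 1:binsize*nbins;                  % candidate values 1..100

cumW = cumsum(w) / sum(w);                 % cumulative weights, last entry is 1
n    = 1000;                               % how many values to draw
r    = rand(n, 1);

% For each draw, count the cumulative weights strictly below it.
% A value with zero weight adds nothing to cumW, so it is never selected.
idx    = sum(bsxfun(@gt, r, cumW), 2) + 1;
sample = values(idx);

% Re-bin the sample to compare its shape with the original counts.
sampleCounts = histcounts(sample, edges);
```

Comparing sampleCounts/n with counts/numel(eventl) shows how closely the draw follows the binned data; bins with a count of 0 stay empty.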
I have a question regarding histc:
I chose the max and min of a sorted signal as my range.
ma = ssigPE(end);
mi = ssigPE(1);
range = mi:ma;
[bincountsO,indO2] = histc(ssigPE, range);
so the range I get back is:
range = [-1.097184703736132 -0.097184703736132 0.902815296263868]
My problem is that only 2 bins are created, so bincountsO has 2 bins
and indO2 has the values 0, 1 and 2.
What am I doing wrong? I guess I'm using the range incorrectly. I read the text here:
http://de.mathworks.com/help/matlab/ref/histc.html#inputarg_binranges
but I don't get it.
The bin ranges tell you where the bins start and stop. So a value of [0 1 2 7], for example, gives 3 bins: [0 1], [1 2], [2 7] (histc additionally returns a last count for values exactly equal to the final edge).
In MATLAB, mi:ma creates an array from mi to ma with a step of 1. With your values, that gives just 3 values, hence 2 bins. There are 2 ways of creating a vector with a smaller step size.
With 100 steps, as an example:
range=mi:(ma-mi)/100:ma;
Alternatively, and much clearer (linspace takes the number of points, so 101 points give the same 100 steps):
range=linspace(mi,ma,101)
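The effect of the step size on the number of bins can be checked with a short sketch. The limits and the signal below are stand-ins (the real ssigPE is not shown), and histcounts is used here as an assumption in place of histc, since it returns exactly one count per bin.

```matlab
mi = -1.5;  ma = 0.5;               % stand-in limits, same width of 2 as in the question

coarse = mi:ma;                     % default step of 1 -> only 3 edges, 2 bins
fine   = linspace(mi, ma, 101);     % 101 edges -> 100 bins of equal width

x      = linspace(mi, ma, 1000);    % stand-in signal spread over the range
counts = histcounts(x, fine);       % one count per bin

fprintf('%d coarse edges, %d fine edges, %d bins\n', ...
        numel(coarse), numel(fine), numel(counts));
```

Note that linspace takes the number of points, so N points always give N-1 bins.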
I am trying to make a density plot for data containing two columns with different ranges. The RMSD column ranges over [0-2] and the Angle column over [0-200].
My data in the file is like this:
0.0225370 37.088
0.1049553 35.309
0.0710002 33.993
0.0866880 34.708
0.0912664 33.011
0.0932054 33.191
0.1083590 37.276
0.1104145 34.882
0.1027977 34.341
0.0896688 35.991
0.1047578 36.457
0.1215936 38.914
0.1105484 35.051
0.0974138 35.533
0.1390955 33.601
0.1333878 32.133
0.0933365 35.714
0.1200465 33.038
0.1155794 33.694
0.1125247 34.522
0.1181806 37.890
0.1291700 38.871
I want both the x and y axes to be binned at 1/10th of their range.
Both axes should start at 0, at the same origin.
I want to print the number of elements in each cell of the grid, like this, and make a density plot based on those counts:
          0     0.1   0.2   (RMSD)
 0        0     1     3
 20       2     0     4
 40       1     0     5
 60       0     0     2
(Angle)
I can find ways to do 1-D binning, but then I am stumped about how to make a density plot from those values, and I haven't even dared to attempt 2-D binning + plotting.
Thanks for the help.
I think you want hist3. Assuming you want to specify bin edges (not bin centers), use
result = hist3(data, 'Edges', {[0 .1 .2], [0 20 40 60]}).';
where data denotes your data.
From the linked documentation:
hist3(X,'Edges',edges), where edges is a two-element cell array of numeric vectors with monotonically non-decreasing values, uses a 2-D grid of bins with edges at edges{1} in the first dimension and at edges{2} in the second. The (i,j)th bin includes the value X(k,:) if
edges{1}(i) <= X(k,1) < edges{1}(i+1)
edges{2}(j) <= X(k,2) < edges{2}(j+1)
With your example data this gives
result =
0 0 0
8 14 0
0 0 0
0 0 0
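For those who don't have the Statistics and Machine Learning Toolbox needed to run the bivariate histogram (hist3), an alternative for the 2-D histogram problem may be more practical. The following function produces the same counts using histcounts; it returns one row and one column per bin, without the extra final row and column that hist3 adds for values equal to the last edge: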
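function N = hist3_alt(x,y,edgesX,edgesY)
% Count points on a 2-D grid; rows of N follow edgesY, columns follow edgesX.
N = zeros(length(edgesY)-1,length(edgesX)-1);
[~,~,binX] = histcounts(x,edgesX);   % bin index of each x value (0 if outside edgesX)
for ii=1:numel(edgesX)-1
    % histogram of the y values whose x value falls in bin ii
    N(:,ii) = (histcounts(y(binX==ii),edgesY))';
end
end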
It's simple and efficient. Then you could run the function like this:
N = hist3_alt(x,y,[0:0.1:2],[0:20:200])
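For the plotting part of the question, here is a hedged sketch that bins and draws in one go. It assumes histcounts2 from base MATLAB (not mentioned in the answers above) and uses only the first six rows of the question's data; rmsd, ang, cx and cy are illustrative names.

```matlab
% First six rows of the question's data (RMSD, Angle); a subset to keep this short.
data = [0.0225370 37.088
        0.1049553 35.309
        0.0710002 33.993
        0.0866880 34.708
        0.0912664 33.011
        0.0932054 33.191];
rmsd = data(:,1);
ang  = data(:,2);

edgesX = 0:0.1:2;      % RMSD edges, 1/10th of the range per step as in the call above
edgesY = 0:20:200;     % Angle edges

% histcounts2 puts the x bins along rows; transpose so that rows follow
% Angle and columns follow RMSD, as in the requested table.
N = histcounts2(rmsd, ang, edgesX, edgesY).';

% Density plot: place each cell at its bin center.
cx = edgesX(1:end-1) + diff(edgesX)/2;
cy = edgesY(1:end-1) + diff(edgesY)/2;
imagesc(cx, cy, N);
axis xy;               % put Angle = 0 at the bottom
colorbar;
xlabel('RMSD'); ylabel('Angle');
```

The matrix N can be printed directly for the table of counts, and the same imagesc call works on the output of hist3_alt.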
I'm trying to check the full range of values on the x-axis of a histogram. I expected the full range to be [0, 255], but when I computed the histogram with [h, bins] = hist(H), I got the following:
h =
221 20 6 4 1 1 2 0 0 1
bins =
Columns 1 through 7
8.2500 24.7500 41.2500 57.7500 74.2500 90.7500 107.2500
Columns 8 through 10
123.7500 140.2500 156.7500
This implies that the range I got here only goes up to 165.
In the histogram plot, 165 also seems to be the largest value shown. How do I find the maximum value (range) of the x-axis?
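I think you need one of these 3 options:
max(H) % the largest value in H
Or
numel(unique(H)) % the number of distinct values in H
or
numel(H) % the total number of elements in H
I would start at the top and work down until you find the one you need.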
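The range can also be read back from the bin centers that hist returned. The sketch below uses the centers printed in the question; width, lo and hi are illustrative names. Since hist places its default bins between min(H) and max(H), the outer edges correspond to those two values.

```matlab
% Bin centers as printed in the question.
bins = [8.25 24.75 41.25 57.75 74.25 90.75 107.25 123.75 140.25 156.75];

width = bins(2) - bins(1);      % all default bins have the same width
lo    = bins(1)   - width/2;    % left edge of the first bin
hi    = bins(end) + width/2;    % right edge of the last bin

fprintf('bin width %.2f, range [%.2f, %.2f]\n', width, lo, hi);
% prints: bin width 16.50, range [0.00, 165.00]
```

So the data in H runs from 0 to 165, which matches what max(H) would report.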