I have just began using Octave and am attempting to simulate 10,000 outcomes of a Binomial Random Variable defined as:
X ~ Bi(5, 0.2)
I have graphed the result with the following function:
function x = generate_binomial_bernoulli(n,p,m)
% generate Bi(n, p) outcomes m times
x = zeros(1,m); % allocate array for m simulations
for i = 1:m % iterate over m simulations
successes = 0; % count the number of successful trials per simualtion (0-5)
for j = 1:n % iterate through the n trials
u = rand; % generate random nuumber from 0-1
if (u <= p) % if random number is <= p
successes++; % count it as a success
endif
end
x(i) = successes; % store the number of successful trials in this simulation
end
alphabet_x=[0:n]; % create an array from 0 to n
hist(x,alphabet_x); % plot graph
end
I then call the function with generate_binomial_bernoulli(5, 0.2, 10000).
This is simulating 5 Bernoulli Trials, each with a probability of success of 0.2, repeating the 5 trials 10,000 times and graphing the distribution of the number of successes. This plot shows the empirical results of the simulation.
I have now also been asked to plot the theoretical results, which my best guess would be a normally distributed plot around 1 success on the x-axis (0.2 * 5 = 1).
How could I create this plot, and display it on the same histogram?
How could I correclty display my graph with the x-axis spanning just from 0 to 5, both axes labelled, and the two histograms colour-coded with a legend?
EDIT
Here is my current function where I have tried to plot the normalized / theoretical curve:
function x = generate_binomial_bernoulli(n,p,m)
% generate Bi(n, p) outcomes m times
emperical = zeros(1,m); % allocate array for m simulations
for i = 1:m % iterate over m simulations
successes = 0; % count the number of successful trials per simualtion (0-5)
for j = 1:n % iterate through the n trials
u = rand; % generate random nuumber from 0-1
if (u <= p) % if random number is <= p
successes++; % count it as a success
endif
end
emperical(i) = successes; % store the number of successful trials in this simulation
end
close all; % close any existing graphs
x_values = [0:n]; % array of x-axis values
hist(emperical, x_values, "facecolor", "r"); % plot empirical data
xlim([-0.5 (n + 0.5)]); % set x-axis to allow for histogram bar widths
hold on; % hold current graph
mean = n * p; % theoretical mean
norm = normpdf(x_values, mean, 1); % normalised y values
plot(x_values, norm, "color", "b"); % plot theoretical distribution
legend('Emprical', 'Theoretical');
end
As shown below, this curve only extends to a very low height along the y-axis, but I do not know how to span it across the entire dataset.
Get histogram number counts and bins:
[counts,centers] = hist(x);
Normalize:
freq = counts/sum(counts);
Draw normalized histogram:
bar(centers,freq)
Related
I found matlab file (https://in.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum) which generates probability matrix with uniform probability and each column has sum 1. file is as follow
function [x,v] = randfixedsum(n,m,s,a,b)
%[x,v] = randfixedsum(n,m,s,a,b)
%
% This generates an n by m array x, each of whose m columns
% contains n random values lying in the interval [a,b], but
% subject to the condition that their sum be equal to s. The
% scalar value s must accordingly satisfy n*a <= s <= n*b. The
% distribution of values is uniform in the sense that it has the
% conditional probability distribution of a uniform distribution
% over the whole n-cube, given that the sum of the x's is s.
%
% The scalar v, if requested, returns with the total
% n-1 dimensional volume (content) of the subset satisfying
% this condition. Consequently if v, considered as a function
% of s and divided by sqrt(n), is integrated with respect to s
% from s = a to s = b, the result would necessarily be the
% n-dimensional volume of the whole cube, namely (b-a)^n.
%
% This algorithm does no "rejecting" on the sets of x's it
% obtains. It is designed to generate only those that satisfy all
% the above conditions and to do so with a uniform distribution.
% It accomplishes this by decomposing the space of all possible x
% sets (columns) into n-1 dimensional simplexes. (Line segments,
% triangles, and tetrahedra, are one-, two-, and three-dimensional
% examples of simplexes, respectively.) It makes use of three
% different sets of 'rand' variables, one to locate values
% uniformly within each type of simplex, another to randomly
% select representatives of each different type of simplex in
% proportion to their volume, and a third to perform random
% permutations to provide an even distribution of simplex choices
% among like types. For example, with n equal to 3 and s set at,
% say, 40% of the way from a towards b, there will be 2 different
% types of simplex, in this case triangles, each with its own
% area, and 6 different versions of each from permutations, for
% a total of 12 triangles, and these all fit together to form a
% particular planar non-regular hexagon in 3 dimensions, with v
% returned set equal to the hexagon's area.
%
% Roger Stafford - Jan. 19, 2006
% Check the arguments.
if (m~=round(m))|(n~=round(n))|(m<0)|(n<1)
error('n must be a whole number and m a non-negative integer.')
elseif (s<n*a)|(s>n*b)|(a>=b)
error('Inequalities n*a <= s <= n*b and a < b must hold.')
end
% Rescale to a unit cube: 0 <= x(i) <= 1
s = (s-n*a)/(b-a);
% Construct the transition probability table, t.
% t(i,j) will be utilized only in the region where j <= i + 1.
k = max(min(floor(s),n-1),0); % Must have 0 <= k <= n-1
s = max(min(s,k+1),k); % Must have k <= s <= k+1
s1 = s - [k:-1:k-n+1]; % s1 & s2 will never be negative
s2 = [k+n:-1:k+1] - s;
w = zeros(n,n+1); w(1,2) = realmax; % Scale for full 'double' range
t = zeros(n-1,n);
tiny = 2^(-1074); % The smallest positive matlab 'double' no.
for i = 2:n
tmp1 = w(i-1,2:i+1).*s1(1:i)/i;
tmp2 = w(i-1,1:i).*s2(n-i+1:n)/i;
w(i,2:i+1) = tmp1 + tmp2;
tmp3 = w(i,2:i+1) + tiny; % In case tmp1 & tmp2 are both 0,
tmp4 = (s2(n-i+1:n) > s1(1:i)); % then t is 0 on left & 1 on right
t(i-1,1:i) = (tmp2./tmp3).*tmp4 + (1-tmp1./tmp3).*(~tmp4);
end
% Derive the polytope volume v from the appropriate
% element in the bottom row of w.
v = n^(3/2)*(w(n,k+2)/realmax)*(b-a)^(n-1);
% Now compute the matrix x.
x = zeros(n,m);
if m == 0, return, end % If m is zero, quit with x = []
rt = rand(n-1,m); % For random selection of simplex type
rs = rand(n-1,m); % For random location within a simplex
s = repmat(s,1,m);
j = repmat(k+1,1,m); % For indexing in the t table
sm = zeros(1,m); pr = ones(1,m); % Start with sum zero & product 1
for i = n-1:-1:1 % Work backwards in the t table
e = (rt(n-i,:)<=t(i,j)); % Use rt to choose a transition
sx = rs(n-i,:).^(1/i); % Use rs to compute next simplex coord.
sm = sm + (1-sx).*pr.*s/(i+1); % Update sum
pr = sx.*pr; % Update product
x(n-i,:) = sm + pr.*e; % Calculate x using simplex coords.
s = s - e; j = j - e; % Transition adjustment
end
x(n,:) = sm + pr.*s; % Compute the last x
% Randomly permute the order in the columns of x and rescale.
rp = rand(n,m); % Use rp to carry out a matrix 'randperm'
[ig,p] = sort(rp); % The values placed in ig are ignored
x = (b-a)*x(p+repmat([0:n:n*(m-1)],n,1))+a; % Permute & rescale x
return
but I want to generate matrix which has each row elements sum 1 and each row have uniform probability in matlab. how to do this with above programme. to run above programme I am calling it to another file and setting parameters
i.e.
m=4;n=4; a=0; b=1.5;s=1;
[x,v] = randfixedsum(n,m,s,a,b)
create a random matrix and divide each row by sum of elements of that row:
function result = randrowsum(m ,n)
rnd = rand(m,n);
rowsums = sum(rnd,2);
result = bsxfun(#rdivide, rnd, rowsums);
end
to create an m * n random matrix :
a=randrowsum(3,4)
check if sum of each row is 1:
sum(a,2)
I would say the easiest was is to generate the array with the given function.
[x,v] = randfixedsum(n,m,s,a,b);
Then just transport the results.
x = x';
I have written a matlab function (Version 7.10.0.499 (R2010a)) to evaluate incoming FT signal and calculate the morlet wavelet for the signal. I have a similar program, but I needed to make it more readable and closer to mathematical lingo. The output plot is supposed to be a 2D plot with colour showing the intensity of the frequencies. My plot seems to have all frequencies the same per time. The program does make an fft per row of time for each frequency, so I suppose another way to look at it is that the same line repeats itself per step in my for loop. The issue is I have checked with the original program, which does return the correct plot, and I cannot locate any difference beyond what I named the values and how I organized the code.
function[msg] = mile01_wlt(FT_y, f_mn, f_mx, K, N, F_s)
%{
Fucntion to perform a full wlt of a morlet wavelett.
optimization of the number of frequencies to be included.
FT_y satisfies the FT(x) of 1 envelope and is our ft signal.
f min and max enter into the analysis and are decided from
the f-image for optimal values.
While performing the transformation there are different scalings
on the resulting "intensity".
Plot is made with a 2D array and a colour code for intensity.
version 05.05.2016
%}
%--------------------------------------------------------------%
%{
tableofcontents:
1: determining nr. of analysis f, prints and readies f's to be used.
2: ensuring correct orientation of FT_y
3:defining arrays
4: declaring waveletdiagram and storage of frequencies
5: for-loop over all frequencies:
6: reducing file to manageable size by truncating time.
7: marking plot to highlight ("randproblemer")
8: plotting waveletdiagram
%}
%--------------------------------------------------------------%
%1: determining nr. of analysis f, prints and readies f's to be used.
DF = floor( log(f_mx/f_mn) / log(1+( 1/(8*K) ) ) ) + 1;% f-spectre analysed
nr_f_analysed = DF %output to commandline
f_step = (f_mx/f_mn)^(1/(DF-1)); % multiplicative step for new f_a
f_a = f_mn; %[Hz] frequency of analysis
T = N/F_s; %[s] total time sampled
C = 2.0; % factor to scale Psi
%--------------------------------------------------------------%
%2: ensuring correct orientation of FT_y
siz = size(FT_y);
if (siz(2)>siz(1))
FT_y = transpose(FT_y);
end;
%--------------------------------------------------------------%
%3:defining arrays
t = linspace(0, T*(N-1)/N, N); %[s] timespan
f = linspace(0, F_s*(N-1)/N, N); %[Hz] f-specter
%--------------------------------------------------------------%
%4: declaring waveletdiagram and storage of frequencies
WLd = zeros(DF,N); % matrix of DF rows and N columns for storing our wlt
f_store = zeros(1,DF); % horizontal array for storing DF frequencies
%--------------------------------------------------------------%
%5: for-loop over all frequencies:
for jj = 1:DF
o = (K/f_a)*(K/f_a); %factor sigma
Psi = exp(- 0*(f-f_a).*(f-f_a)); % FT(\psi) for 1 envelope
Psi = Psi - exp(-K*K)*exp(- o*(f.*f)); % correctional element
Psi = C*Psi; %factor. not set in stone
%next step fits 1 row in the WLd (3 alternatives)
%WLd(jj,:) = abs(ifft(Psi.*transpose(FT_y)));
WLd(jj,:) = sqrt(abs(ifft(Psi.*transpose(FT_y))));
%WLd(jj,:) = sqrt(abs(ifft(Psi.*FT_y))); %for different array sizes
%and emphasizes weaker parts.
%prep for next round
f_store (jj) = f_a; % storing used frequencies
f_a = f_a*f_step; % determines the next step
end;
%--------------------------------------------------------------%
%6: reducing file to manageable size by truncating time.
P = floor( (K*F_s) / (24*f_mx) );%24 not set in stone
using_every_P_point = P %printout to cmdline for monitoring
N_P = floor(N/P);
points_in_time = N_P %printout to cmdline for monitoring
% truncating WLd and time
WLd2 = zeros(DF,N_P);
for jj = 1:DF
for ii = 1:N_P
WLd2(jj,ii) = WLd(jj,ii*P);
end
end
t_P = zeros(1,N_P);
for ii = 1:N_P % set outside the initial loop to reduce redundancy
t_P(ii) = t(ii*P);
end
%--------------------------------------------------------------%
%7: marking plot to highlight boundary value problems
maxval = max(WLd2);%setting an intensity
mxv = max(maxval);
% marks in wl matrix
for jj= 1:DF
m = floor( K*F_s / (P*pi*f_store(jj)) ); %finding edges of envelope
WLd2(jj,m) = mxv/2; % lower limit
WLd2(jj,N_P-m) = mxv/2;% upper limit
end
%--------------------------------------------------------------%
%8: plotting waveletdiagram
figure;
imagesc(t_P, log10(f_store), WLd2, 'Ydata', [1 size(WLd2,1)]);
set(gca, 'Ydir', 'normal');
xlabel('Time [s]');
ylabel('log10(frequency [Hz])');
%title('wavelet power spectrum'); % for non-sqrt inensities
title('sqrt(wavelet power spectrum)'); %when calculating using sqrt
colorbar('location', 'southoutside');
msg = 'done.';
There are no error message, so I am uncertain what exactly I am doing wrong.
Hope I followed all the guidelines. Otherwise, I apologize.
edit:
my calling program:
% establishing parameters
N = 2^(16); % | number of points to sample
F_s = 3.2e6; % Hz | samplings frequency
T_t = N/F_s; % s | length in seconds of sample time
f_c = 2.0e5; % Hz | carrying wave frequency
f_m = 8./T_t; % Hz | modulating wave frequency
w_c = 2pif_c; % Hz | angular frequency("omega") of carrying wave
w_m = 2pif_m; % Hz | angular frequency("omega") of modulating wave
% establishing parameter arrays
t = linspace(0, T_t, N);
% function variables
T_h = 2*f_m.*t; % dimless | 1/2 of the period for square signal
% combined carry and modulated wave
% y(t) eq. 1):
y_t = 0.5.*cos(w_c.*t).*(1+cos(w_m.*t));
% y(t) eq. 2):
% y_t = 0.5.*cos(w_c.*t)+0.25*cos((w_c+w_m).*t)+0.25*cos((w_c-w_m).*t);
%square wave
sq_t = cos(w_c.*t).*(1 - mod(floor(t./T_h), 2)); % sq(t)
% the following can be exchanged between sq(t) and y(t)
plot(t, y_t)
% plot(t, sq_t)
xlabel('time [s]');
ylabel('signal amplitude');
title('plot of harmonically modulated signal with carrying wave');
% title('plot of square modulated signal with carrying wave');
figure()
hold on
% Fourier transform and plot of freq-image
FT_y = mile01_fftplot(y_t, N, F_s);
% FT_sq = mile01_fftplot(sq_t, N, F_s);
% Morlet wavelet transform and plot of WLdiagram
%determining K, check t-image
K_h = 57*4; % approximation based on 1/4 of an envelope, harmonious
%determining f min and max, from f-image
f_m = 1.995e5; % minimum frequency. chosen to showcase all relevant f
f_M = 2.005e5; % maximum frequency. chosen to showcase all relevant f
%calling wlt function.
name = 'mile'
msg = mile01_wlt(FT_y, f_m, f_M, K_h, N, F_s)
siz = size(FT_y);
if (siz(2)>siz(1))
FT_y = transpose(FT_y);
end;
name = 'arnt'
msg = arnt_wltransf(FT_y, f_m, f_M, K_h, N, F_s)
The time image has a constant frequency, but the amplitude oscillates resempling a gaussian curve. My code returns a sharply segmented image over time, where each point in time holds only 1 frequency. It should reflect a change in intensity across the spectra over time.
hope that helps and thanks!
I found the error. There is a 0 rather than an o in the first instance of Psi. Thinking I'll maybe rename the value as sig or something. besides this the code works. sorry for the trouble there
How can I generate integer random number within [a,b] with below distribution in MATLAB:
p(x)= x^(-a)
I want the distribution to be normalized.
For continuous distributions: Generate random values given a PDF
For discrete distributions, as later it was specified in the OP:
The same rationale can be used as for continuous distributions: inverse transform sampling.
So from mathematical point of view there is no difference, the Matlab implementation however is different. Here is a simple solution with your distribution function:
% for reproducibility
rng(333)
% OPTIONS
% interval endpoints
a = 4;
b = 20;
% number of required random draws
n = 1e4;
% CALCULATION
x = a:b;
% normalization constant
nc = sum(x.^(-a));
% if a and b are finite it is more convinient to have the pdf and cdf as vectors
pmf = 1/nc*x.^(-a);
% create cdf
cdf = cumsum(pmf);
% generate uniformly distributed random numbers from [0,1]
r = rand(n,1);
% use the cdf to get the x value to rs
R = nan(n,1);
for ii = 1:n
rr = r(ii);
if rr == 1
R(ii) = b;
else
idx = sum(cdf < rr) + 1;
R(ii) = x(idx);
end
end
%PLOT
% verfication plot
f = hist(R,x);
bar(x,f/sum(f))
hold on
plot(x, pmf, 'xr', 'Linewidth', 1.2)
xlabel('x')
ylabel('Probability mass')
legend('histogram of random values', 'analytical pdf')
Notes:
the code is general, just replace the pmf with your function;
it is strange that the same parameter a appears in the distribution function and in the interval too.
This question already has answers here:
Random numbers that add to 100: Matlab
(4 answers)
Closed 7 years ago.
I am looking how to pick 10 positive non-zero elements in 1x10 array randomly whose sum is 1
Example :
A=[0.0973 0.1071 0.0983 0.0933 0.1110 0.0942 0.1062 0.0970 0.0981 0.0974]
Note: If we sum the elements in above matrix it will be 1. I need matlab to generate a matrix like this randomly
Try using Roger's fex submission: http://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum
Here is a copy of the content of the file (in case the link dies).
All the credit obviously goes to the original poster Roger Stafford:
function [x,v] = randfixedsum(n,m,s,a,b)
% [x,v] = randfixedsum(n,m,s,a,b)
%
% This generates an n by m array x, each of whose m columns
% contains n random values lying in the interval [a,b], but
% subject to the condition that their sum be equal to s. The
% scalar value s must accordingly satisfy n*a <= s <= n*b. The
% distribution of values is uniform in the sense that it has the
% conditional probability distribution of a uniform distribution
% over the whole n-cube, given that the sum of the x's is s.
%
% The scalar v, if requested, returns with the total
% n-1 dimensional volume (content) of the subset satisfying
% this condition. Consequently if v, considered as a function
% of s and divided by sqrt(n), is integrated with respect to s
% from s = a to s = b, the result would necessarily be the
% n-dimensional volume of the whole cube, namely (b-a)^n.
%
% This algorithm does no "rejecting" on the sets of x's it
% obtains. It is designed to generate only those that satisfy all
% the above conditions and to do so with a uniform distribution.
% It accomplishes this by decomposing the space of all possible x
% sets (columns) into n-1 dimensional simplexes. (Line segments,
% triangles, and tetrahedra, are one-, two-, and three-dimensional
% examples of simplexes, respectively.) It makes use of three
% different sets of 'rand' variables, one to locate values
% uniformly within each type of simplex, another to randomly
% select representatives of each different type of simplex in
% proportion to their volume, and a third to perform random
% permutations to provide an even distribution of simplex choices
% among like types. For example, with n equal to 3 and s set at,
% say, 40% of the way from a towards b, there will be 2 different
% types of simplex, in this case triangles, each with its own
% area, and 6 different versions of each from permutations, for
% a total of 12 triangles, and these all fit together to form a
% particular planar non-regular hexagon in 3 dimensions, with v
% returned set equal to the hexagon's area.
%
% Roger Stafford - Jan. 19, 2006
% Check the arguments.
if (m~=round(m))|(n~=round(n))|(m<0)|(n<1)
error('n must be a whole number and m a non-negative integer.')
elseif (s<n*a)|(s>n*b)|(a>=b)
error('Inequalities n*a <= s <= n*b and a < b must hold.')
end
% Rescale to a unit cube: 0 <= x(i) <= 1
s = (s-n*a)/(b-a);
% Construct the transition probability table, t.
% t(i,j) will be utilized only in the region where j <= i + 1.
k = max(min(floor(s),n-1),0); % Must have 0 <= k <= n-1
s = max(min(s,k+1),k); % Must have k <= s <= k+1
s1 = s - [k:-1:k-n+1]; % s1 & s2 will never be negative
s2 = [k+n:-1:k+1] - s;
w = zeros(n,n+1); w(1,2) = realmax; % Scale for full 'double' range
t = zeros(n-1,n);
tiny = 2^(-1074); % The smallest positive matlab 'double' no.
for i = 2:n
tmp1 = w(i-1,2:i+1).*s1(1:i)/i;
tmp2 = w(i-1,1:i).*s2(n-i+1:n)/i;
w(i,2:i+1) = tmp1 + tmp2;
tmp3 = w(i,2:i+1) + tiny; % In case tmp1 & tmp2 are both 0,
tmp4 = (s2(n-i+1:n) > s1(1:i)); % then t is 0 on left & 1 on right
t(i-1,1:i) = (tmp2./tmp3).*tmp4 + (1-tmp1./tmp3).*(~tmp4);
end
% Derive the polytope volume v from the appropriate
% element in the bottom row of w.
v = n^(3/2)*(w(n,k+2)/realmax)*(b-a)^(n-1);
% Now compute the matrix x.
x = zeros(n,m);
if m == 0, return, end % If m is zero, quit with x = []
rt = rand(n-1,m); % For random selection of simplex type
rs = rand(n-1,m); % For random location within a simplex
s = repmat(s,1,m);
j = repmat(k+1,1,m); % For indexing in the t table
sm = zeros(1,m); pr = ones(1,m); % Start with sum zero & product 1
for i = n-1:-1:1 % Work backwards in the t table
e = (rt(n-i,:)<=t(i,j)); % Use rt to choose a transition
sx = rs(n-i,:).^(1/i); % Use rs to compute next simplex coord.
sm = sm + (1-sx).*pr.*s/(i+1); % Update sum
pr = sx.*pr; % Update product
x(n-i,:) = sm + pr.*e; % Calculate x using simplex coords.
s = s - e; j = j - e; % Transition adjustment
end
x(n,:) = sm + pr.*s; % Compute the last x
% Randomly permute the order in the columns of x and rescale.
rp = rand(n,m); % Use rp to carry out a matrix 'randperm'
[ig,p] = sort(rp); % The values placed in ig are ignored
x = (b-a)*x(p+repmat([0:n:n*(m-1)],n,1))+a; % Permute & rescale x
return
I've plotted a simple boxplot of a vector y (1xN) using Matlab. I used multiple grouping variables: x1, x2, x3
x1 (1xN) represents length (0.5, 1 , 2 or 3)
x2 (1xN) represents gauge (26 or 30)
x3 (1xN cell array) represents the name of the vendor.
close all; clc;
N = 1000;
% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);
% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);
% each cable being measured have a gauge of 1awg or 2awg:
x2 = randi(2,N,1);
% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA'
x3 = cell(N,1);
for ii = 1:N
if mod(ii,3) == 0
x3{ii} = 'SONY';
else
x3{ii} = 'YAMAHA';
end
end
figure(1)
boxplot(y,{x1,x2,x3});
I would like to plot a scatter plot over this boxplot in order to show the relevant values of y that create the boxplot, but I could not find a function that groups the values as the boxplot function does.
the closest thing I've found is the following function but it only accepts a single grouping variable.
any help?
The box of the boxplot is determined by the IQR. The data between boxes and outliers is everything in a range of 1.5*IQR from the upper and lower quartile. You can filter the data manually.
For instance...
% data generation
data=randn(100,3);
%%
datas=sort(data);
datainbox=datas(ceil(end/4)+1:floor(end*3/4),:);
[n1 n2]=size(datainbox);
figure(1);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3],datainbox,'k.')
%%
% All datapoints coincide now horizontally. Consider adding a little random
% horizontal play to make them not coincide:
figure(2);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
%%
% If you want to add all data between boxes and outliers too, do something like:
dataoutbox=datas([1:ceil(end/4) floor(end*3/4)+1:end],:);
n3=size(dataoutbox,1);
% calculate quartiles
dataq=quantile(data,[.25 .5 .75]);
% calculate range between box and outliers = between 1.5*IQR from quartiles
dataiqr=iqr(data);
datar=[dataq(1,:)-dataiqr*1.5;dataq(3,:)+dataiqr*1.5];
dataoutbox(dataoutbox<ones(n3,1)*datar(1,:)|dataoutbox>ones(n3,1)*datar(2,:))=nan;
figure(3);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
plot(ones(n3,1)*[1 2 3]+.4*(rand(n3,n2)-.5),dataoutbox,'.','color',[1 1 1]*.5)
found a simple solution:
I edited the signature of the 'boxplot' function so it will return 'groupIndexByPoint' in addition to 'h':
function [h,groupIndexByPoint] = boxplot(varargin)
groupIndexByPoint is an internal variable used by 'boxplot'.
and now simply add 4 lines to the original code:
N = 1000;
% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);
% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);
% each cable being measured have a gauge of 1awg or 2awg:
x2 = randi(2,N,1);
% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA'
x3 = cell(N,1);
for ii = 1:N
if mod(ii,3) == 0
x3{ii} = 'SONY';
else
x3{ii} = 'YAMAHA';
end
end
figure(1);
hold on;
[h,groups] = boxplot(y,{x1,x2,x3});
scattering_factor = 0.3;
scaterring_vector = (rand(N,1)-0.5)*scattering_factor;
groups_scattered = groups + scaterring_vector;
plot(groups_scattered,y,'.g');