Scatter plot over boxplot using MATLAB

I've plotted a simple boxplot of a vector y (1xN) using MATLAB, with multiple grouping variables x1, x2 and x3:
x1 (1xN) represents length (0.5, 1, 2 or 3)
x2 (1xN) represents gauge (26 or 30)
x3 (1xN cell array) represents the name of the vendor.
close all; clc;
N = 1000;
% measurement values: they represent some kind of
% electrical characteristic of a cable.
y = randn(N,1);
% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);
% each cable being measured has a gauge of 1awg or 2awg:
x2 = randi(2,N,1);
% each cable can be produced by a different vendor, for instance 'SONY' or
% 'YAMAHA'
x3 = cell(N,1);
for ii = 1:N
    if mod(ii,3) == 0
        x3{ii} = 'SONY';
    else
        x3{ii} = 'YAMAHA';
    end
end
figure(1)
boxplot(y,{x1,x2,x3});
I would like to plot a scatter plot over this boxplot to show the values of y that make up each box, but I could not find a function that groups the values the way boxplot does.
The closest thing I've found is the following function, but it only accepts a single grouping variable.
Any help?

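The box of the boxplot spans the interquartile range (IQR), from the lower to the upper quartile. The data between the boxes and the outliers (the whisker range) is everything within 1.5*IQR of the upper and lower quartiles; points beyond that are drawn as outliers. You can filter the data manually.
For instance...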
% data generation
data=randn(100,3);
%%
datas=sort(data);
datainbox=datas(ceil(end/4)+1:floor(end*3/4),:);
[n1 n2]=size(datainbox);
figure(1);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3],datainbox,'k.')
%%
% All data points in a column now share the same x position. Consider adding
% a little random horizontal jitter so that they do not overlap:
figure(2);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
%%
% If you want to add all data between boxes and outliers too, do something like:
dataoutbox=datas([1:ceil(end/4) floor(end*3/4)+1:end],:);
n3=size(dataoutbox,1);
% calculate quartiles
dataq=quantile(data,[.25 .5 .75]);
% calculate range between box and outliers = between 1.5*IQR from quartiles
dataiqr=iqr(data);
datar=[dataq(1,:)-dataiqr*1.5;dataq(3,:)+dataiqr*1.5];
dataoutbox(dataoutbox<ones(n3,1)*datar(1,:)|dataoutbox>ones(n3,1)*datar(2,:))=nan;
figure(3);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
plot(ones(n3,1)*[1 2 3]+.4*(rand(n3,n2)-.5),dataoutbox,'.','color',[1 1 1]*.5)
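The same filtering can also be written with logical masks instead of sorted index ranges. This is a hedged sketch rather than the answer's code: it assumes the Statistics Toolbox functions quantile and boxplot used above are available, and the mask names inBox, inWhisker and isOut are hypothetical. Note that boxplot computes its quartiles internally, so for small samples the limits from quantile may differ slightly from the drawn boxes.

```matlab
% Sketch: classify every point as box, whisker range, or outlier, per column
% (hypothetical mask names; quantile/boxplot need the Statistics Toolbox)
data = randn(100,3);
q = quantile(data, [0.25 0.75]);      % row 1: lower quartile, row 2: upper quartile
w = 1.5 * (q(2,:) - q(1,:));          % 1.5*IQR for each column
lo = q(1,:) - w;                      % lower whisker limit
hi = q(2,:) + w;                      % upper whisker limit
inBox   = bsxfun(@ge, data, q(1,:)) & bsxfun(@le, data, q(2,:));
inRange = bsxfun(@ge, data, lo)     & bsxfun(@le, data, hi);
inWhisker = inRange & ~inBox;         % between the box and the outliers
isOut     = ~inRange;                 % beyond 1.5*IQR (boxplot marks these itself)
% x positions 1..3 plus a little horizontal jitter
xpos = bsxfun(@plus, 1:3, 0.4*(rand(size(data)) - 0.5));
figure; boxplot(data); hold on
plot(xpos(inBox), data(inBox), 'k.')
plot(xpos(inWhisker), data(inWhisker), '.', 'color', [1 1 1]*.5)
```

The three masks partition the data, so every point lands in exactly one category.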

Found a simple solution:
I edited the signature of the 'boxplot' function so that it also returns 'groupIndexByPoint' in addition to 'h':
function [h,groupIndexByPoint] = boxplot(varargin)
groupIndexByPoint is an internal variable used by 'boxplot'.
Now simply add 4 lines to the original code:
N = 1000;
% measurement values: they represent some kind of
% electrical characteristic of a cable.
y = randn(N,1);
% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);
% each cable being measured has a gauge of 1awg or 2awg:
x2 = randi(2,N,1);
% each cable can be produced by a different vendor, for instance 'SONY' or
% 'YAMAHA'
x3 = cell(N,1);
for ii = 1:N
    if mod(ii,3) == 0
        x3{ii} = 'SONY';
    else
        x3{ii} = 'YAMAHA';
    end
end
figure(1);
hold on;
[h,groups] = boxplot(y,{x1,x2,x3});
scattering_factor = 0.3;
scattering_vector = (rand(N,1)-0.5)*scattering_factor;
groups_scattered = groups + scattering_vector;
plot(groups_scattered,y,'.g');
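Editing a toolbox file such as boxplot.m changes the function for every other caller. A possible alternative, sketched here under the assumption that boxplot accepts a single numeric grouping vector together with its 'Labels' option, is to build one combined group index yourself with unique(...,'rows'). With a numeric grouping vector whose values are 1..G, boxplot draws the boxes at x = 1..G in sorted order, so the same index can serve as the x coordinate of the scatter points. The names vendors, key, combos and g are illustrative, and the group order here (sorted by length, then gauge, then vendor name) may differ from the order boxplot chooses for the original {x1,x2,x3} call.

```matlab
% Sketch: scatter over a boxplot with several grouping variables,
% without modifying boxplot.m (illustrative variable names)
N = 1000;
y  = randn(N,1);
x1 = randi(3,N,1);
x2 = randi(2,N,1);
x3 = repmat({'YAMAHA'}, N, 1);
x3(3:3:N) = {'SONY'};                      % same pattern as mod(ii,3) == 0
[vendors, ~, v] = unique(x3);              % numeric vendor code per point (names sorted)
key = [x1, x2, v];                         % one row of numeric codes per point
[combos, ~, g] = unique(key, 'rows');      % g(i) = combined group index of point i
labels = cell(size(combos,1), 1);          % one tick label per combined group
for k = 1:size(combos,1)
    labels{k} = sprintf('%d,%d,%s', combos(k,1), combos(k,2), vendors{combos(k,3)});
end
figure;
boxplot(y, g, 'Labels', labels); hold on
jitter = 0.3*(rand(N,1) - 0.5);            % horizontal scatter, as in the answer above
plot(g + jitter, y, '.g');
```

Because g is built from the data itself, every group index 1..G has at least one point, so the box positions and the scatter positions line up.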

Related

Adding two color maps using imagesc between two sets of curves with specified boundaries (MATLAB)

I'm trying to add two color gradients between two curves (in this example these are lines).
This is the code for what I've done so far:
% the mesh
ns=1000;
t_vec = linspace(0,100,ns);
x_vec = linspace(-120,120,ns);
[N, X] = meshgrid(t_vec, x_vec);
% the curves
x1 = linspace(0,100,ns); x2 = linspace(10,110,ns);
y1 = linspace(-50,50,ns); y2 = linspace(-20,80,ns);
X1 = repmat(x1, [size(N, 1) 1]); X2 = repmat(x2, [size(N, 1) 1]);
Y1 = repmat(y1, [size(N, 1) 1]); Y2 = repmat(y2, [size(N, 1) 1]);
% the gradient function
cc = @(x,x2,x1) ...
    1./(1+(exp(-x)./(exp(-x1)-exp(-x2))));
for i=1:ns
CData1(:,i)=cc(x_vec,x2(i),x1(i));
CData2(:,i)=cc(x_vec,y2(i),y1(i));
end
CData=CData1+CData2; % here I've added the two gradients
% mask
mask = true(size(N));
mask((X > Y2 | X < Y1) & (X > X2 | X < X1)) = false;
% finalized data
Z = NaN(size(N));
Z(mask) = CData(mask);
Z = normalize(Z, 1, 'range');
% draw a figure!
figure(1); clf;
ax = axes; % create some axes
sc = imagesc(ax, t_vec, x_vec, Z); % plot the data
colormap('summer')
ax.YDir = 'normal' % set the YDir to normal again, imagesc reverses it by default;
hold on
plot(t_vec,x1,'r',t_vec,x2,'r',t_vec,y1,'k',t_vec,y2,'k')
ylim([-120 120]); xlim([0 100])
In the result I get, the gradient stretches from the lowest line to the highest line.
How can I separate the two color data sets and show them in the same image (using imagesc), each with a different colormap?
Here is a function called comat (see the bottom of the answer) that I once made for something similar; I think you might find it useful in your case. Here's an example of how to use it:
imagesc(t_vec, x_vec, comat(CData2.*mask,CData1.*mask));
colormap([summer(256).^2;flipud(bone(256).^0.5)]); % and the two colormaps
set(gca,'Ydir','normal')
I'm not sure this is what you meant, but in the result the data of the thin stripe is visualized only with the black-and-white bone colormap, while the rest uses summer. I also "abused" the colormaps with a ^ (power) factor to emphasize the range of the gradient.
function z = comat(z1,z2,DR)
% the function combines matrices z1 and z2 for the purpose of
% visualization with 2 different colormaps
% z1,z2 - matrices of the same size
% DR - the dynamic range for visualization (default 256)
%example
%imagesc(comat(z1,z2)); colormap([jet(256);bone(256)]);
%defaults
if (nargin < 3); DR=256; end
%normalize to dynamic range, integer values in the range 0 to DR
z1=double(uint32(DR*(z1-min(z1(:)))./(max(z1(:)-min(z1(:))))));
%shift z2 to integer values in the range DR+1 to 2*DR+1
z2=double(uint32(DR*(z2-min(z2(:)))./(max(z2(:)-min(z2(:))))+DR+1));
thr=DR+2+10; %threshold where data is not important for z2, must be at least DR+2
%use z2 where it reaches the threshold, z1 everywhere else
z=z1.*(z2<thr)+z2.*(z2>=thr);
end
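The idea behind comat is to map the two data sets onto two non-overlapping ranges of colour indices, so that one colormap built by stacking two colormaps can display both. A minimal sketch of the same idea with direct colour indexing (no automatic scaling) follows; the test data A and B and the mask useB are hypothetical stand-ins for CData1, CData2 and the region mask. Because image treats double data as 1-based colormap indices by default, values 1..n pick colours from the first colormap and n+1..2n from the second.

```matlab
% Sketch: two data sets in one image with two stacked colormaps
% (A, B and useB are hypothetical test data, not the question's CData)
n = 256;                                          % colours per colormap
[X, Y] = meshgrid(linspace(0,1,200));             % hypothetical grid
A = X;                                            % first data set
B = Y;                                            % second data set
useB = Y > X;                                     % hypothetical region shown with B
% rescale each data set to its own block of integer colour indices
idxA = 1 + round((n-1)*(A - min(A(:)))/(max(A(:)) - min(A(:))));       % 1..n
idxB = n + 1 + round((n-1)*(B - min(B(:)))/(max(B(:)) - min(B(:))));   % n+1..2n
idx = idxA;
idx(useB) = idxB(useB);
figure;
image(idx);                                       % double data: direct colormap indexing
colormap([summer(n); bone(n)]);
set(gca, 'YDir', 'normal');
```

Using image with explicit indices avoids depending on how imagesc scales the colour limits, which is what comat's offset of DR+1 works around.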

Exponential fitting for matlab

I have my data in arrays that look exponential, like e^(ax+c)+d. I'm trying to fit a curve to them.
a = data1 (:,1);
b = data1 (:,2);
log(b);
p = polyfit (a,log(b),1);
But I don't know what to do next. I got an equation from polyfit and was hoping to take its exponential, i.e.
exp(0.5632x+2.435)
but I found out that it doesn't work like that. Does anyone have any suggestions?
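For reference, the log-linearization does work when the offset d is zero: if b = exp(p1*a + p2), then log(b) is a straight line in a, polyfit returns [p1 p2], and exp(polyval(p, a)) gives the fitted curve. With d ≠ 0, log(b) is no longer linear in a, which is one likely reason the approach fails here and why a nonlinear fit is needed. The data in this sketch is synthetic and only for illustration.

```matlab
% Sketch: exponential fit via polyfit on log(b); valid only when d = 0
% (synthetic data for illustration)
rng(0);                                                    % reproducible noise
a = linspace(0, 5, 50)';                                   % synthetic x data
b = exp(0.5632*a + 2.435) .* (1 + 0.01*randn(size(a)));    % 1% multiplicative noise
p = polyfit(a, log(b), 1);                                 % p(1) ~ slope, p(2) ~ intercept
bFit = exp(polyval(p, a));                                 % back-transform the straight line
figure; plot(a, b, '.r', a, bFit, '-b');
```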
Try nonlinear fitting:
%% PARAMETERS (you need this part)
clear all;
clc, clf;
N = 128; % number of datapoints
Nint = N*10; % number of datapoints for curve interpolation
fun = @(prms,x) prms(4).^(prms(1)*x+prms(2))+prms(3); % write your function
iniPrm = rand(4,1); % find some initial values for the parameters (choose meaningful values for better results)
%% SIMULATE DATA (this is only for testing purposes)
SNR = .01; % noise level for the simulated data (amplitude of the uniform noise)
noise = (rand(1,N)-.5)*SNR; % create some random noise
x = linspace(0,10,N); % create the x axis
y = fun(iniPrm,x) + noise; % simulate a dataset that follows the given function
x = x(:); % reshape as a vector
y = y(:); % reshape as a vector
X = linspace(x(1),x(end),Nint); % interpolate the output to plot it smoothly
plot(x,y,'.r','markersize',10); hold on; % plot the dataset
%% FIT AND INTERPOLATE YOUR MODEL
[out.BETA,out.RESID,out.J,out.COVB,out.MSE] = nlinfit(x,y,fun,iniPrm,[]); % model your data
[out.YPRED,out.DELTA] = nlpredci(fun,X,out.BETA,out.RESID,'Covar',out.COVB); % interpolate your model
out.YPREDLOWCI = out.YPRED - out.DELTA; % find lower confidence intervals of your fitting
out.YPREDUPCI = out.YPRED + out.DELTA; % find upper confidence intervals of your fitting
out.X = X; % store the interpolated X
%% PLOT FITTING
plotCI = @(IO,spec) patch([IO.X(:);flipud(IO.X(:))],[IO.YPREDLOWCI(:);flipud(IO.YPREDUPCI(:))],spec{:}); % create patches, e.g.: patch(0:10,10:-1:0,ones(10,1)-1,1,{'r','facealpha',0.2})
plot(X,out.YPRED,'-b','linewidth',3);
plotCI(out,{'r','facealpha',.3,'edgealpha',0})
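nlinfit and nlpredci come from the Statistics and Machine Learning Toolbox. If that toolbox is not available, a minimal alternative, assuming only base MATLAB, is to minimize the sum of squared residuals of the question's model e^(ax+c)+d with fminsearch. This sketch uses synthetic data and gives no confidence intervals; the starting point p0 and the tolerances are assumptions that may need adjusting for real data.

```matlab
% Sketch: fit y = exp(a*x + c) + d with base-MATLAB fminsearch
% (synthetic data; p0 and tolerances are assumptions)
rng(1);
x = linspace(0, 3, 60)';
pTrue = [0.8, 0.5, 2.0];                            % [a, c, d] used to simulate data
model = @(p, x) exp(p(1)*x + p(2)) + p(3);
y = model(pTrue, x) + 0.05*randn(size(x));          % synthetic noisy data
sse = @(p) sum((y - model(p, x)).^2);               % objective: sum of squared residuals
p0 = [1, 0, 0];                                     % rough starting guess
opts = optimset('MaxFunEvals', 1e4, 'MaxIter', 1e4, 'TolX', 1e-8, 'TolFun', 1e-10);
pFit = fminsearch(sse, p0, opts);                   % Nelder-Mead simplex search
xx = linspace(x(1), x(end), 200)';
figure; plot(x, y, '.r', xx, model(pFit, xx), '-b');
```

Nelder-Mead needs no derivatives, which keeps the sketch short, but it can stall on badly scaled parameters; nlinfit remains the better choice when confidence intervals are needed.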

Morlet wavelet transformation function returns nonsensical plot

I have written a MATLAB function (version 7.10.0.499 (R2010a)) that takes an incoming FT (Fourier-transformed) signal and computes its Morlet wavelet transform. I have a similar program, but I needed to make it more readable and closer to mathematical notation. The output is supposed to be a 2D plot in which colour shows the intensity of each frequency. In my plot, however, all frequencies seem to have the same intensity at any given time. The program computes an inverse FFT for each analysis frequency, one row of the diagram per iteration, so another way to look at it is that the same row is repeated on every step of my for loop. I have compared the code with the original program, which does return the correct plot, and I cannot find any difference beyond the variable names and the way the code is organized.
function[msg] = mile01_wlt(FT_y, f_mn, f_mx, K, N, F_s)
%{
Function to perform a full wavelet transform (wlt) with a Morlet wavelet,
with optimization of the number of frequencies to be included.
FT_y is the FT(x) of 1 envelope, i.e. our Fourier-transformed signal.
f min and max enter into the analysis and are decided from
the f-image for optimal values.
While performing the transformation there are different scalings
on the resulting "intensity".
Plot is made with a 2D array and a colour code for intensity.
version 05.05.2016
%}
%--------------------------------------------------------------%
%{
Table of contents:
1: determining nr. of analysis f, prints and readies f's to be used.
2: ensuring correct orientation of FT_y
3: defining arrays
4: declaring waveletdiagram and storage of frequencies
5: for-loop over all frequencies:
6: reducing file to manageable size by truncating time.
7: marking plot to highlight boundary value problems
8: plotting waveletdiagram
%}
%--------------------------------------------------------------%
%1: determining nr. of analysis f, prints and readies f's to be used.
DF = floor( log(f_mx/f_mn) / log(1+( 1/(8*K) ) ) ) + 1;% f-spectrum analysed
nr_f_analysed = DF %output to commandline
f_step = (f_mx/f_mn)^(1/(DF-1)); % multiplicative step for new f_a
f_a = f_mn; %[Hz] frequency of analysis
T = N/F_s; %[s] total time sampled
C = 2.0; % factor to scale Psi
%--------------------------------------------------------------%
%2: ensuring correct orientation of FT_y
siz = size(FT_y);
if (siz(2)>siz(1))
FT_y = transpose(FT_y);
end;
%--------------------------------------------------------------%
%3:defining arrays
t = linspace(0, T*(N-1)/N, N); %[s] timespan
f = linspace(0, F_s*(N-1)/N, N); %[Hz] f-spectrum
%--------------------------------------------------------------%
%4: declaring waveletdiagram and storage of frequencies
WLd = zeros(DF,N); % matrix of DF rows and N columns for storing our wlt
f_store = zeros(1,DF); % horizontal array for storing DF frequencies
%--------------------------------------------------------------%
%5: for-loop over all frequencies:
for jj = 1:DF
o = (K/f_a)*(K/f_a); %factor sigma
Psi = exp(- 0*(f-f_a).*(f-f_a)); % FT(\psi) for 1 envelope
Psi = Psi - exp(-K*K)*exp(- o*(f.*f)); % correctional element
Psi = C*Psi; %factor. not set in stone
%next step fits 1 row in the WLd (3 alternatives)
%WLd(jj,:) = abs(ifft(Psi.*transpose(FT_y)));
WLd(jj,:) = sqrt(abs(ifft(Psi.*transpose(FT_y))));
%WLd(jj,:) = sqrt(abs(ifft(Psi.*FT_y))); %for different array sizes
%and emphasizes weaker parts.
%prep for next round
f_store (jj) = f_a; % storing used frequencies
f_a = f_a*f_step; % determines the next step
end;
%--------------------------------------------------------------%
%6: reducing file to manageable size by truncating time.
P = floor( (K*F_s) / (24*f_mx) );%24 not set in stone
using_every_P_point = P %printout to cmdline for monitoring
N_P = floor(N/P);
points_in_time = N_P %printout to cmdline for monitoring
% truncating WLd and time
WLd2 = zeros(DF,N_P);
for jj = 1:DF
for ii = 1:N_P
WLd2(jj,ii) = WLd(jj,ii*P);
end
end
t_P = zeros(1,N_P);
for ii = 1:N_P % set outside the initial loop to reduce redundancy
t_P(ii) = t(ii*P);
end
%--------------------------------------------------------------%
%7: marking plot to highlight boundary value problems
maxval = max(WLd2);%setting an intensity
mxv = max(maxval);
% marks in wl matrix
for jj= 1:DF
m = floor( K*F_s / (P*pi*f_store(jj)) ); %finding edges of envelope
WLd2(jj,m) = mxv/2; % lower limit
WLd2(jj,N_P-m) = mxv/2;% upper limit
end
%--------------------------------------------------------------%
%8: plotting waveletdiagram
figure;
imagesc(t_P, log10(f_store), WLd2, 'Ydata', [1 size(WLd2,1)]);
set(gca, 'Ydir', 'normal');
xlabel('Time [s]');
ylabel('log10(frequency [Hz])');
%title('wavelet power spectrum'); % for non-sqrt intensities
title('sqrt(wavelet power spectrum)'); %when calculating using sqrt
colorbar('location', 'southoutside');
msg = 'done.';
There is no error message, so I am unsure what exactly I am doing wrong.
I hope I followed all the guidelines; otherwise, I apologize.
Edit:
My calling program:
% establishing parameters
N = 2^(16); % | number of points to sample
F_s = 3.2e6; % Hz | samplings frequency
T_t = N/F_s; % s | length in seconds of sample time
f_c = 2.0e5; % Hz | carrying wave frequency
f_m = 8./T_t; % Hz | modulating wave frequency
w_c = 2*pi*f_c; % rad/s | angular frequency ("omega") of carrying wave
w_m = 2*pi*f_m; % rad/s | angular frequency ("omega") of modulating wave
% establishing parameter arrays
t = linspace(0, T_t, N);
% function variables
T_h = 2*f_m.*t; % dimless | 1/2 of the period for square signal
% combined carry and modulated wave
% y(t) eq. 1):
y_t = 0.5.*cos(w_c.*t).*(1+cos(w_m.*t));
% y(t) eq. 2):
% y_t = 0.5.*cos(w_c.*t)+0.25*cos((w_c+w_m).*t)+0.25*cos((w_c-w_m).*t);
%square wave
sq_t = cos(w_c.*t).*(1 - mod(floor(t./T_h), 2)); % sq(t)
% the following can be exchanged between sq(t) and y(t)
plot(t, y_t)
% plot(t, sq_t)
xlabel('time [s]');
ylabel('signal amplitude');
title('plot of harmonically modulated signal with carrying wave');
% title('plot of square modulated signal with carrying wave');
figure()
hold on
% Fourier transform and plot of freq-image
FT_y = mile01_fftplot(y_t, N, F_s);
% FT_sq = mile01_fftplot(sq_t, N, F_s);
% Morlet wavelet transform and plot of WLdiagram
%determining K, check t-image
K_h = 57*4; % approximation based on 1/4 of an envelope, harmonious
%determining f min and max, from f-image
f_m = 1.995e5; % minimum frequency. chosen to showcase all relevant f
f_M = 2.005e5; % maximum frequency. chosen to showcase all relevant f
%calling wlt function.
name = 'mile'
msg = mile01_wlt(FT_y, f_m, f_M, K_h, N, F_s)
siz = size(FT_y);
if (siz(2)>siz(1))
FT_y = transpose(FT_y);
end;
name = 'arnt'
msg = arnt_wltransf(FT_y, f_m, f_M, K_h, N, F_s)
The time-domain signal has a constant frequency, but its amplitude oscillates in a way that resembles a Gaussian curve. My code returns an image that is sharply segmented over time, where each point in time holds only one frequency. It should instead show the intensity changing across the spectrum over time.
Hope that helps, and thanks!
I found the error. In the first definition of Psi there is a 0 (zero) rather than an o (the sigma factor), so the line should read Psi = exp(- o*(f-f_a).*(f-f_a)); I'll probably rename the variable to sig or something similar. Apart from this, the code works. Sorry for the trouble.
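Why the typo produces identical rows: with 0 instead of o, exp(-0*(f-f_a).^2) equals 1 for every f, so the window no longer depends on f_a. In the calling program K_h = 228, so the correction term exp(-K*K) underflows to exactly 0 in double precision and Psi reduces to the constant C; every row of WLd is then the same ifft of FT_y. This sketch compares the two windows for two analysis frequencies; N, F_s, K and fa are small illustrative values, not the ones from the question.

```matlab
% Sketch: Gaussian window of Psi with the typo (0) and with the fix (o)
% (N, F_s, K and fa are illustrative values)
N = 1024; F_s = 1000; K = 5;
f = linspace(0, F_s*(N-1)/N, N);          % frequency axis as in mile01_wlt
fa = [100 200];                           % two analysis frequencies [Hz]
winTypo  = zeros(2, N);
winFixed = zeros(2, N);
for jj = 1:2
    o = (K/fa(jj))*(K/fa(jj));            % sigma factor, as in the question
    winTypo(jj,:)  = exp(- 0*(f-fa(jj)).*(f-fa(jj)));   % typo: 1 everywhere
    winFixed(jj,:) = exp(- o*(f-fa(jj)).*(f-fa(jj)));   % fix: Gaussian centred on fa(jj)
end
figure; plot(f, winTypo(1,:), 'r', f, winFixed(1,:), 'b', f, winFixed(2,:), 'g');
xlabel('f [Hz]');
```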

Random number with p(x) = x^(-a) distribution

How can I generate random integers within [a,b] with the following distribution in MATLAB:
p(x) = x^(-a)
I want the distribution to be normalized.
For continuous distributions, see: Generate random values given a PDF
For discrete distributions, as the OP later specified:
The same rationale can be used as for continuous distributions: inverse transform sampling.
So from a mathematical point of view there is no difference; the MATLAB implementation, however, is different. Here is a simple solution with your distribution function:
% for reproducibility
rng(333)
% OPTIONS
% interval endpoints
a = 4;
b = 20;
% number of required random draws
n = 1e4;
% CALCULATION
x = a:b;
% normalization constant
nc = sum(x.^(-a));
% if a and b are finite, it is more convenient to have the pmf and cdf as vectors
pmf = 1/nc*x.^(-a);
% create cdf
cdf = cumsum(pmf);
% generate uniformly distributed random numbers from [0,1]
r = rand(n,1);
% use the cdf to map each random number r to an x value
R = nan(n,1);
for ii = 1:n
rr = r(ii);
if rr == 1
R(ii) = b;
else
idx = sum(cdf < rr) + 1;
R(ii) = x(idx);
end
end
%PLOT
% verification plot
f = hist(R,x);
bar(x,f/sum(f))
hold on
plot(x, pmf, 'xr', 'Linewidth', 1.2)
xlabel('x')
ylabel('Probability mass')
legend('histogram of random values', 'analytical pmf')
Notes:
the code is general, just replace the pmf with your function;
it is strange that the same parameter a appears in the distribution function and in the interval too.
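The loop over the n draws can also be vectorized. A minimal sketch, assuming MATLAB R2015a or later for discretize: the cdf values become bin edges on [0,1], and discretize returns, for each uniform draw, the index of the bin it falls in, which is exactly the inverse-transform step. Forcing the last edge to 1 guards against round-off leaving cumsum(pmf) slightly below 1. If the Statistics Toolbox is available, randsample(x, n, true, pmf) is another option.

```matlab
% Sketch: vectorized inverse transform sampling for p(x) ~ x^(-a) on a:b
% (assumes discretize, available since R2015a)
rng(333);
a = 4; b = 20; n = 1e4;
x = a:b;
pmf = x.^(-a) / sum(x.^(-a));           % normalized probability mass function
edges = [0, cumsum(pmf)];               % cdf values as bin edges on [0,1]
edges(end) = 1;                         % guard against round-off below 1
bin = discretize(rand(n,1), edges);     % bin k <=> edges(k) <= r < edges(k+1)
R = x(bin);                             % map each bin index to its integer value
R = R(:);                               % column vector of n draws
```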

Gaussian Process Regression

I am coding a Gaussian Process regression algorithm. Here is the code:
% Data generating function
fh = @(x)(2*cos(2*pi*x/10).*x);
% range
x = -5:0.01:5;
N = length(x);
% Sampled data points from the generating function
M = 50;
selection = false(N,1);
j = randsample(N, M);
% mark them
selection(j) = 1;
Xa = x(j);
% compute the function and extract mean
f = fh(Xa) - mean(fh(Xa));
sigma2 = 1;
% computing the interpolation using all x's
% It is expected that for points used to build the GP cov. matrix, the
% uncertainty is reduced...
K = squareform(pdist(x'));
K = exp(-(0.5*K.^2)/sigma2);
% upper left corner of K
Kaa = K(selection,selection);
% lower right corner of K
Kbb = K(~selection,~selection);
% upper right corner of K
Kab = K(selection,~selection);
% mean of posterior
m = Kab'*inv(Kaa+0.001*eye(M))*f';
% cov. matrix of posterior
D = Kbb - Kab'*inv(Kaa + 0.001*eye(M))*Kab;
% sampling M functions from from GP
[A,B,C] = svd(Kaa);
F0 = A*sqrt(B)*randn(M,M);
% mean from GP using sampled points
F0m = mean(F0,2);
F0d = std(F0,0,2);
%%
% put together data and estimation
F = zeros(N,1);
S = zeros(N,1);
F(selection) = f' + F0m;
S(selection) = F0d;
% sampling M function from posterior
[A,B,C] = svd(D);
a = A*sqrt(B)*randn(N-M,M);
% mean from posterior GPs
Fm = m + mean(a,2);
Fmd = std(a,0,2);
F(~selection) = Fm;
S(~selection) = Fmd;
%%
figure;
% show what we got...
plot(x, F, ':r', x, F-2*S, ':b', x, F+2*S, ':b'), grid on;
hold on;
% show points we got
plot(Xa, f, 'Ok');
% show the whole curve
plot(x, fh(x)-mean(fh(x)), 'k');
grid on;
I expected to get a nice figure in which the uncertainty is large at unknown data points and small around the sampled data points. Instead I got an odd figure, and, even odder, the uncertainty around the sampled data points is bigger than elsewhere. Can someone explain to me what I am doing wrong? Thanks!!
There are a few things wrong with your code. Here are the most important points:
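The major mistake, which makes everything go wrong, is the indexing of f. You define Xa = x(j), but randsample returns the indices j in random order, whereas the logical index selection used to build Kaa and Kab picks the points in ascending order. You should therefore use Xa = x(selection), so that the order of f is consistent with the indexing you use on the kernel matrix K.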
Subtracting the sample mean f = fh(Xa) - mean(fh(Xa)) does not serve any purpose, and makes the circles in your plot be off from the actual function. (If you choose to subtract something, it should be a fixed number or function, and not depend on the randomly sampled observations.)
You should compute the posterior mean and variance directly from m and D; no need to sample from the posterior and then obtain sample estimates for those.
Here is a modified version of the script with the above points fixed.
%% Init
% Data generating function
fh = @(x)(2*cos(2*pi*x/10).*x);
% range
x = -5:0.01:5;
N = length(x);
% Sampled data points from the generating function
M = 5;
selection = false(N,1);
j = randsample(N, M);
% mark them
selection(j) = 1;
Xa = x(selection);
%% GP computations
% evaluate the function at the sampled points (no mean subtraction)
f = fh(Xa);
sigma2 = 2;
sigma_noise = 0.01;
var_kernel = 10;
% computing the interpolation using all x's
% It is expected that for points used to build the GP cov. matrix, the
% uncertainty is reduced...
K = squareform(pdist(x'));
K = var_kernel*exp(-(0.5*K.^2)/sigma2);
% upper left corner of K
Kaa = K(selection,selection);
% lower right corner of K
Kbb = K(~selection,~selection);
% upper right corner of K
Kab = K(selection,~selection);
% mean of posterior
m = Kab'/(Kaa + sigma_noise*eye(M))*f';
% cov. matrix of posterior
D = Kbb - Kab'/(Kaa + sigma_noise*eye(M))*Kab;
%% Plot
figure;
grid on;
hold on;
% GP estimates
plot(x(~selection), m);
plot(x(~selection), m + 2*sqrt(diag(D)), 'g-');
plot(x(~selection), m - 2*sqrt(diag(D)), 'g-');
% Observations
plot(Xa, f, 'Ok');
% True function
plot(x, fh(x), 'k');
A resulting plot with 5 randomly chosen observations: the true function is shown in black, the posterior mean in blue, and the confidence intervals in green.
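The posterior formulas above can also be evaluated through a Cholesky factorization of Kaa + sigma_noise*I, a standard way to compute GP posteriors that avoids explicit inverses and does not need the full matrix D when only the pointwise variance is plotted. This sketch reuses the answer's squared-exponential kernel and hyperparameters; it uses randperm instead of randsample and bsxfun instead of pdist (both of which need the Statistics Toolbox), and the names kern, alpha, v and s2 are illustrative.

```matlab
% Sketch: GP posterior mean and variance via Cholesky factorization
% (kernel and hyperparameters as in the answer; variable names illustrative)
rng(0);
fh = @(x) 2*cos(2*pi*x/10).*x;               % data generating function
x = (-5:0.01:5)';                            % all inputs (column)
M = 5;                                       % number of observations
ia = sort(randperm(numel(x), M))';           % observed indices, ascending
ib = setdiff((1:numel(x))', ia);             % unobserved indices
xa = x(ia);  f = fh(xa);                     % observations
xb = x(ib);                                  % prediction points
sigma2 = 2; sigma_noise = 0.01; var_kernel = 10;
kern = @(p, q) var_kernel*exp(-0.5*bsxfun(@minus, p, q').^2/sigma2);
Kaa = kern(xa, xa);
Kab = kern(xa, xb);
L = chol(Kaa + sigma_noise*eye(M), 'lower'); % Kaa + noise = L*L'
alpha = L' \ (L \ f);                        % (Kaa + noise)^(-1) * f
m = Kab' * alpha;                            % posterior mean
v = L \ Kab;
s2 = var_kernel - sum(v.^2, 1)';             % posterior variance = diag(D)
figure; hold on; grid on;
plot(xb, m, 'b');
plot(xb, m + 2*sqrt(max(s2, 0)), 'g-', xb, m - 2*sqrt(max(s2, 0)), 'g-');
plot(xa, f, 'ok');
plot(x, fh(x), 'k');
```

The factorization costs O(M^3) once, after which only triangular solves are needed; max(s2, 0) guards against tiny negative variances from round-off.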