How to compute confidence intervals and plot them on a bar plot - matlab

How can I plot a bar out of a
data = 1x10 cell
, where each value in the cell has a different dimension like 3x100, 3x40, 66x2 etc.
My goal is to get a bar plot, where I would have 10 group of bars and in every group three bars for each of the values. On the bar, I want it to be shown the median of the values, and I want to calculate the confidence interval and show it additionally.
On this example there are not group of bars, but my point is to show you how I want the confidence intervals shown. On the site, where I found this example they offer a solution where they have this command line
e1 = errorbar(mean(data), ci95);
but I have the problem that it can't find any ci95
So, are there any other effective ways to do it, without installing or downloading additional services?

I've found Patrick Happel's answer to not work because the figure window (and therefore the variable b) gets cleared out by subsequent calls to errorbar. Simply adding a hold on command takes care of this. To avoid confusion, here's a new answer that reproduces all of Patrick's original code, plus my small tweak:
%% Old answer
%Just to be safe, let's clear everything
clear all
data = cell(1,10);
% Random length of the data
l = randi(500, 10, 1) + 50;
% Random "width" of the data, with 3 more likely
w = randi(4, 10, 1);
w(w==4) = 3;
% random "direction" of the data
d = randi(2, 10, 1);
% sigma of the data (in fraction of mean)
sigma = rand(10,1) / 3;
% means of the data
dmean = randi(150,10,1);
dsigma = dmean.*sigma;
for c = 1 : 10
if d(c) == 1
data{c} = randn(l(c), w(c)) .* dsigma(c) + dmean(c);
else
data{c} = randn(w(c), l(c)) .* dsigma(c) + dmean(c);
end
end
%============================================
%Next thing is
% On the bar, I want it to be shown the median of the values, and I
% want to calculate the confidence interval and show it additionally.
%
%Are you really sure you want to plot the median? The median of some data
%is not connected to the variance of the data, and hus no type of error
%bars are required. I guess you want to show the mean. If you really want
%to show the median, a box plot might be a better alternative.
%
%The following code computes and plots the mean in a bar plot:
%============================================
means = zeros(numel(data),3);
stds = zeros(numel(data),3);
n = zeros(numel(data),3);
for c = 1:numel(data)
d = data{c};
if size(d,1) < size(d,2)
d = d';
end
cols = size(d,2);
means(c, 1:cols) = nanmean(d);
stds(c, 1:cols) = nanstd(d);
n(c, 1:cols) = sum(~isnan((d)));
end
b = bar(means);
%% New code
%This ensures that b continues to reference existing data in the next for
%loop, as the graphics objects can otherwise be deleted.
hold on
%% Continuing Patrick Happel's answer
%============================================
%Now, we need to compute the length of the error bars. Typical choices are
%the standard deviation of the data (already computed by the code above,
%stored in stds), the standard error or the 95% confidence interval (which
%is the 1.96fold of the standard error, assuming the underlying data
%follows a normal distribution).
%============================================
% for standard deviation use stds
% for standard error
ste = stds./sqrt(n);
% for 95% confidence interval
ci95 = 1.96 * ste;
%============================================
%Last thing is to plot the error bars. Here I chose the ci95 as you asked
%in your question, if you want to change that, simply change the variable
%in the call to errorbar:
%============================================
for c = 1:3
size(means(:, c))
size(b(c).XData)
e = errorbar(b(c).XData + b(c).XOffset, means(:,c), ci95(:, c));
e.LineStyle = 'none';
end

Since I am not sure how your data looks like, since in your question you stated that the elements of the cell contain data with different dimension like
3x100, 3x40, 66x2
I assume that your data can be arranged in columns or rows and that not all data requires three bars.
Since you did not provide a short piece of your data for us to test, I generate some artificial data:
data = cell(1,10);
% Random length of the data
l = randi(500, 10, 1) + 50;
% Random "width" of the data, with 3 more likely
w = randi(4, 10, 1);
w(w==4) = 3;
% random "direction" of the data
d = randi(2, 10, 1);
% sigma of the data (in fraction of mean)
sigma = rand(10,1) / 3;
% means of the data
dmean = randi(150,10,1);
dsigma = dmean.*sigma;
for c = 1 : 10
if d(c) == 1
data{c} = randn(l(c), w(c)) .* dsigma(c) + dmean(c);
else
data{c} = randn(w(c), l(c)) .* dsigma(c) + dmean(c);
end
end
Next thing is
On the bar, I want it to be shown the median of the values, and I want to calculate the confidence interval and show it additionally.
Are you really sure you want to plot the median? The median of some data is not connected to the variance of the data, and hus no type of error bars are required. I guess you want to show the mean. If you really want to show the median, a box plot might be a better alternative.
The following code computes and plots the mean in a bar plot:
means = zeros(numel(data),3);
stds = zeros(numel(data),3);
n = zeros(numel(data),3);
for c = 1:numel(data)
d = data{c};
if size(d,1) < size(d,2)
d = d';
end
cols = size(d,2);
means(c, 1:cols) = nanmean(d);
stds(c, 1:cols) = nanstd(d);
n(c, 1:cols) = sum(~isnan((d)));
end
b = bar(means);
Now, we need to compute the length of the error bars. Typical choices are the standard deviation of the data (already computed by the code above, stored in stds), the standard error or the 95% confidence interval (which is the 1.96fold of the standard error, assuming the underlying data follows a normal distribution).
% for standard deviation use stds
% for standard error
ste = stds./sqrt(n);
% for 95% confidence interval
ci95 = 1.96 * ste;
Last thing is to plot the error bars. Here I chose the ci95 as you asked in your question, if you want to change that, simply change the variable in the call to errorbar:
for c = 1:3
size(means(:, c))
size(b(c).XData)
e = errorbar(b(c).XData + b(c).XOffset, means(:,c), ci95(:, c));
e.LineStyle = 'none';
end

Related

Verify Law of Large Numbers in MATLAB

The problem:
If a large number of fair N-sided dice are rolled, the average of the simulated rolls is likely to be close to the mean of 1,2,...N i.e. the expected value of one die. For example, the expected value of a 6-sided die is 3.5.
Given N, simulate 1e8 N-sided dice rolls by creating a vector of 1e8 uniformly distributed random integers. Return the difference between the mean of this vector and the mean of integers from 1 to N.
My code:
function dice_diff = loln(N)
% the mean of integer from 1 to N
A = 1:N
meanN = sum(A)/N;
% I do not have any idea what I am doing here!
V = randi(1e8);
meanvector = V/1e8;
dice_diff = meanvector - meanN;
end
First of all, make sure everytime you ask a question that it is as clear as possible, to make it easier for other users to read.
If you check how randi works, you can see this:
R = randi(IMAX,N) returns an N-by-N matrix containing pseudorandom
integer values drawn from the discrete uniform distribution on 1:IMAX.
randi(IMAX,M,N) or randi(IMAX,[M,N]) returns an M-by-N matrix.
randi(IMAX,M,N,P,...) or randi(IMAX,[M,N,P,...]) returns an
M-by-N-by-P-by-... array. randi(IMAX) returns a scalar.
randi(IMAX,SIZE(A)) returns an array the same size as A.
So, if you want to use randi in your problem, you have to use it like this:
V=randi(N, 1e8,1);
and you need some more changes:
function dice_diff = loln(N)
%the mean of integer from 1 to N
A = 1:N;
meanN = mean(A);
V = randi(N, 1e8,1);
meanvector = mean(V);
dice_diff = meanvector - meanN;
end
For future problems, try using the command
help randi
And matlab will explain how the function randi (or other function) works.
Make sure to check if the code above gives the desired result
As pointed out, take a closer look at the use of randi(). From the general case
X = randi([LowerInt,UpperInt],NumRows,NumColumns); % UpperInt > LowerInt
you can adapt to dice rolling by
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
as an example. Exchanging NumRolls and NumSamplePaths will yield Rolls.', or transpose(Rolls).
According to the Law of Large Numbers, the updated sample average after each roll should converge to the true mean, ExpVal (short for expected value), as the number of rolls (trials) increases. Notice that as NumRolls gets larger, the sample mean converges to the true mean. The image below shows this for two sample paths.
To get the sample mean for each number of dice rolls, I used arrayfun() with
CumulativeAvg1 = arrayfun(#(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
which is equivalent to using the cumulative sum, cumsum(), to get the same result.
CumulativeAvg1 = (cumsum(Rolls(:,1))./(1:NumRolls).'); % equivalent
% MATLAB R2019a
% Create Dice
NumSides = 6; % positive nonzero integer
NumRolls = 200;
NumSamplePaths = 2;
% Roll Dice
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
% Output Statistics
ExpVal = mean(1:NumSides);
CumulativeAvg1 = arrayfun(#(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
CumulativeAvgError1 = CumulativeAvg1 - ExpVal;
CumulativeAvg2 = arrayfun(#(jj)mean(Rolls(1:jj,2)),[1:NumRolls]);
CumulativeAvgError2 = CumulativeAvg2 - ExpVal;
% Plot
figure
subplot(2,1,1), hold on, box on
plot(1:NumRolls,CumulativeAvg1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvg2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(ExpVal,'k-')
title('Average')
xlabel('Number of Trials')
ylim([1 NumSides])
subplot(2,1,2), hold on, box on
plot(1:NumRolls,CumulativeAvgError1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvgError2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(0,'k-')
title('Error')
xlabel('Number of Trials')

Reverse-calculating original data from a known moving average

I'm trying to estimate the (unknown) original datapoints that went into calculating a (known) moving average. However, I do know some of the original datapoints, and I'm not sure how to use that information.
I am using the method given in the answers here: https://stats.stackexchange.com/questions/67907/extract-data-points-from-moving-average, but in MATLAB (my code below). This method works quite well for large numbers of data points (>1000), but less well with fewer data points, as you'd expect.
window = 3;
datapoints = 150;
data = 3*rand(1,datapoints)+50;
moving_averages = [];
for i = window:size(data,2)
moving_averages(i) = mean(data(i+1-window:i));
end
length = size(moving_averages,2)+(window-1);
a = (tril(ones(length,length),window-1) - tril(ones(length,length),-1))/window;
a = a(1:length-(window-1),:);
ai = pinv(a);
daily = mtimes(ai,moving_averages');
x = 1:size(data,2);
figure(1)
hold on
plot(x,data,'Color','b');
plot(x(window:end),moving_averages(window:end),'Linewidth',2,'Color','r');
plot(x,daily(window:end),'Color','g');
hold off
axis([0 size(x,2) min(daily(window:end))-1 max(daily(window:end))+1])
legend('original data','moving average','back-calculated')
Now, say I know a smattering of the original data points. I'm having trouble figuring how might I use that information to more accurately calculate the rest. Thank you for any assistance.
You should be able to calculate the original data exactly if you at any time can exactly determine one window's worth of data, i.e. in this case n-1 samples in a window of length n. (In your case) if you know A,B and (A+B+C)/3, you can solve now and know C. Now when you have (B+C+D)/3 (your moving average) you can exactly solve for D. Rinse and repeat. This logic works going backwards too.
Here is an example with the same idea:
% the actual vector of values
a = cumsum(rand(150,1) - 0.5);
% compute moving average
win = 3; % sliding window length
idx = hankel(1:win, win:numel(a));
m = mean(a(idx));
% coefficient matrix: m(i) = sum(a(i:i+win-1))/win
A = repmat([ones(1,win) zeros(1,numel(a)-win)], numel(a)-win+1, 1);
for i=2:size(A,1)
A(i,:) = circshift(A(i-1,:), [0 1]);
end
A = A / win;
% solve linear system
%x = A \ m(:);
x = pinv(A) * m(:);
% plot and compare
subplot(211), plot(1:numel(a),a, 1:numel(m),m)
legend({'original','moving average'})
title(sprintf('length = %d, window = %d',numel(a),win))
subplot(212), plot(1:numel(a),a, 1:numel(a),x)
legend({'original','reconstructed'})
title(sprintf('error = %f',norm(x(:)-a(:))))
You can see the reconstruction error is very small, even using the data sizes in your example (150 samples with a 3-samples moving average).

separate 'entangled' vectors in Matlab

I have a set of three vectors (stored into a 3xN matrix) which are 'entangled' (e.g. some value in the second row should be in the third row and vice versa). This 'entanglement' is based on looking at the figure in which alpha2 is plotted. To separate the vector I use a difference based approach where I calculate the difference of one value with respect the three next values (e.g. comparing (1,i) with (:,i+1)). Then I take the minimum and store that. The method works to separate two of the three vectors, but not for the last.
I was wondering if you guys can share your ideas with me how to solve this problem (if possible). I have added my coded below.
Thanks in advance!
Problem in figures:
clear all; close all; clc;
%%
alpha2 = [-23.32 -23.05 -22.24 -20.91 -19.06 -16.70 -13.83 -10.49 -6.70;
-0.46 -0.33 0.19 2.38 5.44 9.36 14.15 19.80 26.32;
-1.58 -1.13 0.06 0.70 1.61 2.78 4.23 5.99 8.09];
%%% Original
figure()
hold on
plot(alpha2(1,:))
plot(alpha2(2,:))
plot(alpha2(3,:))
%%% Store start values
store1(1,1) = alpha2(1,1);
store2(1,1) = alpha2(2,1);
store3(1,1) = alpha2(3,1);
for i=1:size(alpha2,2)-1
for j=1:size(alpha2,1)
Alpha1(j,i) = abs(store1(1,i)-alpha2(j,i+1));
Alpha2(j,i) = abs(store2(1,i)-alpha2(j,i+1));
Alpha3(j,i) = abs(store3(1,i)-alpha2(j,i+1));
[~, I] = min(Alpha1(:,i));
store1(1,i+1) = alpha2(I,i+1);
[~, I] = min(Alpha2(:,i));
store2(1,i+1) = alpha2(I,i+1);
[~, I] = min(Alpha3(:,i));
store3(1,i+1) = alpha2(I,i+1);
end
end
%%% Plot to see if separation worked
figure()
hold on
plot(store1)
plot(store2)
plot(store3)
Solution using extrapolation via polyfit:
The idea is pretty simple: Iterate over all positions i and use polyfit to fit polynomials of degree d to the d+1 values from F(:,i-(d+1)) up to F(:,i). Use those polynomials to extrapolate the function values F(:,i+1). Then compute the permutation of the real values F(:,i+1) that fits those extrapolations best. This should work quite well, if there are only a few functions involved. There is certainly some room for improvement, but for your simple setting it should suffice.
function F = untangle(F, maxExtrapolationDegree)
%// UNTANGLE(F) untangles the functions F(i,:) via extrapolation.
if nargin<2
maxExtrapolationDegree = 4;
end
extrapolate = #(f) polyval(polyfit(1:length(f),f,length(f)-1),length(f)+1);
extrapolateAll = #(F) cellfun(extrapolate, num2cell(F,2));
fitCriterion = #(X,Y) norm(X(:)-Y(:),1);
nFuncs = size(F,1);
nPoints = size(F,2);
swaps = perms(1:nFuncs);
errorOfFit = zeros(1,size(swaps,1));
for i = 1:nPoints-1
nextValues = extrapolateAll(F(:,max(1,i-(maxExtrapolationDegree+1)):i));
for j = 1:size(swaps,1)
errorOfFit(j) = fitCriterion(nextValues, F(swaps(j,:),i+1));
end
[~,j_bestSwap] = min(errorOfFit);
F(:,i+1) = F(swaps(j_bestSwap,:),i+1);
end
Initial solution: (not that pretty - Skip this part)
This is a similar solution that tries to minimize the sum of the derivatives up to some degree of the vector valued function F = #(j) alpha2(:,j). It does so by stepping through the positions i and checks all possible permutations of the coordinates of i to get a minimal seminorm of the function F(1:i).
(I'm actually wondering right now if there is any canonical mathematical way to define the seminorm so we get our expected results... I initially was going for the H^1 and H^2 seminorms, but they didn't quite work...)
function F = untangle(F)
nFuncs = size(F,1);
nPoints = size(F,2);
seminorm = #(x,i) sum(sum(abs(diff(x(:,1:i),1,2)))) + ...
sum(sum(abs(diff(x(:,1:i),2,2)))) + ...
sum(sum(abs(diff(x(:,1:i),3,2)))) + ...
sum(sum(abs(diff(x(:,1:i),4,2))));
doSwap = #(x,swap,i) [x(:,1:i-1), x(swap,i:end)];
swaps = perms(1:nFuncs);
normOfSwap = zeros(1,size(swaps,1));
for i = 2:nPoints
for j = 1:size(swaps,1)
normOfSwap(j) = seminorm(doSwap(F,swaps(j,:),i),i);
end
[~,j_bestSwap] = min(normOfSwap);
F = doSwap(F,swaps(j_bestSwap,:),i);
end
Usage:
The command alpha2 = untangle(alpha2); will untangle your functions:
It should even work for more complicated data, like these shuffled sine-waves:
nPoints = 100;
nFuncs = 5;
t = linspace(0, 2*pi, nPoints);
F = bsxfun(#(a,b) sin(a*b), (1:nFuncs).', t);
for i = 1:nPoints
F(:,i) = F(randperm(nFuncs),i);
end
Remark: I guess if you already know that your functions will be quadratic or some other special form, RANSAC would be a better idea for larger number of functions. This could also be useful if the functions are not given with the same x-value spacing.

How to make a graph from function output in matlab

I'm completely lost at this using MATLAB functions, so here is the case:
lets assume I have SUM=0, and
I have a constant probability P that the user gives me, and I have to compare this constant P, with other M (also user gives M) random probabilities, if P is larger I add 1 to SUM, if P is smaller I add -1 to SUM... and at the end I want print on the screen the graph of the process.
I managed till now to make only one stage with this code:
function [result] = ex1(p)
if (rand>=p) result=1;
else result=-1;
end
(its like M=1)
How do You suggest I can modify this code in order to make it work the way I described it before (including getting a graph) ?
Or maybe I'm getting the logic wrong? the question says I get 1 with probability P, and -1 with probability (1-P), and the SUM is the same
Many thanks
I'm not sure how you achieve your input, but this should get you on the way:
p = 0.5; % Constant probability
m = 10;
randoms = rand(m,1) % Random probabilities
results = ones(m,1);
idx = find(randoms < p)
results(idx) = -1;
plot(cumsum(results))
For m = 1000:
You can do it like this:
p = 0.25; % example data
M = 20; % example data
random = rand(M,1); % generate values
y = cumsum(2*(random>=p)-1); % compute cumulative sum of +1/-1
plot(y) % do the plot
The important function here is cumsum, which does the cumulative sum on the sequence of +1/-1 values generated by 2*(random>=p)-1.
Example graph with p=0.5, M=2000:

Matlab fourier descriptors what's wrong?

I am using Gonzalez frdescp function to get Fourier descriptors of a boundary. I use this code, and I get two totally different sets of numbers describing two identical but different in scale shapes.
So what is wrong?
im = imread('c:\classes\a1.png');
im = im2bw(im);
b = bwboundaries(im);
f = frdescp(b{1}); // fourier descriptors for the boundary of the first object ( my pic only contains one object anyway )
// Normalization
f = f(2:20); // getting the first 20 & deleting the dc component
f = abs(f) ;
f = f/f(1);
Why do I get different descriptors for identical - but different in scale - two circles?
The problem is that the frdescp code (I used this code, that should be the same as referred by you) is written also in order to center the Fourier descriptors.
If you want to describe your shape in a correct way, it is mandatory to mantain some descriptors that are symmetric with respect to the one representing the DC component.
The following image summarize the concept:
In order to solve your problem (and others like yours), I wrote the following two functions:
function descriptors = fourierdescriptor( boundary )
%I assume that the boundary is a N x 2 matrix
%Also, N must be an even number
np = size(boundary, 1);
s = boundary(:, 1) + i*boundary(:, 2);
descriptors = fft(s);
descriptors = [descriptors((1+(np/2)):end); descriptors(1:np/2)];
end
function significativedescriptors = getsignificativedescriptors( alldescriptors, num )
%num is the number of significative descriptors (in your example, is was 20)
%In the following, I assume that num and size(alldescriptors,1) are even numbers
dim = size(alldescriptors, 1);
if num >= dim
significativedescriptors = alldescriptors;
else
a = (dim/2 - num/2) + 1;
b = dim/2 + num/2;
significativedescriptors = alldescriptors(a : b);
end
end
Know, you can use the above functions as follows:
im = imread('test.jpg');
im = im2bw(im);
b = bwboundaries(im);
b = b{1};
%force the number of boundary points to be even
if mod(size(b,1), 2) ~= 0
b = [b; b(end, :)];
end
%define the number of significative descriptors I want to extract (it must be even)
numdescr = 20;
%Now, you can extract all fourier descriptors...
f = fourierdescriptor(b);
%...and get only the most significative:
f_sign = getsignificativedescriptors(f, numdescr);
I just went through the same problem with you.
According to this link, if you want invariant to scaling, make the comparison ratio-like, for example by dividing every Fourier coefficient by the DC-coefficient. f*1 = f1/f[0], f*[2]/f[0], and so on. Thus, you need to use the DC-coefficient where the f(1) in your code is not the actual DC-coefficient after your step "f = f(2:20); % getting the first 20 & deleting the dc component". I think the problem can be solved by keeping the value of the DC-coefficient first, the code after adjusted should be like follows:
% Normalization
DC = f(1);
f = f(2:20); % getting the first 20 & deleting the dc component
f = abs(f) ; % use magnitudes to be invariant to translation & rotation
f = f/DC; % divide the Fourier coefficients by the DC-coefficient to be invariant to scale