Draw Normal Distribution Graph of a Sample in Matlab - matlab

I have 100 sampled numbers, and I need to draw the normal distribution curve of them in matlab.
The mean and standard deviation of these sampled data can be calculated easily, but is there any function that plots the normal distribution?

If you have access to Statistics Toolbox, the function histfit does what I think you need:
>> x = randn(10000,1);
>> histfit(x)
Just like with the hist command, you can also specify the number of bins, and you can also specify which distribution is used (by default, it's a normal distribution).
If you don't have Statistics Toolbox, you can reproduce a similar effect using a combination of the answers from #Gunther and #learnvst.

Use hist:
hist(data)
It draws a histogram plot of your data:
You can also specify the number of bins to draw, eg:
hist(data,5)
If you only want to draw the resulting pdf, create it yourself using:
mu=mean(data);
sg=std(data);
x=linspace(mu-4*sg,mu+4*sg,200);
pdfx=1/sqrt(2*pi)/sg*exp(-(x-mu).^2/(2*sg^2));
plot(x,pdfx);
You probably can overlay this on the previous hist plot (I think you need to scale things first however, the pdf is in the range 0-1, and the histogram is in the range: number of elements per bin).

If you want to draw a Gaussian distribution for your data, you can use the following code, replacing mean and standard deviation values with those calculated from your data set.
STD = 1;
MEAN = 2;
x = -4:0.1:4;
f = ( 1/(STD*sqrt(2*pi)) ) * exp(-0.5*((x-MEAN)/STD).^2 );
hold on; plot (x,f);
The array x in this example is the xaxis of your distribution, so change that to whatever range and sampling density you have.
If you want to draw your Gaussian fit over your data without the aid of the signal processing toolbox, the following code will draw such a plot with correct scaling. Just replace y with your own data.
y = randn(1000,1) + 2;
x = -4:0.1:6;
n = hist(y,x);
bar (x,n);
MEAN = mean(y);
STD = sqrt(mean((y - MEAN).^2));
f = ( 1/(STD*sqrt(2*pi)) ) * exp(-0.5*((x-MEAN)/STD).^2 );
f = f*sum(n)/sum(f);
hold on; plot (x,f, 'r', 'LineWidth', 2);

Related

How to make a vector that follows a certain trend?

I have a set of data with over 4000 points. I want to exclude grooves from them, ideally from the point from which they start. The data look for example like this:
The problem with this is the noise I get at the top of the plateaus. I have an idea, in which I would take an average value of the most common within some boundaries (again, ideally sth like the red line here:
and then I would construct a temporary matrix, which would fill up one by one with Y if they are less than this average. If the Y(i) would rise above average, the matrix would find its minima and compare it with the global minima. If the temporary matrix's minima wouldn't be sth like 80% of the global minima, it would be discarded as noise.
I've tried using mean(Y), interpolating and fitting it in a polynomial (the green line) - none of those method would cut it to the point I would be satisfied.
I need this to be extremely robust and it doesn't need to be quick. The top and bottom values can vary a lot, as well as the shape of the plateaus. The groove width is more or less the same.
Do you have any ideas? Again, the point is to extract the values that would make the groove.
How about a median filter?
Let's define some noisy data similar to yours, and plot it in blue:
x = .2*sin((0:9999)/1000); %// signal
x(1000:1099) = x(1000:1099) + sin((0:99)/50*pi); %// noise: spike
x(5000:5199) = x(5000:5199) - sin((0:199)/100*pi); %// noise: wider spike
x = x + .05*sin((0:9999)/10); %// noise: high-freq ripple
plot(x)
Now apply the median filter (using medfilt2 from the Image Processing Toolbox) and plot in red. The parameter k controls the filter memory. It should chosen to be large compared to noise variations, and small compared to signal variations:
k = 500; %// filter memory. Choose as needed
y = medfilt2(x,[1 k]);
hold on
plot(y, 'r', 'linewidth', 2)
In case you don't have the image processing toolbox and can't use medfilt2 a method that's more manual. Skip the extreme values, and do a curve fit with sin1 as curve type. Note that this will only work if the signal is in fact a sine wave!
x = linspace(0,3*pi,1000);
y1 = sin(x) + rand()*sin(100*x).*(mod(round(10*x),5)<3);
y2 = 20*(mod(round(5*x),5) == 0).*sin(20*x);
y = y1 + y2; %// A messy sine-wave
yy = y; %// Store the messy sine-wave
[~, idx] = sort(y);
y(idx(1:round(0.15*end))) = y(idx(round(0.15*end))); %// Flatten out the smallest values
y(idx(round(0.85*end):end)) = y(idx(round(0.85*end)));%// Flatten out the largest values
[foo goodness output] = fit(x.',y.', 'sin1'); %// Do a curve fit
plot(foo,x,y) %// Plot it
hold on
plot(x,yy,'black')
Might not be perfect, but it's a step in the right direction.

Plotting many lines as a heatmap

I have a large number (~1000) of files from a data logger that I am trying to process.
If I wanted to plot the trend from a single one of these log files I could do it using
plot(timevalues,datavalues)
I would like to be able to view all of these lines at same time in a similar way to how an oscilloscope has a "persistant" mode.
I can probably cobble together something that uses histograms but am hoping there is pre-existing or more elegant solution to this problem.
You can do exactly what you are suggesting yourself, i.e. plotting the heatmap of the signals.
Consider the following: I'll build a test signals (out of sine waves of different amplitude), then I'll plot the heatmap via hist3 and imagesc.
The idea is to build an auxiliary signal which is just the juxtaposition of all your time histories (both in x and y), then extract basic bivariate statistics out of that.
% # Test signals
xx = 0 : .01 : 2* pi;
center = 1;
eps_ = .2;
amps = linspace(center - eps_ , center + eps_ , 100 );
% # the auxiliary signal will be stored in the following variables
yy = [];
xx_f = [];
for A = amps
xx_f = [xx_f,xx];
yy = [yy A*sin(xx)];
end
% # final heat map
colormap(hot)
[N,C] = hist3([xx_f' yy'],[100 100]);
imagesc(C{1},C{2},N')
You can use also jet colormap instead of hot colormap for readability.
In the following the amplitude is gaussian instead of homogeneus.
here's a "primitive" solution that is just using hist:
%# generate some fake data
x=-8:0.01:8;
y=10*sinc(x);
yy=bsxfun(#plus,y,0.1*randn(numel(x),1000)' );
yy(randi(1000,1,200),:)= 5-randi(10)+ circshift(yy(randi(1000,1,200),:),[1 randi(numel(x),1,200)]);
%# get plot limit parameters
plot(x,yy)
yl=get(gca,'Ylim');
xl=get(gca,'Xlim');
close all;
%# set 2-d histogram ranges
ybins=100;
xbins=numel(x);
yrange=linspace(yl(1),yl(2),ybins);
xrange=linspace(xl(1),xl(2),xbins);
%# prealocate
m=zeros(numel(yrange),numel(xrange));
% build 2d hist
for n=1:numel(x)
ind=hist(yy(:,n),yrange);
m(:,n)=m(:,n)+ind(:);
end
imagesc(xrange,yrange,m)
set(gca,'Ydir','normal')
Why don't you normalize the data and then add all the lines together? You could then plot the heatmap from the single datafile.

Distribution histogram

Hi i am trying to make a simple distribution histogram using some code from stack overflow
however i am unable to get it to work. i know that there are is a simple method for this using statistic toolbox but form a learning point of view i prefer a more explanatory code - can any one help me ?
%%
clear all
load('Mini Project 1.mat')
% Set data to var2
data = var2;
% Set the number of bins
nbins = 0:.8:8;
% Create a histogram plot of data sorted into (nbins) equally spaced bins
n = hist(data,nbins);
% Plot a bar chart with y values at each x value.
% Notice that the length(x) and length(y) have to be same.
bar(nbins,n);
MEAN = mean(data);
STD = sqrt(mean((data - MEAN).^2)); % can also use the simple std(data)
f = ( 1/(STD*sqrt(2*pi)) ) * exp(-0.5*((nbins-MEAN)/STD).^2 );
f = f*sum(nbins)/sum(f);
hold on;
% Plots a 2-D line plot(x,y) with the normal distribution,
% c = color cyan , Width of line = 2
plot (data,f, 'c', 'LineWidth', 2);
xlabel('');
ylabel('Firmness of apples after one month of storage')
title('Histogram compared to normal distribution');
hold of
You are confusing
hist
with
histc
Read up on both.
Also, you are not defining the number of bins, you are defining the bins themselves .
I don't have Matlab at hand now, but try the following:
If you want to compare a normal distribution to the bar plot bar(nbins,n), you should first normalize it:
bar(nbins,n/sum(n))
See if this solves your problem.
If not, try also removing the line f = f*sum(nbins)/sum(f);.

Relative Frequency Histograms and Probability Density Functions

The function called DicePlot simulates rolling 10 dice 5000 times.
The function calculates the sum of values of the 10 dice of each roll, which will be a 1 ⇥ 5000 vector, and plot relative frequency histogram with edges of bins being selected in where each bin in the histogram represents a possible value of for the sum of the dice.
The mean and standard deviation of the 1 ⇥ 5000 sums of dice values will be computed, and the probability density function of normal distribution (with the mean and standard deviation computed) on top of the relative frequency histogram will be plotted.
Below is my code so far - What am I doing wrong? The graph shows up but not the extra red line on top? I looked at answers like this, and I don't think I'll be plotting anything like the Gaussian function.
% function[]= DicePlot()
for roll=1:5000
diceValues = randi(6,[1, 10]);
SumDice(roll) = sum(diceValues);
end
distr=zeros(1,6*10);
for i = 10:60
distr(i)=histc(SumDice,i);
end
bar(distr,1)
Y = normpdf(X)
xlabel('sum of dice values')
ylabel('relative frequency')
title(['NumDice = ',num2str(NumDice),' , NumRolls = ',num2str(NumRolls)]);
end
It is supposed to look like
But it looks like
The red line is not there because you aren't plotting it. Look at the documentation for normpdf. It computes the pdf, it doesn't plot it. So you problem is how do you add this line to the plot. The answer to that problem is to google "matlab hold on".
Here's some code to get you going in the right direction:
% Normalize your distribution
normalizedDist = distr/sum(distr);
bar(normalizedDist ,1);
hold on
% Setup your density function using the mean and std of your sample data
mu = mean(SumDice);
stdv = std(SumDice);
yy = normpdf(xx,mu,stdv);
xx = linspace(0,60);
% Plot pdf
h = plot(xx,yy,'r'); set(h,'linewidth',1.5);

Representing three variables in a three dimension plot

I have a problem dealing with 3rd dimension plot for three variables.
I have three matrices: Temperature, Humidity and Power. During one year, at every hour, each one of the above were measured. So, we have for each matrix 365*24 = 8760 points. Then, one average point is taken every day. So,
Tavg = 365 X 1
Havg = 365 X 1
Pavg = 365 X 1
In electrical point of veiw, the power depends on the temperature and humidity. I want to discover this relation using a three dimensional plot.
I tried using mesh, meshz, surf, plot3, and many other commands in MATLAB but unfortunately I couldn't get what I want. For example, let us take first 10 days. Here, every day is represented by average temperature, average humidity and average power.
Tavg = [18.6275
17.7386
15.4330
15.4404
16.4487
17.4735
19.4582
20.6670
19.8246
16.4810];
Havg = [75.7105
65.0892
40.7025
45.5119
47.9225
62.8814
48.1127
62.1248
73.0119
60.4168];
Pavg = [13.0921
13.7083
13.4703
13.7500
13.7023
10.6311
13.5000
12.6250
13.7083
12.9286];
How do I represent these matrices by three dimension plot?
The challenge is that the 3-D surface plotting functions (mesh, surf, etc.) are looking for a 2-D matrix of z values. So to use them you need to construct such a matrix from the data.
Currently the data is sea of points in 3-D space, so, you have to map these points to a surface. A simple approach to this is to divide up the X-Y (temperature-humidity) plane into bins and then take the average of all of the Z (power) data. Here is some sample code for this that uses accumarray() to compute the averages for each bin:
% Specify bin sizes
Tbin = 3;
Hbin = 20;
% Create binned average array
% First create a two column array of bin indexes to use as subscripts
subs = [round(Havg/Hbin)+1, round(Tavg/Tbin)+1];
% Now create the Z (power) estimate as the average value in each bin
Pest = accumarray(subs,Pavg,[],#mean);
% And the corresponding X (temp) & Y (humidity) vectors
Tval = Tbin/2:Tbin:size(Pest,2)*Tbin;
Hval = Hbin/2:Hbin:size(Pest,1)*Hbin;
% And create the plot
figure(1)
surf(Tval, Hval, Pest)
xlabel('Temperature')
ylabel('Humidity')
zlabel('Power')
title('Simple binned average')
xlim([14 24])
ylim([40 80])
The graph is a bit coarse (can't post image yet, since I am new) because we only have a few data points. We can enhance the visualization by removing any empty bins by setting their value to NaN. Also the binning approach hides any variation in the Z (power) data so we can also overlay the orgional point cloud using plot3 without drawing connecting lines. (Again no image b/c I am new)
Additional code for the final plot:
%% Expanded Plot
% Remove zeros (useful with enough valid data)
%Pest(Pest == 0) = NaN;
% First the original points
figure(2)
plot3(Tavg, Havg, Pavg, '.')
hold on
% And now our estimate
% The use of 'FaceColor' 'Interp' uses colors that "bleed" down the face
% rather than only coloring the faces away from the origin
surfc(Tval, Hval, Pest, 'FaceColor', 'Interp')
% Make this plot semi-transparent to see the original dots anb back side
alpha(0.5)
xlabel('Temperature')
ylabel('Humidity')
zlabel('Power')
grid on
title('Nicer binned average')
xlim([14 24])
ylim([40 80])
I think you're asking for a surface fit for your data. The Curve Fitting Toolbox handles this nicely:
% Fit model to data.
ft = fittype( 'poly11' );
fitresult = fit( [Tavg, Havg], Pavg, ft);
% Plot fit with data.
plot( fitresult, [xData, yData], zData );
legend( 'fit 1', 'Pavg vs. Tavg, Havg', 'Location', 'NorthEast' );
xlabel( 'Tavg' );
ylabel( 'Havg' );
zlabel( 'Pavg' );
grid on
If you don't have the Curve Fitting Toolbox, you can use the backslash operator:
% Find the coefficients.
const = ones(size(Tavg));
coeff = [Tavg Havg const] \ Pavg;
% Plot the original data points
clf
plot3(Tavg,Havg,Pavg,'r.','MarkerSize',20);
hold on
% Plot the surface.
[xx, yy] = meshgrid( ...
linspace(min(Tavg),max(Tavg)) , ...
linspace(min(Havg),max(Havg)) );
zz = coeff(1) * xx + coeff(2) * yy + coeff(3);
surf(xx,yy,zz)
title(sprintf('z=(%f)*x+(%f)*y+(%f)',coeff))
grid on
axis tight
Both of these fit a linear polynomial surface, i.e. a plane, but you'll probably want to use something more complicated. Both of these techniques can be adapted to this situation. There's more information on this subject at mathworks.com: How can I determine the equation of the best-fit line, plane, or N-D surface using MATLAB?.
You might want to look at Delaunay triangulation:
tri = delaunay(Tavg, Havg);
trisurf(tri, Tavg, Havg, Pavg);
Using your example data, this code generates an interesting 'surface'. But I believe this is another way of doing what you want.
You might also try the GridFit tool by John D'Errico from MATLAB Central. This tool produces a surface similar to interpolating between the data points (as is done by MATLAB's griddata) but with cleaner results because it smooths the resulting surface. Conceptually multiple datapoints for nearby or overlapping X,Y coordinates are averaged to produce a smooth result rather than noisy "ripples." The tool also allows for some extrapolation beyond the data points. Here is a code example (assuming the GridFit Tool has already been installed):
%Establish points for surface
num_points = 20;
Tval = linspace(min(Tavg),max(Tavg),num_points);
Hval = linspace(min(Havg),max(Havg),num_points);
%Do the fancy fitting with smoothing
Pest = gridfit(Tavg, Havg, Pavg, Tval, Hval);
%Plot results
figure(5)
surfc(XI,YI,Pest, 'FaceColor', 'Interp')
To produce an even nicer plot, you can add labels, some transparancy and overlay the original points:
alpha(0.5)
hold on
plot3(Tavg,Havg,Pavg,'.')
xlabel('Temperature')
ylabel('Humidity')
zlabel('Power')
grid on
title('GridFit')
PS: #upperBound: Thanks for the Delaunay triangulation tip. That seems like the way to go if you want to go through each of the points. I am a newbie so can't comment yet.
Below is your solution:
Save/write the Myplot3D function
function [x,y,V]=Myplot3D(X,Y,Z)
x=linspace(X(1),X(end),100);
y=linspace(Y(1),Y(end),100);
[Xt,Yt]=meshgrid(x,y);
V=griddata(X,Y,Z,Xt,Yt);
Call the following from your command line (or script)
[Tavg_new,Pavg_new,V]=Myplot3D(Tavg,Pavg,Havg);
surf(Tavg_new,Pavg_new,V)
colormap jet;
xlabel('Temperature')
ylabel('Power/Pressure')
zlabel('Humidity')