Cumulative distribution in Matlab

I have a question on plotting probability distribution and cumulative distribution curves using Matlab. I apologize for asking a noob question but I am new to Matlab, having only used it for a few hours.
I have a set of data giving the size ranges of the sand particles found on a beach, in millimeters (e.g. >2.00, 1.00–2.00, 0.50–1.00, <0.50).
The corresponding percentages of particles found in each range are, e.g., 30, 25.5, 35.9, 8.6.
How am I supposed to input the values in the Matlab system for it to plot the probability distribution and cumulative distribution curves on the same plot with different colors? Percentage should be the y-axis and the size range should be the x-axis.

If your dataset is literally 4 points, then you can simply enter them literally. For example, if my dataset was {(A, 1), (B, 2), (C, 3)}, then I could simply set y = [1, 2, 3] and x = {'A', 'B', 'C'}.
For distributions, you should take a look at the sum and cumsum functions.
For plotting, take a look at bar for frequency plots and plot for cumulative plots (this is just my preference). The documentation contains information on setting colors.
For plotting on the same graph, look at hold. To label your plot and your axes, look at xlabel, ylabel, and title.
Matlab has a good FAQ on setting the actual values that are displayed on each axis. For example, I can plot my dataset above by plotting the y vector only, and then setting the X tick labels to 'A', 'B', and 'C'.

I'd be careful with the cumulative distribution function (CDF). It might make more sense to reorder your data by increasing particle size (see the fliplr() function); otherwise the interpretation of your CDF will be suspect.
The cumsum() function can get your CDF from the given probability mass function (PMF).
label={'<0.50','0.50-1.00','1.00-2.00','>2.00'}';
pmf = [0.086 0.359 0.255 0.30]';
cdf = cumsum(pmf);
bar(pmf) % PMF
set(gca,'XTickLabel',label)
title('Sand Particle Size Distribution')
xlabel('Sand particle size (mm)')
figure
stairs(cdf,'ks-','LineWidth',2.0) % CDF
set(gca,'XTick',1:length(label),'XTickLabel',label)
ylabel('Percentile')
xlabel('Sand particle size (mm)')
ylim([0 1])
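As a quick check of the numbers above, here is the same running total worked outside MATLAB, as a minimal Python sketch using only the standard library. The variable names are my own; the percentages are the ones from the question, reordered by increasing particle size as in the MATLAB code.

```python
from itertools import accumulate

# Size classes in increasing order, with the percentages from the question.
labels = ["<0.50", "0.50-1.00", "1.00-2.00", ">2.00"]
percent = [8.6, 35.9, 25.5, 30.0]

# Running total, the equivalent of MATLAB's cumsum(); rounded to one decimal
# to hide floating-point noise in the last digits.
cumulative = [round(c, 1) for c in accumulate(percent)]

for label, p, c in zip(labels, percent, cumulative):
    print(f"{label:>10}  {p:5.1f}%  cumulative {c:5.1f}%")
```

The last cumulative value should come out at 100%, which is a handy sanity check that the percentages were entered correctly.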

Poisson fit curve over histogram plot

I'd like to fit my empirical data to a Poisson distribution curve.
I have a given mean value, say 2.3, and the (empirical) data.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.stats import poisson

def fit_poisson(data=None, network=None, mu=2.3):
    sns.set_theme()
    fig, ax = plt.subplots(1, 1)
    x = np.arange(poisson.ppf(0.01, mu),
                  poisson.ppf(0.99, mu))
    sns.histplot(data, stat='density')
    plt.plot(x, poisson.pmf(x, mu))
It plots:
Apparently there is a range issue in y here. Maybe a problem with lambda? How do I properly fit my empirical histogram to a Poisson distribution curve with the same mean?
Poisson random variables are discrete: their y value is "probability" not "density". But the default behavior of histplot avoids guessing that you have discrete data, and it is choosing bins with binwidth < 1 in this case.
Because density normalization forces the area of all bars to sum to 1, that means the density value for the bar containing observations of a certain value will be greater than the probability mass on that value.
There are two relevant parameters here:
stat="probability", which makes the heights of the bars sum to 1, so they will match the PMF (assuming binwidth <= 1, so that only one unique value falls in each bar)
discrete=True, which sets binwidth=1 (and aligns the center of each bar with the integer values)
sns.histplot(data, stat='probability', discrete=True, shrink=.8)
I've also added shrink=0.8, which draws the bars a bit narrower than the binwidth; this helps emphasize the discrete nature of the data.
(Note that with discrete=True (implying binwidth=1), density and probability normalization will do the same thing so that's actually all you need, but Probability is the right y axis label to use here).
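To make the difference between the two normalizations concrete, here is a small standard-library sketch of the arithmetic. The data list, the 0.5 bin width, and the helper poisson_pmf are made up for illustration and are not part of seaborn; the sketch only assumes that each bar holds a single integer value.

```python
from collections import Counter
from math import exp, factorial

def poisson_pmf(k, mu):
    """Probability of observing exactly k events for a Poisson mean mu."""
    return exp(-mu) * mu**k / factorial(k)

# Hypothetical count data, made up for illustration only.
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
n = len(data)
counts = Counter(data)

# Probability normalization: each bar is the fraction of observations.
probability = {k: counts[k] / n for k in sorted(counts)}

# Density normalization divides each fraction by the bin width, so bins
# narrower than 1 push the bars above the probability mass.
binwidth = 0.5
density = {k: p / binwidth for k, p in probability.items()}

mu = 2.3
for k in probability:
    print(k, probability[k], density[k], round(poisson_pmf(k, mu), 4))
```

With a bin width of 0.5 every density bar is twice as tall as the matching probability bar, which is the kind of y-range mismatch described in the question. With a bin width of 1 the two coincide.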

How to make smooth plot with matrix that don't have the same column and line

Let's say we have the following data:
A1= [41.3251
18.2350
9.9891
36.1722
50.8702
32.1519
44.6284
60.0892
58.1297
34.7482
34.6447
6.7361
1.2960
1.9778
2.0422];
A2=[86.3924
86.4882
86.1717
85.8506
85.8634
86.1267
86.4304
86.6406
86.5022
86.1384
86.5500
86.2765
86.7044
86.8075
86.9007];
When I plot the above data using plot(A1,A2);, I get this graph:
Is there any way to make the graph look smooth like a cubic plot?
Yes, you can, by interpolating between the key points, though it takes a bit of trickery. Blindly applying MATLAB's interpolation commands won't work, because they require the independent axis (the x-axis in your case) to be increasing, and your A1 values are not. Instead, create a dummy list of values running from 1 up to the number of elements in A1 (or A2, as they are the same size) to act as the independent axis, and interpolate both arrays separately against it on a finer grid. The fineness of that grid is set by the total number of points you want in the plot: the new points all lie within the range of the dummy list, so the more of them you ask for, the smaller the spacing between them and the smoother the plot should look. Once that is done, plot the two interpolated arrays against each other.
Here's some code for you to run. interp1 does the interpolation, which is most of the work, and linspace creates the finer grid of points over the dummy list. N is the total number of points you want to plot. I've set it to 500, meaning the interpolated curve is evaluated at 500 points spanning your original data. Experiment with increasing (or decreasing) N to see what effect it has on the smoothness of the plot.
I'll use the Piecewise Cubic Hermite Interpolating Polynomial, or pchip, as the interpolation method. It is a piecewise cubic method like cubic spline interpolation, but it is shape-preserving and is not identical to interp1's 'spline' option. Assuming that A1 and A2 are already created:
% Specify number of interpolating points
N = 500;
% Specify dummy list of points
D = 1 : numel(A1);
% Generate finer grid of points
NN = linspace(1, numel(A1), N);
% Interpolate each set of points independently
A1interp = interp1(D, A1, NN, 'pchip');
A2interp = interp1(D, A2, NN, 'pchip');
% Plot the data
plot(A1interp, A2interp);
I now get the following:
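For readers outside MATLAB, the dummy-index idea can be sketched in plain Python. Note the swap: this uses a cubic Hermite interpolant with Catmull-Rom (central-difference) tangents, not pchip's shape-preserving slopes, so the curve will not match MATLAB's output exactly and can overshoot where pchip would not. The function names catmull_rom and smooth_curve are my own, the index runs from 0 rather than 1, and only the first five points of A1 and A2 are used to keep it short.

```python
def catmull_rom(values, t):
    """Cubic Hermite interpolation of `values` at parameter t in [0, len-1].

    Tangents are central differences (Catmull-Rom), one-sided at the ends.
    """
    n = len(values)
    i = min(int(t), n - 2)        # index of the segment containing t
    s = t - i                     # local position within the segment, 0..1

    def slope(j):
        if j == 0:
            return values[1] - values[0]
        if j == n - 1:
            return values[-1] - values[-2]
        return (values[j + 1] - values[j - 1]) / 2.0

    p0, p1 = values[i], values[i + 1]
    m0, m1 = slope(i), slope(i + 1)
    # Standard cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def smooth_curve(xs, ys, num_points):
    """Interpolate xs and ys separately against the shared index parameter."""
    last = len(xs) - 1
    ts = [last * k / (num_points - 1) for k in range(num_points)]
    return ([catmull_rom(xs, t) for t in ts],
            [catmull_rom(ys, t) for t in ts])


# First five points of A1 and A2 from the question.
A1 = [41.3251, 18.2350, 9.9891, 36.1722, 50.8702]
A2 = [86.3924, 86.4882, 86.1717, 85.8506, 85.8634]
xi, yi = smooth_curve(A1, A2, 41)   # 41 samples: 10 steps per segment
```

Because both coordinates are interpolated against the same index, the smoothed curve still visits the original points in their original order, which is the whole point of the dummy list.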

How can I take data and plot it as a normal distribution in MATLAB

I'm new to MATLAB. I want to take a vector of data, normalize it, and plot it as a normal distribution. I have code to normalize my data and plot it as a histogram, but it does not come out normally distributed. Can someone point me in the right direction? The code below normalizes the data:
subplot(3, 1, 1)
[x, y] = hist(data, 50);
bar(y, x/trapz(y, x))
So this normalizes my histogram but does not make a normally distributed curve. The data is not random and is stored as a vector.
What you probably need is histfit, which produces a histogram and fits a distribution to it at the same time. You can choose the number of bins and the distribution to fit as arguments to the function. The example below uses 100 bins for the histogram and fits a normal curve to your data:
histfit(data, 100, 'normal')
The default number of bins is the square root of the number of elements in data, rounded up, and the default distribution is normal. Full details are in the MATLAB documentation for histfit.
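To show roughly what fitting a normal curve involves, here is a standard-library Python sketch. It is not histfit itself: the data values are made up, and I am assuming the fit is the usual sample mean and sample standard deviation, which may not be exactly the estimators MATLAB uses internally.

```python
from math import ceil, sqrt
from statistics import NormalDist, fmean, stdev

# Hypothetical measurements, made up for illustration only.
data = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.9, 6.0, 6.4, 7.1]

# Default bin count described above: square root of the sample size, rounded up.
nbins = ceil(sqrt(len(data)))

# Fit a normal distribution by estimating its mean and standard deviation.
mu = fmean(data)
sigma = stdev(data)          # sample standard deviation (n - 1 denominator)
fitted = NormalDist(mu, sigma)

# Evaluate the fitted density on a grid spanning the data.
lo, hi = min(data), max(data)
grid = [lo + (hi - lo) * k / 50 for k in range(51)]
curve = [fitted.pdf(x) for x in grid]
print(nbins, round(mu, 3), round(sigma, 3))
```

The fitted curve is always bell-shaped, whatever the data look like, so a histogram that visibly departs from it is a sign that the data are not well described by a normal distribution.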

How to plot parametric functions between infinite limits?

How to plot these parametric functions with infinite limits to get a circle using matlab?
x(t)=2t/(1+t.^2)
y(t)=(1-t.^2)/(1+t.^2)
I don't know about infinite limits, but
% Construct a vector of t ranging from a large negative number to a large positive number
t = -1000:0.1:1000;
% Create the x and y vectors from your formulas (with a couple of extra dots for element-wise operations)
x = 2*t./(1+t.^2);
y = (1-t.^2)./(1+t.^2);
% Just normal plotting now
plot(x,y)
gives me a circle. There will still be a tiny gap around (0, -1), however, because that point is only reached as t goes to plus or minus infinity.
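One way to close that gap, sketched here in plain Python as a check of the algebra rather than as MATLAB code, is the substitution t = tan(theta/2). It turns the two formulas into x = sin(theta) and y = cos(theta), so a finite range of theta covers every t from minus to plus infinity. The function name point_from_t and the sample angles are my own.

```python
from math import cos, pi, sin, tan

def point_from_t(t):
    """The parametrisation from the question."""
    return 2 * t / (1 + t**2), (1 - t**2) / (1 + t**2)

# Check at a few angles that t = tan(theta / 2) gives (sin(theta), cos(theta)).
for theta in (-3.0, -1.0, 0.0, 0.5, 2.0, 3.0):
    x, y = point_from_t(tan(theta / 2))
    assert abs(x - sin(theta)) < 1e-12 and abs(y - cos(theta)) < 1e-12

# Sampling theta uniformly over [-pi, pi] therefore traces the whole circle,
# including the point (0, -1) that finite t never reaches.
n = 361
thetas = [-pi + 2 * pi * k / (n - 1) for k in range(n)]
circle = [(sin(th), cos(th)) for th in thetas]
```

Uniform steps in theta also space the plotted points evenly around the circle, whereas uniform steps in t crowd them towards (0, -1).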