How to plot a probability density distribution graph in MATLAB? - matlab

I have about 10000 floating point data, and have read them into a single row matrix.
Now I would like to plot them and show their distribution, would there be some simple functions to do that?
plot() actually plots value with respect to data number...which is not what I want
bar() is similar to what I want, but actually I would like to lower the sample rate and merge neighbor bars which are close enough (e.g. one bar for 0.50-0.55, and one bar for 0.55-0.60, etc) instead of having one single bar for every single data sample.
would there be a function to calculate this distribution by dividing the range into small steps, and outputting the prob density in each step?
Thank you!

hist() would be best. It plots a histogram, with a lot of options which you can see by doc hist, or by checking the Matlab website. Options include a specified number of bins, or a range of bins. This will plot a histogram of 1000 normally random points, with 50 bins.
hist(randn(1000,1),50)

Related

Matlab - how does contour plot generating levels automatically?

I am using contourf to generate filled contour plots on MatLab with specified levels number.
According to the documents (https://www.mathworks.com/help/matlab/ref/contourf.html#mw_9088c636-4036-4e00-bd43-f6c5632b63ec)
It says Specify levels as a scalar value n to display the contour lines at n automatically chosen levels (heights).
I am wondering how does it choose the threshold automatically? What is the algorithm of choosing the thresholds? Take level as 1 as an example.
Many thanks!
As said in the comments, it just makes sure there are n dividing lines between your max and min.
Proof:
n=10;
z=peaks;
[m,c]=contour(z,10,'ShowText','on');
levels=linspace(min(z(:)),max(z(:)),n+2);
isequal(c.LevelList,levels(2:end-1))

Fitting gaussians to close peaks in MATLAB

I have a data set that has two peaks close together. I'd like to fit these peaks with gaussians so that I come up with a new data set that replicates the original one. To this end, I am using MATLAB's "findpeaks" function, and using the heights and widths of the peaks in order to come up with the appropriate number of gaussians, and then add those gaussians together. However, because the peaks are so close together, the result looks like the following (with the original data set in blue and the replicated one in red):
Is there a better method to replicate the data with gaussian peaks?
Gaussian function are defined by two variable, mean and variance. The two peak would give you the means of the two gaussians and by the look of the figure the same variance for them both (If some data has gone through a Gaussian process the variance would be the same, I can not think of a physical process where that would not be the case, unless it is just an arbitrary plot). So you only have to find one variable. As for the peaks that would just be the normalization so that the area under the curve sums up to 1. A gaussian sums up to 1 by default, if the sum under the plot you are trying to fit is 2 you do not need to do anything, otherwise scale accordingly.
My guess is something like this (pseudo code):
f = 0.5*gauss(-3,var)+0.5*gauss(3,var)
If you know more about the process that created the plot, then you can actually do better.

MATLAB: count number of peaks

I have a graph like this and I want to determine the number of peaks. Since it's a wave function the whole graph has many peaks that is why I was unsuccefull in finding the number of peaks using functions such as findpeaks for the graph below it returns a number of peaks around 3000 whereas I want to have the number 7.
My idea was to do a for or while loop that counts the number of instances where the average is higher than 0.5. So ultimately I want a function that iterates in segments over the graph returns the number of peaks and the range of y index values for which this peak occurs (I think the best way to do this would to save them to a zeros matrix).
link of file data: https://www.dropbox.com/s/vv8vqv28mqzfr9l/Example_data.mat?dl=0
Do you mean you are trying to count the 'on' parts of your data?
You're on the right track using findpeaks. If you scroll to the bottom of the documentation you'll see that you can actually tweak the routine in various ways, for example specifying the minimum distance between peaks, or the minimum difference between a point and its neighbour before it is considered a peak.
By defining the minimum distance between peaks, I detected the following 7 peaks. Code is included below. Alternatively you can play around with the other parameters you can pass into findpeaks.
The only other thing to note is that I took the absolute value of your data.
load('Example_data.mat')
indx = 1:numel(number11);
[pks, locs] = findpeaks(abs(number11), indx, 'MinPeakDistance', 0.25e4);
hold on
plot(number11)
plot(locs,pks, 'rx')
disp(numel(pks))

Finding defined peaks with Clusters in MATLAB

this is my problem:
I have the next data "A", which looks like:
As you can see, I have drawn with red circles the apparently peaks, the most defined are 2 and 7, I say that they are defined because its standard deviation is low in comparison with the other peaks (especially the second one).
What I need is a way (anyway) to get the values and the standard deviation of n peaks in a numeric array.
I have tried with "clusters", but I got no good results:
First of all, I used "kmeans" MATLAB function, and I realize that this algorithm doesn't group peaks as I need. As you can see in the picture above, in the red circle, that cluster has at less 3 or 4 peaks. And kmeans need that you set the number of clusters, and I need to identify it automatically.
I hope that anyone can give me some ideas, or a way to get better results, thanks.
Pd: I leave the data "A" in the next link.
https://drive.google.com/file/d/0B4WGV21GqSL5a2EyQ2l0SHZURzA/edit?usp=sharing
The problem is that your axes have very different meaning.
K-means optimizes variance. But variance in X is something entirely different than variance in Y, isn't it? Furthermore, each of these methods will split your data in both X and Y, whereas I assume you want the data to be partitioned on the X axis only.
I suggest the following: consider the Y axis to be a weight, and X axis to be a position.
Then perform weighted density estimation, and look for low density to separate your clusters.
I can't help you with MATLAB. I don't use it.
Mathematically, what you want to do is place a Gaussian at each point, with area Y and center X. Then find minima and maxima on the sum of these Gaussians. See Wikipedia, Kernel Density Estimation for details; except that you want to use the Y axis as weights. You could maybe also use 1/Y as standard deviation, if you don't want to use weights.

Limit data values displayed in MATLAB histogram

I have a vector that I want to print a histogram of of data for. This data ranges from -100 to +100. The amount of data around the outer edges is insignificant and therefore I don't want to see it. I am most interested in showing data from -20 to +20.
1.) How can I limit that window to print on my histogram?
The amount of data I have at 0 outnumbers of the amount of data I have anywhere in the dataset by a minimum of 10:1. When I print the histogram, the layout of element frequency is lost because it is outnumbered by 0.
2.) Is there a way that I can scale the number of 0 values to be three times the number of -1 entries?
I am expecting an exponential drop of this dataset (in general) and therefore three times the frequency of -1 would easily allow me to see the frequency of the other data.
You can use something like
binCenters = -20:5:20;
[N,X] = hist(V,binCenters);
N = N./scalingVector;
bar(X(2:end-1),N(2:end-1));
Note that the code excludes the extremes of N and X from the bar plot, since they contain the number of values smaller than -20 and larger than 20. Also, by building scalingVector appropriately, you can scale N as you please.
You could also just toss out any values outside the [-20,20] range by using
subsetData=data(abs(data)<=20)
1) You can limit the histogram range you see on the plot by just setting the X axes limits:
xlim([-20 20])
Setting bins in hist command is good, but remember thatall the values outside the bins will fall into the most left and right bin. So you will need to set the axes limits anyway.
2) If there is a big difference between values in different bins, one way is to transform values on Y axes to log scale. Unfortunately just setting Y axes to log (set(gca,'YScale','log')) does not work for bar plot. Calculate the histogram with hist or histc (depending on whether you want to specify bins centers or edges) and log2 the values:
[y, xbin] = hist(data);
bar(xbin, log2(y) ,'hist')
Histogram has a few different methods of calling it. I strongly recommend you read the documentation on the function (doc hist)
What you are looking for is to put in a custom range in the histogram bin. It depends a bit on how many bins you want, but something like this will work.
Data=randn(1000,1)*20;
hist(Data,-20:20);
You could, if you want to, change the frequency of the binning as well. You could also change the axis so that you only focus on the range from -20 to 20, using a xaxis([-20 20]) command. You could also ignore the bin at 0, by using an yaxis and limiting the values to exclude the 0 bin. Without knowing what you want exactly, I can only give you suggestions.