How can we calculate the percentage similarity between two histogram patterns?
For example, I have a template histogram, which I call HistA, and another histogram, HistB, and I want to measure how similar HistB is to HistA as a percentage.
I looked at methods such as histogram equalization and histogram matching, but neither of them addresses my problem.
As shown in the image below, I plotted HistA and HistB together. The frequencies come from 1D data.
The patterns of HistA and HistB look almost the same, so I want to know how to calculate the percentage similarity of these two histograms.
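Measure the Bhattacharyya coefficient between the two normalized histograms p and q as
$$BC(p, q) = \sum_{i=1}^{N} \sqrt{p_i \, q_i}$$
where N is the number of bins in the histograms.
Note the normalization: each histogram must sum to 1, so the coefficient ranges from 0 (no overlap) to 1 (identical histograms).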
For more information, see Bhattacharyya distance | Wikipedia or "On a measure of divergence between two statistical populations defined by their probability distributions".
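A minimal MATLAB sketch of this coefficient, assuming both histograms use the same bins and are stored as count vectors. The names histA, histB and the helper bhattacharyya are hypothetical, and the counts are made-up illustration data:

```matlab
% Hypothetical bin counts for the template (histA) and the candidate (histB).
histA = [2 5 9 14 9 5 2];
histB = [1 4 10 15 8 6 3];

% Normalize each histogram to sum to 1, then sum sqrt(p_i * q_i) over the bins.
bhattacharyya = @(a, b) sum(sqrt((a / sum(a)) .* (b / sum(b))));

bc = bhattacharyya(histA, histB);   % 1 means identical shapes, 0 means no overlap
similarityPercent = 100 * bc;
fprintf('Similarity: %.2f%%\n', similarityPercent);
```

Reading the coefficient as a percentage is a convention rather than a statistical test: 100% only says that the normalized shapes coincide.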
I want to carry out hierarchical clustering in MATLAB and plot the clusters on a scatter plot. I used the evalclusters function to first investigate what a 'good' number of clusters would be under different criteria, e.g. Silhouette or CalinskiHarabasz. Here is the code I used for the evaluation (x is my data with 200 observations and 10 variables):
E = evalclusters(x,'linkage','CalinskiHarabasz','KList',1:10)
% store the optimal number of clusters
optk = E.OptimalK;
% save the outputs to a structure
clust_struc(1).Optimalk = optk;
clust_struc(1).method = {'CalinskiHarabasz'};
I then used code similar to what I have found online:
gscatter(x(:,1),x(:,2),E.OptimalY,'rbgckmr','xod*s.p')
%OptimalY is a vector 200 long with the cluster numbers
and this is what I get:
My question may be silly, but I don't understand why I am only using the first two columns of data to produce the scatter plot. I realise that the clusters themselves are incorporated through OptimalY, but should I not be using all of the data in x?
Each row in x is an observation with properties in size(x,2) dimensions. All of these dimensions are used when clustering the rows of x.
However, when plotting the clusters, we cannot plot more than 2-3 dimensions, so we try to represent each element by its key properties. I'm not sure that x(:,1),x(:,2) are the best option, but you have to choose 2 for a 2-D plot.
Usually you would have some property of interest that you want to plot. Have a look at the example in the MATLAB doc: the fisheriris data has 4 different variables, the length and width measurements of the sepals and petals of three species of iris flowers. It is up to you to decide which ones to plot against each other (in the example they chose Petal Length and Petal Width).
Here is a comparison between taking Petals measurements and Sepals measurements as the axis for plotting the grouping:
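The original figure is not reproduced here. A minimal sketch that draws a comparable pair of plots, assuming the Statistics and Machine Learning Toolbox is available (fisheriris, evalclusters and gscatter all come from it). The KList range 1:6 is an arbitrary choice for illustration:

```matlab
load fisheriris                       % provides meas (150x4) and species

% Cluster on ALL four variables; only the plotting is restricted to 2 of them.
E = evalclusters(meas, 'linkage', 'CalinskiHarabasz', 'KList', 1:6);

figure;
subplot(1, 2, 1);
gscatter(meas(:,3), meas(:,4), E.OptimalY);   % columns 3 and 4: petal length, width
xlabel('Petal length'); ylabel('Petal width');
title('Clusters on petal axes');

subplot(1, 2, 2);
gscatter(meas(:,1), meas(:,2), E.OptimalY);   % columns 1 and 2: sepal length, width
xlabel('Sepal length'); ylabel('Sepal width');
title('Clusters on sepal axes');
```

The cluster labels in E.OptimalY are identical in both panels; only the two axes used to display them change.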
What test can I do in MATLAB to test the spread of a histogram? For example, in the given set of histograms, I am only interested in 1,2,3,5 and 7 (going from left to right, top to bottom) because they are less spread out. How can I obtain a value that will tell me if a histogram is positively skewed?
It may be possible using Chi-Squared tests but I am not sure what the MATLAB code will be for that.
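You can use the standard definition of skewness. In other words, you can use:
$$s = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}$$
where \bar{x} is the mean of the data and n is the number of samples.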
You compute the mean of your data and you use the above equation to calculate skewness. Positive and negative skewness are like so:
Source: Wikipedia
As such, the larger the value, the more positively skewed it is. The more negative the value, the more negatively skewed it is.
Computing the mean of your histogram data is quite simple: take the sum of the bin centres weighted by the histogram counts and divide by the total number of entries. Given that your histogram is stored in h and the bin centres are stored in x, you would do the following. I will assume here that the bins run from 0 up to N-1, where N is the total number of bins in the histogram, judging from your picture:
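x = 0:numel(h)-1; %// Bin centres, judging from your pictures
h = h(:).'; %// Make sure h is a row vector like x
num_entries = sum(h);
mu = sum(h.*x) / num_entries;
%// Weight each bin's deviation from the mean by its count
skew = ((1/num_entries)*(sum(h.*(x - mu).^3))) / ...
((1/(num_entries-1))*(sum(h.*(x - mu).^2)))^(3/2);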
skew would contain the numerical measure of skewness for a histogram that follows that formula. Therefore, with your problem statement, you will want to look for skewness numbers that are positive and large. I can't really comment on what threshold you should look at, but look for positive numbers that are much larger than most of the histograms that you have.
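One way to sanity-check a histogram-based skewness value is to expand the counts back into raw samples and apply the same definition to them directly. This is a sketch with made-up counts; h, x and raw are illustrative names, and repelem requires R2015a or later:

```matlab
h = [10 6 3 1];            % hypothetical counts with a tail to the right
x = 0:numel(h)-1;          % bin centres 0..N-1
n = sum(h);

% Skewness from the histogram: each bin's deviation is weighted by its count.
mu = sum(h .* x) / n;
skewHist = (sum(h .* (x - mu).^3) / n) / ...
           (sum(h .* (x - mu).^2) / (n - 1))^(3/2);

% Same definition applied to the expanded samples (bin centre x(i) repeated h(i) times).
raw = repelem(x, h);
skewRaw = (sum((raw - mean(raw)).^3) / n) / ...
          (sum((raw - mean(raw)).^2) / (n - 1))^(3/2);

fprintf('skewHist = %.4f, skewRaw = %.4f\n', skewHist, skewRaw);
```

Both values should agree up to rounding, and a histogram whose tail extends to the right should give a positive number.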
I have about 10000 floating-point values and have read them into a single-row matrix.
Now I would like to plot them and show their distribution. Are there simple functions to do that?
plot() plots each value against its index in the data, which is not what I want.
bar() is similar to what I want, but I would like to lower the resolution and merge neighbouring bars that are close enough (e.g. one bar for 0.50-0.55, one bar for 0.55-0.60, etc.) instead of having one bar for every single data sample.
Is there a function that calculates this distribution by dividing the range into small steps and outputting the probability density in each step?
Thank you!
hist() would be best. It plots a histogram and has a lot of options, which you can see by typing doc hist or by checking the MATLAB website. Options include a specified number of bins or a specified set of bin centres. This will plot a histogram of 1000 normally distributed random points, with 50 bins:
hist(randn(1000,1),50)
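For the probability density in fixed-width steps that the question asks about, a possible sketch uses histcounts (R2014b or later) with explicit bin edges. The data here is random stand-in data, and the 0.05 step simply mirrors the question's 0.50-0.55 example:

```matlab
data = rand(1, 10000);          % stand-in for the real single-row data, values in (0,1)
edges = 0:0.05:1;               % one bin per 0.05-wide step

% 'pdf' normalization divides each count by (number of samples * bin width).
p = histcounts(data, edges, 'Normalization', 'pdf');

centers = edges(1:end-1) + diff(edges) / 2;
bar(centers, p, 1);             % bar width 1 makes neighbouring bars touch
xlabel('Value'); ylabel('Probability density');

totalArea = sum(p .* diff(edges));   % should be 1 when all data lies inside the edges
```

If some samples fall outside the chosen edges, they are not counted in any bin and the area will come out below 1.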
I want to calculate the p-value of the comparison between my image and a ground truth image (reference image). I read that it can be calculated from the sensitivity and specificity values. Is that possible? Could you show me a formula that applies to two images, or a function in MATLAB?
p-values only make sense when you can model the statistical distribution of data.
In order to compute the p-value, first compute the c.d.f.:
https://en.wikipedia.org/wiki/Cumulative_distribution_function
If you cannot compute a c.d.f. for your problem (I am not aware of how one could compute c.d.f.s for images, don't ask me about this!), you won't be able to compute a p-value either.
I have a histogram with two peaks and I want to generate the corresponding probability distribution. I have used the following MATLAB code:
A=mydata;
M1=max(A);
M2=min(A);
I=(0:100).*(M1-M2)./100+M2;
[n,x]=hist(A,I);
bar(x,n/(1000*0.352))
I have often seen this code used to show how to obtain a probability distribution from a histogram of normally distributed random numbers, but I don't know whether it is also valid for a histogram with two peaks, or whether it produces a normalised probability distribution.
Try using this FileExchange submission - ALLFITDIST.
Not sure it can fit two peaks. But since they are quite far from each other, you can try to fit by range and then sum them together.
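Independently of any fitting, the histogram itself can be normalised so that it integrates to 1 by dividing the counts by the number of samples times the bin width, rather than by hard-coded constants. A sketch with made-up bimodal data standing in for mydata (the mixture parameters are arbitrary):

```matlab
rng(0);                                         % reproducible stand-in data
A = [randn(1, 600) - 3, 0.5 * randn(1, 400) + 4];   % hypothetical two-peak sample

I = linspace(min(A), max(A), 101);              % 101 equally spaced bin centres
[n, x] = hist(A, I);

binWidth = x(2) - x(1);
pdfEst = n / (sum(n) * binWidth);               % counts -> probability density
bar(x, pdfEst);

area = sum(pdfEst) * binWidth;                  % equals 1 by construction
```

This normalisation makes no assumption about the shape of the distribution, so it applies to a two-peak histogram just as well as to a single-peak one.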