Kullback-Leibler Divergence between 2 Histograms from an image (MATLAB) - matlab

I pulled histograms from images in MATLAB, and now I want to compare the histograms using KL divergence.
I found this script but I do not understand how I could apply it to my case.
So here I pull my histogram (pretty simple!!):
[N,X]=hist(I,n);
[N1,X1]=hist(I1,n);
KLDiv(N,N1)
% ans=inf
N is the histogram of my image I.
As you can see, my result is Inf...
Can you please tell me how to use the script in my case?
Thanks

You probably want to calculate the histogram of an image using imhist, instead of hist, which treats the image matrix as columns of data and computes one histogram per column:
I1 = rand(10);
I2 = rand(10);
[N1, X1] = imhist(I1, 10); % limit the number of bins to avoid zero values
[N2, X2] = imhist(I2, 10);
KLDiv(N1.', N2.') % convert to row vectors to correspond with the requested format
KLDiv(N1.', N1.') % the divergence of a histogram with itself is indeed zero
Note that I limited the number of bins to make sure that each bin contains at least one point, because the Kullback-Leibler divergence is not defined if Q(i) is zero while P(i) is not:
The Kullback–Leibler divergence is defined only if Q(i)=0 implies
P(i)=0, for all i (absolute continuity).
Notes
Range of Kullback–Leibler divergence?
Any positive number, zero if (and only if) they are equal: KLD >= 0.
To which base should I take the logarithm? Natural logarithm log or base 2 logarithm log2?
Note that it is just a matter of scaling your results, so in fact it doesn't matter; just be sure to use the same logarithm throughout if you want to compare results. Wikipedia suggests the following:
logarithms in these formulae are taken to base 2 if information is
measured in units of bits, or to base e if information is measured in
nats.
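For reference, the computation such a script performs boils down to something like the following minimal sketch (an illustrative stand-in, not the actual KLDiv script; it assumes P and Q are row vectors of histogram counts and that every bin of Q is nonzero):
function dist = kldiv_sketch(P, Q)
% minimal KL divergence between two histograms given as row vectors of counts
% assumes every bin of Q is nonzero (absolute continuity)
P = P / sum(P); % normalize counts to probabilities
Q = Q / sum(Q);
idx = P > 0; % by convention 0*log(0) counts as 0
dist = sum(P(idx) .* log(P(idx) ./ Q(idx))); % natural log, i.e. nats
end
With the imhist example above, kldiv_sketch(N1.', N2.') is then finite, and kldiv_sketch(N1.', N1.') is zero.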

Related

Non parametric estimate of cdf in Matlab

I have a vector A in Matlab of dimension Nx1. I want to get a non-parametric estimate of the cdf at each point in A and store all the values in a vector B of dimension Nx1. What options do I have?
I have read about ecdf and ksdensity, but it is not clear to me what the difference is, or what the pros and cons of each are. Any direction would be appreciated.
This doesn't exactly answer your question, but you can compute the empirical CDF very simply:
A = randn(1,1e3); % example Gaussian data
x_cdf = sort(A);
y_cdf = (1:numel(A))/numel(A);
plot(x_cdf, y_cdf) % plot CDF
This works because, by definition, each sample contributes to the (empirical) CDF with an increment of 1/N. That is, for values smaller than the minimum sample the CDF equals 0; for values between the minimum sample and the next highest sample it equals 1/N, etc.
The advantage of this approach is that you know exactly what is being done.
If you need to evaluate the empirical CDF at prescribed x-axis values:
A = randn(1,1e3); % example Gaussian data
x_cdf = -5:.1:5;
y_cdf = sum(bsxfun(@le, A(:), x_cdf), 1)/numel(A); % fraction of samples <= each x value
plot(x_cdf, y_cdf) % plot CDF
If you have prescribed y-axis values, the corresponding x-axis values are by definition the quantiles of the (empirical) distribution:
A = randn(1,1e3); % example Gaussian data
y_cdf = 0:.01:1;
x_cdf = quantile(A, y_cdf);
plot(x_cdf, y_cdf) % plot CDF
You want ecdf, not ksdensity.
ecdf computes the empirical distribution function of your data set. This converges to the cumulative distribution function of the underlying population as the sample size increases.
ksdensity computes a kernel density estimation from your data. This converges to the probability density function of the underlying population as the sample size increases.
The PDF tells you how likely you are to get values near a given value. It wiggles up and down over your domain, going up near more likely values and falling near less likely values. The CDF tells you how likely you are to get values below a given value. So it always starts at zero at the left end of your domain and increases monotonically to one at the right end of your domain.
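If it helps to see the two estimates side by side, here is a minimal sketch (assuming the Statistics and Machine Learning Toolbox, which provides both functions):
A = randn(1,1e3); % example Gaussian data
[F, xF] = ecdf(A); % empirical CDF: a step function rising from 0 to 1
[f, xf] = ksdensity(A); % kernel estimate of the PDF: a smooth bump
figure
subplot(2,1,1), plot(xF, F), title('ecdf: estimate of the CDF')
subplot(2,1,2), plot(xf, f), title('ksdensity: estimate of the PDF')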

Generate random samples from arbitrary discrete probability density function in Matlab

I've got an arbitrary probability density function discretized as a matrix in MATLAB, meaning that for every pair (x,y) the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probability density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, the first term out of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
%look into CDF to see which index each random value corresponds to
%(caveat: interp1 needs strictly increasing sample points, so this assumes
%every entry of A is positive; remove zero-probability cells first if not)
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
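As an aside, if you have the Statistics Toolbox, the manual CDF lookup above can be replaced by randsample, which draws weighted indices directly (a sketch reusing A, row_vals and col_vals from the code above; randsample normalizes the weights internally):
N_vals = 1000;
% draw 1000 linear indices into A, weighted by the probabilities in A
ind = randsample(numel(A), N_vals, true, A(:));
xy_values = [row_vals(ind) col_vals(ind)];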
I hope that this helps!
Chip
I don't believe MATLAB has built-in functionality for generating multivariate random variables with an arbitrary distribution. As a matter of fact, the same is true for univariate random numbers. But while the latter can easily be generated from the cumulative distribution function, inverse-CDF sampling does not carry over directly to multivariate distributions, so generating such numbers is much messier (the main problem being that two or more variables can be correlated). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using MATLAB:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This lets you compute the integral of any quantity that has the same shape as your mesh; a drawback is that vector components have to be computed independently. If you only wish to compute things which can be parametrized with x and y (which are naturally the same size as your mesh), you can get along without any additional thinking.
You could also define a function for the integration:
function res = trapz2(xv,yv,A,arg)
    if ~isscalar(arg) && any(size(arg)~=size(A))
        error('Size of A and arg must be the same!')
    end
    res = trapz(xv,trapz(yv,A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz has to be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric test quantities during the development stage.
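With trapz2 in place, higher moments follow the same pattern; for example, a sketch of the second central moments using the means computed above:
var_x = trapz2(xv,yv,A,(x - mean_x).^2); % variance along x
var_y = trapz2(xv,yv,A,(y - mean_y).^2); % variance along y
cov_xy = trapz2(xv,yv,A,(x - mean_x).*(y - mean_y)); % covariance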

Estimating skewness of histogram in MATLAB

What test can I do in MATLAB to test the spread of a histogram? For example, in the given set of histograms, I am only interested in 1,2,3,5 and 7 (going from left to right, top to bottom) because they are less spread out. How can I obtain a value that will tell me if a histogram is positively skewed?
It may be possible using chi-squared tests, but I am not sure what the MATLAB code for that would be.
You can use the standard definition of skewness. In other words, for data x_1, ..., x_N with mean mu:
skew = ( (1/N) * sum( (x_i - mu)^3 ) ) / ( (1/(N-1)) * sum( (x_i - mu)^2 ) )^(3/2)
You compute the mean of your data and use the above equation to calculate skewness. A positively skewed distribution has its tail stretched out to the right of the peak, a negatively skewed one to the left (source: Wikipedia).
As such, the larger the value, the more positively skewed the histogram is; the more negative the value, the more negatively skewed it is.
Now, computing the mean of your histogram data is quite simple: you do a weighted sum of the bin centres, weighted by the histogram counts, and divide by the total number of entries. Given that your histogram is stored in h and the bin centres in x, you would do the following. I will assume here that you have bins from 0 up to N-1, where N is the total number of bins in the histogram, judging from your picture:
x = 0:numel(h)-1; %// Judging from your pictures
num_entries = sum(h(:));
mu = sum(h.*x) / num_entries;
skew = ((1/num_entries)*(sum(h.*(x - mu).^3))) / ...
((1/(num_entries-1))*(sum(h.*(x - mu).^2)))^(3/2);
skew would then contain the numerical measure of skewness according to that formula. With your problem statement, you will want to look for skewness values that are positive and large. I can't really comment on what threshold you should use, but look for positive values that are much larger than those of most of your histograms.
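To sanity-check the sign convention, you can feed in a deliberately right-skewed histogram and confirm that skew comes out positive; a small sketch with made-up counts:
h = [50 30 15 8 4 2 1]; % mass piled into the low bins, long tail to the right
x = 0:numel(h)-1;
num_entries = sum(h(:));
mu = sum(h.*x) / num_entries;
skew = ((1/num_entries)*(sum(h.*(x - mu).^3))) / ...
((1/(num_entries-1))*(sum(h.*(x - mu).^2)))^(3/2) % positive, as expected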

matlab code for perceptual hashing

I need MATLAB code for the perceptual hashing algorithm described here:
http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
Basically I want this to remove details in an image and leave only the major structural component information.
To do so, I think I need the following steps:
1. Reduce the DCT. Suppose the DCT is 32x32; just keep the top-left 8x8. Those represent the lowest frequencies in the picture.
2. Compute the average value. Like the Average Hash, compute the mean DCT value (using only the 8x8 DCT low-frequency values and excluding the first term, since the DC coefficient can be significantly different from the other values and would throw off the average).
3. Further reduce the DCT. Set the 64 hash bits to 0 or 1 depending on whether each of the 64 DCT values is above or below the average value. The result doesn't tell us the actual low frequencies; it just tells us the very rough relative scale of the frequencies to the mean. The result will not vary as long as the overall structure of the image remains the same; this can survive gamma and color histogram adjustments without a problem.
4. Reconstruct the image after the processing.
Can anyone help with any one of the above steps?
I have tried some code that gives some results (in the link below), but it is not yet perfect:
https://stackoverflow.com/questions/26748051/extract-low-frequency-from-dct-coeffecients-of-an-image-in-matlab
Try this:
% read image and shrink it to 32x32, as in step 1 of the article
I = imread('cameraman.tif');
I = imresize(I, [32 32]);
% cosine transform and reduction: keep the top-left 8x8 (lowest frequencies)
d = dct2(double(I));
d = d(1:8,1:8);
% compute the average, excluding the DC coefficient d(1,1) as in step 2
a = mean(d(2:end));
% set bits, here unclear whether > or >= shall be used
b = d > a;
% maybe convert to string (renamed, since 'string' shadows a built-in):
hashStr = num2str(b(:)');
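To actually compare two images, compute the bit matrix for each and count the differing bits (the Hamming distance); a small distance means the images are perceptually similar. A sketch repeating the steps above on a slightly altered copy of the same image (imadjust is just a stand-in for any mild tonal change):
I1 = imread('cameraman.tif');
I2 = imadjust(I1); % contrast adjustment; overall structure unchanged
d1 = dct2(double(imresize(I1,[32 32]))); d1 = d1(1:8,1:8);
d2 = dct2(double(imresize(I2,[32 32]))); d2 = d2(1:8,1:8);
b1 = d1 > mean(d1(2:end)); % same thresholding as above
b2 = d2 > mean(d2(2:end));
hamming = sum(b1(:) ~= b2(:)) % number of differing bits out of 64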

Mahalanobis distance in matlab: pdist2() vs. mahal() function

I have two matrices X and Y. Both represent a number of positions in 3D-space. X is a 50*3 matrix, Y is a 60*3 matrix.
My question: why does applying the mean-function over the output of pdist2() in combination with 'Mahalanobis' not give the result obtained with mahal()?
More details on what I'm trying to do below, as well as the code I used to test this.
Let's suppose the 60 observations in matrix Y are obtained after an experimental manipulation of some kind. I'm trying to assess whether this manipulation had a significant effect on the positions observed in Y. Therefore, I used pdist2(X,X,'Mahalanobis') to compare X to X to obtain a baseline, and later, X to Y (with X the reference matrix: pdist2(X,Y,'Mahalanobis')), and I plotted both distributions to have a look at the overlap.
Subsequently, I calculated the mean Mahalanobis distance for both distributions and the 95% CI, and did a t-test and a Kolmogorov-Smirnov test to assess whether the difference between the distributions was significant. This seemed very intuitive to me; however, when testing with mahal(), I get different values, although the reference matrix is the same. I don't get what the difference between the two ways of calculating Mahalanobis distance is exactly.
Comment that is too long for @3lectrologos:
You mean this: d(I) = (Y(I,:)-mu) * inv(SIGMA) * (Y(I,:)-mu)'? This is just the formula for calculating the Mahalanobis distance, so it should be the same for the pdist2() and mahal() functions. I think mu is the mean vector and SIGMA the covariance matrix of the reference distribution as a whole in both pdist2() and mahal(). Only in mahal you are comparing each point of your sample set to the "centroid" of the reference distribution, while in pdist2 you are making pairwise comparisons to each point of it. Actually, with my purpose in mind, I think I should go for mahal() instead of pdist2(). I can interpret a pairwise distance based on a reference distribution, but I don't think it's what I need here.
% test pdist2 vs. mahal in matlab
% the purpose of this script is to see whether the average over the rows of E equals the values in d...
% data
X = []; % 50*3 matrix, data omitted
Y = []; % 60*3 matrix, data omitted
% calculations
S = nancov(X);
% mahal()
d = mahal(Y,X); % gives a 60*1 vector with a value for each point in Y (the second matrix is always the reference matrix)
% pairwise mahalanobis distance with pdist2()
E = pdist2(X,Y,'mahalanobis',S); % gives a 50*60 matrix with the ij-th element the pairwise distance between X(i,:) and Y(j,:), based on the covariance matrix of X: nancov(X)
%{
so this is harder to interpret than mahal(), as elements of Y are not just compared to the "mahalanobis-centroid" based on X,
% but to each individual element of X
% so the purpose of this script is to see whether the average over the rows of E equals the values in d...
%}
F = mean(E); % now I averaged over the rows, which means, over all values of X, the reference matrix
mean(d)
mean(E(:)) % not equal to mean(d)
d-F' % not zero
% plot output
figure(1)
plot(d,'bo'), hold on
plot(mean(E),'ro')
legend('mahal()','averaged over all x values pdist2()')
ylabel('Mahalanobis distance')
figure(2)
plot(d,'bo'), hold on
plot(E','ro')
plot(d,'bo','MarkerFaceColor','b')
xlabel('values in matrix Y (Yi) ... or ... pairwise comparison Yi. (Yi vs. all Xi values)')
ylabel('Mahalanobis distance')
legend('mahal()','pdist2()')
One immediate difference between the two is that mahal subtracts the sample mean of X from each point in Y before computing distances.
Try something like E = pdist2(X,Y-mean(X),'mahalanobis',S); to see if that gives you the same results as mahal.
Note that
mahal(X,Y)
is equivalent to
pdist2(X,mean(Y),'mahalanobis',cov(Y)).^2
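You can check that equivalence numerically with random data (a quick sketch, using the same 50x3 and 60x3 shapes as in the question):
X = randn(50,3); % reference distribution
Y = randn(60,3); % sample set
d1 = mahal(Y,X); % squared distances to the centroid of X
d2 = pdist2(Y, mean(X), 'mahalanobis', cov(X)).^2; % the same via pdist2
max(abs(d1 - d2)) % ~0 up to floating-point error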
Well, I guess there are two different ways to calculate the Mahalanobis distance between two clusters of data, as you explain above:
1) you compare each data point from your sample set to the mu and sigma calculated from your reference distribution (although which cluster you label the sample set and which the reference distribution may be arbitrary), thereby calculating the distance from each point to the so-called Mahalanobis centroid of the reference distribution.
2) you compare each data point from matrix Y to each data point of matrix X, with X the reference distribution (mu and sigma are calculated from X only).
The values of the distances will be different, but I guess the ordinal ranking of dissimilarity between clusters is preserved under either method? I wonder, when comparing 10 different clusters to a reference matrix X, or to each other, whether the ranking of the dissimilarities would differ between method 1 and method 2. Also, I can't imagine a situation where one method would be wrong and the other not, although method 1 seems more intuitive in some situations, like mine.