I have a population matrix of 5 images, each described by 49 extracted salience features.
I want to calculate, in MATLAB, the cosine similarity between these images and a test image described by the same 49 features.
1) Reshape each image of size M rows x N columns into a vector with M*N rows (for example, u = img1(:)). Keep one image in a vector u and the other image in a vector v.
2) Evaluate: cosTheta = dot(u,v)/(norm(u)*norm(v)); [As far as I know, there is no MATLAB function that does this directly.]
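For the setting in the question (one test vector against every row of a population matrix), the same formula can be applied to all rows at once. A minimal sketch, assuming the population is stored as a 5x49 matrix `P` (one image per row) and the test image as a 1x49 row vector `t`; both names and the dummy values are hypothetical:

```matlab
% Hypothetical data: 5 images (rows) x 49 features (columns)
P = [eye(5), zeros(5, 44)];   % population matrix, 5x49
t = ones(1, 49);              % test image features, 1x49

% Cosine similarity of t with every row of P without a loop:
% dot products divided by the product of the vector norms
sims = (P * t.') ./ (sqrt(sum(P.^2, 2)) * norm(t));

% The same values, one row at a time, with the dot/norm formula
simsLoop = zeros(size(P, 1), 1);
for k = 1:size(P, 1)
    u = P(k, :);
    simsLoop(k) = dot(u, t) / (norm(u) * norm(t));
end

% Index of the population image most similar to the test image
[bestSim, bestIdx] = max(sims);
```

The loop is only there to show that the vectorized line agrees with the formula from step 2; in practice the single vectorized line is enough.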
A common approach is to compare images through their projections onto eigenfaces, so the eigenfaces are usually computed first.
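A minimal sketch of that idea, assuming PCA is applied directly to the 49-feature vectors rather than to raw pixels (an assumption; classic eigenfaces use pixel vectors). The eigenfaces are taken as the leading right singular vectors of the mean-centred data, and cosine similarity is then computed between projection coefficients. The dummy data, the variable names and the choice `k = 3` are all hypothetical:

```matlab
% Hypothetical population: 5 images x 49 features, plus one test vector
X = mod(reshape(1:245, 5, 49) * 7, 11);   % deterministic dummy data
t = mod((1:49) * 3, 5);                   % dummy test features, 1x49

mu = mean(X, 1);                          % mean feature vector
Xc = bsxfun(@minus, X, mu);               % centre each row

% Right singular vectors of the centred data are the eigenvectors of the
% feature covariance matrix (the "eigenfaces", one per column)
[~, ~, V] = svd(Xc);
k = 3;                                    % assumed number of components kept
E = V(:, 1:k);                            % 49 x k

coeffs = Xc * E;                          % 5 x k projections of the population
tCoeff = (t - mu) * E;                    % 1 x k projection of the test image

% Cosine similarity in eigenface space
sims = (coeffs * tCoeff.') ./ (sqrt(sum(coeffs.^2, 2)) * norm(tCoeff));
```

How many components to keep is a design choice; with only 5 images, the centred data has rank at most 4, so at most 4 components carry information.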
You could use MATLAB's built-in function pdist (part of the Statistics and Machine Learning Toolbox) to get the cosine distance:
pdist([u;v],'cosine')
which returns the "One minus the cosine of the included angle between points". You could then subtract the answer from one to get the 'cosine of the included angle' (similarity), like this:
1 - pdist([u;v],'cosine')
Source: Pairwise distance between pairs of objects.
I have extracted features from a biometric image and then applied histogram remapping to the extracted features. I found that this step increased the recognition accuracy: it reduced the distances between samples from the same person and increased the distances between samples from different people. However, when I used MATLAB's histogram function to plot the distribution of the features after mapping, the histograms were identical for all images and all people. Is there a MATLAB plot function I can use to show, after the mapping step, the small differences between features from the same person and the large differences between features from different people, compared with the differences before mapping?
The attached file shows examples. Please note the following:
Images 21 and 22 are of the same person, while image 63 is of a different person.
The kNN distance between the mapped features of images 21 and 22 is 394.3704,
compared with 992.2379 between 21 and 63 and 993.2462 between 22 and 63. Despite these differences in distance, the three histograms are the same.
MATLAB code:
% to draw the histogram of the filtered image
histogram(filtered_image22)
% to measure knn distance
[a b]=size(filtered_image21)
filtered_image21_vector=reshape(filtered_image21, [1 a*b]);
[x z]=size(filtered_image63)
filtered_image63_vector=reshape(filtered_image63, [1 x*z]);
[idx D]=knnsearch(filtered_image21_vector,filtered_image63_vector)
%knnsearch(x,y) searches for the nearest neighbor (i.e., the closest
%point, row, or observation) in x to each point (i.e., row or observation)
%in the query data y using an exhaustive search or a Kd-tree.
%knnsearch returns Idx, which is a column vector of the indices representing the nearest neighbors.
% and D, which contains the distance between each observation in y and its nearest neighbor in x.
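Because each filtered image is reshaped into a single row, knnsearch here simply returns the Euclidean distance (its default metric) between the two rows, i.e. `norm(x - y)`. A minimal base-MATLAB sketch that computes the same kind of distance for every pair of images, using small hypothetical vectors in place of the real filtered images:

```matlab
% Hypothetical stand-ins for the reshaped filtered images (one row each)
v21 = [0 0 0 0];
v22 = [3 4 0 0];
v63 = [0 0 6 8];
V = [v21; v22; v63];

% With a single row in X, knnsearch(X, Y) reduces to the Euclidean
% distance between the two rows
d21_63 = norm(v21 - v63);

% Pairwise Euclidean distances between all rows of V (symmetric, zero diagonal)
n = size(V, 1);
Dmat = zeros(n);
for i = 1:n
    for j = 1:n
        Dmat(i, j) = norm(V(i, :) - V(j, :));
    end
end
```

One possible way to visualize within-person versus between-person distances (a suggestion, not something from the question) is to compute this matrix for the features before and after mapping and show each with `imagesc(Dmat); colorbar`.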
I want to compare two audio files (each is a recording of someone saying "ba a ta") using MATLAB's built-in Dynamic Time Warping (DTW) function, dtw. Before applying DTW, I compute the Fast Fourier Transform (FFT) of each recording with MATLAB's fft function. My code so far (file name: test.m):
fftRecording1 = fft(audioread('C:\Users\handy\Documents\MATLAB\my_recording_1.wav'));
fftRecording2 = fft(audioread('C:\Users\handy\Documents\MATLAB\fajar.wav'));
dist = dtw(fftRecording1, fftRecording2);
When I call dtw, I get an error because the two arrays have different numbers of rows. Error message:
Error using dtw (line 82)
The number of rows between X and Y must be equal when X and Y are matrices
Error in test (line 3)
dist = dtw(fftRecording1, fftRecording2);
[Screenshot: contents of the fftRecording1 and fftRecording2 variables]
My question is: before computing the FFT and DTW, how can I normalize the two audio files, step by step, so that they have the same length (number of rows)? Or is there another way to make the two files the same length?
According to dtw's documentation:
To stretch the inputs, dtw repeats each element of x and y as many times as necessary. If x and y are matrices, then dist stretches them by repeating their columns. In that case, x and y must have the same number of rows.
In your case, the columns represent the audio channels and the rows represent the quantity to be aligned (i.e. the reverse of what dtw expects). To set up the inputs the way dtw expects, simply transpose them:
dist = dtw(transpose(fftRecording1), transpose(fftRecording2));
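The shapes involved can be checked without any audio. A small sketch, assuming the recordings are stored as audioread returns them (samples x channels); the matrices below are hypothetical stand-ins. It uses `.'` (plain transpose, the same as `transpose`) rather than `'` (conjugate transpose), which matters if the inputs are complex FFT output:

```matlab
% Hypothetical stand-ins for audioread output: samples x channels
rec1 = [0.1 0.1; 0.2 0.2; 0.3 0.3; 0.2 0.2; 0.1 0.1];                    % 5 samples, 2 channels
rec2 = [0.1 0.1; 0.1 0.1; 0.2 0.2; 0.3 0.3; 0.3 0.3; 0.2 0.2; 0.1 0.1];  % 7 samples, 2 channels

% dtw stretches columns, so put channels in rows and time along columns
x = rec1.';   % 2 x 5
y = rec2.';   % 2 x 7: same number of rows, different number of columns

% Alternatively, average the channels into a single mono row vector
xMono = mean(rec1, 2).';   % 1 x 5
yMono = mean(rec2, 2).';   % 1 x 7

% dist = dtw(x, y);   % requires the Signal Processing Toolbox
```

Both files must have the same number of channels for the matrix form to work; mixing down to mono sidesteps that requirement.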
Dynamic Time Warping does not need the input sequences to have the same length. DTW is designed to measure the similarity between two sequences that may be misaligned in time or vary in speed.
No, they don't need to have the same length in a time-related sense. They need to have the same number of dimensions (2-D signal, 3-D signal, ...), which is equivalent to their number of rows. The whole idea of DTW is to match similar content that might be stretched to different lengths, so there would be no point in requiring the inputs to have the same length.
Regarding your question: just call dtw with the transposes of your signals and you will get a proper result.
dtw(signal1.', signal2.');
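To illustrate why different lengths are not a problem, here is a minimal from-scratch DTW for two real 1-D sequences, using the absolute difference as the local cost. It is a textbook sketch, not MATLAB's dtw implementation; for real row vectors, dtw with its default Euclidean metric should give the same total, but treat that as an assumption. The signals are hypothetical:

```matlab
% Two hypothetical 1-D signals of different lengths: y is a stretched x
x = [1 2 3];
y = [1 2 2 3];

nx = numel(x);
ny = numel(y);

% D(i+1,j+1) = cost of the cheapest warping path aligning x(1:i) with y(1:j)
D = inf(nx + 1, ny + 1);
D(1, 1) = 0;
for i = 1:nx
    for j = 1:ny
        cost = abs(x(i) - y(j));                  % local distance
        D(i+1, j+1) = cost + min([D(i, j+1), ...  % y(j) reused (step in x only)
                                  D(i+1, j), ...  % x(i) reused (step in y only)
                                  D(i, j)]);      % step in both
    end
end
dist = D(end, end);   % total cost of the optimal alignment
```

Here the optimal path matches x(2) with both copies of 2 in y, so the total cost is zero even though the lengths differ.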
You should apply DTW to the original signals rather than to their Fourier transforms. The FFT transforms the signal from the time domain to the frequency domain, so instead of warping signal1 in order to match signal2, you are warping frequencies when you apply the FFT before DTW. In addition, the amplitude of the Fourier transform depends on the number of points in the FFT time window. From my point of view, there is no point in applying DTW to a Fourier transform.
A proof-of-concept prototype I have to build for my final-year project is to implement k-means clustering on a large data set and display the results on a graph. I only know object-oriented languages like Java and C#, and decided to give MATLAB a try. I notice that MATLAB's array-oriented approach to solving problems is very different, so I would like some insight on a few things if possible.
Suppose I have the following data set:
raw_data
400.39 513.29 499.99 466.62 396.67
234.78 231.92 215.82 203.93 290.43
15.07 14.08 12.27 13.21 13.15
334.02 328.79 272.2 306.99 347.79
49.88 52.2 66.35 47.69 47.86
732.88 744.62 687.53 699.63 694.98
And I picked rows 2 and 4 to be the two centroids:
centroids
234.78 231.92 215.82 203.93 290.43 % Centroid 1
334.02 328.79 272.2 306.99 347.79 % Centroid 2
I now want to compute the Euclidean distance from each point to each centroid, assign each point to its closest centroid, and display this on a graph. Let's say I want to show the two clusters in blue and green. How can I do this in MATLAB? In Java, I would initialize each row as an object and add it to separate ArrayLists (representing the clusters).
If rows 1, 2 and 3 all belong to the first centroid/cluster, and rows 4, 5 and 6 belong to the second centroid/cluster, how can I classify them so they are displayed as blue or green points on a graph? I am new to MATLAB and really curious about this. Thanks for any help.
(To begin with, MATLAB has a flexible distance function, pdist2, and also a kmeans implementation, but I'm assuming that you want to build your code from scratch.)
In MATLAB, you try to express everything as matrix operations, without loops over elements.
In your case, if R is the raw_data matrix and C is the centroids matrix,
you can shift the dimension that represents the centroid number to the third place with
permC=permute(C,[3 2 1]);. The bsxfun function then lets you subtract C from R while expanding R along the third dimension as necessary: D=bsxfun(@minus,R,permC). Squaring element-wise and summing across columns, SqD=sum(D.^2,2), gives the squared distance of each observation from each centroid. Performing all these operations in a single statement and shifting the third (centroid) dimension back to the second place looks like this:
SqD=permute(sum(bsxfun(@minus,R,permute(C,[3 2 1])).^2,2),[1 3 2])
Picking the centroid of minimal distance is now straightforward: [minDist,minCentroid]=min(SqD,[],2)
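Putting the pieces together with the data from the question gives the assignment step and a colour per point. A minimal sketch; the names `palette`, `colors`, `cluster1` and `cluster2` are hypothetical, and logical indexing stands in for the separate ArrayLists mentioned in the question:

```matlab
% Data and centroids from the question
R = [400.39 513.29 499.99 466.62 396.67
     234.78 231.92 215.82 203.93 290.43
      15.07  14.08  12.27  13.21  13.15
     334.02 328.79 272.2  306.99 347.79
      49.88  52.2   66.35  47.69  47.86
     732.88 744.62 687.53 699.63 694.98];
C = R([2 4], :);   % rows 2 and 4 are the initial centroids

% Squared Euclidean distance of every row of R to every centroid (6x2)
SqD = permute(sum(bsxfun(@minus, R, permute(C, [3 2 1])).^2, 2), [1 3 2]);

% Index of the nearest centroid for each row (6x1)
[minDist, minCentroid] = min(SqD, [], 2);

% One RGB colour per cluster: blue for cluster 1, green for cluster 2
palette = [0 0 1; 0 1 0];
colors = palette(minCentroid, :);   % one colour row per data point

% Rows belonging to each cluster (the MATLAB analogue of separate lists)
cluster1 = R(minCentroid == 1, :);
cluster2 = R(minCentroid == 2, :);
```

With five features per point, something like `scatter(R(:,1), R(:,2), 36, colors, 'filled')` plots two chosen features coloured by cluster; which features (or which projection) to plot is a design choice.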
If this looks complex, I recommend inspecting the product of each sub-step and reading the help of each command.
I've got an arbitrary probability density function discretized as a matrix in Matlab, that means that for every pair x,y the probability is stored in the matrix:
A(x,y) = probability
This is a 100x100 matrix, and I would like to be able to generate random samples of two dimensions (x,y) out of this matrix and also, if possible, to be able to calculate the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian Mixture Model.
I've been looking everywhere but I haven't found anything as specific as this. I hope you may be able to help me.
Thank you.
If you really have a discrete probability density function defined by A (as opposed to a continuous probability density function that is merely described by A), you can "cheat" by turning your 2D problem into a 1D problem.
%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2)); %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)]; %all y values
%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);
%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, first term out of cumsum is not zero
%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];
%generate random values
N_vals = 1000; %give me 1000 values
rand_vals = rand(N_vals,1); %spans zero to one
%look into CDF to see which index the rand val corresponds to
%(interp1 needs distinct CDF values, so this assumes A has no zero entries)
out_val = interp1(CDF,linspace(0,1,length(CDF)),rand_vals); %spans zero to one
ind = ceil(out_val*length(A));
%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];
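If A contains zeros, the CDF has repeated values and interp1 will refuse the sample points. A variant of the same inverse-CDF idea that tolerates zeros is sketched below; it swaps interp1 and ceil for histc, which returns the bin index k with CDF(k) <= r < CDF(k+1). The small matrix A is hypothetical, and newer MATLAB releases recommend discretize over histc:

```matlab
% Hypothetical discrete PMF with zero entries (interp1 would fail here)
A = [0 0.25 0; 0.5 0 0.25];
A = A / sum(A(:));                            % make sure it sums to 1

[nr, nc] = size(A);
[colGrid, rowGrid] = meshgrid(1:nc, 1:nr);    % rowGrid(i,j)=i, colGrid(i,j)=j

p = A(:);
CDF = [0; cumsum(p)];
CDF(end) = 1;                                 % guard against round-off in cumsum

N_vals = 1000;
rand_vals = rand(N_vals, 1);                  % uniform on (0,1)

% Each value r falls in the bin k with CDF(k) <= r < CDF(k+1);
% bins of zero probability are empty, so they are never chosen
[~, ind] = histc(rand_vals, CDF);

xy_values = [rowGrid(ind) colGrid(ind)];      % sampled (row, column) pairs
```

The resulting bin index equals the linear index into A, so no rescaling step is needed.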
I hope that this helps!
Chip
I don't believe MATLAB has built-in functionality for generating multivariate random variables with an arbitrary distribution. In fact, the same is true for univariate random numbers. But while the latter can easily be generated by inverting the cumulative distribution function, a multivariate CDF cannot be inverted in the same way, so generating such numbers is much messier (the main problem is that two or more variables can be correlated). So this part of your question is far beyond the scope of this site.
Since half an answer is better than no answer, here's how you can compute the mean and higher moments numerically using matlab:
%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);
%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1
%plot it if you like
%figure;
%surf(x,y,A)
%actual half-answer starts here
%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2
%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));
So, the point is that you can perform a double integral on a rectangular mesh using two consecutive calls to trapz. This allows you to compute the integral of any quantity that has the same shape as your mesh, but a drawback is that vector components have to be computed independently. If you only wish to compute quantities that can be parametrized with x and y (which are naturally the same size as your mesh), then you can get along without any additional thinking.
You could also define a function for the integration:
function res=trapz2(xv,yv,A,arg)
if ~isscalar(arg) && any(size(arg)~=size(A))
error('Size of A and arg must be the same!')
end
res=trapz(xv,trapz(yv,A.*arg));
end
This way you can compute stuff like
weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);
NOTE: the reason I used a 101x100 mesh in the example is that the double call to trapz should be performed in the proper order. If you interchange xv and yv in the calls, you get the wrong answer due to inconsistency with the definition of A, but this will not be evident if A is square. I suggest avoiding symmetric quantities during the development stage.
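The same nested-trapz pattern extends to second moments, which is what a Gaussian-mixture fit would start from. A minimal sketch, using a hypothetical single Gaussian bump with known moments (mean (10, -5), variances 25 and 9, zero covariance) so the results can be checked; `int2` is an anonymous function playing the role of trapz2 above:

```matlab
% Hypothetical test input: one Gaussian bump with known moments
xv = linspace(-50, 50, 101);
yv = linspace(-30, 30, 100);
[x, y] = meshgrid(xv, yv);      % x, y are 100x101 (rows follow yv)
A = exp(-(x - 10).^2 / (2*5^2) - (y + 5).^2 / (2*3^2));

% Nested trapz: inner call integrates over y (down columns), outer over x
int2 = @(F) trapz(xv, trapz(yv, F));

A = A / int2(A);                % normalize so the integral is 1

mean_x = int2(A .* x);
mean_y = int2(A .* y);

% Second central moments: variances and covariance
var_x  = int2(A .* (x - mean_x).^2);
var_y  = int2(A .* (y - mean_y).^2);
cov_xy = int2(A .* (x - mean_x) .* (y - mean_y));

Sigma = [var_x cov_xy; cov_xy var_y];   % 2x2 covariance matrix
```

On this grid the bump lies well inside the domain, so the numerical values should match the analytic ones (10, -5, 25, 9, 0) closely; for a truncated distribution they will not.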
I currently have a large matrix M (~100x100x50 elements) containing both positive and negative values. At the moment, if I want to smooth this matrix, I use the smooth3 function to apply a Gaussian kernel over the entire 3-D matrix.
What I want to achieve is a variable level of smoothing within this matrix, i.e. different parts of the matrix M are smoothed with different values of sigma depending on the value in a similar 3-D matrix, d (with values ranging from 0 to 1). Where d is 0, no smoothing occurs; where d is 1, a maximum level of smoothing occurs.
The fact that the matrix is 3-D is incidental. Smoothing in 3 dimensions is nice but not essential, and my current code (which performs various other manipulations) handles each of the 50 slices of M separately anyway. I am happy to replace smooth3 with a convolution of M with a Gaussian function and perform this convolution on each slice individually. What I can't figure out is how to vary the sigma of this Gaussian function (based on d) according to the location in M, and output the result accordingly.
An alternative approach may be to use matrix d as a mask for a very smooth version of M, Ms, and somehow combine M and Ms to give an equivalent result. However, I'm not convinced this will work, as I can't think of a function to combine M and Ms that won't give artefacts of either M or Ms when 0 < d < 1. Any thoughts?
[I'm using 2009b, and only have access to the Signal Processing toolbox.]
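One way to extend the mask idea from the question, sketched here as an assumption rather than a known-good recipe, is to precompute the slice at several smoothing levels instead of just two and, at each pixel, interpolate linearly between the two levels nearest to d. It uses only base MATLAB (conv2 with a hand-built Gaussian), which fits the toolbox constraint; the sigma list, the dummy slice and the ramp-shaped d are hypothetical:

```matlab
% Hypothetical 2-D slice and guidance map (values of d in [0, 1])
M = magic(8) - 32;                     % one slice with positive and negative values
d = repmat(linspace(0, 1, 8), 8, 1);   % no smoothing on the left, maximum on the right

sigmas = [0 1 2 4];                    % assumed levels: d=0 -> sigmas(1), d=1 -> sigmas(end)
L = numel(sigmas);

% Precompute the slice smoothed at every level (sigma = 0 means unsmoothed)
S = zeros([size(M) L]);
for k = 1:L
    if sigmas(k) == 0
        S(:, :, k) = M;
    else
        r = ceil(3 * sigmas(k));
        g = exp(-(-r:r).^2 / (2 * sigmas(k)^2));
        g = g / sum(g);                          % normalized 1-D Gaussian
        S(:, :, k) = conv2(g, g, M, 'same');     % separable 2-D smoothing
    end
end

% Map d to a fractional level, then blend the two nearest levels per pixel
pos = 1 + d * (L - 1);                 % in [1, L]
lo = floor(pos);
lo(lo == L) = L - 1;                   % keep lo+1 inside the stack
w = pos - lo;                          % weight of the upper level

[ii, jj] = ndgrid(1:size(M, 1), 1:size(M, 2));
idxLo = sub2ind(size(S), ii, jj, lo);
idxHi = sub2ind(size(S), ii, jj, lo + 1);
out = (1 - w) .* S(idxLo) + w .* S(idxHi);
```

Using more, closer-spaced levels reduces the blending artefacts the question worries about. Note that conv2 with 'same' pads with zeros, so values near the border are pulled toward zero; padding each slice first (and looping over the 50 slices) would be needed for real data.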
You should have a look at the Guided Image Filter. It is a computationally efficient, edge-preserving smoothing filter that behaves much like the bilateral filter.
http://research.microsoft.com/en-us/um/people/jiansun/papers/guidedfilter_eccv10.pdf
It will allow you to do proper smoothing based on your guidance matrix.
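A minimal sketch of the guided filter following the box-filter formulation in the linked paper: a local linear model q = a*I + b is fitted in every window, and the coefficients are averaged over all windows covering a pixel. The radius `r`, the regularization `epsReg` and the test images are assumptions; how d from the question should serve as the guide is not spelled out in the answer, so the guide here is just a test pattern:

```matlab
% Hypothetical inputs: guide image I and image to filter p (same size)
I = repmat(linspace(0, 1, 16), 16, 1);   % smooth horizontal ramp as the guide
p = 5 * ones(16);                         % constant image: output should stay 5
r = 2;                                    % assumed window radius
epsReg = 1e-3;                            % assumed regularization (epsilon)

% Mean over a (2r+1)x(2r+1) window, divided by the actual number of
% pixels in the window so that borders are handled correctly
k = ones(1, 2*r + 1);
N = conv2(k, k, ones(size(I)), 'same');
boxMean = @(X) conv2(k, k, X, 'same') ./ N;

mean_I = boxMean(I);
mean_p = boxMean(p);
var_I  = boxMean(I .* I) - mean_I .^ 2;
cov_Ip = boxMean(I .* p) - mean_I .* mean_p;

% Per-window linear model q = a*I + b
a = cov_Ip ./ (var_I + epsReg);
b = mean_p - a .* mean_I;

% Average the coefficients of all windows covering each pixel
q = boxMean(a) .* I + boxMean(b);
```

Larger epsReg gives stronger smoothing where the guide is flat, while strong edges in the guide are preserved; only base MATLAB functions (conv2) are used, so it should run without the Image Processing Toolbox.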