I'm a little confused about Andrea Vedaldi's implementation of the algorithm. I'm trying to extract features with the sift function of the toolbox.
I'm using the command [frames,descriptors] = sift(image, 'Verbosity', 1); so I've got frames, which is a 4xK matrix, and descriptors, which is 128xK. I want to use a vector as a feature. Which of the two matrices should I use as a feature? Does anyone have an idea?
The descriptors are what you compare in order to determine matches; the frames only hold each keypoint's geometry (x, y, scale, orientation).
I1 = double(rgb2gray(imread('image1.png'))) / 256 ;
I2 = double(rgb2gray(imread('image2.png'))) / 256 ;
[frames1,descriptors1] = sift(I1, 'Verbosity', 1) ;
[frames2,descriptors2] = sift(I2, 'Verbosity', 1) ;
matches = siftmatch(descriptors1, descriptors2) ;
You now have a matrix of matched features between the two images.
To visualize the result, add the following line to the code above:
plotsiftmatches(I1,I2,frames1,frames2,matches);
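If you want the actual feature vectors to work with, index the descriptor matrix with the match indices; a small sketch (matches should be a 2xM matrix of index pairs, one column per matched pair):
d1 = descriptors1(:, matches(1,:)) ; % the matched 128-d vectors from image 1
f1 = frames1(:, matches(1,:)) ; % their frames: x, y, scale, orientation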
Vedaldi's report can be found here.
I would like to find the 5 images most similar to an input image.
To do this I thought I would use SIFT (VLFeat library) and compare the respective descriptors.
So I use the vl_ubcmatch (doc here) method to calculate a similarity measure between the images.
This is the code:
path_dir = './img/';
imgs = dir(path_dir);
imgs = imgs(3 : end); % drop the '.' and '..' entries
numImgs = size(imgs);
numImgs = numImgs(1);
path1 = './img/car01.jpg';
Ia = imread(path1);
Ia = single(rgb2gray(Ia));
[fa, da] = vl_sift(Ia);
results = struct;
m = 0;
j = 1; % image index (for the loop below)
for img = imgs'
path = strcat(path_dir, img.name);
if(strcmp(path1, path) == 0)
Ib = imread(path);
Ib = single(rgb2gray(Ib));
[fb, db] = vl_sift(Ib);
[matches, scores] = vl_ubcmatch(da, db);
s = sum(scores);
[r, c] = size(scores);
m = s ./ c; % mean matching score for this image pair
results(j).measure = m;
results(j).img = path;
j = j + 1;
end
end
As you can see from the code, I thought I would use the mean as a measure of similarity, but the results I get are not satisfactory (for example, it tells me that the input image of a cup is more similar to a tree than to another cup).
In your opinion, is it better to have more matching descriptors with lower similarity, or fewer matching descriptors with greater similarity?
I have 50 images of 5 different categories (cups, trees, people, tables and cars) and, given an input image, the program should return the 5 images most similar to it, preferably belonging to the same category.
What measure can I use instead of the mean to get a more precise classification?
Thanks!
According to your code, you measure the similarity between the image (Ia) and all other images (Ib). You therefore compare the SIFT descriptors of Ia with those of all Ib's, which gives you a list of feature matches for each image pair (matches) and the Euclidean distance of each feature pair (scores).
Using the mean of all scores of an image pair as a measure of similarity is not very robust, because an image pair with only one feature match could (by chance) score a better "similarity" than an image pair with many matches, which I guess is not what you want for your task.
Concerning your question: it is always better to have meaningful/robust descriptors, even if there are only a few (of course, the more the better!), than to have a lot of meaningless descriptors.
Proposal: why don't you just count the number of inliers (= the number of feature matches for each image pair, size(matches, 2))?
Images of the same object should yield more inliers than images of different objects, so taking the pairs with the 5 highest inlier counts should give you the most similar images.
If you just want to distinguish a cup from a tree, this should work. If your classification task gets more difficult and you need to distinguish different types of trees, SIFT is not the best algorithm to use; a learning approach will give better results, but that depends on your task.
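For example, a minimal sketch of that measure, reusing the variable names from the question's loop (matches as returned by vl_ubcmatch):
% inside the loop: replace the mean of scores by the match count
[matches, ~] = vl_ubcmatch(da, db);
results(j).measure = size(matches, 2); % number of feature matches (inliers)
% after the loop: pick the 5 images with the most matches
[~, order] = sort([results.measure], 'descend');
mostSimilar = {results(order(1:5)).img};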
I am trying to extract SIFT features with the VLFeat implementation in MATLAB and then compute a GMM model as well as the Fisher vector. I have two subsets, train and test images, from the DTD dataset.
1. Run vl_sift on each split (train & test) and save the 128xN features.
2. Feed the cell array, whose entries are the 128xN features, into vl_gmm to obtain the [mean covariance weight] model parameters, then pass each image's features together with the computed GMM model values to vl_fisher.
3. Apply PCA.
4. Put everything into an SVM.
My problem is that I don't know, in step 2, how to transform the feature values of each image so that they fit into vl_gmm and vl_fisher.
Here is my code:
%% SIFT Feature Extraction
FV_train = cell(size(train_name, 1), 1);
FV_test = cell(size(test_name, 1), 1);
parfor_progress(size(train_name, 1));
parfor n = 1:size(train_name, 1)
[~, FV_train{n}] = vl_sift(single(histeq(imresize(rgb2gray(imread(strcat(pwd, '/DTD/images', '/', train_name{n}))), [512 512]))));
[~, FV_test{n}] = vl_sift(single(histeq(imresize(rgb2gray(imread(strcat(pwd, '/DTD/images', '/', test_name{n}))), [512 512]))));
parfor_progress;
end
parfor_progress(0);
FV_train = FV_train(~cellfun('isempty',FV_train));
FV_test = FV_test(~cellfun('isempty',FV_test));
FV_train = adaptFV(FV_train);
FV_test = adaptFV(FV_test);
parfor n = 1:size(FV_train, 1)
FV_train{n} = double(reshape(FV_train{n},1,size(FV_train{n},2)*size(FV_train{n},1)));
FV_test{n} = double(reshape(FV_test{n},1,size(FV_test{n},2)*size(FV_test{n},1)));
end
There are two other problems:
One is that SIFT fails on some images, so I rejected those.
Due to the different dimensionality of the SIFT features, I have taken the longest one and padded the others with zeros to get a 1xN feature vector.
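For reference, a minimal sketch of how step 2 is usually wired up with VLFeat's documented vl_gmm and vl_fisher interfaces (assuming FV_train still holds the raw 128xN descriptors, i.e. before the reshape loop). Pooling all descriptors for the GMM and encoding each image separately also avoids the zero-padding, because a Fisher vector has a fixed length of 2*128*numClusters regardless of N:
% Pool all training descriptors into one 128 x N matrix
allDesc = single(cat(2, FV_train{:}));
numClusters = 64; % assumption: codebook size
[means, covariances, priors] = vl_gmm(allDesc, numClusters);
% Encode each image on its own descriptor set
enc_train = cell(size(FV_train));
for n = 1:numel(FV_train)
    enc_train{n} = vl_fisher(single(FV_train{n}), means, covariances, priors);
end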
MATLAB's im2col and col2im are very important functions for vectorization in MATLAB when dealing with images.
Yet they require MATLAB's Image Processing Toolbox.
My question is: is there an efficient (vectorized) way to implement them using MATLAB's base functions (with no toolbox)?
I need both the sliding and distinct mode.
I don't need any padding.
Thank you.
I can only hope that the MathWorks guys don't sue you, or me, or Stack Overflow for that matter, for trying to create vectorized implementations of their Image Processing Toolbox functions, as they have put a price on that toolbox. But in any case, forgetting those issues, here are the implementations.
Replacement for im2col with 'sliding' option
I wasn't able to vectorize this until I sat down to write a solution to another problem on Stack Overflow. So I would strongly encourage you to look into that one too.
function out = im2col_sliding(A,blocksize)
nrows = blocksize(1);
ncols = blocksize(2);
%// Get sizes for later usages
[m,n] = size(A);
%// Start indices for each block
start_ind = reshape(bsxfun(@plus,[1:m-nrows+1]',[0:n-ncols]*m),[],1);
%// Row indices
lin_row = permute(bsxfun(@plus,start_ind,[0:nrows-1])',[1 3 2]);
%// Get linear indices based on row and col indices and get desired output
out = A(reshape(bsxfun(@plus,lin_row,[0:ncols-1]*m),nrows*ncols,[]));
return;
Replacement for im2col with 'distinct' option
function out = im2col_distinct(A,blocksize)
nrows = blocksize(1);
ncols = blocksize(2);
nele = nrows*ncols;
%// Pad A with zeros so that both dimensions divide evenly into blocks
row_ext = mod(size(A,1),nrows);
col_ext = mod(size(A,2),ncols);
padrowlen = (row_ext~=0)*(nrows - row_ext);
padcollen = (col_ext~=0)*(ncols - col_ext);
A1 = zeros(size(A,1)+padrowlen,size(A,2)+padcollen);
A1(1:size(A,1),1:size(A,2)) = A;
%// Reshape and permute so that each distinct block ends up as one column
t1 = reshape(A1,nrows,size(A1,1)/nrows,[]);
t2 = reshape(permute(t1,[1 3 2]),size(t1,1)*size(t1,3),[]);
t3 = permute(reshape(t2,nele,size(t2,1)/nele,[]),[1 3 2]);
out = reshape(t3,nele,[]);
return;
Some quick tests show that both these implementations (particularly the sliding one for small to decently sized inputs, and the distinct one for all data sizes) perform much better than the built-in MATLAB functions in terms of runtime.
How to use
With in-built MATLAB function -
B = im2col(A,[nrows ncols],'sliding')
With our custom function -
B = im2col_sliding(A,[nrows ncols])
%// ------------------------------------
With in-built MATLAB function -
B = im2col(A,[nrows ncols],'distinct')
With our custom function -
B = im2col_distinct(A,[nrows ncols])
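If you do have the toolbox available somewhere, a quick sanity check that the replacements reproduce the built-in output (using a block size that divides the matrix evenly, so 'distinct' needs no padding):
A = magic(6);
assert(isequal(im2col(A,[2 3],'sliding'), im2col_sliding(A,[2 3])));
assert(isequal(im2col(A,[2 3],'distinct'), im2col_distinct(A,[2 3])));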
You can cheat by looking at the GNU Octave image package, where im2col and col2im are implemented in the script language:
im2col
col2im
As far as I can see, the main differences are the comment style (# instead of %) and the string style (" instead of '). If you change these and remove the assert tests at the bottom, it might already be runnable. If not, step through it with the debugger.
Furthermore, be aware of the license (GPLv3). It's free, but your changes have to be free too!
In How are HoG features represented graphically? I found code to visualize HOG features; it is done by two files from http://www.cs.berkeley.edu/~rbg/latent/index.html, visualizeHOG.m and
HOGpicture.m, which is:
(the code below is released under an MIT license)
function im = HOGpicture(w, bs)
% Make picture of positive HOG weights.
% im = HOGpicture(w, bs)
% construct a "glyph" for each orientation
bim1 = zeros(bs, bs);
bim1(:,round(bs/2):round(bs/2)+1) = 1;
bim = zeros([size(bim1) 9]);
bim(:,:,1) = bim1;
for i = 2:9,
bim(:,:,i) = imrotate(bim1, -(i-1)*20, 'crop');
end
% make a picture of positive weights by adding up weighted glyphs
s = size(w);
w(w < 0) = 0;
im = zeros(bs*s(1), bs*s(2));
for i = 1:s(1),
iis = (i-1)*bs+1:i*bs;
for j = 1:s(2),
jjs = (j-1)*bs+1:j*bs;
for k = 1:9,
im(iis,jjs) = im(iis,jjs) + bim(:,:,k) * w(i,j,k);
end
end
end
I don't understand what the bs parameter is and what it means. Can anyone help me?
If you are looking to visualize HOG, you can have a look here: http://web.mit.edu/vondrick/ihog/#code
It was recently published at ICCV 2013.
If you want to visualize HOG features, then use VLFeat (there is an option called render which allows you to do this). The ICCV paper mentioned in the other answer reconstructs HOG features into an image; it tries to show you "what the computer would have seen". The two are different; you may want to try both.
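For example, a minimal sketch using VLFeat's documented vl_hog interface (the image name is just a placeholder):
im = im2single(imread('peppers.png')); % any test image
cellSize = 8;
hog = vl_hog(im, cellSize); % compute HOG features
imhog = vl_hog('render', hog); % render them as a glyph image
imagesc(imhog); colormap gray; axis image;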
bs stands for bin size. Usually 8x8 bins are used (therefore bs = 8), but you should know which bin size was actually used, because it is a necessary parameter in computing HOG itself.
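For example, a sketch with hypothetical random weights (w must be s(1) x s(2) x 9, one plane per orientation, as in the HOGpicture code above):
w = rand(6, 6, 9); % hypothetical 6x6 grid of HOG cells
im = HOGpicture(w, 8); % bs = 8 yields a 48x48 glyph image
imagesc(im); colormap gray; axis image;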
The extractHOGFeatures function in the Computer Vision System Toolbox for MATLAB optionally returns a visualization object that lets you visualize the features.
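A minimal sketch following the documented usage:
I = imread('cameraman.tif'); % sample image shipped with MATLAB
[features, visualization] = extractHOGFeatures(I);
figure; imshow(I); hold on;
plot(visualization); % overlays the HOG glyph plot on the image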
I have a question, if that's OK. I was recently looking for an algorithm to calculate MFCCs. I found a good tutorial rather than code, so I tried to code it myself. I still feel like I am missing one thing. In the code below I take the FFT of a signal, calculate the normalized power, filter the signal using triangular shapes, and finally sum the energies corresponding to each bank to obtain the MFCCs.
function output = mfcc(x,M,fbegin,fs)
MF = #(f) 2595.*log10(1 + f./700);
invMF = #(m) 700.*(10.^(m/2595)-1);
M = M+2; % M triangular filters need M+2 boundary frequencies
mm = linspace(MF(fbegin),MF(fs/2),M); % equally spaced in mel frequency
ff = invMF(mm); % convert mel frequencies back to Hz
X = fft(x);
N = length(X); % length of a short time window
N2 = max([floor((N+1)/2) floor(N/2)+1]); % number of non-negative frequency bins
P = abs(X(1:N2,:)).^2./N; % periodogram: normalized power, one column per frame
mfccShapes = triangularFilterShape(ff,N,fs); % triangular mel filter bank
output = log(mfccShapes'*P); % log filter bank energies (the DCT is still missing, see PS)
end
function [out,k] = triangularFilterShape(f,N,fs)
N2 = max([floor((N+1)/2) floor(N/2)+1]);
M = length(f);
k = linspace(0,fs/2,N2);
out = zeros(N2,M-2);
for m=2:M-1
I = k >= f(m-1) & k <= f(m);
J = k >= f(m) & k <= f(m+1);
out(I,m-1) = (k(I) - f(m-1))./(f(m) - f(m-1));
out(J,m-1) = (f(m+1) - k(J))./(f(m+1) - f(m));
end
end
Could someone please confirm that this is all right, or point out where I made a mistake? I tested it on a simple pure tone and it gives me, in my opinion, reasonable answers.
Any help greatly appreciated :)
PS. I am working on how to apply a vectorized cosine transform. It looks like I would need an MxM matrix of transform coefficients, but I did not find any source that explains how to build it.
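For the vectorized cosine transform, the DCT-II can indeed be written as a single MxM matrix product; a sketch of the orthonormal form (here M is the number of filter bank channels and logE an M x numFrames matrix of log energies like the output above):
M = 26; % assumption: number of filter bank channels
[nn, kk] = meshgrid(0:M-1); % nn: channel index, kk: coefficient index
C = sqrt(2/M) .* cos(pi .* kk .* (2*nn + 1) ./ (2*M)); % DCT-II matrix
C(1,:) = C(1,:) ./ sqrt(2); % rescale the k = 0 row for orthonormality
% cepstra = C * logE; % all coefficients for all frames at once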
You can test it yourself by comparing your results against other implementations, like this one here:
You will find a fully configurable MATLAB toolbox, including MFCCs and even a function to invert MFCCs back to a time signal, which is quite handy for testing purposes:
melfcc.m - the main function for calculating PLP and MFCCs from sound waveforms; supports many options.
invmelfcc.m - the main function for inverting back from cepstral coefficients to spectrograms and (noise-excited) waveforms; its options exactly match melfcc (to invert that processing).
The page itself has a lot of information on the usage of the package.
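A quick round trip makes a handy sanity check; a sketch assuming the interfaces described on that page ('numcep' being one of the options documented there):
[d, sr] = audioread('speech.wav'); % any mono test file (hypothetical name)
cep = melfcc(d, sr, 'numcep', 13); % 13 cepstral coefficients per frame
xr = invmelfcc(cep, sr, 'numcep', 13); % noise-excited resynthesis
soundsc(xr, sr); % listen to what the MFCCs preserve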