I found code to visualize HOG features in the question "How are HoG features represented graphically?". It is done by two files from http://www.cs.berkeley.edu/~rbg/latent/index.html, visualizeHOG.m and HOGpicture.m, which is shown below (the code is released under an MIT license):
function im = HOGpicture(w, bs)
% Make picture of positive HOG weights.
%   im = HOGpicture(w, bs)
% construct a "glyph" for each orientation: a bs-by-bs image containing a
% vertical bar, rotated in 20-degree steps for the 9 orientation bins
bim1 = zeros(bs, bs);
bim1(:,round(bs/2):round(bs/2)+1) = 1;
bim = zeros([size(bim1) 9]);
bim(:,:,1) = bim1;
for i = 2:9,
  bim(:,:,i) = imrotate(bim1, -(i-1)*20, 'crop');
end
% make picture of positive weights by adding up weighted glyphs
s = size(w);
w(w < 0) = 0;
im = zeros(bs*s(1), bs*s(2));
for i = 1:s(1),
  iis = (i-1)*bs+1:i*bs;
  for j = 1:s(2),
    jjs = (j-1)*bs+1:j*bs;
    for k = 1:9,
      im(iis,jjs) = im(iis,jjs) + bim(:,:,k) * w(i,j,k);
    end
  end
end
I don't understand what the bs parameter is or what it means. Can anyone help me?
If you are looking to visualize HOG features, you can have a look here: http://web.mit.edu/vondrick/ihog/#code
It was recently published at ICCV 2013.
If you want to visualize HOG features, then use VLFeat (there is an option called render which allows you to do this). The ICCV paper mentioned in the other answer reconstructs HOG features into an image; it tries to show you "what the computer would have seen". The two are different, so you may want to try both.
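For reference, a minimal sketch of the VLFeat route (assuming VLFeat is installed and on the MATLAB path; the test image and the cell size of 8 are just examples):
im = im2single(imread('peppers.png'));   % any test image, converted to single
cellSize = 8;
hog = vl_hog(im, cellSize);              % compute HOG features
imhog = vl_hog('render', hog);           % the 'render' option builds a glyph image
imagesc(imhog); colormap gray; axis image off;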
bs stands for bin size. Usually 8x8 is used (therefore, bs = 8), but you should know what value of bin size was used, because it is a necessary parameter when computing the HOG features themselves.
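As a rough usage sketch of the HOGpicture function above (assuming w is a rows-by-cols-by-9 array of HOG weights and the bin size was 8, as described above):
bs = 8;                                  % bin/cell size used when the HOG was computed
im = HOGpicture(w, bs);                  % build the glyph image from the positive weights
imagesc(im); colormap gray; axis image off;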
The extractHOGFeatures function in the Computer Vision System Toolbox for MATLAB optionally returns a visualization object that lets you visualize the features.
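A minimal example of that route, following the documented usage of extractHOGFeatures:
I = imread('cameraman.tif');
[features, hogVisualization] = extractHOGFeatures(I, 'CellSize', [8 8]);
figure; imshow(I); hold on;
plot(hogVisualization);                  % overlays the HOG glyphs on the image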
I want to ask for your help with EEG data classification.
I am a graduate student trying to analyze EEG data.
Now I am struggling to classify ERP speller (P300) data with SWLDA using Matlab; maybe there is something wrong in my code.
I have read several articles, but they did not cover many details.
My data sizes are described below.
size(target) = [300 1856]
size(nontarget) = [998 1856]
Rows indicate trials and columns indicate the flattened features
(I stretched the [64 x 29] data of each trial into a single row; for visual representation I did not select an ROI).
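For illustration, a hedged sketch of that flattening, assuming each epoch is a [64 x 29] (channels x samples) matrix stored in a cell array targetEpochs (a hypothetical name):
target = zeros(numel(targetEpochs), 64*29);
for t = 1:numel(targetEpochs)
    target(t, :) = reshape(targetEpochs{t}, 1, []);   % one row per trial -> 1856 features
end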
I used the stepwisefit function in Matlab to classify target vs. non-target.
The code is attached below.
ingredients = [targets; nontargets];
heat = [class_targets; class_nontargets]; % target: 1, non-target: -1
randomized_set = shuffle([ingredients heat]); % shuffle: custom/toolbox helper that randomly permutes the rows
for k=1:10 % 10-fold cross validation
    partition_factor = ceil(size(randomized_set,1) / 10);
    cv_test_idx = (k-1)*partition_factor + 1:min(k * partition_factor, size(randomized_set,1));
    total_idx = 1:size(randomized_set,1);
    cv_train_idx = total_idx(~ismember(total_idx, cv_test_idx));
    % training fold
    ingredients = randomized_set(cv_train_idx, 1:end-1);
    heat = randomized_set(cv_train_idx, end);
    [W,SE,PVAL,INMODEL,STATS,NEXTSTEP,HISTORY] = stepwisefit(ingredients, heat, 'penter', .1);
    valid_id = find(INMODEL==1); % features retained by the stepwise selection
    v_weights = W(valid_id)';
    % test fold
    t_ingredients = randomized_set(cv_test_idx, 1:end-1);
    t_heat = randomized_set(cv_test_idx, end); % true labels for test set
    v_features = t_ingredients(:, valid_id);
    v_weights = repmat(v_weights, size(v_features, 1), 1);
    predictor = sum(v_weights .* v_features, 2); % linear score for each test trial
    m_result = predictor > 0; % class A: +1, B: 0
    t_heat(t_heat==-1) = 0;
    acc(k) = sum(m_result==t_heat) / length(m_result);
end
P.S. My code is currently very inefficient and might be badly written.
My understanding is that stepwisefit calculates significant coefficients at every step, and only the valid columns remain in the model.
Even though this is not LDA, for binary classification LDA and linear regression are essentially equivalent.
However, my results were close to random chance (for other binary data from the internet, it worked).
I think I did something wrong, and I hope you can correct me.
I will appreciate any suggestions and tips for implementing a classifier for an ERP speller.
Or any ideas for implementing SWLDA in Matlab?
The name SWLDA is only used in the context of Brain-Computer Interfaces, but I bet it has another name in a more general context.
If you track down the recipe of SWLDA you will end up at the Krusienski 2006 papers ("A comparison..." and "Toward enhanced P300...") and from there the book where stepwise regression is explained: Draper & Smith, Applied Regression Analysis, 1981. However, as far as I am aware, no paper actually gives the complete recipe for how to implement it (with all its details and secrets).
My approach was to use stepwiseglm:
H = predictors;     % training features (one column per observation)
TH = variables;     % test features to be classified
lbs = labels;       % class labels (1, 2)
if (stepwiseflag)
    % stepwise logistic regression selects the informative features
    mdl = stepwiseglm(H', lbs'-1,'constant','upper','linear','distr','binomial');
    if (mdl.NumEstimatedCoefficients>1)
        inmodel = [];
        for i=2:mdl.NumEstimatedCoefficients
            % coefficient names are 'x1', 'x2', ... -> recover the feature indices
            inmodel = [inmodel str2num(mdl.CoefficientNames{i}(2:end))];
        end
        H = H(inmodel,:);
        TH = TH(inmodel,:);
    end
end
% LDA on the selected features
lbls = classify(TH',H',lbs','linear');
You can also use a k-fold cross-validation approach using Matlab's cvpartition:
c = cvpartition(lbs,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...
    (sum(~strcmp(yt,classify(Xt,XT,yT,'linear'))));
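A hedged sketch of tying that together with crossval (assuming H and lbs as above; with numeric labels the comparison uses ~= rather than strcmp):
c = cvpartition(lbs, 'k', 10);
fun = @(XT, yT, Xt, yt) sum(yt ~= classify(Xt, XT, yT, 'linear'));  % misclassifications per fold
missed = crossval(fun, H', lbs', 'partition', c);
errRate = sum(missed) / sum(c.TestSize);                            % overall cross-validated error rate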
I would like to find the 5 images most similar to an input image.
To do this I thought I would use SIFT (from the VLFeat library) and compare the respective descriptors.
So I use the vl_ubcmatch function (doc here) to calculate a similarity measure between the images.
This is the code:
path_dir = './img/';
imgs = dir(path_dir);
imgs = imgs(3 : end);
numImgs = size(imgs);
numImgs = numImgs(1);
path1 = './img/car01.jpg';
Ia = imread(path1);
Ia = single(rgb2gray(Ia));
[fa, da] = vl_sift(Ia);
results = struct;
m = 0;
j = 1; % index of the image (for the loop)
for img = imgs'
    path = strcat(path_dir, img.name);
    if (strcmp(path1, path) == 0)     % skip the query image itself
        Ib = imread(path);
        Ib = single(rgb2gray(Ib));
        [fb, db] = vl_sift(Ib);
        [matches, scores] = vl_ubcmatch(da, db);
        s = sum(scores);
        [r, c] = size(scores);
        m = s ./ c;                   % mean matching score for this image pair
        results(j).measure = m;
        results(j).img = path;
        j = j + 1;
    end
end
As you can see from the code, I thought I would use the mean as a measure of similarity, but the results I get are not satisfactory (for example, it tells me that the input image of a cup is more similar to a tree than to another cup).
In your opinion, is it better to have more matching descriptors with lower similarity scores, or fewer matching descriptors with greater similarity?
I have 50 images of 5 different categories (cups, trees, people, tables and cars) and, given an image as input, the program should return the 5 images most similar to it, preferably belonging to the same category.
What measurement can I use instead of the mean to get a more precise classification?
Thanks!
According to your code, you measure the similarity between the query image (Ia) and all other images (Ib). Therefore you compare the SIFT descriptors of Ia with those of all Ib's, which gives you a list of feature matches for each image pair (matches) and the Euclidean distance of each feature pair (scores).
Now, using the mean of all scores of an image pair as a measure of similarity is not a very robust approach, because an image pair with only one feature match could (by chance) lead to a better "similarity" than an image pair with many feature matches - which I guess is not what you want for your task.
Concerning your question: it is always better to have meaningful/robust descriptors, even if there are only a few (of course, the more the better!), than to have a lot of meaningless descriptors.
Proposal: why don't you just count the number of inliers (= the number of feature matches for each image pair, i.e. size(matches, 2))?
There should be more inliers between images of the same object than between images of different objects, so the 5 pairs with the most inliers should be the most similar ones.
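As a rough sketch of that change inside the question's loop (same variables as above):
[matches, scores] = vl_ubcmatch(da, db);
results(j).measure = size(matches, 2);   % number of feature matches ("inliers")
results(j).img = path;
Afterwards you can rank the images, e.g. [~, order] = sort([results.measure], 'descend'); and take results(order(1:5)).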
If you just want to distinguish a cup from a tree, it should work. If your classification task gets more difficult and you need to distinguish different types of trees, SIFT is not the best algorithm to use. A learning approach will give better results... but that depends on your task.
I am trying to implement Bag of Words in OpenCV and have come up with the implementation below. I am using the Caltech 101 database. However, since it is my first time and I am not familiar with it, I have planned to use two image sets from the database: the chair image set and the soccer ball image set. I coded the SVM using this.
Everything went all right, except that when I call classifier.predict(descriptor), I do not get the label value as intended. I always get a 0 instead of '1', irrespective of my test image. The number of images in the chair dataset is 10 and in the soccer ball dataset is 10. I labelled chair as 0 and soccer ball as 1. The links represent samples of each category: the top 10 are chairs, the bottom 10 are soccer balls.
function hello
clear all; close all; clc;
detector = cv.FeatureDetector('SURF');
extractor = cv.DescriptorExtractor('SURF');
links = {
'http://i.imgur.com/48nMezh.jpg'
'http://i.imgur.com/RrZ1i52.jpg'
'http://i.imgur.com/ZI0N3vr.jpg'
'http://i.imgur.com/b6lY0bJ.jpg'
'http://i.imgur.com/Vs4TYPm.jpg'
'http://i.imgur.com/GtcwRWY.jpg'
'http://i.imgur.com/BGW1rqS.jpg'
'http://i.imgur.com/jI9UFn8.jpg'
'http://i.imgur.com/W1afQ2O.jpg'
'http://i.imgur.com/PyX3adM.jpg'
'http://i.imgur.com/U2g4kW5.jpg'
'http://i.imgur.com/M8ZMBJ4.jpg'
'http://i.imgur.com/CinqIWI.jpg'
'http://i.imgur.com/QtgsblB.jpg'
'http://i.imgur.com/SZX13Im.jpg'
'http://i.imgur.com/7zVErXU.jpg'
'http://i.imgur.com/uUMGw9i.jpg'
'http://i.imgur.com/qYSkqEg.jpg'
'http://i.imgur.com/sAj3pib.jpg'
'http://i.imgur.com/DMPsKfo.jpg'
};
N = numel(links);
trainer = cv.BOWKMeansTrainer(100);
train = struct('val',repmat({' '},N,1),'img',cell(N,1), 'pts',cell(N,1), 'feat',cell(N,1));
for i=1:N
    train(i).val = links{i};
    train(i).img = imread(links{i});
    if ndims(train(i).img > 2)
        train(i).img = rgb2gray(train(i).img);
    end;
    train(i).pts = detector.detect(train(i).img);
    train(i).feat = extractor.compute(train(i).img,train(i).pts);
end;
for i=1:N
    trainer.add(train(i).feat);
end;
dictionary = trainer.cluster();
extractor = cv.BOWImgDescriptorExtractor('SURF','BruteForce');
extractor.setVocabulary(dictionary);
for i=1:N
    desc(i,:) = extractor.compute(train(i).img,train(i).pts);
end;
a = zeros(1,10)';
b = ones(1,10)';
labels = [a;b];
classifier = cv.SVM;
classifier.train(desc,labels);
test_im =rgb2gray(imread('D:\ball1.jpg'));
test_pts = detector.detect(test_im);
test_feat = extractor.compute(test_im,test_pts);
val = classifier.predict(test_feat);
disp('Value is: ')
disp(val)
end
These are my test samples: a soccer ball image (source: timeslive.co.za) and a chair image.
Searching through this site, I think that my algorithm is okay, even though I am not quite confident about it. If anybody can help me find the bug, it would be appreciated.
Following Amro's code, this was my result:
Distribution of classes:
Value Count Percent
1 62 49.21%
2 64 50.79%
Number of training instances = 61
Number of testing instances = 65
Number of keypoints detected = 38845
Codebook size = 100
SVM model parameters:
svm_type: 'C_SVC'
kernel_type: 'RBF'
degree: 0
gamma: 0.5063
coef0: 0
C: 62.5000
nu: 0
p: 0
class_weights: 0
term_crit: [1x1 struct]
Confusion matrix:
ans =
29 1
1 34
Accuracy = 96.92 %
Your logic looks fine to me.
Now I guess you'll have to tweak the various parameters if you want to improve the classification accuracy. This includes the clustering algorithm parameters (such as the vocabulary size, clusters initialization, termination criteria, etc..), the SVM parameters (kernel type, the C coefficient, ..), the local features algorithm used (SIFT, SURF, ..).
Ideally, whenever you want to perform parameter selection, you ought to use cross-validation. Some methods already have such mechanism embedded (CvSVM::train_auto for instance), but for the most part you'll have to do this manually...
Finally you should follow general machine learning guidelines; see the whole bias-variance tradeoff dilemma. The online Coursera ML class discusses this topic in detail in week 6, and explains how to perform error analysis and use learning curves to decide what to try next (do we need to add more instances, increase model complexity, and so on..).
With that said, I wrote my own version of the code. You might wanna compare it with your code:
% dataset of images
% I previously saved them as: chair1.jpg, ..., ball1.jpg, ball2.jpg, ...
d = [
dir(fullfile('images','chair*.jpg')) ;
dir(fullfile('images','ball*.jpg'))
];
% local-features algorithm used
detector = cv.FeatureDetector('SURF');
extractor = cv.DescriptorExtractor('SURF');
% extract local features from images
t = struct();
for i=1:numel(d)
% load image as grayscale
img = imread(fullfile('images', d(i).name));
if ~ismatrix(img), img = rgb2gray(img); end
% extract local features
pts = detector.detect(img);
feat = extractor.compute(img, pts);
% store along with class label
t(i).img = img;
t(i).class = find(strncmp(d(i).name,{'chair','ball'},4));
t(i).pts = pts;
t(i).feat = feat;
end
% split into training/testing sets
% (a better way would be to use cvpartition from Statistics toolbox)
disp('Distribution of classes:')
tabulate([t.class])
tTrain = t([1:7 11:17]);
tTest = t([8:10 18:20]);
fprintf('Number of training instances = %d\n', numel(tTrain));
fprintf('Number of testing instances = %d\n', numel(tTest));
% build visual vocabulary (by clustering training descriptors)
K = 100;
bowTrainer = cv.BOWKMeansTrainer(K, 'Attempts',5, 'Initialization','PP');
clust = bowTrainer.cluster(vertcat(tTrain.feat));
fprintf('Number of keypoints detected = %d\n', numel([tTrain.pts]));
fprintf('Codebook size = %d\n', K);
% compute histograms of visual words for each training image
bowExtractor = cv.BOWImgDescriptorExtractor('SURF', 'BruteForce');
bowExtractor.setVocabulary(clust);
M = zeros(numel(tTrain), K);
for i=1:numel(tTrain)
M(i,:) = bowExtractor.compute(tTrain(i).img, tTrain(i).pts);
end
labels = vertcat(tTrain.class);
% train an SVM model (perform parameter selection using cross-validation)
svm = cv.SVM();
svm.train_auto(M, labels, 'SvmType','C_SVC', 'KernelType','RBF');
disp('SVM model parameters:'); disp(svm.Params)
% evaluate classifier using testing images
actual = vertcat(tTest.class);
pred = zeros(size(actual));
for i=1:numel(tTest)
descs = bowExtractor.compute(tTest(i).img, tTest(i).pts);
pred(i) = svm.predict(descs);
end
% report performance
disp('Confusion matrix:')
confusionmat(actual, pred)
fprintf('Accuracy = %.2f %%\n', 100*nnz(pred==actual)./numel(pred));
Here is the output:
Distribution of classes:
Value Count Percent
1 10 50.00%
2 10 50.00%
Number of training instances = 14
Number of testing instances = 6
Number of keypoints detected = 6300
Codebook size = 100
SVM model parameters:
svm_type: 'C_SVC'
kernel_type: 'RBF'
degree: 0
gamma: 0.5063
coef0: 0
C: 312.5000
nu: 0
p: 0
class_weights: []
term_crit: [1x1 struct]
Confusion matrix:
ans =
3 0
1 2
Accuracy = 83.33 %
So the classifier correctly labels 5 out of 6 images from the test set, which is not bad for a start :) Obviously you'll get different results each time you run the code due to the inherent randomness of the clustering step.
What is the number of images you are using to build your dictionary, i.e. what is N? From your code, it seems that you are only using 10 images (those listed in links). I hope this list was truncated for this post; otherwise that would be too few. Typically you need on the order of 1000 or many more images to build the dictionary, and the images need not be restricted to only the 2 classes that you are classifying. Otherwise, with only 10 images and 100 clusters, your dictionary is likely to be messed up.
Also, you might want to use SIFT as a first choice as it tends to perform better than the other descriptors.
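With mexopencv that switch is roughly the following (assuming your OpenCV build includes the nonfree/SIFT module):
detector  = cv.FeatureDetector('SIFT');
extractor = cv.DescriptorExtractor('SIFT');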
Lastly, you can also debug by checking the detected keypoints. You can get OpenCV to draw the keypoints. Sometimes your keypoint detector parameters are not set properly, resulting in too few keypoints getting detected, which in turn gives poor feature vectors.
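A hedged sketch of that check using mexopencv's cv.drawKeypoints (the file name is just a placeholder):
detector = cv.FeatureDetector('SURF');
img = imread('chair1.jpg');                        % any training image
if ~ismatrix(img), img = rgb2gray(img); end
pts = detector.detect(img);
fprintf('Detected %d keypoints\n', numel(pts));
imshow(cv.drawKeypoints(img, pts));                % visually inspect the detected keypoints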
To understand more about the BOW algorithm, you can take a look at these posts here and here. The second post has a link to a free PDF of an O'Reilly book on computer vision using Python. The BOW model (and other useful stuff) is described in more detail in that book.
Hope this helps.
I have a question, if that's ok. I was recently looking for an algorithm to calculate MFCCs. I found a good tutorial rather than code, so I tried to code it myself. I still feel like I am missing one thing. In the code below I take the FFT of a signal, calculate the normalized power, filter the signal using triangular filter shapes and eventually sum the energies corresponding to each filter bank to obtain the MFCCs.
function output = mfcc(x,M,fbegin,fs)
MF = @(f) 2595.*log10(1 + f./700);
invMF = @(m) 700.*(10.^(m/2595)-1);
M = M+2; % number of triangular filters
mm = linspace(MF(fbegin),MF(fs/2),M); % equal space in mel-frequency
ff = invMF(mm); % convert mel-frequencies into frequency
X = fft(x);
N = length(X); % length of a short time window
N2 = max([floor(N+1)/2 floor(N/2)+1]); % number of non-negative frequency bins (up to Nyquist)
P = abs(X(1:N2,:)).^2./N; % power spectrum (one periodogram per frame/column)
mfccShapes = triangularFilterShape(ff,N,fs); % triangular filter bank
output = log(mfccShapes'*P);
end
function [out,k] = triangularFilterShape(f,N,fs)
N2 = max([floor(N+1)/2 floor(N/2)+1]);
M = length(f);
k = linspace(0,fs/2,N2);
out = zeros(N2,M-2);
for m=2:M-1
I = k >= f(m-1) & k <= f(m);
J = k >= f(m) & k <= f(m+1);
out(I,m-1) = (k(I) - f(m-1))./(f(m) - f(m-1));
out(J,m-1) = (f(m+1) - k(J))./(f(m+1) - f(m));
end
end
Could someone please confirm that this is all right, or point out where I made a mistake? I tested it on a simple pure tone and it gives me, in my opinion, reasonable answers.
Any help greatly appreciated :)
P.S. I am working out how to apply a vectorized cosine transform (DCT). It looks like I would need an MxM matrix of transform coefficients, but I have not found any source that explains how to build it.
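For what it's worth, a hedged sketch of such a DCT-II matrix applied to the log filter-bank energies (assuming output is the M x nFrames matrix returned by the mfcc function above):
M = size(output, 1);                 % number of filter banks
k = (0:M-1)';                        % cepstral coefficient index (rows)
n = 0:M-1;                           % filter-bank index (columns)
D = cos(pi/M * k * (n + 0.5));       % DCT-II matrix, D(k+1,n+1) = cos(pi/M*k*(n+0.5))
mfccs = D * output;                  % each column holds the cepstral coefficients of one frame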
You can test it yourself by comparing your results against other implementations, like this one here.
There you will find a fully configurable Matlab toolbox, including MFCCs and even a function to reverse MFCCs back to a time signal, which is quite handy for testing purposes:
melfcc.m - main function for calculating PLP and MFCCs from sound waveforms, supports many options.
invmelfcc.m - main function for inverting back from cepstral coefficients to spectrograms and (noise-excited) waveforms, options exactly match melfcc (to invert that processing).
The page itself has a lot of information on the usage of the package.
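A hedged usage sketch of that toolbox for a side-by-side comparison (parameter names follow its documentation; the file name is a placeholder):
[x, fs] = audioread('speech.wav');                               % any test waveform
[cepstra, aspectrum, pspectrum] = melfcc(x, fs, 'numcep', 13);   % reference MFCCs to compare against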
Maximally Stable Extremal Regions (MSERs) are found from an image in Matlab using detectMSERFeatures.
Is there any patch or method to get the hierarchical MSER component tree from Matlab?
This tree is generated anyway when Matlab calculates the regions - it just returns only the most "stable" component from each region's tree. Since the tree is already present, I am searching for ways to expose it to user code from the Matlab libraries, which keep this part hidden and provide only the final "maximally stable" regions.
Anything would be acceptable - modifications of the Matlab inbuilt code, patches, hacks whatever. (I realize OpenCV has such a patch, however I am trying to avoid porting to OpenCV as most of the other procedures are written in Matlab).
EDIT: figure from the original hierarchical MSER paper - detected MSERs (left), MSER tree (right).
"Hierarchical MSER component tree" is a confusing phrase, because (1) the component tree is already hierarchical (2) if you want the whole tree, then you don't want only the Maximally Stable Extremal Regions (MSER), but instead you want all the extremal regions, and (3) extremal regions and components are the same thing in this context.
So let's say you want the extremal region tree. As noted in comments, you can't have exactly the one MATLAB uses because detectMSERFeatures.m calls a mex function that we don't have source code for (though, based on its inputs and name, it is likely very similar to the OpenCV MSER function). But you can still compute your own extremal region tree. Basically, what this code does is find connected components in the image at various levels of thresholding. Those CCs are the extremal regions. The trickiest part of the code is recording the parent relations. This should get you started:
% input image, included with MATLAB
x = imread('rice.png');
pixelList = {};
parents = [];
oldERsLabeled = zeros(size(x));
oldPixelList = {};
regionLabelOffset = 0;
for i = 255:-10:1 % the stride here is important, smaller will be slower and give more regions
newERs = bwlabel(x > i);
newERsLabeled = zeros(size(newERs));
newERsLabeled(newERs > 0) = newERs(newERs > 0) + regionLabelOffset;
regionLabelOffset = max(newERsLabeled(:));
% update the list of regions
props = regionprops(newERs, 'pixelList');
newPixelList = {props(:).PixelList};
pixelList = [pixelList newPixelList];
% figure out parents
newParents = cellfun(@(c)(newERsLabeled( sub2ind(size(x), c(1,2), c(1,1)))), oldPixelList);
parents = [parents; newParents'];
oldPixelList = newPixelList;
oldERsLabeled = newERsLabeled;
end
parents(end+1 : length(pixelList)) = -1; % top level regions have no parents
pixelListInt = cellfun(@int32, pixelList, 'UniformOutput', false);
regions = MSERRegions(pixelListInt');
% plot the first 300 regions
figure
imshow(x)
hold on
plot(regions(1:300), 'showEllipses', false, 'showPixelList', true);
% show all parents of a region ("close all" command might be useful after)
curRegion = 102;
while curRegion ~= -1
figure
imshow(x)
hold on
plot(regions(curRegion), 'showEllipses', false, 'showPixelList', true);
curRegion = parents(curRegion);
end