I have downloaded three different HoG codes.
using the image of 64x128
1) using the matlab function:extractHOGFeatures,
[hog, vis] = extractHOGFeatures(img,'CellSize',[8 8]);
The size of hog is 3780.
How to calculate:
HOG feature length, N, is based on the image size and the function parameter values.
N = prod([BlocksPerImage, BlockSize, NumBins])
BlocksPerImage = floor((size(I)./CellSize – BlockSize)./(BlockSize – BlockOverlap) + 1)
2) the second HOG function is downloaded from here.
Same image is used
H = hog( double(rgb2gray(img)), 8, 9 );
% I - [mxn] color or grayscale input image (must have type double)
% sBin - [8] spatial bin size
% oBin - [9] number of orientation bins
The size of H is 3024
How to calculate:
H - [m/sBin-2 n/sBin-2 oBin*4] computed hog features
3) HoG code from vl_feat.
cellSize = 8;
hog = vl_hog(im2single(rgb2gray(img)), cellSize, 'verbose','variant', 'dalaltriggs') ;
vl_hog: image: [64 x 128 x 1]
vl_hog: descriptor: [8 x 16 x 36]
vl_hog: number of orientations: 9
vl_hog: bilinear orientation assignments: no
vl_hog: variant: DalalTriggs
vl_hog: input type: Image
the output is 4608.
Which one is correct?
All are correct. Thing is HOG feature extraction function default parameters vary with packages. (Eg - opencv, matlab, scikit-image etc). By parameters I mean, winsize, stride, blocksize, scale etc.
Usually HOG descriptor length is :
Length = Number of Blocks x Cells in each Block x Number of Bins in each Cell
Since all are correct, which one you may use can be answered in many ways.
You can experiment with different param values and choose the one that suits you. Since there is no fixed way to find right values, it would be helpful if you know how change in each parameters affect the result.
Cell-size : If you increase this, you may not capture small details.
Block-size : Again, large block with large cell size may not help you capture the small details. Also since large block means illumination variation can be more and due to gradient normalization step, lot of details will be lost. So choose accordingly.
Overlap/Stride: This again helps you capture more information about the image patch if you choose overlapping blocks. Usually it is set to half the blocksize.
You may have lot of information by choosing the values of the above params accordingly. But the descriptor length will become unnecessarily long.
Hope this helps :)
Related
with JPEG snoop for an image 4:2:2 hor (YYCbCr)
I see this in the SOF0:
Component[1]: ID=0x01, Samp Fac=0x21 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (Lum: Y)
Component[2]: ID=0x02, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cb)
Component[3]: ID=0x03, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cr)
Now where are the values 0x21 and 0x11 coming from?
I know that sampling factors are stored like this: (1byte) (bit 0-3 vertical., 4-7 horizontal.)
but I don't see how 0x11 relates to 2x1 and 0x21 to 1x1.
I expected to see 0x11 for Y component and not 0x21.
(not sure how you get 0x21 as result).
Can somebody explain these values and how you calculate them for example 4:2:2 horizontal (16x8)?
JPEG does it bassackwarks. The values indicate RELATIVE SAMPLING RATES.
The highest sampling rate is for Y (2). The sampling rate for Cb and Cr is 1.
Use the highest sampling rate to normalize to pixels:
2Y = Cb = Cr.
Y = 1/2 Cb = 1/2 Cr.
For every Y pixel value in that direction you use 1/2 a Cb and Cr pixel value.
You could even have something like according to the JPEG standard.
4Y = 3Cb = 1Cr
Y = 3/4Cb = 1/4 Cr
or
3Y=2Cb=1Cr
Y=2/3Cb=1/3Cr
But most decoders could not handle that.
The labels like "4:4:4", "4:2:2", and "4:4:0" are just that: labels that are not in the JPEG standard. Quite frankly, I don't even know where those term even come from and they are not intuitive at all (there is never a zero sampling).
Let me add another way of looking at this problem. But first, you have to keep in mind that the JPEG standard itself is not implementable. Things necessary to encode images are undefined and the standard is sprawling with unnecessary stuff.
If a scan is interleaved (all three components), it is encoded in minimum coded units (MCUs). An MCU consists of 8x8 encoded blocks.
The sampling rate specifies the number of 8x8 blocks in an MCU.
You have 2x1 for Y + 1x1 for Cb and 1x1 for Cr. That means a total of 4 8x8 blocks are in an MCU. While I mentioned other theoretical values above, the maximum number of blocks in an MCU is 10. Thus 4x4 + 3x3 + 2x2 is not possible.
The JPEG standard does not say how those blocks are mapped to pixels in an image. We usually use the largest value and say that wave a 2x1 zone or 16x8 pixels.
But all kinds of weirdness is possible under the standard, such as:
Y = 2x1, Cb = 1x2 and Cr = 1x1
That would probably mean an MCU maps to a 16x16 block of pixels but your decoder would probably not support this. Alternatively, it might mean an MCA maps to a 16x8 block of pixels and the Cb component has more values in the 8 direction.
A final way of viewing this (the practicable way) is to use the Y component as a reference point. Assume that Y is always going to have 1 or 2 (and maybe a 4) as the sampling rate in the X and Y directions and define the rates on Cb and Cr are going to be 1 (and maybe 2).The Y component always defines the pixels in the image.
These would then be realistic possibilities:
Y Cb Cr
1x1, 1x1, 1x1
2x2, 1x1, 1x1
4x4, 1x1, 1x1
2x1, 1x1, 1x1
1x2, 1x1, 1x1
To apply the combination of SVD perturbation:
I = imread('image.jpg');
Ibw = single(im2double(I));
[U S V] = svd(Ibw);
% calculate derviced image
P = U * power(S, i) * V'; % where i is between 1 and 2
%To compute the combined image of SVD perturbations:
J = (single(I) + (alpha*P))/(1+alpha); % where alpha is between 0 and 1
I applied this method to a specific face recognition model and I noticed the accuracy was highly increased!! So it is very efficient!. Interestingly, I used the value i=3/4 and alpha=0.25 according to a paper that was published in a journal in 2012 in which the authors used i=3/4 and alpha=0.25. But I didn't make attention that i must be between 1 and 2! (I don't know if the authors make an error of dictation or they in fact used the value 3/4). So I tried to change the value of i to a value greater than 1, the accuracy decreased!!. So can I use the value 3/4 ? If yes, how can I argument therefore my approach?
The paper that I read is entitled "Enhanced SVD based face recognition". In page 3, they used the value i=3/4.
(http://www.oalib.com/paper/2050079)
Kindly I need your help and opinions. Any help will be very appreciated!
The idea to have the value between one and two is to magnify the singular values to make them invariant to illumination changes.
Refer to this paper: A New Face Recognition Method based on SVD Perturbation for Single Example Image per Person: Daoqiang Zhang,Songcan Chen,and Zhi-Hua Zhou
Note that when n equals to 1, the derived image P is equivalent to the original image I . If we
choose n>1, then the singular values satisfying s_i > 1 will be magnified. Thus the reconstructed
image P emphasizes the contribution of the large singular values, while restraining that of the
small ones. So by integrating P into I , we get a combined image J which keeps the main
information of the original image and is expected to work better against minor changes of
expression, illumination and occlusions.
My take:
When you scale the singular values in the exponent, you are basically introducing a non-linearity, so its possible that for a specific dataset, scaling down the singular values may be beneficial. Its like adjusting the gamma correction factor in a monitor.
The objective is to see if two images, which have one object captured in each image, matches.
The object or image I have stored. This will be used as a baseline:
item1 (This is being matched in the code)
The object/image that needs to matched with-this is stored:
input (Need to see if this matches with what is stored
My method:
Covert images to gray-scale.
Extract SURF interest points.
Obtain features.
Match features.
Get 50 strongest features.
Match the number of strongest features with each image.
Take the ratio of- number of features matched/ number of strongest
features (which is 50).
If I have two images of the same object (two images taken separately on a camera), ideally the ratio should be near 1 or near 100%.
However this is not the case, the best ratio I am getting is near 0.5 or even worse, 0.3.
I am aware the SURF detectors and features can be used in neural networks, or using a statistics based approach. I believe I have approached the statistics based approach to some extent by using 50 of the strongest features.
Is there something I am missing? What do I add onto this or how do I improve it? Please provide me a point to start from.
%Clearing the workspace and all variables
clc;
clear;
%ITEM 1
item1 = imread('Loreal.jpg');%Retrieve order 1 and digitize it.
item1Grey = rgb2gray(item1);%convert to grayscale, 2 dimensional matrix
item1KP = detectSURFFeatures(item1Grey,'MetricThreshold',600);%get SURF dectectors or interest points
strong1 = item1KP.selectStrongest(50);
[item1Features, item1Points] = extractFeatures(item1Grey, strong1,'SURFSize',128); % using SURFSize of 128
%INPUT : Aquire Image
input= imread('MakeUp1.jpg');%Retrieve input and digitize it.
inputGrey = rgb2gray(input);%convert to grayscale, 2 dimensional matrix
inputKP = detectSURFFeatures(inputGrey,'MetricThreshold',600);%get SURF dectectors or interest
strongInput = inputKP.selectStrongest(50);
[inputFeatures, inputPoints] = extractFeatures(inputGrey, strongInput,'SURFSize',128); % using SURFSize of 128
pairs = matchFeatures(item1Features, inputFeatures, 'MaxRatio',1); %matching SURF Features
totalFeatures = length(item1Features); %baseline number of features
numPairs = length(pairs); %the number of pairs
percentage = numPairs/50;
if percentage >= 0.49
disp('We have this');
else
disp('We do not have this');
disp(percentage);
end
The baseline image
The input image
I would try not doing selectStrongest and not setting MaxRatio. Just call matchFeatures with the default options and compare the number of resulting matches.
The default behavior of matchFeatures is to use the ratio test to exclude ambiguous matches. So the number of matches it returns may be a good indicator of the presence or absence of the object in the scene.
If you want to try something more sophisticated, take a look at this example.
I am accessing 10 images from a folder "c1" and I have query image. I have implemented code for loading images in cell array and then I'm calculating histogram intersection between query image and each image from folder "c1" one-by-one. Now i want to draw precision-recall curve but i am not sure how to write code for getting "precision-recall curve" using the data obtained from histogram intersection.
My code:
Inp1=rgb2gray(imread('D:\visionImages\c1\1.ppm'));
figure, imshow(Inp1), title('Input image 1');
srcFiles = dir('D:\visionImages\c1\*.ppm'); % the folder in which images exists
for i = 1 : length(srcFiles)
filename = strcat('D:\visionImages\c1\',srcFiles(i).name);
I = imread(filename);
I=rgb2gray(I);
Seq{i}=I;
end
for i = 1 : length(srcFiles) % loop for calculating histogram intersections
A=Seq{i};
B=Inp1;
a = size(A,2); b = size(B,2);
K = zeros(a, b);
for j = 1:a
Va = repmat(A(:,j),1,b);
K(j,:) = 0.5*sum(Va + B - abs(Va - B));
end
end
Precision-Recall graphs measure the accuracy of your image retrieval system. They're also used in the performance of any search engine really, like text or documents. They're also used in machine learning evaluation and performance, though ROC Curves are what are more commonly used.
Precision-Recall graphs are more suitable for document and data retrieval. For the case of images here, given your query image, you are measuring how similar this image is with the rest of the images in your database. You then have similarity measures for each of the database images in relation to your query image and then you sort these similarities in descending order. For a good retrieval system, you would want the images that are the most relevant (i.e. what you are searching for) to all appear at the beginning, while the irrelevant images would appear after.
Precision
The definition of Precision is the ratio of the number of relevant images you have retrieved to the total number of irrelevant and relevant images retrieved. In other words, supposing that A was the number of relevant images retrieved and B was the total number of irrelevant images retrieved. When calculating precision, you take a look at the first several images, and this amount is A + B, as the total number of relevant and irrelevant images is how many images you are considering at this point. As such, another definition Precision is defined as the ratio of how many relevant images you have retrieved so far out of the bunch that you have grabbed:
Precision = A / (A + B)
Recall
The definition of Recall is slightly different. This evaluates how many of the relevant images you have retrieved so far out of a known total, which is the the total number of relevant images that exist. As such, let's say you again take a look at the first several images. You then determine how many relevant images there are, then you calculate how many relevant images that have been retrieved so far out of all of the relevant images in the database. This is defined as the ratio of how many relevant images you have retrieved overall. Supposing that A was again the total number of relevant images you have retrieved out of a bunch you have grabbed from the database, and C represents the total number of relevant images in your database. Recall is thus defined as:
Recall = A / C
How you calculate this in MATLAB is actually quite easy. You first need to know how many relevant images are in your database. After, you need to know the similarity measures assigned to each database image with respect to the query image. Once you compute these, you need to know which similarity measures map to which relevant images in your database. I don't see that in your code, so I will leave that to you. Once you do this, you then sort on the similarity values then you go through where in the sorted similarity values these relevant images occur. You then use these to calculate your precision and recall.
I'll provide a toy example so I can show you what the graph looks like as it isn't quite clear on how you're calculating your similarities here. Let's say I have 5 images in a database of 20, and I have a bunch of similarity values between them and a query image:
rng(123); %// Set seed for reproducibility
num_images = 20;
sims = rand(1,num_images);
sims =
Columns 1 through 13
0.6965 0.2861 0.2269 0.5513 0.7195 0.4231 0.9808 0.6848 0.4809 0.3921 0.3432 0.7290 0.4386
Columns 14 through 20
0.0597 0.3980 0.7380 0.1825 0.1755 0.5316 0.5318
Also, I know that images [1 5 7 9 12] are my relevant images.
relevant_IDs = [1 5 7 9 12];
num_relevant_images = numel(relevant_IDs);
Now let's sort the similarity values in descending order, as higher values mean higher similarity. You'd reverse this if you were calculating a dissimilarity measure:
[sorted_sims, locs] = sort(sims, 'descend');
locs will now contain the image ranks that each image ranked as. Specifically, these tell you which position in similarity the image belongs to. sorted_sims will have the similarities sorted in descending order:
sorted_sims =
Columns 1 through 13
0.9808 0.7380 0.7290 0.7195 0.6965 0.6848 0.5513 0.5318 0.5316 0.4809 0.4386 0.4231 0.3980
Columns 14 through 20
0.3921 0.3432 0.2861 0.2269 0.1825 0.1755 0.0597
locs =
7 16 12 5 1 8 4 20 19 9 13 6 15 10 11 2 3 17 18 14
Therefore, the 7th image is the highest ranked image, followed by the 16th image being the second highest image and so on. What you need to do now is for each of the images that you know are relevant, you need to figure out where these are located after sorting. We will go through each of the image IDs that we know are relevant, and figure out where these are located in the above locations array:
locations_final = arrayfun(#(x) find(locs == x, 1), relevant_IDs)
locations_final =
5 4 1 10 3
Let's sort these to get a better understand of what this is saying:
locations_sorted = sort(locations_final)
locations_sorted =
1 3 4 5 10
These locations above now tell you the order in which the relevant images will appear. As such, the first relevant image will appear first, the second relevant image will appear in the third position, the third relevant image appears in the fourth position and so on. These precisely correspond to part of the definition of Precision. For example, in the last position of locations_sorted, it would take ten images to retrieve all of the relevant images (5) in our database. Similarly, it would take five images to retrieve four relevant images in the database. As such, you would compute precision like this:
precision = (1:num_relevant_images) ./ locations_sorted;
Similarly for recall, it's simply the ratio of how many images were retrieved so far from the total, and so it would just be:
recall = (1:num_relevant_images) / num_relevant_images;
Your Precision-Recall graph would now look like the following, with Recall on the x-axis and Precision on the y-axis:
plot(recall, precision, 'b.-');
xlabel('Recall');
ylabel('Precision');
title('Precision-Recall Graph - Toy Example');
axis([0 1 0 1.05]); %// Adjust axes for better viewing
grid;
This is the graph I get:
You'll notice that between a recall ratio of 0.4 to 0.8 the precision is increasing a bit. This is because you have managed to retrieve a successive chain of images without touching any of the irrelevant ones, and so your precision will naturally increase. It goes way down after the last image, as you've had to retrieve so many irrelevant images before finally hitting a relevant image.
You'll also notice that precision and recall are inversely related. As such, if precision increases, then recall decreases. Similarly, if precision decreases, then recall will increase.
The first part makes sense because if you don't retrieve that many images in the beginning, you have a greater chance of not including irrelevant images in your results but at the same time, the amount of relevant images is rather small. This is why recall would decrease when precision would increase
The second part also makes sense because as you keep trying to retrieve more images in your database, you'll inevitably be able to retrieve all of the relevant ones, but you'll most likely start to include more irrelevant images, which would thus drive your precision down.
In an ideal world, if you had N relevant images in your database, you would want to see all of these images in the top N most similar spots. As such, this would make your precision-recall graph a flat horizontal line hovering at y = 1, which means that you've managed to retrieve all of your images in all of the top spots without accessing any irrelevant images. Unfortunately, that's never going to happen (or at least not for now...) as trying to figure out the best features for CBIR is still an on-going investigation, and no image search engine that I have seen has managed to get this perfect. This is still one of the most broadest and unsolved computer vision problems that exist today!
Edit
You retrieved this code to compute histogram intersection from this post. They have a neat way of computing histogram intersection as:
n is the total number of bins in your histogram. You'll have to play around with this to get good results, but we can leave that as a parameter in your code. The code above assumes that you have two matrices A and B where each column is a histogram. You'll generate a matrix that is of a x b, where a is the number of columns in A and b is the number of columns in b. The row and column of this matrix (i,j) tells you the similarity between the ith column in A with the b jth column in B. In your case, A would be a single column which denotes the histogram of your query image. B would be a 10 column matrix that denotes the histograms for each of the database images. Therefore, we will get a 1 x 10 array of similarity measures through histogram intersection.
As such, we need to modify your code so that you're using imhist for each of the images. We can also specify an additional parameter that gives you how many bins each histogram will have. Therefore, your code will look like this. Each new line that I have placed will have a %// NEW comment beside each line.
Inp1=rgb2gray(imread('D:\visionImages\c1\1.ppm'));
figure, imshow(Inp1), title('Input image 1');
num_bins = 32; %// NEW - I'm specifying 32 bins here. Play around with this yourself
A = imhist(Inp1, num_bins); %// NEW - Calculate histogram
srcFiles = dir('D:\visionImages\c1\*.ppm'); % the folder in which images exists
B = zeros(num_bins, length(srcFiles)); %// NEW - Store histograms of database images
for i = 1 : length(srcFiles)
filename = strcat('D:\visionImages\c1\',srcFiles(i).name);
I = imread(filename);
I=rgb2gray(I);
B(:,i) = imhist(I, num_bins); %// NEW - Put each histogram in a separate
%// column
end
%// NEW - Taken directly from the website
%// but modified for only one histogram in `A`
b = size(B,2);
Va = repmat(A, 1, b);
K = 0.5*sum(Va + B - abs(Va - B));
Take note that I have copied the code from the website, but I have modified it because there is only one image in A and so there is some code that isn't necessary.
K should now be a 1 x 10 array of histogram intersection similarities. You would then use K and assign sims to this variable (i.e. sims = K;) in the code I have written above, then run through your images. You also need to know which images are relevant images, and you'd have to change the code I've written to reflect that.
Hope this helps!
I'm looking through various online sources trying to learn some new stuff with matlab.
I can across a dilation function, shown below:
function rtn = dilation(in)
h =size(in,1);
l =size(in,2);
rtn = zeros(h,l,3);
rtn(:,:,1)=[in(2:h,:); in(h,:)];
rtn(:,:,2)=in;
rtn(:,:,3)=[in(1,:); in(1:h-1,:)];
rtn_two = max(rtn,[],3);
rtn(:,:,1)=[rtn_two(:,2:l), rtn_two(:,l)];
rtn(:,:,2)=rtn_two;
rtn(:,:,3)=[rtn_two(:,1), rtn_two(:,1:l-1)];
rtn = max(rtn,[],3);
The parameter it takes is: max(img,[],3) %where img is an image
I was wondering if anyone could shed some light on what this function appears to do and if there's a better (or less confusing way) to do it? Apart from a small wiki entry, I can't seem to find any documentation, hence asking for your help.
Could this be achieved with the imdilate function maybe?
What this is doing is creating two copies of the image shifted by one pixel up/down (with the last/first row duplicated to preserve size), then taking the max value of the 3 images at each point to create a vertically dilated image. Since the shifted copies and the original are layered in a 3-d matrix, max(img,[],3) 'flattens' the 3 layers along the 3rd dimension. It then repeats this column-wise for the horizontal part of the dilation.
For a trivial image:
00100
20000
00030
Step 1:
(:,:,1) (:,:,2) (:,:,3) max
20000 00100 00100 20100
00030 20000 00100 20130
00030 00030 20000 20030
Step 2:
(:,:,1) (:,:,2) (:,:,3) max
01000 20100 22010 22110
01300 20130 22013 22333
00300 20030 22003 22333
You're absolutely correct this would be simpler with the Image Processing Toolbox:
rtn = imdilate(in, ones(3));
With the original code, dilating by more than one pixel would require multiple iterations, and because it operates one dimension at a time it's limited to square (or possibly rectangular, with a bit of modification) structuring elements.
Your function replaces each element with the maximum value among the corresponding 3*3 kernel. By creating a 3D matrix, the function align each element with two of its shift, thus equivalently achieves the 3*3 kernel. Such alignment was done twice to find the maximum value along each column and row respectively.
You can generate a simple matrix to compare the result with imdilate:
a=magic(8)
rtn = dilation(a)
b=imdilate(a,ones(3))
Besides imdilate, you can also use
c=ordfilt2(a,9,ones(3))
to get the same result ( implements a 3-by-3 maximum filter. )
EDIT
You may have a try on 3D image with imdilate as well:
a(:,:,1)=magic(8);
a(:,:,2)=magic(8);
a(:,:,3)=magic(8);
mask = true(3,3,3);
mask(2,2,2) = false;
d = imdilate(a,mask);