How can I find quantized coefficients from MATLAB using Sallee's code? - matlab

First, I admit that this is a homework question, but I seem to be stuck. I need to get all quantized coefficients from a JPEG image using Phil Sallee's JPEG Toolbox (link listed at the bottom of the table under an "update" heading). I'll be building a histogram, but I can handle that part once I can get to the data I need. I have a JPEG image that is about 5 MB in size, and I get back this data when I run it through Sallee's code:
image_width: 3000
image_height: 4000
image_components: 3
image_color_space: 2
jpeg_components: 3
jpeg_color_space: 3
comments: {}
coef_arrays: {[4000x3000 double] [2000x3000 double] [2000x3000 double]}
quant_tables: {[8x8 double] [8x8 double]}
ac_huff_tables: [1x2 struct]
dc_huff_tables: [1x2 struct]
optimize_coding: 0
comp_info: [1x3 struct]
progressive_mode: 0
How do I get the quantized coefficients from this image? At first I tried something like this to just spit out the coefficients so I could see what I was dealing with:
pic = jpeg_read(image)
img_coef = pic.quant_tables{pic.comp_info(1).quant_tbl_no}
img_coef = pic.quant_tables{pic.comp_info(2).quant_tbl_no}
img_coef is assigned twice because quant_tables above has two elements. However, this seems like very few coefficients for such a large image. Can someone more knowledgeable than me point me in the right direction? Where and how do I pull the quantized coefficients from a JPEG image?

It appears that you already have the information you need. From the data you've provided, it looks like the JPEG toolbox decodes the quantized coefficients and loads them into coef_arrays. Your image uses chroma subsampling; this is indicated by the two color coefficient arrays being half the size of the luminance array along one dimension (2000x3000 versus 4000x3000). The 3 arrays represent (Y, Cb, Cr). There are 2 quantization tables because one is for the Y component and the other is shared by the Cb and Cr components. To de-quantize the coefficients, multiply each coefficient by the corresponding element of the matching quant_tables entry. For example, with zero-based indices, element [8, 10] of a coefficient array should be multiplied by element [0, 2] of its quantization table (in MATLAB's one-based indexing, that is element (9, 11) and table element (1, 3)). The 8x8 quantization table is re-used across every 8x8 block of coefficients. Within a JPEG file the coefficients of each block are normally stored in zig-zag order, but it appears that your toolbox has laid them out like a complete image.
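As a minimal sketch of that block-wise multiplication, assuming stand-in data in place of the real arrays (the names coef, qtable, tiled and deq are made up for illustration; with the toolbox output you would use something like im.coef_arrays{1} and the matching quant_tables entry), the 8x8 table can be tiled across the whole array with repmat. The histogram mentioned in the question can be built directly from the quantized values, without de-quantizing:

```matlab
% Stand-in data (assumption): a 16x24 array of small integers plays the role
% of one quantized coefficient array, and magic(8) plays the role of its
% 8x8 quantization table. Neither comes from a real JPEG file.
coef   = mod(reshape(1:16*24, 16, 24), 7) - 3;
qtable = magic(8);

% Repeat the 8x8 table so that it covers every 8x8 block of coefficients.
tiled = repmat(qtable, size(coef) / 8);

% De-quantize: element-wise product of coefficients and the tiled table.
deq = coef .* tiled;

% Histogram of the quantized coefficients, one bin per integer value.
edges  = min(coef(:)):max(coef(:));
counts = histc(coef(:), edges);
```

Here deq(9,11) equals coef(9,11) * qtable(1,3), which is the one-based form of the [8, 10] and [0, 2] example above.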

This will open a file, pull off the luminance, Cr and Cb arrays, and the two quantization arrays. It will then quantize luminance, Cr and Cb into their own variables.
im = jpeg_read(image);
% Pull image information - Lum, Cb, Cr
lum = im.coef_arrays{im.comp_info(1).component_id};
cb = im.coef_arrays{im.comp_info(2).component_id};
cr = im.coef_arrays{im.comp_info(3).component_id};
% Pull quantization arrays
lqtable = im.quant_tables{im.comp_info(1).quant_tbl_no};
cqtable = im.quant_tables{im.comp_info(2).quant_tbl_no};
% Quantize above two sets of information
lqcof = quantize(lum,lqtable);
bqcof = quantize(cb,cqtable);
rqcof = quantize(cr,cqtable);

Related

Confusion in different HOG codes

I have downloaded three different HoG codes.
using the image of 64x128
1) using the matlab function:extractHOGFeatures,
[hog, vis] = extractHOGFeatures(img,'CellSize',[8 8]);
The size of hog is 3780.
How to calculate:
HOG feature length, N, is based on the image size and the function parameter values.
N = prod([BlocksPerImage, BlockSize, NumBins])
BlocksPerImage = floor((size(I)./CellSize - BlockSize)./(BlockSize - BlockOverlap) + 1)
2) the second HOG function is downloaded from here.
Same image is used
H = hog( double(rgb2gray(img)), 8, 9 );
% I - [mxn] color or grayscale input image (must have type double)
% sBin - [8] spatial bin size
% oBin - [9] number of orientation bins
The size of H is 3024
How to calculate:
H - [m/sBin-2 n/sBin-2 oBin*4] computed hog features
3) HoG code from vl_feat.
cellSize = 8;
hog = vl_hog(im2single(rgb2gray(img)), cellSize, 'verbose','variant', 'dalaltriggs') ;
vl_hog: image: [64 x 128 x 1]
vl_hog: descriptor: [8 x 16 x 36]
vl_hog: number of orientations: 9
vl_hog: bilinear orientation assignments: no
vl_hog: variant: DalalTriggs
vl_hog: input type: Image
the output is 4608.
Which one is correct?
All are correct. The thing is that the default parameters of HOG feature extraction functions vary between packages (e.g. OpenCV, MATLAB, scikit-image). By parameters I mean window size, stride, block size, scale, etc.
Usually HOG descriptor length is :
Length = Number of Blocks x Cells in each Block x Number of Bins in each Cell
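To make the three lengths in the question concrete, here is a small worked calculation. It assumes a 128-row by 64-column image, 8x8 cells, 9 orientation bins, and the extractHOGFeatures defaults of 2x2 cells per block with a one-cell overlap; the variable names are only illustrative.

```matlab
imgSize      = [128 64];   % rows x columns (assumed orientation)
cellSize     = [8 8];
blockSize    = [2 2];      % cells per block (assumed default)
blockOverlap = [1 1];      % cells of overlap (assumed default)
numBins      = 9;

% 1) extractHOGFeatures: overlapping blocks, each holding 2x2 cells.
blocksPerImage = floor((imgSize ./ cellSize - blockSize) ./ ...
                       (blockSize - blockOverlap) + 1);    % [15 7]
N1 = prod([blocksPerImage, blockSize, numBins]);           % 15*7*2*2*9

% 2) hog(I, 8, 9): output is [m/sBin-2, n/sBin-2, oBin*4].
cellsPerImage = imgSize ./ cellSize;                       % [16 8]
N2 = prod([cellsPerImage - 2, numBins * 4]);               % 14*6*36

% 3) vl_hog (DalalTriggs variant): one 36-vector per cell.
N3 = prod([cellsPerImage, numBins * 4]);                   % 16*8*36
```

This gives N1 = 3780, N2 = 3024 and N3 = 4608, matching the three sizes reported above. The difference comes only from how each package handles blocks and border cells.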
Since all are correct, the question of which one to use can be answered in many ways.
You can experiment with different parameter values and choose the ones that suit you. Since there is no fixed way to find the right values, it helps to know how a change in each parameter affects the result.
Cell-size : If you increase this, you may not capture small details.
Block-size : Again, a large block with a large cell size may not help you capture small details. Also, a large block can contain more illumination variation, and the gradient normalization step can then wash out a lot of detail. So choose accordingly.
Overlap/Stride: Choosing overlapping blocks helps you capture more information about the image patch. Usually the overlap is set to half the block size.
You can capture a lot of information by choosing the values of the above parameters accordingly, but the descriptor may become unnecessarily long.
Hope this helps :)

which are the ranges in hsv image representation in matlab?

I have to compare several images of the same scene taken from different devices/positions. To do so, I want to quantize the colors in order to remove some color representation differences due to device and illumination.
If I work in RGB, I know that MATLAB represents each channel in the range [0 255]; if I work in YCbCr, I know that the ranges are [16 235] for Y and [16 240] for Cb and Cr. But if I want to work in HSV color space, I only know that converting with rgb2hsv gives me an image in which each channel is a double. I don't know whether the whole range between 0 and 1 is used for all three channels, so I cannot quantize without this information.
Parag basically answered your question, but if you want physical proof, you can do what chappjc suggested and just... try it yourself! Read in an image, convert it to HSV using rgb2hsv, and take a look at the distribution of values. For example, using onion.png that is part of MATLAB's system path, try something like:
im = imread('onion.png');
out = rgb2hsv(im);
str = 'HSV';
for idx = 1 : 3
disp(['Range of ', str(idx)]);
disp([min(min(out(:,:,idx))) max(max(out(:,:,idx)))]);
end
The above code will read in each channel and display the minimum and maximum in each (Hue, Saturation and Value). This is what I get:
Range of H
0 0.9991
Range of S
0.0791 1.0000
Range of V
0.0824 1.0000
As you can see, the values range between [0,1]. Have fun!
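Given that every channel lies in [0,1], one possible way to quantize is uniform binning per channel. This is only a sketch: the bin counts in nbins are arbitrary, and the input is a synthetic array standing in for the output of rgb2hsv.

```matlab
% Stand-in for out = rgb2hsv(im): a 4x5x3 double array covering [0,1].
hsv_img = reshape(linspace(0, 1, 60), 4, 5, 3);

nbins = [8 4 4];          % bins for H, S, V (arbitrary choice)
q = zeros(size(hsv_img));
for c = 1:3
    % Map [0,1] to bin indices 0..nbins(c)-1; the value 1.0 would land in
    % bin nbins(c), so clamp it into the last bin.
    q(:,:,c) = min(floor(hsv_img(:,:,c) * nbins(c)), nbins(c) - 1);
end
```

Keep in mind that hue is circular (values near 0 and near 1 are both red), so the first and last hue bins are neighbors in color even though their indices are far apart.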

How do I plot Precision-Recall graphs for Content-Based Image Retrieval in MATLAB?

I am reading 10 images from a folder "c1" and I have a query image. I have implemented code that loads the images into a cell array and then calculates the histogram intersection between the query image and each image from folder "c1", one by one. Now I want to draw a precision-recall curve, but I am not sure how to write code that produces it from the histogram intersection data.
My code:
Inp1=rgb2gray(imread('D:\visionImages\c1\1.ppm'));
figure, imshow(Inp1), title('Input image 1');
srcFiles = dir('D:\visionImages\c1\*.ppm'); % the folder in which images exists
for i = 1 : length(srcFiles)
filename = strcat('D:\visionImages\c1\',srcFiles(i).name);
I = imread(filename);
I=rgb2gray(I);
Seq{i}=I;
end
for i = 1 : length(srcFiles) % loop for calculating histogram intersections
A=Seq{i};
B=Inp1;
a = size(A,2); b = size(B,2);
K = zeros(a, b);
for j = 1:a
Va = repmat(A(:,j),1,b);
K(j,:) = 0.5*sum(Va + B - abs(Va - B));
end
end
Precision-Recall graphs measure the accuracy of your image retrieval system. They're also used to evaluate the performance of search engines in general, such as text or document search. They're also used in machine learning evaluation, though ROC curves are more commonly used there.
Precision-Recall graphs are more suitable for document and data retrieval. For the case of images here, given your query image, you are measuring how similar this image is with the rest of the images in your database. You then have similarity measures for each of the database images in relation to your query image and then you sort these similarities in descending order. For a good retrieval system, you would want the images that are the most relevant (i.e. what you are searching for) to all appear at the beginning, while the irrelevant images would appear after.
Precision
Precision is the ratio of the number of relevant images you have retrieved to the total number of images retrieved, relevant and irrelevant. In other words, suppose that A is the number of relevant images retrieved and B is the number of irrelevant images retrieved. When calculating precision, you take a look at the first several images; their count is A + B, since every image you are considering at this point is either relevant or irrelevant. As such, Precision is the fraction of the images you have grabbed so far that are relevant:
Precision = A / (A + B)
Recall
The definition of Recall is slightly different. It evaluates how many of the relevant images you have retrieved so far out of a known total, which is the total number of relevant images that exist in the database. Let's say you again take a look at the first several images. You determine how many of them are relevant, and then you divide by the number of relevant images in the whole database. Supposing that A is again the number of relevant images among those you have grabbed from the database, and C is the total number of relevant images in your database, Recall is defined as:
Recall = A / C
Calculating this in MATLAB is actually quite easy. You first need to know how many relevant images are in your database. Next, you need the similarity measure assigned to each database image with respect to the query image. Once you compute these, you need to know which similarity measures belong to the relevant images in your database. I don't see that in your code, so I will leave that to you. Once you have this, you sort the similarity values and find the positions at which the relevant images occur in the sorted list. You then use these positions to calculate your precision and recall.
I'll provide a toy example to show you what the graph looks like, as it isn't quite clear how you're calculating your similarities here. Let's say I have 5 relevant images in a database of 20, and I have a bunch of similarity values between the database images and a query image:
rng(123); %// Set seed for reproducibility
num_images = 20;
sims = rand(1,num_images);
sims =
Columns 1 through 13
0.6965 0.2861 0.2269 0.5513 0.7195 0.4231 0.9808 0.6848 0.4809 0.3921 0.3432 0.7290 0.4386
Columns 14 through 20
0.0597 0.3980 0.7380 0.1825 0.1755 0.5316 0.5318
Also, I know that images [1 5 7 9 12] are my relevant images.
relevant_IDs = [1 5 7 9 12];
num_relevant_images = numel(relevant_IDs);
Now let's sort the similarity values in descending order, as higher values mean higher similarity. You'd reverse this if you were calculating a dissimilarity measure:
[sorted_sims, locs] = sort(sims, 'descend');
locs now contains the original image IDs in ranked order: locs(k) is the ID of the image with the k-th highest similarity. sorted_sims has the similarities sorted in descending order:
sorted_sims =
Columns 1 through 13
0.9808 0.7380 0.7290 0.7195 0.6965 0.6848 0.5513 0.5318 0.5316 0.4809 0.4386 0.4231 0.3980
Columns 14 through 20
0.3921 0.3432 0.2861 0.2269 0.1825 0.1755 0.0597
locs =
7 16 12 5 1 8 4 20 19 9 13 6 15 10 11 2 3 17 18 14
Therefore, the 7th image is the highest ranked image, followed by the 16th image being the second highest image and so on. What you need to do now is for each of the images that you know are relevant, you need to figure out where these are located after sorting. We will go through each of the image IDs that we know are relevant, and figure out where these are located in the above locations array:
locations_final = arrayfun(@(x) find(locs == x, 1), relevant_IDs)
locations_final =
5 4 1 10 3
Let's sort these to get a better understanding of what this is saying:
locations_sorted = sort(locations_final)
locations_sorted =
1 3 4 5 10
These locations above now tell you the order in which the relevant images will appear. As such, the first relevant image will appear first, the second relevant image will appear in the third position, the third relevant image appears in the fourth position and so on. These precisely correspond to part of the definition of Precision. For example, in the last position of locations_sorted, it would take ten images to retrieve all of the relevant images (5) in our database. Similarly, it would take five images to retrieve four relevant images in the database. As such, you would compute precision like this:
precision = (1:num_relevant_images) ./ locations_sorted;
Similarly for recall, it's simply the ratio of how many images were retrieved so far from the total, and so it would just be:
recall = (1:num_relevant_images) / num_relevant_images;
Your Precision-Recall graph would now look like the following, with Recall on the x-axis and Precision on the y-axis:
plot(recall, precision, 'b.-');
xlabel('Recall');
ylabel('Precision');
title('Precision-Recall Graph - Toy Example');
axis([0 1 0 1.05]); %// Adjust axes for better viewing
grid;
This is the graph I get:
You'll notice that between a recall ratio of 0.4 to 0.8 the precision is increasing a bit. This is because you have managed to retrieve a successive chain of images without touching any of the irrelevant ones, and so your precision will naturally increase. It goes way down after the last image, as you've had to retrieve so many irrelevant images before finally hitting a relevant image.
You'll also notice that precision and recall generally trade off against each other: results with high precision tend to have low recall, and results with high recall tend to have low precision.
The first part makes sense because if you retrieve only a few images at the beginning, you have a greater chance of not including irrelevant images in your results, but at the same time the number of relevant images retrieved is rather small. This is why recall tends to be low when precision is high.
The second part also makes sense because as you keep trying to retrieve more images in your database, you'll inevitably be able to retrieve all of the relevant ones, but you'll most likely start to include more irrelevant images, which would thus drive your precision down.
In an ideal world, if you had N relevant images in your database, you would want to see all of these images in the top N most similar spots. This would make your precision-recall graph a flat horizontal line at y = 1, which means that you've managed to retrieve all of your relevant images in the top spots without accessing any irrelevant images. Unfortunately, that's never going to happen (or at least not for now...) as trying to figure out the best features for CBIR is still an on-going investigation, and no image search engine that I have seen has managed to get this perfect. This is still one of the broadest unsolved computer vision problems that exist today!
Edit
You retrieved this code to compute histogram intersection from this post. They have a neat way of computing histogram intersection as:
n is the total number of bins in your histogram. You'll have to play around with this to get good results, but we can leave that as a parameter in your code. The code above assumes that you have two matrices A and B where each column is a histogram. You'll generate a matrix of size a x b, where a is the number of columns in A and b is the number of columns in B. The entry at row i and column j of this matrix tells you the similarity between the ith column of A and the jth column of B. In your case, A would be a single column which denotes the histogram of your query image. B would be a 10 column matrix that denotes the histograms for each of the database images. Therefore, we will get a 1 x 10 array of similarity measures through histogram intersection.
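The expression 0.5*sum(Va + B - abs(Va - B)) used in that code relies on the identity min(a,b) = (a + b - |a - b|)/2, so it sums the bin-wise minimum of two histograms. A small check with made-up 4-bin histograms (the values are arbitrary and only serve to compare the two forms):

```matlab
a = [3 0 5 2]';                 % "query" histogram, one column
B = [1 4; 0 0; 7 5; 2 1];       % two "database" histograms, one per column

Va = repmat(a, 1, size(B, 2));  % copy the query next to each column of B

K_trick = 0.5 * sum(Va + B - abs(Va - B));  % form used in the code
K_min   = sum(min(Va, B));                  % explicit bin-wise minimum
```

Both give [8 9] here: one intersection score per database histogram.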
As such, we need to modify your code so that you're using imhist for each of the images. We can also specify an additional parameter that gives you how many bins each histogram will have. Therefore, your code will look like this. Each new line that I have placed will have a %// NEW comment beside each line.
Inp1=rgb2gray(imread('D:\visionImages\c1\1.ppm'));
figure, imshow(Inp1), title('Input image 1');
num_bins = 32; %// NEW - I'm specifying 32 bins here. Play around with this yourself
A = imhist(Inp1, num_bins); %// NEW - Calculate histogram
srcFiles = dir('D:\visionImages\c1\*.ppm'); % the folder in which images exists
B = zeros(num_bins, length(srcFiles)); %// NEW - Store histograms of database images
for i = 1 : length(srcFiles)
filename = strcat('D:\visionImages\c1\',srcFiles(i).name);
I = imread(filename);
I=rgb2gray(I);
B(:,i) = imhist(I, num_bins); %// NEW - Put each histogram in a separate column
end
%// NEW - Taken directly from the website
%// but modified for only one histogram in `A`
b = size(B,2);
Va = repmat(A, 1, b);
K = 0.5*sum(Va + B - abs(Va - B));
Take note that I have copied the code from the website, but I have modified it because there is only one image in A and so there is some code that isn't necessary.
K should now be a 1 x 10 array of histogram intersection similarities. You would then use K and assign sims to this variable (i.e. sims = K;) in the code I have written above, then run through your images. You also need to know which images are relevant images, and you'd have to change the code I've written to reflect that.
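As a rough sketch of that last step, the following takes a 1 x 10 score vector and a list of relevant image IDs and produces the precision and recall values. The scores and the IDs in relevant_IDs are made up for illustration, and the ranks are found with ismember and cumsum, a vectorized alternative to the arrayfun/find approach shown earlier.

```matlab
% Made-up scores standing in for the histogram intersection output K.
K = [0.9 0.2 0.8 0.4 0.7 0.1 0.3 0.6 0.5 0.05];
relevant_IDs = [1 3 8];             % hypothetical ground truth

sims = K;
[~, locs] = sort(sims, 'descend');  % image IDs in ranked order

is_relevant = ismember(locs, relevant_IDs);  % relevance flag per rank
ranks = find(is_relevant);                   % ranks of the relevant images
hits  = cumsum(is_relevant);                 % relevant images seen so far

precision = hits(ranks) ./ ranks;
recall    = hits(ranks) / numel(relevant_IDs);
```

For these values the relevant images land at ranks 1, 2 and 4, so precision is [1 1 0.75] and recall is [1/3 2/3 1]. The result can be plotted with plot(recall, precision, 'b.-') as before.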
Hope this helps!

Calculating value y for each XI and XII in MATLAB:

I am currently working in MATLAB to design a way to reconstruct 3D data. For this I have two pictures with black points. The difference in the number of points per frame is key for the reconstruction, but MATLAB gives an error when the matrices are not the same size. This is happening because the code is not doing what I want it to do, so can anyone help me with the following?
I have two columns of X data: XLI and XRI
What MATLAB does when I do XLI-XRI is subtract the pairs, i.e. XLI(1)-XRI(1) etc., but I want to subtract each value of XRI from every value of XLI, i.e.
XLI(1)-XRI(1,2,3,4 etc)
XLI(2)-XRI(1 2 3 4 etc)
and so on
Can anyone help?
I think you are looking for a way to subtract all combinations from each other. Here is an example of how you can do that with bsxfun:
xLI = [1 2 3]
xRI = [1 2]
bsxfun(@minus,xLI ,xRI')
I cannot comment on Dennis's post (not enough points on this website): his solution should work, but depending on your version of Matlab you might get an "Error using ==> bsxfun" and need to transpose either xLI or xRI for that to work:
bsxfun(@minus,xLI' ,xRI)
Best,
Tepp
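For column data like the XLI and XRI described in the question, a small sketch of the same idea follows (the values are made up). It forces one input to be a column and the other a row, which sidesteps the orientation issue. On MATLAB R2016b or later, implicit expansion lets the plain minus operator do the same thing without bsxfun.

```matlab
xLI = [1; 2; 3];     % column of X data (made-up values)
xRI = [1; 2];        % column of X data with a different length

% Column minus row gives every pairwise difference: D(i,j) = xLI(i) - xRI(j).
D1 = bsxfun(@minus, xLI(:), xRI(:).');

% Same result via implicit expansion (assumes R2016b or later).
D2 = xLI(:) - xRI(:).';
```

The result is a 3 x 2 matrix, one row per element of xLI and one column per element of xRI.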

Matlab fast neighborhood operation

I have a problem. I have a matrix A with integer values between 0 and 5, for example:
x=randi(5,10,10)
Now I want to apply a filter of size 3x3 that gives me the most common value in each neighborhood.
I have tried 2 solutions:
fun = @(z) mode(z(:));
y1 = nlfilter(x,[3 3],fun);
which takes very long...
and
y2 = colfilt(x,[3 3],'sliding',@mode);
which also takes long.
I have some really big matrices and both solutions take a long time.
Is there any faster way?
+1 to @Floris for the excellent suggestion to use hist. It's very fast. You can do a bit better though. hist is based on histc, which can be used instead. histc is a compiled function, i.e., not written in Matlab, which is why the solution is much faster.
Here's a small function that attempts to generalize what @Floris did (also, that solution returns a vector rather than the desired matrix) and achieve what you're doing with nlfilter and colfilt. It doesn't require that the input have particular dimensions and uses im2col to efficiently rearrange the data. In fact, the first three lines and the call to im2col are virtually identical to what colfilt does in your case.
function a=intmodefilt(a,nhood)
[ma,na] = size(a);
aa(ma+nhood(1)-1,na+nhood(2)-1) = 0;
aa(floor((nhood(1)-1)/2)+(1:ma),floor((nhood(2)-1)/2)+(1:na)) = a;
[~,a(:)] = max(histc(im2col(aa,nhood,'sliding'),min(a(:))-1:max(a(:))));
a = a-1;
Usage:
x = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);
For large arrays, this is over 75 times faster than colfilt on my machine. Replacing hist with histc is responsible for a factor of two speedup. There is of course no input checking so the function assumes that a is all integers, etc.
Lastly, note that randi(IMAX,N,N) returns values in the range 1:IMAX, not 0:IMAX as you seem to state.
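If the Image Processing Toolbox functions are not available, the same per-neighborhood counting idea can be sketched with conv2 alone. This is a different technique from the histc/im2col function above, shown only as an illustration: it counts each candidate value separately with a 3x3 box filter and then picks the value with the highest count. The test matrix and the value range 0:5 are assumptions.

```matlab
x = [1 1 2; 1 3 2; 2 2 2];   % small made-up input
vals = 0:5;                  % candidate values (assumed range)

% counts(:,:,k) holds, for each pixel, how many pixels in its 3x3
% neighborhood equal vals(k). Pixels outside the image contribute nothing.
counts = zeros([size(x) numel(vals)]);
for k = 1:numel(vals)
    counts(:,:,k) = conv2(double(x == vals(k)), ones(3), 'same');
end

% Pick the most frequent value; on ties max returns the first index,
% i.e. the smallest value, which matches the behavior of mode.
[~, idx] = max(counts, [], 3);
y = vals(idx);
```

Note that the border handling differs from approaches that pad the image with zeros, where the padded zeros are counted as the value 0.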
One suggestion would be to reshape your array so each 3x3 block becomes a column vector. If your initial array dimensions are divisible by 3, this is simple. If they aren't, you need to work a little bit harder. And you need to repeat this nine times, starting at different offsets into the matrix - I will leave that as an exercise.
Here is some code that shows the basic idea (using only functions available in FreeMat - I don't have Matlab on my machine at home...):
N = 100;
A = randi(0,5*ones(3*N,3*N)); % FreeMat syntax; the MATLAB equivalent is randi([0 5], 3*N, 3*N)
B = reshape(permute(reshape(A,[3 N 3 N]),[1 3 2 4]), [ 9 N*N]);
hh = hist(B, 0:5); % histogram of each 3x3 block: bin with largest value is the mode
[mm mi] = max(hh); % mi will contain bin with largest value
figure; hist(B(:),0:5); title 'histogram of B'; % flat, as expected
figure; hist(mi-1, 0:5); title 'histogram of mi' % not flat?...
Here are the plots:
The strange thing, when you run this code, is that the distribution of mi is not flat, but skewed towards smaller values. When you inspect the histograms, you will see that is because you will frequently have more than one bin with the "max" value in it. In that case, you get the first bin with the max number. This is obviously going to skew your results badly; something to think about. A much better filter might be a median filter - the one that has equal numbers of neighboring pixels above and below. That has a unique solution (while mode can have up to four values, for nine pixels - namely, four bins with two values each).
Something to think about.
Can't show you a mex example today (wrong computer); but there are ample good examples on the Mathworks website (and all over the web) that are quite easy to follow. See for example http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/