I know how to generate a combined image:
STEP1: I = imread('image.jpg');
STEP2: Ibw = single(im2double(I));
STEP3: [U S V] = svd(Ibw); % where U and V are the left and right odd vectors, respectively, and S is the
% diagonal matrix of particular values
% calculate derived image
STEP4: P = U * power(S, i) * V'; % where i is between 1 and 2
%To compute the combined image of SVD perturbations:
STEP5: J = (single(I) + (alpha*P))/(1+alpha); % where alpha is between 0 and 1
So by integrating P into I, we get a combined image J which keeps the main information of the original image and is expected to work better against minor changes of expression, illumination and occlusions.
I have some questions:
1) I would like to know in detail: what is the motivation for applying Step 3, and what exactly are we perturbing here?
2) In Step 3, what was meant by "particular values"?
3) Can the derived image P also be called "the perturbed image"?
Any help will be much appreciated!
This method originated from this paper that can be accessed here. Let's answer your questions in order.
If you want to know why this step is useful, you need a bit of theory about how the SVD works. SVD stands for Singular Value Decomposition. What the SVD does is transform your N-dimensional data so that its dimensions are ordered by how much variation they exhibit, in decreasing order (SVD experts and math purists... don't shoot me, this is how I understand the SVD to be). In this particular context, the singular values give you a weighting of how much each dimension of your data contributes to its overall decomposition.
Therefore, by applying that particular step (P = U * power(S, i) * V';), you give more emphasis to the "variation" in your data, so that the most important features in your image stand out while the unimportant ones "fade" away. This is really the only rationale I can see behind why they're doing this.
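As a small, hedged illustration (using cameraman.tif, which ships with the Image Processing Toolbox; any grayscale image works), you can look at the singular values directly to see how raising them to a power i > 1 reweights the spectrum towards the dominant components:
% Minimal sketch: how power(S, i) with i > 1 reweights the singular values.
I  = im2double(imread('cameraman.tif'));   % substitute any grayscale image
s  = svd(I);                               % singular values, largest first
s2 = s.^1.5;                               % same values after power(S, 1.5)
% Fraction of the total weight carried by the 10 largest values:
fprintf('original: %.3f   powered: %.3f\n', sum(s(1:10))/sum(s), sum(s2(1:10))/sum(s2));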
The "particular" values are the singular values. These values are part of the S matrix and those values appear in the diagonals of the matrix.
I wouldn't call P the derived image so much as an image that locates which parts of the original are more important than the rest. By mixing it with the original image, the features you should concentrate on are emphasized, while the parts of the image that most people wouldn't pay attention to get de-emphasized in the overall result.
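If it helps, here is a minimal, hedged sketch of the whole perturb-and-blend pipeline from your STEP1-STEP5 (I added a grayscale conversion, since svd needs a 2-D matrix; the values of i and alpha are just examples):
% Sketch of the perturb-and-blend pipeline, assuming an RGB input 'image.jpg'.
I  = imread('image.jpg');
Ig = im2double(rgb2gray(I));          % svd expects a 2-D matrix
[U, S, V] = svd(Ig);
i     = 1.5;                          % exponent between 1 and 2
alpha = 0.5;                          % blend weight between 0 and 1
P = U * power(S, i) * V';             % the "perturbed" image
J = (Ig + alpha * P) / (1 + alpha);   % combined image
imshowpair(Ig, J, 'montage');         % compare original and combined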
I would recommend you read that paper that you got this algorithm from as it explains the whole process fairly well.
Some more references for you
Take a look at this great tutorial on the SVD here. Also, this post may answer more questions regarding the insight of the algorithm.
I have to use SVD in Matlab to obtain a reduced version of my data.
I've read that the function svds(X,k) performs the SVD and returns the largest k singular values and the corresponding singular vectors. There is no mention in the documentation of whether the data has to be normalized.
By normalization I mean both subtraction of the mean value and division by the standard deviation.
When I implemented PCA myself, I used to normalize in that way. But I know that it is not needed when using the MATLAB function pca(), because it computes the covariance matrix using cov(), which implicitly removes the mean.
So, the question is: I need the projection matrix to reduce my n-dim data to k-dim by SVD. Should I normalize the training data (and therefore apply the same normalization to new data before projecting it), or not?
Thanks
Essentially, the answer is yes: you should typically perform normalization. The reason is that features can have very different scales, and we typically do not want to take scaling into account when considering how distinct the features are.
Suppose we have two features x and y, both with variance 1, but where x has a mean of 1 and y has a mean of 1000. Then the matrix of samples will look like
n = 500; % samples
x = 1 + randn(n,1);
y = 1000 + randn(n,1);
svd([x,y])
But the problem with this is that the scale of y (without normalizing) essentially washes out the small variations in x. Specifically, if we just examine the singular values of [x,y], we might be inclined to say that x is a linear factor of y (since one of the singular values is much smaller than the other). But actually, we know that that is not the case since x was generated independently.
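A hedged way to see this numerically (continuing the x/y example above; zscore is from the Statistics Toolbox and standardizes each column) is to compare the singular values before and after standardization:
% Compare singular values of the raw and the standardized sample matrix.
n = 500;
x = 1    + randn(n,1);
y = 1000 + randn(n,1);
M = [x, y];
svd(M)           % one huge value and one much smaller one: y's offset dominates
svd(zscore(M))   % two comparable values: x and y really are independent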
In fact, you will often find that you only see the "real" data in a signal once you remove the mean. At the extreme end, you could imagine that we have some feature
z = 1e6 + sin(t)
Now if somebody just gave you those numbers, you might look at the sequence
z = 1000000.54, 1000000.2, 1000000.4, ...
and just think, "that signal is boring, it's basically just 1e6 plus some round-off terms...". But once we remove the mean, we see the signal for what it actually is: a very interesting and specific one indeed. So, long story short, you should always remove the mean and scale.
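A minimal sketch of that last point (the time vector t is just an assumption for illustration):
% The "boring" signal only reveals its structure once the mean is removed.
t = linspace(0, 4*pi, 1000);
z = 1e6 + sin(t);
subplot(1,2,1); plot(t, z);            % looks like a flat line at 1e6
subplot(1,2,2); plot(t, z - mean(z));  % the sinusoid is clearly visible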
It really depends on what you want to do with your data. Centering and scaling can be helpful to obtain principal components that are representative of the shape of the variations in the data, irrespective of scaling. I would say it is mostly needed if you want to use the principal components themselves, particularly if you want to visualize them. It can also help during classification, since your scores will then be normalized, which may help your classifier. However, it depends on the application: in some applications the energy also carries useful information that should not be discarded - there is no general answer!
Now you write that all you need is "the projection matrix useful to reduce my n-dim data to k-dim ones by SVD". In this case, no need to center or scale anything:
[U,~] = svd(TrainingData);
ReducedData = U(:,1:k)' * TestData;
will do the job. svds may be worth considering when your TrainingData is huge in both dimensions, so that svd is too slow (if it is huge in only one dimension, just apply svd to the Gram matrix).
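In case it is useful, here is a hedged sketch of the svds variant; the assumption that the data is arranged as (features x samples), and the value of k, are just examples:
% Keep only the k largest singular triplets and use U as the projection.
k = 10;
[U, ~, ~] = svds(TrainingData, k);   % k largest singular vectors/values
ReducedTrain = U' * TrainingData;    % k x num_train_samples
ReducedTest  = U' * TestData;        % k x num_test_samples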
It depends!!!
A common use in signal processing where it makes no sense to normalize is noise reduction via dimensionality reduction of correlated signals, where all the features are contaminated with random Gaussian noise of the same variance. In that case, if the magnitude of a certain feature is twice as large, its SNR is also approximately twice as large, so normalizing the features makes no sense: it would just make the parts with the worse SNR larger and the parts with the good SNR smaller. You also don't need to subtract the mean in that case (as in PCA); the mean (or DC component) isn't different from any other frequency.
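For completeness, a hedged sketch of that kind of un-normalized low-rank noise reduction; the data matrix X and the rank r are assumptions:
% X: (samples x features) matrix of correlated signals with additive Gaussian
% noise of equal variance. No centering or scaling is applied.
r = 3;                                            % assumed number of real components
[U, S, V] = svd(X, 'econ');
Xdenoised = U(:,1:r) * S(1:r,1:r) * V(:,1:r)';    % best rank-r approximation of X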
Can someone explain the correlation function corr2 in MATLAB to me? I know that it is for comparing the similarity of 2D objects, but in the equation I have doubts about what A and B are (probably the matrices being compared), and also Amn and Bmn.
I'm not sure how MATLAB executes this function, because I have found in several cases that the correlation is not executed for the entire image (matrix) but instead it divides the image into blocks and then compares blocks of one picture with blocks of another picture.
In MATLAB's documentation, the corr2 equation is given without any reference explaining how it is derived, unlike other functions in MATLAB's documentation where the formula is traced to the book it is taken from and explained.
The correlation coefficient is a number representing the similarity between two images in terms of their respective pixel intensities.
As you pointed out, this function is used to calculate this coefficient:
r = sum_m sum_n (Amn - Ā)(Bmn - B̄) / sqrt( [sum_m sum_n (Amn - Ā)^2] * [sum_m sum_n (Bmn - B̄)^2] )
Here A and B are the images you are comparing, whereas the subscript indices m and n refer to the pixel location in the image. Basically, what MATLAB does is compute, for every pixel location in both images, the difference between the intensity value at that pixel and the mean intensity of the whole image, denoted by a letter with a bar over it.
As Kostya pointed out, typing edit corr2 in the command window will show you the code used by Matlab to compute the correlation coefficient. The formula is basically this:
a = a - mean2(a);
b = b - mean2(b);
r = sum(sum(a.*b))/sqrt(sum(sum(a.*a))*sum(sum(b.*b)));
where:
a is the input image and b is the image you wish to compare to a.
If we break down the formula, we see that a - mean2(a) and b - mean2(b) are the elements in the numerator of the above equation. mean2(a) is equivalent to mean(mean(a)) or mean(a(:)), that is, the mean intensity of the whole image. This is only calculated once.
The 3rd line of code calculates the coefficient. Here sum(sum(a.*b)) calculates the double-sum present in the formula element-wise, that is considering each pixel location separately. Be aware that using sum(a) calculates the sum in every column individually, hence in order to get a single value you need to apply sum twice.
Pretty much the same happens in the denominator; however, the calculations are performed on (a - mean2(a)).^2 and (b - mean2(b)).^2. You can see this as some kind of normalization process, in which you account for the spread of pixel intensities within each individual image.
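As a quick, hedged sanity check, the three lines above reproduce what corr2 returns; the test images here (cameraman.tif and a shifted copy of it) are just examples:
% Compare the manual computation with corr2 on two same-sized test images.
A = im2double(imread('cameraman.tif'));
B = circshift(A, [5 5]);                % an arbitrary second image of the same size
a = A - mean2(A);
b = B - mean2(B);
r_manual = sum(sum(a.*b)) / sqrt(sum(sum(a.*a)) * sum(sum(b.*b)));
r_corr2  = corr2(A, B);
fprintf('manual: %.6f   corr2: %.6f\n', r_manual, r_corr2);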
As for your last comment, you can break down an image into small blocks and calculate the correlation coefficient on them; that might save some time for very large images but since everything is vectorized the calculation is quite fast. It might be useful in distributed processing I guess. Of course the correlation coefficient between 2 blocks of images is not necessarily identical to that of the whole image.
For the sake of curiosity you can look at this paper which highlights some caveats in using the correlation coefficient for image comparison.
Hope that makes things a bit clearer!
My objective is to handle illumination and expression variations in an image. So I tried to implement MATLAB code that works with only the important information within the image - in other words, only the "USEFUL" information. To do that, it is necessary to remove all the unimportant information from the image.
Reference: this paper
Let's look at my steps:
1) Apply histogram equalization to get histo_equalized_image = histeq(MyGrayImage), so that large intensity variations can be handled to some extent.
2) Apply SVD approximations to the histo_equalized_image. To do that, I first applied the SVD decomposition ([L D R] = svd(histo_equalized_image)); the singular values are then used to build the derived image J = L * power(D, i) * R', where i varies between 1 and 2.
3) Finally, the derived image is combined with the original image: C = (MyGrayImage + (a*J)) / (1 + a), where a varies from 0 to 1.
4) But all the steps above cannot perform well under varying conditions, so finally a wavelet transform should be used to handle those variations (we use only the LL image block). The low-frequency component contains the useful information; unimportant information gets lost in this component. The (LL) component is insensitive to illumination changes and expression variations.
I wrote MATLAB code for this, and I would like to know whether my code is correct or not (and if not, how to correct it). Furthermore, I would like to know whether these steps can be optimized. Can this method be improved, and if so, how? I need help, please.
Let's now look at my MATLAB code:
%Read the RGB image
image=imread('img.jpg');
%convert it to grayscale
image_gray=rgb2gray(image);
%convert it to double
image_double=im2double(image_gray);
%Apply histogram equalization
histo_equalized_image=histeq(image_double);
%Apply the svd decomposition
[U S V] = svd(histo_equalized_image);
%calculate the derived image
P=U * power(S, 5/4) * V';
%Linearly combine both images
J=(single(histo_equalized_image) + (0.25 * P)) / (1 + 0.25);
%Apply DWT
[c,s]=wavedec2(J,2,'haar');
a1=appcoef2(c,s,'haar',1); % I need only the LL block.
You need to define what you mean by "USEFUL" or "important" information, and only then take these steps.
Histogram equalization is a global transformation, which gives different results on different images. You can run an experiment: apply histeq to an image that benefits from it. Then make two copies of the original image, draw a black square (30% of the image area) on one and a white square on the other, apply histeq to both, and compare the results.
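A hedged sketch of that experiment (the square size and position, and the choice of cameraman.tif, are just examples):
% histeq reacts very differently once a large dark or bright region is added.
I  = im2double(imread('cameraman.tif'));   % any grayscale image works
I1 = I;  I1(1:140, 1:140) = 0;             % copy with a big black square (~30% of the area)
I2 = I;  I2(1:140, 1:140) = 1;             % copy with a big white square
subplot(1,3,1); imshow(histeq(I));  title('original, equalized');
subplot(1,3,2); imshow(histeq(I1)); title('black square, equalized');
subplot(1,3,3); imshow(histeq(I2)); title('white square, equalized');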
"The low-frequency component contains the useful information; unimportant information gets lost in this component."
Really? Edges and shapes - which are (at least for me) quite important - live in the high frequencies. Again, we need a definition of "useful" information.
I cannot see the theoretical background for why and how your approach would work. Could you explain a bit why you chose this method?
P.S. I'm not sure if these papers are relevant to you, but I recommend "Which Edges Matter?" by Bansal et al. and "Multi-Scale Image Contrast Enhancement" by V. Vonikakis and I. Andreadis.
I want to compute a general PCA matrix for a dataset, and I will use it to reduce the dimensionality of SIFT descriptors. I have already found some algorithms to compute it, but I couldn't find a way to compute it using MATLAB.
Can someone help me?
[coeff, score] = princomp(X)
is the right thing to do, but knowing how to use it is a little tricky.
My understanding is that you did something like:
sift_image = sift_fun(img)
which gives you a binary image: sift_feature?
(Even if not binary, this still works.)
Inputs, formulating X:
To use princomp/pca, formulate X so that each column is a numel(sift_image) x 1 vector (i.e. sift_image(:)).
Do this for all your images and line them up as columns in X. So X will be numel(sift_image) x num_images.
If your images aren't the same size (e.g. pixel dimensions different, more or less of a scene in the images), then you'll need to bring them into some common space, which is a whole different problem.
Unless your stuff is binary, you'll probably want to de-mean/normalize X, both in the column direction (i.e. normalizing each individual image) and row direction (de-meaning the whole dataset).
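Here is a hedged sketch of that formulation; "images" is an assumed cell array of equally sized grayscale images, and the normalization choices are just examples:
% Build X with one vectorized image per column, normalize, then decompose.
num_images = numel(images);
X = zeros(numel(images{1}), num_images);
for k = 1:num_images
    col = double(images{k}(:));
    X(:,k) = col / norm(col);               % per-image (column) normalization
end
X = bsxfun(@minus, X, mean(X, 2));          % de-mean across the whole dataset
[coeff, score] = pca(X);                    % use princomp(X) on older releases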
Outputs
score is the set of eigenvectors: it will be num_pixels x num_images.
To get, say, the first eigenvector back into an image shape, do:
first_component = reshape(score(:,1),size(im));
And so on for the rest of the components. There are as many components as input images.
Each row of coeff is the num_images (equal to num_components) set of weights that can be applied to generate each input image. i.e.
input_image_1 = reshape(score * coeff(:,1) , size(original_im));
where input_image_1 is the correct, original shape
coeff(:,1) is a vector (num_images x 1)
score is pixels x num_images
(Disclaimer: I may have the columns/rows mixed up, but the descriptions are correct.)
Does that help?
If you have access to Statistics Toolbox, you can use the command princomp, or in recent versions the command pca.
I have two images I'm trying to co-register - ie, one could be of a ball in the centre of the picture, the other is of the same ball near the edge, and I'm trying to find the number of pixels I have to move the second image so that the balls would be in the same place. (I'm actually using 3D MRI brain scans, but the principle is the same.)
I've written a function that will move the ball left, right, up or down by a given number of pixels as well as another function that compares the correlation of the ball-in-the-centre image with the translated ball-at-the-edge image. When the two balls are in the same place the correlation function will return 0 and a number larger than 0 for other positions.
I'm trying to use fminsearch (documentation) to find the optimal translation for the correlation function's minimum (ie, the balls being in the same place) like so:
global reference_im unknown_im;
starting_trans = [0 0 0];
trans_vector = fminsearch(@correlate_images, starting_trans)
correlate_images.m:
function r = correlate_images(translate)
global reference_im unknown_im;
new_im = move_image(unknown_im,translate(1),translate(2),translate(3));
% This bit is unimportant to the question
% but you can see how I calculate my correlation
r = 1 - corr(reshape(new_im,[],1),reshape(reference_im,[],1));
There are two problems. Firstly, fminsearch insists on passing float values for the translation vector into the correlate_images function. Is there any way to inform it that only integers are necessary? (I would save a large number of CPU cycles!)
Secondly, when I run this program the resulting trans_vector is always the same as starting_trans - I assume this is because no minimum has been found, but is there another reason it's just plain not working?
Many thanks!
EDIT
I've discovered what I think is the reason the output trans_vector is always the same as starting_trans. fminsearch looks at the starting value, then a small increment in each direction from there; this small increment is always less than one pixel, which means the correlation reports a perfect match (as move_image returns the same as the input image for sub-pixel movements). I'm going to keep working on convincing MATLAB to run fminsearch over integer values only!
First, I'd say that Matlab might not be the best tool for this problem. I'd look at Elastix, which is a pretty user-friendly wrapper around the registration functions in ITK. You get a variety of registration techniques, and the manuals for both programs do a good job of explaining the specifics of image registration.
Second, for this kind of simple translational registration, you can use the FFT. Forward transform both images, multiply one transform pointwise by the complex conjugate of the other (that is, use A .* conj(B), not the matrix product A * B - those are different operations, and the first is what you want), and there will be a peak in the inverse transform whose offset from the origin is the translation you need. Numerical Recipes in C has a good explanation; here's a link to an index pdf. The speed difference between the FFT version and direct correlation is huge: the FFT is O(N log N), while the correlation method is O(N * M), where M is the number of pixels in your search neighborhood. If you want to allow the entire image to be searched, then correlation becomes O(N * N), which will take much longer than the FFT version. Changing parameters from floats to integers won't solve the problem.
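A hedged sketch of that idea for two same-sized 2-D images (the names fixed and moving are assumptions, and the sign of the recovered shift depends on which image you treat as the reference):
% FFT-based translational registration via circular cross-correlation.
F  = fft2(fixed);
M  = fft2(moving);
xc = ifft2(F .* conj(M));                    % circular cross-correlation
[~, idx] = max(abs(xc(:)));
[dy, dx] = ind2sub(size(xc), idx);           % peak location (1-based)
shift = [dy, dx] - 1;                        % offset of the peak from the origin
% Offsets beyond half the image size wrap around to negative shifts:
shift = shift - size(fixed) .* (shift > size(fixed)/2);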
The reason the fminsearch function uses floats (if I can guess at the reasons behind the coders' decisions) is that for problems that aren't test problems (ie, spheres in a volume), you often need sub-pixel resolution to perform a correct registration. Take a look at the ITK documentation about the reasons behind this approach.
Third, I'd suggest that a good way to write this program in Matlab (if you still want to do so!) while still forcing integer correlations would be to avoid the fminsearch function, which will want to use floats. Try something like:
startXPos = -10; % these parameters dictate the size of your search neighborhood
startYPos = -10; % corresponds to M in the above explanation
endXPos = 10;
endYPos = 10;
optimalX = 0;
optimalY = 0;
maxCorrVal = 0;
for i = startXPos:endXPos
    for j = startYPos:endYPos
        % test the correlation of the two images here, where one image is shifted to another
        currCorrVal = Correlate(image1, image2OffsetByiAndj);
        if (currCorrVal > maxCorrVal)
            maxCorrVal = currCorrVal;
            optimalX = i;
            optimalY = j;
        end
    end
end
From here, you just have to write the offset function. This way, you avoid the float problem, and you're also incrementing your translation vector (I don't see any way for that vector to move in your provided functions, which probably explains your lack of movement).
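If it helps, a hedged sketch of such an offset function (the name shift_image is just an example; pixels wrap around at the borders with circshift, and zero-padding would be the alternative):
% Save as shift_image.m: integer-pixel shift of a 2-D image.
function shifted = shift_image(im, dx, dy)
    shifted = circshift(im, [dy, dx]);   % shift down by dy rows, right by dx columns
end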
There is a very similar demo in the Image Processing Toolbox that uses the normalized cross-correlation function normxcorr2 to perform image registration. To avoid repeating the same thing, check out the demo directly:
Registering an Image Using Normalized Cross-Correlation
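For reference, a minimal, hedged sketch of that approach (template is assumed to be a region cropped from one image, scene the other image, both grayscale):
% Recover the translation of 'template' inside 'scene' with normalized cross-correlation.
c = normxcorr2(template, scene);
[~, idx] = max(abs(c(:)));
[ypeak, xpeak] = ind2sub(size(c), idx);
yoffset = ypeak - size(template, 1);     % row offset of the match's top-left corner
xoffset = xpeak - size(template, 2);     % column offset of the match's top-left corner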