I implemented Grad-CAM and Guided Backprop as presented in the paper and everything is working as expected. The next step is to combine the class activation map and the gradient map to get the final weighted gradients. In the paper this is done by point-wise multiplication:
In order to combine the best aspects of both, we fuse Guided Backpropagation and Grad-CAM visualizations via pointwise multiplication (Grad-CAM is first up-sampled to the input image resolution using bi-linear interpolation)
The corresponding figure (cropped) is:
My problem is as follows: The class activation map contains mostly 0's, i.e. the blue regions, which produce 0's when multiplied with the gradients. However, in the paper's figure the Guided Grad-CAM map is mostly grey.
I'm aware that the grey area in the gradient map is due to the gradients being 0 in most places and normalization to the range [0,1] will put them somewhere around 0.5 (assuming that we have both positive and negative gradients with a similar magnitude). Still, multiplication with 0 will result in 0, which should be displayed as black.
For comparison my maps look like this:
Can anyone explain what operation is used to combine both maps? Or am I missing something else?
Thanks in advance.
All assumptions are correct. The thing I was missing is that in the case of guided Grad-CAM the weighting of the gradients is done before the normalization to the range [0,1].
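To make the order of operations concrete, here is a minimal sketch (the variable names cam, gb and ggc are my own, not from the paper; gb is assumed to be a single-channel map of raw, un-normalized Guided Backprop gradients):

cam = imresize(cam, [size(gb,1) size(gb,2)], 'bilinear');   % up-sample Grad-CAM to input resolution
ggc = gb .* cam;                                            % point-wise weighting of the RAW gradients
ggc = (ggc - min(ggc(:))) / (max(ggc(:)) - min(ggc(:)));    % normalize to [0, 1] only afterwards
imshow(ggc);

Because the weighted gradients still contain negative values, their zeros land near mid-grey after the normalization, which reproduces the mostly grey look of the paper's figure.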
I found a MATLAB implementation of the LKT algorithm here; it is based on the brightness constancy equation.
The algorithm calculates the image gradients in the x and y directions by convolving the image with appropriate 2x2 horizontal and vertical edge gradient operators.
In the classic literature, the brightness constancy equation has on its right-hand side the difference between two successive frames.
However, in the implementation linked above, the right-hand side is a difference of convolutions:
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
Why couldn't It_m be simply calculated as:
it_m = im1 - im2;
As you mentioned, in theory the temporal term for optical flow is just the pixel-by-pixel difference between frames.
However, in practice all natural (non-synthetic) images contain some noise, and differentiation acts as a high-pass filter, so it amplifies the noise relative to the signal.
Therefore, to avoid noise artifacts, an image is usually smoothed (low-pass filtered) before any differentiation; the same is done in edge detection. The code does exactly this: it applies a moving-average filter to each frame to reduce the effect of noise.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
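For comparison, a minimal sketch of the two alternatives (im1 and im2 are assumed to be grayscale frames of class double; the 1/4 scaling and the 'valid' option are my additions and are not part of the quoted code):

It_naive = im1 - im2;                                              % plain difference: passes noise straight through
box      = ones(2) / 4;                                            % 2x2 moving-average (low-pass) kernel
It_m     = conv2(im1, box, 'valid') - conv2(im2, box, 'valid');    % smooth each frame, then difference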
(Comments converted to an answer.)
In theory, there is nothing wrong with taking a pixel-wise difference:
It_m = im1 - im2;
to compute the time derivative. Using a spatial smoother when computing the time derivative mitigates the effect of noise.
Moreover, looking at the way that code computes spatial (x and y) derivatives:
Ix_m = conv2(im1,[-1 1; -1 1], 'valid');
computing the time derivative with a similar kernel and the 'valid' option ensures that the matrices Ix_m, Iy_m and It_m have compatible sizes.
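For instance (a sketch built from the kernels already shown; the kernel for Iy_m is assumed by analogy and is not quoted from the implementation):

Ix_m = conv2(im1, [-1 1; -1 1], 'valid');                            % spatial derivative along x
Iy_m = conv2(im1, [-1 -1; 1 1], 'valid');                            % spatial derivative along y (assumed kernel)
It_m = conv2(im1, ones(2), 'valid') - conv2(im2, ones(2), 'valid');  % temporal derivative, same 2x2 support
% isequal(size(Ix_m), size(Iy_m), size(It_m)) is now true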
The temporal partial derivative (along t) is connected to the spatial partial derivatives (along x and y).
Think of the video sequence you are analyzing as a spatio-temporal volume. At any given point (x, y, t), if you want to estimate the partial derivatives, i.e. estimate the 3D gradient at that point, then you will benefit from having three filters that share the same kernel support.
For more theory on why this should be so, look up the topic of steerable filters, or better yet look up the fundamental concept of what a partial derivative is supposed to be and how it connects to directional derivatives.
Often, the 2D gradient is estimated first, and the temporal derivative is then treated as if it were independent of the x and y components. This can, and very often does, lead to numerical errors in the final optical flow calculations. The common way to deal with those errors is to do a forward and a backward flow estimation and combine the results at the end.
One way to think of the gradient that you are estimating is that it has a support region that is 3D. The smallest size of such a region should be 2x2x2.
If you compute 2D gradients in the first and second image, both using only 2x2 filters, then the corresponding FIR filter for the 3D volume is obtained by averaging the results of the two filters.
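A hedged sketch of such a 2x2x2 estimate (Horn-Schunck-style kernels; the 1/4 scaling is a common choice, not code taken from the question):

kx = 0.25 * [-1 1; -1 1];                                  % x-derivative over a 2x2 neighbourhood
ky = 0.25 * [-1 -1; 1 1];                                  % y-derivative over a 2x2 neighbourhood
kt = 0.25 * ones(2);                                       % 2x2 averaging for the temporal term
Ix = conv2(im1, kx, 'valid') + conv2(im2, kx, 'valid');    % average the responses of both frames
Iy = conv2(im1, ky, 'valid') + conv2(im2, ky, 'valid');
It = conv2(im2, kt, 'valid') - conv2(im1, kt, 'valid');    % effective support region is 2x2x2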
The fact that you should have the same filter support region in 2D is clear to most: that's why the Sobel and Scharr operators look the way they do.
You can see the sort of results you get from sanely designed differential operators for optical flow in this MATLAB toolbox that I made, in part to show this particular point.
This is a follow-up question on the one below:
Second moments question
MATLAB's regionprops function estimates an ellipse from a given set of 2D points. This is done using image moments: the documentation states that normalized second central moments are used, and the formulas follow what is suggested by the Wikipedia article on image moments.
Effectively, the covariance matrix of the region is calculated (in a slightly more efficient way), then the square roots of the eigenvalues of this matrix are computed and returned as the major and minor axis lengths, with one change: they are multiplied by a factor of 4.
Why?
Essentially, covariance estimation assumes a multivariate normal distribution. However, an arbitrary image region is most likely not normally distributed; I would rather expect a factor based on the assumption that the data is uniformly distributed. So what is the justification for choosing 4?
Meanwhile I found the answer: a factor of 4 yields correct results for regions with an elliptical shape, because for a uniform solid ellipse the second central moment along an axis equals (semi-axis length)^2 / 4, so 4 times the square root of the eigenvalue recovers the full axis length. For rectangular or non-solid regions, the estimated axis lengths are incorrect, and the error varies nonlinearly as the region changes.
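The factor is easy to check numerically (a quick sketch; the synthetic ellipse with semi-axes a = 100 and b = 40 is made up for the test):

[X, Y] = meshgrid(1:401, 1:401);
a = 100; b = 40;                                            % semi-axes of a solid test ellipse
mask = ((X - 201)/a).^2 + ((Y - 201)/b).^2 <= 1;
s = regionprops(mask, 'MajorAxisLength', 'MinorAxisLength');
% s.MajorAxisLength comes out close to 2*a = 200 and s.MinorAxisLength close to 2*b = 80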
I don't clearly understand how imresize works, especially when downscaling an image (say from 4x4 to 2x2). Upscaling is easier to understand: we just have to compute intermediate points, either by taking the closest known point (method = 'nearest') or by linearly averaging the 4 closest known points (method = 'bilinear'), and so on. We do not need any filter for this, right?
My main doubt is about downscaling. I understand from signal processing classes that, to avoid aliasing, a smoothing low-pass filter must be applied before decimation. But which filter is MATLAB using? The documentation just lists methods, and I don't understand how 'bilinear' or 'bicubic' can be used as a kernel.
Thank you for reading.
The documentation for the function seems to be incomplete. Open imresize.m (edit imresize) and take a look at the contributions function.
There you can see that MATLAB does not use a 2x2 neighbourhood when downscaling with the bilinear or bicubic method; the kernel size is increased to avoid aliasing.
Some explanation of the math behind imresize. To simplify, I will cover the 1D case only. When a scale < 1 is used, the window size is increased. This means the resulting value is no longer the weighted average of the 2 neighbours (2x2 for images); instead a larger window of size w (w x w) is used.
Start with the standard method:
The figure shows the common case: two known grid values are averaged into a new one with weights 1/5 and 4/5. Instead of the well-known definition, the weights can also be obtained by drawing a triangle with base w = 2:
Now, increasing the base of the triangle, we get the weights for a larger window size. Here a base of w = 6 is drawn:
The new triangle defines the weights over 6 points.
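In code, the idea looks roughly like this 1-D sketch (a simplification of what imresize's internal contributions helper does, not the actual MATLAB source):

scale = 0.5;                                    % downscale by a factor of 2
tri   = @(x) max(1 - abs(x), 0);                % bilinear (triangle) kernel, base w = 2
if scale < 1
    h = @(x) scale * tri(scale * x);            % stretched kernel, base w = 2/scale = 4
else
    h = tri;                                    % upscaling: the usual 2-point interpolation
end
u   = 2.3;                                      % hypothetical output sample, in input coordinates
idx = floor(u - 1/scale) : ceil(u + 1/scale);   % neighbours inside the widened support
w   = h(u - idx);  w = w / sum(w);              % weights now spread over ~4 input samples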
The situation is as follows:
I have some graphs like the pictures above, and I am trying to classify them into different kinds so that the shape of a character can be recognized. Here is what I've done:
I apply a 2-D FFT to the graphs to obtain their spectral content. Here are some results:
S after 2-D FFT
T after 2-D FFT
I have found that the same letter shares the same pattern in the magnitude image after the FFT, and I want to use this feature to cluster the letters. But there is a problem: I want the features of interest to be representable in a 2-D plane, i.e. in the form (x, y), whereas the feature here is actually a whole image of about 600*400 elements. The only thing I am really interested in is the shape of that image (S is a dot in the middle, while T looks like a cross). So what can I do to reduce the dimensionality of the magnitude image?
I am not sure I am clear about my question here, but thanks in advance.
You can use dimensionality reduction methods such as
PCA
MDS
and then separate the resulting low-dimensional coordinates with a clustering or classification method such as
k-means clustering
SVM
The reduction methods take your 2-dimensional arrays (flattened into vectors) and work out the coordinate frame that best distinguishes / represents your letters.
One way to start would be reducing your 240000 dimensional space to a 26-dimensional space using any of these methods.
This would give you an 'amplitude' for each of the possible letters.
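For instance, a hedged PCA sketch (X is an assumed N-by-240000 matrix with one flattened FFT magnitude image per row; pca comes from the Statistics and Machine Learning Toolbox):

[coeff, score] = pca(X);              % principal components of the magnitude images
xy = score(:, 1:2);                   % keep 2 (or 26) components; each letter becomes a point
scatter(xy(:, 1), xy(:, 2));          % letters with similar spectra should form clusters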
But as @jucestain says, neural network classifiers are great for letter recognition.
I have 50 images and created a database from the green channel of each image by separating the pixels into two classes (skin and wound) and storing their respective green-channel values.
Also, I have 1600 wound pixel values and 3000 skin pixel values.
Now I have to use Bayes classification in MATLAB to classify the skin and wound pixels in a new (test) image using the database that I have. I have tried the built-in 'diaglinear' discriminant, but the results are poor, with a lot of misclassification.
Also, I don't know whether the data is normally distributed, so I can't simply assume a Gaussian form when estimating the class-conditional probability density functions.
Is there any way to perform pixel wise classification?
If there is any part of the question that is unclear, please ask.
I'm looking for help. Thanks in advance.
If you really want to use pixel-wise classification (quite simple, but why not?), try exploring the pixel value distributions with hist()/imhist(). It might give you a clue about Gaussianity...
Second, you might fit your values to some appropriate curves (Gaussians?) with fit() if you have the Curve Fitting Toolbox (or do it manually). Then multiply the curves by the prior probabilities of wound/skin if you want a MAP classifier, and finally find their intersection. Voilà! You have your decision value V.
if Xi < V -> skin
else -> wound
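A hedged sketch of that recipe under a Gaussian assumption (skinVals, woundVals and testImage are assumed variable names; normpdf and fzero are standard MATLAB functions):

pSkin  = numel(skinVals) / (numel(skinVals) + numel(woundVals));     % class priors
pWound = 1 - pSkin;
muS = mean(skinVals);  sdS = std(skinVals);                          % fitted skin Gaussian
muW = mean(woundVals); sdW = std(woundVals);                         % fitted wound Gaussian
f = @(x) pSkin * normpdf(x, muS, sdS) - pWound * normpdf(x, muW, sdW);
V = fzero(f, [min(muS, muW), max(muS, muW)]);    % decision value: assumes one crossing between the means
G = double(testImage(:, :, 2));                  % green channel of the test image
woundMask = f(G) < 0;                            % pixel-wise decision: wound where its weighted density wins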