I try to implement a facial feature detector with a CLM following the paper of Cootes et. al. ('Feature Detection and Tracking with Constrained Local Models' and 'Automatic feature localization with constrained local models').
The training of the model worked out well and I got the expected results. Now I'm stuck with the building of the response image. According to the paper I have to calculate the normalized correlation response of every template patch with the underlying image around his current position ( I used the normxcorr2 function from matlab).
I did this. But the result doesn't look like the pictures in the papers. And the Nelder-Meade optimization doesn't converge either.
Did I missed something? Has anyone implemented this algorithm yet and could anyone give me some hints? Is the normxcorr2 function a bad choice?
Thanks!
Related
Currently, I am working at a short project about stereo-vision.
I'm trying to create depth maps of a scenery. For this, I use my phone from to view points and use the following code/workflow provided by Matlab : https://nl.mathworks.com/help/vision/ug/uncalibrated-stereo-image-rectification.html
Following this code I am able to create nice disparity maps, but I want to now the depths (as in meters). For this, I need the baseline, focal length and disparity, as shown here: https://www.researchgate.net/figure/Relationship-between-the-baseline-b-disparity-d-focal-length-f-and-depth-z_fig1_2313285
The focal length and base-line are known, but not the baseline. I determined the estimate of the Fundamental Matrix. Is there a way to get from the Fundamental Matrix to the baseline, or by making some assumptions to get to the Essential Matrix, and from there to the baseline.
I would be thankful for any hint in the right direction!
"The focal length and base-line are known, but not the baseline."
I guess you mean the disparity map is known.
Without a known or estimated calibration matrix, you cannot determine the essential matrix.
(Compare Multi View Geometry of Hartley and Zisserman for details)
With respect to your available data, you cannot compute a metric reconstruction. From the fundamental matrix, you can only extract camera matrices in a canonical form that allow for a projective reconstruction and will not satisfy the true baseline of the setup. A projective reconstruction is a reconstruction that differs from the metric result by an unknown transformation.
Non-trivial techniques could allow to upgrade these reconstructions to an Euclidean reconstruction result. However, the success of these self-calibration techniques strongly depends of the quality of the data. Thus, using images of a calibrated camera is actually the best way to go.
What kind of noise removal training does the function 'denoisingNetwork' do, which is used as a part of 'denoiseImage'? Is it specific to some kind of noise and noise level or just a generalized network that gives an average output image?
It works only for gaussian noise, but with almost every level of noise.
It could be used to remove some other kind of noises, but that's not guaranteed.
However if you look at Matlab documentation it says that uses a pre-trained model called "DnCNN".
So i think that could be useful to see the relative paper:
link to paper
I'm trying to get a better understanding of neural networks by trying to programm a Convolution Neural Network by myself.
So far, I'm going to make it pretty simple by not using max-pooling and using simple ReLu-activation. I'm aware of the disadvantages of this setup, but the point is not making the best image detector in the world.
Now, I'm stuck understanding the details of the error calculation, propagating it back and how it interplays with the used activation-function for calculating the new weights.
I read this document (A Beginner's Guide To Understand CNN), but it doesn't help me understand much. The formula for calculating the error already confuses me.
This sum-function doesn't have defined start- and ending points, so i basically can't read it. Maybe you can simply provide me with the correct one?
After that, the author assumes a variable L that is just "that value" (i assume he means E_total?) and gives an example for how to define the new weight:
where W is the weights of a particular layer.
This confuses me, as i always stood under the impression the activation-function (ReLu in my case) played a role in how to calculate the new weight. Also, this seems to imply i simply use the error for all layers. Doesn't the error value i propagate back into the next layer somehow depends on what i calculated in the previous one?
Maybe all of this is just uncomplete and you can point me into the direction that helps me best for my case.
Thanks in advance.
You do not backpropagate errors, but gradients. The activation function plays a role in caculating the new weight, depending on whether or not the weight in question is before or after said activation, and whether or not it is connected. If a weight w is after your non-linearity layer f, then the gradient dL/dw wont depend on f. But if w is before f, then, if they are connected, then dL/dw will depend on f. For example, suppose w is the weight vector of a fully connected layer, and assume that f directly follows this layer. Then,
dL/dw=(dL/df)*df/dw //notations might change according to the shape
//of the tensors/matrices/vectors you chose, but
//this is just the chain rule
As for your cost function, it is correct. Many people write these formulas in this non-formal style so that you get the idea, but that you can adapt it to your own tensor shapes. By the way, this sort of MSE function is better suited to continous label spaces. You might want to use softmax or an svm loss for image classification (I'll come back to that). Anyway, as you requested a correct form for this function, here is an example. Imagine you have a neural network that predicts a vector field of some kind (like surface normals). Assume that it takes a 2d pixel x_i and predicts a 3d vector v_i for that pixel. Now, in your training data, x_i will already have a ground truth 3d vector (i.e label), that we'll call y_i. Then, your cost function will be (the index i runs on all data samples):
sum_i{(y_i-v_i)^t (y_i-vi)}=sum_i{||y_i-v_i||^2}
But as I said, this cost function works if the labels form a continuous space (here , R^3). This is also called a regression problem.
Here's an example if you are interested in (image) classification. I'll explain it with a softmax loss, the intuition for other losses is more or less similar. Assume we have n classes, and imagine that in your training set, for each data point x_i, you have a label c_i that indicates the correct class. Now, your neural network should produce scores for each possible label, that we'll note s_1,..,s_n. Let's note the score of the correct class of a training sample x_i as s_{c_i}. Now, if we use a softmax function, the intuition is to transform the scores into a probability distribution, and maximise the probability of the correct classes. That is , we maximse the function
sum_i { exp(s_{c_i}) / sum_j(exp(s_j))}
where i runs over all training samples, and j=1,..n on all class labels.
Finally, I don't think the guide you are reading is a good starting point. I recommend this excellent course instead (essentially the Andrew Karpathy parts at least).
I'm new to image processing and wanted to deblur/denoise some images using Matlab. Example:
Input
Output
I don't know the exact blurring/noising effects by which the second image came about. At first, I normally did so by trial and error of the Wiener deconvolution method, but not able to reach best results.
So my question is, is there a more clever method other than trial and error?
(Note: The output image was obtained from Robot36 radio transmission decoder.)
I'd suggest trying the Richardson-Lucy deblurring algorithm. It is available as a built-in function in the Image Processing Toolbox. It makes use of multiple iterations which you can set to control the degree to which you want the image deblurred. I've always found this a very useful method. Here's the doc link: deconvlucy
I'm trying the assess the correctness of my SURF descriptor implementation with the de facto standard framework by Mikolajczyk et. al. I'm using OpenCV to detect and describe SURF features, and use the same feature positions as input to my descriptor implementation.
To evaluate descriptor performance, the framework requires to evaluate detector repeatability first. Unfortunately, the repeatability test expects a list of feature positions along with ellipse parameters defining the size and orientation of an image region around each feature. However, OpenCV's SURF detector only provides feature position, scale and orientation.
The related paper proposes to compute those ellipse parameters iteratively from the eigenvalues of the second moment matrix. Is this the only way? As far as I can see, this would require some fiddling with OpenCV. Is there no way to compute those ellipse parameters afterwards (e.g. in Matlab) from the feature list and the input image?
Has anyone ever worked with this framework and could assist me with some insights or pointers?
You can use the file evaluation.cpp from OpenCV. Is in the directory OpenCV/modules/features2d/src. In this file you could use the class "EllipticKeyPoint", this class has one function to convert "KeyPoint" to "ElipticKeyPoint"
Honestly I never worked with this framework., but I think you should see this paper about a performance evaluation of local descriptors.