Lucas-Kanade optical flow: Understanding the math - matlab

I found a MATLAB implementation of the LKT algorithm here, and it is based on the brightness constancy equation.
The algorithm calculates the image gradients in the x and y directions by convolving the image with appropriate 2x2 horizontal and vertical edge gradient operators.
The brightness constancy equation in the classic literature has on its right hand side the difference between two successive frames.
However, in the implementation referred to by the aforementioned link, the right hand side is the difference of convolution.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
Why couldn't It_m be simply calculated as:
it_m = im1 - im2;

As you mentioned, in theory only a pixel-by-pixel difference is needed for the optical flow computation.
However, in practice, all natural (not synthetic) images contain some degree of noise, and differentiation acts as a kind of high-pass filter, so it amplifies the noise relative to the signal.
Therefore, to avoid artifacts caused by noise, image smoothing (low-pass filtering) is usually carried out before any differentiation (we have the same step in edge detection too). The code does exactly this, i.e. it applies a moving-average filter to the images to reduce the effect of noise.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);

(Comments converted to an answer.)
In theory, there is nothing wrong with taking a pixel-wise difference:
Im_t = im1-im2;
to compute the time derivative. Using a spatial smoother when computing the time derivative mitigates the effect of noise.
Moreover, looking at the way that code computes spatial (x and y) derivatives:
Ix_m = conv2(im1,[-1 1; -1 1], 'valid');
computing the time derivative with a similar kernel and the 'valid' option ensures the matrices Ix_m, Iy_m and It_m have compatible sizes.
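For illustration, a minimal sketch (not the linked code itself) of how all three derivatives can be computed with matched 2x2 kernels and the 'valid' option, assuming im1 and im2 are grayscale double images of the same size:
% Matched 2x2 kernels so all derivative images have the same 'valid' size
Ix_m = conv2(im1, [-1 1; -1 1], 'valid');            % spatial derivative in x
Iy_m = conv2(im1, [-1 -1; 1 1], 'valid');            % spatial derivative in y
It_m = conv2(im1, ones(2), 'valid') ...
     + conv2(im2, -ones(2), 'valid');                % temporal derivative with smoothing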

The temporal partial derivative (along t) is connected to the spatial partial derivatives (along x and y).
Think of the video sequence you are analyzing as a spatio-temporal volume. At any given point (x,y,t), if you want to estimate the partial derivatives, i.e. estimate the 3D gradient at that point, then you will benefit from having 3 filters that have the same kernel support.
For more theory on why this should be so, look up the topic of steerable filters, or better yet look up the fundamental concept of what a partial derivative is supposed to be and how it connects to directional derivatives.
Often, the 2D gradient is estimated first, and then people tend to treat the temporal derivative as independent of the x and y components. This can, and very often does, lead to numerical errors in the final optical flow calculations. The common way to deal with those errors is to do a forward and backward flow estimation and combine the results in the end.
One way to think of the gradient that you are estimating is that it has a support region that is 3D. The smallest size of such a region should be 2x2x2.
If you compute 2D gradients in the first and second images, both using only 2x2 filters, then the corresponding FIR filter for the 3D volume is obtained by averaging the results of the two filters.
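As a hedged illustration (the variable names are mine, not from any particular implementation), the x-derivative with a 2x2x2 support can be assembled by averaging the same 2x2 kernel applied to both frames:
kx = [-1 1; -1 1];                                         % 2x2 spatial kernel
Ix = 0.5 * (conv2(im1, kx, 'valid') + conv2(im2, kx, 'valid'));  % averaging over the two frames gives a 2x2x2 support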
The fact that you should have the same filter support region in 2D is clear to most: that's why the Sobel and Scharr operators look the way they do.
You can see the sort of results you get from having sanely designed differential operators for optical flow in this MATLAB toolbox that I made, in part to show this particular point.

Related

Can the baseline between two cameras be determined from an uncalibrated rectified image pair?

Currently, I am working on a short project about stereo vision.
I'm trying to create depth maps of a scene. For this, I use my phone from two viewpoints and follow the code/workflow provided by MATLAB: https://nl.mathworks.com/help/vision/ug/uncalibrated-stereo-image-rectification.html
Following this code I am able to create nice disparity maps, but I want to know the depths (as in meters). For this, I need the baseline, focal length and disparity, as shown here: https://www.researchgate.net/figure/Relationship-between-the-baseline-b-disparity-d-focal-length-f-and-depth-z_fig1_2313285
The focal length and base-line are known, but not the baseline. I have determined an estimate of the fundamental matrix. Is there a way to get from the fundamental matrix to the baseline, or, by making some assumptions, to get to the essential matrix and from there to the baseline?
I would be thankful for any hint in the right direction!
"The focal length and base-line are known, but not the baseline."
I guess you mean the disparity map is known.
Without a known or estimated calibration matrix, you cannot determine the essential matrix.
(See Multiple View Geometry by Hartley and Zisserman for details.)
With respect to your available data, you cannot compute a metric reconstruction. From the fundamental matrix, you can only extract camera matrices in a canonical form that allow for a projective reconstruction and will not satisfy the true baseline of the setup. A projective reconstruction is a reconstruction that differs from the metric result by an unknown transformation.
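As a sketch of what you can extract, here is a hedged MATLAB fragment following Hartley and Zisserman's canonical form (the variable names are mine). It builds a projective camera pair from a fundamental matrix F; note that P2 is defined only up to a projective transform, so its last column is not the metric baseline:
[~, ~, V] = svd(F');                                      % left epipole e' satisfies F' * e' = 0
ep = V(:, end);
skew = @(v) [0 -v(3) v(2); v(3) 0 -v(1); -v(2) v(1) 0];   % cross-product matrix
P1 = [eye(3), zeros(3,1)];                                % canonical first camera
P2 = [skew(ep) * F, ep];                                  % second camera of the projective reconstruction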
Non-trivial techniques can upgrade such a reconstruction to a Euclidean (metric) one. However, the success of these self-calibration techniques depends strongly on the quality of the data. Thus, using images from a calibrated camera is actually the best way to go.

Are thresholds for different edge detectors comparable?

I was analyzing the performance of different edge detectors (Canny, Sobel and Roberts). MATLAB gives us the function edge, which has the parameter threshold as one of its inputs. I gave the same threshold (=0.1) to all of them (MATLAB automatically generated the low threshold for Canny's detector). The result, given the code that I wrote, was:
(Ignore the LoG detector; I think I can interpret those results.)
After that, I tested those same filters but with a different threshold (=0.8, which gave a 0.32 low-threshold for Canny's detector). However, now only Canny detects boundaries that are associated with stronger edges (stronger gradients associated with boundaries that separate structures with higher contrast):
[Figure: results at the higher threshold; some methods don't find any edges.]
I can't understand these results: if Canny detects stronger boundaries, and Sobel is more sensitive to stronger boundaries (as we saw for threshold = 0.1, where it detects almost only abrupt changes of intensity), then why does Sobel not seem to compute an estimate of the gradient that is comparable to the one given by Canny?
That raises another question: what does the threshold value for Canny, Sobel and Roberts really mean? I would say it is a value of the gradient magnitude, somehow normalized because it has to belong to [0,1] (which I don't understand either: normalized relative to what?).
Different edge detectors have no reason to require equal thresholds, because they respond differently to different types of edges (in terms of contrast, sharpness, noise), so no single threshold value will select the same set of edges across detectors.
In addition, the formulas can have different "scaling factors", depending on the implementation. The best you can hope for is that if you pick thresholds that suit you for different methods on the same image, the thresholds will vary proportionally on other images.
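One practical sketch of that idea (the image, the detectors and the scaling factor k here are only illustrative): let each detector choose its own threshold via the second output of edge, and scale that value, rather than reusing one absolute number across methods:
I = imread('cameraman.tif');          % any grayscale test image
[~, tC] = edge(I, 'Canny');           % thresholds edge picked automatically for this image
[~, tS] = edge(I, 'Sobel');
[~, tR] = edge(I, 'Roberts');
k = 1.5;                              % same relative scaling for every detector (keep scaled values below 1)
BWc = edge(I, 'Canny',   k * tC);     % tC is a [low high] pair for Canny
BWs = edge(I, 'Sobel',   k * tS);
BWr = edge(I, 'Roberts', k * tR);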

DWT: What is it and when and where we use it

I was reading up on the DWT for the first time and the document stated that it is used to represent time-frequency data of a signal which other transforms do not provide.
But when I look for a usage example of the DWT in MATLAB I see the following code:
X=imread('cameraman.tif');
X=im2double(X);
[F1,F2]= wfilters('db1', 'd');
[LL,LH,HL,HH] = dwt2(X,'db1','d');
I am unable to understand the implementation of dwt2, or rather what it is and when and where we use it. What does dwt2 actually return, and what does the above code do?
The first two statements simply read in the image and convert it, through im2double, so that the dynamic range of each channel is in [0,1].
Now, the third statement, wfilters constructs the wavelet filter banks for you. These filter banks are what are used in the DWT. The method of the DWT is the same, but you can use different kinds of filters to achieve specific results.
Basically, with wfilters, you get to choose what kind of filter you want (in your case, you chose db1: Daubechies), and you can optionally specify the type of filter that you want. Different filters provide different results and have different characteristics. There are a lot of different wavelet filter banks you could use and I'm not quite the expert as to the advantages and disadvantages for each filter bank that exists. Traditionally, Daubechies-type filters are used so stick with those if you don't know which ones to use.
Not specifying the type will output both the decomposition and the reconstruction filters. Decomposition is the forward transformation where you are given the original image / 2D data and want to transform it using the DWT. Reconstruction is the reverse transformation where you are given the transform data and want to recreate the original data.
The fourth statement, dwt2, computes the 2D DWT for you, but we will get into that later.
You specified the flag d, so you want only the decomposition filters. You can use the output of wfilters as input to the 2D DWT if you wish, as this will specify the low-pass and high-pass filters that you want to use when decomposing your image. You don't have to do it like this. You can simply specify what filter you want to use, which is how you're calling the function in your code. In other words, you can do this:
[F1,F2]= wfilters('db1', 'd');
[LL,LH,HL,HH] = dwt2(X,F1,F2);
... or you can just do this:
[LL,LH,HL,HH] = dwt2(X,'db1','d');
The above statements are the same thing. Note that there is a 'd' flag on the dwt2 function because you want the forward transform as well.
Now, dwt2 is the 2D DWT (Discrete Wavelet Transform). I won't go into the DWT in detail here because this isn't the place to talk about it, but I would definitely check out this link for better details. They also have fully working MATLAB code and their own implementation of the 2D DWT so you can fully understand what exactly the DWT is and how it's computed.
However, the basics behind the 2D DWT is that it is known as a multi-resolution transform. It analyzes your signal and decomposes your signal into multiple scales / sizes and features. Each scale / size has a bunch of features that describe something about the signal that was not seen in the other scales.
One thing about the DWT is that it naturally subsamples your image by a factor of 2 (i.e. halves each dimension) after the analysis is done - hence the multi-resolution bit I was talking about. In MATLAB, dwt2 outputs four different subband images, which correspond to the four output variable names above:
LL - Low-Low. This means that the vertical direction of your 2D image / signal is low-pass filtered as well as the horizontal direction.
LH - Low-High. This means that the vertical direction of your 2D image / signal is low-pass filtered while the horizontal direction is high-pass filtered.
HL - High-Low. This means that the vertical direction of your 2D image / signal is high-pass filtered while the horizontal direction is low-pass filtered.
HH - High-High. This means that the vertical direction of your 2D image / signal is high-pass filtered as well as the horizontal direction.
Roughly speaking, LL corresponds to just the structural / predominant information of your image while HH corresponds to the edges of your image. The LH and HL components I'm not too familiar with, but they're used in feature analysis sometimes. If you want to do a further decomposition, you would apply the DWT again on the LL only. However, depending on your analysis, the other components are used too; it just depends on what you want to use it for! dwt2 only performs a single-level DWT decomposition, so if you want to go to the next level, you would call dwt2 on the LL component.
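To make the "apply it again on LL" point concrete, here is a small hedged sketch (using the same db1 wavelet as in the question; idwt2 is the matching inverse transform):
X = im2double(imread('cameraman.tif'));
[LL1, LH1, HL1, HH1] = dwt2(X, 'db1');             % level 1 decomposition
[LL2, LH2, HL2, HH2] = dwt2(LL1, 'db1');           % level 2: decompose only the LL band
LL1r = idwt2(LL2, LH2, HL2, HH2, 'db1');           % undo level 2
Xr   = idwt2(LL1r, LH1, HL1, HH1, 'db1');          % undo level 1; Xr matches X up to numerical error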
Applications
Now, for your specific question of applications. The DWT for images is mostly used in image compression and image analysis. One application of the 2D DWT is in JPEG 2000. The core of the algorithm is that it breaks the image down into the DWT components, then constructs trees of the coefficients generated by the DWT to determine which components can be omitted before you save the image. This way, you eliminate extraneous information, but there is also the great benefit that the DWT can be lossless. I don't know which filter(s) is/are being used in JPEG 2000, but I know for certain that the standard has a lossless mode. This means that you will be able to reconstruct the original data back without any artifacts or quantization errors. JPEG 2000 also has a lossy option, where you can reduce the file size even more by eliminating more of the DWT coefficients in such a way that is imperceptible to the average user.
Another application is in watermarking images. You can embed information in the wavelet coefficients so that it prevents people from trying to steal your images without acknowledgement. The DWT is also heavily used in medical image analysis and compression, as the images generated in this domain are quite high resolution and quite large. It would be extremely useful if you could represent the images in the same way while occupying less physical space than the standard image compression algorithms that exist (which are also lossy if you want high compression ratios).
One more application I can think of would be the dynamic delivery of video content over networks. Depending on what your connection speed is or the resolution of your screen, you get a lower or higher quality video. If you specifically use the LL component of each frame, you would stream / use a particular version of the LL component depending on what device / connection you have. So if you had a bad connection or if your screen has a low resolution, you would most likely show the video with the smallest size. You would then keep increasing the resolution depending on the connection speed and/or the size of your screen.
This is just a taste as to what the DWT is used for (personally, I don't use it because the DWT is used in domains that I don't personally have any experience in), but there are a lot more applications that are quite useful where the DWT is used.

Difference between the Gaussian LPF and ideal LPF in the frequency domain in image processing

I am working with the same image, and I also need to remove the texture from the image posted in this link:
How can I remove the texture from an image using matlab?
There were discussions on this, and I'm quite confused about which filter (Gaussian LPF or ideal low-pass) is really needed and what the reason behind it is. Which frequencies contribute to this texture? Can someone please explain?
An ideal low pass filter will keep all spatial frequencies below a nominal spatial frequency, and remove all spatial frequencies above it. Unfortunately, a true ideal low pass filter has infinite support (i.e., has an infinitely large non-zero spatial extent). Even a practical approximation to an ideal low pass filter has large spatial support.
A Gaussian, on the other hand, isn't ideal in terms of which frequencies it filters out. A Gaussian in the spatial domain turns out to be a Gaussian in the spatial frequency domain. That is, it doesn't produce very sharp spatial frequency selectivity. The advantage, though, is that the spatial support of the filter is small. People use Gaussian filters for this mostly because they are convenient. Filtering with a Gaussian tends to look "natural" compared to ideal low pass filters, which can generate ringing artifacts.
A Lanczos filter (windowed sinc filter) is another choice, as it has small spatial support and approximates an ideal filter better than a Gaussian.
However, which is better for your image largely depends on what you want to do. While there's significant theory behind it, qualitative choices like this in image processing are largely an art.
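To make the comparison concrete, here is a hedged MATLAB sketch (cameraman.tif and the cutoff D0 are just placeholders) that applies a brick-wall and a Gaussian low-pass in the frequency domain; the ideal filter is where the ringing comes from:
I  = im2double(imread('cameraman.tif'));
[M, N] = size(I);
[u, v] = meshgrid(-floor(N/2):ceil(N/2)-1, -floor(M/2):ceil(M/2)-1);
D  = hypot(u, v);                                   % distance from the DC component
D0 = 30;                                            % cutoff radius in frequency samples
H_ideal = double(D <= D0);                          % ideal (brick-wall) low-pass
H_gauss = exp(-(D.^2) / (2*D0^2));                  % Gaussian low-pass
F = fftshift(fft2(I));
I_ideal = real(ifft2(ifftshift(F .* H_ideal)));     % shows ringing near edges
I_gauss = real(ifft2(ifftshift(F .* H_gauss)));     % smoother, more "natural" result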
The type of filter you are looking for is ideally nonlinear:
smoothing in areas without large-scale gradients (edges), and
little smoothing close to edges to be preserved.
Here are two alternatives:
The Kuwahara filter:
http://homepage.tudelft.nl/e3q6n/publications/1999/PAA99DRBDPVLV/PAA99DRBDPVLV.pdf
Euclidean shortening flow (Figure 8) in:
http://www.cs.jhu.edu/~misha/Fall07/Papers/intro-to-scalespace.pdf
In the second filter (Euclidean shortening flow), you can vary the scale parameter and the nonlinear function h(Lw) on page 17, which gives you more tuning possibilities. Ideally, the filter is completely isotropic (the same frequency effect at every possible angle).
Michael

High-pass filtering in MATLAB

Does anyone know how to use filters in MATLAB?
I am not an aficionado, so I'm not concerned with roll-off characteristics etc. I have a one-dimensional signal vector x sampled at 100 kHz, and I want to perform high-pass filtering on it (say, rejecting anything below 10 Hz) to remove the baseline drift.
There are Butterworth, Elliptical, and Chebyshev filters described in the help, but no simple explanation of how to implement them.
There are several filters that can be used, and the actual choice of the filter will depend on what you're trying to achieve. Since you mentioned Butterworth, Chebyshev and Elliptical filters, I'm assuming you're looking for IIR filters in general.
Wikipedia is a good place to start reading up on the different filters and what they do. For example, Butterworth is maximally flat in the passband and the response rolls off in the stop band. In Chebyshev, you have a smooth response in either the passband (type 2) or the stop band (type 1) and larger, irregular ripples in the other; lastly, in Elliptical filters, there are ripples in both bands. The following image is taken from wikipedia.
So in all three cases, you have to trade something for something else. In Butterworth, you get no ripples, but the frequency response roll-off is slower. In the above figure, it takes from 0.4 to about 0.55 to get to half power. In Chebyshev, you get steeper roll-off, but you have to allow for irregular and larger ripples in one of the bands, and in Elliptical, you get near-instant cut-off, but you have ripples in both bands.
The choice of filter will depend entirely on your application. Are you trying to get a clean signal with little to no losses? Then you need something that gives you a smooth response in the passband (Butterworth/Cheby2). Are you trying to kill frequencies in the stopband, and you won't mind a minor loss in the response in the passband? Then you will need something that's smooth in the stop band (Cheby1). Do you need extremely sharp cut-off corners, i.e., anything a little beyond the passband is detrimental to your analysis? If so, you should use Elliptical filters.
The thing to remember about IIR filters is that they've got poles. Unlike FIR filters where you can increase the order of the filter with the only ramification being the filter delay, increasing the order of IIR filters will make the filter unstable. By unstable, I mean you will have poles that lie outside the unit circle. To see why this is so, you can read the wiki articles on IIR filters, especially the part on stability.
To further illustrate my point, consider the following band pass filter.
fpass=[0.05 0.2];%# passband edges (normalized frequency)
fstop=[0.045 0.205]; %# stopband edges, just outside the passband
Rpass=1;%# max permissible ripple in the passband (dB)
Astop=40;%# min 40dB attenuation in the stopband
n=cheb2ord(fpass,fstop,Rpass,Astop);%# minimum filter order that meets these design requirements
[b,a]=cheby2(n,Astop,fstop);
Now if you look at the zero-pole diagram using zplane(b,a), you'll see that there are several poles (x) lying outside the unit circle, which makes this approach unstable.
and this is evident from the fact that the frequency response is all haywire. Use freqz(b,a) to get the following
To get a more stable filter with your exact design requirements, you'll need to use second-order sections, using the z-p-k method instead of the b-a form, in MATLAB. Here's how for the same filter as above:
[z,p,k]=cheby2(n,Astop,fstop);
[s,g]=zp2sos(z,p,k);%# create second order sections
Hd=dfilt.df2sos(s,g);%# create a dfilt object.
Now if you look at the characteristics of this filter, you'll see that all the poles lie inside the unit circle (hence stable) and that the response matches the design requirements.
The approach is similar for butter and ellip, with equivalent buttord and ellipord. The MATLAB documentation also has good examples on designing filters. You can build upon these examples and mine to design a filter according to what you want.
To use the filter on your data, you can either do filter(b,a,data) or filter(Hd,data) depending on which filter you eventually use. If you want zero phase distortion, use filtfilt. However, it does not accept dfilt objects. So to zero-phase filter with Hd, use the filtfilthd file available on the MathWorks File Exchange.
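For completeness, a hedged sketch of applying the filters designed above to a signal x:
y1 = filter(b, a, x);       % transfer-function form (unstable for this design, as shown above)
y2 = filter(Hd, x);         % second-order-section dfilt object (stable)
%# For zero-phase filtering, filtfilt(b,a,x) works on coefficient form only;
%# for the dfilt object Hd, use filtfilthd from the File Exchange.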
EDIT
This is in response to #DarenW's comment. Smoothing and filtering are two different operations, and although they're similar in some regards (a moving average is a low pass filter), you can't simply substitute one for the other unless you can be sure that the difference won't matter in the specific application.
For example, implementing Daren's suggestion on a linear chirp signal from 0-25 kHz, sampled at 100 kHz, this is the frequency spectrum after smoothing with a Gaussian filter:
Sure, the drift close to 10Hz is almost nil. However, the operation has completely changed the nature of the frequency components in the original signal. This discrepancy comes about because they completely ignored the roll-off of the smoothing operation (see red line), and assumed that it would be flat zero. If that were true, then the subtraction would've worked. But alas, that is not the case, which is why an entire field on designing filters exists.
Create your filter - for example using [B,A] = butter(N,Wn,'high') where N is the order of the filter - if you are unsure what this is, just set it to 10. Wn is the cutoff frequency normalized between 0 and 1, with 1 corresponding to half the sample rate of the signal. If your sample rate is fs, and you want a cutoff frequency of 10 Hz, you need to set Wn = (10/(fs/2)).
You can then apply the filter by using Y = filter(B,A,X) where X is your signal. You can also look into the filtfilt function.
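For the numbers in the question, a hedged sketch (designing in z-p-k form and converting to second-order sections, following the earlier answer, since a 10 Hz cutoff at a 100 kHz sample rate is numerically delicate in (B,A) form):
fs = 100e3;                          % sample rate, Hz
Wn = 10/(fs/2);                      % normalized 10 Hz cutoff
[z, p, k] = butter(4, Wn, 'high');   % modest order, designed in z-p-k form
[sos, g]  = zp2sos(z, p, k);         % convert to second-order sections
y = filtfilt(sos, g, x);             % zero-phase high-pass of the signal x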
A cheapo way to do this kind of filtering that doesn't involve straining brain cells on design, zeros and poles and ripple and all that, is:
* Make a copy of the signal
* Smooth it. For a 100 kHz signal, and wanting to eliminate about 10 Hz on down, you'll need to smooth over about 10,000 points. Use a Gaussian smoother, or a box smoother of maybe 1/2 that width applied twice, or whatever is handy. (A simple box smoother of total width 10,000 used once may produce unwanted edge effects.)
* Subtract the smoothed version from the original. Baseline drift will be gone.
If the original signal is spikey, you may want to use a short median filter before the big smoother.
This generalizes easily to 2D images, 3D volume data, whatever.
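A hedged sketch of those steps for the 100 kHz / 10 Hz case (the window length is only a rough choice, and see the caveats in the earlier answer's EDIT):
w = 10000;                                    % roughly one period of 10 Hz at 100 kHz
baseline = conv(x, ones(1, w)/w, 'same');     % box smoother; run twice at half width, or use a Gaussian, to soften edge effects
y = x - baseline;                             % subtract the smoothed copy to remove baseline drift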