I am new to digital image processing and have to simulate a Fourier descriptor program that is affine invariant. I want to know the prerequisites required to understand such a program. My reference is Digital Image Processing Using MATLAB by Gonzalez. I have seen a question on this site about the same program, but I could not understand either the program or its solution. The question says:
"I am using Gonzalez frdescp function to get Fourier descriptors of a boundary. I use this code, and I get two totally different sets of numbers describing two identical but different in scale shapes.
So what is wrong?"
Can somebody help me learn the prerequisites needed to understand this program, and help me further?
Let me give this a try, as I will have to use English rather than mathematical notation. First, the documentation of frdescp is shown here. frdescp takes one argument, an n-by-2 matrix of numbers. What are these numbers? Answering that requires some understanding of the mathematical foundation of Fourier descriptors. The assumption, before computing the Fourier descriptors, is that you have a contour of the object and some points on that contour. For example, a contour is shown in this picture:
You see that black line in the image? That is where you pick a list of points, going clockwise along the contour. Call this list {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}. Once we have these points, we are ready to compute the Fourier descriptors of the contour.

The complex Fourier descriptor implemented in this MATLAB function requires the numbers to be in the complex domain, so you have to convert the points in the list to complex numbers. This is easy: a pair of real numbers (x, y) in 2D maps to x + iy in the complex plane. The MATLAB function already does this for you, but now you know what the n-by-2 matrix is for: it is just the list of x and y coordinates on the contour. From this, the MATLAB function takes the discrete Fourier transform, and the result is the descriptors. The benefit of these descriptors is that, after suitable normalization, they are invariant under certain geometric transformations such as translation, rotation, and scaling.
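To make this concrete, here is a minimal sketch of the idea (my own illustration, not the book's frdescp code). The descriptors are the DFT of the boundary viewed as complex numbers, and they only become invariant after a normalization step, which is exactly what the question quoted above was missing:

% Minimal sketch (my illustration, not the book's frdescp code).
% s is the n-by-2 matrix of boundary points described above.
z = fft(s(:,1) + 1i*s(:,2));   % raw complex Fourier descriptors
% Raw descriptors are NOT invariant; a common normalization is:
z(1) = 0;          % drop the DC term     -> translation invariance
z = abs(z);        % keep magnitudes only -> rotation invariance
z = z / z(2);      % divide by |z(2)|     -> scale invariance

After this normalization, two shapes that differ only in scale give (up to numerical error) the same descriptor values. I hope this was helpful.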
I found a MATLAB implementation of the LKT algorithm here; it is based on the brightness constancy equation.
The algorithm calculates the image gradients in the x and y directions by convolving the image with appropriate 2x2 horizontal and vertical edge-gradient operators.
In the classic literature, the right-hand side of the brightness constancy equation is simply the difference between two successive frames.
However, in the implementation referred to by the aforementioned link, the right-hand side is a difference of convolutions:
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
Why couldn't It_m be simply calculated as:
it_m = im1 - im2;
As you mentioned, in theory only the pixel-by-pixel difference is needed for the optical flow computation.
However, in practice all natural (not synthetic) images contain some degree of noise, and differentiation acts as a high-pass filter, so it amplifies the noise relative to the signal.
Therefore, to avoid artifacts caused by noise, images are usually smoothed (low-pass filtered) before any differentiation (the same is done in edge detection). The code does exactly this, i.e. it applies a moving-average filter to the images to reduce the effect of noise:
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
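Because conv2 is linear, summing the two convolutions with opposite-sign kernels is exactly the same as smoothing the plain frame difference with a 2x2 box filter. A small sketch of this equivalence (variable names are mine):

k = ones(2);                             % 2x2 box (moving-average) kernel
It_a = conv2(im1, k) + conv2(im2, -k);   % the implementation's form
It_b = conv2(im1 - im2, k);              % smoothed pixel-wise difference
% It_a and It_b are equal up to floating-point rounding.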
(Comments converted to an answer.)
In theory, there is nothing wrong with taking a pixel-wise difference:
Im_t = im1-im2;
to compute the time derivative. Using a spatial smoother when doing so simply mitigates the effect of noise.
Moreover, looking at the way that code computes spatial (x and y) derivatives:
Ix_m = conv2(im1,[-1 1; -1 1], 'valid');
computing the time derivative with a similar kernel and the 'valid' option ensures that the matrices Ix_m, Iy_m, and It_m have compatible sizes.
The temporal partial derivative (along t) is connected to the spatial partial derivatives (along x and y).
Think of the video sequence you are analyzing as a spatio-temporal volume. At any given point (x, y, t), if you want to estimate the partial derivatives, i.e. estimate the 3D gradient at that point, then you will benefit from having three filters that share the same kernel support.
For more theory on why this should be so, look up the topic of steerable filters, or better yet, look up the fundamental concept of what a partial derivative is supposed to be and how it connects to directional derivatives.
Often, the 2D gradient is estimated first, and the temporal derivative is then treated as independent of the x and y components. This can, and very often does, lead to numerical errors in the final optical flow calculations. The common way to deal with those errors is to do a forward and a backward flow estimation and combine the results in the end.
One way to think of the gradient you are estimating is that it has a support region that is 3D. The smallest such region is 2x2x2.
If you compute 2D gradients in the first and second images using only 2x2 filters, then the corresponding FIR filter for the 3D volume is obtained by averaging the results of the two filters, as in the sketch below.
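A hedged sketch of that averaging (these kernels are one common choice, in the spirit of the classic Horn-Schunck estimates, not the only possibility):

kx = [-1 1; -1 1];  ky = [-1 -1; 1 1];  kt = ones(2);
Ix = (conv2(im1, kx, 'valid') + conv2(im2, kx, 'valid')) / 2;  % d/dx over the 2x2x2 cube
Iy = (conv2(im1, ky, 'valid') + conv2(im2, ky, 'valid')) / 2;  % d/dy over the 2x2x2 cube
It =  conv2(im2 - im1, kt, 'valid') / 4;                       % d/dt over the 2x2x2 cube
% All three estimates share the same 2x2x2 support, so they refer to the
% same spatio-temporal location.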
The fact that you should have the same filter support region in 2D is clear to most: that's why the Sobel and Scharr operators look the way they do.
You can see the sort of results you get from having sanely designed differential operators for optical flow in this Matlab toolbox that I made, in part to show this particular point.
I am trying to use MATLAB to implement a CT (computed tomography) projection operator, A, which I think is also often referred to as the "system matrix".
Basically, for an N x N image M, the projection data P can be obtained by multiplying the projection operator by the (vectorized) image:
P = AM
and the backprojection can be performed by multiplying the (conjugate) transpose of the projection operator by the projection data:
M = A'P
Does anyone have an idea/example/sample code for how to implement the matrix A (for example, for the Radon transform)? I would really like to start with a small matrix, say 8 x 8 or 16 x 16, if possible.
My question really is: how do I implement the projection operator so that multiplying the operator by an image gives the projections, and multiplying the (conjugate) transpose of the operator by the projections gives a reconstruction of the original image?
EDIT:
In particular, I would like to implement a distance-driven projector, in which case the beam trajectory (parallel, fan, etc.) would not matter. A very simple example (MATLAB preferred) would be the best starting point for me.
You have several examples to look at:
Here is a MATLAB example related to 3D cone beam. It can be a good starting point.
Here you have another operator.
Here you have a brief explanation of the distance-driven method. Using the first example together with the explanation in this book, you can get some ideas.
If not, you can always go to the distance-driven operator paper and implement it using the first example as a guide.
As far as I'm aware, there are no freely available implementations of the distance-driven projector/backprojector (it is patented). You can, however, code it yourself without too much difficulty.
Start by reading the papers and understanding what the projector is doing. There are only a few key parts that you need:
Projecting pixel boundaries onto an axis.
Projecting detector boundaries onto an axis.
The overlap kernel.
The first two are simple geometry. The overlap kernel is described in good detail (and mostly usable pseudocode) in the papers.
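As a hedged illustration of the overlap kernel, here is its naive O(np*nd) form (the papers use a linear-time sweep over the two sorted boundary lists instead; pb and db are my names for the projected boundary positions):

% pb: projected pixel boundaries on the common axis, sorted, length np+1
% db: projected detector-cell boundaries on the same axis, sorted, length nd+1
np = numel(pb) - 1;  nd = numel(db) - 1;
overlap = zeros(nd, np);
for i = 1:nd
    for j = 1:np
        lo = max(db(i),   pb(j));
        hi = min(db(i+1), pb(j+1));
        overlap(i,j) = max(hi - lo, 0);   % length of the shared interval
    end
end
% Each detector reading is then the overlap-weighted sum of the pixel
% values, normalized by the detector cell width.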
Note that you won't wind up with an actual matrix that does the projection. The system would be too large for all but the tiniest examples. Instead, you should write a function that implements the linear operator corresponding to distance-driven projection.
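For example, the operator pair can be exposed as a pair of function handles; this sketch uses MATLAB's built-in parallel-beam radon/iradon as stand-ins for the distance-driven kernels (note that unfiltered iradon is only approximately the adjoint of radon):

theta = 0:179;  n = 64;
Afun  = @(m) radon(m, theta);                           % P = A(M), forward projection
Atfun = @(p) iradon(p, theta, 'linear', 'none', 1, n);  % ~A'(P), unfiltered backprojection
P  = Afun(phantom(n));   % use like a matrix, without ever forming one
Mb = Atfun(P);           % plain backprojection: blurry, not the inverse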
Although there are already a lot of satisfactory answers, I would like to mention that I have implemented the Distance-Driven method for 2D Computed Tomography (CT) and 3D Digital Breast Tomosynthesis (DBT) in MATLAB.
So far, for 2D CT, these codes are available:
Simple Distance-Driven, based on the original papers [1] and [2],
Branchless Distance-Driven, for acceleration on GPU, based on the papers [3] and [4],
and for 3D DBT:
Simple Distance-Driven, based on the book [5].
Note that:
1 - The code for DBT is strictly for limited-angle tomography; however, it is straightforward to extend it to a full rotation angle.
2 - All the codes are implemented for CPU.
Please report any issues with the codes so we can keep improving them.
Distance-driven projection is not implemented in stock MATLAB. For forward projection there are the fanbeam() and radon() commands, depending on the geometry you're looking for. I don't consider fanbeam to be very good: it exhibits nonlinear behavior as of R2013a; see here for details.
As for a matching transpose, there is no such function for either fanbeam or parallel geometry. Note that iradon and ifanbeam are not operator implementations of the matching transpose. However, you might consider using FUNC2MAT. It will let you convert any linear operator from function form to matrix form, and then you can transpose freely.
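If you only need a tiny system matrix, as in the 8 x 8 case from the question, you can also do the conversion by hand by probing the operator with unit images; a sketch with radon():

n = 8;  theta = 0:179;
p0 = radon(zeros(n), theta);              % determine the projection size
A  = zeros(numel(p0), n*n);
for j = 1:n*n
    e = zeros(n);  e(j) = 1;              % j-th unit image
    A(:,j) = reshape(radon(e, theta), [], 1);   % j-th column of A
end
% Now P = A*M(:) is the forward projection of an n-by-n image M, and A'
% is the exact matched transpose of this discretized operator.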
Things are like this:
I have some graphs like the pictures above, and I am trying to classify them into different kinds so that the shape of a character can be recognized. Here is what I have done:
I applied a 2-D FFT to the graphs to obtain their spectral content. Here are some results:
S after 2-D FFT
T after 2-D FFT
I have found that the same letter shares the same pattern in the magnitude graph after the FFT, and I want to use this feature to cluster the letters. But there is a problem: I want the features of interest to be representable in a 2-D plane, i.e. in the form (x, y), but the feature here is actually a whole graph with about 600*400 elements. The only thing I am really interested in is the shape of the graph (S is a dot in the middle, and T is like a cross). So what can I do to reduce the dimension of the magnitude graph?
I am not sure my question is clear, but thanks in advance.
You can use dimensionality-reduction methods such as
PCA
MDS
and then separate the reduced features with clustering or classification methods such as
k-means clustering
SVM
Each of these methods can take your 2-dimensional arrays (flattened into vectors) and work out the best coordinate frame to distinguish / represent your letters.
One way to start would be to reduce your 240,000-dimensional space to a 26-dimensional space using one of these methods.
This would give you an 'amplitude' for each of the 26 possible letters.
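As a concrete starting point, here is a hedged PCA sketch (the variable names are mine): stack each 600x400 magnitude image as one row of X, then project onto the top two principal directions to get one (x, y) point per letter image:

% X: m-by-240000 matrix, one flattened FFT-magnitude image per row
Xc = X - mean(X, 1);            % center the data
[~, ~, V] = svd(Xc, 'econ');    % columns of V are the principal directions
Y = Xc * V(:, 1:2);             % 2-D coordinates, one (x, y) per image
scatter(Y(:,1), Y(:,2));        % images of the same letter should cluster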
But as @jucestain says, neural network classifiers are great for letter recognition.
Can anyone advise on the correct FFT to use (real or complex)? I have looked here but still have questions.
I want to do image correlation to identify the location of a sub image within a master image. I understand the basics of FFTs and iFFTs.
The plan:
Perform an FFT on the master image (512x512).
Perform an FFT on the sub image (30x30, but padded with zeros to 512x512).
Take the complex conjugate of the sub image's FFT.
Complex-multiply the two resulting matrices.
Perform an iFFT on the result.
Even though the result should be (mostly) real, take the magnitude of the resulting matrix.
Look for the maximum value, which should correspond to the point of maximum correlation.
I am having trouble getting the results that I anticipate.
If I use the real 2D FFT (vDSP_fft2d_zrip), the result is in a packed format that makes it hard to use vDSP_zvmul to multiply the two result matrices.
If I use the complex FFT (vDSP_fft2d_zip), I fail to get any correlation at all.
The Apple examples and most of the audio examples don't do anything with the results of the forward FFT other than take the inverse.
Can anyone help me get started with image correlation? First question: can I use the complex FFT and avoid the packed format?
The only difference between a real and a complex FFT is that the real FFT can be slightly more efficient by using a clever packing scheme that transforms a 2^n-point real FFT into a 2^(n-1)-point complex FFT. The results should be the same in both cases, so I would stick with the complex FFT for simplicity if I were you, at least until you have everything working.
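For reference, the whole pipeline with an ordinary complex FFT looks like this (a MATLAB sketch of the plan above for clarity, not vDSP code; master and sub are placeholder names):

F = fft2(master);                    % 512x512 master image
G = fft2(sub, 512, 512);             % 30x30 sub image, zero-padded to 512x512
C = ifft2(F .* conj(G));             % conjugate product, then inverse FFT
[~, idx] = max(abs(C(:)));           % peak of the (mostly real) result
[row, col] = ind2sub(size(C), idx);  % location of maximum correlation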
Have you also taken a look at vImageConvolve_ARGB8888? It seems to do what you are trying to do, with a lot less effort :)
I've been trying to figure out how to take a homography between two planes and convert it into a projective transform. MATLAB does this automatically, but I've been trying to figure out how MATLAB implements the conversion.
You can look at the source code in toolbox\images\images\maketform.m
At least within the editor you can get to this by hitting F4 on the function name.
A homography is a projective transform that maps lines to lines and preserves the cross-ratio, but does not preserve parallelism or other similarity invariants (angles, distances, etc.).
A homography can be expressed as a homogeneous 3x3 matrix and computed in many (really, many) different ways depending on your problem.
The most typical way is to determine 4 point correspondences between the two planes and use the Direct Linear Transform (DLT). There are many implementations of the DLT. If you are familiar with OpenCV, you can easily obtain such a homography matrix using cv::findHomography (http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html?highlight=findhomography#findhomography).
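As a hedged illustration of the DLT with exactly 4 correspondences (variable names are mine; production code should also normalize the points first, as Hartley & Zisserman recommend):

% p, q: 4-by-2 matrices of corresponding (x, y) points, with q ~ H*p
A = zeros(8, 9);
for i = 1:4
    x = p(i,1); y = p(i,2); u = q(i,1); v = q(i,2);
    A(2*i-1,:) = [x y 1 0 0 0 -u*x -u*y -u];
    A(2*i,  :) = [0 0 0 x y 1 -v*x -v*y -v];
end
[~, ~, V] = svd(A);
H = reshape(V(:,end), 3, 3)';   % the homography, defined up to scale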
In general, I recommend taking a look at the book "Multiple View Geometry" by Hartley & Zisserman, which explains the concept of homographies in the context of computer vision in detail.