How to interpret upsampling(deconv, nn, bilinear) as matrix? - neural-network

I am reading this Distill article Deconvolution and Checkerboard Artifacts about avoiding artifacts in images generated by neural networks.
In the section of Better Upsampling, the author compares the difference between deconvolution (i.e. transposed convolution), NN-resize then convolution, and bilinear-resize then convolution. He says:
Both deconvolution and the different resize-convolution approaches are linear operations, and can be interpreted as matrices.
Above the sentence, he shows the figure on how to interpret upsampling methods as matrices. I am confused about the figure. What is the meaning of a, b, c? How do the matrices in the bottom half correspond to the grey maps in the top half?

Related

Lukas Kanade optical flow: Understanding the math

I found a Matlab implementation of the LKT algorithm here and it is based on the brightness constancy equation.
The algorithm calculates the Image gradients in x and y direction by convolving the image with appropriate 2x2 horizontal and vertical edge gradient operators.
The brightness constancy equation in the classic literature has on its right hand side the difference between two successive frames.
However, in the implementation referred to by the aforementioned link, the right hand side is the difference of convolution.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
Why couldn't It_m be simply calculated as:
it_m = im1 - im2;
As you mentioned, in theory only pixel by pixel difference is stated for optical flow computation.
However, in practice, all natural (not synthetic) images contain some degree of noise. On the other hand, differentiating is some kind of high pass filter and would stress (high pass) noise ratio to the signal.
Therefore, to avoid artifact caused by noise, usually an image smoothing (or low pass filtering) is carried out prior to any image differentiating (we have such process in edge detection too). The code does exactly this, i.e. apply and moving average filter on the image to reduce noise effect.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
(Comments converted to an answer.)
In theory, there is nothing wrong with taking a pixel-wise difference:
Im_t = im1-im2;
to compute the time derivative. Using a spatial smoother when computing the time derivative mitigates the effect of noise.
Moreover, looking at the way that code computes spatial (x and y) derivatives:
Ix_m = conv2(im1,[-1 1; -1 1], 'valid');
computing the time derivate with a similar kernel and the valid option ensures the matrices It_x, It_y and Im_t have compatible sizes.
The temporal partial derivative(along t), is connected to the spatial partial derivatives (along x and y).
Think of the video sequence you are analyzing as a volume, spatio-temporal volume. At any given point (x,y,t), if you want to estimate partial derivatives, i.e. estimate the 3D gradient at that point, then you will benefit from having 3 filters that have the same kernel support.
For more theory on why this should be so, look up the topic steerable filters, or better yet look up the fundamental concept of what partial derivative is supposed to be, and how it connects to directional derivatives.
Often, the 2D gradient is estimated first, and then people tend to think of the temporal derivative secondly as independent of the x and y component. This can, and very often do, lead to numerical errors in the final optical flow calculations. The common way to deal with those errors is to do a forward and backward flow estimation, and combine the results in the end.
One way to think of the gradient that you are estimating is that it has a support region that is 3D. The smallest size of such a region should be 2x2x2.
if you do 2D gradients in the first and second image both using only 2x2 filters, then the corresponding FIR filter for the 3D volume is collected by averaging the results of the two filters.
The fact that you should have the same filter support region in 2D is clear to most: thats why the Sobel and Scharr operators look the way they do.
You can see the sort of results you get from having sanely designed differential operators for optical flow in this Matlab toolbox that I made, in part to show this particular point.

How to visualize kernel generated by a trained convolutional neural network?

I found the picture below:
which is shown in this web page
I wonder how are this kind of images generated?
This picture is made by printing out the weights of first layer filters, later in that course there's a section about visualizing the networks.
For higher layers printing out the weights may not make sense, there's a paper that might be useful Visualizing Higher-Layer Features of a Deep Network.
Each convolutional kernel is just a matrix N x M (weight matrix), thus you can simply plot it (each square in the above plot is a single convoluational matrix). Color ones are probably taken from 3-channel convolution, thus each is encoded as one color.

What are the prerequisite to understand Fourier Descriptor program in matlab?

I am new to Digital Image Processing and have to simulate a Fourier Descriptor Program that is Affine Invariant, I want to know the prerequisites required to be able to understand this program, my reference is Digital Image Processing Using MATLAB by Gonzalez, I have seen a question on this site, regarding same program, but not able to understand the program as well as the solution, the question says:
"I am using Gonzalez frdescp function to get Fourier descriptors of a boundary. I use this code, and I get two totally different sets of numbers describing two identical but different in scale shapes.
So what is wrong?"
Can some body help me in knowing the prerequisite to understand this program as well as help me further?
Let me give this a try as I will have to use english and not mathematical notation. First, this is the documentation of the frdescp shown here. frdescp takes one argument which is an n by 2 matrix of numbers. What are these numbers? This requires some understanding of the mathematical foundation of Fourier Descriptors. The assumptions, before computing the Fourier Descriptors, is that you have a contour of the object, and you have some points on that contour. So for example a contour is shown in this picture:
You see that black line in the image? That is where you will pick a list of points going clockwise from the contour. Let's call this vector {(x_1, y_1), (x_2,y_2),... ,(x_n,y_n)}. Now that we have these points we are ready to compute the Fourier descriptors of this contour. The complex Fourier descriptor implemented in this Matlab function requires numbers to be in the complex domain. So you have to convert the numbers in our list to complex numbers, this is easy as you can transform a tuple of real numbers in 2D (x,y) to x + iy in the complex plane. However the matlab the function already does this for you. But now you know what the n by 2 matrix is for, it is just a list of xs and ys on the contour. After you have this, the matlab function takes the discrete Fourier transform and you get the descriptors. The benefit of this descriptor business is that it is invariant under certain geometric transformations such translation, rotation and scaling. I hope this was helpful.

Hessian Matrix of the image

I am wondering what information does an Hessian Matrix of an image provides? Does it provide the information of the stable points? What is Hessian matrix used for?
Hessian matrix describes the 2nd order local image intensity variations around the selected voxel. For the obtained Hessian matrix, eigenvector decomposition extracts an orthonormal coordinate system that is aligned with the second order structure of the image. Having the eigenvalues and knowing the
(assumed) model of the structure to be detected and the resulting theoretical behavior of the eigenvalues, the decision can be made if the analyzed voxel belongs to the structure being searched.
The figure below illustrates the correspondence between eigenvalues of the hessian operation on the image and the local features (corner, edge, or flat region).
The Hessian operator is also widely used in 3D images, and it can reflect more local features:
It is widely used in vessel detection in medical images. For more details, please see M.Rudzki et al's Vessel Detection Method Based on Eigenvalues of the Hessian Matrix and its Applicability to Airway Tree Segmentation

Relevant eigenvectors in a face image?

I am using PCA for face recognition. I have obtained the eigenvectors / eigenfaces for each image, which is a colomn matrix. I want to know if selecting the first three eigenvectors , since their corresponding eigenvalues amount to 70% of total variance, will be sufficient for face recognition?
Firstly, lets be clear about a few things. The eigenvectors are computed from the covariance matrix formed from the entire dataset i.e., you reshape each grayscale image of a face into a single column and treat it as a point in R^d space, compute the covariance matrix from them and compute the eigenvectors of the covariance matrix. These eigenvectors become a new basis for your space of face images. You do not have eigenvectors for each image. Instead, you represent each face image in terms of the eigenvectors by projecting onto (a possibly subset) of them.
Limitations of eigenfaces
As to whether the representation of your face images under this new basis good enough for face recognition depends on many factors. But in general, the eigenfaces method does not perform well for real world unconstrained faces. It only works for faces which are pixel-wise aligned, facing frontal, and has fairly uniform illumination conditions across the images.
More is not necessarily better
While it is commonly believed (when using PCA) that retaining more variance is better than less, things are much more complicated than that because of two factors: 1) Noise in real world data and 2) dimensionality of data. Sometimes projecting to a lower dimension and losing variance can actually produce better results.
Conclusion
Hence, my answer is it is difficult to say whether retaining a certain amount of variance is enough beforehand. The number of dimensions (and hence the number of eigenvectors to keep and the associated variance retained) should be determined by cross-validation. But ultimately, as I have mentioned above, eigenfaces is not a good method for face recognition unless you have a "nice" dataset. You might be slightly better off using "Fisherfaces" i.e., LDA on the face images or combine these methods with Local Binary Pattern (LBP) as features (instead of raw face pixels). But seriously, face recognition is a difficult problem and in general the state-of-the-art has not reached a stage where it can be deployed in real world systems.
It's not impossible, but a little rare to me that only 3 eigenvalues can achieve 70% variance. How many training samples do you have (what is the total dimension)? Make sure you are reshape each image from the database into a vector, normalize the vector data then align them into a matrix. The eigenvalues/eigenvectors are obtained from the covariance of the matrix.
In theory, 70% variance should be enough to form a human-recognizable face with the corresponding eigenvectors. However, the optimal number of eigenvalues is better to get from cross-validation: you can try to increase 1 eigenvector each time, observe the face formation and the recognition accuracy. You can even plot the cross validation accuracy curve, there may be a sharp corner on the curve, then the corresponding eigenvector number is hopefully applied in your test.