Convolution in log space

I need to do a convolution with a given kernel. This kernel is defined in linear space (in this case: wavelength), but I need to do the convolution in logarithmic space. Does anyone know how to handle this?
Some first thoughts:
The convolution must be done piecewise, because the kernel expressed in log space changes with wavelength. For instance, convolving with a Gaussian defined in linear space requires a different log-space kernel, with a different sigma, at each wavelength.
How do I handle this? Shift the kernel to the desired wavelength, log it, and re-shift back to zero?
If this is true, that means that a symmetric kernel in linear space will be asymmetric in log space, doesn't it?
Thanks!
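A minimal sketch of the piecewise idea above, assuming the data are sampled on a uniform grid in log(wavelength) and the kernel is a Gaussian of fixed width sigma_lin in linear wavelength. The function names and the numbers (300 to 900, sigma 50) are made up for illustration. Each output sample gets its own kernel, evaluated at the linear-wavelength offsets of the log-grid samples and weighted by the Jacobian d(lambda) = lambda d(log lambda):

```python
import numpy as np

def log_space_kernel(log_wave, i, sigma_lin):
    """Weights on a uniform log-wavelength grid for a Gaussian of fixed
    linear width sigma_lin centred on sample i."""
    wave = np.exp(log_wave)
    delta = wave - wave[i]                       # offsets in linear units
    weights = np.exp(-0.5 * (delta / sigma_lin) ** 2)
    weights *= wave                              # Jacobian of the change of variable
    return weights / weights.sum()               # unit sum, so a flat signal stays flat

def convolve_on_log_grid(signal, log_wave, sigma_lin):
    """Piecewise convolution: one kernel per output sample."""
    return np.array([log_space_kernel(log_wave, i, sigma_lin) @ signal
                     for i in range(len(log_wave))])

log_wave = np.linspace(np.log(300.0), np.log(900.0), 601)
kernel = log_space_kernel(log_wave, 300, sigma_lin=50.0)
# Samples equally far from the centre in log space get different weights.
print(kernel[200], kernel[400])
```

In this sketch the kernel is indeed asymmetric on the log grid, which is consistent with the guess above. The loop is O(N^2); it is meant to show the construction, not to be fast.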

Related

Lucas-Kanade optical flow: Understanding the math

I found a Matlab implementation of the LKT algorithm here and it is based on the brightness constancy equation.
The algorithm calculates the Image gradients in x and y direction by convolving the image with appropriate 2x2 horizontal and vertical edge gradient operators.
The brightness constancy equation in the classic literature has on its right hand side the difference between two successive frames.
However, in the implementation referred to by the aforementioned link, the right hand side is the difference of convolution.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
Why couldn't It_m be simply calculated as:
it_m = im1 - im2;
As you mentioned, in theory only pixel by pixel difference is stated for optical flow computation.
However, in practice, all natural (not synthetic) images contain some degree of noise. Differentiation, on the other hand, acts as a high-pass filter and amplifies the noise relative to the signal.
Therefore, to avoid artifacts caused by noise, image smoothing (low-pass filtering) is usually carried out before any image differentiation (the same is done in edge detection). The code does exactly this, i.e. it applies a moving-average filter to the images to reduce the effect of noise.
It_m = conv2(im1,[1,1;1,1]) + conv2(im2,[-1,-1;-1,-1]);
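A small NumPy sketch of that point, under assumptions that are not in the original code: a static random scene, independent Gaussian noise of standard deviation 0.05 in each frame, and only the 'valid' region of the 2x2 sum (the conv2 call above uses the default 'full' output). By linearity, summing im1 over 2x2 blocks and subtracting the same sum of im2 is the same as summing the difference image, so the line above is a smoothed frame difference:

```python
import numpy as np

def box_sum_2x2(image):
    """Sum over each 2x2 neighbourhood ('valid' region only)."""
    im = np.asarray(image, dtype=float)
    return im[:-1, :-1] + im[:-1, 1:] + im[1:, :-1] + im[1:, 1:]

rng = np.random.default_rng(0)
scene = rng.random((64, 64))                       # static scene: the true It is zero
im1 = scene + 0.05 * rng.standard_normal(scene.shape)
im2 = scene + 0.05 * rng.standard_normal(scene.shape)

plain = im1 - im2                                  # pixel-wise difference
smoothed = (box_sum_2x2(im1) - box_sum_2x2(im2)) / 4.0   # averaged over 2x2 blocks

# Both are pure noise here; the averaged version has the smaller spread.
print(plain.std(), smoothed.std())
```

The division by 4 is only there to make the two estimates comparable; the original code leaves the sum unscaled.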
(Comments converted to an answer.)
In theory, there is nothing wrong with taking a pixel-wise difference:
Im_t = im1-im2;
to compute the time derivative. Using a spatial smoother when computing the time derivative mitigates the effect of noise.
Moreover, looking at the way that code computes spatial (x and y) derivatives:
Ix_m = conv2(im1,[-1 1; -1 1], 'valid');
computing the time derivative with a similar kernel and the 'valid' option ensures the matrices It_x, It_y and Im_t have compatible sizes.
The temporal partial derivative (along t) is connected to the spatial partial derivatives (along x and y).
Think of the video sequence you are analyzing as a volume, spatio-temporal volume. At any given point (x,y,t), if you want to estimate partial derivatives, i.e. estimate the 3D gradient at that point, then you will benefit from having 3 filters that have the same kernel support.
For more theory on why this should be so, look up the topic steerable filters, or better yet look up the fundamental concept of what partial derivative is supposed to be, and how it connects to directional derivatives.
Often, the 2D gradient is estimated first, and then people tend to think of the temporal derivative secondly as independent of the x and y component. This can, and very often does, lead to numerical errors in the final optical flow calculations. The common way to deal with those errors is to do a forward and backward flow estimation, and combine the results in the end.
One way to think of the gradient that you are estimating is that it has a support region that is 3D. The smallest size of such a region should be 2x2x2.
If you compute 2D gradients in the first and second image, both using only 2x2 filters, then the corresponding FIR filter for the 3D volume is obtained by averaging the results of the two filters.
The fact that you should have the same filter support region in 2D is clear to most: that's why the Sobel and Scharr operators look the way they do.
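As an illustration of the 2x2x2 support idea, here is a hedged NumPy sketch (not taken from the toolbox mentioned in this answer). The function name and the 1/4 scaling are my own choices, and It is taken as im2 minus im1, which is the opposite sign to the Matlab snippet in the question. All three derivatives are built from the same eight samples of the spatio-temporal volume:

```python
import numpy as np

def gradients_2x2x2(im1, im2):
    """Ix, Iy, It estimated over the same 2x2x2 block of the (x, y, t) volume."""
    a = np.asarray(im1, dtype=float)
    b = np.asarray(im2, dtype=float)

    def dx(im):   # horizontal differences, summed over the two rows of the block
        return (im[:-1, 1:] - im[:-1, :-1]) + (im[1:, 1:] - im[1:, :-1])

    def dy(im):   # vertical differences, summed over the two columns of the block
        return (im[1:, :-1] - im[:-1, :-1]) + (im[1:, 1:] - im[:-1, 1:])

    def box(im):  # 2x2 sum
        return im[:-1, :-1] + im[:-1, 1:] + im[1:, :-1] + im[1:, 1:]

    ix = (dx(a) + dx(b)) / 4.0     # spatial gradients averaged over both frames
    iy = (dy(a) + dy(b)) / 4.0
    it = (box(b) - box(a)) / 4.0   # temporal difference averaged over the 2x2 block
    return ix, iy, it

# Linear ramp: intensity = 2*x + 3*y, brightened by 5 between the frames.
y, x = np.mgrid[0:8, 0:10]
frame1 = 2.0 * x + 3.0 * y
frame2 = frame1 + 5.0
ix, iy, it = gradients_2x2x2(frame1, frame2)
print(ix.shape, iy.shape, it.shape)
```

Because the three filters share one support region, the outputs have identical sizes and refer to the same point of the volume (the centre of each 2x2x2 block).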
You can see the sort of results you get from having sanely designed differential operators for optical flow in this Matlab toolbox that I made, in part to show this particular point.

Deconvolution of data convolved by a Gaussian response

I have a set of experimental data s(t) which consists of a vector (with 81 points as a function of time t).
From the physics, this is the result of the convolution of the system response e(t) with a probe p(t), which is a Gaussian (actually a laser pulse). In terms of vector, its FWHM covers approximately 15 points in time.
I want to deconvolve this data in Matlab using the convolution theorem: FT{e(t)*p(t)}=FT{e(t)}xFT{p(t)} (where * is the convolution, x the product and FT the Fourier transform).
The procedure itself is no problem, if I suppose a Dirac function as my probe, I recover exactly the initial signal (which makes sense, measuring a system with a Dirac gives its impulse response)
However, the Gaussian probe, as far as I understand, turns out to be a critical case. When I divide the signal in Fourier space by the FT of the probe, the near-zero values in the wings of the probe's spectrum strongly amplify those frequencies, and I completely lose my initial signal instead of obtaining a deconvolved one.
From your experience, which method could be used here (Hamming windows, some other windowing technique, or...)? This looks rather simple, but I did not find an easy method to follow in the signal processing literature, and this is not my field.
You have noise in your experimental data, don't you? The problem is then ill-posed (not uniquely solvable) and you need regularization.
If the noise is Gaussian the keywords are Tikhonov regularization or Wiener filtering.
Basically, add a positive regularization factor that acts as a lowpass filter. In your notation (s the measured data, p the probe), the estimate o(t) of the true response e(t) then becomes (here * is the ordinary product):
o(t) = FT^-1(FT(s)*conj(FT(p))/(abs(FT(p))^2+l))
with a suitable l>0.
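A minimal NumPy sketch of this regularized division, on made-up data: 81 samples, a Gaussian probe with a FWHM of about 15 samples, a box-shaped "true" response, and noise of standard deviation 0.01. The function name and the value l = 1e-3 are assumptions for illustration; the regularization factor has to be tuned to the actual noise level. The convolution is treated as circular, which is what the FFT implies:

```python
import numpy as np

def regularized_deconvolve(s, p, lam):
    """Estimate e from s = e (*) p (circular convolution) with the
    regularized inverse filter conj(P) / (|P|^2 + lam)."""
    S = np.fft.fft(s)
    P = np.fft.fft(np.fft.ifftshift(p))      # move the probe peak to sample 0
    E = S * np.conj(P) / (np.abs(P) ** 2 + lam)
    return np.real(np.fft.ifft(E))

n = 81
t = np.arange(n)
sigma = 15.0 / 2.355                          # FWHM of about 15 samples
p = np.exp(-0.5 * ((t - n // 2) / sigma) ** 2)
p /= p.sum()

e_true = np.where((t > 25) & (t < 50), 1.0, 0.0)
rng = np.random.default_rng(0)
s = np.real(np.fft.ifft(np.fft.fft(e_true) * np.fft.fft(np.fft.ifftshift(p))))
s += 0.01 * rng.standard_normal(n)            # measurement noise

e_reg = regularized_deconvolve(s, p, lam=1e-3)
e_naive = regularized_deconvolve(s, p, lam=0.0)   # plain spectral division

print(np.linalg.norm(e_reg - e_true), np.linalg.norm(e_naive - e_true))
```

With lam = 0 this reduces to the plain division described in the question, and the noise at frequencies where the probe spectrum is essentially zero dominates the result.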
You are trying to deconvolve under the assumption that the filter model is a Gaussian blur.
A few notes on deconvolution:
Since your data is real (not synthetic), it includes some kind of noise.
Hence it is better to use the Wiener filter (even under the assumption of low-variance noise). Otherwise, the "deconvolution filter" will increase the noise significantly (as it is basically a high-pass filter).
When doing the division in the Fourier domain, zero-pad the signals to the correct size or, better yet, create the Gaussian filter in the time domain with the same number of samples as the signal.
Boundaries will create artifacts; windowing might be useful.
There are many more sophisticated deconvolution methods that define a more detailed model of the signal and the noise. If you have more prior information about them, you should look into this kind of framework.
You can always set a threshold on the amplification level for certain frequencies; do that if needed.
Use as many samples as you can.
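To make the "threshold on the amplification level" note concrete, here is one possible sketch in NumPy. The function name, the cap of 100, and the 81-sample Gaussian are illustrative assumptions. The probe is built in the time domain with the same number of samples as the signal, and the inverse filter 1/P is used only where its gain stays below the cap; elsewhere the gain is held at the cap and only the phase of 1/P is kept:

```python
import numpy as np

def clipped_inverse_filter(P, max_gain):
    """Inverse filter 1/P with its magnitude limited to max_gain."""
    mag = np.abs(P)
    floor = np.maximum(mag, 1.0 / max_gain)    # a floor on |P| caps the gain
    # conj(P) / |P|^2 equals 1/P; replacing one |P| by the floored value clips it.
    return np.conj(P) / (mag * floor + np.finfo(float).tiny)

n = 81
t = np.arange(n)
sigma = 15.0 / 2.355                            # FWHM of about 15 samples
probe = np.exp(-0.5 * ((t - n // 2) / sigma) ** 2)
probe /= probe.sum()                            # same length as the signal, unit area

P = np.fft.fft(np.fft.ifftshift(probe))
H = clipped_inverse_filter(P, max_gain=100.0)

def deconvolve(signal):
    """Apply the clipped inverse filter to a measured signal of length n."""
    return np.real(np.fft.ifft(np.fft.fft(signal) * H))

print(np.abs(H).max())
```

A hard cap like this is cruder than the Wiener filter mentioned above, but it needs only one number to tune.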
I hope this will assist you.

Hyper-parameters of Gaussian Processes for Regression

I know a Gaussian Process Regression model is mainly specified by its covariance matrix, and the free hyper-parameters act as the 'weights' of the model. But could anyone explain what the 2 hyper-parameters (length-scale & amplitude) in the covariance matrix represent (since they are not 'real' parameters)? I'm a little confused about the 'actual' meaning of these 2 parameters.
Thank you for your help in advance. :)
First off, I would like to point out that there are an infinite number of kernels that could be used in a Gaussian process. One of the most common, however, is the RBF (also referred to as the squared exponential, the exponentiated quadratic, etc.). This kernel is of the following form:
k(x, x') = sigma^2 * exp(-(x - x')^2 / (2 * l^2))
The above equation is of course for the simple 1D case. Here l is the length scale and sigma^2 is the variance parameter (note they go under different names depending on the source). Effectively, the length scale controls how similar two points appear to be, as it simply rescales the distance between x and x'; a larger l gives smoother, more slowly varying functions. The variance parameter controls the overall amplitude of the function, i.e. how far it typically strays from its mean. These are related but not the same.
The Kernel Cookbook gives a nice little description and compares RBF kernels to other commonly used kernels.
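A short NumPy sketch of the two roles, assuming the usual 1D squared-exponential form k(x, x') = sigma^2 exp(-(x - x')^2 / (2 l^2)). The function name and the numbers are illustrative only. The diagonal of the covariance matrix is sigma^2 whatever the length scale, while the correlation between two points one unit apart depends only on l:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale, sigma):
    """Squared-exponential covariance between two sets of 1D inputs."""
    d = np.subtract.outer(np.asarray(x1, dtype=float), np.asarray(x2, dtype=float))
    return sigma ** 2 * np.exp(-0.5 * (d / length_scale) ** 2)

x = np.array([0.0, 1.0])                 # two inputs one unit apart
for l in (0.5, 2.0):
    K = rbf_kernel(x, x, length_scale=l, sigma=3.0)
    variance = K[0, 0]                   # set by sigma alone
    correlation = K[0, 1] / K[0, 0]      # set by the length scale alone
    print(l, variance, correlation)
```

With a short length scale the two points are nearly uncorrelated, so sampled functions can change quickly between them; with a long one they are strongly correlated.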

Minimizing a fitted gaussian process

I fitted a computational model with scikit's gaussian process. I would like to find the minimum of the fitted gaussian process.
A Gaussian process (GP) is a probability distribution over an infinite number of functions, with an infinite number of minima. That is the beauty of a GP, although perhaps in your view a downfall.
What I believe you mean to ask is what the minimum of the mean function of a GP is. This, unfortunately, is not something we can write down in closed form, which is probably why scikit does not have it built into the framework.
If you really want to find it, I suggest either evaluating the mean on a linearly spaced grid over your input space and picking the minimum (if your input space is low dimensional), or expanding the expression for the mean function and differentiating with respect to the input to find the extreme points and the global minimum (not much use when you have many observations).
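The grid option can be sketched as follows for a 1D input. To keep the example self-contained, the posterior mean is computed by hand in NumPy with an RBF kernel of length scale 1 and a small jitter; both are assumptions, as are the function names and the toy data. With a fitted scikit model you would presumably pass a wrapper around its predict method instead of gp_mean_function:

```python
import numpy as np

def gp_mean_function(x_train, y_train, length_scale=1.0, jitter=1e-6):
    """Posterior mean of a zero-mean GP with an RBF kernel (1D inputs)."""
    def k(a, b):
        return np.exp(-0.5 * (np.subtract.outer(a, b) / length_scale) ** 2)
    gram = k(x_train, x_train) + jitter * np.eye(len(x_train))
    alpha = np.linalg.solve(gram, y_train)
    return lambda x: k(np.asarray(x, dtype=float), x_train) @ alpha

def minimize_on_grid(mean, lower, upper, num=2001):
    """Evaluate the mean on a linearly spaced grid and return the best point."""
    grid = np.linspace(lower, upper, num)
    values = mean(grid)
    best = np.argmin(values)
    return grid[best], values[best]

# Toy observations of (x - 1)^2, standing in for the computational model.
x_train = np.linspace(-2.0, 4.0, 13)
y_train = (x_train - 1.0) ** 2
mean = gp_mean_function(x_train, y_train)
x_min, f_min = minimize_on_grid(mean, -2.0, 4.0)
print(x_min, f_min)
```

The grid point can then be used as the starting point of a local optimizer if more precision is needed. The cost of a grid grows exponentially with the input dimension, hence the low-dimensional caveat above.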

How to efficiently solve linear system with Laplacian + diagonal matrix?

In my implementation of an image processing algorithm, I have to solve a large linear system of the form A*x=b, where:
Matrix A=L+D is the sum of a Laplacian matrix L and a diagonal matrix D
Laplacian matrix L is sparse, with about 25 non-zeros per row
The system is large, with as many unknowns as there are pixels in the input image (typically > 1 million).
The Laplacian matrix L does not change between successive runs of the algorithm; I can construct this matrix in preprocessing, and possibly compute its factorization. The diagonal matrix D and right-side vector b change at each run of the algorithm.
I am trying to find out what would be the fastest method to solve the system at runtime; I do not mind spending time on preprocessing (for computing a factorization of L, for example).
My initial idea was to pre-compute a Cholesky factorization of L, then update the factorization at runtime with values from D (rank-1 update with cholupdate), and solve quickly the problem with back-substitution. Unfortunately, the Cholesky factorization is not as sparse as the original L matrix, and just loading it from disk already takes 5.48s; as a comparison, it takes 8.30s to directly solve the system with backslash.
Given the shape of my matrices, is there any other method that you would recommend to speed up the solving at runtime, no matter how long it takes at preprocessing time?
Assuming that you are working on a grid (since you mention images - although this is not guaranteed), that you are more interested in speed than precision (since 5s seems already too slow for 1 million unknowns), I see several options.
First, forget about exact methods such as Cholesky (+reordering). Even if they let you store the factorization and reuse it for multiple right-hand sides, you'll likely need to store gigantic matrices that appear to be intractable in your case (I hope you're re-ordering rows/columns with reverse Cuthill-McKee or something similar though; that sparsifies the factorization a lot).
Depending on your boundary conditions, I would first try a Matlab poisolv that solves a Poisson problem using an FFT, with possible reprojections if you want Dirichlet boundary conditions instead of periodic ones. It's very fast, but might not be appropriate for your problem (you mention having 25 nnz per row for a Laplacian matrix + identity: why? Is it a high-order Laplace matrix, in which case you may be more interested in precision than I assume? Or is it in fact a different problem from the one you describe?).
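To show what an FFT-based solve looks like, here is a hedged NumPy sketch under strong assumptions that may not match your problem: a 5-point Laplacian (not your 25-nnz one), periodic boundaries, and a diagonal term that is a constant c > 0 times the identity. Under those assumptions the matrix is diagonalized by the 2D DFT, so the solve is one forward and one inverse FFT. The function names are made up:

```python
import numpy as np

def solve_periodic_laplacian_plus_constant(b, c):
    """Solve (L + c*I) x = b, with L the 5-point Laplacian on a periodic grid."""
    rows, cols = b.shape
    # Eigenvalues of the 1D periodic second-difference operator along each axis.
    ky = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(rows) / rows)
    kx = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(cols) / cols)
    eigenvalues = ky[:, None] + kx[None, :]
    return np.real(np.fft.ifft2(np.fft.fft2(b) / (eigenvalues + c)))

def apply_periodic_laplacian(u):
    """Reference operator: 4*u minus the four wrapped-around neighbours."""
    return (4.0 * u - np.roll(u, 1, axis=0) - np.roll(u, -1, axis=0)
            - np.roll(u, 1, axis=1) - np.roll(u, -1, axis=1))

rng = np.random.default_rng(0)
b = rng.standard_normal((32, 48))
x = solve_periodic_laplacian_plus_constant(b, c=0.5)
print(np.abs(apply_periodic_laplacian(x) + 0.5 * x - b).max())
```

With a diagonal that varies per pixel this no longer solves the system exactly, but the same transform could still serve as a preconditioner for an iterative method.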
Then, you can try multigrid solvers that are very fast for images and smooth problems. You can use a simple relaxation method for each iteration and each level of the multigrid, or use fancier methods (for instance, a preconditioned conjugate gradient per level).
Alternatively, you can do a simpler preconditioned conjugate gradient (or even SSOR) without multigrid, and if you're only interested in an approximate solution, you can stop the iterations before full convergence.
My arguments for iterative solvers are:
you can stop before convergence if you want an approximate solution
you can still re-use other results to initialize your solution (for instance, if your different runs correspond to different frames of a video, then using the solution of the previous frame as an initialization of the next would make some sense).
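Both arguments can be seen in a small conjugate gradient sketch. It is written in NumPy for a 4-connected grid Laplacian applied without forming the matrix, with a Jacobi (diagonal) preconditioner. The stencil, the tolerance, the function names and the test sizes are my assumptions, not your actual operator. The tol argument gives the early stop and x0 gives the warm start:

```python
import numpy as np

def apply_grid_laplacian(u):
    """Graph Laplacian of a 4-connected pixel grid, applied matrix-free."""
    out = np.zeros_like(u)
    dv = u[1:, :] - u[:-1, :]          # differences between vertical neighbours
    dh = u[:, 1:] - u[:, :-1]          # differences between horizontal neighbours
    out[1:, :] += dv
    out[:-1, :] -= dv
    out[:, 1:] += dh
    out[:, :-1] -= dh
    return out

def grid_degree(shape):
    """Diagonal of the grid Laplacian: number of neighbours of each pixel."""
    deg = np.full(shape, 4.0)
    deg[0, :] -= 1.0
    deg[-1, :] -= 1.0
    deg[:, 0] -= 1.0
    deg[:, -1] -= 1.0
    return deg

def pcg_solve(d, b, x0=None, tol=1e-8, max_iter=500):
    """Jacobi-preconditioned CG for (L + diag(d)) x = b; returns (x, iterations)."""
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
    inv_diag = 1.0 / (grid_degree(b.shape) + d)
    r = b - (apply_grid_laplacian(x) + d * x)
    stop = tol * np.linalg.norm(b)
    if np.linalg.norm(r) <= stop:
        return x, 0
    z = inv_diag * r
    p = z.copy()
    rz = np.sum(r * z)
    for iteration in range(1, max_iter + 1):
        ap = apply_grid_laplacian(p) + d * p
        alpha = rz / np.sum(p * ap)
        x += alpha * p
        r -= alpha * ap
        if np.linalg.norm(r) <= stop:  # a looser tol gives an approximate solution sooner
            return x, iteration
        z = inv_diag * r
        rz_new = np.sum(r * z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

rng = np.random.default_rng(1)
d = 0.1 + rng.random((16, 16))         # positive diagonal, changes at each run
b = rng.standard_normal((16, 16))
x_cold, n_cold = pcg_solve(d, b)
x_warm, n_warm = pcg_solve(d, b, x0=x_cold)   # warm start from a previous solution
print(n_cold, n_warm)
```

Only matrix-vector products with L are needed, so nothing beyond L itself has to be stored or loaded from disk.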
Of course, a direct solver for which you can precompute, store and keep the factorization also makes sense (although I don't understand your argument for a rank-1 update if your matrix is constant) since only the backsubstitution remains to be done at runtime. But given this ignores the structure of the problem (a regular grid, a possible interest in limited precision results etc.), I'd opt for methods which have been designed for these cases such as Fourier-like methods or multigrids. Both methods can be implemented on the GPU for faster results (recall that GPUs are rather tailored for dealing with images/textures!).
Finally, you can get interesting answers from scicomp.stackexchange which is more targeted to numerical analysis.