How to vectorize the intersection kernel function in MATLAB? - matlab

I need to pre-compute the histogram intersection kernel matrices for using LIBSVM in MATLAB.
Assume x, y are two vectors. The kernel function is K(x, y) = sum(min(x, y)). In order to be efficient, the best practice in most cases is to vectorize the operations.
What I want to do is like calculate the kernel matrices like calculating the euclidean distance between two matrices, like pdist2(A, B, 'euclidean'). After defining function 'intKernel', I could calculate the intersection kernel by calling pdist2(A, B, intKernel).
I know the function 'pdist2' may be an option. But I have no idea how to write the self-defined distance function. While, I do not know how to code the intersection kernel between vector(1-by-M) and matrix(M-by-N) in one condense expression.
'repmat' may not be feasible, because the matrix is really large, let us say, 20000-by-360000.
Any help would be appreciated.
Regards,
Peiyun

I think pdist2 is a good option, so I help you to define your distance function.
According to the doc, the self-defined distance function must have 2 inputs: first one is a 1-by-N vector; second one is a M-by-N matrix (be careful of the order!).
To avoid the use of repmat which is indeed memory-consumant, you can use bsxfun to apply some basic operations on data with expansion over singleton dimensions. In your case, you can do the following thing:
distance_kernel = #(x,Y) sum(bsxfun(#min,x,Y),2);
Summation is done over the columns to get a column vector as output.
Then just call pdist2 and you are done.

Related

Nonlinear curve fitting of a matrix function in python

I have the following problem. I have a N x N real matrix called Z(x; t), where x and t might be vectors in general. I have N_s observations (x_k, Z_k), k=1,..., N_s and I'd like to find the vector of parameters t that better approximates the data in the least square sense, which means I want t that minimizes
S(t) = \sum_{k=1}^{N_s} \sum_{i=1}^{N} \sum_{j=1}^N (Z_{k, i j} - Z(x_k; t))^2
This is in general a non-linear fitting of a matrix function. I'm only finding examples in which one has to fit scalar functions which are not immediately generalizable to a matrix function (nor a vector function). I tried using the scipy.optimize.leastsq function, the package symfit and lmfit, but still I don't manage to find a solution. Eventually, I'm ending up writing my own code...any help is appreciated!
You can do curve-fitting with multi-dimensional data. As far as I am aware, none of the low-level algorithms explicitly support multidimensional data, but they do minimize a one-dimensional array in the least-squares sense. And the fitting methods do not really care about the "independent variable(s)" x except in that they help you calculate the array to be minimized - perhaps to calculate a model function to match to y data.
That is to say: if you can write a function that would take the parameter values and calculate the matrix to be minimized, just flatten that 2-d (on n-d) array to one dimension. The fit will not mind.

extrapolating a 2D matrix to predict a future output

I have a 2D 2401*266 matrix K which corresponds to x values (t: stored in a 1*266 array) and y values(z: stored in a 1*2401 array).
I want to extrapolate the matrix K to predict some future values (corresponding to t(1,267:279). So far I have extended t so that it is now a 1*279 matrix using a for loop:
for tq = 267:279
t(1,tq) = t(1,tq-1)+0.0333333333;
end
However I am stumped on how to extrapolate K without fitting a polynomial to each individual row?
I feel like there must be a more efficient way than this??
There are countless of extrapolation methods in the literature, "fitting a polynomial to each row" would be just one of them, not necessarily invalid, not sure why you mention that you do no wan't to do it. For 2D data perhaps fitting a surface would lead to better results though.
However, if you want an easy, simple way (that might or might not work with your problem), you can always use the function interp2, for interpolation. If you chose spline or makima as interpolation functions, it will also extrapolate for any query point outside the domain of K.

MATLAB: Efficient (vectorized) way to apply function on two matrices?

I have two matrices X and Y, both of order mxn. I want to create a new matrix O of order mxm such that each i,j th entry in this new matrix is computed by applying a function to ith and jth row of X and Y respectively. In my case m = 10000 and n = 500. I tried using a loop but it takes forever. Is there an efficient way to do it?
I am targeting two functions dot product -- dot(row_i, row_j) and exp(-1*norm(row_i-row_j)). But I was wondering if there is a general way so that I can plugin any function.
Solution #1
For the first case, it looks like you can simply use matrix multiplication after transposing Y -
X*Y'
If you are dealing with complex numbers -
conj(X*ctranspose(Y))
Solution #2
For the second case, you need to do a little more work. You need to use bsxfun with permute to re-arrange dimensions and employ the raw form of norm calculations and finally squeeze to get a 2D array output -
squeeze(exp(-1*sqrt(sum(bsxfun(#minus,X,permute(Y,[3 2 1])).^2,2)))
If you would like to avoid squeeze, you can use two permute's -
exp(-1*sqrt(sum(bsxfun(#minus,permute(X,[1 3 2]),permute(Y,[3 1 2])).^2,3)))
I would also advise you to look into this problem - Efficiently compute pairwise squared Euclidean distance in Matlab.
In conclusion, there isn't a common most efficient way that could be employed for every function to ith and jth row of X. If you are still hell bent on that, you can use anonymous function handles with bsxfun, but I am afraid it won't be the most efficient technique.
For the second part, you could also use pdist2:
result = exp(-pdist2(X,Y));

Find the best linear combination of two vectors resembling a third vector; implementing constraints

I have a vector z that I want to approximate by a linear combination of two other vectors (x,y) such that the residual of a*x+b*y and z is minimized. Also I want to keep one coefficient (a) positive for the fitting.
Any suggestions which command may help?
Thanks!
If you didn't have a bound on one of the coefficients, your problem could have been viewed as multiple regression (solved in matlab by regress). Since one of the coefficients is bounded, you should use lsqlin. This function solves least squares problems with bounds or inequalities on the coefficients. Don't forget to include an all-ones intercept predictor if your signals are not centered.
I think that fminsearch would be an overshoot in this case, since lsqlin does exactly what you want.
You have to define a function, which desribes the cost. The lower the cost is, the better the solution. The output must be a single scalar, e.g. the norm of the difference.
To avoid negative values for x, add something like (x<0)*inf. This rejects every solution with a negative x.
If done so, use fminsearch for a numeric solution.

MATLAB dct2/idct2 vs. dctmtx

There are two alternative methods to compute DCT and its inverse in MATLAB. One is dct2/idct2 and the other is the transformation matrix computed by dctmtx. Why is there an alternative way based on matrix multiplications making use of dctmtx?
"If A is square, the two-dimensional DCT of A can be computed as D*A*D'. This computation is sometimes faster than using dct2, especially if you are computing a large number of small DCTs, because D needs to be determined only once."
Where D = dctmtx(n)
Source: http://www.mathworks.com/help/toolbox/images/ref/dctmtx.html