MATLAB short way to find closest vector? - matlab

In my application, I need to find the "closest" (minimum Euclidean distance vector) to an input vector, among a set of vectors (i.e. a matrix)
Therefore every single time I have to do this :
function [match_col] = find_closest_column(input_vector, vectors)
cmin = 99999999999; % current minimum distance
match_col = -1;
for col=1:width
candidate_vector = vectors(:,c); % structure of the input is not important
dist = norm(input_vector - candidate_vector);
if dist < cmin
cmin = dist;
match_col = col;
end
end
is there a built-in MATLAB function that does this kind of thing easily (with a short amount of code) for me ?
Thanks for any help !

Use pdist2. Assuming (from your code) that your vectors are columns, transposition is needed because pdist2 works with rows:
[cmin, match_col] = min(pdist2(vectors.', input_vector.' ,'euclidean'));
It can also be done with bsxfun (in this case it's easier to work directly with columns):
[cmin, match_col] = min(sum(bsxfun(#minus, vectors, input_vector).^2));
cmin = sqrt(cmin); %// to save operations, apply sqrt only to the minimizer

norm can't be directly applied to every column or row of a matrix, so you can use arrayfun:
dist = arrayfun(#(col) norm(input_vector - candidate_vector(:,col)), 1:width);
[cmin, match_col] = min(dist);
This solution was also given here.
HOWEVER, this solution is much much slower than doing a direct computation using bsxfun (as in Luis Mendo's answer), so it should be avoided. arrayfun should be used for more complex functions, where a vectorized approach is harder to get at.

Related

Normalization 3D Image according to Slices in MATLAB

I have a matrix which is 256X192X80. I want to normalize all slices (80 represents the slices) without using for loop.
The way I'm doing with for is below: (im_dom_raw is our matrix)
normalized_raw = zeros(size(im_dom_raw));
for a=1:80
slice_raw = im_dom_raw(:,:,a);
slice_raw = slice_raw-min(slice_raw(:));
slice_raw = slice_raw/(max(slice_raw(:)));
normalized_raw(:,:,a) = slice_raw;
end
The code below implements your normalization approach without using loops. Its based on bsxfun.
% Shift all values to the positive side
slices_raw = bsxfun(#minus,im_dom_raw,min(min(im_dom_raw)));
% Normalize all values with respect to the slice maximum (With input from #Daniel)
normalized_raw2 = bsxfun(#mrdivide,slices_raw,max(max(slices_raw)));
% A slightly faster approach would be
%normalized_raw2 = bsxfun(#times,slices_raw,max(max(slices_raw)).^-1);
% ... but it will differ with your approach due to numerical approximation
% Comparison to your previous loop based implementation
sum(abs(normalized_raw(:)-normalized_raw2(:)))
The last line of code outputs
ans =
0
Which (thanks to #Daniel) means that both approaches yield exact same results.

Vectorizing the solution of a linear equation system in MATLAB

Summary: This question deals with the improvement of an algorithm for the computation of linear regression.
I have a 3D (dlMAT) array representing monochrome photographs of the same scene taken at different exposure times (the vector IT) . Mathematically, every vector along the 3rd dimension of dlMAT represents a separate linear regression problem that needs to be solved. The equation whose coefficients need to be estimated is of the form:
DL = R*IT^P, where DL and IT are obtained experimentally and R and P must be estimated.
The above equation can be transformed into a simple linear model after applying a logarithm:
log(DL) = log(R) + P*log(IT) => y = a + b*x
Presented below is the most "naive" way to solve this system of equations, which essentially involves iterating over all "3rd dimension vectors" and fitting a polynomial of order 1 to (IT,DL(ind1,ind2,:):
%// Define some nominal values:
R = 0.3;
IT = 600:600:3000;
P = 0.97;
%// Impose some believable spatial variations:
pMAT = 0.01*randn(3)+P;
rMAT = 0.1*randn(3)+R;
%// Generate "fake" observation data:
dlMAT = bsxfun(#times,rMAT,bsxfun(#power,permute(IT,[3,1,2]),pMAT));
%// Regression:
sol = cell(size(rMAT)); %// preallocation
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = polyfit(log(IT(:)),log(squeeze(dlMAT(ind1,ind2,:))),1);
end
end
fittedP = cellfun(#(x)x(1),sol); %// Estimate of pMAT
fittedR = cellfun(#(x)exp(x(2)),sol); %// Estimate of rMAT
The above approach seems like a good candidate for vectorization, since it does not utilize MATLAB's main strength that is MATrix operations. For this reason, it does not scale very well and takes much longer to execute than I think it should.
There exist alternative ways to perform this computation based on matrix division, as demonstrated here and here, which involve something like this:
sol = [ones(size(x)),log(x)]\log(y);
That is, appending a vector of 1s to the observations, followed by mldivide to solve the equation system.
The main challenge I'm facing is how to adapt my data to the algorithm (or vice versa).
Question #1: How can the matrix-division-based solution be extended to solve the problem presented above (and potentially replace the loops I am using)?
Question #2 (bonus): What is the principle behind this matrix-division-based solution?
The secret ingredient behind the solution that includes matrix division is the Vandermonde matrix. The question discusses a linear problem (linear regression), and those can always be formulated as a matrix problem, which \ (mldivide) can solve in a mean-square error senseā€”. Such an algorithm, solving a similar problem, is demonstrated and explained in this answer.
Below is benchmarking code that compares the original solution with two alternatives suggested in chat1, 2 :
function regressionBenchmark(numEl)
clc
if nargin<1, numEl=10; end
%// Define some nominal values:
R = 5;
IT = 600:600:3000;
P = 0.97;
%// Impose some believable spatial variations:
pMAT = 0.01*randn(numEl)+P;
rMAT = 0.1*randn(numEl)+R;
%// Generate "fake" measurement data using the relation "DL = R*IT.^P"
dlMAT = bsxfun(#times,rMAT,bsxfun(#power,permute(IT,[3,1,2]),pMAT));
%% // Method1: loops + polyval
disp('-------------------------------Method 1: loops + polyval')
tic; [fR,fP] = method1(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
%% // Method2: loops + Vandermonde
disp('-------------------------------Method 2: loops + Vandermonde')
tic; [fR,fP] = method2(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
%% // Method3: vectorized Vandermonde
disp('-------------------------------Method 3: vectorized Vandermonde')
tic; [fR,fP] = method3(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
function [fittedR,fittedP] = method1(IT,dlMAT)
sol = cell(size(dlMAT,1),size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = polyfit(log(IT(:)),log(squeeze(dlMAT(ind1,ind2,:))),1);
end
end
fittedR = cellfun(#(x)exp(x(2)),sol);
fittedP = cellfun(#(x)x(1),sol);
function [fittedR,fittedP] = method2(IT,dlMAT)
sol = cell(size(dlMAT,1),size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
for ind2 = 1:size(dlMAT,2)
sol{ind1,ind2} = flipud([ones(numel(IT),1) log(IT(:))]\log(squeeze(dlMAT(ind1,ind2,:)))).'; %'
end
end
fittedR = cellfun(#(x)exp(x(2)),sol);
fittedP = cellfun(#(x)x(1),sol);
function [fittedR,fittedP] = method3(IT,dlMAT)
N = 1; %// Degree of polynomial
VM = bsxfun(#power, log(IT(:)), 0:N); %// Vandermonde matrix
result = fliplr((VM\log(reshape(dlMAT,[],size(dlMAT,3)).')).');
%// Compressed version:
%// result = fliplr(([ones(numel(IT),1) log(IT(:))]\log(reshape(dlMAT,[],size(dlMAT,3)).')).');
fittedR = exp(real(reshape(result(:,2),size(dlMAT,1),size(dlMAT,2))));
fittedP = real(reshape(result(:,1),size(dlMAT,1),size(dlMAT,2)));
The reason why method 2 can be vectorized into method 3 is essentially that matrix multiplication can be separated by the columns of the second matrix. If A*B produces matrix X, then by definition A*B(:,n) gives X(:,n) for any n. Moving A to the right-hand side with mldivide, this means that the divisions A\X(:,n) can be done in one go for all n with A\X. The same holds for an overdetermined system (linear regression problem), in which there is no exact solution in general, and mldivide finds the matrix that minimizes the mean-square error. In this case too, the operations A\X(:,n) (method 2) can be done in one go for all n with A\X (method 3).
The implications of improving the algorithm when increasing the size of dlMAT can be seen below:
For the case of 500*500 (or 2.5E5) elements, the speedup from Method 1 to Method 3 is about x3500!
It is also interesting to observe the output of profile (here, for the case of 500*500):
Method 1
Method 2
Method 3
From the above it is seen that rearranging the elements via squeeze and flipud takes up about half (!) of the runtime of Method 2. It is also seen that some time is lost on the conversion of the solution from cells to matrices.
Since the 3rd solution avoids all of these pitfalls, as well as the loops altogether (which mostly means re-evaluation of the script on every iteration) - it unsurprisingly results in a considerable speedup.
Notes:
There was very little difference between the "compressed" and the "explicit" versions of Method 3 in favor of the "explicit" version. For this reason it was not included in the comparison.
A solution was attempted where the inputs to Method 3 were gpuArray-ed. This did not provide improved performance (and even somewhat degradaed them), possibly due to wrong implementation, or the overhead associated with copying matrices back and forth between RAM and VRAM.

Vectorize Evaluations of Meshgrid Points in Matlab

I need the "for" loop in the following representative section of code to run as efficiently as possible. The mean function in the code is acting as a representative placeholder for my own function.
x = linspace(-1,1,15);
y = linspace(2,4,15);
[xgrid, ygrid] = meshgrid(x,y);
mc = rand(100000,1);
z=zeros(size(xgrid));
for i=1:length(xgrid)
for j=1:length(ygrid)
z(i,j) = mean(xgrid(i,j) + ygrid(i,j) + xgrid(i,j)*ygrid(i,j)*mc);
end
end
I have vectorized the code and improved its speed by about 2.5 times by building a matrix in which mc is replicated for each grid point. My implementation results in a very large matrix (3 x 22500000) filled with repeated data. I've mitigated the memory penalty of this approach by converting the matrix to single precision, but it seems like there should be a more efficient way to do what I want that avoids replicating so much data.
You could use bsxfun with few reshapes -
A = bsxfun(#times,y,x.'); %//'
B = bsxfun(#plus,y,x.'); %//'
C = mean(bsxfun(#plus,bsxfun(#times,mc,reshape(A,1,[])) , reshape(B,1,[])),1);
z_out = reshape(C,numel(x),[]).';

Fastest way to add multiple sparse matrices in a loop in MATLAB

I have a code that repeatedly calculates a sparse matrix in a loop (it performs this calculation 13472 times to be precise). Each of these sparse matrices is unique.
After each execution, it adds the newly calculated sparse matrix to what was originally a sparse zero matrix.
When all 13742 matrices have been added, the code exits the loop and the program terminates.
The code bottleneck occurs in adding the sparse matrices. I have made a dummy version of the code that exhibits the same behavior as my real code. It consists of a MATLAB function and a script given below.
(1) Function that generates the sparse matrix:
function out = test_evaluate_stiffness(n)
ind = randi([1 n*n],300,1);
val = rand(300,1);
[I,J] = ind2sub([n,n],ind);
out = sparse(I,J,val,n,n);
end
(2) Main script (program)
% Calculate the stiffness matrix
n=1000;
K=sparse([],[],[],n,n,n^2);
tic
for i=1:13472
temp=rand(1)*test_evaluate_stiffness(n);
K=K+temp;
end
fprintf('Stiffness Calculation Complete\nTime taken = %f s\n',toc)
I'm not very familiar with sparse matrix operations so I may be missing a critical point here that may allow my code to be sped up considerably.
Am I handling the updating of my stiffness matrix in a reasonable way in my code? Is there another way that I should be using sparse that will result in a faster solution?
A profiler report is also provided below:
If you only need the sum of those matrices, instead of building all of them individually and then summing them, simply concatenate the vectors I,J and vals and call sparse only once. If there are duplicate rows [i,j] in [I,J] the corresponding values S(i,j) will be summed automatically, so the code is absolutely equivalent. As calling sparse involves an internal call to a sorting algorithm, you save 13742-1 intermediate sorts and can get away with only one.
This involves changing the signature of test_evaluate_stiffness to output [I,J,val]:
function [I,J,val] = test_evaluate_stiffness(n)
and removing the line out = sparse(I,J,val,n,n);.
You will then change your other function to:
n = 1000;
[I,J,V] = deal([]);
tic;
for i = 1:13472
[I_i, J_i, V_i] = test_evaluate_stiffness(n);
nE = numel(I_i);
I(end+(1:nE)) = I_i;
J(end+(1:nE)) = J_i;
V(end+(1:nE)) = rand(1)*V_i;
end
K = sparse(I,J,V,n,n);
fprintf('Stiffness Calculation Complete\nTime taken = %f s\n',toc);
If you know the lengths of the output of test_evaluate_stiffness ahead of time, you can possibly save some time by preallocating the arrays I,J and V with appropriately-sized zeros matrices and set them using something like:
I((i-1)*nE + (1:nE)) = ...
J((i-1)*nE + (1:nE)) = ...
V((i-1)*nE + (1:nE)) = ...
The biggest remaining computation, taking 11s, is the sparse operation
on the final I,J,V vectors so I think we've taken it down to the bare
bones.
Nearly... but one final trick: if you can create the vectors so that J is sorted ascending then you will greatly improve the speed of the sparse call, about a factor 4 in my experience.
(If it's easier to have I sorted, then create the transpose matrix sparse(J,I,V) and un-transpose it afterwards.)

matlab - optimize getting the angle between each vector with all others in a large array

I am trying to get the angle between every vector in a large array (1896378x4 -EDIT: this means I need 1.7981e+12 angles... TOO LARGE, but if there's room to optimize the code, let me know anyways). It's too slow - I haven't seen it finish yet. Here's the steps towards optimizing I've taken:
First, logically what I (think I) want (just use Bt=rand(N,4) for testing):
[ro,col]=size(Bt);
angbtwn = zeros(ro-1); %too long to compute!! total non-zero = ro*(ro-1)/2
count=1;
for ii=1:ro-1
for jj=ii+1:ro
angbtwn(count) = atan2(norm(cross(Bt(ii,1:3),Bt(jj,1:3))), dot(Bt(ii,1:3),Bt(jj,1:3))).*180/pi;
count=count+1;
end
end
So, I though I'd try and vectorize it, and get rid of the non-built-in functions:
[ro,col]=size(Bt);
% angbtwn = zeros(ro-1); %TOO LONG!
for ii=1:ro-1
allAxes=Bt(ii:ro,1:3);
repeachAxis = allAxes(ones(ro-ii+1,1),1:3);
c = [repeachAxis(:,2).*allAxes(:,3)-repeachAxis(:,3).*allAxes(:,2)
repeachAxis(:,3).*allAxes(:,1)-repeachAxis(:,1).*allAxes(:,3)
repeachAxis(:,1).*allAxes(:,2)-repeachAxis(:,2).*allAxes(:,1)];
crossedAxis = reshape(c,size(repeachAxis));
normedAxis = sqrt(sum(crossedAxis.^2,2));
dottedAxis = sum(repeachAxis.*allAxes,2);
angbtwn(1:ro-ii+1,ii) = atan2(normedAxis,dottedAxis)*180/pi;
end
angbtwn(1,:)=[]; %angle btwn vec and itself
%only upper left triangle are values...
Still too long, even to pre-allocate... So I try to do sparse, but not implemented right:
[ro,col]=size(Bt);
%spalloc:
angbtwn = sparse([],[],[],ro,ro,ro*(ro-1)/2);%zeros(ro-1); %cell(ro,1)
for ii=1:ro-1
...same
angbtwn(1:ro-ii+1,ii) = atan2(normedAxis,dottedAxis)*180/pi; %WARNED: indexing = >overhead
% WHAT? Can't index sparse?? what's the point of spalloc then?
end
So if my logic can be improved, or if sparse is really the way to go, and I just can't implement it right, let me know where to improve. THANKS for your help.
Are you trying to get the angle between every pair of vectors in Bt? If Bt has 2 million vectors that's a trillion pairs each (apparently) requiring an inner product to get the angle between. I don't know that any kind of optimization is going to help have this operation finish in a reasonable amount of time in MATLAB on a single machine.
In any case, you can turn this problem into a matrix multiplication between matrices of unit vectors:
N=1000;
Bt=rand(N,4); % for testing. A matrix of N (row) vectors of length 4.
[ro,col]=size(Bt);
magnitude = zeros(N,1); % the magnitude of each row vector.
units = zeros(size(Bt)); % the unit vectors
% Compute the unit vectors for the row vectors
for ii=1:ro
magnitude(ii) = norm(Bt(ii,:));
units(ii,:) = Bt(ii,:) / magnitude(ii);
end
angbtwn = acos(units * units') * 360 / (2*pi);
But you'll run out of memory during the matrix multiplication for largish N.
You might want to use pdist with 'cosine' distance to compute the 1-cos(angbtwn).
Another perk for this approach that it does not compute n^2 values but exaxtly .5*(n-1)*n unique values :)