calculate maximum distance with large vectors

calculate maximum distance with large vectors - matlab

ok so I have 2 vectors (A and B) with different lengths (in this example lets say 100000 and 300000) and I wish to obtain the indexes that have the largest difference.
Something like this
distAB=bsxfun(#(v1,v2) abs(v1-v2),A,B));
[~,lin_indx]=max(distAB(:));
[x_indx,y_indx]=ind2sub(size(A),lin_indx)
The problem here is that my vectors A and B are too large and producing the matrix "distAB" is too expensive. I would wish to obtain the min directly with the bsxfun.

If you want to maximize distance, the search can be reduced to just two candidate pairs: max(A), min(B)) or min(A), max(B)).
So just try those two pairs:
[ma_val, ma_ind] = min(A);
[Ma_val, Ma_ind] = max(A);
[mb_val, mb_ind] = min(B);
[Mb_val, Mb_ind] = max(B);
diff1 = abs(Mb_val-ma_val);
diff2 = abs(Ma_val-mb_val);
if diff1 > diff2
result_ind_A = ma_ind;
result_ind_B = Mb_ind;
result_value = diff1;
else
result_ind_A = Ma_ind;
result_ind_B = mb_ind;
result_value = diff2;
end
If you want to minimize distance: sort the concatenation of A and B, keeping track of which element is from A and which from B:
C = sortrows([A(:) zeros(numel(A),1); B(:) ones(numel(B),1)] ,1);
%// C(k,2)==0 indicates element k comes from A; 1 indicates from B
Now, use a for loop to traverse all elements in C(:,1) that come from B. For each such element, find the two elements from A that are located closest above and to below left in C. Those are the only candidates from A to be nearest to that element from A.
So for each element from B you have two candidates from A, which reduces the complexity of the problem significantly.

You can calculate distAB this way with the built-in #minus which could be more efficient -
distAB = abs(bsxfun(#minus,A(:).',B(:)))

Luis' approach of sorting will probably be the fastest. But if you have the statistics toolbox installed, you could use the function knnsearch, which makes for a simple yet efficient solution.
(There are also some similar free versions of this on the File Exchange. Look for: kd tree nearest neighbor)
One extra benefit of this solution is that it also works for 2D, 3D, ..., nD data.
[Is, D] = knnsearch(A,B,'K',1);
[~,j] = min(D); i = Is(j);
[A(i), B(j)]

Related

Fastest way of finding the only index of vector b where array A(i,j) == b

I have 2 big arrays A and b:
A: 10.000++ rows, 4 columns, not unique integers
b: vector with 500.000++ elements, unique integers
Due to the uniqueness of the values of b, I need to find the only index of b, where A(i,j) == b.
What I started with is
[rows,columns] = size(A);
B = zeros(rows,columns);
for i = 1 : rows
for j = 1 : columns
B(i,j) = find(A(i,j)==b,1);
end
end
This takes approx 5.5 seconds to compute, which is way to long, since A and b can be significantly bigger... That in mind I tried to speed up the code by using logical indexing and reducing the for-loops
[rows,columns] = size(A);
B = zeros(rows,columns);
for idx = 1 : numel(b)
B(A==b(idx)) = idx;
end
Sadly this takes even longer: 21 seconds
I even tried to do use bsxfun
for i = 1 : columns
[I,J] = find(bsxfun(#eq,A(:,i),b))
... stitch B together ...
end
but with a bigger arrays the maximum array size is quickly exceeded (102,9GB...).
Can you help me find a faster solution to this? Thanks in advance!
EDIT: I extended find(A(i,j)==b,1), which speeds up the algorithm by factor 2! Thank you, but overall still too slow... ;)

The function ismember is the right tool for this:
[~,B] = ismember(A,b);
Test code:
function so
A = rand(1000,4);
b = unique([A(:);rand(2000,1)]);
B1 = op1(A,b);
B2 = op2(A,b);
isequal(B1,B2)
tic;op1(A,b);op1(A,b);op1(A,b);op1(A,b);toc
tic;op2(A,b);op2(A,b);op2(A,b);op2(A,b);toc
end
function B = op1(A,b)
B = zeros(size(A));
for i = 1:numel(A)
B(i) = find(A(i)==b,1);
end
end
function B = op2(A,b)
[~,B] = ismember(A,b);
end
I ran this on Octave, which is not as fast with loops as MATLAB. It also doesn't have the timeit function, hence the crappy timing using tic/toc (sorry for that). In Octave, op2 is more than 100 times faster than op1. Timings will be different in MATLAB, but ismember should still be the fastest option. (Note I also replaced your double loop with a single loop, this is the same but simpler and probably faster.)
If you want to repeatedly do the search in b, it is worthwhile to sort b first, and implement your own binary search. This will avoid the checks and sorting that ismember does. See this other question.

Assuming that you have positive integers you can use array indexing:
mm = max(max(A(:)),max(b(:)));
idxs = sparse(b,1,1:numel(b),mm,1);
result = full(idxs(A));
If the range of values is small you can use dense matrix instead of sparse matrix:
mm = max(max(A(:)),max(b(:)));
idx = zeros(mm,1);
idx(b)=1:numel(b);
result = idx(A);

Multiply two vectors with dimensions increasing along time

I have two vectors (called A and B) with length N. Then I need to multiply both of them, but as an "integration" process. Which means I have to multiply first A(1)*B(1), then A(1:2)*B(1:2), until A(1:N)*B(1:N). The result of multiplying booth vector is a number, since B is a column vector. I've done it with a for loop:
for k = 1:N
C(k) = A(1:k) * B(1:k).';
end
But I wanted to ask you if this is the best solution or there is any other option more time-efficient, since N is very large (about 110,000)

C = cumsum(A.*B)
does the same thing without for loop. As EBH suggested in the comments if you are not sure whether A and B have same orientation, then use
C = cumsum(A(:).*B(:))

Average part of a multidimensional array based on another array (Matlab)

B = randn(1,25,10);
Z = [1;1;1;2;2;3;4;4;4;3];
Ok, so, I want to find the locations where Z=1(or any numbers that are equal to each other), then average across each of the 25 points at these specific locations. In the example you would end with a 1*25*4 array.
Is there an easy way to do this?
I'm not the most versed in Matlab.

First things first: break down the problem.
Define the groups (i.e. the set of unique Z values)
Find elements which belong to these groups
Take the average.
Once you have done that, you can begin to see it's a pretty standard for loop and "Select columns which meet criteria".
Something along the lines of:
B = randn(1,25,10);
Z = [1;1;1;2;2;3;4;4;4;3];
groups = unique(Z); %//find the set of groups
C = nan(1,25,length(groups)); %//predefine the output space for efficiency
for gi = 1:length(groups) %//for each group
idx = Z == groups(gi); %//find it's members
C(:,:,gi) = mean(B(:,:,idx), 3); %//select and mean across the third dimension
end

If B = randn(10,25); then it's very easy because Matlab function usually works down the rows.
Using logical indexing:
ind = Z == 1;
mean(B(ind,:));
If you're dealing with multiple dimensions use permute (and reshape if you actually have 3 dimensions or more) to get yourself to a point where you're averaging down the rows as above:
B = randn(1,25,10);
BB = permute(B, [3,2,1])
continue as above

MATLAB: I want to threshold a matrix, based on thresholds in a vector, without a for loop. Possible?

Let us say I have the following:
M = randn(10,20);
T = randn(1,20);
I would like to threshold each column of M, by each entry of T. For example, find all indicies of all elements of M(:,1) that are greater than T(1). Find all indicies of all elements in M(:,2) that are greater than T(2), etc etc.
Of course, I would like to do this without a for-loop. Is this possible?

You can use bsxfun like this:
I = bsxfun(#gt, M, T);
Then I will be a logcial matrix of size(M) with ones where M(:,i) > T(i).

You can use bsxfun to do things like this, but it may not be faster than a for loop (more below on this).
result = bsxfun(#gt,M,T)
This will do an element wise comparison and return you a logical matrix indicating the relationship governed by the first argument. I have posted code below to show the direct comparison, indicating that it does return what you are looking for.
%var declaration
M = randn(10,20);
T = randn(1,20);
% quick method
fastres = bsxfun(#gt,M,T);
% looping method
res = false(size(M));
for i = 1:length(T)
res(:,i) = M(:,i) > T(i);
end
% check to see if the two matrices are identical
isMatch = all(all(fastres == res))
This function is very powerful and can be used to help speed up processes, but keep in mind that it will only speed things up if there is a lot of data. There is a bit of background work that bsxfun must do, which can actually cause it to be slower.
I would only recommend using it if you have several thousand data points. Otherwise, the traditional for-loop will actually be faster. Try it out for yourself by changing the size of the M and T variables.

You can replicate the threshold vector and use matrix comparison:
s=size(M);
T2=repmat(T, s(1), 1);
M(M<T2)=0;
Indexes=find(M);

Calculating all two element sums of [a1 a2 a3 ... an] in Matlab

Context: I'm working on Project Euler Problem 23 using Matlab in order to practice my barely existing programming skills.
My Problem:
Now I have a vector with roughly 6500 numbers (ranging from 12 to 28122) as elements and want to calculate all the two element sums. That is I only need one instance of every sum, so having calculated a1 + an it's not necessary to calculate an + a1.
Edit for clarification: This includes the sums a1+a1, a2+a2,..., an+an.
The problem is that this is much too slow.
Problem specific constraints:
It's a given that sums 28123 or over aren't necessary to calculate, since those can't be used to solve the problem further.
My approach:
AbundentNumberSumsRaw=[];
for i=1:3490
AbundentNumberSumsRaw=[AbundentNumberSumRaw AbundentNumbers(i)+AbundentNumbers(i:end);
end
This works terribly :p
My Comments:
I'm pretty sure that incrementally increasing the vector AbundentNumbersRaw is bad coding, since that means memory usage will spike unnecessarily. I haven't done so, since a) I don't know what size vector to pre-allocate and b) I couldn't come up with a way to inject the sums into AbundentNumbersRaw in a orderly manner without using some ugly looking nested loops.
"for i=1:3490" is lower than the numbers of elements simply because I checked and saw that all the resulting sums for numbers whose index are above 3490 would be too large for me to use anyway.
I'm pretty sure my main issue is that the program need to do a lot of incremental increases of the vector AbundentNumbersRaw.
Any and all help and suggestions would be much appreciated :)
Cheers
Rasmus

Suppose
a = 28110*rand(6500,1)+12;
then
sums = [
a(1) + a(1:end)
a(2) + a(2:end)
...
];
is the calculation you're after.
You also state that sums whose value goes over 28123 should be discarded.
This can be generalized like so:
% Compute all 2-element sums without repetitions
C = arrayfun(#(x) a(x)+a(x:end), 1:numel(a), 'uniformoutput', false);
C = cat(1, C{:});
% discard sums exceeding threshold
C(C>28123) = [];
or using a loop
% Compute all 2-element sums without repetitions
E = cell(numel(a),1);
for ii = 1:numel(a)
E{ii} = a(ii)+a(ii:end); end
E = cat(1, E{:});
% discard sums exceeding threshold
E(E>28123) = [];
Simple testing shows that arrayfun is somewhat faster than the loop, so I'd go for the arrayfun option.

As your primary problem is to find out, which integers in a given set can be written as the sum of two integers of a different set, I'd choose a different approach:
AbundantNumbers = 1:6500; % replace with the list you generated somewhere else
maxInteger = 28122;
AbundantNumberSum(1:maxInteger) = true; % logical array
for i = 1:length(AbundantNumbers)
sumIndices = AbundantNumbers(i) + AbundantNumbers;
AbundantNumberSum(sumIndices(sumIndices <= maxInteger)) = false;
end
Unfortunantely, this is not an answer to your question but to your problem ;-) For the MatLab way to solve your original question, see the elegant answer of Rody Oldenhuis.

My approach would be the following:
v = 1:3490; % your vector here
s = length(v);
result = zeros(s); % preallocate memory
for m = 1:s
result(m,m:end) = v(m)+v(m:end);
end
You will get a matrix of 3490 x 3490 elements and more than half of them 0.