Let A be of size [n,m], i.e. it has n rows and m columns. Given I of size [n,1] with max(I)<=m, what is the fastest way to return B of size [n,1], such that B(i)=A(i,I(i))?
Example:
A =
8 1 6
3 5 7
4 9 2
and
I =
1
2
2
I want B to look like
B =
8
5
9
There obviously exist several ways to implement this, but in my case n is in the order of 1e6 and m in the order of 1e2, which is why I'm interested in the fastest implementation. I would like to avoid ind2sub or sub2ind since they both appear to be too slow as well. Any idea is greatly appreciated! Thanks!
You can replicate the behavior of sub2ind yourself. This gives me a speedup in my test:
clear
%% small example
A = rand(4,6)
I = [3 2 2 1]
inds = (I-1)*size(A,1) + (1:length(I));
B = A(inds)
%% timing
n = 1e4;
m = 1e2;
A = rand(n, m);
I = ceil(rand(1,n) * m);
% sub2ind
F = #() A(sub2ind(size(A), 1:size(A,1), I));
timeit(F)
% manual
F = #() A((I-1)*size(A,1) + (1:length(I)));
timeit(F)
You can also use something like this:
A(meshgrid(1:size(A,2),1:size(A,1)) == repmat(I,1,size(A,2)))
which will give you the same result, with no loop and no sub2ind.
Related
How can I remove any number that has duplicate from an array.
for example:
b =[ 1 1 2 3 3 5 6]
becomes
b =[ 2 5 6]
Use unique function to extract unique values then compute histogram of data for unique values and preserve those that have counts of 1.
a =[ 1 1 2 3 3 5 6];
u = unique(a)
idx = hist(a, u) ==1;
b = u(idx)
result
2 5 6
for multi column input this can be done:
a = [1 2; 1 2;1 3;2 1; 1 3; 3 5 ; 3 6; 5 9; 6 10] ;
[u ,~, uid] = unique(a,'rows');
idx = hist(uid,1:size(u,1))==1;
b= u(idx,:)
You can first sort your elements and afterwards remove all elements which have the same value as one of its neighbors as follows:
A_sorted = sort(A); % sort elements
A_diff = diff(A_sorted)~=0; % check if element is the different from the next one
A_unique = [A_diff true] & [true A_diff]; % check if element is different from previous and next one
A = A_sorted(A_unique); % obtain the unique elements.
Benchmark
I will benchmark my solution with the other provided solutions, i.e.:
using diff (my solution)
using hist (rahnema1)
using sum (Jean Logeart)
using unique (my alternative solution)
I will use two cases:
small problem (yours): A = [1 1 2 3 3 5 6];
larger problem
rng('default');
A= round(rand(1, 1000) * 300);
Result:
Small Large Comments
----------------|------------|------------%----------------
using `diff` | 6.4080e-06 | 6.2228e-05 % Fastest method for large problems
using `unique` | 6.1228e-05 | 2.1923e-04 % Good performance
using `sum` | 5.4352e-06 | 0.0020 % Only fast for small problems, preserves the original order
using `hist` | 8.4408e-05 | 1.5691e-04 % Good performance
My solution (using diff) is the fastest method for somewhat larger problems. The solution of Jean Logeart using sum is faster for small problems, but the slowest method for larger problems, while mine is almost equally fast for the small problem.
Conclusion: In general, my proposed solution using diff is the fastest method.
timeit(#() usingDiff(A))
timeit(#() usingUnique(A))
timeit(#() usingSum(A))
timeit(#() usingHist(A))
function A = usingDiff (A)
A_sorted = sort(A);
A_unique = [diff(A_sorted)~=0 true] & [true diff(A_sorted)~=0];
A = A_sorted(A_unique);
end
function A = usingUnique (A)
[~, ia1] = unique(A, 'first');
[~, ia2] = unique(A, 'last');
A = A(ia1(ia1 == ia2));
end
function A = usingSum (A)
A = A(sum(A==A') == 1);
end
function A = usingHist (A)
u = unique(A);
A = u(hist(A, u) ==1);
end
Let's say we have two matrices
A = [1,2,3;
2,4,5;
8,3,5]
B= [2,3;
4,5;
8,5]
How do I perform sediff for each row in A and B respectively without using loops or cellfun, in other words performing setdiff(A(i,:),B(i,:)) for all i. For this example I want to get
[1;
2;
3]
I am trying to do this for two very big matrices for my fluid simulator, thus I can't compromise on performance.
UPDATE:
you can assume that the second dimension (number of columns) of the answer will be fixed e.g. the answer will always be some n by m matrix and not some ragged array of different column sizes.
Another Example:
In my case A and B are m by 3 and m by 2 respectively and the answer should be m by 1. A solution for this case will suffice, but a general solution for matrices of size m by n1, m by n2 with answer of m by n3 will be very interesting. another example is
A = [1,2,3,4,5;
8,4,7,9,6]
B = [2,3;
4,9]
And the answer is
C = [1,4,5;
8,7,6]
Approach #1 Using bsxfun -
mask = all(bsxfun(#ne,A,permute(B,[1 3 2])),3);
At = A.'; %//'
out = reshape(At(mask.'),[],size(A,1)).'
Sample run -
>> A
A =
1 2 3 4 5
8 4 7 9 6
>> B
B =
2 3
4 9
>> mask = all(bsxfun(#ne,A,permute(B,[1 3 2])),3);
>> At = A.'; %//'
>> out = reshape(At(mask.'),[],size(A,1)).'
out =
1 4 5
8 7 6
Approach #2 Using diff and sort -
sAB = sort([A B],2)
dsAB = diff(sAB,[],2)~=0
mask1 = [true(size(A,1),1) dsAB]
mask2 = [dsAB true(size(A,1),1)]
mask = mask1 & mask2
sABt = sAB.'
out = reshape(sABt(mask.'),[],size(A,1)).'
I have a vector a=[1 2 3 1 4 2 5]'
I am trying to create a new vector that would give for each row, the occurence number of the element in a. For instance, with this matrix, the result would be [1 1 1 2 1 2 1]': The fourth element is 2 because this is the first time that 1 is repeated.
The only way I can see to achieve that is by creating a zero vector whose number of rows would be the number of unique elements (here: c = [0 0 0 0 0] because I have 5 elements).
I also create a zero vector d of the same length as a. Then, going through the vector a, adding one to the row of c whose element we read and the corresponding number of c to the current row of d.
Can anyone think about something better?
This is a nice way of doing it
C=sum(triu(bsxfun(#eq,a,a.')))
My first suggestion was this, a not very nice for loop
for i=1:length(a)
F(i)=sum(a(1:i)==a(i));
end
This does what you want, without loops:
m = max(a);
aux = cumsum([ ones(1,m); bsxfun(#eq, a(:), 1:m) ]);
aux = (aux-1).*diff([ ones(1,m); aux ]);
result = sum(aux(2:end,:).');
My first thought:
M = cumsum(bsxfun(#eq,a,1:numel(a)));
v = M(sub2ind(size(M),1:numel(a),a'))
on a completely different level, you can look into tabulate to get info about the frequency of the values. For example:
tabulate([1 2 4 4 3 4])
Value Count Percent
1 1 16.67%
2 1 16.67%
3 1 16.67%
4 3 50.00%
Please note that the solutions proposed by David, chappjc and Luis Mendo are beautiful but cannot be used if the vector is big. In this case a couple of naïve approaches are:
% Big vector
a = randi(1e4, [1e5, 1]);
a1 = a;
a2 = a;
% Super-naive solution
tic
x = sort(a);
x = x([find(diff(x)); end]);
for hh = 1:size(x, 1)
inds = (a == x(hh));
a1(inds) = 1:sum(inds);
end
toc
% Other naive solution
tic
x = sort(a);
y(:, 1) = x([find(diff(x)); end]);
y(:, 2) = histc(x, y(:, 1));
for hh = 1:size(y, 1)
a2(a == y(hh, 1)) = 1:y(hh, 2);
end
toc
% The two solutions are of course equivalent:
all(a1(:) == a2(:))
Actually, now the question is: can we avoid the last loop? Maybe using arrayfun?
In matlab, how could you create a matrix M using its indices to populate the values? For example, say I want to create a 3x3 matrix M such that
M(i,j) = i+j --> [ 2 3 4; 3 4 5; 4 5 6]
I tried making vectors: x = 1:3', y = 1:3 and then
M = x(:) + y(:)
but it didn't work as expected.
Any thoughts on how this can be done?
Thanks!
UPDATE
The M I actually desire is:
M(i,j) = -2^(-i - j).
One way would be
x = 1:3;
z = ones(1,3);
N = z'*x + x'*z
M = -2 .^ -(z'*x + x'*z)
% Or simply
% M = -2 .^ -N
Output:
N =
2 3 4
3 4 5
4 5 6
M =
-0.250000 -0.125000 -0.062500
-0.125000 -0.062500 -0.031250
-0.062500 -0.031250 -0.015625
You should use bsxfun to find the sum:
M=bsxfun(#plus, (1:3).', 1:3)
and for the second formula:
M=-2.^(-bsxfun(#plus, (1:3).', 1:3))
bsxfun(#(x,y)(-2.^(-x-y)), (1:3).', 1:3)
This uses the Answer of Mohsen Nosratinia with the function you wished.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to Add a row vector to a column vector like matrix multiplication
I have a nx1 vector and a 1xn vector. I want to add them in a special manner like matrix multiplication in an efficient manner (vectorized):
Example:
A=[1 2 3]'
B=[4 5 6]
A \odd_add B =
[1+4 1+5 1+6
2+4 2+5 2+6
3+4 3+5 3+6
]
I have used bsxfun in MATLAB, but I think it is slow. Please help me...
As mentioned by #b3. this would be an appropriate place to use repmat. However in general, and especially if you are dealing with very large matrices, bsxfun normally makes a better substitute. In this case:
>> bsxfun(#plus, [1,2,3]', [4,5,6])
returns the same result, using about a third the memory in the large-matrix limit.
bsxfun basically applies the function in the first argument to every combination of items in the second and third arguments, placing the results in a matrix according to the shape of the input vectors.
I present a comparison of the different methods mentioned here. I am using the TIMEIT function to get robust estimates (takes care of warming up the code, average timing on multiple runs, ..):
function testBSXFUN(N)
%# data
if nargin < 1
N = 500; %# N = 10, 100, 1000, 10000
end
A = (1:N)';
B = (1:N);
%# functions
f1 = #() funcRepmat(A,B);
f2 = #() funcTonyTrick(A,B);
f3 = #() funcBsxfun(A,B);
%# timeit
t(1) = timeit( f1 );
t(2) = timeit( f2 );
t(3) = timeit( f3 );
%# time results
fprintf('N = %d\n', N);
fprintf('REPMAT: %f, TONY_TRICK: %f, BSXFUN: %f\n', t);
%# validation
v{1} = f1();
v{2} = f2();
v{3} = f3();
assert( isequal(v{:}) )
end
where
function C = funcRepmat(A,B)
N = numel(A);
C = repmat(A,1,N) + repmat(B,N,1);
end
function C = funcTonyTrick(A,B)
N = numel(A);
C = A(:,ones(N,1)) + B(ones(N,1),:);
end
function C = funcBsxfun(A,B)
C = bsxfun(#plus, A, B);
end
The timings:
>> for N=[10 100 1000 5000], testBSXFUN(N); end
N = 10
REPMAT: 0.000065, TONY_TRICK: 0.000013, BSXFUN: 0.000031
N = 100
REPMAT: 0.000120, TONY_TRICK: 0.000065, BSXFUN: 0.000085
N = 1000
REPMAT: 0.032988, TONY_TRICK: 0.032947, BSXFUN: 0.010185
N = 5000
REPMAT: 0.810218, TONY_TRICK: 0.824297, BSXFUN: 0.258774
BSXFUN is a clear winner.
In matlab vectorization, there is no substitute for Tony's Trick in terms of speed in comparison to repmat or any other built in Matlab function for that matter. I am sure that the following code must be fastest for your purpose.
>> A = [1 2 3]';
>> B = [4 5 6];
>> AB_sum = A(:,ones(3,1)) + B(ones(3,1),:);
The speed differential will be much more apparent (at LEAST an order of magnitude) for larger size of A and B. See this test I conducted some time ago to ascertain the superiority of Tony's Trick over repmatin terms of time consumption.
REPMAT is your friend:
>> A = [1 2 3]';
>> B = [4 5 6];
>> AplusB = repmat(A, 1, 3) + repmat(B, 3, 1)
AplusB =
5 6 7
6 7 8
7 8 9