Best way to retrieve the number of elements in a vector in matlab - matlab

I would like to know what is the best way to retrieve the number of element in a vector in matlab in term of speed:
is it:
length(A)
or
size(A,1)

Neither. You want to always use numel for this purpose. length only returns the longest dimension (which can get confusing for 2D arrays) and size(data, dimension) requires you to know whether it's a row or column vector. numel will return the number of elements whether it is a row vector, column vector, or multi-dimensional array.
We can easily test the performance of these by writing a quick benchmark. We will take the size with the various methods N times (for this I used 10000).
function size_test
nRepeats = 10000;
times1 = zeros(nRepeats, 1);
times2 = zeros(nRepeats, 1);
times3 = zeros(nRepeats, 1);
for k = 1:nRepeats
data = rand(10000, 1);
tic
size(data, 1);
times1(k) = toc;
tic
length(data);
times2(k) = toc;
tic
numel(data);
times3(k) = toc;
end
% Compute the total time required for each method
fprintf('size:\t%0.8f\n', sum(times1));
fprintf('length:\t%0.8f\n', sum(times2));
fprintf('numel:\t%0.8f\n', sum(times3));
end
When run on my machine it yields:
size: 0.00860400
length: 0.00626700
numel: 0.00617300
So in addition to being the most robust, numel is also slightly faster than the other alternatives.
That being said, there are likely many other bottlenecks in your code than determining the number of elements in an array so I would focus on optimizing those.

Related

Efficient generation of permutation matrices in MATLAB

I'm trying to generate a 100-by-5 matrix where every line is a permutation of 1..100 (that is, every line is 5 random numbers from [1..100] without repetitions).
So far I've only been able to do it iteratively with a for-loop. Is there a way to do it more efficiently (using fewer lines of code), without loops?
N = 100;
T = zeros(N, 5);
for i = 1:N
T(i, :) = randperm(100, 5);
end
Let
N = 100; % desired number of rows
K = 5; % desired number of columns
M = 100; % size of population to sample from
Here's an approach that's probably fast; but memory-expensive, as it generates an intermediate M×N matrix and then discards N-K rows:
[~, result] = sort(rand(N, M), 2);
result = result(:, 1:K);
There is very little downside to using a loop here, at least in this minimal example. Indeed, it may well be the best-performing solution for MATLAB's execution engine. But perhaps you don't like assigning the temporary variable i or there are other advantages to vectorization in your non-minimal implementation. Consider this carefully before blindly implementing a solution.
You need to call randperm N times, but each call has no dependency on its position in the output. Without a loop index you will need something else to regulate the number of calls, but this can be just N empty cells cell(N,1). You can use this cell array to evaluate a function that calls randperm but ignores the contents (or, rather, lack of contents) of the cells, and then reassemble the function outputs into one matrix with cell2mat:
T = cell2mat(cellfun(#(~) {randperm(100,5)}, cell(N,1)));

Accelerating the index in huge sparse matrix during loop in MATLAB

I need to construct a huge sparse matrix in iterations. The code is as follow:
function Huge_Matrix = Create_Huge_Matrix(len, Weight, Index)
k = size(Weight,1);
Huge_Matrix = spalloc(len, len,floor(len*k));
parfor i = 1:len
temp = sparse(1,len);
ind = Index(:,i);
temp(ind) = Weight(:,i);
Huge_Matrix(i,:) = temp;
end
Huge_Matrix = Huge_Matrix + spdiags(-k*ones(len,1),0,len,len);
end
As is shown, len is size of the height * weight of the input image, for 200*200 image, the len is 40000! And I am assigning the Weight into this huge matrix according the position stored in Index. Even though I use parfor to accerlate the loop, the speed is very slow.
I also try to create full matrix at first, it seems that the code can becomes faster, but memory is limited. Is there any other way to speed up the code? Thanks in advance!
As #CrisLuengo says in the comments, there is probably a better way to do what you're trying to do than to create a 40kx40k matrix, but if you have to create a large sparse matrix, it's better to let MATLAB do it for you.
The sparse function has a signature that takes lists of rows, columns and the corresponding values for the nonzero elements of the matrix:
S = sparse(i,j,v) generates a sparse matrix S from the triplets i, j, and v such that S(i(k),j(k)) = v(k). The max(i)-by-max(j) output matrix has space allotted for length(v) nonzero elements. sparse adds together elements in v that have duplicate subscripts in i and j.
If the inputs i, j, and v are vectors or matrices, they must have the same number of elements. Alternatively, the argument v and/or one of the arguments i or j can be scalars.
So, we can simply pass Index as the row indices and Weight as the values, so all we need is an array of column indices the same size as Index:
col_idx = repmat(1:len, k, 1);
Huge_Matrix = sparse(Index, col_idx, Weight, len, len);
(The last two parameters specify the size of the sparse matrix.)
The next step is to create another large sparse matrix and add it to the first. That seems kind of wasteful, so why not just add those entries to the existing arrays before creating the matrix?
Here's the final function:
function Huge_Matrix = Create_Huge_Matrix(len, Weight, Index)
k = size(Weight,1);
% add diagonal indices/weights to arrays
% this avoids creating second huge sparse array
Index(end+1, :) = [1:len];
Weight(end+1, :) = -k*ones(1,len);
% create array of column numbers corresponding to each Index
% make k+1 rows because we've added the diagonal
col_idx = repmat(1:len, k+1, 1);
% let sparse do the work
Huge_Matrix = sparse(Index, col_idx, Weight, len, len);
end

MATLAB Indexing Conventions for Vectors / 1D-Arrays

Consider the preallocation of the following two vectors:
vecCol = NaN( 3, 1 );
vecRow = NaN( 1, 3 );
Now the goal is to assign values to those vectors (e.g. within a loop if vectorization is not possible). Is there a convention or best practice regarding the indexing?
Is the following approach recommended?
for k = 1:3
vecCol( k, 1 ) = 1; % Row, Column
vecRow( 1, k ) = 2; % Row, Column
end
Or is it better to code as follows?
for k = 1:3
vecCol(k) = 1; % Element
vecRow(k) = 2; % Element
end
It makes no difference functionally. If the context means that the vectors are always 1D (your naming convention in this example helps) then you can just use vecCol(i) for brevity and flexibility. However, there are some advantages to using the vecCol(i,1) syntax:
It's explicitly clear which type of vector you're using. This is good if it matters, e.g. when using linear algebra, but might be irrelevant if direction is arbitrary.
If you forget to initialise (bad but it happens) then it will ensure the direction is as expected
It's a good habit to get into so you don't forget when using 2D arrays
It appears to be slightly quicker. This will be negligible on small arrays but see the below benchmark for vectors with 10^8 elements, and a speed improvement of >10%.
function benchie()
% Benchmark. Set up large row/column vectors, time value assignment using timeit.
n = 1e8;
vecCol = NaN(n, 1); vecRow = NaN(1, n);
f = #()fullidx(vecCol, vecRow, n);
s = #()singleidx(vecCol, vecRow, n);
timeit(f)
timeit(s)
end
function fullidx(vecCol, vecRow, n)
% 2D indexing, copied from the example in question
for k = 1:n
vecCol(k, 1) = 1; % Row, Column
vecRow(1, k) = 2; % Row, Column
end
end
function singleidx(vecCol, vecRow, n)
% Element indexing, copied from the example in question
for k = 1:n
vecCol(k) = 1; % Element
vecRow(k) = 2; % Element
end
end
Output (tested on Windows 64-bit R2015b, your mileage may vary!)
% f (full indexing): 2.4874 secs
% s (element indexing): 2.8456 secs
Iterating this benchmark over increasing n, we can produce the following plot for reference.
A general rule of thumb in programming is "explicit is better than implicit". Since there is no functional difference between the two, I'd say it depends on context which one is cleaner/better:
if the context uses a lot of matrix algebra and the distinction between row and column vectors is important, use the 2-argument indexing to reduce bugs and facilitate reading
if the context doesn't disciminate much between the two and you're just using vectors as simple arrays, using 1-argument indexing is cleaner

Mablab/Octave - use cellfun to index one matrix with another

I have a cell containing a random number of matrices, say a = {[300*20],....,[300*20]};. I have another cell of the same format, call it b, that contains the logicals of the position of the nan terms in a.
I want to use cellfun to loop through the cell and basically let the nan terms equal to 0 i.e. a(b)=0.
Thanks,
j
You could define a function that replaces any NaN with zero.
function a = nan2zero(a)
a(isnan(a)) = 0;
Then you can use cellfun to apply this function to your cell array.
a0 = cellfun(#nan2zero, a, 'UniformOutput', 0)
That way, you don't even need any matrices b.
First, you should probably give the tick to #s.bandara, as that was the first correct answer and it used cellfun (as you requested). Do NOT give it to this answer. The purpose of this answer is to provide some additional analysis.
I thought I'd look into the efficiency of some of the possible approaches to this problem.
The first approach is the one advocated by #s.bandara.
The second approach is similar to the one advocated by #s.bandara, but it uses b to convert nan to 0, rather than using isnan. In theory, this method may be faster, since nothing is assigned to b inside the function, so it should be treated "By Ref".
The third approach uses a loop to get around using cellfun, since cellfun is often slower than an explicit loop
The results of a quick speed test are:
Elapsed time is 3.882972 seconds. %# First approach (a, isnan, and cellfun, eg #s.bandara)
Elapsed time is 3.391190 seconds. %# Second approach (a, b, and cellfun)
Elapsed time is 3.041992 seconds. %# Third approach (loop-based solution)
In other words, there are (small) savings to be made by passing b in rather than using isnan. And there are further (small) savings to be made by using a loop rather than cellfun. But I wouldn't lose sleep over it. Remember, the results of any simulation are specific to the specified inputs.
Note, these results were consistent across several runs, I used tic and toc to do this, albeit with many loops over each method. If I wanted to be really thorough, I should use timeit from FEX. If anyone is interested, the code for the three methods follows:
%# Build some example matrices
T = 1000; N = 100; Q = 50; M = 100;
a = cell(1, Q); b = cell(1, Q);
for q = 1:Q
a{q} = randn(T, N);
b{q} = logical(randi(2, T, N) - 1);
a{q}(b{q}) = nan;
end
%# Solution using a, isnan, and cellfun (#s.bandara solution)
tic
for m = 1:M
Soln2 = cellfun(#f1, a, 'UniformOutput', 0);
end
toc
%# Solution using a, b, and cellfun
tic
for m = 1:M
Soln1 = cellfun(#f2, a, b, 'UniformOutput', 0);
end
toc
%# Solution using a loop to avoid cellfun
tic
for m = 1:M
Soln3 = cell(1, Q);
for q = 1:Q
Soln3{q} = a{q};
Soln3{q}(b{q}) = 0;
end
end
toc
%# Solution proposed by #EitanT
[K, N] = size(a{1});
tic
for m = 1:M
a0 = [a{:}]; %// Concatenate matrices along the 2nd dimension
a0(isnan(a0)) = 0; %// Replace NaNs with zeroes
Soln4 = mat2cell(a0, K, N * ones(size(a)));
end
toc
where:
function x1 = f1(x1)
x1(isnan(x1)) = 0;
and:
function x1 = f2(x1, x2)
x1(x2) = 0;
UPDATE: A fourth approach has been suggested by #EitanT. This approach concatenates the cell array of matrices into one large matrix, performs the operation on the large matrix, then optionally converts it back to a cell array. I have added the code for this procedure to my testing routine above. For the inputs specified in my testing code, ie T = 1000, N = 100, Q = 50, and M = 100, the timed run is as follows:
Elapsed time is 3.916690 seconds. %# #s.bandara
Elapsed time is 3.362319 seconds. %# a, b, and cellfun
Elapsed time is 2.906029 seconds. %# loop-based solution
Elapsed time is 4.986837 seconds. %# #EitanT
I was somewhat surprised by this as I thought the approach of #EitanT would yield the best results. On paper, it seems extremely sensible. Note, we can of course mess around with the input parameters to find specific settings that advantage different solutions. For example, if the matrices are small, but the number of them is large, then the approach of #EitanT does well, eg T = 10, N = 5, Q = 500, and M = 100 yields:
Elapsed time is 0.362377 seconds. %# #s.bandara
Elapsed time is 0.299595 seconds. %# a, b, and cellfun
Elapsed time is 0.352112 seconds. %# loop-based solution
Elapsed time is 0.030150 seconds. %# #EitanT
Here the approach of #EitanT dominates.
For the scale of the problem indicated by the OP, I found that the loop based solution usually had the best performance. However, for some Q, eg Q = 5, the solution of #EitanT managed to edge ahead.
Hmm.
Given the nature of the contents of your cell array, there may exist an even faster solution: you can convert your cell data to a single matrix and use vector indexing to replace all NaN values in it at once, without the need of cellfun or loops:
a0 = [a{:}]; %// Concatenate matrices along the 2nd dimension
a0(isnan(a0)) = 0; %// Replace NaNs with zeroes
If you want to convert it back to a cell array, that's fine:
[M, N] = size(a{1});
mat2cell(a0, M, N * ones(size(a)))
P.S.
Work with a 3-D matrix instead of a cell array, if possible. Vectorized operations are usually much faster in MATLAB.

Use a vector to index a matrix without linear index

G'day,
I'm trying to find a way to use a vector of [x,y] points to index from a large matrix in MATLAB.
Usually, I would convert the subscript points to the linear index of the matrix.(for eg. Use a vector as an index to a matrix) However, the matrix is 4-dimensional, and I want to take all of the elements of the 3rd and 4th dimensions that have the same 1st and 2nd dimension. Let me hopefully demonstrate with an example:
Matrix = nan(4,4,2,2); % where the dimensions are (x,y,depth,time)
Matrix(1,2,:,:) = 999; % note that this value could change in depth (3rd dim) and time (4th time)
Matrix(3,4,:,:) = 888; % note that this value could change in depth (3rd dim) and time (4th time)
Matrix(4,4,:,:) = 124;
Now, I want to be able to index with the subscripts (1,2) and (3,4), etc and return not only the 999 and 888 which exist in Matrix(:,:,1,1) but the contents which exist at Matrix(:,:,1,2),Matrix(:,:,2,1) and Matrix(:,:,2,2), and so on (IRL, the dimensions of Matrix might be more like size(Matrix) = (300 250 30 200)
I don't want to use linear indices because I would like the results to be in a similar vector fashion. For example, I would like a result which is something like:
ans(time=1)
999 888 124
999 888 124
ans(time=2)
etc etc etc
etc etc etc
I'd also like to add that due to the size of the matrix I'm dealing with, speed is an issue here - thus why I'd like to use subscript indices to index to the data.
I should also mention that (unlike this question: Accessing values using subscripts without using sub2ind) since I want all the information stored in the extra dimensions, 3 and 4, of the i and jth indices, I don't think that a slightly faster version of sub2ind still would not cut it..
I can think of three ways to go about this
Simple loop
Just loop over all the 2D indices you have, and use colons to access the remaining dimensions:
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
Vectorized calculation of Linear indices
Skip sub2ind and vectorize the computation of linear indices:
% generalized for arbitrary dimensions of M
sz = size(M);
nd = ndims(M);
arg = arrayfun(#(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
#(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
% the linear indices
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:3)).';
Sub2ind
Just use the ready-made tool that ships with Matlab:
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
Speed
So which one's the fastest? Let's find out:
clc
M = nan(4,4,2,2);
sz = size(M);
nd = ndims(M);
twoDinds = [...
1 2
4 3
3 4
4 4
2 1];
tic
for ii = 1:1e3
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
end
toc
tic
twoDinds_prev = twoDinds;
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(#(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
#(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:3)).';
M(inds) = rand;
end
toc
tic
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(#(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
#(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
M(inds) = rand;
end
toc
Results:
Elapsed time is 0.004778 seconds. % loop
Elapsed time is 0.807236 seconds. % vectorized linear inds
Elapsed time is 0.839970 seconds. % linear inds with sub2ind
Conclusion: use the loop.
Granted, the tests above are largely influenced by JIT's failure to compile the two last loops, and the non-specificity to 4D arrays (the last two method also work on ND arrays). Making a specialized version for 4D will undoubtedly be much faster.
Nevertheless, the indexing with simple loop is, well, simplest to do, easiest on the eyes and very fast too, thanks to JIT.
So, here is a possible answer... but it is messy. I suspect it would more computationally expensive then a more direct method... And this would definitely not be my preferred answer. It would be great if we could get the answer without any for loops!
Matrix = rand(100,200,30,400);
grabthese_x = (1 30 50 90);
grabthese_y = (61 9 180 189);
result=nan(size(length(grabthese_x),size(Matrix,3),size(Matrix,4));
for tt = 1:size(Matrix,4)
subset = squeeze(Matrix(grabthese_x,grabthese_y,:,tt));
for NN=1:size(Matrix,3)
result(:,NN,tt) = diag(subset(:,:,NN));
end
end
The resulting matrix, result should have size size(result) = (4 N tt).
I think this should work, even if Matrix isn't square. However, it is not ideal, as I said above.