MATLAB Indexing Conventions for Vectors / 1D-Arrays

Consider the preallocation of the following two vectors:
vecCol = NaN( 3, 1 );
vecRow = NaN( 1, 3 );
Now the goal is to assign values to those vectors (e.g. within a loop if vectorization is not possible). Is there a convention or best practice regarding the indexing?
Is the following approach recommended?
for k = 1:3
vecCol( k, 1 ) = 1; % Row, Column
vecRow( 1, k ) = 2; % Row, Column
end
Or is it better to code as follows?
for k = 1:3
vecCol(k) = 1; % Element
vecRow(k) = 2; % Element
end

It makes no difference functionally. If the context means that the vectors are always 1D (your naming convention in this example helps) then you can just use vecCol(i) for brevity and flexibility. However, there are some advantages to using the vecCol(i,1) syntax:
It's explicitly clear which type of vector you're using. This is good if it matters, e.g. when using linear algebra, but might be irrelevant if direction is arbitrary.
If you forget to initialise (bad, but it happens), it ensures the direction is as expected.
It's a good habit to get into so you don't forget when using 2D arrays
It appears to be slightly quicker. This will be negligible on small arrays but see the below benchmark for vectors with 10^8 elements, and a speed improvement of >10%.
function benchie()
% Benchmark. Set up large row/column vectors, time value assignment using timeit.
n = 1e8;
vecCol = NaN(n, 1); vecRow = NaN(1, n);
f = @()fullidx(vecCol, vecRow, n);
s = @()singleidx(vecCol, vecRow, n);
timeit(f)
timeit(s)
end
function fullidx(vecCol, vecRow, n)
% 2D indexing, copied from the example in question
for k = 1:n
vecCol(k, 1) = 1; % Row, Column
vecRow(1, k) = 2; % Row, Column
end
end
function singleidx(vecCol, vecRow, n)
% Element indexing, copied from the example in question
for k = 1:n
vecCol(k) = 1; % Element
vecRow(k) = 2; % Element
end
end
Output (tested on Windows 64-bit R2015b, your mileage may vary!)
% f (full indexing): 2.4874 secs
% s (element indexing): 2.8456 secs
Iterating this benchmark over increasing n, we can produce the following plot for reference.

A general rule of thumb in programming is "explicit is better than implicit". Since there is no functional difference between the two, I'd say it depends on context which one is cleaner/better:
if the context uses a lot of matrix algebra and the distinction between row and column vectors is important, use the 2-argument indexing to reduce bugs and facilitate reading
if the context doesn't discriminate much between the two and you're just using vectors as simple arrays, using 1-argument indexing is cleaner
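As a small illustration of why the explicit form can reduce bugs, here is a sketch of what happens when a vector is grown without preallocation. The variable names grownLinear and grownExplicit are made up for this example.

```matlab
% Neither variable exists before the loop, so MATLAB has to pick a shape.
for k = 1:3
    grownLinear(k) = k;       % 1-argument index: grows into a row vector
    grownExplicit(k, 1) = k;  % 2-argument index: grows into a column vector
end
disp(size(grownLinear))    % 1 3
disp(size(grownExplicit))  % 3 1
```

With 1-argument indexing an unallocated vector always comes out as a row, which may or may not be what the surrounding linear algebra expects.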

Related

Efficient generation of permutation matrices in MATLAB

I'm trying to generate a 100-by-5 matrix where every line is a permutation of 1..100 (that is, every line is 5 random numbers from [1..100] without repetitions).
So far I've only been able to do it iteratively with a for-loop. Is there a way to do it more efficiently (using fewer lines of code), without loops?
N = 100;
T = zeros(N, 5);
for i = 1:N
T(i, :) = randperm(100, 5);
end
Let
N = 100; % desired number of rows
K = 5; % desired number of columns
M = 100; % size of population to sample from
Here's an approach that's probably fast but memory-expensive, as it generates an intermediate N×M matrix and then discards M-K of its columns:
[~, result] = sort(rand(N, M), 2);
result = result(:, 1:K);
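A quick way to convince yourself that this produces valid samples is to check that every row is free of repetitions and stays within 1..M. This is only a sanity-check sketch; the names noRepeats and inRange are introduced here for illustration.

```matlab
N = 100; K = 5; M = 100;
[~, idx] = sort(rand(N, M), 2);  % each row of idx is a random permutation of 1:M
result = idx(:, 1:K);            % keep the first K entries of every permutation

% After sorting each row, a repeated value would show up as a zero difference.
noRepeats = all(all(diff(sort(result, 2), 1, 2) > 0));
inRange   = all(result(:) >= 1 & result(:) <= M);
fprintf('no repeats: %d, in range: %d\n', noRepeats, inRange);
```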
There is very little downside to using a loop here, at least in this minimal example. Indeed, it may well be the best-performing solution for MATLAB's execution engine. But perhaps you don't like assigning the temporary variable i or there are other advantages to vectorization in your non-minimal implementation. Consider this carefully before blindly implementing a solution.
You need to call randperm N times, but each call has no dependency on its position in the output. Without a loop index you will need something else to regulate the number of calls, but this can be just N empty cells cell(N,1). You can use this cell array to evaluate a function that calls randperm but ignores the contents (or, rather, lack of contents) of the cells, and then reassemble the function outputs into one matrix with cell2mat:
T = cell2mat(cellfun(@(~) {randperm(100,5)}, cell(N,1)));

Best way to retrieve the number of elements in a vector in matlab

I would like to know the best way, in terms of speed, to retrieve the number of elements in a vector in MATLAB:
is it:
length(A)
or
size(A,1)
Neither. You want to always use numel for this purpose. length only returns the longest dimension (which can get confusing for 2D arrays) and size(data, dimension) requires you to know whether it's a row or column vector. numel will return the number of elements whether it is a row vector, column vector, or multi-dimensional array.
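To make the difference concrete, here is a minimal sketch comparing the three calls on a row vector, a column vector and a matrix. The test arrays are arbitrary.

```matlab
rowVec = zeros(1, 7);
colVec = zeros(7, 1);
mat    = zeros(3, 5);

% size(.,1) only counts rows, so it fails for row vectors
sizeResults   = [size(rowVec, 1), size(colVec, 1), size(mat, 1)];    % [1 7 3]
% length returns the longest dimension, which is not the element count of a matrix
lengthResults = [length(rowVec), length(colVec), length(mat)];       % [7 7 5]
% numel counts every element regardless of shape
numelResults  = [numel(rowVec), numel(colVec), numel(mat)];          % [7 7 15]
disp([sizeResults; lengthResults; numelResults])
```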
We can easily test the performance of these by writing a quick benchmark. We will take the size with the various methods N times (for this I used 10000).
function size_test
nRepeats = 10000;
times1 = zeros(nRepeats, 1);
times2 = zeros(nRepeats, 1);
times3 = zeros(nRepeats, 1);
for k = 1:nRepeats
data = rand(10000, 1);
tic
size(data, 1);
times1(k) = toc;
tic
length(data);
times2(k) = toc;
tic
numel(data);
times3(k) = toc;
end
% Compute the total time required for each method
fprintf('size:\t%0.8f\n', sum(times1));
fprintf('length:\t%0.8f\n', sum(times2));
fprintf('numel:\t%0.8f\n', sum(times3));
end
When run on my machine it yields:
size: 0.00860400
length: 0.00626700
numel: 0.00617300
So in addition to being the most robust, numel is also slightly faster than the other alternatives.
That being said, there are likely many other bottlenecks in your code than determining the number of elements in an array so I would focus on optimizing those.
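If you want to repeat the comparison yourself, timeit is an alternative to hand-rolled tic/toc loops, since it handles warm-up and repetition internally. The following is only a sketch; the absolute numbers will differ between machines and releases, and for calls this fast timeit may warn that the measurement is inaccurate.

```matlab
data = rand(10000, 1);

% Each handle wraps one of the three candidate calls
tSize   = timeit(@() size(data, 1));
tLength = timeit(@() length(data));
tNumel  = timeit(@() numel(data));

fprintf('size:\t%0.3g s\nlength:\t%0.3g s\nnumel:\t%0.3g s\n', tSize, tLength, tNumel);
```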

Vectorization of double for loop including sine of two variables

I need to numerically evaluate some integrals which are all of the form shown in this image:
These integrals are the matrix elements of an N x N matrix, so I need to evaluate them for all possible combinations of n and m in the range of 1 to N. The integrals are symmetric in n and m, which I have exploited in my current nested for loop approach:
function [V] = coulomb3(N, l, R, R0, c, x)
r1 = 0.01:x:R;
r2 = R:x:R0;
r = [r1 r2];
rl1 = r1.^(2*l);
rl2 = r2.^(2*l);
sines = zeros(N, length(r));
V = zeros(N, N);
for i = 1:N;
sines(i, :) = sin(i*pi*r/R0);
end
x1 = length(r1);
x2 = length(r);
for nn = 1:N
for mm = 1:nn
f1 = (1/6)*rl1.*r1.^2.*sines(nn, 1:x1).*sines(mm, 1:x1);
f2 = ((R^2/2)*rl2 - (R^3/3)*rl2.*r2.^(-1)).*sines(nn, x1+1:x2).*sines(mm, x1+1:x2);
value = 4*pi*c*x*trapz([f1 f2]);
V(nn, mm) = value;
V(mm, nn) = value;
end
end
I figured that calling sin(x) in the loop was a bad idea, so I calculate all the needed values and store them. To evaluate the integrals I used trapz, but as the first and the second/third integrals have different ranges the function values need to be calculated separately and then combined.
I've tried a couple of different ways of vectorization, but the only one that gives the correct results takes much longer than the above loop (I used gmultiply, but the arrays created are enormous). I've also made an analytical solution (which is possible assuming m and n are integers and R0 > R > 0), but these solutions involve a cosine integral function (cosint in MATLAB), which is extremely slow for large N.
I'm not sure the entire thing can be vectorized without creating very large arrays, but the inner loop at least should be possible. Any ideas would be greatly appreciated!
The inputs I use currently are:
R0 = 1000;
R = 8.4691;
c = 0.393*10^(-2);
x = 0.01;
l = 0; % Can reasonably be 0-6
N = 20; % Increasing the value will give the same results,
% but I would like to be able to do at least N = 600
Using these values
V(1, 1:3) = 873.379900963549 -5.80688363271849 -3.38139152472590
Although the diagonal values never converge with increasing R0 so they are less interesting.
You will lose the gain from the symmetry of the problem with my approach, but this only costs a factor of 2. Odds are that you'll still benefit in the end.
The idea is to use multidimensional arrays, making use of trapz supporting these inputs. I'll demonstrate the first term in your figure, as the two others should be done similarly, and the point is the technique:
r1 = 0.01:x:R;
r2 = R:x:R0;
r = [r1 r2].';
rl1 = r1.'.^(2*l);
rl2 = r2.'.^(2*l);
sines = zeros(length(r),N); %// CHANGED!!
%// V = zeros(N, N); not needed now, see later
%// you can define sines in a vectorized way as well:
sines = sin(r*(1:N)*pi/R0); %//' now size [Nr, N] !
%// note that implicitly r is of size [Nr, 1, 1]
%// and sines is of size [Nr, N, 1]
sines2mat = permute(sines,[1, 3, 2]); %// size [Nr, 1, N]
%// the first term in V: perform integral along first dimension
%//V1 = 1/6*squeeze(trapz(bsxfun(@times,bsxfun(@times,r.^(2*l+2),sines),sines2mat),1))*x; %// 4*pi*c prefactor might be physics, not math
V1 = 1/6*permute(trapz(bsxfun(@times,bsxfun(@times,r.^(2*l+2),sines),sines2mat),1),[2,3,1])*x; %// 4*pi*c prefactor might be physics, not math
The key point is that bsxfun(@times,r.^(2*l+2),sines) is a matrix of size [Nr,N,1], which is again multiplied by sines2mat using bsxfun; the result is of size [Nr,N,N], and an element (k1,k2,k3) corresponds to an integrand at radial point k1, n=k2 and m=k3. Using trapz() with explicitly the first dimension (which would be default) reduces this to an array of size [1,N,N], which is just what you need after a good squeeze(). Update: as per @Dev-iL's comment you should use permute instead of squeeze to get rid of the leading singleton dimension, as that might be more efficient.
The two other terms can be handled the same way, and of course it might still help if you restructure the integrals based on overlapping and non-overlapping parts.
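To check the technique on something small, the following sketch compares the bsxfun/trapz construction with a plain double loop over a single integrand of the form r^(2l+2) sin sin. The parameter values are deliberately tiny and are not the ones from the question, and the 1/6 and 4*pi*c prefactors are left out. The names integrand, Vvec, Vloop and maxDiff are introduced here for illustration.

```matlab
x = 0.01; R0 = 10; N = 4; l = 0;        % small illustrative values (assumed)
r = (0.01:x:R0).';                      % radial grid as a column, size [Nr, 1]
sines = sin(r*(1:N)*pi/R0);             % size [Nr, N]
sines2mat = permute(sines, [1, 3, 2]);  % size [Nr, 1, N]

% Vectorized: build the [Nr, N, N] integrand, integrate along dimension 1
integrand = bsxfun(@times, bsxfun(@times, r.^(2*l+2), sines), sines2mat);
Vvec = permute(trapz(integrand, 1), [2, 3, 1]) * x;   % size [N, N]

% Reference: one trapz call per (nn, mm) pair
Vloop = zeros(N, N);
for nn = 1:N
    for mm = 1:N
        Vloop(nn, mm) = x * trapz(r.^(2*l+2) .* sines(:, nn) .* sines(:, mm));
    end
end

maxDiff = max(abs(Vvec(:) - Vloop(:)));
fprintf('largest absolute difference: %g\n', maxDiff);
```

The intermediate array holds Nr*N*N doubles, so for N = 600 and the question's grid it would be very large; processing n or m in blocks is one way to cap memory use.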

Can someone help vectorise this matlab loop?

I am trying to learn how to vectorise MATLAB loops, so I'm just doing a few small examples.
Here is the standard loop I am trying to vectorise:
function output = moving_avg(input, N)
output = [];
for n = N:length(input) % iterate over y vector
summation = 0;
for ii = n-(N-1):n % iterate over x vector N times
summation += input(ii);
endfor
output(n) = summation/N;
endfor
endfunction
I have been able to vectorise one loop, but can't work out what to do with the second loop. Here is where I have got to so far:
function output = moving_avg(input, N)
output = [];
for n = N:length(input) % iterate over y vector
output(n) = mean(input(n-(N-1):n));
endfor
endfunction
Can someone help me simplify it further?
EDIT:
The input is just a one-dimensional vector with probably at most 100 data points. N is a single integer, less than the size of the input (typically around 5).
I don't actually intend to use it for any particular application; it was just a simple nested loop that I thought would be good to use to learn about vectorisation.
Seems like you are performing convolution operation there. So, just use conv -
output = zeros(size(input1))
output(N:end) = conv(input1,ones(1,N),'valid')./N
Please note that I have replaced the variable name input with input1, as input is already used as the name of a built-in function in MATLAB, so it's a good practice to avoid such conflicts.
Generic case: For a general case scenario, you can look into bsxfun to create such groups and then choose the operation that you intend to perform at the final stage. Here's what such code would look like for a sliding/moving average operation -
%// Create groups of indices for each sliding interval of length N
idx = bsxfun(@plus,[1:N]',[0:numel(input1)-N]) %//'
%// Index into input1 with those indices to get grouped elements from it along columns
input1_indexed = input1(idx)
%// Finally, choose the operation you intend to perform and apply along the
%// columns. In this case, you are doing average, so use mean(...,1).
output = mean(input1_indexed,1)
%// Also pre-append with zeros if intended to match up with the expected output
Matlab as a language does this type of operation poorly - you will always require an outside O(N) loop/operation involving at minimum O(K) copies, and vectorizing further will not be worth it in performance because MATLAB is a heavyweight language. Instead, consider using the filter function; such functions are typically implemented in C, which makes that type of operation nearly free.
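A minimal sketch of the filter-based version, assuming a trailing window of length N as in the question. The sample data are arbitrary, and the names yFilter and yConv are introduced here only to compare against the conv approach.

```matlab
x = [4 8 6 -1 -2 -3 -1 3 4 5];   % example input (arbitrary)
N = 3;                           % window length

% FIR filter with N equal taps: y(n) = (x(n) + x(n-1) + ... + x(n-N+1)) / N
yFilter = filter(ones(1, N)/N, 1, x);
yFilter(1:N-1) = 0;              % the first N-1 outputs use a partial window; zero them
                                 % so the result matches the loop in the question

% Cross-check against the conv-based result
yConv = zeros(size(x));
yConv(N:end) = conv(x, ones(1, N), 'valid') / N;
disp(max(abs(yFilter - yConv)))  % should be at rounding-error level
```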
For a sliding average, you can use cumsum to minimize the number of operations:
x = randi(10,1,10); %// example input
N = 3; %// window length
y = cumsum(x); %// compute cumulative sum of x
z = zeros(size(x)); %// initialize result to zeros
z(N:end) = (y(N:end)-[0 y(1:end-N)])/N; %// compute order N difference of cumulative sum

make this matlab snippet run without a loop

I want to make the code below more time-efficient, preferably without a loop.
arguments:
t % time values vector
t_index = c % one of the possible indices ranging from 1:length(t).
A % a MXN array where M = length(t)
B % a 1XN array
code:
m = 1;
for k = t_index:length(t)
A(k,1:(end-m+1)) = A(k,1:(end-m+1)) + B(m:end);
m = m + 1;
end
Many thanks.
I'd build from B a matrix of size MxN (call it B2), with zeros in the right places and a triangular form according to the conditions, and then all you need to do is A+B2.
something like this:
N=size(A,2);
B2=zeros(size(A));
k=c:length(t);
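B2(k(1):k(N),:)=hankel(B); % requires at least N rows from index c onward
A=A+B2; % update A in place, as the loop does, rather than assigning to ans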
Note, the fact that it is "vectorized" doesn't mean it is faster these days. Matlab's JIT makes for loops comparable and sometimes faster than built-in vectorized options.
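To see that the hankel construction reproduces the loop, here is a small self-contained check. The sizes and values are made up for illustration, and it assumes there are exactly N rows from index c to the end of t.

```matlab
t = 1:5; c = 2;              % illustrative values (assumed)
A = ones(numel(t), 4);       % M-by-N with M = length(t)
B = [1 2 3 4];               % 1-by-N
N = size(A, 2);

% Loop version from the question
Aloop = A; m = 1;
for k = c:length(t)
    Aloop(k, 1:(end-m+1)) = Aloop(k, 1:(end-m+1)) + B(m:end);
    m = m + 1;
end

% Vectorized version: row m of hankel(B) is B(m:end) padded with zeros
B2 = zeros(size(A));
B2(c:c+N-1, :) = hankel(B);
Avec = A + B2;

disp(isequal(Aloop, Avec))   % 1
```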