I'm trying to generate a 100-by-5 matrix where every line is a permutation of 1..100 (that is, every line is 5 random numbers from [1..100] without repetitions).
So far I've only been able to do it iteratively with a for-loop. Is there a way to do it more efficiently (using fewer lines of code), without loops?
N = 100;
T = zeros(N, 5);
for i = 1:N
T(i, :) = randperm(100, 5);
end
Let
N = 100; % desired number of rows
K = 5; % desired number of columns
M = 100; % size of population to sample from
Here's an approach that's probably fast; but memory-expensive, as it generates an intermediate M×N matrix and then discards N-K rows:
[~, result] = sort(rand(N, M), 2);
result = result(:, 1:K);
There is very little downside to using a loop here, at least in this minimal example. Indeed, it may well be the best-performing solution for MATLAB's execution engine. But perhaps you don't like assigning the temporary variable i or there are other advantages to vectorization in your non-minimal implementation. Consider this carefully before blindly implementing a solution.
You need to call randperm N times, but each call has no dependency on its position in the output. Without a loop index you will need something else to regulate the number of calls, but this can be just N empty cells cell(N,1). You can use this cell array to evaluate a function that calls randperm but ignores the contents (or, rather, lack of contents) of the cells, and then reassemble the function outputs into one matrix with cell2mat:
T = cell2mat(cellfun(#(~) {randperm(100,5)}, cell(N,1)));
Related
In Matlab, I would like to generate a matrix with 4 random, unique samples (out of 10) 7 times.
In order to avoid a for-loop, I thought I could just repeat my data and use datasample from Statistics and Machine Learning Toolbox on the first dimension. But it always chooses the same 4 values from each column, so this is kind of useless.
Consider the following MWE:
randomData = [50.29; 47.72; 48.38; 48.02; 44.23; 47.17; 48.19; 49.11; 50.44; 53.40];
numOfReps = 7;
numOfSamples = 4;
randomDataRepMatrix = randomData*ones(1, numOfReps);
s = RandStream('mlfg6331_64');
y = datasample(s, randomDataRepMatrix, numOfSamples, 'Replace', false);
Even without the RandStream part, I get the same results...
Any idea? Or do I need to use the for-loop after all?
I don't think datasample or randsample can produce several sets of samples in one go. Here's a "manual" way to do it (not necessarily faster than using datasample with a loop):
[~, ind] = sort(rand(numel(randomData), numOfReps)); % each column is a permutation
ind = ind(1:numOfSamples,:); % keep only the first values in each column
y = randomData(ind); % index into data
Consider the preallocation of the following two vectors:
vecCol = NaN( 3, 1 );
vecRow = NaN( 1, 3 );
Now the goal is to assign values to those vectors (e.g. within a loop if vectorization is not possible). Is there a convention or best practice regarding the indexing?
Is the following approach recommended?
for k = 1:3
vecCol( k, 1 ) = 1; % Row, Column
vecRow( 1, k ) = 2; % Row, Column
end
Or is it better to code as follows?
for k = 1:3
vecCol(k) = 1; % Element
vecRow(k) = 2; % Element
end
It makes no difference functionally. If the context means that the vectors are always 1D (your naming convention in this example helps) then you can just use vecCol(i) for brevity and flexibility. However, there are some advantages to using the vecCol(i,1) syntax:
It's explicitly clear which type of vector you're using. This is good if it matters, e.g. when using linear algebra, but might be irrelevant if direction is arbitrary.
If you forget to initialise (bad but it happens) then it will ensure the direction is as expected
It's a good habit to get into so you don't forget when using 2D arrays
It appears to be slightly quicker. This will be negligible on small arrays but see the below benchmark for vectors with 10^8 elements, and a speed improvement of >10%.
function benchie()
% Benchmark. Set up large row/column vectors, time value assignment using timeit.
n = 1e8;
vecCol = NaN(n, 1); vecRow = NaN(1, n);
f = #()fullidx(vecCol, vecRow, n);
s = #()singleidx(vecCol, vecRow, n);
timeit(f)
timeit(s)
end
function fullidx(vecCol, vecRow, n)
% 2D indexing, copied from the example in question
for k = 1:n
vecCol(k, 1) = 1; % Row, Column
vecRow(1, k) = 2; % Row, Column
end
end
function singleidx(vecCol, vecRow, n)
% Element indexing, copied from the example in question
for k = 1:n
vecCol(k) = 1; % Element
vecRow(k) = 2; % Element
end
end
Output (tested on Windows 64-bit R2015b, your mileage may vary!)
% f (full indexing): 2.4874 secs
% s (element indexing): 2.8456 secs
Iterating this benchmark over increasing n, we can produce the following plot for reference.
A general rule of thumb in programming is "explicit is better than implicit". Since there is no functional difference between the two, I'd say it depends on context which one is cleaner/better:
if the context uses a lot of matrix algebra and the distinction between row and column vectors is important, use the 2-argument indexing to reduce bugs and facilitate reading
if the context doesn't disciminate much between the two and you're just using vectors as simple arrays, using 1-argument indexing is cleaner
i am trying to learn how to vectorise matlab loops, so im just doing a few small examples.
here is the standard loop i am trying to vectorise:
function output = moving_avg(input, N)
output = [];
for n = N:length(input) % iterate over y vector
summation = 0;
for ii = n-(N-1):n % iterate over x vector N times
summation += input(ii);
endfor
output(n) = summation/N;
endfor
endfunction
i have been able to vectorise one loop, but cant work out what to do with the second loop. here is where i have got to so far:
function output = moving_avg(input, N)
output = [];
for n = N:length(input) % iterate over y vector
output(n) = mean(input(n-(N-1):n));
endfor
endfunction
can someone help me simplify it further?
EDIT:
the input is just a one dimensional vector and probably maximum 100 data points. N is a single integer, less than the size of the input (typically probably around 5)
i don't actually intend to use it for any particular application, it was just a simple nested loop that i thought would be good to use to learn about vectorisation..
Seems like you are performing convolution operation there. So, just use conv -
output = zeros(size(input1))
output(N:end) = conv(input1,ones(1,N),'valid')./N
Please note that I have replaced the variable name input with input1, as input is already used as the name of a built-in function in MATLAB, so it's a good practice to avoid such conflicts.
Generic case: For a general case scenario, you can look into bsxfun to create such groups and then choose your operation that you intend to perform at the final stage. Here's how such a code would look like for sliding/moving average operation -
%// Create groups of indices for each sliding interval of length N
idx = bsxfun(#plus,[1:N]',[0:numel(input1)-N]) %//'
%// Index into input1 with those indices to get grouped elements from it along columns
input1_indexed = input1(idx)
%// Finally, choose the operation you intend to perform and apply along the
%// columns. In this case, you are doing average, so use mean(...,1).
output = mean(input1_indexed,1)
%// Also pre-append with zeros if intended to match up with the expected output
Matlab as a language does this type of operation poorly - you will always require an outside O(N) loop/operation involving at minimum O(K) copies which will not be worth it in performance to vectorize further because matlab is a heavy weight language. Instead, consider using the
filter function where these things are typically implemented in C which makes that type of operation nearly free.
For a sliding average, you can use cumsum to minimize the number of operations:
x = randi(10,1,10); %// example input
N = 3; %// window length
y = cumsum(x); %// compute cumulative sum of x
z = zeros(size(x)); %// initiallize result to zeros
z(N:end) = (y(N:end)-[0 y(1:end-N)])/N; %// compute order N difference of cumulative sum
Let us say I have the following:
M = randn(10,20);
T = randn(1,20);
I would like to threshold each column of M, by each entry of T. For example, find all indicies of all elements of M(:,1) that are greater than T(1). Find all indicies of all elements in M(:,2) that are greater than T(2), etc etc.
Of course, I would like to do this without a for-loop. Is this possible?
You can use bsxfun like this:
I = bsxfun(#gt, M, T);
Then I will be a logcial matrix of size(M) with ones where M(:,i) > T(i).
You can use bsxfun to do things like this, but it may not be faster than a for loop (more below on this).
result = bsxfun(#gt,M,T)
This will do an element wise comparison and return you a logical matrix indicating the relationship governed by the first argument. I have posted code below to show the direct comparison, indicating that it does return what you are looking for.
%var declaration
M = randn(10,20);
T = randn(1,20);
% quick method
fastres = bsxfun(#gt,M,T);
% looping method
res = false(size(M));
for i = 1:length(T)
res(:,i) = M(:,i) > T(i);
end
% check to see if the two matrices are identical
isMatch = all(all(fastres == res))
This function is very powerful and can be used to help speed up processes, but keep in mind that it will only speed things up if there is a lot of data. There is a bit of background work that bsxfun must do, which can actually cause it to be slower.
I would only recommend using it if you have several thousand data points. Otherwise, the traditional for-loop will actually be faster. Try it out for yourself by changing the size of the M and T variables.
You can replicate the threshold vector and use matrix comparison:
s=size(M);
T2=repmat(T, s(1), 1);
M(M<T2)=0;
Indexes=find(M);
I want a code the below code more efficient timewise. preferably without a loop.
arguments:
t % time values vector
t_index = c % one of the possible indices ranging from 1:length(t).
A % a MXN array where M = length(t)
B % a 1XN array
code:
m = 1;
for k = t_index:length(t)
A(k,1:(end-m+1)) = A(k,1:(end-m+1)) + B(m:end);
m = m + 1;
end
Many thanks.
I'd built from B a matrix of size NxM (call it B2), with zeros in the right places and a triangular from according to the conditions and then all you need to do is A+B2.
something like this:
N=size(A,2);
B2=zeros(size(A));
k=c:length(t);
B2(k(1):k(N),:)=hankel(B)
ans=A+B2;
Note, the fact that it is "vectorized" doesn't mean it is faster these days. Matlab's JIT makes for loops comparable and sometimes faster than built-in vectorized options.