MATLAB cellfun vectorization slow when using function handle

I encountered a weird bug in cell array vectorization (MATLAB R2019b).
Please consider the following minimal example. Say we generate a cell array with a variable-length vector in each cell:
N = 10000;
rng(1);
result = cell(N,1);
numConnect = randi(10, [N,1]); % randomly generated number of connected nodes
for i = 1:N
result{i} = randi(N, [1, numConnect(i)]);
end
Now we want to retrospectively retrieve numConnect, i.e., the length of each cell, and we can use cellfun for that. According to the documentation, in backward-compatibility mode you can pass a character vector (such as 'length') as the func argument instead of a function handle. However, there is a drastic difference in performance locally.
tic;
nC1 = cellfun('length', result);
toc;
This one usually produces something like
Elapsed time is 0.038531 seconds.
If I change to the @ function-handle form:
tic;
nC2 = cellfun(@length, result);
toc;
Then
Elapsed time is 1.041925 seconds.
is typical. That is roughly a 30x difference!
I wonder: is this performance difference a bug on my local machine, or a "feature" of MATLAB's cellfun?
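For reference, timeit gives a more stable measurement than a single tic/toc run; a minimal sketch of the comparison (the absolute numbers will of course vary by machine and release):
t_char   = timeit(@() cellfun('length', result));       % legacy character-vector form
t_handle = timeit(@() cellfun(@length, result));         % function-handle form
t_anon   = timeit(@() cellfun(@(c) numel(c), result));   % anonymous-handle form
fprintf('char: %.4f s, handle: %.4f s, anon: %.4f s\n', t_char, t_handle, t_anon);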

Initialize vector with function in matlab

I just started out with MATLAB and am having some trouble finding a solution for the following task:
I am trying to initialize a vector of 1000 different values, using a function that doesn't take any arguments as input. I can do this with a for loop, but haven't found out how to do it without one.
What I expected would work:
z = zeros(1,1000)
result = arrayfun(functionname, z)
This however gives an error saying that the first input must be a function handle.
My function is a simple implementation of a Monte Carlo method to calculate pi:
function Result = mcm()
clear
N=1000;
M=0;
for j=1:N
p=[2*rand-1; 2*rand-1];
if p'*p<1
M=M+1;
end
end
Result=4*M/N
One way to actually vectorize your given function mcm would be -
N = 1000; %// Number of data points
P = [2*rand(1,N)-1; 2*rand(1,N)-1]; %// OR 2*rand(2,N)-1
out = 4*sum(sum(P.^2,1)<1)/N
Runtime tests
Code -
N = 1000000; %// Number of data points
disp('---------------- With Original Approach')
tic
M=0;
for j=1:N
P=[2*rand-1; 2*rand-1];
if P'*P<1
M=M+1;
end
end
Result=4*M/N;
toc
disp('---------------- With Proposed Approach')
tic
P = 2*rand(2,N)-1;
out = 4*sum(sum(P.^2,1)<1)/N;
toc
Timings & Outputs -
---------------- With Original Approach
Elapsed time is 3.952998 seconds.
---------------- With Proposed Approach
Elapsed time is 0.089590 seconds.
>> Result
Result =
3.1422
>> out
out =
3.1428
Since your function takes no arguments you can't use arrayfun directly; arrayfun applies the function to each element in the array.
Instead use this:
z = ones(1,1000) * mcm;
A side benefit is that mcm will only run once so it will be faster than looping that function 1000 times.
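That said, if the goal really is 1000 independent Monte Carlo estimates (one call per element), you can wrap mcm in a handle that ignores its input; this is just a sketch, assuming the display output inside mcm is suppressed:
% the dummy input is ignored, so mcm runs once per element of 1:1000
estimates = arrayfun(@(~) mcm(), 1:1000);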

How to accumulate submatrices without looping (subarray smoothing)?

In Matlab I need to accumulate overlapping diagonal blocks of a large matrix. The sample code is given below.
Since this piece of code needs to run several times, it consumes a lot of resources. The process is used in array signal processing for a so-called subarray smoothing or spatial smoothing. Is there any way to do this faster?
% some values for parameters
M = 1000; % size of array
m = 400; % size of subarray
n = M-m+1; % number of subarrays
R = randn(M)+1i*rand(M);
% main code
S = R(1:m,1:m);
for i = 2:n
S = S + R(i:m+i-1,i:m+i-1);
end
ATTEMPTS:
1) I tried the following alternative vectorized version, but unfortunately it became much slower!
[X,Y] = meshgrid(1:m);
inds1 = sub2ind([M,M],Y(:),X(:));
steps = (0:n-1)*(M+1);
inds = repmat(inds1,1,n) + repmat(steps,m^2,1);
RR = sum(R(inds),2);
S = reshape(RR,m,m);
2) I used Matlab coder to create a MEX file and it became much slower!
I've personally had to speed up some portions of my code lately. Not being an expert at all, I would recommend trying the following:
1) Vectorize:
Getting rid of the for-loop
S = R(1:m,1:m);
for i = 2:n
S = S + R(i:m+i-1,i:m+i-1);
end
and replacing it with an alternative based on cumsum should be the way to go here.
Note: I will try to work out this approach in a future edit.
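In the meantime, here is a rough sketch of that cumsum-style idea (using M, m, n, R as defined in the question; please verify the result and the timing against the original loop before relying on it):
% running sum along the main-diagonal direction: C(i,j) = R(i,j) + C(i-1,j-1)
C = R;
for i = 2:M
C(i, 2:M) = C(i, 2:M) + C(i-1, 1:M-1);
end
% pad with a leading zero row/column so the "empty prefix" is available
Cpad = zeros(M+1);
Cpad(2:end, 2:end) = C;
% each overlapping block sum becomes a difference of two diagonal prefix sums
S = Cpad(n+1:M+1, n+1:M+1) - Cpad(1:m, 1:m);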
2) Generating a MEX-file:
In some instances, you could simply fire up the MATLAB Coder app (given that your MATLAB version includes it).
This should generate a .mex file for you, which you can call as if it were the function you are trying to replace.
Regardless of your choice, 1) or 2), you should time the new implementation with tic; my_function(); toc; over a fair number of function calls, and compare it with your current implementation:
my_time = zeros(1,10000);
for count = 1:10000
tic;
my_function();
my_time(count) = toc;
end
mean(my_time)
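Alternatively, timeit takes care of warm-up and repetition for you; a minimal sketch, assuming my_function takes no input arguments:
t = timeit(@() my_function());   % typical execution time over several runs
disp(t)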

Call multiple functions from cells in MATLAB

I store some functions in a cell array, e.g. f = {@sin, @cos, @(x)x+4}.
Is it possible to call all those functions at the same time (with the same input)? I mean something more efficient than using a loop.
As constructed, the *fun family of functions exists for this purpose (e.g., cellfun is the pertinent one here). There are other questions on the use and performance of these functions.
However, if you construct f as a function that constructs a cell array as
f = @(x) {sin(x), cos(x), x+4};
then you can call the function more naturally: f([1,2,3]) for example.
This method also avoids the need for the ('UniformOutput',false) option pair that cellfun requires for non-scalar outputs.
You can also use regular double arrays, but then you need to be wary of input shape for concatenation purposes: @(x) [sin(x), cos(x), x+4] vs. @(x) [sin(x); cos(x); x+4].
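For instance, with a row-vector input the two forms produce differently shaped results (a small illustration; g_row and g_col are just placeholder names):
g_row = @(x) [sin(x), cos(x), x+4];   % 1-by-3n for a 1-by-n input
g_col = @(x) [sin(x); cos(x); x+4];   % 3-by-n for a 1-by-n input
size(g_row(1:3))   % 1 9
size(g_col(1:3))   % 3 3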
I'm just posting these benchmarking results here to illustrate that loops are not necessarily slower than other approaches:
f = {@sin, @cos, @(x)x+4};
x = 1:100;
tic
for ii = 1:1000
for jj = 1:numel(f)
res{jj} = f{jj}(x);
end
end
toc
tic
for ii = 1:1000
res = cellfun(@(arg) arg(x), f, 'uni', 0);
end
toc
Elapsed time is 0.042201 seconds.
Elapsed time is 0.179229 seconds.
Troy's answer is almost twice as fast as the loop approach:
tic
for ii = 1:1000
res = f((1:100).');
end
toc
Elapsed time is 0.025378 seconds.
This might do the trick
functions = {@(arg) sin(arg), @(arg) sqrt(arg)}
x = 5;
cellfun(@(arg) arg(x), functions)
hope this helps.
Adrien.

MATLAB bootstrap without for loop

Yesterday I implemented my first bootstrap in MATLAB (and yes, I know, for loops are evil):
% data is an m-by-n matrix where the data should be sampled per column, but there can be NaN elements
% from the array (a column of data), n values are sampled nReps times
function result = bootstrap_std(data, n, nReps,quantil)
result = zeros(1,size(data,2));
for i=1:size(data,2)
bootstrap_data = zeros(n,nReps);
values = find(~isnan(data(:,i)));
if isempty(values)
bootstrap_data(:,:) = NaN;
else
for k=1:nReps
bootstrap_data(:,k) = datasample(data(values,i),n);
end
end
stat = zeros(1,nReps);
for k=1:nReps
stat(k) = nanstd(bootstrap_data(:,k));
end
sort(stat);
result(i) = quantile(stat,quantil);
end
end
As one can see, this version works column-wise. The algorithm does what it should but is really slow when the data size increases. My question is now: is it possible to implement this logic without using for loops? My problem here is that I could not find a version of datasample which does the sampling column-wise. Or is there a better function to use?
I would be grateful for any hint or idea on how I can speed up this implementation.
Thanks and best regards!
stephan
The bottlenecks in your implementation are:
The function spends a lot of time inside nanstd which is unnecessary since you exclude NaN values from your sample anyway.
There are a lot of functions that operate column-wise, but you spend time looping over the columns and calling them many times.
You make many calls to datasample which is a relatively slow function. It's much faster to create a random vector of indices using randi and use that instead.
Here's how I would write the function (actually I probably wouldn't put in this many comments, and I wouldn't use so many temp variables, but I'm doing it now so you can see what all the steps of the computation are).
function result = bootstrap_std_new(data, n, nRep, quantil)
result = zeros(1, size(data,2));
for i = 1:size(data,2)
isbad = isnan(data(:,i)); %// Vector of NaN values
if all(isbad)
result(i) = NaN;
else
data0 = data(~isbad, i); %// Temp copy of this column for indexing
index = randi(size(data0,1), n, nRep); %// Create the indexing vector
bootstrapdata = data0(index); %// Sample the data
stdevs = std(bootstrapdata); %// Stdev of sampled data
result(i) = quantile(stdevs, quantil); %// Find the correct quantile
end
end
end
Here are some timings
>> data = randn(100,10);
>> data(randi(1000, 50, 1)) = NaN;
>> tic, bootstrap_std(data, 50, 1000, 0.5); toc
Elapsed time is 1.359529 seconds.
>> tic, bootstrap_std_new(data, 50, 1000, 0.5); toc
Elapsed time is 0.038558 seconds.
So this gives you about a 35x speedup.
Your main issue seems to be that you may have varying numbers/positions of NaN in each column, so you can't work on the full matrix unless you're okay with also sampling NaNs. However, some of the inner loops could be simplified.
for k=1:nReps
bootstrap_data(:,k) = datasample(data(values,i),n);
end
Since you're sampling with replacement, you should be able to just do:
bootstrap_data = datasample(data(values,i), n*nReps);
bootstrap_data = reshape(bootstrap_data, [n nReps]);
Also nanstd can work on a full matrix so no need to loop:
stat = nanstd(bootstrap_data); % or nanstd(x,0,2) to change dimension
It would also be worth looking over your code with the profiler (profile) to see where the bottlenecks are.
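For example, a minimal profiling session looks like this (the input sizes are just placeholders):
data = randn(100, 10);
profile on
bootstrap_std(data, 50, 1000, 0.5);
profile viewer   % opens a report showing time spent per function and per line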

why arrayfun does NOT improve my struct array operation performance

Here is the input data:
% @param Landmarks:
% Landmarks should be a 1*m struct array.
% m is the number of training samples.
% Landmarks(i).data is an n*2 matrix
old function:
function Landmarks=CenterOfGravity(Landmarks)
% align center of gravity
for i=1 : length(Landmarks)
Landmarks(i).data=Landmarks(i).data - ones(size(Landmarks(i).data,1),1)...
*mean(Landmarks(i).data);
end
end
New function which uses arrayfun:
function [Landmarks] = center_to_gravity(Landmarks)
Landmarks = arrayfun(@(struct_data)...
struct('data', struct_data.data - repmat(mean(struct_data.data), [size(struct_data.data, 1), 1]))...
,Landmarks);
end %function center_to_gravity
When using the profiler, I find the usage of time is NOT what I expected:
Function Total Time Self Time*
CenterOfGravity 0.011s 0.004 s
center_to_gravity 0.029s 0.001 s
Can someone tell me why?
BTW, I can't add "arrayfun" as a new tag because of my reputation.
Using arrayfun does not count as "vectorizing your code" as described in every Matlab performance blog post ever written.
If your .data field is the same length for all entries of Landmarks, you could vectorize this code by first placing all of the data into a single DATASIZE-BY-LANDMARKSIZE matrix, and then running this command:
meanRemovedData = bsxfun(@minus, data, mean(data,1));
But you lose an awful lot of code clarity that way. (I'm pretty sure that bsxfun usually has vectorization-like speed advantages, but I haven't done any time testing this morning.)
In terms of why, I'm not really the right guy to ask. But many of the advantages of vectorization are dependent on performing simple operations of contiguous blocks of memory. Data stored in an array of structures is (I believe) stored as an array of pointers to disparate memory locations, which is why you can change the size or class of Landmarks(i).data without reallocating the whole structure array.
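As a side note on the bsxfun call above: on R2016b and newer, implicit expansion lets you write the same mean removal without bsxfun; a minimal sketch:
% same result as the bsxfun version, relying on implicit expansion (R2016b+)
meanRemovedData = data - mean(data, 1);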
Thanks to Amro and Pursuit for their enthusiastic responses to my question.
I got the best solution on MATLAB Answers from Jan Simon:
why arrayfun does NOT improve my struct array operation performance
There are some points that do improve the performance:
It is surprising that SUM/LENGTH is faster than MEAN.
timeit can give a more accurate result.
The fastest approach uses tricks like this:
m = sum(data, 1) / size(data, 1);
data(:, 1) = data(:, 1) - m(1);
% ...and similarly for the remaining columns
Consider the following three implementations (all vectorized using BSXFUN):
function s = func1(s)
for i=1:numel(s)
s(i).data = bsxfun(@minus, s(i).data, mean(s(i).data));
end
end
function v = func2(s)
v = arrayfun(@(ss) bsxfun(@minus,ss.data,mean(ss.data)), ...
s, 'UniformOutput',false);
v = struct('data',v);
end
function v = func3(s)
v = arrayfun(@(ss) struct('data',bsxfun(@minus,ss.data,mean(ss.data))), ...
s, 'UniformOutput',true);
end
Explanation:
The first uses a for-loop to iterate over the array of structs.
The second uses ARRAYFUN to return a cell array of the data matrices, which is then passed to STRUCT to build the array of structures.
The last one uses ARRAYFUN and builds a structure directly at each iteration.
Here is a simple test to compare the timings:
function testArrayStruct()
%# sample array of structures
s = struct('data',[]);
for i=5000:-1:1
s(i).data = rand(randi(1000),2);
end
%# timing
tic; v1 = func1(s); toc
tic; v2 = func2(s); toc
tic; v3 = func3(s); toc
%# check all have the same output
assert(isequal(v1,v2,v3))
end
The results:
Elapsed time is 0.357796 seconds. %# func1
Elapsed time is 0.427568 seconds. %# func2
Elapsed time is 0.537971 seconds. %# func3
So you can see the loop-based solution is actually the fastest.