Powers of a matrix - matlab

I have a square matrix A (nxn). I would like to create a series of k powers of this matrix into an nxnxk multidimensional matrix (Not element-wise but actual powers of the matrix), i.e.getting [A^0 A^1 A^2..A^k]. It's sort of a varied vandermonde for matrix case.
I am able to do it with loops but it is annoying and slow. I tried using bsxfun but no luck since I am probably missing something here.
Here is a simple loop that I did:
for j=1:1:100
final(:,:,j)=A^(j-1);
end

You are trying to perform cummulative version of mpower with a vector of k values.
Sadly, bsxfun hasn't evolved yet to handle such a case. So, the best I could suggest at this point would be having a running storage that accumulates the matrix-product at each iteration to be used at the next one.
Your original loop code looked something like this -
final = zeros([size(A),100]);
for j=1:1:100
final(:,:,j)=A^(j-1);
end
So, with the suggestion, the modified loopy code would be -
final = zeros([size(A),100]);
matprod = A^0;
final(:,:,1) = matprod;
for j=2:1:100
matprod = A*matprod;
final(:,:,j)= matprod;
end
Benchmarking -
%// Input
A = randi(9,200,200);
disp('---------- Original loop code -----------------')
tic
final = zeros([size(A),100]);
for j=1:1:100
final(:,:,j)=A^(j-1);
end
toc
disp('---------- Modified loop code -----------------')
tic
final2 = zeros([size(A),100]);
matprod = A^0;
final2(:,:,1) = matprod;
for j=2:1:100
matprod = A*matprod;
final2(:,:,j)= matprod;
end
toc
Runtimes -
---------- Original loop code -----------------
Elapsed time is 1.255266 seconds.
---------- Modified loop code -----------------
Elapsed time is 0.205227 seconds.

Related

MATLAB cellfun vectorization slow when using function handle

I encountered a weird bug in cell vectorization (MATLAB version R2019B).
Please consider the following minimal example, say we generate a cell array with variable length vector in each cell:
N = 10000;
rng(1);
result = cell(N,1);
numConnect = randi(10, [N,1]); % randomly generated number of connected nodes
for i = 1:N
result{i} = randi(N, [1, numConnect(i)]);
end
Now we want to retrospectively retrieve numConnect, i.e., the length of each cell, we can use cellfun. According to this documentation, in Backward Compatibility mode, you can use string as func variable instead of function handle. However, there is a drastic difference in performance locally.
tic;
nC1 = cellfun('length', result);
toc;
This one usually produces something like
Elapsed time is 0.038531 seconds.
If I changed to # function handle:
tic;
nC2 = cellfun(#length, result);
toc;
Then
Elapsed time is 1.041925 seconds.
is normal. There is a 30x difference!
I wonder is this performance difference a bug on my local machine, or a "feature" of MATLAB cellfun?

Fastest approach to copying/indexing variable parts of 3D matrix

I have large sets of 3D data consisting of 1D signals acquired in 2D space.
The first step in processing this data is thresholding all signals to find the arrival of a high-amplitude pulse. This pulse is present in all signals and arrives at different times.
After thresholding, the 3D data set should be reordered so that every signal starts at the arrival of the pulse and what came before is thrown away (the end of the signals is of no importance, as of now i concatenate zeros to the end of all signals so the data remains the same size).
Now, I have implemented this in the following manner:
First, i start by calculating the sample number of the first sample exceeding the threshold in all signals
M = randn(1000,500,500); % example matrix of realistic size
threshold = 0.25*max(M(:,1,1)); % 25% of the maximum in the first signal as threshold
[~,index] = max(M>threshold); % indices of first sample exceeding threshold in all signals
Next, I want all signals to be shifted so that they all start with the pulse. For now, I have implemented it this way:
outM = zeros(size(M)); % preallocation for speed
for i = 1:size(M,2)
for j = 1:size(M,3)
outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
end
end
This works fine, and i know for-loops are not that slow anymore, but this easily takes a few seconds for the datasets on my machine. A single iteration of the for-loop takes about 0.05-0.1 sec, which seems slow to me for just copying a vector containing 500-2000 double values.
Therefore, I have looked into the best way to tackle this, but for now I haven't found anything better.
I have tried several things: 3D masks, linear indexing, and parallel loops (parfor).
for 3D masks, I checked to see if any improvements are possible. Therefore i first contruct a logical mask, and then compare the speed of the logical mask indexing/copying to the double nested for loop.
%% set up for logical mask copying
AA = logical(ones(500,1)); % only copy the first 500 values after the threshold value
Mask = logical(zeros(size(M)));
Jepla = zeros(500,size(M,2),size(M,3));
for i = 1:size(M,2)
for j = 1:size(M,3)
Mask(index(1,i,j):index(1,i,j)+499,i,j) = AA;
end
end
%% speed comparison
tic
Jepla = M(Mask);
toc
tic
for i = 1:size(M,2)
for j = 1:size(M,3)
outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
end
end
toc
The for-loop is faster every time, even though there is more that's copied.
Next, linear indexing.
%% setup for linear index copying
%put all indices in 1 long column
LongIndex = reshape(index,numel(index),1);
% convert to linear indices and store in new variable
linearIndices = sub2ind(size(M),LongIndex,repmat(1:size(M,2),1,size(M,3))',repelem(1:size(M,3),size(M,2))');
% extend linear indices with those of all values to copy
k = zeros(numel(M),1);
count = 1;
for i = 1:numel(LongIndex)
values = linearIndices(i):size(M,1)*i;
k(count:count+length(values)-1) = values;
count = count + length(values);
end
k = k(1:count-1);
% get linear indices of locations in new matrix
l = zeros(length(k),1);
count = 1;
for i = 1:numel(LongIndex)
values = repelem(LongIndex(i)-1,size(M,1)-LongIndex(i)+1);
l(count:count+length(values)-1) = values;
count = count + length(values);
end
l = k-l;
% create new matrix
outM = zeros(size(M));
%% speed comparison
tic
outM(l) = M(k);
toc
tic
for i = 1:size(M,2)
for j = 1:size(M,3)
outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
end
end
toc
Again, the alternative approach, linear indexing, is (a lot) slower.
After this failed, I learned about parallelisation, and though this would for sure speed up my code.
By reading some of the documentation around parfor and trying it out a bit, I changed my code to the following:
gcp;
outM = zeros(size(M));
inM = mat2cell(M,size(M,1),ones(size(M,2),1),size(M,3));
tic
parfor i = 1:500
for j = 1:500
outM(:,i,j) = [inM{i}(index(1,i,j):end,1,j);zeros(index(1,i,j)-1,1)];
end
end
end
toc
I changed it so that "outM" and "inM" would both be sliced variables, as I read this is best. Still this is very slow, a lot slower than the original for loop.
So now the question, should I give up on trying to improve the speed of this operation? Or is there another way in which to do this? I have searched a lot, and for now do not see how to speed this up.
Sorry for the long question, but I wanted to show what I tried.
Thank you in advance!
Not sure if an option in your situation, but looks like cell arrays are actually faster here:
outM2 = cell(size(M,2),size(M,3));
tic;
for i = 1:size(M,2)
for j = 1:size(M,3)
outM2{i,j} = M(index(1,i,j):end,i,j);
end
end
toc
And a second idea which also came out faster, batch all data which have to be shifted by the same value:
tic;
for i = 1:unique(index).'
outM(1:size(M,1)+1-i,index==i) = M(i:end,index==i);
end
toc
It totally depends on your data if this approach is actually faster.
And yes integer valued and logical indexing can be mixed

Initialize vector with function in matlab

just started out with matlab and have some troubles finding the solution for the following action:
I am trying to initialize a vector of 1000 different values, with a function that doesn't take any arguments as input. I can do this with a for loop, but haven't found out how to do it without.
What I expected that would work:
z = zeros(1,1000)
result = arrayfun(*functionname*,z)
This however gives an error saying that the first input must be a function handle.
My function is a simple implementation of a monte carlo method to calculate pi:
function Result = mcm()
clear
N=1000;
M=0;
for j=1:N
p=[2*rand-1; 2*rand-1];
if p'*p<1
M=M+1;
end
end
Result=4*M/N
One way to actually vectorize your given function mcm would be -
N = 1000; %// Number of data points
P = [2*rand(1,N)-1; 2*rand(1,N)-1]; %// OR 2*rand(2,N)-1
out = 4*sum(sum(P.^2,1)<1)/N
Runtime tests
Code -
N = 1000000; %// Number of data points
disp('---------------- With Original Approach')
tic
M=0;
for j=1:N
P=[2*rand-1; 2*rand-1];
if P'*P<1
M=M+1;
end
end
Result=4*M/N;
toc
disp('---------------- With Proposed Approach')
tic
P = 2*rand(2,N)-1;
out = 4*sum(sum(P.^2,1)<1)/N;
toc
Timings & Outputs -
---------------- With Original Approach
Elapsed time is 3.952998 seconds.
---------------- With Proposed Approach
Elapsed time is 0.089590 seconds.
>> Result
Result =
3.1422
>> out
out =
3.1428
Since your function takes no arguments you can't use arrayfun. arrayfun applies the function to each element in the array.
Instead use this:
z = ones(1,1000) * mcm;
A side benefit is that mcm will only run once so it will be faster than looping that function 1000 times.

MATLab Bootstrap without for loop

yesterday I implemented my first bootstrap in MATLab. (and yes, I know, for loops are evil.):
%data is an mxn matrix where the data should be sampled per column but there
can be a NaNs Elements
%from the array (a column of data) n values are sampled nReps times
function result = bootstrap_std(data, n, nReps,quantil)
result = zeros(1,size(data,2));
for i=1:size(data,2)
bootstrap_data = zeros(n,nReps);
values = find(~isnan(data(:,i)));
if isempty(values)
bootstrap_data(:,:) = NaN;
else
for k=1:nReps
bootstrap_data(:,k) = datasample(data(values,i),n);
end
end
stat = zeros(1,nReps);
for k=1:nReps
stat(k) = nanstd(bootstrap_data(:,k));
end
sort(stat);
result(i) = quantile(stat,quantil);
end
end
As one can see, this version works columnwise. The algorithm does what it should but is really slow when the data size increaes. My question is now: Is it possible to implement this logic without using for loops? My problem is here that I could not find a version of datasample which does the sampling columnwise. Or is there a better function to use?
I am happy for any hint or idea how I can speed up this implementation.
Thanks and best regards!
stephan
The bottlenecks in your implementation are
The function spends a lot of time inside nanstd which is unnecessary since you exclude NaN values from your sample anyway.
There are a lot of functions that operate column-wise, but you spend time looping over the columns and calling them many times.
You make many calls to datasample which is a relatively slow function. It's much faster to create a random vector of indices using randi and use that instead.
Here's how I would write the function (actually I probably wouldn't put in this many comments, and I wouldn't use so many temp variables, but I'm doing it now so you can see what all the steps of the computation are).
function result = bootstrap_std_new(data, n, nRep, quantil)
result = zeros(1, size(data,2));
for i = 1:size(data,2)
isbad = isnan(data(:,i)); %// Vector of NaN values
if all(isbad)
result(i) = NaN;
else
data0 = data(~isbad, i); %// Temp copy of this column for indexing
index = randi(size(data0,1), n, nRep); %// Create the indexing vector
bootstrapdata = data0(index); %// Sample the data
stdevs = std(bootstrapdata); %// Stdev of sampled data
result(i) = quantile(stdevs, quantil); %// Find the correct quantile
end
end
end
Here are some timings
>> data = randn(100,10);
>> data(randi(1000, 50, 1)) = NaN;
>> tic, bootstrap_std(data, 50, 1000, 0.5); toc
Elapsed time is 1.359529 seconds.
>> tic, bootstrap_std_new(data, 50, 1000, 0.5); toc
Elapsed time is 0.038558 seconds.
So this gives you about a 35x speedup.
Your main issue seems to be that you may have varying numbers/positions of NaN in each column, so can't work on the full matrix unless you're okay with also sampling NaNs. However, some of the inner loops could be simplified.
for k=1:nReps
bootstrap_data(:,k) = datasample(data(values,i),n);
end
Since you're sampling with replacement, you should be able to just do:
bootstrap_data = datasample(data(values,i), n*nReps);
bootstrap_data = reshape(bootstrap_data, [n nReps]);
Also nanstd can work on a full matrix so no need to loop:
stat = nanstd(bootstrap_data); % or nanstd(x,0,2) to change dimension
It would also be worth just looking over your code with profile to see where the bottlenecks are.

why arrayfun does NOT improve my struct array operation performance

here is the input data:
% #param Landmarks:
% Landmarks should be 1*m struct.
% m is the number of training set.
% Landmark(i).data is a n*2 matrix
old function:
function Landmarks=CenterOfGravity(Landmarks)
% align center of gravity
for i=1 : length(Landmarks)
Landmarks(i).data=Landmarks(i).data - ones(size(Landmarks(i).data,1),1)...
*mean(Landmarks(i).data);
end
end
new function which use arrayfun:
function [Landmarks] = center_to_gravity(Landmarks)
Landmarks = arrayfun(#(struct_data)...
struct('data', struct_data.data - repmat(mean(struct_data.data), [size(struct_data.data, 1), 1]))...
,Landmarks);
end %function center_to_gravity
when using profiler, I find the usage of time is NOT what I expected:
Function Total Time Self Time*
CenterOfGravity 0.011s 0.004 s
center_to_gravity 0.029s 0.001 s
Can someone tell me why?
BTW...I can't add "arrayfun" as a new tag for my reputation.
Using arrayfun does not count as "vectorizing your code" as described in every Matlab performance blog post ever written.
If your .data field is the same length for all entries of landmark, your could vectorize this code by first placing all of the data into a single DATASIZE-BY-LANDMARKSIZE martix, and then running this command
meanRemovedData = bsxfun(#minus, data, mean(data,1));
But you lose an awful lot of code clarity that way. (I'm pretty sure that bsxfun usually has vectorization-like speed advantages, but I haven't done any time testing this morning.)
In terms of why, I'm not really the right guy to ask. But many of the advantages of vectorization are dependent on performing simple operations of contiguous blocks of memory. Data stored in an array of structures is (I believe) stored as an array of pointers to disparate memory locations, which is why you can change the size or class of Landmarks(i).data without reallocating the whole structure array.
Thanks for Amro and Pursuit's enthusiastic to my question.
I get the best solution at Matlab answers from Jan Simon:
why arrayfun does NOT improve my struct array operation performance
There are some points that do improve the performance:
It is surprisingly that SUM/LENGTH is faster than MEAN
timeit can give more accurate result.
The fastest approach use tricks like this:
m = sum(data, 1) / size(data, 1);
data(:, 1) = data(:, 1) - m(1);
Consider the following three implementations (all vectorized using BSXFUN):
function s = func1(s)
for i=1:numel(s)
s(i).data = bsxfun(#minus, s(i).data, mean(s(i).data));
end
end
function v = func2(s)
v = arrayfun(#(ss) bsxfun(#minus,ss.data,mean(ss.data)), ...
s, 'UniformOutput',false);
v = struct('data',v);
end
function v = func3(s)
v = arrayfun(#(ss) struct('data',bsxfun(#minus,ss.data,mean(ss.data))), ...
s, 'UniformOutput',true);
end
Explanation:
First uses a for-loop to iterate over the array of structs.
Second uses ARRAYFUN to return a cell array of the data matrices, which are then passed to STRUCT to build the array of structures.
The last one uses ARRAYFUN and builds a structure directly at each iteration.
Here is a simple test to compare the timings:
function testArrayStruct()
%# sample array of structures
s = struct('data',[]);
for i=5000:-1:1
s(i).data = rand(randi(1000),2);
end
%# timing
tic; v1 = func1(s); toc
tic; v2 = func2(s); toc
tic; v3 = func3(s); toc
%# check all have the same output
assert(isequal(v1,v2,v3))
end
The results:
Elapsed time is 0.357796 seconds. %# func1
Elapsed time is 0.427568 seconds. %# func2
Elapsed time is 0.537971 seconds. %# func3
So you can see the loop-based solution is actually the fastest..