Can I vectorize extraction of data from a cell array in MATLAB? - matlab

I am wondering if it is possible to use a vector to access data within a cell array. I am hoping to accomplish this using a vectorized approach rather than a for-loop.
I'm attempting to run a simple microsimulation in MATLAB. I have a simulated cohort that is initially healthy, but some are at low risk for a particular disease while others are at high risk. Thus, I have an array (Starting_Cohort) that indicates each patient's risk level (first column) and their initial status (second column). In addition, I have a cell array (pstar) that indicates each patient's likelihood of transitioning between two hypothetical health states (i.e., "healthy" and "sick").
What I would like to accomplish is the following:
1) During each period of the simulation (t = 1:T), use the first column of the starting cohort to determine the patient's risk level (i.e., 1 or 2).
2) Use the risk level to access a specific row (dependent on their current health status) within a specific cell (dependent on their risk level) of the cell array.
3) Compare the resultant vector against a random draw from the uniform distribution (contained in the array "r"), and select the column number associated with the first value larger than that draw (the column number determines their health state in the subsequent period).
HOWEVER, I want to avoid doing this for one patient at a time (i.e., introducing a nested loop), as this increases the execution time of the code by an order of magnitude (the actual cohort consists of approximately 20000 patients). I've been trying to accomplish this through vectorization - that is, running the simulation over the entire patient cohort concurrently - but I hit a roadblock when trying to access data from the cell array described above.
Starting_Cohort = [1 1; 1 1; 2 1; 2 1;];
[Cohort_Size, ~] = size(Starting_Cohort);
pstar = cell(2, 1);
pstar{1, 1} = [0.75 1.00; 0.15 1.00]; pstar{2, 1} = [0.65 1.00; 0.25 1.00];
rng(1234, 'twister'); T = 5; r = rand(Cohort_Size, T);
Sim_Results = [Starting_Cohort zeros(Cohort_Size, T)];
for t = 1:T
[~, Sim_Results(:, t+2)] = max(pstar{Sim_Results(:, 1), 1} ...
(Sim_Results(:, t+1), :) > r(:, t), [], 2);
end
When I run the above code, I obtain the error "Expected one output from a curly brace or dot indexing expression, but there were 4 results." I take this to mean that my approach to extracting information from the cell array is inappropriate, although I'm not sure whether I can address this or how. I would be deeply appreciative for any assistance rendered!
UPDATE 070619: I did eventually get this to work, using the code below. Effectively, I created a string array containing the expression I wanted to apply to each row. The expression is identical for every row EXCEPT in that it contains the row index. I can then use arrayfun and evalin to produce results similar to those I was looking for. Unfortunately, my own problem involves sparse arrays, so I could not actually solve my original problem. However, I'm hoping this information may nonetheless be useful for others.
Starting_Cohort = [1 1; 1 1; 2 1; 2 1;];
[Cohort_Size, ~] = size(Starting_Cohort);
pstar = cell(2, 1);
pstar{1, 1} = [0.75 1.00; 0.15 1.00];
pstar{2, 1} = [0.65 1.00; 0.25 1.00];
rng(1234, 'twister'); T = 5; r = rand(Cohort_Size, T);
Sim_Results = [Starting_Cohort zeros(Cohort_Size, T)];
for i = 1:Cohort_Size
TEST(i, 1) = strcat("max(pstar{Sim_Results(", string(i), ", 1), 1}",...
"(Sim_Results(", string(i), ", t+1), :) > ", ...
"r(", string(i), ", t), [], 2)");
end
for t = 1:T
[~, Sim_Results(:, t+2)] = arrayfun(#(x) evalin('base', x), TEST);
end

Related

How can I avoid constructing these grid variables in MATLAB?

I have the following calculations in two steps:
Initially, I create a set of 4 grid vectors, each spanning from -2 to 2:
u11grid=[-2:0.1:2];
u12grid=[-2:0.1:2];
u22grid=[-2:0.1:2];
u21grid=[-2:0.1:2];
[ca, cb, cc, cd] = ndgrid(u11grid, u12grid, u22grid, u21grid);
u11grid=ca(:);
u12grid=cb(:);
u22grid=cc(:);
u21grid=cd(:);
%grid=[u11grid u12grid u22grid u21grid]
sg=size(u11grid,1);
Next, I have an algorithm assigning the same index (equalorder) to the rows of grid sharing a specific structure:
U1grid=[-u11grid -u21grid -u12grid -u22grid Inf*ones(sg,1) -Inf*ones(sg,1)];
U2grid=[u21grid-u11grid -u21grid u22grid-u12grid -u22grid Inf*ones(sg,1) -Inf*ones(sg,1)];
s1=size(U1grid,2);
s2=size(U2grid,2);
%-------------------------------------------------------
%sortedU1grid gives U1grid with each row sorted from smallest to largest
%for each row i of sortedU1grid and for j=1,2,...,s1 index1(i,j) gives
%the column position 1,2,...,s1 in U1grid(i,:) of sortedU1grid(i,j)
[sortedU1grid,index1] = sort(U1grid,2);
%for each row i of sortedU1grid, d1(i,:) is a 1x(s1-1) row of ones and zeros
% d1(i,j)=1 if sortedU1grid(i,j)-sortedU1grid(i,j-1)=0 and d1(i,j)=0 otherwise
d1 = diff(sortedU1grid,[],2) == 0;
%-------------------------------------------------------
%Repeat for U2grid
[sortedU2grid,index2] = sort(U2grid,2);
d2 = diff(sortedU2grid,[],2) == 0;
%-------------------------------------------------------
%Assign the same index to the rows of grid sharing the same "ordering"
[~,~,equalorder] = unique([index1 index2 d1 d2],'rows', 'stable'); %sgx1
My question: is there a way to compute the algorithm in step 2 without the initial construction of the grid vectors in step 1? I am asking this because step 1 takes a lot of memory given that it basically generates the Cartesian product of 4 sets.
A solution should not rely on the specific content of U1grid and U2grid as that part changes in my actual code. To be more clear: U1grid and U2grid are ALWAYS derived from u11grid, ..., u21grid; however, the way in which they are derived from u11grid, ..., u21grid is slightly more complicated in my actual code from what I have reported here.
As Cris Luengo mentions in a comment, you're always going to be dealing with a trade-off between speed and memory. That said, one option you have is to only compute each of your 4 grid variables (u11grid u12grid u22grid u21grid) when needed instead of computing them once and storing them. You will save on memory but will lose speed if you are recomputing each one multiple times.
The solution I came up with involves creating an anonymous function equivalent for each of the 4 grid variables, using combinations of repmat and repelem to compute each individually instead of ndgrid to compute them all together:
u11gridFcn = #() repmat((-2:0.1:2).', 41.^3, 1);
u12gridFcn = #() repmat(repelem((-2:0.1:2).', 41), 41.^2, 1);
u22gridFcn = #() repmat(repelem((-2:0.1:2).', 41.^2), 41, 1);
u21gridFcn = #() repelem((-2:0.1:2).', 41.^3);
sg = 41.^4;
You would then use these by replacing every usage of your 4 grid variables in U1grid and U2grid with their corresponding function call. For your specific example above, this would be the new code for U1grid and U2grid (note also the use of inf(...) instead of Inf*ones(...), a small detail):
U1grid = [-u11gridFcn() ...
-u21gridFcn() ...
-u12gridFcn() ...
-u22gridFcn() ...
inf(sg, 1) ...
-inf(sg, 1)];
U2grid = [u21gridFcn()-u11gridFcn() ...
-u21gridFcn() ...
u22gridFcn()-u12gridFcn() ...
-u22gridFcn() ...
inf(sg, 1) ...
-inf(sg, 1)];
In this example, you avoid the memory needed to store the 4 grid variables, but the values for u11grid and u12grid will each be computed twice while the values for u21grid and u22grid will each be computed three times. Likely a small time trade-off for a potentially significant memory savings.
You may be able to remove the ndgrid, but it is not the memory bottleneck of this code, which is the call to unique on the large matrix A = [index1 index2 d1 d2]. The size of A is 2825761 by 22 (much larger than the grids), and it seems that unique may even internally copy A. I was able to avoid this call using
[sorted, ind] = sortrows([index1 index2 d1 d2]);
change = [1; any(diff(sorted), 2)];
uniqueInd = cumsum(change);
equalorder(ind) = uniqueInd;
[~, ~, equalorder] = unique(equalorder, 'stable');
where the last line is still the memory bottleneck and is only needed if you want the same numbering as your code produces. If any unique ordering is okay, you can skip it. You may be able to further reduce the memory footprint by carefully clearing variables are soon as they are no longer needed.

Time series aggregation efficiency

I commonly need to summarize a time series with irregular timing with a given aggregation function (i.e., sum, average, etc.). However, the current solution that I have seems inefficient and slow.
Take the aggregation function:
function aggArray = aggregate(array, groupIndex, collapseFn)
groups = unique(groupIndex, 'rows');
aggArray = nan(size(groups, 1), size(array, 2));
for iGr = 1:size(groups,1)
grIdx = all(groupIndex == repmat(groups(iGr,:), [size(groupIndex,1), 1]), 2);
for iSer = 1:size(array, 2)
aggArray(iGr,iSer) = collapseFn(array(grIdx,iSer));
end
end
end
Note that both array and groupIndex can be 2D. Every column in array is an independent series to be aggregated, but the columns of groupIndex should be taken together (as a row) to specify a period.
Then when we bring an irregular time series to it (note the second period is one base period longer), the timing results are poor:
a = rand(20006,10);
b = transpose([ones(1,5) 2*ones(1,6) sort(repmat((3:4001), [1 5]))]);
tic; aggregate(a, b, #sum); toc
Elapsed time is 1.370001 seconds.
Using the profiler, we can find out that the grpIdx line takes about 1/4 of the execution time (.28 s) and the iSer loop takes about 3/4 (1.17 s) of the total (1.48 s).
Compare this with the period-indifferent case:
tic; cumsum(a); toc
Elapsed time is 0.000930 seconds.
Is there a more efficient way to aggregate this data?
Timing Results
Taking each response and putting it in a separate function, here are the timing results I get with timeit with Matlab 2015b on Windows 7 with an Intel i7:
original | 1.32451
felix1 | 0.35446
felix2 | 0.16432
divakar1 | 0.41905
divakar2 | 0.30509
divakar3 | 0.16738
matthewGunn1 | 0.02678
matthewGunn2 | 0.01977
Clarification on groupIndex
An example of a 2D groupIndex would be where both the year number and week number are specified for a set of daily data covering 1980-2015:
a2 = rand(36*52*5, 10);
b2 = [sort(repmat(1980:2015, [1 52*5]))' repmat(1:52, [1 36*5])'];
Thus a "year-week" period are uniquely identified by a row of groupIndex. This is effectively handled through calling unique(groupIndex, 'rows') and taking the third output, so feel free to disregard this portion of the question.
Method #1
You can create the mask corresponding to grIdx across all
groups in one go with bsxfun(#eq,..). Now, for collapseFn as #sum, you can bring in matrix-multiplication and thus have a completely vectorized approach, like so -
M = squeeze(all(bsxfun(#eq,groupIndex,permute(groups,[3 2 1])),2))
aggArray = M.'*array
For collapseFn as #mean, you need to do a bit more work, as shown here -
M = squeeze(all(bsxfun(#eq,groupIndex,permute(groups,[3 2 1])),2))
aggArray = bsxfun(#rdivide,M,sum(M,1)).'*array
Method #2
In case you are working with a generic collapseFn, you can use the 2D mask M created with the previous method to index into the rows of array, thus changing the complexity from O(n^2) to O(n). Some quick tests suggest this to give appreciable speedup over the original loopy code. Here's the implementation -
n = size(groups,1);
M = squeeze(all(bsxfun(#eq,groupIndex,permute(groups,[3 2 1])),2));
out = zeros(n,size(array,2));
for iGr = 1:n
out(iGr,:) = collapseFn(array(M(:,iGr),:),1);
end
Please note that the 1 in collapseFn(array(M(:,iGr),:),1) denotes the dimension along which collapseFn would be applied, so that 1 is essential there.
Bonus
By its name groupIndex seems like would hold integer values, which could be abused to have a more efficient M creation by considering each row of groupIndex as an indexing tuple and thus converting each row of groupIndex into a scalar and finally get a 1D array version of groupIndex. This must be more efficient as the datasize would be 0(n) now. This M could be fed to all the approaches listed in this post. So, we would have M like so -
dims = max(groupIndex,[],1);
agg_dims = cumprod([1 dims(end:-1:2)]);
[~,~,idx] = unique(groupIndex*agg_dims(end:-1:1).'); %//'
m = size(groupIndex,1);
M = false(m,max(idx));
M((idx-1)*m + [1:m]') = 1;
Mex Function 1
HAMMER TIME: Mex function to crush it:
The base case test with original code from the question took 1.334139 seconds on my machine. IMHO, the 2nd fastest answer from #Divakar is:
groups2 = unique(groupIndex);
aggArray2 = squeeze(all(bsxfun(#eq,groupIndex,permute(groups,[3 2 1])),2)).'*array;
Elapsed time is 0.589330 seconds.
Then my MEX function:
[groups3, aggArray3] = mg_aggregate(array, groupIndex, #(x) sum(x, 1));
Elapsed time is 0.079725 seconds.
Testing that we get the same answer: norm(groups2-groups3) returns 0 and norm(aggArray2 - aggArray3) returns 2.3959e-15. Results also match original code.
Code to generate the test conditions:
array = rand(20006,10);
groupIndex = transpose([ones(1,5) 2*ones(1,6) sort(repmat((3:4001), [1 5]))]);
For pure speed, go mex. If the thought of compiling c++ code / complexity is too much of a pain, go with Divakar's answer. Another disclaimer: I haven't subject my function to robust testing.
Mex Approach 2
Somewhat surprising to me, this code appears even faster than the full Mex version in some cases (eg. in this test took about .05 seconds). It uses a mex function mg_getRowsWithKey to figure out the indices of groups. I think it may be because my array copying in the full mex function isn't as fast as it could be and/or overhead from calling 'feval'. It's basically the same algorithmic complexity as the other version.
[unique_groups, map] = mg_getRowsWithKey(groupIndex);
results = zeros(length(unique_groups), size(array,2));
for iGr = 1:length(unique_groups)
array_subset = array(map{iGr},:);
%// do your collapse function on array_subset. eg.
results(iGr,:) = sum(array_subset, 1);
end
When you do array(groups(1)==groupIndex,:) to pull out array entries associated with the full group, you're searching through the ENTIRE length of groupIndex. If you have millions of row entries, this will totally suck. array(map{1},:) is far more efficient.
There's still unnecessary copying of memory and other overhead associated with calling 'feval' on the collapse function. If you implement the aggregator function efficiently in c++ in such a way to avoid copying of memory, probably another 2x speedup can be achieved.
A little late to the party, but a single loop using accumarray makes a huge difference:
function aggArray = aggregate_gnovice(array, groupIndex, collapseFn)
[groups, ~, index] = unique(groupIndex, 'rows');
numCols = size(array, 2);
aggArray = nan(numel(groups), numCols);
for col = 1:numCols
aggArray(:, col) = accumarray(index, array(:, col), [], collapseFn);
end
end
Timing this using timeit in MATLAB R2016b for the sample data in the question gives the following:
original | 1.127141
gnovice | 0.002205
Over a 500x speedup!
Doing away with the inner loop, i.e.
function aggArray = aggregate(array, groupIndex, collapseFn)
groups = unique(groupIndex, 'rows');
aggArray = nan(size(groups, 1), size(array, 2));
for iGr = 1:size(groups,1)
grIdx = all(groupIndex == repmat(groups(iGr,:), [size(groupIndex,1), 1]), 2);
aggArray(iGr,:) = collapseFn(array(grIdx,:));
end
and calling the collapse function with a dimension parameter
res=aggregate(a, b, #(x)sum(x,1));
gives some speedup (3x on my machine) already and avoids the errors e.g. sum or mean produce, when they encounter a single row of data without a dimension parameter and then collapse across columns rather than labels.
If you had just one group label vector, i.e. same group labels for all columns of data, you could speed further up:
function aggArray = aggregate(array, groupIndex, collapseFn)
ng=max(groupIndex);
aggArray = nan(ng, size(array, 2));
for iGr = 1:ng
aggArray(iGr,:) = collapseFn(array(groupIndex==iGr,:));
end
The latter functions gives identical results for your example, with a 6x speedup, but cannot handle different group labels per data column.
Assuming a 2D test case for the group index (provided here as well with 10 different columns for groupIndex:
a = rand(20006,10);
B=[]; % make random length periods for each of the 10 signals
for i=1:size(a,2)
n0=randi(10);
b=transpose([ones(1,n0) 2*ones(1,11-n0) sort(repmat((3:4001), [1 5]))]);
B=[B b];
end
tic; erg0=aggregate(a, B, #sum); toc % original method
tic; erg1=aggregate2(a, B, #(x)sum(x,1)); toc %just remove the inner loop
tic; erg2=aggregate3(a, B, #(x)sum(x,1)); toc %use function below
Elapsed time is 2.646297 seconds.
Elapsed time is 1.214365 seconds.
Elapsed time is 0.039678 seconds (!!!!).
function aggArray = aggregate3(array, groupIndex, collapseFn)
[groups,ix1,jx] = unique(groupIndex, 'rows','first');
[groups,ix2,jx] = unique(groupIndex, 'rows','last');
ng=size(groups,1);
aggArray = nan(ng, size(array, 2));
for iGr = 1:ng
aggArray(iGr,:) = collapseFn(array(ix1(iGr):ix2(iGr),:));
end
I think this is as fast as it gets without using MEX. Thanks to the suggestion of Matthew Gunn!
Profiling shows that 'unique' is really cheap here and getting out just the first and last index of the repeating rows in groupIndex speeds things up considerably. I get 88x speedup with this iteration of the aggregation.
Well I have a solution that is almost as quick as the mex but only using matlab.
The logic is the same as most of the above, creating a dummy 2D matrix but instead of using #eq I initialize a logical array from the start.
Elapsed time for mine is 0.172975 seconds.
Elapsed time for Divakar 0.289122 seconds.
function aggArray = aggregate(array, group, collapseFn)
[m,~] = size(array);
n = max(group);
D = false(m,n);
row = (1:m)';
idx = m*(group(:) - 1) + row;
D(idx) = true;
out = zeros(m,size(array,2));
for ii = 1:n
out(ii,:) = collapseFn(array(D(:,ii),:),1);
end
end

Apply function to rolling window

Say I have a long list A of values (say of length 1000) for which I want to compute the std in pairs of 100, i.e. I want to compute std(A(1:100)), std(A(2:101)), std(A(3:102)), ..., std(A(901:1000)).
In Excel/VBA one can easily accomplish this by writing e.g. =STDEV(A1:A100) in one cell and then filling down in one go. Now my question is, how could one accomplish this efficiently in Matlab without having to use any expensive for-loops.
edit: Is it also possible to do this for a list of time series, e.g. when A has dimensions 1000 x 4 (i.e. 4 time series of length 1000)? The output matrix should then have dimensions 901 x 4.
Note: For the fastest solution see Luis Mendo's answer
So firstly using a for loop for this (especially if those are your actual dimensions) really isn't going to be expensive. Unless you're using a very old version of Matlab, the JIT compiler (together with pre-allocation of course) makes for loops inexpensive.
Secondly - have you tried for loops yet? Because you should really try out the naive implementation first before you start optimizing prematurely.
Thirdly - arrayfun can make this a one liner but it is basically just a for loop with extra overhead and very likely to be slower than a for loop if speed really is your concern.
Finally some code:
n = 1000;
A = rand(n,1);
l = 100;
for loop (hardly bulky, likely to be efficient):
S = zeros(n-l+1,1); %//Pre-allocation of memory like this is essential for efficiency!
for t = 1:(n-l+1)
S(t) = std(A(t:(t+l-1)));
end
A vectorized (memory in-efficient!) solution:
[X,Y] = meshgrid(1:l)
S = std(A(X+Y-1))
A probably better vectorized solution (and a one-liner) but still memory in-efficient:
S = std(A(bsxfun(#plus, 0:l-1, (1:l)')))
Note that with all these methods you can replace std with any function so long as it is applies itself to the columns of the matrix (which is the standard in Matlab)
Going 2D:
To go 2D we need to go 3D
n = 1000;
k = 4;
A = rand(n,k);
l = 100;
ind = bsxfun(#plus, permute(o:n:(k-1)*n, [3,1,2]), bsxfun(#plus, 0:l-1, (1:l)')); %'
S = squeeze(std(A(ind)));
M = squeeze(mean(A(ind)));
%// etc...
OR
[X,Y,Z] = meshgrid(1:l, 1:l, o:n:(k-1)*n);
ind = X+Y+Z-1;
S = squeeze(std(A(ind)))
M = squeeze(mean(A(ind)))
%// etc...
OR
ind = bsxfun(#plus, 0:l-1, (1:l)'); %'
for t = 1:k
S = std(A(ind));
M = mean(A(ind));
%// etc...
end
OR (taken from Luis Mendo's answer - note in his answer he shows a faster alternative to this simple loop)
S = zeros(n-l+1,k);
M = zeros(n-l+1,k);
for t = 1:(n-l+1)
S(t,:) = std(A(k:(k+l-1),:));
M(t,:) = mean(A(k:(k+l-1),:));
%// etc...
end
What you're doing is basically a filter operation.
If you have access to the image processing toolbox,
stdfilt(A,ones(101,1)) %# assumes that data series are in columns
will do the trick (no matter the dimensionality of A). Note that if you also have access to the parallel computing toolbox, you can let filter operations like these run on a GPU, although your problem might be too small to generate noticeable speedups.
To minimize number of operations, you can exploit the fact that the standard deviation can be computed as a difference involving second and first moments,
and moments over a rolling window are obtained efficiently with a cumulative sum (using cumsum):
A = randn(1000,4); %// random data
N = 100; %// window size
c = size(A,2);
A1 = [zeros(1,c); cumsum(A)];
A2 = [zeros(1,c); cumsum(A.^2)];
S = sqrt( (A2(1+N:end,:)-A2(1:end-N,:) ...
- (A1(1+N:end,:)-A1(1:end-N,:)).^2/N) / (N-1) ); %// result
Benchmarking
Here's a comparison against a loop based solution, using timeit. The loop approach is as in Dan's solution but adapted to the 2D case, exploting the fact that std works along each column in a vectorized manner.
%// File loop_approach.m
function S = loop_approach(A,N);
[n, p] = size(A);
S = zeros(n-N+1,p);
for k = 1:(n-N+1)
S(k,:) = std(A(k:(k+N-1),:));
end
%// File bsxfun_approach.m
function S = bsxfun_approach(A,N);
[n, p] = size(A);
ind = bsxfun(#plus, permute(0:n:(p-1)*n, [3,1,2]), bsxfun(#plus, 0:n-N, (1:N).')); %'
S = squeeze(std(A(ind)));
%// File cumsum_approach.m
function S = cumsum_approach(A,N);
c = size(A,2);
A1 = [zeros(1,c); cumsum(A)];
A2 = [zeros(1,c); cumsum(A.^2)];
S = sqrt( (A2(1+N:end,:)-A2(1:end-N,:) ...
- (A1(1+N:end,:)-A1(1:end-N,:)).^2/N) / (N-1) );
%// Benchmarking code
clear all
A = randn(1000,4); %// Or A = randn(1000,1);
N = 100;
t_loop = timeit(#() loop_approach(A,N));
t_bsxfun = timeit(#() bsxfun_approach(A,N));
t_cumsum = timeit(#() cumsum_approach(A,N));
disp(' ')
disp(['loop approach: ' num2str(t_loop)])
disp(['bsxfun approach: ' num2str(t_bsxfun)])
disp(['cumsum approach: ' num2str(t_cumsum)])
disp(' ')
disp(['bsxfun/loop gain factor: ' num2str(t_loop/t_bsxfun)])
disp(['cumsum/loop gain factor: ' num2str(t_loop/t_cumsum)])
Results
I'm using Matlab R2014b, Windows 7 64 bits, dual core processor, 4 GB RAM:
4-column case:
loop approach: 0.092035
bsxfun approach: 0.023535
cumsum approach: 0.0002338
bsxfun/loop gain factor: 3.9106
cumsum/loop gain factor: 393.6526
Single-column case:
loop approach: 0.085618
bsxfun approach: 0.0040495
cumsum approach: 8.3642e-05
bsxfun/loop gain factor: 21.1431
cumsum/loop gain factor: 1023.6236
So the cumsum-based approach seems to be the fastest: about 400 times faster than the loop in the 4-column case, and 1000 times faster in the single-column case.
Several functions can do the job efficiently in Matlab.
On one side, you can use functions such as colfilt or nlfilter, which performs computations on sliding blocks. colfilt is way more efficient than nlfilter, but can be used only if the order of the elements inside a block does not matter. Here is how to use it on your data:
S = colfilt(A, [100,1], 'sliding', #std);
or
S = nlfilter(A, [100,1], #std);
On your example, you can clearly see the difference of performance. But there is a trick : both functions pad the input array so that the output vector has the same size as the input array. To get only the relevant part of the output vector, you need to skip the first floor((100-1)/2) = 49 first elements, and take 1000-100+1 values.
S(50:end-50)
But there is also another solution, close to colfilt, more efficient. colfilt calls col2im to reshape the input vector into a matrix on which it applies the given function on each distinct column. This transforms your input vector of size [1000,1] into a matrix of size [100,901]. But colfilt pads the input array with 0 or 1, and you don't need it. So you can run colfilt without the padding step, then apply std on each column and this is easy because std applied on a matrix returns a row vector of the stds of the columns. Finally, transpose it to get a column vector if you want. In brief and in one line:
S = std(im2col(X,[100 1],'sliding')).';
Remark: if you want to apply a more complex function, see the code of colfilt, line 144 and 147 (for v2013b).
If your concern is speed of the for loop, you can greatly reduce the number of loop iteration by folding your vector into an array (using reshape) with the columns having the number of element you want to apply your function on.
This will let Matlab and the JIT perform the optimization (and in most case they do that way better than us) by calculating your function on each column of your array.
You then reshape an offseted version of your array and do the same. You will still need a loop but the number of iteration will only be l (so 100 in your example case), instead of n-l+1=901 in a classic for loop (one window at a time).
When you're done, you reshape the array of result in a vector, then you still need to calculate manually the last window, but overall it is still much faster.
Taking the same input notation than Dan:
n = 1000;
A = rand(n,1);
l = 100;
It will take this shape:
width = (n/l)-1 ; %// width of each line in the temporary result array
tmp = zeros( l , width ) ; %// preallocation never hurts
for k = 1:l
tmp(k,:) = std( reshape( A(k:end-l+k-1) , l , [] ) ) ; %// calculate your stat on the array (reshaped vector)
end
S2 = [tmp(:) ; std( A(end-l+1:end) ) ] ; %// "unfold" your results then add the last window calculation
If I tic ... toc the complete loop version and the folded one, I obtain this averaged results:
Elapsed time is 0.057190 seconds. %// windows by window FOR loop
Elapsed time is 0.016345 seconds. %// "Folded" FOR loop
I know tic/toc is not the way to go for perfect timing but I don't have the timeit function on my matlab version. Besides, the difference is significant enough to show that there is an improvement (albeit not precisely quantifiable by this method). I removed the first run of course and I checked that the results are consistent with different matrix sizes.
Now regarding your "one liner" request, I suggest your wrap this code into a function like so:
function out = foldfunction( func , vec , nPts )
n = length( vec ) ;
width = (n/nPts)-1 ;
tmp = zeros( nPts , width ) ;
for k = 1:nPts
tmp(k,:) = func( reshape( vec(k:end-nPts+k-1) , nPts , [] ) ) ;
end
out = [tmp(:) ; func( vec(end-nPts+1:end) ) ] ;
Which in your main code allows you to call it in one line:
S = foldfunction( #std , A , l ) ;
The other great benefit of this format, is that you can use the very same sub function for other statistical function. For example, if you want the "mean" of your windows, you call the same just changing the func argument:
S = foldfunction( #mean , A , l ) ;
Only restriction, as it is it only works for vector as input, but with a bit of rework it could be made to take arrays as input too.

Matlab -- random walk with boundaries, vectorized

Suppose I have a vector J of jump sizes and an initial starting point X_0. Also I have boundaries 0, B (assume 0 < X_0 < B). I want to do a random walk where X_i = [min(X_{i-1} + J_i,B)]^+. (positive part). Basically if it goes over a boundary, it is made equal to the boundary. Anyone know a vectorized way to do this? The current way I am doing it consists of doing cumsums and then finding places where it violates a condition, and then starting from there and repeating the cumsum calculation, etc until I find that I stop violating the boundaries. It works when the boundaries are rarely hit, but if they are hit all the time, it basically becomes a for loop.
In the code below, I am doing this across many samples. To 'fix' the ones that go out of the boundary, I have to loop through the samples to check...(don't think there is a vectorized 'find')
% X_init is a row vector describing initial resource values to use for
% each sample
% J is matrix where each col is a sequence of Jumps (columns = sample #)
% In this code the jumps are subtracted, but same thing
X_intvl = repmat(X_init,NumJumps,1) - cumsum(J);
X = [X_init; X_intvl];
for sample = 1:NumSamples
k = find(or(X_intvl(:,sample) > B, X_intvl(:,sample) < 0),1);
while(~isempty(k))
change = X_intvl(k-1,sample) - X_intvl(k,sample);
X_intvl(k:end,sample) = X_intvl(k:end,sample)+change;
k = find(or(X_intvl(:,sample) > B, X_intvl(:,sample) < 0),1);
end
end
Interesting question (+1).
I faced a similar problem a while back, although slightly more complex as my lower and upper bound depended on t. I never did work out a fully-vectorized solution. In the end, the fastest solution I found was a single loop which incorporates the constraints at each step. Adapting the code to your situation yields the following:
%# Set the parameters
LB = 0; %# Lower bound
UB = 5; %# Upper bound
T = 100; %# Number of observations
N = 3; %# Number of samples
X0 = (1/2) * (LB + UB); %# Arbitrary start point halfway between LB and UB
%# Generate the jumps
Jump = randn(N, T-1);
%# Build the constrained random walk
X = X0 * ones(N, T);
for t = 2:T
X(:, t) = max(min(X(:, t-1) + Jump(:, t-1), UB), 0);
end
X = X';
I would be interested in hearing if this method proves faster than what you are currently doing. I suspect it will be for cases where the constraint is binding in more than one or two places. I can't test it myself as the code you provided is not a "working" example, ie I can't just copy and paste it into Matlab and run it, as it depends on several variables for which example (or simulated) values are not provided. I tried adapting it myself, but couldn't get it to work properly?
UPDATE: I just switched the code around so that observations are indexed on columns and samples are indexed on rows, and then I transpose X in the last step. This will make the routine more efficient, since Matlab allocates memory for numeric arrays column-wise - hence it is faster when performing operations down the columns of an array (as opposed to across the rows). Note, you will only notice the speed-up for large N.
FINAL THOUGHT: These days, the JIT accelerator is very good at making single loops in Matlab efficient (double loops are still pretty slow). Therefore personally I'm of the opinion that every time you try and obtain a fully-vectorized solution in Matlab, ie no loops, you should weigh up whether the effort involved in finding a clever solution is worth the slight gains in efficiency to be made over an easier-to-obtain method that utilizes a single loop. And it is important to remember that fully-vectorized solutions are sometimes slower than solutions involving single loops when T and N are small!
I'd like to propose another vectorized solution.
So, first we should set the parameters and generate random Jumpls. I used the same set of parameters as Colin T Bowers:
% Set the parameters
LB = 0; % Lower bound
UB = 20; % Upper bound
T = 1000; % Number of observations
N = 3; % Number of samples
X0 = (1/2) * (UB + LB); % Arbitrary start point halfway between LB and UB
% Generate the jumps
Jump = randn(N, T-1);
But I changed generation code:
% Generate initial data without bounds
X = cumsum(Jump, 2);
% Apply bounds
Amplitude = UB - LB;
nsteps = ceil( max(abs(X(:))) / Amplitude - 0.5 );
for ii = 1:nsteps
ind = abs(X) > (1/2) * Amplitude;
X(ind) = Amplitude * sign(X(ind)) - X(ind);
end
% Shifting X
X = X0 + X;
So, instead of for loop I'm using cumsum function with smart post-processing.
N.B. This solution works significantly slower than Colin T Bowers's one for tight bounds (Amplitude < 5), but for loose bounds (Amplitude > 20) it works much faster.

How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesnt include upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit : Following the comment:
You can use the second output parameter of histc
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = >sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
How About this
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y+ X>=q(i);
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you shoud use histc
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 of them are spent computing the quantiles - I can't expect to improve this part of the routine (I assume that Mathworks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(#ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
q(end,:) = 2 * q(end,:);
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in codes outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and StackExchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
You can sort the columns and divide+round the inverse indexes:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.