How to sum all other rows in MATLAB

I am still learning some of the advanced features in MATLAB.
I have a 2D matrix and I want to sum all rows except for row i. For example, given
1 1 1
2 2 2
4 4 4
say i = 2, I want to get this:
5 5 5
I can do it by summing all the rows and then subtracting row i, but I want to know whether there is a faster way using MATLAB's indexing/selection syntax.

It seems that summing all the rows and then subtracting row i is much faster, though:
A=rand(500);
n = randi(500);
tic
for i=1:1e3
%sum(A([1:n-1 n+1:end], :));
sum(A)-A(n,:);
end
toc
Elapsed time is 0.162987 seconds.
A=rand(500);
n = randi(500);
tic
for i=1:1e3
sum(A([1:n-1 n+1:end], :));
end
toc
Elapsed time is 1.386113 seconds.

To add to the performance considerations of the previous authors: the solution by Nate is faster because it avoids the complex matrix indexing used in the second method. Complex matrix/vector indexing is very inefficient in MATLAB. I suspect this is the same indexing problem as the one described in the cited question.
Consider the following simple tests, following the previous framework:
A=rand(500);
n = randi(500);
tic
for i=1:1e3
B=sum(A(:, :));
end
toc
Elapsed time is 0.747704 seconds.
tic
for i=1:1e3
B=sum(A(1:end, :));
end
toc
Elapsed time is 5.476109 seconds. % What ???!!!
tic
id = [1:n-1 n+1:500];
for i=1:1e3
B=sum(A(id, :));
end
toc
Elapsed time is 5.449064 seconds.

Well, you could do it like this:
>> A = [ 1 1 1
2 2 2
4 4 4];
>> n = 2;
>> sum(A([1:n-1 n+1:end], :))
ans =
5 5 5
However, as Nate has already indicated, as nice as it may look, it's actually so much slower than just subtracting a single row that I advise against using it :)
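For reference, a minimal sketch of that recommended approach (assuming A is the matrix and i is the index of the row to exclude, as in the question):
B = sum(A, 1) - A(i, :);   % sum down the columns, then subtract row i's contribution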

Related

Optimization/vectorization of Matlab algorithm including very big matrices

I have an optimization problem in MATLAB. Assume I get the following three vectors as input:
A of size (1 x N) (time-series of signal amplitude)
F of size (1 x N) (time-series of signal instantaneous frequency)
fx of size (M x 1) (frequency axis that I want to match the above on)
Now, the elements of F will not (99% of the time) match the entries of fx exactly, which is why I have to match each one to the closest frequency.
Here's the catch: We are talking about big data. N can easily be up to 2 million, and this has to be run hundred times on several hundred subjects. My two concerns:
Time (main concern)
Memory (production will be run on machines with 16+ GB of memory, but development is on a machine with only 8 GB of memory)
I have these two working solutions. For the following, N=2604000 and M=201:
Method 1 (for-loop)
Simple for-loop. Memory is no problem at all, but it is time consuming. Easiest implementation.
tic;
I = zeros(M,N);
for i = 1:N
[~,f] = min(abs(fx-F(i)));
I(f,i) = A(i).^2;
end
toc;
Duration: 18.082 seconds.
Method 2 (vectorized)
The idea is to match the frequency axis with each instantaneous frequency, to get the id.
             F (1 x N):  [ 0.9  0.2  2.3  1.4 ]
fx (M x 1):  [ 0 ]       [  0    1    0    0  ]
             [ 1 ]       [  1    0    0    1  ]
             [ 2 ]       [  0    0    1    0  ]
And then multiply each column with the amplitude at that time.
tic;
m_Ff = repmat(F,M,1);
m_fF = repmat(fx,1,N);
[~,idx] = min(abs(m_Ff - m_fF)); clearvars m_Ff m_fF;
m_if = repmat(idx,M,1); clearvars idx;
m_fi = repmat((1:M)',1,N);
I = double(m_if==m_fi); clearvars m_if m_fi;
I = bsxfun(@times,I,A);
toc;
Duration: 64.223 seconds. This is surprising to me, but probably because the huge variable sizes and my limited memory force MATLAB to swap the variables to disk. I have an SSD, though.
The only thing I have not taken advantage of, is that the matrices will have many zero-elements. I will try and look into sparse matrices.
I need at least single precision for both the amplitudes and frequencies, but I found that it takes a lot of time to convert from double to single.
Any suggestions on how to improve?
UPDATE
Following the suggestions, I am now down to a combined time of 2.53 seconds. This takes advantage of the fact that fx is monotonically increasing and evenly spaced (always starting at 0). Here is the code:
tic; df = mode(diff(fx)); toc; % Find fx step size
tic; idx = round(F./df+1); toc; % Convert to bin ids
tic; I = zeros(M,N); toc; % Pre-allocate output
tic; lin_idx = idx + (0:N-1)*M; toc; % Find indices to insert at
tic; I(lin_idx) = A.^2; toc; % Insert
The timing outputs are the following:
Elapsed time is 0.000935 seconds.
Elapsed time is 0.021878 seconds.
Elapsed time is 0.175729 seconds.
Elapsed time is 0.018815 seconds.
Elapsed time is 2.294869 seconds.
Hence the most time-consuming step is now the very final one. Any advice on this is greatly appreciated. Thanks to @Peter and @Divakar for getting me this far.
UPDATE 2 (Solution)
Woohoo. Using sparse(i,j,k) really improves the outcome:
tic; df = fx(2)-fx(1); toc;
tic; idx = round(F./df+1); toc;
tic; I = sparse(idx,1:N,A.^2); toc;
With timings:
Elapsed time is 0.000006 seconds.
Elapsed time is 0.016213 seconds.
Elapsed time is 0.114768 seconds.
Here's one approach based on bsxfun -
abs_diff = abs(bsxfun(@minus,fx,F));
[~,idx] = min(abs_diff,[],1);
IOut = zeros(M,N);
lin_idx = idx + [0:N-1]*M;
IOut(lin_idx) = A.^2;
I'm not entirely following the relationship between F and fx, but it sounds like fx might be a set of frequency bins, and you want to find the appropriate bin for each input F.
Optimizing this depends on the characteristics of fx.
If fx is monotonic and evenly spaced, then you don't need to search it at all. You just need to scale and offset F to align the scales, then round to get the bin number.
If fx is monotonic (sorted) but not evenly spaced, you want histc. This will use an efficient search on the edges of fx to find the correct bin. You probably need to transform fx first so that it contains the edges of the bins rather than the centers.
If it's neither, then you should be able to at least sort it to get it monotonic, storing the sort order, and restoring the original order once you've found the correct "bin".
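For the monotonic-but-unevenly-spaced case, here is a minimal sketch of that histc idea (an illustration only; it assumes fx holds the bin centers and F, A, M, N are as in the question):
edges = [-Inf, (fx(1:end-1) + fx(2:end)).'/2, Inf]; % bin edges built from the centers in fx
[~, idx] = histc(F, edges);                         % idx(i) is the bin of F(i), in 1..M
I = zeros(M, N);
I(idx + (0:N-1)*M) = A.^2;                          % same insertion step as in the update above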

Correlation matrix ignoring NaN

I am using MATLAB and I have a 60 x 882 matrix, and I need to compute pairwise correlations between columns. However, I want to ignore all columns that contain one or more NaNs (i.e. the result for any pair of columns in which at least one entry is NaN should be NaN).
Here's my code so far:
for i=1:size(auxret,2)
    for j=1:size(auxret,2)
        rho(i,j) = corr(auxret(:,i), auxret(:,j));
    end
end
But this is extremely inefficient. I considered using the function:
corr(auxret, 'rows','pairwise');
But it didn't produce the same result (it ignores NaNs but still computes the correlation - so unless all entries of a column except one are NaN it will still give an output).
Any suggestions on how to improve efficiency?
To get the same output as your code while using corr(auxret, 'rows','pairwise'), the following does the job:
auxret(:,any(isnan(auxret))) = NaN;
r = corr(auxret, 'rows','pairwise');
What you describe is the default behavior of corr without any special options. For example,
auxret = [8 2 3
3 5 NaN
7 10 3
7 4 6
2 6 7];
rho = corr(auxret)
results in
rho =
1.0000 -0.1497 NaN
-0.1497 1.0000 NaN
NaN NaN NaN
This would be an efficient approach, especially when dealing with input data involving NaNs -
%// Get mask of invalid columns and thus extract columns without any NaN
mask = any(isnan(auxret),1);
A = auxret(:,~mask);
%// Use correlation formula to get correlation outputs for valid columns
n = size(A,1);
sum_cols = sum(A,1);
sumsq_sqcolsum = n*sum(A.^2,1) - sum_cols.^2;
val1 = n.*(A.'*A) - bsxfun(@times,sum_cols.',sum_cols);
val2 = sqrt(bsxfun(@times,sumsq_sqcolsum.',sumsq_sqcolsum));
valid_outvals = val1./val2;
%// Setup output array and store the valid outputs in it
ncols = size(auxret,2);
valid_idx = find(~mask);
out = nan(ncols);
out(valid_idx,valid_idx) = valid_outvals;
Basically, as the pre-processing step, it altogether removes all columns having one or more NaNs and calculates the correlation outputs for the remaining columns. Then we initialize an output array of NaNs of the appropriate size and put the valid outputs back into it at the appropriate places.
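As a quick sanity check (a sketch, assuming corr from the Statistics Toolbox is available), the formula-based result can be compared against corr on the NaN-free columns:
ref = corr(A);                                     %// built-in result on the valid columns A from above
max_abs_err = max(abs(valid_outvals(:) - ref(:)))  %// expected to be at round-off level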
Benchmarking
It seems the results are valid whether you stay with your loopy approach or go with the corr(auxret, 'rows','pairwise') option. But there is a big catch here: even a single NaN in any of the columns slows down the performance quite a bit, and this performance drop is huge with the original loopy approach and still big with the rows + pairwise option, as the benchmarking results next up will show.
Benchmarking code
nrows = 60;
ncols = 882;
percent_nans = 1; %// decides the percentage of NaNs in input
auxret = rand(nrows,ncols);
auxret(randperm(numel(auxret),round((percent_nans/100)*numel(auxret))))=nan;
disp('------------------------------- With Proposed Approach')
tic
%// Solution code from earlier
toc
disp('------------------------------- With ROWS + PAIRWISE Approach')
tic
auxret(:,any(isnan(auxret))) = NaN;
out1 = corr(auxret, 'rows','pairwise');
toc
disp('------------------------------- With Original Loopy Approach')
tic
out2 = zeros(size(auxret,2));
for i=1:size(auxret,2)
for j=1:size(auxret,2)
out2(i,j)=corr(auxret(:,i),auxret(:,j));
end
end
toc
So, there are a few possible cases based on the input data sizes and the percentage of NaNs, and correspondingly we have the runtime results -
Case 1: Input is 6 x 88 and Percentage of NaNs is 10
------------------------------- With Proposed Approach
Elapsed time is 0.006371 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 0.052563 seconds.
------------------------------- With Original Loopy Approach
Elapsed time is 0.875620 seconds.
Case 2: Input is 6 x 88 and Percentage of NaNs is 1
------------------------------- With Proposed Approach
Elapsed time is 0.006303 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 0.049194 seconds.
------------------------------- With Original Loopy Approach
Elapsed time is 0.871369 seconds.
Case 3: Input is 6 x 88 and Percentage of NaNs is 0.001
------------------------------- With Proposed Approach
Elapsed time is 0.006738 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 0.025754 seconds.
------------------------------- With Original Loopy Approach
Elapsed time is 0.867647 seconds.
Case 4: Input is 60 x 882 and Percentage of NaNs is 10
------------------------------- With Proposed Approach
Elapsed time is 0.007766 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 2.479645 seconds.
------------------------------- With Original Loopy Approach
...... Taken Too long ...
Case 5: Input is 60 x 882 and Percentage of NaNs is 1
------------------------------- With Proposed Approach
Elapsed time is 0.014144 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 2.324878 seconds.
------------------------------- With Original Loopy Approach
...... Taken Too long ...
Case 6: Input is 60 x 882 and Percentage of NaNs is 0.001
------------------------------- With Proposed Approach
Elapsed time is 0.020410 seconds.
------------------------------- With ROWS + PAIRWISE Approach
Elapsed time is 1.830632 seconds.
------------------------------- With Original Loopy Approach
...... Taken Too long ...

How to profile a vector outer product in matlab

During my MATLAB profiling, I noticed one line of code that consumes much more time than I imagined. Any idea how to make it faster?
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
X and Y are symmetric matrices of the same size (d x d), k is the index of a single row/column in Y, and ids_A is a vector of indices of all the other rows/columns (therefore Y(ids_A,k) is a column vector and Y(k,ids_A) is a row vector):
ids_A = setxor(1:d,k);
Thanks!
You can perhaps replace the outer product multiplication with a call to bsxfun:
X = Y(ids_A, ids_A) - (bsxfun(@times, Y(ids_A,k), Y(k,ids_A))/Y(k,k));
So how does the above code work? Let's take a look at the definition of the outer product when one vector is 4 elements and the other 3 elements:
u * v.' = [u1; u2; u3; u4] * [v1 v2 v3]
        = [u1*v1  u1*v2  u1*v3
           u2*v1  u2*v2  u2*v3
           u3*v1  u3*v2  u3*v3
           u4*v1  u4*v2  u4*v3]
(Source: Wikipedia)
As you can see, the outer product is created by element-wise products where the first vector u is replicated horizontally while the second vector v is replicated vertically. You then compute the element-wise products of each pair to produce your result. This is done elegantly with bsxfun:
bsxfun(@times, u, v.');
u would be a column vector and v.' would be a row vector. bsxfun naturally replicates the data to follow the above pattern, and then we use @times to perform the element-wise products.
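A tiny illustration (with made-up vectors, not from the question) that this reproduces the ordinary outer product u*v.':
u = (1:4).';                    % 4-element column vector
v = (1:3).';                    % 3-element column vector
P1 = u * v.';                   % classic outer product, 4 x 3
P2 = bsxfun(@times, u, v.');    % element-wise products with implicit replication
isequal(P1, P2)                 % returns true (logical 1)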
I am assuming your code to look something like this -
for k = 1:d
ids_A = setxor(1:d,k);
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
end
With the given code snippet, it's safe to assume that you are somehow using X within that loop. You can calculate all the X matrices as a pre-calculation step before the start of such a loop and these calculations could be performed as a vectorized approach.
Regarding the code snippet itself, it can be seen that you are "escaping" one index at each iteration with setxor. Now, if you are going with a vectorized approach, you can perform all those mathematical operations in one go and later on remove the elements that got incorporated in the vectorized approach but weren't intended. This really is the essence of the bsxfun-based vectorized approach listed next -
%// Perform all matrix-multiplications in one go with bsxfun and permute
mults = bsxfun(@times,permute(Y,[1 3 2]),permute(Y,[3 2 1]));
%// Scale those with diagonal elements from Y and get X for every iteration
scaledvals = bsxfun(@rdivide,mults,permute(Y(1:d+1:end),[1 3 2]));
X_vectorized = bsxfun(@minus,Y,scaledvals);
%// Find row and column indices as linear indices to be removed from X_all
row_idx = bsxfun(@plus,[0:d-1]*d+1,[0:d-1]'*(d*d+1));
col_idx = bsxfun(@plus,[1:d]',[0:d-1]*(d*(d+1)));
%// Remove those "setxored" indices and then reshape to expected size
X_vectorized([row_idx col_idx])=[];
X_vectorized = reshape(X_vectorized,d-1,d-1,d);
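A quick sanity check one might run (a sketch; it assumes Y is d x d and X_vectorized was computed as above), comparing each slice against the original loop expression:
max_err = 0;
for k = 1:d
    ids_A = setxor(1:d, k);
    X_loop = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
    max_err = max(max_err, max(abs(X_loop(:) - reshape(X_vectorized(:,:,k), [], 1))));
end
max_err %// expected to be at round-off level if the two approaches agree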
Benchmarking
Benchmarking Code
d = 50; %// Datasize
Y = rand(d,d); %// Create random input
num_iter = 100; %// Number of iterations to be run for each approach
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('------------------------------ With original loopy approach')
tic
for iter = 1:num_iter
for k = 1:d
ids_A = setxor(1:d,k);
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
end
end
toc
clear X k ids_A
disp('------------------------------ With proposed vectorized approach')
tic
for iter = 1:num_iter
mults = bsxfun(@times,permute(Y,[1 3 2]),permute(Y,[3 2 1]));
scaledvals = bsxfun(@rdivide,mults,permute(Y(1:d+1:end),[1 3 2]));
X_vectorized = bsxfun(@minus,Y,scaledvals);
row_idx = bsxfun(@plus,[0:d-1]*d+1,[0:d-1]'*(d*d+1));
col_idx = bsxfun(@plus,[1:d]',[0:d-1]*(d*(d+1)));
X_vectorized([row_idx col_idx])=[];
X_vectorized = reshape(X_vectorized,d-1,d-1,d);
end
toc
Results
Case #1: d = 50
------------------------------ With original loopy approach
Elapsed time is 0.849518 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 0.154395 seconds.
Case #2: d = 100
------------------------------ With original loopy approach
Elapsed time is 2.079886 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 2.285884 seconds.
Case #3: d = 200
------------------------------ With original loopy approach
Elapsed time is 7.592865 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 19.012421 seconds.
Conclusions
One can easily notice that the proposed vectorized approach might be a better choice when dealing with matrices of sizes up to about 100 x 100, beyond which the memory-hungry bsxfun slows us down.

How can I vectorize code that runs a function on subsets of a larger matrix?

Let's assume I have the following 9 x 5 matrix:
myArray = [
54.7 8.1 81.7 55.0 22.5
29.6 92.9 79.4 62.2 17.0
74.4 77.5 64.4 58.7 22.7
18.8 48.6 37.8 20.7 43.5
68.6 43.5 81.1 30.1 31.1
18.3 44.6 53.2 47.0 92.3
36.8 30.6 35.0 23.0 43.0
62.5 50.8 93.9 84.4 18.4
78.0 51.0 87.5 19.4 90.4
];
I have 11 "subsets" of this matrix and I need to run a function (let's say max) on each of these subsets. The subsets can be identified with the following matirx of logicals (identified column-wise, not row-wise):
myLogicals = logical([
0 1 0 1 1
1 1 0 1 1
1 1 0 0 0
0 1 0 1 1
1 0 1 1 1
1 1 1 1 0
0 1 1 0 1
1 1 0 0 1
1 1 0 0 1
]);
or via linear indexing:
starts = [2 5 8 10 15 23 28 31 37 40 43]; %index start of each subset
ends = [3 6 9 13 18 25 29 33 38 41 45]; %index end of each subset
such that the first subset is 2:3, the second is 5:6, and so on.
I can find the max of each subset and store it in a vector as follows:
finalAnswers = NaN(11,1);
for n=1:length(starts) %i.e. 1 through the number of subsets
finalAnswers(n) = max(myArray(starts(n):ends(n)));
end
After the loop runs, finalAnswers contains the maximum value of each of the data subsets:
74.4 68.6 78.0 92.9 51.0 81.1 62.2 47.0 22.5 43.5 90.4
Is it possible to obtain the same result without the use of a for loop? In other words, can this code be vectorized? Would such an approach be more efficient than the current one?
EDIT:
I did some testing of the proposed solutions. The data I used was a 1,510 x 2,185 matrix with 10,103 subsets that varied in length from 2 to 916 with a standard deviation of subset length of 101.92.
I wrapped each solution in tic;for k=1:1000 [code here] end; toc; and here are the results:
for loop approach --- Elapsed time is 16.237400 seconds.
Shai's approach --- Elapsed time is 153.707076 seconds.
Dan's approach --- Elapsed time is 44.774121 seconds.
Divakar's approach #2 --- Elapsed time is 127.621515 seconds.
Notes:
I also tried benchmarking Dan's approach by wrapping the k=1:1000 for loop around just the accumarray line (since the rest could theoretically be run just once). In this case the time was 28.29 seconds.
Benchmarking Shai's approach while leaving the lb = ... line out of the k loop, the time was 113.48 seconds.
When I ran Divakar's code, I got "Non-singleton dimensions of the two input arrays must match each other." errors for the bsxfun lines. I "fixed" this by using conjugate transposition (the apostrophe operator ') on trade_starts(1:starts_extent) and intv(1:starts_extent) in the lines of code calling bsxfun. I'm not sure why this error was occurring...
I'm not sure if my benchmarking setup is correct, but it appears that the for loop actually runs the fastest in this case.
One approach is to use accumarray. Unfortunately in order to do that we first need to "label" your logical matrix. Here is a convoluted way of doing that if you don't have the image processing toolbox:
sz=size(myLogicals);
s_ind(sz(1),sz(2))=0;
%// OR: s_ind = zeros(size(myLogicals))
s_ind(starts) = 1;
labelled = cumsum(s_ind(:)).*myLogicals(:);
So that just does what Shai's bwlabeln implementation does (but labelled will be numel(myLogicals)-by-1 in shape as opposed to size(myLogicals) in shape)
Now you can use accumarray:
accumarray(labelled(myLogicals), myArray(myLogicals), [], @max)
or else it may be faster to try
result = accumarray(labelled+1, myArray(:), [], @max);
result = result(2:end)
This is fully vectorized, but is it worth it? You'll have to do speed tests against your loop solution to know.
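For example, a rough timing sketch in the same tic/toc style used elsewhere in this thread (it assumes labelled, myLogicals, myArray, starts and ends as defined above):
tic
for k = 1:1000
    r1 = accumarray(labelled(myLogicals), myArray(myLogicals), [], @max);  % accumarray version
end
toc
tic
for k = 1:1000
    r2 = NaN(numel(starts), 1);
    for n = 1:numel(starts)
        r2(n) = max(myArray(starts(n):ends(n)));  % original loop version
    end
end
toc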
Use bwlabeln with a vertical connectivity:
lb = bwlabeln( myLogicals, [0 1 0; 0 1 0; 0 1 0] );
Now you have a label 1..11 for each region.
To get max value you can use regionprops
props = regionprops( lb, myArray, 'MaxIntensity' );
finalAnswers = [props.MaxIntensity];
You can use regionprops to get some other properties of each subset, but it is not too general.
If you wish to apply a more general function to each region, e.g., median, you can use accumarray:
finalAnswer = accumarray( lb( myLogicals ), myArray( myLogicals ), [], @median );
Ideas behind vectorization and optimization
One of the approaches one can employ to vectorize this problem is to convert the subsets into regular shaped blocks and then find the max of the elements of those blocks in one go. Converting to regular shaped blocks has one issue here: the subsets are unequal in length. To avoid this issue, one can create a 2D matrix of indices starting from each of the starts elements and extending up to the maximum of the subset lengths. The good thing about this is that it allows vectorization, but at the cost of more memory, which depends on how scattered the subset lengths are.
Another issue with this vectorization technique is that it could potentially create out-of-bounds indices for the final subsets. To avoid this, one can think of two possible ways -
1. Use a bigger input array, extended such that the maximum of the subset lengths plus the starts indices still lie within the confines of the extended array.
2. Use the original input array for those starts that stay within the limits of the original input array, and then fall back to the original loop code for the rest of the subsets. We can call this mixed programming, just for the sake of having a short title. This saves the memory needed for creating the extended array discussed in the first approach.
These two ways/approaches are listed next.
Approach #1: Vectorized technique
[m,n] = size(myArray); %// store no. of rows and columns in input array
intv = ends-starts; %// intervals
max_intv = max(intv); %// max interval
max_intv_arr = [0:max_intv]'; %// array of max indices extent
[row1,col1] = ind2sub([m n],starts); %// get starts row and column indices
m_ext = max(row1+max_intv); %// no. of rows in extended input array
myArrayExt(m_ext,n)=0; %// extended form of input array
myArrayExt(1:m,:) = myArray;
%// New linear indices for extended form of input array
idx = bsxfun(@plus,max_intv_arr,(col1-1)*m_ext+row1);
%// Index into extended array; select only valid ones by setting rest to nans
selected_ele = myArrayExt(idx);
selected_ele(bsxfun(@gt,max_intv_arr,intv))= nan;
%// Get the max of the valid ones for the desired output
out = nanmax(selected_ele); %// desired output
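As a quick check (a sketch, assuming the question's example data), the vectorized output can be compared against the loop result finalAnswers computed earlier:
max(abs(out(:) - finalAnswers)) %// expected to be 0 if both approaches agree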
Approach #2: Mixed programming
%// PART - I: Vectorized technique for subsets that when normalized
%// with max extents still lie within limits of input array
intv = ends-starts; %// intervals
max_intv = max(intv); %// max interval
%// Find the last subset that when extended by max interval would still
%// lie within the limits of input array
starts_extent = find(starts+max_intv<=numel(myArray),1,'last');
max_intv_arr = [0:max_intv]'; %// Array of max indices extent
%// Index into extended array; select only valid ones by setting rest to nans
selected_ele = myArray(bsxfun(@plus,max_intv_arr,starts(1:starts_extent)));
selected_ele(bsxfun(@gt,max_intv_arr,intv(1:starts_extent))) = nan;
out(numel(starts)) = 0; %// storage for output
out(1:starts_extent) = nanmax(selected_ele); %// output values for part-I
%// PART - II: Process rest of input array elements
for n = starts_extent+1:numel(starts)
out(n) = max(myArray(starts(n):ends(n)));
end
Benchmarking
In this section we will compare the two approaches and the original loop code against each other for performance. Let's set up the code before starting the actual benchmarking -
N = 10000; %// No. of subsets
M1 = 1510; %// No. of rows in input array
M2 = 2185; %// No. of cols in input array
myArray = rand(M1,M2); %// Input array
num_runs = 50; %// no. of runs for each method
%// Form the starts and ends by getting a sorted random integers array from
%// 1 to one minus no. of elements in input array. That minus one is
%// compensated later on into ends because we don't want any subset with
%// starts and ends as the same index
y1 = reshape(sort(randi(numel(myArray)-1,1,2*N)),2,[]);
starts = y1(1,:);
ends = y1(2,:)+1;
%// Remove identical starts elements
invalid = [false any(diff(starts,[],2)==0,1)];
starts = starts(~invalid);
ends = ends(~invalid);
%// Create myLogicals
myLogicals = false(size(myArray));
for k1=1:numel(starts)
myLogicals(starts(k1):ends(k1))=1;
end
clear invalid y1 k1 M1 M2 N %// clear unnecessary variables
%// Warm up tic/toc.
for k = 1:100
tic(); elapsed = toc();
end
Now, the skeleton code that gets us the runtimes -
disp('---------------------- With Original loop code')
tic
for iter = 1:num_runs
%// ...... original loop code goes here
end
toc
%// clear out variables used in the above approach
%// repeat this for approach #1,2
Benchmark Results
In your comments, you mentioned using a 1510 x 2185 matrix, so let's do two case runs with that size and subsets of 10000 and 2000, plus a third case with a bigger input.
Case 1 [Input - 1510 x 2185 matrix, Subsets - 10000]
---------------------- With Original loop code
Elapsed time is 15.625212 seconds.
---------------------- With Approach #1
Elapsed time is 12.102567 seconds.
---------------------- With Approach #2
Elapsed time is 0.983978 seconds.
Case 2 [Input - 1510 x 2185 matrix, Subsets - 2000]
---------------------- With Original loop code
Elapsed time is 3.045402 seconds.
---------------------- With Approach #1
Elapsed time is 11.349107 seconds.
---------------------- With Approach #2
Elapsed time is 0.214744 seconds.
Case 3 [Bigger Input - 3000 x 3000 matrix, Subsets - 20000]
---------------------- With Original loop code
Elapsed time is 12.388061 seconds.
---------------------- With Approach #1
Elapsed time is 12.545292 seconds.
---------------------- With Approach #2
Elapsed time is 0.782096 seconds.
Note that the number of runs num_runs was varied to keep the runtime of the fastest approach close to 1 sec.
Conclusions
So, I guess the mixed programming (approach #2) is the way to go! As future work, one could incorporate the standard deviation of the subset lengths into the scattered-ness criterion if performance suffers because of it, and offload the work for the most scattered subsets (in terms of their lengths) to the loop code.
Efficiency
Measure both the vectorised and for-loop code samples on your respective platform (be it a <localhost> or Cloud-based) to see the difference:
MATLAB:7> tic();max( myArray( startIndex(:):endIndex(:) ) );toc() %% Details
Elapsed time is 0.0312 seconds. %% below.
%% Code is not
%% the merit,
%% method is:
and
tic(); %% for/loop
for n = 1:length( startIndex ) %% may be
max( myArray( startIndex(n):endIndex(n) ) ); %% significantly
end %% faster than
toc(); %% vectorised
Elapsed time is 0.125 seconds. %% setup(s)
%% overhead(s)
%% As commented below,
%% subsequent re-runs yield unrealistic results due to caching artifacts
Elapsed time is 0 seconds.
Elapsed time is 0 seconds.
Elapsed time is 0 seconds.
%% which are not so straight visible if encapsulated in an artificial in-vitro
%% via an outer re-run repetitions ( for k=1:1000 ) et al ( ref. in text below )
For a better interpretation of the test results, test on much larger sizes rather than just on a few tens of rows/columns.
EDIT:
An erroneous piece of code was removed, thanks Dan for the notice. More attention has been paid to emphasizing quantitative validation; the assumption that vectorised code may, but need not in all circumstances, be faster is of course no excuse for faulty code.
Output - quantitatively comparative data:
While recommended, it is IMHO not fair to assume that the memalloc and similar overheads should be excluded from the in-vivo testing. Test re-runs typically show VM-page-hit improvements and other caching artifacts, while the raw 1st "virgin" run is what typically appears in real code deployment (excl. external iterators, for sure). So consider the results with care and retest in your real environment (sometimes being run as a Virtual Machine inside a bigger system -- which also makes it necessary to take VM-swap mechanics into account once huge matrices start hurting real-life memory-access patterns).
On other projects I am used to using [usec] granularity for real-time test timing, but then more care has to be taken about the test-execution conditions and the O/S background load.
So nothing but testing gives relevant answers for your specific code/deployment situation; however, be methodical and compare only data that are comparable in principle.
Alarik's code:
MATLAB:8> tic(); for k=1:1000 % ( flattens memalloc issues & al )
> for n = 1:length( startIndex )
> max( myArray( startIndex(n):endIndex(n) ) );
> end;
> end; toc()
Elapsed time is 0.2344 seconds.
%% time is 0.0002 seconds per k-for-loop <--[ ref.^ remarks on testing ]
Dan's code:
MATLAB:9> tic(); for k=1:1000
> s_ind( size( myLogicals ) ) = 0;
> s_ind( startIndex ) = 1;
> labelled = cumsum( s_ind(:) ).*myLogicals(:);
> result = accumarray( labelled + 1, myArray(:), [], @max );
> end; toc()
error: product: nonconformant arguments (op1 is 43x1, op2 is 45x1)
%%
%% [Work in progress] to find my mistake -- sorry for not being able to reproduce
%% Dan's code and to make it work
%%
%% Both myArray and myLogicals shape was correct ( 9 x 5 )

Number of values greater than a threshold

I have a matrix A. Now I want to find the number of elements greater than 5 and their corresponding indices. How can I solve this in MATLAB without using a for loop?
For example if A = [1 4 6 8 9 5 6 8 9]':
Number of elements > 5: 6
Indices: [3 4 5 7 8 9]
You use find:
index = find(A>5);
numberOfElements = length(index);
You use sum, which allows you to get the number of elements with one command:
numberOfElements = sum(A>5);
Do you really need explicit indices? Because the logical matrix A>5 can also be used as an index (usually a tad more efficient than indexing with find):
index = (A>5);
numberOfElements = sum(index);
For completeness: indexing with logicals is the same as with regular indices:
>> A(A>5)
ans =
6 8 9 6 8 9
Motivated by the above discussion with Rody, here is a simple benchmark which tests the speed of integer vs. logical array indexing in MATLAB. Quite an important thing, I would say, since 'vectorized' MATLAB is mostly about indexing. So:
% random data
a = rand(10^7, 1);
% threshold - how much data meets the a>threshold criterion.
% This determines the total indexing time - the more data we extract from a,
% the longer it takes.
% In this example a small threshold means most of the data in a
% will meet the criterion.
threshold = 0.08;
% prepare logical and integer indices (note the uint32 cast)
index_logical = a>threshold;
index_integer = uint32(find(index_logical));
% logical indexing of a
tic
for i=1:10
b = a(index_logical);
end
toc
% integer indexing of a
tic
for i=1:10
b = a(index_integer);
end
toc
On my computer the results are
Elapsed time is 0.755399 seconds.
Elapsed time is 0.728462 seconds.
meaning that the two methods perform almost the same - that's how I chose the example threshold. It is interesting, because the index_integer array is almost 4 times larger!
index_integer 9198678x1 36794712 uint32
index_logical 10000000x1 10000000 logical
For larger values of the threshold, integer indexing is faster. Results for threshold=0.5:
Elapsed time is 0.687044 seconds. (logical)
Elapsed time is 0.296044 seconds. (integer)
Unless I am doing something wrong here, integer indexing seems to be the fastest most of the time.
Including the creation of the indices in the test yields very different results however:
a = rand(1e7, 1);
threshold = 0.5;
% logical
tic
for i=1:10
inds = a>threshold;
b = a(inds);
end
toc
% double
tic
for i=1:10
inds = find(a>threshold);
b = a(inds);
end
toc
% integer
tic
for i=1:10
inds = uint32(find(a>threshold));
b = a(inds);
end
toc
Results (Rody):
Elapsed time is 1.945478 seconds. (logical)
Elapsed time is 3.233831 seconds. (double)
Elapsed time is 3.508009 seconds. (integer)
Results (angainor):
Elapsed time is 1.440018 seconds. (logical)
Elapsed time is 1.851225 seconds. (double)
Elapsed time is 1.726806 seconds. (integer)
So it would seem that the actual indexing is faster when indexing with integers, but front-to-back, logical indexing performs much better.
The runtime difference between the last two methods is unexpected though -- it seems MATLAB's internals either do not cast the doubles to integers, or perform error-checking on each element, before doing the actual indexing. Otherwise, we would have seen virtually no difference between the double and integer methods.
Edit: There are two options as I see it:
MATLAB converts double indices to uint32 indices explicitly before the indexing call (much like we do in the integer test)
MATLAB passes the doubles and performs the double->int cast on the fly during the indexing call
The second option should be faster, because we only have to read the double indices once. In our explicit conversion test we have to read the double indices, write the integer indices, and then read the integer indices again during the actual indexing. So MATLAB should be faster... Why is it not?
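One way to probe this further (a small sketch of my own, not part of the original tests) is to time the explicit cast separately from the indexing itself:
a = rand(1e7, 1);
inds_double = find(a > 0.5);              % double indices
tic; inds_int = uint32(inds_double); toc  % cost of the explicit cast alone
tic; b = a(inds_double); toc              % indexing with double indices
tic; b = a(inds_int);    toc              % indexing with uint32 indices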