Performace Issue on MATLAB's vector addressing - matlab

I'm wondering, what is faster for addressing a single Element of a vector:
1) direct access via
result = a(index)
or
2) access an element via a matrix multiplication e.g
a = [1 2 3 4]';
b = [0 0 1 0];
result = b*a; % Would return 3
In my oppinion (which comes from "classic" programming like C++) the first method must be more performant, because of the direct access...the second method would need a iteration through both vectors(?).
The reason why I'm asking is, that matlab is very performant on matrix and vector operations, maybe I am missing any aspect and the second method is more effective...

A quick test:
function [] = fun1()
a = [1 2 3 4]';
b = [0 0 1 0];
tic;
for i=1:1000000
r = a(3);
end
toc;
end
Elapsed time: 0.006 seconds
Change a(3) to b*a
Elapsed time: 0.9 seconds
The performance difference is quite obvious(, and you should have done that yourself before asking this question).
Reason behind that:
No matter how efficient MATLAB's calculation is, MATLAB still needs to fetch the number 1 by 1, and do multiplication 1 by 1, and sum up. There is no hope to be faster than a single access.
In your special case, there are all 0's except 1, but it is useless to do optimization for single special case in my opinion, and the best optimization I can come up with still needs to access all the elements for at least once each.
EDIT:
It seems I am in quite good mood today....
Change a(3) to a(1)*b(1)+a(2)*b(2)+a(3)*b(3)+a(4)*b(4)
Elapsed time: 0.02 seconds
It seems that boundary checking (and/or other errands) take more time than the access and calculation.

Why would you think that multiplying a lot of numbers by zeros would be at all efficient? Even if MATLAB could be smart enough to do a test first before the multiply, it must then still do many tests.
I'm asking this question to make a point, that the dot product cannot possibly be at all efficient. Even if MATLAB were smart enough to know that there was only one element that was non-zero, to know that, it would need to do a search for the non-zero element. And how would MATLAB be smart enough to know that what you have written as a vector*vector dot product is actually intended just to access a single element, instead of a true dot product for nefarious purposes unknown to it?

How about
3) access an element by a boolean index matrix:
a = [1 2 3 4]';
b = [0 0 1 0];
result = a(b)
It's almost certainly going to be faster than (2), slower than (1).

Related

Optimize a for loop in matlab

This is my code:
variables=1000;
t=20;
x=zeros(t,t,3);
y=rand(variables,3);
z=rand(t,t,variables);
e=rand(variables,1);
for c=1:variables
x(:,:,1)=x(:,:,1)+y(c,1).*((z(:,:,c)-e(c)).^2);
x(:,:,2)=x(:,:,2)+y(c,2).*((z(:,:,c)-e(c)).^2);
x(:,:,3)=x(:,:,3)+y(c,3).*((z(:,:,c)-e(c)).^2);
end
How can I improve calculation speed on this loop? I think that the problem is the for loop with a large c.
It's a myth, but alas a persistent one, that loops are slow in MATLAB. As you've written your for loop, it goes sequentially through the last dimension of your variables. That pretty much translates to a FORTRAN loop directly, leaving little room for improvement using vectorisation. The below does vectorise your output as much as possible, but doesn't improve performance much, even though reshape() is almost free, and severely degrades readability.
In each iteration, all you're doing is calculating y(c,1).*((z(:,:,c)-e(c)).^2), which is added to the total. If we are able to vectorise that expression, we can sum over the dimension of c to get rid of the loop.
z(:,:,c)-e(c) can be vectorised by adding two singleton dimensions to e: reshape(e, [1 1 numel(e)]), then subtract and power by 2 as usual.
Multiplication by y(c,1) also works, if we add two singleton dimensions to y(:,1):, reshape(y(:,1), [1 1 numel(e)]), then multiply again as usual.
Finally, we just need to sum over our 3rd dimension and we end up with our t -by- t result: sum(tmp2, 3).
All that's left are the hardcoded three dimensions in x, which I've left be in a loop.
The working code on R2007b:
variables=10;
t=2;
x=zeros(t,t,3);
y=rand(variables,3);
z=rand(t,t,variables);
e=rand(variables,1);
for ii = 1:size(x, 3)
x(:, :, ii) = sum(bsxfun(#times, reshape(y(:,1), [1 1 numel(e)]), bsxfun(#minus, z, reshape(e, [1 1 numel(e)])).^2), 3);
end
I wasn't sure what to do with the hardcoded dimension of 3, so I just left a loop over that. The rest is vectorised away, thanks to a few reshape() calls to arrange the dimensions for the bsxfun() expansion.
Code for >R2016b with implicit expansion:
for ii = 1:size(x, 3)
x(:, :, ii) = sum(reshape(y(:,ii), [1 1 numel(e)]) .* (z - reshape(e, [1 1 numel(e)])).^2, 3)
end
A quick timing comparison shows that this is roughly 2x faster than your original loop:
Elapsed time is 0.780516 seconds. Original code
Elapsed time is 0.397369 seconds. My bsxfun() solution
Elapsed time is 0.305160 seconds. My implicit expansion
Note that in the above a 100 loops were ran for each code version, i.e. timings are 8ms, 4ms and 3ms per version.
For an introduction to reshape() you can refer to this answer of mine.
The documentation article on implicit broadcasting is rather good, as is this blog.

Comparing columns of matrix and giving boolean output

I have checked other questions. I didn't find my answer. I have a matrix of n * 2 size. I want to compare the 1st and 2nd column and based on which is greater I want to assign 0/1 to the respective index. Suppose I want an output as
a = 1 2
4 3
7 8
I want the output like this
out = 0 1
1 0
0 1
I did this :
o1 = a(:,1) > a (:,2)
o2 = not(o1)
out = [o1, o2]
This does the job but I am sure there's a better way to do this. Need suggestions on that/.
Forgot to mention, the datatype is float in the matrix.
A more generic solution that can handle matrices with more than two columns:
out = bsxfun(#eq, a, max(a,[],2));
What you did is good. The number of lines doesn't really matter, what matters is the complexity of the operation in each line. Following the comments, I think you could gain some time as well by avoiding copy and multiple allocations:
out = false(size(a)); out(:,1) = (a(:,1) > a(:,2)); out(:,2) = ~out(:,1);
It is good practice to preallocate in Matlab, and in general to avoid copies in any programming language.
Optimizing further the runtime of this by using different operations is pointless IMO. If you really need speed you could Mex it to spare one iteration through the rows (second assignment), it's literally a dozen C lines, although you'd have to be careful about how you write the loop (the naive way would cause cache-miss at each iteration).

Optimization with discrete parameters in Matlab

I have 12 sets of vectors (about 10-20 vectors each) and i want to pick one vector of each set so that a function f that takes the sum of these vectors as argument is maximized. In addition i have constraints for some components of that sum.
Example:
a_1 = [3 2 0 5], a_2 = [3 0 0 2], a_3 = [6 0 1 1], ... , a_20 = [2 12 4 3]
b_1 = [4 0 4 -2], b_2 = [0 0 1 0], b_3 = [2 0 0 4], ... , b_16 = [0 9 2 3]
...
l_1 = [4 0 2 0], l_2 = [0 1 -2 0], l_3 = [4 4 0 1], ... , l_19 = [3 0 9 0]
s = [s_1 s_2 s_3 s_4] = a_x + b_y + ... + l_z
Constraints:
s_1 > 40
s_2 < 100
s_4 > -20
Target: Chose x, y, ... , z to maximize f(s):
f(s) -> max
Where f is a nonlinear function that takes the vector s and returns a scalar.
Bruteforcing takes too long because there are about 5.9 trillion combinations, and since i need the maximum (or even better the top 10 combinations) i can not use any of the greedy algorithms that came to my mind.
The vectors are quite sparse, about 70-90% are zeros. If that is helping somehow ...?
The Matlab Optimization toolbox didnt help either since it doesnt much support for discrete optimization.
Basically this is a lock-picking problem, where the lock's pins have 20 distinct positions, and there are 12 pins. Also:
some of the pin's positions will be blocked, depending on the positions of all the other pins.
Depending on the specifics of the lock, there may be multiple keys that fit
...interesting!
Based on Rasman's approach and Phpdna's comment, and the assumption that you are using int8 as data type, under the given constraints there are
>> d = double(intmax('int8'));
>> (d-40) * (d+100) * (d+20) * 2*d
ans =
737388162
possible vectors s (give or take a few, haven't thought about +1's etc.). ~740 million evaluations of your relatively simple f(s) shouldn't take more than 2 seconds, and having found all s that maximize f(s), you are left with the problem of finding linear combinations in your vector set that add up to one of those solutions s.
Of course, this finding of combinations is no easy feat, and the whole method breaks down anyway if you are dealing with
int16: ans = 2.311325368800510e+018
int32: ans = 4.253529737045237e+037
int64: ans = 1.447401115466452e+076
So, I'll discuss a more direct and more general approach here.
Since we're talking integers and a fairly large search space, I'd suggest using a branch-and-bound algorithm. But unlike the bintprog algorithm, you'd have to use different branching strategies, and of course, these should be based on a non-linear objective function.
Unfortunately, there is nothing like this in the optimization toolbox (or the File Exchange as far as I could find). fmincon is a no-go, since it uses gradient and Hessian information (which will usually be all-zero for integers), and fminsearch is a no-go, since you'll need a really good initial estimate, and the rate of convergence is (roughly) O(N), meaning, for this 20-dimensional problem you'll have to wait quite long before convergence, without the guarantee of having found the global solution.
An interval method could be a possibility, however, I personally have very little experience with this. There is no native interval-related stuff in MATLAB or any of its toolboxes, but there's the freely available INTLAB.
So, if you're not feeling like implementing your own non-linear binary integer programming algorithm, or are not in the mood for an adventure with INTLAB, there's really only one thing left: heuristic methods. In this link there is a similar situation, with an outline of the solution: use the genetic algorithm (ga) from the Global Optimization toolbox.
I would implement the problem roughly like so:
function [sol, fval, exitflag] = bintprog_nonlinear()
%// insert your data here
%// Any sparsity you may have here will only make this more
%// *memory* efficient, not *computationally*
data = [...
... %// this will be an array with size 4-by-20-by-12
... %// (or some permutation of that you find more intuitive)
];
%// offsets into the 3D array to facilitate indexing a bit
offsets = bsxfun(#plus, ...
repmat(1:size(data,1), size(data,3),1), ...
(0:size(data,3)-1)' * size(data,1)*size(data,2)); %//'
%// your objective function
function val = obj(X)
%// limit "X" to integers in [1 20]
X = min(max(round(X),1),size(data,3));
%// "X" will be a collection of 12 integers between 0 and 20, which are
%// indices into the data matrix
%// form "s" from "X"
s = sum(bsxfun(#plus, offsets, X*size(data,1) - size(data,1)));
%// XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX
%// Compute the NEGATIVE VALUE of your function here
%// XxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX
end
%// your "non-linear" constraint function
function [C, Ceq] = nonlcon(X)
%// limit "X" to integers in [1 20]
X = min(max(round(X),1),size(data,3));
%// form "s" from "X"
s = sum(bsxfun(#plus, offsets, X(:)*size(data,1) - size(data,1)));
%// we have no equality constraints
Ceq = [];
%// Compute inequality constraints
%// NOTE: solver is trying to solve C <= 0, so:
C = [...
40 - s(1)
s(2) - 100
-20 - s(4)
];
end
%// useful GA options
options = gaoptimset(...
'UseParallel', 'always'...
...
);
%// The rest really depends on the specifics of the problem.
%// Useful to look at will be at least 'TolCon', 'Vectorized', and of course,
%// 'PopulationType', 'Generations', etc.
%// THE OPTIMZIATION
[sol, fval, exitflag] = ga(...
#obj, size(data,3), ... %// objective function, taking a vector of 20 values
[],[], [],[], ... %// no linear (in)equality constraints
1,size(data,2), ... %// lower and upper limits
#nonlcon, options); %// your "nonlinear" constraints
end
Note that even though your constraints are essentially linear, the way by which you must compute the value for your s necessitates the use of a custom constraint function (nonlcon).
Especially note that this is currently (probably) a sub-optimal way to use ga -- I don't know the specifics of your objective function, so a lot more may be possible. For instance, I currently use a simple round() to convert the input X to integers, but using 'PopulationType', 'custom' (with a custom 'CreationFcn', 'MutationFcn' etc.) might produce better results. Also, 'Vectorized' will likely speed things up a lot, but I don't know whether your function is easily vectorized.
And yes, I use nested functions (I just love those things!); it prevents these huge, usually identical lists of input arguments if you use sub-functions or stand-alone functions, and they can really be a performance boost because there is little copying of data. But, I realize that their scoping rules make them somewhat akin to goto constructs, and so they are -ahum- "not everyone's cup of tea"...you might want to convert them to sub-functions to prevent long and useless discussions with your co-workers :)
Anyway, this should be a good place to start. Let me know if this is useful at all.
Unless you define some intelligence on how the vector sets are organized, there will be no intelligent way of solving your problem other then pure brute force.
Say you find s s.t. f(s) is max given constraints of s, you still need to figure out how to build s with twelve 4-element vectors (an overdetermined system if there ever was one), where each vector has 20 possible values. Sparsity may help, although I'm not sure how it is possible to have a vector with four elements be 70-90% zero, and sparsity would only be useful if there was some yet to be described methodology in how the vector are organized
So I'm not saying you can't solve the problem, I'm saying you need to rethink how the problem is set-up.
I know, this answer is reaching you really late.
Unfortunately, the problem, as is, show not many patterns to be exploited, besides of brute force -Branch&Bound, Master& Slave, etc.- Trying a Master Slave approach -i.e. solving first the function continuous nonlinear problem as master, and solving the discrete selection as slave could help, but with as many combinations, and without any more information over the vectors, there is not too much space for work.
But based on the given continuous almost everywhere functions, based on combinations of sums and multiplication operators and their inverses, the sparsity is a clear point to be exploited here. If 70-90% of vectors are zero, almost a good part of the solution space will be close to zero, or close to infinite. Hence a 80-20 pseudo solution would discard easily the 'zero' combinations, and use only the 'infinite' ones.
This way, the brute-force could be guided.

MATLAB: Create smaller matrix without "side NaN" columns

I'm working with a 30*26000 size matrix that has NaNs at the beginning and at the end. NaNs are also sprinkled throughout each row. I can fill in the NaNs with linear interpolation but that will leave NaNs at the beginning and end of each row. Extrapolating to replace these NaNs at the ends is not ideal for my data set.
I want to just trim the matrix. Take for example a 3 by 6 matrix:
NaN NaN 1 2 3 NaN
NaN 1 2 3 NaN NaN
1 NaN 2 3 4 5
Cut off the left most and right most columns such that no row begins or ends with a NaN.
1 2
2 3
2 3
So we are left with a 3 by 2 matrix.
How can I do this in Matlab? (speed-optimized; I will need to apply this to a million size matrix)
Thanks!
For your example you can do the following:
let a your matrix with NaN and numerical values.
ind1 = sum(isnan(a),1); % count the NaN values along columns
s = find(ind1 == 0, 1, 'first'); % find the first column without any NaN
e = find(ind1 == 0, 1, 'last'); % find the last column without any NaN
So now just keep this part of the matrix from s-th to e-th column:
b = a(:,s:e);
Additional check may be needed for the case no column is clear of NaNs.
Firstly, the vectorized solution of argyris will work perfectly well (+1). I'm only posting this because you emphasized that you wanted a speed optimized solution. Well, the downside of argyris solution is that the sum and isnan operation are performed on the entire matrix. This will be optimal if you have to come a long way in on either side to find the first non-NaN column. But what if you don't? A loop-based solution that exploits the fact that you may only need to come in a few columns may do better (particularly given how good the JIT accelerator is getting at executing single loops quickly). I've put together a speed test that includes both argyris and my solution:
%#Set up an example case using the matrix size you indicated in the question
T = 30;
N = 26000;
X = rand(T, N);
TrueL = 8;
TrueR = N - 8;
X(:, 1:TrueL) = NaN;
X(:, TrueR:end) = NaN;
%#argyris solution
tic
I1 = sum(isnan(X));
argL = find(I1 == 0, 1, 'first');
argR = find(I1 == 0, 1, 'last');
Soln1 = X(:, argL:argR);
toc
%#My loop based solution (faster if TrueL and TrueR are small)
tic
for n = 1:N
if ~any(isnan(X(:, n)))
break
end
end
ColinL = n;
for n = N:-1:1
if ~any(isnan(X(:, n)))
break
end
end
ColinR = n;
Soln2 = X(:, ColinL:ColinR);
toc
In the above example, the solution will need to get rid of the first 8 and last 8 columns. The outcome of the speed test?
Elapsed time is 0.002919 seconds. %#argyris solution
Elapsed time is 0.001007 seconds. %#My solution
The loop based solution is almost 3 times faster. Okay, now let's up the number of columns that we need to get rid of on either side to 100:
Elapsed time is 0.002769 seconds. %#argyris solution
Elapsed time is 0.001999 seconds. %#My solution
Still ahead. What about 1000 columns on either side?
Elapsed time is 0.003597 seconds. %#argyris solution
Elapsed time is 0.003719 seconds. %#My solution
So we've found our tipping point (on my machine at least - Quad core i7, Linux Mint v12, Matlab R2012b). Once we need to come in about 1000 columns on either side, we're better off using the vectorized solution.
One final note of CAUTION: If the routine is occurring inside another (possibly unrelated) loop, then speed comparisons should be re-done. This is because my solution will now involve a double loop. Even if the loops are unrelated, the JIT accelerator is not so good with double loops. I did some quick tests on my machine, and my solution still comes out ahead for small TrueL and TrueR (ie less than 100), but the advantage is not as large as it was when the outer loop was not present.
Anyway, hope this proves useful to you or anyone else who comes a-reading.
Cheers!
EDIT: I've done a few speed tests incorporating angainor's very neat one-liner (+1). It performs almost as well as my loop based solution when the number of columns to be removed is small. Suprisingly, it didn't scale that well when the number of columns to be removed is large, unlike argyris's solution. That may have something to do with the computer I'm on now though: work Windows machine - I've never really trusted it fully :-)
Both earlier proposed solutions are great, I am posting this one-liner for completeness:
A(:,isfinite(sum(A)))
ans =
1 2
2 3
2 3
It avoids going through the matrix entries twice (what Colin pointed out) by first calculating the row sums and after that calling isfinite. I also removed the find calls - they are not necessary since you can use logical indexing instead.
I do not have my computer here, so I leave out the performance tests.

Octave / Matlab: Extend a vector making it repeat itself?

Is there a way to extend a vector by making it repeat itself?
>v = [1 2];
>v10 = v x 5; %x represents some function. Something like "1 2" x 5 in perl
Then v10 would be:
>v10
1 2 1 2 1 2 1 2 1 2
This should work for the general case, not just for [1 2]
The function you're looking for is repmat().
v10 = repmat(v, 1, 5)
Obviously repmat is the way to go if you know in which direction you want to expand the vector.
However, if you want a general solution that always repeats the vector in the longest direction, this combination of repmat and indexing should do the trick:
v10=v(repmat(1:length(v),1,5))
Although late, I am posting this because this turned out to be the most popular answer to a similar question here.
This is a Faster Method Than repmat or reshape by an Order of Magnitude
One of the best methods for doing such things is Using Tony's Trick. I came across this trick in one of the Electrical Engineering course lectures notes of Columbia University. Repmat and Reshape are usually found to be slower than Tony's trick as it directly uses Matlabs inherent indexing. To answer you question,
Lets say, you want to tile the row vector r=[1 2 3] N times like r=[1 2 3 1 2 3 1 2 3...], then,
c=r'
cc=c(:,ones(N,1));
r_tiled = cc(:)';
This method has significant time savings against reshape or repmat for large N's.
I conducted a small Matlab test to check the speed differential between repmat and tony's trick. Using the code mentioned below, I calculated the times for constructing the same tiled vector from a base vector A=[1:N]. The results show that YES, Tony's-Trick is FASTER BY AN ORDER of MAGNITUDE, especially for larger N. People are welcome to try it themselves. This much time differential can be critical if such an operation has to be performed in loops. Here is the small script I used;
N= 10 ;% ASLO Try for values N= 10, 100, 1000, 10000
% time for tony_trick
tic;
A=(1:N)';
B=A(:,ones(N,1));
C=B(:)';
t_tony=toc;
clearvars -except t_tony N
% time for repmat
tic;
A=(1:N);
B=repmat(A,1,N);
t_repmat=toc;
clearvars -except t_tony t_repmat N
The Times (in seconds) for both methods are given below;
N=10, time_repmat = 8e-5 , time_tony = 3e-5
N=100, time_repmat = 2.9e-4 , time_tony = 6e-5
N=1000, time_repmat = 0.0302 , time_tony = 0.0058
N=10000, time_repmat = 2.9199 , time_tony = 0.5292
My RAM didn't permit me to go beyond N=10000. I am sure, the time difference between the two methods will be even more significant for N=100000. I know, these times might be different for different machines, but the relative difference in order-of-magnitude of times will stand. Also, I know, the avg of times could have been a better metric, but I just wanted to show the order of magnitude difference in time consumption between the two approaches. My machine/os details are given below :
Relevant Machine/OS/Matlab Details : Athlon i686 Arch, Ubuntu 11.04 32 bit, 3gb ram, Matlab 2011b