I'm unable to vectorize this :
for x=2:i
for y=2:j
if(x ~= y)
Savings(x,y) = Costs(x,1) + Costs(1,y) - Costs(x,y);
end
end
end
Could someone tell me of I could improve the performance of this code ? Thanks
With some help from bsxfun:
Ix=2:i;
Iy=2:j;
I = false(i,j);
I(Ix,Iy) = bsxfun(#ne, Ix', Iy);
S = bsxfun(#plus, Costs(Ix,1), Costs(1,Iy)) - Costs(Ix,Iy);
Savings(I) = S(I(Ix,Iy));
You can vectorize it like this, I don't know if that will effect your performance or not though. You will need to test that yourself.
m=size(Costs, 1);
n=size(Costs, 2);
[Y, X] = meshgrid(2:m, 2:n);
Cx = Costs(:,1);
Cy = Costs(1,:);
S = Cx(X) + Cy(Y) - Costs(2:end,2:end);
S(eye(m-1,n-1)==1) = 0;
Savings = zeros(m,n);
Savings(2:end,2:end) = S;
EDIT
Incidentally I have verified that all three methods give the same answer. For a Costs size of 400x400 the run times are were as follows:
Elapsed time is 0.00741386 seconds. //My method
Elapsed time is 0.003304 seconds. //Mohsen's method (after correcting to prevent errors)
Elapsed time is 2.16231 seconds. //Original Loop
So both our methods give a significant boost. However if you just pre-allocate Savings your loop is actually the fastest. Is this really too slow for your purposes?
Related
I am running a piece of Matlab code that is taking almost 70 hours and I'm sure there's a more efficient way of scripting it, but I cannot figure out how.
Looping over 1 iteration takes 1 second. The problem of course is that length(i) is 186144.
braindip = normrnd(0, 50, 186144,3);
nobrain = normrnd(0, 45, 25014656,3);
ok = 1;
alpha = 2;
h = waitbar(0,'Please wait...');
dip_away = nan(size(braindip));
for i = 1:size(braindip,1)
tic
h_norm = repmat(braindip(i,:), size(nobrain,1),1);
nn = sqrt(sum((h_norm - nobrain).^2,2));
if min(nn) > alpha
dip_away(ok,:) = braindip(i,:);
ok = ok+1;
end
toc
waitbar(i / size(braindip,1))
end
Does any one have a clever suggestion for optimising this loop? Thanks very much!
Assuming you are using MATLAB 2016b or later, which supports automatic broadcasting, you can change:
h_norm = repmat(braindip(i,:), size(nobrain,1),1);
nn = sqrt(sum((h_norm - nobrain).^2,2));
to
nn = sqrt(sum((braindip(i,:) - nobrain).^2,2));
A second option would be, to eliminate the sqrt. Use:
nn_qbd = sum((braindip(i,:) - nobrain).^2,2);
if min(nn_qbd) > alpha_qbd
Where alpha_qbd=alpha.^2, obviously calculated only once in advance. This leads to the third step, no need to store nn_qbd in a variable. You are only interested in the minimum:
nn_qbd_min = min(sum((braindip(i,:) - nobrain).^2,2));
if nn_qbd_min > alpha_qbd
Comparing the original code to the third option, the execution time is roughly cut in half.
I have some problems with the efficiency of my code. Basically my code works like this:
a = zeros(1,50000);
for n = 1:50000
a(n) = 10.*n - 5;
end
sum(a);
What is the fastest way to solve the sum of all the elements of this matrix?
first you want to remove your for loop by making it a vector multiplication:
tic
a = zeros(1,50000);
b = [1:50000];
a = 10.*b-5;
result = sum(a);
toc
Elapsed time is 0.008504 seconds.
An alternative way is to simplify your operation, you are multiplying 1 to 50000 by 10 and subtracting 5 then taking the sum (which is a single number), which is equivalent to:
tic
result = sum(1:50000)*10 - 5*50000;
toc
Elapsed time is 0.003851 seconds.
or if you are really into Math (this is a pure mathematical expression approach) :
tic
result = (1+50000)*(50000/2)*10 - 5*50000;
toc
Elapsed time is 0.003702 seconds.
and as you can see, a little math can do greater good than pure efficient programming, and actually, loop is not always slow, in your case, the loop is actually faster than the vectorized method:
tic
a = zeros(1,50000);
for n = 1:50000
a(n)=10.*n-5;
end
sum(a);
toc
Elapsed time is 0.006431 seconds.
Timing
Let's do some timing and see the results. The function to run it yourself is provided at the bottom. The approximate execution time execTime is in seconds and the percentage of improvement impPercentage in %.
Results
R2016a on OSX 10.11.4
execTime impPercentage
__________ _____________
loop 0.00059336 0
vectorized 0.00014494 75.574
adiel 0.00010468 82.359
math 9.3659e-08 99.984
Code
The following function can be used to generate the output. Note that it requires minimum R2013b to be able to use the built-in timeit-function and table.
function timings
%feature('accel','on') %// commented out because it's undocumented
cycleCount = 100;
execTime = zeros(4,cycleCount);
names = {'loop';'vectorized';'adiel';'math'};
w = warning;
warning('off','MATLAB:timeit:HighOverhead');
for k = 1:cycleCount
execTime(1,k) = timeit(#()loop,1);
execTime(2,k) = timeit(#()vectorized,1);
execTime(3,k) = timeit(#()adiel,1);
execTime(4,k) = timeit(#()math,1);
end
warning(w);
execTime = min(execTime,[],2);
impPercentage = (1 - execTime/max(execTime)) * 100;
table(execTime,impPercentage,'RowNames',names)
function result = loop
a = zeros(1,50000);
for n = 1:50000
a(n) = 10.*n - 5;
end
result = sum(a);
function result = vectorized
b = 1:50000;
a = 10.*b - 5;
result = sum(a);
function result = adiel
result = sum(1:50000)*10 - 5*50000;
function result = math
result = (1+50000)*(50000/2)*10 - 5*50000;
I am writing an Octave script to calculate the price of an European option.
The first part uses Monte Carlo to simulate the underlying asset price over n number of time periods. This is repeated nIter number of times.
Octave makes it very easy to setup initial matrices. But I haven't found the way to complete the task in a vectorized way, avoiding FOR loops:
%% Octave simplifies creation of 'e', 'dlns', and 'Prices'
e = norminv(rand(nIter,n));
dlns = cat(2, ones(nIter,1), exp((adj_r+0.5*sigma^2)*dt+sigma*e.*sqrt(dt)));
Prices = zeros(nIter, n+1);
for i = 1:nIter % IS THERE A WAY TO VECTORIZE THESE FOR LOOPS?
for j = 1:n+1
if j == 1
Prices(i,j)=S0;
else
Prices(i,j)=Prices(i,j-1)*dlns(i,j);
end
endfor
endfor
Note that the price in n is equal to price in n-1 times a factor, hence the following does not work...
Prices(i,:) = S0 * dlns(i,:)
...since it takes S0 and multiplies it by all the factors, yielding different results than the expected random walk.
Because of the dependency between iterations to obtain results for each new column with respect to the previous column, it seems you would need at least one loop there, but do all operations within a column in a vectorized fashion and that might speed it up for you. The vectorized replacement for the two nested loops would look something like this -
Prices(:,1)=S0;
for j = 2:n+1
Prices(:,j) = Prices(:,j-1).*dlns(:,j);
endfor
It just occurred to me that the dependency can be taken care of with cumprod that gets us cumulative product which is essentially being done here and thus would lead to a no-loop solution! Here's the implementation -
Prices = [repmat(S0,nIter,1) cumprod(dlns(:,2:end),2)*S0]
Benchmarking on MATLAB
Benchmarking Code -
%// Parameters as told by OP and then create the inputs
nIter= 100000;
n = 100;
adj_r = 0.03;
sigma = 0.2;
dt = 1/n;
S0 = 60;
e = norminv(rand(nIter,n));
dlns = cat(2, ones(nIter,1), exp((adj_r+0.5*sigma^2)*dt+sigma*e.*sqrt(dt)));
disp('-------------------------------------- With Original Approach')
tic
Prices = zeros(nIter, n+1);
for i = 1:nIter
for j = 1:n+1
if j == 1
Prices(i,j)=S0;
else
Prices(i,j)=Prices(i,j-1)*dlns(i,j);
end
end
end
toc, clear Prices
disp('-------------------------------------- With Proposed Approach - I')
tic
Prices2(nIter, n+1)=0; %// faster pre-allocation scheme
Prices2(:,1)=S0;
for j = 2:n+1
Prices2(:,j)=Prices2(:,j-1).*dlns(:,j);
end
toc, clear Prices2
disp('-------------------------------------- With Proposed Approach - II')
tic
Prices3 = [repmat(S0,nIter,1) cumprod(dlns(:,2:end),2)*S0];
toc, clear Prices3
Runtimes results -
-------------------------------------- With Original Approach
Elapsed time is 0.259054 seconds.
-------------------------------------- With Proposed Approach - I
Elapsed time is 0.020566 seconds.
-------------------------------------- With Proposed Approach - II
Elapsed time is 0.067292 seconds.
Now, the runtimes do suggest that the first proposed approach might be a better fit here!
I am trying to improve the speed of script I am trying to run.
Here is the code: (my machine = 4 core win 7)
clear y;
n=100;
x=linspace(0,1,n);
% no y pre-allocation using zeros
start_time=tic;
for k=1:n,
y(k) = (1-(3/5)*x(k)+(3/20)*x(k)^2 -(x(k)^3/60)) / (1+(2/5)*x(k)-(1/20)*x(k)^2);
end
elapsed_time1 = toc(start_time);
fprintf('Computational time for serialized solution: %f\n',elapsed_time1);
Above code gives 0.013654 elapsed time.
On the other hand, I was tried to use pre-allocation by adding y = zeros(1,n); in the above code where the comment is but the running time is similar around ~0.01. Any ideas why? I was told it would improve by a factor of 2. Am I missing something?
Lastly is there any type of vectorization in Matlab that will allow me to forget about the for loop in the above code?
Thanks,
In your code: try with n=10000 and you'll see more of a difference (a factor of almost 10 on my machine).
These things related with allocation are most noticeable when the size of your variable is large. In that case it's more difficult for Matlab to dynamically allocate memory for that variable.
To reduce the number of operations: do it vectorized, and reuse intermediate results to avoid powers:
y = (1 + x.*(-3/5 + x.*(3/20 - x/60))) ./ (1 + x.*(2/5 - x/20));
Benchmarking:
With n=100:
Parag's / venergiac's solution:
>> tic
for count = 1:100
y=(1-(3/5)*x+(3/20)*x.^2 -(x.^3/60))./(1+(2/5)*x-(1/20)*x.^2);
end
toc
Elapsed time is 0.010769 seconds.
My solution:
>> tic
for count = 1:100
y = (1 + x.*(-3/5 + x.*(3/20 - x/60))) ./ (1 + x.*(2/5 - x/20));
end
toc
Elapsed time is 0.006186 seconds.
You don't need a for loop. Replace the for loop with the following and MATLAB will handle it.
y=(1-(3/5)*x+(3/20)*x.^2 -(x.^3/60))./(1+(2/5)*x-(1/20)*x.^2);
This may give a computational advantage when vectors become larger in size. Smaller size is the reason why you cannot see the effect of pre-allocation. Read this page for additional tips on how to improve the performance.
Edit: I observed that at larger sizes, n>=10^6, I am getting a constant performance improvement when I try the following:
x=0:1/n:1;
instead of using linspace. At n=10^7, I gain 0.05 seconds (0.03 vs 0.08) by NOT using linspace.
try operation element per element (.*, .^)
clear y;
n=50000;
x=linspace(0,1,n);
% no y pre-allocation using zeros
start_time=tic;
for k=1:n,
y(k) = (1-(3/5)*x(k)+(3/20)*x(k)^2 -(x(k)^3/60)) / (1+(2/5)*x(k)-(1/20)*x(k)^2);
end
elapsed_time1 = toc(start_time);
fprintf('Computational time for serialized solution: %f\n',elapsed_time1);
start_time=tic;
y = (1-(3/5)*x+(3/20)*x.^2 -(x.^3/60)) / (1+(2/5)*x-(1/20)*x.^2);
elapsed_time1 = toc(start_time);
fprintf('Computational time for product solution: %f\n',elapsed_time1);
my data
Computational time for serialized solution: 2.578290
Computational time for serialized solution: 0.010060
Having a vector x and I have to add an element (newElem) .
Is there any difference between -
x(end+1) = newElem;
and
x = [x newElem];
?
x(end+1) = newElem is a bit more robust.
x = [x newElem] will only work if x is a row-vector, if it is a column vector x = [x; newElem] should be used. x(end+1) = newElem, however, works for both row- and column-vectors.
In general though, growing vectors should be avoided. If you do this a lot, it might bring your code down to a crawl. Think about it: growing an array involves allocating new space, copying everything over, adding the new element, and cleaning up the old mess...Quite a waste of time if you knew the correct size beforehand :)
Just to add to #ThijsW's answer, there is a significant speed advantage to the first method over the concatenation method:
big = 1e5;
tic;
x = rand(big,1);
toc
x = zeros(big,1);
tic;
for ii = 1:big
x(ii) = rand;
end
toc
x = [];
tic;
for ii = 1:big
x(end+1) = rand;
end;
toc
x = [];
tic;
for ii = 1:big
x = [x rand];
end;
toc
Elapsed time is 0.004611 seconds.
Elapsed time is 0.016448 seconds.
Elapsed time is 0.034107 seconds.
Elapsed time is 12.341434 seconds.
I got these times running in 2012b however when I ran the same code on the same computer in matlab 2010a I get
Elapsed time is 0.003044 seconds.
Elapsed time is 0.009947 seconds.
Elapsed time is 12.013875 seconds.
Elapsed time is 12.165593 seconds.
So I guess the speed advantage only applies to more recent versions of Matlab
As mentioned before, the use of x(end+1) = newElem has the advantage that it allows you to concatenate your vector with a scalar, regardless of whether your vector is transposed or not. Therefore it is more robust for adding scalars.
However, what should not be forgotten is that x = [x newElem] will also work when you try to add multiple elements at once. Furthermore, this generalizes a bit more naturally to the case where you want to concatenate matrices. M = [M M1 M2 M3]
All in all, if you want a solution that allows you to concatenate your existing vector x with newElem that may or may not be a scalar, this should do the trick:
x(end+(1:numel(newElem)))=newElem