How can I vectorize this code? At the moment it runs very slowly. I've spent the last couple of hours trying to vectorize it, but I can't get it to work correctly.
My naive program below is extremely slow. N should really be 10,000, but the program already struggles with N = 100. Any advice would be appreciated.
The code iterates the given functions N times for each value of w21, then plots the last 200 values for each value of w21. It produces the expected plot, but as mentioned it is far too slow, since a good plot needs N in the thousands.
hold on
% Number of iterations
N = 100;
x = 1;
y = 1;
z = 1;
for w21 = linspace(-12,-3,N)
for i = 1:N-1
y = y_iterate(x,z,w21);
z = z_iterate(y);
x = x_iterate(y);
if i >= (N - 200)
p = plot(w21,x,'.k','MarkerSize',3);
end
end
end
Required functions:
function val = x_iterate(y)
val = -3 + 8.*(1 ./ (1 + exp(-y)));
end
function val = z_iterate(y)
val = -7 + 8.*(1 ./ (1 + exp(-y)));
end
function val = y_iterate(x,z,w21)
val = 4 + w21.*(1 ./ (1 + exp(-x))) + 6.*(1 ./ (1 + exp(-z)));
end
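I believe the slowdown comes from calling plot inside the loop. Store the values first and plot once at the end:
[X,Y,Z] = deal( zeros(N,N-1) );
w21 = linspace(-12,-3,N);
% x, y and z start from the initial values set in the question's code
for i = 1:N
    for j = 1:N-1
        y = y_iterate(x,z,w21(i));
        z = z_iterate(y);
        x = x_iterate(y);
        X(i,j) = x;
        Y(i,j) = y;
        Z(i,j) = z;
    end
end
nn = max(1,N-200);
% rows index w21 and columns index iterations, so keep the last columns
plot(w21,X(:,nn:end),'.k','MarkerSize',3)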
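The idea of separating the computation from the plotting can be sketched outside MATLAB as well. The following is a minimal plain-Python sketch, not the answerer's code: the names `sigmoid` and `last_iterates` are hypothetical, and the state `x, y, z` is carried over from one `w21` value to the next, as in the question's loop. It only collects the points; a single plot call over the collected points would follow.

```python
import math

def sigmoid(v):
    # Logistic function used by x_iterate, y_iterate and z_iterate
    return 1.0 / (1.0 + math.exp(-v))

def last_iterates(w21_values, n_iter, keep):
    """Iterate the map n_iter times per w21 and keep the last `keep` x values.

    Returns one list of x values per w21 (hypothetical helper).
    """
    x = y = z = 1.0  # initial state, carried across w21 values as in the question
    tails = []
    for w21 in w21_values:
        tail = []
        for i in range(n_iter):
            y = 4 + w21 * sigmoid(x) + 6 * sigmoid(z)
            z = -7 + 8 * sigmoid(y)
            x = -3 + 8 * sigmoid(y)
            if i >= n_iter - keep:
                tail.append(x)
        tails.append(tail)
    return tails

# Example: 10 values of w21 between -12 and -3, 50 iterations, keep the last 5
w21_values = [-12 + 9 * k / 9 for k in range(10)]
tails = last_iterates(w21_values, 50, 5)
```

All points end up in one data structure, so the plotting cost is paid once instead of once per point.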
I have a function that returns the nth number in a Fibonacci sequence. The problem is that it becomes very slow for larger n. Does anyone know how I can fix this?
function f = rtfib(n)
if (n==1)
f= 1;
elseif (n == 2)
f = 2;
else
f =rtfib(n-1) + rtfib(n-2);
end
The Results,
tic; rtfib(20), toc
ans = 10946
Elapsed time is 0.134947 seconds.
tic; rtfib(30), toc
ans = 1346269
Elapsed time is 16.6724 seconds.
I can't even get a value after 5 mins doing rtfib(100)
PS: I'm using octave 3.8.1
If time is important (not programming techniques):
function f = fib(n)
if (n == 1)
f = 1;
elseif (n == 2)
f = 2;
else
fOld = 2;
fOlder = 1;
for i = 3 : n
f = fOld + fOlder;
fOlder = fOld;
fOld = f;
end
end
end
tic;fib(40);toc; ans = 165580141; Elapsed time is 0.000086 seconds.
You could even use uint64. n = 92 is the most you can get from uint64:
tic;fib(92);toc; ans = 12200160415121876738; Elapsed time is 0.001409 seconds.
Because,
fib(93) = 19740274219868223167 > intmax('uint64') = 18446744073709551615
Edit
To get fib(n) up to n = 183, it is possible to use two uint64 values as one number,
with a special function for the summation:
function [] = fib(n)
fL = uint64(0);
fH = uint64(0);
MaxNum = uint64(1e19);
if (n == 1)
fL = 1;
elseif (n == 2)
fL = 2;
else
fOldH = uint64(0);
fOlderH = uint64(0);
fOldL = uint64(2);
fOlderL = uint64(1);
for i = 3 : n
[fL q] = LongSum (fOldL , fOlderL , MaxNum);
fH = fOldH + fOlderH + q;
fOlderL = fOldL;
fOlderH = fOldH;
fOldL = fL;
fOldH = fH;
end
end
sprintf('%u',fH,fL)
end
LongSum is:
function [s q] = LongSum (a, b, MaxNum)
if a + b >= MaxNum
q = 1;
if a >= MaxNum
s = a - MaxNum;
s = s + b;
elseif b >= MaxNum
s = b - MaxNum;
s = s + a;
else
s = MaxNum - a;
s = b - s;
end
else
q = 0;
s = a + b;
end
Note that some of the complications in LongSum might seem unnecessary, but they are not!
(The inner if exists because I wanted to avoid computing s = a + b - MaxNum in a single expression: the intermediate sum might overflow and store a wrong number in s.)
Results
tic;fib(159);toc; Elapsed time is 0.009631 seconds.
ans = 1226132595394188293000174702095995
tic;fib(183);toc; Elapsed time is 0.009735 seconds.
fib(183) = 127127879743834334146972278486287885163
However, you have to be careful about sprintf.
I also did it with three uint64, and I could get up to,
tic;fib(274);toc; Elapsed time is 0.032249 seconds.
ans = 1324695516964754142521850507284930515811378128425638237225
(It's pretty much the same code, but I could share it if you are interested).
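As an illustration of the two-word scheme, here is a minimal Python sketch. It is not the answerer's code: `long_sum` and `fib_two_words` are hypothetical names, and because Python integers do not overflow, the sketch shows only the carry logic and the output formatting, not the uint64 overflow that LongSum works around. It also shows one reason to be careful when printing: the low word has to be zero-padded to 19 digits, otherwise leading zeros in the low part are lost.

```python
BASE = 10**19  # plays the role of MaxNum in the answer above

def long_sum(a, b, base=BASE):
    """Add two low words; return (sum modulo base, carry)."""
    total = a + b  # Python ints are arbitrary precision, so this cannot overflow
    if total >= base:
        return total - base, 1
    return total, 0

def fib_two_words(n):
    """fib(1) = 1, fib(2) = 2 as in the question; returns a decimal string."""
    if n == 1:
        return "1"
    if n == 2:
        return "2"
    older_hi, older_lo = 0, 1
    old_hi, old_lo = 0, 2
    for _ in range(3, n + 1):
        lo, carry = long_sum(old_lo, older_lo)
        hi = old_hi + older_hi + carry
        older_hi, older_lo = old_hi, old_lo
        old_hi, old_lo = hi, lo
    if old_hi == 0:
        return str(old_lo)
    # Zero-pad the low word so that its leading zeros are preserved
    return f"{old_hi}{old_lo:019d}"
```

The result is only valid while the high word itself stays below the base, which matches the n = 183 limit quoted above.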
Note that we have fib(1) = 1, fib(2) = 2 according to the question, while fib(1) = 1, fib(2) = 1 is more common. The first 300 Fibonacci numbers are listed here (thanks to @Rick T).
It seems the Fibonacci series follows the golden ratio, as discussed in some detail here.
This was used in this MATLAB File Exchange code, and I am writing just the essence of it here:
sqrt5 = sqrt(5);
alpha = (1 + sqrt5)/2; %// alpha = 1.618... is the golden ratio
fibs = round( alpha.^n ./ sqrt5 )
You can pass an integer n to get the nth number in the Fibonacci series, or pass an array 1:n to get the whole series.
Please note that this method is accurate only up to n = 69.
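The accuracy limit comes from double-precision rounding. A minimal Python sketch (Python floats are IEEE doubles, like MATLAB's default numeric type) can locate where the closed form first disagrees with exact integer arithmetic. The helper names are hypothetical, and the exact index where the mismatch appears may differ slightly between environments, so no specific value is claimed here.

```python
import math

def fib_exact(n):
    """Standard indexing: fib_exact(1) = 1, fib_exact(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_binet(n):
    """Closed form round(phi^n / sqrt(5)) evaluated in double precision."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2  # golden ratio
    return round(phi ** n / sqrt5)

def first_mismatch(limit=200):
    """Smallest n <= limit where the closed form is wrong, or None."""
    for n in range(1, limit + 1):
        if fib_binet(n) != fib_exact(n):
            return n
    return None

print(first_mismatch())
```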
If you have access to the Symbolic Math Toolbox in MATLAB, you could always just call the Fibonacci function from MuPAD:
>> fib = @(n) evalin(symengine, ['numlib::fibonacci(' num2str(n) ')'])
>> fib(274)
ans =
818706854228831001753880637535093596811413714795418360007
It is pretty fast:
>> timeit(@() fib(274))
ans =
0.0011
Plus you can go for numbers as large as you want (limited only by how much RAM you have!), and it is still blazing fast:
% see if you can beat that!
>> tic
>> x = fib(100000);
>> toc % Elapsed time is 0.004621 seconds.
% result has more than 20 thousand digits!
>> length(char(x)) % 20899
Here is the full value of fib(100000): http://pastebin.com/f6KPGKBg
To reach large numbers you can use symbolic computation. The following works in Matlab R2010b.
syms x y %// declare variables
z = x + y; %// define formula
xval = '0'; %// initialize x, y values
yval = '1';
for n = 2:300
zval = subs(z, [x y], {xval yval}); %// update z value
disp(['Iteration ' num2str(n) ':'])
disp(zval)
xval = yval; %// shift values
yval = zval;
end
You can do it in O(log n) time with matrix exponentiation:
X = [0 1
1 1]
X^n gives you the nth Fibonacci number in the lower right-hand corner (with the question's convention fib(1) = 1, fib(2) = 2). X^n can be written as a product of matrices X^(2^i) with i <= log_2(n); for example, X^11 = X^1 * X^2 * X^8. Since X^8 = (X^4)^2 and so on, this takes at most 2*log_2(n) matrix multiplications.
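The repeated-squaring idea can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the answerer's code: `mat_mul`, `mat_pow` and `fib_matrix` are hypothetical names, and the lower-right entry of X^n is read off using the question's convention fib(1) = 1, fib(2) = 2.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def mat_pow(m, n):
    """m^n by repeated squaring: O(log n) matrix multiplications."""
    result = ((1, 0), (0, 1))  # identity matrix
    while n > 0:
        if n & 1:               # this power of two is part of n
            result = mat_mul(result, m)
        m = mat_mul(m, m)       # X^(2^i) -> X^(2^(i+1))
        n >>= 1
    return result

def fib_matrix(n):
    """Lower-right entry of X^n for X = [0 1; 1 1]."""
    x = ((0, 1), (1, 1))
    return mat_pow(x, n)[1][1]

print(fib_matrix(20))  # 10946, matching rtfib(20) in the question
```

Python integers are arbitrary precision, so this sketch does not hit the uint64 limit discussed in the other answer; in MATLAB or Octave the same loop would need a suitable number type for large n.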
One performance issue is that you use a recursive solution. Going for an iterative method spares you the argument passing for each function call. As Olivier pointed out, it also reduces the complexity to linear.
You can also look here. Apparently there's a formula that computes the nth member of the Fibonacci sequence. I tested it up to the 50th element. For higher values of n it's not very accurate.
A fast Fibonacci computation could be implemented in Python as follows. I know this is Python, not MATLAB/Octave, but it might be helpful.
Basically, rather than calling the same Fibonacci function over and over again in O(2^n), we store the Fibonacci sequence in a list/array in O(n):
#!/usr/bin/env python3.5
class Fib:
    def __init__(self,n):
        self.n=n
        self.fibList=[None]*(self.n+1)
        self.populateFibList()
    def populateFibList(self):
        for i in range(len(self.fibList)):
            if i==0:
                self.fibList[i]=0
            if i==1:
                self.fibList[i]=1
            if i>1:
                self.fibList[i]=self.fibList[i-1]+self.fibList[i-2]
    def getFib(self):
        print('Fibonacci sequence up to ', self.n, ' is:')
        for i in range(len(self.fibList)):
            print(i, ' : ', self.fibList[i])
        return self.fibList[self.n]
def isNonnegativeInt(value):
    try:
        if int(value)>=0:#throws an exception if non-convertible to int: returns False
            return True
        else:
            return False
    except:
        return False
n=input('Please enter a non-negative integer: ')
while isNonnegativeInt(n)==False:
    n=input('A non-negative integer is needed: ')
n=int(n) # convert string to int
print('We are using ', n, 'based on what you entered')
print('Fibonacci result is ', Fib(n).getFib())
Output for n=12 would be like:
I tested the runtime for n = 100, 300 and 1000, and the code is really fast; I don't even have to wait for the output.
One simple way to speed up the recursive implementation of a Fibonacci function is to realize that, substituting f(n-1) by its definition,
f(n) = f(n-1) + f(n-2)
     = f(n-2) + f(n-3) + f(n-2)
     = 2*f(n-2) + f(n-3)
This simple transformation greatly reduces the number of steps taken to compute a number in the series.
If we start with OP's code, slightly corrected:
function result = fibonacci(n)
switch n
case 0
result = 0;
case 1
result = 1;
case 2
result = 1;
case 3
result = 2;
otherwise
result = fibonacci(n-2) + fibonacci(n-1);
end
And apply our transformation:
function result = fibonacci_fast(n)
switch n
case 0
result = 0;
case 1
result = 1;
case 2
result = 1;
case 3
result = 2;
otherwise
result = fibonacci_fast(n-3) + 2*fibonacci_fast(n-2);
end
Then we see a 30x speed improvement for computing the 20th number in the series (using Octave):
>> tic; for ii=1:100, fibonacci(20); end; toc
Elapsed time is 12.4393 seconds.
>> tic; for ii=1:100, fibonacci_fast(20); end; toc
Elapsed time is 0.448623 seconds.
Of course Rashid's non-recursive implementation is another 60x faster still: 0.00706792 seconds.
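To see where the speed-up comes from, one can count function calls rather than measure time. The following minimal Python sketch mirrors the two recursive functions above. The `calls` counter argument is a hypothetical addition for illustration, and no specific call counts are claimed here.

```python
def fibonacci(n, calls):
    """Plain recursion with the same base cases as above; calls[0] counts invocations."""
    calls[0] += 1
    if n in (0, 1):
        return n
    if n == 2:
        return 1
    if n == 3:
        return 2
    return fibonacci(n - 2, calls) + fibonacci(n - 1, calls)

def fibonacci_fast(n, calls):
    """Transformed recursion f(n) = f(n-3) + 2*f(n-2)."""
    calls[0] += 1
    if n in (0, 1):
        return n
    if n == 2:
        return 1
    if n == 3:
        return 2
    return fibonacci_fast(n - 3, calls) + 2 * fibonacci_fast(n - 2, calls)

slow_calls, fast_calls = [0], [0]
slow_value = fibonacci(20, slow_calls)
fast_value = fibonacci_fast(20, fast_calls)
print(slow_value, fast_value)          # both functions return the same number
print(slow_calls[0], fast_calls[0])    # the transformed version makes fewer calls
```

Both versions are still exponential in n; the transformation only shrinks the growth rate, which is why the iterative version remains much faster.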
I'm trying to optimize the performance (i.e. speed) of my code. I'm new to vectorization and tried to vectorize it myself, but was unsuccessful (I also tried bsxfun, parfor, some kinds of vectorization, etc.). Can anyone help me optimize this code, with a short description of how to do it?
% for simplify, create dummy data
Z = rand(250,1)
z1 = rand(100,100)
z2 = rand(100,100)
% added the missing parameters in the last update, thanks @Bas Swinckels and @Daniel R
j = 2;
n = length(Z);
h = 0.4;
tic
[K1, K2] = size(z1);
result = zeros(K1,K2);
for l = 1 : K1
for m = 1: K2
result(l,m) = sum(K_h(h, z1(l,m), Z(j+1:n)).*K_h(h, z2(l,m), Z(1:n-j)));
end
end
result = result ./ (n-j);
toc
The K_h.m function is the boundary kernel and defined as (x is scalar and y can be vector)
function res = K_h(h, x,y)
res = 0;
if ( x >= 0 & x < h)
denominator = integral(@kernelFunc,-x./h,1);
res = 1./h.*kernelFunc((x-y)/h)/denominator;
elseif (x>=h & x <= 1-h)
res = 1./h*kernelFunc((x-y)/h);
elseif (x > 1 - h & x <= 1)
denominator = integral(@kernelFunc,-1,(1-x)./h);
res = 1./h.*kernelFunc((x-y)/h)/denominator;
else
fprintf('x is out of [0,1]');
return;
end
end
It takes a long time to obtain the results: Elapsed time is 13.616413 seconds.
Thank you. Any comments are welcome.
P/S: Sorry for my lack of English
Some observations: it seems that Z(j+1:n) and Z(1:n-j) are constant inside the loop, so do the indexing operation before the loop. Next, the loop is really simple: every result(l, m) depends only on z1(l, m) and z2(l, m). This is an ideal case for arrayfun. A solution might look something like this (untested):
tic
% do constant stuff outside of the loop
Zhigh = Z(j+1:n);
Zlow = Z(1:n-j);
result = arrayfun(@(zz1, zz2) sum(K_h(h, zz1, Zhigh).*K_h(h, zz2, Zlow)), z1, z2);
result = result ./ (n-j);
toc
I am not sure if this will be a lot faster, since I guess the running time will not be dominated by the for-loops, but by all the work done inside the K_h function.
I have a matrix named l having size 20X3.
What I wanted to do was this :
Suppose I have these limits:
l1_max=20; l1_min=0.5;
l2_max=20; l2_min=0.5;
mu_max=20; mu_min=0.5;
I wanted to force all the elements of the matrix l within the limits.
The values of 1st column within l1_max & l1_min.
The values of 2nd column within l2_max & l2_min.
The values of 3rd column within mu_max & mu_min.
What I did was like this:
for k=1:20
if l(k,1)>l1_max
l(k,1) = l1_max;
elseif l(k,1)<l1_min
l(k,1) = l1_min;
end
if l(k,2)>l2_max
l(k,2) = l2_max;
elseif l(k,2)<l2_min
l(k,2) = l2_min;
end
if l(k,3)>mu_max
l(k,3) = mu_max;
elseif l(k,3)<mu_min
l(k,3) = mu_min;
end
end
Can it be done in a better way ?
You don't have to loop over rows; use vectorized operations on entire columns:
l(l(:, 1) > l1_max, 1) = l1_max;
l(l(:, 1) < l1_min, 1) = l1_min;
Similarly:
l(l(:, 2) > l2_max, 2) = l2_max;
l(l(:, 2) < l2_min, 2) = l2_min;
l(l(:, 3) > mu_max, 3) = mu_max;
l(l(:, 3) < mu_min, 3) = mu_min;
An alternative method, which resembles Bas's idea, is to apply min and max as follows:
l(:, 1) = max(min(l(:, 1), l1_max), l1_min);
l(:, 2) = max(min(l(:, 2), l2_max), l2_min);
l(:, 3) = max(min(l(:, 3), mu_max), mu_min);
It appears that both approaches have comparable performance.
You don't even have to loop over the columns: the operation on the whole matrix can be done in 2 calls to bsxfun, independent of the number of columns:
column_max = [l1_max, l2_max, mu_max];
column_min = [l1_min, l2_min, mu_min];
M = bsxfun(@min, M, column_max); %clip to maximum
M = bsxfun(@max, M, column_min); %clip to minimum
This uses two tricks: to clip a value between min_val and max_val, you can do clipped_x = min(max(x, min_val), max_val). The other trick is to use the somewhat obscure bsxfun, which applies a function after doing singleton expansion. When you use it on two matrices, it 'extrudes' the smallest one to the same size as the largest one before applying the function, so the example above is equivalent to M = min(M, repmat(column_max, size(M, 1), 1)), but hopefully calculated in a more efficient way.
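The clipping trick itself is language-independent. Here is a minimal plain-Python sketch of per-column clamping with `min(max(x, lo), hi)`. The function name `clamp_columns` and the sample data are hypothetical, and the explicit `zip` over columns stands in for the singleton expansion that bsxfun performs.

```python
def clamp_columns(rows, column_min, column_max):
    """Clip every value to the [min, max] range of its own column."""
    return [
        [min(max(value, lo), hi) for value, lo, hi in zip(row, column_min, column_max)]
        for row in rows
    ]

# Limits from the question: every column is clamped to [0.5, 20]
column_min = [0.5, 0.5, 0.5]
column_max = [20, 20, 20]
data = [[25.0, 0.1, 3.0],
        [0.2, 7.0, 30.0]]
clamped = clamp_columns(data, column_min, column_max)
print(clamped)
```

The order of the two operations does not matter as long as each column's minimum is not larger than its maximum.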
Below is a benchmark to test the various methods discussed so far. I'm using the TIMEIT function found on the File Exchange.
function [t,v] = testClampColumns()
% data and limits ranges for each column
r = 10000; c = 500;
M = randn(r,c);
mn = -1.1 * ones(1,c);
mx = +1.1 * ones(1,c);
% functions
f = { ...
@() clamp1(M,mn,mx) ;
@() clamp2(M,mn,mx) ;
@() clamp3(M,mn,mx) ;
@() clamp4(M,mn,mx) ;
@() clamp5(M,mn,mx) ;
};
% timeit and check results
t = cellfun(@timeit, f, 'UniformOutput',true);
v = cellfun(@feval, f, 'UniformOutput',false);
assert(isequal(v{:}))
end
Given the following implementations:
1) loop over all values and compare against min/max
function M = clamp1(M, mn, mx)
for j=1:size(M,2)
for i=1:size(M,1)
if M(i,j) > mx(j)
M(i,j) = mx(j);
elseif M(i,j) < mn(j)
M(i,j) = mn(j);
end
end
end
end
2) compare each column against min/max
function M = clamp2(M, mn, mx)
for j=1:size(M,2)
M(M(:,j) < mn(j), j) = mn(j);
M(M(:,j) > mx(j), j) = mx(j);
end
end
3) truncate each column to its limits
function M = clamp3(M, mn, mx)
for j=1:size(M,2)
M(:,j) = min(max(M(:,j), mn(j)), mx(j));
end
end
4) vectorized version of truncation in (3)
function M = clamp4(M, mn, mx)
M = bsxfun(@min, bsxfun(@max, M, mn), mx);
end
5) absolute value comparison: -a < x < a <==> |x| < a
(Note: this is not applicable to your case, since it requires a symmetric limits range. I only included this for completeness. Besides it turns out to be the slowest method.)
function M = clamp5(M, mn, mx)
assert(isequal(-mn,mx), 'Only works when -mn==mx')
idx = bsxfun(@gt, abs(M), mx);
v = bsxfun(@times, sign(M), mx);
M(idx) = v(idx);
end
The timing I get on my machine with an input matrix of size 10000x500:
>> t = testClampColumns
t =
0.2424
0.1267
0.0569
0.0409
0.2868
I would say that all the above methods are acceptably fast, with the bsxfun solution being the fastest :)
I have a triply nested for loop in a MATLAB program. Can any of you help me optimize it?
w=5;
a = rand(m*n,10); b=rand(m,n);
for i = 1 : m
for j = 1 : n
for k = 1 : l
if (i-w >= 1 && i+w <= m)
featureL = a(((i-1)*n)+j,:); featureR = a(((i-1)*n)+j-d,:);
D1(i,j,k) = sqrt( sum( (featureL - featureR) .* (featureL - featureR) ) );
D2(i,j,k) = mean2( b(i-w:i+w, j-w:j+w) );
end
end
end
end
I know the performance could be heavily improved by using meshgrid, but I am not sure how to do it.
Thanks in anticipation.
Can it be done with something like this?
[X Y Z] = meshgrid(1:m,1:n,1:l);
D1(something containing X,Y,Z) = sqrt( sum( ( a(something cont. X,Y) - a(something cont. X,Y)).*(a(something cont. X,Y) - a(something cont. X,Y)) ) );
% similarly D2
Thanks a lot!.
I've found that a good way to attack these things is incrementally. Start by examining everything in the innermost loop, and see if it can be done at a higher level. This will reduce repeated computations.
For example, you can perform your if (i-w >= 1 && i+w <= m) two levels higher (since it only depends on i,w, and m), reducing if checks and skipping loop iterations.
Once that is done, your featureL and featureR calculations can be moved up one level; they are performed inside the k loop but depend only on i and j. Similarly, sqrt( sum( (featureL - featureR) .* (featureL - featureR) ) ) can be computed outside of the k loop, put into a variable, and assigned later.
In fact, as far as I can see you can get rid of the entire k loop since k is never used. Here's your code with some of this applied:
w=5;
a = rand(m*n,10);
b=rand(m,n);
for i = 1 : m
if (i-w >= 1 && i+w <= m)
for j = 1 : n
featureL = a(((i-1)*n)+j,:);
featureR = a(((i-1)*n)+j-d,:);
x = sqrt( sum( (featureL - featureR) .* (featureL - featureR) ) );
y = mean2( b(i-w:i+w, j-w:j+w) );
D1(i,j,:) = x;
D2(i,j,:) = y;
end
end
end