Julia code optimization

I have the following code from a previous question and I need help optimizing it for speed. This is the code:
function OfdmSym()
    N = 64
    n = 1000
    symbol = ones(Complex{Float64}, n, 64)
    data = ones(Complex{Float64}, 1, 48)
    unused = zeros(Complex{Float64}, 1, 12)
    pilot = ones(Complex{Float64}, 1, 4)
    s = [-1-im -1+im 1-im 1+im]
    for i = 1:n # generate 1000 symbols
        for j = 1:48 # generate 48 complex data symbols whose basis is s
            r = rand(1:4) # 1, 2, 3, or 4
            data[j] = s[r]
        end
        symbol[i,:] = [data[1,1:10] pilot[1] data[1,11:20] pilot[2] data[1,21:30] pilot[3] data[1,31:40] pilot[4] data[1,41:48] unused]
    end
end
OfdmSym()
I appreciate your help.

First of all, I timed it with n = 100000:
OfdmSym() # Warmup
for i = 1:5
    @time OfdmSym()
end
and it's pretty quick as it is:
elapsed time: 3.235866305 seconds (1278393328 bytes allocated, 15.18% gc time)
elapsed time: 3.147812323 seconds (1278393328 bytes allocated, 14.89% gc time)
elapsed time: 3.144739194 seconds (1278393328 bytes allocated, 14.68% gc time)
elapsed time: 3.118775273 seconds (1278393328 bytes allocated, 14.79% gc time)
elapsed time: 3.137765971 seconds (1278393328 bytes allocated, 14.85% gc time)
But I rewrote it using for loops to avoid the slicing:
function OfdmSym2()
    N = 64
    n = 100000
    symbol = zeros(Complex{Float64}, n, 64)
    s = [-1-im, -1+im, 1-im, 1+im]
    for i = 1:n
        for j = 1:48
            @inbounds symbol[i,j] = s[rand(1:4)]
        end
        symbol[i,11] = one(Complex{Float64})
        symbol[i,22] = one(Complex{Float64})
        symbol[i,33] = one(Complex{Float64})
        symbol[i,44] = one(Complex{Float64})
    end
end
OfdmSym2() # Warmup
for i = 1:5
    @time OfdmSym2()
end
which is 20x faster
elapsed time: 0.159715932 seconds (102400256 bytes allocated, 12.80% gc time)
elapsed time: 0.159113184 seconds (102400256 bytes allocated, 14.75% gc time)
elapsed time: 0.158200345 seconds (102400256 bytes allocated, 14.82% gc time)
elapsed time: 0.158469032 seconds (102400256 bytes allocated, 15.00% gc time)
elapsed time: 0.157919113 seconds (102400256 bytes allocated, 14.86% gc time)
If you look at the profiler (@profile) you'll see that most of the time is spent generating random numbers, as you'd expect; everything else is just moving numbers around.

It's all just bits, right? This isn't clean (at all), but it runs slightly faster on my machine (which is much slower than yours so I won't bother posting my times). Is it a little faster on your machine?
function my_OfdmSym()
    const n = 100000
    const my_one_bits = uint64(1023) << 52
    const my_sign_bit = uint64(1) << 63
    my_sym = Array(Uint64, n<<1, 64)
    fill!(my_sym, my_one_bits)
    for col = [1:10, 12:21, 23:32, 34:43, 45:52]
        for row = 1:(n<<1)
            if randbool() my_sym[row, col] |= my_sign_bit end
        end
    end
    my_symbol = reinterpret(Complex{Float64}, my_sym, (n, 64))
    for k in [11, 22, 33, 44]
        my_symbol[:, k] = 1.0
    end
    for k = 53:64
        my_symbol[:, k] = 0.0
    end
end

Related

How to decrease the computation time of this code?

I am working on computing some features of an image data set and saving the features for later use. Below is the code:
tic
l = 9907 % size of image data set
% pre-allocating space for variables in the for loop
Icolor = cell(1,l);
Iwave = cell(1,l);
IglrlFeatures = cell(1,l);
for i = 1:l % l = size of image data set = 9907
    IDB{1,i} = imread(strcat(path, strcat(num2str(i), '.jpg')));
    Icolor{1,i} = colorMoments(IDB{1,i}); % 6 features in each cell
    Iwave{1,i} = waveletTransform(IDB{1,i}); % 8 features in each cell
    IglrlFeatures{1,i} = textureFeatures(IDB{1,i}); % 44 features in each cell
    ICW{1,i} = [Icolor{1,i} Iwave{1,i} IglrlFeatures{1,i}];
end
toc
Here the computation time for each function on single image is:
colorMoments(single_image) = Elapsed time is 0.009689 seconds.
waveletTransform(single_image) = Elapsed time is 0.018069 seconds.
textureFeatures(single_image) = Elapsed time is 0.022902 seconds.
l = data set size = 9907 images
Computational times for different data set sizes (l):
l = 10; Elapsed time is 0.402629 seconds.
l = 100; Elapsed time is 2.233971 seconds.
l = 1000; Elapsed time is 21.178395 seconds.
l = 2000; Elapsed time is 44.510071 seconds.
l = 5000; Elapsed time is 111.393866 seconds.
l = 9907; Elapsed time is 238.924998 seconds (approximately 4 minutes).
I want to decrease this computation time. Any suggestions?
Thanks,
Gopi
The measured times indicate a computational complexity of order O(n) in the number of images. I doubt that the order can be reduced for this type of problem, since every image has to be read and processed, so at best we can hope for a constant-factor speedup.
One thing you should look into is whether the code is using multiple processor cores. If not, try restructuring the for-loop to be able to use parfor instead.
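A minimal sketch of that restructuring with parfor, assuming the Parallel Computing Toolbox is available and that path, colorMoments, waveletTransform, and textureFeatures are the same as in the question (each iteration is independent, so the iterations can run in any order):
l = 9907;
Icolor = cell(1,l);
Iwave = cell(1,l);
IglrlFeatures = cell(1,l);
ICW = cell(1,l);
parfor i = 1:l
    img = imread(fullfile(path, [num2str(i) '.jpg'])); % each worker reads its own images
    c = colorMoments(img);        % 6 features
    w = waveletTransform(img);    % 8 features
    t = textureFeatures(img);     % 44 features
    Icolor{i} = c;
    Iwave{i} = w;
    IglrlFeatures{i} = t;
    ICW{i} = [c w t];
end
Note that imread is disk-bound, so the gain depends on how much of the per-image time is computation rather than file I/O.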

Performance/low-level meaning of "a(idx) = []" vs "a = a(~idx)"

As the title says, I wish to know what Matlab does differently between the two options. For the sake of argument, let's say that the matrix a and the index vector idx are large enough that memory matters, and define:
Case A: a(idx) = []
Case B: a = a(~idx)
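Both cases remove the same elements; only how Matlab gets there differs. A tiny example:
a = 10:10:50;               % [10 20 30 40 50]
idx = logical([0 1 0 1 0]); % remove the 2nd and 4th elements
b = a; b(idx) = [];         % Case A: b is [10 30 50]
c = a(~idx);                % Case B: c is [10 30 50]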
My intuition says that Case A performs an in-place deletion: the CPU has to copy elements from their original positions to their new, compacted positions, keep track of the current "end" of the matrix, and later trim the excess memory.
On the other hand, Case B would perform an indexed bulk copy into newly allocated memory.
So Case A is probably slower but less memory-demanding than Case B. Am I assuming right? I don't know; immediately after writing this I feel like Case B needs to perform Case A first... Any ideas?
Thanks in advance
It's interesting, so I decided to take a measurement:
I am using the Windows 64-bit version of Matlab R2016a.
CPU: Core i5-3550 at 3.3GHz.
Memory: 8GB DDR3 1333 (Dual channel).
len = 100000000; %Number of elements in array (make it large enough to be outside of cache memory).
idx = zeros(len, 1, 'logical'); %Initialize idx to all zeros (false).
idx(1:10:end) = 1; %Put 1 in every 10th element of idx.
a = ones(len, 1); %Fill array a with ones.

disp('Measure: a(idx) = [];')
tic
a(idx) = [];
toc

a = ones(len, 1);
disp(' '); disp('Measure: a = a(~idx);')
tic
a = a(~idx);
toc

disp(' '); disp('Measure: not_idx = ~idx;')
tic
not_idx = ~idx;
toc

a = ones(len, 1);
disp(' '); disp('Measure: a = a(not_idx);')
tic
a = a(not_idx);
toc
Result:
Measure: a(idx) = [];
Elapsed time is 1.647617 seconds.
Measure: a = a(~idx);
Elapsed time is 0.732233 seconds.
Measure: not_idx = ~idx;
Elapsed time is 0.032649 seconds.
Measure: a = a(not_idx);
Elapsed time is 0.686351 seconds.
Conclusions:
a = a(~idx) is about twice as fast as a(idx) = [].
The total time of a = a(~idx) roughly equals the time of not_idx = ~idx plus the time of a = a(not_idx).
Matlab is probably calculating ~idx separately, so it consumes more memory.
Memory consumption only matters when physical RAM is fully used, and I think it's negligible here (the ~idx temporary is short-lived).
Neither solution is fully optimized; I estimate that a fully optimized implementation (in C) would be about 10 times faster.
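As a small follow-up on the ~idx temporary: if the same logical mask is used to filter several arrays, you can negate it once and reuse the result (a minimal sketch, with a hypothetical second array b filtered by the same mask):
a = ones(len, 1);
b = rand(len, 1); % hypothetical second array
keep = ~idx;      % compute the negation once
a = a(keep);
b = b(keep);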

One step reduction in MATLAB increases the execution time

I am doing a comparison and performance test of 3 methods to get the index closest to where I click with ginput(). The first method takes the distance from each click and then finds the nearest index in a separate step; the second one does the same but through a for loop; the third is an exact copy of the first but with that one step removed.
ax = subplot(1,1,1)
plot(timestamps, datavalue)
hzoom = zoom(ax);
hzoom.Motion = 'horizontal';
[x, ~] = ginput(2);

%1)
tic;
tmp = abs(bsxfun(@minus, x, datenum(timestamps).'));
[~, idx1] = min(tmp, [], 2);
toc;

%2)
tic;
for r = 1:length(x)
    val = x(r);
    tmp = abs(datenum(datenum(timestamps - val)));
    [~, idx2] = min(tmp);
    closest_indx(r) = idx2;
end
toc;

%3)
tic;
[~, idx3] = min(abs(bsxfun(@minus, x, datenum(timestamps).')), [], 2);
toc;
Now when I look at the results
test1)
Elapsed time is 0.009182 seconds.
Elapsed time is 0.019211 seconds.
Elapsed time is 0.011261 seconds.
test2)
Elapsed time is 0.012625 seconds.
Elapsed time is 0.022681 seconds.
Elapsed time is 0.017999 seconds.
test3)
Elapsed time is 0.013053 seconds.
Elapsed time is 0.020170 seconds.
Elapsed time is 0.015248 seconds.
test4)
Elapsed time is 0.011613 seconds.
Elapsed time is 0.018644 seconds.
Elapsed time is 0.015952 seconds.
It takes less time for the first method even though it has the extra step of taking all the values and placing them into a 'tmp' matrix. Does anyone have a good explanation for this?
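One thing worth checking before reading too much into differences this small: single tic/toc measurements at the millisecond scale are dominated by noise (JIT warm-up, other processes). A sketch using timeit for more stable numbers, assuming timestamps and x are already in the workspace:
tnum = datenum(timestamps).';                     % precompute the serial dates once
f = @() min(abs(bsxfun(@minus, x, tnum)), [], 2); % methods 1 and 3 collapse to this expression
timeit(f, 2)                                      % request two outputs so min also returns the index
Wrapped this way, methods 1 and 3 are literally the same code (the only difference in the original was the named tmp variable), so any remaining gap between them in the tic/toc runs is likely measurement noise rather than the cost of the extra assignment.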

Efficient 3D element-wise operations in MATLAB

Say I have two matrices:
A=50;
B=50;
C=1000;
X = rand(A,B);
Y = rand(A,B,C);
I want to subtract X from each of the C slices of Y. This is a fairly common problem, and I have found three alternative solutions:
% Approach 1: for-loop
tic
Z1 = zeros(size(Y));
for i = 1:C
    Z1(:,:,i) = Y(:,:,i) - X;
end
toc

% Approach 2: repmat
tic
Z2 = Y - repmat(X, [1 1 C]);
toc

% Approach 3: bsxfun
tic
Z3 = bsxfun(@minus, Y, X);
toc
I'm building a program which frequently (i.e., many thousands of times) solves problems like this, so I'm looking for the most efficient solution. Here is a common pattern of results:
Elapsed time is 0.013527 seconds.
Elapsed time is 0.004080 seconds.
Elapsed time is 0.006310 seconds.
The loop is clearly slower, and bsxfun is a little slower than repmat. I find the same pattern when I element-wise multiply (rather than subtract) X against slices of Y, though repmat and bsxfun are a little closer in multiplication.
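For reference, the multiplication variants mentioned above differ only in the operator:
% repmat
tic
M2 = Y .* repmat(X, [1 1 C]);
toc

% bsxfun
tic
M3 = bsxfun(@times, Y, X);
toc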
Increasing the size of the data...
A=500;
B=500;
C=1000;
Elapsed time is 2.049753 seconds.
Elapsed time is 0.570809 seconds.
Elapsed time is 1.016121 seconds.
Here, repmat is the clear winner. I'm wondering if anyone in the SO community has a cool trick up their sleeves to speed this operation up at all.
Depending on your real-world scenario, bsxfun and repmat will each sometimes have an advantage over the other, just like @rayryeng suggested. There is one other option you can consider: MEX. I hard-coded some parameters for better performance here.
#include "mex.h"
#include "matrix.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *A, *B, *C;
int ind_1, ind_2, ind_3, ind_21, ind_32, ind_321, Dims[3] = {500,500,5000};
plhs[0] = mxCreateNumericArray(3, Dims, mxDOUBLE_CLASS, mxREAL);
A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);
C = mxGetPr(plhs[0]);
for ( int ind_3 = 0; ind_3 < 5000; ind_3++)
{
ind_32 = ind_3*250000;
for ( int ind_2 = 0; ind_2 < 500; ind_2++)
{
ind_21 = ind_2*500; // taken out of the innermost loop to save some calculation
ind_321 = ind_32 + ind_21;
for ( int ind_1 = 0 ; ind_1 < 500; ind_1++)
{
C[ind_1 + ind_321] = A[ind_1 + ind_321] - B[ind_1 + ind_21];
}
}
}
}
To use it, type this into the command window (assuming you name the above C file mexsubtract.c):
mex -WIN64 mexsubtract.c
Then you can use it like this:
Z4 = mexsubtract(Y,X);
Here are some test results on my computer using A=500, B=500, C=5000:
(repmat) Elapsed time is 3.441695 seconds.
(bsxfun) Elapsed time is 3.357830 seconds.
(cmex) Elapsed time is 3.391378 seconds.
It's a close contender and in some more extreme case, it'll have an edge. For example, this is what I got with A = 10, B = 500, C = 200000 :
(repmat) Elapsed time is 2.769177 seconds.
(bsxfun) Elapsed time is 3.178385 seconds.
(cmex) Elapsed time is 2.552115 seconds.

Is there a Matlab function to convert elapsed seconds to HH:MM:SS format?

I would like to convert an elapsed number of seconds into HH:MM:SS format. Is there a built-in function for this, or do I have to write my own?
datestr is probably the function you are looking for. Express your time interval as a decimal fraction of a day, for example:
>> datestr(0.25, 'HH:MM:SS.FFF')
ans =
06:00:00.000
That is, one quarter of a day is 6 hours. If you want to transform intervals longer than a day this way you'll have to adjust the second argument, which formats the function's output, for example:
>> datestr(2.256789741, 'DD:HH:MM:SS.FFF')
ans =
02:06:09:46.634
The first argument to datestr could also be a date vector or a date string rather than a serial date number. This should get you started; if you have problems, ask another question or edit this one.
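For example, with a date vector (year, month, day, hour, minute, second; the date here is arbitrary, just for illustration):
>> datestr([2014 2 14 6 30 0], 'HH:MM:SS.FFF')
ans =
06:30:00.000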
--
To convert a time in seconds using datestr, divide the value by 24*60*60.
Sample:
t1 = toc;
timeString = datestr(t1/(24*60*60), 'DD:HH:MM:SS.FFF');
I don't know of a built-in function. However, there is a SEC2HMS function on Matlab's File Exchange. Basically, it boils down to something like:
function [hours, mins, secs] = sec2hms(t)
    hours = floor(t / 3600);
    t = t - hours * 3600;
    mins = floor(t / 60);
    secs = t - mins * 60;
end
If you also want to have it formatted, use sprintf:
function hms = sec2hms(t)
    hours = floor(t / 3600);
    t = t - hours * 3600;
    mins = floor(t / 60);
    secs = t - mins * 60;
    hms = sprintf('%02d:%02d:%05.2f\n', hours, mins, secs);
end
sec2hms(69.9904)
ans =
00:01:09.99
If you want to get the hours, minutes, and seconds as doubles, consider the following line of code:
seconds = 5000;
hms = fix(mod(seconds, [0, 3600, 60]) ./ [3600, 60, 1])
hms =
1 23 20
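Step by step, for seconds = 5000 the expression evaluates as follows (note that mod(x, 0) returns x in MATLAB):
h = fix(mod(5000,    0) / 3600); % mod(5000, 0) = 5000, fix(5000/3600) = 1
m = fix(mod(5000, 3600) /   60); % mod(5000, 3600) = 1400, fix(1400/60) = 23
s = fix(mod(5000,   60) /    1); % mod(5000, 60) = 20
[h m s]                          % 1 23 20, as above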
This line of code is more than 100 times faster than using the built-in datestr function.
nIterations = 10000;
tic
for i = 1:nIterations
    hms = fix(mod(seconds, [0, 3600, 60]) ./ [3600, 60, 1]);
end
sprintf('%f ms\r', toc / nIterations * 1000)
gives 0.001934 ms.
tic
for i = 1:nIterations
    datestr(seconds/24/3600, 'HH:MM:SS');
end
sprintf('%f ms\r', toc / nIterations * 1000)
gives 0.209402 ms.
If you want to start from the original input in seconds, just convert it to a fraction of a day:
datestr(25/24/3600, 'DD-HH:MM:SS')
ans =
00-00:00:25
This gives the result for 25 seconds (as you would get from tic/toc).