I have a 2800x4800 matrix. There is data only in the first column. I want to add data to the rest of the columns as well. The values in a row should continue like this: n = (n-1) + 0.005. I wrote code with a loop and it works; however, it takes too long. How can I write this without a loop?
for j=2:size(Time,2)
Time(:,j) = Time(:,(j-1)) + (1/(Fs*1000));
end
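It could look like the following, replacing the per-column loop with a single assignment that reads columns 1:m-1 and writes columns 2:m (removing the for is what speeds it up). Note that we assume Fs is a constant here. Also note that the right-hand side is evaluated in full before the assignment, so this matches the loop only if columns 1:m-1 already hold the values to be shifted; with data in the first column only, just column 2 comes out the same: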
m = size(Time,2);
Time(:,2:m)= Time(:,1:(m-1))+(1/(Fs*1000));
It's possible to get the same results as your sample code in just one line by writing
Time(:,2:end) = bsxfun(@plus,Time(:,1), (1/(Fs*1000)) .* (1:size(Time,2)-1));
If you have a newer version of Matlab (R2016b or later), you can use implicit expansion and simply write
Time(:,2:end) = Time(:,1) + (1/(Fs*1000)) .* (1:size(Time,2)-1);
But at least on my computer I do not really see any performance improvement by using this vectorization instead of your loop. The JIT compilation has gotten quite a bit better over time, so it would be interesting to know which Matlab version you use.
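To see that the one-line version really reproduces the loop, here is a minimal sketch on a small made-up matrix. The value of Fs, the matrix size, and the names first_col, T_loop and T_vec are assumptions chosen for illustration, not taken from the question.

```matlab
% Hypothetical small example: Fs and the sizes are made up for illustration.
Fs = 0.2;                       % gives a step of 1/(Fs*1000) = 0.005
step = 1/(Fs*1000);
first_col = (1:4)';             % stand-in for the data in column 1
n_cols = 6;

% Reference: the loop from the question (a running sum along each row).
T_loop = zeros(numel(first_col), n_cols);
T_loop(:,1) = first_col;
for j = 2:n_cols
    T_loop(:,j) = T_loop(:,j-1) + step;
end

% Vectorized: column j is column 1 plus (j-1) steps.
T_vec = bsxfun(@plus, first_col, step .* (0:n_cols-1));

% Largest absolute difference between the two results.
max_diff = max(abs(T_loop(:) - T_vec(:)));
```

The two results can differ in the last few bits, because the loop accumulates rounding error step by step while the vectorized form multiplies once, so compare with a tolerance rather than with isequal.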
Related
I have a function f(x,y) = abs(cos(x+3) * sin(y+2)) that I need to sum up using two for loops. Note: the real function is more complex, this is a toy version of it for the purposes of the question.
f = @(x,y) abs(cos(x+3) * sin(y+2));
tot = 0;
for m=1:100
for n=1:100
tot = tot + f(m,n);
end
end
disp(tot)
Output: 4.026314876227891e+03
How can I vectorize this code to get rid of the for loops and make it faster?
f = @(x,y) abs(cos(x+3) .* sin(y+2)); % element-wise .* so that f accepts matrices
[n,m]=meshgrid(1:100,1:100);
tot=sum(f(m,n),'all')
However, I am not sure this is any faster; you can time it. Matlab is quite fast in loops, and the old truth about it being slow whenever you loop has been outdated for 5 years or so. Most of the time the JIT compiler will find the fastest way to run it. This is one of the cases where your toy problem may hide the actual problem, as the JIT may find this toy problem easier to speed up, but not your real one, or vice versa.
You will need to time.
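A minimal timing sketch, assuming the toy function is written with element-wise .* so that it accepts matrices. The names t_loop, t_vec, tot_loop and tot_vec are made up here, and tic/toc gives only a rough single-run measurement.

```matlab
% Toy function, written element-wise so it also works on matrices.
f = @(x,y) abs(cos(x+3) .* sin(y+2));

% Double loop, as in the question.
tic;
tot_loop = 0;
for m = 1:100
    for n = 1:100
        tot_loop = tot_loop + f(m,n);
    end
end
t_loop = toc;

% Vectorized over a grid of all (m,n) pairs.
tic;
[n_grid, m_grid] = meshgrid(1:100, 1:100);
tot_vec = sum(sum(f(m_grid, n_grid)));
t_vec = toc;

fprintf('loop: %.4g s, vectorized: %.4g s\n', t_loop, t_vec);
```

Run it several times, or wrap each variant in a function and average, before drawing conclusions; the first run often includes one-off compilation costs.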
I would like to optimize this piece of Matlab code but so far I have failed. I have tried different combinations of repmat and sums and cumsums, but all my attempts seem to not give the correct result. I would appreciate some expert guidance on this tough problem.
S=1000; T=10;
X=rand(T,S);
X=sort(X,1,'ascend');
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(cc,:)-X(c,:))-(cc-c)/T;
Result=Result+abs(d');
end
end
Basically I create 1000 vectors of 10 random numbers, and for each vector I calculate, for each pair of values (say the mth and the nth), the difference between them minus (n-m)/T, the scaled difference of their indices. I sum the absolute values over all possible pairs and return the result for every vector.
I hope this explanation is clear,
Thanks a lot in advance.
It is at least easy to vectorize your inner loop:
Result=zeros(S,1);
for c=1:T-1
d=(X(c+1:T,:)-X(c,:))-((c+1:T)'-c)./T;
Result=Result+sum(abs(d),1)';
end
Here, I'm using the new automatic singleton expansion. If you have an older version of MATLAB you'll need to use bsxfun for two of the subtraction operations. For example, X(c+1:T,:)-X(c,:) is the same as bsxfun(@minus,X(c+1:T,:),X(c,:)).
What is happening in this bit of code is that instead of looping cc=c+1:T, we take all of those indices at once. So I simply replaced cc with c+1:T. d is then a matrix with multiple rows (9 in the first iteration, and one fewer in each subsequent iteration).
Surprisingly, this is slower than the double loop, and similar in speed to Jodag's answer.
Next, we can try to improve indexing. Note that the code above extracts data row-wise from the matrix. MATLAB stores data column-wise. So it's more efficient to extract a column than a row from a matrix. Let's transpose X:
X=X';
Result=zeros(S,1);
for c=1:T-1
d=(X(:,c+1:T)-X(:,c))-((c+1:T)-c)./T;
Result=Result+sum(abs(d),2);
end
This is more than twice as fast as the code that indexes row-wise.
But of course the same trick can be applied to the code in the question, speeding it up by about 50%:
X=X';
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(:,cc)-X(:,c))-(cc-c)/T;
Result=Result+abs(d);
end
end
My takeaway message from this exercise is that MATLAB's JIT compiler has improved things a lot. Back in the day any sort of loop would grind code to a halt. Today it's not necessarily the worst approach, especially if all you do is use built-in functions.
The nchoosek(v,k) function generates all combinations of the elements in v taken k at a time. We can use this to generate all possible pairs of indices and then use these to vectorize the loops. It appears that in this case the vectorization doesn't actually improve performance (at least on my machine with 2017a). Maybe someone will come up with a more efficient approach.
idx = nchoosek(1:T,2);
d = bsxfun(@minus,(X(idx(:,2),:) - X(idx(:,1),:)), (idx(:,2)-idx(:,1))/T);
Result = sum(abs(d),1)';
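Before timing any of these variants it is worth confirming that they compute the same thing. The following is a minimal sketch on a small random input, comparing the double loop with the nchoosek version; the sizes S = 5 and T = 4 and the names R_loop, R_pairs and max_diff are made up for illustration.

```matlab
% Small hypothetical problem size so the check runs instantly.
S = 5; T = 4;
X = sort(rand(T,S), 1, 'ascend');

% Reference: the double loop over all pairs (c, cc) with c < cc.
R_loop = zeros(S,1);
for c = 1:T-1
    for cc = c+1:T
        d = (X(cc,:) - X(c,:)) - (cc-c)/T;
        R_loop = R_loop + abs(d');
    end
end

% Pair-index version: one row of idx per pair, all pairs handled at once.
idx = nchoosek(1:T, 2);
d = bsxfun(@minus, X(idx(:,2),:) - X(idx(:,1),:), (idx(:,2) - idx(:,1))/T);
R_pairs = sum(abs(d), 1)';

% Largest absolute difference between the two results.
max_diff = max(abs(R_loop - R_pairs));
```

The results are compared with a tolerance because the two versions add the terms in a different order, which can change the last few bits.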
Update: here are the results for the running times for the different proposals (10^5 trials):
So it looks like transposing the matrix is the most effective intervention, and my original double-loop implementation is, amazingly, the best compared to the vectorized versions. However, in my hands (2017a) the improvement is only 16.6% compared to the original using the mean (18.2% using the median).
Maybe there is still room for improvement?
So, I am aware that there are a number of other posts about eliminating for loops but I still haven't been able to figure this out.
I am looking to rewrite my code so that it has fewer for loops and runs a little faster. The code describes an optics problem calculating the intensity of different colors after the light has propagated through a medium. I have already gotten credit for this assignment but I would like to learn of better ways than just throwing in for loops all over the place. I tried rewriting the innermost loop using recursion which worked and looked nice but was a little slower.
Any other comments/improvements are also welcome.
Thanks!
n_o=1.50;
n_eo=1.60;
d=20e-6;
N_skiv=100;
lambda=[650e-9 510e-9 475e-9];
E_in=[1;1]./sqrt(2);
alfa=pi/2/N_skiv;
delta=d/N_skiv;
points=100;
int=linspace(0,pi/2,points);
I_ut=zeros(3,points);
n_eo_theta=@(theta)n_eo*n_o/sqrt(n_o^2*cos(theta)^2+n_eo^2*sin(theta)^2);
hold on
for i=1:3
for j=1:points
J_last=J_pol2(0);
theta=int(j);
for n=0:N_skiv
alfa_n=alfa*n;
J_last=J_ret_uppg2(alfa_n, delta , n_eo_theta(theta) , n_o , lambda(i) ) * J_last;
end
E_ut=J_pol2(pi/2)*J_last*E_in;
I_ut(i,j)=norm(E_ut)^2;
end
end
theta_grad=linspace(0,90,points);
plot(theta_grad,I_ut(1,:),'r')
plot(theta_grad,I_ut(2,:),'g')
plot(theta_grad,I_ut(3,:),'b')
And the functions:
function matris=J_proj(alfa)
matris(1,1)=cos(alfa);
matris(1,2)=sin(alfa);
matris(2,1)=-sin(alfa);
matris(2,2)=cos(alfa);
end
function matris=J_pol2(alfa)
J_p0=[1 0;0 0];
matris=J_proj(-alfa)*J_p0*J_proj(alfa);
end
function matris=J_ret_uppg2(alfa_n,delta,n_eo_theta,n_o,lambda)
k0=2*pi/lambda;
J_r0_u2(1,1)=exp(1i*k0*delta*n_eo_theta);
J_r0_u2(2,2)=exp(1i*k0*n_o*delta);
matris=J_proj(-alfa_n)*J_r0_u2*J_proj(alfa_n);
end
Typically you cannot get rid of a for-loop if you are doing a calculation that depends on a previous answer, which seems to be the case with the J_last-variable.
However I saw at least one possible improvement with the n_eo_theta inline-function, instead of doing that calculation 100 times, you could instead simply change this line:
n_eo_theta=@(theta)n_eo*n_o/sqrt(n_o^2*cos(theta)^2+n_eo^2*sin(theta)^2);
into:
theta_0 = int; % the same angles the loop steps through, i.e. linspace(0,pi/2,points)
n_eo_theta=n_eo*n_o./sqrt(n_o^2*cos(theta_0).^2+n_eo^2*sin(theta_0).^2);
This would run as is, although you would also want to remove the variable "theta" in the for-loop, i.e. simply change
n_eo_theta(theta)
into
n_eo_theta(j)
Using the "." prefix in the calculations (i.e. element-wise operations) is the foremost tool for getting rid of for-loops. For instance, see element-wise multiplication.
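As a small illustration of the element-wise operators, the sketch below evaluates the n_eo_theta expression once per angle in a loop and once for the whole vector of angles. The five sample angles and the names v_loop and v_vec are made up for the example; n_o and n_eo are the values from the question.

```matlab
n_o = 1.50;
n_eo = 1.60;
theta = linspace(0, pi/2, 5);   % a few sample angles, for illustration only

% Scalar version: one angle per iteration.
v_loop = zeros(size(theta));
for k = 1:numel(theta)
    v_loop(k) = n_eo*n_o / sqrt(n_o^2*cos(theta(k))^2 + n_eo^2*sin(theta(k))^2);
end

% Element-wise version: ./ and .^ act on every angle at once.
v_vec = n_eo*n_o ./ sqrt(n_o^2*cos(theta).^2 + n_eo^2*sin(theta).^2);

% Largest absolute difference between the two results.
max_diff = max(abs(v_loop - v_vec));
```

Without the dots, / and ^ would be interpreted as matrix operations on the vector, which either errors or silently computes something else.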
You can use matrices!!!!
For example, you have the statement:
theta=int(j)
which is inside a nested loop. You can replace it by:
theta = [int(1:points);int(1:points);int(1:points)];
or:
theta = int(repmat((1:points), 3, 1));
Then, you have
alfa_n=alfa * n;
you can replace it by:
alfa_n = alfa .* (0:N_skiv);
And have all the calculations done in a row-like fashion. That means that, instead of looping, you will have the values of the loop in a row. Thus, you perform the calculations on the rows using MATLAB's built-in functionality rather than looping.
I have two lists of timestamps and I'm trying to create a map between them that uses the imu_ts as the true time and tries to find the nearest vicon_ts value to it. The output is a 3xd matrix where the first row is the imu_ts index, the third row is the unix time at that index, and the second row is the index of the closest vicon_ts value above the timestamp in the same column.
Here's my code so far and it works, but it's really slow. I'm not sure how to vectorize it.
function tmap = sync_times(imu_ts, vicon_ts)
tstart = max(vicon_ts(1), imu_ts(1));
tstop = min(vicon_ts(end), imu_ts(end));
%trim imu data to
tmap(1,:) = find(imu_ts >= tstart & imu_ts <= tstop);
tmap(3,:) = imu_ts(tmap(1,:));%Use imu_ts as ground truth
%Find nearest indices in vicon data and map
vic_t = 1;
for i = 1:size(tmap,2)
%
while(vicon_ts(vic_t) < tmap(3,i))
vic_t = vic_t + 1;
end
tmap(2,i) = vic_t;
end
The timestamps are already sorted in ascending order, so this is essentially an O(n) operation but because it's looped it runs slowly. Any vectorized ways to do the same thing?
Edit
It appears to be running faster than I expected or first measured, so this is no longer a critical issue. But I would be interested to see if there are any good solutions to this problem.
Have a look at knnsearch in MATLAB. Use cityblock distance and also put an additional constraint that the data point in vicon_ts should be less than its neighbour in imu_ts. If it is not then take the next index. This is required because cityblock takes absolute distance. Another option (and preferred) is to write your custom distance function.
I believe that your current method is sound, and I would not try and vectorize any further. Vectorization can actually be harmful when you are trying to optimize some inner loops, especially when you know more about the context of your data (e.g. it is sorted) than the Mathworks engineers can know.
Things that I typically look for when I need to optimize some piece of code like this are:
All arrays are pre-allocated (this is the biggest driver of performance)
Fast inner loops use simple code (Matlab does pretty effective JIT on basic commands, but must interpret others.)
Take advantage of any special data features that you have, e.g. use sort appropriate algorithms and early exit conditions from some loops.
You're already doing all this. I recommend no change.
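A good start might be to get rid of the while, try something like:
for i = 1:size(tmap,2)
% index of the first vicon_ts value at or above the i-th IMU timestamp
% (find with a limit of 1 is used here in place of the min-based search)
tmap(2,i) = find(vicon_ts >= tmap(3,i), 1);
end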
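To check a loop without the while against the original, here is a minimal sketch on two short made-up timestamp lists. The values in imu_ts and vicon_ts, and the names tmap_while and tmap_find, are assumptions for illustration; the find-based lookup is one possible way to drop the while, not necessarily the fastest.

```matlab
% Hypothetical, already sorted timestamps.
imu_ts   = [1.0 2.0 3.0 4.0];
vicon_ts = [0.5 1.5 2.5 2.7 3.5 4.5];

tstart = max(vicon_ts(1), imu_ts(1));
tstop  = min(vicon_ts(end), imu_ts(end));
keep   = find(imu_ts >= tstart & imu_ts <= tstop);

% Original approach: advance a single pointer through vicon_ts.
tmap_while = zeros(3, numel(keep));
tmap_while(1,:) = keep;
tmap_while(3,:) = imu_ts(keep);
vic_t = 1;
for i = 1:size(tmap_while,2)
    while vicon_ts(vic_t) < tmap_while(3,i)
        vic_t = vic_t + 1;
    end
    tmap_while(2,i) = vic_t;
end

% Without the while: first vicon index at or above each IMU timestamp.
tmap_find = tmap_while;
tmap_find(2,:) = 0;
for i = 1:size(tmap_find,2)
    tmap_find(2,i) = find(vicon_ts >= tmap_find(3,i), 1);
end
```

Note that the find version rescans vicon_ts from the start on every iteration, so it does more work than the single-pointer loop on long sorted inputs; it is shorter, not necessarily faster.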
I have a function which does the following loop many, many times:
for cluster=1:max(bins), % bins is a list in the same format as kmeans() IDX output
select=bins==cluster; % find group of values
means(select,:)=repmat_fast_spec(meanOneIn(x(select,:)),sum(select),1);
% (*, above) for each point, write the mean of all points in x that
% share its label in bins to the equivalent row of means
delta_x(select,:)=x(select,:)-(means(select,:));
%subtract out the mean from each point
end
Noting that repmat_fast_spec and meanOneIn are stripped-down versions of repmat() and mean(), respectively, I'm wondering if there's a way to do the assignment in the line labeled (*) that avoids repmat entirely.
Any other thoughts on how to squeeze performance out of this thing would also be welcome.
Here is a possible improvement to avoid REPMAT:
x = rand(20,4);
bins = randi(3,[20 1]);
d = zeros(size(x));
for i=1:max(bins)
idx = (bins==i);
d(idx,:) = bsxfun(@minus, x(idx,:), mean(x(idx,:)));
end
Another possibility:
x = rand(20,4);
bins = randi(3,[20 1]);
m = zeros(max(bins),size(x,2));
for i=1:max(bins)
m(i,:) = mean( x(bins==i,:) );
end
dd = x - m(bins,:);
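A tiny worked case makes the indexing trick m(bins,:) easier to follow: it builds a matrix with one row per point, each row being the mean of that point's cluster, so no repmat is needed. The data below is made up for illustration, and the explicit dimension argument in mean(...,1) is an addition here so that a cluster with a single point is still averaged column-wise.

```matlab
% Four points in 2-D, two clusters (hypothetical data).
x    = [1 2; 3 4; 5 6; 7 8];
bins = [1; 2; 1; 2];

% One mean row per cluster.
m = zeros(max(bins), size(x,2));
for i = 1:max(bins)
    m(i,:) = mean(x(bins==i,:), 1);   % dimension 1: average down the rows
end

% m(bins,:) repeats each cluster mean once per member point.
dd = x - m(bins,:);
```

Here m is [3 4; 5 6], so rows 1 and 3 of x have [3 4] subtracted and rows 2 and 4 have [5 6] subtracted.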
One obvious way to speed up calculation in MATLAB is to make a MEX file. You can compile C code and perform any operations you want. If you're searching for the fastest-possible performance, turning the operation into a custom MEX file would likely be the way to go.
You may be able to get some improvement by using ACCUMARRAY.
%# gather array sizes
[nPts,nDims] = size(x);
nBins = max(bins);
%# calculate means. Not sure whether it might be faster to loop over nDims
meansCell = accumarray(bins,1:nPts,[nBins,1],@(idx){mean(x(idx,:),1)},{NaN(1,nDims)});
means = cell2mat(meansCell);
%# subtract cluster means from x - this is how you can avoid repmat in your code, btw.
%# all you need is the array with cluster means.
delta_x = x - means(bins,:);
First of all: format your code properly, surround any operator or assignment by whitespace. I find your code very hard to comprehend as it looks like a big blob of characters.
Next, you could follow the other responses and convert the code to C (mex) or Java, automatically or manually, but in my humble opinion this is a last resort. You should only do such things when your performance is not there yet by a small margin. On the other hand, your algorithm doesn't show obvious flaws.
But the first thing you should do when trying to improve performance: profile. Use the MATLAB profiler to determine which part of your code is causing your problems. How much would you need to improve this to meet your expectations? If you don't know: first determine this boundary, otherwise you will be looking for a needle in a haystack which might not even be in there in the first place. MATLAB will never be the fastest kid on the block with respect to runtime, but it might be the fastest with respect to development time for certain kinds of operations. In that respect, it might prove useful to sacrifice the clarity of MATLAB for the execution speed of other languages (C or even Java). But in the same respect, you might as well code everything in assembler to squeeze all of the performance out of the code.
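A minimal sketch of a profiling session around the mean-subtraction loop. The data is random stand-in data, the bsxfun form of the loop body is borrowed from one of the other answers rather than from the question, and the name info is made up here.

```matlab
% Stand-in data: 2000 points in 4-D, three clusters.
x    = rand(2000, 4);
bins = randi(3, [2000 1]);

profile on                      % start collecting timing data
delta_x = zeros(size(x));
for cluster = 1:max(bins)
    select = (bins == cluster);
    % subtract the cluster mean from every point in the cluster
    delta_x(select,:) = bsxfun(@minus, x(select,:), mean(x(select,:), 1));
end
profile off                     % stop collecting

info = profile('info');         % struct with the collected statistics
```

In MATLAB, running profile viewer afterwards opens the report that shows time per function and per line; that is where to look before deciding whether a MEX or Java rewrite is worth the effort.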
Another obvious way to speed up calculation in MATLAB is to make a Java library (similar to @aardvarkk's answer) since MATLAB is built on Java and has very good integration with user Java libraries.
Java's easier to interface and compile than C. It might be slower than C in some cases, but the just-in-time (JIT) compiler in the Java virtual machine generally speeds things up very well.