I have been happily using MATLAB to solve some project Euler problems. Yesterday, I wrote some code to solve one of these problems (14). When I write code containing long loops I always test the code by running it with short loops. If it runs fine and it does what it's supposed to do I assume this will also be the case when the length of the loop is longer.
This assumption turned out to be wrong. While executing the code below, MATLAB ran out of memory somewhere around the 75000th iteration.
c=1;
e=1000000;
for s=c:e
n=s;
t=1;
while n>1
a(s,t)=n;
if mod(n,2) == 0
n=n/2;
else
n=3*n+1;
end
a(s,t+1)=n;
t=t+1;
end
end
What can I do to prevent this from happening? Do I need to clear variables or free up memory somewhere in the process? Will saving the resulting matrix a to the hard drive help?
Here is the solution, staying as close as possible to your code (which is very close, the main difference is that you only need a 1D matrix):
c=1;
e=1000000;
a=zeros(e,1);
for s=c:e
n=s;
t=1;
while n>1
if mod(n,2) == 0
n=n/2;
else
n=3*n+1;
end
t=t+1;
end
a(s)=t;
end
[f g]=max(a);
This takes a few seconds (note the preallocation), and the result g unlocks the Euler 14 door.
Simply put, there's not enough memory to hold the matrix a.
Why are you making a two-dimensional matrix here anyway? You're storing information that you can compute just as fast as looking it up.
There's a much better thing to memoize here.
EDIT: Looking again, you're not even using the stuff you put in that matrix! Why are you bothering to create it?
The code appears to be storing every sequence in a different row of a matrix. The number of columns of that matrix will be equal to the length of the longest sequence currently found. This means that a sequence of two numbers will be padded with a bunch of right hand zeros.
I am sure you can see how this is incredibly inefficient. That may be the point of the exercise, or it will be for you in this implementation.
Better is to keep a variable like "Seed of longest solution found" which would store the seed for the longest solution. I would also keep a "length of longest solution found" keep the length. As you try every new seed, if it wins the title of longest, then update those variables.
This will keep only what you need in memory.
Short Answer:Use a 2d sparse matrix instead.
Long Answer: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/sparse.html
Related
I would like to optimize this piece of Matlab code but so far I have failed. I have tried different combinations of repmat and sums and cumsums, but all my attempts seem to not give the correct result. I would appreciate some expert guidance on this tough problem.
S=1000; T=10;
X=rand(T,S),
X=sort(X,1,'ascend');
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(cc,:)-X(c,:))-(cc-c)/T;
Result=Result+abs(d');
end
end
Basically I create 1000 vectors of 10 random numbers, and for each vector I calculate for each pair of values (say the mth and the nth) the difference between them, minus the difference (n-m). I sum over of possible pairs and I return the result for every vector.
I hope this explanation is clear,
Thanks a lot in advance.
It is at least easy to vectorize your inner loop:
Result=zeros(S,1);
for c=1:T-1
d=(X(c+1:T,:)-X(c,:))-((c+1:T)'-c)./T;
Result=Result+sum(abs(d),1)';
end
Here, I'm using the new automatic singleton expansion. If you have an older version of MATLAB you'll need to use bsxfun for two of the subtraction operations. For example, X(c+1:T,:)-X(c,:) is the same as bsxfun(#minus,X(c+1:T,:),X(c,:)).
What is happening in the bit of code is that instead of looping cc=c+1:T, we take all of those indices at once. So I simply replaced cc for c+1:T. d is then a matrix with multiple rows (9 in the first iteration, and one fewer in each subsequent iteration).
Surprisingly, this is slower than the double loop, and similar in speed to Jodag's answer.
Next, we can try to improve indexing. Note that the code above extracts data row-wise from the matrix. MATLAB stores data column-wise. So it's more efficient to extract a column than a row from a matrix. Let's transpose X:
X=X';
Result=zeros(S,1);
for c=1:T-1
d=(X(:,c+1:T)-X(:,c))-((c+1:T)-c)./T;
Result=Result+sum(abs(d),2);
end
This is more than twice as fast as the code that indexes row-wise.
But of course the same trick can be applied to the code in the question, speeding it up by about 50%:
X=X';
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(:,cc)-X(:,c))-(cc-c)/T;
Result=Result+abs(d);
end
end
My takeaway message from this exercise is that MATLAB's JIT compiler has improved things a lot. Back in the day any sort of loop would halt code to a grind. Today it's not necessarily the worst approach, especially if all you do is use built-in functions.
The nchoosek(v,k) function generates all combinations of the elements in v taken k at a time. We can use this to generate all possible pairs of indicies then use this to vectorize the loops. It appears that in this case the vectorization doesn't actually improve performance (at least on my machine with 2017a). Maybe someone will come up with a more efficient approach.
idx = nchoosek(1:T,2);
d = bsxfun(#minus,(X(idx(:,2),:) - X(idx(:,1),:)), (idx(:,2)-idx(:,1))/T);
Result = sum(abs(d),1)';
Update: here are the results for the running times for the different proposals (10^5 trials):
So it looks like the transformation of the matrix is the most efficient intervention, and my original double-loop implementation is, amazingly, the best compared to the vectorized versions. However, in my hands (2017a) the improvement is only 16.6% compared to the original using the mean (18.2% using the median).
Maybe there is still room for improvement?
So, I've recently started using Matlab's built-in profiler on a regular basis, and I've noticed that while its usually great at showing which lines are taking up the most time, sometimes it'll tell me a large chunk of time is being used on the end statement of a for loop.
Now, seeing as such a line is just used for denoting the end of the loop, I can't imagine how it could use anything other than a trivial amount of processing.
I've seen a specific version of this question asked on matlab central, but a consensus didn't seem to be reached.
EDIT: Here's a minimal example of this problem:
for i =1:1000
x = 1;
x = [x 1];
% clear x;
end
Even if you uncomment the clear, the end line still takes up a lot of computation (about 20%), and the clear actually increases the absolute amount of computation performed by the end line.
When I've seen this in my code, it's been the deallocation of large temporaries created in the loop. Each new variable created in the loop is deallocated at the end.
I wrote a MATLAB code for finding seismic signal (ex. P wave) from SAC(seismic) file (which is read via another code). This algorithm is called STA/LTA trigger algorithm (actually not that important for my question)
Important thing is that actually this code works well, but since my seismic file is too big (1GB, which is for two months), it takes almost 40 minutes for executing to see the result. Thus, I feel the need to optimize the code.
I heard that replacing loops with advanced functions would help, but since I am a novice in MATLAB, I cannot get an idea about how to do it, since the purpose of code is scan through the every time series.
Also, I heard that preallocation might help, but I have mere idea about how to actually do this.
Since this code is about seismology, it might be hard to understand, but my notes at the top might help. I hope I can get useful advice here.
Following is my code.
function[pstime]=classic_LR(tseries,ltw,stw,thresh,dt)
% This is the code for "Classic LR" algorithm
% 'ns' is the number of measurement in STW-used to calculate STA
% 'nl' is the number of measurement in LTW-used to calculate LTA
% 'dt' is the time gap between measurements i.e. 0.008s for HHZ and 0.02s for BHZ
% 'ltw' and 'stw' are long and short time windows respectively
% 'lta' and 'sta' are long and short time windows average respectively
% 'sra' is the ratio between 'sta' and 'lta' which will be each component
% for a vector containing the ratio for each measurement point 'i'
% Index 'i' denotes each measurement point and this will be converted to actual time
nl=fix(ltw/dt);
ns=fix(stw/dt);
nt=length(tseries);
aseries=abs(detrend(tseries));
sra=zeros(1,nt);
for i=1:nt-ns
if i>nl
lta=mean(aseries(i-nl:i));
sta=mean(aseries(i:i+ns));
sra(i)=sta/lta;
else
sra(i)=0;
end
end
[k]=find(sra>thresh);
if ~isempty(k)
pstime=k*dt;
else
pstime=0;
end
return;
If you have MATLAB 2016a or later, you can use movmean instead of your loop (this means you also don't need to preallocate anything):
lta = movmean(aseries(1:nt-ns),nl+1,'Endpoints','discard');
sta = movmean(aseries(nl+1:end),ns+1,'Endpoints','discard');
sra = sta./lta;
The only difference here is that you will get sra with no leading and trailing zeros. This is most likely to be the fastest way. If for instance, aseries is 'only' 8 MB than this method takes less than 0.02 second while the original method takes almost 6 seconds!
However, even if you don't have Matlab 2016a, considering your loop, you can still do the following:
Remove the else statement - sta(i) is already zero from the preallocating.
Start the loop from nl+1, instead of checking when i is greater than nl.
So your new loop will be:
for i=nl+1:nt-ns
lta = mean(aseries(i-nl:i));
sta = mean(aseries(i:i+ns));
sra(i)=sta/lta;
end
But it won't be so faster.
I am trying to avoid the for loops and I have been reading through all the old posts there are about it but I am not able to solve my problem. I am new in MATLAB, so apologies for my ignorance.
The thing is that I have a 300x2 cell and in each one I have a 128x128x256 matrix. Each one is an image with 128x128 pixels and 256 channels per pixel. In the first column of the 300x2 cell I have my parallel intensity values and in the second one my perpendicular intensity values.
What I want to do is to take every pixel of every image (for each component) and sum the intensity values channel by channel.
The code I have is the following:
Image_par_channels=zeros(128,128,256);
Image_per_channels=zeros(128,128,256);
Image_tot_channels=zeros(128,128,256);
for a=1:128
for b=1:128
for j=1:256
for i=1:numfiles
Image_par_channels(a,b,j)=Image_par_channels(a,b,j)+Image_cell_par_per{i,1}(a,b,j);
Image_per_channels(a,b,j)=Image_per_channels(a,b,j)+Image_cell_par_per{i,2}(a,b,j);
end
Image_tot_channels(a,b,j)=Image_par_channels(a,b,j)+2*G*Image_per_channels(a,b,j);
end
end
end
I think I could speed it up introducing (:,:,j) instead of specifying a and b. But still a for loop. I am trying to use cellfun without any success due to my lack of expertise. Could you please give me a hand?
I would really appreciate it.
Many thanks and have a nice day!
Y
I believe you could do something like
Image_par_channels=zeros(128,128,256);
Image_per_channels=zeros(128,128,256);
Image_tot_channels=zeros(128,128,256);
for i=1:numfiles
Image_par_channels = Image_par_channels + Image_cell_par_per{i,1};
Image_per_channels = Image_per_channels + Image_cell_par_per{i,2};
end
Image_tot_channels = Image_par_channels + 2*G*Image_per_channels;
I haven't work with matlab in a long time, but I seem to recall you can do something like this. g is a constant.
EDIT:
Removed the +=. Incremental assignment is not an operator available in matlab. You should also note that Image_tot_channels can be build directly in the loop, if you don't need the other two variables later.
I am working on a big Matlab testbench with thousands of lines of code, and I am trying to optimize the most time-consuming routines, determined via the profiler in Matlab.
I noticed that one of those most time-consuming operations is the following:
list = list((list(:,1) >= condxMin) & (list(:,1) <= condxMax) & (list(:,2) >= condyMin) & (list(:,2) <= condyMax),:);
Concretely, I have a big list of coordinates (50000 x 2 at least) and I want to restrict the values of this list so as to keep only the points that verify both of these conditions :
list(:,1) must be within [condxMin, condxMax] and list(:2) within [condyMin condyMax].
I was wondering if there was a more efficient way to do it, considering that this line of code is already vectorized.
Also, I am wondering if Matlab does a short-circuiting or not. If it doesn't, then I don't think there is a way to do it without breaking the vectorization and do it with a while loop, where I would write instead something like this:
j=1;
for i=1:size(list,1)
if(cond1 && cond2 && cond3 && cond4)
newlist(j,1:2) = list(i,1:2);
j=j+1;
end
end
Thank you in advance for your answer :)
Looks like the original vectorized version is the fastest way I can find, barring any really clever ideas. Matlab does do short circuiting, but not for matrices. The loop implementation you you showed would be very slow, since you're not pre-allocating (nor are you able to pre-allocate the full matrix).
I tried a couple of variations on this, including a for loop which used a short circuited && to determine whether the index was bad or not, but no such luck. On the plus side, the vectorized version you've got runs at 0.21s for a 5 million element coordinate list.