Rearranging operations to optimize and vectorize nested looping algorithm - matlab

I have a series of nested loops that store data in a cell array. I am trying to find ways to speed up the loops and also to improve their readability. I have already optimized the loops a fair bit, but would like to see if I could vectorize them further. My original code looked like this:
%% ORIGINAL LOOP
for iA = 1:length(arrA)
    for iB = 1:length(arrB)
        for iC = 1:length(arrC)
            a = arrA(iA); % depends only on iA
            a_x = AData.x(AData.a==a);
            a_y = AData.y(AData.a==a);
            b = arrB(iB); % depends only on iB
            b_x = BData.x(BData.b==b);
            b_y = BData.y(BData.b==b);
            c = arrC(iC); % depends only on iC
            FinalData{iA,iB,iC} = computedata(a_x, a_y, b_x, b_y, c);
        end
    end
end
Since the calculations for a, a_x and a_y depend only on iA, I pulled them out of the inner loops, and did the same for the other variables, which improved performance significantly:
%% FASTER LOOP
for iA = 1:length(arrA)
    a = arrA(iA);
    a_x = AData.x(AData.a==a);
    a_y = AData.y(AData.a==a);
    for iB = 1:length(arrB)
        b = arrB(iB);
        b_x = BData.x(BData.b==b);
        b_y = BData.y(BData.b==b);
        for iC = 1:length(arrC)
            c = arrC(iC);
            FinalData{iA,iB,iC} = computedata(a_x, a_y, b_x, b_y, c);
        end
    end
end
I am wondering if there is a still better way to speed up this process, perhaps through MATLAB vectorization (eliminating the loops altogether).
I would also like to make the code more compact, and to make it easier to rearrange the order of the loops if need be, for other functions I plan to write that plot things in various orders. Any tips would be greatly appreciated.
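A minimal sketch of one way to address both goals, assuming computedata itself cannot be vectorized: compute each a- and b-dependent subset once with arrayfun, then run a single flat loop over every (iA, iB, iC) combination using ind2sub. AData, BData, the arr* vectors and the body of computedata below are hypothetical stand-ins, not the asker's data.

```matlab
% Hypothetical stand-ins for the question's data (assumptions, not from the post)
AData.a = [1 1 2 2 3]';  AData.x = (1:5)';   AData.y = (11:15)';
BData.b = [7 8 8]';      BData.x = (21:23)'; BData.y = (31:33)';
arrA = [1 2 3]; arrB = [7 8]; arrC = [0.5 2];
% Hypothetical computedata: any function of the five inputs would do here
computedata = @(ax, ay, bx, by, c) c * (sum(ax) + sum(ay) + sum(bx) + sum(by));

% Precompute every subset once, so each logical mask is evaluated once per value
ax = arrayfun(@(a) AData.x(AData.a == a), arrA, 'UniformOutput', false);
ay = arrayfun(@(a) AData.y(AData.a == a), arrA, 'UniformOutput', false);
bx = arrayfun(@(b) BData.x(BData.b == b), arrB, 'UniformOutput', false);
by = arrayfun(@(b) BData.y(BData.b == b), arrB, 'UniformOutput', false);

% One flat loop over all combinations; linear index k matches FinalData{iA,iB,iC}
sz = [numel(arrA), numel(arrB), numel(arrC)];
FinalData = cell(sz);
for k = 1:prod(sz)
    [iA, iB, iC] = ind2sub(sz, k);
    FinalData{k} = computedata(ax{iA}, ay{iA}, bx{iB}, by{iB}, arrC(iC));
end

% A different ordering, e.g. (C, A, B), is then a single permute
FinalData_CAB = permute(FinalData, [3 1 2]);
```

Because the result is one cell array indexed (iA, iB, iC), plotting in another order becomes a permute call rather than a rewritten set of nested loops.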

Related

Parallelize nested loops in Matlab

I'm trying to speed up the simulation of some panel data in MATLAB. I have to simulate first over individuals (loop index ii from 1 to N) and then, for each individual, over age (loop index jj from 1 to JJ). The code is slow because every iteration of the inner loop performs a bilinear interpolation.
Since the iterations of the outer loop are independent, I tried to use parfor on the outer loop (the loop indexed by ii), but I get the error message "the parfor cannot run due to the way the variable h_sim is used". Could someone explain why, and how to solve the problem if possible? Any help is greatly appreciated!
a_sim = zeros(Nsim,JJ);
h_sim = zeros(Nsim,JJ);
% Find point on a_grid corresponding to zero assets
aa0 = find_loc(a_grid,0.0);
% Zero housing
hh0 = 1;
a_sim(:,1) = a_grid(aa0);
h_sim(:,1) = h_grid(hh0);
parfor ii=1:Nsim % illegal
    for jj=1:JJ-1
        z_c = z_sim_ind(ii,jj);
        apol_interp = griddedInterpolant({a_grid,h_grid},apol(:,:,z_c,jj));
        hpol_interp = griddedInterpolant({a_grid,h_grid},hpol(:,:,z_c,jj));
        a_sim(ii,jj+1) = apol_interp(a_sim(ii,jj),h_sim(ii,jj));
        h_sim(ii,jj+1) = hpol_interp(a_sim(ii,jj),h_sim(ii,jj));
    end
end
I think @Ben Voigt's suggestion was correct. To spell it out, do something like this:
parfor ii=1:Nsim
    a_sim_row = a_sim(ii,:);   % copy this individual's row (sliced read)
    h_sim_row = h_sim(ii,:);
    for jj=1:JJ-1
        z_c = z_sim_ind(ii,jj);
        apol_interp = griddedInterpolant({a_grid,h_grid},apol(:,:,z_c,jj));
        hpol_interp = griddedInterpolant({a_grid,h_grid},hpol(:,:,z_c,jj));
        a_sim_row(jj+1) = apol_interp(a_sim_row(jj),h_sim_row(jj));
        h_sim_row(jj+1) = hpol_interp(a_sim_row(jj),h_sim_row(jj));
    end
    a_sim(ii,:) = a_sim_row;   % write the whole row back (sliced write)
    h_sim(ii,:) = h_sim_row;
end
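This is a fairly standard parfor pattern for working around the limitation: extract a whole slice, do whatever is needed, then put the whole slice back. In the original code, a_sim and h_sim are each indexed both as (ii,jj) and as (ii,jj+1), so parfor cannot classify them as sliced variables and therefore cannot tell that what you're doing is order-independent as far as the outer loop is concerned. The row copies are temporary variables, private to each iteration, so they can be indexed freely.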
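A minimal, self-contained sketch of the same row-slice pattern. The recurrence x_row(jj+1) = 0.5*x_row(jj) + ii is a made-up stand-in for the two griddedInterpolant calls, chosen only so the code runs without the asker's grids and policy arrays; without Parallel Computing Toolbox, parfor simply runs as an ordinary for loop.

```matlab
% Hypothetical sizes and recurrence (assumptions, not the asker's model)
Nsim = 4; JJ = 5;
x_sim = zeros(Nsim, JJ);
x_sim(:,1) = 1;                            % initial condition for every individual

parfor ii = 1:Nsim
    x_row = x_sim(ii,:);                   % sliced read: index list is always (ii,:)
    for jj = 1:JJ-1
        x_row(jj+1) = 0.5*x_row(jj) + ii;  % temporary variable, free to index
    end
    x_sim(ii,:) = x_row;                   % sliced write with the same index list
end
```

The key design point is that x_sim only ever appears as x_sim(ii,:), which is what lets parfor distribute its rows across workers.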

How can I avoid this for-loop in spite of every element having to be checked individually?

Using MATLAB R2019a, is there any way to avoid the for-loops in the following code, even though each element has to be checked individually? M is a vector of indices, and Inpts.payout is a 5-D array of numerical data.
for m = 1:length(M)-1
    for power = 1:noScenarios
        for production = 1:noScenarios
            for inflation = 1:noScenarios
                for interest = 1:noScenarios
                    if Inpts.payout(M(m),power,production,inflation,interest)<0
                        Inpts.payout(M(m+1),power,production,inflation,interest)=...
                            Inpts.payout(M(m+1),power,production,inflation,interest)...
                            +Inpts.payout(M(m),power,production,inflation,interest);
                        Inpts.payout(M(m),power,production,inflation,interest)=0;
                    end
                end
            end
        end
    end
end
It is quite simple to remove the inner four loops. This will be more efficient unless Inpts.payout is huge, since a new logical indexing array must be generated on every iteration.
The following code extracts the two relevant 'planes' from the input data, does the logic on them, then writes them back:
for m = 1:length(M)-1
payout_m = Inpts.payout(M(m),:,:,:,:);
payout_m1 = Inpts.payout(M(m+1),:,:,:,:);
indx = payout_m < 0;
payout_m1(indx) = payout_m1(indx) + payout_m(indx);
payout_m(indx) = 0;
Inpts.payout(M(m),:,:,:,:) = payout_m;
Inpts.payout(M(m+1),:,:,:,:) = payout_m1;
end
It is possible to avoid extracting the 'planes' and writing them back by working directly on the input data array, but this yields more complex code.
We can, however, easily avoid some of the indexing operations by carrying the updated plane over to the next iteration:
payout_m = Inpts.payout(M(1),:,:,:,:);
for m = 1:length(M)-1
    payout_m1 = Inpts.payout(M(m+1),:,:,:,:);
    indx = payout_m < 0;
    payout_m1(indx) = payout_m1(indx) + payout_m(indx);
    payout_m(indx) = 0;
    Inpts.payout(M(m),:,:,:,:) = payout_m;
    payout_m = payout_m1;
end
Inpts.payout(M(end),:,:,:,:) = payout_m; % write back the last plane (also safe if M has one element)
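A sketch of a variation, offered as an alternative rather than as the answer's method: reshaping Inpts.payout into a matrix with one row per first-dimension index turns each 'plane' into a row, so the 5-D colon indexing disappears. The test data and M below are made up for illustration; the result should match the original nested loops element for element.

```matlab
% Hypothetical test data (assumption): 6 indices in dim 1, noScenarios = 3
noScenarios = 3;
Inpts.payout = reshape(sin(1:6*noScenarios^4), ...
    [6, noScenarios, noScenarios, noScenarios, noScenarios]);
M = [2 4 5 6];   % hypothetical index vector

% View the 5-D array as a matrix: row r holds every element whose first index is r
sz = size(Inpts.payout);
P = reshape(Inpts.payout, sz(1), []);
for m = 1:numel(M)-1
    neg = P(M(m),:) < 0;                          % negative entries in plane M(m)
    P(M(m+1),neg) = P(M(m+1),neg) + P(M(m),neg);  % carry them into plane M(m+1)
    P(M(m),neg) = 0;                              % and zero them in plane M(m)
end
Inpts.payout = reshape(P, sz);                    % restore the original 5-D shape
```

The loop over m has to stay, because each step depends on the carry produced by the previous one.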
It seems like there is no way to avoid this. I am assuming that each for loop independently changes a parameter used in the main calculation, so all of these loops are required. My only suggestion is to wrap the nested loops in a function if you're concerned about appearance; I'm not sure whether this will help run time.

Declaring a vector in matlab whose size we don't know

Suppose we are running an infinite for loop in MATLAB, and we want to store the value from each iteration in a vector. How can we declare the vector without knowing its size?
z=??
for i=1:inf
z(i,1)=i;
if(condition)%%condition is met then break out of the loop
break;
end;
end;
Please note first that this is bad practice, and you should preallocate where possible.
That being said, using the end keyword is the best option for extending arrays by a single element:
z = [];
for ii = 1:x
z(end+1, 1) = ii; % Index to the (end+1)th position, extending the array
end
You can also concatenate the new value onto the results of previous iterations; this tends to be slower, since the assignment variable appears on both sides of the equals sign:
z = [];
for ii = 1:x
z = [z; ii];
end
Sadar commented that directly indexing out of bounds (as other answers suggest) is deprecated by MathWorks; I'm not sure of a source for this.
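To make the question's loop concrete, here is a minimal runnable sketch of the end+1 approach using while true instead of for i = 1:inf. The stopping rule (stop once the running total of the stored values exceeds 20) is a made-up stand-in for condition.

```matlab
% Hypothetical stopping rule (assumption): stop once the running total exceeds 20
z = zeros(0, 1);          % empty 0-by-1 column, so z(end+1,1) grows it as a column
total = 0;
n = 0;
while true
    n = n + 1;
    z(end+1, 1) = n;      % append one element at the end
    total = total + n;
    if total > 20
        break;
    end
end
```

Here the loop stops after six values (1+2+...+6 = 21), leaving z as the 6-by-1 column [1; 2; 3; 4; 5; 6].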
If your condition computation is separate from the output computation, you could compute the required size first:
k = 0;
condition = false;       % must be defined before the first ~condition test
while ~condition
    condition = true;    % evaluate the condition here
    k = k + 1;
end
z = zeros( k, 1 );       % now we can pre-allocate
for ii = 1:k
    z(ii) = ii;          % assign values
end
Depending on your use case, you might not know the actual number of iterations (and therefore of vector elements), but you might know the maximum possible number of iterations. As said before, resizing a vector in every loop iteration can be a real performance bottleneck, so you might consider something like this:
maxNumIterations = 12345;
myVector = zeros(maxNumIterations, 1);
vecLength = maxNumIterations; % used if the condition is never met
for n = 1:maxNumIterations
    myVector(n) = someFunctionReturningTheDesiredValue(n);
    if(condition)
        vecLength = n;
        break;
    end
end
% Resize the vector to the length that has actually been filled
myVector = myVector(1:vecLength);
By the way, I'd advise you NOT to get used to using i as an index in MATLAB programs, as this masks the imaginary unit i. I ran into some nasty bugs in complex calculations inside loops by doing so, so I would advise using n, or any other letter of your choice, as your go-to loop index variable name, even if you are not dealing with complex values in your functions ;)
You can just declare an empty matrix with
z = []
This will create a 0x0 matrix which will resize when you write data to it.
In your case it will grow to an i-by-1 vector.
Keep in mind that this is much slower than initializing your vector beforehand with the zeros(dim,dim) function.
So if there is any way to figure out the maximum value of i, you should initialize it with z = zeros(i,1).
cheers,
Simon
You can initialize z to be an empty array; it will expand automatically during looping, something like:
z = [];
for i = 1:Inf
z(i) = i;
if (condition)
break;
end
end
However, this looks nasty (and throws a warning: Warning: FOR loop index is too large. Truncating to 9223372036854775807). I would use a while (true) loop, or loop on the condition itself, and increment manually:
z = [];
i = 0;
while ~condition
    i = i + 1;
    z(i) = i;
end
And/or, if your example really is what you need in the end, replace the element-by-element growth of the array with something like:
while ~condition
    i = i + 1;
end
z = 1:i;
As mentioned several times in this thread, resizing an array is very processing-intensive and can take a lot of time.
If processing time is not an issue:
Then something like what @Wolfie mentioned would be good enough. In each iteration the array length is increased, and that is that:
z = [];
for ii = 1:x
%z = [z; ii];
    z(end+1) = ii; % Best way
end
If processing time is an issue:
If processing time is a large factor and you want the code to run as smoothly as possible, then you need to preallocate. If you have a rough idea of the maximum number of iterations that will run, you can use @PluginPenguin's suggestion. But there is still a chance of hitting that preset limit, which will break (or severely slow down) the program.
My suggestion:
If your loop is running infinitely until you stop it, you could do occasional resizing. Essentially extending the size as you go, but only doing it once in a while. For example every 100 loops:
z = zeros(100,1);
for i=1:inf
    z(i,1)=i;
    fprintf("%d,\t%d\n",i,length(z)); % See it working
    if i+1 >= length(z) % The array has run out of space
        %z = [z; zeros(100,1)]; % Extend this array (note the semi-colon)
        z((length(z)+100),1) = 0; % Seems twice as fast as the commented method
    end
    if(condition) %% condition is met, so break out of the loop
        break;
    end
end
This means that the loop can run forever and the array grows with it, but only once in a while, so the processing-time hit is minimal.
Edit:
As @Cris kindly mentioned, MATLAB already does what I proposed internally. This makes two of my comments completely wrong, so the best option is to follow what @Wolfie and @Cris said:
z(end+1) = i
Hope this helps!

Is there a more elegant way to write these loops?

I have a script that requires a handful of parameters to run. I'm interested in exploring the results as the parameters change, so I define a few scan arrays at the top, wrap the whole code in multiple for loops and set the parameters values to the current scan values.
This is error-prone and inelegant. The process for changing the code is: 1) reset the scan variables at the top, 2) comment out e.g. b = scan2(j2), and 3) uncomment b = b0.
What's a better method to allow variables to be set to arrays, and subsequently run the code for all such combinations? Example of my code now:
close all
clear all
%scan1 = linspace(1,4,10);
scan1 = 0;
scan2 = linspace(0,1,10);
scan3 = linspace(-1,0,10);
for j3 = 1:length(scan3)
    for j2 = 1:length(scan2)
        for j1 = 1:length(scan1)
            a = a0;
            %b = scan2(j2);
            b = b0;
            %c = c0;
            c = scan3(j3);
            d = scan2(j2);
            %(CODE BLOCK THAT DEPENDS ON variables a,b,c,d...)
        end
    end
end
Based on this idea of using one for loop to simulate multiple loops, I tried to adapt it to your case. While it offers good memory efficiency and usability, this solution is slower than using individual for loops.
%define your parameters
p.a = 1;
p.b = linspace(1,4,4);
p.c = linspace(11,15,5);
p.d = linspace(101,104,4);
p.e = 5;
iterations=structfun(@numel,p);
iterator=cell(1,numel(iterations));
for jx = 1:prod(iterations)
    [iterator{:}]=ind2sub(iterations(:).',jx);%.'
    %This line uses iterator to extract the corresponding elements of p and creates a struct that contains only scalars.
    q=cell2struct(cellfun(@(a,b)(a(b)),struct2cell(p),iterator(:),'uniform',false),fieldnames(p));
    %__ (CODE THAT DEPENDS ON q.a to q.e here) __
end
In the scenarios I tested, it adds a computation overhead of less than 0.0002 s per iteration, i.e. at most 0.0002*prod(iterations) s in total.
One method is to build arrays that contain all the parameter combinations, using ndgrid, and index them with a single linear index. For a sufficiently large parameter scan this may become a memory concern, but otherwise it is much cleaner, requiring only a single loop and no re-assignments later in the code:
a0vec = 1;
b0vec = linspace(1,4,4);
c0vec = linspace(11,15,5);
d0vec = linspace(101,104,4);
e0vec = 5;
[a0s,b0s,c0s,d0s,e0s] = ndgrid(a0vec,b0vec,c0vec,d0vec,e0vec);
N = numel(a0s);
for j = 1:N
a0 = a0s(j);
b0 = b0s(j);
c0 = c0s(j);
d0 = d0s(j);
e0 = e0s(j);
%__ (CODE THAT DEPENDS ON a0 - e0 here) __
end
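A sketch that generalizes the ndgrid approach so the loop never has to be edited when a parameter is added or turned into a scan: keep the parameter vectors in a struct (as in the first answer), build all grids at once from a comma-separated list, and rebuild a scalar struct q for each combination. The expression assigned to results(j) is a hypothetical stand-in for the real code block.

```matlab
% Example parameter vectors (values copied from the answers above)
p = struct();
p.a = 1;
p.b = linspace(1,4,4);
p.c = linspace(11,15,5);

names = fieldnames(p);          % parameter names, in definition order
vals  = struct2cell(p);         % matching cell array of value vectors
grids = cell(size(vals));
[grids{:}] = ndgrid(vals{:});   % one N-D grid per parameter
N = numel(grids{1});

results = zeros(N, 1);
for j = 1:N
    q = struct();
    for k = 1:numel(names)
        q.(names{k}) = grids{k}(j);   % scalar value of parameter k for combination j
    end
    results(j) = q.a + q.b * q.c;     % hypothetical stand-in for the real code block
end
```

Turning a fixed parameter into a scan (or back) then only means changing its field in p; the loop body reads q.a, q.b, ... either way.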
Would still like to see your suggestions!

Variably name histogram in for loop

I'm trying to put the hist function in a for loop, because I work with a varying number of datasets each time and it's much faster and easier than having to edit a script each time, but I can't get it right. Can I have some help please? In essence, I'm trying to do this in a for loop, for a variable number of unc{i} datasets and the corresponding [h{i},x{i}] result arrays:
[h1,x1] = hist(unc1,range);
[h2,x2] = hist(unc2,range);
[h3,x3] = hist(unc3,range);
[h4,x4] = hist(unc4,range);
Any help would be greatly appreciated. Thanking you in advance
Disclaimer: the use of eval is dangerous!
Let's say you have n unc arrays. You can use a struct to store them:
for ii=1:n
cmd = sprintf( 's.unc%d = unc%d;', ii, ii );
eval( cmd );
end
Once you have the unc arrays in a struct, you can simply do:
for ii=n:-1:1
[h{ii} x{ii}] = hist( s.(sprintf('unc%d',ii)), range );
end
Notes:
1. Note that I used a backward loop for computing the histograms: this is a nice trick to preallocate h and x, see this thread.
2. It is extremely unwise to use eval; therefore it might be wiser to create the different unc arrays as struct fields to begin with, skipping the first part of this answer.
You can put each of your input datasets in a cell array, and the output of the histograms in a second cell array.
For example,
unc1 = rand(5,1);
unc2 = rand(5,1);
unc3 = rand(5,1);
unc_cell = {unc1, unc2, unc3};
h_cell = cell(3, 1);
x_cell = cell(3, 1);
for ii = 1:numel(unc_cell)
    [h_cell{ii}, x_cell{ii}] = hist(unc_cell{ii});
end
This does require preloading all of the datasets and holding them in memory simultaneously. If this would use too much memory, you can load the datasets in the for loop rather than preloading them.
For example,
h_cell = cell(3, 1);
x_cell = cell(3, 1);
for ii = 1:3
    S = load(sprintf('data_%d.mat', ii)); % replace with your file name; load returns a struct
    unc = S.unc;                          % assumes the variable saved in each file is named unc
    [h_cell{ii}, x_cell{ii}] = hist(unc);
end