This is part of my code in Matlab. I tried to make it parallel, but I get an error:
The variable gax in a parfor cannot be classified.
I know why the error occurs: I need to tell Matlab that v is an increasing vector that contains no repeated elements. Could anyone help me use this information to parallelize the code?
for m=v
    if m > 1
        parfor j=1:m-1
            gax(j,m-1) = ggx(j,m-1);
        end
    end
    if m<nn
        parfor jo=m+1:15
            gax(jo,m) = ggx(jo,m);
        end
    end
end

Optimizing code should be closely tied to its purpose, especially when you use parfor. The code in your question can be written in a much more efficient way, and it definitely does not need to be parallelized.
However, I understand that you simplified the problem just to get the idea of how to slice your variables, so here is a fixed version that can run with parfor. But this is surely not the way to write this code:
v = [1,3,6,8];
ggx = 5.*ones(15,14);
gax = ones(15,14);
nn = 5;
for m = v
    if m > 1
        temp_end = m-1;
        temp = ggx(:,temp_end);
        parfor ja = 1:temp_end
            gax(ja,temp_end) = temp(ja);
        end
    end
    if m < nn
        temp = ggx(:,m);
        parfor jo = m+1:15
            gax(jo,m) = temp(jo);
        end
    end
end
A vectorized implementation will look like this:
v = [1,3,6,8];
ggx = 5.*ones(15,14);
gax = ones(15,14);
nn = 5;
m1 = v>1; % first condition with logical indexing
temp = v(m1)-1; % get the values from v
r = ones(1,sum(temp)); % generate a vector of indices
r(cumsum(temp)) = -temp+1; % place the resetting locations
r = cumsum(r); % calculate the indices
r(cumsum(temp)) = temp; % place the ending points
c = repelem(temp,temp); % create an index vector for the columns
inds1 = sub2ind(size(gax),r,c); % convert the indices to linear
mnn = v<nn; % second condition with logical indexing
temp = v(mnn)+1; % get the values from v
r_max = size(gax,1); % get the height of gax
r_count = r_max-temp+1; % calculate no. of rows per value in v
r = ones(1,sum(r_count)); % generate a vector of indices
r([1 r_count(1:end-1)+1]) = temp; % set the starting row indices
r(cumsum(r_count)+1) = -(r_count-temp)+1; % place the resetting locations
r = cumsum(r(1:end-1)); % calculate the indices
c = repelem(temp-1,r_count); % create an index vector for the columns
inds2 = sub2ind(size(gax),r,c); % convert the indices to linear
gax([inds1 inds2]) = ggx([inds1 inds2]); % assign the relevant values
This is indeed quite complicated, and not always necessary. A good thing to remember, though, is that nested for loops are much slower than a single loop, so in some cases (depending on the size of the output) this may be the fastest solution:
for m = v
    if m > 1
        gax(1:m-1,m-1) = ggx(1:m-1,m-1);
    end
    if m<nn
        gax(m+1:15,m) = ggx(m+1:15,m);
    end
end
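If parallel execution is still wanted, one possible restructuring (a sketch, not the original answer's method) is to loop over the columns of gax instead of over v. Because v is increasing with no repeats, each column is touched by at most two values of m, and those two writes hit disjoint rows, so every iteration can read and write one whole column. The loop below is written with for so that it runs anywhere; the assumption is that replacing for with parfor lets Matlab classify gax and ggx as sliced variables, which should be verified on your release. The data for ggx is made up so that copied entries are visible.

```matlab
v   = [1,3,6,8];                 % increasing, no repeated elements
nn  = 5;
ggx = reshape(1:15*14, 15, 14);  % made-up data with distinct values
gax = ones(15,14);

% Reference result: the plain loop over v
ref = gax;
for m = v
    if m > 1
        ref(1:m-1,m-1) = ggx(1:m-1,m-1);
    end
    if m < nn
        ref(m+1:15,m) = ggx(m+1:15,m);
    end
end

% Column-wise form: iteration c reads and writes only column c
for c = 1:size(gax,2)            % assumption: can be changed to parfor
    src = ggx(:,c);
    col = gax(:,c);
    if any(v == c+1)             % some m in v has m-1 == c
        col(1:c) = src(1:c);
    end
    if any(v == c) && c < nn     % some m in v has m == c and m < nn
        col(c+1:end) = src(c+1:end);
    end
    gax(:,c) = col;
end

same = isequal(gax, ref);        % true if both forms agree
disp(same)
```

The comparison against ref is only there to check that the column-wise form reproduces the original loop.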


I am very new to Scilab, but so far have not been able to find an answer (either here or via google) to my question. I'm sure it's a simple solution, but I'm at a loss. I have a lot of MATLAB scripts I wrote in grad school, but now that I'm out of school, I no longer have access to MATLAB (and can't justify the cost). Scilab looked like the best open alternative. I'm trying to convert my .m files to Scilab compatible versions using mfile2sci, but when running the mfile2sci GUI, I get the error/message shown below. Attached is the original code from the M-file, in case it's relevant.
I searched Stack Overflow and its companion sites, Google, and the Scilab documentation.
The M-file code follows (it's a super basic MATLAB script as part of an old homework question -- I chose it as it's the shortest, most straightforward M-file I had):
Mmax = 15;
N = 20;
T = 2000;
%define upper limit for sparsity of signal
smax = 15;
mNE = zeros(smax,Mmax);
mESR= zeros(smax,Mmax);
for M = 1:Mmax
aNormErr = zeros(smax,1);
aSz = zeros(smax,1);
ESR = zeros(smax,1);
for s=1:smax % for-loop to loop script smax times
normErr = zeros(1,T);
vESR = zeros(1,T);
sz = zeros(1,T);
for t=1:T %for-loop to carry out 2000 trials per s-value
esr = 0;
A = randn(M,N); % generate random MxN matrix
[M,N] = size(A);
An = zeros(M,N); % initialize normalized matrix
for h = 1:size(A,2) % normalize columns of matrix A
V = A(:,h)/norm(A(:,h));
An(:,h) = V;
A = An; % replace A with its column-normalized counterpart
c = randperm(N,s); % create random support vector with s entries
x = zeros(N,1); % initialize vector x
for i = 1:size(c,2)
val = (10-1)*rand + 1;% generate interval [1,10]
neg = mod(randi(10),2); % include [-10,-1]
if neg~=0
val = -1*val;
x(c(i)) = val; %replace c(i)th value of x with the nonzero value
y = A*x; % generate measurement vector (y)
R = y;
S = []; % initialize array to store selected columns of A
indx = []; % vector to store indices of selected columns
coeff = zeros(1,s); % vector to store coefficients of approx.
stop = 10; % init. stop condition
in = 0; % index variable
esr = 0;
xhat = zeros(N,1); % intialize estimated x signal
while (stop>0.5 && size(S,2)<smax)
%MAX = abs(A(:,1)'*R);
maxV = zeros(1,N);
for i = 1:size(A,2)
maxV(i) = abs(A(:,i)'*R);
in = find(maxV == max(maxV));
indx = [indx in];
S = [S A(:,in)];
coeff = [coeff R'*S(:,size(S,2))]; % update coefficient vector
for w=1:size(S,2)
r = y - ((R'*S(:,w))*S(:,w)); % update residuals
if norm(r)<norm(R)
index = w;
R = r;
stop = norm(R); % update stop condition
for j=1:size(S,2) % place coefficients into xhat at correct indices
nE = norm(x-xhat)/norm(x); % calculate normalized error for this estimate
%esr = 0;
indx = sort(indx);
c = sort(c);
if isequal(indx,c)
esr = esr+1;
vESR(t) = esr;
sz(t) = size(S,2);
normErr(t) = nE;
%avsz = sum(sz)/T;
aSz(s) = sum(sz)/T;
%aESR = sum(vESR)/T;
ESR(s) = sum(vESR)/T;
%avnormErr = sum(normErr)/T; % produce average normalized error for these run
aNormErr(s) = sum(normErr)/T; % add new avnormErr to vector of all av norm errors
% just put this here to view the vector
mNE(:,M) = aNormErr;
mESR(:,M) = ESR;
% had an 'end' placed here, might've been unmatched
dimx = [1 Mmax];
dimy = [1 smax];
colormap gray
strESR = sprintf('Average ESR, N=%d',N);
strNE = sprintf('Average Normed Error, N=%d',N);
colormap gray
The command used (and results) follow:
--> mfile2sci
ans =
****** Beginning of mfile2sci() session ******
File to convert: C:/Users/User/Downloads/WTF_new.m
Result file path: C:/Users/User/DOWNLO~1/
Recursive mode: OFF
Only double values used in M-file: NO
Verbose mode: 3
Generate formatted code: NO
M-file reading...
M-file reading: Done
Syntax modification...
Syntax modification: Done
File contains no instruction, no translation made...
****** End of mfile2sci() session ******
To convert the foo.m file, enter
mfile2sci <path>/foo.m
where <path> stands for the path of the directory that contains foo.m. The result is written to <path>/foo.sci.
Remove the ```` at the beginning of each line and the conversion will proceed normally. However, don't expect to obtain a working .sci file, as the m2sci converter is (to me) still an experimental tool!

I want to be able to vectorize the for-loops of this function to then be able to parallelize it in octave. Can these for-loops be vectorized? Thank you very much in advance!
I attach the code of the function commenting on the start and end of each for-loop and if-else.
function [par]=pem_v(tsm,pr)
% tsm and pr are arrays of N by n. % par is an array of N by 8
% main-loop
for ii=1:N
% I extract the rows in each loop because each one represents a sample
sst=tsm(ii,:); sst=sst'; %then I convert each sample to column vectors
pre=pr(ii,:); pre=pre';
% main-condition
if isnan(nanmean(sst))==1;
% first sub-loop
for k=1:length(tss);
idxx=find(sst>=tss(k)-0.25 & sst<=tss(k)+0.25);
% end first sub-loop
% second sub-loop
for j=1:length(tc)
cond1=find(sst>=tc(j) & sst<=tp90);
clear A B AA BB;
clear pem;
% end second sub-loop
% sub-condition
cond1=find(sst>=tcc & sst<=tp90);
% outputs
% end sub-condition
clear pem pre sst RMSE BB B tp90 tcc
% end main-condition
% end main-loop
You haven't given any example inputs, so I've created some like so:
N = 5; n = 800;
tsm = rand(N,n)*5+27; pr = rand(N,n);
Then, before you even consider vectorising your code, you should keep 4 things in mind...
Avoid calculating the same thing (like the size of a vector) in every loop iteration; do it once before the loop.
Pre-allocate arrays where possible (declare them as zeros/NaNs etc.).
Don't use find to convert logical indices into linear indices; there is no need, and it will slow down your code.
Don't repeatedly use clear, especially many times within loops. It is slow! Instead, use pre-allocation to ensure the variables are as you expect in each loop iteration.
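As a small illustration of the third point, the sketch below selects the same elements once through find and once through the raw logical mask. The sample values and the bin centre t are made up for the example; they are not taken from the question's data.

```matlab
sst = [26.9; 27.3; 27.6; 28.4; 27.4];   % made-up sample values
pre = [0.10; 0.50; 0.20; 0.90; 0.70];   % made-up paired values
t   = 27.5;                              % hypothetical bin centre

idx_find = find(sst >= t-0.25 & sst <= t+0.25);   % linear indices
idx_mask =      sst >= t-0.25 & sst <= t+0.25;    % logical mask, no find

% Both forms pick the same elements of pre
same = isequal(pre(idx_find), pre(idx_mask));

% With a mask, count the selected elements with nnz instead of length
n_hits = nnz(idx_mask);
disp([same, n_hits])
```

The mask form skips the conversion to linear indices, and nnz(idx_mask) replaces length(idx_find) wherever a count is needed.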
Using the above random inputs, and taking account of these 4 things, the below code is ~65% quicker than your code. Note: this is without even doing any vectorising!
function [par]=pem_v(tsm,pr)
% tsm and pr are arrays of N by n.
% par is an array of N by 8
% Transpose once here instead of every loop
tsm = tsm';
pr = pr';
% Pre-allocate memory for output 'par'
par = NaN(N, 8);
% Don't compute these every loop, do it before the loop.
% numel simpler than length for vectors, and size is clearer still
ntss = numel(tss);
nsst = size(tsm,1);
ntc = numel(tc);
npr = size(pr, 1);
for ii=1:N
% Extract the columns in each loop because each one represents a sample
% main-condition. Previously isnan(nanmean(sst))==1, but that's only true if all(isnan(sst))
% We don't need to assign par(ii,1:8)=NaN since we initialised par to a matrix of NaNs
if ~all(isnan(sst));
% first sub-loop, initialise 'out' first
out = zeros(1, ntss);
for k=1:ntss;
% Don't use FIND on an indexing vector. Use the logical index raw, it's quicker
idxx = (sst>=tss(k)-0.25 & sst<=tss(k)+0.25);
% We need a check that some values of idxx are true, otherwise prctile will error.
if nnz(idxx) > 0
out(k) = prctile(pre(idxx), 90);
% Again, no need for FIND, just reduces speed. This is a theme...
for jj=1:ntc
cond1 = (sst>=tc(jj) & sst<=tp90);
cond2 = (sst>=tp90);
% Use nnz (number of non-zero) instead of length, since cond1 is now a logical vector of all elements
A = [sst(cond1),ones(nnz(cond1),1)];
B = regress(pre(cond1), A);
pt90 = B(1)*(tp90-tc(jj));
AA = [(sst(cond2)-tp90)];
BB = regress(pre(cond2)-pt90,AA);
pem(cond1) = max(0, B(1)*(sst(cond1)-tc(jj)));
pem(cond2) = max(0, (BB(1)*(sst(cond2)-tp90))+pt90);
E(jj) = sqrt(nansum((pem-pre).^2)/npr);
tcc = tc(E==min(E));
if ~isempty(tcc);
cond1 = (sst>=tcc & sst<=tp90);
cond2 = (sst>=tp90);
A = [sst(cond1),ones(nnz(cond1),1)];
B = regress(pre(cond1),A);
pt90 = B(1)*(tp90-tcc);
AA = [sst(cond2)-tp90];
BB = regress(pre(cond2)-pt90,AA);
pem = zeros(length(sst),1);
pem(cond1) = max(0, B(1)*(sst(cond1)-tcc));
pem(cond2) = max(0, (BB(1)*(sst(cond2)-tp90))+pt90);
RMSE = sqrt(nansum((pem-pre).^2)/npr);
% Outputs, which we might as well assign all at once!
par(ii,:)=[tcc, tp90, B(1), BB(1), RMSE, ...
nanmean(sst), nanmean(pre), nanmean(pem)];

I have originally written the following Matlab code to find intersection between a set of Axes Aligned Bounding Boxes (AABB) and space partitions (here 8 partitions). I believe it is readable by itself, moreover, I have added some comments for even more clarity.
function [A,B] = AABBPart(bbx,it) % bbx: aabb, it: iteration
global F
IT = it+1;
n = size(bbx,1);
F = cell(n,it);
A = Part([min(bbx(:,1:3)),max(bbx(:,4:6))],it,0); % recursive partitioning
B = F; % matlab does not allow
function s = Part(bx,it,J) % output to be global
s = {};
if it < 1; return; end
s = cell(8,1);
p = bx(1:3);
q = bx(4:6);
h = 0.5*(p+q);
prt = [p,h;... % 8 sub-parts (octa)
for j=1:8 % check for each sub-part
k = 0;
t = zeros(0,1);
for i=1:n
if all(bbx(i,1:3) <= prt(j,4:6)) && ... % intersection test for
all(prt(j,1:3) <= bbx(i,4:6)) % every aabb and sub-parts
k = k+1;
t(k) = i;
if ~isempty(t)
s{j,1} = [t; Part(prt(j,:),it-1,j)]; % recursive call
for i=1:numel(t) % collecting the results
if isempty(F{t(i),IT-it})
F{t(i),IT-it} = [-J,j];
F{t(i),IT-it} = [F{t(i),IT-it}; [-J,j]];
In my tests, a few intersections seem to be missing, say 10 or so in a setup of size 1000 or more. I would be glad if you could help me find any problematic parts in the code.
I am also concerned about using global F, and I would prefer to get rid of it.
Any faster solution would also be very welcome.
Note that the code is complete, and you can easily try it with the following setup.
n = 10000; % in the original application, n would be millions
bbx = rand(n,6);
it = 3;
[A,B] = AABBPart(bbx,it);
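On the speed question only, the inner loop over i could probably be replaced by one vectorized overlap test per sub-part. The sketch below uses the same two comparisons as the loop, applied to all boxes at once; the three boxes and the single sub-part are made-up values, and the sketch does not address the missing intersections.

```matlab
% Each row is [xmin ymin zmin xmax ymax zmax]; made-up example boxes
bbx  = [0.0 0.0 0.0 1.0 1.0 1.0;
        2.0 2.0 2.0 3.0 3.0 3.0;
        0.5 0.5 0.5 2.5 2.5 2.5];
part = [0 0 0 1.5 1.5 1.5];      % one hypothetical sub-part, same layout

% Box minimum must not exceed the part maximum on every axis ...
lo_ok = all(bsxfun(@le, bbx(:,1:3), part(4:6)), 2);
% ... and the part minimum must not exceed the box maximum on every axis
hi_ok = all(bsxfun(@le, part(1:3), bbx(:,4:6)), 2);

t = find(lo_ok & hi_ok);         % indices of boxes that overlap the part
disp(t.')
```

Here find is kept on purpose, because the recursive code stores the box indices themselves rather than using them only for indexing.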

Can anyone help vectorize this Matlab code? The specific problem is the sum and bessel function with vector inputs.
Thank you!
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
for ii = 1:length(rho_g)
    for jj = 1:length(phi_g)
        % Coordinates
        rho_o = rho_g(ii);
        phi_o = phi_g(jj);
        % factors
        fc = cos(n.*(phi_o-phi_s));
        fs = sin(n.*(phi_o-phi_s));
        Ez_t(ii,jj) = sum(tau.*besselj(n,k(3)*rho_s).*besselh(n,2,k(3)*rho_o).*fc);
    end
end
You could try to vectorize this code, which might be possible with bsxfun or similar, but it is questionable whether it would run any faster, since your code already uses vector math in the inner loop (even though your vectors only have length 3). The resulting code would also be very difficult to read, so neither you nor your colleagues would have any idea what it does when looking at it in two years' time.
Before wasting time on vectorization, it is much more important that you learn about loop invariant code motion, which is easy to apply to your code. Some observations:
you do not use fs, so remove that.
the term tau.*besselj(n,k(3)*rho_s) does not depend on any of your loop variables ii and jj, so it is constant. Calculate it once before your loop.
you should probably pre-allocate the matrix Ez_t.
the only terms that change during the loop are fc, which depends on jj, and besselh(n,2,k(3)*rho_o), which depends on ii. I guess that the latter costs much more time to calculate, so it is better to calculate it only N times in the outer loop rather than N*N times in the inner loop. If the calculation based on jj took more time, you could swap the for-loops over ii and jj, but that does not seem to be the case here.
The resulting code would look something like this (untested):
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
% constant part, does not depend on ii and jj, so calculate only once!
temp1 = tau.*besselj(n,k(3)*rho_s);
Ez_t = nan(length(rho_g), length(phi_g)); % preallocate space
for ii = 1:length(rho_g)
    % calculate stuff that depends on ii only
    rho_o = rho_g(ii);
    temp2 = besselh(n,2,k(3)*rho_o);
    for jj = 1:length(phi_g)
        phi_o = phi_g(jj);
        fc = cos(n.*(phi_o-phi_s));
        Ez_t(ii,jj) = sum(temp1.*temp2.*fc);
    end
end
Initialization -
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
Nested loops form (Copy from your code and shown here for comparison only) -
for ii = 1:length(rho_g)
    for jj = 1:length(phi_g)
        % Coordinates
        rho_o = rho_g(ii);
        phi_o = phi_g(jj);
        % factors
        fc = cos(n.*(phi_o-phi_s));
        fs = sin(n.*(phi_o-phi_s));
        Ez_t(ii,jj) = sum(tau.*besselj(n,k(3)*rho_s).*besselh(n,2,k(3)*rho_o).*fc);
    end
end
Vectorized solution -
%%// Term - 1
term1 = repmat(tau.*besselj(n,k(3)*rho_s),[N*N 1]);
%%// Term - 2
[n1,rho_g1] = meshgrid(n,rho_g);
term2_intm = besselh(n1,2,k(3)*rho_g1);
term2 = transpose(reshape(repmat(transpose(term2_intm),[N 1]),N,N*N));
%%// Term -3
angle1 = repmat(bsxfun(@times,bsxfun(@minus,phi_g,phi_s')',n),[N 1]);
fc = cos(angle1);
%%// Output
Ez_t = sum(term1.*term2.*fc,2);
Ez_t = transpose(reshape(Ez_t,N,N));
Points to note about this vectorization or code simplification:
'fs' doesn't change the output of the script, Ez_t, so it can be removed for now.
The output is Ez_t, which requires three basic terms in the code:
tau.*besselj(n,k(3)*rho_s), besselh(n,2,k(3)*rho_o) and fc. These are calculated separately for vectorization as terms 1, 2 and 3 respectively.
Inside the loops, all three terms are of size 1xN. Our aim is to calculate them without loops. The two loops run N times each, giving a total loop count of NxN, so each vectorized term must hold NxN times the data it held inside the nested loops.
This is the essence of the vectorization done here, with the three terms represented by 'term1', 'term2' and 'fc'.
In order to give a self-contained answer, I'll copy the original initialization
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
and generate some missing data (k(3) and rho_s and phi_s in the dimension of n)
rho_s = rand(size(n));
phi_s = rand(size(n));
k(3) = rand(1);
then you can compute the same Ez_t with multidimensional arrays:
[RHO_G, PHI_G, N] = meshgrid(rho_g, phi_g, n);
[~, ~, TAU] = meshgrid(rho_g, phi_g, tau);
[~, ~, RHO_S] = meshgrid(rho_g, phi_g, rho_s);
[~, ~, PHI_S] = meshgrid(rho_g, phi_g, phi_s);
FC = cos(N.*(PHI_G - PHI_S));
FS = sin(N.*(PHI_G - PHI_S)); % not used
EZ_T = sum(TAU.*besselj(N, k(3)*RHO_S).*besselh(N, 2, k(3)*RHO_G).*FC, 3).';
You can check afterwards that both matrices are the same
norm(Ez_t - EZ_T)
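Since the sum over n is linear in the terms that depend on rho and phi separately, it can also be written as a single matrix product. The sketch below is one way to do that and checks the result against the nested loops. The values of rho_s, phi_s and k are assumptions (scalars chosen for the example), because the question does not define them.

```matlab
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
% Assumed values: rho_s, phi_s and k are not defined in the question
rho_s = 0.5; phi_s = 0.3; k = [0 0 2];

temp1 = tau.*besselj(n, k(3)*rho_s);        % constant row, 1 x numel(n)
[n1, rho_g1] = meshgrid(n, rho_g);          % numel(rho_g) x numel(n) grids
H = besselh(n1, 2, k(3)*rho_g1);            % rows: rho, columns: order n
C = cos((phi_g(:) - phi_s) * n);            % rows: phi, columns: order n

% Sum over n as a matrix product: rows follow rho_g, columns follow phi_g
Ez_mat = (H .* repmat(temp1, numel(rho_g), 1)) * C.';

% Nested-loop reference for comparison
Ez_loop = zeros(numel(rho_g), numel(phi_g));
for ii = 1:numel(rho_g)
    for jj = 1:numel(phi_g)
        fc = cos(n.*(phi_g(jj) - phi_s));
        Ez_loop(ii,jj) = sum(temp1.*besselh(n, 2, k(3)*rho_g(ii)).*fc);
    end
end

rel_err = max(abs(Ez_mat(:) - Ez_loop(:))) / max(abs(Ez_loop(:)));
disp(rel_err)
```

A relative error is used for the check because the Hankel function becomes very large for the smallest rho value.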

I am doing a very large calculation (atmospheric absorption) that has a lot of individual narrow peaks that all get added up at the end. For each peak, I have pre-calculated the range over which the value of the peak shape function is above my chosen threshold, and I am then going line by line and adding the peaks to my spectrum. A minimum example is given below:
X = 1:1e7;
K = numel(a); % count the number of peaks I have.
spectrum = zeros(size(X));
for k = 1:K
    grid = X >= rng(1,k) & X <= rng(2,k);
    spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(k),b(k),c(k));
end
Here, each peak has some parameters that define the position and shape (a,b,c), and a range over which to do the calculation (rng). This works great, and on my machine it benchmarks at around 220 seconds to do a complete data set. However, I have a 4 core machine and I would eventually like to run this on a cluster, so I'd like to parallelize it and make it scaleable.
Because each loop relies on the results of the previous iteration, I cannot use parfor, so I am taking my first step into learning how to use spmd blocks. My first try looked like this:
X = 1:1e7;
cores = matlabpool('size');
K = numel(a);
spectrum = zeros(size(X),cores);
spmd
    n = labindex:cores:K
    N = numel(n);
    for k = 1:N
        grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
        spectrum(grid,labindex) = spectrum(grid,labindex) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k)));
    end
end
finalSpectrum = sum(spectrum,2);
This almost works. The program crashes at the last line because spectrum is of type Composite, and the documentation for 2013a is spotty on how to turn Composite data into a matrix (cell2mat does not work). This also does not scale well because the more cores I have, the larger the matrix is, and that large matrix has to get copied to each worker, which then ignores most of the data. Question 1: how do I turn a Composite data type into a useable array?
The second thing I tried was to use a codistributed array.
spectrum = codistributed.zeros(K,cores);
This tells me that each worker has a single vector of size [K 1], which I believe is what I want, but when I try to then meld the above methods
spmd
    spectrum = codistributed.zeros(K,cores);
    n = labindex:cores:K
    N = numel(n);
    for k = 1:N
        grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
        spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k)));
    end
end
finalSpectrum = gather(spectrum);
finalSpectrum = sum(finalSpectrum,2);
I get Matrix dimensions must agree errors. Since it's in a parallel block, I can't use my normal debugging crutch of stepping through the loop and seeing what the size of each block is at each point to see what's going on. Question 2: what is the proper way to index into and out of a codistributed array in an spmd block?
Regarding question#1, the Composite variable in the client basically refers to a non-distributed variant array stored on the workers. You can access the array from each worker by {}-indexing using its corresponding labindex (e.g: spectrum{1}, spectrum{2}, ..).
For your code that would be: finalSpectrum = sum(cat(2,spectrum{:}), 2);
Now I tried this problem myself using random data. Below are three implementations to compare (see here to understand the difference between distributed and nondistributed arrays). First we start with the common data:
len = 100; % spectrum length
K = 10; % number of peaks
X = 1:len;
% random position and shape parameters
a = rand(1,K); b = rand(1,K); c = rand(1,K);
% random peak ranges (lower/upper thresholds)
ranges = sort(randi([1 len], [2 K]));
% dummy peakfn() function
fcn = @(x,a,b,c) x+a+b+c;
% prepare a pool of MATLAB workers
matlabpool open
1) Serial for-loop:
spectrum = zeros(size(X));
for i=1:size(ranges,2)
    r = ranges(:,i);
    idx = (r(1) <= X & X <= r(2));
    spectrum(idx) = spectrum(idx) + fcn(X(idx), a(i), b(i), c(i));
end
s1 = spectrum;
clear spectrum i r idx
2) SPMD with Composite array
spmd
    spectrum = zeros(1,len);
    ind = labindex:numlabs:K;
    for i=1:numel(ind)
        r = ranges(:,ind(i));
        idx = (r(1) <= X & X <= r(2));
        spectrum(idx) = spectrum(idx) + ...
            feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
    end
end
s2 = sum(vertcat(spectrum{:}));
clear spectrum i r idx ind
3) SPMD with co-distributed array
spmd
    spectrum = zeros(numlabs, len, codistributor('1d',1));
    ind = labindex:numlabs:K;
    for i=1:numel(ind)
        r = ranges(:,ind(i));
        idx = (r(1) <= X & X <= r(2));
        spectrum(labindex,idx) = spectrum(labindex,idx) + ...
            feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
    end
end
s3 = sum(gather(spectrum));
clear spectrum i r idx ind
All three results should be equal (to within an acceptably small margin of error)
>> max([max(s1-s2), max(s1-s3), max(s2-s3)])
ans =
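The strided work split used in 2) and 3) can also be checked without a worker pool. The sketch below simulates the workers with an ordinary loop: each simulated worker w handles peaks w, w+nworkers, w+2*nworkers, ... and accumulates its own partial spectrum, and the partial spectra are summed at the end. The worker count and the deterministic parameters are assumptions made for the example; nothing here calls the Parallel Computing Toolbox.

```matlab
len = 100; K = 10; X = 1:len;
nworkers = 4;                              % hypothetical worker count
a = (1:K)/10; b = (1:K)/20; c = (1:K)/30;  % made-up peak parameters
% made-up peak ranges; sort puts the lower bound in row 1 of each column
ranges = sort([mod(7*(1:K),len)+1; mod(13*(1:K),len)+1]);
fcn = @(x,a,b,c) x+a+b+c;                  % dummy peak function

% Serial reference
s_serial = zeros(1,len);
for i = 1:K
    idx = ranges(1,i) <= X & X <= ranges(2,i);
    s_serial(idx) = s_serial(idx) + fcn(X(idx), a(i), b(i), c(i));
end

% Simulated workers: one row of partial sums per worker
partial = zeros(nworkers, len);
for w = 1:nworkers
    ind = w:nworkers:K;                    % same stride as labindex:numlabs:K
    for i = 1:numel(ind)
        p = ind(i);
        idx = ranges(1,p) <= X & X <= ranges(2,p);
        partial(w,idx) = partial(w,idx) + fcn(X(idx), a(p), b(p), c(p));
    end
end
s_split = sum(partial, 1);                 % reduction across workers

max_diff = max(abs(s_serial - s_split));
disp(max_diff)
```

Any difference between the two results comes only from the changed order of floating-point additions, which is why the comparison uses a tolerance rather than exact equality.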