Double summation with vectorized loops in Matlab

Double summation with vectorized loops in Matlab - matlab

I want to vectorize this double for loop because it is a bottleneck in my code. Since Matlab is a based-one indexing language I have to create an additional term for M = 0.
R,r,lambda are constants
Slm(L,M),Clm(L,M) are matrices 70x70
Plm(L,M) is a matrix 70x71
Cl(L),Pl(L) are vectors 70x1
% function dU_r
s1 = 0;
for L = 2:70
s1 = s1 + ((R/r)^L)*(L+1)*Pl(L)*Cl(L);
for m = 1:L
s1 = s1 + ((R/r)^L)*(L+1)*Plm(L,M)*(Clm(L,M)*...
cos(M*lambda) + Slm(L,M)*sin(M*lambda));
end
end
dU_r = -(mu_p/(r^2))*s1;
% function dU_phi
s2=0;
for L = 2:70
s2 = s2 + ((R/r)^L))*Plm(L,1)*Cl(L);
for m = 1:l
s2 = s2 + ((R/r)^L)*(Plm(L,M+1)-M*tan(phi)*Plm(L,M))*...
(Clm(L,M)*cos(M*lambda) + Slm(L,M)*sin(M*lambda));
end;
end;
dU_phi = (mu_p/r)*s2;
% function dU_lambda
s3=0;
for L=2:70
for m=1:L
s3 = s3 + ((R/r)^L)*M*Plm(L,M)*(Slm(L,M)*cos(M*lambda)...
- Clm(L,M)*sin(M*lambda));
end;
dU_lambda = (mu_p/r)*s3;

I think the following code may be a part of the solution (only partially vectorized), for a full vectorization you might want to look at meshgrid, ndgrid, bsxfun and/or repmat, I suppose.
s2 = 0;
L = 2:70;
PlCl = Pl.*Cl;
PlmClm = Plm.*Clm;
Rrl = (R/r).^(L).*(L+1);
COS = cos((1:70)*lambda);
SIN = sin((1:70)*lambda);
for l=L
M = 1:l;
s1 = sum(Rrl(l-1)*PlmClm(l,M).*(COS(M) + Slm(l,M).*SIN(M)));
s2 = s2 + s1;
end;
s2 = s2 + sum(Rrl(L).*PlCl(L));
I haven't tried to run this code, so if it may complain about some dimensions. You should be able to figure that out (just some transposes here or there may do).
Just a note on the side: try to keep away from short variable names (and certainly ambiguous ones like l (that can be misread for a 1, a I or an l). Also, it might be easier to vectorize your code if you have actual (i.e. not coded) expressions to start with.
edit: applied corections suggested by gnovice

For this particular situation, it will likely be more efficient for you to not fully vectorize your solution. As Egon illustrates, it's possible to replace the inner loop through vectorization, but replacing the outer for loop as well could potentially slow-down your solution.
The reason? If you pay attention to how you index the matrices Plm, Clm, and Slm in your loop, you only ever use values from the lower triangular parts of them. All of the values above the main diagonal are ignored. By fully vectorizing your solution, such that it uses matrix operations instead of a loop, you would end up performing unnecessary multiplications with the zeroed-out upper triangular portions of the matrices. This is one of those cases where a for loop may be a better option.
As such, here's a refined and corrected version of Egon's answer that should be close to optimum with regard to the number of mathematical operations performed:
index = 2:70;
coeff = ((R/r).^index).*(index+1);
lambda = lambda.*(1:70).'; %'
cosVector = cos(lambda);
sinVector = sin(lambda);
s2 = coeff*(Pl(index).*Cl(index));
for L = index
M = 1:L;
s2 = s2 + coeff(L-1)*((Plm(L,M).*Clm(L,M))*cosVector(M) + ...
(Plm(L,M).*Slm(L,M))*sinVector(M));
end

Taking like reference a solution given by Jan Simon 1, I have written the following code for my problem
Rr = R/r;
RrL = Rr;
cosLambda = cos((1:70)* lambda);
sinLambda = sin((1:70)* lambda);
u1 = uint8(1);
s1 = 0;
s2 = 0;
s3 = 0;
for j = uint8(2):uint8(70)
RrL = RrL * Rr;
q1 = RrL * (double(j) + 1);
t1 = Pl(j) * datos.Cl(j);
q2 = RrL;
t2 = Plm(j,1) * Cl(j);
t3 = 0;
for m = u1:j
t1 = t1 + Plm(j,m) * ...
(Clm(j, m) * cosLambda(m) + ...
Slm(j, m) * sinLambda(m));
t2 = t2 + (Plm(j,m+1)-double(m)*tan_phi*Plm(j,m))*...
(Clm(j,m)*cosLambda(m) + Slm(j,m)*sinLambda(m));
t3 = t3 + double(m)*Plm(j,m)*(Slm(j,m)*cosLambda(m)...
- Clm(j,m)*sinLambda(m));
end
s1 = s1 + q1 * t1;
s2 = s2 + q2 * t2;
s3 = s3 + q2 * t3;
end
dU_r = -(mu_p/(r^2))*s1;
dU_phi = (mu_p/r)*s2;
dU_lambda = (mu_p/r)*s3

Related

Memory usage by Matlab

I'm running a Matlab code in the HPC of my university. I have two versions of the code. The second version, despite generating a smaller array, seems to require more memory. I would like your help to understand if this is in fact the case and why.
Let me start from some preliminary lines:
clear
rng default
%Some useful components
n=7^4;
vectors{1}=[1,20,20,20,-1,Inf,-Inf];
vectors{2}=[-19,19,19,19,-20,Inf,-Inf];
vectors{3}=[-19,0,0,0,-20,Inf,-Inf];
vectors{4}=[-19,0,0,0,-20,Inf,-Inf];
T_temp = cell(1,4);
[T_temp{:}] = ndgrid(vectors{:});
T_temp = cat(4+1, T_temp{:});
T = reshape(T_temp,[],4); %all the possible 4-tuples from vectors{1}, ..., vectors{4}
This is the first version 1 of the code: I construct the matrix D1 listing all possible pairs of unordered rows from T
indices_pairs=pairIndices(n);
D1=[T(indices_pairs(:,1),:) T(indices_pairs(:,2),:)];
This is the second version of the code: I construct the matrix D2 listing a random draw of m=10^6 unordered pairs of rows from T
m=10^6;
p=n*(n-1)/2;
random_indices_pairs = randperm(p, m).';
[C1, C2] = myind2ind (random_indices_pairs, n);
indices_pairs=[C1 C2];
D2=[T(indices_pairs(:,1),:) T(indices_pairs(:,2),:)];
My question: when generating D2 the HPC goes out of memory. When generating D1 the HPC works fine, despite D1 being a larger array than D2. Why is that the case?
These are complementary functions used above:
function indices = pairIndices(n)
[y, x] = find(tril(logical(ones(n)), -1)); %#ok<LOGL>
indices = [x, y];
end
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end

Vectorization of a loop in matlab

I have a loop
for i = 2:K
T(K,i) = ((4^(i-1))*T(K,i-1)-T(K-1,i-1))/(4^(i-1)-1);
end
where T is a two dimensional matrix (first element in given row, and all elements in rows above are already there) and K is a scalar.
I have tried to vectorize this loop to make it faster like this:
i = 2:K;
T(K,i) = ((4.^(i-1)).*T(K,i-1)-T(K-1,i-1))./(4.^(i-1)-1);
it compiles, but it yields improper results. Can you tell me where am I making mistake?
#EDIT:
I have written this, but still the result is wrong
i = 2:K;
i2 = 1:(K-1);
temp1 = T(K,i2)
temp2 = T(K-1,i2)
T(K,i) = ((4.^(i2)).*temp1-temp2)./(4.^(i2)-1);

First, let's re-index your loop (having fewer i-1 expressions):
for i=1:K-1
T(K,i+1) = ( 4^i*T(K,i) - T(K-1,i) ) / (4^i-1);
end
Then (I'll leave out the loop for now), we can factor out 4^i/(4^i-1):
T(K,i+1) = ( T(K,i) - T(K-1,i)/4^i ) * (4^i/(4^i-1));
Let's call a(i) = (4^i/(4^i-1)), b(i) = - T(K-1,i)/4^i, then expanding the first terms we get:
T(K,1) = T(K,1)
T(K,2) = T(K,1)*a(1) + b(1)*a(1)
T(K,3) = T(K,1)*a(1)*a(2) + b(1)*a(1)*a(2) + b(2)*a(2)
T(K,4) = T(K,1)*a(1)*a(2)*a(3) + b(1)*a(1)*a(2)*a(3) + b(2)*a(2)*a(3) + b(3)*a(3)
Then with c = [1, a(1), a(1)*a(2), ...] = [1, cumprod(a)]
T(K,i) = (T(K,1) + (b(1)/c(1) + b(2)/c(2) + ... + b(i-1)/c(i-1) ) * c(i)
So with d = b ./ c, e = cumsum(d), summarizing all calculations:
i=1:K-1;
a = 4.^i./(4.^i-1);
b = -T(K-1,1:end-1) ./ 4.^i;
c = [1, cumprod(a)];
d = b ./ c(1:end-1);
e = cumsum(d);
T(K,2:K) = (T(K,1) + e) .* c(2:end);
To further optimize this, note that 4^14/(4^14 - 1) equals 1, when calculated with double-precision, so actually T(K,14:K) could be optimized drastically -- i.e., you actually just need to calculate a, c, 1./c up to index 13. (I'll leave that as an exercise).

Simplifying function by removing a loop

What would be the best way to simplify a function by getting rid of a loop?
function Q = gs(f, a, b)
X(4) = sqrt((3+2*sqrt(6/5))/7);
X(3) = sqrt((3-2*sqrt(6/5))/7);
X(2) = -sqrt((3-2*sqrt(6/5))/7);
X(1) = -sqrt((3+2*sqrt(6/5))/7);
W(4) = (18-sqrt(30))/36;
W(3) = (18+sqrt(30))/36;
W(2) = (18+sqrt(30))/36;
W(1) = (18-sqrt(30))/36;
Q = 0;
for i = 1:4
W(i) = (W(i)*(b-a))/2;
X(i) = ((b-a)*X(i)+(b+a))/2;
Q = Q + W(i) * f(X(i));
end
end
Is there any way to use any vector-like solution instead of a for loop?

sum is your best friend here. Also, declaring some constants and creating vectors is useful:
function Q = gs(f, a, b)
c = sqrt((3+2*sqrt(6/5))/7);
d = sqrt((3-2*sqrt(6/5))/7);
e = (18-sqrt(30))/36;
g = (18+sqrt(30))/36;
X = [-c -d d c];
W = [e g g e];
W = ((b - a) / 2) * W;
X = ((b - a)*X + (b + a)) / 2;
Q = sum(W .* f(X));
end
Note that MATLAB loves to handle element-wise operations, so the key is to replace the for loop at the end with scaling all of the elements in W and X with those scaling factors seen in your loop. In addition, using the element-wise multiplication (.*) is key. This of course assumes that f can handle things in an element-wise fashion. If it doesn't, then there's no way to avoid the for loop.
I would highly recommend you consult the MATLAB tutorial on element-wise operations before you venture onwards on your MATLAB journey: https://www.mathworks.com/help/matlab/matlab_prog/array-vs-matrix-operations.html

Translate an equation into a code

Can you help me please translate this equation into a code?
I tried this code but its only the first part
theSum = sum(M(:, y) .* S(:, y) ./ (1 + K(:, y)))

EDIT: sorry, I was having a brainfart. The answer below makes no assumptions on the nature of M, K, etc, which is why I recommended functions like that. But they're clearly matrices. I'll make another answer, I'll leave this here for reference though in case it's useful
I would start by making the M, K, X, L, and O expressions into simple functions, so that you can easily call them as M(z,y), X(z,y) (or X(z,j) depending on the input you need ) etc.
Then you will convert each summation into a for loop and collect the result (you can think about vectorisation later, right now focus on translating the problem). The double summation is essentially a nested for loop, where the result of the inner loop is used in the outer one at each outer iteration.
So your end result should look something like:
Summation1 = 0;
for z = 1 : Z
tmp = M(z,y) / K(z,y) * (X(z,y) / (1 + L(z,y));
Summation1 = Summation1 + tmp;
end
Summation2 = 0;
for j = 1 : Y
if j ~= y
for z = 1 : Z
tmp = (M(z,j) * X(z,j) * O(j)) / (K(z,j)^2 * (1 + L(z,j)) * X(z,y);
Summation2 = Summation2 + tmp;
end
end
end
Result = Summation1 - Summation2;
(Btw, this assumes that all operations are on scalars. If M(z,y) outputs a vector, adjust for elementwise operations appropriately)

IF M, K, etc are all matrices, and all operations are expected to be element-wise, then this is a vectorised approach for this equation.
Left summation is
S1 = M(1:Z,y) ./ K(1:Z,y) .* X(1:Z,y) ./ (1 + L(1:Z,y));
S1 = sum(S1);
Right summation is (assuming (O is a horizontal vector)
S2 = M(1:Z, 1:Y) .* X(1:X, 1:Y) .* repmat(O(1:Y), [Z,1]) ./ ...
(K(1:Z, 1:Y) .^ 2 .* (1 + L(1:Z, 1:Y))) .* X(1:Z, 1:Y);
S2(:,y) = []; % remove the 'y' column from the matrix
S2 = sum(S2(:)); % add all elements
End result: S1 - S2

this is lambda version vectorized:
equation = #(y,M,K,X,L,O) ...
sum(M(:,y)./K(:,y).*X(:,y)./(1+L(:,y))) ...
-sum(sum( ...
bsxfun( ...
#times ...
,M(:,[1:y-1,y+1:end]) ...
.* X(:,[1:y-1,y+1:end]) ...
.* O(:,[1:y-1,y+1:end]) ...
./ (K(:,[1:y-1,y+1:end]) .^ 2 ...
.*(1+ L(:,[1:y-1,y+1:end]))) ...
,X(:,y) ...
) ...
));
%%% example:
y = 3;
Y = 5;
Z = 10;
M = rand(Y, Z);K = rand(Y, Z);X = rand(Y, Z);L = rand(Y, Z);O = rand(Y, Z);
equation(y,M,K,X,L,O)

find optimum values of model iteratively

Given that I have a model that can be expressed as:
y = a + b*st + c*d2
where st is a smoothed version of some data, and a, b and c are model coffieicients that are unknown. An iterative process should be used to find the best values for a, b, and c and also an additional value alpha, shown below.
Here, I show an example using some data that I have. I'll only show a small fraction of the data here to get an idea of what I have:
17.1003710350253 16.7250000000000 681.521316544969
17.0325989276234 18.0540000000000 676.656460644882
17.0113862864815 16.2460000000000 671.738125420192
16.8744356336601 15.1580000000000 666.767363772145
16.5537077980594 12.8830000000000 661.739644621949
16.0646524243248 10.4710000000000 656.656219934146
15.5904357723302 9.35000000000000 651.523986525985
15.2894427136087 12.4580000000000 646.344231349275
15.1181450512182 9.68700000000000 641.118300709434
15.0074128442766 10.4080000000000 635.847600747838
14.9330905954828 11.5330000000000 630.533597865332
14.8201069920058 10.6830000000000 625.177819082427
16.3126863409751 15.9610000000000 619.781852331734
16.2700386755872 16.3580000000000 614.347346678083
15.8072873786912 10.8300000000000 608.876012461843
15.3788908036751 7.55000000000000 603.369621360944
15.0694302370038 13.1960000000000 597.830006367160
14.6313314652840 8.36200000000000 592.259061672302
14.2479738025295 9.03000000000000 586.658742460043
13.8147156115234 5.29100000000000 581.031064599264
13.5384821473624 7.22100000000000 575.378104234926
13.3603543306796 8.22900000000000 569.701997272687
13.2469020140965 9.07300000000000 564.004938753678
13.2064193251406 12.0920000000000 558.289182116093
13.1513460035983 12.2040000000000 552.557038340513
12.8747853506079 4.46200000000000 546.810874976187
12.5948999131388 4.61200000000000 541.053115045791
12.3969691298003 6.83300000000000 535.286235826545
12.1145822760120 2.43800000000000 529.512767505944
11.9541188991626 2.46700000000000 523.735291710730
11.7457790927936 4.15000000000000 517.956439908176
11.5202981254529 4.47000000000000 512.178891679167
11.2824263926694 2.62100000000000 506.405372863054
11.0981930749608 2.50000000000000 500.638653574697
10.8686514170776 1.66300000000000 494.881546094641
10.7122053911554 1.68800000000000 489.136902633882
10.6255883267131 2.48800000000000 483.407612975178
10.4979083986908 4.65800000000000 477.696601993434
10.3598092538338 4.81700000000000 472.006827058220
10.1929490084608 2.46700000000000 466.341275322034
10.1367069580204 2.36700000000000 460.702960898512
10.0194072271384 4.87800000000000 455.094921935306
9.88627023967911 3.53700000000000 449.520217586971
9.69091601129389 0.417000000000000 443.981924893704
9.48684595125235 -0.567000000000000 438.483135572389
9.30742664359900 0.892000000000000 433.026952726910
9.18283037670750 1.50000000000000 427.616487485241
9.02385722622626 1.75800000000000 422.254855571341
8.90355705229410 2.46700000000000 416.945173820367
8.76138912769045 1.99200000000000 411.690556646207
8.61299614111510 0.463000000000000 406.494112470755
8.56293606861698 6.55000000000000 401.358940124780
8.47831879772002 4.65000000000000 396.288125230599
8.42736865902327 6.45000000000000 391.284736577104
8.26325535934842 -1.37900000000000 386.351822497948
8.14547793724500 1.37900000000000 381.492407263967
8.00075641792910 -1.03700000000000 376.709487501030
7.83932517791044 -1.66700000000000 372.006028644665
7.68389447250257 -4.12900000000000 367.384961442799
7.63402151555169 -2.57900000000000 362.849178517935
The results that follow probably won't be meaningful as the full data would be needed (but this is an example). Using this data I have tried to solve iteratively by
y = d(:,1);
d1 = d(:,2);
d2 = d(:,3);
alpha_o = linspace(0.01,1,10);
a = linspace(0.01,1,10);
b = linspace(0.01,1,10);
c = linspace(0.01,1,10);
defining different values for a, b, and c as well as another term alpha, which is used in the model, and am now going to find every possible combination of these parameters and see which combination provides the best fit to the data:
% every possible combination of values
xx = combvec(alpha_o,a,b,c);
% loop through each possible combination of values
for j = 1:size(xx,2);
alpha_o = xx(1,j);
a_o = xx(2,j);
b_o = xx(3,j);
c_o = xx(4,j);
st = d1(1);
for i = 2:length(d1);
st(i) = alpha_o.*d1(i) + (1-alpha_o).*st(i-1);
end
st = st(:);
y_pred = a_o + (b_o*st) + (c_o*d2);
mae(j) = nanmean(abs(y - y_pred));
end
I can then re-run the model using these optimum values:
[id1,id2] = min(mae);
alpha_opt = xx(:,id2);
st = d1(1);
for i = 2:length(d1);
st(i) = alpha_opt(1).*d1(i) + (1-alpha_opt(1)).*st(i-1);
end
st = st(:);
y_pred = alpha_opt(2) + (alpha_opt(3)*st) + (alpha_opt(4)*d2);
mae_final = nanmean(abs(y - y_pred));
However, to reach a final answer I would need to increase the number of initial guesses to more than 10 for each variable. This will take a long time to run. Thereofre, I am wondering if there is a better method for what I am trying to do here? Any advice is appreciated.

Here's some thoughts: If you could decrease the amount of computation within each for loop, you could possibly speed it up. One possible way is to look for common factors between each loop and move it outside for loop:
If you look at the iteration, you'll see
st(1) = d1(1)
st(2) = a * d1(2) + (1-a) * st(1) = a *d1(2) + (1-a)*d1(1)
st(3) = a * d1(3) + (1-a) * st(2) = a * d1(3) + a *(1-a)*d1(2) +(1-a)^2 * d1(1)
st(n) = a * d1(n) + a *(1-a)*d1(n-1) + a *(1-a)^2 * d1(n-2) + ... +(1-a)^(n-1)*d1(1)
Which means st can be calculated by multiplying these two matrices (here I use n=4 for example to illustrate the concept) and sum along the first dimension:
temp1 = [ 0 0 0 a ;
0 0 a a(1-a) ;
0 a a(1-a) a(1-a)^2 ;
1 (1-a) (1-a)^2 (1-a)^3 ;]
temp2 = [ 0 0 0 d1(4) ;
0 0 d1(3) d1(3) ;
0 d1(2) d1(2) d1(2) ;
d1(1) d1(1) d1(1) d1(1) ;]
st = sum(temp1.*temp2,1)
Here's codes that utilize this concept: Computation has been moved out of the inner for loop and only assignment is left.
alpha_o = linspace(0.01,1,10);
xx = nchoosek(alpha_o, 4);
n = size(d1,1);
matrix_d1 = zeros(n, n);
d2 = d2'; % To make the dimension of d2 and st the same.
for ii = 1:n
matrix_d1(n-ii+1:n, ii) = d1(1:ii);
end
st = zeros(size(d1)'); % Pre-allocation of matrix will improve speed.
mae = zeros(1,size(xx,1));
matrix_alpha = zeros(n, n);
for j = 1 : size(xx,1)
alpha_o = xx(j,1);
temp = (power(1-alpha_o, [0:n-1])*alpha_o)';
matrix_alpha(n,:) = power(1-alpha_o, [0:n-1]);
for ii = 2:n
matrix_alpha(n-ii+1:n-1, ii) = temp(1:ii-1);
end
st = sum(matrix_d1.*matrix_alpha, 1);
y_pred = xx(j,2) + xx(j,3)*st + xx(j,4)*d2;
mae(j) = nanmean(abs(y - y_pred));
end
Then :
idx = find(min(mae));
alpha_opt = xx(idx,:);
st = zeros(size(d1)');
temp = (power(1-alpha_opt(1), [0:n-1])*alpha_opt(1))';
matrix_alpha = zeros(n, n);
matrix_alpha(n,:) = power(1-alpha_opt(1), [0:n-1]);;
for ii = 2:n
matrix_alpha(n-ii+1:n-1, ii) = temp(1:ii-1);
end
st = sum(matrix_d1.*matrix_alpha, 1);
y_pred = alpha_opt(2) + (alpha_opt(3)*st) + (alpha_opt(4)*d2);
mae_final = nanmean(abs(y - y_pred));
Let me know if this helps !