How can I vectorize this nested for loop in MATLAB?

I'm trying to vectorize the inner loop, but I can't get it to work.
for i = 1:NTS
    U(1) = 0;
    V(1) = 0;
    U(end) = 1;
    V(end) = 1;
    for j = 2:NAS-1
        U(j) = (b(j) * U(j-1) + a(j) * UOld(j) + c(j) * UOld(j+1)) / gamma(j);
        V(NAS-j+1) = (b(NAS-j+1) * VOld(NAS-j+1-1) + a(NAS-j+1) * VOld(NAS-j+1) + c(NAS-j+1) * V(NAS-j+1+1)) / gamma(NAS-j+1);
    end
end
I've tried the following code, but it gives a wrong result. Can someone please help me out with that?
U(2:NAS-1) = ((b(2:NAS-1) .* U(1:NAS-2) + a(2:NAS-1) .* UOld(2:NAS-1) + c(2:NAS-1) .* UOld(3:NAS)) ./ gamma(2:NAS-1));
V(NAS-1:-1:2) = ((b(NAS-1:-1:2) .* VOld(NAS-2:-1:1) + a(NAS-1:-1:2) .* VOld(NAS-1:-1:2) + c(NAS-1:-1:2) .* V(NAS:-1:3)) ./ gamma(NAS-1:-1:2));
Here is the whole code, as requested, as a minimal verifiable example. Thanks.
clc
clear
NTS = 1;
NAS = 10;
U = rand(1,10)*100;
V = U;
UU = zeros(1,10);
VV = zeros(1,10);
gamma = ones(1,100)*(1-10.05);   % named gamma because the loops index gamma(j)
a = ones(1,10)*(1-0.05);
b = ones(1,10)*(1-0.15);
c = ones(1,10)*(1-0.35);
UOld = U;
VOld = V;
UU = U;
VV = V;
for i = 1:NTS
    % vectorized attempt
    UU(2:NAS-1) = ((b(2:NAS-1) .* UU(1:NAS-2) + a(2:NAS-1) .* UOld(2:NAS-1) + c(2:NAS-1) .* UOld(3:NAS)) ./ gamma(2:NAS-1));
    VV(NAS-1:-1:2) = ((b(NAS-1:-1:2) .* VOld(NAS-2:-1:1) + a(NAS-1:-1:2) .* VOld(NAS-1:-1:2) + c(NAS-1:-1:2) .* VV(NAS:-1:3)) ./ gamma(NAS-1:-1:2));
    % original loop
    for j = 2:NAS-1
        U(j) = (b(j) * U(j-1) + a(j) * UOld(j) + c(j) * UOld(j+1)) / gamma(j);
        V(NAS-j+1) = (b(NAS-j+1) * VOld(NAS-j+1-1) + a(NAS-j+1) * VOld(NAS-j+1) + c(NAS-j+1) * V(NAS-j+1+1)) / gamma(NAS-j+1);
    end
    display(sum(U(2:NAS-1)'-UU(2:NAS-1)'));
    display(sum(V(2:NAS-1)'-VV(2:NAS-1)'));
end
display([U' UU']);
display([V' VV']);
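The slice version is not equivalent to the loop because the right-hand side `UU(1:NAS-2)` is read in full before any element of `UU(2:NAS-1)` is written, so every U(j) uses the old U(j-1) instead of the freshly updated one (the same happens with `VV(NAS:-1:3)` for V). The inner loop is a first-order linear recurrence, which plain slicing cannot reproduce. When b./gamma and c./gamma are constant, as with the constant vectors in the example, the recurrence can be handed to the built-in `filter`. Below is a minimal sketch under that assumption, checked against the original loop; the names `rU`, `rV`, `fU`, `fV`, `Uf` and `Vf` are illustrative:

```matlab
% Sketch: loop-free version of the inner loop using filter().
% Assumption: b./gamma and c./gamma are constant along the grid,
% as in the example (a, b, c and gamma are constant vectors).
NAS   = 10;
UOld  = rand(1, NAS) * 100;
VOld  = UOld;
a     = ones(1, NAS) * (1 - 0.05);
b     = ones(1, NAS) * (1 - 0.15);
c     = ones(1, NAS) * (1 - 0.35);
gamma = ones(1, NAS) * (1 - 10.05);   % shadows the gamma() function, as in the question

% Reference result: the original sequential loop
U = UOld;  U(1) = 0;  U(end) = 1;
V = VOld;  V(1) = 0;  V(end) = 1;
for j = 2:NAS-1
    U(j) = (b(j)*U(j-1) + a(j)*UOld(j) + c(j)*UOld(j+1)) / gamma(j);
    m = NAS - j + 1;
    V(m) = (b(m)*VOld(m-1) + a(m)*VOld(m) + c(m)*V(m+1)) / gamma(m);
end

% Forward sweep: U(j) = rU*U(j-1) + fU(j), seeded with U(1)
Uf = UOld;  Uf(1) = 0;  Uf(end) = 1;
k  = 2:NAS-1;
rU = b(2) / gamma(2);
fU = (a(k).*UOld(k) + c(k).*UOld(k+1)) ./ gamma(k);
Uf(k) = filter(1, [1 -rU], fU, rU*Uf(1));

% Backward sweep: V(m) = rV*V(m+1) + fV(m) for m = NAS-1 down to 2, seeded with V(NAS)
Vf = VOld;  Vf(1) = 0;  Vf(end) = 1;
m  = NAS-1:-1:2;
rV = c(2) / gamma(2);
fV = (b(m).*VOld(m-1) + a(m).*VOld(m)) ./ gamma(m);
Vf(m) = filter(1, [1 -rV], fV, rV*Vf(NAS));

fprintf('max |U - Uf| = %g\nmax |V - Vf| = %g\n', max(abs(U - Uf)), max(abs(V - Vf)));
```

If b./gamma or c./gamma vary with j, this substitution does not apply as written.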


Runge-Kutta Numerical Method Bad Approximation

I'm attempting to use the Runge-Kutta method to compare it to the lsode function, but it performs rather poorly. Every other method I used to compare with lsode (forward and backward Euler, Heun) does a much better job, to the point that they are almost indistinguishable from lsode.
This is what my code returns:
https://i.stack.imgur.com/vJ6Yi.png
If anyone can point out a way to improve it, or if I'm doing something wrong, I'd appreciate it.
The following is what I use for the Runge-Kutta method:
%Initial conditions
u(1) = 1;
v(1) = 2;
p(1) = -1/sqrt(3);
q(1) = 1/sqrt(3);
% Graph interval / step size
s0 = 0;
sf = 50;
h = 0.25;
n=(sf-s0)/h;
s(1) = s0;
%-----------------------------------------------------------------------%
for j = 2:n
    i = j-1;
    k1_u(j) = p(i);
    k1_v(j) = q(i);
    k1_p(j) = (-2*v(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1);
    k1_q(j) = (-2*u(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1);
    u1(j) = p(i) + (1/2)*k1_u(j)*h;
    v1(j) = q(i) + (1/2)*k1_v(j)*h;
    p1(j) = (-2*v(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + (1/2)*k1_p(j)*h;
    q1(j) = (-2*u(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + (1/2)*k1_q(j)*h;
    k2_u(j) = p1(j);
    k2_v(j) = q1(j);
    k2_p(j) = (-2*v1(j)*p1(j)*q1(j)) / (u1(j)*u1(j) + v1(j)*v1(j) + 1);
    k2_q(j) = (-2*u1(j)*p1(j)*q1(j)) / (u1(j)*u1(j) + v1(j)*v1(j) + 1);
    u2(j) = p(i) + (1/2)*k2_u(j)*h;
    v2(j) = q(i) + (1/2)*k2_v(j)*h;
    p2(j) = (-2*v(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + (1/2)*k2_p(j)*h;
    q2(j) = (-2*u(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + (1/2)*k2_q(j)*h;
    k3_u(j) = p2(j);
    k3_v(j) = q2(j);
    k3_p(j) = (-2*v2(j)*p2(j)*q2(j)) / (u2(j)*u2(j) + v2(j)*v2(j) + 1);
    k3_q(j) = (-2*u2(j)*p2(j)*q2(j)) / (u2(j)*u2(j) + v2(j)*v2(j) + 1);
    u3(j) = p(i) + k3_u(j)*h;
    v3(j) = q(i) + k3_v(j)*h;
    p3(j) = (-2*v(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + k3_p(j)*h;
    q3(j) = (-2*u(i)*p(i)*q(i)) / (u(i)*u(i) + v(i)*v(i) + 1) + k3_q(j)*h;
    k4_u(j) = p3(j);
    k4_v(j) = q3(j);
    k4_p(j) = (-2*v3(j)*p3(j)*q3(j)) / (u3(j)*u3(j) + v3(j)*v3(j) + 1);
    k4_q(j) = (-2*u3(j)*p3(j)*q3(j)) / (u3(j)*u3(j) + v3(j)*v3(j) + 1);
    s(j) = s(j-1) + h;
    u(j) = u(j-1) + (h/6)*(k1_u(j) + 2*k2_u(j) + 2*k3_u(j) + k4_u(j));
    v(j) = v(j-1) + (h/6)*(k1_v(j) + 2*k2_v(j) + 2*k3_v(j) + k4_v(j));
    p(j) = p(j-1) + (h/6)*(k1_p(j) + 2*k2_p(j) + 2*k3_p(j) + k4_p(j));
    q(j) = q(j-1) + (h/6)*(k1_q(j) + 2*k2_q(j) + 2*k3_q(j) + k4_q(j));
endfor
subplot(2,3,1), plot(s,u);
hold on; plot(s,v); hold off;
title ("Runge-Kutta");
h = legend ("u(s)", "v(s)");
legend (h, "location", "northwestoutside");
set (h, "fontsize", 10);
You misunderstood something in the method. The intermediate values for p, q are computed the same way as the intermediate values for u, v: both are "Euler steps" with the most recently computed slopes, not separate slope computations. For the first midpoint that is:
u1(j) = u(i) + (1/2)*k1_u(j)*h;
v1(j) = v(i) + (1/2)*k1_v(j)*h;
p1(j) = p(i) + (1/2)*k1_p(j)*h;
q1(j) = q(i) + (1/2)*k1_q(j)*h;
The computation of the k2 values is then correct. The next midpoints must likewise be computed as "Euler steps" from u(i), v(i), p(i), q(i), using the k2 slopes and then the k3 slopes, and so on.
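Putting that together, here is a minimal sketch of a classical RK4 loop for the same system, with the four unknowns stacked into one column vector y = [u; v; p; q] so that every stage is automatically an Euler step from the current state. The handles `D` and `f` and the array `Y` are illustrative names, not taken from the question. The sketch also takes n steps and stores n+1 points, so the last sample lands on s = sf (the question's `for j = 2:n` stops one step short):

```matlab
% Sketch: classical RK4 for the system in the question,
%   u' = p,  v' = q,
%   p' = -2*v*p*q/(u^2+v^2+1),  q' = -2*u*p*q/(u^2+v^2+1)
% with the state stacked as y = [u; v; p; q].
D = @(y) y(1)^2 + y(2)^2 + 1;                    % shared denominator
f = @(y) [y(3); y(4); -2*y(2)*y(3)*y(4)/D(y); -2*y(1)*y(3)*y(4)/D(y)];

s0 = 0;  sf = 50;  h = 0.25;
n  = round((sf - s0) / h);          % number of steps
s  = s0 + h * (0:n);                % n+1 points, so s(end) == sf
Y  = zeros(4, n + 1);
Y(:, 1) = [1; 2; -1/sqrt(3); 1/sqrt(3)];   % [u; v; p; q] at s0

for kstep = 1:n
    y  = Y(:, kstep);
    k1 = f(y);
    k2 = f(y + (h/2) * k1);         % Euler half-step with slope k1
    k3 = f(y + (h/2) * k2);         % Euler half-step with slope k2
    k4 = f(y + h * k3);             % Euler full step with slope k3
    Y(:, kstep + 1) = y + (h/6) * (k1 + 2*k2 + 2*k3 + k4);
end

u = Y(1, :);  v = Y(2, :);
fprintf('u(%g) = %.6f, v(%g) = %.6f\n', s(end), u(end), s(end), v(end));
```

plot(s, u, s, v) then gives the comparison plot. As a cheap accuracy check, this system conserves (1+v^2)p^2 + 2uv*p*q + (1+u^2)q^2, which equals 1 for these initial conditions, so its drift along Y reflects the integration error.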

Assignment to an array defined outside parloop inside parfor

Consider the following code.
Wx = zeros(N, N);
for ii = 1 : 1 : N
    x_ref = X(ii); y_ref = Y(ii);
    nghlst_Local = nghlst(ii, find(nghlst(ii, :))); Nl = length(nghlst_Local);
    x_Local = X(nghlst_Local, 1); y_Local = Y(nghlst_Local, 1);
    PhiU = ones(Nl+1, Nl+1); PhiU(end, end) = 0;
    Phi = ones(Nl+1, Nl+1); Phi(end, end) = 0;
    Bx = zeros(Nl+1,1);
    for jj = 1 : 1 : Nl
        for kk = 1 : 1 : Nl
            rx = x_Local(jj,1) - x_Local(kk,1);
            ry = y_Local(jj,1) - y_Local(kk,1);
            PhiU(jj, kk) = (1 - U(1,1)) / sqrt(rx^2 + ry^2 + c^2);
        end
        rx = x_ref - x_Local(jj);
        ry = y_ref - y_Local(jj);
        Bx(jj, 1) = ( (Beta * pi * U(1,1)/(2*r_0*norm(U))) * cos( (pi/2) * (-rx * U(1,1) - ry * U(2,1)) / (r_0 * norm(U)) ) ) / sqrt(rx^2 + ry^2 + c^2) - rx * (1 - Beta * sin( (pi/2) * (-rx * U(1,1) - ry * U(2,1)) / (r_0 * norm(U)) ))/ (rx^2 + ry^2 + c^2)^(3/2);
    end
    invPhiU = inv(PhiU);
    CX = Bx' * invPhiU; CX = CX (1, 1:end-1); Wx (ii, nghlst_Local) = CX;
end
I want to convert the first for loop into a parfor loop. The rest of the code works fine, but the following assignment statement does not work when I change for to parfor:
Wx (ii, nghlst_Local) = CX;
I want to know what is wrong here and how to remove such errors. Thank you.
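A likely cause, based on how parfor classifies variables: an output array is only treated as sliced when one subscript is the loop variable (ii, or ii plus or minus a constant) and every other subscript is a constant, a broadcast variable, a colon or end. nghlst_Local is computed inside the loop, so Wx(ii, nghlst_Local) does not qualify and MATLAB cannot classify Wx. A common workaround is to fill a temporary full row and assign it as Wx(ii, :). Below is a minimal sketch in which the neighbour list and the formula standing in for CX are hypothetical placeholders for the real computation:

```matlab
% Sketch: write whole rows so that Wx is a valid sliced output of parfor.
% nghlst and the CX formula below are hypothetical placeholders.
N = 6;
nghlst = zeros(N, 3);                   % row ii: neighbour indices, 0 = unused
for ii = 1:N
    nghlst(ii, 1:2) = [ii, mod(ii, N) + 1];
end

Wx = zeros(N, N);
parfor ii = 1:N
    nrow = nghlst(ii, :);               % sliced input: loop index + colon
    nghlst_Local = nrow(nrow ~= 0);     % temporary, local to this iteration
    CX = ii * (1:numel(nghlst_Local));  % stand-in for the real CX row
    row = zeros(1, N);                  % temporary full row
    row(nghlst_Local) = CX;
    Wx(ii, :) = row;                    % sliced output: ii and colon only
end
disp(Wx);
```

Without a parallel pool (or in Octave) the parfor loop simply runs serially, so the same code can be checked on a single worker first.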

Solution of a transcendental equation with MATLAB

I have an equation which goes like this:
Here, I_L(lambdap) is the modified Bessel function of the first kind. Its product with the exponential factor can be written in MATLAB as besseli(L,lambdap,1). "i" stands for the square root of -1. I want to solve:
1 + pt + it = 0
where I have to vary 'k' and find the values of 'w'. I had posted a similar problem on Mathematica Stack Exchange, but I couldn't solve the problem fully, though I did get a clue there (please see the comments on the Mathematica Stack Exchange site). I could not convert my equation to the code that was posted in the clue. Any help in this regard will be highly appreciated.
Thanks in advance...
I never attempted this before, but... is this returning a suitable result?
syms w k;
fun = 1 + pt(w,k) + it(w,k);
sol = vpasolve(fun == 0,w,k);
disp(sol.w);
disp(sol.k);
function res = pt(w,k)
    eps_l0 = w / (1.22 * k);
    lam_k = 0.25 * k^2;
    res = sym('res',[5 1]);
    res_off = 1;
    for L = -2:2
        gam = besseli(L,lam_k) * exp(-lam_k);
        eps_z = (w - L) / (1.22 * k);
        zeta = 1i * sqrt(pi()) * exp(-eps_z^2) * (1 + erfc(1i * eps_z));
        res(res_off,:) = ((25000 * gam) / k^2) * (1 + (eps_l0 * zeta));
        res_off = res_off + 1;
    end
    res = sum(res);
end
function res = it(w,k)
    eps_l0 = (w - (0.86 * k)) / (3.46 * k);
    lam_k = 0.03 * k^2;
    res = sym('res',[5 1]);
    res_off = 1;
    for L = -2:2
        gam = besseli(L,lam_k) * exp(-lam_k);
        eps_z = (w - (8 * L) - (0.86 * k)) / (3.46 * k);
        zeta = 1i * sqrt(pi()) * exp(-eps_z^2) * (1 + erfc(1i * eps_z));
        res(res_off,:) = ((2000000 * gam) / k^2) * (1 + (eps_l0 * zeta));
        res_off = res_off + 1;
    end
    res = sum(res);
end
EDIT
For numeric k and symbolic w:
syms w;
for k = [-3:-1, 1:3]   % skip k = 0, where pt and it divide by zero
    fun = 1 + pt(w,k) + it(w,k);
    sol = vpasolve(fun == 0, w);   % one unknown: sol is a sym, not a struct
    disp(sol);
end
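If the Symbolic Math Toolbox is not available, or vpasolve struggles, a complex root in w for one fixed numeric k can also be chased with a plain secant iteration, which needs only function values. This is a minimal sketch that is not taken from the question: `f` is a hypothetical stand-in (w^2 + 1, whose roots are +i and -i) to be replaced by a numeric version of 1 + pt(w,k) + it(w,k), and the starting guesses and tolerance are assumptions. Note that MATLAB's numeric erfc accepts only real input, so the zeta term would need a complex-capable implementation in such a numeric version.

```matlab
% Sketch: secant iteration for a complex root of f(w) = 0.
% f is a hypothetical stand-in; replace it with a numeric
% version of 1 + pt(w,k) + it(w,k) for one fixed k.
f = @(w) w.^2 + 1;           % roots at +1i and -1i

w0 = 0.5i;  w1 = 1.5i;       % two complex starting guesses (assumed)
f0 = f(w0); f1 = f(w1);
tol = 1e-12;
for iter = 1:50
    w2 = w1 - f1 * (w1 - w0) / (f1 - f0);   % secant update
    w0 = w1;  f0 = f1;
    w1 = w2;  f1 = f(w1);
    if abs(w1 - w0) < tol * max(1, abs(w1))
        break;
    end
end
w = w1;
fprintf('root %s after %d iterations, |f| = %g\n', num2str(w), iter, abs(f(w)));
```

Sweeping k then amounts to rerunning the iteration for each k, using the root found for the previous k as the next starting guess.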

Finding the minimum of a function over an interval

At Martin's request, here is the basic problem. There is a function M(x) that is to be minimized over the interval [lb, ub].
M = @(x) (a_1 * x + b_1) * (log((a_1 * x + b_1)/P_1) + X_u)...
    + (a_2 * x + b_2) * (log((a_2 * x + b_2)/P_2) + X_m)...
    + x * (log(x / P_3) + X_d);
lb = max(0, -b_1 / a_1);
ub = -b_2 / a_2;
where the inputs are:
P_1 = 0.6;
P_2 = 0.2;
P_3 = 0.2;
a_1 = 0.7071;
a_2 = -1.7071;
b_1 = 0.0245;
b_2 = 0.9755;
X_u = 44;
X_m = 2.9949;
X_d = 0;
The other option would be to solve for the root of m_dash, the derivative of M:
m_dash = @(x) log(((a_1 .* x + b_1).^a_1) .* ((a_2 .* x + b_2).^a_2) .* x)...
    - log((P_1.^a_1) .* (P_2.^a_2) .* P_3) + a_1 .* X_u + a_2 .* X_m + X_d;
Any help would be greatly appreciated.
If you want to minimize a function over a certain interval, you can use the fminbnd function, which is part of core MATLAB, so no Optimization Toolbox is needed. Alternatively, you can coerce the built-in function fminsearch to only return results from the interval:
rlv = 1e12; % ridiculously large value
M_hacked = @(x) rlv*((x < lb) + (x > ub)) + M(x);
x_min = fminsearch(M_hacked, (lb + ub)/2)
I introduced a new function, M_hacked, which returns ridiculously large values for x outside the interval.
This is not the most elegant solution, but it should do for your problem.
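A minimal sketch of both routes from the question, fminbnd on M and fzero on m_dash, using the inputs given above. The tighter TolX and the small offsets that keep the fzero bracket strictly inside (lb, ub), where the logarithms stay finite, are assumptions chosen for illustration:

```matlab
% Sketch: minimise M on [lb, ub] with fminbnd, and cross-check by
% finding the root of its derivative m_dash with fzero.
P_1 = 0.6;  P_2 = 0.2;  P_3 = 0.2;
a_1 = 0.7071;  a_2 = -1.7071;
b_1 = 0.0245;  b_2 = 0.9755;
X_u = 44;  X_m = 2.9949;  X_d = 0;

M = @(x) (a_1*x + b_1) .* (log((a_1*x + b_1)/P_1) + X_u) ...
       + (a_2*x + b_2) .* (log((a_2*x + b_2)/P_2) + X_m) ...
       + x .* (log(x/P_3) + X_d);
m_dash = @(x) log(((a_1*x + b_1).^a_1) .* ((a_2*x + b_2).^a_2) .* x) ...
            - log((P_1^a_1) * (P_2^a_2) * P_3) + a_1*X_u + a_2*X_m + X_d;

lb = max(0, -b_1/a_1);
ub = -b_2/a_2;

% fminbnd searches the open interval lb < x < ub, where M is finite
x_min = fminbnd(M, lb, ub, optimset('TolX', 1e-12));

% m_dash changes sign inside (lb, ub); bracket slightly inside the ends
x_root = fzero(m_dash, [max(lb, 1e-300), ub*(1 - 1e-9)]);

fprintf('fminbnd: x = %.6g, M = %.12g\n', x_min, M(x_min));
fprintf('fzero:   x = %.6g, M = %.12g\n', x_root, M(x_root));
```

Both routes should land on the same point, because M is convex on the interval (M'' = a_1^2/(a_1 x + b_1) + a_2^2/(a_2 x + b_2) + 1/x > 0 there).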

Radix-4 FFT implementation

The Octave radix-4 FFT code below works fine if I set the power-of-4 exponent (xp) by hand, one value at a time:
$ octave fft4.m
ans = 1.4198e-015
However, if I uncomment the loop code, I get the following error:
$ octave fft4.m
error: `stage' undefined near line 48 column 68
error: evaluating argument list element number 1
error: evaluating argument list element number 2
error: called from:
error: r4fftN at line 48, column 22
error: c:\Users\david\Documents\Visual Studio 2010\Projects\mv_fft\fft4.m at line 80, column 7
The "error" refers to a line in the FFT function code that otherwise works correctly when xp is not set by a loop... very strange.
function Z = radix4bfly(x,segment,stageFlag,W)
% For the last stage of a radix-4 FFT all the ABCD multiplers are 1.
% Use the stageFlag variable to indicate the last stage
% stageFlag = 0 indicates last FFT stage, set to 1 otherwise
% Initialize variables and scale to 1/4
a=x(1)*.25;
b=x(2)*.25;
c=x(3)*.25;
d=x(4)*.25;
% Radix-4 Algorithm
A=a+b+c+d;
B=(a-b+c-d)*W(2*segment*stageFlag + 1);
C=(a-b*j-c+d*j)*W(segment*stageFlag + 1);
D=(a+b*j-c-d*j)*W(3*segment*stageFlag + 1);
% assemble output
Z = [A B C D];
end % radix4bfly()
% radix-4 DIF FFT, input signal must be floating point, real or complex
%
function S = r4fftN(s)
% Initialize variables and signals: length of input signal is a power of 4
N = length(s);
M = log2(N)/2;
% Initialize variables for floating point sim
W=exp(-j*2*pi*(0:N-1)/N);
S = complex(zeros(1,N));
sTemp = complex(zeros(1,N));
% FFT algorithm
% Calculate butterflies for first M-1 stages
sTemp = s;
for stage = 0:M-2
    for n=1:N/4
        S((1:4)+(n-1)*4) = radix4bfly(sTemp(n:N/4:end), floor((n-1)/(4^stage)) *(4^stage), 1, W);
    end
    sTemp = S;
end
% Calculate butterflies for last stage
for n=1:N/4
    S((1:4)+(n-1)*4) = radix4bfly(sTemp(n:N/4:end), floor((n-1)/(4^stage)) * (4^stage), 0, W);
end
sTemp = S;
% Rescale the final output
S = S*N;
end % r4fftN(s)
% test FFT code
%
xp = 2;
% ERROR if I uncomment loop!
%for xp=1:8
N = 4^xp; % must be power of: 4 16 64 256 1024 4096 ....
x = 2*pi/N * (0:N-1);
x = cos(x);
Y_ref = fft(x);
Y = r4fftN(x);
Y = digitrevorder(Y,4);
%Y = bitrevorder(Y,4);
abs(sum(Y_ref-Y)) % compare fft4 to built-in fft
%end
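digitrevorder is a Signal Processing Toolbox function in MATLAB. If it is not at hand, the base-4 digit reversal it performs can be written out directly. A minimal sketch, assuming the output of r4fftN is in base-4 digit-reversed order, as the question's call digitrevorder(Y,4) implies; the names r, t and numDigits are illustrative:

```matlab
% Sketch: base-4 digit-reversal permutation, an assumed stand-in for
% the reordering done by digitrevorder(Y, 4) in the question.
N = 64;                           % must be a power of 4
numDigits = round(log(N) / log(4));
k = 0:N-1;                        % natural-order indices
r = zeros(1, N);                  % digit-reversed indices
t = k;
for s = 1:numDigits
    r = 4*r + mod(t, 4);          % append the lowest base-4 digit of t
    t = floor(t / 4);
end
disp(r(1:8));
```

With N = 4^xp, Y(r + 1) would then play the role of digitrevorder(Y, 4); since digit reversal is its own inverse, the same index vector also undoes the reordering.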
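The problem was the loop bound for the exponent xp: it should start from 2, because the fft4 code assumes at least 2 stages of radix-4 butterflies. With xp = 1 (N = 4), M = 1, so the loop for stage = 0:M-2 never executes and stage is still undefined when the last-stage loop uses it.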
Sorry folks :(
-David
Please find below a fully worked MATLAB implementation of a radix-4 Decimation In Frequency FFT algorithm. I have also provided an overall operations count in terms of complex multiplications and additions. It can indeed be shown that each radix-4 butterfly involves 3 complex multiplications and 8 complex additions. Since there are log_4(N) = log_2(N) / 2 stages and each stage involves N / 4 butterflies, the operations count is
complex multiplications = (3 / 8) * N * log2(N)
complex additions = N * log2(N)
Here is the code:
% --- Radix-4 Decimation In Frequency - Iterative approach
clear all
close all
clc
% --- N should be a power of 4
N = 1024;
% x = randn(1, N);
x = zeros(1, N);
x(1 : 10) = 1;
xoriginal = x;
xhat = zeros(1, N);
numStages = log2(N) / 2;
W = exp(-1i * 2 * pi * (0 : N - 1) / N);
omegaa = exp(-1i * 2 * pi / N);
mulCount = 0;
sumCount = 0;
M = N / 4;
for p = 1 : numStages
    for index = 0 : (N / (4^(p - 1))) : (N - 1)
        for n = 0 : M - 1
            a = x(n + index + 1) + x(n + index + M + 1) + x(n + index + 2 * M + 1) + x(n + index + 3 * M + 1);
            b = (x(n + index + 1) - x(n + index + M + 1) + x(n + index + 2 * M + 1) - x(n + index + 3 * M + 1)) .* omegaa^(2 * (4^(p - 1) * n));
            c = (x(n + index + 1) - 1i * x(n + index + M + 1) - x(n + index + 2 * M + 1) + 1i * x(n + index + 3 * M + 1)) .* omegaa^(1 * (4^(p - 1) * n));
            d = (x(n + index + 1) + 1i * x(n + index + M + 1) - x(n + index + 2 * M + 1) - 1i * x(n + index + 3 * M + 1)) .* omegaa^(3 * (4^(p - 1) * n));
            x(n + 1 + index) = a;
            x(n + M + 1 + index) = b;
            x(n + 2 * M + 1 + index) = c;
            x(n + 3 * M + 1 + index) = d;
            mulCount = mulCount + 3;
            sumCount = sumCount + 8;
        end
    end
    M = M / 4;
end
xhat = bitrevorder(x);
tic
xhatcheck = fft(xoriginal);
timeFFTW = toc;
rms = 100 * sqrt(sum(sum(abs(xhat - xhatcheck).^2)) / sum(sum(abs(xhat).^2)));
fprintf('Theoretical multiplications count \t = %i; \t Actual multiplications count \t = %i\n', ...
(3 / 8) * N * log2(N), mulCount);
fprintf('Theoretical additions count \t\t = %i; \t Actual additions count \t\t = %i\n\n', ...
N * log2(N), sumCount);
fprintf('Root mean square with FFTW implementation = %.10e\n', rms);
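The figure of 8 additions per butterfly assumes the four outputs share partial sums. Written out directly, as in the loop above, each of a, b, c and d costs three additions (12 per butterfly), even though sumCount adds 8. A minimal sketch of the shared form for a single butterfly with all twiddle factors equal to 1 (the n = 0 case), checked against fft; the temporaries t0 to t3 are illustrative names:

```matlab
% Sketch: one radix-4 DIF butterfly using 8 complex additions by sharing sums.
% Twiddles are taken as 1 (n = 0), so a..d form the 4-point DFT.
% In general b, c and d would each be multiplied by one twiddle (3 multiplications).
x = [1+2i, -0.5+1i, 3-1i, 0.25-0.75i];   % arbitrary test input

t0 = x(1) + x(3);        % 4 additions/subtractions ...
t1 = x(1) - x(3);
t2 = x(2) + x(4);
t3 = x(2) - x(4);
a  = t0 + t2;            % ... plus 4 more, 8 in total
b  = t0 - t2;
c  = t1 - 1i * t3;       % multiplying by -1i or 1i only swaps/negates parts
d  = t1 + 1i * t3;

% The outputs come out as [X0 X2 X1 X3], i.e. bit-reversed order for 2-bit indices
X = fft(x);
disp(max(abs([a c b d] - X)));
```

This ordering of b and c inside each butterfly is also why the loop above can finish with bitrevorder rather than a base-4 digit reversal.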