Radix-4 FFT implementation - matlab

The Octave radix-4 FFT code below works fine if I set power of 4 (xp) values case-by-case.
$ octave fft4.m
ans = 1.4198e-015
However, if I uncomment the loop code I get the following error
$ octave fft4.m
error: `stage' undefined near line 48 column 68
error: evaluating argument list element number 1
error: evaluating argument list element number 2
error: called from:
error: r4fftN at line 48, column 22
error: c:\Users\david\Documents\Visual Studio 2010\Projects\mv_fft\fft4.m at line 80, column 7
The "error" refers to a line in the FFT function code which otherwise works correctly when xp is not set by a loop ... very strange.
function Z = radix4bfly(x,segment,stageFlag,W)
% For the last stage of a radix-4 FFT all the ABCD multipliers are 1.
% Use the stageFlag variable to indicate the last stage
% stageFlag = 0 indicates last FFT stage, set to 1 otherwise
% Initialize variables and scale to 1/4
a=x(1)*.25;
b=x(2)*.25;
c=x(3)*.25;
d=x(4)*.25;
% Radix-4 Algorithm
A=a+b+c+d;
B=(a-b+c-d)*W(2*segment*stageFlag + 1);
C=(a-b*j-c+d*j)*W(segment*stageFlag + 1);
D=(a+b*j-c-d*j)*W(3*segment*stageFlag + 1);
% assemble output
Z = [A B C D];
end % radix4bfly()
% radix-4 DIF FFT, input signal must be floating point, real or complex
%
function S = r4fftN(s)
% Initialize variables and signals: length of input signal is a power of 4
N = length(s);
M = log2(N)/2;
% Initialize variables for floating point sim
W=exp(-j*2*pi*(0:N-1)/N);
S = complex(zeros(1,N));
sTemp = complex(zeros(1,N));
% FFT algorithm
% Calculate butterflies for first M-1 stages
sTemp = s;
for stage = 0:M-2
for n=1:N/4
S((1:4)+(n-1)*4) = radix4bfly(sTemp(n:N/4:end), floor((n-1)/(4^stage)) *(4^stage), 1, W);
end
sTemp = S;
end
% Calculate butterflies for last stage
for n=1:N/4
S((1:4)+(n-1)*4) = radix4bfly(sTemp(n:N/4:end), floor((n-1)/(4^stage)) * (4^stage), 0, W);
end
sTemp = S;
% Rescale the final output
S = S*N;
end % r4fftN(s)
% test FFT code
%
xp = 2;
% ERROR if I uncomment loop!
%for xp=1:8
N = 4^xp; % must be power of: 4 16 64 256 1024 4096 ....
x = 2*pi/N * (0:N-1);
x = cos(x);
Y_ref = fft(x);
Y = r4fftN(x);
Y = digitrevorder(Y,4);
%Y = bitrevorder(Y,4);
abs(sum(Y_ref-Y)) % compare fft4 to built-in fft
%end

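The problem was the loop bound: the exponent xp must start from 2, because the fft4 code assumes at least 2 stages of radix-4 butterflies. With xp = 1 we get M = 1, so the loop for stage = 0:M-2 never executes and stage is still undefined when the last-stage loop uses it.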
Sorry folks :(
-David
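The failure for xp = 1 can be reproduced in isolation. The snippet below is only a sketch of the loop-range arithmetic from r4fftN, not the full FFT; the names stages and num_iterations are introduced here for illustration.

```matlab
xp = 1;
N = 4^xp;
M = log2(N) / 2;                  % number of radix-4 stages, here 1
stages = 0:M-2;                   % 0:-1 is an empty range
num_iterations = numel(stages);   % 0, so the loop body never assigns stage
disp(num_iterations)
```

Starting the test loop at xp = 2 gives M = 2, so the range 0:M-2 contains one element and stage is defined before the last-stage loop runs.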

Please find below a fully worked Matlab implementation of a radix-4 Decimation In Frequency FFT algorithm. I have also provided an overall operation count in terms of complex multiplications and additions. It can indeed be shown that each radix-4 butterfly involves 3 complex multiplications and 8 complex additions. Since there are log_4(N) = log_2(N) / 2 stages and each stage involves N / 4 butterflies, the operation count is
complex multiplications = (3 / 8) * N * log2(N)
complex additions = N * log2(N)
Here is the code:
% --- Radix-4 Decimation In Frequency - Iterative approach
clear all
close all
clc
% --- N should be a power of 4
N = 1024;
% x = randn(1, N);
x = zeros(1, N);
x(1 : 10) = 1;
xoriginal = x;
xhat = zeros(1, N);
numStages = log2(N) / 2;
W = exp(-1i * 2 * pi * (0 : N - 1) / N);
omegaa = exp(-1i * 2 * pi / N);
mulCount = 0;
sumCount = 0;
M = N / 4;
for p = 1 : numStages;
for index = 0 : (N / (4^(p - 1))) : (N - 1);
for n = 0 : M - 1;
a = x(n + index + 1) + x(n + index + M + 1) + x(n + index + 2 * M + 1) + x(n + index + 3 * M + 1);
b = (x(n + index + 1) - x(n + index + M + 1) + x(n + index + 2 * M + 1) - x(n + index + 3 * M + 1)) .* omegaa^(2 * (4^(p - 1) * n));
c = (x(n + index + 1) - 1i * x(n + index + M + 1) - x(n + index + 2 * M + 1) + 1i * x(n + index + 3 * M + 1)) .* omegaa^(1 * (4^(p - 1) * n));
d = (x(n + index + 1) + 1i * x(n + index + M + 1) - x(n + index + 2 * M + 1) - 1i * x(n + index + 3 * M + 1)) .* omegaa^(3 * (4^(p - 1) * n));
x(n + 1 + index) = a;
x(n + M + 1 + index) = b;
x(n + 2 * M + 1 + index) = c;
x(n + 3 * M + 1 + index) = d;
mulCount = mulCount + 3;
sumCount = sumCount + 8;
end;
end;
M = M / 4;
end
xhat = bitrevorder(x);
tic
xhatcheck = fft(xoriginal);
timeFFTW = toc;
rms = 100 * sqrt(sum(sum(abs(xhat - xhatcheck).^2)) / sum(sum(abs(xhat).^2)));
fprintf('Theoretical multiplications count \t = %i; \t Actual multiplications count \t = %i\n', ...
(3 / 8) * N * log2(N), mulCount);
fprintf('Theoretical additions count \t\t = %i; \t Actual additions count \t\t = %i\n\n', ...
N * log2(N), sumCount);
fprintf('Root mean square with FFTW implementation = %.10e\n', rms);

Related

Matlab: resize with custom interpolation kernel Mitchell-Netravali

I have seen that there was interest in custom interpolation kernels for resizing (MATLAB imresize with a custom interpolation kernel). Has anyone implemented the parametric Mitchell-Netravali kernel [1] that is used as the default in ImageMagick, and would you be willing to share the Matlab code? Thank you very much!
[1] http://developer.download.nvidia.com/books/HTML/gpugems/gpugems_ch24.html
// Mitchell Netravali Reconstruction Filter
// B = 0 C = 0 - Hermite B-Spline interpolator
// B = 1, C = 0 - cubic B-spline
// B = 0, C = 1/2 - Catmull-Rom spline
// B = 1/3, C = 1/3 - recommended
float MitchellNetravali(float x, float B, float C)
{
float ax = fabs(x);
if (ax < 1) {
return ((12 - 9 * B - 6 * C) * ax * ax * ax +
(-18 + 12 * B + 6 * C) * ax * ax + (6 - 2 * B)) / 6;
} else if ((ax >= 1) && (ax < 2)) {
return ((-B - 6 * C) * ax * ax * ax +
(6 * B + 30 * C) * ax * ax + (-12 * B - 48 * C) *
ax + (8 * B + 24 * C)) / 6;
} else {
return 0;
}
}
Here I got another approach with vectorization; according to my tests with upscaling (1000x1000 -> 3000x3000) this is faster than the standard bicubic even with a large Mitchell radius = 6:
function [outputs] = Mitchell_vect(x,M_B,M_C)
outputs= zeros(size(x,1),size(x,2));
ax = abs(x);
temp = ((12-9*M_B-6*M_C) .* ax.^3 + (-18+12*M_B+6*M_C) .* ax.^2 + (6-2*M_B))./6;
temp2 = ((-M_B-6*M_C) .* ax.^3 + (6*M_B+30*M_C) .* ax.^2 + (-12*M_B-48*M_C) .* ax + (8*M_B + 24*M_C))./6;
index = find(ax<1);
outputs(index)=temp(index);
index = find(ax>=1 & ax<2);
outputs(index)=temp2(index);
end
I got the following proposal for the Mitchell kernel, called by imresize with the parameters B and C and a kernel radius, using for-loops (and preallocation):
img_resize = imresize(img, [h w], {@(x)Mitchell(x,B,C),radius});
function [outputs] = Mitchell(x,B,C)
outputs= zeros(size(x,1),size(x,2));
for i = 1 : size(x,1)
for j = 1 : size(x,2)
ax = abs(x(i,j));
if ax < 1
outputs(i,j) = ((12-9*B-6*C) * ax^3 + (-18+12*B+6*C) * ax^2 + (6-2*B))/6;
elseif (ax >= 1) && (ax < 2)
outputs(i,j) = ((-B-6*C) * ax^3 + (6*B+30*C) * ax^2 + (-12*B-48*C) * ax + (8*B + 24*C))/6;
else
outputs(i,j) = 0;
end
end
end
end
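A quick sanity check for the kernel formulas above: for any offset between two samples, the four weights that fall inside the support |x| < 2 should add up to 1. The sketch below restates the two polynomial pieces as anonymous functions; the names near_piece, far_piece, kernel, t and w are introduced here for illustration, and B = C = 1/3 is the "recommended" setting from the C comment.

```matlab
B = 1/3; C = 1/3;
% Polynomial for |x| < 1
near_piece = @(ax) ((12-9*B-6*C)*ax.^3 + (-18+12*B+6*C)*ax.^2 + (6-2*B)) / 6;
% Polynomial for 1 <= |x| < 2
far_piece  = @(ax) ((-B-6*C)*ax.^3 + (6*B+30*C)*ax.^2 + (-12*B-48*C)*ax + (8*B+24*C)) / 6;
% Logical masks select the piece; everything else stays 0
kernel = @(x) (abs(x) < 1) .* near_piece(abs(x)) + ...
              (abs(x) >= 1 & abs(x) < 2) .* far_piece(abs(x));

t = 0.3;                   % position between sample 0 and sample 1
w = kernel(t - (-1:2));    % weights of the four neighbouring samples
fprintf('sum of weights = %.15f\n', sum(w));
```

If the sum drifts away from 1 after editing the coefficients, the resized image will get brighter or darker, so this is a cheap check to run before passing the kernel to imresize.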

ind2sub for nonzero elements of triangular matrix

I want to find the (row, col) index of the minimum element of a matrix A. I can use
[minval, imin] = min( A(:) )
and the MATLAB built-in function
[irow, icol] = ind2sub(size(A), imin);
But for efficiency reasons, since the matrix A is triangular, I wanted to implement the following function
function [i1, i2] = myind2ind(ii, N);
k = 1;
for i = 1:N
for j = i+1:N
I(k, 1) = i; I(k, 2) = j;
k = k + 1;
end
end
i1 = I(ii, 1);
i2 = I(ii, 2);
this function returns 8 and 31 for the following input
[irow, icol] = myind2ind(212, 31); % irow=8, icol = 31
How can I implement the myind2ind function more efficiently, without building the internal matrix "I"?
The I matrix can be generated by nchoosek.
For example if N = 5 we have:
N =5
I= nchoosek(1:N,2)
ans =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
so that
4 repeated 1 times
3 repeated 2 times
2 repeated 3 times
1 repeated 4 times
We can get the number of rows of I with the Gauss formula for triangular number
(N-1) * (N-1+1) /2 =
N * (N -1) / 2 =
10
Let jj = size(I,1) + 1 - ii be the row index into I counted from the end of I. Using N * (N -1) / 2 we can formulate a quadratic equation:
N * (N -1) / 2 = jj
(N^2 -N)/2 =jj
So
N^2 -N - 2*jj = 0
Its positive root is (1 + sqrt(1 + 8*jj))/2. Dropping the 1 under the square root keeps the floor taken below on the correct row when jj is exactly a triangular number:
r = (1+sqrt(8*jj))/2
r can be rounded down and subtracted from N to get the first element (row number of the triangular matrix) of the desired output.
R = N - floor(r);
For the column number we find idx_first, the index (counted from the end of I, like jj) of the first element of the current row R:
idx_first=(floor(r+1) .* floor(r)) /2;
The column number can be found by subtracting the current index jj from idx_first and adding R + 1 to it.
Here is the implemented function:
function [R , C] = myind2ind(ii, N)
jj = N * (N - 1) / 2 + 1 - ii;
r = (1 + sqrt(8 * jj)) / 2;
R = N -floor(r);
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first-jj + R + 1;
end
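The closed-form version can be checked against the nchoosek table for every linear index at once. This is a small sketch that inlines the body of myind2ind in vectorized form; N = 5 is just the example size used above.

```matlab
N = 5;
ii = (1:N*(N-1)/2)';                 % every linear index into I
jj = N * (N - 1) / 2 + 1 - ii;       % index counted from the end
r = (1 + sqrt(8 * jj)) / 2;
R = N - floor(r);                    % row numbers
idx_first = (floor(r + 1) .* floor(r)) / 2;
C = idx_first - jj + R + 1;          % column numbers
I = nchoosek(1:N, 2);                % reference table
all_match = isequal([R C], I);
disp(all_match)
```

The same lines work for the N = 31 case from the question, where index 212 maps to row 8 and column 31.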

Finding the minimum of a function over an interval

Upon request by Martin here is the basic problem. There is a function M(x) which is supposed to be minimized over the interval [lb, ub].
M = @(x) (a_1 * x + b_1) * (log((a_1 * x + b_1)/P_1) + X_u)...
+ (a_2 * x + b_2) * (log((a_2 * x + b_2)/P_2) + X_m)...
+ x * (log(x / P_3) + X_d);
lb = max(0, -b_1 / a_1);
ub = -b_2 / a_2;
where the inputs are:
P_1 = 0.6;
P_2 = 0.2;
P_3 = 0.2;
a_1 = 0.7071;
a_2 = -1.7071;
b_1 = 0.0245;
b_2 = 0.9755;
X_u = 44;
X_m = 2.9949;
X_d = 0;
The other option would be to solve for the root of the equation m_dash:
m_dash = @(x) log(((a_1 .* x + b_1).^a_1) .* ((a_2 .* x + b_2).^a_2) .* x)...
- log((P_1.^a_1) .* (P_2.^a_2) .* P_3) + a_1 .* X_u + a_2 .* X_m + X_d;
Any help would be greatly appreciated.
If you want to minimize a function over a certain interval, you can use the fminbnd function, which is part of base MATLAB (no toolbox is required). Alternatively, you can coerce the built-in function fminsearch to only return results from the interval:
rlv = 1e12; % ridiculously large value
M_hacked= @(x) rlv*((x < lb) + (x > ub)) + M(x);
x_min = fminsearch(M_hacked, (lb + ub)/2)
I introduced a new function, M_hacked, which returns ridiculously large values for x outside of the interval.
This is not the most elegant solution, but it should do for your problem.
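For comparison, here is a minimal sketch of the fminbnd route using the inputs from the question. It assumes M is only evaluated for scalar x strictly inside the interval, which is how fminbnd samples the function; fminbnd returns a local minimum within its default tolerances, so treat the result as approximate.

```matlab
P_1 = 0.6; P_2 = 0.2; P_3 = 0.2;
a_1 = 0.7071; a_2 = -1.7071;
b_1 = 0.0245; b_2 = 0.9755;
X_u = 44; X_m = 2.9949; X_d = 0;

M = @(x) (a_1 * x + b_1) * (log((a_1 * x + b_1)/P_1) + X_u) ...
       + (a_2 * x + b_2) * (log((a_2 * x + b_2)/P_2) + X_m) ...
       + x * (log(x / P_3) + X_d);

lb = max(0, -b_1 / a_1);
ub = -b_2 / a_2;

% Bounded scalar minimization on [lb, ub]
[x_min, M_min] = fminbnd(M, lb, ub);
fprintf('x_min = %g, M(x_min) = %g\n', x_min, M_min);
```

No penalty term is needed here, because the search never leaves the interval and the logarithms therefore stay real.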

Matrix calculation gets slower after each iteration in matlab

I have a 1024*1024*51 matrix. Inside for loops I do calculations that change some values of the matrix on each iteration. I find that the computation gets slower and slower, and finally my computer runs into trouble. However, the size of the matrix doesn't change. Can anyone shed some light on this problem?
function ActiveContours3D(method,grad,im,mu,nu,lambda1,lambda2,TimeSteps)
epsilon = 10e-10;
tic
fid=fopen('Chr18_z_25of25tiles-C=0_c0_n000.raw','rb','ieee-le');
Xdim = 1024;
Ydim = 1024;
Zdim = 51;
A = fread(fid,[Xdim Ydim*Zdim],'int16');
A = double(A);
size_of_A = size(A)
for(i=1:Zdim)
u0_color(:,:,i) = A(1 : Xdim , (i-1)*Ydim+1 : i*Ydim);
end
fclose(fid)
time = toc
[M,N,P,color] = size(u0_color);
size(u0_color );
u0_color = double(u0_color); % Convert u0_color values to double;
u0 = u0_color(:,:,:,1); % Define the Grayscale volumetric image.
u0_color = uint8(u0_color); % Necessary for color visualization
x = 1:M;
y = 1:N;
z = 1:P;
dx = 1
dy = 1
dz = 1
dim_approx = 2*M*N*P / sqrt(M*N*P);
if(method == 'Explicit')
dt = 0.9 / ((2*mu/(dx^2)) + (2*mu/(dy^2)) + (2*mu/(dz^2))) % 90% CFL
elseif(method == 'Implicit')
dt = (10e7) * 0.9 / ((2*mu/(dx^2)) + (2*mu/(dy^2)) + (2*mu/(dz^2)))
end
[X,Y,Z] = meshgrid(x,y,z);
x0 = (M+1)/2;
y0 = (N+1)/2;
z0 = (P+1)/2;
r0 = min(min(M,N),P)/3;
phi = sqrt((X-x0).^2 + (Y-y0).^2 + (Z-z0).^2) - r0;
phi_visualize = phi; % Use this for visualization in 3D
phi = permute(phi,[2,1,3]); % Use this for computations in 3D
write_to_binary_file(phi_visualize,0); % record initial conditions
tic
for(n=1:TimeSteps)
n
c1 = C1_3d(u0,phi);
c2 = C2_3d(u0,phi);
% x
phi_xp = [phi(2:M,:,:); phi(M,:,:)]; % vertical concatenation
phi_xm = [phi(1,:,:); phi(1:M-1,:,:)]; % (since x values are rows)
% cat(1,A,B) is the same as [A;B]
Dx_m = (phi - phi_xm)/dx; % first derivatives
Dx_p = (phi_xp - phi)/dx;
Dxx = (Dx_p - Dx_m)/dx; % second derivative
% y
phi_yp = [phi(:,2:N,:) phi(:,N,:)]; % horizontal concatenation
phi_ym = [phi(:,1,:) phi(:,1:N-1,:)]; % (since y values are columns)
% cat(2,A,B) is the same as [A,B]
Dy_m = (phi - phi_ym)/dy;
Dy_p = (phi_yp - phi)/dy;
Dyy = (Dy_p - Dy_m)/dy;
% z
phi_zp = cat(3,phi(:,:,2:P),phi(:,:,P));
phi_zm = cat(3,phi(:,:,1) ,phi(:,:,1:P-1));
Dz_m = (phi - phi_zm)/dz;
Dz_p = (phi_zp - phi)/dz;
Dzz = (Dz_p - Dz_m)/dz;
% x,y,z
Dx_0 = (phi_xp - phi_xm) / (2*dx);
Dy_0 = (phi_yp - phi_ym) / (2*dy);
Dz_0 = (phi_zp - phi_zm) / (2*dz);
phi_xp_yp = [phi_xp(:,2:N,:) phi_xp(:,N,:)];
phi_xp_ym = [phi_xp(:,1,:) phi_xp(:,1:N-1,:)];
phi_xm_yp = [phi_xm(:,2:N,:) phi_xm(:,N,:)];
phi_xm_ym = [phi_xm(:,1,:) phi_xm(:,1:N-1,:)];
phi_xp_zp = cat(3,phi_xp(:,:,2:P),phi_xp(:,:,P));
phi_xp_zm = cat(3,phi_xp(:,:,1) ,phi_xp(:,:,1:P-1));
phi_xm_zp = cat(3,phi_xm(:,:,2:P),phi_xm(:,:,P));
phi_xm_zm = cat(3,phi_xm(:,:,1) ,phi_xm(:,:,1:P-1));
phi_yp_zp = cat(3,phi_yp(:,:,2:P),phi_yp(:,:,P));
phi_yp_zm = cat(3,phi_yp(:,:,1) ,phi_yp(:,:,1:P-1));
phi_ym_zp = cat(3,phi_ym(:,:,2:P),phi_ym(:,:,P));
phi_ym_zm = cat(3,phi_ym(:,:,1) ,phi_ym(:,:,1:P-1));
if(grad == 'Dirac')
Grad = DiracDelta(phi); % Dirac delta
%Grad = 1;
elseif(grad == 'Grad ')
Grad = (((Dx_0.^2)+(Dy_0.^2)+(Dz_0.^2)).^(1/2)); % |grad phi|
end
if(method == 'Explicit')
% CURVATURE: *mu*k|grad phi|* (central differences):
K = zeros(M,N,P);
Dxy = (phi_xp_yp - phi_xp_ym - phi_xm_yp + phi_xm_ym) / (4*dx*dy);
Dxz = (phi_xp_zp - phi_xp_zm - phi_xm_zp + phi_xm_zm) / (4*dx*dz);
Dyz = (phi_yp_zp - phi_yp_zm - phi_ym_zp + phi_ym_zm) / (4*dy*dz);
K = ( (Dx_0.^2).*Dyy - 2*Dx_0.*Dy_0.*Dxy + (Dy_0.^2).*Dxx ...
+ (Dx_0.^2).*Dzz - 2*Dx_0.*Dz_0.*Dxz + (Dz_0.^2).*Dxx ...
+ (Dy_0.^2).*Dzz - 2*Dy_0.*Dz_0.*Dyz + (Dz_0.^2).*Dyy) ./ ((Dx_0.^2 + Dy_0.^2 + Dz_0.^2).^(3/2) + epsilon);
phi_temp = phi + dt * Grad .* ( mu.*K + lambda1*(u0 - c1).^2 - lambda2*(u0 - c2).^2 );
elseif(method == 'Implicit')
C1x = 1 ./ sqrt(Dx_p.^2 + Dy_0.^2 + Dz_0.^2 + (10e-7)^2);
C2x = 1 ./ sqrt(Dx_m.^2 + Dy_0.^2 + Dz_0.^2 + (10e-7)^2);
C3y = 1 ./ sqrt(Dx_0.^2 + Dy_p.^2 + Dz_0.^2 + (10e-7)^2);
C4y = 1 ./ sqrt(Dx_0.^2 + Dy_m.^2 + Dz_0.^2 + (10e-7)^2);
C5z = 1 ./ sqrt(Dx_0.^2 + Dy_0.^2 + Dz_p.^2 + (10e-7)^2);
C6z = 1 ./ sqrt(Dx_0.^2 + Dy_0.^2 + Dz_m.^2 + (10e-7)^2);
% m = (dt/(dx*dy)) * Grad .* mu; % 2D
m = (dt/(dx*dy)) * Grad .* mu;
C = 1 + m.*(C1x + C2x + C3y + C4y + C5z + C6z);
C1x_2x = C1x.*phi_xp + C2x.*phi_xm;
C3y_4y = C3y.*phi_yp + C4y.*phi_ym;
C5z_6z = C5z.*phi_zp + C6z.*phi_zm;
phi_temp = (1 ./ C) .* ( phi + m.*(C1x_2x+C3y_4y) + (dt*Grad).*(lambda1*(u0 - c1).^2) - (dt*Grad).*(lambda2*(u0 - c2).^2) );
end
phi = phi_temp;
phi_visualize = permute(phi,[2,1,3]);
write_to_binary_file(phi_visualize,n); % record
end
time = toc
n = n
T = dt*n
clear
clear all
MATLAB stores every variable as an array in RAM. When you work with many large multidimensional variables, each one takes up memory, so in code that runs for many iterations it is better to clear variables from memory once they are no longer needed. To do so, use the command
clear variable_name_1 variable_name_2 variable_name_3
Keeping all the variables makes the code look organised, but when you face such issues, try clearing the unneeded variables.
Check this link to use clear command in detail: http://www.mathworks.in/help/matlab/ref/clear.html
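To get a feel for the numbers, the sketch below estimates the size of one full-size double array from the question and shows clear on two small stand-in temporaries. The names tmp_a, tmp_b and result are hypothetical, and the tiny array sizes are chosen only so the example runs instantly.

```matlab
% One 1024 x 1024 x 51 array of doubles (8 bytes per element)
bytes_per_array = 1024 * 1024 * 51 * 8;
mib_per_array = bytes_per_array / 2^20;
fprintf('%.0f MiB per full-size array\n', mib_per_array);

% Small stand-ins for two of the loop temporaries
tmp_a = zeros(4, 4, 2);
tmp_b = tmp_a + 1;
result = tmp_a + tmp_b;

% Release the temporaries once they are no longer needed
clear tmp_a tmp_b
still_there = exist('tmp_a', 'var');   % 0 means the variable is gone
disp(still_there)
```

Since the loop in the question keeps dozens of full-size temporaries (phi_xp, phi_xm, Dx_m, ...) alive at once, the total can exceed physical RAM, at which point the system starts swapping and every iteration slows down.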

How would you vectorize this nested loop in matlab/octave?

I am stuck at vectorizing this tricky loop in MATLAB/Octave:
[nr, nc] = size(R);
P = rand(nr, K);
Q = rand(K, nc);
for i = 1:nr
for j = 1:nc
if R(i,j) > 0
eij = R(i,j) - P(i,:)*Q(:,j);
for k = 1:K
P(i,k) = P(i,k) + alpha * (2 * eij * Q(k,j) - beta * P(i,k));
Q(k,j) = Q(k,j) + alpha * (2 * eij * P(i,k) - beta * Q(k,j));
end
end
end
end
The code tries to factorize R into P and Q, approaching the nearest P and Q with an update rule. For example, let R = [3 4 0 1 1; 0 1 0 4 4; 5 4 3 1 0; 0 0 5 4 3; 5 3 0 2 1], K=2, alpha=0.01 and beta=0.015. In my real case, I will use a huge sparse matrix R (that's why I need vectorization), and K remains small (less than 10). The goal of the whole script is to produce a prediction value for every zero element in R, based on the nonzero elements. I got this code from here, originally written in Python.
This looks like one of those cases where not all of the code can be vectorized. Still, you can make it a bit better than it is now.
[nr, nc] = size(R);
P = rand(nr, K);
Q = rand(K, nc);
for i = 1:nr
for j = 1:nc
if R(i,j) > 0
eij = R(i,j) - P(i,:)*Q(:,j);
P(i,:) = P(i,:) + alpha * (2 * eij * Q(:,j)' - beta * P(i,:));
Q(:,j) = Q(:,j) + alpha * (2 * eij * P(i,:)' - beta * Q(:,j));
end
end
end
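Replacing the k loop with row and column updates does not change the result, because iteration k only reads and writes P(i,k) and Q(k,j). The sketch below runs both variants side by side on the example R from the question; the deterministic starting values P0 and Q0 are an assumption made here (instead of rand) so that the two runs are comparable.

```matlab
R = [3 4 0 1 1; 0 1 0 4 4; 5 4 3 1 0; 0 0 5 4 3; 5 3 0 2 1];
K = 2; alpha = 0.01; beta = 0.015;
[nr, nc] = size(R);
P0 = reshape(1:nr*K, nr, K) / 10;   % assumed fixed start, not rand
Q0 = reshape(1:K*nc, K, nc) / 10;

P1 = P0; Q1 = Q0;   % updated element by element (k loop)
P2 = P0; Q2 = Q0;   % updated as whole row / column
for i = 1:nr
    for j = 1:nc
        if R(i,j) > 0
            e1 = R(i,j) - P1(i,:)*Q1(:,j);
            for k = 1:K
                P1(i,k) = P1(i,k) + alpha * (2 * e1 * Q1(k,j) - beta * P1(i,k));
                Q1(k,j) = Q1(k,j) + alpha * (2 * e1 * P1(i,k) - beta * Q1(k,j));
            end
            e2 = R(i,j) - P2(i,:)*Q2(:,j);
            P2(i,:) = P2(i,:) + alpha * (2 * e2 * Q2(:,j)' - beta * P2(i,:));
            Q2(:,j) = Q2(:,j) + alpha * (2 * e2 * P2(i,:)' - beta * Q2(:,j));
        end
    end
end
max_diff = max([abs(P1(:) - P2(:)); abs(Q1(:) - Q2(:))]);
fprintf('largest difference between the two variants: %g\n', max_diff);
```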
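Since the operations on P and Q are serial in nature (iterative updates) I do not think you can do much better. You can save the if in the loop. Note that find returns the nonzeros in column-major order, so the updates are applied in a different order than in the nested loops:
[nr, nc] = size(R);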
P = rand(nr, K);
Q = rand(K, nc);
[nzi nzj] = find( R > 0 );
for ii=1:numel(nzi)
i = nzi(ii);
j = nzj(ii);
eij = R(i,j) - P(i,:)*Q(:,j);
P(i,:) = P(i,:) + alpha * (2 * eij * Q(:,j)' - beta * P(i,:));
Q(:,j) = Q(:,j) + alpha * (2 * eij * P(i,:)' - beta * Q(:,j));
end
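If the update order of the original nested loops (row by row) matters, one option is to call find on the transpose and swap the outputs. This is a small sketch on a hypothetical 2-by-3 matrix Rdemo, not on the real data.

```matlab
Rdemo = [3 0 1; 2 4 0];            % hypothetical example matrix

% Column-major order, as returned by find on the matrix itself
[ci, cj] = find(Rdemo > 0);
col_major = [ci cj];

% Row-major order: search the transpose and swap the roles of the outputs
[rj, ri] = find(Rdemo' > 0);
row_major = [ri rj];

disp(col_major)
disp(row_major)
```

Looping over ri and rj then visits the nonzeros in the same sequence as the for i / for j loops.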