Nested for loops in MATLAB GPU programming

I want to run this specific nested for loop on the GPU using MATLAB. Can anybody help me?
Phi = rand(100,100); FluxD = rand(100,100); FluxC = rand(100,100);
Ima = 100;
Jma = 100;
for i = 1:Ima-1
    for j = 1:Jma-1
        Phi(i,j) = Phi(i,j) + dt*(FluxD(i,j) - FluxC(i,j));
    end
end

You need to do two things here - firstly, build your data on the GPU, and then for best performance, operate on it in a vectorised manner, like this:
% Build input data arrays directly on the GPU
Phi = rand(100, 'gpuArray');
FluxD = rand(100, 'gpuArray');
FluxC = rand(100, 'gpuArray');
Ima = 100;
Jma = 100;
dt = 0.1;  % dt is not defined in the question - substitute your actual timestep
% For convenience, make index vectors for i and j
ii = 1:Ima-1;
jj = 1:Jma-1;
% Compute Phi in a vectorised manner
Phi(ii, jj) = Phi(ii, jj) + dt * (FluxD(ii,jj) - FluxC(ii,jj));
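If you need the result back in host memory afterwards (for plotting, saving, etc.), a minimal follow-up sketch:
Phi_host = gather(Phi);  % copy the gpuArray result back to CPU memory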

Optimization of column shifting of large matrices (Circshift, etc.)

I am currently looking for the most efficient way to shift and rearrange large matrices. Essentially, I have data with some parabolic shift that needs to be corrected in order to shift the "signal" to a linear event.
I have tried the following solutions and timed them. Is there any other method that may prove to be more efficient?
DATA = ones(100000,501);
DATA(10000,251) = 100;
for i=1:250
    DATA(10000+i^2-1000:10000+i^2+1000,251-i) = 100;
    DATA(10000+i^2-1000:10000+i^2+1000,251+i) = 100;
end
k = abs(-250:1:250).^2;
d = size(DATA,1);
figure(99)
imagesc(DATA)
t_INDEX = timeit(@()fun_INDEX(DATA,k))
t_SNIPPET = timeit(@()fun_SNIPPET(DATA,k))
t_CIRCSHIFT = timeit(@()fun_CIRCSHIFT(DATA,k))
t_INDEX_clean = timeit(@()fun_INDEX_clean(DATA,k))
t_SPARSE = timeit(@()fun_SPARSE(DATA,k))
t_BSXFUN = timeit(@()fun_BSXFUN(DATA,k))
function fun_INDEX(DATA,k)
    DATA_1 = zeros(size(DATA));
    for i=1:size(DATA,2)
        DATA_1(:,i) = DATA([k(i)+1:end 1:k(i)],i);
    end
    figure(1)
    imagesc(DATA_1)
end

function fun_SNIPPET(DATA,k)
    kmax = max(k);
    DATA_2 = zeros(size(DATA,1)-kmax,size(DATA,2));
    for i=1:size(DATA,2)
        DATA_2(:,i) = DATA(k(i)+1:end-kmax+k(i),i);
    end
    figure(2)
    imagesc(DATA_2)
end

function fun_CIRCSHIFT(DATA,k)
    DATA_3 = zeros(size(DATA));
    for i=1:size(DATA,2)
        DATA_3(:,i) = circshift(DATA(:,i),-k(i),1);
    end
    figure(3)
    imagesc(DATA_3)
end

function fun_INDEX_clean(DATA,k)
    [m, n] = size(DATA);
    k = size(DATA,1)-k;
    DATA_4 = zeros(m, n);
    for i = (1 : n)
        DATA_4(:, i) = [DATA((m - k(i) + 1 : m), i); DATA((1 : m - k(i) ), i)];
    end
    figure(4)
    imagesc(DATA_4)
end

function fun_SPARSE(DATA,k)
    [m,n] = size(DATA);
    k = -k;
    S = full(sparse(mod(k,m)+1,1:n,1,m,n));
    DATA_5 = ifft(fft(DATA).*fft(S),'symmetric');
    figure(5)
    imagesc(DATA_5)
end

function fun_BSXFUN(DATA,k)
    DATA = DATA';
    k = -k;
    [m,n] = size(DATA);
    idx0 = mod(bsxfun(@plus,n-k(:),1:n)-1,n);
    DATA_6 = DATA(bsxfun(@plus,(idx0*m),(1:m)'));
    figure(6)
    imagesc(DATA_6)
end
Is there any way to decrease computation time for this kind of problem?
Thanks in advance for any tips!
One option would be to use MATLAB's GPU functions, if your workstation has a GPU. Provided the entire data fits on the GPU at once, it will start to outperform CPU circshift at around 1000-by-1000 matrix size.
The implementation only requires you to copy your data to the GPU with a single statement, and then call circshift on the newly created gpuArray.
A small discussion on its performance can be found here: https://www.mathworks.com/matlabcentral/answers/274619-circshift-slower-on-gpu . In particular, the last post describes a much faster GPU implementation if you don't actually need to shift circularly but can get away with zero padding on one side, which might be relevant here.
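For example, a minimal sketch of that approach applied to the column-wise shift above (DATA and k are the same variables used in the question's loop versions):
DATA_gpu = gpuArray(DATA);                        % single copy of the data to the GPU
DATA_shifted = zeros(size(DATA), 'gpuArray');
for i = 1:size(DATA,2)
    DATA_shifted(:,i) = circshift(DATA_gpu(:,i), -k(i), 1);
end
result = gather(DATA_shifted);                    % bring the result back to host memory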

Storing results from multiple loops and creating a data-frame-like object in MATLAB

I used for loops:
for i=1:length(thetas)
    theta = thetas(i); % Utility function
    for j=1:length(rhos)
        rho = rhos(j);
        for ii=1:length(gammas)
            gamma = gammas(ii);
            [kss] = equilibirum(debt);
        end
    end
end
where in each step I essentially change some parameter values to get different values for the column vector kss (size: 10000x1)
e.g the vector of parameters I am looping over are:
thetas = [1, 1.5];
rhos = [0, 0.99, 2];
gammas = [-1,0,0.76, 0.9, 1] ;
I want to remember (or store) for which combination of parameters I get the values of kss.
How can I do this in MATLAB in a way that is easy to understand and easy to export (e.g. to Excel)? Ideally, the result would look like a data frame object as in Python (pandas) or R.
You can use tables in MATLAB to accomplish what you describe:
kss_table = table;
counter = 1;
for i=1:length(thetas)
    theta = thetas(i); % Utility function
    for j=1:length(rhos)
        rho = rhos(j);
        for ii=1:length(gammas)
            gamma = gammas(ii);
            kss = equilibirum(debt);
            kss_table.Theta(counter) = theta;
            kss_table.Rho(counter) = rho;
            kss_table.Gamma(counter) = gamma;
            counter = counter + 1;
        end
    end
end
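For the export part of the question: a table like this can be written straight to a spreadsheet with writetable (the file name below is just an example):
writetable(kss_table, 'kss_parameters.xlsx');   % one row per parameter combination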

Simple Monte Carlo on GPU using Parallel toolbox

Here is a toy example that I put together exploring the parfor function, using the CPU to speed up execution. Even after reviewing the Parallel Computing Toolbox documentation, however, I am confused about how to upgrade this to run on my GPU (Nvidia 980ti).
Would appreciate any pointers on how to update this code to run on GPU.
Cheers.
% toy example--monte carlo estimation of pi using for loops
tic;
N = 1000000000;
hitcounter = 0;
for i = 1:N
    x = rand;
    y = rand;
    if ( y < sqrt(1-x*x) )
        hitcounter = hitcounter + 1;
    end
end
disp(hitcounter/N*4)
toc;
% toy example--monte carlo estimation of pi using parfor loops
tic;
N = 1000000000;
hitcounter = 0;
parfor i = 1:N
    x = rand;
    y = rand;
    if ( y < sqrt(1-x*x) )
        hitcounter = hitcounter + 1;
    end
end
disp(hitcounter/N*4)
toc;
The main thing you need to do is to vectorise your code - this is always a good idea, and especially so on the GPU. Then, you simply need to build x and y directly on the GPU using the trailing argument to rand.
N = 1000000;
x = rand(1, N, 'gpuArray');
y = rand(1, N, 'gpuArray');
pi_est = sum(y < sqrt(1 - x.*x)) / N * 4;
Note that I scaled back N so that this fits on the GPU. If you want to run with a higher value of N, I would suggest adding an outer loop and performing the computation in "chunks" that fit in the GPU's limited memory, as sketched below.
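A minimal sketch of that chunked approach (the chunk size of 1e6 is only an assumption; tune it to your GPU's memory):
N_total = 1e9;                 % total number of samples
chunk   = 1e6;                 % samples per chunk (assumed; adjust to fit GPU memory)
hits    = 0;
for c = 1:(N_total/chunk)
    x = rand(1, chunk, 'gpuArray');
    y = rand(1, chunk, 'gpuArray');
    hits = hits + gather(sum(y < sqrt(1 - x.*x)));   % accumulate hit count on the host
end
pi_est = hits / N_total * 4;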

Vectorization - Sum and Bessel function

Can anyone help vectorize this MATLAB code? The specific problem is the sum and the Bessel functions with vector inputs.
Thank you!
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
for ii = 1:length(rho_g)
    for jj = 1:length(phi_g)
        % Coordinates
        rho_o = rho_g(ii);
        phi_o = phi_g(jj);
        % factors
        fc = cos(n.*(phi_o-phi_s));
        fs = sin(n.*(phi_o-phi_s));
        Ez_t(ii,jj) = sum(tau.*besselj(n,k(3)*rho_s).*besselh(n,2,k(3)*rho_o).*fc);
    end
end
You could try to vectorize this code, which might be possible with some bsxfun or so, but it would be hard-to-understand code, and it is questionable whether it would run any faster, since your code already uses vector math in the inner loop (even though your vectors only have length 3). The resulting code would become very difficult to read, so you or your colleague will have no idea what it does when you look at it in two years' time.
Before wasting time on vectorization, it is much more important that you learn about loop invariant code motion, which is easy to apply to your code. Some observations:
you do not use fs, so remove that.
the term tau.*besselj(n,k(3)*rho_s) does not depend on any of your loop variables ii and jj, so it is constant. Calculate it once before your loop.
you should probably pre-allocate the matrix Ez_t.
the only terms that change during the loop are fc, which depends on jj, and besselh(n,2,k(3)*rho_o), which depends on ii. I guess that the latter costs much more time to calculate, so it is better not to calculate it N*N times in the inner loop, but only N times in the outer loop. If the calculation based on jj took more time, you could swap the for-loops over ii and jj, but that does not seem to be the case here.
The resulting code would look something like this (untested):
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
% constant part, does not depend on ii and jj, so calculate only once!
temp1 = tau.*besselj(n,k(3)*rho_s);
Ez_t = nan(length(rho_g), length(phi_g)); % preallocate space
for ii = 1:length(rho_g)
    % calculate stuff that depends on ii only
    rho_o = rho_g(ii);
    temp2 = besselh(n,2,k(3)*rho_o);
    for jj = 1:length(phi_g)
        phi_o = phi_g(jj);
        fc = cos(n.*(phi_o-phi_s));
        Ez_t(ii,jj) = sum(temp1.*temp2.*fc);
    end
end
Initialization -
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
Nested loops form (copied from your code and shown here for comparison only) -
for ii = 1:length(rho_g)
    for jj = 1:length(phi_g)
        % Coordinates
        rho_o = rho_g(ii);
        phi_o = phi_g(jj);
        % factors
        fc = cos(n.*(phi_o-phi_s));
        fs = sin(n.*(phi_o-phi_s));
        Ez_t(ii,jj) = sum(tau.*besselj(n,k(3)*rho_s).*besselh(n,2,k(3)*rho_o).*fc);
    end
end
Vectorized solution -
%%// Term - 1
term1 = repmat(tau.*besselj(n,k(3)*rho_s),[N*N 1]);
%%// Term - 2
[n1,rho_g1] = meshgrid(n,rho_g);
term2_intm = besselh(n1,2,k(3)*rho_g1);
term2 = transpose(reshape(repmat(transpose(term2_intm),[N 1]),N,N*N));
%%// Term -3
angle1 = repmat(bsxfun(@times,bsxfun(@minus,phi_g,phi_s')',n),[N 1]);
fc = cos(angle1);
%%// Output
Ez_t = sum(term1.*term2.*fc,2);
Ez_t = transpose(reshape(Ez_t,N,N));
Points to note about this vectorization or code simplification -
'fs' doesn't change the output of the script, Ez_t, so it can be removed for now.
The output is 'Ez_t', which is built from three basic terms in the code:
tau.*besselj(n,k(3)*rho_s), besselh(n,2,k(3)*rho_o) and fc. These are calculated separately for vectorization as terms 1, 2 and 3 respectively.
All three of these terms are of size 1xN inside the loops. Our aim thus becomes to calculate them without loops. The two loops each run N times, giving a total loop count of NxN, so each vectorized term must hold NxN times the data it held inside the nested loops.
This is basically the essence of the vectorization done here, with the three terms represented by 'term1', 'term2' and 'fc' itself.
In order to give a self-contained answer, I'll copy the original initialization
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
and generate some missing data (k(3), plus rho_s and phi_s with the same dimensions as n):
rho_s = rand(size(n));
phi_s = rand(size(n));
k(3) = rand(1);
then you can compute the same Ez_t with multidimensional arrays:
[RHO_G, PHI_G, N] = meshgrid(rho_g, phi_g, n);
[~, ~, TAU] = meshgrid(rho_g, phi_g, tau);
[~, ~, RHO_S] = meshgrid(rho_g, phi_g, rho_s);
[~, ~, PHI_S] = meshgrid(rho_g, phi_g, phi_s);
FC = cos(N.*(PHI_G - PHI_S));
FS = sin(N.*(PHI_G - PHI_S)); % not used
EZ_T = sum(TAU.*besselj(N, k(3)*RHO_S).*besselh(N, 2, k(3)*RHO_G).*FC, 3).';
You can check afterwards that both matrices are the same
norm(Ez_t - EZ_T)

How to obtain the complexity of cosine similarity in MATLAB?

I have implemented cosine similarity in MATLAB like this. In fact, I have a two-dimensional 50-by-50 matrix. To obtain the cosine, should I compare items in a row-by-row fashion like this?
for j = 1:50
    x = dat(j,:);
    for i = j+1:50
        y = dat(i,:);
        c = dot(x,y);
        sim = c/(norm(x,2)*norm(y,2));
    end
end
Is this correct?
And the question is: what is the complexity, in big-O terms, of this approach?
Just a note on an efficient implementation of the same thing using vectorized and matrix-wise operations (which are optimized in MATLAB). This can have huge time savings for large matrices:
dat = randn(50, 50);
OP (double-for) implementation:
sim = zeros(size(dat));
nRow = size(dat,1);
for j = 1:nRow
    x = dat(j, :);
    for i = j+1:nRow
        y = dat(i, :);
        c = dot(x, y);
        sim(j, i) = c/(norm(x,2)*norm(y,2));
    end
end
Vectorized implementation:
normDat = sqrt(sum(dat.^2, 2)); % L2 norm of each row
datNorm = bsxfun(@rdivide, dat, normDat); % normalize each row
dotProd = datNorm*datNorm'; % dot-product vectorized (redundant!)
sim2 = triu(dotProd, 1); % keep unique upper triangular part
Comparisons for a 1000 x 1000 matrix (MATLAB 2013a, x64, Intel Core i7 960 @ 3.20GHz):
Elapsed time is 34.103095 seconds. (double-for implementation)
Elapsed time is 0.075208 seconds. (vectorized implementation)
sum(sum(sim-sim2))
ans =
-1.224314766369880e-14
Better to end the outer loop at 49 (for j = 50 the inner loop is empty anyway). You should probably also add indices to sim:
for j = 1:49
    x = dat(j,:);
    for i = j+1:50
        y = dat(i,:);
        c = dot(x,y);
        sim(j,i) = c/(norm(x,2)*norm(y,2));
    end
end
The complexity should be roughly O(n^2), shouldn't it? There are n(n-1)/2 row pairs, and each pair additionally costs an O(n) dot product and two norms.
Maybe you should have a look at correlation functions ... I don't quite get what you want to compute exactly, but it looks like you want to do something similar. There are built-in correlation functions in MATLAB.
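For reference, a minimal sketch of the built-in route (corrcoef is base MATLAB; pdist with the 'cosine' option requires the Statistics and Machine Learning Toolbox):
% Pearson correlation between the rows of dat (corrcoef works on columns, hence the transpose)
R = corrcoef(dat');

% Cosine similarity between rows: pdist returns cosine *distance*, i.e. 1 - similarity
cosSim = 1 - squareform(pdist(dat, 'cosine'));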