I'm trying to implement the paper 'Salient Object Detection by Composition'. Here is the link: http://research.microsoft.com/en-us/people/yichenw/iccv11_salientobjectdetection.pdf
I have implemented the algorithm, but it takes a long time to run and display the output. The code uses four for loops, which was the only way I could think of to implement it. I searched online for MATLAB code but couldn't find any. Can anyone suggest a faster way to implement the algorithm? In the paper, the authors say that they implemented it in MATLAB and that it runs quickly, so there must be a way to write the code more efficiently.
I would appreciate any hint or code that runs this algorithm efficiently.
clear all
close all
%%instructions to run segment.cpp
%to run this code
%we need an output image
%segment sigma K min input output
%sigma: used for gaussian smoothing of the image
%K: scale of observation; larger K means larger components in segmentation
%min: minimum component size enforced by post processing
%calculating composition cost for each segment
I_org = imread('segment\1.ppm');
I = imread('segment\output1.ppm');
[rows,cols,dims] = size(I);
pixels = zeros(rows*cols,dims);
red_channel = I(:,:,1);
green_channel = I(:,:,2);
blue_channel = I(:,:,3);
[unique_pixels,count_pixels] = countPixels(I);
no_segments = size(count_pixels,1);
area_segments = count_pixels ./ (rows * cols);
appearance_distance = zeros(no_segments,no_segments);
spatial_distance = zeros(no_segments,no_segments);
thresh = multithresh(I_org,11);
thresh_values = [0 thresh];
for i = 1:no_segments
leave_pixel = unique_pixels(i,:);
mask_image = ((I(:,:,1) == leave_pixel(1)) & (I(:,:,2) == leave_pixel(2)) & (I(:,:,3) == leave_pixel(3)));
I_i(:,:,1) = I_org(:,:,1) .* uint8((mask_image));
I_i(:,:,2) = I_org(:,:,2) .* uint8((mask_image));
I_i(:,:,3) = I_org(:,:,3) .* uint8((mask_image));
LAB_trans = makecform('srgb2lab');
I_i_LAB = applycform(I_i,LAB_trans);
L_i_LAB = imhist(I_i_LAB(:,:,1));
A_i_LAB = imhist(I_i_LAB(:,:,2));
B_i_LAB = imhist(I_i_LAB(:,:,3));
for j = i:no_segments
leave_pixel = unique_pixels(j,:);
mask_image = ((I(:,:,1) == leave_pixel(1)) & (I(:,:,2) == leave_pixel(2)) & (I(:,:,3) == leave_pixel(3)));
I_j(:,:,1) = I_org(:,:,1) .* uint8((mask_image));
I_j(:,:,2) = I_org(:,:,2) .* uint8((mask_image));
I_j(:,:,3) = I_org(:,:,3) .* uint8((mask_image));
I_j_LAB = applycform(I_j,LAB_trans);
L_j_LAB = imhist(I_j_LAB(:,:,1));
A_j_LAB = imhist(I_j_LAB(:,:,2));
B_j_LAB = imhist(I_j_LAB(:,:,3));
appearance_distance(i,j) = sum(min(L_i_LAB,L_j_LAB) + min(A_i_LAB,A_j_LAB) + min(B_i_LAB,B_j_LAB));
spatial_distance(i,j) = ModHausdorffDist(I_i,I_j) / max(rows,cols);
end
end
spatial_distance = spatial_distance ./ max(max(spatial_distance));
max_apperance_distance = max(max(appearance_distance));
composition_cost = ((1 - spatial_distance) .* appearance_distance) + (spatial_distance * max_apperance_distance);
%input parameters for computation
window_size = 9; %rows and colums are considered to be same
window = ones(window_size);
additional_elements = (window_size - 1)/2;
I_temp(:,:,1) = [zeros(additional_elements,cols);I(:,:,1);zeros(additional_elements,cols)];
I_new(:,:,1) = [zeros(rows + (window_size - 1),additional_elements) I_temp(:,:,1) zeros(rows + (window_size - 1),additional_elements)];
I_temp(:,:,2) = [zeros(additional_elements,cols);I(:,:,2);zeros(additional_elements,cols)];
I_new(:,:,2) = [zeros(rows + (window_size - 1),additional_elements) I_temp(:,:,2) zeros(rows + (window_size - 1),additional_elements)];
I_temp(:,:,3) = [zeros(additional_elements,cols);I(:,:,3);zeros(additional_elements,cols)];
I_new(:,:,3) = [zeros(rows + (window_size - 1),additional_elements) I_temp(:,:,3) zeros(rows + (window_size - 1),additional_elements)];
cost = zeros(rows,cols);
for i = additional_elements + 1:rows
    for j = additional_elements+1:cols
        % window rows are centred on i, window columns on j
        I_windowed(:,:,1) = I_new(i-additional_elements:i+additional_elements,j-additional_elements:j+additional_elements,1);
        I_windowed(:,:,2) = I_new(i-additional_elements:i+additional_elements,j-additional_elements:j+additional_elements,2);
        I_windowed(:,:,3) = I_new(i-additional_elements:i+additional_elements,j-additional_elements:j+additional_elements,3);
        [unique_pixels_w,count_pixels_w] = countPixels(I_windowed);
        unique_pixels_w = setdiff(unique_pixels_w,[0 0 0],'rows');
        inside_segment = setdiff(unique_pixels,unique_pixels_w,'rows');
        outside_segments = setdiff(unique_pixels,inside_segment,'rows');
        area_segment = count_pixels_w;
        for k = 1:size(inside_segment,1)
            current_segment = inside_segment(k,:);
            cost_curr_seg = sort(composition_cost(ismember(unique_pixels,current_segment,'rows'),:));
            for l = 1:size(cost_curr_seg,2)
                if(ismember(unique_pixels(l,:),outside_segments,'rows') && count_pixels(l) > 0)
                    composed_area = min(area_segment(k),count_pixels(l));
                    cost(i,j) = cost(i,j) + cost_curr_seg(l) * composed_area;
                    area_segment(k) = area_segment(k) - composed_area;
                    count_pixels(l) = count_pixels(l) - composed_area;
                    if area_segment(k) == 0
                        break
                    end
                end
            end
            if area_segment(k) > 0
                cost(i,j) = cost(i,j) + max_apperance_distance * area_segment(k);
            end
        end
    end
end
cost = cost / window_size;
The code for the countPixels function:
function [unique_rows,counts] = countPixels(I)
[rows,cols,dims] = size(I);
pixels_I = zeros(rows*cols,dims);
count = 1;
for i = 1:rows
    for j = 1:cols
        pixels_I(count,:) = reshape(I(i,j,:),[1,3]);
        count = count + 1;
    end
end
[unique_rows,~,ind] = unique(pixels_I,'rows');
counts = histc(ind,unique(ind));
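One inexpensive speed-up is to drop the two pixel loops in countPixels, since this function is called once per window position. The following is only a sketch of that idea: the name countPixelsVec is made up for illustration, and it assumes the image is a rows-by-cols-by-dims array as in the code above.

```matlab
function [unique_rows, counts] = countPixelsVec(I)
% Hypothetical vectorized variant of countPixels (illustrative name).
% Flatten the rows-by-cols-by-dims image into a (rows*cols)-by-dims list of
% colours; the cast to double mirrors the zeros() buffer of the loop version.
pixels_I = reshape(double(I), [], size(I, 3));
% Unique colours plus, for every pixel, the index of its colour.
[unique_rows, ~, ind] = unique(pixels_I, 'rows');
% Number of pixels that map to each unique colour.
counts = accumarray(ind(:), 1);
end
```

The pixel order differs from the loop version (column-major instead of row by row), but neither the set of unique colours nor the counts depend on that order.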


MATLAB 'parfor' Loops Very Slow When Compared With 'for' loop

I have a script that I'm running, and at one point I have a loop over n objects, where I want n to be fairly large.
I have access to a server, so I put in a parfor loop. However, this is incredibly slow compared with a standard for loop.
For example, running a certain configuration ( the one below ) with the parfor loop on 35 workers took 68 seconds, whereas the for loop took 2.3 seconds.
I know there's stuff to do with array-broadcasting that can cause issues, but I don't know a lot about this.
n = 20;
r = 1/30;
X = rand([2,n-1]);
X = [X,[0.5;0.5]];
D = sq_distance(X,X);
A = sparse((D < r) - eye(n));
% Infected set I
I = n;
[S,C] = graphconncomp(A);
compnum = C(I);
I_new = find(C == compnum);
I = I_new;
hold on
hold off
title('time = 0')
time = 0;
t_max = 10; t_int = 1/100;
TIME = 1; T_plot = t_int^(-1) /100;
loops = t_max / T_plot;
F(loops) = struct('cdata',[],'colormap',[]);
F(1) = getframe;
% Probability of healing in interval of length t_int
heal_rate = 1/3; % (higher number is faster heal)
p_heal = t_int * heal_rate;
numhealed = 0;
while time < t_max
    time = time+t_int;
    steps = poissrnd(t_int,[n,1]);
    parfor k = 1:n
        for s = 1:steps(k)
            unit_vec = unif_unitvector;
            X_new = X(:,k) + unit_vec*t_int;
            % accept the move only if both coordinates stay inside (0,1)
            if all(X_new < 1) && all(X_new > 0)
                X(:,k) = X_new;
            end
        end
    end
    D = sq_distance(X,X);
    A = sparse((D < r) - eye(n));
    [S,C] = graphconncomp(A);
    particles_healed = binornd(ones(length(I),1),p_heal);
    still_infected = find(particles_healed == 0);
    I = I(still_infected);
    numhealed = numhealed + sum(particles_healed);
    I_new = I;
    % compnum = zeros(length(I),1);
    for i = 1:length(I)
        compnum = C(I(i));
        I_new = union(I_new,find(C == compnum));
    end
    I = I_new;
    if time >= T_plot*TIME
        hold on
        hold off
        title(sprintf('time = %1g',time))
        % fprintf('number healed = %1g\n',numhealed)
        numhealed = 0;
        F(TIME) = getframe;
        TIME = TIME + 1;
    end
end

MATLAB - Adaptive Step Size Runge-Kutta

I've programmed in MATLAB an adaptive step size RK4 to solve a system of ODEs. The code runs without error, however it does not produce the desired curve when I try to plot x against y. Instead of being a toroidal shape, I simply get a flat line. This is evident from the fact that r is outputting a constant value. After checking the outputs of each line, they are not outputting constants or errors or inf or NaN, rather they are outputting both a real and imaginary component (complex numbers). I have no idea as to why this is occurring and I believe it to be the source of my trouble.
function AdaptRK4()
parsec = 3.08*10^18;
r_1 = 8.5*1000.0*parsec; % in cm
theta_1 = 0.0;
a = 0.5*r_1;
gam = 1;
grav = 6.6720*10^-8;
amsun = 1.989*10^33;
amg = 1.5d11*amsun;
gm = grav*amg;
u_1 = 20.0*10^5;
v = sqrt(gm/r_1);
time = 0.0;
epsilon = 0.00001;
m1 = 0.5;
m2 = 0.5;
m3 = 0.5;
i = 1;
nsteps = 50000;
deltat = 5.0*10^12;
angmom = r_1*v;
angmom2 = angmom^2.0;
e = -2*10^5.0*gm/r_1+u_1*u_1/2.0+angmom2/(2.0*r_1*r_1);
for i=1:nsteps
deltat = min(deltat,nsteps-time);
fk3_1 = deltat*u_1;
fk4_1 = deltat*(-gm*r_1*r_1^(-gam)/(a+r_1)^(3- gam)+angmom2/(r_1^3.0));
fk5_1 = deltat*(angmom/(r_1^2.0));
r_2 = r_1+fk3_1/4.0;
u_2 = u_1+fk4_1/4.0;
theta_2 = theta_1+fk5_1/4.0;
fk3_2 = deltat*u_2;
fk4_2 = deltat*(-gm*r_2*r_2^(-gam)/(a+r_2)^(3-gam)+angmom2/(r_2^3.0));
fk5_2 = deltat*(angmom/(r_2^2.0));
r_3 = r_1+(3/32)*fk3_1 + (9/32)*fk3_2;
u_3 = u_1+(3/32)*fk4_1 + (9/32)*fk4_2;
theta_3 = theta_1+(3/32)*fk5_1 + (9/32)*fk5_2;
fk3_3 = deltat*u_3;
fk4_3 = deltat*(-gm*r_3*r_3^(-gam)/(a+r_3)^(3-gam)+angmom2/(r_3^3.0));
fk5_3 = deltat*(angmom/(r_3^2.0));
r_4 = r_1+(1932/2197)*fk3_1 - (7200/2197)*fk3_2 + (7296/2197)*fk3_3;
u_4 = u_1+(1932/2197)*fk4_1 - (7200/2197)*fk4_2 + (7296/2197)*fk4_3;
theta_4 = theta_1+(1932/2197)*fk5_1 - (7200/2197)*fk5_2 + (7296/2197)*fk5_3;
fk3_4 = deltat*u_4;
fk4_4 = deltat*(-gm*r_4*r_4^(-gam)/(a+r_4)^(3-gam)+angmom2/(r_4^3.0));
fk5_4 = deltat*(angmom/(r_4^2.0));
r_5 = r_1+(439/216)*fk3_1 - 8*fk3_2 + (3680/513)*fk3_3 - (845/4104)*fk3_4;
u_5 = u_1+(439/216)*fk4_1 - 8*fk4_2 + (3680/513)*fk4_3 - (845/4104)*fk4_4;
theta_5 = theta_1+(439/216)*fk5_1 - 8*fk5_2 + (3680/513)*fk5_3 - (845/4104)*fk5_4;
fk3_5 = deltat*u_5;
fk4_5 = deltat*(-gm*r_5*r_5^(-gam)/(a+r_5)^(3-gam)+angmom2/(r_5^3.0));
fk5_5 = deltat*(angmom/(r_5^2.0));
r_6 = r_1-(8/27)*fk3_1 + 2*fk3_2 - (3544/2565)*fk3_3 + (1859/4104)*fk3_4-(11/40)*fk3_5;
u_6 = u_1-(8/27)*fk4_1 + 2*fk4_2 - (3544/2565)*fk4_3 + (1859/4104)*fk4_4-(11/40)*fk4_5;
theta_6 = theta_1-(8/27)*fk5_1 + 2*fk5_2 - (3544/2565)*fk5_3 + (1859/4104)*fk5_4-(11/40)*fk5_5;
fk3_6 = deltat*u_6;
fk4_6 = deltat*(-gm*r_6*r_6^(-gam)/(a+r_6)^(3-gam)+angmom2/(r_6^3.0));
fk5_6 = deltat*(angmom/(r_6^2.0));
fm3_1 = m1 + 25*fk3_1/216+1408*fk3_3/2565+2197*fk3_4/4104-fk3_5/5;
fm4_1 = m2 + 25*fk4_1/216+1408*fk4_3/2565+2197*fk4_4/4104-fk4_5/5;
fm5_1 = m3 + 25*fk5_1/216+1408*fk5_3/2565+2197*fk5_4/4104-fk5_5/5;
fm3_2 = m1 + 16*fk3_1/135+6656*fk3_3/12825+28561*fk3_4/56430-9*fk3_5/50+2*fk3_6/55;
fm4_2 = m2 + 16*fk4_1/135+6656*fk4_3/12825+28561*fk4_4/56430-9*fk4_5/50+2*fk4_6/55;
fm5_2 = m3 + 16*fk5_1/135+6656*fk5_3/12825+28561*fk5_4/56430-9*fk5_5/50+2*fk5_6/55;
R3 = abs(fm3_1-fm3_2)/deltat;
R4 = abs(fm4_1-fm4_2)/deltat;
R5 = abs(fm5_1-fm5_2)/deltat;
err3 = 0.84*(epsilon/R3)^(1/4);
err4 = 0.84*(epsilon/R4)^(1/4);
err5 = 0.84*(epsilon/R5)^(1/4);
if R3<= epsilon
time = time+deltat;
fm3 = fm3_1;
i = i+1;
deltat = err3*deltat;
if R4<= epsilon
time = time+deltat;
fm4 = fm4_1;
i = i+1;
deltat = err4*deltat;
if R5<= epsilon
time = time+deltat;
fm5 = fm5_1;
i = i+1;
deltat = err5*deltat;
plot(x,y, '-k');
TeXString = title('Plot of Orbit in Gamma Potential Obtained Using RK4')
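You are getting complex values because at some point nsteps - time < 0, so deltat = min(deltat,nsteps-time) becomes negative. R3, R4 and R5 are divided by deltat and turn negative as well, and raising a negative number to the power 1/4 gives a complex result. You may want to print out the values of deltat to check this.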
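A quick way to convince yourself of this is to reproduce the step-size formula with a negative step. This is just a sketch: the value 0.3 stands in for an arbitrary difference between the two estimates and is not taken from your run.

```matlab
epsilon = 1e-5;
deltat  = -2.0;                       % a step size that has gone negative
R       = abs(0.3) / deltat;          % the error ratio inherits the sign, so R < 0
scale   = 0.84 * (epsilon / R)^(1/4); % fractional power of a negative number
disp(isreal(scale))                   % displays 0: scale is complex
```

Once one such factor is multiplied into deltat, every later quantity computed from it is complex too.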
Also, your code doesn't seem to handle the case where the error estimate is larger than your tolerance. When that happens, you have to:
1. shift back the time AND the solution,
2. calculate a new step size from a formula, and
3. recalculate your solution and error estimate.
Because you don't know in advance how many iterations you will need, a for loop is an awkward fit for adaptive Runge-Kutta. I suggest using a while loop instead.
You are using "i" in your code. "i" returns the basic imaginary unit. "i" is equivalent to sqrt(-1). Try to use another identifier in your loops and only use "i" or "j" in calculations where complex numbers are involved.
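To illustrate the accept/reject logic in a while loop, here is a minimal sketch on a hypothetical scalar test problem y' = -y, y(0) = 1 (not your orbit equations). The tableau is the standard Fehlberg 4(5) one and the 0.84 factor is the one from your code; the clamp of the step-size factor to the range [0.1, 4] and all variable names are my own assumptions.

```matlab
% Fehlberg 4(5) tableau (standard coefficients).
c  = [0, 1/4, 3/8, 12/13, 1, 1/2];
A  = [0,         0,          0,          0,         0,      0;
      1/4,       0,          0,          0,         0,      0;
      3/32,      9/32,       0,          0,         0,      0;
      1932/2197, -7200/2197, 7296/2197,  0,         0,      0;
      439/216,   -8,         3680/513,   -845/4104, 0,      0;
      -8/27,     2,          -3544/2565, 1859/4104, -11/40, 0];
b4 = [25/216, 0, 1408/2565,  2197/4104,   -1/5,  0];
b5 = [16/135, 0, 6656/12825, 28561/56430, -9/50, 2/55];

f     = @(t, y) -y;    % hypothetical test problem
t     = 0;
y     = 1;
t_end = 2;
h     = 0.5;           % initial step size
tol   = 1e-6;          % tolerance on the error per unit step

while t < t_end
    h = min(h, t_end - t);            % never step past t_end, so h stays positive
    k = zeros(6, 1);
    for s = 1:6
        % A is strictly lower triangular, so only earlier stages contribute
        k(s) = f(t + c(s)*h, y + h*(A(s, :)*k));
    end
    y4 = y + h*(b4*k);                % 4th-order estimate
    y5 = y + h*(b5*k);                % 5th-order estimate
    R  = abs(y5 - y4) / h;            % error estimate per unit step
    if R <= tol
        t = t + h;                    % accept: advance time and solution together
        y = y4;
    end                               % reject: t and y are left untouched
    % new step size, used for the next step or for the retry
    h = h * min(4, max(0.1, 0.84*(tol/max(R, eps))^(1/4)));
end
fprintf('t = %g, y = %.6f, exact = %.6f\n', t, y, exp(-t));
```

A rejected step simply leaves t and y unchanged and is retried with the smaller h, which is why the number of iterations cannot be known up front.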

Change input video from saved video to streaming video for manipulation

I downloaded this code from MIT's video magnification lab: http://people.csail.mit.edu/mrub/evm/#code
It currently only runs on saved videos, and we want to change it so that it works with a live video feed.
The MATLAB code is as follows:
dataDir = './data';
resultsDir = 'ResultsSIGGRAPH2012';
inFile = fullfile(dataDir,'face2.mp4');
fprintf('Processing %s\n', inFile);
% Motion
amplify_spatial_lpyr_temporal_butter(inFile,resultsDir,20,80, ...
0.5,10,30, 0);
The amplify_spatial_lpyr_temporal_butter matlab code is:
function amplify_spatial_lpyr_temporal_butter(vidFile, outDir ...
,alpha, lambda_c, fl, fh ...
,samplingRate, chromAttenuation)
[low_a, low_b] = butter(1, fl/samplingRate, 'low');
[high_a, high_b] = butter(1, fh/samplingRate, 'low');
[~,vidName] = fileparts(vidFile);
outName = fullfile(outDir,[vidName '-butter-from-' num2str(fl) '-to-' ...
num2str(fh) '-alpha-' num2str(alpha) '-lambda_c-' num2str(lambda_c) ...
'-chromAtn-' num2str(chromAttenuation) '.avi']);
% Read video
vid = VideoReader(vidFile);
% Extract video info
vidHeight = vid.Height;
vidWidth = vid.Width;
nChannels = 3;
fr = vid.FrameRate;
len = vid.NumberOfFrames;
temp = struct('cdata', ...
zeros(vidHeight, vidWidth, nChannels, 'uint8'), ...
'colormap', []);
startIndex = 1;
endIndex = len-10;
vidOut = VideoWriter(outName);
vidOut.FrameRate = fr;
% firstFrame
temp.cdata = read(vid, startIndex);
[rgbframe,~] = frame2im(temp);
rgbframe = im2double(rgbframe);
frame = rgb2ntsc(rgbframe);
[pyr,pind] = buildLpyr(frame(:,:,1),'auto');
pyr = repmat(pyr,[1 3]);
[pyr(:,2),~] = buildLpyr(frame(:,:,2),'auto');
[pyr(:,3),~] = buildLpyr(frame(:,:,3),'auto');
lowpass1 = pyr;
lowpass2 = pyr;
pyr_prev = pyr;
output = rgbframe;
nLevels = size(pind,1);
for i=startIndex+1:endIndex
progmeter(i-startIndex,endIndex - startIndex + 1);
temp.cdata = read(vid, i);
[rgbframe,~] = frame2im(temp);
rgbframe = im2double(rgbframe);
frame = rgb2ntsc(rgbframe);
[pyr(:,1),~] = buildLpyr(frame(:,:,1),'auto');
[pyr(:,2),~] = buildLpyr(frame(:,:,2),'auto');
[pyr(:,3),~] = buildLpyr(frame(:,:,3),'auto');
%% temporal filtering
lowpass1 = (-high_b(2) .* lowpass1 + high_a(1).*pyr + ...
high_a(2).*pyr_prev)./high_b(1);
lowpass2 = (-low_b(2) .* lowpass2 + low_a(1).*pyr + ...
low_a(2).*pyr_prev)./low_b(1);
filtered = (lowpass1 - lowpass2);
pyr_prev = pyr;
%% amplify each spatial frequency bands according to Figure 6 of our paper
ind = size(pyr,1);
delta = lambda_c/8/(1+alpha);
% the factor to boost alpha above the bound we have in the
% paper. (for better visualization)
exaggeration_factor = 2;
% compute the representative wavelength lambda for the lowest spatial
% freqency band of Laplacian pyramid
lambda = (vidHeight^2 + vidWidth^2).^0.5/3; % 3 is experimental constant
for l = nLevels:-1:1
    indices = ind-prod(pind(l,:))+1:ind;
    % compute modified alpha for this level
    currAlpha = lambda/delta/8 - 1;
    currAlpha = currAlpha*exaggeration_factor;
    if (l == nLevels || l == 1) % ignore the highest and lowest frequency band
        filtered(indices,:) = 0;
    elseif (currAlpha > alpha) % representative lambda exceeds lambda_c
        filtered(indices,:) = alpha*filtered(indices,:);
    else
        filtered(indices,:) = currAlpha*filtered(indices,:);
    end
    ind = ind - prod(pind(l,:));
    % go one level down on pyramid,
    % representative lambda will reduce by factor of 2
    lambda = lambda/2;
end
%% Render on the input video
output = zeros(size(frame));
output(:,:,1) = reconLpyr(filtered(:,1),pind);
output(:,:,2) = reconLpyr(filtered(:,2),pind);
output(:,:,3) = reconLpyr(filtered(:,3),pind);
output(:,:,2) = output(:,:,2)*chromAttenuation;
output(:,:,3) = output(:,:,3)*chromAttenuation;
output = frame + output;
output = ntsc2rgb(output);
% filtered = rgbframe + filtered.*mask;
output(output > 1) = 1;
output(output < 0) = 0;
We are trying to get it to work with a streaming video input, but don't really know where to start. We are somewhat familiar with programming but not with MATLAB, so we don't really know where to look for answers.
Any help would be greatly appreciated.

speed up code - vectorization

I'm not really familiar with vectorization, but I am aware that, amongst MATLAB's strengths, code vectorization is probably the most rewarding.
I have this code:
ikx= (-Nx/2:Nx/2-1)*dk1;
iky= (-Ny/2:Ny/2-1)*dk2;
ikz= (-Nz/2:Nz/2-1)*dk3;
[k1,k2,k3] = ndgrid(ikx,iky,ikz);
k = sqrt(k1.^2 + k2.^2 + k3.^2);
Cij = zeros(3,3,Nx,Ny,Nz);
count = 0;
for ii = 1:Nx
for jj = 1:Ny
for kk = 1:Nz
if ~isequal(k1(ii,jj,kk),0)
count = count +1;
fprintf('iteration step %i\r\n',count)
E_int = interp1(k_vec,E_vec,k(ii,jj,kk),'spline','extrap');
beta = c*gamma./(k(ii,jj,kk).*sqrt(E_int));
k30 = k3(ii,jj,kk) + beta*k1(ii,jj,kk);
k0 = sqrt(k1(ii,jj,kk)^2 + k2(ii,jj,kk)^2 + k30^2);
Ek0 = 1.453*(k0^4/((1 + k0^2)^(17/6)));
B = sigmaiso*sqrt((Ek0./(k0.^2))*((dk1*dk2*dk3)/(4*pi)));
C1 = ((beta.*k1(ii,jj,kk).^2).*(k0.^2 - 2*k30.^2 + k30.*beta.*k1(ii,jj,kk)))./(k(ii,jj,kk).^2.*(k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2));
C2 = ((k2(ii,jj,kk).*(k0.^2))./((k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2).^(3/2))).*atan2((beta.*k1(ii,jj,kk).*sqrt(k1(ii,jj,kk).^2 + k2(ii,jj,kk).^2)),(k0.^2 - k30.*beta.*k1(ii,jj,kk)));
xhsi1 = C1 - C2.*(k2(ii,jj,kk)./k1(ii,jj,kk));
xhsi2 = C1.*(k2(ii,jj,kk)./k1(ii,jj,kk)) + C2;
Cij(1,1,ii,jj,kk) = B.*((k2(ii,jj,kk).*xhsi1)./(k0));
Cij(1,2,ii,jj,kk) = B.*((k3(ii,jj,kk)-k1(ii,jj,kk).*xhsi1+beta.*k1(ii,jj,kk))./(k0));
Cij(1,3,ii,jj,kk) = B.*(-k2(ii,jj,kk)./(k0));
Cij(2,1,ii,jj,kk) = B.*((k2(ii,jj,kk).*xhsi2-k3(ii,jj,kk)-beta.*k1(ii,jj,kk))./(k0));
Cij(2,2,ii,jj,kk) = B.*((-k1(ii,jj,kk).*xhsi2)./(k0));
Cij(2,3,ii,jj,kk) = B.*(k1(ii,jj,kk)./(k0));
Cij(3,1,ii,jj,kk) = B.*(k2(ii,jj,kk).*k0./(k(ii,jj,kk).^2));
Cij(3,2,ii,jj,kk) = B.*(-k1(ii,jj,kk).*k0./(k(ii,jj,kk).^2));
end
end
end
end
Generally I would avoid the nested for loops; however, the if statement on the k1 values keeps pushing me towards this classic, old-fashioned code structure.
I would very much like to replace the for loops with a vectorized, more elegant solution.
Any support is more than welcome.
To help you better understand what the code is expected to do, here are some basics:
As @Floris advised, I came up with this alternative solution:
ikx= (-Nx/2:Nx/2-1)*dk1;
iky= (-Ny/2:Ny/2-1)*dk2;
ikz= (-Nz/2:Nz/2-1)*dk3;
[k1,k2,k3] = ndgrid(ikx,iky,ikz);
k = sqrt(k1.^2 + k2.^2 + k3.^2);
ii = (ikx ~= 0);
k1w = k1(ii,:,:);
k2w = k2(ii,:,:);
k3w = k3(ii,:,:);
kw = k(ii,:,:);
E_int = interp1(k_vec,E_vec,kw,'spline','extrap');
beta = c*gamma./(kw.*sqrt(E_int));
k30 = k3w + beta.*k1w;
k0 = sqrt(k1w.^2 + k2w.^2 + k30.^2);
Ek0 = (1.453*k0.^4)./((1 + k0.^2).^(17/6));
B = sqrt((2*(pi^2)*(l^3))*(Ek0./(V*k0.^4)));
k1w_2 = k1w.^2;
k2w_2 = k2w.^2;
k30_2 = k30.^2;
k0_2 = k0.^2;
kw_2 = kw.^2;
C1 = ((beta.*k1w_2).*(k0_2 - 2.*k30_2 + beta.*k1w.*k30))./(kw_2.*(k1w_2 + k2w_2));
C2 = ((k2w.*k0_2)./((k1w_2 + k2w_2).^(3/2))).*atan2((beta.*k1w).*sqrt(k1w_2 + k2w_2),(k0_2 - k30.*k1w.*beta));
xhsi1 = C1 - (k2w./k1w).*C2;
xhsi2 = (k2w./k1w).*C1 + C2;
Cij = zeros(3,3,Nx,Ny,Nz);
Cij(1,1,ii,:,:) = B.*(k2w.*xhsi1);
Cij(1,2,ii,:,:) = B.*(k3w - k1w.*xhsi1 + beta.*k1w);
Cij(1,3,ii,:,:) = B.*(-k2w);
Cij(2,1,ii,:,:) = B.*(k2w.*xhsi2 - k3w - beta.*k1w);
Cij(2,2,ii,:,:) = B.*(-k1w.*xhsi2);
Cij(2,3,ii,:,:) = B.*(k1w);
Cij(3,1,ii,:,:) = B.*((k0_2./kw_2).*k2w);
Cij(3,2,ii,:,:) = B.*(-(k0_2./kw_2).*k1w);
You can do your test just once, and then create arrays of "just the elements you need". Example:
% create an index of all the elements that are worth computing:
worthComputing = find(k1(:)~=0);
% now create sub-arrays of all the other arrays... a little bit expensive on memory,
% but much faster for computation:
kw = k(worthComputing);
k1w = k1(worthComputing);
k2w = k2(worthComputing);
k3w = k3(worthComputing);
% now we'll compute all the results of the innermost for loop in single statements:
E_int = interp1(k_vec,E_vec,kw,'spline','extrap');
beta = c*gamma./(kw.*sqrt(E_int));
k30 = k3w + beta.*k1w;
k0 = sqrt(k1w.^2 + k2w.^2 + k30.^2);
Ek0 = 1.453*(k0.^4./((1 + k0.^2).^(17/6)));
% the next line has dk1, dk2, dk3 ... not sure what they are? Not shown to be initialized. Assuming scalars as they are not indexed.
B = sigmaiso*sqrt((Ek0./(k0.^2))*((dk1*dk2*dk3)/(4*pi)));
C1 = ((beta.*k1w.^2).*(k0.^2 - 2*k30.^2 + k30.*beta.*k1w))./(kw.^2.*(k1w.^2 + k2w.^2));
C2 = ((k2w.*(k0.^2))./((k1w.^2 + k2w.^2).^(3/2))).*atan2((beta.*k1w.*sqrt(k1w.^2 + ...
k2w.^2)),(k0.^2 - k30.*beta.*k1w));
xhsi1 = C1 - C2.*(k2w./k1w);
xhsi2 = C1.*(k2w./k1w) + C2;
% in the next lines I am using the trick of "collapsing" the remaining indices
% in other words, Matlab figures out that I want to access the elements in Cij
% that correspond to the ii, jj, kk that were picked before...
Cij(1,1,worthComputing) = B.*((k2w.*xhsi1)./(k0));
Cij(1,2,worthComputing) = B.*((k3w-k1w.*xhsi1+beta.*k1w)./(k0));
Cij(1,3,worthComputing) = B.*(-k2w./(k0));
Cij(2,1,worthComputing) = B.*((k2w.*xhsi2-k3w-beta.*k1w)./(k0));
Cij(2,2,worthComputing) = B.*((-k1w.*xhsi2)./(k0));
Cij(2,3,worthComputing) = B.*(k1w./(k0));
Cij(3,1,worthComputing) = B.*(k2w.*k0./(kw.^2));
Cij(3,2,worthComputing) = B.*(-k1w.*k0./(kw.^2));
It is entirely possible there's a typo or two in the above - but this is the basic approach to vectorization.
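To make the "collapsing" trick concrete, here is a tiny sketch on a made-up 3-by-3 grid. The names out, idx and ratio are purely illustrative, and the output array is assumed to be preallocated at its full size, as Cij is in the question.

```matlab
% Small hypothetical grid, only to illustrate the indexing pattern.
[k1, k2] = ndgrid(-1:1, -1:1);   % k1 is zero along its middle row
out = zeros(2, 2, 3, 3);         % preallocated, like Cij = zeros(3,3,Nx,Ny,Nz)

idx = find(k1(:) ~= 0);          % linear indices of the grid points worth computing
k1w = k1(idx);
k2w = k2(idx);

ratio = k2w ./ k1w;              % no division by zero: those points were filtered out

% Three subscripts on a 4-D array: the trailing dimensions collapse, so out
% is addressed as 2-by-2-by-9 and idx selects grid points directly.
out(1, 1, idx) = ratio;
out(1, 2, idx) = k1w.^2 + k2w.^2;

disp(size(out))                  % the array keeps its original 4-D shape
```

Grid points where k1 is zero are never touched, so they keep the zeros from the preallocation, which matches what the if statement in the loop version does.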

Perform step-by-step integral

I have this piece of code:
time = 614.4;
Uhub = 11;
HubHt = 90;
TI = 'A';
N1 = 4096;
N2 = 32;
N3 = 32;
L1 = Uhub*time;
L2 = 150;
L3 = 220;
V = L1*L2*L3;
gamma = 3.9;
c = 1.476;
b = 5.6;
if HubHt < 60
    lambda1 = 0.7*HubHt;
else
    lambda1 = 42;
end
L = 0.8*lambda1;
if isequal(TI,'A')
    Iref = 0.16;
    sigma1 = Iref*(0.75*Uhub + b);
elseif isequal(TI,'B')
    Iref = 0.14;
    sigma1 = Iref*(0.75*Uhub + b);
elseif isequal(TI,'C')
    Iref = 0.12;
    sigma1 = Iref*(0.75*Uhub + b);
else
    sigma1 = str2num(TI)*Uhub/100;
end
sigma_iso = 0.55*sigma1;
%% Wave number vectors
ik1 = cat(2,(-N1/2:-1/2),(1/2:N1/2));
ik2 = -N2/2:N2/2-1;
ik3 = -N3/2:N3/2-1;
[x y z] = ndgrid(ik1,ik2,ik3);
k1 = reshape((2*pi*L/L1)*x,N1*N2*N3,1);
k2 = reshape((2*pi*L/L2)*y,N1*N2*N3,1);
k3 = reshape((2*pi*L/L3)*z,N1*N2*N3,1);
k = sqrt(k1.^2 + k2.^2 + k3.^2);
Now I should calculate
The procedure to calculate the integral is
At the moment I'm using this loop
E = @(k) (1.453*k.^4)./((1 + k.^2).^(17/6));
E_int = zeros(1,N1*N2*N3);
E_int(1) = 1.5;
for i = 2:(N1*N2*N3)
    E_int(i) = E_int(i) + quad(E,i-1,i);
end
neglecting for the k>400 approximation. I believe that my loop is not right.
How would you suggest to calculate the integral?
I thank you in advance.
This is a list of corrections, from the more obvious to the possibly more subtle. (I start from what you wrote in the final part and work upwards.)
From what you write:
E = @(k) (1.453*k.^4)./((1 + k.^2).^(17/6));
E_int = zeros(1,N1*N2*N3);
E_int(1) = 1.5;
for i = 2:(N1*N2*N3)
    %//No point in doing this:
    %//E_int(i) = E_int(i) + quad(E,i-1,i);
    %//According to what you write, it should be:
    E_int(i) = E_int(i-1) + quad(E,i-1,i);
end
You could speed the whole thing up by doing
%//Independent integration on segments
Local_int = arrayfun(@(i)quad(E,i-1,i), 2:(N1*N2*N3));
Local_int = [1.5 Local_int];
%//integral additivity
E_int = cumsum(Local_int);
Moreover, if the known condition (point 2.) really is "... ( = 1.5 if k' = 0)", then the whole implementation should really be more like
%//Independent integration on segments
Local_int = arrayfun(@(i)quad(E,i-1,i), 2:(N1*N2*N3));
%//integral additivity + cumulative removal of tails
E_int = 1.5 - [0 fliplr(cumsum(fliplr(Local_int)))]; %//To remove tails
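As a sanity check on the additivity idea, the running sum of the per-segment integrals can be compared with a single integral over the whole range. This is only a sketch: the range 0 to 10 with unit segments is an arbitrary choice for illustration, and it uses integral rather than quad, which should be interchangeable here.

```matlab
E = @(k) (1.453*k.^4) ./ ((1 + k.^2).^(17/6));

edges = 0:1:10;   % hypothetical segment boundaries

% Integrate every unit segment independently ...
local_int = arrayfun(@(a, b) integral(E, a, b), edges(1:end-1), edges(2:end));

% ... then accumulate: running(n) is the integral from 0 up to edges(n+1).
running = cumsum(local_int);

% Cross-check the last running value against one direct integral.
direct = integral(E, 0, 10);
fprintf('cumsum: %.6f   direct: %.6f\n', running(end), direct);
```

The two numbers should agree to within the integration tolerance, which is what justifies replacing the loop with arrayfun followed by cumsum.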