I want to calculate the Euclidean distance between two images using the Hyperbolic Tangent (Sigmoid) kernel. Please follow this link where I have discussed the same problem using Gaussian Kernel in detail.
If x = (i,j) and y = (i1,j1) are any two pixels in our image, then for the hyperbolic tangent kernel H(x,y) is defined as:
H(x,y) = tanh(alpha*(x'*y) + c)
where alpha and c are parameters and x' is the transpose of x. Parameter alpha can be taken as 1/N, where N is my image dimension (8192 x 200 in my case), and c can take any value according to the problem. A more detailed description of the hyperbolic tangent kernel can be found here.
To achieve my goal while keeping the running time in mind, I have written the MATLAB script below.
s1 = 8192;
s2 = 200;
alpha = s1*s2; % N, the number of pixels; the kernel below uses 1/alpha
perms = combvec(1:s2,1:s1);
perms = [perms(2,:);perms(1,:)]';
perms1 = perms;
gray1(4096,100) = 10;
gray2(10,100) = 10;
img_diff = gray1 - gray2;
display('Calculation of Sigmoid Kernel started');
for i = 1:length(perms1)
    kernel = sum(bsxfun(@times,perms,perms1(i,:))');
    kernel1 = tanh((1/alpha)*kernel + 1)';
    g_temp(i) = img_diff(:)'*kernel1;
end
temp = g_temp*img_diff(:);
ans = sqrt(temp);
In spite of all my efforts, I couldn't vectorize it further to decrease its running cost. Currently it takes around 29 hours to complete, which is too much for me as I want to run it for many different images. I want to give it a completely vectorized form using intrinsic MATLAB functions, as was done by @dan-man in the case of the Gaussian kernel. With his help the Gaussian version takes 1-2 seconds to complete. I tried my best to use the same conv2fft function in this case as well, but it seems difficult to find a way to achieve that.
Can someone please help me remove that one remaining for loop, so that the running cost of the algorithm is comparable to that of the Gaussian version of the same problem?
Thanks in advance.

Get rid of the nasty loop with matrix-multiplication -
g_temp = img_diff(:).'*tanh((1/alpha)*(perms*perms.')+1)
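A caveat on the one-liner: for an 8192 x 200 image, perms*perms.' has (8192*200)^2, roughly 2.7e12, entries, so it cannot be held in memory at once. A minimal sketch of a middle ground is to build the kernel matrix a block of columns at a time. The small sizes, the test image, the variable blockSize and the use of ndgrid (so that the coordinates follow the same column-major order as img_diff(:)) are assumptions made for illustration, not part of the original code.

```matlab
% Small stand-in sizes (assumption; the question uses 8192 x 200).
s1 = 16;  s2 = 5;
alpha = s1*s2;  c = 1;

% One (row, col) pair per pixel, in the same column-major order as img_diff(:).
[rowIdx, colIdx] = ndgrid(1:s1, 1:s2);
perms = [rowIdx(:), colIdx(:)];

% Made-up difference image.
img_diff = zeros(s1, s2);
img_diff(8, 3) = 10;
img_diff(2, 3) = -10;
d = img_diff(:);

% Reference: one kernel column per iteration, as in the question's loop.
g_loop = zeros(1, numel(d));
for i = 1:numel(d)
    k = tanh((1/alpha) * (perms * perms(i,:).') + c);
    g_loop(i) = d.' * k;
end

% Blocked version: blockSize kernel columns per matrix product.
blockSize = 32;                      % hypothetical tuning parameter
g_block = zeros(1, numel(d));        % preallocated
for first = 1:blockSize:numel(d)
    idx = first:min(first + blockSize - 1, numel(d));
    K = tanh((1/alpha) * (perms * perms(idx,:).') + c);  % numel(d) x numel(idx)
    g_block(idx) = d.' * K;
end

temp = g_block * d;                  % same quantity as temp in the question
```

Larger blocks mean fewer iterations but more memory per block, so blockSize would have to be chosen to fit the available RAM.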

On my PC, just 50 iterations of the original loop take 2.07 s.
Just changing the bsxfun line to
kernel = sum(bsxfun(@times,perms,perms1(i,:)),2)';
as the warning suggests brings that down to 1.65 s.
If you have the Neural Network Toolbox and replace tanh with tansig, the time goes to 1.44 s.
If you write your own tanh as
kernel1= (2./(1+exp(-2.*((1/alpha)*kernel + 1)))-1)';
the time goes to 1.28 s.
These changes alone would mean an improvement from 29 h to about 18 h.
And remember to preallocate!
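The hand-written version relies on the identity tanh(z) = 2/(1 + exp(-2z)) - 1. A quick sanity check, as a sketch on an arbitrary test grid z (an assumption, not data from the question):

```matlab
% Compare the built-in tanh with the exp-based expression on a test grid.
z = linspace(-5, 5, 1001);
tanh_builtin = tanh(z);
tanh_manual  = 2 ./ (1 + exp(-2 .* z)) - 1;

% Largest absolute deviation; it should be at round-off level.
max_err = max(abs(tanh_builtin - tanh_manual));
```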


How does MATLAB calculate immse?

How does Matlab calculate immse? I want to find the MSE between two images. According to "how to measure the similarity between two 2D complex fields in matlab?", immse is the same as MSE=mean((abs(Y(:))-abs(Y1(:))).^2) for reference image Y1 and comparison image Y. Likewise, I could calculate the MSE as the summed squared errors divided by rows*cols. When I run this on one of the demo images, these approaches don't give the same answer as immse.
Here are two MSE approaches in the sample code below. The image is from the Matlab immse demo, and immse gives an MSE of around 340. The other two codes give an MSE of around 2.5.
Note: The code is example code, I did not use the same function name twice in the same script. And I understand if you want to complain about using size(image) but that is a detail. I am more worried about the basic flaw in my understanding that is giving me orders of magnitude differences. Thank you so much.
n01 = imread('pout.tif');
n02 = imnoise(n01,'salt & pepper', 0.02);
mse = mymse(n02,n01);
mlmse = immse(n02,n01);
function this = mymse(icomp, ibase)
[X Y nchan] = size(ibase);
diff = (icomp - ibase);
this = sum(sum(diff.*diff))/(X*Y*nchan);
function this = mymse(icomp, ibase)
this = mean ((abs(ibase(:)) - abs(icomp(:))).^2);
You can check the underlying code to many matlab functions by simply doing
open <func>
in the Matlab command window.
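In this case you can see that immse computes the squared norm of the differences, divided by the number of points. It also converts integer inputs to double before subtracting, so the hand-written version needs to do the same:
function this = mymse(icomp, ibase)
this = sum((double(ibase(:)) - double(icomp(:))).^2) / numel(ibase);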
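A likely reason for the orders-of-magnitude gap in the question is integer saturation: pout.tif is a uint8 image, and uint8 arithmetic clips negative differences to 0 and squares to 255. A small sketch on made-up 2 x 2 arrays (the values are assumptions chosen to trigger both effects):

```matlab
ibase = uint8([10 200; 50 100]);
icomp = uint8([20 100; 50 255]);

% uint8 arithmetic saturates: 100 - 200 becomes 0, and 155^2 is capped at 255.
d_int   = icomp - ibase;                        % [10 0; 0 155]
mse_int = sum(sum(d_int .* d_int)) / numel(ibase);

% Converting to double first keeps the sign and the full squared range.
d_dbl   = double(icomp) - double(ibase);        % [10 -100; 0 155]
mse_dbl = sum(d_dbl(:).^2) / numel(ibase);
```

Here mse_int comes out far smaller than mse_dbl, which mirrors the 2.5 versus 340 discrepancy.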

Optimizing Fourier Series Fitting Function Matlab

I am trying to iterate through a set of samples that seems to show periodic changes. I need to apply the fit function repeatedly to get the Fourier series coefficients; each regression has to use the n most recent samples (around 30 in my case). The problem is, my code is extremely slow! It takes about 1 hour to do this for a set of 50,000 samples. Is there any way to optimize this? What am I doing wrong?
Here's my code:
function [coefnames,coef] = fourier_regression(vect_waves,n)
j = 1;
coef = zeros(length(vect_waves)-n,10);
for i=n+1:length(vect_waves)
    take_fourier = vect_waves(i-n+1:i);
    x = (1:n)'; % fit expects column vectors
    f = fit(x,take_fourier(:),'fourier4');
    current_coef = coeffvalues(f);
    coef(j,1:length(current_coef)) = current_coef;
    j = j + 1;
end
coefnames = coeffnames(f);
When I call [coefnames,coef] = fourier_regression(VECTOR,30); it takes forever to compute. Is there any way to fix it? What's wrong with my code?
Note: I have an Intel i7 5500U CPU, 16GB RAM, and I am using Matlab 2015a.
As I am not familiar with your application, I am not sure whether it is possible to vectorize the code to improve performance. However, I have a couple of other tips.
One thing you should consider is preallocation of arrays. In this case, you should preallocate at least the array coef since I believe you do know its size before starting the loop.
Another thing I suggest is to profile your code. This will provide information on what parts of your code are consuming the most time, helping you focus your effort on improving those parts' performance.
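If profiling shows that nearly all of the time goes into fit itself, one possible direction (not taken from the question, and a different model from 'fourier4') is to hold the fundamental frequency w fixed. 'fourier4' also fits w, which makes the problem nonlinear; with w fixed, the remaining coefficients follow from ordinary linear least squares, and because x = 1:n is the same for every window, all windows can be solved in one backslash call. The synthetic signal and the value of w below are assumptions for illustration only.

```matlab
% Synthetic periodic signal (assumption: period of 30 samples).
N = 200;  n = 30;
w = 2*pi/n;                          % fixed fundamental frequency (assumption)
t = (1:N)';
v = 2 + cos(w*t) + 0.5*sin(2*w*t);

% Design matrix for x = 1:n: [1, cos(w x), sin(w x), ..., cos(4 w x), sin(4 w x)].
x = (1:n)';
X = ones(n, 9);
for k = 1:4
    X(:, 2*k)   = cos(k*w*x);
    X(:, 2*k+1) = sin(k*w*x);
end

% Column k holds the window v(k+1 : k+n), matching the loop i = n+1 : N.
idx = bsxfun(@plus, x, 1:(N-n));     % n x (N-n) index matrix
W = v(idx);

% One least-squares solve for all windows.
% Row k of coef is [a0 a1 b1 a2 b2 a3 b3 a4 b4] for window k.
coef = (X \ W)';
```

Whether a fixed w is acceptable depends on the data; if the period drifts, the nonlinear fit remains the appropriate tool.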

Nested for loops for running a sliding window function in MATLAB

I'm trying to write some simple code that computes the sum over a large window and divides it by the sum over a small running window to get the energy ratio.
My code looks like this in MATLAB:
S = data1;
[nt,ntraces] = size(S);
% Create sliding windows for First Break Picking:
% define a window length
% for large Window
nl = 300;
% for small running Window
ns = 50;
% tolerance/Fudge Factor
beta = 0.0000;
for i_slide = 1:nt-nl
    for i_large = i_slide:(i_slide+nl)
        large_window(i_large) = sum(S(i_large).^2)';
    end
    for i_small = i_slide+ns:i_slide+nl
        small_window(i_small) = sum(S(i_small).^2)';
    end
    ER(i_slide) = small_window/(large_window + beta);
end
The problem I am having is that my small running window is not indexing correctly, nor is it running the sum along the whole large window length at the maximum slide.
Any ideas how I can overcome this problem?
In general, the problem you're really trying to solve seems to be general 2-D (or 1-D?) convolution. You can use MATLAB's conv or conv2 function (or filter or imfilter, if you have image processing toolbox) to do this. If you need to write a 2-D convolution function, I wouldn't try and write one that does two convolutions and takes the ratio. Instead write a simple convolution function: my_conv and run it twice, and take the ratio. e.g., you're trying to write:
output = my_double_conv(data,smallFilt,bigFilt); %this does ratios
I don't think that's a good idea in general. Don't do that. Do
output = my_conv(data,smallFilt) ./ my_conv(data,bigFilt);
You might see some speed benefits from not having to index everything twice in my_double_conv, but if computational concerns are your issue, you shouldn't be writing your own convolution in the first place; instead you should be using FFT convolutions, or integral-image convolutions (e.g., http://hebb.mit.edu/courses/9.29/2004/readings/c13-1.pdf or http://en.wikipedia.org/wiki/Summed_area_table )
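For the 1-D case, the summed-area idea reduces to a cumulative sum: every window sum is the difference of two entries of cumsum. A minimal sketch with made-up data, checked against conv:

```matlab
x = (1:10)'.^2;          % made-up data
n = 4;                   % window length

% Running sum via a cumulative sum: sum(x(k:k+n-1)) = c(k+n) - c(k).
c = [0; cumsum(x)];
movsum_cumsum = c(n+1:end) - c(1:end-n);

% The same running sum via convolution with a box filter.
movsum_conv = conv(x, ones(n, 1), 'valid');
```

The cumsum version costs the same regardless of the window length, which is the point of the integral-image approach.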
That said, your code has several problems. Have you tried debugging with the MATLAB debugger?
For example, this is clearly wrong, since i_small is a scalar index:
for i_small = i_slide+ns:i_slide+nl
small_window(i_small) = sum(S(i_small).^2)';
That sum is not going to "sum" over anything, since i_small will be a scalar...
Do you want:
small_window= S(i_slide+ns:i_slide+nl);
small_window_sum = sum(small_window.^2);
Also note that for element-wise array operations, like:
small_window/(large_window + beta);
where small_window and large_window are arrays rather than scalars, you want:
small_window./(large_window + beta); %note the "."
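Putting the pieces together for a single trace, a hedged sketch of the energy ratio using conv instead of nested loops. It assumes that both windows start at the same sample i, which is one reading of the question, and it uses a synthetic trace with shortened window lengths:

```matlab
% Synthetic trace: silence followed by a sinusoid (assumption).
nt = 120;
k  = (1:nt)';
s  = sin(0.3*k) .* (k > 40);

nl   = 30;               % large window length (the question uses 300)
ns   = 5;                % small window length (the question uses 50)
beta = 1e-12;            % small tolerance to avoid division by zero

% Energy in every window of each length; entry i covers samples i : i+n-1.
e = s.^2;
E_large = conv(e, ones(nl, 1), 'valid');
E_small = conv(e, ones(ns, 1), 'valid');

% Keep only the positions where the large window fits, then divide element-wise.
ER = E_small(1:numel(E_large)) ./ (E_large + beta);
```

For a matrix of traces, the same two conv calls could be applied column by column, or replaced by conv2 with a column kernel.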

Scope for improvement in this code

I have written the following code in MATLAB to process large images of the order of 3000x2500 pixels. Currently the operation takes more than half an hour to complete. Is there any scope to improve the code so that it takes less time? I heard parallel processing can make things faster, but I have no idea how to implement it. How do I do it, given the following code?
function dirvar(subfn)
[fn,pn] = uigetfile({'*.TIF; *.tiff; *.tif; *.TIFF; *.jpg; *.bmp; *.JPG; *.png'}, ...
    'Select an image', '~/');
I = double(imread(fullfile(pn,fn)));
ld = input('Enter the lag distance = '); % prompt for lag distance
fh = eval(['@' subfn]); % Function handles
I2 = uint8(nlfilter(I, [7 7], fh));
imshow(I2); % Texture Layer Image
    % Zero Degree Variogram
    function [gamma] = ewvar(I)
        c = (size(I)+1)/2; % Finds the central pixel of moving window
        EW = I(c(1),c(2):end); % Determines the values from central pixel to margin of window
        h = length(EW) - ld; % Number of lags
        gamma = 1/(2 * h) * sum((EW(1:ld:end-1) - EW(2:ld:end)).^2);
    end
end
The input lag distance is usually 1.
You really need to use the profiler to get meaningful improvements out of it. My first guess (as I haven't run the profiler, which you should, as already suggested) would be to call length and similar operations as rarely as possible. Since you are processing every image with a [7 7] window, you can precalculate some parts so that these actions are not repeated for every pixel:
function dirvar(subfn)
[fn,pn] = uigetfile({'*.TIF; *.tiff; *.tif; *.TIFF; *.jpg; *.bmp; *.JPG; *.png'}, ...
    'Select an image', '~/');
I = double(imread(fullfile(pn,fn)));
ld = input('Enter the lag distance = '); % prompt for lag distance
fh = eval(['@' subfn]); % Function handles
%% precalculations
wind = [7 7];
center = (wind+1)/2; % Finds the central pixel of moving window
EWlength = (wind(2)+1)/2;
h = EWlength - ld; % Number of lags
%% calculations
I2 = nlfilter(I, wind, fh);
imshow(I2); % Texture Layer Image
    % Zero Degree Variogram
    function [gamma] = ewvar(I)
        EW = I(center(1),center(2):end); % Determines the values from central pixel to margin of window
        gamma = 1/(2 * h) * sum((EW(1:ld:end-1) - EW(2:ld:end)).^2);
    end
end
Note that by doing so, you trade clarity of your code and loose coupling (between the function dirvar and the nested function ewvar) for performance. Since I haven't profiled your code, you should do that yourself with your own inputs to find which lines consume the most time.
For batch processing, I would also recommend leaving out any input, imshow, imwrite and uigetfile. Those are commands that you typically call from a higher-level function or script, and they force you to enter these inputs even when you want them to stay the same. So instead of that code, make each of the variables they produce (or process) a parameter (or return value) of your function. That way, you could leave MATLAB running during the weekend to process everything (without having to enter all those values manually), even if you are unable to speed up the code.
A few general purpose tricks:
1 - use the MATLAB profiler to determine all the computational bottlenecks
2 - parallel processing can make things faster and there are a lot of tools that you can use, but it depends on how your entire code is set up and whether the code is optimized for it. By far the easiest trick to learn is parfor, where you can replace the top level for loop by parfor. This does mean you must open the MATLAB pool with matlabpool open.
3 - If you have a rather recent Nvidia GPU as well as MATLAB 2011, you can also write some CUDA code.
All in all 30 mins to me is peanuts, so don't fret it too much.
First of all, I strongly suggest you follow the advice by @Egon: Write a separate function that collects a list of files (the excellent UIPICKFILES from the FEX is your friend here), and then runs your filtering code in a loop for each image. Note that you should definitely keep the call to imwrite in your filtering code: In case the analysis crashes at image 48 (e.g. due to power failure), you don't want to lose all the previous work.
Running thusly in batch mode has two big advantages: (1) you can start running your code and go home for the week-end, and (2) you can easily parallelize this outside loop using PARFOR. However, with only a dual-core machine, it is unlikely that you get any significant improvements from parallelization - your OS also wants to run stuff at times, and the overhead of parallelization might be more than the gain from running two workers. Also, 2.5GB of RAM is seriously limiting.
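A minimal sketch of such an outer loop with parfor follows. The in-memory stand-in images and the process handle are hypothetical placeholders for reading each file and running the real filter. Without a parallel pool the loop simply runs serially; in newer MATLAB releases the pool is opened with parpool rather than matlabpool.

```matlab
% Stand-ins for the images that would be read from disk (assumption).
images = {magic(8), magic(10), magic(12)};

% Hypothetical per-image processing step; replace with the real filter.
process = @(img) sum(diff(img, 1, 2).^2, 2);

% Each iteration is independent, so the loop can be distributed over workers.
results = cell(size(images));
parfor k = 1:numel(images)
    results{k} = process(images{k});
end
```

In a real batch run, each iteration would also write its result to disk so that finished images survive a crash.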
As to your specific code: in my experience using IM2COL is often faster than NLFILTER. im2col creates a nElementsInMask-by-nMasks array out of your image, so that you can apply the filtering in one single operation. With a 7x7 window, the output of im2col will be 3000*2500*49 bytes (assuming uint8 data), which is close to 400MB. Thus, it should just work. All that you need to do is rewrite ewvar so that it works on a 49x1 array containing the pixels under your mask, which will require some index juggling, if I understand your code correctly.
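For reference, in a 7x7 window stored column-major as a 49x1 vector, the centre row from the centre pixel to the right edge sits at elements 25, 32, 39 and 46. For a lag distance of 1 the same result can also be obtained without im2col or nlfilter, by differencing horizontally shifted copies of the zero-padded image. The sketch below uses that swapped-in technique rather than im2col. It assumes ld = 1, zero padding at the borders (as nlfilter does) and a made-up test image, and it checks the result against an explicit per-pixel loop.

```matlab
I = magic(9);                        % stand-in image (assumption)
[R, C] = size(I);
halfW = 3;                           % half-width of the 7x7 window

% Zero-pad by halfW on every side.
P = zeros(R + 2*halfW, C + 2*halfW);
P(halfW+1:halfW+R, halfW+1:halfW+C) = I;

% Squared difference between each pixel and its right-hand neighbour.
D = (P(:, 1:end-1) - P(:, 2:end)).^2;

% For each pixel, add the halfW squared differences to the right of the centre.
rows = halfW + (1:R);
G = zeros(R, C);
for k = 0:halfW-1
    G = G + D(rows, halfW + k + (1:C));
end
G = G / (2*halfW);                   % 1/(2*h) with h = 4 - ld = 3

% Reference: explicit per-window computation, as ewvar does it.
G_ref = zeros(R, C);
for r = 1:R
    for c = 1:C
        EW = P(r + halfW, c + halfW : c + 2*halfW);
        G_ref(r, c) = 1/(2*halfW) * sum((EW(1:end-1) - EW(2:end)).^2);
    end
end
```

The vectorized part needs only a few arrays the size of the image, so it avoids the 49-fold memory blow-up of im2col.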

vectorizing loops in Matlab - performance issues

This question is related to these two:
Introduction to vectorizing in MATLAB - any good tutorials?
filter that uses elements from two arrays at the same time
Based on the tutorials I read, I was trying to vectorize a procedure that takes a really long time.
I've rewritten this:
function B = bfltGray(A,w,sigma_r)
dim = size(A);
B = zeros(dim);
for i = 1:dim(1)
    for j = 1:dim(2)
        % Extract local region.
        iMin = max(i-w,1);
        iMax = min(i+w,dim(1));
        jMin = max(j-w,1);
        jMax = min(j+w,dim(2));
        I = A(iMin:iMax,jMin:jMax);
        % Compute Gaussian intensity weights.
        F = exp(-0.5*(abs(I-A(i,j))/sigma_r).^2);
        B(i,j) = sum(F(:).*I(:))/sum(F(:));
    end
end
into this:
function B = rngVect(A, w, sigma)
W = 2*w+1;
I = padarray(A, [w,w],'symmetric');
I = im2col(I, [W,W]);
H = exp(-0.5*(abs(I-repmat(A(:)', size(I,1),1))/sigma).^2);
B = reshape(sum(H.*I,1)./sum(H,1), size(A, 1), []);
A is a matrix 512x512
w is half of the window size, usually equal 5
sigma is a parameter in range [0 1] (usually one of: 0.1, 0.2 or 0.3)
So the I matrix would have 512x512x121 = 31719424 elements
However, this version seems to be as slow as the first one, and in addition it uses a lot of memory and sometimes causes memory problems.
I suppose I've done something wrong, probably some logic mistake regarding vectorizing. In fact I'm not surprised: this method creates really big matrices, and the computations probably take proportionally longer.
I have also tried to write it using nlfilter (similar to the second solution given by Jonas), but that seems to be hard since I use Matlab 6.5 (R13) (there are no sophisticated function handles available).
So once again, I'm not asking for a ready solution, but for some ideas that would help me solve this in reasonable time. Maybe you can point out what I did wrong.
As Mikhail suggested, the results of profiling are as follows:
65% of time was spent in the line H= exp(...)
25% of time was used by im2col
How big are I and H (i.e. numel(I)*8 bytes)? If you start paging, then the performance of your second solution is going to be affected very badly.
To test whether you really have a problem due to too large arrays, you can try and measure the speed of the calculation using tic and toc for arrays A of increasing size. If the execution time increases faster than by the square of the size of A, or if the execution time jumps at some size of A, you can try and split the padded I into a number of sub-arrays and perform the calculations like that.
Otherwise, I don't see any obvious places where you could be losing lots of time. Well, maybe you could skip the reshape, by replacing B with A in your function (saves a little memory as well), and writing
A(:) = sum(H.*I,1)./sum(H,1);
You may also want to look into upgrading to a more recent version of Matlab - they've worked hard on improving performance.
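One way to split the work so that the large im2col matrix is never formed is to loop over the (2w+1)^2 window offsets instead of over the pixels, accumulating numerator and denominator on full-size arrays. This is a sketch of a different technique from the ones discussed above. It follows the border handling of the original loop (windows are clipped at the image edge, no padding), and the test image and parameter values are assumptions.

```matlab
% Deterministic stand-in image with values in [0, 1) (assumption).
A = mod(reshape(1:20*24, 20, 24) * 0.37, 1);
w = 2;  sigma_r = 0.2;
[nr, nc] = size(A);

% Reference: the per-pixel loop from the question.
B_ref = zeros(nr, nc);
for i = 1:nr
    for j = 1:nc
        I = A(max(i-w,1):min(i+w,nr), max(j-w,1):min(j+w,nc));
        F = exp(-0.5*((I - A(i,j))/sigma_r).^2);
        B_ref(i,j) = sum(F(:).*I(:)) / sum(F(:));
    end
end

% Offset loop: one pass over the whole image per window offset (di, dj).
num = zeros(nr, nc);
den = zeros(nr, nc);
for di = -w:w
    for dj = -w:w
        % Centre pixels whose neighbour at offset (di, dj) lies inside the image.
        ri = max(1, 1-di):min(nr, nr-di);
        cj = max(1, 1-dj):min(nc, nc-dj);
        S = A(ri+di, cj+dj);                         % neighbour values
        F = exp(-0.5*((S - A(ri,cj))/sigma_r).^2);   % intensity weights
        num(ri,cj) = num(ri,cj) + F.*S;
        den(ri,cj) = den(ri,cj) + F;
    end
end
B = num ./ den;      % den >= 1 because the zero offset always contributes weight 1
```

For w = 5 this is 121 passes over a 512x512 array, and peak memory stays at a few copies of A.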