Non Local Means Filter Optimization in MATLAB - matlab

I'm trying to write a Non-Local Means filter for an assignment. I've written the code in two ways, but the method I'd expect to be quicker is much slower than the other method.
Method 1: (This method is slower)
for i = 1:size(I,1)
tic
sprintf('%d/%d',i,size(I,1))
for j = 1:size(I,2)
w = exp((-abs(I-I(i,j))^2)/(h^2));
Z = sum(sum(w));
w = w/Z;
sumV = w .* I;
NL(i,j) = sum(sum(sumV));
end
toc
end
Method 2: (This method is faster)
for i = 1:size(I,1)
tic
sprintf('%d/%d',i,size(I,1))
for j = 1:size(I,2)
Z = 0;
for k = 1:size(I,1)
for l = 1:size(I,2)
w = exp((-abs(I(i,j)-I(k,l))^2)/(h^2));
Z = Z + w;
end
end
sumV = 0;
for k = 1:size(I,1)
for l = 1:size(I,2)
w = exp((-abs(I(i,j)-I(k,l))^2)/(h^2));
w = w/Z;
sumV = sumV + w * I(k,l);
end
end
NL(i,j) = sumV;
end
toc
end
I really thought that MATLAB would be optimized for Matrix operations. Is there reason it isn't in this code? The difference is pretty large. For a 512x512 image, with h = 0.05, one iteration of the outer loop takes 24-28 seconds for Method 1 and 10-12 seconds for Method 2.

The two methods are not doing the same thing. In Method 2, the term abs(I(i,j)-I(k,l)) in the w= expression is being squared, which is fine because the term is just a single numeric value.
However, in Method 1, the term abs(I-I(i,j)) is actually a matrix (The numeric value I(i,j) is being subtracted from every element in the matrix I, returning a matrix again). So, when this term is squared with the ^ operator, matrix multiplication is happening. My guess, based on Method 2, is that this is not what you intended. If instead, you want to square each element in that matrix, then use the .^ operator, as in abs(I-I(i,j)).^2
Matrix multiplication is a much more computation intensive operation, which is likely why Method 1 takes so much longer.

My guess is that you have not preassigned NL, that both methods are in the same function (or are scripts and you didn't clear NL between function runs). This would have slowed the first method by quite a bit.
Try the following: Create a function for both methods. Run each method once. Then use the profiler to see where each function spends most of its time.

A much faster implementation (Vectorized) could be achieved using im2col:
Create a Vector out of each neighborhood.
Using predefined indices calculate the distance between each patch.
Sum over the values and the weights using sum function.
This method will work with no loop at all.

Related

Newton's Method in Matlab

I am trying to apply Newton's method in Matlab, and I wrote a script:
syms f(x)
f(x) = x^2-4
g = diff(f)
x_1=1 %initial point
while f(['x_' num2str(i+1)])<0.001;% tolerance
for i=1:1000 %it should be stopped when tolerance is reached
['x_' num2str(i+1)]=['x_' num2str(i)]-f(['x_' num2str(i)])/g(['x_' num2str(i)])
end
end
I am getting this error:
Error: An array for multiple LHS assignment cannot contain M_STRING.
Newton's Method formula is x_(n+1)= x_n-f(x_n)/df(x_n) that goes until f(x_n) value gets closer to zero.
All of the main pieces are present in the code present. However, there are some problems.
The main problem is assuming string concatenation makes a variable in the workspace; it does not. The primary culprit is this line is this one
['x_' num2str(i+1)]=['x_' num2str(i)]-f(['x_' num2str(i)])/g(['x_' num2str(i)])
['x_' num2str(i+1)] is a string, and the MATLAB language does not support assignment to character arrays (which is my interpretation of An array for multiple LHS assignment cannot contain M_STRING.).
My answer, those others' may vary, would be
Convert the symbolic functions to handles via matlabFunction (since Netwon's Method is almost always a numerical implementation, symbolic functions should be dropper once the result of their usage is completed)
Replace the string creations with a double array for x (much, much cleaner, faster, and overall better code).
Put a if-test with a break in the for-loop versus the current construction.
My suggestions, implemented, would look like this:
syms f(x)
f(x) = x^2-4;
g = diff(f);
f = matlabFunction(f);
g = matlabFunction(g);
nmax = 1000;
tol = 0.001;% tolerance
x = zeros(1, nmax);
x(1) = 1; %initial point
fk = f(x(1));
for k = 1:nmax
if (abs(fk) < tol)
break;
end
x(k+1) = x(k) - f(x(k))/g(x(k));
fk = f(x(k));
end

Setting up a matrix of time and space for a function

I'm writing a program in matlab to observe how a function evolves in time. I'd like to set up a matrix that fills its first row with the initial function, and progressively fills more rows based off of a time derivative (that's dependent on the spatial derivative). The function is arbitrary, the program just needs to 'evolve' it. This is what I have so far:
xleft = -10;
xright = 10;
xsampling = 1000;
tmax = 1000;
tsampling = 1000;
dt = tmax/tsampling;
x = linspace(xleft,xright,xsampling);
t = linspace(0,tmax,tsampling);
funset = [exp(-(x.^2)/100);cos(x)]; %Test functions.
funsetvel = zeros(size(funset)); %The functions velocities.
spacetimevalue1 = zeros(length(x), length(t));
spacetimevalue2 = zeros(length(x), length(t));
% Loop that fills the first functions spacetime matrix.
for j=1:length(t)
funsetvel(1,j) = diff(funset(1,:),x,2);
spacetimevalue1(:,j) = funsetvel(1,j)*dt + funset(1,j);
end
This outputs the error, Difference order N must be a positive integer scalar. I'm unsure what this means. I'm fairly new to Matlab. I will exchange the Euler-method for another algorithm once I can actually get some output along the proper expectation. Aside from the error associated with taking the spatial derivative, do you all have any suggestions on how to evaluate this sort of process? Thank you.

Apply function to rolling window

Say I have a long list A of values (say of length 1000) for which I want to compute the std in pairs of 100, i.e. I want to compute std(A(1:100)), std(A(2:101)), std(A(3:102)), ..., std(A(901:1000)).
In Excel/VBA one can easily accomplish this by writing e.g. =STDEV(A1:A100) in one cell and then filling down in one go. Now my question is, how could one accomplish this efficiently in Matlab without having to use any expensive for-loops.
edit: Is it also possible to do this for a list of time series, e.g. when A has dimensions 1000 x 4 (i.e. 4 time series of length 1000)? The output matrix should then have dimensions 901 x 4.
Note: For the fastest solution see Luis Mendo's answer
So firstly using a for loop for this (especially if those are your actual dimensions) really isn't going to be expensive. Unless you're using a very old version of Matlab, the JIT compiler (together with pre-allocation of course) makes for loops inexpensive.
Secondly - have you tried for loops yet? Because you should really try out the naive implementation first before you start optimizing prematurely.
Thirdly - arrayfun can make this a one liner but it is basically just a for loop with extra overhead and very likely to be slower than a for loop if speed really is your concern.
Finally some code:
n = 1000;
A = rand(n,1);
l = 100;
for loop (hardly bulky, likely to be efficient):
S = zeros(n-l+1,1); %//Pre-allocation of memory like this is essential for efficiency!
for t = 1:(n-l+1)
S(t) = std(A(t:(t+l-1)));
end
A vectorized (memory in-efficient!) solution:
[X,Y] = meshgrid(1:l)
S = std(A(X+Y-1))
A probably better vectorized solution (and a one-liner) but still memory in-efficient:
S = std(A(bsxfun(#plus, 0:l-1, (1:l)')))
Note that with all these methods you can replace std with any function so long as it is applies itself to the columns of the matrix (which is the standard in Matlab)
Going 2D:
To go 2D we need to go 3D
n = 1000;
k = 4;
A = rand(n,k);
l = 100;
ind = bsxfun(#plus, permute(o:n:(k-1)*n, [3,1,2]), bsxfun(#plus, 0:l-1, (1:l)')); %'
S = squeeze(std(A(ind)));
M = squeeze(mean(A(ind)));
%// etc...
OR
[X,Y,Z] = meshgrid(1:l, 1:l, o:n:(k-1)*n);
ind = X+Y+Z-1;
S = squeeze(std(A(ind)))
M = squeeze(mean(A(ind)))
%// etc...
OR
ind = bsxfun(#plus, 0:l-1, (1:l)'); %'
for t = 1:k
S = std(A(ind));
M = mean(A(ind));
%// etc...
end
OR (taken from Luis Mendo's answer - note in his answer he shows a faster alternative to this simple loop)
S = zeros(n-l+1,k);
M = zeros(n-l+1,k);
for t = 1:(n-l+1)
S(t,:) = std(A(k:(k+l-1),:));
M(t,:) = mean(A(k:(k+l-1),:));
%// etc...
end
What you're doing is basically a filter operation.
If you have access to the image processing toolbox,
stdfilt(A,ones(101,1)) %# assumes that data series are in columns
will do the trick (no matter the dimensionality of A). Note that if you also have access to the parallel computing toolbox, you can let filter operations like these run on a GPU, although your problem might be too small to generate noticeable speedups.
To minimize number of operations, you can exploit the fact that the standard deviation can be computed as a difference involving second and first moments,
and moments over a rolling window are obtained efficiently with a cumulative sum (using cumsum):
A = randn(1000,4); %// random data
N = 100; %// window size
c = size(A,2);
A1 = [zeros(1,c); cumsum(A)];
A2 = [zeros(1,c); cumsum(A.^2)];
S = sqrt( (A2(1+N:end,:)-A2(1:end-N,:) ...
- (A1(1+N:end,:)-A1(1:end-N,:)).^2/N) / (N-1) ); %// result
Benchmarking
Here's a comparison against a loop based solution, using timeit. The loop approach is as in Dan's solution but adapted to the 2D case, exploting the fact that std works along each column in a vectorized manner.
%// File loop_approach.m
function S = loop_approach(A,N);
[n, p] = size(A);
S = zeros(n-N+1,p);
for k = 1:(n-N+1)
S(k,:) = std(A(k:(k+N-1),:));
end
%// File bsxfun_approach.m
function S = bsxfun_approach(A,N);
[n, p] = size(A);
ind = bsxfun(#plus, permute(0:n:(p-1)*n, [3,1,2]), bsxfun(#plus, 0:n-N, (1:N).')); %'
S = squeeze(std(A(ind)));
%// File cumsum_approach.m
function S = cumsum_approach(A,N);
c = size(A,2);
A1 = [zeros(1,c); cumsum(A)];
A2 = [zeros(1,c); cumsum(A.^2)];
S = sqrt( (A2(1+N:end,:)-A2(1:end-N,:) ...
- (A1(1+N:end,:)-A1(1:end-N,:)).^2/N) / (N-1) );
%// Benchmarking code
clear all
A = randn(1000,4); %// Or A = randn(1000,1);
N = 100;
t_loop = timeit(#() loop_approach(A,N));
t_bsxfun = timeit(#() bsxfun_approach(A,N));
t_cumsum = timeit(#() cumsum_approach(A,N));
disp(' ')
disp(['loop approach: ' num2str(t_loop)])
disp(['bsxfun approach: ' num2str(t_bsxfun)])
disp(['cumsum approach: ' num2str(t_cumsum)])
disp(' ')
disp(['bsxfun/loop gain factor: ' num2str(t_loop/t_bsxfun)])
disp(['cumsum/loop gain factor: ' num2str(t_loop/t_cumsum)])
Results
I'm using Matlab R2014b, Windows 7 64 bits, dual core processor, 4 GB RAM:
4-column case:
loop approach: 0.092035
bsxfun approach: 0.023535
cumsum approach: 0.0002338
bsxfun/loop gain factor: 3.9106
cumsum/loop gain factor: 393.6526
Single-column case:
loop approach: 0.085618
bsxfun approach: 0.0040495
cumsum approach: 8.3642e-05
bsxfun/loop gain factor: 21.1431
cumsum/loop gain factor: 1023.6236
So the cumsum-based approach seems to be the fastest: about 400 times faster than the loop in the 4-column case, and 1000 times faster in the single-column case.
Several functions can do the job efficiently in Matlab.
On one side, you can use functions such as colfilt or nlfilter, which performs computations on sliding blocks. colfilt is way more efficient than nlfilter, but can be used only if the order of the elements inside a block does not matter. Here is how to use it on your data:
S = colfilt(A, [100,1], 'sliding', #std);
or
S = nlfilter(A, [100,1], #std);
On your example, you can clearly see the difference of performance. But there is a trick : both functions pad the input array so that the output vector has the same size as the input array. To get only the relevant part of the output vector, you need to skip the first floor((100-1)/2) = 49 first elements, and take 1000-100+1 values.
S(50:end-50)
But there is also another solution, close to colfilt, more efficient. colfilt calls col2im to reshape the input vector into a matrix on which it applies the given function on each distinct column. This transforms your input vector of size [1000,1] into a matrix of size [100,901]. But colfilt pads the input array with 0 or 1, and you don't need it. So you can run colfilt without the padding step, then apply std on each column and this is easy because std applied on a matrix returns a row vector of the stds of the columns. Finally, transpose it to get a column vector if you want. In brief and in one line:
S = std(im2col(X,[100 1],'sliding')).';
Remark: if you want to apply a more complex function, see the code of colfilt, line 144 and 147 (for v2013b).
If your concern is speed of the for loop, you can greatly reduce the number of loop iteration by folding your vector into an array (using reshape) with the columns having the number of element you want to apply your function on.
This will let Matlab and the JIT perform the optimization (and in most case they do that way better than us) by calculating your function on each column of your array.
You then reshape an offseted version of your array and do the same. You will still need a loop but the number of iteration will only be l (so 100 in your example case), instead of n-l+1=901 in a classic for loop (one window at a time).
When you're done, you reshape the array of result in a vector, then you still need to calculate manually the last window, but overall it is still much faster.
Taking the same input notation than Dan:
n = 1000;
A = rand(n,1);
l = 100;
It will take this shape:
width = (n/l)-1 ; %// width of each line in the temporary result array
tmp = zeros( l , width ) ; %// preallocation never hurts
for k = 1:l
tmp(k,:) = std( reshape( A(k:end-l+k-1) , l , [] ) ) ; %// calculate your stat on the array (reshaped vector)
end
S2 = [tmp(:) ; std( A(end-l+1:end) ) ] ; %// "unfold" your results then add the last window calculation
If I tic ... toc the complete loop version and the folded one, I obtain this averaged results:
Elapsed time is 0.057190 seconds. %// windows by window FOR loop
Elapsed time is 0.016345 seconds. %// "Folded" FOR loop
I know tic/toc is not the way to go for perfect timing but I don't have the timeit function on my matlab version. Besides, the difference is significant enough to show that there is an improvement (albeit not precisely quantifiable by this method). I removed the first run of course and I checked that the results are consistent with different matrix sizes.
Now regarding your "one liner" request, I suggest your wrap this code into a function like so:
function out = foldfunction( func , vec , nPts )
n = length( vec ) ;
width = (n/nPts)-1 ;
tmp = zeros( nPts , width ) ;
for k = 1:nPts
tmp(k,:) = func( reshape( vec(k:end-nPts+k-1) , nPts , [] ) ) ;
end
out = [tmp(:) ; func( vec(end-nPts+1:end) ) ] ;
Which in your main code allows you to call it in one line:
S = foldfunction( #std , A , l ) ;
The other great benefit of this format, is that you can use the very same sub function for other statistical function. For example, if you want the "mean" of your windows, you call the same just changing the func argument:
S = foldfunction( #mean , A , l ) ;
Only restriction, as it is it only works for vector as input, but with a bit of rework it could be made to take arrays as input too.

Matlab Vectorize

I have a probability matrix(glcm) of size 256x256x20. I have reshaped the matrix to
65536x20, so that I can eliminate one loop (along the 3rd dimension).
I want to do the following calculation.
for y = 1:256
for x = 1:256
if (ismember((x + y),(2:2*256)))
p_xplusy((x+y),:) = p_xplusy((x+y),:) + glcm(((y-1)*256+x),:);
end
end
end
So the p_xplusy will be a 511x20 matrix which each element is the sum of the diagonal of nxn sub matrix (where n belongs to 1:256) of the original 256x256x20 matrix.
This code block is making my program inefficient and I want to vectorize this loop. Any help would be appreciated.
Since your if statement is just checking whether x+y is less than or equal to 256, just force it to always be, and remove excess loops:
for y = 1:256
for x = 1:256-y
p_xplusy((x+y),:) = p_xplusy((x+y),:) + glcm(((y-1)*256+x),:);
end
end
This should provide a noticeable speed-up to your code.
You can reduce the complexity from O(n^2) to O(2*n) and thus improve runtime efficiency -
N = 256;
for k1 = 1:N
idx_glcm = k1:N-1:N*(k1-1)+1;
p_xplusy(k1+1,:) = p_xplusy(k1+1,:) + sum(glcm(idx_glcm,:),1);
end
for k1 = 2:N
idx_glcm = k1*N:N-1:N*(N-1)+k1;
p_xplusy(N+k1,:) = p_xplusy(N+k1,:) + sum(glcm(idx_glcm,:),1);
end
Some quick runtime tests seem to confirm our efficiency theory too.

Integration via trapezoidal sums in MATLAB

I need help finding an integral of a function using trapezoidal sums.
The program should take successive trapezoidal sums with n = 1, 2, 3, ...
subintervals until there are two neighouring values of n that differ by less than a given tolerance. I want at least one FOR loop within a WHILE loop and I don't want to use the trapz function. The program takes four inputs:
f: A function handle for a function of x.
a: A real number.
b: A real number larger than a.
tolerance: A real number that is positive and very small
The problem I have is trying to implement the formula for trapezoidal sums which is
Δx/2[y0 + 2y1 + 2y2 + … + 2yn-1 + yn]
Here is my code, and the area I'm stuck in is the "sum" part within the FOR loop. I'm trying to sum up 2y2 + 2y3....2yn-1 since I already accounted for 2y1. I get an answer, but it isn't as accurate as it should be. For example, I get 6.071717974723753 instead of 6.101605982576467.
Thanks for any help!
function t=trapintegral(f,a,b,tol)
format compact; format long;
syms x;
oldtrap = ((b-a)/2)*(f(a)+f(b));
n = 2;
h = (b-a)/n;
newtrap = (h/2)*(f(a)+(2*f(a+h))+f(b));
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap;
for i=[3:n]
dx = (b-a)/n;
trapezoidsum = (dx/2)*(f(x) + (2*sum(f(a+(3:n-1))))+f(b));
newtrap = trapezoidsum;
end
end
t = newtrap;
end
The reason why this code isn't working is because there are two slight errors in your summation for the trapezoidal rule. What I am precisely referring to is this statement:
trapezoidsum = (dx/2)*(f(x) + (2*sum(f(a+(3:n-1))))+f(b));
Recall the equation for the trapezoidal integration rule:
Source: Wikipedia
For the first error, f(x) should be f(a) as you are including the starting point, and shouldn't be left as symbolic. In fact, you should simply get rid of the syms x statement as it is not useful in your script. a corresponds to x1 by consulting the above equation.
The next error is the second term. You actually need to multiply your index values (3:n-1) by dx. Also, this should actually go from (1:n-1) and I'll explain later. The equation above goes from 2 to N, but for our purposes, we are going to go from 1 to N-1 as you have your code set up like that.
Remember, in the trapezoidal rule, you are subdividing the finite interval into n pieces. The ith piece is defined as:
x_i = a + dx*i; ,
where i goes from 1 up to N-1. Note that this starts at 1 and not 3. The reason why is because the first piece is already taken into account by f(a), and we only count up to N-1 as piece N is accounted by f(b). For the equation, this goes from 2 to N and by modifying the code this way, this is precisely what we are doing in the end.
Therefore, your statement actually needs to be:
trapezoidsum = (dx/2)*(f(a) + (2*sum(f(a+dx*(1:n-1))))+f(b));
Try this and let me know if you get the right answer. FWIW, MATLAB already implements trapezoidal integration by doing trapz as #ADonda already pointed out. However, you need to properly structure what your x and y values are before you set this up. In other words, you would need to set up your dx before hand, then calculate your x points using the x_i equation that I specified above, then use these to generate your y values. You then use trapz to calculate the area. In other words:
dx = (b-a) / n;
x = a + dx*(0:n);
y = f(x);
trapezoidsum = trapz(x,y);
You can use the above code as a reference to see if you are implementing the trapezoidal rule correctly. Your implementation and using the above code should generate the same results. All you have to do is change the value of n, then run this code to generate the approximation of the area for different subdivisions underneath your curve.
Edit - August 17th, 2014
I figured out why your code isn't working. Here are the reasons why:
The for loop is unnecessary. Take a look at the for loop iteration. You have a loop going from i = [3:n] yet you don't reference the i variable at all in your loop. As such, you don't need this at all.
You are not computing successive intervals properly. What you need to do is when you compute the trapezoidal sum for the nth subinterval, you then increment this value of n, then compute the trapezoidal rule again. This value is not being incremented properly in your while loop, which is why your area is never improving.
You need to save the previous area inside the while loop, then when you compute the next area, that's when you determine whether or not the difference between the areas is less than the tolerance. We can also get rid of that code at the beginning that tries and compute the area for n = 2. That's not needed, as we can place this inside your while loop. As such, this is what your code should look like:
function t=trapintegral(f,a,b,tol)
format long; %// Got rid of format compact. Useless
%// n starts at 2 - Also removed syms x - Useless statement
n = 2;
newtrap = ((b-a)/2)*(f(a) + f(b)); %// Initialize
oldtrap = 0; %// Initialize to 0
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap; %//Save the old area from the previous iteration
dx = (b-a)/n; %//Compute width
%//Determine sum
trapezoidsum = (dx/2)*(f(a) + (2*sum(f(a+dx*(1:n-1))))+f(b));
newtrap = trapezoidsum; % //This is the new sum
n = n + 1; % //Go to the next value of n
end
t = newtrap;
end
By running your code, this is what I get:
trapezoidsum = trapintegral(#(x) (x+x.^2).^(1/3),1,4,0.00001)
trapezoidsum =
6.111776299189033
Caveat
Look at the way I defined your function. You must use element-by-element operations as the sum command inside the loop will be vectorized. Take a look at the ^ operations specifically. You need to prepend a dot to the operations. Once you do this, I get the right answer.
Edit #2 - August 18th, 2014
You said you want at least one for loop. This is highly inefficient, and whoever specified having one for loop in the code really doesn't know how MATLAB works. Nevertheless, you can use the for loop to accumulate the sum term. As such:
function t=trapintegral(f,a,b,tol)
format long; %// Got rid of format compact. Useless
%// n starts at 3 - Also removed syms x - Useless statement
n = 3;
%// Compute for n = 2 first, then proceed if we don't get a better
%// difference tolerance
newtrap = ((b-a)/2)*(f(a) + f(b)); %// Initialize
oldtrap = 0; %// Initialize to 0
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap; %//Save the old area from the previous iteration
dx = (b-a)/n; %//Compute width
%//Determine sum
%// Initialize
trapezoidsum = (dx/2)*(f(a) + f(b));
%// Accumulate sum terms
%// Note that we multiply each term by (dx/2), but because of the
%// factor of 2 for each of these terms, these cancel and we thus have dx
for n2 = 1 : n-1
trapezoidsum = trapezoidsum + dx*f(a + dx*n2);
end
newtrap = trapezoidsum; % //This is the new sum
n = n + 1; % //Go to the next value of n
end
t = newtrap;
end
Good luck!