Vectorizing multiple gaussian calculations - matlab

I am attempting to vectorize some of my code that adds the intensity of many Gaussian distributions over an image. I currently loop over the function 'gaussIt2D' for each Gaussian, which is vectorized for a single 2D gaussian:
windowSize=10;
imSize=[512,512];
%pointsR is an nx2 array of coordinates [x1,y1;x2,y2;...;xn,yn]
pointsR=rand(100,2)*511+1;
%sigmaR is the standard deviation of the gaussian being created
sigmaR = 1;
outputImage=zeros(imSize);
for n=1:size(pointsR,1)
rangeX = floor(pointsR(n,1)-windowSize):ceil(pointsR(n,1)+ windowSize);
rangeX = rangeX(rangeX > 0 & rangeX <= imSize(1));
rangeY = floor(pointsR(n,2)-windowSize):ceil(pointsR(n,2)+windowSize);
rangeY = rangeY(rangeY > 0 & rangeY <= imSize(2));
outputImage(rangeX,rangeY) = outputImage(rangeX,rangeY)+gaussIt2D(rangeX(1),rangeX(end),rangeY(1),rangeY(end),[sigmaR,pointsR(n,1),pointsR(n,2)]);
end
function [result] = gaussIt2D(xInit,xFinal,yInit,yFinal,sigma,xCenter,yCenter)
%Returns gaussian intenisty values for the region defined by [xInit:xFinal,yInit:yFinal] using the gaussian properties sigma,centerX,centerY
[gridX,gridY]=ndgrid(xInit:xFinal,yInit:yFinal);
result=exp( -( (gridX-xCenter).^2 + (gridY-yCenter).^2 ) ./ (2*sigma.^2) );
end
I am trying to further vectorize this process by allowing the gaussIt2D function to accept a vector of x and y values and a vector of x and y centers and do all of them together. My thought process so far has been to try to stack the grids and replicate the centers and do the element-wise gaussian calculations. For (a simplified) example if:
xInits = [1,2,3];
xFinals = [2,3,4];
xCenters = [1.2,2.8,3.1];
yInits = [1,2,3];
yFinals = [2,3,4];
yCenters = [1.5,2.4,3.6];
Then I was thinking to create grids and centers following the form:
gridX = [1,2
1,2
2,3
2,3
3,4
3,4]
xCenters = [1.2,1.2
1.2,1.2
2.8,2.8
2.8,2.8
3.1,3.1
3.1,3.1]
This could then be used in the same gaussian equation used in the original function. However, generating these arrays is tripping me up. What I have right now is:
function [result]=gaussIt2DVectorized(xInits,xFinals,yInits,yFinals,sigmas,xCenters,yCenters)
%Incomplete
%Returns gaussian intenisty values for the region defined by
%[xInit:xFinal,yInit:yFinal] using the values array:[sigma,centerX,centerY]
[gridX,gridY]=arrayfun('ndgrid',xInits:xFinals,yInits:yFinals);
xCenters = repelem(xCenters,numel(xInits(1):xFinals(1)), numel(yInits(1):yFinals(1)));
yCenters = repelem(yCenters,numel(xInits(1):xFinals(1)), numel(yInits(1):yFinals(1)));
result=exp( -( (gridX-xCenters).^2 + (gridY-yCenters).^2 ) ./ (2*sigmas^2) );
end
This doesn't actually work though, and the I also anticipate difficulty accounting for ranges (ie xInit:xFinal) of different lengths.
Any help, tips, or alternate methods would be appriciated.
Thanks.

Since you cannot be sure that the grids will all be the same size, it's probably best to store them in a cell array rather than stacking them in a matrix. With cells arrays, you can still run your calculation without looping using cellfun.
For example:
function [result] = gaussIt2D_better(xInits,xFinals,yInits,yFinals,sigmas,xCenters,yCenters)
[gridsX, gridsY] = arrayfun(#(x) ndgrid(xInits(x):xFinals(x), yInits(x):yFinals(x)),1:length(xInits),'UniformOutput',0);
f=#(gridX, gridY, xCenter, yCenter, sigma) exp( -( (gridX-xCenter).^2 + (gridY-yCenter).^2 ) ./ (2*sigma.^2) );
result=cellfun(f, gridsX, gridsY, num2cell(xCenters), num2cell(yCenters), num2cell(sigmas), 'UniformOutput',0);
end
Note that in this example, the value returned is a cell array with the same length as the input vectors, one result for each.

Related

Which Bins are occupied in a 3D histogram in MatLab

I got 3D data, from which I need to calculate properties.
To reduce computung I wanted to discretize the space and calculate the properties from the Bin instead of the individual data points and then reasign the propertie caclulated from the bin back to the datapoint.
I further only want to calculate the Bins which have points within them.
Since there is no 3D-binning function in MatLab, what i do is using histcounts over each dimension and then searching for the unique Bins that have been asigned to the data points.
a5pre=compositions(:,1);
a7pre=compositions(:,2);
a8pre=compositions(:,3);
%% BINNING
a5pre_edges=[0,linspace(0.005,0.995,19),1];
a5pre_val=(a5pre_edges(1:end-1) + a5pre_edges(2:end))/2;
a5pre_val(1)=0;
a5pre_val(end)=1;
a7pre_edges=[0,linspace(0.005,0.995,49),1];
a7pre_val=(a7pre_edges(1:end-1) + a7pre_edges(2:end))/2;
a7pre_val(1)=0;
a7pre_val(end)=1;
a8pre_edges=a7pre_edges;
a8pre_val=a7pre_val;
[~,~,bin1]=histcounts(a5pre,a5pre_edges);
[~,~,bin2]=histcounts(a7pre,a7pre_edges);
[~,~,bin3]=histcounts(a8pre,a8pre_edges);
bins=[bin1,bin2,bin3];
[A,~,C]=unique(bins,'rows','stable');
a5pre=a5pre_val(A(:,1));
a7pre=a7pre_val(A(:,2));
a8pre=a8pre_val(A(:,3));
It seems like that the unique function is pretty time consuming, so I was wondering if there is a faster way to do it, knowing that the line only can contain integer or so... or a totaly different.
Best regards
function [comps,C]=compo_binner(x,y,z,e1,e2,e3,v1,v2,v3)
C=NaN(length(x),1);
comps=NaN(length(x),3);
id=1;
for i=1:numel(x)
B_temp(1,1)=v1(sum(x(i)>e1));
B_temp(1,2)=v2(sum(y(i)>e2));
B_temp(1,3)=v3(sum(z(i)>e3));
C_id=sum(ismember(comps,B_temp),2)==3;
if sum(C_id)>0
C(i)=find(C_id);
else
comps(id,:)=B_temp;
id=id+1;
C_id=sum(ismember(comps,B_temp),2)==3;
C(i)=find(C_id>0);
end
end
comps(any(isnan(comps), 2), :) = [];
end
But its way slower than the histcount, unique version. Cant avoid find-function, and thats a function you sure want to avoid in a loop when its about speed...
If I understand correctly you want to compute a 3D histogram. If there's no built-in tool to compute one, it is simple to write one:
function [H, lindices] = histogram3d(data, n)
% histogram3d 3D histogram
% H = histogram3d(data, n) computes a 3D histogram from (x,y,z) values
% in the Nx3 array `data`. `n` is the number of bins between 0 and 1.
% It is assumed all values in `data` are between 0 and 1.
assert(size(data,2) == 3, 'data must be Nx3');
H = zeros(n, n, n);
indices = floor(data * n) + 1;
indices(indices > n) = n;
lindices = sub2ind(size(H), indices(:,1), indices(:,2), indices(:,3));
for ii = 1:size(data,1)
H(lindices(ii)) = H(lindices(ii)) + 1;
end
end
Now, given your compositions array, and binning each dimension into 20 bins, we get:
[H, indices] = histogram3d(compositions, 20);
idx = find(H);
[x,y,z] = ind2sub(size(H), idx);
reduced_compositions = ([x,y,z] - 0.5) / 20;
The bin centers for H are at ((1:20)-0.5)/20.
On my machine this runs in a fraction of a second for 5 million inputs points.
Now, for each composition(ii,:), you have a number indices(ii), which matches with another number idx[jj], corresponding to reduced_compositions(jj,:). One easy way to make the assignment of results is as follows:
H(H > 0) = 1:numel(idx);
indices = H(indices);
Now for each composition(ii,:), your closest match in the reduced set is reduced_compositions(indices(ii),:).

Something's wrong with my Logistic Regression?

I'm trying to verify if my implementation of Logistic Regression in Matlab is good. I'm doing so by comparing the results I get via my implementation with the results given by the built-in function mnrfit.
The dataset D,Y that I have is such that each row of D is an observation in R^2 and the labels in Y are either 0 or 1. Thus, D is a matrix of size (n,2), and Y is a vector of size (n,1)
Here's how I do my implementation:
I first normalize my data and augment it to include the offset :
d = 2; %dimension of data
M = mean(D) ;
centered = D-repmat(M,n,1) ;
devs = sqrt(sum(centered.^2)) ;
normalized = centered./repmat(devs,n,1) ;
X = [normalized,ones(n,1)];
I will be doing my calculations on X.
Second, I define the gradient and hessian of the likelihood of Y|X:
function grad = gradient(w)
grad = zeros(1,d+1) ;
for i=1:n
grad = grad + (Y(i)-sigma(w'*X(i,:)'))*X(i,:) ;
end
end
function hess = hessian(w)
hess = zeros(d+1,d+1) ;
for i=1:n
hess = hess - sigma(w'*X(i,:)')*sigma(-w'*X(i,:)')*X(i,:)'*X(i,:) ;
end
end
with sigma being a Matlab function encoding the sigmoid function z-->1/(1+exp(-z)).
Third, I run the Newton algorithm on gradient to find the roots of the gradient of the likelihood. I implemented it myself. It behaves as expected as the norm of the difference between the iterates goes to 0. I wrote it based on this script.
I verified that the gradient at the wOPT returned by my Newton implementation is null:
gradient(wOP)
ans =
1.0e-15 *
0.0139 -0.0021 0.2290
and that the hessian has strictly negative eigenvalues
eig(hessian(wOPT))
ans =
-7.5459
-0.0027
-0.0194
Here's the wOPT I get with my implementation:
wOPT =
-110.8873
28.9114
1.3706
the offset being the last element. In order to plot the decision line, I should convert the slope wOPT(1:2) using M and devs. So I set :
my_offset = wOPT(end);
my_slope = wOPT(1:d)'.*devs + M ;
and I get:
my_slope =
1.0e+03 *
-7.2109 0.8166
my_offset =
1.3706
Now, when I run B=mnrfit(D,Y+1), I get
B =
-1.3496
1.7052
-1.0238
The offset is stored in B(1).
I get very different values. I would like to know what I am doing wrong. I have some doubt about the normalization and 'un-normalization' process. But I'm not sure, may be I'm doing something else wrong.
Additional Info
When I tape :
B=mnrfit(normalized,Y+1)
I get
-1.3706
110.8873
-28.9114
which is a rearranged version of the opposite of my wOPT. It contains exactly the same elements.
It seems likely that my scaling back of the learnt parameters is wrong. Otherwise, it would have given the same as B=mnrfit(D,Y+1)

Apply function to rolling window

Say I have a long list A of values (say of length 1000) for which I want to compute the std in pairs of 100, i.e. I want to compute std(A(1:100)), std(A(2:101)), std(A(3:102)), ..., std(A(901:1000)).
In Excel/VBA one can easily accomplish this by writing e.g. =STDEV(A1:A100) in one cell and then filling down in one go. Now my question is, how could one accomplish this efficiently in Matlab without having to use any expensive for-loops.
edit: Is it also possible to do this for a list of time series, e.g. when A has dimensions 1000 x 4 (i.e. 4 time series of length 1000)? The output matrix should then have dimensions 901 x 4.
Note: For the fastest solution see Luis Mendo's answer
So firstly using a for loop for this (especially if those are your actual dimensions) really isn't going to be expensive. Unless you're using a very old version of Matlab, the JIT compiler (together with pre-allocation of course) makes for loops inexpensive.
Secondly - have you tried for loops yet? Because you should really try out the naive implementation first before you start optimizing prematurely.
Thirdly - arrayfun can make this a one liner but it is basically just a for loop with extra overhead and very likely to be slower than a for loop if speed really is your concern.
Finally some code:
n = 1000;
A = rand(n,1);
l = 100;
for loop (hardly bulky, likely to be efficient):
S = zeros(n-l+1,1); %//Pre-allocation of memory like this is essential for efficiency!
for t = 1:(n-l+1)
S(t) = std(A(t:(t+l-1)));
end
A vectorized (memory in-efficient!) solution:
[X,Y] = meshgrid(1:l)
S = std(A(X+Y-1))
A probably better vectorized solution (and a one-liner) but still memory in-efficient:
S = std(A(bsxfun(#plus, 0:l-1, (1:l)')))
Note that with all these methods you can replace std with any function so long as it is applies itself to the columns of the matrix (which is the standard in Matlab)
Going 2D:
To go 2D we need to go 3D
n = 1000;
k = 4;
A = rand(n,k);
l = 100;
ind = bsxfun(#plus, permute(o:n:(k-1)*n, [3,1,2]), bsxfun(#plus, 0:l-1, (1:l)')); %'
S = squeeze(std(A(ind)));
M = squeeze(mean(A(ind)));
%// etc...
OR
[X,Y,Z] = meshgrid(1:l, 1:l, o:n:(k-1)*n);
ind = X+Y+Z-1;
S = squeeze(std(A(ind)))
M = squeeze(mean(A(ind)))
%// etc...
OR
ind = bsxfun(#plus, 0:l-1, (1:l)'); %'
for t = 1:k
S = std(A(ind));
M = mean(A(ind));
%// etc...
end
OR (taken from Luis Mendo's answer - note in his answer he shows a faster alternative to this simple loop)
S = zeros(n-l+1,k);
M = zeros(n-l+1,k);
for t = 1:(n-l+1)
S(t,:) = std(A(k:(k+l-1),:));
M(t,:) = mean(A(k:(k+l-1),:));
%// etc...
end
What you're doing is basically a filter operation.
If you have access to the image processing toolbox,
stdfilt(A,ones(101,1)) %# assumes that data series are in columns
will do the trick (no matter the dimensionality of A). Note that if you also have access to the parallel computing toolbox, you can let filter operations like these run on a GPU, although your problem might be too small to generate noticeable speedups.
To minimize number of operations, you can exploit the fact that the standard deviation can be computed as a difference involving second and first moments,
and moments over a rolling window are obtained efficiently with a cumulative sum (using cumsum):
A = randn(1000,4); %// random data
N = 100; %// window size
c = size(A,2);
A1 = [zeros(1,c); cumsum(A)];
A2 = [zeros(1,c); cumsum(A.^2)];
S = sqrt( (A2(1+N:end,:)-A2(1:end-N,:) ...
- (A1(1+N:end,:)-A1(1:end-N,:)).^2/N) / (N-1) ); %// result
Benchmarking
Here's a comparison against a loop based solution, using timeit. The loop approach is as in Dan's solution but adapted to the 2D case, exploting the fact that std works along each column in a vectorized manner.
%// File loop_approach.m
function S = loop_approach(A,N);
[n, p] = size(A);
S = zeros(n-N+1,p);
for k = 1:(n-N+1)
S(k,:) = std(A(k:(k+N-1),:));
end
%// File bsxfun_approach.m
function S = bsxfun_approach(A,N);
[n, p] = size(A);
ind = bsxfun(#plus, permute(0:n:(p-1)*n, [3,1,2]), bsxfun(#plus, 0:n-N, (1:N).')); %'
S = squeeze(std(A(ind)));
%// File cumsum_approach.m
function S = cumsum_approach(A,N);
c = size(A,2);
A1 = [zeros(1,c); cumsum(A)];
A2 = [zeros(1,c); cumsum(A.^2)];
S = sqrt( (A2(1+N:end,:)-A2(1:end-N,:) ...
- (A1(1+N:end,:)-A1(1:end-N,:)).^2/N) / (N-1) );
%// Benchmarking code
clear all
A = randn(1000,4); %// Or A = randn(1000,1);
N = 100;
t_loop = timeit(#() loop_approach(A,N));
t_bsxfun = timeit(#() bsxfun_approach(A,N));
t_cumsum = timeit(#() cumsum_approach(A,N));
disp(' ')
disp(['loop approach: ' num2str(t_loop)])
disp(['bsxfun approach: ' num2str(t_bsxfun)])
disp(['cumsum approach: ' num2str(t_cumsum)])
disp(' ')
disp(['bsxfun/loop gain factor: ' num2str(t_loop/t_bsxfun)])
disp(['cumsum/loop gain factor: ' num2str(t_loop/t_cumsum)])
Results
I'm using Matlab R2014b, Windows 7 64 bits, dual core processor, 4 GB RAM:
4-column case:
loop approach: 0.092035
bsxfun approach: 0.023535
cumsum approach: 0.0002338
bsxfun/loop gain factor: 3.9106
cumsum/loop gain factor: 393.6526
Single-column case:
loop approach: 0.085618
bsxfun approach: 0.0040495
cumsum approach: 8.3642e-05
bsxfun/loop gain factor: 21.1431
cumsum/loop gain factor: 1023.6236
So the cumsum-based approach seems to be the fastest: about 400 times faster than the loop in the 4-column case, and 1000 times faster in the single-column case.
Several functions can do the job efficiently in Matlab.
On one side, you can use functions such as colfilt or nlfilter, which performs computations on sliding blocks. colfilt is way more efficient than nlfilter, but can be used only if the order of the elements inside a block does not matter. Here is how to use it on your data:
S = colfilt(A, [100,1], 'sliding', #std);
or
S = nlfilter(A, [100,1], #std);
On your example, you can clearly see the difference of performance. But there is a trick : both functions pad the input array so that the output vector has the same size as the input array. To get only the relevant part of the output vector, you need to skip the first floor((100-1)/2) = 49 first elements, and take 1000-100+1 values.
S(50:end-50)
But there is also another solution, close to colfilt, more efficient. colfilt calls col2im to reshape the input vector into a matrix on which it applies the given function on each distinct column. This transforms your input vector of size [1000,1] into a matrix of size [100,901]. But colfilt pads the input array with 0 or 1, and you don't need it. So you can run colfilt without the padding step, then apply std on each column and this is easy because std applied on a matrix returns a row vector of the stds of the columns. Finally, transpose it to get a column vector if you want. In brief and in one line:
S = std(im2col(X,[100 1],'sliding')).';
Remark: if you want to apply a more complex function, see the code of colfilt, line 144 and 147 (for v2013b).
If your concern is speed of the for loop, you can greatly reduce the number of loop iteration by folding your vector into an array (using reshape) with the columns having the number of element you want to apply your function on.
This will let Matlab and the JIT perform the optimization (and in most case they do that way better than us) by calculating your function on each column of your array.
You then reshape an offseted version of your array and do the same. You will still need a loop but the number of iteration will only be l (so 100 in your example case), instead of n-l+1=901 in a classic for loop (one window at a time).
When you're done, you reshape the array of result in a vector, then you still need to calculate manually the last window, but overall it is still much faster.
Taking the same input notation than Dan:
n = 1000;
A = rand(n,1);
l = 100;
It will take this shape:
width = (n/l)-1 ; %// width of each line in the temporary result array
tmp = zeros( l , width ) ; %// preallocation never hurts
for k = 1:l
tmp(k,:) = std( reshape( A(k:end-l+k-1) , l , [] ) ) ; %// calculate your stat on the array (reshaped vector)
end
S2 = [tmp(:) ; std( A(end-l+1:end) ) ] ; %// "unfold" your results then add the last window calculation
If I tic ... toc the complete loop version and the folded one, I obtain this averaged results:
Elapsed time is 0.057190 seconds. %// windows by window FOR loop
Elapsed time is 0.016345 seconds. %// "Folded" FOR loop
I know tic/toc is not the way to go for perfect timing but I don't have the timeit function on my matlab version. Besides, the difference is significant enough to show that there is an improvement (albeit not precisely quantifiable by this method). I removed the first run of course and I checked that the results are consistent with different matrix sizes.
Now regarding your "one liner" request, I suggest your wrap this code into a function like so:
function out = foldfunction( func , vec , nPts )
n = length( vec ) ;
width = (n/nPts)-1 ;
tmp = zeros( nPts , width ) ;
for k = 1:nPts
tmp(k,:) = func( reshape( vec(k:end-nPts+k-1) , nPts , [] ) ) ;
end
out = [tmp(:) ; func( vec(end-nPts+1:end) ) ] ;
Which in your main code allows you to call it in one line:
S = foldfunction( #std , A , l ) ;
The other great benefit of this format, is that you can use the very same sub function for other statistical function. For example, if you want the "mean" of your windows, you call the same just changing the func argument:
S = foldfunction( #mean , A , l ) ;
Only restriction, as it is it only works for vector as input, but with a bit of rework it could be made to take arrays as input too.

how to write this script in matlab

I've just started learning Matlab(a few days ago) and I have the following homework that I just don't know how to code it:
Write a script that creates a graphic using the positions of the roots of the polynomial function: p(z)=z^n-1 in the complex plan for various values for n-natural number(nth roots of unity)
so I am assuming the function you are using is p(z) = (z^n) - 1 where n is some integer value.
you can the find the roots of this equation by simply plugging into the roots function. The array passed to roots are the coefficients of the input function.
Example
f = 5x^2-2x-6 -> Coefficients are [5,-2,-6]
To get roots enter roots([5,-2,-6]). This will find all points at which x will cause the function to be equal to 0.
so in your case you would enter funcRoots = roots([1,zeros(1,n-1),-1]);
You can then plot these values however you want, but a simple plot like plot(funcRoots) would likely suffice.
To do in a loop use the following. Mind you, if there are multiple roots that are the same, there may be some overlap so you may not be able to see certain values.
minN = 1;
maxN = 10;
nVals = minN:maxN;
figure; hold on;
colors = hsv(numel(nVals));
legendLabels = cell(1,numel(nVals));
iter = 1;
markers = {'x','o','s','+','d','^','v'};
for n = nVals
funcRoots = roots([1,zeros(1,n-1),-1]);
plot(real(funcRoots),imag(funcRoots),...
markers{mod(iter,numel(markers))+1},...
'Color',colors(iter,:),'MarkerSize',10,'LineWidth',2)
legendLabels{iter} = [num2str(n),'-order'];
iter = iter+1;
end
hold off;
xlabel('Real Value')
ylabel('Imaginary Value')
title('Unity Roots')
axis([-1,1,-1,1])
legend(legendLabels)

Mutual Information of MATLAB Matrix

I have a square matrix that represents the frequency counts of co-occurrences in a data set. In other words, the rows represent all possible observations of feature 1, and the columns are the possible observations of feature 2. The number in cell (x, y) is the number of times feature 1 was observed to be x at the same time feature 2 was y.
I want to calculate the mutual information contained in this matrix. MATLAB has a built-in information function, but it takes 2 arguments, one for x and one for y. How would I manipulate this matrix to get the arguments it expects?
Alternatively, I wrote my own mutual information function that takes a matrix, but I'm unsure about its accuracy. Does it look right?
function [mutualinfo] = mutualInformation(counts)
total = sum(counts(:));
pX = sum(counts, 1) ./ total;
pY = sum(counts) ./ total;
pXY = counts ./ total;
[h, w] = size(counts);
mutualinfo = 0;
for row = 1:h
for col = 1:w
mutualinfo = mutualinfo + pXY(row, col) * log(pXY(row, col) / (pX(row)*pY(col)));
end;
end;
end
I don't know of any built-in mutual information functions in MATLAB. Perhaps you got a hold of one of the submissions from the MathWorks File Exchange or some other third-party developer code?
I think there may be something wrong with how you are computing pX and pY. Plus, you can vectorize your operations instead of using for loops. Here's another version of your function to try out:
function mutualInfo = mutualInformation(counts)
pXY = counts./sum(counts(:));
pX = sum(pXY,2);
pY = sum(pXY,1);
mutualInfo = pXY.*log(pXY./(pX*pY));
mutualInfo = sum(mutualInfo(:));
end