calculate histogram of 2 variables - matlab

I have a vector containining the speed of 200 walks:
a = 50;
b = 100;
speed = (b-a).*rand(200,1) + a;
and another vector contaning the number of steps for each walk:
a = 8;
b = 100;
steps = (b-a).*rand(200,1) + a;
I would like to create a histogram plot with on the x axis the speeds and on the y axes the sum of the steps of each speed.
What I do is the following but I guess there is a more elegant way to do this:
unique_speed = unique(speed);
y_unique_speed = zeros(size(unique_speed));
for i = 1 : numel(unique_speed)
speed_idx = unique_speed(i);
idx = speed==speed_idx ;
y_unique_speed (i) = sum(steps (idx));
end

First you need to discretize your speed variable. Unlike in the other answer, the following allows you to choose arbitrary step size, e.g. I've chosen 1.5. Note that the last bin edge should be strictly more than the maximum data point, hence I used max(speed)+binstep. You can then use histc to determine which bin each pairs falls into, and accumarray to determine total number of steps in each bin. Finally, use bar to plot.
binstep = 1.5;
binranges = (min(speed):binstep:max(speed)+binstep)';
[~, ind] = histc(speed, binranges);
bincounts = accumarray(ind, steps, size(binranges));
hFig = figure(); axh = axes('Parent', hFig); hold(axh, 'all'); grid(axh, 'on');
bar(axh, binranges, bincounts); axis(axh, 'tight');

First, bin the speed data to discrete values:
sspeed = ceil(speed);
and then accumulate the various step sizes to each bin:
numsteps = accumarray(sspeed, steps);
plot(numsteps)

Related

average bins along a dimension of a nd array in matlab

To compute the mean of every bins along a dimension of a nd array in matlab, for example, average every 10 elements along dim 4 of a 4d array
x = reshape(1:30*30*20*300,30,30,20,300);
n = 10;
m = size(x,4)/10;
y = nan(30,30,20,m);
for ii = 1 : m
y(:,:,:,ii) = mean(x(:,:,:,(1:n)+(ii-1)*n),4);
end
It looks a bit silly. I think there must be better ways to average the bins?
Besides, is it possible to make the script applicable to general cases, namely, arbitray ndims of array and along an arbitray dim to average?
For the second part of your question you can use this:
x = reshape(1:30*30*20*300,30,30,20,300);
dim = 4;
n = 10;
m = size(x,dim)/10;
y = nan(30,30,20,m);
idx1 = repmat({':'},1,ndims(x));
idx2 = repmat({':'},1,ndims(x));
for ii = 1 : m
idx1{dim} = ii;
idx2{dim} = (1:n)+(ii-1)*n;
y(idx1{:}) = mean(x(idx2{:}),dim);
end
For the first part of the question here is an alternative using cumsum and diff, but it may not be better then the loop solution:
function y = slicedmean(x,slice_size,dim)
s = cumsum(x,dim);
idx1 = repmat({':'},1,ndims(x));
idx2 = repmat({':'},1,ndims(x));
idx1{dim} = slice_size;
idx2{dim} = slice_size:slice_size:size(x,dim);
y = cat(dim,s(idx1{:}),diff(s(idx2{:}),[],dim))/slice_size;
end
Here is a generic solution, using the accumarray function. I haven't tested how fast it is. There might be some room for improvement though.
Basically, accumarray groups the value in x following a matrix of customized index for your question
x = reshape(1:30*30*20*300,30,30,20,300);
s = size(x);
% parameters for averaging
dimAv = 4;
n = 10;
% get linear index
ix = (1:numel(x))';
% transform them to a matrix of index per dimension
% this is a customized version of ind2sub
pcum = [1 cumprod(s(1:end-1))];
sub = zeros(numel(ix),numel(s));
for i = numel(s):-1:1,
ixtmp = rem(ix-1, pcum(i)) + 1;
sub(:,i) = (ix - ixtmp)/pcum(i) + 1;
ix = ixtmp;
end
% correct index for the given dimension
sub(:,dimAv) = floor((sub(:,dimAv)-1)/n)+1;
% run the accumarray to compute the average
sout = s;
sout(dimAv) = ceil(sout(dimAv)/n);
y = accumarray(sub,x(:), sout, #mean);
If you need a faster and memory efficient operation, you'll have to write your own mex function. It shouldn't be so difficult, I think !

Solve for independent variable between data points in MATLAB

I have many sets of data over the same time period, with a timestep of 300 seconds. Sets that terminate before the end of the observation period (here I've truncated it to 0 to 3000 seconds) have NaNs in the remaining spaces:
x = [0;300;600;900;1200;1500;1800;2100;2400;2700;3000];
y(:,1) = [4.65;3.67;2.92;2.39;2.02;1.67;1.36;1.07;NaN;NaN;NaN];
y(:,2) = [4.65;2.65;2.33;2.18;2.03;1.89;1.75;1.61;1.48;1.36;1.24];
y(:,3) = [4.65;2.73;1.99;1.49;1.05;NaN;NaN;NaN;NaN;NaN;NaN];
I would like to know at what time each dataset would reach the point where y is equal to a specific value, in this case y = 2.5
I first tried finding the nearest y value to 2.5, and then using the associated time, but this isn't very accurate (the dots should all fall on the same horizontal line):
ybreak = 2.5;
for ii = 1:3
[~, index] = min(abs(y(:,ii)-ybreak));
yclosest(ii) = y(index,ii);
xbreak(ii) = x(index);
end
I then tried doing a linear interpolation between data points, and then solving for x at y=2.5, but wasn't able to make this work:
First I removed the NaNs (which it seems like there must be a simpler way of doing?):
for ii = 1:3
NaNs(:,ii) = isnan(y(:,ii));
for jj = 1:length(x);
if NaNs(jj,ii) == 0;
ycopy(jj,ii) = y(jj,ii);
end
end
end
Then tried fitting:
for ii = 1:3
f(ii) = fit(x(1:length(ycopy(:,ii))),ycopy(:,ii),'linearinterp');
end
And get the following error message:
Error using cfit/subsasgn (line 7)
Can't assign to an empty FIT.
When I try fitting outside the loop (for just one dataset), it works fine:
f = fit(x(1:length(ycopy(:,1))),ycopy(:,1),'linearinterp');
f =
Linear interpolant:
f(x) = piecewise polynomial computed from p
Coefficients:
p = coefficient structure
But I then still can't solve f(x)=2.5 to find the time at which y=2.5
syms x;
xbreak = solve(f(x) == 2.5,x);
Error using cfit/subsref>iParenthesesReference (line 45)
Cannot evaluate CFIT model for some reason.
Error in cfit/subsref (line 15)
out = iParenthesesReference( obj, currsubs );
Any advice or thoughts on other approaches to this would be much appreciated. I need to be able to do it for many many datasets, all of which have different numbers of NaN values.
As you mention y=2.5 is not in your data set so the value of x which corresponds to this depends on the interpolation method you use. For linear interpolation, you could use something like the following
x = [0;300;600;900;1200;1500;1800;2100;2400;2700;3000];
y(:,1) = [4.65;3.67;2.92;2.39;2.02;1.67;1.36;1.07;NaN;NaN;NaN];
y(:,2) = [4.65;2.65;2.33;2.18;2.03;1.89;1.75;1.61;1.48;1.36;1.24];
y(:,3) = [4.65;2.73;1.99;1.49;1.05;NaN;NaN;NaN;NaN;NaN;NaN];
N = size(y, 2);
x_interp = NaN(N, 1);
for i = 1:N
idx = find(y(:,i) >= 2.5, 1, 'last');
x_interp(i) = interp1(y(idx:idx+1, i), x(idx:idx+1), 2.5);
end
figure
hold on
plot(x, y)
scatter(x_interp, repmat(2.5, N, 1))
hold off
It's worth keeping in mind that the above code is assuming your data is monotonically decreasing (as your data is), but this solution could be adapted for monotonically increasing as well.

MeanShift Clustering on dataset

I have a numeric dataset and I want to cluster data with a non-parametric algorithm. Basically, I would like to cluster without specifying the number of clusters for the input. I am using this code that I accessed through the MathWorks File Exchange network which implements the Mean Shift algorithm. However, I don't Know how to adapt my data to this code as my dataset has dimensions 516 x 19.
function [clustCent,data2cluster,cluster2dataCell] =MeanShiftCluster(dataPts,bandWidth,plotFlag)
%UNTITLED2 Summary of this function goes here
% Detailed explanation goes here
%perform MeanShift Clustering of data using a flat kernel
%
% ---INPUT---
% dataPts - input data, (numDim x numPts)
% bandWidth - is bandwidth parameter (scalar)
% plotFlag - display output if 2 or 3 D (logical)
% ---OUTPUT---
% clustCent - is locations of cluster centers (numDim x numClust)
% data2cluster - for every data point which cluster it belongs to (numPts)
% cluster2dataCell - for every cluster which points are in it (numClust)
%
% Bryan Feldman 02/24/06
% MeanShift first appears in
% K. Funkunaga and L.D. Hosteler, "The Estimation of the Gradient of a
% Density Function, with Applications in Pattern Recognition"
%*** Check input ****
if nargin < 2
error('no bandwidth specified')
end
if nargin < 3
plotFlag = true;
plotFlag = false;
end
%**** Initialize stuff ***
%[numPts,numDim] = size(dataPts);
[numDim,numPts] = size(dataPts);
numClust = 0;
bandSq = bandWidth^2;
initPtInds = 1:numPts
maxPos = max(dataPts,[],2); %biggest size in each dimension
minPos = min(dataPts,[],2); %smallest size in each dimension
boundBox = maxPos-minPos; %bounding box size
sizeSpace = norm(boundBox); %indicator of size of data space
stopThresh = 1e-3*bandWidth; %when mean has converged
clustCent = []; %center of clust
beenVisitedFlag = zeros(1,numPts,'uint8'); %track if a points been seen already
numInitPts = numPts %number of points to posibaly use as initilization points
clusterVotes = zeros(1,numPts,'uint16'); %used to resolve conflicts on cluster membership
while numInitPts
tempInd = ceil( (numInitPts-1e-6)*rand) %pick a random seed point
stInd = initPtInds(tempInd) %use this point as start of mean
myMean = dataPts(:,stInd); % intilize mean to this points location
myMembers = []; % points that will get added to this cluster
thisClusterVotes = zeros(1,numPts,'uint16'); %used to resolve conflicts on cluster membership
while 1 %loop untill convergence
sqDistToAll = sum((repmat(myMean,1,numPts) - dataPts).^2); %dist squared from mean to all points still active
inInds = find(sqDistToAll < bandSq); %points within bandWidth
thisClusterVotes(inInds) = thisClusterVotes(inInds)+1; %add a vote for all the in points belonging to this cluster
myOldMean = myMean; %save the old mean
myMean = mean(dataPts(:,inInds),2); %compute the new mean
myMembers = [myMembers inInds]; %add any point within bandWidth to the cluster
beenVisitedFlag(myMembers) = 1; %mark that these points have been visited
%*** plot stuff ****
if plotFlag
figure(12345),clf,hold on
if numDim == 2
plot(dataPts(1,:),dataPts(2,:),'.')
plot(dataPts(1,myMembers),dataPts(2,myMembers),'ys')
plot(myMean(1),myMean(2),'go')
plot(myOldMean(1),myOldMean(2),'rd')
pause
end
end
%**** if mean doesnt move much stop this cluster ***
if norm(myMean-myOldMean) < stopThresh
%check for merge posibilities
mergeWith = 0;
for cN = 1:numClust
distToOther = norm(myMean-clustCent(:,cN)); %distance from posible new clust max to old clust max
if distToOther < bandWidth/2 %if its within bandwidth/2 merge new and old
mergeWith = cN;
break;
end
end
if mergeWith > 0 % something to merge
clustCent(:,mergeWith) = 0.5*(myMean+clustCent(:,mergeWith)); %record the max as the mean of the two merged (I know biased twoards new ones)
%clustMembsCell{mergeWith} = unique([clustMembsCell{mergeWith} myMembers]); %record which points inside
clusterVotes(mergeWith,:) = clusterVotes(mergeWith,:) + thisClusterVotes; %add these votes to the merged cluster
else %its a new cluster
numClust = numClust+1 %increment clusters
clustCent(:,numClust) = myMean; %record the mean
%clustMembsCell{numClust} = myMembers; %store my members
clusterVotes(numClust,:) = thisClusterVotes;
end
break;
end
end
initPtInds = find(beenVisitedFlag == 0); %we can initialize with any of the points not yet visited
numInitPts = length(initPtInds); %number of active points in set
end
[val,data2cluster] = max(clusterVotes,[],1); %a point belongs to the cluster with the most votes
%*** If they want the cluster2data cell find it for them
if nargout > 2
cluster2dataCell = cell(numClust,1);
for cN = 1:numClust
myMembers = find(data2cluster == cN);
cluster2dataCell{cN} = myMembers;
end
end
This is the test code I am using to try and get the Mean Shift program to work:
clear
profile on
nPtsPerClust = 250;
nClust = 3;
totalNumPts = nPtsPerClust*nClust;
m(:,1) = [1 1];
m(:,2) = [-1 -1];
m(:,3) = [1 -1];
var = .6;
bandwidth = .75;
clustMed = [];
%clustCent;
x = var*randn(2,nPtsPerClust*nClust);
%*** build the point set
for i = 1:nClust
x(:,1+(i-1)*nPtsPerClust:(i)*nPtsPerClust) = x(:,1+(i-1)*nPtsPerClust:(i)*nPtsPerClust) + repmat(m(:,i),1,nPtsPerClust);
end
tic
[clustCent,point2cluster,clustMembsCell] = MeanShiftCluster(x,bandwidth);
toc
numClust = length(clustMembsCell)
figure(10),clf,hold on
cVec = 'bgrcmykbgrcmykbgrcmykbgrcmyk';%, cVec = [cVec cVec];
for k = 1:min(numClust,length(cVec))
myMembers = clustMembsCell{k};
myClustCen = clustCent(:,k);
plot(x(1,myMembers),x(2,myMembers),[cVec(k) '.'])
plot(myClustCen(1),myClustCen(2),'o','MarkerEdgeColor','k','MarkerFaceColor',cVec(k), 'MarkerSize',10)
end
title(['no shifting, numClust:' int2str(numClust)])
The test script generates random data X. In my case. I want to use the matrix D of size 516 x 19 but I am not sure how to adapt my data to this function. The function is returning results that are not agreeing with my understanding of the algorithm.
Does anyone know how to do this?

How do I index codistributed arrays in a spmd block

I am doing a very large calculation (atmospheric absorption) that has a lot of individual narrow peaks that all get added up at the end. For each peak, I have pre-calculated the range over which the value of the peak shape function is above my chosen threshold, and I am then going line by line and adding the peaks to my spectrum. A minimum example is given below:
X = 1:1e7;
K = numel(a); % count the number of peaks I have.
spectrum = zeros(size(X));
for k = 1:K
grid = X >= rng(1,k) & X <= rng(2,k);
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(k),b(k),c(k)]);
end
Here, each peak has some parameters that define the position and shape (a,b,c), and a range over which to do the calculation (rng). This works great, and on my machine it benchmarks at around 220 seconds to do a complete data set. However, I have a 4 core machine and I would eventually like to run this on a cluster, so I'd like to parallelize it and make it scaleable.
Because each loop relies on the results of the previous iteration, I cannot use parfor, so I am taking my first step into learning how to use spmd blocks. My first try looked like this:
X = 1:1e7;
cores = matlabpool('size');
K = numel(a);
spectrum = zeros(size(X),cores);
spmd
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid,labindex) = spectrum(grid,labindex) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k))]);
end
end
finalSpectrum = sum(spectrum,2);
This almost works. The program crashes at the last line because spectrum is of type Composite, and the documentation for 2013a is spotty on how to turn Composite data into a matrix (cell2mat does not work). This also does not scale well because the more cores I have, the larger the matrix is, and that large matrix has to get copied to each worker, which then ignores most of the data. Question 1: how do I turn a Composite data type into a useable array?
The second thing I tried was to use a codistributed array.
spmd
spectrum = codistributed.zeros(K,cores);
disp(size(getLocalPart(spectrum)))
end
This tells me that each worker has a single vector of size [K 1], which I believe is what I want, but when I try to then meld the above methods
spmd
spectrum = codistributed.zeros(K,cores);
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k))]); end
finalSpectrum = gather(spectrum);
end
finalSpectrum = sum(finalSpectrum,2);
I get Matrix dimensions must agree errors. Since it's in a parallel block, I can't use my normal debugging crutch of stepping through the loop and seeing what the size of each block is at each point to see what's going on. Question 2: what is the proper way to index into and out of a codistributed array in an spmd block?
Regarding question#1, the Composite variable in the client basically refers to a non-distributed variant array stored on the workers. You can access the array from each worker by {}-indexing using its corresponding labindex (e.g: spectrum{1}, spectrum{2}, ..).
For your code that would be: finalSpectrum = sum(cat(2,spectrum{:}), 2);
Now I tried this problem myself using random data. Below are three implementations to compare (see here to understand the difference between distributed and nondistributed arrays). First we start with the common data:
len = 100; % spectrum length
K = 10; % number of peaks
X = 1:len;
% random position and shape parameters
a = rand(1,K); b = rand(1,K); c = rand(1,K);
% random peak ranges (lower/upper thresholds)
ranges = sort(randi([1 len], [2 K]));
% dummy peakfn() function
fcn = #(x,a,b,c) x+a+b+c;
% prepare a pool of MATLAB workers
matlabpool open
1) Serial for-loop:
spectrum = zeros(size(X));
for i=1:size(ranges,2)
r = ranges(:,i);
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + fcn(X(idx), a(i), b(i), c(i));
end
s1 = spectrum;
clear spectrum i r idx
2) SPMD with Composite array
spmd
spectrum = zeros(1,len);
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s2 = sum(vertcat(spectrum{:}));
clear spectrum i r idx ind
3) SPMD with co-distributed array
spmd
spectrum = zeros(numlabs, len, codistributor('1d',1));
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(labindex,idx) = spectrum(labindex,idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s3 = sum(gather(spectrum));
clear spectrum i r idx ind
All three results should be equal (to within an acceptably small margin of error)
>> max([max(s1-s2), max(s1-s3), max(s2-s3)])
ans =
2.8422e-14

Optimizing repetitive estimation (currently a loop) in MATLAB

I've found myself needing to do a least-squares (or similar matrix-based operation) for every pixel in an image. Every pixel has a set of numbers associated with it, and so it can be arranged as a 3D matrix.
(This next bit can be skipped)
Quick explanation of what I mean by least-squares estimation :
Let's say we have some quadratic system that is modeled by Y = Ax^2 + Bx + C and we're looking for those A,B,C coefficients. With a few samples (at least 3) of X and the corresponding Y, we can estimate them by:
Arrange the (lets say 10) X samples into a matrix like X = [x(:).^2 x(:) ones(10,1)];
Arrange the Y samples into a similar matrix: Y = y(:);
Estimate the coefficients A,B,C by solving: coeffs = (X'*X)^(-1)*X'*Y;
Try this on your own if you want:
A = 5; B = 2; C = 1;
x = 1:10;
y = A*x(:).^2 + B*x(:) + C + .25*randn(10,1); % added some noise here
X = [x(:).^2 x(:) ones(10,1)];
Y = y(:);
coeffs = (X'*X)^-1*X'*Y
coeffs =
5.0040
1.9818
0.9241
START PAYING ATTENTION AGAIN IF I LOST YOU THERE
*MAJOR REWRITE*I've modified to bring it as close to the real problem that I have and still make it a minimum working example.
Problem Setup
%// Setup
xdim = 500;
ydim = 500;
ncoils = 8;
nshots = 4;
%// matrix size for each pixel is ncoils x nshots (an overdetermined system)
%// each pixel has a matrix stored in the 3rd and 4rth dimensions
regressor = randn(xdim,ydim, ncoils,nshots);
regressand = randn(xdim, ydim,ncoils);
So my problem is that I have to do a (X'*X)^-1*X'*Y (least-squares or similar) operation for every pixel in an image. While that itself is vectorized/matrixized the only way that I have to do it for every pixel is in a for loop, like:
Original code style
%// Actual work
tic
estimate = zeros(xdim,ydim);
for col=1:size(regressor,2)
for row=1:size(regressor,1)
X = squeeze(regressor(row,col,:,:));
Y = squeeze(regressand(row,col,:));
B = X\Y;
% B = (X'*X)^(-1)*X'*Y; %// equivalently
estimate(row,col) = B(1);
end
end
toc
Elapsed time = 27.6 seconds
EDITS in reponse to comments and other ideas
I tried some things:
1. Reshaped into a long vector and removed the double for loop. This saved some time.
2. Removed the squeeze (and in-line transposing) by permute-ing the picture before hand: This save alot more time.
Current example:
%// Actual work
tic
estimate2 = zeros(xdim*ydim,1);
regressor_mod = permute(regressor,[3 4 1 2]);
regressor_mod = reshape(regressor_mod,[ncoils,nshots,xdim*ydim]);
regressand_mod = permute(regressand,[3 1 2]);
regressand_mod = reshape(regressand_mod,[ncoils,xdim*ydim]);
for ind=1:size(regressor_mod,3) % for every pixel
X = regressor_mod(:,:,ind);
Y = regressand_mod(:,ind);
B = X\Y;
estimate2(ind) = B(1);
end
estimate2 = reshape(estimate2,[xdim,ydim]);
toc
Elapsed time = 2.30 seconds (avg of 10)
isequal(estimate2,estimate) == 1;
Rody Oldenhuis's way
N = xdim*ydim*ncoils; %// number of columns
M = xdim*ydim*nshots; %// number of rows
ii = repmat(reshape(1:N,[ncoils,xdim*ydim]),[nshots 1]); %//column indicies
jj = repmat(1:M,[ncoils 1]); %//row indicies
X = sparse(ii(:),jj(:),regressor_mod(:));
Y = regressand_mod(:);
B = X\Y;
B = reshape(B(1:nshots:end),[xdim ydim]);
Elapsed time = 2.26 seconds (avg of 10)
or 2.18 seconds (if you don't include the definition of N,M,ii,jj)
SO THE QUESTION IS:
Is there an (even) faster way?
(I don't think so.)
You can achieve a ~factor of 2 speed up by precomputing the transposition of X. i.e.
for x=1:size(picture,2) % second dimension b/c already transposed
X = picture(:,x);
XX = X';
Y = randn(n_timepoints,1);
%B = (X'*X)^-1*X'*Y; ;
B = (XX*X)^-1*XX*Y;
est(x) = B(1);
end
Before: Elapsed time is 2.520944 seconds.
After: Elapsed time is 1.134081 seconds.
EDIT:
Your code, as it stands in your latest edit, can be replaced by the following
tic
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
% Actual work
picture = randn(xdim,ydim,n_timepoints);
picture = reshape(picture, [xdim*ydim,n_timepoints])'; % note transpose
YR = randn(n_timepoints,size(picture,2));
% (XX*X).^-1 = sum(picture.*picture).^-1;
% XX*Y = sum(picture.*YR);
est = sum(picture.*picture).^-1 .* sum(picture.*YR);
est = reshape(est,[xdim,ydim]);
toc
Elapsed time is 0.127014 seconds.
This is an order of magnitude speed up on the latest edit, and the results are all but identical to the previous method.
EDIT2:
Okay, so if X is a matrix, not a vector, things are a little more complicated. We basically want to precompute as much as possible outside of the for-loop to keep our costs down. We can also get a significant speed-up by computing XT*X manually - since the result will always be a symmetric matrix, we can cut a few corners to speed things up. First, the symmetric multiplication function:
function XTX = sym_mult(X) % X is a 3-d matrix
n = size(X,2);
XTX = zeros(n,n,size(X,3));
for i=1:n
for j=i:n
XTX(i,j,:) = sum(X(:,i,:).*X(:,j,:));
if i~=j
XTX(j,i,:) = XTX(i,j,:);
end
end
end
Now the actual computation script
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
Y = randn(10,xdim*ydim);
picture = randn(xdim,ydim,n_timepoints); % 500x500x10
% Actual work
tic % start timing
picture = reshape(picture, [xdim*ydim,n_timepoints])';
% Here we precompute the (XT*Y) calculation to speed things up later
picture_y = [sum(Y);sum(Y.*picture)];
% initialize
est = zeros(size(picture,2),1);
picture = permute(picture,[1,3,2]);
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture);
XTX = sym_mult(XTX); % precompute (XT*X) for speed
X = zeros(2,2); % preallocate for speed
XY = zeros(2,1);
for x=1:size(picture,2) % second dimension b/c already transposed
%For some reason this is a lot faster than X = XTX(:,:,x);
X(1,1) = XTX(1,1,x);
X(2,1) = XTX(2,1,x);
X(1,2) = XTX(1,2,x);
X(2,2) = XTX(2,2,x);
XY(1) = picture_y(1,x);
XY(2) = picture_y(2,x);
% Here we utilise the fact that A\B is faster than inv(A)*B
% We also use the fact that (A*B)*C = A*(B*C) to speed things up
B = X\XY;
est(x) = B(1);
end
est = reshape(est,[xdim,ydim]);
toc % end timing
Before: Elapsed time is 4.56 seconds.
After: Elapsed time is 2.24 seconds.
This is a speed up of about a factor of 2. This code should be extensible to X being any dimensions you want. For instance, in the case where X = [1 x x^2], you would change picture_y to the following
picture_y = [sum(Y);sum(Y.*picture);sum(Y.*picture.^2)];
and change XTX to
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture,picture.^2);
You would also change a lot of 2s to 3s in the code, and add XY(3) = picture_y(3,x) to the loop. It should be fairly straight-forward, I believe.
Results
I sped up your original version, since your edit 3 was actually not working (and also does something different).
So, on my PC:
Your (original) version: 8.428473 seconds.
My obfuscated one-liner given below: 0.964589 seconds.
First, for no other reason than to impress, I'll give it as I wrote it:
%%// Some example data
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
estimate = zeros(xdim,ydim); %// initialization with explicit size
picture = randn(xdim,ydim,n_timepoints);
%%// Your original solution
%// (slightly altered to make my version's results agree with yours)
tic
Y = randn(n_timepoints,xdim*ydim);
ii = 1;
for x = 1:xdim
for y = 1:ydim
X = squeeze(picture(x,y,:)); %// or similar creation of X matrix
B = (X'*X)^(-1)*X' * Y(:,ii);
ii = ii+1;
%// sometimes you keep everything and do
%// estimate(x,y,:) = B(:);
%// sometimes just the first element is important and you do
estimate(x,y) = B(1);
end
end
toc
%%// My version
tic
%// UNLEASH THE FURY!!
estimate2 = reshape(sparse(1:xdim*ydim*n_timepoints, ...
builtin('_paren', ones(n_timepoints,1)*(1:xdim*ydim),:), ...
builtin('_paren', permute(picture, [3 2 1]),:))\Y(:), ydim,xdim).'; %'
toc
%%// Check for equality
max(abs(estimate(:)-estimate2(:))) % (always less than ~1e-14)
Breakdown
First, here's the version that you should actually use:
%// Construct sparse block-diagonal matrix
%// (Type "help sparse" for more information)
N = xdim*ydim; %// number of columns
M = N*n_timepoints; %// number of rows
ii = 1:N;
jj = ones(n_timepoints,1)*(1:N);
s = permute(picture, [3 2 1]);
X = sparse(ii,jj(:), s(:));
%// Compute ALL the estimates at once
estimates = X\Y(:);
%// You loop through the *second* dimension first, so to make everything
%// agree, we have to extract elements in the "wrong" order, and transpose:
estimate2 = reshape(estimates, ydim,xdim).'; %'
Here's an example of what picture and the corresponding matrix X looks like for xdim = ydim = n_timepoints = 2:
>> clc, picture, full(X)
picture(:,:,1) =
-0.5643 -2.0504
-0.1656 0.4497
picture(:,:,2) =
0.6397 0.7782
0.5830 -0.3138
ans =
-0.5643 0 0 0
0.6397 0 0 0
0 -2.0504 0 0
0 0.7782 0 0
0 0 -0.1656 0
0 0 0.5830 0
0 0 0 0.4497
0 0 0 -0.3138
You can see why sparse is necessary -- it's mostly zeros, but will grow large quickly. The full matrix would quickly consume all your RAM, while the sparse one will not consume much more than the original picture matrix does.
With this matrix X, the new problem
X·b = Y
now contains all the problems
X1 · b1 = Y1
X2 · b2 = Y2
...
where
b = [b1; b2; b3; ...]
Y = [Y1; Y2; Y3; ...]
so, the single command
X\Y
will solve all your systems at once.
This offloads all the hard work to a set of highly specialized, compiled to machine-specific code, optimized-in-every-way algorithms, rather than the interpreted, generic, always-two-steps-away from the hardware loops in MATLAB.
It should be straightforward to convert this to a version where X is a matrix; you'll end up with something like what blkdiag does, which can also be used by mldivide in exactly the same way as above.
I had a wee play around with an idea, and I decided to stick it as a separate answer, as its a completely different approach to my other idea, and I don't actually condone what I'm about to do. I think this is the fastest approach so far:
Orignal (unoptimised): 13.507176 seconds.
Fast Cholesky-decomposition method: 0.424464 seconds
First, we've got a function to quickly do the X'*X multiplication. We can speed things up here because the result will always be symmetric.
function XX = sym_mult(X)
n = size(X,2);
XX = zeros(n,n,size(X,3));
for i=1:n
for j=i:n
XX(i,j,:) = sum(X(:,i,:).*X(:,j,:));
if i~=j
XX(j,i,:) = XX(i,j,:);
end
end
end
The we have a function to do LDL Cholesky decomposition of a 3D matrix (we can do this because the (X'*X) matrix will always be symmetric) and then do forward and backwards substitution to solve the LDL inversion equation
function Y = fast_chol(X,XY)
n=size(X,2);
L = zeros(n,n,size(X,3));
D = zeros(n,n,size(X,3));
B = zeros(n,1,size(X,3));
Y = zeros(n,1,size(X,3));
% These loops compute the LDL decomposition of the 3D matrix
for i=1:n
D(i,i,:) = X(i,i,:);
L(i,i,:) = 1;
for j=1:i-1
L(i,j,:) = X(i,j,:);
for k=1:(j-1)
L(i,j,:) = L(i,j,:) - L(i,k,:).*L(j,k,:).*D(k,k,:);
end
D(i,j,:) = L(i,j,:);
L(i,j,:) = L(i,j,:)./D(j,j,:);
if i~=j
D(i,i,:) = D(i,i,:) - L(i,j,:).^2.*D(j,j,:);
end
end
end
for i=1:n
B(i,1,:) = XY(i,:);
for j=1:(i-1)
B(i,1,:) = B(i,1,:)-D(i,j,:).*B(j,1,:);
end
B(i,1,:) = B(i,1,:)./D(i,i,:);
end
for i=n:-1:1
Y(i,1,:) = B(i,1,:);
for j=n:-1:(i+1)
Y(i,1,:) = Y(i,1,:)-L(j,i,:).*Y(j,1,:);
end
end
Finally, we have the main script which calls all of this
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
Y = randn(10,xdim*ydim);
picture = randn(xdim,ydim,n_timepoints); % 500x500x10
tic % start timing
picture = reshape(pr, [xdim*ydim,n_timepoints])';
% Here we precompute the (XT*Y) calculation
picture_y = [sum(Y);sum(Y.*picture)];
% initialize
est2 = zeros(size(picture,2),1);
picture = permute(picture,[1,3,2]);
% Now we calculate the X'*X matrix
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture);
XTX = sym_mult(XTX);
% Call our fast Cholesky decomposition routine
B = fast_chol(XTX,picture_y);
est2 = B(1,:);
est2 = reshape(est2,[xdim,ydim]);
toc
Again, this should work equally well for a Nx3 X matrix, or however big you want.
I use octave, thus I can't say anything about the resulting performance in Matlab, but would expect this code to be slightly faster:
pictureT=picture'
est=arrayfun(#(x)( (pictureT(x,:)*picture(:,x))^-1*pictureT(x,:)*randn(n_ti
mepoints,1)),1:size(picture,2));