Summing scalars and vectors [MATLAB] - matlab

First off; I'm not very well taught in programming, but i tend to learn exactly what I need to learn in order to do what I want when programming, I have moderate experience with python, html/css C and matlab. I've now enrolled in a physics-simulation course where I use matlab to compute the trajectory of 500 particles under the influence of 5 force-fields of different magnitude.
So now to my thing; I need to write the following for all i=1...500 particles
f_i = m*g - sum{(f_k/r_k^2)*exp((||vec(x)_i - vec(p)_k||^2)/2*r_k^2)(vec(x)_i - vec(p)_k)}
I hope its not too cluttered
And here is my code so far;
clear all
close all
echo off
%Simulation Parameters-------------------------------------------
h = 0.01; %Time-step h (s)
t_0 = 0; %initial time (s)
t_f = 3; %final time (s)
m = 1; %Particle mass (kg)
L = 5; %Charateristic length (m)
NT = t_f/h; %Number of time steps
g = [0,-9.81];
f = [32 40 28 16 20]; %the force f_k (N)
r = [0.3*L 0.2*L 0.4*L 0.5*L 0.3*L]; %the radii r_k
p = [-0.2*L 0.8*L; -0.3*L -0.8*L; -0.6*L 0.1*L;
0.4*L 0.7*L; 0.8*L -0.3*L];
%Forcefield origin position
%stepper = 'forward_euler'; % use forward Euler time-integration
fprintf('Simulation Parameters set');
%initialization---------------------------------------------------
for i = 1:500 %Gives inital value to each of the 500 particles
particle{i,1}.x = [-L -L];
particle{i,1}.v = [5,10];
particle{i,1}.m = m;
for k = 1:5
C = particle{i}.x - p(k,:);
F = rdivide(f(1,k),r(1,k)^2).*C
%clear C; %Creates elements for array F
end
particle{i}.fi = m*g - sum(F); %Compute attractive force on particle
%clear F; %Clear F for next use
end
What this code seems to do is that it goes into the first loop with index i, then goes through the 'k'-loop and exits it with a value for F then uses that last value for F(k) to compute f_i.
What I want it to do is to put all the values of F(k) from 1-5 and put into a matrix which columns I can sum for f_i. I'd prefer to sum the columns as the first column should represent all F-components in the x-axis and the second column all F-components in the y-axis.
Note that the expression for F in the k-loop is not done.

I fixed it by defining the index for F(k,:)

Related

Serious performance issue with iterating simulations

I recently stumbled upon a performance problem while implementing a simulation algorithm. I managed to find the bottleneck function (signally, it's the internal call to arrayfun that slows everything down):
function sim = simulate_frequency(the_f,k,n)
r = rand(1,n); %
x = arrayfun(#(x) find(x <= the_f,1,'first'),r);
sim = (histcounts(x,[1:k Inf]) ./ n).';
end
It is being used in other parts of code as follows:
h0 = zeros(1,sims);
for i = 1:sims
p = simulate_frequency(the_f,k,n);
h0(i) = max(abs(p - the_p));
end
Here are some possible values:
% Test Case 1
sims = 10000;
the_f = [0.3010; 0.4771; 0.6021; 0.6990; 0.7782; 0.8451; 0.9031; 0.9542; 1.0000];
k = 9;
n = 95;
% Test Case 2
sims = 10000;
the_f = [0.0413; 0.0791; 0.1139; 0.1461; 0.1760; 0.2041; 0.2304; 0.2552; 0.2787; 0.3010; 0.3222; 0.3424; 0.3617; 0.3802; 0.3979; 0.4149; 0.4313; 0.4471; 0.4623; 0.4771; 0.4913; 0.5051; 0.5185; 0.5314; 0.5440; 0.5563; 0.5682; 0.5797; 0.5910; 0.6020; 0.6127; 0.6232; 0.6334; 0.6434; 0.6532; 0.6627; 0.6720; 0.6812; 0.6901; 0.6989; 0.7075; 0.7160; 0.7242; 0.7323; 0.7403; 0.7481; 0.7558; 0.7634; 0.7708; 0.7781; 0.7853; 0.7923; 0.7993; 0.8061; 0.8129; 0.8195; 0.8260; 0.8325; 0.8388; 0.8450; 0.8512; 0.8573; 0.8633; 0.8692; 0.8750; 0.8808; 0.8864; 0.8920; 0.8976; 0.9030; 0.9084; 0.9138; 0.9190; 0.9242; 0.9294; 0.9344; 0.9395; 0.9444; 0.9493; 0.9542; 0.9590; 0.9637; 0.9684; 0.9731; 0.9777; 0.9822; 0.9867; 0.9912; 0.9956; 1.000];
k = 90;
n = 95;
The scalar sims must be in the range 1000 1000000. The vector of cumulated frequencies the_f never contains more than 100 elements. The scalar k represents the number of elements in the_f. Finally, the scalar n represents the number of elements in the empirical sample vector, and can even be very large (up to 10000 elements, as far as I can tell).
Any clue about how to improve the computation time of this process?
This seems to be slightly faster for me in the second test case, not the first. The time differences might be larger for longer the_f and larger values of n.
function sim = simulate_frequency(the_f,k,n)
r = rand(1,n); %
[row,col] = find(r <= the_f); % Implicit singleton expansion going on here!
[~,ind] = unique(col,'first');
x = row(ind);
sim = (histcounts(x,[1:k Inf]) ./ n).';
end
I'm using implicit singleton expansion in r <= the_f, use bsxfun if you have an older version of MATLAB (but you know the drill).
Find then returns row and column to all the locations where r is larger than the_f. unique finds the indices into the result for the first element of each column.
Credit: Andrei Bobrov over on MATLAB Answers
Another option (derived from this other answer) is a bit shorter but also a bit more obscure IMO:
mask = r <= the_f;
[x,~] = find(mask & (cumsum(mask,1)==1));
If I want performance, I would avoid arrayfun. Even this for loop is faster:
function sim = simulate_frequency(the_f,k,n)
r = rand(1,n); %
for i = 1:numel(r)
x(i) = find(r(i)<the_f,1,'first');
end
sim = (histcounts(x,[1:k Inf]) ./ n).';
end
Running 10000 sims with the first set of the sample data gives the following timing.
Your arrayfun function:
>Elapsed time is 2.848206 seconds.
The for loop function:
>Elapsed time is 0.938479 seconds.
Inspired by Cris Luengo's answer, I suggest below:
function sim = simulate_frequency(the_f,k,n)
r = rand(1,n); %
x = sum(r > the_f)+1;
sim = (histcounts(x,[1:k Inf]) ./ n)';
end
Time:
>Elapsed time is 0.264146 seconds.
You can use histcounts with r as its input:
r = rand(1,n);
sim = (histcounts(r,[-inf ;the_f]) ./ n).';
If histc is used instead of histcounts the whole simulation can be vectorized:
r = rand(n,sims);
p = histc(r, [-inf; the_f],1);
p = [p(1:end-2,:) ;sum(p(end-1:end,:))]./n;
h0 = max(abs(p-the_p(:))); %h0 = max(abs(bsxfun(#minus,p,the_p(:))));

How do I index codistributed arrays in a spmd block

I am doing a very large calculation (atmospheric absorption) that has a lot of individual narrow peaks that all get added up at the end. For each peak, I have pre-calculated the range over which the value of the peak shape function is above my chosen threshold, and I am then going line by line and adding the peaks to my spectrum. A minimum example is given below:
X = 1:1e7;
K = numel(a); % count the number of peaks I have.
spectrum = zeros(size(X));
for k = 1:K
grid = X >= rng(1,k) & X <= rng(2,k);
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(k),b(k),c(k)]);
end
Here, each peak has some parameters that define the position and shape (a,b,c), and a range over which to do the calculation (rng). This works great, and on my machine it benchmarks at around 220 seconds to do a complete data set. However, I have a 4 core machine and I would eventually like to run this on a cluster, so I'd like to parallelize it and make it scaleable.
Because each loop relies on the results of the previous iteration, I cannot use parfor, so I am taking my first step into learning how to use spmd blocks. My first try looked like this:
X = 1:1e7;
cores = matlabpool('size');
K = numel(a);
spectrum = zeros(size(X),cores);
spmd
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid,labindex) = spectrum(grid,labindex) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k))]);
end
end
finalSpectrum = sum(spectrum,2);
This almost works. The program crashes at the last line because spectrum is of type Composite, and the documentation for 2013a is spotty on how to turn Composite data into a matrix (cell2mat does not work). This also does not scale well because the more cores I have, the larger the matrix is, and that large matrix has to get copied to each worker, which then ignores most of the data. Question 1: how do I turn a Composite data type into a useable array?
The second thing I tried was to use a codistributed array.
spmd
spectrum = codistributed.zeros(K,cores);
disp(size(getLocalPart(spectrum)))
end
This tells me that each worker has a single vector of size [K 1], which I believe is what I want, but when I try to then meld the above methods
spmd
spectrum = codistributed.zeros(K,cores);
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k))]); end
finalSpectrum = gather(spectrum);
end
finalSpectrum = sum(finalSpectrum,2);
I get Matrix dimensions must agree errors. Since it's in a parallel block, I can't use my normal debugging crutch of stepping through the loop and seeing what the size of each block is at each point to see what's going on. Question 2: what is the proper way to index into and out of a codistributed array in an spmd block?
Regarding question#1, the Composite variable in the client basically refers to a non-distributed variant array stored on the workers. You can access the array from each worker by {}-indexing using its corresponding labindex (e.g: spectrum{1}, spectrum{2}, ..).
For your code that would be: finalSpectrum = sum(cat(2,spectrum{:}), 2);
Now I tried this problem myself using random data. Below are three implementations to compare (see here to understand the difference between distributed and nondistributed arrays). First we start with the common data:
len = 100; % spectrum length
K = 10; % number of peaks
X = 1:len;
% random position and shape parameters
a = rand(1,K); b = rand(1,K); c = rand(1,K);
% random peak ranges (lower/upper thresholds)
ranges = sort(randi([1 len], [2 K]));
% dummy peakfn() function
fcn = #(x,a,b,c) x+a+b+c;
% prepare a pool of MATLAB workers
matlabpool open
1) Serial for-loop:
spectrum = zeros(size(X));
for i=1:size(ranges,2)
r = ranges(:,i);
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + fcn(X(idx), a(i), b(i), c(i));
end
s1 = spectrum;
clear spectrum i r idx
2) SPMD with Composite array
spmd
spectrum = zeros(1,len);
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s2 = sum(vertcat(spectrum{:}));
clear spectrum i r idx ind
3) SPMD with co-distributed array
spmd
spectrum = zeros(numlabs, len, codistributor('1d',1));
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(labindex,idx) = spectrum(labindex,idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s3 = sum(gather(spectrum));
clear spectrum i r idx ind
All three results should be equal (to within an acceptably small margin of error)
>> max([max(s1-s2), max(s1-s3), max(s2-s3)])
ans =
2.8422e-14

Optimizing repetitive estimation (currently a loop) in MATLAB

I've found myself needing to do a least-squares (or similar matrix-based operation) for every pixel in an image. Every pixel has a set of numbers associated with it, and so it can be arranged as a 3D matrix.
(This next bit can be skipped)
Quick explanation of what I mean by least-squares estimation :
Let's say we have some quadratic system that is modeled by Y = Ax^2 + Bx + C and we're looking for those A,B,C coefficients. With a few samples (at least 3) of X and the corresponding Y, we can estimate them by:
Arrange the (lets say 10) X samples into a matrix like X = [x(:).^2 x(:) ones(10,1)];
Arrange the Y samples into a similar matrix: Y = y(:);
Estimate the coefficients A,B,C by solving: coeffs = (X'*X)^(-1)*X'*Y;
Try this on your own if you want:
A = 5; B = 2; C = 1;
x = 1:10;
y = A*x(:).^2 + B*x(:) + C + .25*randn(10,1); % added some noise here
X = [x(:).^2 x(:) ones(10,1)];
Y = y(:);
coeffs = (X'*X)^-1*X'*Y
coeffs =
5.0040
1.9818
0.9241
START PAYING ATTENTION AGAIN IF I LOST YOU THERE
*MAJOR REWRITE*I've modified to bring it as close to the real problem that I have and still make it a minimum working example.
Problem Setup
%// Setup
xdim = 500;
ydim = 500;
ncoils = 8;
nshots = 4;
%// matrix size for each pixel is ncoils x nshots (an overdetermined system)
%// each pixel has a matrix stored in the 3rd and 4rth dimensions
regressor = randn(xdim,ydim, ncoils,nshots);
regressand = randn(xdim, ydim,ncoils);
So my problem is that I have to do a (X'*X)^-1*X'*Y (least-squares or similar) operation for every pixel in an image. While that itself is vectorized/matrixized the only way that I have to do it for every pixel is in a for loop, like:
Original code style
%// Actual work
tic
estimate = zeros(xdim,ydim);
for col=1:size(regressor,2)
for row=1:size(regressor,1)
X = squeeze(regressor(row,col,:,:));
Y = squeeze(regressand(row,col,:));
B = X\Y;
% B = (X'*X)^(-1)*X'*Y; %// equivalently
estimate(row,col) = B(1);
end
end
toc
Elapsed time = 27.6 seconds
EDITS in reponse to comments and other ideas
I tried some things:
1. Reshaped into a long vector and removed the double for loop. This saved some time.
2. Removed the squeeze (and in-line transposing) by permute-ing the picture before hand: This save alot more time.
Current example:
%// Actual work
tic
estimate2 = zeros(xdim*ydim,1);
regressor_mod = permute(regressor,[3 4 1 2]);
regressor_mod = reshape(regressor_mod,[ncoils,nshots,xdim*ydim]);
regressand_mod = permute(regressand,[3 1 2]);
regressand_mod = reshape(regressand_mod,[ncoils,xdim*ydim]);
for ind=1:size(regressor_mod,3) % for every pixel
X = regressor_mod(:,:,ind);
Y = regressand_mod(:,ind);
B = X\Y;
estimate2(ind) = B(1);
end
estimate2 = reshape(estimate2,[xdim,ydim]);
toc
Elapsed time = 2.30 seconds (avg of 10)
isequal(estimate2,estimate) == 1;
Rody Oldenhuis's way
N = xdim*ydim*ncoils; %// number of columns
M = xdim*ydim*nshots; %// number of rows
ii = repmat(reshape(1:N,[ncoils,xdim*ydim]),[nshots 1]); %//column indicies
jj = repmat(1:M,[ncoils 1]); %//row indicies
X = sparse(ii(:),jj(:),regressor_mod(:));
Y = regressand_mod(:);
B = X\Y;
B = reshape(B(1:nshots:end),[xdim ydim]);
Elapsed time = 2.26 seconds (avg of 10)
or 2.18 seconds (if you don't include the definition of N,M,ii,jj)
SO THE QUESTION IS:
Is there an (even) faster way?
(I don't think so.)
You can achieve a ~factor of 2 speed up by precomputing the transposition of X. i.e.
for x=1:size(picture,2) % second dimension b/c already transposed
X = picture(:,x);
XX = X';
Y = randn(n_timepoints,1);
%B = (X'*X)^-1*X'*Y; ;
B = (XX*X)^-1*XX*Y;
est(x) = B(1);
end
Before: Elapsed time is 2.520944 seconds.
After: Elapsed time is 1.134081 seconds.
EDIT:
Your code, as it stands in your latest edit, can be replaced by the following
tic
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
% Actual work
picture = randn(xdim,ydim,n_timepoints);
picture = reshape(picture, [xdim*ydim,n_timepoints])'; % note transpose
YR = randn(n_timepoints,size(picture,2));
% (XX*X).^-1 = sum(picture.*picture).^-1;
% XX*Y = sum(picture.*YR);
est = sum(picture.*picture).^-1 .* sum(picture.*YR);
est = reshape(est,[xdim,ydim]);
toc
Elapsed time is 0.127014 seconds.
This is an order of magnitude speed up on the latest edit, and the results are all but identical to the previous method.
EDIT2:
Okay, so if X is a matrix, not a vector, things are a little more complicated. We basically want to precompute as much as possible outside of the for-loop to keep our costs down. We can also get a significant speed-up by computing XT*X manually - since the result will always be a symmetric matrix, we can cut a few corners to speed things up. First, the symmetric multiplication function:
function XTX = sym_mult(X) % X is a 3-d matrix
n = size(X,2);
XTX = zeros(n,n,size(X,3));
for i=1:n
for j=i:n
XTX(i,j,:) = sum(X(:,i,:).*X(:,j,:));
if i~=j
XTX(j,i,:) = XTX(i,j,:);
end
end
end
Now the actual computation script
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
Y = randn(10,xdim*ydim);
picture = randn(xdim,ydim,n_timepoints); % 500x500x10
% Actual work
tic % start timing
picture = reshape(picture, [xdim*ydim,n_timepoints])';
% Here we precompute the (XT*Y) calculation to speed things up later
picture_y = [sum(Y);sum(Y.*picture)];
% initialize
est = zeros(size(picture,2),1);
picture = permute(picture,[1,3,2]);
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture);
XTX = sym_mult(XTX); % precompute (XT*X) for speed
X = zeros(2,2); % preallocate for speed
XY = zeros(2,1);
for x=1:size(picture,2) % second dimension b/c already transposed
%For some reason this is a lot faster than X = XTX(:,:,x);
X(1,1) = XTX(1,1,x);
X(2,1) = XTX(2,1,x);
X(1,2) = XTX(1,2,x);
X(2,2) = XTX(2,2,x);
XY(1) = picture_y(1,x);
XY(2) = picture_y(2,x);
% Here we utilise the fact that A\B is faster than inv(A)*B
% We also use the fact that (A*B)*C = A*(B*C) to speed things up
B = X\XY;
est(x) = B(1);
end
est = reshape(est,[xdim,ydim]);
toc % end timing
Before: Elapsed time is 4.56 seconds.
After: Elapsed time is 2.24 seconds.
This is a speed up of about a factor of 2. This code should be extensible to X being any dimensions you want. For instance, in the case where X = [1 x x^2], you would change picture_y to the following
picture_y = [sum(Y);sum(Y.*picture);sum(Y.*picture.^2)];
and change XTX to
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture,picture.^2);
You would also change a lot of 2s to 3s in the code, and add XY(3) = picture_y(3,x) to the loop. It should be fairly straight-forward, I believe.
Results
I sped up your original version, since your edit 3 was actually not working (and also does something different).
So, on my PC:
Your (original) version: 8.428473 seconds.
My obfuscated one-liner given below: 0.964589 seconds.
First, for no other reason than to impress, I'll give it as I wrote it:
%%// Some example data
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
estimate = zeros(xdim,ydim); %// initialization with explicit size
picture = randn(xdim,ydim,n_timepoints);
%%// Your original solution
%// (slightly altered to make my version's results agree with yours)
tic
Y = randn(n_timepoints,xdim*ydim);
ii = 1;
for x = 1:xdim
for y = 1:ydim
X = squeeze(picture(x,y,:)); %// or similar creation of X matrix
B = (X'*X)^(-1)*X' * Y(:,ii);
ii = ii+1;
%// sometimes you keep everything and do
%// estimate(x,y,:) = B(:);
%// sometimes just the first element is important and you do
estimate(x,y) = B(1);
end
end
toc
%%// My version
tic
%// UNLEASH THE FURY!!
estimate2 = reshape(sparse(1:xdim*ydim*n_timepoints, ...
builtin('_paren', ones(n_timepoints,1)*(1:xdim*ydim),:), ...
builtin('_paren', permute(picture, [3 2 1]),:))\Y(:), ydim,xdim).'; %'
toc
%%// Check for equality
max(abs(estimate(:)-estimate2(:))) % (always less than ~1e-14)
Breakdown
First, here's the version that you should actually use:
%// Construct sparse block-diagonal matrix
%// (Type "help sparse" for more information)
N = xdim*ydim; %// number of columns
M = N*n_timepoints; %// number of rows
ii = 1:N;
jj = ones(n_timepoints,1)*(1:N);
s = permute(picture, [3 2 1]);
X = sparse(ii,jj(:), s(:));
%// Compute ALL the estimates at once
estimates = X\Y(:);
%// You loop through the *second* dimension first, so to make everything
%// agree, we have to extract elements in the "wrong" order, and transpose:
estimate2 = reshape(estimates, ydim,xdim).'; %'
Here's an example of what picture and the corresponding matrix X looks like for xdim = ydim = n_timepoints = 2:
>> clc, picture, full(X)
picture(:,:,1) =
-0.5643 -2.0504
-0.1656 0.4497
picture(:,:,2) =
0.6397 0.7782
0.5830 -0.3138
ans =
-0.5643 0 0 0
0.6397 0 0 0
0 -2.0504 0 0
0 0.7782 0 0
0 0 -0.1656 0
0 0 0.5830 0
0 0 0 0.4497
0 0 0 -0.3138
You can see why sparse is necessary -- it's mostly zeros, but will grow large quickly. The full matrix would quickly consume all your RAM, while the sparse one will not consume much more than the original picture matrix does.
With this matrix X, the new problem
X·b = Y
now contains all the problems
X1 · b1 = Y1
X2 · b2 = Y2
...
where
b = [b1; b2; b3; ...]
Y = [Y1; Y2; Y3; ...]
so, the single command
X\Y
will solve all your systems at once.
This offloads all the hard work to a set of highly specialized, compiled to machine-specific code, optimized-in-every-way algorithms, rather than the interpreted, generic, always-two-steps-away from the hardware loops in MATLAB.
It should be straightforward to convert this to a version where X is a matrix; you'll end up with something like what blkdiag does, which can also be used by mldivide in exactly the same way as above.
I had a wee play around with an idea, and I decided to stick it as a separate answer, as its a completely different approach to my other idea, and I don't actually condone what I'm about to do. I think this is the fastest approach so far:
Orignal (unoptimised): 13.507176 seconds.
Fast Cholesky-decomposition method: 0.424464 seconds
First, we've got a function to quickly do the X'*X multiplication. We can speed things up here because the result will always be symmetric.
function XX = sym_mult(X)
n = size(X,2);
XX = zeros(n,n,size(X,3));
for i=1:n
for j=i:n
XX(i,j,:) = sum(X(:,i,:).*X(:,j,:));
if i~=j
XX(j,i,:) = XX(i,j,:);
end
end
end
The we have a function to do LDL Cholesky decomposition of a 3D matrix (we can do this because the (X'*X) matrix will always be symmetric) and then do forward and backwards substitution to solve the LDL inversion equation
function Y = fast_chol(X,XY)
n=size(X,2);
L = zeros(n,n,size(X,3));
D = zeros(n,n,size(X,3));
B = zeros(n,1,size(X,3));
Y = zeros(n,1,size(X,3));
% These loops compute the LDL decomposition of the 3D matrix
for i=1:n
D(i,i,:) = X(i,i,:);
L(i,i,:) = 1;
for j=1:i-1
L(i,j,:) = X(i,j,:);
for k=1:(j-1)
L(i,j,:) = L(i,j,:) - L(i,k,:).*L(j,k,:).*D(k,k,:);
end
D(i,j,:) = L(i,j,:);
L(i,j,:) = L(i,j,:)./D(j,j,:);
if i~=j
D(i,i,:) = D(i,i,:) - L(i,j,:).^2.*D(j,j,:);
end
end
end
for i=1:n
B(i,1,:) = XY(i,:);
for j=1:(i-1)
B(i,1,:) = B(i,1,:)-D(i,j,:).*B(j,1,:);
end
B(i,1,:) = B(i,1,:)./D(i,i,:);
end
for i=n:-1:1
Y(i,1,:) = B(i,1,:);
for j=n:-1:(i+1)
Y(i,1,:) = Y(i,1,:)-L(j,i,:).*Y(j,1,:);
end
end
Finally, we have the main script which calls all of this
xdim = 500;
ydim = 500;
n_timepoints = 10; % for example
Y = randn(10,xdim*ydim);
picture = randn(xdim,ydim,n_timepoints); % 500x500x10
tic % start timing
picture = reshape(pr, [xdim*ydim,n_timepoints])';
% Here we precompute the (XT*Y) calculation
picture_y = [sum(Y);sum(Y.*picture)];
% initialize
est2 = zeros(size(picture,2),1);
picture = permute(picture,[1,3,2]);
% Now we calculate the X'*X matrix
XTX = cat(2,ones(n_timepoints,1,size(picture,3)),picture);
XTX = sym_mult(XTX);
% Call our fast Cholesky decomposition routine
B = fast_chol(XTX,picture_y);
est2 = B(1,:);
est2 = reshape(est2,[xdim,ydim]);
toc
Again, this should work equally well for a Nx3 X matrix, or however big you want.
I use octave, thus I can't say anything about the resulting performance in Matlab, but would expect this code to be slightly faster:
pictureT=picture'
est=arrayfun(#(x)( (pictureT(x,:)*picture(:,x))^-1*pictureT(x,:)*randn(n_ti
mepoints,1)),1:size(picture,2));

How to obtain complexity cosine similarity in Matlab?

I have implemented cosine similarity in Matlab like this. In fact, I have a two-dimensional 50-by-50 matrix. To obtain a cosine should I compare items in a line by line form.
for j = 1:50
x = dat(j,:);
for i = j+1:50
y = dat(i,:);
c = dot(x,y);
sim = c/(norm(x,2)*norm(y,2));
end
end
Is this correct?
and The question is this: wath is the complexity or O(n) in this state?
Just a note on an efficient implementation of the same thing using vectorized and matrix-wise operations (which are optimized in MATLAB). This can have huge time savings for large matrices:
dat = randn(50, 50);
OP (double-for) implementation:
sim = zeros(size(dat));
nRow = size(dat,1);
for j = 1:nRow
x = dat(j, :);
for i = j+1:nRow
y = dat(i, :);
c = dot(x, y);
sim(j, i) = c/(norm(x,2)*norm(y,2));
end
end
Vectorized implementation:
normDat = sqrt(sum(dat.^2, 2)); % L2 norm of each row
datNorm = bsxfun(#rdivide, dat, normDat); % normalize each row
dotProd = datNorm*datNorm'; % dot-product vectorized (redundant!)
sim2 = triu(dotProd, 1); % keep unique upper triangular part
Comparisons for 1000 x 1000 matrix: (MATLAB 2013a, x64, Intel Core i7 960 # 3.20GHz)
Elapsed time is 34.103095 seconds.
Elapsed time is 0.075208 seconds.
sum(sum(sim-sim2))
ans =
-1.224314766369880e-14
Better end with 49. Maybe you should also add an index to sim?
for j = 1:49
x = dat(j,:);
for i = j+1:50
y = dat(i,:);
c = dot(x,y);
sim(j) = c/(norm(x,2)*norm(y,2));
end
end
The complexity should be roughly like o(n^2), isn't it?
Maybe you should have a look at correlation functions ... I don't get what you want to write exactly, but it looks like you want to do something similar. There are built-in correlation functions in Matlab.

Multiply a 3D matrix with a 2D matrix

Suppose I have an AxBxC matrix X and a BxD matrix Y.
Is there a non-loop method by which I can multiply each of the C AxB matrices with Y?
As a personal preference, I like my code to be as succinct and readable as possible.
Here's what I would have done, though it doesn't meet your 'no-loops' requirement:
for m = 1:C
Z(:,:,m) = X(:,:,m)*Y;
end
This results in an A x D x C matrix Z.
And of course, you can always pre-allocate Z to speed things up by using Z = zeros(A,D,C);.
You can do this in one line using the functions NUM2CELL to break the matrix X into a cell array and CELLFUN to operate across the cells:
Z = cellfun(#(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
The result Z is a 1-by-C cell array where each cell contains an A-by-D matrix. If you want Z to be an A-by-D-by-C matrix, you can use the CAT function:
Z = cat(3,Z{:});
NOTE: My old solution used MAT2CELL instead of NUM2CELL, which wasn't as succinct:
[A,B,C] = size(X);
Z = cellfun(#(x) x*Y,mat2cell(X,A,B,ones(1,C)),'UniformOutput',false);
Here's a one-line solution (two if you want to split into 3rd dimension):
A = 2;
B = 3;
C = 4;
D = 5;
X = rand(A,B,C);
Y = rand(B,D);
%# calculate result in one big matrix
Z = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
%'# split into third dimension
Z = permute(reshape(Z',[D A C]),[2 1 3]);
Hence now: Z(:,:,i) contains the result of X(:,:,i) * Y
Explanation:
The above may look confusing, but the idea is simple.
First I start by take the third dimension of X and do a vertical concatenation along the first dim:
XX = cat(1, X(:,:,1), X(:,:,2), ..., X(:,:,C))
... the difficulty was that C is a variable, hence you can't generalize that expression using cat or vertcat. Next we multiply this by Y:
ZZ = XX * Y;
Finally I split it back into the third dimension:
Z(:,:,1) = ZZ(1:2, :);
Z(:,:,2) = ZZ(3:4, :);
Z(:,:,3) = ZZ(5:6, :);
Z(:,:,4) = ZZ(7:8, :);
So you can see it only requires one matrix multiplication, but you have to reshape the matrix before and after.
I'm approaching the exact same issue, with an eye for the most efficient method. There are roughly three approaches that i see around, short of using outside libraries (i.e., mtimesx):
Loop through slices of the 3D matrix
repmat-and-permute wizardry
cellfun multiplication
I recently compared all three methods to see which was quickest. My intuition was that (2) would be the winner. Here's the code:
% generate data
A = 20;
B = 30;
C = 40;
D = 50;
X = rand(A,B,C);
Y = rand(B,D);
% ------ Approach 1: Loop (via #Zaid)
tic
Z1 = zeros(A,D,C);
for m = 1:C
Z1(:,:,m) = X(:,:,m)*Y;
end
toc
% ------ Approach 2: Reshape+Permute (via #Amro)
tic
Z2 = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
Z2 = permute(reshape(Z2',[D A C]),[2 1 3]);
toc
% ------ Approach 3: cellfun (via #gnovice)
tic
Z3 = cellfun(#(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
Z3 = cat(3,Z3{:});
toc
All three approaches produced the same output (phew!), but, surprisingly, the loop was the fastest:
Elapsed time is 0.000418 seconds.
Elapsed time is 0.000887 seconds.
Elapsed time is 0.001841 seconds.
Note that the times can vary quite a lot from one trial to another, and sometimes (2) comes out the slowest. These differences become more dramatic with larger data. But with much bigger data, (3) beats (2). The loop method is still best.
% pretty big data...
A = 200;
B = 300;
C = 400;
D = 500;
Elapsed time is 0.373831 seconds.
Elapsed time is 0.638041 seconds.
Elapsed time is 0.724581 seconds.
% even bigger....
A = 200;
B = 200;
C = 400;
D = 5000;
Elapsed time is 4.314076 seconds.
Elapsed time is 11.553289 seconds.
Elapsed time is 5.233725 seconds.
But the loop method can be slower than (2), if the looped dimension is much larger than the others.
A = 2;
B = 3;
C = 400000;
D = 5;
Elapsed time is 0.780933 seconds.
Elapsed time is 0.073189 seconds.
Elapsed time is 2.590697 seconds.
So (2) wins by a big factor, in this (maybe extreme) case. There may not be an approach that is optimal in all cases, but the loop is still pretty good, and best in many cases. It is also best in terms of readability. Loop away!
Nope. There are several ways, but it always comes out in a loop, direct or indirect.
Just to please my curiosity, why would you want that anyway ?
To answer the question, and for readability, please see:
ndmult, by ajuanpi (Juan Pablo Carbajal), 2013, GNU GPL
Input
2 arrays
dim
Example
nT = 100;
t = 2*pi*linspace (0,1,nT)’;
# 2 experiments measuring 3 signals at nT timestamps
signals = zeros(nT,3,2);
signals(:,:,1) = [sin(2*t) cos(2*t) sin(4*t).^2];
signals(:,:,2) = [sin(2*t+pi/4) cos(2*t+pi/4) sin(4*t+pi/6).^2];
sT(:,:,1) = signals(:,:,1)’;
sT(:,:,2) = signals(:,:,2)’;
G = ndmult (signals,sT,[1 2]);
Source
Original source. I added inline comments.
function M = ndmult (A,B,dim)
dA = dim(1);
dB = dim(2);
# reshape A into 2d
sA = size (A);
nA = length (sA);
perA = [1:(dA-1) (dA+1):(nA-1) nA dA](1:nA);
Ap = permute (A, perA);
Ap = reshape (Ap, prod (sA(perA(1:end-1))), sA(perA(end)));
# reshape B into 2d
sB = size (B);
nB = length (sB);
perB = [dB 1:(dB-1) (dB+1):(nB-1) nB](1:nB);
Bp = permute (B, perB);
Bp = reshape (Bp, sB(perB(1)), prod (sB(perB(2:end))));
# multiply
M = Ap * Bp;
# reshape back to original format
s = [sA(perA(1:end-1)) sB(perB(2:end))];
M = squeeze (reshape (M, s));
endfunction
I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.
The advantages of MMX are:
It is easy to use.
Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
It uses C compiler and multi-thread computation for speed up.
For this problem, you just need to write this command:
C=mmx('mul',X,Y);
here is a benchmark for all possible methods. For more detail refer to this question.
1.6571 # FOR-loop
4.3110 # ARRAYFUN
3.3731 # NUM2CELL/FOR-loop/CELL2MAT
2.9820 # NUM2CELL/CELLFUN/CELL2MAT
0.0244 # Loop Unrolling
0.0221 # MMX toolbox <===================
I would like to share my answer to the problems of:
1) making the tensor product of two tensors (of any valence);
2) making the contraction of two tensors along any dimension.
Here are my subroutines for the first and second tasks:
1) tensor product:
function [C] = tensor(A,B)
C = squeeze( reshape( repmat(A(:), 1, numel(B)).*B(:).' , [size(A),size(B)] ) );
end
2) contraction:
Here A and B are the tensors to be contracted along the dimesions i and j respectively. The lengths of these dimensions should be equal, of course. There's no check for this (this would obscure the code) but apart from this it works well.
function [C] = tensorcontraction(A,B, i,j)
sa = size(A);
La = length(sa);
ia = 1:La;
ia(i) = [];
ia = [ia i];
sb = size(B);
Lb = length(sb);
ib = 1:Lb;
ib(j) = [];
ib = [j ib];
% making the i-th dimension the last in A
A1 = permute(A, ia);
% making the j-th dimension the first in B
B1 = permute(B, ib);
% making both A and B 2D-matrices to make use of the
% matrix multiplication along the second dimension of A
% and the first dimension of B
A2 = reshape(A1, [],sa(i));
B2 = reshape(B1, sb(j),[]);
% here's the implicit implication that sa(i) == sb(j),
% otherwise - crash
C2 = A2*B2;
% back to the original shape with the exception
% of dimensions along which we've just contracted
sa(i) = [];
sb(j) = [];
C = squeeze( reshape( C2, [sa,sb] ) );
end
Any critics?
I would think recursion, but that's the only other non- loop method you can do
You could "unroll" the loop, ie write out all the multiplications sequentially that would occur in the loop