Duplicating a 2D matrix in MATLAB along a 3rd axis MANY times

I'm looking to duplicate a 784x784 matrix in MATLAB along a 3rd axis. The following code seems to work:
mat = reshape(repmat(mat, 1,10000),784,784,10000);
Unfortunately, it takes so long to run that it's worthless (changing the 10,000s to 1,000 makes it take a few minutes, and using 10,000 practically freezes my whole machine). Is there a faster way to do this?
For reference, I'm looking to use mvnpdf on 10,000 vectors each of length 784, using the same covariance matrix for each. So my final call looks like
mvnpdf(X,mu,mat)
% size(X) = (10000,784), size(mu) = (10000,784), size(mat) = (784,784,10000)
If there's a way to do this that's not repeating the covariance matrix 10,000 times, that'd be helpful too. Thanks!

For replication in more than 2 dimensions, you need to supply the replication counts as an array:
out = repmat(mat,[1,1,10000])
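As an aside, even when built efficiently the replicated array is enormous. In double precision it occupies
784 * 784 * 10000 * 8 bytes ≈ 4.9e10 bytes ≈ 46 GiB
which is almost certainly why your machine freezes: the array simply doesn't fit in RAM.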

Replicating a 784x784 matrix 10,000 times isn't going to benefit from MATLAB's vectorization, and avoiding a for loop won't help much here either, given the following:
The main speedup you can gain is by computing the inverse of the covariance matrix once and then computing the pdf yourself. Inverting sigma costs O(n^3), and you are needlessly paying that cost 10,000 times. (The square root of the determinant can likewise be precomputed.) For reference, the PDF of the multivariate normal distribution is computed as follows:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Properties
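That is, for a k-dimensional vector x with mean mu and covariance matrix sigma:
f(x) = (2*pi)^(-k/2) * det(sigma)^(-1/2) * exp( -0.5 * (x-mu)' * inv(sigma) * (x-mu) )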
Better to just compute the inverse once, then form z = x - mu for each value, compute z'*S*z (where S = inv(sigma)) for each pdf value, and apply a simple function and a constant. But wait! You can vectorize that, too.
I don't have MATLAB in front of me, but this is basically what you need to do, and it'll run in an instant.
s = inv(sigma);
k = size(x,2);                          % dimension of each vector (784 here)
c = 0.5*log(det(s)) - (k/2)*log(2*pi);  % log normalizing constant; note that
                                        % 0.5*log(det(inv(sigma))) = -0.5*log(det(sigma))
z = x - mu;                             % 10000 x 784 matrix
p = exp( c - 0.5 .* dot(z*s, z, 2) );   % 10000 x 1 vector of pdf values (avoid the name "ans")
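As a sanity check, here is a small sketch (with made-up sizes, so it runs instantly) comparing this formula against mvnpdf:
k = 3; n = 5;                          % small, hypothetical sizes
sigma = cov(randn(50,k));              % a random positive-definite covariance
x = randn(n,k); mu = randn(n,k);
s = inv(sigma);
c = 0.5*log(det(s)) - (k/2)*log(2*pi);
z = x - mu;
p = exp( c - 0.5 .* dot(z*s, z, 2) );
max(abs(p - mvnpdf(x, mu, sigma)))     % should be at round-off level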

Related

Calculating covariance in Matlab for large dataset and different mean

So I'm trying to implement an EM-Algorithm to train a Gaussian Class Conditional model for classifying data. I'm stuck in the M-step at the moment because I can't figure out how to calculate the covariance matrix.
The problem is I have a big data set, and using a for loop to go through each point would be way too slow. I also can't use the covariance function cov(), because I need to use a mean which I calculated using this formula (mu symbol one).
Is there a way to adjust cov() to use the mean I want? Or is there another way I could do this without for loops?
Edit: Forgot to explain what the data matrix is like. It's an n-by-3 matrix where each row is a data point.
It technically needs to work for the general n-by-m case, but n is usually really big (1000 or more) while m is relatively small.
You can calculate your covariance matrix manually. Let data be the matrix containing all your variables (for example, [x y]) and mu your custom mean; then proceed as follows:
n = size(data,1);
data_dem = data - (ones(n,1) * mu);              % subtract the custom mean from every row
cov_mat = (data_dem.' * data_dem) ./ (n - 1);    % m-by-m covariance matrix
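As a side note, on MATLAB R2016b or newer, implicit expansion makes the ones(n,1) trick unnecessary (assuming mu is a 1-by-m row vector):
data_dem = data - mu;    % rows are expanded automatically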
Notice that I used Bessel's correction (n-1 instead of n) because the MATLAB cov function uses it unless you specify the third argument as 1:
cov_mat = cov(x,y,1);
C = cov(___,w) specifies the normalization weight for any of the previous syntaxes. When w = 0 (default), C is normalized by the number of observations - 1. When w = 1, it is normalized by the number of observations.
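A quick check (with made-up data, and mu taken as the sample mean so that cov is directly comparable) that the manual formula matches both normalizations:
data = randn(100,3);
mu = mean(data,1);
n = size(data,1);
data_dem = data - (ones(n,1) * mu);
max(max(abs( data_dem.'*data_dem./(n-1) - cov(data)   )))    % w = 0, Bessel's correction
max(max(abs( data_dem.'*data_dem./n     - cov(data,1) )))    % w = 1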

Multiplying multi-dimensional matrices efficiently

I'd love to know if there is a more efficient way to multiply specific elements of multi-dimensional matrices that doesn't require a 'for' loop.
I have a region-by-time matrix for an individual (say, 50 regions and 1000 timepoints), and I want to multiply each pair of regions at each timepoint to create a new matrix of the products of each region pair at each time point (50 x 50 x 1000). The way that I'm currently running it is:
for t = 1:1000
    for i = 1:50
        for j = 1:50
            new(i,j,t) = old(i,t) .* old(j,t);
        end
    end
end
As I'm sure you can imagine, this is super slow. Any ideas on how I can fix it up so that it will run more quickly?
% some example data that is easy to trace
old = (1:5)'
old(:,2) = old*1i          % make the second column imaginary so the products are easy to follow
% multiplication
a = permute(old,[1,3,2])   % n x 1 x m
b = permute(old,[3,1,2])   % 1 x n x m
bsxfun(@times,a,b)
permute is used to make 3D arrays with dimensions n x 1 x m and 1 x n x m out of the n x m input matrix. With the dimensions changed this way, new(i,j,k) can be calculated as new(i,j,k) = a(i,1,k)*b(1,j,k). Applying such operations element-by-element is exactly what bsxfun was designed for.
Regarding bsxfun, try to understand simple 2D examples like bsxfun(@times,[1:7],[1,10,100]') first.
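As an aside, on MATLAB R2016b or newer, implicit expansion lets you drop bsxfun entirely and multiply the permuted arrays directly (using the asker's 50-by-1000 old matrix):
new = permute(old,[1,3,2]) .* permute(old,[3,1,2]);    % 50 x 50 x 1000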

How to calculate matrix entries efficiently using Matlab

I have a cell array myBasis of sparse matrices B_1,...,B_n.
I want to evaluate with Matlab the matrix Q(i,j) = trace(B_i^T * B_j).
Therefore, I wrote the following code:
for i = 1:n
    for j = 1:n
        B = myBasis{i};
        C = myBasis{j};
        Q(i,j) = trace(B'*C);
    end
end
This already takes 68 seconds when n = 1226 and each B_i has 50 rows and 50 columns.
Is there any chance to speed this up? Usually I move for-loops out of my MATLAB code into a C++ file, but I have no experience with handling a sparse cell array in C++.
As noted by Inox, Q is symmetric and therefore you only need to explicitly compute half the entries.
Computing trace( B.'*C ) is equivalent to B(:).'*C(:):
trace(B.'*C) = sum_i [B.'*C]_ii = sum_i sum_j B_ij * C_ij
which is the sum of element-wise products and therefore equivalent to B(:).'*C(:).
When you explicitly compute trace( B.'*C ), you are actually forming all k-by-k entries of B.'*C only to use the diagonal later on. AFAIK, MATLAB does not optimize this calculation to avoid computing the off-diagonal entries.
Here's a way:
Q = zeros(n);                       % preallocate the output
for ii = 1:n
    B = myBasis{ii};
    for jj = ii:n
        C = myBasis{jj};
        t = full( B(:).'*C(:) );    % equivalent to trace(B.'*C)!
        Q(ii,jj) = t;
        Q(jj,ii) = t;               % fill the symmetric entry
    end
end
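If memory allows, you can go further: since Q(i,j) = B_i(:).'*B_j(:), stacking each vectorized basis matrix as a column of one dense matrix V reduces the double loop to a single pass plus one matrix product. A sketch, assuming every B_i has the same size:
V = zeros(numel(myBasis{1}), n);
for ii = 1:n
    V(:,ii) = full(myBasis{ii}(:));    % each column is one vectorized B_i
end
Q = V.' * V;                           % Q(i,j) = B_i(:).'*B_j(:) = trace(B_i.'*B_j)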
PS,
It is best not to use i and j as variable names in MATLAB, as by default they denote the imaginary unit.
PPS,
You should also notice that the ' operator in MATLAB is not the matrix transpose but the Hermitian (complex conjugate) transpose; for the actual transpose you need to use .'. In most cases no complex numbers are involved and there is no difference between the two operators, but once complex data is introduced, confusing the two makes debugging quite a mess...
Well, a couple of thoughts:
1) Basic stuff: A'*B = (B'*A)' and trace(A) = trace(A'). This trick alone cuts your calculations almost in half: your Q matrix is symmetric, so you only need to calculate n(n+1)/2 terms (and not n²).
2) To calculate the trace you don't need to compute every term of B'*C, just the diagonal. Nevertheless, I don't know if it's easy to write MATLAB code that is actually faster than just calculating B'*C (MATLAB is pretty fast with matrix operations).
But I would definitely implement (1).

Faster projected-norm (quadratic-form, metric-matrix...) style computations

I need to perform lots of evaluations of the form
X(:,i)' * A * X(:,i),    i = 1...n
where X(:,i) is a vector and A is a symmetric matrix. Ostensibly, I can either do this in a loop
for i = 1:n
    z(i) = X(:,i)' * A * X(:,i);
end
which is slow, or vectorise it as
z = diag(X' * A * X)
which wastes RAM unacceptably when X has a lot of columns. Currently I am compromising on
Y = A * X;
for i = 1:n
    z(i) = Y(:,i)' * X(:,i);
end
which is a little faster/lighter but still seems unsatisfactory.
I was hoping there might be some matlab/scilab idiom or trick to achieve this result more efficiently?
Try this in MATLAB:
z = sum(X.*(A*X));
This gives results equivalent to Federico's suggestion using the function DOT, but should run slightly faster. This is because the DOT function internally computes the result the same way as I did above using the SUM function. However, DOT also has additional input argument checks and extra computation for cases where you are dealing with complex numbers, which is extra overhead you probably don't want or need.
A note on computational efficiency:
Even though the time difference is small between how fast the two methods run, if you are going to be performing the operation many times over it's going to start to add up. To test the relative speeds, I created two 100-by-100 matrices of random values and timed the two methods over many runs to get an average execution time:
METHOD                 AVERAGE EXECUTION TIME
---------------------------------------------
Z = sum(X.*Y);         0.0002595 sec
Z = dot(X,Y);          0.0003627 sec
Using SUM instead of DOT therefore reduces the execution time of this operation by about 28% for matrices with around 10,000 elements. The larger the matrices, the more negligible this difference between the two methods becomes.
To summarize, if this computation represents a significant bottleneck in how fast your code is running, I'd go with the solution using SUM. Otherwise, either solution should be fine.
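For reference, a sketch of such a timing comparison using timeit (hypothetical setup; the exact numbers will differ from machine to machine):
X = rand(100); Y = rand(100);          % stand-ins for X and A*X
t_sum = timeit(@() sum(X.*Y));
t_dot = timeit(@() dot(X,Y));
fprintf('sum: %.3g s, dot: %.3g s\n', t_sum, t_dot);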
Try this:
z = dot(X, A*X)
I don't have Matlab here to test, but it works on Octave, so I expect Matlab to have an analogous dot() function.
From Octave's help:
-- Function File: dot (X, Y, DIM)
Computes the dot product of two vectors. If X and Y are matrices,
calculate the dot-product along the first non-singleton dimension.
If the optional argument DIM is given, calculate the dot-product
along this dimension.
For completeness, gnovice's answer in Scilab would be
z = sum(X .* Y, 1)'
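For the skeptical, a quick equivalence check of the three formulations (made-up sizes):
A = randn(100); A = A + A.';           % a symmetric A, as in the question
X = randn(100, 500);
z1 = diag(X' * A * X);                 % memory-hungry reference
z2 = sum(X .* (A*X), 1).';             % the SUM version
z3 = dot(X, A*X).';                    % the DOT version
max(abs(z1 - z2)), max(abs(z1 - z3))   % both at round-off level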

tformfwd and tforminv - what's the difference?

Suppose I have an arbitrary transformation matrix A such as,
A =
    0.9966    0.0007   -6.5625
    0.0027    0.9938    1.0598
         0         0    1.0000
And a set of points such that their x and y coordinates are represented by X and Y respectively.
And suppose,
[Xf Yf] = tformfwd(maketform('projective',A),X,Y);
Now,
[Xff Yff] = tformfwd(maketform('projective',inv(A)),Xf,Yf);
[Xfi Yfi] = tforminv(maketform('projective',A),Xf,Yf);
[Xff Yff] and [Xfi Yfi] turn out to be exactly the same (as they should be).
Is tforminv just there for convenience or am I missing something here?
I'll preface this by saying it is my best guess...
It's possible that tforminv may perform the transformation without actually forming the inverse matrix. For example, you can solve a system of linear equations Ax = b in two ways:
x = inv(A)*b;
x = A\b;
According to the documentation for inv, the second option (using the matrix division operator) can perform better "from both an execution time and numerical accuracy standpoint" since it "produces the solution using Gaussian elimination, without forming the inverse". tforminv may do something similar and thus show better overall behavior compared with passing the inverse matrix to tformfwd.
If you were so inclined, you could try a number of different transformation matrices and test the two approaches (tforminv versus tformfwd with inv(A)) to see how accurate the results are and how fast each is computed.
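A sketch of such a test (hypothetical points, reusing the A from the question):
T = maketform('projective', A);
X = rand(1000,1); Y = rand(1000,1);
[Xf, Yf] = tformfwd(T, X, Y);
[Xi, Yi] = tforminv(T, Xf, Yf);
[Xg, Yg] = tformfwd(maketform('projective', inv(A)), Xf, Yf);
max(abs([Xi - X; Yi - Y]))             % round-trip error via tforminv
max(abs([Xg - X; Yg - Y]))             % round-trip error via explicit inv(A)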