Solving a large system takes too much memory? - matlab

Suppose i have a sparse matrix M with the following properties:
size(M) -> 100000 100000
sprank(M) -> 99236
nnz(M) -> 499987
numel(M) -> 1.0000e+10
How come solving the system takes way more than 8GB of RAM? whos('M') gives only 8.4mb.
I'm using the following code (provided at http://www.mathworks.com/moler/exm/chapters/pagerank.pdf)
function x = pagerank(G,p)
G = G - diag(diag(G));
[n,n] = size(G);
c = full(sum(G,1));
r = full(sum(G,2));
% Scale column sums to be 1 (or 0 where there are no out links).
k = find(c~=0);
D = sparse(k,k,1./c(k),n,n);
% Solve (I - p*G*D)*x = e
e = ones(n,1);
I = speye(n,n);
x = (I - p*G*D)\e;
% Normalize so that sum(x) == 1.
x = x/sum(x);`

Left divide! that x = (I - p*G*D)\e does way more things that what it seems!
From Matlab mldivide for sparse matrices:
Not all the solvers take the same amount of memory, and some of them take a lot. Left dividing in Matlab is fantastic, but you need to know what you are doing.
I suggest to have a look to some iterative solvers if you run out of memory, such as Preconditioned Conjugate Gradient (PGC) or Algebraic Multigrid (AMG) or in case of complex numbers I think Biconjugate gradients stabilized method works fine
If you dont know where to start, I highly recommend PGC. In the project I am working on my code for left dividing is something like:
% CAUTION! PSEUDOCODE! do not try to run
try
x=A\b
catch
x=pgc(A,b)
end

Related

Julia vs. MATLAB - Distance Matrix - Run Time Test

I started learning Julia not a long time ago and I decided to do a simple
comparison between Julia and Matlab on a simple code for computing Euclidean
distance matrices from a set of high dimensional points.
The task is simple and can be divided into two cases:
Case 1: Given two datasets in the form of n x d matrices, say X1 and X2, compute the pair wise Euclidean distance between each point in X1 and all the points in X2. If X1 is of size n1 x d, and X2 is of size n2 x d, then the resulting Euclidean distance matrix D will be of size n1 x n2. In the general setting, matrix D is not symmetric, and diagonal elements are not equal to zero.
Case 2: Given one dataset in the form of n x d matrix X, compute the pair wise Euclidean distance between all the n points in X. The resulting Euclidean distance matrix D will be of size n x n, symmetric, with zero elements on the main diagonal.
My implementation of these functions in Matlab and in Julia is given below. Note that none of the implementations rely on loops of any sort, but rather simple linear algebra operations. Also, note that the implementation using both languages is very similar.
My expectations before running any tests for these implementations is that the Julia code will be much faster than the Matlab code, and by a significant margin. To my surprise, this was not the case!
The parameters for my experiments are given below with the code. My machine is a MacBook Pro. (15" Mid 2015) with 2.8 GHz Intel Core i7 (Quad Core), and 16 GB 1600 MHz DDR3.
Matlab version: R2018a
Julia version: 0.6.3
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
The results are given in Table (1) below.
Table 1: Average time in seconds (with standard deviation) over 30 trials for computing Euclidean distance matrices between two different datasets (Col. 1),
and between all pairwise points in one dataset (Col. 2).
Two Datasets || One Dataset
Matlab: 2.68 (0.12) sec. 1.88 (0.04) sec.
Julia V1: 5.38 (0.17) sec. 4.74 (0.05) sec.
Julia V2: 5.2 (0.1) sec.
I was not expecting this significant difference between both languages. I expected Julia to be faster than Matlab, or at least, as fast as Matlab. It was really a surprise to see that Matlab is almost 2.5 times faster than Julia in this particular task. I didn't want to draw any early conclusions based on these results for few reasons.
First, while I think that my Matlab implementation is as good as it can be, I'm wondering whether my Julia implementation is the best one for this task. I'm still learning Julia and I hope there is a more efficient Julia code that can yield faster computation time for this task. In particular, where is the main bottleneck for Julia in this task? Or, why does Matlab have an edge in this case?
Second, my current Julia package is based on the generic and standard BLAS and LAPACK packages for MacOS. I'm wondering whether JuliaPro with BLAS and LAPACK based on Intel MKL will be faster than the current version I'm using. This is why I opted to get some feedback from more knowledgeable people on StackOverflow.
The third reason is that I'm wondering whether the compile time for Julia was
included in the timings shown in Table 1 (2nd and 3rd rows), and whether there is a better way to assess the execution time for a function.
I will appreciate any feedback on my previous three questions.
Thank you!
Hint: This question has been identified as a possible duplicate of another question on StackOverflow. However, this is not entirely true. This question has three aspects as reflected by the answers below. First, yes, one part of the question is related to the comparison of OpenBLAS vs. MKL. Second, it turns out that the implementation as well can be improved as shown by one of the answers. And last, bench-marking the julia code itself can be improved by using BenchmarkTools.jl.
MATLAB
num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;
T = zeros(num_trials,1);
XX1 = randn(n1,dim);
XX2 = rand(n2,dim);
%%% DIFEERENT MATRICES
DD2ds = zeros(n1,n2);
for (i = 1:num_trials)
tic;
DD2ds = distmat_euc2ds(XX1,XX2);
T(i) = toc;
end
mt = mean(T);
st = std(T);
fprintf(1,'\nDifferent Matrices:: dim: %d, n1 x n2: %d x %d -> Avg. Time %f (+- %f) \n',dim,n1,n2,mt,st);
%%% SAME Matrix
T = zeros(num_trials,1);
DD1ds = zeros(n1,n1);
for (i = 1:num_trials)
tic;
DD1ds = distmat_euc1ds(XX1);
T(i) = toc;
end
mt = mean(T);
st = std(T);
fprintf(1,'\nSame Matrix:: dim: %d, n1 x n1 : %d x %d -> Avg. Time %f (+- %f) \n\n',dim,n1,n1,mt,st);
distmat_euc2ds.m
function [DD] = distmat_euc2ds (XX1,XX2)
n1 = size(XX1,1);
n2 = size(XX2,1);
DD = sqrt(ones(n1,1)*sum(XX2.^2.0,2)' + (ones(n2,1)*sum(XX1.^2.0,2)')' - 2.*XX1*XX2');
end
distmat_euc1ds.m
function [DD] = distmat_euc1ds (XX)
n1 = size(XX,1);
GG = XX*XX';
DD = sqrt(ones(n1,1)*diag(GG)' + diag(GG)*ones(1,n1) - 2.*GG);
end
JULIA
include("distmat_euc.jl")
num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;
T = zeros(num_trials);
XX1 = randn(n1,dim)
XX2 = rand(n2,dim)
DD = zeros(n1,n2)
# Euclidean Distance Matrix: Two Different Matrices V1
# ====================================================
for i = 1:num_trials
tic()
DD = distmat_eucv1(XX1,XX2)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Different Matrices V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
# Euclidean Distance Matrix: Two Different Matrices V2
# ====================================================
for i = 1:num_trials
tic()
DD = distmat_eucv2(XX1,XX2)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Different Matrices V2:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
# Euclidean Distance Matrix: Same Matrix V1
# =========================================
for i = 1:num_trials
tic()
DD = distmat_eucv1(XX1)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Same Matrix V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
distmat_euc.jl
function distmat_eucv1(XX1::Array{Float64,2},XX2::Array{Float64,2})
(num1,dim1) = size(XX1)
(num2,dim2) = size(XX2)
if (dim1 != dim2)
error("Matrices' 2nd dimensions must agree!")
end
DD = sqrt.((ones(num1)*sum(XX2.^2.0,2)') +
(ones(num2)*sum(XX1.^2.0,2)')' - 2.0.*XX1*XX2');
end
function distmat_eucv2(XX1::Array{Float64,2},XX2::Array{Float64,2})
(num1,dim1) = size(XX1)
(num2,dim2) = size(XX2)
if (dim1 != dim2)
error("Matrices' 2nd dimensions must agree!")
end
DD = (ones(num1)*sum(Base.FastMath.pow_fast.(XX2,2.0),2)') +
(ones(num2)*sum(Base.FastMath.pow_fast.(XX1,2.0),2)')' -
Base.LinAlg.BLAS.gemm('N','T',2.0,XX1,XX2);
DD = Base.FastMath.sqrt_fast.(DD)
end
function distmat_eucv1(XX::Array{Float64,2})
n = size(XX,1)
GG = XX*XX';
DD = sqrt.(ones(n)*diag(GG)' + diag(GG)*ones(1,n) - 2.0.*GG)
end
First question: If I re-write the julia distance function like so:
function dist2(X1::Matrix, X2::Matrix)
size(X1, 2) != size(X2, 2) && error("Matrices' 2nd dimensions must agree!")
return sqrt.(sum(abs2, X1, 2) .+ sum(abs2, X2, 2)' .- 2 .* (X1 * X2'))
end
I shave >40% off the execution time.
For a single dataset you can save a bit more, like this:
function dist2(X::Matrix)
G = X * X'
dG = diag(G)
return sqrt.(dG .+ dG' .- 2 .* G)
end
Third question: You should do your benchmarking with BenchmarkTools.jl, and perform the benchmarking like this (remember $ for variable interpolation):
julia> using BenchmarkTools
julia> #btime dist2($XX1, $XX2);
Additionally, you should not do powers using floats, like this: X.^2.0. It is faster, and equally correct to write X.^2.
For multiplication there is no speed difference between 2.0 .* X and 2 .* X, but you should still prefer using an integer, because it is more generic. As an example, if X has Float32 elements, multiplying with 2.0 will promote the array to Float64s, while multiplying with 2 will preserve the eltype.
And finally, note that in new versions of Matlab, too, you can get broadcasting behaviour by simply adding Mx1 arrays with 1xN arrays. There is no need to first expand them by multiplying with ones(...).

Vectorize Evaluations of Meshgrid Points in Matlab

I need the "for" loop in the following representative section of code to run as efficiently as possible. The mean function in the code is acting as a representative placeholder for my own function.
x = linspace(-1,1,15);
y = linspace(2,4,15);
[xgrid, ygrid] = meshgrid(x,y);
mc = rand(100000,1);
z=zeros(size(xgrid));
for i=1:length(xgrid)
for j=1:length(ygrid)
z(i,j) = mean(xgrid(i,j) + ygrid(i,j) + xgrid(i,j)*ygrid(i,j)*mc);
end
end
I have vectorized the code and improved its speed by about 2.5 times by building a matrix in which mc is replicated for each grid point. My implementation results in a very large matrix (3 x 22500000) filled with repeated data. I've mitigated the memory penalty of this approach by converting the matrix to single precision, but it seems like there should be a more efficient way to do what I want that avoids replicating so much data.
You could use bsxfun with few reshapes -
A = bsxfun(#times,y,x.'); %//'
B = bsxfun(#plus,y,x.'); %//'
C = mean(bsxfun(#plus,bsxfun(#times,mc,reshape(A,1,[])) , reshape(B,1,[])),1);
z_out = reshape(C,numel(x),[]).';

How can I streamline this code and reduce the time it takes to run?

I have a code which works perfectly, and I'm looking to make it more efficient.
t = -1:.001:1;
t_for_y = -50:.01:50;
x = zeros(size(t));
x(1001:end) = exp(-3 * t(1001:end));
h = zeros(size(t));
h(1001:end) = exp(-2 * t(1001:end)); % FIXED TYPO
for k = 1:length(t_for_y)
X(k)=trapz(t,x.*exp(-1i*t*t_for_y(k)));
H(k)=trapz(t,h.*exp(-1i*t*t_for_y(k)));
end
Y = X.*H;
for k = 1:length(t)
y(k) = (1/(2*pi))*trapz(t_for_y,Y.*exp(1i*t(k)*t_for_y));
end
plot(t,real(y));grid on;
I'd like to only use one for-loop or no for loops is this possible?
Is there a way of using doing this faster?
The trapz function can take a matrix as the second input (see help trapz for more info). This means that your first column can be replaced by the following:
t_i = 1i*t';
exp_t = bsxfun(#times,t_i,t_for_y); % Precompute for speed
xexp = bsxfun(#times,x',exp_t);
hexp = bsxfun(#times,h',exp_t);
% NOTE: As you've got it, X and H are identical - I assume this is a typo
X = trapz(t,xexp,1);
H = trapz(t,xexp,1);
Be aware that this will generate some fairly large matrices (~2000 X 10000), which can eat up your memory if you're not careful.
The second loop can be linearised in a similar manner:
% Using exp_t from the previous loop
yexp = bsxfun(#times,Y,exp_t);
% NOTE: As you've got it, X and H are identical - I assume this is a typo
y = trapz(t_for_y,xexp,2);
Again, this will use a lot of memory. You may find that you will save memory by using sparse matrices.
If memory is at a premium for you, then your original code is better (though you should preallocate X, H and y for a slight speed boost), as the time saved by linearising it is not really enough to justify the extra memory. If you've got memory aplenty, then this method is slightly faster.

MATLAB code for calculating MFCC

I have a question if that's ok. I was recently looking for algorithm to calculate MFCCs. I found a good tutorial rather than code so I tried to code it by myself. I still feel like I am missing one thing. In the code below I took FFT of a signal, calculated normalized power, filter a signal using triangular shapes and eventually sum energies corresponding to each bank to obtain MFCCs.
function output = mfcc(x,M,fbegin,fs)
MF = #(f) 2595.*log10(1 + f./700);
invMF = #(m) 700.*(10.^(m/2595)-1);
M = M+2; % number of triangular filers
mm = linspace(MF(fbegin),MF(fs/2),M); % equal space in mel-frequency
ff = invMF(mm); % convert mel-frequencies into frequency
X = fft(x);
N = length(X); % length of a short time window
N2 = max([floor(N+1)/2 floor(N/2)+1]); %
P = abs(X(1:N2,:)).^2./N; % NoFr no. of periodograms
mfccShapes = triangularFilterShape(ff,N,fs); %
output = log(mfccShapes'*P);
end
function [out,k] = triangularFilterShape(f,N,fs)
N2 = max([floor(N+1)/2 floor(N/2)+1]);
M = length(f);
k = linspace(0,fs/2,N2);
out = zeros(N2,M-2);
for m=2:M-1
I = k >= f(m-1) & k <= f(m);
J = k >= f(m) & k <= f(m+1);
out(I,m-1) = (k(I) - f(m-1))./(f(m) - f(m-1));
out(J,m-1) = (f(m+1) - k(J))./(f(m+1) - f(m));
end
end
Could someone please confirm that this is all right or direct me if I made mistake> I tested it on a simple pure tone and it gives me, in my opinion, reasonable answers.
Any help greatly appreciated :)
PS. I am working on how to apply vectorized Cosinus Transform. It looks like I would need a matrix of MxM of transform coefficients but I did not find any source that would explain how to do it.
You can test it yourself by comparing your results against other implementations like this one here
you will find a fully configurable matlab toolbox incl. MFCCs and even a function to reverse MFCC back to a time signal, which is quite handy for testing purposes:
melfcc.m - main function for calculating PLP and MFCCs from sound waveforms, supports many options.
invmelfcc.m - main function for inverting back from cepstral coefficients to spectrograms and (noise-excited) waveforms, options exactly match melfcc (to invert that processing).
the page itself has a lot of information on the usage of the package.

How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesnt include upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit : Following the comment:
You can use the second output parameter of histc
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = >sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
How About this
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y+ X>=q(i);
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you shoud use histc
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 of them are spent computing the quantiles - I can't expect to improve this part of the routine (I assume that Mathworks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(#ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
q(end,:) = 2 * q(end,:);
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in codes outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and StackExchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
You can sort the columns and divide+round the inverse indexes:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.