Eigenvalue decomposition using MATLAB - matlab

I'm performing dimensionality reduction on a square matrix A. My problem is computing the eigenvalue decomposition of a 13000 x 13000 matrix A, i.e. [v d]=eigs(A). Even though it's a sparse matrix, I get an 'out of memory' error on a machine with 4 GB of RAM. I'm convinced it's not my PC's problem, since the memory is not used up when the eigs command is run. The help I found online had to do with ARPACK. I checked the recommended site, but there were a lot of files there and I don't know which to download. I also did not understand how to use it with MATLAB. Another help says use numerical methods, but I don't know which specific one to use. Any solution is welcome.
Error in ==> eigs>ishermitian at 1535
tf = isequal(A,A');
Error in ==> eigs>checkInputs at 479
issymA = ishermitian(A);
Error in ==> eigs at 96
[A,Amatrix,isrealprob,issymA,n,B,classAB,k,eigs_sigma,whch, ...
Error in ==> labcomp at 20
[vector lambda] = eigs(A)
Please can I get translation of these errors and how to correct it?

The reason you don't see the memory used up, is that it isn't used up - Matlab fails to allocate the needed amount of memory.
Although an array of 13000 x 13000 doubles (the default data type in Matlab) is about 1.25 GB, that doesn't mean 4 GB of RAM is enough: Matlab needs 1.25 GB of contiguous memory, otherwise it won't succeed in allocating your matrix. You can read more on memory problems in Matlab here: http://www.mathworks.com/support/tech-notes/1100/1106.html
You can as a first step try using single precision:
[v d]=eigs(single(A));
You say
another help says use numerical methods
If you are doing it on the computer, it's numerical by definition.
If you don't want to do it in Matlab (or can't, due to memory constraints), you can look for a linear algebra library (ARPACK is just one of them) and have the calculation done outside of Matlab.

First if A is sparse, single(A) won't work. Single sparse matrices are not implemented in MATLAB, see comments:
how to create a single float sparse matrix in mex files
The call to ishermitian may fail because you can't store two copies of your matrix (A and A'). Bypass this problem by commenting the line out and setting issymA to true or false, depending on whether your matrix is Hermitian.
If you find further problems with memory inside eigs, try to reduce its memory footprint by asking for fewer solutions, eigs(A,1), or by reducing the maximum size of the basis (option p), which by default is twice the number of requested solutions:
opts.p = 3
[x,d] = eigs(A,2,'LM',opts)

Related

sparse matrix causes Segmentation fault exit code 139

When working with a sparse matrix, the kernel is abruptly killed with exit code 139.
This happened when working with Gensim, which uses the sparse matrix format.
The failure happens when multiplying the matrix with another matrix, or even when using matrix.sum().
the matrix was created using scipy:
matrix = scipy.sparse.csc_matrix((data, indices, indptr), shape=(num_terms, num_docs), dtype=dtype)
Turns out the shape of the matrix (num_terms) didn't match the max(indices) which causes numpy to make erroneous assumptions about memory addresses.
This can easily be avoided if after creating the matrix we call:
matrix.check_format()
which makes some sanity checks on the matrix.
if using gensim, just use a high num_features. It doesn't have to be your actual number of features, as long as it's not lower than the actual number.
edit for more details:
with gensim you might be working on document similarity using:
sim_method = gensim.similarities.SparseMatrixSimilarity(documents, num_features=max_index)
if "documents" contain a higher id than max_index, it will cause a bug.
gensim simply wraps the scipy sparse matrix object. to call the check_format on it use:
sim_method.index.check_format()
Though a more likely bug can occur when you're trying to use this similarity method on another corpus of documents to get their similarity scores.
sim_method[query_documents]
again if query_documents contain an id higher than the max_index given at the time of the creation of the sim method - it will cause the bug.
Here gensim hides the scipy matrix completely so you can't call check_format directly, you'll just have to check your own input and make sure there's no bug there.
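When the matrix object itself is out of reach, the raw arrays can be checked before they are handed over. The following is a minimal sketch of such a pre-check for CSC input given as plain Python lists; validate_csc is a hypothetical helper written for illustration, not part of scipy or gensim, and it does not claim to reproduce everything check_format does.

```python
def validate_csc(data, indices, indptr, num_rows, num_cols):
    """Raise ValueError if the CSC arrays do not fit a num_rows x num_cols matrix."""
    if len(indptr) != num_cols + 1:
        raise ValueError("indptr must have num_cols + 1 entries")
    if indptr[0] != 0 or indptr[-1] != len(indices):
        raise ValueError("indptr must start at 0 and end at len(indices)")
    if any(a > b for a, b in zip(indptr, indptr[1:])):
        raise ValueError("indptr must be non-decreasing")
    if len(data) != len(indices):
        raise ValueError("data and indices must have the same length")
    # The failure described above: a row id that the declared shape cannot hold.
    if len(indices) > 0 and not (0 <= min(indices) and max(indices) < num_rows):
        raise ValueError(
            f"row index out of range: max(indices)={max(indices)}, num_rows={num_rows}"
        )


data = [1.0, 2.0, 3.0]
indices = [0, 2, 5]  # row ids (term ids)
indptr = [0, 2, 3]   # two columns (documents)

validate_csc(data, indices, indptr, num_rows=6, num_cols=2)  # passes silently
try:
    validate_csc(data, indices, indptr, num_rows=5, num_cols=2)
except ValueError as err:
    print("rejected:", err)
```

The same idea applies to query documents: checking that every feature id is below the num_features given at index creation catches the problem before any compiled code touches the arrays.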

a very ill conditioned linear system

I have a linear system to solve, written as Ax=b
A is a 175 by 175 symmetric matrix with ones on its diagonal (i.e., aii=1); the other entries range from 0 to 1.
A is very ill conditioned and not positive definite; its rank is 162 and its condition number is 3.5869e+16.
I have spent several days trying to solve this in MATLAB. I've tried almost every method I can find, including \, pcg, bicg, bicgstab, bicgstabl, cgs, gmres, lsqr, minres, qmr, symmlq, tfqmr
These methods gave me some solutions, but I don't know whether to trust them, or which solution to trust. Is there a criterion for deciding?
I'd appreciate it if someone could give me a solution that I can trust.
Thanks!
A and b are stored in .mat files and can be downloaded from these dropbox links:
https://www.dropbox.com/s/s6xlbq68juqs6xi/A.mat?dl=0
https://www.dropbox.com/s/pxl0hdup20hf2lr/b.mat?dl=0
use like this:
load('A.mat');
load('b.mat');
x = A\b;
Not sure if this will help, but give it a go:
Tikhonov regularization
Basically, when the following is hard to compute due to ill-conditioning:
A x = b
you minimize the following instead:
\|A x - b\|^2 + \|\Gamma x\|^2
with \Gamma generally being the identity matrix.
In the end, you get the following equation for x:
\hat{x} = (A^T A + \Gamma^T \Gamma)^{-1} A^T b
To add to that, you will generally want a "hyperparameter" to control how much you regularize the problem. So instead of being just the identity matrix, \Gamma will be a number (e.g. 0.001) multiplied by the identity matrix of size(A).
The last equation should be straightforward to try in Matlab. Give it a go.
NOTE: this is not the answer. Actually probably there is no unique answer to solving an ill posed problem. This is just a way to go.
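To make the regularized solve concrete, here is a minimal sketch in plain Python with \Gamma = alpha * I. The helper names solve_linear and tikhonov, the value of alpha, and the 2 x 2 example are all assumptions chosen for illustration; this is not the asker's 175 x 175 system.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def tikhonov(A, b, alpha):
    """Return x minimizing ||A x - b||^2 + alpha^2 ||x||^2 (Gamma = alpha * I)."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    for i in range(n):
        AtA[i][i] += alpha ** 2  # add Gamma^T Gamma = alpha^2 * I
    return solve_linear(AtA, Atb)


# Singular 2x2 example: A has rank 1, so A x = b has infinitely many solutions.
A = [[1.0, 1.0], [1.0, 1.0]]
b = [2.0, 2.0]
x = tikhonov(A, b, alpha=1e-3)
print(x)  # close to [1.0, 1.0], the minimum-norm solution
```

Forming A^T A squares the condition number, so this direct form is only a starting point; trying several values of alpha and watching how much x and the residual change is one way to judge how far a given solution can be trusted.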

Matlab, economy QR decomposition, control precision?

There is a [Q,R] = qr(A,0) function in Matlab, which, according to documentation, returns an "economy" version of qr-decomposition of A. norm(A-Q*R) returns ~1e-12 for my data set. Also Q'*Q should theoretically return I. In practice there are small nonzero elements above and below the diagonal (of the order of 1e-6 or so), as well as diagonal elements that are slightly greater than 1 (again, by 1e-6 or so). Is anyone aware of a way to control precision of qr(.,0), or quality(orthogonality) of resulting Q, either by specifying epsilon, or via the number of iterations ? The size of the data set makes qr(A) run out of memory so I have to use qr(A,0).
When I try the non-economy setting, I actually get comparable results for A-Q*R, even for a tiny matrix containing small numbers, as shown here:
A = magic(20);
[Q, R] = qr(A); %Result does not change when using qr(A,0)
norm(A-Q*R)
As such, I don't believe the 'economy' setting is the problem (as confirmed by #horchler in the comments); rather, you have just run into the limits of how accurately calculations can be done with data of type 'double'.
Even if you change the accuracy somehow, you will always be dealing with an approximation, so perhaps the first thing to consider here is whether you really need greater accuracy than you already have. If you need more accuracy there may always be a way, but I doubt whether it will be a straightforward one.

How to get level of fitness of data to a distribution by using probplot() in Matlab?

I have 2 sets of floating-point data, set A and set B. Both of them are matrices of size 40*40. I would like to find out which set is closer to the normal distribution. I know how to use probplot() in matlab to draw the probability plot of one set. However, I do not know how to quantify how well the data fit the distribution.
In python, when people use probplot, a parameter, R^2, shows how well the data match the normal distribution. The closer the R^2 value is to 1, the better the fit. Thus, I can simply use the function to compare two sets of data by their R^2 values. However, because of a machine problem, I cannot use python on my current machine. Is there a parameter or function similar to the R^2 value in matlab?
Thank you very much,
Fitting a curve or surface to data and obtaining the goodness of fit, i.e., sse, rsquare, dfe, adjrsquare, rmse, can be done using the function fit. More info here...
The approach of #nate (+1) is definitely one possible way of going about this problem. However, the statistician in me is compelled to suggest the following alternative (that does, alas, require the statistics toolbox - but you have this if you have the student version):
Given that your data is Normal (not Multivariate normal), consider using the Jarque-Bera test.
Jarque-Bera tests the null hypothesis that a given dataset is generated by a Normal distribution, versus the alternative that it is generated by some other distribution. If the Jarque-Bera test statistic is less than some critical value, then we fail to reject the null hypothesis.
So how does this help with the goodness-of-fit problem? Well, the larger the test statistic, the more "non-Normal" the data is. The smaller the test statistic, the more "Normal" the data is.
So, assuming you have converted your matrices into two vectors, A and B (each should be 1600 by 1 based on the dimensions you provide in the question), you could do the following:
%# Build sample data
A = randn(1600, 1);
B = rand(1600, 1);
%# Perform JB test
[ANormal, ~, AStat] = jbtest(A);
[BNormal, ~, BStat] = jbtest(B);
%# Display result
if AStat < BStat
    disp('A is closer to normal');
else
    disp('B is closer to normal');
end
As a little bonus of doing things this way, ANormal and BNormal tell you whether you can reject or fail to reject the null hypothesis that the sample in A or B comes from a normal distribution! Specifically, jbtest returns 0 when it fails to reject the null at the default 5% significance level, so if ANormal is 0, the test gives no evidence against A being drawn from a Normal. If ANormal is 1, the null is rejected and the data in A is probably not generated from a Normal distribution.
CAUTION: The approach I've advocated here is only valid if A and B are the same size, but you've indicated in the question that they are :-)
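For reference, the R^2 that the question mentions can be computed by hand: sort the sample, pair it with standard normal quantiles, and square the correlation of the two. Below is a minimal sketch using only the Python standard library. The function name normal_probplot_r2 is hypothetical, and the plotting positions (i - 0.5) / n are an assumption, so the value will not match scipy's probplot exactly; the same steps carry over to MATLAB.

```python
from statistics import NormalDist, fmean


def normal_probplot_r2(values):
    """R^2 of the sorted values against standard normal quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n are an assumption; other conventions exist.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = fmean(quantiles), fmean(ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(quantiles, ordered))
    sxx = sum((x - mx) ** 2 for x in quantiles)
    syy = sum((y - my) ** 2 for y in ordered)
    return sxy * sxy / (sxx * syy)


# Deterministic toy data: evenly spaced values look uniform, not normal.
uniform_like = [i / 100 for i in range(100)]
normal_like = [NormalDist(5, 2).inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
print(normal_probplot_r2(normal_like) > normal_probplot_r2(uniform_like))  # True
```

Whichever of the two data sets gives the larger value lies closer to a straight line on the normal probability plot.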

4 dimensional matrix

I need to use a 4-dimensional matrix as an accumulator for voting on 4 parameters. Each parameter varies in the range 1~300. For that, I define Acc = zeros(300,300,300,300) in MATLAB, and somewhere, for example, I use:
Acc(4,10,120,78)=Acc(4,10,120,78)+1
however, MATLAB says some error happened because of memory limitation.
??? Error using ==> zeros
Out of memory. Type HELP MEMORY for your options.
Below is part of my code:
I = imread('image.bmp'); %I is logical 300x300 image.
Acc = zeros(100,100,100,100);
for i = 1:300
    for j = 1:300
        if I(i,j)==1
            for x0 = 3:3:300
                for y0 = 3:3:300
                    for a = 3:3:300
                        b = abs(j-y0)/sqrt(1-((i-x0)^2) / (a^2));
                        b1=floor(b/3);
                        if b1==0
                            b1=1;
                        end
                        a1=ceil(a/3);
                        Acc(x0/3,y0/3,a1,b1) = Acc(x0/3,y0/3,a1,b1)+1;
                    end
                end
            end
        end
    end
end
As #Rasman mentioned, you probably want to use a sparse representation of the matrix Acc.
Unfortunately, the sparse function is geared toward 2D matrices, not arbitrary n-D.
But that's ok, because we can take advantage of sub2ind and linear indexing to go back and forth to 4D.
Dims = [300, 300, 300, 300]; % it will be a 300 by 300 by 300 by 300 matrix
Acc = sparse([], [], [], prod(Dims), 1, ExpectedNumElts);
Here ExpectedNumElts should be some number like 30 or 9000 or however many non-zero elements you expect for the matrix Acc to have. We notionally think of Acc as a matrix, but actually it will be a vector. But that's okay, we can use sub2ind to convert 4D coordinates into linear indices into the vector:
ind = sub2ind(Dims, 4, 10, 120, 78);
Acc(ind) = Acc(ind) + 1;
You may also find the functions find, nnz, spy, and spfun helpful.
edit: see lambdageek for the exact same answer with a bit more elegance.
The other answers are helping to guide you to use a sparse mat instead of your current dense solution. This is made a little more difficult since current matlab doesn't support N-dimensional sparse arrays. One implementation to do this is
replace
zeros(100,100,100,100)
with
sparse(100*100*100*100,1)
this will store all your counts in a sparse array, as long as most remain zero, you will be ok for memory.
then to access this data, instead of:
Acc(h,i,j,k)=Acc(h,i,j,k)+1
use:
index = h + 100*(i-1) + 100*100*(j-1) + 100*100*100*(k-1);  % subscripts are 1-based, so index stays within 1..100^4
Acc(index,1)=Acc(index,1)+1
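The index arithmetic behind both answers can be sketched outside MATLAB as well. The Python below is an illustration under stated assumptions: sub2ind here is a hand-written stand-in that mimics MATLAB's column-major, 1-based convention, and a Counter plays the role of the sparse vector.

```python
from collections import Counter


def sub2ind(dims, *subs):
    """Column-major linear index for 1-based subscripts, like MATLAB's sub2ind."""
    if len(subs) != len(dims):
        raise ValueError("need one subscript per dimension")
    index, stride = 1, 1
    for size, sub in zip(dims, subs):
        if not 1 <= sub <= size:
            raise IndexError(f"subscript {sub} outside 1..{size}")
        index += (sub - 1) * stride  # each subscript is shifted to 0-based first
        stride *= size
    return index


dims = (300, 300, 300, 300)
acc = Counter()  # only cells that received votes are stored
acc[sub2ind(dims, 4, 10, 120, 78)] += 1
print(sub2ind(dims, 4, 10, 120, 78))  # 2089712704
```

Subtracting 1 from every subscript except the first is what keeps the linear index inside 1..prod(dims).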
See Avoiding 'Out of Memory' Errors
Your statement would require far more than 4 GB of RAM: 300^4 doubles at 8 bytes each is about 65 GB.
Solutions to 'Out of Memory' problems fall into two main categories:
Maximizing the memory available to MATLAB (i.e., removing or increasing limits) on your system via operating system selection and system configuration. These usually have the greatest overall applicability but are potentially the most disruptive (e.g. using a different operating system). These techniques are covered in the first two sections of this document.
Minimizing the memory used by MATLAB by making your code more memory efficient. These are all algorithm and application specific and therefore are less broadly applicable. These techniques are covered in later sections of this document.
In your case the latter seems to be the solution: try reducing the amount of memory used / required.
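How much memory a dense accumulator needs can be estimated before allocating anything. The short sketch below assumes 8 bytes per element (MATLAB's default double); dense_bytes is a hypothetical helper, and the two shapes are the ones that appear in the question.

```python
def dense_bytes(dims, bytes_per_element=8):
    """Bytes needed for a dense array of the given shape (8 bytes per double)."""
    total = bytes_per_element
    for size in dims:
        total *= size
    return total


for dims in [(300, 300, 300, 300), (100, 100, 100, 100)]:
    n = dense_bytes(dims)
    print(dims, f"{n / 1e9:.1f} GB", f"({n / 2**30:.1f} GiB)")
```

The 300^4 case comes to 64.8 GB, while the 100^4 array used in the posted code is 0.8 GB, which can still fail if no contiguous block of that size is free.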