Is it possible to define a sparse matrix in scipy from a function rather than laying out all the possible values? In the docs I see the following:
There are seven available sparse matrix types:
csc_matrix: Compressed Sparse Column format
csr_matrix: Compressed Sparse Row format
bsr_matrix: Block Sparse Row format
lil_matrix: List of Lists format
dok_matrix: Dictionary of Keys format
coo_matrix: COOrdinate format (aka IJV, triplet format)
dia_matrix: DIAgonal format
All of these force you to specify the matrix beforehand, which takes up memory. Is there a way I can simply supply a function to calculate entry (i,j) when needed? The end goal is to calculate the few largest eigenvectors of the matrix through something like a Lanczos method.
The short answer is "no", but I think it's pretty easy to roll your own matrix-like object. If you are using eigsh to get your answer (which appears to be an implementation of the Lanczos algorithm), then your matrix-like object requires a matvec(x) method, which may or may not be easy to write.
I realize this is not a complete answer, but I hope this sets you on your way.
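A minimal sketch of that idea, assuming scipy.sparse.linalg.LinearOperator is used as the matrix-like wrapper. The entry function, the size n and the banded structure are made up for illustration; only matvec ever touches the entries, so the matrix itself is never stored.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

n = 100  # matrix dimension (illustrative)

def entry(i, j):
    """Hypothetical entry function: symmetric, non-zero only near the diagonal."""
    if i == j:
        return float(i)
    if abs(i - j) == 1:
        return 0.5
    return 0.0

def matvec(x):
    """Compute A @ x on the fly, without ever storing A."""
    x = np.asarray(x).ravel()
    y = np.zeros(n)
    for i in range(n):
        # Only visit the columns where entry(i, j) can be non-zero.
        for j in range(max(0, i - 1), min(n, i + 2)):
            y[i] += entry(i, j) * x[j]
    return y

A = LinearOperator((n, n), matvec=matvec, dtype=np.float64)

# Three algebraically largest eigenvalues and their eigenvectors.
vals, vecs = eigsh(A, k=3, which="LA")
```

The hard part in practice is usually making matvec cheap: if entry(i, j) can be non-zero anywhere, each product costs n*n function calls, so it pays to exploit whatever structure the function has.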
What is the more generalized term?
Why is MATLAB named matrix laboratory, then?
A matrix is a practical way to represent a linear transformation from a space of dimension n to a space of dimension m in the form of an m x n array of scalar values (m rows, n columns).
It is also a very practical way to perform linear algebra operations systematically, in a form that can be implemented on a computer. For instance, if matrix A represents the linear transformation f and matrix B the linear transformation g, then the composition f o g is written as A*B, where * denotes matrix multiplication. Matlab also has a lot of routines related to matrix operations (i.e. linear algebra operations) like det, pinv, svd, etc.
As you can still see nowadays in Matlab, operators like * and / are strongly tied to matrix operations and thus to linear algebra, which I think was the original goal of Matlab in its early days, hence its name (this is speculative, but I guess not far from reality).
To perform element-wise operations on n-dimensional data sets, you have to write .* or ./, denoting that you are now performing array operations.
I would not say array operations encompass matrix operations; they are different. The latter relate to linear algebra, while the former are just a practical way to operate on large sets of data. These data are not limited to being numbers; they are just n-dimensional data sets of whatever type (strings, numbers, cells, etc.).
Matlab also has a very concise syntax for performing array operations on sub-blocks (i.e. linear/logical subscripts), which makes it very easy to reorganize data sets in just one line of code before applying subsequent matrix or array operations.
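The distinction between matrix and array operations can be checked numerically. The sketch below uses NumPy rather than MATLAB, purely as an illustration: NumPy's @ plays the role of MATLAB's * (matrix product) and NumPy's * plays the role of MATLAB's .* (element-wise product). The matrices A and B and the vector x are arbitrary example values.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # represents a linear map f
B = np.array([[0.0, 1.0], [1.0, 0.0]])  # represents a linear map g
x = np.array([5.0, 7.0])

# Applying g and then f is the same as applying the matrix product once.
composed = A @ (B @ x)
via_product = (A @ B) @ x

# The element-wise (array) product is a different operation altogether.
elementwise = A * B
```

Here composed and via_product agree, while elementwise differs from A @ B, which is the point made above: one operation is linear algebra, the other is bookkeeping on data.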
If you're asking about MATLAB, the word "matrix" typically refers to a 2d array, whereas an "array" can be n-dimensional.
Early versions of MATLAB supported only 2d matrices, not n-dimensional arrays. I believe support for n-dimensional arrays was introduced in version 5 of MATLAB.
I would say that MATLAB's matrix is a more advanced kind of array if you compare it to C-style arrays, e.g. double array[], or the Java array, e.g. double arry2[]. I would also say that the MATLAB matrix is better for mathematical purposes than the C++ vector or the Java ArrayList. However, if you mean the MATLAB array, I would say that it is more complicated. I would then recommend the link about MATLAB data which describes the mxArray type, used to store most of the data in MATLAB.
The question is hard to answer completely without a better description of what you mean by array, but I would say that, regarding the type, there is no difference between an array like a = [1,2,3,4] and a matrix like b = [1,2,3,4;5,6,7,8]. There can also be matrices of higher dimensions, such as c = ones(3,4,3). These are in general called matrices as well in MATLAB or, if you need to be more specific, N-dimensional matrices.
I have a huge sparse matrix (1,000 x 1,000,000) that I cannot load into MATLAB (not enough RAM).
I want to visualize this matrix to have an idea of its sparsity and of the differences of the values.
Because of the memory constraints, I want to proceed as follows:
1- Divide the matrix into 4 matrices
2- Load each matrix on matlab and visualize it so that the colors give an idea of the values (and of the zeros particularly)
3- "Stick" the 4 images I will get in order to have a global idea for the original matrix
(i) Is it possible to load "part of a matrix" in matlab?
(ii) For the visualization tool, I read about spy (and daspect). However, this function only shows where the non-zero values are, regardless of their magnitude. Is there a way to add a color code?
(iii) How can I "stick" plots in order to make one?
If your matrix is sparse, then it seems that the current method of storing it (as a full matrix in a text file) is very inefficient, and certainly makes loading it into MATLAB very hard. However, I suspect that as long as it is sparse enough, it can still be loaded into MATLAB as a sparse matrix.
The traditional way of doing this would be to load it all in at once, then convert to sparse representation. In your case, however, it would make sense to read in the text file, one line at a time, and convert to a MATLAB sparse matrix on-the-fly.
You can find out if this is possible by estimating the sparsity of your matrix, and using this to see if the whole thing could be loaded into MATLAB's memory as a sparse matrix.
Try something like: (untested code!)
% initialise sparse matrix
sparse_matrix = sparse(num_rows, num_cols);
row_num = 1;
fid = fopen(filename);
% read each line of text file in turn
while ~feof(fid)
    % fgetl reads a single line; fscanf(fid, '%f') would consume the whole file
    this_line = sscanf(fgetl(fid), '%f');
    % add row to sparse matrix (sscanf returns a column vector, hence the transpose)
    sparse_matrix(row_num, :) = this_line';
    row_num = row_num + 1;
end
fclose(fid);
% visualise using spy
spy(sparse_matrix)
Visualisation
With regards to visualisation: visualising a sparse matrix like this via a tool like imagesc is possible, but I believe it may internally create the full matrix – maybe someone can confirm if this is true or not. If it does, then it's going to cause you memory problems.
All spy is really doing is plotting in 2D the locations of the non-zero elements. You can fairly easily write your own spy function, which can have different coloured or sized points depending on the values at each location. See this answer for some examples.
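One way to get a colour-coded picture without ever forming the full matrix is to bin the non-zero entries into a coarse grid and plot that small grid instead. The following is a sketch in Python/NumPy rather than MATLAB; the entries, the matrix shape and the grid size are all made-up example values.

```python
import numpy as np

# Hypothetical non-zero entries (0-based row, column, value) of a 4 x 8 matrix.
rows = np.array([0, 0, 1, 3, 3])
cols = np.array([0, 5, 1, 6, 7])
vals = np.array([2.0, -1.0, 4.0, 0.5, 3.0])
shape = (4, 8)
bins = (2, 4)  # coarse grid: each cell covers a 2 x 2 block of the matrix

# Map every entry to the coarse cell that contains it.
r = rows * bins[0] // shape[0]
c = cols * bins[1] // shape[1]

counts = np.zeros(bins)
magnitude = np.zeros(bins)
np.add.at(counts, (r, c), 1)                # number of non-zeros per cell
np.add.at(magnitude, (r, c), np.abs(vals))  # summed |value| per cell
```

Either small array can then be handed to an image-plotting routine: counts shows the sparsity pattern, and magnitude shows where the large values sit. Because only the list of non-zeros is needed, the entries can be accumulated chunk by chunk if they do not fit in memory at once.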
Saving sparse matrices
As I say above, the way your matrix is currently saved is pretty inefficient: for a matrix with 10% sparsity, around 95% of your text file will be a zero or a space. I don't know where this data has come from, but if you have any control over its creation (e.g. it comes from another program you have written) it would make much more sense to save only the non-zero elements in the format row_idx, col_idx, value.
You can then use spconvert to import the sparse matrix directly.
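To make the triplet format concrete, here is a sketch of reading it in Python with scipy.sparse.coo_matrix, which plays a role similar to spconvert. The file contents and the matrix dimensions are hypothetical, and the indices are assumed to be 1-based as they would be for MATLAB.

```python
import io
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical file contents: one "row_idx col_idx value" triplet per line,
# with 1-based indices as MATLAB would write them.
text = io.StringIO("1 1 2.5\n1 4 -1.0\n3 2 7.0\n")

triplets = np.loadtxt(text, ndmin=2)
rows = triplets[:, 0].astype(int) - 1  # convert to 0-based for Python
cols = triplets[:, 1].astype(int) - 1
vals = triplets[:, 2]

num_rows, num_cols = 3, 5  # assumed to be known in advance
S = coo_matrix((vals, (rows, cols)), shape=(num_rows, num_cols))

# Fraction of entries that are non-zero.
density = S.nnz / (num_rows * num_cols)
```

The file holds three numbers per non-zero entry and nothing else, so its size scales with the number of non-zeros rather than with the full matrix dimensions.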
One of the simplest methods (if you can actually store the full sparse matrix in RAM) is to use gnuplot to visualize the sparsity pattern.
I was able to spy matrices of size 10-20GB using gnuplot without problems. But make sure you use the png or jpeg format to output the image. Note that you don't need the values of the non-zero entries, only the integer pairs (row, col). Plot them with: plot "row_col.dat" using 1:2 with points
This uses the rows as the x axis and the columns as the y axis, and plots a point for each non-zero entry. It is very easy to do, and it is the most scalable solution I know. Gnuplot works at decent speed even for very large datasets (>10GB of [row, col] pairs), whereas Matlab just hangs (with due respect).
I use imagesc() to visualise arrays. It scales the values in the array to span the full range of the current colormap, then displays the array as a bitmap image (you can change the colormap, e.g. to a greyscale one, to make detail easier to see).
I come from the C/C++ programming world and am finding it difficult to understand what exactly a vector / matrix is in MATLAB, and why they are not simply called arrays everywhere.
What is a vector in MATLAB, and why is it not called or referred to as an array?
The "MAT" in MATLAB is for Matrix, not Math. In MATLAB, basically everything you do is calculations with what you would call matrices / vectors in mathematical terms.
It is common to call a numeric array a matrix (or a vector if it's 1xn or nx1), and to call other kinds of arrays simply arrays. You'll see terms like cell array, which is an array of cells.
This way you can use mathematical terms when describing calculations with numerical arrays. For instance inv can be used to find the inverse of a matrix, instead of the inverse of a numeric array. (Btw, never use inv, it was just an example).
Matlab is designed to be used as a "Matrix-Lab": a tool for numerically processing linear-algebra objects such as vectors and matrices. So, in terms of "data structures" it does indeed work with n-dimensional arrays, but it has special names for the special cases: "vector" for a 1-d array and "matrix" for a 2-d array.
I've got a really big matrix which I should "upscale" (i.e.: create another matrix where the elements of the first are grouped 40-by-40). For every 40-by-40 group I should evaluate a series of parameters (i.e.: frequencies, average and standard deviation).
I'm quite sure I can do this with a loop, but I was wondering whether there is a more elegant vectorized method...
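You might find blockproc (part of the Image Processing Toolbox) useful. This command allows you to apply a function (e.g. one built from @mean, @std, etc.) to each distinct block in a 2D matrix. Note that the function is passed a block struct, with the block's values in its data field.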
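The same per-block statistics can also be obtained without any loop by reshaping, which is a different technique from blockproc. The sketch below is in Python/NumPy rather than MATLAB, uses a block size of 2 instead of 40 to keep the example small, and assumes that both matrix dimensions are exact multiples of the block size.

```python
import numpy as np

block = 2  # the question uses 40; 2 keeps the example readable
M = np.arange(16, dtype=float).reshape(4, 4)

# Assumes both dimensions are exact multiples of the block size.
rows, cols = M.shape
blocks = M.reshape(rows // block, block, cols // block, block)

# Axes 1 and 3 index the position inside each block, so reducing over
# them leaves one statistic per block.
block_mean = blocks.mean(axis=(1, 3))
block_std = blocks.std(axis=(1, 3))  # population std (divides by N, not N-1)
```

The result has one entry per block, i.e. it is the "upscaled" matrix. Note that NumPy's std divides by N by default while MATLAB's std divides by N-1; passing ddof=1 would match MATLAB.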
MATLAB documentation of SVD states that the diagonal matrix returned has singular values in decreasing order. Is there a way to find out what the natural ordering of singular values would be?
The reason I ask is because the singular values correspond to dimensions associated with rows of the input matrix.
No, the very definition of the SVD does not introduce an ordering. Restricting the discussion to square matrices X and adopting the same notation as the cited MATLAB documentation, if X = U*S*V' is an SVD of X, then for every permutation matrix P we can form another valid SVD as X = (U*P)*(P'*S*P)*(V*P)', because P*P' is the identity. Presenting the matrix S with descending values is just a matter of convenience: every permutation P'*S*P would do the same job.
As a side note: P*X = (P*U)*S*V', which shows that a row permutation of the matrix X does not change the singular values S. They can therefore be considered independent of any row (or column) permutation of X.
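Both claims are easy to check numerically. The sketch below is in Python/NumPy rather than MATLAB; the random 4 x 4 matrix and the particular permutation are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(X)  # s comes back in descending order
S = np.diag(s)
V = Vt.T

perm = [2, 0, 3, 1]      # an arbitrary reordering
P = np.eye(4)[:, perm]   # the corresponding permutation matrix

# Reordered factors still reproduce X.
X_rebuilt = (U @ P) @ (P.T @ S @ P) @ (V @ P).T
reordered = np.diag(P.T @ S @ P)  # same singular values, new order

# Permuting the rows of X leaves the singular values unchanged.
s_rowperm = np.linalg.svd(P @ X, compute_uv=False)
```

X_rebuilt matches X even though reordered is no longer sorted, and s_rowperm matches s, so the order in which the singular values are reported carries no information about the rows of X.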
I was hoping to get some idea of what is being asked here before responding. For example, the eigenshuffle tool I've posted on the file exchange allows you to reorder the eigenvalues and eigenvectors of a sequence of eigen-problems, so they are maximally consistent with each other in sequence. Perhaps your problem is similar, thus you might think of the singular values as functions that vary along with some parameter that drives a system.
But really, there is no natural ordering of the singular values that comes from the method used to compute the SVD. In fact, the only ordering that makes sense is the one that comes out - decreasing order. The order of the singular values is not dependent on the sequence of the rows of your matrix, as the question seems to vaguely imply, so I'm not sure what is meant there.
Feel free to modify the question in case you can make your needs clearer.