I want to create a huge matrix with SciPy, 70,000 rows, 70,000 columns - scipy

I want to use Python's SciPy to create a huge matrix, 70,000 rows, 70,000 columns. Which sparse matrix in SciPy would work?
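A minimal sketch of one common answer, assuming the matrix is mostly zeros: build it incrementally with `lil_matrix` (efficient for assignment), then convert to `csr_matrix` for fast arithmetic. A dense 70,000 x 70,000 float64 matrix would need roughly 39 GB, while a sparse format stores only the nonzero entries.

```python
from scipy.sparse import lil_matrix

n = 70_000
m = lil_matrix((n, n))            # efficient for incremental assignment
m[0, 1] = 3.5                     # set a few example entries
m[12_345, 67_890] = -1.0
csr = m.tocsr()                   # convert for fast arithmetic / matvec
print(csr.nnz)                    # number of stored nonzeros -> 2
```

`coo_matrix` is another good choice when the nonzeros are already available as (row, col, value) triplets.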

Related

Sparse Matrix Coding in Matlab

I have a dataset in which I have a 25,000 by 25,000 matrix for each timepoint with around 60% of the cells being zeroes. I need to be able to take the eigenvalues of my matrices and reorder the columns and rows. Currently, I am running into memory errors and am considering implementing sparse matrices to save memory. Is my dataset sparse enough to do this? Would I simply use the 'sparse' and 'eigs' commands to manipulate these matrices? Thank you in advance!

how to generate a large square binary matrix in MATLAB

I need to generate a large square binary sparse matrix in MATLAB (about 100k x 100k), but I get an "out of memory" error.
Can anybody help?
A 100,000 x 100,000 dense matrix contains 10,000,000,000 doubles. At 8 bytes each, that's 80,000,000,000 bytes, i.e. about 74.5 GiB.
I seriously doubt you have 80 GB of RAM (let alone allocated only to MATLAB), so presumably you'll have to find another way to process your data in chunks.
EDIT Apologies, I only just noticed the sparse bit.
If you try to initialise your sparse matrix as sparse(zeros(100000,100000)), this will fail for the above reason (i.e. you're asking Octave/MATLAB to first store a ~75 GB matrix of zeros, and only then convert it to a sparse matrix).
Instead, you should initialise your 100,000x100,000 sparse matrix like so:
s = sparse(100000,100000);
and then proceed to fill in its contents.
Assuming the number of nonzero elements in your sparse matrix is low enough to be handled easily by your system's memory, and that you have a way of filling in the values you have in mind without allocating a big dense matrix first, this should work fine.
Have a look at the sparse function for other ways of initialising a sparse matrix from data.
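The same pitfall exists in SciPy: `csr_matrix(np.zeros((100000, 100000)))` would first allocate the dense array. A sketch of the safe pattern, the SciPy analogue of MATLAB's `sparse(i, j, v, m, n)`, building directly from (row, col, value) triplets with no dense intermediate:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Example triplets; only these three entries are ever stored.
rows = np.array([0, 7, 99_999])
cols = np.array([3, 7, 0])
vals = np.array([1.0, 2.0, 3.0])
s = coo_matrix((vals, (rows, cols)), shape=(100_000, 100_000)).tocsr()
print(s.nnz)   # -> 3
```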
Try increasing the size of the swap file of your system.

Fast way to set many values of sparse matrix

I have a sparse 5018x5018 matrix in MATLAB, which has about 100k values set to 1 (i.e., about 99.6% empty).
I'm trying to flip roughly 5% of those zeros to ones (i.e., about 1.25m entries). I have the x and y indices in the matrix I want to flip.
Here is what I have done:
sizeMat=size(network);
idxToReplace=sub2ind(sizeMat,x_idx, y_idx);
network(idxToReplace) = 1;
This is incredibly slow, in particular the last line. Is there any way to make this operation run noticeably faster, preferably without using mex files?
This should be faster:
idxToReplace = sparse(x_idx, y_idx, ones(size(x_idx)), size(network,1), size(network,2)); % Create a sparse matrix with ones at the given locations
network = network + idxToReplace; % Add the two matrices
I think your solution is very slow because you create a 1.26e6-element index vector for your points and then assign them into the sparse matrix one location at a time. In my solution, you only create a second sparse matrix and sum the two.
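For reference, the same batched idea in SciPy, with hypothetical index arrays `x_idx` and `y_idx` standing in for the ones in the question: build one sparse matrix from all the new entries at once and add it, rather than assigning indices into a CSR matrix one at a time (which is slow and triggers sparse-efficiency warnings).

```python
import numpy as np
from scipy.sparse import csr_matrix, coo_matrix

n = 5018
rng = np.random.default_rng(0)
network = csr_matrix((n, n))                     # existing sparse matrix
x_idx = rng.integers(0, n, size=100_000)         # entries to flip to 1
y_idx = rng.integers(0, n, size=100_000)

# One bulk construction plus one sparse add, instead of 100k assignments.
updates = coo_matrix((np.ones(len(x_idx)), (x_idx, y_idx)), shape=(n, n))
network = network + updates.tocsr()
network.data[:] = 1.0   # duplicates are summed during conversion; clamp back to 1
```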

Multiplying two Huge matrices using spark

I am currently learning Spark. I want to compute the matrix W, defined as
W = B*H'*inverse(R + H*B*H'), where each variable is a matrix and
B = eb*eb' (eb' is the transpose of the vector eb) [400000 * 400000]
R = eo*eo' (eo' is the transpose of the vector eo) [200000 * 200000]
H is a sparse matrix [200000 * 400000]
The eb vector is 400000 * 1, so my B matrix is 400000 * 400000. The problem is storing all of this: I am currently using a computer with 4 GB of RAM and 500 GB of disk space. I initially did this in MATLAB by block multiplication, writing the output to a file; the output file is more than 300 GB and it takes a very long time. I then did it with Spark, which takes less time, but the output file is the same size.
I used this matrix-multiplication method to multiply two vectors.
After computing the B matrix, I am unable to compute B*H', since multiplying the two matrices requires storing them in RAM; when I run the code above it throws a memory exception. Is there any way to do this computation with limited memory, i.e. without bringing everything into memory? And how can I compute the inverse of a huge matrix of size [200000 * 200000]?
If almost all entries of your matrices are 0, you might want to consider using a sparse matrix data structure, storing only the main diagonal and a map from positions to nonzero entries.
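A minimal sketch of that suggestion in Python; the class name and methods are illustrative, not from any library: a densely stored main diagonal plus a dict mapping (row, col) positions to the off-diagonal nonzeros.

```python
class DiagPlusSparse:
    """Toy sparse matrix: dense main diagonal + dict of other nonzeros."""

    def __init__(self, n):
        self.n = n
        self.diag = [0.0] * n          # main diagonal stored densely
        self.off = {}                  # (i, j) -> value, for i != j

    def set(self, i, j, v):
        if i == j:
            self.diag[i] = v
        elif v != 0.0:
            self.off[(i, j)] = v
        else:
            self.off.pop((i, j), None)  # storing a zero means removing it

    def get(self, i, j):
        return self.diag[i] if i == j else self.off.get((i, j), 0.0)

m = DiagPlusSparse(200_000)
m.set(5, 5, 2.0)
m.set(3, 9, -1.5)
```

Memory use is proportional to n plus the number of off-diagonal nonzeros, instead of n squared.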

Averaging Binned Values

I have an NxM matrix and I am binning on the first column into n buckets. What I would like to know is whether I can then take the average of the values in the remaining columns of my matrix within each bin.
So basically, if I have a 2x1000 matrix where I bin on the data in the first column, I'd like to be able to average the values in each bin based on the second column of my matrix.
Perhaps a function exists that will easily allow me to do this. I appreciate any input.
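One way the described operation could be sketched with NumPy, assuming rows are observations (i.e. the data is N x 2): bucket on column 0 with `np.digitize`, then average column 1 per bucket with `np.bincount`. The random data here is just a stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((1000, 2))       # column 0: bin key, column 1: values

n_bins = 10
edges = np.linspace(data[:, 0].min(), data[:, 0].max(), n_bins + 1)
bins = np.clip(np.digitize(data[:, 0], edges) - 1, 0, n_bins - 1)

# Mean of column 1 within each bin (NaN for empty bins).
sums = np.bincount(bins, weights=data[:, 1], minlength=n_bins)
counts = np.bincount(bins, minlength=n_bins)
means = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```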