Converting a scipy.sparse.csr.csr_matrix into a form readable by MATLAB - matlab

I have a scipy.sparse.csr.csr_matrix which is the output from TfidfVectorizer() class. I know I can access the individual components of this matrix in this manner:
So if I have this matrix here:
tf_idf_matrix = vectorizer.fit_transform(lines)
I can access the individual components here:
tf_idf_matrix.data
tf_idf_matrix.indices
tf_idf_matrix.indptr
How do I save this from Python- such that I can load it into a MATLAB sparse matrix? OR how do I change it into a dense array, and save it as one numpy.ndarray text file - such that I can just simply load it into MATLAB as a matrix. The size of this matrix is not VERY large - its (5000, 68k)
Please help. Thanks

The MATLAB sparse constructor:
S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an m-by-n sparse matrix such that S(i(k),j(k)) = s(k), with space allocated for nzmax nonzeros
is the same as the scipy sparse (including the step of adding values with ij are same).
csr_matrix((data, ij), [shape=(M, N)])
where data and ij satisfy the relationship a[ij[0, k], ij[1, k]] = data[k]
data and ij the attributes of the coo_matrix format. So for a start I'd suggest converting tocoo and writing the three arrays to a .mat file (scipy.io).

Assuming you have those components in matlab
then
x = accumarray(indptr+1, ones(size(indptr)),[1,N]);
% N being the number of rows >= max indptr+1
colind = cumsum(x);
res = sparse(colind,indices,data);
should do it.
The first part just converts the indptr vector into a vector to match each index with the right column number.
(note that indptr may have repititions and this is why the accumarray is necessary )

Related

One hot encode column vectors in matrix without iterating

I am implementing a neural network and am trying to one hot encode a matrix of column vectors based on the max value in each column. Previously, I had been iterating through the matrix vector by vector, but I've been told that this is unnecessary and that I can actually one hot encode every column vector in the matrix at the same time. Unfortunately, after perusing SO, GitHub, and MathWorks, nothing seems to be getting the job done. I've listed my previous code below. Please help! Thanks :)
UPDATE:
This is what I am trying to accomplish...except this only changed the max value in the entire matrix to 1. I want to change the max value in each COLUMN to 1.
one_hots = bsxfun(#eq, mini_batch_activations, max(mini_batch_activations(:)))
UPDATE 2:
This is what I am looking for, but it only works for rows. I need columns.
V = max(mini_batch_activations,[],2);
idx = mini_batch_activations == V;
Iterative code:
% This is the matrix I want to one hot encode
mini_batch_activations = activations{length(layers)};
%For each vector in the mini_batch:
for m = 1:size(mini_batch_activations, 2)
% Isolate column vector for mini_batch
vector = mini_batch_activations(:,m);
% One hot encode vector to compare to target vector
one_hot = zeros(size(mini_batch_activations, 1),1);
[max_val,ind] = max(vector);
one_hot(ind) = 1;
% Isolate corresponding column vector in targets
mini_batch = mini_batch_y{k};
target_vector = mini_batch(:,m);
% Compare one_hot to target vector , and increment result if they match
if isequal(one_hot, target_vector)
num_correct = num_correct + 1;
endif
...
endfor
You’ve got the maxima for each column:
V = max(mini_batch_activations,[],1); % note 1, not 2!
Now all you need to do is equality comparison, the output is a logical array that readily converts to 0s and 1s. Note that MATLAB and Octave do implicit singleton expansion:
one_hot = mini_batch_activations==V;

Access multiple elements and assign to each selected element a different value

I need to know if there is any efficient way of doing the following in MATLAB.
I have several big sparse matrices, the size of each one is roughly 9000000x9000000.
I need to access multiple element of such matrix and assign to each selected element a different value stored in another array. I'll give an example:
What I have:
SPARSE MATRIX of size 9000000x9000000
Matrix with the list of indexes and values I want to access, this is a matrix like this:
[row1, col1, value1;
row2, col2, value2;
...
rowN, colN, valueN]
Where N is the length of such matrix.
What I need:
Assign to the SPARSE MATRIX the corresponding value to the corresponding index, this is:
SPARSE_MATRIX(row1, col1) = value1
SPARSE_MATRIX(row2, col2) = value2
...
SPARSE_MATRIX(rowN, colN) = valueN
Thanks in advance!
EDIT:
Thank you to both for answering, I think I did not explain myself well, I'll try again.
I already have a large SPARSE MATRIX of about 9000000 rows x 9000000 columns, it is a SPARSE MATRIX filled with zeros.
Then I have another array or matrix, let's call it M with N number of rows, where N could take values from 0 to 9000000; and 3 columns. The first two columns are used to index an element of my SPARSE MATRIX, and the third column stores the value I want to transfer to the SPARSE MATRIX, this is, given a random row of M, i:
SPARSE_MATRIX(M(i, 1), M(i, 2)) = M(i, 3)
The idea is to do that for all the rows, I have tried it with common indexing:
SPARSE_MATRIX(M(:, 1), M(:, 2)) = M(:, 3)
Now I would like to do this assignation for all the rows in M as fast as possible, because if I use a loop or common indexing it takes ages (I am using a 7th Gen i7 processor with 16 GB of RAM). And I also need to keep the zeros in the SPARSE_MATRIX.
EDIT 2: SOLVED! Thank you Metahominid, I was not thinking through, but yes the sparse function does solve my problem, I just think my brain circuits were shortcircuited yesterday and was unable to see through it hahaha. Thank you to both anyway!
Regards!
You can construct a sparse matrix like this.
A = sparse(i,j,v)
S = sparse(i,j,v) generates a sparse matrix S from the triplets i, j,
and v such that S(i(k),j(k)) = v(k). The max(i)-by-max(j) output
matrix has space allotted for length(v) nonzero elements. sparse adds
together elements in v that have duplicate subscripts in i and j.
So you can simply construct the row vector, column vector and value vector.
I am answering in part because I cannot comment. You question seems a little confusing to me. The sparse() function in MATLAB does just this.
You can enter your arrays of indices and values directly into the interface, or declare a sparse matrix of zeros and set each individually.
Given your data format make three vectors, ROWS = [row1; ...; rown], COLS = [col1; ...; coln], and DATA = [val1; ... valn]. I am assuming that your size is the overall size of the full matrix and not the sparse portion.
Then
A = sparse(ROWS, COLS, DATA) will do just what you want. You can even specify the original matrix size.
A = sparse(ROWS, COLS, DATA, 90...., 90....).

Address Complex numbers and create new matrix in MATLAB

I have a complex matrix of size 3x2x372 complex double. I would like to work with only one specific of these three dimensions. Therefore, I used the following code to make the table easier to read:
new_output = abs(output);
In fact, the new matrix is of size 3x2x372 double. I guess it makes the further computation simpler. So I obtain the following output:
I would now like to create a matrix that only refers to the highlighted values. So it should ideally be of size 2x372 double.
Make a for-loop and assign the last row to a new matrix.
mat = zeroes(372, 2)
for k = 1:372
a = val(:, :, k)
mat(k, :) = a(1, :)
end
Edit: above gives you a 372x2 matrix. Use below to get a 2x372 matrix
mat = zeroes(2, 372)
And
mat(:, k) = a(1,:).'
in the loop
Actually, you need the last row from each "slice", so you can get it by:
new_output=data(size(data,1),:,:);
But that will give you the same dimensions as the original matrix, with 3D. To directly get it as 2D matrix, use squeeze:
new_output=squeeze(data(size(data,1),:,:));

Matrix indices for creating sparse matrix

I want to create a 4 by 4 sparse matrix A. I want assign values (e.g. 1) to following entries:
A(2,1), A(3,1), A(4,1)
A(2,2), A(3,2), A(4,2)
A(2,3), A(3,3), A(4,3)
A(2,4), A(3,4), A(4,4)
According to the manual page, I know that I should store the indices by row and column respectively. That is, for row indices,
r=[2,2,2,2,3,3,3,3,4,4,4,4]
Also, for column indices
c=[1,2,3,4,1,2,3,4,1,2,3,4]
Since I want to assign 1 to each of the entries, so I use
value = ones(1,length(r))
Then, my sparse matrix will be
Matrix = sparse(r,c,value,4,4)
My problem is this:
Indeed, I want to construct a square matrix of arbitrary dimension. Says, if it is a 10 by 10 matrix, then my column vector will be
[1,2,..., 10, 1,2, ..., 10, 1,...,10, 1,...10]
For row vector, it will be
[2,2,...,2,3,3,...,3,...,10, 10, ...,10]
I would like to ask if there is a quick way to build these column and row vector in an efficient manner? Thanks in advance.
I think the question aims to create vectors c,r in an easy way.
n = 4;
c = repmat(1:n,1,n-1);
r = reshape(repmat(2:n,n,1),1,[]);
Matrix = sparse(r,c,value,n,n);
This will create your specified vectors in general.
However as pointed out by others full sparse matrixes are not very efficient due to overhead. If I recall correctly a sparse matrix offers advantages if the density is lower than 25%. Having everything except the first row will result in slower performance.
You can sparse a matrix after creating its full version.
A = (10,10);
A(1,:) = 0;
B = sparse(A);

How to see resampled data after BOOTSTRAP

I was trying to resample (with replacement) my database using 'bootstrap' in Matlab as follows:
D = load('Data.txt');
lead = D(:,1);
depth = D(:,2);
X = D(:,3);
Y = D(:,4);
%Bootstraping to resample 100 times
[resampling100,bootsam] = bootstrp(100,'corr',lead,depth);
%plottig the bootstraping result as histogram
hist(resampling100,10);
... ... ...
... ... ...
Though the script written above is correct, I wonder how I would be able to see/load the resampled 100 datasets created through bootstrap? 'bootsam(:)' display the indices of the data/values selected for the bootstrap samples, but not the new sample values!! Isn't it funny that I'm creating fake data from my original data and I can't even see what is created behind the scene?!?
My second question: is it possible to resample the whole matrix (in this case, D) altogether without using any function? However, I know how to create random values from a vector data using 'unidrnd'.
Thanks in advance for your help.
The answer to question 1 is that bootsam provides the indices of the resampled data. Specifically, the nth column of bootsam provides the indices of the nth resampled dataset. In your case, to obtain the nth resampled dataset you would use:
lead_resample_n = lead(bootsam(:, n));
depth_resample_n = depth(bootsam(:, n));
Regarding the second question, I'm guessing what you mean is, how would you just get a re-sampled dataset without worrying about applying a function to the resampled data. Personally, I would use randi, but in this situation, it is irrelevant whether you use randi or unidrnd. An example follows that assumes 4 columns of some data matrix D (as in your question):
%# Build an example dataset
T = 10;
D = randn(T, 4);
%# Obtain a set of random indices, ie indices of draws with replacement
Ind = randi(T, T, 1);
%# Obtain the resampled data
DResampled = D(Ind, :);
To create multiple re-sampled data, you can simply loop over the creation of random indices. Or you could do it in one step by creating a matrix of random indices and using that to index D. With careful use of reshape and permute you can turn this into a T*4*M array, where indexing m = 1, ..., M along the third dimension yields the mth resampled dataset. Example code follows:
%# Build an example dataset
T = 10;
M = 3;
D = randn(T, 4);
%# Obtain a set of random indices, ie indices of draws with replacement
Ind = randi(T, T, M);
%# Obtain the resampled data
DResampled = permute(reshape(D(Ind, :)', 4, T, []), [2 1 3]);