How can I efficiently convert a scipy sparse matrix into a sympy sparse matrix?

I have a matrix A with the following properties.
<1047x1047 sparse matrix of type '<class 'numpy.float64'>'
with 888344 stored elements in Compressed Sparse Column format>
A has this content.
array([[ 1.00000000e+00, -5.85786642e-17, -3.97082034e-17, ...,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 6.82195979e-17, 1.00000000e+00, -4.11166786e-17, ...,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[-4.98202332e-17, 1.13957868e-17, 1.00000000e+00, ...,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
...,
[ 4.56847824e-15, 1.32261454e-14, -7.22890998e-15, ...,
1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[-9.11597396e-15, -2.28796167e-14, 1.26624823e-14, ...,
0.00000000e+00, 1.00000000e+00, 0.00000000e+00],
[ 1.80765584e-14, 1.93779820e-14, -1.36520100e-14, ...,
0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
Now I'm trying to create a sympy sparse matrix from this scipy sparse matrix.
from sympy.matrices import SparseMatrix
A = SparseMatrix(A)
But I get this error message.
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
I am confused because this matrix has no logical entries.
Thanks for any help!

The Error
When you get an error that you don't understand, take a bit of time to look at the traceback. Or at least show it to us!
In [288]: M = sparse.random(5,5,.2, 'csr')
In [289]: M
Out[289]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [290]: print(M)
(1, 1) 0.17737340878962138
(2, 2) 0.12362174819457106
(2, 3) 0.24324155883057885
(3, 0) 0.7666429046432961
(3, 4) 0.21848551209470246
In [291]: SparseMatrix(M)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-291-cca56ea35868> in <module>
----> 1 SparseMatrix(M)
/usr/local/lib/python3.6/dist-packages/sympy/matrices/sparse.py in __new__(cls, *args, **kwargs)
206 else:
207 # handle full matrix forms with _handle_creation_inputs
--> 208 r, c, _list = Matrix._handle_creation_inputs(*args)
209 self.rows = r
210 self.cols = c
/usr/local/lib/python3.6/dist-packages/sympy/matrices/matrices.py in _handle_creation_inputs(cls, *args, **kwargs)
1070 if 0 in row.shape:
1071 continue
-> 1072 elif not row:
1073 continue
1074
/usr/local/lib/python3.6/dist-packages/scipy/sparse/base.py in __bool__(self)
281 return self.nnz != 0
282 else:
--> 283 raise ValueError("The truth value of an array with more than one "
284 "element is ambiguous. Use a.any() or a.all().")
285 __nonzero__ = __bool__
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
A full understanding requires reading the sympy code, but a cursory look indicates that it's trying to handle your input as a "full matrix" and iterates over its rows. The error isn't the result of you doing logical operations on the entries; it's sympy doing a logical test on your sparse matrix. It's trying to check whether each row is empty (so it can skip it).
The SparseMatrix docs may not be the clearest, but most examples show either a dict of points, a flat list of ALL values plus the shape, or a list of lists. I suspect it's trying to treat your matrix as the last kind, looking at it row by row.
But the row of M is itself a sparse matrix:
In [295]: [row for row in M]
Out[295]:
[<1x5 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>,
<1x5 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>,
...]
And the emptiness test, not row, produces this error:
In [296]: not [row for row in M][0]
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
So clearly SparseMatrix cannot handle a scipy.sparse matrix as is (at least not in the csr or csc format, and probably not in the others). Plus, scipy.sparse is not mentioned anywhere in the SparseMatrix docs!
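To see the root cause in isolation, here is a minimal sketch (the small 2x3 matrix is made up for illustration) showing that a row of a csr_matrix refuses a plain truth test, while its nnz attribute answers the "is this row empty?" question directly:

```python
import scipy.sparse as sparse

# Made-up 2x3 example: row 0 has one stored element, row 1 has none.
M = sparse.csr_matrix([[0.0, 1.5, 0.0],
                       [0.0, 0.0, 0.0]])

first_row = M[0]   # a 1x3 sparse matrix, not a list or 1-D array
try:
    bool(first_row)             # the same kind of test as sympy's `elif not row:`
    truth_test_failed = False
except ValueError as err:
    truth_test_failed = True
    print("bool(row) raised ValueError:", err)

# Asking for the number of stored elements works for any row shape.
row_counts = [row.nnz for row in M]
print(row_counts)   # [1, 0]
```

scipy only allows bool() on a 1x1 sparse matrix (that is the self.nnz != 0 branch in the traceback), which is why any wider row fails.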
from dense array
Converting the sparse matrix to its dense equivalent does work (M.A is shorthand for M.toarray()):
In [297]: M.A
Out[297]:
array([[0. , 0. , 0. , 0. , 0. ],
[0. , 0.17737341, 0. , 0. , 0. ],
[0. , 0. , 0.12362175, 0.24324156, 0. ],
[0.7666429 , 0. , 0. , 0. , 0.21848551],
[0. , 0. , 0. , 0. , 0. ]])
In [298]: SparseMatrix(M.A)
Out[298]:
⎡ 0 0 0 0 0 ⎤
...⎦
Or a list of lists:
SparseMatrix(M.A.tolist())
from dict
The dok format stores a sparse matrix as a dict, which can then be converted to a plain Python dict:
In [305]: dict(M.todok())
Out[305]:
{(3, 0): 0.7666429046432961,
(1, 1): 0.17737340878962138,
(2, 2): 0.12362174819457106,
(2, 3): 0.24324155883057885,
(3, 4): 0.21848551209470246}
That dict works fine as input:
SparseMatrix(5,5,dict(M.todok()))
I don't know what's most efficient. Generally when working with sympy we (or at least I) don't worry about efficiency. Just get it to work is enough. Efficiency is more relevant in numpy/scipy where arrays can be large, and using the fast compiled numpy methods makes a big difference in speed.
Finally, numpy and sympy are not integrated, and that applies to the sparse versions too. sympy is built on plain Python, not numpy, so inputs in the form of lists and dicts make the most sense.
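If you want to skip both the dense intermediate and the dok conversion, one option is to go through the COO format, whose row, col and data arrays map directly onto the {(i, j): value} dict that SparseMatrix accepts. This is a sketch: the helper name scipy_to_sympy_sparse is made up here, and it has not been benchmarked against the approaches above.

```python
import numpy as np
import scipy.sparse as sparse
from sympy.matrices import SparseMatrix


def scipy_to_sympy_sparse(S):
    """Build a sympy SparseMatrix from a scipy.sparse matrix without densifying it."""
    coo = sparse.coo_matrix(S, copy=True)   # COO exposes row, col and data arrays
    coo.sum_duplicates()                    # COO may repeat (i, j); merge those entries
    # .tolist() turns NumPy scalars into plain Python ints/floats for sympy
    entries = {(i, j): v for i, j, v in
               zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist())}
    return SparseMatrix(coo.shape[0], coo.shape[1], entries)


# Usage with a small csc matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = sparse.csc_matrix((data, (row, col)), shape=(3, 3))
B = scipy_to_sympy_sparse(A)
print(B == SparseMatrix([[1, 0, 2], [0, 0, 3], [4, 5, 6]]))   # True
```

Only the stored elements are touched, but each one still becomes a sympy object, so for a matrix with hundreds of thousands of stored elements expect this to be far slower than anything in scipy itself.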

from sympy.matrices import SparseMatrix
import scipy.sparse as sps
A = sps.random(100, 10, format="dok")
B = SparseMatrix(100, 10, dict(A.items()))
From the perspective of someone who likes efficient memory structures this is like staring into the abyss. But it will work.

This is a simplified version of your error.
import numpy as np
from scipy import sparse
from sympy.matrices import SparseMatrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = sparse.csc_matrix((data, (row, col)), shape=(3, 3))
So A is a sparse matrix with 6 elements:
<3x3 sparse matrix of type '<class 'numpy.intc'>'
with 6 stored elements in Compressed Sparse Column format>
Calling SparseMatrix() on it raises the same kind of error that you got. You might like to convert A to a dense NumPy matrix first (A.toarray() works too):
>>> SparseMatrix(A.todense())
Matrix([
[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])

Related

De-nest elements of cell-matrix into a matrix

EDIT: It turns out this problem is not solved, as the approach fails to handle empty cells in the source data. For example, k = {1 2 [] 4 5}; cat( 2, k{:} ) gives 1 2 4 5, not 1 2 NaN 4 5, so the subsequent reshape is misaligned. Can anyone outline a strategy for handling this?
I have data of the form:
X = { ...
{ '014-03-01' [1.1] [1.2] [1.3] }; ...
{ '014-03-02' [2.1] [2.2] [2.3] }; ... %etc
}
I wish to replace [1.1] with 1.1 etc., and also replace the date with an integer using datenum.
So I may as well use a standard 2D matrix to hold the result (as every element can be expressed as a Double).
But how do I go about this repacking?
I was hoping to switch the dates in-place using X{:,1} = datenum( X{:,1} ) but this command fails.
I can do:
A = cat( 1, X{:} )
dateNums = datenum( cat( 1, A{:,1} ) )
values = reshape( cat( 1, A{:,2:end} ), size(X,1), [] )
final = [dateNums values]
Well, that works, but I don't feel at all comfortable with it.
>> u = A{:,1}
u =
014-03-01
>> cat(1,u)
ans =
014-03-01
This suggests only one value is output. But:
>> cat(1,A{:,1})
ans =
014-03-01
014-03-02
So A{:,1} must be emitting a sequential stream of values, and cat must be accepting varargs.
So now if I do A{:,2:end}, it is now spitting out that 2D subgrid again as a sequential stream of values...? And the only way to get at that grid is to cat -> reshape it. Is this a correct understanding?
I'm finding MATLAB's console output infuriatingly inconsistent.
The "sequential stream of values" is known as a comma-separated list. Doing A{:,1} in MATLAB in the console is equivalent to the following syntax:
>> A{1,1}, A{2,1}, A{3,1}, ..., A{end,1}
This is why you see a stream of values: MATLAB is literally printing each row of the first column of the cell, one after another, as if you had typed them separated by commas at the command prompt. This is probably the source of your frustration, since unpacking a cell into a comma-separated list gives you a verbose dump of all of its contents. In any case, this is why you use cat, because doing cat(1, A{:,1}) is equivalent to doing:
cat(1, A{1,1}, A{2,1}, A{3,1}, ... A{end,1})
The result is that cat takes all the elements in the first column of the 2D cell array and concatenates them into one new array. Similarly, doing A{:, 2:end} is equivalent to (note the column-major order):
>> A{1, 2}, A{2, 2}, A{3, 2}, ..., A{end, 2}, A{1, 3}, A{2, 3}, A{3, 3}..., A{end, 3}, ..., A{end, end}
This is why you need a reshape: if you call cat on this by itself, it only gives you a single vector. You probably want a 2D matrix, so the reshape is necessary to convert the vector into matrix form.
Comma-separated lists are very similar to Python's splat operator, if you're familiar with Python. The splat operator unpacks arguments stored in a single list or other iterable, so for example, if you had the following in Python:
l = [1, 2, 3, 4]
func(*l)
This is equivalent to doing:
func(1, 2, 3, 4)
The above isn't really necessary to understand comma-separated lists, but I just wanted to show you that they're used in many programming languages, including Python.
There is a problem with empty cells: cat will skip them, which means that a subsequent reshape will throw a 'dimension mismatch' error.
The following code simply removes rows containing empty cells (which is what I require) as a preprocessing step.
(It would only take a minor alteration to replace empty cells with NaNs.)
A = cat( 1, X{:} );
% Remove any row containing empty cells
emptiesInRow = sum( cellfun('isempty',A), 2 );
A( emptiesInRow > 0, : ) = [];
% Date is first col
dateNums = datenum( cat( 1, A{:,1} ) );
% Get other cols
values = reshape( cat( 1, A{:,2:end} ), size(A,1), [] );
% Recombine into (double) array
grid = [dateNums values]; %#ok<NASGU>

Accessing sparse matrix elements

I have a very large sparse matrix of the type 'scipy.sparse.coo.coo_matrix'. I can convert to csr with .tocsr(); however, .todense() will not work since the array is too large. I want to be able to extract elements from the matrix as I would with a regular array, so that I can pass row elements to a function.
For reference, when printed, the matrix looks as follows:
(7, 0) 0.531519363001
(48, 24) 0.400946334437
(70, 6) 0.684460955022
...
Make a matrix with 3 elements:
In [550]: M = sparse.coo_matrix(([.5,.4,.6],([0,1,2],[0,5,3])), shape=(5,7))
Its default display (repr(M)):
In [551]: M
Out[551]:
<5x7 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
and its print display (str(M)), which looks like the input:
In [552]: print(M)
(0, 0) 0.5
(1, 5) 0.4
(2, 3) 0.6
convert to csr format:
In [553]: Mc=M.tocsr()
In [554]: Mc[1,:] # row 1 is another matrix (1 row):
Out[554]:
<1x7 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
In [555]: Mc[1,:].A # that row as 2d array
Out[555]: array([[ 0. , 0. , 0. , 0. , 0. , 0.4, 0. ]])
In [556]: print(Mc[1,:]) # like 2nd element of M except for row number
(0, 5) 0.4
Individual element:
In [560]: Mc[1,5]
Out[560]: 0.40000000000000002
The data attributes of these formats (if you want to dig further):
In [562]: Mc.data
Out[562]: array([ 0.5, 0.4, 0.6])
In [563]: Mc.indices
Out[563]: array([0, 5, 3], dtype=int32)
In [564]: Mc.indptr
Out[564]: array([0, 1, 2, 3, 3, 3], dtype=int32)
In [565]: M.data
Out[565]: array([ 0.5, 0.4, 0.6])
In [566]: M.col
Out[566]: array([0, 5, 3], dtype=int32)
In [567]: M.row
Out[567]: array([0, 1, 2], dtype=int32)
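Building on these attributes, here is a sketch of how to pass each row's stored elements to a function without ever calling .todense() on the whole matrix. The helpers row_entries and process_row are hypothetical names; process_row just stands in for whatever per-row function you have.

```python
import scipy.sparse as sparse

# The same 3-element matrix as above, converted to CSR.
M = sparse.coo_matrix(([.5, .4, .6], ([0, 1, 2], [0, 5, 3])), shape=(5, 7))
Mc = M.tocsr()


def row_entries(csr, i):
    """Return (column indices, values) of the stored elements in row i."""
    start, end = csr.indptr[i], csr.indptr[i + 1]   # row i occupies this slice
    return csr.indices[start:end], csr.data[start:end]


def process_row(cols, vals):
    """Hypothetical per-row function: here it just sums the stored values."""
    return float(vals.sum())


results = [process_row(*row_entries(Mc, i)) for i in range(Mc.shape[0])]
print(results)   # [0.5, 0.4, 0.6, 0.0, 0.0]

# If the function needs a dense row, densify one row at a time, never the whole matrix.
dense_row = Mc[1].toarray().ravel()   # shape (7,), 0.4 at column 5
```

Empty rows (3 and 4 here) simply give empty slices, because consecutive indptr values are equal.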

vector of indices

I have a 10x10 matrix called A:
I have vector of column numbers:
C = [ 2, 6, 8 ];
I have a vector of row numbers:
R = [1; 3; 7];
The column numbers apply to each row, i.e. for column 2 we are looking at the row numbers given by R, for column 6 we are looking at the row numbers given by R, and so on.
I want to replace those exact locations in A with some other number 13.
i.e. for each of these locations in matrix A:
(1,2), (1,6), (1,8), (3,2), (3,6), (3,8), (7,2), (7,6), (7,8) I want to insert 13.
How do I achieve the above ?
You can do A(R,C) = 13.
As dlavila pointed out, you can do A(R,C) = 13, which is the best and easiest option. Nevertheless, I have written a longer version using the eval function that you might find useful in the future:
% R(jj) is the row index and C(ii) the column index
for ii=1:length(C)
for jj=1:length(R)
eval(strcat('A(', num2str(R(jj)), ',', num2str(C(ii)), ')=13;'))
end
end
Both give the same results.

How to find all index pairs of unequal elements in vector (Matlab)

Let's say I have the following vector in Matlab:
V = [4, 5, 5, 7];
How can I list (in a n-by-2 matrix for example) all the index pairs corresponding to unequal elements in the vector. For example for this particular vector the index pairs would be:
index pairs (1, 2) and (1, 3) corresponding to element pair (4,5)
index pair (1, 4) corresponding to element pair (4,7)
index pairs (2, 4) and (3, 4) corresponding to element pair (5,7)
The reason I need this is because I have a cost-function which takes a vector such as V as input and produces a cost-value.
I want to see how the random swapping of two differing elements in the vector affects the cost value (using this for steepest descent hill climbing).
The order of the index pairs also doesn't matter. For my purposes (1,2) is the same as (2,1).
For example if my cost-function was evalCost(), then I could have V = [4, 5, 5, 7] and
evalCost(V) = 14
whereas for W = [4, 7, 5, 5] the cost could be:
evalCost(W) = 10
How to get the list of "swapping" pair indexes in Matlab. Hope my question is clear =)
I don't understand the cost function part, but the first part is simple:
[a,b]=unique(V)
C = combnk(b,2)
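C contains the indices, and V(C) the values. Note that unique keeps only one index per distinct value, so pairs such as (1,3) and (3,4) are not listed: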
C = combnk(b,2)
C =
1 2
1 4
2 4
V(C)
ans =
4 5
4 7
5 7
Use bsxfun and then the two-output version of find to get the pairs. triu is applied to the output of bsxfun to consider only one of the two possible orders.
[ii jj] = find(triu(bsxfun(@ne, V, V.')));
pairs = [ii jj];

Matlab "for loop" to create a matrix

I have a fairly large vector named blender. I have extracted n elements for which blender is greater than x (irrelevant). Now my difficulty is the following:
I am trying to create a (21 x n) matrix with each element of blender plus 10 things before, and the 10 things after.
element=find(blender >= 120);
I have been trying variations of the following:
for i=element(1:end)
Matrix(i)= Matrix(blender(i-10:i+10));
end
Then I want to plot one column of the matrix at a time, moving to the next each time I hit Enter.
This second part I can figure out later, but I would appreciate some help making the Matrix
Thanks
First, you can use "logical indexing" of your array, which uses a logical expression to address your vector. With blender = [2, 302, 35, 199, 781, 312, 8], it could look like this:
>> b_hi = blender(blender>=120)
b_hi =
302 199 781 312
Second, you can concatenate arrays like in b_padded = [1, 2, b_hi, 3, 4]. If b_hi was a column vector, you'd use semicolons instead of commas.
Third, there is a function reshape that allows you to turn the resulting vector into a matrix. doc reshape will tell you details. For example, to turn b_padded into a 2-by-4 matrix,
>> b_matrix = reshape(b_padded, 2, 4)
b_matrix =
1 302 781 3
2 199 312 4
will do. This means you can do the whole job without any for-loop. Note that transposing the result of reshape(b_padded, 4, 2) will give you the other possible 2-by-4 matrix. You obtain the transpose of a matrix A by A'. You will find out which one you want.
You need to create a new matrix, and use two indices so that Matlab knows it is assigning to a column in a 2D matrix.
NewMatrix = zeros(21, length(element));
for i = 1:length(element)
k = element(i);
% the 21 samples centered on k (assumes 11 <= k <= numel(blender)-10)
NewMatrix(:,i) = blender(k-10:k+10);
end