Get specific columns from scipy csr_matrix

I have a sparse matrix that looks like this when printed:
(0, 1773) 0.626247271075
(0, 1604) 0.626247271075
(0, 1515) 0.299755787345
(0, 660) 0.354647964743
(1, 2379) 0.206542018824
(1, 2159) 0.158175640686
There are in fact over 2000 columns, but because the matrix is sparse only the stored entries are printed. I want to get the column indices (what looks like the second column) together with their values, sorted by value (the last column), like this:
x: 1604 y: 0.626247271075
x: 660 y: 0.354647964743
x: 1515 y: 0.299755787345
x: 2379 y: 0.206542018824
x: 2159 y: 0.158175640686
The SciPy documentation is not very clear to me. How do I access these columns?

You can access the indices of the non-zero entries using scipy.sparse.csr_matrix.nonzero, which returns a tuple of (row indices, column indices):
from scipy.sparse import csr_matrix
A = csr_matrix([[1,2,0],[0,0,3],[4,0,5]])
print(A.nonzero())
(array([0, 0, 1, 2, 2]), array([0, 1, 2, 0, 2]))
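To get the column indices together with their values, sorted by value as the question asks, one option is to convert to COO format, which exposes parallel col and data arrays. This is a minimal sketch on a small made-up matrix; the names coo, order and pairs are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small stand-in for the real matrix (values chosen for illustration).
A = csr_matrix([[0.0, 0.3, 0.0, 0.6],
                [0.2, 0.0, 0.1, 0.0]])

# COO format stores one (row, col, data) triple per stored entry.
coo = A.tocoo()

# Indices that sort the stored values in descending order.
order = np.argsort(-coo.data, kind="stable")

# (column index, value) pairs, largest value first.
pairs = [(int(coo.col[i]), float(coo.data[i])) for i in order]
for col, val in pairs:
    print(f"x: {col} y: {val}")
```

If only one row is of interest, the same idea applies to A[i].tocoo().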

Related

How to map over a Spark Vector?

I have a mllib.linalg.Vector in Scala containing Double values in range of (-1; 1). I would like to multiply all of the values by, let's say, 100.
For example I'd like to convert [0.5, 0.3, -0.1] to [50, 30, -10].
How can I do it?
import org.apache.spark.mllib.linalg.Vectors
val vec = Vectors.dense(0.5, 0.3, -0.1)
val vec2 = Vectors.dense(vec.toArray.map(_ * 100))

reshape scipy csr matrix

How can I efficiently reshape a scipy.sparse csr_matrix?
I need to add zero rows at the end.
Using:
from scipy.sparse import csr_matrix
data = [1,2,3,4,5,6]
col = [0,0,0,1,1,1]
row = [0,1,2,0,1,2]
a = csr_matrix((data, (row, col)))
a.reshape(3,5)
I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/base.py", line 129, in reshape
self.__class__.__name__)
NotImplementedError: Reshaping not implemented for csr_matrix.
If you can catch the problem early enough, just include a shape parameter:
In [48]: a = csr_matrix((data, (row, col)))
In [49]: a
Out[49]:
<3x2 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [50]: a = csr_matrix((data, (row, col)),shape=(3,5))
In [51]: a
Out[51]:
<3x5 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
In [52]: a.A
Out[52]:
array([[1, 4, 0, 0, 0],
[2, 5, 0, 0, 0],
[3, 6, 0, 0, 0]], dtype=int64)
You could also hstack on a pad. Make sure it's the sparse version:
In [59]: z = sparse.coo_matrix(np.zeros((3,3)))
In [60]: z
Out[60]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in COOrdinate format>
In [61]: sparse.hstack((a,z))
Out[61]:
<3x5 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in COOrdinate format>
In [62]: _.A
Out[62]:
array([[1., 4., 0., 0., 0.],
[2., 5., 0., 0., 0.],
[3., 6., 0., 0., 0.]])
hstack uses sparse.bmat. That combines the coo attributes of the 2 arrays, and makes a new coo matrix.
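The question asks for zero rows at the end, which corresponds to stacking vertically rather than horizontally. The sketch below assumes the same data, row and col as in the question; pad and padded are illustrative names. Building the pad as an empty sparse matrix with a matching dtype, and passing format="csr", avoids the float64 and COO conversions seen in the hstack output above.

```python
from scipy import sparse
from scipy.sparse import csr_matrix

data = [1, 2, 3, 4, 5, 6]
col = [0, 0, 0, 1, 1, 1]
row = [0, 1, 2, 0, 1, 2]
a = csr_matrix((data, (row, col)))          # shape (3, 2)

# An all-zero sparse block with 2 rows and the same number of columns as a.
pad = csr_matrix((2, a.shape[1]), dtype=a.dtype)

# Stack the pad below a and ask for CSR output directly.
padded = sparse.vstack((a, pad), format="csr")
print(padded.shape)
print(padded.toarray())
```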
The reshape() method works with csr_matrix objects as of SciPy 1.1. On older versions, you can try the code at Reshape sparse matrix efficiently, Python, SciPy 0.12 for reshaping a sparse matrix.
Your example won't work, however, because you are trying to reshape an array with shape (3, 2) into an array with shape (3, 5). The code linked to above and the sparse reshape() method follow the same rules as the reshape() method of numpy arrays: you can't change the total size of the array.
If you want to change the total size, SciPy 1.1 also adds a resize() method, which operates in place.
On older versions, you can instead construct a new sparse matrix as follows:
In [57]: b = csr_matrix((a.data, a.indices, a.indptr), shape=(3, 5))
In [58]: b.shape
Out[58]: (3, 5)
In [59]: b.A
Out[59]:
array([[1, 4, 0, 0, 0],
[2, 5, 0, 0, 0],
[3, 6, 0, 0, 0]], dtype=int64)
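For comparison, here is a minimal sketch of the in-place resize() method mentioned above, assuming SciPy 1.1 or later is installed. The copy is only there so that the original a stays untouched; b is an illustrative name.

```python
from scipy.sparse import csr_matrix

data = [1, 2, 3, 4, 5, 6]
col = [0, 0, 0, 1, 1, 1]
row = [0, 1, 2, 0, 1, 2]
a = csr_matrix((data, (row, col)))   # shape (3, 2)

b = a.copy()        # resize() modifies the matrix in place, so work on a copy
b.resize((3, 5))    # grow to 3x5; the new columns contain only zeros
print(b.shape)
print(b.toarray())
```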

Training Multi-Layer Perceptron correctly for powers of 2

I'm quite new to the world of neural networks. I followed some tutorials and was able to implement an MLP, but the activation function is a hyperbolic tangent, whose outputs lie between -1 and 1.
I wrote a training file like this: 1 2 3 4 ... n
and the target-output file is the powers of 2: 2 4 8 16 ... 2^n
and I want the network to simulate this function, but I don't know how to adjust the learning rate, momentum and activation function in order to correctly simulate.
I tried the activation function f(x) = x (with derivative 1) to overcome the range problem (outputs between -1 and 1), but my output and error grew very fast with it, even when I decreased the learning rate, so I'm confused about how to modify these parameters in order to simulate f(x) = 2^x or how to train the net correctly.
What should I do for this MLP to work?
My preference is to use the binary representation of each 2^N as the target and train with a learning rate of 0.01 and a sigmoid activation function.
It is hard to make the network output exactly 2, 4, 8, ... as decimal values: the outputs can overflow and turn into NaN. Instead, generate a binary representation of the same length for every y and train your network on that.
import numpy as np
x = np.arange(5)            # inputs 0..4
y = np.power(2, x)          # targets 1, 2, 4, 8, 16
x = x.reshape((-1, 1))      # one input feature per sample
lr = 0.01                   # learning rate suggested above
# Encode every target as a fixed-width (10-bit) binary vector.
y = np.array([[int(b) for b in np.binary_repr(v, width=10)] for v in y])
print((y, x))
Here's what you get.
(array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]]), array([[0],
[1],
[2],
[3],
[4]]))
It is our responsibility not to let overflow happen, so a bounded activation function is necessary. You can come up with your own intermediate representation, as long as it stays within the range of the activation function.
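Once the network is trained on binary targets, its sigmoid outputs have to be turned back into an integer. One way to do this, sketched below, is to threshold each output at 0.5 and weight the resulting bits by powers of 2. The helper decode_bits and the sample pred values are hypothetical, not part of the answer above.

```python
import numpy as np

def decode_bits(outputs, threshold=0.5):
    """Threshold sigmoid outputs into bits and read them as a binary number.

    The most significant bit is assumed to come first, matching the
    fixed-width encoding used for the targets.
    """
    bits = (np.asarray(outputs) >= threshold).astype(int)
    # Place values: 2^(width-1), ..., 2, 1.
    weights = 2 ** np.arange(bits.shape[-1] - 1, -1, -1)
    return bits @ weights

# Made-up network output for one sample, 10 bits wide.
pred = np.array([[0.02, 0.01, 0.0, 0.1, 0.03, 0.2, 0.9, 0.1, 0.05, 0.0]])
decoded = decode_bits(pred)
print(decoded)
```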

Accessing sparse matrix elements

I have a very large sparse matrix of the type 'scipy.sparse.coo.coo_matrix'. I can convert to csr with .tocsr(), however .todense() will not work since the array is too large. I want to be able to extract elements from the matrix as I would do with a regular array, so that I may pass row elements to a function.
For reference, when printed, the matrix looks as follows:
(7, 0) 0.531519363001
(48, 24) 0.400946334437
(70, 6) 0.684460955022
...
Make a matrix with 3 elements:
In [550]: M = sparse.coo_matrix(([.5,.4,.6],([0,1,2],[0,5,3])), shape=(5,7))
Its default display (repr(M)):
In [551]: M
Out[551]:
<5x7 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
and its print display (str(M)), which looks like the input:
In [552]: print(M)
(0, 0) 0.5
(1, 5) 0.4
(2, 3) 0.6
Convert to csr format:
In [553]: Mc=M.tocsr()
In [554]: Mc[1,:] # row 1 is another matrix (1 row):
Out[554]:
<1x7 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
In [555]: Mc[1,:].A # that row as 2d array
Out[555]: array([[ 0. , 0. , 0. , 0. , 0. , 0.4, 0. ]])
In [556]: print(Mc[1,:]) # like 2nd element of M except for row number
(0, 5) 0.4
Individual element:
In [560]: Mc[1,5]
Out[560]: 0.40000000000000002
The data attributes of these formats (if you want to dig further):
In [562]: Mc.data
Out[562]: array([ 0.5, 0.4, 0.6])
In [563]: Mc.indices
Out[563]: array([0, 5, 3], dtype=int32)
In [564]: Mc.indptr
Out[564]: array([0, 1, 2, 3, 3, 3], dtype=int32)
In [565]: M.data
Out[565]: array([ 0.5, 0.4, 0.6])
In [566]: M.col
Out[566]: array([0, 5, 3], dtype=int32)
In [567]: M.row
Out[567]: array([0, 1, 2], dtype=int32)
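The indptr array shown above marks where each row starts and stops inside indices and data, so a row's stored entries can be passed to a function without densifying anything. A minimal sketch follows; row_entries is a hypothetical helper name, and the row sum stands in for whatever function the rows need to be passed to.

```python
from scipy import sparse

M = sparse.coo_matrix(([.5, .4, .6], ([0, 1, 2], [0, 5, 3])), shape=(5, 7))
Mc = M.tocsr()

def row_entries(csr, i):
    """Return (column indices, values) of the stored entries in row i."""
    start, stop = csr.indptr[i], csr.indptr[i + 1]
    return csr.indices[start:stop], csr.data[start:stop]

cols, vals = row_entries(Mc, 1)
print(cols, vals)

# Apply a function to every row's stored values; empty rows yield 0.0.
row_sums = [float(row_entries(Mc, i)[1].sum()) for i in range(Mc.shape[0])]
print(row_sums)
```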

Sequence in MATLAB

Write a single MATLAB expression to generate a vector that contains first 100 terms of the following sequence: 2, -4, 8, -16, 32, …
My attempt :
n = -1
for i = 1:100
n = n * 2
disp(n)
end
The problem is that the values of n are not collected into a single (1 x 100) vector, and the alternating positive and negative signs are not produced either. How can I do that?
You have a geometric series with common ratio r = -2.
To produce 2, -4, 8, -16, 32, type this:
>>-(-2).^[1:5]
2, -4, 8, -16, 32
Change the 5 to 100 to get the first 100 terms.
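As a cross-check of the closed form, the k-th term is -(-2)^k for k = 1, 2, ..., 100. The sketch below evaluates it in Python rather than MATLAB, purely as an illustration. Python integers have arbitrary precision, so the large terms are exact. The name seq is illustrative.

```python
# k-th term of 2, -4, 8, -16, 32, ... is -(-2)**k for k = 1..100.
# ** binds tighter than unary minus, so this is -((-2)**k).
seq = [-(-2) ** k for k in range(1, 101)]

print(seq[:5])
print(len(seq))
```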
There are better methods, as mentioned in the answer by @lakesh, but I will point out the mistakes in your code.
Typing n = n * 2 overwrites the scalar n on every iteration, so it never becomes a vector.
It also generates -2, -4, -8, -16, ..., so the signs never alternate.
Therefore, the correct code should be:
n = -1;
for i = 2:101 % 1 extra term since the first term is discarded later
    n(i) = -n(i-1) * 2;
end
Then discard the first element of n to get the exact series you want:
n(1) = [];
disp(n)