Add a vector to every column of a matrix, using Scala Breeze - scala

I have a matrix M of size (L x N) and I want to add the same vector v of length L to every column of the matrix. Is there a way to do this using Scala Breeze?
I tried:
val H = DenseMatrix.zeros[Double](L, N)
for (j <- 0 until N) {
  H(::, j) = M(::, j) + v
}
but this doesn't really fit Scala's immutability: H is already defined as a val, so this gives a "reassignment to val" error. Any suggestions appreciated!

To add a vector to all columns of a matrix, you don't need to loop over the columns; you can use Breeze's column broadcasting. For your example,
H(::, *) + v // assuming v is a Breeze DenseVector
should work.
import breeze.linalg._
val L = 3
val N = 2
val v = DenseVector(1.0,2.0,3.0)
val H = DenseMatrix.zeros[Double](L, N)
val result = H(::,*) + v
//result: breeze.linalg.DenseMatrix[Double] = 1.0 1.0
// 2.0 2.0
// 3.0 3.0
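Applied to the matrix from the question rather than a zero matrix, the same broadcasting gives the result in one line; a minimal sketch, assuming M is an (L x N) DenseMatrix[Double]:
import breeze.linalg._

val M = DenseMatrix((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)) // L = 3, N = 2
val v = DenseVector(1.0, 2.0, 3.0)

val H = M(::, *) + v // adds v to every column of M
// H: 2.0 3.0
//    5.0 6.0
//    8.0 9.0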

Related

Inserting values in row (Spark - Scala)

I want to create rows (for any given k) such that:
for k = 2, graph will be [Row(1,2), Row(3,4)]
for k = 3, graph will be [Row(1,2,3), Row(4,5,6), Row(7,8,9)]
I am new to Scala and don't know how exactly I can insert values into rows like this.
import org.apache.spark.sql.Row
import scala.collection.mutable.ArrayBuffer

var graph = ArrayBuffer[Row]()
val k = 3
val k2 = k * k
for (a <- 1 to k2) {
  graph += Row(a)
}
val k = 3
val range = ArrayBuffer.range(1, k * k + 1)   // 1 to k*k
val rows = range.map(v => Row(v))             // one single-value Row per number
val grouped = rows.grouped(k).toBuffer        // groups of k Rows
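Note that grouped here contains groups of single-value Rows, not one Row per group. If the goal is literally Row(1,2,3), Row(4,5,6), ... as in the question, a minimal sketch (using Row.fromSeq, which builds a Row from a Seq of values) would be:
import org.apache.spark.sql.Row

val k = 3
val graph = (1 to k * k).grouped(k).map(g => Row.fromSeq(g)).toSeq
// graph: Seq[Row] = List([1,2,3], [4,5,6], [7,8,9])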

How can I normalize a matrix in spark?

I need to divide each matrix element (i, j) by the sqrt of the product of the diagonal elements (i, i) and (j, j)
in other words for all i and j I need to perform:
mat(i, j) = mat(i, j)/sqrt(mat(i,i)*mat(j,j))
So the matrix:
4 0 12
0 1 0
12 0 9
turns into:
1 0 2
0 1 0
2 0 1
What I have so far is a list of row/column index pairs with a weight that I convert into a CoordinateMatrix (and later RowMatrix). I extract the diagonal by filtering elements where row == column.
What's the best way to implement this elementwise division?
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry, RowMatrix}
import scala.math.sqrt
val pairs = Array((0,0,4.0), (0,2,12.0), (1,1,1.0), (2,0,12.0), (2,2,9.0))
val pairs_rdd = sc.parallelize(pairs)
val diagonal = pairs_rdd.filter(r => r._1 == r._2).map(r => (r._2, sqrt(r._3)))
val matrixEntries = pairs_rdd.map(r => MatrixEntry(r._1, r._2, r._3))
val coordinateMatrix: CoordinateMatrix = new CoordinateMatrix(matrixEntries)
val rowMatrix: RowMatrix = coordinateMatrix.toRowMatrix()
It seems none of the MLlib matrix helper classes can really assist here, so the only way out seems to be manually joining your matrix entries with the diagonal you've created (once by i, once by j):
import org.apache.spark.rdd.RDD

// keep the raw diagonal values; the sqrt is applied in the final map
val diagonal: RDD[(Long, Double)] = pairs_rdd.filter(r => r._1 == r._2).map(r => (r._2.toLong, r._3))
val result = matrixEntries
  .keyBy(_.i).join(diagonal).values     // join by i coordinate
  .keyBy(_._1.j).join(diagonal).values  // join by j coordinate
  .map { case ((e, di), dj) => MatrixEntry(e.i, e.j, e.value / sqrt(di * dj)) }
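If a distributed matrix is needed afterwards, the resulting entries can be wrapped again; a small sketch using the names from the snippets above:
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix

val normalized = new CoordinateMatrix(result) // result: RDD[MatrixEntry]
// for the example matrix, normalized.entries should contain
// (0,0,1.0), (0,2,2.0), (1,1,1.0), (2,0,2.0), (2,2,1.0)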

Conditional slicing in Scala Breeze

I am trying to slice a DenseVector based on an elementwise boolean condition on another DenseVector:
import breeze.linalg.DenseVector
val x = DenseVector(1.0,2.0,3.0)
val y = DenseVector(10.0, 20.0, 30.0)
// I want a new DenseVector containing all elements of y where x > 1.5
// i.e. I want DenseVector(20.0, 30.0)
val newy = y(x :> 1.5) // does not give a DenseVector but a SliceVector
With Python/Numpy, I would just write y[x>1.5]
Using Breeze you can filter a DenseVector with a for comprehension. Note, though, that this filters on y's own values rather than on the condition over x from the question:
val y = DenseVector(10.0, 20.0, 30.0)
val newY = for {
  v <- y
  if v > 1.5
} yield v
// or to write it in one line
val newY = for (v <- y if v > 1.5) yield v
The SliceVector resulting from y(x :> 1.5) is just a view on the original DenseVector. To create a new DenseVector, use
val newy = y(x :> 1.5).toDenseVector
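Putting it together with the vectors from the question (assuming the :> comparison used above is available in your Breeze version), a minimal sketch:
import breeze.linalg._

val x = DenseVector(1.0, 2.0, 3.0)
val y = DenseVector(10.0, 20.0, 30.0)

// slice y where the corresponding element of x is > 1.5, then materialize a copy
val newy = y(x :> 1.5).toDenseVector
// newy: breeze.linalg.DenseVector[Double] = DenseVector(20.0, 30.0)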

Inverse of a spark RowMatrix

I am trying to invert a Spark RowMatrix. The function I am using is below.
import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, IndexedRow, IndexedRowMatrix, RowMatrix}

def computeInverse(matrix: RowMatrix): BlockMatrix = {
  val numCoefficients = matrix.numCols.toInt
  val svd = matrix.computeSVD(numCoefficients, computeU = true)
  val indexed_U = new IndexedRowMatrix(svd.U.rows.zipWithIndex.map(r => new IndexedRow(r._2, r._1)))
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => if (x == 0) 0.0 else math.pow(x, -1))))
  val V_inv = svd.V.multiply(invS)
  val inverse = indexed_U.multiply(V_inv.transpose)
  inverse.toBlockMatrix.transpose
}
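For reference, a minimal usage sketch with the 4 x 3 example matrix from further down (sc is assumed to be an existing SparkContext):
import org.apache.spark.mllib.linalg.Vectors

val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(4.0, 5.0, 6.0),
  Vectors.dense(7.0, 8.0, 9.0),
  Vectors.dense(10.0, 11.0, 12.0)))
val mat = new RowMatrix(rows)  // each Vector becomes one row of the matrix
val inv = computeInverse(mat)  // BlockMatrix holding the (pseudo-)inverse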
The logic I am implementing is through SVD. An explanation of the process:
U, Σ, V = svd(A)
A = U * Σ * V.transpose
A.inverse = (U * Σ * V.transpose).inverse
          = (V.transpose).inverse * Σ.inverse * U.inverse
Now U and V are orthogonal matrices, so
M.transpose * M = I
Applying the above,
A.inverse = V * Σ.inverse * U.transpose
Let V * Σ.inverse be X, then
A.inverse = X * U.transpose
Now, A * B = ((A * B).transpose).transpose
           = (B.transpose * A.transpose).transpose
Applying the same, to keep U as a distributed row matrix rather than a local matrix:
A.inverse = X * U.transpose
          = (U.transpose.transpose * X.transpose).transpose
          = (U * X.transpose).transpose
The problem is with the input row matrix. For example
1, 2, 3
4, 5, 6
7, 8, 9
10,11,12
the inverse from the above code snippet differs from the one computed with Python NumPy. I am unable to find out why that is. Is it because of some underlying assumption made during the SVD calculation? Any help will be greatly appreciated. Thanks.
The above code works properly. The reason I was getting the mismatch was the way I built the RowMatrix from an RDD[Vector] for the test: in Spark, the values are filled in column-wise to form a matrix, whereas NumPy converts an array into a matrix row-wise. For example, given
Array(1,2,3,4,5,6,7,8,9)
In Spark
1 4 7
2 5 8
3 6 9
In Python, it is interpreted as
1 2 3
4 5 6
7 8 9
So, the test case was failing :|
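A small sketch of the difference with Spark's local DenseMatrix (the values are just the example above; isTransposed is a standard constructor flag in spark.mllib):
import org.apache.spark.mllib.linalg.DenseMatrix

val values = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)

// Spark fills local matrices column-major: the flat array fills columns first
val colMajor = new DenseMatrix(3, 3, values)
// 1.0  4.0  7.0
// 2.0  5.0  8.0
// 3.0  6.0  9.0

// with isTransposed = true, the same array is read row-wise, like NumPy's reshape
val rowMajor = new DenseMatrix(3, 3, values, true)
// 1.0  2.0  3.0
// 4.0  5.0  6.0
// 7.0  8.0  9.0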

Is it possible to correctly calculate SVD on IndexedRowMatrix in Spark?

I've got an IndexedRowMatrix [m x n], which contains only X non-zero rows. I'm setting k = 3.
When I try to calculate SVD on this object with computeU set to true, dimensions of U matrix are [m x n], when the correct dimensions are [m x k].
Why does it happen?
I've already tried converting the IndexedRowMatrix to a RowMatrix and then calculating the SVD. The result dimensions are [X x k], so it only calculates the result for non-zero rows (the matrix drops the indices, as in the documentation).
Is it possible to convert this matrix while keeping the row indices?
val csv = sc.textFile("hdfs://spark/nlp/merged_sparse.csv").cache() // original file
val data = csv.mapPartitions(lines => {
  val parser = new CSVParser(' ')
  lines.map(line => {
    parser.parseLine(line)
  })
}).map(fields => {
  MatrixEntry(fields(0).toLong - 1, fields(1).toLong - 1, fields(2).toInt)
})
val coordinateMatrix: CoordinateMatrix = new CoordinateMatrix(data)
val indexedRowMatrix: IndexedRowMatrix = coordinateMatrix.toIndexedRowMatrix()
val rowMatrix: RowMatrix = indexedRowMatrix.toRowMatrix()
val svd: SingularValueDecomposition[RowMatrix, Matrix] = rowMatrix.computeSVD(3, computeU = true, 1e-9)
val U: RowMatrix = svd.U // The U factor is a RowMatrix.
val S: Vector = svd.s // The singular values are stored in a local dense vector.
val V: Matrix = svd.V // The V factor is a local dense matrix.
val indexedSvd: SingularValueDecomposition[IndexedRowMatrix, Matrix] = indexedRowMatrix.computeSVD(3, computeU = true, 1e-9)
val indexedU: IndexedRowMatrix = indexedSvd.U // The U factor is an IndexedRowMatrix.
val indexedS: Vector = indexedSvd.s // The singular values are stored in a local dense vector.
val indexedV: Matrix = indexedSvd.V // The V factor is a local dense matrix.
It looks like this is a bug in Spark MLlib. If you get the size of a row vector in your indexed U matrix, it will correctly return 3 columns:
indexedU.rows.first().vector.size
I looked at the source and it looks like they're incorrectly copying the current number of columns from the indexed matrix:
val U = if (computeU) {
  val indexedRows = indices.zip(svd.U.rows).map { case (i, v) =>
    IndexedRow(i, v)
  }
  new IndexedRowMatrix(indexedRows, nRows, nCols) // nCols is incorrect here
} else {
  null
}
Looks like a prime candidate for a bugfix/pull request.
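Until that is fixed, one possible workaround is to rewrap the U factor with the expected column count; a sketch, assuming the variables from the snippets above and k = 3:
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

// rebuild U with the correct shape (m rows, k = 3 columns) instead of the copied nCols
val fixedU = new IndexedRowMatrix(indexedU.rows, indexedRowMatrix.numRows(), 3)
// fixedU.numCols() should now report 3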