How to get the total sum of all the numbers contained in a matrix? - scala

I'm a beginner in Scala and I'm trying to build a function that calculates the total sum of all the numbers contained in a matrix.
I tried this code:
val pixels = Vector(
  Vector(0, 1, 2),
  Vector(1, 2, 3)
)

def sum(matrix: Vector[Vector[Int]]): Int = {
  matrix.reduce((a, b) => a + b)
}
println(sum(pixels))
But I get the following error: "value + is not a member of Vector[Int]"
I would like to have the sum total of all the numbers contained in the matrix as an integer.
Can you help me solve this problem?
Thank you in advance!

You defined matrix as a Vector of Vectors, so the arguments to reduce are two Vectors, not Ints. If you want to sum them all, you need to flatten first to pull the actual Ints out of the inner vectors. Also, the way you originally wrote the function it does not return anything; you don't want that variable assignment:
def sum(matrix: Vector[Vector[Int]]) = matrix.flatten.reduce(_ + _)
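As a side note (not part of the answer above, just equivalent alternatives), sum also handles an empty matrix gracefully, where reduce would throw:
def sumFlat(matrix: Vector[Vector[Int]]): Int =
  matrix.flatten.sum // sum of an empty collection is 0

def sumRows(matrix: Vector[Vector[Int]]): Int =
  matrix.map(_.sum).sum // sum each row, then sum the row totals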

Related

Functional way to build a matrix from a list in Scala

I've asked this question on https://users.scala-lang.org/ but haven't gotten a concrete answer yet. I am given a vector v and I would like to construct a matrix m based on this vector according to the rules specified below. I would like to write the following code in a purely functional way, i.e. m = v.map(...) or similar. I can do it easily in a procedural way like this:
import scala.util.Random
val v = Vector.fill(50)(Random.nextInt(100))
println(v)
val m = Array.fill[Int](10, 10)(0)
def populateMatrix(x: Int): Unit = m(x/10)(x%10) += 1
v.map(x => populateMatrix(x))
m foreach { row => row foreach print; println }
In words, I am iterating through v, getting a pair of indices (i,j) from each v(k) and updating the matrix m at these positions, i.e., m(i)(j) += 1. But I am seeking a functional way. It is clear for me how to implement this in, e.g. Mathematica
v=RandomInteger[{99}, 300]
m=SparseArray[{Rule[{Quotient[#, 10] + 1, Mod[#, 10] + 1}, 1]}, {10, 10}] & /@ v // Total // Normal
But how do I do it in Scala, which is a functional language too?
Your populateMatrix approach can be "reversed": map the vector into index tuples, group them, count the size of each group, and turn that into a map (index tuple -> count), which is then used to populate the corresponding index in the array with Array.tabulate:
import scala.util.Random

val v = Vector.fill(50)(Random.nextInt(100))
val values = v.map(i => (i / 10, i % 10))
  .groupBy(identity)
  .view
  .mapValues(_.size)
val result = Array.tabulate(10, 10)((i, j) => values.getOrElse((i, j), 0))
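Each cell (i, j) of result then holds how many values of v fell into that bucket. A quick way to check it (just a sketch for printing, not part of the answer):
result.foreach(row => println(row.mkString(" ")))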

how to accumulate 2D array elements in Scala?

I have a 2D array:
Array(Array(1,1,0), Array(1,0,1))
and I would like to accumulate the values over the columns, so my final output looks like:
Array(Array(1,1,0), Array(2,1,1))
If this were a 1D array I could simply use scan, but I'm having trouble using scan with a 2D array.
Can anyone help with this issue?
Here's one way to do it:
val t = Array(Array(1, 1, 0), Array(1, 0, 1))
val result = t.scanLeft(Array.fill(t(0).length)(0))((x, y) =>
  x.zip(y).map(e => e._1 + e._2)).drop(1)
// to see the results
result.foreach(e => println(e.toList))
gives:
List(1, 1, 0)
List(2, 1, 1)
The idea is to create an array filled with zeros (using Array.fill) and then scan the 2D array using that as an accumulator. In the end, drop(1) gets rid of the zero-filled array.
EDIT:
In response to the comment, this solution works for a matrix of any size. The zip function takes care of element-wise addition.
EDIT 2 (Step by step explanation):
You already know about scan for a one-dimensional array. The idea is essentially the same.
We initialize the accumulator with zero. In this case, zero means an array of zeros. Array.fill is used to create an array filled with zeros.
Instead of a single addition, we need to add arrays element-wise. This is what the combination of zip and map does. There are plenty of examples available on the Internet of how these methods work.
Finally, we drop the zero element using Scala's drop(1). The result is an array of arrays containing accumulated values.
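As a tiny illustration of the element-wise addition used above (my own example, not from the answer):
val x = Array(1, 1, 0)
val y = Array(1, 0, 1)
val rowSum = x.zip(y).map { case (a, b) => a + b } // Array(2, 1, 1)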
I would solve it like this: for a given row r, sum all rows up to and including r.
val accumulatedMatrix =
  for (row <- array.indices)
    yield array.take(row + 1).foldLeft(Array(0, 0, 0)) {
      case (a, b) => Array(a(0) + b(0), a(1) + b(1), a(2) + b(2))
    }
input:
val array = Array(
  Array(1, 1, 0),
  Array(1, 0, 1)
)
output:
1,1,0
2,1,1
Instead of repeatedly summing all the previous rows, you can improve this by memoizing the running sum as well.
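A sketch of that memoized variant (my own illustration, using the same array as above; it keeps a running sum instead of re-adding each prefix, and ends up equivalent to the scanLeft answer):
val accumulated = array
  .scanLeft(Array(0, 0, 0)) { (acc, row) =>
    Array(acc(0) + row(0), acc(1) + row(1), acc(2) + row(2))
  }
  .drop(1)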
Pretty much the same approach as for a 1D array:
a.tail.scan(a.head)((acc, value) =>
  Array(acc(0) + value(0), acc(1) + value(1), acc(2) + value(2))
)
For rows of arbitrary length (as long as they all have the same length):
a.tail.scan(a.head)((acc, value) =>
  acc zip value map { case (a, b) => a + b }
)

Using `quickselect` to find a certain element

First of all, I want to say that this is a school assignment and I am only seeking some guidance.
My task was to write an algorithm that finds the k:th smallest element in a Seq using quickselect. This should be easy enough, but when running some tests I hit a wall. For some reason, with the input (List(1, 1, 1, 1), 1) it goes into an infinite loop.
Here is my implementation:
val rand = new scala.util.Random()

def find(seq: Seq[Int], k: Int): Int = {
  require(0 <= k && k < seq.length)
  val a: Array[Int] = seq.toArray[Int] // Can't modify the argument sequence
  val pivot = rand.nextInt(a.length)
  val (low, high) = a.partition(_ < a(pivot))
  if (low.length == k) a(pivot)
  else if (low.length < k) find(high, k - low.length)
  else find(low, k)
}
For some reason (or because I am tired) I cannot spot my mistake. If someone could hint at where I go wrong, I would be grateful.
Basically you are depending on this line - val (low, high) = a.partition(_ < a(pivot)) - to split the array into two arrays: the first containing the elements smaller than the pivot element and the second containing the rest.
Then you say that if the first array has length k, you have already seen k elements smaller than your pivot element, which means the pivot element is actually the (k+1)-th smallest, and you end up returning the (k+1)-th smallest element instead of the k-th. This is your first mistake.
Also... a greater problem occurs when all the elements are the same, because then your first array always has 0 elements and the recursion makes no progress.
Not only that... your code will give you the wrong answer for inputs with repeating elements among the k smallest ones, like (1, 3, 4, 1, 2).
The solution lies in the observation that in the sequence (1, 1, 1, 1) the 4th smallest element is the 4th 1, meaning you have to take elements equal to the pivot into account (<= rather than <).
Also... since a plain two-way partition cannot separate the elements equal to the pivot from the ones greater than it, you cannot achieve the split you need with partition alone; you will have to write the split yourself.
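For reference, a minimal sketch of one way to write such a split (my own illustration, not part of the original answer; k is taken as 0-indexed, matching the question's require):
import scala.util.Random

// Three-way split: less / equal / greater. Elements equal to the pivot land
// in `equal`, so the recursive calls always shrink and cannot loop forever.
def find(seq: Seq[Int], k: Int): Int = {
  require(0 <= k && k < seq.length)
  val pivot = seq(Random.nextInt(seq.length))
  val low   = seq.filter(_ < pivot)
  val equal = seq.filter(_ == pivot)
  val high  = seq.filter(_ > pivot)
  if (k < low.length) find(low, k)
  else if (k < low.length + equal.length) pivot
  else find(high, k - low.length - equal.length)
}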

How to calculate median over RDD[org.apache.spark.mllib.linalg.Vector] in Spark efficiently?

What I want to do is something like this:
http://cn.mathworks.com/help/matlab/ref/median.html?requestedDomain=www.mathworks.com
Find the median value of each column.
It could be done by collecting the RDD to the driver, but for big data that becomes impossible.
I know Statistics.colStats() can calculate mean, variance, and so on, but the median is not included.
Additionally, the vectors are high-dimensional and sparse.
Well, I didn't understand the vector part, but this is my approach (I bet there are better ones):
val a = sc.parallelize(Seq(1, 2, -1, 12, 3, 0, 3))
val n = a.count() / 2
println(n) // outputs 3
val b = a.sortBy(x => x).zipWithIndex()
val median = b.filter(x => x._2 == n).collect()(0)._1 // this part doesn't look nice, I hope someone tells me how to improve it, maybe zero?
println(median) // outputs 2
b.collect().foreach(println) // (-1,0) (0,1) (1,2) (2,3) (3,4) (3,5) (12,6)
The trick is to sort your dataset using sortBy, then zip the entries with their indices using zipWithIndex, and then take the middle entry. Note that I used an odd number of samples for simplicity, but the essence is there. You would also have to do this for every column of your dataset.
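For what it's worth, here is a sketch of the same trick wrapped in a function (the names are mine); lookup on the index avoids the filter-and-collect step, and the even case averages the two middle values:
import org.apache.spark.rdd.RDD

// Median of a non-empty RDD[Double] via sort + zipWithIndex.
def median(rdd: RDD[Double]): Double = {
  val n = rdd.count()
  val indexed = rdd.sortBy(identity).zipWithIndex().map(_.swap) // (index, value)
  if (n % 2 == 1) indexed.lookup(n / 2).head
  else (indexed.lookup(n / 2 - 1).head + indexed.lookup(n / 2).head) / 2.0
}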

Why mapped pairs get obliterated?

I'm trying to understand the example here which computes Jaccard similarity between pairs of vectors in a matrix.
val aBinary = adjacencyMatrix.binarizeAs[Double]
// intersectMat holds the size of the intersection row(a)_i ∩ row(b)_j
val intersectMat = aBinary * aBinary.transpose
val aSumVct = aBinary.sumColVectors
val bSumVct = aBinary.sumRowVectors
//Using zip to repeat the row and column vectors values on the right hand
//for all non-zeroes on the left hand matrix
val xMat = intersectMat.zip(aSumVct).mapValues( pair => pair._2 )
val yMat = intersectMat.zip(bSumVct).mapValues( pair => pair._2 )
Why does the last comment mention non-zero values? As far as I'm aware, the ._2 function selects the second element of a pair independently of the first element. At what point do (0, x) pairs get obliterated?
Yeah, I don't know anything about Scalding, but this seems odd. If you look at the zip implementation, it specifically mentions that it does an outer join to preserve zeros on either side. So the comment does not seem to apply to how zeroes are actually treated in matrix.zip.
Besides, looking at the dimensions returned by zip, it really seems this line just replicates the aSumVct column vector for each column:
val xMat = intersectMat.zip(aSumVct).mapValues( pair => pair._2 )
Also, I find val bSumVct = aBinary.sumRowVectors suspicious, because it sums the matrix along the wrong dimension. It feels like something like this would be better:
val bSumVct = aBinary.transpose.sumRowVectors
This would conceptually be the same as aSumVct.transpose, so that at the end of the day, in cell (i, j) of xMat + yMat we find the sum of the elements of row(i) plus the sum of the elements of row(j); we then subtract intersectMat to adjust for the double counting.
Edit: a little bit of googling unearthed this blog post: http://www.flavianv.me/post-15.htm. It seems the comments relate to a version where the vectors to compare are in two separate matrices that don't necessarily have the same size.