Averaging values at same position in List - scala

The following code averages the values at the same position across a list of arrays:
val toadd = List(Array(8.0, 4.0), Array(5.0, 8.0), Array(7.0, 5.0))
val a1 = toadd.map(m => m(0)).sum
val a2 = toadd.map(m => m(1)).sum
(a1/toadd.size , a2/toadd.size)
Currently this just works for arrays of length 2.
How can this be modified so that it works for arrays of arbitrary length?

How about using transpose:
toadd.transpose.map(xs => xs.sum / xs.size)
// List(6.666666666666667, 5.666666666666667)

I like the idea of using transpose, as suggested by dhg. If you wanted to use more primitive functions, you could do:
toadd reduce {
(x, y) => (x zip y) map {
case (a, b) => a + b
}
} map { a => a / toadd.length }
Or more concisely:
toadd.reduce(_.zip(_).map(a=>a._1+a._2)).map(_/toadd.length)

You want something like
val innerSize = toadd.map(_.length).min
and then map over 0 until innerSize instead of doing it manually with a1, a2, etc.
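A minimal sketch of that index-based approach, assuming arrays of unequal length should be truncated to the shortest one. The name averageByPosition is made up for illustration:

```scala
// Hypothetical helper: averages the values found at each index.
// Positions beyond the shortest array are ignored.
def averageByPosition(rows: List[Array[Double]]): Seq[Double] = {
  require(rows.nonEmpty, "need at least one array")
  val innerSize = rows.map(_.length).min
  (0 until innerSize).map(i => rows.map(_(i)).sum / rows.size)
}

val toadd = List(Array(8.0, 4.0), Array(5.0, 8.0), Array(7.0, 5.0))
val averages = averageByPosition(toadd)
```

Unlike the a1/a2 version, this handles any number of columns without further changes.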

Related

How to sum up pair elements individually in Scala

I have the following method to sum up the pair elements in an array of pairs. I am new to scala and feel like there will be a better way than the following piece of code.
def accumulate(results: Array[(Int, Int)]): (Int, Int) = {
var x: Int = 0
var y: Int = 0
for (elem <- results) {
x = x + elem._1
y = y + elem._2
}
(x, y)
}
Yes, you can use foldLeft.
(BTW, I would also use List, instead of Array)
results.foldLeft((0, 0)) {
case ((accX, accY), (x, y)) =>
(accX + x, accY + y)
}
All of the operations in scala.collection.ArrayOps are available on Array[T]. In particular, you can unzip an array of pairs into a pair of arrays
val (xs, ys) = results.unzip
Summing a container is a standard use of fold
val x = xs.fold(0)(_ + _)
val y = ys.fold(0)(_ + _)
And then you can return the pair of values
(x, y)
https://scalafiddle.io/sf/meEKv6T/0 has a complete working example.
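Put together, the unzip approach might look like the sketch below. It uses sum in place of the explicit fold, and the name accumulateUnzip is only illustrative:

```scala
// Hypothetical variant of accumulate: split the pairs into two arrays,
// then sum each array separately.
def accumulateUnzip(results: Array[(Int, Int)]): (Int, Int) = {
  val (xs, ys) = results.unzip
  (xs.sum, ys.sum)
}

val total = accumulateUnzip(Array((1, 2), (3, 4), (5, 6)))
```

This walks the data more than once (unzip, then two sums), whereas foldLeft does it in a single pass. For small inputs the difference is unlikely to matter.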

Matrix Vector multiplication in Scala

I have a matrix of size D by D (implemented as List[List[Int]]) and a vector of size D (implemented as List[Int]). Assuming D = 3, I can create the matrix and vector in the following way.
val Vector = List(1,2,3)
val Matrix = List(List(4,5,6) , List(7,8,9) , List(10,11,12))
I can multiply both these as
(Matrix,Vector).zipped.map((x,y) => (x,Vector).zipped.map(_*_).sum )
This code multiplies the matrix by the vector and returns a vector, as needed. Is there a faster or more optimal way to get the same result in Scala's functional style? In my scenario, D is much bigger.
What about something like this?
def vectorDotProduct[N : Numeric](v1: List[N], v2: List[N]): N = {
import Numeric.Implicits._
// You may replace this with a while loop over two iterators if you require even more speed.
@annotation.tailrec
def loop(remaining1: List[N], remaining2: List[N], acc: N): N =
(remaining1, remaining2) match {
case (x :: tail1, y :: tail2) =>
loop(
remaining1 = tail1,
remaining2 = tail2,
acc + (x * y)
)
case (Nil, _) | (_, Nil) =>
acc
}
loop(
remaining1 = v1,
remaining2 = v2,
acc = Numeric[N].zero
)
}
def matrixVectorProduct[N : Numeric](matrix: List[List[N]], vector: List[N]): List[N] =
matrix.map(row => vectorDotProduct(vector, row))
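The comment in the answer mentions a while loop over two iterators as a faster alternative. A rough sketch of that idea, specialised to Int for brevity (the names dotProductLoop and matrixTimesVector are assumptions, not part of the answer):

```scala
// Hypothetical while-loop variant: walks both lists in lockstep and
// stops at the end of the shorter one, like the recursive version.
def dotProductLoop(v1: List[Int], v2: List[Int]): Int = {
  val it1 = v1.iterator
  val it2 = v2.iterator
  var acc = 0
  while (it1.hasNext && it2.hasNext) {
    acc += it1.next() * it2.next()
  }
  acc
}

def matrixTimesVector(matrix: List[List[Int]], vector: List[Int]): List[Int] =
  matrix.map(row => dotProductLoop(row, vector))

val product = matrixTimesVector(
  List(List(4, 5, 6), List(7, 8, 9), List(10, 11, 12)),
  List(1, 2, 3)
)
```

Whether this is actually faster than the tail-recursive version depends on the data and the JVM, so it is worth measuring before committing to it.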

How to optionally return value from map function

The map function on collections requires a value to be returned for each iteration. But I'm trying to find a way to return a value not for every iteration, but only for the initial values that match some predicate.
What I want looks something like this:
(1 to 10).map { x =>
val res: Option[Int] = service.getById(x)
if (res.isDefined) Pair(x, res.get )// no else part
}
I think something like the collect function could do it, but it seems that with collect I need to write a lot of code in guard blocks (case x if {...// too much code here})
If you are returning an Option you can flatMap it and get only the values that are present (that is, are not None).
(1 to 10).flatMap { x =>
val res: Option[Int] = service.getById(x)
res.map{y => Pair(x, y) }
}
As you suggest, an alternative way to combine map and filter is to use collect with a partial function. Here is a simplified example:
(1 to 10).collect{ case x if x > 5 => x*2 }
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(12, 14, 16, 18, 20)
You can use the collect function to do exactly what you want. Your example would then look like:
(1 to 10) map (x => (x, service.getById(x))) collect {
case (x, Some(res)) => Pair(x, res)
}
Using a for comprehension, like this,
for ( x <- 1 to 10; res <- service.getById(x) ) yield Pair(x, res)
This yields pairs where res does not evaluate to None.
Getting the first element:
(1 to 10).flatMap { x =>
val res: Option[Int] = service.getById(x)
res.map{y => Pair(x, y) }
}.head
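To compare the approaches above side by side, here is a self-contained sketch. The getById function is a made-up stand-in for service.getById, and plain tuples are used instead of Pair, which was deprecated in Scala 2.11:

```scala
// Hypothetical stand-in for service.getById: only even ids resolve.
def getById(id: Int): Option[Int] =
  if (id % 2 == 0) Some(id * 10) else None

// flatMap drops the None results and unwraps the Some results.
val viaFlatMap = (1 to 10).flatMap(x => getById(x).map(y => (x, y)))

// map followed by collect, pattern matching on Some.
val viaCollect = (1 to 10).map(x => (x, getById(x))).collect {
  case (x, Some(y)) => (x, y)
}

// for comprehension: y is already the unwrapped value.
val viaFor = for { x <- 1 to 10; y <- getById(x) } yield (x, y)

// headOption avoids the exception that head throws when nothing matches.
val firstMatch = viaFlatMap.headOption
```

All three produce the same sequence of pairs, so the choice is mostly a matter of readability.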

How to avoid for loop with Spark?

I'm new to Spark and don't understand how the MapReduce mechanism works with Spark. I have one CSV file containing only doubles. What I want is to perform an operation (compute the Euclidean distance) between the first vector and the rest of the RDD, then iterate over the other vectors. Is there another way to do this than the one below? Maybe by making wise use of the Cartesian product...
val rdd = sc.parallelize(Array((1,Vectors.dense(1,2)),(2,Vectors.dense(3,4)),...))
val array_vects = rdd.collect
val size = rdd.count
val emptyArray = Array((0,Vectors.dense(0))).tail
var rdd_rez = sc.parallelize(emptyArray)
for( ind <- 0 to size -1 ) {
val vector = array_vects(ind)._2
val rest = rdd.filter(x => x._1 != ind)
val rdd_dist = rest.map( x => (x._1 , Vectors.sqdist(x._2,vector)))
rdd_rez = rdd_rez ++ rdd_dist
}
Thank you for your support.
The distances (between all pairs of vectors) can be calculated using rdd.cartesian:
val rdd = sc.parallelize(Array((1,Vectors.dense(1,2)),
(2,Vectors.dense(3,4)),...))
val product = rdd.cartesian(rdd)
val result = product.filter{ case ((a, b), (c, d)) => a != c }
.map { case ((a, b), (c, d)) =>
(a, Vectors.sqdist(b, d)) }
I'm not sure why you were trying to do it that way. You can simply do the following:
val initialArray = Array( ( 1,Vectors.dense(1,2) ), ( 2,Vectors.dense(3,4) ),... )
val firstVector = initialArray( 0 )._2 // keep only the vector, not the (id, vector) pair
val initialRdd = sc.parallelize( initialArray )
val euclideanRdd = initialRdd.map( { case ( i, vec ) => ( i, euclidean( firstVector, vec ) ) } )
Here, euclidean is a function that we define to take two dense vectors and return the Euclidean distance between them.
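The answer leaves euclidean undefined. One possible sketch, written against plain Array[Double] so it runs without Spark (an assumption: with MLlib vectors you would pass vec.toArray, or take math.sqrt of the Vectors.sqdist result used in the other answer):

```scala
// Hypothetical helper: Euclidean distance between two equal-length arrays.
def euclidean(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "vectors must have the same dimension")
  math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
}

val d = euclidean(Array(1.0, 2.0), Array(3.0, 4.0))
```

Note that Vectors.sqdist returns the squared distance, so the two answers differ by a square root unless it is applied explicitly.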

How to compute inverse of a multi-map

I have a Scala Map:
x: [b,c]
y: [b,d,e]
z: [d,f,g,h]
I want inverse of this map for look-up.
b: [x,y]
c: [x]
d: [y,z] and so on.
Is there a way to do it without using intermediate mutable maps?
If it's not a multi-map, then the following works:
typeMap.flatMap { case (k, v) => v.map(vv => (vv, k))}
EDIT: fixed answer to include what Marth rightfully pointed out. My answer is a bit more lengthy than his, as I try to go through each step for educational purposes rather than use the magic provided by flatMap; his is more straightforward :)
I'm unsure about your notation. I assume that what you have is something like:
val myMap = Map[T, Set[T]] (
x -> Set(b, c),
y -> Set(b, d, e),
z -> Set(d, f, g, h)
)
You can achieve the reverse lookup as follows:
val instances = for {
keyValue <- myMap.toList
value <- keyValue._2
}
yield (value, keyValue._1)
At this point, your instances variable is a List of the type:
(b, x), (c, x), (b, y) ...
If you now do:
val groupedLookups = instances.groupBy(_._1)
You get:
b -> ((b, x), (b, y)),
c -> ((c, x)),
d -> ((d, y), (d, z)) ...
Now we want to reduce the values so that they only contain the second part of each pair. Therefore we do:
val reverseLookup = groupedLookups.map(kv => kv._1 -> kv._2.map(_._2))
Which means that for every pair we maintain the original key, but we map the list of arguments to something that only has the second value of the pair.
And there you have your result.
(You can also avoid assigning to an intermediate result, but I thought it was clearer like this)
Here is my simplification as a function:
def reverseMultimap[T1, T2](map: Map[T1, Seq[T2]]): Map[T2, Seq[T1]] =
map.toSeq
.flatMap { case (k, vs) => vs.map((_, k)) }
.groupBy(_._1)
.mapValues(_.map(_._2))
The above was derived from @Diego Martinoia's answer, corrected and reproduced below in function form:
def reverseMultimap[T1, T2](myMap: Map[T1, Seq[T2]]): Map[T2, Seq[T1]] = {
val instances = for {
keyValue <- myMap.toList
value <- keyValue._2
} yield (value, keyValue._1)
val groupedLookups = instances.groupBy(_._1)
val reverseLookup = groupedLookups.map(kv => kv._1 -> kv._2.map(_._2))
reverseLookup
}
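As a quick check, the same inversion can be run against the sample data from the question. This sketch uses map with a pattern match after groupBy instead of mapValues, since the behaviour of mapValues differs between Scala versions. The name invertMultimap and the string keys are assumptions for illustration:

```scala
// Hypothetical helper: flatten to (value, key) pairs, group by value,
// then keep only the keys in each group.
def invertMultimap[K, V](m: Map[K, Seq[V]]): Map[V, Seq[K]] =
  m.toSeq
    .flatMap { case (k, vs) => vs.map(v => (v, k)) }
    .groupBy(_._1)
    .map { case (v, pairs) => v -> pairs.map(_._2) }

val typeMap = Map(
  "x" -> Seq("b", "c"),
  "y" -> Seq("b", "d", "e"),
  "z" -> Seq("d", "f", "g", "h")
)
val inverted = invertMultimap(typeMap)
```

Here inverted maps b to x and y, c to x, and d to y and z, matching the lookup the question asks for.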