Attacking grouping sets of values functionally - scala

Given a map associating indices to values, how do I create a separate map that accumulates the values that are above a particular threshold when the number of values that can be grouped together cannot exceed some limiting value?
For example, given a mapping like this:
val raw = Map(0 -> 2, 1 -> 1, 2 -> 2, 3 -> 0, 4 -> 1, 5 -> 2)
Group the values together, where each group may contain at most 2 values: if the first value considered is >= 2, the group contains just that single value; otherwise the group has size 2 and its value is the sum of the 1st value and the 2nd value.
Executing that on the mapping above would yield a map of the group's index to the value, e.g.,
Map(0 -> 2, 1 -> 3, 2 -> 1, 3 -> 2) // Result
Obviously the non-functional way to do this would be something like:
var c = 0
var sortedIndex = 0
var acc: Map[Int, Int] = Map() // Result accumulator
val limit = 2 // Anything larger will be forced into the next group
while (c < raw.size) {
  if (raw(c) >= limit) {
    acc = acc ++ Map(sortedIndex -> raw(c))
    c = c + 1
  } else {
    acc = acc ++ Map(sortedIndex -> (raw(c) + raw(c + 1)))
    c = c + 2
  }
  sortedIndex = sortedIndex + 1
}
acc
How would I do this functionally? I.e., with immutable state, reducing my use of loops. (I understand that loops are not "dead" in FP; I am just trying to reinforce a use case where I can get away with NOT using them.)

I do not think you need to work with a Map for this problem, since the key of the map is just an index. In any case, the following works for your problem:
val testLimit = 2 // Update the constants as required
val takeUpto = 2
import scala.annotation.tailrec

@tailrec
def accumulator(input: List[Int], output: List[Int] = List.empty[Int]): List[Int] = {
  input match {
    case Nil => output // We have reached the end of the input
    case head :: tail if head >= testLimit => accumulator(tail, output :+ head)
    case m =>
      val (toSum, next) = m.splitAt(takeUpto)
      accumulator(next, output :+ toSum.sum)
  }
}
// Result: List(2, 3, 1, 2), i.e., Map(0 -> 2, 1 -> 3, 2 -> 1, 3 -> 2) when keyed by position
// val raw = Map(0 -> 2, 1 -> 1, 2 -> 2, 3 -> 0, 4 -> 1, 5 -> 2) is equivalent to List(2, 1, 2, 0, 1, 2)
println(accumulator(List(2, 1, 2, 0, 1, 2)))
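If you do need the result keyed by group index, a small follow-up sketch (my addition, not part of the answer above) re-keys the list with zipWithIndex:
val asMap = accumulator(List(2, 1, 2, 0, 1, 2)).zipWithIndex
  .map { case (value, index) => index -> value }
  .toMap
// asMap: Map[Int,Int] = Map(0 -> 2, 1 -> 3, 2 -> 1, 3 -> 2)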

How to pass the initial value to foldLeft from a filtered List with function chaining?

Say I have a List. I filter it first on some condition. Now I want to pass the first element of this filtered list as the initial value to foldLeft, all while chaining both together. Is there a way to do that?
For example:
scala> val numbers = List(5, 4, 8, 6, 2)
val numbers: List[Int] = List(5, 4, 8, 6, 2)
scala> numbers.filter(_ % 2 == 0).foldLeft(numbers(0)) { // this is obviously incorrect since numbers(0) is the value at index 0 of the original array not the filtered array
| (z, i) => z + i
| }
val res88: Int = 25
You could just pattern match on the result of filtering to get the first element of the list (head) and the rest (tail):
val numbers = List(5, 4, 8, 6, 2)
val result = numbers.filter(_ % 2 == 0) match {
  case head :: tail => tail.foldLeft(head) {
    (z, i) => z + i
  }
  // here you need to handle the case when, after filtering, there are no elements; here I just return 0
  case Nil => 0
}
You could also just use reduce:
numbers.filter(_ % 2 == 0).reduce {
  (z, i) => z + i
}
but it will throw an exception if the list is empty after filtering.
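A safer variant (my sketch, not part of the answer above) sidesteps the exception with reduceOption plus a default for the empty case:
numbers.filter(_ % 2 == 0).reduceOption(_ + _).getOrElse(0)
// res: Int = 20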

How to move contents of one element in a map to another element in Scala

I am trying to transfer/copy an element in a map, to another element in the map in Scala. For example:
Map(0 -> 5)
Let's say this is the initial state of the map. What I want to happen is the following:
Map(0 -> 0, 1 -> 5)
So 0 initially points to 5, but after the transformation 0 will point to 0, and a new element (1) is added that points to 5.
I have tried the following:
theMap + (pointer -> (theMap(pointer) + 1))
However, I get the following error:
java.util.NoSuchElementException: key not found: 1
Thanks for any help!
This should do the trick.
def transfer(pointer: Int)(map: Map[Int, Int]): Map[Int, Int] =
  map.get(key = pointer) match {
    case Some(value) =>
      map ++ Map(
        pointer -> 0,
        (pointer + 1) -> value
      )
    case None =>
      // Pointer didn't exist, what should happen here?
      map // For now returning the map unmodified.
  }
And you can use it like this:
transfer(pointer = 0)(map = Map(0 -> 5))
// res: Map[Int,Int] = Map(0 -> 0, 1 -> 5)
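A more compact variant (my own sketch, not part of the answer above) expresses the same logic with Option combinators and Map.updated:
def transferCompact(pointer: Int)(map: Map[Int, Int]): Map[Int, Int] =
  map.get(pointer)
    .map(value => map.updated(pointer, 0).updated(pointer + 1, value))
    .getOrElse(map) // pointer absent: return the map unmodified

transferCompact(pointer = 0)(map = Map(0 -> 5))
// res: Map[Int,Int] = Map(0 -> 0, 1 -> 5)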

Calculating the Dot Product of two Sparse Vectors (and generating them) in Scala using the standard library

I am trying to calculate the dot product (scalar product) of two sparse vectors in Scala. The code I have written is doing everything that I want it to, except when multiplying the similar elements of the two vectors, it is not accounting for the 0 values.
I expect to get 72 as my answer as 3 and 18 are the only keys that are both non-zero and they evaluate to: (3 -> 21) + (18 -> 51) = 72
I used withDefaultValue(0) hoping it would "fill in" the unmentioned key/value pairs but I do not think this is the case, and I believe this is where my problem is coming from, in the very beginning. I think my question could also be "How to generate a Sparse Vector in Scala using the Standard Library".
If I enter the corresponding 0's and the two Maps (vectors) have the same number of key/value pairs, my code works properly.
```
val Sparse1 = Map(0 -> 4, 3 -> 7, 6 -> 11, 18 -> 17).withDefaultValue(0)
val Sparse2 = Map(1 -> 3, 3 -> 3, 11 -> 2, 18 -> 3, 20 -> 6).withDefaultValue(0)
//println(Sparse2.toSeq) // to see what it is... 0's missing
val SparseSum = (Sparse1.toSeq ++ Sparse2.toSeq).groupBy(_._1).mapValues(_.map(_._2).sum)
//println(SparseSum)
val productOfValues = (Sparse1.toSeq ++ Sparse2.toSeq).groupBy(_._1).mapValues(_.map(_._2).reduce(_ * _))
//println(productOfValues)
var dotProduct = 0
for ((h, i) <- productOfValues) {
  dotProduct += i
}
//println(dotProduct)

// If I specify some zero values, let's see what happens:
val Sparse3 = Map(0 -> 4, 1 -> 0, 3 -> 7, 6 -> 11, 11 -> 0, 18 -> 17, 20 -> 0).withDefaultValue(0)
val Sparse4 = Map(0 -> 0, 1 -> 3, 3 -> 3, 6 -> 0, 11 -> 2, 18 -> 3, 20 -> 6).withDefaultValue(0)
val productOfValues2 = (Sparse3.toSeq ++ Sparse4.toSeq).groupBy(_._1).mapValues(_.map(_._2).reduce(_ * _))
var dotProduct2 = 0
for ((l, m) <- productOfValues2) {
  dotProduct2 += m
}
println(productOfValues2)
println(dotProduct2) // I get 72
```
I can create a Sparse Vector this way, and then update the values
import scala.collection.mutable.Map

val Sparse1 = Map[Int, Int]()
for (k <- 0 to 20) {
  Sparse1 getOrElseUpdate (k, 0)
}
val Sparse2 = Map[Int, Int]()
for (k <- 0 to 20) {
  Sparse2 getOrElseUpdate (k, 0)
}
But I'm wondering if there is a "better" way. More along the lines of what I tried and failed to do using "withDefaultValue(0)"
Since you are using sparse vectors, you can ignore all keys that are not present in both vectors.
Thus, I would compute the intersection of the two key sets and then perform a simple map-reduce to compute the dot product.
type SparseVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def sparseDotProduct[T: Numeric](v1: SparseVector[T], v2: SparseVector[T]): T = {
  import Numeric.Implicits._
  val commonIndexes = v1.keySet & v2.keySet
  commonIndexes
    .iterator // iterate instead of Set.map, so equal products are not collapsed into one element
    .map(i => v1(i) * v2(i))
    .foldLeft(implicitly[Numeric[T]].zero)(_ + _)
}
Then, you can use it like this:
// The withDefaultValue(0) is optional now.
val sparse1 = Map(0 -> 4, 3 -> 7, 6 -> 11, 18 -> 17).withDefaultValue(0)
val sparse2 = Map(1 -> 3, 3 -> 3, 11 -> 2, 18 -> 3, 20 -> 6).withDefaultValue(0)
sparseDotProduct(sparse1, sparse2)
// res: Int = 72
Edit - the same method, but without context bounds & implicit syntax.
type SparseVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def sparseDotProduct[T](v1: SparseVector[T], v2: SparseVector[T])(implicit N: Numeric[T]): T = {
  val commonIndexes = v1.keySet & v2.keySet
  commonIndexes
    .iterator // again, avoid Set.map collapsing equal products
    .map(i => N.times(v1(i), v2(i)))
    .foldLeft(N.zero)((acc, element) => N.plus(acc, element))
}
Bonus - a general approach for non-sparse vectors.
One can modify the above method to work for any kind of vector, not just sparse ones.
In this case, we need the union of the keys and must take into account indexes that exist in only one of the vectors.
type MyVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def dotProduct[T: Numeric](v1: MyVector[T], v2: MyVector[T]): T = {
  import Numeric.Implicits._
  val zero = implicitly[Numeric[T]].zero
  val allIndexes = v1.keySet | v2.keySet
  allIndexes.iterator.map { i =>
    v1.getOrElse(key = i, default = zero) * v2.getOrElse(key = i, default = zero)
  }.foldLeft(zero)(_ + _)
}
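On the same vectors as before this should also give 72, since indexes present in only one vector contribute zero to the sum:
dotProduct(sparse1, sparse2)
// res: Int = 72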

Count occurrences of each item in a Scala parallel collection

My question is very similar to Count occurrences of each element in a List[List[T]] in Scala, except that I would like to have an efficient solution involving parallel collections.
Specifically, I have a large (~10^7) vector vec of short (~10) lists of Ints, and I would like to get for each Int x the number of times x occurs, for example as a Map[Int,Int]. The number of distinct integers is of the order 10^6.
Since the machine this needs to be done on has a fair amount of memory (150GB) and number of cores (>100) it seems like parallel collections would be a good choice for this. Is the code below a good approach?
val flatpvec = vec.par.flatten
val flatvec = flatpvec.seq
val unique = flatpvec.distinct
val counts = unique map (x => (x -> flatvec.count(_ == x)))
counts.toMap
Or are there better solutions? In case you are wondering about the .seq conversion: for some reason the following code doesn't seem to terminate, even for small examples:
val flatpvec = vec.par.flatten
val unique = flatpvec.distinct
val counts = unique map (x => (x -> flatpvec.count(_ == x)))
counts.toMap
This does the job: aggregate is like fold, except that you also combine the results of the sequential folds.
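As a minimal sketch of the same idea with immutable maps (my illustration, not the benchmarked code below; it assumes the pre-2.13 parallel collections API):
val counts = Vector(List(1, 2), List(2, 3)).par.aggregate(Map.empty[Int, Int])(
  // seqop: fold one inner list into a partial count map
  (m, xs) => xs.foldLeft(m)((acc, i) => acc.updated(i, acc.getOrElse(i, 0) + 1)),
  // combop: merge two partial count maps
  (m1, m2) => m2.foldLeft(m1) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }
)
// counts: Map(1 -> 1, 2 -> 2, 3 -> 1)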
Update: It's not surprising that there is overhead in .par.groupBy, but I was surprised by the constant factor. By these numbers, you would never count that way. Also, I had to bump the memory way up.
The interesting technique used to build the result map is described in this paper linked from the overview. (It cleverly saves the intermediate results and then coalesces them in parallel at the end.)
But copying around the intermediate results of the groupBy turns out to be expensive, if all you really want is a count.
The numbers are comparing sequential groupBy, parallel, and finally aggregate.
apm#mara:~/tmp$ scalacm countints.scala ; scalam -J-Xms8g -J-Xmx8g -J-Xss1m countints.Test
GroupBy: Starting...
Finished in 12695
GroupBy: List((233,10078), (237,20041), (268,9939), (279,9958), (315,10141), (387,9917), (462,9937), (680,9932), (848,10139), (858,10000))
Par GroupBy: Starting...
Finished in 51481
Par GroupBy: List((233,10078), (237,20041), (268,9939), (279,9958), (315,10141), (387,9917), (462,9937), (680,9932), (848,10139), (858,10000))
Aggregate: Starting...
Finished in 2672
Aggregate: List((233,10078), (237,20041), (268,9939), (279,9958), (315,10141), (387,9917), (462,9937), (680,9932), (848,10139), (858,10000))
Nothing magical in the test code.
import collection.GenTraversableOnce
import collection.concurrent.TrieMap
import collection.mutable
import concurrent.duration._

trait Timed {
  def now = System.nanoTime
  def timed[A](op: => A): A = {
    val start = now
    val res = op
    val end = now
    val lapsed = (end - start).nanos.toMillis
    Console println s"Finished in $lapsed"
    res
  }
  def showtime(title: String, op: => GenTraversableOnce[(Int, Int)]): Unit = {
    Console println s"$title: Starting..."
    val res = timed(op)
    //val showable = res.toIterator.min //(res.toIterator take 10).toList
    val showable = res.toList.sorted take 10
    Console println s"$title: $showable"
  }
}
It generates some random data for interest.
object Test extends App with Timed {
  val upto = math.pow(10, 6).toInt
  val ran = new java.util.Random
  val ten = (1 to 10).toList
  val maxSamples = 1000
  // samples of ten random numbers in the desired range
  val samples = (1 to maxSamples).toList map (_ => ten map (_ => ran nextInt upto))
  // pick a sample at random
  def anyten = samples(ran nextInt maxSamples)
  def mag = 7
  val data: Vector[List[Int]] = Vector.fill(math.pow(10, mag).toInt)(anyten)
The sequential operation and the combining operation of aggregate are invoked from a task, and the result is assigned to a volatile var.
  def z: mutable.Map[Int, Int] = mutable.Map.empty[Int, Int]
  def so(m: mutable.Map[Int, Int], is: List[Int]) = {
    for (i <- is) {
      val v = m.getOrElse(i, 0)
      m(i) = v + 1
    }
    m
  }
  def co(m: mutable.Map[Int, Int], n: mutable.Map[Int, Int]) = {
    for ((i, count) <- n) {
      val v = m.getOrElse(i, 0)
      m(i) = v + count
    }
    m
  }

  showtime("GroupBy", data.flatten groupBy identity map { case (k, vs) => (k, vs.size) })
  showtime("Par GroupBy", data.flatten.par groupBy identity map { case (k, vs) => (k, vs.size) })
  showtime("Aggregate", data.par.aggregate(z)(so, co))
}
If you want to make use of parallel collections and the Scala standard tools, you could do it like this: group your collection by identity and then map each group to (Value, Count):
scala> val longList = List(1, 5, 2, 3, 7, 4, 2, 3, 7, 3, 2, 1, 7)
longList: List[Int] = List(1, 5, 2, 3, 7, 4, 2, 3, 7, 3, 2, 1, 7)
scala> longList.par.groupBy(x => x)
res0: scala.collection.parallel.immutable.ParMap[Int,scala.collection.parallel.immutable.ParSeq[Int]] = ParMap(5 -> ParVector(5), 1 -> ParVector(1, 1), 2 -> ParVector(2, 2, 2), 7 -> ParVector(7, 7, 7), 3 -> ParVector(3, 3, 3), 4 -> ParVector(4))
scala> longList.par.groupBy(x => x).map(x => (x._1, x._2.size))
res1: scala.collection.parallel.immutable.ParMap[Int,Int] = ParMap(5 -> 1, 1 -> 2, 2 -> 3, 7 -> 3, 3 -> 3, 4 -> 1)
Or even nicer like pagoda_5b suggested in the comments:
scala> longList.par.groupBy(identity).mapValues(_.size)
res1: scala.collection.parallel.ParMap[Int,Int] = ParMap(5 -> 1, 1 -> 2, 2 -> 3, 7 -> 3, 3 -> 3, 4 -> 1)
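As an aside (my addition, not from the original answers): on Scala 2.13+ the sequential version collapses into a single pass with groupMapReduce, though note that in 2.13 the parallel collections live in a separate scala-parallel-collections module:
longList.groupMapReduce(identity)(_ => 1)(_ + _)
// res: Map[Int,Int] = Map(5 -> 1, 1 -> 2, 2 -> 3, 7 -> 3, 3 -> 3, 4 -> 1)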

Comparing items in two lists

I have two lists: List(1,1,1) and List(1,0,1)
I want to get the following:
A count of every position that has a 1 in the first list and a 0 at the same position in the second list, and vice versa.
In the above example this would be 1 and 0, since the first list has a 1 at the middle position where the second list has a 0, and there is no position where the second list has a 1 and the first a 0.
A count of every position where there is a 1 in the first list and a 1 in the second list.
In the above example this is two, since there are two positions with a 1 in both lists. I can get this using the intersect method of class List.
I am just looking for an answer to point 1 above. I could use an iterative approach to count the items, but is there a more functional method?
Here is the entire code :
class Similarity {
  def getSimilarity(number1: List[Int], number2: List[Int]) = {
    val num: List[Int] = number1.intersect(number2)
    println("P is " + num.length)
  }
}

object HelloWorld {
  def main(args: Array[String]) {
    val s = new Similarity
    s.getSimilarity(List(1, 1, 1), List(1, 0, 1))
  }
}
For the first one:
scala> val a = List(1,1,1)
a: List[Int] = List(1, 1, 1)
scala> val b = List(1,0,1)
b: List[Int] = List(1, 0, 1)
scala> a.zip(b).filter(x => x._1==1 && x._2==0).size
res7: Int = 1
For the second:
scala> a.zip(b).filter(x => x._1==1 && x._2==1).size
res7: Int = 2
You can count all combinations easily and have them in a map with:
def getSimilarity(number1: List[Int], number2: List[Int]) = {
  // sorry for the 1-liner, explanation follows
  val countMap = (number1 zip number2) groupBy (identity) mapValues { _.length }
  countMap
}
/*
 * Example
 * number1 = List(1,1,0,1,0,0,1)
 * number2 = List(0,1,1,1,0,1,1)
 *
 * countMap = Map((1,0) -> 1, (1,1) -> 3, (0,1) -> 2, (0,0) -> 1)
 */
The trick is a common one:
// zip the elements pairwise
(number1 zip number2)
/* List((1,0), (1,1), (0,1), (1,1), (0,0), (0,1), (1,1))
 *
 * then group together with the identity function, so pairs
 * with the same elements are grouped together and the key is the pair itself
 */
.groupBy(identity)
/* Map( (1,0) -> List((1,0)),
 *      (1,1) -> List((1,1), (1,1), (1,1)),
 *      (0,1) -> List((0,1), (0,1)),
 *      (0,0) -> List((0,0))
 *    )
 *
 * finally you count the pairs, mapping the values to the length of each list
 */
.mapValues(_.length)
/* Map( (1,0) -> 1,
 *      (1,1) -> 3,
 *      (0,1) -> 2,
 *      (0,0) -> 1
 *    )
 */
Then all you need to do is look the counts up in the map.
a.zip(b).filter(x => x._1 != x._2).size
Almost the same solution as the one proposed by Jatin, except that you can use List.count for better readability:
def getSimilarity(l1: List[Int], l2: List[Int]) =
  l1.zip(l2).count { case (x, y) => x != y }
You can also use foldLeft. Assuming the lists contain only 0s and 1s:
a.zip(b).foldLeft(0)((x, y) => if (y._1 + y._2 == 1) x + 1 else x)
1) You could zip the 2 lists to get a list of (Int, Int) pairs, collect only the pairs (1, 0) and (0, 1), replace (1, 0) with 1 and (0, 1) with -1, and take the sum. If the counts of (1, 0) and (0, 1) are the same, the sum will equal 0:
val (l1, l2) = (List(1,1,1), List(1,0,1))

(l1 zip l2).collect {
  case (1, 0) => 1
  case (0, 1) => -1
}.sum == 0
You could use the view method to prevent the creation of intermediate collections.
2) You could use filter and length to get the count of elements matching some condition:
(l1 zip l2).filter{ _ == (1, 1) }.length
(l1 zip l2).collect{ case (1, 1) => () }.length
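Both counts can also be computed in one pass with a single foldLeft; a sketch of my own, assuming the lists contain only 0s and 1s and have equal length:
val (mismatches, matches) = (l1 zip l2).foldLeft((0, 0)) {
  case ((diff, same), (x, y)) =>
    (if (x != y) diff + 1 else diff,           // counts both (1,0) and (0,1) pairs
     if (x == 1 && y == 1) same + 1 else same) // counts (1,1) pairs
}
// mismatches: Int = 1, matches: Int = 2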