Count occurrences of each item in a Scala parallel collection

My question is very similar to Count occurrences of each element in a List[List[T]] in Scala, except that I would like to have an efficient solution involving parallel collections.
Specifically, I have a large (~10^7) vector vec of short (~10) lists of Ints, and I would like to get, for each Int x, the number of times x occurs, for example as a Map[Int,Int]. The number of distinct integers is on the order of 10^6.
Since the machine this needs to be done on has a fair amount of memory (150 GB) and a large number of cores (>100), parallel collections seem like a good choice for this. Is the code below a good approach?
val flatpvec = vec.par.flatten
val flatvec = flatpvec.seq
val unique = flatpvec.distinct
val counts = unique map (x => (x -> flatvec.count(_ == x)))
counts.toMap
Or are there better solutions? In case you are wondering about the .seq conversion: for some reason the following code doesn't seem to terminate, even for small examples:
val flatpvec = vec.par.flatten
val unique = flatpvec.distinct
val counts = unique map (x => (x -> flatpvec.count(_ == x)))
counts.toMap

This does something: aggregate is like fold, except you also combine the results of the sequential folds.
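In sketch form, the counting looks like this (a minimal immutable-map variant, assuming vec: Vector[List[Int]] as in the question; the timed code below uses mutable maps instead):
val counts: Map[Int, Int] =
  vec.par.aggregate(Map.empty[Int, Int])(
    // seqop: fold one inner List[Int] into a per-chunk count map
    (m, is) => is.foldLeft(m)((acc, i) => acc + (i -> (acc.getOrElse(i, 0) + 1))),
    // combop: merge the count maps produced by different chunks
    (m, n) => n.foldLeft(m) { case (acc, (i, c)) => acc + (i -> (acc.getOrElse(i, 0) + c)) }
  )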
Update: It's not surprising that there is overhead in .par.groupBy, but I was surprised by the constant factor. By these numbers, you would never count that way. Also, I had to bump the memory way up.
The interesting technique used to build the result map is described in this paper linked from the overview. (It cleverly saves the intermediate results and then coalesces them in parallel at the end.)
But copying around the intermediate results of the groupBy turns out to be expensive, if all you really want is a count.
The numbers are comparing sequential groupBy, parallel, and finally aggregate.
apm#mara:~/tmp$ scalacm countints.scala ; scalam -J-Xms8g -J-Xmx8g -J-Xss1m countints.Test
GroupBy: Starting...
Finished in 12695
GroupBy: List((233,10078), (237,20041), (268,9939), (279,9958), (315,10141), (387,9917), (462,9937), (680,9932), (848,10139), (858,10000))
Par GroupBy: Starting...
Finished in 51481
Par GroupBy: List((233,10078), (237,20041), (268,9939), (279,9958), (315,10141), (387,9917), (462,9937), (680,9932), (848,10139), (858,10000))
Aggregate: Starting...
Finished in 2672
Aggregate: List((233,10078), (237,20041), (268,9939), (279,9958), (315,10141), (387,9917), (462,9937), (680,9932), (848,10139), (858,10000))
Nothing magical in the test code.
import collection.GenTraversableOnce
import collection.concurrent.TrieMap
import collection.mutable
import concurrent.duration._

trait Timed {
  def now = System.nanoTime
  def timed[A](op: => A): A = {
    val start = now
    val res = op
    val end = now
    val lapsed = (end - start).nanos.toMillis
    Console println s"Finished in $lapsed"
    res
  }
  def showtime(title: String, op: => GenTraversableOnce[(Int, Int)]): Unit = {
    Console println s"$title: Starting..."
    val res = timed(op)
    //val showable = res.toIterator.min //(res.toIterator take 10).toList
    val showable = res.toList.sorted take 10
    Console println s"$title: $showable"
  }
}
It generates some random data for interest.
object Test extends App with Timed {
  val upto = math.pow(10, 6).toInt
  val ran = new java.util.Random
  val ten = (1 to 10).toList
  val maxSamples = 1000
  // samples of ten random numbers in the desired range
  val samples = (1 to maxSamples).toList map (_ => ten map (_ => ran nextInt upto))
  // pick a sample at random
  def anyten = samples(ran nextInt maxSamples)
  def mag = 7
  val data: Vector[List[Int]] = Vector.fill(math.pow(10, mag).toInt)(anyten)
The sequential operation and the combining operation of aggregate are invoked from a task, and the result is assigned to a volatile var.
  def z: mutable.Map[Int, Int] = mutable.Map.empty[Int, Int]
  def so(m: mutable.Map[Int, Int], is: List[Int]) = {
    for (i <- is) {
      val v = m.getOrElse(i, 0)
      m(i) = v + 1
    }
    m
  }
  def co(m: mutable.Map[Int, Int], n: mutable.Map[Int, Int]) = {
    for ((i, count) <- n) {
      val v = m.getOrElse(i, 0)
      m(i) = v + count
    }
    m
  }
  showtime("GroupBy", data.flatten groupBy identity map { case (k, vs) => (k, vs.size) })
  showtime("Par GroupBy", data.flatten.par groupBy identity map { case (k, vs) => (k, vs.size) })
  showtime("Aggregate", data.par.aggregate(z)(so, co))
}

If you want to make use of parallel collections and Scala standard tools, you could do it like this: group your collection by the identity function and then map each group to (Value, Count):
scala> val longList = List(1, 5, 2, 3, 7, 4, 2, 3, 7, 3, 2, 1, 7)
longList: List[Int] = List(1, 5, 2, 3, 7, 4, 2, 3, 7, 3, 2, 1, 7)
scala> longList.par.groupBy(x => x)
res0: scala.collection.parallel.immutable.ParMap[Int,scala.collection.parallel.immutable.ParSeq[Int]] = ParMap(5 -> ParVector(5), 1 -> ParVector(1, 1), 2 -> ParVector(2, 2, 2), 7 -> ParVector(7, 7, 7), 3 -> ParVector(3, 3, 3), 4 -> ParVector(4))
scala> longList.par.groupBy(x => x).map(x => (x._1, x._2.size))
res1: scala.collection.parallel.immutable.ParMap[Int,Int] = ParMap(5 -> 1, 1 -> 2, 2 -> 3, 7 -> 3, 3 -> 3, 4 -> 1)
Or, even nicer, as pagoda_5b suggested in the comments:
scala> longList.par.groupBy(identity).mapValues(_.size)
res1: scala.collection.parallel.ParMap[Int,Int] = ParMap(5 -> 1, 1 -> 2, 2 -> 3, 7 -> 3, 3 -> 3, 4 -> 1)

Related

Calculating the Dot Product of two Sparse Vectors (and generating them) in Scala using the standard library

I am trying to calculate the dot product (scalar product) of two sparse vectors in Scala. The code I have written does everything I want, except that when multiplying the corresponding elements of the two vectors it does not account for the 0 values.
I expect to get 72 as my answer, as 3 and 18 are the only keys that are non-zero in both vectors, and they evaluate to: (3 -> 21) + (18 -> 51) = 72.
I used withDefaultValue(0) hoping it would "fill in" the unmentioned key/value pairs, but I do not think this is the case, and I believe this is where my problem comes from. I think my question could also be "How to generate a sparse vector in Scala using the standard library".
If I enter the corresponding 0's and the two Maps (vectors) have the same number of key/value pairs, my code works properly.
```
val Sparse1 = Map(0 -> 4, 3 -> 7, 6 -> 11, 18 -> 17).withDefaultValue(0)
val Sparse2 = Map(1 -> 3, 3 -> 3, 11 -> 2, 18 -> 3, 20 -> 6).withDefaultValue(0)
//println(Sparse2.toSeq) // to see what it is....0's missing
val SparseSum = (Sparse1.toSeq ++ Sparse2.toSeq).groupBy(_._1).mapValues(_.map(_._2).sum)
//println(SparseSum)
val productOfValues = (Sparse1.toSeq ++ Sparse2.toSeq).groupBy(_._1).mapValues(_.map(_._2).reduce(_ * _))
//println(productOfValues)
var dotProduct = 0
for ((h, i) <- productOfValues) {
  dotProduct += i
}
//println(dotProduct)

// If I specify some zero values, let's see what happens:
val Sparse3 = Map(0 -> 4, 1 -> 0, 3 -> 7, 6 -> 11, 11 -> 0, 18 -> 17, 20 -> 0).withDefaultValue(0)
val Sparse4 = Map(0 -> 0, 1 -> 3, 3 -> 3, 6 -> 0, 11 -> 2, 18 -> 3, 20 -> 6).withDefaultValue(0)
val productOfValues2 = (Sparse3.toSeq ++ Sparse4.toSeq).groupBy(_._1).mapValues(_.map(_._2).reduce(_ * _))
var dotProduct2 = 0
for ((l, m) <- productOfValues2) {
  dotProduct2 += m
}
println(productOfValues2)
println(dotProduct2) // I get 72
```
I can create a Sparse Vector this way, and then update the values
import scala.collection.mutable.Map

val Sparse1 = Map[Int, Int]()
for (k <- 0 to 20) {
  Sparse1 getOrElseUpdate (k, 0)
}
val Sparse2 = Map[Int, Int]()
for (k <- 0 to 20) {
  Sparse2 getOrElseUpdate (k, 0)
}
But I'm wondering if there is a "better" way, more along the lines of what I tried and failed to do with withDefaultValue(0).
Since you are using sparse vectors, you can ignore all keys that are not present in both vectors.
Thus, I would compute the intersection of the two key sets and then perform a simple map-reduce to compute the dot product.
type SparseVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def sparseDotProduct[T: Numeric](v1: SparseVector[T], v2: SparseVector[T]): T = {
  import Numeric.Implicits._
  val commonIndexes = v1.keySet & v2.keySet
  commonIndexes
    .iterator // use an iterator so equal products at different indexes are not collapsed by Set semantics
    .map(i => v1(i) * v2(i))
    .foldLeft(implicitly[Numeric[T]].zero)(_ + _)
}
Then, you can use it like this:
// The withDefaultValue(0) is optional now.
val sparse1 = Map(0 -> 4, 3 -> 7, 6 -> 11, 18 -> 17).withDefaultValue(0)
val sparse2 = Map(1 -> 3, 3 -> 3, 11 -> 2, 18 -> 3, 20 -> 6).withDefaultValue(0)
sparseDotProduct(sparse1, sparse2)
// res: Int = 72
Edit - the same method, but without context bounds & implicit syntax.
type SparseVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def sparseDotProduct[T](v1: SparseVector[T], v2: SparseVector[T])(implicit N: Numeric[T]): T = {
  val commonIndexes = v1.keySet & v2.keySet
  commonIndexes
    .iterator // again, avoid collapsing equal products via Set semantics
    .map(i => N.times(v1(i), v2(i)))
    .foldLeft(N.zero)((acc, element) => N.plus(acc, element))
}
Bonus - a general approach for non-sparse vectors.
One can modify the above method to work for any kind of vector, not just sparse ones.
In this case, we need the union of the key sets, and have to take into account keys that exist in only one of the vectors.
type MyVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def dotProduct[T: Numeric](v1: MyVector[T], v2: MyVector[T]): T = {
  import Numeric.Implicits._
  val zero = implicitly[Numeric[T]].zero
  val allIndexes = v1.keySet | v2.keySet
  allIndexes
    .iterator // same Set caveat as above
    .map(i => v1.getOrElse(i, zero) * v2.getOrElse(i, zero))
    .foldLeft(zero)(_ + _)
}
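A quick sanity check of the bonus version, reusing sparse1 and sparse2 from above (only indexes 3 and 18 overlap, and the missing indexes contribute zero):
dotProduct(sparse1, sparse2)
// res: Int = 72 (7*3 + 17*3), matching sparseDotProduct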

Scala: How to "map" an Array[Int] to a Map[String, Int] using the "map" method?

I have the following Array[Int]: val array = Array(1, 2, 3), for which I have the following mapping relation between an Int and a String:
val a1 = array.map {
  case 1 => "A"
  case 2 => "B"
  case 3 => "C"
}
To create a Map to contain the above mapping relation, I am aware that I can use a foldLeft method:
val a2 = array.foldLeft(Map[String, Int]()) { (m, e) =>
  m + (e match {
    case 1 => ("A", 1)
    case 2 => "B" -> 2
    case 3 => "C" -> 3
  })
}
which outputs:
a2: scala.collection.immutable.Map[String,Int] = Map(A -> 1, B -> 2, C -> 3)
This is the result I want. But can I achieve the same result via the map method?
The following codes do not work:
val a3 = array.map[(String, Int), Map[String, Int]] {
  case 1 => ("A", 1)
  case 2 => ("B", 2)
  case 3 => ("C", 3)
}
The signature of map is
def map[B, That](f: A => B)
(implicit bf: CanBuildFrom[Repr, B, That]): That
What is this CanBuildFrom[Repr, B, That]? I tried to read Tribulations of CanBuildFrom but don't really understand it. That article mentions that Scala 2.12+ provides two implementations of map, but why couldn't I find them when using Scala 2.12.4?
I mostly use Scala 2.11.12.
Call toMap at the end of your expression:
val a3 = array.map {
  case 1 => ("A", 1)
  case 2 => ("B", 2)
  case 3 => ("C", 3)
}.toMap
I'll first define your function here for the sake of brevity in later explanation:
// worth noting that this function is effectively partial
// i.e. will throw a `MatchError` if n is not in (1, 2, 3)
def toPairs(n: Int): (String, Int) =
  n match {
    case 1 => "a" -> 1
    case 2 => "b" -> 2
    case 3 => "c" -> 3
  }
One possible way to go (as already highlighted in another answer) is to use toMap, which only works on collections of pairs:
val ns = Array(1, 2, 3)
ns.toMap // doesn't compile
ns.map(toPairs).toMap // does what you want
It is worth noting, however, that unless you are working with a lazy representation (like an Iterator or a Stream) this will result in two passes over the collection and the creation of an unnecessary intermediate collection: once when mapping toPairs over the collection, and again when turning the collection of pairs into a Map (with toMap).
You can see it clearly in the implementation of toMap.
As suggested in the article you already linked (and in particular here), you can avoid this double pass in two ways:
you can leverage scala.collection.breakOut, an implementation of CanBuildFrom that you can pass to map (among others) to change the target collection, provided that you explicitly provide a type hint for the compiler:
val resultMap: Map[String, Int] = ns.map(toPairs)(collection.breakOut)
val resultSet: Set[(String, Int)] = ns.map(toPairs)(collection.breakOut)
otherwise, you can create a view over your collection, which gives you the lazy wrapper you need for the operation not to result in a double pass:
ns.view.map(toPairs).toMap
You can read more about implicit builder providers and views in this Q&A.
Basically toMap (credits to Sergey Lagutin) is the right answer.
You could actually make the code a bit more compact though:
val a1 = array.map { i => ((i + 64).toChar, i) }.toMap
If you run this code:
val array = Array(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 0)
val a1 = array.map { i => ((i + 64).toChar, i) }.toMap
println(a1)
You will see this on the console:
Map(E -> 5, J -> 10, F -> 6, A -> 1, @ -> 0, G -> 7, L -> 12, B -> 2, C -> 3, H -> 8, K -> 11, D -> 4)

Sum of Values based on key in scala

I am new to Scala. I have a list of tuples of integers:
val list = List((1,2,3),(2,3,4),(1,2,3))
val sum = list.groupBy(_._1).mapValues(_.map(_._2)).sum
val sum2 = list.groupBy(_._1).mapValues(_.map(_._3)).sum
How can I do this for N values? What I tried above is not a good way to sum N values per key. I have also tried:
val sum = list.groupBy(_._1).values.sum // error
val sum = list.groupBy(_._1).mapvalues(_.map(_._2).sum (_._3).sum) // error
It's easier to convert these tuples to List[Int] with shapeless and then work with them; your tuples are actually more like lists anyway. As a bonus, you don't need to change your code at all for lists of Tuple4, Tuple5, etc.
import shapeless._, syntax.std.tuple._

val list = List((1, 2, 3), (2, 3, 4), (1, 2, 3))
list.map(_.toList)                         // convert tuples to lists
  .groupBy(_.head)                         // group by first element of each list
  .mapValues(_.map(_.tail).map(_.sum).sum) // sum the elements of all tails
Result is Map(2 -> 7, 1 -> 10).
val sum = list.groupBy(_._1).map(i => (i._1, i._2.map(j => j._1 + j._2 + j._3).sum))
> sum: scala.collection.immutable.Map[Int,Int] = Map(2 -> 9, 1 -> 12)
Since a tuple can't be converted to a List in a type-safe way, you need to add the elements one by one, as in j._1 + j._2 + j._3. Note that this version also adds the key itself into each sum, which is why it returns Map(2 -> 9, 1 -> 12) rather than Map(2 -> 7, 1 -> 10).
Using the first element of the tuple as the key and the remaining elements as the values to sum, you could do something like this:
val list = List((1,2,3),(2,3,4),(1,2,3))
list: List[(Int, Int, Int)] = List((1, 2, 3), (2, 3, 4), (1, 2, 3))
val sum = list.groupBy(_._1).map { case (k, v) => (k -> v.flatMap(_.productIterator.toList.drop(1).map(_.asInstanceOf[Int])).sum) }
sum: Map[Int, Int] = Map(2 -> 7, 1 -> 10)
I know it's a bit dirty to use asInstanceOf[Int], but .productIterator gives you an Iterator[Any].
This will work for any tuple size.

How can I find repeated items in a Scala List?

I have a Scala List that contains some repeated numbers. I want to count the number of times a specific number will repeat itself. For example:
val list = List(1,2,3,3,4,2,8,4,3,3,5)
val repeats = list.takeWhile(_ == List(3,3)).size
And the val repeats would equal 2.
Obviously the above is pseudo-code, and takeWhile will not find two repeated 3s since _ represents an integer. I tried mixing takeWhile and take(2), but with little success. I also referred to the code from How to find count of repeatable elements in scala list, but it appears that author is looking to achieve something different.
Thanks for your help.
This will work in this case:
val repeats = list.sliding(2).count(_.forall(_ == 3))
The sliding(2) method gives you an iterator over pairs of consecutive elements, and we just count the windows where both elements are equal to 3.
The question is whether this gives the result you want for List(3, 3, 3): should that count as 2 repeats or just 1?
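To make that concrete, here is what the sliding approach counts on a run of three (a small illustrative snippet):
List(3, 3, 3).sliding(2).count(_.forall(_ == 3)) // 2: the windows are List(3, 3) and List(3, 3)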
val repeats = list.sliding(2).toList.count(_==List(3,3))
and more generally, the following code returns (element, repeat count) tuples for all elements:
scala> list.distinct.map(x=>(x,list.sliding(2).toList.count(_.forall(_==x))))
res27: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
which means that the element '3' repeats 2 times consecutively at 2 places, and all others at 0 places.
And if we want elements that repeat 3 times consecutively, we just need to modify the code as follows:
list.distinct.map(x=>(x,list.sliding(3).toList.count(_.forall(_==x))))
In the Scala REPL:
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 5)
scala> list.distinct.map(x=>(x,list.sliding(3).toList.count(_==List(x,x,x))))
res29: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
The sliding window size can even be varied, by defining a function:
def repeatsByTimes(list: List[Int], n: Int) =
  list.distinct.map(x => (x, list.sliding(n).toList.count(_.forall(_ == x))))
Now in the REPL:
scala> val list = List(1,2,3,3,4,2,8,4,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 4, 2, 8, 4, 3, 3, 5)
scala> repeatsByTimes(list,2)
res33: List[(Int, Int)] = List((1,0), (2,0), (3,2), (4,0), (8,0), (5,0))
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,2,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 2, 4, 3, 3, 3, 5)
scala> repeatsByTimes(list,3)
res34: List[(Int, Int)] = List((1,0), (2,0), (3,3), (4,0), (8,0), (5,0))
We can go still further: given a list of integers and a maximum number of consecutive repetitions that any element can occur in the list, we may want a list of 3-tuples representing (the element, the number of repetitions, at how many places this repetition occurred). This is more exhaustive information than the above, and can be achieved by writing a function like this:
def repeats(list: List[Int], maxRep: Int) = {
  var v: List[(Int, Int, Int)] = List()
  for (i <- 1 to maxRep)
    v = v ++ list.distinct.map(x =>
      (x, i, list.sliding(i).toList.count(_.forall(_ == x))))
  v.sortBy(_._1)
}
In the Scala REPL:
scala> val list = List(1,2,3,3,3,4,2,8,4,3,3,3,2,4,3,3,3,5)
list: List[Int] = List(1, 2, 3, 3, 3, 4, 2, 8, 4, 3, 3, 3, 2, 4, 3, 3, 3, 5)
scala> repeats(list,3)
res38: List[(Int, Int, Int)] = List((1,1,1), (1,2,0), (1,3,0), (2,1,3),
(2,2,0), (2,3,0), (3,1,9), (3,2,6), (3,3,3), (4,1,3), (4,2,0), (4,3,0),
(5,1,1), (5,2,0), (5,3,0), (8,1,1), (8,2,0), (8,3,0))
These results can be understood as follows:
The element '1' occurred 1 time in a row at 1 place.
The element '1' occurred 2 times in a row at 0 places.
...
The element '3' occurred 2 times in a row at 6 places.
The element '3' occurred 3 times in a row at 3 places.
...and so on.
Thanks to Luigi Plinge I was able to use methods in run-length encoding to group together items in a list that repeat. I used some snippets from this page here: http://aperiodic.net/phil/scala/s-99/
var n = 0
runLengthEncode(totalFrequencies).foreach { o =>
  if (o._1 > 1 && o._2 == subjectNumber) n += 1
}
n
The method runLengthEncode is as follows:
private def pack[A](ls: List[A]): List[List[A]] = {
  if (ls.isEmpty) List(List())
  else {
    val (packed, next) = ls span { _ == ls.head }
    if (next == Nil) List(packed)
    else packed :: pack(next)
  }
}

private def runLengthEncode[A](ls: List[A]): List[(Int, A)] =
  pack(ls) map { e => (e.length, e.head) }
I'm not entirely satisfied that I needed the mutable var n to count the occurrences, but it did the trick. This counts the number of times a number repeats itself, no matter how long each run is.
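If you would rather avoid the var, the same count can be written as one expression (a sketch reusing runLengthEncode, totalFrequencies and subjectNumber from above):
// count runs longer than 1 whose element is the number in question
val n = runLengthEncode(totalFrequencies).count { case (len, x) => len > 1 && x == subjectNumber }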
If you knew your list was not very long, you could do it with Strings:
val list = List(1, 2, 3, 3, 4, 2, 8, 4, 3, 3, 5)
val matchList = List(3, 3)
matchList.mkString(",").r.findAllMatchIn(list.mkString(",")).length
(Note that this can mismatch once numbers have more than one digit: "13,3" contains the substring "3,3".)
From your pseudocode I got this working:
val pairs = list.sliding(2).toList // create pairs of consecutive elements
val result = pairs.groupBy(x => x).map { case (x, y) => (x, y.size) } // group the pairs and keep each group's size, i.e. its number of occurrences
result will be a Map[List[Int], Int], so you can get the count like this:
result(List(3, 3)) // will return 2
If you also want to check runs of other sizes, you would need to change the parameter of sliding to the desired size.
def pack[A](ls: List[A]): List[List[A]] = {
  if (ls.isEmpty) List(List())
  else {
    val (packed, next) = ls span { _ == ls.head }
    if (next == Nil) List(packed)
    else packed :: pack(next)
  }
}

def runLengthEncode[A](ls: List[A]): List[(Int, A)] = pack(ls) map { e => (e.length, e.head) }

val numberOfNs = list.distinct.map { n =>
  (n -> list.count(_ == n))
}.toMap

val runLengthPerN = runLengthEncode(list).map { t => t._2 -> t._1 }.toMap

val nRepeatedMostInSuccession = runLengthPerN.toList.sortWith(_._2 >= _._2).head._1 // sort descending by run length
Where pack and runLengthEncode are taken from Scala's 99 Problems, problem 9 and problem 10.
Since numberOfNs and runLengthPerN are Maps, you can get the population count of any number in the list with numberOfNs(number), and the length of its longest run in succession with runLengthPerN(number). To get the run lengths per element, just compute runLengthEncode(list).map { t => t._2 -> t._1 } as above.
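For example, with the list from the question (an illustrative run; note that toMap keeps the last run seen for each element, which here is also the longest):
val list = List(1, 2, 3, 3, 4, 2, 8, 4, 3, 3, 5)
numberOfNs(3)    // 4: the value 3 occurs four times in total
runLengthPerN(3) // 2: the longest run of consecutive 3s has length 2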

Scala: idiomatic way to merge list of maps with the greatest value of each key?

I have a List of Map[Int, Int] that all have the same keys (from 1 to 20), and I'd like to merge their contents into a single Map[Int, Int].
I've read another post on stack overflow about merging maps that uses |+| from the scalaz library.
I've come up with the following solution, but it seems clunky to me.
val defaultMap = (2 to ceiling).map((_, 0)).toMap

val factors: Map[Int, Int] =
  (2 to ceiling).map(primeFactors(_)).foldRight(defaultMap)(mergeMaps(_, _))
def mergeMaps(xm: Map[Int, Int], ym: Map[Int, Int]): Map[Int, Int] = {
  def iter(acc: Map[Int, Int], other: Map[Int, Int], i: Int): Map[Int, Int] = {
    if (other.isEmpty) acc
    else iter(acc - i + (i -> math.max(acc(i), other(i))), other - i, i + 1)
  }
  iter(xm, ym, 2)
}

def primeFactors(number: Int): Map[Int, Int] = {
  def iter(factors: Map[Int, Int], rem: Int, i: Int): Map[Int, Int] = {
    if (i > number) factors
    else if (rem % i == 0) iter(factors - i + (i -> (factors(i) + 1)), rem / i, i)
    else iter(factors, rem, i + 1)
  }
  iter((2 to ceiling).map((_, 0)).toMap, number, 2)
}
Explanation: val factors creates a list of maps, each representing the prime factors of one of the numbers from 2-20; these 18 maps are then folded into a single map containing the greatest value for each key.
UPDATE
Using the suggestion of @folone, I end up with the following code (a definite improvement over my original version, and I don't have to change the Maps to HashMaps):
import scalaz._
import Scalaz._
import Tags._

/**
 * Smallest Multiple
 *
 * 2520 is the smallest number that can be divided by each of the numbers
 * from 1 to 10 without any remainder. What is the smallest positive number
 * that is evenly divisible by all of the numbers from 1 to 20?
 *
 * User: Alexandros Bantis
 * Date: 1/29/13
 * Time: 8:07 PM
 */
object Problem005 {

  def findSmallestMultiple(ceiling: Int): Int = {
    val factors = (2 to ceiling).map(primeFactors(_).mapValues(MaxVal)).reduce(_ |+| _)
    (1 /: factors.map(m => intPow(m._1, m._2)))(_ * _)
  }

  private def primeFactors(number: Int): Map[Int, Int] = {
    def iter(factors: Map[Int, Int], rem: Int, i: Int): Map[Int, Int] = {
      if (i > number) factors.filter(_._2 > 0).mapValues(MaxVal)
      else if (rem % i == 0) iter(factors - i + (i -> (factors(i) + 1)), rem / i, i)
      else iter(factors, rem, i + 1)
    }
    iter((2 to number).map((_, 0)).toMap, number, 2)
  }

  private def intPow(x: Int, y: Int): Int = {
    def iter(acc: Int, rem: Int): Int = {
      if (rem == 0) acc
      else iter(acc * x, rem - 1)
    }
    if (y == 0) 1 else iter(1, y)
  }
}
This solution does not work for general Maps, but if you are using immutable.HashMaps you may consider the merged method:
def merged[B1 >: B](that: HashMap[A, B1])(mergef: ((A, B1), (A, B1)) ⇒ (A, B1)): HashMap[A, B1]
Creates a new map which is the merge of this and the argument hash
map.
Uses the specified collision resolution function if two keys are the
same. The collision resolution function will always take the first
argument from this hash map and the second from that.
The merged method is on average more performant than doing a traversal
and reconstructing a new immutable hash map from scratch, or ++.
Use case:
val m1 = immutable.HashMap[Int, Int](1 -> 2, 2 -> 3)
val m2 = immutable.HashMap[Int, Int](1 -> 3, 4 -> 5)
m1.merged(m2) {
  case ((k1, v1), (k2, v2)) => (k1, math.max(v1, v2))
}
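For these inputs this produces a map equivalent to Map(1 -> 3, 2 -> 3, 4 -> 5) (hash-map iteration order may vary).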
As your tags suggest, you might be interested in a scalaz solution. Here goes:
> console
[info] Starting scala interpreter...
[info]
Welcome to Scala version 2.10.0 (OpenJDK 64-Bit Server VM, Java 1.7.0_15).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scalaz._, Scalaz._, Tags._
import scalaz._
import Scalaz._
import Tags._
There exists a Semigroup instance for Ints under a maximum operation:
scala> Semigroup[Int @@ MaxVal]
res0: scalaz.Semigroup[scalaz.@@[Int,scalaz.Tags.MaxVal]] = scalaz.Semigroup$$anon$9@15a9a9c6
Let's just use it:
scala> val m1 = Map(1 -> 2, 2 -> 3) mapValues MaxVal
m1: scala.collection.immutable.Map[Int,scalaz.@@[Int,scalaz.Tags.MaxVal]] = Map(1 -> 2, 2 -> 3)
scala> val m2 = Map(1 -> 3, 4 -> 5) mapValues MaxVal
m2: scala.collection.immutable.Map[Int,scalaz.@@[Int,scalaz.Tags.MaxVal]] = Map(1 -> 3, 4 -> 5)
scala> m1 |+| m2
res1: scala.collection.immutable.Map[Int,scalaz.@@[Int,scalaz.Tags.MaxVal]] = Map(1 -> 3, 4 -> 5, 2 -> 3)
If you're interested in how this "tagging" (the @@ thing) works, here's a good explanation: http://etorreborre.blogspot.de/2011/11/practical-uses-for-unboxed-tagged-types.html
Starting with Scala 2.13, another solution, based only on the standard library, consists in merging the Maps as sequences before applying groupMapReduce, which (as its name suggests) is an equivalent of a groupBy followed by a mapping and a reduce step over the grouped values:
// val map1 = Map(1 -> 2, 2 -> 3)
// val map2 = Map(1 -> 3, 4 -> 5)
(map1.toSeq ++ map2).groupMapReduce(_._1)(_._2)(_ max _)
// Map[Int,Int] = Map(2 -> 3, 4 -> 5, 1 -> 3)
This:
concatenates the two maps as a sequence of tuples (List((1,2), (2,3), (1,3), (4,5))). For conciseness, map2 is implicitly converted to Seq to adopt the type of map1.toSeq - but you could choose to make it explicit by using map2.toSeq.
groups elements based on their first tuple part (group part of groupMapReduce)
maps grouped values to their second tuple part (map part of groupMapReduce)
reduces mapped values (_ max _) by taking their max (reduce part of groupMapReduce)
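For comparison, before 2.13 the same merge can be spelled out with a groupBy (a sketch of what groupMapReduce abbreviates):
(map1.toSeq ++ map2.toSeq)
  .groupBy(_._1)                          // group part
  .mapValues(_.map(_._2).reduce(_ max _)) // map + reduce parts
// Map[Int,Int] = Map(2 -> 3, 4 -> 5, 1 -> 3)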