Scala: optional sequence of arguments - scala

Here's some data:
val data = List(1,1,2,2,3)
I would like to write a function filteredSum which supports the following:
/*1*/ filteredSum(data) // Outputs 0
/*2*/ filteredSum(data, 1) // Outputs 1 + 1 = 2
/*3*/ filteredSum(data, 1, 3) // Outputs 1 + 1 + 3 = 5
/*4*/ filteredSum(data, None) // Outputs 1 + 1 + 2 + 2 + 3 = 9
There are a couple close misses; for instance * notation supports the first three calls:
def filteredSum(data: Seq[Int], filterValues: Int*): Int = {
data.intersect(filterValues).sum
}
And options give you the fourth:
def filteredSum(data: Seq[Int], filterValues: Option[Seq[Int]]) : Int = {
if(filterValues.nonempty) data.intersect(filterValues.get).sum
else data.sum
}
But with this implementation the first three calls look a lot clunkier: filteredSum(data, Some(Seq(1))), for instance.
Any other ideas? (Obviously my actual use case is much more complicated than just adding some numbers together, so I'm not looking for answers that are too closely tied to the intersect or sum functions.)

Make two functions:
def filteredSum(data: Seq[Int], filterValues: Int*): Int =
data.filter(filterValues.toSet).sum
def filteredSum(data: Seq[Int], all: Boolean) : Int =
if(all) data.sum else 0

Related

How to avoid boxing when composing numbers in base 3 in Scala

I want to compose numbers in base 3, represented as fixed-length Array[Byte].
Here are some attempts :
val byteBoard = Array.fill(9)(1.toByte)
val cache: Seq[(Int, Int)] = (0 to 8).map(i => (i, math.pow(3d, i.toDouble).toInt))
#Benchmark
def composePow(): Unit = {
val _ = (0 to 8).foldLeft(0) { case (acc, i) => acc + math.pow(3d, i.toDouble).toInt * byteBoard(i) }
}
#Benchmark
def composeCachedPowWithFold(): Unit = {
val _ = cache.foldLeft(0) { case (acc, (i, k)) => acc + k * byteBoard(i).toInt }
}
#Benchmark
def composeCachedPowWithForeach(): Unit = {
var acc = 0
cache.foreach { case (i, k) => acc = acc + k * byteBoard(i)}
}
#Benchmark
def composeUnrolled(): Unit = {
val _ = byteBoard(0) +
3 * byteBoard(1) +
3 * 3 * byteBoard(2) +
3 * 3 * 3 * byteBoard(3) +
3 * 3 * 3 * 3 * byteBoard(4) +
3 * 3 * 3 * 3 * 3 * byteBoard(5) +
3 * 3 * 3 * 3 * 3 * 3 * byteBoard(6) +
3 * 3 * 3 * 3 * 3 * 3 * 3 * byteBoard(7) +
3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 * byteBoard(8)
}
Can you confirm the following conclusion :
composePow: boxing + type conversions to use math.pow
composeCachedPowWithFold: boxing because fold is a parameterized method
composeCachedPowWithForeach: no boxing, less idiomatic Scala (local mutation)
composeUnrolled: no boxing
And explain why 4. is way faster than 3. ?
PS: Here are the result of the JMH Benchmark
[info] IndexBenchmark.composeCachedPowWithFold thrpt 10 7180844,823 ± 1015310,847 ops/s
[info] IndexBenchmark.composeCachedPowWithForeach thrpt 10 14234192,613 ± 1449571,042 ops/s
[info] IndexBenchmark.composePow thrpt 10 1515312,179 ± 34892,700 ops/s
[info] IndexBenchmark.composeUnrolled thrpt 10 152297653,110 ± 2237446,053 ops/s
I mostly agree with your analysis of the cases 1,2,4, but the third variant is really funny!
I agree with you about the first two versions: foldLeft is not #specialized, so, yes, there is some boxing-unboxing. But math.pow is evil for integer arithmetic anyway, and all those conversions only incur additional overhead.
Now let's take a closer look at the third variant. It is so slow because you are constructing a closure over mutable state. Look at output of scala -print. This is what your method is rewritten into:
private def composeCachedPowWithForeach(): Unit = {
var acc: runtime.IntRef = scala.runtime.IntRef.create(0);
anon$1.this.cache().foreach({
((x0$3: Tuple2) =>
anon$1.this.
$anonfun$composeCachedPowWithForeach$1(acc, x0$3))
})
};
And here is the function used in foreach:
final <artifact> private[this] def
$anonfun$composeCachedPowWithForeach$1(
acc$1: runtime.IntRef, x0$3: Tuple2
): Unit = {
case <synthetic> val x1: Tuple2 = x0$3;
case4(){
if (x1.ne(null))
{
val i: Int = x1._1$mcI$sp();
val k: Int = x1._2$mcI$sp();
matchEnd3({
acc$1.elem = acc$1.elem.+(k.*(anon$1.this.byteBoard().apply(i)));
scala.runtime.BoxedUnit.UNIT
})
}
else
case5()
};
case5(){
matchEnd3(throw new MatchError(x1))
};
matchEnd3(x: scala.runtime.BoxedUnit){
()
}
};
You see that there is apparently lot of code generated by pattern matching. I'm not sure whether it contributes much to the overhead. What I personally find much more interesting is the runtime.IntRef part. This is an object that keeps a mutable variable that corresponds to var acc in your code. Even though it looks like a simple local variable in the code, it must be referenced from the closure somehow, and is therefore wrapped into an object, and evicted to the heap. I assume that accessing this mutable variable on the heap causes most of the overhead.
In contrast to that, if the byteBoard were passed as an argument, then nothing in the fourth variant would ever leave function's stack frame:
private def composeUnrolled(): Unit = {
val _: Int =
anon$1.this.byteBoard().apply(0).+
(3.*(anon$1.this.byteBoard().apply(1))).+
(9.*(anon$1.this.byteBoard().apply(2))).+
(27.*(anon$1.this.byteBoard().apply(3))).+
(81.*(anon$1.this.byteBoard().apply(4))).+
(243.*(anon$1.this.byteBoard().apply(5))).+
(729.*(anon$1.this.byteBoard().apply(6))).+
(2187.*(anon$1.this.byteBoard().apply(7))).+
(6561.*(anon$1.this.byteBoard().apply(8)));
()
};
There is essentially no control flow to speak of, not really any method invocations (the apply is for accessing array elements, that doesn't count), and overall it's just one very simple arithmetic operation, that might even fit into registers of your processor. That's why it is so fast.
While you are at it, you might want to benchmark these two methods:
def ternaryToInt5(bytes: Array[Byte]): Int = {
var acc = 0
val n = bytes.size
var i = n - 1
while (i >= 0) {
acc *= 3
acc += bytes(i)
i -= 1
}
acc
}
def ternaryToInt6(bytes: Array[Byte]): Int = {
bytes(0) +
3 * (bytes(1) +
3 * (bytes(2) +
3 * (bytes(3) +
3 * (bytes(4) +
3 * (bytes(5) +
3 * (bytes(6) +
3 * (bytes(7) +
3 * (bytes(8)))))))))
}
Also, if you are working with byte arrays frequently, you might find this syntactic sugar useful.

Scala - What type are the numbers in the List using x.toString.toList?

I have written a function in Scala that should calculate the sum of the squares of the digits of a number. Eg: 44 -> 32 (4^2 + 4^2 = 16 + 16 = 32)
Here it is:
def digitSum(x:BigInt) : BigInt = {
var sum = 0
val leng = x.toString.toList.length
var y = x.toString.toList
for (i<-0 until leng ) {
sum += y(i).toInt * y(i).toInt
}
return sum
}
However when I call the function let's say with digitSum(44) instead of 32 I get 5408.
Why is this happening? Does it have to do with the fact that in the list there are Strings? If so why does the .toInt method do not work?
Thanks!
The answer to your questions has been already covered here Scala int value of String characters, have a good read through and you will have more information than required ;)
Also looking at your code, it can benefit more from Scala expressiveness and functional features. The same function can be written in the following manner:
def digitSum(x: BigInt) = x.toString
.map(_.asDigit)
.map(a => a * a)
.sum
In the future try to avoid using mutable variables and standard looping techniques if you could.
When you do toString you're mapping the String to Chars not Ints and then to Ints later. This is what it looks like in the repl:
scala> "1".toList.map(_.toInt)
res0: List[Int] = List(49)
What you want is probably something like this:
def digitSum(x:BigInt) : BigInt = {
var sum = 0
val leng = x.toString.toList.length
var y = x.toString.toList
for (i<-0 until leng ) {
sum += (y(i).toInt - 48) * (y(i).toInt - 48) //Subtract out char base
}
sum
}

'let...in' expression in Scala

In OCaml, the let...in expression allows you to created a named local variable in an expression rather than a statement. (Yes I know that everything is technically an expression, but Unit return values are fairly useless.) Here's a quick example in OCaml:
let square_the_sum a b = (* function definition *)
let sum = a + b in (* declare a named local called sum *)
sum * sum (* return the value of this expression *)
Here's what I would want the equivalent Scala to look like:
def squareTheSum(a: Int, b: Int): Int =
let sum: Int = a + b in
sum * sum
Is there anything in Scala that I can use to achieve this?
EDIT:
You learn something new every day, and this has been answered before.
object ForwardPipeContainer {
implicit class ForwardPipe[A](val value: A) extends AnyVal {
def |>[B](f: A => B): B = f(value)
}
}
import ForwardPipeContainer._
def squareTheSum(a: Int, b: Int): Int = { a + b } |> { sum => sum * sum }
But I'd say that is not nearly as easy to read, and is not as flexible (it gets awkward with nested lets).
You can nest val and def in a def. There's no special syntax; you don't need a let.
def squareTheSum(a: Int, b: Int): Int = {
val sum = a + b
sum * sum
}
I don't see the readability being any different here at all. But if you want to only create the variable within the expression, you can still do that with curly braces like this:
val a = 2 //> a : Int = 2
val b = 3 //> b : Int = 3
val squareSum = { val sum = a + b; sum * sum } //> squareSum : Int = 25
There is no significant difference here between a semicolon and the word "in" (or you could move the expression to the next line, and pretend that "in" is implied if it makes it more OCaml-like :D).
val squareSum = {
val sum = a + b // in
sum * sum
}
Another, more technical, take on this: Clojure's 'let' equivalent in Scala. I think the resulting structures are pretty obtuse compared to the multi-statement form.

Updating a 2d table of counts

Suppose I want a Scala data structure that implements a 2-dimensional table of counts that can change over time (i.e., individual cells in the table can be incremented or decremented). What should I be using to do this?
I could use a 2-dimensional array:
val x = Array.fill[Int](1, 2) = 0
x(1)(2) += 1
But Arrays are mutable, and I guess I should slightly prefer immutable data structures.
So I thought about using a 2-dimensional Vector:
val x = Vector.fill[Int](1, 2) = 0
// how do I update this? I want to write something like val newX : Vector[Vector[Int]] = x.add((1, 2), 1)
// but I'm not sure how
But I'm not sure how to get a new vector with only a single element changed.
What's the best approach?
Best depends on what your criteria are. The simplest immutable variant is to use a map from (Int,Int) to your count:
var c = (for (i <- 0 to 99; j <- 0 to 99) yield (i,j) -> 0).toMap
Then you access your values with c(i,j) and set them with c += ((i,j) -> n); c += ((i,j) -> (c(i,j)+1)) is a little bit annoying, but it's not too bad.
Faster is to use nested Vectors--by about a factor of 2 to 3, depending on whether you tend to re-set the same element over and over or not--but it has an ugly update method:
var v = Vector.fill(100,100)(0)
v(82)(49) // Easy enough
v = v.updated(82, v(82).updated(49, v(82)(49)+1) // Ouch!
Faster yet (by about 2x) is to have only one vector which you index into:
var u = Vector.fill(100*100)(0)
u(82*100 + 49) // Um, you think I can always remember to do this right?
u = u.updated(82*100 + 49, u(82*100 + 49)+1) // Well, that's actually better
If you don't need immutability and your table size isn't going to change, just use an array as you've shown. It's ~200x faster than the fastest vector solution if all you're doing is incrementing and decrementing an integer.
If you want to do this in a very general and functional (but not necessarily performant) way, you can use lenses. Here's an example of how you could use Scalaz 7's implementation, for example:
import scalaz._
def at[A](i: Int): Lens[Seq[A], A] = Lens.lensg(a => a.updated(i, _), (_(i)))
def at2d[A](i: Int, j: Int) = at[Seq[A]](i) andThen at(j)
And a little bit of setup:
val table = Vector.tabulate(3, 4)(_ + _)
def show[A](t: Seq[Seq[A]]) = t.map(_ mkString " ") mkString "\n"
Which gives us:
scala> show(table)
res0: String =
0 1 2 3
1 2 3 4
2 3 4 5
We can use our lens like this:
scala> show(at2d(1, 2).set(table, 9))
res1: String =
0 1 2 3
1 2 9 4
2 3 4 5
Or we can just get the value at a given cell:
scala> val v: Int = at2d(2, 3).get(table)
v: Int = 5
Or do a lot of more complex things, like apply a function to a particular cell:
scala> show(at2d(2, 2).mod(((_: Int) * 2), table))
res8: String =
0 1 2 3
1 2 3 4
2 3 8 5
And so on.
There isn't a built-in method for this, perhaps because it would require the Vector to know that it contains Vectors, or Vectors or Vectors etc, whereas most methods are generic, and it would require a separate method for each number of dimensions, because you need to specify a co-ordinate arg for each dimension.
However, you can add these yourself; the following will take you up to 4D, although you could just add the bits for 2D if that's all you need:
object UpdatableVector {
implicit def vectorToUpdatableVector2[T](v: Vector[Vector[T]]) = new UpdatableVector2(v)
implicit def vectorToUpdatableVector3[T](v: Vector[Vector[Vector[T]]]) = new UpdatableVector3(v)
implicit def vectorToUpdatableVector4[T](v: Vector[Vector[Vector[Vector[T]]]]) = new UpdatableVector4(v)
class UpdatableVector2[T](v: Vector[Vector[T]]) {
def updated2(c1: Int, c2: Int)(newVal: T) =
v.updated(c1, v(c1).updated(c2, newVal))
}
class UpdatableVector3[T](v: Vector[Vector[Vector[T]]]) {
def updated3(c1: Int, c2: Int, c3: Int)(newVal: T) =
v.updated(c1, v(c1).updated2(c2, c3)(newVal))
}
class UpdatableVector4[T](v: Vector[Vector[Vector[Vector[T]]]]) {
def updated4(c1: Int, c2: Int, c3: Int, c4: Int)(newVal: T) =
v.updated(c1, v(c1).updated3(c2, c3, c4)(newVal))
}
}
In Scala 2.10 you don't need the implicit defs and can just add the implicit keyword to the class definitions.
Test:
import UpdatableVector._
val v2 = Vector.fill(2,2)(0)
val r2 = v2.updated2(1,1)(42)
println(r2) // Vector(Vector(0, 0), Vector(0, 42))
val v3 = Vector.fill(2,2,2)(0)
val r3 = v3.updated3(1,1,1)(42)
println(r3) // etc
Hope that's useful.

Integer partitioning in Scala

Given n ( say 3 people ) and s ( say 100$ ), we'd like to partition s among n people.
So we need all possible n-tuples that sum to s
My Scala code below:
def weights(n:Int,s:Int):List[List[Int]] = {
List.concat( (0 to s).toList.map(List.fill(n)(_)).flatten, (0 to s).toList).
combinations(n).filter(_.sum==s).map(_.permutations.toList).toList.flatten
}
println(weights(3,100))
This works for small values of n. ( n=1, 2, 3 or 4).
Beyond n=4, it takes a very long time, practically unusable.
I'm looking for ways to rework my code using lazy evaluation/ Stream.
My requirements : Must work for n upto 10.
Warning : The problem gets really big really fast. My results from Matlab -
---For s =100, n = 1 thru 5 results are ---
n=1 :1 combinations
n=2 :101 combinations
n=3 :5151 combinations
n=4 :176851 combinations
n=5: 4598126 combinations
---
You need dynamic programming, or memoization. Same concept, anyway.
Let's say you have to divide s among n. Recursively, that's defined like this:
def permutations(s: Int, n: Int): List[List[Int]] = n match {
case 0 => Nil
case 1 => List(List(s))
case _ => (0 to s).toList flatMap (x => permutations(s - x, n - 1) map (x :: _))
}
Now, this will STILL be slow as hell, but there's a catch here... you don't need to recompute permutations(s, n) for numbers you have already computed. So you can do this instead:
val memoP = collection.mutable.Map.empty[(Int, Int), List[List[Int]]]
def permutations(s: Int, n: Int): List[List[Int]] = {
def permutationsWithHead(x: Int) = permutations(s - x, n - 1) map (x :: _)
n match {
case 0 => Nil
case 1 => List(List(s))
case _ =>
memoP getOrElseUpdate ((s, n),
(0 to s).toList flatMap permutationsWithHead)
}
}
And this can be even further improved, because it will compute every permutation. You only need to compute every combination, and then permute that without recomputing.
To compute every combination, we can change the code like this:
val memoC = collection.mutable.Map.empty[(Int, Int, Int), List[List[Int]]]
def combinations(s: Int, n: Int, min: Int = 0): List[List[Int]] = {
def combinationsWithHead(x: Int) = combinations(s - x, n - 1, x) map (x :: _)
n match {
case 0 => Nil
case 1 => List(List(s))
case _ =>
memoC getOrElseUpdate ((s, n, min),
(min to s / 2).toList flatMap combinationsWithHead)
}
}
Running combinations(100, 10) is still slow, given the sheer numbers of combinations alone. The permutations for each combination can be obtained simply calling .permutation on the combination.
Here's a quick and dirty Stream solution:
def weights(n: Int, s: Int) = (1 until s).foldLeft(Stream(Nil: List[Int])) {
(a, _) => a.flatMap(c => Stream.range(0, n - c.sum + 1).map(_ :: c))
}.map(c => (n - c.sum) :: c)
It works for n = 6 in about 15 seconds on my machine:
scala> var x = 0
scala> weights(100, 6).foreach(_ => x += 1)
scala> x
res81: Int = 96560646
As a side note: by the time you get to n = 10, there are 4,263,421,511,271 of these things. That's going to take days just to stream through.
My solution of this problem, it can computer n till 6:
object Partition {
implicit def i2p(n: Int): Partition = new Partition(n)
def main(args : Array[String]) : Unit = {
for(n <- 1 to 6) println(100.partitions(n).size)
}
}
class Partition(n: Int){
def partitions(m: Int):Iterator[List[Int]] = new Iterator[List[Int]] {
val nums = Array.ofDim[Int](m)
nums(0) = n
var hasNext = m > 0 && n > 0
override def next: List[Int] = {
if(hasNext){
val result = nums.toList
var idx = 0
while(idx < m-1 && nums(idx) == 0) idx = idx + 1
if(idx == m-1) hasNext = false
else {
nums(idx+1) = nums(idx+1) + 1
nums(0) = nums(idx) - 1
if(idx != 0) nums(idx) = 0
}
result
}
else Iterator.empty.next
}
}
}
1
101
5151
176851
4598126
96560646
However , we can just show the number of the possible n-tuples:
val pt: (Int,Int) => BigInt = {
val buf = collection.mutable.Map[(Int,Int),BigInt]()
(s,n) => buf.getOrElseUpdate((s,n),
if(n == 0 && s > 0) BigInt(0)
else if(s == 0) BigInt(1)
else (0 to s).map{k => pt(s-k,n-1)}.sum
)
}
for(n <- 1 to 20) printf("%2d :%s%n",n,pt(100,n).toString)
1 :1
2 :101
3 :5151
4 :176851
5 :4598126
6 :96560646
7 :1705904746
8 :26075972546
9 :352025629371
10 :4263421511271
11 :46897636623981
12 :473239787751081
13 :4416904685676756
14 :38393094575497956
15 :312629484400483356
16 :2396826047070372396
17 :17376988841260199871
18 :119594570260437846171
19 :784008849485092547121
20 :4910371215196105953021