Is it advisable to use for comprehensions with yield to return a huge number of items? - scala

I am working on a little program that generates combinations and I am using for comprehensions. Something like this:
def posibilities2(n: Int): Seq[List[Int]] = {
  val maxValues = (1 to 3).map(i => n / i).toList
  for {
    n1 <- 0 to maxValues(0)
    n2 <- 0 to maxValues(1)
    n3 <- 0 to maxValues(2)
    if n1 * 1 + n2 * 2 + n3 * 3 == n
  } yield List(n1, n2, n3)
}
posibilities2(1000).foreach(doSomething)
For larger values of n this can produce a huge number of items.
My question is this: Is this the way to do it, given that, for each item generated, I have to do some additional processing? I am not concerned about the program taking a long time to run, I am concerned about running out of memory.
Thank you

Since the for comprehension produces a very large number of values, it's better to go with an iterator or stream implementation.
The function below generates values on demand, and thus does not risk running out of memory.
def posibilities2(n: Int): Iterator[(Int, Int, Int)] = {
  val maxValues = (1 to 3).map(i => n / i).toList
  for {
    n1 <- (0 to maxValues(0)).toIterator
    n2 <- (0 to maxValues(1)).toIterator
    n3 <- (0 to maxValues(2)).toIterator
    if n1 * 1 + n2 * 2 + n3 * 3 == n
  } yield (n1, n2, n3)
}
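Since an Iterator is consumed lazily, each tuple is produced only when the caller asks for it, so a full traversal runs in constant memory. For instance (doSomething stands in for your processing, as in the question):

posibilities2(1000).foreach(doSomething)      // never materializes the whole result
posibilities2(1000).take(5).foreach(println)  // computes only the first five matches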

Related

Recursive sum function, how do I limit the sum?

The goal is to code this sum into a recursive function.
[Sum formula shown as an image in the original post]
I have tried so far to code it like this.
def under(u: Int): Int = {
  var i1 = u/2
  var i = i1+1
  if ( u/2 == 1 ) then u + 1 - 2 * 1
  else (u + 1 - 2 * i) + under(u-1)
}
It seems like I am running into an issue with the recursive part, but I am not able to figure out what goes wrong.
In theory, under(5) should produce 10.
Your logic is wrong. It should iterate (whether through a loop, recursion, or a collection is irrelevant) from i = 1 to i = n/2, using n and the current i as they are:
(1 to (n/2)).map(i => n + 1 - 2 * i).sum
You are (more or less) running computations from i = 1 to i = n (or rather from n down to 1), but instead of n you use the current i, and instead of i you use i/2 + 1 (a sum from i = 1 to i = n of (i + 1 - 2 * (i/2 + 1))):
// actually what you do is more like (1 to n).toList.reverse
// rather than (1 to n)
(1 to n).map(i => i + 1 - 2 * (i/2 + 1)).sum
It's a different formula: it sums twice as many elements, one part of each term varies where it should be constant, and another part has the wrong value.
To implement the same logic with recursion you would have to do something like:
// as one function with default args
// tail recursive version
def under(n: Int, i: Int = 1, sum: Int = 0): Int =
  if (i > n/2) sum
  else under(n, i + 1, sum + (n + 1 - 2 * i))

// not tail recursive
def under(n: Int, i: Int = 1): Int =
  if (i > n/2) 0
  else (n + 1 - 2 * i) + under(n, i + 1)

// with nested functions without default args
def under(n: Int): Int = {
  // tail recursive
  def helper(i: Int, sum: Int): Int =
    if (i > n/2) sum
    else helper(i + 1, sum + (n + 1 - 2 * i))
  helper(1, 0)
}

def under(n: Int): Int = {
  // not tail recursive
  def helper(i: Int): Int =
    if (i > n/2) 0
    else (n + 1 - 2 * i) + helper(i + 1)
  helper(1)
}
As a side note: there is no need to use any iteration / recursion at all. Here is an explicit formula:
def g(n: Int) = n / 2 * (n - n / 2)
that gives the same results as
def h(n: Int) = (1 to n / 2).map(i => n + 1 - 2 * i).sum
Both assume that you want floored n / 2 in the case that n is odd, i.e. both of the functions above behave the same as
def j(n: Int) = (math.ceil(n / 2.0) * math.floor(n / 2.0)).toInt
(at least until rounding errors kick in).
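A quick sanity check that all three versions agree over a range of inputs:

assert((1 to 1000).forall(n => g(n) == h(n) && g(n) == j(n)))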

Scala parallel collections

I am very naively trying to use Scala .par, and the result turns out to be slower than the non-parallel version, by quite a bit. What is the explanation for that?
Note: the question is not to make this faster, but to understand why this naive use of .par doesn't yield an immediate speed-up.
Note 2: timing method: I ran both methods with N = 10000. The first one returned in about 20s. The second one I killed after 3 minutes. Not even close. If I let it run longer I get into a Java heap space exception.
// rng is assumed to be defined elsewhere, e.g. val rng = new scala.util.Random
def pi_random(N: Long): Double = {
  val count = (0L until N * N)
    .map { _ =>
      val (x, y) = (rng.nextDouble(), rng.nextDouble())
      if (x*x + y*y <= 1) 1 else 0
    }
    .sum
  4 * count.toDouble / (N * N)
}

def pi_random_parallel(N: Long): Double = {
  val count = (0L until N * N)
    .par
    .map { _ =>
      val (x, y) = (rng.nextDouble(), rng.nextDouble())
      if (x*x + y*y <= 1) 1 else 0
    }
    .sum
  4 * count.toDouble / (N * N)
}
Hard to know for sure without doing some actual profiling, but I have two theories:
First, you may be losing some benefits of the Range class, specifically near-zero memory usage. When you do (0L until N * N), you create a Range object, which is lazy. It does not actually create any object holding every single number in the range. Neither does map, I think. And sum calculates and adds numbers one at a time, so also allocates barely any memory.
I'm not sure the same is all true about ParRange. Seems like it would have to allocate some amount per split, and after map is called, perhaps it might have to store some amount of intermediate results in memory as "neighboring" splits wait for the other to complete. Especially the heap space exception makes me think something like this is the case. So you'll lose a lot of time to GC and such.
Second, probably the calls to rng.nextDouble are by far the most expensive part of that inner function. But I believe both java and scala Random classes are essentially single-threaded. They synchronize and block internally. So you won't gain that much from parallelism anyway, and in fact lose some to overhead.
There is not enough work per task; the task granularity is too fine.
Creating each task requires some overhead:
Some object representing the task must be created
It must be ensured that only one thread executes one task at a time
In the case that some threads become idle, a work-stealing procedure must be invoked.
For N = 10000, you instantiate 100,000,000 tiny tasks. Each of those tasks does almost nothing: it generates two random numbers and performs some basic arithmetic and an if-branch. The overhead of creating a task is not comparable to the work that each task is doing.
The tasks must be much larger, so that each thread has enough work to do. Furthermore, it's probably faster if you make each RNG thread-local, so that the threads can do their job in parallel, without constantly locking the shared default random number generator.
Here is an example:
import java.util.concurrent.ThreadLocalRandom
import scala.util.Random

def pi_random(N: Long): Double = {
  val rng = new Random
  val count = (0L until N * N)
    .map { _ =>
      val (x, y) = (rng.nextDouble(), rng.nextDouble())
      if (x*x + y*y <= 1) 1 else 0
    }
    .sum
  4 * count.toDouble / (N * N)
}

def pi_random_parallel(N: Long): Double = {
  val rng = new Random
  val count = (0L until N * N)
    .par
    .map { _ =>
      val (x, y) = (rng.nextDouble(), rng.nextDouble())
      if (x*x + y*y <= 1) 1 else 0
    }
    .sum
  4 * count.toDouble / (N * N)
}

// one task per outer index; each task runs n samples in a tight local loop
def pi_random_properly(n: Long): Double = {
  val count = (0L until n).par.map { _ =>
    val rng = ThreadLocalRandom.current
    var sum = 0
    var idx = 0
    while (idx < n) {
      val (x, y) = (rng.nextDouble(), rng.nextDouble())
      if (x*x + y*y <= 1.0) sum += 1
      idx += 1
    }
    sum
  }.sum
  4 * count.toDouble / (n * n)
}
Here is a little demo and timings:
def measureTime[U](repeats: Long)(block: => U): Double = {
  val start = System.currentTimeMillis
  var iteration = 0
  while (iteration < repeats) {
    iteration += 1
    block
  }
  val end = System.currentTimeMillis
  (end - start).toDouble / repeats
}
// basic sanity check that all algos return roughly same result
println(pi_random(2000))
println(pi_random_parallel(2000))
println(pi_random_properly(2000))
// time comparison (N = 2k, 10 repetitions for each algorithm)
val N = 2000
val Reps = 10
println("Sequential: " + measureTime(Reps)(pi_random(N)))
println("Naive: " + measureTime(Reps)(pi_random_parallel(N)))
println("My proposal: " + measureTime(Reps)(pi_random_properly(N)))
Output:
3.141333
3.143418
3.14142
Sequential: 621.7
Naive: 3032.6
My proposal: 44.7
Now the parallel version is roughly an order of magnitude faster than the sequential version (result will obviously depend on the number of cores etc.).
I couldn't test it with N = 10000, because the naively parallelized version crashed everything with a "GC overhead limit exceeded" error, which also illustrates that the overhead for creating the tiny tasks is too large.
In my implementation, I've additionally replaced the inner mapping over a range with a while loop: you need only one counter in one register, and there is no need to create a huge collection by mapping over the range.
Edit: Replaced everything with ThreadLocalRandom; it now shouldn't matter whether your compiler version supports SAM or not, so it should work with earlier versions of 2.11 too.

What does yield expand to in a multi-dimensional loop in Scala?

From Here we learn that an expression like:
for( i <- 1 to 10 ) yield i + 1
will expand into
( 1 to 10 ).map( _+1 )
But what does the following expression expand to?
for( i <- 1 to 50; j <- i to 50 ) yield List(1,i,j)
Is this correct?
( 1 to 50 ).map( x => ( 1 to 50 ).map(List(1, x, _)) )
I'm interested in this problem because I'd like to make a function which performs multiple Xi <- Xi-1 to 50 operations, as shown below:
for( X1 <- 1 to 50; X2 <- X1 to 50; X3 <- X2 to 50; .....; Xn <- Xn-1 to 50 )
yield List(1,X1,X2,X3,.....,Xn)
The function has one parameter, dimension, which denotes the n in the above expression.
Its return type is IndexedSeq[List[Int]].
How can I achieve that?
Thank you for answering (:
It's well explained in a relevant doc. In particular:
for(x <- c1; y <- c2; z <- c3) yield {...}
will be translated into
c1.flatMap(x => c2.flatMap(y => c3.map(z => {...})))
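Applied to the question's two-generator comprehension, that translation gives:

(1 to 50).flatMap(i => (i to 50).map(j => List(1, i, j)))

i.e. every generator except the last becomes a flatMap, and the last one becomes a map.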
I don't think there is a way to abstract over arbitrary nested comprehension (unless you're using voodoo magic, like macros)
See om-nom-nom's answer for an explanation of what the for loops expand to. I'd like to answer the second part of the opening question, how to implement a function that can do:
for( X1 <- 1 to 50; X2 <- X1 to 50; X3 <- X2 to 50; .....; Xn <- Xn-1 to 50 )
yield List(1,X1,X2,X3,.....,Xn)
You can use:
def upto50(dimension: Int) = {
  def loop(n: Int, start: Int): IndexedSeq[List[Int]] = {
    if (n > dimension)
      IndexedSeq(List())
    else {
      (start to 50).flatMap(x => loop(n + 1, x).map(x :: _))
    }
  }
  loop(1, 1)
}
We compute each of the loops recursively, working inside-out, starting with Xn to 50 and building up the solution.
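For example:

upto50(2).take(3)  // Vector(List(1, 1), List(1, 2), List(1, 3))

Note that, as written, it yields List(X1, ..., Xn) without the leading 1 from the question; prepend it with .map(1 :: _) if needed.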
Solutions for the more general case of:
for( X1 <- S1; X2 <- S2; X3 <- S3; .....; Xn <- Sn )
yield List(1,X1,X2,X3,.....,Xn)
where S1..Sn are arbitrary sequences or monads are also possible. See this gist for the necessary wall of code.
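That gist isn't reproduced here, but for the simpler special case where the generators are independent sequences (no Xi depending on the previous one), a minimal sketch could look like this (the cross helper is hypothetical, not the gist's code):

def cross[A](sets: List[Seq[A]]): Seq[List[A]] =
  sets.foldRight(Seq(List.empty[A])) { (s, acc) =>
    s.flatMap(x => acc.map(x :: _))
  }

cross(List(1 to 2, 3 to 4))  // List(1, 3), List(1, 4), List(2, 3), List(2, 4)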

Scala - can a 'for-yield' clause yield nothing for some condition?

In Scala language, I want to write a function that yields odd numbers within a given range. The function prints some log when iterating even numbers. The first version of the function is:
import scala.collection.mutable

def getOdds(N: Int): Traversable[Int] = {
  val list = new mutable.MutableList[Int]
  for (n <- 0 until N) {
    if (n % 2 == 1) {
      list += n
    } else {
      println("skip even number " + n)
    }
  }
  return list
}
If I omit printing logs, the implementation become very simple:
def getOddsWithoutPrint(N: Int) =
  for (n <- 0 until N if (n % 2 == 1)) yield n
However, I don't want to miss the logging part. How do I rewrite the first version more compactly? It would be great if it can be rewritten similar to this:
def IWantToDoSomethingSimilar(N: Int) =
  for (n <- 0 until N) if (n % 2 == 1) yield n else println("skip even number " + n)
def IWantToDoSomethingSimilar(N: Int) =
  for {
    n <- 0 until N
    if n % 2 != 0 || { println("skip even number " + n); false }
  } yield n
Using filter instead of a for expression would be slightly simpler though.
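For completeness, a sketch of that filter variant (the same side-effecting guard, just without the for expression):

def getOdds(N: Int): Seq[Int] =
  (0 until N).filter { n =>
    val odd = n % 2 == 1
    if (!odd) println("skip even number " + n)
    odd
  }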
If you want to keep the sequentiality of your processing (handling odds and evens in order, not separately), you can use something like this (edited):
def IWantToDoSomethingSimilar(N: Int) =
  (for (n <- 0 until N) yield {
    if (n % 2 == 1) {
      Option(n)
    } else {
      println("skip even number " + n)
      None
    }
  // flatten transforms the Seq[Option[Int]] into Seq[Int]
  }).flatten
EDIT, following the same concept, a shorter solution :
def IWantToDoSomethingSimilar(N: Int) =
  (0 until N) map {
    case n if n % 2 == 0 => println("skip even number " + n)
    case n => n
  } collect { case i: Int => i }
If you want to dig into a functional approach, something like the following is a good starting point.
First some common definitions:
// use scalaz 7
import scalaz._, Scalaz._

// transforms a function returning either E or B into a
// function returning an optional B and optionally writing a log of type E
def logged[A, E, B, F[_]](f: A => E \/ B)(
    implicit FM: Monoid[F[E]], FP: Pointed[F]): (A => Writer[F[E], Option[B]]) =
  (a: A) => f(a).fold(
    e => Writer(FP.point(e), None),
    b => Writer(FM.zero, Some(b)))

// helper for fixing the log storage format to List
def listLogged[A, E, B](f: A => E \/ B) = logged[A, E, B, List](f)

// shorthand for a String logger with List storage
type W[+A] = Writer[List[String], A]
Now all you have to do is write your filtering function:
def keepOdd(n: Int): String \/ Int =
  if (n % 2 == 1) \/.right(n) else \/.left(n + " was even")
You can try it instantly:
scala> List(5, 6) map(keepOdd)
res0: List[scalaz.\/[String,Int]] = List(\/-(5), -\/(6 was even))
Then you can use the traverse function to apply your function to a list of inputs, and collect both the logs written and the results:
scala> val x = List(5, 6).traverse[W, Option[Int]](listLogged(keepOdd))
x: W[List[Option[Int]]] = scalaz.WriterTFunctions$$anon$26@503d0400
// unwrap the results
scala> x.run
res11: (List[String], List[Option[Int]]) = (List(6 was even),List(Some(5), None))
// we may even drop the None-s from the output
scala> val (logs, results) = x.map(_.flatten).run
logs: List[String] = List(6 was even)
results: List[Int] = List(5)
I don't think this can be done easily with a for comprehension. But you could use partition.
def getOdds(N: Int) = {
  val (evens, odds) = 0 until N partition { x => x % 2 == 0 }
  evens foreach { x => println("skipping " + x) }
  odds
}
EDIT: To avoid printing the log messages after the partitioning is done, you can change the first line of the method like this:
val (evens, odds) = (0 until N).view.partition { x => x % 2 == 0 }

Integer partitioning in Scala

Given n (say 3 people) and s (say $100), we'd like to partition s among n people.
So we need all possible n-tuples that sum to s
My Scala code below:
def weights(n: Int, s: Int): List[List[Int]] = {
  List.concat((0 to s).toList.map(List.fill(n)(_)).flatten, (0 to s).toList)
    .combinations(n).filter(_.sum == s).map(_.permutations.toList).toList.flatten
}
println(weights(3,100))
This works for small values of n. ( n=1, 2, 3 or 4).
Beyond n=4, it takes a very long time, practically unusable.
I'm looking for ways to rework my code using lazy evaluation/ Stream.
My requirements : Must work for n upto 10.
Warning: the problem gets really big really fast. My results from Matlab -
--- For s = 100, n = 1 through 5, the results are ---
n=1 :1 combinations
n=2 :101 combinations
n=3 :5151 combinations
n=4 :176851 combinations
n=5: 4598126 combinations
---
You need dynamic programming, or memoization. Same concept, anyway.
Let's say you have to divide s among n. Recursively, that's defined like this:
def permutations(s: Int, n: Int): List[List[Int]] = n match {
  case 0 => Nil
  case 1 => List(List(s))
  case _ => (0 to s).toList flatMap (x => permutations(s - x, n - 1) map (x :: _))
}
Now, this will STILL be slow as hell, but there's a catch here... you don't need to recompute permutations(s, n) for numbers you have already computed. So you can do this instead:
val memoP = collection.mutable.Map.empty[(Int, Int), List[List[Int]]]
def permutations(s: Int, n: Int): List[List[Int]] = {
  def permutationsWithHead(x: Int) = permutations(s - x, n - 1) map (x :: _)
  n match {
    case 0 => Nil
    case 1 => List(List(s))
    case _ =>
      memoP getOrElseUpdate ((s, n),
        (0 to s).toList flatMap permutationsWithHead)
  }
}
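For example, this reproduces one of the counts from the question:

permutations(100, 3).size  // 5151, matching the n = 3 count above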
And this can be even further improved, because it will compute every permutation. You only need to compute every combination, and then permute that without recomputing.
To compute every combination, we can change the code like this:
val memoC = collection.mutable.Map.empty[(Int, Int, Int), List[List[Int]]]
def combinations(s: Int, n: Int, min: Int = 0): List[List[Int]] = {
  def combinationsWithHead(x: Int) = combinations(s - x, n - 1, x) map (x :: _)
  n match {
    case 0 => Nil
    case 1 => List(List(s))
    case _ =>
      memoC getOrElseUpdate ((s, n, min),
        (min to s / 2).toList flatMap combinationsWithHead)
  }
}
Running combinations(100, 10) is still slow, given the sheer number of combinations alone. The permutations for each combination can be obtained by simply calling .permutations on the combination.
Here's a quick and dirty Stream solution:
// note the argument order here: n is the total to partition and s is the
// tuple length, so weights(100, 6) below gives all 6-tuples summing to 100
def weights(n: Int, s: Int) = (1 until s).foldLeft(Stream(Nil: List[Int])) {
  (a, _) => a.flatMap(c => Stream.range(0, n - c.sum + 1).map(_ :: c))
}.map(c => (n - c.sum) :: c)
It works for n = 6 in about 15 seconds on my machine:
scala> var x = 0
scala> weights(100, 6).foreach(_ => x += 1)
scala> x
res81: Int = 96560646
As a side note: by the time you get to n = 10, there are 4,263,421,511,271 of these things. That's going to take days just to stream through.
My solution to this problem; it can compute up to n = 6:
object Partition {
  implicit def i2p(n: Int): Partition = new Partition(n)
  def main(args: Array[String]): Unit = {
    for (n <- 1 to 6) println(100.partitions(n).size)
  }
}

class Partition(n: Int) {
  def partitions(m: Int): Iterator[List[Int]] = new Iterator[List[Int]] {
    val nums = Array.ofDim[Int](m)
    nums(0) = n
    var hasNext = m > 0 && n > 0
    override def next: List[Int] = {
      if (hasNext) {
        val result = nums.toList
        var idx = 0
        while (idx < m - 1 && nums(idx) == 0) idx = idx + 1
        if (idx == m - 1) hasNext = false
        else {
          nums(idx + 1) = nums(idx + 1) + 1
          nums(0) = nums(idx) - 1
          if (idx != 0) nums(idx) = 0
        }
        result
      }
      else Iterator.empty.next
    }
  }
}
1
101
5151
176851
4598126
96560646
However, we can just count the number of possible n-tuples:
val pt: (Int, Int) => BigInt = {
  val buf = collection.mutable.Map[(Int, Int), BigInt]()
  (s, n) => buf.getOrElseUpdate((s, n),
    if (n == 0 && s > 0) BigInt(0)
    else if (s == 0) BigInt(1)
    else (0 to s).map { k => pt(s - k, n - 1) }.sum
  )
}
for (n <- 1 to 20) printf("%2d :%s%n", n, pt(100, n).toString)
1 :1
2 :101
3 :5151
4 :176851
5 :4598126
6 :96560646
7 :1705904746
8 :26075972546
9 :352025629371
10 :4263421511271
11 :46897636623981
12 :473239787751081
13 :4416904685676756
14 :38393094575497956
15 :312629484400483356
16 :2396826047070372396
17 :17376988841260199871
18 :119594570260437846171
19 :784008849485092547121
20 :4910371215196105953021
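These counts are exactly the "stars and bars" binomial coefficients C(s + n - 1, n - 1), so for a pure count one can skip the recursion entirely; a minimal sketch (the count helper is an assumption here, not code from the answers above):

// C(s + n - 1, n - 1), computed incrementally so every intermediate stays an exact BigInt
def count(s: Int, n: Int): BigInt =
  (1 until n).foldLeft(BigInt(1))((acc, i) => acc * (s + i) / i)

count(100, 3)   // 5151
count(100, 10)  // 4263421511271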