Using Streams for iteration in Scala - scala

SICP says that iterative processes (e.g. Newton method of square root calculation, "pi" calculation, etc.) can be formulated in terms of Streams.
Does anybody use streams in Scala to model iterations?

Here is one way to produce the stream of approximations of pi:
val naturals = Stream.from(0) // 0, 1, 2, ...
val odds = naturals.map(_ * 2 + 1) // 1, 3, 5, ...
val oddInverses = odds.map(1.0d / _) // 1/1, 1/3, 1/5, ...
val alternations = Stream.iterate(1)(-_) // 1, -1, 1, ...
val products = (oddInverses zip alternations)
.map(ia => ia._1 * ia._2) // 1/1, -1/3, 1/5, ...
// Computes a stream representing the cumulative sum of another one
def sumUp(s : Stream[Double], acc : Double = 0.0d) : Stream[Double] =
Stream.cons(s.head + acc, sumUp(s.tail, s.head + acc))
val pi = sumUp(products).map(_ * 4.0) // Approximations of pi.
Now, say you want the 200th iteration:
scala> pi(200)
resN: Double = 3.1465677471829556
...or the 300000th:
scala> pi(300000)
resN : Double = 3.14159598691202

Streams are extremely useful when you are doing a sequence of recursive calculations and a single result depends on previous results, such as calculating pi. Here's a simpler example, consider the classic recursive algorithm for calculating fibbonacci numbers (1, 2, 3, 5, 8, 13, ...):
def fib(n: Int) : Int = n match {
case 0 => 1
case 1 => 2
case _ => fib(n - 1) + fib(n - 2)
}
One of the main points of this code is that while very simple, is extremely inefficient. fib(100) almost crashed my computer! Each recursion branches into two calls and you are essentially calculating the same values many times.
Streams allow you to do dynamic programming in a recursive fashion, where once a value is calculated, it is reused every time it is needed again. To implement the above using streams:
val naturals: Stream[Int] = Stream.cons(0, naturals.map{_ + 1})
val fibs : Stream[Int] = naturals.map{
case 0 => 1
case 1 => 2
case n => fibs(n - 1) + fibs( n - 2)
}
fibs(1) //2
fibs(2) //3
fibs(3) //5
fibs(100) //1445263496
Whereas the recursive solution runs in O(2^n) time, the Streams solution runs in O(n^2) time. Since you only need the last 2 generated members, you can easily optimize this using Stream.drop so that the stream size doesn't overflow memory.

Related

How to trick Scala map method to produce more than one output per each input item?

Quite complex algorith is being applied to list of Spark Dataset's rows (list was obtained using groupByKey and flatMapGroups). Most rows are transformed 1 : 1 from input to output, but in some scenarios require more than one output per each input. The input row schema can change anytime. The map() fits the requirements quite well for the 1:1 transformation, but is there a way to use it producing 1 : n output?
The only work-around I found relies on foreach method which has unpleasant overhed cause by creating the initial empty list (remember, unlike the simplified example below, real-life list structure is changing randomly).
My original problem is too complex to share here, but this example demonstrates the concept. Let's have a list of integers. Each should be transformed into its square value and if the input is even it should also transform into one half of the original value:
val X = Seq(1, 2, 3, 4, 5)
val y = X.map(x => x * x) //map is intended for 1:1 transformation so it works great here
val z = X.map(x => for(n <- 1 to 5) (n, x * x)) //this attempt FAILS - generates list of five rows with emtpy tuples
// this work-around works, but newX definition is problematic
var newX = List[Int]() //in reality defining as head of the input list and dropping result's tail at the end
val za = X.foreach(x => {
newX = x*x :: newX
if(x % 2 == 0) newX = (x / 2) :: newX
})
newX
Is there a better way than foreach construct?
.flatMap produces any number of outputs from a single input.
val X = Seq(1, 2, 3, 4, 5)
X.flatMap { x =>
if (x % 2 == 0) Seq(x*x, x / 2) else Seq(x / 2)
}
#=> Seq[Int] = List(0, 4, 1, 1, 16, 2, 2)
flatMap in more detail
In X.map(f), f is a function that maps each input to a single output. By contrast, in X.flatMap(g), the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element in f) and concatenates them.
The neat thing is .flatMap works not just for sequences, but for all sequence-like objects. For an option, for instance, Option(x)#flatMap(g) will allow g to return an Option. Similarly, Future(x)#flatMap(g) will allow g to return a Future.
Whenever the number of elements you return depends on the input, you should think of flatMap.

Splitting a Scala Range into evenly-sized contiguous sub-Ranges

If I have a Range, how can I split it into a sequence of contiguous sub-ranges, where the number of sub-ranges (buckets) is specified? Empty buckets should be omitted if there are not enough items.
For example:
splitRange(1 to 6, 3) == Seq(Range(1,2), Range(3,4), Range(5,6))
splitRange(1 to 2, 3) == Seq(Range(1), Range(2))
Some additional constraints, that rule out some of the solutions I've seen:
Roughly even bucket size - the bucket size should vary by 1, at most
The length of the input range may sometimes be very large, so the ranges should not be materialized into sequences (e.g. can't use grouped)
This also implies that we don't allocate numbers to buckets in round-robin fashion, because then numbers in each bucket wouldn't be contiguous and so wouldn't form a Range
Ideally, the sub-ranges would be produced in order, i.e (1,2)(3,4), not (3,4)(1,2)
A colleague found a solution here:
def splitRange(r: Range, chunks: Int): Seq[Range] = {
if (r.step != 1)
throw new IllegalArgumentException("Range must have step size equal to 1")
val nchunks = scala.math.max(chunks, 1)
val chunkSize = scala.math.max(r.length / nchunks, 1)
val starts = r.by(chunkSize).take(nchunks)
val ends = starts.map(_ - 1).drop(1) :+ r.end
starts.zip(ends).map(x => x._1 to x._2)
}
but this can produce very uneven bucket sizes when N is small, e.g:
splitRange(1 to 14, 5)
//> Vector(Range(1, 2), Range(3, 4), Range(5, 6),
//| Range(7, 8), Range(9, 10, 11, 12, 13, 14))
^^^^^^^^^^^^^^^^^^^^^
Floating-point approaches
One way is to generate a fractional (floating-point) offset for each bucket, then convert these to integer Ranges, by zipping. Empty Ranges also need filtering out using collect.
def splitRange(r: Range, chunks: Int): Seq[Range] = {
require(r.step == 1, "Range must have step size equal to 1")
require(chunks >= 1, "Must ask for at least 1 chunk")
val m = r.length.toDouble
val chunkSize = m / chunks
val bins = (0 to chunks).map { x => math.round((x.toDouble * m) / chunks).toInt }
val pairs = bins zip (bins.tail)
pairs.collect { case (a, b) if b > a => a to b }
}
(The first version of this solution had a rounding problem such that it could not handle Int.MaxValue - this has now been fixed based on Rex Kerr's recursive floating-point solution below)
Another floating-point approach is to recurse down the range, taking the head off the range each time, so we cannot miss any elements. This version can handle Int.MaxValue correctly.
def splitRange(r: Range, chunks: Int): Seq[Range] = {
require(r.step == 1, "Range must have step size equal to 1")
require(chunks >= 1, "Must ask for at least 1 chunk")
val chunkSize = r.length.toDouble / chunks
def go(i: Int, r: Range, delta: Double, acc: List[Range]): List[Range] = {
if (i == chunks) r :: acc
// ensures the last chunk has all remaining values, even if error accumulates
else {
val s = delta + chunkSize
val (chunk, rest) = r.splitAt(s.toInt)
go(i + 1, rest, s - s.toInt, if (chunk.length > 0) chunk :: acc else acc)
}
}
go(1, r, 0.0D, Nil).reverse
}
One can also recurse to generate the (start,end) pairs, rather than zipping them. This is adapted from Rex Kerr's answer to a similar question
def splitRange(r: Range, chunks: Int): Seq[Range] = {
require(r.step == 1, "Range must have step size equal to 1")
require(chunks >= 1, "Must ask for at least 1 chunk")
val m = r.length
val bins = (0 to chunks).map { x => math.round((x.toDouble * m) / chunks).toInt }
def snip(r: Range, ns: Seq[Int], got: Vector[Range]): Vector[Range] = {
if (ns.length < 2) got
else {
val (i, j) = (ns.head, ns.tail.head)
snip(r.drop(j - i), ns.tail, got :+ r.take(j - i))
}
}
snip(r, bins, Vector.empty).filter(_.length > 0)
}
Integer approach
Finally, I realized that this can be done with purely integer arithmetic by adapting Bresenham's line-drawing algorithm, which solves a basically equivalent problem - how to allocate the x-pixels evenly across the y rows, using only integer operations!
I initially translated the pseudo-code into an imperative solution using var and ArrayBuffer, then converted it into a tail-recursive solution:
def splitRange(r: Range, chunks: Int): List[Range] = {
require(r.step == 1, "Range must have step size equal to 1")
require(chunks >= 1, "Must ask for at least 1 chunk")
val dy = r.length
val dx = chunks
#tailrec
def go(y0:Int, y:Int, d:Int, ch:Int, acc: List[Range]):List[Range] = {
if (ch == 0) acc
else {
if (d > 0) go(y0, y-1, d-dx, ch, acc)
else go(y-1, y, d+dy, ch-1, if (y > y0) acc
else (y to y0) :: acc)
}
}
go(r.end, r.end, dy - dx, chunks, Nil)
}
Please see the Wikipedia link for a full explanation, but essentially the algorithm zig-zags up the slope of a line, alternatively adding the y-range dy and subtracting the x-range dx. If these don't divide exactly, then an error accumulates until it divides exactly, leading to an extra pixel in some sub-ranges.
splitRange(3 to 15, 5)
//> List(Range(3, 4), Range(5, 6, 7), Range(8, 9),
//| Range(10, 11, 12), Range(13, 14, 15))

How to generate a list in Scala, where each item depends on the preceding item

Say, I have a recursive rule:
f(0) = 2
f(n) = f(n-1) * 3 - 2
I need to generate a list for n ∈ [0, 10].
If I was interested in f(10), I could use foldLeft like this:
(1 to 10).foldLeft(2)((z, _) => z * 3 - 2)
I want to achieve the following in a concise and functional style:
val list = new ListBuffer[Int]
list += 2
(1 to 10).foreach {
list += list.last * 3 - 2
}
What's the solution?
You can use a Stream to generate this list lazily and functionally:
val stream: Stream[Int] = {
def next(i: Int): Stream[Int] = {
val n = i * 3 - 2
n #:: next(n)
}
2 #:: next(2)
}
println(stream.take(11).toList)
//prints List(2, 4, 10, 28, 82, 244, 730, 2188, 6562, 19684, 59050)
One of the multiple approaches involves for instance the use of scanLeft as follows,
(1 to 10).scanLeft(2)( (acc,_) => acc*3-2)
This applies the function onto the latest (accumulated) result.
Update
Also consider this Iterator
val f = Iterator.iterate(2)(_*3-2)
and so
(1 to 10).map(_ => f.next)
For a large number of iterations, initial value 2: Int may be cast onto BigInt(2) so as to avoid overflow for instance in
(1 to 100).map(_ => f.next)

Monte Carlo calculation of Pi in Scala

Suppose I would like to calculate Pi with Monte Carlo simulation as an exercise.
I am writing a function, which picks a point in a square (0, 1), (1, 0) at random and tests if the point is inside the circle.
import scala.math._
import scala.util.Random
def circleTest() = {
val (x, y) = (Random.nextDouble, Random.nextDouble)
sqrt(x*x + y*y) <= 1
}
Then I am writing a function, which takes as arguments the test function and the number of trials and returns the fraction of the trials in which the test was found to be true.
def monteCarlo(trials: Int, test: () => Boolean) =
(1 to trials).map(_ => if (test()) 1 else 0).sum * 1.0 / trials
... and I can calculate Pi
monteCarlo(100000, circleTest) * 4
Now I wonder if monteCarlo function can be improved. How would you write monteCarlo efficient and readable ?
For example, since the number of trials is large is it worth using a view or iterator instead of Range(1, trials) and reduce instead of map and sum ?
It's worth noting that Random.nextDouble is side-effecting—when you call it it changes the state of the random number generator. This may not be a concern to you, but since there are already five answers here I figure it won't hurt anything to add one that's purely functional.
First you'll need a random number generation monad implementation. Luckily NICTA provides a really nice one that's integrated with Scalaz. You can use it like this:
import com.nicta.rng._, scalaz._, Scalaz._
val pointInUnitSquare = Rng.choosedouble(0.0, 1.0) zip Rng.choosedouble(0.0, 1.0)
val insideCircle = pointInUnitSquare.map { case (x, y) => x * x + y * y <= 1 }
def mcPi(trials: Int): Rng[Double] =
EphemeralStream.range(0, trials).foldLeftM(0) {
case (acc, _) => insideCircle.map(_.fold(1, 0) + acc)
}.map(_ / trials.toDouble * 4)
And then:
scala> val choosePi = mcPi(10000000)
choosePi: com.nicta.rng.Rng[Double] = com.nicta.rng.Rng$$anon$3#16dd554f
Nothing's been computed yet—we've just built up a computation that will generate our value randomly when executed. Let's just execute it on the spot in the IO monad for the sake of convenience:
scala> choosePi.run.unsafePerformIO
res0: Double = 3.1415628
This won't be the most performant solution, but it's good enough that it may not be a problem for many applications, and the referential transparency may be worth it.
Stream based version, for another alternative. I think this is quite clear.
def monteCarlo(trials: Int, test: () => Boolean) =
Stream
.continually(if (test()) 1.0 else 0.0)
.take(trials)
.sum / trials
(the sum isn't specialised for streams but the implementation (in TraversableOnce) just calls foldLeft that is specialised and "allows GC to collect along the way." So the .sum won't force the stream to be evaluated and so won't keep all the trials in memory at once)
I see no problem with the following recursive version:
def monteCarlo(trials: Int, test: () => Boolean) = {
def bool2double(b: Boolean) = if (b) 1.0d else 0.0d
#scala.annotation.tailrec
def recurse(n: Int, sum: Double): Double =
if (n <= 0) sum / trials
else recurse(n - 1, sum + bool2double(test()))
recurse(trials, 0.0d)
}
And a foldLeft version, too:
def monteCarloFold(trials: Int, test: () => Boolean) =
(1 to trials).foldLeft(0.0d)((s,i) => s + (if (test()) 1.0d else 0.0d)) / trials
This is more memory efficient than the map version in the question.
Using tail recursion might be an idea:
def recMonteCarlo(trials: Int, currentSum: Double, test:() => Boolean):Double = trials match {
case 0 => currentSum
case x =>
val nextSum = currentSum + (if (test()) 1.0 else 0.0)
recMonteCarlo(trials-1, nextSum, test)
def monteCarlo(trials: Int, test:() => Boolean) = {
val monteSum = recMonteCarlo(trials, 0, test)
monteSum / trials
}
Using aggregate on a parallel collection, like this,
def monteCarlo(trials: Int, test: () => Boolean) = {
val pr = (1 to trials).par
val s = pr.aggregate(0)( (a,_) => a + (if (test()) 1 else 0), _ + _)
s * 4.0 / trials
}
where partial results are summed up in parallel with other test calculations.

help rewriting in functional style

I'm learning Scala as my first functional-ish language. As one of the problems, I was trying to find a functional way of generating the sequence S up to n places. S is defined so that S(1) = 1, and S(x) = the number of times x appears in the sequence. (I can't remember what this is called, but I've seen it in programming books before.)
In practice, the sequence looks like this:
S = 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7 ...
I can generate this sequence pretty easily in Scala using an imperative style like this:
def genSequence(numItems: Int) = {
require(numItems > 0, "numItems must be >= 1")
var list: List[Int] = List(1)
var seq_no = 2
var no = 2
var no_nos = 0
var num_made = 1
while(num_made < numItems) {
if(no_nos < seq_no) {
list = list :+ no
no_nos += 1
num_made += 1
} else if(no % 2 == 0) {
no += 1
no_nos = 0
} else {
no += 1
seq_no += 1
no_nos = 0
}
}
list
}
But I don't really have any idea how to write this without using vars and the while loop.
Thanks!
Pavel's answer has come closest so far, but it's also inefficient. Two flatMaps and a zipWithIndex are overkill here :)
My understanding of the required output:
The results contain all the positive integers (starting from 1) at least once
each number n appears in the output (n/2) + 1 times
As Pavel has rightly noted, the solution is to start with a Stream then use flatMap:
Stream from 1
This generates a Stream, a potentially never-ending sequence that only produces values on demand. In this case, it's generating 1, 2, 3, 4... all the way up to Infinity (in theory) or Integer.MAX_VALUE (in practice)
Streams can be mapped over, as with any other collection. For example: (Stream from 1) map { 2 * _ } generates a Stream of even numbers.
You can also use flatMap on Streams, allowing you to map each input element to zero or more output elements; this is key to solving your problem:
val strm = (Stream from 1) flatMap { n => Stream.fill(n/2 + 1)(n) }
So... How does this work? For the element 3, the lambda { n => Stream.fill(n/2 + 1)(n) } will produce the output stream 3,3. For the first 5 integers you'll get:
1 -> 1
2 -> 2, 2
3 -> 3, 3
4 -> 4, 4, 4
5 -> 5, 5, 5
etc.
and because we're using flatMap, these will be concatenated, yielding:
1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, ...
Streams are memoised, so once a given value has been calculated it'll be saved for future reference. However, all the preceeding values have to be calculated at least once. If you want the full sequence then this won't cause any problems, but it does mean that generating S(10796) from a cold start is going to be slow! (a problem shared with your imperative algorithm). If you need to do this, then none of the solutions so far is likely to be appropriate for you.
The following code produces exactly the same sequence as yours:
val seq = Stream.from(1)
.flatMap(Stream.fill(2)(_))
.zipWithIndex
.flatMap(p => Stream.fill(p._1)(p._2))
.tail
However, if you want to produce the Golomb sequence (that complies with the definition, but differs from your sample code result), you may use the following:
val seq = 1 #:: a(2)
def a(n: Int): Stream[Int] = (1 + seq(n - seq(seq(n - 2) - 1) - 1)) #:: a(n + 1)
You may check my article for more examples of how to deal with number sequences in functional style.
Here is a translation of your code to a more functional style:
def genSequence(numItems: Int): List[Int] = {
genSequenceR(numItems, 2, 2, 0, 1, List[Int](1))
}
def genSequenceR(numItems: Int, seq_no: Int, no:Int, no_nos: Int, numMade: Int, list: List[Int]): List[Int] = {
if(numMade < numItems){
if(no_nos < seq_no){
genSequenceR(numItems, seq_no, no, no_nos + 1, numMade + 1, list :+ no)
}else if(no % 2 == 0){
genSequenceR(numItems, seq_no, no + 1, 0, numMade, list)
}else{
genSequenceR(numItems, seq_no + 1, no + 1, 0, numMade, list)
}
}else{
list
}
}
The genSequenceR is the recursive function that accumulates values in the list and calls the function with new values based on the conditions. Like the while loop, it terminates, when numMade is less than numItems and returns the list to genSequence.
This is a fairly rudimentary functional translation of your code. It can be improved and there are better approaches typically used. I'd recommend trying to improve it with pattern matching and then work towards the other solutions that use Stream here.
Here's an attempt from a Scala tyro. Keep in mind I don't really understand Scala, I don't really understand the question, and I don't really understand your algorithm.
def genX_Ys[A](howMany : Int, ofWhat : A) : List[A] = howMany match {
case 1 => List(ofWhat)
case _ => ofWhat :: genX_Ys(howMany - 1, ofWhat)
}
def makeAtLeast(startingWith : List[Int], nextUp : Int, howMany : Int, minimumLength : Int) : List[Int] = {
if (startingWith.size >= minimumLength)
startingWith
else
makeAtLeast(startingWith ++ genX_Ys( howMany, nextUp),
nextUp +1, howMany + (if (nextUp % 2 == 1) 1 else 0), minimumLength)
}
def genSequence(numItems: Int) = makeAtLeast(List(1), 2, 2, numItems).slice(0, numItems)
This seems to work, but re-read the caveats above. In particular, I am sure there is a library function that performs genX_Ys, but I couldn't find it.
EDIT Could be
def genX_Ys[A](howMany : Int, ofWhat : A) : Seq[A] =
(1 to howMany) map { x => ofWhat }
Here is a very direct "translation" of the definition of the Golomb seqence:
val it = Iterator.iterate((1,1,Map(1->1,2->2))){ case (n,i,m) =>
val c = m(n)
if (c == 1) (n+1, i+1, m + (i -> n) - n)
else (n, i+1, m + (i -> n) + (n -> (c-1)))
}.map(_._1)
println(it.take(10).toList)
The tripel (n,i,m) contains the actual number n, the index i and a Map m, which contains how often an n must be repeated. When the counter in the Map for our n reaches 1, we increase n (and can drop n from the map, as it is not longer needed), else we just decrease n's counter in the map and keep n. In every case we add the new pair i -> n into the map, which will be used as counter later (when a subsequent n reaches the value of the current i).
[Edit]
Thinking about it, I realized that I don't need indexes and not even a lookup (because the "counters" are already in the "right" order), which means that I can replace the Map with a Queue:
import collection.immutable.Queue
val it = 1 #:: Iterator.iterate((2, 2, Queue[Int]())){
case (n,1,q) => (n+1, q.head, q.tail + (n+1))
case (n,c,q) => (n,c-1,q + n)
}.map(_._1).toStream
The Iterator works correctly when starting by 2, so I had to add a 1 at the beginning. The second tuple argument is now the counter for the current n (taken from the Queue). The current counter could be kept in the Queue as well, so we have only a pair, but then it's less clear what's going on due to the complicated Queue handling:
val it = 1 #:: Iterator.iterate((2, Queue[Int](2))){
case (n,q) if q.head == 1 => (n+1, q.tail + (n+1))
case (n,q) => (n, ((q.head-1) +: q.tail) + n)
}.map(_._1).toStream