Nested lazy for-comprehension - scala

I have a deeply "nested" for-comprehension, simplified to 3 levels below: x, y, and z. I was hoping that making only x a Stream would make the y and z computations lazy too:
val stream = for {
  x <- List(1, 2, 3).toStream
  y <- List("foo", "bar", "baz")
  z = {
    println("Processed " + x + y)
    x + y
  }
} yield z
stream take (2) foreach (doSomething)
But this computes all 3 elements, as evidenced by the 3 prints. I'd like to only compute the first 2, since those are all I take from the stream. I can work around this by calling toStream on the second List and so on. Is there a better way than calling that at every level of the for-comprehension?

What it prints is:
Processed 1foo
Processed 1bar
Processed 1baz
stream: scala.collection.immutable.Stream[String] = Stream(1foo, ?)
scala> stream take (2) foreach (println)
1foo
1bar
The head of a Stream is always strictly evaluated, which is why you see Processed 1foo etc and not Processed 2foo etc. This is printed when you create the Stream, or more precisely, when the head of stream is evaluated.
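A minimal way to see that head strictness, assuming the pre-2.13 Stream used in the question:
val s = List(1, 2, 3).toStream.map { x => println("computed " + x); x }
// map evaluates the head eagerly, so "computed 1" prints immediately;
// the remaining elements stay unevaluated until they are demanded.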
You are correct that if you only wish to process each resulting element one by one then all the generators will have to be Streams. You could get around calling toStream by making them Streams to start with as in example below.
stream is a Stream[String] and its head needs to be evaluated. If you don't want to calculate a value eagerly, you could either prepend a dummy value, or better, make your value stream lazy:
lazy val stream = for {
  x <- Stream(1, 2, 3)
  y <- Stream("foo", "bar", "baz")
  z = { println("Processed " + x + y); x + y }
} yield z
This does not do any "processing" until you take each value:
scala> stream take 2 foreach println
Processed 1foo
1foo
Processed 1bar
1bar

Related

How to trick Scala map method to produce more than one output per each input item?

A quite complex algorithm is being applied to a list of Spark Dataset rows (the list was obtained using groupByKey and flatMapGroups). Most rows are transformed 1:1 from input to output, but some scenarios require more than one output per input. The input row schema can change anytime. map() fits the 1:1 transformation quite well, but is there a way to use it to produce a 1:n output?
The only work-around I found relies on the foreach method, which has unpleasant overhead caused by creating the initial empty list (remember, unlike the simplified example below, the real-life list structure changes randomly).
My original problem is too complex to share here, but this example demonstrates the concept. Let's have a list of integers. Each should be transformed into its square value and if the input is even it should also transform into one half of the original value:
val X = Seq(1, 2, 3, 4, 5)
val y = X.map(x => x * x) // map is intended for 1:1 transformation, so it works great here
val z = X.map(x => for (n <- 1 to 5) (n, x * x)) // this attempt FAILS - generates a list of five empty tuples
// this work-around works, but the newX definition is problematic
var newX = List[Int]() // in reality defined as the head of the input list, with the result's tail dropped at the end
val za = X.foreach(x => {
  newX = x * x :: newX
  if (x % 2 == 0) newX = (x / 2) :: newX
})
newX
Is there a better way than foreach construct?
.flatMap produces any number of outputs from a single input.
val X = Seq(1, 2, 3, 4, 5)
X.flatMap { x =>
  if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)
}
#=> Seq[Int] = List(1, 4, 1, 9, 16, 2, 25)
flatMap in more detail
In X.map(f), f is a function that maps each input to a single output. By contrast, in X.flatMap(g), the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element of X) and concatenates them.
The neat thing is that .flatMap works not just for sequences, but for all sequence-like objects. For an Option, for instance, Option(x).flatMap(g) will allow g to return an Option. Similarly, Future(x).flatMap(g) will allow g to return a Future.
Whenever the number of elements you return depends on the input, you should think of flatMap.
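As a concrete illustration of the Option case (parseInt here is a hypothetical helper, not part of the question):
// flatMap on Option: the function returns an Option, and a None anywhere short-circuits the chain.
def parseInt(s: String): Option[Int] = scala.util.Try(s.toInt).toOption

val ok  = Option("21").flatMap(parseInt).map(_ * 2)   // Some(42)
val bad = Option("oops").flatMap(parseInt).map(_ * 2) // None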

Why does iterating over multiple streams only iterate over the first element?

I've recently run into a bug in my code, in which iterating over multiple streams causes them to iterate only through the first item. I converted my streams to buffers (I wasn't even aware that the implementation of the function I was calling returns a stream) and the problem was fixed. I found this hard to believe, so I created a minimal verifiable example:
def f(as: Seq[String], bs: Seq[String]): Unit =
  for {
    a <- as
    b <- bs
  } yield println((a, b))
val seq = Seq(1, 2, 3).map(_.toString)
f(seq, seq)
println()
val stream = Stream.iterate(1)(_ + 1).map(_.toString).take(3)
f(stream, stream)
A function that prints every combination of its inputs, and is invoked with the Seq [1, 2, 3] and the Stream [1, 2, 3].
The result with the seq is:
(1,1)
(1,2)
(1,3)
(2,1)
(2,2)
(2,3)
(3,1)
(3,2)
(3,3)
And the result with the stream is:
(1,1)
I've only been able to replicate this when iterating through multiple generators; iterating through a single stream seems to work fine.
So my questions are: why does this happen, and how can I avoid this kind of glitch? That is, short of using .toBuffer or .to[Vector] before every multi-generator iteration?
Thanks.
The manner in which you're using the for-comprehension (with the println in the yield) is a bit strange and probably not what you want to do. If you really just want to print out the entries, then just use foreach. This will force lazy sequences like Stream, i.e.
def f_strict(as: Seq[String], bs: Seq[String]): Unit = {
  for {
    a <- as
    b <- bs
  } println((a, b))
}
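Used with the Stream from the question, this forces everything and prints all nine pairs; a quick sketch:
val stream = Stream.iterate(1)(_ + 1).map(_.toString).take(3)
f_strict(stream, stream)
// prints (1,1), (1,2), (1,3), (2,1), ..., (3,3), each on its own line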
The reason you're getting the strange behavior with your f is that Streams are lazy, and elements are only computed (and then memoized) as needed. Since you never use the Stream created inside f (you can't, because f returns Unit), only its head ever gets computed, which is why you're seeing the single (1, 1). If you were instead to have it return the sequence it generated (which will have type Seq[Unit]), i.e.
def f_new(as: Seq[String], bs: Seq[String]): Seq[Unit] = {
  for {
    a <- as
    b <- bs
  } yield println((a, b))
}
Then you'll get the following behavior which should hopefully help to elucidate what's going on:
val xs = Stream(1, 2, 3)
val result = f_new(xs.map(_.toString), xs.map(_.toString))
//prints out (1, 1) as a result of evaluating the head of the resulting Stream
result.foreach(aUnit => {})
//prints out the other elements as the rest of the entries of Stream are computed, i.e.
//(1,2)
//(1,3)
//(2,1)
//...
result.foreach(aUnit => {})
//probably won't print out anything because elements of Stream have been computed,
//memoized and probably don't need to be computed again at this point.

Scala: Side effects with collection transformations

I am trying to understand views in scala via this link http://docs.scala-lang.org/overviews/collections/views.html.
I didn't understand what it means for collection transformations to have (or not have) side effects.
Thank you
By "having side effects" is meant a situation where the code you execute in a collection transformation closes over some external state, or has any effect on anything other than the result of the transformation. Example:
val l = List(1, 2, 3, 4).view.map(x => {println(x); x + 1})
When you execute this code it will print nothing, because view delays the execution of map. Furthermore, each time you iterate over this view, map will be executed again, so the values end up being printed more times than you intended.
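For example, traversing l twice runs the map (and its println) twice; a minimal sketch:
// nothing has been printed so far: the map on the view was only recorded
l.foreach(_ => ()) // first traversal: prints 1, 2, 3, 4
l.foreach(_ => ()) // second traversal: prints 1, 2, 3, 4 again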
var counter = 0
val ll = for (i <- List(1, 2, 3, 4).view)
  yield { counter += 1; i + 1 }
println(counter) // 0
println(ll.toList) // this executes .force internally
println(counter) // 4
This behaves in the same way, but it is even more surprising: counter increases only once the iteration actually happens, and since ll is lazy and its evaluation delayed, that iteration may happen much deeper in the code, leaving counter equal to 0 until then.
Scala has immutable collections (everything in scala.collection.immutable). These collection types have no operations to modify them, only operations to get modified copies.
So for example this
Set(1) + 2
will give you a new Set containing 1 and 2, rather than modifying the first set. The same holds for transformations such as map, flatMap, filter, etc.
Views do not change anything about that. The only difference between a view and the collection it is based on is that (most) operations on views are lazy, i.e. intermediate results are not computed.
val l1 = List(1,2)
val l2 = List(1,2).map(x => x + 1) // a new List(2,3) is computed here
l2.foreach(println) // the elements of l2 are just printed
With views:
val v2 = l1.view.map(x => x + 1) // nothing is computed here
v2.foreach(println) // the values are computed step by step
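If you want the mapping to run exactly once and keep the results, you can force the view back into a strict collection; a minimal sketch building on l1 above:
val strict = l1.view.map { x => println("computing " + x); x + 1 }.toList
// "computing 1" and "computing 2" are printed here, exactly once
strict.foreach(println) // prints 2 and 3; nothing is recomputed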

Is there an elegant way to foldLeft on a growing scala.collection.mutable.Queue?

I have a recursive function that I am trying to make @tailrec by having the inner, recursive part (countR3) add elements to a queue (agenda is a scala.collection.mutable.Queue). My idea is to then have the outer part of the function fold over the agenda and sum up the results.
NOTE: This was a homework problem, thus I don't want to post the whole code; however, making the implementation tail-recursive was not part of the homework.
Here is the portion of the code relevant to my question:
import scala.annotation.tailrec
import scala.collection.mutable.Queue

val agenda: Queue[Tuple2[Int, List[Int]]] = Queue()

@tailrec
def countR3(y: Int, x: List[Int]): Int = {
  if (y == 0) 1
  else if (x.isEmpty) 0
  else if …
  else {
    agenda.enqueue((y - x.head, x))
    countR3(y, x.tail)
  }
}
⋮
agenda.enqueue((4, List(1, 2)))
val count = agenda.foldLeft(0) {
  (count, pair) => {
    val mohr = countR3(pair._1, pair._2)
    println("count=" + count + " countR3=" + mohr)
    count + mohr
  }
}
println(agenda.mkString(" + "))
count
This almost seems to work… The problem is that it doesn't iterate over all of the items added to the agenda, yet it does process some of them. You can see this in the output below:
count=0 countR3=0
count=0 countR3=0
count=0 countR3=0
(4,List(1, 2)) + (3,List(1, 2)) + (2,List(2)) + (2,List(1, 2)) + (1,List(2)) + (0,List(2))
[Of the six items on the final agenda, only the first three were processed.]
I'm generally well-aware of the hazards of mutating a collection while you're iterating over it in, say, Java. But a Queue is kind of a horse of a different color. Of course, I understand I could simply write an imperative loop, like so:
var count = 0
while (!agenda.isEmpty) {
  val pair = agenda.dequeue()
  count += countR3(pair._1, pair._2)
}
This works perfectly well, but this being Scala, I am exploring to see if there is a more functionally elegant way.
Any suggestions?
Although probably not entirely idiomatic, you could try this:
Stream.continually({ if (agenda.isEmpty) None else Some(agenda.dequeue()) }).
  takeWhile(_.isDefined).flatten.
  map({ case (x, y) => countR3(x, y) }).
  toList.sum
The Stream.continually({ if (agenda.isEmpty) None else Some(agenda.dequeue()) }) gives you an infinite stream of queue items wrapped in Option[Tuple2[Int, List[Int]]].
Then, takeWhile(_.isDefined) cuts off the sequence as soon as the first None item is encountered, i.e. when the queue is exhausted.
As the previous call still yields a sequence of Options, we need to unwrap them with flatten.
map({ case (x, y) => countR3(x, y) }) basically applies your function to each item.
And finally, toList forces the evaluation of a stream (that's what we were working with) and then sum calculates the sum of list's items.
As for why agenda.foldLeft processes only some of the enqueued items: I'd guess that it takes a snapshot of the items enqueued when the fold starts, and therefore isn't affected by items added during the calculation.
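Another option, if you want to keep the mutable Queue but avoid the explicit while loop, is a small tail-recursive drain; a sketch using the agenda and countR3 from the question:
import scala.annotation.tailrec

// Dequeues until the agenda is empty, even as countR3 keeps enqueuing new work.
@tailrec
def drain(acc: Int): Int =
  if (agenda.isEmpty) acc
  else {
    val (y, xs) = agenda.dequeue()
    drain(acc + countR3(y, xs))
  }

val count = drain(0)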

Incrementing the for loop (loop variable) in scala by power of 5

I had asked this question on Javaranch, but couldn't get a response there. So posting it here as well:
I have this particular requirement where the loop variable is to be incremented by multiplying it by 5 after each iteration. In Java we could implement it this way:
for(int i=1;i<100;i=i*5){}
In scala I was trying the following code-
var j = 1
for (i <- 1.to(100).by(scala.math.pow(5, j).toInt)) {
  println(i + " " + j)
  j = j + 1
}
But it's printing the following output:
1 1
6 2
11 3
16 4
21 5
26 6
31 7
36 8
....
....
It's always incrementing by 5. So how do I go about actually multiplying the increment by 5 instead of adding it?
Let's first explain the problem. This code:
var j = 1
for (i <- 1.to(100).by(scala.math.pow(5, j).toInt)) {
  println(i + " " + j)
  j = j + 1
}
is equivalent to this:
var j = 1
val range: Range = Predef.intWrapper(1).to(100)
val increment: Int = scala.math.pow(5, j).toInt
val byRange: Range = range.by(increment)
byRange.foreach { i =>
  println(i + " " + j)
  j = j + 1
}
So, by the time you get to mutate j, increment and byRange have already been computed. And Range is an immutable object -- you can't change it. Even if you produced new ranges while you did the foreach, the object doing the foreach would still be the same.
Now, to the solution. Simply put, Range is not adequate for your needs. You want a geometric progression, not an arithmetic one. To me (and pretty much everyone else answering, it seems), the natural solution would be to use a Stream or Iterator created with iterate, which computes the next value based on the previous one.
for (i <- Iterator.iterate(1)(_ * 5) takeWhile (_ < 100)) {
  println(i)
}
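If you also want the running count j that the original loop printed, zipWithIndex works on Iterator as well; a small sketch (the index starts at 0, hence the + 1):
for ((i, j) <- Iterator.iterate(1)(_ * 5).takeWhile(_ < 100).zipWithIndex) {
  println(i + " " + (j + 1))
}
// 1 1
// 5 2
// 25 3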
EDIT: About Stream vs Iterator
Stream and Iterator are very different data structures that share the property of being non-strict. This property is what enables iterate to even exist, since this method creates an infinite collection(1), from which takeWhile will create a new(2) collection which is finite. Let's see here:
val s1 = Stream.iterate(1)(_ * 5) // s1 is infinite
val s2 = s1.takeWhile(_ < 100) // s2 is finite
val i1 = Iterator.iterate(1)(_ * 5) // i1 is infinite
val i2 = i1.takeWhile(_ < 100) // i2 is finite
These infinite collections are possible because the collection is not pre-computed. On a List, all elements inside the list are actually stored somewhere by the time the list has been created. On the above examples, however, only the first element of each collection is known in advance. All others will only be computed if and when required.
As I mentioned, though, these are very different collections in other respects. Stream is an immutable data structure. For instance, you can print the contents of s2 as many times as you wish, and it will show the same output every time. On the other hand, Iterator is a mutable data structure. Once you used a value, that value will be forever gone. Print the contents of i2 twice, and it will be empty the second time around:
scala> s2 foreach println
1
5
25
scala> s2 foreach println
1
5
25
scala> i2 foreach println
1
5
25
scala> i2 foreach println
scala>
Stream, on the other hand, is a lazy collection. Once a value has been computed, it will stay computed, instead of being discarded or recomputed every time. See below one example of that behavior in action:
scala> val s2 = s1.takeWhile(_ < 100) // s2 is finite
s2: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> println(s2)
Stream(1, ?)
scala> s2 foreach println
1
5
25
scala> println(s2)
Stream(1, 5, 25)
So Stream can actually fill up the memory if one is not careful, whereas Iterator occupies constant space. On the other hand, one can be surprised by Iterator, because of its side effects.
(1) As a matter of fact, Iterator is not a collection at all, even though it shares a lot of the methods provided by collections. On the other hand, from the problem description you gave, you are not really interested in having a collection of numbers, just in iterating through them.
(2) Actually, though takeWhile will create a new Iterator on Scala 2.8.0, this new iterator will still be linked to the old one, and changes in one have side effects on the other. This is subject to discussion, and they might end up being truly independent in the future.
In a more functional style:
scala> Stream.iterate(1)(i => i * 5).takeWhile(i => i < 100).toList
res0: List[Int] = List(1, 5, 25)
And with more syntactic sugar:
scala> Stream.iterate(1)(_ * 5).takeWhile(_ < 100).toList
res1: List[Int] = List(1, 5, 25)
Maybe a simple while-loop would do?
var i = 1
while (i < 100) {
  println(i)
  i *= 5
}
or if you want to also print the number of iterations
var i = 1
var j = 1
while (i < 100) {
  println(j + " : " + i)
  i *= 5
  j += 1
}
It seems you guys like the functional style, so how about a recursive solution?
import scala.annotation.tailrec

@tailrec def quints(n: Int): Unit = {
  println(n)
  if (n * 5 < 100) quints(n * 5)
}
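For completeness, the snippet above only defines quints; the entry call is not shown, but invoking it would look like this (sketch):
quints(1) // prints 1, 5, 25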
Update: Thanks for spotting the error... it should of course be power, not multiply:
Annoyingly, there doesn't seem to be an integer pow function in the standard library!
Try this:
def pow5(i:Int) = math.pow(5,i).toInt
Iterator from 1 map pow5 takeWhile (100>=) toList
Or if you want to use it in-place:
Iterator from 1 map pow5 takeWhile (100>=) foreach {
j => println("number:" + j)
}
and with the indices:
val iter = Iterator from 1 map pow5 takeWhile (100>=)
iter.zipWithIndex foreach { case (j, i) => println(i + " = " + j) }
(0 to 2).map (math.pow (5, _).toInt).zipWithIndex
res25: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((1,0), (5,1), (25,2))
produces a Vector, with i,j in reversed order.
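If you prefer the (index, value) order instead, swapping each pair is enough (a sketch; swap is the standard Tuple2 method):
(0 to 2).map(math.pow(5, _).toInt).zipWithIndex.map(_.swap)
// Vector((0,1), (1,5), (2,25))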