How might one implement C# yield return using Scala continuations? I'd like to be able to write Scala Iterators in the same style. A stab is in the comments on this Scala news post, but it doesn't work (tried using the Scala 2.8.0 beta). Answers in a related question suggest this is possible, but although I've been playing with delimited continuations for a while, I can't seem to exactly wrap my head around how to do this.
Before we introduce continuations we need to build some infrastructure.
Below is a trampoline that operates on Iteration objects.
An iteration is a computation that can either Yield a new value or it can be Done.
sealed trait Iteration[+R]
case class Yield[+R](result: R, next: () => Iteration[R]) extends Iteration[R]
case object Done extends Iteration[Nothing]
def trampoline[R](body: => Iteration[R]): Iterator[R] = {
def loop(thunk: () => Iteration[R]): Stream[R] = {
thunk.apply match {
case Yield(result, next) => Stream.cons(result, loop(next))
case Done => Stream.empty
}
}
loop(() => body).iterator
}
The trampoline uses an internal loop that turns the sequence of Iteration objects into a Stream.
We then get an Iterator by calling iterator on the resulting stream object.
By using a Stream our evaluation is lazy; we don't evaluate our next iteration until it is needed.
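Stream is deprecated as of Scala 2.13, so on newer compilers the same trampoline can be sketched with LazyList instead. This is a minimal adaptation, assuming Scala 2.13 or later; the wrapper object TrampolineSketch is a name introduced here purely for illustration.

```scala
object TrampolineSketch {
  sealed trait Iteration[+R]
  final case class Yield[+R](result: R, next: () => Iteration[R]) extends Iteration[R]
  case object Done extends Iteration[Nothing]

  def trampoline[R](body: => Iteration[R]): Iterator[R] = {
    // Each call to loop evaluates one Iteration step; the tail of the
    // LazyList is by-name, so the next step only runs when it is demanded.
    def loop(thunk: () => Iteration[R]): LazyList[R] =
      thunk() match {
        case Yield(result, next) => result #:: loop(next)
        case Done                => LazyList.empty
      }
    loop(() => body).iterator
  }
}
```

The behaviour is intended to match the Stream version: only the step needed for the next element is evaluated.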
The trampoline can be used to build an iterator directly.
val itr1 = trampoline {
Yield(1, () => Yield(2, () => Yield(3, () => Done)))
}
for (i <- itr1) { println(i) }
That's pretty horrible to write, so let's use delimited continuations to create our Iteration objects automatically.
We use the shift and reset operators to break the computation up into Iterations,
then use trampoline to turn the Iterations into an Iterator.
import scala.continuations._
import scala.continuations.ControlContext.{shift,reset}
def iterator[R](body: => Unit @cps[Iteration[R],Iteration[R]]): Iterator[R] =
trampoline {
reset[Iteration[R],Iteration[R]] { body ; Done }
}
def yld[R](result: R): Unit @cps[Iteration[R],Iteration[R]] =
shift((k: Unit => Iteration[R]) => Yield(result, () => k(())))
Now we can rewrite our example.
val itr2 = iterator[Int] {
yld(1)
yld(2)
yld(3)
}
for (i <- itr2) { println(i) }
Much better!
Now here's an example from the C# reference page for yield that shows some more advanced usage.
The types can be a bit tricky to get used to, but it all works.
def power(number: Int, exponent: Int): Iterator[Int] = iterator[Int] {
def loop(result: Int, counter: Int): Unit @cps[Iteration[Int],Iteration[Int]] = {
if (counter < exponent) {
yld(result)
loop(result * number, counter + 1)
}
}
loop(number, 0)
}
for (i <- power(2, 8)) { println(i) }
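For readers who cannot enable the continuations compiler plugin, the same sequence can be produced with the standard library alone. This is not the continuation-based technique above, just a plain sketch of an equivalent iterator; the object name PowerSketch is hypothetical.

```scala
object PowerSketch {
  // Yields number^1, number^2, ..., number^exponent, computing one value per step.
  def power(number: Int, exponent: Int): Iterator[Int] =
    Iterator.iterate(number)(_ * number).take(exponent)
}
```

Like the yld-based version, nothing is computed until the iterator is consumed.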
I managed to discover a way to do this, after a few more hours of playing around. I thought this was simpler to wrap my head around than all the other solutions I've seen thus far, though I did afterward very much appreciate Rich's and Miles' solutions.
def loopWhile(cond: =>Boolean)(body: =>(Unit @suspendable)): Unit @suspendable = {
if (cond) {
body
loopWhile(cond)(body)
}
}
class Gen {
var prodCont: Unit => Unit = { x: Unit => prod }
var nextVal = 0
def yld(i: Int) = shift { k: (Unit => Unit) => nextVal = i; prodCont = k }
def next = { prodCont(); nextVal }
def prod = {
reset {
// following is generator logic; can be refactored out generically
var i = 0
i += 1
yld(i)
i += 1
yld(i)
// scala continuations plugin can't handle while loops, so need own construct
loopWhile (true) {
i += 1
yld(i)
}
}
}
}
val it = new Gen
println(it.next)
println(it.next)
println(it.next)
I am aware that Scala has optimizations for tail-recursive functions (i.e. those functions in which the recursive call is the last thing executed by the function). What I am asking here is whether there is a way to optimize tail calls to different functions. Consider the following Scala code:
def doA(): Unit = {
doB()
}
def doB(): Unit = {
doA()
}
If we let this execute long enough it will give a stack overflow error which one can mitigate by allocating more stack space. Nonetheless, it will eventually exceed the allocated space and once again cause a stack overflow error. One way to mitigate this could be:
case class C(f: () => C)
def run(): Unit = {
var c: C = C(() => doA())
while(true){
c = c.f.apply()
}
}
def doA(): C = {
C(() => doB())
}
def doB(): C = {
C(() => doA())
}
However, this proved to be quite slow. Is there a better way to optimize this?
Here's one way to achieve an infinite progression of method calls without consuming stack, where each method decides which method goes next.
def doA(): () => Any = {
doB _
}
def doB(): () => Any = {
doC _
}
def doC(): () => Any = {
if (util.Random.nextBoolean()) doA _
else doB _
}
Iterator.iterate(doA())(_.asInstanceOf[() => () => Any]())
.foreach(identity)
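The standard library also ships a ready-made trampoline for mutually recursive calls, scala.util.control.TailCalls, which avoids the asInstanceOf cast. The sketch below uses a hypothetical isEven/isOdd pair rather than doA/doB so that the recursion terminates and the result can be checked.

```scala
import scala.util.control.TailCalls.{TailRec, done, tailcall}

object MutualRecursionSketch {
  // Each function returns a description of the next call instead of making it,
  // so the stack does not grow with the recursion depth.
  def isEven(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(isOdd(n - 1))

  def isOdd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(isEven(n - 1))
}
```

Calling MutualRecursionSketch.isEven(1000000).result runs the chain in a loop on the heap rather than on the call stack.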
Let's say we have a fake data source that returns the data it holds in batches:
class DataSource(size: Int) {
private var s = 0
implicit val g = scala.concurrent.ExecutionContext.global
def getData(): Future[List[Int]] = {
s = s + 1
Future {
Thread.sleep(Random.nextInt(s * 100))
if (s <= size) {
List.fill(100)(s)
} else {
List()
}
}
}
}
object Test extends App {
val source = new DataSource(100)
implicit val g = scala.concurrent.ExecutionContext.global
def process(v: List[Int]): Unit = {
println(v)
}
def next(f: (List[Int]) => Unit): Unit = {
val fut = source.getData()
fut.onComplete {
case Success(v) => {
f(v)
v match {
case h :: t => next(f)
}
}
}
}
next(process)
Thread.sleep(1000000000)
}
I have my own attempt above; the problem is that part of it is not pure. Ideally, I would like to wrap the Futures for all the batches into one big Future, where the wrapper Future succeeds when the last batch returns an empty list. My situation differs a little from this post: the next() there is a synchronous call, while mine is asynchronous.
Is it even possible to do what I want? The next batch should only be fetched once the previous one has resolved, and whether to fetch the next batch depends on the size of the one returned.
What's the best way to walk through this type of data source? Are there any existing Scala frameworks that provide the feature I am looking for? Are Play's Iteratee, Enumerator, and Enumeratee the right tools? If so, can anyone provide an example of how to use those facilities to implement what I am looking for?
Edit----
With help from chunjef, I tried it out, and it worked for me. However, I made a small change based on his answer:
Source.fromIterator(()=>Iterator.continually(source.getData())).mapAsync(1) (f=>f.filter(_.size > 0))
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
Can someone compare Akka Streams and Play Iteratees? Is it worth my trying out Iteratees as well?
Code snip 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
Code snip 2: Assume getData depends on the output of another flow, and I would like to concatenate it with the flow below. However, this yields a "too many open files" error. I am not sure what causes the error; mapAsync is limited to a parallelism of 1, if I understand correctly.
Flow[Int].mapConcat[Future[List[Int]]](c => {
Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)
The following is one way to achieve the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
object StreamsExample extends App {
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
val ds = new DataSource(100)
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
}
class DataSource(size: Int) {
...
}
A simplified line-by-line overview:
line 1: Creates a stream source that continually calls ds.getData if there is downstream demand.
line 2: mapAsync is a way to deal with stream elements that are Futures. In this case, the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 here because DataSource internally uses a mutable variable, and a parallelism level greater than one could produce unexpected results. identity is shorthand for x => x, which basically means that for each Future, we pass its result downstream without transforming it.
line 3: Essentially, ds.getData is called as long as the result of the Future is a non-empty List[Int]. If an empty List is encountered, processing is terminated.
line 4: runForeach here takes a function List[Int] => Unit and invokes that function for each stream element.
Ideally, I would like to wrap the Future for each batch into a big future, and the wrapper future success when last batch returned 0 size list?
I think you are looking for a Promise.
You would set up a Promise before you start the first iteration.
This gives you promise.future, a Future that you can then use to follow the completion of everything.
In your onComplete, you add a case _ => promise.success(()).
Something like
import scala.concurrent.{Future, Promise}
import scala.util.{Failure, Success}
def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
  val promise = Promise[Unit]()
  def next(): Unit = source.getData().onComplete {
    case Success(v) =>
      f(v)
      v match {
        case _ :: _ => next()            // non-empty batch: fetch the next one
        case Nil => promise.success(())  // empty batch: no more data
      }
    case Failure(e) => promise.failure(e)
  }
  // get going
  next()
  // return the Future for everything
  promise.future
}
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
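The same "fetch the next batch only after the previous one has resolved" loop can also be sketched without a Promise, by chaining the Futures with flatMap. This is an alternative formulation, not the code above; drain, fetch, and handle are hypothetical names, and Future.unit assumes Scala 2.12 or later.

```scala
import scala.concurrent.{ExecutionContext, Future}

object BatchLoopSketch {
  // Fetches batches one at a time; the next fetch starts only after the
  // previous Future has completed. The returned Future completes once an
  // empty batch has been handled, and fails if any fetch fails.
  def drain[A](fetch: () => Future[List[A]])(handle: List[A] => Unit)
              (implicit ec: ExecutionContext): Future[Unit] =
    fetch().flatMap { batch =>
      handle(batch)
      if (batch.isEmpty) Future.unit else drain(fetch)(handle)
    }
}
```

With the DataSource from the question, a call would look like BatchLoopSketch.drain(() => source.getData())(process). The recursive call happens inside the flatMap callback, so it does not run on the caller's stack frame.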
You are probably looking for a reactive streams library. My personal favorite (and the one I'm most familiar with) is Monix. This is how it would work with DataSource unchanged:
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global
object Test extends App {
val source = new DataSource(100)
val completed = // <- this is Future[Unit], completes when foreach is done
Observable.repeat(Observable.fromFuture(source.getData()))
.flatten // <- Here it's Observable[List[Int]], it has collection-like methods
.takeWhile(_.nonEmpty)
.foreach(println)
Await.result(completed, Duration.Inf)
}
I just figured out that flatMapConcat achieves what I wanted. There is no point in starting another question since I already have the answer, so I am putting my sample code here in case someone is looking for a similar answer.
This type of API is very common in integrations between traditional enterprise applications. The DataSource mocks the API, while the object Test demonstrates how client code can use Akka Streams to consume it.
In my small project the API was provided over SOAP, and I used scalaxb to turn the SOAP calls into Scala async style. With the client calls demonstrated in the object Test, we can consume the API with Akka Streams. Thanks to all for the help.
class DataSource(size: Int) {
private var transactionId: Long = 0
private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
implicit val g = scala.concurrent.ExecutionContext.global
case class TransactionId(id: Long)
case class ReadCursorId(id: Long)
def startTransaction(): Future[TransactionId] = {
Future {
synchronized {
transactionId += 1 // was "+= transactionId", which leaves the id stuck at 0
}
val t = TransactionId(transactionId)
transactionCursorMap.update(t, Set(ReadCursorId(0)))
t
}
}
def createCursorId(t: TransactionId): ReadCursorId = {
synchronized {
val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
val currentId = c.foldLeft(0L) { (acc, a) => acc.max(a.id) }
val cId = ReadCursorId(currentId + 1)
transactionCursorMap.update(t, c + cId)
cursorIteratorMap.put(cId, createIterator)
cId
}
}
def createIterator(): Iterator[List[Int]] = {
(for {i <- 1 to 100} yield List.fill(100)(i)).toIterator
}
def startRead(t: TransactionId): Future[ReadCursorId] = {
Future {
createCursorId(t)
}
}
def getData(cursorId: ReadCursorId): Future[List[Int]] = {
synchronized {
Future {
Thread.sleep(Random.nextInt(100))
cursorIteratorMap.get(cursorId) match {
case Some(i) => i.next()
case _ => List()
}
}
}
}
}
object Test extends App {
val source = new DataSource(10)
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
implicit val g = scala.concurrent.ExecutionContext.global
//
// def process(v: List[Int]): Unit = {
// println(v)
// }
//
// def next(f: (List[Int]) => Unit): Unit = {
// val fut = source.getData()
// fut.onComplete {
// case Success(v) => {
// f(v)
// v match {
//
// case h :: t => next(f)
//
// }
// }
//
// }
//
// }
//
// next(process)
//
// Thread.sleep(1000000000)
val s = Source.fromFuture(source.startTransaction())
.map { e =>
source.startRead(e)
}
.mapAsync(1)(identity)
.flatMapConcat(
e => {
Source.fromIterator(() => Iterator.continually(source.getData(e)))
})
.mapAsync(5)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
/*
val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runFold(List[List[Int]]()) { (acc, r) =>
// println("=======" + acc + r)
r :: acc
}
done.onSuccess {
case e => {
e.foreach(println)
}
}
done.onComplete(_ => system.terminate())
*/
}
Is there any more functional alternative in Scala for an infinite loop?
while(true) {
if (condition) {
// Do something
} else {
Thread.sleep(interval);
}
}
You can do it recursively:
import scala.annotation.tailrec
@tailrec
def loop(): Nothing = {
if (condition) {
// Do something
} else {
Thread.sleep(interval);
}
loop()
}
One thing that you can do is using higher-order functions like Stream.continually and pair it up with a for comprehension:
import scala.util.Random
import scala.collection.immutable.Stream.continually
def rollTheDice: Int = Random.nextInt(6) + 1
for (n <- continually(rollTheDice)) {
println(s"the dice rolled $n")
}
This example itself is not purely functional due to the non-referentially transparent nextInt method, but it's a possible construct that may help you think about function composition rather than using side effects.
EDIT (2020-12-24)
As correctly pointed out in a recent comment, "[a]s of 2.13, Stream is deprecated. But the same method does exist in LazyList (import scala.collection.immutable.LazyList.continually)".
The following will work from 2.13 onward:
import scala.util.Random
import scala.collection.immutable.LazyList.continually
def rollTheDice: Int = Random.nextInt(6) + 1
for (n <- continually(rollTheDice)) {
println(s"the dice rolled $n")
}
You can see it in action and play around with it here on Scastie.
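When the loop should eventually stop, the same idea combines naturally with takeWhile. The sketch below uses Iterator.continually, which does not memoize the elements it produces; rollsUntilSix and the rng parameter are hypothetical names introduced for illustration.

```scala
import scala.util.Random

object DiceSketch {
  // Keeps rolling until a six comes up and returns the rolls before it.
  // The generator is passed in so that callers can supply a seeded Random.
  def rollsUntilSix(rng: Random): List[Int] =
    Iterator.continually(rng.nextInt(6) + 1).takeWhile(_ != 6).toList
}
```

The stopping condition lives in takeWhile instead of a mutable flag or a break out of while (true).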
I guess infinite tail recursion:
@tailrec
def loop(): Nothing = {
if (condition) {
// Do something
} else {
Thread.sleep(interval);
}
loop()
}
Just to add to Stefano's great answer, in case someone is looking for a use case like mine:
I was working on tasks from a Kafka Streams course and needed to send an infinite stream of mock events to Kafka, with some fields being completely random (amounts) and others rotating within a specific list (names).
The same approach with continually can be used by passing a method to it (via eta expansion) and traversing the bound variable afterwards:
for {record <- continually(newRandomTransaction _)
name <- List("John", "Stephane", "Alice")} {
producer.send(record(name))
}
where the signature of newRandomTransaction is as follows:
def newRandomTransaction(name: String): ProducerRecord[String, String] = {
...
}
Imagine following variation of InputStream:
trait FutureInputStream {
//read bytes asynchronously. Empty array means EOF
def read(): Future[Array[Byte]]
}
The question is how to write a discardAll function for such a stream. Here is my solution:
//function that discards all input and returns Future completed on EOF
def discardAll(is: FutureInputStream): Future[Unit] = {
val f = is.read()
f.flatMap {
case v if v.length == 0 =>
Future.successful(())
case _ =>
discardAll(is)
}
}
The obvious problem with this code is non-optimizable recursion: it will quickly run out of stack. Is there a more efficient solution?
There is nothing wrong with your solution. The call to discardAll(is) is done asynchronously. It doesn't happen in the same stack frame as the previous call, so there will be no stack overflow.
You can kind of see what happens with a naive implementation:
trait FutureInputStream {
var count = 0
def read(): Future[Array[Byte]] = {
if(count < 100000) {
count += 1
Future(Array(1))
} else
Future(Array())
}
}
If you were to feed discardAll with an instance of the above, it would be okay.
scala> val is = new FutureInputStream{}
is: FutureInputStream = $anon$1@255d542f
scala> discardAll(is).onComplete { println }
Success(())
I have a Traversable, and I want to make it into a Java Iterator. My problem is that I want everything to be lazily done. If I do .toIterator on the traversable, it eagerly produces the result, copies it into a List, and returns an iterator over the List.
I'm sure I'm missing something simple here...
Here is a small test case that shows what I mean:
class Test extends Traversable[String] {
def foreach[U](f : (String) => U) {
f("1")
f("2")
f("3")
throw new RuntimeException("Not lazy!")
}
}
val a = new Test
val iter = a.toIterator
The reason you can't lazily get an iterator from a traversable is that you intrinsically can't. Traversable defines foreach, and foreach runs through everything without stopping. No laziness there.
So you have two options, both terrible, for making it lazy.
First, you can iterate through the whole thing each time. (I'm going to use the Scala Iterator, but the Java Iterator is basically the same.)
class Terrible[A](t: Traversable[A]) extends Iterator[A] {
private var i = 0
def hasNext = i < t.size // This could be O(n)!
def next: A = {
val a = t.slice(i,i+1).head // Also could be O(n)!
i += 1
a
}
}
If you happen to have efficient indexed slicing, this will be okay. If not, each "next" will take time linear in the length of the iterator, for O(n^2) time just to traverse it. But this is also not necessarily lazy; if you insist that it must be you have to enforce O(n^2) in all cases and do
class Terrible[A](t: Traversable[A]) extends Iterator[A] {
private var i = 0
def hasNext: Boolean = {
var j = 0
t.foreach { a =>
j += 1
if (j>i) return true
}
false
}
def next: A = {
var j = 0
t.foreach{ a =>
j += 1
if (j>i) { i += 1; return a }
}
throw new NoSuchElementException("Terribly empty")
}
}
This is clearly a terrible idea for general code.
The other way to go is to use a thread and block the traversal of foreach as it's going. That's right, you have to do inter-thread communication on every single element access! Let's see how that works--I'm going to use Java threads here since Scala is in the middle of a switch to Akka-style actors (though any of the old actors or the Akka actors or the Scalaz actors or the Lift actors or (etc.) will work)
class Horrible[A](t: Traversable[A]) extends Iterator[A] {
private val item = new java.util.concurrent.SynchronousQueue[Option[A]]()
private class Loader extends Thread {
override def run() { t.foreach{ a => item.put(Some(a)) }; item.put(None) }
}
private val loader = new Loader
loader.start
private var got: Option[A] = null
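def hasNext: Boolean = {
// take blocks until the loader thread hands over the next element (or None at the end);
// poll would return null immediately whenever no element is ready yet
if (got==null) got = item.take
got.isDefined
}
def next = {
if (got==null) got = item.take
val ans = got.get
got = null
ans
}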
}
This avoids the O(n^2) disaster, but ties up a thread and has desperately slow element-by-element access. I get about two million accesses per second on my machine, as compared to >100M for a typical traversable. This is clearly a horrible idea for general code.
So there you have it. Traversable is not lazy in general, and there is no good way to make it lazy without compromising performance tremendously.
I've run into this problem before and as far as I can tell, no one's particularly interested in making it easier to get an Iterator when all you've defined is foreach.
But as you've noted, toStream is the problem, so you could just override that:
class Test extends Traversable[String] {
def foreach[U](f: (String) => U) {
f("1")
f("2")
f("3")
throw new RuntimeException("Not lazy!")
}
override def toStream: Stream[String] = {
"1" #::
"2" #::
"3" #::
Stream[String](throw new RuntimeException("Not lazy!"))
}
}
Another alternative would be to define an Iterable instead of a Traversable, and then you'd get the iterator method directly. Could you explain a bit more what your Traversable is doing in your real use case?
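A minimal sketch of that Iterable alternative, assuming Scala 2.13 or later (for scala.jdk.CollectionConverters) and assuming the elements can be produced one at a time; the wrapper object LazyIterableSketch and the helper javaIterator are hypothetical names.

```scala
import scala.jdk.CollectionConverters._

object LazyIterableSketch {
  class Test extends Iterable[String] {
    // Elements are produced on demand; the exception is only thrown if a
    // fourth element is actually requested.
    def iterator: Iterator[String] =
      Iterator("1", "2", "3") ++
        Iterator.continually[String](throw new RuntimeException("Not lazy!"))

    // The default toString walks the whole collection, which would trigger
    // the exception, so it is overridden here.
    override def toString: String = "Test(<lazy>)"
  }

  // asJava wraps the Scala iterator without copying or forcing it.
  def javaIterator(t: Test): java.util.Iterator[String] = t.iterator.asJava
}
```

Reading the first three elements through the Java iterator never reaches the throwing element, which is the laziness the question asks for.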