Scala future combining

Imagine the following variation of InputStream:
trait FutureInputStream {
  // read bytes asynchronously; an empty array means EOF
  def read(): Future[Array[Byte]]
}
The question is how to write a discardAll function for such a stream. Here is my solution:
import scala.concurrent.{ExecutionContext, Future}

// discards all input and returns a Future completed on EOF
def discardAll(is: FutureInputStream)(implicit ec: ExecutionContext): Future[Unit] = {
  val f = is.read()
  f.flatMap {
    case v if v.isEmpty =>
      Future.successful(())
    case _ =>
      discardAll(is)
  }
}
The obvious problem with this code is the recursion, which can't be turned into a loop by tail-call optimization: it will quickly run out of stack. Is there a more efficient solution?

There is nothing wrong with your solution. The recursive call to discardAll(is) happens asynchronously, inside the flatMap callback that runs on the ExecutionContext once the read completes. It doesn't happen in the same stack frame as the previous call, so there will be no stack overflow.
You can see this in action with a naive implementation:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

trait FutureInputStream {
  var count = 0
  def read(): Future[Array[Byte]] = {
    if (count < 100000) {
      count += 1
      Future(Array(1))
    } else
      Future(Array())
  }
}
If you were to feed discardAll an instance of the above, it would complete without any stack problems:
scala> val is = new FutureInputStream{}
is: FutureInputStream = $anon$1@255d542f
scala> discardAll(is).onComplete { println }
Success(())
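To check this claim outside the REPL, here is a small self-contained sketch, assuming Scala 2.12+ and the global ExecutionContext; DiscardAllDemo and CountingStream are hypothetical names used only for illustration. Each read completes on a pool thread, and the recursive call is made from the flatMap callback, so the stack depth should not grow with the number of reads.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

trait FutureInputStream {
  // read bytes asynchronously; an empty array means EOF
  def read(): Future[Array[Byte]]
}

object DiscardAllDemo {
  def discardAll(is: FutureInputStream)(implicit ec: ExecutionContext): Future[Unit] =
    is.read().flatMap { bytes =>
      if (bytes.isEmpty) Future.unit
      else discardAll(is) // runs in a callback on `ec`, not nested on the caller's stack
    }

  // Hypothetical in-memory stream: `chunks` one-byte reads, then EOF.
  final class CountingStream(chunks: Int)(implicit ec: ExecutionContext) extends FutureInputStream {
    @volatile var reads = 0 // reads happen one after another, so a volatile counter is enough
    def read(): Future[Array[Byte]] = Future {
      reads += 1
      if (reads <= chunks) Array[Byte](1) else Array.emptyByteArray
    }
  }
}
```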

Related

Get partial result on Scala time limited best effort computation

I'm trying to execute a function within a given time frame, but if the computation times out, I want to get a partial result instead of just an exception.
The attached code solves it.
The timedRun function is from the question "Computation with time limit".
Is there a better approach?
package ga

object Ga extends App {
  // this is the ugly part...
  var bestResult = "best result"
  try {
    val result = timedRun(150)(bestEffort())
  } catch {
    case e: Exception =>
      print("timed at = ")
  }
  println(bestResult)

  // dummy function
  def bestEffort(): String = {
    var res = 0
    for (i <- 0 until 100000) {
      res = i
      bestResult = s" $res"
    }
    " " + res
  }

  // This is the elegant part, from gruenewa on Stack Overflow
  @throws(classOf[java.util.concurrent.TimeoutException])
  def timedRun[F](timeout: Long)(f: => F): F = {
    import java.util.concurrent.{ Callable, FutureTask, TimeUnit }
    val task = new FutureTask(new Callable[F]() {
      def call() = f
    })
    new Thread(task).start()
    task.get(timeout, TimeUnit.MILLISECONDS)
  }
}
I would introduce a small intermediate class for more explicitly communicating the partial results between threads. That way you don't have to modify non-local state in any surprising ways. Then you can also just catch the exception within the timedRun method:
// @volatile so values written by the worker thread are visible to the caller
class Result[A](@volatile var result: A)

val result = timedRun(150)("best result")(bestEffort)
println(result)

// dummy function
def bestEffort(r: Result[String]): Unit = {
  var res = 0
  for (i <- 0 until 100000) {
    res = i
    r.result = s" $res"
  }
  r.result = " " + res
}

def timedRun[A](timeout: Long)(initial: A)(f: Result[A] => Any): A = {
  import java.util.concurrent.{ Callable, FutureTask, TimeUnit }
  val result = new Result(initial)
  val task = new FutureTask(new Callable[A]() {
    def call() = { f(result); result.result }
  })
  new Thread(task).start()
  try {
    task.get(timeout, TimeUnit.MILLISECONDS)
  } catch {
    // on timeout, return whatever partial result has been written so far
    case e: java.util.concurrent.TimeoutException => result.result
  }
}
It's admittedly a bit awkward since you don't usually have the "return value" of a function passed in as a parameter. But I think it's the least-radical modification of your code that makes sense. You could also consider modeling your computation as something that returns a Stream or Iterator of partial results, and then essentially do .takeWhile(notTimedOut).last. But how feasible that is really depends on the actual computation.
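The Iterator-of-partial-results idea can be sketched like this, assuming the computation can be rewritten to emit each improved result lazily (bestEffortSteps is a hypothetical rewrite of the dummy bestEffort, and bestWithin is an illustrative name). Everything runs on the calling thread, so no state is shared between threads; the trade-off is that the deadline is only checked between steps, so a single slow step can overrun it.

```scala
import scala.concurrent.duration._

object PartialResults {
  // Hypothetical rewrite of the dummy bestEffort: each element is the next, better result.
  def bestEffortSteps(): Iterator[String] =
    Iterator.from(0).take(100000).map(i => s" $i")

  // Pull results until the deadline passes and keep the last one seen;
  // `initial` is returned if no step finished in time.
  def bestWithin[A](timeout: FiniteDuration)(initial: A)(steps: Iterator[A]): A = {
    val deadline = timeout.fromNow
    steps.takeWhile(_ => deadline.hasTimeLeft()).foldLeft(initial)((_, next) => next)
  }
}
```

Using foldLeft with an initial value rather than .last avoids an exception when the deadline expires before the first step completes.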
First, you need one of the solutions for timing out a Future, which unfortunately are not built into Scala:
See: Scala Futures - built in timeout?
For example:
// TimeoutScheduler is not part of the standard library; see the linked question
def withTimeout[T](fut: Future[T])(implicit ec: ExecutionContext, after: Duration) = {
  val prom = Promise[T]()
  val timeout = TimeoutScheduler.scheduleTimeout(prom, after)
  val combinedFut = Future.firstCompletedOf(List(fut, prom.future))
  fut onComplete { case result => timeout.cancel() }
  combinedFut
}
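If you'd rather not depend on an external TimeoutScheduler, here is a minimal stand-in built on a JDK ScheduledExecutorService. The object name Timeouts and taking the timeout as an explicit FiniteDuration (instead of an implicit Duration) are assumptions of this sketch, not part of the original answer.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit, TimeoutException}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object Timeouts {
  // One daemon thread that only fires timeout events, so it never keeps the JVM alive.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "timeout-scheduler")
        t.setDaemon(true)
        t
      }
    })

  // Completes with `fut`'s result, or fails with a TimeoutException after `after`.
  def withTimeout[T](fut: Future[T], after: FiniteDuration)(implicit ec: ExecutionContext): Future[T] = {
    val prom = Promise[T]()
    val task = scheduler.schedule(new Runnable {
      def run(): Unit = prom.tryFailure(new TimeoutException(s"timed out after $after"))
    }, after.toNanos, TimeUnit.NANOSECONDS)
    fut.onComplete(_ => task.cancel(false)) // no need to fire once the real result is in
    Future.firstCompletedOf(List(fut, prom.future))
  }
}
```

With this helper, the call below would become Timeouts.withTimeout(expensiveFunction, 500.millis). Note that a timeout only stops waiting: the underlying Future keeps running.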
Then it is easy:
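import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// implicit timeout used by withTimeout; it must be shorter than the Await below
implicit val after: Duration = 500.millis

@volatile var bestResult = "best result"
val expensiveFunction = Future {
  var res = 0
  for (i <- 0 until 10000) {
    Thread.sleep(10)
    res = i
    bestResult = s" $res"
  }
  " " + res
}
val timeoutFuture = withTimeout(expensiveFunction) recover {
  case _: TimeoutException => bestResult
}
println(Await.result(timeoutFuture, 1.second))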

Iterate data source asynchronously in batch and stop while remote return no data in Scala

Let's say we have a fake data source that returns the data it holds in batches:
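import scala.concurrent.Future
import scala.util.Random

class DataSource(size: Int) {
  private var s = 0
  implicit val g = scala.concurrent.ExecutionContext.global
  def getData(): Future[List[Int]] = {
    s = s + 1
    val current = s // capture the batch number now, not when the Future body runs
    Future {
      Thread.sleep(Random.nextInt(current * 100))
      if (current <= size) {
        List.fill(100)(current)
      } else {
        List()
      }
    }
  }
}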
import scala.util.{Failure, Success}

object Test extends App {
  val source = new DataSource(100)
  implicit val g = scala.concurrent.ExecutionContext.global
  def process(v: List[Int]): Unit = {
    println(v)
  }
  def next(f: (List[Int]) => Unit): Unit = {
    val fut = source.getData()
    fut.onComplete {
      case Success(v) =>
        f(v)
        v match {
          case h :: t => next(f)
          case Nil    => // empty batch: no more data, stop fetching
        }
      case Failure(e) => e.printStackTrace()
    }
  }
  next(process)
  Thread.sleep(1000000000)
}
I have my own solution above, but the problem is that part of it is not pure. Ideally, I would like to wrap the Futures for all batches into one big Future that succeeds when the last batch returns an empty list. My situation is a little different from this post: the next() there is a synchronous call, while mine is also async.
Is it even possible to do what I want? The next batch should only be fetched once the previous one has resolved, and whether to fetch the next batch at all depends on the size of the batch returned.
What's the best way to walk through this type of data source? Are there any existing Scala frameworks that provide the feature I am looking for? Are Play's Iteratee, Enumerator and Enumeratee the right tools? If so, can anyone provide an example of how to use them to implement what I am looking for?
Edit----
With help from chunjef, I just tried it out, and it actually worked for me. However, I made a small change based on his answer:
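Source.fromIterator(() => Iterator.continually(source.getData()))
  // note: Future.filter fails with a NoSuchElementException when the predicate is false,
  // so the stream ends with a failure on the first empty batch rather than via takeWhile
  .mapAsync(1)(f => f.filter(_.size > 0))
  .via(Flow[List[Int]].takeWhile(_.nonEmpty))
  .runForeach(println)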
However, can someone give a comparison between Akka Streams and Play Iteratees? Is it worth also trying out Iteratees?
Code snip 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
Code snip 2: Assume getData depends on the output of another flow, and I would like to concatenate it with the flow below. However, it yields a "too many open files" error. I'm not sure what causes this error, since, if I understand correctly, mapAsync limits the parallelism to 1.
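Flow[Int].mapConcat[Future[List[Int]]](c => {
  // note: .to[...] builds a strict collection, so this tries to drain an infinite
  // iterator, calling getData without bound before mapAsync sees any element
  Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)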
The following is one way to achieve the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._

object StreamsExample extends App {
  implicit val system = ActorSystem("Sandbox")
  implicit val materializer = ActorMaterializer()

  val ds = new DataSource(100)

  Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
    .mapAsync(1)(identity)                                    // line 2
    .takeWhile(_.nonEmpty)                                    // line 3
    .runForeach(println)                                      // line 4
}

class DataSource(size: Int) {
  ...
}
A simplified line-by-line overview:
line 1: Creates a stream source that continually calls ds.getData if there is downstream demand.
line 2: mapAsync is a way to deal with stream elements that are Futures. In this case, the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 here because DataSource internally uses a mutable variable, and a parallelism level greater than one could produce unexpected results. identity is shorthand for x => x, which basically means that for each Future, we pass its result downstream without transforming it.
line 3: Essentially, ds.getData is called as long as the result of the Future is a non-empty List[Int]. If an empty List is encountered, processing is terminated.
line 4: runForeach here takes a function List[Int] => Unit and invokes that function for each stream element.
Ideally, I would like to wrap the Future for each batch into a big future, and the wrapper future success when last batch returned 0 size list?
I think you are looking for a Promise.
You would set up a Promise before you start the first iteration.
This gives you promise.future, a Future that you can then use to follow the completion of everything.
In your onComplete, you add a case _ => promise.success(()).
Something like:
import scala.concurrent.{Future, Promise}
import scala.util.{Failure, Success}

def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
  val promise = Promise[Unit]()
  def next(): Unit = source.getData().onComplete {
    case Success(v) =>
      f(v)
      v match {
        case h :: t => next()
        case _ => promise.success(())
      }
    case Failure(e) => promise.failure(e)
  }
  // get going
  next()
  // return the Future for everything
  promise.future
}
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
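Because `source` and `process` come from the question, the version above is not runnable on its own. Here is a self-contained sketch of the same Promise idea, assuming the data-fetching function is passed in as a parameter (an illustrative change that decouples it from DataSource). It also fails the promise if the processing function throws, so the returned Future always completes.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success}
import scala.util.control.NonFatal

object LoopUntilDone {
  // `getData` stands in for DataSource.getData; the next call starts only after the previous one completes.
  def loopUntilDone[A](getData: () => Future[List[A]])(f: List[A] => Unit)
                      (implicit ec: ExecutionContext): Future[Unit] = {
    val promise = Promise[Unit]()
    def next(): Unit = getData().onComplete {
      case Success(Nil) =>
        promise.success(()) // empty batch: everything is done
      case Success(batch) =>
        try { f(batch); next() } // process, then fetch the next batch
        catch { case NonFatal(e) => promise.tryFailure(e) }
      case Failure(e) =>
        promise.failure(e) // propagate the fetch error
    }
    next()
    promise.future
  }
}
```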
You are probably looking for a reactive streams library. My personal favorite (and the one I'm most familiar with) is Monix. This is how it would work with DataSource unchanged:
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global

object Test extends App {
  val source = new DataSource(100)
  val completed = // <- this is Future[Unit], completes when foreach is done
    Observable.repeat(Observable.fromFuture(source.getData()))
      .flatten // <- here it's Observable[List[Int]], which has collection-like methods
      .takeWhile(_.nonEmpty)
      .foreach(println)
  Await.result(completed, Duration.Inf)
}
I just figured out that I can achieve what I wanted with flatMapConcat. There is no point in starting another question since I already have the answer, so I'm putting my sample code here in case someone is looking for a similar answer.
This type of API is very common in integrations between traditional enterprise applications. The DataSource class mocks the API, while the Test App object demonstrates how client code can use Akka Streams to consume it.
In my small project the API was provided over SOAP, and I used scalaxb to turn the SOAP service into an asynchronous Scala API. With client calls like those in the Test object, we can consume the API with Akka Streams. Thanks all for the help.
import scala.collection.mutable
import scala.concurrent.Future
import scala.util.Random

class DataSource(size: Int) {
  private var transactionId: Long = 0
  private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
  private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
  implicit val g = scala.concurrent.ExecutionContext.global

  case class TransactionId(id: Long)
  case class ReadCursorId(id: Long)

  def startTransaction(): Future[TransactionId] = {
    Future {
      synchronized {
        transactionId += 1 // next transaction id
        val t = TransactionId(transactionId)
        transactionCursorMap.update(t, Set(ReadCursorId(0)))
        t
      }
    }
  }

  def createCursorId(t: TransactionId): ReadCursorId = {
    synchronized {
      val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
      val currentId = c.foldLeft(0L) { (acc, a) => acc.max(a.id) }
      val cId = ReadCursorId(currentId + 1)
      transactionCursorMap.update(t, c + cId)
      cursorIteratorMap.put(cId, createIterator())
      cId
    }
  }

  def createIterator(): Iterator[List[Int]] = {
    (for { i <- 1 to 100 } yield List.fill(100)(i)).iterator
  }

  def startRead(t: TransactionId): Future[ReadCursorId] = {
    Future {
      createCursorId(t)
    }
  }

  def getData(cursorId: ReadCursorId): Future[List[Int]] = {
    Future {
      Thread.sleep(Random.nextInt(100))
      // lock around the iterator itself, since several getData calls can run concurrently
      synchronized {
        cursorIteratorMap.get(cursorId) match {
          case Some(i) if i.hasNext => i.next()
          case _ => List() // unknown or exhausted cursor: signal end of data
        }
      }
    }
  }
}
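import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Source}

object Test extends App {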
val source = new DataSource(10)
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
implicit val g = scala.concurrent.ExecutionContext.global
//
// def process(v: List[Int]): Unit = {
// println(v)
// }
//
// def next(f: (List[Int]) => Unit): Unit = {
// val fut = source.getData()
// fut.onComplete {
// case Success(v) => {
// f(v)
// v match {
//
// case h :: t => next(f)
//
// }
// }
//
// }
//
// }
//
// next(process)
//
// Thread.sleep(1000000000)
val s = Source.fromFuture(source.startTransaction())
  .map { e =>
    source.startRead(e)
  }
  .mapAsync(1)(identity)
  .flatMapConcat(e =>
    Source.fromIterator(() => Iterator.continually(source.getData(e))))
  .mapAsync(5)(identity)
  .via(Flow[List[Int]].takeWhile(_.nonEmpty))
  .runForeach(println)
/*
val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runFold(List[List[Int]]()) { (acc, r) =>
// println("=======" + acc + r)
r :: acc
}
done.onSuccess {
case e => {
e.foreach(println)
}
}
done.onComplete(_ => system.terminate())
*/
}

Running two scala functions in parallel, returning the latest value after 5 minutes

I have two Scala functions that are expensive to run. Each one looks like the code below: it keeps improving the value of a variable. I'd like to run them simultaneously and, after 5 minutes (or some other time), terminate both functions and take their latest values at that point.
def func1(n: Int): Double = {
  var a = 0.0D
  while (not terminated) {
    // improve value of 'a' with algorithm 1
  }
}

def func2(n: Int): Double = {
  var a = 0.0D
  while (not terminated) {
    // improve value of 'a' with algorithm 2
  }
}
How should I structure my code to do this, and what is the best practice here? I was thinking about running them in two different threads with a timeout and returning their latest values when it expires, but it seems there may be other ways to do it. I am new to Scala, so any insight would be tremendously helpful.
It is not hard. Here is one way of doing it:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object Main {
  @volatile var terminated = false

  def func1(n: Int): Double = {
    var a = 0.0D
    while (!terminated) {
      a = 0.0001 + a * 0.99999 // some useless formula 1
    }
    a
  }

  def func2(n: Int): Double = {
    var a = 0.0D
    while (!terminated) {
      a += 0.0001 // much simpler formula 2, just for testing
    }
    a
  }

  def main(args: Array[String]): Unit = {
    val f1 = Future { func1(1) } // work starts here
    val f2 = Future { func2(2) } // and here
    // aggregate results into one common future
    val aggregatedFuture = for {
      f1Result <- f1
      f2Result <- f2
    } yield (f1Result, f2Result)
    Thread.sleep(500) // wait here for some calculations, in ms
    terminated = true // this is where we actually command them to stop
    // each loop needs a moment to notice the flag, so wait briefly for the results
    val res = Await.result(aggregatedFuture, 50.millis)
    // just a printout
    println("results:" + res)
  }
}
Of course, you may want to rework your while loops into more manageable, chainable calculations.
Output: results:(9.999999999933387,31206.34691883926)
I am not 100% sure this is what you want, but here is one approach (it runs for 0.5 seconds instead of 5 minutes, but you can change that):
// On Scala 2.13+, .par needs the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
object s {
  def main(args: Array[String]): Unit = println(run())

  def run(): (Int, Int) = {
    val (s, numNanoSec, seedVal) = (System.nanoTime, 500000000L, 0)
    Seq(f1 _, f2 _).par.map(f => {
      var (i, id) = f(seedVal)
      while (System.nanoTime - s < numNanoSec) {
        i = f(i)._1
      }
      (i, id)
    }).seq.maxBy(_._1)
  }

  def f1(a: Int): (Int, Int) = (a + 1, 1)
  def f2(a: Int): (Int, Int) = (a + 2, 2)
}
Output:
me@ideapad:~/junk> scala s.scala
(34722678,2)
me@ideapad:~/junk> scala s.scala
(30065688,2)
me@ideapad:~/junk> scala s.scala
(34650716,2)
Of course this all assumes you have at least two threads available to distribute tasks to.
You can use a Future with Await.result to do that:
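import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def fun2(): Double = {
  @volatile var a = 0.0D // volatile so the caller sees the worker's latest write
  val f = Future {
    // improve a with algorithm 2
    a
  }
  try {
    Await.result(f, 5.minutes)
  } catch {
    case e: TimeoutException => a
  }
}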
Use Await.result to wait for the algorithm with a timeout; when the timeout is hit, return the current value of a directly. Note that the Future keeps running in the background after the timeout.
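One caveat with this approach is that the computation never learns it should stop, and a plain captured local var has no cross-thread visibility guarantee. Here is a minimal sketch that publishes each improved value through an AtomicReference and signals the loop to exit, assuming the algorithm can be expressed as a step function `improve: Double => Double` (a hypothetical shape, not the asker's code; LatestValue and runFor are illustrative names).

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}
import scala.concurrent.{Await, ExecutionContext, Future, TimeoutException}
import scala.concurrent.duration._

object LatestValue {
  // Runs `improve` in a loop on a background Future, publishing every new value,
  // and returns the last published value once `timeout` expires.
  def runFor(timeout: FiniteDuration, initial: Double)(improve: Double => Double)
            (implicit ec: ExecutionContext): Double = {
    val latest = new AtomicReference[Double](initial) // thread-safe hand-off of progress
    val stop = new AtomicBoolean(false)                // cooperative cancellation flag
    val work = Future {
      var a = initial
      while (!stop.get()) {
        a = improve(a)
        latest.set(a)
      }
      a
    }
    try Await.result(work, timeout)
    catch { case _: TimeoutException => latest.get() }
    finally stop.set(true) // tell the loop to exit so it doesn't spin forever
  }
}
```

The deadline is enforced by the caller, and the worker only stops the next time it checks the flag, so each improve step should be reasonably short.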

Scala Actors + Console.withOut possible bug

I found some strange behavior when Console.withOut is used within an actor. For this code:
case object I

val out = new PipedOutputStream
val pipe = new PipedInputStream(out)
def read: String = ??? // read from `pipe` stream (see the update below)

class A extends Actor {
  var b: Actor = _
  Console.withOut(out) {
    b = actor { loop { self react {
      case I => println("II")
    }}}
  }
  def act = {
    loop { self react {
      case I =>
        println("I")
        b ! I
    }}
  }
}

def main(args: Array[String]): Unit = {
  val a = new A
  a.start
  a ! I
  Thread sleep 100
  println("!!\n" + read + "!!")
}
I got the following output:
!!
I
II
!!
Any idea why the output from actor A's act method is also redirected? Thank you for your answers.
UPDATE:
Here is read function:
@tailrec
def read(instream: InputStream, acc: List[Char] = Nil): String =
  if (instream.available > 0) read(instream, acc :+ instream.read.toChar) else acc mkString ""

def read: String = read(pipe)
It seems to me, on the contrary, that neither actor has its output redirected, since withOut will have finished executing long before println("II") is called. Since this is all based on DynamicVariable, however, I'm not willing to bet on it. :-) The absence of working code precludes any testing as well.
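Since this hinges on DynamicVariable, one detail may explain the observed output: in the standard library, DynamicVariable (which Console uses for its current output stream) is backed by an InheritableThreadLocal, and a thread copies inherited values when it is constructed. So a thread created while withOut is active keeps the redirected stream even after withOut returns. The experiment below shows this with a plain Thread; whether the actor scheduler creates its worker threads at that moment (for example when `actor { ... }` starts inside withOut) is an assumption, not something verified here. WithOutDemo is an illustrative name.

```scala
import java.io.ByteArrayOutputStream

object WithOutDemo {
  // Builds a thread while Console.withOut is active, starts it only after withOut
  // has returned, and reports what ended up in the redirected buffer.
  def childThreadOutput(): String = {
    val buffer = new ByteArrayOutputStream()
    val child = Console.withOut(buffer) {
      // the inheritable thread-local value is copied here, when the Thread is constructed
      new Thread(new Runnable { def run(): Unit = Console.println("from child") })
    }
    child.start() // withOut has already restored the original stream on this thread
    child.join()
    buffer.toString
  }
}
```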

Implementing yield (yield return) using Scala continuations

How might one implement C# yield return using Scala continuations? I'd like to be able to write Scala Iterators in the same style. There is an attempt in the comments on this Scala news post, but it doesn't work (I tried it with the Scala 2.8.0 beta). Answers in a related question suggest this is possible, but although I've been playing with delimited continuations for a while, I can't quite wrap my head around how to do this.
Before we introduce continuations we need to build some infrastructure.
Below is a trampoline that operates on Iteration objects.
An iteration is a computation that can either Yield a new value or it can be Done.
sealed trait Iteration[+R]
case class Yield[+R](result: R, next: () => Iteration[R]) extends Iteration[R]
case object Done extends Iteration[Nothing]

def trampoline[R](body: => Iteration[R]): Iterator[R] = {
  def loop(thunk: () => Iteration[R]): Stream[R] = {
    thunk() match {
      case Yield(result, next) => Stream.cons(result, loop(next))
      case Done => Stream.empty
    }
  }
  loop(() => body).iterator
}
The trampoline uses an internal loop that turns the sequence of Iteration objects into a Stream.
We then get an Iterator by calling iterator on the resulting stream object.
By using a Stream our evaluation is lazy; we don't evaluate our next iteration until it is needed.
The trampoline can be used to build an iterator directly.
val itr1 = trampoline {
  Yield(1, () => Yield(2, () => Yield(3, () => Done)))
}
for (i <- itr1) { println(i) }
That's pretty horrible to write, so let's use delimited continuations to create our Iteration objects automatically.
We use the shift and reset operators to break the computation up into Iterations,
then use trampoline to turn the Iterations into an Iterator.
// with Scala 2.8.0 final and later continuations plugin releases, the package is
// scala.util.continuations (import scala.util.continuations._ brings in shift and reset)
import scala.continuations._
import scala.continuations.ControlContext.{shift, reset}

def iterator[R](body: => Unit @cps[Iteration[R], Iteration[R]]): Iterator[R] =
  trampoline {
    reset[Iteration[R], Iteration[R]] { body; Done }
  }

def yld[R](result: R): Unit @cps[Iteration[R], Iteration[R]] =
  shift((k: Unit => Iteration[R]) => Yield(result, () => k(())))
Now we can rewrite our example.
val itr2 = iterator[Int] {
  yld(1)
  yld(2)
  yld(3)
}
for (i <- itr2) { println(i) }
Much better!
Now here's an example from the C# reference page for yield that shows some more advanced usage.
The types can be a bit tricky to get used to, but it all works.
def power(number: Int, exponent: Int): Iterator[Int] = iterator[Int] {
  def loop(result: Int, counter: Int): Unit @cps[Iteration[Int], Iteration[Int]] = {
    if (counter < exponent) {
      yld(result)
      loop(result * number, counter + 1)
    }
  }
  loop(number, 0)
}
for (i <- power(2, 8)) { println(i) }
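The continuations plugin was not carried forward to Scala 2.13, so for comparison here is a sketch of the same power example written directly against the Iteration/Yield/Done types, with a hand-written lazy Iterator substituted for the Stream-based trampoline (Stream is deprecated in 2.13). Each Yield's thunk plays roughly the role of the continuation that shift captures above; PowerWithoutPlugin is an illustrative name.

```scala
object PowerWithoutPlugin {
  sealed trait Iteration[+R]
  final case class Yield[+R](result: R, next: () => Iteration[R]) extends Iteration[R]
  case object Done extends Iteration[Nothing]

  // Lazy trampoline: a step is only evaluated when hasNext or next needs it.
  def trampoline[R](body: => Iteration[R]): Iterator[R] = new Iterator[R] {
    private var pending: () => Iteration[R] = () => body
    private var forced: Option[Iteration[R]] = None

    private def current(): Iteration[R] = forced match {
      case Some(step) => step
      case None =>
        val step = pending()
        forced = Some(step)
        step
    }

    def hasNext: Boolean = current() != Done

    def next(): R = current() match {
      case Yield(result, rest) =>
        pending = rest // the remaining computation, not yet run
        forced = None
        result
      case Done => throw new NoSuchElementException("iteration is finished")
    }
  }

  // Same shape as the continuation-based power: yield `result`, then loop.
  def power(number: Int, exponent: Int): Iterator[Int] = {
    def loop(result: Int, counter: Int): Iteration[Int] =
      if (counter < exponent) Yield(result, () => loop(result * number, counter + 1))
      else Done
    trampoline(loop(number, 0))
  }
}
```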
I managed to discover a way to do this, after a few more hours of playing around. I thought this was simpler to wrap my head around than all the other solutions I've seen thus far, though I did afterward very much appreciate Rich's and Miles' solutions.
def loopWhile(cond: => Boolean)(body: => (Unit @suspendable)): Unit @suspendable = {
  if (cond) {
    body
    loopWhile(cond)(body)
  }
}

class Gen {
  var prodCont: Unit => Unit = { x: Unit => prod }
  var nextVal = 0
  def yld(i: Int) = shift { k: (Unit => Unit) => nextVal = i; prodCont = k }
  def next = { prodCont(); nextVal }
  def prod = {
    reset {
      // following is generator logic; can be refactored out generically
      var i = 0
      i += 1
      yld(i)
      i += 1
      yld(i)
      // scala continuations plugin can't handle while loops, so need own construct
      loopWhile(true) {
        i += 1
        yld(i)
      }
    }
  }
}
val it = new Gen
println(it.next)
println(it.next)
println(it.next)