Functional way of interrupting lazy iteration depedning on timeout and comparisson between previous and next, while, LazyList vs Stream - scala

Background
I have the following scenario. I want to execute the method of a class from an external library, repeatedly, and I want to do so until a certain timeout condition and result condition (compared to the previous result) is met. Furthermore I want to collect the return values, even on the "failed" run (the run with the "failing" result condition that should interrupt further execution).
Thus far I have accomplished this with initializing an empty var result: Result, a var stop: Boolean and using a while loop that runs while the conditions are true and modifying the outer state. I would like to get rid of this and use a functional approach.
Some context. Each run is expected to run from 0 to 60 minutes and the total time of iteration is capped at 60 minutes. Theoretically, there's no bound to how many times it executes in this period but in practice, it's generally 2-60 times.
The problem is, the runs take a long time so I need to stop the execution. My idea is to use some kind of lazy Iterator or Stream coupled with scanLeft and Option.
Code
Boiler plate
This code isn't particularly relevant but used in my approach samples and provide identical but somewhat random pseudo runtime results.
import scala.collection.mutable.ListBuffer
import scala.util.Random
val r = Random
r.setSeed(1)
val sleepingTimes: Seq[Int] = (1 to 601)
.map(x => Math.pow(2, x).toInt * r.nextInt(100))
.toList
.filter(_ > 0)
.sorted
val randomRes = r.shuffle((0 to 600).map(x => r.nextInt(10)).toList)
case class Result(val a: Int, val slept: Int)
class Lib() {
def run(i: Int) = {
println(s"running ${i}")
Thread.sleep(sleepingTimes(i))
Result(randomRes(i), sleepingTimes(i))
}
}
case class Baz(i: Int, result: Result)
val lib = new Lib()
val timeout = 10 * 1000
While approach
val iteratorStart = System.currentTimeMillis()
val iterator = for {
i <- (0 to 600).iterator
if System.currentTimeMillis() < iteratorStart + timeout
f = Baz(i, lib.run(i))
} yield f
val iteratorBuffer = ListBuffer[Baz]()
if (iterator.hasNext) iteratorBuffer.append(iterator.next())
var run = true
while (run && iterator.hasNext) {
val next = iterator.next()
run = iteratorBuffer.last.result.a < next.result.a
iteratorBuffer.append(next)
}
Stream approach (Scala.2.12)
Full example
val streamStart = System.currentTimeMillis()
val stream = for {
i <- (0 to 600).toStream
if System.currentTimeMillis() < streamStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = stream.headOption
val tail = if (stream.nonEmpty) stream.tail else stream
val streamVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
LazyList approach (Scala 2.13)
Full example
val lazyListStart = System.currentTimeMillis()
val lazyList = for {
i <- (0 to 600).to(LazyList)
if System.currentTimeMillis() < lazyListStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = lazyList.headOption
val tail = if (lazyList.nonEmpty) lazyList.tail else lazyList
val lazyListVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
Result
Both approaches appear to yield the correct end result:
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)))
and they interrupt execution as desired.
Edit: The desired outcome is to not execute the next iteration but still return the result of the iteration that caused the interruption. Thus the desired result is
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)), Baz(2,Result(2,256))
and lib.run(i) should only run 3 times.
This is achieved by the while approach, as well as the LazyList approach but not the Stream approach which executes lib.run 4 times (Bad!).
Question
Is there another stateless approach, which is hopefully more elegant?
Edit
I realized my examples were faulty and not returning the "failing" result, which it should, and that they kept executing beyond the stop condition. I rewrote the code and examples but I believe the spirit of the question is the same.

I would use something higher level, like fs2.
(or any other high-level streaming library, like: monix observables, akka streams or zio zstreams)
def runUntilOrTimeout[F[_]: Concurrent: Timer, A](work: F[A], timeout: FiniteDuration)
(stop: (A, A) => Boolean): Stream[F, A] = {
val interrupt =
Stream.sleep_(timeout)
val run =
Stream
.repeatEval(work)
.zipWithPrevious
.takeThrough {
case (Some(p), c) if stop(p, c) => false
case _ => true
} map {
case (_, c) => c
}
run mergeHaltBoth interrupt
}
You can see it working here.

Related

Returning a non-unit value from a scala while loop

private def canProceed: Boolean = {
val startTime = System.currentTimeMillis
val endTime = startTime + (5 * 1000)
while (System.currentTimeMillis < endTime) {
if (isSafe) { // method where my current implementation is just true or false for testing
true
} else {
println("Not safe. Trying again")
}
}
false
}
This will just keep iterating through the while loop since the true from the conditional doesn't actually do anything as a scala while loop always returns a Unit, so the final result will always be false. Is there some idiomatic way to do this without leveraging var or return?
Well technically speaking you could write your own tail-recursive function like this following.
def attemptWhile(cond: => Boolean)(check: => Boolean): Boolean = {
#annotation.tailrec
def loop(): Boolean = {
if (cond) check || loop()
else false
}
loop()
}
Which then you could use like:
private def canProceed: Boolean = {
val startTime = System.currentTimeMillis
val endTime = startTime + (5 * 1000)
attemptWhile(System.currentTimeMillis < endTime)(isSafe)
}
However, this is still depending on mutable state so not really functional.
At that point, you may wonder if it is worth the effort adding that attemptWhile or just use return.
BTW, if you want a fully functional solution this seems something that could be solved using something like fs2 but maybe that is just too off-topic for you.
You are basically asking whether or not you can do anything that's not a side effect inside a while loop and observe it on the outside.
Just like you cannot do anything worthwhile without side-effects in a while loop, you cannot observe anything on the outside of a while loop without side effects. The var and return keywords are directly related to side effects.
(Of course you can observe the heating up of your CPU or the slowdown of your program, but this is usually disregarded as it doesn't pertain to computation directly)
edit
For yield is there to address exactly this "problem".
scala> for {
| x <- List(1,2,3)
| y <- List(4,5,6)
| } yield (x,y)
val res0: List[(Int, Int)] = List((1,4), (1,5), (1,6), (2,4), (2,5), (2,6), (3,4), (3,5), (3,6))
You may want to look into effect types (Cats effect, Zio). They deal with exactly the case of turning effects into values. In your case, there are 2 effects:
System.currentTimeMillis this value is mutable and changes over time
isSafe has to be mutable and changed from the outside.
edit2
Here's an example of how you can iterate a side effect with Cats Effect
import cats.effect.{ExitCode, IO, IOApp}
import cats.implicits.catsSyntaxMonad
import scala.concurrent.duration.DurationInt
object CheckTime extends IOApp {
val checkTime: IO[Long] = IO.delay { System.currentTimeMillis() }
val waitForSomeTime: IO[Long] = for {
start <- checkTime
_ <- IO.delay { println(s"Start time: $start") }
stopTime <- checkTime.iterateWhile(currentTime => currentTime - start < 10.seconds.toMillis)
_ <- IO.delay { println(s"Ended at $stopTime") }
} yield stopTime
override def run(args: List[String]): IO[ExitCode] = waitForSomeTime.as(ExitCode.Success)
}
Output:
Start time: 1608103950673
Ended at 1608103960673

ParSeq.fill running sequentially?

I am trying to initialize an array in Scala, using parallelization. However, when using ParSeq.fill method, the performance doesn't seem to be better any better than sequential initialization (Seq.fill). If I do the same task, but initializing the collection with map, then it is much faster.
To show my point, I set up the following example:
import scala.collection.parallel.immutable.ParSeq
import scala.util.Random
object Timer {
def apply[A](f: => A): (A, Long) = {
val s = System.nanoTime
val ret = f
(ret, System.nanoTime - s)
}
}
object ParallelBenchmark extends App {
def randomIsPrime: Boolean = {
val n = Random.nextInt(1000000)
(2 until n).exists(i => n % i == 0)
}
val seqSize = 100000
val (_, timeSeq) = Timer { Seq.fill(seqSize)(randomIsPrime) }
println(f"Time Seq:\t\t $timeSeq")
val (_, timeParFill) = Timer { ParSeq.fill(seqSize)(randomIsPrime) }
println(f"Time Par Fill:\t $timeParFill")
val (_, timeParMap) = Timer { (0 until seqSize).par.map(_ => randomIsPrime) }
println(f"Time Par map:\t $timeParMap")
}
And the result is:
Time Seq: 32389215709
Time Par Fill: 32730035599
Time Par map: 17270448112
Clearly showing that the fill method is not running in parallel.
The parallel collections library in Scala can only parallelize existing collections, fill hasn't been implemented yet (and may never be). Your method of using a Range to generate a cheap placeholder collection is probably your best option if you want to see a speed boost.
Here's the underlying method being called by ParSeq.fill, obviously not parallel.

Scala - Execute arbitrary number of Futures sequentially but dependently [duplicate]

This question already has answers here:
Is there sequential Future.find?
(3 answers)
Closed 8 years ago.
I'm trying to figure out the neatest way to execute a series of Futures in sequence, where one Future's execution depends on the previous. I'm trying to do this for an arbitrary number of futures.
User case:
I have retrieved a number of Ids from my database.
I now need to retrieve some related data on a web service.
I want to stop once I've found a valid result.
I only care about the result that succeeded.
Executing these all in parallel and then parsing the collection of results returned isn't an option. I have to do one request at a time, and only execute the next request if the previous request returned no results.
The current solution is along these lines. Using foldLeft to execute the requests and then only evaluating the next future if the previous future meets some condition.
def dblFuture(i: Int) = { i * 2 }
val list = List(1,2,3,4,5)
val future = list.foldLeft(Future(0)) {
(previousFuture, next) => {
for {
previousResult <- previousFuture
nextFuture <- { if (previousResult <= 4) dblFuture(next) else previousFuture }
} yield (nextFuture)
}
}
The big downside of this is a) I keep processing all items even once i've got a result i'm happy with and b) once I've found the result I'm after, I keep evaluating the predicate. In this case it's a simple if, but in reality it could be more complicated.
I feel like I'm missing a far more elegant solution to this.
Looking at your example, it seems as though the previous result has no bearing on subsequent results, and instead what only matters is that the previous result satisfies some condition to prevent the next result from being computed. If that is the case, here is a recursive solution using filter and recoverWith.
def untilFirstSuccess[A, B](f: A => Future[B])(condition: B => Boolean)(list: List[A]): Future[B] = {
list match {
case head :: tail => f(head).filter(condition).recoverWith { case _: Throwable => untilFirstSuccess(f)(condition)(tail) }
case Nil => Future.failed(new Exception("All failed.."))
}
}
filter will only be called when the Future has completed, and recoverWith will only be called if the Future has failed.
def dblFuture(i: Int): Future[Int] = Future {
println("Executing.. " + i)
i * 2
}
val list = List(1, 2, 3, 4, 5)
scala> untilFirstSuccess(dblFuture)(_ > 6)(list)
Executing.. 1
Executing.. 2
Executing.. 3
Executing.. 4
res1: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise#514f4e98
scala> res1.value
res2: Option[scala.util.Try[Int]] = Some(Success(8))
Neatest way, and "true functional programming" is scalaz-stream ;) However you'll need to switch to scalaz.concurrent.Task from scala Future for abstraction for "future result". It's a bit different. Task is pure, and Future is "running computation", but they have a lot in common.
import scalaz.concurrent.Task
import scalaz.stream.Process
def dblTask(i: Int) = Task {
println(s"Executing task $i")
i * 2
}
val list = Seq(1,2,3,4,5)
val p: Process[Task, Int] = Process.emitAll(list)
val result: Task[Option[Int]] =
p.flatMap(i => Process.eval(dblTask(i))).takeWhile(_ < 10).runLast
println(s"result = ${result.run}")
Result:
Executing task 1
Executing task 2
Executing task 3
Executing task 4
Executing task 5
result = Some(8)
if your computation is already scala Future, you can transform it to Task
implicit class Transformer[+T](fut: => SFuture[T]) {
def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
import scala.util.{Failure, Success}
import scalaz.syntax.either._
Task.async {
register =>
fut.onComplete {
case Success(v) => register(v.right)
case Failure(ex) => register(ex.left)
}
}
}
}

Efficient way to fold list in scala, while avoiding allocations and vars

I have a bunch of items in a list, and I need to analyze the content to find out how many of them are "complete". I started out with partition, but then realized that I didn't need to two lists back, so I switched to a fold:
val counts = groupRows.foldLeft( (0,0) )( (pair, row) =>
if(row.time == 0) (pair._1+1,pair._2)
else (pair._1, pair._2+1)
)
but I have a lot of rows to go through for a lot of parallel users, and it is causing a lot of GC activity (assumption on my part...the GC could be from other things, but I suspect this since I understand it will allocate a new tuple on every item folded).
for the time being, I've rewritten this as
var complete = 0
var incomplete = 0
list.foreach(row => if(row.time != 0) complete += 1 else incomplete += 1)
which fixes the GC, but introduces vars.
I was wondering if there was a way of doing this without using vars while also not abusing the GC?
EDIT:
Hard call on the answers I've received. A var implementation seems to be considerably faster on large lists (like by 40%) than even a tail-recursive optimized version that is more functional but should be equivalent.
The first answer from dhg seems to be on-par with the performance of the tail-recursive one, implying that the size pass is super-efficient...in fact, when optimized it runs very slightly faster than the tail-recursive one on my hardware.
The cleanest two-pass solution is probably to just use the built-in count method:
val complete = groupRows.count(_.time == 0)
val counts = (complete, groupRows.size - complete)
But you can do it in one pass if you use partition on an iterator:
val (complete, incomplete) = groupRows.iterator.partition(_.time == 0)
val counts = (complete.size, incomplete.size)
This works because the new returned iterators are linked behind the scenes and calling next on one will cause it to move the original iterator forward until it finds a matching element, but it remembers the non-matching elements for the other iterator so that they don't need to be recomputed.
Example of the one-pass solution:
scala> val groupRows = List(Row(0), Row(1), Row(1), Row(0), Row(0)).view.map{x => println(x); x}
scala> val (complete, incomplete) = groupRows.iterator.partition(_.time == 0)
Row(0)
Row(1)
complete: Iterator[Row] = non-empty iterator
incomplete: Iterator[Row] = non-empty iterator
scala> val counts = (complete.size, incomplete.size)
Row(1)
Row(0)
Row(0)
counts: (Int, Int) = (3,2)
I see you've already accepted an answer, but you rightly mention that that solution will traverse the list twice. The way to do it efficiently is with recursion.
def counts(xs: List[...], complete: Int = 0, incomplete: Int = 0): (Int,Int) =
xs match {
case Nil => (complete, incomplete)
case row :: tail =>
if (row.time == 0) counts(tail, complete + 1, incomplete)
else counts(tail, complete, incomplete + 1)
}
This is effectively just a customized fold, except we use 2 accumulators which are just Ints (primitives) instead of tuples (reference types). It should also be just as efficient a while-loop with vars - in fact, the bytecode should be identical.
Maybe it's just me, but I prefer using the various specialized folds (.size, .exists, .sum, .product) if they are available. I find it clearer and less error-prone than the heavy-duty power of general folds.
val complete = groupRows.view.filter(_.time==0).size
(complete, groupRows.length - complete)
How about this one? No import tax.
import scala.collection.generic.CanBuildFrom
import scala.collection.Traversable
import scala.collection.mutable.Builder
case class Count(n: Int, total: Int) {
def not = total - n
}
object Count {
implicit def cbf[A]: CanBuildFrom[Traversable[A], Boolean, Count] = new CanBuildFrom[Traversable[A], Boolean, Count] {
def apply(): Builder[Boolean, Count] = new Counter
def apply(from: Traversable[A]): Builder[Boolean, Count] = apply()
}
}
class Counter extends Builder[Boolean, Count] {
var n = 0
var ttl = 0
override def +=(b: Boolean) = { if (b) n += 1; ttl += 1; this }
override def clear() { n = 0 ; ttl = 0 }
override def result = Count(n, ttl)
}
object Counting extends App {
val vs = List(4, 17, 12, 21, 9, 24, 11)
val res: Count = vs map (_ % 2 == 0)
Console println s"${vs} have ${res.n} evens out of ${res.total}; ${res.not} were odd."
val res2: Count = vs collect { case i if i % 2 == 0 => i > 10 }
Console println s"${vs} have ${res2.n} evens over 10 out of ${res2.total}; ${res2.not} were smaller."
}
OK, inspired by the answers above, but really wanting to only pass over the list once and avoid GC, I decided that, in the face of a lack of direct API support, I would add this to my central library code:
class RichList[T](private val theList: List[T]) {
def partitionCount(f: T => Boolean): (Int, Int) = {
var matched = 0
var unmatched = 0
theList.foreach(r => { if (f(r)) matched += 1 else unmatched += 1 })
(matched, unmatched)
}
}
object RichList {
implicit def apply[T](list: List[T]): RichList[T] = new RichList(list)
}
Then in my application code (if I've imported the implicit), I can write var-free expressions:
val (complete, incomplete) = groupRows.partitionCount(_.time != 0)
and get what I want: an optimized GC-friendly routine that prevents me from polluting the rest of the program with vars.
However, I then saw Luigi's benchmark, and updated it to:
Use a longer list so that multiple passes on the list were more obvious in the numbers
Use a boolean function in all cases, so that we are comparing things fairly
http://pastebin.com/2XmrnrrB
The var implementation is definitely considerably faster, even though Luigi's routine should be identical (as one would expect with optimized tail recursion). Surprisingly, dhg's dual-pass original is just as fast (slightly faster if compiler optimization is on) as the tail-recursive one. I do not understand why.
It is slightly tidier to use a mutable accumulator pattern, like so, especially if you can re-use your accumulator:
case class Accum(var complete = 0, var incomplete = 0) {
def inc(compl: Boolean): this.type = {
if (compl) complete += 1 else incomplete += 1
this
}
}
val counts = groupRows.foldLeft( Accum() ){ (a, row) => a.inc( row.time == 0 ) }
If you really want to, you can hide your vars as private; if not, you still are a lot more self-contained than the pattern with vars.
You could just calculate it using the difference like so:
def counts(groupRows: List[Row]) = {
val complete = groupRows.foldLeft(0){ (pair, row) =>
if(row.time == 0) pair + 1 else pair
}
(complete, groupRows.length - complete)
}

Quicksort using Future ends up in a deadlock

I have written a quicksort (method quicksortF()) that uses a Scala's Future to let the recursive sorting of the partitions be done concurrently. I also have implemented a regular quicksort (method quicksort()). Unfortunately, the Future version ends up in a deadlock (apparently blocks forever) when the list to sort is greater than about 1000 elements (900 would work). The source is shown below.
I am relatively new to Actors and Futures. What is goind wrong here?
Thanks!
import util.Random
import actors.Futures._
/**
* Quicksort with and without using the Future pattern.
* #author Markus Gumbel
*/
object FutureQuickSortProblem {
def main(args: Array[String]) {
val n = 1000 // works for n = 900 but not for 1000 anymore.
// Create a random list of size n:
val list = (1 to n).map(c => Random.nextInt(n * 10)).toList
println(list)
// Sort it with regular quicksort:
val sortedList = quicksort(list)
println(sortedList)
// ... and with quicksort using Future (which hangs):
val sortedListF = quicksortF(list)
println(sortedListF)
}
// This one works.
def quicksort(list: List[Int]): List[Int] = {
if (list.length <= 1) list
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
val sortedLeftList = quicksort(leftList)
val sortedRightList = quicksort(rightList)
sortedLeftList ::: middleList ::: sortedRightList
}
}
// Almost the same as quicksort() except that Future is used.
// However, this one hangs.
def quicksortF(list: List[Int]): List[Int] = {
if (list.length <= 1) list
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
// Same as quicksort() but here we are using a Future
// to sort the left and right partitions independently:
val sortedLeftListFuture = future {
quicksortF(leftList)
}
val sortedRightListFuture = future {
quicksortF(rightList)
}
sortedLeftListFuture() ::: middleList ::: sortedRightListFuture()
}
}
}
class FutureQuickSortProblem // If not defined, Intellij won't find the main method.?!
Disclaimer: I've never personally used the (pre-2.10) standard library's actors or futures in any serious way, and there are a number of things I don't like (or at least don't understand) about the API there, compared for example to the implementations in Scalaz or Akka or Play 2.0.
But I can tell you that the usual approach in a case like this is to combine your futures monadically instead of claiming them immediately and combining the results. For example, you could write something like this (note the new return type):
import scala.actors.Futures._
def quicksortF(list: List[Int]): Responder[List[Int]] = {
if (list.length <= 1) future(list)
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
for {
left <- quicksortF(leftList)
right <- quicksortF(rightList)
} yield left ::: middleList ::: right
}
}
Like your vanilla implementation, this won't necessarily be very efficient, and it will also blow the stack pretty easily, but it shouldn't run out of threads.
As a side note, why does flatMap on a Future return a Responder instead of a Future? I don't know, and neither do some other folks. For reasons like this I'd suggest skipping the now-deprecated pre-2.10 standard library actor-based concurrency stuff altogether.
As I understand, calling apply on the Future (as you do when concatenating the results of the recursive calls) will block until the result is retrieved.