I am a Java developer learning Scala at the moment. It is generally accepted that Java is more verbose than Scala. I just need to call two or more methods concurrently and then combine the results. The official Scala documentation at docs.scala-lang.org/overviews/core/futures.html suggests using a for-comprehension for that, so I used that out-of-the-box solution directly. Then I thought about how I would do it with CompletableFuture and was surprised that it produced more concise and faster code than Scala's Future.
Let's consider a basic concurrent case: summing up the values in an array. For simplicity, let's split the array into 2 parts (hence there will be 2 worker threads). Java's sumConcurrently takes only 4 LOC, while Scala's version requires 12 LOC. Also, Java's version is 15% faster on my computer.
Complete code, not benchmark optimised.
Java impl.:
public class CombiningCompletableFuture {
static int sumConcurrently(List<Integer> numbers) throws ExecutionException, InterruptedException {
int mid = numbers.size() / 2;
return CompletableFuture.supplyAsync( () -> sumSequentially(numbers.subList(0, mid)))
.thenCombine(CompletableFuture.supplyAsync( () -> sumSequentially(numbers.subList(mid, numbers.size())))
, (left, right) -> left + right).get();
}
static int sumSequentially(List<Integer> numbers) {
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(1));
} catch (InterruptedException ignored) { }
return numbers.stream().mapToInt(Integer::intValue).sum();
}
public static void main(String[] args) throws ExecutionException, InterruptedException {
List<Integer> from1toTen = IntStream.rangeClosed(1, 10).boxed().collect(toList());
long start = System.currentTimeMillis();
long sum = sumConcurrently(from1toTen);
long duration = System.currentTimeMillis() - start;
System.out.printf("sum is %d in %d ms.", sum, duration);
}
}
Scala's impl.:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
object CombiningFutures extends App {
def sumConcurrently(numbers: Seq[Int]) = {
val splitted = numbers.splitAt(5)
val leftFuture = Future {
sumSequentally(splitted._1)
}
val rightFuture = Future {
sumSequentally(splitted._2)
}
val totalFuture = for {
left <- leftFuture
right <- rightFuture
} yield left + right
Await.result(totalFuture, Duration.Inf)
}
def sumSequentally(numbers: Seq[Int]) = {
Thread.sleep(1000)
numbers.sum
}
val from1toTen = 1 to 10
val start = System.currentTimeMillis
val sum = sumConcurrently(from1toTen)
val duration = System.currentTimeMillis - start
println(s"sum is $sum in $duration ms.")
}
Any explanations, or suggestions on how to improve the Scala code without impacting readability too much?
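A verbose Scala version of your sumConcurrently (the futures are created before the for-comprehension so that they run concurrently rather than one after the other):
def sumConcurrently(numbers: List[Int]): Future[Int] = {
  val (v1, v2) = numbers.splitAt(numbers.length / 2)
  // Both futures start here, before either result is awaited.
  val f1 = Future(v1.sum)
  val f2 = Future(v2.sum)
  for {
    sum1 <- f1
    sum2 <- f2
  } yield sum1 + sum2
}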
A more concise version
def sumConcurrently2(numbers: List[Int]): Future[Int] = numbers.splitAt(numbers.length / 2) match {
case (l1, l2) => Future.sequence(List(Future(l1.sum), Future(l2.sum))).map(_.sum)
}
And all this is because we have to partition the list. Let's say we had to write a function which takes a few lists and returns the sum of their sums using multiple concurrent computations:
def sumConcurrently3(lists: List[Int]*): Future[Int] =
Future.sequence(lists.map(l => Future(l.sum))).map(_.sum)
If the above looks cryptic... then let me de-crypt it,
def sumConcurrently3(lists: List[Int]*): Future[Int] = {
val listOfFuturesOfSums = lists.map { l => Future(l.sum) }
val futureOfListOfSums = Future.sequence(listOfFuturesOfSums)
futureOfListOfSums.map { l => l.sum }
}
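The same idea extends to splitting one collection into more than two parts. The following is only a sketch under my own assumptions: the names ChunkedSum and sumInChunks and the chunks parameter are hypothetical and do not come from the answer above. It uses Future.traverse, which creates one Future per chunk and collects the results.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ChunkedSum {
  // Hypothetical helper: splits `numbers` into at most `chunks` groups,
  // sums each group in its own Future, then adds up the partial sums.
  def sumInChunks(numbers: Vector[Int], chunks: Int)(implicit ec: ExecutionContext): Future[Int] = {
    require(chunks > 0, "chunks must be positive")
    // Group size is rounded up so that no more than `chunks` groups are produced.
    val size = math.max(1, math.ceil(numbers.length.toDouble / chunks).toInt)
    Future
      .traverse(numbers.grouped(size).toList)(chunk => Future(chunk.sum))
      .map(_.sum)
  }
}
```

With an implicit ExecutionContext in scope, ChunkedSum.sumInChunks((1 to 10).toVector, 4) returns a Future[Int] that can be mapped over like the other examples here.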
Now, whenever you use the result of a Future (let's say the future completes at time t1) in a computation, that computation is bound to happen after time t1. You can do it with blocking like this in Scala:
val sumFuture = sumConcurrently(List(1, 2, 3, 4))
val sum = Await.result(sumFuture, Duration.Inf)
val anotherSum = sum + 100
println("another sum = " + anotherSum)
But what is the point of all that? You are blocking the current thread while waiting for the computation on those other threads to finish. Why not move the whole computation into the future itself?
val sumFuture = sumConcurrently(List(1, 2, 3, 4))
val anotherSumFuture = sumFuture.map(s => s + 100)
anotherSumFuture.foreach(s => println("another sum = " + s))
Now you are not blocking anywhere, and the threads can be used wherever they are required.
The Future implementation and API in Scala are designed to let you write your program while avoiding blocking as far as possible.
For the task at hand, the following is probably not the tersest option either:
def sumConcurrently(numbers: Vector[Int]): Future[Int] = {
val (v1, v2) = numbers.splitAt(numbers.length / 2)
Future(v1.sum).zipWith(Future(v2.sum))(_ + _)
}
As I mentioned in my comment there are several issues with your example.
private def canProceed: Boolean = {
val startTime = System.currentTimeMillis
val endTime = startTime + (5 * 1000)
while (System.currentTimeMillis < endTime) {
if (isSafe) { // method where my current implementation is just true or false for testing
true
} else {
println("Not safe. Trying again")
}
}
false
}
This will just keep iterating through the while loop, since the true in the conditional doesn't actually do anything: a Scala while loop always evaluates to Unit, so the final result will always be false. Is there some idiomatic way to do this without leveraging var or return?
Well, technically speaking, you could write your own tail-recursive function like the following.
def attemptWhile(cond: => Boolean)(check: => Boolean): Boolean = {
@annotation.tailrec
def loop(): Boolean = {
if (cond) check || loop()
else false
}
loop()
}
Which then you could use like:
private def canProceed: Boolean = {
val startTime = System.currentTimeMillis
val endTime = startTime + (5 * 1000)
attemptWhile(System.currentTimeMillis < endTime)(isSafe)
}
However, this still depends on mutable state, so it is not really functional.
At that point, you may wonder whether it is worth the effort of adding that attemptWhile, or whether you should just use return.
BTW, if you want a fully functional solution this seems something that could be solved using something like fs2 but maybe that is just too off-topic for you.
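One more option that avoids both var and return is a lazy Iterator. This is only a sketch and not part of the answer above; the names PollUntil and canProceedWithin and their parameters are hypothetical. Like attemptWhile, it still reads the system clock, so it is no more "pure" than the recursive version.

```scala
object PollUntil {
  // Hypothetical helper: keeps calling `check` until it returns true
  // or until `timeoutMillis` milliseconds have passed.
  def canProceedWithin(timeoutMillis: Long)(check: () => Boolean): Boolean = {
    val endTime = System.currentTimeMillis + timeoutMillis
    Iterator
      .continually(())                                    // endless supply of attempts
      .takeWhile(_ => System.currentTimeMillis < endTime) // stop once the deadline has passed
      .exists(_ => check())                               // short-circuits on the first success
  }
}
```

The original canProceed would then read PollUntil.canProceedWithin(5 * 1000)(() => isSafe). Because exists stops at the first true, no further checks are made after a success.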
You are basically asking whether or not you can do anything that's not a side effect inside a while loop and observe it on the outside.
Just like you cannot do anything worthwhile without side-effects in a while loop, you cannot observe anything on the outside of a while loop without side effects. The var and return keywords are directly related to side effects.
(Of course you can observe the heating up of your CPU or the slowdown of your program, but this is usually disregarded as it doesn't pertain to computation directly)
edit
For yield is there to address exactly this "problem".
scala> for {
| x <- List(1,2,3)
| y <- List(4,5,6)
| } yield (x,y)
val res0: List[(Int, Int)] = List((1,4), (1,5), (1,6), (2,4), (2,5), (2,6), (3,4), (3,5), (3,6))
You may want to look into effect types (Cats Effect, ZIO). They deal with exactly the case of turning effects into values. In your case, there are 2 effects:
System.currentTimeMillis: this value is mutable and changes over time.
isSafe: this has to be mutable and changed from the outside.
edit2
Here's an example of how you can iterate a side effect with Cats Effect
import cats.effect.{ExitCode, IO, IOApp}
import cats.implicits.catsSyntaxMonad
import scala.concurrent.duration.DurationInt
object CheckTime extends IOApp {
val checkTime: IO[Long] = IO.delay { System.currentTimeMillis() }
val waitForSomeTime: IO[Long] = for {
start <- checkTime
_ <- IO.delay { println(s"Start time: $start") }
stopTime <- checkTime.iterateWhile(currentTime => currentTime - start < 10.seconds.toMillis)
_ <- IO.delay { println(s"Ended at $stopTime") }
} yield stopTime
override def run(args: List[String]): IO[ExitCode] = waitForSomeTime.as(ExitCode.Success)
}
Output:
Start time: 1608103950673
Ended at 1608103960673
I'm reading Hands-on Scala, and one of its exercises is parallelizing merge sort.
I want to know why a for-comprehension, which is translated into flatMap and map, takes more time than zip and map.
My code:
def mergeSortParallel0[T: Ordering](items: IndexedSeq[T]): Future[IndexedSeq[T]] = {
if (items.length <= 16) Future.successful(mergeSortSequential(items))
else {
val (left, right) = items.splitAt(items.length / 2)
for (
l <- mergeSortParallel0(left);
r <- mergeSortParallel0(right)
) yield merge(l, r)
}
}
The standard answer provided by the book:
def mergeSortParallel0[T: Ordering](items: IndexedSeq[T]): Future[IndexedSeq[T]] = {
if (items.length <= 16) Future.successful(mergeSortSequential(items))
else {
val (left, right) = items.splitAt(items.length / 2)
mergeSortParallel0(left).zip(mergeSortParallel0(right)).map{
case (sortedLeft, sortedRight) => merge(sortedLeft, sortedRight)
}
}
}
flatMap or map are sequential operations on Scala Future and on their own have nothing to do with running things in parallel. They can be viewed as simple callbacks executed when a Future completes. Or in other words, provided code inside map(...) or flatMap(...) will start to execute only when the previous Future is finished.
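zip, on the other hand, combines two Futures and returns their results as a tuple once both are complete. Because both Futures are created before zip is called, and a Scala Future starts executing as soon as it is created, the two computations run in parallel. Similarly, you could use zipWith, which takes a function to transform the results of two Futures (it combines the zip and map operations):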
mergeSortParallel0(left).zipWith(mergeSortParallel0(right)){
case (sortedLeft, sortedRight) => merge(sortedLeft, sortedRight)
}
Another way to achieve parallelism is to create the Futures outside the for-comprehension. This works because Futures in Scala are 'eager': they start as soon as you create them (for example, when you assign them to a val):
def mergeSortParallel0[T: Ordering](items: IndexedSeq[T]): Future[IndexedSeq[T]] = {
if (items.length <= 16) Future.successful(mergeSortSequential(items))
else {
val (left, right) = items.splitAt(items.length / 2)
val leftF = mergeSortParallel0(left)
val rightF = mergeSortParallel0(right)
for {
sortedLeft <- leftF
sortedRight <- rightF
} yield {
merge(sortedLeft, sortedRight)
}
}
}
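The difference between the three styles can be observed with a small timing experiment. The sketch below is an illustration under assumptions: FutureOrdering, slow and elapsedMillis are hypothetical names, and the 300 ms sleep only stands in for real work. The sleep is wrapped in blocking so that the global execution context can add threads if it needs to.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureOrdering {
  // Hypothetical stand-in for real work: sleeps, then returns its argument.
  def slow(n: Int): Future[Int] = Future { blocking { Thread.sleep(300) }; n }

  // Runs `body`, waits for the result, and reports the elapsed wall-clock time in ms.
  def elapsedMillis[A](body: => Future[A]): (A, Long) = {
    val start = System.nanoTime
    val result = Await.result(body, 5.seconds)
    (result, (System.nanoTime - start) / 1000000)
  }

  // The second Future is only created inside flatMap, after the first has completed.
  def sequential(): Future[Int] =
    for { a <- slow(1); b <- slow(2) } yield a + b

  // Both Futures are created (and therefore started) before the for-comprehension.
  def parallel(): Future[Int] = {
    val fa = slow(1)
    val fb = slow(2)
    for { a <- fa; b <- fb } yield a + b
  }

  // Both arguments are created before zipWith combines them.
  def zipped(): Future[Int] = slow(1).zipWith(slow(2))(_ + _)
}
```

On a machine with spare threads, elapsedMillis(sequential()) should take roughly twice as long as the other two, since its two sleeps cannot overlap.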
Background
I have the following scenario. I want to execute the method of a class from an external library, repeatedly, and I want to do so until a certain timeout condition and result condition (compared to the previous result) is met. Furthermore I want to collect the return values, even on the "failed" run (the run with the "failing" result condition that should interrupt further execution).
Thus far I have accomplished this with initializing an empty var result: Result, a var stop: Boolean and using a while loop that runs while the conditions are true and modifying the outer state. I would like to get rid of this and use a functional approach.
Some context. Each run is expected to run from 0 to 60 minutes and the total time of iteration is capped at 60 minutes. Theoretically, there's no bound to how many times it executes in this period but in practice, it's generally 2-60 times.
The problem is, the runs take a long time so I need to stop the execution. My idea is to use some kind of lazy Iterator or Stream coupled with scanLeft and Option.
Code
Boilerplate
This code isn't particularly relevant, but it is used in my approach samples and provides identical but somewhat random pseudo-runtime results.
import scala.collection.mutable.ListBuffer
import scala.util.Random
val r = Random
r.setSeed(1)
val sleepingTimes: Seq[Int] = (1 to 601)
.map(x => Math.pow(2, x).toInt * r.nextInt(100))
.toList
.filter(_ > 0)
.sorted
val randomRes = r.shuffle((0 to 600).map(x => r.nextInt(10)).toList)
case class Result(val a: Int, val slept: Int)
class Lib() {
def run(i: Int) = {
println(s"running ${i}")
Thread.sleep(sleepingTimes(i))
Result(randomRes(i), sleepingTimes(i))
}
}
case class Baz(i: Int, result: Result)
val lib = new Lib()
val timeout = 10 * 1000
While approach
val iteratorStart = System.currentTimeMillis()
val iterator = for {
i <- (0 to 600).iterator
if System.currentTimeMillis() < iteratorStart + timeout
f = Baz(i, lib.run(i))
} yield f
val iteratorBuffer = ListBuffer[Baz]()
if (iterator.hasNext) iteratorBuffer.append(iterator.next())
var run = true
while (run && iterator.hasNext) {
val next = iterator.next()
run = iteratorBuffer.last.result.a < next.result.a
iteratorBuffer.append(next)
}
Stream approach (Scala 2.12)
Full example
val streamStart = System.currentTimeMillis()
val stream = for {
i <- (0 to 600).toStream
if System.currentTimeMillis() < streamStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = stream.headOption
val tail = if (stream.nonEmpty) stream.tail else stream
val streamVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
LazyList approach (Scala 2.13)
Full example
val lazyListStart = System.currentTimeMillis()
val lazyList = for {
i <- (0 to 600).to(LazyList)
if System.currentTimeMillis() < lazyListStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = lazyList.headOption
val tail = if (lazyList.nonEmpty) lazyList.tail else lazyList
val lazyListVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
Result
Both approaches appear to yield the correct end result:
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)))
and they interrupt execution as desired.
Edit: The desired outcome is to not execute the next iteration but still return the result of the iteration that caused the interruption. Thus the desired result is
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)), Baz(2,Result(2,256)))
and lib.run(i) should only run 3 times.
This is achieved by the while approach, as well as the LazyList approach but not the Stream approach which executes lib.run 4 times (Bad!).
Question
Is there another stateless approach, which is hopefully more elegant?
Edit
I realized my examples were faulty and not returning the "failing" result, which it should, and that they kept executing beyond the stop condition. I rewrote the code and examples but I believe the spirit of the question is the same.
I would use something higher level, like fs2.
(or any other high-level streaming library, like: monix observables, akka streams or zio zstreams)
def runUntilOrTimeout[F[_]: Concurrent: Timer, A](work: F[A], timeout: FiniteDuration)
(stop: (A, A) => Boolean): Stream[F, A] = {
val interrupt =
Stream.sleep_(timeout)
val run =
Stream
.repeatEval(work)
.zipWithPrevious
.takeThrough {
case (Some(p), c) if stop(p, c) => false
case _ => true
} map {
case (_, c) => c
}
run mergeHaltBoth interrupt
}
You can see it working here.
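If pulling in a streaming library is not an option, the "keep the failing element but do not run the next one" behaviour can also be sketched with a plain lazy Iterator and a tail-recursive helper. This is an alternative sketch rather than the fs2 approach above; TakeThrough and takeThroughPairs are hypothetical names, and the Iterator itself is of course still stateful internally.

```scala
object TakeThrough {
  // Hypothetical helper: pulls elements one at a time while each consecutive
  // pair satisfies `continue`; the first element that breaks the condition is
  // still included, and nothing after it is pulled from the iterator.
  def takeThroughPairs[A](it: Iterator[A])(continue: (A, A) => Boolean): List[A] = {
    @annotation.tailrec
    def loop(acc: List[A]): List[A] =
      if (!it.hasNext) acc.reverse
      else {
        val next = it.next() // the (possibly expensive) run happens here
        acc match {
          case prev :: _ if !continue(prev, next) => (next :: acc).reverse
          case _                                  => loop(next :: acc)
        }
      }
    loop(Nil)
  }
}
```

With the question's definitions, the source could be (0 to 600).iterator.takeWhile(_ => System.currentTimeMillis() < start + timeout).map(i => Baz(i, lib.run(i))), and the condition (prev, next) => prev.result.a < next.result.a. Since Iterator.map is lazy, lib.run is only invoked when next() is called.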
I wrote this simple program in my attempt to learn how Cats Writer works
import cats.data.Writer
import cats.syntax.applicative._
import cats.syntax.writer._
import cats.instances.vector._
object WriterTest extends App {
type Logged2[A] = Writer[Vector[String], A]
Vector("started the program").tell
val output1 = calculate1(10)
val foo = new Foo()
val output2 = foo.calculate2(20)
val (log, sum) = (output1 + output2).pure[Logged2].run
println(log)
println(sum)
def calculate1(x : Int) : Int = {
Vector("came inside calculate1").tell
val output = 10 + x
Vector(s"Calculated value ${output}").tell
output
}
}
class Foo {
def calculate2(x: Int) : Int = {
Vector("came inside calculate 2").tell
val output = 10 + x
Vector(s"calculated ${output}").tell
output
}
}
The program works and the output is
> run-main WriterTest
[info] Compiling 1 Scala source to /Users/Cats/target/scala-2.11/classes...
[info] Running WriterTest
Vector()
50
[success] Total time: 1 s, completed Jan 21, 2017 8:14:19 AM
But why is the vector empty? Shouldn't it contain all the strings on which I used the "tell" method?
When you call tell on your Vectors, each time you create a Writer[Vector[String], Unit]. However, you never actually do anything with your Writers, you just discard them. Further, you call pure to create your final Writer, which simply creates a Writer with an empty Vector. You have to combine the writers together in a chain that carries your value and message around.
type Logged[A] = Writer[Vector[String], A]
val (log, sum) = (for {
_ <- Vector("started the program").tell
output1 <- calculate1(10)
foo = new Foo()
output2 <- foo.calculate2(20)
} yield output1 + output2).run
def calculate1(x: Int): Logged[Int] = for {
_ <- Vector("came inside calculate1").tell
output = 10 + x
_ <- Vector(s"Calculated value ${output}").tell
} yield output
class Foo {
def calculate2(x: Int): Logged[Int] = for {
_ <- Vector("came inside calculate2").tell
output = 10 + x
_ <- Vector(s"calculated ${output}").tell
} yield output
}
Note the use of for notation. The definition of calculate1 is really
def calculate1(x: Int): Logged[Int] = Vector("came inside calculate1").tell.flatMap { _ =>
val output = 10 + x
Vector(s"Calculated value ${output}").tell.map { _ => output }
}
flatMap is the monadic bind operation, which means it understands how to take two monadic values (in this case Writer) and join them together to get a new one. In this case, it makes a Writer containing the concatenation of the logs and the value of the one on the right.
Note how there are no side effects. There is no global state by which Writer can remember all your calls to tell. You instead make many Writers and join them together with flatMap to get one big one at the end.
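To make the log concatenation concrete, here is a deliberately simplified stand-in for Writer[Vector[String], A]. It is not the Cats implementation (which is built on WriterT and a Semigroup for the log type); MiniWriter, Logged, tell and pure are names I chose for this sketch.

```scala
object MiniWriter {
  // Simplified stand-in for Writer[Vector[String], A]: a log paired with a value.
  final case class Logged[A](log: Vector[String], value: A) {
    // map changes the value and leaves the log untouched.
    def map[B](f: A => B): Logged[B] = Logged(log, f(value))
    // flatMap runs the next step and appends its log to the current one.
    def flatMap[B](f: A => Logged[B]): Logged[B] = {
      val next = f(value)
      Logged(log ++ next.log, next.value)
    }
  }

  def tell(msg: String): Logged[Unit] = Logged(Vector(msg), ())
  def pure[A](a: A): Logged[A] = Logged(Vector.empty, a)

  def calculate(x: Int): Logged[Int] = for {
    _ <- tell(s"came inside calculate($x)")
    output = 10 + x
    _ <- tell(s"calculated $output")
  } yield output

  // The logs of every step are joined by flatMap; nothing is stored globally.
  val program: Logged[Int] = for {
    _ <- tell("started the program")
    a <- calculate(10)
    b <- calculate(20)
  } yield a + b
}
```

A tell whose result is never passed through flatMap simply produces a Logged value that is thrown away, which is what happened in the question's code.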
The problem with your example code is that you're not using the result of the tell method.
If you take a look at its signature, you'll see this:
final class WriterIdSyntax[A](val a: A) extends AnyVal {
def tell: Writer[A, Unit] = Writer(a, ())
}
It is clear that tell returns a Writer[A, Unit], which is immediately discarded because you didn't assign it to a value.
The proper way to use a Writer (and any monad in Scala) is through its flatMap method. It would look similar to this:
println(
Vector("started the program").tell.flatMap { _ =>
15.pure[Logged2].flatMap { i =>
Writer(Vector("ended program"), i)
}
}
)
The code above, when executed will give you this:
WriterT((Vector(started the program, ended program),15))
As you can see, both messages and the int are stored in the result.
Now this is a bit ugly, and Scala actually provides a better way to do this: for-comprehensions. For-comprehensions are a bit of syntactic sugar that allows us to write the same code in this way:
println(
for {
_ <- Vector("started the program").tell
i <- 15.pure[Logged2]
_ <- Vector("ended program").tell
} yield i
)
Now going back to your example, what I would recommend is that you change the return type of calculate1 and calculate2 to Writer[Vector[String], Int] and then try to make your application compile using what I wrote above.
I have written a quicksort (method quicksortF()) that uses a Scala's Future to let the recursive sorting of the partitions be done concurrently. I also have implemented a regular quicksort (method quicksort()). Unfortunately, the Future version ends up in a deadlock (apparently blocks forever) when the list to sort is greater than about 1000 elements (900 would work). The source is shown below.
I am relatively new to Actors and Futures. What is going wrong here?
Thanks!
import util.Random
import actors.Futures._
/**
* Quicksort with and without using the Future pattern.
* @author Markus Gumbel
*/
object FutureQuickSortProblem {
def main(args: Array[String]) {
val n = 1000 // works for n = 900 but not for 1000 anymore.
// Create a random list of size n:
val list = (1 to n).map(c => Random.nextInt(n * 10)).toList
println(list)
// Sort it with regular quicksort:
val sortedList = quicksort(list)
println(sortedList)
// ... and with quicksort using Future (which hangs):
val sortedListF = quicksortF(list)
println(sortedListF)
}
// This one works.
def quicksort(list: List[Int]): List[Int] = {
if (list.length <= 1) list
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
val sortedLeftList = quicksort(leftList)
val sortedRightList = quicksort(rightList)
sortedLeftList ::: middleList ::: sortedRightList
}
}
// Almost the same as quicksort() except that Future is used.
// However, this one hangs.
def quicksortF(list: List[Int]): List[Int] = {
if (list.length <= 1) list
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
// Same as quicksort() but here we are using a Future
// to sort the left and right partitions independently:
val sortedLeftListFuture = future {
quicksortF(leftList)
}
val sortedRightListFuture = future {
quicksortF(rightList)
}
sortedLeftListFuture() ::: middleList ::: sortedRightListFuture()
}
}
}
class FutureQuickSortProblem // If not defined, Intellij won't find the main method.?!
Disclaimer: I've never personally used the (pre-2.10) standard library's actors or futures in any serious way, and there are a number of things I don't like (or at least don't understand) about the API there, compared for example to the implementations in Scalaz or Akka or Play 2.0.
But I can tell you that the usual approach in a case like this is to combine your futures monadically instead of claiming them immediately and combining the results. For example, you could write something like this (note the new return type):
import scala.actors.Futures._
def quicksortF(list: List[Int]): Responder[List[Int]] = {
if (list.length <= 1) future(list)
else {
val pivot = list.head
val leftList = list.filter(_ < pivot)
val middleList = list.filter(pivot == _)
val rightList = list.filter(_ > pivot)
for {
left <- quicksortF(leftList)
right <- quicksortF(rightList)
} yield left ::: middleList ::: right
}
}
Like your vanilla implementation, this won't necessarily be very efficient, and it will also blow the stack pretty easily, but it shouldn't run out of threads.
As a side note, why does flatMap on a Future return a Responder instead of a Future? I don't know, and neither do some other folks. For reasons like this I'd suggest skipping the now-deprecated pre-2.10 standard library actor-based concurrency stuff altogether.
As I understand, calling apply on the Future (as you do when concatenating the results of the recursive calls) will block until the result is retrieved.
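For comparison, here is how the same idea might look with scala.concurrent.Future (Scala 2.12 or later, since it uses zipWith) instead of the deprecated actors library. This is a sketch under that assumption, not a translation checked against the original API; NonBlockingQuickSort is a hypothetical name. No task ever waits on another task's result, so the pool cannot run out of threads the way the blocking version does.

```scala
import scala.concurrent.{ExecutionContext, Future}

object NonBlockingQuickSort {
  def quicksortF(list: List[Int])(implicit ec: ExecutionContext): Future[List[Int]] =
    if (list.length <= 1) Future.successful(list)
    else {
      val pivot = list.head
      val middle = list.filter(_ == pivot)
      // Each partition is filtered and then sorted in its own task.
      val leftF = Future(list.filter(_ < pivot)).flatMap(quicksortF(_))
      val rightF = Future(list.filter(_ > pivot)).flatMap(quicksortF(_))
      // The results are combined in a callback; no thread blocks while waiting.
      leftF.zipWith(rightF)((left, right) => left ::: middle ::: right)
    }
}
```

Only the outermost caller needs to wait, for example with Await.result(quicksortF(list), Duration.Inf) at the end of main.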