I am beginning to learn Scala (and futures).
Given the following code:
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
object MyFuture2 extends App {
val twenty = Future{Thread.sleep(2000); 20}
twenty onComplete {
case Success(nums) => println("twenty onComplete: " + nums)
case Failure(t) => println("An error has occurred: " + t.getMessage)
}
while (!twenty.isCompleted ) {
Thread.sleep(500)
}
println("Step 1: " + twenty)
val myresult = for {
bb: Int <- twenty
} yield {
bb * 2
}
println("Step 2: " + myresult)
}
The output is
twenty onComplete: 20
Step 1: Future(Success(20))
Step 2: Future(<not completed>)
"Step 2: Future(<not completed>)" does not make sense to me, because in the statement almost immediately before it, twenty is already complete; otherwise it would not have printed "Step 1: Future(Success(20))". I expected Step 2 to show 40 (20 * 2).
What is wrong with my understanding of how Futures are used? And how can Step 2 ever be not completed?
Thanks.
Note that the first part could also have been simply val twenty = Future.successful(20) (which executes synchronously on the same thread where you call it) and the result would have been the same (unless for some reason you get lucky with the scheduling).
What is happening is the following:
you have a Future that contains the value 20
you call map on that (which is the meaning of the for comprehension), which sends a callback that multiplies the content of the Future by 2 to the global ExecutionContext that you imported (which happens to be a fork-join pool)
Now two things can happen:
most likely, the current thread keeps running and reaches println before the callback has run, so Future(<not completed>) is printed
maybe (but less likely), for some reason the current thread yields and the thread which multiplies by two is scheduled first -- the number is multiplied by two and, when control returns to the original thread, Future(Success(40)) is printed
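To reliably see 40, the program has to wait for the mapped future (or register a callback on it) rather than print it straight away. A minimal sketch, assuming it is acceptable to block the calling thread at the very edge of the program; AwaitMapped and doubled are illustrative names, not part of the question:

```scala
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object AwaitMapped {
  // Maps an already completed future and blocks until the mapped result exists.
  def doubled(): Int = {
    val twenty = Future.successful(20)
    // Same meaning as: for { bb <- twenty } yield bb * 2
    val myresult = twenty.map(_ * 2)
    // The callback runs on the global ExecutionContext; Await waits for it.
    Await.result(myresult, 5.seconds)
  }
}
```

Calling AwaitMapped.doubled() returns 40. In non-toy code, myresult.onComplete or myresult.foreach is usually preferable to blocking.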
Related
I want to create a list of Future, each of which could pass or fail and collate results from successful Future. How can I do this?
val futures2:List[Future[Int]] = List(Future{1}, Future{2},Future{throw new Exception("error")})
Questions
1) I want to wait for each future to finish
2) I want to collect sum of return values from each success future and ignore the ones which failed (so I should get 3).
One thing that you need to understand: avoid trying to "get" values from a Future or Futures.
You can keep on operating in the Futuristic land.
val futureList = List(
Future(1),
Future(2),
Future(throw new Exception("error"))
)
// add 1 to each future
// map propagates errors to the transformed futures:
// only successful futures get +1, failed ones keep their error
val transformedFutureList = futureList
.map(future => future.map(i => i + 1))
// print the values of the futures
// similar to map, foreach only runs for successful futures
val unitFutureList = futureList
.map(future => future.foreach(i => println(i)))
// now let's compute the sum of your "future" values
// (onComplete returns Unit, so it cannot build the accumulator;
// flatMap and recover keep everything inside Future)
val sumFuture = futureList
.foldLeft(Future.successful(0))((facc, f) =>
facc.flatMap(acc => f.map(i => acc + i).recover { case _ => acc })
)
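The sum of the successful futures can also be sketched with Future.sequence, by first turning every future into one that always succeeds with a Try. This assumes Scala 2.12 or later (for the Try-based transform); SumSuccessful and sumSuccessful are illustrative names:

```scala
import scala.concurrent._
import scala.util.{Success, Try}

object SumSuccessful {
  // Sums the values of the futures that succeed and ignores the failed ones.
  def sumSuccessful(fs: List[Future[Int]])(implicit ec: ExecutionContext): Future[Int] = {
    // Lift each result into a Try so that a failure no longer fails the whole sequence.
    val lifted: List[Future[Try[Int]]] = fs.map(_.transform(t => Success(t)))
    Future.sequence(lifted).map(_.collect { case Success(i) => i }.sum)
  }
}
```

The result is still a Future[Int]; the caller decides where, if anywhere, to wait for it.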
And since OP (@Manu Chanda) asked about "getting" a value from a Promise, I am adding some bits about what Promises are in Scala.
So... first let's talk about how to think about a Future in Scala.
If you see a Future[Int], try to think of it as an ongoing computation which is "supposed to produce" an Int. That computation can complete successfully and result in a Success[Int], or throw an exception and result in a Failure. That is why you see functions such as onComplete, recoverWith and onFailure, which read as if they were talking about a computation.
val intFuture = Future {
// all this inside Future {} is going to run in some other thread
val i = 5;
val j = i + 10;
val k = j / 5;
k
}
Now... what is a Promise.
Well... as the name indicates... a Promise[Int] is a promise of an Int value... nothing more.
Just like when a parent promises a certain toy to their child. Note that in this case... the parent has not necessarily started working on getting that toy, they have just promised that they will.
To complete the promise... they will first have to start working on it... go to the market... buy the toy from a shop... come back home. Or... sometimes... they are busy, so... they will ask someone else to bring that toy and keep doing their own work... that other guy will try to bring the toy to the parent (he may fail to buy it), and then the parent will complete the promise with whatever result they got from him.
So... basically a Promise wraps a Future inside of it. And that "wrapped" Future "value" can be considered as the value of the Promise.
so...
println("Well... The program wants an 'Int' toy")
// we "promised" our program that we will give it that int "toy"
val intPromise = Promise[Int]()
// now we can just move on with our life
println("Well... We just promised an 'Int' toy")
// while the program can make plans with how will it play with that "future toy"
val intFuture = intPromise.future
val plusOneIntFuture = intFuture.map(i => i + 1)
plusOneIntFuture.onComplete({
case Success(i) => println("Wow... I got the toy and modified it to - " + i)
case Failure(ex) => println("I did not get the toy")
})
// but since we at least want to try to complete our promise
println("Now... I suppose we need to get that 'Int' toy")
println("But... I am busy... I can not stop everything else for that toy")
println("ok... lets ask another thread to get that")
val getThatIntFuture = Future {
println("Well... I am thread 2... trying to get the int")
val i = 1
println("Well... I am thread 2... lets just return this i = 1 thingy")
i
}
// now lets complete our promise with whatever we will get from this other thread
getThatIntFuture.onComplete(intTry => intPromise.complete(intTry))
The above code will result in the following output:
Well... The program wants an 'Int' toy
Well... We just promised an 'Int' toy
Now... I suppose we need to get that 'Int' toy
But... I am busy... I can not stop everything else for that toy
ok... lets ask another thread to get that
Well... I am thread 2... trying to get the int
Well... I am thread 2... lets just return this i = 1 thingy
Wow... I got the toy and modified it to - 2
Promises don't help you in "getting" a value from a Future. Asynchronous processes (or Futures in Scala) are just running in another timeline... you can not "get" their "value" in your timeline unless you work on aligning your timeline with the process's timeline itself.
I have some code that is not performance-sensitive and was trying to make stacks easier to follow by using fewer futures. This resulted in some code similar to the following:
val fut = Future {
val r = Future.traverse(ips) { ip =>
val httpResponse: Future[HttpResponse] = asyncHttpClient.exec(req)
httpResponse.andThen {
case x => logger.info(s"received response here: $x")
}
httpResponse.map(r => (ip, r))
}
r.andThen { case x => logger.info(s"final result: $x") }
Await.result(r, 10 seconds)
}
fut.andThen { x => logger.info(s"finished $x") }
logger.info("here nonblocking")
As expected internal logging in the http client shows that the response returns immediately, but the callbacks executing logger.info(s"received response here: $x") and logger.info(s"final result: $x") do not execute until after Await.result(r, 10 seconds) times out. Looking at the log output, which includes thread ids, the callbacks are being executed in the same thread (ForkJoinPool-1-worker-3) that is awaiting the result, creating a deadlock. It was my understanding that ExecutionContext.global would create extra threads on demand when it ran out of threads. Is this not the case? There appears only to be two threads from the global fork join pool that are producing any output in the logs (1 and 3). Can anyone explain this?
As for fixes, I know perhaps the best way is to separate blocking and nonblocking work into different thread pools, but I was hoping to avoid this extra bookkeeping by using a dynamically sized thread pool. Is there a better solution?
If you want to grow the pool (temporarily) when threads are blocked, use concurrent.blocking. Here, you've used all the threads, doing i/o and then scheduling more work with map and andThen (the result of which you don't use).
More info: your "final result" is expected to execute after the traverse, so that is normal.
Example for blocking, although there must be a SO Q&A for it:
scala> import concurrent._ ; import ExecutionContext.Implicits._
scala> val is = 1 to 100 toList
scala> def db = s"${Thread.currentThread}"
db: String
scala> def f(i: Int) = Future { println(db) ; Thread.sleep(1000L) ; 2 * i }
f: (i: Int)scala.concurrent.Future[Int]
scala> Future.traverse(is)(f _)
Thread[ForkJoinPool-1-worker-13,5,main]
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-9,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-5,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
Thread[ForkJoinPool-1-worker-15,5,main]
Thread[ForkJoinPool-1-worker-11,5,main]
res0: scala.concurrent.Future[List[Int]] = scala.concurrent.impl.Promise$DefaultPromise@3a4b0e5d
[etc, N at a time]
versus overly parallel:
scala> def f(i: Int) = Future { blocking { println(db) ; Thread.sleep(1000L) ; 2 * i }}
f: (i: Int)scala.concurrent.Future[Int]
scala> Future.traverse(is)(f _)
Thread[ForkJoinPool-1-worker-13,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
res1: scala.concurrent.Future[List[Int]] = scala.concurrent.impl.Promise$DefaultPromise@759d81f3
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-25,5,main]
Thread[ForkJoinPool-1-worker-29,5,main]
Thread[ForkJoinPool-1-worker-19,5,main]
scala> Thread[ForkJoinPool-1-worker-23,5,main]
Thread[ForkJoinPool-1-worker-27,5,main]
Thread[ForkJoinPool-1-worker-21,5,main]
Thread[ForkJoinPool-1-worker-31,5,main]
Thread[ForkJoinPool-1-worker-17,5,main]
Thread[ForkJoinPool-1-worker-49,5,main]
Thread[ForkJoinPool-1-worker-45,5,main]
Thread[ForkJoinPool-1-worker-59,5,main]
Thread[ForkJoinPool-1-worker-43,5,main]
Thread[ForkJoinPool-1-worker-57,5,main]
Thread[ForkJoinPool-1-worker-37,5,main]
Thread[ForkJoinPool-1-worker-51,5,main]
Thread[ForkJoinPool-1-worker-35,5,main]
Thread[ForkJoinPool-1-worker-53,5,main]
Thread[ForkJoinPool-1-worker-63,5,main]
Thread[ForkJoinPool-1-worker-47,5,main]
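The REPL comparison above can also be written as a small program that counts how many distinct worker threads were used. This is only a sketch: BlockingDemo and run are illustrative names, the thread count depends on the machine and the pool configuration, and only the computed results are deterministic:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object BlockingDemo {
  // Runs n sleeping tasks; returns the results and the number of distinct worker threads seen.
  def run(n: Int, useBlocking: Boolean): (List[Int], Int) = {
    val threadNames = ConcurrentHashMap.newKeySet[String]()
    def task(i: Int): Future[Int] = Future {
      threadNames.add(Thread.currentThread.getName)
      // With blocking, the global pool may start extra threads while this one sleeps.
      if (useBlocking) blocking { Thread.sleep(200L) } else Thread.sleep(200L)
      2 * i
    }
    val results = Await.result(Future.traverse((1 to n).toList)(task), 1.minute)
    (results, threadNames.size)
  }
}
```

Comparing BlockingDemo.run(32, useBlocking = false)._2 with BlockingDemo.run(32, useBlocking = true)._2 on a machine with fewer than 32 cores should typically show more threads in the second case.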
I have a Play project which uses a for-comprehension to complete tasks in parallel. Say we are given the code below. I want to test that firstF and secondF are in fact running in parallel. What would be the best way to test this? I thought of asserting that the start and end times overlap, but there could be a better way.
def async = Action.async {
val firstF = future{
val s = "start: " + new java.util.Date().toString() + " :: "
Thread.sleep(1000)
val e = "end: " + new java.util.Date().toString()
"First function: " + s + e + "\n\n"
}
val secondF = future{
val s = "start: " + new java.util.Date().toString() + " :: "
Thread.sleep(1000)
val e = "end: " + new java.util.Date().toString()
"Second function: " + s + e + "\n\n"
}
val result = for {
fContent <- firstF
sContent <- secondF
} yield fContent + sContent
result map {
x => Ok( x )
}
}
EDIT:
To test the running time of some execution, the standard way is to use the ScalaMeter framework for performance regression testing.
val gen = Gen.range("times")(1000, 2000, 500)
performance of "Futures" in {
using(gen) in { time =>
val f: Future[Unit] = runningFor(time) // returns some future that takes time milliseconds to execute
Await.ready(f, Duration.Inf) // Await.ready needs a timeout; Duration is scala.concurrent.duration.Duration
}
}
See the ScalaMeter Getting Started guide for a complete example.
Manual method:
You can print the name of the current thread using Thread.currentThread.getName. In the standard ExecutionContext implementation, this will print the names of the fork/join worker threads. If the names are different for the two future computations, that is a good indication that they are executing in parallel.
Otherwise, you can try timing the start and the end of the futures using System.currentTimeMillis, and comparing if the two intervals overlap.
Finally, in your example, each future takes about one second. If the program completes in about 1 second, they are obviously executing in parallel. If the program takes about 2 seconds, the two futures are executed serially, because the ExecutionContext is using only one thread.
Note that Thread.sleep is a long-running blocking operation, which blocks the worker thread. Generally, use the blocking directive around blocking operations, to allow resizing the worker thread pool when necessary:
Future { // note: lowercase `future` has been deprecated, use uppercase
blocking {
Thread.sleep(1000)
}
}
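Instead of comparing timestamps, the overlap itself can be checked with a latch: each future announces that it has started and then waits for the other one. This is a sketch of one possible test, not something from the question; OverlapCheck, ranConcurrently and the 2-second timeout are assumptions made for illustration:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object OverlapCheck {
  // Returns true only if both futures were running at the same time.
  def ranConcurrently(): Boolean = {
    val bothStarted = new CountDownLatch(2)
    def task(): Future[Boolean] = Future {
      bothStarted.countDown()
      // Wait until the other task has started too; gives up (false) after 2 seconds.
      blocking { bothStarted.await(2, TimeUnit.SECONDS) }
    }
    // Both futures are created before the for-comprehension, so both start immediately.
    val firstF = task()
    val secondF = task()
    val result = for {
      a <- firstF
      b <- secondF
    } yield a && b
    Await.result(result, 5.seconds)
  }
}
```

If the futures were instead created inside the for-comprehension, the second would start only after the first had finished, and the latch would time out.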
In the code below I create 5 threads, have them each print out a message, sleep, and print another message. I start the threads in my main thread and then join all of the threads as well. I would expect the "all done" message to be printed only after all of the threads have finished. Yet "all done" gets printed before all the threads are done. Can someone help me to understand this behavior?
Thanks.
Kent
Here is the code:
def ttest() = {
val threads =
for (i <- 1 to 5)
yield new Thread() {
override def run() {
println("going to sleep")
Thread.sleep(1000)
println("awake now")
}
}
threads.foreach(t => t.start())
threads.foreach(t => t.join())
println("all done")
}
Here is the output:
going to sleep
all done
going to sleep
going to sleep
going to sleep
going to sleep
awake now
awake now
awake now
awake now
awake now
It works if you transform the Range into a List:
def ttest() = {
val threads =
for (i <- 1 to 5 toList)
yield new Thread() {
override def run() {
println("going to sleep")
Thread.sleep(1000)
println("awake now")
}
}
threads.foreach(t => t.start())
threads.foreach(t => t.join())
println("all done")
}
The problem is that "1 to 5" is a Range, and ranges are not "strict", so to speak. In good English, when you call the method map on a Range, it does not compute each value right then. Instead, it produces an object -- a RandomAccessSeq.Projection on Scala 2.7 -- which has a reference to the function passed to map and another to the original range. Thus, when you use an element of the resulting range, the function you passed to map is applied to the corresponding element of the original range. And this will happen each and every time you access any element of the resulting range.
This means that each time you refer to an element of threads, you are calling new Thread() { ... } anew. Since you traverse it twice, and the range has 5 elements, you are creating 10 threads. You call start on the first 5 and join on the second 5.
If this is confusing, look at the example below:
scala> object test {
| val t = for (i <- 1 to 5) yield { println("Called again! "+i); i }
| }
defined module test
scala> test.t
Called again! 1
Called again! 2
Called again! 3
Called again! 4
Called again! 5
res4: scala.collection.generic.VectorView[Int,Vector[_]] = RangeM(1, 2, 3, 4, 5)
scala> test.t
Called again! 1
Called again! 2
Called again! 3
Called again! 4
Called again! 5
res5: scala.collection.generic.VectorView[Int,Vector[_]] = RangeM(1, 2, 3, 4, 5)
Each time I print t (by having Scala REPL print res4 and res5), the yielded expression gets evaluated again. It happens for individual elements too:
scala> test.t(1)
Called again! 2
res6: Int = 2
scala> test.t(1)
Called again! 2
res7: Int = 2
EDIT
As of Scala 2.8, Range will be strict, so the code in the question will work as originally expected.
In your code, threads is deferred - each time you iterate it, the for generator expression is run anew. Thus, you actually create 10 threads there - the first foreach creates 5 and starts them, the second foreach creates 5 more (which are not started) and joins them - since they aren't running, join returns immediately. You should use toList on the result of for to make a stable snapshot.
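On current Scala versions a Range is strict, but the same effect can be reproduced by asking for a view explicitly. The sketch below counts how many Thread objects get constructed when the collection is traversed twice; ViewDemo and threadsCreated are illustrative names:

```scala
object ViewDemo {
  // Counts Thread constructions when the collection is traversed twice
  // (once to start, once to join).
  def threadsCreated(strict: Boolean): Int = {
    var created = 0
    def newThread(i: Int): Thread = {
      created += 1
      new Thread(new Runnable { def run(): Unit = Thread.sleep(10) })
    }
    // A view is lazy: map does not run newThread until elements are accessed,
    // and it runs it again on every traversal.
    val lazyThreads = (1 to 5).view.map(newThread)
    val threads: Iterable[Thread] = if (strict) lazyThreads.toList else lazyThreads
    threads.foreach(_.start())
    threads.foreach(_.join()) // on the lazy view these are 5 new, unstarted threads
    created
  }
}
```

ViewDemo.threadsCreated(strict = false) returns 10, while ViewDemo.threadsCreated(strict = true) returns 5, which matches the explanation above.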
Let's say we have a list of states and we want to sequence them:
import cats.data.State
import cats.instances.list._
import cats.syntax.traverse._
trait MachineState
case object ContinueRunning extends MachineState
case object StopRunning extends MachineState
case class Machine(candy: Int)
val addCandy: Int => State[Machine, MachineState] = amount =>
State[Machine, MachineState] { machine =>
val newCandyAmount = machine.candy + amount
if(newCandyAmount > 10)
(machine, StopRunning)
else
(machine.copy(newCandyAmount), ContinueRunning)
}
List(addCandy(1),
addCandy(2),
addCandy(5),
addCandy(10),
addCandy(20),
addCandy(50)).sequence.run(Machine(0)).value
Result would be
(Machine(8),List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning, StopRunning, StopRunning))
It's obvious that the last 3 steps are redundant. Is there a way to make this sequence stop early? Here, when StopRunning gets returned, I would like to stop. For example, a list of Eithers would fail fast and stop the sequence early if needed (because it acts like a monad).
For the record, I do know that it is possible to simply write a tail recursion that checks each state as it is run and stops the recursion once some condition is satisfied. I just want to know if there is a more elegant way of doing this. The recursive solution seems like a lot of boilerplate to me; am I wrong?
Thank you!:))
There are two things to do here.
The first is understanding what is actually happening:
State takes some state value, threads it through many composed calls and, in the process, produces an output value as well
in your case Machine is the state threaded between calls, while MachineState is the output of a single operation
sequence (usually) takes a collection (here List) of some parametric type (here State[Machine, _]) and swaps the nesting (here: List[State[Machine, _]] -> State[Machine, List[_]]), where _ is the gap that you fill with your type
the result is that the state (Machine(0)) is threaded through all the functions, while the output of each of them (MachineState) is combined into a list of outputs
// ammonite
// to better see how many times things are being run
@ {
val addCandy: Int => State[Machine, MachineState] = amount =>
State[Machine, MachineState] { machine =>
val newCandyAmount = machine.candy + amount
println("new attempt with " + machine + " and " + amount)
if(newCandyAmount > 10)
(machine, StopRunning)
else
(machine.copy(newCandyAmount), ContinueRunning)
}
}
addCandy: Int => State[Machine, MachineState] = ammonite.$sess.cmd24$$$Lambda$2669/1733815710@25c887ca
@ List(addCandy(1),
addCandy(2),
addCandy(5),
addCandy(10),
addCandy(20),
addCandy(50)).sequence.run(Machine(0)).value
new attempt with Machine(0) and 1
new attempt with Machine(1) and 2
new attempt with Machine(3) and 5
new attempt with Machine(8) and 10
new attempt with Machine(8) and 20
new attempt with Machine(8) and 50
res25: (Machine, List[MachineState]) = (Machine(8), List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning, StopRunning, StopRunning))
In other words, if what you want is circuit breaking, then .sequence might not be what you want.
As a matter of fact, you probably want something else: to combine a list of A => (A, B) functions into one function that stops running further computations once the result of a computation is StopRunning (in your code nothing says what the circuit-breaking condition is or how the break should be performed). I would suggest doing it explicitly with some other function, e.g.:
@ {
List(addCandy(1),
addCandy(2),
addCandy(5),
addCandy(10),
addCandy(20),
addCandy(50))
.reduce { (a, b) =>
a.flatMap {
// flatMap and map uses MachineState
// - the second parameter is the result after all!
// we are pattern matching on it to decide if we want to
// proceed with computation or stop it
case ContinueRunning => b // runs next computation
case StopRunning => State.pure(StopRunning) // returns current result without modifying it
}
}
.run(Machine(0))
.value
}
new attempt with Machine(0) and 1
new attempt with Machine(1) and 2
new attempt with Machine(3) and 5
new attempt with Machine(8) and 10
res23: (Machine, MachineState) = (Machine(8), StopRunning)
This eliminates the need to run the code within addCandy after a stop, but you cannot really get rid of the code that combines the states, so this reduce logic will still be applied at runtime n-1 times (where n is the size of your list), and that cannot be helped.
BTW If you take a closer look at Either you will find that it also computes n results and only then combines them so that it looks like it's circuit breaking, but in fact isn't. Sequence is combining a result of "parallel" computations but won't interrupt them if any of them failed.
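To make the early-stop behaviour easy to experiment with without any library, here is a sketch that models each step as a plain function Machine => (Machine, MachineState) instead of cats' State. The names Step and runUntilStop are made up for this illustration and are not part of cats:

```scala
object EarlyStop {
  sealed trait MachineState
  case object ContinueRunning extends MachineState
  case object StopRunning extends MachineState
  final case class Machine(candy: Int)

  // Plain-function stand-in for State[Machine, MachineState] (an assumption of this sketch).
  type Step = Machine => (Machine, MachineState)

  def addCandy(amount: Int): Step = machine => {
    val newCandyAmount = machine.candy + amount
    if (newCandyAmount > 10) (machine, StopRunning)
    else (machine.copy(candy = newCandyAmount), ContinueRunning)
  }

  // Runs the steps in order and returns as soon as one of them yields StopRunning.
  @annotation.tailrec
  def runUntilStop(
      steps: List[Step],
      machine: Machine,
      acc: List[MachineState] = Nil
  ): (Machine, List[MachineState]) =
    steps match {
      case Nil => (machine, acc.reverse)
      case step :: rest =>
        step(machine) match {
          case (next, StopRunning)     => (next, (StopRunning :: acc).reverse)
          case (next, ContinueRunning) => runUntilStop(rest, next, ContinueRunning :: acc)
        }
    }
}
```

With the amounts 1, 2, 5, 10, 20, 50 and Machine(0), only the first four steps run, and the result is (Machine(8), List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning)).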