Thread.join not behaving as I expected in scala - scala

In the code below I create 5 threads, have each of them print a message, sleep, and print another message. I start the threads in my main thread and then join all of them as well. I would expect the "all done" message to be printed only after all of the threads have finished. Yet "all done" gets printed before all the threads are done. Can someone help me understand this behavior?
Thanks.
Kent
Here is the code:
def ttest() = {
  val threads =
    for (i <- 1 to 5)
      yield new Thread() {
        override def run() {
          println("going to sleep")
          Thread.sleep(1000)
          println("awake now")
        }
      }
  threads.foreach(t => t.start())
  threads.foreach(t => t.join())
  println("all done")
}
Here is the output:
going to sleep
all done
going to sleep
going to sleep
going to sleep
going to sleep
awake now
awake now
awake now
awake now
awake now

It works if you transform the Range into a List:
def ttest() = {
  val threads =
    for (i <- (1 to 5).toList)
      yield new Thread() {
        override def run() {
          println("going to sleep")
          Thread.sleep(1000)
          println("awake now")
        }
      }
  threads.foreach(t => t.start())
  threads.foreach(t => t.join())
  println("all done")
}
The problem is that "1 to 5" is a Range, and ranges are not "strict", so to speak. In plain English, when you call the method map on a Range (which is what a for ... yield is translated to), it does not compute each value right then. Instead, it produces an object -- a RandomAccessSeq.Projection on Scala 2.7 -- which holds a reference to the function passed to map and another to the original range. Thus, when you use an element of the resulting range, the function you passed to map is applied to the corresponding element of the original range. And this happens each and every time you access any element of the resulting range.
This means that each time you refer to an element of threads, you are calling new Thread() { ... } anew. Since you traverse it twice, and the range has 5 elements, you are creating 10 threads. You start the first 5, and join the second 5.
If this is confusing, look at the example below:
scala> object test {
| val t = for (i <- 1 to 5) yield { println("Called again! "+i); i }
| }
defined module test
scala> test.t
Called again! 1
Called again! 2
Called again! 3
Called again! 4
Called again! 5
res4: scala.collection.generic.VectorView[Int,Vector[_]] = RangeM(1, 2, 3, 4, 5)
scala> test.t
Called again! 1
Called again! 2
Called again! 3
Called again! 4
Called again! 5
res5: scala.collection.generic.VectorView[Int,Vector[_]] = RangeM(1, 2, 3, 4, 5)
Each time I print t (by having Scala REPL print res4 and res5), the yielded expression gets evaluated again. It happens for individual elements too:
scala> test.t(1)
Called again! 2
res6: Int = 2
scala> test.t(1)
Called again! 2
res7: Int = 2
EDIT
As of Scala 2.8, Range will be strict, so the code in the question will work as originally expected.
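On current Scala versions the old non-strict behavior can still be reproduced by asking for a view explicitly. The following is only a minimal sketch, assuming Scala 2.13 or later; countEvaluations is a made-up helper name, not something from the question.

```scala
// Counts how often the mapped function runs when the result is traversed twice.
// Returns (evaluations through a lazy view, evaluations through a strict collection).
def countEvaluations(): (Int, Int) = {
  var viewCalls = 0
  var strictCalls = 0

  // Lazy: the function is stored and re-applied on every traversal.
  val lazyView = (1 to 5).view.map { i => viewCalls += 1; i }
  // Strict: the function is applied once per element, right here.
  val strict = (1 to 5).map { i => strictCalls += 1; i }

  lazyView.foreach(_ => ())
  lazyView.foreach(_ => ())
  strict.foreach(_ => ())
  strict.foreach(_ => ())

  (viewCalls, strictCalls) // (10, 5)
}
```

Two traversals of the view run the function 10 times, which is the same effect that created 10 threads in the question, while the strict collection runs it only 5 times.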

In your code, threads is deferred - each time you iterate it, the for generator expression is run anew. Thus, you actually create 10 threads there - the first foreach creates 5 and starts them, the second foreach creates 5 more (which are not started) and joins them - since they aren't running, join returns immediately. You should use toList on the result of for to make a stable snapshot.
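As a rough sketch of the "stable snapshot" idea (assuming Scala 2.12 or later for the lambda-to-Runnable conversion; runAndJoin and the 100 ms sleep are made up for illustration), the threads are built once into a List so that start and join see the same objects:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Starts n threads, joins them, and returns how many had finished
// by the time all joins returned.
def runAndJoin(n: Int): Int = {
  val finished = new AtomicInteger(0)

  // toList forces a strict collection: each Thread is created exactly once.
  val threads = (1 to n).toList.map { _ =>
    new Thread(() => {
      Thread.sleep(100)
      finished.incrementAndGet()
      ()
    })
  }

  threads.foreach(_.start())
  threads.foreach(_.join()) // joins the same threads that were started
  finished.get()
}
```

Calling runAndJoin(5) should return 5, because join only returns once each started thread has finished.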

Related

Why am I getting "Future(<not completed>)"

Beginning to learn scala (and futures).
Given the following code:
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
object MyFuture2 extends App {
  val twenty = Future{Thread.sleep(2000); 20}
  twenty onComplete {
    case Success(nums) => println("twenty onComplete: " + nums)
    case Failure(t) => println("An error has occurred: " + t.getMessage)
  }
  while (!twenty.isCompleted ) {
    Thread.sleep(500)
  }
  println("Step 1: " + twenty)
  val myresult = for {
    bb: Int <- twenty
  } yield {
    bb * 2
  }
  println("Step 2: " + myresult)
}
The output is
twenty onComplete: 20
Step 1: Future(Success(20))
Step 2: Future(<not completed>)
"Step 2: Future(<not completed>)" does not make sense because in the statement almost immediately before it, twenty is already complete; otherwise it wouldn't print "Step 1: Future(Success(20))". The expected value is 40 (20 * 2).
What is wrong with my understanding of how Futures are used? And how can Step 2 ever be not completed!
Thanks.
Note that the first part could also have been simply val twenty = Future.successful(20) (which executes synchronously on the same thread where you call it) and the result would have been the same (unless for some reason you get lucky with the scheduling).
What is happening is the following:
you have a Future that contains the value 20
you call map on that (which is the meaning of the for comprehension), which sends a callback that multiplies the content of the Future by 2 to the global ExecutionContext that you imported (which happens to be a fork-join pool)
Now two things can happen:
most likely, the execution of the current thread continues and you get Future(<not completed>) printed
maybe (but less likely), for some reason the current thread yields and the thread which multiplies by two is scheduled -- the number is multiplied by two and when control returns to the original thread, Future(Success(40)) is printed
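If you actually need the value at that point, one option is to block until the mapped Future completes. This is just a sketch of that idea; doubledResult and the 2 second timeout are arbitrary choices for illustration, and blocking like this is usually only appropriate at the edge of a program or in tests.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Maps an already completed Future and waits for the mapped result.
def doubledResult(): Int = {
  val twenty = Future.successful(20)
  // map schedules the callback on the ExecutionContext, so myresult
  // may still be incomplete right after this line.
  val myresult = twenty.map(_ * 2)
  // Await.result blocks the current thread until the value is available.
  Await.result(myresult, 2.seconds)
}
```

Here doubledResult() returns 40, whereas printing myresult directly after the map may or may not show a completed Future, depending on scheduling.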

Observables created at time interval

I was looking at the RxScala observables which are created at a given time interval:
val periodic: Observable[Long] = Observable.interval(100 millis)
periodic.foreach(x => println(x))
If I put this in a worksheet, I get this result:
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@2cce3493
res0: Unit = ()
This leaves me confused: What do the elements of periodic actually contain?
Do they contain some index?
Do they contain the time interval at which they were created?
As you can read here http://reactivex.io/documentation/operators/interval.html produced elements are Long values incrementing from 0.
As for your code and results:
Here, you create the observable, and get Observable[Long] assigned to periodic. Everything as expected.
scala> val periodic: Observable[Long] = Observable.interval(100 millis)
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@2cce3493
Here, you register a callback, i.e. what happens when a value is emitted. The return type of the foreach method is Unit, as it has no meaningful value to return and is called just for the side effect of registering the callback.
periodic.foreach(x => println(x))
res0: Unit = ()
You don't see actual values because execution stops. Try to insert Thread.sleep.
val periodic: Observable[Long] = Observable.interval(100.millis)
periodic.foreach(x => println(x))
Thread.sleep(1000)
Gives output similar to
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@207cb62f
res0: Unit = ()
0
1
2
3
4
5
6
7
8
9
res1: Unit = ()
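Thread.sleep works but guesses at the timing. The sketch below shows the same "keep the caller alive until the asynchronous values have arrived" idea with a CountDownLatch. Note that it deliberately uses plain java.util.concurrent instead of RxScala so that it runs without extra dependencies; collectTicks is a made-up name, and it assumes Scala 2.13 or later for scala.jdk.CollectionConverters.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicLong
import scala.jdk.CollectionConverters._

// Emits 0, 1, 2, ... every periodMillis on a background thread and
// blocks the caller until `count` values have been seen.
def collectTicks(count: Int, periodMillis: Long): List[Long] = {
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  val latch = new CountDownLatch(count)
  val counter = new AtomicLong(0)
  val seen = new ConcurrentLinkedQueue[Long]()

  val task: Runnable = () => {
    if (latch.getCount > 0) {
      seen.add(counter.getAndIncrement())
      latch.countDown()
    }
  }
  scheduler.scheduleAtFixedRate(task, periodMillis, periodMillis, TimeUnit.MILLISECONDS)

  try {
    // Wait for the values instead of sleeping for a guessed duration.
    latch.await(count * periodMillis + 2000, TimeUnit.MILLISECONDS)
  } finally scheduler.shutdownNow()

  seen.asScala.toList
}
```

For example, collectTicks(5, 20) returns the first five values, counting up from 0 just like the interval observable does.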
The problem is that interval is asynchronous, so you're not waiting for the result.
Another way to wait for the result is to use TestSubscriber
def interval(): Unit = {
  addHeader("Interval observable")
  Observable.interval(createDuration(100))
    .map(n => "New item emitted:" + n)
    .doOnNext(s => print("\n" + s))
    .subscribe();
  new TestSubscriber[Subscription].awaitTerminalEvent(1000, TimeUnit.MILLISECONDS);
}
You can see more examples here https://github.com/politrons/reactiveScala

Observable takeUntil misbehaving

I'm trying to implement a helper method on observables that returns a new observable emitting only the values until a timeout is reached:
implicit class ObservableOps[T](obs: Observable[T]) {
  def timedOut(totalSec: Long): Observable[T] = {
    require(totalSec >= 0)
    val timeOut = Observable.interval(totalSec seconds)
      .filter(_ > 0)
      .take(1)
    obs.takeUntil(timeOut)
  }
}
I wrote a test for it, which creates an observable emitting its first value long after the timeout. However, the resulting observable still seems to include the late value:
test("single value too late for timeout") {
  val obs = Observable({Thread.sleep(8000); 1})
  val list = obs.timedOut(1).toBlockingObservable.toList
  assert(list === List())
}
The test fails with the message List(1) did not equal List(). What am I doing wrong?
I suspect that your Thread.sleep(8000) is actually blocking your main thread. Did you try adding a println after val obs in your test to see whether it appears right after the test starts?
What's happening here is that your declaration of obs blocks your program for 8 seconds, and only then do you create your new observable using timedOut, so timedOut sees the emitted value as soon as it's called.
Using rx-scala 0.23.0 your timedOut method works (except that Observable.interval doesn't emit immediately, so the filter(_ > 0) should be removed).
val obs = Observable.just(42).delay(900.millis)
val list = obs.timedOut(1).toBlocking.toList
println(list) // prints List(42)
val obs = Observable.just(42).delay(1100.millis)
val list = obs.timedOut(1).toBlocking.toList
println(list) // prints List()
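The blocking in the original test comes down to when the argument expression is evaluated. The sketch below illustrates that with plain Scala rather than RxScala: strictSource and deferredSource are hypothetical stand-ins for "a source built from an already computed value" and "a source that computes its value later", not real library functions.

```scala
import scala.collection.mutable.ListBuffer

// Records the order in which argument blocks are evaluated.
def evaluationOrder(): List[String] = {
  val log = ListBuffer.empty[String]

  // By-value parameter: the argument block runs before the source exists.
  def strictSource(value: Int): () => Int = () => value
  // By-name parameter: the argument block runs only when the source is used.
  def deferredSource(value: => Int): () => Int = () => value

  val a = strictSource { log += "strict evaluated"; 1 }
  log += "strict source built"
  val b = deferredSource { log += "deferred evaluated"; 2 }
  log += "deferred source built"

  a()
  b()
  log.toList
}
```

The log shows "strict evaluated" before "strict source built", which mirrors how {Thread.sleep(8000); 1} ran to completion before obs (and therefore the timeout) was even set up.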

scala - lazy iterator calls next too many times?

I'm trying to build a lazy iterator that pulls from a blocking queue, and have encountered a weird problem where next() appears to be called more times than expected. Because my queue is blocking, this causes my application to get stuck in certain cases.
Some simplified sample code:
"infinite iterators" should {
  def mkIter = new Iterable[Int] {
    var i = 0
    override def iterator: Iterator[Int] = {
      new Iterator[Int] {
        override def hasNext: Boolean = true
        override def next(): Int = {
          i = i + 1
          i
        }
      }
    }
    override def toString(): String = "lazy"
  }
  "return subsets - not lazy" in {
    val x = mkIter
    x.take(2).toList must equal(List(1, 2))
    x.take(2).toList must equal(List(3, 4))
  }
  "return subsets - lazy" in {
    val x = mkIter
    x.view.take(2).toList must equal(List(1, 2))
    x.view.take(2).toList must equal(List(3, 4))
  }
}
In the example above, the lazy test fails because the second call to take(2) returns List(4, 5).
Given that I see this behaviour with both Scala 2.10 and 2.11, I suspect the error is mine, but I'm not sure what I'm missing.
take invalidates iterators. See the code example at the top of http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.Iterator
As explained by @dlwh, Scala is explicitly documented to not allow reuse of an iterator after calling take(Int). That said, a way to implement my core use case is to create a new stream each time I want to get another element out of the iterator.
Adding to my example in the original question:
"return subsets - streams" in {
  val x = mkIter
  x.toStream.take(2).toList must equal(List(1, 2))
  x.toStream.take(2).toList must equal(List(3, 4))
}
Note that toStream has the side effect of calling next() on the iterator, so this is only safe if you know you will be taking at least one item off of the stream. The advantage streams have over lazy views is that they do not call next() more than the minimum number of times needed.
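A different option, not from the answers above and only a sketch, is to pull from the iterator by hand so that the number of next() calls is explicit. takeExactly is a made-up helper name.

```scala
// Pulls at most n elements, calling next() exactly once per element returned.
def takeExactly[A](it: Iterator[A], n: Int): List[A] = {
  val out = List.newBuilder[A]
  var remaining = n
  while (remaining > 0 && it.hasNext) {
    out += it.next()
    remaining -= 1
  }
  out.result()
}

// Usage: the same iterator can be drained in consecutive batches.
val numbers = Iterator.from(1)
val firstBatch = takeExactly(numbers, 2)  // List(1, 2)
val secondBatch = takeExactly(numbers, 2) // List(3, 4)
```

Because nothing is read ahead, this never blocks on an element you did not ask for, which matters when next() is backed by a blocking queue.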

Cats a List of State Monads "fail fast" on <...>.sequence method?

Let's say we have a list of states and we want to sequence them:
import cats.data.State
import cats.instances.list._
import cats.syntax.traverse._
trait MachineState
case object ContinueRunning extends MachineState
case object StopRunning extends MachineState
case class Machine(candy: Int)
val addCandy: Int => State[Machine, MachineState] = amount =>
  State[Machine, MachineState] { machine =>
    val newCandyAmount = machine.candy + amount
    if(newCandyAmount > 10)
      (machine, StopRunning)
    else
      (machine.copy(newCandyAmount), ContinueRunning)
  }
List(addCandy(1),
     addCandy(2),
     addCandy(5),
     addCandy(10),
     addCandy(20),
     addCandy(50)).sequence.run(Machine(0)).value
Result would be
(Machine(8),List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning, StopRunning, StopRunning))
It's obvious that the last 3 steps are redundant. Is there a way to make this sequence stop early? Here, when StopRunning gets returned, I would like to stop. For example, a list of Eithers would fail fast and stop the sequence early if needed (because it acts like a monad).
For the record - I do know that it is possible to simply write a tail recursion that checks each state as it is run and stops the recursion once some condition is satisfied. I just want to know whether there is a more elegant way of doing this. The recursion solution seems like a lot of boilerplate to me; am I wrong?
Thank you!:))
There are 2 things that need to be done here.
The first is understanding what is actually happening:
State takes some state value, threads it through many composed calls, and in the process produces some output value as well
in your case Machine is the state threaded between calls, while MachineState is the output of a single operation
sequence (usually) takes a collection (here List) of some parametric type (here State[Machine, _]) and flips the nesting (here: List[State[Machine, _]] -> State[Machine, List[_]]) (_ is the gap that you'll be filling with your type)
the result is that you'll thread the state (Machine(0)) through all the functions, while combining the output of each of them (MachineState) into a list of outputs
// ammonite
// to better see how many times things are being run
@ {
  val addCandy: Int => State[Machine, MachineState] = amount =>
    State[Machine, MachineState] { machine =>
      val newCandyAmount = machine.candy + amount
      println("new attempt with " + machine + " and " + amount)
      if(newCandyAmount > 10)
        (machine, StopRunning)
      else
        (machine.copy(newCandyAmount), ContinueRunning)
    }
  }
addCandy: Int => State[Machine, MachineState] = ammonite.$sess.cmd24$$$Lambda$2669/1733815710@25c887ca
@ List(addCandy(1),
       addCandy(2),
       addCandy(5),
       addCandy(10),
       addCandy(20),
       addCandy(50)).sequence.run(Machine(0)).value
new attempt with Machine(0) and 1
new attempt with Machine(1) and 2
new attempt with Machine(3) and 5
new attempt with Machine(8) and 10
new attempt with Machine(8) and 20
new attempt with Machine(8) and 50
res25: (Machine, List[MachineState]) = (Machine(8), List(ContinueRunning, ContinueRunning, ContinueRunning, StopRunning, StopRunning, StopRunning))
In other words, if what you want is circuit breaking, then .sequence might not be what you want.
As a matter of fact, you probably want something else: to combine a list of A => (A, B) functions into one function that skips the next computation if the result of a computation is StopRunning (in your code nothing tells the code what the circuit-break condition is or how it should be performed). I would suggest doing it explicitly with some other function, e.g.:
@ {
  List(addCandy(1),
       addCandy(2),
       addCandy(5),
       addCandy(10),
       addCandy(20),
       addCandy(50))
    .reduce { (a, b) =>
      a.flatMap {
        // flatMap and map uses MachineState
        // - the second parameter is the result after all!
        // we are pattern matching on it to decide if we want to
        // proceed with computation or stop it
        case ContinueRunning => b // runs next computation
        case StopRunning => State.pure(StopRunning) // returns current result without modifying it
      }
    }
    .run(Machine(0))
    .value
  }
new attempt with Machine(0) and 1
new attempt with Machine(1) and 2
new attempt with Machine(3) and 5
new attempt with Machine(8) and 10
res23: (Machine, MachineState) = (Machine(8), StopRunning)
This eliminates the need to run the code within addCandy for the remaining steps - but you cannot really get rid of the code that combines the states together, so this reduce logic will be applied at runtime n-1 times (where n is the size of your list), and that cannot be helped.
BTW, if you take a closer look at Either, you will find that it also computes n results and only then combines them, so that it looks like it's circuit breaking but in fact isn't. sequence combines the results of "parallel" computations but won't interrupt them if any of them fails.
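For comparison, here is roughly what the tail-recursive version mentioned in the question could look like. It is only a sketch and deliberately leaves out cats: Step is a plain function standing in for State[Machine, MachineState], and runUntilStop and its step counter are made up for illustration.

```scala
sealed trait MachineState
case object ContinueRunning extends MachineState
case object StopRunning extends MachineState
final case class Machine(candy: Int)

// Plain-function stand-in for State[Machine, MachineState].
type Step = Machine => (Machine, MachineState)

def addCandy(amount: Int): Step = machine => {
  val newCandyAmount = machine.candy + amount
  if (newCandyAmount > 10) (machine, StopRunning)
  else (machine.copy(candy = newCandyAmount), ContinueRunning)
}

// Runs steps in order and stops right after the first StopRunning.
// Returns the final machine, the last output, and how many steps ran.
// An empty list is treated as ContinueRunning (an arbitrary choice).
@annotation.tailrec
def runUntilStop(steps: List[Step], machine: Machine, ran: Int = 0): (Machine, MachineState, Int) =
  steps match {
    case Nil => (machine, ContinueRunning, ran)
    case step :: rest =>
      step(machine) match {
        case (next, ContinueRunning) => runUntilStop(rest, next, ran + 1)
        case (next, StopRunning)     => (next, StopRunning, ran + 1)
      }
  }
```

With the amounts from the question (1, 2, 5, 10, 20, 50) and Machine(0), this runs 4 steps and ends with Machine(8) and StopRunning, so neither the remaining steps nor any combining logic is evaluated for them.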