I was looking at RxScala observables that emit values at a given time interval:
import rx.lang.scala.Observable
import scala.concurrent.duration._
val periodic: Observable[Long] = Observable.interval(100.millis)
periodic.foreach(x => println(x))
If I put this in a worksheet, I get this result:
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@2cce3493
res0: Unit = ()
This leaves me confused: What do the elements of periodic actually contain?
Do they contain some index?
Do they contain the time interval at which they were created?
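As you can read here http://reactivex.io/documentation/operators/interval.html, the produced elements are Long values incrementing from 0: the first tick emits 0, the second 1, and so on. They are a running index, not timestamps or intervals.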
As for your code and results:
Here, you create the observable and assign the resulting Observable[Long] to periodic. Everything is as expected.
scala> val periodic: Observable[Long] = Observable.interval(100 millis)
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@2cce3493
Here, you register a callback, i.e. what happens when a value is emitted. The foreach method returns Unit because it has no meaningful value to return; it is called only for its side effect of registering the callback.
periodic.foreach(x => println(x))
res0: Unit = ()
You don't see any values because the worksheet finishes executing before the first value is emitted. Try inserting a Thread.sleep:
val periodic: Observable[Long] = Observable.interval(100.millis)
periodic.foreach(x => println(x))
Thread.sleep(1000)
This gives output similar to:
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@207cb62f
res0: Unit = ()
0
1
2
3
4
5
6
7
8
9
res1: Unit = ()
The problem is that interval emits asynchronously, on another thread, so nothing in your code waits for the results.
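To make the asynchrony concrete without depending on RxScala, here is a small plain-JVM sketch. It is only an analogy: simulatedInterval is a hypothetical stand-in built on java.util.concurrent, not the RxScala API. A scheduler thread emits 0, 1, 2, ... like interval, registering the callback returns immediately, and the caller sees values only because it explicitly blocks waiting for them.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for Observable.interval(period); this is NOT the RxScala API.
// A scheduler thread emits 0, 1, 2, ... every `periodMillis` and passes each value
// to `onNext`; the returned function plays the role of unsubscribing.
def simulatedInterval(periodMillis: Long)(onNext: Long => Unit): () => Unit = {
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  val counter = new AtomicLong(0L)
  val tick: Runnable = () => onNext(counter.getAndIncrement())
  scheduler.scheduleAtFixedRate(tick, periodMillis, periodMillis, TimeUnit.MILLISECONDS)
  () => { scheduler.shutdownNow(); () }
}

// Registering the callback returns immediately, just like periodic.foreach(...).
val received = new LinkedBlockingQueue[Long]()
val stop = simulatedInterval(20L)(n => received.put(n))

// Nothing has necessarily arrived yet, so the caller must wait explicitly:
// poll(timeout) blocks until the next value is available (or the timeout expires).
val firstFive: List[Long] = List.fill(5)(received.poll(5, TimeUnit.SECONDS))
stop()
println(firstFive) // List(0, 1, 2, 3, 4)
```

Without the blocking poll calls, the script would reach its end (and could exit) before the first tick, which is what happens in the worksheet.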
Another way to wait for the result is to use a TestSubscriber:
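def interval(): Unit = {
  // addHeader and createDuration are not shown; presumably helpers from the linked repository
  addHeader("Interval observable")
  Observable.interval(createDuration(100))
    .map(n => "New item emitted:" + n)
    .doOnNext(s => print("\n" + s))
    .subscribe()
  // This TestSubscriber is never subscribed to anything, so awaitTerminalEvent
  // simply blocks for the full 1000 ms timeout, much like Thread.sleep(1000)
  new TestSubscriber[Subscription].awaitTerminalEvent(1000, TimeUnit.MILLISECONDS)
}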
You can see more examples here https://github.com/politrons/reactiveScala
Background
I have the following scenario. I want to execute a method of a class from an external library repeatedly, until either a timeout is reached or a result condition (compared with the previous result) is met. Furthermore, I want to collect the return values, including the one from the "failed" run (the run whose result fails the condition and should interrupt further execution).
So far I have accomplished this by initializing an empty var result: Result and a var stop: Boolean, and using a while loop that runs while the conditions hold and modifies the outer state. I would like to get rid of this and use a functional approach.
Some context: each run is expected to take 0 to 60 minutes, and the total time of the iteration is capped at 60 minutes. Theoretically there is no bound on how many times it executes in this period, but in practice it is generally 2-60 times.
The problem is that the runs take a long time, so I need to stop execution as soon as a stop condition is met. My idea is to use some kind of lazy Iterator or Stream coupled with scanLeft and Option.
Code
Boilerplate
This code isn't particularly relevant, but it is used in my sample approaches and provides identical but somewhat random pseudo-runtime results.
import scala.collection.mutable.ListBuffer
import scala.util.Random
val r = Random
r.setSeed(1)
val sleepingTimes: Seq[Int] = (1 to 601)
  .map(x => Math.pow(2, x).toInt * r.nextInt(100))
  .toList
  .filter(_ > 0)
  .sorted
val randomRes = r.shuffle((0 to 600).map(x => r.nextInt(10)).toList)
case class Result(a: Int, slept: Int)
class Lib() {
  def run(i: Int): Result = {
    println(s"running ${i}")
    Thread.sleep(sleepingTimes(i))
    Result(randomRes(i), sleepingTimes(i))
  }
}
case class Baz(i: Int, result: Result)
val lib = new Lib()
val timeout = 10 * 1000
While approach
val iteratorStart = System.currentTimeMillis()
val iterator = for {
  i <- (0 to 600).iterator
  if System.currentTimeMillis() < iteratorStart + timeout
  f = Baz(i, lib.run(i))
} yield f
val iteratorBuffer = ListBuffer[Baz]()
if (iterator.hasNext) iteratorBuffer.append(iterator.next())
var run = true
while (run && iterator.hasNext) {
  val next = iterator.next()
  run = iteratorBuffer.last.result.a < next.result.a
  iteratorBuffer.append(next)
}
Stream approach (Scala 2.12)
Full example
val streamStart = System.currentTimeMillis()
val stream = for {
  i <- (0 to 600).toStream
  if System.currentTimeMillis() < streamStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = stream.headOption
val tail = if (stream.nonEmpty) stream.tail else stream
val streamVersion = (tail
  .scanLeft((head, true))((x, y) => {
    if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
    else (Some(y), true)
  })
  .takeWhile {
    case (baz, continue) =>
      if (!baz.eq(head)) last = baz
      continue
  }
  .map(_._1)
  .toList :+ last).flatten
LazyList approach (Scala 2.13)
Full example
val lazyListStart = System.currentTimeMillis()
val lazyList = for {
  i <- (0 to 600).to(LazyList)
  if System.currentTimeMillis() < lazyListStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = lazyList.headOption
val tail = if (lazyList.nonEmpty) lazyList.tail else lazyList
val lazyListVersion = (tail
  .scanLeft((head, true))((x, y) => {
    if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
    else (Some(y), true)
  })
  .takeWhile {
    case (baz, continue) =>
      if (!baz.eq(head)) last = baz
      continue
  }
  .map(_._1)
  .toList :+ last).flatten
Result
Both approaches appear to yield the correct end result:
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)))
and they interrupt execution as desired.
Edit: The desired outcome is to not execute the next iteration, but still return the result of the iteration that caused the interruption. Thus the desired result is:
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)), Baz(2,Result(2,256)))
and lib.run(i) should only run 3 times.
This is achieved by the while approach and by the LazyList approach, but not by the Stream approach, which executes lib.run 4 times (bad!).
Question
Is there another stateless approach, which is hopefully more elegant?
Edit
I realized my examples were faulty: they did not return the "failing" result, which they should, and they kept executing beyond the stop condition. I have rewritten the code and examples, but I believe the spirit of the question is unchanged.
I would use something higher level, like fs2.
(or any other high-level streaming library, such as Monix observables, Akka Streams or ZIO ZStreams)
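import cats.effect.{Concurrent, Timer}
import fs2.Stream
import scala.concurrent.duration.FiniteDuration
// Written against fs2 2.x / cats-effect 2 (cats-effect 3 no longer has Timer)
def runUntilOrTimeout[F[_]: Concurrent: Timer, A](work: F[A], timeout: FiniteDuration)
                                                 (stop: (A, A) => Boolean): Stream[F, A] = {
  // Emits nothing and completes after `timeout`; used to halt the whole stream
  val interrupt =
    Stream.sleep_[F](timeout)
  val run =
    Stream
      .repeatEval(work)  // evaluate `work` again and again
      .zipWithPrevious   // pair each result with the previous one
      .takeThrough {     // keep going, but include the first result that fails
        case (Some(p), c) if stop(p, c) => false
        case _                          => true
      }
      .map { case (_, c) => c }
  // Whichever side finishes first halts the combined stream
  run mergeHaltBoth interrupt
}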
You can see it working here.
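If adding fs2 is not an option, a similar stateless result can be sketched with the standard library alone (Scala 2.13+, for Iterator.unfold). This is not from the fs2 answer: runUntil, its parameters and the example numbers are hypothetical. The iterator's state carries the next index, the previous result and a "finished" flag, so work runs lazily, stops right after the first result for which stop(previous, current) holds, and still returns that result.

```scala
// Hypothetical helper (not from the question or the fs2 answer).
// Runs work(0), work(1), ... lazily. It stops after the first result for which
// stop(previous, current) is true (that result is still included), or when the
// deadline has passed before the next run, or after maxRuns runs.
def runUntil[A](work: Int => A, deadlineMillis: Long, maxRuns: Int)
               (stop: (A, A) => Boolean): List[A] =
  Iterator
    .unfold[A, (Int, Option[A], Boolean)]((0, None, false)) {
      case (i, previous, finished) =>
        if (finished || i >= maxRuns || System.currentTimeMillis() >= deadlineMillis) None
        else {
          val current = work(i) // the only place where the expensive call happens
          val stopNow = previous.exists(p => stop(p, current))
          Some((current, (i + 1, Some(current), stopNow)))
        }
    }
    .toList

// Example: results 4, 5, 2, ... where "previous > current" should end the loop.
var calls = 0
val results = runUntil[Int](
  i => { calls += 1; Vector(4, 5, 2, 7, 9)(i) },
  deadlineMillis = System.currentTimeMillis() + 10000L,
  maxRuns = 5
)((previous, current) => previous > current)

println(results) // List(4, 5, 2)
println(calls)   // 3
```

Unlike mergeHaltBoth in the fs2 version, this sketch cannot interrupt a run that is already in progress; like the question's for-comprehension guard, it only checks the deadline before starting the next run.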
I have an Akka Stream application with a single flow/graph. I want to measure the flow rate at the source and log it every 5 seconds, like 'received 3 messages in the last 5 seconds'. I tried:
someOtherFlow
  .groupedWithin(Integer.MAX_VALUE, 5.seconds)
  .runForeach(seq =>
    log.debug(s"received ${seq.length} messages in the last 5 seconds")
  )
but it only logs when there are messages; it never emits an empty group when there are 0 messages. I want the 0s as well. Is this possible?
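To pin down the desired output, here is a simplified, deterministic model in plain Scala (not Akka Streams code; countsPerWindow is a hypothetical helper working on event timestamps). Every window gets an entry, including windows in which nothing arrived, whereas grouping only the events that exist, as groupedWithin effectively does, produces no entry for an empty window.

```scala
// Count events per fixed window of `windowMillis`, from time 0 up to `endMillis`.
// Windows without events still report 0.
def countsPerWindow(eventTimesMillis: Seq[Long], windowMillis: Long, endMillis: Long): Vector[Int] = {
  val windows = (endMillis / windowMillis).toInt
  val counts = Array.fill(windows)(0)
  for (t <- eventTimesMillis) {
    val w = (t / windowMillis).toInt
    if (w >= 0 && w < windows) counts(w) += 1
  }
  counts.toVector
}

// Two messages in the first 5 s, none in the second window, one in the third.
val arrivals = Seq(100L, 2500L, 11000L)
val withZeros = countsPerWindow(arrivals, windowMillis = 5000L, endMillis = 15000L)

// Grouping only the events that exist loses the empty window entirely.
val nonEmptyOnly = arrivals.groupBy(_ / 5000L).map { case (w, es) => w -> es.size }

println(withZeros) // Vector(2, 0, 1)
println(nonEmptyOnly)
```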
You could try something like:
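import akka.NotUsed
import akka.stream.scaladsl.Source
import scala.concurrent.duration._
src
  .conflateWithSeed(_ => 1) { case (acc, _) => acc + 1 } // count elements between ticks
  .zip(Source.tick(5.seconds, 5.seconds, NotUsed))       // release the count on each tick
  .map(_._1)                                             // keep only the count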
This should aggregate (here, count) your elements until the tick releases them. It is inspired by an example in the docs.
On a different note, if you need this for monitoring purposes, you could use a third-party tool such as Kamon.
A sample of logging with Akka Streams:
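import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.{ActorMaterializer, Attributes}
import akka.stream.scaladsl.{Sink, Source}
import akka.{Done, NotUsed}
import scala.concurrent.{ExecutionContextExecutor, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Random, Success}
implicit val system: ActorSystem = ActorSystem("StreamLoggingActorSystem")
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val adapter: LoggingAdapter = Logging(system, "customLogger")
implicit val ec: ExecutionContextExecutor = system.dispatcher
def randomInt = Random.nextInt()
// An endless source of random integers
val source = Source.repeat(NotUsed).map(_ => randomInt)
// Group everything received within 5 seconds and log the size of each group
val logger = source
  .groupedWithin(Integer.MAX_VALUE, 5.seconds)
  .log("in the last 5 seconds number of messages received : ", _.size)
  .withAttributes(
    Attributes.logLevels(
      onElement = Logging.WarningLevel,
      onFinish = Logging.InfoLevel,
      onFailure = Logging.DebugLevel
    )
  )
val sink = Sink.ignore
val result: Future[Done] = logger.runWith(sink)
result.onComplete {
  case Success(_) =>
    println("end of stream")
  case Failure(_) =>
    println("stream ended with failure")
}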
The source code is here.
Extending Stefano's answer a little, I created the following flows:
import akka.NotUsed
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.duration._
def flowRate[T](metric: T => Int = (_: T) => 1, outputDelay: FiniteDuration = 1.second): Flow[T, Double, NotUsed] =
  Flow[T]
    .conflateWithSeed(metric(_)) { case (acc, x) => acc + metric(x) }
    .zip(Source.tick(outputDelay, outputDelay, NotUsed))
    .map(_._1.toDouble / outputDelay.toUnit(SECONDS))
// Assumes a `log` (e.g. a LoggingAdapter) is in scope
def printFlowRate[T](name: String, metric: T => Int = (_: T) => 1,
                     outputDelay: FiniteDuration = 1.second): Flow[T, T, NotUsed] =
  Flow[T]
    .alsoTo(flowRate[T](metric, outputDelay)
      .to(Sink.foreach(r => log.info(s"Rate($name): $r"))))
The first converts the stream into a rate per second. You can supply a metric that assigns a value to each element passing through; for example, to measure the rate of characters in a flow of strings, you could pass _.length. The second parameter is the delay between flow rate reports (it defaults to one second).
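The arithmetic in flowRate's final map can be checked in isolation: the summed metric for one period divided by the period length in seconds. A minimal sketch in plain Scala (ratePerSecond is a hypothetical helper, not an Akka API), assuming all elements fall within one reporting period:

```scala
import scala.concurrent.duration._

// Summed metric over one reporting period, divided by the period length in seconds,
// mirroring `.map(_._1.toDouble / outputDelay.toUnit(SECONDS))` in flowRate.
def ratePerSecond[T](elements: Seq[T], metric: T => Int, period: FiniteDuration): Double =
  elements.map(metric).sum.toDouble / period.toUnit(SECONDS)

// Character rate: 5 + 9 + 6 = 20 characters in a 10-second period.
val charRate = ratePerSecond[String](Seq("hello", "streaming", "world!"), _.length, 10.seconds)

// Default-style metric (every element counts as 1): 5 elements in 5 seconds.
val elementRate = ratePerSecond[Int](Seq(1, 2, 3, 4, 5), _ => 1, 5.seconds)

println(charRate)    // 2.0
println(elementRate) // 1.0
```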
The second flow can be used inline to print the flow rate for debugging purposes without modifying the elements passing through the stream, e.g.:
stringFlow
  .via(printFlowRate[String]("Char rate", _.length, 10.seconds))
  .map(_.toLowerCase) // still a string
  ...
which logs, every 10 seconds, the average rate of characters per second.
N.B. The flowRate above, however, lags one outputDelay period behind, because zip consumes an element from conflate and then waits for a tick (this is easy to verify by putting a log after conflateWithSeed). To obtain a non-lagging flow rate (metric), one can duplicate each tick, forcing zip to consume a second, fresh element from conflate, and then aggregate both, i.e.:
Flow[T]
  .conflateWithSeed(metric(_)) { case (acc, x) => acc + metric(x) }
  .zip(Source.tick(outputDelay, outputDelay, NotUsed)
    .mapConcat(_ => List(NotUsed, NotUsed)) // two elements per tick
  )
  .grouped(2).map {
    case Seq((a, _), (b, _)) => a + b       // sum both conflated values
  }
  .map(_.toDouble / outputDelay.toUnit(SECONDS))
I'm trying to implement a helper method on observables that returns a new observable emitting only the values until a timeout is reached:
import rx.lang.scala.Observable
import scala.concurrent.duration._
implicit class ObservableOps[T](obs: Observable[T]) {
  def timedOut(totalSec: Long): Observable[T] = {
    require(totalSec >= 0)
    val timeOut = Observable.interval(totalSec.seconds)
      .filter(_ > 0)
      .take(1)
    obs.takeUntil(timeOut)
  }
}
I wrote a test for it, which creates an observable emitting its first value long after the timeout. However, the resulting observable still seems to include the late value:
test("single value too late for timeout") {
  val obs = Observable({Thread.sleep(8000); 1})
  val list = obs.timedOut(1).toBlockingObservable.toList
  assert(list === List())
}
The test fails with the message List(1) did not equal List(). What am I doing wrong?
I suspect that your Thread.sleep(8000) is actually blocking your main thread. Did you try adding a println after val obs in your test to see whether it appears right after the test starts?
What's happening here is that your declaration of obs blocks your program for 8 seconds; only then do you create your new observable using timedOut, so timedOut sees the emitted value as soon as it is called.
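This is ordinary Scala evaluation order rather than anything RxScala-specific: unless a parameter is declared by-name (=> T), its argument expression is evaluated before the method body runs. A minimal illustration with hypothetical byValue/byName helpers that record the order of events:

```scala
import scala.collection.mutable.ListBuffer

// Records the order in which things happen.
val order = ListBuffer.empty[String]
def trace(label: String): Int = { order += label; 1 }

// By-value parameter: the argument is evaluated BEFORE the body runs, just as a
// {Thread.sleep(8000); 1} argument would run (and block) before the call.
def byValue(x: Int): () => Int = { order += "byValue body"; () => x }

// By-name parameter: the argument is evaluated only when x is actually used.
def byName(x: => Int): () => Int = { order += "byName body"; () => x }

val strictThunk = byValue(trace("strict argument"))
val deferredThunk = byName(trace("deferred argument"))
deferredThunk()

println(order.toList)
// List(strict argument, byValue body, byName body, deferred argument)
```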
Using rx-scala 0.23.0, your timedOut method works, except that Observable.interval doesn't emit immediately: its first value (0) arrives after one period, so the filter(_ > 0) should be removed (with it, the timeout only fires after two periods).
val obs = Observable.just(42).delay(900.millis)
val list = obs.timedOut(1).toBlocking.toList
println(list) // prints List(42)
val obs = Observable.just(42).delay(1100.millis)
val list = obs.timedOut(1).toBlocking.toList
println(list) // prints List()
I have some code that is not performance-sensitive and was trying to make stacks easier to follow by using fewer futures. This resulted in some code similar to the following:
val fut = Future {
  val r = Future.traverse(ips) { ip =>
    val httpResponse: Future[HttpResponse] = asyncHttpClient.exec(req)
    httpResponse.andThen {
      case x => logger.info(s"received response here: $x")
    }
    httpResponse.map(r => (ip, r))
  }
  r.andThen { case x => logger.info(s"final result: $x") }
  Await.result(r, 10 seconds)
}
fut.andThen { case x => logger.info(s"finished $x") }
logger.info("here nonblocking")
As expected, internal logging in the HTTP client shows that the response returns immediately, but the callbacks executing logger.info(s"received response here: $x") and logger.info(s"final result: $x") do not execute until after Await.result(r, 10 seconds) times out. Looking at the log output, which includes thread ids, the callbacks are executed on the same thread (ForkJoinPool-1-worker-3) that is awaiting the result, creating a deadlock. My understanding was that ExecutionContext.global creates extra threads on demand when it runs out of threads. Is this not the case? Only two threads from the global fork-join pool (1 and 3) appear to produce any output in the logs. Can anyone explain this?
As for fixes, I know the best way is perhaps to separate blocking and non-blocking work into different thread pools, but I was hoping to avoid this extra bookkeeping by using a dynamically sized thread pool. Is there a better solution?
If you want the pool to grow (temporarily) when threads are blocked, use concurrent.blocking. Here, you have used up all the threads doing I/O and then scheduled more work with map and andThen (whose results you don't use).
More info: your "final result" is expected to execute after the traverse, so that is normal.
An example using blocking (although there is probably an existing SO Q&A for it):
scala> import concurrent._ ; import ExecutionContext.Implicits._
scala> val is = 1 to 100 toList
scala> def db = s"${Thread.currentThread}"
db: String
scala> def f(i: Int) = Future { println(db) ; Thread.sleep(1000L) ; 2 * i }
f: (i: Int)scala.concurrent.Future[Int]
scala> Future.traverse(is)(f _)
Thread[ForkJoinPool-1-worker-13,5,main]
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-9,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-5,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
Thread[ForkJoinPool-1-worker-15,5,main]
Thread[ForkJoinPool-1-worker-11,5,main]
res0: scala.concurrent.Future[List[Int]] = scala.concurrent.impl.Promise$DefaultPromise@3a4b0e5d
[etc, N at a time]
versus overly parallel:
scala> def f(i: Int) = Future { blocking { println(db) ; Thread.sleep(1000L) ; 2 * i }}
f: (i: Int)scala.concurrent.Future[Int]
scala> Future.traverse(is)(f _)
Thread[ForkJoinPool-1-worker-13,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
res1: scala.concurrent.Future[List[Int]] = scala.concurrent.impl.Promise$DefaultPromise@759d81f3
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-25,5,main]
Thread[ForkJoinPool-1-worker-29,5,main]
Thread[ForkJoinPool-1-worker-19,5,main]
scala> Thread[ForkJoinPool-1-worker-23,5,main]
Thread[ForkJoinPool-1-worker-27,5,main]
Thread[ForkJoinPool-1-worker-21,5,main]
Thread[ForkJoinPool-1-worker-31,5,main]
Thread[ForkJoinPool-1-worker-17,5,main]
Thread[ForkJoinPool-1-worker-49,5,main]
Thread[ForkJoinPool-1-worker-45,5,main]
Thread[ForkJoinPool-1-worker-59,5,main]
Thread[ForkJoinPool-1-worker-43,5,main]
Thread[ForkJoinPool-1-worker-57,5,main]
Thread[ForkJoinPool-1-worker-37,5,main]
Thread[ForkJoinPool-1-worker-51,5,main]
Thread[ForkJoinPool-1-worker-35,5,main]
Thread[ForkJoinPool-1-worker-53,5,main]
Thread[ForkJoinPool-1-worker-63,5,main]
Thread[ForkJoinPool-1-worker-47,5,main]
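A self-contained check of the same effect, sized so that it can only finish if the pool grows. It is a sketch assuming the default global ExecutionContext; the task count is arbitrary (a little above the number of cores, which is the pool's default parallelism). Each task waits until all tasks have started, so they must all be running at the same time:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// More tasks than the global pool's parallelism (one thread per core by default).
val tasks = Runtime.getRuntime.availableProcessors() + 32
val allStarted = new CountDownLatch(tasks)

val futures = (1 to tasks).map { _ =>
  Future {
    allStarted.countDown()
    // Wrapping the wait in `blocking` tells the global pool this thread is about
    // to block, so it may start a compensating thread for the queued tasks.
    blocking(allStarted.await(10, TimeUnit.SECONDS)) // true if every task started in time
  }
}

val allCompleted: Seq[Boolean] = Await.result(Future.sequence(futures), 30.seconds)
println(s"all tasks saw each other: ${allCompleted.forall(identity)}")
```

If blocking(...) is removed, the tasks occupying the pool's threads wait for tasks that cannot start, so they time out instead of completing.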
In the code below I create 5 threads and have each of them print a message, sleep, and print another message. I start the threads in my main thread and then join all of them. I would expect the "all done" message to be printed only after all of the threads have finished. Yet "all done" gets printed before all the threads are done. Can someone help me understand this behavior?
Thanks.
Kent
Here is the code:
def ttest() = {
  val threads =
    for (i <- 1 to 5)
      yield new Thread() {
        override def run(): Unit = {
          println("going to sleep")
          Thread.sleep(1000)
          println("awake now")
        }
      }
  threads.foreach(t => t.start())
  threads.foreach(t => t.join())
  println("all done")
}
Here is the output:
going to sleep
all done
going to sleep
going to sleep
going to sleep
going to sleep
awake now
awake now
awake now
awake now
awake now
It works if you transform the Range into a List:
def ttest() = {
  val threads =
    for (i <- (1 to 5).toList)
      yield new Thread() {
        override def run(): Unit = {
          println("going to sleep")
          Thread.sleep(1000)
          println("awake now")
        }
      }
  threads.foreach(t => t.start())
  threads.foreach(t => t.join())
  println("all done")
}
The problem is that "1 to 5" is a Range, and ranges are not "strict", so to speak. A for ... yield over a range is translated into a call to map on it. In plain English, when you call map on a Range, it does not compute each value right then. Instead, it produces an object (a RandomAccessSeq.Projection on Scala 2.7) which holds a reference to the function passed to map and another to the original range. Thus, whenever you use an element of the resulting sequence, the function you passed to map is applied to the corresponding element of the original range, and this happens each and every time you access any element of the resulting sequence.
This means that each time you traverse threads, you are calling new Thread() { ... } anew. Since you do it twice and the range has 5 elements, you are creating 10 threads: you start the first 5 and join the second 5.
If this is confusing, look at the example below:
scala> object test {
| val t = for (i <- 1 to 5) yield { println("Called again! "+i); i }
| }
defined module test
scala> test.t
Called again! 1
Called again! 2
Called again! 3
Called again! 4
Called again! 5
res4: scala.collection.generic.VectorView[Int,Vector[_]] = RangeM(1, 2, 3, 4, 5)
scala> test.t
Called again! 1
Called again! 2
Called again! 3
Called again! 4
Called again! 5
res5: scala.collection.generic.VectorView[Int,Vector[_]] = RangeM(1, 2, 3, 4, 5)
Each time I print t (by having the Scala REPL print res4 and res5), the yielded expression gets evaluated again. This happens for individual elements too:
scala> test.t(1)
Called again! 2
res6: Int = 2
scala> test.t(1)
Called again! 2
res7: Int = 2
EDIT
As of Scala 2.8, Range is strict, so the code in the question works as originally expected.
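The pitfall can still be reproduced in current Scala with an explicit view, which is non-strict in the way this answer describes for Scala 2.7. A minimal sketch: makeWorker is a hypothetical stand-in for new Thread() { ... } that only counts how often it is called.

```scala
// makeWorker is a hypothetical stand-in for `new Thread() { ... }`;
// it only counts how many times it is called.
var created = 0
def makeWorker(i: Int): String = { created += 1; s"worker-$i" }

// Strict (Scala 2.8+): the yield body runs once per element, when the collection is built.
val strictWorkers = for (i <- 1 to 5) yield makeWorker(i)
strictWorkers.foreach(_ => ()) // first traversal (like start)
strictWorkers.foreach(_ => ()) // second traversal (like join)
val createdStrict = created

// Non-strict view: the mapped function runs again on every traversal.
created = 0
val lazyWorkers = (1 to 5).view.map(makeWorker)
lazyWorkers.foreach(_ => ())
lazyWorkers.foreach(_ => ())
val createdLazy = created

println(s"strict: $createdStrict, view: $createdLazy") // strict: 5, view: 10
```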
In your code, threads is deferred: each time you iterate over it, the for generator expression is run anew. Thus you actually create 10 threads: the first foreach creates 5 and starts them, and the second foreach creates 5 more (which are never started) and joins them. Since they aren't running, join returns immediately. You should call toList on the result of the for expression to take a stable snapshot.