How to log flow rate in Akka Stream? - scala

I have an Akka Stream application with a single flow/graph. I want to measure the flow rate at the source and log it every 5 seconds, like 'received 3 messages in the last 5 seconds'. I tried with,
someOtherFlow
.groupedWithin(Integer.MAX_VALUE, 5 seconds)
.runForeach(seq =>
log.debug(s"received ${seq.length} messages in the last 5 seconds")
)
but it only emits when messages arrive; it produces no empty list when there are 0 messages. I want the 0's as well. Is this possible?

You could try something like
src
.conflateWithSeed(_ ⇒ 1){ case (acc, _) ⇒ acc + 1 }
.zip(Source.tick(5.seconds, 5.seconds, NotUsed))
.map(_._1)
which should batch your elements until the tick releases them. This is inspired by an example in the docs.
On a different note, if you need this for monitoring, you could use a third-party tool such as Kamon.

A sample of Akka Streams logging:
implicit val system: ActorSystem = ActorSystem("StreamLoggingActorSystem")
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val adapter: LoggingAdapter = Logging(system, "customLogger")
implicit val ec: ExecutionContextExecutor = system.dispatcher
def randomInt = Random.nextInt()
val source = Source.repeat(NotUsed).map(_ ⇒ randomInt)
val logger = source
.groupedWithin(Integer.MAX_VALUE, 5.seconds)
.log(s"in the last 5 seconds number of messages received : ", _.size)
.withAttributes(
Attributes.logLevels(
onElement = Logging.WarningLevel,
onFinish = Logging.InfoLevel,
onFailure = Logging.DebugLevel
)
)
val sink = Sink.ignore
val result: Future[Done] = logger.runWith(sink)
result.onComplete{
case Success(_) =>
println("end of stream")
case Failure(_) =>
println("stream ended with failure")
}
source code is here.

Extending Stefano's answer a little I created the following flows:
def flowRate[T](metric: T => Int = (_: T) => 1, outputDelay: FiniteDuration = 1 second): Flow[T, Double, NotUsed] =
Flow[T]
.conflateWithSeed(metric(_)){ case (acc, x) ⇒ acc + metric(x) }
.zip(Source.tick(outputDelay, outputDelay, NotUsed))
.map(_._1.toDouble / outputDelay.toUnit(SECONDS))
def printFlowRate[T](name: String, metric: T => Int = (_: T) => 1,
outputDelay: FiniteDuration = 1 second): Flow[T, T, NotUsed] =
Flow[T]
.alsoTo(flowRate[T](metric, outputDelay)
.to(Sink.foreach(r => log.info(s"Rate($name): $r"))))
The first converts the flow into a rate per second. You can supply a metric that assigns a value to each object passing through. For example, to measure the rate of characters in a flow of strings, you could pass _.length. The second parameter is the delay between flow rate reports (defaults to one second).
The second flow can be used inline to print the flow rate for debugging purposes without modifying the value passing through the stream, e.g.:
stringFlow
.via(printFlowRate[String]("Char rate", _.length, 10 seconds))
.map(_.toLowerCase) // still a string
...
which will log, every 10 seconds, the average rate of characters per second.
N.B. The above flowRate lags one outputDelay period behind, because the zip consumes an element from the conflate and then waits for a tick (this can easily be verified by putting a log after the conflateWithSeed). To obtain a non-lagging flow rate, one could duplicate the tick in order to force the zip to consume a second, fresh element from the conflate, and then aggregate both elements, i.e.:
Flow[T]
.conflateWithSeed(metric(_)){case (acc, x) => acc + metric(x) }
.zip(Source.tick(outputDelay, outputDelay, NotUsed)
.mapConcat(_ => Seq(NotUsed, NotUsed))
)
.grouped(2).map {
case Seq((a, _), (b, _)) => a + b
}
.map(_.toDouble / outputDelay.toUnit(SECONDS))

Related

How to discard other Futures if the critical Future is finished in Scala?

Let's say I have three remote calls in order to construct my page. One of them (X) is critical for the page and the other two (A, B) just used to enhance the experience.
criticalFutureX is too important to be affected by futureA and futureB, so I want the overall latency of all remote calls to be no more than that of X.
That means that once criticalFutureX finishes, I want to discard futureA and futureB.
val criticalFutureX = ...
val futureA = ...
val futureB = ...
// the overall latency of this for-comprehension depends on the longest among X, A and B
for {
x <- criticalFutureX
a <- futureA
b <- futureB
} ...
In the above example, even though they are executed in parallel, the overall latency depends on the longest among X, A and B, which is not what I want.
Latencies:
X: |----------|
A: |---------------|
B: |---|
O: |---------------| (overall latency)
There is firstCompletedOf, but it cannot be used to explicitly say "when criticalFutureX completes".
Is there something like the following?
val criticalFutureX = ...
val futureA = ...
val futureB = ...
for {
x <- criticalFutureX
a <- futureA // discard when criticalFutureX finished
b <- futureB // discard when criticalFutureX finished
} ...
X: |----------|
A: |-----------... discarded
B: |---|
O: |----------| (overall latency)
You can achieve this with a promise
def completeOnMain[A, B](main: Future[A], secondary: Future[B]) = {
val promise = Promise[Option[B]]()
main.onComplete {
case Failure(_) =>
case Success(_) => promise.trySuccess(None)
}
secondary.onComplete {
case Failure(exception) => promise.tryFailure(exception)
case Success(value) => promise.trySuccess(Option(value))
}
promise.future
}
Some testing code
private def runFor(first: Int, second: Int) = {
def run(millis: Int) = Future {
Thread.sleep(millis);
millis
}
val start = System.currentTimeMillis()
val combined = for {
_ <- Future.unit
f1 = run(first)
f2 = completeOnMain(f1, run(second))
r1 <- f1
r2 <- f2
} yield (r1, r2)
val result = Await.result(combined, 10.seconds)
println(s"It took: ${System.currentTimeMillis() - start}: $result")
}
runFor(3000, 4000)
runFor(3000, 1000)
Produces
It took: 3131: (3000,None)
It took: 3001: (3000,Some(1000))
This kind of task is very hard to achieve efficiently, reliably and safely with Scala standard library Futures. There is no way to interrupt a Future that hasn't completed yet, meaning that even if you choose to ignore its result, it will keep running and waste memory and CPU time. And even if there were a method to interrupt a running Future, there would be no way to ensure that resources that were allocated (network connections, open files etc.) are properly released.
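To make the point about ignored Futures concrete, here is a minimal sketch using only the standard library. The names slow, fast and sideEffect are made up for illustration; the sleep is wrapped in blocking so the global pool can still schedule other work.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Flag flipped by the slow Future, so we can observe that it kept running.
val sideEffect = new AtomicBoolean(false)

val slow = Future {
  blocking { Thread.sleep(300) }
  sideEffect.set(true)
  "slow"
}
val fast = Future.successful("fast")

// firstCompletedOf only picks a winner; it does not cancel the loser.
val winner = Await.result(Future.firstCompletedOf(Seq(slow, fast)), 1.second)

// Wait for the "discarded" Future: its body still ran to completion.
Await.ready(slow, 2.seconds)
val slowStillRan = sideEffect.get
```

Here winner is "fast", yet slowStillRan ends up true: the slower computation was never interrupted, only ignored.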
I would like to point out that the implementation given by Ivan Stanislavciuc has a bug: if the main Future fails, then the promise will never be completed, which is unlikely to be what you want.
I would therefore strongly suggest looking into modern concurrent effect systems like ZIO or cats-effect. These are not only safer and faster, but also much easier. Here's an implementation with ZIO that doesn't have this bug:
import zio.{Exit, Task}
import Function.tupled
def completeOnMain[A, B](
main: Task[A], secondary: Task[B]): Task[(A, Exit[Throwable, B])] =
(main.forkManaged zip secondary.forkManaged).use {
tupled(_.join zip _.interrupt)
}
Exit is a type that describes how the secondary task ended, i.e. by successfully returning a B, because of an error (of type Throwable), or due to interruption.
Note that this function can be given a much more sophisticated signature that tells you a lot more about what's going on, but I wanted to keep it simple here.
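If you want to stay with standard library Futures, the bug mentioned above (the promise never completing when the main Future fails) can be avoided with a small change. This is only a sketch: completeOnMainSafe is a hypothetical name, and completing with None when main fails is an assumption about the desired behaviour (the failure itself is still visible on main).

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

// Variant of completeOnMain that settles the promise however `main` ends.
def completeOnMainSafe[A, B](main: Future[A], secondary: Future[B]): Future[Option[B]] = {
  val promise = Promise[Option[B]]()
  // Success or failure of `main`: stop waiting for `secondary` either way.
  main.onComplete(_ => promise.trySuccess(None))
  secondary.onComplete {
    case Failure(exception) => promise.tryFailure(exception)
    case Success(value)     => promise.trySuccess(Option(value))
  }
  promise.future
}

// A Future that never completes, standing in for a very slow remote call.
val never = Promise[Int]().future

val whenMainFails =
  Await.result(completeOnMainSafe(Future.failed[Int](new RuntimeException("boom")), never), 2.seconds)
val whenSecondaryWins =
  Await.result(completeOnMainSafe(never, Future.successful(7)), 2.seconds)
```

Note that this still does not cancel the secondary Future; it only stops waiting for it.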

Functional way of interrupting lazy iteration depending on timeout and comparison between previous and next: while, LazyList vs Stream

Background
I have the following scenario. I want to execute the method of a class from an external library, repeatedly, and I want to do so until a certain timeout condition and result condition (compared to the previous result) is met. Furthermore I want to collect the return values, even on the "failed" run (the run with the "failing" result condition that should interrupt further execution).
Thus far I have accomplished this by initializing an empty var result: Result and a var stop: Boolean, and using a while loop that runs while the conditions are true and modifies the outer state. I would like to get rid of this and use a functional approach.
Some context. Each run is expected to run from 0 to 60 minutes and the total time of iteration is capped at 60 minutes. Theoretically, there's no bound to how many times it executes in this period but in practice, it's generally 2-60 times.
The problem is, the runs take a long time so I need to stop the execution. My idea is to use some kind of lazy Iterator or Stream coupled with scanLeft and Option.
Code
Boilerplate
This code isn't particularly relevant, but it is used in my sample approaches and provides reproducible, pseudo-random run times and results.
import scala.collection.mutable.ListBuffer
import scala.util.Random
val r = Random
r.setSeed(1)
val sleepingTimes: Seq[Int] = (1 to 601)
.map(x => Math.pow(2, x).toInt * r.nextInt(100))
.toList
.filter(_ > 0)
.sorted
val randomRes = r.shuffle((0 to 600).map(x => r.nextInt(10)).toList)
case class Result(val a: Int, val slept: Int)
class Lib() {
def run(i: Int) = {
println(s"running ${i}")
Thread.sleep(sleepingTimes(i))
Result(randomRes(i), sleepingTimes(i))
}
}
case class Baz(i: Int, result: Result)
val lib = new Lib()
val timeout = 10 * 1000
While approach
val iteratorStart = System.currentTimeMillis()
val iterator = for {
i <- (0 to 600).iterator
if System.currentTimeMillis() < iteratorStart + timeout
f = Baz(i, lib.run(i))
} yield f
val iteratorBuffer = ListBuffer[Baz]()
if (iterator.hasNext) iteratorBuffer.append(iterator.next())
var run = true
while (run && iterator.hasNext) {
val next = iterator.next()
run = iteratorBuffer.last.result.a < next.result.a
iteratorBuffer.append(next)
}
Stream approach (Scala 2.12)
Full example
val streamStart = System.currentTimeMillis()
val stream = for {
i <- (0 to 600).toStream
if System.currentTimeMillis() < streamStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = stream.headOption
val tail = if (stream.nonEmpty) stream.tail else stream
val streamVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
LazyList approach (Scala 2.13)
Full example
val lazyListStart = System.currentTimeMillis()
val lazyList = for {
i <- (0 to 600).to(LazyList)
if System.currentTimeMillis() < lazyListStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = lazyList.headOption
val tail = if (lazyList.nonEmpty) lazyList.tail else lazyList
val lazyListVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
Result
Both approaches appear to yield the correct end result:
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)))
and they interrupt execution as desired.
Edit: The desired outcome is to not execute the next iteration but still return the result of the iteration that caused the interruption. Thus the desired result is
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)), Baz(2,Result(2,256)))
and lib.run(i) should only run 3 times.
This is achieved by the while approach, as well as the LazyList approach but not the Stream approach which executes lib.run 4 times (Bad!).
Question
Is there another stateless approach, which is hopefully more elegant?
Edit
I realized my examples were faulty and not returning the "failing" result, which it should, and that they kept executing beyond the stop condition. I rewrote the code and examples but I believe the spirit of the question is the same.
I would use something higher level, like fs2.
(or any other high-level streaming library, like: monix observables, akka streams or zio zstreams)
def runUntilOrTimeout[F[_]: Concurrent: Timer, A](work: F[A], timeout: FiniteDuration)
(stop: (A, A) => Boolean): Stream[F, A] = {
val interrupt =
Stream.sleep_(timeout)
val run =
Stream
.repeatEval(work)
.zipWithPrevious
.takeThrough {
case (Some(p), c) if stop(p, c) => false
case _ => true
} map {
case (_, c) => c
}
run mergeHaltBoth interrupt
}
You can see it working here.
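If pulling in a streaming library is not an option, the same "take through" behaviour can be sketched with plain tail recursion and no vars in the helper itself. This is not the fs2 solution above, just a standard library approximation: runWhile, keepGoing and maxRuns are hypothetical names, and the deadline is only checked between runs, so a run that is already in progress is not interrupted.

```scala
import scala.annotation.tailrec

// Runs step(0), step(1), ... and stops after the first result for which
// keepGoing(previous, current) is false, or once the deadline has passed.
// The result that triggered the stop is kept in the output.
def runWhile[A](maxRuns: Int, deadlineMillis: Long)(step: Int => A)(
    keepGoing: (A, A) => Boolean): List[A] = {
  @tailrec
  def loop(i: Int, acc: List[A]): List[A] =
    if (i >= maxRuns || System.currentTimeMillis() >= deadlineMillis) acc.reverse
    else {
      val current = step(i)
      acc match {
        case prev :: _ if !keepGoing(prev, current) => (current :: acc).reverse
        case _                                      => loop(i + 1, current :: acc)
      }
    }
  loop(0, Nil)
}

// Usage with made-up values: continue while each result is larger than the last.
val values = Vector(4, 5, 2, 9)
var calls = 0
val out = runWhile(values.length, System.currentTimeMillis() + 10000)(i => {
  calls += 1
  values(i)
})(_ < _)
// out == List(4, 5, 2) and step was called 3 times; 9 is never computed.
```

With the question's types, step would be i => Baz(i, lib.run(i)) and keepGoing would compare result.a of the previous and current Baz.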

Akka SourceQueue to send list elements

I have a List[String] and a Source.queue. I would like to offer the string elements to this queue one at a time, at a fixed time interval. Something like this:
val data: List[String] = ???
val tick = Source.tick(0 second, 1 second, "tick")
tick.runForeach(t => queue.offer(data(??)))
Can someone help me out?
Edit: I have found a way, but I am looking for a more elegant one
val tick = Source.tick(0 second, 2 second, "tick").zipWithIndex.limit(data.length)
tick.runForeach(t => {
queue.offer(data(t._2.toInt))
})
To send elements from the List[String] to the queue in specific time intervals for each element, use Source#delay in the following manner:
val data: List[String] = ???
Source(data)
.delay(2.seconds, DelayOverflowStrategy.backpressure)
.withAttributes(Attributes.inputBuffer(1, 1))
.mapAsync(1)(x => queue.offer(x))
.runWith(Sink.ignore)
Set the input buffer size to one with withAttributes because the default value is 16, and use DelayOverflowStrategy.backpressure. Also, use mapAsync since the offer method returns a Future.
Alternatively, use Source#throttle:
Source(data)
.throttle(1, 2.seconds, 1, ThrottleMode.Shaping)
.mapAsync(1)(x => queue.offer(x))
.runWith(Sink.ignore)

Observables created at time interval

I was looking at the RxScala observables which are created at a given time interval:
val periodic: Observable[Long] = Observable.interval(100 millis)
periodic.foreach(x => println(x))
If I put this in a worksheet, I get this result:
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@2cce3493
res0: Unit = ()
This leaves me confused: What do the elements of periodic actually contain?
Do they contain some index?
Do they contain the time interval at which they were created?
As you can read at http://reactivex.io/documentation/operators/interval.html, the produced elements are Long values incrementing from 0.
As for your code and results:
Here, you create the observable, and get Observable[Long] assigned to periodic. Everything as expected.
scala> val periodic: Observable[Long] = Observable.interval(100 millis)
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@2cce3493
Here, you register a callback, i.e. what happens when a value is emitted. The return type of the foreach method is Unit, as it has no meaningful value to return and is called only for the side effect of registering the callback.
periodic.foreach(x => println(x))
res0: Unit = ()
You don't see the actual values because execution stops. Try inserting a Thread.sleep:
val periodic: Observable[Long] = Observable.interval(100.millis)
periodic.foreach(x => println(x))
Thread.sleep(1000)
Gives output similar to
periodic: rx.lang.scala.Observable[Long] = rx.lang.scala.JavaConversions$$anon$2@207cb62f
res0: Unit = ()
0
1
2
3
4
5
6
7
8
9
res1: Unit = ()
The problem is that interval is asynchronous, so you're not waiting for the result.
Another way to wait for the result is to use TestSubscriber:
def interval(): Unit = {
addHeader("Interval observable")
Observable.interval(createDuration(100))
.map(n => "New item emitted:" + n)
.doOnNext(s => print("\n" + s))
.subscribe();
new TestSubscriber[Subscription].awaitTerminalEvent(1000, TimeUnit.MILLISECONDS);
}
You can see more examples here https://github.com/politrons/reactiveScala

Scala - Wait for all futures completed within time period

I have a List[Future[String]] and would like to wait a fixed period of time in order to collect the successful computations, as well as rerun the operations for futures that do not complete within that period.
In pseudo code it will look like:
val inputData: List[String] = getInputData()
val futures : List[Future[String]] = inputData.map(toLongRunningIOOperation)
val (completedFutures, unfinishedFutures) = Await.ready(futures, 2 seconds)
val rerunedOperations : List[Future[String]] = unfinishedFutures.map(rerun)
Such a solution could be useful if you need to make several calls to external services whose usual latency is low (p99 < 60 ms) but whose requests sometimes take more than 5 seconds (because of current state or load). In such a situation it is better to rerun those requests (i.e. against another instance of the service).
As an example, use the Future.firstCompletedOf function to get a timed future:
def futureToFutureOption[T](f: Future[T]): Future[Option[T]] = f.map(Some(_)).recover[Option[T]]{case _ => None}
val inputData: List[String] = List("a", "b", "c", "d")
val completedFuture = inputData.map { a =>
a match {
case "a" | "c" => Future.firstCompletedOf(Seq(Future{Thread.sleep(3000); a},
Future.failed{Thread.sleep(2000); new RuntimeException()}))
case _ => Future(a)
}
}
val unfinished = Future.sequence(completedFuture.map(futureToFutureOption)).map(list => inputData.toSet -- list.flatten.toSet)
val rerunedOperations: Future[Set[Future[String]]] = unfinished.map { _.map(foo) }
def foo(s: String): Future[String] = ???
In this example rerunedOperations has a different type than in your example, but I think this will be fine for you.
Also keep in mind that if you're calling an external system inside the future and that future doesn't finish in time, this approach will not stop the unfinished future from executing: the original call to the external system will still be in progress while you make another call.
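For the "wait a fixed time, then split into finished and unfinished" part of the question, a blocking sketch with only the standard library could look like the following. partitionByDeadline is a hypothetical helper, and pairing each future with its input key is an assumption made so that unfinished inputs can be rerun. As noted above, the unfinished futures are not cancelled.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

// Waits at most `timeout` in total, then splits the inputs by whether
// their future has completed (successfully or not) by that time.
def partitionByDeadline[K, T](
    tasks: List[(K, Future[T])],
    timeout: FiniteDuration): (List[(K, Future[T])], List[K]) = {
  val deadline = timeout.fromNow
  // Block on each future only for the time left until the shared deadline;
  // a TimeoutException from Await.ready is swallowed by Try.
  tasks.foreach { case (_, f) =>
    Try(Await.ready(f, deadline.timeLeft.max(Duration.Zero)))
  }
  val (done, pending) = tasks.partition { case (_, f) => f.isCompleted }
  (done, pending.map(_._1))
}

// Usage with made-up inputs: "b" is backed by a future that never completes.
val stuck = Promise[String]().future
val tasks = List("a" -> Future.successful("A"), "b" -> stuck)
val split = partitionByDeadline(tasks, 200.millis)
val completedKeys = split._1.map(_._1) // List("a")
val keysToRerun   = split._2           // List("b")
```

The keys in keysToRerun can then be passed to whatever rerun function you have, e.g. keysToRerun.map(rerun).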