How to throttle Futures with one-second delay with Akka - scala

I have a list of URIs, each of which I want to request with a one-second delay in between. How can I do that?
val uris: List[String] = List()
// How to make these URIs resolve 1 second apart?
val responses: List[Future[Response]] = uris.map(httpRequest(_))

You could create an Akka Streams Source from the list of URIs, then throttle the conversion of each URI to a Future[Response]:
def httpRequest(uri: String): Future[Response] = ???
val uris: List[String] = ???
val responses: Future[Seq[Response]] =
Source(uris)
.throttle(1, 1 second)
.mapAsync(parallelism = 1)(httpRequest)
.runWith(Sink.seq[Response])

Something like this perhaps:
import scala.annotation.tailrec
import scala.concurrent.duration._
@tailrec
def withDelay(
  uris: Seq[String],
  delay: FiniteDuration = 1 second, // akka.pattern.after requires a FiniteDuration
  result: List[Future[Response]] = Nil,
): Seq[Future[Response]] = uris match {
  case Seq() => result.reverse
  case Seq(head, tail @ _*) =>
    // chain the next request behind the previously scheduled one
    val v = result.headOption.getOrElse(Future.successful(null))
      .flatMap { _ =>
        akka.pattern.after(delay, context.system.scheduler)(httpRequest(head))
      }
    withDelay(tail, delay, v :: result)
}
This has a delay before the first execution as well, but I hope it's clear enough how to get rid of it if necessary.
Another caveat is that this assumes that all futures succeed. As soon as one fails, all subsequent processing is aborted.
If you need a different behavior, you may want to replace the .flatMap with .transform or add a .recover etc.
You can also write the same with .foldLeft if preferred:
uris.foldLeft(List.empty[Future[Response]]) { case (results, next) =>
results.headOption.getOrElse(Future.successful(null))
.flatMap { _ =>
akka.pattern.after(delay, context.system.scheduler)(httpRequest(next))
} :: results
}.reverse
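The caveat about failures can be illustrated with a minimal sketch that uses only the standard library. Everything here is an assumption for illustration: `after` is a hand-rolled stand-in for `akka.pattern.after` built on a `ScheduledExecutorService`, `httpRequest` is a fake that fails for URIs containing "bad", and responses are plain strings. The chain uses `transformWith` instead of `flatMap`, so a failed request does not abort the ones that follow.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object DelayedChain {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Daemon thread so the scheduler never keeps the JVM alive.
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = { val t = new Thread(r); t.setDaemon(true); t }
  })

  // Stand-in for akka.pattern.after (assumption): starts `body` once `delay` has elapsed.
  def after[T](delay: FiniteDuration)(body: => Future[T]): Future[T] = {
    val p = Promise[T]()
    scheduler.schedule(
      new Runnable { def run(): Unit = p.completeWith(Future.unit.flatMap(_ => body)) },
      delay.toMillis,
      TimeUnit.MILLISECONDS
    )
    p.future
  }

  // Hypothetical request: fails for URIs containing "bad".
  def httpRequest(uri: String): Future[String] =
    if (uri.contains("bad")) Future.failed(new RuntimeException(s"failed: $uri"))
    else Future.successful(s"response from $uri")

  def withDelay(uris: List[String], delay: FiniteDuration): List[Future[String]] =
    uris.foldLeft(List.empty[Future[String]]) { (results, next) =>
      val previous = results.headOption.getOrElse(Future.successful(""))
      // transformWith runs on success and on failure, unlike flatMap
      previous.transformWith(_ => after(delay)(httpRequest(next))) :: results
    }.reverse
}
```

With this variant, `DelayedChain.withDelay(List("a", "bad", "c"), 1.second)` yields three futures where only the second one fails; with `flatMap` the third would fail as well.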

Akka Streams provides this out of the box with the throttle operator (a natural fit, given that you are using akka-http and tagged the question with akka-streams).

Related

How to divert Future failure in Akka stream to a separate sink?

I have a stream with following structure
val source = Source(1 to 10)
val flow1 = Flow[Int].mapAsyncUnordered(2) { x =>
  if (x != 7) Future.successful(x)
  else Future.failed(new Exception(s"$x has failed"))
}
val flow2 = Flow[Int].mapAsyncUnordered(2) { x =>
  if (x != 4) Future.successful(x)
  else Future.failed(new Exception(s"$x has failed"))
}
val sink = Sink.fold(List.empty[Int])((xs, x: Int) => x :: xs)
val errorSink = Sink.fold(List.empty[Exception])((errs, err: Exception) => err :: errs)
My question:
How should I construct the divertTo function to send all exceptions to errorSink?
Any suggestion on how to get the error object with information on which stage it failed would be helpful.
I would recommend modelling your errors as a proper type so that you have a Flow[Either[CustomErrorType, Int]] for instance and then you can use divertTo with a predicate that looks at whether you have a Left or Right.
Or maybe use recover in combination.
See this interesting article: https://bszwej.medium.com/akka-streams-error-handling-7ff9cc01bc12
Future encodes both asynchronicity and can-fail. You'll need to separate the asynchronicity and can-fail.
Try, for instance, is an encoding of can-fail.
Meanwhile, mapAsyncUnordered only emits successes (you can use a supervision strategy to decide not to fail on a failed future, but that will drop the failures not emit them).
It seems that you want to accumulate a list of failures (given the use of Sink.fold). Since that list of failures is only accessible to the outside world through the materialized value, you'll want to use divertToMat instead of divertTo.
From this, the logical solution is:
import scala.concurrent.ExecutionContext
import scala.util.{ Failure, Success, Try }
// Returns a future which is only a failure on fatal exceptions
def liftToFutTry[T](fut: Future[T])(implicit ec: ExecutionContext): Future[Try[T]] =
fut.map(Success(_))
.recoverWith {
case ex => Future.successful(Failure(ex))
}
// for some reason we want a List[Exception] rather than List[Throwable]
val errorSink: Sink[Try[Int], Future[List[Exception]]] =
Flow[Try[Int]]
.mapConcat { t =>
t.failed.get match {
case ex: Exception => List(ex)
case _ => Nil
} : List[Exception]
}
.toMat(Sink.fold(List.empty[Exception]) { (exes, ex) => ex :: exes })(Keep.right)
// materializes as a future of the exceptions which failed in mapAsyncUnordered
val flow1: Flow[Int, Int, Future[List[Exception]]] =
Flow[Int]
.mapAsyncUnordered(2) { x =>
val fut =
if (x != 7) Future.successful(x)
else Future.failed(new Exception(s"$x has failed (equaled 7)"))
liftToFutTry(fut)
}
.divertToMat(errorSink, _.isFailure)(Keep.right) // propagate the failures
.map { successfulTry => successfulTry.get }
If you have two Flows like this and you want to compose them, you'd do
// materialized value is (list of failures from flow1, list of failures from otherFlow)
val both: Flow[Int, Int, (Future[List[Exception]], Future[List[Exception]])] =
flow1
.viaMat(flow2)(Keep.both)
// materialized value is:
// (
// (
// list of failures from flow1,
// list of failures from otherFlow
// ),
// list of ints which passed through both flow1 and otherFlow
// )
both.toMat(sink)(Keep.both) : Sink[Int, ((Future[List[Exception]], Future[List[Exception]]), Future[List[Int]])]
There are other ways to encode can-fail: e.g. you could use Either from the standard library.
Accumulating a Future[List[_]] may be questionable; note that the Futures won't be complete until the stream finishes.
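As a rough illustration of the Either encoding, here is a sketch that uses plain Futures and collections rather than Akka Streams. The names `StageError`, `liftToFutEither`, `stage1` and `run` are hypothetical and not part of any library. The error type carries the stage name and the offending element, which is one way to keep track of where a failure happened; the final partition into Lefts and Rights plays the role that divertTo plays in a stream.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object EitherLift {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical error type: records which stage rejected which element.
  final case class StageError(stage: String, element: Int, cause: Throwable)

  // Turns a failed Future into a successful Future holding a Left.
  def liftToFutEither(stage: String, element: Int)(fut: Future[Int]): Future[Either[StageError, Int]] =
    fut
      .map[Either[StageError, Int]](Right(_))
      .recover { case ex => Left(StageError(stage, element, ex)) }

  // Same failure condition as flow1 in the question.
  def stage1(x: Int): Future[Int] =
    if (x != 7) Future.successful(x)
    else Future.failed(new Exception(s"$x has failed (equaled 7)"))

  // Plain-collection analogue of divertTo: Lefts go one way, Rights the other.
  def run(inputs: List[Int]): Future[(List[StageError], List[Int])] =
    Future.traverse(inputs)(x => liftToFutEither("flow1", x)(stage1(x))).map { results =>
      (results.collect { case Left(e) => e }, results.collect { case Right(v) => v })
    }
}
```

In a stream, the same `Either` values could be routed with a predicate such as `_.isLeft`.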

Scala Futures for-comprehension with a list of values

I need to execute a Future method on some elements I have in a list simultaneously. My current implementation works sequentially, which is not optimal for saving time. I did this by mapping my list and calling the method on each element and processing the data this way.
My manager shared a link with me showing how to execute Futures simultaneously using for-comprehension but I cannot see/understand how I can implement this with my List.
The link he shared with me is https://alvinalexander.com/scala/how-use-multiple-scala-futures-in-for-comprehension-loop/
Here is my current code:
private def method1(id: String): Tuple2[Boolean, List[MyObject]] = {
val workers = List.concat(idleWorkers, activeWorkers.keys.toList)
var ready = true;
val workerStatus = workers.map{ worker =>
val option = Await.result(method2(worker), 1 seconds)
val state = if (option.isDefined) {
if (option.get._2 == id) {
option.get._1.toString
} else {
"INVALID"
}
} else "FAILED"
val status = s"$worker: $state"
// exists avoids calling .get on a None
if (option.exists(_._1)) {
ready = false
}
MyObject(worker.toString, status)
}.toList.filterNot(s => s.status.contains("INVALID"))
(ready, workerStatus)
}
private def method2(worker: ActorRef): Future[Option[(Boolean, String)]] = Future{
implicit val timeout: Timeout = 1 seconds;
Try(Await.result(worker ? GetStatus, 1 seconds)) match {
case Success(extractedVal) => extractedVal match {
case res: (Boolean, String) => Some(res)
case _ => None
}
case Failure(_) => { None }
case _ => { None }
}
}
If someone could suggest how to implement for-comprehension in this scenario, I would be grateful. Thanks
For method2 there is no need for the Future/Await mix. Just map the Future:
def method2(worker: ActorRef): Future[Option[(Boolean, String)]] =
(worker ? GetStatus).map{
case res: (Boolean, String) => Some(res)
case _ => None
}
For method1 you likewise need to map the result of method2 and do the processing inside the map. This will make workerStatus a List[Future[MyObject]] and means that everything runs in parallel.
Then use Future.sequence(workerStatus) to turn the List[Future[MyObject]] into a Future[List[MyObject]]. You can then use map again to do the filtering/ checking on that List[MyObject]. This will happen when all the individual Futures have completed.
Ideally you would then return a Future from method1 to keep everything asynchronous. You could, if absolutely necessary, use Await.result at this point which would wait for all the asynchronous operations to complete (or fail).
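The steps above could look roughly like the following sketch. It is not the asker's real code: workers are plain strings instead of ActorRefs, `method2` is a fake that returns canned answers instead of asking an actor, and reading the Boolean as a "busy" flag is an assumption. The point is only the shape: map each worker to a Future, combine with Future.sequence, then filter inside a final map.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object WorkerStatuses {
  implicit val ec: ExecutionContext = ExecutionContext.global

  final case class MyObject(worker: String, status: String)

  // Hypothetical stand-in for `worker ? GetStatus`: Some((busy, id)), or None on failure.
  def method2(worker: String): Future[Option[(Boolean, String)]] = Future {
    worker match {
      case "w1" => Some((false, "job-1"))
      case "w2" => Some((true, "job-1"))
      case "w3" => Some((false, "other"))
      case _    => None
    }
  }

  def method1(id: String, workers: List[String]): Future[(Boolean, List[MyObject])] = {
    // Every method2 call is started here; nothing blocks while they run.
    val perWorker: List[Future[(Boolean, MyObject)]] = workers.map { worker =>
      method2(worker).map { option =>
        val state = option match {
          case Some((flag, `id`)) => flag.toString
          case Some(_)            => "INVALID"
          case None               => "FAILED"
        }
        (option.exists(_._1), MyObject(worker, s"$worker: $state"))
      }
    }
    // Completes once all individual futures have completed.
    Future.sequence(perWorker).map { results =>
      val ready = !results.exists(_._1)
      (ready, results.map(_._2).filterNot(_.status.contains("INVALID")))
    }
  }
}
```

A caller that really must block can wrap the result in `Await.result(method1(...), timeout)`, but only at the outermost edge.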

Stream Future in Play 2.5

Once again I am attempting to update some pre Play 2.5 code (based on this vid). For example the following used to be how to stream a Future:
Ok.chunked(Enumerator.generateM(Promise.timeout(Some("hello"), 500)))
I have created the following method for the work-around for Promise.timeout (deprecated) using Akka:
private def keepResponding(data: String, delay: FiniteDuration, interval: FiniteDuration): Future[Result] = {
val promise: Promise[Result] = Promise[Result]()
actorSystem.scheduler.schedule(delay, interval) { promise.success(Ok(data)) }
promise.future
}
According to the Play Framework Migration Guide; Enumerators should be rewritten to a Source and Source.unfoldAsync is apparently the equivalent of Enumerator.generateM so I was hoping that this would work (where str is a Future[String]):
def inf = Action { request =>
val str = keepResponding("stream me", 1.second, 2.second)
Ok.chunked(Source.unfoldAsync(str))
}
Of course I'm getting a Type mismatch error and when looking at the case class signature of unfoldAsync:
final class UnfoldAsync[S, E](s: S, f: S ⇒ Future[Option[(S, E)]])
I can see that the parameters are not correct but I'm not fully understanding what/how I should pass this through.
unfoldAsync is even more generic than Play!'s own generateM, as it allows you to thread a state (S) value through the computation. This lets each emitted value depend on the previously emitted value(s).
The example below will load values by an increasing id, until the loading fails:
val source: Source[String, NotUsed] = Source.unfoldAsync(0){ id ⇒
loadFromId(id)
.map(s ⇒ Some((id + 1, s)))
.recover{case _ ⇒ None}
}
def loadFromId(id: Int): Future[String] = ???
In your case an internal state is not really needed, therefore you can just pass dummy values whenever required, e.g.
val source: Source[Result, NotUsed] = Source.unfoldAsync(NotUsed) { _ ⇒
schedule("stream me", 2.seconds).map(x ⇒ Some(NotUsed → x))
}
def schedule(data: String, delay: FiniteDuration): Future[Result] = {
akka.pattern.after(delay, system.scheduler){Future.successful(Ok(data))}
}
Note that your original implementation of keepResponding is incorrect, as you cannot complete a Promise more than once. Akka's after pattern offers a simpler way to achieve what you need.
However, note that in your specific case, Akka Streams offers a more idiomatic solution with Source.tick:
val source: Source[String, Cancellable] = Source.tick(1.second, 2.seconds, NotUsed).mapAsync(1){ _ ⇒
loadSomeFuture()
}
def loadSomeFuture(): Future[String] = ???
Or, even simpler, if you don't actually need an asynchronous computation (as in your example):
val source: Source[String, Cancellable] = Source.tick(1.second, 2.seconds, "stream me")
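The remark that a Promise cannot be completed more than once is easy to check with the standard library alone. The sketch below (the object name `PromiseOnce` is just for illustration) shows why a repeating scheduler callback that calls `promise.success` would throw on its second tick.

```scala
import scala.concurrent.Promise
import scala.util.Try

object PromiseOnce {
  val promise: Promise[String] = Promise[String]()

  // The first completion wins.
  promise.success("first")

  // A second `success` throws IllegalStateException, captured here in a Try.
  val second: Try[Promise[String]] = Try(promise.success("second"))

  // `trySuccess` reports the rejection as `false` instead of throwing.
  val third: Boolean = promise.trySuccess("third")
}
```

The future behind the promise keeps its first value; later attempts never change it.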

Scala - Batched Stream from Futures

I have instances of a case class Thing, and I have a bunch of queries to run that return a collection of Things like so:
def queries: Seq[Future[Seq[Thing]]]
I need to collect all Things from all futures (like above) and group them into equally sized collections of 10,000 so they can be serialized to files of 10,000 Things.
def serializeThings(things: Seq[Thing]): Future[Unit]
I want it to be implemented in such a way that I don't wait for all queries to run before serializing. As soon as there are 10,000 Things returned after the futures of the first queries complete, I want to start serializing.
If I do something like:
Future.sequence(queries)
It will collect the results of all the queries, but my understanding is that operations like map won't be invoked until all queries complete and all the Things must fit into memory at once.
What's the best way to implement a batched stream pipeline using Scala collections and concurrent libraries?
I think I managed to put something together. The solution is based on my previous answer. It accumulates the Future[List[Thing]] results until they reach a threshold of BatchSize. Then it calls the serializeThings future, and once that finishes, the loop continues with the rest.
object BatchFutures extends App {
case class Thing(id: Int)
def getFuture(id: Int): Future[List[Thing]] = {
Future.successful {
List.fill(3)(Thing(id))
}
}
def serializeThings(things: Seq[Thing]): Future[Unit] = Future.successful {
//Thread.sleep(2000)
println("processing: " + things)
}
val ids = (1 to 4).toList
val BatchSize = 5
val future = ids.foldLeft(Future.successful[List[Thing]](Nil)) {
case (acc, id) =>
acc flatMap { processed =>
getFuture(id) flatMap { res =>
val all = processed ++ res
val (batch, rest) = all.splitAt(BatchSize)
if (batch.length == BatchSize) { // if futures filled the batch with needed amount
serializeThings(batch) map { _ =>
rest // process the rest
}
} else {
Future.successful(all) //if we need more Things for a batch
}
}
}
}.flatMap { rest =>
serializeThings(rest)
}
Await.result(future, Duration.Inf)
}
The result prints:
processing: List(Thing(1), Thing(1), Thing(1), Thing(2), Thing(2))
processing: List(Thing(2), Thing(3), Thing(3), Thing(3), Thing(4))
processing: List(Thing(4), Thing(4))
When the number of Things isn't divisible by BatchSize, we have to call serializeThings once more (the last flatMap). I hope it helps! :)
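One thing to be aware of: the fold above serializes at most one batch per query, so a single query returning more than BatchSize Things leaves an oversized remainder. A possible variation, sketched under assumptions, drains every full batch before moving on. The `drain` helper, the `written` buffer (which records batches instead of writing files) and the `() => Future[...]` query shape are all hypothetical. As in the original, queries run one after another.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BatchDrain {
  implicit val ec: ExecutionContext = ExecutionContext.global

  case class Thing(id: Int)
  val BatchSize = 5

  // Records batches in memory instead of writing files (illustration only).
  val written = ListBuffer.empty[List[Thing]]
  def serializeThings(things: Seq[Thing]): Future[Unit] =
    Future.successful(written.synchronized { written += things.toList; () })

  // Serializes every full batch, one after another, and returns the leftover.
  def drain(things: List[Thing]): Future[List[Thing]] =
    if (things.length < BatchSize) Future.successful(things)
    else {
      val (batch, rest) = things.splitAt(BatchSize)
      serializeThings(batch).flatMap(_ => drain(rest))
    }

  def run(queries: List[() => Future[List[Thing]]]): Future[Unit] =
    queries
      .foldLeft(Future.successful(List.empty[Thing])) { (acc, query) =>
        acc.flatMap(leftover => query().flatMap(res => drain(leftover ++ res)))
      }
      // flush whatever did not fill a whole batch
      .flatMap(rest => if (rest.isEmpty) Future.unit else serializeThings(rest))
}
```

With a first query returning 12 Things and a second returning 1, this writes batches of 5, 5 and 3.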
Before calling Future.sequence, apply whatever processing you need to each individual future, and only then combine them with Future.sequence.
//this can be used for serializing
def doSomething(): Unit = ???
//do something with the failed future
def doSomethingElse(): Unit = ???
def doSomething(list: List[_]) = ???
val list: List[Future[_]] = List.fill(10000)(Future(doSomething()))
val newList =
list.map { f => // no .par needed: map only registers callbacks, and Future.sequence expects a regular collection
f.map { result =>
doSomething()
}.recover { case throwable =>
doSomethingElse()
}
}
Future.sequence(newList).map ( list => doSomething(list)) //wait till all are complete
Instead of generating newList, you could use Future.traverse:
Future.traverse(list)(f => f.map( x => doSomething()).recover {case th => doSomethingElse() }).map ( completeListOfValues => doSomething(completeListOfValues))

How would you "connect" many independent graphs maintaining backpressure between them?

Continuing my series of questions about akka-streams, I have another problem.
Variables:
Single http client flow with throttling
Multiple other flows that want to use first flow simultaneously
Goal:
The single http flow makes requests to a particular API that limits the number of calls to it; if I exceed the limit, it bans me. It is therefore very important to maintain the request rate regardless of how many clients in my code use it.
There are a number of other flows that want to make requests to that API, and I'd like them to get backpressure from the http flow. Normally you would connect the whole thing into one graph and it would work, but in my case I have multiple graphs.
How would you solve it?
My attempt to solve it:
I use Source.queue for the http flow so that I can queue http requests and have throttling. The problem is that the Future from SourceQueue.offer fails if I exceed the number of requests, so I somehow need to "reoffer" when a previously offered event completes. The modified Future from SourceQueue would then backpressure the other graphs (inside their mapAsync) that make http requests.
Here is how I implemented it
object Main {
implicit val system = ActorSystem("root")
implicit val executor = system.dispatcher
implicit val materializer = ActorMaterializer()
private val queueHttp = Source.queue[(String, Promise[String])](2, OverflowStrategy.backpressure)
.throttle(1, FiniteDuration(1000, MILLISECONDS), 1, ThrottleMode.Shaping)
.mapAsync(4) {
case (text, promise) =>
// Simulate delay of http request
val delay = (Random.nextDouble() * 1000 / 2).toLong
Thread.sleep(delay)
Future.successful(text -> promise)
}
.toMat(Sink.foreach({
case (text, p) =>
p.success(text)
}))(Keep.left)
.run
val futureDeque = new ConcurrentLinkedDeque[Future[String]]()
def sendRequest(value: String): Future[String] = {
val p = Promise[String]()
val offerFuture = queueHttp.offer(value -> p)
def addToQueue(future: Future[String]): Future[String] = {
futureDeque.addLast(future)
future.onComplete {
case _ => futureDeque.remove(future)
}
future
}
offerFuture.flatMap {
case QueueOfferResult.Enqueued =>
addToQueue(p.future)
}.recoverWith {
case ex =>
val first = futureDeque.pollFirst()
if (first != null)
addToQueue(first.flatMap(_ => sendRequest(value)))
else
sendRequest(value)
}
}
def main(args: Array[String]) {
val allFutures = for (v <- 0 until 15)
yield {
val res = sendRequest(s"Text $v")
res.onSuccess {
case text =>
println("> " + text)
}
res
}
Future.sequence(allFutures).onComplete {
case Success(text) =>
println(s">>> TOTAL: ${text.length} [in queue: ${futureDeque.size()}]")
system.terminate()
case Failure(ex) =>
ex.printStackTrace()
system.terminate()
}
Await.result(system.whenTerminated, Duration.Inf)
}
}
The disadvantage of this solution is that I have locking on ConcurrentLinkedDeque, which is probably not that bad for a rate of 1 request per second, but still.
How would you solve this task?
We have an open ticket (https://github.com/akka/akka/issues/19478) and some ideas for a "Hub" stage which would allow for dynamically combining streams, but I'm afraid I cannot give you any estimate for when it will be done.
So that is how we, in the Akka team, would solve the task. ;)