How would you "connect" many independent graphs maintaining backpressure between them? - scala

Continuing my series of questions about akka-streams, I have another problem.
Variables:
Single http client flow with throttling
Multiple other flows that want to use the first flow simultaneously
Goal:
The single http flow makes requests to a particular API that limits the number of calls to it; otherwise it bans me. Thus it's very important to maintain the request rate regardless of how many clients in my code use it.
There are a number of other flows that want to make requests to the mentioned API, but I'd like to have backpressure from the http flow. Normally you connect the whole thing into one graph and it works. But in my case I have multiple graphs.
How would you solve it?
My attempt to solve it:
I use Source.queue for the http flow so that I can queue http requests and have throttling. The problem is that the Future from SourceQueue.offer fails if I exceed the number of requests. Thus I somehow need to "re-offer" when a previously offered event completes. The modified Future from SourceQueue would then backpressure other graphs (inside their mapAsync) that make http requests.
Here is how I implemented it:
import java.util.concurrent.ConcurrentLinkedDeque

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, Sink, Source}
import akka.stream.{ActorMaterializer, OverflowStrategy, QueueOfferResult, ThrottleMode}

import scala.concurrent.duration._
import scala.concurrent.{Await, Future, Promise}
import scala.util.{Failure, Random, Success}

object Main {

  implicit val system = ActorSystem("root")
  implicit val executor = system.dispatcher
  implicit val materializer = ActorMaterializer()

  // Single throttled http flow; each request is paired with a Promise
  // that is completed once the (simulated) response arrives.
  private val queueHttp = Source.queue[(String, Promise[String])](2, OverflowStrategy.backpressure)
    .throttle(1, FiniteDuration(1000, MILLISECONDS), 1, ThrottleMode.Shaping)
    .mapAsync(4) {
      case (text, promise) =>
        // Simulate the delay of an http request
        val delay = (Random.nextDouble() * 1000 / 2).toLong
        Thread.sleep(delay)
        Future.successful(text -> promise)
    }
    .toMat(Sink.foreach({
      case (text, p) =>
        p.success(text)
    }))(Keep.left)
    .run

  val futureDeque = new ConcurrentLinkedDeque[Future[String]]()

  def sendRequest(value: String): Future[String] = {
    val p = Promise[String]()
    val offerFuture = queueHttp.offer(value -> p)

    def addToQueue(future: Future[String]): Future[String] = {
      futureDeque.addLast(future)
      future.onComplete {
        case _ => futureDeque.remove(future)
      }
      future
    }

    offerFuture.flatMap {
      case QueueOfferResult.Enqueued =>
        addToQueue(p.future)
    }.recoverWith {
      case ex =>
        // The offer failed: wait for the oldest in-flight request to
        // complete, then re-offer, so callers are backpressured.
        val first = futureDeque.pollFirst()
        if (first != null)
          addToQueue(first.flatMap(_ => sendRequest(value)))
        else
          sendRequest(value)
    }
  }

  def main(args: Array[String]) {
    val allFutures = for (v <- 0 until 15)
      yield {
        val res = sendRequest(s"Text $v")
        res.onSuccess {
          case text =>
            println("> " + text)
        }
        res
      }

    Future.sequence(allFutures).onComplete {
      case Success(text) =>
        println(s">>> TOTAL: ${text.length} [in queue: ${futureDeque.size()}]")
        system.terminate()
      case Failure(ex) =>
        ex.printStackTrace()
        system.terminate()
    }

    Await.result(system.whenTerminated, Duration.Inf)
  }
}
A disadvantage of this solution is the contention on the ConcurrentLinkedDeque, which is probably not that bad at a rate of 1 request per second, but still.
How would you solve this task?

We have an open ticket (https://github.com/akka/akka/issues/19478) and some ideas for a "Hub" stage which would allow for dynamically combining streams, but I'm afraid I cannot give you any estimate for when it will be done.
So that is how we, in the Akka team, would solve the task. ;)
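For reference, the Hub stages did ship later (MergeHub/BroadcastHub, available since Akka 2.4.11). Below is a minimal sketch of the dynamic fan-in MergeHub enables, reusing the question's pairing-with-a-Promise idea; requestSink and this sendRequest are illustrative names, not part of the original answer, and the implicits (system, materializer, executor) from the question are assumed in scope:
import akka.NotUsed
import akka.stream.scaladsl.{MergeHub, Sink, Source}

// Any number of producers can attach to requestSink dynamically and are
// backpressured by the single throttled pipeline behind it.
val requestSink: Sink[(String, Promise[String]), NotUsed] =
  MergeHub.source[(String, Promise[String])](perProducerBufferSize = 1)
    .throttle(1, 1.second, 1, ThrottleMode.Shaping)
    .mapAsync(4) { case (text, p) => Future.successful(text -> p) } // real http call goes here
    .to(Sink.foreach { case (text, p) => p.success(text) })
    .run()

def sendRequest(value: String): Future[String] = {
  val p = Promise[String]()
  Source.single(value -> p).runWith(requestSink)
  p.future
}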

Related

Consuming Server Sent Events (SSE) without losing data using Scala and Akka

I want to consume SSE events without losing any data when the rate of production is greater than the rate of consumption. Since SSE supports backpressure, Akka should be able to do it. I tried a few different ways, but the extra messages are being dropped.
import javax.inject.{Inject, Singleton}

import akka.NotUsed
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.headers.{Authorization, BasicHttpCredentials}
import akka.http.scaladsl.model.sse.ServerSentEvent
import akka.http.scaladsl.model.{HttpRequest, HttpResponse, Uri}
import akka.stream.ThrottleMode
import akka.stream.alpakka.sse.scaladsl.EventSource
import akka.stream.scaladsl.{Sink, Source}

import scala.collection.immutable
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future, blocking}

@Singleton
class SseConsumer @Inject()()(implicit ec: ExecutionContext) {

  implicit val system = ActorSystem()

  val send: HttpRequest => Future[HttpResponse] = foo

  def foo(x: HttpRequest) = {
    try {
      val authHeader = Authorization(BasicHttpCredentials("user", "pass"))
      val newHeaders = x.withHeaders(authHeader)
      Http().singleRequest(newHeaders)
    } catch {
      case e: Exception =>
        println("Exception", e.printStackTrace())
        throw e
    }
  }

  val eventSource2: Source[ServerSentEvent, NotUsed] =
    EventSource(
      uri = Uri("https://xyz/a/events/user"),
      send,
      initialLastEventId = Some("2"),
      retryDelay = 1.second
    )

  def orderStatusEventStable() = {
    val events: Future[immutable.Seq[ServerSentEvent]] =
      eventSource2
        .throttle(elements = 1, per = 3000.milliseconds, maximumBurst = 1, ThrottleMode.Shaping)
        .take(5)
        .runWith(Sink.seq)

    events.map(_.foreach { x =>
      // TODO: push to sqs
      println("456")
      println(x.data)
    })
  }

  Future {
    blocking {
      while (true) {
        try {
          Await.result(orderStatusEventStable() recover {
            case e: Exception =>
              println("exception", e)
              throw e
          }, Duration.Inf)
        } catch {
          case e: Exception =>
            println("Exception", e.printStackTrace())
        }
      }
    }
  }
}
This code works, but with the following problems:
Due to .take(5), when the rate of consumption < rate of production, I am dropping events.
Also, I want to process each message as it comes rather than wait until 5 messages have arrived. How can I do that?
I have to write the consumer in a while loop. This does not seem event based; it looks like polling (very similar to calling GET with pagination and a limit of 5).
I am not sure about throttling; I tried reading the docs but it's very confusing. If I don't want to lose any events, is throttling the right approach? I am expecting a rate of 5000 req/sec in peak hours and 10 req/sec otherwise. When the production rate is high I would ideally want to apply backpressure. Is throttling the correct approach for that? According to the docs it seems correct, as they say "Backpressures when downstream backpressures or the incoming rate is higher than the speed limit".
In order for Akka Streams backpressure to work, you have to use only one source instead of recreating a kind of polling with a new source each time.
Forget about your loop and your def orderStatusEventStable.
Only do something like this (once):
eventSource2
  .operator(event => /* do something */)
  .otherOperatorMaybe()
  ...
  .runWith(Sink.foreach(println))
With operator and otherOperatorMaybe being operations on Akka Streams, depending on what you want to achieve (like throttle and take in your original code).
List of operators: https://doc.akka.io/docs/akka/current/stream/operators/index.html
Akka Streams is powerful, but you need to take some time to learn about it.
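For instance, to process each event as it arrives with backpressure, something along these lines (a sketch; pushToSqs is a hypothetical helper of type ServerSentEvent => Future[Unit], not part of the original answer):
// mapAsync(1) keeps the SSE source backpressured while each event is
// handled, so no events are dropped and no while-loop is needed
eventSource2
  .mapAsync(parallelism = 1) { event =>
    println(event.data)
    pushToSqs(event) // completes before the next event is demanded
  }
  .runWith(Sink.ignore)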

Integrating with a non-thread-safe service, while maintaining back pressure, in a concurrent environment using Futures, Akka Streams and Akka actors

I am using a third-party library to provide parsing services (user agent parsing in my case). It is not a thread-safe library and has to operate on a single-threaded basis. I would like to write a thread-safe API that can be called by multiple threads to interact with it via the Futures API, as the library might introduce some potential blocking (IO). I would also like to provide back pressure when necessary and return a failed future when the parser doesn't keep up with the producers.
It is actually a generic requirement/question: how to interact with any client/library which is not thread safe (user agent/geo location parsers, db clients like redis, log collectors like fluentd) with back pressure in a concurrent environment.
I came up with the following formula:
encapsulate the parser within a dedicated Actor.
create an akka stream source queue that receives ParseRequest (containing the user agent and a Promise to complete) and uses the ask pattern via mapAsync to interact with the parser actor.
create another actor to encapsulate the source queue.
Is this the way to go? Is there any other way to achieve this, maybe simpler? Maybe using a GraphStage? Can it be done without the ask pattern and with less code involved?
The actor mentioned in number 3 exists because I'm not sure whether the source queue is thread safe or not. I wish this were simply stated in the docs, but it isn't; there are multiple versions over the web, some stating it's not and some stating it is.
Is the source queue, once materialized, thread safe for offering elements from different threads?
(The code may not compile and is prone to potential failures; it is only intended to illustrate this question.)
import akka.actor.{Actor, ActorRefFactory, Props}
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.{ActorMaterializer, OverflowStrategy, QueueOfferResult}

import scala.concurrent.{Future, Promise}
import scala.util.Try

class UserAgentRepo(dbFilePath: String)(implicit actorRefFactory: ActorRefFactory) {
  import akka.pattern.ask
  import akka.util.Timeout
  import scala.concurrent.duration._

  implicit val askTimeout = Timeout(5 seconds)

  // API to the parser - delegates the request to the back pressure actor
  def parse(userAgent: String): Future[Option[UserAgentData]] = {
    val p = Promise[Option[UserAgentData]]()
    parserBackPressureProvider ! UserAgentParseRequest(userAgent, p)
    p.future
  }

  // Actor that provides back pressure and delegates requests to the parser actor
  private class ParserBackPressureProvider extends Actor {
    import context.dispatcher
    implicit val materializer = ActorMaterializer()

    private val parser = context.actorOf(Props[UserAgentParserActor])

    val queue = Source.queue[UserAgentParseRequest](100, OverflowStrategy.dropNew)
      .mapAsync(1)(request => (parser ? request.userAgent).mapTo[Option[UserAgentData]].map(_ -> request.p))
      .to(Sink.foreach {
        case (result, promise) => promise.success(result)
      })
      .run()

    override def receive: Receive = {
      case request: UserAgentParseRequest =>
        queue.offer(request).map {
          case QueueOfferResult.Enqueued =>
          case _ => request.p.failure(new RuntimeException("parser busy"))
        }
    }
  }

  // Parser actor: the only place the non-thread-safe parser is touched
  private class UserAgentParserActor extends Actor {
    private val up = new UserAgentParser(dbFilePath, true, 50000)

    override def receive: Receive = {
      case userAgent: String =>
        sender ! Try {
          up.parseUa(userAgent)
        }.toOption.map(UserAgentData(userAgent, _))
    }
  }

  private case class UserAgentParseRequest(userAgent: String, p: Promise[Option[UserAgentData]])

  private val parserBackPressureProvider = actorRefFactory.actorOf(Props[ParserBackPressureProvider])
}
Do you have to use actors for this?
It does not seem like you need all this complexity; Scala/Java has all the tools you need "out of the box":
import java.util.concurrent.{LinkedBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}

import scala.concurrent.{ExecutionContext, Future}

class ParserFacade(parser: UserAgentParser, val capacity: Int = 100) {

  // Single worker thread with a bounded queue: submissions beyond
  // `capacity` are rejected instead of piling up.
  private implicit val ec = ExecutionContext
    .fromExecutor(
      new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable](capacity)
      )
    )

  def parse(ua: String): Future[Option[UserAgentData]] = try {
    Future(Some(UserAgentData(ua, parser.parseUa(ua))))
      .recover { case _ => None }
  } catch {
    case _: RejectedExecutionException =>
      Future.failed(new RuntimeException("parser is busy"))
  }
}
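Usage would then look something like this (constructor arguments borrowed from the question's code; the user-agent string is just an example):
// all parsing is serialized on the facade's single thread
val facade = new ParserFacade(new UserAgentParser(dbFilePath, true, 50000))
facade.parse("Mozilla/5.0 ...") // Future[Option[UserAgentData]]; fails fast when the queue is full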

How to throttle Futures with one-second delay with Akka

I have a list of URIs, each of which I want to request with a one-second delay in between. How can I do that?
val uris: List[String] = List()
// How to make these URIs resolve 1 second apart?
val responses: List[Future[Response]] = uris.map(httpRequest(_))
You could create an Akka Streams Source from the list of URIs, then throttle the conversion of each URI to a Future[Response]:
def httpRequest(uri: String): Future[Response] = ???
val uris: List[String] = ???
val responses: Future[Seq[Response]] =
  Source(uris)
    .throttle(1, 1.second)
    .mapAsync(parallelism = 1)(httpRequest)
    .runWith(Sink.seq[Response])
Something like this perhaps:
@tailrec
def withDelay(
  uris: Seq[String],
  delay: FiniteDuration = 1.second,
  result: List[Future[Response]] = Nil
): Seq[Future[Response]] = uris match {
  case Seq() => result.reverse
  case Seq(head, tail @ _*) =>
    // wait for the previous future, then schedule the next request
    // (assumes an implicit ExecutionContext and an ActorSystem `system` in scope)
    val v = result.headOption.getOrElse(Future.successful(null))
      .flatMap { _ =>
        akka.pattern.after(delay, system.scheduler)(httpRequest(head))
      }
    withDelay(tail, delay, v :: result)
}
this has a delay before the first execution as well, but I hope it's clear enough how to get rid of it if necessary ...
Another caveat is that this assumes all futures succeed. As soon as one fails, all subsequent processing is aborted.
If you need a different behavior, you may want to replace the .flatMap with .transform or add a .recover, etc.
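For example, with transformWith (available on Future since Scala 2.12) the chain proceeds regardless of how the previous future completed; a sketch of a replacement for the .flatMap step above:
val v = result.headOption.getOrElse(Future.successful(null))
  .transformWith { _ => // runs on both success and failure of the previous future
    akka.pattern.after(delay, system.scheduler)(httpRequest(head))
  }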
You can also write the same with .foldLeft if preferred:
uris.foldLeft(List.empty[Future[Response]]) { case (results, next) =>
  results.headOption.getOrElse(Future.successful(null))
    .flatMap { _ =>
      akka.pattern.after(delay, system.scheduler)(httpRequest(next))
    } :: results
}.reverse
Akka Streams has this out of the box with the throttle function (taking into account that you are using akka-http and added the tag for akka-streams).

Akka HTTP Connection Pool Hangs After Couple of Hours

I have an HTTP Connection Pool that hangs after a couple of hours of running:
private def createHttpPool(host: String): SourceQueue[(HttpRequest, Promise[HttpResponse])] = {
  val pool = Http().cachedHostConnectionPoolHttps[Promise[HttpResponse]](host)
  Source.queue[(HttpRequest, Promise[HttpResponse])](config.poolBuffer, OverflowStrategy.dropNew)
    .via(pool)
    .toMat(Sink.foreach {
      case (Success(res), p) => p.success(res)
      case (Failure(e), p) => p.failure(e)
    })(Keep.left)
    .run
}
I enqueue items with:
private def enqueue(uri: Uri): Future[HttpResponse] = {
  val promise = Promise[HttpResponse]
  val request = HttpRequest(uri = uri) -> promise
  queue.offer(request).flatMap {
    case Enqueued => promise.future
    case _ => Future.failed(ConnectionPoolDroppedRequest)
  }
}
And resolve the response like this:
private def request(uri: Uri): Future[HttpResponse] = {
  def retry = {
    Thread.sleep(config.dispatcherRetryInterval)
    logger.info(s"retrying")
    request(uri)
  }

  logger.info("req-start")
  for {
    response <- enqueue(uri)
    _ = logger.info("req-end")
    finalResponse <- response.status match {
      case TooManyRequests => retry
      case OK => Future.successful(response)
      case _ => response.entity.toStrict(10.seconds).map(s => throw Error(s.toString, uri.toString))
    }
  } yield finalResponse
}
The result of this function is then always transformed if the Future is successful:
def get(uri: Uri): Future[Try[JValue]] = {
  for {
    response <- request(uri)
    json <- Unmarshal(response.entity).to[Try[JValue]]
  } yield json
}
Everything works fine for a while, and then all I see in the logs is req-start with no req-end.
My akka configuration is like this:
akka {
  actor.deployment.default {
    dispatcher = "my-dispatcher"
  }
}

my-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 256
    parallelism-factor = 128.0
    parallelism-max = 1024
  }
}

akka.http {
  host-connection-pool {
    max-connections = 512
    max-retries = 5
    max-open-requests = 16384
    pipelining-limit = 1
  }
}
I'm not sure if this is a configuration problem or a code problem. I have my parallelism and connection numbers set so high because without them I get a very poor req/s rate (I want to request as fast as possible; I have other rate-limiting code to protect the server).
You are not consuming the entity of the responses you get back from the server. Citing the docs below:
"Consuming (or discarding) the Entity of a request is mandatory! If accidentally left neither consumed or discarded Akka HTTP will assume the incoming data should remain back-pressured, and will stall the incoming data via TCP back-pressure mechanisms. A client should consume the Entity regardless of the status of the HttpResponse."
The entity comes in the form of a Source[ByteString, _] which needs to be run to avoid resource starvation.
If you don't need to read the entity, the simplest way to consume the entity bytes is to discard them, by using
res.discardEntityBytes()
(you can attach a callback by adding, e.g., .future().map(...)).
This page in the docs describes all the alternatives, including how to read the bytes if needed.
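For example, if you do need the payload, one of those alternatives is collecting the entity into memory with toStrict (the question's code already does this for error responses); a sketch:
// read the entity fully into memory, bounded by a timeout
response.entity.toStrict(5.seconds).map(strict => strict.data.utf8String)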
--- EDIT
After more code/info was provided, it became clear that resource consumption is not the problem. There is, however, another big red flag in this implementation, namely the Thread.sleep in the retry method.
This is a blocking call that is very likely to starve the threading infrastructure of your underlying actor system.
A full-blown explanation of why this is dangerous is provided in the docs.
Try changing that and using akka.pattern.after (docs). Example below:
def retry = akka.pattern.after(200.millis, using = system.scheduler)(request(uri))
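Combining both fixes, the retry path might look like this (a sketch; it assumes config.dispatcherRetryInterval is, or is converted to, a FiniteDuration):
def retry: Future[HttpResponse] = {
  logger.info("retrying")
  akka.pattern.after(config.dispatcherRetryInterval, using = system.scheduler)(request(uri))
}
// ... and in the status match, drain the 429 response before retrying:
// case TooManyRequests =>
//   response.discardEntityBytes()
//   retry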

How can Akka streams be materialized continually?

I am using Akka Streams in Scala to poll from an AWS SQS queue using the AWS Java SDK. I created an ActorPublisher which dequeues messages on a two second interval:
class SQSSubscriber(name: String) extends ActorPublisher[Message] {
  implicit val materializer = ActorMaterializer()

  val schedule = context.system.scheduler.schedule(0 seconds, 2 seconds, self, "dequeue")

  val client = new AmazonSQSClient()
  client.setRegion(RegionUtils.getRegion("us-east-1"))
  val url = client.getQueueUrl(name).getQueueUrl

  val MaxBufferSize = 100
  var buf = Vector.empty[Message]

  override def receive: Receive = {
    case "dequeue" =>
      val messages = iterableAsScalaIterable(client.receiveMessage(new ReceiveMessageRequest(url)).getMessages).toList
      messages.foreach(self ! _)
    case message: Message if buf.size == MaxBufferSize =>
      log.error("The buffer is full")
    case message: Message =>
      if (buf.isEmpty && totalDemand > 0)
        onNext(message)
      else {
        buf :+= message
        deliverBuf()
      }
    case Request(_) =>
      deliverBuf()
    case Cancel =>
      context.stop(self)
  }

  @tailrec final def deliverBuf(): Unit =
    if (totalDemand > 0) {
      if (totalDemand <= Int.MaxValue) {
        val (use, keep) = buf.splitAt(totalDemand.toInt)
        buf = keep
        use foreach onNext
      } else {
        val (use, keep) = buf.splitAt(Int.MaxValue)
        buf = keep
        use foreach onNext
        deliverBuf()
      }
    }
}
In my application, I am attempting to run the flow at a 2 second interval as well:
val system = ActorSystem("system")
val sqsSource = Source.actorPublisher[Message](SQSSubscriber.props("queue-name"))
val flow = Flow[Message]
  .map { elem => system.log.debug(s"${elem.getBody} (${elem.getMessageId})"); elem }
  .to(Sink.ignore)

system.scheduler.schedule(0 seconds, 2 seconds) {
  flow.runWith(sqsSource)(ActorMaterializer()(system))
}
However, when I run my application I receive java.util.concurrent.TimeoutException: Futures timed out after [20000 milliseconds] and subsequent dead-letter notices, which are caused by the ActorMaterializer.
Is there a recommended approach for continually materializing an Akka Stream?
I don't think you need to create a new ActorPublisher every 2 seconds. This seems redundant and wasteful of memory. Also, I don't think an ActorPublisher is necessary. From what I can tell of the code, your implementation will have an ever growing number of Streams all querying the same data. Each Message from the client will be processed by N different akka Streams and, even worse, N will grow over time.
Iterator For Infinite Loop Querying
You can get the same behavior from your ActorPublisher by using Scala's Iterator. It is possible to create an Iterator which continuously queries the client:
//setup the client
val client = {
  val sqsClient = new AmazonSQSClient()
  sqsClient setRegion (RegionUtils getRegion "us-east-1")
  sqsClient
}

val url = client.getQueueUrl(name).getQueueUrl

//single query
def queryClientForMessages: Iterable[Message] = iterableAsScalaIterable {
  client.receiveMessage(new ReceiveMessageRequest(url)).getMessages
}

def messageListIterator: Iterator[Iterable[Message]] =
  Iterator continually queryClientForMessages

//messages one-at-a-time "on demand", no timer pushing you around
def messageIterator(): Iterator[Message] = messageListIterator flatMap identity
This implementation only queries the client when all previous Messages have been consumed and is therefore truly reactive. There is no need to keep track of a buffer with a fixed size. Your solution needs a buffer because the creation of Messages (via a timer) is de-coupled from the consumption of Messages (via println). In my implementation, creation & consumption are tightly coupled via back-pressure.
Akka Stream Source
You can then use this Iterator generator-function to feed an akka stream Source:
def messageSource: Source[Message, _] = Source.fromIterator(() => messageIterator())
Flow Formation
And finally you can use this Source to perform the println (As a side note: your flow value is actually a Sink since Flow + Sink = Sink). Using your flow value from the question:
messageSource runWith flow
One akka Stream processing all messages.