Consuming Server Sent Events(SSE) without losing data using scala and Akka - scala

I want to consume SSE events without losing any data when the rate of production is > rate of consumption. Since SSE supports backpressure Akka should be able to do it. I tried a few different ways but the extra messages are being dropped.
#Singleton
class SseConsumer #Inject()()(implicit ec: ExecutionContext) {
implicit val system = ActorSystem()
val send: HttpRequest => Future[HttpResponse] = foo
def foo(x: HttpRequest) = {
try {
val authHeader = Authorization(BasicHttpCredentials("user", "pass"))
val newHeaders = x.withHeaders(authHeader)
Http().singleRequest(newHeaders)
} catch {
case e: Exception => {
println("Exceptio12n", e.printStackTrace())
throw e
}
}
}
val eventSource2: Source[ServerSentEvent, NotUsed] =
EventSource(
uri = Uri("https://xyz/a/events/user"),
send,
initialLastEventId = Some("2"),
retryDelay = 1.second
)
def orderStatusEventStable() = {
val events: Future[immutable.Seq[ServerSentEvent]] =
eventSource2
.throttle(elements = 1, per = 3000.milliseconds, maximumBurst = 1, ThrottleMode.Shaping)
.take(5)
.runWith(Sink.seq)
events.map(_.foreach(x => {
// TODO: push to sqs
println("456")
println(x.data)
}))
}
Future {
blocking {
while (true) {
try {
Await.result(orderStatusEventStable() recover {
case e: Exception => {
println("exception", e)
throw e
}
}, Duration.Inf)
} catch {
case e: Exception => {
println("Exception", e.printStackTrace())
}
}
}
}
}
}
This code works but with the following problems:
Due to .take(5) when rate of consumption < rate of production, I am dropping events.
Also I want to process each message as it comes, and don't want to wait until 5 messages have reached. How can I do that ?
I have to write the consumer in a while loop. This does not seem event based, rather looks like polling (very similar to calling GET with pagination and limit of 5)
I am not sure about throttling, tried reading the docs but its very confusing. If I don't want to lose any events, is throttling the right approach? I am expecting a rate of 5000 req / sec in peak hours and 10 req/sec otherwise. When the production rate is high I would I ideally want to apply backpressure. Is throttling the correct approach for that ? According to docs it seems correct as it says Backpressures when downstream backpressures or the incoming rate is higher than the speed limit

In order for Akka Stream back pressuring to work, you have to use only one source instead of recreating a kind of polling with a new source each time.
Forget about your loop and your def orderStatusEventStable.
Only do something like this (once):
eventSource2
.operator(event => /* do something */ )
.otherOperatorMaybe()
...
.runWith(Sink.foreach(println))
With operator and otherOperatorMaybe being operations on Akka Stream depending on what you want to achieve (like throttle and take in your original code).
List of operators: https://doc.akka.io/docs/akka/current/stream/operators/index.html
Akka Stream is powerful but you need to take some time to learn about it

Related

Consuming Server Sent Events(SSE) in scala play framework with automatic reconnect

How can we consume SSE in scala play framework? Most of the resources that I could find were to make an SSE source. I want to reliably listen to SSE events from other services (with autoconnect). The most relevant article was https://doc.akka.io/docs/alpakka/current/sse.html . I implemented this but this does not seem to work (code below). Also the event that I am su
#Singleton
class SseConsumer #Inject()((implicit ec: ExecutionContext) {
implicit val system = ActorSystem()
val send: HttpRequest => Future[HttpResponse] = foo
def foo(x:HttpRequest) = {
try {
println("foo")
val authHeader = Authorization(BasicHttpCredentials("user", "pass"))
val newHeaders = x.withHeaders(authHeader)
Http().singleRequest(newHeaders)
}catch {
case e:Exception => {
println("Exception", e.printStackTrace())
throw e
}
}
}
val eventSource: Source[ServerSentEvent, NotUsed] =
EventSource(
uri = Uri("https://abc/v1/events"),
send,
initialLastEventId = Some("2"),
retryDelay = 1.second
)
def orderStatusEventStable() = {
val events: Future[immutable.Seq[ServerSentEvent]] =
eventSource
.throttle(elements = 1, per = 500.milliseconds, maximumBurst = 1, ThrottleMode.Shaping)
.take(10)
.runWith(Sink.seq)
events.map(_.foreach( x => {
println("456")
println(x.data)
}))
}
Future {
blocking{
while(true){
try{
Thread.sleep(2000)
orderStatusEventStable()
} catch {
case e:Exception => {
println("Exception", e.printStackTrace())
}
}
}
}
}
}
This does not give any exceptions and println("456") is never printed.
EDIT:
Future {
blocking {
while(true){
try{
Await.result(orderStatusEventStable() recover {
case e: Exception => {
println("exception", e)
throw e
}
}, Duration.Inf)
} catch {
case e:Exception => {
println("Exception", e.printStackTrace())
}
}
}
}
}
Added an await and it started working. Able to read 10 messages at a time. But now I am faced with another problem.
I have a producer which can at times produce faster than I can consume and with this code I have 2 issues:
I have to wait until 10 messages are available. How can we take a max. of 10 and a min. of 0 messages?
When the production rate > consumption rate, I am missing few events. I am guessing this is due to throttling. How do we handle it using backpressure?
The issue in your code is that the events: Future would only complete when the stream (eventSource) completes.
I'm not familiar with SSE but the stream likely never completes in your case as it's always listening for new events.
You can learn more in Akka Stream documentation.
Depending on what you want to do with the events, you could just map on the stream like:
eventSource
...
.map(/* do something */)
.runWith(...)
Basically, you need to work with the Akka Stream Source as data is going through it but don't wait for its completion.
EDIT: I didn't notice the take(10), my answer applies only if the take was not here. Your code should work after 10 events sent.

Akka HTTP Connection Pool Hangs After Couple of Hours

I have an HTTP Connection Pool that hangs after a couple of hours of running:
private def createHttpPool(host: String): SourceQueue[(HttpRequest, Promise[HttpResponse])] = {
val pool = Http().cachedHostConnectionPoolHttps[Promise[HttpResponse]](host)
Source.queue[(HttpRequest, Promise[HttpResponse])](config.poolBuffer, OverflowStrategy.dropNew)
.via(pool).toMat(Sink.foreach {
case ((Success(res), p)) => p.success(res)
case ((Failure(e), p)) => p.failure(e)
})(Keep.left).run
}
I enqueue items with:
private def enqueue(uri: Uri): Future[HttpResponse] = {
val promise = Promise[HttpResponse]
val request = HttpRequest(uri = uri) -> promise
queue.offer(request).flatMap {
case Enqueued => promise.future
case _ => Future.failed(ConnectionPoolDroppedRequest)
}
}
And resolve the response like this:
private def request(uri: Uri): Future[HttpResponse] = {
def retry = {
Thread.sleep(config.dispatcherRetryInterval)
logger.info(s"retrying")
request(uri)
}
logger.info("req-start")
for {
response <- enqueue(uri)
_ = logger.info("req-end")
finalResponse <- response.status match {
case TooManyRequests => retry
case OK => Future.successful(response)
case _ => response.entity.toStrict(10.seconds).map(s => throw Error(s.toString, uri.toString))
}
} yield finalResponse
}
The result of this function is then always transformed if the Future is successful:
def get(uri: Uri): Future[Try[JValue]] = {
for {
response <- request(uri)
json <- Unmarshal(response.entity).to[Try[JValue]]
} yield json
}
Everything works fine for a while and then all I see in the logs are req-start and no req-end.
My akka configuration is like this:
akka {
actor.deployment.default {
dispatcher = "my-dispatcher"
}
}
my-dispatcher {
type = Dispatcher
executor = "fork-join-executor"
fork-join-executor {
parallelism-min = 256
parallelism-factor = 128.0
parallelism-max = 1024
}
}
akka.http {
host-connection-pool {
max-connections = 512
max-retries = 5
max-open-requests = 16384
pipelining-limit = 1
}
}
I'm not sure if this is a configuration problem or a code problem. I have my parallelism and connection numbers so high because without it I get very poor req/s rate (I want to request as fast possible - I have other rate limiting code to protect the server).
You are not consuming the entity of the responses you get back from the server. Citing the docs below:
Consuming (or discarding) the Entity of a request is mandatory! If
accidentally left neither consumed or discarded Akka HTTP will assume
the incoming data should remain back-pressured, and will stall the
incoming data via TCP back-pressure mechanisms. A client should
consume the Entity regardless of the status of the HttpResponse.
The entity comes in the form of a Source[ByteString, _] which needs to be run to avoid resource starvation.
If you don't need to read the entity, the simplest way to consume the entity bytes is to discard them, by using
res.discardEntityBytes()
(you can attach a callback by adding - e.g. - .future().map(...)).
This page in the docs describes all the alternatives to this, including how to read the bytes if needed.
--- EDIT
After more code/info was provided, it is clear that the resource consumption is not the problem. There is another big red flag in this implementation, namely the Thread.sleep in the retry method.
This is a blocking call that is very likely to starve the threading infrastructure of your underlying actor system.
A full blown explanation of why this is dangerous was provided in the docs.
Try changing that and using akka.pattern.after (docs). Example below:
def retry = akka.pattern.after(200 millis, using = system.scheduler)(request(uri))

How would you "connect" many independent graphs maintaining backpressure between them?

Continuing series of questions about akka-streams I have another problem.
Variables:
Single http client flow with throttling
Multiple other flows that want to use first flow simultaneously
Goal:
Single http flow is flow that makes requests to particular API that limits number of calls to it. Otherwise it bans me. Thus it's very important to maintain rate of request regardless of how many clients in my code use it.
There are number of other flows that want to make requests to mentioned API but I'd like to have backpressure from http flow. Normally you connect whole thing to one graph and it works. But it my case I have multiple graphs.
How would you solve it ?
My attempt to solve it:
I use Source.queue for http flow so that I can queue http requests and have throttling. Problem is that Future from SourceQueue.offer fails if I exceed number of requests. Thus somehow I need to "reoffer" when previously offered event completes. Thus modified Future from SourceQueue would backpressure other graphs (inside their mapAsync) that make http requests.
Here is how I implemented it
object Main {
implicit val system = ActorSystem("root")
implicit val executor = system.dispatcher
implicit val materializer = ActorMaterializer()
private val queueHttp = Source.queue[(String, Promise[String])](2, OverflowStrategy.backpressure)
.throttle(1, FiniteDuration(1000, MILLISECONDS), 1, ThrottleMode.Shaping)
.mapAsync(4) {
case (text, promise) =>
// Simulate delay of http request
val delay = (Random.nextDouble() * 1000 / 2).toLong
Thread.sleep(delay)
Future.successful(text -> promise)
}
.toMat(Sink.foreach({
case (text, p) =>
p.success(text)
}))(Keep.left)
.run
val futureDeque = new ConcurrentLinkedDeque[Future[String]]()
def sendRequest(value: String): Future[String] = {
val p = Promise[String]()
val offerFuture = queueHttp.offer(value -> p)
def addToQueue(future: Future[String]): Future[String] = {
futureDeque.addLast(future)
future.onComplete {
case _ => futureDeque.remove(future)
}
future
}
offerFuture.flatMap {
case QueueOfferResult.Enqueued =>
addToQueue(p.future)
}.recoverWith {
case ex =>
val first = futureDeque.pollFirst()
if (first != null)
addToQueue(first.flatMap(_ => sendRequest(value)))
else
sendRequest(value)
}
}
def main(args: Array[String]) {
val allFutures = for (v <- 0 until 15)
yield {
val res = sendRequest(s"Text $v")
res.onSuccess {
case text =>
println("> " + text)
}
res
}
Future.sequence(allFutures).onComplete {
case Success(text) =>
println(s">>> TOTAL: ${text.length} [in queue: ${futureDeque.size()}]")
system.terminate()
case Failure(ex) =>
ex.printStackTrace()
system.terminate()
}
Await.result(system.whenTerminated, Duration.Inf)
}
}
Disadvantage of this solution is that I have locking on ConcurrentLinkedDeque which is probably not that bad for rate of 1 request per second but still.
How would you solve this task?
We have an open ticket (https://github.com/akka/akka/issues/19478) and some ideas for a "Hub" stage which would allow for dynamically combining streams, but I'm afraid I cannot give you any estimate for when it will be done.
So that is how we, in the Akka team, would solve the task. ;)

How can Akka streams be materialized continually?

I am using Akka Streams in Scala to poll from an AWS SQS queue using the AWS Java SDK. I created an ActorPublisher which dequeues messages on a two second interval:
class SQSSubscriber(name: String) extends ActorPublisher[Message] {
implicit val materializer = ActorMaterializer()
val schedule = context.system.scheduler.schedule(0 seconds, 2 seconds, self, "dequeue")
val client = new AmazonSQSClient()
client.setRegion(RegionUtils.getRegion("us-east-1"))
val url = client.getQueueUrl(name).getQueueUrl
val MaxBufferSize = 100
var buf = Vector.empty[Message]
override def receive: Receive = {
case "dequeue" =>
val messages = iterableAsScalaIterable(client.receiveMessage(new ReceiveMessageRequest(url).getMessages).toList
messages.foreach(self ! _)
case message: Message if buf.size == MaxBufferSize =>
log.error("The buffer is full")
case message: Message =>
if (buf.isEmpty && totalDemand > 0)
onNext(message)
else {
buf :+= message
deliverBuf()
}
case Request(_) =>
deliverBuf()
case Cancel =>
context.stop(self)
}
#tailrec final def deliverBuf(): Unit =
if (totalDemand > 0) {
if (totalDemand <= Int.MaxValue) {
val (use, keep) = buf.splitAt(totalDemand.toInt)
buf = keep
use foreach onNext
} else {
val (use, keep) = buf.splitAt(Int.MaxValue)
buf = keep
use foreach onNext
deliverBuf()
}
}
}
In my application, I am attempting to run the flow at a 2 second interval as well:
val system = ActorSystem("system")
val sqsSource = Source.actorPublisher[Message](SQSSubscriber.props("queue-name"))
val flow = Flow[Message]
.map { elem => system.log.debug(s"${elem.getBody} (${elem.getMessageId})"); elem }
.to(Sink.ignore)
system.scheduler.schedule(0 seconds, 2 seconds) {
flow.runWith(sqsSource)(ActorMaterializer()(system))
}
However, when I run my application I receive java.util.concurrent.TimeoutException: Futures timed out after [20000 milliseconds] and subsequent dead letter notices which is caused by the ActorMaterializer.
Is there a recommended approach for continually materializing an Akka Stream?
I don't think you need to create a new ActorPublisher every 2 seconds. This seems redundant and wasteful of memory. Also, I don't think an ActorPublisher is necessary. From what I can tell of the code, your implementation will have an ever growing number of Streams all querying the same data. Each Message from the client will be processed by N different akka Streams and, even worse, N will grow over time.
Iterator For Infinite Loop Querying
You can get the same behavior from your ActorPublisher by using scala's Iterator. It is possible to create an Iterator which continuously queries the client:
//setup the client
val client = {
val sqsClient = new AmazonSQSClient()
sqsClient setRegion (RegionUtils getRegion "us-east-1")
sqsClient
}
val url = client.getQueueUrl(name).getQueueUrl
//single query
def queryClientForMessages : Iterable[Message] = iterableAsScalaIterable {
client receiveMessage (new ReceiveMessageRequest(url).getMessages)
}
def messageListIteartor : Iterator[Iterable[Message]] =
Iterator continually messageListStream
//messages one-at-a-time "on demand", no timer pushing you around
def messageIterator() : Iterator[Message] = messageListIterator flatMap identity
This implementation only queries the client when all previous Messages have been consumed and is therefore truly reactive. No need to keep track of a buffer with fixed size. Your solution needs a buffer because the creation of Messages (via a timer) is de-coupled from the consumption of Messages (via println). In my implementation, creation & consumption are tightly coupled via back-pressure.
Akka Stream Source
You can then use this Iterator generator-function to feed an akka stream Source:
def messageSource : Source[Message, _] = Source fromIterator messageIterator
Flow Formation
And finally you can use this Source to perform the println (As a side note: your flow value is actually a Sink since Flow + Sink = Sink). Using your flow value from the question:
messageSource runWith flow
One akka Stream processing all messages.

Scala Akka Consumer/Producer: Return Value

Problem Statement
Assume I have a file with sentences that is processed line by line. In my case, I need to extract Named Entities (Persons, Organizations, ...) from these lines. Unfortunately, the tagger is quite slow. Therefore, I decided to parallelize the computation, such that lines could be processed independent from each other and the result is collected in a central location.
Current Approach
My current approach comprises the usage of a single producer multiple consumer concept. However, I'm relative new to Akka, but I think my problem description fits well into its capabilities. Let me show you some code:
Producer
The Producer reads the file line by line and sends it to the Consumer. If it reaches the total line limit, it propagates the result back to WordCount.
class Producer(consumers: ActorRef) extends Actor with ActorLogging {
var master: Option[ActorRef] = None
var result = immutable.List[String]()
var totalLines = 0
var linesProcessed = 0
override def receive = {
case StartProcessing() => {
master = Some(sender)
Source.fromFile("sent.txt", "utf-8").getLines.foreach { line =>
consumers ! Sentence(line)
totalLines += 1
}
context.stop(self)
}
case SentenceProcessed(list) => {
linesProcessed += 1
result :::= list
//If we are done, we can propagate the result to the creator
if (linesProcessed == totalLines) {
master.map(_ ! result)
}
}
case _ => log.error("message not recognized")
}
}
Consumer
class Consumer extends Actor with ActorLogging {
def tokenize(line: String): Seq[String] = {
line.split(" ").map(_.toLowerCase)
}
override def receive = {
case Sentence(sent) => {
//Assume: This is representative for the extensive computation method
val tokens = tokenize(sent)
sender() ! SentenceProcessed(tokens.toList)
}
case _ => log.error("message not recognized")
}
}
WordCount (Master)
class WordCount extends Actor {
val consumers = context.actorOf(Props[Consumer].
withRouter(FromConfig()).
withDispatcher("consumer-dispatcher"), "consumers")
val producer = context.actorOf(Props(new Producer(consumers)), "producer")
context.watch(consumers)
context.watch(producer)
def receive = {
case Terminated(`producer`) => consumers ! Broadcast(PoisonPill)
case Terminated(`consumers`) => context.system.shutdown
}
}
object WordCount {
def getActor() = new WordCount
def getConfig(routerType: String, dispatcherType: String)(numConsumers: Int) = s"""
akka.actor.deployment {
/WordCount/consumers {
router = $routerType
nr-of-instances = $numConsumers
dispatcher = consumer-dispatcher
}
}
consumer-dispatcher {
type = $dispatcherType
executor = "fork-join-executor"
}"""
}
The WordCount actor is responsible for creating the other actors. When the Consumer is finished the Producer sends a message with all tokens. But, how to propagate the message again and also accept and wait for it? The architecture with the third WordCount actor might be wrong.
Main Routine
case class Run(name: String, actor: () => Actor, config: (Int) => String)
object Main extends App {
val run = Run("push_implementation", WordCount.getActor _, WordCount.getConfig("balancing-pool", "Dispatcher") _)
def execute(run: Run, numConsumers: Int) = {
val config = ConfigFactory.parseString(run.config(numConsumers))
val system = ActorSystem("Counting", ConfigFactory.load(config))
val startTime = System.currentTimeMillis
system.actorOf(Props(run.actor()), "WordCount")
/*
How to get the result here?!
*/
system.awaitTermination
System.currentTimeMillis - startTime
}
execute(run, 4)
}
Problem
As you see, the actual problem is to propagate the result back to the Main routine. Can you tell me how to do this in a proper way? The question is also how to wait for the result until the consumers are finished? I had a brief look into the Akka Future documentation section, but the whole system is a little bit overwhelming for beginners. Something like var future = message ? actor seems suitable. Not sure, how to do this. Also using the WordCount actor causes additional complexity. Maybe it is possible to come up with a solution that doesn't need this actor?
Consider using the Akka Aggregator Pattern. That takes care of the low-level primitives (watching actors, poison pill, etc). You can focus on managing state.
Your call to system.actorOf() returns an ActorRef, but you're not using it. You should ask that actor for results. Something like this:
implicit val timeout = Timeout(5 seconds)
val wCount = system.actorOf(Props(run.actor()), "WordCount")
val answer = Await.result(wCount ? "sent.txt", timeout.duration)
This means your WordCount class needs a receive method that accepts a String message. That section of code should aggregate the results and tell the sender(), like this:
class WordCount extends Actor {
def receive: Receive = {
case filename: String =>
// do all of your code here, using filename
sender() ! results
}
}
Also, rather than blocking on the results with Await above, you can apply some techniques for handling Futures.