How can Akka streams be materialized continually?

I am using Akka Streams in Scala to poll from an AWS SQS queue using the AWS Java SDK. I created an ActorPublisher which dequeues messages on a two second interval:
class SQSSubscriber(name: String) extends ActorPublisher[Message] {
  implicit val materializer = ActorMaterializer()

  val schedule = context.system.scheduler.schedule(0 seconds, 2 seconds, self, "dequeue")

  val client = new AmazonSQSClient()
  client.setRegion(RegionUtils.getRegion("us-east-1"))
  val url = client.getQueueUrl(name).getQueueUrl

  val MaxBufferSize = 100
  var buf = Vector.empty[Message]

  override def receive: Receive = {
    case "dequeue" =>
      val messages = iterableAsScalaIterable(
        client.receiveMessage(new ReceiveMessageRequest(url)).getMessages).toList
      messages.foreach(self ! _)

    case message: Message if buf.size == MaxBufferSize =>
      log.error("The buffer is full")

    case message: Message =>
      if (buf.isEmpty && totalDemand > 0)
        onNext(message)
      else {
        buf :+= message
        deliverBuf()
      }

    case Request(_) =>
      deliverBuf()

    case Cancel =>
      context.stop(self)
  }

  @tailrec final def deliverBuf(): Unit =
    if (totalDemand > 0) {
      if (totalDemand <= Int.MaxValue) {
        val (use, keep) = buf.splitAt(totalDemand.toInt)
        buf = keep
        use foreach onNext
      } else {
        val (use, keep) = buf.splitAt(Int.MaxValue)
        buf = keep
        use foreach onNext
        deliverBuf()
      }
    }
}
In my application, I am attempting to run the flow at a 2 second interval as well:
val system = ActorSystem("system")

val sqsSource = Source.actorPublisher[Message](SQSSubscriber.props("queue-name"))

val flow = Flow[Message]
  .map { elem => system.log.debug(s"${elem.getBody} (${elem.getMessageId})"); elem }
  .to(Sink.ignore)

system.scheduler.schedule(0 seconds, 2 seconds) {
  flow.runWith(sqsSource)(ActorMaterializer()(system))
}
However, when I run my application I receive java.util.concurrent.TimeoutException: Futures timed out after [20000 milliseconds] and subsequent dead-letter notices, which are caused by the ActorMaterializer.
Is there a recommended approach for continually materializing an Akka Stream?

I don't think you need to create a new ActorPublisher every 2 seconds. That seems redundant and wasteful of memory. Also, I don't think an ActorPublisher is necessary at all. From what I can tell from the code, your implementation will have an ever-growing number of Streams all querying the same data. Each Message from the client will be processed by N different akka Streams and, even worse, N will grow over time.
Iterator For Infinite Loop Querying
You can get the same behavior as your ActorPublisher by using Scala's Iterator. It is possible to create an Iterator which continuously queries the client:
//setup the client
val client = {
  val sqsClient = new AmazonSQSClient()
  sqsClient setRegion (RegionUtils getRegion "us-east-1")
  sqsClient
}

val url = client.getQueueUrl(name).getQueueUrl

//single query
def queryClientForMessages: Iterable[Message] = iterableAsScalaIterable {
  (client receiveMessage new ReceiveMessageRequest(url)).getMessages
}

def messageListIterator: Iterator[Iterable[Message]] =
  Iterator continually queryClientForMessages

//messages one-at-a-time "on demand", no timer pushing you around
def messageIterator(): Iterator[Message] = messageListIterator flatMap identity
This implementation only queries the client when all previous Messages have been consumed and is therefore truly reactive. There is no need to keep track of a fixed-size buffer. Your solution needs a buffer because the creation of Messages (via a timer) is decoupled from their consumption (via the logging Sink). In my implementation, creation and consumption are tightly coupled via back-pressure.
Akka Stream Source
You can then use this Iterator generator-function to feed an akka stream Source:
def messageSource: Source[Message, _] = Source fromIterator messageIterator
Flow Formation
And finally, you can use this Source to drive your logging stage (as a side note: your flow value is actually a Sink, since Flow + Sink = Sink). Using your flow value from the question:
messageSource runWith flow
One akka Stream processing all messages.
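One practical caveat worth noting: receiveMessage blocks inside the iterator, so if that ever becomes a concern you can isolate the source on its own dispatcher. A rough sketch; the dispatcher name and sizing below are purely illustrative:

// in application.conf (illustrative name and sizing)
// sqs-blocking-dispatcher {
//   type = Dispatcher
//   executor = "thread-pool-executor"
//   thread-pool-executor.fixed-pool-size = 1
// }

import akka.stream.ActorAttributes

// run the blocking SQS polling on its own thread pool instead of the default dispatcher
val isolatedMessageSource: Source[Message, _] =
  messageSource.withAttributes(ActorAttributes.dispatcher("sqs-blocking-dispatcher"))

isolatedMessageSource runWith flow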

Related

Consuming Server Sent Events(SSE) without losing data using scala and Akka

I want to consume SSE events without losing any data when the rate of production is greater than the rate of consumption. Since SSE supports backpressure, Akka should be able to handle it. I tried a few different ways, but the extra messages are being dropped.
@Singleton
class SseConsumer @Inject()()(implicit ec: ExecutionContext) {

  implicit val system = ActorSystem()

  val send: HttpRequest => Future[HttpResponse] = foo

  def foo(x: HttpRequest) = {
    try {
      val authHeader = Authorization(BasicHttpCredentials("user", "pass"))
      val newHeaders = x.withHeaders(authHeader)
      Http().singleRequest(newHeaders)
    } catch {
      case e: Exception => {
        println("Exception", e.printStackTrace())
        throw e
      }
    }
  }

  val eventSource2: Source[ServerSentEvent, NotUsed] =
    EventSource(
      uri = Uri("https://xyz/a/events/user"),
      send,
      initialLastEventId = Some("2"),
      retryDelay = 1.second
    )

  def orderStatusEventStable() = {
    val events: Future[immutable.Seq[ServerSentEvent]] =
      eventSource2
        .throttle(elements = 1, per = 3000.milliseconds, maximumBurst = 1, ThrottleMode.Shaping)
        .take(5)
        .runWith(Sink.seq)

    events.map(_.foreach(x => {
      // TODO: push to sqs
      println("456")
      println(x.data)
    }))
  }

  Future {
    blocking {
      while (true) {
        try {
          Await.result(orderStatusEventStable() recover {
            case e: Exception => {
              println("exception", e)
              throw e
            }
          }, Duration.Inf)
        } catch {
          case e: Exception => {
            println("Exception", e.printStackTrace())
          }
        }
      }
    }
  }
}
This code works but with the following problems:
Due to .take(5), when the rate of consumption is less than the rate of production I am dropping events.
Also, I want to process each message as it arrives rather than waiting until 5 messages have accumulated. How can I do that?
I have to write the consumer in a while loop. This does not seem event-based; it looks more like polling (very similar to calling GET with pagination and a limit of 5).
I am not sure about throttling; I tried reading the docs but they are very confusing. If I don't want to lose any events, is throttling the right approach? I am expecting a rate of 5000 req/sec in peak hours and 10 req/sec otherwise. When the production rate is high I would ideally want to apply backpressure. Is throttling the correct approach for that? According to the docs it seems correct, as they say "Backpressures when downstream backpressures or the incoming rate is higher than the speed limit".
In order for Akka Stream backpressure to work, you have to use only one source instead of recreating a kind of polling with a new source each time.
Forget about your loop and your def orderStatusEventStable.
Only do something like this (once):
eventSource2
  .operator(event => /* do something */ )
  .otherOperatorMaybe()
  ...
  .runWith(Sink.foreach(println))
Here operator and otherOperatorMaybe are Akka Streams operators, chosen depending on what you want to achieve (like throttle and take in your original code).
List of operators: https://doc.akka.io/docs/akka/current/stream/operators/index.html
Akka Streams is powerful, but you need to take some time to learn it.
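For instance, a minimal concrete version of that sketch, processing each event as it arrives and pushing it to SQS, could look like the following. Here pushToSqs is a hypothetical function standing in for your real SQS call; mapAsync(1) makes the stream wait for each push to finish, so a slow consumer backpressures the SSE source instead of dropping events:

// pushToSqs is hypothetical: sends one event to SQS and returns a Future
def pushToSqs(event: ServerSentEvent): Future[Unit] = ???

val done =
  eventSource2
    .mapAsync(parallelism = 1) { event =>
      println(event.data)   // process each event as soon as it arrives
      pushToSqs(event)      // slow pushes backpressure the SSE source
    }
    .runWith(Sink.ignore)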

Akka Thread Tuning

I have 100 threads and need to process only 12 threads at a time, not more than that. After these threads complete, the next 12 have to be processed, and so on, but my code processes only the first set of 12 threads and then terminates.
Here is my logic:
class AkkaProcessing extends Actor {
  def receive = {
    case message: List[Any] =>
      var meterName = message(0) // It contains only 12 meters; it processes them and terminates. I am unable to get the remaining ones
      val sqlContext = message(1).asInstanceOf[SQLContext]
      val FlagDF = message(2).asInstanceOf[DataFrame]
      // ... all the business logic here ...
      context.system.shutdown()
  }
}
object Processing {
  def main(args: Array[String]) = {
    val rawBuff = new ArrayBuffer[Any]()
    val actorSystem = ActorSystem("ActorSystem") // Creating ActorSystem
    val actor = actorSystem.actorOf(Props[AkkaProcessing].withRouter(RoundRobinPool(200)), "my-Actor")
    implicit val executionContext = actorSystem.dispatchers.lookup("akka.actor.my-dispatcher")
    for (i <- 0 until meter_list.length) {
      var meterName = meter_list(i) // All 100 meters here
      rawBuff.append(meterName, sqlContext, FlagDF)
      actor ! rawBuff.toList
    }
  }
}
Any inputs are highly appreciated.
I think you would be best off creating 2 actor types: a consumer (several of which run in parallel) and a coordinator (which takes a batch of 12 tasks and passes them to the consumers). The coordinator would wait for the consumers to finish and then run the next batch.
See this answer for a code example: Can Scala actors process multiple messages simultaneously?
Failing that, you could just use Futures in a similar manner.
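As a rough, untested sketch of the coordinator/consumer idea (all names and message types here are made up for illustration, and the business logic is left as a comment):

import akka.actor.{Actor, Props}
import akka.routing.RoundRobinPool

case class Job(payload: Any)
case class JobDone(payload: Any)

class Worker extends Actor {
  def receive = {
    case Job(payload) =>
      // ... business logic for a single meter goes here ...
      sender() ! JobDone(payload)
  }
}

class Coordinator(jobs: List[Any], batchSize: Int) extends Actor {
  val workers = context.actorOf(RoundRobinPool(batchSize).props(Props[Worker]), "workers")
  var remaining = jobs
  var inFlight = 0

  override def preStart(): Unit = dispatchNextBatch()

  def dispatchNextBatch(): Unit = {
    val (batch, rest) = remaining.splitAt(batchSize)
    remaining = rest
    if (batch.isEmpty) {
      context.system.terminate() // all jobs processed (use shutdown() on older Akka)
    } else {
      inFlight = batch.size
      batch.foreach(job => workers ! Job(job))
    }
  }

  def receive = {
    case JobDone(_) =>
      inFlight -= 1
      if (inFlight == 0) dispatchNextBatch() // previous 12 finished, start the next 12
  }
}

// usage: process 100 meters, 12 at a time
// actorSystem.actorOf(Props(new Coordinator(meterList, 12)), "coordinator")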

How would you "connect" many independent graphs maintaining backpressure between them?

Continuing my series of questions about akka-streams, I have another problem.
Variables:
Single http client flow with throttling
Multiple other flows that want to use the first flow simultaneously
Goal:
The single http flow makes requests to a particular API that limits the number of calls to it; otherwise it bans me. Thus it's very important to maintain the request rate regardless of how many clients in my code use it.
There are a number of other flows that want to make requests to the mentioned API, but I'd like to have backpressure from the http flow. Normally you connect the whole thing into one graph and it works, but in my case I have multiple graphs.
How would you solve it?
My attempt to solve it:
I use Source.queue for the http flow so that I can queue http requests and throttle them. The problem is that the Future from SourceQueue.offer fails if I exceed the number of requests. Thus I somehow need to "re-offer" when a previously offered event completes. The modified Future from SourceQueue would then backpressure the other graphs (inside their mapAsync) that make http requests.
Here is how I implemented it
object Main {

  implicit val system = ActorSystem("root")
  implicit val executor = system.dispatcher
  implicit val materializer = ActorMaterializer()

  private val queueHttp = Source.queue[(String, Promise[String])](2, OverflowStrategy.backpressure)
    .throttle(1, FiniteDuration(1000, MILLISECONDS), 1, ThrottleMode.Shaping)
    .mapAsync(4) {
      case (text, promise) =>
        // Simulate delay of http request
        val delay = (Random.nextDouble() * 1000 / 2).toLong
        Thread.sleep(delay)
        Future.successful(text -> promise)
    }
    .toMat(Sink.foreach({
      case (text, p) =>
        p.success(text)
    }))(Keep.left)
    .run

  val futureDeque = new ConcurrentLinkedDeque[Future[String]]()

  def sendRequest(value: String): Future[String] = {
    val p = Promise[String]()
    val offerFuture = queueHttp.offer(value -> p)

    def addToQueue(future: Future[String]): Future[String] = {
      futureDeque.addLast(future)
      future.onComplete {
        case _ => futureDeque.remove(future)
      }
      future
    }

    offerFuture.flatMap {
      case QueueOfferResult.Enqueued =>
        addToQueue(p.future)
    }.recoverWith {
      case ex =>
        val first = futureDeque.pollFirst()
        if (first != null)
          addToQueue(first.flatMap(_ => sendRequest(value)))
        else
          sendRequest(value)
    }
  }

  def main(args: Array[String]) {
    val allFutures = for (v <- 0 until 15)
      yield {
        val res = sendRequest(s"Text $v")
        res.onSuccess {
          case text =>
            println("> " + text)
        }
        res
      }

    Future.sequence(allFutures).onComplete {
      case Success(text) =>
        println(s">>> TOTAL: ${text.length} [in queue: ${futureDeque.size()}]")
        system.terminate()
      case Failure(ex) =>
        ex.printStackTrace()
        system.terminate()
    }

    Await.result(system.whenTerminated, Duration.Inf)
  }
}
The disadvantage of this solution is that I have locking on the ConcurrentLinkedDeque, which is probably not that bad for a rate of 1 request per second, but still.
How would you solve this task?
We have an open ticket (https://github.com/akka/akka/issues/19478) and some ideas for a "Hub" stage which would allow for dynamically combining streams, but I'm afraid I cannot give you any estimate for when it will be done.
So that is how we, in the Akka team, would solve the task. ;)
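For readers finding this later: the "Hub" idea referenced above did eventually ship as MergeHub / BroadcastHub, so (assuming a reasonably recent Akka version) the dynamic fan-in it enables can be sketched roughly like this; the names and the println stand in for the real throttled http flow:

// Sketch only: one rate-limited consumer, many independent producers attaching dynamically.
// Assumes an implicit ActorSystem/ActorMaterializer in scope, as in the code above.
import akka.NotUsed
import akka.stream.ThrottleMode
import akka.stream.scaladsl.{MergeHub, Sink, Source}
import scala.concurrent.duration._

// Materialize the rate-limited "http" side exactly once; the materialized value
// is a Sink that any number of other graphs can attach to, with backpressure.
val throttledSink: Sink[String, NotUsed] =
  MergeHub.source[String](perProducerBufferSize = 16)
    .throttle(1, 1.second, 1, ThrottleMode.Shaping)
    .to(Sink.foreach(request => println(s"calling the API with $request")))
    .run()

// Elsewhere, each independent graph simply runs into the shared Sink:
Source(1 to 15).map(i => s"Text $i").runWith(throttledSink)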

Akka-Streams ActorPublisher does not receive any Request messages

I am trying to continuously read the wikipedia IRC channel using this lib: https://github.com/implydata/wikiticker
I created a custom Akka Publisher, which will be used in my system as a Source.
Here are some of my classes:
class IrcPublisher() extends ActorPublisher[String] {
  import scala.collection._

  var queue: mutable.Queue[String] = mutable.Queue()

  override def receive: Actor.Receive = {
    case Publish(s) =>
      println(s"->MSG, isActive = $isActive, totalDemand = $totalDemand")
      queue.enqueue(s)
      publishIfNeeded()

    case Request(cnt) =>
      println("Request: " + cnt)
      publishIfNeeded()

    case Cancel =>
      println("Cancel")
      context.stop(self)

    case _ =>
      println("Hm...")
  }

  def publishIfNeeded(): Unit = {
    while (queue.nonEmpty && isActive && totalDemand > 0) {
      println("onNext")
      onNext(queue.dequeue())
    }
  }
}

object IrcPublisher {
  case class Publish(data: String)
}
I am creating all these objects like so:
def createSource(wikipedias: Seq[String]) {
  val dataPublisherRef = system.actorOf(Props[IrcPublisher])
  val dataPublisher = ActorPublisher[String](dataPublisherRef)

  val listener = new MessageListener {
    override def process(message: Message) = {
      dataPublisherRef ! Publish(Jackson.generate(message.toMap))
    }
  }

  val ticker = new IrcTicker(
    "irc.wikimedia.org",
    "imply",
    wikipedias map (x => s"#$x.wikipedia"),
    Seq(listener)
  )

  ticker.start() // if I comment this...
  Thread.currentThread().join() //... and this I get Request(...)

  Source.fromPublisher(dataPublisher)
}
So the problem I am facing is with this Source object. Although this implementation works well with other sources (for example from a local file), the ActorPublisher doesn't receive Request() messages.
If I comment out the two marked lines, I can see that my actor receives the Request(count) message from my flow. Otherwise all messages are pushed into the queue, but never into my flow (so I only see the MSG messages printed).
I think it's something to do with multithreading/synchronization here.
I am not familiar enough with wikiticker to solve your problem as given. One question I would have is: why is it necessary to join to the current thread?
However, I think you have overcomplicated the usage of Source. It would be easier for you to work with the stream as a whole rather than create a custom ActorPublisher.
You can use Source.actorRef to materialize a stream into an ActorRef and work with that ActorRef. This allows you to utilize akka code to do the enqueueing/dequeueing onto the buffer while you focus on the "business logic".
Say, for example, your entire stream is only to filter lines above a certain length and print them to the console. This could be accomplished with:
def dispatchIRCMessages(actorRef: ActorRef) = {
  val ticker =
    new IrcTicker("irc.wikimedia.org",
                  "imply",
                  wikipedias map (x => s"#$x.wikipedia"),
                  Seq(new MessageListener {
                    override def process(message: Message) =
                      // send the String itself: the Source below is a Source.actorRef[String]
                      actorRef ! Jackson.generate(message.toMap)
                  }))
  ticker.start()
  Thread.currentThread().join()
}

//these variables control the buffer behavior
val bufferSize = 1024
val overFlowStrategy = akka.stream.OverflowStrategy.dropHead
val minMessageSize = 32

//no need for a custom Publisher/Queue
val streamRef =
  Source.actorRef[String](bufferSize, overFlowStrategy)
    .via(Flow[String].filter(_.size > minMessageSize))
    .to(Sink.foreach[String](println))
    .run()

dispatchIRCMessages(streamRef)
dispatchIRCMessages has the added benefit that it will work with any ActorRef, so you aren't required to work only with streams/publishers.
Hopefully this solves your underlying problem...
I think the main problem is Thread.currentThread().join(). This line will 'hang' the current thread because the thread is waiting for itself to die. Please read https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#join-long- .
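If that is indeed the issue, a minimal restructuring (assuming ticker.start() itself does not block and merely starts the IRC client in the background) would be to drop the join, return the Source, and let the caller run it:

def createSource(wikipedias: Seq[String]): Source[String, NotUsed] = {
  val dataPublisherRef = system.actorOf(Props[IrcPublisher])
  val dataPublisher = ActorPublisher[String](dataPublisherRef)

  val listener = new MessageListener {
    override def process(message: Message) =
      dataPublisherRef ! Publish(Jackson.generate(message.toMap))
  }

  new IrcTicker("irc.wikimedia.org", "imply", wikipedias map (x => s"#$x.wikipedia"), Seq(listener)).start()

  // no Thread.currentThread().join() here: return the Source so it can actually be run
  Source.fromPublisher(dataPublisher)
}

// e.g. createSource(Seq("en")).runWith(Sink.foreach(println))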

Scala Akka Consumer/Producer: Return Value

Problem Statement
Assume I have a file with sentences that is processed line by line. In my case, I need to extract Named Entities (Persons, Organizations, ...) from these lines. Unfortunately, the tagger is quite slow. Therefore, I decided to parallelize the computation, such that lines can be processed independently of each other and the results are collected in a central location.
Current Approach
My current approach uses a single-producer, multiple-consumer concept. I'm relatively new to Akka, but I think my problem description fits its capabilities well. Let me show you some code:
Producer
The Producer reads the file line by line and sends each line to the Consumers. Once all lines have been processed, it propagates the result back to WordCount.
class Producer(consumers: ActorRef) extends Actor with ActorLogging {

  var master: Option[ActorRef] = None
  var result = immutable.List[String]()
  var totalLines = 0
  var linesProcessed = 0

  override def receive = {
    case StartProcessing() => {
      master = Some(sender)
      Source.fromFile("sent.txt", "utf-8").getLines.foreach { line =>
        consumers ! Sentence(line)
        totalLines += 1
      }
      context.stop(self)
    }
    case SentenceProcessed(list) => {
      linesProcessed += 1
      result :::= list
      //If we are done, we can propagate the result to the creator
      if (linesProcessed == totalLines) {
        master.map(_ ! result)
      }
    }
    case _ => log.error("message not recognized")
  }
}
Consumer
class Consumer extends Actor with ActorLogging {

  def tokenize(line: String): Seq[String] = {
    line.split(" ").map(_.toLowerCase)
  }

  override def receive = {
    case Sentence(sent) => {
      //Assume: This is representative for the extensive computation method
      val tokens = tokenize(sent)
      sender() ! SentenceProcessed(tokens.toList)
    }
    case _ => log.error("message not recognized")
  }
}
WordCount (Master)
class WordCount extends Actor {

  val consumers = context.actorOf(Props[Consumer].
    withRouter(FromConfig()).
    withDispatcher("consumer-dispatcher"), "consumers")

  val producer = context.actorOf(Props(new Producer(consumers)), "producer")

  context.watch(consumers)
  context.watch(producer)

  def receive = {
    case Terminated(`producer`) => consumers ! Broadcast(PoisonPill)
    case Terminated(`consumers`) => context.system.shutdown
  }
}

object WordCount {
  def getActor() = new WordCount

  def getConfig(routerType: String, dispatcherType: String)(numConsumers: Int) = s"""
    akka.actor.deployment {
      /WordCount/consumers {
        router = $routerType
        nr-of-instances = $numConsumers
        dispatcher = consumer-dispatcher
      }
    }
    consumer-dispatcher {
      type = $dispatcherType
      executor = "fork-join-executor"
    }"""
}
The WordCount actor is responsible for creating the other actors. When the Consumers are finished, the Producer sends a message with all tokens. But how do I propagate that message further, and accept and wait for it? The architecture with the third WordCount actor might be wrong.
Main Routine
case class Run(name: String, actor: () => Actor, config: (Int) => String)

object Main extends App {

  val run = Run("push_implementation", WordCount.getActor _, WordCount.getConfig("balancing-pool", "Dispatcher") _)

  def execute(run: Run, numConsumers: Int) = {
    val config = ConfigFactory.parseString(run.config(numConsumers))
    val system = ActorSystem("Counting", ConfigFactory.load(config))
    val startTime = System.currentTimeMillis

    system.actorOf(Props(run.actor()), "WordCount")

    /*
      How to get the result here?!
    */

    system.awaitTermination
    System.currentTimeMillis - startTime
  }

  execute(run, 4)
}
Problem
As you can see, the actual problem is propagating the result back to the Main routine. Can you tell me how to do this in a proper way? The question is also how to wait for the result until the consumers are finished. I had a brief look into the Akka Futures documentation section, but the whole system is a little bit overwhelming for beginners. Something like val future = actor ? message seems suitable, but I'm not sure how to do this. Also, using the WordCount actor adds complexity. Maybe it is possible to come up with a solution that doesn't need this actor?
Consider using the Akka Aggregator Pattern. That takes care of the low-level primitives (watching actors, poison pill, etc). You can focus on managing state.
Your call to system.actorOf() returns an ActorRef, but you're not using it. You should ask that actor for results. Something like this:
implicit val timeout = Timeout(5 seconds)
val wCount = system.actorOf(Props(run.actor()), "WordCount")
val answer = Await.result(wCount ? "sent.txt", timeout.duration)
This means your WordCount class needs a receive method that accepts a String message. That section of code should aggregate the results and tell the sender(), like this:
class WordCount extends Actor {
  def receive: Receive = {
    case filename: String =>
      // do all of your code here, using filename
      sender() ! results
  }
}
Also, rather than blocking on the results with Await above, you can apply some techniques for handling Futures.
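For example, a minimal non-blocking variant of the snippet above (assuming WordCount replies with a List[String]; system and run are as in the Main routine) might look like this:

import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import scala.util.{Failure, Success}

implicit val timeout = Timeout(5 seconds)
import system.dispatcher // ExecutionContext for the Future callback

val wCount = system.actorOf(Props(run.actor()), "WordCount")

(wCount ? "sent.txt").mapTo[List[String]].onComplete {
  case Success(tokens) =>
    println(s"Extracted ${tokens.size} tokens")
    system.shutdown() // system.terminate() on newer Akka versions
  case Failure(ex) =>
    ex.printStackTrace()
    system.shutdown()
}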