How do I throttle messages in Akka (2.1.2)? - scala

Could you guys please show example of throttling messages in Akka ?
Here is my code
object Program {
def main(args: Array[String]) {
val system = ActorSystem()
val actor: ActorRef = system.actorOf(Props[HelloActor].withDispatcher("akka.actor.my-thread-pool-dispatcher"))
val zzz : Function0[Unit] = () => {
println(System.currentTimeMillis())
Thread.sleep(5000)
}
var i: Int = 0
while (i < 100) {
actor ! zzz
i += 1
}
println("DONE")
// system.shutdown()
}
}
class HelloActor extends Actor {
def receive = {
case func : Function0[Unit] => func()
}
}
and here is my config
akka {
actor {
my-thread-pool-dispatcher {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
task-queue-type = "array"
task-queue-size = 4
}
}
}
}
But when I run it it appears to be single-threaded where as I expect 4 messages to be processed at the same time.
What am I missing here ?

I don't see the connection between the question's title and the content.
Here is an article about throttling messages in Akka:
http://letitcrash.com/post/28901663062/throttling-messages-in-akka-2
However, you seem puzzled about the fact that your actor is processing only one message at a time. But that's how Akka actors work. They have a single mailbox of messages and they process only one message at a time in a continuous loop.
If you want to handle multiple tasks concurrently with the same work processing unit I suggest you take a look at routers:
http://doc.akka.io/docs/akka/2.1.2/scala/routing.html

Typesafe has recently announced akka reactive streams. Throttling can be achieved using its backpressure capability.
http://java.dzone.com/articles/reactive-queue-akka-reactive

Related

Is there a limit to how many Akka Streams can run at the same time?

I am trying to implement a simple one-to-many pub/sub pattern using a BroadcastHub. This fails silently for large numbers of subscribers, which makes me think I am hitting some limit on the number of streams I can run.
First, let's define some events:
sealed trait Event
case object EX extends Event
case object E1 extends Event
case object E2 extends Event
case object E3 extends Event
case object E4 extends Event
case object E5 extends Event
I have implemented the publisher using a BroadcastHub, adding a Sink.actorRefWithAck each time I want to add a new subscriber. Publishing the EX event ends the broadcast:
trait Publisher extends Actor with ActorLogging {
implicit val materializer = ActorMaterializer()
private val sourceQueue = Source.queue[Event](Publisher.bufferSize, Publisher.overflowStrategy)
private val (
queue: SourceQueueWithComplete[Event],
source: Source[Event, NotUsed]
) = {
val (q,s) = sourceQueue.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both).run()
s.runWith(Sink.ignore)
(q,s)
}
def publish(evt: Event) = {
log.debug("Publishing Event: {}", evt.getClass().toString())
queue.offer(evt)
evt match {
case EX => queue.complete()
case _ => Unit
}
}
def subscribe(actor: ActorRef, ack: ActorRef): Unit =
source.runWith(
Sink.actorRefWithAck(
actor,
onInitMessage = Publisher.StreamInit(ack),
ackMessage = Publisher.StreamAck,
onCompleteMessage = Publisher.StreamDone,
onFailureMessage = onErrorMessage))
def onErrorMessage(ex: Throwable) = Publisher.StreamFail(ex)
def publisherBehaviour: Receive = {
case Publisher.Subscribe(sub, ack) => subscribe(sub, ack.getOrElse(sender()))
case Publisher.StreamAck => Unit
}
override def receive = LoggingReceive { publisherBehaviour }
}
object Publisher {
final val bufferSize = 5
final val overflowStrategy = OverflowStrategy.backpressure
case class Subscribe(sub: ActorRef, ack: Option[ActorRef])
case object StreamAck
case class StreamInit(ack: ActorRef)
case object StreamDone
case class StreamFail(ex: Throwable)
}
Subscribers can implement the Subscriber trait to separate the logic:
trait Subscriber {
def onInit(publisher: ActorRef): Unit = ()
def onInit(publisher: ActorRef, k: KillSwitch): Unit = onInit(publisher)
def onEvent(event: Event): Unit = ()
def onDone(publisher: ActorRef, subscriber: ActorRef): Unit = ()
def onFail(e: Throwable, publisher: ActorRef, subscriber: ActorRef): Unit = ()
}
The actor logic is quite simple:
class SubscriberActor(subscriber: Subscriber) extends Actor with ActorLogging {
def subscriberBehaviour: Receive = {
case Publisher.StreamInit(ack) => {
log.debug("Stream initialized.")
subscriber.onInit(sender())
sender() ! Publisher.StreamAck
ack.forward(Publisher.StreamInit(ack))
}
case Publisher.StreamDone => {
log.debug("Stream completed.")
subscriber.onDone(sender(),self)
}
case Publisher.StreamFail(ex) => {
log.error(ex, "Stream failed!")
subscriber.onFail(ex,sender(),self)
}
case e: Event => {
log.debug("Observing Event: {}",e)
subscriber.onEvent(e)
sender() ! Publisher.StreamAck
}
}
override def receive = LoggingReceive { subscriberBehaviour }
}
One of the key points is that all subscribers must receive all messages sent by the publisher, so we have to know that all streams have materialized and all actors are ready to receive before starting the broadcast. This is why the StreamInit message is forwarded to another, user-provided actor.
To test this, I define a simple MockPublisher that just broadcasts a list of events when told to do so:
class MockPublisher(events: Event*) extends Publisher {
def receiveBehaviour: Receive = {
case MockPublish => events map publish
}
override def receive = LoggingReceive { receiveBehaviour orElse publisherBehaviour }
}
case object MockPublish
I also define a MockSubscriber who merely counts how many events it has seen:
class MockSubscriber extends Subscriber {
var count = 0
val promise = Promise[Int]()
def future = promise.future
override def onInit(publisher: ActorRef): Unit = count = 0
override def onEvent(event: Event): Unit = count += 1
override def onDone(publisher: ActorRef, subscriber: ActorRef): Unit = promise.success(count)
override def onFail(e: Throwable, publisher: ActorRef, subscriber: ActorRef): Unit = promise.failure(e)
}
And a small method for subscription:
object MockSubscriber {
def sub(publisher: ActorRef, ack: ActorRef)(implicit system: ActorSystem): Future[Int] = {
val s = new MockSubscriber()
implicit val tOut = Timeout(1.minute)
val a = system.actorOf(Props(new SubscriberActor(s)))
val f = publisher ! Publisher.Subscribe(a, Some(ack))
s.future
}
}
I put everything together in a unit test:
class SubscriberTests extends TestKit(ActorSystem("SubscriberTests")) with
WordSpecLike with Matchers with BeforeAndAfterAll with ImplicitSender {
override def beforeAll:Unit = {
system.eventStream.setLogLevel(Logging.DebugLevel)
}
override def afterAll:Unit = {
println("Shutting down...")
TestKit.shutdownActorSystem(system)
}
"The Subscriber" must {
"publish events to many observers" in {
val n = 9
val p = system.actorOf(Props(new MockPublisher(E1,E2,E3,E4,E5,EX)))
val q = scala.collection.mutable.Queue[Future[Int]]()
for (i <- 1 to n) {
q += MockSubscriber.sub(p,self)
}
for (i <- 1 to n) {
expectMsgType[Publisher.StreamInit](70.seconds)
}
p ! MockPublish
q.map { f => Await.result(f, 10.seconds) should be (6) }
}
}
}
This test succeeds for relatively small values of n, but fails for, say, val n = 90000. No caught or uncaught exception appears anywhere and neither does any out-of-memory complaint from Java (which does occur if I go even higher).
What am I missing?
Edit: Tried this on multiple computers with different specs. Debug info shows no messages reach any of the subscribers once n is high enough.
Akka Stream (and any other reactive stream, actually) provides you backpressure. If you hadn't messed up with how you create your consumers (e.g. allowing creation of 1GB JSON, which will you chop into smaller pieces only after you fetched it into memory) you should have a comfortable situation where you can consider your memory usage pretty much upper-bounded (because of how backpressure manage push-pull mechanics). Once you measure where your upper-bound lies, your can set up your JVM and container memory, so that you could let it run without fear of out of memory errors (provided that there is not other thing happening in your JVM which could cause memory usage spike).
So, from this we can see that there is some constraint on how much stream you can run in parallel - specifically you can run only as much of them as your memory allows you. CPU should not be a limitation (as you will have multiple threads), but if you will start too much of them on one machine, then CPU inevitably with have to switch between different streams making each of them slower. It might not be a technical blocker, but you might end up in a situation where processing is so slow that it doesn't fulfill its business purpose (though, I guess, you would have to run much more than few of streams at once).
In your tests you might run into some other issues as well. E.g. if you reuse the same thread pool for some blocking operations as you use for Actor System without informing the thread pool that they are blocking, you might end up with a dead lock (as a matter of the fact, you should run all IO blocking operations on a different thread pool than "computing" operations). Having 90000(!) concurrent things happening at the same time (and probably having the same small thread pool) almost guarantees running into issues (I guess you could run into issues even if instead of actors you would run the code directly on futures). Here you are using actor system in tests, which AFAIR use blocking logic only highlighting all the possible issues with small thread pools which keep blocking and non-blocking tasks in the same place.

how to watch multiple akka actors for termination

I have akka system which is basically two producer actors that send messages to one consumer actor. In a simplified form I have something like this:
class ProducerA extends Actor {
def receive = {
case Produce => Consumer ! generateMessageA()
}
... more code ...
}
class ProducerB extends Actor {
def receive = {
case Produce => Consumer ! generateMessageB()
}
... more code ...
}
class Consumer extends Actor {
def receive = {
case A => handleMessageA(A)
case B => handleMessageB(B)
}
... more code ...
}
And they are all siblings of the same akka system.
I am trying to figure out how to terminate this system gracefully. This means that on shutdown I want ProducerA and ProducerB to stop immediately and then I want Consumer to finish processing whatever messages are left in the message queue and then shutdown.
It seems like what I want is for the Consumer actor to be able to watch for the termination of both ProducerA and ProducerB. Or generally, it seems like what I want is to be able to send a PoisonPill message to the Consumer after both the producers are stopped.
https://alvinalexander.com/scala/how-to-monitor-akka-actor-death-with-watch-method
The above tutorial has a pretty good explanation of how one actor can watch for the termination of one other actor but not sure how an actor could watch for the termination of multiple actors.
An actor can watch multiple actors simply via multiple invocations of context.watch, passing in a different ActorRef with each call. For example, your Consumer actor could watch the termination of the Producer actors in the following way:
case class WatchMe(ref: ActorRef)
class Consumer extends Actor {
var watched = Set[ActorRef]()
def receive = {
case WatchMe(ref) =>
context.watch(ref)
watched = watched + ref
case Terminated(ref) =>
watched = watched - ref
if (watched.isEmpty) self ! PoisonPill
// case ...
}
}
Both Producer actors would send their respective references to Consumer, which would then monitor the Producer actors for termination. When the Producer actors are both terminated, the Consumer sends a PoisonPill to itself. Because a PoisonPill is treated like a normal message in an actor's mailbox, the Consumer will process any messages that are already enqueued before handling the PoisonPill and shutting itself down.
A similar pattern is described in Derek Wyatt's "Shutdown Patterns in Akka 2" blog post, which is mentioned in the Akka documentation.
import akka.actor._
import akka.util.Timeout
import scala.concurrent.duration.DurationInt
class Producer extends Actor {
def receive = {
case _ => println("Producer received a message")
}
}
case object KillConsumer
class Consumer extends Actor {
def receive = {
case KillConsumer =>
println("Stopping Consumer After All Producers")
context.stop(self)
case _ => println("Parent received a message")
}
override def postStop(): Unit = {
println("Post Stop Consumer")
super.postStop()
}
}
class ProducerWatchers(producerListRef: List[ActorRef], consumerRef: ActorRef) extends Actor {
producerListRef.foreach(x => context.watch(x))
context.watch(consumerRef)
var producerActorCount = producerListRef.length
implicit val timeout: Timeout = Timeout(5 seconds)
override def receive: Receive = {
case Terminated(x) if producerActorCount == 1 && producerListRef.contains(x) =>
consumerRef ! KillConsumer
case Terminated(x) if producerListRef.contains(x) =>
producerActorCount -= 1
case Terminated(`consumerRef`) =>
println("Killing ProducerWatchers On Consumer End")
context.stop(self)
case _ => println("Dropping Message")
}
override def postStop(): Unit = {
println("Post Stop ProducerWatchers")
super.postStop()
}
}
object ProducerWatchers {
def apply(producerListRef: List[ActorRef], consumerRef: ActorRef) : Props = Props(new ProducerWatchers(producerListRef, consumerRef))
}
object AkkaActorKill {
def main(args: Array[String]): Unit = {
val actorSystem = ActorSystem("AkkaActorKill")
implicit val timeout: Timeout = Timeout(10 seconds)
val consumerRef = actorSystem.actorOf(Props[Consumer], "Consumer")
val producer1 = actorSystem.actorOf(Props[Producer], name = "Producer1")
val producer2 = actorSystem.actorOf(Props[Producer], name = "Producer2")
val producer3 = actorSystem.actorOf(Props[Producer], name = "Producer3")
val producerWatchers = actorSystem.actorOf(ProducerWatchers(List[ActorRef](producer1, producer2, producer3), consumerRef),"ProducerWatchers")
producer1 ! PoisonPill
producer2 ! PoisonPill
producer3 ! PoisonPill
Thread.sleep(5000)
actorSystem.terminate
}
}
It can be implemented using ProducerWatchers actor, which manages producers killed, once all the producers are killed you can kill the Consumer actor, and then the ProducerWatchers actor.
so the solution I ended up going with was inspired by Derek Wyatt's terminator pattern
val shutdownFut = Future.sequence(
Seq(
gracefulStop(producerA, ProducerShutdownWaitSeconds seconds),
gracefulStop(producerB, ProducerShutdownWaitSeconds seconds),
)
).flatMap(_ => gracefulStop(consumer, ConsumerShutdownWaitSeconds seconds))
Await.result(shutdownFut, (ProducerShutdownWaitSeconds seconds) + (ConsumerShutdownWaitSeconds seconds))
This was more or less exactly what I wanted. The consumer shutdown waits for the producers to shutdown based on the fulfillment of futures. Furthermore, the whole shutdown itself results in a future which you can wait on therefore being able to keep the thread up long enough for everything to clean up properly.

Akka Thread Tuning

I have 100 threads, need to process only 12 threads at a time not more than that. After completion of these threads other 12 have to be processed and so on but it's processing only first 12 set threads then it terminates after that.
Here is my Logic :
class AkkaProcessing extends Actor {
def receive = {
case message: List[Any] =>
var meterName = message(0) // It Contains only 12 threads , it process them and terminates. Am unable to get remaining threads
val sqlContext = message(1).asInstanceOf[SQLContext]
val FlagDF = message(2).asInstanceOf[DataFrame]
{
All the business logic here
}
context.system.shutdown()
}
}
}
object Processing {
def main(args: Array[String]) = {
val rawBuff = new ArrayBuffer[Any]()
val actorSystem = ActorSystem("ActorSystem") // Creating ActorSystem
val actor = actorSystem.actorOf(Props[AkkaProcessing].withRouter(RoundRobinPool(200)), "my-Actor")
implicit val executionContext = actorSystem.dispatchers.lookup("akka.actor.my-dispatcher")
for (i <- 0 until meter_list.length) {
var meterName = meter_list(i) // All 100 Meters here
rawBuff.append(meterName, sqlContext, FlagDF)
actor ! rawBuff.toList
}
}
}
Any Inputs highly appreciated
I think you might be best to create 2 actor types : consumer (which run in parallel) and coordinator (which takes the 12 thread tasks and passes them to the consumers). The coordinator would wait for the consumers to finish and then run the next batch.
See this answer for a code example: Can Scala actors process multiple messages simultaneously?
Failing that, you could just use Futures in a similar manner.

Akka Actors Still Available After Stopped by PoisonPill

I'm using akka to dynamically create actors and destroy them when they're finished with a particular job. I've got a handle on actor creation, however stopping the actors keeps them in memory regardless of how I've terminated them. Eventually this causes an out of memory exception, despite the fact that I should only have a handful of active actors at any given time.
I've used:
self.tell(PoisonPill, self)
and:
context.stop(self)
to try and destroy the actors. Any ideas?
Edit: Here's a bit more to flesh out what I'm trying to do. The program opens up and spawns ten actors.
val system = ActorSystem("system")
(1 to 10) foreach { x =>
Entity.count += 1
system.actorOf(Props[Entity], name = Entity.count.toString())
}
Here's the code for the Entity:
class Entity () extends Actor {
Entity.entities += this
val id = Entity.count
import context.dispatcher
val tick = context.system.scheduler.schedule(0 millis, 100 millis, self, "update")
def receive = {
case "update" => {
Entity.entities.foreach(that => collide(that))
}
}
override def postStop() = tick.cancel()
def collide(that:Entity) {
if (!this.isBetterThan(that)) {
destroyMe()
spawnNew()
}
}
def isBetterThan() :Boolean = {
//computationally intensive logic
}
private def destroyMe(){
Entity.entities.remove(Entity.entities.indexOf(this))
self.tell(PoisonPill, self)
//context.stop(self)
}
private def spawnNew(){
val system = ActorSystem("system")
Entity.count += 1
system.actorOf(Props[Entity], name = Entity.count.toString())
}
}
object Entity {
val entities = new ListBuffer[Entity]()
var count = 0
}
Thanks #AmigoNico, you pointed me in the right direction. It turns out that neither
self.tell(PoisonPill, self)
nor
context.stop(self)
worked for timely Actor disposal; I switched the line to:
system.stop(self)
and everything works as expected.

Scala Akka Consumer/Producer: Return Value

Problem Statement
Assume I have a file with sentences that is processed line by line. In my case, I need to extract Named Entities (Persons, Organizations, ...) from these lines. Unfortunately, the tagger is quite slow. Therefore, I decided to parallelize the computation, such that lines could be processed independent from each other and the result is collected in a central location.
Current Approach
My current approach comprises the usage of a single producer multiple consumer concept. However, I'm relative new to Akka, but I think my problem description fits well into its capabilities. Let me show you some code:
Producer
The Producer reads the file line by line and sends it to the Consumer. If it reaches the total line limit, it propagates the result back to WordCount.
class Producer(consumers: ActorRef) extends Actor with ActorLogging {
var master: Option[ActorRef] = None
var result = immutable.List[String]()
var totalLines = 0
var linesProcessed = 0
override def receive = {
case StartProcessing() => {
master = Some(sender)
Source.fromFile("sent.txt", "utf-8").getLines.foreach { line =>
consumers ! Sentence(line)
totalLines += 1
}
context.stop(self)
}
case SentenceProcessed(list) => {
linesProcessed += 1
result :::= list
//If we are done, we can propagate the result to the creator
if (linesProcessed == totalLines) {
master.map(_ ! result)
}
}
case _ => log.error("message not recognized")
}
}
Consumer
class Consumer extends Actor with ActorLogging {
def tokenize(line: String): Seq[String] = {
line.split(" ").map(_.toLowerCase)
}
override def receive = {
case Sentence(sent) => {
//Assume: This is representative for the extensive computation method
val tokens = tokenize(sent)
sender() ! SentenceProcessed(tokens.toList)
}
case _ => log.error("message not recognized")
}
}
WordCount (Master)
class WordCount extends Actor {
val consumers = context.actorOf(Props[Consumer].
withRouter(FromConfig()).
withDispatcher("consumer-dispatcher"), "consumers")
val producer = context.actorOf(Props(new Producer(consumers)), "producer")
context.watch(consumers)
context.watch(producer)
def receive = {
case Terminated(`producer`) => consumers ! Broadcast(PoisonPill)
case Terminated(`consumers`) => context.system.shutdown
}
}
object WordCount {
def getActor() = new WordCount
def getConfig(routerType: String, dispatcherType: String)(numConsumers: Int) = s"""
akka.actor.deployment {
/WordCount/consumers {
router = $routerType
nr-of-instances = $numConsumers
dispatcher = consumer-dispatcher
}
}
consumer-dispatcher {
type = $dispatcherType
executor = "fork-join-executor"
}"""
}
The WordCount actor is responsible for creating the other actors. When the Consumer is finished the Producer sends a message with all tokens. But, how to propagate the message again and also accept and wait for it? The architecture with the third WordCount actor might be wrong.
Main Routine
case class Run(name: String, actor: () => Actor, config: (Int) => String)
object Main extends App {
val run = Run("push_implementation", WordCount.getActor _, WordCount.getConfig("balancing-pool", "Dispatcher") _)
def execute(run: Run, numConsumers: Int) = {
val config = ConfigFactory.parseString(run.config(numConsumers))
val system = ActorSystem("Counting", ConfigFactory.load(config))
val startTime = System.currentTimeMillis
system.actorOf(Props(run.actor()), "WordCount")
/*
How to get the result here?!
*/
system.awaitTermination
System.currentTimeMillis - startTime
}
execute(run, 4)
}
Problem
As you see, the actual problem is to propagate the result back to the Main routine. Can you tell me how to do this in a proper way? The question is also how to wait for the result until the consumers are finished? I had a brief look into the Akka Future documentation section, but the whole system is a little bit overwhelming for beginners. Something like var future = message ? actor seems suitable. Not sure, how to do this. Also using the WordCount actor causes additional complexity. Maybe it is possible to come up with a solution that doesn't need this actor?
Consider using the Akka Aggregator Pattern. That takes care of the low-level primitives (watching actors, poison pill, etc). You can focus on managing state.
Your call to system.actorOf() returns an ActorRef, but you're not using it. You should ask that actor for results. Something like this:
implicit val timeout = Timeout(5 seconds)
val wCount = system.actorOf(Props(run.actor()), "WordCount")
val answer = Await.result(wCount ? "sent.txt", timeout.duration)
This means your WordCount class needs a receive method that accepts a String message. That section of code should aggregate the results and tell the sender(), like this:
class WordCount extends Actor {
def receive: Receive = {
case filename: String =>
// do all of your code here, using filename
sender() ! results
}
}
Also, rather than blocking on the results with Await above, you can apply some techniques for handling Futures.