Is there a limit to how many Akka Streams can run at the same time?

I am trying to implement a simple one-to-many pub/sub pattern using a BroadcastHub. This fails silently for large numbers of subscribers, which makes me think I am hitting some limit on the number of streams I can run.
First, let's define some events:
sealed trait Event
case object EX extends Event
case object E1 extends Event
case object E2 extends Event
case object E3 extends Event
case object E4 extends Event
case object E5 extends Event
I have implemented the publisher using a BroadcastHub, adding a Sink.actorRefWithAck each time I want to add a new subscriber. Publishing the EX event ends the broadcast:
trait Publisher extends Actor with ActorLogging {
implicit val materializer = ActorMaterializer()
private val sourceQueue = Source.queue[Event](Publisher.bufferSize, Publisher.overflowStrategy)
private val (
queue: SourceQueueWithComplete[Event],
source: Source[Event, NotUsed]
) = {
val (q,s) = sourceQueue.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both).run()
s.runWith(Sink.ignore)
(q,s)
}
def publish(evt: Event) = {
log.debug("Publishing Event: {}", evt.getClass().toString())
queue.offer(evt)
evt match {
case EX => queue.complete()
case _ => ()
}
}
def subscribe(actor: ActorRef, ack: ActorRef): Unit =
source.runWith(
Sink.actorRefWithAck(
actor,
onInitMessage = Publisher.StreamInit(ack),
ackMessage = Publisher.StreamAck,
onCompleteMessage = Publisher.StreamDone,
onFailureMessage = onErrorMessage))
def onErrorMessage(ex: Throwable) = Publisher.StreamFail(ex)
def publisherBehaviour: Receive = {
case Publisher.Subscribe(sub, ack) => subscribe(sub, ack.getOrElse(sender()))
case Publisher.StreamAck => ()
}
override def receive = LoggingReceive { publisherBehaviour }
}
object Publisher {
final val bufferSize = 5
final val overflowStrategy = OverflowStrategy.backpressure
case class Subscribe(sub: ActorRef, ack: Option[ActorRef])
case object StreamAck
case class StreamInit(ack: ActorRef)
case object StreamDone
case class StreamFail(ex: Throwable)
}
Subscribers can implement the Subscriber trait to separate the logic:
trait Subscriber {
def onInit(publisher: ActorRef): Unit = ()
def onInit(publisher: ActorRef, k: KillSwitch): Unit = onInit(publisher)
def onEvent(event: Event): Unit = ()
def onDone(publisher: ActorRef, subscriber: ActorRef): Unit = ()
def onFail(e: Throwable, publisher: ActorRef, subscriber: ActorRef): Unit = ()
}
The actor logic is quite simple:
class SubscriberActor(subscriber: Subscriber) extends Actor with ActorLogging {
def subscriberBehaviour: Receive = {
case Publisher.StreamInit(ack) => {
log.debug("Stream initialized.")
subscriber.onInit(sender())
sender() ! Publisher.StreamAck
ack.forward(Publisher.StreamInit(ack))
}
case Publisher.StreamDone => {
log.debug("Stream completed.")
subscriber.onDone(sender(),self)
}
case Publisher.StreamFail(ex) => {
log.error(ex, "Stream failed!")
subscriber.onFail(ex,sender(),self)
}
case e: Event => {
log.debug("Observing Event: {}",e)
subscriber.onEvent(e)
sender() ! Publisher.StreamAck
}
}
override def receive = LoggingReceive { subscriberBehaviour }
}
One of the key points is that all subscribers must receive all messages sent by the publisher, so we have to know that all streams have materialized and all actors are ready to receive before starting the broadcast. This is why the StreamInit message is forwarded to another, user-provided actor.
To test this, I define a simple MockPublisher that just broadcasts a list of events when told to do so:
class MockPublisher(events: Event*) extends Publisher {
def receiveBehaviour: Receive = {
case MockPublish => events map publish
}
override def receive = LoggingReceive { receiveBehaviour orElse publisherBehaviour }
}
case object MockPublish
I also define a MockSubscriber who merely counts how many events it has seen:
class MockSubscriber extends Subscriber {
var count = 0
val promise = Promise[Int]()
def future = promise.future
override def onInit(publisher: ActorRef): Unit = count = 0
override def onEvent(event: Event): Unit = count += 1
override def onDone(publisher: ActorRef, subscriber: ActorRef): Unit = promise.success(count)
override def onFail(e: Throwable, publisher: ActorRef, subscriber: ActorRef): Unit = promise.failure(e)
}
And a small method for subscription:
object MockSubscriber {
def sub(publisher: ActorRef, ack: ActorRef)(implicit system: ActorSystem): Future[Int] = {
val s = new MockSubscriber()
implicit val tOut = Timeout(1.minute)
val a = system.actorOf(Props(new SubscriberActor(s)))
publisher ! Publisher.Subscribe(a, Some(ack))
s.future
}
}
I put everything together in a unit test:
class SubscriberTests extends TestKit(ActorSystem("SubscriberTests")) with
WordSpecLike with Matchers with BeforeAndAfterAll with ImplicitSender {
override def beforeAll:Unit = {
system.eventStream.setLogLevel(Logging.DebugLevel)
}
override def afterAll:Unit = {
println("Shutting down...")
TestKit.shutdownActorSystem(system)
}
"The Subscriber" must {
"publish events to many observers" in {
val n = 9
val p = system.actorOf(Props(new MockPublisher(E1,E2,E3,E4,E5,EX)))
val q = scala.collection.mutable.Queue[Future[Int]]()
for (i <- 1 to n) {
q += MockSubscriber.sub(p,self)
}
for (i <- 1 to n) {
expectMsgType[Publisher.StreamInit](70.seconds)
}
p ! MockPublish
q.map { f => Await.result(f, 10.seconds) should be (6) }
}
}
}
This test succeeds for relatively small values of n, but fails for, say, val n = 90000. No caught or uncaught exception appears anywhere and neither does any out-of-memory complaint from Java (which does occur if I go even higher).
What am I missing?
Edit: Tried this on multiple computers with different specs. Debug info shows no messages reach any of the subscribers once n is high enough.

Akka Streams (like any other Reactive Streams implementation) gives you backpressure. As long as you have not undermined it in how you build your consumers (e.g. by allowing a 1 GB JSON document to be fetched into memory in full and only then chopped into smaller pieces), you should be in a comfortable situation where memory usage is effectively upper-bounded, because backpressure governs the push-pull mechanics. Once you have measured where that upper bound lies, you can size your JVM and container memory so that the application runs without fear of out-of-memory errors (provided nothing else in the JVM causes a memory usage spike).
So there is a constraint on how many streams you can run in parallel: you can run only as many as your memory allows. CPU should not be a hard limit (you will have multiple threads), but if you start too many streams on one machine, the CPU will inevitably have to switch between them, making each one slower. That may not be a technical blocker, but you could end up with processing so slow that it no longer fulfils its business purpose (though I guess you would have to run far more than a few streams at once for that).
In your tests you might run into other issues as well. For example, if blocking operations share the thread pool used by the actor system, and the pool is not told that they block, you might end up with a deadlock (as a matter of fact, you should run all blocking I/O on a different thread pool from "computing" operations). Having 90000(!) concurrent things happening at once, probably on the same small thread pool, almost guarantees trouble (I guess you could run into issues even if you ran the code directly on futures instead of actors). Here you are using the actor system in tests, which AFAIR use blocking logic, and that only highlights the problems of small thread pools that mix blocking and non-blocking tasks.
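The advice about keeping blocking work off the "computing" pool can be sketched with plain Scala futures. This is only an illustration under assumptions: `BlockingPoolSketch`, `slowLookup` and the pool size of 8 are made-up names and values, not part of the question's code. In Akka itself the usual equivalent is a separately configured dispatcher.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object BlockingPoolSketch {
  // Hypothetical dedicated pool for blocking calls; the size 8 is arbitrary.
  private val blockingPool = Executors.newFixedThreadPool(8)
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(blockingPool)

  // Stand-in for a blocking I/O call (hypothetical).
  def slowLookup(id: Int): Int = {
    Thread.sleep(50)
    id * 2
  }

  // Blocking calls run on the dedicated pool; the CPU-only summing step
  // runs on whatever compute pool the caller passes in.
  def lookupAll(ids: Seq[Int])(implicit computeEc: ExecutionContext): Future[Int] =
    Future.traverse(ids)(id => Future(slowLookup(id))(blockingEc)).map(_.sum)

  // The pool's threads are non-daemon, so it must be shut down explicitly.
  def shutdown(): Unit = blockingPool.shutdown()
}
```

With this split, a burst of slow calls can saturate only the eight blocking threads, while the compute pool stays free for non-blocking work.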

Related

Ask Akka actor for a result only when all the messages are processed

I am trying to split a big chunk of text into multiple paragraphs and process it concurrently by calling an external API.
An immutable list is updated each time the response comes from the API for the paragraph.
Once the paragraphs are processed and the list is updated, I would like to ask the Actor for the final status to be used in the next steps.
The problem with the below approach is that I would never know when all the paragraphs are processed.
I need to get back the targetStore once all the paragraphs are processed and the list is final.
def main(args: Array[String]) {
val source = Source.fromFile("input.txt")
val extDelegator = new ExtractionDelegator()
source.getLines().foreach(line => extDelegator.processParagraph(line))
extDelegator.getFinalResult()
}
case class Extract(uuid: UUID, text: String)
case class UpdateList(text: String)
case class DelegateLambda(text: String)
case class FinalResult()
class ExtractionDelegator {
val system = ActorSystem("ExtractionDelegator")
val extActor = system.actorOf(Props(classOf[ExtractorDelegateActor]).withDispatcher("fixed-thread-pool"))
implicit val executionContext = system.dispatchers.lookup("fixed-thread-pool")
def processParagraph(text: String) = {
extActor ! Extract(uuid, text)
}
def getFinalResult(): java.util.List[String] = {
implicit val timeout = Timeout(5 seconds)
val askActor = system.actorOf(Props(classOf[ExtractorDelegateActor]))
val future = askActor ? FinalResult()
val result = Await.result(future, timeout.duration).asInstanceOf[java.util.List[String]]
result
}
def shutdown(): Unit = {
system.terminate()
}
}
/* Extractor Delegator actor*/
class ExtractorDelegateActor extends Actor with ActorLogging {
var targetStore:scala.collection.immutable.List[String] = scala.collection.immutable.List.empty
def receive = {
case Extract(uuid, text) => {
context.actorOf(Props[ExtractProcessor].withDispatcher("fixed-thread-pool")) ! DelegateLambda(text)
}
case UpdateList(res) => {
targetStore = targetStore :+ res
}
case FinalResult() => {
val senderActor=sender()
senderActor ! targetStore
}
}
}
/* Aggregator actor*/
class ExtractProcessor extends Actor with ActorLogging {
def receive = {
case DelegateLambda(text) => {
val res =callLamdaService(text)
sender ! UpdateList(res)
}
}
def callLamdaService(text: String): String = {
//THis is where external API is called.
Thread.sleep(1000)
result
}
}
Not sure why you want to use actors here. The simplest approach would be:
// you are calling an external service, so you most probably get an async response back
def callLamdaService(text: String): Future[String]
and to process your text you do:
implicit val ec = scala.concurrent.ExecutionContext.Implicits.global // use your own execution context here
Future.sequence(source.getLines().map(callLamdaService)).map { results =>
// do what you want with results
}
If you still want to use actors, you can replace callLamdaService with processParagraph, which internally does an ask to a worker actor that returns the result (so the signature of processParagraph will be def processParagraph(text: String): Future[String]).
If you still want to start multiple tasks and only then ask for the result, use context.become with receive(workers: Int), incrementing the worker count on each Extract message and decrementing it on each UpdateList message. You will then also need to delay the handling of FinalResult for as long as the number of in-flight workers is non-zero.
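The first suggestion (one Future per paragraph, combined into a single Future that completes when all of them are done) can be sketched as follows. `ParagraphSketch` and the body of `callLambdaService` are hypothetical stand-ins; the real function would call the external API.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ParagraphSketch {
  // Hypothetical stand-in for the external API call; the real one would do network I/O.
  def callLambdaService(text: String)(implicit ec: ExecutionContext): Future[String] =
    Future(text.toUpperCase)

  // Completes only once every paragraph has been processed.
  // The results are returned in the same order as the input paragraphs.
  def processAll(paragraphs: List[String])(implicit ec: ExecutionContext): Future[List[String]] =
    Future.traverse(paragraphs)(p => callLambdaService(p))
}
```

The combined Future doubles as the "all paragraphs are processed" signal, so no separate FinalResult message is needed in this variant.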

Pushing elements externally to a reactive stream in fs2

I have an external (that is, I cannot change it) Java API which looks like this:
public interface Sender {
void send(Event e);
}
I need to implement a Sender which accepts each event, transforms it to a JSON object, collects some number of them into a single bundle and sends over HTTP to some endpoint. This all should be done asynchronously, without send() blocking the calling thread, with some fixed-size buffer and dropping new events if the buffer is full.
With akka-streams this is quite simple: I create a graph of stages (which uses akka-http to send HTTP requests), materialize it and use the materialized ActorRef to push new events to the stream:
lazy val eventPipeline = Source.actorRef[Event](Int.MaxValue, OverflowStrategy.fail)
.via(CustomBuffer(bufferSize)) // buffer all events
.groupedWithin(batchSize, flushDuration) // group events into chunks
.map(toBundle) // convert each chunk into a JSON message
.mapAsyncUnordered(1)(sendHttpRequest) // send an HTTP request
.toMat(Sink.foreach { response =>
// print HTTP response for debugging
})(Keep.both)
lazy val (eventsActor, completeFuture) = eventPipeline.run()
override def send(e: Event): Unit = {
eventsActor ! e
}
Here CustomBuffer is a custom GraphStage which is very similar to the library-provided Buffer but tailored to our specific needs; it probably does not matter for this particular question.
As you can see, interacting with the stream from non-stream code is very simple - the ! method on the ActorRef trait is asynchronous and does not need any additional machinery to be called. Each event which is sent to the actor is then processed through the entire reactive pipeline. Moreover, because of how akka-http is implemented, I even get connection pooling for free, so no more than one connection is opened to the server.
However, I cannot find a way to do the same thing with FS2 properly. Even discarding the question of buffering (I will probably need to write a custom Pipe implementation which does additional things that we need) and HTTP connection pooling, I'm still stuck with a more basic thing - that is, how to push the data to the reactive stream "from outside".
All tutorials and documentation that I can find assume that the entire program happens inside some effect context, usually IO. This is not my case - the send() method is invoked by the Java library at unspecified times. Therefore, I just cannot keep everything inside one IO action, I necessarily have to finalize the "push" action inside the send() method, and have the reactive stream as a separate entity, because I want to aggregate events and hopefully pool HTTP connections (which I believe is naturally tied to the reactive stream).
I assume that I need some additional data structure, like Queue. fs2 does indeed have some kind of fs2.concurrent.Queue, but again, all documentation shows how to use it inside a single IO context, so I assume that doing something like
val queue: Queue[IO, Event] = Queue.unbounded[IO, Event].unsafeRunSync()
and then using queue inside the stream definition and then separately inside the send() method with further unsafeRun calls:
val eventPipeline = queue.dequeue
.through(customBuffer(bufferSize))
.groupWithin(batchSize, flushDuration)
.map(toBundle)
.mapAsyncUnordered(1)(sendRequest)
.evalTap(response => ...)
.compile
.drain
eventPipeline.unsafeRunAsync(...) // or something
override def send(e: Event) {
queue.enqueue(e).unsafeRunSync()
}
is not the correct way and most likely would not even work.
So, my question is, how do I properly use fs2 to solve my problem?
Consider the following example:
import cats.implicits._
import cats.effect._
import cats.effect.implicits._
import fs2._
import fs2.concurrent.Queue
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
object Answer {
type Event = String
trait Sender {
def send(event: Event): Unit
}
def main(args: Array[String]): Unit = {
val sender: Sender = {
val ec = ExecutionContext.global
implicit val cs: ContextShift[IO] = IO.contextShift(ec)
implicit val timer: Timer[IO] = IO.timer(ec)
fs2Sender[IO](2)
}
val events = List("a", "b", "c", "d")
events.foreach { evt => new Thread(() => sender.send(evt)).start() }
Thread sleep 3000
}
def fs2Sender[F[_]: Timer : ContextShift](maxBufferedSize: Int)(implicit F: ConcurrentEffect[F]): Sender = {
// dummy impl
// this is where the actual logic for batching
// and shipping over the network would live
val consume: Pipe[F, Event, Unit] = _.evalMap { event =>
for {
_ <- F.delay { println(s"consuming [$event]...") }
_ <- Timer[F].sleep(1.seconds)
_ <- F.delay { println(s"...[$event] consumed") }
} yield ()
}
val suspended = for {
q <- Queue.bounded[F, Event](maxBufferedSize)
_ <- q.dequeue.through(consume).compile.drain.start
sender <- F.delay[Sender] { evt =>
val enqueue = for {
wasEnqueued <- q.offer1(evt)
_ <- F.delay { println(s"[$evt] enqueued? $wasEnqueued") }
} yield ()
enqueue.toIO.unsafeRunAsyncAndForget()
}
} yield sender
suspended.toIO.unsafeRunSync()
}
}
The main idea is to use a concurrent Queue from fs2. Note that the code above is written so that neither the Sender interface nor the logic in main has to change; only the implementation of the Sender interface is swapped out.
I don't have much experience with this particular library, but it should look something like this:
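The "fixed-size buffer, drop new events when full, never block the caller" requirement that the bounded queue and `offer1` provide can also be seen in a library-free sketch. This deliberately swaps in `java.util.concurrent.ArrayBlockingQueue` instead of fs2, purely to show the semantics; `DropWhenFullSketch`, `BufferedSender` and this `Event` type are made-up names for illustration.

```scala
import java.util.concurrent.ArrayBlockingQueue

object DropWhenFullSketch {
  // Hypothetical event type, for illustration only.
  final case class Event(id: Int)

  final class BufferedSender(capacity: Int) {
    private val buffer = new ArrayBlockingQueue[Event](capacity)

    // Never blocks the caller: returns false (the event is dropped) when the buffer is full.
    def send(e: Event): Boolean = buffer.offer(e)

    // Called by the consuming side; returns up to `max` buffered events in arrival order.
    def drain(max: Int): List[Event] = {
      val out = new java.util.ArrayList[Event]()
      buffer.drainTo(out, max)
      val builder = List.newBuilder[Event]
      val it = out.iterator()
      while (it.hasNext) builder += it.next()
      builder.result()
    }
  }
}
```

In the fs2 version the consuming side is the stream built from `q.dequeue`; here it is reduced to an explicit `drain` call.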
import cats.effect.{ExitCode, IO, IOApp}
import fs2.concurrent.Queue
case class Event(id: Int)
class JavaProducer{
new Thread(new Runnable {
override def run(): Unit = {
var id = 0
while(true){
Thread.sleep(1000)
id += 1
send(Event(id))
}
}
}).start()
def send(event: Event): Unit ={
println(s"Original producer prints $event")
}
}
class HackedProducer(queue: Queue[IO, Event]) extends JavaProducer {
override def send(event: Event): Unit = {
println(s"Hacked producer pushes $event")
queue.enqueue1(event).unsafeRunSync()
println(s"Hacked producer pushes $event - Pushed")
}
}
object Test extends IOApp{
override def run(args: List[String]): IO[ExitCode] = {
val x: IO[Unit] = for {
queue <- Queue.unbounded[IO, Event]
_ = new HackedProducer(queue)
done <- queue.dequeue.map(ev => {
println(s"Got $ev")
}).compile.drain
} yield done
x.map(_ => ExitCode.Success)
}
}
We can create a bounded queue that will consume elements from sender and make them available to fs2 stream processing.
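import cats.effect.IO
import cats.effect.std.Queue
import cats.effect.unsafe.implicits.global // runtime needed by unsafeRunSync()
import fs2.Stream
trait Sender[T]:
  def send(e: T): Unit
object Sender:
  def apply[T](bufferSize: Int): IO[(Sender[T], Stream[IO, T])] =
    for
      q <- Queue.bounded[IO, T](bufferSize)
    yield
      // blocks the calling thread while the queue is full
      val sender: Sender[T] = (e: T) => q.offer(e).unsafeRunSync()
      // endless stream that takes one element at a time from the queue
      def stm: Stream[IO, T] = Stream.eval(q.take) ++ stm
      (sender, stm)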
Then we have two ends: one for the Java world, which sends new elements to the Sender, and another for stream processing in fs2.
class TestSenderQueue:
  @Test def testSenderQueue: Unit =
    val (sender, stream) = Sender[Int](1)
      .unsafeRunSync() // we have to run it up front to make `sender` available to the external system
    val processing =
      stream
        .map(i => i * i)
        .evalMap { ii => IO { println(ii) } }
    sender.send(1)
    processing.compile.toList.start // NB! we start processing in a separate fiber
      .unsafeRunSync() // immediately, right now
    sender.send(2)
    Thread.sleep(100)
    (0 until 100).foreach(sender.send)
    println("finished")
Note that we push data from the current thread and have to run the fs2 stream in a separate fiber (.start).

Gracefully shutdown different supervisor actors without duplicating code

I have an API that creates actor A (at runtime). Then A creates actor B (at runtime as well).
I have another API that creates actor C (different from actor A, with no common code between them), and C creates actor D.
I want to gracefully shut down A and C once B and D have finished processing their messages (A and C do not necessarily run together; they are unrelated).
Sending a PoisonPill to A/C is not good enough, because the children (B/D) will still be stopped via the context and will not be able to finish their tasks.
I understand that I need to implement a new type of message.
What I didn't understand is how to build the infrastructure so that both A and C know how to respond to this message without duplicating the same receive method in both.
The solution I found was to create a new trait that extends Actor and overrides the unhandled method.
The code looks like this:
object SuicideActor {
case object PleaseKillYourself
case object IKilledMyself
}
trait SuicideActor extends Actor {
override def unhandled(message: Any): Unit = message match {
case PleaseKillYourself =>
Logger.debug(s"Actor ${self.path} received PleaseKillYourself - stopping children and aborting...")
val livingChildren = context.children.size
if (livingChildren == 0) {
endLife()
} else {
context.children.foreach(_ ! PleaseKillYourself)
context become waitForChildren(livingChildren)
}
case _ => super.unhandled(message)
}
protected[crystalball] def waitForChildren(livingChildren: Int): Receive = {
case IKilledMyself =>
val remaining = livingChildren - 1
if (remaining == 0) { endLife() }
else { context become waitForChildren(remaining) }
}
private def endLife(): Unit = {
context.parent ! IKilledMyself
context stop self
}
}
But this sounds a bit hacky. Is there a better (non-hacky) solution?
I think designing your own termination procedure is unnecessary and can cause headaches for future maintainers of your code, as it is non-standard Akka behaviour that has to be understood all over again.
If A and C are unrelated and can terminate independently, the following options are possible. To avoid any confusion, I'll use just actors A and B in my explanations.
Option 1.
Actor A calls context.watch on the newly created actor B and reacts to the Terminated message in its receive method. Actor B calls context.stop(context.self) when it is done with its task; this generates a Terminated event that is handled by actor A, which can clean up its state if needed and then terminate too.
Check these docs for more details.
Option 2.
Actor B calls context.stop(context.parent) to terminate the parent directly when it is done with its own task. This option does not allow the parent to react and perform additional clean-up tasks if needed.
Finally, sharing this logic between actors A and C can be done with a trait, the way you did, but the logic is very small, and duplicated code is not always a bad thing.
So it took me a bit, but I found my answer.
I implemented the Reaper pattern.
The SuicideActor creates a dedicated Reaper actor when it has finished its block. The Reaper watches all of the SuicideActor's children and, once they have all terminated, sends a PoisonPill to the SuicideActor and to itself.
The SuicideActor code is:
trait SuicideActor extends Actor {
def killSwitch(block: => Unit): Unit = {
block
Logger.info(s"Actor ${self.path.name} is commencing suicide sequence...")
context become PartialFunction.empty
val children = context.children
val reaper = context.system.actorOf(ReaperActor.props(self), s"ReaperFor${self.path.name}")
reaper ! Reap(children.toSeq)
}
override def postStop(): Unit = Logger.debug(s"Actor ${self.path.name} is dead.")
}
And the Reaper is:
object ReaperActor {
case class Reap(underWatch: Seq[ActorRef])
def props(supervisor: ActorRef): Props = {
Props(new ReaperActor(supervisor))
}
}
class ReaperActor(supervisor: ActorRef) extends Actor {
override def preStart(): Unit = Logger.info(s"Reaper for ${supervisor.path.name} started")
override def postStop(): Unit = Logger.info(s"Reaper for ${supervisor.path.name} ended")
override def receive: Receive = {
case Reap(underWatch) =>
if (underWatch.isEmpty) {
killLeftOvers
} else {
underWatch.foreach(context.watch)
context become reapRemaining(underWatch.size)
underWatch.foreach(_ ! PoisonPill)
}
}
def reapRemaining(livingActorsNumber: Int): Receive = {
case Terminated(_) =>
val remainingActorsNumber = livingActorsNumber - 1
if (remainingActorsNumber == 0) {
killLeftOvers
} else {
context become reapRemaining(remainingActorsNumber)
}
}
private def killLeftOvers = {
Logger.debug(s"All children of ${supervisor.path.name} are dead killing supervisor")
supervisor ! PoisonPill
self ! PoisonPill
}
}

Scala Akka Consumer/Producer: Return Value

Problem Statement
Assume I have a file of sentences that is processed line by line. In my case, I need to extract named entities (persons, organizations, ...) from these lines. Unfortunately, the tagger is quite slow. Therefore, I decided to parallelize the computation so that lines can be processed independently of each other and the results are collected in a central location.
Current Approach
My current approach uses a single-producer, multiple-consumer design. I'm relatively new to Akka, but I think my problem fits its capabilities well. Let me show you some code:
Producer
The Producer reads the file line by line and sends each line to the Consumer. Once the number of processed lines reaches the total line count, it propagates the result back to WordCount.
class Producer(consumers: ActorRef) extends Actor with ActorLogging {
var master: Option[ActorRef] = None
var result = immutable.List[String]()
var totalLines = 0
var linesProcessed = 0
override def receive = {
case StartProcessing() => {
master = Some(sender)
Source.fromFile("sent.txt", "utf-8").getLines.foreach { line =>
consumers ! Sentence(line)
totalLines += 1
}
context.stop(self)
}
case SentenceProcessed(list) => {
linesProcessed += 1
result :::= list
//If we are done, we can propagate the result to the creator
if (linesProcessed == totalLines) {
master.map(_ ! result)
}
}
case _ => log.error("message not recognized")
}
}
Consumer
class Consumer extends Actor with ActorLogging {
def tokenize(line: String): Seq[String] = {
line.split(" ").map(_.toLowerCase)
}
override def receive = {
case Sentence(sent) => {
//Assume: This is representative for the extensive computation method
val tokens = tokenize(sent)
sender() ! SentenceProcessed(tokens.toList)
}
case _ => log.error("message not recognized")
}
}
WordCount (Master)
class WordCount extends Actor {
val consumers = context.actorOf(Props[Consumer].
withRouter(FromConfig()).
withDispatcher("consumer-dispatcher"), "consumers")
val producer = context.actorOf(Props(new Producer(consumers)), "producer")
context.watch(consumers)
context.watch(producer)
def receive = {
case Terminated(`producer`) => consumers ! Broadcast(PoisonPill)
case Terminated(`consumers`) => context.system.shutdown
}
}
object WordCount {
def getActor() = new WordCount
def getConfig(routerType: String, dispatcherType: String)(numConsumers: Int) = s"""
akka.actor.deployment {
/WordCount/consumers {
router = $routerType
nr-of-instances = $numConsumers
dispatcher = consumer-dispatcher
}
}
consumer-dispatcher {
type = $dispatcherType
executor = "fork-join-executor"
}"""
}
The WordCount actor is responsible for creating the other actors. When the Consumers are finished, the Producer sends a message with all tokens. But how do I propagate that message onward, and how do I accept and wait for it? The architecture with the third actor, WordCount, might be wrong.
Main Routine
case class Run(name: String, actor: () => Actor, config: (Int) => String)
object Main extends App {
val run = Run("push_implementation", WordCount.getActor _, WordCount.getConfig("balancing-pool", "Dispatcher") _)
def execute(run: Run, numConsumers: Int) = {
val config = ConfigFactory.parseString(run.config(numConsumers))
val system = ActorSystem("Counting", ConfigFactory.load(config))
val startTime = System.currentTimeMillis
system.actorOf(Props(run.actor()), "WordCount")
/*
How to get the result here?!
*/
system.awaitTermination
System.currentTimeMillis - startTime
}
execute(run, 4)
}
Problem
As you can see, the actual problem is propagating the result back to the Main routine. Can you tell me how to do this properly? A related question is how to wait for the result until the consumers are finished. I had a brief look at the Futures section of the Akka documentation, but the whole system is a bit overwhelming for a beginner. Something like var future = message ? actor seems suitable, but I'm not sure how to do this. Using the WordCount actor also adds complexity. Maybe it is possible to come up with a solution that doesn't need this actor?
Consider using the Akka Aggregator Pattern. That takes care of the low-level primitives (watching actors, poison pill, etc). You can focus on managing state.
Your call to system.actorOf() returns an ActorRef, but you're not using it. You should ask that actor for results. Something like this:
implicit val timeout = Timeout(5 seconds)
val wCount = system.actorOf(Props(run.actor()), "WordCount")
val answer = Await.result(wCount ? "sent.txt", timeout.duration)
This means your WordCount class needs a receive method that accepts a String message. That section of code should aggregate the results and tell the sender(), like this:
class WordCount extends Actor {
def receive: Receive = {
case filename: String =>
// do all of your code here, using filename
sender() ! results
}
}
Also, rather than blocking on the results with Await above, you can apply some techniques for handling Futures.
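As a rough sketch of that last point, the Future returned by ask can be transformed without blocking a thread. The example below uses only the standard library: `askWordCount` is a hypothetical stand-in for `wCount ? "sent.txt"` (which yields a `Future[Any]` in Akka), and the reply it returns is made up.

```scala
import scala.concurrent.{ExecutionContext, Future}

object NonBlockingResultSketch {
  // Hypothetical stand-in for `wCount ? "sent.txt"`; the reply is untyped, as with ask.
  def askWordCount(filename: String)(implicit ec: ExecutionContext): Future[Any] =
    Future(List("alice", "bob"))

  // Narrow the untyped reply and derive a value from it without blocking.
  // If the reply is not a List, the resulting Future fails with NoSuchElementException.
  def tokenCount(filename: String)(implicit ec: ExecutionContext): Future[Int] =
    askWordCount(filename).collect { case tokens: List[_] => tokens.size }
}
```

A caller would typically chain further `map`/`flatMap` steps or register `onComplete`, keeping `Await` for the very edge of the program (for example, in `Main`).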

akka actor post a message to head of the MailBox

Can a Producer actor post a message to another actor for immediate processing, i.e. post a message to the head of the Consumer's mailbox instead of its tail?
I know that Akka provides a way to configure my own mailbox type, but how do I control which types of messages are posted at the head of the mailbox instead of the tail?
An example is timer messages. I want precise timer control for a time-window implementation. Messages must be kept for only 1000 ms (say), and if message processing takes time and many messages are pending in the mailbox, I don't want the timer message to be appended to the same queue.
I could use a PriorityMailbox, but the trouble is that, although it can put higher-priority messages (timer messages) at the head of the mailbox, for messages of the same priority the order in the mailbox is not guaranteed to match the order of arrival. So I cannot use the PriorityMailbox either.
Can someone please tell me how I can achieve this behaviour?
You can use your own PriorityMailBox which can take care of message's arrival time and use it as an additional priority (for messages with the same "main" priority).
Something like this (not tested):
import akka.dispatch._
import com.typesafe.config.Config
import akka.actor.{ActorRef, PoisonPill, ActorSystem}
import java.util.Comparator
import java.util.concurrent.PriorityBlockingQueue
class MyTimedPriorityMailbox(settings: ActorSystem.Settings, config: Config)
extends UnboundedTimedPriorityMailbox(
TimedPriorityGenerator {
case 'highpriority ⇒ 0
case 'lowpriority ⇒ 2
case PoisonPill ⇒ 3
case otherwise ⇒ 1
})
case class TimedEnvelope(envelope: Envelope) {
private val _timestamp = System.nanoTime()
def timestamp = _timestamp
}
class UnboundedTimedPriorityMailbox( final val cmp: Comparator[TimedEnvelope], final val initialCapacity: Int) extends MailboxType {
def this(cmp: Comparator[TimedEnvelope]) = this(cmp, 11)
final override def create(owner: Option[ActorRef], system: Option[ActorSystem]): MessageQueue =
new PriorityBlockingQueue[TimedEnvelope](initialCapacity, cmp) with TimedQueueBasedMessageQueue with TimedUnboundedMessageQueueSemantics {
override def queue: java.util.Queue[TimedEnvelope] = this
}
}
trait TimedQueueBasedMessageQueue extends MessageQueue {
def queue: java.util.Queue[TimedEnvelope]
def numberOfMessages = queue.size
def hasMessages = !queue.isEmpty
def cleanUp(owner: ActorRef, deadLetters: MessageQueue) {
if (hasMessages) {
var envelope = dequeue()
while (envelope ne null) {
deadLetters.enqueue(owner, envelope)
envelope = dequeue()
}
}
}
}
trait TimedUnboundedMessageQueueSemantics extends TimedQueueBasedMessageQueue {
def enqueue(receiver: ActorRef, handle: Envelope) { queue add TimedEnvelope(handle) }
def dequeue(): Envelope = Option(queue.poll()).map(_.envelope).getOrElse(null)
}
object TimedPriorityGenerator {
def apply(priorityFunction: Any ⇒ Int): TimedPriorityGenerator = new TimedPriorityGenerator {
def gen(message: Any): Int = priorityFunction(message)
}
}
abstract class TimedPriorityGenerator extends java.util.Comparator[TimedEnvelope] {
def gen(message: Any): Int
final def compare(thisMessage: TimedEnvelope, thatMessage: TimedEnvelope): Int = {
val result = gen(thisMessage.envelope.message) - gen(thatMessage.envelope.message)
// Int.MaxValue / Int.MinValue check omitted
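// break ties by arrival time; take the sign of the Long difference, because truncating it with .toInt can flip the sign
if (result == 0) java.lang.Long.signum(thisMessage.timestamp - thatMessage.timestamp) else result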
}
}
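The tie-breaking idea in the comparator above can be tried in isolation, outside Akka. The sketch below swaps the timestamp for a monotonically increasing sequence number, which is an alternative technique and not what the answer's code does: two entries can never tie, so arrival order within one priority level is preserved. `StablePrioritySketch`, `Stamped` and `StableQueue` are hypothetical names.

```scala
import java.util.Comparator
import java.util.concurrent.PriorityBlockingQueue
import java.util.concurrent.atomic.AtomicLong

object StablePrioritySketch {
  // Hypothetical wrapper: lower `priority` is served first, `seq` records arrival order.
  final case class Stamped[A](priority: Int, seq: Long, value: A)

  final class StableQueue[A] {
    private val counter = new AtomicLong(0L)

    private val cmp = new Comparator[Stamped[A]] {
      def compare(x: Stamped[A], y: Stamped[A]): Int = {
        val byPriority = Integer.compare(x.priority, y.priority)
        // Same priority: fall back to arrival order.
        if (byPriority != 0) byPriority else java.lang.Long.compare(x.seq, y.seq)
      }
    }

    private val queue = new PriorityBlockingQueue[Stamped[A]](11, cmp)

    def enqueue(priority: Int, value: A): Unit = {
      queue.add(Stamped(priority, counter.getAndIncrement(), value))
      ()
    }

    // Returns None when the queue is empty.
    def dequeue(): Option[A] = Option(queue.poll()).map(_.value)
  }
}
```

Using `Integer.compare` and `Long.compare` instead of subtraction also sidesteps the overflow cases that the original comment mentions omitting.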
The code above works OK.
Only one detail: avoid using System.nanoTime(). It has problems on multi-core machines, as it is defined by per-CPU logic.
Here is another post about it.
As a result, we saw strange behaviour in the message order, depending on which CPU enqueued the message.
I replaced it with the classic System.currentTimeMillis(). It is less precise but, in our case, if two messages have the same priority and the same millisecond timestamp, we don't care about the order in which they are handled.
Thanks for the code!