Akka: actor spawning vs filling up mailboxes - scala

If you want to execute long running computations concurrently (on a single machine), Akka actors can help.
One approach is to spawn a new actor for each piece of work. Something like
while(true) {
val actor = system.actorOf(Props[ProcessingActor])
(actor ? msg).map {
...
system.stop(actor)
}
}
A second idea is to configure a set number of actors behind a router. And then send all messages to the router.
val router = system.actorOf(Props[ProcessingActor].withRouter(RoundRobinRouter(nrOfInstances = 5)))
while(true) {
(router ? msg).map { ... }
}
I wonder, which is better if the system is overloaded (rate of incoming messages is higher than processing rate)?
Which will last longer? And will both eventually blow up the system with an OOMError?

Before you create a new Actor for each task you could also just use a Future. It really depends on what you want to achieve. To get as much work done with the least memory usage, you should use the actor/router approach. Futures are more expensive, because for each task would create a new instance of Future and Promise. But it really depends on your use case, which approach is the better. I just wouldn't create a lot of actors, when there really is no need for them. Especially as system.actorOf always creates a new error kernel.

Related

Are Akka actors overkill for doing data crunching/uploading?

I'm quite new to Scala as well as Akka actors. I'm really only reading about their use and implementation now. My background is largely js and python with a bit of C#.
A new service I have to write is going to receive REST requests, then do the following:
Open a socket connection to a message broker
Query an external REST service once
Make many big, long REST requests to another internal service, do math on the responses, and send the result out. Messages are sent through the socket connection as progress updates.
Scalability is the primary concern here, as we may normally receive ~10 small requests per minute, but at unknown times receive several jaw-droppingly enormous and long running requests at once.
Using Scala Futures, the very basic implementation would be something like this:
val smallResponse = smallHttpRequest(args)
smallResponse.onComplete match {
case Success(result) => {
result.data.grouped(10000).toList.forEach(subList => {
val bigResponse = getBigSlowHttpRequest(subList)
bigResponse.onSuccess {
case crunchableStuff => crunchAndDeliver(crunchableStuff)
}
})
}
case Failure(error) => handleError(error)
}
My understanding is that on a machine with many cores, letting the JVM handle all the threading underneath the above futures would allow for them all to run in parallel.
This could definitely be written using Akka actors, but I don't know what, if any, benefits I would realize in doing so. Would it be overkill to turn the above into an actor based process with a bunch of workers taking chunks of crunching?
For such an operation, I wouldn't go near Akka Actors -- it's way too much for what looks to be a very basic chain of async requests. The Actor system gives you the ability to safely handle and/or accumulate state in an actor, whilst your task can easily be modeled as a type safe stateless flow of data.
So Futures (or preferably one of the many lazy variants such as the Twitter Future, cats.IO, fs2 Task, Monix, etc) would easily handle that.
No IDE to hand, so there's bound to be a huge mistake in here somewhere!
val smallResponse = smallHttpRequest(args)
val result: Future[List[CrunchedData]] = smallResponse.map(result => {
result.data
.grouped(10000)
.toList
// List[X] => List[Future[X]]
.map(subList => getBigSlowHttpRequest(subList))
// List[Future[X]] => Future[List[X]] so flatmap
.flatMap(listOfFutures => Future.sequence(listOfFutures))
})
Afterwards you could pass the future back via the controller if using something like Finch, Http4s, Play, Akka Http, etc. Or manually take a look like in your example code.

Is sending futures in Akka messages OK?

I'm working on implementing a small language to send tasks to execution and control execution flow. After the sending a task to my system, the user gets a future (on which it can call a blocking get() or flatMap() ). My question is: is it OK to send futures in Akka messages?
Example: actor A sends a message Response to actor B and Response contains a future among its fields. Then at some point A will fulfill the promise from which the future was created. After receiving the Response, B can call flatMap() or get() at any time.
I'm asking because Akka messages should be immutable and work even if actors are on different JVMs. I don't see how my example above can work if actors A and B are on different JVMs. Also, are there any problems with my example even if actors are on same JVM?
Something similar is done in the accepted answer in this stackoverflow question. Will this work if actors are on different JVMs?
Without remoting it's possible, but still not advisable. With remoting in play it won't work at all.
If your goal is to have an API that returns Futures, but uses actors as the plumbing underneath, one approach could be that the API creates its own actor internally that it asks, and then returns the future from that ask to the caller. The actor spawned by the API call is guaranteed to be local to the API instance and can communicate with the rest of the actor system via the regular tell/receive mechanism, so that there are no Futures sent as messages.
class MyTaskAPI(actorFactory: ActorRefFactory) {
def doSomething(...): Future[SomethingResult] = {
val taskActor = actorFactory.actorOf(Props[MyTaskActor])
taskActor ? DoSomething(...).mapTo[SomethingResult]
}
}
where MyTaskActor receives the DoSomething, captures the sender, sends out the request for task processince and likely becomes a receiving state for SomethingResult which finally responds to the captured sender and stops itself. This approach creates two actors per request, one explicitly, the MyTaskActor and one implicitly, the handler of the ask, but keeps all state inside of actors.
Alternately, you could use the ActorDSL to create just one actor inline of doSomething and use a captured Promise for completion instead of using ask:
class MyTaskAPI(system: System) {
def doSomething(...): Future[SomethingResult] = {
val p = Promise[SomethingResult]()
val tmpActor = actor(new Act {
become {
case msg:SomethingResult =>
p.success(msg)
self.stop()
}
}
system.actorSelection("user/TaskHandler").tell(DoSomething(...), tmpActor)
p.future
}
}
This approach is a bit off the top of my head and it does use a shared value between the API and the temp actor, which some might consider a smell, but should give an idea how to implement your workflow.
If you're asking if it's possible, then yes, it's possible. Remote actors are basically interprocess communication. If you set everything up on both machines to a state where both can properly handle the future, then it should be good. You don't give any working example so I can't really delve deeper into it.

Akka actor forward message with continuation

I have an actor which takes the result from another actor and applies some check on it.
class Actor1(actor2:Actor2) {
def receive = {
case SomeMessage =>
val r = actor2 ? NewMessage()
r.map(someTransform).pipeTo(sender)
}
}
now if I make an ask of Actor1, we now have 2 futures generated, which doesnt seem overly efficient. Is there a way to provide a foward with some kind of continuation, or some other approach I could use here?
case SomeMessage => actor2.forward(NewMessage, someTransform)
Futures are executed in an ExecutionContext, which are like thread pools. Creating a new future is not as expensive as creating a new thread, but it has its cost. The best way to work with futures is to create as much as needed and compose then in a way that things that can be computed in parallel are computed in parallel if the necessary resources are available. This way you will make the best use of your machine.
You mentioned that akka documentation discourages excessive use of futures. I don't know where you read this, but what I think it means is to prefer transforming futures rather than creating your own. This is exactly what you are doing by using map. Also, it may mean that if you create a future where it is not needed you are adding unnecessary overhead.
In your case you have a call that returns a future and you need to apply sometransform and return the result. Using map is the way to go.

Scala how to use akka actors to handle a timing out operation efficiently

I am currently evaluating javascript scripts using Rhino in a restful service. I wish for there to be an evaluation time out.
I have created a mock example actor (using scala 2.10 akka actors).
case class Evaluate(expression: String)
class RhinoActor extends Actor {
override def preStart() = { println("Start context'"); super.preStart()}
def receive = {
case Evaluate(expression) ⇒ {
Thread.sleep(100)
sender ! "complete"
}
}
override def postStop() = { println("Stop context'"); super.postStop()}
}
Now I run use this actor as follows:
def run {
val t = System.currentTimeMillis()
val system = ActorSystem("MySystem")
val actor = system.actorOf(Props[RhinoActor])
implicit val timeout = Timeout(50 milliseconds)
val future = (actor ? Evaluate("10 + 50")).mapTo[String]
val result = Try(Await.result(future, Duration.Inf))
println(System.currentTimeMillis() - t)
println(result)
actor ! PoisonPill
system.shutdown()
}
Is it wise to use the ActorSystem in a closure like this which may have simultaneous requests on it?
Should I make the ActorSystem global, and will that be ok in this context?
Is there a more appropriate alternative approach?
EDIT: I think I need to use futures directly, but I will need the preStart and postStop. Currently investigating.
EDIT: Seems you don't get those hooks with futures.
I'll try and answer some of your questions for you.
First, an ActorSystem is a very heavy weight construct. You should not create one per request that needs an actor. You should create one globally and then use that single instance to spawn your actors (and you won't need system.shutdown() anymore in run). I believe this covers your first two questions.
Your approach of using an actor to execute javascript here seems sound to me. But instead of spinning up an actor per request, you might want to pool a bunch of the RhinoActors behind a Router, with each instance having it's own rhino engine that will be setup during preStart. Doing this will eliminate per request rhino initialization costs, speeding up your js evaluations. Just make sure you size your pool appropriately. Also, you won't need to be sending PoisonPill messages per request if you adopt this approach.
You also might want to look into the non-blocking callbacks onComplete, onSuccess and onFailure as opposed to using the blocking Await. These callbacks also respect timeouts and are preferable to blocking for higher throughput. As long as whatever is way way upstream waiting for this response can handle the asynchronicity (i.e. an async capable web request), then I suggest going this route.
The last thing to keep in mind is that even though code will return to the caller after the timeout if the actor has yet to respond, the actor still goes on processing that message (performing the evaluation). It does not stop and move onto the next message just because a caller timed out. Just wanted to make that clear in case it wasn't.
EDIT
In response to your comment about stopping a long execution there are some things related to Akka to consider first. You can call stop the actor, send a Kill or a PosionPill, but none of these will stop if from processing the message that it's currently processing. They just prevent it from receiving new messages. In your case, with Rhino, if infinite script execution is a possibility, then I suggest handling this within Rhino itself. I would dig into the answers on this post (Stopping the Rhino Engine in middle of execution) and setup your Rhino engine in the actor in such a way that it will stop itself if it has been executing for too long. That failure will kick out to the supervisor (if pooled) and cause that pooled instance to be restarted which will init a new Rhino in preStart. This might be the best approach for dealing with the possibility of long running scripts.

When could Futures be more appropriate than Actors (or vice versa) in Scala?

Suppose I need to run a few concurrent tasks.
I can wrap each task in a Future and wait for their completion. Alternatively I can create an Actor for each task. Each Actor would execute its task (e.g. upon receiving a "start" message) and send the result back.
I wonder when I should use the former (with Futures) and the latter (with Actors) approach and why the Future approach is considered better for the case described above.
Because it is syntactically simpler.
val tasks: Seq[() => T] = ???
val futures = tasks map {
t => future { t() }
}
val results: Future[Seq[T]] = Future.sequence(futures)
The results future you can then wait on using Await.result or you can map it further/use it in for-comprehension or install callbacks on it.
Compare that to instantiating all the actors, sending messages to them, coding their receive blocks, receiving responses from them and shutting them down -- that would generally require more boilerplate.
As a general rule, use the simplest concurrency model that fits your application, rather than the most powerful. Ordering from simplest to most complex would be sequential programming->parallel collections->futures->stateless actors->stateful actors->threads with software transactional memory->threads with explicit locking->threads with lock-free algorithms. Pick the first one in this list that solves your problem. The farther down that list you go, the greater the complexities and risks, so you're better off trading simplicity for conceptual power.
I tend to think that actors are useful when you have interacting threads. In your case, it appears to be that all the jobs are independent; I would use futures.