Lots of Akka Threads in Tomcat - scala

I'm hosting my Scala (2.11) WAR inside Tomcat 8 and am using akka 2.3.+ and spray-client 1.3.3. I am using Akka for only one actor and am dispatching a call to it once when Tomcat starts. It can also be called manually.
class RefreshDataActor extends Actor with ActorLogging {
override def receive: Receive = {
case _ =>
implicit val timeout: Timeout = someTimeout
implicit val ec = this.context.dispatcher
val pipeline = sendReceive ~> unmarshal[Data]
pipeline(Get(fileUrl))
.onComplete {
case Success(data) =>
// Do stuff with the data
case Failure(ex) =>
this.log.error("Unable to find the latest version of the data!", ex)
}
}
}
Whenever any call comes into Tomcat, I see a spike in the number of live threads it hosts.
The arrows indicate when calls were made to the server. Also note that CPU slowly rises too and at some point the machine ends up working very hard over what seems to be nothing (the machine in the screencap has 8 cores).
I started debugging the issue by connecting VisualVM to one of the affected machines. The threads that remain alive are named default-akka.actor.default-dispatcher-X (where X is usually a number between 2 and 7), all of them WAITING, plus default-scheduler-1 (TIMED_WAITING). There are HUNDREDS of them. There's also a single default-akka.io.pinned-dispatcher-5 (RUNNABLE).
I'm assuming this has something to do with how Akka works, but I don't understand why it happens.

Found the issue. I was calling ActorSystem() more than once, which created a new actor system each time rather than reusing the one already created. Each system keeps its own dispatcher and scheduler threads alive until it is shut down, so more and more systems (and threads) piled up and were never removed.
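A minimal sketch of that fix, assuming Akka 2.4 or later (where terminate() exists; on Akka 2.3 the equivalent call is shutdown()). The object name AppActors and the RefreshStub actor are hypothetical; RefreshStub only stands in for RefreshDataActor so the sketch compiles without spray-client. The point is that the system and the actor are created once per web application and reused by every call.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props, Terminated}
import scala.concurrent.Future

// Hypothetical stand-in for RefreshDataActor, so the sketch compiles without spray-client.
class RefreshStub extends Actor {
  def receive: Receive = {
    case msg => sender() ! msg // echo the request back
  }
}

// One ActorSystem per web application. `lazy val` creates it on first use,
// and every later call gets the same instance instead of a brand-new system.
object AppActors {
  lazy val system: ActorSystem = ActorSystem("refresh-data")

  // The actor is also created once; callers send messages to this ref.
  lazy val refresher: ActorRef = system.actorOf(Props[RefreshStub], "refresher")

  // Call once on undeploy (for example from a ServletContextListener's
  // contextDestroyed) so the dispatcher and scheduler threads are released.
  def stop(): Future[Terminated] = system.terminate()
}
```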

Related

Future[Source] pipeTo an Actor

There are two local actors (remoting is not used). The actors are simplified for this example:
class ProcessorActor extends Actor {
override def receive: Receive = {
case src:Source[Int, NotUsed] =>
//TODO processing of `src` here
}
}
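class FrontendActor extends Actor {
import akka.pattern.pipe // enables `f pipeTo processor`
import context.dispatcher // implicit ExecutionContext for Future(...)
val processor = context.system.actorOf(Props[ProcessorActor])
...
override def receive: Receive = {
case "Hello" =>
val f:Future[Source[Int, NotUsed]] = Future (Source(1 to 100))
f pipeTo processor
}
}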
// entry point:
val frontend = system.actorOf(Props[FrontendActor])
frontend ! "Hello"
Thus the FrontendActor sends a Source to ProcessorActor. In the example above, this works successfully.
Is such an approach okay?
It's unclear what your concern is.
Sending a Source from one actor to another actor on the same JVM is fine. Because inter-actor communication on the same JVM, as the documentation states, "is simply done via reference passing," there is nothing unusual about your example.[1] Essentially, once the Future is completed, a reference to the Source is passed to ProcessorActor. A Source is an object that defines part of a stream; you can send a Source from one actor to another actor locally just as you can any JVM object.
(However, once you cross the boundary of a single JVM, you have to deal with serialization.)
[1] A minor, tangential observation: FrontendActor calls context.system.actorOf(Props[ProcessorActor]), which creates a top-level actor. Typically, top-level actors are created in the main program, not within an actor.
Yes, this is OK, but it does not work quite the way you describe it. FrontendActor does not send a Future[Source]; it just sends the Source.
From the docs:
pipeTo installs an onComplete-handler on the future to affect the submission of the result to another actor.
In other words, pipeTo means "send the result of this Future to the actor when it becomes available".
Note that this will work even if remoting is being used because the Future is resolved locally and is not sent over the wire to a remote actor.
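To make the docs quote concrete, here is a rough, hand-written equivalent of pipeTo (a sketch; the names PipeSketch, pipeManually and Recorder are hypothetical). Akka's own pipeTo behaves like this: it registers a completion callback and then sends the value itself, or an akka.actor.Status.Failure wrapping the exception, so the receiving actor never sees the Future.

```scala
import akka.actor.{Actor, ActorRef, Status}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success}

object PipeSketch {
  // Roughly what `f pipeTo target` does: nothing is sent until `f` completes,
  // and then only the result (or a Status.Failure) is delivered to the actor.
  def pipeManually[T](f: Future[T], target: ActorRef)(implicit ec: ExecutionContext): Future[T] = {
    f.onComplete {
      case Success(value) => target ! value
      case Failure(ex)    => target ! Status.Failure(ex)
    }
    f
  }
}

// Demo receiver: completes a promise with the first message it gets.
class Recorder(received: Promise[Any]) extends Actor {
  def receive: Receive = {
    case msg => received.trySuccess(msg)
  }
}
```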

Is there an overhead because of nesting Futures

I wrote this code
package com.abhi
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
object FutureNesting extends App {
def measure(future: => Future[Unit]) : Future[Long] = {
val start = System.currentTimeMillis()
val ec = implicitly[ExecutionContext]
val t = future
t map { case _ =>
val end = System.currentTimeMillis()
end - start
}
}
measure(Future{ Thread.sleep(10000) }) onSuccess {case a => println(a)}
scala.io.StdIn.readLine()
}
So how many threads am I using in this code? The broader question is: what is the impact of nesting futures inside futures?
So I ran the application above and observed it using Visual VM. This is what I saw
So the application launched 2 threads, ForkJoinPool-1-worker-5 and ForkJoinPool-2-worker-3. However, it launches the same 2 threads even if I remove the nesting, so I am not sure what overhead nesting the futures like this adds.
Edit: Some people said it depends on the type of thread pool (ForkJoin, etc.).
I don't know what type of pool Akka HTTP or Spray uses. I planned to use a code snippet similar to the one above in a Spray web service; the idea was to measure the performance of the web service using Futures.
In your case, you are using a wrapper over a thread pool (a ForkJoinPool from java.util.concurrent). All of your Futures are executed in it:
import scala.concurrent.ExecutionContext.Implicits.global
To inspect that pool, create it yourself instead of importing the global one, and expose it as the implicit ExecutionContext, like this:
val pool = new java.util.concurrent.ForkJoinPool()
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
Then call the pool's getActiveThreadCount() method.
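A small, self-contained sketch of that first approach, using only the Scala standard library and java.util.concurrent (the object name PoolStats and the parallelism of 4 are arbitrary choices for illustration). The map step after the Future is just one more task submitted to the same pool; it does not get a thread of its own, which is consistent with seeing the same worker threads with or without the extra step.

```scala
import java.util.concurrent.ForkJoinPool
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PoolStats {
  // Runs a future plus a `map` step on a dedicated pool and reports:
  // elapsed milliseconds, active-thread estimate seen inside the task, pool size afterwards.
  def run(): (Long, Int, Int) = {
    val pool = new ForkJoinPool(4) // parallelism 4, chosen only for the example
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val start = System.nanoTime()
    val f = Future {
      val activeInside = pool.getActiveThreadCount // an estimate, per the ForkJoinPool docs
      Thread.sleep(200)
      activeInside
    }.map { active =>
      // This callback runs as another task on the same pool, not on a new thread.
      ((System.nanoTime() - start) / 1000000L, active)
    }
    val (elapsedMs, active) = Await.result(f, 5.seconds)
    val poolSize = pool.getPoolSize // worker threads started and not yet terminated
    pool.shutdown()
    (elapsedMs, active, poolSize)
  }
}
```

For example, `val (ms, active, size) = PoolStats.run()` followed by a println of the three values shows how many workers the pool actually started for this workload.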
Second approach:
You can open a profiler (such as JProfiler, from ej-technologies, or jvisualvm, which was bundled with JDK 8 and earlier) and watch metadata, including thread parameters such as their count, activity, memory usage, etc.

How can I gather state information from a set of actors using only the actorSystem?

I'm creating an actor system, which has a list of actors representing some kind of session state.
These session are created by a factory actor (which might, in the future, get replaced by a router, if performance requires that - this should be transparent to the rest of the system, however).
Now I want to implement an operation where I get some state information from each of my currently existing session actors.
I have no explicit session list, as I want to rely on the actor system "owning" the sessions. I tried to use the actor system to look up the current session actors. The problem is that I did not find a "get all actor refs with this naming pattern" method. I tried to use the "/" operator on the system, followed by resolveOne - but got lost in a maze of future types.
The basic idea I had was:
- Send a message to all current session actors (as given to me by my ActorSystem).
- Wait for a response from them (preferably by just using the "ask" pattern; the method calling this broadcast request/response is only a monitoring/debugging method, so blocking is no problem here).
- And then collect the responses into a result.
After a death match against Scala's type system I had to give up for now.
Is there really no way of doing something like this?
If I understand the question correctly, then I can offer up a couple of ways you can accomplish this (though there are certainly others).
Option 1
In this approach, there will be an actor that is responsible for waking up periodically and sending a request to all session actors to get their current stats. That actor will use ActorSelection with a wildcard to accomplish that goal. A rough outline of the code for this approach is as follows:
import akka.actor._
import scala.concurrent.duration._
case class SessionStats(foo:Int, bar:Int)
case object GetSessionStats
class SessionActor extends Actor{
def receive = {
case GetSessionStats =>
println(s"${self.path} received a request to get stats")
sender() ! SessionStats(1, 2)
}
}
case object GatherStats
class SessionStatsGatherer extends Actor{
context.system.scheduler.schedule(5.seconds, 5.seconds, self, GatherStats)(context.dispatcher)
def receive = {
case GatherStats =>
println("Waking up to gather stats")
val sel = context.system.actorSelection("/user/session*")
sel ! GetSessionStats
case SessionStats(f, b) =>
println(s"got session stats from ${sender().path}, values are $f and $b")
}
}
Then you could test this code with the following:
val system = ActorSystem("test")
system.actorOf(Props[SessionActor], "session-1")
system.actorOf(Props[SessionActor], "session-2")
system.actorOf(Props[SessionStatsGatherer])
Thread.sleep(10000)
system.actorOf(Props[SessionActor], "session-3")
So with this approach, as long as we use a naming convention, we can use an actor selection with a wildcard to always find all of the session actors even though they are constantly coming (starting) and going (stopping).
Option 2
A somewhat similar approach, but in this one, we use a centralized actor to spawn the session actors and act as a supervisor to them. This central actor also contains the logic to periodically poll for stats, but since it's the parent, it does not need an ActorSelection and can instead just use its children list. That would look like this:
case object SpawnSession
class SessionsManager extends Actor{
context.system.scheduler.schedule(5 seconds, 5 seconds, self, GatherStats)(context.dispatcher)
var sessionCount = 1
def receive = {
case SpawnSession =>
val session = context.actorOf(Props[SessionActor], s"session-$sessionCount")
println(s"Spawned session: ${session.path}")
sessionCount += 1
sender ! session
case GatherStats =>
println("Waking up to get session stats")
context.children foreach (_ ! GetSessionStats)
case SessionStats(f, b) =>
println(s"got session stats from ${sender.path}, values are $f and $b")
}
}
And could be tested as follows:
val system = ActorSystem("test")
val manager = system.actorOf(Props[SessionsManager], "manager")
manager ! SpawnSession
manager ! SpawnSession
Thread.sleep(10000)
manager ! SpawnSession
Now, these examples are extremely trivialized, but hopefully they paint a picture of how you could go about solving this issue with either ActorSelection or a management/supervision dynamic. A bonus is that neither approach needs ask, and neither blocks.
There have been many additional changes in this project, so my answer/comments have been delayed quite a bit :-/
First, the session stats gathering should not be periodic, but on request. My original idea was to "misuse" the actor system as my map of all existing session actors, so that I would not need a supervisor actor knowing all sessions.
This goal proved elusive: session actors depend on shared state, so the session creator must watch sessions anyway.
This makes Option 2 the obvious answer here, since the session creator has to watch all children anyway.
The most vexing hurdle with option 1 was "how to determine when all (current) answers are there" - I wanted the statistics request to take a snapshot of all currently existing actor names, query them, ignore failures (if a session dies before it can be queried, it can be ignored here) - the statistics request is only a debugging tool, i.e. something like a "best effort".
The actor selection api tangled me up in a thicket of futures (I am a Scala/Akka newbie), so I gave up on this route.
Option 2 is therefore better suited to my needs.
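Building on Option 2, here is a sketch of the on-request, best-effort snapshot described above (the GetAllStats message and the 2-second per-session timeout are assumptions for illustration). The manager snapshots context.children, asks each child, turns any failure or timeout into None, and uses Future.sequence to know when all current answers are in; the combined list is piped back to whoever asked, so the manager itself never blocks.

```scala
import akka.actor.{Actor, Props}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

case class SessionStats(foo: Int, bar: Int)
case object GetSessionStats
case object SpawnSession
case object GetAllStats // hypothetical: "give me a snapshot of every session's stats"

class SessionActor extends Actor {
  def receive: Receive = {
    case GetSessionStats => sender() ! SessionStats(1, 2)
  }
}

class SessionsManager extends Actor {
  import context.dispatcher
  implicit val timeout: Timeout = Timeout(2.seconds) // per-session answer deadline
  private var sessionCount = 1

  def receive: Receive = {
    case SpawnSession =>
      val session = context.actorOf(Props[SessionActor], s"session-$sessionCount")
      sessionCount += 1
      sender() ! session

    case GetAllStats =>
      // Snapshot: only the sessions alive right now are queried.
      val snapshot = context.children.toList
      // Best effort: a session that dies or times out yields None instead of failing everything.
      val answers: List[Future[Option[SessionStats]]] = snapshot.map { child =>
        (child ? GetSessionStats)
          .mapTo[SessionStats]
          .map(Option(_))
          .recover { case _ => Option.empty[SessionStats] }
      }
      // Completes once every ask has either answered or timed out.
      Future.sequence(answers).map(_.flatten).pipeTo(sender())
  }
}
```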

How to run Akka

It seems there is no need for a class with a main method to run Akka (see "How to run akka actors in IntelliJ IDEA"). However, here is what I have:
object Application extends App {
val system = ActorSystem()
val supervisor = system.actorOf(Props[Supervisor])
implicit val timeout = Timeout(100 seconds)
import system.dispatcher
system.scheduler.schedule(1 seconds, 600 seconds) {
val future = supervisor ? Supervisor.Start
val list = Await.result(future, timeout.duration).asInstanceOf[List[Int]]
supervisor ! list
}
}
I know I have to specify akka.Main as the main class in the run configuration. But where should I move the current code from object Application?
You can write something like
import _root_.akka.Main
object Application extends App {
Main.main(Array("somepackage.Supervisor"))
}
and the Supervisor actor should have an overridden preStart function, as @cmbaxter suggested.
Then open the sbt console in IntelliJ and type run.
I agree with @kdrakon that your code is fine the way it is, but if you wanted to leverage the akka.Main functionality, then a simple refactor like so will make things work:
package code
import akka.actor.{Actor, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
class ApplicationActor extends Actor {
override def preStart(): Unit = {
val supervisor = context.actorOf(Props[Supervisor])
implicit val timeout: Timeout = Timeout(100.seconds)
import context.dispatcher
context.system.scheduler.schedule(1.second, 600.seconds) {
val future = (supervisor ? Supervisor.Start).mapTo[List[Int]]
val list = Await.result(future, timeout.duration)
supervisor ! list
}
}
def receive = {
case _ => //Not sure what to do here
}
}
In this case, the ApplicationActor is the arg you would pass to akka.Main, and it would basically be the root supervisor of all other actors created in your hierarchy. The only fishy thing here is that, being an Actor, it needs a receive implementation, and I don't imagine any other actors will be sending messages to it, so it doesn't really do anything. But the strength of this approach is that when the ApplicationActor is stopped, the stop cascades down to all other actors that it started, simplifying a graceful shutdown. I suppose you could have the ApplicationActor handle a message to shut down the actor system given some kind of input (maybe a ShutdownHookThread could initiate this) and give this actor some kind of purpose after all. Anyway, as stated earlier, your current approach seems fine, but this could also be an option if you so desire.
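A sketch of the "give this actor a purpose" idea, under a few assumptions: the Supervisor here is a hypothetical stub (the real one is not in the question), Tick and Shutdown are hypothetical message names, and scheduler.schedule is the Akka 2.3 to 2.5 signature (deprecated, but still available, in 2.6). Besides handling a shutdown message, it replaces the Await.result inside the scheduled block with ask plus pipeTo, so no thread is blocked while waiting for the supervisor's reply.

```scala
import akka.actor.{Actor, Cancellable, Props}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.duration._

// Hypothetical stub for the question's Supervisor, whose real code is not shown.
object Supervisor {
  case object Start
}
class Supervisor extends Actor {
  def receive: Receive = {
    case Supervisor.Start => sender() ! List(1, 2, 3) // placeholder data
    case list: List[_]    => // process the list here
  }
}

case object Tick     // hypothetical: "time to run the periodic job"
case object Shutdown // hypothetical: could be sent from a JVM shutdown hook

class ApplicationActor extends Actor {
  import context.dispatcher
  implicit val timeout: Timeout = Timeout(100.seconds)

  private val supervisor = context.actorOf(Props[Supervisor], "supervisor")
  // Same 1 s / 600 s schedule as before, but the scheduler only sends Tick to self.
  private val ticks: Cancellable =
    context.system.scheduler.schedule(1.second, 600.seconds, self, Tick)

  override def postStop(): Unit = ticks.cancel()

  def receive: Receive = {
    case Tick =>
      // Replaces Await.result: the reply is piped to the supervisor when it arrives.
      (supervisor ? Supervisor.Start).mapTo[List[Int]].pipeTo(supervisor)
    case Shutdown =>
      // Stops this actor and all of its children; under akka.Main the system then shuts down.
      context.stop(self)
  }
}
```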
EDIT
So if you wanted to run this ApplicationActor via akka.Main, according to the instructions here, you would execute this from your command prompt:
java -classpath <all those JARs> akka.Main code.ApplicationActor
You will of course need to supply <all those JARs> with your dependencies, including Akka. At a minimum you will need scala-library and akka-actor on your classpath to make this run.
If you refer to http://doc.akka.io/docs/akka/snapshot/scala/hello-world.html, you'll find that akka.Main expects your root/parent Actor; in your case, Supervisor. As for your existing code, it can be copied directly into the actor's code, possibly in some initialisation calls. For example, refer to the HelloWorld's preStart function.
However, in my opinion, your existing code is just fine too. akka.Main is a nice helper, as is the microkernel binary, but creating your own main executable is a viable option too.

Scala how to use akka actors to handle a timing out operation efficiently

I am currently evaluating JavaScript scripts using Rhino in a RESTful service, and I want each evaluation to time out.
I have created a mock example actor (using scala 2.10 akka actors).
case class Evaluate(expression: String)
class RhinoActor extends Actor {
override def preStart() = { println("Start context'"); super.preStart()}
def receive = {
case Evaluate(expression) ⇒ {
Thread.sleep(100)
sender ! "complete"
}
}
override def postStop() = { println("Stop context'"); super.postStop()}
}
Now I use this actor as follows:
def run {
val t = System.currentTimeMillis()
val system = ActorSystem("MySystem")
val actor = system.actorOf(Props[RhinoActor])
implicit val timeout = Timeout(50 milliseconds)
val future = (actor ? Evaluate("10 + 50")).mapTo[String]
val result = Try(Await.result(future, Duration.Inf))
println(System.currentTimeMillis() - t)
println(result)
actor ! PoisonPill
system.shutdown()
}
Is it wise to use the ActorSystem in a closure like this which may have simultaneous requests on it?
Should I make the ActorSystem global, and will that be ok in this context?
Is there a more appropriate alternative approach?
EDIT: I think I need to use futures directly, but I will need the preStart and postStop. Currently investigating.
EDIT: Seems you don't get those hooks with futures.
I'll try and answer some of your questions for you.
First, an ActorSystem is a very heavyweight construct. You should not create one per request that needs an actor. Create one globally and then use that single instance to spawn your actors (and you won't need system.shutdown() in run anymore). I believe this covers your first two questions.
Your approach of using an actor to execute JavaScript here seems sound to me. But instead of spinning up an actor per request, you might want to pool a bunch of RhinoActors behind a Router, with each instance having its own Rhino engine that is set up during preStart. Doing this eliminates per-request Rhino initialization costs, speeding up your JS evaluations. Just make sure you size your pool appropriately. Also, you won't need to send PoisonPill messages per request if you adopt this approach.
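A sketch of that layout: one ActorSystem for the whole service, and a RoundRobinPool of RhinoActors that each build their engine once in preStart. FakeEngine is a hypothetical placeholder for the real Rhino setup (the Context/Scope creation is not shown in the question), and the pool size of 4 and the names RhinoService and evaluators are arbitrary.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.routing.RoundRobinPool

case class Evaluate(expression: String)

// Hypothetical placeholder for a real Rhino Context/Scope, which is costly to create.
final class FakeEngine {
  def eval(expression: String): String = s"evaluated: $expression"
}

class RhinoActor extends Actor {
  private var engine: FakeEngine = null

  // Runs once per routee at start-up (and again after a restart), not once per request.
  override def preStart(): Unit = {
    engine = new FakeEngine
  }

  // A real implementation would release the engine's resources here.
  override def postStop(): Unit = {
    engine = null
  }

  def receive: Receive = {
    case Evaluate(expression) => sender() ! engine.eval(expression)
  }
}

// A single ActorSystem and a single router for the whole service, created on first use.
object RhinoService {
  lazy val system: ActorSystem = ActorSystem("rhino")

  // Four routees is an arbitrary example size; tune it to your load and core count.
  lazy val evaluators: ActorRef =
    system.actorOf(RoundRobinPool(4).props(Props[RhinoActor]), "evaluators")
}
```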
You also might want to look into the non-blocking callbacks onComplete, onSuccess and onFailure as opposed to using the blocking Await. These callbacks also respect timeouts and are preferable to blocking for higher throughput. As long as whatever is way way upstream waiting for this response can handle the asynchronicity (i.e. an async capable web request), then I suggest going this route.
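A sketch of a non-blocking version of run, reusing the question's 100 ms mock and 50 ms ask timeout (NonBlockingCaller and its callback parameter are hypothetical names). When the timeout expires, the future fails with akka.pattern.AskTimeoutException and the callback runs; the actor still finishes its 100 ms of work, and its late reply is simply discarded.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem}
import akka.pattern.{ask, AskTimeoutException}
import akka.util.Timeout
import scala.concurrent.duration._
import scala.util.{Failure, Success}

case class Evaluate(expression: String)

// Same mock as in the question: the "evaluation" takes 100 ms.
class SlowEvaluator extends Actor {
  def receive: Receive = {
    case Evaluate(_) =>
      Thread.sleep(100)
      sender() ! "complete"
  }
}

object NonBlockingCaller {
  // Returns immediately; `onResult` is invoked later on the system's dispatcher.
  def evaluate(evaluator: ActorRef, expr: String)(onResult: String => Unit)(implicit system: ActorSystem): Unit = {
    import system.dispatcher
    implicit val timeout: Timeout = Timeout(50.milliseconds) // shorter than the 100 ms mock
    (evaluator ? Evaluate(expr)).mapTo[String].onComplete {
      case Success(result)                 => onResult(result)
      case Failure(_: AskTimeoutException) => onResult("timed out")
      case Failure(other)                  => onResult(s"failed: ${other.getMessage}")
    }
  }
}
```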
The last thing to keep in mind is that even though code will return to the caller after the timeout if the actor has yet to respond, the actor still goes on processing that message (performing the evaluation). It does not stop and move onto the next message just because a caller timed out. Just wanted to make that clear in case it wasn't.
EDIT
In response to your comment about stopping a long execution, there are some things related to Akka to consider first. You can stop the actor, or send it a Kill or a PoisonPill, but none of these will stop it from processing the message it is currently processing; they only prevent it from processing further messages. In your case, with Rhino, if infinite script execution is a possibility, then I suggest handling this within Rhino itself. I would dig into the answers on this post (Stopping the Rhino Engine in middle of execution) and set up your Rhino engine in the actor in such a way that it stops itself if it has been executing for too long. That failure will kick out to the supervisor (if pooled) and cause that pooled instance to be restarted, which will init a new Rhino in preStart. This might be the best approach for dealing with the possibility of long-running scripts.
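A sketch of the restart part, making the behaviour explicit with a pool-level OneForOneStrategy so that only the routee that failed is restarted and its preStart runs again. ScriptTimeout is a hypothetical exception standing in for whatever the Rhino-level timeout raises, and the retry limit (10 per minute) is arbitrary.

```scala
import akka.actor.{Actor, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Restart
import akka.routing.RoundRobinPool
import scala.concurrent.duration._

// Hypothetical exception standing in for a script that ran for too long.
class ScriptTimeout(message: String) extends RuntimeException(message)

class Worker extends Actor {
  // On restart, the default postRestart calls preStart again: a fresh engine would be built here.
  override def preStart(): Unit = println(s"${self.path.name}: engine initialised")

  def receive: Receive = {
    case "runaway" => throw new ScriptTimeout("script ran too long") // simulated runaway script
    case other     => sender() ! s"ok: $other"
  }
}

object Pools {
  // Restart only the routee that failed, at most 10 times per minute.
  val restartFailed: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: ScriptTimeout => Restart
    }

  def props: Props = RoundRobinPool(4, supervisorStrategy = restartFailed).props(Props[Worker])
}
```

The other routees keep serving requests while the failed one restarts, so callers only see a failure (or ask timeout) for the script that actually ran away.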