Why doesn't Play framework close the Akka Stream?

An actor initializes an Akka stream which connects to a websocket. This is done using a Source.actorRef, to which messages can be sent; these are then processed by the webSocketClientFlow and consumed by a Sink.foreach. This can be seen in the following code (derived from the Akka docs):
class TestActor @Inject()(implicit ec: ExecutionContext) extends Actor with ActorLogging {

  final implicit val system: ActorSystem = ActorSystem()
  final implicit val materializer: ActorMaterializer = ActorMaterializer()

  def receive = {
    case _ =>
  }

  // Consume the incoming messages from the websocket.
  val incoming: Sink[Message, Future[Done]] =
    Sink.foreach[Message] {
      case message: TextMessage.Strict =>
        println(message.text)
      case misc => println(misc)
    }

  // Source through which we can send messages to the websocket.
  val outgoing: Source[TextMessage, ActorRef] =
    Source.actorRef[TextMessage.Strict](bufferSize = 10, OverflowStrategy.fail)

  // Flow to use (note: not re-usable!)
  val webSocketFlow = Http().webSocketClientFlow(WebSocketRequest("wss://ws-feed.gdax.com"))

  // Materialize the stream.
  val ((ws, upgradeResponse), closed) =
    outgoing
      .viaMat(webSocketFlow)(Keep.both)
      .toMat(incoming)(Keep.both) // also keep the Future[Done]
      .run()

  // Check whether the server has accepted the websocket request.
  val connected = upgradeResponse.flatMap { upgrade =>
    if (upgrade.response.status == StatusCodes.SwitchingProtocols) {
      Future.successful(Done)
    } else {
      throw new RuntimeException(s"Failed: ${upgrade.response.status}")
    }
  }

  // When the connection has been established.
  connected.onComplete(println)

  // When the stream has closed.
  closed.onComplete {
    case Success(_) => println("Test Websocket closed gracefully")
    case Failure(e) => log.error("Test Websocket closed with an error\n", e)
  }
}
When Play recompiles, it stops the TestActor but does not close the Akka stream. Only when the websocket times out is the stream closed.
Does this mean that I need to close the stream manually, for example by sending the actor created with Source.actorRef a PoisonPill from the TestActor's postStop method?
Note: I also tried to inject the Materializer and the ActorSystem, i.e.:
@Inject()(implicit ec: ExecutionContext, implicit val mat: Materializer, implicit val system: ActorSystem)
When Play recompiles, the stream is closed, but it also produces an error:
[error] a.a.ActorSystemImpl - Websocket handler failed with
Processor actor [Actor[akka://application/user/StreamSupervisor-62/flow-0-0-ignoreSink#989719582]]
terminated abruptly

In your first example, you're creating an actor system in your actor. You should not do that - actor systems are expensive; creating one means starting thread pools, starting schedulers, etc. Plus, you're never shutting it down, which means you've got a much bigger problem than the stream not shutting down: you have a resource leak, because the thread pools created by the actor system are never shut down.
So essentially, every time you receive a WebSocket connection, you're creating a new actor system with a new set of thread pools, and you're never shutting them down. In production, with even a small load (a few requests per second), your application is going to run out of memory within a few minutes.
In general in Play, you should never create your own actor system; have one injected instead. From within an actor you don't even need to have it injected, because it automatically is - context.system gives you access to the actor system that created the actor. Likewise with materializers: they aren't as heavyweight, but if you create one per connection and never shut it down, you could also run out of memory, so you should have the materializer injected too.
So when you do have it injected, you get an error - this is hard to avoid, though not impossible. The difficulty is that Akka itself can't really know automatically what order things need to be shut down in to close everything gracefully: should it stop your actor first, so that it can shut the streams down gracefully, or should it shut the streams down first, so that they can notify your actor that they are shut down and it can respond accordingly?
Akka 2.5 has a solution for this: a coordinated shutdown sequence, in which you can register things to be shut down before the actor system starts killing things in a somewhat arbitrary order:
https://doc.akka.io/docs/akka/2.5/scala/actors.html#coordinated-shutdown
You can use this in combination with Akka Streams kill switches to shut down your streams gracefully before the rest of the application is shut down.
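As a rough sketch of that combination (hedged: the stream below uses a stand-in tick source rather than the WebSocket flow from the question, and the phase and task names are illustrative):
import akka.Done
import akka.actor.{ActorSystem, CoordinatedShutdown}
import akka.stream.{ActorMaterializer, KillSwitches, UniqueKillSwitch}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("example") // in Play, use the injected system
implicit val materializer: ActorMaterializer = ActorMaterializer()

// Insert a kill switch so the stream can be completed on demand.
val (killSwitch: UniqueKillSwitch, done: Future[Done]) =
  Source.tick(0.seconds, 1.second, "tick")
    .viaMat(KillSwitches.single)(Keep.right)
    .toMat(Sink.foreach(println))(Keep.both)
    .run()

// Register a task that runs before the actor system is terminated.
CoordinatedShutdown(system).addTask(
  CoordinatedShutdown.PhaseServiceUnbind, "close-stream") { () =>
  killSwitch.shutdown() // completes the stream gracefully
  done                  // coordinated shutdown waits for this Future[Done]
}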
But generally, the shutdown errors are fairly benign, so if it were me I wouldn't worry about them.


Future[Source] pipeTo an Actor

There are two local actors (remoting is not used). The actors have been simplified for the example:
class ProcessorActor extends Actor {
  override def receive: Receive = {
    case src: Source[Int, NotUsed] =>
      // TODO processing of `src` here
  }
}

class FrontendActor extends Actor {
  val processor = context.system.actorOf(Props[ProcessorActor])
  ...
  override def receive: Receive = {
    case "Hello" =>
      val f: Future[Source[Int, NotUsed]] = Future(Source(1 to 100))
      f pipeTo processor
  }
}

// entry point:
val frontend = system.actorOf(Props[FrontendActor])
frontend ! "Hello"
Thus FrontendActor sends a Source to ProcessorActor. In the above example it works successfully.
Is such an approach okay?
It's unclear what your concern is.
Sending a Source from one actor to another actor on the same JVM is fine. Because inter-actor communication on the same JVM, as the documentation states, "is simply done via reference passing," there is nothing unusual about your example.1 Essentially what is going on is that a reference to a Source is passed to ProcessorActor once the Future is completed. A Source is an object that defines part of a stream; you can send a Source from one actor to another actor locally just as you can any JVM object.
(However, once you cross the boundary of a single JVM, you have to deal with serialization.)
1 A minor, tangential observation: FrontendActor calls context.system.actorOf(Props[ProcessorActor]), which creates a top-level actor. Typically, top-level actors are created in the main program, not within an actor.
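For illustration (a minor, optional change, not needed for the example to work), the processor could instead be created as a child of FrontendActor:
// Inside FrontendActor: create ProcessorActor as a child rather than a top-level actor.
val processor = context.actorOf(Props[ProcessorActor], "processor")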
Yes, this is OK, but it does not work quite the way you describe it: FrontendActor does not send a Future[Source], it just sends the Source.
From the docs:
pipeTo installs an onComplete-handler on the future to affect the submission of the result to another actor.
In other words, pipeTo means "send the result of this Future to the actor when it becomes available".
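Roughly speaking, the pattern behaves like the following sketch (simplified, not the library's exact implementation; it assumes an ExecutionContext such as the actor's dispatcher is in scope):
import akka.actor.Status
import scala.util.{Failure, Success}
import context.dispatcher // an ExecutionContext is needed for onComplete

// Inside FrontendActor, `f pipeTo processor` is approximately equivalent to:
f.onComplete {
  case Success(source) => processor ! source            // the Source itself is sent
  case Failure(t)      => processor ! Status.Failure(t) // failures arrive as Status.Failure
}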
Note that this will work even if remoting is being used because the Future is resolved locally and is not sent over the wire to a remote actor.

Akka Stream from within a Spark Job to write into kafka

Wanting to write data back into Kafka as efficiently as possible, I am interested in using Akka Streams to write my RDD partitions back into Kafka.
The problem is that I need a way to create an actor system per executor and not per partition, which would be ridiculous: one could end up with eight actor systems on one node in one JVM. Having a stream per partition, however, is fine.
Has anyone already done that?
My understanding is that an actor system can't be serialized and hence can't be sent as a broadcast variable, which would be per executor.
If anyone has experience figuring out and testing a solution to this, would you please share?
Otherwise I can always fall back to https://index.scala-lang.org/benfradet/spark-kafka-writer/spark-kafka-0-10-writer/0.3.0?target=_2.11, but I am not sure it is the most efficient way.
You can always define a global lazy val with an actor system:
object Execution {
  implicit lazy val actorSystem: ActorSystem = ActorSystem()
  implicit lazy val materializer: Materializer = ActorMaterializer()
}
Then you just import it in any of the classes where you want to use Akka Streams:
import Execution._

val stream: DStream[...] = ...

stream.foreachRDD { rdd =>
  ...
  rdd.foreachPartition { records =>
    val (queue, done) = Source.queue(...)
      .via(Producer.flow(...))
      .toMat(Sink.ignore)(Keep.both)
      .run() // implicitly pulls `Execution.materializer` from scope,
             // which in turn will initialize `Execution.actorSystem`

    ... // push records to the queue

    // wait until the stream is completed
    Await.result(done, 10.minutes)
  }
}
The above is kind of pseudocode but I think it should convey the general idea.
This way the system is going to be initialized on every executor JVM only once, when it is needed. Additionally, you can make the actor system "daemonic" so that it shuts down automatically when the JVM exits:
object Execution {
  private lazy val config = ConfigFactory.parseString("akka.daemonic = on")
    .withFallback(ConfigFactory.load())

  implicit lazy val actorSystem: ActorSystem = ActorSystem("system", config)
  implicit lazy val materializer: Materializer = ActorMaterializer()
}
We're doing this in our Spark jobs and it works flawlessly.
This works without any kind of broadcast variables and, naturally, can be used in all kinds of Spark jobs, streaming or otherwise. Because the system is defined in a singleton object, it is guaranteed to be initialized only once per JVM instance (modulo various classloader shenanigans, but it doesn't really matter in the context of Spark), so even if some of the partitions get placed onto the same JVM (maybe in different threads), the actor system is only initialized once. lazy val ensures thread-safe initialization, and ActorSystem itself is thread-safe, so this won't cause problems in that regard either.
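For reference, a slightly more concrete (and hedged) version of the per-partition body might look like the sketch below. It substitutes a println sink for the Kafka producer flow, since the exact producer API depends on the akka-stream-kafka version in use, and the helper name writePartition is made up for illustration:
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

import Execution._ // the singleton system/materializer defined above

def writePartition(records: Iterator[String]): Unit = {
  implicit val ec = actorSystem.dispatcher

  // A queue-backed source lets us push the partition's records into a running stream.
  val (queue, done) = Source.queue[String](1000, OverflowStrategy.backpressure)
    .toMat(Sink.foreach(println))(Keep.both) // stand-in for the Kafka producer flow
    .run()

  // Offer every record to the queue in order, then signal completion.
  val pushed: Future[Unit] =
    records.foldLeft(Future.successful(())) { (acc, record) =>
      acc.flatMap(_ => queue.offer(record).map(_ => ()))
    }
  pushed.onComplete(_ => queue.complete())

  // Block until the stream has drained so Spark does not finish the task early.
  Await.result(done, 10.minutes)
}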

Akka/Scala detect closed TCP connection?

I'm trying to write a TCP server using Akka and Scala which will instantiate an actor when a client connects and stop it when the client disconnects. I have a TCP binding actor:
class Server extends Actor {
  import Tcp._
  import context.system

  IO(Tcp) ! Bind(self, new InetSocketAddress("localhost", 9595))

  def receive = {
    case Bound(localAddress) =>
      println("Server Bound")
      println(localAddress)
    case CommandFailed(_: Bind) => context stop self
    case Connected(remote, local) =>
      val handler = context.system.actorOf(Props[ConnHandler])
      val connection = sender()
      connection ! Register(handler)
  }
}
The above instantiates a TCP listener on localhost:9595 and registers a handler actor for each connection.
I then have the receive def in my ConnHandler class defined as follows, abbreviated with ... where the code behaves correctly:
case received => {...}
case PeerClosed => {
  println("Stopping")
  // Other actor stopping, cleanup.
  context stop self
}
(See http://doc.akka.io/docs/akka/snapshot/scala/io-tcp.html for the docs I used to write this - it uses more or less the same code in the PeerClosed case)
However, when I close the socket client-side, the actor remains running, and no "Stopping" message is printed.
I believe this has to do with my running on Windows: after googling around, I found a still-open bug - https://github.com/akka/akka/issues/17122 - which refers to some Close events being missed on Windows-based systems. I have no non-Windows machines nearby configured to test this on.
Have I made a silly error in my code, or would this be part of the bug linked above?
I could write a case for Received(data) that closes the connection; however, a disconnect caused by a network failure or the like would leave the server in an unrecoverable state, requiring the application to be restarted: it would leave a secondary, shared actor in a state that says the client is still connected, and the server would therefore reject further connections from that user.
Edit:
I've worked around this by adding a watchdog timer actor with a periodic action that fires after some amount of time. The ConnHandler actor resets the watchdog timer every time an event happens on the connection. Although not ideal, it does what I want.
Edit 12/9/2016:
ConnHandler will receive a PeerClosed message even when a client unexpectedly disconnects.
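As an aside, a hedged sketch of one way to get this kind of per-connection watchdog without a separate timer actor is Akka's built-in receive timeout (this is not the author's actual workaround, and the 60-second limit is an arbitrary choice):
import akka.actor.{Actor, ReceiveTimeout}
import akka.io.Tcp._
import scala.concurrent.duration._

class ConnHandler extends Actor {
  // If nothing arrives on this connection for 60 seconds, assume the peer is gone.
  context.setReceiveTimeout(60.seconds)

  def receive: Receive = {
    case Received(data) =>
      // ... handle data; every message resets the receive timeout
    case PeerClosed | ReceiveTimeout =>
      println("Stopping")
      context.stop(self)
  }
}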

Lots of Akka Threads in Tomcat

I'm hosting my Scala (2.11) WAR inside Tomcat 8 and am using Akka 2.3.+ and spray-client 1.3.3. I am using Akka for only one actor, and I dispatch a call to it once when Tomcat starts. It can also be called manually.
class RefreshDataActor extends Actor with ActorLogging {
  override def receive: Receive = {
    case _ =>
      implicit val timeout: Timeout = someTimeout
      implicit val ec = this.context.dispatcher

      val pipeline = sendReceive ~> unmarshal[Data]
      pipeline(Get(fileUrl))
        .onComplete {
          case Success(data) =>
            // Do stuff with the data
          case Failure(ex) =>
            this.log.error("Unable to find the latest version of the data!", ex)
        }
  }
}
Whenever any call comes into Tomcat, I see a spike in the number of live threads it hosts.
The arrows indicate when calls were made to the server. Also note that CPU usage slowly rises too, and at some point the machine ends up working very hard over what seems to be nothing (the machine in the screencap has 8 cores).
I started debugging the issue by connecting VisualVM to one of the machines that is affected by this. The threads that remain alive are named default-akka.actor.default-dispatcher-X (where X is usually any number between 2 and 7), all of which are WAITING and default-scheduler-1 (TIMED_WAITING). There are HUNDREDS of them. There's also a single default-akka.io.pinned-dispatcher-5 (RUNNABLE).
I'm assuming this has something to do with how Akka works, but I don't understand why.
Found the issue: I was calling ActorSystem() more than once, which created a new actor system each time rather than reusing the already created one. This caused more and more systems to be created and, for some reason, never cleaned up.
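A minimal sketch of the fix (the names are illustrative, not the asker's actual code): hold the system in a singleton and reuse it everywhere instead of calling ActorSystem() again.
import akka.actor.{ActorRef, ActorSystem, Props}

// Created exactly once per JVM; every caller reuses these.
object AkkaHolder {
  lazy val system: ActorSystem = ActorSystem("tomcat-app")
  lazy val refreshDataActor: ActorRef =
    system.actorOf(Props[RefreshDataActor], "refresh-data")
}

// Callers then do AkkaHolder.refreshDataActor ! msg rather than building a new ActorSystem.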

Scala how to use akka actors to handle a timing out operation efficiently

I am currently evaluating JavaScript scripts using Rhino in a RESTful service. I want evaluation to time out.
I have created a mock example actor (using Scala 2.10 Akka actors).
case class Evaluate(expression: String)

class RhinoActor extends Actor {

  override def preStart() = { println("Start context'"); super.preStart() }

  def receive = {
    case Evaluate(expression) =>
      Thread.sleep(100)
      sender ! "complete"
  }

  override def postStop() = { println("Stop context'"); super.postStop() }
}
Now I use this actor as follows:
def run {
  val t = System.currentTimeMillis()

  val system = ActorSystem("MySystem")
  val actor = system.actorOf(Props[RhinoActor])

  implicit val timeout = Timeout(50 milliseconds)
  val future = (actor ? Evaluate("10 + 50")).mapTo[String]
  val result = Try(Await.result(future, Duration.Inf))

  println(System.currentTimeMillis() - t)
  println(result)

  actor ! PoisonPill
  system.shutdown()
}
Is it wise to use the ActorSystem in a closure like this which may have simultaneous requests on it?
Should I make the ActorSystem global, and will that be ok in this context?
Is there a more appropriate alternative approach?
EDIT: I think I need to use futures directly, but I will need the preStart and postStop. Currently investigating.
EDIT: Seems you don't get those hooks with futures.
I'll try and answer some of your questions for you.
First, an ActorSystem is a very heavyweight construct. You should not create one per request that needs an actor. You should create one globally and then use that single instance to spawn your actors (and you won't need system.shutdown() anymore in run). I believe this covers your first two questions.
Your approach of using an actor to execute JavaScript here seems sound to me. But instead of spinning up an actor per request, you might want to pool a bunch of RhinoActors behind a Router, with each instance having its own Rhino engine that is set up during preStart. Doing this will eliminate per-request Rhino initialization costs, speeding up your JS evaluations. Just make sure you size the pool appropriately. Also, you won't need to send PoisonPill messages per request if you adopt this approach.
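A hedged sketch of that pooling idea (assuming Akka 2.3's RoundRobinPool; older versions use RoundRobinRouter instead, and the pool size of 8 and the names are arbitrary):
import akka.actor.{ActorSystem, Props}
import akka.routing.RoundRobinPool

val system = ActorSystem("MySystem") // created once, globally

// A pool of evaluator actors; each instance sets up its own Rhino engine in preStart.
val evaluatorPool =
  system.actorOf(RoundRobinPool(8).props(Props[RhinoActor]), "rhino-pool")

// Requests are then load-balanced across the pool:
// evaluatorPool ? Evaluate("10 + 50")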
You also might want to look into the non-blocking callbacks onComplete, onSuccess, and onFailure, as opposed to the blocking Await. These callbacks also respect timeouts and are preferable to blocking for higher throughput. As long as whatever is far upstream waiting for this response can handle the asynchronicity (i.e. an async-capable web request), I suggest going this route.
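For instance, a non-blocking variant of the call in run might look roughly like this (a sketch; the timeout value and the evaluatorPool/system references follow the snippets above):
import akka.pattern.{ask, AskTimeoutException}
import akka.util.Timeout
import scala.concurrent.duration._
import scala.util.{Failure, Success}

implicit val timeout = Timeout(50.milliseconds)
implicit val ec = system.dispatcher

(evaluatorPool ? Evaluate("10 + 50")).mapTo[String].onComplete {
  case Success(result)                 => println(result)                  // finished in time
  case Failure(_: AskTimeoutException) => println("evaluation timed out")  // ask timeout hit
  case Failure(other)                  => println(s"evaluation failed: $other")
}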
The last thing to keep in mind is that even though code will return to the caller after the timeout if the actor has yet to respond, the actor still goes on processing that message (performing the evaluation). It does not stop and move onto the next message just because a caller timed out. Just wanted to make that clear in case it wasn't.
EDIT
In response to your comment about stopping a long execution, there are some things related to Akka to consider first. You can stop the actor, or send it a Kill or a PoisonPill, but none of these will stop it from processing the message that it is currently processing; they just prevent it from receiving new messages. In your case, with Rhino, if infinite script execution is a possibility, then I suggest handling this within Rhino itself. I would dig into the answers on this post (Stopping the Rhino Engine in middle of execution) and set up your Rhino engine in the actor in such a way that it will stop itself if it has been executing for too long. That failure will kick out to the supervisor (if pooled) and cause that pooled instance to be restarted, which will init a new Rhino in preStart. This might be the best approach for dealing with the possibility of long-running scripts.