I'm using Akka to dynamically create actors and destroy them when they're finished with a particular job. I've got a handle on actor creation; however, stopping the actors keeps them in memory regardless of how I terminate them. Eventually this causes an out-of-memory error, despite the fact that I should only have a handful of active actors at any given time.
I've used:
self.tell(PoisonPill, self)
and:
context.stop(self)
to try and destroy the actors. Any ideas?
Edit: Here's a bit more to flesh out what I'm trying to do. The program opens up and spawns ten actors.
import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
import scala.collection.mutable.ListBuffer
import scala.concurrent.duration._
val system = ActorSystem("system")
(1 to 10) foreach { x =>
Entity.count += 1
system.actorOf(Props[Entity], name = Entity.count.toString())
}
Here's the code for the Entity:
class Entity () extends Actor {
Entity.entities += this
val id = Entity.count
import context.dispatcher
val tick = context.system.scheduler.schedule(0 millis, 100 millis, self, "update")
def receive = {
case "update" => {
Entity.entities.foreach(that => collide(that))
}
}
override def postStop() = tick.cancel()
def collide(that: Entity): Unit = {
if (!this.isBetterThan(that)) {
destroyMe()
spawnNew()
}
}
def isBetterThan(that: Entity): Boolean = {
//computationally intensive logic
}
private def destroyMe(){
Entity.entities.remove(Entity.entities.indexOf(this))
self.tell(PoisonPill, self)
//context.stop(self)
}
private def spawnNew(){
val system = ActorSystem("system")
Entity.count += 1
system.actorOf(Props[Entity], name = Entity.count.toString())
}
}
object Entity {
val entities = new ListBuffer[Entity]()
var count = 0
}
Thanks @AmigoNico, you pointed me in the right direction. It turns out that neither
self.tell(PoisonPill, self)
nor
context.stop(self)
worked for timely Actor disposal; I switched the line to:
system.stop(self)
and everything works as expected.
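For completeness, here is a minimal sketch of what destroyMe/spawnNew can look like after the fix (a sketch, not my exact final code): the important parts are stopping the actor through the system that owns it and reusing that same system via context.system, rather than constructing a brand-new ActorSystem inside spawnNew, since every extra ActorSystem keeps its own threads and actors alive.
private def destroyMe(): Unit = {
  // remove the entity from the shared list, then ask the owning system to stop the actor
  Entity.entities -= this
  context.system.stop(self)
}
private def spawnNew(): Unit = {
  // reuse the existing system instead of creating a new ActorSystem per spawn
  Entity.count += 1
  context.system.actorOf(Props[Entity], name = Entity.count.toString)
}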
Related
I am trying to continuously read the Wikipedia IRC channel using this library: https://github.com/implydata/wikiticker
I created a custom Akka Publisher, which will be used in my system as a Source.
Here are some of my classes:
import akka.actor.Actor
import akka.stream.actor.ActorPublisher
import akka.stream.actor.ActorPublisherMessage.{Cancel, Request}
import IrcPublisher._
class IrcPublisher() extends ActorPublisher[String] {
import scala.collection._
var queue: mutable.Queue[String] = mutable.Queue()
override def receive: Actor.Receive = {
case Publish(s) =>
println(s"->MSG, isActive = $isActive, totalDemand = $totalDemand")
queue.enqueue(s)
publishIfNeeded()
case Request(cnt) =>
println("Request: " + cnt)
publishIfNeeded()
case Cancel =>
println("Cancel")
context.stop(self)
case _ =>
println("Hm...")
}
def publishIfNeeded(): Unit = {
while (queue.nonEmpty && isActive && totalDemand > 0) {
println("onNext")
onNext(queue.dequeue())
}
}
}
object IrcPublisher {
case class Publish(data: String)
}
I am creating all this objects like so:
def createSource(wikipedias: Seq[String]) = {
val dataPublisherRef = system.actorOf(Props[IrcPublisher])
val dataPublisher = ActorPublisher[String](dataPublisherRef)
val listener = new MessageListener {
override def process(message: Message) = {
dataPublisherRef ! Publish(Jackson.generate(message.toMap))
}
}
val ticker = new IrcTicker(
"irc.wikimedia.org",
"imply",
wikipedias map (x => s"#$x.wikipedia"),
Seq(listener)
)
ticker.start() // if I comment this...
Thread.currentThread().join() //... and this I get Request(...)
Source.fromPublisher(dataPublisher)
}
So the problem I am facing is with this Source object. Although this setup works well with other sources (for example, reading from a local file), the ActorPublisher never receives Request() messages.
If I comment out the two marked lines, I can see that my actor receives the Request(count) message from my flow. Otherwise all messages are pushed into the queue but never reach my flow (I only see the MSG lines printed).
I think it is some multithreading/synchronization issue.
I am not familiar enough with wikiticker to solve your problem as given. One question I would have is: why is it necessary to join the current thread?
However, I think you have overcomplicated the usage of Source. It would be easier to work with the stream as a whole rather than create a custom ActorPublisher.
You can use Source.actorRef to materialize a stream into an ActorRef and work with that ActorRef. This lets Akka handle the enqueueing/dequeueing onto the buffer while you focus on the "business logic".
Say, for example, your entire stream is only to filter lines above a certain length and print them to the console. This could be accomplished with:
def dispatchIRCMessages(actorRef : ActorRef) = {
val ticker =
new IrcTicker("irc.wikimedia.org",
"imply",
wikipedias map (x => s"#$x.wikipedia"),
Seq(new MessageListener {
override def process(message: Message) =
actorRef ! Jackson.generate(message.toMap) // send the String itself; the Source below emits String elements
}))
ticker.start()
Thread.currentThread().join()
}
//these variables control the buffer behavior
val bufferSize = 1024
val overFlowStrategy = akka.stream.OverflowStrategy.dropHead
val minMessageSize = 32
//no need for a custom Publisher/Queue
val streamRef =
Source.actorRef[String](bufferSize, overFlowStrategy)
.via(Flow[String].filter(_.size > minMessageSize))
.to(Sink.foreach[String](println))
.run()
dispatchIRCMessages(streamRef)
The dispatchIRCMessages method has the added benefit that it works with any ActorRef, so you aren't required to work only with streams/publishers.
Hopefully this solves your underlying problem...
I think the main problem is Thread.currentThread().join(). This line hangs the current thread, because that thread ends up waiting for itself to die. Please read https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#join-long- .
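To illustrate (a sketch only, reusing the IrcTicker, MessageListener, IrcPublisher, and Publish types from the question): the ticker runs on its own threads, so createSource can simply start it and return the Source without joining the current thread.
def createSource(wikipedias: Seq[String]) = {
  val dataPublisherRef = system.actorOf(Props[IrcPublisher])
  val dataPublisher = ActorPublisher[String](dataPublisherRef)
  val listener = new MessageListener {
    override def process(message: Message) =
      dataPublisherRef ! Publish(Jackson.generate(message.toMap))
  }
  val ticker = new IrcTicker(
    "irc.wikimedia.org",
    "imply",
    wikipedias map (x => s"#$x.wikipedia"),
    Seq(listener))
  ticker.start() // starts background threads; no need to block this one
  Source.fromPublisher(dataPublisher) // returned immediately so the downstream flow can signal demand
}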
Problem Statement
Assume I have a file with sentences that is processed line by line. In my case, I need to extract Named Entities (Persons, Organizations, ...) from these lines. Unfortunately, the tagger is quite slow. Therefore, I decided to parallelize the computation, such that lines can be processed independently of each other and the result is collected in a central location.
Current Approach
My current approach uses a single-producer, multiple-consumer concept. I'm relatively new to Akka, but I think my problem description fits its capabilities well. Let me show you some code:
Producer
The Producer reads the file line by line and sends each line to the Consumers. Once the number of processed lines reaches the total number of lines sent, it propagates the collected result back to WordCount.
class Producer(consumers: ActorRef) extends Actor with ActorLogging {
var master: Option[ActorRef] = None
var result = immutable.List[String]()
var totalLines = 0
var linesProcessed = 0
override def receive = {
case StartProcessing() => {
master = Some(sender)
Source.fromFile("sent.txt", "utf-8").getLines.foreach { line =>
consumers ! Sentence(line)
totalLines += 1
}
context.stop(self)
}
case SentenceProcessed(list) => {
linesProcessed += 1
result :::= list
//If we are done, we can propagate the result to the creator
if (linesProcessed == totalLines) {
master.map(_ ! result)
}
}
case _ => log.error("message not recognized")
}
}
Consumer
class Consumer extends Actor with ActorLogging {
def tokenize(line: String): Seq[String] = {
line.split(" ").map(_.toLowerCase)
}
override def receive = {
case Sentence(sent) => {
//Assume: This is representative for the extensive computation method
val tokens = tokenize(sent)
sender() ! SentenceProcessed(tokens.toList)
}
case _ => log.error("message not recognized")
}
}
WordCount (Master)
class WordCount extends Actor {
val consumers = context.actorOf(Props[Consumer].
withRouter(FromConfig()).
withDispatcher("consumer-dispatcher"), "consumers")
val producer = context.actorOf(Props(new Producer(consumers)), "producer")
context.watch(consumers)
context.watch(producer)
def receive = {
case Terminated(`producer`) => consumers ! Broadcast(PoisonPill)
case Terminated(`consumers`) => context.system.shutdown
}
}
object WordCount {
def getActor() = new WordCount
def getConfig(routerType: String, dispatcherType: String)(numConsumers: Int) = s"""
akka.actor.deployment {
/WordCount/consumers {
router = $routerType
nr-of-instances = $numConsumers
dispatcher = consumer-dispatcher
}
}
consumer-dispatcher {
type = $dispatcherType
executor = "fork-join-executor"
}"""
}
The WordCount actor is responsible for creating the other actors. When the Consumers are finished, the Producer sends a message with all tokens. But how do I propagate that message further, and how do I accept it and wait for it? The architecture with the third WordCount actor might be wrong.
Main Routine
case class Run(name: String, actor: () => Actor, config: (Int) => String)
object Main extends App {
val run = Run("push_implementation", WordCount.getActor _, WordCount.getConfig("balancing-pool", "Dispatcher") _)
def execute(run: Run, numConsumers: Int) = {
val config = ConfigFactory.parseString(run.config(numConsumers))
val system = ActorSystem("Counting", ConfigFactory.load(config))
val startTime = System.currentTimeMillis
system.actorOf(Props(run.actor()), "WordCount")
/*
How to get the result here?!
*/
system.awaitTermination
System.currentTimeMillis - startTime
}
execute(run, 4)
}
Problem
As you can see, the actual problem is propagating the result back to the Main routine. Can you tell me how to do this in a proper way? The question is also how to wait for the result until the consumers are finished. I had a brief look at the Akka Futures documentation, but the whole system is a little bit overwhelming for a beginner. Something like val future = actor ? message seems suitable, but I am not sure how to do this. Also, using the WordCount actor adds complexity; maybe it is possible to come up with a solution that doesn't need this actor?
Consider using the Akka Aggregator Pattern. That takes care of the low-level primitives (watching actors, poison pill, etc). You can focus on managing state.
Your call to system.actorOf() returns an ActorRef, but you're not using it. You should ask that actor for results. Something like this:
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
implicit val timeout = Timeout(5 seconds)
val wCount = system.actorOf(Props(run.actor()), "WordCount")
val answer = Await.result(wCount ? "sent.txt", timeout.duration)
This means your WordCount class needs a receive method that accepts a String message. That section of code should aggregate the results and tell the sender(), like this:
class WordCount extends Actor {
def receive: Receive = {
case filename: String =>
// do all of your code here, using filename
sender() ! results
}
}
Also, rather than blocking on the results with Await above, you can apply some techniques for handling Futures.
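For instance, a non-blocking variant could look roughly like this (a sketch, assuming the WordCount actor replies with a List[String] and that the system should shut down once the result arrives):
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import scala.util.{Failure, Success}

implicit val timeout: Timeout = Timeout(30.seconds)
import system.dispatcher // ExecutionContext for the callback

(wCount ? "sent.txt").mapTo[List[String]].onComplete {
  case Success(tokens) =>
    println(s"Got ${tokens.size} tokens")
    system.shutdown() // awaitTermination in Main will then return
  case Failure(ex) =>
    println(s"Word counting failed: $ex")
    system.shutdown()
}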
I have a pool of futures, and each future works with the same Akka actor system. Some actors in the system should be global; some are used only in one future.
val longFutures = for (i <- 0 until 2 ) yield Future {
val p:Page = PhantomExecutor(isDebug=true)
Await.result(p.open("http://www.stackoverflow.com/"), 10.seconds)
}
PhantomExecutor tries to use one shared global actor (a simple increment counter) using system.actorSelection:
def selectActor[T <: Actor : ClassTag](system:ActorSystem,name:String) = {
val timeout = Timeout(0.1 seconds)
val myFutureStuff = system.actorSelection("akka://"+system.name+"/user/"+name)
val aid:ActorIdentity = Await.result(myFutureStuff.ask(Identify(1))(timeout).mapTo[ActorIdentity],
0.1 seconds)
aid.ref match {
case Some(cacher) =>
cacher
case None =>
system.actorOf(Props[T],name)
}
}
But in a concurrent environment this approach does not work because of a race condition.
I know only one solution for this problem: create the global actors before splitting into futures. But this means that I can't encapsulate a lot of hidden work from the top-level library user.
You're right that making sure the global actors are initialized first is the correct approach. Can't you tie them to a companion object and reference them from there, so you know they will only ever be initialized once? If you really can't go with such an approach, then you could try something like the following to look up or create the actor. It is similar to your code, but it includes logic to retry the lookup/create (recursively) if the race condition is hit, up to a maximum number of attempts:
def findOrCreateActor[T <: Actor : ClassTag](system:ActorSystem, name:String, maxAttempts:Int = 5):ActorRef = {
import system.dispatcher
val timeout = 0.1 seconds
def doFindOrCreate(depth:Int = 0):ActorRef = {
if (depth >= maxAttempts)
throw new RuntimeException(s"Can not create actor with name $name and reached max attempts of $maxAttempts")
val selection = system.actorSelection(s"/user/$name")
val fut = selection.resolveOne(timeout).map(Some(_)).recover{
case ex:ActorNotFound => None
}
val refOpt = Await.result(fut, timeout)
refOpt match {
case Some(ref) => ref
case None => util.Try(system.actorOf(Props[T],name)).getOrElse(doFindOrCreate(depth + 1))
}
}
doFindOrCreate()
}
Now the retry logic would fire for any exception when creating the actor, so you might want to further specify that (probably via another recover combinator) to only recurse when it gets an InvalidActorNameException, but you get the idea.
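For instance, a small helper could narrow the retry to the name clash only (createOrRetry is my name, purely for illustration; any other failure still propagates):
import akka.actor.{Actor, ActorRef, ActorSystem, InvalidActorNameException, Props}
import scala.reflect.ClassTag

// retry only when another caller won the race for this actor name
def createOrRetry[T <: Actor : ClassTag](system: ActorSystem, name: String)(retry: => ActorRef): ActorRef =
  try system.actorOf(Props[T], name)
  catch { case _: InvalidActorNameException => retry }
Inside doFindOrCreate, the None case then becomes: case None => createOrRetry[T](system, name)(doFindOrCreate(depth + 1)).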
You may want to consider creating a manager actor that takes care of creating the "counter" actors. This way you ensure that counter-actor creation requests are serialized.
object CounterManagerActor {
case class SelectActorRequest(name : String)
case class SelectActorResponse(name : String, actorRef : ActorRef)
}
class CounterManagerActor extends Actor {
import CounterManagerActor._
def receive = {
case SelectActorRequest(name) => {
sender() ! SelectActorResponse(name, selectActor(name))
}
}
private def selectActor(name : String) = {
// a slightly modified version of the original selectActor() method
???
}
}
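A caller could then obtain the shared actor through the manager with an ask, roughly like this (a sketch; counterManager stands for the single CounterManagerActor created before the futures are spawned):
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
import CounterManagerActor._

implicit val timeout: Timeout = Timeout(1.second)

// ask the manager for the named counter actor and wait for its reply
val response = Await.result(
  (counterManager ? SelectActorRequest("counter")).mapTo[SelectActorResponse],
  timeout.duration)
val counterRef = response.actorRef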
I need an actor to stop one of its children so that I can possibly create a new actor with the same name (a UUID?).
I've got an ActorSystem with one child actor, and this child creates new actors with context.actorOf and context.watch. When I try to stop one of these using context.stop, I observe that its postStop method is called as expected, but no matter how long I wait (seconds... minutes...), the Terminated message is never sent back to its creator (and watcher) actor.
I read this in the Akka documentation:
Since stopping an actor is asynchronous, you cannot immediately reuse the name of the child you just stopped; this will result in an InvalidActorNameException. Instead, watch the terminating actor and create its replacement in response to the Terminated message which will eventually arrive.
I don't mind waiting for normal termination, but I really need actors to eventually terminate when asked to. Am I missing something? Should I create actors directly from the system instead of from an actor?
EDIT:
Here is my code:
object MyApp extends App {
def start() = {
val system = ActorSystem("MySystem")
val supervisor = system.actorOf(Supervisor.props(), name = "Supervisor")
}
override def main(args: Array[String]) {
start()
}
}
object Supervisor {
def props(): Props = Props(new Supervisor())
}
case class Supervisor() extends Actor {
private var actor: ActorRef = null
start()
def newActor(name: String): ActorRef = {
try {
actor = context.actorOf(MyActor.props(name), name)
context.watch(actor)
} catch {
case iane: InvalidActorNameException =>
println(name + " not terminated yet.")
null
}
}
def terminateActor() {
if (actor != null) context.stop(actor)
actor = null
}
def start() {
while (true) {
// do something
terminateActor()
newActor("new name possibly same name as a previously terminated one")
Thread.sleep(5000)
}
}
override def receive = {
case Terminated(x) => println("Received termination confirmation: " + x)
case _ => println("Unexpected message.")
}
override def postStop = {
println("Supervisor called postStop().")
}
}
object MyActor {
def props(name: String): Props = Props(new MyActor(name))
}
case class MyActor(name: String) extends Actor {
run()
def run() = {
// do something
}
override def receive = {
case _ => ()
}
override def postStop {
println(name + " called postStop().")
}
}
EDIT²: As mentioned by @DanGetz, one should not need to call Thread.sleep in an Akka actor. What I needed here was a periodic routine; this can be done with the Akka context scheduler. See: http://doc.akka.io/docs/akka/2.3.3/scala/howto.html#scheduling-periodic-messages . Instead, I was blocking the actor in an infinite loop, preventing it from using its asynchronous mechanisms (messages). I changed the title since the problem did not actually involve actor termination.
It's hard to gauge exactly what you want now that the question has changed a bit, but I'm going to take a stab anyway. Below is a modified version of your code that shows both periodic scheduling of a task (one that kicks off the child termination process) and watching a child so that a new one with the same name is only created once we are sure the previous one has stopped. If you run the code below, every 5 seconds you should see it kill the child and wait for the Terminated message before starting a new one with the exact same name. I hope this is what you were looking for:
object Supervisor {
val ChildName = "foo"
def props(): Props = Props(new Supervisor())
case class TerminateChild(name:String)
}
case class Supervisor() extends Actor {
import Supervisor._
import scala.concurrent.duration._
import context._
//Start child upon creation of this actor
newActor(ChildName)
override def preStart = {
//Schedule regular job to run every 5 seconds
context.system.scheduler.schedule(5 seconds, 5 seconds, self, TerminateChild(ChildName))
}
def newActor(name: String): ActorRef = {
val child = context.actorOf(MyActor.props(name), name)
watch(child)
println(s"created child for name $name")
child
}
def terminateActor(name:String) = context.child(ChildName).foreach{ ref =>
println(s"terminating child for name $name")
context stop ref
}
override def receive = {
case TerminateChild(name) =>
terminateActor(name)
case Terminated(x) =>
println("Received termination confirmation: " + x)
newActor(ChildName)
case _ => println("Unexpected message.")
}
override def postStop = {
println("Supervisor called postStop().")
}
}
I was writing a little test program to try out some things with remote actors that I was going to need in a Scala project.
The basic goal was to write a test application with one server that can handle a bunch of clients and, more importantly, clients that can send multiple messages at the same time (like pings, requests for updates, and user-induced requests for data).
What I came up with was this:
Brief overview: the client starts 3 different actors, which in turn start actors in while loops with different offsets in order to simulate rather random messages.
import scala.actors.remote.RemoteActor
import scala.actors.remote.Node
import scala.actors.Actor
trait Request
trait Response
case object WhoAmI extends Request
case class YouAre(s:String) extends Response
case object Ping extends Request
case object Pong extends Response
case class PrintThis(s:String) extends Request
case object PrintingDone extends Response
object Server {
def main(args: Array[String]) {
val server = new Server
server.start
}
}
class Server extends Actor {
RemoteActor.alive(12345)
RemoteActor.register('server, this)
var count:Int = 0
def act() {
while(true) {
receive {
case WhoAmI => {
count += 1
sender ! YouAre(count.toString)
}
case Ping => sender ! Pong
case PrintThis(s) => {
println(s)
sender ! PrintingDone
}
case x => println("Got a bad request: " + x)
}
}
}
}
object Act3 extends scala.actors.Actor {
def act = {
var i = 0
Thread.sleep(900)
while (i <= 12) {
i += 1
val a = new Printer
a.start
Thread.sleep(900)
}
}
}
class Printer extends scala.actors.Actor {
def act = {
val server = RemoteActor.select(Node("localhost",12345), 'server)
server ! PrintThis("gagagagagagagagagagagagaga")
receive {
case PrintingDone => println("yeah I printed")
case _ => println("got something bad from printing")
}
}
}
object Act2 extends scala.actors.Actor {
def act = {
var i = 0
while (i < 10) {
i+=1
val a = new Pinger
a.start
Thread.sleep(700)
}
}
}
class Pinger extends scala.actors.Actor {
def act = {
val server = RemoteActor.select(Node("localhost",12345), 'server)
server ! Ping
receive {
case Pong => println("so I pinged and it fits")
case x => println("something wrong with ping. Got " + x)
}
}
}
object Act extends scala.actors.Actor {
def act = {
var i = 0
while(i < 10) {
i+=1
val a = new SayHi
a.start()
Thread.sleep(200)
}
}
}
class SayHi extends scala.actors.Actor {
def act = {
val server = RemoteActor.select(Node("localhost",12345), 'server)
server ! "Hey!"
}
}
object Client {
def main(args: Array[String]) {
Act.start()
//Act2.start()
Act3.start()
}
}
The problem is that things don't run as smoothly as I'd expect them to:
When I start only one of the client actors (by commenting the others out, as I did with Act2 in Client), things usually, but not always, go well. If I start two or more actors, the printouts quite often appear in bulk (meaning: nothing happens for a while, and then the printouts appear rather quickly). Also, the client sometimes terminates and sometimes doesn't.
These may not be the biggest problems, but they're enough to make me feel quite uncomfortable. I did a lot of reading on actors and remote actors, but I find the available information rather lacking.
I tried adding exit statements wherever it seemed fitting, but that didn't help.
Has anybody got an idea of what I'm doing wrong? Any general tricks? Some dos and don'ts?
My guess is that your issues stem from blocking your actor's threads by using receive and Thread.sleep. Blocking operations consume threads in the actors' thread pool, which can prevent other actors from executing until new threads are added to the pool. This question may provide some additional insight.
You can use loop, loopWhile, react, and reactWithin to rewrite many of your actors to use non-blocking operations. For example
import scala.actors.TIMEOUT
object Act extends scala.actors.Actor {
def act = {
var i = 0
loopWhile(i < 10) {
reactWithin(200) { case TIMEOUT =>
i+=1
val a = new SayHi
a.start()
}
}
}
}
Of course, you can eliminate some boilerplate by writing your own control construct:
def doWithin(msec: Long)(f: => Unit) = reactWithin(msec) { case TIMEOUT => f }
def repeat(times: Int)(f: => Unit) = {
var i = 0
loopWhile(i < times) {
f
i+=1
}
}
This would allow you to write
repeat(10) {
doWithin(200) {
(new SayHi).start
}
}
You may want to try the Akka actors framework instead: http://akkasource.org/
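For comparison, the ping exchange above might look roughly like this with Akka actors (a sketch against the classic akka.actor API; the names are illustrative and not tied to the code above):
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

case object Ping
case object Pong

class PingServer extends Actor {
  def receive = {
    case Ping => sender() ! Pong
  }
}

class Pinger(server: ActorRef) extends Actor {
  server ! Ping // send the ping when this actor starts
  def receive = {
    case Pong => println("so I pinged and it fits")
  }
}

object PingPongApp extends App {
  val system = ActorSystem("pingpong")
  val server = system.actorOf(Props[PingServer], "server")
  system.actorOf(Props(new Pinger(server)), "pinger")
}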