I have a fairly basic wrapper class around an Akka ActorRef. The class has an ActorRef field and exposes a number of methods that "tell" specific messages to that ActorRef. This way I can adhere to a specific API and avoid exposing tells or message classes. I've experienced a memory leak in my program and I'm wondering if my wrapper around Akka actors is causing the problem. I wrote the simulation below to test my theory.
import akka.actor.{ActorSystem, ActorRef, PoisonPill}
import akka.actor.ActorDSL._

implicit val as = ActorSystem()

def createMemoryActor(): ActorRef = actor(new Act {
  Array.fill(99999999)(1.0) // just to take up memory
  become {
    case _ => print("testing memory leaks")
  }
})

val memoryActor = createMemoryActor() // memory usage jumps up
memoryActor ! PoisonPill
System.gc() // memory usage goes back down
case class ActorWrapper() {
  val memoryActor = createMemoryActor()
}

def doNothing(): Unit = {
  val shouldGetGCed = ActorWrapper()
  ()
}

doNothing() // memory usage jumps up
System.gc() // memory usage goes back down
I've run the above code in a Scala REPL and used jvisualvm to profile the memory usage. It seems that the shouldGetGCed reference gets garbage collected, and its ActorRef field (which was taking up memory) gets garbage collected as well. Is this always the case, or am I missing something? Also, does anyone have any best practices for wrapping actors to adhere to specific APIs?
Every actor that is started must eventually be stopped, and until that happens it will consume memory. In your example you would need to do something like shouldGetGCed.memoryActor ! PoisonPill. There is no automatic garbage collection for actors, since they are distributed entities by nature and the JVM does not support distributed GC.
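To address the second part of the question: one common approach (a sketch with a made-up DoWork message, not code from the question) is to keep the ActorRef private to the wrapper and make shutdown an explicit part of its API, so the lifecycle obligation is visible to callers:

import akka.actor.{ActorRef, PoisonPill}

// Hypothetical domain message, just for illustration.
case class DoWork(payload: String)

class ActorWrapper(ref: ActorRef) {
  // Domain-facing API: callers never see tells or message classes.
  def doWork(payload: String): Unit = ref ! DoWork(payload)

  // Lifecycle is part of the API too: actors are never GCed while
  // running, so the owner of the wrapper must call this when done.
  def shutdown(): Unit = ref ! PoisonPill
}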
I am trying to refactor some code for a program which uses an ActorSystem as the backbone for HTTP calls.
My specific goal is to make my code more modular, so I can write libraries of functions that make HTTP calls using an ActorSystem, where the ActorSystem is later provided by the application.
This is a general question, though, as I tend to run into this problem reasonably often.
I have two goals:
Minimize the number of ActorSystems I create, to simplify tracking them (the goal is one per top-level application)
Avoid explicitly passing the ActorSystem and execution context around everywhere they're needed.
Conceptually, the code below illustrates how I'm thinking about it (of course, this code would not compile).
import akka.actor.ActorSystem
import intermediateModule._
import scala.concurrent.ExecutionContextExecutor

object MyApp extends App {
  // Create the ActorSystem and place it into scope
  implicit val system = ActorSystem()
  implicit val context = system.dispatcher

  intermediateFunc1(300)
}

// Elsewhere in the intermediate module
object intermediateModule {
  import expectsActorSystemModule._

  def intermediateFunc1(x: Int) = {
    // Relies on an ActorSystem and ExecutionContext, but won't compile
    // because the application's ActorSystem and EC are not in scope
    usesActorSystem(x)
  }
}

// In this module, usesActorSystem needs an ActorSystem
object expectsActorSystemModule {
  def usesActorSystem(x: Int)
                     (implicit system: ActorSystem, context: ExecutionContextExecutor) = ???
  // ... does some stuff like sending HTTP requests with the ActorSystem
}
Is there a way to "trickle down" implicits through the sub-modules, so that the top-level application can provide the needed implicits?
Can this be done in a way where the "depth" of module imports doesn't matter (e.g. if I added a few more intermediate libraries between the top-level app and the module that requires the ActorSystem)?
The answer here is dependency injection. Every object that has dependencies on other objects should receive them as constructor parameters. The important thing here is that higher layers only get their own dependencies, and not their dependencies' dependencies.
In your example, IntermediateModule doesn't use the ActorSystem itself; it only needs it in order to pass it on to ExpectsActorSystemModule. This is bad, because if the latter changes and requires another dependency, you will need to change the former as well – that is too much coupling. You can refactor it like so:
import akka.actor.ActorSystem
import scala.concurrent.ExecutionContextExecutor

object MyApp extends App {
  // Create the ActorSystem and place it into scope,
  // then wire everything together
  implicit val system = ActorSystem()
  implicit val context = system.dispatcher

  val expectsActorSystemModule = new ExpectsActorSystemModule
  val intermediateModule = new IntermediateModule(expectsActorSystemModule)

  // run stuff
  intermediateModule.intermediateFunc1(300)
}

// Elsewhere in the intermediate module
class IntermediateModule(expectsActorSystemModule: ExpectsActorSystemModule) {
  def intermediateFunc1(x: Int) = {
    // Note: no ActorSystem or ExecutionContext is needed here, because
    // they were injected into expectsActorSystemModule
    expectsActorSystemModule.usesActorSystem(x)
  }
}

// In this module, usesActorSystem needs an ActorSystem
class ExpectsActorSystemModule(
    implicit system: ActorSystem,
    context: ExecutionContextExecutor) {
  def usesActorSystem(x: Int) = ???
  // ... does some stuff like sending HTTP requests with the ActorSystem
}
Note that IntermediateModule no longer needs an ActorSystem or an ExecutionContext, because those were provided directly to ExpectsActorSystemModule.
The slightly annoying part is that at some point you have to instantiate all these objects in your application and wire them all together. In the above example it's only 4 lines in MyApp, but it will get significantly longer for more substantial programs.
There are libraries like MacWire or Guice to help with this, but I would recommend against using them. They make it much less transparent what is going on, and they don't save all that much code either – in my opinion, it's a bad tradeoff. And these two specifically have further downsides. Guice comes from the Java world and gives you essentially no compile-time guarantees, meaning that your code might compile just fine and then fail to start because of Guice. MacWire is better in that regard (everything is done at compile time), but it's not future-proof, because it's implemented as a Scala 2 macro – it will not work on Scala 3 in its current form.
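For illustration only, here is a rough sketch of what MacWire-style wiring looks like (the macwire dependency and the WiredApp object are assumptions of mine, not part of the original code):

import akka.actor.ActorSystem
import scala.concurrent.ExecutionContextExecutor
import com.softwaremill.macwire._

object WiredApp extends App {
  implicit val system: ActorSystem = ActorSystem()
  implicit val context: ExecutionContextExecutor = system.dispatcher

  // wire[...] is expanded at compile time into a constructor call whose
  // arguments are picked from values found in the enclosing scope.
  lazy val expectsActorSystemModule = new ExpectsActorSystemModule
  lazy val intermediateModule: IntermediateModule = wire[IntermediateModule]

  intermediateModule.intermediateFunc1(300)
}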
Another approach, popular in the purely functional programming community, is to use ZIO's ZLayer. But since you're working on an existing codebase based on the Lightbend tech stack, that is unlikely to be the tool of choice in this particular case.
In the code below I'm using an Akka actor, MonitorActor, even though it's an object. I never see this pattern in production code, although it seems to work well.
Does the code below have concurrency issues as a result of using an object as an Actor?
Are there any Akka actor related 'gotchas' on show here?
import java.util.concurrent.atomic.AtomicReference
import akka.actor.{Actor, ActorSystem, Props}

case class SomeEvent(member: String)

class Example(eventBus: EventBus)(implicit actorSystem: ActorSystem) {
  val members: AtomicReference[Set[String]] = new AtomicReference(Set())

  actorSystem.actorOf(Props(MonitorActor))

  private object MonitorActor extends Actor {
    eventBus.subscribe(classOf[SomeEvent])
    var isEnough = false

    override def receive: Receive = {
      case SomeEvent(member: String) =>
        val newMembers = members.updateAndGet(_ + member)
        if (newMembers.size >= 10) {
          isEnough = true
        }
    }
  }
}
One immediate question arising from this "pattern" is: what happens if the Actor is added to the actorSystem twice:
actorSystem.actorOf(Props(MonitorActor))
actorSystem.actorOf(Props(MonitorActor))
This is not a trivial question. In large code bases there can be multiple files/packages where an Actor is materialized, so the above scenario will likely come up, if only by accident.
At best, each SomeEvent is processed twice by the exact same logic. At worst you will get into nasty race conditions with isEnough. So let's assume the best case.
Even in the best case, each SomeEvent will be processed twice by the exact same logic. This isn't bad in the question's example, because members is a Set. But if it were a List, you would get double insertions of the same event.
Another issue is having to protect ourselves from race conditions involving members. A good reason for members to be an AtomicReference is to resolve the situation where the two "independent" Actors are trying to access members at the same time. But this goes against the entire purpose of the Actor model. From the original 1973 formalism (emphasis mine):
The architecture is general with respect to control structure and does
not have or need goto, interrupt, or semaphore primitives.
A similar description can be found in the akka documentation's introduction (emphasis mine):
The Actor Model provides a higher level of abstraction for writing
concurrent and distributed systems. It alleviates the developer from
having to deal with explicit locking and thread management, making it
easier to write correct concurrent and parallel systems.
So we have effectively broken the Actor model framework and all we got was not having to call a constructor. Contrast the question's example code with the "preferable" implementation:
class MonitorActor(eventBus: EventBus) extends Actor {
  // Plain actor-confined mutable state: no atomics or locks needed,
  // because only this actor's message handling touches it.
  val members: scala.collection.mutable.Set[String] = scala.collection.mutable.Set.empty[String]
  var isEnough = false

  eventBus.subscribe(classOf[SomeEvent])

  override def receive: Receive = {
    case SomeEvent(member: String) =>
      members += member
      isEnough = members.size >= 10
  }
}
Now the developer doesn't have to worry about semaphores, race conditions, thread contention, and so on. All of the logic and functionality within an Actor can be understood from a serial perspective.
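As a closing sketch (constructor injection of eventBus is my own assumption about how the enclosing class would be refactored; EventBus is the question's own type), wiring the class-based actor up might look like this:

import akka.actor.{ActorRef, ActorSystem, Props}

class Example(eventBus: EventBus)(implicit actorSystem: ActorSystem) {
  // Props(new MonitorActor(eventBus)) builds a fresh instance per actorOf
  // call, so an accidental second materialization yields an independent
  // actor with its own state, rather than two actors sharing one object.
  val monitor: ActorRef =
    actorSystem.actorOf(Props(new MonitorActor(eventBus)))
}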
I am fairly new to the Akka framework and to concurrency concepts. From the Akka docs I understood that only one message in an Actor's mailbox is processed at a time, so only a single thread touches the Actor's state at any given moment. My question: am I right that declaring an Actor state/data variable as mutable (a var, used only where a val doesn't fit) will not cause inconsistent Actor state under concurrency?
I am using Scala for development. In the following Master actor, the details of workers are stored in the mutable variable workers. Will this be a problem under concurrency?
class Master extends PersistentActor with ActorLogging {
  ...
  private var workers = Map[String, WorkerState]()
  ...
}
I think what you are doing is fine. As you said, one of the fundamental guarantees of Akka actors is that a single actor will be handling one message at a time, so there will not be inconsistent Actor states.
Akka actors conceptually each have their own light-weight thread,
which is completely shielded from the rest of the system. This means
that instead of having to synchronize access using locks you can just
write your actor code without worrying about concurrency at all.
http://doc.akka.io/docs/akka/snapshot/general/actors.html
Also, it is a good thing that you're using a var holding an immutable Map, rather than a val holding a mutable map :)
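To illustrate why this is safe, here is a minimal sketch (the message types are invented for the example): the var is only ever read and written inside receive, which Akka guarantees runs one message at a time.

import akka.actor.Actor

// Hypothetical messages, invented for the example.
case class WorkerRegistered(id: String, state: String)
case class WorkerRemoved(id: String)

class Master extends Actor {
  // var + immutable Map: each update swaps in a new map, and the var is
  // only ever touched from receive, so no synchronization is needed.
  private var workers = Map.empty[String, String]

  def receive: Receive = {
    case WorkerRegistered(id, state) => workers += (id -> state)
    case WorkerRemoved(id)           => workers -= id
  }
}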
Another way to approach situations like this is to replace the actor's "state" after each message is handled, e.g.:
class Master extends PersistentActor with ActorLogging {
  import akka.event.LoggingReceive
  import context.become

  // eg. Map[String, WorkerState], or an immutable case class -
  // of course, feel free to just inline the type...
  type MyStateType = ...

  def receive = handle(initState) // eg. just inline a call to Map.empty

  def handle(state: MyStateType): Actor.Receive = LoggingReceive {
    case MyMessageType(data) =>
      ... // process data - build the new state
      become(handle(newState))
    case ... // any other message types to be handled, etc.
  }

  ... // rest of class implementation
}
While it is true that there is still mutable state happening here (in this case, it is the state of the actor as a whole - it becomes effectively a "non-finite state machine"), it feels better contained/hidden (to me, at least), and the "state" (or "workers") available to the actor for any given message is treated as entirely immutable.
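For a concrete, self-contained version of the sketch above (the message and state types are invented for illustration):

import akka.actor.Actor

// Hypothetical message type for the example.
case class AddWorker(id: String)

class Master extends Actor {
  def receive: Receive = handle(Map.empty[String, Int])

  private def handle(workers: Map[String, Int]): Receive = {
    case AddWorker(id) =>
      // Build the next immutable state and swap the behavior in; each
      // message only ever sees the immutable snapshot it was given.
      context.become(handle(workers + (id -> 0)))
  }
}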
I am just starting to try myself out with Scala. I have grown confident enough to start refactoring an ongoing multi-threaded application that I have been working on for about a year and a half.
However, something bothers me and I can't figure it out: how do I expose the interface/contract/protocol of an Actor? In the OO model, I have my public interface with synchronized methods where necessary, and I know what I can do with that object. Now that I'm going to use actors, it seems that none of that will be available anymore.
More specifically, I have a KbService in my app with a set of methods to work with that Kb. I want to make it an actor in its own right. For that I need to make all the methods private or protected, given that they will now only be called by my receive method. Hence there is no obvious way to expose the set of available behaviors.
Is there some best practice for that?
To tackle this issue I generally expose the set of messages that an Actor can receive via a "protocol" object.
object TestActorProtocol {
  case object Action1
  case object Action2
}

class TestActor extends Actor {
  import TestActorProtocol._

  def receive = {
    case Action1 => ???
    case Action2 => ???
  }
}
So when you want to communicate with TestActor you must send it a message from its protocol object.
import example.TestActorProtocol._
testActorRef ! TestActorProtocol.Action1
It can become heavy sometimes, but at least there is some kind of contract exposed.
Hope it helps
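One refinement worth considering (my own suggestion, not part of the answer above): seal the protocol under a common trait, so the compiler can check pattern matches for exhaustiveness:

import akka.actor.Actor

object TestActorProtocol {
  sealed trait Command // sealed: all messages must live in this file
  case object Action1 extends Command
  case object Action2 extends Command
}

class TestActor extends Actor {
  import TestActorProtocol._

  def receive: Receive = {
    case cmd: Command =>
      cmd match { // compiler can warn if a Command case is unhandled
        case Action1 => ???
        case Action2 => ???
      }
  }
}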
Scenario: I have this code:
import scala.actors.Actor

case class Message()

class MyActor extends Actor {
  def act() {
    react {
      case Message() => println("hi")
    }
  }
}

def meth() {
  val a = new MyActor
  a.start
  a ! Message()
}
Is the MyActor instance garbage collected? If not, how do I make sure it is? If I create an ad-hoc actor (with the actor method), is that actor GCed?
This thread on the scala-user mailing list is relevant.
There, Philipp Haller mentions using a particular scheduler (available in Scala 2.8) to enable termination of an Actor before garbage collection, either globally or per actor.
Memory leaks with the standard Actor library have led to other Actor implementations. This was the reason for David Pollak and Jonas Bonér's Actor library for Lift, which you can read much more about here: http://blog.lostlake.org/index.php?/archives/96-Migrating-from-Scala-Actors-to-Lift-Actors.html
Have you tried adding a finalize method to see whether it is? I think the answer here is that the actors subsystem behaves no differently from how you would expect: it does not cache any reference to your actor except in a thread-local for the duration of message processing.
I would therefore expect your actor to be a candidate for collection (assuming the subsystem correctly clears out the ThreadLocal reference after the actor has processed the message, which it does indeed appear to do in the Reaction.run method).
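Following the suggestion above, a rough way to observe collection empirically (treat this purely as a diagnostic sketch; finalization is best-effort and deprecated on modern JVMs):

import scala.actors.Actor

case class Message()

class MyActor extends Actor {
  def act() {
    react {
      case Message() => println("hi")
    }
  }

  // Diagnostic only: a println here usually shows whether the instance
  // became collectable after its reaction ran.
  override protected def finalize(): Unit = println("MyActor finalized")
}

// after the actor has processed its message:
// System.gc() // request a GC and watch for the "MyActor finalized" line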