Akka Actor message memory and Garbage Collection - scala

Is the following statement correct: when an Actor receives a message, after completing the pattern-matched function, the message goes out of scope and the message is garbage collected?

On the JVM, an object may be garbage collected when it is no longer strongly reachable, i.e. when there is no chain of "normal" (strong) references via which it can be reached from some thread running in the JVM.
So the simple answer to your question is: no. You never know when, or even if, the message will be garbage collected. What you do know is that the reference that goes out of scope is gone. However, that does not even mean the message is no longer reachable: the object could still be referenced from some other actor.
Typically, however, if a message is sent and the sending actor keeps no reference to it, and the receiving actor drops its reference to it, it should be garbage collected fairly soon. Messages are typically short-lived objects, so most likely the object will not survive even one GC cycle.
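To make the lifecycle concrete, here is a minimal sketch (the `Job` message and `Worker` actor are made-up names for illustration):

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical message type: immutable, held only for the duration of receive.
final case class Job(payload: Array[Byte])

class Worker extends Actor {
  def receive: Receive = {
    case Job(payload) =>
      // Use the message; no actor field stores a reference to it.
      println(s"processing ${payload.length} bytes")
    // Once this case returns, the actor holds no reference to the Job.
    // If the sender also dropped its reference, the Job is unreachable
    // and *may* be collected -- the JVM decides when, or even whether.
  }
}

object Main extends App {
  val system = ActorSystem("demo")
  val worker = system.actorOf(Props[Worker](), "worker")
  worker ! Job(new Array[Byte](1024)) // sender keeps no reference to the Job
  Thread.sleep(500)
  system.terminate()
}
```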

Related

How Akka Actors are clearing resources?

We're experiencing strange memory behavior on our web server built using Akka HTTP.
Our architecture goes like this:
web server routes calls various actors, get results for future and streams it to response
actors call non-blocking operations (using futures), combine and process data fetched from them and pipe results to sender. We're using standard Akka actors, implementing its receive method (not Akka typed)
there is no blocking code anywhere in the app
When I run web server locally, at the start it takes around 350 MB. After first request, memory usage jumps to around 430 MB and slowly is increasing with each request (monitored using Activity Monitor on Mac). But shouldn't GC clean things after each request? Shouldn't memory usage after processing be 350 MB again?
I also installed the YourKit Java profiler, and here is a diagram of heap memory.
It can be seen that once memory usage increases, it never goes back down, even though the system is stateless. Also, when I run GC manually from the profiler, it barely does anything, just a small decrease in memory usage. I understand some services might cache things after the first request, consuming memory temporarily, but is there any policy inside Akka actors or Akka HTTP about this?
I tried to check the objects furthest from the GC roots, but it only shows library classes and Akka built-in classes, nothing related to our code.
So, I have two questions:
How does an actor close resources and free memory after message processing? Have you experienced anything similar?
Is there a better way of profiling Akka HTTP that will show me stack traces for the classes furthest from the GC roots?
On a side note, is it advisable to use the scheduler inside actors (running inside an Akka HTTP server)? When I do that, memory usage seems to increase heavily and the app runs out of memory on our DEV environment.
Thanks in advance,
Amer
An actor remains active until it is explicitly stopped: there is no garbage collection.
Probably the two most common methods for managing actor lifetimes (beyond the actor itself deciding that it's time to stop) are:
Parent is responsible for stopping children. If the actors are being spawned to perform specific tasks on behalf of the parent, for instance, this approach is called for.
Using an inactivity timeout. If the actors represent domain entities (e.g. an actor for every user account, where the actor in some sense serves as an in-memory cache), using context.setReceiveTimeout to have a ReceiveTimeout message sent to the actor after the timeout has passed is a pretty reasonable solution, especially if using Akka Persistence and Akka Cluster Sharding to allow the actor's state to be recovered. Note that in some cases the scheduled send of that message may not be canceled in time if a message was enqueued in the mailbox but not yet processed when the timeout expired: receiving a ReceiveTimeout is not a guarantee that the timeout has passed since the last received message.
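The inactivity-timeout approach can be sketched like this (the `UserAccount` actor and the 2-minute duration are illustrative assumptions):

```scala
import scala.concurrent.duration._
import akka.actor.{Actor, ReceiveTimeout}

// Hypothetical entity actor that stops itself after 2 minutes of inactivity.
class UserAccount extends Actor {
  context.setReceiveTimeout(2.minutes)

  def receive: Receive = {
    case ReceiveTimeout =>
      // Not a hard guarantee that 2 minutes elapsed since the last message
      // (one may have been enqueued but not yet processed when the timer
      // fired), but a reasonable signal that this entity has gone idle.
      context.stop(self)
    case _ =>
      () // handle domain messages; the timeout timer resets on each one
  }
}
```

If the entity's state matters, pairing this with Akka Persistence means the actor can be safely stopped and later recovered on demand.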
Update to add:
Regarding
shouldn't GC clean things after each request?
The short answer is no, GC will not clean things after each request (and if it does, that's a very good sign that you haven't provisioned enough memory).
The longer answer is that the characteristics of garbage collection on the JVM are very underspecified: the only rule a garbage collection implementation has to respect is that it never frees an object reachable from a GC root (basically any variable on a thread's stack or static to a class) by a chain of strong references. When and even whether the garbage collector reclaims the space taken up by garbage is entirely implementation dependent (I say "whether" to account for the existence of the Epsilon garbage collector, which never frees memory; this is useful for benchmarking JVMs without the complication of garbage collection, and also in environments where the application can be restarted when it runs out of memory: the JVM crash is in some sense the actual garbage collector).
You could try executing java.lang.System.gc when the server stops: this may trigger a GC run (note that there is no requirement that the system actually collect any garbage in such a scenario). If a garbage collector frees any memory at all, about the only time it has to run is when there isn't enough space to fulfill an object allocation request; therefore, if the application stops allocating objects, there may never be another garbage collection run.
For performance reasons, most modern garbage collectors in the JVM wait until there's no more than a certain amount of free space before they collect garbage: this is because the time taken to reclaim all space is proportional to the number of objects which aren't reclaimable and for a great many applications, the pattern is that most objects are comparatively ephemeral, so the number of objects which aren't reclaimable is reasonably constant. The consequence of that is that the garbage collector will do about the same amount of work in a "full" GC for a given application regardless of how much free space there is.

What happens to messages sent to Actors that are being deployed?

I have a very simple question, but I haven't found anything on the Internet (maybe I don't know how to search for it).
If I deploy an actor (actorSystem.actorOf ...) and send a message to it immediately, and the actor hasn't finished deploying yet, will the messages be enqueued in a "special" queue or will they be sent to DeadLetters?
Have a look at the bottom of the mailbox documentation. Your guess is correct that messages are stored in a special queue until the mailbox is ready.
In order to make system.actorOf both synchronous and non-blocking while keeping the return type ActorRef (and the semantics that the returned ref is fully functional), special handling takes place for this case. Behind the scenes, a hollow kind of actor reference is constructed, which is sent to the system’s guardian actor who actually creates the actor and its context and puts those inside the reference. Until that has happened, messages sent to the ActorRef will be queued locally, and only upon swapping the real filling in will they be transferred into the real mailbox.
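In other words, the pattern below is safe; no message is lost even if the tell races the actor's construction (the `Greeter` actor is a made-up example):

```scala
import akka.actor.{Actor, ActorSystem, Props}

class Greeter extends Actor {
  def receive: Receive = {
    case name: String => println(s"hello, $name")
  }
}

object Main extends App {
  val system = ActorSystem("demo")
  val ref = system.actorOf(Props[Greeter](), "greeter")
  // Safe even if the actor hasn't finished constructing yet: the message
  // is queued locally and transferred to the real mailbox once the actor
  // and its context exist. It is never sent to DeadLetters.
  ref ! "world"
  Thread.sleep(500)
  system.terminate()
}
```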
Actor mailboxes

Akka: what is the reason of processing messages one at a time in an Actor?

It is said:
Akka ensures that each instance of an actor runs in its own lightweight thread and that messages are processed one at a time.
Can you please explain what is the reason of processing messages one at a time in an Actor?
This way we can guarantee thread safety inside an Actor.
Because an actor will only ever handle one message at any given time, we can guarantee that the actor's local state is safe to access, even though the actor itself may be switching the threads it executes on. Akka guarantees that state written while handling message M1 is visible to the actor once it handles M2, even though it may now be running on a different thread (normally guaranteeing this kind of safety comes at a significant cost; Akka handles it for you).
It also originates from the original Actor model description, a concurrency abstraction in which actors handle messages one by one and respond by performing one of these actions: sending other messages, changing their behaviour, or creating new actors.
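A small sketch of what this guarantee buys you (the `Counter` actor is an illustrative example, not from the question):

```scala
import akka.actor.Actor

// Plain, unsynchronized mutable state is safe here because Akka processes
// one message at a time per actor, and writes made while handling one
// message are visible when handling the next, even on a different thread.
class Counter extends Actor {
  private var count = 0 // no @volatile, no locks needed

  def receive: Receive = {
    case "increment" => count += 1
    case "report"    => sender() ! count
  }
}
```

The same `count += 1` on a field shared between raw threads would be a data race; inside an actor it is safe by construction.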

Can an actor return an object to a future waiting on an ask?

I would like to use an actor to synchronize access to a pool of objects. The actor would manage the objects in the pool, including their state (busy vs. free to allocate). When asked by non-actor code, it would return an object from the pool once one is available. Thus the calling code has an abstraction for obtaining an object to work with.
To get this kind of abstraction, I need the actor to be able to respond to its message senders' ask messages with the object the actor is allocating to them. Can this be accomplished and would it be resource intensive to pass a whole object via a message?
There is nothing wrong with returning a future that will be completed later by the actor.
Keep an eye on one thing, however: will you complete the future with some mutable internal actor state or not?
If the answer is no, it is fine and there is nothing to worry about.
If the answer is yes, you'll have to take care of synchronization, since the actor and external code may mutate this state from different threads (which kind of defeats the purpose of using actors).
Otherwise it is legitimate.
BTW, this is not specific to futures. You have to follow this rule for any message you send from an actor.
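A sketch of the pool scenario from the question, assuming a hypothetical protocol (`Acquire`, `Release`, `Connection` are invented names) and an immutable pooled object so the "no mutable state in replies" rule holds:

```scala
import scala.concurrent.Future
import scala.concurrent.duration._
import akka.actor.{Actor, ActorRef, Status}
import akka.pattern.ask
import akka.util.Timeout

final case class Connection(id: Int) // immutable: safe to hand out in a reply
case object Acquire
final case class Release(conn: Connection)

class PoolManager extends Actor {
  private var free = List(Connection(1), Connection(2))

  def receive: Receive = {
    case Acquire =>
      free match {
        case head :: tail =>
          free = tail
          sender() ! head // this reply completes the asker's Future
        case Nil =>
          sender() ! Status.Failure(new RuntimeException("pool exhausted"))
      }
    case Release(conn) =>
      free = conn :: free
  }
}

// From non-actor code, the ask returns a Future of the pooled object:
//   implicit val timeout: Timeout = 3.seconds
//   val fut: Future[Connection] = (pool ? Acquire).mapTo[Connection]
```

Note the reply to a local asker is just a reference, so "passing a whole object" costs nothing extra; serialization only enters the picture for remote actors.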
UPDATE: extending my answer to address OP's comment.
Question was primarily about returning an object that's not an actor, from an actor, not just about the scenario... not sure if this answer relates to just that... an object can be much heavier than just a "regular" message... unless Akka passes a message in a way equivalent to a reference / interning.
There are no special requirements about the "heaviness" of a message in Akka; in general it can be any object (you can infer this from the fact that Any is used as the message type, instead of some Akka-defined message class/trait with a set of defined limitations).
Of course, you have to treat specially the situations where messages must be persisted or sent to a remote host, but that is a special case: there you have to ensure serialization is handled properly.
Anyway, as long as the message (object) does not leave the boundaries of the same JVM, it is fine for the object to hold any amount of state: for local actors, sending a message just passes a reference, so the object is not copied.

How to handle concurrent access to a Scala collection?

I have an Actor that - in its very essence - maintains a list of objects. It has three basic operations, an add, update and a remove (where sometimes the remove is called from the add method, but that aside), and works with a single collection. Obviously, that backing list is accessed concurrently, with add and remove calls interleaving each other constantly.
My first version used a ListBuffer, but I read somewhere it's not meant for concurrent access. I haven't gotten concurrent access exceptions, but I did note that finding & removing objects from it does not always work, possibly due to concurrency.
I was halfway rewriting it to use a var List, but removing items from Scala's default immutable List is a bit of a pain - and I doubt it's suitable for concurrent access.
So, basic question: What collection type should I use in a concurrent access situation, and how is it used?
(Perhaps secondary: Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?)
(Tertiary: In Scala, what collection type is best for inserts and random access (delete / update)?)
Edit: To the kind responders: Excuse my late reply, I'm making a nasty habit out of dumping a question on SO or mailing lists, then moving on to the next problem, forgetting the original one for the moment.
Take a look at the scala.collection.mutable.Synchronized* traits/classes.
The idea is that you mix the Synchronized traits into regular mutable collections to get synchronized versions of them.
(Note: these traits were deprecated in Scala 2.11 and removed in 2.13; on current Scala versions, prefer the java.util.concurrent collections or scala.collection.concurrent.TrieMap. For the actor use case in this question, though, no synchronization is needed at all, as the other answers explain.)
For example:
import scala.collection.mutable._
val syncSet = new HashSet[Int] with SynchronizedSet[Int]
val syncArray = new ArrayBuffer[Int] with SynchronizedBuffer[Int]
You don't need to synchronize the state of your actors. The whole point of actors is to avoid tricky, error-prone, and hard-to-debug concurrent programming.
The actor model ensures that an actor consumes its messages one by one, and that you will never have two threads consuming messages for the same actor.
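Applied to the question's add/update/remove scenario, that means a plain mutable collection owned by the actor is fine (the `Registry` actor and its message protocol below are illustrative, not the asker's actual code):

```scala
import scala.collection.mutable.ListBuffer
import akka.actor.Actor

// Hypothetical protocol for the list-managing actor in the question.
final case class Add(item: String)
final case class Remove(item: String)
case object Query

class Registry extends Actor {
  // Safe without any synchronization: only this actor ever touches it,
  // and messages are processed strictly one at a time.
  private val items = ListBuffer.empty[String]

  def receive: Receive = {
    case Add(item)    => items += item
    case Remove(item) => items -= item
    case Query        => sender() ! items.toList // reply with an immutable snapshot
  }
}
```

The one rule: never let the mutable collection itself escape in a reply; send an immutable copy (here `items.toList`) instead.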
Scala's immutable collections are suitable for concurrent usage.
As for actors, a couple of things are guaranteed, as explained in the Akka documentation:
The actor send rule: the send of a message to an actor happens before the receive of that message by the same actor.
The actor subsequent processing rule: processing of one message happens before processing of the next message by the same actor.
You are not guaranteed that the same thread processes the next message, but you are guaranteed that the current message will finish processing before the next one starts, and also that at any given time, only one thread is executing the receive method.
So that takes care of a given Actor's persistent state. With regard to shared data, the best approach as I understand it is to use immutable data structures and lean on the Actor model as much as possible. That is, "do not communicate by sharing memory; share memory by communicating."
What collection type should I use in a concurrent access situation, and how is it used?
See #hbatista's answer.
Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?
The second (though the thread on which messages are processed may change, so don't store anything in thread-local storage). That's how the actor can maintain invariants on its state.