Akka Actor preStart() & postStop() method behavior? - scala

Say I have an Actor for database access. Is an Actor a singleton instance that handles all clients, or are there multiple instances for multiple clients? Are the Actor's preStart() and postStop() methods called only once for all instances, or are they called each time a new Actor instance is created? Is it a good idea to put database initialisation code inside preStart(), and connection-returning code inside postStop()?
Thanks

This is kind of like asking if an object is a singleton. If you only ever create one of the database Actor it will behave as a singleton, but in general Actors are not singletons.
Even if you did just create one, you still need to think about when it might be restarted by the actor system or supervisor.
[Update]
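The lifecycle methods are called for every Actor instance - they are independent entities. preStart() runs when an instance starts and postStop() when it stops; with the default preRestart()/postRestart() implementations, a restart calls postStop() on the failed instance and preStart() on its replacement.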
If you are creating an Actor to handle database requests / data access, I'd probably have a single Actor with singleton semantics, which internally creates and supervises as many or as few child Actors as needed to actually make the database calls. This lets you handle the initialisation and cleanup of the database in a single place (the top-level Actor), and lets you scale internally (if needed) by creating more Actors to handle requests, supervising them so that errors are handled properly.
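To make the per-instance rule concrete, here is a toy, stdlib-only Scala model of the design described above. It is not Akka's API: `ToyActor`, `Worker` and `DbOwner` are hypothetical names, and the hooks are invoked by hand where Akka's runtime would invoke them. The owner opens the shared resource once in its own preStart(), while each worker instance gets its own preStart()/postStop() pair. In real Akka the workers would be created with context.actorOf and would receive messages through their mailboxes rather than through direct method calls.

```scala
import scala.collection.mutable.ListBuffer

// Toy model of the lifecycle rules above; NOT Akka's API. The hooks are
// ordinary methods here, called by hand where Akka's runtime would call them.
abstract class ToyActor {
  def preStart(): Unit = ()
  def postStop(): Unit = ()
  def receive(msg: Any): Unit
}

// Hypothetical worker: one instance per child, each with its own hooks.
final class Worker(n: Int, log: ListBuffer[String]) extends ToyActor {
  override def preStart(): Unit = log += s"worker $n: start"
  override def postStop(): Unit = log += s"worker $n: stop"
  def receive(msg: Any): Unit = log += s"worker $n: query $msg"
}

// Hypothetical top-level owner: opens the shared resource (e.g. a pool) once.
final class DbOwner(log: ListBuffer[String], nWorkers: Int) extends ToyActor {
  private var workers = Vector.empty[Worker]

  override def preStart(): Unit = {
    log += "owner: open pool"
    workers = Vector.tabulate(nWorkers)(i => new Worker(i, log))
    workers.foreach(_.preStart()) // each child instance runs its own hook
  }

  override def postStop(): Unit = {
    workers.foreach(_.postStop()) // children stop before their parent
    log += "owner: close pool"
  }

  def receive(msg: Any): Unit = msg match {
    case id: Int => workers(id % workers.size).receive(id)
    case _       => () // ignore anything else
  }
}
```

Starting one owner with three workers logs "owner: open pool" exactly once, but three separate worker start entries and three stop entries.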
As a side note, there's probably plenty of prior art in this scenario, so I'd recommend doing a bit of research into how others handle it. You should also check how the database driver itself handles threading, as you might otherwise just be building lots of accidental complexity.

Related

Service Fabric actors auto delete

In a Service Fabric app, I need to create thousands of stateful Actors, so I need to avoid accumulating Actors once they become useless.
I know I can't delete an Actor from the Actor itself, but I don't want to keep track of Actors and loop to delete them.
The Actors runtime uses garbage collection to remove deactivated Actor objects (but not their state), so I was thinking about removing the Actor state inside the OnDeactivateAsync() method and letting the GC deallocate the Actor object after the usual 60 minutes.
In theory, something like this should be equivalent to deleting the Actor, shouldn't it?
protected override async Task OnDeactivateAsync()
{
    // Remove the persisted state when the actor is deactivated
    await this.StateManager.TryRemoveStateAsync("MyState");
}
Is there anything remaining that only explicit deletion can remove?
According to the docs, you shouldn't change the state from OnDeactivateAsync.
If you need your Actor to not keep persisted state, you can use attributes to change the state persistence behavior:
No persisted state: State is not replicated or written to disk. This level is for actors that simply don't need to maintain state reliably.
[StatePersistence(StatePersistence.None)]
class MyActor : Actor, IMyActor
{
}
Finally, you can use the ActorService to query Actors, see if they are inactive, and delete them.
TL;DR There are some additional resources you can free yourself (reminders) and some that only explicit deletion can remove because they are not publicly accessible.
The Service Fabric Actor repo is available on GitHub. I am using the persistent storage model, which seems to use KvsActorStateProvider behind the scenes, so I'll base the answer on that. There is a series of calls that starts at IActorService.DeleteActorAsync and continues over to IActorManager.DeleteActorAsync. A lot of stuff happens in there, including a call to the state provider to remove the state part of the actor. The core code that handles this is here, and it seems to remove not only the state but also reminders and some internal actor data. In addition, if you are using actor events, all event subscribers are unsubscribed for your actor.
If you really want delete-like behavior without calling the actor runtime, I guess you could register a reminder that would delete the state and unregister itself plus other reminders.

When to create an Akka Actor

I have a REST service which serves only one POST request. I want to use an actor to process the request. However, I don't know whether I should create one actor and route all requests through it, or create an actor every time I get a request. What are the pros and cons of these choices?
Also, how can execution be parallel when I create one actor and use it to process all my requests? It certainly looks like sequential execution. I would like to understand this as well.
If you use one Actor, requests are queued in the actor's mailbox and processed one by one by the actor. This is sequential and not recommended.
That's why it is said:
One actor is no actor.
Create a manager Actor which manages other actors. As actors are quite cheap you can create one actor for every request without any problem.
Do DB interactions and other heavy computation in a Future, and send the Future's result to the request-handling actor using the pipeTo pattern.
Use actors only to divide and distribute work, and use Futures to do compute-intensive work.
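A minimal, stdlib-only sketch of the "do the heavy work in a Future and send the result back as a message" idea, assuming a `LinkedBlockingQueue` stands in for the actor's mailbox and `slowQuery` stands in for the database call (both hypothetical). The handler loop never blocks on the work and never mutates its state from the Future's callback; the callback only enqueues a `Done` message. In Akka itself you would typically write something like `Future(...).pipeTo(self)` after `import akka.pattern.pipe`.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

// Hypothetical messages for a toy request-handling "actor" (not Akka types).
sealed trait Msg
final case class Request(id: Int) extends Msg
final case class Done(id: Int, result: Try[Int]) extends Msg

object PipeToSketch {
  // Stand-in for a slow DB call or other heavy computation.
  def slowQuery(id: Int): Int = { Thread.sleep(10); id * 2 }

  def run(ids: Seq[Int]): Map[Int, Int] = {
    val mailbox = new LinkedBlockingQueue[Msg]()
    val pool = Executors.newFixedThreadPool(4)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    ids.foreach(id => mailbox.put(Request(id)))

    var results = Map.empty[Int, Int] // only touched by this loop: the "actor state"
    var pending = ids.size
    try {
      while (pending > 0) {
        mailbox.take() match {
          case Request(id) =>
            // Don't block the handler: run the work elsewhere and send the
            // outcome back as a message (the role pipeTo plays in Akka).
            Future(slowQuery(id)).onComplete(r => mailbox.put(Done(id, r)))
          case Done(id, Success(v)) => results += id -> v; pending -= 1
          case Done(_, Failure(_))  => pending -= 1
        }
      }
    } finally pool.shutdown()
    results
  }
}
```

Calling `PipeToSketch.run(1 to 5)` handles all five requests concurrently on the pool while the state map is only ever updated by the single handler loop.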
I would create an actor per request and use the "tell" pattern to delegate the work to the newly created actor. If the REST framework you use supports completing the request from another actor (Spray and Akka HTTP do), then you can complete the request from this new actor. This way your request-handling actor is free to handle the next request.
I find this a wonderful resource that explains the pros & cons of ask & tell and per-request-actors. It can be helpful to you.
I agree with what @pamu said: Actors are cheap. But be mindful that if you ever use a singleton Actor, you should not make it stateful, or it will cause trouble.
And if you are going to use Futures to do intensive work (which you should), make sure you give them a specific ExecutionContext / Dispatcher. Using the global dispatcher or ExecutionContext is not a good idea.
Alternatively, for each API you have, create a dedicated dispatcher to control the number of Actors that will work on that kind of endpoint / API.
For example, if you have "/get/transactions", specify a dispatcher for that API that will only spawn a fixed number of threads.
The advantage of this is that you can control the number of threads and resources your app uses, which is good practice when dealing with heavy traffic.
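A small sketch of the "dedicated thread pool per API" advice using only the Scala standard library. `EndpointPools` and the four-thread limit for "/get/transactions" are assumptions for illustration; in Akka you would more likely declare a dispatcher in configuration and look it up (for example via `system.dispatchers.lookup`). The demo measures how many tasks ran at once and shows it never exceeds the pool size, however many Futures are submitted.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object EndpointPools {
  // Hypothetical per-endpoint pool: at most 4 threads ever work on "/get/transactions".
  private val transactionsPool = Executors.newFixedThreadPool(4)
  val transactionsEc: ExecutionContext = ExecutionContext.fromExecutorService(transactionsPool)

  def shutdown(): Unit = transactionsPool.shutdown()
}

object BoundedDemo {
  // Submits `tasks` Futures to the endpoint's pool; returns the peak concurrency seen.
  def maxConcurrency(tasks: Int): Int = {
    val active = new AtomicInteger(0)
    val peak = new AtomicInteger(0)
    implicit val ec: ExecutionContext = EndpointPools.transactionsEc
    val work = (1 to tasks).map { _ =>
      Future {
        val now = active.incrementAndGet()
        peak.accumulateAndGet(now, (a, b) => math.max(a, b))
        Thread.sleep(5) // stand-in for blocking DB/IO work
        active.decrementAndGet()
      }
    }
    Await.result(Future.sequence(work), 10.seconds)
    peak.get()
  }
}
```

Heavy traffic on this endpoint then queues up in its own pool instead of starving threads that other endpoints (or the actors themselves) depend on.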

How to handle concurrent access to a Scala collection?

I have an Actor that - in its very essence - maintains a list of objects. It has three basic operations, an add, update and a remove (where sometimes the remove is called from the add method, but that aside), and works with a single collection. Obviously, that backing list is accessed concurrently, with add and remove calls interleaving each other constantly.
My first version used a ListBuffer, but I read somewhere it's not meant for concurrent access. I haven't gotten concurrent access exceptions, but I did note that finding & removing objects from it does not always work, possibly due to concurrency.
I was halfway through rewriting it to use a var List, but removing items from Scala's default immutable List is a bit of a pain - and I doubt it's suitable for concurrent access.
So, basic question: What collection type should I use in a concurrent access situation, and how is it used?
(Perhaps secondary: Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?)
(Tertiary: In Scala, what collection type is best for inserts and random access (delete / update)?)
Edit: To the kind responders: Excuse my late reply, I'm making a nasty habit out of dumping a question on SO or mailing lists, then moving on to the next problem, forgetting the original one for the moment.
Take a look at the scala.collection.mutable.Synchronized* traits/classes (note that they were deprecated in Scala 2.11 and removed in 2.13).
The idea is that you mixin the Synchronized traits into regular mutable collections to get synchronized versions of them.
For example:
import scala.collection.mutable._
val syncSet = new HashSet[Int] with SynchronizedSet[Int]
val syncArray = new ArrayBuffer[Int] with SynchronizedBuffer[Int]
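On newer Scala versions, one common choice for state that really is shared between threads is `scala.collection.concurrent.TrieMap`, used here as a concurrent set by storing `Unit` values. This is a hedged sketch; the object name and thread counts are illustrative only. Inside an actor you would not need this at all, as the next answer explains.

```scala
import java.util.concurrent.Executors
import scala.collection.concurrent.TrieMap
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object ConcurrentSetSketch {
  // TrieMap is a lock-free concurrent map; TrieMap[A, Unit] behaves like a concurrent set.
  def addFromManyThreads(n: Int): TrieMap[Int, Unit] = {
    val set = TrieMap.empty[Int, Unit]
    val pool = Executors.newFixedThreadPool(8)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // 8 writers insert interleaved slices of 0 until n at the same time
      val writers = (0 until 8).map { t =>
        Future((t until n by 8).foreach(i => set.put(i, ())))
      }
      Await.result(Future.sequence(writers), 10.seconds)
      (0 until n by 2).foreach(i => set.remove(i)) // removals are safe too
      set
    } finally pool.shutdown()
  }
}
```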
You don't need to synchronize the state of the actors. The aim of actors is to avoid tricky, error-prone and hard-to-debug concurrent programming.
The actor model ensures that an actor consumes messages one by one and that you will never have two threads consuming messages for the same Actor.
Scala's immutable collections are suitable for concurrent usage.
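On the question's complaint that removing items from an immutable List is a pain: a minimal sketch, assuming items are keyed by a hypothetical `id` field, of add/update/remove on an immutable `Map`. Every operation returns a new map, so an older snapshot can be handed to another thread (or sent in a message) without any locking.

```scala
// Hypothetical item type; `id` is assumed to be a unique key.
final case class Item(id: Int, name: String)

object ItemStore {
  // Each operation returns a NEW map; existing snapshots are never modified,
  // so they can be shared with other threads (or sent in messages) safely.
  def add(items: Map[Int, Item], item: Item): Map[Int, Item] = items + (item.id -> item)

  def update(items: Map[Int, Item], id: Int)(f: Item => Item): Map[Int, Item] =
    items.get(id).fold(items)(old => items.updated(id, f(old)))

  def remove(items: Map[Int, Item], id: Int): Map[Int, Item] = items - id
}
```

Inside an actor, the usual pattern is a single `var items = Map.empty[Int, Item]` that is reassigned with these results while handling each message.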
As for actors, a couple of things are guaranteed, as explained here in the Akka documentation:
the actor send rule: the send of a message to an actor happens before the receive of that message by the same actor.
the actor subsequent processing rule: where processing of one message happens before processing of the next message by the same actor.
You are not guaranteed that the same thread processes the next message, but you are guaranteed that the current message will finish processing before the next one starts, and also that at any given time, only one thread is executing the receive method.
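To illustrate the two rules, here is a toy mailbox written against plain `java.util.concurrent`. It is not Akka's implementation; `ToyMailbox` and `MailboxDemo` are hypothetical. Each drain runs as a separate task on a shared pool, so consecutive messages may be processed on different threads, but an atomic `scheduled` flag ensures only one drain is active at a time, and the flag's volatile write/read gives the happens-before edge between one drain and the next. The handler can therefore update a plain `var` without locks.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, Executor, Executors}
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Toy mailbox (NOT Akka's implementation): messages may run on different pool
// threads, but never two at a time, and in the order they were enqueued.
final class ToyMailbox[A](pool: Executor)(handle: A => Unit) {
  private val queue = new ConcurrentLinkedQueue[A]()
  private val scheduled = new AtomicBoolean(false)

  def tell(msg: A): Unit = { queue.add(msg); trySchedule() }

  private def trySchedule(): Unit =
    if (!queue.isEmpty && scheduled.compareAndSet(false, true))
      pool.execute(() => drain())

  private def drain(): Unit = {
    var msg = queue.poll()
    while (msg != null) { handle(msg); msg = queue.poll() }
    scheduled.set(false) // volatile write: publishes this drain's changes to the next one
    trySchedule()        // a message may have arrived after the last poll
  }
}

object MailboxDemo {
  // Sends n messages (n divisible by 4) from 4 threads; returns (handled, overlap seen).
  def run(n: Int): (Int, Boolean) = {
    val pool = Executors.newFixedThreadPool(4)
    val done = new CountDownLatch(n)
    val active = new AtomicInteger(0)
    val overlap = new AtomicBoolean(false)
    var count = 0 // plain var, no lock: only the mailbox's handler touches it
    val box = new ToyMailbox[String](pool)({ _ =>
      if (active.incrementAndGet() > 1) overlap.set(true)
      count += 1
      active.decrementAndGet()
      done.countDown()
    })
    val senders = (1 to 4).map(_ => new Thread(() => (1 to n / 4).foreach(i => box.tell(s"m$i"))))
    senders.foreach(_.start())
    senders.foreach(_.join())
    done.await()
    pool.shutdown()
    (count, overlap.get())
  }
}
```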
So that takes care of a given Actor's persistent state. With regard to shared data, the best approach as I understand it is to use immutable data structures and lean on the Actor model as much as possible. That is, "do not communicate by sharing memory; share memory by communicating."
What collection type should I use in a concurrent access situation, and how is it used?
See @hbatista's answer.
Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?
The second (though the thread on which messages are processed may change, so don't store anything in thread-local data). That's how the actor can maintain invariants on its state.

Akka - How many instances of an actor should you create?

I'm new to the Akka framework and I'm building an HTTP server application on top of Netty + Akka.
My idea so far is to create an actor for each type of request. E.g. I would have an actor for a POST to /my-resource and another actor for a GET to /my-resource.
Where I'm confused is how I should go about creating actors. Should I:
1) Create a new actor for every request (i.e. for every request, do a TypedActor.newInstance() of the appropriate actor)? How expensive is it to create a new actor?
2) Create one instance of each actor on server startup and use that actor instance for every request? I've read that an actor can only process one message at a time, so couldn't this be a bottleneck?
3) Do something else?
Thanks for any feedback.
Well, you create an Actor for each instance of mutable state that you want to manage.
In your case, that might be just one actor if my-resource is a single object and you want to treat each request serially - that easily ensures that you only return consistent states between modifications.
If (more likely) you manage multiple resources, one actor per resource instance is usually ideal unless you run into many thousands of resources. While you can also run per-request actors, you'll end up with a strange design if you don't think about the state those requests are accessing - e.g. if you just create one Actor per POST request, you'll find yourself worrying how to keep them from concurrently modifying the same resource, which is a clear indication that you've defined your actors wrongly.
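A rough sketch of "one actor per resource instance", with a single-threaded executor per resource standing in for an actor. This is an assumption for illustration, not how Akka schedules actors: Akka actors share a thread pool, which is what makes thousands of them affordable. Requests for the same resource are serialized, so its mutable `stock` needs no lock, while different resources proceed independently. `ResourceActor`, `ResourceRegistry` and `stock` are hypothetical names.

```scala
import java.util.concurrent.{Callable, ConcurrentHashMap, ExecutorService, Executors, TimeUnit}

// Toy stand-in for an actor that owns one resource's mutable state.
final class ResourceActor(val id: String) {
  private val exec: ExecutorService = Executors.newSingleThreadExecutor()
  private var stock = 0 // mutable state, only ever touched on exec's single thread

  def tell(delta: Int): Unit = exec.execute(() => stock += delta)
  def ask(): Int = exec.submit(new Callable[Int] { def call(): Int = stock }).get()
  def stop(): Unit = { exec.shutdown(); exec.awaitTermination(5, TimeUnit.SECONDS) }
}

final class ResourceRegistry {
  private val actors = new ConcurrentHashMap[String, ResourceActor]()

  // computeIfAbsent is atomic, so concurrent requests never create two actors for one id
  def forResource(id: String): ResourceActor =
    actors.computeIfAbsent(id, (k: String) => new ResourceActor(k))

  def stopAll(): Unit = actors.values().forEach((a: ResourceActor) => a.stop())
}
```

Request handlers (per-request or not) then only look up the owning actor and send it a message; they never touch `stock` themselves, so concurrent POSTs to the same resource cannot interleave their updates.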
I usually have fairly trivial request/reply actors whose main purpose is to abstract the communication with external systems. Their communication with the "instance" actors is then normally limited to one request/response pair to perform the actual action.
If you are using Akka, you can create an actor per request. Akka is extremely slim on resources and you can create literally millions of actors on a pretty ordinary JVM heap. Also, they only consume CPU/stack/threads when they actually do something.
A year ago I made a comparison between the resource consumption of the thread-based and event-based standard actors, and Akka is even better than the event-based ones.
One of the big points of Akka in my opinion is that it allows you to design your system as "one actor per usage" where earlier actor systems often forced you to do "use only actors for shared services" due to resource overhead.
I would recommend that you go for option 1.
Options 1) and 2) both have their drawbacks, so let's use option 3): Routing (Akka 2.0+).
A Router acts as a load balancer, routing requests to other Actors which perform the needed task.
Akka provides different Router implementations with different logic to route a message (for example SmallestMailboxPool or RoundRobinPool).
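Every Router may have several routees (in a pool router, these are its children, which it also supervises), and depending on its routing logic it may inspect them to decide where to route each message; SmallestMailboxPool, for example, prefers the routee with the fewest queued messages.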
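import akka.actor.ActorRef;
import akka.actor.Props;
import akka.routing.RoundRobinPool;
// Creates 5 ExampleActor routees, managed and supervised by a RoundRobinPool router
// (Akka 2.3+; older versions used .withRouter(new RoundRobinRouter(5)))
ActorRef roundRobinRouter = getContext().actorOf(
    new RoundRobinPool(5).props(Props.create(ExampleActor.class)), "router");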
This procedure is well explained in this blog.
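The round-robin logic itself is simple; here is a toy Scala version to show what the router does with each message. It is not Akka's RoundRobinPool, which additionally creates and supervises its routees; `RoundRobin` is a hypothetical name and the routees are plain functions.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy round-robin routing, NOT Akka's RoundRobinPool: each message goes to the
// next routee in turn, wrapping around after the last one.
final class RoundRobin[A](routees: IndexedSeq[A => Unit]) {
  require(routees.nonEmpty, "need at least one routee")
  private val next = new AtomicLong(0) // atomic so route() may be called from several threads

  def route(msg: A): Unit = {
    val i = (next.getAndIncrement() % routees.size).toInt
    routees(i)(msg)
  }
}
```

Routing ten messages across five routees gives each routee exactly two, regardless of how long each one takes; SmallestMailboxPool differs precisely in that it looks at the routees' queues instead.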
It's quite a reasonable option, but whether it's suitable depends on the specifics of your request handling.
Yes, of course it could.
For many cases the best thing to do would be to just have one actor responding to every request (or perhaps one actor per type of request), but the only thing this actor does is to forward the task to another actor (or spawn a Future) which will actually do the job.
To scale up serial request handling, add a master actor (supervisor) which in turn delegates to worker actors (its children) in round-robin fashion.

Actors (scala/akka): is it implied that the receive method will be accessed in a threadsafe manner?

I assume that the messages will be received and processed in a threadsafe manner. However, I have been reading (some) akka/scala docs but I didn't encounter the keyword 'threadsafe' yet.
It is probably because the actor model assumes that each actor instance processes its own mailbox sequentially. That means it should never happen that two or more threads concurrently execute a single actor instance's code. Technically you could create a method in an actor's class (because it is still an object) and call it from multiple threads concurrently, but this would be a major departure from the actor's usage rules and you would do it "at your own risk", because you would then lose all thread-safety guarantees of that model.
This is also one of the reasons why Akka introduced the concept of ActorRef: a handle that lets you communicate with the actor through message passing, but not by calling its methods directly.
I think we have it pretty well documented: http://doc.akka.io/docs/akka/2.3.9/general/jmm.html
Actors are 'threadsafe'. The actor system (Akka) provides each actor with its own 'lightweight thread'. This is not a real thread, but Akka gives the developer the impression that an Actor is always running in its own thread. This means that any operations performed as a result of acting on a message are, for all practical purposes, thread safe.
However, you should not undermine Akka by using mutable messages or public state. If you develop your actors as standalone units of functionality, they will be threadsafe.
See also:
http://doc.akka.io/docs/akka/2.3.12/general/actors.html#State
and
http://doc.akka.io/docs/akka/2.3.12/general/jmm.html for a more in-depth study of the Akka memory model and how it handles threading issues.