Uniqueness of persistenceId in akka-persistence - scala

I'm using the Scala API for akka-persistence to persist a group of actor instances that are organized into a tree. Each node in the tree is a persistent actor and is named after the path from the 'root' node to that node. The persistenceId is set to the name: for example, the root node actor has persistenceId 'root', the next node down has persistenceId 'root-europe', and another actor might have persistenceId 'root-europe-italy'.
The state in each actor includes a list of the names of its children. For example, the 'root' actor keeps 'europe', 'asia', etc. as part of its state.
I have implemented snapshotting for this system. When the root is triggered to snapshot, it does so and then tells each child to do the same.
The problem arises during snapshot recovery. When I re-create an actor with persistenceId = 'root' (by passing in the name as a constructor parameter), the SnapshotOffer received by that actor is wrong: it belongs to a different actor, for example 'root-europe-italy....'. This seems to contradict the persistence contract, in which the persistenceId identifies the actor state to be recovered. I got around the problem by reversing the persistenceIds of the node actors (e.g. 'italy-europe-root'), so it seems to be related to the way snapshot files are retrieved by the persistence module. Before that I tried other approaches, for example a variety of separators between the node names, or no separator at all.
Has anyone else experienced this problem, or can an akka-persistence developer help me understand why this might have happened?
BTW: I am using the built-in file-based snapshot storage for now.
Thanks.

OK, so the issue was a bug in Akka itself, and it has now been resolved. See the related ticket to find out when the patch will be released.
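The fact that reversing the ids made the problem go away is consistent with snapshot files being selected by a file-name prefix. The sketch below illustrates that failure mode under an assumed layout `snapshot-<persistenceId>-<sequenceNr>-<timestamp>`; the layout and the names `SnapshotFileMatching`, `naiveMatches` and `exactMatches` are illustrative, not a claim about Akka's actual snapshot-store code. A plain `startsWith` check lets 'root' pick up a file that belongs to 'root-europe-italy', whereas peeling the numeric fields off the right-hand end and comparing the remainder for equality does not.

```scala
object SnapshotFileMatching {
  // Assumed (illustrative) layout: snapshot-<persistenceId>-<sequenceNr>-<timestamp>
  private val Prefix = "snapshot-"

  // Naive selection: "snapshot-root-" is also a prefix of "snapshot-root-europe-italy-...",
  // so any id that is a prefix of another id (plus the separator) picks up foreign files.
  def naiveMatches(fileName: String, persistenceId: String): Boolean =
    fileName.startsWith(s"$Prefix$persistenceId-")

  // Exact selection: strip the prefix, take the last two dash-separated fields as the
  // numeric sequence number and timestamp, and return whatever remains as the id.
  def parsePersistenceId(fileName: String): Option[String] =
    if (!fileName.startsWith(Prefix)) None
    else {
      val rest = fileName.drop(Prefix.length)
      val lastDash = rest.lastIndexOf('-')
      val secondLastDash = if (lastDash > 0) rest.lastIndexOf('-', lastDash - 1) else -1
      if (secondLastDash <= 0) None
      else {
        val seqNr = rest.substring(secondLastDash + 1, lastDash)
        val timestamp = rest.substring(lastDash + 1)
        val numeric = (s: String) => s.nonEmpty && s.forall(_.isDigit)
        if (numeric(seqNr) && numeric(timestamp)) Some(rest.substring(0, secondLastDash)) else None
      }
    }

  // The id must be equal, not merely a prefix.
  def exactMatches(fileName: String, persistenceId: String): Boolean =
    parsePersistenceId(fileName).contains(persistenceId)
}
```

In this model the reversed ids ('root', 'europe-root', 'italy-europe-root') never form a "prefix plus separator" of one another, which is why that workaround avoided the collision.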

Related

Creating children actors in Akka Typed persistent actors

Let's assume an application implemented using Akka Typed has a persistent actor. As part of its operations, this persistent actor creates transient (non-persistent) child actors; each child has a unique ID, and these IDs are part of the persisted state. The persistent actor also needs some way of communicating with its children, but we don't want to persist the children's ActorRefs, as they aren't really part of the state. On recovery, the persistent actor should recreate its children based on the recovered state. This doesn't sound like an unusual use case, and I'm trying to figure out the cleanest way of implementing it.
I could create the child actors inside the andThen Effect in my command handler, which is meant for side effects, but then there's no way to save the child's ActorRef from there. That seems to be a more general characteristic of the typed Persistence API: it's very hard to have non-persistent state in persistent actors (which could be used for storing the transient children's ActorRefs in this case).
One solution I came up with is a sort of "proxy" actor for creating children, which keeps a map of IDs to ActorRefs and forwards messages based on IDs. The persistent actor holds a reference to that proxy actor and contacts it every time it needs to create a new child or send something to one of the existing children. I have mixed feelings about it, though, and would appreciate it if somebody could point me to a better solution.
If you are not using snapshots, then the persistence mechanism does not store the State object; it stores the sequence of Events that led to that State object. On recovery it simply replays those Events in the order in which they happened, and your eventHandler returns a modified State object that reflects the effect of each event.
This means that the State object can contain values that are not themselves persisted but are just set by the processing of certain Events. They are, in effect, cached values derived from the persistent values in the State.
In your case the operation that causes the creation of a transient actor will be captured as an Event on the actor. So you can create the transient actor in the eventHandler and put the ActorRef in the new State object. When the actor is recovered it will replay that event and your actor will re-create the transient actor.
If you are using snapshots then I don't think there is a requirement that the snapshot object is the same type as your State object, so you can snapshot the state without the ActorRefs and re-create them when you get the SnapshotOffer message.
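The idea in this answer can be modelled in plain Scala, without Akka: events are folded through an event handler, and the handler both records the persistent child id and creates a transient handle for it as a derived, cached value. In this sketch the `spawn` function stands in for `context.spawn`, and `Handle` stands in for `ActorRef`; `ChildTracker`, `ChildAdded`, `State` and `Snapshot` are hypothetical names. The snapshot keeps only the ids, and the handles are recreated on recovery.

```scala
// Persistent events: only the child id is stored.
sealed trait Event
final case class ChildAdded(id: String) extends Event

// `Handle` stands in for ActorRef; it is never written to the journal or the snapshot.
final case class State[Handle](childIds: Vector[String], children: Map[String, Handle])

// What gets snapshotted: the persistent part of the state only.
final case class Snapshot(childIds: Vector[String])

final class ChildTracker[Handle](spawn: String => Handle) {
  val empty: State[Handle] = State(Vector.empty, Map.empty)

  // Event handler: records the id and creates the transient handle derived from it.
  def applyEvent(state: State[Handle], event: Event): State[Handle] = event match {
    case ChildAdded(id) => State(state.childIds :+ id, state.children + (id -> spawn(id)))
  }

  // Recovery without snapshots: replay every event in order, recreating handles as we go.
  def replay(events: Seq[Event]): State[Handle] = events.foldLeft(empty)(applyEvent)

  // Snapshot without the handles.
  def toSnapshot(state: State[Handle]): Snapshot = Snapshot(state.childIds)

  // Recovery from a snapshot: rebuild the handles from the ids, then replay later events.
  def recover(snapshot: Snapshot, laterEvents: Seq[Event]): State[Handle] = {
    val fromSnapshot = State(snapshot.childIds, snapshot.childIds.map(id => id -> spawn(id)).toMap)
    laterEvents.foldLeft(fromSnapshot)(applyEvent)
  }
}
```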
It's a design goal of typed persistence that the State be fully recoverable from the events (or from a snapshot and the events since that snapshot).
In general, the only way to have state that's non-persistent is to wrap the EventSourcedBehavior in a Behaviors.setup block which sets up that state. One option is some sort of mutable state in setup (e.g. a var or a mutable collection, most likely one or the other rather than both), which the command/event/recovery handlers manipulate.
A more immutable alternative is to define an immutable fixture in setup, which includes a child actor, spawned in setup, that manages the non-persistent state. You can also put things like the entity ID, or anything else that is immutable for at least this incarnation of the entity, into the fixture.
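A minimal sketch of the "mutable state in setup" option, assuming Akka 2.6 with `akka-persistence-typed`. The map of children is created inside `Behaviors.setup`, so it is not part of the persisted `State`; it is filled after new events are persisted (`thenRun`) and refilled from the recovered ids on `RecoveryCompleted`, while the event handler stays pure. `Parent`, `Child`, `AddChild`, `Forward` and `ChildAdded` are hypothetical names.

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.persistence.typed.{PersistenceId, RecoveryCompleted}
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}
import scala.collection.mutable

object Parent {
  sealed trait Command
  final case class AddChild(id: String) extends Command
  final case class Forward(id: String, text: String) extends Command

  sealed trait Event
  final case class ChildAdded(id: String) extends Event

  // Persistent state: only the child ids.
  final case class State(childIds: Set[String])

  // Hypothetical transient child that just logs what it receives.
  object Child {
    def apply(id: String): Behavior[String] = Behaviors.receive[String] { (ctx, text) =>
      ctx.log.info("child {} got {}", id, text)
      Behaviors.same
    }
  }

  def apply(entityId: String): Behavior[Command] = Behaviors.setup[Command] { context =>
    // Non-persistent state owned by this incarnation of the actor.
    val children = mutable.Map.empty[String, ActorRef[String]]

    // Spawn the child only if this incarnation has not spawned it yet.
    def ensureChild(id: String): ActorRef[String] =
      children.getOrElseUpdate(id, context.spawn(Child(id), s"child-$id"))

    EventSourcedBehavior[Command, Event, State](
      persistenceId = PersistenceId.ofUniqueId(entityId),
      emptyState = State(Set.empty),
      commandHandler = (state, command) =>
        command match {
          case AddChild(id) if state.childIds.contains(id) => Effect.none
          // Side effect after the event is stored: create the child and remember its ref.
          case AddChild(id) => Effect.persist(ChildAdded(id)).thenRun(_ => ensureChild(id))
          case Forward(id, text) =>
            children.get(id).foreach(_ ! text)
            Effect.none
        },
      // Pure event handler: it only updates the persistent ids.
      eventHandler = (state, event) =>
        event match {
          case ChildAdded(id) => State(state.childIds + id)
        }
    ).receiveSignal {
      // After replay, recreate the transient children from the recovered ids.
      case (state, RecoveryCompleted) => state.childIds.foreach(ensureChild)
    }
  }
}
```

The immutable-fixture alternative would replace the mutable map with a single helper actor spawned in setup that owns the id-to-ActorRef map, which is close to the proxy idea in the question.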

Starting Actors on-demand by identifier in Akka

I'm currently implementing a system that receives inbound messages from an external monitoring system. I'm translating these messages into more concise 'events', and I'm using these to alter the state of 'Managed System' objects. Akka actors seemed like a good fit for encapsulating mutable state in concurrent applications.
The managed systems are identified by a name (99% of the time this is a hostname). Whenever a proper event is received, the system routes the message to the correct actor based on the name property. At first I used actorSelection with the full paths of those actors, but that was very ugly, and I saw several people advise against relying on the fully qualified name of an actor to deliver messages.
So I've set up a simple EventBus, which is great as I can now simply do:
eventBus.subscribe(subscriber1, "/managedSystem01")
eventBus.subscribe(subscriber2, "/managedSystem02")
eventBus.publish(MonitoringEvent("/managedSystem01", MonitoringMessage("managedSystem01", "N", "CPU_LOAD_HIGH", true)))
eventBus.publish(MonitoringEvent("/managedSystem02", MonitoringMessage("managedSystem02", "Y", "DISK_USAGE_HIGH", true)))
Of course, I now have the issue that, should I receive an event concerning a managed system for which I haven't spawned an actor yet (this is entirely possible, since unfortunately I cannot get a complete list of managed systems), the message will be routed to the dead-letter mailbox.
Ideally I don't want this to happen. When it is unable to address a specific actor, I want to spawn a new one dynamically.
I suppose that, theoretically, I could subscribe to DeadLetter messages but:
That sounds a little 'hacky', since those messages are essentially reserved for the system
Is it even possible to recover the original message (in my case, the MonitoringMessage) that is sent to the DeadLetter mailbox?
Alternatively is there a way to check if there are ZERO subscribers to a certain "topic"?
What you describe ("send to Actor by some identifier, if it does not exist buffer until it gets created and then deliver to that newly on-demand created Actor") is implemented in Akka as Cluster Sharding.
While it is designed primarily for sharding load (work) across a cluster, you could use it locally as well, since your requirement is essentially a scaled-down (single-node) version of the problem it solves. It takes care of starting new actors if none exists for a given identifier, etc., so you'd simply subscribe the shard region to the events and it will take care of creating the actors for you.
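A sketch of the two extractor functions that classic Cluster Sharding needs, assuming Akka 2.5/2.6 with the classic `akka-cluster-sharding` module. The case classes reuse the shapes from the question, but their field names are guesses, and `ManagedSystemActor` and `ManagedSystemSharding` are hypothetical. The managed-system name becomes the entity id, so the region starts the entity actor the first time a message for that name arrives. `startRegion` only works on an actor system configured as a cluster (a single-node cluster is enough).

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// Field names are assumptions; the question only shows positional arguments.
final case class MonitoringMessage(system: String, flag: String, code: String, active: Boolean)
final case class MonitoringEvent(topic: String, message: MonitoringMessage)

// Hypothetical entity actor holding the state of one managed system.
class ManagedSystemActor extends Actor {
  private var lastCode: Option[String] = None
  def receive: Receive = {
    case m: MonitoringMessage => lastCode = Some(m.code)
  }
}

object ManagedSystemSharding {
  val numberOfShards = 10

  // Entity id = managed-system name; the entity receives the inner message.
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case MonitoringEvent(_, msg) => (msg.system, msg)
  }

  // Shard id derived from the same name, so one managed system always lands in one shard.
  val extractShardId: ShardRegion.ExtractShardId = {
    case MonitoringEvent(_, msg)     => Math.floorMod(msg.system.hashCode, numberOfShards).toString
    case ShardRegion.StartEntity(id) => Math.floorMod(id.hashCode, numberOfShards).toString
  }

  // typeName, entity Props, settings, entity-id extractor, shard-id extractor.
  def startRegion(system: ActorSystem): ActorRef =
    ClusterSharding(system).start(
      "ManagedSystem",
      Props[ManagedSystemActor](),
      ClusterShardingSettings(system),
      extractEntityId,
      extractShardId)
}
```

Messages sent to the returned region for a name that has no actor yet start one instead of ending up in dead letters.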

How to assign and manage persistenceId in Akka

My understanding of the persistenceId in a persistent actor in Akka is that it must have the same value upon reincarnations of the same actor in order to be able to recover state from the persistent store. How would this work in the event of a complete failure of the containing process? Does this mean that the persistenceId of an actor with a known path must be maintained in some other application specific persistent store?
I could understand how this would work with a specific actor with a known path, but how would this work with a worker actor that is controlled by a router pool?
Actors are created and stopped dynamically. How would I be able to associate a durable persistenceId with a specific actor in a router pool and recover the state of that actor in the event of a failure of the containing process?

How to generate a unique id for an actor?

Suppose I have an application that uses actors to process users, so there is one UserActor per user. Every UserActor is mapped to its user by id; e.g. to process actions for a specific user you get the actor like this:
ActorSelection actor = actorSystem.actorSelection("/user/1");
where 1 is user id.
So the problem is: how do I generate unique ids inside the cluster efficiently? First, it needs to check that a new id will not duplicate an existing one. I could create one actor for generating ids that lives on one node, and ask this generator for an id before creating any new UserActor, but this leads to an additional request inside the cluster whenever a user is created. Is there a more efficient way to do this? Are there built-in Akka techniques for it?
P.S. This way of using actors may not be efficient; any suggestions or best practices are welcome.
I won't say whether or not your approach is a good idea. That's going to be up to you to decide. If I do understand your problem correctly though, then I can suggest a high level approach to making it work for you. If I understand correctly, you have a cluster, and for any given userId, there should be an actor in the system that handles requests for it, and it should only be on one node and consistently reachable based on the user id of the user. If that's correct, then consider the following approach.
Let's start with a simple actor; let's call it UserRequestForwarder. This actor's job is to find the actor instance for a request for a particular user id and forward the request to it. If that actor instance does not exist yet, this actor creates it before forwarding. A very rough sketch could look like this:
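import akka.actor.{Actor, Props}
// Placeholder message and handler (hypothetical) so the sketch compiles.
final case class DoSomethingForUser(userId: String)
class UserRequestHandler extends Actor {
  def receive = { case DoSomethingForUser(_) => () }
}
class UserRequestForwarder extends Actor {
  def receive = {
    case req @ DoSomethingForUser(userId) =>
      val childName = s"user-request-handler-$userId"
      // Reuse the existing child, or create it under childName so later lookups find it
      val child = context.child(childName).getOrElse(context.actorOf(Props[UserRequestHandler](), childName))
      child forward req
  }
}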
Now this actor would be deployed onto every node in the cluster via a ConsistentHashingPool router configured so that there is one instance per node. You just need to make sure that every request that travels through this router contains something that allows it to be consistently hashed to the node that handles requests for that user (ideally the user id).
So if you pass all requests through this router, they will always land on the node that is responsible for that user, ending up in the UserRequestForwarder which will then find the correct user actor on that node and pass the request on to it.
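To illustrate what "consistently hashed to the node" means, here is a tiny consistent-hash ring in plain Scala (2.13). It is not the implementation behind Akka's ConsistentHashingPool; `HashRing`, the node names and the virtual-node count are assumptions for illustration. The same user id always maps to the same node, and adding a node only moves keys onto the new node.

```scala
import scala.collection.immutable.TreeMap
import scala.util.hashing.MurmurHash3

// Hypothetical helper: each node is placed on the ring several times ("virtual nodes")
// to spread keys more evenly.
final class HashRing(nodes: Seq[String], virtualNodesPerNode: Int = 10) {
  require(nodes.nonEmpty, "ring needs at least one node")

  private val ring: TreeMap[Int, String] = TreeMap(
    (for {
      node <- nodes
      i <- 0 until virtualNodesPerNode
    } yield MurmurHash3.stringHash(s"$node#$i") -> node): _*
  )

  // The owner of a key is the first ring point at or after the key's hash, wrapping around.
  def nodeFor(key: String): String = {
    val h = MurmurHash3.stringHash(key)
    ring.rangeFrom(h).headOption.getOrElse(ring.head)._2
  }
}
```

For example, `new HashRing(Seq("node-a", "node-b", "node-c")).nodeFor("user-42")` returns the same node on every call, which is the property the router relies on to land every request for a user on one node.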
I have not tried this approach myself, but it might work for what you are trying to do provided I understood your problem correctly.
I'm not an Akka expert, so I can't offer code, but shouldn't the following approach work?
Have a single actor be responsible for creating the actors, and have it keep a HashSet of the names of the actors it created that haven't died yet.
If you have to spread the load between multiple actors you can dispatch the task based on the first n digits of the hashcode of the actor name that has to be created.
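The bookkeeping in this answer can be sketched as a plain Scala class that a single creator actor would own and call from its receive. `NameRegistry` and `Dispatch.creatorIndex` are hypothetical names. Where the answer suggests using the first n digits of the hash code to split work across several creators, this sketch substitutes `Math.floorMod(hashCode, n)`, a common way to get a stable index in `0 until n`.

```scala
import scala.collection.mutable

// Tracks the names of actors this creator spawned that are still alive.
final class NameRegistry {
  private val live = mutable.HashSet.empty[String]

  // True if the name was free and is now reserved; false if an actor with that name is alive.
  def reserve(name: String): Boolean = live.add(name)

  // Call when the corresponding actor terminates (e.g. on a Terminated message).
  def released(name: String): Unit = live -= name

  def isLive(name: String): Boolean = live.contains(name)
}

object Dispatch {
  // Choose which of `creators` registries/actors is responsible for a name.
  def creatorIndex(name: String, creators: Int): Int = Math.floorMod(name.hashCode, creators)
}
```

Because the same name always maps to the same creator, each creator only needs to check its own set for duplicates.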
It seems like you have your answer on how to generate the unique ID. In terms of your larger question, this is what Akka cluster sharding is designed to solve. It will handle distributing shards among your cluster, finding or starting your actors within the cluster and even rebalancing.
http://doc.akka.io/docs/akka/2.3.5/contrib/cluster-sharding.html
There's also an Activator template with a really nice example.
http://typesafe.com/activator/template/akka-cluster-sharding-scala

In akka 2.x, is root actor supervised by someone else?

Reading the Akka docs (http://doc.akka.io/docs/akka/2.2.3/AkkaScala.pdf), section 2.2.1, Hierarchical Structure, states:
The only prerequisite is to know that each actor has exactly one supervisor, which is the actor that created it.
But what about the actor at the top of the hierarchy tree? Does it have no supervisor?
This is explained well in the Akka docs (see the section The Top-Level Supervisors); a short excerpt:
The root guardian is the grand-parent of all so-called “top-level” actors and supervises all the special actors mentioned in Top-Level Scopes for Actor Paths using the SupervisorStrategy.stoppingStrategy, whose purpose is to terminate the child upon any type of Exception. All other throwables will be escalated … but to whom? Since every real actor has a supervisor, the supervisor of the root guardian cannot be a real actor. And because this means that it is “outside of the bubble”, it is called the “bubble-walker”. This is a synthetic ActorRef which in effect stops its child upon the first sign of trouble and sets the actor system’s isTerminated status to true as soon as the root guardian is fully terminated (all children recursively stopped).
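The excerpt's rule (stop the child on any Exception, escalate every other Throwable) is what SupervisorStrategy.stoppingStrategy encodes in the classic API. The sketch below, assuming classic Akka 2.x (`akka-actor`), applies that strategy to an ordinary parent's children and asks the strategy's decider directly what it would do; `StoppingParent` and `StoppingStrategyDemo.describe` are hypothetical names.

```scala
import akka.actor.{Actor, Props, SupervisorStrategy}

// A parent that supervises its children the way the excerpt says the root guardian does.
class StoppingParent extends Actor {
  override val supervisorStrategy: SupervisorStrategy = SupervisorStrategy.stoppingStrategy

  def receive: Receive = {
    // Create a child from the given Props and reply with its ActorRef.
    case props: Props => sender() ! context.actorOf(props)
  }
}

object StoppingStrategyDemo {
  // The decider is a partial function; a throwable it is not defined for is escalated
  // to the parent's own supervisor.
  def describe(t: Throwable): String = {
    val decider = SupervisorStrategy.stoppingStrategy.decider
    if (decider.isDefinedAt(t)) decider(t).toString else "escalate"
  }
}
```

For example, `describe(new RuntimeException("boom"))` yields "Stop", while `describe(new Error("fatal"))` yields "escalate", mirroring the Exception/other-throwable split in the excerpt.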