Actor lookup in an Akka Cluster - scala

I have a Scala application where I have several nodes. Each node has an ActorSystem with a main actor and each actor must have some ActorRef to certain actors (for example "Node 1" has "Actor3" and "Actor3" needs the ActorRef for "Actor7" and "Actor8" to do its work). My problem is that I don't know if another node ("Node2") has the "Actor1" or the "Actor7" I'm looking for.
My idea was to loop inside every MemberUp, using the ActorSelection several times and asking every new member if it has the actors I'm looking for. Is this the only way I can do it? Is there a way to do this more efficiently?

An alternative to ActorSelection is a lookup table. If you need to do many actor lookups and actors are not created and stopped very dynamically, it can be the better solution.
On each node you keep a data structure such as Map[String, List[String]], where the key is a node name and the value is the list of names of the actors running on that node.
The trick is that whenever a node's set of actors changes (an actor is created or stopped), an actor on that node must notify the other nodes, so that every node has an up-to-date, synchronised map.
If you can guarantee that, each node can check whether an actor exists:
map.get(nodeName) match {
  case Some(n) => n.contains(actorName)
  case None    => false
}
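The lookup table described above could be kept as an immutable map that is replaced on every change. The following is a minimal sketch in plain Scala, not Akka API: `ActorDirectory` and its method names are hypothetical, a `Set` is used instead of a `List` to avoid duplicate names, and the propagation of changes between nodes is left out.

```scala
// Hypothetical helper, not part of Akka: node name -> names of the actors running there.
final case class ActorDirectory(entries: Map[String, Set[String]] = Map.empty) {

  // Record that `actorName` was created on `nodeName`.
  def registered(nodeName: String, actorName: String): ActorDirectory =
    copy(entries.updated(nodeName, entries.getOrElse(nodeName, Set.empty[String]) + actorName))

  // Record that `actorName` was stopped on `nodeName`.
  def unregistered(nodeName: String, actorName: String): ActorDirectory =
    entries.get(nodeName) match {
      case Some(names) => copy(entries.updated(nodeName, names - actorName))
      case None        => this
    }

  // Same existence check as the pattern match above, written with `exists`.
  def hasActor(nodeName: String, actorName: String): Boolean =
    entries.get(nodeName).exists(_.contains(actorName))

  // Which nodes host an actor with this name?
  def nodesWith(actorName: String): Set[String] =
    entries.collect { case (node, names) if names.contains(actorName) => node }.toSet
}
```

Each node would hold one such value and swap it for the updated copy whenever a change notification arrives from another node.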

I've solved a very similar problem in our cluster by having a DiscoveryActor at a known path on every node. The protocol of the DiscoveryActor has four messages:
Register(name, actorRef)
Subscribe(name)
Up(name, actorRef)
Down(name, actorRef)
Each named actor sends a Register to its local DiscoveryActor, which in turn broadcasts an Up to all local subscribers and to the DiscoveryActors on all other nodes, which in turn broadcast it to their own subscribers.
The DiscoveryActor watches MemberUp/MemberDown to determine when to look for a new peer DiscoveryActor and broadcast its local registrations or broadcast Down for registrations of downed peers.
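To make the message flow concrete, here is a rough model of the bookkeeping such a DiscoveryActor might do, written in plain Scala with no Akka dependency. Everything beyond the four message names is an assumption made for illustration: `Ref` is a `String` stand-in for `ActorRef`, the subscriber is passed explicitly instead of being taken from the sender, a late subscriber gets a replayed Up, and handling of Down and cluster membership events is left out.

```scala
object DiscoverySketch {
  type Ref = String // stand-in for akka.actor.ActorRef (assumption of this sketch)

  // The four protocol messages named in the answer.
  sealed trait Msg
  final case class Register(name: String, ref: Ref) extends Msg
  final case class Subscribe(name: String) extends Msg
  final case class Up(name: String, ref: Ref) extends Msg
  final case class Down(name: String, ref: Ref) extends Msg

  // A message to deliver: (recipient, message).
  type Outgoing = (Ref, Msg)

  final case class State(
      registered: Map[String, Ref] = Map.empty,       // known named actors, local and remote
      subscribers: Map[String, Set[Ref]] = Map.empty, // local subscribers per name
      peers: Set[Ref] = Set.empty                     // DiscoveryActors on other nodes
  ) {
    private def toSubscribers(name: String, msg: Msg): List[Outgoing] =
      subscribers.getOrElse(name, Set.empty[Ref]).toList.map(_ -> msg)

    // A local actor registered: tell local subscribers and every peer DiscoveryActor.
    def onRegister(name: String, ref: Ref): (State, List[Outgoing]) = {
      val up: Msg = Up(name, ref)
      (copy(registered = registered.updated(name, ref)),
        toSubscribers(name, up) ++ peers.toList.map(_ -> up))
    }

    // An Up arrived from a peer: remember it and tell local subscribers only.
    def onPeerUp(name: String, ref: Ref): (State, List[Outgoing]) =
      (copy(registered = registered.updated(name, ref)), toSubscribers(name, Up(name, ref)))

    // A local actor subscribed: remember it and replay Up if the name is already known.
    def onSubscribe(name: String, subscriber: Ref): (State, List[Outgoing]) = {
      val updated = subscribers.getOrElse(name, Set.empty[Ref]) + subscriber
      (copy(subscribers = subscribers.updated(name, updated)),
        registered.get(name).toList.map(ref => subscriber -> (Up(name, ref): Msg)))
    }
  }
}
```

In a real actor, each handler would run inside `receive`, and the returned list would be sent with `!` to the corresponding refs.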

Related

Akka: model of Network with actors

I have a task to model the interaction between the nodes of a network using Akka actors.
The target model contains nodes, each of which both sends messages to and receives messages from other nodes.
So with the actor approach, each actor must, before it starts working, obtain references to all the other actors it sends messages to.
The simple way would seem to be to pass these refs as constructor parameters:
val node1 = context.spawn(Node(), "node1")
val node2 = context.spawn(Node(node1), "node2")
The problem is that node1 doesn't get the ActorRef of node2 in its constructor. If there were a way to update the actor "node1" after creation with a ref to "node2", the problem would not arise. But, as I understand it, Akka does not provide a way to update actors.
The other (working) way I found is to use a special initial message.
First, the parent actor spawns the children:
case class Initialise(refs: Set[ActorRef[Node]])
val node1 = context.spawn(Node(), "node1")
val node2 = context.spawn(Node(), "node2")
Second, it sends them a message containing the set of refs to every created child actor:
val refs = Initialise(Set(node1, node2))
context.children.foreach(child => child ! refs)
Only after receiving the Initialise message does each child begin to send and receive other messages.
Is there any other way (pattern) to solve this task? I also didn't find any variants in Interaction Patterns (https://doc.akka.io/docs/akka/current/typed/interaction-patterns.html#scheduling-messages-to-self).
I also looked at EventBus and the preStart lifecycle method, but I'm not sure about them.

If you want to realize a defined topology, sending Initialise messages after all the actors are set up is likely to be the clearest way to accomplish that.
An alternative approach is to allow the set of nodes which a node is linked to to be more dynamic, with an AddLink (or whatever) message, which is sent when a node is started with links to existing nodes:
val node1 = context.spawn(Node(), "node1")
val node2 = context.spawn(Node(node1), "node2")
And when node2 starts it sends an AddLink(context.self) message to node1, since it knows about node1.
Another alternative might be for the creation of a node actor to require a reference to an actor which can answer the question "with whom should I be interlinked?" (this actor could be the parent or even the guardian actor for the ActorSystem, but even there, explicitly passing an ActorRef (rather than relying on context.parent or context.system: either of those requires an unsafeUpcast) is probably better). On startup, your nodes make an ask of that actor, which tracks pending requests, the desired topology, and which actors have made an ask (thus registering themselves) so that it replies once all of the nodes the asker is linked to exist.
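The AddLink alternative could look roughly like the sketch below. It assumes Akka Typed (akka-actor-typed) is on the classpath; `Command` and the `running` helper are names chosen here for illustration, and the node takes a set of already known peers rather than the single ref used in the snippet above.

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors

object Node {
  sealed trait Command
  // Sent by a newly started node to each node it already knows about.
  final case class AddLink(peer: ActorRef[Command]) extends Command

  // `known` holds the nodes that existed before this one was spawned.
  def apply(known: Set[ActorRef[Command]] = Set.empty): Behavior[Command] =
    Behaviors.setup { context =>
      // Announce ourselves so that the older nodes learn about this one.
      known.foreach(_ ! AddLink(context.self))
      running(known)
    }

  // The current links live in the behavior's parameter; each AddLink extends them.
  private def running(links: Set[ActorRef[Command]]): Behavior[Command] =
    Behaviors.receiveMessage {
      case AddLink(peer) => running(links + peer)
    }
}
```

With this shape the parent would spawn `Node()` for node1 and `Node(Set(node1))` for node2, and node1 picks up the link to node2 when the AddLink message arrives.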

Different use case for akka cluster aware router & akka cluster sharding?

Cluster aware router:
val router = system.actorOf(ClusterRouterPool(
  RoundRobinPool(0),
  ClusterRouterPoolSettings(
    totalInstances = 20,
    maxInstancesPerNode = 1,
    allowLocalRoutees = false,
    useRole = None
  )
).props(Props[Worker]), name = "router")
Here, we can send a message to the router, and the message will be sent to one of a series of remote routee actors.
Cluster sharding (not considering persistence):
class NewShoppers extends Actor {
  ClusterSharding(context.system).start(
    "shardshoppers",
    Props(new Shopper),
    ClusterShardingSettings(context.system),
    Shopper.extractEntityId,
    Shopper.extractShardId
  )
  def proxy = {
    ClusterSharding(context.system).shardRegion("shardshoppers")
  }
  override def receive: Receive = {
    case msg => proxy forward msg
  }
}
Here, we can send a message to the proxy, and the message will be sent to one of a series of sharded actors (a.k.a. entities).
So, my question is: it seems both methods can distribute tasks across many actors. What is the design rationale behind each of them? Which situations call for which choice?

The pool router is for when you just want to send some work to whatever node and have some processing happen; two messages sent in sequence will likely not end up in the same actor for processing.
Cluster sharding is for when each actor has a unique id of some kind, and you have too many of them to fit in one node, but you want every message with that id to always end up in the actor for that id. For example, when modelling a User as an entity, you want all commands about that user to end up with the user's actor, but you want the actor to be moved if the cluster topology changes (nodes are removed or added), and you want the actors reasonably balanced across the existing nodes.
Credit to johanandren and the linked article as basis for the following answer:
Both a router and sharding distribute work. Sharding is required if, additionally to load balancing, the recipient actors have to reliably manage state that is directly associated with the entity identifier.
To recap, the entity identifier is a key, derived from the message being sent, that determines the message's recipient actor in the cluster.
First of all, can you manage state associated with an identifier across different nodes using a consistently hashing router? A Consistent Hash router will always send messages with an equal identifier to the same target actor. The answer is: No, as explained below.
The hash-based method stops working when nodes in the cluster go Down or come Up, because this changes the associated actor for some identifiers. If a node goes down, messages that were associated with it are now sent to a different actor in the network, but that actor is not informed about the former state of the actor which it is now replacing. Likewise, if a new node comes up, it will take care of messages (identifiers) that were previously associated with a different actor, and neither the new node nor the old node is informed about this.
With sharding, on the other hand, the actors that are created are aware of the entity identifier that they manage. Sharding will make sure that there is exactly one actor managing the entity in the cluster. And it will re-create sharded actors on a different node if their parent node goes down. So using persistence they will retain their (persisted) state across nodes when the number of nodes changes. You also don't have to worry about concurrency issues if an actor is re-created on a different node thanks to Sharding. Furthermore, if a message with a new entity identifier is encountered, for which an actor does not exist yet, a new actor is created.
A consistently hashing router may still be of use for caching, because messages with the same key generally do go to the same actor. To manage a stateful entity that exists only once in the cluster, Sharding is required.
Use routers for load balancing, use Sharding for managing stateful entities in a distributed manner.
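The difference can be illustrated with a small plain-Scala sketch. It is a deliberately simplified stand-in, not Akka code: `routeByNodes` uses hash modulo the node count rather than a real consistent-hashing ring (which remaps fewer keys, but still some), and `shardId` mimics the kind of function usually supplied as the shard id extractor. Both names and the node names are made up for illustration.

```scala
object RoutingSketch {
  // Router-style: the target is derived from the key AND the current node list,
  // so adding or removing a node can silently move a key to a different target.
  def routeByNodes(entityId: String, nodes: Vector[String]): String =
    nodes(Math.floorMod(entityId.hashCode, nodes.size))

  // Sharding-style: the shard id depends only on the entity id and a fixed shard count.
  // Where a shard lives may change, but sharding tracks that and keeps one actor per entity.
  def shardId(entityId: String, numberOfShards: Int): String =
    Math.floorMod(entityId.hashCode, numberOfShards).toString
}

object RoutingDemo extends App {
  import RoutingSketch._

  val ids = (0 until 100).map(i => s"user-$i")
  val threeNodes = Vector("node-a", "node-b", "node-c")
  val fourNodes = threeNodes :+ "node-d"

  // Count how many keys would be handled by a different target after a node joins.
  val moved = ids.count(id => routeByNodes(id, threeNodes) != routeByNodes(id, fourNodes))
  println(s"keys routed to a different target after adding a node: $moved of ${ids.size}")
}
```

The keys that move are exactly the ones whose previously accumulated state is now unreachable for the router-based design.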

Akka Actor internal state during shard migration in a cluster

We are using Akka sharding to distribute our running actors across several nodes. Those actors are persistent and we keep their internal state in the database.
Now we need to give them an ActorRef to a "metrics actor" running on each node. Each actor in a shard is supposed to send telemetry data to the metrics actor, and it must choose the metrics actor that is running locally on the very same node, because the metrics actor gathers data per node.
I was thinking of creating the metrics actor in the main method (which runs initially on each node):
val mvMetrics : ActorRef = system.actorOf(MetricsActor("mv"), "mvMetrics")
and then passing that reference to the ClusterSharding initialisation as part of the actor's Props object:
ClusterSharding(system).start(
  typeName = shardName,
  entityProps = MyShardActor.props(mvMetrics),
  settings = ClusterShardingSettings(system),
  extractEntityId = idExtractor,
  extractShardId = shardResolver)
My question is: what happens if actors created this way migrate between nodes, e.g. from node A to node B? I can imagine that the migrated Props object on node B remains the same as on node A, so the ActorRef remains the same, and therefore the newly created actor would be sending metrics data to the original node A?
Thanks

How about taking advantage of ActorRef.path? Imagine that each node has its metrics actor named in a known way; an actor can then dynamically find the relevant metrics actor using the path.
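A minimal sketch of that idea with classic actors, assuming the metrics actor is created as a top-level actor named "mvMetrics" on every node, as in the question. The `Metric` message is hypothetical, since the real telemetry protocol is not shown.

```scala
import akka.actor.{Actor, ActorSelection, Props}

// Hypothetical telemetry message; the question does not show the real one.
final case class Metric(name: String, value: Long)

class MyShardActor extends Actor {
  // The path carries no address, so the selection is resolved in the local ActorSystem,
  // i.e. on whichever node this entity is currently running.
  private val metrics: ActorSelection = context.actorSelection("/user/mvMetrics")

  def receive: Receive = {
    case msg =>
      metrics ! Metric("messages-handled", 1L)
      // ... handle msg ...
  }
}

object MyShardActor {
  // No ActorRef has to be captured in the Props for this variant.
  def props: Props = Props(new MyShardActor)
}
```

If a plain ActorRef is preferred over an ActorSelection, the selection could be resolved once at startup with resolveOne and the result kept in the actor's state.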

How to assign and manage persistenceid in akka

My understanding of the persistenceId in a persistent actor in Akka is that it must have the same value upon reincarnations of the same actor in order to be able to recover state from the persistent store. How would this work in the event of a complete failure of the containing process? Does this mean that the persistenceId of an actor with a known path must be maintained in some other application specific persistent store?
I could understand how this would work with a specific actor with a known path, but how would this work with a worker actor that is controlled by a router pool?
Actors are created and stopped dynamically. How would I be able to associate a durable persistenceId with a specific actor in a router pool and recover the state of the actor in the event of a failure in the containing process?

Akka actorSelection vs actorOf Difference

Is there a difference between these two? When I do:
context.actorSelection(actorNameString)
I get an ActorSelection reference which I can resolve using resolveOne, and I get back a Future[ActorRef]. But with actorOf, I get an ActorRef immediately. Are there any other vital differences besides this?
In which use cases would I want to have the ActorRef wrapped in a Future?

actorOf is used to create new actors by supplying their Props objects.
actorSelection is a "pointer" to a path in the actor tree. By using resolveOne you get the ActorRef of an already existing actor under that path, but that ActorRef takes time to resolve, hence the Future.
Here's more detailed explanation:
http://doc.akka.io/docs/akka/snapshot/general/addressing.html
An actor reference designates a single actor and the life-cycle of the reference matches that actor’s life-cycle; an actor path represents a name which may or may not be inhabited by an actor and the path itself does not have a life-cycle, it never becomes invalid. You can create an actor path without creating an actor, but you cannot create an actor reference without creating corresponding actor.
With either approach, there is a cost associated with producing an ActorRef.
Creating top-level user actors with system.actorOf costs a lot, because it has to deal with error kernel initialization, which is itself costly. Creating an ActorRef for a child actor is cheap, which makes it suitable for a one-actor-per-task design. If an application creates a new set of actors for every request without cleaning them up, it may run out of memory even though Akka actors are cheap. Another advantage is that actorOf is immediate, as you mentioned.
In abstract terms, actorSelection with resolveOne looks up the actor tree and produces an ActorRef in a Future, because the lookup is not immediate, especially on remote systems. But it encourages reuse of existing actors. The Future abstracts the waiting time of resolving an ActorRef.

Here is a brief summary of ActorOf vs. ActorSelection; I hope it helps:
https://getakka.net/articles/concepts/addressing.html
Actor references may be looked up using the ActorSystem.ActorSelection
method. The selection can be used for communicating with said actor
and the actor corresponding to the selection is looked up when
delivering each message.
In addition to ActorSystem.actorSelection there is also
ActorContext.ActorSelection, which is available inside any actor as
Context.ActorSelection. This yields an actor selection much like its
twin on ActorSystem, but instead of looking up the path starting from
the root of the actor tree it starts out on the current actor.
Summary: ActorOf vs. ActorSelection
ActorOf only ever creates a new actor, and it creates it as a direct
child of the context on which this method is invoked (which may be any
actor or actor system). ActorSelection only ever looks up existing
actors when messages are delivered, i.e. does not create actors, or
verify existence of actors when the selection is created.
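
The contrast can be seen side by side in a small classic-Akka (JVM) sketch. It assumes akka-actor is on the classpath; the `Echo` actor, the system name "demo", and the 3-second timeout are made up for illustration.

```scala
import akka.actor.{Actor, ActorRef, ActorSelection, ActorSystem, Props}
import scala.concurrent.Future
import scala.concurrent.duration._

class Echo extends Actor {
  def receive: Receive = { case msg => sender() ! msg }
}

object ActorOfVsSelection extends App {
  val system = ActorSystem("demo")
  import system.dispatcher // ExecutionContext for the Future callback

  // actorOf creates a new actor and returns its ActorRef immediately.
  val created: ActorRef = system.actorOf(Props(new Echo), "echo")

  // actorSelection creates nothing; it only describes a path in the actor tree.
  val selection: ActorSelection = system.actorSelection("/user/echo")

  // resolveOne asks whatever lives at that path to identify itself, hence the Future.
  // The Future fails if no actor answers within the timeout.
  val resolved: Future[ActorRef] = selection.resolveOne(3.seconds)

  resolved.onComplete { result =>
    println(s"lookup result: $result")
    system.terminate()
  }
}
```

Messages can also be sent to the selection directly with `!`, in which case the lookup happens on each delivery, as the quoted summary describes.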