Let's say that I have 4 nodes (N1, N2, N3, N4) in an Akka cluster. Suppose I have an actor named A deployed on N4 (by the Akka system, transparently to the user). If I decide that I no longer need a lot of computing power, I scale the servers down to only 2 nodes, so N3 and N4 are powered down. What happens to actor A? Is it dead and must it be recreated manually by application logic? Or is it automatically recreated on another node (even with its state lost)?
If you have a regular actor on a node and you shut that node down, the actor is shut down together with its actor system. There are some tools you can use if you want a specific actor to (almost) always be alive on some node: Cluster Singleton keeps an actor alive on one node as continuously as possible without ever having multiple instances of it in the cluster, and Cluster Sharding makes it possible to keep actors alive and redistributable across the cluster, addressed by an identifier. Akka Persistence allows the state of an actor to survive being stopped on one node and started on another.
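As a rough sketch of the Cluster Singleton approach (classic APIs, matching the docs linked below; ActorA is a placeholder for your own actor class), the singleton manager keeps one instance alive somewhere in the cluster and a proxy routes messages to wherever it currently runs:

import akka.actor.{ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings, ClusterSingletonProxy, ClusterSingletonProxySettings}

val system = ActorSystem("ClusterSystem")

// keeps exactly one instance of ActorA running on one of the cluster nodes
system.actorOf(
  ClusterSingletonManager.props(
    singletonProps = Props[ActorA],
    terminationMessage = PoisonPill,
    settings = ClusterSingletonManagerSettings(system)),
  name = "actorA")

// the proxy resolves the current location of the singleton for the caller
val proxy = system.actorOf(
  ClusterSingletonProxy.props(
    singletonManagerPath = "/user/actorA",
    settings = ClusterSingletonProxySettings(system)),
  name = "actorAProxy")

proxy ! "some message"

Note that the singleton keeps the actor alive but not its state; combining it with Akka Persistence is what lets the state survive the move to another node.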
Read more about all of this in the docs; I really recommend reading the general sections on what Akka Cluster is to get a firm understanding before starting to use it: http://doc.akka.io/docs/akka/2.4.0/scala/index-network.html
Related
I have two actors: the first already in the cluster (all on localhost) at port 25457, and the second to be started at port 25458.
Both have the following behaviour:
val addressBookKey = ServiceKey[Message]("address_book")
val listingResponseAdapter = ctx.messageAdapter[Receptionist.Listing] {
  case addressBookKey.Listing(p) => OnlinePlayers(p)
}

Cluster(ctx.system).manager ! Join(address)
ctx.system.receptionist ! Register(addressBookKey, ctx.self)
ctx.system.receptionist ! Subscribe(addressBookKey, listingResponseAdapter)

Behaviors.receiveMessagePartial {
  case m =>
    System.err.println(m)
    Behaviors.same
}
When the second actor joins, stderr prints Set(), then Set(Actor[akka://system/user#0]), and then Set(Actor[akka://system/user#0], Actor[akka://system#localhost:27457/user#0]).
When the second actor leaves, the first actor prints Set(Actor[akka://system/user#0]) twice.
How can the second actor directly receive all cluster participants?
Why does the first actor print the set twice after the second leaves?
Thanks
Joining the cluster is an async process: you have only triggered joining by sending Join to the manager, and actually joining the cluster happens at some point after that. The receptionist can only know about registered services on nodes that have completed joining the cluster.
This means that when you subscribe to the receptionist, joining the cluster has likely not completed yet, so you get the locally registered services (because of ordering guarantees the receptionist will always get the local register message before it receives the subscribe); then, once joining the cluster completes, the receptionist learns about services on other nodes and the subscriber is updated.
To be sure other nodes are known, you would have to delay the subscription until the node has joined the cluster. This can be achieved by subscribing to cluster state and only registering/subscribing with the receptionist after the node itself has been marked as Up.
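A minimal sketch of that approach, reusing the protocol from the question (Message, OnlinePlayers) plus a hypothetical NodeIsUp case added to it:

import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.cluster.typed.{Cluster, Join, SelfUp, Subscribe}

def behavior(address: akka.actor.Address): Behavior[Message] =
  Behaviors.setup { ctx =>
    val addressBookKey = ServiceKey[Message]("address_book")
    val listingResponseAdapter = ctx.messageAdapter[Receptionist.Listing] {
      case addressBookKey.Listing(p) => OnlinePlayers(p)
    }
    // adapt the cluster's SelfUp event into the actor's own protocol
    val selfUpAdapter = ctx.messageAdapter[SelfUp](_ => NodeIsUp)

    val cluster = Cluster(ctx.system)
    cluster.manager ! Join(address)
    cluster.subscriptions ! Subscribe(selfUpAdapter, classOf[SelfUp])

    Behaviors.receiveMessagePartial {
      case NodeIsUp =>
        // joining has completed, so the receptionist will also learn about other nodes
        ctx.system.receptionist ! Receptionist.Register(addressBookKey, ctx.self)
        ctx.system.receptionist ! Receptionist.Subscribe(addressBookKey, listingResponseAdapter)
        Behaviors.same
      case OnlinePlayers(players) =>
        System.err.println(players)
        Behaviors.same
    }
  }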
In general it is often good to make something that works regardless of when the node joins the cluster, as that makes it easier to test and to run the same component without a cluster. For example, switch the behaviour of the subscribing actor depending on whether there are no registered services, at least one, or some minimum count of services for the service key.
I'm not entirely sure why you see the duplicated update when the actor on the other node "leaves", but there are some specifics around the CRDT used for the receptionist registry where it may have to re-remove a service for consistency; that could perhaps explain it.
I've recently been doing some experiments on the behavior of Vert.x and verticles in HA mode. I observed some weaknesses in how Vert.x distributes the load across nodes.
1. One node in a cluster crashes
Imagine a configuration with a cluster of some Vert.x nodes (say 4 or 5, 10, whatever), each having some hundreds or thousands of verticles. If one node crashes, only one of the remaining nodes will restart all the verticles that had been deployed on the crashed node. Moreover, there is no guarantee that it will be the node with the smallest number of deployed verticles. This is unfair, and in the worst case the same node would get all the verticles from nodes that have crashed before, probably leading to a domino-crash scenario.
2. Adding a node to a heavily loaded cluster
Adding a node to a heavily loaded cluster doesn't help reduce the load on the other nodes. Existing verticles are not redistributed to the new node, and new verticles are created on the node that invokes vertx.deployVerticle().
While the first point allows, within some limits, high availability, the second point breaks the promise of simple horizontal scalability.
I may very possibly be wrong: I may have misunderstood something, or my configuration may be faulty. This question is about confirming this behavior, getting your advice about how to cope with it, or pointing out my errors. Thanks in advance for your feedback.
This is how I create my vertx object:
VertxOptions opts = new VertxOptions()
    .setHAEnabled(true);

// start Vert.x in cluster mode
Vertx.clusteredVertx(opts, vx_ar -> {
    if (vx_ar.failed()) {
        ...
    }
    else {
        vertx = vx_ar.result();
        ...
    }
});
and this is how I create my verticles:
DeploymentOptions depOpt = new DeploymentOptions()
    .setInstances(1)
    .setConfig(prm)
    .setHa(true);

// deploy the verticle
vertx.deployVerticle("MyVerticle", depOpt, ar -> {
    if (ar.succeeded()) {
        ...
    }
    else {
        ...
    }
});
EDIT on 12/25/2019: After reading Alexey's comments, I believe I probably wasn't clear.
By "promise of simple horizontal scalability" I didn't mean that redistributing the load upon insertion of a new node is simple. I meant Vert.x's promise to the developer that what he needs to do to make his application scale horizontally would be simple. Scale is the very first argument on the Vert.x home page, but, you're right, after re-reading carefully there's nothing about horizontal scaling onto newly added nodes. I believe I was too much influenced by Elixir or Erlang. Maybe Akka provides this on the JVM, but I didn't try.
Regarding the second comment, it's not (only) about the number of requests per second. The load I'm considering here is just the number of verticles that are doing nothing but waiting for a message. In a further experiment I will make these verticles do some work and send an update. For the time being, imagine long-lived verticles that represent actually connected user sessions held in memory on a backend. The system runs on 3 (or whatever number of) clustered nodes, each hosting a few thousand (or more) sessions/verticles. From this state, I added a new node and waited until it was fully integrated in the cluster. Then I killed one of the first 3 nodes. All verticles were restarted fine, but only on one node, which, moreover, is not guaranteed to be the "empty" one. The destination node actually seems to be chosen at random: I ran several tests and even observed verticles from all killed nodes being restarted on the same node. On a real platform with sufficient load, that would probably lead to a global crash.
I believe that implementing a fair restart of verticles in Vert.x, i.e. distributing the verticles across all remaining nodes based on some measure of their load (CPU, RAM, number of verticles, ...), would be simpler (not simple) than redistributing the load onto a newly inserted node, as the latter would probably require a scheduler capable of "stealing" verticles from another one.
Yet, on a production system, not being "protected" by some kind of fair distribution of workload across the cluster may lead to big issues, and as Vert.x is quite mature I was surprised by the outcome of my experiments, thus thinking I was doing something wrong.
I am building an open-source distributed economic simulation platform using Akka (especially the remote and cluster packages). A key bottleneck in such simulations is the fact that communication patterns between actors evolve over the course of the simulation and often actors will end up sending loads of messages over the wire between nodes in the cluster.
I am looking for a mechanism to detect actors on some node that are communicating heavily with actors on some other node and move them to that other node. Is this possible using existing Akka cluster sharding functionality? Perhaps this is what Roland Kuhn meant by "automatic actor tree partitioning" in his answer to this SO question.
Moving shards around according to your own logic is doable by implementing a custom ShardAllocationStrategy.
You just have to extend ShardAllocationStrategy and implement these two methods:
def allocateShard(
    requester: ActorRef,
    shardId: ShardId,
    currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]]): Future[ActorRef]

def rebalance(
    currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]],
    rebalanceInProgress: Set[ShardId]): Future[Set[ShardId]]
The first one determines which region will be chosen when allocating a new shard, and provides you with the shards already allocated. The second one is called regularly and lets you control which shards to rebalance to another region (for example, if they have become too unbalanced).
Both functions return a Future, which means that you can even query another actor to get the information you need (for example, an actor that has the affinity information between your actors).
For the affinity itself, I think you have to implement something yourself. For example, actors could collect statistics about their sender nodes and post that regularly to a cluster singleton that would determine which actors should be moved to the same node.
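As a minimal sketch (not affinity-aware; a real implementation would plug in the statistics described above, and LeastShardsAllocationStrategy is just a made-up name), a custom strategy could allocate new shards to the region with the fewest shards and skip rebalancing:

import scala.collection.immutable
import scala.concurrent.Future
import akka.actor.ActorRef
import akka.cluster.sharding.ShardCoordinator.ShardAllocationStrategy
import akka.cluster.sharding.ShardRegion.ShardId

class LeastShardsAllocationStrategy extends ShardAllocationStrategy {

  override def allocateShard(
      requester: ActorRef,
      shardId: ShardId,
      currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]]): Future[ActorRef] = {
    // pick the region that currently hosts the fewest shards
    val (region, _) = currentShardAllocations.minBy { case (_, shards) => shards.size }
    Future.successful(region)
  }

  override def rebalance(
      currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]],
      rebalanceInProgress: Set[ShardId]): Future[Set[ShardId]] =
    // never rebalance in this sketch; return shard ids here to move them elsewhere
    Future.successful(Set.empty)
}

The strategy can then be passed to ClusterSharding(system).start, which has an overload that accepts an allocation strategy together with a hand-off stop message.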
Cluster-aware router:
val router = system.actorOf(
  ClusterRouterPool(
    RoundRobinPool(0),
    ClusterRouterPoolSettings(
      totalInstances = 20,
      maxInstancesPerNode = 1,
      allowLocalRoutees = false,
      useRole = None
    )
  ).props(Props[Worker]),
  name = "router")
Here we can send a message to the router, and it will be sent to one of a set of remote routee actors.
Cluster sharding (persistence not considered):
class NewShoppers extends Actor {
  ClusterSharding(context.system).start(
    "shardshoppers",
    Props(new Shopper),
    ClusterShardingSettings(context.system),
    Shopper.extractEntityId,
    Shopper.extractShardId
  )

  def proxy = ClusterSharding(context.system).shardRegion("shardshoppers")

  override def receive: Receive = {
    case msg => proxy forward msg
  }
}
Here we can send a message to the proxy, and the message will be sent to one of the sharded actors (a.k.a. entities).
So my question is: it seems both methods can distribute tasks across a lot of actors. What is the design rationale behind the two? Which situation calls for which choice?
The pool router is for when you just want to send some work to whatever node and have some processing happen; two messages sent in sequence will likely not end up in the same actor for processing.
Cluster sharding is for when you have a unique id of some kind on each actor, and you have too many of them to fit on one node, but you want every message with that id to always end up in the actor for that id. For example, modelling a User as an entity: you want all commands about that user to end up with that user's actor, but you want the actor to be moved if the cluster topology changes (nodes removed or added), and you want the actors reasonably balanced across the existing nodes.
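For example, a hypothetical User entity sharded by its user id could extract entity and shard ids roughly like this (UserCommand and the shard count are made up for the sketch):

import akka.cluster.sharding.ShardRegion

final case class UserCommand(userId: String, payload: String)

val numberOfShards = 100

// the entity id routes the command to the one actor managing that user
val extractEntityId: ShardRegion.ExtractEntityId = {
  case cmd @ UserCommand(userId, _) => (userId, cmd)
}

// the shard id groups entities into shards that can be moved between nodes
val extractShardId: ShardRegion.ExtractShardId = {
  case UserCommand(userId, _) => (math.abs(userId.hashCode) % numberOfShards).toString
}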
Credit to johanandren and the linked article as basis for the following answer:
Both a router and sharding distribute work. Sharding is required if, in addition to load balancing, the recipient actors have to reliably manage state that is directly associated with the entity identifier.
To recap: the entity identifier is a key, derived from the message being sent, that determines the message's recipient actor in the cluster.
First of all, can you manage state associated with an identifier across different nodes using a consistent hashing router? A consistent hashing router will always send messages with an equal identifier to the same target actor. The answer is no, as explained below.
The hash-based method stops working when nodes in the cluster go down or come up, because this changes the associated actor for some identifiers. If a node goes down, messages that were associated with it are now sent to a different actor in the network, but that actor is not informed about the former state of the actor it is now replacing. Likewise, if a new node comes up, it will take care of messages (identifiers) that were previously associated with a different actor, and neither the new node nor the old node is informed about this.
With sharding, on the other hand, the actors that are created are aware of the entity identifier that they manage. Sharding will make sure that there is exactly one actor managing the entity in the cluster. And it will re-create sharded actors on a different node if their parent node goes down. So using persistence they will retain their (persisted) state across nodes when the number of nodes changes. You also don't have to worry about concurrency issues if an actor is re-created on a different node thanks to Sharding. Furthermore, if a message with a new entity identifier is encountered, for which an actor does not exist yet, a new actor is created.
A consistent hashing router may still be of use for caching, because messages with the same key generally do go to the same actor. To manage a stateful entity that exists only once in the cluster, Sharding is required.
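For the caching use case, a consistent hashing pool could look roughly like this (CacheActor, Get and Put are made-up names for the sketch):

import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.ConsistentHashingPool
import akka.routing.ConsistentHashingRouter.ConsistentHashMapping

final case class Get(key: String)
final case class Put(key: String, value: String)

class CacheActor extends Actor {
  private var store = Map.empty[String, String]
  def receive: Receive = {
    case Put(k, v) => store += (k -> v)
    case Get(k)    => sender() ! store.get(k)
  }
}

// messages with the same key are hashed to the same routee
val hashMapping: ConsistentHashMapping = {
  case Get(key)    => key
  case Put(key, _) => key
}

val system = ActorSystem("cache")
val cacheRouter = system.actorOf(
  ConsistentHashingPool(nrOfInstances = 10, hashMapping = hashMapping)
    .props(Props[CacheActor]),
  name = "cacheRouter")

cacheRouter ! Put("user-42", "cached value")
cacheRouter ! Get("user-42") // usually served by the routee that stored it

In a cluster, such a pool would typically be wrapped in a ClusterRouterPool as in the earlier example, but the hash-based routing itself gives no guarantee once the set of routees changes.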
Use routers for load balancing, use Sharding for managing stateful entities in a distributed manner.
When I create a new Service Fabric actor, the underlying (auto-generated) actor service is configured to use 10 partitions.
I'm wondering how much I need to care about this value?
In particular, I wonder whether the Actor Runtime has support for changing the number of partitions of an actor service on a running cluster.
The Partition Service Fabric reliable services topic says:
In rare cases, you may end up needing more partitions than you have initially chosen. As you cannot change the partition count after the fact, you would need to apply some advanced partition approaches, such as creating a new service instance of the same service type. You would also need to implement some client-side logic that routes the requests to the correct service instance, based on client-side knowledge that your client code must maintain.
However, due to the nature of Actors and the fact that they are managed by the Actor Runtime, I'm tempted to believe that it would indeed be possible to do this, that is, that the Actor Runtime would be able to take care of all the heavy lifting required to re-partition actor instances.
Is that at all possible?
The number of partitions in a running service cannot be changed. This is true of Actors as well as Reliable Services. Typically, you would want to pick a large number of partitions (more than the number of nodes) up front and then scale out the number of nodes in the cluster instead of trying to repartition your data on the fly. Take a look at Abhishek and Matthew's comments in the discussion here for some ideas on how to estimate how many partitions you might need.