service fabric - is stateful service single instance per partition - azure-service-fabric

I am trying to digest the Service fabric architectural patterns and its best practices.
use case:
I define a stateful service with 26 partitions, and in each partition I am storing words that are with the same first letter.
1) Does this means that I actually have 26 instances of my stateful service?
2) When outside of the stateful service, i.e in the caller - I am constructing a URI for my service fabric client, specifying the partition ID I want the client to operate on. Does this mean that once I am in the context of the stateful service (i.e service client instantiaded and called the stateful service) - I cannot reference other partitions?
3) Is it true to say that a stateful service is a unit of work that needs to know which partition to operate on, and cannot make a decision on its own? Here I am referring to the many examples where inside the RunAsync method of a stateful service, there are calls to the underlying reliable store, for example, the code taken from this post:
protected override async Task RunAsync(CancellationToken cancelServicePartitionReplica)
{
var myDictionary = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, int>> ("myDictionary");
var partition = base.ServicePartition.PartitionInfo.Id;
byte append = partition.ToByteArray()[0];
while (!cancelServicePartitionReplica.IsCancellationRequested)
{
// Create a transaction to perform operations on data within this partition's replica.
using (var tx = this.StateManager.CreateTransaction())
{
var result = await myDictionary.TryGetValueAsync(tx, "A");
await myDictionary.AddOrUpdateAsync(tx, "A", 0, (k, v) => v + append);
ServiceEventSource.Current.ServiceMessage(this,
$"Append {append}: {(result.HasValue ? result.Value : -1)}");
await tx.CommitAsync();
}
// Pause for 1 second before continue processing.
await Task.Delay(TimeSpan.FromSeconds(3), cancelServicePartitionReplica);
}
}
So, probably my statement 3) is wrong - A stateful service may call its internal storage without someone (a service client) to call it externaly and to supply information for the exact partition. But then, how the code above decides into which partition to put its data? And most importantly, how to later query that data via a service client which should provide an exact partition ID?

Stateful service 'instances' are actually replicas. You configure how many replicas you have for every partition (for performance, scaling, high availability & disaster recovery). Only one replica (primary) does writes. All replicas (secondaries and primary) may be used for reads. A replica contains a shard of your data set.
Data in partition 1 is not shared with partition 2.
Clients calling Stateful services need to decide for themselves with which partition they want to communicate. Services can only read/write in their own partition (directly).
More info here.

Related

What are the limits on actorevents in service fabric?

I am currently testing the scaling of my application and I ran into something I did not expect.
The application is running on a 5 node cluster, it has multiple services/actortypes and is using a shared process model.
For some component it uses actor events as a best effort pubsub system (There are fallbacks in place so if a notification is dropped there is no issue).
The problem arises when the number of actors grows (aka subscription topics). The actorservice is partitioned to 100 partitions at the moment.
The number of topics at that point is around 160.000 where each topic is subscribed 1-5 times (nodes where it is needed) with an average of 2.5 subscriptions (Roughly 400k subscriptions).
At that point communications in the cluster start breaking down, new subscriptions are not created, unsubscribes are timing out.
But it is also affecting other services, internal calls to a diagnostics service are timing out (asking each of the 5 replicas), this is probably due to the resolving of partitions/replica endpoints as the outside calls to the webpage are fine (these endpoints use the same technology/codestack).
The eventviewer is full with warnings and errors like:
EventName: ReplicatorFaulted Category: Health EventInstanceId {c4b35124-4997-4de2-9e58-2359665f2fe7} PartitionId {a8b49c25-8a5f-442e-8284-9ebccc7be746} ReplicaId 132580461505725813 FaultType: Transient, Reason: Cancelling update epoch on secondary while waiting for dispatch queues to drain will result in an invalid state, ErrorCode: -2147017731
10.3.0.9:20034-10.3.0.13:62297 send failed at state Connected: 0x80072745
Error While Receiving Connect Reply : CannotConnect , Message : 4ba737e2-4733-4af9-82ab-73f2afd2793b:382722511 from Service 15a5fb45-3ed0-4aba-a54f-212587823cde-132580461224314284-8c2b070b-dbb7-4b78-9698-96e4f7fdcbfc
I've tried scaling the application but without this subscribe model active and I easily reach a workload twice as large without any issues.
So there are a couple of questions
Are there limits known/advised for actor events?
Would increasing the partition count or/and node count help here?
Is the communication interference logical? Why are other service endpoints having issues as well?
After time spent with the support ticket we found some info. So I will post my findings here in case it helps someone.
The actor events use a resubscription model to make sure they are still connected to the actor. Default this is done every 20 seconds. This meant a lot of resources were being used and eventually the whole system overloaded with loads of idle threads waiting to resubscribe.
You can decrease the load by setting resubscriptionInterval to a higher value when subscribing. The drawback is that it will also mean the client will potentially miss events in the mean time (if a partition is moved).
To counteract the delay in resubscribing it is possible to hook into the lower level service fabric events. The following psuedo code was offered to me in the support call.
Register for endpoint change notifications for the actor service
fabricClient.ServiceManager.ServiceNotificationFilterMatched += (o, e) =>
{
var notification = ((FabricClient.ServiceManagementClient.ServiceNotificationEventArgs)e).Notification;
/*
* Add additional logic for optimizations
* - check if the endpoint is not empty
* - If multiple listeners are registered, check if the endpoint change notification is for the desired endpoint
* Please note, all the endpoints are sent in the notification. User code should have the logic to cache the endpoint seen during susbcription call and compare with the newer one
*/
List<long> keys;
if (resubscriptions.TryGetValue(notification.PartitionId, out keys))
{
foreach (var key in keys)
{
// 1. Unsubscribe the previous subscription by calling ActorProxy.UnsubscribeAsync()
// 2. Resubscribe by calling ActorProxy.SubscribeAsync()
}
}
};
await fabricClient.ServiceManager.RegisterServiceNotificationFilterAsync(new ServiceNotificationFilterDescription(new Uri("<service name>"), true, true));
Change the resubscription interval to a value which fits your need.
Cache the partition id to actor id mapping. This cache will be used to resubscribe when the replica’s primary endpoint changes(ref #1)
await actor.SubscribeAsync(handler, TimeSpan.FromHours(2) /*Tune the value according to the need*/);
ResolvedServicePartition rsp;
((ActorProxy)actor).ActorServicePartitionClientV2.TryGetLastResolvedServicePartition(out rsp);
var keys = resubscriptions.GetOrAdd(rsp.Info.Id, key => new List<long>());
keys.Add(communicationId);
The above approach ensures the below
The subscriptions are resubscribed at regular intervals
If the primary endpoint changes in between, actorproxy resubscribes from the service notification callback
This ends the psuedo code form the support call.
Answering my original questions:
Are there limits known/advised for actor events?
No hard limits, only resource usage.
Would increasing the partition count or/and node count help here? Partition count not. node count maybe, only if that means there are less subscribing entities on a node because of it.
Is the communication interference logical? Why are other service endpoints having issues as well?
Yes, resource contention is the reason.

Stateful or Stateless service for processing servicebus queues

I have a Session enabled Azure servicebus queue. I need some form of service that can read from the queue and process them and save the result (in memory for later retrieval). We are using azure servicefabric in our current architecture. I got few questions regarding which one to choose Stateful or Stateless service.
If I use Stateful service, then based on the documentation my understanding is, service will be running on 1 primary node (assuming 1 partition) and 2 active secondary nodes. That means, if I have a 10 node Service fabric cluster, then this stateful service will be utilizing only one node (VM) primarily.
So if I add a listener to this stateful service to read messages from Queues then that service on primary node will read messages from queues and all other remaining 9 nodes wont be able to utilized. Is this correct?
Whereas if I use Stateless service, I can create instances on all 10 nodes and all of them could listen to the message in Queues and process them in parallel. However, I will loose the option to save the results.
Please advise.
So if I add a listener to this stateful service to read messages from Queues then that service on primary node will read messages from queues and all other remaining 9 nodes wont be able to utilized. Is this correct?
That is correct. With stateful service scenario, only the primary replica will have it's listener executed, and work will be done. Other replicas can be used in read-only mode, but they would not be writing anything into reliable collections.
Whereas if I use Stateless service, I can create instances on all 10 nodes and all of them could listen to the message in Queues and process them in parallel.
Exactly. Stateless services can perform their work in parallel and no state is persisted. That's also the reason whey there's no reliable collection available for this Service Fabric model.
However, I will loose the option to save the results.
Not necessarily true. You could still save your data in a centralized/shared DB, just like you'd do with stateless solutions in the past (for example Cloud Services, or a Azure WebApp).
What you should ask yourself is what problem are you solving. If you have data sharding, the Statful makes more sense. If you don't have data sharding and/or you need to scale out your processing power, rather that scale up, Stateless is a better approach.

Different use case for akka cluster aware router & akka cluster sharding?

Cluster aware router:
val router = system.actorOf(ClusterRouterPool(
RoundRobinPool(0),
ClusterRouterPoolSettings(
totalInstances = 20,
maxInstancesPerNode = 1,
allowLocalRoutees = false,
useRole = None
)
).props(Props[Worker]), name = "router")
Here, we can send message to router, the message will send to a series of remote routee actors.
Cluster sharding (Not consider persistence)
class NewShoppers extends Actor {
ClusterSharding(context.system).start(
"shardshoppers",
Props(new Shopper),
ClusterShardingSettings(context.system),
Shopper.extractEntityId,
Shopper.extractShardId
)
def proxy = {
ClusterSharding(context.system).shardRegion("shardshoppers")
}
override def receive: Receive = {
case msg => proxy forward msg
}
}
Here, we can send message to proxy, the message will send to a series of sharded actors (a.k.a. entities).
So, my question is: it seems both 2 methods can make the tasks distribute to a lot of actors. What's the design choice of above two? Which situation need which choice?
The pool router would be when you just want to send some work to whatever node and have some processing happen, two messages sent in sequence will likely not end up in the same actor for processing.
Cluster sharding is for when you have a unique id on each actor of some kind, and you have too many of them to fit in one node, but you want every message with that id to always end up in the actor for that id. For example modelling a User as an entity, you want all commands about that user to end up with the user but you want the actor to be moved if the cluster topology changes (remove or add nodes) and you want them reasonably balanced across the existing nodes.
Credit to johanandren and the linked article as basis for the following answer:
Both a router and sharding distribute work. Sharding is required if, additionally to load balancing, the recipient actors have to reliably manage state that is directly associated with the entity identifier.
To recap, the entity identifier is a key, derived from the message being sent, determining the message's receipient actor in the cluster.
First of all, can you manage state associated with an identifier across different nodes using a consistently hashing router? A Consistent Hash router will always send messages with an equal identifier to the same target actor. The answer is: No, as explained below.
The hash-based method stops working when nodes in the cluster go Down or come Up, because this changes the associated actor for some identifiers. If a node goes down, messages that were associated with it are now sent to a different actor in the network, but that actor is not informed about the former state of the actor which it is now replacing. Likewise, if a new node comes up, it will take care of messages (identifiers) that were previously associated with a different actor, and neither the new node or the old node are informed about this.
With sharding, on the other hand, the actors that are created are aware of the entity identifier that they manage. Sharding will make sure that there is exactly one actor managing the entity in the cluster. And it will re-create sharded actors on a different node if their parent node goes down. So using persistence they will retain their (persisted) state across nodes when the number of nodes changes. You also don't have to worry about concurrency issues if an actor is re-created on a different node thanks to Sharding. Furthermore, if a message with a new entity identifier is encountered, for which an actor does not exist yet, a new actor is created.
A consistently hashing router may still be of use for caching, because messages with the same key generally do go to the same actor. To manage a stateful entity that exists only once in the cluster, Sharding is required.
Use routers for load balancing, use Sharding for managing stateful entities in a distributed manner.

How to run something on each node in service fabric

In a service fabric application, using Actors or Services - what would the design be if you wanted to make sure that your block of code would be run on each node.
My first idea would be that it had to be a Service with instance count set to -1, but also in cases that you had set to to 3 instances. How would you make a design where the service ensured that it ran some operation on each instance.
My own idea would be having a Actor with state controlling the operations that need to run, and it would itterate over services using serviceProxy to call methods on each instance - but thats just a naive idea for which I dont know if its possible or if it is the proper way to do so?
Some background info
Only Stateless services can be given a -1 for instance count. You can't use a ServiceProxy to target a specific instance.
Stateful services are deployed using 1 or more partitions (data shards). Partition count is configured in advance, as part of the service deployment and can't be changed automatically. For instance if your cluster is scaled out, partitions aren't added automatically.
Autonomous workers
Maybe you can invert the control flow by running Stateless services (on all nodes) and have them query a 'repository' for work items. The repository could be a Stateful service, that stores work items in a Queue.
This way, adding more instances (scaling out the cluster) increases throughput without code modification. The stateless service instances become autonomous workers.
(opposed to an intelligent orchestrator Actor)

How can I reach a specific replica of a stateless service

I've created a stateless service within Service Fabric. It has a SingletonPartition, but multiple instances (InstanceCount is -1 in my case).
I want to communicate with a specific replica of this service. To find all replica's I use:
var fabricClient = new FabricClient();
var serviceUri = new Uri(SERVICENAME);
Partition partition = (await fabricClient.QueryManager.GetPartitionListAsync(serviceUri)).First();
foreach(Replica replica in await fabricClient.QueryManager.GetReplicaListAsync(partition.PartitionInformation.Id))
{
// communicate with this replica, but how to construct the proxy?
//var eventHandlerServiceClient = ServiceProxy.Create<IService>(new Uri(replica.ReplicaAddress));
}
The problem is that there is no overload of the ServiceProxy to create one to the replica. Is there another way to communicate with a specific replica?
Edit
The scenario we are building is the following. We have different moving parts with counter information: 1 named partitioned stateful service (with a couple of hundred partitions), 1 int64 partitioned stateful service, and 1 actor with state. To aggregate the counter information, we need to reach out to all service-partitions and actor-instances.
We could of course reverse it and let everyone send there counts to a single (partitioned) service. But that would add a network call in the normal flow (and thus overhead).
Instead, we came up with the following. The mentioned services&actors are combined into one executable and one servicemanifest. Therefore they are in the same process. We add a stateless service with instancecount -1 to the mentioned services&actors. All counter information is stored inside a static variable. The stateless service can read this counter information.
Now, we only need to reach out to the stateless service (which has an upper limit of the number of nodes).
Just to get some terminology out of the way first, "replica" only applies to stateful services where you have a unique replica set for each partition of a service and replicate state between them for HA. Stateless services just have instances, all of which are equal and identical.
Now to answer your actual question: ServiceProxy doesn't have an option to connect to a specific instance of a deployed stateless service. You have the following options:
Primary replica: connect to the primary replica of a stateful service partition.
Random instance: connect to a random instance of a stateless service.
Random replica: connect to a random replica - regardless of its role - of a stateful service partition.
Random secondary replica - connect to a random secondary replica of a stateful service partition.
E.g.:
ServiceProxy.Create<IMyService>(serviceUri, partitionKey, TargetReplicaSelector.RandomInstance)
So why no option to connect to a specific stateless service instance?
Well, I would turn this question around and ask why would you want to connect to a specific stateless service instance? By definition, each stateless instance should be identical. If you are keeping some state in there - like user sessions - then now you're stateful and should use stateful services.
You might think of intelligently deciding which instance to connect to for load balancing, but again since it's stateless, no instance should be doing more work than any other as long as requests are distributed evenly. And for that, Service Proxy has the random distribution option.
With that in mind, if you still have some reason to seek out specific stateless service instances, you can always use a different communication stack - like HTTP - and do whatever you want.
"Well, I would turn this question around and ask why would you want to connect to a specific stateless service instance?"
One example would be if you have multiple (3x) stateless service instances all having WebSocket connections to different clients, let's say 500 each. And you want to notify all 1500 (500x3) users of the same message, if it was possible to connect directly to a specific instance (which I would expect was possible, since I can query for those instances using the FabricClient), I could send a message to each instance which would redirect it to all connected clients.
Instead we have to come up with any of multiple workarounds:
Have all instances connect to some evented system that allows them to trigger on incoming message, e.g. Azure Event Hubs, Azure Service Bus, RedisCache.
Host an additional endpoint, as mentioned here, which makes it 3 endpoints pr service instance: WCF, WebSocket, HTTP.
Change to a stateful partitioned service which doesn't hold any state or any replicas, but simply allows to call partitions.
Currently having some serious issues with RedisCache so migrating away from that, and would like to avoid external dependencies such as Event Hubs and Service Bus just for this scenario.
Sending many messages each second, which will give additional overhead when having to call HTTP, and then the request need to transition over to the WebSocket context.
In order to target a specific instance of stateless service you can use named partitions. You can have a single instance per partition and use multiple Named partitions. For example, you can have 5 named partitions [0,1,2,3,4] each will have only one instance of the "service". Then you can call it like this
ServiceProxy.Create<IMyService>(serviceUri, partitionKey, TargetReplicaSelector.RandomInstance)
where partitionKey parameter will have one of values [0,1,2,3,4].
the real example would be
_proxyFactory.CreateServiceProxy<IMyService>(
_myServiceUri,
new ServicePartitionKey("0"), // One of "0,1,2,3,4"
TargetReplicaSelector.Default,
MyServiceEndpoints.ServiceV1);
This way you can choose one of 5 instances. But all 5 instancies may not be always available. For example during startup or when the service dies and SF is recreating or it is in InBuild stage... So for this reason you should run Partition discovery