I have a Stateless service which acts as a Gateway for all requests into my 5-node cluster. This service forwards requests on to services within the cluster:
protected virtual async Task<ResolvedServicePartition> FindPartitionAsync(long key = 0)
{
    var resolver = ServicePartitionResolver.GetDefault();
    var result = await resolver.ResolveAsync(FullServiceName, ServicePartitionKey.Singleton, CancellationToken.None).ConfigureAwait(false);
    return result;
}

private async Task<string> EstablishProxyUrlAsync(string method, long key = 0)
{
    var partition = await FindPartitionAsync(key).ConfigureAwait(false);
    if (key != 0)
    {
        Log.Information($"{this.GetType().Name} method {method} request resolved by partition {partition.Info.Id}");
    }

    var endpoints = JObject.Parse(partition.GetEndpoint().Address)["Endpoints"];
    var address = endpoints[""].ToString().TrimEnd('/');
    var proxyUrl = $"{address}/api/{Area}/{method}";
    return proxyUrl;
}
I have a suspicion that if I have a service, TestService, that runs on all 5 nodes of my cluster, the code above ignores the load balancer, so the request simply goes to the instance on the node that picked up the request.
Is there any way to fix this?
Do I need to implement my own load balancer, then? All calls from the outside come into the gateway, as that seemed to be the recommended approach, i.e. a single point of entry. However, it appears as though that concept is now going to slow things down and put more load on a specific node, as there is no load balancer to pick the best node. E.g. if I have a gateway method GetCars which calls GetCars on a stateless service that runs on all 5 nodes, I want a way of load balancing across those nodes, rather than all requests going to the local instance.
Paul
I'd expect the resolved endpoint address to contain the internal IP address of the node that hosts the primary replica of TestService and the port that its listener uses.
For a stateful service, this can only ever be a single endpoint.
For a Singleton service, you'll get a cached result from the ServicePartitionResolver.
You can force a refresh by using the resolver.ResolveAsync() overload that takes the earlier ResolvedServicePartition.
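For example, a minimal sketch (reusing the resolver and partition variables from the question's code):

// Re-resolve, passing the previously returned ResolvedServicePartition so the
// resolver goes back to the naming service instead of returning the cached entry.
var refreshed = await resolver.ResolveAsync(partition, CancellationToken.None).ConfigureAwait(false);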
Also, as internal calls are not made over the internet, the call will not be passing through the (Azure) load balancer.
Added more info:
Likely you'll run the gateway on all nodes. If not, make sure you do that. As every gateway has its own resolver that resolves to a 'random' instance, you should see that load will then be spread across the downstream services automatically.
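If you want the gateway to spread the calls itself, here is a rough sketch (reusing the JObject pattern from the question; for a stateless service the ResolvedServicePartition.Endpoints collection should contain one entry per instance):

// Pick a random instance endpoint from the resolved partition instead of always
// taking GetEndpoint(), so successive requests land on different instances.
var allEndpoints = partition.Endpoints.ToList();
var chosen = allEndpoints[new Random().Next(allEndpoints.Count)];
var address = JObject.Parse(chosen.Address)["Endpoints"][""].ToString().TrimEnd('/');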
P.S. have a look at Traefik, it could help solve this problem for you without you having to build a solid reverse proxy.
more info here and here
We are using Hazelcast 4 to implement distributed locking across two pods on Kubernetes.
We have developed a distributed application; two pods of the microservice have been created. Both instances are auto-discovered and form a cluster of members.
We are trying to use the IMap.lock(key) method to achieve distributed locking across the two pods; however, both pods acquire the lock at the same time and therefore execute the business logic concurrently. Also, Hazelcast Management Center shows zero locks for the created IMap.
Can you please help on how to achieve synchronization with IMap.lock(key) so that a single pod gets the lock for a given key at a given point in time?
Code snippet:

HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
IMap<String, Object> map = client.getMap("customers");
try {
    map.lock(key);
    // business logic
} finally {
    map.unlock(key);
}
Can you create an MCVE (a minimal, complete, verifiable example) and confirm the exact version of Hazelcast used, please?
There are tests for locks here that you can perhaps use as a starting point to simplify things and determine where the fault lies.
In a Service Fabric application, using Actors or Services, what would the design be if you wanted to make sure that your block of code runs on each node?
My first idea would be that it had to be a Service with instance count set to -1, but the question also applies in cases where you had set it to 3 instances. How would you design it so that the service ensured that it ran some operation on each instance?
My own idea would be having an Actor with state controlling the operations that need to run; it would iterate over services using ServiceProxy to call methods on each instance. But that's just a naive idea, and I don't know if it's possible or whether it is the proper way to do so.
Some background info
Only Stateless services can be given a -1 for instance count. You can't use a ServiceProxy to target a specific instance.
Stateful services are deployed using 1 or more partitions (data shards). Partition count is configured in advance, as part of the service deployment and can't be changed automatically. For instance if your cluster is scaled out, partitions aren't added automatically.
Autonomous workers
Maybe you can invert the control flow by running Stateless services (on all nodes) and have them query a 'repository' for work items. The repository could be a Stateful service that stores work items in a Queue.
This way, adding more instances (scaling out the cluster) increases throughput without code modification. The stateless service instances become autonomous workers.
(as opposed to an intelligent orchestrator Actor)
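For example, a minimal sketch of such an autonomous worker (IWorkRepository, TryDequeueAsync and the fabric:/ URI are hypothetical names, not an existing API):

// Hypothetical remoting contract exposed by the stateful 'repository' service.
public interface IWorkRepository : IService
{
    Task<string> TryDequeueAsync();
}

// Stateless worker: every instance pulls work from the repository on its own.
internal sealed class WorkerService : StatelessService
{
    public WorkerService(StatelessServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var repository = ServiceProxy.Create<IWorkRepository>(
            new Uri("fabric:/MyApp/WorkRepository"),
            new ServicePartitionKey(0)); // assumes a single Int64 partition

        while (!cancellationToken.IsCancellationRequested)
        {
            var workItem = await repository.TryDequeueAsync();
            if (workItem != null)
            {
                // Process the dequeued work item here.
            }
            else
            {
                // Nothing queued; back off briefly before polling again.
                await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
            }
        }
    }
}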
I am trying to digest the Service Fabric architectural patterns and best practices.
Use case:
I define a stateful service with 26 partitions, and in each partition I store words that share the same first letter.
1) Does this mean that I actually have 26 instances of my stateful service?
2) When outside of the stateful service, i.e. in the caller, I am constructing a URI for my Service Fabric client, specifying the partition ID I want the client to operate on. Does this mean that once I am in the context of the stateful service (i.e. the service client has been instantiated and has called the stateful service) I cannot reference other partitions?
3) Is it true to say that a stateful service is a unit of work that needs to know which partition to operate on, and cannot make a decision on its own? Here I am referring to the many examples where inside the RunAsync method of a stateful service, there are calls to the underlying reliable store, for example, the code taken from this post:
protected override async Task RunAsync(CancellationToken cancelServicePartitionReplica)
{
    var myDictionary = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, int>>("myDictionary");
    var partition = base.ServicePartition.PartitionInfo.Id;
    byte append = partition.ToByteArray()[0];

    while (!cancelServicePartitionReplica.IsCancellationRequested)
    {
        // Create a transaction to perform operations on data within this partition's replica.
        using (var tx = this.StateManager.CreateTransaction())
        {
            var result = await myDictionary.TryGetValueAsync(tx, "A");
            await myDictionary.AddOrUpdateAsync(tx, "A", 0, (k, v) => v + append);
            ServiceEventSource.Current.ServiceMessage(this,
                $"Append {append}: {(result.HasValue ? result.Value : -1)}");
            await tx.CommitAsync();
        }

        // Pause for 3 seconds before continuing.
        await Task.Delay(TimeSpan.FromSeconds(3), cancelServicePartitionReplica);
    }
}
So, probably my statement 3) is wrong: a stateful service may access its internal storage without someone (a service client) calling it externally and supplying the exact partition. But then, how does the code above decide into which partition to put its data? And most importantly, how can I later query that data via a service client, which should provide an exact partition ID?
Stateful service 'instances' are actually replicas. You configure how many replicas you have for every partition (for performance, scaling, high availability & disaster recovery). Only one replica (primary) does writes. All replicas (secondaries and primary) may be used for reads. A replica contains a shard of your data set.
Data in partition 1 is not shared with partition 2.
Clients calling Stateful services need to decide for themselves with which partition they want to communicate. Services can only read/write in their own partition (directly).
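For example, a minimal sketch for the 26-partition word service above (the fabric:/ URI, the IWordService remoting interface and AddWordAsync are assumptions, and this presumes the service was created with an Int64 partition range of 0-25):

// Map the first letter of a word onto the Int64 partition key range 0..25,
// then let ServiceProxy resolve the partition that owns that key.
var word = "apple";
long partitionKey = char.ToUpperInvariant(word[0]) - 'A';

var proxy = ServiceProxy.Create<IWordService>(
    new Uri("fabric:/MyApp/WordService"),
    new ServicePartitionKey(partitionKey));

await proxy.AddWordAsync(word);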
More info here.
I've created a stateless service within Service Fabric. It has a SingletonPartition, but multiple instances (InstanceCount is -1 in my case).
I want to communicate with a specific replica of this service. To find all replicas I use:
var fabricClient = new FabricClient();
var serviceUri = new Uri(SERVICENAME);
Partition partition = (await fabricClient.QueryManager.GetPartitionListAsync(serviceUri)).First();
foreach (Replica replica in await fabricClient.QueryManager.GetReplicaListAsync(partition.PartitionInformation.Id))
{
    // communicate with this replica, but how to construct the proxy?
    //var eventHandlerServiceClient = ServiceProxy.Create<IService>(new Uri(replica.ReplicaAddress));
}
The problem is that there is no overload of ServiceProxy that creates a proxy to a specific replica. Is there another way to communicate with a specific replica?
Edit
The scenario we are building is the following. We have different moving parts with counter information: 1 named partitioned stateful service (with a couple of hundred partitions), 1 int64 partitioned stateful service, and 1 actor with state. To aggregate the counter information, we need to reach out to all service-partitions and actor-instances.
We could of course reverse it and let everyone send their counts to a single (partitioned) service. But that would add a network call to the normal flow (and thus overhead).
Instead, we came up with the following. The mentioned services & actors are combined into one executable and one service manifest, so they run in the same process. We add a stateless service with instance count -1 alongside the mentioned services & actors. All counter information is stored inside a static variable, which the stateless service can read.
Now, we only need to reach out to the stateless service (whose instance count is capped at the number of nodes).
Just to get some terminology out of the way first, "replica" only applies to stateful services where you have a unique replica set for each partition of a service and replicate state between them for HA. Stateless services just have instances, all of which are equal and identical.
Now to answer your actual question: ServiceProxy doesn't have an option to connect to a specific instance of a deployed stateless service. You have the following options:
Primary replica: connect to the primary replica of a stateful service partition.
Random instance: connect to a random instance of a stateless service.
Random replica: connect to a random replica - regardless of its role - of a stateful service partition.
Random secondary replica: connect to a random secondary replica of a stateful service partition.
E.g.:
ServiceProxy.Create<IMyService>(serviceUri, partitionKey, TargetReplicaSelector.RandomInstance)
So why no option to connect to a specific stateless service instance?
Well, I would turn this question around and ask why would you want to connect to a specific stateless service instance? By definition, each stateless instance should be identical. If you are keeping some state in there - like user sessions - then now you're stateful and should use stateful services.
You might think of intelligently deciding which instance to connect to for load balancing, but again since it's stateless, no instance should be doing more work than any other as long as requests are distributed evenly. And for that, Service Proxy has the random distribution option.
With that in mind, if you still have some reason to seek out specific stateless service instances, you can always use a different communication stack - like HTTP - and do whatever you want.
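For instance, here is a rough sketch of that approach (the service URI, the endpoint JSON shape and the /api/notify route are assumptions): enumerate the instances with FabricClient and call each one's HTTP endpoint directly.

var fabricClient = new FabricClient();
var serviceUri = new Uri("fabric:/MyApp/MyStatelessService");
var partition = (await fabricClient.QueryManager.GetPartitionListAsync(serviceUri)).First();

using (var http = new HttpClient())
{
    foreach (var replica in await fabricClient.QueryManager.GetReplicaListAsync(partition.PartitionInformation.Id))
    {
        // ReplicaAddress holds the JSON endpoint map published by the instance's listener.
        var endpoints = (JObject)JObject.Parse(replica.ReplicaAddress)["Endpoints"];
        var address = endpoints.Properties().First().Value.ToString().TrimEnd('/');
        await http.PostAsync($"{address}/api/notify", new StringContent("message for all clients"));
    }
}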
"Well, I would turn this question around and ask why would you want to connect to a specific stateless service instance?"
One example would be if you have multiple (3x) stateless service instances, each holding WebSocket connections to different clients, let's say 500 each, and you want to notify all 1500 (500x3) users of the same message. If it were possible to connect directly to a specific instance (which I would expect to be possible, since I can query for those instances using the FabricClient), I could send a message to each instance, which would then forward it to all of its connected clients.
Instead we have to come up with one of several workarounds:
Have all instances connect to some evented system that allows them to trigger on incoming message, e.g. Azure Event Hubs, Azure Service Bus, RedisCache.
Host an additional endpoint, as mentioned here, which makes it 3 endpoints pr service instance: WCF, WebSocket, HTTP.
Change to a stateful partitioned service which doesn't hold any state or any replicas, but simply allows to call partitions.
We are currently having some serious issues with RedisCache, so we are migrating away from that, and we would like to avoid external dependencies such as Event Hubs and Service Bus just for this scenario.
We are sending many messages each second, which adds overhead when we first have to call over HTTP and then transition the request over to the WebSocket context.
In order to target a specific instance of a stateless service you can use named partitions. You can have a single instance per partition and use multiple named partitions. For example, you can have 5 named partitions [0,1,2,3,4], each of which will have only one instance of the "service". Then you can call it like this:
ServiceProxy.Create<IMyService>(serviceUri, partitionKey, TargetReplicaSelector.RandomInstance)
where the partitionKey parameter takes one of the values [0,1,2,3,4].
A real example would be:
_proxyFactory.CreateServiceProxy<IMyService>(
    _myServiceUri,
    new ServicePartitionKey("0"), // one of "0","1","2","3","4"
    TargetReplicaSelector.Default,
    MyServiceEndpoints.ServiceV1);
This way you can choose one of the 5 instances. But all 5 instances may not always be available, for example during startup, or when the service dies and Service Fabric is recreating it, or while it is in the InBuild stage... So for this reason you should run partition discovery.
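A minimal sketch of that discovery step (the fabric:/ URI is an assumption; _proxyFactory, _myServiceUri and MyServiceEndpoints.ServiceV1 are reused from the example above):

// Enumerate the named partitions that currently exist, then fan calls out to them.
var fabricClient = new FabricClient();
var partitions = await fabricClient.QueryManager.GetPartitionListAsync(new Uri("fabric:/MyApp/MyService"));

foreach (var partition in partitions)
{
    var name = ((NamedPartitionInformation)partition.PartitionInformation).Name; // "0".."4"
    var proxy = _proxyFactory.CreateServiceProxy<IMyService>(
        _myServiceUri,
        new ServicePartitionKey(name),
        TargetReplicaSelector.Default,
        MyServiceEndpoints.ServiceV1);
    // Call the proxy for this specific named partition here.
}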
I have this Play app that connects to a remote server in order to consume a given API. In order to load balance my requests to that server, I connect multiple accounts to the same server. Each account can query the API a given number of times. Each account is handled by an Akka actor, and these actors sit behind an Akka round-robin router. Thus, when I want to consume the remote API, I "ask" the RR router for the wanted info.
This implementation runs fine until one account gets disconnected. Basically, when one account is disconnected, the actor returns an object that says "something was wrong with the connection", and then I send a second request to the RR router to be handled by another account.
My question is: instead of having to keep the "retry" logic outside the router-routee group, is there a way to do it inside? I am thinking, for example, of defining logic at the router level that handles these "something was wrong with the connection" messages by automatically forwarding the request to the next routee, and only returning a final response once all routees have been tried and none worked.
Does Akka provide a simple way of achieving this or should I just carry on with my implementation?
I'm not sure if I fully understand your design, but I think you should try the first-completed model supported by the ScatterGatherFirstCompleted routing logic.
router.type-mapping {
  ...
  scatter-gather-pool = "akka.routing.ScatterGatherFirstCompletedPool"
  scatter-gather-group = "akka.routing.ScatterGatherFirstCompletedGroup"
  ...
}
In the simple form:

              ---> Route
    --> SGFC-1
RR ->

or possibly combined with a round robin router:

              ---> Route
    --> SGFC-1
RR ->
    --> SGFC-2
              ---> Route
The same as in your proposal, connections are represented by the routes. SGFC-1 and SGFC-2 should have access to the same pool of routees (connections), or to part of the pool.