Akka Source: is there a way to throttle based on a global rate limit coming from an API call? - scala

There is the throttle function on Source https://doc.akka.io/docs/akka/current/stream/operators/Source-or-Flow/throttle.html but this only works in a local context (1 server). If I wanted to share a rate limit (for 3rd party api calls) with other servers (say I have 2 servers instead of 1 for redundancy), I'd like the rate limit to efficiently be spread across the 2 servers (if one server dies from out of memory, the other server should pick up the freed up rate limit until the dead server restarts).
Is this possible somehow through akka's Source assuming I have something like Redis returning whether an action is allowed or disallowed + what the time until an action will be allowed?

Off the top of my head, you can dispense with Redis and use Akka Cluster to deal with failure detection: set up an actor that subscribes to the cluster events (member joined, member left/downed) and updates the local throttle.
Local dynamic throttling can be implemented via a custom graph stage (materializing as a handle through which to change the throttle), or you can also do that via an actor (in which case an ask stage is nice). In the latter case, you can go further and have the throttling actors coordinate among themselves to reallocate unused request capacity between nodes.
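To make the "update the local throttle on membership change" idea concrete, here is a minimal sketch in plain Python rather than Akka. It is a token bucket whose refill rate is this node's equal share of the global limit. The class name, the on_membership_change hook and the equal-split policy are all assumptions for illustration; in Akka the hook would be driven by the cluster event subscription described above.

```python
import time


class SharedRateThrottle:
    """Token bucket whose refill rate is this node's share of a global limit."""

    def __init__(self, global_rate_per_sec, live_nodes, clock=time.monotonic):
        self.global_rate = float(global_rate_per_sec)
        self.live_nodes = max(1, live_nodes)
        self.clock = clock  # injectable so the behaviour can be checked without sleeping
        self.tokens = 0.0
        self.last = clock()

    @property
    def local_rate(self):
        # Assumption: every live node gets an equal share of the global limit.
        return self.global_rate / self.live_nodes

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Cap the bucket at one second's worth of tokens to limit bursts.
        self.tokens = min(self.local_rate, self.tokens + elapsed * self.local_rate)

    def on_membership_change(self, live_nodes):
        # Hypothetical hook: call it when a member joins, leaves or is downed.
        self._refill()  # settle tokens at the old rate before switching
        self.live_nodes = max(1, live_nodes)

    def try_acquire(self):
        # True if a call may go out now, False if the caller should wait.
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With a global limit of 10 calls/sec and 2 live nodes, each node refills at 5 tokens/sec; when one node is reported down, calling on_membership_change(1) lets the survivor refill at the full 10 tokens/sec.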


Limiting the number of times an endpoint of Kubernetes pod can be accessed?

I have a machine learning model inside a Docker image. I pushed the image to Google Container Registry and then deployed it inside a Kubernetes pod. A FastAPI application runs on port 8000, and this FastAPI endpoint is public (call it mymodel:8000).
The structure of the FastAPI application is:
async def get_homepage()
async def get_modelpage()
async def get_results(query: Form(...))
Users can enter a query, submit it, and get results from the machine learning model running inside the container. I want to limit the number of queries that all users combined can make. So if the query limit is 100, all the users combined can make only 100 queries in total.
I thought of a way to do this:
Keep a database that stores the number of times the GET and POST methods have been called. As soon as the total number of POST calls crosses the limit, stop accepting any more queries.
Is there an alternative way of doing this using Kubernetes limits? For example, could I define a limit_api_calls such that the total number of times mymodel:8000 is accessed is at most limit_api_calls?
I looked at the documentation and could only find limits for CPU, memory and rateLimits.
There are several approaches that could satisfy your needs.
Custom implementation: As you mentioned, keep in a persistence layer the number of API calls received and deny requests after it has been reached.
Use a service mesh: Istio (for instance) will let you limit the number of requests received and act as a circuit breaker.
Use an external API manager: Apigee will also let you limit and even charge your users. However, if it is only for internal use (not pay per use), I definitely wouldn't recommend it.
The tricky part is deciding what should happen after the limit has been reached. If it is just a pod, you can have the application exit so that the pod finishes and is cleared.
Otherwise, if you have a deployment with its replica set and several resources associated with it (like configmaps), you probably want to use some kind of asynchronous alert or polling check to clean up everything related to your deployment. You may want to have a deep look at orchestrators like Airflow (Composer) and use several tools such as Helm for keeping deployments easy.
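For the custom-implementation option, a minimal sketch of the counting logic could look like the following. It uses only the standard library; QueryBudget and handle_query are hypothetical names, and handle_query merely stands in for the get_results endpoint. The counter lives in process memory, so it is an assumption that a single replica serves all traffic; with several replicas the count would have to move to the shared persistence layer mentioned above.

```python
import threading


class QueryBudget:
    """Counts calls across all users and refuses once the total limit is hit."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self._lock = threading.Lock()

    def try_consume(self):
        # The lock keeps check-and-increment atomic across concurrent requests.
        with self._lock:
            if self.used >= self.limit:
                return False
            self.used += 1
            return True


budget = QueryBudget(limit=100)


def handle_query(query):
    # Stand-in for the get_results endpoint; a real handler would answer
    # with an HTTP error such as 429 once the budget is exhausted.
    if not budget.try_consume():
        return {"status": 429, "detail": "query limit reached"}
    return {"status": 200, "result": f"model output for {query!r}"}
```

The first 100 calls to handle_query succeed and every later call is refused, regardless of which user made them.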

What are the limits on actor events in Service Fabric?

I am currently testing the scaling of my application and I ran into something I did not expect.
The application is running on a 5-node cluster. It has multiple services/actor types and uses a shared process model.
For one component it uses actor events as a best-effort pub/sub system (there are fallbacks in place, so a dropped notification is not an issue).
The problem arises when the number of actors (i.e. subscription topics) grows. The actor service is partitioned into 100 partitions at the moment.
The number of topics at that point is around 160,000, where each topic is subscribed 1-5 times (by the nodes where it is needed), with an average of 2.5 subscriptions (roughly 400k subscriptions).
At that point communication in the cluster starts breaking down: new subscriptions are not created and unsubscribes time out.
It also affects other services: internal calls to a diagnostics service (which asks each of the 5 replicas) are timing out. This is probably due to the resolving of partition/replica endpoints, as outside calls to the web page are fine (these endpoints use the same technology/code stack).
The Event Viewer is full of warnings and errors like:
EventName: ReplicatorFaulted Category: Health EventInstanceId {c4b35124-4997-4de2-9e58-2359665f2fe7} PartitionId {a8b49c25-8a5f-442e-8284-9ebccc7be746} ReplicaId 132580461505725813 FaultType: Transient, Reason: Cancelling update epoch on secondary while waiting for dispatch queues to drain will result in an invalid state, ErrorCode: -2147017731 send failed at state Connected: 0x80072745
Error While Receiving Connect Reply : CannotConnect , Message : 4ba737e2-4733-4af9-82ab-73f2afd2793b:382722511 from Service 15a5fb45-3ed0-4aba-a54f-212587823cde-132580461224314284-8c2b070b-dbb7-4b78-9698-96e4f7fdcbfc
I've tried scaling the application without this subscribe model active, and I easily reach a workload twice as large without any issues.
So there are a couple of questions:
Are there limits known/advised for actor events?
Would increasing the partition count and/or node count help here?
Is the communication interference logical? Why are other service endpoints having issues as well?
After some time spent on a support ticket, we found some info, so I will post my findings here in case they help someone.
Actor events use a resubscription model to make sure subscribers are still connected to the actor. By default this is done every 20 seconds. With this many subscriptions, that consumed a lot of resources, and eventually the whole system was overloaded with loads of idle threads waiting to resubscribe.
You can decrease the load by setting resubscriptionInterval to a higher value when subscribing. The drawback is that the client can potentially miss events in the meantime (if a partition is moved).
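A rough back-of-the-envelope calculation shows why the interval matters so much at this scale. The sketch below assumes each subscription issues exactly one resubscribe per interval, which is a simplification; the 400k figure comes from the question and the two-hour interval is the value used in the support pseudocode.

```python
def resubscribes_per_second(subscriptions, interval_seconds):
    # Assumption: every subscription resubscribes exactly once per interval.
    return subscriptions / interval_seconds


subscriptions = 400_000  # rough figure from the question

every_20_seconds = resubscribes_per_second(subscriptions, 20)
every_2_hours = resubscribes_per_second(subscriptions, 2 * 60 * 60)

print(every_20_seconds)         # 20000.0
print(round(every_2_hours, 1))  # 55.6
```

Under that assumption the cluster goes from about 20,000 resubscribe calls per second to about 56 per second.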
To counteract the delay in resubscribing, it is possible to hook into the lower-level Service Fabric events. The following pseudocode was offered to me in the support call.
// 1. Register for endpoint change notifications for the actor service
fabricClient.ServiceManager.ServiceNotificationFilterMatched += (o, e) =>
{
    var notification = ((FabricClient.ServiceManagementClient.ServiceNotificationEventArgs)e).Notification;
    /*
     * Add additional logic for optimizations
     * - check if the endpoint is not empty
     * - If multiple listeners are registered, check if the endpoint change notification is for the desired endpoint
     * Please note, all the endpoints are sent in the notification. User code should have the logic to cache the endpoint seen during the subscription call and compare it with the newer one
     */
    List<long> keys;
    if (resubscriptions.TryGetValue(notification.PartitionId, out keys))
    {
        foreach (var key in keys)
        {
            // 1. Unsubscribe the previous subscription by calling ActorProxy.UnsubscribeAsync()
            // 2. Resubscribe by calling ActorProxy.SubscribeAsync()
        }
    }
};
await fabricClient.ServiceManager.RegisterServiceNotificationFilterAsync(new ServiceNotificationFilterDescription(new Uri("<service name>"), true, true));

// 2. Change the resubscription interval to a value which fits your need.
// 3. Cache the partition id to actor id mapping. This cache will be used to resubscribe when the replica’s primary endpoint changes (ref #1)
await actor.SubscribeAsync(handler, TimeSpan.FromHours(2) /* Tune the value according to the need */);
ResolvedServicePartition rsp;
((ActorProxy)actor).ActorServicePartitionClientV2.TryGetLastResolvedServicePartition(out rsp);
var keys = resubscriptions.GetOrAdd(rsp.Info.Id, key => new List<long>());
The above approach ensures the following:
The subscriptions are resubscribed at regular intervals.
If the primary endpoint changes in between, the actor proxy resubscribes from the service notification callback.
This ends the pseudocode from the support call.
Answering my original questions:
Are there limits known/advised for actor events?
No hard limits, only resource usage.
Would increasing the partition count and/or node count help here?
Increasing the partition count does not help. Increasing the node count might, but only if it means there are fewer subscribing entities on each node.
Is the communication interference logical? Why are other service endpoints having issues as well?
Yes, resource contention is the reason.

Akka Actor Messaging Delay

I'm experiencing issues scaling my app with multiple requests.
Each request sends an ask to an actor, which then spawns other actors. This is fine; however, under load (5+ asks at once), the ask takes a massive amount of time to deliver the message to the target actor. The original design was to bulkhead requests evenly, but this is causing a bottleneck. Example:
In this picture, the ask is sent right after the query plan resolver. However, there is a multi-second gap before the actor receives this message. This is only experienced under load (5+ requests/sec). I first thought this was a starvation issue.
Each planner-executor is a separate instance for each request. It spawns a new 'Request Acceptor' actor each time (it logs 'requesting score' when it receives a message).
I gave the actor system a large custom global executor. I noticed the threads were not utilized beyond the core thread pool size, even during this massive delay.
I made sure all child actors used the correct execution context.
I made sure all blocking calls inside actors were wrapped in a Future.
I gave the parent actor (and all its children) a custom dispatcher with core size 50 and max size 100. It did not request more threads (it stayed at 50), even during these delays.
Finally, I tried creating a totally new ActorSystem for each request (inside the planner-executor). This also had no noticeable effect!
I'm a bit stumped by this. From these tests it does not look like a thread starvation issue. Back at square one, I have no idea why the message takes longer and longer to deliver the more concurrent requests I make. The Zipkin trace before reaching this point does not degrade with more requests until it reaches the ask here. Before then, the server is able to handle multiple steps, e.g. verify the request, talk to the db, and then finally go inside the planner-executor. So I doubt the application itself is running out of CPU time.
We had a very similar issue with Akka. We observed a huge delay in the ask pattern delivering messages to the target actor under peak load.
Most of these issues are related to heap memory consumption rather than to the use of dispatchers.
We finally fixed these issues with the configuration tuning and changes below.
1) Make sure you stop entities/actors that are no longer required. If it's a persistent actor, you can always bring it back when you need it.
Refer : https://doc.akka.io/docs/akka/current/cluster-sharding.html#passivation
2) If you are using cluster sharding then check the akka.cluster.sharding.state-store-mode. By changing this to persistence we gained 50% more TPS.
3) Minimize your log entries (set it to info level).
4) Tune your logging so that messages are published to your logging system frequently. Adjust the batch size, batch count and interval accordingly so that the memory is freed. In our case a large amount of heap memory was used for buffering log messages to send in bulk. If the interval is too long, you may fill your heap, which hurts performance (more GC activity is required).
5) Run blocking operations on a separate dispatcher.
6) Use custom serializers (protobuf) and avoid JavaSerializer.
7) Add the JAVA_OPTS below when launching your jar
export JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Djava.security.egd=file:/dev/./urandom"
The main thing is -XX:MaxRAMFraction=2, which lets the heap use up to half of the available memory. The default is 4, which means the heap is limited to one quarter of the available memory, and that might not be sufficient.
Refer : https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/

Why doesn't my Azure Function scale up?

For a test, I created a new function app. I added two functions. One was an HTTP trigger that, when invoked, pushed 500 messages to a queue. The other was a queue trigger to read the messages. The queue trigger function code was set up to read a message and randomly sleep from 1 to 30 seconds. This was intended to simulate longer-running tasks.
I invoked the HTTP trigger to create the messages, then watched the queue fill up (messages were processed by the other trigger). I also wired up Application Insights to this function app, but I did not see it scale beyond 1 server.
Do Azure Functions scale up solely on the number of messages in the queue?
Also, I implemented these functions in PowerShell.
If you're running in the Azure Functions consumption plan, we monitor both the length and the throughput of your queue to determine whether additional VM resources are needed.
Note that a single function app instance can process multiple queue messages concurrently without needing to scale across multiple VMs. So if all 500 messages can be consumed relatively quickly (again, in the consumption plan), then it's possible that you won't scale at all.
The exact algorithm for scaling isn't published (it's subject to lots of tweaking), but generally speaking you can expect the system to automatically scale you out if messages are getting added to the queue faster than your functions can process them. Your app will also scale out if the latency of the first message in the queue is continuously increasing (meaning, messages are sitting idle and not getting processed). The time between VMs getting added is usually in the tens of seconds.
There are some thresholds based on queue count as well. For example, the system tries to ensure that there is at least 1 VM for every 1K queue messages, but usually the scale decisions are based on message throughput as I described earlier.
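As a small illustration of the queue-count threshold only: the helper below is hypothetical and models nothing but the "at least 1 VM for every 1K queue messages" rule of thumb, assuming at least one instance while the app is running. The real scaling algorithm is unpublished and mostly throughput-driven, as described above.

```python
import math


def min_instances_for_backlog(queue_length, messages_per_instance=1000):
    # Only the queue-count rule of thumb; throughput and latency are ignored.
    return max(1, math.ceil(queue_length / messages_per_instance))


print(min_instances_for_backlog(500))   # 1
print(min_instances_for_backlog(2500))  # 3
```

By this rule alone, a backlog of 500 messages never calls for more than one VM, which is consistent with the test in the question not scaling out.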
I think @Chris Gillum put it well: it's hard for us to push the limits of the server to the point where things start to scale.
Some other options available are:
Use durable functions and scale with Threading:
Another method could be to use Event Hubs, which are designed for massive scale. Instead of queues, have Function #1 send an event, and subscribe Function #2 to it with an Event Hub trigger. Adding Stream Analytics could also be an option to expand on capabilities further if needed.

Using Scala Akka framework for blocking CLI calls

I'm relatively new to Akka & Scala, but I would like to use Akka as a generic framework to pull together information from various web tools and CLI commands.
I understand the general principle that in an actor model it is highly desirable not to have the actors block. In the case of HTTP requests, there are async HTTP clients (such as Spray), which means that I can handle the requests asynchronously within the actor framework.
However, I'm unsure what the best approach is when combining actors with existing blocking API calls such as the Scala ProcessBuilder/ProcessIO libraries. In terms of issuing these CLI commands I expect a relatively small amount of concurrency, e.g. perhaps executing a max of 10 concurrent CLI invocations on a 12-core machine.
Is it better to have a single actor managing these CLI commands, farming the actual work off to Futures that are created as needed? Or would it be cleaner just to maintain a set of separate actors backed by a PinnedDispatcher? Or something else?
From the Akka documentation ( http://doc.akka.io/docs/akka/snapshot/general/actor-systems.html#Blocking_Needs_Careful_Management ):
Blocking Needs Careful Management
In some cases it is unavoidable to do blocking operations, i.e. to put a thread to sleep for an indeterminate time, waiting for an external event to occur. Examples are legacy RDBMS drivers or messaging APIs, and the underlying reason is typically that (network) I/O occurs under the covers. When facing this, you may be tempted to just wrap the blocking call inside a Future and work with that instead, but this strategy is too simple: you are quite likely to find bottle-necks or run out of memory or threads when the application runs under increased load.
The non-exhaustive list of adequate solutions to the “blocking problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router [Java, Scala]), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.
The first possibility is especially well-suited for resources which are single-threaded in nature, like database handles which traditionally can only execute one outstanding query at a time and use internal synchronization to ensure this. A common pattern is to create a router for N actors, each of which wraps a single DB connection and handles queries as sent to the router. The number N must then be tuned for maximum throughput, which will vary depending on which DBMS is deployed on what hardware.
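The "Future on a thread pool with an upper limit" suggestion can be sketched in plain Python rather than Akka, with concurrent.futures standing in for Scala Futures on a dedicated dispatcher. The names cli_pool, run_cli and submit_cli are hypothetical, and the limit of 10 is the figure from the question. Note that this bounds only the number of commands running at once; the pool's internal queue is unbounded, so a real system would also want a cap on pending submissions.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Upper bound on concurrent CLI invocations (10 is the figure from the question).
MAX_CONCURRENT_CLI = 10
cli_pool = ThreadPoolExecutor(max_workers=MAX_CONCURRENT_CLI)


def run_cli(args):
    # Blocks one pool thread until the process exits; raises on a non-zero exit.
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


def submit_cli(args):
    # Returns a Future immediately; work beyond the limit waits in the pool's queue.
    return cli_pool.submit(run_cli, args)


if __name__ == "__main__":
    # Use the current Python interpreter as a portable stand-in for a CLI tool.
    commands = [[sys.executable, "-c", f"print({n} * 2)"] for n in range(3)]
    futures = [submit_cli(cmd) for cmd in commands]
    print([f.result() for f in futures])  # ['0', '2', '4']
```

The calling code never blocks on the process itself; it only holds Futures, while at most 10 threads are ever parked waiting on external commands.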