I'm experiencing issues scaling my app with multiple requests.
Each request sends an ask to an actor, which then spawns other actors. This works fine; however, under load (5+ asks at once), the ask takes a massive amount of time to deliver the message to the target actor. The original design was to bulkhead requests evenly, but this is causing a bottleneck. Example:
In this picture, the ask is sent right after the query plan resolver, but there is a multi-second gap before the actor receives the message. This is only experienced under load (5+ requests/sec). I first thought this was a starvation issue.
Design:
Each planner-executor is a separate instance for each request. It spawns a new 'Request Acceptor' actor each time (it logs 'requesting score' when it receives a message).
I gave the ActorSystem a custom global executor (a big one). I noticed the threads were not utilized beyond the core thread-pool size, even during this massive delay.
I made sure every ExecutionContext in the child actors was the correct one.
I made sure all blocking calls inside actors were wrapped in a Future.
I gave the parent actor (and all children) a custom dispatcher with core size 50 and max size 100. It did not request more threads (it stayed at 50), even during these delays. A rough sketch of this config is just after this list.
Finally, I tried creating a totally new ActorSystem for each request (inside the planner-executor). This also had no noticeable effect!
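For reference, the custom dispatcher looked roughly like this (the dispatcher and system names here are illustrative, not my exact config):

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    // Custom dispatcher with core size 50 and max size 100, as described above.
    val dispatcherConfig = ConfigFactory.parseString(
      """
      request-dispatcher {
        type = Dispatcher
        executor = "thread-pool-executor"
        thread-pool-executor {
          core-pool-size-min = 50
          core-pool-size-max = 50
          max-pool-size-min  = 100
          max-pool-size-max  = 100
        }
        throughput = 1
      }
      """).withFallback(ConfigFactory.load())

    val system = ActorSystem("planner-executor", dispatcherConfig)
    // Actors are then created with Props(...).withDispatcher("request-dispatcher")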
I'm a bit stumped by this. From these tests it does not look like a thread-starvation issue. Back at square one, I have no idea why the message takes longer and longer to deliver the more concurrent requests I make. The Zipkin trace does not degrade with more requests until it reaches this ask; before that point, the server handles multiple steps fine, e.g. verifying the request, talking to the DB, and finally entering the planner-executor. So I doubt the application itself is running out of CPU time.
We had a very similar issue with Akka. We observed huge delays in the ask pattern delivering messages to the target actor under peak load.
Most of these issues are related to heap memory consumption, not to how dispatchers are used.
We eventually fixed them by tuning the configuration and making the changes below.
1) Make sure you stop entities/actors which are no longer required. If it's a persistent actor, you can always bring it back when you need it.
Refer : https://doc.akka.io/docs/akka/current/cluster-sharding.html#passivation
2) If you are using cluster sharding then check the akka.cluster.sharding.state-store-mode. By changing this to persistence we gained 50% more TPS.
3) Minimize your log entries (set it to info level).
4) Tune your logging to publish messages to your logging system frequently: adjust the batch size, batch count, and interval so that buffered messages are flushed and memory is freed. In our case a huge amount of heap was used to buffer log messages for bulk sending. If the interval is too long you may fill up the heap, which hurts performance (more GC activity is required).
5) Run blocking operations on a separate dispatcher (a sketch follows this list).
6) Use custom serializers (protobuf) and avoid JavaSerializer.
7) Add the JAVA_OPTS below when running your jar:
export JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Djava.security.egd=file:/dev/./urandom"
The main thing is -XX:MaxRAMFraction=2, which lets the heap use up to half of the available memory. The default is 4, which means your application will use only one quarter of the available memory; that might not be sufficient.
Refer : https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
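For point 5, here is a minimal sketch of running blocking work on a dedicated dispatcher; the dispatcher name ("blocking-io-dispatcher") and pool size are illustrative, not values from our setup.

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory
    import scala.concurrent.{ExecutionContext, Future}

    val cfg = ConfigFactory.parseString(
      """
      blocking-io-dispatcher {
        type = Dispatcher
        executor = "thread-pool-executor"
        thread-pool-executor {
          fixed-pool-size = 16
        }
      }
      """).withFallback(ConfigFactory.load())

    val system = ActorSystem("app", cfg)
    // Run blocking calls on the dedicated pool so the default dispatcher
    // (which processes actor messages) is never starved.
    implicit val blockingEc: ExecutionContext = system.dispatchers.lookup("blocking-io-dispatcher")

    def blockingDbCall(): String = { Thread.sleep(100); "row" } // stand-in for real blocking I/O
    val result: Future[String] = Future(blockingDbCall())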
Regards,
Vinoth
Related
We're experiencing strange memory behavior on our web server built using Akka HTTP.
Our architecture goes like this:
the web server route calls various actors, gets the results as Futures, and streams them to the response
actors call non-blocking operations (using Futures), combine and process the data fetched from them, and pipe the results to the sender. We're using classic Akka actors, implementing the receive method (not Akka Typed)
there is no blocking code anywhere in the app
When I run the web server locally, it takes around 350 MB at startup. After the first request, memory usage jumps to around 430 MB and slowly increases with each request (monitored using Activity Monitor on macOS). But shouldn't GC clean things up after each request? Shouldn't memory usage go back to 350 MB after processing?
I also installed the YourKit Java profiler, and here is a diagram of heap memory.
It can be seen that once memory usage increases, it never goes back down, even though the system is stateless. Also, when I run GC manually from the profiler, it does almost nothing, just a small decrease in memory usage. I understand some services might cache things after the first request, temporarily consuming memory, but is there any policy inside Akka actors or Akka HTTP about this?
I tried checking the objects furthest from GC, but it only shows library classes and Akka built-in classes, nothing related to our code.
So, I have two questions:
How does an actor close resources and free memory after message processing? Have you experienced anything similar?
Is there a better way of profiling Akka HTTP that will show me the stack traces of the classes furthest from GC?
On a side note, is it advisable to use the scheduler inside actors (running inside an Akka HTTP server)? When I do that, memory usage seems to increase heavily and the app runs out of memory in the DEV environment.
Thanks in advance,
Amer
An actor remains active until it is explicitly stopped: there is no garbage collection.
Probably the two most common methods for managing actor lifetimes (beyond the actor itself deciding that it's time to stop) are:
The parent is responsible for stopping its children. If the actors are being spawned to perform specific tasks on behalf of the parent, for instance, this approach is called for.
Using an inactivity timeout. If the actors represent domain entities (e.g. an actor for every user account, where the actor in some sense serves as an in-memory cache), a pretty reasonable solution is to use context.setReceiveTimeout, which causes a ReceiveTimeout message to be sent to the actor after the timeout has elapsed. Note that in some cases the scheduled send of that message may not be canceled in time if a message was enqueued in the mailbox but not yet processed when the timeout expired: receiving a ReceiveTimeout is not a guarantee that the timeout has passed since the last received message. This works especially well if you use Akka Persistence and Akka Cluster Sharding, which allow the actor's state to be recovered later.
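A minimal sketch of the inactivity-timeout approach with a classic actor (the entity name and the 2-minute timeout are illustrative):

    import akka.actor.{Actor, ReceiveTimeout}
    import scala.concurrent.duration._

    class UserAccountActor extends Actor {
      // Ask Akka to deliver ReceiveTimeout if no message arrives for 2 minutes.
      context.setReceiveTimeout(2.minutes)

      def receive: Receive = {
        case ReceiveTimeout =>
          // Idle long enough: stop the actor so its state can be reclaimed
          // (with Persistence/Sharding it can be recovered later on demand).
          context.stop(self)
        case _ =>
          () // ... handle the actor's normal domain messages here ...
      }
    }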
Update to add:
Regarding
shouldn't GC clean things after each request?
The short answer is no, GC will not clean things after each request (and if it does, that's a very good sign that you haven't provisioned enough memory).
The longer answer is that the characteristics of garbage collection on the JVM are very underspecified: the only rule a garbage collection implementation has to respect is that it never frees an object reachable from a GC root (basically any variable on a thread's stack or static to a class) by a chain of strong references. When and even whether the garbage collector reclaims the space taken up by garbage is entirely implementation dependent (I say "whether" to account for the existence of the Epsilon garbage collector, which never frees memory; this is useful for benchmarking JVMs without the complication of garbage collection, and also in environments where the application can be restarted when it runs out of memory: the JVM crash is in some sense the actual garbage collector).
You could try executing java.lang.System.gc when the server stops: this may cause a GC run (note that there is no requirement that the system actually collect any garbage in such a scenario). If a garbage collector will free any memory, about the only time it has to run is if there's not enough space to fulfill an object allocation request: therefore if the application stops allocating objects, there may not be a garbage collection run.
For performance reasons, most modern garbage collectors in the JVM wait until there's no more than a certain amount of free space before they collect garbage: this is because the time taken to reclaim all space is proportional to the number of objects which aren't reclaimable and for a great many applications, the pattern is that most objects are comparatively ephemeral, so the number of objects which aren't reclaimable is reasonably constant. The consequence of that is that the garbage collector will do about the same amount of work in a "full" GC for a given application regardless of how much free space there is.
I frequently see queues in software architecture, especially architectures called "scalable", with the Actor model from the Akka.io platform as a prominent representative. But how can a queue be scalable if we have to synchronize putting messages into the queue (and therefore operate in a single thread rather than many), and synchronize again when taking messages out of the queue (to ensure each message is taken exactly once)? It gets even more complicated when those messages can change the state of the (actor) system: in that case, even after a message is taken out of the queue, it cannot be load-balanced but must still be processed in a single thread.
Is it correct that putting messages into a queue must be synchronized?
Is it correct that taking messages out of a queue must be synchronized?
If 1 or 2 is correct, then how is a queue scalable? Doesn't synchronization to a single thread immediately create a bottleneck?
How can an (actor) system be scalable if it is stateful?
Does a stateful actor/bean mean that I have to process messages in a single thread and in order?
Does statefulness mean that I have to have a single copy of the bean/actor for the entire system?
If 6 is false, then how do I share this state between instances?
When I am trying to connect my new P2P node to the network, I believe I have to have some "server" that will tell me who the other peers are; is that correct? When I am trying to download a torrent, I have to connect to a tracker. If there is a "server", why do we call it P2P? And if this tracker goes down, I cannot connect to peers; is that correct?
Are synchronization and statefulness destroying scalability?
Is it correct that putting messages into a queue must be synchronized?
Is it correct that taking messages out of a queue must be synchronized?
No.
Assuming we're talking about the Java synchronized keyword, that is a reentrant mutual-exclusion lock on the object. Even with multiple threads accessing that lock, it can be fast as long as contention is low. And each object has its own lock, so there are many locks, each of which only needs to be held for a short time, i.e. it is fine-grained locking.
But even if locking were a bottleneck here, queues need not be implemented via mutual-exclusion locks. Lock-free and even wait-free queue data structures exist, which means the mere presence of locks does not automatically imply single-threaded execution.
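For example, the JDK ships a lock-free queue that many threads can use concurrently without ever taking a mutex (a small illustrative sketch):

    import java.util.concurrent.ConcurrentLinkedQueue

    // A lock-free (CAS-based) queue from java.util.concurrent: many producers and
    // consumers can offer/poll concurrently without a mutual-exclusion lock.
    val queue = new ConcurrentLinkedQueue[String]()

    queue.offer("message-1")        // producer threads enqueue
    queue.offer("message-2")
    val next = Option(queue.poll()) // a consumer dequeues; None if the queue was empty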
The rest of your questions should be asked separately because they are not about message queuing.
Of course you are correct in that a single queue is not scalable. The point of the Actor Model is that you can have millions of Actors and therefore distribute the load over millions of queues—if you have so many cores in your cluster. Always remember what Carl Hewitt said:
One Actor is no actor. Actors come in systems.
Each single actor is a fully sequential and single-threaded unit of computation. The whole model is constructed such that it is perfectly suited to describe distribution, though; this means that you create as many actors as you need.
Alright, so I have never done intense concurrent operations like this before; there are three main parts to this algorithm.
It all starts with a Vector of around 1 million items.
Each item gets processed in 3 main stages.
Task 1: Make an HTTP Request, Convert received data into a map of around 50 entries.
Task 2: Receive the map and do some computations to generate a class instance based off the info found in the map.
Task 3: Receive the class and generate/add to multiple output files.
I initially started by concurrently running task 1 with 64K entries across 64 threads (1,024 entries per thread), generating the threads in a for loop.
This worked well and was relatively fast, but I keep hearing about actors and how they are far better than basic Java threads/thread pools. I've created a few actors etc., but I don't know where to go from here.
Basically:
1. Are actors the right way to achieve fast concurrency for this specific set of tasks, or is there another way I should go about it?
2. How do you know how many threads/actors are too many? Specifically for task one, how do you know what the limit on the number of simultaneous connections is (I'm on a Mac)? Is there a golden rule to follow? How many thread pools, and how large should each be? And the actor equivalents?
3. Is there any code I can look at that uses actors in a similar fashion? All the code I'm seeing either just has an actor print hello world or is super complex.
1) Actors are a good choice for designing complex interactions between components, since they resemble "real life" a lot. You can see them as different people sending each other requests; it is a very natural way to model interactions. However, they are most powerful when you want to manage changing state in your application, which does not seem to be the case for you. You can achieve fast concurrency without actors. Up to you.
2) If none of your operations are blocking, the best rule is: number of threads = number of CPUs. If you use a non-blocking HTTP client and NIO when writing your output files, then you are fully non-blocking on I/O and can safely set the thread count for your app to the CPU count of your machine (see the sketch after this list).
3) The documentation on http://akka.io is very very good and comprehensive. If you have no clue how to use the actor model I would recommend getting a book - not necessarily about Akka.
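Here is a minimal sketch of point 2; the pool sizing and the stand-in process function are illustrative, not part of the original question:

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}

    // With fully non-blocking I/O, a pool sized to the CPU count is enough.
    val cores = Runtime.getRuntime.availableProcessors()
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutor(Executors.newFixedThreadPool(cores))

    // Each of the million items becomes a Future; the pool never grows beyond `cores` threads.
    def process(item: String): Future[Int] = Future {
      item.length // stand-in for the real fetch/compute step
    }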
1) It sounds like most of your steps aren't stateful, in which case actors add complication for no real benefit. If you need to coordinate multiple tasks in a mutable way (e.g. for generating the output files) then actors are a good fit for that piece. But the HTTP fetches should probably just be calls to some nonblocking HTTP library (e.g. spray-client - which will in fact use actors "under the hood", but in a way that doesn't expose the statefulness to you).
2) With blocking threads you pretty much have to experiment and see how many you can run without consuming too many resources. Worry about how many simultaneous connections the remote system can handle rather than hitting any "connection limits" on your own machine (it's possible you'll hit the file descriptor limit but if so best practice is just to increase it). Once you figure that out, there's no value in having more threads than the number of simultaneous connections you want to make.
As others have said, with nonblocking everything you should probably just have a number of threads similar to the number of CPU cores (I've also heard "2x number of CPUs + 1", on the grounds that that ensures there will always be a thread available whenever a CPU is idle).
With actors I wouldn't worry about having too many. They're very lightweight.
If you really have no experience with Akka, try to start with something simple, like a one-to-one actor-for-thread rewrite of your code. This will make it easier to grasp how things work in Akka.
Spin up two actors at the beginning: one for receiving requests and one for writing to the output file. Then, when a request is received, create an actor inside the request-receiver actor that will do the computation and send the result to the writing actor. A rough sketch follows.
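Something along these lines (actor and message names are made up for illustration):

    import akka.actor.{Actor, ActorRef, ActorSystem, Props}

    // Writer: appends each result to the output (real file I/O elided).
    class Writer extends Actor {
      def receive: Receive = { case result: String => println(s"writing: $result") }
    }

    // Worker: does the per-request computation and reports to the writer.
    class Worker(writer: ActorRef) extends Actor {
      def receive: Receive = { case request: String => writer ! s"processed($request)" }
    }

    // Receiver: spawns a fresh worker for every incoming request.
    class Receiver(writer: ActorRef) extends Actor {
      def receive: Receive = {
        case request: String => context.actorOf(Props(new Worker(writer))) ! request
      }
    }

    val system   = ActorSystem("pipeline")
    val writer   = system.actorOf(Props(new Writer), "writer")
    val receiver = system.actorOf(Props(new Receiver(writer)), "receiver")
    receiver ! "item-1"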
Suppose I have an actor which handles X requests per second. That is OK on average, but sometimes there are bursts and clients send Y > X requests per second. Suppose also that all requests have timeouts, so they cannot wait in the queue forever.
Assuming we program in Scala and Akka what are the best practices/design patterns to make the actor handle those bursts? Are there any code examples, which handle bursts?
As long as your machine can handle the increased load (i.e. has enough CPUs), I would suggest pooling the actor using a Router. From your example, it sounds like a dynamically resizing router might be the best fit, but even a standard round-robin or smallest-mailbox router might be enough. Below is the link to the routers section of the Akka documentation. I hope this helps.
http://doc.akka.io/docs/akka/2.1.2/scala/routing.html
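For example, in later Akka versions (2.3+) a round-robin pool can be set up roughly like this; the pool size and names are illustrative:

    import akka.actor.{Actor, ActorSystem, Props}
    import akka.routing.RoundRobinPool

    class Worker extends Actor {
      def receive: Receive = { case _ => () /* handle one request */ }
    }

    val system = ActorSystem("burst-demo")
    // One ActorRef fronting five identical workers; incoming messages are spread round-robin.
    val pool = system.actorOf(RoundRobinPool(5).props(Props(new Worker)), "workers")
    pool ! "job-1"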
You could also consider distributing the actor across multiple nodes, but that might be overkill for your scenario. If you have interest in that approach, let me know and I can post more context on doing that.
Now, as for what to do when you have pooled the actors but the system is still getting backlogged, that's really up to you, but here are a few options. If you can handle the occasional increases in latency due to bursting, then do nothing; the actors' mailboxes will just get a little backed up, but they will clear as soon as the burst eases off. If not, then the question is how to handle incoming messages when the actors are backlogged. If you want to fail fast in that situation and not accept the message, you might want to look into using a bounded mailbox (http://doc.akka.io/docs/akka/2.1.2/scala/dispatchers.html). When the mailbox reaches its size limit and can no longer queue messages, the caller will get a failure when sending the message (I think). Not awesome, but at least it will lead to the system stabilizing faster.
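A rough sketch of configuring such a bounded mailbox and attaching it to a worker actor; the mailbox name, capacity, and push timeout are illustrative values, not recommendations:

    import akka.actor.{Actor, ActorSystem, Props}
    import com.typesafe.config.ConfigFactory

    val cfg = ConfigFactory.parseString(
      """
      bounded-mailbox {
        mailbox-type = "akka.dispatch.BoundedMailbox"
        mailbox-capacity = 1000
        mailbox-push-timeout-time = 10ms
      }
      """).withFallback(ConfigFactory.load())

    class Worker extends Actor {
      def receive: Receive = { case _ => () /* handle one request */ }
    }

    val system = ActorSystem("burst-demo-bounded", cfg)
    // Once 1000 messages are queued, further sends fail after 10ms and go to dead letters.
    val worker = system.actorOf(Props(new Worker).withMailbox("bounded-mailbox"), "boundedWorker")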
I assume you are doing ask (?) (i.e. request/response), so when you do that, you get a Future. That Future will time out (with an implicitly defined timeout value) if it does not receive a response in time, so during a burst, Futures attached to the calls into the backlogged actors will just start timing out; they will not be stuck there forever.
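For reference, a minimal ask sketch with an explicit timeout (the 5-second value is illustrative):

    import akka.actor.ActorRef
    import akka.pattern.ask
    import akka.util.Timeout
    import scala.concurrent.Future
    import scala.concurrent.duration._

    // The implicit timeout bounds how long the returned Future waits for a reply.
    implicit val timeout: Timeout = Timeout(5.seconds)

    def query(target: ActorRef, message: Any): Future[Any] =
      target ? message // fails with an AskTimeoutException if no reply arrives within 5 seconds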
I'm coming from Java, where I'd submit Runnables to an ExecutorService backed by a thread pool. It's very clear in Java how to set limits to the size of the thread pool.
I'm interested in using Scala actors, but I'm unclear on how to limit concurrency.
Let's just say, hypothetically, that I'm creating a web service which accepts "jobs". A job is submitted with POST requests, and I want my service to enqueue the job then immediately return 202 Accepted — i.e. the jobs are handled asynchronously.
If I'm using actors to process the jobs in the queue, how can I limit the number of simultaneous jobs that are processed?
I can think of a few different ways to approach this; I'm wondering if there's a community best practice, or at least, some clearly established approaches that are somewhat standard in the Scala world.
One approach I've thought of is having a single coordinator actor which would manage the job queue and the job-processing actors; I suppose it could use a simple int field to track how many jobs are currently being processed. I'm sure there'd be some gotchas with that approach, however, such as making sure to track when an error occurs so as to decrement the number. That's why I'm wondering if Scala already provides a simpler or more encapsulated approach to this.
BTW I tried to ask this question a while ago but I asked it badly.
Thanks!
I'd really encourage you to have a look at Akka, an alternative Actor implementation for Scala.
http://www.akkasource.org
Akka already has a JAX-RS[1] integration and you could use that in concert with a LoadBalancer[2] to throttle how many actions can be done in parallel:
[1] http://doc.akkasource.org/rest
[2] http://github.com/jboner/akka/blob/master/akka-patterns/src/main/scala/Patterns.scala
You can override the system properties actors.maxPoolSize and actors.corePoolSize, which limit the size of the actor thread pool, and then throw as many jobs at the pool as your actors can handle. Why do you think you need to throttle your reactions?
You really have two problems here.
The first is keeping the thread pool used by actors under control. That can be done by setting the system property actors.maxPoolSize.
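For example (the property names are the ones mentioned above; the values are illustrative):

    // Cap the scala.actors scheduler's thread pool before any actor work starts.
    System.setProperty("actors.corePoolSize", "8")
    System.setProperty("actors.maxPoolSize", "16")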
The second is runaway growth in the number of tasks that have been submitted to the pool. You may or may not be concerned with this one; however, it is entirely possible to trigger failure conditions such as out-of-memory errors, and in some cases potentially more subtle problems, by generating too many tasks too fast.
Each worker thread maintains a deque of tasks. The deque is implemented as an array that the worker thread will dynamically enlarge up to some maximum size. In 2.7.x the deque can grow quite large, and I've seen that trigger out-of-memory errors when combined with lots of concurrent threads. The maximum deque size is smaller in 2.8. The deque can also fill up.
Addressing this problem requires you to control how many tasks you generate, which probably means some sort of coordinator as you've outlined. I've encountered this problem when the actors that initiate a kind of data-processing pipeline are much faster than the ones later in the pipeline. To control the process, I usually have the actors later in the chain ping back the actors earlier in the chain every X messages, and have the earlier ones stop after X messages and wait for the ping back. You could also do it with a more centralized coordinator.
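The same ping-back idea, sketched with Akka actors rather than the old Scala actors (all names and the batch size are made up for illustration):

    import akka.actor.{Actor, ActorRef}

    case class Work(n: Int)
    case object Ack

    // Downstream acknowledges once per batch of processed messages.
    class Downstream(batchSize: Int) extends Actor {
      private var seen = 0
      def receive: Receive = {
        case Work(_) =>
          seen += 1
          if (seen % batchSize == 0) sender() ! Ack // ping back every batchSize messages
      }
    }

    // Upstream sends one batch, then waits for the Ack before sending the next.
    class Upstream(downstream: ActorRef, total: Int, batchSize: Int) extends Actor {
      private var next = 0
      override def preStart(): Unit = sendBatch()
      private def sendBatch(): Unit = {
        val end = math.min(next + batchSize, total)
        while (next < end) { downstream ! Work(next); next += 1 }
      }
      def receive: Receive = {
        case Ack if next < total => sendBatch()
      }
    }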