We create a simple IQueue in Hazelcast:
HazelcastInstance h = Hazelcast.newHazelcastInstance(config);
BlockingQueue<String> queue = h.getQueue("my-distributed-queue");
Let's assume that queue.size() == 0.
Does the distributed queue "my-distributed-queue" use any memory resources?
Background:
I want to use Hazelcast to create a large number (>1k) of short-lived queues (to keep time order within item groups). I'm wondering what happens when an IQueue object in Hazelcast is drained (size == 0). Will it leave any artifacts in memory that won't be cleaned up by GC?
I've analyzed the heap dumps in VisualVM and found that queue items are stored as IQueueItem objects. When the queue size is 0, there are no IQueueItem instances. But are there any other non-removable artifacts? Thanks for the help.
There is a small fixed cost for each structure, even if it contains no data. The cost is rather low; you can see the structure backing each instance of a queue here: https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/java/com/hazelcast/queue/impl/QueueContainer.java
You can always destroy a queue once you no longer need it: just call the destroy() method. Every distributed structure provided by Hazelcast implements the DistributedObject interface, which declares this method.
We're experiencing strange memory behavior on our web server built using Akka HTTP.
Our architecture goes like this:
the web server routes call various actors, get the results as futures, and stream them to the response
actors call non-blocking operations (using futures), combine and process data fetched from them and pipe results to sender. We're using standard Akka actors, implementing its receive method (not Akka typed)
there is no blocking code anywhere in the app
When I run the web server locally, it takes around 350 MB at startup. After the first request, memory usage jumps to around 430 MB and slowly increases with each request (monitored using Activity Monitor on Mac). But shouldn't GC clean things up after each request? Shouldn't memory usage after processing be 350 MB again?
I also installed the YourKit Java profiler, and here is a diagram of heap memory:
It can be seen that once memory usage increases, it never goes back down, even though the system is stateless. Also, when I run GC manually from the profiler, it does almost nothing, just a small decrease in memory usage. I understand some services might cache things after the first request, consuming memory temporarily, but is there any policy inside Akka Actors or Akka HTTP about this?
I tried to check objects furthest from GC but it only shows library classes and Akka built in classes, nothing related to our code.
So, I have two questions:
How does an actor close resources and free memory after processing a message? Have you experienced anything similar?
Is there a better way of profiling Akka HTTP that will show me a stack trace of where the classes furthest from GC are used?
On a side note, is it advisable to use a scheduler inside actors (running inside an Akka HTTP server)? When I do that, memory usage seems to increase heavily and the app runs out of memory on the DEV environment.
Thanks in advance,
Amer
An actor remains active until it is explicitly stopped: there is no garbage collection of actors.
Probably the two most common methods for managing actor lifetimes (beyond the actor itself deciding that it's time to stop) are:
Parent is responsible for stopping children. If the actors are being spawned for performing specific tasks on behalf of the parent, for instance, this approach is called for.
Using an inactivity timeout. If the actors represent domain entities (e.g. an actor for every user account, where the actor in some sense serves as an in-memory cache), a pretty reasonable solution is to use context.setReceiveTimeout so that a ReceiveTimeout message is sent to the actor after the timeout has passed, especially if you use Akka Persistence and Akka Cluster Sharding to allow the actor's state to be recovered. Note that in some cases the scheduled send of that message may not be canceled in time if a message was enqueued in the mailbox but not yet processed when the timeout expired: receiving a ReceiveTimeout is not a guarantee that the timeout has passed since the last received message.
Update to add:
Regarding
shouldn't GC clean things after each request?
The short answer is no, GC will not clean things after each request (and if it does, that's a very good sign that you haven't provisioned enough memory).
The longer answer is that the characteristics of garbage collection on the JVM are very underspecified: the only rule a garbage collection implementation has to respect is that it never frees an object reachable from a GC root (basically any variable on a thread's stack or static to a class) by a chain of strong references. When and even whether the garbage collector reclaims the space taken up by garbage is entirely implementation dependent. I say "whether" to account for the existence of the Epsilon garbage collector, which never frees memory; this is useful for benchmarking JVMs without the complication of garbage collection and also in environments where the application can be restarted when it runs out of memory: the JVM crash is in some sense the actual garbage collector.
You could try executing java.lang.System.gc when the server stops: this may cause a GC run (note that there is no requirement that the system actually collect any garbage in such a scenario). If a garbage collector will free any memory, about the only time it has to run is if there's not enough space to fulfill an object allocation request: therefore if the application stops allocating objects, there may not be a garbage collection run.
For performance reasons, most modern garbage collectors in the JVM wait until there's no more than a certain amount of free space before they collect garbage: this is because the time taken to reclaim all space is proportional to the number of objects which aren't reclaimable and for a great many applications, the pattern is that most objects are comparatively ephemeral, so the number of objects which aren't reclaimable is reasonably constant. The consequence of that is that the garbage collector will do about the same amount of work in a "full" GC for a given application regardless of how much free space there is.
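A rough way to see this from inside the JVM is to sample the used heap around some allocation-heavy work and around an explicit GC hint. The sketch below uses only java.lang.Runtime; the class name HeapProbe and the helper usedHeapBytes are made up for illustration, and the printed numbers depend on the JVM, the collector and the heap settings, so treat them as observations rather than guarantees.

```java
class HeapProbe {
    // Used heap as the JVM reports it: committed heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();

        // Simulate "requests" that allocate short-lived garbage.
        for (int i = 0; i < 1_000; i++) {
            byte[] scratch = new byte[64 * 1024];
            scratch[0] = 1; // touch the array so the allocation is actually used
        }
        long afterWork = usedHeapBytes();

        // This is only a request: the JVM is allowed to ignore it.
        System.gc();
        long afterGcHint = usedHeapBytes();

        System.out.println("used before work:  " + before);
        System.out.println("used after work:   " + afterWork);
        System.out.println("used after System.gc(): " + afterGcHint);
    }
}
```

If the "after work" figure stays above the "before" figure, that alone does not indicate a leak: the garbage may simply not have been collected yet.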
I'm experiencing issues scaling my app with multiple requests.
Each request sends an ask to an actor, which then spawns other actors. This is fine; however, under load (5+ asks at once), the ask takes a massive amount of time to deliver the message to the target actor. The original design was to bulkhead requests evenly, but this is causing a bottleneck. Example:
In this picture, the ask is sent right after the query plan resolver. However, there is a multi-second gap before the actor receives this message. This is only experienced under load (5+ requests/sec). I first thought this was a starvation issue.
Design:
Each planner-executor is a separate instance for each request. It spawns a new 'Request Acceptor' actor each time (it logs 'requesting score' when it receives a message).
I gave the ActorSystem a custom global executor (a big one). I noticed the threads were not utilized beyond the core thread pool size even during this massive delay.
I made sure all ExecutionContexts in the child actors used the correct ExecutionContext.
I made sure all blocking calls inside actors used a future.
I gave the parent actor (and all children) a custom dispatcher with core size 50 and max size 100. It did not request more (it stayed at 50) even during these delays.
Finally, I tried creating a totally new ActorSystem for each request (inside the planner-executor). This also had no noticeable effect!
I'm a bit stumped by this. From these tests it does not look like a thread starvation issue. Back at square one, I have no idea why the message takes longer and longer to deliver the more concurrent requests I make. The Zipkin trace before reaching this point does not degrade with more requests until it reaches the ask here. Before then, the server is able to handle multiple steps, e.g. verify the request, talk to the DB, and then finally go inside the planner-executor. So I doubt the application itself is running out of CPU time.
We had a very similar issue with Akka. We observed a huge delay in the ask pattern when delivering messages to the target actor at peak load.
Most of these issues are related to heap memory consumption rather than to the use of dispatchers.
We finally fixed these issues with the configuration tuning and changes below.
1) Make sure you stop entities/actors which are no longer required. If it's a persistent actor, you can always bring it back when you need it.
Refer : https://doc.akka.io/docs/akka/current/cluster-sharding.html#passivation
2) If you are using cluster sharding then check the akka.cluster.sharding.state-store-mode. By changing this to persistence we gained 50% more TPS.
3) Minimize your log entries (set it to info level).
4) Tune your logging to publish messages to your logging system frequently. Adjust the batch size, batch count and interval accordingly, so that the memory is freed. In our case a huge amount of heap memory was used for buffering log messages and sending them in bulk. If the interval is too long, you may fill up your heap memory, and that affects performance (more GC activity is required).
5) Run blocking operations on a separate dispatcher.
6) Use custom serializers (protobuf) and avoid JavaSerializer.
7) Add the below JAVA_OPTS to your jar
export JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Djava.security.egd=file:/dev/./urandom"
The main setting is -XX:MaxRAMFraction=2, which lets the heap grow to up to half of the available memory. The default is 4, which means the heap is limited to one quarter of the available memory, and that might not be sufficient.
Refer : https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
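To check which heap limit the JVM actually ended up with after applying such options, the value can be read at runtime. This is a minimal sketch using Runtime.maxMemory(); the class name MaxHeapCheck and the helper maxHeapMiB are hypothetical. Note also that on newer JDKs these particular flags have been superseded (for example by -XX:MaxRAMPercentage), so it is worth verifying the effective limit rather than assuming it.

```java
class MaxHeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

Running this inside the container with and without the options above shows whether they had any effect on the heap limit.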
Regards,
Vinoth
I frequently see queues in software architectures, especially those called "scalable", a prominent example being the Actor from the Akka.io multi-actor platform. However, how can a queue be scalable if we have to synchronize placing messages in the queue (and therefore operate in a single thread rather than in multiple threads) and again synchronize taking messages out of the queue (to ensure that each message is taken exactly once)? It gets even more complicated when those messages can change the state of the (actor) system: in this case, even after a message is taken out of the queue, it cannot be load balanced but must still be processed in a single thread.
1) Is it correct that putting messages into a queue must be synchronized?
2) Is it correct that taking messages out of a queue must be synchronized?
3) If 1 or 2 is correct, then how is a queue scalable? Doesn't synchronizing on a single thread immediately create a bottleneck?
4) How can an (actor) system be scalable if it is stateful?
5) Does a stateful actor/bean mean that I have to process messages in a single thread and in order?
6) Does statefulness mean that I have to have a single copy of the bean/actor for the entire system?
7) If 6 is false, then how do I share this state between instances?
8) When I am trying to connect my new P2P node to the network, I believe I need some "server" that will tell me who the other peers are, is that correct? When I am trying to download a torrent, I have to connect to a tracker. If there is a "server", then why do we call it P2P? If this tracker goes down, then I cannot connect to peers, is that correct?
9) Are synchronization and statefulness destroying scalability?
Is it correct that putting messages into a queue must be synchronized?
Is it correct that taking messages out of a queue must be synchronized?
No.
Assuming we're talking about the synchronized Java keyword, that is a reentrant mutual exclusion lock on the object. Even multiple threads accessing that lock can be fast as long as contention is low. And each object has its own lock, so there are many locks, each of which only needs to be held for a short time, i.e. it is fine-grained locking.
But even if it did, queues need not be implemented via mutual exclusion locks. Lock-free and even wait-free queue data structures exist. And the mere presence of locks does not automatically imply single-threaded execution.
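As an illustration, here is a minimal sketch with java.util.concurrent.ConcurrentLinkedQueue, which the JDK documents as using a non-blocking algorithm. Several producers and consumers share one queue without any synchronized block, and each message is still taken exactly once. The class LockFreeQueueDemo and its run method are made up for this example; the concurrent set is there only to verify the result and is not part of the queue.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LockFreeQueueDemo {
    // Returns the number of distinct messages consumed, or -1 if any message was seen twice.
    static int run(int producers, int consumers, int perProducer) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        Set<Integer> seen = ConcurrentHashMap.newKeySet(); // verification only
        AtomicInteger duplicates = new AtomicInteger();
        AtomicInteger consumed = new AtomicInteger();
        int total = producers * perProducer;
        // One thread per producer and consumer, so they all run concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(producers + consumers);

        for (int p = 0; p < producers; p++) {
            final int base = p * perProducer;
            pool.submit(() -> {
                for (int i = 0; i < perProducer; i++) {
                    queue.offer(base + i); // unique message id, no explicit locking
                }
            });
        }
        for (int c = 0; c < consumers; c++) {
            pool.submit(() -> {
                while (consumed.get() < total) {
                    Integer msg = queue.poll(); // returns null when the queue is currently empty
                    if (msg == null) {
                        Thread.yield();
                        continue;
                    }
                    if (!seen.add(msg)) {
                        duplicates.incrementAndGet();
                    }
                    consumed.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return duplicates.get() == 0 ? seen.size() : -1;
    }

    public static void main(String[] args) {
        System.out.println("distinct messages consumed: " + run(4, 4, 1000));
    }
}
```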
The rest of your questions should be asked separately because they are not about message queuing.
Of course you are correct in that a single queue is not scalable. The point of the Actor Model is that you can have millions of Actors and therefore distribute the load over millions of queues, provided you have that many cores in your cluster. Always remember what Carl Hewitt said:
One Actor is no actor. Actors come in systems.
Each single actor is a fully sequential and single-threaded unit of computation. The whole model is constructed such that it is perfectly suited to describe distribution, though; this means that you create as many actors as you need.
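The idea can be sketched in plain Java. This is not how Akka is implemented, just a toy model under stated assumptions: every MiniActor (a hypothetical class) owns its mailbox and processes it on at most one thread at a time, so its state needs no lock, while a small shared thread pool serves many such actors in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class MiniActor implements Runnable {
    private final ConcurrentLinkedQueue<Integer> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor executor;
    private final CountDownLatch processed;
    long sum = 0; // actor state: plain field, only touched by one thread at a time

    MiniActor(Executor executor, CountDownLatch processed) {
        this.executor = executor;
        this.processed = processed;
    }

    // Any thread may send; the message only lands in this actor's own queue.
    void tell(int message) {
        mailbox.offer(message);
        trySchedule();
    }

    private void trySchedule() {
        // The flag guarantees that at most one run() per actor is active or pending.
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            executor.execute(this);
        }
    }

    @Override
    public void run() {
        try {
            Integer message;
            while ((message = mailbox.poll()) != null) {
                sum += message; // sequential processing inside the actor
                processed.countDown();
            }
        } finally {
            scheduled.set(false);
            trySchedule(); // a message may have arrived after the last poll
        }
    }
}

class MiniActorDemo {
    // Returns true if every actor ended up with the expected state.
    static boolean run(int actorCount, int senderCount, int messagesPerSender) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch processed = new CountDownLatch(actorCount * senderCount * messagesPerSender);
        List<MiniActor> actors = new ArrayList<>();
        for (int i = 0; i < actorCount; i++) {
            actors.add(new MiniActor(pool, processed));
        }
        List<Thread> senders = new ArrayList<>();
        for (int s = 0; s < senderCount; s++) {
            Thread sender = new Thread(() -> {
                for (int m = 0; m < messagesPerSender; m++) {
                    for (MiniActor actor : actors) {
                        actor.tell(1);
                    }
                }
            });
            senders.add(sender);
            sender.start();
        }
        try {
            for (Thread sender : senders) {
                sender.join();
            }
            processed.await(); // wait until every message has been handled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        long expected = (long) senderCount * messagesPerSender;
        return actors.stream().allMatch(actor -> actor.sum == expected);
    }

    public static void main(String[] args) {
        System.out.println("all actors consistent: " + run(1000, 4, 50));
    }
}
```

The load is spread over 1000 independent mailboxes here, so senders rarely contend on the same queue, even though each individual actor stays strictly sequential.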
Using 3.4.1
I want to limit the number of entries held in queue memory, so I tried setting the memory limit property in the queue store config, but it's not working. It seems to make no difference whether the property is set: all entries are still stored both in queue memory and in the QueueStore.
Find the code: https://gist.github.com/hitendrapratap/f8d27777f264c0966a39
Is the bounded queue what you are looking for: http://docs.hazelcast.org/docs/3.4/manual/html-single/hazelcast-documentation.html#bounded-queue ?
The following scenario was implemented using threads:
A large queue @work_queue is populated/enqueued by the main thread. Thread::Queue is used here.
≥ 2 connection objects of something are added to @conns; these have to be loaded serially, as part of the loading process uses Expect->spawn.
Multiple worker threads are started, and each thread is given a single $conns[$i] object and a reference to the shared \@work_queue.
Each worker thread safely removes a single item from @work_queue and performs some processing through its connection object, after which it picks up the next available item from @work_queue.
When @work_queue is empty, all the threads shut down safely.
Now, the problem is that the loading phase is taking too long in many cases. But due to the use of Expect->spawn, parallel loading of @conns is possible only in a separate process and not in a thread.
Please suggest a good way to achieve the above scenario using fork. Or, even better if there is a way to use Expect->spawn with threads. (UNIX/LINUX only)
See Is it possible to use threads with Expect?