Thread monitoring for Scala actors

Is there a way to monitor how many threads are actually alive and running my scala actors ?

The only way to do this properly is to inject your own executor into the actors subsystem because, by default, the actor threads do not have actor- or Scala-specific names (they may just be called Thread-N or pool-N-thread-M, depending on which version of Scala you are using).
Philipp Haller has given instructions on using your own executor, with which you can monitor thread usage if you wish, or at the very least name the threads so created. If you override thread naming, you can then use the standard Java platform MBeans (i.e. ThreadMXBean) to monitor the threads programmatically (or via JConsole/JVisualVM).
Note that you can control the default mechanism using the system properties:
actors.minPoolSize
actors.maxPoolSize
actors.corePoolSize
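
For example, if your custom executor names its threads with a known prefix, a small helper along these lines can count them via ThreadMXBean. This is a minimal sketch; the prefix "actor-pool-" is just an assumed example of a name you might choose, not a Scala default:

import java.lang.management.ManagementFactory

object ActorThreadMonitor {
  private val threadMx = ManagementFactory.getThreadMXBean

  // Count live threads whose names start with the given prefix.
  // getThreadInfo may return null entries for threads that died in the meantime.
  def countThreads(namePrefix: String): Int = {
    val infos = threadMx.getThreadInfo(threadMx.getAllThreadIds)
    infos.count(info => info != null && info.getThreadName.startsWith(namePrefix))
  }
}

// Usage: println(ActorThreadMonitor.countThreads("actor-pool-"))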

You might try the VisualVM tool (available free from Sun). Among other things, it can monitor threads in running JVMs.

Related

Number of dispatcher threads created in Akka & Viewing Akka Actors in an IDE

I have started using Akka. Please clarify the following queries:
1. I see around 8 default-dispatcher threads being created. Where is that number defined?
2. I see two types of dispatchers, namely default-dispatcher and internal-dispatcher, being created in my setup. How is this decided?
3. We can see the dispatcher threads in the debug mode of any IDE. Is there any way to visualize the Akka objects such as Actors, Routers, the Receptionist, etc.?
4. I have observed that the dispatcher threads die automatically when I leave the program running for some time. Please explain the reason for this. Does the actor system auto-create and auto-delete the dispatchers?
5. I see a number after default-dispatcher in the logs. What does this number indicate? Does it indicate the thread number allocated by the dispatcher for an actor?
Example: 2022-09-02 10:39:25.482 [MyApp-akka.actor.default-dispatcher-5] DEBUG
The default dispatcher configuration for Akka can be found in the reference.conf file of the akka-actor package. By default, the default dispatcher (on which user actors run if not otherwise configured) is a fork-join pool with a minimum of 8 threads and a maximum of 64 threads. The internal dispatcher (introduced in Akka 2.6 to prevent user actors from starving system actors) is also, IIRC, a fork-join pool. Other dispatcher types and configurations are possible: the comments in reference.conf go through them, though for most purposes the defaults are reasonable.
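
As an illustration, you can make those fork-join settings explicit (or override them) when creating the ActorSystem. The parallelism-min/max values below mirror the 8/64 figures above; the parallelism-factor value is an assumption, so check the reference.conf shipped with your Akka version before relying on it:

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Override (or simply make explicit) the default-dispatcher settings.
val tuned = ConfigFactory.parseString(
  """
  akka.actor.default-dispatcher {
    executor = "fork-join-executor"
    fork-join-executor {
      parallelism-min = 8
      parallelism-factor = 1.0
      parallelism-max = 64
    }
  }
  """).withFallback(ConfigFactory.load())

val system = ActorSystem("MyApp", tuned)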
As far as I know, there are no existing tools for visualizing the actors etc. in an Akka application.
The threads are managed by the thread-pool underlying the dispatcher. The fork-join pool will terminate threads which have been idle for a certain length of time and create new threads as needed.
The threads are (by default... at the very least a custom dispatcher could override this) named dispatcher-name-threadNumber. If you log inside an actor and have your log message format include the thread name, you can see which thread the actor was scheduled onto. Note that, typically, an actor being scheduled onto one thread does not imply it will never be scheduled on another thread (in particular, things like thread-local storage are unlikely to work); it's also worth noting that as Akka dispatchers are also Scala ExecutionContexts and Java Executors, the threads can be consumed by things which aren't actors (e.g. Future callbacks in Scala).
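
A quick way to see this for yourself is to print the current thread name from inside an actor. A minimal sketch with classic Akka actors (the system and actor names are arbitrary):

import akka.actor.{Actor, ActorSystem, Props}

class WhereAmI extends Actor {
  def receive = {
    case msg =>
      // Prints something like [MyApp-akka.actor.default-dispatcher-5],
      // i.e. which dispatcher thread picked up this particular message.
      println(s"[${Thread.currentThread().getName}] processing $msg")
  }
}

object Demo extends App {
  val system = ActorSystem("MyApp")
  val ref = system.actorOf(Props(new WhereAmI), "whereAmI")
  (1 to 5).foreach(i => ref ! s"message-$i")
  // system.terminate() when finished
}

Running this a few times typically shows successive messages landing on different dispatcher threads, which is exactly why thread-local storage is a bad idea in actors.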

Using Scala Akka framework for blocking CLI calls

I'm relatively new to Akka and Scala, but I would like to use Akka as a generic framework to pull together information from various web tools and CLI commands.
I understand the general principle that in an actor model it is highly desirable not to have the actors block. In the case of the HTTP requests there are async HTTP clients (such as Spray), which means I can handle the requests asynchronously within the actor framework.
However, I'm unsure what the best approach is when combining actors with existing blocking API calls such as the Scala ProcessBuilder/ProcessIO libraries. In terms of issuing these CLI commands I expect a relatively small amount of concurrency, e.g. perhaps executing a max of 10 concurrent CLI invocations on a 12 core machine.
Is it better to have a single actor managing these CLI commands, farming the actual work off to Futures that are created as needed? Or would it be cleaner just to maintain a set of separate actors backed by a PinnedDispatcher? Or something else?
From the Akka documentation ( http://doc.akka.io/docs/akka/snapshot/general/actor-systems.html#Blocking_Needs_Careful_Management ):
"
Blocking Needs Careful Management
In some cases it is unavoidable to do blocking operations, i.e. to put a thread to sleep for an indeterminate time, waiting for an external event to occur. Examples are legacy RDBMS drivers or messaging APIs, and the underlying reason is typically that (network) I/O occurs under the covers. When facing this, you may be tempted to just wrap the blocking call inside a Future and work with that instead, but this strategy is too simple: you are quite likely to find bottlenecks or run out of memory or threads when the application runs under increased load.
The non-exhaustive list of adequate solutions to the “blocking problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router [Java, Scala]), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.
The first possibility is especially well-suited for resources which are single-threaded in nature, like database handles which traditionally can only execute one outstanding query at a time and use internal synchronization to ensure this. A common pattern is to create a router for N actors, each of which wraps a single DB connection and handles queries as sent to the router. The number N must then be tuned for maximum throughput, which will vary depending on which DBMS is deployed on what hardware."
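
For the CLI case in the question, here is a minimal sketch of the "Future with a bounded, dedicated thread pool" option from the list above, using scala.sys.process. The pool size of 10 simply mirrors the concurrency limit mentioned in the question and should be tuned for your hardware:

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.sys.process._

object CliRunner {
  // A dedicated, bounded pool for blocking CLI calls, separate from the
  // actor dispatcher so blocked threads cannot starve the actors.
  private val blockingEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(10))

  def run(command: Seq[String]): Future[String] =
    Future {
      command.!!   // blocks this (dedicated) thread until the process exits, returns stdout
    }(blockingEc)
}

// Usage (hypothetical command):
// CliRunner.run(Seq("uname", "-a")).foreach(println)(ExecutionContext.global)

The same pool could equally back a router of actors via a dedicated dispatcher; the essential point is the bound on the number of threads doing blocking work.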

How is HawtDispatch different from Java's Executors? (and Netty)

Frustratingly, HawtDispatch's website describes it as "thread pooling and NIO event notification framework API."
Let's take the 'thread pooling' part first. Most of the Executors provided by Java are also basically thread pools. How is HawtDispatch different?
It is also apparently an "NIO event notification framework API." I'm assuming it is a thin layer on top of NIO which takes incoming data, passes it to its notion of a 'thread pool,' and passes it to a consumer when the thread pool scheduler finds the time. Correct? (Any improvement over NIO is welcome.) Has anyone done any performance analysis of Netty vs HawtDispatch?
HawtDispatch is designed to be a single, system-wide, fixed-size thread pool. It implements two flavors of Java Executors:
Global Dispatch Queue: submitted Runnable objects are executed concurrently (you get the same effect using an Executors.newFixedThreadPool(n) executor)
Serial Dispatch Queue: submitted Runnable objects are executed serially (you get the same effect using an Executors.newSingleThreadExecutor() executor)
Unlike the Java executor model, all global and serial dispatch queues share a single fixed-size thread pool. You can use thousands of serial dispatch queues without increasing your thread count. Serial dispatch queues can be used like Erlang mailboxes to drive reactive, actor-style applications.
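
To illustrate the idea (this is a sketch of the concept using plain java.util.concurrent primitives, not HawtDispatch's own API): many logical "serial queues" can each guarantee in-order execution while all sharing one fixed-size pool, so creating more queues adds no threads.

import java.util.concurrent.{ConcurrentLinkedQueue, Executor, Executors}
import java.util.concurrent.atomic.AtomicBoolean

class SerialQueue(shared: Executor) extends Executor {
  private val tasks = new ConcurrentLinkedQueue[Runnable]()
  private val scheduled = new AtomicBoolean(false)

  def execute(task: Runnable): Unit = {
    tasks.add(task)
    trySchedule()
  }

  // At most one drain job per queue runs on the shared pool at a time,
  // which is what guarantees serial (in-order) execution.
  private def trySchedule(): Unit =
    if (scheduled.compareAndSet(false, true)) {
      shared.execute(new Runnable {
        def run(): Unit = {
          var next = tasks.poll()
          while (next != null) { next.run(); next = tasks.poll() }
          scheduled.set(false)
          // Re-check in case a task arrived just after the queue drained.
          if (!tasks.isEmpty) trySchedule()
        }
      })
    }
}

// All serial queues share this one pool, analogous to HawtDispatch's
// single system-wide pool:
// val shared = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors())
// val q1 = new SerialQueue(shared); val q2 = new SerialQueue(shared)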
Since HawtDispatch uses a fixed-size thread pool for processing all global and serial queue executions, all Runnable tasks it executes must be non-blocking. In a way this is similar to the NodeJS architecture, except that it uses multiple threads instead of just one.
In comparison to Netty, HawtDispatch is not a framework for actually processing socket data. It does not provide a framework for how to encode/decode, buffer, and process the socket data. All it does is execute a user-configured Runnable when data can be read or written on the non-blocking socket. It's then up to your application to actually read/write the socket data.

JBoss Messaging WorkerThread#: what are these threads?

I am load testing a JBoss Messaging install with 5 producers producing 100,000 100k messages. I am seeing significant bottlenecking. When I monitor the profiler, I see there are 15 threads named WorkerThread#. These threads are allocated 100% with no waits. I think they may be related. Does anyone know what function these threads serve and whether there is a thread pool setting? I am using a supported configuration:
JBoss Enterprise Application Server 4.3 CP08
JBoss Enterprise Service Bus 4.4 CP04
JBoss Transactions 4.2.3._CP07
JBoss Messaging 1.4.0.SP3-CP09
JBoss Rules 4.0.7
JBoss jBPM 3.2.9
JBoss Web Services 2.0.1.SP2_CP07
I've figured it out. It's not a pool of threads. In the jboss-messaging.sar/remoting-bisocket.xml file that defines the remoting connector for JBoss Messaging, you see a few values, mainly clientMaxPool, maxPoolSize, and numAcceptThreads.
In Remoting, when a socket is established, threads are created to monitor that socket, up to the value of numAcceptThreads. All such a thread does is read data from the socket and hand it off to a thread in the client pool (governed by maxPoolSize).
The threads called WorkerThread#[] are the accept threads. The reason I see more when I create more producers is that, for the bisocket transport used by JBoss Messaging, there are apparently three sockets created per producer. Initially there are 3, but when I create 5 producers that number increases to 15 (or 5*3 for those not mathematically inclined :)). The reason they are 100% allocated is that when I am sending all those messages the threads read from the socket, hand off to a server thread, and go back to reading from the socket (where there is always data).
So the short answer is that there is no pool governing these threads. You can have more than one accept thread, but it would almost never make sense: its job is so minimal (read the data, hand it off, read the data...) that adding threads would just add synchronization overhead.
This is from http://download.oracle.com/javase/tutorial/uiswing/concurrency/worker.html; hope it helps.
When a Swing program needs to execute a long-running task, it usually uses one of the worker threads, also known as the background threads. Each task running on a worker thread is represented by an instance of javax.swing.SwingWorker. SwingWorker itself is an abstract class; you must define a subclass in order to create a SwingWorker object; anonymous inner classes are often useful for creating very simple SwingWorker objects.
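
For completeness, subclassing SwingWorker from Scala looks roughly like this; the sleep is just a placeholder standing in for the real long-running work:

import javax.swing.SwingWorker

val worker = new SwingWorker[String, Unit] {
  override protected def doInBackground(): String = {
    Thread.sleep(2000)   // placeholder for the long-running task, off the EDT
    "finished"
  }
  override def done(): Unit =
    println(get())       // done() runs back on the event dispatch thread
}
worker.execute()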

Is there any Non-blocking IO open source implementation for Scala's actors?

I have quite large files that I need to deal with (500Meg+ zip files).
Are there any non-blocking IO open source implementations for Scala's actors?
If I got your question right, you need non-blocking IO for files. I have bad news for you then.
NIO
Java NIO in Java 6 supports only blocking operations when working with files. You may notice this from the fact that FileChannel does not implement the SelectableChannel interface. (NIO does, however, support non-blocking mode for sockets.)
There is the NIO.2 (JSR-203) specification, aimed at overcoming many current limits of java.io and NIO and at providing support for asynchronous IO on files as well. NIO.2 is to be released with Java 7, as far as I understand.
These are Java library limits, hence you will suffer from them in Scala as well.
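
For reference, once NIO.2 is available, an asynchronous file read looks roughly like this (the file name is a placeholder):

import java.nio.ByteBuffer
import java.nio.channels.{AsynchronousFileChannel, CompletionHandler}
import java.nio.file.{Paths, StandardOpenOption}

val channel = AsynchronousFileChannel.open(Paths.get("big.zip"), StandardOpenOption.READ)
val buffer = ByteBuffer.allocate(64 * 1024)

// The read returns immediately; the handler is invoked when data is available.
channel.read(buffer, 0L, buffer, new CompletionHandler[Integer, ByteBuffer] {
  def completed(bytesRead: Integer, buf: ByteBuffer): Unit = {
    println(s"read $bytesRead bytes without blocking the caller")
    channel.close()
  }
  def failed(err: Throwable, buf: ByteBuffer): Unit = {
    err.printStackTrace()
    channel.close()
  }
})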
Actors
Scala's actors are based on Doug Lea's Fork-Join framework (at least in the 2.7.x branch, up to version 2.7.7). One quote from the FJTask class:
There is nothing actually preventing you from blocking within a FJTask, and very short waits/blocks are completely well behaved. But FJTasks are not designed to support arbitrary synchronization since there is no way to suspend and resume individual tasks once they have begun executing. FJTasks should also be finite in duration -- they should not contain infinite loops. FJTasks that might need to perform a blocking action, or hold locks for extended periods, or loop forever can instead create normal java Thread objects that will do so. FJTasks are just not designed to support these things.
The FJ library is enhanced in Scala to provide a unified model that lets an actor behave like a thread or like an event-based task, depending on the number of worker threads and on "library activity" (you may find the explanation in the technical report "Actors that Unify Threads and Events" by Philipp Haller and Martin Odersky).
Solution?
But after all, if you run blocking code in an actor it behaves just as if it were a thread, so why not use an ordinary Thread for the blocking reads and send events to event-based actors from that thread?
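
A minimal sketch of that approach with the old scala.actors API: a plain Thread does the blocking reads and feeds chunks to an event-based actor (the file name and chunk size are arbitrary placeholders):

import java.io.FileInputStream
import scala.actors.Actor._

// Event-based consumer: reacts to chunks without tying up a thread while idle.
val consumer = actor {
  loop {
    react {
      case chunk: Array[Byte] => println("got " + chunk.length + " bytes")
      case "EOF"              => println("done"); exit()
    }
  }
}

// Ordinary thread doing the blocking file IO and feeding the actor.
val reader = new Thread(new Runnable {
  def run() {
    val in = new FileInputStream("big.zip")
    try {
      val buf = new Array[Byte](64 * 1024)
      var n = in.read(buf)
      while (n >= 0) {
        consumer ! buf.take(n)   // copy, since buf is reused on the next read
        n = in.read(buf)
      }
    } finally in.close()
    consumer ! "EOF"
  }
})
reader.start()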
Are you talking about remote actors? A standard Actor is of course an intra-JVM entity. I'm not aware of an NIO implementation of remote actors, I'm afraid.
Hello, is this an option for you?
bigdata(R) is a scale-out storage and computing fabric supporting optional transactions, very high concurrency, and very high aggregate IO rates.
http://sourceforge.net/projects/bigdata/
Not that I know of, but you could probably get a lot of mileage out of looking at Naggati, a Scala wrapper around Apache Mina. Mina is a networking library that uses NIO; Naggati translates this into a Scala style of coding.