Is ProcessorContext.schedule thread-safe? - apache-kafka

I was wondering whether ProcessorContext.schedule is thread-safe, so that I can spawn a new thread to execute the Punctuator callback.
Also, if a consumer consumes just one partition but we set num.stream.threads=2, does this automatically spawn a new thread for the scheduler?
After trying it out, I found the answer may be "no".
What, then, is the recommended way to spawn a new thread for the scheduler in a thread-safe manner?

Registering a punctuation will not spawn a new thread. The number of threads used is determined by the num.stream.threads configuration only. Hence, if you register a punctuation, it is executed on the same thread as the topology, and thus it is thread-safe.
If you configure more threads than there are input topic partitions, some threads will not get any work assigned and thus will not execute any punctuations.
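For reference, a minimal sketch of what registering a punctuation looks like; the class and method names are made up for illustration, only the Kafka Streams calls are real:

import java.time.Duration;
import org.apache.kafka.streams.processor.Cancellable;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;

public class PunctuationExample {
    // Typically called from a Processor's init(ProcessorContext) method.
    public static Cancellable registerPunctuation(ProcessorContext context) {
        // The returned Cancellable can later be used to deregister the punctuation.
        return context.schedule(
                Duration.ofSeconds(10),          // punctuate every 10 seconds
                PunctuationType.WALL_CLOCK_TIME, // or PunctuationType.STREAM_TIME
                timestamp -> {
                    // Runs on the same stream thread as the topology,
                    // so no extra synchronization is needed here.
                    System.out.println("punctuate at " + timestamp);
                });
    }
}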

Related

Number of dispatcher threads created in Akka & Viewing Akka Actors in an IDE

I have started using Akka. Please clarify the following queries:
I see around 8 default-dispatcher threads being created. Where is that number defined?
I see two types of dispatchers, namely default-dispatcher and internal-dispatcher, created in my setup. How is this decided?
We can see the dispatcher threads in the debug mode of any IDE. Is there any way to visualize Akka objects such as Actors, Routers, the Receptionist, etc.?
I have observed that the dispatcher threads die automatically when I leave the program running for some time. Please explain the reason for this. Does the actor system auto-create and auto-delete dispatchers?
I see a number after default-dispatcher in the logs. What does this number indicate? Does it indicate the thread number allocated by the dispatcher for an actor?
Example: 2022-09-02 10:39:25.482 [MyApp-akka.actor.default-dispatcher-5] DEBUG
The default dispatcher configuration for Akka can be found in the reference.conf file of the akka-actor package. By default, the default dispatcher (on which user actors run if not otherwise configured) is a fork-join pool with a minimum of 8 threads and a maximum of 64 threads. The internal dispatcher (introduced in Akka 2.6 to prevent user actors from starving system actors) is also, IIRC, a fork-join pool. Other dispatcher types and configurations are possible: the comments in reference.conf go through them, though for most purposes the defaults are reasonable.
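As an illustration, here is a small sketch using the Typesafe Config library (which Akka ships with) to print the fork-join settings the default dispatcher resolves to; the config paths are the standard ones from reference.conf:

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class DispatcherDefaults {
    public static void main(String[] args) {
        // Loads reference.conf (plus any application.conf) from the classpath.
        Config forkJoin = ConfigFactory.load()
                .getConfig("akka.actor.default-dispatcher.fork-join-executor");
        System.out.println("parallelism-min    = " + forkJoin.getInt("parallelism-min"));
        System.out.println("parallelism-factor = " + forkJoin.getDouble("parallelism-factor"));
        System.out.println("parallelism-max    = " + forkJoin.getInt("parallelism-max"));
    }
}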
As far as I know, there are no existing tools for visualizing the actors etc. in an Akka application.
The threads are managed by the thread-pool underlying the dispatcher. The fork-join pool will terminate threads which have been idle for a certain length of time and create new threads as needed.
The threads are (by default... at the very least a custom dispatcher could override this) named dispatcher-name-threadNumber. If you log inside an actor and have your log message format include the thread name, you can see which thread the actor was scheduled onto. Note that, typically, an actor being scheduled onto one thread does not imply it will never be scheduled on another thread (in particular, things like thread-local storage are unlikely to work); it's also worth noting that as Akka dispatchers are also Scala ExecutionContexts and Java Executors, the threads can be consumed by things which aren't actors (e.g. Future callbacks in Scala).
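To see this in practice, here is a sketch using the Akka classic Java API (the actor class and system name are made up) that prints the pool thread each message is handled on:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class ThreadNameActor extends AbstractActor {
    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .matchAny(msg -> System.out.println(
                        "handled " + msg + " on " + Thread.currentThread().getName()))
                .build();
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("MyApp");
        ActorRef actor = system.actorOf(Props.create(ThreadNameActor.class));
        // Successive messages may land on different dispatcher threads,
        // e.g. MyApp-akka.actor.default-dispatcher-5.
        for (int i = 0; i < 5; i++) {
            actor.tell("msg-" + i, ActorRef.noSender());
        }
    }
}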

Should we use thread pool for long running threads?

Should we use a thread pool for long-running threads, or start our own threads? Is there some design pattern?
Unfortunately, it depends. There is no hard and fast rule saying that you should always employ thread pools.
Thread pools offer two main things:
Delegated creation/reuse of threads.
Back-pressure
IMO, it's the back-pressure property that's interesting, but often the most poorly understood. Your machine runs on a limited set of resources. If you have (say) 8 CPU cores and they are all busy working, you would like to signal in some way that adding more work (submitting more tasks) isn't going to help, at least not in terms of latency.
This is the reason java.util.concurrent.ExecutorService implementations allow you to specify a java.util.concurrent.BlockingQueue of your choice. When this queue grows full, invoking threads will block until the thread pool has managed to complete tasks in progress.
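As a sketch of one common way to get that back-pressure (the pool size and queue capacity here are arbitrary), a ThreadPoolExecutor with a bounded queue and CallerRunsPolicy throttles producers once the pool is saturated:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BackPressurePool {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4,                         // fixed pool of 4 threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(16), // bounded work queue
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 100; i++) {
            final int task = i;
            // Once the 4 workers and 16 queue slots are full, the submitting
            // thread runs the task itself, naturally slowing down submission.
            pool.execute(() -> {
                try {
                    Thread.sleep(10); // simulate work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("task " + task + " on " + Thread.currentThread().getName());
            });
        }
        pool.shutdown();
    }
}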
Whether or not to have long-running threads inside the thread pool depends on what it's doing. If the thread is constantly busy (meaning it will never complete) then it will always occupy a slot in the thread pool, which is kind of pointless.
Regarding delegated creation/reuse of threads: maybe you could have two pools, one for long-running tasks and one for other tasks. Or perhaps a long-running pool with a single slot, which would prevent two long-running tasks from running at the same time, if that is what you want.
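A sketch of that two-pool idea (pool sizes are arbitrary):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SplitPools {
    public static void main(String[] args) {
        // One single-slot pool: at most one long-running task at a time.
        ExecutorService longRunning = Executors.newSingleThreadExecutor();
        // A separate pool for ordinary short-lived tasks.
        ExecutorService shortTasks = Executors.newFixedThreadPool(4);

        longRunning.submit(() -> {
            // A second long-running task submitted here would queue
            // until this one completes.
        });
        shortTasks.submit(() -> System.out.println("quick task"));

        longRunning.shutdown();
        shortTasks.shutdown();
    }
}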
As you can see, there is no single good answer. It really boils down to what you are trying to achieve and how you want to use the resources at hand.

Netty worker threads

In my Netty server, I create thread pools as follows.
ChannelFactory factory =
    new NioServerSocketChannelFactory(
        Executors.newCachedThreadPool(threadFactory),
        Executors.newCachedThreadPool(threadFactory));
Sometimes I notice that, after a certain number of connections are being handled by the server, subsequent connections wait for one of the prior threads to finish.
From the documentation of newCachedThreadPool, I was under the assumption that the thread pool creates new threads as needed. Could someone help me understand why some of my connections are blocked until the prior connections finish? Would Netty not create a new thread for a new connection when all the existing ones are busy?
How do I fix this?
Any help is appreciated!
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue.
At any point, at most nThreads threads will be active processing tasks.
If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.
The threads in the pool will exist until it is explicitly shutdown.
Note that this quote is actually from the Oracle Javadoc for newFixedThreadPool, not newCachedThreadPool: a cached pool creates new threads as needed and does not cap their number.
The fixed limit you are seeing comes from Netty itself: NioServerSocketChannelFactory draws only a fixed number of worker threads from the worker executor, and in Netty the default is processor_number * 2.
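If that default is too low, the worker count can be raised explicitly via the three-argument constructor; a Netty 3.x sketch, where the value 32 is arbitrary:

import java.util.concurrent.Executors;
import org.jboss.netty.channel.ChannelFactory;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class WorkerCountExample {
    public static void main(String[] args) {
        int workerCount = 32; // arbitrary; tune for your workload
        ChannelFactory factory = new NioServerSocketChannelFactory(
                Executors.newCachedThreadPool(), // boss threads
                Executors.newCachedThreadPool(), // worker threads
                workerCount);                    // cap on worker threads used
    }
}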

how to ensure the mutex shared by each thread averagely

I am trying to find out how to ensure that a mutex (a POSIX thread mutex on Linux) is acquired fairly by each thread.
In my program, there is a global queue guarded by its own mutex lock. A couple of writer threads write one element into the queue at a time, and a single reader thread reads a group of elements out of the queue each time. The result is that the size of the queue always grows larger than the intended limit.
So my question is how to ensure that the mutex is acquired fairly by every thread. Any comments will be appreciated!
I am assuming a scenario of two writer threads, one reader thread, and a common buffering queue with some buffer limit.
There are a couple of ways of doing this.
Create the reader thread with a higher priority than the writer threads. Then, every time the lock is released by one of the writer threads, it will immediately be acquired by the reader thread if it is waiting in the scheduler queue alongside the second writer thread.
Use a globally synchronized flag for the queue operations, with a threshold for reading and writing: say the queue limit is 10 elements; once that limit is reached, only the reader thread is allowed to be scheduled (via the flag) for a certain number of reads, after which the flag is released and things work normally again. This keeps the queue from growing beyond the limit; a sketch of this bounding idea follows below.
Hope both points are clear.
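Here is the sketch of the bounding idea from the second point, written in Java for brevity (a fair ReentrantLock plus Conditions mirrors pthread_mutex_t with pthread_cond_t): writers block once the limit is reached, so the queue cannot outgrow its capacity regardless of how the lock is scheduled.

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedQueue<T> {
    private final Object[] items;
    private int count, head, tail;
    private final ReentrantLock lock = new ReentrantLock(true); // fair lock
    private final Condition notFull  = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    public BoundedQueue(int capacity) { items = new Object[capacity]; }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (count == items.length) notFull.await(); // writer waits at the limit
            items[tail] = item;
            tail = (tail + 1) % items.length;
            count++;
            notEmpty.signal();
        } finally { lock.unlock(); }
    }

    @SuppressWarnings("unchecked")
    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await(); // reader waits for data
            T item = (T) items[head];
            head = (head + 1) % items.length;
            count--;
            notFull.signal();
            return item;
        } finally { lock.unlock(); }
    }
}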

how is HawtDispatch different from Java's Executors? (and netty)

Frustratingly, HawtDispatch's website describes it as "thread pooling and NIO event notification framework API."
Let's take the 'thread pooling' part first. Most of the Executors provided by Java are also basically thread pools. How is HawtDispatch different?
It is also apparently an "NIO event notification framework API." I'm assuming it is a thin layer on top of NIO which takes incoming data and passes it to its notion of a 'thread pool', which hands it to a consumer when the thread pool scheduler finds the time. Correct? (Any improvement over NIO is welcome.) Has anyone done any performance analysis of Netty vs. HawtDispatch?
HawtDispatch is designed around a single, system-wide, fixed-size thread pool. It provides two flavors of Java Executor:
Global Dispatch Queue: submitted Runnable objects are executed concurrently (you get the same effect using an Executors.newFixedThreadPool(n) executor)
Serial Dispatch Queue: submitted Runnable objects are executed serially (you get the same effect using an Executors.newSingleThreadExecutor() executor)
Unlike the Java executor model, all global and serial dispatch queues share a single fixed-size thread pool. You can use thousands of serial dispatch queues without increasing your thread count. Serial dispatch queues can be used like Erlang mailboxes to drive reactive, actor-style applications.
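A sketch of both flavors, assuming the org.fusesource.hawtdispatch API (the queue label is made up); both queues draw threads from the same fixed-size global pool:

import org.fusesource.hawtdispatch.DispatchQueue;
import static org.fusesource.hawtdispatch.Dispatch.createQueue;
import static org.fusesource.hawtdispatch.Dispatch.getGlobalQueue;

public class QueueFlavors {
    public static void main(String[] args) throws InterruptedException {
        // Global queue: tasks may run concurrently, like a fixed thread pool.
        getGlobalQueue().execute(() ->
                System.out.println("concurrent task on " + Thread.currentThread().getName()));

        // Serial queue: tasks run one at a time in submission order,
        // like a single-thread executor but without a dedicated thread.
        DispatchQueue serial = createQueue("my-serial-queue");
        for (int i = 0; i < 3; i++) {
            final int n = i;
            serial.execute(() ->
                    System.out.println("serial task " + n + " on " + Thread.currentThread().getName()));
        }
        Thread.sleep(500); // give the pool's daemon threads time to run (sketch only)
    }
}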
Since HawtDispatch uses a fixed-size thread pool for processing all global and serial queue executions, all Runnable tasks it executes must be non-blocking. In a way this is similar to the NodeJS architecture, except that it uses multiple threads instead of just one.
In comparison to Netty, HawtDispatch is not a framework for actually processing socket data. It does not provide a framework for how to encode/decode, buffer, and process the socket data. All it does is execute a user-configured Runnable when data can be read or written on a non-blocking socket. It is then up to your application to actually read/write the socket data.