I have a dummy application.conf as follows:
configuration {
default-dispatcher {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
core-pool-size-min = 4
core-pool-size-factor = 2.0
core-pool-size-max = 8
}
throughput = 10
mailbox-capacity = -1
mailbox-type = ""
}
}
So, there'll be 4 x 2 = 8 threads in the pool, always, and a maximum of 8 x 2 = 16 threads.
Now, if I understand correctly, dispatcher is responsible for picking an actor and a bunch of messages from the mailbox and processing them.
Next, I spawn just one child actor for a supervisor as follows:
val greeter: ActorRef = context.actorOf(GreetingsActor.propsWithDispatcher)
What I'd like to know is.. since there's only one instance of the child actor, only one thread from the pool would ever be in use.
Is my understanding correct?
Yes, a single actor can process 1 message at a time on a single thread. When the message is processed, the thread is returned to thread-pool, thus available for scheduling to other messages and actors.
Related
I'm using Akka for several different Actors. The work done by these Actors is non-blocking. I noticed something odd - the number of dispatchers scales with the number of Actors I'm creating. If I create hundreds of actors, I find myself with hundreds of dispatchers, sometimes over 1000.
This is, even though, most of the dispatchers look like this:
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000003d503de50> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
(basically, doing nothing most of the time)
I initialize dispatchers with calls like this:
ActorMaterializer(ActorMaterializerSettings(system).withDispatcher(s"akka.dispatchers.$dispatcherName"))(system))
My dispatcher configuration is below (we have different dispatchers for different actors):
dispatchers {
connector-actor-dispatcher {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
fixed-pool-size = 200
}
throughput = 1
}
http-actor-dispatcher {
type = Dispatcher
executor = "fork-join-executor"
fork-join-executor {
parallelism-min = 1
parallelism-factor = 1.0
parallelism-max = 64
task-peeking-mode = "FIFO"
}
throughput = 1
}
commands-dispatcher {
type = Dispatcher
executor = "fork-join-executor"
fork-join-executor {
parallelism-min = 1
parallelism-factor = 1.0
parallelism-max = 64
task-peeking-mode = "FIFO"
}
throughput = 1
}
http-server-dispatcher {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
core-pool-size-factor = 1
}
throughput = 1
}
http-client-dispatcher-low {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
core-pool-size-factor = 1
}
throughput = 1
}
http-client-dispatcher-high {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
core-pool-size-factor = 1
}
throughput = 1
}
http-client-dispatcher-parser {
type = Dispatcher
executor = "thread-pool-executor"
thread-pool-executor {
fixed-pool-size = 200
}
throughput = 1
}
}
}
What am I missing?
It seems that this mostly answered in the comments above, but to collate them into an "Answer": it appears that "so many dispatchers" are getting created because you are creating them explicitly in your config.
Also, when you give an example of a "dispatcher" you are actually showing a thread stack trace. So you might be confusing threads with dispatchers. And many of the the dispatchers you are creating have large thread counts. As #Tim says "If you are creating hundreds of actors and each actor has its own dispatcher with many threads, you are going to get a lot of threads!"
But that's really a tuning question. The answer to the direct question is that generally (with the exception of the system dispatcher), dispatchers are generally only created when you specifically ask for them to be created. And it appears that you are creating many of them. And that you also have enormous thread counts for each dispatcher.
As discussed, the general best practice is to have one dispatcher for non-blocking actors and another dispatcher with blocking actors. Each dispatcher will also generally only need a small number of threads. There are some edge cases where you might want additional dedicated dispatchers for particularly sensitive or badly behaving actors, but it depends on your actors and your application.
Suppose I have a method to be executed once on every worker node.
The following code is what I have come up with to achieve this goal, but it seems that the method is executed twice on the same worker node(there are a master and two worker nodes altogether).
val noOfExecs = sparkSession.sparkContext.getExecutorMemoryStatus.keys.size
val results = sparkSession.sparkContext
.parallelize(0 until noOfExecs, noOfExecs)
.map { _ =>
new SomeClass().doSomething()
}
.cache()
results.count()
How can I make sure that the method is executed only once on every worker node?
Maybe you've confused yourself in drawing the conclusion. Why do you say the method is executed twice on the same worker node?
a few things needs to be clarified for spark:
the noOfExecs by using the method sparkSession.sparkContext.getExecutorMemoryStatus.keys.size will return the total number of executors plus driver. which will be 3 if you have two workers/executors.
breaking down your code into a few chunks, first a data set to be parallelized out to the spark cluster, it is basically an array/range of integers. (0,1,2). Note you can not really control which integer get sent to which worker.
and you map over the integer(s), so there are 3 values in the data set across all 2 workers, and you ask the worker to do something. ( below I've modified it to print to the WORKERs stdout - console output, so when you check the WORKER log, you know which data is executed on that worker.)
the rest of cache or results.count() are just noise.
a method will be executed once on each worker, if you use method like map. so you do not have to ensure this, spark should.
see below. and you should be able to check worker's log. in my test, 1 work log has this
on worker, this method is executed for data number: 1, happy sharing
and the other worker has this:
on worker, this method is executed for data number: 0, happy sharing
on worker, this method is executed for data number: 2, happy sharing
below is your code being modified.
class SomeClass()
{
def doSomething(x:Int) = {
println(s"on worker, this method is executed for data number: $x, happy sharing")
}
}
// below return 3 for 1 driver, 2 executors/workers cluster setup
val driverAndWorkers = spark.sparkContext.getExecutorMemoryStatus
val noOfExecs = driverAndWorkers.keys.size
//below is basicaly '0,1,2'
val data = 0 until noOfExecs
val rddOfInt = spark.sparkContext.parallelize(data,noOfExecs) //, noOfExecs can be removed. in this case/topic, it does not matter how you partition the RDD.
val results = rddOfInt
.map { x =>
new SomeClass().doSomething(x)
}
.cache()
results.count()
Situation
I am using akka actors to update data on my web-client. One of those actors is solely repsonsible for sending updates concerning single Agents. These agents are updated very rapidly (every 10ms). My goal now is to throttle this updating mechanism so that the newest version of every Agent is sent every 300ms.
My code
This is what I came up with so far:
/**
* Single agents are updated very rapidly. To limit the burden on the web-frontend, we throttle the messages here.
*/
class BroadcastSingleAgentActor extends Actor {
private implicit val ec: ExecutionContextExecutor = context.dispatcher
private var queue = Set[Agent]()
context.system.scheduler.schedule(0 seconds, 300 milliseconds) {
queue.foreach { a =>
broadcastAgent(self)(a) // sends the message to all connected clients
}
queue = Set()
}
override def receive: Receive = {
// this message is received every 10 ms for every agent present
case BroadcastAgent(agent) =>
// only keep the newest version of the agent
queue = queue.filter(_.id != agent.id) + agent
}
}
Question
This actor (BroadcastSingleAgentActor) works as expected, but I am not 100% sure if this is thread safe (updating the queue while potentionally clearing it). Also, this does not feel like I am making the best out of the tools akka provides me with. I found this article (Throttling Messages in Akka 2), but my problem is that I need to keep the newest Agent message while dropping any old version of it. Is there an example somewhere similar to what I need?
No, this isn't thread safe because the scheduling via the ActorSystem will happen on another thread than the receive. One potential idea is to do the scheduling within the receive method because incoming messages to the BroadcastSingleAgentActor will be handled sequentially.
override def receive: Receive = {
case Refresh =>
context.system.scheduler.schedule(0 seconds, 300 milliseconds) {
queue.foreach { a =>
broadcastAgent(self)(a) // sends the message to all connected clients
}
}
queue = Set()
// this message is received every 10 ms for every agent present
case BroadcastAgent(agent) =>
// only keep the newest version of the agent
queue = queue.filter(_.id != agent.id) + agent
}
I am getting an infrequent NoSuchElementException error when operating over my Scala 2.9.2 Queue. I don't understand the exception becase the Queue has elements in it. I've tried switching over to a SynchronizedQueue, thinking it was a concurrency issue (my queue is written and read to from different threads) but that didn't solve it.
The reduced code looks like this:
val window = new scala.collection.mutable.Queue[Packet]
...
(thread 1)
window += packet
...
(thread 2)
window.dequeueAll(someFunction)
println(window.size)
window.foreach(println(_))
Which results in
32
java.util.NoSuchElementException
at scala.collection.mutable.LinkedListLike$class.head(LinkedListLike.scala:76)
at scala.collection.mutable.LinkedList.head(LinkedList.scala:78)
at scala.collection.mutable.MutableList.head(MutableList.scala:53)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.mutable.MutableList.foreach(MutableList.scala:30)
The docs for LinkedListLike.head() say
Exceptions thrown
`NoSuchElementException`
if the linked list is empty.
but how can this exception be thrown if the Queue is not empty?
You should have window (mutable data structure) accessed from only a single thread. Other threads should send messages to that one.
There is Akka that allows relatively easy concurrent programming.
class MySource(windowHolderRef:ActorRef) {
def receive = {
case MyEvent(packet:Packet) =>
windowHolderRef ! packet
}
}
case object CheckMessages
class WindowHolder {
private val window = new scala.collection.mutable.Queue[Packet]
def receive = {
case packet:Packet =>
window += packet
case CheckMessages =>
window.dequeueAll(someFunction)
println(window.size)
window.foreach(println(_))
}
}
To check messages periodically you can schedule periodic messages.
// context.schedule(1 second, 1 second, CheckMessages)
I am sending my Scala Actor its messages from a for loop. The scala actor is receiving the
messages and getting to the job of processing them. The actors are processing cpu and disk intensive tasks such as unzipping and storing files. I deduced that the Actor part is working fine by putting in a delay Thread.sleep(200) in my message passing code in the for loop.
for ( val e <- entries ) {
MyActor ! new MyJob(e)
Thread.sleep(100)
}
Now, my problem is that the program exits with a code 0 as soon as the for loop finishes execution. Thus preventing my Actors to finish there jobs. How do I get over this? This may be really a n00b question. Any help is highly appreciated!
Edit 1:
This solved my problem for now:
while(MyActor.getState != Actor.State.Terminated)
Thread.sleep(3000)
Is this the best I can do?
Assume you have one actor you're want to finish its work. To avoid sleep you can create a SyncVar and wait for it to be initialized in the main thread:
val sv = new SyncVar[Boolean]
// start the actor
actor {
// do something
sv.set(true)
}
sv.take
The main thread will wait until some value is assigned to sv, and then be woken up.
If there are multiple actors, then you can either have multiple SyncVars, or do something like this:
class Ref(var count: Int)
val numactors = 50
val cond = new Ref(numactors)
// start your actors
for (i <- 0 until 50) actor {
// do something
cond.synchronized {
cond.count -= 1
cond.notify()
}
}
cond.synchronized {
while (cond.count != 0) cond.wait
}