I am using Play 2.1 with Scala to run several tests continuously.
I am doing a Future.traverse(tests)(test => Future(runTest(test))).
I want to limit the number of tests running in parallel so I want to limit the number of threads in the default dispatcher.
I tried to put
play {
  akka {
    event-handlers = ["akka.event.slf4j.Slf4jEventHandler"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-factor = 1.0
          parallelism-max = 2
        }
      }
    }
  }
}
in the application.conf, but it seems to have no effect (when I run the program there is still one thread per core). The application.conf is read correctly for other Play settings.
I tried removing the surrounding play {} block, but it changed nothing.
I tried different imports of execution contexts with no success:
//import scala.concurrent.ExecutionContext.Implicits._
import play.api.libs.concurrent.Execution.Implicits._
When I run the application I get this message, so it seems it is the default dispatcher that is being used:
[info] play - Starting application default Akka system.
Does someone have an idea why I can't configure the default dispatcher?
Thank you!
List of threads:
main
Reference Handler
Finalizer
Signal Dispatcher
FSEvent thread
Attach Listener
play-scheduler-1
Timer-0
com.google.common.base.internal.Finalizer
BoneCP-keep-alive-scheduler
BoneCP-max-alive-scheduler
BoneCP-pool-alive-scheduler
application-akka.actor.default-dispatcher-2
application-scheduler-1
ForkJoinPool-3-worker-1
default-scheduler-1
default-scheduler-1
default-scheduler-1
default-akka.actor.default-dispatcher-3
default-akka.actor.default-dispatcher-5
default-akka.actor.default-dispatcher-3
default-akka.actor.default-dispatcher-2
default-akka.actor.default-dispatcher-5
default-pinned-dispatcher-4
play-akka.actor.default-dispatcher-2
play-akka.actor.default-dispatcher-4
Timer-1
Timer-3
Timer-4
Hashed wheel timer #1
Hashed wheel timer #2
Hashed wheel timer #3
AsyncHttpClient-Reaper
AsyncHttpClient-Reaper
AsyncHttpClient-Reaper
default-pinned-dispatcher-4
default-pinned-dispatcher-4
New I/O boss #35
New I/O boss #44
And 8 play-internal-execution-context- (1 to 8)
And 8 iteratee-execution-context- (1 to 8)
And 62 New I/O worker # (1 to 62)
The setting you have forgotten is parallelism-min, which defaults to 8. But before you go and change that one, please consider not using the default dispatcher for this purpose: restricting it to two threads may well break the system. I'd recommend configuring a specific dispatcher to be used for your futures.
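As a minimal sketch (the dispatcher name test-dispatcher and the pool bounds are illustrative, not prescribed), you could add a dedicated dispatcher to application.conf:

test-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-max = 2
  }
}

and look it up as the ExecutionContext backing your futures (Akka dispatchers implement ExecutionContext):

import play.api.Play.current
import play.api.libs.concurrent.Akka
import scala.concurrent.{ExecutionContext, Future}

// Look up the dedicated dispatcher instead of importing the default one
implicit val testContext: ExecutionContext =
  Akka.system.dispatchers.lookup("test-dispatcher")

Future.traverse(tests)(test => Future(runTest(test)))

That way the two-thread cap applies only to the test runs, and Play keeps its default dispatcher intact.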
Related
I'm building an app that has the following flow:
There is a source of items to process
Each item should be processed by an external command (it will be ffmpeg in the end, but for this simple reproducible use case it is just cat, so data passes straight through)
In the end, the output of the external command is saved somewhere (again, for the sake of this example it is just saved to a local text file)
So I'm doing the following operations:
Prepare a source with items
Make an Akka graph that uses Broadcast to fan out the source items into individual flows
Each individual flow uses ProcessBuilder in conjunction with Flow.fromSinkAndSource to build a flow out of this external process execution
End each individual flow with a sink that saves the data to a file.
Complete code example:
import akka.actor.ActorSystem
import akka.stream.scaladsl.GraphDSL.Implicits._
import akka.stream.scaladsl._
import akka.stream.ClosedShape
import akka.util.ByteString
import java.io.{BufferedInputStream, BufferedOutputStream}
import java.nio.file.Paths
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

object MyApp extends App {
  // When this is changed to something above 15, the graph just stops
  val PROCESSES_COUNT = Integer.parseInt(args(0))
  println(s"Running with ${PROCESSES_COUNT} processes...")

  implicit val system = ActorSystem("MyApp")
  implicit val globalContext: ExecutionContext = ExecutionContext.global

  def executeCmdOnStream(cmd: String): Flow[ByteString, ByteString, _] = {
    val convertProcess = new ProcessBuilder(cmd).start
    val pipeIn = new BufferedOutputStream(convertProcess.getOutputStream)
    val pipeOut = new BufferedInputStream(convertProcess.getInputStream)
    Flow
      .fromSinkAndSource(StreamConverters.fromOutputStream(() => pipeIn), StreamConverters.fromInputStream(() => pipeOut))
  }

  val source = Source(1 to 100)
    .map(element => {
      println(s"--emit: ${element}")
      ByteString(element)
    })

  val sinksList = (1 to PROCESSES_COUNT).map(i => {
    Flow[ByteString]
      .via(executeCmdOnStream("cat"))
      .toMat(FileIO.toPath(Paths.get(s"process-$i.txt")))(Keep.right)
  })

  val graph = GraphDSL.create(sinksList) { implicit builder => sinks =>
    val broadcast = builder.add(Broadcast[ByteString](sinks.size))
    source ~> broadcast.in
    for (i <- broadcast.outlets.indices) {
      broadcast.out(i) ~> sinks(i)
    }
    ClosedShape
  }

  Await.result(Future.sequence(RunnableGraph.fromGraph(graph).run()), Duration.Inf)
}
Run this using the following command:
sbt "run PROCESSES_COUNT"
e.g.
sbt "run 15"
This all works quite well until I raise the number of "external processes" (PROCESSES_COUNT in the code). When it's 15 or fewer, all goes well, but when it's 16 or more, the following things happen:
The whole execution just hangs after emitting the first 16 items (16 is Akka's default buffer size, AFAIK)
I can see that the cat processes are started in the system (all 16 of them)
When I manually kill one of these cat processes, something frees up and processing continues (of course, in the result one file is empty, because I killed its processing command)
I checked that this is definitely caused by the external execution (not, for example, a limit of the Akka Broadcast itself).
I recorded a video showing both situations (first, 15 items working fine, and then 16 items hanging and being freed up by killing one process) - link to the video
Both the code and video are in this repo
I'd appreciate any help or suggestions on where to look for a solution to this one.
It is an interesting problem, and it looks like the stream is deadlocking. Increasing the thread count may fix the symptom but not the underlying problem.
The problem is the following code:
Flow
.fromSinkAndSource(
StreamConverters.fromOutputStream(() => pipeIn),
StreamConverters.fromInputStream(() => pipeOut)
)
Both fromInputStream and fromOutputStream use the same default-blocking-io-dispatcher, as you correctly noticed. The reason for using a dedicated thread pool is that both perform Java API calls that block the running thread.
Here is part of a thread stack trace of fromInputStream that shows where the blocking happens:
at java.io.FileInputStream.readBytes(java.base@11.0.13/Native Method)
at java.io.FileInputStream.read(java.base@11.0.13/FileInputStream.java:279)
at java.io.BufferedInputStream.read1(java.base@11.0.13/BufferedInputStream.java:290)
at java.io.BufferedInputStream.read(java.base@11.0.13/BufferedInputStream.java:351)
- locked <merged>(a java.lang.ProcessImpl$ProcessPipeInputStream)
at java.io.BufferedInputStream.read1(java.base@11.0.13/BufferedInputStream.java:290)
at java.io.BufferedInputStream.read(java.base@11.0.13/BufferedInputStream.java:351)
- locked <merged>(a java.io.BufferedInputStream)
at java.io.FilterInputStream.read(java.base@11.0.13/FilterInputStream.java:107)
at akka.stream.impl.io.InputStreamSource$$anon$1.onPull(InputStreamSource.scala:63)
Now, you're running 16 simultaneous Sinks that are connected to a single Source. To support back-pressure, a Source will only produce an element when all Sinks send a pull command.
What happens next is that you have 16 calls to FileInputStream.readBytes at the same time, and they immediately block all the threads of default-blocking-io-dispatcher. There are no threads left for fromOutputStream to write any data from the Source or perform any kind of work. Thus, you have a deadlock.
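For reference, this matches the default pool size in Akka's reference.conf (worth double-checking against the Akka version you are on):

akka.actor.default-blocking-io-dispatcher {
  type = "Dispatcher"
  executor = "thread-pool-executor"
  throughput = 1
  thread-pool-executor {
    fixed-pool-size = 16
  }
}

With exactly 16 threads in the shared pool, 16 blocked reads leave no thread for any write.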
The problem can be fixed by increasing the number of threads in the pool, but that just removes the symptom.
The correct solution is to run fromOutputStream and fromInputStream on two separate thread pools. Here is how you can do it:
Flow
.fromSinkAndSource(
StreamConverters.fromOutputStream(() => pipeIn).async("blocking-1"),
StreamConverters.fromInputStream(() => pipeOut).async("blocking-2")
)
with the following config:
blocking-1 {
  type = "Dispatcher"
  executor = "thread-pool-executor"
  throughput = 1
  thread-pool-executor {
    fixed-pool-size = 2
  }
}
blocking-2 {
  type = "Dispatcher"
  executor = "thread-pool-executor"
  throughput = 1
  thread-pool-executor {
    fixed-pool-size = 2
  }
}
Because they no longer share a pool, both fromOutputStream and fromInputStream can perform their tasks independently.
Also note that I assigned just 2 threads per pool to show that it's not about the thread count but about the pool separation.
I hope this helps you understand Akka Streams better.
Turns out this was a limit at the Akka configuration level on the blocking IO dispatcher. Changing that value to something bigger than the number of streams fixed the issue:
akka.actor.default-blocking-io-dispatcher.thread-pool-executor.fixed-pool-size = 50
I am trying to fine-tune the Hystrix thread pool core size and max size. For that I need to know and plot the number of active threads in the pool at any time. Is there a way to do so?
Is this the right way?
HystrixThreadPoolKey hystrixThreadPoolKey = new HystrixThreadPoolKey() {
    @Override
    public String name() {
        return threadPoolKey;
    }
};
HystrixThreadPoolMetrics hystrixThreadPoolMetrics = HystrixThreadPoolMetrics.getInstance(hystrixThreadPoolKey);
log.info("Hystrix active threads: {}", hystrixThreadPoolMetrics.getCurrentActiveCount().toString());
I am not sure, because when I use this I get an active thread count of 0 even though the corePoolSize setting is 10.
This code works fine after adding a null check for the period before any request has been made (a sketch of that guard appears after the quote below). But the right way is to use Netflix's Servo.
Netflix Announcement
How to Use
Quoting some part from the announcement:
Servo is designed to make it easy for developers to export metrics from their application code, register them with JMX, and publish them to external monitoring systems
(Also, the active thread count can be less than the core size.)
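For reference, the null-check guard mentioned above could look roughly like this sketch (written in Scala here; it assumes, as the answer states, that HystrixThreadPoolMetrics.getInstance returns null until the pool has served its first command, which is worth verifying against your Hystrix version):

// Sketch: guard against the metrics instance not existing before the first request
val maybeMetrics = Option(HystrixThreadPoolMetrics.getInstance(hystrixThreadPoolKey))
maybeMetrics match {
  case Some(m) => log.info("Hystrix active threads: {}", m.getCurrentActiveCount.toString)
  case None    => log.info("Hystrix pool not initialized yet")
}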
I have a Play (2.3.0) application that does some database lookups. When there are more than 6 users, the application runs into performance problems.
I have narrowed the problem down to a controller with an action that sleeps for 4 seconds.
A test client calls this action every 500 ms. I can see that the first 6 requests are processed, then it stalls for a few seconds (until the 4-second sleep has passed) and processes the next 6.
Also: when I open 7 browser windows, the 7th will not load (it waits for a connection).
Looking at the documentation, it seems my problem is blocking IO, and using the highly synchronous profile should solve it.
Therefore I added this profile to my application.conf, but nothing changed.
My application.conf looks like this:
application.context=/appname/

# Secret key
# ~~~~~
# The secret key is used to secure cryptographics functions.
# If you deploy your application to several instances be sure to use the same key!
application.secret="xxxxx"

play {
  akka {
    akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-min = 300
          parallelism-max = 300
        }
      }
    }
  }
}
and the action:
def performancetestSleep() = Action { request =>
  Thread.sleep(4000)
  Ok("hmmm good sleep")
}
It seems to me the threadpool configuration is ignored. What am I missing here?
What you need for this is really just one thread handling the 4-second delay - a scheduler. Spawning that many threads defeats the whole point of Play's architecture, IMHO. You could instead use the scheduler to create a Future[Result] and feed it into an Action.async block.
Now, you don't really need to implement your own scheduler since Play depends on Akka for its concurrency; and Akka has a scheduler which will do the job.
import scala.concurrent.Promise
import scala.concurrent.duration._
import play.api.Play.current
import play.api.libs.concurrent.Akka

val system = Akka.system

def delayedResponse = Action.async {
  import system.dispatcher
  val promise = Promise[Result]()
  // Complete the promise after 4 seconds; no request thread is blocked meanwhile
  system.scheduler.scheduleOnce(4000.milliseconds) {
    promise.success(Ok("Sorry for the wait!"))
  }
  promise.future
}
I used
activator run
to start the server; that does not seem to pick up the thread pool profile. Using
activator start
does, and now the profile seems to be used. I now need to test whether this solves my problem. I will also have a look at the async call.
I noticed a slight difference between the documentation for 2.1 and 2.0:
2.0
akka.default-dispatcher.core-pool-size-max = 64
akka.debug.receive = on
2.1
akka.default-dispatcher.fork-join-executor.pool-size-max = 64
akka.actor.debug.receive = on
Akka's own documentation has a core-pool-size-max setting like 2.0, but no pool-size-max like 2.1. Why did this change between 2.0 and 2.1? Which is the correct way to configure Akka in Play? Is this a documentation bug in one of the versions?
(In the meantime, I'm going to try and stick both config styles in my Play 2.1 config and hope for the best).
First of all, always use the documentation for the version you're using; in your case you're linking to the snapshot documentation, which is for an unreleased Akka version.
Here's the 2.1.2 docs: http://doc.akka.io/docs/akka/2.1.2/scala/dispatchers.html (also accessible from doc.akka.io)
When we look at that page, we see that under the example configuration for fork-join-executor and thread-pool-executor it says "For more options, see the default-dispatcher section of the Configuration", linking to the reference configuration, where we can find:
# This will be used if you have set "executor = "thread-pool-executor""
thread-pool-executor {
  # Keep alive time for threads
  keep-alive-time = 60s

  # Min number of threads to cap factor-based core number to
  core-pool-size-min = 8

  # The core pool size factor is used to determine thread pool core size
  # using the following formula: ceil(available processors * factor).
  # Resulting size is then bounded by the core-pool-size-min and
  # core-pool-size-max values.
  core-pool-size-factor = 3.0

  # Max number of threads to cap factor-based number to
  core-pool-size-max = 64

  # Minimum number of threads to cap factor-based max number to
  # (if using a bounded task queue)
  max-pool-size-min = 8

  # Max no of threads (if using a bounded task queue) is determined by
  # calculating: ceil(available processors * factor)
  max-pool-size-factor = 3.0

  # Max number of threads to cap factor-based max number to
  # (if using a bounded task queue)
  max-pool-size-max = 64

  # Specifies the bounded capacity of the task queue (< 1 == unbounded)
  task-queue-size = -1

  # Specifies which type of task queue will be used, can be "array" or
  # "linked" (default)
  task-queue-type = "linked"

  # Allow core threads to time out
  allow-core-timeout = on
}
So to conclude: you need to set the default dispatcher to use the thread-pool-executor if you want the ThreadPoolExecutor, via akka.default-dispatcher.executor = "thread-pool-executor", and then specify your configuration for that thread-pool-executor, for example:
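A minimal sketch combining the quoted settings (the values are illustrative; the play { akka { ... } } nesting follows the Play configs shown earlier on this page):

play {
  akka {
    actor {
      default-dispatcher = {
        executor = "thread-pool-executor"
        thread-pool-executor {
          core-pool-size-min = 8
          core-pool-size-max = 64
        }
      }
    }
  }
}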
Does that help?
Cheers,
from random import randrange
from time import sleep
#import thread
from threading import Thread
from Queue import Queue
'''The idea is that there is a Seeker method that would search a location
for tasks. I have no idea how many tasks there will be; could be 1, could be 100.
Each task needs to be put into a thread, do its thing, and finish. I have
stripped down a lot of what this is really supposed to do just to focus on the
correct queuing and threading aspect of the program. The locking was just
me experimenting with locking.'''

class Runner(Thread):
    current_queue_size = 0

    def __init__(self, queue):
        self.queue = queue
        data = queue.get()
        self.ID = data[0]
        self.timer = data[1]
        #self.lock = data[2]
        Runner.current_queue_size += 1
        Thread.__init__(self)

    def run(self):
        #self.lock.acquire()
        print "running {ID}, will run for: {t} seconds.".format(ID = self.ID,
                                                                t = self.timer)
        print "Queue size: {s}".format(s = Runner.current_queue_size)
        sleep(self.timer)
        Runner.current_queue_size -= 1
        print "{ID} done, terminating, ran for {t}".format(ID = self.ID,
                                                           t = self.timer)
        print "Queue size: {s}".format(s = Runner.current_queue_size)
        #self.lock.release()
        sleep(1)
        self.queue.task_done()

def seeker():
    '''Gathers data that would need to enter its own thread.
    For now it just uses a count and random numbers to assign
    both a task ID and a time for each task'''
    queue = Queue()
    queue_item = {}
    count = 1
    #lock = thread.allocate_lock()
    while (count <= 40):
        random_number = randrange(1, 350)
        queue_item[count] = random_number
        print "{count} dict ID {key}: value {val}".format(count = count, key = random_number,
                                                          val = random_number)
        count += 1
    for n in queue_item:
        #queue.put((n, queue_item[n], lock))
        queue.put((n, queue_item[n]))
        '''I assume it is OK to put a tuple in and pull it out later'''
        worker = Runner(queue)
        worker.setDaemon(True)
        worker.start()
        worker.join()
        '''Which one of these is necessary and why? The queue object
        joining or the thread object'''
    #queue.join()

if __name__ == '__main__':
    seeker()
I have put most of my questions in the code itself, but to go over the main points (Python 2.7):
I want to make sure I am not creating some massive memory leak for myself later.
I have noticed that when I run it with a count of 40 in PuTTY or VNC on
my Linux box, I don't always get all of the output, but when
I use IDLE and Aptana on Windows, I do.
Yes, I understand that the point of Queue is to stagger out your
threads so you are not flooding your system's memory, but the tasks at
hand are time-sensitive, so they need to be processed as soon as they
are detected regardless of how many or how few there are; I have
found that with Queue I can clearly dictate when a task has
finished, as opposed to letting the garbage collector guess.
I still don't know why I am able to get away with using either the
.join() on the thread or on the queue object.
Tips, tricks, general help.
Thanks for reading.
If I understand you correctly, you need a thread to monitor something to see if there are tasks that need to be done. If a task is found, you want it to run in parallel with the seeker and the other currently running tasks.
If this is the case, then I think you might be going about this the wrong way. Take a look at how the GIL works in Python; I think what you might really want here is multiprocessing.
Take a look at this from the pydocs:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.