akka custom fork-join-executor dispatcher behaves differently on OSX and RHEL - scala

When I deploy a Play Framework application that uses Akka to a production machine, it behaves differently than on my development workstation.
This is a system that receives a batch of device IP addresses, performs some processing on each device, and aggregates the results after all devices in the batch have been processed. This processing isn't very CPU intensive.
I basically have two types of actors, a BatchActor and a DeviceActor. For the devices, I've created an actor backed by a RoundRobinPool router and a custom dispatcher. I'm attempting to process ~500 devices at a time (in parallel).
The issue is that when I run this code on my OS X machine, it runs as I would expect.
For instance, if I submit a batch of 200 device IP addresses, the application running on my workstation processes all the devices in parallel.
However, when I copy this application to the production machine, Red Hat Enterprise Linux (RHEL), and run it with the same list of devices, it only processes 1 to 2 devices at a time.
What do I need to do to fix this issue?
The relevant code is as follows:
object Application extends Controller {
  ...
  val numberOfWorkers = 500
  val workers = Akka.system.actorOf(Props[DeviceActor]
    .withRouter(RoundRobinPool(nrOfInstances = numberOfWorkers))
    .withDispatcher("my-dispatcher")
  )
  def batchActor(config: BatchConfig) =
    Akka.system.actorOf(BatchActor.props(workers, config), s"batch-${config.batchId}")
  ...
  def batch = Action(parse.json) { request =>
    request.body.validate[BatchConfig] match {
      case config: BatchConfig => {
        ...
        val batch = batchActor(config)
        batch ! BatchActorProtocol.Start
        Ok(Json.toJson(status))
      }
      ...
    }
  }
}
The application.conf configuration section looks like the following:
my-dispatcher {
  # Dispatcher is the name of the event-based dispatcher
  type = Dispatcher
  # What kind of ExecutionService to use
  executor = "fork-join-executor"
  # Configuration for the fork join pool
  fork-join-executor {
    # Min number of threads to cap factor-based parallelism number to
    parallelism-min = 1000
    # Parallelism (threads) ... ceil(available processors * factor)
    parallelism-factor = 100.0
    # Max number of threads to cap factor-based parallelism number to
    parallelism-max = 5000
  }
  # Throughput defines the maximum number of messages to be
  # processed per actor before the thread jumps to the next actor.
  # Set to 1 for as fair as possible.
  throughput = 500
}
Inside the BatchActor I'm simply parsing the list of devices and feeding it to the workers:
class BatchActor(val workers:ActorRef, val config:BatchConfig) extends Actor
  ...
  def receive = {
    case Start => start
    ...
  }
  private def start = {
    ...
    devices.map { devices =>
      results(devices.host) = None
      workers ! DeviceWork(self, config, devices, steps)
    }
    ...
  }
Each DeviceActor then submits a result object back to the BatchActor.
My workstation: OS X - v10.9.3
java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
production machine: Red Hat Enterprise Linux Server release 6.5 (Santiago)
java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
Software:
Scala: v2.11.2
SBT: v0.13.6
Play: v2.3.5
Akka: v2.3.4
I'm using typesafe activator/sbt to start the application. The command is as follows:
cd <project dir>
./activator run -Dhttp.port=6600
Any help appreciated. I've been stuck on this issue for a couple of days now.

I believe you have too much parallelism in your code, i.e., you are creating too many threads in your dispatcher. How many cores do you have on your Red Hat box? I've never seen such high values used. A large number of threads in the fork-join pool may result in a large number of context switches. Try just using the default dispatcher and see whether that fixes your issue. You can also change the values of min and max parallelism to 2 or 3 times the number of cores you have.
fork-join-executor {
  # Min number of threads to cap factor-based parallelism number to
  parallelism-min = 1000
  # Parallelism (threads) ... ceil(available processors * factor)
  parallelism-factor = 100.0
  # Max number of threads to cap factor-based parallelism number to
  parallelism-max = 5000
}
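To make the effect of these settings concrete, here is a minimal sketch of the sizing rule described in the config comments: ceil(available processors * factor), bounded by parallelism-min and parallelism-max. It is written in Python purely for illustration. The function name `effective_parallelism` and the core counts are assumptions, not part of Akka.

```python
import math

def effective_parallelism(cores, factor, p_min, p_max):
    """Pool size per the config comments: ceil(cores * factor), clamped to [p_min, p_max]."""
    scaled = math.ceil(cores * factor)
    return min(max(scaled, p_min), p_max)

# Settings from the question: parallelism-min=1000, factor=100.0, max=5000.
# The core counts below are hypothetical examples.
for cores in (4, 8, 16, 64):
    print(cores, "cores ->", effective_parallelism(cores, 100.0, 1000, 5000), "threads")

# The "2 or 3 times the number of cores" suggestion, assuming an 8-core box:
print("suggested ->", effective_parallelism(8, 3.0, 16, 24), "threads")
```

With the question's settings, any machine with 10 or fewer cores ends up at the 1000-thread minimum, which is far more than the 2 to 3 threads per core suggested above.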
Another thing to try is to build an uber jar (using sbt-assembly) and deploy that instead of using activator to run the application.
Finally, you can look inside your JVMs using something like VisualVM or YourKit.

I spent hours trying different things, including:
researching the different threading implementations on Linux (pthreads vs NPTL)
reading through all the VM documentation on threading
checking ulimits
trying various changes in the Play and Akka framework configurations
and finally a complete rewrite of the thread management using Scala futures, etc.
Nothing seemed to work. Then I did a detailed comparison, and the only difference was that I used the Oracle HotSpot implementation on my laptop and the OpenJDK implementation on the production machine.
So I installed the Oracle VM on the production machine, and that seemed to fix the issue. Even though I couldn't determine what the underlying cause was, it seems that the default installation of OpenJDK on RHEL is compiled or configured differently enough to not allow spawning ~500 threads at a time.
I'm sure I'm missing something, but after ~3 days of searching I couldn't find it.

Related

JAX pmap with multi-core CPU

What is the correct method for using multiple CPU cores with jax.pmap?
The following example creates an environment variable for SPMD on CPU core backends, tests that JAX recognises the devices, and attempts a device lock.
import os
os.environ["XLA_FLAGS"] = '--xla_force_host_platform_device_count=2'
import jax as jx
import jax.numpy as jnp
jx.local_device_count()
# WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
# 2
jx.devices("cpu")
# [CpuDevice(id=0), CpuDevice(id=1)]
def sfunc(x):
  while True:
    pass
jx.pmap(sfunc)(jnp.arange(2))
Executing from a Jupyter kernel and observing htop shows that only one core is locked.
I receive the same output from htop when omitting the first two lines and running:
$ env XLA_FLAGS=--xla_force_host_platform_device_count=2 python test.py
Replacing sfunc with
def sfunc(x): return 2.0*x
and calling
jx.pmap(sfunc)(jnp.arange(2))
# ShardedDeviceArray([0., 2.], dtype=float32, weak_type=True)
does return a ShardedDeviceArray.
Clearly I am not correctly configuring JAX/XLA to use two cores. What am I missing and what can I do to diagnose the problem?
As far as I can tell, you are configuring the cores correctly (see e.g. Issue #2714). The problem lies in your test function:
def sfunc(x):
  while True:
    pass
This function gets stuck in an infinite loop at trace-time, not at run-time. Tracing happens in your host Python process on a single CPU (see How to think in JAX for an introduction to the idea of tracing within JAX transformations).
If you want to observe CPU usage at runtime, you'll have to use a function that finishes tracing and begins running. For that you could use any long-running function that actually produces results. Here is a simple example:
def sfunc(x):
  for i in range(100):
    x = (x @ x)
  return x
jx.pmap(sfunc)(jnp.zeros((2, 1000, 1000)))

Akka http server dispatcher number constantly increasing

I'm testing an akka http service on AWS ECS. Each instance is added to a load balancer which regularly makes requests to a health check route. Since this is a test environment I can control for no other traffic going to the server. I notice the debug log indicating that the "default dispatcher" number is consistently increasing:
[DEBUG] [01/03/2017 22:33:03.007] [default-akka.actor.default-dispatcher-41200] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:29.142] [default-akka.actor.default-dispatcher-41196] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:33.035] [default-akka.actor.default-dispatcher-41204] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:59.174] [default-akka.actor.default-dispatcher-41187] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:03.066] [default-akka.actor.default-dispatcher-41186] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:29.204] [default-akka.actor.default-dispatcher-41179] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:33.097] [default-akka.actor.default-dispatcher-41210] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
This trend is never reversed and will get up into the tens of thousands pretty soon. Is this normal behavior or indicative of an issue?
Edit: I've updated the log snippet to show that the dispatcher thread number goes way beyond what I would expect.
Edit #2: Here is the health check route code:
class HealthCheckRoutes()(implicit executionContext: ExecutionContext)
extends LogHelper {
val routes = pathPrefix("health-check") {
pathEndOrSingleSlash {
complete(OK -> "Ok")
}
}
}
Probably, yes. I think that's the thread name.
If you do a thread dump on the server, does it have a great many open threads?
It looks like your server is leaking a thread per connection.
(It will probably be much easier to debug and diagnose this on your development machine, rather than on the EC2 VM. Try to reproduce it locally.)
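Before taking a thread dump, one quick check is to count how many distinct dispatcher thread names actually appear in a window of the log. The sketch below is in Python and assumes the log line format shown in the question. The regular expression and the helper name `dispatcher_thread_ids` are illustrative and not part of Akka.

```python
import re

# Matches the thread-name field, e.g. [default-akka.actor.default-dispatcher-41200]
THREAD_RE = re.compile(r"\[[\w.\-]*default-dispatcher-(\d+)\]")

def dispatcher_thread_ids(lines):
    """Return the sorted, de-duplicated dispatcher thread numbers found in log lines."""
    ids = set()
    for line in lines:
        match = THREAD_RE.search(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)

# Three of the log lines from the question, used as sample input.
sample = [
    "[DEBUG] [01/03/2017 22:33:03.007] [default-akka.actor.default-dispatcher-41200] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted",
    "[DEBUG] [01/03/2017 22:33:29.142] [default-akka.actor.default-dispatcher-41196] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted",
    "[DEBUG] [01/03/2017 22:34:33.097] [default-akka.actor.default-dispatcher-41210] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted",
]
ids = dispatcher_thread_ids(sample)
print(len(ids), "distinct dispatcher threads, numbered", ids[0], "to", ids[-1])
```

If only a handful of distinct names show up over a long window while the numbers keep climbing, that would be consistent with old threads being retired and replaced rather than accumulating. A thread dump is still the way to confirm how many threads are actually alive.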
For your question, see this comment:
Akka http server dispatcher number constantly increasing
About the dispatcher:
It is no problem to use the default dispatcher for operations like a health check.
Threads are managed by the dispatcher you specify, or by default-dispatcher if none is specified.
default-dispatcher is configured as follows, which means the thread pool size is (number of processors * 3), bounded to between 8 and 64.
default-dispatcher {
  type = "Dispatcher"
  executor = "default-executor"
  default-executor {
    fallback = "fork-join-executor"
  }
  fork-join-executor {
    # Min number of threads to cap factor-based parallelism number to
    parallelism-min = 8
    # The parallelism factor is used to determine thread pool size using the
    # following formula: ceil(available processors * factor). Resulting size
    # is then bounded by the parallelism-min and parallelism-max values.
    parallelism-factor = 3.0
    # Max number of threads to cap factor-based parallelism number to
    parallelism-max = 64
    # Setting to "FIFO" to use queue like peeking mode which "poll" or "LIFO" to use stack
    # like peeking mode which "pop".
    task-peeking-mode = "FIFO"
  }
}
Dispatcher documentation:
http://doc.akka.io/docs/akka/2.4.16/scala/dispatchers.html
Configuration reference:
http://doc.akka.io/docs/akka/2.4.16/general/configuration.html#akka-actor
BTW, for operations that take a long time and block other operations, here is how to specify a custom dispatcher for them in Akka HTTP:
http://doc.akka.io/docs/akka-http/current/scala/http/handling-blocking-operations-in-akka-http-routes.html
According to this akka-http github issue there doesn't seem to be a problem: https://github.com/akka/akka-http/issues/722

DDS 9th topic causes a crash

I am using DDS (more specifically RTI DDS) for a Java application. I am creating each topic for my DDS implementation one by one in code so that I can test each one with a DDS spy after the code is written. When I wrote the 8th topic, everything worked fine. However, when I then wrote the 9th topic, nothing seemed to happen, as the program seemed to stop somewhere. I then debugged and, after a lot of stepping into code, got this printed to the console.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x01349a58, pid=16109, tid=2429123440
#
# JRE version: Java(TM) SE Runtime Environment (7.0_65-b17) (build 1.7.0_65-b17)
# Java VM: Java HotSpot(TM) Server VM (24.65-b04 mixed mode linux-x86 )
# Problematic frame:
# V [libjvm.so+0x48aa58] java_lang_String::utf8_length(oopDesc*)+0x58
#
# Core dump written. Default location: /home/foo/core or core.16109
#
# An error report file with more information is saved as:
#
# /home/foo/corehs_err_pid16109.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
[D0000|ENABLE]COMMENDSrReaderService_new:!create worker-specific object
[D0000|ENABLE]PRESPsService_enable:!create srr (strict reliable reader)
[D0000|ENABLE]DDS_DomainParticipantService_enable:!enable publish/subscribe service
[D0000|ENABLE]DDS_DomainParticipant_enableI:!enable service
I am not sure why this happened all of a sudden when I created my 9th topic, when everything works great with only 8. I have also tried to increase my resource limits, but I get an Immutable QoS Policy error. Does anyone know why my 9th topic causes a failure and how to fix the problem? I am running my application on 32-bit RHEL 6.6.
I found that this is because the maximum number of objects per thread is set to 8 by default in the QoS. To change this setting, you must do the following before your first topic is created.
DomainParticipantFactoryQos factoryQos =
new DomainParticipantFactoryQos();
DomainParticipantFactory.TheParticipantFactory.get_qos(factoryQos);
factoryQos.resource_limits.max_objects_per_thread = 2048;
DomainParticipantFactory.TheParticipantFactory.set_qos(factoryQos);
This sets the size before DDS starts, so the setting is still editable and not yet immutable at that point.

Rest server (Play Framework) gets "Read Timed out" exception during load test

We are running a heavy load test (JMeter: 350 threads, 35M total requests) on a REST server using Play Framework and run into the following error after ~2 hours. We removed the other components so that the server simply accepts requests and does nothing. Does anyone have any idea, or can Play Framework simply not handle a heavy load like this?
2014/07/05 11:59:38 WARN - com.company.test.RestTest2: Run TestSQL throw error java.lang.Exception: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.company.dispatcher.RexsterRESTTaskDispatcher.dispatchTask(RexsterRESTTaskDispatcher.java:76)
at com.company.test.RestTest2.runTest(RestTest2.java:375)
at org.apache.jmeter.protocol.java.sampler.JavaSampler.sample(JavaSampler.java:191)
at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:429)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:257)
at java.lang.Thread.run(Thread.java:744)
Part of the application.conf :
....
db.pool.timeout=100000
play {
  akka {
    akka.loggers = ["akka.event.Logging$DefaultLogger", "akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-factor = 64
          parallelism-max = 1000
        }
      }
    }
  }
}
I had this error today. It took me a while to find out that one of the Windows (svchost) processes was occupying port 1099, which the JMeter server was trying to use.
I got a hint for this when trying to start the Jmeter-Server.bat file manually. The following PowerShell command then provided the details of that process. After closing that process, the JMeter clients started to connect again.
Get-Process -Id (Get-NetTCPConnection -LocalPort 1099).OwningProcess
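A platform-independent way to run a similar check is to try binding the port yourself. This is a minimal Python sketch and not part of JMeter. The helper name `port_is_free` is hypothetical, and 1099 is simply the port mentioned above.

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port), False if the bind fails."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use"; a permission error also lands here.
            return False
        return True

if __name__ == "__main__":
    print("port 1099 free:", port_is_free(1099))
```

Unlike the PowerShell command, this only tells you whether the port is taken, not which process owns it.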
There are many things to check:
Are you running the test from the same machine? If yes, that's a problem.
Is your machine's TCP stack tuned?
What is your JVM configuration regarding Xmx, as well as your machine's memory, CPU, ...?
What does your test look like? Could you show a screenshot with all elements unfolded?
I think Play/Akka can handle this load without a problem, so I would look into configuration issues.

mongodb higher faults on Windows than on Linux

I am executing the C# code below:
for (; ; )
{
    Console.WriteLine("Doc# {0}", ctr++);
    BsonDocument log = new BsonDocument();
    log["type"] = "auth";
    BsonDateTime time = new BsonDateTime(DateTime.Now);
    log["when"] = time;
    log["user"] = "staticString";
    BsonBoolean bol = BsonBoolean.False;
    log["res"] = bol;
    coll.Insert(log);
}
When I run it against a MongoDB instance (version 2.0.2) running on a virtual 64-bit Linux machine with just 512 MB of RAM, I get about 5k inserts with 1-2 faults, as reported by mongostat after a few minutes.
When the same code is run against a MongoDB instance (version 2.0.2) running on a physical Windows machine with 8 GB of RAM, I get 2.5k inserts with about 80 faults, as reported by mongostat after a few minutes.
Why are more faults occurring on Windows? I can see the following message in the logs:
[DataFileSync] FlushViewOfFile failed 33 file
Journaling is disabled on both instances.
Also, is 5k inserts on a virtual machine with 1-2 faults a good enough speed, or should I expect better insert rates?
This looks like a known issue: https://jira.mongodb.org/browse/SERVER-1163
The page fault counter on Windows is in fact the total number of page faults, which includes both hard and soft page faults.
Process : Page Faults/sec. This is an indication of the number of page faults that
occurred due to requests from this particular process. Excessive page faults from a
particular process are an indication usually of bad coding practices. Either the
functions and DLLs are not organized correctly, or the data set that the application
is using is being called in a less than efficient manner.