Vertx http server instance number does not improve throughput


I am using Vertx 3.8.0 to build an HTTP server. The CPU cannot be fully utilized (only about 25% of the CPU is used) even when I set the number of verticle instances to more than 1. The interesting thing is that the best performance I can get is when I set the instance number to 1.
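import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Future;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.http.HttpServerOptions;
public class Runner {
    public static void main(String[] args) {
        VertxOptions vertxOptions = new VertxOptions().setPreferNativeTransport(true);
        vertxOptions.setEventLoopPoolSize(6);
        final HttpServerOptions options = new HttpServerOptions()
            .setTcpFastOpen(true)
            .setTcpNoDelay(true)
            .setTcpQuickAck(true);
        final Vertx vertx = Vertx.vertx(vertxOptions);
        DeploymentOptions deploymentOptions = new DeploymentOptions().setInstances(3);
        vertx.deployVerticle(() -> new AbstractVerticle() {
            @Override
            public void start(Future<Void> startFuture) {
                vertx.createHttpServer(options)
                    .requestHandler(req -> {
                        req.response().end("1");
                    })
                    // Complete (or fail) startFuture so the deployment actually finishes
                    .listen(8080, "0.0.0.0", ar -> {
                        if (ar.succeeded()) {
                            startFuture.complete();
                        } else {
                            startFuture.fail(ar.cause());
                        }
                    });
            }
        }, deploymentOptions, res -> {
            // Report only once every instance has started (or one has failed)
            if (res.succeeded()) {
                System.out.println("Deployment done with pooling");
            } else {
                res.cause().printStackTrace();
            }
        });
    }
}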
I used Apache Bench (ab) to test the throughput of the server:
ab -c 150 -n 100000 http://10.32.31.35:8080/api/values/
The throughput is about 8k requests per second, and the server only utilizes about 25% of the CPU.
If I use HTTP keep-alive, the throughput is about 48k requests per second with about 50% CPU.
I used JMX to monitor the server program. It seems that the instance number setting actually worked: more than one event loop is processing requests, but it's likely that the acceptor event loop is the bottleneck.
Is there any way to improve this?
I think multiple Vertx instances (like Docker containers) would help, but isn't there a more elegant way to utilize the computing resources?
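To picture the suspected bottleneck: one thread accepts every connection and hands it to one of several event loops. The plain-Java model below is a simplified illustration under my own assumptions (the `AcceptorModel` class and its round-robin hand-off are hypothetical, not Vert.x's implementation). It only shows that work spreads evenly across the loops while every single connection still passes through one acceptor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AcceptorModel {
    // Hypothetical model: the calling thread plays the "acceptor" and hands each
    // connection to one of N single-threaded "event loops" in round-robin order.
    // Returns how many connections each loop handled.
    public static int[] distribute(int loops, int connections) {
        List<ExecutorService> eventLoops = new ArrayList<>();
        AtomicInteger[] handled = new AtomicInteger[loops];
        for (int i = 0; i < loops; i++) {
            eventLoops.add(Executors.newSingleThreadExecutor());
            handled[i] = new AtomicInteger();
        }
        // Every connection passes through this one loop, so whatever the acceptor
        // spends per connection limits the total accept rate.
        for (int c = 0; c < connections; c++) {
            int target = c % loops;
            AtomicInteger counter = handled[target];
            eventLoops.get(target).execute(counter::incrementAndGet);
        }
        for (ExecutorService loop : eventLoops) {
            loop.shutdown();
        }
        try {
            for (ExecutorService loop : eventLoops) {
                loop.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int[] result = new int[loops];
        for (int i = 0; i < loops; i++) {
            result[i] = handled[i].get();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(distribute(3, 9000))); // [3000, 3000, 3000]
    }
}
```

Adding loops spreads the per-connection handling, but in this model it does nothing for the acceptor itself, which matches the observation that more instances didn't raise throughput for short-lived connections.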

There are some invalid assumptions with this test:
You think you are deploying 3 servers, but they're deployed on the same port, so only one actually listens. And deploying more servers doesn't increase your concurrency anyway.
Your test doesn't utilize the event loop that much. Most of the time is spent establishing new connections. That's why you see an "improvement" when using keep-alive. It's pure networking, not Vert.x.
Make sure you run ab on a separate machine, or you're competing for the same resources.
Don't expect to see anything like 100% CPU utilization anyway, as you're not actually doing anything CPU-intensive.

Related

Akka cluster sharding configuration

OK, I have 2 instances of my backend, hosted on 2 different CentOS servers. What I want to do using Akka Cluster Sharding is to divide the work done by each of these instances:
I have data for 4 countries, which both backend instances retrieve from the DB every 10 seconds and use to update a Redis instance. So I often have duplicated requests, because both backends get data for the same country.
Using Akka Cluster Sharding, I'm trying to divide the work dynamically: instance1 gets data for ES and EN, instance2 gets data for DE and IT. If instance1 is down, instance2 takes over its jobs and also gets data for ES/EN.
I thought this was simple... but it's not.
All jobs are done by Akka actors, so with Cluster Sharding I thought all declared actors (from both instances) would be centralized somewhere, so that I could control which one does which job.
On localhost everything works fine, because I have one instance of my app on port 9001 and 2 cluster nodes on ports 2551 and 2552. But for production, I can't understand how to configure the hostnames.
application.conf
"clusterRegistration" {
  akka {
    actor {
      allow-java-serialization = on
      provider = cluster
    }
    remote.artery {
      enabled = on
      transport = aeron-udp
    }
    cluster {
      jmx.multi-mbeans-in-same-jvm = on
      seed-nodes = [
        "akka://ClusterService@instance1:8083",
        "akka://ClusterService@instance1:2551"
      ]
    }
  }
}
class
import akka.actor.{ActorSystem, Props}
import com.typesafe.config.ConfigFactory
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // needed by Future { ... }
object ClusterSharding {
  def createNode(hostname: String, port: Int, role: String, props: Props, actorName: String) = {
    val config = ConfigFactory.parseString(
      s"""
        |akka.cluster.roles = ["$role"]
        |akka.remote.artery.canonical.hostname = $hostname
        |akka.remote.artery.canonical.port = $port
        |""".stripMargin
    ).withFallback(ConfigFactory.load().getConfig("clusterRegistration"))
    val system = ActorSystem("ClusterService", config)
    system.actorOf(props, actorName)
  }
  val master = createNode("instance1", 8083, "master", Props[Master], "master")
  createNode("instance1", 2551, "worker", Props[Worker], "worker")
  createNode("instance2", 8083, "worker", Props[Worker], "worker")
  Future {
    while (true) {
      master ! Proceed // this will fire an Actor Resolver case
      Thread.sleep(5000)
    }
  }
}
master actor
import akka.actor.{Actor, ActorRef, Address}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{InitialStateAsEvents, MemberEvent, UnreachableMember}
class Master extends Actor {
  var workers: Map[Address, ActorRef] = Map()
  val cluster = Cluster(context.system)
  override def preStart(): Unit = {
    cluster.subscribe(
      self,
      initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent],
      classOf[UnreachableMember]
    )
  }
  override def postStop(): Unit = {
    cluster.unsubscribe(self)
  }
  def receive = handleClusterEvents // cluster events
    .orElse(handleWorkerRegistration) // worker registered to cluster
    .orElse(handleJob) // give jobs to workers
  def handleJob: Receive = {
    case Proceed =>
      // Here I must be able to use all workers from both instances
      // (centos1 and centos2) and give work to each dynamically
      workers.values.toList match {
        case List(worker1, worker2) =>
          worker1 ! List("EN", "ES")
          worker2 ! List("DE", "IT")
        case List(worker) =>
          worker ! List("EN", "ES", "DE", "IT")
        case _ =>
          execQueries() // if no worker is available, each backend instance will exec queries on its own
      }
  }
}
Both instances are hosted on port 8083 (centos1: instance1:8083, centos2: instance2:8083). If I use settings for just one of the instances in application.conf and in createNode (instance1, for example), I can see in the logs that the workers are created, but there is no communication with the second instance.
Where am I going wrong? Thanks.

Your approach to configuring the hostnames is viable. There are better ways to do it (depending on how you're deploying the service: manual deploy vs. ansible/chef/puppet vs. docker vs. kubernetes/nomad/mesos will be different), but setting the hostname isn't likely your actual problem.
Your current approach will give you a master and 2 workers on every node and you're not actually using Cluster Sharding (you're using Cluster, but Cluster Sharding is something you opt into on top of Cluster). From the code you've posted, I strongly suspect that using Cluster Sharding will entail a dramatic redesign (though without posting the Worker and more complete Master code, it's hard to say).
The broad approach I'd take would be to have the process of updating Redis for a given country owned by a sharded entity (keyed by that country). A cluster singleton actor would trigger the update process for each country every 10 seconds. Because we're using sharding and a singleton, I'd probably actually run at least 3 instances of the service, or alternatively use a strongly consistent external lease system. Cluster sharding and cluster singleton basically force you to resolve split brains, and in a 2-node cluster the other split-brain resolution strategies all boil down, at least half the time, to losing one node being the same as losing both. Because sharding implies that the actor for a process could be stopped arbitrarily (and possibly resumed on a different node), you'll also want to think about how the process can be resumed in a way that makes sense for the application.
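To make the "sharded entity per country" idea concrete, here is a toy, plain-Java model of shard allocation: each country maps to a shard through its hash, shards are handed out to whichever nodes are alive, and when instance1 disappears instance2 ends up owning every country. This is not Akka's API or its allocation strategy; the `CountryShards` class, `MAX_SHARDS`, and the round-robin assignment are assumptions for illustration. In Akka the shard coordinator decides placement and moves shards for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class CountryShards {
    // Hypothetical shard count; a real deployment would configure this.
    static final int MAX_SHARDS = 100;

    // Entity id (country code) -> shard id, derived from the id's hash.
    public static int shardOf(String country) {
        return Math.abs(country.hashCode() % MAX_SHARDS);
    }

    // Hands out the shards in use to the live nodes round-robin (nodes sorted for
    // determinism) and returns which countries each node is responsible for.
    public static Map<String, List<String>> allocate(List<String> countries, List<String> liveNodes) {
        Map<String, List<String>> byNode = new TreeMap<>();
        if (liveNodes.isEmpty()) {
            return byNode; // nobody is available to own the work
        }
        List<String> nodes = new ArrayList<>(liveNodes);
        Collections.sort(nodes);
        for (String node : nodes) {
            byNode.put(node, new ArrayList<>());
        }
        SortedMap<Integer, List<String>> shards = new TreeMap<>();
        for (String country : countries) {
            shards.computeIfAbsent(shardOf(country), k -> new ArrayList<>()).add(country);
        }
        int next = 0;
        for (List<String> members : shards.values()) {
            byNode.get(nodes.get(next++ % nodes.size())).addAll(members);
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<String> countries = Arrays.asList("ES", "EN", "DE", "IT");
        System.out.println(allocate(countries, Arrays.asList("instance1", "instance2")));
        // instance1 is down: instance2 now owns every country
        System.out.println(allocate(countries, Arrays.asList("instance2")));
    }
}
```

The point of keying by country is that exactly one node owns each country at any moment, which removes the duplicated DB queries without hard-coding "instance1 gets ES/EN".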
Starting multiple ActorSystems in the same JVM process is generally only a good idea in fairly specific circumstances.

Akka http server dispatcher number constantly increasing

I'm testing an akka http service on AWS ECS. Each instance is added to a load balancer which regularly makes requests to a health check route. Since this is a test environment I can control for no other traffic going to the server. I notice the debug log indicating that the "default dispatcher" number is consistently increasing:
[DEBUG] [01/03/2017 22:33:03.007] [default-akka.actor.default-dispatcher-41200] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:29.142] [default-akka.actor.default-dispatcher-41196] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:33.035] [default-akka.actor.default-dispatcher-41204] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:59.174] [default-akka.actor.default-dispatcher-41187] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:03.066] [default-akka.actor.default-dispatcher-41186] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:29.204] [default-akka.actor.default-dispatcher-41179] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:33.097] [default-akka.actor.default-dispatcher-41210] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
This trend is never reversed and will get up into the tens of thousands pretty soon. Is this normal behavior or indicative of an issue?
Edit: I've updated the log snippet to show that the dispatcher thread number goes way beyond what I would expect.
Edit #2: Here is the health check route code:
class HealthCheckRoutes()(implicit executionContext: ExecutionContext)
    extends LogHelper {
  val routes = pathPrefix("health-check") {
    pathEndOrSingleSlash {
      complete(OK -> "Ok")
    }
  }
}

Probably, yes. I think that's the thread name.
If you do a thread dump on the server, does it have a great many open threads?
It looks like your server is leaking a thread per connection.
(It will probably be much easier to debug and diagnose this on your development machine, rather than on the EC2 VM. Try to reproduce it locally.)
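Following up on the thread-dump suggestion, a quick check from inside the JVM is to count live threads by name. This is a minimal sketch using `Thread.getAllStackTraces()`; the `DispatcherThreadCount` class is hypothetical, and matching on "default-dispatcher" assumes thread names like the ones in the log above. If the count stays near the pool size while the numeric suffix keeps growing, threads are being replaced rather than leaked.

```java
import java.util.Set;

public class DispatcherThreadCount {
    // Counts currently live threads whose name contains the given fragment.
    public static long countThreads(String nameFragment) {
        Set<Thread> live = Thread.getAllStackTraces().keySet();
        return live.stream()
                .filter(t -> t.getName().contains(nameFragment))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("dispatcher threads alive: " + countThreads("default-dispatcher"));
    }
}
```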

For your question, check this comment:
Akka http server dispatcher number constantly increasing
About the dispatcher:
It is no problem to use the default dispatcher for operations like a health check.
Threads are controlled by the dispatcher you specified, or by default-dispatcher if none is specified.
default-dispatcher is configured as follows, which means the thread pool size is ceil(number of processors * 3), bounded below by 8 and above by 64.
default-dispatcher {
  type = "Dispatcher"
  executor = "default-executor"
  default-executor {
    fallback = "fork-join-executor"
  }
  fork-join-executor {
    # Min number of threads to cap factor-based parallelism number to
    parallelism-min = 8
    # The parallelism factor is used to determine thread pool size using the
    # following formula: ceil(available processors * factor). Resulting size
    # is then bounded by the parallelism-min and parallelism-max values.
    parallelism-factor = 3.0
    # Max number of threads to cap factor-based parallelism number to
    parallelism-max = 64
    # Setting to "FIFO" to use queue like peeking mode which "poll" or "LIFO" to use stack
    # like peeking mode which "pop".
    task-peeking-mode = "FIFO"
  }
}
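The sizing rule in those config comments can be written out as a small function. This is a sketch of the formula as described (ceil(processors * factor), then clamped to parallelism-min and parallelism-max), not Akka's own code; `ForkJoinSizing` is a hypothetical name. For a fork-join pool this number is a target parallelism level rather than a strict cap on how many threads are ever created over the pool's lifetime.

```java
public class ForkJoinSizing {
    // ceil(processors * factor), then bounded by [min, max]
    public static int parallelism(int processors, double factor, int min, int max) {
        int scaled = (int) Math.ceil(processors * factor);
        return Math.min(Math.max(scaled, min), max);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // default-dispatcher values: factor 3.0, min 8, max 64
        System.out.println("default-dispatcher parallelism: " + parallelism(cores, 3.0, 8, 64));
    }
}
```

With 4 available processors the defaults give 12; with 2 processors the minimum of 8 applies.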
Dispatcher documentation:
http://doc.akka.io/docs/akka/2.4.16/scala/dispatchers.html
Configuration reference:
http://doc.akka.io/docs/akka/2.4.16/general/configuration.html#akka-actor
BTW, for operations that take a long time and block other operations, here is how to specify a custom dispatcher for them in Akka HTTP:
http://doc.akka.io/docs/akka-http/current/scala/http/handling-blocking-operations-in-akka-http-routes.html

According to this akka-http github issue there doesn't seem to be a problem: https://github.com/akka/akka-http/issues/722
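One plausible explanation, consistent with that issue, is that the pool retires idle threads and later creates fresh ones, and each new thread gets the next number in its name. The sketch below reproduces that effect with a plain `ThreadPoolExecutor` standing in for Akka's fork-join pool (an assumption; the `RecycledThreadNames` class, the 20 ms keep-alive, and the "demo-dispatcher-" prefix are made up for illustration). The name counter keeps climbing while the pool never holds more than `poolSize` threads at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RecycledThreadNames {
    // Runs `bursts` rounds of tiny tasks separated by idle gaps and returns
    // {number of threads ever created, largest number alive at the same time}.
    public static int[] run(int poolSize, int bursts) {
        AtomicInteger created = new AtomicInteger();
        // Each new thread gets the next number, like "...-dispatcher-41200" in the log.
        ThreadFactory factory = r -> new Thread(r, "demo-dispatcher-" + created.incrementAndGet());
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 20, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(), factory);
        pool.allowCoreThreadTimeOut(true); // idle threads are retired after 20 ms
        try {
            for (int b = 0; b < bursts; b++) {
                CountDownLatch done = new CountDownLatch(poolSize);
                for (int i = 0; i < poolSize; i++) {
                    pool.execute(done::countDown);
                }
                done.await();
                Thread.sleep(300); // stay idle long enough for every worker to time out
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new int[] {created.get(), pool.getLargestPoolSize()};
    }

    public static void main(String[] args) {
        int[] stats = run(2, 3);
        System.out.println("threads created: " + stats[0] + ", max alive at once: " + stats[1]);
    }
}
```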

Riak Java HTTPClientAdapter TCP CLOSE_WAIT

TLDR:
Lots of TCP connections in CLOSE_WAIT status are taking the server down
Setup:
riak_1.2.0-1_amd64.deb installed on Ubuntu12
Spring MVC 3.2.5
riak-client-1.1.0.jar
Tomcat7.0.51 hosted on Windows Server 2008 R2
JRE6_45
Full Description:
How do I ensure that the Java RiakClient properly cleans up its connections so that I'm not left with an abundance of CLOSE_WAIT TCP connections?
I have a Spring MVC application which uses the Riak java client to connect to the remote instance/cluster.
We are seeing a lot of TCP Connections on the server hosting the Spring MVC application, which continue to build up until the server can no longer connect to anything because there are no ports available.
Restarting the Riak cluster does not clean the connections up.
Restarting the webapp does clean up the extra connections.
We are using the HTTPClientAdapter and REST api.
When connecting to a relational database, I would normally clean up connections either by explicitly calling close on the connection, or by registering the datasource with a pool and transaction manager and then annotating my services with @Transactional.
But since we are using the HTTPClientAdapter, I would have expected this to be more like an HttpClient.
With an HttpClient, I would consume the response entity with EntityUtils.consume(...) to ensure that everything is properly cleaned up.
HTTPClientAdapter does have a shutdown method, and I see it being called in the online examples.
When I traced the method call through to the actual RiakClient, the method is empty.
Also, when I dig through the source code, nowhere in it does it ever close the Stream on the HttpResponse or consume any response entity (as with the standard Apache EntityUtils example).
Here is an example of how the calls are being made.
private RawClient getRiakClientFromUrl(String riakUrl) {
    return new HTTPClientAdapter(riakUrl);
}
public IRiakObject fetchRiakObject(String bucket, String key, boolean useCache) {
    try {
        MethodTimer timer = MethodTimer.start("Fetch Riak Object Operation");
        //logger.debug("Fetching Riak Object {}/{}", bucket, key);
        RiakResponse riakResponse;
        riakResponse = riak.fetch(bucket, key);
        if (!riakResponse.hasValue()) {
            //logger.debug("Object {}/{} not found in riak data store", bucket, key);
            return null;
        }
        IRiakObject[] riakObjects = riakResponse.getRiakObjects();
        if (riakObjects.length > 1) {
            String error = "Got multiple riak objects for " + bucket + "/" + key;
            logger.error(error);
            throw new RuntimeException(error);
        }
        //logger.debug("{}", timer);
        return riakObjects[0];
    }
    catch (Exception e) {
        logger.error("Error fetching " + bucket + "/" + key, e);
        throw new RuntimeException(e);
    }
}
The only option I can think of, is to create the RiakClient separately from the adapter so I can access the HttpClient and then the ConnectionManager.
I am currently working on switching over to the PBClientAdapter to see if that might help, but for the purposes of this question (and because the rest of the team may not like me switching for whatever reason), let's assume that I must continue to connect over HTTP.

It's been almost a year, so I thought I would go ahead and post how I solved this problem.
The solution was to change how we create the HTTPClientAdapter provided by the Java client: instead of passing it a URL, we pass it a RiakClient built from a configuration that sets up the timeout and the maximum number of pooled connections. Here's a code example of how to do it.
First, we are on an older version of Riak, so here's the Maven dependency:
<dependency>
<groupId>com.basho.riak</groupId>
<artifactId>riak-client</artifactId>
<version>1.1.4</version>
</dependency>
And here's the example:
public RawClient riakClient() {
    RiakConfig config = new RiakConfig(riakUrl);
    // httpConnectionsTimetolive is in seconds, but timeout is in milliseconds
    config.setTimeout(30000);
    config.setUrl("http://myriakurl/");
    config.setMaxConnections(100); // Or whatever value you need
    RiakClient client = new RiakClient(config);
    return new HTTPClientAdapter(client);
}
I actually broke that up a bit in my implementation and used Spring to inject values; I just wanted to show a simplified example for it.
By setting the timeout to something less than the standard five minutes, the system will not hang on to the connections for too long (so, 5 minutes + whatever you set the timeout to), which causes the connections to enter the CLOSE_WAIT status sooner.
And of course setting the max connections in the pool prevents the application from opening up 10's of thousands of connections.
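The effect of capping connections can be pictured as a semaphore around every call: at most N calls hold a connection at once, callers wait only up to a timeout, and the slot is always handed back. This is a minimal sketch of that idea, not the Riak client's or HttpClient's implementation; `BoundedConnections` and `withConnection` are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class BoundedConnections {
    private final Semaphore permits;
    private final long waitMillis;

    public BoundedConnections(int maxConnections, long waitMillis) {
        this.permits = new Semaphore(maxConnections, true); // fair: waiters served in order
        this.waitMillis = waitMillis;
    }

    // Runs `call` while holding one connection slot. Fails if no slot frees up
    // within waitMillis, and always releases the slot afterwards.
    public <T> T withConnection(Supplier<T> call) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        if (!acquired) {
            throw new IllegalStateException("no connection free after " + waitMillis + " ms");
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }

    public int available() {
        return permits.availablePermits();
    }
}
```

The `finally` release plays the same role as consuming or closing the response entity: without it, every failed call would permanently use up a slot, which is the pooled equivalent of piling up CLOSE_WAIT sockets.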

akka custom fork-join-executor dispatcher behaves differently on OSX and RHEL

When I deploy a Play Framework application that uses the Akka framework to a production machine, it behaves differently than on my development workstation.
This is a system that receives a batch of device IP addresses, performs some processing on each device, and aggregates the results after all devices in the batch have been processed. This processing isn't very CPU-intensive.
I basically have 2 types of actors, a BatchActor and a DeviceActor. For the devices, I've created an actor backed by a RoundRobinPool router and a custom dispatcher. I'm attempting to process ~500 devices at a time (in parallel).
The issue is that when I run this code on my OS X machine, it runs as I would expect.
For instance, if I submit a batch of 200 device IP addresses, the application running on my workstation processes all the devices in parallel.
However, when I copy this application to the production machine, Red Hat Enterprise Linux (RHEL), and run it with the same list of devices, it only processes 1 to 2 devices at a time.
What do I need to do to fix this issue?
The relevant code is as follows:
object Application extends Controller {
  ...
  val numberOfWorkers = 500
  val workers = Akka.system.actorOf(Props[DeviceActor]
    .withRouter(RoundRobinPool(nrOfInstances = numberOfWorkers))
    .withDispatcher("my-dispatcher")
  )
  def batchActor(config: BatchConfig) =
    Akka.system.actorOf(BatchActor.props(workers, config), s"batch-${config.batchId}")
  ...
  def batch = Action(parse.json) { request =>
    // validate returns a JsResult, so match on JsSuccess rather than on BatchConfig
    request.body.validate[BatchConfig] match {
      case JsSuccess(config, _) =>
        ...
        val batch = batchActor(config)
        batch ! BatchActorProtocol.Start
        Ok(Json.toJson(status))
      ...
    }
  }
}
The application.conf configuration section looks like the following:
my-dispatcher {
  # Dispatcher is the name of the event-based dispatcher
  type = Dispatcher
  # What kind of ExecutionService to use
  executor = "fork-join-executor"
  # Configuration for the fork join pool
  fork-join-executor {
    # Min number of threads to cap factor-based parallelism number to
    parallelism-min = 1000
    # Parallelism (threads) ... ceil(available processors * factor)
    parallelism-factor = 100.0
    # Max number of threads to cap factor-based parallelism number to
    parallelism-max = 5000
  }
  # Throughput defines the maximum number of messages to be
  # processed per actor before the thread jumps to the next actor.
  # Set to 1 for as fair as possible.
  throughput = 500
}
Inside the BatchActor I'm simply parsing the list of devices and feeding it to the workers router:
class BatchActor(val workers: ActorRef, val config: BatchConfig) extends Actor {
  ...
  def receive = {
    case Start => start
    ...
  }
  private def start = {
    ...
    // foreach, not map: we only send messages, no new collection is needed
    devices.foreach { device =>
      results(device.host) = None
      workers ! DeviceWork(self, config, device, steps)
    }
    ...
  }
}
after which the DeviceActor submits a result object back to the BatchActor.
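For comparison, the same fan-out/aggregate flow (one unit of work per device, then collect once every device is done) can be sketched without Akka using `CompletableFuture` and an explicitly sized pool. This is an illustration under my own assumptions, not the code from the question: `BatchFanOut`, `processDevice`, and the result strings are hypothetical, and the pool size is set directly instead of being derived from a parallelism factor.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchFanOut {
    // Hypothetical stand-in for the per-device work a DeviceActor would do.
    static String processDevice(String ip) {
        return "processed " + ip;
    }

    // Starts one task per device on a fixed-size pool, waits for all of them,
    // then aggregates the results keyed by device IP (like BatchActor's results map).
    public static Map<String, String> runBatch(List<String> deviceIps, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
            for (String ip : deviceIps) {
                pending.put(ip, CompletableFuture.supplyAsync(() -> processDevice(ip), pool));
            }
            // Aggregate only after every device has finished
            CompletableFuture.allOf(pending.values().toArray(new CompletableFuture<?>[0])).join();
            Map<String, String> results = new LinkedHashMap<>();
            pending.forEach((ip, future) -> results.put(ip, future.join()));
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBatch(Arrays.asList("10.0.0.1", "10.0.0.2"), 2));
    }
}
```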
My workstation: OS X - v10.9.3
java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
production machine: Red Hat Enterprise Linux Server release 6.5 (Santiago)
java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
Software:
Scala: v2.11.2
SBT: v0.13.6
Play: v2.3.5
Akka: v2.3.4
I'm using typesafe activator/sbt to start the application. The command is as follows:
cd <project dir>
./activator run -Dhttp.port=6600
Any help appreciated. I've been stuck on this issue for a couple of days now.

I believe you have too much parallelism in your code, i.e., you are creating too many threads in your dispatcher. How many cores do you have on your Red Hat box? I've never seen such high values used. A lot of threads in an FJ pool may result in a large number of context switches. Try just using the default dispatcher and see whether that fixes your issue. You can also change the min and max parallelism values to 2 or 3 times the number of cores you have.
fork-join-executor {
# Min number of threads to cap factor-based parallelism number to
parallelism-min = 1000
# Parallelism (threads) ... ceil(available processors * factor)
parallelism-factor = 100.0
# Max number of threads to cap factor-based parallelism number to
parallelism-max = 5000
}
Another thing to try is to create an uber jar (using sbt-assembly) and deploy that instead of using activator to run it.
Finally, you can look inside your JVMs using something like VisualVM or YourKit.

After hours spent trying different things including:
doing research on different threading implementations on linux - pthreads vs NPTL
reading through all the VM documentation on threading
ulimits
trying various changes in the Play and Akka framework configurations
and finally a complete rewrite of the thread management using Scala futures, etc.
Nothing seemed to work. Then I did a detailed comparison, and the only difference was that I used the Oracle HotSpot implementation on my laptop and the OpenJDK implementation on the production machine.
So I installed the Oracle VM on the production machine, and that seemed to fix the issue. Even though I couldn't determine the root cause, it seems that the default installation of OpenJDK on RHEL is compiled or configured differently enough not to allow spawning ~500 threads at a time.
I'm sure I'm missing something, but after ~ 3 days of searching I couldn't find it.

Rest server (Play Framework) gets "Read Timed out" exception during load test

We are running a heavy load test (JMeter: 350 threads, 35M total requests) on a REST server using Play Framework and run into the following error after ~2 hours. We removed the other components so that the server simply accepts requests and does nothing. Does anyone have any idea, or can Play Framework simply not handle a load this heavy?
2014/07/05 11:59:38 WARN - com.company.test.RestTest2: Run TestSQL throw error java.lang.Exception: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.company.dispatcher.RexsterRESTTaskDispatcher.dispatchTask(RexsterRESTTaskDispatcher.java:76)
at com.company.test.RestTest2.runTest(RestTest2.java:375)
at org.apache.jmeter.protocol.java.sampler.JavaSampler.sample(JavaSampler.java:191)
at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:429)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:257)
at java.lang.Thread.run(Thread.java:744)
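For context on what the exception itself means: the client's read timeout expired while it was waiting for data on an already established connection. This small sketch reproduces the same `SocketTimeoutException` locally with a server socket that accepts a connection and never writes; `ReadTimeoutDemo` is a hypothetical name and the 200 ms timeout is arbitrary. It says nothing about why the Play server stopped responding, only that the client gave up waiting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Opens a local server that accepts one connection but never writes, then
    // reads on the client side with the given timeout. Returns the exception message.
    public static String reproduce(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silent = server.accept()) {
            client.setSoTimeout(readTimeoutMillis);
            try {
                client.getInputStream().read();
                return "no timeout: received data or end of stream";
            } catch (SocketTimeoutException e) {
                return e.getMessage(); // typically "Read timed out"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce(200));
    }
}
```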
Part of the application.conf :
....
db.pool.timeout=100000
play {
  akka {
    # already inside the akka block, so no extra "akka." prefix here
    loggers = ["akka.event.Logging$DefaultLogger", "akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-factor = 64
          parallelism-max = 1000
        }
      }
    }
  }
}

Had this error today. It took me a while to find out that one of the Windows processes (svchost) was occupying port 1099, which the JMeter server was trying to use.
I got a hint for this when trying to start the Jmeter-Server.bat file manually. Then the following PowerShell command provided the details of that process. After closing that process, the JMeter clients started to connect again.
Get-Process -Id (Get-NetTCPConnection -LocalPort 1099).OwningProcess
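If you want the same "is the port taken?" check from Java (for example on a non-Windows load generator), you can try to bind the port. This minimal sketch only tells you whether the port is in use, not which process owns it; `PortCheck` is a hypothetical helper.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    // Returns true if a listening socket can be bound on the port right now.
    public static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // typically java.net.BindException: Address already in use
        }
    }

    public static void main(String[] args) {
        System.out.println("port 1099 free: " + isPortFree(1099));
    }
}
```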

There are many things to check:
Are you running the test from the same machine? If yes, that's a problem.
Is your machine's TCP stack tuned?
What is your JVM configuration (Xmx) relative to your machine's memory, CPU, ...?
What does your test look like? Could you show a screenshot with all elements unfolded?
I think Play/Akka can handle this load without a problem, so I would look into configuration issues.
