Multiple concurrent connections with Vertx - vert.x

I'm trying to build a web application that should be able to handle at least 15000 rps. Some of the optimizations I have done are increasing the worker pool size to 20 and setting the accept backlog to 25000. Since I have set my worker pool size to 20, will this help with the blocking pieces of code?

A worker pool size of 20 seems to be the default.
I believe the important question in your case is how long you expect each request to run. On my side, I expect to have thousands of short-lived requests, each with a payload size of about 5-10 KB. All of these will be blocking, because of the blocking database driver I use at the moment. I have increased the default worker pool size to 40, and I explicitly set the number of verticle instances I deploy using the following formula:
final int instances = Math.min(Math.max(Runtime.getRuntime().availableProcessors() / 2, 1), 2);
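For reference, here is a minimal sketch (in Scala, against the Vert.x Java API) of how a larger worker pool and this instance count might be wired together; the verticle class name is a hypothetical placeholder:
import io.vertx.core.{DeploymentOptions, Vertx, VertxOptions}

// Size the shared worker pool once, when the Vertx instance is created.
val vertx = Vertx.vertx(new VertxOptions().setWorkerPoolSize(40))

// Same instance formula as above, in Scala.
val instances = math.min(math.max(Runtime.getRuntime.availableProcessors() / 2, 1), 2)

// "com.example.MyBlockingVerticle" is hypothetical; it would contain the blocking DB calls.
vertx.deployVerticle(
  "com.example.MyBlockingVerticle",
  new DeploymentOptions().setWorker(true).setInstances(instances)
)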
A test run of 500 simultaneous clients running for 60 seconds, on a vert.x server doing nothing but blocking calls, produced an average of 6 failed requests out of 11089. My test payload in this case was ~28KB.
Of course, from experience I know that running my software in production would often produce results that I have not anticipated. Thus, the important thing in my case is to have good atomicity rules in place, so that I don't get half-baked or corrupted data in the database.

Related

How to reduce the getConnection time in Atomikos

I am using the Atomikos XA configuration for Oracle.
This is my code for creating the datasource connection:
OracleXADataSource oracleXADataSource = new OracleXADataSource();
oracleXADataSource.setURL(sourceURL);
oracleXADataSource.setUser(UN);
oracleXADataSource.setPassword(PS);
AtomikosDataSourceBean sourceBean = new AtomikosDataSourceBean();
sourceBean.setXaDataSource(oracleXADataSource);
sourceBean.setUniqueResourceName(resourceName);
sourceBean.setMaxPoolSize(maxPoolSize); // 10, from the max-pool-size property below
atomikos:
  datasource:
    resourceName: insight
    max-pool-size: 10
    min-pool-size: 3
  transaction-manager-id: insight-services-tm
This configuration is fine for a medium user load of around 5000 requests.
But when the user count increases, say to more than 10000 requests, com.atomikos.jdbc.AbstractDataSourceBean#getConnection takes much longer than normal.
It takes approximately 1500ms, whereas it normally takes less than 10ms. I understand that as user demand increases, getConnection goes into a wait state to pick a free connection from the connection pool. If I increase my max-pool-size, will that sort out my problem, or is there another option available?
Try setting concurrentConnectionValidation=true on your datasource.
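For example (a sketch in Scala against the Atomikos Java API; the setConcurrentConnectionValidation setter is an assumption and depends on your Atomikos version):
import com.atomikos.jdbc.AtomikosDataSourceBean
import oracle.jdbc.xa.client.OracleXADataSource

// Same datasource setup as in the question; sourceURL, UN and PS are the question's own placeholders.
val oracleXADataSource = new OracleXADataSource()
oracleXADataSource.setURL(sourceURL)
oracleXADataSource.setUser(UN)
oracleXADataSource.setPassword(PS)

val sourceBean = new AtomikosDataSourceBean()
sourceBean.setXaDataSource(oracleXADataSource)
sourceBean.setUniqueResourceName("insight")
sourceBean.setMinPoolSize(3)
sourceBean.setMaxPoolSize(10)
sourceBean.setBorrowConnectionTimeout(30)            // seconds a caller waits for a free connection
sourceBean.setConcurrentConnectionValidation(true)   // assumed setter: validate connections without serializing borrowers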
If that does not help then consider a free trial of the commercial Atomikos product:
https://www.atomikos.com/Main/ExtremeTransactionsFreeTrial
Best

Akka Actor Messaging Delay

I'm experiencing issues scaling my app with multiple requests.
Each request sends an ask to an actor, which then spawns other actors. This is fine; however, under load (5+ asks at once), the ask takes a massive amount of time to deliver the message to the target actor. The original design was to bulkhead requests evenly, but this is causing a bottleneck. Example:
In the picture (a trace image not included here), the ask is sent right after the query plan resolver. However, there is a multi-second gap before the actor receives this message. This is only experienced under load (5+ requests/sec). I first thought this was a starvation issue.
Design:
Each planner-executor is a separate instance for each request. It spawns a new 'Request Acceptor' actor each time (it logs 'requesting score' when it receives a message).
I gave the ActorSystem a custom global executor (a big one). I noticed the threads were not utilized beyond the core thread pool size, even during this massive delay.
I made sure all child actors used the correct ExecutionContext.
I made sure all blocking calls inside actors were wrapped in a Future.
I gave the parent actor (and all children) a custom dispatcher with core size 50 and max size 100. It did not request more threads (it stayed at 50), even during these delays.
Finally, I tried creating a totally new ActorSystem for each request (inside the planner-executor). This also had no noticeable effect!
I'm a bit stumped by this. From these tests it does not look like a thread starvation issue. Back at square one, I have no idea why the message takes longer and longer to deliver the more concurrent requests I make. The Zipkin trace before reaching this point does not degrade with more requests until it reaches the ask here. Before then, the server is able to handle multiple steps, e.g. verify the request, talk to the DB, and then finally go inside the planner-executor. So I doubt the application itself is running out of CPU time.
We had a very similar issue with Akka. We observed a huge delay in the ask pattern delivering messages to the target actor at peak load.
Most of these issues were related to heap memory consumption, not dispatcher usage.
We eventually fixed them by tuning the configuration and making the changes below.
1) Make sure you stop entities/actors which are no longer required. If it is a persistent actor, you can always bring it back when you need it.
Refer : https://doc.akka.io/docs/akka/current/cluster-sharding.html#passivation
2) If you are using cluster sharding then check the akka.cluster.sharding.state-store-mode. By changing this to persistence we gained 50% more TPS.
3) Minimize your log entries (set the log level to info).
4) Tune your logging so that messages are published to your logging system frequently; update the batch size, batch count and interval accordingly so that memory is freed. In our case a huge amount of heap was being used to buffer log messages before sending them in bulk; if the interval is too long, you may fill your heap, which hurts performance (more GC activity required).
5) Run blocking operations on a separate dispatcher (see the sketch after this list).
6) Use custom serializers (protobuf) and avoid JavaSerializer.
7) Add the JAVA_OPTS below when launching your jar:
export JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Djava.security.egd=file:/dev/./urandom"
The main thing is -XX:MaxRAMFraction=2, which lets the JVM use up to half of the available memory for the heap. The default is 4, which means your application will use only a quarter of the available memory; that might not be sufficient.
Refer : https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
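For point 5 above, a minimal sketch (Scala) of what a dedicated blocking dispatcher could look like; the dispatcher name and pool size are assumptions, not values taken from the question:
// In application.conf (HOCON), something like:
//   blocking-io-dispatcher {
//     type = Dispatcher
//     executor = "thread-pool-executor"
//     thread-pool-executor { fixed-pool-size = 32 }
//     throughput = 1
//   }
import akka.actor.ActorSystem
import scala.concurrent.{ExecutionContext, Future}

val system = ActorSystem("app")

// Look up the dedicated dispatcher and run blocking work on it,
// so the default dispatcher stays free to deliver messages.
implicit val blockingEc: ExecutionContext = system.dispatchers.lookup("blocking-io-dispatcher")

val result: Future[Int] = Future {
  // blocking JDBC / HTTP call goes here
  42
}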
Regards,
Vinoth

Play 2.4.6 performance

I am using Play 2.4.6, Scala 2.11.8 and Cassandra 2.1.14. A few days back I faced a problem where clients were receiving responses from the Play application after 15-20 seconds. I have a setup of 2 servers running the Play application, both with 8 cores. I am not sure why the Play application was responding so slowly; CPU usage was very low, around 15%. I am using a fork-join pool with the following parameters:
contexts {
  cassandra-pool = {
    fork-join-executor {
      parallelism-factor = 1.0
      parallelism-max = 200
    }
  }
}
I am using the above pool as the ExecutionContext in all of my controllers, and all requests are wrapped in Action.async. As far as I know, Play supports very high request concurrency. In my case, I was receiving only 800 requests and had two servers to serve them.
When I tried to hit it with 1000 concurrent requests, all requests were served in less than 50 milliseconds. I tried this in a local environment, on a single-core system.
I have gone through various blogs and seen that a fork-join pool should not be the choice for IO. I can't understand what went wrong. Can someone please explain what the issue could be? How can I conclude that the thread pool was the issue? What is "parallelism-max" meant for? Does it put an upper bound on the thread pool count?

Scala ExecutionContext for REST API Calls

My application is an HTTP service that exposes several APIs to be consumed by other services. I have a situation in which I have to call 2 different external services: a messaging service and another REST service.
I understand that for these I/O-bound operations, it is good practice to use a separate thread pool or ExecutionContext. I'm using the following configuration for the custom ExecutionContext in my application.conf:
execution-context {
  fork-join-executor {
    parallelism-max = 10
  }
}
I have a couple of questions:
Is this going to create 10 dedicated threads?
How do I know the size of the parallelism-max?
Say if I'm going to use this execution context to make REST API calls, how should I size this?
Is this going to create 10 dedicated threads?
Close, but not exactly.
As you can read in the Akka documentation, three properties, parallelism-min, parallelism-factor and parallelism-max, are used to calculate the parallelism parameter that is then supplied to the underlying ForkJoinPool. The formula is parallelism = clamp(parallelism-min, ceil(available processors * factor), parallelism-max). For example, on an 8-core machine with parallelism-min = 2, parallelism-factor = 1.0 and parallelism-max = 10, parallelism would be clamp(2, ceil(8 * 1.0), 10) = 8.
Now, about parallelism. As you can read in the docs, it roughly corresponds to the number of "hot" threads, but under some circumstances additional threads may be spawned, namely when some threads are blocked inside a ManagedBlocker. See the linked answer for additional details.
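As a small illustration of that compensation mechanism (a sketch; the sleep just stands in for a blocking call):
import scala.concurrent.{blocking, Future}
import scala.concurrent.ExecutionContext.Implicits.global

val f = Future {
  blocking {
    // While this thread waits, the underlying ForkJoinPool (via ManagedBlocker)
    // may spawn an extra thread to keep the target parallelism.
    Thread.sleep(1000)
  }
}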
How do I know the size of the parallelism-max
It depends on your use case. If you block one thread per task, how many simultaneous task executions do you expect?
Say if I'm going to use this execution context to make REST API calls, how should I size this?
Again, how many simultaneous requests do you want to make? If you are going to block your threads, you expect a large number of simultaneous HTTP calls, and you want them processed as soon as possible, then you want a large thread pool.
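For instance, a minimal sketch (Scala) of a dedicated fixed-size pool for blocking REST calls; the pool size of 10 and the call body are placeholders, not recommendations:
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Ten dedicated threads reserved for the blocking calls.
val restCallEc: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(10))

def callExternalService(): Future[String] =
  Future {
    // blocking call to the external REST service goes here
    "response"
  }(restCallEc)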
However, if your application makes that many HTTP requests, why not use an existing library? Libraries like Apache HttpClient allow you to configure parallelism in terms of HTTP connections, or connections per host.
Also, for making HTTP calls from actors, it's natural to use non-blocking HTTP clients, like the Netty-based AsyncHttpClient. It also has a thread pool inside (obviously), but it is fixed, and any number of simultaneous connections is handled by this fixed number of threads in a non-blocking way.
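A minimal sketch of that non-blocking style, assuming the org.asynchttpclient 2.x library; the URL and response handling are placeholders:
import java.util.function.Consumer
import org.asynchttpclient.Dsl.asyncHttpClient
import org.asynchttpclient.Response

// One shared client; it manages its own small, fixed Netty thread pool internally.
val client = asyncHttpClient()
client.prepareGet("http://example.com/api")
  .execute()
  .toCompletableFuture()
  .thenAccept(new Consumer[Response] {
    override def accept(r: Response): Unit = println(r.getStatusCode)
  })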

Apache HttpClient PoolingHttpClientConnectionManager leaking connections?

I am using the Apache Http Client in a Scala application.
The application is fairly high throughput with high parallelism.
I am not sure, but I think I may be leaking connections. It seems that whenever the section of code that uses the client gets busy, the application becomes unresponsive. My suspicion is that I am leaking sockets or something, which is then causing other aspects of the application to stop working. It may also not be leaking connections so much as not closing them fast enough.
For more context: occasionally, certain actions lead to this code being executed hundreds of times a minute in parallel. When this happens, the REST API (Spray) of the application becomes unresponsive. There are other areas of the application that operate with high parallelism as well, and those never cause a problem with the application's responsiveness.
Cutting back on the parallelism of this section of code does seem to alleviate the problem but isn't a viable long term solution.
Am I forgetting to configure something, or configuring something incorrectly?
The code I am using is something like this:
class SomeClass {
  val connectionManager = new PoolingHttpClientConnectionManager()
  connectionManager.setDefaultMaxPerRoute(50)
  connectionManager.setMaxTotal(500)

  val httpClient = HttpClients.custom().setConnectionManager(connectionManager).build()

  def postData(): Unit = {
    val post = new HttpPost("http://SomeUrl") // Typically this URL is fixed. It doesn't vary much if at all.
    post.setEntity(new StringEntity("Some Data"))
    try {
      val response = httpClient.execute(post)
      try {
        // Check the response
      } finally {
        response.close()
      }
    } finally {
      post.releaseConnection()
    }
  }
}
EDIT
I can see that I am building up a lot of connections in the TIME_WAIT state. I have tried adjusting the DefaultMaxPerRoute and the MaxTotal to a variety of values with no noticeable effect. It seems like I am missing something and as a result the connections are not being re-used, but I can't find any documentation that suggests what I am missing. It is critical that these connections get re-used.
EDIT 2
With further investigation, using lsof -p, I can see that if I set the MaxPerRoute to 10, there are in fact 10 connections being listed as "ESTABLISHED". I can see that the port numbers do not change. This seems to imply to me that in fact it is re-using the connections.
What that doesn't explain is why I am still leaking connections in this code. The reused connections and the leaked connections (found with netstat -a) showing up in TIME_WAIT status share the same base URL, so they are definitely related. Is it possible that I am re-using the connections but somehow not properly closing the response?
EDIT 3
I located the source of the TIME_WAIT "leak". It was in an unrelated section of code, so it had nothing to do with the HttpClient. After fixing that code all the TIME_WAITs went away, but the application still becomes unresponsive when hitting the HttpClient code many times. Still investigating that portion.
You should really consider re-using the HttpClient instance, or at least the connection pool that underpins it, instead of creating them for each new request execution. If you wish to continue doing the latter, you should also close the client or shut down the connection pool before they go out of scope.
As far as the leak is concerned, it should be relatively easy to track down by running your application with context logging for connection management turned on, as described here.
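As a sketch of the shared-client approach (Scala, HttpClient 4.x; the pool sizes and idle timeout are example values only):
import java.util.concurrent.TimeUnit
import org.apache.http.impl.client.HttpClients
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager

object SharedHttp {
  // One pool and one client, reused by the whole application.
  val connectionManager = new PoolingHttpClientConnectionManager()
  connectionManager.setDefaultMaxPerRoute(10)
  connectionManager.setMaxTotal(100)

  val client = HttpClients.custom()
    .setConnectionManager(connectionManager)
    .build()

  // Call periodically (e.g. from a scheduled task) to drop expired or long-idle sockets.
  def evictStale(): Unit = {
    connectionManager.closeExpiredConnections()
    connectionManager.closeIdleConnections(30, TimeUnit.SECONDS)
  }
}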
IMO you can use a much lower number of max connections per domain (like 5 instead of 50) and still completely saturate your network bandwidth, if you use HTTP efficiently.
I'm not a Scala person (Android, Java), but I have done lots and lots of optimization on HTTP client-side thread pools. IMO, blindly increasing connections per domain to 50 is masking some other serious issue with throughput.
2 points:
If you are using a shared "sharedPoolingClientConnManager", correctly going to a small pool per domain, and you conform to the recommended way of releasing your connection back to the pool (you should be able to debug all this by watching a running metric of the state of connections per pool instance), then you should be good.
Whatever the parallelism feature of Scala, you should understand something of how the 5 respective threads from the pool on a domain are sharing the sockets. IMO, from Android/Java experience, even though each thread executor is supposedly doing blocking I/O to the server in the scope of that httpclient.execute statement, the actual channel management involved allows very high throughput without resorting to async client libs for HTTP.
Android experience may not be relevant because the client has only 4 threads. Having said that, even if you have 64 or more threads available, I just don't understand needing more than 10 connections per domain in order to keep your underlying HTTP sockets very, very busy with throughput.