how to reduce the getConnection time consume in Atomikos - spring-data-jpa

I am using Atomkios XA configuration for Oracle.
This is my code for creating datasource connection.
OracleXADataSource oracleXADataSource = new OracleXADataSource();
oracleXADataSource.setURL(sourceURL);
oracleXADataSource.setUser(UN);
oracleXADataSource.setPassword(PS);
AtomikosDataSourceBean sourceBean= new AtomikosDataSourceBean();
sourceBean.setXaDataSource(oracleXADataSource);
sourceBean.setUniqueResourceName(resourceName);
sourceBean.setMaxPoolSize(max-pool-size); // 10
atomikos:
datasource:
resourceName: insight
max-pool-size: 10
min-pool-size: 3
transaction-manager-id: insight-services-tm
This is my configuration is fine for medium user load around 5000 requests.
But when user count increases assume more than 10000 requests, this class com.atomikos.jdbc.AbstractDataSourceBean:getConnection consuming more then than normal.
This class time taking approximately 1500ms but normal time it takes less then 10ms. I can understand user demand increases, getConnection is going to wait state to pick the free connection from the connection pool so if I increase my max-pool-size, will my problem sort out or any other option feature available to sort out my problem.

Try setting concurrentConnectionValidation=true on your datasource.
If that does not help then consider a free trial of the commercial Atomikos product:
https://www.atomikos.com/Main/ExtremeTransactionsFreeTrial
Best

Related

EF core request cannot wake-up Azure Sql (serverless sku) database and times out

I'm using EF Core with one of my apps to query an Azure Sql database. It's the serverless sku, that scales down to zero (goes to sleep) after 1h of inactivity.
Now, in that app there is scheduled function to query the database at certain points in time. This often is in a time, where the DB is sleeping. To compensate for this, I'm using the the following in the DbContext.cs
optionsBuilder.UseSqlServer(connection, opt => opt.EnableRetryOnFailure(
maxRetryCount: 20,
maxRetryDelay: TimeSpan.FromSeconds(30),
errorNumbersToAdd: null
));
If the delay is evenly distributed, that results in an avg of 15s, with 20 retries => timeout after 5mins.
I thought this should be plenty, since when querying a sleeping database with SSMS it usaully takes well under 1min to get going. However, this is not the case, the functions regularly time-out and the queries fail.
Is there a better way to deal with this than just even more increasing the timeout? Should 5mins really not be enough?
Cheers
I think I got it working now. The above code snippet from EF core is relevant to any command timeout occurences. However, since the database was sleeping during the request it was rather a connection timeout issue. I fixed this, by providing adding Connect Timeout=120 in the connection string itself.

Multiple concurrent connections with Vertx

I'm trying to build a web application that should be able to handle at least 15000 rps. Some of the optimizations I have done is increase the worker pool size to 20 and set an accept back log to 25000. Since I have set my worker pool size to 20; wil this help with the the blocking piece of code?
A worker pool size of 20 seems to be the default.
I believe the important question in your case is how long do you expect each request to run. On my side, I expect to have thousands of short-lived requests, each with a payload size of about 5-10KB. All of these will be blocking, because of a blocking database driver I use at the moment. I have increased the default worker pool size to 40 and I have explicitly set my deploy vertical instances using the following formulae:
final int instances = Math.min(Math.max(Runtime.getRuntime().availableProcessors() / 2, 1), 2);
A test run of 500 simultaneous clients running for 60 seconds, on a vert.x server doing nothing but blocking calls, produced an average of 6 failed requests out of 11089. My test payload in this case was ~28KB.
Of course, from experience I know that running my software in production would often produce results that I have not anticipated. Thus, the important thing in my case is to have good atomicity rules in place, so that I don't get half-baked or corrupted data in the database.

Akka Actor Messaging Delay

I'm experiencing issues scaling my app with multiple requests.
Each request sends an ask to an actor, which then spawns other actors. This is fine, however, under load(5+ asks at once), the ask takes a massive amount of time to deliver the message to the target actor. The original design was to bulkhead requests evenly, but this is causing a bottleneck. Example:
In this picture, the ask is sent right after the query plan resolver. However, there is a multi-second gap when the Actor receives this message. This is only experienced under load(5+ requests/sec). I first thought this was a starvation issue.
Design:
Each planner-executor is a seperate instance for each request. It spawns a new 'Request Acceptor' actor each time(it logs 'requesting score' when it receives a message).
I gave the actorsystem a custom global executor(big one). I noticed the threads were not utilized beyond the core threadpool size even during this massive delay
I made sure all executioncontexts in the child actors used the correct executioncontext
Made sure all blocking calls inside actors used a future
I gave the parent actor(and all child) a custom dispatcher with core size 50 and max size 100. It did not request more(it stayed at 50) even during these delays
Finally, I tried creating a totally new Actorsystem for each request(inside the planner-executor). This also had no noticable effect!
I'm a bit stumped by this. From these tests it does not look like a thread starvation issue. Back at square one, I have no idea why the message takes longer and longer to deliver the more concurrent requests I make. The Zipkin trace before reaching this point does not degrade with more requests until it reaches the ask here. Before then, the server is able to handle multiple steps to e.g veify the request, talk to the db, and then finally go inside the planner-executor. So I doubt the application itself is running out of cpu time.
We had this very similar issue with Akka. We observed huge delay in ask pattern to deliver messages to the target actor on peek load.
Most of these issues are related to heap memory consumption and not because of usages of dispatchers.
Finally we fixed these issues by tuning some of the below configuration and changes.
1) Make sure you stop entities/actors which are no longer required. If its a persistent actor then you can always bring it back when you need it.
Refer : https://doc.akka.io/docs/akka/current/cluster-sharding.html#passivation
2) If you are using cluster sharding then check the akka.cluster.sharding.state-store-mode. By changing this to persistence we gained 50% more TPS.
3) Minimize your log entries (set it to info level).
4) Tune your logs to publish messages frequently to your logging system. Update the batch size, batch count and interval accordingly. So that the memory is freed. In our case huge heap memory is used for buffering the log messages and send in bulk. If the interval is more then you may fill your heap memory and that affects the performance (more GC activity required).
5) Run blocking operations on a separate dispatcher.
6) Use custom serializers (protobuf) and avoid JavaSerializer.
7) Add the below JAVA_OPTS to your jar
export JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Djava.security.egd=file:/dev/./urandom"
The main thing is XX:MaxRAMFraction=2 which will utilize more than 60% of available memory. By default its 4 means your application will use only one fourth of the available memory, which might not be sufficient.
Refer : https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
Regards,
Vinoth

Distributed timer service

I am looking for a distributed timer service. Multiple remote client services should be able to register for callbacks (via REST apis) after specified intervals. The length of an interval can be 1 minute. I can live with an error margin of around 1 minute. The number of such callbacks can go up to 100,000 for now but I would need to scale up later. I have been looking at schedulers like Quartz but I am not sure if they are a fit for the problem. With Quartz, I will probably have to save the callback requests in a DB and poll every minute for overdue requests on 100,000 rows. I am not sure that will scale. Are there any out of the box solutions around? Else, how do I go about building one?
Posting as answer since i cant comment
One more options to consider is a message queue. Where you publish a message with scheduled delay so that consumers can consume after that delay.
Amazon SQS Delay Queues
Delay queues let you postpone the delivery of new messages in a queue for the specified number of seconds. If you create a delay queue, any message that you send to that queue is invisible to consumers for the duration of the delay period. You can use the CreateQueue action to create a delay queue by setting the DelaySeconds attribute to any value between 0 and 900 (15 minutes). You can also change an existing queue into a delay queue using the SetQueueAttributes action to set the queue's DelaySeconds attribute.
Scheduling Messages with RabbitMQ
https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/
A user can declare an exchange with the type x-delayed-message and then publish messages with the custom header x-delay expressing in milliseconds a delay time for the message. The message will be delivered to the respective queues after x-delay milliseconds.
Out of the box solution
RocketMQ meets your requirements since it supports the Scheduled messages:
Scheduled messages differ from normal messages in that they won’t be
delivered until a provided time later.
You can register your callbacks by sending such messages:
Message message = new Message("TestTopic", "");
message.setDelayTimeLevel(3);
producer.send(message);
And then, listen to this topic to deal with your callbacks:
consumer.subscribe("TestTopic", "*");
consumer.registerMessageListener(new MessageListenerConcurrently() {...})
It does well in almost every way except that the DelayTimeLevel options can only be defined before RocketMQ server start, which means that if your MQ server has configuration messageDelayLevel=1s 5s 10s, then you just can not register your callback with delayIntervalTime=3s.
DIY
Quartz+storage can build such callback service as you mentioned, while I don't recommend that you store callback data in relational DB since you hope it to achieve high TPS and constructing distributed service will be hard to get rid of lock and transaction which bring complexity to DB coding.
I do suggest storing callback data in Redis. Because it has better performance than relational DB and it's data structure ZSET suits this scene well.
I once developed a timed callback service based on Redis and Dubbo. it provides some more useful features. Maybe you can get some ideas from it https://github.com/joooohnli/delay-callback

What does server throughput mean

If the throughput is increase how will be changed the response and request time?
If I have the data(request/min)?
JMeter's definition of throughput can be seen here: https://jmeter.apache.org/usermanual/glossary.html
Basically its a measure of how many requests that JMeter were able to send to your test site/application in one second. Or in another word the number of requests that your test site/application was able to receive from JMeter in one second. An increase in the throughput will mean your site/application was able to receive more requests per second while a decrease will mean a reduction in the number of request it handled per second.
The relationship between throughput with response/request time totally depends as ysth stated. I typically use this number to see the load of the server but run the test several times (30x min) and take the average.
There's not necessarily a relationship. Can you tell us anything more about why you want to know this, what you plan to do with the information, etc.? It may help get you an answer better suited to your needs.
After completion of the project development as a developer, we are responsible to test the performance of the application.
As part of performance testing, we have to check
1)Response time of application
2)bottle nack of application
3)Throughput of application
Throughput of application:-
In general 'Request capacity of application in a given time.'
As per Apache JMeter doc :-
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).