Vertx JDBC limiting number of queries - vert.x

I am working on a microservice developed with the Vert.x framework. One of the services receives hundreds of events per second (400 on average) from the event bus and writes them to an MSSQL DB. Queries are executed using JDBCPool, currently configured with a maximum of 40 connections. C3P0 is used for connection pooling.
My problem is that the pool sometimes gets exhausted and there are a lot of statements waiting to be executed, which makes the whole application unresponsive. If I increase the pool size, the DB exhibits slowness, which affects other services as well. So I am planning to write each event to a queue, poll events from that queue, and then write them to the DB; this way I can control the number of connections to the DB by increasing or decreasing the number of poller instances.
Current design
Source system -> Event bus -> Async IO -> DB
Proposed design
Source system -> Event bus -> Queue -> Polling -> DB
To keep it simple, I am trying to replace the async DB IO part with something closer to synchronous processing.
Code
void poll() {
    // poll the queue with a 100 ms timeout
    // after getting an event from the queue, call the DAO
    dao.insert(event)
        .onSuccess(statValObj -> {
            poll();
        });
}
The above looks like recursion, so would it impact the Vert.x event loop?
Can the connection pool size limit the number of queries/operations without freezing the entire call stack?
Vertx version - 4.2.0

Related

How does vertx database connection work under the hood?

I am new to the reactive world and trying to understand how DB connections work under the hood with the Vert.x SQL clients.
I am using io.vertx:vertx-mysql-client, io.vertx:vertx-oracle-client and io.vertx:vertx-pg-client in my project.
As this is a reactive database client, I understand that one thread can handle multiple db-connection objects. I have set pool.maxSize to 1 and concurrently executed, 5000 times, a function that gets a connection from the pool, executes a select query against the database and fetches some rows.
Now my question is: even though I have configured a single db-connection in the pool, it can still handle my 5000 requests concurrently, so how does this work under the hood? Do all the selects run over a single database connection? If so, how can it handle transaction management with a single db-connection?
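For reference, a minimal sketch of the setup described above, assuming io.vertx:vertx-pg-client (host, credentials and the query are illustrative):
import io.vertx.pgclient.PgConnectOptions;
import io.vertx.pgclient.PgPool;
import io.vertx.sqlclient.PoolOptions;

PgConnectOptions connectOptions = new PgConnectOptions()
    .setHost("localhost").setPort(5432)
    .setDatabase("testdb").setUser("user").setPassword("secret");

// Only one physical connection in the pool.
PoolOptions poolOptions = new PoolOptions().setMaxSize(1);
PgPool pool = PgPool.pool(vertx, connectOptions, poolOptions);

// 5000 "concurrent" requests: the pool queues them and runs them one after
// another on its single connection, without blocking the calling thread.
for (int i = 0; i < 5000; i++) {
    pool.query("SELECT id, name FROM users").execute(ar -> {
        if (ar.succeeded()) {
            System.out.println(ar.result().size() + " rows");
        }
    });
}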

Batch processing with Vert.x

Is there a way to do typical batch processing with Vert.x, like providing a file or DB query as input and letting each record be processed by a verticle in a non-blocking way?
In the verticle examples, a server is defined at startup. And even though multiple verticles are deployed, the server is created only once. This means that the Vert.x engine has a built-in concept of a server and knows how to send incoming requests to each verticle for processing.
The same happens with the event bus as well.
But is there a way to define a verticle with a handler for processing data from a general stream: a query, a file, etc.?
I am particularly interested in spreading data processing over cluster nodes.
One way I can think of is to execute the query the regular way and then publish the data to the event bus for processing. But that means that if I have to process a few million records, I will run out of memory. Of course I could do paging, etc., but then there is no coordination between retrieving and processing the data.
Thanks
Andrius
If you are using the JDBC Client, you can stream the query result:
(using vertx-rx-java2)
JDBCClient client = ...;
JsonArray params = new JsonArray().add(dataCategory);
client.rxQueryStreamWithParams("SELECT * FROM data WHERE data.category = ?", params)
    .flatMapObservable(SQLRowStream::toObservable)
    .subscribe(
        (JsonArray row) -> vertx.eventBus().send("data.process", row)
    );
This way each row is sent to the event bus. If you then have multiple verticle instances that each listen to this address, you spread the data processing across multiple threads, for example as sketched below.
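A consumer verticle along these lines could be deployed in several instances (a sketch; the class name is illustrative, the address matches the snippet above):
public class DataProcessorVerticle extends AbstractVerticle {
    @Override
    public void start() {
        vertx.eventBus().<JsonArray>consumer("data.process", message -> {
            JsonArray row = message.body();
            // process the row here
        });
    }
}

// deploy several instances so rows are handled on multiple event-loop threads
vertx.deployVerticle("com.example.DataProcessorVerticle",
        new DeploymentOptions().setInstances(4));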
If you are using another SQL client, have a look at its documentation; maybe it has a similar method.

How application server handle multiple requests to save data into table

I have created a web application in JSF and it has a button.
If the button is clicked, it goes to the server side and executes the function below to save the data in a table; I am using MyBatis for this.
public void save(A a) {
    SqlSession session = null;
    try {
        session = SqlConnection.getInstance().openSession();
        TestMapper testmap = session.getMapper(TestMapper.class);
        testmap.insert(a);
        session.commit();
    } catch (Exception e) {
        // handle/log the exception
    } finally {
        if (session != null) {
            session.close();
        }
    }
}
Now I have deployed this application on a JBoss (WildFly) application server.
As per my understanding, when multiple users access the application
by hitting the URL, the application server creates a thread for each user request.
For example, if 4 clients make requests, then 4 threads are created: t1, t2, t3 and t4.
If all 4 users hit the save button at the same time, how will the save method be executed? Will t1 enter the method and execute the insert statement
to insert data into the table, followed by t2, t3 and t4, or will all 4 threads execute the insert simultaneously?
To give some context, I will first describe two possible approaches to handling requests. In this case they are HTTP requests, but the approaches do not depend on the protocol used; the important point is that requests come from the network and their execution requires some IO (access to the filesystem, the database, or network calls to other systems). Note that the following description contains some simplifications.
These two approaches are:
synchronous
asynchronous
In general, to process a typical HTTP request that involves DB access, at least four IO operations are needed:
the request handler needs to read the request data from the client socket
the request handler needs to write the request to the socket connected to the DB
the request handler needs to read the response from the DB socket
the request handler needs to write the response to the client socket
Let's see how this is done for both cases.
Synchronous
In this approach the server has a pool (think of it as a collection) of threads that are ready to serve requests.
When a request comes in, the server borrows a thread from the pool and executes the request handler in that thread.
When the request handler needs to do an IO operation, it initiates the IO operation and then waits for its completion. By wait I mean that the thread's execution is blocked until the IO operation completes and the data (for example, the response with the results of the SQL query) is available.
In this case concurrency, that is, processing requests for multiple clients simultaneously, is achieved by having a number of threads in the pool. IO operations are much slower than the CPU, so most of the time a thread processing a request is blocked on an IO operation, and the CPU cores can execute stages of request processing for other clients.
Note that because of the slowness of IO operations, the thread pool used for handling HTTP requests is usually quite large. The documentation for the synchronous request processing subsystem used in WildFly mentions about 10 threads per CPU core as a reasonable value.
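A minimal sketch of this synchronous model in plain Java (readRequest, queryDatabase and writeResponse are hypothetical helpers that all block the calling thread):
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService workers = Executors.newFixedThreadPool(40); // the thread pool

void onIncomingRequest(Socket clientSocket) {
    workers.submit(() -> {
        byte[] request = readRequest(clientSocket);   // blocks until the request is read
        byte[] rows = queryDatabase(request);         // blocks on the JDBC call
        writeResponse(clientSocket, rows);            // blocks until the response is written
    });
}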
Asynchronous
In this case the IO is handled differently. There is a small number of threads handling IO. They all work the same way, so I'll describe one of them.
Such a thread runs a loop which basically waits for events, and every time an event happens it calls the handler for that event.
The first such event is a new request. When request processing starts, the request handler is invoked from the loop run by one of the IO threads. The first thing the request handler does is try to read the request from the client socket. So the handler initiates the IO operation on the client socket and returns control to the caller. That means that the thread is released and can process another event.
Another event happens when the IO operation that reads from the client socket has some data available. In this case the loop invokes the handler at the point where it previously returned control after initiating the IO; namely, it resumes at the next step, which processes the input data (for example, parses HTTP parameters) and initiates a new IO operation (in this case, the request to the DB socket). And again the handler releases the thread so it can handle other events (such as completion of IO operations that are part of other clients' request processing).
Given that IO operations are slow compared to the speed of the CPU itself, one thread handling IO can process a lot of requests concurrently.
Note: it is important that the request handler code never uses any blocking operation (like blocking IO), because that would steal the IO thread and prevent other requests from proceeding.
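A minimal sketch of the asynchronous model, using Vert.x as an example (the pool and query are illustrative): the event-loop thread never blocks, each IO step just registers a callback.
vertx.createHttpServer().requestHandler(req -> {
    // initiating the query returns immediately; the event-loop thread is free
    // to handle other events while the DB is working on the response
    pool.query("SELECT name FROM users").execute(ar -> {
        if (ar.succeeded()) {
            req.response().end(ar.result().size() + " rows");
        } else {
            req.response().setStatusCode(500).end();
        }
    });
}).listen(8080);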
JSF and MyBatis
In the case of JSF and MyBatis, the synchronous approach is used. JSF uses a servlet to handle requests from the UI, and servlets are handled by the synchronous processors in WildFly. JDBC, which is used by MyBatis to communicate with the DB, also uses synchronous IO, so threads are used to execute requests concurrently.
Congestions
All of the above is written with the assumption that there are no other sources of congestion. By congestion I mean a limitation on the ability of a certain component of the system to execute things in parallel.
For example, imagine a situation where the database is configured to allow only one client connection at a time (this is not a reasonable configuration, and I'm using it only to demonstrate the idea). In this case, even if multiple threads can execute the code of the save method in parallel, all but one will be blocked at the moment they try to open a connection to the database.
Another similar example is if you are using an SQLite database. It only allows one client to write to the DB at a time. So at the point when thread A tries to execute an insert, it will be blocked if there is another thread B that is already executing an insert. Only after the commit executed by thread B will thread A be able to proceed with its insert. How long A waits depends on the time it takes B to execute its request and on the number of other threads waiting to do a write operation to the same DB.
In practice, if you are using an RDBMS that scales better (like PostgreSQL, MySQL or Oracle), you will not hit this problem with a small number of connections. But it may become a problem when there is a large number of concurrent requests and either the DB limits the number of client connections or a connection pool is used to limit the number of connections on the application side. In this case, if there are already many connections to the database, new clients will wait until existing requests finish and connections are released.

Distributed timer service

I am looking for a distributed timer service. Multiple remote client services should be able to register for callbacks (via REST APIs) after specified intervals. The length of an interval can be 1 minute, and I can live with an error margin of around 1 minute. The number of such callbacks can go up to 100,000 for now, but I will need to scale up later. I have been looking at schedulers like Quartz, but I am not sure they are a fit for the problem. With Quartz, I would probably have to save the callback requests in a DB and poll every minute for overdue requests on 100,000 rows. I am not sure that will scale. Are there any out-of-the-box solutions around? If not, how do I go about building one?
Posting as an answer since I can't comment.
One more option to consider is a message queue, where you publish a message with a scheduled delay so that consumers can consume it after that delay.
Amazon SQS Delay Queues
Delay queues let you postpone the delivery of new messages in a queue for the specified number of seconds. If you create a delay queue, any message that you send to that queue is invisible to consumers for the duration of the delay period. You can use the CreateQueue action to create a delay queue by setting the DelaySeconds attribute to any value between 0 and 900 (15 minutes). You can also change an existing queue into a delay queue using the SetQueueAttributes action to set the queue's DelaySeconds attribute.
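As a sketch of how this might look with the AWS SDK for Java (queue name and delays are illustrative):
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.CreateQueueRequest;
import com.amazonaws.services.sqs.model.SendMessageRequest;

AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

// create a delay queue: by default every message stays invisible for 60 seconds
String queueUrl = sqs.createQueue(new CreateQueueRequest("callback-queue")
        .addAttributesEntry("DelaySeconds", "60"))
        .getQueueUrl();

// or override the delay per message (0-900 seconds)
sqs.sendMessage(new SendMessageRequest(queueUrl, "callback payload")
        .withDelaySeconds(120));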
Scheduling Messages with RabbitMQ
https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/
A user can declare an exchange with the type x-delayed-message and then publish messages with the custom header x-delay expressing in milliseconds a delay time for the message. The message will be delivered to the respective queues after x-delay milliseconds.
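With the RabbitMQ Java client that could look roughly like this (a sketch; the plugin must be enabled on the broker, a Channel is assumed to be open, and the exchange and routing key names are illustrative):
import com.rabbitmq.client.AMQP;
import java.util.HashMap;
import java.util.Map;

// declare an exchange of type x-delayed-message, provided by the plugin
Map<String, Object> args = new HashMap<>();
args.put("x-delayed-type", "direct");
channel.exchangeDeclare("delayed-exchange", "x-delayed-message", true, false, args);

// publish with an x-delay header; the message is delivered after 60 seconds
Map<String, Object> headers = new HashMap<>();
headers.put("x-delay", 60000);
AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
        .headers(headers)
        .build();
channel.basicPublish("delayed-exchange", "callbacks", props,
        "callback payload".getBytes());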
Out of the box solution
RocketMQ meets your requirements since it supports scheduled messages:
Scheduled messages differ from normal messages in that they won't be delivered until a provided time later.
You can register your callbacks by sending such messages:
Message message = new Message("TestTopic", "".getBytes());
message.setDelayTimeLevel(3); // level 3 maps to one of the pre-configured delays
producer.send(message);
And then, listen to this topic to deal with your callbacks:
consumer.subscribe("TestTopic", "*");
consumer.registerMessageListener(new MessageListenerConcurrently() {...})
It does well in almost every way, except that the DelayTimeLevel options can only be defined before the RocketMQ server starts, which means that if your MQ server is configured with messageDelayLevel=1s 5s 10s, then you simply cannot register your callback with a delay of 3s.
DIY
Quartz plus storage can be used to build such a callback service, as you mentioned, but I don't recommend storing the callback data in a relational DB: you want high TPS, and building a distributed service on top of it makes it hard to get rid of locks and transactions, which add complexity to the DB code.
I suggest storing the callback data in Redis instead, because it has better performance than a relational DB and its ZSET data structure suits this scenario well.
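The core of the ZSET approach might look like this with Jedis (a sketch; key names, the poller loop and fireCallback are illustrative): the score is the epoch time at which the callback is due, and a poller repeatedly pops overdue members.
import redis.clients.jedis.Jedis;

Jedis jedis = new Jedis("localhost", 6379);

// register a callback that is due in 60 seconds
long dueAt = System.currentTimeMillis() + 60_000;
jedis.zadd("callbacks", dueAt, "https://client.example.com/notify?id=42");

// poller: fetch everything due by now, remove it and fire the callback
long now = System.currentTimeMillis();
for (String callbackUrl : jedis.zrangeByScore("callbacks", 0, now)) {
    if (jedis.zrem("callbacks", callbackUrl) == 1) { // only one poller wins the removal
        fireCallback(callbackUrl); // hypothetical HTTP call back to the client
    }
}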
I once developed a timed callback service based on Redis and Dubbo. It provides some more useful features. Maybe you can get some ideas from it: https://github.com/joooohnli/delay-callback

Play Framework + JDBC + Futures

Assuming I obtain a JDBC connection through injection, like so:
class SqlQuery @Inject()(db: Database) extends Controller { /* .... */ }
And that the pool of connections is large enough, for example 100. Is it possible to create a Future to avoid blocking when running the SQL statement (similar to Slick futures)? Or does the fact that the number of connections in the pool is large mean that the SQL statement will not block?
Using futures is not synonymous with non-blocking. Futures allow you to execute code on another thread, or on some type of executor in general. However, the code you execute can still block.
JDBC is a blocking API. This means that when you execute a query through JDBC, the calling thread is blocked while it waits for a response from the database. Another term for this is synchronous. A non-blocking or asynchronous API accepts a response asynchronously, freeing the calling thread from actively waiting for it. Reactive Slick uses its own driver to accept responses from the database in an asynchronous manner, which means the calling thread can be freed as soon as the query is dispatched to the database.
The difference between the two is this:
Imagine your application has a database connection pool of size 100 and a fixed thread pool of size 10. Now let's say you wrap all of your JDBC calls in futures. Let's also say that your SqlQuery controller has a method that makes several JDBC calls at the same time. All of these queries will run in parallel until the thread pool is exhausted, which means you would only be able to run 10 queries at any given moment. While the calling thread would not be blocked by the JDBC calls, the threads executing them would be. With enough queries running in parallel, the thread pool becomes exhausted and it no longer matters how many connections are in the pool. You could deal with this by making your thread pool larger, or by using a fork-join pool that expands as needed, but this can incur performance costs due to the creation of new threads and context switching. After all, your CPU is limited.
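A small sketch of that point in plain Java (dataSource and the pool sizes are illustrative): wrapping the blocking JDBC call in a future only moves the blocking onto another thread, it does not remove it.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService jdbcExecutor = Executors.newFixedThreadPool(10); // the "fixed thread pool of size 10"

CompletableFuture<Integer> count = CompletableFuture.supplyAsync(() -> {
    try (Connection conn = dataSource.getConnection();            // may wait for a pooled connection
         PreparedStatement st = conn.prepareStatement("SELECT count(*) FROM data");
         ResultSet rs = st.executeQuery()) {                      // blocks this worker thread until the DB answers
        rs.next();
        return rs.getInt(1);
    } catch (SQLException e) {
        throw new CompletionException(e);
    }
}, jdbcExecutor);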
Using an asynchronous database driver like reactive slick would not block your limited pool of threads, and you would be able to run as many queries concurrently as you had connections in the pool (100 in this example). Saving threads from being blocked means saving CPU time that would otherwise be spent just waiting for responses, which means you can use it to continue to handle other requests, etc.