How does vertx database connection work under the hood? - reactive-programming

I am new to the reactive world and am trying to understand how DB connections work under the hood with the Vert.x SQL clients.
I am using io.vertx:vertx-mysql-client, io.vertx:vertx-oracle-client and io.vertx:vertx-pg-client in my project.
As these are reactive database clients, I understand that one thread can handle multiple DB connection objects. I set pool.maxSize to 1 and concurrently executed, 5000 times, a function that gets a connection from the pool, runs a select query, and fetches some rows from the DB.
Now my question: even though I have configured a single DB connection in the pool, it can still handle my 5000 concurrent requests, so how does this work under the hood? Are all the selects run over the single database connection? If so, how can it handle transaction management with a single DB connection?

Related

Postgres: processes terminated after connection break / invalidation

I don't understand some of Postgres's mechanisms, and it is quite frustrating.
I usually use DBeaver as an SQL client to query an external PostgreSQL database. If I run create... or insert... queries and the connection is then broken or invalidated for some reason, the pid keeps running and finishes the transaction.
But for some more complicated PL/pgSQL functions we wrote (with temp tables, loops, inserts, etc.), breaking the connection always causes the process to terminate (it disappears from the session list just before the next SQL operation, e.g. inserting a row into a log table). It doesn't matter whether it's the DBeaver editor or the psql command.
I know that disconnecting may be a critical problem that should be eliminated, and maybe I shouldn't expect the process to continue successfully, but I do :) Or I would at least like to know why this happens and whether it is possible to prevent it.
If the network connection fails, the database server can detect that in two ways:
if it tries to send data to the client, it will figure out pretty quickly that the connection is down
if it tries to receive data from the client, it will only notice when the kernel's TCP keepalive mechanism has determined that the connection is down
When you say that sometimes execution of a function is terminated right away, I would say that is because the function returned data to the client.
In the case where a query keeps running, it is not attempting to return any data yet.
There is no cure for the former, but in PostgreSQL v14 you can prevent the latter by setting client_connection_check_interval. In addition, you have to set the PostgreSQL keepalive parameters so that the dead connection becomes known quickly.
See my article for more.

JPA result streams and underlying JDBC connections

It's pretty much evident what's happening when you call getResultList() on a Query instance - the framework obtains a JDBC connection from the pool and returns it when the list is ready.
What is not clear to me is how JPA handles connections during the getResultStream() calls. Does it wait till I get to the end of the stream and then return the connection to the pool? What if I don't? What if I obtain a Spliterator from the stream and stop iterating somewhere in the middle?
My only guess is that such connections are returned to the pool after a timeout. Which would mean that, depending on the timeout value, I might need a much larger value of open DB connections. If I'm right, how do I configure the timeout value, particularly with Spring JPA?
The answer is it doesn't. You're supposed to wrap the returned stream in a 'try with resources' statement.
It seems that a lot of people don't realize that Streams are actually AutoCloseable and whenever they hold on to resources like JDBC connections or file handles, it's the developer's responsibility to properly close them.
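To illustrate the point about closing streams, here is a minimal sketch using only the JDK. It does not use JPA at all: fakeResultStream is a hypothetical stand-in for getResultStream(), and its close handler plays the role of "return the JDBC connection to the pool". It shows that a terminal operation on its own does not run the close handler, while try-with-resources does, even when iteration stops after the first row.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class StreamCloseDemo {
    // Hypothetical stand-in for a JPA result stream. The close handler
    // represents releasing the underlying connection.
    static Stream<String> fakeResultStream(AtomicBoolean released) {
        return Stream.of("row1", "row2", "row3").onClose(() -> released.set(true));
    }

    // Reads only the first row; leaving the try block still calls close().
    static String firstRow(AtomicBoolean released) {
        try (Stream<String> rows = fakeResultStream(released)) {
            return rows.findFirst().orElse(null);
        }
    }

    // Same read without try-with-resources: nothing ever calls close().
    static String firstRowWithoutClose(AtomicBoolean released) {
        return fakeResultStream(released).findFirst().orElse(null);
    }

    public static void main(String[] args) {
        AtomicBoolean closed = new AtomicBoolean(false);
        AtomicBoolean leaked = new AtomicBoolean(false);
        System.out.println(firstRow(closed) + " released=" + closed.get());
        System.out.println(firstRowWithoutClose(leaked) + " released=" + leaked.get());
    }
}
```

With a real JPA stream the same try-with-resources shape applies; what exactly gets released on close depends on the provider.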

How application server handle multiple requests to save data into table

I have created a web application in JSF and it has a button.
When the button is clicked, the request goes to the server side and executes the function below to save the data in a table. I am using MyBatis for this.
public void save(A a) {
    // SqlSession is Closeable, so try-with-resources closes it even if the
    // insert fails, and avoids calling close() on a null session.
    try (SqlSession session = SqlConnection.getInstance().openSession()) {
        TestMapper testMap = session.getMapper(TestMapper.class);
        testMap.insert(a);
        session.commit();
    }
}
Now I have deployed this application on a JBoss (WildFly) application server.
As I understand it, when multiple users access the application
by hitting the URL, the application server creates a thread for each user request.
For example, if 4 clients make requests, then 4 threads will be created: t1, t2, t3 and t4.
If all 4 users hit the save button at the same time, how will the save method be executed? Will t1 enter the method and execute the insert statement
first, followed by t2, t3 and t4 in turn, or will all 4 threads execute the method and insert data simultaneously?
To give some context, I will first describe two possible approaches to handling requests. In this case the requests are HTTP, but the approaches do not depend on the protocol; what matters is that requests come from the network and that executing them requires some IO (access to the filesystem or a database, or network calls to other systems). Note that the following description is somewhat simplified.
These two approaches are:
synchronous
asynchronous
In general, processing a typical HTTP request that involves DB access requires at least four IO operations:
request handler needs to read the request data from the client socket
request handler needs to write request to the socket connected to the DB
request handler needs to read response from the DB socket
request handler needs to write the response to the client socket
Let's see how this is done for both cases.
Synchronous
In this approach the server has a pool (think a collection) of threads that are ready to serve a request.
When the request comes in the server borrows a thread from the pool and executes a request handler in that thread.
When the request handler needs to do an IO operation, it initiates the operation and then waits for its completion. By "wait" I mean that thread execution is blocked until the IO operation completes and the data (for example, the response with the results of the SQL query) is available.
In this case concurrency, that is, processing requests for multiple clients simultaneously, is achieved by having a number of threads in the pool. IO operations are much slower than the CPU, so most of the time a thread processing a request is blocked on an IO operation, and the CPU cores can execute stages of request processing for other clients.
Note that because IO operations are slow, the thread pool used for handling HTTP requests is usually fairly large. The documentation for the synchronous request processing subsystem used in WildFly mentions about 10 threads per CPU core as a reasonable value.
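The synchronous model can be sketched with a plain JDK thread pool. This is an illustration only, not how WildFly is implemented: blockingQuery is a hypothetical handler step that simulates blocking IO with Thread.sleep, and the pool size and request count are arbitrary. Each request occupies one pool thread for its whole duration, so the number of requests in progress at once is capped by the pool size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingPoolDemo {
    // Simulated blocking IO: the calling thread does nothing useful while it sleeps.
    static String blockingQuery(int requestId) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "rows for request " + requestId;
    }

    // Serves all requests on a fixed pool and returns the names of the threads used.
    static Set<String> serve(int requests, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> workerThreads = ConcurrentHashMap.newKeySet();
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                int id = i;
                results.add(pool.submit(() -> {
                    workerThreads.add(Thread.currentThread().getName());
                    return blockingQuery(id); // the thread is blocked here
                }));
            }
            for (Future<String> result : results) {
                try {
                    result.get(); // wait for every request to finish
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return workerThreads;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads used: " + serve(8, 4).size());
    }
}
```

With 8 requests and 4 threads, the second half of the requests has to wait until a thread becomes free, which is why a blocking server needs a comparatively large pool.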
Asynchronous
In this case the IO is handled differently. There is a small number of threads handling IO. They all work the same way and I'll describe one of them.
Such a thread runs a loop that basically waits for events, and every time an event happens it calls a handler for that event.
The first such event is a new request. When request processing starts, the request handler is invoked from the loop run by one of the IO threads. The first thing the request handler does is try to read the request from the client socket. So the handler initiates the IO operation on the client socket and returns control to the caller. That means the thread is released and can process another event.
Another event happens when the IO operation that reads from the client socket has some data available. In this case the loop invokes the handler at the point where it returned control to the loop after initiating the IO; that is, it resumes at the next step, which processes the input data (for example, parses HTTP parameters) and initiates a new IO operation (in this case a request to the DB socket). Again the handler releases the thread so it can handle other events (such as the completion of IO operations that are part of processing other clients' requests).
Given that IO operations are slow compared to the CPU, one thread handling IO can process a lot of requests concurrently.
Note that it is important that the request handler code never uses any blocking operation (like blocking IO), because that would tie up the IO thread and prevent other requests from proceeding.
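The event-loop idea can be sketched with a single-threaded JDK scheduler standing in for one IO thread. This is a simplified illustration, not the code of any real server: asyncIo is a hypothetical helper that simulates non-blocking IO by completing a future after a delay, and the request count and delays are arbitrary. Each handler step starts an IO operation and returns immediately; the next step runs on the loop thread when the IO completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class EventLoopDemo {
    // Simulated non-blocking IO: returns at once; the loop thread completes
    // the future when the "IO" finishes after delayMs.
    static CompletableFuture<String> asyncIo(ScheduledExecutorService loop, String result, long delayMs) {
        CompletableFuture<String> done = new CompletableFuture<>();
        loop.schedule(() -> done.complete(result), delayMs, TimeUnit.MILLISECONDS);
        return done;
    }

    // Processes all requests on one loop thread and returns the names of the
    // threads that ran handler steps.
    static Set<String> run(int requests) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        Set<String> handlerThreads = ConcurrentHashMap.newKeySet();
        try {
            List<CompletableFuture<String>> responses = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                int id = i;
                responses.add(asyncIo(loop, "request-" + id, 20)   // read from client socket
                    .thenComposeAsync(request -> {
                        handlerThreads.add(Thread.currentThread().getName());
                        return asyncIo(loop, request + ":rows", 20); // query the DB
                    }, loop)
                    .thenApplyAsync(rows -> {
                        handlerThreads.add(Thread.currentThread().getName());
                        return "response(" + rows + ")";             // build the response
                    }, loop));
            }
            CompletableFuture.allOf(responses.toArray(new CompletableFuture[0])).join();
            return handlerThreads;
        } finally {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("handler threads: " + run(100));
    }
}
```

All handler steps for all requests run on the same thread, because no step ever waits for IO. If one step called Thread.sleep or a blocking JDBC method instead, every other request on that loop would stall behind it.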
JSF and Mybatis
In the case of JSF and MyBatis, the synchronous approach is used. JSF uses a servlet to handle requests from the UI, and servlets are handled by the synchronous processors in WildFly. JDBC, which MyBatis uses to communicate with the DB, also uses synchronous IO, so threads are used to execute requests concurrently.
Congestions
All of the above assumes that there are no other sources of congestion. By congestion I mean a limitation on the ability of a certain component of the system to execute things in parallel.
For example, imagine a situation where a database is configured to allow only one client connection at a time (this is not a reasonable configuration and I'm using it only to demonstrate the idea). In this case, even if multiple threads can execute the code of the save method in parallel, all but one will be blocked at the moment they try to open a connection to the database.
Another similar example is an SQLite database. It only allows one client to write to the DB at a time. So when thread A tries to execute an insert, it will be blocked if there is another thread B that is already executing an insert. Only after thread B commits can thread A proceed with its insert. The time A waits depends on the time it takes B to execute its request and on the number of other threads waiting to write to the same DB.
In practice, if you are using an RDBMS that scales better (like PostgreSQL, MySQL or Oracle), you will not hit this problem when using a small number of connections. But it may become a problem when there is a large number of concurrent requests and either the DB limits the number of client connections or a connection pool limits the number of connections on the application side. In this case, if there are already many connections to the database, new clients will wait until existing requests finish and their connections are released.

Play Framework + JDBC + Futures

Assuming I obtain a JDBC connection through injection, like so:
class SqlQuery @Inject()(db: Database) extends Controller { /* .... */ }
And that the pool of connections is large enough, for example 100. Is it possible to create a Future to avoid blocking when running the SQL statement (similar to Slick futures)? Or does the fact that the number of connections in the pool is large mean that the SQL statement will not block?
Using futures is not synonymous with non-blocking. Futures allow you to execute code on another thread, or some type of executor, in general. However, the code you execute can still block.
JDBC is a blocking API. This means that when you execute a query through JDBC, the calling thread is blocked while it waits for a response from the database. Another term for this would be synchronous. A non-blocking or asynchronous API would accept a response asynchronously, freeing the calling thread from actively waiting for it. Reactive slick uses its own driver to accept responses from a database in an asynchronous manner, which means the calling thread can be freed as soon as the query is dispatched to the database.
The difference between the two is this:
Imagine your application has a database connection pool of size 100 and a fixed thread pool of size 10. Then, let's say you wrap all of your JDBC calls in futures. Let's also say that your SqlQuery controller has a method that makes several JDBC calls at the same time. These queries will run in parallel until the thread pool is exhausted, which means you can only run 10 queries at any given moment. While the calling thread would not be blocked by the JDBC calls, the threads executing them would be. With enough queries running in parallel, the thread pool becomes exhausted, and it no longer matters how many connections are in the pool. You could deal with this by making your thread pool larger, or by using a fork join pool that expands as needed, but this could incur performance costs due to the creation of new threads and context switching. After all, your CPU is limited.
Using an asynchronous database driver like reactive slick would not block your limited pool of threads, and you would be able to run as many queries concurrently as you had connections in the pool (100 in this example). Saving threads from being blocked means saving CPU time that would otherwise be spent just waiting for responses, which means you can use it to continue to handle other requests, etc.
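The thread-pool limit on blocking calls wrapped in futures can be sketched as follows. The sketch is in Java rather than Scala and uses no Play or JDBC classes: a Semaphore is a stand-in for the connection pool, Thread.sleep simulates the blocking JDBC call, and the method and class names are made up for the illustration. It measures the highest number of "queries" in flight at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class FuturePoolLimitDemo {
    // Runs the given number of simulated blocking queries as futures and
    // returns the peak number that were executing at the same time.
    static int peakConcurrentQueries(int queries, int threads, int connections) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        Semaphore connectionPool = new Semaphore(connections); // stand-in for a JDBC pool
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < queries; i++) {
                futures.add(CompletableFuture.runAsync(() -> {
                    connectionPool.acquireUninterruptibly(); // borrow a "connection"
                    try {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        blockingCall(20); // the executor thread is stuck here
                    } finally {
                        running.decrementAndGet();
                        connectionPool.release();
                    }
                }, executor));
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return peak.get();
        } finally {
            executor.shutdown();
        }
    }

    // Simulated blocking JDBC call.
    static void blockingCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak with 10 threads, 100 connections: "
            + peakConcurrentQueries(50, 10, 100));
    }
}
```

With 10 threads and 100 connections the peak cannot exceed 10: the futures free the caller, but each blocking call still holds an executor thread, so the smaller of the two pools sets the ceiling.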

Pattern for a singleton application process using the database

I have a backend process that maintains state in a PostgreSQL database, which needs to be visible to the frontend. I want to:
Properly handle the backend being stopped and started. This alone is as simple as clearing out the backend state tables on startup.
Guard against multiple instances of the backend trampling each other. There should only be one backend process, but if I accidentally start a second instance, I want to make sure either the first instance is killed, or the second instance is blocked until the first instance dies.
Solutions I can think of include:
Exploit the fact that my backend process listens on a port. If a second instance of the process tries to start, it will fail with "Address already in use". I just have to make sure it does the listen step before connecting to the database and wiping out state tables.
Open a secondary connection and run the following:
BEGIN;
LOCK TABLE initech.backend_lock IN EXCLUSIVE MODE;
Note: the reason for IN EXCLUSIVE MODE is that LOCK defaults to the AccessExclusive locking mode. This conflicts with the AccessShare lock acquired by pg_dump.
Don't commit. Leave the table locked until the program dies.
What's a good pattern for maintaining a singleton backend process that maintains state in a PostgreSQL database? Ideally, I would acquire a lock for the duration of the connection, but LOCK TABLE cannot be used outside of a transaction.
Background
Consider an application with a "broker" process which talks to the database, and accepts connections from clients. Any time a client connects, the broker process adds an entry for it to the database. This provides two benefits:
The frontend can query the database to see what clients are connected.
When a row changes in another table called initech.objects, and clients need to know about it, I can create a trigger that generates a list of clients to notify of the change, writes it to a table, then uses NOTIFY to wake up the broker process.
Without the table of connected clients, the application has to figure out what clients to notify. In my case, this turned out to be quite messy: store a copy of the initech.objects table in memory, and any time a row changes, dispatch the old row and new row to handlers that check if the row changed and act if it did. To do it efficiently involves creating "indexes" against both the table-stored-in-memory, and handlers interested in row changes. I'm making a poor replica of SQL's indexing and querying capabilities in the broker program. I'd rather move this work to the database.
In summary, I want the broker process to maintain some of its state in the database. It vastly simplifies dispatching configuration changes to clients, but it requires that only one instance of the broker be connected to the database at a time.
This can be done with advisory locks:
http://www.postgresql.org/docs/9.1/interactive/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
I solved this today in a way I thought was concise:
CREATE TYPE mutex as ENUM ('active');
CREATE TABLE singleton (status mutex DEFAULT 'active' NOT NULL UNIQUE);
Then your backend process tries to do this:
insert into singleton values ('active');
And quits or waits if it fails to do so.