How to collect metrics in pg-promise? - pg-promise

pg-promise internally has a connection pool. When a request is processed, it will:
fetch a connection from the connection pool (if no connection is available, it will wait)
call the db through the connection
I need to collect metrics for the time consumed in each step. Is there any way to do that? Can we collect these times at each step and return them to the caller?

There is no timing data related to allocating database connections.
So what you are asking for is not possible.
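What can still be measured from application code is the total duration of a call, which includes both the wait for a pooled connection and the query itself. A minimal sketch, assuming a hypothetical `timed` helper and a stand-in `fakeQuery` in place of a real database call; neither is part of pg-promise:

```javascript
// Collected measurements; in a real service these would go to a metrics backend.
const timings = [];

// Hypothetical helper (not a pg-promise API): times any promise-returning
// operation and records the elapsed wall-clock time, even when it throws.
async function timed(label, operation) {
  const start = process.hrtime.bigint();
  try {
    return await operation();
  } finally {
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    timings.push({ label, elapsedMs });
  }
}

// Stand-in for a database call; with pg-promise this would be something
// like () => db.any(...), which is an assumption about the calling code.
const fakeQuery = () =>
  new Promise((resolve) => setTimeout(() => resolve([{ id: 1 }]), 10));

const queryDone = timed("select-users", fakeQuery).then((rows) => {
  console.log(rows.length, timings[0].label, timings[0].elapsedMs.toFixed(1));
  return rows;
});
```

This only yields one number per call; it cannot split the time into "waiting for a connection" and "running the query", which is the part the answer above says is unavailable.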

Related

How does vertx database connection work under the hood?

I am new to the reactive world and trying to understand how db connections work under the hood with vertx-sql-clients.
I am using io.vertx:vertx-mysql-client, io.vertx:vertx-oracle-client and io.vertx:vertx-pg-client in my project.
As this is a reactive database client, I understand that one thread can handle multiple db-connection objects. I set pool.maxSize to 1 and concurrently executed, 5000 times, a function that gets a connection from the pool, executes a select query in the database and fetches some rows from the DB.
Now my question is: even though I have configured a single db-connection in the pool, it can still handle my 5000 requests concurrently, so how does this work under the hood? Are all the selects run over a single database connection? If so, how can it handle transaction management with a single db-connection?

JPA result streams and underlying JDBC connections

It's pretty much evident what's happening when you call getResultList() on a Query instance - the framework obtains a JDBC connection from the pool and returns it when the list is ready.
What is not clear to me is how JPA handles connections during the getResultStream() calls. Does it wait till I get to the end of the stream and then return the connection to the pool? What if I don't? What if I obtain a Spliterator from the stream and stop iterating somewhere in the middle?
My only guess is that such connections are returned to the pool after a timeout. Which would mean that, depending on the timeout value, I might need a much larger value of open DB connections. If I'm right, how do I configure the timeout value, particularly with Spring JPA?
The answer is it doesn't. You're supposed to wrap the returned stream in a 'try with resources' statement.
It seems that a lot of people don't realize that Streams are actually AutoCloseable and whenever they hold on to resources like JDBC connections or file handles, it's the developer's responsibility to properly close them.

How application server handle multiple requests to save data into table

I have created a web application in jsf and it has a button.
If the button is clicked then it will go to the server side and execute the below function to save the data in a table and I am using mybatis for this.
public void save(A a)
{
    SqlSession session = null;
    try {
        session = SqlConnection.getInstance().openSession();
        TestMapper testmap = session.getMapper(TestMapper.class);
        testmap.insert(a);
        session.commit();
    }
    catch (Exception e) {
        // the exception is silently swallowed here
    }
    finally {
        // openSession() may have failed, leaving session null
        if (session != null) {
            session.close();
        }
    }
}
Now I have deployed this application in an application server, JBoss (WildFly).
As per my understanding, when multiple users try to access the application by hitting the URL, the application server creates a thread for each user request.
For example, if 4 clients make requests then 4 threads will be generated, that is t1, t2, t3 and t4.
If all 4 users hit the save button at the same time, how will the save method be executed? Will t1 access the method and execute the insert statement first, followed by t2, t3 and t4, or will all 4 threads execute the insert method and insert data simultaneously?
To give some context I will first describe two possible approaches to handling requests. In this case the protocol is HTTP, but these approaches do not depend on the protocol used; the important thing is that requests come from the network and that executing them requires some IO (access to the filesystem or a database, or network calls to other systems). Note that the following description has some simplifications.
These two approaches are:
synchronous
asynchronous
In general to process the typical HTTP request that involves DB access at least four IO operations are needed:
request handler needs to read the request data from the client socket
request handler needs to write request to the socket connected to the DB
request handler needs to read response from the DB socket
request handler needs to write the response to the client socket
Let's see how this is done for both cases.
Synchronous
In this approach the server has a pool (think a collection) of threads that are ready to serve a request.
When the request comes in the server borrows a thread from the pool and executes a request handler in that thread.
When the request handler needs to do the IO operation it initiates the IO operation and then waits for its completion. By wait I mean that thread execution is blocked until the IO operation completes and the data (for example response with the results of the SQL query) is available.
In this case concurrency, that is, processing requests for multiple clients simultaneously, is achieved by having some number of threads in the pool. IO operations are much slower than the CPU, so most of the time a thread processing a request is blocked on an IO operation, and the CPU cores can execute stages of request processing for other clients.
Note that because IO operations are slow, the thread pool used for handling HTTP requests is usually fairly large. The documentation for the synchronous request processing subsystem used in WildFly suggests about 10 threads per CPU core as a reasonable value.
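The effect of the pool size on a synchronous server can be illustrated with a toy simulation. This is only a sketch, not WildFly code: it is written in JavaScript for brevity, the function name `simulateBlockingPool` is hypothetical, and it assumes that all requests arrive at time 0 and that each thread is blocked for the full duration of its request.

```javascript
// Toy model of a synchronous server: every request occupies one pool
// thread for its whole duration, because the thread blocks on each IO call.
// Assumption: all requests arrive at time 0 and are served in order.
function simulateBlockingPool(poolSize, requestDurationsMs) {
  // freeAt[i] is the simulated time at which thread i becomes idle again.
  const freeAt = new Array(poolSize).fill(0);
  return requestDurationsMs.map((durationMs) => {
    // Borrow the thread that becomes idle first.
    const i = freeAt.indexOf(Math.min(...freeAt));
    freeAt[i] += durationMs;
    return freeAt[i]; // completion time of this request
  });
}

// Four 30 ms requests: with 2 threads the last two must wait for a free thread.
const twoThreads = simulateBlockingPool(2, [30, 30, 30, 30]);
// With 4 threads all of them run in parallel.
const fourThreads = simulateBlockingPool(4, [30, 30, 30, 30]);
console.log(twoThreads, fourThreads);
```

With two threads the completion times are 30, 30, 60 and 60 ms; with four threads every request completes at 30 ms, which is why a blocking server needs a comparatively large pool.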
Asynchronous
In this case the IO is handled differently. There is a small number of threads handling IO. They all work the same way and I'll describe one of them.
Such a thread runs a loop which basically waits for events, and every time an event happens it calls a handler for that event.
The first such event is a new request. When request processing starts, the request handler is invoked from the loop that is run by one of the IO threads. The first thing the request handler does is try to read the request from the client socket. So the handler initiates the IO operation on the client socket and returns control to the caller. That means that the thread is released and can process another event.
Another event happens when the IO operation that reads from the client socket has some data available. In this case the loop invokes the handler at the point where the handler returned control to the loop after initiating the IO; that is, it is resumed at the next step, which processes the input data (for example parses HTTP parameters) and initiates a new IO operation (in this case a request to the DB socket). And again the handler releases the thread so it can handle other events (like completion of IO operations that are part of processing other clients' requests).
Given that IO operations are slow compared to the speed of CPU itself one thread handling IO can process a lot of requests concurrently.
Note that it is important that the request handler code never uses any blocking operation (like blocking IO), because that would steal the IO thread and prevent other requests from proceeding.
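The loop described above can be sketched as a toy single-threaded event loop with a simulated clock. The names `startIo`, `handleRequest` and `runLoop`, as well as the 5 ms and 20 ms IO durations, are assumptions made for illustration; real servers rely on the operating system to report IO completion.

```javascript
// Toy single-threaded event loop with a simulated clock (in ms).
// Handlers never block: they start an IO operation, register a callback
// for its completion, and return control to the loop.
const pendingEvents = []; // IO completions waiting to be dispatched
const eventLog = [];
let clockMs = 0;

// Hypothetical helper: "starts" an IO operation that completes after durationMs.
function startIo(durationMs, onComplete) {
  pendingEvents.push({ at: clockMs + durationMs, onComplete });
}

function handleRequest(id) {
  eventLog.push(`req${id}: reading from client socket`);
  startIo(5, () => {
    eventLog.push(`req${id}: request parsed, query sent to DB`);
    startIo(20, () => {
      eventLog.push(`req${id}: DB response read, writing to client`);
      startIo(5, () => eventLog.push(`req${id}: done`));
    });
  });
}

function runLoop() {
  while (pendingEvents.length > 0) {
    // Dispatch the earliest completion; Array.prototype.sort is stable,
    // so events completing at the same time keep their arrival order.
    pendingEvents.sort((a, b) => a.at - b.at);
    const event = pendingEvents.shift();
    clockMs = event.at;
    event.onComplete();
  }
}

// Three requests arrive at the same moment and are interleaved by one thread.
[1, 2, 3].forEach(handleRequest);
runLoop();
console.log(eventLog.join("\n"));
console.log(`all requests finished at t=${clockMs} ms`);
```

In this sketch all three requests are finished at simulated time 30 ms, whereas a single blocking thread would need 90 ms for the same work.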
JSF and Mybatis
In case of JSF and mybatis the synchronous approach is used. JSF uses a servlet to handle requests from the UI and servlets are handled by the synchronous processors in WildFly. JDBC which is used by mybatis to communicate to a DB is also using synchronous IO so threads are used to execute requests concurrently.
Congestions
All of the above is written with the assumption that there are no other sources of congestion. By congestion here I mean a limitation on the ability of a certain component of the system to execute things in parallel.
For example, imagine a situation where a database is configured to allow only one client connection at a time (this is not a reasonable configuration and I'm using it only to demonstrate the idea). In this case, even if multiple threads can execute the code of the save method in parallel, all but one will be blocked at the moment when they try to open the connection to the database.
Another similar example is if you are using an sqlite database. It only allows one client to write to the DB at a time. So at the point when thread A tries to execute the insert, it will be blocked if there is another thread B that is already executing an insert. Only after the commit executed by thread B would thread A be able to proceed with the insert. The time A waits depends on the time it takes for B to execute its request and on the number of other threads waiting to do a write operation on the same DB.
In practice, if you are using an RDBMS that scales better (like postgresql, mysql or oracle) you will not hit this problem when using a small number of connections. But it may become a problem when there is a large number of concurrent requests and there is a limit in the DB on the number of client connections, or a connection pool is used to limit the number of connections on the application side. In this case, if there are already many connections to the database, new clients will wait until existing requests are finished and connections are closed.
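The waiting behaviour of an application-side connection limit can be sketched with a toy pool. `LimitedPool` and `runInserts` are hypothetical names, the 5 ms timer stands in for a real insert, and no real database driver is involved.

```javascript
// Toy connection pool: at most maxSize callers hold a connection at once;
// the rest wait in a FIFO queue until a connection is released.
class LimitedPool {
  constructor(maxSize) {
    this.available = maxSize;
    this.waiters = [];
  }
  acquire() {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    // No free connection: park the caller until release() wakes it up.
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next(); // hand the connection straight to the next waiter
    else this.available += 1;
  }
}

// Runs `requests` simulated inserts through the pool and reports the
// highest number that were in flight at the same time.
async function runInserts(maxSize, requests) {
  const pool = new LimitedPool(maxSize);
  let inFlight = 0;
  let maxInFlight = 0;
  const insert = async () => {
    await pool.acquire();
    inFlight += 1;
    maxInFlight = Math.max(maxInFlight, inFlight);
    try {
      await new Promise((resolve) => setTimeout(resolve, 5)); // simulated DB call
    } finally {
      inFlight -= 1;
      pool.release();
    }
  };
  await Promise.all(Array.from({ length: requests }, insert));
  return maxInFlight;
}

runInserts(2, 10).then((max) => console.log(`max concurrent inserts: ${max}`));
```

However many requests are started, the number of inserts running at once never exceeds the pool size; the remaining callers simply wait their turn.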

How do I place a read lock on MongoDB?

My application needs to access a Mongo db where if more than one process/thread is reading from a specific collection, bad things will happen.
I need to restrict the ability of a group of processes to read from the collection (or db, if need be). So for example, if there are multiple processes trying to read from the db, they read sequentially, not in parallel.
This could be done at the driver level. If you set the connection pool size to 1, then all access to the database will be in sequence.
In nodejs you can set the driver as:
MongoClient.connect(url, {
    poolSize: 1
});
From the documentation:
poolSize, this allows you to control how many tcp connections are
opened in parallel. The default value for this is 5 but you can set it
as high as you want. The driver will use a round-robin strategy to
dispatch and read from the tcp connection.

Haskell database connections

Please look at this scotty app (it's taken directly from this old answer from 2014):
{-# LANGUAGE OverloadedStrings #-}
import Web.Scotty
import Database.MongoDB
import qualified Data.Text.Lazy as T
import Control.Monad.IO.Class

runQuery :: Pipe -> Query -> IO [Document]
runQuery pipe query = access pipe master "nutrition" (find query >>= rest)

main :: IO ()
main = do
  pipe <- connect $ host "127.0.0.1"
  scotty 3000 $ do
    get "/" $ do
      res <- liftIO $ runQuery pipe (select [] "stock_foods")
      text $ T.pack $ show res
You see how the database connection (pipe) is created only once when the web app launches. Subsequently, thousands if not millions of visitors will hit the "/" route simultaneously and read from the database using the same connection (pipe).
I have questions about how to properly use Database.MongoDB:
Is this the proper way of setting things up? As opposed to creating a database connection for every visit to "/". In this latter case, we could have millions of connections at once. Is that discouraged? What are the advantages and drawbacks of such an approach?
In the app above, what happens if the database connection is lost for some reason and needs to be created again? How would you recover from that?
What about authentication with the auth function? Should the auth function only be called once after creating the pipe, or should it be called on every hit to "/"?
Some say that I'm supposed to use a pool (Data.Pool). It looks like that would only help limit the number of visitors using the same database connection simultaneously. But why would I want to do that? Doesn't the MongoDB connection have a built-in support for simultaneous usages?
Even if you create a connection per client you won't be able to create too many of them. You will hit the ulimit, and the client that hits it will get a runtime error.
The reason it doesn't make sense is that the mongodb server will be spending too much time polling all those connections, and it will only have as many meaningful workers as your db server has CPUs.
One connection is not a bad idea, because mongodb is designed to send several requests and wait for responses. So, it will utilize as much resources as your mongodb can have with only one limitation - you have only one pipe for writing, and if it closes accidentally you will need to recreate this pipe yourself.
So, it makes more sense to have a pool of connections. It doesn't need to be big. I had an app which authenticates users and gives them tokens. With 2500 concurrent users per second it only had 3-4 concurrent connections to the database.
Here are the benefits connection pool gives you:
If you hit the pool connection limit you will wait for the next available connection and will not get a runtime error. So your app will wait a little bit instead of rejecting your client.
The pool will recreate connections for you. You can configure the pool to close excess connections and to create more, up to a certain limit, as you need them. If your connection breaks while you read from it or write to it, you just take another connection from the pool. If you don't return the broken connection to the pool, the pool will create another connection for you.
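The second benefit can be sketched with a toy pool that never takes a broken connection back. The sketch is in JavaScript rather than Haskell, `ReplenishingPool` and `createConnection` are hypothetical names, and it deliberately leaves out size limits and idle timeouts.

```javascript
// Toy pool that replaces broken connections. `createConnection` is a
// hypothetical factory supplied by the caller; nothing here is a real driver API.
class ReplenishingPool {
  constructor(createConnection) {
    this.createConnection = createConnection;
    this.idle = [];
  }
  withConnection(useConnection) {
    // Reuse an idle connection if there is one, otherwise open a new one.
    const conn = this.idle.pop() ?? this.createConnection();
    const result = useConnection(conn); // if this throws, conn is not returned
    this.idle.push(conn); // only healthy connections go back to the pool
    return result;
  }
}

let opened = 0;
const pool = new ReplenishingPool(() => ({ id: ++opened }));

const first = pool.withConnection((conn) => conn.id); // opens connection 1
let failed = false;
try {
  pool.withConnection(() => {
    throw new Error("connection reset"); // simulated IO error on connection 1
  });
} catch (err) {
  failed = true; // the caller sees the error, the broken connection is dropped
}
const retry = pool.withConnection((conn) => conn.id); // pool opens connection 2
console.log({ first, failed, retry, opened });
```

The caller still sees the IO error once, but the retry transparently gets a fresh connection instead of the broken one.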
If the database connection is closed then the mongodb listener on this connection will exit, printing an error message on your terminal, and your app will receive an IO error. In order to handle this error you will need to create another connection and try again. When it comes to handling this situation you will find that it's easier to use a db pool, because eventually your solution will resemble a connection pool very much.
I do auth once as part of opening a connection. If you need to auth another user later you can always do it.
Yes, mongodb handles simultaneous usage, but like I said it gives you only one pipe to write to, and that soon becomes a bottleneck. If you create at least as many connections as your mongodb server can afford threads for handling them (the CPU count), then they will be going at full speed.
If I missed something feel free to ask for clarifications.
Thank you for your question.
What you really want is a database connection pool. Take a look at the code from this other answer.
Instead of auth, you can use withMongoDBPool if your MongoDB server is in secure mode.
Is this the proper way of setting things up? As opposed to creating a database connection for every visit to "/". In this latter case, we could have millions of connections at once. Is that discouraged? What are the advantages and drawbacks of such an approach?
You do not want to open one connection and then use it. The HTTP server you are using, which underpins Scotty, is called Warp. Warp has a multi-core, multi-green-thread design. You are allowed to share the same connection across all threads, since Database.MongoDB says outright that connections are thread-safe, but what will happen is that when one thread is blocked waiting for a response (the MongoDB protocol follows a simple request-response design) all threads in your web service will block. This is unfortunate.
We can instead create a connection on every request. This trivially solves the problem of one thread blocking another but leads to its own share of problems. The overhead of setting up a TCP connection, while not substantial, is also not zero. Recall that every time we want to open or close a socket we have to jump from user space to the kernel, wait for the kernel to update its internal data structures, and then jump back (a context switch). We also have to deal with the TCP handshake and goodbyes. We would also, under high load, run out of file descriptors or memory.
It would be nice if we had a solution somewhere in between. The solution should be
Thread-safe
Let us max-bound the number of connections so we don't exhaust the finite resources of the operating system
Quick
Share connections across threads under normal load
Create new connections as we experience increased load
Allow us to clean up resources (like closing a handle) as connections are deleted under reduced load
Hopefully already written and battle-tested by other production systems
It is exactly this problem that resource-pool tackles.
Some say that I'm supposed to use a pool (Data.Pool). It looks like that would only help limit the number of visitors using the same database connection simultaneously. But why would I want to do that? Doesn't the MongoDB connection have a built-in support for simultaneous usages?
It is unclear what you mean by simultaneous usages. There is one interpretation I can guess at: you mean something like HTTP/1.1 pipelining (which HTTP/2 replaced with multiplexing), where this is built into the protocol.
[Image: standard picture of pipelining, http://research.worksap.com/wp-content/uploads/2015/08/pipeline.png]
Above we see the client making multiple requests to the server, without waiting for a response, and then the client can receive responses back in some order. (Time flows from the top to the bottom.) This MongoDB does not have. This is a fairly complicated protocol design that is not that much better than just asking your clients to use connection pools. And MongoDB is not alone here: the simple request-and-response design is something that Postgres, MySQL, SQL Server, and most other databases have settled on.
And: it is true that a connection pool limits the load you can take as a web service before all threads are blocked and your user just sees a loading bar. But this problem would exist in any of the three scenarios (connection pooling, one shared connection, one connection per request)! The computer has finite resources, and at some point something will collapse under sufficient load. Connection pooling's advantage is that it scales gracefully right up until the point it cannot. The correct solution to handling more traffic is to increase the number of computers; we should not avoid pooling simply due to this problem.
In the app above, what happens if the database connection is lost for some reason and needs to be created again? How would you recover from that?
I believe these kinds of what-ifs are outside the scope of Stack Overflow and deserve no better answer than "try it and see." Buuuuuuut given that the server terminates the connection, I can take a stab at what might happen: assuming Warp forks a green thread for each request (which I think it does), each thread will experience an unchecked IOException as it tries to write to the closed TCP connection. Warp would catch this exception and serve it as an HTTP 500, hopefully writing something useful to the logs also. Assuming a single-connection model like you have now, you could either do something clever (but high in lines of code) where you "reboot" your main function and set up a second connection, or do what I do for hobby projects: should anything odd occur, like a dropped connection, I ask my supervisor process (like systemd) to watch the logs and restart the web service. Though clearly not a great solution for a production, money-makin' website, it works well enough for small apps.
What about authentication with the auth function? Should the auth function only be called once after creating the pipe, or should it be called on every hit to "/"?
It should be called once after creating the connection. MongoDB authentication is per-connection. You can see an example here of how the db.auth() command mutates the MongoDB server's data structures corresponding to the current client connection.