How can I speed up PostgreSQL connections from pgbouncer? (getting a connection constitutes ~80% of latency) - postgresql

I have a basic pgbouncer configuration set up on an Amazon EC2 instance.
My client code (an AWS Lambda function, or a localhost webserver when developing) is making SQL queries to my database through the pgbouncer.
Currently, each query is taking 150-200ms to execute, with about 80% of that being the time it takes to get the connection.
Here's how I'm getting a connection:
long start = System.currentTimeMillis();
Connection conn = DriverManager.getConnection(this.url, this.username, this.password);
log.info("Got connection in " + (System.currentTimeMillis() - start) + "ms");
this.url is simply the location of the pgbouncer instance. Here's what the measured latency looks like, where Got connection is from the above code snippet and Executed in is another timing that measures the elapsed duration after a PreparedStatement has been executed. The first connection is usually a bit slow, which is fine; subsequent ones take around 100ms pretty consistently.
DBManager - Got connection in 190ms
DBManager - Executed in 232ms
DBManager - Got connection in 108ms
DBManager - Executed in 132ms
DBManager - Got connection in 108ms
DBManager - Executed in 128ms
Is there any way to make this faster? Or am I basically stuck with a minimum ~100ms latency on my requests? I get similar speeds from Lambda and localhost, and unfortunately I can't throw my Lambda into the same VPC because of the occasional 8-10 second cold start delay from setting up a new Elastic Network Interface when using a Lambda in a VPC.
This is my first time working with this kind of setup so I don't really know where to start. Could I squeeze out higher speed by adding more power (RAM/CPU) to the database or pgbouncer? Should I not get a new connection for every request (but this would mean having a connection pool per Lambda and then a separate pgbouncer pool)?
I feel like this is surely a pretty common problem so there must be some good ways of solving it, but I haven't been able to find anything.

You'd have to ask the vendor to figure out what part of the time is spent on the route between you and pgBouncer, and what part between pgBouncer and the database server. I'd guess it is the first part.
If you want low latency, a hosted database might not be perfect for you.
My suggestion would be to build a connection pool into your application or run pgBouncer locally, so that you don't have to establish connections to the hosted systems all the time.
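As an illustration only (HikariCP is my assumption here, any JDBC pool would do, and PooledDBManager is a hypothetical stand-in for the question's DBManager class), a client-side pool could look like this:
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch: build the pool once at startup and reuse it for every request.
public class PooledDBManager {
    private final HikariDataSource dataSource;

    public PooledDBManager(String url, String username, String password) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(url);              // the same pgBouncer URL as in the question
        config.setUsername(username);
        config.setPassword(password);
        config.setMaximumPoolSize(5);        // small client-side pool; pgBouncer still pools on the server side
        this.dataSource = new HikariDataSource(config);
    }

    public Connection getConnection() throws SQLException {
        // Borrowing an already-open connection from the pool is typically sub-millisecond.
        return dataSource.getConnection();
    }
}
Note that in a Lambda the pool lives per container, so it only helps warm invocations; the network round trip to the EC2 instance still applies when a new connection has to be opened.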

Related

Psycopg2 idle session timeout with ThreadedConnectionPool

I am trying to set up ThreadedConnectionPool in my AWS Lambda; Postgres 14 is being used. The lambda might die abruptly and I want to make sure that the Postgres server closes the connection after, for example, 1 minute of idle time.
The documentation for the idle_session_timeout parameter states the following:
Be wary of enforcing this timeout on connections made through connection-pooling software or other middleware, as such a layer may not react well to unexpected connection closure. It may be helpful to enable this timeout only for interactive sessions, perhaps by applying it only to particular users.
Is PgBouncer the right answer here? Or is it safe to apply this setting in my case? Or is there a better approach? What I want to make sure is that the server does its own cleanup of connections created by the lambda ThreadedConnectionPool if it so happens the lambda died.
Are you explicitly closing the connection when you are done with it? If not and you just let the connection go out of scope, maybe the garbage collection system is just not very aggressive about cleaning it up.
Pgbouncer could be helpful for this, but it would have to be run in transaction pooling mode (because the default session pooling mode isn't very useful when the sessions don't get closed promptly), and that does impose some restrictions on what you can do, like prepared transactions.
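For reference, a minimal pgbouncer.ini excerpt with transaction pooling enabled might look like this (database name, paths, and ports are placeholders):
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; rather than the default "session"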
Or, if you created a database user for your lambdas to use, then you could apply the idle timeout only to that user, and so prevent it from killing administrator, monitoring, or developer connections. But combining the pooler and the timeout is probably neither needed nor advisable.
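For example (assuming a dedicated role named lambda_user, which is hypothetical here), the timeout can be scoped to that role alone:
ALTER ROLE lambda_user SET idle_session_timeout = '1min';
New sessions opened by that role then inherit the setting, while all other roles keep the server default of 0 (disabled).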

EF core request cannot wake-up Azure Sql (serverless sku) database and times out

I'm using EF Core with one of my apps to query an Azure SQL database. It's the serverless SKU, which scales down to zero (goes to sleep) after 1 hour of inactivity.
Now, in that app there is a scheduled function that queries the database at certain points in time. This often happens when the DB is sleeping. To compensate for this, I'm using the following in DbContext.cs:
optionsBuilder.UseSqlServer(connection, opt => opt.EnableRetryOnFailure(
    maxRetryCount: 20,
    maxRetryDelay: TimeSpan.FromSeconds(30),
    errorNumbersToAdd: null
));
If the delay is evenly distributed, that gives an average of about 15 s per retry; with 20 retries, that means a timeout after roughly 5 minutes.
I thought this should be plenty, since when querying a sleeping database with SSMS it usually takes well under 1 minute to get going. However, this is not the case: the functions regularly time out and the queries fail.
Is there a better way to deal with this than just even more increasing the timeout? Should 5mins really not be enough?
Cheers
I think I got it working now. The above code snippet from EF Core is relevant to command timeout occurrences. However, since the database was sleeping during the request, it was really a connection timeout issue. I fixed this by adding Connect Timeout=120 to the connection string itself.
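For reference, the resulting connection string looks roughly like this (server, database, and credentials are placeholders):
Server=tcp:myserver.database.windows.net,1433;Database=mydb;User ID=myuser;Password=mypassword;Encrypt=True;Connect Timeout=120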

libpq Postgres PQexecParams 2 hours timeout

I am using libpq v9.6.8 for my application (running 24/7), which inserts data into the postgres database. I also run PQexecParams to get the table columns. But randomly (sometimes just once a week, sometimes twice in a weekend) this blocking PQexecParams call only returns after about 2 hours. Within these two hours my application just hangs... The inserts are done via async PQsendQueryParams.
Is there a way to configure the timeout for PQexecParams (I cannot find any appropriate timeout settings in the lib; maybe on the postgres server)? Is there a better way to perform the select synchronously?
Thank you in advance
The two hours suggest TCP keepalive kicking in and determining that the connection has gone bad.
You can set the keepalives_idle connection parameter so that the timeout happens earlier and you are not stalled for two hours.
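These are ordinary libpq connection parameters, so as a sketch (host, database, and the values chosen are placeholders) the connection string could look like:
host=db.example.com dbname=mydb user=app keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=3
With those values a broken connection is detected after roughly 60 s of idleness plus 3 probes at 10 s intervals, instead of the kernel default of about two hours.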
But you probably also want to know what aborts the network connection. Your first look should be at the PostgreSQL server log; you should see an error message that matches the one on the client side. Probably a network component is at fault – look for firewalls in particular.

Haskell database connections

Please look at this scotty app (it's taken directly from this old answer from 2014):
{-# LANGUAGE OverloadedStrings #-}
import Web.Scotty
import Database.MongoDB
import qualified Data.Text.Lazy as T
import Control.Monad.IO.Class

runQuery :: Pipe -> Query -> IO [Document]
runQuery pipe query = access pipe master "nutrition" (find query >>= rest)

main = do
  pipe <- connect $ host "127.0.0.1"
  scotty 3000 $ do
    get "/" $ do
      res <- liftIO $ runQuery pipe (select [] "stock_foods")
      text $ T.pack $ show res
You see how the database connection (pipe) is created only once when the web app launches. Subsequently, thousands if not millions of visitors will hit the "/" route simultaneously and read from the database using the same connection (pipe).
I have questions about how to properly use Database.MongoDB:
Is this the proper way of setting things up? As opposed to creating a database connection for every visit to "/". In this latter case, we could have millions of connections at once. Is that discouraged? What are the advantages and drawbacks of such an approach?
In the app above, what happens if the database connection is lost for some reason and needs to be created again? How would you recover from that?
What about authentication with the auth function? Should the auth function only be called once after creating the pipe, or should it be called on every hit to "/"?
Some say that I'm supposed to use a pool (Data.Pool). It looks like that would only help limit the number of visitors using the same database connection simultaneously. But why would I want to do that? Doesn't the MongoDB connection have a built-in support for simultaneous usages?
Even if you create a connection per client you won't be able to create too many of them: you will hit the ulimit, and once you hit it, the client that hit it will get a runtime error.
The reason it doesn't make sense is that the mongodb server would spend too much time polling all those connections, and it can only have as many meaningful workers as your db server has CPUs.
One connection is not a bad idea, because mongodb is designed to send several requests and wait for responses. So it will utilize as many resources as your mongodb has, with only one limitation: you have only one pipe for writing, and if it closes accidentally you will need to recreate this pipe yourself.
So, it makes more sense to have a pool of connections. It doesn't need to be big. I had an app which authenticates users and gives them tokens. With 2500 concurrent users per second it only had 3-4 concurrent connections to the database.
Here are the benefits a connection pool gives you:
If you hit the pool's connection limit you will wait for the next available connection instead of getting a runtime error. So your app will wait a little bit instead of rejecting your client.
The pool will recreate connections for you. You can configure the pool to close excess connections and to create more, up to a certain limit, as you need them. If your connection breaks while you read from it or write to it, you just take another connection from the pool. If you don't return the broken connection to the pool, the pool will create another one for you.
If the database connection is closed, then the mongodb listener on this connection will exit, printing an error message on your terminal, and your app will receive an IO error. In order to handle this error you will need to create another connection and try again. When it comes to handling this situation you will find that it's easier to use a db pool, because eventually your solution will resemble a connection pool very much.
I do auth once as part of opening a connection. If you need to auth another user later you can always do it.
Yes, mongodb handles simultaneous usage, but like I said it gives you only one pipe to write to, and it soon becomes a bottleneck. If you create at least as many connections as your mongodb server can afford threads to handle them (the CPU count), then they will be going at full speed.
If I missed something feel free to ask for clarifications.
Thank you for your question.
What you really want is a database connection pool. Take a look at the code from this other answer.
Instead of auth, you can use withMongoDBPool if your MongoDB server is in secure mode.
Is this the proper way of setting things up? As opposed to creating a database connection for every visit to "/". In this latter case, we could have millions of connections at once. Is that discouraged? What are the advantages and drawbacks of such an approach?
You do not want to open one connection and then use it. The HTTP server you are using, which underpins Scotty, is called Warp. Warp has a multi-core, multi-green-thread design. You are allowed to share the same connection across all threads, since Database.MongoDB says outright that connections are thread-safe, but what will happen is that when one thread is blocked waiting for a response (the MongoDB protocol follows a simple request-response design) all threads in your web service will block. This is unfortunate.
We can instead create a connection on every request. This trivially solves the problem of one thread's blocking another but leads to its own share of problems. The overhead of setting up a TCP connection, while not substantial, is also not zero. Recall that every time we want to open or close a socket we have to jump from the user to the kernel, wait for the kernel to update its internal data structures, and then jump back (a context switch). We also have to deal with the TCP handshake and goodbyes. We would also, under high load, run out of file descriptors or memory.
It would be nice if we had a solution somewhere in between. The solution should be:
Thread-safe
Let us max-bound the number of connections so we don't exhaust the finite resources of the operating system
Quick
Share connections across threads under normal load
Create new connections as we experience increased load
Allow us to clean up resources (like closing a handle) as connections are deleted under reduced load
Hopefully already written and battle-tested by other production systems
It is exactly this problem that resource-pool tackles.
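For illustration, here is a rough sketch (my own, untested, using the older createPool interface from Data.Pool) of wiring resource-pool into the scotty app above:
{-# LANGUAGE OverloadedStrings #-}
import Web.Scotty
import Database.MongoDB
import qualified Data.Text.Lazy as T
import Control.Monad.IO.Class
import Data.Pool (createPool, withResource)

main :: IO ()
main = do
  -- at most 5 pipes, kept around for 300 s of idleness, in a single stripe
  pool <- createPool (connect $ host "127.0.0.1") close 1 300 5
  scotty 3000 $ do
    get "/" $ do
      res <- liftIO $ withResource pool $ \pipe ->
        access pipe master "nutrition" (find (select [] "stock_foods") >>= rest)
      text $ T.pack $ show res
Each request checks a pipe out of the pool for the duration of the query and returns it afterwards, so one slow query no longer blocks every other request.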
Some say that I'm supposed to use a pool (Data.Pool). It looks like that would only help limit the number of visitors using the same database connection simultaneously. But why would I want to do that? Doesn't the MongoDB connection have a built-in support for simultaneous usages?
It is unclear what you mean by simultaneous usages. There is one interpretation I can guess at: you mean something like HTTP/2, which has pipelining built into the protocol.
[Image: standard picture of pipelining, from http://research.worksap.com/wp-content/uploads/2015/08/pipeline.png]
Above we see the client making multiple requests to the server, without waiting for a response, and then the client can receive responses back in some order. (Time flows from the top to the bottom.) This MongoDB does not have. This is a fairly complicated protocol design that is not that much better than just asking your clients to use connection pools. And MongoDB is not alone here: the simple request-and-response design is something that Postgres, MySQL, SQL Server, and most other databases have settled on.
And: it is true that a connection pool limits the load you can take as a web service before all threads are blocked and your user just sees a loading bar. But this problem would exist in any of the three scenarios (connection pooling, one shared connection, one connection per request)! The computer has finite resources, and at some point something will collapse under sufficient load. Connection pooling's advantage is that it scales gracefully right up until the point it cannot. The correct solution to handling more traffic is to increase the number of computers; we should not avoid pooling simply due to this problem.
In the app above, what happens if the database connection is lost for some reason and needs to be created again? How would you recover from that?
I believe these kinds of what-ifs are outside the scope of Stack Overflow and deserve no better answer than "try it and see." Buuuuuuut given that the server terminates the connection, I can take a stab at what might happen: assuming Warp forks a green thread for each request (which I think it does), each thread will experience an unchecked IOException as it tries to write to the closed TCP connection. Warp would catch this exception and serve it as an HTTP 500, hopefully writing something useful to the logs also. Assuming a single-connection model like you have now, you could do something clever (but high in lines of code) where you "reboot" your main function and set up a second connection, or do what I do for hobby projects: should anything odd occur, like a dropped connection, I ask my supervisor process (like systemd) to watch the logs and restart the web service. Though clearly not a great solution for a production, money-makin' website, it works well enough for small apps.
What about authentication with the auth function? Should the auth function only be called once after creating the pipe, or should it be called on every hit to "/"?
It should be called once after creating the connection. MongoDB authentication is per-connection. You can see an example here of how the db.auth() command mutates the MongoDB server's data structures corresponding to the current client connection.
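As a sketch of what that looks like in the Haskell driver (credentials and database name are placeholders of mine):
{-# LANGUAGE OverloadedStrings #-}
import Database.MongoDB

main :: IO ()
main = do
  pipe <- connect $ host "127.0.0.1"
  -- authenticate once, right after opening the connection
  ok <- access pipe master "admin" (auth "myuser" "mypass")
  print ok
  -- ... reuse the same authenticated pipe for all later queries ...
  close pipe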

Connection to postgres closed with 'This socket is closed' error message

I'm migrating to node.js 0.6.12 and now get the following error message when using the pg module (version 0.6.14):
Error: This socket is closed.
at Socket._write (net.js:453:28)
at Socket.write (net.js:446:15)
at [object Object]._send (/home/luc/node_modules/pg/lib/connection.js:102:24)
at [object Object].flush (/home/luc/node_modules/pg/lib/connection.js:192:8)
at [object Object].getRows (/home/luc/node_modules/pg/lib/query.js:112:14)
at [object Object].prepare (/home/luc/node_modules/pg/lib/query.js:150:8)
at [object Object].submit (/home/luc/node_modules/pg/lib/query.js:97:10)
at [object Object]._pulseQueryQueue (/home/luc/node_modules/pg/lib/client.js:166:24)
at [object Object].query (/home/luc/node_modules/pg/lib/client.js:193:8)
at /home/luc/test/routes/user.js:23:29
The line indicated in my code is:
var get_obj = client.query("SELECT id FROM users WHERE name = $1", [name]);
This used to work fine with node 0.4.8 and pg 0.5.0 but does not work anymore now that I'm testing the migration.
I saw several errors like this one on the net but no answer.
UPDATE
This seems to be linked to the way I handle my postgres connection. Today I create a single connection when running the app. I think creating a new connection on each request would be better. Is the best solution to have the connection created in an express middleware?
Normally, frameworks and middleware keep the connection open (or: a pool of connections). The problem most probably lies in your node.js code (or usage). BTW: if you have access to the postgres logfiles, you can probably see explicit disconnections from node.js (log_connections and log_disconnections should both be set to true to see this).
Connect+disconnect is considered an expensive operation (TCP traffic, authorisation, forking a worker process (for postgres), session setup, logging, (accounting?)). But if it works for you (or you have only one request+reply per session) it's okay.
Cost /resource usage estimates:
For the session setup:
TCP/IP connection setup: 2*2 IP packets := 4*round-trip delay
login /password:
2*2 TCP readwrites := 4 * round-trip delays
4 system R/W calls
a few database queries / lookups for user authorisation, (say 10...100 disk reads; mostly cached)
session construction := fork (for postgres) + lots of COW pages being cloned (? 100-1000 pagefaults?)
session initialisation := a few round trips
for the query:
send+ receive query := a few TCP/IP round-trips
parse := a few (1...100) catalog lookups (mostly from disk cache)
execute := xxx disk reads (possibly from cache)
fetch and store results := allocate (dirty) buffers
send results := xxx TCP round-trips
discard result-buffers := (almost for free!)
Session teardown:
3*2 IP roundtrips
exit() of the child process, wait() for the parent process (Sorry, I think in unix-terms ;-)
1 socket-descriptor in TIME_WAIT state for a few seconds / minutes
As you can see, the amount of resources spent on connection build-up is 10, maybe 100 times as big as what a typical query+result will cost; if you have more than one query to execute it will be wise to keep the connection open (or maintain a pool of open connections).
For simplicity, I ignored CPU consumption and mostly ignored memory/buffer usage. Nowadays, CPU almost seems free; the amount of calculation that can be done while waiting for a disk (10 ms) or network (x ms) is incredible: several (100...10K?) ticks per byte.
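For what it's worth, a minimal sketch of the pooled approach with a current version of the pg module (the Pool API did not exist in pg 0.6.x, so this assumes a newer version, and the connection details, table, and the existing Express app object are placeholders):
// create one pool when the app starts and reuse it everywhere
import { Pool } from 'pg';

const pool = new Pool({
  host: 'localhost',         // placeholder connection details
  database: 'mydb',
  user: 'myuser',
  password: 'secret',
  max: 10,                   // keep at most 10 connections open
  idleTimeoutMillis: 30000   // close idle connections after 30 s
});

// per request: the pool checks out a connection, runs the query, and returns it
app.get('/user/:name', async (req, res) => {
  const result = await pool.query('SELECT id FROM users WHERE name = $1', [req.params.name]);
  res.json(result.rows);
});
This way the TCP/startup cost from the estimates above is paid only when the pool grows, not on every request.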