Just looking for an explanation of rationale for this bit of code (PoolUtiltites:293 in version 2.2.4):
dataSource.setLoginTimeout((int) TimeUnit.MILLISECONDS.toSeconds(Math.min(1000L, connectionTimeout)));
This code and the setConnectionTimeout method means that I get this behaviour:
connectionTimeout == 0, then loginTimeout = Integer.MAX_VALUE
connectionTimeout > 0 && < 100, then HikariConfig throws IllegalArgumentException
connectionTimeout >= 100 && <= 1000, then loginTimeout = connectionTimeout
connectionTeimout > 1000, then loginTimeout = 1000
That looks really weird to me!
It's almost like the Math.min should be Math.max ???
In my current project I'd like to fail connections after 30s, which is impossible in the current setup.
I'm using the 4.1 postgres jdbc driver, but I think this is not relevant to the issue above.
Many thanks - and cool pooling library!!!
Ok, there are a couple of moving parts here. First, Math.min() is a bug, it should be Math.max(). In light of that (it will be fixed) consider the following:
It is important to note that connections are created asynchronously in the pool. The setConnectionTimeout() sets the maximum time (in milliseconds) that a call to getConnection() will wait for a connection before timing out.
The DataSource loginTimeout is the maximum time that physical connection initiation to the database can take before timing out. Because HikariCP obtains connections asynchronously, if the connection attempt fails, HikariCP will continue to retry, but your calls to getConnection() will timeout appropriately. We are using the connectionTimeout in kind of a double duty for loginTimeout.
For example, lets say the pool is completely empty, and you have configured a connectionTimeout of 30 seconds. When you call getConnection() HikariCP, realizing that there are no idle connections available, starts trying to obtain a new one. There is little point in having a loginTimeout exceeding 30 seconds, in this case.
The intent of the Math.max() call is to ensure that we never set loginTimeout to 0 if the user has configured connectionTimeout to 250ms. TimeUnit.MILLESECONDS.toSeconds() would return 0 without the Math.max(). If the user has configured a connectionTimeout of 0, meaning they never want to timeout, the time conversion of Integer.MAX_VALUE results in several thousand years as a timeout (virtually never).
Having said that, and in light of how HikariCP connections to the database are obtained asynchronously, even without the Math.max() fix, you should be able to achieve application-level connection timeouts of 30s. Unless physical connections to your database exceed 1000ms you would be unaffected by the Math.min().
We are putting out a 2.2.5-rc3 release candidate in the next few hours. I will slot this fix in.
Related
I am using Atomkios XA configuration for Oracle.
This is my code for creating datasource connection.
OracleXADataSource oracleXADataSource = new OracleXADataSource();
oracleXADataSource.setURL(sourceURL);
oracleXADataSource.setUser(UN);
oracleXADataSource.setPassword(PS);
AtomikosDataSourceBean sourceBean= new AtomikosDataSourceBean();
sourceBean.setXaDataSource(oracleXADataSource);
sourceBean.setUniqueResourceName(resourceName);
sourceBean.setMaxPoolSize(max-pool-size); // 10
atomikos:
datasource:
resourceName: insight
max-pool-size: 10
min-pool-size: 3
transaction-manager-id: insight-services-tm
This is my configuration is fine for medium user load around 5000 requests.
But when user count increases assume more than 10000 requests, this class com.atomikos.jdbc.AbstractDataSourceBean:getConnection consuming more then than normal.
This class time taking approximately 1500ms but normal time it takes less then 10ms. I can understand user demand increases, getConnection is going to wait state to pick the free connection from the connection pool so if I increase my max-pool-size, will my problem sort out or any other option feature available to sort out my problem.
Try setting concurrentConnectionValidation=true on your datasource.
If that does not help then consider a free trial of the commercial Atomikos product:
https://www.atomikos.com/Main/ExtremeTransactionsFreeTrial
Best
I have a basic pgbouncer configuration set up on an Amazon EC2 instance.
My client code (an AWS Lambda function, or a localhost webserver when developing) is making SQL queries to my database through the pgbouncer.
Currently, each query is taking 150-200ms to execute, with about 80% of that being the time it takes to get the connection.
Here's how I'm getting a connection:
long start = System.currentTimeMillis();
Connection conn = DriverManager.getConnection(this.url, this.username, this.password);
log.info("Got connection in " + (System.currentTimeMillis() - start) + "ms")
this.url is simply the location of the pgbouncer instance. Here's what the measured latency looks like, where Got connection is from the above code snippet and Executed in is another timing that measures the elapsed duration after a PreparedStatement has been executed. The first connection is usually a bit slow which is fine, subsequent ones take around 100ms pretty consistently.
DBManager - Got connection in 190ms
DBManager - Executed in 232ms
DBManager - Got connection in 108ms
DBManager - Executed in 132ms
DBManager - Got connection in 108ms
DBManager - Executed in 128ms
Is there any way to make this faster? Or am I basically stuck with a minimum ~100ms latency on my requests? I get similar speeds from Lambda and localhost, and unfortunately I can't throw my Lambda into the same VPC because of the occasional 8-10 second cold start delay from setting up a new Elastic Network Interface when using a Lambda in a VPC.
This is my first time working with this kind of setup so I don't really know where to start. Could I squeeze out higher speed by adding more power (RAM/CPU) to the database or pgbouncer? Should I not get a new connection for every request (but this would mean having a connection pool per Lambda and then a separate pgbouncer pool)?
I feel like this is surely a pretty common problem so there must be some good ways of solving it, but I haven't been able to find anything.
You'd have to ask the vendor to figure out what part of the time is spent on the route between you and pgBouncer and pgBouncer and the database server. I'd guess it is the first part.
If you want low latency, a hosted database might not be perfect for you.
My suggestion would be to build a connection pool into your application or have pgBouncer locally, so that you don't have to estabish connections to the hosted systems all the time.
According to this response you can set a time limit for a query operation via find() parameter or a collection method:
cursor = db.collection.find(max_time_ms=1)
or
cursor = db.collection.find().max_time_ms(1)
The doc says:
max_time_ms (optional): Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. Pass this as an alternative to calling max_time_ms() on the cursor.
We're currently experiencing a problem that a query runs for ~30 minutes before it eats all the RAM and the server dies. I hope this parameter gives a hard limit on the query and after the given time the server gives up.
Since our app is full of finds and cursors: is there a way how to set this parameter directly in the MongoClient constructor?
The doc says:
socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. Defaults to None (no timeout).
connectTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait during server monitoring when connecting a new socket to a server before concluding the server is unavailable. Defaults to 20000 (20 seconds).
serverSelectionTimeoutMS: (integer) Controls how long (in milliseconds) the driver will wait to find an available, appropriate server to carry out a database operation; while it is waiting, multiple server monitoring operations may be carried out, each controlled by connectTimeoutMS. Defaults to 30000 (30 seconds).
...couldn't find another timeout and none of these seem to be the equivalent of max_time_ms. Am I missing something?
I'm trying to build a web application that should be able to handle at least 15000 rps. Some of the optimizations I have done is increase the worker pool size to 20 and set an accept back log to 25000. Since I have set my worker pool size to 20; wil this help with the the blocking piece of code?
A worker pool size of 20 seems to be the default.
I believe the important question in your case is how long do you expect each request to run. On my side, I expect to have thousands of short-lived requests, each with a payload size of about 5-10KB. All of these will be blocking, because of a blocking database driver I use at the moment. I have increased the default worker pool size to 40 and I have explicitly set my deploy vertical instances using the following formulae:
final int instances = Math.min(Math.max(Runtime.getRuntime().availableProcessors() / 2, 1), 2);
A test run of 500 simultaneous clients running for 60 seconds, on a vert.x server doing nothing but blocking calls, produced an average of 6 failed requests out of 11089. My test payload in this case was ~28KB.
Of course, from experience I know that running my software in production would often produce results that I have not anticipated. Thus, the important thing in my case is to have good atomicity rules in place, so that I don't get half-baked or corrupted data in the database.
I am using libpq v9.6.8 for my Application (running 24/7), which inserts data into the postgres database. I also run PQexecParams to get the table columns. But randomly (sometimes just once a week, but then twice a weekend) this blocking PQexecParams call somehow returns after about 2 hours. Within these two hours my application just hangs... The inserts are done via async PQsendQueryParams.
Is there a way to configure the timeout for PQexecParams (as I cannot find any appropriate timeout settings in the lib maybe on the postgres server)? Is there a better way to perform the select synchronous?
Thank you in advance
The two hours suggest TCP keepalive kicking in and determining that the connection has gone bad.
You can set the keepalives_idle connection parameter so that the timeout happens earlier and you are not stalled for two hours.
But you probably also want to know what aborts the network connection. Your first look should be at the PostgreSQL server log; you should see an error message that matches the one on the client side. Probably a network component is at fault – look for firewalls in particular.