Hikari CP connections are suddenly invalidated - postgresql

The bounty expires in 5 days. Answers to this question are eligible for a +50 reputation bounty.
Habil Ganbarli is looking for an answer from a reputable source.
Hi Stackoverflow family,
So we have an application with Kotlin & Spring boot that uses a single DB instance(1 GB Memory and instance class is db.t3.micro) as PostgreSQL and is hosted in AWS. What happens for the last couple of days is suddenly connections in my pool are invalidated(2-3 times a day) and the pool size drops drastically. In summary:
Let's say everything is normal in Hikari and the connections are closed and added according to the maxliftime which is 30 minutes by default and the log are like below:
HikariPool-1 - Pool stats (total=40, active=0, idle=40, waiting=0)
HikariPool-1 - Fill pool skipped, pool is at sufficient level.
Suddenly most of the connections become invalidated. Let's say 30 out of 40. The connections are closed before they pass their max lifetime and the logs are like below for all closed connections:
HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#5257d7b2 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
HikariPool-1 - Closing connection org.postgresql.jdbc.PgConnection#7b673105: (connection is dead)
Additionally after these messages followed by multiple of this logs like below:
Add connection elided, waiting 6, queue 13
And the timeout failure stats like below:
HikariPool-1 - Timeout failure stats (total=12, active=12, idle=0, waiting=51)
Finally, I have left with lots of connection timeouts of requests due to the reason that there were no connection available for the most of the requests:
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms
I have added leak-detection-threshold and it also logs like below during the problem happening:
Connection leak detection triggered for org.postgresql.jdbc.PgConnection#3bb5f155 on thread http-nio-8080-exec-482, stack trace follows
java.lang.Exception: Apparent connection leak detected
The hikari config is like below:
hikari:
data-source-properties: stringtype=unspecified
maximum-pool-size: 40
leak-detection-threshold: 30000
When this problem happens queries in PostgreSQL also take a lot of time: 8-9 seconds and increase up to 15-35 seconds. Some queries even 55-65 seconds(which usually take 1-3 seconds at most in usual times). That is why we think it is not a query issue.
In addition to that some sources suggest using try with resources, however, it is not the case for us as we do not obtain connections manually. In addition to that increasing the max pool size from 20 to 40 also did not help. I would really appreciate any comment or hint as we are dealing with this issue for almost a week.

Related

Spring Boot 1.5.3 Creating more connection than specified in application.properties

I am working on a project where i have dual datasource configured. On testing i have limit the no of max-active connections to five but when i checked on database, i found that application create around 25+ connections.
Code Sample
# Number of ms to wait before throwing an exception if no connection is available.
spring.datasource.tomcat.max-wait=1000
# Maximum number of active connections that can be allocated from this pool at the same time.
spring.datasource.tomcat.max-active=1
spring.datasource.tomcat.max-idle=1
spring.datasource.tomcat.min-idle=1
spring.datasource.tomcat.initial-size=1
# Validate the connection before borrowing it from the pool.
spring.datasource.tomcat.test-on-borrow=true
spring.datasource.tomcat.test-while-idle = true
spring.datasource.tomcat.validation-query = true
spring.datasource.tomcat.time-between-eviction-runs-millis = 360000
spring.rdatasource.tomcat.max-wait=1000
# Maximum number of active connections that can be allocated from this pool at the same time.
spring.rdatasource.tomcat.max-active=1
spring.rdatasource.tomcat.max-idle=1
spring.rdatasource.tomcat.min-idle=1
spring.rdatasource.tomcat.initial-size=1
# Validate the connection before borrowing it from the pool.
spring.rdatasource.tomcat.test-on-borrow=true
spring.rdatasource.tomcat.test-while-idle= true
spring.rdatasource.tomcat.validation-query = true
spring.rdatasource.tomcat.time-between-eviction-runs-millis = 360000
above connection is working fine, but exceeding no of connection to database. User which i am using is limited to 10 connection.
when i hit request to application than i am getting
query wait timeout error with unable to create initial pool size.
I am using tomcat connection pooling
Please provide me the solution so application will run with 10 connections limit which is set at database.

orientdb 2.2.26 cluster setup - queries hang

Orientdb version - 2.2.26
CLuster - 3 node setup, readQuorum = 2, writeQuorum = "majority", ridBag.embeddedToSbtreeBonsaiThreshold = 2147483647
Nodes - CentOS 7.0, 24 cores and 96 GB RAM
Gremlin-scala/tinkerpop APIs are used for querying and inserting.
This code works fine on single node setup.
Code checks for existing vertex in graph. If vertex does not exist, then the insert operation are batched and send to the db within a transaction.
I see following warnings in orientdb log on all three nodes -
2017-09-15 16:37:31:025 WARNI [dev2] Timeout (852567ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.354 task=record_read(#65:22)) [ODistributedDatabaseImpl]
2017-09-15 16:52:18:239 WARNI [dev2] Timeout (1049042ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.568 task=record_read(#63:24)) [ODistributedDatabaseImpl]
2017-09-15 17:25:22:477 WARNI [dev2] Timeout (1984236ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.889 task=record_read(#63:24)) [ODistributedDatabaseImpl]
There is no problem on network. Firewall is disabled on all three nodes.
Are these logs related to the problem ?
What else I should check to fix the problem ?

Spymemcached issue with 2 nodes configured

I am using Ketama algorithm's spymemcached for my project. I do have two memcached servers running as part of HA (high availability) and my configurations are
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
hibernate.cache.region.factory_class=kr.pe.kwonnam.hibernate4memcached.Hibernate4MemcachedRegionFactory
hibernate.cache.default_cache_concurrency_strategy=NONSTRICT_READ_WRITE
hibernate.cache.region_prefix=myProjectCache
hibernate.cache.use_structured_entries=false
h4m.adapter.class=kr.pe.kwonnam.hibernate4memcached.spymemcached.SpyMemcachedAdapter
h4m.timestamper.class=kr.pe.kwonnam.hibernate4memcached.timestamper.HibernateCacheTimestamperMemcachedImpl
h4m.adapter.spymemcached.hosts=host1:11211,host2:11211
h4m.adapter.spymemcached.hashalgorithm=KETAMA_HASH
h4m.adapter.spymemcached.operation.timeout.millis=5000
h4m.adapter.spymemcached.transcoder=kr.pe.kwonnam.hibernate4memcached.spymemcached.KryoTranscoder
h4m.adapter.spymemcached.cachekey.prefix=myProject
h4m.adapter.spymemcached.kryotranscoder.compression.threashold.bytes=20000
# 10 minutes
h4m.expiry.seconds=600
# a day
h4m.expiry.seconds.validatorCache.org.hibernate.cache.spi.UpdateTimestampsCache=86400
# 1 hour
h4m.expiry.seconds.validatorCache.org.hibernate.cache.internal.StandardQueryCache=3600
# 30 minutes
h4m.expiry.seconds.myProjectCache.database1=1800
h4m.expiry.seconds.myProjectCache.database2=1800
Configurations are followed as per the link below :
SpyMemcachedAdapter
Both nodes host1 and host2 are reachable, up and running.
Issue :
As part of testing HA , when I bring down one memcached (host1) my application is connecting to host2 but only after trying to connect host1 (which will be timedout - as host1 is made down) for every request. Which will result in too much of time taken
Below is the exception thrown for every request
2017-07-07 17:27:31.915 [SimpleAsyncTaskExecutor-6] ERROR u.c.o.sProcessor - TransId:004579 - Exception occurred while processing request :Timeout waiting for value: waited 5,000 ms. Node status: Connection Status { /host1:11211 active: false, authed: true, last read: 247,290 ms ago /host2:11211 active: true, authed: true, last read: 5 ms ago }
2017-07-07 17:28:54.666 INFO net.spy.memcached.MemcachedConnection: Reconnecting due to failure to connect to {QA sa=/host1:11211, #Rops=0, #Wops=214, #iq=0, topRop=null, topWop=Cmd: 5 Opaque: 341143 Key: myProject.myProjectCache.databse1# Amount: 0 Default: 1499444604639 Exp: 2592000, toWrite=0, interested=0}
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:677)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:436)
at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1446)
2017-07-07 17:28:54.666 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening {QA sa=/host1:11211, #Rops=0, #Wops=214, #iq=0, topRop=null, topWop=Cmd: 5 Opaque: 341143 Key: myProject.myProjectCache.databse1# Amount: 0 Default: 1499444604639 Exp: 2592000, toWrite=0, interested=0}, attempt 14.
2017-07-07 17:28:54.841 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for myProject.myProjectCache.databse1#-1:my.co.org.myProject.dao.entity.databse1.tablexyz#14744.
Am using memcached for the first time, not sure whether this is the behavior of spymemcached? Or am I missing something in my configurations? Or
by changing timeout configurations will it resolve time taken to process the request?
Any suggestions/help much appreciated.
If you are using DefaultConnectionFactory which uses OOTB ConnectionFactoryBuilder then the reconnect will happen after failed operation count has reached timeoutExceptionThreshold (in version 2.7 of spymemcached) is initialized to 998. So if you create your own ConnectionFactory and change the timeoutExceptionThreshold to lower value then you should see the automatic recovery.
Hope this helps.

Google Cloud SQL + Hikari CP + Communications link failure

I'm experiencing intermittent connectivity errors from a Spring Boot application communicating with a D1 Google CloudSQL Server with the configuration settings described here HikariCP MySQL settings
I was wondering if anyone has encountered this before.
I've read the FAQ posted here Hikari FAQ and I'm wondering if my default idleTimeout and maxLifeTime (30 mins) settings might be at fault; wait_timeout and interactive_timeout on the server are both set to default 28800s (8 hours).
The FAQ says that these two settings should be about a minute less that the server settings, but if I'm losing connections after 30 minutes I can't quite see how upping the maxLifeTime to 7hrs 59mins is going to improve the situation.
Does anyone have any recommendations?
Redacted stack trace(s):
Get these from time to time
org.springframework.security.authentication.InternalAuthenticationServiceException: Could not get JDBC Connection; nested exception is java.sql.SQLException: Timeout after 30018ms of waiting for a connection.
at org.springframework.security.authentication.dao.DaoAuthenticationProvider.retrieveUser(DaoAuthenticationProvider.java:110)
at org.springframework.security.authentication.dao.AbstractUserDetailsAuthenticationProvider.authenticate(AbstractUserDetailsAuthenticationProvider.java:132)
at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:156)
at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:177)
...
Caused by: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: Timeout after 30023ms of waiting for a connection.
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80)
....
Caused by: java.sql.SQLException: Timeout after 30023ms of waiting for a connection.
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:208)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:108)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
... 59 common frames omitted
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:630)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:695)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:727)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:737)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:787)
Hibernate search:
2015-02-17 10:34:17.090 INFO 1 --- [ entityloader-2] o.h.s.i.SimpleIndexingProgressMonitor : HSEARCH000030: 31050 documents indexed in 1147865 ms
2015-02-17 10:34:17.090 INFO 1 --- [ entityloader-2] o.h.s.i.SimpleIndexingProgressMonitor : HSEARCH000031: Indexing speed: 27.050219 documents/second; progress: 99.89%
2015-02-17 10:41:59.917 WARN 1 --- [ntifierloader-1] com.zaxxer.hikari.proxy.ConnectionProxy : Connection com.mysql.jdbc.JDBC4Connection#372f2018 (HikariPool-0) marked as broken because of SQLSTATE(08S01), ErrorCode(0).
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 1,611,087 milliseconds ago. The last packet sent successfully to the server was 927,899 milliseconds ago.
The indexing isn't particularly quick at the moment I think because I'm not using projections. The process takes about 30 minutes to execute.
Thanks
It could be a couple of things here. First, the network infrastructure (firewalls, load-balancers, etc.) between the application tier and the database tier can impose their own connection timeouts, regardless of MySql settings.
The indexing failure indicates that the connection was out of the pool for ~27 minutes with no SQL activity when that failure occurred.
Second, specifically regarding the "Could not get JDBC Connection" error, you may be running into Cloud SQL connection limits.
I recommend three things. One, make sure you are on the latest HikariCP (2.3.2) and latest MySql Connector/J driver (5.1.34). Two, enable DEBUG-level logging for the com.zaxxer.hikari package. HikariCP debug logging is not "chatty", but will log pool statistics every 30 seconds (and sometimes more detail in failure conditions). Lastly, try setting the maxPoolSize to something smaller (unless already at the default), and setting maxLifeTime to 15 or 20 minutes (1200000ms).
If the error occurs again, post updated logs containing the HikariCP debug logs around the time of failure. Also, feel free to open a tracking issue over on Github as larger logs etc. are easier there.

High tomcat thread count caused by threads waiting on MultiThreadedHttpConnectionManager ConnectionPool

We've recently started seeing spikes in the thread counts on our tomcat servers (peaking at over 1000 when normally at around 100). We performed a thread dump on one of the tomcat servers whilst it's thread count was high and found that a large number of the threads were waiting on MultiThreadedHttpConnectionManager$ConnectionPool, stack trace as follows:
"TP-Processor21700" daemon prio=10 tid=0x4a0b3400 nid=0x2091 in Object.wait() [0x399f3000..0x399f4004]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x58ee5030> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
- locked <0x58ee5030> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
...
There are 3 points in our code where httpClient.executeMethod() is called (to obtain info via an http request to another tomcat server). In each case the GetMethod object passed to it has had its socket timeout value set (i.e. via getMethod.getParams().setSoTimeout();) before hand, and the MultiThreadedConnectionManager is configured in spring to have a connectionTimeout value of 10 seconds. One thing I have noticed is that only 2 of the 3 httpClient.executeMethod() invocations are followed by a call to getMethod.releaseConnection(), so I'm wondering if this may be the cause of the problem (i.e. connections not being explicitly released). However what's strange is that
the problem has only started occurring in the last few days and the source code has not been modified for over a year, plus the fact that there has been no recent surge in requests coming through to the tomcat servers. One change that did occur a couple of days before the problem started to occur was that we upgraded the JVM used by the tomcat server from Java 5 (1.5 update 14) to Java 6 (1.6 update 25). We have tried temporarily reverting the JVM version to Java 5 to see if the problem stopped occurring but it did not. Another point to note is that in most cases the tomcat server eventually recovers and
the thread count drops back to normal - we've only had one instance where a tomcat process appears to have crashed because of the thread count increase.
We are running Tomcat 5.5 with commons-httpclient-3.1.jar running against a Java 1.6 update 25 on a Red Hat linux environment.
Please let me know if you can suggest any ideas as to what may be the cause of this issue.
Thanks.
The problem was indeed caused by the fact that only 2 of the 3 httpClient.executeMethod(getMethod) invocations were followed by a call to getMethod.releaseConnection(). Ensuring all 3 httpClient.executeMethod(getMethod) invocations were inside a try/catch block followed by a finally block containing a call to getMethod.releaseConnection() prevented the high thread counts from occurring. Although this code had been in our live system for over a year, it appears that the reason the high thread count issue only recently started occurring was because various search engine crawlers had started hitting the site with lots of URL requests that caused the code where the connection was being used but not subsequently released to execute. Problem solved.