Spymemcached issue with 2 nodes configured - memcached

I am using Ketama algorithm's spymemcached for my project. I do have two memcached servers running as part of HA (high availability) and my configurations are
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
hibernate.cache.region.factory_class=kr.pe.kwonnam.hibernate4memcached.Hibernate4MemcachedRegionFactory
hibernate.cache.default_cache_concurrency_strategy=NONSTRICT_READ_WRITE
hibernate.cache.region_prefix=myProjectCache
hibernate.cache.use_structured_entries=false
h4m.adapter.class=kr.pe.kwonnam.hibernate4memcached.spymemcached.SpyMemcachedAdapter
h4m.timestamper.class=kr.pe.kwonnam.hibernate4memcached.timestamper.HibernateCacheTimestamperMemcachedImpl
h4m.adapter.spymemcached.hosts=host1:11211,host2:11211
h4m.adapter.spymemcached.hashalgorithm=KETAMA_HASH
h4m.adapter.spymemcached.operation.timeout.millis=5000
h4m.adapter.spymemcached.transcoder=kr.pe.kwonnam.hibernate4memcached.spymemcached.KryoTranscoder
h4m.adapter.spymemcached.cachekey.prefix=myProject
h4m.adapter.spymemcached.kryotranscoder.compression.threashold.bytes=20000
# 10 minutes
h4m.expiry.seconds=600
# a day
h4m.expiry.seconds.validatorCache.org.hibernate.cache.spi.UpdateTimestampsCache=86400
# 1 hour
h4m.expiry.seconds.validatorCache.org.hibernate.cache.internal.StandardQueryCache=3600
# 30 minutes
h4m.expiry.seconds.myProjectCache.database1=1800
h4m.expiry.seconds.myProjectCache.database2=1800
Configurations are followed as per the link below :
SpyMemcachedAdapter
Both nodes host1 and host2 are reachable, up and running.
Issue :
As part of testing HA , when I bring down one memcached (host1) my application is connecting to host2 but only after trying to connect host1 (which will be timedout - as host1 is made down) for every request. Which will result in too much of time taken
Below is the exception thrown for every request
2017-07-07 17:27:31.915 [SimpleAsyncTaskExecutor-6] ERROR u.c.o.sProcessor - TransId:004579 - Exception occurred while processing request :Timeout waiting for value: waited 5,000 ms. Node status: Connection Status { /host1:11211 active: false, authed: true, last read: 247,290 ms ago /host2:11211 active: true, authed: true, last read: 5 ms ago }
2017-07-07 17:28:54.666 INFO net.spy.memcached.MemcachedConnection: Reconnecting due to failure to connect to {QA sa=/host1:11211, #Rops=0, #Wops=214, #iq=0, topRop=null, topWop=Cmd: 5 Opaque: 341143 Key: myProject.myProjectCache.databse1# Amount: 0 Default: 1499444604639 Exp: 2592000, toWrite=0, interested=0}
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:677)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:436)
at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1446)
2017-07-07 17:28:54.666 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening {QA sa=/host1:11211, #Rops=0, #Wops=214, #iq=0, topRop=null, topWop=Cmd: 5 Opaque: 341143 Key: myProject.myProjectCache.databse1# Amount: 0 Default: 1499444604639 Exp: 2592000, toWrite=0, interested=0}, attempt 14.
2017-07-07 17:28:54.841 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for myProject.myProjectCache.databse1#-1:my.co.org.myProject.dao.entity.databse1.tablexyz#14744.
Am using memcached for the first time, not sure whether this is the behavior of spymemcached? Or am I missing something in my configurations? Or
by changing timeout configurations will it resolve time taken to process the request?
Any suggestions/help much appreciated.

If you are using DefaultConnectionFactory which uses OOTB ConnectionFactoryBuilder then the reconnect will happen after failed operation count has reached timeoutExceptionThreshold (in version 2.7 of spymemcached) is initialized to 998. So if you create your own ConnectionFactory and change the timeoutExceptionThreshold to lower value then you should see the automatic recovery.
Hope this helps.

Related

Hikari CP connections are suddenly invalidated

The bounty expires in 5 days. Answers to this question are eligible for a +50 reputation bounty.
Habil Ganbarli is looking for an answer from a reputable source.
Hi Stackoverflow family,
So we have an application with Kotlin & Spring boot that uses a single DB instance(1 GB Memory and instance class is db.t3.micro) as PostgreSQL and is hosted in AWS. What happens for the last couple of days is suddenly connections in my pool are invalidated(2-3 times a day) and the pool size drops drastically. In summary:
Let's say everything is normal in Hikari and the connections are closed and added according to the maxliftime which is 30 minutes by default and the log are like below:
HikariPool-1 - Pool stats (total=40, active=0, idle=40, waiting=0)
HikariPool-1 - Fill pool skipped, pool is at sufficient level.
Suddenly most of the connections become invalidated. Let's say 30 out of 40. The connections are closed before they pass their max lifetime and the logs are like below for all closed connections:
HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#5257d7b2 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
HikariPool-1 - Closing connection org.postgresql.jdbc.PgConnection#7b673105: (connection is dead)
Additionally after these messages followed by multiple of this logs like below:
Add connection elided, waiting 6, queue 13
And the timeout failure stats like below:
HikariPool-1 - Timeout failure stats (total=12, active=12, idle=0, waiting=51)
Finally, I have left with lots of connection timeouts of requests due to the reason that there were no connection available for the most of the requests:
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms
I have added leak-detection-threshold and it also logs like below during the problem happening:
Connection leak detection triggered for org.postgresql.jdbc.PgConnection#3bb5f155 on thread http-nio-8080-exec-482, stack trace follows
java.lang.Exception: Apparent connection leak detected
The hikari config is like below:
hikari:
data-source-properties: stringtype=unspecified
maximum-pool-size: 40
leak-detection-threshold: 30000
When this problem happens queries in PostgreSQL also take a lot of time: 8-9 seconds and increase up to 15-35 seconds. Some queries even 55-65 seconds(which usually take 1-3 seconds at most in usual times). That is why we think it is not a query issue.
In addition to that some sources suggest using try with resources, however, it is not the case for us as we do not obtain connections manually. In addition to that increasing the max pool size from 20 to 40 also did not help. I would really appreciate any comment or hint as we are dealing with this issue for almost a week.

SQL timeout on Azure website suddenly started when return large number (1500) rows

Azure Website with EF6 just started to get timeout on pages where I retrieve more than about 1000 rows. (unsure about the limit, works on 400 or less, fails on 1500 or more)
[Win32Exception (0x80004005): The wait operation timed out]
[SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. This failure occurred while attempting to connect to the routing destination. The duration spent while attempting to connect to the original server was - [Pre-Login] initialization=1; handshake=21; [Login] initialization=0; authentication=0; [Post-Login] complete=1; ]
The app has been running smoothly for several month I just noticed today. Any ideas?
(In case the error is still present: Page with err: http://fartslek.no/fartslek/15 Page without err: http://fartslek.no/fartslek/3 )

Google Cloud SQL + Hikari CP + Communications link failure

I'm experiencing intermittent connectivity errors from a Spring Boot application communicating with a D1 Google CloudSQL Server with the configuration settings described here HikariCP MySQL settings
I was wondering if anyone has encountered this before.
I've read the FAQ posted here Hikari FAQ and I'm wondering if my default idleTimeout and maxLifeTime (30 mins) settings might be at fault; wait_timeout and interactive_timeout on the server are both set to default 28800s (8 hours).
The FAQ says that these two settings should be about a minute less that the server settings, but if I'm losing connections after 30 minutes I can't quite see how upping the maxLifeTime to 7hrs 59mins is going to improve the situation.
Does anyone have any recommendations?
Redacted stack trace(s):
Get these from time to time
org.springframework.security.authentication.InternalAuthenticationServiceException: Could not get JDBC Connection; nested exception is java.sql.SQLException: Timeout after 30018ms of waiting for a connection.
at org.springframework.security.authentication.dao.DaoAuthenticationProvider.retrieveUser(DaoAuthenticationProvider.java:110)
at org.springframework.security.authentication.dao.AbstractUserDetailsAuthenticationProvider.authenticate(AbstractUserDetailsAuthenticationProvider.java:132)
at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:156)
at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:177)
...
Caused by: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: Timeout after 30023ms of waiting for a connection.
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80)
....
Caused by: java.sql.SQLException: Timeout after 30023ms of waiting for a connection.
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:208)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:108)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
... 59 common frames omitted
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:630)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:695)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:727)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:737)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:787)
Hibernate search:
2015-02-17 10:34:17.090 INFO 1 --- [ entityloader-2] o.h.s.i.SimpleIndexingProgressMonitor : HSEARCH000030: 31050 documents indexed in 1147865 ms
2015-02-17 10:34:17.090 INFO 1 --- [ entityloader-2] o.h.s.i.SimpleIndexingProgressMonitor : HSEARCH000031: Indexing speed: 27.050219 documents/second; progress: 99.89%
2015-02-17 10:41:59.917 WARN 1 --- [ntifierloader-1] com.zaxxer.hikari.proxy.ConnectionProxy : Connection com.mysql.jdbc.JDBC4Connection#372f2018 (HikariPool-0) marked as broken because of SQLSTATE(08S01), ErrorCode(0).
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 1,611,087 milliseconds ago. The last packet sent successfully to the server was 927,899 milliseconds ago.
The indexing isn't particularly quick at the moment I think because I'm not using projections. The process takes about 30 minutes to execute.
Thanks
It could be a couple of things here. First, the network infrastructure (firewalls, load-balancers, etc.) between the application tier and the database tier can impose their own connection timeouts, regardless of MySql settings.
The indexing failure indicates that the connection was out of the pool for ~27 minutes with no SQL activity when that failure occurred.
Second, specifically regarding the "Could not get JDBC Connection" error, you may be running into Cloud SQL connection limits.
I recommend three things. One, make sure you are on the latest HikariCP (2.3.2) and latest MySql Connector/J driver (5.1.34). Two, enable DEBUG-level logging for the com.zaxxer.hikari package. HikariCP debug logging is not "chatty", but will log pool statistics every 30 seconds (and sometimes more detail in failure conditions). Lastly, try setting the maxPoolSize to something smaller (unless already at the default), and setting maxLifeTime to 15 or 20 minutes (1200000ms).
If the error occurs again, post updated logs containing the HikariCP debug logs around the time of failure. Also, feel free to open a tracking issue over on Github as larger logs etc. are easier there.

PostgreSQL Exception: “An I/O error occured while sending to the backend”

I run two web app in a machine and one DB in another machine.(They use the same DB)
One can run very well,But another one always down after about 4 hours.
Here is error information:
Error 2014-11-03 13:31:05,902 [http-bio-8080-exec-7] ERROR spi.SqlExceptionHelper - An I/O error occured while sending to the backend.
| Error 2014-11-03 13:31:05,904 [http-bio-8080-exec-7] ERROR spi.SqlExceptionHelper - This connection has been closed.
Postgresql logs:
2014-10-26 23:41:31 CDT WARNING: pgstat wait timeout
2014-10-27 01:13:48 CDT WARNING: pgstat wait timeout
2014-10-27 03:55:46 CDT LOG: could not receive data from client: Connection timed out
2014-10-27 03:55:46 CDT LOG: unexpected EOF on client connection
Who caused this problem, app or database? or net?
Reason:
At this point it was clear that the TCP connection that was sitting idle was already broken, but our app still assumed it to be open. By idle connections, I mean connections in the pool that aren’t in active use at the moment by the application.
After some search, I came to the conclusion that the network firewall between my app and the database is dropping the idle/stale connections after 1 hour. It seemed to be a common problem that many people have faced.
Solution:
In grails, you can set this in DataSource.groovy.
environments {
development {
dataSource {
//configure DBCP
properties {
maxActive = 50
maxIdle = 25
minIdle = 1
initialSize = 1
minEvictableIdleTimeMillis = 60000
timeBetweenEvictionRunsMillis = 60000
numTestsPerEvictionRun = 3
maxWait = 10000
testOnBorrow = true
testWhileIdle = true
testOnReturn = false
validationQuery = "SELECT 1"
}
}
}
}

Memcached - Connection refused

I tried a simple test with memcached from jelastic and always getting the exception "COnnection refused"... But the URL ist correct. Is some add
MemcachedClient c = new MemcachedClient(
new InetSocketAddress("memcached-myexample.jelastic.dogado.eu", 11211));
c.set("someKey", 3600, user);
User cachedUser = (User) c.get("someKey");
Here is the exception:
2014-01-02 00:07:41.820 INFO net.spy.memcached.MemcachedConnection: Added {QA sa=memcached-myexample.jelastic.dogado.eu/92.51.168.106:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2014-01-02 00:07:41.833 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for someKey.
2014-01-02 00:07:41.835 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for someKey.
2014-01-02 00:07:41.858 INFO net.spy.memcached.MemcachedConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl#2dc1482f
2014-01-02 00:07:41.859 INFO net.spy.memcached.MemcachedConnection: Reconnecting due to failure to connect to {QA sa=memcached-myexample.jelastic.dogado.eu/92.51.168.106:11211, #Rops=0, #Wops=2, #iq=0, topRop=null, topWop=Cmd: set Key: someKey Flags: 1 Exp: 3600 Data Length: 149, toWrite=0, interested=0}
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:629)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:409)
at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1334)
I would try to telnet to your memcached cluster in order to rule out a firewall issue. You can do that with the following command.
telnet memcached-myexample.jelastic.dogado.eu 11211
If that doesn't work then you have network issues. If this is the case I would first check to see if you have a firewall up.
Add int portNum = 11211; at the first, and try again.
int portNum = 11211;
MemcachedClient c = new MemcachedClient(
new InetSocketAddress("memcached-myexample.jelastic.dogado.eu", portNum));
// Store a value (async) for one hour
c.set("someKey", 3600, someObject);
// Retrieve a value (synchronously).
Object myObject=c.get("someKey");
Thanks but the error was on a firewall rule from the provider. So not my failure.
Check /etc/memcached.conf file and update the server IP address from which you want to access the cache.