Zuul 1 can't release the underlying HttpClient's connection pool during a load test - spring-cloud

Zuul's version is 1.3.1.
I use JMeter to test a service through Zuul, with the number of threads set to 1000 and the loop count set to infinite. After a while, Zuul's circuit breaker opened, but the service never became accessible again. I found that the cause is that the underlying HttpClient's connection pool is full of leased connections; normally they should be released after the timeout. Why are the leased connections never released?
My Zuul 1 configuration is:
ribbon:
  ReadTimeout: 2000
  ConnectTimeout: 1000
  MaxTotalConnections: 200
  MaxConnectionsPerHost: 50
zuul:
  ribbonIsolationStrategy: THREAD
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: true
        isolation:
          thread:
            timeoutInMilliseconds: 8000

I found the answer here: https://github.com/spring-cloud/spring-cloud-netflix/issues/2831
It's a bug in Zuul.
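For anyone hitting the same thing: one way to confirm the leak is to log the Apache HttpClient pool statistics while the load test runs. A minimal sketch, assuming you can obtain a reference to the PoolingHttpClientConnectionManager that the Zuul/Ribbon client uses (how it is wired depends on your Spring Cloud version; the class below is purely illustrative):
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.pool.PoolStats;

public class PoolStatsLogger {

    private final PoolingHttpClientConnectionManager connectionManager;

    public PoolStatsLogger(PoolingHttpClientConnectionManager connectionManager) {
        this.connectionManager = connectionManager;
    }

    public void logStats() {
        PoolStats stats = connectionManager.getTotalStats();
        // If "leased" stays pinned at MaxTotalConnections and never drops
        // after the ribbon ReadTimeout fires, connections are being
        // abandoned instead of returned to the pool.
        System.out.printf("leased=%d available=%d pending=%d max=%d%n",
                stats.getLeased(), stats.getAvailable(),
                stats.getPending(), stats.getMax());
    }
}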

Related

Hikari CP connections are suddenly invalidated

Hi Stack Overflow family,
We have a Kotlin & Spring Boot application that uses a single PostgreSQL instance (1 GB memory, instance class db.t3.micro) hosted in AWS. For the last couple of days, the connections in my pool are suddenly invalidated (2-3 times a day) and the pool size drops drastically. In summary:
Normally everything is fine in Hikari: connections are closed and replaced according to maxLifetime, which is 30 minutes by default, and the logs look like this:
HikariPool-1 - Pool stats (total=40, active=0, idle=40, waiting=0)
HikariPool-1 - Fill pool skipped, pool is at sufficient level.
Suddenly most of the connections become invalidated, say 30 out of 40. The connections are closed before they reach their max lifetime, and the logs look like the following for all closed connections:
HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@5257d7b2 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
HikariPool-1 - Closing connection org.postgresql.jdbc.PgConnection@7b673105: (connection is dead)
Additionally, these messages are followed by many log entries like this:
Add connection elided, waiting 6, queue 13
And timeout failure stats like the following:
HikariPool-1 - Timeout failure stats (total=12, active=12, idle=0, waiting=51)
Finally, I am left with lots of connection timeouts because no connection was available for most of the requests:
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms
I have added leak-detection-threshold, and while the problem is happening it also logs entries like this:
Connection leak detection triggered for org.postgresql.jdbc.PgConnection@3bb5f155 on thread http-nio-8080-exec-482, stack trace follows
java.lang.Exception: Apparent connection leak detected
The hikari config is like below:
hikari:
  data-source-properties: stringtype=unspecified
  maximum-pool-size: 40
  leak-detection-threshold: 30000
When this problem happens, queries in PostgreSQL also take a long time: 8-9 seconds, increasing up to 15-35 seconds, and some even 55-65 seconds (queries that usually take 1-3 seconds at most). That is why we think it is not a query issue.
Some sources suggest using try-with-resources, but that does not apply to us, as we do not obtain connections manually. Increasing the max pool size from 20 to 40 also did not help. I would really appreciate any comment or hint, as we have been dealing with this issue for almost a week.
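Since the driver reports the connection as already closed before maxLifetime expires, a common suspect is something between the application and PostgreSQL (the database itself, the OS, or an AWS component) silently dropping connections. A sketch of the Hikari settings typically tuned in that case; the values are illustrative assumptions, not verified recommendations:
spring:
  datasource:
    hikari:
      maximum-pool-size: 40
      # Retire connections before whatever cutoff the server or network
      # enforces; 10 minutes here is an assumed value.
      max-lifetime: 600000
      # Ping idle connections so they are not dropped as idle
      # (requires HikariCP 4.0+).
      keepalive-time: 300000
      leak-detection-threshold: 30000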

Program gets stuck while acquiring data from MongoDB

I'm trying to develop a server/client program; the server is based on Netty. When the server gets data from a Netty channel, I query data from Mongo. However, the channel gets stuck once it has opened MongoDB connections.
My code is simply:
Query query = new Query(Criteria.where("account_id").is(accountId).and("account_type").is(accountType));
return seqKeyConvert.toProtoBuf(mongoTemplate.findOne(query, SeqKeyMongoEntity.class, collectionName));
And the server's log will be like this:
[nioEventLoopGroup-3-1] org.mongodb.driver.connection - Opened connection [connectionId{localValue:7, serverValue:7530577}] to 100.75.53.176:5521
[nioEventLoopGroup-3-6] org.mongodb.driver.connection - Opened connection [connectionId{localValue:10, serverValue:7530585}] to 100.75.53.176:5521
[nioEventLoopGroup-3-8] org.mongodb.driver.connection - Opened connection [connectionId{localValue:8, serverValue:7530584}] to 100.75.53.176:5521
[nioEventLoopGroup-3-7] org.mongodb.driver.connection - Opened connection [connectionId{localValue:9, serverValue:7530586}] to 100.75.53.176:5521
The MongoDB configuration is:
spring:
  data:
    mongodb:
      manager:
        address: mongodb://100.75.53.176:5521,100.75.53.177:5521
        min-connections-per-host: 2
        max-connections-per-host: 100
        threads-allowed-to-block-for-connection-multiplier: 10
        server-selection-timeout: 30000
        max-wait-time: 60000
        max-connection-idle-time: 28800000
        max-connection-life-time: 0
        connect-timeout: 30000
        socket-timeout: 0
        socket-keep-alive: false
        ssl-enabled: false
        ssl-invalid-host-name-allowed: false
        always-use-m-beans: false
        heartbeat-socket-timeout: 20000
        heartbeat-connect-timeout: 20000
        min-heartbeat-frequency: 500
        heartbeat-frequency: 10000
        local-threshold: 15
The channel gets stuck here without throwing any error. I have tested whether MongoDB itself is OK and whether I had set the MongoDB connection limit too low, but found nothing. Any advice? Thank you.
I used InitializingBean to set some variables that use MongoDB. It turns out the problem is solved if I replace InitializingBean with CommandLineRunner. For now, I don't know whether some kind of race condition happens.
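For reference, a minimal sketch of the CommandLineRunner variant described above; the class and method names are hypothetical placeholders for the variable setup in the question:
import org.springframework.boot.CommandLineRunner;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.stereotype.Component;

@Component
public class SeqKeyInitializer implements CommandLineRunner {

    private final MongoTemplate mongoTemplate;

    public SeqKeyInitializer(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    @Override
    public void run(String... args) {
        // Runs after the application context is fully started, unlike
        // InitializingBean#afterPropertiesSet, which runs while beans
        // are still being initialized.
        initializeVariablesFromMongo();
    }

    private void initializeVariablesFromMongo() {
        // ... query Mongo via mongoTemplate and cache the results ...
    }
}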

Is there any relationship between Feign clients' 'readTimeout' and the hystrix.execution.isolation.thread.timeoutInMilliseconds configuration?

First of all, I'm sorry for my bad English :)
I have a question about the relationship between Feign clients' 'readTimeout' and 'connectTimeout' and the hystrix.execution.isolation.thread.timeoutInMilliseconds configuration.
I have used the 'thread' option instead of semaphore for the isolation strategy.
Below are my relevant settings.
hystrix:
  threadpool:
    A:
      coreSize: 5
      maximumSize: 5
      allowMaximumSizeToDivergeFromCoreSize: true
feign:
  client:
    config:
      A:
        connectTimeout: 500
        readTimeout: 500
        loggerLevel: basic
I hope you can give me an answer. 🙏
I found the answer: Hystrix's thread timeout takes precedence over the Feign client timeout.
Test case:
1. Condition:
- the timeout related to Feign: 2s
- the timeout related to Hystrix's thread: 1s
2. Result:
- Feign's timeout never fires; the shorter Hystrix thread timeout wins.
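With THREAD isolation both timers run concurrently and the shorter one wins, so for Feign's readTimeout to be the effective limit, the Hystrix timeout must be set above it. A minimal sketch (the values are illustrative assumptions, and the default command key is used because Feign derives per-command keys from the client interface and method):
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            # Set above connectTimeout + readTimeout so Feign's own
            # timeout can fire before Hystrix interrupts the thread.
            timeoutInMilliseconds: 3000
feign:
  client:
    config:
      A:
        connectTimeout: 500
        readTimeout: 2000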

Spring Boot 1.5.3 creating more connections than specified in application.properties

I am working on a project where I have dual datasources configured. In testing I limited the number of max-active connections to five, but when I checked the database I found that the application creates around 25+ connections.
Code Sample
# Number of ms to wait before throwing an exception if no connection is available.
spring.datasource.tomcat.max-wait=1000
# Maximum number of active connections that can be allocated from this pool at the same time.
spring.datasource.tomcat.max-active=1
spring.datasource.tomcat.max-idle=1
spring.datasource.tomcat.min-idle=1
spring.datasource.tomcat.initial-size=1
# Validate the connection before borrowing it from the pool.
spring.datasource.tomcat.test-on-borrow=true
spring.datasource.tomcat.test-while-idle=true
spring.datasource.tomcat.validation-query=SELECT 1
spring.datasource.tomcat.time-between-eviction-runs-millis=360000

spring.rdatasource.tomcat.max-wait=1000
# Maximum number of active connections that can be allocated from this pool at the same time.
spring.rdatasource.tomcat.max-active=1
spring.rdatasource.tomcat.max-idle=1
spring.rdatasource.tomcat.min-idle=1
spring.rdatasource.tomcat.initial-size=1
# Validate the connection before borrowing it from the pool.
spring.rdatasource.tomcat.test-on-borrow=true
spring.rdatasource.tomcat.test-while-idle=true
spring.rdatasource.tomcat.validation-query=SELECT 1
spring.rdatasource.tomcat.time-between-eviction-runs-millis=360000
The above configuration works, but the number of database connections is exceeded. The user I am connecting with is limited to 10 connections.
When I send a request to the application, I get a query wait timeout error with "unable to create initial pool size".
I am using Tomcat connection pooling.
Please provide a solution so that the application runs within the 10-connection limit set at the database.
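One frequent cause of this symptom is that the custom property prefixes (especially the non-standard spring.rdatasource.tomcat) are never actually bound to the pools, so each pool silently falls back to Tomcat JDBC's defaults (max-active=100). A minimal sketch of binding each prefix explicitly; it assumes Spring Boot 1.5's DataSourceBuilder and that url/username/password also live under these prefixes:
import javax.sql.DataSource;
import org.springframework.boot.autoconfigure.jdbc.DataSourceBuilder;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class DataSourceConfig {

    @Bean
    @Primary
    // Binds spring.datasource.tomcat.* (max-active, max-idle, ...) to this pool.
    @ConfigurationProperties("spring.datasource.tomcat")
    public DataSource primaryDataSource() {
        return DataSourceBuilder.create()
                .type(org.apache.tomcat.jdbc.pool.DataSource.class)
                .build();
    }

    @Bean
    // Binds the custom spring.rdatasource.tomcat.* prefix to the second pool.
    @ConfigurationProperties("spring.rdatasource.tomcat")
    public DataSource readDataSource() {
        return DataSourceBuilder.create()
                .type(org.apache.tomcat.jdbc.pool.DataSource.class)
                .build();
    }
}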

Zuul timing out in long-ish requests

I am using a front-end Spring Cloud application (microservice) acting as a Zuul proxy (@EnableZuulProxy) to route requests from an external source to other internal microservices written using Spring Cloud (Spring Boot).
The Zuul server is straight out of the applications in the samples section:
@SpringBootApplication
@Controller
@EnableZuulProxy
@EnableDiscoveryClient
public class ZuulServerApplication {

    public static void main(String[] args) {
        new SpringApplicationBuilder(ZuulServerApplication.class).web(true).run(args);
    }
}
I ran this set of services locally and it all seems to work fine, but if I run it on a network with some load, or through a VPN, then I start to see Zuul forwarding errors, which show up as client timeouts in the logs.
Is there any way to change the timeout on the Zuul forwards so that I can eliminate this issue from my immediate concerns? What parameter settings are available for this?
In my case I had to change the following property:
zuul.host.socket-timeout-millis=30000
The properties to set are: ribbon.ReadTimeout in general and <service>.ribbon.ReadTimeout for a specific service, in milliseconds. The Ribbon wiki has some examples. This javadoc has the property names.
I have experienced the same problem: on long requests, Zuul's Hystrix command kept timing out after about a second, in spite of setting ribbon.ReadTimeout=10000.
I solved it by disabling timeouts completely:
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: false
An alternative that also works is to change Zuul's Hystrix isolation strategy to THREAD:
hystrix:
  command:
    default:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 10000
This worked for me; I had to set the connection and socket timeouts in application.yml:
zuul:
  host:
    connect-timeout-millis: 60000 # starting the connection
    socket-timeout-millis: 60000  # monitoring the continuous incoming data flow
I had to alter two timeouts to stop Zuul from timing out long-running requests. Even if Hystrix timeouts are disabled, Ribbon will still time out.
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: false
ribbon:
  ReadTimeout: 100000
  ConnectTimeout: 100000
If Zuul uses service discovery, you need to configure these timeouts with the ribbon.ReadTimeout and ribbon.SocketTimeout Ribbon properties.
If you have configured Zuul routes by specifying URLs, you need to use zuul.host.connect-timeout-millis and zuul.host.socket-timeout-millis.
By routes I mean:
zuul:
  routes:
    dummy-service:
      path: /dummy/**
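For contrast, a URL-based route, where the zuul.host.* timeouts apply instead, would look roughly like this (the URL is a hypothetical example):
zuul:
  routes:
    dummy-service:
      path: /dummy/**
      url: http://localhost:8081 # bypasses Ribbon, so zuul.host.* timeouts apply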
I had a similar issue: I was trying to set the timeout globally, and the order in which the Hystrix and Ribbon timeouts are configured also matters.
After spending plenty of time on it, I ended up with this solution. My service was taking up to 50 seconds because of a huge volume of data.
Points to consider before changing the default timeout values:
The Hystrix timeout should be greater than the combined Ribbon ReadTimeout and ConnectTimeout (in the working solution below, 95000 > 30000 + 60000).
Apply it to the specific service only; setting it globally doesn't work.
That is, use this:
command:
  your-service-name:
instead of this:
command:
  default:
Working solution:
hystrix:
  command:
    your-service-name:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 95000
your-service-name:
  ribbon:
    ConnectTimeout: 30000
    ReadTimeout: 60000
    MaxTotalHttpConnections: 500
    MaxConnectionsPerHost: 100
Only these settings in application.yml worked for me:
ribbon:
  ReadTimeout: 90000
  ConnectTimeout: 90000
  eureka:
    enabled: true
zuul:
  host:
    max-total-connections: 1000
    max-per-route-connections: 100
  semaphore:
    max-semaphores: 500
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 1000000
Hope it helps someone!