Is there any relationship between the Feign client 'readTimeout' and the hystrix execution.isolation.thread.timeoutInMilliseconds configuration? - spring-cloud

First of all, I'm sorry for my bad English :)
I have a question about the relationship between the Feign client's 'readTimeout' and 'connectTimeout' and the hystrix execution.isolation.thread.timeoutInMilliseconds configuration.
I am using the 'thread' isolation strategy rather than 'semaphore'.
Below are my relevant settings.
hystrix:
  threadpool:
    A:
      coreSize: 5
      maximumSize: 5
      allowMaximumSizeToDivergeFromCoreSize: true
feign:
  client:
    config:
      A:
        connectTimeout: 500
        readTimeout: 500
        loggerLevel: basic
I hope you can give me an answer. 🙏

I found the answer: Hystrix's thread timeout takes precedence over the Feign client timeouts.
Hystrix's thread timeout
Test case
1. Condition:
- Feign-related timeout: 2s
- Hystrix thread timeout: 1s
2. Result
- Feign's timeout never takes effect; Hystrix cuts the call off after 1s (see the sketch below).
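In other words, for the Feign timeouts to be the effective limit, the Hystrix command timeout has to be at least roughly connectTimeout + readTimeout. A minimal sketch of what that could look like for the settings above; the Hystrix command key 'A' and the 1100ms value are illustrative assumptions (with Feign, the real command key usually follows an Interface#method() pattern):
hystrix:
  command:
    A:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 1100   # >= connectTimeout + readTimeout (500 + 500); illustrative value
feign:
  client:
    config:
      A:
        connectTimeout: 500
        readTimeout: 500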

Related

Hikari CP connections are suddenly invalidated

Hi Stack Overflow family,
We have a Kotlin & Spring Boot application that uses a single PostgreSQL DB instance (1 GB memory, instance class db.t3.micro) hosted in AWS. For the last couple of days, the connections in my pool have suddenly been invalidated (2-3 times a day) and the pool size drops drastically. In summary:
Let's say everything is normal in Hikari and connections are closed and added according to maxLifetime, which is 30 minutes by default, and the logs look like this:
HikariPool-1 - Pool stats (total=40, active=0, idle=40, waiting=0)
HikariPool-1 - Fill pool skipped, pool is at sufficient level.
Suddenly most of the connections become invalidated, let's say 30 out of 40. The connections are closed before they reach their max lifetime, and the logs look like this for all closed connections:
HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#5257d7b2 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
HikariPool-1 - Closing connection org.postgresql.jdbc.PgConnection#7b673105: (connection is dead)
Additionally, these messages are followed by multiple logs like the one below:
Add connection elided, waiting 6, queue 13
And the timeout failure stats look like this:
HikariPool-1 - Timeout failure stats (total=12, active=12, idle=0, waiting=51)
Finally, I am left with lots of connection timeouts because no connection was available for most of the requests:
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms
I have added leak-detection-threshold, and it also logs the following while the problem is happening:
Connection leak detection triggered for org.postgresql.jdbc.PgConnection#3bb5f155 on thread http-nio-8080-exec-482, stack trace follows
java.lang.Exception: Apparent connection leak detected
The Hikari config is as follows:
hikari:
  data-source-properties: stringtype=unspecified
  maximum-pool-size: 40
  leak-detection-threshold: 30000
When this problem happens, queries in PostgreSQL also take a lot of time: 8-9 seconds, increasing to 15-35 seconds, and some queries even 55-65 seconds (queries that usually take 1-3 seconds at most). That is why we think it is not a query issue.
Some sources suggest using try-with-resources, but that does not apply to us, as we do not obtain connections manually. Increasing the max pool size from 20 to 40 also did not help. I would really appreciate any comment or hint, as we have been dealing with this issue for almost a week.
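For reference, the "consider using a shorter maxLifetime value" log line above points at one knob worth experimenting with: Hikari's max-lifetime (and, on HikariCP 4.x+, keepalive-time), kept below whatever connection lifetime the database or anything in between enforces. A hedged sketch of the config above with those properties added; the 25-minute and 5-minute values are assumptions, not from the question:
hikari:
  data-source-properties: stringtype=unspecified
  maximum-pool-size: 40
  leak-detection-threshold: 30000
  max-lifetime: 1500000    # 25 minutes, assumed to sit below any server-side connection limit
  keepalive-time: 300000   # 5 minutes; periodically validates idle connections (HikariCP 4.x+)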

Zuul1 can't release the underlying httpclient's connection pool in pressure test

Zuul 1's version is 1.3.1.
I use JMeter to test a service through Zuul, with the number of threads set to 1000 and the loop count set to infinite. After a while, Zuul's circuit breaker opened, but the route never became accessible again. I found that the reason is that the underlying HttpClient's connection pool is full of leased connections; normally they should be released after the timeout. I wonder why the leased connections are not released?
My Zuul 1 configuration is:
ribbon:
  ReadTimeout: 2000
  ConnectTimeout: 1000
  MaxTotalConnections: 200
  MaxConnectionsPerHost: 50
zuul:
  ribbonIsolationStrategy: THREAD
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: true
        isolation:
          thread:
            timeoutInMilliseconds: 8000
I found the answer: https://github.com/spring-cloud/spring-cloud-netflix/issues/2831
It's a bug in Zuul.

How can one configure retry for IOExceptions in Spring Cloud Gateway?

I see that the Retry filter supports retries based on HTTP status codes. I would like to configure retries in case of IO exceptions such as connection resets. Is that possible with Spring Cloud Gateway 2?
I was using 2.0.0.RC1. It looks like the latest build snapshot has support for retry based on exceptions. Fingers crossed for the next release.
Here is an example that retries twice for 500 series errors or IOExceptions:
filters:
- name: Retry
  args:
    retries: 2
    series:
    - SERVER_ERROR
    exceptions:
    - java.io.IOException

Akka http server dispatcher number constantly increasing

I'm testing an Akka HTTP service on AWS ECS. Each instance is added to a load balancer, which regularly makes requests to a health check route. Since this is a test environment, I can ensure that no other traffic is going to the server. I notice debug logs indicating that the "default dispatcher" number is consistently increasing:
[DEBUG] [01/03/2017 22:33:03.007] [default-akka.actor.default-dispatcher-41200] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:29.142] [default-akka.actor.default-dispatcher-41196] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:33.035] [default-akka.actor.default-dispatcher-41204] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:59.174] [default-akka.actor.default-dispatcher-41187] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:03.066] [default-akka.actor.default-dispatcher-41186] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:29.204] [default-akka.actor.default-dispatcher-41179] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:33.097] [default-akka.actor.default-dispatcher-41210] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
This trend is never reversed and will get up into the tens of thousands pretty soon. Is this normal behavior or indicative of an issue?
Edit: I've updated the log snippet to show that the dispatcher thread number goes way beyond what I would expect.
Edit #2: Here is the health check route code:
class HealthCheckRoutes()(implicit executionContext: ExecutionContext)
    extends LogHelper {

  val routes = pathPrefix("health-check") {
    pathEndOrSingleSlash {
      complete(OK -> "Ok")
    }
  }
}
Probably, yes. I think that's the thread name.
If you do a thread dump on the server, does it have a great many open threads?
It looks like your server is leaking a thread per connection.
(It will probably be much easier to debug and diagnose this on your development machine, rather than on the EC2 VM. Try to reproduce it locally.)
For your question, check this comment:
Akka http server dispatcher number constantly increasing
About the dispatcher:
It is no problem to use the default dispatcher for operations like health checks.
Threads are controlled by the dispatcher you specify, or by default-dispatcher if none is specified.
default-dispatcher is configured as follows, which means the thread pool size is ceil(number of processors * 3), bounded between 8 and 64.
default-dispatcher {
  type = "Dispatcher"
  executor = "default-executor"
  default-executor {
    fallback = "fork-join-executor"
  }
  fork-join-executor {
    # Min number of threads to cap factor-based parallelism number to
    parallelism-min = 8
    # The parallelism factor is used to determine thread pool size using the
    # following formula: ceil(available processors * factor). Resulting size
    # is then bounded by the parallelism-min and parallelism-max values.
    parallelism-factor = 3.0
    # Max number of threads to cap factor-based parallelism number to
    parallelism-max = 64
    # Setting to "FIFO" to use queue like peeking mode which "poll" or "LIFO" to use stack
    # like peeking mode which "pop".
    task-peeking-mode = "FIFO"
  }
}
Dispatcher documentation:
http://doc.akka.io/docs/akka/2.4.16/scala/dispatchers.html
Configuration reference:
http://doc.akka.io/docs/akka/2.4.16/general/configuration.html#akka-actor
BTW, for operations that take a long time and block other operations, here is how to specify a custom dispatcher for them in Akka HTTP:
http://doc.akka.io/docs/akka-http/current/scala/http/handling-blocking-operations-in-akka-http-routes.html
According to this akka-http github issue there doesn't seem to be a problem: https://github.com/akka/akka-http/issues/722

Zuul timing out in long-ish requests

I am using a front-end Spring Cloud application (microservice) acting as a Zuul proxy (@EnableZuulProxy) to route requests from an external source to other internal microservices written using Spring Cloud (Spring Boot).
The Zuul server is straight out of the applications in the samples section
@SpringBootApplication
@Controller
@EnableZuulProxy
@EnableDiscoveryClient
public class ZuulServerApplication {

    public static void main(String[] args) {
        new SpringApplicationBuilder(ZuulServerApplication.class).web(true).run(args);
    }
}
I ran this set of services locally and it all seems to work fine but if I run it on a network with some load, or through a VPN, then I start to see Zuul forwarding errors, which I am seeing as client timeouts in the logs.
Is there any way to change the timeout on the Zuul forwards so that I can eliminate this issue from my immediate concerns? What accessible parameter settings are there for this?
In my case I had to change the following property:
zuul.host.socket-timeout-millis=30000
The properties to set are: ribbon.ReadTimeout in general and <service>.ribbon.ReadTimeout for a specific service, in milliseconds. The Ribbon wiki has some examples. This javadoc has the property names.
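As a minimal sketch of those two forms in application.yml (the service name some-service and the millisecond values are placeholders, not from the answer):
ribbon:
  ReadTimeout: 10000        # global default, in milliseconds

some-service:
  ribbon:
    ReadTimeout: 20000      # overrides the global value for calls to this service only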
I have experienced the same problem: in long requests, Zuul's hystrix command kept timing out after around a second in spite of setting ribbon.ReadTimeout=10000.
I solved it by disabling timeouts completely:
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: false
An alternative that also works is to change Zuul's Hystrix isolation strategy to THREAD:
hystrix:
  command:
    default:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 10000
This worked for me; I had to set the connection and socket timeouts in application.yml:
zuul:
  host:
    connect-timeout-millis: 60000 # starting the connection
    socket-timeout-millis: 60000  # monitor the continuous incoming data flow
I had to alter two timeouts to force Zuul to stop timing out long-running requests. Even if Hystrix timeouts are disabled, Ribbon will still time out.
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: false
ribbon:
  ReadTimeout: 100000
  ConnectTimeout: 100000
If Zuul uses service discovery, you need to configure these timeouts with the ribbon.ReadTimeout and ribbon.SocketTimeout Ribbon properties.
If you have configured Zuul routes by specifying URLs, you need to use zuul.host.connect-timeout-millis and zuul.host.socket-timeout-millis
By routes I mean:
zuul:
  routes:
    dummy-service:
      path: /dummy/**
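Putting the two cases together, here is a hedged sketch (service names, URL, and timeout values are placeholders): timeouts for a discovery-based route are configured through Ribbon, while timeouts for a URL-based route go through zuul.host.*:
zuul:
  routes:
    discovered-service:              # resolved via service discovery -> Ribbon timeouts apply
      path: /discovered/**
      serviceId: discovered-service
    url-service:                     # fixed URL -> zuul.host.* timeouts apply
      path: /url/**
      url: http://legacy.example.com
  host:
    connect-timeout-millis: 10000
    socket-timeout-millis: 10000

discovered-service:
  ribbon:
    ConnectTimeout: 10000
    ReadTimeout: 10000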
I had a similar issue: I was trying to set the timeout globally, and the order in which the Hystrix and Ribbon timeouts are set also matters.
After spending plenty of time, I ended up with this solution. My service was taking up to 50 seconds because of a huge volume of data.
Points to consider before changing the default timeout values:
The Hystrix timeout should be greater than the combined Ribbon ReadTimeout and ConnectTimeout.
Set it for the specific service only, i.e. don't set it globally (which doesn't work).
I mean use this:
command:
  your-service-name:
instead of this:
command:
  default:
Working solution:
hystrix:
  command:
    your-service-name:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 95000

your-service-name:
  ribbon:
    ConnectTimeout: 30000
    ReadTimeout: 60000
    MaxTotalHttpConnections: 500
    MaxConnectionsPerHost: 100
Reference
Only these settings in application.yml worked for me:
ribbon:
  ReadTimeout: 90000
  ConnectTimeout: 90000
  eureka:
    enabled: true
zuul:
  host:
    max-total-connections: 1000
    max-per-route-connections: 100
  semaphore:
    max-semaphores: 500
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 1000000
Hope it helps someone!