Zuul timing out in long-ish requests - spring-cloud

I am using a front-end Spring Cloud application (microservice) acting as a Zuul proxy (@EnableZuulProxy) to route requests from an external source to other internal microservices written with Spring Cloud (Spring Boot).
The Zuul server is straight out of the applications in the samples section:
@SpringBootApplication
@Controller
@EnableZuulProxy
@EnableDiscoveryClient
public class ZuulServerApplication {
    public static void main(String[] args) {
        new SpringApplicationBuilder(ZuulServerApplication.class).web(true).run(args);
    }
}
I ran this set of services locally and it all seems to work fine, but if I run it on a network with some load, or through a VPN, I start to see Zuul forwarding errors, which appear as client timeouts in the logs.
Is there any way to change the timeout on the Zuul forwards so that I can eliminate this issue from my immediate concerns? What accessible parameter settings are there for this?

In my case I had to change the following property:
zuul.host.socket-timeout-millis=30000

The properties to set are ribbon.ReadTimeout in general and <service>.ribbon.ReadTimeout for a specific service, in milliseconds. The Ribbon wiki has some examples, and the Ribbon javadoc lists the property names.
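For illustration, a minimal application.yml sketch (my-service is a placeholder service id, and the values are arbitrary) that sets the timeout globally and then overrides it for a single service could look like this:

ribbon:
  ReadTimeout: 10000        # global Ribbon read timeout for all clients, in milliseconds

my-service:                 # hypothetical service id as registered in discovery
  ribbon:
    ReadTimeout: 30000      # overrides the global value for this service only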

I have experienced the same problem: on long requests, Zuul's Hystrix command kept timing out after around a second despite setting ribbon.ReadTimeout=10000.
I solved it by disabling timeouts completely:
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: false
An alternative that also works is to change Zuul's Hystrix isolation strategy to THREAD:
hystrix:
  command:
    default:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 10000

This worked for me; I had to set the connection and socket timeouts in application.yml:
zuul:
  host:
    connect-timeout-millis: 60000 # starting the connection
    socket-timeout-millis: 60000  # monitoring the continuous incoming data flow

I had to alter two timeouts to stop Zuul from timing out long-running requests. Even if Hystrix timeouts are disabled, Ribbon will still time out.
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: false
ribbon:
  ReadTimeout: 100000
  ConnectTimeout: 100000

If Zuul uses service discovery, you need to configure these timeouts with the ribbon.ReadTimeout and ribbon.SocketTimeout Ribbon properties.
If you have configured Zuul routes by specifying URLs, you need to use zuul.host.connect-timeout-millis and zuul.host.socket-timeout-millis instead (see the sketch after the example below).
By routes I mean:
zuul:
  routes:
    dummy-service:
      path: /dummy/**
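To make the distinction concrete, here is a hedged sketch of the two cases as separate application.yml fragments (the route names, URL, and timeout values are placeholders):

# Fragment 1: discovery-based route - Ribbon owns the connection, so the Ribbon timeouts apply
zuul:
  routes:
    dummy-service:
      path: /dummy/**
ribbon:
  ReadTimeout: 30000
  SocketTimeout: 30000

# Fragment 2: URL-based route - Zuul's own HTTP client owns the connection,
# so the zuul.host.* timeouts apply instead
zuul:
  routes:
    external-service:                     # hypothetical route name
      path: /external/**
      url: http://example.com/external    # placeholder URL
  host:
    connect-timeout-millis: 30000
    socket-timeout-millis: 30000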

I had a similar issue: I was trying to set the timeout globally, and the order in which the Hystrix and Ribbon timeouts are set also matters.
After spending plenty of time, I ended up with this solution. My service was taking up to 50 seconds because of a huge volume of data.
Points to consider before changing the default timeout values:
The Hystrix timeout should be greater than the combined Ribbon ReadTimeout and ConnectTimeout (in the working solution below, 30000 + 60000 = 90000 ms, which stays under the 95000 ms Hystrix timeout).
Set the timeout for the specific service only; setting it globally doesn't work.
I mean use this:
command:
  your-service-name:
instead of this:
command:
  default:
Working solution:
hystrix:
  command:
    your-service-name:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 95000
your-service-name:
  ribbon:
    ConnectTimeout: 30000
    ReadTimeout: 60000
    MaxTotalHttpConnections: 500
    MaxConnectionsPerHost: 100

Only these settings in application.yml worked for me:
ribbon:
  ReadTimeout: 90000
  ConnectTimeout: 90000
  eureka:
    enabled: true
zuul:
  host:
    max-total-connections: 1000
    max-per-route-connections: 100
  semaphore:
    max-semaphores: 500
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 1000000
Hope it helps someone!

Related

Zuul1 can't release the underlying HttpClient's connection pool under a stress test

The Zuul1 version is 1.3.1.
I use JMeter to test a service through Zuul, with the number of threads set to 1000 and the loop count set to infinite. After a while, Zuul's circuit breaker opened, but the service never became accessible again. I found that the reason is that the underlying HttpClient's leased connections fill up the connection pool, although normally they should be released after the timeout. I wonder why the leased connections are not released?
My Zuul1 configuration is:
ribbon:
  ReadTimeout: 2000
  ConnectTimeout: 1000
  MaxTotalConnections: 200
  MaxConnectionsPerHost: 50
zuul:
  ribbonIsolationStrategy: THREAD
hystrix:
  command:
    default:
      execution:
        timeout:
          enabled: true
        isolation:
          thread:
            timeoutInMilliseconds: 8000
I found the answer: https://github.com/spring-cloud/spring-cloud-netflix/issues/2831
It's a bug in Zuul.

Is there any relationship between the Feign client 'readTimeout' and the hystrix.execution.isolation.thread.timeoutInMilliseconds configuration?

First of all, I'm sorry for my bad English :)
I have a question about the relationship between the Feign client's 'readTimeout' and 'connectTimeout' and the hystrix.execution.isolation.thread.timeoutInMilliseconds configuration.
I have used the 'thread' option instead of 'semaphore' for the isolation strategy.
Below are my relevant settings:
hystrix:
  threadpool:
    A:
      coreSize: 5
      maximumSize: 5
      allowMaximumSizeToDivergeFromCoreSize: true
feign:
  client:
    config:
      A:
        connectTimeout: 500
        readTimeout: 500
        loggerLevel: basic
I hope someone can give me an answer. 🙏
I found the answer: Hystrix's thread timeout takes precedence over the Feign client timeout.
Test case
1. Condition:
- the timeout related to Feign: 2s
- the timeout related to Hystrix's thread: 1s
2. Result:
- Feign's timeout never takes effect; the Hystrix thread timeout cuts the call off first at 1s. A configuration sketch for the opposite arrangement is shown below.
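As an illustration (not from the original post), a hedged application.yml sketch for client A that raises the Hystrix command timeout above Feign's timeouts, so that Feign's readTimeout becomes the effective limit, might look like this:

feign:
  client:
    config:
      A:
        connectTimeout: 500
        readTimeout: 2000
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 5000   # kept above 500 + 2000 so Hystrix does not cut the call short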

Feign+Ribbon retry does not work

I have built a Spring Cloud demo with Feign and Ribbon in it, but the retry function doesn't work. In the same program I also use RestTemplate, and in that case the retry function works well. Does Feign need extra configuration to use retry?
application.yml
spring:
  cloud:
    loadbalancer:
      retry:
        enabled: true
ribbon:
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 2
  OkToRetryOnAllOperations: true
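This question has no answer in the source. One hedged possibility worth checking is that Spring Cloud configures Feign with Retryer.NEVER_RETRY by default, so Feign's own retries stay disabled until a Retryer bean is registered. A minimal sketch under that assumption (the retry values are arbitrary):

import java.util.concurrent.TimeUnit;

import feign.Retryer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration: re-enables Feign's built-in retryer, which
// Spring Cloud replaces with Retryer.NEVER_RETRY by default.
@Configuration
public class FeignRetryConfiguration {

    @Bean
    public Retryer feignRetryer() {
        // up to 3 attempts, starting at a 100 ms interval and
        // backing off to at most 1 second between attempts
        return new Retryer.Default(100, TimeUnit.SECONDS.toMillis(1), 3);
    }
}

It is also worth verifying (again, an assumption) that spring-retry is on the classpath, since Ribbon's load-balanced retries in Spring Cloud depend on it.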

Spring Cloud | Feign Hystrix | First Call Timeout

I have a service that uses 3 Feign clients. Each time I start my application, I get a TimeoutException on the first call to any Feign client.
I have to trigger each Feign client at least once before everything is stable. Looking around online, the problem is that something inside Feign or Hystrix is lazily loaded, and the suggested solution was to create a configuration class that overrides the Spring defaults. I've tried that with the code below and it is still not helping; I still see the same issue. Does anyone know a fix for this? Is the only solution to call each Feign client twice via a Hystrix fallback?
@FeignClient(value = "SERVICE-NAME", configuration = ServiceFeignConfiguration.class)

@Configuration
public class ServiceFeignConfiguration {

    @Value("${service.feign.connectTimeout:60000}")
    private int connectTimeout;

    @Value("${service.feign.readTimeOut:60000}")
    private int readTimeout;

    @Bean
    public Request.Options options() {
        return new Request.Options(connectTimeout, readTimeout);
    }
}
Spring Cloud - Brixton.SR4
Spring Boot - 1.4.0.RELEASE
This is all running in docker
Ubuntu - 12.04
Docker - 1.12.1
Docker-Compose - 1.8
I found the solution to be that the default Hystrix properties are not good: the timeout window is very small, and the request will always time out on the first try. I added these properties to the application.yml file in my config service, and now all of my services can use Feign with no problems and I don't have to code around the first-call timeout.
hystrix:
  threadpool.default.coreSize: "20"
  threadpool.default.maxQueueSize: "500000"
  threadpool.default.keepAliveTimeMinutes: "2"
  threadpool.default.queueSizeRejectionThreshold: "500000"
  command:
    default:
      fallback.isolation.semaphore.maxConcurrentRequests: "20"
      execution:
        timeout:
          enabled: "false"
        isolation:
          strategy: "THREAD"
          thread:
            timeoutInMilliseconds: "30000"
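An alternative, hedged workaround for the lazy-initialization symptom described in the question (not part of the original answer) is to warm each Feign client up once at startup, so the expensive first call does not happen on real traffic. ServiceClient and its ping() method below are hypothetical placeholders:

import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical warm-up: call each Feign client once at startup so the lazily
// initialized Feign/Hystrix/Ribbon components are created before real requests arrive.
@Configuration
public class FeignWarmUpConfiguration {

    @Bean
    public CommandLineRunner feignWarmUp(ServiceClient serviceClient) {   // ServiceClient is a placeholder Feign interface
        return args -> {
            try {
                serviceClient.ping();   // ping() is a hypothetical lightweight endpoint
            } catch (Exception e) {
                // Ignore failures; the goal is only to trigger initialization.
            }
        };
    }
}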

Spring Cloud sidecar cannot un-register a Node.js service once it is shut down

I suspect this is an issue; can anyone help check it?
In my sidecar application, I have this application.yml:
server:
  port: 5678
spring:
  application:
    name: nodeservice
sidecar:
  port: ${nodeServer.instance.port:3000}
  health-uri: http://localhost:${nodeServer.instance.port:3000}/app/health.json
eureka:
  instance:
    hostname: ${host.instance.name:localhost}
    leaseRenewalIntervalInSeconds: 5 # default is 30, recommended to keep default
    metadataMap:
      instanceId: ${spring.application.name}:${spring.application.instance_id:${random.value}}
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/
And in my main Spring config app, I have:
String url_node = "";
try {
    InstanceInfo instance = discoveryClient.getNextServerFromEureka("nodeservice", false);
    // InstanceInfo instance = discoveryClient.getNextServerFromEureka("foo", false);
    url_node = instance.getHomePageUrl();
} catch (Exception e) {
    // lookup failures are swallowed here
}
Now when I start my Node.js server, I see this in the Spring app:
url for nodeService is: http://SJCC02MT0NUFD58.local:3000/
This is perfect, but after I shut down my Node.js server, the http://localhost:3000/app/health.json URL is completely down, BUT in the main Java Spring app I still see the same output.
So it seems that even though the Node.js service is no longer available, Eureka still remembers it in memory.
Is anything wrong with my configuration?
Another question: why is the URL discovered by Spring http://SJCC02MT0NUFD58.local:3000/ and not http://localhost:3000? I already configured Eureka.server.instance.host to be localhost.
Thanks
You are seeing the appropriate behavior. Eureka and Ribbon are built to be very resilient (AP in CAP). In the case you described, a service had at least one instance and then there were none; the Ribbon Eureka client keeps the last known list of servers around as a last resort. You're just printing the names; if you try to connect to that service, it will fail. This is where you use the Hystrix circuit breaker, which can provide a fallback in the case that no instances are up, as in the sketch below.
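As an illustration of that last point (a hedged sketch, not from the original answer), the lookup can be wrapped in a Hystrix command so that a stale Eureka entry degrades into a fallback instead of an error. NodeServiceClient and the fallback body are hypothetical, and the application is assumed to have @EnableCircuitBreaker and a @LoadBalanced RestTemplate:

import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Hypothetical client: "nodeservice" is resolved through Ribbon/Eureka; if the last
// known instance is gone, the call fails and Hystrix routes to the fallback instead.
@Service
public class NodeServiceClient {

    private final RestTemplate restTemplate;

    public NodeServiceClient(RestTemplate restTemplate) {   // assumed to be @LoadBalanced
        this.restTemplate = restTemplate;
    }

    @HystrixCommand(fallbackMethod = "healthFallback")
    public String health() {
        return restTemplate.getForObject("http://nodeservice/app/health.json", String.class);
    }

    // Fallback used when no instance answers (for example, a stale Eureka entry)
    public String healthFallback(Throwable cause) {
        return "{\"status\":\"DOWN\"}";
    }
}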