Default timeout value of istio - kubernetes

I have a service in which I have added a delay of 5 minutes, so a request to this service takes 5 minutes to return a response.
I have deployed this service on Kubernetes with Istio v1.5. When I call it through the ingress gateway, the request times out after 3 minutes:
{"res_tx_duration":"-","route_name":"default","user_agent":"grpc-java-netty/1.29.0","response_code_details":"-","start_time":"****","request_id":"****","method":"POST","upstream_host":"127.0.0.1:6565","x_forwarded_for":"****","bytes_sent":"0","upstream_cluster":"****","downstream_remote_address":"****","req_duration":"0","path":"/****","authority":"****","protocol":"HTTP/2","upstream_service_time":"-","upstream_local_address":"-","duration":"180000","upstream_transport_failure_reason":"-","response_code":"0","res_duration":"-","response_flags":"DC","bytes_received":"5"}
I tried to set the timeout in the VirtualService to more than 3 minutes, but that is not working. Only timeouts of less than 3 minutes set in the VirtualService take effect.
route:
- destination:
    host: demo-service
    port:
      number: 8000
timeout: 240s
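For context, the route above sits inside a VirtualService roughly like this (the gateway and host names are placeholders for my actual setup):
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: demo-service
spec:
  hosts:
  - "*"
  gateways:
  - demo-gateway
  http:
  - route:
    - destination:
        host: demo-service
        port:
          number: 8000
    timeout: 240s   # per-route timeout that I expected to override the 3-minute cutoff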
Is there any other configuration where we can set the timeout, other than the VirtualService?
Is 3 minutes (180s) the maximum value we can set in the VirtualService?

Related

Availability with Kubernetes

We run an internal healthcheck of the service every 5 seconds, and we run Kubernetes liveness probes every 1 second. So in the worst case the Kubernetes load balancer has up-to-date information every 6 seconds.
My question is: what happens when a client request hits a pod that is broken but not yet seen as unhealthy by the load balancer? Should the client implement retry logic, or should we implement backend logic to handle the case where a request hits a pod that is not yet marked as unhealthy?
I'm not sure what your architecture looks like, but load balancers are generally set up together with an ingress controller such as Nginx.
The load balancer backed by the ingress controller forwards traffic to the Kubernetes Service, and it is mostly the Kubernetes Service, not the load balancer, that manages request routing to the pods.
The Kubernetes Service routes requests to pods based on readiness, so if your pod is NotReady, requests won't reach it. If a request still reaches that pod due to some propagation delay, there is a chance you get an internal error or similar in return; a readiness probe sketch is shown below.
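As a sketch, readiness is driven by a probe like the following (the path, port and timings are illustrative, not taken from your setup):
readinessProbe:
  httpGet:
    path: /healthz        # health endpoint exposed by the service
    port: 8080
  periodSeconds: 5        # matches the 5-second internal healthcheck interval
  failureThreshold: 1     # mark the pod NotReady after a single failed check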
Retries
Yes, you can implement retries on the client side, but if you are on Kubernetes you can offload the retry logic to a service mesh; that makes it easier to maintain and to integrate with Kubernetes.
You can use a service mesh like Istio and implement the retry policy at the VirtualService level:
retries:
  attempts: 5
  retryOn: 5xx
Virtual service
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    retries:
      attempts: 3
      perTryTimeout: 2s
Read more at: https://istio.io/latest/docs/concepts/traffic-management/#retries

Disable Istio default retry strategy (at least on POST requests)

I have an application (microservices-based) running on Kubernetes with Istio 1.7.4.
The microservices have their own mechanisms for transaction compensation on integration failures.
But Istio is retrying requests when some integrations respond with a 503 status code. I need to disable this (at least on POST, which is non-idempotent)
and let the application take care of it.
I've tried so many ways without success. Can someone help me?
Documentation
From the Istio retries documentation: the default retry count is hardcoded and its value is 2.
The interval between retries (25ms+) is variable and determined
automatically by Istio, preventing the called service from being
overwhelmed with requests. The default retry behavior for HTTP
requests is to retry twice before returning the error.
Btw, it was initially 10, but was decreased to 2 in the "Enable retries for specific status codes and reduce num retries to 2" commit.
A workaround is to use virtual services:
you can adjust your retry settings on a per-service basis in virtual
services without having to touch your service code. You can also
further refine your retry behavior by adding per-retry timeouts,
specifying the amount of time you want to wait for each retry attempt
to successfully connect to the service.
Examples
The following example configures a maximum of 3 retries to connect to this service subset after an initial call failure, each with a 2-second timeout.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    retries:
      attempts: 3
      perTryTimeout: 2s
Your case: disabling retries. Taken from "Disable globally the default retry policy":
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: no-retries-for-one-service
spec:
  hosts:
  - one-service.default.svc.cluster.local
  http:
  - retries:
      attempts: 0
    route:
    - destination:
        host: one-service.default.svc.cluster.local
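If you only want to switch retries off for POST requests and keep the defaults elsewhere, a hedged variation is to add a method match; the field names follow Istio's HTTPMatchRequest, but this exact combination is a sketch, not a tested configuration:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: no-retries-for-posts
spec:
  hosts:
  - one-service.default.svc.cluster.local
  http:
  - match:
    - method:
        exact: POST          # only POST requests take this route
    retries:
      attempts: 0            # no retries for non-idempotent requests
    route:
    - destination:
        host: one-service.default.svc.cluster.local
  - route:                   # every other method keeps the default behavior
    - destination:
        host: one-service.default.svc.cluster.local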

Kubernetes liveness probe

How can I write a Kubernetes readiness probe for my Spring Boot application, which takes around 20 seconds to start up? I tried to follow the example from Configure Liveness, Readiness and Startup Probes, but I'm not sure how Kubernetes figures out that status code 200 means success.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: backend
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
The kubelet makes an HTTP request to the /healthz path of your application and expects HTTP status code 200 from that endpoint for the probe to be successful. So you need a REST endpoint (in a REST controller) that returns 200 from /healthz. An easy way to achieve this is to include the Spring Boot Actuator dependency and change the liveness probe path to /actuator/health/liveness; Spring Boot Actuator by default exposes an endpoint that returns 200 from /actuator/health/liveness.
https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#production-ready-kubernetes-probes
The initialDelaySeconds field tells the kubelet to wait that many seconds before performing the first probe, so it is generally set to the time the container/pod takes to start.
In your case, configure initialDelaySeconds: 20, i.e. a value of 20 seconds.
Kubernetes considers any response code from 200 to 399 a successful probe. In your case you can add an initial delay so the probe only starts after 20 seconds.
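Putting the suggestions together, a probe block for a Spring Boot application using Actuator might look like this (the paths assume Actuator's default configuration and the port is illustrative):
livenessProbe:
  httpGet:
    path: /actuator/health/liveness    # provided by Spring Boot Actuator
    port: 8080
  initialDelaySeconds: 20              # app takes ~20 seconds to start
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness   # provided by Spring Boot Actuator
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 10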

How to avoid coredns resolving overhead in kubernetes

I think the title is pretty much self-explanatory. I have done many experiments, and the sad truth is that coredns adds a 20 ms overhead to every request inside the cluster. At first we thought that by adding more replicas and balancing the resolving requests across more instances we could improve the response time, but it did not help at all (we scaled up from 2 pods to 4 pods).
There was some improvement in the fluctuation of resolution times after scaling up to 4 instances, but it wasn't what we were expecting, and the 20 ms overhead was still there.
We have some web services whose actual response time is < 30 ms, so with coredns we are doubling the response time, and that is not cool!
After coming to this conclusion, we ran an experiment to double-check that this is not an OS-level overhead, and the results were no different from what we were expecting.
We thought maybe we could implement/deploy a solution based on putting a list of the needed hostname mappings inside each pod's /etc/hosts (a rough sketch of that idea is included after the questions below). So my final questions are as follows:
Has anyone else experienced something similar with coredns?
Can you please suggest alternative solutions to coredns that work in k8s environment?
Any thoughts or insights are appreciated. Thanks in advance.
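For reference, the /etc/hosts idea above could be expressed declaratively with Kubernetes hostAliases; a rough sketch (the IP and hostnames are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: web-service
spec:
  hostAliases:                    # entries are appended to the pod's /etc/hosts
  - ip: "10.0.0.10"
    hostnames:
    - "billing-service"
    - "billing-service.default.svc.cluster.local"
  containers:
  - name: app
    image: example/web-service:latest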
There are several things to look at when running CoreDNS in your Kubernetes cluster:
Memory
AutoPath
Number of Replicas
Autoscaler
Other Plugins
Prometheus metrics
Separate Server blocks
Memory
The recommended amount of memory for CoreDNS replicas is:
MB required (default settings) = (Pods + Services) / 1000 + 54
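For example, a hypothetical cluster with 5000 pods and 1000 services would need roughly (5000 + 1000) / 1000 + 54 = 60 MB per replica.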
Autopath
Autopath is a CoreDNS feature that improves the response time for external queries by resolving the cluster search path on the server side.
Normally a DNS query goes through:
<namespace>.svc.cluster.local
svc.cluster.local
cluster.local
and then the configured forward, usually the host's search path (/etc/resolv.conf):
Trying "example.com.default.svc.cluster.local"
Trying "example.com.svc.cluster.local"
Trying "example.com.cluster.local"
Trying "example.com"
Trying "example.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55265
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;example.com. IN A
;; ANSWER SECTION:
example.com. 30 IN A 93.184.216.34
This requires more memory so the calculation now becomes
MB required (w/ autopath) = (Number of Pods + Services) / 250 + 56
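With autopath enabled, the same hypothetical cluster (5000 pods, 1000 services) would need roughly (5000 + 1000) / 250 + 56 = 80 MB per replica.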
Number of replicas
Defaults to 2 but enabling the Autoscaler should help with load issues.
Autoscaler
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: default
spec:
  maxReplicas: 20
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  targetCPUUtilizationPercentage: 50
Node local cache
Beta in Kubernetes 1.15
NodeLocal DNSCache improves Cluster DNS performance by running a dns caching agent on cluster nodes as a DaemonSet. In today’s architecture, Pods in ClusterFirst DNS mode reach out to a kube-dns serviceIP for DNS queries. This is translated to a kube-dns/CoreDNS endpoint via iptables rules added by kube-proxy. With this new architecture, Pods will reach out to the dns caching agent running on the same node, thereby avoiding iptables DNAT rules and connection tracking. The local caching agent will query kube-dns service for cache misses of cluster hostnames(cluster.local suffix by default).
https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
Other Plugins
These will also help you see what is going on inside CoreDNS:
errors - any errors encountered during query processing are printed to standard output.
trace - enables OpenTracing of how a request flows through CoreDNS.
log - query logging.
health - reports that CoreDNS is up and running; returns a 200 OK HTTP status code.
ready - an HTTP endpoint on port 8181 that returns 200 OK once all plugins that are able to signal readiness have done so.
The health and ready endpoints should be used in the deployment's probes:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: /ready
    port: 8181
    scheme: HTTP
Prometheus Metrics
Prometheus Plugin
coredns_health_request_duration_seconds{} - duration to process an HTTP query to the local /health endpoint. As this is a local operation, it should be fast. A (large) increase in this duration indicates that the CoreDNS process is having trouble keeping up with its query load.
https://github.com/coredns/deployment/blob/master/kubernetes/Scaling_CoreDNS.md
Separate Server blocks
One last bit of advice is to separate the cluster DNS server block from the external server block:
CLUSTER_DOMAIN REVERSE_CIDRS {
    errors
    health
    kubernetes
    ready
    prometheus :9153
    loop
    reload
    loadbalance
}

. {
    errors
    autopath @kubernetes
    forward . UPSTREAMNAMESERVER
    cache
    loop
}
More information about the kubernetes plugin and other options can be found here:
https://github.com/coredns/coredns/blob/master/plugin/kubernetes/README.md

How to debug failed requests with client_disconnected_before_any_response

We have an HTTP(S) load balancer created by a Kubernetes Ingress, which points to a backend formed by a set of pods running nginx and Ruby on Rails.
Looking at the load balancer logs, we have detected an increasing number of requests with a response code of 0 and statusDetails = client_disconnected_before_any_response.
We're trying to understand why this is happening, but we haven't found anything relevant. There is nothing in the nginx access or error logs.
This is happening for multiple kinds of requests, from GET to POST.
We also suspect that sometimes, despite the request being logged with that error, the request is actually passed to the backend. For instance, we're seeing PG::UniqueViolation errors due to identical sign-up requests being sent twice to the backend on our sign-up endpoint.
Any kind of help would be appreciated. Thanks!
UPDATE 1
As requested, here is the YAML file for the Ingress resource:
UPDATE 2
I've created a log-based Stackdriver metric to count the number of requests that show this behavior. Here is the chart:
The big peaks approximately match the timestamps of these Kubernetes events:
Full error: Readiness probe failed: Get http://10.48.1.28:80/health_check: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
So it seems the readiness probe for the pods behind the backend sometimes fails, but not always.
Here is the definition of the readinessProbe:
readinessProbe:
  failureThreshold: 3
  httpGet:
    httpHeaders:
    - name: X-Forwarded-Proto
      value: https
    - name: Host
      value: [redacted]
    path: /health_check
    port: 80
    scheme: HTTP
  initialDelaySeconds: 1
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 5
A response code of 0 with statusDetails = client_disconnected_before_any_response means the client closed the connection before the load balancer was able to provide a response, as per this GCP documentation.
Investigating why it did not respond in time, one of the reasons could be a mismatch between the keepalive timeouts of nginx and the GCP load balancer, even though that would most likely produce a backend_connection_closed_before_data_sent_to_client error caused by a 502 Bad Gateway race condition.
To make sure the backend responds to the request, and to see how long it takes, you can repeat this process a couple of times (since you still get some valid responses):
curl response time
$ curl -w "@curl.txt" -o /dev/null -s IP_HERE
curl.txt content (create and save this file first):
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_appconnect: %{time_appconnect}\n
time_pretransfer: %{time_pretransfer}\n
time_redirect: %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
----------\n
time_total: %{time_total}\n
If this is the case, please review the sign-up endpoint code for any kind of retry loop that could explain the PG::UniqueViolation errors that you mentioned.