Kubernetes randomly refuses requests to services

I recently started to experience a lot of failed connections between pods on my Kubernetes cluster (v1.8.3-gke.0).
Under load (400+ requests per second), requests to a service backed by 200 pods spread across machines with enough resources have a failure rate between 1 and 10 percent, which is clearly problematic.
The HTTP request doesn't fail with a 4xx or 5xx error status, it's just dropped or refused at some point.
Note that the pods are far from their maximum capacity; their CPU usage is rarely over 200 millicores.
Even without heavy load, I have observed that a lot of requests fail randomly on services other than the one above, so I suspect an issue at the cluster level (Docker? Kubernetes? kernel?).
I ran some curl benchmarks to measure the failure rate.
When an HTTP request fails while running curl requests in a loop, the error displayed is curl: (7) Failed to connect to 10.x.x.x port 80: Connection refused.
Our production code reports a similar error: Cannot connect to host svc:80 ssl:False [Connect call failed ('10.x.x.x', 80)], although most requests succeed.
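For reference, here is a minimal Go sketch equivalent to the curl loop I use to measure failures (the service IP, request count, and timeout below are placeholders, not my real values):

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	const total = 1000
	failures := 0
	for i := 0; i < total; i++ {
		// 10.x.x.x is a placeholder for the Service ClusterIP, as in the curl test above.
		resp, err := client.Get("http://10.x.x.x:80/")
		if err != nil {
			// Counts dropped/refused connections, i.e. the equivalent of curl error (7).
			failures++
			continue
		}
		resp.Body.Close()
	}
	fmt.Printf("failed %d of %d requests (%.2f%%)\n", failures, total, 100*float64(failures)/float64(total))
}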
Do you have any idea what is going wrong, or how I can track this issue down?

Related

Delayed Unauthorized responses from AKS

We use AKS's kube-apiserver for leader election from a VM that is external to the k8s cluster but in the same VNET. We use a k8s client from the client-go package. The client tries to get/update the lease object every 2 sec. We observe occasional failures (every few hours) caused by delayed "Unauthorized" responses. The client-go HTTP client refreshes creds/tokens when it receives an Unauthorized response. However, when the Unauthorized response is extremely delayed (sometimes up to 30 sec!), the timeouts of the HTTP client or the leader-election mechanism kick in.
We could tune the HTTP client and leader-election timeouts, but we do need a fast failover (e.g., up to 30 sec), so it would be great to eliminate these delays.
What is the reason these Unauthorized responses get so delayed?
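For context, the leader-election setup looks roughly like this (a minimal client-go sketch; the kubeconfig handling, lease name, namespace, identity, and durations are illustrative, not our exact values):

package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	cfg.Timeout = 10 * time.Second // per-request timeout of the underlying HTTP client
	client := kubernetes.NewForConfigOrDie(cfg)

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "my-lease", Namespace: "default"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: "external-vm-1"},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // how long a lease is considered valid
		RenewDeadline: 10 * time.Second, // the leader must renew within this window or give up
		RetryPeriod:   2 * time.Second,  // the "get/update the lease every 2 sec" cadence
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { /* do leader work */ },
			OnStoppedLeading: func() { /* trigger failover handling */ },
		},
	})
}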

(Kubernetes HPA) Scaling up pods got connection refused

I have some traffic that can push CPU usage up to 180%. I tried using a single pod, which works, but responses were extremely slow. When I configured my HPA with cpu=80%, min=1 and max={2 or more}, I hit connection refused while the HPA was creating more pods. When I set a larger value for min (e.g. min=3), the connection refused errors eased, but there were too many idle pods when traffic was low. Is there any way to keep a pod from being put online until it has completely started?
I hit connection refused when HPA was creating more pods
Kubernetes uses the readinessProbe to decide whether to route client traffic to a Pod. If a Pod's readinessProbe is not successful, any Service whose selectors match that Pod will not include it among its endpoints.
If there is no readinessProbe defined, or if it is misconfigured, Pods that are still starting up may end up serving client requests. Connection refused suggests that no process was listening yet for incoming connections.
Please share your deployment/statefulset/..., if you need further assistance setting this up.
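To make this concrete, here is a minimal sketch of an HTTP readinessProbe, written with the Go client-go/API types (the YAML form uses the same field names); the path, port, and timings are assumptions, not taken from the question:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	container := corev1.Container{
		Name:  "app",
		Image: "example/app:latest", // placeholder image
		ReadinessProbe: &corev1.Probe{
			ProbeHandler: corev1.ProbeHandler{ // named Handler in client-go before v0.23
				HTTPGet: &corev1.HTTPGetAction{
					Path: "/healthz", // should only return 200 once the app can actually serve
					Port: intstr.FromInt(8080),
				},
			},
			InitialDelaySeconds: 5, // give the process time to start
			PeriodSeconds:       5, // re-check every 5 seconds
			FailureThreshold:    3, // mark the Pod NotReady after 3 consecutive failures
		},
	}
	fmt.Println(container.Name, "readiness path:", container.ReadinessProbe.HTTPGet.Path)
}

Until this probe succeeds, the Pod stays out of the Service's endpoints, so the HPA can add pods without clients hitting connection refused.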

grpc client shows Unavailable errors during server scale up

I have two gRPC services that communicate with each other through an Istio service mesh, with Envoy proxies for all the services.
We have an issue where, during scale-up of the server pods under high load, the client throws a few gRPC UNAVAILABLE (mostly) / DEADLINE_EXCEEDED / CANCELLED errors for a while as soon as a new pod becomes ready.
I don't see any CPU throttling in server pods at all.
What else could be the issue and how can I investigate this?
Without the concrete status message, it's hard to say what could be the cause of the errors mentioned above.
To reduce UNAVAILABLE, one option is to ask the RPC to wait-for-ready: https://github.com/grpc/grpc/blob/master/doc/wait-for-ready.md. This feature is available in all major gRPC languages (some rename it to fail-fast=false).
DEADLINE_EXCEEDED is caused by a timeout set by your application or by your Envoy config; you should be able to tune it.
CANCELLED could mean: 1. the server is entering a graceful shutdown state; 2. the server is overloaded and rejecting new connections.
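As a hedged illustration of wait-for-ready in grpc-go, using the standard gRPC health-check service as a stand-in for the real server API (the address and deadline are placeholders):

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Placeholder address for the server Service behind Istio/Envoy.
	conn, err := grpc.Dial("grpc-server.default.svc.cluster.local:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// WaitForReady(true) (fail-fast=false) queues the RPC until the channel has a
	// ready connection instead of failing immediately with UNAVAILABLE; it is
	// still bounded by the context deadline above.
	client := healthpb.NewHealthClient(conn)
	resp, err := client.Check(ctx, &healthpb.HealthCheckRequest{}, grpc.WaitForReady(true))
	if err != nil {
		log.Fatalf("RPC failed: %v", err)
	}
	log.Println("serving status:", resp.GetStatus())
}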

Readiness Probe and Apache Common Http Client

I have a simple OpenShift setup with a Service backed by 2 Pods. The Pods have readiness probes configured, and the Service is exposed via NodePort. All of this works as expected: once a readiness probe fails, the Service marks the Pod as unreachable and any NEW requests don't get routed to that Pod.
Scenario 1:
I execute a curl command to access the Service. While the curl command is executing, I introduce a readiness failure on Pod-1. I see that no new requests are sent to Pod-1. This is FINE.
Scenario 2:
I have a Java client that uses the Apache Commons HttpClient library to open a connection to the Kubernetes Service. The connection gets established and works fine. The problem comes when I introduce a readiness failure on Pod-1: I still see the client sending requests only to Pod-1, even though the Service now has only the endpoint of Pod-2.
My hunch is that because HttpClient uses persistent connections, and the Service is exposed via NodePort, the destination of the established HTTP connection is Pod-1 itself, so even when the readiness probe fails the client keeps sending requests to Pod-1.
Can someone explain why it behaves the way described above?
kube-proxy (or rather the iptables rules it generates) intentionally does not shut down existing TCP connections when changing the endpoint mapping (which is what a failed readiness probe triggers). This has been discussed a lot on many tickets over the years, with generally little consensus on whether the behavior should be changed. For now your best bet is to instead use an Ingress Controller for HTTP traffic, since those all update live and bypass kube-proxy. You could also send back a Keep-Alive header in your responses and terminate persistent connections after N seconds or requests, though that only shrinks the window for badness.
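As a rough sketch of the Keep-Alive idea above, here is a Go HTTP server (not the asker's Java stack) that advertises a keep-alive timeout and also enforces it server-side, so persistent connections get recycled and clients reconnect to the current endpoints; the values are arbitrary:

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Hint to well-behaved clients that kept-alive connections should be
		// recycled after ~30 seconds or 100 requests.
		w.Header().Set("Keep-Alive", "timeout=30, max=100")
		fmt.Fprintln(w, "ok")
	})

	srv := &http.Server{
		Addr:        ":8080",
		Handler:     mux,
		IdleTimeout: 30 * time.Second, // server-side enforcement: close idle keep-alive connections
	}
	log.Fatal(srv.ListenAndServe())
}

This only shrinks the window, as noted above; an Ingress Controller remains the cleaner fix for HTTP traffic.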

Kubernetes cluster REST API error: 500 internal server error

I have a k8s cluster deployed using kubespray.
The load balancer used is MetalLB.
I have deployed a Helm chart in this cluster which exposes a REST service at 10.0.8.26:50028.
I am sending requests to this service:
http://10.0.8.26:50028/data/v3/authentication
http://10.0.8.26:50028/data/v3/actions
http://10.0.8.26:50028/data/v3/versions
But each time I call an endpoint, it returns responses in this order:
503 transport is closing
500 Internal server
500 Internal server
204 - correct response
The same sequence is returned for each endpoint. Once a correct response is returned, there are no further errors, but trying a new endpoint returns errors again.
Can someone please help me?
This error was related to the connections between the services in the cluster. The cluster was using kube-proxy in IPVS mode. Due to the IPVS timeouts on the nodes, connections between the services were terminated after 900 seconds:
$ ipvsadm -l --timeout
Timeout (tcp tcpfin udp): 900 120 300
That means the TCP connections were being terminated by another agent.
My application uses the gRPC protocol for communication between some services.
So, after setting gRPC keepalive in the application's code and lowering the pods' TCP keepalive values, the issue was resolved.
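For illustration, a client-side gRPC keepalive configuration along these lines in grpc-go could look like the sketch below (the address and durations are examples, not the exact values used; the server side must also permit these pings, e.g. via keepalive.EnforcementPolicy):

package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	conn, err := grpc.Dial("backend.default.svc.cluster.local:50028", // placeholder address
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                10 * time.Minute, // ping idle connections well before the 900s IPVS tcp timeout
			Timeout:             20 * time.Second, // wait this long for a ping ack before closing the connection
			PermitWithoutStream: true,             // keep pinging even with no active RPCs
		}))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... use conn with the generated service clients ...
}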
The following links may provide more details:
https://success.docker.com/article/ipvs-connection-timeout-issue
https://github.com/moby/moby/issues/31208
https://github.com/kubernetes/kubernetes/issues/80298