Delayed Unauthorized responses from AKS - kubernetes

We use AKS's kube-apiserver for leader election from a VM that is external to the k8s cluster but in the same VNET. We use a k8s client from the client-go package. The client tries to get/update the lease object every 2 seconds. We observe occasional failures (every few hours) caused by delayed "Unauthorized" responses. The client-go HTTP client refreshes its credentials/tokens when it receives an Unauthorized response. However, when the Unauthorized response is extremely delayed (sometimes by up to 30 seconds!), the timeouts of the HTTP client or the leader-election mechanism kick in.
We could tune the HTTP client and leader-election timeouts, but we need fast failover (e.g., within 30 seconds), so it would be great to eliminate these delays instead.
What is the reason these Unauthorized responses get so delayed?
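For reference, the tuning mentioned above maps onto a few fields of client-go's leaderelection package. This is only an illustrative sketch (the lock setup and callbacks are elided, and the values are hypothetical): the renew deadline bounds how long a delayed Unauthorized response can stall the current leader before it gives up, and the lease duration bounds failover time for the standbys.

```go
// Sketch of client-go leader-election tuning; values are illustrative.
leaderelection.LeaderElectionConfig{
	Lock:          lock,             // e.g. a resourcelock.LeaseLock on the lease object
	LeaseDuration: 30 * time.Second, // failover bound: standbys wait this long before taking over
	RenewDeadline: 25 * time.Second, // must exceed the worst observed auth delay, or renewals abort
	RetryPeriod:   2 * time.Second,  // matches the 2 s get/update cadence described above
	Callbacks:     callbacks,        // OnStartedLeading / OnStoppedLeading handlers
}
```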

Related

Kubernetes options request Load Balancer latency of 10 seconds - how to debug?

I have an issue with the backend responding to OPTIONS requests.
Sometimes the server responds to the OPTIONS request in a few milliseconds, but many times it takes 10 seconds or more. I have a feeling the load balancer / ingress is holding the request back, since the backend server does not behave this way locally and it just has a Node.js app.use(cors()).
The main question is: how can I debug where this goes wrong?
I can see incoming requests, but I cannot tell for sure whether the delay is in the load balancer itself or in the load balancer waiting for a response from the server.

grpc client shows Unavailable errors during server scale up

I have two grpc services that communicate to each other using Istio service mesh and have envoy proxies for all the services.
We have an issue where, during scale-up of server pods due to high load, the client throws a few gRPC UNAVAILABLE (mostly), DEADLINE_EXCEEDED, and CANCELLED errors for a while as soon as the new pod is ready.
I don't see any CPU throttling in server pods at all.
What else could be the issue and how can I investigate this?
Without the concrete status message, it's hard to say what could be the cause of the errors mentioned above.
To reduce UNAVAILABLE, one option is to ask the RPC to wait-for-ready: https://github.com/grpc/grpc/blob/master/doc/wait-for-ready.md. This feature is available in all major gRPC languages (some may name it fail-fast=false).
DEADLINE_EXCEEDED is caused by the timeout set by your application or Envoy config; you should be able to tune it.
CANCELLED could mean: 1. the server is entering a graceful shutdown state; 2. the server is overloaded and rejecting new connections.
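In grpc-go, for example, wait-for-ready is a call option. A sketch, assuming a generated client with a hypothetical GetUser method:

```go
// Per RPC: block (until the context deadline) instead of failing fast
// with UNAVAILABLE while the channel has no ready endpoint, e.g. while
// a newly scaled-up pod is not yet accepting connections.
resp, err := client.GetUser(ctx, req, grpc.WaitForReady(true))

// Or make it the channel-wide default:
conn, err := grpc.Dial(target, grpc.WithDefaultCallOptions(grpc.WaitForReady(true)))
```

Note that wait-for-ready trades fast failures for latency: a call that would have failed immediately will instead wait until its deadline, so it should be paired with a sensible per-call deadline.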

Readiness Probe and Apache Common Http Client

I have a simple OpenShift setup with a Service configured with 2 backend pods. The pods have their readiness probes configured, and the Service is exposed via NodePort. All of this configuration is fine and works as expected: once a readiness probe fails, the Service marks the pod as unreachable and no NEW requests get routed to the pod.
Scenario 1:
I execute a curl command to access the service. While the curl command is executing, I introduce a readiness failure in Pod-1. I see that no new requests are sent to Pod-1. This is FINE.
Scenario 2:
I have a Java client that uses the Apache Commons HttpClient library to initiate a connection to the Kubernetes Service. The connection gets established and works fine. The problem comes when I introduce a readiness failure in Pod-1: I still see the client sending requests only to Pod-1, even though the Service now has only the endpoint of Pod-2.
My hunch: since HttpClient uses persistent connections and the Service is exposed via NodePort, the destination of the HTTP connection is Pod-1 itself. So even if the readiness probe fails, the client still sends requests to Pod-1.
Can someone explain why this works the way described above?
kube-proxy (or rather the iptables rules it generates) intentionally does not shut down existing TCP connections when changing the endpoint mapping (which is what a failed readiness probe triggers). This has been discussed a lot on many tickets over the years, with generally little consensus on whether the behavior should be changed. For now your best bet is to use an Ingress Controller for HTTP traffic instead, since those all update live and bypass kube-proxy. You could also send back a Keep-Alive header in your responses and terminate persistent connections after N seconds or requests, though that only shrinks the window for badness.

Kubernetes cluster REST API error: 500 internal server error

I have a k8s cluster deployed using kubespray.
The loadbalancer used is metalLB.
I have deployed a helm chart in this cluster which has a REST service up at the address 10.0.8.26:50028.
I am sending requests to this service:
http://10.0.8.26:50028/data/v3/authentication
http://10.0.8.26:50028/data/v3/actions
http://10.0.8.26:50028/data/v3/versions
But each time I call one of these endpoints, it returns responses in this order:
503 transport is closing
500 Internal server
500 Internal server
204 - correct response
The same sequence is returned when I call each endpoint. Once a correct response is returned, there are no further errors; but trying a new endpoint returns the errors again.
Can someone please help me?
This error was related to the connections between the services in the cluster. The cluster was using kube-proxy in IPVS mode. Due to the IPVS timeouts (on the nodes), connections between the services get terminated after 900 seconds:
$ ipvsadm -l --timeout
Timeout (tcp tcpfin udp): 900 120 300
That means the TCP connections were being terminated by another agent.
My application uses the gRPC protocol for communication between some of its services.
So, after setting gRPC keepalive in the application's code and lowering the TCP keepalive of the pods, the issue was resolved.
The following links may provide more details:
https://success.docker.com/article/ipvs-connection-timeout-issue
https://github.com/moby/moby/issues/31208
https://github.com/kubernetes/kubernetes/issues/80298
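For reference, the client-side gRPC keepalive described above looks roughly like this in grpc-go (the exact values are illustrative; the point is to keep the ping interval well under the 900-second IPVS idle timeout):

```go
conn, err := grpc.Dial(target,
	grpc.WithInsecure(), // plaintext inside the cluster; use credentials as appropriate
	grpc.WithKeepaliveParams(keepalive.ClientParameters{
		Time:                5 * time.Minute,  // HTTP/2 PING after 5 min idle, well under 900 s
		Timeout:             20 * time.Second, // drop the connection if the ping goes unacknowledged
		PermitWithoutStream: true,             // keep pinging even when no RPCs are in flight
	}),
)
```

Note that the server side may need a matching keepalive.EnforcementPolicy (MinTime, PermitWithoutStream), or it will treat frequent pings as abusive and close the connection with a GOAWAY.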

Kubernetes randomly refuses requests to services

I recently started to experience a lot of failed connections between pods on my Kubernetes cluster (v1.8.3-gke.0).
Under load (400+ requests per second), requests to a service backed by 200 pods spread across machines with enough resources have a failure rate between 1 and 10 percent, which is clearly problematic.
The HTTP request doesn't fail with a 4xx or 5xx error status, it's just dropped or refused at some point.
Note that the pods are far from their maximum capacity; their CPU usage is rarely over 200 millicores.
Even without heavy load, I observed that many requests failed randomly, on services other than the previous one, so I suspect an issue at the cluster level (Docker? Kubernetes? kernel?).
I did some curl benchmarking to measure failure rates.
When an HTTP request fails while running curl in a loop, the displayed error is curl: (7) Failed to connect to 10.x.x.x port 80: Connection refused.
We see similar error messages reported by our production code: Cannot connect to host svc:80 ssl:False [Connect call failed ('10.x.x.x', 80)], although most requests succeed.
Do you have any idea of what is going wrong, or how can I track this issue down?