I have a k8s cluster deployed using kubespray.
The load balancer used is MetalLB.
I have deployed a Helm chart in this cluster that exposes a REST service at 10.0.8.26:50028.
I am sending requests to this service:
http://10.0.8.26:50028/data/v3/authentication
http://10.0.8.26:50028/data/v3/actions
http://10.0.8.26:50028/data/v3/versions
But each time I call an endpoint, it returns responses in this order:
503 transport is closing
500 Internal server
500 Internal server
204 - correct response
The same sequence is returned when I call each endpoint. Once a correct response is returned, there are no further errors for that endpoint, but calling a new endpoint returns the errors again.
Can someone please help me?
This error was related to the connections between the services in the cluster. The cluster was using kube-proxy in IPVS mode. Because of the IPVS timeouts (on the nodes), idle TCP connections between the services were terminated after 900 seconds:
$ ipvsadm -l --timeout
Timeout (tcp tcpfin udp): 900 120 300
That means the TCP connections were being terminated by another agent (IPVS), not by the application itself.
My application uses the gRPC protocol for communication between some services.
So, after setting gRPC keepalive in the application's code and lowering the TCP keepalive of the pods to a value below the IPVS timeout, the issue was resolved.
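As a rough illustration of those two settings (a sketch, not the original application's code), the keepalive intervals can be derived from the IPVS timeout so that a ping always goes out before the connection is considered idle. The channel option names are standard gRPC channel arguments; the one-third safety margin, the helper names, and the default values are assumptions made for this example.

```python
import socket

IPVS_TCP_TIMEOUT_S = 900  # the "tcp" value reported by `ipvsadm -l --timeout`


def grpc_keepalive_options(idle_timeout_s, ping_timeout_s=20):
    """Build gRPC channel options that ping well before the idle timeout.

    The one-third margin is an assumption, not a recommended value.
    """
    keepalive_time_s = idle_timeout_s // 3
    return [
        ("grpc.keepalive_time_ms", keepalive_time_s * 1000),
        ("grpc.keepalive_timeout_ms", ping_timeout_s * 1000),
        ("grpc.keepalive_permit_without_calls", 1),
    ]


def enable_tcp_keepalive(sock, idle_s, interval_s=30, probes=3):
    """Turn on TCP keepalive for a single socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are platform-specific (available on Linux),
    # so each one is only applied when the running platform defines it.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


# With the grpcio package installed, the options would be passed like this
# (the target address is hypothetical):
#   channel = grpc.insecure_channel(
#       "my-service:50051",
#       options=grpc_keepalive_options(IPVS_TCP_TIMEOUT_S),
#   )
```

For pods whose code cannot be changed, the same effect is usually sought through the net.ipv4.tcp_keepalive_* sysctls rather than per-socket options.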
The following links may provide more details:
https://success.docker.com/article/ipvs-connection-timeout-issue
https://github.com/moby/moby/issues/31208
https://github.com/kubernetes/kubernetes/issues/80298
Related
As we know, HTTP/1.1 uses persistent (long-lived) connections by default. A Kubernetes Service, for example one of type ClusterIP, is an L4 load balancer.
Suppose I have a Service backed by 3 pods running a web server. Can HTTP/1.1 requests be distributed across the 3 pods?
Could anybody help clarify it?
This webpage addresses your question perfectly: https://learnk8s.io/kubernetes-long-lived-connections
In the spirit of StackOverflow, let me summarize the webpage here:
TLDR: Kubernetes doesn't load balance long-lived connections, and some Pods might receive more requests than others.
Kubernetes Services do not exist. There's no process listening on the IP address and port of a Service.
The Service IP address is used only as a placeholder that will be translated by iptables rules into the IP addresses of one of the destination pods using cleverly crafted randomization.
Any connection from a client (whether inside or outside the cluster) is established directly with a Pod, so an HTTP/1.1 persistent connection is maintained between the client and one specific Pod until either side closes it.
Thus, all requests that use a single persistent connection will be routed to a single Pod (that is selected by the iptables rule when establishing connection) and not load-balanced to the other Pods.
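To make the summary concrete, here is a small simulation (not kube-proxy itself). It assumes the iptables rules form a chain in which rule i of n matches with probability 1/(n-i), which gives each Pod an equal chance per new connection. The pod names and function names are made up for the example.

```python
import random
from collections import Counter

PODS = ["pod-a", "pod-b", "pod-c"]  # hypothetical pod names


def pick_pod(rng, pods):
    """Walk a chain of rules where rule i matches with probability 1/(n-i)."""
    n = len(pods)
    for i, pod in enumerate(pods):
        if rng.random() < 1.0 / (n - i):
            return pod
    return pods[-1]  # not reached: the last rule matches with probability 1


def persistent_client(rng, pods, requests):
    """One keep-alive connection: the Pod is chosen once, at connect time."""
    pod = pick_pod(rng, pods)
    return Counter(pod for _ in range(requests))


def reconnecting_client(rng, pods, requests):
    """A new connection per request: the Pod is chosen again every time."""
    return Counter(pick_pod(rng, pods) for _ in range(requests))


rng = random.Random(0)
print("persistent:  ", dict(persistent_client(rng, PODS, 300)))
print("reconnecting:", dict(reconnecting_client(rng, PODS, 300)))
```

The persistent client reports all 300 requests on a single Pod, while the reconnecting client spreads them across the three.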
Additional info:
Per RFC 2616, section 8.1.3 (https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.3), a proxy server sitting between client and server must signal HTTP/1.1 persistent connections separately with its clients and with the servers it connects to, so the client-to-proxy and proxy-to-server connections are independent.
I have a simple OpenShift setup with a Service configured with 2 backend Pods. The Pods have readiness probes configured. The Service is exposed via NodePort. All of this configuration is fine and works as expected: once a readiness probe fails, the Service marks the Pod as unreachable and no new requests are routed to it.
Scenario 1:
I run a curl command to access the Service. While curl is running, I introduce a readiness failure on Pod-1. I see that no new requests are sent to Pod-1. This is fine.
Scenario 2:
I have a Java client that uses the Apache Commons HttpClient library to open a connection to the Kubernetes Service. The connection is established and works fine. The problem comes when I introduce a readiness failure on Pod-1: I still see the client sending requests only to Pod-1, even though the Service now has only Pod-2 as an endpoint.
My hunch: because HttpClient uses persistent connections and the Service is exposed via NodePort, the destination of the HTTP connection is Pod-1 itself. So even if the readiness probe fails, the client still sends requests to Pod-1.
Can someone explain why this works the way described above?
kube-proxy (or rather the iptables rules it generates) intentionally does not shut down existing TCP connections when changing the endpoint mapping (which is what a failed readiness probe will trigger). This has been discussed a lot on many tickets over the years, with generally little consensus on whether the behavior should be changed. For now your best bet is to instead use an Ingress Controller for HTTP traffic, since those all update live and bypass kube-proxy. You could also send back a Keep-Alive header in your responses and terminate persistent connections after N seconds or requests, though that only shrinks the window for badness.
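The last suggestion could look roughly like the following on the server side. This is a minimal sketch using only the Python standard library; the limits, the handler name, and the response body are assumptions for illustration. The handler advertises its limits in a Keep-Alive header and answers with Connection: close once a connection has served N requests or exceeded its maximum age, which forces the client to reconnect and be routed again.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_REQUESTS = 3       # hypothetical: requests allowed per connection
MAX_AGE_SECONDS = 30   # hypothetical: lifetime allowed per connection


class RecyclingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def setup(self):
        # One handler instance exists per TCP connection, so these
        # attributes count requests on that connection only.
        super().setup()
        self.served = 0
        self.opened_at = time.monotonic()

    def do_GET(self):
        self.served += 1
        age = time.monotonic() - self.opened_at
        # The age is only checked when a request arrives; an idle
        # connection is not closed by this sketch.
        expired = self.served >= MAX_REQUESTS or age >= MAX_AGE_SECONDS
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        if expired:
            # Sending "Connection: close" also makes the base class
            # close the socket after this response.
            self.send_header("Connection", "close")
        else:
            remaining = MAX_REQUESTS - self.served
            self.send_header(
                "Keep-Alive", f"timeout={MAX_AGE_SECONDS}, max={remaining}"
            )
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# To run it standalone (the port is arbitrary):
#   ThreadingHTTPServer(("0.0.0.0", 8080), RecyclingHandler).serve_forever()
```

As the answer says, this only bounds how long a client can stay pinned to a Pod that has since failed its readiness probe; it does not remove the window.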
I'm running a Mosquitto pod (docker.io/jllopis/mosquitto:v1.6.8-2) on an AKS instance (incidentally, using HTTP auth backend with the plugin) and have exposed that through a K8s Service. Looking at the logs for the broker I can see constant (multiple times at the same timestamp) sets of records like this:
1587048303: New connection from 10.240.0.6 on port 8883.
1587048303: New connection from 10.240.0.6 on port 1883.
1587048303: New connection from 10.240.0.6 on port 1883.
1587048305: Socket error on client <unknown>, disconnecting.
1587048305: Socket error on client <unknown>, disconnecting.
These come from different IP addresses but all within the same range; and checking using kubectl get pods --all-namespaces -o wide I can see that they are internal k8s processes, such as more-fs-watchers-sb64w, in the kube-system namespace.
What are all these doing and how can I stop them bombarding the broker? Why are they doing it? And could this be affecting other MQTT clients, legitimate ones, that are reporting intermittent connection problems?
I suspect that you are running the more-fs-watcher DaemonSet in your cluster.
It was loosely recommended as a workaround for the following issue: https://github.com/Azure/AKS/issues/772
Note that the issue has since been fixed and the fix is live in the latest AKS clusters, so it should be safe to remove the more-fs-watcher DaemonSet.
I've followed the steps from Microsoft to create a Multi-Node On-Premises Service Fabric cluster. I've deployed a stateless app to the cluster and it seems to be working fine. When I have been connecting to the cluster I have used the IP Address of one of the nodes. Doing that, I can connect via Powershell using Connect-ServiceFabricCluster nodename:19000 and I can connect to the Service Fabric Explorer website (http://nodename:19080/explorer/index.html).
The examples online suggest that if I hosted in Azure I could connect to http://mycluster.eastus.cloudapp.azure.com:19000 and it would resolve, but I can't work out what the equivalent is for my on-premises cluster. I tried connecting to my sample cluster with Connect-ServiceFabricCluster sampleCluster.domain.local:19000, but that returns:
WARNING: Failed to contact Naming Service. Attempting to contact Failover Manager Service...
WARNING: Failed to contact Failover Manager Service, Attempting to contact FMM...
False
WARNING: No such host is known
Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue.
Am I missing something in my setup? Should there be a central DNS entry somewhere that allows me to connect to the cluster? Or am I trying to do something that isn't supported On-Premises?
Yup, you're missing a load balancer.
This is the best resource I could find to help. I'll paste the relevant contents in case it becomes unavailable.
Reverse Proxy — When you provision a Service Fabric cluster, you have an option of installing Reverse Proxy on each of the nodes on the cluster. It performs the service resolution on the client’s behalf and forwards the request to the correct node which contains the application. In majority of the cases, services running on the Service Fabric run only on the subset of the nodes. Since the load balancer will not know which nodes contain the requested service, the client libraries will have to wrap the requests in a retry-loop to resolve service endpoints. Using Reverse Proxy will address the issue since it runs on each node and will know exactly on what nodes is the service running on. Clients outside the cluster can reach the services running inside the cluster via Reverse Proxy without any additional configuration.
Source: Azure Service Fabric is amazing
I have an Azure Service Fabric resource running, but the same rules apply. As the article states, you'll need a reverse proxy/load balancer to resolve not only what nodes are running the API, but also to balance the load between the nodes running that API. So, health probes are necessary too so that the load balancer knows which nodes are viable options for sending traffic to.
As an example, Azure creates 2 rules off the bat:
1. LBHttpRule on TCP/19080 with a TCP probe on port 19080 every 5 seconds with a 2 count error threshold.
2. LBRule on TCP/19000 with a TCP probe on port 19000 every 5 seconds with a 2 count error threshold.
What you need to add to make this forward-facing is a rule that forwards port 80 to your service's HTTP port. The health probe can then be an HTTP probe that hits a path and checks for a 200 response.
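For an on-premises load balancer that you script yourself, the probe logic described above might be sketched like this. It is only an illustration of an HTTP probe with a consecutive-error threshold, not how the Azure load balancer is implemented; the names, the timeout, and the example URL are assumptions.

```python
import urllib.request


def http_probe(url, timeout_s=2.0):
    """Return True only when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and non-2xx answers
        # (urllib raises HTTPError, an OSError subclass, for those).
        return False


class NodeHealth:
    """Track consecutive probe failures for one node."""

    def __init__(self, error_threshold=2):
        self.error_threshold = error_threshold
        self.failures = 0

    @property
    def healthy(self):
        return self.failures < self.error_threshold

    def record(self, ok):
        """Store one probe result and return the node's current state."""
        self.failures = 0 if ok else self.failures + 1
        return self.healthy


# Example loop body, run every 5 seconds per node (URL is hypothetical):
#   state.record(http_probe("http://node1.domain.local:8080/health"))
```

With a threshold of 2, a node is taken out of rotation after two failed probes in a row and comes back on the first successful one.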
Once you get into the cluster, you can resolve the services normally and SF will take care of availability.
In Azure-land, this is abstracted again to using something like API Management to further reverse proxy it to SSL. What a mess but it works.
Once your load balancer is set up, you'll have a single IP to hit for management, publishing, and regular traffic.
I recently started to experience a lot of failed connections between pods on my Kubernetes cluster (v1.8.3-gke.0).
Under load (400+ requests per second), requests to a service backed by 200 pods spread on machines with enough resources have a failure rate between 1 and 10 percent, which is clearly problematic.
The HTTP request doesn't fail with a 4xx or 5xx error status, it's just dropped or refused at some point.
Note that the pods are far from being at maximum capacity; their CPU usage is rarely over 200 millicores.
Even without heavy load, I observed a lot of requests failing randomly on services other than the previous one, so I suspect an issue at the cluster level (Docker? Kubernetes? kernel?).
I ran some curl benchmarks to measure failure rates.
When an HTTP request fails while running curl in a loop, the displayed error is curl: (7) Failed to connect to 10.x.x.x port 80: Connection refused.
Our production code reports a similar error message: Cannot connect to host svc:80 ssl:False [Connect call failed ('10.x.x.x', 80)], although most requests succeed.
Do you have any idea what is going wrong, or how I can track this issue down?
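For reference, the curl loop can be approximated by a script that only opens TCP connections and counts failures by error type, which helps separate refused connections from timeouts. This is a sketch of the measurement, not the exact benchmark used above; the function name and defaults are assumptions.

```python
import socket
from collections import Counter


def connect_failure_rate(host, port, attempts=100, timeout_s=1.0):
    """Open and close `attempts` TCP connections.

    Returns (errors, rate): a Counter of exception class names and the
    fraction of attempts that failed.
    """
    errors = Counter()
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                pass  # connected; the socket is closed on leaving the block
        except OSError as exc:
            errors[type(exc).__name__] += 1
    return errors, sum(errors.values()) / attempts


# Example (the address is a placeholder, as in the error messages above):
#   errors, rate = connect_failure_rate("10.x.x.x", 80, attempts=1000)
#   print(f"{rate:.1%} failed", dict(errors))
```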