Kube-proxy or ELB "delaying" packets of HTTP requests - kubernetes

We're running a web API app on Kubernetes (1.9.3) in AWS (set with KOPS). The app is a Deployment and represented by a Service (type: LoadBalancer) which is actually an ELB (v1) on AWS.
This generally works - except that some packets (fragments of HTTP requests) are "delayed" somewhere between the client <-> app container. (In both HTTP and HTTPS which terminates on ELB).
From the node side:
( Note: Almost all packets on server-side arrive duplicated 3 times )
We use keep-alive so the tcp socket is open and requests arrive and return pretty fast. Then the problem happens:
first, a packet with only the headers arrives [PSH,ACK] (I see the headers in the payload with tcpdump).
an [ACK] is sent back by the container.
The tcp socket/stream is quiet for a very long time (up to 30s and more - but the interval is not consistent, we consider >1s as a problem ).
another [PSH, ACK] with the HTTP data arrives, and the request can finally be processed in the app.
From the client side:
I've run some traffic from my computer, recording it on the client side to see the other end of the problem, but not 100% sure it represents the real client side.
a [PSH,ASK] with the headers go out.
a couple of [ACK]s with parts of the payload start going out.
no response arrives for a few seconds (or more) and no more packets go out.
[ACK] marked as [TCP Window update] arrives.
a short pause again and [ACK]s start arriving and the session continues until the end of the payload.
This is only happening under load.
To my understanding, this is somewhere between the ELB and the Kube-Proxy, but I'm clueless and desperate for help.
This is the arguments Kube-Proxy runs with:
Commands: /bin/sh -c mkfifo /tmp/pipe; (tee -a /var/log/kube-proxy.log < /tmp/pipe & ) ; exec /usr/local/bin/kube-proxy --cluster-cidr=100.96.0.0/11 --conntrack-max-per-core=131072 --hostname-override=ip-10-176-111-91.ec2.internal --kubeconfig=/var/lib/kube-proxy/kubeconfig --master=https://api.internal.prd.k8s.local --oom-score-adj=-998 --resource-container="" --v=2 > /tmp/pipe 2>&1
And we use Calico as a CNI:
So far I've tried:
Using service.beta.kubernetes.io/aws-load-balancer-type: "nlb" - the issue remained.
(Playing around with ELB settings hoping something will do the trick ¯_(ツ)_/¯ )
Looking for errors in the Kube-Proxy, found rare occurrences of the following:
E0801 04:10:57.269475 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Get https://api.internal.prd.k8s.local/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp: lookup api.internal.prd.k8s.local on 10.176.0.2:53: no such host
...and...
E0801 04:09:48.075452 1 proxier.go:1667] Failed to execute iptables-restore: exit status 1 (iptables-restore: line 7 failed
)
I0801 04:09:48.075496 1 proxier.go:1669] Closing local ports after iptables-restore failure
I couldn't find anything describing such issue and will appreciate any help. Ideas on how to continue and troubleshoot are welcome.
Best,
A

Related

haproxy - layer 7 health check failure

I am getting occasional layer 7 health check failures. This happens on production machine seemingly at random, maybe once a minute or every few minutes on average. Here is the configuration:
backend api
mode http
option httpchk GET /api/v1/status HTTP/1.0
http-check expect status 200
balance roundrobin
server api1 127.0.0.1:8001 check fall 3 rise 2
server api2 127.0.0.1:8002 check fall 3 rise 2
The HAproxy log tells me the following:
Health check for server api/api2 failed, reason: Layer7 timeout, check duration: 10001ms, status: 2/3 UP.
Strange thing is when I run a script to fetch the same URL at a much faster pace than HAproxy, it never fails to return 200 response. It never hangs like it seems to do for HAproxy.
In addition, I'm getting occasional HAProxy error for various API calls, not just health checks, all looking quite similar:
https-in~ api/api1 45/0/0/-1/30045 504 194 - - sHVN 50/49/13/10/0 0/0 "POST /api/v1/accounts HTTP/1.1"
What could be the issue here? This one really got me stumped.

Long request returns with empty response after 120 seconds, caused by Network Load Balancer

I have a GKE cluster with 2 nodes, with a service of type LoadBalancer.
When I call the service internally a long request will not timeout after 120 seconds.
But if I call the external IP of the Network Load Balancer that forwards to the internal service, I get a "Empty reply from server" response.
External call example:
curl -v "http://<public-ip>/longResponse"
* Trying <public-ip>...
* TCP_NODELAY set
* Connected to <public-ip> (<public-ip>) port 80 (#0)
> GET /longResponse HTTP/1.1
> Host: <public-ip>
> User-Agent: curl/7.54.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host <public-ip> left intact
curl: (52) Empty reply from server
Internal call example:
/ # wget -O - -S <service-name>/longResponse
Connecting to location-service (10.3.255.181:80)
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Content-Length: 15
Date: Thu, 28 Feb 2019 10:31:14 GMT
Connection: close
- 100% |*********************************************************************************************************************************************************************************************************************| 15 0:00:00 ETA
/ #
I've tried to find documentation for request or socket timeout in the load balancer level, but I didn't encounter anything. Any idea?
Thanks.
Are you sure that's not a client-side timeout? Network LB doesn't process packets other than to route them, so it should never send any response back.
Try the -m flag to curl?
Also maybe capture a tcpdump on your client-side so you can see what the network is actually doing.
Get the load-balancer's backend name with:
gcloud compute backend-services list
then
BACKEND=name-of-your-backend
gcloud compute backend-services update $BACKEND --timeout=600s
otherwise, in the console: Network services ⇒ Load balancing ⇒ Backends then you can click your HTTP backend(s) and edit the settings, including the timeout.
On a wider note, this may be one of serval hops between server and client, each of which might timeout. You're better off either living with the timeout (and making your long polls complete before the timeout), or drip feeding data down the line... for instance, you can preprend whitespace to json, so for instance, send a space character every 30 seconds until you have a proper response body. This will keep the load-balance from timing out.

TCP socket over HTTP proxy disconnects after idle timeout

I have a problem with TCP socket when using HTTP tunneling over proxy.
Client (C++) opens a TCP socket to a server (JAVA). I added support for HTTP proxy. Everything worked good, client sends "HTTP connect" request like this and continues to plain TCP connection after:
CONNECT servername:5555 HTTP/1.1
Host: servername:5555
Proxy-Connection: Keep-Alive
HTTP/1.1 200
However, if idle timeout is configured in proxy and there is no actual data sent, connection is terminated though client sends TCP keep alive packets every 60 seconds. Idle timeout is configured to 10 minutes.
TCP keep alive is configured as following:
WSAIoctl(socket, SIO_KEEPALIVE_VALS, &alive, sizeof(alive), NULL, 0, &dwBytesRet, NULL, NULL)
client IP - 192.168.91.xxx
Proxy IP - 192.168.92.yyy
244 47.133017000 192.168.91.xxx 192.168.92.yyy TCP 55 [TCP Keep-Alive] 64351 > 808 [ACK] Seq=4336 Ack=13084 Win=65700 Len=1
245 47.133336000 192.168.92.yyy 192.168.91.xxx TCP 66 [TCP Keep-Alive ACK] 808 > 64351 [ACK] Seq=13084 Ack=4337 Win=65536 Len=0 SLE=4336 SRE=4337
Any ideas how to keep connection alive?
I tried to add "Connection: Keep-Alive" header though HTTP1.1 should do it automatically. It didn't help anyway.
This is a timeout at the application layer, e.g. the connection is idle because no application data are sent. What you've tried will not work because:
Connection: keep-alive is for having multiple HTTP requests over a single connection. This does not apply here because from the view of the proxy there is only a single request (CONNECT).
TCP keep-alive is to notice if the peer is not reachable any longer (died without closing connection or connection broke somewhere in the middle). It does not apply for cases, where the TCP connection is still alive, but it is idle (no application data).
Having a idle timeout for the proxy makes sense. The idea of HTTP is, that the client sends a request and the server sends a response. If it is idle while receiving the request or the response usually something is broken (or you have a reaaaaaly slow connection). If it is idle after request and response finished it is perfectly valid to close the connection, even if the client asked for Connection: keep-alive, because keep-alive is not a requirement on the server but only a suggestion to keep the connection open for more requests if the server has enough resources to do so.

socket programming for bad network

client:
socket(), connect() and then
for (1 to 1024) {
write(1024 bytes)
}
exit(0);
server:
socket(), bind(), listen()
while (1) {
accept()
while((n = read()) {
if (n == -1) abort(); /* never happended */
total_read += n
}
close()
}
now, client runs on Mac under NAT and server runs on my VPS (abroad)
generally, it works fine (client send all data and exit & server recv all data)
however, when client is running but suddenly the network is broken for couple minutes(and regain), the client won't exit after a long long time... I kill it with control + C and run it again, the server seems not read the data any more (client is still running)
here is what netstat shows:
client:
tcp4 0 130312 192.168.1.254.58573 A.B.C.D.8888 ESTABLISHED
server:
tcp 0 0 A.B.C.D:8888 a.b.c.d:54566 ESTABLISHED 10970/a.out
tcp 102136 0 A.B.C.D:8888 a.b.c.d:60916 ESTABLISHED -
A.B.C.D is my VPS address
a.b.c.d is my public client address
my quesiton is:
1, why ?
2, server will works fine after restarting, how to write code to get rid of it without restarting ?
In TCP, there's no way to tell that a connection has failed unless you try to send something on the connection. TCP doesn't perform active monitoring of the connection (actually, there are optional "keepalive" packets, but these are not normally sent until the connection has been idle for a couple of hours). When you send something, you'll eventually get an error if there's a timeout waiting for the other machine to return an acknowledgement. But if you're just reading data without sending, you can't tell that the connection has failed -- it just looks like the sender doesn't have anything to send.
You can resolve this by designing your application so that the client is required to send something every N seconds. Then set a timer in the server that detects that you haven't received anything for more than N seconds (you should add a little extra time to allow for transient delays).
When the network is broken what happens is that you clients keep sending data and at some point the socket send buffer gets full (I understand from what you show that you are sending 1024 Bytes, 1024 times, 1MB in total). The default for send buffer could be 16KB (surely less than 1MB). Then when the client tries to write, it gets blocked forever.
BTW, now I'm answering your question I don't know whether eventually after a number of TCP timeouts, TCP gives up and closes the socket making the socket interface return with error. I think that's not happening ... :) - So, connect fails if there is a problem in the network but write and read do not fail.
In the server side, the server gets blocked in read because it never receives the EOF.
Solution:
In the client side use non-blocking sockets, if the network is broken, at some point write will return with error EWOULDBLOCK. Then you will realize the send buffer is full for some reason. At that point, you could clouse the connection and try to connect again. If the network is broken, you will receive an error.
In the server side also use non-blocking sockets and select() function with a timeout. After a few timeouts you may decide there is a problem with the new connection and close it.

Haproxy 503 Service Unavailable . No server is available to handle this request

How does haproxy deal with static file , like .css, .js, .jpeg ? When I use my configure file , my brower says :
503 Service Unavailable
No server is available to handle this request.
This my config :
global
daemon
group root
maxconn 4000
pidfile /var/run/haproxy.pid
user root
defaults
log global
option redispatch
maxconn 65535
contimeout 5000
clitimeout 50000
srvtimeout 50000
retries 3
log 127.0.0.1 local3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s
listen dashboard_cluster :8888
mode http
stats refresh 5s
balance roundrobin
option httpclose
option tcplog
#stats realm Haproxy \ statistic
acl url_static path_beg -i /static
acl url_static path_end -i .css .jpg .jpeg .gif .png .js
use_backend static_server if url_static
backend static_server
mode http
balance roundrobin
option httpclose
option tcplog
stats realm Haproxy \ statistic
server controller1 10.0.3.139:80 cookie controller1 check inter 2000 rise 2 fall 5
server controller2 10.0.3.113:80 cookie controller2 check inter 2000 rise 2 fall 5
Does my file wrong ? What should I do to solve this problem ? ths !
What I think is the cause:
There was no default_backend defined. 503 will be sent by HAProxy---this will appear as NOSRV in the logs.
Another Possible Cause
Based on one of my experiences, the HTTP 503 error I receive was due to my 2 bindings I have for the same IP and port x.x.x.x:80.
frontend test_fe
bind x.x.x.x:80
bind x.x.x.x:443 ssl blah
# more config here
frontend conflicting_fe
bind x.x.x.x:80
# more config here
Haproxy configuration check does not warn you about it and netstat doesn't show you 2 LISTEN entries, that's why it took a while to realize what's going on.
This can also happen if you have 2 haproxy services running. Please check the running processes and terminate the older one.
Try making the timers bigger and check that the server is reachable.
From the HAproxy docs:
It can happen from many reasons:
The status code is always 3-digit. The first digit indicates a general status :
- 1xx = informational message to be skipped (eg: 100, 101)
- 2xx = OK, content is following (eg: 200, 206)
- 3xx = OK, no content following (eg: 302, 304)
- 4xx = error caused by the client (eg: 401, 403, 404)
- 5xx = error caused by the server (eg: 500, 502, 503)
503 when no server was available to handle the request, or in response to
monitoring requests which match the "monitor fail" condition
When a server's maxconn is reached, connections are left pending in a queue
which may be server-specific or global to the backend. In order not to wait
indefinitely, a timeout is applied to requests pending in the queue. If the
timeout is reached, it is considered that the request will almost never be
served, so it is dropped and a 503 error is returned to the client.
if you see SC in the logs:
SC The server or an equipment between it and haproxy explicitly refused
the TCP connection (the proxy received a TCP RST or an ICMP message
in return). Under some circumstances, it can also be the network
stack telling the proxy that the server is unreachable (eg: no route,
or no ARP response on local network). When this happens in HTTP mode,
the status code is likely a 502 or 503 here.
Check ACLs, check timeouts... and check the logs, that's the most important...