Long request returns an empty response after 120 seconds, caused by Network Load Balancer - kubernetes

I have a GKE cluster with 2 nodes, with a service of type LoadBalancer.
When I call the service internally, a long request does not time out after 120 seconds.
But if I call the external IP of the Network Load Balancer that forwards to the internal service, I get an "Empty reply from server" response.
External call example:
curl -v "http://<public-ip>/longResponse"
* Trying <public-ip>...
* TCP_NODELAY set
* Connected to <public-ip> (<public-ip>) port 80 (#0)
> GET /longResponse HTTP/1.1
> Host: <public-ip>
> User-Agent: curl/7.54.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host <public-ip> left intact
curl: (52) Empty reply from server
Internal call example:
/ # wget -O - -S <service-name>/longResponse
Connecting to location-service (10.3.255.181:80)
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Content-Length: 15
Date: Thu, 28 Feb 2019 10:31:14 GMT
Connection: close
- 100% |*********************************************************************************************************************************************************************************************************************| 15 0:00:00 ETA
/ #
I've tried to find documentation for a request or socket timeout at the load-balancer level, but I couldn't find anything. Any idea?
Thanks.

Are you sure that's not a client-side timeout? Network LB doesn't process packets other than to route them, so it should never send any response back.
Try the -m flag to curl?
Also maybe capture a tcpdump on your client side so you can see what the network is actually doing.
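For example (placeholders as in the question; -m caps curl's total time in seconds, and the capture needs root - note -i any is Linux-specific, so name your interface otherwise):
curl -v -m 300 "http://<public-ip>/longResponse"
sudo tcpdump -i any -w client.pcap host <public-ip> and port 80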

Get the load-balancer's backend name with:
gcloud compute backend-services list
then
BACKEND=name-of-your-backend
gcloud compute backend-services update $BACKEND --timeout=600s
Otherwise, in the console: Network services ⇒ Load balancing ⇒ Backends, where you can click your HTTP backend(s) and edit the settings, including the timeout.
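To check the value afterwards, something like this should work (assuming a global backend service; use --region=... instead if yours is regional):
gcloud compute backend-services describe $BACKEND --global --format="value(timeoutSec)"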
On a wider note, this may be one of several hops between server and client, each of which might time out. You're better off either living with the timeout (and making your long polls complete before it), or drip-feeding data down the line... for instance, you can prepend whitespace to JSON, sending a space character every 30 seconds until you have a proper response body. This will keep the load balancer from timing out.
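A minimal sketch of that drip-feed idea, assuming a Python/Flask backend (Flask, the route name, and the timings are illustrative assumptions, not part of the original setup):
import json
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from flask import Flask, Response

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=4)  # shared pool for the long jobs

def slow_work():
    time.sleep(300)  # stand-in for the real long-running computation
    return {"status": "done"}

@app.route("/longResponse")
def long_response():
    future = executor.submit(slow_work)

    def stream():
        while True:
            try:
                result = future.result(timeout=30)
                break
            except FutureTimeout:
                # Keep-alive byte: resets the load balancer's idle timer
                # every 30 s without corrupting the eventual JSON payload.
                yield " "
        yield json.dumps(result)

    return Response(stream(), mimetype="application/json")
A JSON parser on the client skips the leading whitespace, so the payload survives intact.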

Related

REST client gives error, while HTTP looks fine

My interface (an Arduino MKR WiFi 1010) runs a very simple REST API, but when testing it with MuleSoft's Advanced REST Client, I get this error:
The requested URL can't be reached
The service might be temporarily down or it may have moved permanently to a new web address.
The response status "0" is not allowed. See HTTP spec for more details: https://tools.ietf.org/html/rfc2616#section-6.1.1
When I check it with telnet though, it looks fine:
[bf#localhost ~]$ telnet 192.168.178.185 80
Trying 192.168.178.185...
Connected to 192.168.178.185.
Escape character is '^]'.
GET /api/gps HTTP/1.1
Host: 192.168.178.185
HTTP/1.1 200 OK
Connection: close
Content-Length: 9
Content-Type: application/json
"Success"
Connection closed by foreign host.
My question now is: is the REST client broken, or am I missing something in my reply? Of course, I want any REST client to be able to process my interface correctly.

Filter by hostname is not working in my Wireshark

I am using the RESTful client Insomnia to test my GET request.
I get a 500 Internal Server Error, as shown below:
* Preparing request to https://sample.azure-api.net/masterData/carTypes
* Using libcurl/7.57.0-DEV OpenSSL/1.0.2o zlib/1.2.11 libssh2/1.7.0_DEV
* Current time is 2019-08-30T05:03:09.029Z
* Disable timeout
* Enable automatic URL encoding
* Enable SSL validation
* Enable cookie sending with jar of 0 cookies
* Found bundle for host sample.azure-api.net: 0x205d69260b0 [can pipeline]
* Re-using existing connection! (#7) with host sample.azure-api.net
* Connected to sample.azure-api.net (XX.XXX.XXX.XX) port 443 (#7)
> GET /masterData/carTypes HTTP/1.1
> Host: sample.azure-api.net
> User-Agent: insomnia/6.6.2
> Accept: */*
< HTTP/1.1 500 Internal Server Error
< Content-Length: 111
< Content-Type: application/json
< Date: Fri, 30 Aug 2019 05:03:09 GMT
To troubleshoot, I opened Wireshark, selected the Ethernet2 interface, and started capturing traffic. I also added a filter as follows:
http.host == "sample.azure-api.net"
But I do not see any traffic when I apply the above filter.
However, when I filter by IP destination instead, I do see the traffic:
ip.dst == XX.XXX.XXX.XX && tcp.port == 443
Why is filtering by hostname not working?
What am I trying to solve? The root issue:
When I try the same request from C# code using a REST client, I get the error below:
{"The request was aborted: Could not create SSL/TLS secure channel."}
So basically I am trying to find where exactly the request is failing!

Kube-proxy or ELB "delaying" packets of HTTP requests

We're running a web API app on Kubernetes (1.9.3) in AWS (set up with kops). The app is a Deployment exposed by a Service (type: LoadBalancer), which is actually an ELB (v1) on AWS.
This generally works - except that some packets (fragments of HTTP requests) are "delayed" somewhere between the client and the app container (with both HTTP and HTTPS, which terminates on the ELB).
From the node side:
(Note: almost all packets on the server side arrive duplicated three times.)
We use keep-alive, so the TCP socket stays open and requests arrive and return pretty fast. Then the problem happens:
First, a packet with only the headers arrives [PSH,ACK] (I see the headers in the payload with tcpdump).
An [ACK] is sent back by the container.
The TCP socket/stream is quiet for a very long time (up to 30 s and more; the interval is not consistent, but we consider >1 s a problem).
Another [PSH,ACK] with the HTTP data arrives, and the request can finally be processed in the app.
From the client side:
I've run some traffic from my computer, recording it on the client side to see the other end of the problem, though I'm not 100% sure it represents the real client side.
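For reference, a client-side capture along these lines would do (interface name and ELB hostname are placeholders):
sudo tcpdump -i eth0 -s 0 -w client.pcap host <elb-hostname> and port 443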
A [PSH,ACK] with the headers goes out.
A couple of [ACK]s with parts of the payload start going out.
No response arrives for a few seconds (or more) and no more packets go out.
An [ACK] marked as [TCP Window Update] arrives.
Another short pause, then [ACK]s start arriving and the session continues until the end of the payload.
This is only happening under load.
To my understanding, this happens somewhere between the ELB and kube-proxy, but I'm clueless and desperate for help.
These are the arguments kube-proxy runs with:
Commands: /bin/sh -c mkfifo /tmp/pipe; (tee -a /var/log/kube-proxy.log < /tmp/pipe & ) ; exec /usr/local/bin/kube-proxy --cluster-cidr=100.96.0.0/11 --conntrack-max-per-core=131072 --hostname-override=ip-10-176-111-91.ec2.internal --kubeconfig=/var/lib/kube-proxy/kubeconfig --master=https://api.internal.prd.k8s.local --oom-score-adj=-998 --resource-container="" --v=2 > /tmp/pipe 2>&1
And we use Calico as the CNI.
So far I've tried:
Using service.beta.kubernetes.io/aws-load-balancer-type: "nlb" - the issue remained.
(Playing around with ELB settings hoping something will do the trick ¯\_(ツ)_/¯ )
Looking for errors in kube-proxy, I found rare occurrences of the following:
E0801 04:10:57.269475 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Get https://api.internal.prd.k8s.local/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp: lookup api.internal.prd.k8s.local on 10.176.0.2:53: no such host
...and...
E0801 04:09:48.075452 1 proxier.go:1667] Failed to execute iptables-restore: exit status 1 (iptables-restore: line 7 failed
)
I0801 04:09:48.075496 1 proxier.go:1669] Closing local ports after iptables-restore failure
I couldn't find anything describing such an issue and would appreciate any help. Ideas on how to continue troubleshooting are welcome.
Best,
A

404 redirect to another server/domain

I'm looking for a way to redirect to another domain if the response from the HTTP server was 404.
acl not_found status 404
acl found_ceph status 200
use_backend minio_s3 rsprep ^HTTP/1.1\ 404\ (.*)$ HTTP/1.1\ 302\ Found\nLocation:\ / if not_found
use_backend ceph if found_ceph
But it's still not working; requests go to the minio_s3 backend.
Thank you for your advice.
When the response from this backend has status 404, first add a Location header that will send the browser to example.com with the original URI intact, then set the status code to 302 so the browser executes a redirect.
backend my-backend
    mode http
    server my-server 203.0.113.113:80 check inter 60000 rise 1 fall 2
    http-response set-header Location http://example.com%[capture.req.uri] if { status eq 404 }
    http-response set-status 302 if { status eq 404 }
Test:
$ curl -v http://example.org/pics/funny/cat.jpg
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to example.org (127.0.0.1) port 80 (#0)
> GET /pics/funny/cat.jpg HTTP/1.1
> User-Agent: curl/7.35.0
> Host: example.org
> Accept: */*
The actual back-end returns 404, but we don't see it. Instead...
< HTTP/1.1 302 Moved Temporarily
< Last-Modified: Thu, 04 Aug 2016 16:59:51 GMT
< Content-Type: text/html
< Content-Length: 332
< Date: Sat, 07 Oct 2017 00:03:22 GMT
< Location: http://example.com/pics/funny/cat.jpg
The response body from the back-end's 404 error page will still be sent to the browser, but -- as it turns out -- the browser will not display it, so no harm done. This requires HAProxy 1.6 or later.
@Michael's answer is rather good, but it isn't working for me, for two reasons:
Mainly because the %[capture.req.uri] tag resolves to empty (HAProxy 1.7.9 Docker image).
Also because the original answer is incomplete: the frontend section is missing...
So I struggled for a while, as you find all kinds of answers on the Internet, between those who swear the 404 logic should go in the frontend and those who choose the backend, with every possible kind of tag...
This is my answer, which works for me.
My use case is that if an image is not found on the backend behind HA Proxy, then an S3 bucket is checked.
The entry point is: https://myhostname:8080/path/to/image.jpeg
defaults
    mode http
global
    log 127.0.0.1:514 local0 debug
frontend come_on_over_here
    bind :8080
    # The following two lines are here to save values while we have access to them.
    # They won't be available in the backend section.
    http-request set-var(txn.path) path
    http-request set-var(txn.query) query
    http-request replace-value Host localhost:8080 dev.local:80
    default_backend onprems_or_s3_be
backend onprems_or_s3_be
    log global
    acl path_photos var(txn.path) -m beg /path/prefix/i/want/to/strip/off
    acl p_ext_jpeg var(txn.path) -m end .jpeg
    acl is404 status eq 404
    http-response set-header Location https://mybucket.s3.eu-west-3.amazonaws.com"%[var(txn.path),regsub(^/path_prefix_i_want_to_strip_off/,/)]?%[var(txn.query)]" if path_photos p_ext_jpeg is404
    http-response set-status 301 if is404
    server onprems_server dev.local:80 check

HAProxy heartbeat with backend based on HTTP POST

I want to create a configuration such that the heartbeat between HAProxy and the backend is based on HTTP POST.
Does anyone have any idea about this?
I have tried the configuration below, but it only sends an HTTP HEAD to the backend server (I want HTTP POST):
backend mlp
    mode http
    balance roundrobin
    server mlp1 192.168.12.165:9210 check
    server mlp2 192.168.12.166:9210 check
Thanks for your help.
@Mohsin,
Thank you so much. It indeed works.
But I want to specify the request message, and it seems my configuration doesn't work. I'd appreciate it if you could help with that too.
[root@LB_vAPP_1 tmp]# more /var/www/index.txt
POST / HTTP/1.1\r\nHost: 176.16.0.8:2234\r\nContent-Length: 653\r\n\r\n<?xml version=\"1.0\" encoding=\"gb2312\"?>\r\n<svc_init ver=\"3.2.0\">\r\n<hdr ver=\"3.2.0\">\r\n<client>\r\n<id>915948</id>\r\n<pwd>915948</pwd>\r\n<serviceid></serviceid>\r\n</client>\r\n<requestor><id>13969041845</id></requestor>\r\n</hdr>\r\n<slir ver=\"3.2.0\" res_type=\"SYNC\">\r\n<msids><msid enc=\"ASC\" type=\"MSISDN\">00000000000</msid></msids>\r\n<eqop>\r\n<resp_req type=\"LOW_DELAY\"/>\r\n<hor_acc>200</hor_acc>\r\n</eqop>\r\n<geo_info>\r\n<CoordinateReferenceSystem>\r\n<Identifier>\r\n<code>4326</code>\r\n<codeSpace>EPSG</codeSpace>\r\n<edition>6.1</edition>\r\n</Identifier>\r\n</CoordinateReferenceSystem>\r\n</geo_info>\r\n<loc_type type=\"CURRENT_OR_LAST\"/>\r\n<prio type=\"HIGH\"/>\r\n</slir>\r\n</svc_init>\r\n\r\n\r\n\r\n
My haproxy.conf file is as below:
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events. This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #    file. A line like the following can be added to
    #    /etc/sysconfig/syslog
    #
    #    local2.* /var/log/haproxy.log
    #
    log 127.0.0.1 local7
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    ulimit-n 65536
    daemon
    nbproc 1
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
defaults
    mode tcp
    retries 3
    log global
    option redispatch
    # option abortonclose
    retries 3
    timeout queue 28s
    timeout connect 28s
    timeout client 28s
    timeout server 28s
    timeout check 1s
    maxconn 32000
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend mlp
    mode tcp
    option persist
    # bind 10.68.97.42:9211 ssl crt /etc/ssl/server.pem
    # bind 10.68.97.42:9211
    bind 10.68.97.42:9210
    default_backend mlp
frontend supl
    mode tcp
    option persist
    bind 10.68.97.42:7275
    default_backend supl
#-------------
# option1 http check
#------------
backend mlp
    mode http
    balance roundrobin
    option httpchk POST / HTTP/1.1\r\nHost: 176.16.0.8:2234\r\nContent-Length: 653\r\n\r\n{<?xml version=\"1.0\" encoding=\"gb2312\"?>\r\n<svc_init ver=\"3.2.0\">\r\n<hdr ver=\"3.2.0\">\r\n<client>\r\n<id>915948</id>\r\n<pwd>915948</pwd>\r\n<serviceid></serviceid>\r\n</client>\r\n<requestor><id>13969041845</id></requestor>\r\n</hdr>\r\n<slir ver=\"3.2.0\" res_type=\"SYNC\">\r\n<msids><msid enc=\"ASC\" type=\"MSISDN\">00000000000</msid></msids>\r\n<eqop>\r\n<resp_req type=\"LOW_DELAY\"/>\r\n<hor_acc>200</hor_acc>\r\n</eqop>\r\n<geo_info>\r\n<CoordinateReferenceSystem>\r\n<Identifier>\r\n<code>4326</code>\r\n<codeSpace>EPSG</codeSpace>\r\n<edition>6.1</edition>\r\n</Identifier>\r\n</CoordinateReferenceSystem>\r\n</geo_info>\r\n<loc_type type=\"CURRENT_OR_LAST\"/>\r\n<prio type=\"HIGH\"/>\r\n</slir>\r\n</svc_init>\r\n\r\n\r\n\r\n}
    http-check expect rstring <result resid=\"4\">UNKNOWN SUBSCRIBER</result>
    server mlp1 192.168.12.165:9210 check
    server mlp2 192.168.12.166:9210 check
    # server mlp2 192.168.12.166:9210 check
backend supl
    mode tcp
    source 0.0.0.0 usesrc clientip
    balance roundrobin
    server supl1 192.168.12.165:7275 check
    server supl2 192.168.12.166:7275 check
    # server supl2 192.168.12.166:7275 check
@Mohsin,
Thanks for your answer; it gave me the critical clue to resolve this issue.
However, my message is as below. It now works as I want (it sends the specified request and checks for the specified response). I'm posting it in the hope that it may help others too. One point: the Content-Length is very important.
backend mlp
    mode http
    balance roundrobin
    option httpchk POST / HTTP/1.1\r\nUser-Agent:HAProxy\r\nHost:176.16.0.8:2234\r\nContent-Type:\ text/xml\r\nContent-Length:516\r\n\r\n91594891594813969041845000000000003200
    http-check expect rstring <result resid=\"4\">UNKNOWN SUBSCRIBER</result>
    server mlp1 192.168.12.165:9210 check
    server mlp2 192.168.12.166:9210 check
I was able to get this working after a bit of experimenting.
This was my setup:
HAProxy -> NGINX -> Backend
I was sniffing the requests at the NGINX stage with tcpdump to see what was actually happening.
In order to change the health check request, we have to follow a hack described in the documentation to change the HTTP version and send headers:
It is possible to send HTTP headers after the string by concatenating them using \r\n and backslash-escaped spaces. This is useful to send Host headers when probing a virtual host.
This is the raw HTTP check I want to send:
POST ${ENDPOINT} HTTP/1.0
Content-Type: application/json

{"body": "json"}
The big issue here is that HAProxy adds a new header by itself, Connection: close, so this is what NGINX gets:
POST ${ENDPOINT} HTTP/1.0
Content-Type: application/json

{"body": "json"}
Connection: close
This leads, at least in my case, to 400 errors due to a malformed request.
The fix is to add a Content-Length header:
POST ${ENDPOINT} HTTP/1.0
Content-Type: application/json
Content-Length: 16

{"body": "json"}
Connection: close
Since the Content-Length takes precedence over the actual length, it forces the trailing Connection: close to fall beyond the declared body and be ignored. This is what NGINX passes to the backend:
POST ${ENDPOINT} HTTP/1.0
Host: ~^(.+)$
X-Real-IP: ${IP}
X-Forwarded-For: ${IP}
Connection: close
Content-Length: 16
Content-Type: application/json

{"body": "json"}
This is my final check:
option httpchk POST ${ENDPOINT} HTTP/1.0\r\nContent-Type:\ application/json\r\nContent-Length:\ 16\r\n\r\n{\"body\":\"json\"}
If it's just JSON, you should be OK copying and pasting this and adjusting the Content-Length.
However, I do recommend that you follow the same procedure and sniff the actual health checks, because, with the characters one has to escape in the config file, creating the request properly can be tricky.
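For that sniffing step, something like this on the NGINX host prints the checks in ASCII (the interface and port are placeholders for your setup):
sudo tcpdump -i any -A -s 0 port 80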
Open the haproxy/conf/haproxy.conf file. Go to the end of the file and you will see a line 'option httpchk GET /'; change GET to POST and you are done.
Let me know if you face any problem.
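For example, assuming the stock line, the edit would look like this (option httpchk takes an optional method and URI, so this sends POST / as the probe):
option httpchk POST /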