HAProxy layer 7 health check failure

I am getting occasional layer 7 health check failures. They happen on a production machine seemingly at random, roughly once a minute or every few minutes on average. Here is the configuration:
backend api
    mode http
    option httpchk GET /api/v1/status HTTP/1.0
    http-check expect status 200
    balance roundrobin
    server api1 127.0.0.1:8001 check fall 3 rise 2
    server api2 127.0.0.1:8002 check fall 3 rise 2
The HAProxy log shows the following:
Health check for server api/api2 failed, reason: Layer7 timeout, check duration: 10001ms, status: 2/3 UP.
The strange thing is that when I run a script that fetches the same URL at a much faster pace than HAProxy does, it always returns a 200 response. It never hangs the way it seems to for HAProxy.
In addition, I'm getting occasional HAProxy errors for various API calls, not just health checks, and they all look quite similar:
https-in~ api/api1 45/0/0/-1/30045 504 194 - - sHVN 50/49/13/10/0 0/0 "POST /api/v1/accounts HTTP/1.1"
What could be the issue here? This one really got me stumped.
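To compare like with like, the test script can mimic the health check more closely. The sketch below is a hypothetical helper, not anything shipped with HAProxy: it sends the same bare request line that option httpchk is configured with (GET /api/v1/status HTTP/1.0) on a fresh TCP connection for every probe, records how long the probe takes, and reports a timeout or refused connection as a failure. The 10-second default mirrors the 10001 ms check duration in the log; the host, port and timeout are assumptions to adjust.

```python
import socket
import time


def http10_check(host, port, path="/api/v1/status", timeout=10.0):
    """Probe an HTTP endpoint roughly the way an httpchk-style check does.

    Opens a new TCP connection, sends a bare "GET <path> HTTP/1.0" request,
    and reads until the status line arrives. Returns a tuple
    (status_code or None, elapsed_seconds, error_message or None).
    """
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    start = time.monotonic()
    data = b""
    try:
        # `timeout` applies to each connect/send/recv call, not to the whole probe.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            while b"\r\n" not in data:
                chunk = sock.recv(4096)
                if not chunk:  # server closed before sending a status line
                    break
                data += chunk
    except OSError as exc:  # includes socket.timeout and ConnectionRefusedError
        return None, time.monotonic() - start, str(exc) or type(exc).__name__
    elapsed = time.monotonic() - start
    status_line = data.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split()
    if len(parts) >= 2 and parts[1].isdigit():
        return int(parts[1]), elapsed, None
    return None, elapsed, f"unexpected status line: {status_line!r}"
```

Calling it in a loop against 127.0.0.1:8002 and logging every probe whose status is not 200 or whose elapsed time is unusually high would show whether api2 stalls on fresh connections, which is closer to what the health check experiences than a client that reuses one connection.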

Related

Getting 504 gateway timeout error when accessing node application through haproxy

I am facing the following situation when configuring HAProxy with a Node/Express application. I am trying to achieve the following:
browser ==(https)==> haproxy ==(http)==> node application
When loading the Node application through the browser, I am getting an HTTP 504 Gateway Time-out error.
Below is my HAProxy configuration:
haproxy configurations
Following are the HAProxy logs:
vm-2 haproxy[21255]: 127.0.0.1:45948 [23/Dec/2019:10:57:51.411] https-in~ servers/server1 0/0/0/-1/100001 504 194 - - sH-- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
vm-2 haproxy[21255]: 127.0.0.1:46122 [23/Dec/2019:10:59:31.435] https-in~ servers/server1 0/0/0/-1/100002 504 194 - - sH-- 1/1/0/0/0 0/0 "GET /favicon.ico HTTP/1.1"
Any help would be appreciated.
Your HAProxy logs indicate that each request runs for just over 100 seconds (100001/100002 ms) and is aborted (the -1) before your backend server has sent its response headers.
If you're looking for a strictly HAProxy solution (i.e. you can't or won't tune your application), then you need to adjust HAProxy's timeout settings, in particular timeout server.
We faced the same problem: the 504s our clients received were generated by HAProxy itself. We found that the timeout server setting in the defaults section of haproxy.cfg was what produced the 504 (setting it to a low value, 1s in our case, immediately resulted in 504s). Increasing that value lets HAProxy wait longer for the backend to respond.
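The fields quoted above can be decoded mechanically. The sketch below is a hypothetical helper based on the HTTP log format described in the HAProxy configuration manual: the five slash-separated timers are named TR/Tw/Tc/Tr/Ta there (older versions call the first and last Tq and Tt), and the first two characters of the termination state give the cause and the session phase. Only a subset of the documented codes is mapped.

```python
TIMER_NAMES = ("TR", "Tw", "Tc", "Tr", "Ta")  # milliseconds; -1 = phase never completed

# First character of the termination state: why the session ended (subset).
CAUSES = {
    "-": "normal termination",
    "C": "client aborted",
    "S": "server aborted or refused",
    "P": "aborted by the proxy",
    "c": "client-side timeout",
    "s": "server-side timeout",
}

# Second character: which phase the session was in when it ended (subset).
PHASES = {
    "-": "normal completion",
    "R": "waiting for the client request",
    "Q": "waiting in the queue",
    "C": "waiting for the connection to the server",
    "H": "waiting for the server's response headers",
    "D": "transferring data",
}


def decode_timers(field):
    """Turn a timer field such as "0/0/0/-1/100001" into a dict of ints."""
    values = [int(v) for v in field.split("/")]
    if len(values) != len(TIMER_NAMES):
        raise ValueError(f"expected 5 timers, got {field!r}")
    return dict(zip(TIMER_NAMES, values))


def describe_termination(state):
    """Explain the first two characters of a state such as "sH--"."""
    cause = CAUSES.get(state[0], "other cause")
    phase = PHASES.get(state[1], "other phase")
    return f"{cause}, {phase}"
```

For the lines above, decode_timers("0/0/0/-1/100001") gives Tr = -1 and Ta = 100001, and describe_termination("sH--") reads "server-side timeout, waiting for the server's response headers": HAProxy stopped waiting after about 100 s because the Node application had not sent any response headers yet. The sHVN entry in the first question reads the same way, with a limit of about 30 s.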
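To see this behaviour in isolation, one can put a deliberately slow backend behind HAProxy and vary timeout server. The sketch below is a hypothetical test server (not the Node application) that waits delay_seconds before sending any headers. Based on the answers above, with a timeout server shorter than the delay HAProxy should log sH-- and return a 504, and with a longer one the request should succeed. The port and delay are assumptions.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def start_slow_backend(delay_seconds, host="127.0.0.1", port=0):
    """Start an HTTP server that waits delay_seconds before sending headers.

    Runs in a background thread and returns the server object; the bound
    port is server.server_address[1] (port=0 lets the OS pick one).
    """

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_seconds)  # simulate a slow application handler
            body = b"done\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the console quiet

    server = ThreadingHTTPServer((host, port), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, start_slow_backend(5, port=9000) would serve on 127.0.0.1:9000; a server line pointing at that address combined with timeout server 1s should reproduce the 504 described above.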

Kube-proxy or ELB "delaying" packets of HTTP requests

We're running a web API app on Kubernetes (1.9.3) in AWS (set up with kops). The app is a Deployment exposed through a Service (type: LoadBalancer), which is actually an ELB (v1) on AWS.
This generally works, except that some packets (fragments of HTTP requests) are "delayed" somewhere between the client and the app container. This happens with both HTTP and HTTPS (HTTPS terminates on the ELB).
From the node side:
(Note: almost all packets on the server side arrive duplicated three times.)
We use keep-alive, so the TCP socket stays open and requests arrive and return quickly. Then the problem happens:
First, a packet with only the headers arrives [PSH,ACK] (I can see the headers in the payload with tcpdump).
An [ACK] is sent back by the container.
The TCP socket/stream is quiet for a very long time (up to 30s or more; the interval is not consistent, and we consider anything over 1s a problem).
Another [PSH,ACK] with the HTTP data arrives, and the request can finally be processed by the app.
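To measure the same gap from inside the app container rather than from tcpdump, the server can timestamp its reads. The sketch below is a hypothetical diagnostic helper, not part of kube-proxy or the app: it reads one request from an already-accepted socket and returns how long the headers took to arrive after the first byte and how long the body took after the headers. It assumes a Content-Length body (no chunked encoding) and does not handle pipelined requests.

```python
import socket
import time


def time_request_parts(conn):
    """Read one HTTP request from conn and time its header and body arrival.

    Returns (seconds_from_first_byte_to_end_of_headers,
             seconds_from_end_of_headers_to_end_of_body).
    """
    buf = b""
    first_byte_at = None
    # Phase 1: read until the blank line that terminates the headers.
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before headers completed")
        if first_byte_at is None:
            first_byte_at = time.monotonic()
        buf += chunk
    headers_at = time.monotonic()
    head, _, body = buf.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    # Phase 2: read until the declared body length has arrived.
    while len(body) < length:
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before body completed")
        body += chunk
    body_at = time.monotonic()
    return headers_at - first_byte_at, body_at - headers_at
```

Logging every request whose second value exceeds 1 s would match the threshold mentioned above and show whether the pause is visible to the application itself.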
From the client side:
I've run some traffic from my computer and recorded it on the client side to see the other end of the problem, but I'm not 100% sure it represents the real client side.
A [PSH,ACK] with the headers goes out.
A couple of [ACK]s with parts of the payload start going out.
No response arrives for a few seconds (or more), and no more packets go out.
An [ACK] marked as [TCP Window update] arrives.
After another short pause, [ACK]s start arriving and the session continues until the end of the payload.
This is only happening under load.
As far as I understand, the problem lies somewhere between the ELB and kube-proxy, but I'm clueless and desperate for help.
These are the arguments kube-proxy runs with:
Commands: /bin/sh -c mkfifo /tmp/pipe; (tee -a /var/log/kube-proxy.log < /tmp/pipe & ) ; exec /usr/local/bin/kube-proxy --cluster-cidr=100.96.0.0/11 --conntrack-max-per-core=131072 --hostname-override=ip-10-176-111-91.ec2.internal --kubeconfig=/var/lib/kube-proxy/kubeconfig --master=https://api.internal.prd.k8s.local --oom-score-adj=-998 --resource-container="" --v=2 > /tmp/pipe 2>&1
And we use Calico as a CNI:
So far I've tried:
Using service.beta.kubernetes.io/aws-load-balancer-type: "nlb" (the issue remained).
Playing around with ELB settings, hoping something would do the trick ¯\_(ツ)_/¯
Looking for errors in kube-proxy; I found rare occurrences of the following:
E0801 04:10:57.269475 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:85: Failed to list *core.Endpoints: Get https://api.internal.prd.k8s.local/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp: lookup api.internal.prd.k8s.local on 10.176.0.2:53: no such host
...and...
E0801 04:09:48.075452 1 proxier.go:1667] Failed to execute iptables-restore: exit status 1 (iptables-restore: line 7 failed
)
I0801 04:09:48.075496 1 proxier.go:1669] Closing local ports after iptables-restore failure
I couldn't find anything describing such an issue and would appreciate any help. Ideas on how to continue troubleshooting are welcome.
Best,
A

Connection reset by tomcat server on continuous reception of HTTP GET request

I am load testing a web server. Currently I am using Tomcat 6 to test my code. While the test is running, the server resets the connection after a few minutes of receiving continuous GET requests for the same page. If I send the GET requests with some gap between them (say 500 ms), it works fine. If I send them 10 ms apart or less, the server resets the connection a few seconds after the start of the test. How can I fix this problem? What is the reason for the reset? Is the server overloaded, or do I have to perform some operation while establishing the connection?
My GET request format is:
GET /index.html HTTP/1.1
Host: 180.168.40.40
Connection: keep-alive
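A reproducible client makes it easier to tell whether the server closes the connection deliberately or resets it. The sketch below is a hypothetical test client built on Python's http.client: it sends count GET requests over one keep-alive connection, interval seconds apart, and reports how many responses arrived and what ended the run. It stops as soon as the server answers with Connection: close, because a client that keeps writing on a connection the server has decided to close can receive a reset. Whether that is what happens here is an assumption to verify; Tomcat's HTTP connector, for instance, has a maxKeepAliveRequests setting that limits the number of requests per keep-alive connection.

```python
import http.client
import time


def keepalive_run(host, port, path="/index.html", count=100, interval=0.01):
    """Send `count` GETs over one keep-alive connection, `interval` s apart.

    Returns (responses_received, stop_reason). stop_reason is None when all
    requests succeeded, otherwise a short description of what ended the run.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    received = 0
    try:
        for _ in range(count):
            conn.request("GET", path, headers={"Connection": "keep-alive"})
            resp = conn.getresponse()
            resp.read()  # drain the body before reusing the connection
            received += 1
            # The server may announce that it is about to close the connection.
            if resp.getheader("Connection", "").lower() == "close":
                return received, "server sent Connection: close"
            time.sleep(interval)
    except ConnectionError as exc:  # reset, broken pipe, RemoteDisconnected, ...
        return received, f"{type(exc).__name__}: {exc}"
    finally:
        conn.close()
    return received, None
```

For example, keepalive_run("180.168.40.40", 80, count=10000, interval=0.01) would mirror the 10 ms test from the question (port 80 is assumed, since the Host header carries no port).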

Wget gives up too quickly on an Express API

I want to download the result of an Express.js REST API that is very slow to process (~10 minutes). I tried a few timeout options with wget, but it gives up after a few minutes even though I ask it to wait around 60,000 years.
wget "http://localhost:5000/slowstuff" --http-user=user --http-password=password --read-timeout=1808080878708 --tries=1
--2015-02-26 11:14:21-- http://localhost:5000/slowstuff
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:5000... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="Authorization Required"
Reusing existing connection to [localhost]:5000.
HTTP request sent, awaiting response... No data received.
Giving up.
EDIT:
The problem doesn't come from the wget timeout value. With a timeout set to 4 seconds, the error is different: Read error (Connection timed out) in headers. And I have exactly the same problem with curl.
I think the problem comes from my API: it looks like Node.js sets a 2-minute timeout by default.
Now I need to find out how to change this value.
This
--http-password=password--read-timeout=1808080878708
is missing a space. Use
--http-password=password --read-timeout=1808080878708

HAProxy 503 Service Unavailable. No server is available to handle this request

How does HAProxy deal with static files like .css, .js and .jpeg? When I use my configuration file, my browser says:
503 Service Unavailable
No server is available to handle this request.
This is my config:
global
    daemon
    group root
    maxconn 4000
    pidfile /var/run/haproxy.pid
    user root
defaults
    log global
    option redispatch
    maxconn 65535
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000
    retries 3
    log 127.0.0.1 local3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout check 10s
listen dashboard_cluster :8888
    mode http
    stats refresh 5s
    balance roundrobin
    option httpclose
    option tcplog
    #stats realm Haproxy \ statistic
    acl url_static path_beg -i /static
    acl url_static path_end -i .css .jpg .jpeg .gif .png .js
    use_backend static_server if url_static
backend static_server
    mode http
    balance roundrobin
    option httpclose
    option tcplog
    stats realm Haproxy \ statistic
    server controller1 10.0.3.139:80 cookie controller1 check inter 2000 rise 2 fall 5
    server controller2 10.0.3.113:80 cookie controller2 check inter 2000 rise 2 fall 5
Is my file wrong? What should I do to solve this problem? Thanks!
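What I think is the cause:
There is no default_backend defined, so any request that does not match url_static has nowhere to go (the listen section has no servers of its own), and HAProxy sends a 503. This appears as NOSRV in the logs.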
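The routing decision in the listen section can be modelled in a few lines. The sketch below is a simplified model, not HAProxy code: the two acl url_static lines share a name, so HAProxy ORs them; a request matching neither has no backend, and since dashboard_cluster has no server lines, HAProxy answers 503. Adding a default_backend line that points at a backend with working servers removes the 503 for those paths; app_servers in the example is a hypothetical backend name.

```python
STATIC_EXTENSIONS = (".css", ".jpg", ".jpeg", ".gif", ".png", ".js")


def pick_backend(path, default_backend=None):
    """Return the backend a request would be routed to, or None.

    Models `acl url_static path_beg -i /static` OR
    `acl url_static path_end -i <extensions>`, then `use_backend`.
    """
    p = path.split("?", 1)[0].lower()  # `path` excludes the query; -i = ignore case
    if p.startswith("/static") or p.endswith(STATIC_EXTENSIONS):
        return "static_server"
    return default_backend


def status_for(path, default_backend=None):
    # Assumes the chosen backend has at least one healthy server.
    return 200 if pick_backend(path, default_backend) else 503
```

In this model, status_for("/") is 503 with the configuration as posted and 200 once a default_backend is supplied.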
Another Possible Cause
In one case I ran into, the HTTP 503 errors I received were caused by two bindings I had for the same IP and port, x.x.x.x:80:
frontend test_fe
    bind x.x.x.x:80
    bind x.x.x.x:443 ssl blah
    # more config here
frontend conflicting_fe
    bind x.x.x.x:80
    # more config here
The HAProxy configuration check does not warn you about this, and netstat doesn't show two LISTEN entries, which is why it took me a while to realize what was going on.
This can also happen if you have two HAProxy instances running. Check the running processes and terminate the older one.
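Since the configuration check stays quiet about this, a rough text scan can catch it before a reload. The sketch below is a hypothetical helper, not an HAProxy feature: it records the first address of every bind line in frontend and listen sections and reports addresses claimed by more than one section. It is deliberately naive: it does not expand wildcards, comma-separated address lists or port ranges, and it ignores addresses written on the listen line itself (such as listen dashboard_cluster :8888).

```python
from collections import defaultdict

# Keywords that open a new section in haproxy.cfg (common ones only).
SECTION_KEYWORDS = {
    "global", "defaults", "frontend", "backend", "listen",
    "peers", "resolvers", "userlist", "mailers", "cache", "program",
}


def duplicate_binds(config_text):
    """Return {address: [section, ...]} for bind addresses used by >1 section."""
    owners = defaultdict(list)
    section = ""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        words = line.split()
        if words[0] in SECTION_KEYWORDS:
            section = " ".join(words[:2])  # e.g. "frontend test_fe"
        elif (words[0] == "bind" and len(words) > 1
              and section.split(" ")[0] in ("frontend", "listen")):
            owners[words[1]].append(section)
    return {addr: secs for addr, secs in owners.items() if len(secs) > 1}
```

For the two frontends shown above, it would report x.x.x.x:80 as bound by both frontend test_fe and frontend conflicting_fe.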
Try making the timers bigger and check that the server is reachable.
A 503 can happen for many reasons. From the HAProxy docs:
The status code is always 3-digit. The first digit indicates a general status :
- 1xx = informational message to be skipped (eg: 100, 101)
- 2xx = OK, content is following (eg: 200, 206)
- 3xx = OK, no content following (eg: 302, 304)
- 4xx = error caused by the client (eg: 401, 403, 404)
- 5xx = error caused by the server (eg: 500, 502, 503)
503 when no server was available to handle the request, or in response to
monitoring requests which match the "monitor fail" condition
When a server's maxconn is reached, connections are left pending in a queue
which may be server-specific or global to the backend. In order not to wait
indefinitely, a timeout is applied to requests pending in the queue. If the
timeout is reached, it is considered that the request will almost never be
served, so it is dropped and a 503 error is returned to the client.
If you see SC in the logs:
SC The server or an equipment between it and haproxy explicitly refused
the TCP connection (the proxy received a TCP RST or an ICMP message
in return). Under some circumstances, it can also be the network
stack telling the proxy that the server is unreachable (eg: no route,
or no ARP response on local network). When this happens in HTTP mode,
the status code is likely a 502 or 503 here.
Check the ACLs, check the timeouts, and check the logs; that's the most important part.