How to make HAProxy dispatch two consecutive requests to the same node - haproxy

In a two-node scenario using roundrobin, I want HAProxy to dispatch two requests to each node before switching to the next node.
I have a messaging application which first makes a request to obtain a messageID and then a second request to send the message.
If I use the standard roundrobin algorithm on two backend servers, one server ends up getting only the messageID requests, and the other does all the message sending.
This is not really balanced, since handing out messageIDs is trivial for a server, while handling the messages, which can be up to a few hundred MB, is all done by the other node.
I had a look at weighted roundrobin, but it does not seem to work out when using a weight of 2 for both servers, as the weights appear to be calculated relative to each other.
I'd be glad for any hint on how to make HAProxy switch backend nodes after sending two requests instead of one.
This is my current configuration, which still leads to a clear one-here, one-there round-robin pattern:
### frontend XTA Entry TLS/CA
frontend GMM_XTA_Entry_TLS_CA
mode tcp
bind 10.200.0.20:8444
default_backend GMM_XTA_Entrypoint_TLS_CA
### backend XTA Entry TLS/CA
backend GMM_XTA_Entrypoint_TLS_CA
mode tcp
server GMMAPPLB1-XTA-CA 10.200.0.21:8444 check port 8444 inter 1s rise 2 fall 3 weight 2
server GMMAPPLB2-XTA-CA 10.200.0.22:8444 check port 8444 inter 1s rise 2 fall 3 weight 2
Well, as stated, I would need a "two requests here, two requests there" round-robin pattern, but it keeps doing "one here, one there".
Glad for any hint, cheers,
Arend

To get the behavior you want, where requests go to a server two at a time, you can add an extra consecutive server line for each server in the backend, like so:
backend GMM_XTA_Entrypoint_TLS_CA
balance roundrobin
mode tcp
server GMMAPPLB1-XTA-CA_1 10.200.0.21:8444 check port 8444 inter 1s rise 2 fall 3
server GMMAPPLB1-XTA-CA_2 10.200.0.21:8444 track GMMAPPLB1-XTA-CA_1
server GMMAPPLB2-XTA-CA_1 10.200.0.22:8444 check port 8444 inter 1s rise 2 fall 3
server GMMAPPLB2-XTA-CA_2 10.200.0.22:8444 track GMMAPPLB2-XTA-CA_1
However, if you can use HAProxy 1.9 or above, you can also use the balance random option, which should distribute requests evenly across your servers at random. I think this may solve the balancing problem you described more directly. Also, balance random will keep balancing your requests evenly if the mix of request types changes.
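For reference, a minimal sketch of what your backend could look like with balance random, reusing the names and addresses from your question (assumes HAProxy 1.9+):
backend GMM_XTA_Entrypoint_TLS_CA
balance random
mode tcp
server GMMAPPLB1-XTA-CA 10.200.0.21:8444 check port 8444 inter 1s rise 2 fall 3
server GMMAPPLB2-XTA-CA 10.200.0.22:8444 check port 8444 inter 1s rise 2 fall 3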

The proposed answer using four server entries in the backend did the job.
I am not sure if it is the most elegant solution, but it helped me understand the usage of backends a bit more; thanks again for that.

Related

What could "reason: Layer6 timeout" possibly mean?

I have HAProxy configured with two servers in the backend. Occasionally, every 16-20 hours, one of them gets marked by HAProxy as DOWN:
haproxy.log-20190731:2019-07-30T16:16:24+00:00 <local2.alert> haproxy[2716]: Server be_kibana_elastic/kibana8 is DOWN, reason: Layer6 timeout, check duration: 2000ms. 0 active and 0 backup servers left. 8 sessions active, 0 requeued, 0 remaining in queue.
I did some reading on how HAProxy runs its checks, but the Layer6 timeout does not tell me much. What could be possible reasons for that timeout? What does it actually mean?
Here is my backend configuration:
backend be_kibana_elastic
balance roundrobin
stick on src
stick-table type ip size 100k expire 12h
server kibana8 172.24.0.1:5601 check ssl verify none
server kibana9 172.24.0.2:5601 check ssl verify none
Layer 6 refers to TLS. The backend is accepting a TCP connection but isn't negotiating TLS (SSL) on the health check connection within the allowed time.
The configuration values timeout connect, timeout check, and inter all interact to determine how much time health checks are allowed to complete. The default value of inter, if not specified, is 2000 milliseconds, which is what you're seeing here. By default, inter (the health check interval) determines both how often checks run and how long they are allowed to take.
Since you have not configured a fall count for the servers, the implication is that the default value of 3 is being used, which means your server is actually failing 3 consecutive health checks before being marked down.
Consider adding option log-health-checks to the backend declaration, which will create additional log entries for those initial failing checks before the final one causes the server to be marked down.
Increasing the allowable time may avoid the failure, but is probably valid only for testing -- not a fix -- because if your backend can't reliably respond to a check within 2000 ms, then it also can't reliably respond to client connections within that time frame, which is a long time to wait for a response.
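For illustration, a sketch of your backend with health-check logging enabled and a more generous check timeout (the 5s value is an arbitrary assumption for troubleshooting only, per the caveat above):
backend be_kibana_elastic
balance roundrobin
option log-health-checks
# troubleshooting only: give the TLS handshake in the check more time than the default 2000 ms
timeout check 5s
stick on src
stick-table type ip size 100k expire 12h
server kibana8 172.24.0.1:5601 check ssl verify none
server kibana9 172.24.0.2:5601 check ssl verify none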
Note that intermittent packet loss will typically cause sluggish behavior in increments of 3000 ms, because TCP stacks often use an initial retransmission timeout (RTO) of 3 seconds. Since this is more than 2000 ms, packet loss on your network is one possible explanation for the problem.
Another possible explanation is excessive CPU load on the backend, either related to traffic or to a cron job doing something intensive, because TLS negotiation -- relatively speaking -- is an expensive process from the CPU's perspective.

Low latency two-phase protocol

I'm looking for recommendations on how to achieve low latency for the following network protocol:
1. Alice sends out a request for information to many peers selected at random from a very large pool.
2. Each peer responds with a small packet (<20 KB).
3. Alice aggregates the responses and selects a peer accordingly.
4. Alice and the selected peer then continue to the second phase of the protocol, whereby a sequence of 2 requests and responses are performed.
5. Repeat from 1.
Given that steps 1 and 2 do not need to be reliable (as long as a percentage of responses arrive back, we proceed to step 3), and that step 1 is essentially a multicast, this part of the protocol seems to suit UDP: setting up a TCP connection to these peers would add an additional round trip.
However, step 4 needs to be reliable: we can't tolerate packet loss during the subsequent requests/responses.
The conundrum I'm facing is that UDP suits steps 1 and 2, while TCP suits step 4. Connecting to every peer selected in step 1 is slow, especially since we aim to transmit just 20 KB, but UDP cannot be tolerated for step 4. Handshaking with the peer selected for step 4 would require an additional round trip, which, compared to the 3 round trips, is still a considerable increase in total time.
Is there some hybrid scheme whereby you can do a TCP handshake while transmitting a small amount of data? (The handshake could be merged into steps 1 and 2 and hence wouldn't add any additional round-trip time.)
Is there a name for such protocols? What should I read to become more acquainted with such problems?
Additional info:
Participants are assumed to be randomly distributed around the globe and connected via the internet.
The pool selected from in step 1 is on the order of 1000 addresses, and the sample from the pool is on the order of 10 to 100.
There's not enough detail here to do a well-informed criticism. If you were hiring me for advice, I'd want to know a lot more about the proposal, but since I'm doing this for free, I'll just answer the question as asked, and try to make it practical rather than ideal.
I'd argue that UDP is not suitable for the early part of your protocol. You can't just multicast a single packet to a large number of hosts on the Internet (although you can do it on typical LANs). A 20KB payload is not the sort of thing you can generally transmit in a single datagram in any case, and the moment messages fail to fit in a single datagram, UDP loses most of its attraction, because you start reinventing TCP (badly).
Probably the simplest thing you can do is base your system on HTTP, and work with implementations which incorporate all the various speed-ups that Google (mostly) has been putting into HTTP development. This includes TCP Fast Open, and things like it. Initiate connections out to your chosen servers; some will respond faster than others: use that to your advantage by going with the quickest ones. Don't underestimate the importance of efficient implementation relative to theoretical round-trip time, by the way.
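To make the "data in the handshake" point concrete, here is a minimal TCP Fast Open client sketch at the socket level (Linux-only; it assumes the net.ipv4.tcp_fastopen sysctl permits client use, that your Python build exposes socket.MSG_FASTOPEN, and the peer address and payload below are placeholders, not part of any real protocol):
import socket

def tfo_request(peer, payload):
    # sendto() with MSG_FASTOPEN both connects and sends the first bytes;
    # with a cached Fast Open cookie the payload rides in the SYN, saving a
    # round trip, otherwise the kernel falls back to a normal handshake.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.sendto(payload, socket.MSG_FASTOPEN, peer)
    return s.recv(65536)  # read the peer's reply as usual

# Example with placeholder values:
# reply = tfo_request(("198.51.100.7", 9000), b"phase1-request")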
For stage two, continue with HTTP as before. For efficiency, you could hold all the connections open at the end of phase one and then close all the ones except your chosen phase two partner. It's not clear from your description that the phase two exchange lends itself to the HTTP model, though, so I have to hand-wave this a bit.
It's also possible that you can simply hold TCP connections open to all available peers more or less permanently, thus dodging the cost of connection establishment nearly all the time. A thousand simultaneous open connections is large, but not outrageous in most contexts (although you may need to tweak OS settings to allow it). If you do that, you can just talk whatever protocol you like over TCP. If it's a truly peer-to-peer protocol, you only need one TCP connection per pair. Implementing this kind of thing is tricky, though: an average programmer will do a terrible job of it, in my experience.
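And a rough sketch of the "hold the connections open and prefer the quickest responders" approach, stdlib only (the peer list and the probe message are placeholder assumptions; a real implementation would probe in parallel with selectors or asyncio rather than sequentially):
import socket
import time

PEERS = [("198.51.100.1", 9000), ("198.51.100.2", 9000)]  # placeholder addresses

def open_connections(peers):
    # One persistent TCP connection per peer, so later phases pay no handshake
    # cost. A thousand of these may require raising the file-descriptor limit.
    conns = {}
    for addr in peers:
        s = socket.create_connection(addr, timeout=2)
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # push small messages out immediately
        conns[addr] = s
    return conns

def rank_by_latency(conns, probe=b"phase1-request\n"):
    # Send the phase-1 request on every open connection and rank the peers by
    # how quickly a reply comes back.
    timings = []
    for addr, s in conns.items():
        start = time.monotonic()
        s.sendall(probe)
        s.recv(65536)  # assumes the peer sends back one small reply
        timings.append((time.monotonic() - start, addr))
    return [addr for _, addr in sorted(timings)]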

Kubernetes Service: IPVS load balancing algorithm

As discovered here, there is a new IPVS mode for kube services that offers many algorithms for load balancing.
The only problem is that I couldn't find where those algorithms are specified.
My understanding:
rr: round robin
-> calls the backend pods one after another in a loop
lc: least connection
-> picks the pod with the lowest number of connections and sends the message to it. Which kind of connections? Only the ones from this service?
dh: destination hashing
-> something based on the URL?
sh: source hashing
-> something based on the URL?
sed: shortest expected delay
-> either the backend with the lowest ping, or some logic based on the time a backend took to respond in the past
nq: never queue
-> same as least connection, but refusing messages at some point?
If anyone has the documentation link (it is not provided on the official page, which still says IPVS is beta even though it has been stable since 1.11), or knows the real algorithm behind each of them, please help.
I tried a Google search with these terms and a lookup in the official documentation.
They are defined in the code
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/apis/config/types.go#L193
rr (round robin): distributes jobs equally amongst the available real servers
lc (least connection): assigns more jobs to real servers with fewer active jobs
sh (source hashing): assigns jobs to servers through looking up a statically assigned hash table by their source IP addresses
dh (destination hashing): assigns jobs to servers through looking up a statically assigned hash table by their destination IP addresses
sed (shortest expected delay): assigns an incoming job to the server with the shortest expected delay. The expected delay that the job will experience is (Ci + 1) / Ui if sent to the ith server, in which Ci is the number of jobs on the ith server and Ui is the fixed service rate (weight) of the ith server.
nq (never queue): assigns an incoming job to an idle server if there is one, instead of waiting for a fast one; if all the servers are busy, it adopts the shortest expected delay policy to assign the job.
All of these come from the official IPVS documentation: http://www.linuxvirtualserver.org/docs/scheduling.html
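For completeness, the scheduler is chosen through the kube-proxy configuration; a minimal sketch of the relevant fields in KubeProxyConfiguration (the sed value here is just an example):
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "sed"   # one of rr, lc, dh, sh, sed, nq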
Regards

200 vs 403 server response - which degrades server's performance more?

Some rogue people have set up server monitoring that connects to the server every 2 minutes to check if it's down (they connect from several different accounts, so they ping the server every 20 seconds or so). It's a simple GET request.
I have two options:
Leave it as it is (ie. allow them via a normal 200 server response).
Block them by either IP or user-agent (giving 403 response).
My question is: what is the better solution as far as server performance is concerned (i.e. what is less 'stressful' on the server), 1 (the 200 response) or 2 (the 403 response)?
I'm inclined towards #1, since there would be no IP/user-agent checking, which should mean less stress on the server. Correct?
It doesn't matter.
The cost of the status code and of an if-check on the user-agent string is completely dominated by network I/O, garbage collection, and the server's other subsystems.
If they just query every 2 minutes, I'd very much leave it alone. If they query a few hundred times per second, it's time to act.

How to enforce single threaded requests?

I've started rate limiting my API using HAProxy, but my biggest problem is not so much the rate of requests as what happens when multi-threaded requests overlap.
Even within my legal per-second limits, big problems occur when clients don't wait for a response before issuing another request.
Is it possible (say, per IP address) to queue requests and pass them one at a time to the back end for sequential processing?
Here is a possible solution to enforce one connection at a time per src IP.
You need to put the following HAProxy conf in the corresponding frontend:
frontend fe_main
mode http
stick-table type ip size 1m store conn_cur
tcp-request connection track-sc0 src
tcp-request connection reject if { src_conn_cur gt 1 }
This creates a stick table that stores the number of concurrent connections per source IP, and then rejects new connections if there is already one established from the same source IP.
Note that browsers initiating multiple connections to your API, or clients behind a NAT, will not be able to use your API efficiently.
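If you would rather queue the excess requests than reject them, a related sketch (backend name, address, and timeout values are hypothetical) is to cap each server with maxconn 1, so HAProxy holds further requests in its queue and feeds them to the server one at a time; note this serializes per server rather than per source IP:
backend be_api
mode http
timeout queue 30s
# with maxconn 1, requests beyond the first wait in HAProxy's queue
server api1 192.0.2.10:8080 check maxconn 1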