How to handle ephemeral port exhaustion in Envoy - Kubernetes

One of the problems with reverse proxies handling many requests on behalf of clients is that, after a while under heavy load, the outgoing connections from the Envoy node to the backend nodes run out of ephemeral ports.
Assuming we have assigned multiple IP addresses/hostnames to the Envoy node, is there a way to tell Envoy to use these IP addresses/hostnames in a round-robin fashion when making connections to the backends?
References:
https://blog.box.com/blog/ephemeral-port-exhaustion-and-web-services-at-scale/
https://making.pusher.com/ephemeral-port-exhaustion-and-how-to-avoid-it/
https://www.nginx.com/blog/overcoming-ephemeral-port-exhaustion-nginx-plus/
https://github.com/kubernetes/kubernetes/issues/27398

The most promising option is to find a way to enable TCP multiplexing between your proxy/LB and backend servers.
What is TCP Multiplexing?
TCP multiplexing is a technique used primarily by load balancers and application delivery controllers (but also by some stand-alone web application acceleration solutions) that enables the device to "reuse" existing TCP connections. This is similar to the way in which persistent HTTP 1.1 connections work in that a single HTTP connection can be used to retrieve multiple objects, thus reducing the impact of TCP overhead on application performance.
TCP multiplexing allows the same thing to happen for TCP-based applications (usually HTTP / web) except that instead of the reuse being limited to only 1 client, the connections can be reused over many clients, resulting in much greater efficiency of web servers and faster performing applications.
Another good explanation about TCP multiplexing can be found here.
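To make the connection-reuse idea concrete, here is a minimal Python sketch (backend.internal and the request paths are hypothetical placeholders): as long as each response body is drained, HTTP/1.1 keep-alive lets many requests share a single TCP connection, and therefore a single ephemeral port, on the proxy side.

import http.client

# One HTTP/1.1 connection (one ephemeral port) carries several requests.
conn = http.client.HTTPConnection("backend.internal", 8080, timeout=5)
try:
    for path in ("/items/1", "/items/2", "/items/3"):
        conn.request("GET", path, headers={"Connection": "keep-alive"})
        resp = conn.getresponse()
        body = resp.read()   # drain the body so the connection can be reused
        print(path, resp.status, len(body))
finally:
    conn.close()             # one socket served all three requests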
Another option is adding more proxy instances to the pool behind the L4 network load balancer and setting the connection limit for each instance to a reasonable value.
Each proxy then carries only a share of the load without a problem. If you need to handle periodic bursts in load, you may also want to apply an autoscaling strategy to the proxy pool.
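Whichever option you pick, it is worth confirming that ephemeral ports are actually the bottleneck. A rough, Linux-only diagnostic sketch (it only assumes /proc is readable) that reports the configured ephemeral range and how many sockets are sitting in TIME_WAIT:

def ephemeral_port_range():
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        low, high = map(int, f.read().split())
    return low, high

def time_wait_count():
    count = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                next(f)                          # skip the header line
                for line in f:
                    if line.split()[3] == "06":  # state code 06 == TIME_WAIT
                        count += 1
        except FileNotFoundError:
            pass
    return count

low, high = ephemeral_port_range()
print("ephemeral range:", low, "-", high, "(", high - low + 1, "ports )")
print("sockets in TIME_WAIT:", time_wait_count())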

Related

TCP or UDP for lots of connections?

I want to create a P2P network with the following characteristics:
low latency is not really important
losing packets is okay
the nodes would only send tiny amounts of data around
there will be no NAT/firewall issues, every node has an open port on its public ip
every node is connected to every other node
Usually I would use TCP for anything not time-critical, but the last requirement causes the nodes to have lots of open connections for a long time. If I remember correctly, using TCP to connect to 1000 servers would mean I had to use 1000 ports to handle these connections. UDP, on the other hand, would only require a single port for each node.
So my question is: Is TCP able to handle the above requirements in a network with e.g. 1000 nodes without tweaking the system? Would UDP be better suited in this case? Is there anything else that would be a deal-breaker for either protocol?
With UDP you control the "connection state", and it is pretty much the best way to do anything peer-to-peer related IF you have a high number of nodes or care about bandwidth, memory and CPU overhead. By moving all control over each node's "connection state" into your application, you minimize the amount of wasted resources by making it fit your needs exactly.
You will bypass a lot of operating-system-specific weirdness that limits the effectiveness of TCP with high numbers of connections. There is TIME_WAIT bloat and tens to hundreds of OS-specific settings which will need tweaking for every user of your P2P app if it needs those high numbers. A test app I made, which could use either UDP with ACKs or TCP, showed only a 10% difference in UDP performance regardless of operating system. TCP performance was always lower than the best UDP, and it varied wildly, by over 600%, depending upon the OS. With tweaks you can make most OSes perform roughly the same using TCP, but by default most are not properly tweaked.
So in my opinion it is harder to make a reliable UDP P2P network than a TCP one, but it is often needed. However, I would only advise that route if you are quite experienced with networking, as there are a lot of "gotchas" to deal with. There are libraries which help with this, like RakNet or ENet. They provide ways to do reliable UDP, but it still takes more networking knowledge to know how this all ties together, whereas with TCP it is mostly hidden from you.
In a peer-to-peer network you often have messages like NODE PINGs where you may not care whether each one is always received; you just care whether you have received one recently. For example, you may send a ping every 10 seconds and disconnect the node after 60 seconds of no ping. That means 6 ping packets in a row would need to fail, which is highly unlikely unless the node is really down. If you received even one ping in that 60-second period, the node is still active. A TCP implementation of this would involve more latency and bandwidth, as it makes sure EACH ping message gets through and will block any other data going out until it does. And since you cannot rely on TCP to reliably tell you if a connection is dead, you are forced to add similar PING features for TCP anyway, on top of all the other things TCP is already doing extra with your packets.
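A minimal sketch of that ping scheme, assuming each peer runs the same loop on UDP port 9999 (the peer addresses are hypothetical placeholders):

import socket, time

PEERS = {("10.0.0.2", 9999), ("10.0.0.3", 9999)}   # hypothetical peer list
PING_INTERVAL, PEER_TIMEOUT = 10, 60

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))
sock.settimeout(1.0)

last_seen = {peer: time.monotonic() for peer in PEERS}
next_ping = 0.0

while True:
    now = time.monotonic()
    if now >= next_ping:
        for peer in PEERS:
            sock.sendto(b"PING", peer)             # fire-and-forget, no retries
        next_ping = now + PING_INTERVAL
    try:
        data, addr = sock.recvfrom(1500)
        if data == b"PING" and addr in PEERS:
            last_seen[addr] = now                  # any single ping keeps the peer alive
    except socket.timeout:
        pass
    for peer, seen in list(last_seen.items()):
        if now - seen > PEER_TIMEOUT:
            print("peer considered down:", peer)
            last_seen.pop(peer)                    # 6 pings in a row were lost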
Games also often have data where, if it is not received by a client, it is no big deal, because more packets are coming in a few milliseconds which will supersede any missed packets. For example, a player is moving from A to Z over a time span of 1 second, and his client sends out a packet roughly every 40 milliseconds: ABCDEFG__I__KLMNOPQRSTUVWXYZ. Do we really care if we miss "H" and "J" when we are receiving updates every 40 ms? Not really; this is where prediction can come into it, but that is usually not relevant to most P2P projects. If that were TCP instead of UDP, it would increase bandwidth requirements and add latency to the rest of the packets being received, since the data is resent until it arrives, on top of the extra latency it already adds by ACKing everything.
Essentially you can lower latency and network overhead for many messages in a peer-to-peer network using UDP. However, there will always be some messages which NEED to be sent reliably, and that requires you to implement some reliable way to get packets to that node, similar to what TCP does. This is where you need some level of expertise if you want a reliable peer-to-peer network. Some things to look into include sequencing packets with a number, message ACKs, etc.
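The sequencing/ACK building block can be as small as this sketch (the peer address is a hypothetical placeholder; a real implementation would add backoff, windows and duplicate detection):

import socket, struct, time

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer = ("10.0.0.2", 6000)                  # hypothetical peer
unacked = {}                               # seq -> (payload, time last sent)

def send_reliable(seq, payload):
    sock.sendto(struct.pack("!I", seq) + payload, peer)
    unacked[seq] = (payload, time.monotonic())

def handle_ack(ack_seq):
    unacked.pop(ack_seq, None)             # peer confirmed receipt, stop resending

def resend_stale(timeout=0.5):
    now = time.monotonic()
    for seq, (payload, sent) in list(unacked.items()):
        if now - sent > timeout:
            send_reliable(seq, payload)    # naive retransmit, no backoff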
If you care a lot about efficiency or really need tens of thousands of connections, then implementing your specific protocol in UDP will always be better than TCP. But there are cases to be made for TCP, like if the time to make the project matters or if you are new to network programming.
If I remember correctly, using TCP to connect to 1000 servers would mean I had to use 1000 ports to handle these connections.
You remember wrong.
Take a web server which is listening on port 80 and can handle thousands of connections at the same time on this single port. This is because a connection is defined by the tuple {client-ip, client-port, server-ip, server-port}. While server-ip and server-port are the same for all connections to this server, the client-ip and client-port are not. Even if the client-ip is the same (i.e. the same client), the client would pick a different source port.
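A quick way to see this in a Python sketch: two connections from the same client machine to the same listening port are kept apart by their different client ports.

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))              # let the OS pick a free server port
server.listen()
host, port = server.getsockname()

clients = [socket.create_connection((host, port)) for _ in range(2)]
for _ in clients:
    conn, peer = server.accept()
    print("connection from", peer)         # same client IP, two different client ports
    conn.close()

for c in clients:
    c.close()
server.close()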
... with e.g. 1000 nodes without tweaking the system?
This depends on the system, since each open connection needs to preserve state and thus needs memory. This might be a problem for embedded systems with little memory.
In any case: if your protocol just sends small messages, and if packet loss, reordering or duplication are acceptable, then UDP might be the better choice because the overhead (connection setup, ACKs, ...) is smaller and it takes less memory. You could also use a single socket to exchange data with all 1000 nodes, whereas with TCP you would need a separate socket for each connection (a socket is not the same as a port!). Using only a single socket might also allow for a simpler application design.
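A sketch of the single-socket approach (the peer addresses are hypothetical placeholders; in practice they would come from some discovery step):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5000))

peers = [("10.0.0.2", 5000), ("10.0.0.3", 5000)]   # hypothetical peer list

for peer in peers:
    sock.sendto(b"hello", peer)            # no connection setup, no extra socket

while True:                                # blocks until a datagram arrives
    data, addr = sock.recvfrom(1500)       # addr tells you which peer sent it
    print(len(data), "bytes from", addr)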
I want to amend the answer by Steffen with a few points:
1000 connections are nothing for any normal computer and OS.
UDP fits your requirements. It might be easier to program because it is message oriented. TCP provides a stream of bytes; you need to layer a messaging protocol on top of that, which is not that easy (see the framing sketch after this list). Also, you need to handle broken TCP connections by reconnecting.
Ports are not scarce. No problem with consuming 1000 ports.
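Here is the framing sketch referred to above: a minimal length-prefixed message layer on top of a TCP socket (4-byte big-endian length, then the payload), which is roughly what UDP datagrams give you for free.

import struct

def send_msg(sock, payload):
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)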

Multiple service connections vs internal routing in MMO

The server consists of several services with which a user interacts: profiles, game logics, physics.
I heard that it's a bad practice to have multiple client connections to the same server.
I'm not sure whether I will use UDP or TCP.
The services are realtime; they should reply as fast as possible, so I don't want to introduce any additional rerouting unless there are really important reasons. So are there any reasons to reroute traffic through one external endpoint service to specific internal services in my case?
This seems to be multiple questions in one package. I will try to answer the ones I can identify as separate...
UDP vs TCP: You're saying "real-time", which usually means UDP is the right choice. However, that means having to deal with lost packets and possible re-ordering of packets. But using UDP leaves a couple of possible delay-decreasing tricks open.
Multiple connections from a single client to a single server: This consumes resources (end-points, as it were) on both the client (probably ignorable) and on the server (possibly a problem, possibly ignorable). The advantage of using separate connections for separate concerns (profiles, physics, ...) is that when you need to separate these onto separate servers (or server farms), you don't need to update the clients, they just need to connect to other end-points, using code that's already tested.
"Re-router" (or "load balancer") needed: Probably not going to be an issue initially. However, it will probably become an issue later. Depending on your overall design and server OS, using UDP may actually become an asset here. UDP packet arrives at the load balancer, dispatched to the right backend and that could then in theory send back a reply with the source IP of the load balancer.
An alternative would be to have a "session broker". The client makes an initial connection to a well-known endpoint, says "I am a client, tell me where my profile, physics, what-have-you servers are", the broker considers the current load, possibly the location of the client and other things that may make sense, and the client then connects to the relevant backends on its own. The downside is that it's harder (not impossible, but harder) to silently migrate an ongoing session to a new backend; when there's a load balancer in the way, that can be done essentially transparently.
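A minimal sketch of that broker handshake, assuming a hypothetical well-known broker endpoint and a one-line JSON reply mapping service names to addresses:

import json, socket

BROKER = ("broker.example.internal", 7000)         # hypothetical well-known endpoint

def request_backends(client_id):
    with socket.create_connection(BROKER, timeout=5) as s:
        s.sendall(json.dumps({"hello": client_id}).encode() + b"\n")
        reply = s.makefile().readline()            # e.g. {"profile": "10.0.1.5:7100", ...}
    return json.loads(reply)

backends = request_backends("client-42")
profile_host, profile_port = backends["profile"].split(":")
profile_conn = socket.create_connection((profile_host, int(profile_port)))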

HA-Proxy balancing by source doesn't appear consistent

Using HA-Proxy 1.4.18, I am using balance source as the option to balance a TCP stream to 2 servers. However, from an admittedly very small sample set of connections, it appears that they all just go to one server - the server listed first in the haproxy config.
listen videos *:1935
balance source
mode tcp
server server1 192.168.0.1:1935
server server2 192.168.0.2:1935
I have not seen it split the load onto the 2 boxes. This does work when I use balance roundrobin; however, for this particular application I cannot use that method.
Any ideas for an otherwise persistent session loadbalanced between these 2 machines from the clients?
Cheers
How did you test the balancing? The doc says:
The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request. This ensures that the same client IP address will always reach the same server as long as no server goes down or up. If the hash result changes due to the number of running servers changing, many clients will be directed to a different server. This algorithm is generally used in TCP mode where no cookie may be inserted. It may also be used on the Internet to provide a best-effort stickiness to clients which refuse session cookies. This algorithm is static by default, which means that changing a server's weight on the fly will have no effect, but this can be changed using "hash-type".
If you tested with just 2 different source IPs, you may have hit a particular case.
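To illustrate (this is a toy hash, not the one HAProxy actually uses): with only a handful of source IPs it is entirely possible for all of them to land on the same server.

import zlib

servers = ["server1", "server2"]

def pick_server(client_ip):
    # hash the source address, then map it onto the server list
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]

for ip in ("203.0.113.10", "203.0.113.11", "203.0.113.12", "203.0.113.13"):
    print(ip, "->", pick_server(ip))

Test with a wider spread of source addresses before concluding the balancing is broken.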

Performance considerations of a large number of connections on the same port

What are the performance considerations that one should take into account when designing a server application that listens on one port? I know that it is possible for many thousands of clients to connect to a server on a single port, but is performance negatively affected by having the server application accept all incoming requests on a single port?
Is my understanding correct that this model (server listening on one port which handles incoming connections, and then responds back over the outbound connection that was created when the client connection was established) is the way databases/webservers etc work?
Regards,
Brian
It does not matter if the server listens on one port or multiple ports. The server still has to establish a full socket connection for each client it accepts. The OS still has to route inbound packets to the correct socket endpoints, and sockets are uniquely identified by the combination of IP/port pairs of both endpoints, so there are no performance issues whether or not the server endpoints use different ports.
Any performance issues are going to be in the way the server's code handles those socket connections. If it only listens on one port and accepts clients on that port using a simple accept() loop in a single thread, then the rate at which it can accept clients is limited by that loop. Typically, servers spawn worker threads for each accepted client, which has performance overhead of its own if thread pooling is not used. If the server needs to handle a lot of clients simultaneously, then it should use overlapped I/O or I/O completion ports to handle the connections more efficiently.
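Overlapped I/O and I/O completion ports are Windows-specific mechanisms; as a rough, portable illustration of the same idea (readiness-based multiplexing instead of one thread per client), here is a sketch using Python's selectors module:

import selectors, socket

sel = selectors.DefaultSelector()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 8080))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

while True:
    for key, _ in sel.select():
        sock = key.fileobj
        if sock is listener:
            conn, _addr = listener.accept()        # new client, no new thread
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if data:
                sock.send(data)                    # best-effort echo; a real server would buffer unsent bytes
            else:
                sel.unregister(sock)               # client disconnected
                sock.close()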

Scalability of TCP keep-alive

Consider a large-scale, heterogeneous network of various devices. These devices provide services to others on the network in a peer-to-peer fashion. The mechanism used to track service availability across all nodes currently uses TCP sockets marked as keep-alive, usually for the duration the node is online. This leads to every node having a socket open with every other node (within a subnet of the peer-to-peer infrastructure).
What arguments exist regarding the scalability of using TCP keep-alive in this way?
My alternative approach is to use a publish/subscribe model, where nodes push new services to the network as they become available, and their peers cache them for when they want to subscribe to a service. Does this sound feasible?
I gather from what you wrote that the communication is strictly point-to-point, with considerable duration ('leases'). If this is true, it means you will gain nothing from a publish/subscribe model. If it is not true, then yes, you should (could) change the network model to match the communications, and your idea is sound.
Regarding your second question: since TCP sockets and keep-alive are just concepts, there is no (or only a very small) intrinsic cost to having such a keep-alive contract. In practice YMMV, since different socket implementations require different resources, and other actions might be required to keep the channel open. There are, however, many implementations which require very little resources for open sockets (select()-type, for example).
A discovery service (publish/subscribe of services) makes most sense if there are many implementers of the same type of service, and you cannot (or do not want to) predict statically where they will appear.
In short, I would say that you should only change the design if the type of communication that you have fits the current architecture badly. Your idea certainly sounds very feasible, but more information about the communication patterns would be necessary to make an estimation of the outcome.
Yeah, using keep-alive seems like a bad idea for any P2P network. Not only would I keep connections open only while data is being transferred, I would also keep node-state updates on a different socket altogether so as not to interfere with file transmissions.
If your TCP keep-alive mechanism is being used only for tracking service availability (meaning you never communicate service requests/responses across these connections), the use of TCP sockets is definitely overkill. TCP sockets do take significant resources.
A more scalable method could be using a publish/subscribe model that uses UDP publish messages at regular intervals to advertise continued existence of the service. You could also use a service-down message published from a disconnecting node to gracefully declare end of service.
Going further, if you want to be optimal with really large-scale networks and are ready to put in some time and effort, consider a structured P2P mechanism like a DHT.