Scalability of TCP keep-alive sockets

Consider a large-scale, heterogeneous network of various devices. These devices provide services to others on the network in a peer-to-peer fashion. The mechanism used to track service availability across all nodes currently uses TCP sockets marked as keep-alive, usually for the duration the node is online. This leads to every node having a socket open with every other node (within a subnet of the peer-to-peer infrastructure).
What arguments exist regarding the scalability of using TCP keep-alive in this way?
My alternative approach is to use a publish/subscribe model, where nodes push new services to the network as they become available, and their peers cache them for when they want to subscribe to a service. Does this sound feasible?

I interpret from what you wrote that the communication is strictly point-to-point, with considerable duration ('leases'). If this is true, you will gain nothing from a publish/subscribe model. If it is not true, then yes, you should (or could) change the network model to match the communications, and your idea is sound.
Regarding your second question: since TCP sockets and keep-alive are just concepts, there is no (or only a very small) intrinsic cost to having such a keep-alive contract. In practice YMMV, since different socket implementations require different resources, and other actions might be required to keep the channel open. There are, however, many implementations that require very little resources per open socket (select()-type, for example).
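To make the "very small intrinsic cost" concrete, here is a minimal sketch of what marking a socket keep-alive involves (Python, assuming the Linux-specific TCP_KEEP* options; the peer address is made up). Once the options are set, the kernel sends the periodic probes on an otherwise idle connection and the application does nothing further.

```python
import socket

# Minimal sketch: enable TCP keep-alive on a connected socket.
# The TCP_KEEP* options are Linux-specific; other platforms expose
# different knobs (or only system-wide settings).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("peer.example.org", 9000))  # hypothetical peer

sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before the socket errors out
```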
A discovery service (publish/subscribe of services) makes most sense if there are many implementers of the same type of service, and you cannot (or do not want to) predict statically where they will appear.
In short, I would say that you should only change the design if the type of communication you have fits the current architecture badly. Your idea certainly sounds very feasible, but more information about the communication patterns would be necessary to estimate the outcome.

Yeah, using keep-alive seems like a bad idea for any P2P network. Not only would I keep connections open only while data is being transferred, I would also keep node state updates on different sockets altogether, so as not to interfere with file transmissions.

If your TCP keep-alive mechanism is being used only for tracking service availability (meaning you never communicate service requests/responses across these connections), the use of TCP sockets is definitely overkill. TCP sockets do take significant resources.
A more scalable method could be a publish/subscribe model that uses UDP publish messages at regular intervals to advertise the continued existence of the service. You could also use a service-down message, published from a disconnecting node, to gracefully declare the end of service.
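A minimal sketch of that idea (Python over UDP broadcast, with an invented port and message format): the node periodically advertises the services it offers and publishes an explicit withdrawal when it goes away.

```python
import json
import socket
import time

BROADCAST_ADDR = ("255.255.255.255", 50000)  # hypothetical advertisement port
ADVERTISE_INTERVAL = 10                      # seconds between "still here" messages

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

def advertise(node_id, services):
    """Publish a 'service up' message listing the services this node offers."""
    msg = json.dumps({"node": node_id, "event": "up", "services": services})
    sock.sendto(msg.encode(), BROADCAST_ADDR)

def withdraw(node_id):
    """Publish a 'service down' message on graceful shutdown."""
    msg = json.dumps({"node": node_id, "event": "down"})
    sock.sendto(msg.encode(), BROADCAST_ADDR)

try:
    while True:
        advertise("node-42", ["file-share", "printing"])
        time.sleep(ADVERTISE_INTERVAL)
finally:
    withdraw("node-42")
```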
Going further, if you mean to get optimal with really large-scale networks and are ready to put in some time and effort, consider a structured P2P mechanism like a DHT.

Related

Does listen() backlog affect established TCP connections?

Would it be naive to create a TCP socket with a listen backlog set to minimum as a way of rate limiting new incoming connections? The server workload in question doesn't expect many new connections at any time but spends a lot of time servicing long open persistent connections. It appears that new incoming connections shouldn't affect established connections, though I've been unable to find any definitive answer in any text. Is it possible for failed new incoming connections to create some kind of TCP traffic congestion on the server with the packets it's receiving or are they dropped fast enough that it has no effect on any buffers or other part of the network stack?
Specifically the platform in use is Linux, and although it may be handled differently in different OSs, I expect them to all behave roughly the same.
EDIT: What I mean by "the same" is that the backlog doesn't affect established connections, though I do understand that Linux discards excess connection attempts while Windows sends a reset.
Does listen() backlog affect established TCP connections?
It affects established connections that the server hasn't accepted yet via accept(), only in the sense that it limits the number of such connections that can exist.
Would it be naive to create a TCP socket with a listen backlog set to minimum as a way of rate limiting new incoming connections?
All it would accomplish would be to unnecessarily fail some connecting clients. They won't get any service until your server gets around to them anyway, and once the backlog queue fills, further clients are rate-limited by your service code in any case. There is no particular reason why shortening the queue would have any beneficial effect. The other problem with the idea is that it isn't readily possible to determine what the minimum actually is, or whether you succeeded in setting it as the backlog queue length.
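For reference, this is where the backlog value appears in a sockets program (a Python sketch); on Linux the value you pass is only a hint, capped by net.core.somaxconn and possibly rounded by the kernel.

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))

# The backlog argument only limits *not-yet-accepted* connections.
# Passing 1 here does not rate-limit your service: clients queue up (or are
# ignored/reset, depending on the platform) until accept() drains the queue.
srv.listen(1)

while True:
    conn, addr = srv.accept()  # connections already accepted are unaffected by the backlog
    conn.sendall(b"hello\n")   # stand-in for the long-lived persistent service
    conn.close()
```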
It appears that new incoming connections shouldn't affect established connections, though I've been unable to find any definitive answer in any text.
That is correct. There is no reason why they should be affected: that's why you won't find it written down anywhere, any more than you'll find it written down that the phase of the moon doesn't affect them.
Is it possible for failed new incoming connections to create some kind of TCP traffic congestion on the server with the packets it's receiving
No.
or are they dropped fast enough that it has no effect on any buffers or other part of the network stack?
They're not dropped. They simply aren't even created if they won't fit on the backlog queue. Ergo their resource consumption at the server is zero.
Specifically the platform in use is Linux, and although it may be handled differently in different OSs, I expect them to all behave roughly the same.
They don't. On Windows, an incoming connection when the backlog queue is full causes an RST to be issued. On other platforms it is simply ignored.
What you describe are several types of attacks, such as flooding, SYN attacks and other goodies, resulting in denial of service.
This topic is not easy, because protection has to be implemented at all the layers, including TCP; consider, for instance, a SYN attack or fiddling with sequence numbers. By the time such a packet has been identified, it has already come a long way through the Ethernet and IP layers, so the bottom line is that it is consuming resources. If your system is under attack, the attacking packets are in your data stream just like the good ones, so the faster you can detect that a packet is faulty and drop it, the better. Usually a system that is under attack will be slower; at least the systems that I have worked with were.
Some attacks try to bring your system into a faulty state permanently by exploiting bugs. For instance, TCP has a receive queue; if packets constantly arrive out of order, they are stored in that receive queue. If the missing packet never arrives, this receive queue could keep growing and growing. Without the proper defense, this would lead to the system running completely out of resources.
There are specialised tools (Codenomicon, for instance) to check the vulnerability of a TCP stack implementation. You can assume that the one in Linux has been properly tested using similar tools.
An attack can also occur at the application layer. Suppose you have a TCP server that allows only a limited number of sessions: a malicious user can take all the connections simply by establishing them and then doing nothing with them, so you have to create some defense there as well. Whether you set this limit very low or very high does not change a thing: a malicious user will try anything to bring your system down, and you need to build in defenses anyway. You can connect to a web server (HTTP) simply using telnet; if you don't send anything, the server's defense will come into play and close the connection.
So bringing the number of possible connections down to a low value and thinking that this in itself is a form of protection is indeed naive.
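To illustrate the kind of application-layer defense meant above, here is a minimal sketch (Python asyncio, with an arbitrary 30-second idle timeout, not a prescription) of a server that drops clients which connect and then never send anything.

```python
import asyncio

IDLE_TIMEOUT = 30  # seconds a client may stay silent before we drop it (arbitrary choice)

async def handle(reader, writer):
    try:
        while True:
            # If the client sends nothing for IDLE_TIMEOUT seconds, give up on it.
            data = await asyncio.wait_for(reader.read(4096), timeout=IDLE_TIMEOUT)
            if not data:          # client closed the connection
                break
            writer.write(data)    # echo, standing in for real request handling
            await writer.drain()
    except asyncio.TimeoutError:
        pass                      # idle client: fall through and close
    finally:
        writer.close()
        await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```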
Is it possible for failed new incoming connections to create some kind of TCP traffic congestion on the server with the packets it's receiving or are they dropped fast enough that it has no effect on any buffers or other part of the network stack?
They consume resources on your machine and will make your system run slower.
It appears that new incoming connections shouldn't affect established connections, though I've been unable to find any definitive answer in any text.
If it is a normal user trying to establish a connection, even one doing so continuously and retrying upon failure, the influence will be minimal, close to nothing. But a malicious user flooding connection attempts will have an influence on system performance, because the system has to spend time identifying those flawed packets and dropping them as soon as possible.

Multiple service connections vs internal routing in MMO

The server consists of several services with which a user interacts: profiles, game logic, physics.
I heard that it's a bad practice to have multiple client connections to the same server.
I'm not sure whether I will use UDP or TCP.
The services are realtime; they should reply as fast as possible, so I don't want to include any additional rerouting unless there are really important reasons. So are there any reasons to reroute traffic through one external endpoint service to the specific internal services in my case?
This seems to be multiple questions in one package. I will try to answer the ones I can identify as separate...
UDP vs TCP: You're saying "real-time", which usually means UDP is the right choice. However, that means having to deal with lost packets and possible re-ordering of packets. On the other hand, using UDP leaves a couple of possible delay-decreasing tricks open.
Multiple connections from a single client to a single server: This consumes resources (end-points, as it were) on both the client (probably ignorable) and on the server (possibly a problem, possibly ignorable). The advantage of using separate connections for separate concerns (profiles, physics, ...) is that when you need to separate these onto separate servers (or server farms), you don't need to update the clients, they just need to connect to other end-points, using code that's already tested.
"Re-router" (or "load balancer") needed: Probably not going to be an issue initially. However, it will probably become an issue later. Depending on your overall design and server OS, using UDP may actually become an asset here. UDP packet arrives at the load balancer, dispatched to the right backend and that could then in theory send back a reply with the source IP of the load balancer.
An alternative would be to have a "session broker". The client makes an initial connection to a well-known endpoint and says "I am a client, tell me where my profile, physics, what-have-you servers are"; the broker considers the current load, possibly the location of the client and other things that may make sense, and the client then connects to the relevant backends on its own. The downside of this is that it's harder (not impossible, but harder) to silently migrate an ongoing session to a new backend; when there's a load balancer in the way, this can be done essentially transparently.
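A rough sketch of the session-broker idea (Python, with invented service names, addresses and a naive least-loaded pick): the client asks one well-known endpoint for its plan and then connects to the returned backends directly.

```python
# Hypothetical backend inventory: addresses and current load figures are assumptions.
BACKENDS = {
    "profiles": [("10.0.1.10", 7001, 0.3), ("10.0.1.11", 7001, 0.7)],
    "physics":  [("10.0.2.10", 7002, 0.5), ("10.0.2.11", 7002, 0.2)],
}

def broker_assign(client_addr):
    """Pick the least-loaded backend for each service and return the plan."""
    plan = {}
    for service, nodes in BACKENDS.items():
        host, port, _load = min(nodes, key=lambda n: n[2])
        plan[service] = {"host": host, "port": port}
    return plan

# The client would call this once via the well-known endpoint, then connect
# to each returned (host, port) on its own.
print(broker_assign(("203.0.113.5", 53211)))
```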

Is there a performance difference between pooling connections or channels in rabbitmq?

I'm a newbie with RabbitMQ (and programming), so sorry in advance if this is obvious. I am creating a pool to share between threads that are working on a queue, but I'm not sure if I should use connections or channels in the pool.
I know I need channels to do the actual work, but is there a performance benefit of having one channel per connection (in terms of more throughput from the queue)? Or am I better off just using a single connection per application and pooling many channels?
Note: because I'm pooling the resources, the initial cost is not a factor; I know connections are more expensive than channels. I'm more interested in throughput.
I found this on the RabbitMQ website; it is near the bottom, so I have quoted the relevant part below.
The tl;dr version is that you should have 1 connection per application and 1 channel per thread.
Connections
AMQP connections are typically long-lived. AMQP is an application level protocol that uses TCP for reliable delivery. AMQP connections use authentication and can be protected using TLS (SSL). When an application no longer needs to be connected to an AMQP broker, it should gracefully close the AMQP connection instead of abruptly closing the underlying TCP connection.
Channels
Some applications need multiple connections to an AMQP broker. However, it is undesirable to keep many TCP connections open at the same time because doing so consumes system resources and makes it more difficult to configure firewalls. AMQP 0-9-1 connections are multiplexed with channels that can be thought of as "lightweight connections that share a single TCP connection".
For applications that use multiple threads/processes for processing, it is very common to open a new channel per thread/process and not share channels between them.
Communication on a particular channel is completely separate from communication on another channel, therefore every AMQP method also carries a channel number that clients use to figure out which channel the method is for (and thus, which event handler needs to be invoked, for example).
It is advised to use 1 channel per thread. Even though channels are thread-safe, so you could have multiple threads sending through one channel, for your application I would suggest that you stick with 1 channel per thread.
Additionally it is advised to only have 1 consumer per channel.
These are only guidelines so you will have to do some testing to see what works best for you.
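As a concrete illustration of the "one connection per application, separate channels per concern" guideline, here is a minimal sketch using the Python pika client (an assumption on my part; your client library may differ, and note that pika's BlockingConnection is itself not thread-safe, so with that particular client each worker usually gets its own channel created up front, or its own connection).

```python
import pika

# One long-lived connection for the whole application.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))

publish_channel = connection.channel()   # channel dedicated to publishing
consume_channel = connection.channel()   # separate channel dedicated to consuming

consume_channel.queue_declare(queue="work", durable=True)

def handle(ch, method, properties, body):
    # Process the message, then acknowledge on the same channel it arrived on.
    ch.basic_ack(delivery_tag=method.delivery_tag)

consume_channel.basic_consume(queue="work", on_message_callback=handle)
publish_channel.basic_publish(exchange="", routing_key="work", body=b"hello")
consume_channel.start_consuming()
```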
This thread has some insights here and here.
Despite all these guidelines, this post suggests that having multiple connections will most likely not affect performance, though it is not specific about whether it is talking about the client side or the server (RabbitMQ) side; the one clear point is that more connections will of course use more system resources. If that is not a problem and you wish to have more throughput, it may indeed be better to have multiple connections, as this post suggests multiple connections will allow you more throughput. The reason seems to be that even if there are multiple channels, only one message goes through the connection at a time, so a large message will block the whole connection, and many unimportant messages on one channel may block an important message on the same connection but a different channel. Again, resources are an issue: if you are using up all the bandwidth with one connection, then adding an additional connection will not increase performance over having two channels on the one connection. Each connection will also use more memory, CPU and file handles, which may well not be a concern, but might become an issue when scaling.
In addition to the accepted answer:
If you have a cluster of RabbitMQ nodes with either a load-balancer in front, or a short-lived DNS (making it possible to connect to a different rabbit node each time), then a single, long-lived connection would mean that one application node works exclusively with a single RabbitMQ node. This may lead to one RabbitMQ node being more heavily utilized than the others.
The other concern mentioned above is that publishing and consuming are blocking operations, which leads to messages queueing up. Having more connections helps ensure that (1) the processing time of each message doesn't block other messages and (2) big messages aren't blocking other messages.
That's why it's worth considering a small connection pool (bearing in mind the resource concerns raised above).
The "one channel per thread" might be a safe assumption (I say might as I have not made any research by myself and I have no reason to doubt the documentation :) ) but beware that there is a case where this breaks:
If you you use RPC with RabbitMQ Direct reply-to then you cannot reuse the same channel to consume for another RPC request. I asked for details about that in the google user group and the answer I got from Michael Klishin (who seems to be actively involved in RabbitMQ development) was that
Direct Reply to is not meant to be used with channel sharing either way.
I've emailed Pivotal to ask them to update their documentation to explain how amq.rabbitmq.reply-to works under the hood, and I'm still waiting for an answer (or an update).
So if you want to stick to "one channel per thread", beware that it will not work well with Direct reply-to.

Heartbeat Protocols/Algorithms or best practices

Recently I've added some load-balancing capabilities to a piece of software that I wrote. It is a networked application that does some data crunching based on input coming from a SQL database. Since the crunching can be pretty intensive I've added the capability to have multiple instances of this application running on different servers to split the load but as it is now the load balancing is a manual act. A user must specify which instances take which portion of the input domain.
I would like to take that to the next level and program the instances to automatically negotiate the dividing up of the input data, and to recognize if one of them "disappears" (has crashed or has been powered down) so that the remaining instances can take on the failed instance's workload.
In order to implement this I'm considering using a simple heartbeat protocol between the instances to determine who's online and who isn't and while this is not terribly complicated I'd like to know if there are any established heartbeat network protocols (based on UDP, TCP or both).
Obviously this happens a lot in the networking world with clustering, fail-over and high-availability technologies so I guess in the end I'd like to know if maybe there are any established protocols or algorithms that I should be aware of or implement.
EDIT
It seems, based on the answers, that either there are no well-established heartbeat protocols or nobody knows about them (which would imply that they aren't so well established after all), in which case I'm just going to roll my own.
While none of the answers offered what I was looking for specifically I'm going to vote for Matt Davis's answer since it was the closest and he pointed out a good idea to use multicast.
Thank you all for your time~
Distributed Interactive Simulation (DIS), which is defined under IEEE Standard 1278, uses a default heartbeat of 5 seconds via UDP broadcast. A DIS heartbeat is essentially an Entity State PDU, which fully defines the state, including the position, of the given entity. Due to its application within the simulation community, DIS also uses a concept referred to as dead-reckoning to provide higher frequency heartbeats when the actual position, for example, is outside a given threshold of its predicted position.
In your case, a DIS Entity State PDU would be overkill. I only mention it to make note of the fact that heartbeats can vary in frequency depending on the circumstances. I don't know that you'd need something like this for the application you described, but you never know.
For heartbeats, use UDP, not TCP. A heartbeat is, by nature, a connectionless contrivance, so it goes that UDP (connectionless) is more relevant here than TCP (connection-oriented).
The thing to keep in mind about UDP broadcasts is that a broadcast message is confined to the broadcast domain. In short, if you have computers that are separated by a layer 3 device, e.g., a router, then broadcasts are not going to work because the router will not transmit broadcast messages from one broadcast domain to another. In this case, I would recommend using multicast since it will span the broadcast domains, providing the time-to-live (TTL) value is set high enough. It's also a more automated approach than directed unicast, which would require the sender to know the IP address of the receiver in order to send the message.
Broadcast a heartbeat every t using UDP; if you haven't heard from a machine in more than k*t, then it's assumed down. Be careful that the aggregate bandwidth used isn't a drain on resources. You can use IP broadcast addresses, or keep a list of specific IPs you're doing work for.
Make sure the heartbeat includes a "reboot count" as well as "machine ID" so that you know previous server state isn't around.
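A bare-bones version of that scheme (Python; the multicast group, port and timing constants are arbitrary choices): each node announces itself every T seconds, and a peer that has been silent for more than K*T seconds is assumed down. The message carries the machine ID and reboot count, as suggested above.

```python
import json
import socket
import struct
import time

GROUP, PORT = "224.0.0.251", 50001   # arbitrary multicast group/port
T, K = 5, 3                          # heartbeat period and miss tolerance

def send_heartbeats(machine_id, reboot_count):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    while True:
        msg = json.dumps({"id": machine_id, "boot": reboot_count, "ts": time.time()})
        sock.sendto(msg.encode(), (GROUP, PORT))
        time.sleep(T)

def watch_peers():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(T)
    last_seen = {}
    while True:
        try:
            data, _addr = sock.recvfrom(4096)
            beat = json.loads(data)
            last_seen[beat["id"]] = time.time()
        except socket.timeout:
            pass
        # Anything silent for more than K*T seconds is assumed down.
        for machine in [m for m, ts in last_seen.items() if time.time() - ts > K * T]:
            print(f"{machine} assumed down")
            del last_seen[machine]
```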
I'd recommend using MapReduce if it fits. It would save a lot of work.
I'm not sure this will answer the question, but you might be interested in how WebLogic Server clustering works under the hood. From the book Mastering BEA WebLogic Server:
[...] WebLogic Server clustering provides a loose coupling of the servers in the cluster. Each server in the cluster is independent and does not rely on any other server for any fundamental operations. Even if contact with every other server is lost, each server will continue to run and be able to process the requests it receives. Each server in the cluster maintains its own list of other servers in the cluster through periodic heartbeat messages. Every 10 seconds, each server sends a heartbeat message to the other servers in the cluster to let them know it is still alive. Heartbeat messages are sent using IP multicast technology built into the JVM, making this mechanism efficient and scalable as the number of servers in the cluster gets large. Each server receives these heartbeat messages from other servers and uses them to maintain its current cluster membership list. If a server misses receiving three heartbeat messages in a row from any other server, it takes that server out of its membership list until it receives another heartbeat message from that server. This heartbeat technology allows servers to be dynamically added and dropped from the cluster with no impact on the existing servers’ configurations.
Cisco content switches are a hardware solution for this problem. They implement a virtual IP address as a front end to multiple real servers, whose real IP addresses are known to the switch. The switch periodically sends HTTP HEAD requests to the web servers, to verify they are still running (which the switch software calls a "keepalive", although this doesn't keep the server itself alive). The Cisco switch accepts traffic on the virtual IP and forwards it to the actual web servers, using configurable load balancing such as round-robin, or user-defined load balancing.
These switches retail in the $3-10K range, although my business partner picked one up on eBay for about $300 a year ago. If you can afford one, they do represent a proven hardware solution to the question of how to have a service spread transparently across multiple servers. Redhat includes a built-in port configuration so that you could implement your own Cisco switch using a cheap RedHat box. Google for "virtual ip address" and "cisco content router" for more information.
In addition to trying hardware load-balancers, you can also try a free-open-source load-balancing software application such as HAProxy, available for Linux and the BSDs.

Can socket connections be multiplexed?

Is it possible to multiplex a socket connection?
I need to establish multiple connections to Yahoo Messenger and I am looking for a way to do this efficiently without having to hold a socket open for each client connection.
So far I have to use one socket for each client, and this does not scale well above 50,000 connections.
Oh, and my solution is for a telco, so I need to hit at least 250,000 to 500,000 connections.
I'm planning to bind multiple IP addresses to a single NIC to beat the 65k port restriction per IP address.
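(For reference, spreading outbound connections across several local addresses comes down to binding each socket to a chosen source IP before connecting; a minimal Python sketch with invented addresses follows.)

```python
import socket

LOCAL_ADDRESSES = ["10.0.0.2", "10.0.0.3", "10.0.0.4"]  # hypothetical aliases on one NIC

def connect_via(local_ip, remote):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((local_ip, 0))   # port 0: let the kernel pick an ephemeral port on this IP
    s.connect(remote)
    return s

# Round-robin new client connections across local addresses so each address
# only has to supply a share of the ephemeral ports.
conns = [connect_via(LOCAL_ADDRESSES[i % len(LOCAL_ADDRESSES)], ("remote.example.net", 5050))
         for i in range(10)]
```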
I would appreciate any help or insight I can get.
**Most of my other questions on this site have gone unanswered :)**
Thanks
This is an interesting question about scaling in a serious situation.
You are essentially asking, "How do I establish N connections to an internet service, where N is >= 250,000".
The only way to do this effectively and efficiently is to cluster. You cannot do this on a single host, so you will need to be able to fragment and partition your client base into a number of different servers, so that each is only handling a subset.
The idea would be for a single server to hold open as few connections as possible (spreading out the connectivity evenly) while holding enough connections to make whatever service you're hosting viable by keeping inter-server communication to a minimum level. This will mean that any two connections that are related (such as two accounts that talk to each other a lot) will have to be on the same host.
You will need servers and network infrastructure that can handle this. You will need a subnet of IP addresses, and each server will have to have stateless communication with the internet (i.e. your router will not be doing any NAT, in order not to have to track 250,000+ connections).
You will have to talk to Yahoo. There is no way that Yahoo will be able to handle this level of connectivity without considering cutting your connection off. Any service of this scale would have to be negotiated with Yahoo so that both you and they can handle the connectivity.
There are i/o multiplexing technologies that you should investigate. Kqueue and epoll come to mind.
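As a taste of what epoll/kqueue-style readiness multiplexing looks like from application code, here is a minimal sketch using Python's selectors module (which picks epoll or kqueue under the hood): one thread watches many sockets instead of dedicating a thread or a blocking call to each.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 9000))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, data="accept")

while True:
    for key, _events in sel.select():
        if key.data == "accept":
            conn, _addr = key.fileobj.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data="client")
        else:
            data = key.fileobj.recv(4096)
            if data:
                key.fileobj.sendall(data)   # echo, standing in for real work
            else:
                sel.unregister(key.fileobj)
                key.fileobj.close()
```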
In order to write this massively concurrent, telco-grade solution, I would recommend investigating Erlang. Erlang is designed for situations such as these (multi-server, massively multi-client, massively multithreaded telecommunications-grade software). It is currently used for running Ericsson telephone exchanges.
While you can listen on a socket for multiple incoming connection requests, when the connection is established, it connects a unique port on the server to a unique port on the client. In order to multiplex a connection, you need to control both ends of the pipe and have a protocol that allows you to switch contexts from one virtual connection to another or use a stateless protocol that doesn't care about the client's identity. In the former case you'd need to implement it in the application layer so that you could reuse existing connections. In the latter case you could get by using a proxy that keeps track of which server response goes to which client. Since you're connecting to Yahoo Messenger, I don't think you'll be able to do this since it requires an authenticated connection and it assumes that each connection corresponds to a single user.
You can only multiplex multiple connections over a single socket if the other end supports such an operation.
In other words, it's a function of the protocol - sockets don't have any native support for it.
I doubt the Yahoo Messenger protocol has any support for it.
An alternative (to multiple IPs on a single NIC) is to design your own multiplexing protocol and have satellite servers that convert from the multiplex protocol to the yahoo protocol.
I'll throw in another approach you could consider (depending on how desperate you are).
Note that operating system TCP/IP implementations need to be general purpose, but you are only interested in a very specific use-case. So it might make sense to implement a cut-down version of TCP/IP (which only handles your use-case, but does that very well) in your application code.
For example, if you are using Linux, you could route a couple of IP addresses to a tun interface and have your application handle the IP packets for that tun interface. That way you can implement TCP/IP (optimised for your use-case) entirely in your application and avoid any operating system restriction on the number of open connections.
Of course, it's quite a bit of work doing the TCP/IP yourself, but it really depends on how desperate you are - i.e. how much hardware can you afford to throw at the problem.
500,000 arbitrary Yahoo Messenger connections - is your telco doing this on behalf of Yahoo? It seems like whatever solution has been in place for many years now should be scalable with the help of Moore's Law - and as far as I know all the IM clients have been pretty effective for a long time, and there's no pressing increase in demand that I can think of.
Why isn't this a reasonable problem to address with hardware plus traditional solutions?