Poor UDP broadcast performance to multiple processes on same PC - sockets

We have an application that broadcasts data using UDP from a server system to client applications running on multiple Windows XP PCs. This is on a LAN, typically Gigabit. This has been running fine for some years.
We now have a requirement to have two (or more) of the client applications running on each quad core PC, with each instance of the application receiving the broadcast data. The method I have used to implement this is to give each client PC multiple IP addresses. Each client app then connects to the server using the same port number but on a different IP. This works functionally but the performance for some reason is very poor. My data transfer rate is cut by around a factor of 10!
To get multiple IP addresses I have tried both using two NICs and assigning multiple IP addresses to a single NIC in the advanced TCP/IP network properties. Both methods seem to give similarly poor performance. I also tried NICs from several different manufacturers, but that didn't help either.
One thing I did notice is that the data seems to arrive more fragmented. With just a single client on a PC, if I send 20 kB of data to the client it almost always receives it all in one chunk. But with two clients running, the data mostly arrives in blocks the size of a frame (1500 bytes), so my receive code has to iterate more times. But I wouldn't expect this on its own to cause such a dramatic performance hit.
So I guess my question is: does anyone know why the performance is so much slower, and whether anything can be done to speed it up?
I know I could re-design things so that the server only sends data to one client per PC, and that client could then mirror the data on to the other clients on the same PC. But that is a major redesign and re-coding effort so I'd like to keep that as a last resort.

Instead of creating one IP address for each client, try using setsockopt() to enable the SO_REUSEADDR option for each of your sockets. This will allow all of your clients to bind to the same port on the same host address and receive the broadcast data (see the sketch after the links below). It should be easier to manage than the multiple NIC/IP address approach.
SO_REUSEADDR will allow broadcast and multicast sockets to share the same port and address. For more info see:
SO_REUSEADDR and UDP behavior in Windows
and
Uses of SO_REUSEADDR?
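A minimal sketch of the suggested setup, assuming POSIX-style sockets (under Winsock the option pointer is a const char * and the handle is a SOCKET, but the option name is the same); every client process on the PC would run this with the same port:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Sketch: each client process creates a UDP socket, enables SO_REUSEADDR,
       and binds the shared port, so each receives a copy of the broadcasts. */
    int make_broadcast_listener(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;

        int on = 1;
        /* The key step: allow several sockets to bind the same address/port. */
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
            return -1;

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY); /* accept on any local address */
        addr.sin_port = htons(port);

        return bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ? -1 : fd;
    }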

Coordinating peer-to-peer messages using multicast, how to get receiving IP?

I have been working on a local LAN service which uses a multicast port to coordinate several machines. Each machine listens on the multicast port for instructions, and when a certain instruction is received, will send messages directly to other machines.
In other words the multicast port is used to coordinate peer-to-peer UDP messaging.
In practice this works quite well but there is a lingering issue related to correctly setting up these peer-to-peer transmissions. Basically, each machine needs to announce on the multicast port its own IP address, so that other machines know where to send messages when they wish to start a P2P transmission.
I realize that in general the idea of identifying the local IP is not necessarily sensible, but I don't see any other way-- the local receiving IP must be announced one way or another. At least I am not working on the internet, so in general I won't need to worry about NATs, just need to identify the local LAN IP. (No more than 1 hop for the multicast packets is allowed.)
I wanted to, if possible, determine the IP passively, i.e., without sending any messages.
I have been using code that calls getifaddrs(), which returns a linked list of NICs on the machine, and I scan this list for non-zero IP addresses and choose the first one.
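Roughly, the scan looks like this (a simplified sketch, IPv4 only):

    #include <arpa/inet.h>
    #include <ifaddrs.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Sketch: take the first non-loopback IPv4 address getifaddrs() reports.
       This first-match heuristic is exactly what picks the wrong interface
       on multi-homed machines. */
    int first_local_ipv4(struct in_addr *out)
    {
        struct ifaddrs *list, *ifa;
        if (getifaddrs(&list) < 0)
            return -1;

        int found = -1;
        for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
            if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
                continue;
            struct in_addr a = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr;
            if (a.s_addr == htonl(INADDR_LOOPBACK))
                continue; /* skip 127.0.0.1 */
            *out = a;     /* first match wins -- the fragile part */
            found = 0;
            break;
        }
        freeifaddrs(list);
        return found;
    }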
In general this has worked okay, but we have had issues where, for example, a machine with both a wired and a Wi-Fi connection active would identify the wrong one, and the only workaround we found was to turn off the Wi-Fi.
Now, I imagine that a more reliable solution would be to send a message to the multicast group telling other machines to report back with the source address of the message; that might allow us to identify which IP is actually visible to the other machines on the net. Alternatively, maybe even just looking at the multicast loopback message would work.
What do you think, are there any passive solutions to identify which address to use? If not, what's the best active solution?
I'm using POSIX socket API from C. Must work on Linux, OS X, Windows. (For Windows I have been using GetAdapterAddresses().)
Your question about how to get the address so you can advertise it correctly is looking at the problem from the wrong side. It's a losing proposition to try to guess what your own address is; it's better for the other side to detect it itself.
When a listening machine receives a message, it is probably doing so using recvfrom(2). The fifth argument is a buffer into which the kernel will store the address of the peer, if the underlying protocol offers it. Since you are using IP/UDP, the buffer should get filled in with a sockaddr_in showing the IP address of the sender.
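A minimal sketch of the receiving side (IPv4 assumed):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Sketch: read one datagram and recover the sender's address from the
       address arguments of recvfrom(). */
    void read_one(int fd)
    {
        char buf[1500];
        struct sockaddr_in peer;
        socklen_t peer_len = sizeof(peer);

        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&peer, &peer_len);
        if (n < 0) {
            perror("recvfrom");
            return;
        }

        char ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &peer.sin_addr, ip, sizeof(ip));
        printf("got %zd bytes from %s:%u\n", n, ip, (unsigned)ntohs(peer.sin_port));
    }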
I'd use the address on the interface I use to send the announcement multicast message -- on the wired interface announce the wired address and on the wireless interface announce the wireless address.
When all the receivers live on the wired side, they will never see the message on the wireless network.
When there is a bridge between the wired and the wireless network, add a second step in discovery for round-trip time estimation, and include a unique host ID in the announcement packet, so multiple routes to the same host can be detected and the best one chosen.
Also, it may be a good idea to add a configuration option to limit the service to certain interfaces.
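To make the per-interface announcement concrete, here is a sketch (IPv4; the names and addresses are hypothetical) using IP_MULTICAST_IF to pick the outgoing interface and putting that same interface address in the payload:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Sketch: send the announcement out of one specific interface and
       announce that interface's own address. */
    int announce_on(int fd, const char *if_addr, /* e.g. "192.168.1.5" */
                    const char *group, unsigned short port)
    {
        struct in_addr local;
        inet_pton(AF_INET, if_addr, &local);

        /* Route this socket's outgoing multicast through the given interface. */
        if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF,
                       &local, sizeof(local)) < 0)
            return -1;

        struct sockaddr_in dst;
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        inet_pton(AF_INET, group, &dst.sin_addr);
        dst.sin_port = htons(port);

        /* The payload is the address receivers should use to reach us. */
        return (int)sendto(fd, if_addr, strlen(if_addr), 0,
                           (struct sockaddr *)&dst, sizeof(dst));
    }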

UDP multicast from specific network card

I'm looking for some networking gurus to help me with a problem. I have many computers running my software which uses UDP multicasting. This works fine if the computers are connected ONLY to one network (network A). My computer (which is also running said software) will listen on port XXXX for the multicasts. This computer has two network cards and when I connect it to another network, network B, my software goes haywire. The problem is that I do not know what network a given multicast came from. And if I send out a multicast, I cannot tell it to use network A instead of network B or vice versa.
My questions:
Is there a way to distinguish packets coming in from different networks?
Is there a way to send a multicast to network A and NOT network B?
I'm using C++ and Win32 sockets. Thanks to anyone that replies.
You should listen for multicast packets on the interface where you joined the group. You should explicitly set the interface used for sending multicast packets (otherwise they are routed like everything else: default route, etc.). Both are accomplished via setsockopt calls. Here are some links for you:
Multicast programming - talks about setting "send" interface,
IP Multicast Extensions - talks about both "send" and "receive" interfaces.
Disclaimer: the links are admittedly Unix-centric, so your Windows mileage may vary :)
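The two setsockopt calls in question, sketched for IPv4 (the option names are the same under Winsock, though the option pointer there is a const char *; the group address in the comment is just an example):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Sketch: join the group on a chosen NIC for receiving, and pin
       outgoing multicast to the same NIC. */
    int pin_multicast(int fd, const char *group_ip, const char *local_ip)
    {
        struct ip_mreq mreq;
        inet_pton(AF_INET, group_ip, &mreq.imr_multiaddr); /* e.g. "239.1.2.3" */
        inet_pton(AF_INET, local_ip, &mreq.imr_interface); /* NIC on network A */

        /* Receive side: join the group on this interface only. */
        if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0)
            return -1;

        /* Send side: route outgoing multicast through the same interface. */
        struct in_addr out = mreq.imr_interface;
        return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &out, sizeof(out));
    }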
Working on a project with multicast UDP on redundant NICs over the last year, we saw a similar problem. After battling it a bit with Winsock, our ultimate solution was to prioritize traffic using the route command:
route add 224.x.x.x ... [desired gateway] METRIC 1
This ensured that the traffic only went out on the Interface we wanted.
I realize this might not be exactly what you want, but it could at least be a stopgap solution while you implement another fix.
On multihomed hosts you need to join the multicast group via all the interfaces you care about, one after another. If you are interested in the network of origin, you can use multiple multicast sockets, each bound to a different interface on the same port, and each joined to the group; then the receiving socket itself tells you which network any incoming traffic comes from.
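A sketch of that multi-socket approach, one call per interface (note this binds to the interface's unicast address, which works for multicast reception on Windows; on Linux you would typically bind to INADDR_ANY or the group address and rely on the membership's interface instead):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Sketch: one socket per interface, each on the shared port and joined
       to the group; any datagram read from the returned fd arrived on that
       interface's network. */
    int join_on_interface(const char *group_ip, const char *if_ip,
                          unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_in local;
        memset(&local, 0, sizeof(local));
        local.sin_family = AF_INET;
        inet_pton(AF_INET, if_ip, &local.sin_addr); /* this NIC's address */
        local.sin_port = htons(port);
        if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
            close(fd);
            return -1;
        }

        struct ip_mreq mreq;
        inet_pton(AF_INET, group_ip, &mreq.imr_multiaddr);
        inet_pton(AF_INET, if_ip, &mreq.imr_interface);
        if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                       &mreq, sizeof(mreq)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }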

Heartbeat Protocols/Algorithms or best practices

Recently I've added some load-balancing capabilities to a piece of software that I wrote. It is a networked application that does some data crunching based on input coming from a SQL database. Since the crunching can be pretty intensive I've added the capability to have multiple instances of this application running on different servers to split the load but as it is now the load balancing is a manual act. A user must specify which instances take which portion of the input domain.
I would like to take that to the next level and program the instances to automatically negotiate the dividing up of the input data, and to recognize if one of them "disappears" (has crashed or has been powered down) so that the remaining instances can take on the failed instance's workload.
In order to implement this I'm considering using a simple heartbeat protocol between the instances to determine who's online and who isn't and while this is not terribly complicated I'd like to know if there are any established heartbeat network protocols (based on UDP, TCP or both).
Obviously this happens a lot in the networking world with clustering, fail-over and high-availability technologies so I guess in the end I'd like to know if maybe there are any established protocols or algorithms that I should be aware of or implement.
EDIT
It seems, based on the answers, that either there are no well-established heartbeat protocols or that nobody knows about them (which would imply that they aren't so well established after all), in which case I'm just going to roll my own.
While none of the answers offered what I was looking for specifically I'm going to vote for Matt Davis's answer since it was the closest and he pointed out a good idea to use multicast.
Thank you all for your time~
Distributed Interactive Simulation (DIS), which is defined under IEEE Standard 1278, uses a default heartbeat of 5 seconds via UDP broadcast. A DIS heartbeat is essentially an Entity State PDU, which fully defines the state, including the position, of the given entity. Due to its application within the simulation community, DIS also uses a concept referred to as dead reckoning to provide higher-frequency heartbeats when the actual position, for example, is outside a given threshold of its predicted position.
In your case, a DIS Entity State PDU would be overkill. I only mention it to make note of the fact that heartbeats can vary in frequency depending on the circumstances. I don't know that you'd need something like this for the application you described, but you never know.
For heartbeats, use UDP, not TCP. A heartbeat is, by nature, a connectionless contrivance, so it goes that UDP (connectionless) is more relevant here than TCP (connection-oriented).
The thing to keep in mind about UDP broadcasts is that a broadcast message is confined to the broadcast domain. In short, if you have computers that are separated by a layer 3 device, e.g., a router, then broadcasts are not going to work because the router will not transmit broadcast messages from one broadcast domain to another. In this case, I would recommend using multicast since it will span the broadcast domains, providing the time-to-live (TTL) value is set high enough. It's also a more automated approach than directed unicast, which would require the sender to know the IP address of the receiver in order to send the message.
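A sketch of a multicast heartbeat sender along those lines (the group address, port, and the TTL of 4 are all assumptions to be tuned for your network):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Sketch: create a UDP socket for multicast heartbeats and fill in the
       destination. Set the TTL just high enough to span the broadcast
       domains you need to cross. */
    int make_heartbeat_sender(const char *group_ip, unsigned short port,
                              struct sockaddr_in *dst)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;

        unsigned char ttl = 4; /* assumption: hops needed for your site */
        setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

        memset(dst, 0, sizeof(*dst));
        dst->sin_family = AF_INET;
        inet_pton(AF_INET, group_ip, &dst->sin_addr);
        dst->sin_port = htons(port);
        return fd; /* then sendto() the heartbeat to *dst every interval */
    }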
Broadcast a heartbeat every t using UDP; if you haven't heard from a machine in more than k*t, then it's assumed down. Be careful that the aggregate bandwidth used isn't a drain on resources. You can use IP broadcast addresses, or keep a list of specific IPs you're doing work for.
Make sure the heartbeat includes a "reboot count" as well as "machine ID" so that you know previous server state isn't around.
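Sketched as data structures, with hypothetical names and tuning values for t and k:

    #include <stdint.h>
    #include <time.h>

    #define T 2 /* seconds between heartbeats (assumption) */
    #define K 3 /* missed-beat multiplier: presumed down after K*T seconds */

    struct heartbeat {
        uint32_t machine_id;   /* stable identifier for the sending host */
        uint32_t reboot_count; /* bumped on restart so stale state is dropped */
    };

    struct peer {
        struct heartbeat last;
        time_t last_seen;
    };

    /* A peer is presumed down once more than K*T elapses with no heartbeat. */
    int peer_alive(const struct peer *p, time_t now)
    {
        return (now - p->last_seen) <= K * T;
    }

    /* A changed reboot count means the peer restarted: discard its old state. */
    int peer_restarted(const struct peer *p, const struct heartbeat *hb)
    {
        return hb->machine_id == p->last.machine_id
            && hb->reboot_count != p->last.reboot_count;
    }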
I'd recommend using MapReduce if it fits. It would save a lot of work.
I'm not sure this will answer the question but you might be interested by the way Weblogic Server clustering work under the hood. From the book Mastering BEA WebLogic Server:
[...] WebLogic Server clustering provides a loose coupling of the servers in the cluster. Each server in the cluster is independent and does not rely on any other server for any fundamental operations. Even if contact with every other server is lost, each server will continue to run and be able to process the requests it receives. Each server in the cluster maintains its own list of other servers in the cluster through periodic heartbeat messages. Every 10 seconds, each server sends a heartbeat message to the other servers in the cluster to let them know it is still alive. Heartbeat messages are sent using IP multicast technology built into the JVM, making this mechanism efficient and scalable as the number of servers in the cluster gets large. Each server receives these heartbeat messages from other servers and uses them to maintain its current cluster membership list. If a server misses receiving three heartbeat messages in a row from any other server, it takes that server out of its membership list until it receives another heartbeat message from that server. This heartbeat technology allows servers to be dynamically added and dropped from the cluster with no impact on the existing servers’ configurations.
Cisco content switches are a hardware solution for this problem. They implement a virtual IP address as a front end to multiple real servers, whose real IP addresses are known to the switch. The switch periodically sends HTTP HEAD requests to the web servers, to verify they are still running (which the switch software calls a "keepalive", although this doesn't keep the server itself alive). The Cisco switch accepts traffic on the virtual IP and forwards it to the actual web servers, using configurable load balancing such as round-robin, or user-defined load balancing.
These switches retail in the $3-10K range, although my business partner picked one up on eBay for about $300 a year ago. If you can afford one, they do represent a proven hardware solution to the question of how to have a service spread transparently across multiple servers. Red Hat includes a built-in port configuration so that you could implement your own Cisco switch using a cheap Red Hat box. Google for "virtual ip address" and "cisco content router" for more information.
In addition to trying hardware load balancers, you can also try a free, open-source load-balancing software application such as HAProxy, available for Linux and the BSDs.

UDP Server Discovery - Should clients send multicasts to find server or should server send regular beacon?

I have clients that need to all connect to a single server process. I am using UDP discovery for the clients to find the server. I have the client and server exchange IP address and port number, so that a TCP/IP connection can be established after completion of the discovery. This way the packet size is kept small. I see that this could be done in one of two ways using UDP:
1. Each client sends out its own multicast message in search of the server, which the server then responds to. The client can repeat sending this multicast message at regular intervals (in case the server is down) until the server responds.
2. The server sends out a multicast message beacon at regular intervals. The clients subscribe to the multicast group and in this way receive the server's multicast message and complete the discovery.
In 1., if there are many clients then initially there would be many multicast messages transmitted (one from each client). Only the server would subscribe to and receive the multicast messages from the clients. Once the server has responded to a client, that client ceases to send out the multicast message. Once all clients have completed their discovery of the server, no further multicast messages are transmitted on the network. If, however, the server is down, then each client would be sending out a multicast beacon at intervals until the server is back up and can respond.
In 2., only the server would send a multicast message beacon at regular intervals. This message would end up getting routed to all clients that are subscribed to the multicast group. Once a client receives the packet, its UDP listening socket gets closed and it is no longer subscribed to the multicast group. However, the server must continue to send the beacon so that new clients can discover it. It would continue sending out the beacon at regular intervals regardless of whether any clients are out there requiring discovery or not.
So, I see pros and cons either way. It seems to me that #1 would result in heavier load initially, but this load eventually reduces down to zero. In #2 the server would continue sending out a beacon forever.
UDP and multicast is a fairly new topic to me, so I am interested in finding out which would be the preferred approach and which would result in less network load.
I've used option #2 in the past several times. It works well for simple network topologies. We did see some throughput problems when UDP datagrams exceeded the Ethernet MTU, resulting in a large amount of fragmentation. The largest problem that we have seen is that multicast discovery breaks down in larger topologies, since many routers are configured to block multicast traffic.
The issue that Greg alluded to is rather important to consider when you are designing your protocol suite. As soon as you move beyond simple network topologies, you will have to find solutions for address translation, IP spoofing, and a whole host of other issues related to the handoff from your discovery layer to your communications layer. Most of them have to do specifically with how your server identifies itself and ensuring that the identification is something that a client can make use of.
If I could do it over again (how many times have we uttered this phrase), I would look for standards-based discovery mechanisms that fit the bill and start solving the other protocol suite problems. The last thing that you really want to do is come up with a really good discovery scheme that breaks the week after you deploy it because of some unforeseen network topology. Google "service discovery" for a starting list. I personally tend towards DNS-SD, but there are a lot of other options available.
I would recommend method #2, as it is likely (depending on the application) that you will have far more clients than you will servers. By having the server send out a beacon, you only send one packet every so often, rather than one packet for each client.
The other benefit of this method, is that it makes it easier for the clients to determine when a new server becomes available, or when an existing server leaves the network, as they don't have to maintain a connection to each server, or keep polling each server, to find out.
Both are equally viable methods.
The argument for method #1 would be that, in principle, clients initiate requests and servers listen and respond to them.
The argument for method #2 would be that the point of multicast is so that one host can send a packet and it can be received by many clients (one-to-many), so it's meant to be the reverse of #1.
OK, as I think about this I'm actually drawn to #2, the server-initiated beacon. The problem with #1 is this: say the clients broadcast beacons and hook up with the server, but then the server either goes offline or changes its IP address.
When the server is back up and sends its first beacon, all the clients will be notified at the same time to reconnect, and your entire system is back up immediately. With #1, all of the clients would have to individually realize that the server is gone, and they would all start multicasting at the same time, until connected back with the server. If you had 1000 clients and 1 server your network load would literally be 1000x greater than method #2.
I know these messages are most likely small, and 1000 packets at a time is nothing to a UDP network, but just from a design standpoint #2 feels better.
Edit: I feel like I'm developing a split-personality disorder here, but just thought of a powerful point of why #1 would be an advantage... If you ever wanted to implement some sort of natural load balancing or scaling with multiple servers, design #1 works well for this. That way the first "available" server can respond to the client's beacon and connect to it, as opposed to #2 where all the clients jump to the beaconing server.
Your option #2 has a big limitation in that it assumes the server can communicate more or less directly with every possible client. Depending on the exact network architecture of your operational system, this may not be the case. For example, you are depending on all the routers and VPN software and WANs and NATs, and whatever other things people connect networks together with, actually handling the multicast beacon packets.
With #1, you are assuming that the clients can send a UDP packet to the server. This is an entirely reasonable expectation, especially considering the very next thing the client will do is make a TCP connection to the same server.
If the server goes down and the client wants to find out when it's back up, be sure to use exponential backoff otherwise you will take the network down with a packet storm someday!
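A sketch of that backoff (the 1 s start, the 64 s cap, and the try_discover callback, which returns nonzero on success, are all assumptions):

    #include <unistd.h>

    /* Sketch: exponential backoff between rediscovery attempts, so a server
       outage doesn't trigger a packet storm. */
    void rediscover_with_backoff(int (*try_discover)(void))
    {
        unsigned delay = 1;
        const unsigned max_delay = 64;

        while (!try_discover()) {
            sleep(delay);
            if (delay < max_delay)
                delay *= 2; /* double the wait after every failure */
        }
    }

Adding a little random jitter to each delay also helps keep a thousand clients from retrying in lockstep.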

Can socket connections be multiplexed?

Is it possible to multiplex a socket connection?
I need to establish multiple connections to Yahoo Messenger, and I am looking for a way to do this efficiently without having to hold a socket open for each client connection.
So far I have to use one socket for each client, and this does not scale well above 50,000 connections.
Oh, and my solution is for a telco, so I need to hit at least 250,000 to 500,000 connections.
I'm planning to bind multiple IP addresses to a single NIC to beat the 65k port restriction per IP address.
I would appreciate any help or insight you can offer.
**Most of my other questions on this site have gone unanswered :)**
Thanks
This is an interesting question about scaling in a serious situation.
You are essentially asking, "How do I establish N connections to an internet service, where N is >= 250,000".
The only way to do this effectively and efficiently is to cluster. You cannot do this on a single host, so you will need to be able to fragment and partition your client base into a number of different servers, so that each is only handling a subset.
The idea would be for a single server to hold open as few connections as possible (spreading out the connectivity evenly) while holding enough connections to make whatever service you're hosting viable by keeping inter-server communication to a minimum level. This will mean that any two connections that are related (such as two accounts that talk to each other a lot) will have to be on the same host.
You will need servers and network infrastructure that can handle this. You will need a subnet of IP addresses, and each server will have to have stateless communication with the internet (i.e. your router must not be doing any NAT, so that it doesn't have to track 250,000+ connections).
You will have to talk to Yahoo. There is no way Yahoo will be able to handle this level of connectivity without considering cutting your connection off. Any service of this scale would have to be negotiated with Yahoo so that both you and they can handle the connectivity.
There are i/o multiplexing technologies that you should investigate. Kqueue and epoll come to mind.
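For a flavor of what that looks like, a minimal Linux epoll sketch (kqueue is the analogous BSD mechanism); a single thread can watch very large numbers of sockets this way:

    #include <stdio.h>
    #include <sys/epoll.h>

    /* Sketch: one thread watches many sockets for readability instead of
       blocking on each one. */
    void poll_loop(int *fds, int nfds)
    {
        int ep = epoll_create1(0);
        if (ep < 0) {
            perror("epoll_create1");
            return;
        }

        for (int i = 0; i < nfds; i++) {
            struct epoll_event ev;
            ev.events = EPOLLIN;
            ev.data.fd = fds[i];
            epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev);
        }

        struct epoll_event ready[64];
        for (;;) {
            int n = epoll_wait(ep, ready, 64, -1);
            for (int i = 0; i < n; i++) {
                /* ready[i].data.fd is readable: recv()/read() it here */
            }
        }
    }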
In order to write this massively concurrent, telco-grade solution, I would recommend investigating Erlang. Erlang is designed for situations such as these (multi-server, massively multi-client, massively multithreaded telecommunications-grade software). It is currently used for running Ericsson telephone exchanges.
While you can listen on a socket for multiple incoming connection requests, when the connection is established, it connects a unique port on the server to a unique port on the client. In order to multiplex a connection, you need to control both ends of the pipe and have a protocol that allows you to switch contexts from one virtual connection to another or use a stateless protocol that doesn't care about the client's identity. In the former case you'd need to implement it in the application layer so that you could reuse existing connections. In the latter case you could get by using a proxy that keeps track of which server response goes to which client. Since you're connecting to Yahoo Messenger, I don't think you'll be able to do this since it requires an authenticated connection and it assumes that each connection corresponds to a single user.
You can only multiplex multiple connections over a single socket if the other end supports such an operation.
In other words, it's a function of the protocol - sockets don't have any native support for it.
I doubt yahoo messenger protocol has any support for it.
An alternative (to multiple IPs on a single NIC) is to design your own multiplexing protocol and have satellite servers that convert from the multiplex protocol to the yahoo protocol.
I'll throw in another approach you could consider (depending on how desperate you are).
Note that operating system TCP/IP implementations need to be general purpose, but you are only interested in a very specific use-case. So it might make sense to implement a cut-down version of TCP/IP (which only handles your use-case, but does that very well) in your application code.
For example, if you are using Linux, you could route a couple of IP addresses to a tun interface and have your application handle the IP packets for that tun interface. That way you can implement TCP/IP (optimised for your use-case) entirely in your application and avoid any operating system restriction on the number of open connections.
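A sketch of the tun setup (Linux-specific; the device name "tun0" is an assumption, and routing your IP addresses to the interface is a separate step):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/if.h>
    #include <linux/if_tun.h>

    /* Sketch: open a tun device. read() then yields raw IP packets routed to
       the interface, and write() injects packets back into the kernel, which
       is where your cut-down user-space TCP/IP plugs in. */
    int open_tun(const char *name) /* e.g. "tun0" */
    {
        int fd = open("/dev/net/tun", O_RDWR);
        if (fd < 0)
            return -1;

        struct ifreq ifr;
        memset(&ifr, 0, sizeof(ifr));
        ifr.ifr_flags = IFF_TUN | IFF_NO_PI; /* raw IP, no packet-info header */
        strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

        if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }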
Of course, it's quite a bit of work doing the TCP/IP yourself, but it really depends on how desperate you are - i.e., how much hardware you can afford to throw at the problem.
500,000 arbitrary yahoo messenger connections - is your telco doing this on behalf of Yahoo? It seems like whatever solution has been in place for many years now should be scalable with the help of Moore's Law - and as far as I know all the IM clients have been pretty effective for a long time, and there's no pressing increase in demand that I can think of.
Why isn't this a reasonable problem to address with hardware plus traditional solutions?