TURN Server WebRTC Hardware / Network Requirements - sockets

I am currently wrestling with the idea of scaling a TURN server deployment from a novelty to something that scales with call volume.
To that end, I am trying to understand the requirements from a hardware, network, and application perspective, and their associated cost. A few specific questions come to mind that I would love the community's help with.
1) Are the ports reused for multiple destinations simultaneously? Since this is UDP, and the 4-tuple of source IP/port and destination IP/port is enough for uniqueness, I can see it being theoretically possible, but I've never seen documentation on this.
2) What is the delay (if any) before ports are reused? If the TURN server has allocated ports 1234 and 1235 for a given time, then when one or both of those sockets close, how long will it be before the TURN server re-allocates those ports in response to another request?
3) How should I think about the hardware requirements (specifically CPU and memory) of my TURN server(s) as a function of number of concurrent calls?
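On question 1, the general UDP behaviour can be checked with a small experiment. The sketch below is plain Python on the loopback interface, not TURN code, and the function name is made up for illustration. It shows that one bound UDP socket (one local port) can send to several destinations at once, with each receiver seeing the same source port. It says nothing about how any particular TURN server chooses or recycles its relay ports.

```python
import socket

def demo_shared_port():
    """Send from one bound UDP socket to two receivers.

    Returns the (ip, port) source address each receiver observed.
    """
    receivers = []
    for _ in range(2):
        r = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        r.bind(("127.0.0.1", 0))      # let the OS pick a free port
        r.settimeout(2.0)
        receivers.append(r)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.bind(("127.0.0.1", 0))     # a single local port for all destinations

    seen = []
    try:
        for r in receivers:
            sender.sendto(b"hello", r.getsockname())
        for r in receivers:
            _, src = r.recvfrom(1024)
            seen.append(src)
    finally:
        sender.close()
        for r in receivers:
            r.close()
    return seen
```

Both entries in the returned list carry the same source port, because a UDP socket is not tied to a single remote peer.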

Related

How can I automatically test a networking (TCP/IP) application?

I teach students to develop network applications, both clients and servers. At this moment, we have not yet touched existing protocols such as HTTP, SMTP, etc. The students write very simple programs on top of the plain socket API. Currently I check the students' work manually, but I want to automate this task and create an automated test bench for networking applications. The most interesting topics for testing are:
1) Breaking TCP segments into small parts and delivering them with a noticeable delay. I need this test because students usually just issue a single read/recv call and process the received data without checking that all the necessary data has arrived. TCP doesn't preserve message boundaries, so in certain circumstances several read/recv calls are necessary. The problem is that in most simple network applications (for example, a chat application) messages are small and fit into a single TCP segment, so the issue never shows up. My idea is to artificially break messages into several small TCP segments (i.e. a few bytes of data each) so that the problem does appear.
2) Pausing the data transfer for some time to simulate multiple slow clients and check that the multithreading/async sockets are implemented properly in the students' servers.
3) Resetting a connection at random moments.
I've found several systems which simulate a bad network (dummynet, clumsy, netem). However, they all work at the IP level of the stack, so the OS and its TCP implementation will compensate for the data loss. Such systems can handle task 2, but not tasks 1 and 3. So I think I need to develop my own solution, which will act as a TCP proxy. My questions are:
Are there any libraries or applications which can (at least partially) solve the given tasks, so that I can use them as a base for my own solution?
If there is no suitable existing software, are there any ideas or approaches for doing this properly?
From WireShark mailing list - Creating and Modifying Packets:
...There's a "Tools" page on the Wireshark Wiki:
http://wiki.wireshark.org/Tools
which has a "Traffic generators" section:
https://wiki.wireshark.org/Tools#Traffic_generators
which lists some tools that might be useful...
The "Traffic generators" chapter also mentions another collection of traffic generators
If you write your own socket code, you can address all 3 tasks:
1) Enable the socket's TCP_NODELAY option (which disables the Nagle algorithm for send coalescing) via setsockopt(); then you can send() small fragments of data as you wish, optionally with a delay in between (see #2).
2) Simply put a delay in between your send() calls.
3) Use setsockopt() to adjust the socket's SO_LINGER and SO_DONTLINGER options to control whether closing the socket performs an abortive or graceful closure, then simply close the socket at some random interval after the connection is established.
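The three steps above can be sketched in Python roughly as follows. The helper names send_fragmented and abortive_close are made up for illustration, and the chunk size and delay are arbitrary. Note that TCP_NODELAY plus a pause makes it likely, but not guaranteed, that the peer receives each fragment in a separate recv() call. The struct layout used for SO_LINGER is an assumption that holds on Linux and the BSDs.

```python
import socket
import struct
import time

def send_fragmented(sock, data, chunk_size=3, delay=0.05):
    """Send data in small pieces with a pause between them (tasks 1 and 2)."""
    # Disable the Nagle algorithm so small writes are not coalesced by the sender.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for start in range(0, len(data), chunk_size):
        sock.sendall(data[start:start + chunk_size])
        time.sleep(delay)

def abortive_close(sock):
    """Close with SO_LINGER enabled and a zero timeout (task 3).

    On most stacks this discards unsent data and resets the connection
    instead of performing the normal graceful shutdown.
    """
    # Assumption: struct linger is two C ints (Linux/BSD).
    # Windows uses two unsigned shorts, i.e. struct.pack("HH", 1, 0).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

A test bench could call send_fragmented() with a protocol message, then sleep for a random interval and call abortive_close() to see how the student's program reacts to a reset.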

How do game-proxys minify network latency in their infrastructure? [closed]

In South America, many gamers use something called a proxy service, which takes their network connection, routes it through its own infrastructure, and then exits close to the game server's location. E.g. they want to ensure that the TCP traffic does not cross the USA, for latency reasons. So how can they manipulate the path taken by a TCP connection?
a) Do they just open TCP connections at low-traffic times (e.g. 4 in the morning) and then keep them for the rest of the day?
b) Do they keep trying to open TCP connections until they get a lower-latency one, and then switch their internal traffic to that connection?
c) Is the only thing they can do to minimize TCP latency over long distances to rent private peerings or choose a hoster with good ones?
d) Could sending UDP packets over such distances reduce latency, if and only if you compensate for packet loss (e.g. by sending the traffic redundantly, as multiple copies of each packet)?
It all boils down to the question of whether you can somehow control what path a TCP connection takes or not.
This question is only about the networking part, NOT about the end user's computer (Leantrix/TCP optimizations) or the game servers. They can somehow gain additional latency savings, and I'm curious how they do it.
Thank you for the great year I've had with SO so far. It's been a pleasure to talk to experts about stuff.
If you are referring to things like Battleping and the like, here's what someone has written in a forum that seems to make sense; I suppose the same holds true for South America. The relevant info is "SSH tunnel".
The advent of proxy tunnels came from the demand of Oceanic WoW
players. Incase you don't know, the backbone connecting Oceania to
America is a huge piece of shit and once you leave Australia/New
Zealand packets gain an extra 200ms because gaming packets get shaped
leaving our country, and then they get shaped going into America.
Generally you can ping about 200~ to US Servers, but in real-time the
game data will end up getting prioritized to hell and back and you'll
have a latency of around 500.
The way Lowerping, Battleping and Smoothping etc all work are by
establishing an SSH tunnel to a proxy in America and sending the
SC2/WoW data through it. SSH traffic has much higher priority than
gaming traffic does, so instead of being delayed, the data flies
through. Afaik, it also doesn't get shaped as incoming traffic from
the Blizzard serverside, because they're originating from a proxy
inside of the US.
Feel free to correct me, I might be wrong on some things, but that's
what I've picked up from using the very first tunneling service
(Lowerping) since it came out
To my knowledge, proxy servers do not speed up your connection. They usually send your data along a longer path, and the receiver sees each data packet as having originated from the proxy server.
The computer that sends data cannot determine the path that it takes. It can only determine the end points. When you are connected to a proxy, the end point of the first trip is the proxy, and the proxy then retransmits the data packet to the second destination. A proxy is a special type of server configured for this retransmission.
Let's see the internet as a spider web. Then each joint is a router. These routers maintain a table called the 'routing table'. The routing table has information about where to forward data packets according to their destination. This routing table is updated automatically to send packets along the shortest path.
So you see, if nothing interferes, the data packets will take the shortest path.
Now the exception:
If the proxy service provider has a separate private network connection from the proxy location to the game server location (something like a private highway with no traffic), the data packets can be delivered quickly. But this is highly unlikely, because no one will lay their own private cables around the world instead of using the already existing internet backbones.
Let's say
A - Origin
B - Destination
C - Proxy
Then,
Normal way packets go:
A -> B (Quickest path determined by the routers)
With a proxy, packets go:
A -> C -> B (Usually a longer path)
With a proxy and a high-speed private network C-D (This is a highly unlikely scenario; no one has such things.)
A -> C --(less traffic)--> D -> B (Can have a speed gain)
Some other ways of increasing the speed of the connection:
You can use UDP instead of TCP. TCP guarantees reliable, in-order delivery: the endpoints acknowledge received data and retransmit lost segments, so a single lost packet delays everything queued behind it. UDP does only minimal error checking (a checksum) and never retransmits, so the transmitted data might occasionally be lost or arrive out of order, but nothing waits on a retransmission.
Standard protocols that transmit data, e.g. HTTP, carry many fields besides the actual data: checksums, browser information, OS information, etc. If you design a different protocol that leaves this data out, the amount of data to be transmitted becomes smaller. This will also speed up the communication.

How to use UDP from a machine with only NAT access

I have a machine with no external IP address that needs to send UDP packets to the outside world. It has only NAT access.
Will this work?
It is really hard to prototype this in our environment.
It is still really under construction.
Any thoughts on how I can prototype this?
Most home network configurations in the world consist of a PC with an internal IP and a router with a public IP that NATs the internal one (independently of UDP, TCP, or whatever protocol needs to go out).
I see no trouble with it.
It should work.
For the socket you create, make sure the TTL (time-to-live) is set to a value large enough to cover the possible number of router hops to the destination. Running traceroute to the destination IP will give you a rough idea of the number of hops. Note that this number can change depending on network conditions, so it's best to set the TTL to a larger value. The TTL is set with setsockopt() (the IP_TTL option); refer to your sockets API documentation for the exact syntax.
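As a minimal sketch of the TTL advice, here is how a UDP socket's TTL can be set and read back in Python. The function name and the value 64 are only illustrative; pick a TTL based on what traceroute shows for your path.

```python
import socket

def make_udp_socket(ttl=64):
    """Create an IPv4 UDP socket whose outgoing packets carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TTL applies to unicast packets sent from this socket.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return sock

# Example use (destination address and port are placeholders):
#   sock = make_udp_socket(ttl=64)
#   sock.sendto(b"ping", ("203.0.113.10", 9999))
```

Nothing in this code is NAT-specific: the sender simply sends, and the NAT device rewrites the source address on the way out.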
Finally, remember that UDP is not a reliable protocol. So even after taking the necessary steps above, the packet may not reach its destination. However, if the entire network, including the intermediary routers, is within a controlled environment, such as a corporate intranet, chances of packet drop are minimal.
If you want to add reliability on top of UDP, you can adopt a NAK-based algorithm where packets are stamped with a sequence number, so that the receiver can detect gaps and ask for the missing packets again. Various resources might advise you that if you need reliability over UDP you should simply use TCP, but my experience has been that if your app runs in a controlled environment with a very small chance of packet drops and you need fast connection setup and teardown, adding lightweight reliability over UDP has its merits. Also, each TCP connection holds state in the OS kernel, whereas UDP has no per-connection state. This could also be a consideration if you want to support a very large number of 'connections' in a constrained environment.
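To make the NAK idea concrete, here is a small sketch of the receiver-side bookkeeping only. The class and method names are hypothetical, and the real work of sending NAKs over the socket, retransmitting, and timing out is left out. The receiver delivers packets in sequence order and reports which sequence numbers are missing whenever it sees a gap.

```python
class NakReceiver:
    """Tracks sequence numbers and reports gaps that should be NAKed."""

    def __init__(self):
        self.next_seq = 0      # next sequence number to deliver
        self.pending = {}      # out-of-order packets: seq -> payload
        self.delivered = []    # payloads delivered in order

    def on_packet(self, seq, payload):
        """Accept one packet and return the sequence numbers to NAK."""
        if seq < self.next_seq or seq in self.pending:
            return []          # duplicate, ignore it
        self.pending[seq] = payload
        # Deliver every packet that is now contiguous with what we have.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        if not self.pending:
            return []
        # Anything between next_seq and the highest buffered packet is missing.
        highest = max(self.pending)
        return [s for s in range(self.next_seq, highest) if s not in self.pending]
```

With this design the receiver stays silent while packets arrive in order, which is what keeps a NAK-based scheme cheap on a network that rarely drops packets.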
At the end of the day you need to experiment a little to figure out what works best for you.
To prototype, I would set up a NAT server using something like Linux and then start working from there. Real world traffic scenarios that you want to simulate will determine where the client and server are to be located on either side of the NAT. That is, if the traffic should go through an ISP or all within a controlled environment.
HTH

Is using Erlang's gen_tcp a scalable way to construct a high traffic socket server

I am trying to learn Erlang to do some simple but scalable network programming. I basically want to write a program that does what servers on the backbone of the internet do--but on a smaller scale. I want to try to set up an intranet with web accessible servers which would act as gateways to the intranet [sic] and route data to connected clients and/or other gateways.
The high traffic would come from the fact that data would not only flow from client to gateway to client, but might have to bounce around a few gateways to get to the destination (like how data travels on the internet). This means that the gateways would have to not only handle traffic from their clients, but traffic from other gateways' clients.
I figured this would lead to unusually high levels of traffic, even for a medium number of clients and gateways.
Coming from a background in Python and, to a lesser extent, other scripting languages, I am used to digging for a customized module to solve my problems. I understand Erlang is designed for high traffic network programming, but all I could find in terms of libraries/modules for this kind of thing was gen_tcp.
Does this mean that Erlang is already so optimized for this kind of thing that you can fire it up with its most basic modules and expect it to scale nicely?
You can expect gen_tcp to perform extremely well, even under conditions of massive load. If you are just going to pass around data and not process it much, then my guess is you will be able to scale quite nicely - effectively you will just be passing around pointers.
All of the known scalable solutions written in Erlang use gen_tcp:
Cowboy, Mochiweb, Yaws, ...
Riak
Etorrent
RabbitMQ
and so on. When using it, there is a hint worth mentioning though: Make sure you run erl as erl +K true so you get access to the kernel polling. That is, epoll() on Linux, kqueue()/kevent() on BSD and /dev/poll on Solaris. Also note that you can give commands to TCP ports to set their options w.r.t. buffer size and so on. Finally, for certain types of packets, you can have the C-layer parse the packet for you, see erl -man inet and the setopts/2 call. An example would be {packet, 4} which is quite popular.
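For readers who have not met {packet, 4} before: it means every message on the wire is preceded by a 4-byte big-endian length header, which the Erlang runtime adds on send and strips on receive. The sketch below illustrates that wire format in Python rather than Erlang, with hypothetical names (frame, FrameDecoder). It could be used to write a non-Erlang peer that talks to such a socket.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload

class FrameDecoder:
    """Reassembles length-prefixed messages from arbitrary stream chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data: bytes):
        """Add received bytes and return the list of complete messages."""
        self._buf.extend(data)
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack(">I", self._buf[:4])
            if len(self._buf) < 4 + length:
                break                      # wait for the rest of the message
            messages.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return messages
```

The decoder works no matter how the TCP stream happens to be split, which is the job the C layer takes off your hands in Erlang.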
In general, Erlang has a quite fast I/O sublayer. You can expect it to perform really quickly, even for large complex interactions.

Heartbeat Protocols/Algorithms or best practices

Recently I've added some load-balancing capabilities to a piece of software that I wrote. It is a networked application that does some data crunching based on input coming from a SQL database. Since the crunching can be pretty intensive I've added the capability to have multiple instances of this application running on different servers to split the load but as it is now the load balancing is a manual act. A user must specify which instances take which portion of the input domain.
I would like to take that to the next level and program the instances to automatically negotiate the dividing up of the input data and to recognize if one of them "disappears" (has crashed or has been powered down) so that the remaining instances can take on the failed instance's workload.
In order to implement this I'm considering using a simple heartbeat protocol between the instances to determine who's online and who isn't and while this is not terribly complicated I'd like to know if there are any established heartbeat network protocols (based on UDP, TCP or both).
Obviously this happens a lot in the networking world with clustering, fail-over and high-availability technologies so I guess in the end I'd like to know if maybe there are any established protocols or algorithms that I should be aware of or implement.
EDIT
It seems, based on the answers, that either there are no well established heart-beat protocols or that nobody knows about them (which would imply that they aren't so well established after all) in which case I'm just going to roll my own.
While none of the answers offered what I was looking for specifically I'm going to vote for Matt Davis's answer since it was the closest and he pointed out a good idea to use multicast.
Thank you all for your time~
Distributed Interactive Simulation (DIS), which is defined under IEEE Standard 1278, uses a default heartbeat of 5 seconds via UDP broadcast. A DIS heartbeat is essentially an Entity State PDU, which fully defines the state, including the position, of the given entity. Due to its application within the simulation community, DIS also uses a concept referred to as dead-reckoning to provide higher frequency heartbeats when the actual position, for example, is outside a given threshold of its predicted position.
In your case, a DIS Entity State PDU would be overkill. I only mention it to make note of the fact that heartbeats can vary in frequency depending on the circumstances. I don't know that you'd need something like this for the application you described, but you never know.
For heartbeats, use UDP, not TCP. A heartbeat is, by nature, a connectionless contrivance, so it goes that UDP (connectionless) is more relevant here than TCP (connection-oriented).
The thing to keep in mind about UDP broadcasts is that a broadcast message is confined to the broadcast domain. In short, if you have computers that are separated by a layer 3 device, e.g., a router, then broadcasts are not going to work because the router will not transmit broadcast messages from one broadcast domain to another. In this case, I would recommend using multicast since it will span the broadcast domains, providing the time-to-live (TTL) value is set high enough. It's also a more automated approach than directed unicast, which would require the sender to know the IP address of the receiver in order to send the message.
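A minimal sketch of a multicast heartbeat sender in Python follows. The group address, port, and function names are placeholders chosen for illustration. The default multicast TTL is 1, which keeps packets on the local subnet, so it has to be raised for the heartbeat to cross routers, and those routers must also be configured to forward multicast traffic.

```python
import socket

MCAST_GROUP = "239.1.1.1"   # placeholder group address
MCAST_PORT = 5007           # placeholder port

def make_multicast_sender(ttl=4):
    """Create a UDP socket whose multicast packets may cross up to ttl hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # IP_MULTICAST_TTL is separate from the unicast IP_TTL option.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock

def send_heartbeat(sock, payload=b"alive"):
    """Send one heartbeat datagram to the multicast group."""
    sock.sendto(payload, (MCAST_GROUP, MCAST_PORT))
```

Receivers would join the same group with the IP_ADD_MEMBERSHIP socket option, so the sender never needs to know their individual addresses.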
Broadcast a heartbeat every t using UDP; if you haven't heard from a machine in more than k*t, then it's assumed down. Be careful that the aggregate bandwidth used isn't a drain on resources. You can use IP broadcast addresses, or keep a list of specific IPs you're doing work for.
Make sure the heartbeat includes a "reboot count" as well as "machine ID" so that you know previous server state isn't around.
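The rule described above (send every t, declare a peer down after k*t of silence, and carry a machine ID plus a reboot count) can be sketched as a small bookkeeping class. The names are hypothetical, and the network part is left out: the caller passes in whatever arrived in each heartbeat together with the current time.

```python
class HeartbeatTracker:
    """Tracks when each peer was last heard from."""

    def __init__(self, interval, k=3):
        self.timeout = k * interval   # silence longer than k*t means "down"
        self.peers = {}               # machine_id -> (reboot_count, last_seen)

    def record(self, machine_id, reboot_count, now):
        """Record a heartbeat; return True if the peer restarted since the last one."""
        previous = self.peers.get(machine_id)
        self.peers[machine_id] = (reboot_count, now)
        return previous is not None and previous[0] != reboot_count

    def down(self, now):
        """Return the sorted IDs of peers not heard from within the timeout."""
        return sorted(
            machine_id
            for machine_id, (_, last_seen) in self.peers.items()
            if now - last_seen > self.timeout
        )
```

When record() returns True, the peer has come back with a new reboot count, so any state the others held about its previous incarnation (such as its share of the workload) should be treated as stale.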
I'd recommend using MapReduce if it fits. It would save a lot of work.
I'm not sure this will answer the question, but you might be interested in the way WebLogic Server clustering works under the hood. From the book Mastering BEA WebLogic Server:
[...] WebLogic Server clustering provides a loose coupling of the servers in the cluster. Each server in the cluster is independent and does not rely on any other server for any fundamental operations. Even if contact with every other server is lost, each server will continue to run and be able to process the requests it receives. Each server in the cluster maintains its own list of other servers in the cluster through periodic heartbeat messages. Every 10 seconds, each server sends a heartbeat message to the other servers in the cluster to let them know it is still alive. Heartbeat messages are sent using IP multicast technology built into the JVM, making this mechanism efficient and scalable as the number of servers in the cluster gets large. Each server receives these heartbeat messages from other servers and uses them to maintain its current cluster membership list. If a server misses receiving three heartbeat messages in a row from any other server, it takes that server out of its membership list until it receives another heartbeat message from that server. This heartbeat technology allows servers to be dynamically added and dropped from the cluster with no impact on the existing servers’ configurations.
Cisco content switches are a hardware solution for this problem. They implement a virtual IP address as a front end to multiple real servers, whose real IP addresses are known to the switch. The switch periodically sends HTTP HEAD requests to the web servers, to verify they are still running (which the switch software calls a "keepalive", although this doesn't keep the server itself alive). The Cisco switch accepts traffic on the virtual IP and forwards it to the actual web servers, using configurable load balancing such as round-robin, or user-defined load balancing.
These switches retail in the $3-10K range, although my business partner picked one up on eBay for about $300 a year ago. If you can afford one, they do represent a proven hardware solution to the question of how to have a service spread transparently across multiple servers. Redhat includes a built-in port configuration so that you could implement your own Cisco switch using a cheap RedHat box. Google for "virtual ip address" and "cisco content router" for more information.
In addition to trying hardware load balancers, you can also try a free, open-source load-balancing application such as HAProxy, available for Linux and the BSDs.