Should I use RTP or WebRTC for local network audio communication - raspberry-pi

I have a set of Raspberry Pi Zeros that I would like to use as a home intercom. I initially set them up to send audio to each other using golang with gRPC and bidirectional streaming, which works for short calls, but the lag builds up over time, so I think I need to switch to a real-time protocol like RTP or WebRTC. Since I already know the IP address of each device, the hardware/supported codecs are the same on each, and they are all on the same network, is there any advantage to using WebRTC over plain RTP? My understanding is that WebRTC mainly adds security and connection orchestration like ICE and SDP, which I wouldn't necessarily need. I am trying to minimize resource usage since these devices are not as powerful as a phone or desktop. If I do use WebRTC, I can do the SDP signaling with gRPC or some other direct delivery method.
Since there are more than 2 devices, I'm also curious about multicast functionality, which seems to be pure-RTP specific: WebRTC (which uses RTP underneath) doesn't necessarily support multicasting, and would require a full mesh of n(n-1)/2 p2p connections. I'm very unclear/unsure about this point.
Also, does either support mixing audio channels natively, or would that need to be handled in the custom software?

You could use WebRTC, but you'd need to rig up a signalling server and a STUN/TURN server. These can be super simple and low capacity because everything is on a private network, but you still need 'em. The signalling server handles the necessary SDP interchange. Going full WebRTC might be overengineering this. (But of course, learning to get WebRTC working can be useful.)
You've built out a golang infrastructure. Seeing as how you're on a private network, you could change that program to send multicast UDP packets or RTP packets. Then you can rig your listeners to receive them.
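As a rough illustration (not a drop-in replacement for your gRPC code), a Go multicast sender can be very small; the group address 239.0.0.1:5004 and the 20 ms frame size below are arbitrary placeholders:

    package main

    import (
        "log"
        "net"
        "time"
    )

    func main() {
        group, err := net.ResolveUDPAddr("udp4", "239.0.0.1:5004")
        if err != nil {
            log.Fatal(err)
        }
        conn, err := net.DialUDP("udp4", nil, group)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        frame := make([]byte, 320) // 20 ms of 8 kHz, 16-bit mono PCM
        for {
            // In the real intercom this frame would come from the microphone.
            if _, err := conn.Write(frame); err != nil {
                log.Println("send failed:", err)
            }
            time.Sleep(20 * time.Millisecond)
        }
    }

The listeners would use net.ListenMulticastUDP on the same group address to receive the frames.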
No matter what you do, you'll need to deal with the lag. A good way to do it in the packet world: don't build a queue of buffers ready to play. Instead, always put each received packet as the next-to-play packet, even if you have to overwrite a previously received packet. (That is, skip ahead.) You may get a pop once in a while, but with reasonably short packets, under 50ms, it shouldn't affect the user experience significantly. And the lag won't build up.
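A minimal sketch of that "skip ahead" idea in Go (the type and method names are mine, not any standard API):

    package audio

    import "sync"

    // latestSlot holds at most one pending audio frame. Newly received frames
    // overwrite any unplayed one, so playback lag can never accumulate.
    type latestSlot struct {
        mu    sync.Mutex
        frame []byte
    }

    // Put stores the most recently received frame, discarding an unplayed one.
    func (s *latestSlot) Put(frame []byte) {
        s.mu.Lock()
        s.frame = frame
        s.mu.Unlock()
    }

    // Take removes and returns the pending frame. A nil result means nothing
    // new arrived; the player can output silence or repeat the last frame.
    func (s *latestSlot) Take() []byte {
        s.mu.Lock()
        f := s.frame
        s.frame = nil
        s.mu.Unlock()
        return f
    }

The receive loop calls Put on every packet, and the audio output path calls Take once per frame period.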
The oldtimey phone system ran on a continent-wide 8 kHz synchronous clock, so lag was not an issue. But it's always a problem when the audio analog-to-digital and digital-to-analog clocks aren't synchronized, and that's true whenever they are on different devices. The slightest drift builds up over time. (RPis don't have fifty-dollar clock parts in them with guaranteed low drift.)
If all your audio sources run at the same sample rate, you can mix them by averaging the samples. That should get you started. (If you're using WebRTC in a browser, it will mix multiple sources for you.)
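For example, assuming 16-bit PCM frames of equal length and sample rate, a simple averaging mixer in Go might look like this (a sketch, not production code):

    package audio

    // mix averages several equal-length, same-sample-rate int16 PCM frames
    // into one output frame. Averaging (rather than plain summing) avoids
    // clipping, at the cost of making each individual source quieter.
    func mix(frames [][]int16) []int16 {
        if len(frames) == 0 {
            return nil
        }
        out := make([]int16, len(frames[0]))
        for i := range out {
            var sum int
            for _, f := range frames {
                sum += int(f[i])
            }
            out[i] = int16(sum / len(frames))
        }
        return out
    }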

Since you are using Go, check out offline-browser-communication. It removes the need for signaling and STUN/TURN by using mDNS and pre-generated certificates. It is also being discussed in the WICG Discourse, but there's no telling if/when it will land.
'Lag' is a pretty common problem to have when doing media over TCP: you have lots of queues and congestion control to deal with. WebRTC (and RTP in general) is great at solving this. You have the following standardized tools to solve it (a small parsing sketch follows the list):
RTP packets carry a relative timestamp.
RTP Sender Reports provide a mapping of the relative timestamp to an NTP timestamp. Use this for sync/timing.
RTP Receiver Reports give you packet loss and jitter. Use these to assess your network health.
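As an example of how little code is needed to get at those fields, here is a sketch that reads RTP packets off a UDP socket with the pion/rtp package (the port and buffer size are placeholders):

    package main

    import (
        "log"
        "net"

        "github.com/pion/rtp"
    )

    func main() {
        conn, err := net.ListenPacket("udp4", ":5004")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        buf := make([]byte, 1500)
        for {
            n, _, err := conn.ReadFrom(buf)
            if err != nil {
                log.Fatal(err)
            }
            pkt := &rtp.Packet{}
            if err := pkt.Unmarshal(buf[:n]); err != nil {
                log.Println("not an RTP packet:", err)
                continue
            }
            // SequenceNumber detects loss/reordering; Timestamp drives playout timing.
            log.Printf("ssrc=%d seq=%d ts=%d payload=%d bytes",
                pkt.SSRC, pkt.SequenceNumber, pkt.Timestamp, len(pkt.Payload))
        }
    }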
Multicast is a fantastic suggestion as well. You reduce the complexity of having to signal all those 1:1 connections, and you reduce the amount of bandwidth required. It does make security a little more delicate, though; you'd have to roll your own.
With Pion we decoupled all the RTP/RTCP handling into Pion Interceptor, so you don't have to use the full WebRTC stack to get the media-transport features mentioned above.

Related

Does it make sense to use RTP protocol for multiple streamers and single receiver?

I am in the process of learning and trying to use the RTP/RTCP protocol. My situation is that there are 1 to n streamers and 1 (or potentially 1 to m if needed) receiver(s), but in a way that the streamers themselves do not know about each other (they cannot, directly, due to technical reasons such as different networks, limited bandwidth, etc.). So it is more like multiple unicast sessions, but the receiver actually knows about them all and collects data from all of them; it is just that the senders do not know about each other.
Now, reading about the protocol, it seems to me that a huge portion of it is related to sending feedback, collision detection, and so on. So I have doubts: is RTP really applicable in this case? Is it already used in this way somewhere?
It seems to me it is still beneficial to collect the statistics about data transfer that RTP provides (data sent, loss, times, etc.); it just feels like most of the protocol is left out...
Also I have one additional question. Going through the various RTP libraries, they all assume that the sender will also open ports for receiving RTP/RTCP data. Does RTP forbid one-way communication? I mean an application that would only stream the data, not expecting to receive anything back. The libraries (e.g. ccRTP) seem to assume two-way communication only...
RTCP is the protocol that provides statistics. The stream receiver (client) will send stats to the sender (server) via RTCP. I don't believe the client will get any statistic reports from the server.
There's nothing wrong with a single client receiving multiple unicast sessions from various servers.
RTP setup (typically done via RTSP) requires two-way communication. Once setup is complete and the PLAY command is sent, it is mostly one-way. The exception is the "keep alive" packets that must be sent to the server periodically (usually every 60 seconds or so) to keep the stream going. The exact timeout value is sent to the client during setup.
But if you implement your own RTP, there's nothing stopping you from having the server send the stream continuously without any feedback from the client. Basically it would be implementing an infinite timeout value.
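For illustration, a send-only RTP stream is just a matter of stamping and writing packets. This Go sketch uses the pion/rtp package; the destination address, payload type 0 (PCMU) and the 20 ms / 8 kHz framing are assumed placeholders for whatever codec is actually in use:

    package main

    import (
        "log"
        "net"
        "time"

        "github.com/pion/rtp"
    )

    func main() {
        conn, err := net.Dial("udp4", "192.168.1.50:5004")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        seq := uint16(0)
        ts := uint32(0)
        for {
            pkt := &rtp.Packet{
                Header: rtp.Header{
                    Version:        2,
                    PayloadType:    0, // PCMU
                    SequenceNumber: seq,
                    Timestamp:      ts,
                    SSRC:           0x1234abcd,
                },
                Payload: make([]byte, 160), // 20 ms of 8 kHz G.711
            }
            raw, err := pkt.Marshal()
            if err != nil {
                log.Fatal(err)
            }
            if _, err := conn.Write(raw); err != nil {
                log.Println("send failed:", err)
            }
            seq++
            ts += 160
            time.Sleep(20 * time.Millisecond)
        }
    }

No RTCP is sent or expected here; the receiver simply has to tolerate getting no reports.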
You can read about all the details in the spec: RTP: A Transport Protocol for Real-Time Applications (RFC 3550).

Is UDP always unreliable?

I'm about to re-architect a real-time system that has been prototyped on a single node and specify how it should be scaled up to multiple nodes (probably never more than 20 of them in any one LAN). Some of the functionality will multiply on a per-node basis, and some of it will remain centralised on a one-per-system basis. There is going to be a need for communication between each node and that central unit (possibly a master node), but not between individual nodes.
Due to the real-time demands of the system, UDP is something that should be considered for that communication. But... it is almost always described as unreliable. Is this always the case? Does it not depend on the scale of the network, the data load on the network and the way the protocol is used?
For example, suppose I have a central unit which regularly polls through each node by addressing a UDP message to it, and each node immediately responds with its data via UDP. There is no other communication on the (isolated) network. Suppose there is also some mechanism to ensure there are never any collisions (e.g. all nodes have a maximum transmission length for their responses to a poll message, and the latencies are nailed down to known levels). Is there any (hidden) reason in a simple and structured network like this that you would ever fail to transmit/receive every last UDP packet and have near 100% reliability?
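(For concreteness, here is roughly what I mean, sketched in Go; the node addresses, the "POLL" message and the 50 ms timeout are made up for the example.)

    package main

    import (
        "log"
        "net"
        "time"
    )

    func main() {
        // Hypothetical node addresses on the isolated LAN.
        nodes := []string{"192.168.0.11:9000", "192.168.0.12:9000"}

        conns := make([]net.Conn, len(nodes))
        for i, addr := range nodes {
            c, err := net.Dial("udp4", addr)
            if err != nil {
                log.Fatal(err)
            }
            defer c.Close()
            conns[i] = c
        }

        buf := make([]byte, 1500)
        for {
            for i, c := range conns {
                if _, err := c.Write([]byte("POLL")); err != nil {
                    log.Println("poll failed:", err)
                    continue
                }
                c.SetReadDeadline(time.Now().Add(50 * time.Millisecond))
                n, err := c.Read(buf)
                if err != nil {
                    // A timeout here is exactly the "lost datagram" case I'm asking about.
                    log.Printf("%s: no response (%v)", nodes[i], err)
                    continue
                }
                log.Printf("%s: %d bytes", nodes[i], n)
            }
        }
    }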
EDIT: the detail of this question suffers from confusion around what "unreliable" means, and whether it is intended to apply only to UDP, or to the system in which UDP is employed. I have chosen to leave this confusion in the question, because looking back over a lot of material on UDP, I can see that this confusion might be very common, and that answers which highlight that confusion and overcome it might be valuable.
The key is, UDP does not make any guarantees. There are many reasons why datagrams might go undelivered:
Sender host buffers fill up
Cosmic rays flip bits somewhere along the way, causing a checksum mismatch and the datagram to be discarded
Electromagnetic interference corrupts the signal momentarily
A network cable gets unplugged for a moment
A hub or switch loses power for a moment
A switch's buffers fill up
Receiving host buffers fill up
If any of these things (or many others) occurs, a datagram may go undelivered. UDP will make no attempt to detect this or to re-deliver it.
Yes. Every layer is potentially unreliable, starting with the electrical signalling across your Ethernet cable. (Ever jostled one of those plugs? You can see it in Wireshark logs.) Collisions are virtually impossible to avoid. And in case of congestion, your protocol stack may decide to drop UDP packets.
But all that's rather beside the point. UDP is unreliable, but that doesn't mean it can't be relied on. Plenty of mission-critical applications run over UDP. You just need to understand the unreliability and account for it.
Unreliable does not mean it will definitely fail. It only means that it does not care about transport problems and thus will not make any guarantees that transmission will be successful. Let's compare some aspects of UDP against TCP.
UDP is packet based, TCP stream based. This does not have much to do with reliability.
Packets may arrive in a different order than they were sent. UDP does not care and will deliver the packets in that order to the application. With TCP, data carry a sequence number, so the receiver's operating system will detect reordering and forward the data to the application in the correct order. This usually does not matter when you have a direct connection between client and server, but it can happen in wide networks like the internet.
Packets may get lost due to router or switch congestion, or overload of the sending or receiving system, among other causes. This can also happen in local networks with heavy traffic, or if the receiving system is unable to cope with the amount of data, even for a short time. With UDP the data are simply lost. TCP instead will detect lost packets and retransmit them, and will even slow down the traffic to adapt to what speed the network and endpoints can handle, and thus lose fewer packets in the future.
Packets might get duplicated. Again, TCP will detect this via the sequence number, but UDP will not, and will thus deliver the duplicate packet to the application.
Packets might get corrupted. Both TCP and UDP have the same kind of checksum, which detects small errors but can miss larger ones.
Applications using UDP usually do not need the reliability of TCP, or don't need all of it. For instance, with real-time audio and video, packet loss is acceptable but duplicates and reordering are not. Thus the RTP protocol carries its own sequence number and timestamp to detect these cases. RTP is also often accompanied by the RTCP protocol, which sends statistics about packet loss back to the peer and thus makes adaptation of the connection speed possible.
If you want reliable UDP, take a look at the ENet library.
http://enet.bespin.org/
Unreliability with regard to UDP is different from unreliability in general. Also, UDP and alternatives to it (e.g. TCP) are always only ever components or single layers in a wider system. This can lead to some confusion about what "unreliable" means.
UDP is a transport layer network protocol. The transport layer is responsible for getting data from one point on the network to another specific point on the network. In that context, UDP is described as an "unreliable" protocol because it makes no guarantees about whether the data sent will actually arrive. In contrast, TCP is a "reliable" transport layer protocol because if data goes missing or is corrupted the first time it is sent, the protocol itself has mechanisms to resend the data and ensure it arrives... eventually.
But UDP is not some sloppy "maybe, maybe not - let me think about it and screw you around" protocol. It does what it is specified to do, and is reliable (general sense) at doing it... as well as reliable (general sense) in failing in predictable ways. If you take these failure modes into account elsewhere, UDP can be a component of an overall very reliable system.
For example, by restricting network topology and using UDP to transport higher-level protocols, the GigE Vision standard specifies a highly reliable system with high data-transfer rates and real-time response whose transport-level communications are dominated by UDP traffic.
Historically, the major source of unreliable packet transport was packet collisions due to two sources attempting to transmit simultaneously on a single channel. In modern networks, each node is typically connected on a full duplex link to a network switch, making collisions impossible on that link, and consequently making modern networks much more reliable (in all senses) than was the case when UDP was first designed.
No networking technology currently available can be made 100% reliable... but let's be practical rather than pedantic, because potential unreliability and actual unreliability are a lot like shark attacks - they tend to occur far more in people's minds than in reality.
Some material on UDP makes it sound almost like the people who designed UDP did it just to annoy people - that unreliability was deliberately engineered in. This is not the case, and it is unhelpful to think of it in these terms. It is far better to focus on what UDP does and does not do in comparison to alternatives (e.g. see this comparison between TCP and UDP... which nonetheless lists "unreliability" as a key feature of UDP).
In reality, when there is data to be transmitted, that can be transmitted, it is transmitted; when there is data that can be received, it is received. Likewise, if you transmit packets 1, 2 then 3 directly to an endpoint, they will almost certainly be received as packets 1, 2 and 3 in order (assuming no failures in lower network layers, and that incoming data is buffered in a FIFO as is customary, but not mandatory). You can get a lot of reliability out of this, depending on how you use it.
However, if you transmit packets via multiple routes, all bets are off - "unreliability" of packet order can occur. And if you flood the available buffers, unreliability via dropping packets will occur. And if you allow nodes to transmit at any time (asynchronous), then you will get unreliability through packet collisions. But in the "simple and structured" (and also small and synchronous) LAN described, you may be able to either avoid this, or detect its occurrence (e.g. by sending an incrementing counter value in each packet), which will let you compensate in an application-specific way.
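A sketch of that counter idea in Go, assuming each datagram starts with a 4-byte big-endian counter (the layout is arbitrary for the example):

    package udpcheck

    import (
        "encoding/binary"
        "errors"
    )

    // gapDetector tracks an incrementing per-packet counter so the receiver
    // can tell how many datagrams were dropped, or whether one arrived out of
    // order.
    type gapDetector struct {
        next    uint32
        started bool
    }

    // Check reads the counter from the front of a datagram and returns how
    // many packets were skipped since the previous one; a negative value
    // indicates an old or reordered datagram.
    func (g *gapDetector) Check(datagram []byte) (int64, error) {
        if len(datagram) < 4 {
            return 0, errors.New("datagram too short")
        }
        ctr := binary.BigEndian.Uint32(datagram[:4])
        if !g.started {
            g.started = true
            g.next = ctr + 1
            return 0, nil
        }
        skipped := int64(ctr) - int64(g.next)
        g.next = ctr + 1
        return skipped, nil
    }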
In cases where the power goes off (perhaps momentarily), or cosmic rays strike, or people trip on loose cables causing an unacceptable level of "unreliability"... then don't blame UDP - blame the engineer(s) whose design left the system susceptible to these things.
All things considered, in the LAN described, you might reasonably expect to be able to engineer a system based on UDP so as to never lose more than one packet in every few million, or billion, or even astronomically better than this - but it will depend on specifics, and only you can know if your application can tolerate the quantity and quality of unreliable comms that results in your case.

What's the difference between SIP/XMPP for web conferencing and file-sharing?

I want to set up a personal videoconferencing service for my family, friends and myself. The main problem I have with current options is that they are either closed-source and centralized (Google Hangouts, Skype) or open-source but not working in corporate environments or hotels (due to strict firewalling rules and the "Skype is going through, if you want VOIP use that" kind of netadmin reaction).
I have two solutions then. Either set up a STUN/TURN relay server and use XMPP and SIP as I used to, but that would require my friends to set that up too. Or set up a whole VOIP server. Two solutions come to mind: SIP and XMPP. Though to my knowledge, each of them ultimately uses the (S)RTP/RTCP protocol.
And that's the problem. Apart from the specific signaling each of them uses, I really can't figure out the difference between them or their typical use cases.
I think you're right in that, as far as setting up a video conferencing system goes, XMPP and SIP are equivalent. They are both signalling-only protocols, and the media sessions they set up typically use RTP (although they can both be used to set up any kind of session you want, RTP is the norm).
The biggest problem is also going to be the one you mention: getting video streams out of a corporate firewall. Skype overcomes this obstacle by sending its media over an SSL connection and is thus able to get through firewalls. Theoretically you could do the same with RTP, and in the past I once used OpenVPN connections with a SIP client to test some audio calls. My experience wasn't great, as the audio was very choppy, presumably as a result of all the extra packaging that is required to get the high volume of small audio packets from one end to the other. That was nearly a decade ago though, so perhaps with the better CPU and bandwidth resources available now it would work better.
Personally I think I'd stick with Skype, as it's going to be a big hassle to set up your own system. If you were to go ahead with your own, the first option I would try would be Asterisk combined with OpenVPN, so that if the clients were behind a firewall or had NAT issues they could connect over it.

How to speed up slow / laggy Windows Phone 7 (WP7) TCP Socket transmit?

Recently, I started using the System.Net.Sockets class introduced in the Mango release of WP7 and have generally been enjoying it, but have noticed a disparity in the latency of transmitting data in debug mode vs. running normally on the phone.
I am writing a "remote control" app which transmits a single byte to a local server on my LAN via Wifi as the user taps a button in the app. Ergo, the perceived responsiveness/timeliness of the app is highly important for a good user experience.
With the phone connected to my PC via USB cable and running the app in debug mode, the TCP connection seems to transmit packets as quickly as the user taps buttons.
With the phone disconnected from the PC, the user can tap up to 7 buttons (and thus cause 7 "send" commands with 1-byte payloads) before all 7 bytes are sent. If the user taps a button and waits a little between taps, there seems to be a latency of 1 second.
I've tried setting Socket.NoDelay to both True and False, and it seems to make no difference.
To see what was going on, I used a packet sniffer to see what the traffic looked like.
When the phone was connected via USB to the PC (which was using a Wifi connection), each individual byte was in its own packet being spaced ~200ms apart.
When the phone was operating on its own Wifi connection (disconnected from USB), the bytes still had their own packets, but they were all grouped together in bursts of 4 or 5 packets and each group was ~1000ms apart from the next.
btw, Ping times on my Wifi network to the server are a low 2ms as measured from my laptop.
I realize that buffering "sends" together probably allows the phone to save energy, but is there any way to disable this "delay"? The responsiveness of the app is more important than saving power.
This is an interesting question indeed! I'm going to throw my 2 cents in but please be advised, I'm not an expert on System.Net.Sockets on WP7.
Firstly, performance testing while in the debugger should be ignored. The reason is that the additional overhead of the debugger always slows applications down, no matter the OS/language/IDE. Applications should be profiled for performance in release mode, disconnected from the debugger. In your case it's actually slower disconnected! OK, so let's try to optimise that.
If you suspect that packets are being buffered (and this is a reasonable assumption), have you tried sending a larger packet? Try linearly increasing the packet size and measuring latency. Could you write a simple micro-profiler in code on the device, i.e. using DateTime.Now or the Stopwatch class, to log latency vs. packet size? Plotting that graph might give you some good insight as to whether your theory is correct. If you find that 10-byte (or even 100-byte) packets get sent instantly, then I'd suggest simply pushing more data per transmission. It's a lame hack I know, but if it ain't broke...
Finally, you say you are using TCP. Can you try UDP instead? TCP is not designed for real-time communications, but rather accurate communications. UDP, by contrast, does not retransmit: you can't guarantee delivery, but you can expect faster (more lightweight, lower-latency) performance from it. Applications such as Skype and online games are built on UDP, not TCP. If you really need acknowledgement of receipt you could always build your own micro-protocol over UDP, using your own cyclic redundancy check for error checking and a request/response (acknowledgement) protocol.
Such protocols do exist, take a look at Reliable UDP discussed in this previous question. There is a Java based implementation of RUDP about but I'm sure some parts could be ported to C#. Of course the first step is to test if UDP actually helps!
Found this previous question which discusses the issue. Perhaps a WP7 issue?
Poor UDP performance with Windows Phone 7.1 (Mango)
Still would be interested to see if increasing packet size or switching to UDP works
OK, so neither suggestion worked. I found this description of the Nagle algorithm, which groups packets as you describe. Setting NoDelay is supposed to help but, as you say, doesn't.
http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.nodelay.aspx
Also, see this previous question where KeepAlive and NoDelay were set on/off to manually flush the queue. His evidence is anecdotal but worth a try. Can you give it a go and edit your question to post more up-to-date results?
Socket "Flush" by temporarily enabling NoDelay
Andrew Burnett-Thompson here already mentioned it, but he also wrote that it didn't work for you. I do not understand and I do not see WHY. So, let me explain that issue:
Nagle's algorithm was introduced to avoid a scenario where many small packets have to be sent through a TCP network. Any current state-of-the-art TCP stack enables Nagle's algorithm by default!
Because: TCP itself adds a substantial amount of overhead to any data transfer passing through an IP connection, and applications usually do not care much about sending their data in an optimized fashion over those TCP connections. So, after all, the Nagle algorithm working inside the OS's TCP stack does a very, very good job.
A better explanation of Nagle's algorithm and its background can be found on Wikipedia.
So, your first try: disable Nagle's algorithm on your TCP connection, by setting option TCP_NODELAY on the socket. Did that already resolve your issue? Do you see any difference at all?
If not so, then give me a sign, and we will dig further into the details.
But please, look twice for those differences: check the details. Maybe after all you will get an understanding of how things in your OS's TCP/IP-Stack actually work.
Most likely it is not a software issue. If the phone is using WiFi, the delay could be upwards of 70 ms (depending on where the server is, how much bandwidth it has, how busy it is, interference to the AP, and distance from the AP), but most of the delay is just the WiFi. Using GSM, CDMA, LTE or whatever technology the phone is using for cellular data is even slower. I wouldn't imagine you'd get much lower than 110 ms on a cellular device unless you stood underneath a cell tower.
Sounds like your reads/writes are buffered. You may try setting the NoDelay property on the Socket to true, and you may consider trimming the Send and Receive buffer sizes as well. The reduced responsiveness may be a by-product of there not being enough WiFi traffic; I'm not sure if adjusting the MTU is an option, but reducing the MTU may improve response times.
All of these are only options for a low-bandwidth solution, if you intend to shovel megabytes of data in either direction you will want larger buffers over wifi, large enough to compensate for transmit latency, typically in the range of 32K-256K.
var socket = new System.Net.Sockets.Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
{
    NoDelay = true,
    SendBufferSize = 3,
    ReceiveBufferSize = 3,
};
I didn't test this, but you get the idea.
Have you tried setting SendBufferSize = 0? In C, you can disable Winsock buffering by setting SO_SNDBUF to 0, and I'm guessing SendBufferSize means the same in C#.
Were you using a Lumia 610 and a MikroTik access point by any chance?
I have experienced this problem; it made the Lumia 610 turn off its WiFi radio as soon as the last connection was closed. This added a perceivable delay compared to a Lumia 800, for example. All connections were affected; simply switching WiFi off made all apps faster. My admin says it was some feature MikroTiks were not supporting at the time, combined with WMM settings. Strangely, most other phones managed just fine, so we blamed the cheapness of the 610 at the beginning.
If you still can replicate the problem, I suggest trying following:
open another connection in the background and ping it all the time.
use 3g/gprs instead of wifi (requires exposing your server to the internet)
use different (or upgraded) phone
use different (or upgraded) AP

How to make the SIP protocol more reliable using UDP

Actually, we are doing thesis work where we need to connect 10 SIP-based VoIP phones to each other so they can call and talk among each other. We also want to add video calling. Another question: is it possible to make video calls over SIP?
SIP already has built in reliability measures, most of which are specifically to cope with unreliable transports such as UDP. You should read the section in the SIP RFC on Transactions to gain an understanding of how it works. One aspect missing from the SIP RFC is reliability for provisional responses and the supplementary RFC3262 deals with that.
SIP is agnostic to the type of session it sets up, such as voice or video, so yes, it can be used to set up video calls. There are heaps of readily available SIP softphones that already provide video, one example being X-Lite.
To make it reliable you need to emulate the following two features (a small sketch follows the list):
For Calls
You need to sequence the packets.
One end needs to tell the other end that a sequenced packet is missing if this happens, and you probably want to take jitter into account -- i.e., wait a small amount of time before you request a missing packet.
For protocol commands
You need to acknowledge command packets -- if a command is not acknowledged it has to be sent again.
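As a rough sketch of both features in Go (a custom framing invented for the example, not actual SIP or RTP wire formats):

    package reliability

    import (
        "fmt"
        "net"
        "time"
    )

    // SendCommand retransmits a command datagram until a one-byte ACK arrives,
    // giving up after maxTries attempts. The 500 ms retransmission interval is
    // an arbitrary choice for the sketch.
    func SendCommand(conn net.Conn, cmd []byte, maxTries int) error {
        ack := make([]byte, 1)
        for i := 0; i < maxTries; i++ {
            if _, err := conn.Write(cmd); err != nil {
                return err
            }
            if err := conn.SetReadDeadline(time.Now().Add(500 * time.Millisecond)); err != nil {
                return err
            }
            if _, err := conn.Read(ack); err == nil {
                return nil // acknowledged
            }
        }
        return fmt.Errorf("command not acknowledged after %d tries", maxTries)
    }

    // Missing lists the sequence numbers in [from, to) that have not been seen.
    // The caller waits a short jitter window before asking the sender to
    // retransmit these media packets.
    func Missing(received map[uint16]bool, from, to uint16) []uint16 {
        var gaps []uint16
        for seq := from; seq != to; seq++ {
            if !received[seq] {
                gaps = append(gaps, seq)
            }
        }
        return gaps
    }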