How to make the SIP protocol more reliable when using UDP

We are doing thesis work in which we need to build 10 SIP-based VoIP phones connected to each other, so they can call and talk among themselves. We also want to add video calling. A second question: is video calling possible over SIP?

SIP already has built-in reliability measures, most of which exist specifically to cope with unreliable transports such as UDP. You should read the section on Transactions in the SIP RFC (RFC 3261) to gain an understanding of how it works. One aspect missing from the core RFC is reliability for provisional responses; the supplementary RFC 3262 deals with that.
SIP is agnostic to the type of session it sets up, voice or video, so yes, it can be used to set up video calls. There are heaps of readily available SIP softphones that already provide video; one example is X-Lite.

To make it reliable you need to emulate the following two features:
For Calls
You need to sequence the packets.
One end needs to tell the other end that a sequenced packet is missing if this happens, and you probably want to take jitter into account -- i.e., wait a small amount of time before you request a missing packet.
For protocol commands
You need to acknowledge command packets -- if a command is not acknowledged, it has to be sent again.
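
As a rough illustration of the command side, here is a minimal sketch in Go of sequencing plus acknowledge-and-retransmit over UDP. Everything here (the packet layout, the 500 ms retransmit interval, the address) is illustrative and not part of any SIP stack:

    // Sequence command packets and resend them until acknowledged.
    package main

    import (
        "encoding/binary"
        "log"
        "net"
        "sync"
        "time"
    )

    type reliableSender struct {
        conn    *net.UDPConn
        mu      sync.Mutex
        pending map[uint32][]byte // seq -> packet awaiting ack
        nextSeq uint32
    }

    // send stamps a 4-byte sequence number in front of the payload and
    // remembers the packet until an ack arrives.
    func (s *reliableSender) send(payload []byte) {
        s.mu.Lock()
        seq := s.nextSeq
        s.nextSeq++
        pkt := make([]byte, 4+len(payload))
        binary.BigEndian.PutUint32(pkt, seq)
        copy(pkt[4:], payload)
        s.pending[seq] = pkt
        s.mu.Unlock()
        s.conn.Write(pkt)
    }

    // ack drops a packet once the far end has confirmed it.
    func (s *reliableSender) ack(seq uint32) {
        s.mu.Lock()
        delete(s.pending, seq)
        s.mu.Unlock()
    }

    // retransmitLoop resends anything still unacknowledged every interval.
    func (s *reliableSender) retransmitLoop(interval time.Duration) {
        for range time.Tick(interval) {
            s.mu.Lock()
            for _, pkt := range s.pending {
                s.conn.Write(pkt)
            }
            s.mu.Unlock()
        }
    }

    func main() {
        raddr, err := net.ResolveUDPAddr("udp", "127.0.0.1:5060")
        if err != nil {
            log.Fatal(err)
        }
        conn, err := net.DialUDP("udp", nil, raddr)
        if err != nil {
            log.Fatal(err)
        }
        s := &reliableSender{conn: conn, pending: map[uint32][]byte{}}
        go s.retransmitLoop(500 * time.Millisecond)
        s.send([]byte("INVITE ...")) // resent every 500 ms until acked
        time.Sleep(2 * time.Second)
    }

A receiver would echo the 4-byte sequence number back, at which point the sender calls ack and the packet leaves the pending map.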

Related

Should I use RTP or WebRTC for local network audio communication

I have a set of Raspberry Pi Zeros that I would like to use as a home intercom. I initially set them up to send audio to each other using Go with gRPC and bidirectional streaming, which works for short calls, but the lag builds up over time, so I think I need to switch to a real-time protocol like RTP or WebRTC.

Since I already know the IP address of each device, the hardware and supported codecs are the same on each, and they are all on the same network, is there any advantage to using WebRTC over plain RTP? My understanding is that WebRTC mainly provides some additional security and connection orchestration like ICE and SDP, which I wouldn't necessarily need. I am trying to minimize resource usage, since these devices are not as powerful as a phone or desktop. If I do use WebRTC, I can do the SDP signaling with gRPC or some other direct delivery method.

Since there are more than 2 devices, I'm also curious about multicast functionality, which seems pure-RTP specific: WebRTC (which uses RTP) doesn't necessarily support multicasting, and a full mesh would require n(n-1)/2 p2p connections. I'm very unclear/unsure about this point.
Also, does either support mixing audio channels natively, or would that need to be handled in the custom software?
You could use WebRTC, but you'd need to rig a signalling server, and a STUN / TURN server. These can be super simple and low capacity because everything is on a private network, but you still need 'em. The signalling server handles the necessary SDP interchange. Going full WebRTC might be overengineering this. (But of course learning to get WebRTC working can be useful.)
You've built out a golang infrastructure. Seeing as how you're on a private network, you could change up that program to send multicast UDP packets or RTP packets. Then you can rig your listeners to listen to them.
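
For instance, a sender loop in Go could look something like this; the group address, port, and frame size are assumptions for illustration:

    // Send audio frames to a multicast group over UDP.
    package main

    import (
        "log"
        "net"
        "time"
    )

    func main() {
        // 239.0.0.0/8 is the administratively scoped (private) multicast range.
        group, err := net.ResolveUDPAddr("udp", "239.0.0.1:5004")
        if err != nil {
            log.Fatal(err)
        }
        conn, err := net.DialUDP("udp", nil, group)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        frame := make([]byte, 320) // e.g. 20 ms of 8 kHz 16-bit mono PCM
        for i := 0; i < 50; i++ {
            if _, err := conn.Write(frame); err != nil {
                log.Fatal(err)
            }
            time.Sleep(20 * time.Millisecond)
        }
    }

Each listener can then join the group with net.ListenMulticastUDP and read frames off the resulting connection.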
No matter what you do, you'll need to deal with the lag. A good way to do it in the packet world: don't build a queue of buffers ready to play. Instead, always put each received packet as the next-to-play packet, even if you have to overwrite a previously received packet. (That is, skip ahead.) You may get a pop once in a while, but with reasonably short packets, under 50ms, it shouldn't affect the user experience significantly. And the lag won't build up.
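
A minimal sketch of that single-slot, skip-ahead playout in Go, with stand-ins for the network receiver and the audio-device callback:

    // Always overwrite the next-to-play frame so lag can never accumulate.
    package main

    import (
        "sync"
        "time"
    )

    type playoutSlot struct {
        mu    sync.Mutex
        frame []byte // next frame to play, or nil
    }

    // put installs the newest frame, discarding any frame not yet played.
    func (p *playoutSlot) put(frame []byte) {
        p.mu.Lock()
        p.frame = frame // overwrite: skip ahead instead of queueing
        p.mu.Unlock()
    }

    // take removes the pending frame (nil means play silence).
    func (p *playoutSlot) take() []byte {
        p.mu.Lock()
        f := p.frame
        p.frame = nil
        p.mu.Unlock()
        return f
    }

    func main() {
        slot := &playoutSlot{}
        go func() { // stand-in for the network receiver
            for {
                slot.put(make([]byte, 320))
                time.Sleep(18 * time.Millisecond) // slightly fast sender clock
            }
        }()
        for i := 0; i < 100; i++ { // stand-in for the audio device callback
            _ = slot.take() // hand the frame (or silence) to the DAC
            time.Sleep(20 * time.Millisecond)
        }
    }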
The oldtimey phone system ran on a continent-wide 8 kHz synchronous clock, so lag was not an issue. But it's always a problem when the audio analog-to-digital and digital-to-analog clocks aren't synchronized, and that's true whenever they are on different devices: the slightest drift builds up over time. (RPis don't have fifty-dollar low-drift clock parts in them.)
If all your audio sources run at the same sample rate, you can average them to mix them. That should get you started. (If you're using WebRTC in a browser, it will mix multiple sources for you. )
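
A sketch of that averaging approach in Go, for equal-length frames of 16-bit PCM; averaging (rather than summing) cannot clip, at the cost of making each source quieter as sources are added:

    // Mix same-rate PCM frames by averaging corresponding samples.
    package main

    import "fmt"

    func mix(sources ...[]int16) []int16 {
        if len(sources) == 0 {
            return nil
        }
        out := make([]int16, len(sources[0]))
        for i := range out {
            var sum int32 // int32 sum cannot overflow for realistic source counts
            for _, src := range sources {
                sum += int32(src[i])
            }
            out[i] = int16(sum / int32(len(sources)))
        }
        return out
    }

    func main() {
        a := []int16{1000, -2000, 3000}
        b := []int16{3000, 2000, -1000}
        fmt.Println(mix(a, b)) // [2000 0 1000]
    }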
Since you are using Go, check out offline-browser-communication. This removes the need for signaling and STUN/TURN: it uses mDNS and pre-generated certificates. It is also being discussed in the WICG Discourse; no idea if/when it will land.
'Lag' is a pretty common problem to have when doing media over TCP. You have lots of queues and congestion control you are dealing with. WebRTC (and RTP in general) is great at solving this. You have the following standardized things to solve it.
RTP packets carry a relative timestamp.
RTP Sender Reports carry a mapping from that relative timestamp to an NTP timestamp. Use this for sync/timing.
RTP Receiver Reports give you packet loss and jitter. Use these to assess your network health.
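
For reference, here is a sketch in Go of the interarrival jitter estimator that RFC 3550 defines for those receiver reports; the sample arrival times are made up:

    // RFC 3550 interarrival jitter: J += (|D| - J) / 16, where
    // D = (Rj - Ri) - (Sj - Si) in RTP timestamp units.
    package main

    import "fmt"

    type jitterEstimator struct {
        prevArrival uint32 // arrival time in RTP timestamp units
        prevRTPTime uint32 // RTP timestamp from the packet header
        jitter      float64
        primed      bool
    }

    // observe folds one packet into the running estimate.
    func (e *jitterEstimator) observe(arrival, rtpTime uint32) {
        if e.primed {
            d := float64(int32(arrival-e.prevArrival)) - float64(int32(rtpTime-e.prevRTPTime))
            if d < 0 {
                d = -d
            }
            e.jitter += (d - e.jitter) / 16
        }
        e.prevArrival, e.prevRTPTime, e.primed = arrival, rtpTime, true
    }

    func main() {
        e := &jitterEstimator{}
        // Arrivals drift around the 160-tick (20 ms at 8 kHz) packet spacing.
        for i, arrival := range []uint32{0, 165, 318, 480} {
            e.observe(arrival, uint32(i)*160)
        }
        fmt.Printf("jitter ~ %.2f timestamp units\n", e.jitter)
    }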
Multicast is a fantastic suggestion as well. You reduce the complexity of having to signal all those 1:1 connections, and reduce the amount of bandwidth required. It does make security a little bit more delicate/roll your own though.
With Pion we decoupled all the RTP/RTCP stuff into Pion Interceptor, so you don't have to use the full WebRTC stack to get the media-transport things mentioned above.

Does it make sense to use the RTP protocol for multiple streamers and a single receiver?

I am in the process of learning and trying to use the RTP/RTCP protocol. My situation is that there are 1 to n streamers and 1 (or potentially 1 to m, if needed) receiver(s), but the streamers themselves do not know about each other (they cannot, for technical reasons such as different networks, limited bandwidth, etc.). So it is more like multiple unicast sessions, except that the receiver knows about them all and collects data from all of them; it is just the senders that do not know about each other.
Now, reading about the protocol, it seems to me that a huge portion of it is devoted to sending feedback, collision detection, and so on. So I have doubts: is RTP really applicable in this case? Is it already used in this way somewhere?
It still seems beneficial to collect the statistics about the data transfer that RTP provides (data sent, loss, timing, etc.); it just feels like most of the protocol is left unused...
I also have one additional question. Going through the various RTP libraries, they all assume that the sender will open ports for receiving RTP/RTCP data as well. Does RTP forbid one-way communication, i.e. an application that only streams data and does not expect to receive anything back? The libraries (e.g. ccRTP) seem to assume two-way communication only...
RTCP is the companion protocol that provides statistics. The stream receiver (client) sends stats to the sender (server) via RTCP Receiver Reports; the sender in turn emits periodic Sender Reports carrying its own packet counts and timestamp mappings.
There's nothing wrong with a single client receiving multiple unicast sessions from various servers.
The two-way traffic during setup really belongs to the session-control protocol (typically RTSP) rather than RTP itself. Once setup is complete and the PLAY command has been sent, the session is mostly one-way. The exception is the "keep alive" packets that must be sent to the server periodically (usually every 60 seconds or so) to keep the stream going; the exact timeout value is sent to the client during setup.
But if you implement your own RTP stack, nothing stops the server from sending the stream continuously without any feedback from the client. That is effectively an infinite timeout value.
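
To make the one-way idea concrete, here is a sketch in Go of a sender that builds the fixed 12-byte RTP header by hand and never listens for anything back. The destination address, the SSRC, and payload type 0 (G.711 mu-law) are illustrative:

    // A one-way RTP sender: stream datagrams, expect nothing in return.
    package main

    import (
        "encoding/binary"
        "log"
        "net"
        "time"
    )

    // rtpPacket builds a 12-byte RTP header (RFC 3550) followed by payload.
    func rtpPacket(seq uint16, ts, ssrc uint32, payloadType byte, payload []byte) []byte {
        pkt := make([]byte, 12+len(payload))
        pkt[0] = 2 << 6 // version 2, no padding/extension/CSRC
        pkt[1] = payloadType & 0x7f
        binary.BigEndian.PutUint16(pkt[2:], seq)
        binary.BigEndian.PutUint32(pkt[4:], ts)
        binary.BigEndian.PutUint32(pkt[8:], ssrc)
        copy(pkt[12:], payload)
        return pkt
    }

    func main() {
        raddr, err := net.ResolveUDPAddr("udp", "192.0.2.10:5004") // receiver
        if err != nil {
            log.Fatal(err)
        }
        conn, err := net.DialUDP("udp", nil, raddr)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        const ssrc = 0x1234abcd
        seq, ts := uint16(0), uint32(0)
        frame := make([]byte, 160) // 20 ms of 8 kHz G.711
        for i := 0; i < 100; i++ {
            conn.Write(rtpPacket(seq, ts, ssrc, 0, frame))
            seq++
            ts += 160 // the timestamp advances in sample units
            time.Sleep(20 * time.Millisecond)
        }
    }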
You can read about all the details in the spec: RTP: A Transport Protocol for Real-Time Applications

Packet drop notification in Layer-2

Is there a way to get, in user space, a notification that a packet was dropped at Layer 2 in 802.11?
My understanding is that when a packet is sent out on the medium, Layer-2 ACKs are received if it is delivered correctly (if not, the sender retransmits and ultimately drops the packet after several retries).
I want to be able to access this notification in user space and change the behavior of packet transmission.
Specifically, rather than dropping the packet, I want to send it to another host from the FIB.
I have read about the libpcap libraries and netfilter hooks, which allow me to capture packets and inject them back into the networking stack.
But I have not been able to find hooks (if any exist for the wireless stack) that would let me capture this Layer-2 notification.
Please correct me if I'm not understanding something correctly. Also, any heads-up or links to read would be great.
No, you cannot, at least not using the standardised sockets interfaces. 802.11 is a link layer, and the sockets API is strictly link-layer agnostic: unless it's going to work on all link layers, it's not in sockets. There are good reasons for that: the kind of cross-layer interaction that you envision has been tried many times, and it's always turned out more trouble than it's worth.
You didn't give us any details about the application — but the best solution is most probably to change your application-layer protocol to send explicit acknowledgments, and send your data over the fallback route when you fail to receive an ACK.
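
A sketch of that application-layer scheme in Go: send over the primary route, and fail over when no acknowledgment arrives before a deadline. The addresses, the 200 ms deadline, and the ack format are all assumptions:

    // Application-layer ACKs with failover to a fallback route.
    package main

    import (
        "fmt"
        "log"
        "net"
        "time"
    )

    // sendWithFallback tries each route in turn until one acknowledges.
    func sendWithFallback(payload []byte, routes ...string) error {
        for _, addr := range routes {
            conn, err := net.Dial("udp", addr)
            if err != nil {
                return err
            }
            conn.Write(payload)
            conn.SetReadDeadline(time.Now().Add(200 * time.Millisecond))
            ack := make([]byte, 16)
            _, err = conn.Read(ack) // any reply counts as an ack here
            conn.Close()
            if err == nil {
                return nil // acknowledged
            }
            log.Printf("no ack from %s, trying next route", addr)
        }
        return fmt.Errorf("no route acknowledged the packet")
    }

    func main() {
        err := sendWithFallback([]byte("hello"), "192.0.2.1:9000", "192.0.2.2:9000")
        if err != nil {
            log.Fatal(err)
        }
    }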

What are the required mechanisms for a reliable layer over UDP?

I've been working on writing my own networking engine for my own game development side projects. This requires the options of having unreliable, reliable, and ordered reliable messages. I have not, however, been able to identify all of the mechanisms necessary for reliable and ordered reliable protocols.
What are the required mechanisms for a reliable layer over UDP? Additional details are appreciated.
So far, I gather that these are requirements:
Acknowledge received messages with a sequence number.
Resend unacknowledged messages after a retransmission time expires.
Track round trip times for each destination in order to calculate an appropriate retransmission time.
Identify and remove duplicate packets.
Handle sequence numbers wrapping around when they overflow.
This has influenced my architecture to have: reliable message headers with sequences and timestamps; acknowledgment messages that echo a received sequence and timestamp; a system for tracking appropriate retransmission times per destination address; and a thread that (a) receives messages and queues them for user receipt, (b) acknowledges reliable messages, and (c) retransmits unacknowledged messages whose retransmission timers have expired.
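
As one possible shape for that retransmission-time tracking, here is a sketch in Go of a per-destination estimator in the style of the TCP rules from RFC 6298; the 200 ms floor is an arbitrary choice:

    // Smoothed RTT plus variance, as in RFC 6298: RTO = SRTT + 4*RTTVAR.
    package main

    import (
        "fmt"
        "time"
    )

    type rtoEstimator struct {
        srtt, rttvar time.Duration
        primed       bool
    }

    // observe folds one measured round trip time into the estimate.
    func (e *rtoEstimator) observe(rtt time.Duration) {
        if !e.primed {
            e.srtt, e.rttvar, e.primed = rtt, rtt/2, true
            return
        }
        diff := e.srtt - rtt
        if diff < 0 {
            diff = -diff
        }
        e.rttvar += (diff - e.rttvar) / 4 // beta = 1/4, using the old SRTT
        e.srtt += (rtt - e.srtt) / 8      // alpha = 1/8
    }

    // rto returns the timeout to use for the next retransmission.
    func (e *rtoEstimator) rto() time.Duration {
        rto := e.srtt + 4*e.rttvar
        if rto < 200*time.Millisecond { // clamp to a sane floor
            rto = 200 * time.Millisecond
        }
        return rto
    }

    func main() {
        e := &rtoEstimator{}
        for _, ms := range []int{40, 55, 45, 90} {
            e.observe(time.Duration(ms) * time.Millisecond)
        }
        fmt.Println("next RTO:", e.rto())
    }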
NOTE:
Reliable UDP is not the same as TCP. Even ordered reliable UDP is not the same as TCP. I am not secretly unaware that I really want TCP. Also, before someone plays the semantics games, yes... reliable UDP is an "oxymoron". This is a layer over UDP that makes for reliable delivery.
You might like to take a look at the answers to this question: What do you use when you need reliable UDP?
I'd add 'flow control' to your list. You want to be able to control the amount of data you're sending on a particular link depending on the round trip times you're getting, or you'll flood the link and just be throwing datagrams away.
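
The simplest form of this is a cap on the number of unacknowledged datagrams in flight; a toy sketch in Go, with an arbitrary window of 4:

    // Window-based flow control: stop sending when the window is full.
    package main

    import "fmt"

    type window struct {
        inFlight, limit int
    }

    func (w *window) canSend() bool { return w.inFlight < w.limit }
    func (w *window) onSend()       { w.inFlight++ }
    func (w *window) onAck()        { w.inFlight-- }

    func main() {
        w := &window{limit: 4}
        for i := 0; i < 6; i++ {
            if w.canSend() {
                w.onSend()
                fmt.Println("send", i)
            } else {
                fmt.Println("defer", i, "(window full)") // wait for onAck
            }
        }
    }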
Note that depending on the overall protocol, it might be possible to dispense with retransmission timers. See, for example, the Quake 3 network protocol.
In Q3 reliable packets are simply sent until an ack is seen.
Why are you trying to re-invent TCP? It provides all of the features you originally stated, and has been shown to work well.
EDIT - Since your comments show that you have additional requirements not originally stated, you should consider whether a hybrid model using multiple sockets would be better than trying to fulfill all of those criteria in a single application-layer protocol.
Actually it seems that what you really need is SCTP.
SCTP supports:
message based (rather than byte stream) transmissions
multiple streams over a single network socket
ordered or unordered receipt of packets
... message ordering is optional in SCTP; a receiving application may choose to process messages in the order they are received instead of the order they were sent

How wrong is it to modify the SDP body of a SIP message?

A requirement for the SIP PBX I created for my company was to record all calls passing through it. I solved it by forcing all SIP messages to pass through the PBX and modifying the SDP body so the media stream also passes through it and gets recorded. It works well.
I recently found out that this is not allowed.
Is there any other way to implement call recording and how "wrong" is this in regard to the protocol?
It sounds like you're describing a SIP proxy, more or less a Session Border Controller (SBC). A proxy can modify SDP, though it should be careful in doing so. Typically SBCs will set the media destination to themselves, and proxy the data to the destination. So this is perfectly legal spec-wise (assuming the devices are already coming to your server).
However, "Not allowed" could also mean "recording calls is legally not allowed", which varies a lot state-to-state.
A more conventional way to implement call recording would be to capture the RTP packets on the wire and reassemble them into an audio file. There are quite a few tools around that do exactly that, and it's even built into Wireshark.
As far as tweaking with the SDP goes it's definitely not something that is "not allowed" at least not on a technical level. A lot of SIP Proxy's are forced to mangle the IP addresses in SDP when user agents put private IP addresses in them. You'll find most SIP servers out there have some sort of capability in this regard and it's often call NAT mangling or similar.