What's the difference between SIP/XMPP for web conferencing and file-sharing? - xmpp

I want to set up a personal videoconferencing service for my family, friends and myself. The main problem I have with current options is that they are either closed-source and centralized (Google Hangouts, Skype) or open-source but unusable in corporate environments or hotels (due to strict firewall rules and the "Skype gets through, so use that if you want VoIP" kind of netadmin reaction).
I see two options. Either set up a STUN/TURN relay server and keep using XMPP and SIP clients as I used to, but that would require my friends to set that up too. Or set up a whole VoIP server. Two candidates come to mind: SIP and XMPP. To my knowledge, though, both of them ultimately carry media over (S)RTP/RTCP.
And that's the problem. Apart from the signaling part specific to each, I really can't figure out the difference between them or their typical use cases.

I think you're right that, as far as setting up a video conferencing system goes, XMPP and SIP are equivalent. They are both signalling-only protocols, and the media sessions they set up typically use RTP (they can both be used to set up any kind of session you want, but RTP is the norm).
The biggest problem is going to be the one you mention: getting video streams out through a corporate firewall. Skype overcomes this obstacle by sending its media over an SSL connection and is thus able to get through firewalls. Theoretically you could do the same with RTP, and in the past I used OpenVPN connections with a SIP client to test some audio calls. My experience wasn't great, as the audio was very choppy, presumably as a result of all the extra encapsulation required to get the high volume of small audio packets from one end to the other. That was nearly a decade ago though, so perhaps with the better CPU and bandwidth resources available now it would work better.
Personally I think I'd stick with Skype, as it's going to be a big hassle to set up your own system. If you were to go ahead with your own, the first option I would try would be Asterisk combined with OpenVPN, so that clients behind a firewall or with NAT issues could connect over the VPN.

Related

Should I use RTP or WebRTC for local network audio communication

I have a set of Raspberry Pi Zeros that I would like to use as a home intercom. I initially set them up to send audio to each other using golang with gRPC and bidirectional streaming, which works for short calls, but the lag builds up over time, so I think I need to switch to a real-time protocol like RTP or WebRTC. Since I already know the IP address of each device, and the hardware/supported codecs for each are the same, and they are all on the same network, is there any advantage to using WebRTC over using plain RTP? My understanding is that WebRTC mainly provides some additional security and connection orchestration like ICE and SDP, which I wouldn't necessarily need. I am trying to minimize resource usage since these devices are not as powerful as a phone or desktop. If I do use WebRTC, I can do the SDP signaling with gRPC or some other direct delivery method. Since there are more than 2 devices, I'm also curious about multicast functionality, which seems pure-RTP specific, while WebRTC (which uses RTP) doesn't necessarily support multicasting and would require a full mesh of p2p connections (n-1 per device, n(n-1)/2 in total). I'm very unclear/unsure about this point.
Also, does either support mixing audio channels natively, or would that need to be handled in the custom software?
You could use WebRTC, but you'd need to rig a signalling server, and a STUN / TURN server. These can be super simple and low capacity because everything is on a private network, but you still need 'em. The signalling server handles the necessary SDP interchange. Going full WebRTC might be overengineering this. (But of course learning to get WebRTC working can be useful.)
You've built out a golang infrastructure. Seeing as how you're on a private network, you could change up that program to send multicast UDP packets or RTP packets. Then you can rig your listeners to listen to them.
No matter what you do, you'll need to deal with the lag. A good way to do it in the packet world: don't build a queue of buffers ready to play. Instead, always put each received packet as the next-to-play packet, even if you have to overwrite a previously received packet. (That is, skip ahead.) You may get a pop once in a while, but with reasonably short packets, under 50ms, it shouldn't affect the user experience significantly. And the lag won't build up.
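The "skip ahead" idea above can be sketched as a one-slot buffer. This is only an illustration in Python, not code from the original Go program; the class name `LatestPacketSlot` and its `dropped` counter are made up for the example.

```python
import threading


class LatestPacketSlot:
    """Holds at most one pending audio packet.

    A newly received packet overwrites any packet that has not been
    played yet, so no queue (and no lag) can build up.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._packet = None
        self.dropped = 0  # packets overwritten before they were played

    def put(self, packet: bytes) -> None:
        """Called by the network receive loop for every incoming packet."""
        with self._lock:
            if self._packet is not None:
                self.dropped += 1  # skip ahead: the older packet is lost
            self._packet = packet

    def take(self):
        """Called by the audio playback loop; returns None if nothing is pending."""
        with self._lock:
            packet, self._packet = self._packet, None
            return packet
```

The playback loop would call `take()` once per packet interval and play silence when it returns `None`; a steadily growing `dropped` count is a hint that the sender's clock runs faster than the receiver's.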
The oldtimey phone system ran on a continent-wide 8 kHz synchronous clock. So lag was not an issue. But it's always a problem when audio analog-to-digital and digital-to-analog clocks aren't synchronized. That's true whenever they are on different devices. The slightest drift builds up over time. (RPis don't have fifty-dollar clock parts in them with guaranteed low drift.)
If all your audio sources run at the same sample rate, you can average them to mix them. That should get you started. (If you're using WebRTC in a browser, it will mix multiple sources for you.)
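A minimal sketch of mixing by averaging, assuming every source delivers equal-length frames of signed 16-bit PCM at the same sample rate (the function name `mix_average` is invented for this example):

```python
import array


def mix_average(sources):
    """Mix equal-length frames of signed 16-bit PCM samples by averaging.

    Averaging (rather than summing) keeps every result inside the
    16-bit range, so no clipping logic is needed.
    """
    if not sources:
        return array.array("h")
    length = len(sources[0])
    if any(len(frame) != length for frame in sources):
        raise ValueError("all frames must have the same number of samples")
    count = len(sources)
    # zip(*sources) yields one tuple per sample position across all sources.
    return array.array("h", (sum(samples) // count for samples in zip(*sources)))
```

Averaging lowers the overall volume when only one person is talking, so a real intercom might prefer summing with clipping or some gain control; that is a design choice beyond this sketch.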
Since you are using Go, check out offline-browser-communication. This removes the need for signaling and STUN/TURN. It uses mDNS and pre-generated certificates. It is also being discussed in the WICG Discourse; no idea if or when it will land.
'Lag' is a pretty common problem to have when doing media over TCP. You have lots of queues and congestion control to deal with. WebRTC (and RTP in general) is great at solving this. You have the following standardized tools to solve it:
RTP packets have the relative timestamp
RTP Sender reports have a mapping of relative to NTP timestamp. Use this for sync/timing.
RTP Receiver reports give you packet loss/jitter. Use this to assert your network health.
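To make the timestamp point concrete, here is a small Python sketch that reads the 12-byte fixed RTP header (layout per RFC 3550) and converts a timestamp difference into seconds. The helper names are made up, and the 8000 Hz clock rate in the usage note is just an assumed example for narrowband audio.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (CSRC list and extensions are ignored)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,            # top two bits, 2 for current RTP
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,           # in units of the media clock
        "ssrc": ssrc,
    }


def timestamp_delta_seconds(earlier: int, later: int, clock_rate: int) -> float:
    """Media time between two RTP timestamps, allowing for 32-bit wraparound."""
    return ((later - earlier) % (1 << 32)) / clock_rate
```

For example, two packets whose timestamps differ by 160 on an 8000 Hz clock are 20 ms apart in media time, regardless of when they actually arrived.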
Multicast is a fantastic suggestion as well. You reduce the complexity of having to signal all those 1:1 connections, and reduce the amount of bandwidth required. It does make security a little bit more delicate/roll your own though.
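For reference, joining and sending to a multicast group only takes a few socket options. The sketch below is Python rather than Go, and the group address `239.1.2.3` and port `5004` are arbitrary assumptions, not values from the question.

```python
import socket
import struct

GROUP = "239.1.2.3"  # assumed address from the administratively scoped range
PORT = 5004          # assumed port


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure: group address followed by local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Socket that receives every datagram sent to the group on this port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def open_sender(ttl: int = 1) -> socket.socket:
    """Socket for sending to the group; TTL 1 keeps packets on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

A sender would then call `open_sender().sendto(packet, (GROUP, PORT))` once per audio packet, and every device that called `open_receiver()` gets a copy.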
With Pion we decoupled all the RTP/RTCP stuff into Pion Interceptor. So you don't have to use the full WebRTC stack to get the media transport things mentioned above.

Connect sockets directly after introduction through server

I'm looking for the name of a protocol and example code that permits handing off IP/port connections to establish unmediated P2P after introduction through a server.
Simple example:
You and I both start chat programs that connect to chatintroduce.com (fictional server). I send you a "Hi! Wanna chat?" message. It doesn't get sent. Instead my chat program tells chatintroduce to send your chat program a request for connection. You respond to a prompt and your chat program tells chatintroduce to broker the connection. Chatintroduce establishes an initial two-way connection between us. Now, this final step is important, chatintroduce releases control and our two chat programs now talk directly to each other without any traffic through chatintroduce.
In other words, I construct packets which have your IP address and you receive them without interference from firewalls, NATs or any other technologies. In other words, true peer-to-peer connection independent of intermediate server.
I need to know what search terms to use to find appropriate technology. An RFC name would suffice. I've been searching for days without success.
I think what you are looking for is TCP/UDP hole punching, which typically coordinates the P2P connection using a STUN server to determine the "capabilities" of the firewalls (e.g. is it a full cone NAT? symmetric?).
https://en.wikipedia.org/wiki/Hole_punching_(networking)
We employed this at a company I worked for to create a kind of BitTorrent that could circumvent firewalls for streaming video between two peers.
Note that sometimes it is NOT possible to establish a connection without the intermediary.
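The core of UDP hole punching is that both peers send to each other's public address at roughly the same time, so each NAT sees outbound traffic and lets the replies in. A rough Python sketch follows; it assumes the introduction server has already told each side the other's public IP and port, and the function name `punch` and the probe payload are invented for the example.

```python
import socket


def punch(sock: socket.socket, peer_addr, attempts: int = 5, timeout: float = 0.2) -> bool:
    """Send probes to the peer until one of the peer's probes arrives.

    sock must be the same UDP socket whose public mapping was reported
    to the introduction server, otherwise the NAT mapping will not match.
    """
    sock.settimeout(timeout)
    for _ in range(attempts):
        sock.sendto(b"punch", peer_addr)  # opens or refreshes our own NAT mapping
        try:
            _, addr = sock.recvfrom(1024)
        except OSError:
            continue  # timeout or transient network error, try again
        if addr == peer_addr:
            return True  # the peer's packet got through, the path is open
    return False
```

When `punch` returns False on both sides, the NATs in between are probably not compatible with hole punching and the traffic has to go through a relay.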
What you are looking for is the ICE protocol, RFC 5245 (since obsoleted by RFC 8445). This protocol is used for connecting two peers through NAT traversal. There are some open source libraries and also some proprietary libraries for this. You can search Google for "ICE implementation".
You will also need to read about some additional protocols that are used together with ICE: STUN and TURN.
In some cases you can't establish a P2P call, so you will never reach a 100% P2P rate and will have to use a relay server. For example, when the NAT combination of the two peers is symmetric vs. symmetric or symmetric vs. port-restricted cone (PRC). That relay server is called a TURN server.
Techniques like port forwarding and TCP/UDP hole punching will help you increase P2P rates.
See this answer for more information about which combination of NAT will require a relay server and which don't.
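As a rule of thumb only (real NATs do not always fit these textbook categories), the combinations mentioned above can be written down as a small lookup. The NAT type strings and the function name `needs_relay` are invented for this sketch.

```python
# NAT pairings for which hole punching is generally expected to fail,
# per the combinations named above: symmetric vs. symmetric and
# symmetric vs. port-restricted cone.
RELAY_PAIRS = {
    frozenset(["symmetric"]),
    frozenset(["symmetric", "port_restricted_cone"]),
}


def needs_relay(nat_a: str, nat_b: str) -> bool:
    """Return True when the two NAT types usually require a TURN relay."""
    return frozenset([nat_a, nat_b]) in RELAY_PAIRS
```

In practice ICE does not classify NATs up front; it simply tries the direct candidates and falls back to the TURN candidate if none of them work.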
Thank you. I will be looking further into ICE, STUN, TURN, and hole-punching.
I also found n2n which looks like almost exactly what I wanted.
https://github.com/meyerd/n2n
http://xmodulo.com/configure-peer-to-peer-vpn-linux.html
With n2n, one makes a VPN with a super node that all other edge nodes know.
But once the introductions are made, the super node can be absent.
This was exactly what I wanted. I hope it works across platforms (linux, MacOS, Windows).
Again, I am still researching before implementation, so your advice was very important to me.
Thank you.
Use PJNATH. It's open source.
http://www.pjsip.org/pjnath/docs/html/
There is not much open source software for NAT traversal. As far as I know, PJNATH is good.
For the server you can use Google's open source STUN and TURN server.

Sending data to 2 servers at once

Let's say I have a server for broadcasting a character's position in an online game and a server for voice chatting with a group of players in the same game. Would the client be able to send data to both servers? Would they connect? Would this work? Would it be practical?
Short answer: yes.
Long answer: That's actually pretty common. Your browser does it all the time, connecting to different servers using different protocols (HTTP and HTTPS, at least), for example. Most mail clients support an even wider range of protocols (POP3, IMAP, SMTP, sometimes NNTP and encrypted variants thereof) and can easily handle multiple servers in parallel. A networked game client works, on the network protocol level, exactly like those, so there's nothing out-of-ordinary nor impractical in doing so.
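A small Python sketch of one client talking to two servers at once. Both "servers" here are toy TCP echo servers on localhost standing in for the position and voice servers, and all names are made up for the example; a real game would more likely use UDP for voice.

```python
import socket
import socketserver
import threading


class TaggedEcho(socketserver.BaseRequestHandler):
    """Replies with the server's tag followed by whatever the client sent."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(self.server.tag + data)


def start_server(tag: bytes) -> socketserver.ThreadingTCPServer:
    """Start a toy server on a free localhost port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), TaggedEcho)
    server.tag = tag
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def talk_to_both(position_addr, voice_addr):
    """Hold two independent connections open at once and use both."""
    with socket.create_connection(position_addr) as pos, \
            socket.create_connection(voice_addr) as voice:
        pos.sendall(b"x=1,y=2")
        voice.sendall(b"audio-frame")
        return pos.recv(1024), voice.recv(1024)
```

Each connection is just a separate socket, so the two servers never need to know about each other.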

using XMPP or WebSocket, why there is a server needed in real-time communication between users?

At the bottom, it's all about socket communication. If there is some way to get the ip of the both users, why can't the connection be set up directly between the users instead of having to go through a server in the middle?
My 2 cents:
No one out there forces us to have a server-based real-time communication model. In fact, XMPP has an extension called "Serverless Messaging" (XEP-0174) which defines how to communicate over local or wide-area networks using the principles of zero-configuration networking for endpoint discovery and the syntax of XML streams and XMPP messaging for real-time communication. This method uses DNS-based Service Discovery and Multicast DNS to discover entities that support the protocol, including their IP addresses and preferred ports.
P2P chat applications have been around for over a decade now. Having a server in the middle is purely a decision that depends on your application's needs. If your application can live with chats getting lost while the user is transitioning between online and offline status, then you can very well have a direct P2P model going. On the other hand, there are loads of advantages (contact list management, avatars, entity discovery, presence authorization, offline messages, ...) to choosing a server-based messaging model. If you try to have all this right inside your P2P-based clients, they might die or under-perform because of all the work they will need to do by themselves.
"WebSockets" were not designed for P2P/serverless communication; rather, they were designed to provide standardized push semantics over the stateless HTTP protocol. In short, "WebSockets" is a standardized replacement for the hacky Comet, long-polling, chunked-encoding, JSONP, iframe-based and various other techniques developers have been using to simulate server push over HTTP.
Named WebSockets (if someday it is fully and widely supported) could be the solution.
http://namedwebsockets.github.io/spec/
Named WebSockets are useful in a variety of collaborative local device and local network scenarios: Discover matching peer services on the local device and/or the local network.
Direct communication between users is possible in Peer To Peer (P2P) networks. In P2P each participant can act as client as well as server. But for P2P networks you need to write a separate program to make the communication possible.
Web Sockets let you leverage existing common browsers as clients. All depends on what is the purpose of your application and how you want to deploy it.
If there is some way to get the ip of the both users
You nailed the answer right in your question.
Most machines I use have an IP address of 192.168.0.10 (or similar from the 192.168. private range) and are deep, deep behind several layers of NAT. With the free IPv4 address pool exhausted and IPv6 nowhere in sight, this is the reality most users live in. Having a stable intermediary with a known, routable address helps a ton in working around this issue.
WebSockets don't allow the socket to listen for connections, only to connect as a client to a server (not reverse). Technically they could make it allow this, but as far as I understand the spec doesn't currently (nor is it expected to) allow listen functionality for WebSockets.
The new WebRTC (http://www.webrtc.org/) spec looks like it might support peer-to-peer connections. I have not played with WebRTC at all so I'm not in a position to comment on it. I think it would be a bit more involved than WebSocket stuff. Maybe someone who knows WebRTC better can chime in. (Also apart from the latest version of Chrome I'm not sure if any of the other browsers really support WebRTC yet).

What types of apps are developed today using socket programming?

I've worked in business application development for a while but have never done socket programming. I know that all HTTP transport implicitly involves socket communication but this is all abstracted when using most software frameworks. So I was curious what types of apps developed today involve socket programming?
Any kind of proprietary communication protocol running over UDP or TCP would fit this description. We have a handful of applications that communicate with embedded systems using TCP and UDP, all using specialized protocols.
An application involving networking or network protocols could involve socket programming. This would mean UDP, TCP, peer-to-peer, etc.
Financial companies, especially ones in algorithmic trading area, rely on TCP/IP heavily.
That ranges from third party communication products like Tibco to FIX over TCP sockets to in-house frameworks over UDP/multicast.
Here's what I developed in my own spare time (took me 2 years actually):
(1) program I called "big chief"
(2) program I called "the manager"
Here's how it works :
First launch the managers on every machine that is configured for that.
Once launched, the big chief asks for dlls to create a list of sites to "suck".
It cuts them into "packets" and sends each packet to a "(2) manager".
Each manager has a pool of "workers" (threads). As soon as it gets the list, it activates each thread with one URL to "suck". After some time, once the whole list is done, the manager makes a big "results" packet, then sends it back to the "big chief".
It can go much further than "simply" sucking URLs. (You can define a whole "path" with GETs and POSTs, and the cookies follow the path, which means stuff like "go to xx.com, simulate clicking the valid button, then go to xx.com/valid.php (with all the cookies and so on set), then simulate something else".)
Yep, it's a mini-google.
I used TCP for "big chief" and "manager" communication with my own protocol and compression before sending.
One of its powerful features is that you can extend it very easily. I've used my PC for the "big chief" and 6 other Internet connections for the managers (including a huge one from my old school). I am able to add as many "managers" as I want :).
PS: Why am I talking about that? Because I'm proud of it and it's not used at all. It's on my computer, I've sucked a site that is hard to ... suck (pbase.com) and they've probably seen incoming connections from the States, China, and so on (whereas I'm in France) (yep, it does support public proxies as well)... I'm so proud of a product that is not used at all...