Multimedia Streaming: Users connecting with multiple tabs - streaming

We're writing a streaming service using Wowza as our streaming server with a Flash applet on the client. A recent concern of ours is a single client accessing the same stream from multiple tabs. In this scenario, we want to save bandwidth by sending the client only one copy of the stream.
How is this problem typically handled?

I think this is not possible, for multiple technical reasons. (This is my personal opinion, so experts can correct me if I am wrong.)
The Flash players in multiple tabs are not aware of each other, and they can't be associated with each other on the Wowza Streaming Engine side either. Wowza would see two different clients coming from the same IP, but those might just as well be two different devices behind the same router.
Browser tabs are usually isolated from each other in browser implementations for security and stability reasons, which rules out cross-tab data sharing.
The two players are not in perfect sync: they started at different points in time, and lag and buffering can introduce further time shift.
I haven't seen any attempt to reduce bandwidth in this scenario. There are solutions to prevent users from playing multiple streams in parallel, though; I'd rather not link them here, but search for "wowza forbidding concurrent stream" and you'll find them.
I think what you are thinking of is multicast streaming, where network packets are broadcast over the network and every player receives the same stream. This is used in IPTV systems, but it is not possible over the public internet, only on an intranet.
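For illustration, here is a minimal Go sketch of what "joining" a multicast stream looks like on the receiver side (the group address and port are made-up examples). All a receiver does is join a group address; the network, not the server, fans out one copy of each packet to every member, which is also why this only works on networks that route multicast, i.e. typically an intranet:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Made-up example group in the administratively scoped
	// multicast range (239.0.0.0/8).
	group := &net.UDPAddr{IP: net.ParseIP("239.0.0.1"), Port: 5004}

	// Joining the group is all a receiver has to do; the network
	// delivers one copy of each packet to every member.
	conn, err := net.ListenMulticastUDP("udp4", nil, group)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	buf := make([]byte, 1500)
	for {
		n, src, err := conn.ReadFromUDP(buf)
		if err != nil {
			panic(err)
		}
		fmt.Printf("got %d bytes from %v\n", n, src)
	}
}
```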

Related

Should I use RTP or WebRTC for local network audio communication

I have a set of Raspberry Pi Zeros that I would like to use as a home intercom. I initially set them up to send audio to each other using golang with gRPC and bidirectional streaming, which works for short calls, but the lag builds up over time, so I think I need to switch to a real-time protocol like RTP or WebRTC. Since I already know the IP address of each device, the hardware/supported codecs are the same for all of them, and they are all on the same network, is there any advantage to using WebRTC over plain RTP? My understanding is that WebRTC mainly adds security and connection orchestration such as ICE and SDP, which I wouldn't necessarily need. I am trying to minimize resource usage, since these devices are not as powerful as a phone or desktop. If I do use WebRTC, I can do the SDP signaling with gRPC or some other direct delivery method.

Since there are more than 2 devices, I'm also curious about multicast functionality, which seems specific to plain RTP; WebRTC (which uses RTP) doesn't necessarily support multicasting and would require a full mesh of n(n-1)/2 peer-to-peer connections for n devices. I'm very unclear/unsure about this point.
Also, does either support mixing audio channels natively, or would that need to be handled in the custom software?
You could use WebRTC, but you'd need to rig up a signalling server and a STUN/TURN server. These can be super simple and low-capacity because everything is on a private network, but you still need 'em. The signalling server handles the necessary SDP interchange. Going full WebRTC might be overengineering this. (But of course, learning to get WebRTC working can be useful.)
You've built out a golang infrastructure. Seeing as you're on a private network, you could change that program to send multicast UDP packets or RTP packets, and then rig your listeners to listen for them.
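A minimal sketch of that sending side in Go, under the stated assumptions; the group address and port are made up, and the payload stands in for an encoded audio frame:

```go
package main

import (
	"net"
	"time"
)

func main() {
	// Made-up multicast group; receivers join the same address.
	conn, err := net.Dial("udp4", "239.0.0.1:5004")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Send an audio-frame-sized packet every 20 ms; in the real
	// program this would be an encoded frame or an RTP packet.
	for range time.Tick(20 * time.Millisecond) {
		if _, err := conn.Write([]byte("audio frame")); err != nil {
			panic(err)
		}
	}
}
```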
No matter what you do, you'll need to deal with the lag. A good way to do it in the packet world: don't build a queue of buffers ready to play. Instead, always put each received packet as the next-to-play packet, even if you have to overwrite a previously received packet. (That is, skip ahead.) You may get a pop once in a while, but with reasonably short packets, under 50ms, it shouldn't affect the user experience significantly. And the lag won't build up.
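A minimal sketch of that skip-ahead idea in Go, with hypothetical names: the receive loop calls Put for every packet that arrives, and the playout callback calls Next once per interval. Because the slot holds at most one packet, playback can never fall more than one packet behind the network:

```go
package main

import (
	"fmt"
	"sync"
)

// latestBuf is a one-slot "queue": a newly received packet always
// replaces whatever is still waiting to be played, so playback
// skips ahead instead of accumulating lag.
type latestBuf struct {
	mu   sync.Mutex
	pkt  []byte
	full bool
}

// Put makes p the next-to-play packet, overwriting an unplayed one.
func (b *latestBuf) Put(p []byte) {
	b.mu.Lock()
	b.pkt, b.full = p, true
	b.mu.Unlock()
}

// Next hands the pending packet to the playout callback, if any.
func (b *latestBuf) Next() ([]byte, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.full {
		return nil, false
	}
	b.full = false
	return b.pkt, true
}

func main() {
	var b latestBuf
	b.Put([]byte("frame 1"))
	b.Put([]byte("frame 2")) // overwrites frame 1: we skip ahead
	if p, ok := b.Next(); ok {
		fmt.Println(string(p)) // prints "frame 2"
	}
}
```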
The old-timey phone system ran on a continent-wide 8 kHz synchronous clock, so lag was not an issue there. But it's always a problem when the audio analog-to-digital and digital-to-analog clocks aren't synchronized, which is the case whenever they sit on different devices. The slightest drift builds up over time. (RPis don't have fifty-dollar clock parts with guaranteed low drift in them.)
If all your audio sources run at the same sample rate, you can average them to mix them. That should get you started. (If you're using WebRTC in a browser, it will mix multiple sources for you.)
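A sketch of that averaging approach in Go, assuming equal-length buffers of 16-bit PCM at the same sample rate (the function name is made up). Averaging, unlike plain summing, cannot clip, at the cost of each source getting quieter as more sources join:

```go
package main

import "fmt"

// mix averages several equal-rate PCM buffers sample by sample.
func mix(sources ...[]int16) []int16 {
	if len(sources) == 0 {
		return nil
	}
	out := make([]int16, len(sources[0]))
	for i := range out {
		var sum int32 // wider type so the sum cannot overflow
		for _, s := range sources {
			sum += int32(s[i])
		}
		out[i] = int16(sum / int32(len(sources)))
	}
	return out
}

func main() {
	a := []int16{1000, -2000, 3000}
	b := []int16{3000, 2000, -1000}
	fmt.Println(mix(a, b)) // [2000 0 1000]
}
```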
Since you are using Go, check out offline-browser-communication. This removes the need for signaling and STUN/TURN; it uses mDNS and pre-generated certificates. It is also being discussed in the WICG Discourse, though I have no idea if/when it will land.
'Lag' is a pretty common problem when doing media over TCP: there are lots of queues and congestion control to deal with. WebRTC (and RTP in general) is great at solving this. You have the following standardized tools to solve it:
RTP packets carry a relative timestamp.
RTP Sender Reports provide a mapping from that relative timestamp to an NTP timestamp; use this for sync/timing.
RTP Receiver Reports give you packet loss/jitter; use these to assess your network health.
Multicast is a fantastic suggestion as well. You reduce the complexity of having to signal all those 1:1 connections, and you reduce the amount of bandwidth required. It does make security a bit more delicate/roll-your-own, though.
With Pion we decoupled all the RTP/RTCP stuff into Pion Interceptor, so you don't have to use the full WebRTC stack to get the media-transport features mentioned above.
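As a small illustration of the first point above, reading the relative timestamp out of a received packet with the standalone github.com/pion/rtp package might look like this (a sketch; the buffer would normally come from a UDP read):

```go
package main

import (
	"fmt"

	"github.com/pion/rtp"
)

func main() {
	// Placeholder: in a real program this is filled by a UDP read.
	buf := []byte{}

	var pkt rtp.Packet
	if err := pkt.Unmarshal(buf); err != nil {
		fmt.Println("not an RTP packet:", err)
		return
	}

	// The relative timestamp and sequence number mentioned above
	// live in the fixed RTP header.
	fmt.Println("seq:", pkt.SequenceNumber, "ts:", pkt.Timestamp)
}
```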

What's the difference between SIP/XMPP for web conferencing and file-sharing?

I want to set up a personal videoconferencing service for my family, friends and myself. The main problem I have with the current options is that they are either closed-source and centralized (Google Hangouts, Skype) or open-source but not working in corporate environments or hotels (due to strict firewalling rules and the "Skype gets through, so if you want VoIP use that" kind of netadmin reaction).
I see two solutions, then: either set up a STUN/TURN relay server and use XMPP and SIP as I used to (but that would require my friends to set that up too), or set up a whole VoIP server. Two options come to mind, SIP and XMPP, though to my knowledge each of them ultimately uses the (S)RTP/RTCP protocol.
And that's the problem: apart from the specific signaling part each of them uses, I really can't figure out the difference between them or their typical use cases.
I think you're right in that, as far as setting up a videoconferencing system goes, XMPP and SIP are equivalent. They are both signalling-only protocols, and the media sessions they set up typically use RTP (both can be used to set up any kind of session you want, but RTP is the norm).
The biggest problem is also going to be the one you mention: getting video streams out of a corporate firewall. Skype overcomes this obstacle by sending its media over an SSL connection and is thus able to get through firewalls. Theoretically you could do the same with RTP; in the past I used OpenVPN connections with a SIP client to test some audio calls. My experience wasn't great, as the audio was very choppy, presumably because of all the extra packaging required to get the high volume of small audio packets from one end to the other. That was nearly a decade ago, though, so perhaps with the better CPU and bandwidth resources available now it would work better.
Personally, I think I'd stick with Skype, as it's going to be a big hassle to set up your own system. If you do go ahead with your own, the first option I would try is Asterisk combined with OpenVPN, so that clients behind a firewall or with NAT issues could connect over the VPN.

RTP/RTSP start-up latency: Would this method help to reduce it, and if so, why don't we have it?

This is probably not the best forum for such a specialized question, but at the moment I don't know of a better one (open to suggestions/recommendations).
I work on a video product which, for the last 10+ years, has been using a proprietary communications protocol (DCOM-based) to send video across the network. A while ago we recognized the need to standardize, and we are currently almost at the point of ripping out all that DCOM baggage and replacing it with a fully compliant RTP/RTSP client/server framework.
One thing we noticed during testing over the last few months is that switching the client to RTP/RTSP brings a noticeable increase in start-up latency. The problem is not our implementation; it's RTSP itself.
BEFORE (DCOM): we would send one DCOM command, and before that command even returned to the client, the server would already be sending video -- total latency: 1 RTT.
NOW (RTSP): the sequence of commands, each one a separate network round trip, is DESCRIBE, SETUP, SETUP, PLAY (assuming the session has audio and video) -- a total of 4 RTTs.
It works as designed; unfortunately, it feels like a step backwards, because the prior user experience was actually better.
Can this be improved? If you stay within the standard, the short answer is no. However, my team fully controls our entire RTP/RTSP stack, and I've been thinking we could introduce a new RTSP command (without touching any of the existing commands, so we remain fully interoperable) as a solution: DESCRIBE_SETUP_PLAY.
We could send this one command, passing in the types of streams we are interested in (typically one video stream and 0..1 audio streams). The response would include the full SDP text as well as all the port information, and, just like before, the server would start streaming instantly without waiting for anything else from the client.
Would this work? Is there any downside that I may not be seeing? I'm curious why this wasn't considered for (or was dropped from) the official spec, since the latency is definitely noticeable even on a local intranet.
FYI, it is possible according to the RTSP 1.0 specification:
9.1 Pipelining
A client that supports persistent connections or connectionless mode
MAY "pipeline" its requests (i.e., send multiple requests without
waiting for each response). A server MUST send its responses to those
requests in the same order that the requests were received.
The RTSP 2.0 draft also contains support for pipelining.
However, none of the clients/servers I've used implements it, AFAIK.
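To make the idea concrete, here is a rough Go sketch of what pipelining looks like on the wire: all requests are written back-to-back, so the exchange costs roughly one RTT instead of four. The server name, track URL, transport parameters and session ID are made up, and the session ID is in fact the catch: a client normally learns it from the SETUP response, which hints at why real implementations rarely bother:

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	// Hypothetical server and stream URL, for illustration only.
	conn, err := net.Dial("tcp", "camera.example.com:554")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	url := "rtsp://camera.example.com/stream"

	// Pipelining per RFC 2326 section 9.1: send every request
	// without waiting for the previous response.
	reqs := fmt.Sprintf(
		"DESCRIBE %s RTSP/1.0\r\nCSeq: 1\r\n\r\n"+
			"SETUP %s/track1 RTSP/1.0\r\nCSeq: 2\r\n"+
			"Transport: RTP/AVP;unicast;client_port=8000-8001\r\n\r\n"+
			"PLAY %s RTSP/1.0\r\nCSeq: 3\r\nSession: 12345\r\n\r\n",
		url, url, url)
	if _, err := conn.Write([]byte(reqs)); err != nil {
		panic(err)
	}

	// Responses must come back in the order the requests were
	// sent; for the sketch we just dump whatever the server says.
	r := bufio.NewReader(conn)
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return
		}
		fmt.Print(line)
	}
}
```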

How to sync application state across multiple iPhones on the same network?

I am developing an iPhone application that basically allows the user to click through a series of actions. These series are predefined and synced with a common configuration server.
The app might be running on multiple devices at the same time. All devices are assumed to have the same series of actions defined on them. All devices are considered equal; there is no server with multiple clients or anything like that.
Only one of these devices is used by a person at any given time; it is, however, possible that the person switches to a different device at any moment. All "passive" devices need to be synchronized with the active one, so that they display the same action.
The whole thing should happen as automatically as possible: no selection of devices, no configuration; all devices on the same network take part in the same series of actions.
One additional requirement is that a device could join during a presentation (a series of actions) and needs to jump to the currently active action.
Right now, I see two options to implement the networking/communication part of that:
Bonjour. I have implemented a working prototype that can automatically connect with one (1) other device on the network and communicate with it. I am not sure at this point how much additional work the "multiple devices" requirement involves. Would I have to open a connection to every device and manually send the sync events to all of them? Is there a better way, or does Bonjour provide anything to help me with that? What does Bonjour give me, considering that I want to communicate with every device on the network anyway?
Multicast with AsyncUdpSocket. Simply define a port and send multicast sync events out to it. I guess the main issue compared to using Bonjour with TCP would be that delivery is not reliable and packets could be lost. This is, however, a private, protected WLAN with low traffic, so I doubt that would really be an issue. Are there other disadvantages I'm not seeing? Because this sounds like the relatively easy option at this point... (a sketch of this option follows below)
Which one would you suggest? Or is there another, better alternative that I'm not thinking of?
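For what it's worth (not iOS-specific), the multicast option from the question boils down to very little code. A sketch of the idea in Go (the concept carries over directly to AsyncUdpSocket): the active device periodically rebroadcasts the current action index, so a lost packet is harmless and a device joining mid-presentation catches up on the next announcement. The group address, port and message format are all made up:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"time"
)

// Made-up group address and port for illustration.
var group = &net.UDPAddr{IP: net.ParseIP("239.0.0.42"), Port: 9999}

// announce rebroadcasts the current action index once per second,
// so late joiners and devices that missed a packet catch up.
func announce(action func() uint32) {
	conn, err := net.DialUDP("udp4", nil, group)
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 4)
	for range time.Tick(time.Second) {
		binary.BigEndian.PutUint32(buf, action())
		conn.Write(buf)
	}
}

// follow applies each announced action, skipping repeats.
func follow(apply func(uint32)) {
	conn, err := net.ListenMulticastUDP("udp4", nil, group)
	if err != nil {
		panic(err)
	}
	var last uint32
	buf := make([]byte, 4)
	for {
		if n, _, err := conn.ReadFromUDP(buf); err == nil && n == 4 {
			if a := binary.BigEndian.Uint32(buf); a != last {
				last = a
				apply(a)
			}
		}
	}
}

func main() {
	cur := uint32(7) // pretend the user is on action 7
	go announce(func() uint32 { return cur })
	follow(func(a uint32) { fmt.Println("jump to action", a) })
}
```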
You should check out GameKit (built into iOS); it has a lot of the machinery you need in a convenient package. You can easily discover peers on the network and easily send data back and forth between clients (broadcast or peer-to-peer).
In my experience, Bonjour is perfect for what you want. There's an excellent tutorial with associated source code, Chatty, that can easily be modified to suit your purposes.
I cobbled together a distributed message bus for the iPhone (no centralized server) that would work great for this. It should be noted that the UI guy made a mess of the code, so thar' be dragons there: https://code.google.com/p/iphonebusmiddleware/
The basic idea is to use Bonjour to form a network with leader election. The leader becomes the hub through which all the slaves subscribe to topics of interest. Any message sent to a given topic is then delivered to every node subscribed to that topic. A master disconnection simply means restarting the leader election process.

On-demand video streaming

I'm currently researching different streaming methods, both for live and on-demand streaming.
I've read about both multicast and unicast, and now I have the following question, which I cannot find an answer to.
"Is it possible to make on-demand streaming with multicast?"
The way I understand it, when using multicast the media server creates a single stream of the video, which is played only once, and users can connect to it and watch.
Is it because multicast only allows live streaming? If not, can someone please explain to me how it works?
"Is it possible to make on-demand streaming with multicast?"
Technically, yes. Practically, no.
The way I understand it, when using multicast the media server creates a single stream of the video, which is played only once, and users can connect to it and watch.
You understand it correctly. And that is that.
Well, you can do it, but the bigger question is: why would you want to?
On-demand means that the broadcast starts at the moment a single viewer wants to see that particular piece of content. If a single user chooses both the content and the time it starts, why would you want to multicast it?
Yes, it can be done, but there are caveats. If you take a flight on an old plane, you may see an old entertainment system that offers, say, 20 channels with a movie on each. The channels are all rolling, and once the programmes have finished they restart. This is better than having just one channel broadcast on a projector, as it gives the users a choice of what to watch, but it doesn't give them the freedom of when to watch.
Modern flight entertainment systems are all on-demand: every passenger can watch any film at any time. So how can multicast help there? If you detect that multiple users are watching the same film (the caveat being: at the same point in time), you can replace the per-user streams with a single multicast channel. That is technically savvy, but you have to ask why you would do it. It only makes sense if the communication medium is unreliable or lacks the capacity to serve every user simultaneously.
Designing a flight entertainment system that does not scale to every passenger actually using it would be short-sighted, so the system must handle the worst case of one stream per user anyway, meaning there is no benefit in multicasting anything.
Some cable/satellite networks implement multicast streaming and use time windows to group as many viewers together as possible. For example, a viewer may wait up to 5 minutes for a video to start while the infamous word "buffering" is displayed.
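The batching arithmetic behind such a window is trivial; a sketch in Go with a made-up window length:

```go
package main

import (
	"fmt"
	"time"
)

// windowStart returns when a request arriving at t actually starts
// playing if viewers are batched into 5-minute multicast windows:
// everyone who asks within the same window shares one stream.
func windowStart(t time.Time) time.Time {
	const window = 5 * time.Minute
	return t.Truncate(window).Add(window)
}

func main() {
	req := time.Date(2024, 1, 1, 20, 3, 12, 0, time.UTC)
	fmt.Println(windowStart(req)) // 20:05, after ~2 min of "buffering"
}
```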