What is the difference between RTP and RTSP in a streaming server?

I'm thinking about developing a streaming server, and I have the following question: should I serve over RTSP (example URL: rtsp://192.168.0.184/myvideo.mpg) or RTP (example URL: rtp://192.168.0.184)?
As I understand it, an RTSP server is mainly used for streaming files that already exist, i.e. not live content, while an RTP server is used to broadcast. Am I right, or can somebody correct me?
What I want is to develop a server that broadcasts the computer screen live, that is, the screen is streamed at the same time it is being displayed.

You are getting something wrong... RTSP is a real-time streaming protocol, meaning you can stream whatever you want in real time. So you can use it to stream LIVE content (no matter what it is: video, audio, text, a presentation...). RTP is a transport protocol used to carry the media data that is negotiated over RTSP.
You use RTSP to control media transmission over RTP: to set up, play, pause, and tear down the stream.
So, if you want your server to just start streaming when the URL is requested, you can implement some sort of RTP-only server. But if you want more control, and if you are streaming live video, you should use RTSP, because it transmits the SDP and other data the client needs in order to decode the stream.
Read the RTSP and RTP RFCs (RFC 2326 and RFC 3550); they are a good starting point.
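To make the division of labor concrete, here is a minimal sketch of the client side of that control channel, assuming the server from the question speaks plain RTSP on the default port 554. Only control messages travel over this TCP connection; the media itself would arrive separately over RTP sessions negotiated by subsequent SETUP requests.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	// 554 is the default RTSP port; address and path are from the question.
	conn, err := net.Dial("tcp", "192.168.0.184:554")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// RTSP is text-based and HTTP-like; CSeq numbers each request.
	fmt.Fprint(conn, "DESCRIBE rtsp://192.168.0.184/myvideo.mpg RTSP/1.0\r\n"+
		"CSeq: 1\r\n"+
		"Accept: application/sdp\r\n\r\n")

	// The reply carries the SDP describing the RTP streams on offer;
	// SETUP and PLAY requests would follow on this same control socket.
	sc := bufio.NewScanner(conn)
	for sc.Scan() {
		fmt.Println(sc.Text())
	}
}
```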

AFAIK, RTSP does not transmit streams at all; it is just an out-of-band control protocol with functions like PLAY and STOP.
Raw UDP or RTP over UDP are transmission protocols for streams, just like raw TCP or HTTP over TCP.
To be able to stream a given program over a given transmission protocol, an encapsulation method has to be defined for your container format. For example, the TS container can be transmitted over UDP, but Matroska cannot (see the sketch below).
Pretty much everything can be transported through TCP, though.
(Which codec you use also matters, indirectly, because it restricts the container formats you can use.)
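As a concrete illustration of TS-over-UDP encapsulation, here is a hedged sketch: MPEG-TS packets are a fixed 188 bytes, and senders commonly bundle seven of them per UDP datagram to stay under a typical 1500-byte MTU. The file name and multicast address are assumptions.

```go
package main

import (
	"net"
	"os"
)

// MPEG-TS packets are a fixed 188 bytes, which is what makes the UDP
// encapsulation trivial; 7 of them fit in one MTU-sized datagram.
const tsPacket = 188

func main() {
	conn, err := net.Dial("udp", "239.0.0.1:1234") // assumed multicast group:port
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	f, err := os.Open("myvideo.ts") // assumed pre-muxed TS file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 7*tsPacket)
	for {
		n, err := f.Read(buf)
		if n > 0 {
			conn.Write(buf[:n]) // one datagram per bundle of TS packets
		}
		if err != nil {
			break // EOF: done (a real sender would pace writes against the PCR clock)
		}
	}
}
```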

Some basics:
An RTSP server can be used for a pre-recorded ("dead") source as well as for a live source. The RTSP protocol provides you with commands (like a VCR remote), and the exact functionality depends on your implementation.
RTP is the real-time protocol used for transporting audio and video. The transport can be unicast, multicast, or broadcast, depending on the transport address and port. Besides moving the media, RTP does a lot for you: packetization, sequence numbering for reordering, timestamps for jitter control and lip sync, and (together with RTCP) QoS feedback. A sketch of the packetization part follows below.
In your case, since you want a broadcasting streaming server, you need both RTSP (for control) and RTP (for carrying the audio and video).
To start with, you can go through the sample code provided by live555.
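For a feel of what the packetization step involves, this sketch lays out the fixed 12-byte RTP header from RFC 3550 by hand. Payload type 0 (PCMU audio) and the SSRC value are arbitrary choices for illustration; a real server would track the sequence number and timestamp per outgoing packet.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buildRTPHeader lays out the fixed 12-byte header from RFC 3550.
func buildRTPHeader(seq uint16, timestamp, ssrc uint32) []byte {
	h := make([]byte, 12)
	h[0] = 2 << 6                                 // version 2, no padding, no extension, no CSRCs
	h[1] = 0                                      // marker 0, payload type 0 (PCMU)
	binary.BigEndian.PutUint16(h[2:4], seq)       // sequence number: lets the receiver reorder
	binary.BigEndian.PutUint32(h[4:8], timestamp) // media clock: drives jitter buffer and lip sync
	binary.BigEndian.PutUint32(h[8:12], ssrc)     // identifies this stream within the session
	return h
}

func main() {
	// 8 kHz PCMU advances the timestamp by 160 per 20 ms packet.
	pkt := append(buildRTPHeader(1, 160, 0xDEADBEEF), make([]byte, 160)...)
	fmt.Printf("first RTP packet header: % x\n", pkt[:12])
}
```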

I hear your pain. I'm going through this right now (years later).
From what I've learned, you can think of RTSP as a "VCR controller": the protocol lets you specify which streams (presentations) you want to play; the server then sends you a description of the media, and you can use RTSP to play, stop, pause, and record the remote stream. The media itself goes over RTP. RTSP is normally implemented over a different socket or communication layer; although it is simply a protocol, most often it's implemented by a server over a socket.

For live streams, the RTSP URL you request is simply the name of a stream. It doesn't need to refer to a file on the server: the server's RTSP implementation can parse that name, put together a live graph, and then provide the SDP (description) for that stream name, roughly like the sample below. This is, of course, specific to how the RTSP server has been implemented. For live streams it may be simpler to just use RTP, but then you'll need some other way to transfer the SDP from the RTP server to the client that wants to play the stream.
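For reference, the SDP handed back for such a live stream name looks roughly like the constant below: the m= line announces one H.264 video stream over RTP with dynamic payload type 96. Every concrete value here is invented for illustration.

```go
package main

import "fmt"

// Roughly what an RTSP DESCRIBE for a live stream name might return.
const exampleSDP = `v=0
o=- 0 0 IN IP4 192.168.0.184
s=live-screen
c=IN IP4 192.168.0.184
t=0 0
m=video 5004 RTP/AVP 96
a=rtpmap:96 H264/90000`

func main() { fmt.Println(exampleSDP) }
```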

I think that's correct. RTSP may use RTP internally.

RTSP is widely used with IP cameras: the camera runs an RTSP server, so a user can play (pull) the RTSP stream straight from the camera. It is a low-cost solution because no central media server is needed (think about thousands of camera streams). The architecture is:
IP Camera (RTSP server) ----RTSP(pull)---> Player (User Agent)
The RTSP protocol actually comprises:
Signaling over TCP, on port 554, used to exchange the SDP (also used in WebRTC) and negotiate media capabilities.
UDP/TCP streams over several ports, generally two: one for RTCP and one for RTP (both also used in WebRTC).
Compare WebRTC, which is now available in HTML5:
Signaling over HTTP/WebSocket, or exchanged via any other protocol, used to exchange the SDP.
UDP streams (RTP/RTCP) over one or more ports, generally bound to a single port, to make cloud load balancers happy.
In protocol terms, RTSP and WebRTC are similar, but the usage scenarios are very different. Grossly simplified (the details are off topic here): WebRTC is designed for web conferencing, while RTSP is used in IP camera systems.
So both RTSP and WebRTC are solutions and protocols, used in different scenarios, while RTP is a transport protocol, one that WebRTC itself also uses for live streaming.
Note: RTSP is not playable in HTML5 or for internet live streaming directly, but we can convert it with FFmpeg and a gateway server, as sketched below.
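As a sketch of such a gateway (assuming ffmpeg is installed; the camera URL is made up), one process can pull the camera's RTSP feed and remux it, without re-encoding, into HLS segments that any web server can hand to browsers:

```go
package main

import (
	"log"
	"os/exec"
)

// Sketch of an RTSP-to-HLS gateway step: ffmpeg pulls the camera's RTSP
// stream and remuxes it (no re-encode) into an HLS playlist plus segments.
func main() {
	cmd := exec.Command("ffmpeg",
		"-rtsp_transport", "tcp", // interleave RTP over the RTSP TCP connection
		"-i", "rtsp://192.168.0.184/stream", // assumed camera URL
		"-c", "copy", // copy codecs; remux only
		"-f", "hls", "live.m3u8") // HLS output for browser playback
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```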

RTP is the transport protocol for real-time data. It provides timestamps, sequence numbers, and other means to handle the timing issues of real-time data transport.
RTSP is a control protocol that initiates and directs delivery of streaming multimedia data from media servers. It is the "Internet VCR remote control protocol." Its role is to provide the remote control; the actual data delivery is done separately, most likely by RTP.
Also, RTCP is the control part of RTP; it helps with quality of service and membership management.
These three related protocols are used for real-time multimedia data over the Internet. Read the excellent full documentation at this link: RTP, RTCP & RTSP

RTSP (or really the RTP session it sets up) can be used for streaming video, but also many other types of media, including live presentations. RTSP is just the protocol used to set up the RTP session.
For all the details you can check out my open source RTSP Server implementation on the following address: https://net7mma.codeplex.com/
Or my article # http://www.codeproject.com/Articles/507218/Managed-Media-Aggregation-using-Rtsp-and-Rtp
It supports re-sourcing streams as well as dynamic creation of streams; various RFCs are implemented, and the library achieves better performance and lower memory use than FFmpeg and just about any other solution in the transport layer, which makes it a good candidate for a centralized point of access in most scenarios.

Related

Should I use RTP or WebRTC for local network audio communication

I have a set of Raspberry Pi Zeros that I would like to use as a home intercom. I initially set them up to send audio to each other using golang with gRPC and bidirectional streaming, which works for short calls, but the lag builds up over time, so I think I need to switch to a real-time protocol like RTP or WebRTC.

Since I already know the IP address of each device, the hardware/supported codecs are the same for each, and they are all on the same network, is there any advantage to using WebRTC over plain RTP? My understanding is that WebRTC mainly adds some security and connection orchestration like ICE and SDP, which I wouldn't necessarily need. I am trying to minimize resource usage, since these devices are not as powerful as a phone or desktop. If I do use WebRTC, I can do the SDP signaling with gRPC or some other direct delivery method.

Since there are more than 2 devices, I'm also curious about multicast functionality, which seems to be pure-RTP specific; WebRTC (which uses RTP) doesn't necessarily support multicasting and would require n·(n-1)/2 p2p connections for a full mesh. I'm very unclear/unsure about this point.

Also, does either support mixing audio channels natively, or would that need to be handled in the custom software?
You could use WebRTC, but you'd need to rig up a signalling server and a STUN / TURN server. These can be super simple and low-capacity because everything is on a private network, but you still need 'em. The signalling server handles the necessary SDP interchange. Going full WebRTC might be over-engineering this. (But of course, learning to get WebRTC working can be useful.)
You've built out a golang infrastructure. Seeing as you're on a private network, you could change that program to send multicast UDP packets or RTP packets, then rig your listeners to listen for them.
No matter what you do, you'll need to deal with the lag. A good way to do it in the packet world: don't build up a queue of buffers ready to play. Instead, always put each received packet in the next-to-play slot, even if you have to overwrite a previously received packet. (That is, skip ahead; see the sketch after this answer.) You may get a pop once in a while, but with reasonably short packets, under 50 ms, it shouldn't affect the user experience significantly, and the lag won't build up.
The old-timey phone system ran on a continent-wide 8 kHz synchronous clock, so lag was not an issue. But lag is always a problem when the audio analog-to-digital and digital-to-analog clocks aren't synchronized, which is true whenever they are on different devices. The slightest drift builds up over time. (RPis don't have fifty-dollar low-drift clock parts in them.)
If all your audio sources run at the same sample rate, you can average the samples to mix them. That should get you started. (If you're using WebRTC in a browser, it will mix multiple sources for you.)
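A minimal sketch of that skip-ahead idea in Go (the channel wiring and packet size are invented for illustration): keep a one-slot mailbox instead of a growing queue, so a stale packet gets overwritten rather than delaying everything behind it.

```go
package main

// One-slot "mailbox" instead of a queue: the newest packet always wins,
// so playout lag cannot accumulate.
func receiver(in <-chan []byte, next chan []byte) {
	for pkt := range in {
		select {
		case next <- pkt: // slot was empty: packet waits to be played
		default:
			<-next      // slot full: drop the stale packet...
			next <- pkt // ...and skip ahead to the newest one
		}
	}
}

func main() {
	in := make(chan []byte)
	next := make(chan []byte, 1) // capacity 1 = the single playout slot
	go receiver(in, next)

	for i := 0; i < 3; i++ {
		in <- make([]byte, 800) // ~50 ms of 8 kHz 16-bit mono per packet
	}
	// A player goroutine would <-next each packet and write it to the DAC.
}
```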
Since you are using Go, check out offline-browser-communication. This removes the need for signaling and STUN/TURN; it uses mDNS and pre-generated certificates. It is also being discussed in the WICG Discourse, though I have no idea if/when it will land.
'Lag' is a pretty common problem when doing media over TCP: you are fighting lots of queues and congestion control. WebRTC (and RTP in general) is great at solving this. You get the following standardized tools:
RTP packets carry a relative timestamp.
RTP sender reports map that relative timestamp to an NTP timestamp. Use this for sync/timing.
RTP receiver reports give you packet loss/jitter. Use these to assess your network health.
Multicast is a fantastic suggestion as well. You avoid the complexity of signaling all those 1:1 connections, and you reduce the bandwidth required. It does make security a bit more delicate/roll-your-own, though.
With Pion we decoupled all the RTP/RTCP machinery into Pion Interceptor, so you don't have to use the full WebRTC stack to get the media-transport features mentioned above; a sketch of reading those fields follows below.
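To connect those three bullets to code, here is a hedged sketch using the pion/rtp and pion/rtcp packages (a reasonable guess at the idiomatic calls; the packet buffers are assumed to come from your UDP sockets, and error handling is trimmed):

```go
package main

import (
	"fmt"

	"github.com/pion/rtcp"
	"github.com/pion/rtp"
)

// inspect pulls out the fields the bullets above refer to.
func inspect(rtpBuf, rtcpBuf []byte) {
	var pkt rtp.Packet
	if err := pkt.Unmarshal(rtpBuf); err == nil {
		// Relative media timestamp: drives playout scheduling and jitter buffering.
		fmt.Println("seq", pkt.SequenceNumber, "ts", pkt.Timestamp, "ssrc", pkt.SSRC)
	}

	if pkts, err := rtcp.Unmarshal(rtcpBuf); err == nil {
		for _, p := range pkts {
			switch r := p.(type) {
			case *rtcp.SenderReport:
				// Maps the RTP clock to NTP wall time: use for sync/timing.
				fmt.Println("SR ntp", r.NTPTime, "rtp", r.RTPTime)
			case *rtcp.ReceiverReport:
				for _, rep := range r.Reports {
					// Loss and jitter per source: use to assess network health.
					fmt.Println("lost", rep.TotalLost, "jitter", rep.Jitter)
				}
			}
		}
	}
}

func main() { inspect(nil, nil) } // wire real packet buffers in here
```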

Playing a streamed data from the server on a client software

I'm trying to plan a learning curve for a nodeJS module that streams all my output sounds to a server. Say I have a nodeJS module that forwards all outgoing sounds and music as packets to the server's port 8000. How can I connect some client's mp3 player to play back the streamed audio from the server? I mean, the buffer that is sent is just raw, messy bits. How do I make the audio player on the client recognize the format, connect to the stream, forward the packets to the player, and so on?
You need to open up a resource, meaning a file-like answer to the client's request, and transmit into it chunks of data from your original media resource according to the ranges the request asks for. The request asks for data at a certain offset (in an additional header field) while it downloads the resource, and you constantly fill that resource with data so that it always has something to serve. A sketch of this range handling is below.
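In Go terms (the idea carries over to Node), the standard library already implements that range dance: http.ServeContent parses the Range header and answers with just the requested byte ranges, which is what lets a player seek within a file it hasn't fully downloaded. The file name is an assumption; port 8000 is from the question.

```go
package main

import (
	"log"
	"net/http"
	"os"
)

// Range-based media serving: http.ServeContent honours Range request
// headers, so a player can ask for "bytes at xyz" and seek freely.
func main() {
	http.HandleFunc("/stream", func(w http.ResponseWriter, r *http.Request) {
		f, err := os.Open("output.mp3") // assumed media file
		if err != nil {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		defer f.Close()
		st, _ := f.Stat()
		w.Header().Set("Content-Type", "audio/mpeg")
		http.ServeContent(w, r, st.Name(), st.ModTime(), f)
	})
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```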
This is a rather complex topic but there are many examples and documentation already available.
From a quick search, these came up:
node (socket) live audio stream / broadcast
https://www.npmjs.com/package/audio-stream
By the way, I'm not an expert, but if you want to do audio streaming, MP3 is probably not the right choice; you may benefit from converting it to an intermediate streaming format first.

Audio + visual through TCP/IP

I'm in the process of making an application similar to Skype to interact with another computer, and I have a few questions.
I know the basics, such as how to send data over TCP in the form of an image or audio.
How do applications like Skype send live audio? Do they literally record a small chunk of audio, send it, play it, and then repeat the process? For me it's not instant, so I don't see how that would be possible.
How would you send a string and an image through TCP at the same time (video call + chat)? Would you use multiple ports? I can see how that would be very bad. The way I'm doing it at the moment: when I click to receive an image, I set the socket up to receive an image so it is received properly; if a string got sent at that time, for example, it wouldn't work, as it can't be converted to an image, if you see what I'm saying. I'm not sure how else I would do it. I could send each thing with its type at the beginning, for example "string Hello how are you", and then decipher the data type from that, but that seems a bit tedious and slow.
If anyone could give me some insight, that would be fantastic.
I can't speak for how Skype does it, but this should be a starting point:
Streaming audio/video is usually transported over UDP sockets, not TCP. TCP guarantees delivery, whereas UDP is best-effort. If you have a temporary connection loss, you care more that the video you're receiving is current than that you receive the whole stream.
The data is usually compressed (and sometimes encrypted) using a standard compression algorithm after being captured from the camera/microphone. Have a look at H.264, which is commonly used to compress video.
RTP is often used to transmit the audio/video. It allows multiple types of stream to be combined over a single socket.
Control traffic is usually sent separately over a different socket, usually TCP. For example, SIP, which is used by VoIP phones, initiates a control connection over a TCP or UDP port (usually 5060). The two ends then negotiate which types of stream will be supported and how those streams will be sent; for SIP, this will be an RTP stream set up on a different UDP port. As for your chat-plus-image question, simple type-and-length framing over one TCP socket works well; see the sketch below.
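On that last point: rather than one port per data type, the usual trick is to prefix every message with its type and length, so chat text and image frames can share one TCP socket, and it is neither tedious nor slow in practice. A minimal sketch (the type codes are invented):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Message types on the shared socket (arbitrary codes for illustration).
const (
	msgChat  byte = 1
	msgImage byte = 2
)

// writeFrame prefixes the payload with a 1-byte type and 4-byte length,
// so the receiver always knows what is coming and how much to read.
func writeFrame(w io.Writer, typ byte, payload []byte) error {
	hdr := make([]byte, 5)
	hdr[0] = typ
	binary.BigEndian.PutUint32(hdr[1:], uint32(len(payload)))
	if _, err := w.Write(hdr); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads exactly one framed message back off the stream.
func readFrame(r io.Reader) (byte, []byte, error) {
	hdr := make([]byte, 5)
	if _, err := io.ReadFull(r, hdr); err != nil {
		return 0, nil, err
	}
	payload := make([]byte, binary.BigEndian.Uint32(hdr[1:]))
	_, err := io.ReadFull(r, payload)
	return hdr[0], payload, err
}

func main() {
	var conn bytes.Buffer // stands in for the TCP connection
	writeFrame(&conn, msgChat, []byte("Hello how are you"))
	typ, payload, _ := readFrame(&conn)
	fmt.Println(typ, string(payload)) // 1 Hello how are you
}
```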

RTP Multiplexing or Mixing?

I'm designing a real-time voice comm system that I want to use RTP with. Here's my general requirements:
Every user streams one audio stream to the server
The incoming stream may be compressed differently, depending on the source (a SIP trunk, an Android phone, a desktop client, etc.)
Users can pick which streams they want to receive
If the users had unlimited bandwidth and there weren't a limited number of ports, I would just have each of them open an RTP stream with the server for every stream they wanted to receive. However, a lot of the users will be on a 3G or 2G network, so my question is: how can I bundle the streams (chosen by the user) into a single RTP stream?
One option I've seen is multiplexing the streams into a single packet, but as far as I can tell that actually goes against the RFC (there are, however, working drafts for multiplexing).
Another option would be to mix the audio together into one packet. Is that the recommended way to do this? I would have to normalize all of the chosen streams into one format first.
I'm very new to the whole VoIP/streaming media thing, so this may be a poor question.
I'm guessing that you want to use TCP in order not to lose data.
In that case you need to use RTP over TCP; see the RFC on framing RTP over connection-oriented transport (RFC 4571).
This allows sending several RTP streams over a single socket, with a unique ID (the SSRC) per stream; a sketch of the framing is below.
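The framing RFC 4571 defines is tiny: each RTP (or RTCP) packet sent down the TCP stream is preceded by a 16-bit big-endian length field. A hedged sketch of both directions:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// RFC 4571 framing: on a TCP stream, each RTP packet is preceded by a
// 16-bit big-endian length. The SSRC inside each RTP header is the
// unique ID that keeps the multiplexed streams apart.

func writeRTPFrame(w io.Writer, rtpPacket []byte) error {
	var length [2]byte
	binary.BigEndian.PutUint16(length[:], uint16(len(rtpPacket)))
	if _, err := w.Write(length[:]); err != nil {
		return err
	}
	_, err := w.Write(rtpPacket)
	return err
}

func readRTPFrame(r io.Reader) ([]byte, error) {
	var length [2]byte
	if _, err := io.ReadFull(r, length[:]); err != nil {
		return nil, err
	}
	pkt := make([]byte, binary.BigEndian.Uint16(length[:]))
	_, err := io.ReadFull(r, pkt)
	return pkt, err
}

func main() {
	var conn bytes.Buffer // stands in for the TCP connection
	writeRTPFrame(&conn, []byte{0x80, 0, 0, 1}) // truncated RTP packet for illustration
	pkt, _ := readRTPFrame(&conn)
	fmt.Printf("% x\n", pkt)
}
```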

Reasons to use RTP when streaming a pre-existing file?

The only reason I can think of for using RTP to transfer a pre-existing file is to monitor the amount of time a user spends streaming the file, e.g. if you're running a time-based on-demand website. The other streaming solution I know of is to serve a media file over HTTP and provide a client that progressively plays it. Can anyone come up with another reason to use RTP to stream media files?
You don't use RTP to transfer files; you use RTP to stream media to media players.
If you want to serve media, RTP has some advantages:
RTP-capable clients can use the stream; they might not be able to use whatever else you come up with.
It tolerates network congestion. If you serve the data over a TCP connection, the stream is quite sensitive to packet loss and congestion: TCP has long timeouts, and you might experience stalls.