Audio + visual through TCP/IP

Audio + visual through TCP/IP - sockets

im in the process of making an application similar to skype to interact with another computer and i have a few questions.
I know all the basics such as how to send data over tcp etc in the form of an image and audio.
How does applications like skype send live audio? Does it litrally record 1 byte of audio, send it and play it and then repeat the process? For me its not instant so i dont see how that would be possible.
How would u send string and image through tcp at the same time (video call + chat), would you use multiple ports? i can see how that would be very bad. The way im doing it atm is when i click to recive an image, i set it up to receive an image so it receives properly, if a string got sent at this time for example, it wouldnt work as it cant be converted to an image if you see what im saying. im not sure how else i would do it. I Could send each thing with its type as the beginning for example "string Hello how are you" then decypher the data type through that, but that seems abit tedious and slow.
If anyone could give me an insight, that would be fantastic

I can't speak for how skype does it, but this should be a starting point:
Streaming audio/video is usually transported over UDP sockets, not TCP. TCP guarantees delivery whereas UDP is best effort. If you have a temporary connection loss you care more that the video you're receiving is current, not that you receive the whole stream.
The data is usually compressed (and sometimes encrypted) using a standard compression algorithm after being received from a camera/microphone. Have a look at H264, which is commonly used to compress video.
RTP is often used to transmit audio/video. It allows multiple types of stream to be combined over a single socket.
Control traffic is usually sent separately over a different socket, usually TCP. For example SIP which is used for VoIP phones initiates a control connection over a TCP or UDP port (usually 5060). The two ends then negotiate which types of stream will be supported, and how those streams will be sent. For SIP, this will be an RTP stream which is set up on a different UDP port.

Related

Does it make sense to use RTP protocol for multiple streamers and single receiver?

I am in a process of learning and trying to use the RTP/RTCP protocol. My situation is that there is 1 to n streamers and 1 (or potentially 1 to m if needed) receiver(s), but in a way that the streamers themselves do not know about each other (they cannot directly due to technical reasons, such as different network, limited bandwidth, etc...). So it is more like multiple unicast sessions, but the receiver actually knows about them all, collects data from all of them, it is just the senders do not know about each other .
Now reading about the protocol, it seems to me that huge portion of it is related to sending some feedback, collision detections, and so on. So I have doubts, is RTP is really applicable in this case? Is is already used in this way somewhere?
Seems to me it is still beneficial to collect statistic about data transfer that RTP provides (data sent, loss, times, etc...), it just feels the most of the protocol is sort of left out...
Also I have one additional question, going through the various RTP libraries, they all assume that sender will also open ports for receiving RTP/RTCP data, does RTP forbid use of one way communication? I mean application that would only stream the data, not expecting to receive anything back. The libraries (e.g. ccRTP) seem to assume both way communication only...

RTCP is the protocol that provides statistics. The stream receiver (client) will send stats to the sender (server) via RTCP. I don't believe the client will get any statistic reports from the server.
There's nothing wrong with a single client receiving multiple unicast sessions from various servers.
RTP requires two way communication during the setup process. Once setup is complete and the play cmd is sent, it is mostly one way. The exception are the "keep alive" packets that must be sent to the server periodically (usually every 60 seconds or so) to keep the stream going. The exact timeout value is sent to the client during the setup process.
But if you implement your own RTP, there's nothing stopping you from having the server send the stream continuously without any feedback from the client. Basically it would be implementing an infinite timeout value.
You can read about all the details in the spec: RTP: A Transport Protocol for Real-Time Applications

RTP Multiplexing or Mixing?

I'm designing a real-time voice comm system that I want to use RTP with. Here's my general requirements:
Every user streams one audio stream to the server
The incoming stream may be compressed differently, depending on the source (a SIP trunk, an Android phone, a desktop client, etc.)
Users can pick which streams they want to receive
If the users had unlimited bandwidth and there wasn't a limited number of ports, I would just have them each open an RTP stream with the server for each stream they wanted to receive. However, a lot of the users will be over a 3G or 2G network, so my question is, how can I bundle the streams (chosen by the user) into a single RTP stream?
One option I've seen is multiplexing the streams into a single packet, but as far as I can tell, that actually goes against the RFC (however, there are working drafts for multiplexing).
Another option would be just mixing the audio together into one packet. Is that the recommended way to do this? I would have to normalize all of the chosen streams into one format first.
I'm very new to the whole VoIP/streaming media thing, so this may be a poor question.

I'm guessing that you want to use tcp in order to not lose data.
you need to use rtp over tcp see rfc
this allow sending several rtp stream over a single socket with a unique id per stream.

How'd I determine where one packet ends and where another one starts

While sending packets across a network, how can one determine where one packet ends and where another starts?
Is sending/receiving acknowledgment one of the ways of doing so?

TCP is a stream-based protocol. That is, it provides a stream vs. packet or message-based interface to the application. If using TCP, an application must implement its own method of determining packets or messages. For example, (a) all message are a fixed size, or (b) each message is prefixed with its subsequent size, or (c) there is a special "end-of-record" sequence in the data stream to indicate a message boundary. Search google for lots of information on how one can implement message boundaries in TCP.

I assume here that you mean application-level 'packets'.
If you use UDP, you don't need to since it's a message protocol. TCP is a byte streaming protocol, so it cannot send packets, just bytes. If you need to send anything more complex than a byte-stream across TCP, you have to add another protocol on top - HTTP is one such protocol. Text is fairly easy since lines have terminating characters, usually CR/LF/CRLF. Sending non-text messages will require a different protocol.
One approach that is often used with TCP is to connect, stream a protocol-unit, disconnect. This works OK, but slowly because of the huge latency of continually opening and closing TCP connections. HTTP usually works like this in order to serve up web pages to large numbers of users who, if left permanently connected while they viewed pages, would needlessly use up all the server sockets.
Waiting for an application-level ACK from the peer is sometimes necessary if it absolutely essential that peer receipt is known before the next message is sent, but again, this is slow because of the connection latency. TCP was not designed with this approach in mind.
If the commonly available IP protocols cannot directly provide what you need, you will have to resort to implementing your own.
What sort of 'packet' are you sending?
Rgds,
Martin

With TCP sockets, you just see the datastream where you can receive and send bytes. You have no way of knowing where a packet ends and another begins.
This is a feature (and a problem) of TCP. Most people just read data into a buffer until a linefeed (\n) is seen. Then process the data and wait for the next line. If transferring chunks of binary data, one can first inform the receiver of how many bytes of data are coming.
If packet boundaries are important, you could use UDP but then the packet order might change or some packets might be lost on the way without you knowing.
The newer SCTP protocol behaves much like TCP (lost packets are resend, packet ordering is retained) but with SCTP sockets you can send packets so that receiver gets exactly the same packet.

What is the difference between RTP or RTSP in a streaming server?

I'm thinking about developing a streaming server and I have the following question, do over RTSP (example url: rtsp://192.168.0.184/myvideo.mpg) or RTP (example url: rtp://192.168.0.184).
As I have understood, an RTSP server is mainly used for streaming of files that already exist, ie, not live. RTP server is used to broadcast.
Somebody correct me if I'm wrong, am I right?.
What I want to develop a server to broadcast live content on the computer screen, that is, which is displayed at the time that is broadcast in streaming.

You are getting something wrong... RTSP is a realtime streaming protocol. Meaning, you can stream whatever you want in real time. So you can use it to stream LIVE content (no matter what it is, video, audio, text, presentation...). RTP is a transport protocol which is used to transport media data which is negotiated over RTSP.
You use RTSP to control media transmission over RTP. You use it to setup, play, pause, teardown the stream...
So, if you want your server to just start streaming when the URL is requested, you can implement some sort of RTP-only server. But if you want more control and if you are streaming live video, you must use RTSP, because it transmits SDP and other important decoding data.
Read the documents I linked here, they are a good starting point.

AFAIK, RTSP does not transmit streams at all, it is just an out-of-band control protocol with functions like PLAY and STOP.
Raw UDP or RTP over UDP are transmission protocols for streams just like raw TCP or HTTP over TCP.
To be able to stream a certain program over the given transmission protocol, an encapsulation method has to be defined for your container format. For example TS container can be transmitted over UDP but Matroska can not.
Pretty much everything can be transported through TCP though.
(The fact that which codec do you use also matters indirectly as it restricts the container formats you can use.)

Some basics:
RTSP server can be used for dead source as well as for live source. RTSP protocols provides you commands (Like your VCR Remote), and functionality depends upon your implementation.
RTP is real time protocol used for transporting audio and video in real time. Transport used can be unicast, multicast or broadcast, depending upon transport address and port. Besides transporting RTP does lots of things for you like packetization, reordering, jitter control, QoS, support for Lip sync.....
In your case if you want broadcasting streaming server then you need both RTSP (for control) as well as RTP (broadcasting audio and video)
To start with you can go through sample code provided by live555

I hear your pain. I'm going through this right now (years later).
From what I've learned, you can think of RTSP as a "VCR controller", the protocol allows you to specify which streams (presentations) you want to play, it will then send you a description of the media, and then you can use RTSP to play, stop, pause, and record the remote stream. The media itself goes over RTP. RTSP is normally implemented over a different socket or communication layer. Although it is simply a protocol, most often it's implemented by a server over a socket. For live streams, the RTSP stream you request is simply a name of a stream. It doesn't need to refer to a file on the server, the server's RTSP implementation can parse that stream, put together a live graph, and then provide the SDP (description) for that stream name. But, this is of course specific to the way the RTSP server has been implemented. For "live" streams, it's probably simpler to just use RTP, but you'll need a way to transfer the SDP from the RTP server to the client that wants to play that stream.

I think thats correct. RTSP may use RTP internally.

RTSP is widely used in IP camera, running as RTSP server in camera, so that user could play(pull) the RTSP stream from camera. It is a low cost solution, because we don't need a central media server (think about thousands of camera streams). The arch is bellow:
IP Camera ----RTSP(pull)---> Player
(RTSP server) (User Agent)
The RTSP protocol actually contains:
A signaling over TCP, at port 554, used to exchange the SDP (also used in WebRTC), about media capabilities.
UDP/TCP streams over serval ports, generally two ports, one for RTCP and one for RTP (also used in WebRTC).
Comparing to WebRTC, which is now available in H5:
A signaling over HTTP/WebSocket or exchange by any other protocols, used to exchange the SDP.
UDP streams(RTP/RTCP) over one or many ports, generally bind to one port, to make cloud services load balancer happy.
In protocol view, RTSP and WebRTC are similar, but the use scenario is very different, because it's off the topic, let's grossly simplified, WebRTC is design for web conference, while RTSP is used for IP camera systems.
So it's clear both RTSP and WebRTC are solution and protocol, used in different scenario. While RTP is transport protocol, also it can be used in live streaming by WebRTC.
Note: RTSP is not available for H5 or internet live streaming, but we could covert it by FFmpeg and a gateway server, please see here.

RTP is the transport protocol for real-time data. It provides timestamp, sequence number, and other means to handle the timing issues in real-time data transport.
RTSP is a control protocol that initiates and directs delivery of streaming multimedia data from media servers. It is the "Internet VCR remote control protocol." Its role is to provide the remote control; however, the actual data delivery is done separately, most likely by RTP.
also, RTCP is the control part of RTP that helps with quality of service and membership management.
These three related protocols are used for real-time multimedia data over the Internet. Read the excellent full documentation at this link: RTP, RTCP & RTSP

RTSP (actually RTP) can be used for streaming video, but also many other types of media including live presentations. Rtsp is just the protocol used to setup the RTP session.
For all the details you can check out my open source RTSP Server implementation on the following address: https://net7mma.codeplex.com/
Or my article # http://www.codeproject.com/Articles/507218/Managed-Media-Aggregation-using-Rtsp-and-Rtp
It supports re-sourcing streams as well as the dynamic creation of streams, various RFC's are implemented and the library achieves better performance and less memory then FFMPEG and just about any other solutions in the transport layer and thus makes it a good candidate to use as a centralized point of access for most scenarios.

Reasons to use RTP when streaming a pre-existing file?

The only reason I could think of for using RTP to transfer a pre-existing file is if you're trying to monitor the amount of time a user is streaming the file, like if you're running a time-based On-Demand website. The other streaming-solution i know of is to use HTTP to upload a media file, then providing a client to progressively play the file. Can anyone come up with another reason to use RTP to stream media files?

You don't use RTP to transfer files, you use RTP to stream media to media players.
If you want to serve media RTP has some advantages:
RTP capable clients can use the stream, they might not be able to use whatever else you come up with.
Tolerates network congestion. If you serve the data over a TCP connection, the stream is quite sensible to packet loss and congestion. TCP has long timeouts and you might experience stalls.