Live Streaming Service Development - streaming

I am about to develop a service that involves interactive live audio streaming. Interactive in the sense that a moderator can have his stream paused and, upon request, stream audio coming from one of his listeners (during the streaming session).
It's like a large pipe: water flows through it, but at any one time the water can come in from only one of the many small pipes connected to it, with a moderator assigned to each stream controlling which pipe is open. I know nothing about media streaming, and I don't know whether any cloud service provides an interactive, programmable solution such as this.
I am a programmer and will be able to program the logic involved in such interaction. The issue is that I am a novice at media streaming: I have no knowledge of its technologies or of the various software used on the server for this purpose. Are there any books that can introduce one to the technologies employed in media streaming? I am also trying to avoid using Flash.
Clients could be web or mobile. I don't think I will have any problem integrating with the client systems. My issue is implementing the server side.

You are effectively programming a switcher. Basically, you need to be able to switch from one audio stream to the other. With uncompressed PCM, this is very simple. As long as the sample rates and bit depth are equal, cut the audio on any frame (which is sample-accurate) and switch to the other. You can resample audio and apply dithering to convert between different sample rates and bit depths.
The complicated part is when lossy codecs get involved. On a similar project, I have gone down the road of trying to stitch streams together, and I can tell you that it is nearly impossible, even with something as simple as MP3. (The bit reservoir makes things difficult.) Plus, it sounds as if you will be supporting a wide variety of devices, meaning you likely won't be able to standardize on a codec anyway. The best thing to do is take multiple streams and decode them at the mix point of your system. Then, you can switch from stream to stream easily with PCM.
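As a rough illustration of the PCM switch described above, here is a minimal Python sketch. It assumes both streams are already decoded to interleaved 16-bit PCM at the same sample rate; the function name `switch_pcm` and the sample values are made up for the example.

```python
from array import array

def switch_pcm(stream_a, stream_b, cut_frame, channels=2):
    """Play stream_a up to cut_frame, then continue with stream_b from that frame.

    Both streams are assumed to be interleaved PCM with the same sample rate
    and channel count (the sample rate cannot be checked from the data alone).
    """
    if stream_a.typecode != stream_b.typecode:
        raise ValueError("bit depths differ; convert one stream first")
    cut = cut_frame * channels  # one frame holds one sample per channel
    return stream_a[:cut] + stream_b[cut:]

# Four stereo frames each; "h" means signed 16-bit samples.
moderator = array("h", [100, 100, 101, 101, 102, 102, 103, 103])
listener = array("h", [-5, -5, -6, -6, -7, -7, -8, -8])

mixed = switch_pcm(moderator, listener, cut_frame=2)
print(list(mixed))  # [100, 100, 101, 101, -7, -7, -8, -8]
```

Cutting on a frame boundary keeps the left and right channels aligned. A real switcher would probably also apply a short crossfade to avoid clicks, which this sketch leaves out.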
At the output of your system, you'll want to re-encode to some lossy codec.
Due to latency, you don't typically want the server doing this switching. The switching should be done at the desk of the person encoding the stream so that they can cue it accurately. Just write something that does all of the switching and encoding, and use SHOUTcast/Icecast for hosting your streams.

Related

Video streaming application Where to start

For some time now I have been thinking of creating some sort of video streaming application (client and server). When I search, I always find applications for streaming, not how to code one.
I know that it should be something like... capture data, pack, send to server and then the server will broadcast to anyone connected... right?
So where should I start... should I study sockets... should I study more about how to implement the UDT or TCP protocol... or those two combined?
Part of the problem you're having in your searches is that you haven't really defined what you're trying to solve. "video streaming application" isn't enough... what are the constraints? Some questions that help narrow down appropriate solutions:
Does the player need to be web-based?
Does the source need to be web-based?
What other platforms need support?
What sort of latency requirements do you have? (Video conferencing style, where quality is less important but low latency is very important... or more traditional streaming where you choose quality and don't care much about latency.)
What's the ratio between the source streams and those playing? Lots of watchers per stream, or lots of streams with few watchers?
What sort of scale does your whole operation need to run at?
I know that it should be something like... capture data, pack, send to server and then the server will broadcast to anyone connected... right?
Close. Let's break this down a bit. All video streaming is going to have some element of capture, codecs, a container or transport, a server to distribute, and clients to connect to the server and reverse the whole process.
Media Capture
As I hinted at above, how you do this depends on the platform you're on. This is actually where things vary the most. If you're on Windows, there's DirectShow. OSX and Linux have their own capture frameworks. Also remember that you need an audio stream as well, which isn't necessarily handled in the video capture. If you're web-based, you need getUserMedia.
Codecs
It would be incredibly inefficient to send raw uncompressed frames. If it weren't for codecs, video streaming would be impossible as we know it. Each codec works a bit differently, but there are a lot of common techniques.
At a basic level, if you can imagine frames on a filmstrip, each frame isn't much different from the next. For a given shot, there may be motion happening but much of the content in the frame stays very similar. We can save a lot of bandwidth by only sending what's changed. (Realistically speaking, each frame is always a bit different due to the analog nature of the world we're capturing, but codecs can spend very little bandwidth on the things that are almost completely the same, vs. things that are totally different.) When we go to a different shot, the codec sees that the whole frame is different and sends a whole frame. Frames that can stand alone are "I-frames". I-frames are also inserted regularly in the stream, every few seconds. Most video players will only seek to I-frames, because anything that is not an I-frame requires decoding all the frames before it, back to the preceding I-frame. If you've ever tried to hit an exact spot in a movie but the player put you somewhere within a few seconds nearby, that's why this happens. In addition, if some frames were to become corrupted, the stream will correct itself on the next I-frame. (Ever watched a video and a huge chunk of it went green for a few seconds but was fine later? That's why.)
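To make the I-frame idea concrete, here is a toy Python sketch. It is not how any real codec works: frames are just short lists of numbers, a full frame is stored at a fixed interval (the hypothetical `KEYFRAME_INTERVAL`) instead of on shot changes, and every other frame stores only the positions that changed.

```python
KEYFRAME_INTERVAL = 4  # hypothetical: store a full frame every 4 frames

def encode(frames):
    """Return records of ('I', full_frame) or ('delta', {index: new_value})."""
    encoded, previous = [], None
    for n, frame in enumerate(frames):
        if previous is None or n % KEYFRAME_INTERVAL == 0:
            encoded.append(("I", list(frame)))
        else:
            # Keep only the values that differ from the previous frame.
            changes = {i: v for i, (v, old) in enumerate(zip(frame, previous)) if v != old}
            encoded.append(("delta", changes))
        previous = frame
    return encoded

def decode_at(encoded, target):
    """Rebuild frame `target`; also return how many delta frames had to be applied."""
    start = target
    while encoded[start][0] != "I":  # walk back to the nearest earlier I-frame
        start -= 1
    frame = list(encoded[start][1])
    for _, changes in encoded[start + 1 : target + 1]:
        for i, v in changes.items():
            frame[i] = v
    return frame, target - start

frames = [
    [0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 2, 0], [0, 1, 2, 3],
    [9, 9, 9, 9], [9, 9, 9, 8],
]
encoded = encode(frames)
print(encoded[1])             # ('delta', {1: 1})
print(decode_at(encoded, 3))  # ([0, 1, 2, 3], 3)
print(decode_at(encoded, 5))  # ([9, 9, 9, 8], 1)
```

Reaching frame 3 means applying three delta frames on top of the I-frame at position 0, which is why players prefer to seek straight to an I-frame.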
Video codecs also use the nature of how we see things to their advantage. Our eyes are far more sensitive to changes in brightness than changes in color. Therefore, the codecs spend more bandwidth on brightness differences in the frame than they do in the color differences. There are also some crafty tricks for smoothing and adding visual noise to make things look more normal rather than blocky.
Audio codecs are also required. While a CD-quality stereo uncompressed audio stream may only take up 1.4mbit, that's a lot of bandwidth in internet terms. A lot of streaming video sites use less bandwidth than this for the entire video. Audio codecs, much like video codecs, use some tricks around how we perceive to save bandwidth. (For a more detailed explanation, read my post about how MP3 works here: https://sound.stackexchange.com/a/25946/7209)
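The 1.4mbit figure follows from the CD audio parameters (44.1 kHz, 16 bits per sample, two channels). A quick check in Python, with a 128 kbit/s compressed stream picked purely as an assumed point of comparison:

```python
SAMPLE_RATE_HZ = 44_100   # CD sample rate
BITS_PER_SAMPLE = 16      # CD bit depth
CHANNELS = 2              # stereo

uncompressed_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
print(uncompressed_bps)   # 1411200 bits per second, about 1.4 Mbit/s

ASSUMED_CODEC_BPS = 128_000  # assumption: a typical lossy audio bitrate
ratio = uncompressed_bps / ASSUMED_CODEC_BPS
print(round(ratio, 1))    # 11.0, i.e. roughly an 11x saving
```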
Container
The next step is to mux your encoded audio and video streams together in a container format. If you were recording to disk, you might choose something like MKV which supports audio, video, subtitles, and more, all in the same file. WebM is basically a limited version of MKV but is designed to be easily supported by browsers. Or you might choose a format less complicated like MP4 where you are limited in choice of audio and video codecs, but get better player compatibility.
Since you're live streaming, the line between the streaming protocol and the container are often blurred a bit. HLS will require you to make a bunch of video files that stand alone, but your muxer and your codecs need to know how to segment these files in a way that they can be put together again. I think that RTMP takes its cues from FLV, but also has some information about the streams in its exchange with the client. (If you use RTMP, you might read up on it elsewhere... I don't know much about RTMP under the hood.)
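For a feel of what the HLS side looks like, here is a small Python sketch that builds a live media playlist for segments that already exist. The segment names and durations are invented, and the encoder and segmenter that would produce the files are not shown.

```python
import math

def live_playlist(segments, first_sequence):
    """Build an HLS media playlist for the segments currently in the live window.

    segments: list of (uri, duration_seconds), oldest first.
    first_sequence: media sequence number of the first listed segment.
    """
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    # No #EXT-X-ENDLIST tag: its absence tells players the stream is still live.
    return "\n".join(lines) + "\n"

# Hypothetical segment files produced by a segmenter.
window = [("seg7.ts", 4.0), ("seg8.ts", 4.0), ("seg9.ts", 3.96)]
playlist = live_playlist(window, first_sequence=7)
print(playlist)
```

As new segments appear, the server would drop the oldest entry, append the newest, and increase the media sequence number. Clients re-fetch the playlist periodically to discover the new segments.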
Server
Lots of choices here. In the case of WebRTC, the "server" might actually be the web browser doing all the encoding and what not because it can run peer-to-peer. Alternatively, you might have a specialized streaming server running RTMP, or a normal HTTP web server for distributing HLS chunks. Again, what you choose depends on your requirements.
Clients
Clients need to connect to the server, demux the streams, decode the audio and video streams, and play them back. It's the entire process listed above, but in reverse.
So where should I start...
Start by figuring out exactly what you want to do. If you don't know what you want to do, play around with WebRTC. The browsers do all the work, and it requires very little server resources in most cases. This will allow you to stream between a few clients in real time.
To get more advanced, start experimenting with what you already have off-the-shelf. FFmpeg is a great tool that you should absolutely know how to use, and it can be embedded in your solution.
A few things you probably shouldn't do (unless you really want to):
Don't invent your own codec. (The codecs we have today are very good. They have taken a ton of investment and decades of academic research to get to where they are.)
Don't invent your own streaming protocol. (You would have to fight to get it adopted in all the players. We already have a ton of streaming protocols to choose from. Use what's already there.)
should I study sockets... should I study more about how to implement the UDT or TCP protocol... or those two combined?
It would always be helpful for you to know the basics of networking. Yes, learning about UDP and TCP will certainly help you, but since you're not inventing your own streaming protocol, you won't even get to choose between them anyway.
I hope this helps you get started. In short, understand all the layers here. Once you have done that, you'll know what to do next, and what to Google for.

Protocol for sending streaming data over multiple sockets

I am working on designing an API for consuming messages from an application that will generate a very large amount of data; 10+ GB/s is likely, even for smaller clients. I am looking for a protocol that allows me to deliver this data in a way that is easy for clients to consume.
The obvious answer for me is: split up the messages so they are consumable over multiple connections. Each connection would consume a fraction of the overall load.
But if I do this, there are a few things I need to account for:
How does the user know they are falling behind and need to launch more connections?
Twitter says consumers should check timestamps, which could work for us
When they launch a new connection to consume more of the data, how do they specify that this is part of the same consumption session?
We could give the session a name, correlate that with a "direct" amqp queue, and let our queue do the hard work
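The timestamp check can be sketched in a few lines of Python. This is only one possible heuristic, with made-up names and thresholds: the consumer records how far behind each message is (receive time minus message timestamp) and asks for one more connection when that lag is both above a limit and still growing.

```python
def connections_needed(lag_samples, current_connections, max_lag_seconds=5.0):
    """Suggest a connection count from recent lag measurements.

    lag_samples: recent values of (receive_time - message_timestamp) in
    seconds, oldest first. max_lag_seconds is an arbitrary example threshold.
    """
    latest = lag_samples[-1]
    growing = len(lag_samples) >= 2 and latest > lag_samples[0]
    if latest > max_lag_seconds and growing:
        return current_connections + 1
    return current_connections

print(connections_needed([1.0, 3.0, 6.5], current_connections=2))  # 3
print(connections_needed([6.5, 6.0, 5.5], current_connections=2))  # 2 (catching up)
print(connections_needed([0.2, 0.3], current_connections=2))       # 2 (within limit)
```

This assumes the producer and consumer clocks are reasonably well synchronized; otherwise the lag values would need to be compared only against each other, not against an absolute limit.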
Is there something very important I am missing?
Probably.
For this reason, I'd much rather use a protocol that already exists.
The protocol would be considered extra awesome if it:
is websocket or streaming HTTP friendly
supports data compression
The problems you are describing are pretty much the same issues that video streaming has to deal with, which you probably already know. The key HTTP friendly streaming protocols are HLS (Apple), SmoothStreaming (Microsoft), HDS (Adobe) and MPEG-DASH (open protocol, but new).
When considering video streaming, it is also worth understanding whether your streams are more like 'live' streams or 'static' content. The former is generated on the fly, and any given part of the live stream may only be available for a set time, while the latter is stored on the server in full and generally any part is available at any time (until the content is removed). How you stream and play back these is subtly different.
It may be that you can simply reuse one of the above video streaming protocols by wrapping your data as if it were video (or maybe it even is video), and implementing your own custom client on the receiving side.
Alternatively, these protocols could provide a good reference point if you wanted to create your own simpler protocol - there are several open source streaming servers you could look to for ideas or even adapt to your needs if that looks like a sensible route:
http://gstreamer.freedesktop.org
http://icecast.org
Video streaming is quite complex as you may already be aware, but if your use cases are simpler you may be able to ignore or remove much of the complexity - for example you may not need seek, multiple format and bit rate streams, accompanying streams (for subtitles etc). Being able to simplify like this might justify the effort to modify one of the above for your needs, if you are not able to use them out of the box.
One final point: video and audio streaming protocols usually have a built-in way of dealing with delayed or lost packets. Depending on your application, these mechanisms may not suit you, so look carefully at this aspect if you reuse a video or audio streaming protocol or server. For example, audio clients are typically tolerant of a small amount of packet loss, and will generally discard delayed packets (those received outside the 'jitter buffer' window) rather than pause the audio. If your application cannot tolerate any packet loss, then you will need to look carefully at the underlying solution and protocol to make sure it really meets your needs over all network conditions.
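The discard behaviour can be illustrated with a deliberately simplified jitter buffer in Python. The class name, the 60 ms window, and the assumption that sender and receiver share a clock are all simplifications for the example. Real implementations estimate timing from the stream itself.

```python
import heapq

class JitterBuffer:
    """Hold packets briefly so they can be played in order; drop late arrivals."""

    def __init__(self, window_ms=60):
        self.window_ms = window_ms  # how long a packet may be delayed
        self._heap = []             # (sequence number, capture time)
        self.dropped = 0

    def push(self, seq, capture_ms, arrival_ms):
        # A packet is useless once its playout time (capture + window) has passed.
        if arrival_ms > capture_ms + self.window_ms:
            self.dropped += 1
            return False
        heapq.heappush(self._heap, (seq, capture_ms))
        return True

    def pop_due(self, now_ms):
        """Return the sequence numbers whose playout time has arrived, in order."""
        due = []
        while self._heap and self._heap[0][1] + self.window_ms <= now_ms:
            due.append(heapq.heappop(self._heap)[0])
        return due

buf = JitterBuffer(window_ms=60)
buf.push(seq=1, capture_ms=0, arrival_ms=20)
buf.push(seq=3, capture_ms=40, arrival_ms=70)
buf.push(seq=2, capture_ms=20, arrival_ms=95)  # arrives after its 80 ms deadline
print(buf.pop_due(now_ms=100))  # [1, 3]
print(buf.dropped)              # 1
```

Packet 2 is thrown away rather than delaying playback. That is fine for audio, but it is exactly the behaviour to watch out for if every message in your data stream must be delivered.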

Darwin Streaming Server - Adaptive Bitrate?

Can anyone provide any direction or links on how to use the adaptive bitrate feature that DSS says it supports? According to the release notes for v6.0.3:
3GPP Release 6 bit rate adaptation support
I assume that this lets you include multiple video streams in the 3gp file with varying bitrates, and DSS will automatically serve the best stream based on the current bandwidth. At least that's what I hope it does.
I guess I'm not sure what format DSS is expecting to receive the file. I tried just adding several streams to a 3gp file which resulted in Quicktime unable to play it, and VLC opening up a different window for each stream in the file.
Any direction would be much appreciated.
Adaptive streaming used in DSS 6.x uses a dropped frame approach to reduce overall bandwidth rather than dynamic on the fly bitrate adjustments. The result of this can be unpredictable. The DSS drops the frames, and does not need the video encoded in any special way for it to work.

On-demand video streaming

I'm currently researching different streaming methods both for live and on-demand streaming.
I've read about both multicast and unicast, and now I have the following question, which I cannot find an answer to.
"Is it possible to make on-demand streaming with multicast?"
The way I understand it is, that when using multicast, the media server creates a stream of the video, which only is played once, which users can connect to and watch.
Is it because multicast only allows live streaming? If not, can someone please explain to me how it works?
"Is it possible to make on-demand streaming with multicast?"
Technically, yes. Practically, no.
The way I understand it is, that when using multicast, the media server creates a stream of the video, which only is played once, which users can connect to and watch.
You understand it correctly. And that is that.
Well, you can do it, but the bigger question is why would you want it?
On-demand suggests that you start the broadcast at the time that a single viewer wants to see that particular piece of content. If a single user chooses the content and the time it is started, why would you want to multicast it?
Yes, it can be done, but there are caveats. If you take a flight on an old plane you may see an old entertainment system that offers say 20 channels with a movie on each. The channels are all rolling and once the programmes have finished they restart. This is better than having just one channel broadcast on a projector as it gives the user choice of what to watch but doesn't give them the freedom of when to watch.
Modern flight entertainment systems are all on-demand: every passenger can watch any film at any time. So the question is how multicast can help there. If you detect that multiple users are watching the same film, with the caveat that they must be watching at the same time, you can replace the streams to each user with just one multicast channel. That is technically clever, but you have to ask why you would do it. It only makes sense if the communication medium is unreliable or insufficient to serve every user simultaneously.
Designing a flight entertainment system that does not scale to every passenger actually using it is a bit short sighted. Therefore the system can handle the worst case of a stream for each user, meaning there is no benefit for multicasting anything.
Some cable/satellite networks implement multicast streaming and use time windows to group as many viewers together as possible. For example wait up to 5 minutes to watch a video whilst displaying the infamous phrase "buffering".
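The time-window idea can be sketched as follows. This is only an illustration of the grouping logic, not a description of how any particular network implements it: requests for the same title inside the same fixed 5-minute window (the hypothetical `WINDOW_SECONDS`) share one multicast stream that starts when the window closes.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # the "up to 5 minutes" wait mentioned above

def group_requests(requests):
    """Group viewers who can share one multicast stream.

    requests: list of (viewer, title, request_time_seconds).
    Returns {(title, stream_start_time): [viewers]}.
    """
    groups = defaultdict(list)
    for viewer, title, t in requests:
        # Every request in a window starts playback when that window ends.
        stream_start = (t // WINDOW_SECONDS + 1) * WINDOW_SECONDS
        groups[(title, stream_start)].append(viewer)
    return dict(groups)

requests = [
    ("a", "film1", 10),
    ("b", "film1", 290),
    ("c", "film1", 310),
    ("d", "film2", 20),
]
print(group_requests(requests))
# {('film1', 300): ['a', 'b'], ('film1', 600): ['c'], ('film2', 300): ['d']}
```

Viewers a and b share one stream, at the cost of viewer a waiting almost the full five minutes. Viewer c just missed the window and gets a separate stream.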

Deciphering MMORPG Protocol Encoding

I plan on writing an automated bot for a game.
The tricky part is figuring out how they encoded their protocol... Making the bot run around is easy: simply make the character run and record what it does in Wireshark. However, interpreting the environment is more difficult... It receives about 5 packets each second if you are idle, hence lots of garbage.
My plan: Because the game runs under TCP, I will use freecap (http://www.freecap.ru/eng) to force the game to connect to a proxy running on my machine. I will need this proxy to be capable of packet injection, or perhaps a server that is capable of resending captured packets. This way I can recreate and tinker around with what the server sends, and understand their protocol encoding.
Does anyone know where I can get a proxy that allows packet injection, or where I can perform packet injection in software (not via hardware, as is the case with wireless)?
Or where can I find a server/proxy that resends captured packets (i.e., replays a connection)?
Any better tools or methodologies for pattern matching? Something which can highlight patterns across multiple messages would be GREAT.
OR, is there a better way to decipher this? Possibly a disassembly strategy (hooking a Winsock function and starting the disassembly from there)? I have not done this before, so I am not sure. OR, any other ideas?
Network traffic interception and protocol analysis is generally a less favored method to accomplish your goal here. For most modern games, encryption is a serious factor, and there are serious headaches associated with the protocol analysis for any but trivial factors of the most common gameplay scenarios.
Most modern implementations* of what you are trying to do rely on reading and manipulating the memory space and process of a running client. The client will have already done all the hard parts for you, including decrypting the traffic and sorting it into far more easy to read data structures. For interacting with the server you can call functions built into the client instead of crafting entire series of packets from scratch. The plus to this approach is that you have to do far less work to interpret the data and produce activity. The minus is that there is often some data in the network traffic that would be useful to a bot but is discarded by the client, or that you may want to send traffic to the server that the client cannot produce (which, in my own well-developed hierarchy for such, is a few steps farther down the "cheating" slope).
*...I say this having seen the evolution of the majority of MMORPG botting/hacking communities from network protocol analyzers like ShowEQ and Odin's Eye / Excalibur to memory-based applications like MacroQuest and InnerSpace. On that note, InnerSpace provides an excellent extensible framework for the memory/process-based variant of what you are attempting, and you should look into it as a basis for your project if you abandon the network analysis approach.
As I've done a few game bots in the past (for fun, not profit or griefing of course - writing game bots is a lot of fun), I recommend the following:
If you can code and there isn't cheat protection preventing you from doing it, I highly recommend writing an injected DLL for the following reasons:
Your DLL will be able to access the game's memory space directly, and once you reverse-engineer the data structures (either by poking around memory or by code disassembly), you'll have access to lots of data. This will also allow you to bypass any network encryption the game may have. The downside of accessing process memory directly is that offsets and data structures change between versions - however, data structures don't change very often with a stable game, and you can compensate offset changes by searching for code patterns instead of using fixed offsets.
Either way, you'll still be able to hook WinSock functions using API hooks (check out Microsoft Detours and the excellent but now-commercial madCodeHook).
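The "search for code patterns instead of using fixed offsets" idea mentioned above boils down to a byte search with wildcards. Here is a minimal Python sketch over plain `bytes` objects; the byte values are invented, and reading a real process's memory is outside the scope of the example.

```python
def find_pattern(data, pattern):
    """Return the first offset where pattern matches data, or -1.

    pattern is a list of byte values; None acts as a wildcard for one byte.
    """
    last_start = len(data) - len(pattern)
    for offset in range(last_start + 1):
        if all(p is None or data[offset + i] == p for i, p in enumerate(pattern)):
            return offset
    return -1

# Two made-up "versions" of the same code: the instruction moved by one byte
# and the embedded 4-byte address changed, but the surrounding bytes did not.
old_build = bytes.fromhex("90908b0d10203040ff15")
new_build = bytes.fromhex("9090908b0d55667788ff15")

signature = [0x8B, 0x0D, None, None, None, None, 0xFF, 0x15]
print(find_pattern(old_build, signature))  # 2
print(find_pattern(new_build, signature))  # 3
```

The wildcards cover the bytes that are expected to change between versions, so the same signature keeps working after an update as long as the surrounding code is unchanged.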
Otherwise, I can only advise that you give live/interactive packet editors like WPE Pro a try.
In most scenarios, the coolest methods (code reverse-engineering and direct memory access) tend to be the least productive. They require a lot of skill (to understand the code) and time, both initially (to go through all the code and develop code to interact with the data structures) and for maintenance (in case the game is updated). (Of course, they sometimes do allow doing cool stuff which is impossible to do with the official client, but most of the time this is obvious as blatant cheating, and likely to attract the GMs quickly.) Most of the time bots are made by replacing game graphics/textures with solid colours, and creating simple "pixel" bots which search for certain colours on the screen and react accordingly (e.g. click them).
Hope this helps, and remember - cheating is only fun when it doesn't make the game less fun for everyone else ;)
There are probably a few reasonable assumptions you can make that should simplify your task enormously. However, to make the best use of them you will probably need greater comfort with sleeves-rolled-up programming than it sounds like you have.
First, it's a safe bet that the encryption they are using falls into one of three categories:
None
Cheesy
Far better than you are likely to crack
With the odds of the middle case being very low.
Next, it's a safe bet that the packets are encrypted / decrypted close to the edge of the program (right as they come in, right before they go out) and that the body of the game deals with them in decrypted form.
Finally, the protocol they are using most likely consists of either
ascii with data blocks
binary goo
So do a little packet sniffing with a card set in promiscuous mode, looking for unencrypted ASCII. If you see some, great, you're ahead of the game. But if you don't, give up the whole tapping-the-line idea and instead start following the code as it returns from sending the data out, by breakpointing and stepping with a debugger. Figure the outermost layer or three will be standard network stuff, then will come the encryption layer, and beyond that the huge mass of stuff that deals with the protocol unencrypted.
You should be able to get this far in an hour if you're hot, a weekend if you're reasonably skilled, motivated, and diligent, and never if you are hopeless. But it is possible in principle (and doubtlessly far easier in practice) to do it this way.
Once you get to where something that looks like unencrypted goo comes in, gets mungled, and the mungled form goes out, then start worrying about what it means.
-- MarkusQ
A) I play a MMO and do not support bots, voting down...
B) Download Backtrack v.3, run an arpspoof on your default gateway and your host. There is an application that will spoof the remote host's SSL cert (sslmitm, I believe, is the name), which will then allow you to create a full connection through your host. Then fire up tcpdump/ethereal/wireshark (choose your pcap poison) and move around and do random stuff to find out what packet is doing what. That will be your biggest challenge; but proxying with a Man in the Middle attack on yourself is the way to go.
C) I do not condone this activity, this information is only being provided as free information.
Sounds like there is no encryption going on, so you could do a network approach.
A great place to start would be to find the packet IDs. Most of the time, something near the front of the packet is going to be an ID for the type of the packet. For example, move could be 1, shot fired could be 2, and chat could be 4.
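One cheap way to test the packet ID guess is to tally the leading bytes of captured payloads and see whether a handful of values dominate. A small Python sketch, using invented payloads and assuming the ID is the first byte (the `width` parameter lets you try longer prefixes):

```python
from collections import Counter

def tally_leading_bytes(payloads, width=1):
    """Count how often each leading byte sequence appears across payloads."""
    return Counter(p[:width] for p in payloads if len(p) >= width)

# Hypothetical captured payloads (application data only, no TCP/IP headers).
captured = [
    b"\x01\x00\x10",
    b"\x01\x00\x11",
    b"\x04hi",
    b"\x02\x07",
    b"\x01\x00\x12",
]

counts = tally_leading_bytes(captured)
print(counts.most_common(1))  # [(b'\x01', 3)]
```

If the same few values keep appearing, and one of them spikes whenever you perform a particular action in the game, that byte is a good candidate for the packet type.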
You can write your own proxy that listens on one port for your game to connect, and then connects to the server. You can make keypresses to your proxy fire off commands, or you can make your proxy write out debugging info to help you go further.
(I've written a bot for an online game in PHP, of all things.)