How do I seamlessly concatenate MP3 streams? - streaming

I'm working on a streaming server that will be capable of broadcasting targeted ads. Basically, listeners hear the same music, but every 30 minutes or so comes a block of ads, and every listener gets his/her own block. Implementing such a streaming server poses various problems, and this question is about one of them.
The server will work in a manner similar to Icecast, i.e. it will read the stream over the network from some stream generator and relay it to every listener. When it's time to broadcast ads, the server stops fetching the stream from the generator, reads ads from files, inserts them into each listener's buffer, transmits them, and then resumes relaying the stream from the generator.
When the server switches from relaying the stream to broadcasting ads, it has to concatenate two MP3 streams (we broadcast in MP3). My concern is that simply appending one piece of data after another may produce audible artifacts. Can it be done seamlessly?
I've already figured out the following:
- I can make the server aware of MP3 frames to avoid sync errors.
- I'm thinking about appending MP3 frames from the ad file after MP3 frames from the stream.
- Since the ad is loaded from a properly encoded MP3 file, I circumvent the problem of the bit reservoir, because the first frame of the file can't use it.
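Making the server "aware of MP3 frames" mostly means finding each 4-byte frame header and computing the frame length from it, so the server only ever cuts or appends on frame boundaries. A minimal Python sketch, assuming MPEG-1 Layer III only (other MPEG versions, layers and free-format bitrates are rejected rather than handled):

```python
from typing import Iterator, Optional

# MPEG-1 Layer III lookup tables (kbps and Hz); index 0/15 and 3 are invalid here.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]


def frame_length(header: bytes) -> Optional[int]:
    """Frame size in bytes for an MPEG-1 Layer III header, or None if not a valid header."""
    if len(header) < 4:
        return None
    h = int.from_bytes(header[:4], "big")
    if (h >> 21) & 0x7FF != 0x7FF:      # 11-bit frame sync
        return None
    if (h >> 19) & 0x3 != 0x3:          # MPEG-1 only (assumption of this sketch)
        return None
    if (h >> 17) & 0x3 != 0x1:          # Layer III only
        return None
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(h >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        return None
    padding = (h >> 9) & 0x1
    return 144 * bitrate * 1000 // sample_rate + padding


def iter_frames(data: bytes) -> Iterator[bytes]:
    """Yield complete frames, skipping any bytes that do not form a valid header."""
    pos = 0
    while pos + 4 <= len(data):
        size = frame_length(data[pos:pos + 4])
        if size is None:
            pos += 1                    # resync: advance one byte and try again
            continue
        if pos + size > len(data):
            break                       # incomplete trailing frame: stop (a real server would buffer it)
        yield data[pos:pos + size]
        pos += size
```

Splitting both the relayed stream and the ad file with something like `iter_frames` lets the switch happen between whole frames. It does not by itself address the MDCT overlap question below.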
But my concern is the way MDCT works. Listeners have no idea what my server will do, so their MP3 decoders may produce artifacts because MDCT data from unrelated sources will be placed back to back in the stream they download. Will zero-padding at the beginning of the ad file compensate for this?
Do you know any libraries/tools (open source if possible) that can seamlessly join two MP3 files without decompressing them?
Can you point me to any good resources describing the MP3 format? I searched the Internet a lot and found plenty of information, but I still lack the overall picture.
Would this be easier if I used another codec, such as Ogg Vorbis or AAC?
PS. This question is not a duplicate of "What is the best way to merge mp3 files?"; mp3wrap and similar tools are not an option for me.

I believe MP3s can be merged by simply concatenating the files. In some quick testing (cat file1.mp3 file2.mp3 > merged.mp3; mplayer merged.mp3) it seems to work as expected. Streaming from a web server probably will work just as well.
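Plain `cat` also copies the files' tags, so the merged output can carry an ID3v1 trailer and an ID3v2 header in the middle of the audio. Decoders usually skip such data, but a server splicing streams may prefer to drop it. A minimal Python sketch, assuming both files fit in memory; it only removes tags and does nothing about bit-reservoir or encoder-delay effects at the join:

```python
def id3v2_size(data: bytes) -> int:
    """Bytes occupied by a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for b in data[6:10]:                # tag size: 4 "syncsafe" bytes, 7 bits each
        size = (size << 7) | (b & 0x7F)
    footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 footer-present flag
    return 10 + size + footer


def strip_id3v1(data: bytes) -> bytes:
    """Drop a trailing 128-byte ID3v1 tag if present."""
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        return data[:-128]
    return data


def concat_mp3(first: bytes, second: bytes) -> bytes:
    """Like `cat a.mp3 b.mp3`, but keeps tags from ending up mid-stream."""
    return strip_id3v1(first) + second[id3v2_size(second):]
```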
How are you going to handle switching the current input file? You can simply treat the advertisements as short tracks to play.

You should be able to concatenate mp3 files of both CBR and VBR formats.
MP3 files do not have a main header (disregarding ID3 and Xing). The audio data is stored as chunks where every chunk includes its own header. The header contains the necessary information (bitrate, sample frequency, stereo, etc) for the decoding of the audio data in that chunk.
This is one of the reasons why it is difficult to determine the duration of an MP3 file.
Another way of looking at it is, if you concatenate a CBR MP3 file with a VBR file, the end result is the same as one long VBR file with the first section of audio at a constant bitrate.
The issue is that some MP3 players may be strict and expect a Xing header for a VBR MP3 file. This was never part of the MP3 specification, but it is now widely assumed.
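Because each frame carries its own header, the only exact way to get the duration is to walk every frame; the common shortcut of dividing the file size by the first frame's bitrate is only correct for pure CBR. A Python sketch under the same MPEG-1 Layer III assumption (1152 samples per frame), showing how the shortcut goes wrong on a CBR-plus-VBR concatenation:

```python
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]
SAMPLES_PER_FRAME = 1152  # fixed for MPEG-1 Layer III


def parse_header(data: bytes, pos: int):
    """Return (bitrate_bps, sample_rate, frame_size) for an MPEG-1 Layer III header, else None."""
    if pos + 4 > len(data):
        return None
    h = int.from_bytes(data[pos:pos + 4], "big")
    if (h >> 21) & 0x7FF != 0x7FF or (h >> 19) & 0x3 != 0x3 or (h >> 17) & 0x3 != 0x1:
        return None
    kbps = BITRATES_KBPS[(h >> 12) & 0xF]
    rate = SAMPLE_RATES[(h >> 10) & 0x3]
    if kbps is None or rate is None:
        return None
    return kbps * 1000, rate, 144 * kbps * 1000 // rate + ((h >> 9) & 0x1)


def exact_duration(data: bytes) -> float:
    """Sum 1152 samples per frame; valid for CBR, VBR and concatenated files."""
    pos, seconds = 0, 0.0
    while pos + 4 <= len(data):
        hdr = parse_header(data, pos)
        if hdr is None:
            pos += 1                    # not a header: resync byte by byte
            continue
        _, rate, size = hdr
        if pos + size > len(data):
            break                       # truncated last frame
        seconds += SAMPLES_PER_FRAME / rate
        pos += size
    return seconds


def naive_duration(data: bytes) -> float:
    """CBR shortcut: file size in bits divided by the first frame's bitrate."""
    for pos in range(len(data)):
        hdr = parse_header(data, pos)
        if hdr is not None:
            return len(data) * 8 / hdr[0]
    raise ValueError("no MPEG-1 Layer III frame header found")
```

For ten 128 kbps frames followed by ten 320 kbps frames, the shortcut reports roughly 0.91 s, while the frame walk gives about 0.52 s.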
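A VBR info tag is stored inside the first audio frame, right after the side information, and is marked with the ASCII string `Xing` (LAME writes `Info` for CBR files). A rough Python check for an MPEG-1 Layer III first frame, assuming the tag sits immediately after the side information (17 bytes for mono, 32 otherwise, plus 2 bytes of CRC when present):

```python
from typing import Optional


def find_xing_tag(frame: bytes) -> Optional[bytes]:
    """Return b"Xing", b"Info", or None for an MPEG-1 Layer III first frame."""
    h = int.from_bytes(frame[:4], "big")
    mono = (h >> 6) & 0x3 == 0x3        # channel mode 11 = single channel
    has_crc = (h >> 16) & 0x1 == 0      # protection bit 0 means a 16-bit CRC follows
    offset = 4 + (2 if has_crc else 0) + (17 if mono else 32)
    tag = frame[offset:offset + 4]
    return tag if tag in (b"Xing", b"Info") else None
```

When a CBR stream and a VBR ad are spliced on the fly, any such tag in the ad's first frame describes only the ad, not the combined stream a listener receives.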

If you're on Windows, the Microsoft DirectShow API may be the way to go. You should find that it is capable of handling audio and video, both static and streaming, in a variety of formats (you only need the necessary codecs, and the interface is virtually the same for all).
That said, DirectShow is unfortunately designed in a horribly intricate way and has a steep learning curve, but the power it offers is unparalleled if you're going to be doing audio/video manipulation on Windows. There are, however, a great number of samples and tutorials on how to use it, so it may not be so painful in the end. Also, if you're using the .NET Framework, there is a managed wrapper by the name of DirectShow.NET. It's not going to be an easy task whatever you do, unless there's something out there that I'm not aware of. Good luck with it anyway!

I approached a very similar problem, and after asking the right questions at various sources came up with the following...
Any worthy decoder will skip "bad" data until it hits a valid frame header. This is what ID3v2 relies upon to inject additional information into mp3 data. At the server, I'd go with analysis of source MP3 files to only serve valid MP3 frames. If you serve a few silent frames (about 7 should do it), the decoder should have time to settle before ramping up for the next load of (unassociated) MP3 data, avoiding the artefacts you (correctly) assume when concatenating frames from different encoding sessions.
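The silent-gap idea can be sketched as a generator over already-split frames that puts a run of silent frames between consecutive segments (stream, ad block, stream again). `silent_frame` is an assumption: one pre-encoded frame of digital silence made with the same encoder settings as the rest of the material; how you produce it is not covered here. The default of 7 is the answer's rough figure, not a measured value.

```python
from typing import Iterable, Iterator, List


def join_with_gaps(segments: Iterable[List[bytes]], silent_frame: bytes,
                   gap: int = 7) -> Iterator[bytes]:
    """Yield each segment's frames, inserting `gap` silent frames between segments."""
    for i, segment in enumerate(segments):
        if i > 0:
            for _ in range(gap):
                yield silent_frame      # lets the decoder settle before unrelated data
        yield from segment
```

Putting the gap between every pair of segments covers the switch back from the ads to the live stream as well as the switch into them.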
More problematic is the possible switching of MP3 attributes (1/2 channels, output sample rate, etc.) from one frame to the next. Some decoders get quite upset when confronted with such a stream, resulting in half-speed playback and the like. So you need to ensure that all your source material is encoded with the same output attributes, otherwise you may come unstuck.
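Checking that the ads match the stream can be done on the frame headers alone. A Python sketch that compares the fields this answer warns about (MPEG version, layer, sample-rate index, mono vs. two channels) across every frame; bitrate is deliberately excluded, since it legitimately varies in VBR material:

```python
from typing import Iterable, Tuple


def stream_attributes(header: bytes) -> Tuple[int, int, int, bool]:
    """Header fields that should not change mid-stream (bitrate intentionally omitted)."""
    h = int.from_bytes(header[:4], "big")
    version = (h >> 19) & 0x3
    layer = (h >> 17) & 0x3
    sample_rate_index = (h >> 10) & 0x3
    mono = (h >> 6) & 0x3 == 0x3        # channel mode 11 = single channel
    return version, layer, sample_rate_index, mono


def compatible(stream_frames: Iterable[bytes], ad_frames: Iterable[bytes]) -> bool:
    """True if every frame in both sequences shares the same attributes."""
    attrs = {stream_attributes(f) for f in stream_frames}
    attrs |= {stream_attributes(f) for f in ad_frames}
    return len(attrs) == 1
```

Running a check like this once, when an ad file is added to the library, is cheaper than doing it per listener at broadcast time.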
You may have seen this already, but if not:
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=79&printer=t

I don't see why you would want to concatenate the files. Why don't you use some sort of playlist system and just change which file you're sending? I would think this would allow more flexibility in the long run, and you wouldn't end up with large MP3 files.
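A playlist approach can be as simple as a generator that reads each file in turn and hands fixed-size chunks to the network layer, so "switching" is just moving to the next path. A minimal Python sketch, assuming a per-listener list of file paths; note that it switches on chunk boundaries, not frame boundaries, so the splice concerns from the question still apply:

```python
from typing import Iterable, Iterator


def playlist_chunks(paths: Iterable[str], chunk_size: int = 4096) -> Iterator[bytes]:
    """Stream the files in order as chunks of at most `chunk_size` bytes."""
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk
```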

Related

Realtime meta-data/ captioning for live streamed audio

How might I achieve adding a track of accurately aligned real-time "additional" data with live-streamed audio? Primarily interested in the browser here, but ideally the solution would be possible with any platform.
The idea is, if I have a live recording from my computer being sent into Icecast via something like DarkIce, I want a listener (who could join a stream at any time) to be able to place some kind of annotation over a few of the samples and allow them to send only the annotation back (for example, using a regular HTTP request). However, this needs a mechanism to align the annotation with the dumped streamed audio at the server side, and in a live stream, the user AFAIK can't actually get the timestamp in the "whole" stream, just from when they joined. But if there was some kind of simultaneously aligned metadata, then perhaps this would be possible.
The problem is, most systems seem to assume you "pre-caption" or multiplex your data streams beforehand. However, this wouldn't make sense for something being recorded and live-streamed in real time. Google's examples seem to be mostly about their ability to do "live captioning", which is more about processing audio in real time and then adding slightly delayed captions using speech recognition. This isn't what I'm after. I've looked into various ways data is put into OGG containers, as well as current captioning formats like WebVTT, and I am struggling to find examples of this.
I found maybe a hint here: https://github.com/w3c/webvtt/issues/320 and I've been recommended to look for examples by Apple and Google using WebVTT for something along these lines, but I cannot find these demos. There's older tech as well (Kate, CMML, Annodex, etc.), but none of these are in use and they have been completely replaced by WebVTT. Perhaps I can achieve something like this with WebRTC, but I'm not sure it gives any guarantees on alignment, and it's a slightly different technology stack than I am looking at in this scenario.

Large file size download

I am looking into making a system for work where you can download huge video files (I'm talking 4K full-length videos, which sometimes have a file size of 500GB), and I'm looking into the best way of doing this.
Would it simply need a file manager to split the download? Or could I use BitTorrent?
Any suggestions?
BitTorrent can be used to do 1:1 transfers and has the benefit of hash-verifying the contents and being able to resume the transfer when it has been interrupted. But that can also be achieved with other tools such as rsync.
BitTorrent's strength is making the transfer between many nodes scalable and being able to work in a decentralized manner.
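The "hash-verify and resume" benefit mentioned above comes from hashing the file in fixed-size pieces: after an interruption, only pieces whose hashes don't match need to be fetched again. A Python sketch of that idea, reading from a stream so a 500GB file is never held in memory; the piece size and SHA-256 are choices made for this sketch, not what any particular tool uses:

```python
import hashlib
from typing import BinaryIO, List

PIECE_SIZE = 4 * 1024 * 1024  # 4 MiB, arbitrary for this sketch


def piece_hashes(stream: BinaryIO, piece_size: int = PIECE_SIZE) -> List[str]:
    """SHA-256 of each consecutive piece of the stream."""
    hashes = []
    while piece := stream.read(piece_size):
        hashes.append(hashlib.sha256(piece).hexdigest())
    return hashes


def pieces_to_refetch(partial: BinaryIO, expected: List[str],
                      piece_size: int = PIECE_SIZE) -> List[int]:
    """Indices of pieces that are missing, truncated, or corrupted locally."""
    have = piece_hashes(partial, piece_size)
    return [i for i, h in enumerate(expected) if i >= len(have) or have[i] != h]
```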

Is it possible to inject ID3 into an MP3 stream?

I'm making a relay audio stream server (like SHOUTcast relaying, but with customization) in PHP.
Is it possible to dynamically add ID3 tags every specified chunk of data (maybe every second, or every 64KB)?
If it's possible, how do I do it?
ID3 tags occur at the beginning of an MP3, but an MP3 is just a series of frames (which is why you can cut one with, say, mp3splt without re-encoding). So the stream would be ID3 tags followed by MP3 data, and then the same pattern would repeat for the next part of the stream.
Clearly I'm ignoring a lot of the details.
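The "tag, then frames, then tag again" layout can be sketched by building a minimal ID3v2.3 tag containing only a title (`TIT2`) frame and emitting it every N audio frames. This is Python rather than the asker's PHP, and `title_for` is a hypothetical callback; whether a given player actually picks up tags that appear mid-stream varies by player. At 44.1 kHz an MPEG-1 Layer III stream has about 38 frames per second (44100 / 1152), so `every=38` approximates "once a second".

```python
from typing import Callable, Iterable, Iterator


def syncsafe(n: int) -> bytes:
    """Encode a tag size as 4 bytes of 7 bits each, as the ID3v2 header requires."""
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def id3v23_title_tag(title: str) -> bytes:
    """Minimal ID3v2.3 tag holding a single TIT2 (title) frame."""
    text = b"\x00" + title.encode("latin-1")  # encoding byte 0 = ISO-8859-1
    frame = b"TIT2" + len(text).to_bytes(4, "big") + b"\x00\x00" + text  # v2.3 frame size is plain big-endian
    return b"ID3" + b"\x03\x00" + b"\x00" + syncsafe(len(frame)) + frame


def interleave_tags(frames: Iterable[bytes], title_for: Callable[[int], str],
                    every: int) -> Iterator[bytes]:
    """Emit a fresh tag before every `every`-th audio frame (hypothetical `title_for`)."""
    for i, frame in enumerate(frames):
        if i % every == 0:
            yield id3v23_title_tag(title_for(i))
        yield frame
```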

iPhone web-app: HTML5 database and audio files

I'm having issues with audio files in an iPhone web app. It seems that each time an audio file is played, it's loaded first and then played, even when repeating the same audio on a page that hasn't been refreshed (done via JavaScript). From what I've researched, manifest files would be great, but they are for offline applications. I'm now researching HTML5 databases.
Does anyone know if HTML5 databases can store audio files such as MP3s? The goal is then to pull the MP3 from the database. It might still have to load the file from the database each time, but I'm hoping that's quicker than retrieving it from a server.
Thank you.
I think what you are after is possible; however, you have a significant hurdle in that the implementation of HTML5 databases in most browsers is limited to 5 MB, as per the W3C recommendation:
A mostly arbitrary limit of five megabytes per origin is recommended.
Having said that, the way it's implemented in iPhone Safari is that databases can grow until they reach 5MB in size, at which point the browser will ask the user if they wish to allow the extra size, asking again at 10, 50, 100 and 500MB (see section "Estimated Database Size" in this post by html5doctor).
There is no limit on the number of databases you can build per domain in Safari; however, according to this post by Cantina Consulting, you can have a total of 50MB across all databases in a single domain.
Given these parameters, a possible workaround for this implementation is to split your MP3 blobs across multiple databases, creating a new database each time you reach 4.9MB. However, even if you follow this design it may not be ideal, as you will still experience the following:
50MB is not a lot of audio files. A typical 5-6 minute song is about 5MB at 128 kbps, so that only gives you space for about one CD (60 min) of MP3 songs; after this you will need user cooperation to use additional database space.
You will still have significant security issues trying to play the MP3 blobs from the JavaScript runtime. It may be possible to bypass these by tricking Flash into thinking they are an MP3 stream, but I'm not sure how you'd go about it.
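The storage figures above are simple arithmetic: bytes × 8 / bitrate gives seconds of audio. A Python sketch of that calculation plus the split-at-4.9MB workaround, assuming decimal megabytes (1 MB = 1,000,000 bytes) and constant-bitrate audio; the function names are illustrative only:

```python
from typing import List


def minutes_of_audio(storage_bytes: int, bitrate_kbps: int) -> float:
    """How many minutes of constant-bitrate audio fit in the given space."""
    return storage_bytes * 8 / (bitrate_kbps * 1000) / 60


def split_blob(blob: bytes, max_chunk: int = 4_900_000) -> List[bytes]:
    """Cut a blob into pieces that each stay under the per-database threshold."""
    return [blob[i:i + max_chunk] for i in range(0, len(blob), max_chunk)]
```

At 128 kbps, 50,000,000 bytes works out to about 52 minutes, which matches the "about one CD" estimate.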
Feel free to have a play around with this iPhone HTML5 SQL Client I put together, you may want to use something similar for experimenting with your local mp3 Database.

Streaming short sound files

I have a script that generates wave files, based on user input.
I want to be able to stream those wave files online (not necessarily as wave files; they can be converted on the fly to MP3 or whatever). Preferably through an embedded Flash streamer, but an HTML5 version would be good too.
The files are generally small, around 5 seconds long, and I'd like to be able to stream multiple files in one session.
Does anyone know how I should go about implementing this?
With such short audio clips I would not bother with a 'real' streaming technology, but just serve them up via HTTP as static files as quickly as the network connection will allow. A quick look at my iTunes library indicates that a 5s 128kbps 44kHz stereo file is between 120-250KB. Pretty small. If you are talking about 32kbps mono, then maybe the sizes will be a mere 15-30KB.
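The raw audio payload of an MP3 is just bitrate × duration / 8, which gives a quick lower bound on file size: 5 s comes to roughly 80 KB at 128 kbps and 20 KB at 32 kbps. Real files can be larger because of tags and other metadata. A tiny Python helper for the calculation:

```python
def mp3_payload_bytes(seconds: float, bitrate_kbps: float) -> int:
    """Approximate audio payload size: bits per second x seconds / 8 bits per byte."""
    return round(seconds * bitrate_kbps * 1000 / 8)


for kbps in (128, 32):
    print(f"5 s at {kbps} kbps ~ {mp3_payload_bytes(5, kbps) / 1000:.0f} KB")
```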
Encoding on-the-fly may result in undesirable issues, like scaling (CPU load from all those encoding jobs, some of which will be duplicate), latency (setting up the encoding, the actual encoding), and you won't know the end file size which can cause problems. So, setting up a caching system may make more sense.
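One way to set up such a cache is to key the encoded output on a hash of the generated WAV data, so identical requests reuse the same file and the final size is known before serving. A Python sketch; `encode` is a hypothetical callable (for example, a wrapper around whatever MP3 encoder you use), and the cache layout is an assumption:

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_encode(wav: bytes, encode: Callable[[bytes], bytes], cache_dir: Path) -> bytes:
    """Return the encoded clip, running `encode` only on a cache miss."""
    key = hashlib.sha256(wav).hexdigest()
    path = cache_dir / f"{key}.mp3"
    if path.exists():
        return path.read_bytes()
    data = encode(wav)
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)                   # rename into place (atomic on POSIX) so readers never see a partial file
    return data
```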
I use wpaudioplayer to stream MP3s from my website (Example). It was originally made as a WordPress plugin but can be used as standalone JavaScript.
I believe that it can play wave files as well as MP3s. If you do end up converting them before serving them I would suggest that you would