In an A/V stream, is the amount of data streamed constant or fluctuating?

The amount of activity in an A/V stream can vary. For instance, if the data being streamed is from an empty, silent room, there is much less going on than if the data is something like a loud and explosive video game.
What I am wondering is whether the actual amount of data going up and down differs depending on this subjective interpretation of "activity". In other words, am I downloading less data when watching a stream of the empty room versus the active video game? My hunch has always been a resounding "no"; after all, how would the program know the difference between the two?
I'm asking now, though, because I've noticed a difference when streaming video in the past. The video always seems to be fine during periods of subjectively "low" activity, and it begins to lag or skip during periods of "high" activity. Is this just coincidence, or is there actually some kind of algorithm or service in place which dilutes data in periods of low activity or something like that?

Well, the thing is that audio and video streams are compressed. They can be compressed with any one of a whole range of formats. Some formats will aim for a % reduction in size, some will set a quality value, others will perform the same steps whether the data is simple or complex.
Take for example the JPEG and PNG formats. Open up your favourite editor and create a 640x480px image filled with pure white. Now save that file and look at its size. Then apply noise to the image and save it as a new file. Compare the two and see the huge difference in size.
I got 1.37 kB for the white image and 331 kB for the noisy one. (A single 8x8 or 16x16 tile can be repeated for the entire white image, while unique 8x8 or 16x16 blocks must be used for the noisy one.)
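If you want to see the same effect outside an image editor, here is a minimal C sketch using zlib's DEFLATE (the same compressor PNG uses internally) on raw pixel-like bytes. The buffer sizes are just illustrative:

/* Compress a uniform buffer vs. a random one with zlib's DEFLATE.
   Build: cc demo.c -lz */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int main(void)
{
    const uLong srcLen = 640 * 480;          /* one 8-bit channel, 640x480 */
    unsigned char *white = malloc(srcLen);   /* all-white "image" data */
    unsigned char *noise = malloc(srcLen);   /* random "noisy" data */
    for (uLong i = 0; i < srcLen; i++) {
        white[i] = 0xFF;
        noise[i] = (unsigned char)rand();
    }

    uLongf dstLen = compressBound(srcLen);   /* worst-case output size */
    unsigned char *dst = malloc(dstLen);

    uLongf n = dstLen;
    compress(dst, &n, white, srcLen);
    printf("white: %lu -> %lu bytes\n", srcLen, n);

    n = dstLen;
    compress(dst, &n, noise, srcLen);
    printf("noise: %lu -> %lu bytes\n", srcLen, n);

    free(white); free(noise); free(dst);
    return 0;
}

The uniform buffer shrinks to a few hundred bytes while the noisy one stays close to its original size, for exactly the reason described above: there is simply less information to convey.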
VBR (variable bit rate) and CBR (constant bit rate) are two frequently used terms in video transcoding (changing from one format to another).
Anyway - the answer is 'it depends on the format' - some formats do work like that, some don't.
The video card is always sending the same quantity of data to the screen each frame, even if there is very little information in it - it's uncompressed. Transmitted audio and video on the other hand are (almost) always compressed, so when there's less information, it takes less data to convey it.
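To put rough numbers on that uncompressed case: a 1920x1080 display at 24 bits per pixel refreshed 60 times a second works out to 1920 × 1080 × 3 bytes × 60 ≈ 373 MB every second over the video cable, no matter what is on screen.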

Related

Recovering and Reordering Lost Bytes

When an image is sent to an application (e.g. WhatsApp) over the network, the image is compressed to an extent.
How can I recover these lost bytes, and when I do, how can I regain the order in which they originally were?
The use case for this is in the application of steganography: if I encode a message into a PNG, send it over WhatsApp, and download it back (it comes back as a JPEG in WhatsApp's case), then convert it back to PNG, I cannot seem to decode the message again as I would with the picture that never went over the network.
You're dealing with a noisy channel, which may intentionally or unintentionally alter your data in transit, so you need to ensure your algorithm is robust to that. In this case you want an algorithm robust to lossy recompression, assuming nothing else takes place, e.g., resizing, cropping, etc.
I would start with a literature review to find an algorithm that fits any other lower-priority criteria you may have. Keep in mind that the algorithm will probably end up being more complex than simply altering pixel values directly (which can be done in a few lines of code), especially if the algorithm is only applicable to JPEG images. It is also likely to implement some kind of error correction, which will decrease your message capacity.
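As a toy illustration of that error-correction trade-off (not a recompression-robust scheme in itself; real ones typically embed in DCT coefficients and use stronger codes), here is a C sketch of a repetition code, which spends 3x your capacity to survive isolated bit flips:

/* Toy repetition code: each payload bit is stored three times and
   recovered by majority vote. */
#include <stdio.h>

/* Encode one payload bit as three channel bits. */
static void encode_bit(int bit, int out[3])
{
    out[0] = out[1] = out[2] = bit;
}

/* Decode three (possibly corrupted) channel bits by majority vote. */
static int decode_bit(const int in[3])
{
    return (in[0] + in[1] + in[2]) >= 2;
}

int main(void)
{
    int channel[3];
    encode_bit(1, channel);
    channel[2] = 0;                                   /* one corrupted bit */
    printf("recovered: %d\n", decode_bit(channel));   /* still prints 1 */
    return 0;
}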

How is it possible to encode black/white picture into ".wav"-file?

How is it possible to encode a black/white picture into a ".wav" file? I know that it is possible for sure with the help of steganography, but I don't know its algorithms. What algorithms exist? And what books/sources are the best for understanding their principles?
Edited:
Actually, I have a stereo WAV file. My task is to decode pictures from it. The task says that frequencies of the left channel give the X coordinate and frequencies of the right channel give the Y coordinate of a Cartesian coordinate system. These points compose a picture containing a text message. So I must write a program for this, and I have no idea what I should do.
Probably the simplest version of steganography using a wav file would be to use 16-bit samples in the wave file, but only dedicate the 15 most significant bits to sound. In the least significant bit of each sample, you'd encode one pixel of your black and white picture.
Regenerating the picture would require software to open the wave file, take the least significant bit from each sample, and put those bits back together with each other into (for example) a JPEG file.
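Here is a minimal C sketch of just that sample-level part, assuming the 16-bit PCM samples are already decoded into memory (real code would also parse and rewrite the WAV header):

/* Hide one message bit in the least significant bit of each 16-bit
   PCM sample, and recover the bits afterwards. */
#include <stdint.h>
#include <stddef.h>

/* Overwrite the LSB of each sample with one message bit. */
void embed_bits(int16_t *samples, size_t nsamples,
                const uint8_t *msg, size_t nbits)
{
    for (size_t i = 0; i < nsamples && i < nbits; i++) {
        int bit = (msg[i / 8] >> (i % 8)) & 1;
        samples[i] = (int16_t)((samples[i] & ~1) | bit);
    }
}

/* Collect the LSBs back into message bytes. */
void extract_bits(const int16_t *samples, size_t nbits, uint8_t *msg)
{
    for (size_t i = 0; i < nbits; i++) {
        if (i % 8 == 0) msg[i / 8] = 0;
        msg[i / 8] |= (samples[i] & 1) << (i % 8);
    }
}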
To put things into perspective, a CD has two channels containing 16-bit samples at a rate of 44.1 kHz, so you'd only need the LSBs from around 10 seconds of sound to encode a fairly typical full-color JPEG (e.g., 100 kB or so). A wave file of a typical ~3 minute pop song could hide around 15-20 full-color pictures pretty easily.
Edit (to reply to the edited question): This is a little tougher to deal with. An individual sample can't represent any frequency; it just represents the amplitude at a given point in time. To get frequency, you need a number of samples over a period of time -- and you need to know the exact period to convert.
Once you know that, you basically do an FFT on the samples. That will tell you the relative strengths of signal at all possible frequencies. Presumably, you'd pick the strongest one and scale appropriately. Do the same for the other channel and draw a pixel at that point.
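A sketch of that per-window analysis using the third-party FFTW library. The window size N is an assumption you'd tune to the task; you'd call this once per window for each channel and scale the two returned frequencies to screen coordinates:

/* Find the dominant frequency in one window of samples with FFTW.
   Build: cc decode.c -lfftw3 -lm */
#include <fftw3.h>

#define N 4096   /* analysis window length -- an assumption, tune as needed */

double dominant_freq(const double *window, double sample_rate)
{
    double *in = fftw_malloc(sizeof(double) * N);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * (N / 2 + 1));
    for (int i = 0; i < N; i++)
        in[i] = window[i];

    fftw_plan plan = fftw_plan_dft_r2c_1d(N, in, out, FFTW_ESTIMATE);
    fftw_execute(plan);

    /* Pick the bin with the strongest magnitude, skipping the DC bin. */
    int peak = 1;
    double peakMag = 0.0;
    for (int k = 1; k <= N / 2; k++) {
        double mag = out[k][0] * out[k][0] + out[k][1] * out[k][1];
        if (mag > peakMag) { peakMag = mag; peak = k; }
    }

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return peak * sample_rate / N;   /* bin index -> frequency in Hz */
}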
Your ears are not sensitive to small changes in a sound file.
WAV files are UNCOMPRESSED data, so it's just a file of 16- to 24-bit samples. Your ears cannot notice slight differences between bits. All you need to do is periodically inject bit values that represent an image into the data.
So if you insert one pixel for every 1000 data points, you can hide an image (without even encrypting it) in a wave file. If a user plays the file, they CANNOT hear it.
When you save the file, on your own computer or one far away, you can use a decoding tool that is aware of the hiding technique.

How to export sound from timeline of sounds on iOS with OpenAL

I'm not sure if it's possible to achieve what I want, but basically I have a NSDictionary which represents a recording. It's a timeline of what sound id was played at what point in time.
I have it so that you can play back this timeline/recording, and it works perfectly.
I'm wondering if there is any way to take this timeline and export it as a single sound that could be saved to a computer when the device is synced with iTunes.
So basically I'm asking if I can take a timeline of sounds, play it back and have these sounds stitched together as a single sound, that can then be exported.
I'm using OpenAL as my sound framework and the sound files are all CAFs.
Any help or guidance is appreciated.
Thanks!
You will need:
A good understanding of linear PCM audio format (See Wikipedia's Linear PCM page).
A good understanding of audio sample-rates and some basic maths to convert your timings into sample-offsets.
An awareness of how two's-complement binary numbers (signed/unsigned, 16-bit, 32-bit, etc.) are stored in computers, and how the endian-ness of a processor affects this.
Patience, interest in learning, and a strong desire to get this working.
Here's what to do:
Enable file sharing in your app (UIFileSharingEnabled=YES in info.plist and write files to /Documents directory).
Render the used sounds into memory buffers containing linear PCM audio data (if they are not already, i.e. if they are compressed). You can do this using the offline rendering functionality of Audio Queues (see Apple's audio queue docs). It will make things a lot easier if you render them all to the same PCM format and sample rate (for example 16-bit signed samples @ 44,100 Hz; I'll use this format for all examples), and use the same format for your output. I recommend starting off with a mono format, then adding stereo once you get it working.
Choose an uncompressed output format and mix your sounds into a single stream:
3.1. Allocate a buffer large enough, or open a file stream to write to.
3.2. Write out any headers (for example if using WAV format output instead of raw PCM) and write zeros (or the mid-point of your sample range if not using a signed sample format) for any initial silence before your first sound starts. For example, if you want 0.1 seconds of silence before your first sound, write 4410 (0.1 * 44100) zero-samples, i.e. write 4410 shorts (16-bit), all zero.
3.3. Now keep track of all 'currently playing' sounds and mix them together. Start with an empty list of 'currently playing' sounds and keep track of the 'current time' of the sample you are mixing; for each sample you write out, increment the 'current time' by 1.0/sample_rate. When it is time for another sound to start, add it to the 'currently playing' list with a sample offset of 0. To do the mixing, iterate through all of the 'currently playing' sounds, add together their current samples, then increment the sample offset for each of them. Write the summed value into the output buffer. For example, if soundA starts at 0.1 seconds (after the silence) and soundB starts at 0.2 seconds, you will be doing the equivalent of output[8820] = soundA[4410] + soundB[0]; for sample 8820, then output[8821] = soundA[4411] + soundB[1]; for sample 8821, and so on. As a sound ends (you get to the end of its samples), simply remove it from the 'currently playing' list and keep going until the end of your audio data. (A condensed sketch of this loop appears after step 3.4.)
3.4. The simple mixing (sum of samples) described above does have some problems. For example, if two samples have values that add up to a number larger than 32767, the result cannot be stored in a signed 16-bit number; this is called clipping. For now, just clamp the value to 32767 and get it working... later on, come back and implement a simple limiter (see the description at the end).
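Here is a condensed, hypothetical C sketch of steps 3.2-3.4 for mono 16-bit output. The Sound struct and names are made up, and for brevity it scans every sound per sample instead of maintaining a running 'currently playing' list:

/* Mix a timeline of sounds into one mono 16-bit stream with clamping. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    const int16_t *samples;   /* decoded PCM for one sound */
    size_t length;            /* number of samples in it */
    size_t startSample;       /* timeline position where it begins */
} Sound;

void mix_timeline(const Sound *sounds, size_t nsounds,
                  int16_t *out, size_t outLen)
{
    for (size_t t = 0; t < outLen; t++) {
        int32_t sum = 0;                     /* wider than 16 bits */
        for (size_t s = 0; s < nsounds; s++) {
            if (t >= sounds[s].startSample &&
                t <  sounds[s].startSample + sounds[s].length)
                sum += sounds[s].samples[t - sounds[s].startSample];
        }
        if (sum >  32767) sum =  32767;      /* clamp, as in step 3.4 */
        if (sum < -32768) sum = -32768;
        out[t] = (int16_t)sum;
    }
}

Samples before a sound's startSample and after its end contribute nothing, which gives you the initial silence from step 3.2 for free.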
Now that you have a mixed version of your track in an uncompressed linear PCM format, that might be enough, so write it to /Documents. If you want to write it in a compressed format, you will need to get the source for an audio encoder and run your linear PCM output through that.
Simple limiter:
Let's choose to limit the top 10% of the sample range, so if the absolute value is greater than 29490 (int limitBegin = (int)(32767 * 0.9f);) we will scale down the value. The maximum possible peak would be int maxSampleValue = 32767 * numPlayingSounds; and we want to scale values above limitBegin to peak at 32767. So do the summation into sampleValue as per the very simple mixer described above, then:
if (sampleValue > limitBegin)
{
    /* How far into the limited zone is this sample, as a 0..1 fraction? */
    float overLimit = (sampleValue - limitBegin) / (float)(maxSampleValue - limitBegin);
    /* Map that fraction onto the remaining headroom so the worst-case
       peak lands exactly at 32767. */
    sampleValue = limitBegin + (int)(overLimit * (32767 - limitBegin));
}
If you're paying attention, you will have noticed that when numPlayingSounds changes (for example when a new sound starts), the limiter becomes more (or less) harsh and this may result in abrupt volume changes (within the limited range) to accommodate the extra sound. You can use the maximum number of playing sounds instead, or devise some clever way to ramp up the limiter over a few milliseconds.
Remember that this is operating on the absolute value of sampleValue (which may be negative in signed formats), so the code here is just to demonstrate the idea. You'll need to write it properly to handle limiting at both ends (peak and trough) of your sample range. Also, there are some tricks you can do to optimize all of the above during the mixing - you will probably spot these while you're writing the mixer, be careful and get it working first, then go back and refactor/optimize if needed.
Also remember to consider the endian-ness of the platform you are using and the file-format you are writing to, as you may need to do some byte-swapping.
One approach which isn't too hard, if your files are stored in a simple format, is just to combine them together manually. That is, create a new file in the CAF format and manually put together the pieces you want.
This will be really easy if the sounds are uncompressed (linear PCM). But first read the documentation on the CAF file format here:
http://developer.apple.com/library/mac/#documentation/MusicAudio/Reference/CAFSpec/CAF_spec/CAF_spec.html#//apple_ref/doc/uid/TP40001862-CH210-SW1

Mixing sound files of different size

I want to mix audio files of different sizes into one single .wav file without clipping any file, i.e. the resulting file's size should be equal to that of the largest file of all.
There is a sample through which we can mix files of the same size (Example 4 at http://www.modejong.com/iOS/#ex4).
I modified the code to get the mixed file as a .wav file.
But I am not able to understand how to modify this code for unequal-sized files.
If someone can help me out with a code snippet, I'll be really thankful.
It should be as easy as sending all the files to the mixer simultaneously. When any single file gets to the end, just treat it as if the remainder is filled with zeroes. When all files get to the end, you are done.
Note that the example code says it returns an error if there would be clipping (the sum of the waves is greater than the maximum representable value). This condition is more likely if you are mixing multiple inputs. The best way around it is to create some "headroom" in the input waves. You can either do this in preprocessing, by ensuring that each wave's volume is no more than X% of maximum (~80-90%, depending on the number of inputs), or do it dynamically in the mixer code by multiplying each sample by some value <1.0 as you add it to the mix.
If you are selecting the waves to mix at runtime and failure due to clipping is unacceptable, you will need to modify the sample code to pin the values at max/min instead of returning an error. Don't just let them overflow or you will get noisy artifacts.
(Clipping creates artifacts as well, but when you haven't created enough headroom before mixing, it is definitely preferable to overflow. It is a more familiar-sounding type of distortion, similar to what you get when you overdrive your speakers.) See the Wikipedia article on clipping:
Clipping is preferable to the alternative in digital systems—wrapping—which occurs if the digital hardware is allowed to "overflow", ignoring the most significant bits of the magnitude, and sometimes even the sign of the sample value, resulting in gross distortion of the signal.
How I'd do it:
Much like the mix_buffers function that you linked to, but pass in two parameters for mixbufferNumSamples. Iterate over the whole of the longer of the two buffers; once the index has gone beyond the end of the shorter buffer, simply set the sample from that buffer to 0 for the rest of the function (see the sketch below).
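Something like this hypothetical C sketch, which also pins the summed values at the 16-bit limits as suggested above:

/* Mix two 16-bit buffers of different lengths; past the end of the
   shorter one, treat its samples as zero. out must hold max(lenA, lenB). */
#include <stdint.h>
#include <stddef.h>

void mix_unequal(const int16_t *a, size_t lenA,
                 const int16_t *b, size_t lenB, int16_t *out)
{
    size_t total = lenA > lenB ? lenA : lenB;
    for (size_t i = 0; i < total; i++) {
        int32_t sa = i < lenA ? a[i] : 0;    /* zero-pad the short buffer */
        int32_t sb = i < lenB ? b[i] : 0;
        int32_t sum = sa + sb;
        if (sum >  32767) sum =  32767;      /* pin instead of overflowing */
        if (sum < -32768) sum = -32768;
        out[i] = (int16_t)sum;
    }
}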
If you must avoid clipping and do it in real-time and you know nothing else about the two sounds, you must provide enough headroom. The simplest method is by halving each of the samples before mixing:
mixed = s1/2 + s2/2;
This ensures that the resultant mixed sample won't overflow an int16_t. It will have the side effect of making everything quieter though.
If you can run it offline, you can calculate a scale factor to apply to both waveforms which will keep the peaks when summed below the maximum allowed value.
Or you could mix them all at full volume to an int32_t buffer, keeping track of the largest (magnitude) mixed sample and then go back through the buffer multiplying each sample by a scale factor which will make that extreme sample just reach the +32767/-32768 limits.
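A hypothetical sketch of that offline pass, assuming you have already summed everything at full volume into an int32_t buffer:

/* Find the peak magnitude of the 32-bit mix, then scale everything so
   that extreme sample just reaches the 16-bit limits. */
#include <stdint.h>
#include <stddef.h>

void normalize_mix(const int32_t *mix, size_t len, int16_t *out)
{
    int64_t peak = 1;                        /* avoid division by zero */
    for (size_t i = 0; i < len; i++) {
        int64_t mag = mix[i] < 0 ? -(int64_t)mix[i] : mix[i];
        if (mag > peak) peak = mag;
    }
    /* Only scale down; a quiet mix is left untouched. */
    double scale = peak > 32767 ? 32767.0 / (double)peak : 1.0;
    for (size_t i = 0; i < len; i++)
        out[i] = (int16_t)(mix[i] * scale);
}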

How do you use afconvert to convert from wav to aac caf WITHOUT RESAMPLING

I'm making an iPhone game. We need to use a compressed format for sound, and we want to be able to loop SEAMLESSLY back to a specific sample in the audio file (so there is an intro, then it loops back to an offset).
Currently THE ONLY export process I have found that allows seamless looping (reports the right priming and padding frame numbers, no clicking when looping, etc.) is using Apple's afconvert to an AAC format in a CAF file.
But when we try to encode to lower bitrates, it automatically resamples the sound! We do NOT want the sound resampled; every other encoder I have encountered has an option to set the output sample rate, but I can't find it for this one.
On another note, if anyone has had any luck with seamless looping of a compressed file format using audio queues, let me know.
Currently I'm working off the information found at:
http://developer.apple.com/mac/library/qa/qa2009/qa1636.html
Note that this DID work PERFECTLY when I left the bitrate for the encode at the default (~128kbps), but when I set it to 32kbps with the -b option, it resampled, and looping clicks now.
The bit rate needs to be at least 48kbps; at 32kbps, afconvert will downsample to a lower sample rate.
I think you are confusing sample rate (typical values: 32kHz, 44.1kHz, 48kHz) and bit rate (typical values: 128kbps, 160kbps, 192kbps).
For a bit rate, 32kbps is extremely low; sound will have bad quality at this bit rate. You probably intended to set the sample rate to 32kHz instead, which is not a typical value either, but makes more sense.
When compressing to AAC and uncompressing back to WAV, you will not get the same audio file back, because in AAC the audio data is represented in a completely different format than in raw wave. E.g. you can have shifts of a few microseconds, which are necessary to convert to the compressed format. You cannot completely get around this with any highly compressed format.
The clicking sound originates from the sudden change between two samples which are played in direct succession. This is likely taking place because the offset to which you jump back in your loop does not end up at exactly the same position in the AAC file as it was in the WAV file (as explained above, there can be shifts of microseconds).
You will not get around these slight changes when compressing. Instead, you have to compensate for them after compression by adjusting the offset. That means you have to open the compressed sound file in an audio editor, e.g. Audacity, and manually find another offset close to the original one, which is suitable for looping.
How to find an offset which is suitable for looping?
Zoom in to the waveform's end. Look at how the waveform looks there. Then zoom in to the waveform at the original offset and search in its neighbourhood for an offset at which the waveform connects seamlessly to the end of the waveform.
For an example of how this should look, open the uncompressed audio file in the audio editor and examine the end of the waveform and the offset there.