Recording playback and mic on iPhone

In iPhone SDK 4.3 I would like to record what is being played out through the speaker via Remote IO and also record the mic input. I was wondering if the best way is to record each separately to a different channel in an audio file. If so, which APIs allow me to do this, and what audio format should I use? I was planning on using ExtAudioFileWrite to do the actual writing to the file.

If both of your tracks are mono, 16-bit integer, with the same sample rate:
format->mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
format->mBitsPerChannel = 16;
you can combine them into 2-channel PCM by alternating a sample from one track with a sample from the other:
[short1_track1][short1_track2][short2_track1][short2_track2] and so on.
After that you can write these samples to the output file using ExtAudioFileWrite. That file should of course be 2-channel kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked.
If one of the tracks is stereo (I don't think it is reasonable to record stereo from the iPhone mic), you can convert it to mono by averaging the 2 channels or by skipping every second sample.
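As a rough illustration of the interleaving and the averaging downmix described above, here is a minimal sketch in plain C. The function names `interleave_mono_pair` and `stereo_to_mono_average` are made up for this example and are not Core Audio APIs; it assumes both tracks already hold the same number of 16-bit signed samples at the same sample rate.

```c
#include <stddef.h>
#include <stdint.h>

/* Interleave two mono 16-bit tracks into one stereo buffer:
   out = [t1[0]][t2[0]][t1[1]][t2[1]]...
   `out` must have room for 2 * frames samples. (Hypothetical helper.) */
void interleave_mono_pair(const int16_t *track1, const int16_t *track2,
                          int16_t *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        out[2 * i]     = track1[i];  /* first channel  */
        out[2 * i + 1] = track2[i];  /* second channel */
    }
}

/* Downmix interleaved stereo to mono by averaging each pair of samples.
   The sum is done in 32 bits so it cannot overflow. (Hypothetical helper.) */
void stereo_to_mono_average(const int16_t *stereo, int16_t *mono, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        int32_t sum = (int32_t)stereo[2 * i] + (int32_t)stereo[2 * i + 1];
        mono[i] = (int16_t)(sum / 2);
    }
}
```

The interleaved buffer produced by the first function is what you would then hand to ExtAudioFileWrite for a 2-channel file.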

You can separately save PCM data from the play and record callback buffers of the RemoteIO Audio Unit, then mix them using your own mixer code (DSP code) before writing the mixed result to a file.
You may or may not need to do your own echo cancellation (advanced DSP code) as well.


What is the content of an .aac audio file?

I may sound like a rookie, so please excuse me. When I read an .aac audio file in MATLAB using the audioread function, the output is a 256000x6 matrix. How do I know what the content of each column is?
filename = 'sample1.aac';
[y,Fs] = audioread(filename,'native');
When I write out the first column using audiowrite, I can hear the whole sound. So what are the other columns?
Output Arguments
y - Audio Data
Audio data in the file, returned as an m-by-n matrix, where m is the number of audio samples read and n is the number of audio channels in the file.
If you can hear the entire file in the first channel, it just means most of that file is contained in a single (mono) channel. From Wikipedia, regarding AAC audio channels:
AAC supports inclusion of 48 full-bandwidth (up to 96 kHz) audio channels in one stream plus 16 low frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams

CoreAudio Audio Unit plays only one channel of stereo audio

Recently I ran into the following problem.
I use a CoreAudio AudioUnit (RemoteI/O) to play and record a sound stream in an iOS app.
The sound stream that goes into the audio unit is 2-channel LPCM, 16-bit, signed integer, interleaved (I also configure an output recording stream which is basically the same but has only one channel and 2 bytes per packet and frame).
I have configured my input ASBD as follows (I get no error when I set it or when I initialize the unit):
ASBD.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
ASBD.mBytesPerPacket = 4;
ASBD.mFramesPerPacket = 1;
ASBD.mBytesPerFrame = 4;
ASBD.mChannelsPerFrame = 2;
ASBD.mBitsPerChannel = 16;
In my render callback function I get an AudioBufferList with one buffer (as I understand it, because the audio stream is interleaved).
I have a sample stereo file for testing which is 100% stereo with 2 obviously distinct channels. I convert it into a stream that matches the ASBD and feed it to the audio unit.
When I play the sample file I hear only the left channel.
I would appreciate any ideas why this happens. If needed I can post more code.
Update: I've tried setting the ASBD to
ASBD.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsNonInterleaved;
ASBD.mBytesPerPacket = 2;
ASBD.mFramesPerPacket = 1;
ASBD.mBytesPerFrame = 2;
ASBD.mChannelsPerFrame = 2;
ASBD.mBitsPerChannel = 16;
and I got a buffer list with two buffers. I deinterleaved my stream into 2 channels (1 channel per buffer) and got the same result. I tried with a headset and with the speaker on an iPad (I know that the speaker is mono).
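The byte counts in the two ASBD variants above (4 for interleaved, 2 for non-interleaved) follow from the channel count and bit depth. A small sketch of that arithmetic is shown here; `PcmFrameSizes` and `pcm_frame_sizes` are hypothetical names used only for illustration, standing in for the corresponding AudioStreamBasicDescription fields, and the code assumes packed linear PCM with one frame per packet.

```c
#include <stdint.h>

/* Stand-in for the mBytesPerFrame / mBytesPerPacket fields of an ASBD.
   (Hypothetical type, not part of Core Audio.) */
typedef struct {
    uint32_t bytesPerFrame;
    uint32_t bytesPerPacket;
} PcmFrameSizes;

/* For interleaved PCM one frame holds a sample for every channel.
   For non-interleaved PCM each buffer carries a single channel, so the
   per-frame size describes one channel only. */
PcmFrameSizes pcm_frame_sizes(uint32_t channels, uint32_t bitsPerChannel,
                              int interleaved)
{
    uint32_t bytesPerSample = bitsPerChannel / 8;
    PcmFrameSizes sizes;
    sizes.bytesPerFrame  = interleaved ? channels * bytesPerSample
                                       : bytesPerSample;
    sizes.bytesPerPacket = sizes.bytesPerFrame;  /* 1 frame per packet */
    return sizes;
}
```

With 2 channels at 16 bits this gives 4 bytes per frame when interleaved and 2 bytes per frame when non-interleaved, matching both configurations in the question.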
OK. I checked my code and spotted that I use the VoiceProcessingIO audio unit (instead of RemoteIO, as stated in the question), which is basically correct for my app since the documentation says "The Voice-Processing I/O unit (subtype kAudioUnitSubType_VoiceProcessingIO) has the characteristics of the Remote I/O unit and adds echo suppression for two-way duplex communication. It also adds automatic gain correction, adjustment of voice-processing quality, and muting"
When I changed the audio unit type to RemoteIO I immediately got stereo playback. I didn't have to change the stream properties.
Basically, the VoiceProcessingIO audio unit falls back to mono and disregards the stream properties.
I've posted a question on the Apple Developer forum regarding stereo output using the VoiceProcessingIO audio unit but haven't got any answer yet.
It seems pretty logical to me to fall back to mono in order to do signal processing like echo cancellation, because iOS devices can record only mono sound without specific external accessories, although this is not documented anywhere in Apple's documentation. I've also come across a post by someone who claimed that stereo worked for the VoiceProcessingIO AU prior to iOS 5.0.
Anyway thanks for your attention. Any other comments on the matter would be greatly appreciated.

iOS: Bad Mic input latency measurement result

I'm running a test to measure the basic latency of my iPhone app, and the result was disappointing: 50ms for a play-through test app. The app just picks up mic input and plays it out using the same render callback, no other audio units or processing involved. Therefore, the results seemed too bad for such a basic scenario. I need some pointers to see if the result makes sense or I had design flaws in my test.
The basic idea of the test was to have three roles:
1. My finger snap as the reference sound source.
2. A simple iOS play-thru app (using the built-in mic) as the first listener to #1.
3. A Mac (with a USB mic and Audacity) as the second listener to #1 and the only listener to the iOS output (through a speaker connected via the iOS headphone jack).
Then, with Audacity in recording mode, the Mac would pick up both the sound from my fingers and its "clone" from the iOS speaker in close range. Finally I simply visually observe the waveform in Audacity's recorded track and measure the time interval between the peaks of the two recorded snaps.
This was by no means a super accurate measurement, but at least the innate latency of the Mac recording pipeline should have been cancelled out this way. So that the error should mainly come from the peak distance measurement, which I assume should be much smaller than the audio pipeline latency and can be ignored.
I was expecting 20ms or lower latency, but clearly the result gave me 50~60ms.
My ASBD uses kAudioFormatFlagsCanonical and kAudioFormatLinearPCM as format.
50 ms is about 4 ms more than the duration of 2 audio buffers (one output, one input) of size 1024 at a sample rate of 44.1 kHz.
17 ms is around 5 ms more than the duration of 2 buffers of length 256.
So it looks like the iOS audio latency is around 5 ms plus the duration of the two buffers (the audio output buffer duration plus the time it takes to fill the input buffer) ... on your particular iOS device.
A few iOS devices may support even shorter audio buffer sizes of 128 samples.
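To make the arithmetic behind this estimate explicit, here is a small sketch. The helper names are hypothetical, and the fixed overhead (roughly 5 ms in the estimate above) is passed in as a parameter because it is a figure inferred from this one measurement, not a documented constant.

```c
/* Duration of one audio buffer in milliseconds. (Hypothetical helper.) */
double buffer_duration_ms(unsigned frames, double sampleRate)
{
    return 1000.0 * (double)frames / sampleRate;
}

/* Rough play-through latency model from the answer above:
   one input buffer + one output buffer + a fixed overhead.
   The overhead is an assumption supplied by the caller. */
double estimated_round_trip_ms(unsigned frames, double sampleRate,
                               double fixedOverheadMs)
{
    return 2.0 * buffer_duration_ms(frames, sampleRate) + fixedOverheadMs;
}
```

For 1024-frame buffers at 44.1 kHz one buffer lasts about 23.2 ms, so two buffers plus 5 ms comes to roughly 51 ms; for 256-frame buffers the same model gives roughly 17 ms.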
You can use core audio and set up the audio session to have a very low latency.
You can set the buffer size to be smaller using AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,...
Using smaller buffers causes the audio callback to happen more often while grabbing smaller chunks of audio. Keep in mind that this is merely a suggestion to the audio system. iOS will pick a suitable callback interval based on your sample rate and a buffer size that is an integer power of 2.
Once you set the buffer duration, you can get the actual buffer duration that the system will use using AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,...
I'll summarize Paul R's comments as the answer, which has solved my problem:
50 ms corresponds to a total buffer size of around 2048 at a 44.1 kHz sample rate, which doesn't seem unreasonable given that you have both a record and a playback path.
I don't know that the buffer size is 2048, and there may be more than one buffer in your record-playback loopback test, but it seems that the effective total buffer size in your test is probably of the order of 2048, which doesn't seem unreasonable. Of course if you're only interested in record latency, as the title of your question suggests, then you'll need to find a way to tease that out separately from playback latency.

How to export sound from timeline of sounds on iOS with OpenAL

I'm not sure if it's possible to achieve what I want, but basically I have a NSDictionary which represents a recording. It's a timeline of what sound id was played at what point in time.
I have it so that you can play back this timeline/recording, and it works perfectly.
I'm wondering if there is anyway to take this timeline, and export it as a single sound that could be saved to a computer if the device was synced with iTunes.
So basically I'm asking if I can take a timeline of sounds, play it back and have these sounds stitched together as a single sound, that can then be exported.
I'm using OpenAL as my sound framework and the sound files are all CAFs.
Any help or guidance is appreciated.
You will need:
A good understanding of linear PCM audio format (See Wikipedia's Linear PCM page).
A good understanding of audio sample-rates and some basic maths to convert your timings into sample-offsets.
An awareness of how two's-complement binary numbers (signed/unsigned, 16-bit, 32-bit, etc.) are stored in computers, and how the endian-ness of a processor affects this.
Patience, interest in learning, and a strong desire to get this working.
Here's what to do:
1. Enable file sharing in your app (UIFileSharingEnabled=YES in info.plist and write files to /Documents directory).
2. Render the used sounds into memory buffers containing linear PCM audio data (if they are not already, i.e. if they are compressed). You can do this using the offline rendering functionality of Audio Queues (see Apple audio queue docs). It will make things a lot easier if you render them all to the same PCM format and sample rate (for example 16-bit signed samples @ 44,100 Hz; I'll use this format for all examples), and use the same format for your output. I recommend starting off with a mono format, then adding stereo once you get it working.
3. Choose an uncompressed output format and mix your sounds into a single stream:
3.1. Allocate a buffer large enough, or open a file stream to write to.
3.2. Write out any headers (for example if using WAV format output instead of raw PCM) and write zeros (or the mid-point of your sample range if not using a signed sample format) for any initial silence before your first sound starts. For example, if you want 0.1 seconds of silence before your first sound, write 4410 (0.1 * 44100) zero-samples, i.e. write 4410 shorts (16-bit) all with zero.
3.3. Now keep track of all 'currently playing' sounds and mix them together. Start with an empty list of 'currently playing' sounds and keep track of the 'current time' of the sample you are mixing; for each sample you write out, increment the 'current time' by 1.0/sample_rate. When it is time for another sound to start, add it to the 'currently playing' list with a sample offset of 0. Now to do the mixing, you iterate through all of the 'currently playing' sounds and add together their current samples, then increment the sample offset for each of them. Write the summed value into the output buffer. For example, if soundA starts at 0.1 seconds (after the silence) and soundB starts at 0.2 seconds, you will be doing the equivalent of output[8820] = soundA[4410] + soundB[0]; for sample 8820 and then output[8821] = soundA[4411] + soundB[1]; for sample 8821, etc. As a sound ends (you get to the end of its samples), simply remove it from the 'currently playing' list and keep going until the end of your audio data.
3.4. The simple mixing (sum of samples) described above does have some problems. For example, if two samples have values that add up to a number larger than 32767 (or smaller than -32768), the sum cannot be stored in a signed 16-bit number; this is called clipping. For now, just clamp the value to the 16-bit range (-32768 to 32767) and get it working... later on, come back and implement a simple limiter (see description at end).
4. Now that you have a mixed version of your track in an uncompressed linear PCM format, that might be enough, so write it to /Documents. If you want to write it in a compressed format, you will need to get the source for an audio encoder and run your linear PCM output through that.
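The mixing steps above could be sketched roughly as follows, assuming every sound has already been rendered to mono 16-bit signed PCM at one common sample rate. `TimelineSound`, `seconds_to_samples` and `mix_timeline` are hypothetical names for this example. For brevity the sketch checks every sound for each output sample instead of maintaining a separate 'currently playing' list, which is simpler but slower for long timelines.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the timeline. (Hypothetical type for illustration.) */
typedef struct {
    const int16_t *samples;  /* mono 16-bit signed PCM data            */
    size_t length;           /* number of samples in `samples`         */
    size_t startSample;      /* start time expressed in output samples */
} TimelineSound;

/* Convert a start time in seconds into a sample offset, rounded to nearest. */
size_t seconds_to_samples(double seconds, double sampleRate)
{
    return (size_t)(seconds * sampleRate + 0.5);
}

/* Clamp a 32-bit sum into the signed 16-bit range (simple clipping). */
static int16_t clamp16(int32_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Sum every sound that is active at each output sample and clamp the result.
   Samples where nothing is playing come out as zero (silence). */
void mix_timeline(const TimelineSound *sounds, size_t count,
                  int16_t *out, size_t outLength)
{
    for (size_t t = 0; t < outLength; t++) {
        int32_t sum = 0;
        for (size_t s = 0; s < count; s++) {
            if (t >= sounds[s].startSample &&
                t - sounds[s].startSample < sounds[s].length) {
                sum += sounds[s].samples[t - sounds[s].startSample];
            }
        }
        out[t] = clamp16(sum);
    }
}
```

The resulting `out` buffer is raw PCM; writing it behind a WAV or CAF header is a separate step.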
Simple limiter:
Let's choose to limit the top 10% of the sample range, so if the absolute value is greater than 29490 (int limitBegin = (int)(32767 * 0.9f);) we will scale down the value. The maximum possible peak would be int maxSampleValue = 32767 * numPlayingSounds; and we want to scale values above limitBegin to peak at 32767. So do the summation into sampleValue as per the very simple mixer described above, then:
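if (sampleValue > limitBegin) {
    /* fraction of the way from limitBegin to the maximum possible peak */
    float overLimit = (sampleValue - limitBegin) / (float)(maxSampleValue - limitBegin);
    /* map that fraction onto the remaining headroom below 32767 */
    sampleValue = limitBegin + (int)(overLimit * (32767 - limitBegin));
}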
If you're paying attention, you will have noticed that when numPlayingSounds changes (for example when a new sound starts), the limiter becomes more (or less) harsh and this may result in abrupt volume changes (within the limited range) to accommodate the extra sound. You can use the maximum number of playing sounds instead, or devise some clever way to ramp up the limiter over a few milliseconds.
Remember that this is operating on the absolute value of sampleValue (which may be negative in signed formats), so the code here is just to demonstrate the idea. You'll need to write it properly to handle limiting at both ends (peak and trough) of your sample range. Also, there are some tricks you can do to optimize all of the above during the mixing - you will probably spot these while you're writing the mixer, be careful and get it working first, then go back and refactor/optimize if needed.
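One possible way to handle both ends of the range is to limit the magnitude and then restore the sign, as in the sketch below. The function name `limit_sample` is hypothetical, and the scaling uses integer arithmetic instead of the float version above (a choice made for this example so the result is exactly reproducible; it truncates in the same direction).

```c
#include <stdint.h>

#define SAMPLE_MAX  32767
#define LIMIT_BEGIN 29490   /* (int)(32767 * 0.9f), top 10% of the range */

/* Limit a mixed 32-bit sum into the 16-bit range.
   Values with magnitude up to LIMIT_BEGIN pass through unchanged; larger
   magnitudes are scaled so the largest possible sum
   (32767 * numPlayingSounds) lands on 32767. */
int16_t limit_sample(int32_t sampleValue, int numPlayingSounds)
{
    if (numPlayingSounds < 1) {
        numPlayingSounds = 1;
    }
    int64_t maxSampleValue = (int64_t)SAMPLE_MAX * numPlayingSounds;

    /* Work on the magnitude so peaks and troughs are treated alike. */
    int negative = sampleValue < 0;
    int64_t magnitude = negative ? -(int64_t)sampleValue : (int64_t)sampleValue;
    if (magnitude > maxSampleValue) {
        magnitude = maxSampleValue;  /* e.g. sums built from -32768 samples */
    }

    if (magnitude > LIMIT_BEGIN) {
        magnitude = LIMIT_BEGIN +
            (magnitude - LIMIT_BEGIN) * (SAMPLE_MAX - LIMIT_BEGIN) /
            (maxSampleValue - LIMIT_BEGIN);
    }
    return (int16_t)(negative ? -magnitude : magnitude);
}
```

In a mixer you would call this on each summed sample in place of a hard clamp, passing either the current or the maximum number of playing sounds as discussed above.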
Also remember to consider the endian-ness of the platform you are using and the file-format you are writing to, as you may need to do some byte-swapping.
One approach which isn't too hard if your files are stored in a simple format is just to combine them together manually. That is, create a new file with the caf format and manually put together the pieces you want.
This will be really easy if the sounds are uncompressed (linear PCM). But, read the documents on the caf file format here:

iPhone: Change playback speed with Audio Units

What are the different ways to change the playback speed of audio on the iPhone, when using Audio Units? What are the advantages / disadvantages of each solution?
I have a mixer unit and an IO unit. Do I need to add another unit (eg. converter unit)? What audio unit parameters should I set, on which (input or output) bus on which audio unit(s)?
My current setup:
------------------------- -------------------------
| mixer unit | -----------> | IO unit |
------------------------- -------------------------
All of the below solutions will alter the pitch of your audio (along with the playback speed). To correct the pitch of your audio after the playback speed has been changed you'll need to use a 3rd party audio library (such as SoundTouch, which has an LGPL license, so you can use it in your app, without making it open-source, or DiracLE or the free smbPitchShift). There is an audio unit (AUPitch), that can change the pitch of your audio, but it's not available for iPhone; only for Mac.
All of the solutions below are tested, and work...
Solution #1 [Best solution]
3D Mixer Unit: Instead of a Multichannel Mixer unit use a 3D Mixer unit and set the k3DMixerParam_PlaybackRate on the input scope.
Advantages: k3DMixerParam_PlaybackRate can be set real-time, while you are playing audio, without any clipping sounds or other side effects. It's also easy to implement once you have audio units going.
Disadvantages: Affects the pitch of your audio, but the difference in pitch is not really noticeable if you only need to alter the playback rate by +/- 8%.
Solution #2
Changing sample rate: Change the sample rate on the output bus of the mixer unit. Changing the sample rate works very similarly to adding and removing samples.
Advantages: Works well if you want to multiply the playback speed by a non-integer factor (1.2x for example).
Disadvantages: Changing the sample rate of the mixer output can't be set on the fly; only when initializing the mixer unit. Affects the pitch of your audio, but the difference in pitch is not really noticeable if you only need to alter the playback rate by +/- 8%.
audioDescriptionMixerOutput.mSampleRate = 1.2*kGraphSampleRate;
Solution #3
Add/remove samples: Only pass every second, third, ... audio sample to the input of your audio unit (mixer unit in my case) in your render callback function.
Advantages: Works well if you want to speed up or slow down your audio playback by 2x, 3x, 4x, etc. It's also easy to implement.
Disadvantages: You can only multiply the playback speed by an integer factor. Speeding up audio playback by 1.2x for example is not possible by adding or removing samples. Affects the pitch of your audio.
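As a rough sketch of Solution #3, the sample dropping (and the matching sample repetition for slowing down) could look like the following on a plain mono 16-bit buffer. The function names are hypothetical; in a real app this logic would sit inside the render callback that feeds the mixer unit. No filtering is applied, so speeding up this way can introduce aliasing on top of the pitch change.

```c
#include <stddef.h>
#include <stdint.h>

/* Speed up by an integer factor: keep every `factor`-th sample.
   `out` needs room for (inLength + factor - 1) / factor samples.
   Returns the number of samples written. (Hypothetical helper.) */
size_t speed_up_by_skipping(const int16_t *in, size_t inLength,
                            size_t factor, int16_t *out)
{
    if (factor == 0) {
        return 0;
    }
    size_t written = 0;
    for (size_t i = 0; i < inLength; i += factor) {
        out[written++] = in[i];
    }
    return written;
}

/* Slow down by an integer factor: repeat every sample `factor` times.
   `out` needs room for inLength * factor samples.
   Returns the number of samples written. (Hypothetical helper.) */
size_t slow_down_by_repeating(const int16_t *in, size_t inLength,
                              size_t factor, int16_t *out)
{
    size_t written = 0;
    for (size_t i = 0; i < inLength; i++) {
        for (size_t r = 0; r < factor; r++) {
            out[written++] = in[i];
        }
    }
    return written;
}
```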