AudioToolbox - Callback delay while recording - iphone

I've been working on a very specific iOS project lately, and my research has led me to almost-final code. I've solved every major difficulty I've hit so far, but on this one I don't have a clue, either about the cause or about whether it can be solved.
I set up my audio queue (sample rate 44100, format LinearPCM, 16 bits per channel, 2 bytes per frame, 1 channel per frame...) and start recording with 12 audio buffers. However, there seems to be a delay after every 4 callbacks.
The situation is the following: the first 4 callbacks arrive about 2 ms apart. However, between the 4th and the 5th there is a delay of about 60 ms. The same thing happens between the 8th and the 9th, the 12th and the 13th, and so on...
There seems to be a relation between the bytes per frame and the moment of the delay. I know this because if I change to 4 bytes per frame, I start having the delay between the 8th and the 9th, then between the 16th and the 17th, the 24th and the 25th... Nonetheless, there doesn't seem to be any relation between the moment of the delay and the number of buffers.
The callback function does only two things: it stores the audio data (inBuffer->mAudioData) in an array my class can use, and it calls AudioQueueEnqueueBuffer to put the current buffer back on the queue.
Has anyone run into this problem before? Does anyone know, at least, what could be causing it?
Thank you in advance.

The Audio Queue API seems to run on top of the RemoteIO Audio Unit API, whose real audio buffer size is probably unrelated to, and in your example larger than, whatever size your Audio Queue buffers are. So whenever a RemoteIO buffer is ready, a bunch of your smaller AQ buffers quickly get filled from it, and then you get a longer delay while waiting for the next larger buffer to fill with samples.
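A small simulation shows how this model produces bursts. The hardware buffer size (2048 frames, about 46 ms at 44.1 kHz), the Audio Queue buffer size (1024 bytes) and the 2 ms cost per callback are made-up values for illustration, not measured iOS numbers, so the gap lengths will not match the question exactly; `simulate_callbacks` and `gaps_ms` are hypothetical helpers. What the sketch does show is that splitting one large buffer into several small ones makes callbacks arrive in bursts, and that doubling the bytes per frame doubles the burst length, as the question observed.

```python
def simulate_callbacks(hw_frames, aq_buffer_bytes, bytes_per_frame,
                       sample_rate, n_callbacks, callback_cost_s=0.002):
    """Simulated callback timestamps (s) when one larger hardware buffer
    of hw_frames frames is drained into smaller Audio Queue buffers."""
    hw_period = hw_frames / sample_rate
    times = []
    pending = 0      # bytes captured but not yet handed to a callback
    delivery = 0
    while len(times) < n_callbacks:
        delivery += 1
        t = delivery * hw_period                 # a hardware buffer completes
        pending += hw_frames * bytes_per_frame
        while pending >= aq_buffer_bytes and len(times) < n_callbacks:
            pending -= aq_buffer_bytes
            times.append(t)                      # one AQ buffer is now full
            t += callback_cost_s                 # callbacks run back to back
    return times


def gaps_ms(times):
    """Intervals between consecutive callbacks, in milliseconds."""
    return [round(1000 * (b - a), 1) for a, b in zip(times, times[1:])]


print(gaps_ms(simulate_callbacks(2048, 1024, 2, 44_100, 9)))
print(gaps_ms(simulate_callbacks(2048, 1024, 4, 44_100, 9)))
```

With 2 bytes per frame the long gap falls between the 4th and 5th callbacks; with 4 bytes per frame it moves to between the 8th and 9th.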
If you want better-controlled (more evenly spaced) buffer latency, try using the RemoteIO Audio Unit directly.


STM32 ADC: leave it running at 'high' speed or switch it off as much as possible?

I am using a G0 with one ADC and 8 channels. It works fine. I use 4 channels. One is temperature, which is measured constantly, and I am interested in its value every 60 s. Another is almost the opposite: it measures sound waves for a couple of minutes per day, and I need those samples at 10 kHz.
I solved this by letting all 4 channels sample at 10 kHz and having DMA move the four readings to memory (an array of length 4, one measurement per channel). Every 60 s I take the temperature, and when I need the audio, I retrieve the audio values.
If I had two ADCs, I would start the temperature ADC for 1 conversion every 60 s, non-stop, and I would only start the audio ADC for the couple of minutes a day that it is needed. With the single-ADC solution, though, it seems simple to let all conversions run at this high speed continuously, which raised my question: is there any real downside to running 40,000 conversions per second, 24 hours a day? If not, the code is simple: I just have the most recent values in memory all the time. But maybe I will ruin the chip? I know I use too much energy, but there is plenty of it in this case.
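The trade-off in this question can be put in numbers by counting ADC conversions per day for the two strategies. The 2-minute audio window stands in for "a couple of minutes" and is an assumption, as are the helper names.

```python
SECONDS_PER_DAY = 86_400


def continuous_conversions(channels=4, rate_hz=10_000):
    # One ADC scanning every channel at rate_hz, all day long.
    return channels * rate_hz * SECONDS_PER_DAY


def on_demand_conversions(temp_period_s=60, audio_minutes=2, audio_rate_hz=10_000):
    # One temperature conversion per period, plus audio only while it is needed.
    temperature = SECONDS_PER_DAY // temp_period_s
    audio = audio_minutes * 60 * audio_rate_hz
    return temperature + audio


if __name__ == "__main__":
    print(continuous_conversions())   # 3456000000
    print(on_demand_conversions())    # 1201440
```

That is roughly 2,900 times as many conversions for the continuous scheme; as the answer below says, the price is power (and some DMA bus traffic), not wear on the ADC.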
You aren't going to "wear it out" by running it when you don't need to.
The main problems are wasting power and RAM.
If you have enough of these, then the lesser problems are:
The wasted power will become heat, this may upset your temperature measurements (this is a very small amount though).
Having the DMA running will increase your interrupt latency and maybe also slow down the processor slightly, if it encounters bus contention (this only matters if you are close to capacity in these regards).
Having it running all the time may also have the advantage of more stable readings, not being perturbed turning things on and off.

iOS: Bad Mic input latency measurement result

I'm running a test to measure the basic latency of my iPhone app, and the result was disappointing: 50 ms for a play-through test app. The app just picks up mic input and plays it out using the same render callback, with no other audio units or processing involved, so the result seems too poor for such a basic scenario. I need some pointers on whether the result makes sense or whether my test has design flaws.
The basic idea of the test was to have three roles:
1. My finger snap as the reference sound source.
2. A simple iOS play-thru app (using the built-in mic) as the first listener to #1.
3. A Mac (with a USB mic and Audacity) as the second listener to #1 and the only listener to the iOS output (through a speaker connected via the iOS headphone jack).
Then, with Audacity in recording mode, the Mac picks up both the sound of my fingers and its "clone" from the iOS speaker at close range. Finally, I visually inspect the waveform in Audacity's recorded track and measure the time interval between the peaks of the two recorded snaps.
This was by no means a super-accurate measurement, but at least the innate latency of the Mac recording pipeline should cancel out this way, so the error should come mainly from the peak-distance measurement, which I assume is much smaller than the audio pipeline latency and can be ignored.
I was expecting 20 ms or lower latency, but the result was clearly 50-60 ms.
My ASBD uses kAudioFormatFlagsCanonical and kAudioFormatLinearPCM as format.
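The peak-to-peak measurement can also be done numerically instead of by eye. This sketch assumes the Audacity track has already been exported as a list of sample values (how you export it is up to you); `snap_delay_ms` and its 5 ms guard window are hypothetical choices, the guard being there so the second search does not land on the ringing of the first snap.

```python
def snap_delay_ms(samples, sample_rate, guard_s=0.005):
    """Time between the two loudest, well-separated peaks, in milliseconds."""
    guard = int(guard_s * sample_rate)
    mags = [abs(s) for s in samples]
    # Loudest peak: usually the direct finger snap.
    first = max(range(len(mags)), key=mags.__getitem__)
    # Loudest peak outside the guard window: the "clone" from the iOS speaker.
    candidates = [i for i in range(len(mags)) if abs(i - first) > guard]
    if not candidates:
        raise ValueError("no second peak outside the guard window")
    second = max(candidates, key=mags.__getitem__)
    return 1000.0 * abs(second - first) / sample_rate
```

Because only the distance between the two peaks is used, any fixed latency in the Mac's own recording path cancels out, which is the same reasoning as in the question.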
50 ms is about 4 ms more than the duration of 2 audio buffers (one output, one input) of 1024 samples each at a sample rate of 44.1 kHz.
17 ms is around 5 ms more than the duration of 2 buffers of 256 samples.
So it looks like the iOS audio latency is around 5 ms plus the duration of the two buffers (the audio output buffer duration plus the time it takes to fill the input buffer) ... on your particular iOS device.
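The estimate in this answer is plain arithmetic: two buffer durations plus a fixed overhead. The 5 ms overhead is the answer's rough figure for one particular device, and the helper names are hypothetical.

```python
def buffer_ms(frames, sample_rate=44_100):
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * frames / sample_rate


def round_trip_ms(frames, sample_rate=44_100, overhead_ms=5.0):
    """One input buffer + one output buffer + a fixed device overhead."""
    return 2 * buffer_ms(frames, sample_rate) + overhead_ms


for frames in (1024, 256, 128):
    print(frames, round(round_trip_ms(frames), 1))
# 1024 51.4
# 256 16.6
# 128 10.8
```

The 1024- and 256-frame results land within a couple of milliseconds of the measured 50 ms and 17 ms.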
A few iOS devices may support even shorter audio buffer sizes of 128 samples.
You can use Core Audio and set up the audio session for very low latency.
You can make the buffer smaller using AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,...
Smaller buffers make the audio callback happen more often, each time grabbing a smaller chunk of audio. Keep in mind that this is merely a suggestion to the audio system: iOS will choose a suitable callback interval based on your sample rate and integer powers of 2.
Once you set the buffer duration, you can read back the actual buffer duration the system will use with AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,...
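Taking this answer's remark about powers of 2 at face value, you can roughly predict what a preferred duration turns into. Rounding up to the next power-of-two frame count is our assumption, not documented behavior, and `predicted_io_buffer` is a hypothetical helper; the value read back with kAudioSessionProperty_CurrentHardwareIOBufferDuration is what actually counts.

```python
import math


def predicted_io_buffer(preferred_s, sample_rate=44_100):
    """Guess the (frames, seconds) that a preferred buffer duration becomes.

    ASSUMPTION: the system rounds up to the next power-of-two frame count.
    Always read back the real value from the audio session instead.
    """
    if preferred_s <= 0:
        raise ValueError("preferred duration must be positive")
    frames = preferred_s * sample_rate
    pow2 = 1 << max(0, math.ceil(math.log2(frames)))
    return pow2, pow2 / sample_rate


print(predicted_io_buffer(0.005))   # asks for ~220 frames, gets 256 (~5.8 ms)
```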
I'll summarize Paul R's comments as the answer, which has solved my problem:
50 ms corresponds to a total buffer size of around 2048 at a 44.1 kHz sample rate, which doesn't seem unreasonable given that you have both a record and a playback path.
I don't know that the buffer size is 2048, and there may be more than one buffer in your record-playback loopback test, but it seems that the effective total buffer size in your test is probably of the order of 2048, which doesn't seem unreasonable. Of course, if you're only interested in record latency, as the title of your question suggests, then you'll need to find a way to tease that out separately from playback latency.

x264 threading latency

I wonder why sliceless threading in x264 leads to latency. If I have, for example, 2 threads, the first encodes one frame and the second encodes another. In some cases the second has to wait for the first, but they can still be encoded in parallel.
So two threads should be faster than only one, right?
Frame threading adds latency measured in frames, not in seconds, because you need to feed the encoder more input frames before you start getting output frames (to fill the pipeline). Encoding one frame takes about the same processor time as with one thread, but threading lets the pipeline encode different frames in parallel. Sliced threading, on the other hand, decreases latency because all threads encode the same frame in parallel, so it finishes faster than with one thread (and sliced threading doesn't need extra frames of latency for pipelining).
It took me quite a while to reason through it, but the answer is queuing theory.
Each frame can be started when half of the previous frame has been encoded. But if parallelization is going to provide any benefit, most (preferably all) threads should have a frame to work on. 5 threads means 5 frames; that is the pipeline. Any time the pipeline is not completely full, parallelization gives you less benefit. If the pipeline contains only one frame, only one thread is working, and you get no benefit from parallelization. But if your pipeline is usually full, what is it full of? Unencoded frames. Unencoded frames must already have been captured, so they represent that many frames' worth of latency. The latency may be slightly less, by a fraction of a frame, because some of the frames in the pipeline are partially encoded, but in general each item in the pipeline contributes to the latency.
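A toy model makes this trade-off concrete. It assumes a fixed per-frame encode cost, perfect splitting of a frame into slices, and no dependencies between frames (so it ignores motion-estimation waits and x264's real scheduling); the function names and the 25 fps / 200 ms figures are made up. Latency here is the time from a frame's capture to the end of its encode.

```python
def frame_threading_latencies(n_frames, n_threads, frame_interval, encode_time):
    """Each frame is encoded whole by one thread."""
    free_at = [0.0] * n_threads            # when each thread becomes idle
    latencies = []
    for i in range(n_frames):
        arrival = i * frame_interval       # capture time of frame i
        w = min(range(n_threads), key=lambda k: free_at[k])  # earliest idle thread
        start = max(arrival, free_at[w])
        free_at[w] = start + encode_time
        latencies.append(free_at[w] - arrival)
    return latencies


def slice_threading_latencies(n_frames, n_threads, frame_interval, encode_time):
    """All threads share one frame, so it takes encode_time / n_threads."""
    free_at = 0.0
    latencies = []
    for i in range(n_frames):
        arrival = i * frame_interval
        start = max(arrival, free_at)
        free_at = start + encode_time / n_threads
        latencies.append(free_at - arrival)
    return latencies


print(max(frame_threading_latencies(20, 5, 0.04, 0.2)))   # about 0.2 s = 5 frames
print(max(slice_threading_latencies(20, 5, 0.04, 0.2)))   # about 0.04 s = 1 frame
```

With 5 threads, frame threading keeps up with the input, but every frame still spends about 5 frame intervals in the pipeline; slice threading finishes each frame in about one interval. With a single thread the encoder cannot keep up at all and latency grows without bound.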
One reason for the added latency with more threads is that consecutive frames use each other for motion prediction and compensation. To compress a frame, you need motion-estimation details from previous frames, so the frames depend on each other and sometimes have to wait for at least some data from other threads. This contrasts with slice threading, where the threads split up a single frame, each working on one slice of it, and all the information they need from previous frames (or next frames, in the case of B-frames) is already available.

iOS - Speed Issues

Hey all, I've got a method of recording that writes the notes a user plays to an array in real time. The only problem is that there is a slight delay, and each sequence is noticeably slower when played back. I sped up playback by about 6 milliseconds and it sounds right, but I was wondering: would the delay vary on other devices?
I've tested on an iPod touch 2nd gen; how would it perform on the 3rd and 4th gen, as well as on iPhones? Do I need to test on all of them and find the optimal delay for each?
Any Ideas?
More Info:
I use two NSThreads instead of timers, and fill an array with blank spots where no notes should play (I use integers; -1 is a blank). While recording, a blank is added every 0.03 seconds. Every time the user hits a note, the most recent blank is replaced by a number 0-7. Playback uses the second thread, which has a shorter interval of 0.024 seconds (hence the two threads). The 6 millisecond difference compensates for the delay between recording and playback.
I assume that either the recording or playing of notes takes longer than the other, and thus creates the delay.
What I want to know is if the delay will be different on other devices, and how I should compensate for it.
Exact Solution
I may not have explained it fully, which is why this solution wasn't suggested, but for anyone with a similar problem...
I played each beat similarly to a MIDI file, like so:
    while playing:
        do stuff to play beat
        new date xyz seconds from now
        new date now
        while now is not > date xyz seconds from now: wait
The obvious thing I was missing was to create the two dates BEFORE playing the beat...
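The fix above can be sketched in Python as a language-neutral illustration (the original is iOS code). `play_sequence` and its parameters are hypothetical, and it goes one step beyond the pseudocode by measuring every deadline from a single start time, which keeps small timing errors from accumulating over a long sequence. `time.monotonic` is used instead of wall-clock dates so clock adjustments can't disturb the schedule; `clock` and `sleep` are injectable only to make the function testable.

```python
import time


def play_sequence(beats, interval_s, play_beat,
                  clock=time.monotonic, sleep=time.sleep):
    """Play one entry of `beats` every `interval_s` seconds.

    Each deadline is fixed from the start time BEFORE play_beat runs, so the
    time spent playing a beat is absorbed into the wait instead of being
    added to every interval.
    """
    start = clock()
    for i, beat in enumerate(beats):
        deadline = start + (i + 1) * interval_s   # computed before playing
        play_beat(beat)                           # may take a few ms
        remaining = deadline - clock()
        if remaining > 0:                         # overran: don't wait at all
            sleep(remaining)
```

In this scheme `play_beat` would simply do nothing for the `-1` blanks from the recording array, while the slot still takes its 0.03 seconds.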
It seems more likely to me that the additional delay is caused by the playback of the note, or other compute overhead in the second thread. Grab the wall-clock time in the second thread before playing each note, and check the time difference from the last one. You will need to reduce your following delay by any excess (likely 0.006 seconds!).
The delay will be different on different generations of the iPhone, but by adapting to it dynamically like this, you will be safe as long as the processing overhead is less than 0.03 seconds.
You should do the same thing in the first thread as well.
For getting high-resolution timestamps, there's a discussion on the Apple forums here, or this Stack Overflow question.

AudioQueueNewInput callback latency

Regardless of the size of the buffers I provide, the callback passed to AudioQueueNewInput fires at roughly the same interval.
For example:
If you have 0.05-second buffers and are recording at 44 kHz, the callback is first called at about 0.09 seconds, and a second call occurs right after (0.001 seconds later). Then you wait again for ~0.09 seconds. If your buffer size is 0.025 seconds, you wait 0.09 seconds and then see 3 more buffers nearly instantly.
Lowering the sample rate increases the latency.
Recording 16-bit 8 kHz audio results in 0.5 seconds of latency between buffer floods.
So I suspect that there is an 8000-byte buffer being used behind the scenes. When it fills, my callback gets run with the given buffers until it is emptied.
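The 8000-byte hypothesis can be checked against all three observations with one line of arithmetic (16-bit mono, so 2 bytes per frame). The 8000-byte figure is the asker's guess, not a documented iOS value, and `fill_time_s` is a hypothetical helper.

```python
def fill_time_s(buffer_bytes, sample_rate, bytes_per_frame=2):
    """Seconds of audio needed to fill a buffer of buffer_bytes bytes."""
    return buffer_bytes / (bytes_per_frame * sample_rate)


for rate in (44_100, 16_000, 8_000):
    print(rate, round(fill_time_s(8_000, rate), 4))
# 44100 0.0907
# 16000 0.25
# 8000 0.5
```

These match the ~0.09 s gap at 44.1 kHz, the 0.5 s gap at 8 kHz and the quarter-second latency at 16 kHz, which supports (but does not prove) a fixed-size internal buffer.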
I want to record 16 kHz, 16-bit audio with as little latency as possible. Given the above, I always see about a quarter of a second of latency. Is there a way to decrease it? Is there an audio session property to set the internal buffer size? I've tried kAudioSessionProperty_PreferredHardwareIOBufferDuration, but it does not seem to help.
The Audio Queue API looks like it is built on top of the Audio Unit RemoteIO API. Small Audio Queue buffers are probably being filled from a larger RemoteIO buffer behind the scenes. Perhaps some sample-rate conversion is even taking place (on the original 2G phone).
For lower latency, try using the RemoteIO Audio Unit API directly, and then ask the audio session for a smaller, lower-latency buffer size.