AudioQueueNewInput callback latency - iphone

Regardless of the size of the buffers I provide the callback provided to AudioQueueNewInput occurs at roughly the same time interval.
For example:
If you have .05 second buffers and are recording at 44k the callback first called about at .09 seconds and then a second call occurs right after (.001 seconds). Then you wait again for ~.09 seconds. If your buffer size was .025. You would wait .09 seconds and then see 3 more buffers nearly instantly.
Changing the sample rate increases the latency.
Recording 16 bit 8k audio results in .5 seconds of latency between buffer floods.
So I suspect that there is an 8000 byte buffer that is being used behind the scenes. When it's filled my callback gets run with the given buffers until it is emptied.
I want to record 16k 16 bit audio with as little latency as possible. Given the above I always see about a quarter of a second of latency. Is there a way to decrease the latency? Is there an audio session property to set the internal buffer size? I've tried kAudioSessionProperty_PreferredHardwareIOBufferDuration but it does not seem to help.
thanks!

The Audio Queue API looks like it is built on top of the Audio Unit RemoteIO API. Small Audio Queue buffers are probably being used to fill a larger RemoteIO buffer behind the scenes. Perhaps even some rate resampling might be taking place (on the original 2G phone).
For lower latency, try using the RemoteIO Audio Unit API directly, and then requesting the audio session to provide your app a smaller lower latency buffer size.

Related

ALSA passthrough latency

I am working on an embedded Linux application with audio passthrough using ALSA. It has very stringent latency requirements.
The output buffer is as small as possible which results in an occasional (perhaps once an hour) underrun on the output. This is acceptable. However, when it occurs, it causes a "backup" in the capture buffer and the result is a creeping increase in latency.
There doesn't seem to be a reliable way to know how much output data was lost in order to discard the same amount of input. I can experiment, but even though it's an embedded application it needs to be device independent, so we need a reliable solution.
Does anyone know a way to determine how much data was lost, or if it is always one buffer, or have other suggestions?
If you do not want the PCM devices to stop on an underrun/overrun, configure them not to stop by setting the stop threshold to the boundary value. Then they will just continue to run, and the number of available frames will continue to increase (for capture) or decrease (for playback). (Not all of those frames will be usable; the ring buffer just wraps around.)

iOS: Bad Mic input latency measurement result

I'm running a test to measure the basic latency of my iPhone app, and the result was disappointing: 50ms for a play-through test app. The app just picks up mic input and plays it out using the same render callback, no other audio units or processing involved. Therefore, the results seemed too bad for such a basic scenario. I need some pointers to see if the result makes sense or I had design flaws in my test.
The basic idea of the test was to have three roles:
My finger snap as the reference sound source.
A simple iOS play-thru app (using built-in mic) as the first
listener to #1.
A Mac (with a USB mic and Audacity) as the second listener to #1 and
the only listener to the iOS output (through a speaker connected via
iOS headphone jack).
Then, with Audacity in recording mode, the Mac would pick up both the sound from my fingers and its "clone" from the iOS speaker in close range. Finally I simply visually observe the waveform in Audacity's recorded track and measure the time interval between the peaks of the two recorded snaps.
This was by no means a super accurate measurement, but at least the innate latency of the Mac recording pipeline should have been cancelled out this way. So that the error should mainly come from the peak distance measurement, which I assume should be much smaller than the audio pipeline latency and can be ignored.
I was expecting 20ms or lower latency, but clearly the result gave me 50~60ms.
My ASBD uses kAudioFormatFlagsCanonical and kAudioFormatLinearPCM as format.
50 mS is about 4 mS more than the duration of 2 audio buffers (one output, one input) of size 1024 at a sample rate of 44.1 kHz.
17 mS is around 5 mS more than the duration of 2 buffers of length 256.
So it looks like the iOS audio latency is around 5 mS plus the duration of the two buffers (the audio output buffer duration plus the time it takes to fill the input buffer) ... on your particular iOS device.
A few iOS devices may support even shorter audio buffer sizes of 128 samples.
You can use core audio and set up the audio session to have a very low latency.
You can set the buffer size to be smaller using AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,...
Using smaller buffers causes the audio callback to happen more often while grabbing smaller chunks of audio. Keep in mind that this is merely a suggestion to the audio system. iOS will use a callback time suitable value based on your sample rate and integer powers of 2.
Once you set the buffer duration, you can get the actual buffer duration that the system will use using AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,...
I'll summarize Paul R's comments as the answer, which has solved my problem:
50 ms corresponds to a total buffer size of around 2048 at a 44.1 kHz sample rate, which doesn't seem unreasonable given that you have both a record and a playback path.
I don't know that the buffer size is 2048, and there may be more than one buffer in your record-playback loopback test, but it seems that the effective total buffer size in you test is probably of the order of 2048, which doesn't seem unreasonable. Of course if you're only interested in record latency, as the title of your question suggests, then you'll need to find a way to tease that out separately from playback latency.

x264 threading latency

I wonder why sliceless threading (http://akuvian.org/src/x264/sliceless_threads.txt) in x264 leads to latency? If I have for example 2 threads the first encode one frame and the second encode one frame. The seconds have to wait for the first in some cases. But they can be encoded in parallel.
So two threads should be faster than only one, right?
Frame-threading add latency in frames not in seconds because you need to feed encoder with more input frames before you start getting output frames (to fill pipeline). Encoding one frame itself will take about near same processor time as with one thread but threading allow pipeline process by encoding different frames parallel. From other hand sliced-threading decrease latency because all threads encode one frame parallel so it would be finished faster than encoding it with one thread (also sliced-threading don't need latency in frames for pipepining).
It took me quite a while to reason through it, but the answer is Queuing Theory.
Each frame can be started when half of the previous frame has been encoded. But if parallelization is going to provide any benefit most (preferably all) threads should have a frame to work on. 5 threads means 5 frames. That is the pipeline. Any time the pipeline is not completely full, parallelization is giving you less of a benefit. If the pipeline contains only one frame, only one thread is working and therefore you get no benefit from parallelization. But if your pipeline is usually full, what is it full of? Unencoded frames. Unencoded frames are frames that must have been captured and therefore they represent that many frames worth of latency. The latency might be slightly less by a small constant portion of a frame because some of those frames in the pipeline are partially encoded but in general each item in the pipeline contributes to the latency.
One reason for added latency with more threads is that the consecutive frames use each other for motion prediction and compensation. That means in order to compress a frame you need info from previous motion estimation details. That means the frames are dependant on each other and sometimes they have to wait for at least some data from other threads as well. This is in contrast with the slice threading when threads slicing up the frame and each one works on one slice and all on the same frame and they have all the needed info from previous frames, or next in case of B frames.

AudioToolbox - Callback delay while recording

I've been working on a very specific project for iOS, lately, and my researches lead me to an almost final code. I've solved all the extreme difficulties I've found until now, but on this one I don't seem to have a clue (about the reason nor the possibility of solving it).
I set up my audioqueue (sample rate 44100, format LinearPCM, 16 bits per channel, 2 bytes per frame, 1 channel per frame...) and start recording the sound with 12 audio buffers. However, there seems to be a delay after every 4 callbacks.
The situation is the following: the first 4 callbacks are called with an interval each of about 2 ms. However, between the 4th and the 5th, there is a delay of about 60ms. The same thing happens between the 8th and the 9th, the 12th and 13th and on...
There seems to be a relation between the bytes per frame and the moment of the delay. I know this because if I change to 4 bytes per frame, I start having the delay between the 8th and the 9th, then between the 16th and the 17th, the 24th and the 25th... Nonetheless, there doesn't seem to be any relation between the moment of the delay and the number of buffers.
The callback function does only two things: store the audio data (inBuffer->mAudioData) on a array my class can use; and call another AudioQueueEnqueueBuffer, to put the current buffer back on the queue.
Did anyone go through this problem already? Does anyone know, at least, what could be the cause of it?
Thank you in advance.
The Audio Queue API seems to run on top of the RemoteIO Audio Unit API, who's real audio buffer size is probably unrelated to, and in your example larger than, whatever size your Audio Queue buffers are. So whenever a RemoteIO buffer is ready, a bunch of your smaller AQ buffers quickly get filled from it. And then you get a longer delay waiting for some larger buffer to be filled with samples.
If you want better controlled (more evenly spaced) buffer latency, try using the RemoteIo Audio Unit directly.

Audible Click when pitch shifting in OpenAL

I am using OpenAL to pitch shift a note. e.g.
alSourcef(source, AL_PITCH, aPitch);
I am noticing however an audible click when I do this. Other than that the pitch is perfect, correct pitch etc.
Any ideas what might be causing this?
I haven't used OpenAL, but in other sound libraries I have seen this "artifact". There is usually, when dealing with tone generator etc. a variable for the time it takes a tone to reach 100% volume level, I can for the life of me not remember what it is called :)
like this:
playTone(400 Hz, 40 dB, 50 ms, 3000 ms).
where 400 is the Hz, 40 dB the volume, 3000 milliseconds is the duration and 50 milliseconds is the time it takes from starting the tone at volume 0 (or +100dB) to it reaches 40 dB. I simply can't find the word right now.
Anyways, if you have the ability to set this variable, try doing that, just set it to something like 10 ms. You wont be able to hear it, but it has removed clicking sounds for me in both an open source sound library I used for the iPhone and in some Java/Processing libraries I used in the past.
Maybe it has to do with the way the underlying code is triggering some hardware connected to the speaker?
i have experience on this one, mostly it is because you shift the pitch too high or too low, shifting pitch is stretching or shrinking wave-data length, the case is if your data does not have enough sample to stretch it will sound "weird", in case of shortening the length (pitch-up) if your playback buffer does not have enough sample to feed in time, it will lag or jitter because conceptually the playing rate is increased to due shortened the length of audio, mostly clicking or popping is what you heard.
to prevent this, you should limit the shifting range, mostly 0.5 to 2.0 is the limit on most sound-card, and it is vary across soundcard, since shifting the pitch could be make better by using some advanced smoothing and processing in DSP, so it will depend on processing power of your DSP or CPU to do such processing. i've tried it using onboard intel HDA that the limit is mostly 0.5 to 2.0, but using X-Fi soundcard it is better, shifting to 0.1 .. 5.0 doesn't have a problem