I'm running a test to measure the basic latency of my iPhone app, and the result was disappointing: 50 ms for a play-through test app. The app just picks up mic input and plays it out using the same render callback, with no other audio units or processing involved, so the result seemed too high for such a basic scenario. I need some pointers to tell whether the result makes sense or whether my test has design flaws.
The basic idea of the test was to have three roles:
1. My finger snap as the reference sound source.
2. A simple iOS play-through app (using the built-in mic) as the first listener to #1.
3. A Mac (with a USB mic and Audacity) as the second listener to #1, and the only listener to the iOS output (through a speaker connected to the iOS headphone jack).
Then, with Audacity in recording mode, the Mac would pick up both the sound from my fingers and its "clone" from the iOS speaker at close range. Finally, I visually observed the waveform in Audacity's recorded track and measured the time interval between the peaks of the two recorded snaps.
This was by no means a super accurate measurement, but at least the innate latency of the Mac recording pipeline should be cancelled out this way, so the error should come mainly from the peak-distance measurement, which I assume is much smaller than the audio pipeline latency and can be ignored.
I was expecting 20 ms or lower latency, but the result was clearly 50-60 ms.
My ASBD uses kAudioFormatLinearPCM as the format ID with kAudioFormatFlagsCanonical as the format flags.
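For reference, a sketch of an ASBD along those lines (mono, 16-bit, 44.1 kHz are assumptions; the post doesn't list every field):

```c
#include <AudioToolbox/AudioToolbox.h>

// Sketch of an ASBD matching the description above; mono, 16-bit and
// 44.1 kHz are assumptions rather than values taken from the question.
AudioStreamBasicDescription asbd = {0};
asbd.mSampleRate       = 44100.0;
asbd.mFormatID         = kAudioFormatLinearPCM;
asbd.mFormatFlags      = kAudioFormatFlagsCanonical;  // SInt16, packed, native-endian on iOS
asbd.mChannelsPerFrame = 1;
asbd.mBitsPerChannel   = 16;
asbd.mBytesPerFrame    = 2;
asbd.mBytesPerPacket   = 2;
asbd.mFramesPerPacket  = 1;
```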
50 ms is about 4 ms more than the duration of 2 audio buffers (one output, one input) of size 1024 at a sample rate of 44.1 kHz.
17 ms is around 5 ms more than the duration of 2 buffers of length 256.
So it looks like the iOS audio latency is around 5 ms plus the duration of the two buffers (the audio output buffer duration plus the time it takes to fill the input buffer) ... on your particular iOS device.
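For reference, the arithmetic behind those two figures as a quick check:

```c
/* Two buffers' worth of audio at 44.1 kHz, compared with the measured totals. */
double two_1024_buffers = 2.0 * 1024 / 44100.0;  /* ≈ 46.4 ms; 50 ms − 46.4 ms ≈ 3.6 ms extra */
double two_256_buffers  = 2.0 *  256 / 44100.0;  /* ≈ 11.6 ms; 17 ms − 11.6 ms ≈ 5.4 ms extra */
```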
A few iOS devices may support even shorter audio buffer sizes of 128 samples.
You can use Core Audio and set up the audio session for very low latency.
You can set the buffer size to be smaller using AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,...
Using smaller buffers causes the audio callback to happen more often while grabbing smaller chunks of audio. Keep in mind that this is merely a suggestion to the audio system; iOS will pick a suitable callback duration based on your sample rate and integer powers of 2.
Once you set the buffer duration, you can get the actual buffer duration that the system will use using AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,...
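A rough sketch of those two calls using the old C Audio Session API named above, assuming the session is already initialized and active (the 256-frame target is just an illustrative value):

```c
#include <AudioToolbox/AudioToolbox.h>

// Ask for ~5.8 ms of I/O buffer (256 frames at 44.1 kHz), then read back
// what the system actually granted.
Float32 preferred = 256.0f / 44100.0f;
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,
                        sizeof(preferred), &preferred);

Float32 actual = 0;
UInt32 size = sizeof(actual);
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,
                        &size, &actual);
// "actual" is the buffer duration iOS will really use.
```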
I'll summarize Paul R's comments as the answer, which has solved my problem:
50 ms corresponds to a total buffer size of around 2048 at a 44.1 kHz sample rate, which doesn't seem unreasonable given that you have both a record and a playback path.
I don't know that the buffer size is exactly 2048, and there may be more than one buffer in your record-playback loopback test, but it seems that the effective total buffer size in your test is probably on the order of 2048, which doesn't seem unreasonable. Of course, if you're only interested in record latency, as the title of your question suggests, then you'll need to find a way to tease that out separately from playback latency.
I am working on an embedded Linux application with audio passthrough using ALSA. It has very stringent latency requirements.
The output buffer is as small as possible which results in an occasional (perhaps once an hour) underrun on the output. This is acceptable. However, when it occurs, it causes a "backup" in the capture buffer and the result is a creeping increase in latency.
There doesn't seem to be a reliable way to know how much output data was lost in order to discard the same amount of input. I can experiment, but even though it's an embedded application it needs to be device independent, so we need a reliable solution.
Does anyone know a way to determine how much data was lost, or if it is always one buffer, or have other suggestions?
If you do not want the PCM devices to stop on an underrun/overrun, configure them not to stop by setting the stop threshold to the boundary value. Then they will just continue to run, and the number of available frames will continue to increase (for capture) or decrease (for playback). (Not all of those frames will be usable; the ring buffer just wraps around.)
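A minimal sketch of that configuration, assuming pcm is an already opened and hardware-configured handle (error handling kept terse):

```c
#include <alsa/asoundlib.h>

/* Configure a PCM handle so it never stops on an XRUN: the boundary is a
   huge frame count, so using it as the stop threshold keeps the device
   running through under/overruns. */
static int set_free_running(snd_pcm_t *pcm)
{
    snd_pcm_sw_params_t *sw;
    snd_pcm_uframes_t boundary;
    int err;

    snd_pcm_sw_params_alloca(&sw);
    if ((err = snd_pcm_sw_params_current(pcm, sw)) < 0)
        return err;
    if ((err = snd_pcm_sw_params_get_boundary(sw, &boundary)) < 0)
        return err;
    if ((err = snd_pcm_sw_params_set_stop_threshold(pcm, sw, boundary)) < 0)
        return err;
    return snd_pcm_sw_params(pcm, sw);
}
```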
I've been working on a very specific project for iOS lately, and my research has led me to almost-final code. I've solved all the extreme difficulties I've found until now, but on this one I don't seem to have a clue (about the cause or how to solve it).
I set up my audioqueue (sample rate 44100, format LinearPCM, 16 bits per channel, 2 bytes per frame, 1 channel per frame...) and start recording the sound with 12 audio buffers. However, there seems to be a delay after every 4 callbacks.
The situation is the following: the first 4 callbacks are called with an interval of about 2 ms each. However, between the 4th and the 5th there is a delay of about 60 ms. The same thing happens between the 8th and the 9th, the 12th and the 13th, and so on.
There seems to be a relation between the bytes per frame and the moment of the delay. I know this because if I change to 4 bytes per frame, I start having the delay between the 8th and the 9th, then between the 16th and the 17th, the 24th and the 25th... Nonetheless, there doesn't seem to be any relation between the moment of the delay and the number of buffers.
The callback function does only two things: store the audio data (inBuffer->mAudioData) in an array my class can use, and call AudioQueueEnqueueBuffer to put the current buffer back on the queue.
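A minimal sketch of a callback of that shape; the MyRecorder type and its storage are hypothetical stand-ins for the class mentioned above, with no bounds checking:

```c
#include <AudioToolbox/AudioToolbox.h>
#include <string.h>

typedef struct {            /* hypothetical stand-in for the poster's class */
    char  *storage;
    size_t used;
} MyRecorder;

static void InputCallback(void *inUserData, AudioQueueRef inAQ,
                          AudioQueueBufferRef inBuffer,
                          const AudioTimeStamp *inStartTime,
                          UInt32 inNumPackets,
                          const AudioStreamPacketDescription *inPacketDescs)
{
    MyRecorder *rec = (MyRecorder *)inUserData;

    /* 1. Copy the captured samples somewhere the rest of the app can read. */
    memcpy(rec->storage + rec->used, inBuffer->mAudioData,
           inBuffer->mAudioDataByteSize);
    rec->used += inBuffer->mAudioDataByteSize;

    /* 2. Hand the buffer straight back to the queue. */
    AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}
```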
Did anyone go through this problem already? Does anyone know, at least, what could be the cause of it?
Thank you in advance.
The Audio Queue API seems to run on top of the RemoteIO Audio Unit API, whose real audio buffer size is probably unrelated to, and in your example larger than, whatever size your Audio Queue buffers are. So whenever a RemoteIO buffer is ready, a bunch of your smaller AQ buffers quickly get filled from it, and then you get a longer delay waiting for the next larger buffer to be filled with samples.
If you want better controlled (more evenly spaced) buffer latency, try using the RemoteIO Audio Unit directly.
I want to add a few bytes of data to a sound file (for example a song). The sound file will be transmitted via radio to a receiver who uses, for example, the iPhone microphone to pick up the sound, and an application will show the original bytes of data. Preferably it should not be audible to humans.
What is such technology called? Are there any applications that can do this?
Libraries/apps that can be used on iPhone?
It's audio steganography. There are algorithms to do it. Refer to here.
I've done some research, and it seems the way to go is:
Use low audio frequencies.
Spread the "bits" around randomly - do not use a pattern as it will be picked up by the listener. "White noise" is a good clue. The random pattern is known by the sender and receiver.
Use a Fourier transform to pick up frequency and amplitude (see the sketch after this list).
Clean up input data.
Use checksum/redundancy-algorithms to compensate for loss.
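As a rough illustration of the frequency-detection step, here is a minimal Goertzel-style single-tone detector, a common lightweight alternative to a full FFT when only one carrier frequency matters; the function name and types are made up for the sketch:

```c
#include <math.h>

/* Hypothetical helper: Goertzel detection of a single target tone.
   Returns the relative power of "freq" Hz in n samples at rate fs;
   compare it against neighbouring frequencies or a threshold to decide
   whether the hidden "bit" is present. */
double tone_power(const float *samples, int n, double freq, double fs)
{
    double w = 2.0 * M_PI * freq / fs;
    double coeff = 2.0 * cos(w);
    double s_prev = 0.0, s_prev2 = 0.0;

    for (int i = 0; i < n; i++) {
        double s = samples[i] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev  = s;
    }
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2;
}
```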
I'm writing a prototype and am having a bit of difficulty picking up the right frequency, as it has a ~4 Hz offset (100 Hz becomes 96.x Hz when played and picked up by the microphone).
This is not the answer, but I hope it helps.
Regardless of the size of the buffers I provide, the callback provided to AudioQueueNewInput occurs at roughly the same time interval.
For example:
If you have .05 second buffers and are recording at 44k, the callback is first called at about .09 seconds, and then a second call occurs right after (.001 seconds). Then you wait again for ~.09 seconds. If your buffer size was .025, you would wait .09 seconds and then see 3 more buffers nearly instantly.
Changing the sample rate increases the latency.
Recording 16 bit 8k audio results in .5 seconds of latency between buffer floods.
So I suspect that there is an 8000-byte buffer being used behind the scenes. When it's filled, my callback gets run with the given buffers until it is emptied.
I want to record 16k 16 bit audio with as little latency as possible. Given the above I always see about a quarter of a second of latency. Is there a way to decrease the latency? Is there an audio session property to set the internal buffer size? I've tried kAudioSessionProperty_PreferredHardwareIOBufferDuration but it does not seem to help.
thanks!
The Audio Queue API looks like it is built on top of the Audio Unit RemoteIO API. Small Audio Queue buffers are probably being used to fill a larger RemoteIO buffer behind the scenes. Perhaps even some rate resampling might be taking place (on the original 2G phone).
For lower latency, try using the RemoteIO Audio Unit API directly, and then request that the audio session provide your app a smaller, lower-latency buffer size.
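A bare-bones sketch of getting hold of the RemoteIO unit and enabling its input bus; the render callback, stream format setup, error handling, and AudioOutputUnitStart are left out:

```c
#include <AudioUnit/AudioUnit.h>

/* Obtain the RemoteIO unit and enable microphone input (bus 1), which is
   disabled by default. A render/input callback still has to be attached
   before starting the unit. */
static AudioUnit make_remote_io(void)
{
    AudioComponentDescription desc = {
        .componentType         = kAudioUnitType_Output,
        .componentSubType      = kAudioUnitSubType_RemoteIO,
        .componentManufacturer = kAudioUnitManufacturer_Apple,
    };
    AudioComponent comp = AudioComponentFindNext(NULL, &desc);
    AudioUnit io = NULL;
    AudioComponentInstanceNew(comp, &io);

    UInt32 one = 1;
    AudioUnitSetProperty(io, kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input, 1, &one, sizeof(one));
    AudioUnitInitialize(io);
    return io;
}
```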
I am using OpenAL to pitch shift a note. e.g.
alSourcef(source, AL_PITCH, aPitch);
I am noticing however an audible click when I do this. Other than that the pitch is perfect, correct pitch etc.
Any ideas what might be causing this?
I haven't used OpenAL, but in other sound libraries I have seen this "artifact". When dealing with tone generators etc., there is usually a variable for the time it takes a tone to reach 100% volume level; I can for the life of me not remember what it is called :)
like this:
playTone(400 Hz, 40 dB, 50 ms, 3000 ms).
where 400 Hz is the frequency, 40 dB the volume, 3000 milliseconds the duration, and 50 milliseconds the time it takes from starting the tone at volume 0 (or +100 dB) until it reaches 40 dB. I simply can't find the word right now.
Anyways, if you have the ability to set this variable, try doing that; just set it to something like 10 ms. You won't be able to hear it, but it has removed clicking sounds for me in both an open-source sound library I used for the iPhone and in some Java/Processing libraries I used in the past.
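As far as I know OpenAL itself doesn't expose such an attack/fade-in parameter, but a rough equivalent is to ramp AL_GAIN around the pitch change; something like the sketch below, where the step count and sleep length are arbitrary illustration values:

```c
#include <OpenAL/al.h>   /* <AL/al.h> on non-Apple platforms */
#include <unistd.h>

/* Fade the source in over ~10 ms right after changing AL_PITCH,
   to mask the discontinuity. */
void setPitchWithFade(ALuint source, ALfloat pitch)
{
    alSourcef(source, AL_GAIN, 0.0f);
    alSourcef(source, AL_PITCH, pitch);
    for (int i = 1; i <= 10; i++) {
        alSourcef(source, AL_GAIN, i / 10.0f);  /* 0.1, 0.2, ... 1.0 */
        usleep(1000);                           /* ~1 ms per step */
    }
}
```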
Maybe it has to do with the way the underlying code is triggering some hardware connected to the speaker?
I have experience with this one; mostly it is because you shift the pitch too high or too low. Shifting pitch stretches or shrinks the wave-data length. If your data does not have enough samples to stretch, it will sound "weird"; in the case of shortening the length (pitching up), if your playback buffer does not have enough samples to feed in time, it will lag or jitter, because conceptually the playing rate is increased due to the shortened length of audio. Mostly clicking or popping is what you heard.
To prevent this, you should limit the shifting range; 0.5 to 2.0 is the limit on most sound cards, and it varies across sound cards. Pitch shifting can be made better by using more advanced smoothing and processing in DSP, so it depends on the processing power of your DSP or CPU. I've tried it using onboard Intel HDA, where the limit is mostly 0.5 to 2.0, but an X-Fi sound card is better; shifting to 0.1 .. 5.0 doesn't have a problem.
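For example, a clamped call along those lines, with the range taken from the 0.5 to 2.0 suggestion above:

```c
/* Keep the requested pitch inside a conservative 0.5-2.0 range
   before handing it to OpenAL. */
ALfloat p = aPitch;
if (p < 0.5f) p = 0.5f;
if (p > 2.0f) p = 2.0f;
alSourcef(source, AL_PITCH, p);
```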