I want to apply an audio filter to the user's voice on the iPhone.
The filter is quite heavy and needs many audio samples to reach the desired quality. I do not want to apply the filter in real time, but I do want near-realtime performance: the processing should run in parallel with the recording, as the necessary samples are collected, so that when the user stops recording they hear the distorted sound after only a few seconds.
My questions are:
1. Which is the right technology layer for this task e.g. audio units?
2. Which are the steps involved?
3. Which are the key concepts and API methods to use?
4. I want to capture the user's voice. Which are the right recording settings for this? If my filter alters the frequency content, should I use a wider range?
5. How can I collect the necessary samples for my filter? How should I handle the audio data, i.e. depending on the recording settings, how are the data packed?
6. How can I write the final audio recording to a file?
Thanks in advance!
If you find a delay of over a hundred milliseconds acceptable, you can use the Audio Queue API, which is a bit simpler than the RemoteIO Audio Unit, for both capture and playback. You can process the buffers in your own NSOperationQueue as they come in from the audio queue, and either save the processed results to a file or just keep them in memory if there is room.
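A rough sketch of how that hand-off can look, assuming ARC, a 16-bit LPCM capture format, and a hypothetical MyRecorder class that owns the NSOperationQueue and the filter (the AudioQueueNewInput() setup and the file writing are omitted):

    #import <AudioToolbox/AudioToolbox.h>
    #import <Foundation/Foundation.h>

    // Hypothetical recorder object that owns the work queue and the heavy filter.
    @interface MyRecorder : NSObject
    @property (nonatomic, strong) NSOperationQueue *processingQueue;
    - (void)applyHeavyFilterToChunk:(NSData *)chunk;   // the heavy DSP (assumed)
    @end

    static void MyInputCallback(void *inUserData,
                                AudioQueueRef inAQ,
                                AudioQueueBufferRef inBuffer,
                                const AudioTimeStamp *inStartTime,
                                UInt32 inNumPackets,
                                const AudioStreamPacketDescription *inPacketDesc)
    {
        MyRecorder *recorder = (__bridge MyRecorder *)inUserData;

        // Copy the samples out so the queue's buffer can be re-enqueued right away.
        NSData *chunk = [NSData dataWithBytes:inBuffer->mAudioData
                                       length:inBuffer->mAudioDataByteSize];

        // The heavy filtering runs off the audio thread, in parallel with recording.
        [recorder.processingQueue addOperationWithBlock:^{
            [recorder applyHeavyFilterToChunk:chunk];
        }];

        AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);   // hand the buffer back
    }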
For Question 4: If your audio filter is linear, then you won't need any wider frequency range. If you are doing non-linear filtering, all bets are off.
I am using several instances of AVAudioPlayer to play overlapping sounds, and getting harsh distortion as a result. Here is my situation... I have an app with several piano keys. Upon touching a key, it plays a note. If I touch 6-7 keys in rapid succession, my app plays a 2 second .mp3 clip for each key. Since I am using separate audio streams, the sounds overlap (which they should), but the result is lots of distortion, pops, or buzzing sounds!
How can I make the overlapping audio crisp and clean? I recorded the piano sounds myself and they are very nice, clean, noise-free recordings, and I don't understand why the overlapping streams sound so bad. Even at low volume or through headphones, the quality is just very degraded.
Any suggestions are appreciated!
Couple of things:
Clipping
The "buzzing" you describe is almost assuredly clipping—the result of adding two or more waveforms together and the resulting, combined waveform having its peaks cut off—clipped—at unity.
When you're designing virtual synthesizers with polyphony, you have to take into consideration how many voices will likely play at once and provide headroom, typically by attenuating each voice.
In practice, you can achieve this with AVAudioPlayer by setting each instance's volume property to 0.316 for 10 dB of headroom (enough for 8 simultaneous voices).
The obvious problem here is that when the user plays a single voice, it may seem too quiet—you'll want to experiment with various headroom values and typical user behavior and adjust to taste. (It's also signal-dependent: your piano samples may clip more or less easily than other waveforms, depending on their recorded amplitude.)
Depending on your app's intended user, you might consider making this headroom parameter available to them.
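For example, one minimal way to bake that in, with the 0.316 figure from above kept as a single tunable constant (noteURL here is just whatever file URL your key maps to):

    #import <AVFoundation/AVFoundation.h>

    // Roughly 10 dB of headroom: 10^(-10/20) ≈ 0.316
    static const float kVoiceHeadroomGain = 0.316f;

    static AVAudioPlayer *MakeAttenuatedPlayer(NSURL *noteURL)
    {
        NSError *error = nil;
        AVAudioPlayer *player =
            [[AVAudioPlayer alloc] initWithContentsOfURL:noteURL error:&error];
        player.volume = kVoiceHeadroomGain;   // attenuate every voice up front
        [player prepareToPlay];
        return player;
    }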
Discontinuities/Performance
The pops and clicks you're hearing may not be a result of clipping, but rather a side effect of the fact that you're using mp3 as your audio file format. This is a Bad Idea™. iOS devices only have one hardware stereo mp3 decoder, so as soon as you spin up a second, third, etc. voice, iOS has to decode the mp3 audio data on the CPU. Depending on the device, you can only decode a couple of audio streams this way before suffering from underflow discontinuities (cut that in half for stereo files, obviously)... the CPU simply can't decode enough samples for the output audio stream in time, so you hear nasty pops and clicks.
For sample playback, you want to use an LPCM audio encoding (like wav or aiff) or something extremely efficient to decode, like ima4. One strategy that I've used in every app I've shipped that has these types of audio samples is to ship samples in mp3 or aac format, but decode them once to an LPCM file in the app's sandbox the first time the app is launched. This way you get the benefit of a smaller app bundle and low CPU utilization/higher polyphony at runtime when decoding the samples. (With a small hit to the first-time user experience while the user waits for the samples to be decoded.)
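A sketch of that one-time decode step using the ExtAudioFile API; the CAF container and the 44.1 kHz/16-bit mono client format below are just assumptions, and error handling is kept minimal:

    #import <AudioToolbox/AudioToolbox.h>
    #import <Foundation/Foundation.h>

    // Decode a compressed sample (mp3/aac) to an LPCM .caf file once,
    // e.g. on first launch. Returns noErr on success.
    static OSStatus DecodeToLPCM(NSURL *srcURL, NSURL *dstURL)
    {
        ExtAudioFileRef src = NULL, dst = NULL;
        OSStatus err = ExtAudioFileOpenURL((__bridge CFURLRef)srcURL, &src);
        if (err != noErr) return err;

        // Client/destination format: 16-bit signed integer, mono, 44.1 kHz (assumed).
        AudioStreamBasicDescription lpcm = {0};
        lpcm.mSampleRate       = 44100.0;
        lpcm.mFormatID         = kAudioFormatLinearPCM;
        lpcm.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
        lpcm.mChannelsPerFrame = 1;
        lpcm.mBitsPerChannel   = 16;
        lpcm.mBytesPerFrame    = 2;
        lpcm.mFramesPerPacket  = 1;
        lpcm.mBytesPerPacket   = 2;

        ExtAudioFileSetProperty(src, kExtAudioFileProperty_ClientDataFormat,
                                sizeof(lpcm), &lpcm);
        ExtAudioFileCreateWithURL((__bridge CFURLRef)dstURL, kAudioFileCAFType,
                                  &lpcm, NULL, kAudioFileFlags_EraseFile, &dst);

        enum { kFramesPerRead = 4096 };
        SInt16 scratch[kFramesPerRead];
        AudioBufferList bufList;
        bufList.mNumberBuffers = 1;
        bufList.mBuffers[0].mNumberChannels = 1;

        for (;;) {
            bufList.mBuffers[0].mData = scratch;
            bufList.mBuffers[0].mDataByteSize = sizeof(scratch);
            UInt32 frames = kFramesPerRead;
            err = ExtAudioFileRead(src, &frames, &bufList);   // decodes to LPCM
            if (err != noErr || frames == 0) break;
            err = ExtAudioFileWrite(dst, frames, &bufList);
            if (err != noErr) break;
        }

        ExtAudioFileDispose(src);
        ExtAudioFileDispose(dst);
        return err;
    }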
My understanding is that AVAudioPlayer isn't meant to be used like that. In general, when combining lots of sounds into a single output like that, you want to open a single stream and mix the sounds yourself.
What you are encountering is clipping — it's occurring because the combined volumes of the sounds you're playing are exceeding the maximum possible volume. You need to decrease the volume of these sounds when there's more than one playing at a time.
I'm trying to get samples from an AudioQueue to show the spectrum of the music (like in iTunes) on the iPhone.
I've read a lot of posts, but almost all of them ask about getting samples when recording, not when playing :(
I'm using Audio Queue Services for streaming audio. Please help me understand the following points:
1/ Where can I get access to the samples (PCM, not mp3; I'm using an mp3 stream)?
2/ Should I collect samples in my own buffer to apply an FFT?
3/ Is it possible to get frequencies without an FFT transformation?
4/ How can I synchronize my FFT shift in the buffer with the currently playing samples?
thanks,
update:
AudioQueueProcessingTapNew
For iOS 6+, this works fine for me. But what about iOS 5?
For playing audio, the idea is to get at the samples before you feed them to the Audio Queue callback. You may need to convert any compressed audio file format into raw PCM samples beforehand. This can be done using one of the AVFoundation converter or file reader services.
You can then copy frames of data from the same source used to feed the Audio Queue callback buffers, and apply your FFT or other DSP for visualization to them.
You can use either FFTs or a bank of band-pass filters to get frequency info, but the FFT is very efficient at this.
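For reference, a minimal magnitude-spectrum sketch using the Accelerate framework's vDSP FFT, assuming you already have a block of float PCM samples whose length is a power of two (in real code, create the FFTSetup once and reuse it):

    #include <Accelerate/Accelerate.h>
    #include <math.h>

    // Compute the (squared) magnitude spectrum of `samples` (length n, a power of two).
    // `magnitudes` receives n/2 values, one per frequency bin.
    static void ComputeSpectrum(const float *samples, float *magnitudes, int n)
    {
        vDSP_Length log2n = (vDSP_Length)lrintf(log2f((float)n));
        FFTSetup setup = vDSP_create_fftsetup(log2n, kFFTRadix2);

        float real[n/2], imag[n/2];
        DSPSplitComplex split = { real, imag };

        // Pack the real input into the split-complex layout vDSP_fft_zrip expects.
        vDSP_ctoz((const DSPComplex *)samples, 2, &split, 1, n/2);

        // In-place real forward FFT.
        vDSP_fft_zrip(setup, &split, 1, log2n, FFT_FORWARD);

        // Squared magnitude per bin (take sqrt or convert to dB for display).
        vDSP_zvmags(&split, 1, magnitudes, 1, n/2);

        vDSP_destroy_fftsetup(setup);
    }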
Synchronization needs to be done by trial-and-error, as Apple does not specify exact audio and view graphic display latencies, which may differ between iOS devices and OS versions anyway. But short Audio Queue buffers or using the RemoteIO Audio Unit may give you better control of the audio latency, and OpenGL ES will give you better control of the graphic latency.
So I'm making an app, and what I need to do is detect when someone starts talking (for example) and then record that sound.
I found this tutorial http://mobileorchard.com/tutorial-detecting-when-a-user-blows-into-the-mic/ but it starts recording right at the beginning and then detects the sound based on the recording.
Is there any other way to detect a sound without actually starting the recorder first? What I thought of was having two recorders, one for detection and one for actually recording the sound. Another solution would be to edit (trim) the sound after it's recorded.
Are these approaches somehow standard or is there a better way to detect sound?
Thanks.
edit: if anyone ever reads this, I also found this http://bonkel.wordpress.com/2010/03/03/frequency-detection-using-fourier-transform/
If you don't mind getting a little dirty, you could go down to a lower level, to CoreAudio, and read data out of the input buffers until you see values exceeding your threshold, then start recording those input buffers or trigger a higher-level recording call. You can similarly stop recording after a period of silence.
If you use CoreAudio, you have a lot of control over what you record. You could, pretty easily, filter out background noise, or add beeps to signify when the recording stopped due to silence, and even add markers to use later to match time to the recording.
CoreAudio does require you to do more work. You will have to read the microphone buffers on a timely basis and either save or discard the data pretty quickly in order not to drop any sound data. This isn't that hard, as the devices have plenty of CPU power to do that and other tasks at the same time - you just have to have a good grasp of CoreAudio.
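The threshold test itself can be as simple as an RMS check on each incoming buffer of microphone samples; the 16-bit mono format and the -30 dBFS figure below are just starting-point assumptions:

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>

    // Returns true if a buffer of 16-bit mono samples is louder than thresholdDB
    // (in dBFS, e.g. -30.0f). Call this on each buffer you read from the mic.
    static bool BufferExceedsThreshold(const int16_t *samples, int count,
                                       float thresholdDB)
    {
        double sumSquares = 0.0;
        for (int i = 0; i < count; i++) {
            double s = samples[i] / 32768.0;      // normalize to -1.0 ... 1.0
            sumSquares += s * s;
        }
        double rms = sqrt(sumSquares / count);
        double db  = 20.0 * log10(rms + 1e-12);   // guard against log10(0)
        return db > thresholdDB;
    }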
There are plenty of Apple CoreAudio samples that can guide you. The WWDC 2010 CoreAudio sessions are also a must-see.
You could use either the Audio Queue or the Core Audio (RemoteIO Audio Unit) API. Unless your app requires low latency, the Audio Queue API may be simpler to use.
You need to start the recording API to detect any sound, but you don't need to save everything you get from the recording callback to a file.
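In other words, the same input callback can run the whole time and only persist buffers while a flag is set. A rough sketch for an LPCM recording, assuming a hypothetical state struct that carries the flag, the AudioFileID (opened elsewhere), and the running packet index:

    #import <AudioToolbox/AudioToolbox.h>
    #include <stdbool.h>

    typedef struct {
        AudioFileID audioFile;     // created elsewhere with AudioFileCreateWithURL()
        SInt64      packetIndex;   // where the next packets go in the file
        bool        isRecording;   // flipped when your sound-detection test fires
    } GatedRecorderState;

    static void GatedInputCallback(void *inUserData,
                                   AudioQueueRef inAQ,
                                   AudioQueueBufferRef inBuffer,
                                   const AudioTimeStamp *inStartTime,
                                   UInt32 inNumPackets,
                                   const AudioStreamPacketDescription *inPacketDesc)
    {
        GatedRecorderState *state = (GatedRecorderState *)inUserData;

        // Always analyze the buffer here (e.g. a level/threshold test) ...
        // ... but only write it to the file while we are "recording".
        if (state->isRecording && inNumPackets > 0) {
            AudioFileWritePackets(state->audioFile, false,
                                  inBuffer->mAudioDataByteSize, inPacketDesc,
                                  state->packetIndex, &inNumPackets,
                                  inBuffer->mAudioData);
            state->packetIndex += inNumPackets;
        }

        AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
    }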
I'm currently playing a stream in my iOS app, but one feature we'd like to add is visualization of the output wave. I use an output audio queue to play the stream, but have found no way to read the output buffer. Can this be achieved using audio queues, or should it be done with a lower-level API?
To visualize, you presumably need PCM (uncompressed) data, so if you're pushing some compressed format into the queue like MP3 or AAC, then you never see the data you need. If you were working with PCM (maybe you're uncompressing it yourself with the Audio Conversion APIs), then you could visualize before putting samples into the queue. But then the problem would be latency - you want to visualize samples when they play, not when they go into the queue.
For latency reasons alone, you probably want to be using audio units.
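If you do go the audio unit route, the render callback is the natural place both to supply the output samples and to grab a copy for the visualizer. A rough sketch, assuming 16-bit interleaved LPCM on bus 0 and two hypothetical helpers, FillPlaybackSamples() and PushToVisualizer(); the rest of the RemoteIO setup is omitted:

    #import <AudioUnit/AudioUnit.h>

    // Hypothetical helpers implemented elsewhere in your player:
    extern void FillPlaybackSamples(SInt16 *dst, UInt32 frames);     // decoded PCM in
    extern void PushToVisualizer(const SInt16 *src, UInt32 frames);  // samples out to the UI

    static OSStatus MyRenderCallback(void *inRefCon,
                                     AudioUnitRenderActionFlags *ioActionFlags,
                                     const AudioTimeStamp *inTimeStamp,
                                     UInt32 inBusNumber,
                                     UInt32 inNumberFrames,
                                     AudioBufferList *ioData)
    {
        SInt16 *out = (SInt16 *)ioData->mBuffers[0].mData;

        FillPlaybackSamples(out, inNumberFrames);   // what the hardware will play
        PushToVisualizer(out, inNumberFrames);      // the same samples, at (almost) the same moment

        return noErr;
    }

    // Attach the callback to the RemoteIO unit's output bus during setup:
    static void AttachRenderCallback(AudioUnit ioUnit, void *refCon)
    {
        AURenderCallbackStruct cb = { MyRenderCallback, refCon };
        AudioUnitSetProperty(ioUnit, kAudioUnitProperty_SetRenderCallback,
                             kAudioUnitScope_Input, 0, &cb, sizeof(cb));
    }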
It cannot actually be done with audio queues. To do this, I need to implement the streamer with audio units.
I am new to Core Audio and really lost. I am trying to record audio, apply voice modulation to that recording, and play it back. I have looked at the example SpeakHere, which uses Audio Queue for audio recording. I am stuck at the part of how to change the audio samples. I understand that it can be done using an Audio Unit, changing the audio samples in the callback function, but I have no idea what to apply to those samples to change them (will changing the pitch help?).
If you could direct me to some source code, a tutorial, or any site that explains voice modulation for Objective-C, it would really help me. Thank you all in advance.
What you are trying to do here is not that simple. Basically, you would have to implement a vocoder ("voice-coder") to change a voice. The Wikipedia links should help you there.
Then, you still have to manipulate those samples in CoreAudio. You can do this using Audio Queue Services, but that's not exactly an easy-to-use API. It might actually be less trouble to use one of the simpler CoreAudio APIs and wrap your vocoder in an Audio Unit.
Do you have some experience with audio processing? Implementing a vocoder without some prior knowledge about audio processing in general is a tough task.
First, to actually answer your question: when you call the AudioQueueNewInput() function, you pass it the name of a routine that will be called every time data is available to you. You probably called it MyInputBufferHandler() or something. Its third argument is an AudioQueueBufferRef, which holds the incoming data.
Be aware that this is not as simple as looking at each sample (amplitude) and lowering or raising it. You receive samples in the temporal (time) domain as amplitudes. There is no pitch or frequency information available. What you need to do is move the incoming samples (waveform) into the frequency domain, wherein each "point" in that space is a frequency and its accompanying power and phase. You can do that with an FFT (fast Fourier transform), but the mathematics are somewhat sophisticated. Apple does provide FFT routines in the Accelerate framework, but be aware that you are wading into very deep water here.
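As a concrete starting point, here is roughly what the hand-off from the Audio Queue buffer into the frequency domain can look like with the Accelerate framework, assuming 16-bit mono LPCM and a power-of-two frame count (in real code, cache the FFTSetup instead of recreating it per buffer):

    #include <Accelerate/Accelerate.h>
    #include <math.h>
    #include <stdint.h>

    // Take the raw 16-bit samples (the SInt16 data from an AudioQueueBuffer) into
    // the frequency domain, producing per-bin power and phase. `n` must be a power of two.
    static void AnalyzeBuffer(const int16_t *pcm, int n, float *power, float *phase)
    {
        float floats[n];
        vDSP_vflt16(pcm, 1, floats, 1, n);               // 16-bit int -> float

        float real[n/2], imag[n/2];
        DSPSplitComplex split = { real, imag };
        vDSP_ctoz((const DSPComplex *)floats, 2, &split, 1, n/2);

        vDSP_Length log2n = (vDSP_Length)lrintf(log2f((float)n));
        FFTSetup setup = vDSP_create_fftsetup(log2n, kFFTRadix2);
        vDSP_fft_zrip(setup, &split, 1, log2n, FFT_FORWARD);

        vDSP_zvmags(&split, 1, power, 1, n/2);           // power per bin
        vDSP_zvphas(&split, 1, phase, 1, n/2);           // phase per bin

        vDSP_destroy_fftsetup(setup);
    }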