Counting audio power peaks iOS - iphone

Edited the question due to progressive insights :-)
I am creating an app that is listening to the audio input.
I want it to count peaks. (peaks will be at a max frequency of about 10 Hz.)
After a lot of searching, I ended up using the AudioQueue Service as that will be able to give me the raw input data.
I am using a stripped down version (no playback) of the SpeakHere example, but instead of simply writing the buffer to the filesystem, I want to look at the individual sample data.
Think I am on the right track now, but I don't understand how to work with the buffers.
I am trying to isolate the data of one sample. So that for loop in the following function, does that make any sense, and
what should I put in there to get one sample?
void AQRecorder::MyInputBufferHandler( void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer, const AudioTimeStamp *inStartTime, UInt32 inNumPackets, const AudioStreamPacketDescription* inPacketDesc)
{
// AudioQueue callback function, called when an input buffer has been filled.
AQRecorder *aqr = (AQRecorder *)inUserData;
try {
if (inNumPackets > 0) {
/* // write packets to file
XThrowIfError(AudioFileWritePackets(aqr->mRecordFile,FALSE,inBuffer->mAudioDataByteSize,inPacketDesc,aqr->mRecordPacket,&inNumPackets,inBuffer->mAudioData),
"AudioFileWritePackets failed");*/
SInt16 sample;
for (UInt32 sampleIndex=0; sampleIndex < inNumPackets; ++sampleIndex) {
// What do I put here to look at one sample at index sampleIndex ??
}
aqr->mRecordPacket += inNumPackets;
}
// if we're not stopping, re-enqueue the buffer so that it gets filled again
if (aqr->IsRunning())
XThrowIfError(AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL),
"AudioQueueEnqueueBuffer failed");
} catch (CAXException e) {
char buf[256];
fprintf(stderr, "Error: %s (%s)\n", e.mOperation, e.FormatError(buf));
}
}
(maybe I shouldn't have deleted so much of the original question... what is the policy?)
Originally I was thinking of using the AurioTouch example, but as was pointed out in a comment, that uses throughput and I only need input. It is also a much more complicated example than SpeakHere.

You would probably want to apply some sort of smoothing to your peak power level, maybe an IIR filter, something like:
x_out = 0.9 * x_old + 0.1 * x_in;
:
x_old = x_out;
I haven't used this feature, so I don't know if it would do everything you want. If it doesn't, you can drop down a level and use a RemoteIO audio unit, catching sound as it comes in via the input callback (as opposed to the render callback, which fires when the speakers are hungry for data).
Note that in the input callback you have to create your own buffers. Don't assume that the buffer pointer you get as the last parameter points to something valid; it doesn't.
Anyway, you could use a vDSP function to get the squared magnitude of the entire buffer (1024 floats, or whatever your buffer size and stream format give you), and then smooth that yourself.
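A rough sketch of that last idea in plain C. The loop is only a stand-in for whichever vDSP routine you end up using, and the names `mean_square` and `smooth_level` are made up for illustration. It assumes the buffer already holds float samples.

```c
#include <stddef.h>

/* Mean of the squared samples in one buffer.
   Plain-C stand-in for a vectorised vDSP call. */
static float mean_square(const float *samples, size_t count)
{
    if (count == 0) return 0.0f;
    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i)
        sum += samples[i] * samples[i];
    return sum / (float)count;
}

/* One-pole IIR smoother from above: x_out = 0.9 * x_old + 0.1 * x_in. */
static float smooth_level(float x_old, float x_in)
{
    return 0.9f * x_old + 0.1f * x_in;
}
```

Per buffer you would call `level = smooth_level(level, mean_square(buf, n));` and keep `level` between callbacks.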

This loops through all samples in the buffer:
SInt16 *buffer = (SInt16 *)inBuffer->mAudioData; // assumes 16-bit mono linear PCM, i.e. one sample per packet
for (UInt32 sampleIndex = 0; sampleIndex < inNumPackets; ++sampleIndex) {
SInt16 sample = buffer[sampleIndex]; // signed amplitude of one sample
aqr->AnalyseSample(sample);
}
The tricky part: aqr points to the instance of the recorder. The callback is a static function, so it can't access member variables or member functions directly.
To count the peaks, I keep track of a long-term average and a short-term average. If the short-term average is a certain factor bigger than the long-term average, there is a peak. When the short-term average goes down again, the peak has passed.
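For anyone who wants to try this, here is a minimal sketch of the two-average idea in plain C. It is not the actual AnalyseSample code: the struct, the function names and all four constants are assumptions for illustration and need tuning against real input. The small noise floor is an extra I added so the counter doesn't fire while the long-term average is still near zero.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    float    shortAvg;   /* fast-moving average of |sample| */
    float    longAvg;    /* slow-moving average of |sample| */
    bool     inPeak;     /* true while the short-term average is above the threshold */
    unsigned peakCount;  /* number of peaks seen so far */
} PeakCounter;

/* Assumed values, in units of 16-bit sample amplitude where relevant. */
static const float kShortCoeff = 0.01f;    /* short-term averaging speed */
static const float kLongCoeff  = 0.0001f;  /* long-term averaging speed */
static const float kPeakFactor = 3.0f;     /* "a certain factor bigger" */
static const float kNoiseFloor = 100.0f;   /* keeps the threshold above zero */

static void peak_counter_init(PeakCounter *pc)
{
    pc->shortAvg  = 0.0f;
    pc->longAvg   = 0.0f;
    pc->inPeak    = false;
    pc->peakCount = 0;
}

/* Feed one sample; a peak is counted once, when the threshold is first crossed. */
static void peak_counter_feed(PeakCounter *pc, int16_t sample)
{
    float level = sample < 0 ? -(float)sample : (float)sample;
    pc->shortAvg += kShortCoeff * (level - pc->shortAvg);
    pc->longAvg  += kLongCoeff  * (level - pc->longAvg);

    bool above = pc->shortAvg > kPeakFactor * (pc->longAvg + kNoiseFloor);
    if (above && !pc->inPeak)
        pc->peakCount++;      /* rising edge: a new peak starts */
    pc->inPeak = above;       /* falling edge: the peak has passed */
}
```

With peaks arriving at roughly 10 Hz at most, the short-term average has to settle well within 100 ms, so the coefficients depend on the sample rate you record at.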

Related

Interpreting inputBuffer's Value in a Callback

I am basing my code off of Portaudio's paex_record_file.c example. One of the parameters in the callback is inputBuffer, and I wanted to use its data to calculate other numbers with the double/float type. I changed the file from a .raw to a .txt, but notepad still cannot read it, leading me to believe its data is not actually encoded as a number. How is the data stored in inputBuffer and how can I do arithmetic with it (add, multiply, divide, etc)?
This is how I initialized inputParameters:
inputParameters.device = Pa_GetDefaultInputDevice(); /* default input device */
if (inputParameters.device == paNoDevice) {
fprintf(stderr,"Error: No default input device.\n");
goto error;
}
inputParameters.channelCount = 2; /* stereo input */
inputParameters.sampleFormat = paFloat32;
inputParameters.suggestedLatency = Pa_GetDeviceInfo( inputParameters.device )->defaultLowInputLatency;
inputParameters.hostApiSpecificStreamInfo = NULL;
This question is somewhat related to print floats from audio input callback function (unanswered).
The inputBuffer parameter to the callback is a void*. The actual type of the underlying buffer depends on the parameters and the flags that you pass to Pa_OpenStream.
If you specified paFloat32 then there will be a float* in there somewhere. However, there are two possibilities:
Interleaved: inputParameters.sampleFormat = paFloat32;
Non-Interleaved: inputParameters.sampleFormat = paFloat32|paNonInterleaved;
You specified the interleaved option. In this case, inputBuffer points to a single buffer of interleaved floats. So you can write:
float *samples = (float*)inputBuffer;
In a two channel stream samples will contain interleaved left and right samples, e.g.:
samples[0]; // first left sample
samples[1]; // first right sample
samples[2]; // second left sample
samples[3]; // second right sample
// etc.
For completeness: If it had been a non-interleaved stream then inputBuffer points to an array of pointers to single-channel buffers. To extract the buffer pointers you would write something like:
float *left = ((float **) inputBuffer)[0];
float *right = ((float **) inputBuffer)[1];
Note that in all cases framesPerBuffer counts frames not samples. A frame includes one sample from each channel. For example, in a stereo stream, a frame includes both the left and right channel samples.
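To make the frame/sample distinction concrete, here is a small sketch that does arithmetic on one channel of an interleaved float buffer. The helper name `channel_peak` is hypothetical and the code uses no PortAudio calls; inside the callback you would pass `(const float *)inputBuffer`, `framesPerBuffer` and your channel count.

```c
/* Largest absolute sample value of one channel in an interleaved buffer.
   framesPerBuffer counts frames, so the buffer holds
   framesPerBuffer * channelCount floats in total. */
static float channel_peak(const float *samples, unsigned long framesPerBuffer,
                          int channelCount, int channel)
{
    float peak = 0.0f;
    for (unsigned long frame = 0; frame < framesPerBuffer; ++frame) {
        float s = samples[frame * (unsigned long)channelCount + (unsigned long)channel];
        if (s < 0.0f) s = -s;
        if (s > peak) peak = s;
    }
    return peak;
}
```

For a stereo stream, `channel_peak(samples, framesPerBuffer, 2, 0)` looks only at the left samples and `channel_peak(samples, framesPerBuffer, 2, 1)` only at the right ones.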

iOS Core Audio render callback works on simulator, not on device

My callback looks like this:
static OSStatus renderInput(void *inRefCon, AudioUnitRenderActionFlags *ioActionFlags, const AudioTimeStamp *inTimeStamp, UInt32 inBusNumber, UInt32 inNumberFrames, AudioBufferList *ioData)
{
AudioSampleType *outBuffer = (AudioSampleType *)ioData->mBuffers[0].mData;
memset(outBuffer, 0, sizeof(AudioSampleType)*inNumberFrames*kNumChannels);
//copy a sine wave into outBuffer
double max_aust = pow(2.f, (float)(sizeof(AudioSampleType)*8.0 - 1.f)) - 1.0;
for(int i = 0; i < inNumberFrames; i++) {
SInt16 val = (SInt16) (gSine2_[(int)phase2_] * max_aust);
outBuffer[2*i] = outBuffer[2*i+1] = (AudioSampleType)val;
phase2_ += inc2_;
if(phase2_ > 1024) phase2_ -= 1024;
}
return noErr;
}
This is a super basic render callback that should just play a sine wave. It does on the simulator, it does NOT on the device. In fact, I can get no audio from the device. Even if I add a printf to check outBuffer, it shows that outBuffer is filled with samples of a sine wave.
I'm setting the session type to Ambient, but I've tried PlayAndRecord and MediaPlayback as well. No luck with either. My preferred framesPerBuffer is 1024 (which is what I get on the simulator and device). My sample rate is 44100 Hz. I've tried 48000 as well just in case. I've also tried changing the framesPerBuffer.
Are there any other reasons that the samples would not reach the hardware on the device?
UPDATE:
I just found out that if I plug my headphones into the device I hear what sounds like a sine wave that is clipping really horribly. This made me think that possibly the device was expecting floating point instead of signed int, but when I changed the values to -1 to 1 there's just no audio (device or simulator, as expected since the engine is set to accept signed int, not floating point).
I can't tell for sure without seeing more of your setup, but it sounds very much like you're getting bitten by the difference between AudioSampleType (SInt16 samples) and AudioUnitSampleType (fixed 8.24 samples inside of a SInt32 container). It's almost certainly the case that AudioUnitSampleType is the format expected in your callback. This post on the Core Audio mailing list does a very good job explaining the difference between the two, and why they exist.
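If that is the problem, the numeric difference between the two formats looks roughly like this. It is a sketch in plain C with hypothetical function names, assuming 8.24 means 24 fractional bits, so that 1.0 is `1 << 24`, while 16-bit PCM treats 32768 as full scale.

```c
#include <stdint.h>

/* Number of fractional bits in an 8.24 fixed-point sample. */
enum { kFractionBits = 24 };

/* 16-bit PCM (full scale 32768) to 8.24 fixed point (full scale 1 << 24).
   Multiplying instead of shifting keeps negative values well defined. */
static int32_t pcm16_to_fixed824(int16_t s)
{
    return (int32_t)s * (1 << (kFractionBits - 15));
}

/* 8.24 fixed point back to 16-bit PCM, clamping anything outside [-1.0, 1.0). */
static int16_t fixed824_to_pcm16(int32_t f)
{
    int32_t v = f / (1 << (kFractionBits - 15));
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (int16_t)v;
}
```

Writing raw SInt16 values into a buffer that is read as 8.24 leaves them 512 times too small, and pairs of them end up sharing one 32-bit slot, so distorted or missing output would not be surprising.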
Since I don't know what your setup looks like, I suggest reading this: http://www.cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html
The sample code is for a mono tone generator. If you want stereo, fill the second channel too.
The pointer to the second channel's buffer is:
const int secondChannel = 1;
Float32 *bufferSecondChannel = (Float32 *)ioData->mBuffers[secondChannel].mData;
Hope this helps.
You may need to set up the audio session (initialize it, set its category, and activate it):
OSStatus activationResult = AudioSessionSetActive(true);
More at:
http://developer.apple.com/library/ios/#documentation/Audio/Conceptual/AudioSessionProgrammingGuide/Cookbook/Cookbook.html

Recording with remote I/O, AudioUnitRender -50 return code

I've been working on a frequency detection application for iOS and I'm having an issue filling a user-defined AudioBufferList with audio samples from the microphone.
I'm getting a return code of -50 when I call AudioUnitRender in my InputCallback method. I believe this means one of my parameters is invalid. I'm guessing it's the AudioBufferList, but I haven't been able to figure out what is wrong with it. I think I've set it up so it matches the data format I've specified in my ASBD.
Below is the remote I/O setup and function calls that I believe could be incorrect:
ASBD:
size_t bytesPerSample = sizeof(AudioUnitSampleType);
AudioStreamBasicDescription localStreamFormat = {0};
localStreamFormat.mFormatID = kAudioFormatLinearPCM;
localStreamFormat.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical;
localStreamFormat.mBytesPerPacket = bytesPerSample;
localStreamFormat.mBytesPerFrame = bytesPerSample;
localStreamFormat.mFramesPerPacket = 1;
localStreamFormat.mBitsPerChannel = 8 * bytesPerSample;
localStreamFormat.mChannelsPerFrame = 2;
localStreamFormat.mSampleRate = sampleRate;
InputCallback Declaration:
err = AudioUnitSetProperty(ioUnit, kAudioOutputUnitProperty_SetInputCallback,
kAudioUnitScope_Input,
kOutputBus, &callbackStruct, sizeof(callbackStruct));
AudioBufferList Declaration:
// Allocate AudioBuffers
bufferList = (AudioBufferList *)malloc(sizeof(AudioBuffer));
bufferList->mNumberBuffers = 1;
bufferList->mBuffers[0].mNumberChannels = 2;
bufferList->mBuffers[0].mDataByteSize = 1024;
bufferList->mBuffers[0].mData = calloc(256, sizeof(uint32_t));
InputCallback Function:
AudioUnit rioUnit = THIS->ioUnit;
OSStatus renderErr;
UInt32 bus1 = 1;
renderErr = AudioUnitRender(rioUnit, ioActionFlags, inTimeStamp, bus1, inNumberFrames, THIS->bufferList);
A few things to note:
Sample Rate = 22050 Hz
Since the canonical format of remote I/O data is 8.24-bit fixed point, I'm assuming the samples are 32 bits each (or 4 bytes). Since an unsigned int is 4 bytes, I'm using that to allocate my audio buffer.
I can get the same code to render audio correctly if I implement the audio data flow as PassThru rather than input only.
I've already looked at Michael Tyson's blog post on Remote I/O. Didn't see anything there different from what I'm doing.
Thanks again, you all are awesome!
Demetri
If you have 2 channels per frame, you cannot have bytesPerSample as the size of the frame. Since the terminology is a bit confusing:
A sample is a single value at a given position in a waveform
A channel refers to data associated with a particular audio stream, ie, left/right channel for stereo, a single channel for mono, etc.
A frame contains the samples for all channels for a given position in a waveform
A packet contains one or more frames
So basically, you need to use bytesPerSample * mChannelsPerFrame for mBytesPerFrame, and use mBytesPerFrame * mFramesPerPacket for mBytesPerPacket.
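Spelled out as arithmetic, for interleaved PCM only. The struct below is a hypothetical stand-in for the size-related fields of AudioStreamBasicDescription so that the sketch compiles without the Core Audio headers; it is not the real type.

```c
#include <stdint.h>

/* Hypothetical stand-in for the size-related ASBD fields. */
typedef struct {
    uint32_t bytesPerPacket;
    uint32_t framesPerPacket;
    uint32_t bytesPerFrame;
    uint32_t channelsPerFrame;
    uint32_t bitsPerChannel;
} PcmLayout;

/* Derive the frame and packet sizes from sample size and channel count. */
static PcmLayout interleaved_layout(uint32_t bytesPerSample, uint32_t channels)
{
    PcmLayout l;
    l.channelsPerFrame = channels;
    l.bitsPerChannel   = 8 * bytesPerSample;
    l.framesPerPacket  = 1;                           /* uncompressed PCM */
    l.bytesPerFrame    = bytesPerSample * channels;   /* one sample per channel */
    l.bytesPerPacket   = l.bytesPerFrame * l.framesPerPacket;
    return l;
}
```

With 4-byte samples and 2 channels this gives 8 bytes per frame and 8 bytes per packet, rather than the 4 used in the question.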
Also I noticed that you are using 32-bits for your sample size. I'm not sure if you really want to do this -- usually, you want to record audio using 16-bit samples. The sound difference between 16 and 32 bit audio is almost impossible for most listeners to hear (the average CD is mastered at 44.1kHz, 16-bit PCM), and it will spare you 50% of the I/O and storage costs.
One difference is that Tyson's RemoteIO blog post uses 2 bytes per sample of linear PCM. So this might be a format incompatible error.
The line bufferList = (AudioBufferList *)malloc(sizeof(AudioBuffer)); is also wrong. Since AudioBuffer is smaller than AudioBufferList, it does not allocate enough memory.
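A sketch of how the allocation size could be computed instead. The two Demo structs are stand-ins that mirror the field layout of AudioBuffer and AudioBufferList so this compiles anywhere, and the helper names are hypothetical; in real code you would use the Core Audio types directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins mirroring the layout of AudioBuffer / AudioBufferList. */
typedef struct {
    uint32_t mNumberChannels;
    uint32_t mDataByteSize;
    void    *mData;
} DemoAudioBuffer;

typedef struct {
    uint32_t        mNumberBuffers;
    DemoAudioBuffer mBuffers[1];   /* really a variable-length array */
} DemoAudioBufferList;

static void free_buffer_list(DemoAudioBufferList *list)
{
    if (!list) return;
    for (uint32_t i = 0; i < list->mNumberBuffers; ++i)
        free(list->mBuffers[i].mData);
    free(list);
}

/* Allocate the list header plus numBuffers entries, each with its own data. */
static DemoAudioBufferList *alloc_buffer_list(uint32_t numBuffers,
                                              uint32_t channelsPerBuffer,
                                              uint32_t bytesPerBuffer)
{
    if (numBuffers == 0) return NULL;
    size_t size = offsetof(DemoAudioBufferList, mBuffers)
                + (size_t)numBuffers * sizeof(DemoAudioBuffer);
    DemoAudioBufferList *list = calloc(1, size);
    if (!list) return NULL;
    list->mNumberBuffers = numBuffers;
    for (uint32_t i = 0; i < numBuffers; ++i) {
        list->mBuffers[i].mNumberChannels = channelsPerBuffer;
        list->mBuffers[i].mDataByteSize   = bytesPerBuffer;
        list->mBuffers[i].mData           = calloc(1, bytesPerBuffer);
        if (!list->mBuffers[i].mData) {
            free_buffer_list(list);   /* frees whatever was allocated so far */
            return NULL;
        }
    }
    return list;
}
```

For a single buffer, `sizeof(AudioBufferList)` is already enough; the offsetof form only matters once there is more than one buffer, for example one per channel in a non-interleaved format.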

Help with live-updating sound on the iPhone

My question is a little tricky, and I'm not exactly experienced (I might get some terms wrong), so here goes.
I'm declaring an instance of an object called "Singer". The instance is called "singer1". "singer1" produces an audio signal. Now, the following is the code where the specifics of the audio signal are determined:
OSStatus playbackCallback(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData) {
//Singer *me = (Singer *)inRefCon;
static int phase = 0;
for(UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
int samples = ioData->mBuffers[i].mDataByteSize / sizeof(SInt16);
SInt16 values[samples];
float waves;
float volume=.5;
for(int j = 0; j < samples; j++) {
waves = 0;
waves += sin(kWaveform * 600 * phase)*volume;
waves += sin(kWaveform * 400 * phase)*volume;
waves += sin(kWaveform * 200 * phase)*volume;
waves += sin(kWaveform * 100 * phase)*volume;
waves *= 32500 / 4; // <--------- make sure to divide by how many waves you're stacking
values[j] = (SInt16)waves;
values[j] += values[j]<<16;
phase++;
}
memcpy(ioData->mBuffers[i].mData, values, samples * sizeof(SInt16));
}
return noErr;
}
99% of this is borrowed code, so I only have a basic understanding of how it works (I don't know about the OSStatus class or method or whatever this is). However, you see those 4 lines with 600, 400, 200 and 100 in them? Those determine the frequency. Now, what I want to do (for now) is insert my own variable in there in place of a constant, which I can change on a whim. This variable is called "fr1". "fr1" is declared in the header file, but if I try to compile I get an error about "fr1" being undeclared. Currently, my technique to fix this is the following: right beneath where I #import stuff, I add the line
fr1=0.0;//any number will work properly
This sort of works: the code compiles, and singer1.fr1 actually changes value if I tell it to. The problems are now these:
A) Even though this compiles and the specified tone plays (0.0 is no tone), I get the warnings "Data definition has no type or storage class" and "Type defaults to 'int' in declaration of 'fr1'". I bet this is because for some reason it's not seeing my previous declaration in the header file (as a float). However, if I leave this line out, the code won't compile because "fr1 is undeclared".
B) Just because I change the value of fr1 doesn't mean that singer1 will update the value stored inside the "playbackcallback" variable, or whatever is in charge of updating the output buffers. Perhaps this can be fixed by coding differently?
C) Even if this did work, there is still a noticeable "gap" when pausing/playing the audio, which I need to eliminate. This might mean a complete overhaul of the code so that I can "dynamically" insert new values without disrupting anything.
However, the reason I'm going through all this effort to post is that this method does exactly what I want (I can compute a value mathematically and it goes straight to the DAC, which means I can use it in the future to make triangle, square, etc. waves easily). I have uploaded Singer.h and .m to pastebin for your viewing pleasure; perhaps they will help. Sorry, I can't post 2 HTML tags, so here are the full links.
(http://pastebin.com/ewhKW2Tk)
(http://pastebin.com/CNAT4gFv)
So, TL;DR, all I really want to do is be able to define the current equation/value of the 4 waves and re-define them very often without a gap in the sound.
Thanks. (And sorry if the post was confusing or got off track, which I'm pretty sure it did.)
My understanding is that your callback function is called every time the buffer needs to be re-filled. So changing fr1..fr4 will alter the waveform, but only when the buffer updates. You shouldn't need to stop and re-start the sound to get a change, but you will notice an abrupt shift in the timbre if you change your fr values. In order to get a smooth transition in timbre, you'd have to implement something that smoothly changes the fr values over time. Tweaking the buffer size will give you some control over how responsive the sound is to your changing fr values.
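One way to get that smooth transition, sketched in plain C. The struct, the function name and the glide constant are all assumptions for illustration, not anything from the linked projects. The idea is to keep a running phase and nudge the frequency a little toward its target on every sample, instead of multiplying a new frequency by a global sample counter.

```c
typedef struct {
    float phase;        /* current phase in cycles, kept in [0, 1) */
    float freq;         /* frequency currently in use, in Hz */
    float targetFreq;   /* frequency requested by the control code */
} GlideOsc;

/* Assumed smoothing amount per sample; smaller means a slower glide. */
static const float kGlide = 0.001f;

/* Advance the oscillator by one sample and return its phase in cycles.
   Assumes freq stays below sampleRate, so one subtraction is enough to wrap. */
static float glide_osc_next(GlideOsc *osc, float sampleRate)
{
    osc->freq  += kGlide * (osc->targetFreq - osc->freq);
    osc->phase += osc->freq / sampleRate;
    if (osc->phase >= 1.0f)
        osc->phase -= 1.0f;
    return osc->phase;
}
```

In the render loop each output sample would then be something like `sin(2 * M_PI * glide_osc_next(&osc, 44100.0f)) * volume`, with one GlideOsc per stacked wave. Because the phase carries over from sample to sample, changing targetFreq bends the pitch without a click.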
Your issue with fr being undefined is due to your callback being a straight c function. Your fr variables are declared as objective-c instance variables as part of your Singer object. They are not accessible by default.
Take a look at this project, and see how he implements access to his instance variables from within his callback. Basically he passes a reference to his instance to the callback function, and then accesses instance variables through that.
https://github.com/youpy/dowoscillator
notice:
Sinewave *sineObject = inRefCon;
float freq = sineObject.frequency * 2 * M_PI / samplingRate;
and:
AURenderCallbackStruct input;
input.inputProc = RenderCallback;
input.inputProcRefCon = self;
Also, you'll want to move your callback function outside of your @implementation block, because it's not actually part of your Singer object.
You can see this all in action here: https://github.com/coryalder/SineWaver

iPhone audio analysis

I'm looking into developing an iPhone app that will potentially involve a "simple" analysis of audio it is receiving from the standard phone mic. Specifically, I am interested in the highs and lows the mic picks up, and really everything in between is irrelevant to me. Is there an app that does this already (just so I can see what it's capable of)? And where should I look to get started on such code? Thanks for your help.
Look in the Audio Queue framework. This is what I use to get a high water mark:
AudioQueueRef audioQueue; // Imagine this is correctly set up
UInt32 dataSize = sizeof(AudioQueueLevelMeterState) * recordFormat.mChannelsPerFrame;
AudioQueueLevelMeterState *levels = (AudioQueueLevelMeterState*)malloc(dataSize);
float channelAvg = 0;
OSStatus rc = AudioQueueGetProperty(audioQueue, kAudioQueueProperty_CurrentLevelMeter, levels, &dataSize);
if (rc) {
NSLog(@"AudioQueueGetProperty(CurrentLevelMeter) returned %d", (int)rc);
} else {
for (int i = 0; i < recordFormat.mChannelsPerFrame; i++) {
channelAvg += levels[i].mPeakPower;
}
}
free(levels);
// This works because one channel always has an mAveragePower of 0.
return channelAvg;
You can get peak power either in dB full scale (with kAudioQueueProperty_CurrentLevelMeterDB) or simply as a float in the interval [0.0, 1.0] (with kAudioQueueProperty_CurrentLevelMeter).
Don't forget to activate level metering for AudioQueue first:
UInt32 d = 1;
OSStatus status = AudioQueueSetProperty(mQueue, kAudioQueueProperty_EnableLevelMetering, &d, sizeof(UInt32));
Check the 'SpeakHere' sample code. It will show you how to record audio using the AudioQueue API. It also contains some code that analyzes the audio in real time to show a level meter.
You might actually be able to use most of that level meter code to respond to 'highs' and 'lows'.
The AurioTouch example code performs Fourier analysis on the mic input. Could be a good starting point:
https://developer.apple.com/iPhone/library/samplecode/aurioTouch/index.html
Probably overkill for your application.