iPhone audio analysis

I'm looking into developing an iPhone app that will potentially involve a "simple" analysis of audio it is receiving from the standard phone mic. Specifically, I am interested in the highs and lows the mic picks up; everything in between is irrelevant to me. Is there an app that does this already (just so I can see what it's capable of)? And where should I look to get started on such code? Thanks for your help.

Look in the Audio Queue framework. This is what I use to get a high-water mark:
AudioQueueRef audioQueue; // Imagine this is correctly set up
UInt32 dataSize = sizeof(AudioQueueLevelMeterState) * recordFormat.mChannelsPerFrame;
AudioQueueLevelMeterState *levels = (AudioQueueLevelMeterState *)malloc(dataSize);
float channelAvg = 0;
OSStatus rc = AudioQueueGetProperty(audioQueue, kAudioQueueProperty_CurrentLevelMeter, levels, &dataSize);
if (rc) {
    NSLog(@"AudioQueueGetProperty(CurrentLevelMeter) returned %d", (int)rc);
} else {
    for (int i = 0; i < recordFormat.mChannelsPerFrame; i++) {
        channelAvg += levels[i].mPeakPower;
    }
}
free(levels);
// This works because on a mono device the other channel always has an mPeakPower of 0,
// so the sum over channels equals the single channel's peak.
return channelAvg;
You can get peak power in either dB Free Scale (with kAudioQueueProperty_CurrentLevelMeterDB) or simply as a float in the interval [0.0, 1.0] (with kAudioQueueProperty_CurrentLevelMeter).

Don't forget to activate level metering for AudioQueue first:
UInt32 d = 1;
OSStatus status = AudioQueueSetProperty(mQueue, kAudioQueueProperty_EnableLevelMetering, &d, sizeof(UInt32));

Check the 'SpeakHere' sample code. It shows how to record audio using the AudioQueue API, and it also contains some code to analyze the audio in real time to drive a level meter.
You might actually be able to reuse most of that level-meter code to respond to 'highs' and 'lows'.

The aurioTouch example code performs Fourier analysis on the mic input and could be a good starting point:
https://developer.apple.com/iPhone/library/samplecode/aurioTouch/index.html
It is probably overkill for your application, though.

Related

In Unity, how to segment the user's voice from microphone based on loudness?

I need to collect voice pieces from a continuous audio stream, and later process the piece the user has just said (not for speech recognition). What I am focusing on is only segmenting the voice based on its loudness.
If, after at least 1 second of silence, the voice becomes loud enough for a while and then falls silent again for at least 1 second, I treat that as a sentence and segment the voice there.
I just know I can get raw audio data from the AudioClip created by Microphone.Start(). I want to write some code like this:
void Start()
{
    audio = Microphone.Start(deviceName, true, 10, 16000);
}

void Update()
{
    audio.GetData(fdata, 0);
    for (int i = 0; i < fdata.Length; i++) {
        u16data[i] = Convert.ToUInt16(fdata[i] * 65535);
    }
    // ... Process u16data
}
But what I'm not sure is:
Every frame, when I call audio.GetData(fdata, 0), do I get the latest 10 seconds of sound data if fdata is big enough, or less than 10 seconds if it is not? Is that right?
fdata is a float array, and what I need is a 16 kHz, 16 bit PCM buffer. Is it right to convert the data like: u16data[i] = fdata[i] * 65535?
What is the right way to detect loud moments and silent moments in fdata?
No, you have to read starting at the current position within the AudioClip, using Microphone.GetPosition:
Get the position in samples of the recording.
and pass the obtained index to AudioClip.GetData:
Use the offsetSamples parameter to start the read from a specific position in the clip.
fdata = new float[clip.samples * clip.channels];
var currentIndex = Microphone.GetPosition(null);
audio.GetData(fdata, currentIndex);
I don't understand what exactly you need this conversion for. fdata will contain
floats ranging from -1.0f to 1.0f (see AudioClip.GetData),
so if for some reason you need values between short.MinValue (= -32768) and short.MaxValue (= 32767), then yes, you can do that using
u16data[i] = Convert.ToUInt16(fdata[i] * short.MaxValue);
Note, however, how Convert.ToUInt16(float) rounds:
value, rounded to the nearest 16-bit unsigned integer. If value is halfway between two whole numbers, the even number is returned; that is, 4.5 is converted to 4, and 5.5 is converted to 6.
If you want explicit control over the rounding, go through Mathf.RoundToInt first:
u16data[i] = Convert.ToUInt16(Mathf.RoundToInt(fdata[i] * short.MaxValue));
Your naming, however, suggests that you actually want unsigned ushort (UInt16) values. Those cannot be negative, so you have to shift the float values up first, mapping the range (-1.0f | 1.0f) onto (0.0f | 1.0f), before multiplying by ushort.MaxValue (= 65535):
u16data[i] = Convert.ToUInt16(Mathf.RoundToInt((fdata[i] + 1f) / 2f * ushort.MaxValue));
What you receive from AudioClip.GetData are the gain values of the audio track between -1.0f and 1.0f.
so a "loud" moment would be where
Mathf.Abs(fdata[i]) >= aCertainLoudThreshold;
and a "silent" moment would be where
Mathf.Abs(fdata[i]) <= aCertainSilentThreshold;
where aCertainSilentThreshold might e.g. be 0.2f and aCertainLoudThreshold might e.g. be 0.8f.

AVAudioPCMBuffer built programmatically, not playing back in stereo

I'm trying to fill an AVAudioPCMBuffer programmatically in Swift to build a metronome. This is the first real app I'm trying to build, so it's also my first audio app. Right now I'm experimenting with different frameworks and methods of getting the metronome looping accurately.
I'm trying to build an AVAudioPCMBuffer with the length of a measure/bar so that I can use the .Loops option of the AVAudioPlayerNode's scheduleBuffer method. I start by loading my file (2 ch, 44100 Hz, Float32, non-interleaved; *.wav and *.m4a both have the same issue) into a buffer, then copying that buffer frame by frame, separated by empty frames, into the barBuffer. The loop below is how I'm accomplishing this.
If I schedule the original buffer to play, it plays back in stereo, but when I schedule the barBuffer, I only get the left channel. As I said, I'm a beginner at programming and have no experience with audio programming, so this might be my lack of knowledge about 32-bit float channels, or about the data type UnsafePointer<UnsafeMutablePointer<Float>>. When I look at the floatChannelData property in Swift, the description makes it sound like this should be copying two channels.
var j = 0
for i in 0..<Int(capacity) {
    barBuffer.floatChannelData.memory[j] = buffer.floatChannelData.memory[i]
    j += 1
}
j += Int(silenceLengthInSamples)
// loop runs 4 times for 4 beats per bar.
edit: I removed the glaring mistake i += 1, thanks to hotpaw2. The right channel is still missing when barBuffer is played back though.
Unsafe pointers in swift are pretty weird to get used to.
floatChannelData.memory[j] only accesses the first channel of data. To access the other channel(s), you have a couple choices:
Using advancedBy
// Where current channel is at 0
// Get a channel pointer aka UnsafePointer<UnsafeMutablePointer<Float>>
let channelN = floatChannelData.advancedBy( channelNumber )
// Get channel data aka UnsafeMutablePointer<Float>
let channelNData = channelN.memory
// Get first two floats of channel channelNumber
let floatOne = channelNData.memory
let floatTwo = channelNData.advancedBy(1).memory
Using Subscript
// Get channel data aka UnsafeMutablePointer<Float>
let channelNData = floatChannelData[ channelNumber ]
// Get first two floats of channel channelNumber
let floatOne = channelNData[0]
let floatTwo = channelNData[1]
Using subscript is much clearer, and the step of advancing and then manually accessing memory is implicit.
For your loop, try accessing all channels of the buffer, and remember to advance j too:
for i in 0..<Int(capacity) {
    for n in 0..<Int(buffer.format.channelCount) {
        barBuffer.floatChannelData[n][j] = buffer.floatChannelData[n][i]
    }
    j += 1
}
Hope this helps!
This looks like a misunderstanding of Swift "for" loops. The Swift "for" loop automatically increments the "i" array index. But you are incrementing it again in the loop body, which means that you end up skipping every other sample (the Right channel) in your initial buffer.

iOS Core Audio render callback works on simulator, not on device

My callback looks like this:
static OSStatus renderInput(void *inRefCon, AudioUnitRenderActionFlags *ioActionFlags,
                            const AudioTimeStamp *inTimeStamp, UInt32 inBusNumber,
                            UInt32 inNumberFrames, AudioBufferList *ioData)
{
    AudioSampleType *outBuffer = (AudioSampleType *)ioData->mBuffers[0].mData;
    memset(outBuffer, 0, sizeof(AudioSampleType) * inNumberFrames * kNumChannels);

    // copy a sine wave into outBuffer
    double max_aust = pow(2.f, (float)(sizeof(AudioSampleType) * 8.0 - 1.f)) - 1.0;
    for (int i = 0; i < inNumberFrames; i++) {
        SInt16 val = (SInt16)(gSine2_[(int)phase2_] * max_aust);
        outBuffer[2*i] = outBuffer[2*i+1] = (AudioSampleType)val;
        phase2_ += inc2_;
        if (phase2_ > 1024) phase2_ -= 1024;
    }
    return noErr;
}
This is a super basic render callback that should just play a sine wave. It does on the simulator, it does NOT on the device. In fact, I can get no audio from the device. Even if I add a printf to check outBuffer, it shows that outBuffer is filled with samples of a sine wave.
I'm setting the session category to Ambient, but I've tried playAndRecord and MediaPlayback as well. No luck with either. My preferred framesPerBuffer is 1024 (which is what I get on both the simulator and the device). My sample rate is 44100 Hz. I've tried 48000 as well just in case, and I've also tried changing framesPerBuffer.
Are there any other reasons that the samples would not reach the hardware on the device?
UPDATE:
I just found out that if I plug my headphones into the device I hear what sounds like a sine wave that is clipping really horribly. This made me think that possibly the device was expecting floating point instead of signed int, but when I changed the values to -1 to 1 there's just no audio (device or simulator, as expected since the engine is set to accept signed int, not floating point).
I can't tell for sure without seeing more of your setup, but it sounds very much like you're getting bitten by the difference between AudioSampleType (SInt16 samples) and AudioUnitSampleType (fixed 8.24 samples inside of a SInt32 container). It's almost certainly the case that AudioUnitSampleType is the format expected in your callback. This post on the Core Audio mailing list does a very good job explaining the difference between the two, and why they exist.
Since I don't know your setup, I suggest reading this: http://www.cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html
The sample code is for a mono tone generator; if you want stereo, fill the second channel too.
The pointer to the second channel's buffer is
const int secondChannel = 1;
Float32 *bufferSecondChannel = (Float32 *)ioData->mBuffers[secondChannel].mData;
Hope this helps.
You may need to set up the audio session (initialize it, set the category, and activate it):
OSStatus result = AudioSessionSetActive(true);
More at:
http://developer.apple.com/library/ios/#documentation/Audio/Conceptual/AudioSessionProgrammingGuide/Cookbook/Cookbook.html

Counting audio power peaks iOS

Edited the question due to progressive insights :-)
I am creating an app that is listening to the audio input.
I want it to count peaks. (Peaks will occur at a maximum rate of about 10 Hz.)
After a lot of searching, I ended up using the AudioQueue Service as that will be able to give me the raw input data.
I am using a stripped down version (no playback) of the SpeakHere example, but instead of simply writing the buffer to the filesystem, I want to look at the individual sample data.
I think I am on the right track now, but I don't understand how to work with the buffers.
I am trying to isolate the data of one sample. Does the for loop in the following function make any sense, and what should I put in there to get one sample?
void AQRecorder::MyInputBufferHandler(void *inUserData, AudioQueueRef inAQ,
                                      AudioQueueBufferRef inBuffer,
                                      const AudioTimeStamp *inStartTime,
                                      UInt32 inNumPackets,
                                      const AudioStreamPacketDescription *inPacketDesc)
{
    // AudioQueue callback function, called when an input buffer has been filled.
    AQRecorder *aqr = (AQRecorder *)inUserData;
    try {
        if (inNumPackets > 0) {
            /* // write packets to file
            XThrowIfError(AudioFileWritePackets(aqr->mRecordFile, FALSE, inBuffer->mAudioDataByteSize,
                                                inPacketDesc, aqr->mRecordPacket, &inNumPackets, inBuffer->mAudioData),
                          "AudioFileWritePackets failed"); */
            SInt16 sample;
            for (UInt32 sampleIndex = 0; sampleIndex < inNumPackets; ++sampleIndex) {
                // What do I put here to look at one sample at index sampleIndex ??
            }
            aqr->mRecordPacket += inNumPackets;
        }
        // if we're not stopping, re-enqueue the buffer so that it gets filled again
        if (aqr->IsRunning())
            XThrowIfError(AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL),
                          "AudioQueueEnqueueBuffer failed");
    } catch (CAXException e) {
        char buf[256];
        fprintf(stderr, "Error: %s (%s)\n", e.mOperation, e.FormatError(buf));
    }
}
(maybe I shouldn't have deleted so much of the original question... what is the policy?)
Originally I was thinking of using the AurioTouch example, but as was pointed out in a comment, that uses throughput and I only need input. It is also a much more complicated example than SpeakHere.
You would probably want to apply some sort of smoothing to your peak power level, maybe an IIR filter, something like:
x_out = 0.9 * x_old + 0.1 * x_in;
...
x_old = x_out;
I haven't used this feature, so I don't know if it would do everything you want. If it doesn't, you can drop a level and use a RemoteIO audio unit, and catch sound as it comes in using the input callback (as opposed to the render callback, which happens when the speakers are hungry for data).
Note that in the input callback you have to create your own buffers; don't assume that just because you get a buffer pointer as the last parameter, it points to something valid. It doesn't.
Anyway, you could use a vDSP function to get the magnitude squared for the vector of the entire buffer (1024 floats, or whatever your buffer size / stream format is), and then smooth that yourself.
This loops through all samples in the buffer (assuming 16-bit LPCM, so one packet is one SInt16):
SInt16 *samples = (SInt16 *)inBuffer->mAudioData;
SInt16 sample;
for (UInt32 sampleIndex = 0; sampleIndex < inNumPackets; ++sampleIndex) {
    sample = samples[sampleIndex]; // one sample at index sampleIndex
    aqr->AnalyseSample(sample);
}
One tricky part: aqr points to the instance of the recorder. The callback is a static function and can't access member variables or member functions directly.
To count the peaks, I keep track of a long-term average and a short-term average. If the short-term average is a certain factor bigger than the long-term average, there is a peak; when the short-term average comes back down, the peak has passed.

Playing generated audio on an iPhone

As a throwaway project for the iPhone to get me up to speed with Objective C and the iPhone libraries, I've been trying to create an app that will play different kinds of random noise.
I've been constructing the noise as an array of floats normalized from [-1,1].
Where I'm stuck is in playing that generated data. It seems like this should be fairly simple, but I've looked into using AudioUnit and AVAudioPlayer, and neither of these seem optimal.
AudioUnit apparently requires a few hundred lines of code to do even this simple task, and AVAudioPlayer seems to require me to convert the audio into something Core Audio can understand (as best I can tell, that means LPCM in a WAV file).
Am I overlooking something, or are these really the best ways to play some sound data stored in array form?
Here's some code to use AudioQueue, which I've modified from the SpeakHere example. I kind of pasted the good parts, so there may be something dangling here or there, but this should be a good start if you want to use this approach:
AudioStreamBasicDescription format;
memset(&format, 0, sizeof(format));
format.mSampleRate       = 44100;
format.mFormatID         = kAudioFormatLinearPCM;
format.mFormatFlags      = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
format.mChannelsPerFrame = 1;
format.mBitsPerChannel   = 16;
format.mBytesPerFrame    = (format.mBitsPerChannel / 8) * format.mChannelsPerFrame;
format.mFramesPerPacket  = 1;
format.mBytesPerPacket   = format.mBytesPerFrame * format.mFramesPerPacket;

AudioQueueRef queue;
AudioQueueNewOutput(&format,
                    AQPlayer::AQOutputCallback,
                    this, // opaque reference to whatever you like
                    CFRunLoopGetCurrent(),
                    kCFRunLoopCommonModes,
                    0,
                    &queue);

const int bufferSize = 0xA000; // 40 KB - around 1/2 sec of 44.1 kHz 16-bit mono PCM
for (int i = 0; i < kNumberBuffers; ++i)
    AudioQueueAllocateBufferWithPacketDescriptions(queue, bufferSize, 0, &mBuffers[i]);

AudioQueueSetParameter(queue, kAudioQueueParam_Volume, 1.0);

UInt32 category = kAudioSessionCategory_MediaPlayback;
AudioSessionSetProperty(kAudioSessionProperty_AudioCategory, sizeof(category), &category);
AudioSessionSetActive(true);

// prime the queue with some data before starting
for (int i = 0; i < kNumberBuffers; ++i)
    OutputCallback(this, queue, mBuffers[i]);

AudioQueueStart(queue, NULL);
The code above refers to this output callback. Each time this callback executes, fill the buffer passed in with your generated audio. Here, I'm filling it with random noise:
void OutputCallback(void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inCompleteAQBuffer) {
    //AQPlayer *that = (AQPlayer *)inUserData;
    SInt8 *data = (SInt8 *)inCompleteAQBuffer->mAudioData;
    inCompleteAQBuffer->mAudioDataByteSize = inCompleteAQBuffer->mAudioDataBytesCapacity;
    for (int i = 0; i < inCompleteAQBuffer->mAudioDataByteSize; ++i)
        data[i] = rand();
    AudioQueueEnqueueBuffer(inAQ, inCompleteAQBuffer, 0, NULL);
}
It sounds like you're coming from a platform that had a simple built-in tone generator. The iPhone doesn't have anything like that; it's easier to play simple sounds from sound files. AudioUnit is for actually processing and generating real audio.
So, yes, you do need an audio file to simply play a sound.