Help with live-updating sound on the iPhone

My question is a little tricky, and I'm not exactly experienced (I might get some terms wrong), so here goes.
I'm declaring an instance of a class called "Singer". The instance is called "singer1". "singer1" produces an audio signal. Now, the following is the code where the specifics of the audio signal are determined:
OSStatus playbackCallback(void *inRefCon,
                          AudioUnitRenderActionFlags *ioActionFlags,
                          const AudioTimeStamp *inTimeStamp,
                          UInt32 inBusNumber,
                          UInt32 inNumberFrames,
                          AudioBufferList *ioData) {
    //Singer *me = (Singer *)inRefCon;
    static int phase = 0;
    for(UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
        int samples = ioData->mBuffers[i].mDataByteSize / sizeof(SInt16);
        SInt16 values[samples];
        float waves;
        float volume = .5;
        for(int j = 0; j < samples; j++) {
            waves = 0;
            waves += sin(kWaveform * 600 * phase) * volume;
            waves += sin(kWaveform * 400 * phase) * volume;
            waves += sin(kWaveform * 200 * phase) * volume;
            waves += sin(kWaveform * 100 * phase) * volume;
            waves *= 32500 / 4; // <--------- make sure to divide by how many waves you're stacking
            values[j] = (SInt16)waves;
            values[j] += values[j] << 16;
            phase++;
        }
        memcpy(ioData->mBuffers[i].mData, values, samples * sizeof(SInt16));
    }
    return noErr;
}
99% of this is borrowed code, so I only have a basic understanding of how it works (I don't know about the OSStatus class or method or whatever this is). However, you see those 4 lines with 600, 400, 200 and 100 in them? Those determine the frequency. Now, what I want to do (for now) is insert my own variable in there in place of a constant, which I can change on a whim. This variable is called "fr1". "fr1" is declared in the header file, but if I try to compile I get an error about "fr1" being undeclared. Currently, my technique to fix this is the following: right beneath where I #import stuff, I add the line
fr1 = 0.0; // any number will work properly
This sort of works, as the code will compile and singer1.fr1 will actually change values if I tell it to. The problems are now these:
A) Even though this compiles and the tone specified will play (0.0 is no tone), I get the warnings "Data definition has no type or storage class" and "Type defaults to 'int' in declaration of 'fr1'". I bet this is because for some reason it's not seeing my previous declaration in the header file (as a float). However, again, if I leave this line out the code won't compile, because "fr1 is undeclared".
B) Just because I change the value of fr1 doesn't mean that singer1 will update the value stored inside playbackCallback, or whatever is in charge of updating the output buffers. Perhaps this can be fixed by coding differently?
C) Even if this did work, there is still a noticeable "gap" when pausing/playing the audio, which I need to eliminate. This might mean a complete overhaul of the code so that I can "dynamically" insert new values without disrupting anything. However, the reason I'm going through all this effort to post is that this method does exactly what I want (I can compute a value mathematically and it goes straight to the DAC, which means I can use it in the future to make triangle, square, etc. waves easily).
I have uploaded Singer.h and .m to pastebin for your viewing pleasure; perhaps they will help. Sorry, I can't post 2 HTML tags, so here are the full links:
(http://pastebin.com/ewhKW2Tk)
(http://pastebin.com/CNAT4gFv)
So, TL;DR, all I really want to do is be able to define the current equation/value of the 4 waves and re-define them very often without a gap in the sound.
Thanks. (And sorry if the post was confusing or got off track, which I'm pretty sure it did.)

My understanding is that your callback function is called every time the buffer needs to be re-filled. So changing fr1..fr4 will alter the waveform, but only when the buffer updates. You shouldn't need to stop and re-start the sound to get a change, but you will notice an abrupt shift in the timbre if you change your fr values. In order to get a smooth transition in timbre, you'd have to implement something that smoothly changes the fr values over time. Tweaking the buffer size will give you some control over how responsive the sound is to your changing fr values.
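One common way to do that (a hedged sketch; "currentFr" and "targetFr" are hypothetical names, not from your code) is a one-pole glide run once per sample inside the callback:
currentFr += 0.001f * (targetFr - currentFr); // small coefficient = slow, smooth glide
The callback then synthesizes with currentFr, which drifts toward whatever value you last stored in targetFr instead of jumping to it.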
Your issue with fr being undefined is due to your callback being a straight C function. Your fr variables are declared as Objective-C instance variables, part of your Singer object, and they are not accessible by default.
Take a look at this project and see how he implements access to his instance variables from within his callback. Basically, he passes a reference to his instance to the callback function and then accesses instance variables through that.
https://github.com/youpy/dowoscillator
Notice:
Sinewave *sineObject = inRefCon;
float freq = sineObject.frequency * 2 * M_PI / samplingRate;
and:
AURenderCallbackStruct input;
input.inputProc = RenderCallback;
input.inputProcRefCon = self;
Also, you'll want to move your callback function outside of your @implementation block, because it's not actually part of your Singer object.
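Putting those pieces together, here is a minimal sketch of the same pattern applied to your code (hypothetical: it assumes fr1 is declared @public in Singer.h so a plain C function can reach it, and "audioUnit" stands for whatever Singer.m calls its RemoteIO unit):
AURenderCallbackStruct input;
input.inputProc = playbackCallback;
input.inputProcRefCon = self; // hand the Singer instance to the callback
AudioUnitSetProperty(audioUnit, kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input, 0, &input, sizeof(input));

OSStatus playbackCallback(void *inRefCon,
                          AudioUnitRenderActionFlags *ioActionFlags,
                          const AudioTimeStamp *inTimeStamp,
                          UInt32 inBusNumber,
                          UInt32 inNumberFrames,
                          AudioBufferList *ioData) {
    Singer *me = (Singer *)inRefCon; // your commented-out line, put to work
    float fr1 = me->fr1;             // re-read the current value every render cycle
    // ... generate the samples using fr1 instead of the hard-coded 600/400/200/100 ...
    return noErr;
}
Because the callback re-reads me->fr1 each time it fires, setting singer1.fr1 from anywhere else takes effect on the next buffer, without stopping and restarting the sound.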
You can see this all in action here: https://github.com/coryalder/SineWaver

Related

In Unity, how to segment the user's voice from microphone based on loudness?

I need to collect voice segments from a continuous audio stream, so I can later process the piece of speech that has just been said (not for speech recognition). What I am focusing on is only segmenting the voice based on its loudness.
If, after at least 1 second of silence, the voice becomes loud enough for a while and then goes silent again for at least 1 second, I say this is a sentence and the voice should be segmented there.
I just know I can get raw audio data from the AudioClip created by Microphone.Start(). I want to write some code like this:
AudioClip audio;
string deviceName = null;              // null = default microphone
float[] fdata = new float[160000];     // 10 s at 16 kHz
ushort[] u16data = new ushort[160000];

void Start()
{
    audio = Microphone.Start(deviceName, true, 10, 16000);
}

void Update()
{
    audio.GetData(fdata, 0);
    for (int i = 0; i < fdata.Length; i++) {
        u16data[i] = Convert.ToUInt16(fdata[i] * 65535);
    }
    // ... Process u16data
}
But what I'm not sure is:
Every frame, when I call audio.GetData(fdata, 0), what I get is the latest 10 seconds of sound data if fdata is big enough, or shorter than 10 seconds if fdata is not big enough. Is that right?
fdata is a float array, and what I need is a 16 kHz, 16 bit PCM buffer. Is it right to convert the data like: u16data[i] = fdata[i] * 65535?
What is the right way to detect loud moments and silent moments in fdata?
No. You have to read starting at the current position within the AudioClip using Microphone.GetPosition:
Get the position in samples of the recording.
and pass the obtained index to AudioClip.GetData:
Use the offsetSamples parameter to start the read from a specific position in the clip
fdata = new float[clip.samples * clip.channels];
var currentIndex = Microphone.GetPosition(null);
audio.GetData(fdata, currentIndex);
I don't understand what exactly you're converting this for. fdata will contain
floats ranging from -1.0f to 1.0f (AudioClip.GetData)
So if for some reason you need to get values between short.MinValue (= -32768) and short.MaxValue (= 32767), then yes, you can do that using:
u16data[i] = Convert.ToUInt16(fdata[i] * short.MaxValue);
Note, however, what the documentation says about Convert.ToUInt16(float):
value, rounded to the nearest 16-bit unsigned integer. If value is halfway between two whole numbers, the even number is returned; that is, 4.5 is converted to 4, and 5.5 is converted to 6.
You might rather want to use Mathf.RoundToInt first, so that a value like 4.5 is also rounded up:
u16data[i] = Convert.ToUInt16(Mathf.RoundToInt(fdata[i] * short.MaxValue));
Your naming, however, suggests that you are actually trying to get unsigned values, ushort (a.k.a. UInt16). For this you cannot have negative values! So you have to shift the float values up in order to map the range (-1.0f | 1.0f) to the range (0.0f | 1.0f) before multiplying by ushort.MaxValue (= 65535):
u16data[i] = Convert.ToUInt16(Mathf.RoundToInt((fdata[i] + 1f) / 2f * ushort.MaxValue));
What you receive from AudioClip.GetData are the amplitude values of the audio track, between -1.0f and 1.0f.
So a "loud" moment would be where
Mathf.Abs(fdata[i]) >= aCertainLoudThreshold;
and a "silent" moment would be where
Mathf.Abs(fdata[i]) <= aCertainSilentThreshold;
where aCertainSilentThreshold might e.g. be 0.2f and aCertainLoudThreshold might e.g. be 0.8f.
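Putting the thresholds and the 1-second rule together, the segmentation itself is just a small state machine over the incoming samples. A hedged, language-neutral sketch in C (hypothetical names and thresholds; the same logic ports directly to your C# Update loop):
#include <math.h>
#include <stdbool.h>

#define SAMPLE_RATE      16000
#define SILENCE_SAMPLES  (1 * SAMPLE_RATE)   /* "at least 1 second" of silence */
#define LOUD_THRESHOLD   0.8f
#define SILENT_THRESHOLD 0.2f

static bool inSentence = false;
static int  silentRun  = SILENCE_SAMPLES;    /* start as if preceded by silence */

/* Feed the samples one by one, in order; returns true when a sentence just ended. */
bool feedSample(float s)
{
    float mag = fabsf(s);
    if (mag >= LOUD_THRESHOLD) {
        /* a loud sound only starts a sentence after at least 1 s of silence */
        if (!inSentence && silentRun >= SILENCE_SAMPLES)
            inSentence = true;
        silentRun = 0;
    } else if (mag <= SILENT_THRESHOLD) {
        silentRun++;
        if (inSentence && silentRun >= SILENCE_SAMPLES) {
            inSentence = false;              /* 1 s of silence: cut the segment here */
            return true;
        }
    }
    return false;
}
In practice you would probably also smooth the magnitudes (e.g. a short moving average) before thresholding, so a single quiet sample inside a word doesn't reset the state.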

Interpreting inputBuffer's Value in a Callback

I am basing my code off of PortAudio's paex_record_file.c example. One of the parameters in the callback is inputBuffer, and I wanted to use its data to calculate other numbers with the double/float type. I changed the file from a .raw to a .txt, but Notepad still cannot read it, leading me to believe its data is not stored as human-readable numbers. How is the data stored in inputBuffer, and how can I do arithmetic with it (add, multiply, divide, etc.)?
This is how I initialized inputParameters:
inputParameters.device = Pa_GetDefaultInputDevice(); /* default input device */
if (inputParameters.device == paNoDevice) {
    fprintf(stderr, "Error: No default input device.\n");
    goto error;
}
inputParameters.channelCount = 2; /* stereo input */
inputParameters.sampleFormat = paFloat32;
inputParameters.suggestedLatency = Pa_GetDeviceInfo( inputParameters.device )->defaultLowInputLatency;
inputParameters.hostApiSpecificStreamInfo = NULL;
This question is somewhat related to print floats from audio input callback function (unanswered).
The inputBuffer parameter to the callback is a void*. The actual type of the underlying buffer depends on the parameters and the flags that you pass to Pa_OpenStream.
If you specified paFloat32 then there will be a float* in there somewhere. However, there are two possibilities:
Interleaved: inputParameters.sampleFormat = paFloat32;
Non-Interleaved: inputParameters.sampleFormat = paFloat32|paNonInterleaved;
You specified the interleaved option. In this case, inputBuffer points to a single buffer of interleaved floats. So you can write:
float *samples = (float*)inputBuffer;
In a two-channel stream, samples will contain interleaved left and right samples, e.g.:
samples[0]; // first left sample
samples[1]; // first right sample
samples[2]; // second left sample
samples[3]; // second right sample
// etc.
For completeness: if it had been a non-interleaved stream, inputBuffer would point to an array of pointers to single-channel buffers. To extract the buffer pointers you would write something like:
float *left = ((float **) inputBuffer)[0];
float *right = ((float **) inputBuffer)[1];
Note that in all cases framesPerBuffer counts frames, not samples. A frame includes one sample from each channel. For example, in a stereo stream, a frame includes both the left and right channel samples.
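For example, here is a hedged sketch of a callback that does arithmetic on those samples (hypothetical names; assumes the interleaved, two-channel paFloat32 stream set up in the question):
#include <stdio.h>
#include "portaudio.h"

static int recordCallback(const void *inputBuffer, void *outputBuffer,
                          unsigned long framesPerBuffer,
                          const PaStreamCallbackTimeInfo *timeInfo,
                          PaStreamCallbackFlags statusFlags,
                          void *userData)
{
    const float *samples = (const float *)inputBuffer;
    double sumOfSquares = 0.0;

    if (samples != NULL) {
        /* 2 channels, so the buffer holds framesPerBuffer * 2 floats */
        for (unsigned long i = 0; i < framesPerBuffer * 2; i++) {
            sumOfSquares += (double)samples[i] * samples[i];
        }
        /* mean square over the buffer; printing from a real-time callback
           is for illustration only -- avoid it in production code */
        printf("mean square: %f\n", sumOfSquares / (double)(framesPerBuffer * 2));
    }
    return paContinue;
}
And regarding the .txt experiment: the file is unreadable because the floats are written as raw bytes; if you want human-readable numbers, you have to fprintf each value yourself.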

Counting audio power peaks iOS

Edited the question due to progressive insights :-)
I am creating an app that is listening to the audio input.
I want it to count peaks. (Peaks will occur at a max frequency of about 10 Hz.)
After a lot of searching, I ended up using Audio Queue Services, as that will be able to give me the raw input data.
I am using a stripped-down version (no playback) of the SpeakHere example, but instead of simply writing the buffer to the filesystem, I want to look at the individual sample data.
I think I am on the right track now, but I don't understand how to work with the buffers.
I am trying to isolate the data of one sample. Does the for loop in the following function make any sense, and what should I put in there to get one sample?
void AQRecorder::MyInputBufferHandler(void *inUserData, AudioQueueRef inAQ,
                                      AudioQueueBufferRef inBuffer,
                                      const AudioTimeStamp *inStartTime,
                                      UInt32 inNumPackets,
                                      const AudioStreamPacketDescription *inPacketDesc)
{
    // AudioQueue callback function, called when an input buffer has been filled.
    AQRecorder *aqr = (AQRecorder *)inUserData;
    try {
        if (inNumPackets > 0) {
            /* // write packets to file
            XThrowIfError(AudioFileWritePackets(aqr->mRecordFile, FALSE,
                                                inBuffer->mAudioDataByteSize,
                                                inPacketDesc, aqr->mRecordPacket,
                                                &inNumPackets, inBuffer->mAudioData),
                          "AudioFileWritePackets failed"); */
            SInt16 sample;
            for (UInt32 sampleIndex = 0; sampleIndex < inNumPackets; ++sampleIndex) {
                // What do I put here to look at one sample at index sampleIndex ??
            }
            aqr->mRecordPacket += inNumPackets;
        }
        // if we're not stopping, re-enqueue the buffer so that it gets filled again
        if (aqr->IsRunning())
            XThrowIfError(AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL),
                          "AudioQueueEnqueueBuffer failed");
    } catch (CAXException e) {
        char buf[256];
        fprintf(stderr, "Error: %s (%s)\n", e.mOperation, e.FormatError(buf));
    }
}
(maybe I shouldn't have deleted so much of the original question... what is the policy?)
Originally I was thinking of using the AurioTouch example, but as was pointed out in a comment, that example does audio pass-through and I only need input. It is also a much more complicated example than SpeakHere.
You would probably want to apply some sort of smoothing to your peak power level, maybe an IIR filter, something like:
x_out = 0.9 * x_old + 0.1 * x_in;
// ...
x_old = x_out;
I haven't used this feature, so I don't know if it would do everything you want. If it doesn't, you can drop a level and use a RemoteIO audio unit, and catch sound as it comes in using the 'input callback' (as opposed to the render callback, which happens when the speakers are hungry for data).
Note that in the input callback you have to create your own buffers; don't think that just because you get a buffer pointer as the last parameter, it points to something valid. It doesn't.
Anyway, you could use some vDSP function to get the magnitude squared for the vector of the entire buffer (1024 floats or whatever your buffer size / stream format is),
and then you could smooth that yourself.
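For instance, a hedged one-liner using the Accelerate framework (assuming "buffer" points to bufferSize mono float samples from your input callback; both names are hypothetical):
#include <Accelerate/Accelerate.h>

float meanSquare = 0.0f;
vDSP_measqv(buffer, 1, &meanSquare, bufferSize); // mean of the squared samples
// then run meanSquare through the IIR smoothing shown above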
This loops through all samples in the buffer.
SInt16 *buffer = (SInt16 *)inBuffer->mAudioData; // assumes 16-bit linear PCM
SInt16 sample;
for (UInt32 sampleIndex = 0; sampleIndex < inNumPackets; ++sampleIndex) {
    sample = buffer[sampleIndex]; // the value of one sample from the buffer
    aqr->AnalyseSample(sample);
}
One tricky part: aqr points to the instance of the recorder. The callback is a static function and can't access the member variables or member functions directly.
In order to count the peaks, I keep track of a longterm average and a shortterm average. If the shortTerm average is a certain factor bigger than the longterm average, there is a peak. When the shortterm average goes down again, the peak has passed.
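A hedged sketch of that idea (hypothetical member names and constants; AnalyseSample is the member function called from the buffer handler above):
// Two exponential moving averages over the per-sample power: the short-term
// one reacts quickly, the long-term one forms a slow baseline. A peak starts
// when the short-term average exceeds the baseline by some factor, and has
// passed when it drops back below a lower factor (the gap gives hysteresis).
void AQRecorder::AnalyseSample(SInt16 sample)
{
    float power = (float)sample * (float)sample;
    mShortAvg = 0.9f * mShortAvg + 0.1f * power;    // fast follower
    mLongAvg  = 0.999f * mLongAvg + 0.001f * power; // slow baseline

    if (!mInPeak && mShortAvg > 2.0f * mLongAvg) {
        mInPeak = true;                             // rising edge: count a peak
        mPeakCount++;
    } else if (mInPeak && mShortAvg < 1.5f * mLongAvg) {
        mInPeak = false;                            // falling edge: peak has passed
    }
}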

Beat Detection on iPhone with wav files and openal

Using this website, I have tried to make a beat detection engine: http://www.gamedev.net/reference/articles/article1952.asp
{
    ALfloat energy = 0;
    ALfloat aEnergy = 0;
    ALint beats = 0;
    bool init = false;
    ALfloat Ei[42];
    ALfloat V = 0;
    ALfloat C = 0;
    ALshort *hold;
    hold = new ALshort[[myDat length]/2];
    [myDat getBytes:hold length:[myDat length]];
    ALuint uiNumSamples;
    uiNumSamples = [myDat length]/4;
    if(alDatal == NULL)
        alDatal = (ALshort *) malloc(uiNumSamples*2);
    if(alDatar == NULL)
        alDatar = (ALshort *) malloc(uiNumSamples*2);
    for (int i = 0; i < uiNumSamples; i++)
    {
        alDatal[i] = hold[i*2];
        alDatar[i] = hold[i*2+1];
    }
    energy = 0;
    for(int start = 0; start<(22050*10); start+=512){
        for(int i = start; i<(start+512); i++){
            energy+= ((alDatal[i]*alDatal[i]) + (alDatal[i]*alDatar[i]));
        }
        aEnergy = 0;
        for(int i = 41; i>=0; i--){
            if(i ==0){
                Ei[0] = energy;
            }
            else {
                Ei[i] = Ei[i-1];
            }
            if(start >= 21504){
                aEnergy+=Ei[i];
            }
        }
        aEnergy = aEnergy/43.f;
        if (start >= 21504) {
            for(int i = 0; i<42; i++){
                V += (Ei[i]-aEnergy);
            }
            V = V/43.f;
            C = (-0.0025714*V)+1.5142857;
            init = true;
            if(energy >(C*aEnergy)) beats++;
        }
    }
}
alDatal and alDatar are of type (ALshort *).
myDat is an NSData that holds the actual audio data of a wav file formatted to 22050 Hz, 16-bit stereo.
This doesn't seem to work correctly. If anyone could help me out, that would be amazing; I've been stuck on this for 3 days.
The desired result is that after the 10 seconds' worth of data has been processed, I should be able to multiply the beat count by 6 and have an estimated beats per minute.
My current result is 389 beats every 10 seconds, i.e. 2334 BPM, for a song I know is right around 120 BPM.
That code really has been smacked about with the ugly stick. If you're going to ask other people to find your bugs for you, it's a good idea to make things presentable first. Strangely enough, this will often help you to find them for yourself too.
So, before I point out some of the more fundamental errors, I have to make a few schoolmarmly suggestions:
Don't sprinkle your code with magic numbers. Is it really that hard to type a few lines like const ALuint SAMPLE_RATE = 22050? Trust me, it makes life a lot easier.
Use variable names that you aren't going to mix up easily. One of your bugs is a substitution of alDatal for alDatar. That probably wouldn't have happened if they were called left and right. Similarly, what is the point of having a meaningful variable name like energy if you're just going to stick it alongside the meaningless but more or less identical aEnergy? Why not something informative like average?
Declare variables close to where you're going to use them and in the appropriate scope. Another of your bugs is that you don't reset your calculated energy sum when you move your averaging window, so the energy will just add up and up. But you don't need the energy outside that loop, and if you declared it inside the problem couldn't happen.
There are some other things I personally find a little irksome, like the random bracing and indentation, and mixing of C and C++ allocations, and odd inconsistent scraps of Hungarian prefixing, but at least some of those may be more a matter of taste so I won't go on.
Anyway, here are some reasons why your code doesn't work:
First up, look at the right hand side of this line:
energy+= ((alDatal[i]*alDatal[i]) + (alDatal[i]*alDatar[i]));
You want the square of each channel value, so it should really say:
energy+= ((alDatal[i]*alDatal[i]) + (alDatar[i]*alDatar[i]));
Spot the difference? Not easy with those names, is it?
Second, you should be computing the total energy over each window of samples, but you're only setting energy = 0 outside the outer loop. So the sum accumulates, and consequently the current window energy will always be the biggest you've ever encountered.
Third, your variance calculation is wrong. You have:
V += (Ei[i]-aEnergy);
But it should be the sum of the squares of the differences from the mean:
V += (Ei[i] - aEnergy) * (Ei[i] - aEnergy);
There may well be other errors as well. For instance, you don't allocate the data buffers if they're not NULL, but assume that they're the right length -- which you've only just calculated. You may justify that in terms of some consistent usage you've stuck to throughout your code, but from the perspective of what we can see here it looks like a pretty bad idea.
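For reference, a hedged sketch of the analysis loop with those three fixes applied (keeping the question's names; the Ei[] shifting and aEnergy averaging are unchanged and elided):
for (int start = 0; start < (22050*10); start += 512) {
    ALfloat energy = 0; // reset for every 512-sample window
    for (int i = start; i < (start+512); i++) {
        energy += (alDatal[i]*alDatal[i]) + (alDatar[i]*alDatar[i]); // square each channel
    }
    // ... shift Ei[] and compute aEnergy exactly as before ...
    if (start >= 21504) {
        ALfloat V = 0; // variance: mean of squared differences from the mean
        for (int i = 0; i < 42; i++) {
            ALfloat d = Ei[i] - aEnergy;
            V += d * d;
        }
        V = V/43.f;
        C = (-0.0025714*V) + 1.5142857;
        if (energy > (C*aEnergy)) beats++;
    }
}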

EXC_BAD_ACCESS when calling avcodec_encode_video

I have an Objective-C class (although I don't believe this is anything Obj-C specific) that I am using to write a video out to disk from a series of CGImages. (The code I am using at the top to get the pixel data comes right from Apple: http://developer.apple.com/mac/library/qa/qa2007/qa1509.html). I successfully create the codec and context - everything is going fine until it gets to avcodec_encode_video, when I get EXC_BAD_ACCESS. I think this should be a simple fix, but I just can't figure out where I am going wrong.
I took out some error checking for succinctness. 'c' is an AVCodecContext*, which is created successfully.
- (void)addFrame:(CGImageRef)img
{
    CFDataRef bitmapData = CGDataProviderCopyData(CGImageGetDataProvider(img));
    long dataLength = CFDataGetLength(bitmapData);
    uint8_t *picture_buff = (uint8_t *)malloc(dataLength);
    CFDataGetBytes(bitmapData, CFRangeMake(0, dataLength), picture_buff);
    AVFrame *picture = avcodec_alloc_frame();
    avpicture_fill((AVPicture *)picture, picture_buff, c->pix_fmt, c->width, c->height);
    int outbuf_size = avpicture_get_size(c->pix_fmt, c->width, c->height);
    uint8_t *outbuf = (uint8_t *)av_malloc(outbuf_size);
    out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture); // ERROR occurs here
    printf("encoding frame %3d (size=%5d)\n", i, out_size);
    fwrite(outbuf, 1, out_size, f);
    CFRelease(bitmapData);
    free(picture_buff);
    free(outbuf);
    av_free(picture);
    i++;
}
I have stepped through it dozens of times. Here are some numbers...
dataLength = 408960
picture_buff = 0x5c85000
picture->data[0] = 0x5c85000 -- which I take to mean that avpicture_fill worked...
outbuf_size = 408960
and then I get EXC_BAD_ACCESS at avcodec_encode_video. Not sure if it's relevant, but most of this code comes from api-example.c. I am using Xcode, compiling for armv6/armv7 on Snow Leopard.
Thanks so much in advance for help!
I don't have enough information here to point to the exact error, but I think the problem is that the input picture contains less data than avcodec_encode_video() expects:
avpicture_fill() only sets some pointers and numeric values in the AVFrame structure. It does not copy anything, and does not check whether the buffer is large enough (and it cannot, since the buffer size is not passed to it). It does something like this (copied from ffmpeg source):
size = picture->linesize[0] * height;        // size of the luma plane
picture->data[0] = ptr;
picture->data[1] = picture->data[0] + size;
picture->data[2] = picture->data[1] + size2; // size2: size of one chroma plane
picture->data[3] = picture->data[1] + size2 + size2;
Note that the width and height are passed from the variable "c" (the AVCodecContext, I assume), so they may be larger than the actual size of the input frame.
It is also possible that the width/height is good, but the pixel format of the input frame is different from what is passed to avpicture_fill(). (note that the pixel format also comes from the AVCodecContext, which may differ from the input). For example, if c->pix_fmt is RGBA and the input buffer is in YUV420 format (or, more likely for iPhone, a biplanar YCbCr), then the size of the input buffer is width*height*1.5, but avpicture_fill() expects the size of width*height*4.
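A quick, hedged sanity check you could drop into addFrame (using only names that already appear in the question):
// Does the bitmap actually hold as many bytes as the codec context implies?
int expected = avpicture_get_size(c->pix_fmt, c->width, c->height);
if (dataLength < expected) {
    fprintf(stderr, "frame too small: have %ld bytes, codec expects %d\n",
            (long)dataLength, expected);
}
If that fires, the EXC_BAD_ACCESS is almost certainly the encoder reading past the end of picture_buff.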
So checking the input/output geometry and pixel formats should lead you to the cause of the error. If that does not help, I suggest you try to compile for i386 first. It is tricky to compile FFmpeg for the iPhone properly.
Does the codec you are encoding support the RGB color space? You may need to use libswscale to convert to I420 before encoding. What codec are you using? Can you post the code where you initialize your codec context?
The function RGBtoYUV420P may help you.
http://www.mail-archive.com/libav-user#mplayerhq.hu/msg03956.html
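If your FFmpeg build includes libswscale, here is a hedged sketch of that conversion (assuming the CGImage data is RGBA, and the old PIX_FMT_* constants that match the avcodec_encode_video era):
#include <libswscale/swscale.h>

// One-time setup: RGBA in, YUV420P out, same dimensions
struct SwsContext *sws = sws_getContext(c->width, c->height, PIX_FMT_RGBA,
                                        c->width, c->height, PIX_FMT_YUV420P,
                                        SWS_BICUBIC, NULL, NULL, NULL);

// Per frame: describe the RGBA bitmap, then convert into an AVFrame
// whose buffers were allocated for YUV420P
const uint8_t *srcData[4]   = { picture_buff, NULL, NULL, NULL };
int            srcStride[4] = { c->width * 4, 0, 0, 0 };
sws_scale(sws, srcData, srcStride, 0, c->height,
          picture->data, picture->linesize);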