What would be the best way to get Hz frequency value from audio stream(music) on iOS? What are the best and easiest frameworks provided by Apple to do that. Thanks in advance.
Here is some code I use to perform FFT in iOS using Accelerate Framework, which makes it quite fast.
//keep all internal stuff inside this struct
typedef struct FFTHelperRef {
FFTSetup fftSetup; // Accelerate opaque type that contains setup information for a given FFT transform.
COMPLEX_SPLIT complexA; // Accelerate type for complex number
Float32 *outFFTData; // Your fft output data
Float32 *invertedCheckData; // This thing is to verify correctness of output. Compare it with input.
} FFTHelperRef;
//first - initialize your FFTHelperRef with this function.
FFTHelperRef * FFTHelperCreate(long numberOfSamples) {
FFTHelperRef *helperRef = (FFTHelperRef*) malloc(sizeof(FFTHelperRef));
vDSP_Length log2n = log2f(numberOfSamples);
helperRef->fftSetup = vDSP_create_fftsetup(log2n, FFT_RADIX2);
int nOver2 = numberOfSamples/2;
helperRef->complexA.realp = (Float32*) malloc(nOver2*sizeof(Float32) );
helperRef->complexA.imagp = (Float32*) malloc(nOver2*sizeof(Float32) );
helperRef->outFFTData = (Float32 *) malloc(nOver2*sizeof(Float32) );
memset(helperRef->outFFTData, 0, nOver2*sizeof(Float32) );
helperRef->invertedCheckData = (Float32*) malloc(numberOfSamples*sizeof(Float32) );
return helperRef;
}
//pass initialized FFTHelperRef, data and data size here. Return FFT data with numSamples/2 size.
Float32 * computeFFT(FFTHelperRef *fftHelperRef, Float32 *timeDomainData, long numSamples) {
vDSP_Length log2n = log2f(numSamples);
Float32 mFFTNormFactor = 1.0/(2*numSamples);
//Convert float array of reals samples to COMPLEX_SPLIT array A
vDSP_ctoz((COMPLEX*)timeDomainData, 2, &(fftHelperRef->complexA), 1, numSamples/2);
//Perform FFT using fftSetup and A
//Results are returned in A
vDSP_fft_zrip(fftHelperRef->fftSetup, &(fftHelperRef->complexA), 1, log2n, FFT_FORWARD);
//scale fft
vDSP_vsmul(fftHelperRef->complexA.realp, 1, &mFFTNormFactor, fftHelperRef->complexA.realp, 1, numSamples/2);
vDSP_vsmul(fftHelperRef->complexA.imagp, 1, &mFFTNormFactor, fftHelperRef->complexA.imagp, 1, numSamples/2);
vDSP_zvmags(&(fftHelperRef->complexA), 1, fftHelperRef->outFFTData, 1, numSamples/2);
//to check everything =============================
vDSP_fft_zrip(fftHelperRef->fftSetup, &(fftHelperRef->complexA), 1, log2n, FFT_INVERSE);
vDSP_ztoc( &(fftHelperRef->complexA), 1, (COMPLEX *) fftHelperRef->invertedCheckData , 2, numSamples/2);
//=================================================
return fftHelperRef->outFFTData;
}
Use it like this:
Initialize it: FFTHelperCreate(TimeDomainDataLenght);
Pass Float32 time domain data, get frequency domain data on return: Float32 *fftData = computeFFT(fftHelper, buffer, frameSize);
Now you have an array where indexes=frequencies, values=magnitude (squared magnitudes?).
According to Nyquist theorem your maximum possible frequency in that array is half of your sample rate. That is if your sample rate = 44100, maximum frequency you can encode is 22050 Hz.
So go find that Nyquist max frequency for your sample rate: const Float32 NyquistMaxFreq = SAMPLE_RATE/2.0;
Finding Hz is easy: Float32 hz = ((Float32)someIndex / (Float32)fftDataSize) * NyquistMaxFreq;
(fftDataSize = frameSize/2.0)
This works for me. If I generate specific frequency in Audacity and play it - this code detects the right one (the strongest one, you also need to find max in fftData to do this).
(there's still a little mismatch in about 1-2%. not sure why this happens. If someone can explain me why - that would be much appreciated.)
EDIT:
That mismatch happens because pieces I use to FFT are too small. Using larger chunks of time domain data (16384 frames) solves the problem.
This questions explains it:
Unable to get correct frequency value on iphone
EDIT:
Here is the example project: https://github.com/krafter/DetectingAudioFrequency
Questions like this are asked a lot here on SO. (I've answered a similar one here) so I wrote a little tutorial with code that you can use even in commercial and closed source apps. This is not necessarily the BEST way, but it's a way that many people understand. You will have to modify it based on what you mean by "Hz average value of every short music segment". Do you mean the fundamental pitch or the frequency centroid, for example.
You might want to use Apple's FFT in the accelerate framework as suggested by another answer.
Hope it helps.
http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
Apple does not provide a framework for frequency or pitch estimation. However, the iOS Accelerate framework does include routines for FFT and autocorrelation which can be used as components of more sophisticated frequency and pitch recognition or estimation algorithms.
There is no way that is both easy and best, except possibly for a single long continuous constant frequency pure sinusoidal tone in almost zero noise, where an interpolated magnitude peak of a long windowed FFT might be suitable. For voice and music, that simple method will very often not work at all. But a search for pitch detection or estimation methods will turn up lots of research papers on more suitable algorithms.
Related
I was doing some work using the CoreBluetooth API and ran into a problem. All of the places I have looked, they say that to convert RSSI (Signal Strength of Bluetooth), you must do things like:
Distance = 10 ^ ((Measured Power – RSSI)/(10 * N))
And:
var txPower = -59 //hard coded power value. Usually ranges between -59 to -65
if (rssi == 0) {
return -1.0;
}
var ratio = rssi*1.0/txPower;
if (ratio < 1.0) {
return Math.pow(ratio,10);
}
else {
var distance = (0.89976)*Math.pow(ratio,7.7095) + 0.111;
return distance;
}
I have tried all of the above and everything I could find. None of it gets me the accurate measurements from about 0.5 meters to around 5 - 7 meters of distance between.
My code to do so is making both phones using the app as a central and peripheral Bluetooth and in my didDiscoverPeripheral callback from CentralManager, I get the RSSI - which I want to convert to a distance (meter, feet).
Along with that:
I also need to find out how to get the Measured Power (RSSI Strength at 1 meter) of iPhones as it would really help in the accurate calculations.
Also, what does environmental factor mean in terms of Bluetooth? What do the different environmental factors mean (which have the range of 2-4). Is there a way to change or increase the Broadcasting Power value of the Apple Device?
Basically, I am looking for an accurate RSSI to distance formula which works from distances from 0.5 meter to 5-7 meters
Thank you so much!
This is what is a common solution:
pow(10, ((-56-Double(rssi))/(10*2)))*3.2808
It was good for most distances but got very inaccurate as you get close or too far, so I ended up using bins kind of like Apple's iBeacons (Unknown, Far, Near, Immediate). If the raw RSSI is less than -80, then it is far, if it is more than -50, then it is immediate, and if it is between those two, it is near. This solution worked for me.
I am developing an application for the iPhone using opencv. I have to use the method solvePnPRansac:
http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html
For this method I need to provide a camera matrix:
__ __
| fx 0 cx |
| 0 fy cy |
|_0 0 1 _|
where cx and cy represent the center pixel positions of the image and fx and fy represent focal lengths, but that is all the documentation says. I am unsure what to provide for these focal lengths. The iPhone 5 has a focal length of 4.1 mm, but I do not think that this value is usable as is.
I checked another website:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
which shows how opencv creates camera matrices. Here it states that focal lengths are measured in pixel units.
I checked another website:
http://www.velocityreviews.com/forums/t500283-focal-length-in-pixels.html
(about half way down)
it says that focal length can be converted from units of millimeters to pixels using the equation: fx = fy = focalMM * pixelDensity / 25.4;
Another Link I found states that fx = focalMM * width / (sensorSizeMM);
fy = focalMM * length / (sensorSizeMM);
I am unsure about these equations and how to properly create this matrix.
Any help, advice, or links on how to create an accurate camera matrix (especially for the iPhone 5) would be greatly appreciated,
Isaac
p.s. I think that (fx/fy) or (fy/fx) might be equal to the aspect ratio of the camera, but that might be completely wrong.
UPDATE:
Pixel coordinates to 3D line (opencv)
using this link, I can figure out how they want fx and fy to be formatted because they use it to scale angles relative to their distance from the center. therefore, fx and fy are likely in pixels/(unit length) but im still not sure what this unit length needs to be, can it be arbitrary as long as x and y are scaled to each other?
You can get an initial (rough) estimate of the focal length in pixel dividing the focal length in mm by the width of a pixel of the camera' sensor (CCD, CMOS, whatever).
You get the former from the camera manual, or read it from the EXIF header of an image taken at full resolution. Finding out the latter is a little more complicated: you may look up on the interwebs the sensor's spec sheet, if you know its manufacturer and model number, or you may just divide the overall width of its sensitive area by the number of pixels on the side.
Absent other information, it's usually safe to assume that the pixels are square (i.e. fx == fy), and that the sensor is orthogonal to the lens's focal axis (i.e. that the term in the first row and second column of the camera matrix is zero). Also, the pixel coordinates of the principal point (cx, cy) are usually hard to estimate accurately without a carefully designed calibration rig, and an as-carefully executed calibration procedure (that's because they are intrinsically confused with the camera translation parallel to the image plane). So it's best to just set them equal to the geometrical geometrical center of the image, unless you know that the image has been cropped asymmetrically.
Therefore, your simplest camera model has only one unknown parameter, the focal length f = fx = fy.
Word of advice: in your application is usually more convenient to carry around the horizontal (or vertical) field-of-view angle, rather than the focal length in pixels. This is because the FOV is invariant to image scaling.
The "focal length" you are dealing with here is simply a scaling factor from objects in the world to camera pixels, used in the pinhole camera model (Wikipedia link). That's why its units are pixels/unit length. For a given f, an object of size L at a distance (perpendicular to the camera) z, would be f*L/z pixels.
So, you could estimate the focal length by placing an object of known size at a known distance of your camera and measuring its size in the image. You could aso assume the central point is the center of the image. You should definitely not ignore the lens distortion (dist_coef parameter in solvePnPRansac).
In practice, the best way to obtain the camera matrix and distortion coefficients is to use a camera calibration tool. You can download and use the MRPT camera_calib software from this link, there's also a video tutorial here. If you use matlab, go for the Camera Calibration Toolbox.
Here you have a table with the spec of the cameras for iPhone 4 and 5.
The calculation is:
double f = 4.1;
double resX = (double)(sourceImage.cols);
double resY = (double)(sourceImage.rows);
double sensorSizeX = 4.89;
double sensorSizeY = 3.67;
double fx = f * resX / sensorSizeX;
double fy = f * resY / sensorSizeY;
double cx = resX/2.;
double cy = resY/2.;
Try this:
func getCamMatrix()->(Float, Float, Float, Float)
{
let format:AVCaptureDeviceFormat? = deviceInput?.device.activeFormat
let fDesc:CMFormatDescriptionRef = format!.formatDescription
let dim:CGSize = CMVideoFormatDescriptionGetPresentationDimensions(fDesc, true, true)
// dim = dimensioni immagine finale
let cx:Float = Float(dim.width) / 2.0;
let cy:Float = Float(dim.height) / 2.0;
let HFOV : Float = format!.videoFieldOfView
let VFOV : Float = ((HFOV)/cx)*cy
let fx:Float = abs(Float(dim.width) / (2 * tan(HFOV / 180 * Float(M_PI) / 2)));
let fy:Float = abs(Float(dim.height) / (2 * tan(VFOV / 180 * Float(M_PI) / 2)));
return (fx, fy, cx, cy)
}
Old thread, present problem.
As Milo and Isaac mentioned after Milo's answer, there seems to be no "common" params available for, say, the iPhone 5.
For what it is worth, here is the result of a run with the MRPT calibration tool, with a good old iPhone 5:
[CAMERA_PARAMS]
resolution=[3264 2448]
cx=1668.87585
cy=1226.19712
fx=3288.47697
fy=3078.59787
dist=[-7.416752e-02 1.562157e+00 1.236471e-03 1.237955e-03 -5.378571e+00]
Average err. of reprojection: 1.06726 pixels (OpenCV error=1.06726)
Note that dist means distortion here.
I am conducting experiments on a toy project, with these parameters---kind of ok. If you do use them on your own project, please keep in mind that they may be hardly good enough to get started. The best will be to follow Milo's recommendation with your own data. The MRPT tool is quite easy to use, with the checkerboard they provide. Hope this does help getting started !
My goal is to create a "Chirper" class. A chirper should be able to emit a procedurally generated chirp sound. The specific idea is that the chirp must be procedurally generated, not a prerecorded sound played back.
What is the simplest way to achieve a procedurally generated chirp sound on the iPhone?
You can do it with a sine wave as you said, which you would define using the sin functions. Create a buffer as long as you want the sound in samples, such as:
// 1 second chirp
float samples[44100];
Then pick a start frequency and end frequency, which you probably want the start to be higher than the end, something like:
float startFreq = 1400;
float endFreq = 1100;
float thisFreq;
int x;
for(x = 0; x < 44100; x++)
{
float lerp = float(float(x) / 44100.0);
thisFreq = (lerp * endFreq) + ((1 - lerp) * startFreq);
samples[x] = sin(thisFreq * x);
}
Something like that, anyway.
And if you want a buzz or another sound, use different waveforms - create them to work very similarly to sin and you can use them interchangably. That way you could create saw() sqr() tri(), and you could do things like combine them to form more complex or varied sounds
========================
Edit -
If you want to play you should be able to do something along these lines using OpenAL. The important thing is to use OpenAL or a similar iOS API to play the raw buffer.
alGenBuffers (1, &buffer);
alBufferData (buffer, AL_FORMAT_MONO16, buf, size, 8000);
alGenSources (1, &source);
ALint state;
// attach buffer and play
alSourcei (source, AL_BUFFER, buffer);
alSourcePlay (source);
do
{
wait (200);
alGetSourcei (source, AL_SOURCE_STATE, &state);
}
while ((state == AL_PLAYING) && play);
alSourceStop(source);
alDeleteSources (1, &source);
delete (buf)
}
Using RemoteIO audio unit
Audio Unit Hosting Guide for iOS
You can use Nektarios's code in the render callback for remote I/O unit. Also you can even change waveforms real-time (low latency).
I'm working off of some of Apple's sample code for mixing audio (http://developer.apple.com/library/ios/#samplecode/MixerHost/Introduction/Intro.html) and I'd like to display an animated spectrogram (think itunes spectrogram in the top center that replaces the song title with moving bars). It would need to somehow get data from the audio stream live since the user will be mixing several loops together. I can't seem to find any tutorials online about anything to do with this.
I know I am really late for this question. But I just found some great resource to solve this question.
Solution:
Instantiate an audio unit to record samples from the microphone of the iOS device.
Perform FFT computations with the vDSP functions in Apple’s Accelerate framework.
Draw your results to the screen using a UIImage
Computing the FFT or spectrogram of an audio signal is a fundamental audio signal processing. As an iOS developer, whether you want to simply find the pitch of a sound, create a nice visualisation, or do some front end processing it’s something you’ve likely thought about if you are at all interested in audio. Apple does provide a sample application for illustrating this task (aurioTouch). The audio processing part of that App is obscured, however, imho, by extensive use of Open GL.
The goal of this project was to abstract as much of the audio dsp out of the sample app as possible and to render a visualisation using only the UIKit. The resulting application running in the iOS 6.1 simulator is shown to the left. There are three components. rscodepurple1 A simple ‘exit’ button at the top, a gain slider at the bottom (allowing the user to adjust the scaling between the magnitude spectrum and the brightness of the colour displayed), and a UIImageView displaying power spectrum data in the middle that refreshes every three seconds. Note that the frequencies run from low to the high beginning at the top of the image. So, the top of the display is actually DC while the bottom is the Nyquist rate. The image shows the results of processing some speech.
This particular Spectrogram App records audio samples at a rate of 11,025 Hz in frames that are 256 points long. That’s about 0.0232 seconds per frame. The frames are windowed using a 256 point Hanning window and overlap by 1/2 of a frame.
Let’s examine some of the relevant parts of the code that may cause confusion. If you want to try to build this project yourself you can find the source files in the archive below.
First of all look at the content of the PerformThru method. This is a callback for an audio unit. Here, it’s where we read the audio samples from a buffer into one of the arrays we have declared.
SInt8 *data_ptr = (SInt8 *)(ioData->mBuffers[0].mData);
for (i=0; i<inNumberFrames; i++) {
framea[readlas1] = data_ptr[2];
readlas1 += 1;
if (readlas1 >=33075) {
readlas1 = 0;
dispatch_async(dispatch_get_main_queue(), ^{
[THIS printframemethod];
});
}
data_ptr += 4;
}
Note that framea is a static array of length 33075. The variable readlas1 keeps track of how many samples have been read. When the counter hits 33075 (3 seconds at this sampling frequency) a call to another method printframemethod is triggered and the process restarts.
The spectrogram is calculated in printframemethod.
for (int b = 0; b < 33075; b++) {
originalReal[b]=((float)framea[b]) * (1.0 ); //+ (1.0 * hanningwindow[b]));
}
for (int mm = 0; mm < 250; mm++) {
for (int b = 0; b < 256; b++) {
tempReal[b]=((float)framea[b + (128 * mm)]) * (0.0 + 1.0 * hanningwindow[b]);
}
vDSP_ctoz((COMPLEX *) tempReal, 2, &A, 1, nOver2);
vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_FORWARD);
scale = (float) 1. / 128.;
vDSP_vsmul(A.realp, 1, &scale, A.realp, 1, nOver2);
vDSP_vsmul(A.imagp, 1, &scale, A.imagp, 1, nOver2);
for (int b = 0; b < nOver2; b++) {
B.realp[b] = (32.0 * sqrtf((A.realp[b] * A.realp[b]) + (A.imagp[b] * A.imagp[b])));
}
for (int k = 0; k < 127; k++) {
Bspecgram[mm][k]=gainSlider.value * logf(B.realp[k]);
if (Bspecgram[mm][k]<0) {
Bspecgram[mm][k]=0.0;
}
}
}
Note that in this method we first cast the signed integer samples to floats and store in the array originalReal. Then the FFT of each frame is computed by calling the vDSP functions. The two-dimensional array Bspecgram contains the actual magnitude values of the Short Time Fourier Transform. Look at the code to see how these magnitude values are converted to RGB pixel data.
Things to note:
To get this to build just start a new single-view project and replace the delegate and view controller and add the aurio_helper files. You need to link the Accelerate, AudioToolbox, UIKit, Foundation, and CoreGraphics frameworks to build this. Also, you need PublicUtility. On my system, it is located at /Developer/Extras/CoreAudio/PublicUtility. Where you find it, add that directory to your header search paths.
Get the code:
The delegate, view controller, and helper files are included in this zip archive.
A Spectrogram App for iOS in purple
Apple's aurioTouch example app (on developer.apple.com) has source code for drawing an animated frequency spectrum plot from recorded audio input. You could probably group FFT bins into frequency ranges for a coarser bar graph plot.
I like to update an existing iPhone application which is using AudioQueue for playing audio files. The levels (peakPowerForChannel, averagePowerForChannel) were linear form 0.0f to 1.0f.
Now I like to use the simpler class AVAudioPlayer which works fine, the only issue is that the levels which are now in decibel, not linear from -120.0f to 0.0f.
Has anyone a formula to convert it back to the linear values between 0.0f and 1.0f?
Thanks
Tom
Several Apple examples use the following formula to convert the decibels into a linear range (from 0.0 to 1.0):
double percentage = pow (10, (0.05 * power));
where power is the value you get from one of the various level meter methods or functions, such as AVAudioPlayer's averagePowerForChannel:
Math behind the Linear and Logarithmic value conversion:
1. Linear to Decibel (logarithmic):
decibelValue = 20.0f * log10(linearValue)
Note: log is base 10
Suppose the linear value in the form of percentage range from [ 0 (min vol) to 100 (max vol)] then the decibelValue for half of the volume (50%) is
decibelValue = 20.0f * log10(50.0f/100.0f) = -6 dB
Full volume:
decibelValue = 20.0f * log10(100.0f/100.0f) = 0 dB
Complete mute:
decibelValue = 20.0f * log10(0/100.0f) = -infinity
2. Decibel(logarithmic) to Linear:
LinearValue = pow(10.0f, decibelValue/20.0f)
Apple uses a lookup table in their SpeakHere sample that converts from dB to a linear value displayed on a level meter.
I moulded their calculation in a small routine; see here.