What are linear PCM values? - iPhone

I am working with audio on iPhone OS and am a bit confused.
I am currently getting input from my audio buffer in the form of PCM values ranging from
-32768 to 32767. I am hoping to perform a dB SPL conversion using the formula 20*log10(p/pRef).
I am aware that pRef is 0.00002 pascals, and would like to convert the PCM values to pascals.
My questions are:
a) What do these PCM values represent, exactly?
b) How do I convert these values to pascals?
Thanks so much

You can't do this conversion without additional information. The mapping of PCM values to physical units of pressure (pascals) depends on the volume setting, characteristics of the output device (earbuds? a PA system?), and the position of the observer with respect to the output device (right next to the speaker? 100 meters away?).
To answer the first part of your question: if you were to graph the sound pressure
versus time for, say, a 1 kHz sine wave tone, the linear-quantized PCM values at the
sample times would be roughly proportional to the sound pressure variations from ambient
at that instant. ("Roughly", because input and output devices seldom have absolutely flat
response curves over the entire audio frequency range.)

Let's get some intuition for the question: what do these PCM values (ranging from -32768 to 32767) represent, exactly?

Audio is simply a curve which fluctuates above and below a zero line. If the curve sits at or very near the zero line for a long enough period of time, this maps to silence: neither your speaker surface nor your eardrum does any wobbling. Conversely, if the audio curve swings violently between maximum and minimum values for a stretch of time, you have maximum volume, and hence a greater value in pascals.

Audio in the wild, which your ear hears, is analog. To digitize it, this analog curve must be converted into binary data by sampling the raw audio curve, recording the curve height X times per second. The fundamental digital format for audio is PCM, which simply maps that continuous, unbroken analog curve onto distinct points on a graph. PCM audio still looks like a curve, yet when you zoom in it is just distinct points. Each point has an X and a Y value, where X represents time (running left to right) and Y represents amplitude (up and down). However, only the Y values are stored; the X values are implied, since each successive Y sample is by definition separated in time by an interval determined by the sampling rate. So at a sample rate of 44,100 Hz you get 44,100 values of Y per second of recording (per channel).

The number of measurements per second is called the sample rate (often 44,100 per second). The number of bits used to record the fidelity of Y is called the bit depth. If we devote 3 bits, the universe of possible Y values must fit in one of these rows:
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
So for 3 bits the number of possible values of Y is 2^3, or 8 distinct values, which would sound very distorted, since the quantized curve is far from continuous. That is why CD-quality audio uses two bytes (16 bits) to record the value of the curve height Y, giving it 2^16 = 65,536 distinct values of Y, which equates to the scale you gave us (-32768 to 32767). The original continuous, unbroken analog audio curve is now digitized into 2^16 possible height values from the top to the bottom of the curve, which to the human ear becomes indistinguishable from the source audio. When doing audio calculations in floating point, the Y value is often normalized into, say, a range of -1.0 to +1.0 instead of -32768 to 32767.
So by now it should be clear that the heart of your question concerning pascals (a unit of pressure) is orthogonal to the Y value range (bit depth), and is instead a function of the shape of the audio curve together with the playback chain (notably the sheer area of the speaker surface). For a given choice of frequency, the more closely the audio curve adheres to the canonical sine curve of that frequency while consuming the full range of possible Y values, the greater the amplitude (volume), and thus the pascal value.
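To make the normalization and dB arithmetic above concrete, here is a minimal numpy sketch (the 1 kHz tone and half-scale amplitude are made-up illustration values). Note the result is dBFS, relative to digital full scale; converting to dB SPL in pascals would still require a calibrated playback/recording chain, as discussed above.

```python
import numpy as np

# Hypothetical 16-bit PCM input: a 1 kHz tone at half of full scale.
fs = 44100
t = np.arange(fs) / fs
pcm = (0.5 * 32767 * np.sin(2 * np.pi * 1000 * t)).astype(np.int16)

# Normalize to the -1.0 .. +1.0 range used in floating-point audio work.
x = pcm.astype(np.float64) / 32768.0

# RMS level relative to digital full scale (dBFS) -- NOT dB SPL.
rms = np.sqrt(np.mean(x ** 2))
dbfs = 20 * np.log10(rms)
print(round(dbfs, 1))   # about -9.0 dBFS for a half-scale sine
```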

Your question is really about neither “iphone”, “objective-c” nor “objective-c++”. But it can be answered very simply: http://en.wikipedia.org/wiki/Pulse-code_modulation
Greetings

Related

Why is the number of sample frequencies in `scipy.signal.stft()` tied to the hop size?

This question relates to SciPy's Short-time Fourier Transform function for signal processing.
For some reason I don't understand, the size of the output 'array of sample frequencies' is exactly equal to the hop size. From the documentation:
nperseg : int, optional
Length of each segment. Defaults to 256.
noverlap : int, optional
Number of points to overlap between segments. If None, noverlap = nperseg // 2. Defaults to None. When specified, the COLA constraint must be met (see Notes below).
f : ndarray
Array of sample frequencies.
hop size H = nperseg - noverlap
I'm new to signal processing and Fourier transforms, but as far as I understand a STFT is just chopping an audio file into segments ('time frames') on which you perform a Fourier transform. So if I want to do a STFT on 100 time frames, I'd expect the output to be a matrix of size 100 x F, where F is an array of measured frequencies ('measured' probably isn't the right word here but you know what I mean).
This is kind of what SciPy's implementation does, but the size of f here is what bothers me. It's supposed to be an array describing the different frequencies, like [0 Hz, 500 Hz, 1000 Hz], and it is, but for some reason its size is exactly the same as the hop size. If the hop size is 700, the number of measured frequencies is 700.
The hop size is the number of samples (i.e. time) between each time frame, and is correctly calculated as H = nperseg - noverlap, but what does this have to do with the frequency array?
Edit: Related to this question
An FFT is a square matrix transform from one orthogonal basis to another of the same dimension. This is because N is the exact number of orthogonal (i.e. mutually non-interfering) complex sinusoids that fit in a time-domain vector of length N.
A longer time vector can contain more frequency information (e.g. it's hard to tell 2 frequencies apart using just 3 sample points, but much easier with 3000 samples, etc.)
You can zero-pad your short time vector of length N to use a longer FFT, but that is identical to interpolating a nice curve between N frequency points, which makes all the FFT results interdependent.
For many purposes (visualization, etc.) an STFT is overlapped, where the adjacent segments share some overlapped data instead of just being end-to-end. This gives better time locality (e.g. the segments can be spaced closer but still be long enough so that each one can provide the frequency resolution required).
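As a concrete check of the shapes involved (a small sketch assuming scipy and numpy are installed; the noise signal and segment sizes are arbitrary): for real input, scipy.signal.stft returns nperseg // 2 + 1 frequency bins, which happens to look "tied to the hop size" whenever noverlap takes its default value of nperseg // 2.

```python
import numpy as np
from scipy import signal

fs = 22050
x = np.random.randn(fs)              # one second of arbitrary noise

nperseg, noverlap = 1400, 700        # hop size H = nperseg - noverlap = 700
f, t, Zxx = signal.stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)

# The frequency axis has nperseg // 2 + 1 bins (one-sided FFT of real
# input), not "hop size" bins; here that is 701, one more than the hop.
print(len(f), nperseg // 2 + 1)
# Zxx is the time-frequency matrix: len(f) rows, one column per frame.
print(Zxx.shape[0] == len(f))
```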

How to generate an accurate FFT plot of guitar harmonics with only 256 data points at 44.1 kHz Fs? [Matlab]

I'm trying to make a realtime(ish) monophonic guitar-to-MIDI program. I want a latency of <= 6 milliseconds. To find which note was played, I aim to sample 256 points (which should take approx. 6 ms), run an FFT, and analyze the magnitude plot to determine the pitch of the note played.
When I do this in Matlab, it gives me back very unstable/inaccurate results, with peaks appearing in random places, etc.
The note being input is 110 Hz, sampled at 44.1 kHz. I've applied a high-pass filter at 500 Hz with a roll-off of 48 dB/octave, so only the higher harmonics of the signal should remain. The audio lasts for 1 second (filled with zeros after 256 samples).
Code:
% FFT work
guitar = wavread('C:\Users\Donnacha\Desktop\Astring110hz.wav');
guitar = guitar(1:44100);   % keep the first second of audio
X = fft(guitar);
Xmag = abs(X);              % magnitude spectrum
plot(Xmag);
Zoomed in FFT plot
I was hoping to see all the harmonics of 110 Hz (the A note on guitar) starting at >500 Hz.
How would I achieve accurate results from an FFT with so little data?
You can't (at least not reliably for all notes in a guitar's range).
256 samples at 44.1kHz is less than one period of most low string guitar notes. One period of vibration from a guitar's open low E string takes around 535 samples, depending on the guitar's tuning and intonation.
Harmonics often require multiple periods (repetitions) of a guitar note waveform within the FFT window in order to reliably show up in the FFT's spectrum. The more periods within the FFT window, the more reliably and sharply the harmonics show up in the FFT spectrum. Even more periods are required if the data is von Hann (et al.) windowed to avoid "leakage" windowing artifacts. So you have to pick the minimum number of periods needed based on the lowest note required, your window type, and your statistical reliability and frequency resolution requirements.
An alternative is to concatenate several sets of your 256 samples into a longer window, at least as long as several periods of the lowest pitch you want to reliably plot.
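The resolution trade-off is easy to see numerically. This is a hypothetical numpy sketch (not the asker's Matlab setup): with a 256-point FFT at 44.1 kHz the bin spacing is about 172 Hz, so a 110 Hz fundamental cannot even be placed in its own bin, while a few thousand samples pin it down to within a few Hz.

```python
import numpy as np

fs = 44100
f0 = 110.0                       # open A string, as in the question

def peak_freq(n_samples, n_fft):
    """Frequency of the strongest FFT bin for a pure f0 sine."""
    t = np.arange(n_samples) / fs
    x = np.sin(2 * np.pi * f0 * t)
    mag = np.abs(np.fft.rfft(x, n=n_fft))
    return np.argmax(mag) * fs / n_fft

print(fs / 256)                  # bin spacing with 256 points: ~172 Hz
print(peak_freq(256, 256))       # unreliable: 110 Hz is well below one bin
print(peak_freq(8192, 8192))     # lands within a few Hz of 110
```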

What's the frequency range for a FFT?

So, as far as I have read, for a signal with data points 0...n, I get a result from 0 to n, but I can omit n/2...n. Correct? So now I have n/2 data points. What is the relation between the frequency range of these data points and the signal data? E.g. what frequency does n/2 correspond to (0 is 0 Hz, I guess)?
An FFT by itself has no frequency range. It could be anything.
The frequency range of an FFT result depends on the sample rate frequency at which the input data points were evenly sampled. The FFT results are then data points in the frequency domain spaced at the sample rate frequency divided by the FFT length, from 0 or DC up to half the sample rate.
The output of the generic FFT normally used in programming covers 0-22.05 kHz for a 44.1 kHz sample rate and 0-24 kHz for a 48 kHz input.
Actually, the raw output of the FFT is reflected around zero, so -22.05 to +22.05 kHz, and most programs use only the 0 to +22.05 kHz half... https://i.stack.imgur.com/GeaJq.png
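A quick numpy sketch of the bin-to-frequency mapping described above (the FFT length of 1024 is arbitrary): bin k corresponds to k * fs / n, and the upper half of the raw output holds the mirrored negative frequencies.

```python
import numpy as np

fs = 44100
n = 1024
freqs = np.fft.fftfreq(n, d=1.0 / fs)

# Bins 0 .. n/2-1 run from 0 (DC) up toward +fs/2;
# bins n/2 .. n-1 hold the mirrored negative frequencies.
print(freqs[0])          # 0.0 (DC)
print(freqs[1])          # bin spacing fs / n, about 43.07 Hz here
print(freqs[n // 2 - 1]) # just under +fs/2
print(freqs[n // 2])     # -fs/2 = -22050.0 for even n
```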

Finding Energy in frequency bands of an audio file vector

I have an audio file which I imported into my Matlab workspace and have it as a vector now.
I have broken the vector into windows of 100 ms long.
window_length = fs*0.1;
How can I find the energy in certain frequency bands, e.g. 0-1000 Hz, 1000-2000 Hz, etc.?
I've tried to use the filter below:
% Create band-pass filter: passband 500-1000 Hz, stopband edges at 450 and 1050 Hz
% (ellipord takes the passband first, then the stopband)
[N,Wc] = ellipord([500 1000]/(fs/2), [450 1050]/(fs/2), 1, 40);
[b,a] = ellip(N, 1, 40, Wc);            % ellip returns numerator b, then denominator a
window_filtered = filter(b, a, window);
% Find filtered energy (sum of squared samples)
Energy_band_X_X(position) = sum(window_filtered.^2);
However my results are too large and don't make any sense.
Thanks!
I recommend using a simple FFT to find the entire frequency spectrum and then summing the energy in the band of your interest. You should also normalize your input data; for example, you can divide the data by its maximum value to bring it between 0 and 1. If you are dealing with 16-bit or 8-bit integer-valued audio samples, then your energy values are going to be very large.
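Here is a minimal numpy sketch of that FFT approach (the sample rate, tone frequencies, and 100 ms window are made up for illustration; the asker's code is Matlab). When the one-sided spectrum is scaled per Parseval's theorem, the band energies add up to the time-domain energy sum(window**2).

```python
import numpy as np

fs = 8000
t = np.arange(int(fs * 0.1)) / fs      # one 100 ms window (800 samples)
window = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 1500 * t)

spectrum = np.fft.rfft(window)
freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)

def band_energy(lo, hi):
    """Energy in [lo, hi) Hz, scaled (Parseval) so that all bands
    together sum to the time-domain energy sum(window**2)."""
    band = (freqs >= lo) & (freqs < hi)
    mag2 = np.abs(spectrum[band]) ** 2
    # one-sided spectrum: double every bin except DC and Nyquist
    scale = np.where((freqs[band] == 0) | (freqs[band] == fs / 2), 1.0, 2.0)
    return np.sum(scale * mag2) / len(window)

print(band_energy(0, 1000))       # ~400: the strong 500 Hz tone
print(band_energy(1000, 2000))    # ~4: the weak 1500 Hz tone
```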

iPhone - how to measure amplitude of a PCM coded signal

here's my problem at hand:
I need to analyze audio data in realtime to find out the amplitude of the signal
I can't use the AudioQueue metering functionality because it has too much delay for detecting peaks
I have studied the aurioTouch example..... however...
I simply don't understand how the PCM-coded buffer is converted back to the waveform displayed in the oscilloscope view. It is exactly this waveform I want to analyze for amplitude.
In my callback, when I analyze the buffer it only has 0 and -1 in it - when using NSLog(@"%d"). This makes sense, I guess, because it is PCM.
I can't find the place in aurioTouch where the magic of transforming the 0 / -1 stream into a waveform happens :-((
Also once I have the waveform in memory how do I analyze the amplitude and how to convert this reading into dB?
I don't want to use FFT because I am not interested in the frequency and I hope there are other ways.
Cheers
Mat
Once you have a chunk of the waveform in memory, it's fairly easy to calculate magnitude values in dB (although you'll need to decide what your reference magnitude for 0 dB is). Typically, if you want the kind of short-term magnitude that you might see displayed on a VU meter, you need to rectify the instantaneous values of the waveform (you can use abs for this) and then pass these rectified values through a simple low-pass filter with a time constant on the order of, say, 100 ms. To convert the values to dB you do this:
amplitude_dB = 20 * log10(amplitude) + calibration_dB;
where amplitude is the rectified and filtered magnitude, and calibration_dB is an offset to give you the correct amplitude for 0 dB, whatever that might be in your particular application (e.g. dB re full scale, or a calibrated dBV or dBm value).
A simple but effective low pass filter can be implemented as follows. This will be a single pole IIR (recursive) filter. Each output is dependent on the previous output value and the current input value. We have a constant factor, alpha, which effectively determines the time constant or cut-off frequency of this low pass filter.
y = alpha * x + (1.0 - alpha) * y_old;
y_old = y;
x = current input value
y = new output value
y_old = previous output value
alpha = constant which determines response of filter - a small positive number - try 0.001 to start off with and experiment
AurioTouch is the right example to look at. Unfortunately the code is just terrible. Make sure that you only use it to get to know the concepts behind working with the RemoteIO audio unit.
The actual drawing of the waveform happens right in the renderProc callback, which Core Audio calls when PCM data becomes available. Look at PerformThru() in aurioTouchAppDelegate.mm:197... further down, at line 237:
SInt8 *data_ptr = (SInt8 *)(ioData->mBuffers[0].mData);
... that's where the actual PCM data is accessed. This is the data you would need to analyze in order to get peak/average power of the signal.