iPhone - how to measure amplitude of a PCM coded signal

here's my problem at hand:
I need to analyze audio data in realtime to find out the amplitude of the signal
I can't use the AudioQueue metering functionality because it has too much delay for detecting peaks
I have studied the aurioTouch example..... however...
I simply don't understand how the PCM coded buffer is converted back to the waveform displayed in the oscilloscope view. It is exactly this waveform that I want to analyze for amplitude.
In my callback, when I analyze the buffer, it only has 0 and -1 in it when printed with NSLog(@"%d"). This makes sense I guess because it is PCM.
I can't find the place in aurioTouch where the magic of transforming the 0 / -1 stream into a waveform happens :-((
Also once I have the waveform in memory how do I analyze the amplitude and how to convert this reading into dB?
I don't want to use FFT because I am not interested in the frequency and I hope there are other ways.
Cheers
Mat

Once you have a chunk of the waveform in memory then it's fairly easy to calculate magnitude values in dB (although you'll need to decide what your reference magnitude for 0 dB is). Typically if you want the kind of short term magnitude that you might see displayed on a VU meter then you need to rectify the instantaneous values of the waveform (you can use abs for this) and then pass these rectified values through a simple low pass filter with a time constant of the order of, say, 100 ms. To convert the values to dB you'll do this:
amplitude_dB = 20 * log10(amplitude) + calibration_dB;
where amplitude is the rectified and filtered magnitude, and calibration_dB is an offset to give you the correct amplitude for 0 dB, whatever that might be in your particular application (e.g. dB re full scale, or a calibrated dBV or dBm value).
A simple but effective low pass filter can be implemented as follows. This will be a single pole IIR (recursive) filter. Each output is dependent on the previous output value and the current input value. We have a constant factor, alpha, which effectively determines the time constant or cut-off frequency of this low pass filter.
y = alpha * x + (1.0 - alpha) * y_old;
y_old = y;
x = current input value
y = new output value
y_old = previous output value
alpha = constant which determines response of filter - a small positive number - try 0.001 to start off with and experiment
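As a rough illustration, here is a minimal Python sketch of the rectify, smooth and convert-to-dB chain described above. The function names, the 1e-9 floor that avoids log10(0), and the 44.1 kHz sample rate in the demo are assumptions made for the example, not anything taken from aurioTouch. Samples are assumed to be scaled so that full scale is 1.0, which makes the result dB re full scale when calibration_db is 0.

```python
import math

def alpha_for_time_constant(tau_seconds, sample_rate):
    """Smoothing factor of a one-pole low pass with the given time constant."""
    return 1.0 - math.exp(-1.0 / (tau_seconds * sample_rate))

def envelope_db(samples, alpha, calibration_db=0.0, floor=1e-9):
    """Rectify, low-pass filter and convert every sample to dB."""
    y_old = 0.0
    levels = []
    for x in samples:
        # Single pole IIR filter applied to the rectified input.
        y = alpha * abs(x) + (1.0 - alpha) * y_old
        y_old = y
        # The floor keeps log10 away from zero during silence.
        levels.append(20.0 * math.log10(max(y, floor)) + calibration_db)
    return levels

sample_rate = 44100  # assumed; use whatever your audio session delivers
alpha = alpha_for_time_constant(0.1, sample_rate)  # roughly 100 ms
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate)
        for n in range(sample_rate)]  # one second of a full-scale 1 kHz sine
levels = envelope_db(tone, alpha)
final_level = levels[-1]
```

A full-scale sine settles at about -3.9 dB here rather than 0 dB, because the mean of a rectified sine is 2/pi of its peak. That is the kind of offset calibration_dB is meant to absorb.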

AurioTouch is the right example to look at. Unfortunately the code is just terrible. Make sure that you only use it to get to know the concepts behind working with the RemoteIO audio unit.
The actual drawing of the waveform is happening right in the renderProc callback which is called by CoreAudio when PCM data becomes available. Look at PerformThru() in aurioTouchAppDelegate.mm:197... further down, in line 237
SInt8 *data_ptr = (SInt8 *)(ioData->mBuffers[0].mData);
... that's where the actual PCM data is accessed. This is the data you would need to analyze in order to get peak/average power of the signal.
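To give an idea of the analysis step, here is a small Python sketch that turns a raw buffer into peak and average (RMS) levels. It assumes 16-bit signed little-endian samples, which may not match the stream format your audio unit is actually configured with, so check that first. The function names are made up for the example.

```python
import math
import struct

def pcm16_to_floats(raw):
    """Decode 16-bit signed little-endian PCM bytes to floats in [-1, 1)."""
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw[:count * 2])
    return [s / 32768.0 for s in ints]

def peak_and_rms_dbfs(samples, floor=1e-9):
    """Return (peak, rms) of the buffer in dB re full scale."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    def to_db(value):
        return 20.0 * math.log10(max(value, floor))
    return to_db(peak), to_db(rms)

# Tiny hand-made buffer standing in for the callback's data.
raw = struct.pack("<4h", 0, 16384, -16384, 0)
samples = pcm16_to_floats(raw)
peak_db, rms_db = peak_and_rms_dbfs(samples)
```

Note that the samples have to be read at their full width. Looking at a multi-byte sample one byte at a time will not give you the waveform.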

Related

How to recreate an instrument sound from a .WAV file by using FFT and findpeaks() in MATLAB?

I want to generate my own samples of a Kick, Clap, Snare and Hi-Hat sounds in MATLAB based on a sample I have in .WAV format.
Right now it does not sound at all correct, and I was wondering if my code does not make sense? Or if it is that I am missing some sound theory.
Here is my code right now.
[y,fs]=audioread('cp01.wav');
Length_audio=length(y);
df=fs/Length_audio;
frequency_audio=-fs/2:df:fs/2-df;
frequency_audio = frequency_audio/(fs/2); % Normalize the frequency
figure
FFT_audio_in=fftshift(fft(y))/length(fft(y));
plot(frequency_audio,abs(FFT_audio_in));
The original plot of y.
My FFT of y
I am using the findpeaks() function to find the peaks of the FFT with amplitude greater than 0.001.
[pk, loc] = findpeaks(abs(FFT_audio_in), 'MinPeakHeight', 0.001);
I then find the corresponding normalized frequencies from the frequency audio (positive ones) and the corresponding peak.
loc = frequency_audio(loc);
loc = loc(length(loc)/2+1:length(loc))
pk = pk(length(pk)/2+1:length(pk))
So the one sided, normalized FFT looks like this.
Since it looks like the FFT, I think I should be able to recreate the sound by summing up sinusoids with the correct amplitude and frequency. Since the clap sound had 21166 data points I use this for the for loop.
for i=1:21116
clap(i) = 0;
for j = 1:length(loc);
clap(i) = clap(i) + pk(j)*sin(loc(j)*i);
end
end
But this results in the following sound, which is nowhere near the original sound.
What should I do differently?
You are taking the FFT of the entire time-period of the sample, and then generating stationary sinewaves for the whole duration. This means that the temporal signature of the drum is gone. And the temporal signature is the most characteristic of percussive unvoiced instruments.
Since this is so critical, I suggest you start there first instead of with the frequency content.
The temporal signature can be approximated by the envelope of the signal. MATLAB has a convenient function for this called envelope. Use that to extract the envelope of your sample.
Then generate some white-noise and multiply the noise by the envelope to re-create a very simple version of your percussion instrument. You should hear a clear difference between Kick, Clap, Snare and Hi-Hat, though it won't sound the same as the original.
Once this is working, you can attempt to incorporate frequency information. I recommend taking the STFT to get a spectrogram of the sound, so you can see how the frequency spectrum changes over time.
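The envelope-times-noise idea can be sketched in a few lines of Python. Note that this swaps in a plain rectify-and-smooth envelope follower rather than whatever MATLAB's envelope does internally, and the synthetic decaying tone only stands in for the real sample. The names, the smoothing factor and the 8 kHz rate are all assumptions for the example.

```python
import math
import random

def rectified_envelope(samples, alpha):
    """Approximate the envelope by rectifying and one-pole smoothing."""
    y = 0.0
    env = []
    for x in samples:
        y = alpha * abs(x) + (1.0 - alpha) * y
        env.append(y)
    return env

def noise_drum(env, seed=0):
    """Shape uniform white noise with the given envelope."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    return [e * rng.uniform(-1.0, 1.0) for e in env]

fs = 8000  # assumed sample rate of the stand-in signal
# Stand-in for the recorded hit: a 200 Hz tone that decays quickly.
hit = [math.exp(-n / 2000.0) * math.sin(2 * math.pi * 200 * n / fs)
       for n in range(fs)]
env = rectified_envelope(hit, alpha=0.01)
drum = noise_drum(env)
```

The output has the same length and the same rise and decay as the original hit, but none of its frequency content. That is why it should be recognisable as a percussive sound without sounding like the original.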

Creating a sinusoidal wave in matlab

I want to create a sinusoidal wave that has the following properties :
a sine wave with f=400Hz amp=1 from 0 to 2s
a sine wave with f=200Hz amp=1 from 2 to 3s
a sine wave with f=800Hz amp=2 from 3 to 5s
Here is my matlab Code :
t=linspace(0,5,5000);
x=zeros(1,length(t));
n1=0:1999;
n2=2000:2999;
n3=3000:4999;
x(1:2000)=1*sin(2*pi*400*n1);
x(2001:3000)=1*sin(2*pi*200*n2);
x(3001:5000)=2*sin(2*pi*800*n3);
plot(t,x)
and here is the plot that I had, still it looks not logical at all,
So I would like to know the error in my code
In this type of problem, where you're naturally looking at physical quantities, it's very helpful to be consistent with this all the way through your calculations.
Specifically, you specify Hz (1/seconds), a physical unit, so when you calculate everything else, you need to be consistent with that.
To do this in your equation, it's most straightforward to put time directly in the sin function, like sin(2*pi*f*t). But since you want to break the array apart using different n, it's probably easiest to do that and then use t=linspace(0,5,50000) and dt = 5.0/50000 or dt = t(2) - t(1), and sin(2*pi*400*dt*n1). Read this as dt*n1 converts the integers in n1 to time in seconds.
Note the physical units too: 400 in above is actually 400Hz, and the time is in seconds, so the units of 2*pi*400*dt*n1 and 2*pi*f*t are Hz * s = 1, that is, the units cancel, which is what you need.
There is a tendency for programmers to want to define away some unit, like say seconds=1. This is possible and technically correct and can save a multiplication or two. It almost always leads to errors.
Note also that you should change from t=linspace(0,5,5000) to something like t=linspace(0,5,50000). The reason should now be clear: you're looking at frequencies from 200-800Hz, or almost 1kHz, or 1 oscillation per millisecond. To see a sine wave, you'll need to get in a few data points per oscillation, and 50000 points in 5 seconds will now give about 10 points per millisecond, which is barely enough to see a reasonable sine wave. Or, however you want to think of the calculation, somehow you need to be sure you sample at a high enough rate.
That is, the specific error that you're encountering is that by using integers instead of fractions of a second for your time array, you're taking much too large steps for the sin function. That's always a possible problem with the sin function, but even if you did plot a sin that looked like a sin (say, by using a frequency like 0.003Hz instead of 400Hz) it would still be incorrect because it wouldn't have the proper time axis. So you need to both get the units correct, and make sure that you get enough data per oscillation to see the sine wave (or whatever it is you happen to be looking for).
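Here is the same idea as a short Python sketch, with time kept in seconds throughout. The 10 kHz sample rate corresponds to the 50000 points over 5 seconds suggested above. The function name and the segment tuple layout are just choices made for this example.

```python
import math

def piecewise_sine(segments, fs):
    """Build a signal from (freq_hz, amplitude, start_s, end_s) segments."""
    total_seconds = max(end for _, _, _, end in segments)
    n = int(round(total_seconds * fs))
    t = [i / fs for i in range(n)]  # time axis in seconds
    x = []
    for ti in t:
        value = 0.0
        for freq, amp, start, end in segments:
            if start <= ti < end:
                # Hz times seconds: the argument of sin is dimensionless.
                value = amp * math.sin(2 * math.pi * freq * ti)
                break
        x.append(value)
    return t, x

fs = 10000  # samples per second
segments = [(400, 1, 0, 2), (200, 1, 2, 3), (800, 2, 3, 5)]
t, x = piecewise_sine(segments, fs)
```

Because the segments are defined by time rather than by array index, changing fs does not require touching any of the index ranges.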

Summing Frequency Spectrums

I've a set of data from an EEG device from which I want to find the strength of different brain waves in Matlab. I tried to use EEGLAB but I wasn't really sure how, so at this point I'm simply using the dsp toolbox in Matlab.
For background: I've 15 epochs, 4 seconds in length. The device sampled at 256 Hz, and there are 264 sensors, so there are 1024 data points for each sensor for each epoch, i.e. my raw data is 264 x 1024 x 15. The baseline is removed. The data in each epoch is going to be used to train a classifier eventually, so I'm dealing with each epoch individually. I'll come up with more data samples later.
Anyways, what I've done so far is apply a Hann filter to the data and then run fft on the filtered data. So now I have the information in frequency domain. However, I'm not quite sure how to go from the power of the fft buckets to the power of certain frequency bands (e.g. alpha 8-13), to get the values I seek.
I know the answer should be straightforward but I can't seem to find the answer I want online, and then there's further confusion by certain sources recommending using a wavelet transform? Here's the little bit of code I have so far, the input "data" is one epoch, i.e. 264 x 1024.
% apply a hann window
siz = size(data);
hann_window = hann(siz(2));
hann_window = repmat(hann_window.', siz(1), 1);
hann_data = data .* hann_window; % element-wise: both are 264 x 1024
% run fft
X = fft(hann_data, [], 2);
X_mag = abs(X);
X_mag = X_mag.';
Thanks for the assistance!
If I'm understanding your question correctly, you want to scale the FFT output to get the correct power. To do this you need to divide by the number of samples used for the FFT.
X_mag = abs(X)/length(hann_data); % Scaled magnitude; square it to get power.
See this question for more info.
Once the content is scaled correctly, you can find the power in a band (e.g. 8 - 13 Hz) by integrating the content from the start to the stop of the band. Since you are dealing with discrete values it is a discrete integration. For perspective, this is equivalent to changing the resolution bandwidth of a spectrum analyzer.
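The discrete integration can be sketched like this in Python, for a single channel. It uses a plain DFT sum so that it runs without any toolbox, and it only evaluates the bins inside the band. The function names and the 10 Hz test tone are assumptions for the example. It also leaves out the factor for a one-sided spectrum and any correction for the window's own gain, so treat the numbers as relative rather than calibrated.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def band_power(x, fs, f_lo, f_hi):
    """Sum the squared, length-scaled DFT magnitudes between f_lo and f_hi."""
    n = len(x)
    xw = [a * b for a, b in zip(x, hann(n))]
    k_lo = math.ceil(f_lo * n / fs)   # first bin inside the band
    k_hi = math.floor(f_hi * n / fs)  # last bin inside the band
    total = 0.0
    for k in range(k_lo, k_hi + 1):
        X = sum(xw[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        total += (abs(X) / n) ** 2
    return total

fs = 256   # Hz, as in the question
n = 1024   # 4 second epoch
signal = [math.sin(2 * math.pi * 10 * i / fs) for i in range(n)]  # 10 Hz tone
alpha_power = band_power(signal, fs, 8, 13)
beta_power = band_power(signal, fs, 13, 30)
```

With a 4 second epoch at 256 Hz the bins are 0.25 Hz apart, so the 8 to 13 Hz band covers bins 32 to 52. For the 10 Hz test tone almost all of the power lands in the alpha band, as you would expect.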

Matlab Fourier transform of dsp.AudioRecorder in real-time

I'm using dsp.AudioRecorder to get real-time microphone input. The sound input is a series of sinusoids with different frequencies ranging from 500 to 2000Hz. Each one sounds for a second.
I'd like to know in real-time what the frequency of the current sine is, and also to distinguish between two sines with the same frequency going one after the other. This is why I use dsp.AudioRecorder.
This is what my code looks like now:
Microphone = dsp.AudioRecorder;
tic;
while(toc<30)
audio = step(Microphone);
[x, indexMax] = max(abs(fft(audio(:,1)-mean(audio(:,1)))));
indexMax
end
All the indexMax shows are numbers ranging from around 25 to 40. There's clearly an operation left out in order to retrieve the original frequency in [500;2000].
I've tried also to apply dsp.FFT() directly to audio but it tells me:
Error using dsp.FFT/pvParse
Invalid property/value pair arguments.
If there's any other way to perform real-time FFT on the dsp.AudioRecorder I'd really like to know. Or just if you see a way to complete what I've done here it would be great also.
To approximately estimate what frequency goes with what index, you need to know the sample rate (Fs) of the data sent to the FFT, and the length (N) of the FFT:
f ~= index * Fs / N
That's the operation you've left out.
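For what it's worth, here is that operation as a small Python sketch, using a plain DFT sum so it runs on its own. The 8 kHz sample rate, the 256-sample frame and the function names are assumptions picked for the example. In your case Fs and N are whatever the recorder is configured to deliver.

```python
import cmath
import math

def peak_bin(frame):
    """Return the 0-based DFT bin with the largest magnitude (DC removed)."""
    n = len(frame)
    mean = sum(frame) / n
    centered = [v - mean for v in frame]
    best_k, best_mag = 0, -1.0
    # Only the first half is searched; for real input the rest is a mirror.
    for k in range(n // 2 + 1):
        X = sum(centered[i] * cmath.exp(-2j * math.pi * k * i / n)
                for i in range(n))
        if abs(X) > best_mag:
            best_k, best_mag = k, abs(X)
    return best_k

def bin_to_frequency(k, fs, n):
    """Convert a 0-based bin number to a frequency in Hz."""
    return k * fs / n

fs = 8000  # assumed sample rate
n = 256    # assumed frame length
frame = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
k = peak_bin(frame)
freq = bin_to_frequency(k, fs, n)
matlab_index = k + 1  # what MATLAB's max() would report for the same bin
```

MATLAB indices start at 1, so with indexMax from your loop the bin number is indexMax - 1. The estimate can only land on multiples of Fs / N, which is why the result is approximate.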

What is a spectrogram and how do I set its parameters?

I am trying to plot the spectrogram of my time domain signal given:
N=5000;
phi = (rand(1,N)-0.5)*pi;
a = tan((0.5.*phi));
i = 2.*a./(1-a.^2);
plot(i);
spectrogram(i,100,1,100,1e3);
The problem is I don't understand the parameters and what values should be given. These values that I am using, I referred to MATLAB's online documentation of spectrogram. I am new to MATLAB, and I am just not getting the idea. Any help will be greatly appreciated!
Before we actually go into what that MATLAB command does, you probably want to know what a spectrogram is. That way you'll get more meaning into how each parameter works.
A spectrogram is a visual representation of the Short-Time Fourier Transform. Think of this as taking chunks of an input signal and applying a local Fourier Transform on each chunk. Each chunk has a specified width and you apply a Fourier Transform to this chunk. You should take note that each chunk has an associated frequency distribution. For each chunk that is centred at a specific time point in your time signal, you get a bunch of frequency components. The collection of all of these frequency components at each chunk and plotted all together is what is essentially a spectrogram.
The spectrogram is a 2D visual heat map where the horizontal axis represents the time of the signal and the vertical axis represents frequency. Each point in the image shows the magnitude of one frequency component at one time point: the lower the magnitude, the darker the colour, and the higher the magnitude, the lighter the colour.
Here's one perfect example of a spectrogram:
Source: Wikipedia
Therefore, for each time point, we see a distribution of frequency components. Think of each column as the frequency decomposition of a chunk centred at this time point. For each column, we see a varying spectrum of colours. The darker the colour is, the lower the magnitude component at that frequency is and vice-versa.
So!... now you're armed with that, let's go into how MATLAB works in terms of the function and its parameters. The way you are calling spectrogram conforms to this version of the function:
spectrogram(x,window,noverlap,nfft,fs)
Let's go through each parameter one by one so you can get a greater understanding of what each does:
x - This is the input time-domain signal you wish to find the spectrogram of. It can't get much simpler than that. In your case, the signal you want to find the spectrogram of is defined in the following code:
N=5000;
phi = (rand(1,N)-0.5)*pi;
a = tan((0.5.*phi));
i = 2.*a./(1-a.^2);
Here, i is the signal you want to find the spectrogram of.
window - If you recall, we decompose the signal into chunks, and each chunk has a specified width. window defines the width of each chunk in terms of samples. As this is a discrete-time signal, you know that this signal was sampled with a particular sampling frequency and sampling period. You can determine how large the window is in terms of samples by:
window_samples = window_time/Ts
Ts is the sampling time of your signal. Setting the window size is actually very empirical and requires a lot of experimentation. Basically, the larger the window size, the better frequency resolution you get as you're capturing more of the frequencies, but the time localization is poor. Similarly, the smaller the window size, the better localization you have in time, but you don't get that great of a frequency decomposition. I don't have any suggestions here on what the most optimal size is... which is why wavelets are preferred when it comes to time-frequency decomposition. For each "chunk", the chunks get decomposed into smaller chunks of a dynamic width so you get a mixture of good time and frequency localization.
noverlap - Another way to get a smoother picture of how the spectrum evolves over time is to let the chunks overlap. noverlap defines how many samples each chunk shares with the next one. The default is 50% of the width of each chunk.
nfft - You are essentially taking the FFT of each chunk. nfft tells you how many FFT points are computed per chunk. The default number of points is the larger of 256 and the next power of two at or above the chunk length. nfft also determines how finely the frequency axis is sampled: a higher number of FFT points gives more closely spaced frequency bins, and thus finer detail along the frequency axis of the spectrogram when visualised.
fs - The sampling frequency of your signal. The default is 1 Hz, but you can override this to whatever the sampling frequency your signal is at.
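To make the roles of the five parameters concrete, here is a bare-bones Python sketch of the chunk, window and transform loop. It is not how MATLAB implements spectrogram. It is a hand-rolled stand-in that assumes a Hamming window, assumes nfft is at least as long as the window, and uses a slow DFT sum. The function names and the two-tone test signal are made up for the example.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def simple_spectrogram(x, window, noverlap, nfft, fs):
    """Return (freqs, times, columns) of DFT magnitudes, one column per chunk."""
    hop = window - noverlap                  # samples between chunk starts
    n_frames = (len(x) - noverlap) // hop    # number of whole chunks that fit
    n_bins = nfft // 2 + 1                   # one-sided spectrum
    w = hamming(window)
    freqs = [k * fs / nfft for k in range(n_bins)]
    times = [(i * hop + window / 2) / fs for i in range(n_frames)]
    columns = []
    for i in range(n_frames):
        chunk = [x[i * hop + m] * w[m] for m in range(window)]
        column = []
        for k in range(n_bins):
            X = sum(chunk[m] * cmath.exp(-2j * math.pi * k * m / nfft)
                    for m in range(window))
            column.append(abs(X))
        columns.append(column)
    return freqs, times, columns

fs = 1000
# One second at 100 Hz followed by one second at 300 Hz.
x = [math.sin(2 * math.pi * (100 if n < fs else 300) * n / fs)
     for n in range(2 * fs)]
freqs, times, columns = simple_spectrogram(x, window=100, noverlap=50,
                                           nfft=100, fs=fs)
first_peak = freqs[columns[0].index(max(columns[0]))]
last_peak = freqs[columns[-1].index(max(columns[-1]))]
```

With window=100 and noverlap=50 each chunk starts 50 samples after the previous one, and with nfft=100 and fs=1000 the frequency bins are 10 Hz apart. The strongest bin moves from 100 Hz in the first column to 300 Hz in the last, which is the change over time that a single FFT of the whole signal would hide.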
Therefore, what you should probably take out of this is that I can't really tell you how to set the parameters. It all depends on what signal you have, but hopefully the above explanation will give you a better idea of how to set the parameters.
Good luck!