Advice on converting ultrasonic rat call signal into human audible range with MATLAB

I'm doing a project studying rats that squeak in the ultrasonic range (20 kHz to 100 kHz), using MATLAB and sound files.
I have (or will be getting) a couple of .wav audio recordings of these rats vocalizing, and among general analysis of these waveforms, I also want to convert these ultrasonic signals (outside of our hearing) into the human audible range (20 Hz to 20 kHz).
Could I get some advice on how to do this conversion (via MATLAB programming, not by using equipment)?
Looking into this, I've found names such as:
- frequency division
- heterodyning
- envelope detection
- time expansion
but looking into these, it seems they are either explained in terms of what the equipment (bat detectors) does, or they sound incredibly similar to each other; e.g., frequency division and time expansion both involve dividing the incoming signal by 10.
Since I am looking into what seems to be unfamiliar turf, it would be great to find multiple ways to convert the signal (to my knowledge, each of the names above has its own associated positive and negative traits).

Your question is a signal processing question more than a MATLAB question, which isn't really what Stack Overflow is about, so you might get some negative votes.
There are indeed a number of methods of changing the frequency of audio (or any signals):
1) Slow it Down: The least disruptive to the signal is simply to slow down the audio. If you are looking to have rat signals up to 100 kHz, you'll need to sample the audio at 200 kHz or greater. Once you have your recording, simply re-save the wav file telling it that the sample rate is 44.1 kHz (or whatever). This will play it more slowly, but all the frequencies will now be audible (unlike the single side band demodulation discussed below). This is definitely the place you should start...it's the easiest and will sound the best.
[myAudio, fs] = audioread('myFile.wav');        % your original audio, recorded at fs = 200 kHz
fsNew = 44.1e3;                                 % simply declare a lower sample rate
audiowrite('myFile_44kHz.wav', myAudio, fsNew); % saved at the new rate, so it plays ~4.5x slower
2) Single-Side Band: Use the demod command to "demodulate" the signal to lower its frequency. There are a number of demodulation methods available with this command. I'd use "single side band (suppressed carrier)" because that is how the rat itself (and humans) create sound. To do the demodulation, you'll have to assume a "carrier frequency", as if it were a radio signal. If the lowest frequency of a rat squeak is 20 kHz, you can assume a carrier of 20 kHz. This operation will shift all of your audio down by 20 kHz. As a result, the squeak that was originally 20-100 kHz will now be 0-80 kHz. So you won't hear the whole thing, but you'll hear part of it.
[myAudio, fs] = audioread('myFile.wav');       % your original audio at fs = 200 kHz
[b,a] = butter(2, 20e3/(fs/2), 'high');        % define a 20 kHz high-pass filter
myAudio = filtfilt(b, a, myAudio);             % remove the low frequencies
myAudio = demod(myAudio, 20e3, fs, 'amssb');   % shift everything down by 20 kHz
audiowrite('myWave_shifted.wav', myAudio, fs); % save it out
3) Phase Vocoder (or other Pitch Shifting): To hear the whole 20-100 kHz range (which is 80 kHz of bandwidth, 4x the 20 kHz bandwidth of human hearing), you've got to go to more extreme methods. These methods will make the audio sound bizarre, but you can give them a try. There are several algorithms; look up "phase vocoder" (a rough sketch follows below). Or use one of the audio processing software packages like Audacity, Raven, etc.
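If you want to experiment in MATLAB itself, below is a minimal phase-vocoder time-scaling sketch, offered only as an illustration of the idea (pvStretch is a made-up helper name, and the window/hop choices are untuned assumptions). Compressing the signal to a quarter of its length with r = 0.25 leaves the pitch unchanged; saving the result at a quarter of the sample rate then restores the original duration while lowering every frequency by a factor of 4, which brings most of a 20-100 kHz call into the audible range.
[x, fs] = audioread('myFile.wav');           % e.g., fs = 200 kHz
y = pvStretch(x(:,1), 0.25);                 % compress 4x in time, pitch unchanged
audiowrite('myFile_pv.wav', y, round(fs/4)); % quarter rate: original duration, pitch / 4

function y = pvStretch(x, r)
x = x(:);                                    % force a column vector
N = 1024;                                    % FFT size
Ra = N/4;                                    % analysis hop
Rs = max(1, round(r*Ra));                    % synthesis hop (r < 1 compresses)
win = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);        % periodic Hann window (no toolbox needed)
nFrames = floor((length(x) - N)/Ra);
y = zeros(nFrames*Rs + N, 1);
omega = 2*pi*(0:N-1)'/N * Ra;                % expected per-hop phase advance of each bin
prevPh = zeros(N,1);
phAcc = zeros(N,1);
for k = 1:nFrames
    X = fft(win .* x((k-1)*Ra + (1:N)));
    ph = angle(X);
    dph = ph - prevPh - omega;               % deviation from the expected advance
    dph = dph - 2*pi*round(dph/(2*pi));      % wrap to [-pi, pi]
    phAcc = phAcc + (omega + dph)*(Rs/Ra);   % re-advance the phase at the new hop
    prevPh = ph;
    frame = real(ifft(abs(X) .* exp(1i*phAcc)));
    idx = (k-1)*Rs + (1:N);
    y(idx) = y(idx) + win .* frame;          % overlap-add
end
y = y / max(abs(y));                         % normalize
end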

Related

what is the purpose of the frequency domain analysis

I had assumed that this was going to output a set of frequency buckets that I could use to do pitch detection (like aubio pitch), but that doesn't seem to be what it does. I fired up the voice-omatic app and, using its frequency display, played various notes through my mic. Bars appear, all at the left-hand end, with almost no distinction between high and low notes. I upped the FFT size just to see if that changed anything; it seems not. I found a pitchdetect js project and saw that it used an analyser; aha, here I will find the correct usage. But the meat of the code doesn't use the frequency domain output; it feeds the time domain into its own algorithm. So to solve my problem I will use that library, but I am still curious what the frequency domain data represents.
It does exactly what you think it does, but the specific implementation on that site is not great for seeing that.
The values are mapped differently than is necessary for seeing changes in pitch between the high and low vocal range, because the range is mapped linearly, not logarithmically. If you were to map each frequency bin in a logarithmic way, you would get a far more useful diagram. Right now the right-hand 3/4 of the visualizer is showing roughly 5000 Hz to 20000 Hz, which contains very little data in an audio signal compared to 0-5000 Hz. The root frequency of the human voice (and most instruments) lies mostly between 100 and 1000 Hz. There are harmonics above that, but at decreasing amplitudes the higher you get.
I've tweaked the code to tell you the peak frequency and the size of each bucket. Use this app.js: https://puu.sh/Izcws/ded6aae55b.js - you can use a tone generator on your phone to see how accurate it is. https://puu.sh/Izcx3/d5a9d74764.png
The way the code works is that it calculates how big each bucket is, as described in my answer, with var bucketSize = 1 / (analyser.fftSize / analyser.context.sampleRate); (and adds two spans to show the data). Then, while drawing the bars, it tracks which bar is the biggest and multiplies the bucket size by that bucket's index to get the peak frequency (writing it in el2). You can play with the fftSize and see why using a small value will not work at all for determining whether you are playing A2 (110 Hz) or A2# (116.5 Hz).
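To illustrate the logarithmic re-mapping in this page's other language, here's a small MATLAB sketch (the band edges, FFT size, and white-noise stand-in spectrum are all assumptions):
fs = 44100; N = 8192;
fBin = fs*(0:N/2)/N;                            % linear bin centers, ~5.4 Hz apart
edges = logspace(log10(50), log10(20000), 41);  % 40 log-spaced bands, 50 Hz to 20 kHz
mag = abs(fft(randn(N,1)));                     % stand-in spectrum (white noise)
mag = mag(1:N/2+1);
bandLevel = zeros(40,1);
for b = 1:40
    inBand = fBin >= edges(b) & fBin < edges(b+1);
    if any(inBand)
        bandLevel(b) = mean(mag(inBand));       % average the linear bins in each band
    end
end
bar(bandLevel)                                  % each octave now gets equal visual width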
From https://stackoverflow.com/a/43369065/13801789:
I believe I understand what you mean exactly. The problem is not with your code, it is with the FFT underlying getByteFrequencyData. The core problem is that musical notes are logarithmically spaced while the FFT frequency bins are linearly spaced.
Notes are logarithmically spaced: The difference between consecutive low notes, say A2(110 Hz) and A2#(116.5 Hz) is 6.5 Hz while the difference between the same 2 notes on a higher octave A3(220 Hz) and A3#(233.1 Hz) is 13.1 Hz.
FFT bins are linearly spaced: Say we're working with 44100 samples per second. The FFT takes a window of 1024 samples and correlates it first with a wave whose period is the full 1024 samples (let's call it wave1), so that would be a period of 1024/44100 ≈ 0.0232 seconds, which is about 43.07 Hz, and puts the resulting amplitude in the first bin. Then it correlates the window with a wave at the frequency of wave1 * 2, about 86.13 Hz, then wave1 * 3 ≈ 129.2 Hz. So the difference between the bin frequencies is linear; it's always the same, about 43.07 Hz, unlike the difference between musical notes, which changes.
This is why close low notes will be bundled into the same bin while close high notes are separated. This is the problem with the FFT's frequency resolution. It can be solved by taking windows bigger than 1024 samples, but that is a trade-off against time resolution.
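To make those numbers concrete, here is a quick MATLAB check (same sample rate and window size as in the quoted answer):
fs = 44100; N = 1024;
binWidth = fs/N;                                      % about 43.07 Hz between bin centers
binOf = @(f) round(f/binWidth);                       % nearest bin for a given frequency
fprintf('A2  (110.0 Hz) -> bin %d\n', binOf(110));    % bin 3
fprintf('A2# (116.5 Hz) -> bin %d\n', binOf(116.5));  % bin 3: collides with A2
fprintf('A3  (220.0 Hz) -> bin %d\n', binOf(220));    % bin 5
fprintf('A3# (233.1 Hz) -> bin %d\n', binOf(233.1));  % bin 5: collides with A3
% The low semitones only separate once binWidth < 6.5 Hz, i.e., N >= 8192 at 44.1 kHz.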

why is amplitude of low frequency much lower than that of high frequency after using fft?

I generate a sound using MATLAB and play it over the computer's speaker; meanwhile I record the sound using an iPhone, and finally I send the record.wav file back to the computer to analyze it. Here I found that the amplitude of the low frequency is much lower than that of the high frequency.
The sound generation code looks like A*sin(2*pi*697*(0:N-1)/44100) + A*sin(2*pi*1209*(0:N-1)/44100) if I want to generate the dial tone for the number 1, where N is the number of samples I want to generate and 44100 is the sampling frequency.
Then I use an FFT to analyze the frequency content of the sound and plot the FFT output. Though I get the correct frequencies, the two amplitudes look different, which puzzles me a lot.
So, what happened? Why are the two amplitudes different?
[temp, fs] = audioread('record.wav');
[P1, f] = fft_recorder(temp, fs);
function [P1,f] = fft_recorder(array,fs)
n  = length(array);
Y  = fft(array);
P2 = abs(Y/n);                 % two-sided amplitude spectrum
P1 = P2(1:floor(n/2)+1);       % keep the one-sided half
P1(2:end-1) = 2*P1(2:end-1);   % double interior bins to conserve amplitude
f  = fs*(0:floor(n/2))/n;      % frequency axis in Hz
end
The speaker outputting the sound, and the room you are in (due to multipath reflections and resonances), most likely do not have a flat frequency response over that frequency range. Any mechanical resonances due to contact with the speaker or the iPhone will also change the received audio level by different amounts at different frequencies. (The iPhone's microphone may be closer to having a flat frequency response, but not perfectly.) So some frequencies will be recorded as stronger than others, even with your variable A set to a constant.
Try testing one frequency at a time over your desired frequency range, and measure the response of your channel (a sketch follows below). The frequency response curve might even change by a large amount when you move the speaker, the microphone, or other large objects in the room.
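For example, a rough MATLAB sketch of that single-tone test (the frequency list, level, and timing are arbitrary assumptions):
fs = 44100; dur = 2; A = 0.5;
freqs = [300 500 697 1000 1209 2000 4000];  % test frequencies in Hz (arbitrary picks)
t = (0:dur*fs-1)/fs;
for k = 1:numel(freqs)
    tone = A*sin(2*pi*freqs(k)*t);
    sound(tone, fs);                        % play each tone through the speaker
    pause(dur + 0.5);                       % leave a gap while the phone records
end
% Afterwards, read the level at each test frequency out of the recording
% (e.g., with the fft_recorder function above) to map the channel response.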

1024 pt fft on a large set of data points

I have a signal that may not be periodic. We take about 15 seconds' worth of samples (at a 10 kHz sampling rate) and we need to do the FFT on that signal to get its frequency content.
The problem is that we are implementing this FFT on an embedded system (DSP) which provides a library FFT of max. 1024 length. That is, it takes in 1024 data points as input and provides a 1024 point output.
What is the correct way of obtaining an FFT on the full 150000 point input?
You could run the FFT on each 1024-point block and average them to get an average power spectrum on the lower-resolution 1024-point frequency axis (512 bins from 0 to the Nyquist frequency, fs/2, so about 10 Hz resolution for your 10 kHz sampling); see the sketch below. You should average the magnitudes of the component FFTs (i.e., sqrt(re^2+im^2)), otherwise the average will be sensitive to the drifting phase within each subwindow, which will depend on the precise frequency of the sinusoid.
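As a concrete sketch of that magnitude-averaging (Welch-style with non-overlapping blocks; x and fs are assumed to hold your 150,000-sample signal, as a column vector, and the 10 kHz rate):
N = 1024;
nBlocks = floor(length(x)/N);
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);      % Hann window
avgMag = zeros(N/2+1, 1);
for k = 1:nBlocks
    X = fft(w .* x((k-1)*N + (1:N)));    % FFT of the next windowed 1024-point block
    avgMag = avgMag + abs(X(1:N/2+1));   % accumulate magnitudes, discarding phase
end
avgMag = avgMag / nBlocks;
f = fs*(0:N/2)/N;                        % bin centers, about 9.8 Hz apart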
If you think the periodic component may be at a low frequency, such that it will show up in a 15 sec sample but not complete any cycles in a 1024/10k ~ 100ms sample (i.e., below 10 Hz or so), you could downsample your input. You could try something as crude as averaging every 100 points to get a somewhat-distorted signal at 100 Hz sampling rate, then pack 10.24 sec worth into your 1024 pt sequence to pass to the FFT.
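The crude averaging could look like the following (the factor of 100 is from the paragraph above; decimate from the Signal Processing Toolbox would add proper anti-alias filtering, if it compiles for your target):
M = 100;
xTrim = x(1:M*floor(length(x)/M));        % drop the ragged tail
xLow = mean(reshape(xTrim, M, []), 1).';  % average each group of 100 -> 100 Hz rate
% 1024 points of xLow now span 10.24 s, resolving frequencies down to ~0.1 Hz.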
You could combine these two approaches by using a smaller downsampling factor and then do the magnitude-averaging of successive windows.
I'm confused why the system provides an FFT only up to 1024 points - is there something about the memory that makes it harder to access larger blocks?
Calculating a 128k point FFT using a 1k FFT as a subroutine is possible, but you'd end up recoding a lot of the FFT yourself. Maybe you should forget about the system library and use some other FFT implementation, without the length limitation, that will compile on your target. It may not incorporate all the optimizations of the system-provided one, but you're likely to lose a lot of that advantage when you embed it within the custom code needed to use the partial outputs of the multiple shorter FFTs to produce the long FFT.
Probably the quickest way to do the hybrid FFT (1024 points using the library, then added code to combine them into a 128k-point FFT) would be to take an existing full FFT routine (a radix-2, decimation-in-time (DIT) routine, for instance) and modify it to use the system library for what would have been the first 10 stages. Those stages amount to calculating 128 individual 1024-point FFTs on different subsets of the original signal (not, unfortunately, successive windows, but the partial-bit-reversed subsets). The remaining 7 stages of butterflies then operate on those partial outputs. You'd want to get a pretty solid understanding of how the DIT FFT works to implement this.

Error while performing fft in MATLAB

I took a file that was a clean, repeating waveform of the note F2 on a piano (I deduced this by playing F2 on a virtual piano, playing the wave file, and comparing). When I did the FFT of the signal, using wavread on the saved wav file, I got a peak frequency of 176.4 Hz, which is one octave off the correct frequency of the note F2 (about 87.3 Hz).
When I analysed another note, this time recorded from a physical piano, I got the exact same peak frequency. How is this possible? Is it possible that MATLAB stores the peak frequency in its memory for more than one file? If so, how do I solve this problem?
P.S. When we analysed a full song, i.e., a wav file containing many notes, we got many peaks, which confirmed that we weren't using the same graph for everything.
It's not an error in Matlab's FFT.
Musical pitch is different from peak frequency. It's a psycho-acoustic phenomenon. A sound that a human will hear as a single musical pitch can contain many frequency peaks, with the strongest spectral peaks possibly being overtones that are centered on a completely different pitch class, and/or in a higher octave than the perceived pitch. There are books that cover this topic in areas such as audiology and the neuropsychology of sound perception. A book on the physics of music might explain why a piano creates this richer, more complicated frequency spectrum.
To find pitch, one needs to use a pitch detection/estimation algorithm, not just an FFT. Search here using those keywords.
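For a taste of what such an algorithm can look like, here is a hedged MATLAB sketch of simple autocorrelation-based pitch estimation (the file name is a placeholder; xcorr needs the Signal Processing Toolbox):
[x, fs] = audioread('pianoF2.wav');   % placeholder file name
x = x(:,1);                           % use the first channel
seg = x(1:round(0.1*fs));             % analyze the first 100 ms
r = xcorr(seg, 'coeff');              % normalized autocorrelation
r = r(numel(seg):end);                % keep lags 0..N-1 (r(1) is lag 0)
Lmin = round(fs/500);                 % shortest lag to search: 500 Hz
Lmax = round(fs/50);                  % longest lag to search: 50 Hz
[~, k] = max(r(Lmin+1 : Lmax+1));     % strongest repetition in that range
f0 = fs / (Lmin + k - 1);             % fundamental estimate in Hz
fprintf('Estimated pitch: %.1f Hz\n', f0);   % ~87 Hz expected for F2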

How to determine the frequency of the input recorded voice in iphone?

I am new to iPhone development. I am doing research on voice recording on the iPhone. I have downloaded the "SpeakHere" sample program from Apple. I want to determine the frequency of my voice recorded on the iPhone. Please guide me. Thanks.
In the context of processing human speech, there's really no such thing as "the" frequency.
The signal will be a mix of many different frequencies, so it might be more fruitful to think in terms of a spectrum rather than a single frequency. Even a sustained musical note with a fixed pitch will have plenty of overtones and harmonics present in addition to the fundamental frequency of the note. And for actual speech, the frequency spectrum will change drastically even within a short clip, due to the different tonal characteristics of vowels and consonants.
With that said, it does make some sense to consider the peak frequency of a voice recording. You could calculate the Fast Fourier Transform of your voice clip, then find the frequency bin with the largest response. You may also be interested in the concept of a spectrogram, which represents how the audio spectrum of a signal varies over time.
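In MATLAB terms (keeping with the rest of this page; on iOS you'd do the equivalent with Accelerate, as noted in the last answer below), the peak-bin idea is just a sketch like this, with a placeholder file name:
[x, fs] = audioread('voice.wav');   % placeholder file name
X = abs(fft(x(:,1)));
[~, i] = max(X(1:floor(end/2)));    % strongest bin in the positive-frequency half
peakHz = (i-1)*fs/length(x)         % convert bin index to Hz
spectrogram(x(:,1), 512, 256, 512, fs, 'yaxis')  % spectrum vs. time (Signal Processing Toolbox)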
Use Audacity. Take a small recording of typical speech, and cut it down to one wavelength, from one peak to another peak. Subtract the two times, and divide 1 by that number and you'll get the frequency of your wave in Hz.
Example:
In my audio clip, my waveform runs from 0.0760 to 0.0803 seconds.
0.0803-0.0760 = 0.0043
1/0.0043 = 232.558 Hz, my typical speech frequency
This might give you a good basis for creating an analyzer. You'd need to detect the peaks, measure the time between the peaks of the wave, and average the results (a sketch of that idea follows).
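Sticking with this page's MATLAB examples, a rough sketch of that peak-timing idea (findpeaks needs the Signal Processing Toolbox, and the thresholds here are guesses):
[x, fs] = audioread('speech.wav');    % placeholder file name
x = x(:,1);
[~, locs] = findpeaks(x, 'MinPeakHeight', 0.3*max(x), ...
                         'MinPeakDistance', round(fs/500));  % peaks >= 2 ms apart
periods = diff(locs) / fs;            % seconds between successive peaks
f0 = 1 / mean(periods);               % average frequency, analogous to 1/0.0043 above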
You'll need to use Apple's Accelerate framework to take an FFT of the relevant audio. The FFT will convert the audio in the time domain to the frequency domain. The Accelerate framework supports the FFT and will allow you to do frequency analysis in real-time.