Can non-standard sampling rates be used with AAC encoding?

I am trying to resample an M4A file from 44100 Hz to a non-standard sampling rate, let's say 5000 Hz.
ffmpeg -i audio.wav -ar 5000 audio_.wav
This worked fine with WAV files; however, it didn't work with M4A. Any ideas why?
If non-standard sampling rates don't work with AAC, I need documentation or a reference for that.

The set of available sampling frequencies is limited by AAC ADIF (Audio Data Interchange Format) and ADTS (Audio Data Transport Stream). Other rates simply can't be encoded in an AAC stream. Here are the values for the sampling_frequency_index field from 8.1.1.2 in the ISO/IEC 13818-7 standard:
0x0: 96000 Hz
0x1: 88200 Hz
0x2: 64000 Hz
0x3: 48000 Hz
0x4: 44100 Hz
0x5: 32000 Hz
0x6: 24000 Hz
0x7: 22050 Hz
0x8: 16000 Hz
0x9: 12000 Hz
0xa: 11025 Hz
0xb: 8000 Hz
0xc-0xf: reserved
Regarding @slhck's answer:
According to ISO/IEC 13818-7 paragraph 8.2.3:
8.2.3 Decoding Process
Assuming that the start of a raw_data_block() is known, it can be decoded without any additional "transport-level" information and produces 1024 audio samples per output channel. The sampling rate of the audio signal, as specified by the sampling_frequency_index, may be specified in a program_config_element() or it may be implied in the specific application domain. In the latter case, the sampling_frequency_index must be deduced in order for the bitstream to be parsed.
Since a given sampling frequency is associated with only one sampling frequency table, and since maximum flexibility is desired in the range of possible sampling frequencies, the following Table shall be used to associate an implied sampling frequency with the desired sampling frequency dependent tables.
This table is for deducing the sampling_frequency_index. And both standards (ISO/IEC 13818-7 and ISO/IEC 14496-3) imply that the encoding has been done at one of these fixed frequencies.
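In practice, you have to pick one of the rates above. For example, resampling to 8000 Hz, a standard AAC rate close to the 5000 Hz you tried, should be accepted by the encoder where 5000 Hz fails:
ffmpeg -i audio.m4a -ar 8000 audio_8000.m4a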

Related

Different length of sound files with different sampling frequencies

I'm currently struggling to understand what is happening. I created a sound using the audiowrite function in MATLAB (the sound is created from two different sounds, but I don't think that matters), first with a sampling frequency of 44100 Hz, and then the same sound file again, but with a sampling frequency of 48000 Hz. Now I'm observing that the sound produced at 44100 Hz is approx. 30 s longer than the other one (48000 Hz sampling). It looks like phase shifting of some sort, but I'm not sure. Any help/explanation is appreciated. I also made an amplitude/time plot for better understanding:
(I set the x axis to 350 s to see where the signal ends.)
EDIT: here is the code for how I create the sound file:
[y1,F1] = audioread(cave_file);  % cave and forest files are MP3s loaded earlier; both have a sampling frequency of 48000 Hz
[y2,F2] = audioread(forest_file);
samp_freq = 44100;
%samp_freq = 48000;
a = max(size(y1), size(y2));  % number of samples in the longer signal
% zero-pad the shorter signal to the same length and put the two side by side as channels
z = [[y1; zeros(abs([a(1),0] - size(y1)))], [y2; zeros(abs([a(1),0] - size(y2)))]];
audiowrite('test_sound.wav', z, samp_freq);
What is the storage format? More specifically, is the information about the sampling rate and the number of channels stored in the file metadata, which is then used during playback?
If so, then there are 3 possibilities for this behavior:
1) The sampling rate metadata of the 44.1 kHz file is incorrect, while the audio was sampled at the correct rate, i.e. 44.1 kHz. Because the 44.1 kHz file plays longer than the 48 kHz one, which I'm assuming produces the correct sound and plays for the correct duration, it can be concluded that the sampling rate in the 44.1 kHz file's metadata is lower than 44.1 kHz.
Could you please check the metadata, or attach the files here so that I can take a look?
2) The sampling didn't happen at the correct rate, while the metadata has 44.1 kHz as the sampling rate.
3) The number of channels is incorrectly stored.
In case the files are raw PCM, the correct sampling rate and/or number of channels is probably not being selected when playing the 44.1 kHz file.
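A quick check of possibility 1): audiowrite does not resample, it only stamps the sample rate you pass into the WAV header. A minimal sketch (hypothetical file names) of the resulting stretch:
[y, fs_true] = audioread('source_48k.wav');  % samples actually recorded at 48000 Hz
audiowrite('stamped_44k.wav', y, 44100);     % same samples, but 44100 Hz written to the header
n = size(y, 1);
fprintf('duration at 48000 Hz: %.1f s\n', n / fs_true);
fprintf('duration at 44100 Hz: %.1f s\n', n / 44100);  % longer by 48000/44100, about 1.088x
A signal of roughly 330 s stretches to about 359 s this way, which matches the ~30 s difference observed in the question.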
Hope this helps

How can I save audio wav file without data clipping?

I am using MATLAB for extracting the silence part from audio WAV files. After extracting the silence part from the audio, I want to save the new audio as a WAV file.
For this process, I use the 'audiowrite' function. However, the program warns me with this message:
Warning: Data clipped when writing file.
I tried adding the 'BitsPerSample' option with the single file format (32-bit), and I don't get the message from the program that way. I saved the audio files as 32-bit, but the WAV files should be 16-bit.
How can I fix this problem?
audiowrite(filename, y, fs, 'BitsPerSample', 32);
Note: I also normalized the data, and the problem is the same.
Thanks for your help!
UPDATE:
I want to normalize the audio samples to mean 0 and standard deviation (or variance) 1. Thus, I use the z-score normalization technique.
Also, the y/max(abs(y)) method normalizes the data to between -1 and 1. However, the mean and variance are not equal to 0 and 1, respectively. These techniques normalize the data in different ways.
Actually, my question is: how can I save samples with the z-score normalization technique without data clipping?
MATLAB's audiowrite uses different normalizations for different data types. So if you want to get a 16-bit audio WAV file, you should normalize your data to the [-32768, 32767] range and convert it to the int16 type:
y_normalized = double(intmax('int16')) * y / (max(abs(y)) * 1.001);  % cast intmax to double so the scaling happens in floating point
audiowrite(filename, int16(y_normalized), fs)
Similarly, for floating-point data, you should normalize your data to the [-1, +1] range:
y_normalized = y/(max(abs(y)));
audiowrite(filename, y_normalized, fs)
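For the z-score part of the question: z-scored samples have unit variance, so many of them fall outside [-1, +1] and audiowrite will clip them. A minimal sketch of one way out, assuming y is a mono signal vector: z-score for your analysis, but rescale a copy into [-1, 1] before writing.
y_z = (y - mean(y)) / std(y);   % z-score: mean 0, standard deviation 1
y_out = y_z / max(abs(y_z));    % rescale into audiowrite's valid [-1, 1] range
audiowrite('test_16bit.wav', y_out, fs, 'BitsPerSample', 16);
The written file no longer has unit variance (any playback scaling would destroy that anyway), so keep the z-scored vector in memory if the statistics matter.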

Receiving the right value when transmitting .dat file using FM radio

I am new to GNU Radio, and I'm trying to transmit a value using it and the USRP B210 board.
I used MATLAB to convert the value 0.121 to WAV format, then converted the WAV file to a .dat file using the audio_to_file example in GNU Radio.
When I transmit the .dat file using the B210 and GNU Radio, I receive a WAV file, but when I read that WAV using the MATLAB function audioread() I get a different value.
P.S.
The sample rate of the converted .dat file was 44100 Hz, with 16 bits per sample.
The receiver and transmitter sampling rate is 400 kHz.
I used the fm_tx4.py example from the GNU Radio package for my transmitter.
I used uhd_nbfm_receiver.grc for the receiver.
If you're wondering why your received signal doesn't have the same amplitude as your sent signal, you're not getting the very basics of radio communications: as there is no digital line between your transmitter and your receiver, power can go anywhere, and how much reaches the receiver depends on a lot of factors, including gain, antennas, distance, matching...
There will be a lot more things that are different on the RX side than they were on the TX side: your reception has not been time-synchronized, so you might see a phase shift. You don't mention whether the receiver is the same B210, a clock-synchronized one, or a clock-independent one, which means you have the general case: no two physical clocks can be identical (making them identical is impossible, but you can reduce the error), so you'll generally see some frequency offset, too.
I recommend reading up a bit on basic radio comm theory. I often recommend GNU Radio's pictured introduction and GNU Radio's suggested Reading Page. Michael Ossmann's courses get some recognition, too, so you should definitely have a look at them.
Also, all your data -> WAV -> transmit conversion is totally unnecessary. MATLAB's fread/fwrite functions can read/store the native machine float format that GNU Radio's file_sink/file_source can store/read. See the FAQ entry.
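A minimal sketch of that direct route (hypothetical file name), assuming the flowgraph's file_source is configured for float samples:
x = single(0.121 * ones(1, 44100));  % one second of the constant value at 44100 samples/s
fid = fopen('samples.dat', 'wb');    % raw binary file for GNU Radio's file_source
fwrite(fid, x, 'float32');           % native machine float format
fclose(fid);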

Watermarking sound, reading through iPhone

I want to add a few bytes of data to a sound file (for example a song). The sound file will be transmitted via radio to a receiver, who uses for example the iPhone microphone to pick up the sound, and an application will show the original bytes of data. Preferably it should not be audible to humans.
What is such technology called? Are there any applications that can do this?
Libraries/apps that can be used on iPhone?
It's audio steganography. There are algorithms to do it. Refer to here.
I've done some research, and it seems the way to go is:
Use low audio frequencies.
Spread the "bits" around randomly - do not use a pattern as it will be picked up by the listener. "White noise" is a good clue. The random pattern is known by the sender and receiver.
Use a Fourier transform to pick up frequency and amplitude.
Clean up input data.
Use checksum/redundancy-algorithms to compensate for loss.
I'm writing a prototype and am having a bit of difficulty picking up the right frequency, as it has a ~4 Hz offset (100 Hz becomes 96.x Hz when played and picked up by the microphone); a sketch of the peak-picking step follows.
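A minimal sketch of the Fourier step above, assuming r is a mono column-vector recording at sample rate fs and the embedded tone is expected near 100 Hz:
N = length(r);
w = 0.5 * (1 - cos(2*pi*(0:N-1)'/(N-1)));  % Hann window, to reduce spectral leakage
R = fft(r .* w);
f = (0:N-1)' * fs / N;                     % frequency of each DFT bin
band = f >= 80 & f <= 120;                 % search only around the expected carrier
[amp, k] = max(abs(R(band)));
fband = f(band);
fprintf('detected tone: %.2f Hz (amplitude %.1f)\n', fband(k), amp);
Longer capture windows narrow the bin spacing fs/N, which helps separate a real offset from plain FFT resolution error.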
This is not the answer, but I hope it helps.

Compare sounds inside of the App

Is it possible to compare two sounds?
For example, the app already has a sound file (MP3 or any format); is it possible to compare that static sound file with a sound recorded inside the app?
Any comments are welcome.
Regards
This forum thread has a good answer (about three down) - http://www.dsprelated.com/showmessage/103820/1.php.
The trick is to get the decoded audio from the MP3 - if they're just short 'hello' sounds, I'd store them inside the app as a WAV instead of decoding them (though I've never used CoreAudio or any of the other frameworks before, so MP3 decoding into memory might be easy).
When you've got your reference WAV and your recorded WAV, follow the steps in the post above:
1. Do whatever is necessary to convert the .wav files to their discrete-time signals: http://www.sonicspot.com/guide/wavefiles.html
2. Time-warping might or might not be necessary, depending on the difference between the two sample rates: http://en.wikipedia.org/wiki/Dynamic_time_warping
3. After time warping, truncate both signals so that their durations are equivalent.
4. Compute the normalized energy spectral density (ESD) from the DFTs of the two signals: http://en.wikipedia.org/wiki/Power_spectrum
5. Compute the mean squared error (MSE) between the normalized ESDs of the two signals: http://en.wikipedia.org/wiki/Mean_squared_error
The MSE between the normalized ESDs of two signals is a good metric of closeness. If you have, say, 10 .wav files, and 2 of them are nearly the same but the others are not, the two that are close should have a relatively low MSE. Two perfectly identical signals will obviously have an MSE of zero. Ideally, two "equivalent" signals with different time scales (a 20-second human talking versus a 5-second chipmunk), different energies (a soft-spoken human versus a yelling chipmunk), and different phases (sampling began at a slightly different instant against the continuous-time input) should still have an MSE of zero, but quantization errors inherent in DSP will yield an MSE slightly greater than zero.
http://en.wikipedia.org/wiki/Minimum_mean-square_error
You should get two different MSE values, one between your male reference and the recorded track and one between your female reference and the recorded track. The comparison with the lowest MSE is probably the correct gender.
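A minimal MATLAB sketch of steps 3-5 above, assuming x and y are already time-aligned mono signals at the same sample rate:
n = min(length(x), length(y));  % step 3: truncate to equal duration
X = abs(fft(x(1:n))).^2;        % step 4: energy spectral density via the DFT
Y = abs(fft(y(1:n))).^2;
X = X / sum(X);                 % normalize each ESD to unit total energy
Y = Y / sum(Y);
mse = mean((X - Y).^2);         % step 5: MSE between the normalized ESDs
Run this once per reference file against the recording; the smallest mse wins.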
I confess that I've never tried to do this and it looks very hard - good luck!