Linphone opus codec sampling rate - sip

I would like to use the Opus codec in Linphone, but I am having a few problems with it. If someone with Opus codec knowledge could help me out, I would appreciate it.
How can I force the audio sampling rate to 8000 Hz? Currently it uses 48000 Hz only.
Thanks in advance

If you look at RFC 7587, Section 4.1, you can read this:
Opus supports 5 different audio bandwidths, which can be adjusted during a stream. The RTP timestamp is incremented with a 48000 Hz clock rate for all modes of Opus and all sampling rates. The unit for the timestamp is samples per single (mono) channel. The RTP timestamp corresponds to the sample time of the first encoded sample in the encoded frame. For data encoded with sampling rates other than 48000 Hz, the sampling rate has to be adjusted to 48000 Hz.
Reading further in RFC 7587, you will find that, in SDP, the codec is always declared as "opus/48000/2", no matter the real sampling rate.
No matter the real sampling rate, as explained above, the RTP timestamp will always be incremented with a 48000 Hz clock rate.
If you wish to control the real sampling rate of the codec (and thus the audio bandwidth), the SDP parameters to use are maxplaybackrate and maxaveragebitrate (together with sprop-maxcapturerate, as in the example below).
Section 3.1.1 lists the relation between maxaveragebitrate and the sampling rate:
3.1.1. Recommended Bitrate
For a frame size of 20 ms, these are the bitrate "sweet spots" for Opus in various configurations:
o 8-12 kbit/s for NB speech,
o 16-20 kbit/s for WB speech,
o 28-40 kbit/s for FB speech,
o 48-64 kbit/s for FB mono music, and
o 64-128 kbit/s for FB stereo music.
Conclusion: to use only 8000 Hz with Opus, you must negotiate with parameters such as the following, where 12 kbit/s is the recommended maximum for Opus NB speech:
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
a=fmtp:101 maxplaybackrate=8000; sprop-maxcapturerate=8000; maxaveragebitrate=12000
I don't know whether Linphone honours all of these parameters, but that is the theory!
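For what it's worth, here is roughly what those SDP parameters ask the sending encoder to do, expressed with the plain libopus C API. This is a minimal sketch assuming you can reach the encoder directly; it is not Linphone-specific code, and I don't know which of these knobs Linphone exposes.

#include <opus/opus.h>
#include <stdio.h>

int main(void) {
    int err;
    /* The encoder API rate may be 8000, 12000, 16000, 24000 or 48000 Hz;
       the RTP clock rate in SDP stays 48000 Hz regardless. */
    OpusEncoder *enc = opus_encoder_create(48000, 1, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) {
        fprintf(stderr, "opus_encoder_create: %s\n", opus_strerror(err));
        return 1;
    }

    /* maxaveragebitrate=12000 -> cap the average bitrate at 12 kbit/s */
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(12000));
    /* maxplaybackrate=8000 -> do not code audio bandwidth above narrowband */
    opus_encoder_ctl(enc, OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_NARROWBAND));

    /* ... feed 20 ms frames to opus_encode() here ... */

    opus_encoder_destroy(enc);
    return 0;
}

The point is that the 48000 in opus_encoder_create (and in the SDP rtpmap line) is only the clock rate; the audio bandwidth that actually gets coded is governed by the bitrate and bandwidth settings.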

Related

Can non-standard sampling rates be used with AAC encoding?

I am trying to resample an M4A file from 44100 Hz to a non-standard sampling rate, let's say 5000 Hz:
ffmpeg -i audio.wav -ar 5000 audio_.wav
This worked fine with WAV files; however, it didn't work with M4A. Any ideas why?
If non-standard sampling rates don't work with AAC, I need some documentation or a reference for that.
The set of available sampling frequencies is limited by the AAC ADIF (Audio Data Interchange Format) and ADTS (Audio Data Transport Stream) headers, so other rates simply cannot be signalled in an AAC stream. The sampling_frequency_index field (Section 8.1.1.2 of the ISO/IEC 13818-7 standard) can only express these values: 96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025 and 8000 Hz.
About @slhck's answer:
According to ISO/IEC 13818-7 paragraph 8.2.3:
8.2.3 Decoding Process
Assuming that the start of a raw_data_block() is known, it can be decoded without any additional "transport-level" information and produces 1024 audio samples per output channel. The sampling rate of the audio signal, as specified by the sampling_frequency_index, may be specified in a program_config_element() or it may be implied in the specific application domain. In the latter case, the sampling_frequency_index must be deduced in order for the bitstream to be parsed.
Since a given sampling frequency is associated with only one sampling frequency table, and since maximum flexibility is desired in the range of possible sampling frequencies, the following Table shall be used to associate an implied sampling frequency with the desired sampling frequency dependent tables.
This table is used to deduce the sampling_frequency_index, and both standards (ISO/IEC 13818-7 and ISO/IEC 14496-3) imply that the encoding has been made with one of these fixed frequencies.
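A practical workaround is therefore to resample to the nearest rate AAC can signal (for the 5000 Hz example above that would be 8000 Hz, i.e. -ar 8000) before or while encoding. Below is a minimal sketch of that "snap to nearest" step in C; nearest_aac_rate is a hypothetical helper, not part of any AAC library.

#include <stdio.h>
#include <stdlib.h>

/* Sampling rates (Hz) that sampling_frequency_index can express. */
static const int aac_rates[] = {96000, 88200, 64000, 48000, 44100, 32000,
                                24000, 22050, 16000, 12000, 11025, 8000};

/* Hypothetical helper: snap an arbitrary requested rate to the nearest expressible one. */
static int nearest_aac_rate(int requested) {
    int best = aac_rates[0];
    for (size_t i = 1; i < sizeof aac_rates / sizeof aac_rates[0]; i++) {
        if (abs(aac_rates[i] - requested) < abs(best - requested))
            best = aac_rates[i];
    }
    return best;
}

int main(void) {
    printf("%d Hz\n", nearest_aac_rate(5000));   /* prints 8000 Hz */
    return 0;
}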

Different length of sound files with different sampling frequencies

I'm currently struggling to understand what is happening. I created a sound using the audiowrite function in Matlab (the sound is built from two different sounds, but I don't think that matters), first with a sampling frequency of 44100 Hz, and then a second file with the same sound but a sampling frequency of 48000 Hz. Now I'm observing that the sound produced at 44100 Hz is approx. 30 s longer than the other one (48000 Hz sampling). It looks like phase shifting of some sort, but I'm not sure. Any help/explanation is appreciated. I also made an amplitude/time plot for better understanding (I set the x axis to 350 s to see where the signal ends).
EDIT: here is the code for how I create the sound file:
[y1,F1] = audioread(cave_file);   % cave and forest files are MP3s loaded earlier; both have a sampling frequency of 48000 Hz
[y2,F2] = audioread(forest_file);
samp_freq = 44100;
% samp_freq = 48000;
a = max(size(y1), size(y2));
z = [[y1; zeros(abs([a(1),0] - size(y1)))], [y2; zeros(abs([a(1),0] - size(y2)))]];
audiowrite('test_sound.wav', z, samp_freq);
What is the storage format? More specifically, is the information about the sampling rate and the number of channels stored in the file's metadata and then used during playback?
If so, there are three possibilities for this behaviour:
1) The sampling-rate metadata of the 44.1 kHz file is incorrect, while the audio itself was sampled correctly. Because the 44.1 kHz file plays longer than the 48 kHz one (which I assume sounds right and plays for the correct duration), and playback duration is simply the number of samples divided by the sampling rate in the metadata, the rate stored in the 44.1 kHz file must be lower than the rate the samples were actually produced at. Could you please check the metadata, or attach the files here so that I can take a look?
2) The sampling did not happen at the correct rate, while the metadata says 44.1 kHz.
3) The number of channels is stored incorrectly.
If the files are raw PCM, then probably the correct sampling rate and/or number of channels is not being selected when playing the 44.1 kHz file.
Hope this helps
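For what it's worth, possibility 1 fits the numbers in the question quite well, because playback duration is simply the sample count divided by the rate written into the file header. A small sketch of the arithmetic in C, assuming the clips really were decoded at 48000 Hz (as the comment in the question's code says) and roughly 320 s of material (an assumption; the exact length is not given):

#include <stdio.h>

int main(void) {
    /* Assumption: ~320 s of material decoded at 48000 Hz. */
    double n_samples = 320.0 * 48000.0;

    double dur_48k_header = n_samples / 48000.0;   /* header says 48000 Hz: ~320.0 s */
    double dur_44k_header = n_samples / 44100.0;   /* header says 44100 Hz: ~348.3 s */

    printf("48 kHz header: %.1f s, 44.1 kHz header: %.1f s, difference: %.1f s\n",
           dur_48k_header, dur_44k_header, dur_44k_header - dur_48k_header);
    return 0;
}

If that is indeed the cause, writing with samp_freq = 48000, or resampling first (e.g. with Matlab's resample(z, 44100, 48000) from the Signal Processing Toolbox) before calling audiowrite with 44100, should give two files of the same length.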

OPUS packet size

I have an application that reads Opus packets from a file. The file contains Opus packets in Ogg format. My application sends one Opus packet every 20 milliseconds (this is configurable).
For each 20 ms interval it sends a packet of between 200 and 400 bytes, say 300 bytes on average.
Is sending 300 bytes per 20 ms reasonable, or is it too much data? How can I calculate how many bytes I should send to the remote party per 20 ms?
300 bytes/packet × 8 bits/byte / 20 ms/packet = 120 kbit/s
That is enough for good quality stereo music. Depending on the quality that you need, or if you are only sending mono or voice, you could potentially reduce the bitrate of the encoder. However if you are reading from an Ogg Opus file then the packets are already encoded, so it is too late to reduce the bitrate of the encoder unless you decode the packets and re-encode them at a lower bitrate.
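A small sketch in C of the arithmetic behind that figure, and of the reverse direction (from a target bitrate to an expected packet size); the 16 kbit/s target below is just an illustrative number for mono voice, not a recommendation from any spec:

#include <stdio.h>

int main(void) {
    double frame_ms = 20.0;

    /* From the observed packet size to a bitrate: 300 bytes every 20 ms. */
    double bytes_per_packet = 300.0;
    double bitrate_bps = bytes_per_packet * 8.0 * (1000.0 / frame_ms);
    printf("300-byte packets -> %.0f bit/s\n", bitrate_bps);                     /* 120000 */

    /* From a target bitrate to an average packet size: e.g. 16 kbit/s. */
    double target_bps = 16000.0;
    double target_bytes = target_bps * (frame_ms / 1000.0) / 8.0;
    printf("16 kbit/s target -> %.0f bytes per 20 ms packet\n", target_bytes);   /* 40 */
    return 0;
}

Note that individual Opus packets vary in size (the encoder is variable bitrate by default), so it is the average packet size that tracks the configured bitrate.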

what is the content of an .aac audio file?

I may sound like too much of a rookie, please excuse me. When I read an .aac audio file in Matlab using the audioread function, the output is a 256000x6 matrix. How do I know what the content of each column is?
filename = 'sample1.aac';
[y,Fs] = audioread(filename,'native');
Writing the first column using audiowrite as below, I can hear the whole sound. So what are the other columns?
audiowrite('sample2.wav', y(:,1), Fs);
Documentation:
https://uk.mathworks.com/help/matlab/ref/audioread.html
Output Arguments
y - Audio Data
Audio data in the file, returned as an m-by-n matrix, where m is the number of audio samples read and n is the number of audio channels in the file.
If you can hear the entire file in the first channel, it just means that most of that file's audio is contained in that channel. From Wikipedia, regarding AAC audio channels:
AAC supports inclusion of 48 full-bandwidth (up to 96 kHz) audio channels in one stream plus 16 low frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams
https://en.wikipedia.org/wiki/Advanced_Audio_Coding

ratio of bits/hz of digitized voice channel

We need to transmit 100 digitized voice channels using a passband channel of 30 kHz. What should be the ratio of bits/Hz if we use no guard band?
What I understand and how I get the bandwidth: 30 kHz / 100 = 300 Hz per channel.
And the ratio of bits/Hz with no guard band is 64000 / 300 = 213.333... bits/Hz (because a digitized voice channel has a data rate of 64 kbps).
Is that the right answer?
It depends on the digitization scheme: high-quality PCM - 144 kbps, PCM (DS0) - 64 kbps, CVSD - 32 kbps, compressed speech - 16 kbps, LPC - 2.4 kbps.
Assuming DS0, you have 100 channels at 64 kbps, i.e. 6400 kbps, to send over the 30 kHz channel. Thus 6400 kbps / 30 kHz ≈ 213 bits/Hz. Rather high packing, but it could be done over a noise-free channel.
Entropic multiplexing (not sending silent time) could reduce this by a factor of 5-10, to roughly 21-43 bits/Hz.
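A quick sketch in C of that calculation with the figures from the question (100 channels, 64 kbit/s DS0 voice, 30 kHz passband):

#include <stdio.h>

int main(void) {
    double channels = 100.0;
    double rate_bps = 64000.0;      /* DS0 digitized voice */
    double passband_hz = 30000.0;   /* 30 kHz passband, no guard bands */

    double total_bps = channels * rate_bps;               /* 6,400,000 bit/s */
    printf("%.1f bits/Hz\n", total_bps / passband_hz);    /* 213.3 bits/Hz */

    /* Per channel the ratio is the same: 64000 bit/s over 300 Hz = 213.3 bits/Hz. */
    return 0;
}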