What value to use for libopus encoder max_data_bytes field?

I am currently using libopus in order to encode some audio that I have.
When consulting the documentation for how to use the encoder, one of the arguments the encode function takes is max_data_bytes, an opus_int32 that has the following documentation:
Size of the allocated memory for the output payload. May be used to impose an upper limit on the instant bitrate, but should not be used as the only bitrate control
Unfortunately, I wasn't able to get much out of this definition as to how to choose the upper limit or how this argument relates to bitrate. I tried consulting some of the examples provided, such as this or this, but both have the argument defined as some constant without much explanation.
Could anyone help me understand the definition of this value, and what number I might be interested in using for it? Thank you!

It depends on the encoder version and encoding parameters.
In 1.1.4 the encoder doesn't merge packets, and the upper limit should be 1275 bytes. On the decoder side, if the repacketizer is used, you could encounter packets of up to 3*1275 bytes.
Things may have changed in recent versions; I'm fairly sure the repacketizer has been merged into the encoder in some form. Look into the RFC.
I'll just paste some of my notes from about 1½ years ago...
//Max Opus frame size is 1275 bytes, as per RFC 6716.
//If the frame is <= 20 ms, opus_encode always returns a one-frame packet.
//If CELT is used and the frame is 40 or 60 ms, a two- or three-frame packet is generated, since the max CELT frame size is 20 ms;
//in this very specific case, the max packet size is multiplied by 2 or 3 respectively
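To make that concrete, here is a rough sketch in C of how max_data_bytes is typically passed (my own example, assuming a 48 kHz stereo encoder and 20 ms frames; the 3*1275 figure is the worst case from the notes above):

#include <opus.h>
#include <stdio.h>

#define MAX_PACKET_SIZE (3 * 1275)   /* worst-case Opus packet, per the notes above */
#define FRAME_SIZE      960          /* 20 ms at 48 kHz */

int encode_frame(OpusEncoder *enc, const opus_int16 *pcm)
{
    unsigned char packet[MAX_PACKET_SIZE];

    /* max_data_bytes is simply the capacity of `packet`: the encoder will never
       write more than this, and normally writes far less, because the target
       bitrate (OPUS_SET_BITRATE) is what really controls packet size. */
    opus_int32 len = opus_encode(enc, pcm, FRAME_SIZE, packet, sizeof(packet));
    if (len < 0) {
        fprintf(stderr, "opus_encode failed: %s\n", opus_strerror(len));
        return -1;
    }
    /* ... send packet[0..len) over the wire ... */
    return 0;
}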

Related

Variable Length MIDI Duration Algorithm

I'm trying to compile MIDI files, and I've hit an issue with the duration values for track events. I know these values (according to this: http://www.ccarh.org/courses/253/handout/vlv/) are variable-length quantities, where each byte is made up of a continuation bit (0 if no duration byte follows, 1 if one does) and 7 bits of the number.
For example, 128 would be represented as such:
1_0000001 0_0000000
The problem is that I'm having trouble wrapping my head around this concept, and am struggling to come up with an algorithm that can convert a decimal number to this format. I would appreciate it if someone could help me with this. Thanks in advance.
There is no need to re-invent the wheel. The official MIDI specification has example code for dealing with variable length values. You can freely download the specs from the official MIDI.org website.
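If you do want to roll your own anyway, here is a small sketch in C of the encoding (my own version, not the spec's example listing): collect the 7-bit groups, then emit them most-significant first, with the continuation bit set on every byte except the last.

#include <stdint.h>
#include <stddef.h>

/* Write `value` as a MIDI variable-length quantity into `out`.
   Returns the number of bytes written (1-5 for a 32-bit value). */
size_t write_varlen(uint32_t value, uint8_t *out)
{
    uint8_t tmp[5];
    size_t n = 0;

    /* collect 7-bit groups, least-significant first */
    do {
        tmp[n++] = value & 0x7F;
        value >>= 7;
    } while (value != 0);

    /* emit them most-significant first, setting the continuation
       bit on every byte except the last */
    size_t written = 0;
    while (n > 0) {
        uint8_t b = tmp[--n];
        if (n > 0)
            b |= 0x80;
        out[written++] = b;
    }
    return written;
}

/* Example: write_varlen(128, buf) produces 0x81 0x00, matching the
   1_0000001 0_0000000 example above. */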

Audioworklets and pitch

I've recently started working with AudioWorklets and am trying to figure out how to determine the pitch(es) from the input. I found a simple algorithm to use with a ScriptProcessorNode, but the input values in an AudioWorklet are different and it doesn't work. Plus, each input array is only 128 samples long. So, how can I determine pitch using an AudioWorklet? As a bonus question, how do the values relate to the actual audio going in?
If it worked with a ScriptProcessorNode, it will work in an AudioWorklet, but you'll have to buffer the data in the worklet because, as you noted, you only get 128 frames per call. The ScriptProcessor gets anywhere from 256 to 16384.
The values going to the worklet are the actual values that are produced from the graph connected to the input. These are exactly the same values that would go to the script processor, except you get them in chunks of 128.

How to render in a specific bit depth?

How can OfflineAudioContext.startRendering() output an AudioBuffer that contains the bit depth of my choice (16 bits or 24 bits)? I know that I can set the sample rate of the output easily with AudioContext.sampleRate, but how do I set the bit depth?
My understanding of audio processing is pretty limited, so perhaps it's not as easy as I think it is.
Edit #1:
Actually, AudioContext.sampleRate is readonly, so if you have an idea on how to set the sample rate of the output, that would be great too.
Edit #2:
I guess the sample rate is inserted after the number of channels in the encoded WAV (in the DataView)
You can't do this directly, because WebAudio only works with floating-point values; you'll have to do the conversion yourself. Basically, take the output from the offline context, multiply every sample by 32768 (16-bit) or 8388608 (24-bit), and round to an integer. This assumes the output from the context lies within the range -1 to 1; if not, you'll have to do additional scaling. Finally, you might want to divide the result by 32768 (or 8388608) to get floating-point numbers back. That depends on what the final application is.
For Edit #1, the answer is that when you construct the OfflineAudioContext, you have to specify the sample rate. Set that to the rate you want. Not sure what AudioContext.sampleRate has to do with this.
For Edit #2, there's not enough information to answer, since you don't say what that DataView is.
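To illustrate the scaling step from the first paragraph, here is a minimal sketch (in C, but the arithmetic is the same in any language), assuming the samples are already in the -1 to 1 range; the clamping handles the corner case where a sample is exactly +1.0:

#include <stdint.h>
#include <math.h>

/* Convert one float sample (assumed in [-1, 1]) to a signed 16-bit value. */
static int16_t to_int16(float s)
{
    float scaled = roundf(s * 32768.0f);
    if (scaled > 32767.0f)  scaled = 32767.0f;    /* clamp: +1.0 would overflow */
    if (scaled < -32768.0f) scaled = -32768.0f;
    return (int16_t)scaled;
}

/* Same idea for 24-bit samples, stored in the low 3 bytes of an int32. */
static int32_t to_int24(float s)
{
    float scaled = roundf(s * 8388608.0f);
    if (scaled > 8388607.0f)  scaled = 8388607.0f;
    if (scaled < -8388608.0f) scaled = -8388608.0f;
    return (int32_t)scaled;
}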

Is anyone familiar with .BMT files and their structure?

This might be in the wrong place here, but I am trying to use a simple BMP image in thermography software called IRSoft. Is anyone maybe familiar with the .BMT file type?
I don't really want to reverse engineer too much but maybe someone else has an idea.
In the status line of IRSoft you can see the resolution of your camera; in my case it is 160x120 pixels. My BMT files always have a size of 230588 bytes, which means roughly 12 bytes per pixel...
It seems to me that the last 160*120*4 = 76800 bytes of the BMT file represent the thermal image:
4 bytes for every pixel. At file offset 153788 I can find the upper-left pixel, followed by the rest of the top line. At the last offset, 230584, I can find the lower-right pixel.
I don't know the meaning of the rest of the file. Perhaps the real image, reference temperatures...
Do you know how to calculate the temperature out of these values?
This table translates the 4-byte values approximately to temperatures in degrees Celsius:
I am afraid they may differ in other files.
0x41d00000 and more: 26.0°C and more
0x41c00000 and more: 24.0°C and more
0x41b00000 and more: 22.0°C and more
0x41a00000 and more: 20.0°C and more
0x41900000 and more: 18.0°C and more
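One thing worth noting: the hex values in that table are exactly the IEEE 754 single-precision encodings of the listed temperatures (0x41d00000 is 26.0f, 0x41900000 is 18.0f), so each 4-byte pixel value may simply be a 32-bit float holding the temperature in °C. A minimal sketch of that interpretation follows; the byte order of the values within the file is an assumption here.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Reinterpret a 4-byte pixel value as an IEEE 754 single-precision float.
   Assumes the bytes have already been assembled into a uint32_t in the
   byte order the file actually uses (an assumption, not verified). */
static float pixel_to_celsius(uint32_t raw)
{
    float t;
    memcpy(&t, &raw, sizeof t);   /* type-pun without violating aliasing rules */
    return t;
}

int main(void)
{
    /* The values from the table above decode exactly: */
    printf("%.1f\n", pixel_to_celsius(0x41d00000u));  /* 26.0 */
    printf("%.1f\n", pixel_to_celsius(0x41900000u));  /* 18.0 */
    return 0;
}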

Binary integral data compression

I need to transmit integer values over the network, but I don't want to transfer all 32 (or 64) bits every time - the data fits into a single byte 99% of the time - so it looks like I need to compress it somehow: for example, the first bit of a byte is 0 if the other 7 bits hold the whole value (0-127); otherwise (if the first bit is 1), shift those 7 bits left and read the next byte, repeating the process.
Is there some common way to do this? I don't want to reinvent the wheel...
Thank you.
The scheme you describe (which is essentially a base-128 encoding: each byte is a 7-bit base-128 "digit" and a single bit flag to indicate whether or not it is the final digit) is a common way of doing this.
For example, see:
the section on "LEB128" in the DWARF spec (§7.6);
"Base 128 Varints" in Google's protocol buffers;
"Variable Width Integers" in the LLVM bitcode format (various different widths are used in various different places there).
Just about any data compression algorithm would be able to compress that kind of data stream very well. Use whatever compression libraries your language provides.