How can I mix multiple stereo signals to one with WebAudio? - web-audio-api

I'm writing a web app which needs to combine a number of stereo sounds into one stereo output, so I want an equivalent of gstreamer's audiomixer element, but there doesn't seem to be one in WebAudio. ChannelMerger doesn't do quite the same thing - it combines multiple mono signals into one multi-channel signal.
The documentation for AudioNode.connect says that you can connect an output to multiple inputs of other nodes and that attempts to connect the same output to the same input more than once are ignored. But it doesn't say what will happen if you try to connect multiple different outputs to the same input. Would that act as a simple mixer like I want? I suspect not, because what splitting/merging functionality WebAudio does provide (see ChannelMerger above) seems to mostly be based on converting between multiple mono signals and one multi-channel signal with a one channel to one mono signal mapping.
I could take an arbitrary node (I guess a GainNode would work, and I could take advantage of its gain functionality) and set its channelInterpretation mode to "speakers" to actually mix channels, but that interpretation is only defined for 1, 2, 4 or 6 channels. I'm unlikely to need more than 6, but I will definitely need to be able to handle 3, and possibly 5. That could be done by using more than one mixer (e.g. for three channels, mix inputs 1 and 2 in one mixer, then mix its output with input 3 in a second mixer), but I think I would have to add more GainNodes to balance the mix correctly. A mixer presumably has to attenuate each input to prevent coincident peaks from clipping out of range, so with chained mixers and no compensation I'd end up with 1/4, 1/4, 1/2 instead of 1/3, 1/3, 1/3?

You almost got it right. Use a single GainNode and connect each source to the single input of the GainNode. This will sum up all of the different connections and produce a single output. If you know all of the individual sources are stereo, you don't need to change anything about the channelInterpretation, channelCountMode, or channelCount to get what you want.
You will probably have to adjust the gain value of the GainNode to reduce the output volume so that you don't overdrive the output device.
Other than that, this should all work.
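For what it's worth, a minimal sketch of that wiring in TypeScript (the buffer-based sources and the 1/N gain value are just placeholders for whatever stereo sources and headroom your app actually uses):

const ctx = new AudioContext();

// Illustrative stereo sources: one-second two-channel buffers of silence,
// standing in for whatever stereo nodes the app really connects.
const makeSource = (): AudioBufferSourceNode => {
  const buf = ctx.createBuffer(2, ctx.sampleRate, ctx.sampleRate); // 2 channels, 1 s
  const src = ctx.createBufferSource();
  src.buffer = buf;
  return src;
};
const sources = [makeSource(), makeSource(), makeSource()];

// One GainNode acts as the mix bus: every connection to its input is summed.
const mixBus = ctx.createGain();
mixBus.gain.value = 1 / sources.length; // leave headroom so summed peaks don't clip
sources.forEach((s) => s.connect(mixBus));
mixBus.connect(ctx.destination);        // one stereo output
sources.forEach((s) => s.start());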

Related

Audioworklets and pitch

I've recently started working with AudioWorklets and am trying to figure out how to determine the pitch(es) from the input. I found a simple algorithm to use with a ScriptProcessorNode, but the input values in the worklet seem to be different from the script processor's, and it doesn't work. Plus each input array is only 128 frames long. So, how can I determine pitch using an AudioWorklet? As a bonus question, how do the values relate to the actual audio going in?
If it worked with a ScriptProcessorNode, it will work in an AudioWorklet, but you'll have to buffer the data in the worklet because, as you noted, you only get 128 frames per call. The ScriptProcessor gets anywhere from 256 to 16384.
The values going to the worklet are the actual values that are produced from the graph connected to the input. These are exactly the same values that would go to the script processor, except you get them in chunks of 128.
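As a rough sketch of that buffering (the processor name, the 2048-sample window and the postMessage payload are all assumptions, and the pitch routine itself is left as a placeholder):

// Sketch of an AudioWorkletProcessor that accumulates 128-frame blocks until a
// full analysis window is available, then runs the (omitted) pitch routine.
class PitchProcessor extends AudioWorkletProcessor {
  private buffer = new Float32Array(2048); // hypothetical analysis window (16 blocks)
  private written = 0;

  process(inputs: Float32Array[][]): boolean {
    const channel = inputs[0]?.[0];
    if (channel) {
      this.buffer.set(channel, this.written);   // append this 128-frame block
      this.written += channel.length;
      if (this.written >= this.buffer.length) {
        // ...run the pitch-detection routine on this.buffer here...
        this.port.postMessage({ frames: this.written }); // report back to the main thread
        this.written = 0;
      }
    }
    return true; // keep the processor alive
  }
}
registerProcessor('pitch-processor', PitchProcessor);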

Calibration of a soundcard in MATLAB

I have designed a GUI in MATLAB to calibrate my sound card, and I am able to record my input signal. I would now like to calibrate that input.
How do I do that?
My GUI should be able to adapt to different sound cards and report dBV values, hence the calibration is required. Any help would be appreciated.
A: This is a metrology task rather than a programming one.
To get the job done, you need a fully controlled environment in which to re-run a defined-input/known-output experiment.
In principle, all of your devices and your whole setup have to be controlled, i.e.:
your MIC input (the acoustic/electric converter) - while its [dBa] -> [V] conversion is "readable" further down the cable path, it is not a principally important value per se,
your cable path, which should be neither neglected nor forgotten,
your sound card's A/D converter,
your pre-calibration AUDIO sound sample,
your pre-calibration TEST environment,
so that you can pre-calibrate your devices for measurements.
The calibration itself is achieved by playing the same AUDIO sound sample in the same TEST environment and measuring it with another device that has been certified by a locally recognised reference authority to have a certain level of precision (a guarantee that its readings will not fall outside a nationally/internationally recognised precision class's envelope around the correct/exact values).
Note: you may want to pre-calibrate your MIC + sound-card A/D setup inside your in-vitro controlled environment across a wide range of frequencies, so as to avoid frequency-dependent variation along the measurement-conversion path. Your pre-calibration then yields a sort of calibration curve that serves as an input for the further tests you perform in-vivo.
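A rough sketch of the arithmetic at a single frequency (the known reference level is an assumption, and the same formulas apply directly in MATLAB even though the snippet below is TypeScript):

// Sketch: derive a counts-to-volts scale from a recording of a certified
// reference (e.g. a 1 kHz tone of known RMS voltage), then report later
// recordings in dBV. Names and the reference level are illustrative.
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Step 1: scale factor from the recording of the certified reference source.
function voltsPerUnit(referenceSamples: Float32Array, referenceVoltsRms: number): number {
  return referenceVoltsRms / rms(referenceSamples);
}

// Step 2: any later recording can now be expressed in dBV.
function toDbv(samples: Float32Array, scale: number): number {
  return 20 * Math.log10(rms(samples) * scale);
}

Repeating step 1 at several frequencies gives you the calibration curve mentioned above.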

Mixing sound files of different size

I want to mix audio files of different sizes into one single .wav file without clipping any file, i.e. the resulting file should be as long as the largest of the input files.
There is a sample (Example 4 at http://www.modejong.com/iOS/#ex4) through which we can mix files of the same size.
I modified the code to get the mixed file as a .wav file.
But I am not able to understand how to modify this code for unequal-sized files.
If someone can help me out with a code snippet, I'll be really thankful.
It should be as easy as sending all the files to the mixer simultaneously. When any single file gets to the end, just treat it as if the remainder is filled with zeroes. When all files get to the end, you are done.
Note that the example code says it returns an error if there would be clipping (the sum of the waves is greater than the maximum representable value). This condition is more likely if you are mixing multiple inputs. The best way around it is to create some "headroom" in the input waves. You can either do this in preprocessing, by ensuring that each wave's volume is no more than some percentage of maximum (roughly 80-90%, depending on the number of inputs), or dynamically in the mixer code by multiplying each sample by some value < 1.0 as you add it to the mix.
If you are selecting the waves to mix at runtime and failure due to clipping is unacceptable, you will need to modify the sample code to pin the values at max/min instead of returning an error. Don't just let them overflow or you will get noisy artifacts.
Clipping creates artifacts as well, but when you haven't created enough headroom before mixing, it is definitely preferable to overflow: it is a more familiar-sounding type of distortion, similar to what you get when you overdrive your speakers. As the Wikipedia article on clipping puts it:
Clipping is preferable to the alternative in digital systems—wrapping—which occurs if the digital hardware is allowed to "overflow", ignoring the most significant bits of the magnitude, and sometimes even the sign of the sample value, resulting in gross distortion of the signal.
How I'd do it:
Much like the mix_buffers function that you linked to, but pass in two parameters for mixbufferNumSamples, one per buffer. Iterate over the whole of the longer of the two buffers. Once the index has gone beyond the end of the shorter buffer, simply use 0 as that buffer's sample for the rest of the loop.
If you must avoid clipping and do it in real-time and you know nothing else about the two sounds, you must provide enough headroom. The simplest method is by halving each of the samples before mixing:
mixed = s1/2 + s2/2;
This ensures that the resultant mixed sample won't overflow an int16_t. It will have the side effect of making everything quieter though.
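Putting the zero-padding and the halving together, a sketch might look like this (the function name and signature are my own, not from the linked example):

// Sketch: mix two 16-bit buffers of different lengths; the shorter one is
// treated as zero-padded once its samples run out, and halving each input
// guarantees the sum fits in an int16.
function mixUnequalBuffers(a: Int16Array, b: Int16Array): Int16Array {
  const longer = a.length >= b.length ? a : b;
  const shorter = a.length >= b.length ? b : a;
  const out = new Int16Array(longer.length);
  for (let i = 0; i < longer.length; i++) {
    const s1 = longer[i];
    const s2 = i < shorter.length ? shorter[i] : 0; // past the end: silence
    out[i] = (s1 >> 1) + (s2 >> 1);                 // headroom: cannot overflow int16
  }
  return out;
}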
If you can run it offline, you can calculate a scale factor to apply to both waveforms which will keep the peaks when summed below the maximum allowed value.
Or you could mix them all at full volume to an int32_t buffer, keeping track of the largest (magnitude) mixed sample and then go back through the buffer multiplying each sample by a scale factor which will make that extreme sample just reach the +32767/-32768 limits.
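A sketch of that offline, two-pass approach (again only an illustration; the names and buffer types are assumptions):

// Sketch: mix any number of 16-bit tracks at full volume into a 32-bit
// accumulator, then scale the whole mix so the largest peak just fits.
function mixAndNormalize(tracks: Int16Array[]): Int16Array {
  const length = Math.max(...tracks.map((t) => t.length));
  const acc = new Int32Array(length);
  for (const track of tracks) {
    for (let i = 0; i < track.length; i++) acc[i] += track[i]; // shorter tracks just stop contributing
  }
  let peak = 0;
  for (const v of acc) peak = Math.max(peak, Math.abs(v));     // largest magnitude in the raw mix
  const scale = peak > 32767 ? 32767 / peak : 1;               // only attenuate if we would clip
  const out = new Int16Array(length);
  for (let i = 0; i < length; i++) out[i] = Math.round(acc[i] * scale);
  return out;
}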

Compare sounds inside of the App

Is it possible to compare two sounds?
For example, the app already has a sound file (MP3 or any other format); is it possible to compare that static sound file with a sound recorded inside the app?
Any comments are welcome.
Regards
This forum thread has a good answer (about three down) - http://www.dsprelated.com/showmessage/103820/1.php.
The trick is to get the decoded audio from the mp3 - if they're just short 'hello' sounds, I'd store them inside the app as a wav instead of decoding them (though I've never used CoreAudio or any of the other frameworks before so mp3 decoding into memory might be easy).
When you've got your reference wav and your recorded wav, follow the steps in the post above:
1. Do whatever is necessary to convert the .wav files to their discrete-time signals: http://www.sonicspot.com/guide/wavefiles.html
2. Time-warping might or might not be necessary, depending on the difference between the two sample rates: http://en.wikipedia.org/wiki/Dynamic_time_warping
3. After time warping, truncate both signals so that their durations are equivalent.
4. Compute the normalized energy spectral density (ESD) from the DFTs of the two signals: http://en.wikipedia.org/wiki/Power_spectrum
5. Compute the mean-square error (MSE) between the normalized ESDs of the two signals: http://en.wikipedia.org/wiki/Mean_squared_error
The MSE between the normalized ESDs of two signals is a good metric of closeness. If you have, say, 10 .wav files, and 2 of them are nearly the same but the others are not, the two that are close should have a relatively low MSE. Two perfectly identical signals will obviously have an MSE of zero. Ideally, two "equivalent" signals with different time scales (a 20-second human talking versus a 5-second chipmunk), different energies (a soft-spoken human versus a yelling chipmunk), and different phases (sampling that began at slightly different instants against the continuous-time input) should still have an MSE of zero, but quantization errors inherent in DSP will yield an MSE slightly greater than zero.
http://en.wikipedia.org/wiki/Minimum_mean-square_error
You should get two different MSE values: one between the male track and your recorded track, and one between the female track and your recorded track. The comparison with the lower MSE probably indicates the correct gender.
I confess that I've never tried to do this and it looks very hard - good luck!
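For what it's worth, a very rough sketch of steps 4 and 5 (a naive O(N^2) DFT for clarity; real code would use an FFT library, and the equal lengths from step 3 are assumed to already hold):

// Sketch: compare two equal-length signals via the MSE of their normalized
// energy spectral densities.
function esd(signal: Float32Array): Float64Array {
  const n = signal.length;
  const out = new Float64Array(n);
  let total = 0;
  for (let k = 0; k < n; k++) {
    let re = 0, im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += signal[t] * Math.cos(angle);
      im += signal[t] * Math.sin(angle);
    }
    out[k] = re * re + im * im;                  // energy at bin k
    total += out[k];
  }
  for (let k = 0; k < n; k++) out[k] /= total;   // normalize so the ESD sums to 1
  return out;
}

function esdMse(a: Float32Array, b: Float32Array): number {
  const ea = esd(a), eb = esd(b);                // assumes a.length === b.length (step 3)
  let sum = 0;
  for (let k = 0; k < ea.length; k++) sum += (ea[k] - eb[k]) ** 2;
  return sum / ea.length;
}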

iPhone audio and AFSK

Here is a question for all you iPhone experts:
If you remember the sounds that modems used to make, or the sound of a program loading from a cassette tape, that is what I am trying to replicate on an iPhone for a ham radio application. I have a stream of data (ASCII) and I need to encode it as AFSK at 1200 baud. So basically everything in the stream is converted to a series of 1200 and 2200 Hz tones. It needs to sound something like this: http://upload.wikimedia.org/wikipedia/commons/2/27/AFSK_1200_baud.ogg
I successfully built a bit array out of the string, but when I try to assign tones to each bit I get gaps in the sound, therefore it doesn’t demodulate correctly.
Any thought of how one should tackle this problem? Thank you.
The mobilesynth project is open-source. You might be able to scan that for code that generates the tones you need.
How are you assigning tones to the bits? Remember, a digital audio signal is just a stream of samples with values between -1 and 1. Perhaps there is a clipping issue between tone assignments. This can happen if the signal dives below -1 or above 1. If it stays above or below this range at a constant value, there will be no sound. Maybe you could output your stream of samples to check if this is the case. Or plug the output into an oscilloscope...
Also note that clicking can occur between "uneven" transitions of signals. For example, if I output a sample with value 1 followed immediately by a sample with value -1, a click or pop will be produced.
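One common way to avoid both the gaps and the clicks is to keep the oscillator phase continuous across bit boundaries instead of restarting each tone at phase zero. A sketch (the 44.1 kHz sample rate and the mark/space mapping are assumptions; the 1200/2200 Hz tones and 1200 baud come from the question):

// Sketch: generate phase-continuous AFSK samples for an array of bits.
// A running phase means each bit's tone starts exactly where the previous one
// left off, so there are no discontinuities (clicks) or gaps between bits.
const SAMPLE_RATE = 44100;   // assumed output sample rate
const BAUD = 1200;
const MARK_HZ = 1200;        // tone used here for a 1 bit
const SPACE_HZ = 2200;       // tone used here for a 0 bit

function afskSamples(bits: number[]): Float32Array {
  const samplesPerBit = SAMPLE_RATE / BAUD;             // not an integer at 44.1 kHz
  const totalSamples = Math.round(bits.length * samplesPerBit);
  const out = new Float32Array(totalSamples);
  let phase = 0;
  let sample = 0;
  for (let b = 0; b < bits.length; b++) {
    const freq = bits[b] ? MARK_HZ : SPACE_HZ;
    const phaseStep = (2 * Math.PI * freq) / SAMPLE_RATE;
    const bitEnd = Math.round((b + 1) * samplesPerBit);  // this bit's boundary in samples
    for (; sample < bitEnd; sample++) {
      out[sample] = Math.sin(phase);                     // continuous phase across bits
      phase += phaseStep;
      if (phase > 2 * Math.PI) phase -= 2 * Math.PI;
    }
  }
  return out;
}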