Creating an AudioWorkletNode with 4 channels? - web-audio-api

I am working on a mod player; a mod is an audio file with 4 different tracks (channels), and I'm playing it using WebAudio/AudioWorkletNode.
I got it working correctly using a 2-channel (stereo) audio node:
channels (tracks) 0 & 3 are mixed into the left channel
channels (tracks) 1 & 2 are mixed into the right channel
The problem is that I'd like to analyse and show a waveform display for each of the tracks (so there should be 4 different analysers).
I had the idea of creating an AudioWorkletNode with outputChannelCount set to [4], connecting an analyser to each of the node's four channels, and then using a ChannelMerger to mix it down to 2 stereo channels.
So I used the following code, expecting it to create a node with 4 channels:
let node = new AudioWorkletNode(context, 'processor', { outputChannelCount: [4] });
But the outputChannelCount parameter seems to be ignored: no matter what I specify, the node ends up with 2 channels.
Is there another way to do it, or must I handle the analysis myself, using my own analyser?

I finally found a way to mix all four channels and pass each channel to its own analyser by doing this:
this.context.audioWorklet.addModule(`js/${soundProcessor}`).then(() => {
    this.splitter = this.context.createChannelSplitter(4);
    // Use 4 mono outputs so that each track's data can be sent to a separate analyser
    // NOTE: what should we do if we support more channels (and different mod formats)?
    this.workletNode = new AudioWorkletNode(this.context, 'mod-processor', {
        outputChannelCount: [1, 1, 1, 1],
        numberOfInputs: 0,
        numberOfOutputs: 4
    });
    this.workletNode.port.onmessage = this.handleMessage.bind(this);
    this.postMessage({
        message: 'init',
        mixingRate: this.mixingRate
    });
    this.workletNode.port.start();
    // create four analysers and connect each of the worklet's outputs to one of them
    this.analysers = new Array();
    for (let i = 0; i < 4; ++i) {
        const analyser = this.context.createAnalyser();
        analyser.fftSize = 256; // Math.pow(2, 11);
        analyser.minDecibels = -90;
        analyser.maxDecibels = -10;
        analyser.smoothingTimeConstant = 0.65;
        this.workletNode.connect(analyser, i, 0);
        this.analysers.push(analyser);
    }
    this.merger = this.context.createChannelMerger(4);
    // merge outputs 0+3 into the left channel, 1+2 into the right channel
    this.workletNode.connect(this.merger, 0, 0);
    this.workletNode.connect(this.merger, 1, 1);
    this.workletNode.connect(this.merger, 2, 1);
    this.workletNode.connect(this.merger, 3, 0);
    this.merger.connect(this.context.destination);
});
I basically create a new node with 4 outputs and use each output as a track channel. To produce a stereo output I then run the outputs through a channel merger. And voilà!
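For reference, the processor side then only has to fill each of its four mono outputs with the corresponding track's samples. This is just a sketch of the idea, not the actual mod-processor; renderTrack is a placeholder for the real mixing code:
class ModProcessor extends AudioWorkletProcessor {
    process(inputs, outputs, parameters) {
        // one output per track, each configured with a single (mono) channel
        for (let track = 0; track < outputs.length; track++) {
            this.renderTrack(track, outputs[track][0]); // fill the 128-sample block for this track
        }
        return true; // keep the processor alive
    }
    renderTrack(track, buffer) {
        // placeholder: write this track's samples into buffer
    }
}
registerProcessor('mod-processor', ModProcessor);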
Complete source code of the app can be found here: https://warpdesign.github.io/modplayer-js/

Related

I have trouble getting depth information from the DEPTH16 format with the Camera2 API using ToF on a P30 Pro

I am currently testing options for depth measurement with a smartphone and wanted to create a depth image initially for testing. I am using the Camera2Basic example as a basis for this (https://github.com/android/camera-samples/tree/main/Camera2Basic). Using DEPTH16 I get a relatively sharp "depth image" back, but the millimetres are not correct: they are in a range of roughly 3600mm to 5000mm for an object like a wall that is only about 500mm or 800mm away from the camera.
But what puzzles me the most is that the image does not contain any information in the dark. If Android is really targeting the ToF sensor for DEPTH16, it shouldn't be affected by darkness, should it? Or do I have to use ARCore or Huawei's HMS Core to get a real ToF image?
I am using a Huawei P30 Pro, and the code for extracting the depth information looks like this. And yes, performance-wise it is bullshit, but it is only for testing purposes :)
private Map<String, PixelData> parseDepth16IntoDistanceMap(Image image) {
    Map<String, PixelData> map = new HashMap();
    Image.Plane plane = image.getPlanes()[0];
    // using asShortBuffer() like in the documentation leads to a wrong format (for me) but does not help getting better values
    ByteBuffer depthBuffer = plane.getBuffer().order(ByteOrder.nativeOrder());
    int stride = plane.getRowStride();
    int offset = 0;
    int i = 0;
    for (short y = 0; y < image.getHeight(); y++) {
        for (short x = 0; x < image.getWidth(); x++) {
            short depthSample = depthBuffer.getShort((y / 2) * stride + x);
            short depthSampleShort = (short) depthSample;
            short depthRange = (short) (depthSampleShort & 0x1FFF);
            short depthConfidence = (short) ((depthSampleShort >> 13) & 0x7);
            float depthPercentage = depthConfidence == 0 ? 1.f : (depthConfidence - 1) / 7.f;
            maxz = depthRange;
            sum = sum + depthRange;
            numPoints++;
            listOfRanges.add((float) depthRange);
            if (depthRange < minz && depthRange > 0) {
                minz = depthRange;
            }
            map.put(x + "_" + y, new PixelData(x, y, depthRange, depthPercentage));
            i++;
        }
    }
    return map;
}
In any case, it would help a lot to know whether you can get the data this way at all, so I know if I'm already doing something fundamentally wrong. Otherwise I will switch to one of the AR systems. Either way, many thanks for your efforts.
If you want to extract a depth map where you can see the distance to an object, you might use the ARCore Depth API.
https://developers.google.com/ar/develop/java/depth/overview
Or you can follow the codelab, which shows you how to get the data in millimeters.
https://codelabs.developers.google.com/codelabs/arcore-depth#0
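For reference, the core helper from that codelab boils down to something like this (a sketch; depthImage here is the android.media.Image acquired from an ARCore Frame's depth image, and the returned value is interpreted as millimeters, following the codelab):
/** Obtain the depth in millimeters at pixel (x, y) of an ARCore depth image. */
public int getMillimetersDepth(Image depthImage, int x, int y) {
    // The depth image has a single plane that stores depth for each pixel as a 16-bit unsigned integer.
    Image.Plane plane = depthImage.getPlanes()[0];
    int byteIndex = x * plane.getPixelStride() + y * plane.getRowStride();
    ByteBuffer buffer = plane.getBuffer().order(ByteOrder.nativeOrder());
    return Short.toUnsignedInt(buffer.getShort(byteIndex));
}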

Transmitting data from Arduino to Raspberry Pi (and storing it in variables) through serial communication

Currently, I'm using an Arduino to read a joystick position, which outputs 3 values: a switch button output (1 or 0), an x coord (0 - 1023), and a y coord (0 - 1023). I use Serial.print to print the values to the serial monitor and, using grabserial on the Raspberry Pi, I get the serial data to the Pi. However, I'm using ser.readline().decode('utf-8')[:-2] and I can't seem to be able to assign the data to a variable. I'm trying to store the 3 most recent data values (switch, x coord, y coord) in 3 separate variables so that I can say: if 'switch' is less than [something] and greater than [something], then play some command. How do I store the 3 most recent data values in 3 variables?
I have tried to use something like switch = ser.readline() followed by if switch == 1: print("switch is not pressed"), which should print, but it says 'switch' is not equal to 1, so the data is not correctly assigned to the variable.
#Arduino
// Arduino pin numbers
const int SW_pin = 2; // connected to digital pin 2
const int X_pin = 0; // connected to analog pin 0
const int Y_pin = 1; // connected to analog pin 1
void setup() {
    pinMode(SW_pin, INPUT);
    digitalWrite(SW_pin, HIGH);
    Serial.begin(9600);
}
void loop() {
    Serial.println(digitalRead(SW_pin));
    Serial.println(analogRead(X_pin));
    Serial.println(analogRead(Y_pin));
    delay(500);
}
# Raspberry Pi
import serial
ser = serial.Serial("/dev/ttyACM0", 9600, timeout=0.5)
while True:
    Switch = ser.readline().decode('utf-8')[:-2]
    if Switch == 1:
        print("Switch is not pressed")
I expected that it would print "switch is not pressed" every 3 values, but it just prints "1". Right now I'm trying to make one reading work, which it does not, but I need all three of them working at the same time.
You are comparing a string and an int. Try using
Switch == "1"
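To get all three values into their own variables, one approach (an untested sketch) is to read three lines per cycle and convert each one to an int before comparing. Note that having the Arduino print all three values on one line and splitting it on the Pi would be more robust against the readings getting out of sync:
import serial

ser = serial.Serial("/dev/ttyACM0", 9600, timeout=0.5)

def read_int():
    # read one line, strip the newline, and convert to int (None if the read timed out)
    line = ser.readline().decode('utf-8').strip()
    return int(line) if line else None

while True:
    switch = read_int()   # 1 or 0
    x_coord = read_int()  # 0 - 1023
    y_coord = read_int()  # 0 - 1023
    if switch == 1:
        print("Switch is not pressed")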

Torch: back-propagation from loss computed over a subset of the output

I have a simple convolutional neural network whose output is a single-channel 4x4 feature map. During training, the (regression) loss needs to be computed only on a single value among the 16 outputs. The location of this value is decided after the forward pass. How do I compute the loss from just this one output, while making sure all irrelevant gradients are zeroed out during back-prop?
Let's say I have the following simple model in torch:
require 'nn'
-- the input
local batch_sz = 2
local x = torch.Tensor(batch_sz, 3, 100, 100):uniform(-1,1)
-- the model
local net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 128, 9, 9, 9, 9, 1, 1))
net:add(nn.SpatialConvolution(128, 1, 3, 3, 3, 3, 1, 1))
net:add(nn.Squeeze(1, 3))
print(net)
-- the loss (don't know how to employ it yet)
local loss = nn.SmoothL1Criterion()
-- forward'ing x through the network would result in a 2x4x4 output
y = net:forward(x)
print(y)
I have looked at nn.SelectTable, and it seems that if I convert the output into tabular form I would be able to implement what I want.
This is my current solution. It works by splitting the output into a table, and then using nn.SelectTable():backward() to get the full gradient:
require 'nn'
-- the input
local batch_sz = 2
local x = torch.Tensor(batch_sz, 3, 100, 100):uniform(-1,1)
-- the model
local net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 128, 9, 9, 9, 9, 1, 1))
net:add(nn.SpatialConvolution(128, 1, 3, 3, 3, 3, 1, 1))
net:add(nn.Squeeze(1, 3))
-- convert output into a table format
net:add(nn.View(1, -1)) -- vectorize
net:add(nn.SplitTable(1, 1)) -- split all outputs into table elements
print(net)
-- the loss
local loss = nn.SmoothL1Criterion()
-- forward'ing x through the network would result in a (2)x4x4 output
y = net:forward(x)
print(y)
-- returns the output table's index belonging to specific location
function get_sample_idx(feat_h, feat_w, smpl_idx, feat_r, feat_c)
local idx = (smpl_idx - 1) * feat_h * feat_w
return idx + feat_c + ((feat_r - 1) * feat_w)
end
-- I want to back-propagate the loss of this sample at this feature location
local smpl_idx = 2
local feat_r = 3
local feat_c = 4
-- get the actual index location in the output table (for a 4x4 output feature map)
local out_idx = get_sample_idx(4, 4, smpl_idx, feat_r, feat_c)
-- the (fake) ground-truth
local gt = torch.rand(1)
-- compute loss on the selected feature map location for the selected sample
local err = loss:forward(y[out_idx], gt)
-- compute loss gradient, as if there was only this one location
local dE_dy = loss:backward(y[out_idx], gt)
-- now convert into full loss gradient (zero'ing out irrelevant losses)
local full_dE_dy = nn.SelectTable(out_idx):backward(y, dE_dy)
-- do back-prop through the whole network
net:backward(x, full_dE_dy)
print("The full dE/dy")
print(table.unpack(full_dE_dy))
I would really appreciate it if somebody could point out a simpler or more efficient method.
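For what it's worth, a possibly simpler alternative (an untested sketch, using the original network without the View/SplitTable layers at the end) would be to back-propagate a dense gradient that is zero everywhere except at the selected location:
local y = net:forward(x)                            -- 2x4x4 output
local y_sel = y[{{smpl_idx}, {feat_r}, {feat_c}}]   -- 1x1x1 view of the selected output
local gt_sel = torch.rand(1, 1, 1)                  -- (fake) ground truth for that cell
local err = loss:forward(y_sel, gt_sel)
local dE_dy_sel = loss:backward(y_sel, gt_sel)      -- gradient w.r.t. the single value
local full_dE_dy = y:clone():zero()                 -- zero gradient for every other output
full_dE_dy[{{smpl_idx}, {feat_r}, {feat_c}}]:copy(dE_dy_sel)
net:backward(x, full_dE_dy)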

Merging geometries using a WebWorker?

Anyone know if it's possible to merge a set of cube geometries in a web worker and pass it back to the main thread? Was thinking this could reduce the lag when merging large amounts of cubes.
Does Three.JS work okay in a web worker, and if it does, would it be possible (and faster) to do this? Not sure if passing the geometry back would take just as long as merging it normally.
At the moment I'm using a timed for loop to reduce the lag:
// This array is populated by the server and contains the chunk position and data (which I do nothing with yet).
var sectionData = data.secData;
var section = 0;
var tick = function() {
    var start = new Date().getTime();
    for (; section < sectionData.length && (new Date().getTime()) - start < 1; section++) {
        var sectionXPos = sectionData[section][0] * 10;
        var sectionZPos = sectionData[section][1] * 10;
        var combinedGeometry = new THREE.Geometry();
        for (var layer = 0; layer < 1; layer++) { // Only 1 layer because of the lag...
            for (var x = 0; x < 10; x++) {
                for (var z = 0; z < 10; z++) {
                    blockMesh.position.set(x - 4.5, layer - .5, z - 4.5);
                    blockMesh.updateMatrix();
                    THREE.GeometryUtils.merge(combinedGeometry, blockMesh);
                }
            }
        }
        var sectionMesh = new THREE.Mesh(combinedGeometry, grassBlockMat);
        sectionMesh.position.set(sectionXPos, 0, sectionZPos);
        sectionMesh.matrixAutoUpdate = false;
        sectionMesh.updateMatrix();
        scene.add(sectionMesh);
    }
    if (section < sectionData.length) {
        setTimeout(tick, 25);
    }
};
setTimeout(tick, 25);
Using Three.JS rev59-dev.
Merged cubes make up the terrain in chunks, and at the moment (due to the lag) each chunk only has 1 layer.
Any tips would be appreciated! Thanks.
THREE.JS will not work in a web worker; however, you can copy the parts of the library that you need into the worker so that they work both in the main thread and in your web worker.
Your first problem will be that you cannot send the geometry object itself back to the main thread.
Since web worker message passing works only by sending copies of plain data (not JavaScript objects with their methods) or references to ArrayBuffers, you would have to decode the geometry down to its individual floats, pack them in an ArrayBuffer, and send a reference back to the main thread.
Note that these are called transferable objects, and once sent, they are no longer available in the web worker / main thread they came from.
See here for more details:
http://www.html5rocks.com/en/tutorials/workers/basics/
https://developer.mozilla.org/en-US/docs/Web/Guide/Performance/Using_web_workers
Here is an example of packing position vertices into an array for a physics-type system:
//length * 3 axes * 4 bytes per vertex
var posBuffer = new Float32Array(new ArrayBuffer(len * 3 * 4));
//in a loop
//... do hard work
posBuffer[i * 3] = pos.x; //pos is a threejs vector
posBuffer[i * 3 + 1] = pos.y;
posBuffer[i * 3 + 2] = pos.z;
//after loop send buffer to main thread
self.postMessage({posBuffer:posBuffer}, [posBuffer.buffer]);
I copied the THREE.JS vector class into my web worker and cut out all the methods I didn't need to keep it nice and lean.
FYI it's not slow, and for something like n-body collisions it works well.
The main thread sends a command to the web worker telling it to run the update and then listens for the response, kind of like a producer-consumer model in regular threading.
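For illustration, the main-thread side of that pattern might look roughly like this (a sketch; the worker file name and the meshes array are made up):
var worker = new Worker('physics-worker.js');
worker.onmessage = function(e) {
    var positions = e.data.posBuffer; // the Float32Array transferred back from the worker
    for (var i = 0; i < positions.length / 3; i++) {
        meshes[i].position.set(positions[i * 3], positions[i * 3 + 1], positions[i * 3 + 2]);
    }
    worker.postMessage({ cmd: 'update' }); // ask for the next update (producer/consumer style)
};
worker.postMessage({ cmd: 'update' }); // kick off the first update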

How to use an Audio Unit on the iPhone

I'm looking for a way to change the pitch of recorded audio as it is saved to disk, or played back (in real time). I understand Audio Units can be used for this. The iPhone offers limited support for Audio Units (for example it's not possible to create/use custom audio units, as far as I can tell), but several out-of-the-box audio units are available, one of which is AUPitch.
How exactly would I use an audio unit (specifically AUPitch)? Do you hook it into an audio queue somehow? Is it possible to chain audio units together (for example, to simultaneously add an echo effect and a change in pitch)?
EDIT: After inspecting the iPhone SDK headers (I think AudioUnit.h, I'm not in front of a Mac at the moment), I noticed that AUPitch is commented out. So it doesn't look like AUPitch is available on the iPhone after all. weep weep
Apple seems to have better organized their iPhone SDK documentation at developer.apple.com of late; now it's more difficult to find references to AUPitch, etc.
That said, I'm still interested in quality answers on using Audio Units (in general) on the iPhone.
There are some very good resources here (http://michael.tyson.id.au/2008/11/04/using-remoteio-audio-unit/) for using the RemoteIO Audio Unit. In my experience working with Audio Units on the iPhone, I've found that I can implement a transformation manually in the callback function. In doing so, you might find that solves your problem.
Regarding changing pitch on the iPhone, OpenAL is the way to go. Check out the SoundManager class available from www.71squared.com for a great example of an OpenAL sound engine that supports pitch.
- (void)modifySpeedOf:(CFURLRef)inputURL byFactor:(float)factor andWriteTo:(CFURLRef)outputURL {
    ExtAudioFileRef inputFile = NULL;
    ExtAudioFileRef outputFile = NULL;
    AudioStreamBasicDescription destFormat;
    destFormat.mFormatID = kAudioFormatLinearPCM;
    destFormat.mFormatFlags = kAudioFormatFlagsCanonical;
    destFormat.mSampleRate = 44100 * factor;
    destFormat.mBytesPerPacket = 2;
    destFormat.mFramesPerPacket = 1;
    destFormat.mBytesPerFrame = 2;
    destFormat.mChannelsPerFrame = 1;
    destFormat.mBitsPerChannel = 16;
    destFormat.mReserved = 0;
    ExtAudioFileCreateWithURL(outputURL, kAudioFileCAFType,
                              &destFormat, NULL, kAudioFileFlags_EraseFile, &outputFile);
    ExtAudioFileOpenURL(inputURL, &inputFile);
    // find out how many frames long this file is
    SInt64 length = 0;
    UInt32 dataSize2 = (UInt32)sizeof(length);
    ExtAudioFileGetProperty(inputFile,
                            kExtAudioFileProperty_FileLengthFrames, &dataSize2, &length);
    SInt16 *buffer = (SInt16 *)malloc(kBufferSize * sizeof(SInt16));
    UInt32 totalFramecount = 0;
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mData = buffer; // pointer to buffer of audio data
    bufferList.mBuffers[0].mDataByteSize = kBufferSize * sizeof(SInt16); // number of bytes in the buffer
    while (true) {
        UInt32 frameCount = kBufferSize * sizeof(SInt16) / 2;
        // Read a chunk of input
        ExtAudioFileRead(inputFile, &frameCount, &bufferList);
        totalFramecount += frameCount;
        if (!frameCount || totalFramecount >= length) {
            // termination condition
            break;
        }
        ExtAudioFileWrite(outputFile, frameCount, &bufferList);
    }
    free(buffer);
    ExtAudioFileDispose(inputFile);
    ExtAudioFileDispose(outputFile);
}
It will change the pitch based on the factor (since it resamples, the playback speed changes along with it).
I've used the NewTimePitch audio unit for this before; the AudioComponentDescription for it is:
var newTimePitchDesc = AudioComponentDescription(componentType: kAudioUnitType_FormatConverter,
                                                 componentSubType: kAudioUnitSubType_NewTimePitch,
                                                 componentManufacturer: kAudioUnitManufacturer_Apple,
                                                 componentFlags: 0,
                                                 componentFlagsMask: 0)
Then you can change the pitch parameter with an AudioUnitSetParameter call. For example, this changes the pitch by -1000 cents:
err = AudioUnitSetParameter(newTimePitchAudioUnit,
                            kNewTimePitchParam_Pitch,
                            kAudioUnitScope_Global,
                            0,
                            -1000,
                            0)
The parameters for this audio unit are as follows
// Parameters for AUNewTimePitch
enum {
    // Global, rate, 1/32 -> 32.0, 1.0
    kNewTimePitchParam_Rate = 0,
    // Global, Cents, -2400 -> 2400, 1.0
    kNewTimePitchParam_Pitch = 1,
    // Global, generic, 3.0 -> 32.0, 8.0
    kNewTimePitchParam_Overlap = 4,
    // Global, Boolean, 0->1, 1
    kNewTimePitchParam_EnablePeakLocking = 6
};
but you'll only need to change the pitch parameter for your purposes. For a guide on how to implement this, refer to Justin's answer.
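For completeness, here is a minimal sketch (not from the original answer) of how that description might be turned into an actual audio unit instance before calling AudioUnitSetParameter:
import AudioToolbox

// Sketch: look up the component matching newTimePitchDesc and create an instance of it.
var newTimePitchAudioUnit: AudioUnit?
if let component = AudioComponentFindNext(nil, &newTimePitchDesc) {
    var err = AudioComponentInstanceNew(component, &newTimePitchAudioUnit)
    if err == noErr, let unit = newTimePitchAudioUnit {
        err = AudioUnitInitialize(unit) // initialize before setting parameters / rendering
    }
}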