Interpreting inputBuffer's Value in a Callback - callback

I am basing my code off of Portaudio's paex_record_file.c example. One of the parameters in the callback is inputBuffer, and I wanted to use its data to calculate other numbers with the double/float type. I changed the file from a .raw to a .txt, but notepad still cannot read it, leading me to believe its data is not actually encoded as a number. How is the data stored in inputBuffer and how can I do arithmetic with it (add, multiply, divide, etc)?
This is how I initialized inputParameters:
inputParameters.device = Pa_GetDefaultInputDevice(); /* default input device */
if (inputParameters.device == paNoDevice) {
fprintf(stderr,"Error: No default input device.\n");
goto error;
}
inputParameters.channelCount = 2; /* stereo input */
inputParameters.sampleFormat = paFloat32;
inputParameters.suggestedLatency = Pa_GetDeviceInfo( inputParameters.device )->defaultLowInputLatency;
inputParameters.hostApiSpecificStreamInfo = NULL;
This question is somewhat related to print floats from audio input callback function (unanswered).

The inputBuffer parameter to the callback is a void*. The actual type of the underlying buffer depends on the parameters and the flags that you pass to Pa_OpenStream.
If you specified paFloat32 then there will be a float* in there somewhere. However the are two possibilities:
Interleaved: inputParameters.sampleFormat = paFloat32;
Non-Interleaved: inputParameters.sampleFormat = paFloat32|paNonInterleaved;
You specified the interleaved option. In this case, inputBuffer points to a single buffer of interleaved floats. So you can write:
float *samples = (float*)inputBuffer;
In a two channel stream samples will contain interleaved left and right samples, e.g.:
samples[0]; // first left sample
samples[1]; // first right sample
samples[2]; // second left sample
samples[3]; // second right sample
// etc.
For completeness: If it had been a non-interleaved stream then inputBuffer points to an array of pointers to single-channel buffers. To extract the buffer pointers you would write something like:
float *left = ((float **) inputBuffer)[0];
float *right = ((float **) inputBuffer)[1];
Note that in all cases framesPerBuffer counts frames not samples. A frame includes one sample from each channel. For example, in a stereo stream, a frame includes both the left and right channel samples.

Related

In Unity, how to segment the user's voice from microphone based on loudness?

I need to collect voice pieces from a continuous audio stream. I need to process later the user's voice piece that has just been said (not for speech recognition). What I am focusing on is only the voice's segmentation based on its loudness.
If after at least 1 second of silence, his voice becomes loud enough for a while, and then silent again for at least 1 second, I say this is a sentence and the voice should be segmented here.
I just know I can get raw audio data from the AudioClip created by Microphone.Start(). I want to write some code like this:
void Start()
{
audio = Microphone.Start(deviceName, true, 10, 16000);
}
void Update()
{
audio.GetData(fdata, 0);
for(int i = 0; i < fdata.Length; i++) {
u16data[i] = Convert.ToUInt16(fdata[i] * 65535);
}
// ... Process u16data
}
But what I'm not sure is:
Every frame when I call audio.GetData(fdata, 0), what I get is the latest 10 seconds of sound data if fdata is big enough or shorter than 10 seconds if fdata is not big enough, is it right?
fdata is a float array, and what I need is a 16 kHz, 16 bit PCM buffer. Is it right to convert the data like: u16data[i] = fdata[i] * 65535?
What is the right way to detect loud moments and silent moments in fdata?
No. you have to read starting at the current position within the AudioClip using Microphone.GetPosition
Get the position in samples of the recording.
and pass the optained index to AudioClip.GetData
Use the offsetSamples parameter to start the read from a specific position in the clip
fdata = new float[clip.samples * clip.channels];
var currentIndex = Microphone.GetPosition(null);
audio.GetData(fdata, currentIndex);
I don't understand what exactly you convert this for. fdata will contain
floats ranging from -1.0f to 1.0f (AudioClip.GetData)
so if for some reason you need to get values between short.MinValue (= -32768) and short.MaxValue(= 32767) than yes you can do that using
u16data[i] = Convert.ToUInt16(fdata[i] * short.MaxValue);
note however that Convert.ToUInt16(float):
value, rounded to the nearest 16-bit unsigned integer. If value is halfway between two whole numbers, the even number is returned; that is, 4.5 is converted to 4, and 5.5 is converted to 6.
you might want to rather use Mathf.RoundToInt first to also round up if a value is e.g. 4.5.
u16data[i] = Convert.ToUInt16(Mathf.RoundToInt(fdata[i] * short.MaxValue));
Your naming however suggests that you are actually trying to get unsigned values ushort (or also UInt16). For this you can not have negative values! So you have to shift the float values up in order to map the range (-1.0f | 1.0f ) to the range (0.0f | 1.0f) before multiplaying it by ushort.MaxValue(= 65535)
u16data[i] = Convert.ToUInt16(Mathf.RoundToInt(fdata[i] + 1) / 2 * ushort.MaxValue);
What you receive from AudioClip.GetData are the gain values of the audio track between -1.0f and 1.0f.
so a "loud" moment would be where
Mathf.Abs(fdata[i]) >= aCertainLoudThreshold;
a "silent" moment would be where
Mathf.Abs(fdata[i]) <= aCertainSiltenThreshold;
where aCertainSiltenThreshold might e.g. be 0.2f and aCertainLoudThreshold might e.g. be 0.8f.

How to read and write bits in a chunk of memory in Swift

I would like to know how to read a binary file into memory (writing it to memory like an "Array Buffer" from JavaScript), and write to different parts of memory 8-bit, 16-bit, 32-bit etc. values, even 5 bit or 10 bit values.
extension Binary {
static func readFileToMemory(_ file) -> ArrayBuffer {
let data = NSData(contentsOfFile: "/path/to/file/7CHands.dat")!
var dataRange = NSRange(location: 0, length: ?)
var ? = [Int32](count: ?, repeatedValue: ?)
data.getBytes(&?, range: dataRange)
}
static func writeToMemory(_ buffer, location, value) {
buffer[location] = value
}
static func readFromMemory(_ buffer, location) {
return buffer[location]
}
}
I have looked at a bunch of places but haven't found a standard reference.
https://github.com/nst/BinUtils/blob/master/Sources/BinUtils.swift
https://github.com/apple/swift/blob/master/stdlib/public/core/ArrayBuffer.swift
https://github.com/uraimo/Bitter/blob/master/Sources/Bitter/Bitter.swift
In Swift, how do I read an existing binary file into an array?
Swift - writing a byte stream to file
https://apple.github.io/swift-nio/docs/current/NIO/Structs/ByteBuffer.html
https://github.com/Cosmo/BinaryKit/blob/master/Sources/BinaryKit.swift
https://github.com/vapor-community/bits/blob/master/Sources/Bits/Data%2BBytesConvertible.swift
https://academy.realm.io/posts/nate-cook-tryswift-tokyo-unsafe-swift-and-pointer-types/
https://medium.com/#gorjanshukov/working-with-bytes-in-ios-swift-4-de316a389a0c
I would like for this to be as low-level as possible. So perhaps using UnsafeMutablePointer, UnsafePointer, or UnsafeMutableRawPointer.
Saw this as well:
let data = NSMutableData()
var goesIn: Int32 = 42
data.appendBytes(&goesIn, length: sizeof(Int32))
println(data) // <2a000000]
var comesOut: Int32 = 0
data.getBytes(&comesOut, range: NSMakeRange(0, sizeof(Int32)))
println(comesOut) // 42
I would basically like to allocate a chunk of memory and be able to read and write from it. Not sure how to do that. Perhaps using C is the best way, not sure.
Just saw this too:
let rawData = UnsafeMutablePointer<UInt8>.allocate(capacity: width * height * 4)
If you're looking for low level code you'll need to use UnsafeMutableRawPointer. This is a pointer to a untyped data. Memory is accessed in bytes, so 8 chunks of at least 8 bits. I'll cover multiples of 8 bits first.
Reading a File
To read a file this way, you need to manage file handles and pointers yourself. Try the the following code:
// Open the file in read mode
let file = fopen("/Users/joannisorlandos/Desktop/ownership", "r")
// Files need to be closed manually
defer { fclose(file) }
// Find the end
fseek(file, 0, SEEK_END)
// Count the bytes from the start to the end
let fileByteSize = ftell(file)
// Return to the start
fseek(file, 0, SEEK_SET)
// Buffer of 1 byte entities
let pointer = UnsafeMutableRawPointer.allocate(byteCount: fileByteSize, alignment: 1)
// Buffer needs to be cleaned up manually
defer { pointer.deallocate() }
// Size is 1 byte
let readBytes = fread(pointer, 1, fileByteSize, file)
let errorOccurred = readBytes != fileByteSize
First you need to open the file. This can be done using Swift strings since the compiler makes them into a CString itself.
Because cleanup is all for us on this low level, a defer is put in place to close the file at the end.
Next, the file is set to seek the end of the file. Then the distance between the start of the file and the end is calculated. This is used later, so the value is kept.
Then the program is set to return to the start of the file, so the application starts reading from the start.
To store the file, a pointer is allocated with the amount of bytes that the file has in the file system. Note: This can change inbetween the steps if you're extremely unlucky or the file is accessed quite often. But I think for you, this is unlikely.
The amount of bytes is set, and aligned to one byte. (You can learn more about memory alignment on Wikipedia.
Then another defer is added to make sure no memory leaks at the end of this code. The pointer needs to be deallocated manually.
The file's bytes are read and stored in the pointer. Do note that this entire process reads the file in a blocking manner. It can be more preferred to read files asynchronously, if you plan on doing that I'll recommend looking into a library like SwiftNIO instead.
errorOccurred can be used to throw an error or handle issues in another manner.
From here, your buffer is ready for manipulation. You can print the file if it's text using the following code:
print(String(cString: pointer.bindMemory(to: Int8.self, capacity: fileByteSize)))
From here, it's time to learn how to read manipulate the memory.
Manipulating Memory
The below demonstrates reading byte 20..<24 as an Int32.
let int32 = pointer.load(fromByteOffset: 20, as: Int32.self)
I'll leave the other integers up to you. Next, you can alos put data at a position in memory.
pointer.storeBytes(of: 40, toByteOffset: 30, as: Int64.self)
This will replace byte 30..<38 with the number 40. Note that big endian systems, although uncommon, will store information in a different order from normal little endian systems. More about that here.
Modifying Bits
As you notes, you're also interested in modifying five or ten bits at a time. To do so, you'll need to mix the previous information with the new information.
var data32bits = pointer.load(fromByteOffset: 20, as: Int32.self)
var newData = 0b11111000
In this case, you'll be interested in the first 5 bits and want to write them over bit 2 through 7. To do so, first you'll need to shift the bits to a position that matches the new position.
newData = newData >> 2
This shifts the bits 2 places to the right. The two left bits that are now empty are therefore 0. The 2 bits on the right that got shoved off are not existing anymore.
Next, you'll want to get the old data from the buffer and overwrite the new bits.
To do so, first move the new byte into a 32-bits buffer.
var newBits = numericCast(newData) as Int32
The 32 bits will be aligned all the way to the right. If you want to replace the second of the four bytes, run the following:
newBits = newBits << 16
This moves the fourth pair 16 bit places left, or 2 bytes. So it's now on position 1 starting from 0.
Then, the two bytes need to be added on top of each other. One common method is the following:
let oldBits = data32bits & 0b11111111_11000001_11111111_11111111
let result = oldBits | newBits
What happens here is that we remove the 5 bits with new data from the old dataset. We do so by doing a bitwise and on the old 32 bits and a bitmap.
The bitmap has all 1's except for the new locations which are being replaced. Because those are empty in the bitmap, the and operator will exclude those bits since one of the two (old data vs. bitmap) is empty.
AND operators will only be 1 if both sides of the operator are 1.
Finally, the oldBits and the newBits are merged with an OR operator. This will take each bit on both sides and set the result to 1 if the bits at both positions are 1.
This will merge successfully since both buffers contain 1 bits that the other number doesn't set.

AVAudioPCMBuffer built programmatically, not playing back in stereo

I'm trying to fill an AVAudioPCMBuffer programmatically in Swift to build a metronome. This is the first real app I'm trying to build, so it's also my first audio app. Right now I'm experimenting with different frameworks and methods of getting the metronome looping accurately.
I'm trying to build an AVAudioPCMBuffer with the length of a measure/bar so that I can use the .Loops option of the AVAudioPlayerNode's scheduleBuffer method. I start by loading my file(2 ch, 44100 Hz, Float32, non-inter, *.wav and *.m4a both have same issue) into a buffer, then copying that buffer frame by frame separated by empty frames into the barBuffer. The loop below is how I'm accomplishing this.
If I schedule the original buffer to play, it will play back in stereo, but when I schedule the barBuffer, I only get the left channel. As I said I'm a beginner at programming, and have no experience with audio programming, so this might be my lack of knowledge on 32 bit float channels, or on this data type UnsafePointer<UnsafeMutablePointer<float>>. When I look at the floatChannelData property in swift, the description makes it sound like this should be copying two channels.
var j = 0
for i in 0..<Int(capacity) {
barBuffer.floatChannelData.memory[j] = buffer.floatChannelData.memory[i]
j += 1
}
j += Int(silenceLengthInSamples)
// loop runs 4 times for 4 beats per bar.
edit: I removed the glaring mistake i += 1, thanks to hotpaw2. The right channel is still missing when barBuffer is played back though.
Unsafe pointers in swift are pretty weird to get used to.
floatChannelData.memory[j] only accesses the first channel of data. To access the other channel(s), you have a couple choices:
Using advancedBy
// Where current channel is at 0
// Get a channel pointer aka UnsafePointer<UnsafeMutablePointer<Float>>
let channelN = floatChannelData.advancedBy( channelNumber )
// Get channel data aka UnsafeMutablePointer<Float>
let channelNData = channelN.memory
// Get first two floats of channel channelNumber
let floatOne = channelNData.memory
let floatTwo = channelNData.advancedBy(1).memory
Using Subscript
// Get channel data aka UnsafeMutablePointer<Float>
let channelNData = floatChannelData[ channelNumber ]
// Get first two floats of channel channelNumber
let floatOne = channelNData[0]
let floatTwo = channelNData[1]
Using subscript is much clearer and the step of advancing and then manually
accessing memory is implicit.
For your loop, try accessing all channels of the buffer by doing something like this:
for i in 0..<Int(capacity) {
for n in 0..<Int(buffer.format.channelCount) {
barBuffer.floatChannelData[n][j] = buffer.floatChannelData[n][i]
}
}
Hope this helps!
This looks like a misunderstanding of Swift "for" loops. The Swift "for" loop automatically increments the "i" array index. But you are incrementing it again in the loop body, which means that you end up skipping every other sample (the Right channel) in your initial buffer.

Repeated Scene items in iOS YUV video capturing output

I capture a video and handle the resulting YUV frames.
the output looks like the following:
Although it appears normally on my phone's screen. But my peer receives it like that img above.
Every item is repeated and shifted by some value horizontally and vertically
My captured video is 352x288 and my YPixelCount = 101376, UVPixelCount = YPIXELCOUNT/4
Any clue to solve this or a starting point to understand how to handle YUV video frames on iOS ?
NSNumber* recorderValue = [NSNumber numberWithUnsignedInt:kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange];
[videoRecorderSession setSessionPreset:AVCaptureSessionPreset352x288];
And this is the captureOutput function
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection{
if(CMSampleBufferIsValid(sampleBuffer) && CMSampleBufferDataIsReady(sampleBuffer) && ([self isQueueStopped] == FALSE))
{
CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CVPixelBufferLockBaseAddress(imageBuffer,0);
UInt8 *baseAddress[3] = {NULL,NULL,NULL};
uint8_t *yPlaneAddress = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer,0);
UInt32 yPixelCount = CVPixelBufferGetWidthOfPlane(imageBuffer,0) * CVPixelBufferGetHeightOfPlane(imageBuffer,0);
uint8_t *uvPlaneAddress = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer,1);
UInt32 uvPixelCount = CVPixelBufferGetWidthOfPlane(imageBuffer,1) * CVPixelBufferGetHeightOfPlane(imageBuffer,1);
UInt32 p,q,r;
p=q=r=0;
memcpy(uPointer, uvPlaneAddress, uvPixelCount);
memcpy(vPointer, uvPlaneAddress+uvPixelCount, uvPixelCount);
memcpy(yPointer,yPlaneAddress,yPixelCount);
baseAddress[0] = (UInt8*)yPointer;
baseAddress[1] = (UInt8*)uPointer;
baseAddress[2] = (UInt8*)vPointer;
CVPixelBufferUnlockBaseAddress(imageBuffer,0);
}
}
Is there anything wrong with the above code ?
Your code doesn't look to0 bad. I can see two mistakes and one potential problem:
The uvPixelCount is incorrect. The YUV 420 format means that there is color information for each 2 by 2 pixel block. So the correct count is:
uvPixelCount = (width / 2) * (height / 2);
You write something about yPixelCount / 4, but I cannot see that in your code.
The UV information is interleaved, i.e. the second plane alternatingly contains a U and a V value. Or put differently: there's a U value on all even byte addresses and a V value on all odd byte addresses. If you really need to separate the U and V information, memcpy won't do.
There can be some extra bytes after each pixel row. You should use CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0) to get the number of bytes between two rows. As a consequence, a single memcpy won't do. Instead you need to copy each pixel row separately to get rid of the extra bytes between the rows.
All these things only explain part of the resulting image. The remaining parts are probably due to differences between your code and what the receiving peer expect. You did't write anything about that? Does the peer really need separated U and V values? Does it you 4:2:0 compression as well? Does it you video range instead of full range as well?
If you provide more information, I can give your more hints.

Help with live-updating sound on the iPhone

My question is a little tricky, and I'm not exactly experienced (I might get some terms wrong), so here goes.
I'm declaring an instance of an object called "Singer". The instance is called "singer1". "singer1" produces an audio signal. Now, the following is the code where the specifics of the audio signal are determined:
OSStatus playbackCallback(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData) {
//Singer *me = (Singer *)inRefCon;
static int phase = 0;
for(UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
int samples = ioData->mBuffers[i].mDataByteSize / sizeof(SInt16);
SInt16 values[samples];
float waves;
float volume=.5;
for(int j = 0; j < samples; j++) {
waves = 0;
waves += sin(kWaveform * 600 * phase)*volume;
waves += sin(kWaveform * 400 * phase)*volume;
waves += sin(kWaveform * 200 * phase)*volume;
waves += sin(kWaveform * 100 * phase)*volume;
waves *= 32500 / 4; // <--------- make sure to divide by how many waves you're stacking
values[j] = (SInt16)waves;
values[j] += values[j]<<16;
phase++;
}
memcpy(ioData->mBuffers[i].mData, values, samples * sizeof(SInt16));
}
return noErr;
}
99% of this is borrowed code, so I only have a basic understanding of how it works (I don't know about the OSStatus class or method or whatever this is. However, you see those 4 lines with 600, 400, 200 and 100 in them? Those determine the frequency. Now, what I want to do (for now) is insert my own variable in there in place of a constant, which I can change on a whim. This variable is called "fr1". "fr1" is declared in the header file, but if I try to compile I get an error about "fr1" being undeclared. Currently, my technique to fix this is the following: right beneath where I #import stuff, I add the line
fr1=0.0;//any number will work properly
This sort of works, as the code will compile and singer1.fr1 will actually change values if I tell it to. The problems are now this:A)even though this compiles and the tone specified will play (0.0 is no tone), I get the warnings "Data definition has no type or storage class" and "Type defaults to 'int' in declaration of 'fr1'". I bet this is because for some reason it's not seeing my previous declaration in the header file (as a float). However, again, if I leave this line out the code won't compile because "fr1 is undeclared". B)Just because I change the value of fr1 doesn't mean that singer1 will update the value stored inside the "playbackcallback" variable or whatever is in charge of updating the output buffers. Perhaps this can be fixed by coding differently? C)even if this did work, there is still a noticeable "gap" when pausing/playing the audio, which I need to eliminate. This might mean a complete overhaul of the code so that I can "dynamically" insert new values without disrupting anything. However, the reason I'm going through all this effort to post is because this method does exactly what I want (I can compute a value mathematically and it goes straight to the DAC, which means I can use it in the future to make triangle, square, etc waves easily). I have uploaded Singer.h and .m to pastebin for your veiwing pleasure, perhaps they will help. Sorry, I can't post 2 HTML tags so here are the full links.
(http://pastebin.com/ewhKW2Tk)
(http://pastebin.com/CNAT4gFv)
So, TL;DR, all I really want to do is be able to define the current equation/value of the 4 waves and re-define them very often without a gap in the sound.
Thanks. (And sorry if the post was confusing or got off track, which I'm pretty sure it did.)
My understanding is that your callback function is called every time the buffer needs to be re-filled. So changing fr1..fr4 will alter the waveform, but only when the buffer updates. You shouldn't need to stop and re-start the sound to get a change, but you will notice an abrupt shift in the timbre if you change your fr values. In order to get a smooth transition in timbre, you'd have to implement something that smoothly changes the fr values over time. Tweaking the buffer size will give you some control over how responsive the sound is to your changing fr values.
Your issue with fr being undefined is due to your callback being a straight c function. Your fr variables are declared as objective-c instance variables as part of your Singer object. They are not accessible by default.
take a look at this project, and see how he implements access to his instance variables from within his callback. Basically he passes a reference to his instance to the callback function, and then accesses instance variables through that.
https://github.com/youpy/dowoscillator
notice:
Sinewave *sineObject = inRefCon;
float freq = sineObject.frequency * 2 * M_PI / samplingRate;
and:
AURenderCallbackStruct input;
input.inputProc = RenderCallback;
input.inputProcRefCon = self;
Also, you'll want to move your callback function outside of your #implementation block, because it's not actually part of your Singer object.
You can see this all in action here: https://github.com/coryalder/SineWaver