Presentation times of audio and video buffers are not equal - Swift

I'm trying to create an app that records live video and audio using AVFoundation, and writes the buffers to a local file with AVAssetWriter.
For the video CMSampleBuffer I use the AVCaptureVideoDataOutputSampleBufferDelegate output of an AVCaptureSession, which is straightforward.
For the audio CMSampleBuffer I create the buffer myself from the AudioUnit record callback.
I calculate the presentation time for the audio buffer like so:
var timebaseInfo = mach_timebase_info_data_t(numer: 0, denom: 0)
let timebaseStatus = mach_timebase_info(&timebaseInfo)
if timebaseStatus != KERN_SUCCESS {
    debugPrint("not working")
    return
}

let hostTime = time * UInt64(timebaseInfo.numer / timebaseInfo.denom)
let presentationTime = CMTime(value: CMTimeValue(hostTime), timescale: 1000000000)
let duration = CMTime(value: CMTimeValue(1), timescale: CMTimeScale(self.sampleRate))

var timing = CMSampleTimingInfo(
    duration: duration,
    presentationTimeStamp: presentationTime,
    decodeTimeStamp: CMTime.invalid
)
self.sampleRate is set when recording starts, but most of the time it is 48000.
When I get the CMSampleBuffers of both the video and the audio, their presentation times are very far apart:
Audio - CMTime(value: 981750843366125, timescale: 1000000000, flags: __C.CMTimeFlags(rawValue: 1), epoch: 0)
Video - CMTime(value: 997714237615541, timescale: 1000000000, flags: __C.CMTimeFlags(rawValue: 1), epoch: 0)
This creates a big gap when writing the buffers to the file.
My questions are:
Am I calculating the presentation time of the audio buffer correctly? If so, what am I missing?
How can I make sure the audio and video presentation times stay in the same range? (I know there should be a small difference of a few milliseconds between them.)

OK, so this was my fault.
As Rhythmic Fistman suggested in the comments, my time calculation was being truncated by integer division:
let hostTime = time * UInt64(timebaseInfo.numer / timebaseInfo.denom)
Changing to this calculation fixed it:
let hostTime = (time * UInt64(timebaseInfo.numer)) / UInt64(timebaseInfo.denom)
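For reference, a minimal sketch of the corrected conversion as a helper (assuming time is the mach host-time value taken from the record callback, e.g. AudioTimeStamp.mHostTime):

import CoreMedia
import Darwin

// Sketch only: convert a mach host-time value to a nanosecond-based CMTime.
func presentationTime(forHostTime time: UInt64) -> CMTime {
    var timebaseInfo = mach_timebase_info_data_t(numer: 0, denom: 0)
    guard mach_timebase_info(&timebaseInfo) == KERN_SUCCESS else {
        return .invalid
    }
    // Multiply before dividing so the numer/denom ratio is not truncated
    // to an integer (which is what caused the original gap).
    let nanos = (time * UInt64(timebaseInfo.numer)) / UInt64(timebaseInfo.denom)
    return CMTime(value: CMTimeValue(nanos), timescale: 1_000_000_000)
}

// CoreMedia can also do the host-time conversion for you:
// let pts = CMClockMakeHostTimeFromSystemUnits(time)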

Related

iPhone 11 unexpected number of audio samples

I have an app that captures audio and video using AVAssetWriter. It runs a fast Fourier transform (FFT) on the audio to create a visual spectrum of the captured audio in real time.
Up until the release of the iPhone 11 this all worked fine. Users with the iPhone 11, however, are reporting that audio is not being captured at all. I have managed to narrow down the issue: the number of samples returned in captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) is either 940 or 941, whereas on previous phone models it is always 1024 samples. I use CMSampleBufferGetNumSamples to get the number of samples. My FFT calculation relies on the number of samples being a power of 2, so it drops all frames on the newer iPhones.
Can anybody shed light on why the iPhone 11 is returning an unusual number of samples? Here is how I have configured the AVAssetWriter:
self.videoWriter = try AVAssetWriter(outputURL: self.outputURL, fileType: AVFileType.mp4)

var videoSettings: [String: Any]
if #available(iOS 11.0, *) {
    videoSettings = [
        AVVideoCodecKey : AVVideoCodecType.h264,
        AVVideoWidthKey : Constants.VIDEO_WIDTH,
        AVVideoHeightKey : Constants.VIDEO_HEIGHT,
    ]
} else {
    videoSettings = [
        AVVideoCodecKey : AVVideoCodecH264,
        AVVideoWidthKey : Constants.VIDEO_WIDTH,
        AVVideoHeightKey : Constants.VIDEO_HEIGHT,
    ]
}

// Video input
videoWriterVideoInput = AVAssetWriterInput(mediaType: AVMediaType.video, outputSettings: videoSettings)
videoWriterVideoInput?.expectsMediaDataInRealTime = true
if (videoWriter?.canAdd(videoWriterVideoInput!))! {
    videoWriter?.add(videoWriterVideoInput!)
}

// Audio settings
let audioSettings: [String: Any] = [
    AVFormatIDKey : kAudioFormatMPEG4AAC,
    AVSampleRateKey : Constants.AUDIO_SAMPLE_RATE,          // Float(44100.0)
    AVEncoderBitRateKey : Constants.AUDIO_BIT_RATE,          // 64000
    AVNumberOfChannelsKey : Constants.AUDIO_NUMBER_CHANNELS  // 1
]

// Audio input
videoWriterAudioInput = AVAssetWriterInput(mediaType: AVMediaType.audio, outputSettings: audioSettings)
videoWriterAudioInput?.expectsMediaDataInRealTime = true
if (videoWriter?.canAdd(videoWriterAudioInput!))! {
    videoWriter?.add(videoWriterAudioInput!)
}
You can't assume a fixed sample rate. It varies with the microphone and other hardware characteristics of the device (the 940/941 counts reported above are close to 1024 × 44100 / 48000 ≈ 941, which suggests the newer hardware is running at 48 kHz). This doesn't help with the FFT library I'm using (TempiFFT): to get this to work you need to detect the sample rate ahead of time.
Rather than:
let fft = TempiFFT(withSize: 1024, sampleRate: Constants.AUDIO_SAMPLE_RATE)
I need to first detect what the sample rate is when I start my AVCaptureSession, and then pass that detected value to the FFT library:
//During initialization of AVCaptureSession
audioSampleRate = Float(AVAudioSession.sharedInstance().sampleRate)
...
//Run FFT calculations
let fft = TempiFFT(withSize: 1024, sampleRate: audioSampleRate)
Update
On some devices you may not receive a full 1024 samples per callback (on the iPhone 11 I was getting 941). If the FFT doesn't get the expected number of frames, you may see unexpected behavior. I needed to create a circular buffer that stores the samples returned by each callback until at least 1024 samples are available, and only then perform the FFT.
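As a rough illustration (my names, not TempiFFT's): a small accumulator that collects incoming samples and hands back 1024-sample windows when they are ready might look like this:

// Sketch only: accumulate capture-callback samples into fixed FFT windows.
final class SampleAccumulator {
    private var pending: [Float] = []
    private let windowSize: Int

    init(windowSize: Int = 1024) {
        self.windowSize = windowSize
    }

    // Append new samples; returns a full window whenever one is available.
    func append(_ samples: [Float]) -> [Float]? {
        pending.append(contentsOf: samples)
        guard pending.count >= windowSize else { return nil }
        let window = Array(pending.prefix(windowSize))
        pending.removeFirst(windowSize)
        return window
    }
}

// In the capture callback (sketch):
// if let window = accumulator.append(newSamples) {
//     let fft = TempiFFT(withSize: 1024, sampleRate: audioSampleRate)
//     fft.fftForward(window) // check TempiFFT's API for the exact call
// }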

How to sync input and playback for Core Audio using Swift

I have created an app which I am using to take acoustic measurements. The app generates a log sine sweep stimulus, and when the user presses 'start' the app simultaneously plays the stimulus sound and records the microphone input.
All fairly standard stuff. I am using Core Audio because down the line I want to delve into different functionality and potentially use multiple interfaces, so I have to start learning somewhere.
This is for iOS, so I am creating an AUGraph with a RemoteIO audio unit for input and output. I have declared the audio formats, and they are correct, as no errors are shown and the AUGraph initialises, starts, plays sound and records.
I have a render callback on the input scope of input 1 of my mixer (i.e. every time more audio is needed, the render callback is called and it reads a few samples into the buffer from my stimulus array of floats).
let genContext = Unmanaged.passRetained(self).toOpaque()
var genCallbackStruct = AURenderCallbackStruct(inputProc: genCallback,
                                               inputProcRefCon: genContext)
AudioUnitSetProperty(mixerUnit!, kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input, 1, &genCallbackStruct,
                     UInt32(MemoryLayout<AURenderCallbackStruct>.size))
I then have an input callback which is called every time the buffer is full on the output scope of the remoteIO input. This callback saves the samples to an array.
var inputCallbackStruct = AURenderCallbackStruct(inputProc: recordingCallback,
                                                 inputProcRefCon: context)
AudioUnitSetProperty(remoteIOUnit!, kAudioOutputUnitProperty_SetInputCallback,
                     kAudioUnitScope_Global, 0, &inputCallbackStruct,
                     UInt32(MemoryLayout<AURenderCallbackStruct>.size))
Once the stimulus reaches the last sample, the AUGraph is stopped, and then I write both the stimulus and the recorded array to separate WAV files so I can check my data. What I am finding is that there is currently about a 3000-sample delay between the recorded input and the stimulus.
Whilst it is hard to see the start of the waveforms (both the speakers and the microphone may not detect anything that low), the ends of the stimulus (bottom WAV) and of the recording should roughly line up.
There will be propagation time for the audio, I realise this, but at a 44100 Hz sample rate 3000 samples is about 68 ms, and Core Audio is meant to keep latency down.
So my question is this: can anybody account for this additional latency, which seems quite high?
My inputCallback is as follows:
let recordingCallback: AURenderCallback = { (
    inRefCon,
    ioActionFlags,
    inTimeStamp,
    inBusNumber,
    frameCount,
    ioData) -> OSStatus in

    let audioObject = unsafeBitCast(inRefCon, to: AudioEngine.self)
    var err: OSStatus = noErr

    var bufferList = AudioBufferList(
        mNumberBuffers: 1,
        mBuffers: AudioBuffer(
            mNumberChannels: UInt32(1),
            mDataByteSize: 512,
            mData: nil))

    if let au: AudioUnit = audioObject.remoteIOUnit {
        err = AudioUnitRender(au,
                              ioActionFlags,
                              inTimeStamp,
                              inBusNumber,
                              frameCount,
                              &bufferList)
    }

    let data = Data(bytes: bufferList.mBuffers.mData!, count: Int(bufferList.mBuffers.mDataByteSize))
    let samples = data.withUnsafeBytes {
        UnsafeBufferPointer<Int16>(start: $0, count: data.count / MemoryLayout<Int16>.size)
    }
    let factor = Float(Int16.max)
    var floats: [Float] = Array(repeating: 0.0, count: samples.count)
    for i in 0..<samples.count {
        floats[i] = Float(samples[i]) / factor
    }

    // Write the converted samples into the circular buffer
    var j = audioObject.in1BufIndex
    let m = audioObject.in1BufSize
    for i in 0..<floats.count {
        audioObject.in1Buf[j] = floats[i]
        j += 1; if j >= m { j = 0 }
    }
    audioObject.in1BufIndex = j

    audioObject.inputCallbackFrameSize = Int(frameCount)
    audioObject.callbackcount += 1
    let windowSize = totalRecordSize / Int(frameCount)
    if audioObject.callbackcount == windowSize {
        audioObject.running = false
    }
    return 0
}
So from when the engine starts, this callback should be called after the first set of data is collected from RemoteIO: 512 samples, as that is the default allocated buffer size. All it does is convert from signed integer to Float and save the result to a buffer. The value in1BufIndex is the last index of the array written to; it is read and updated on each callback to make sure the data in the array lines up.
Currently there seem to be about 3000 samples of silence in the recorded array before the captured sweep is heard. Inspecting the recorded array by debugging in Xcode, all samples have values (and yes, the first 3000 are very quiet), but somehow this doesn't add up.
Below is the generator callback used to play my stimulus:
let genCallback: AURenderCallback = { (
    inRefCon,
    ioActionFlags,
    inTimeStamp,
    inBusNumber,
    frameCount,
    ioData) -> OSStatus in

    let audioObject = unsafeBitCast(inRefCon, to: AudioEngine.self)

    for buffer in UnsafeMutableAudioBufferListPointer(ioData!) {
        let frames = buffer.mData!.assumingMemoryBound(to: Float.self)
        var j = 0
        if audioObject.stimulusReadIndex < (audioObject.Stimulus.count - Int(frameCount)) {
            for i in stride(from: 0, to: Int(frameCount), by: 1) {
                frames[i] = Float(audioObject.Stimulus[j + audioObject.stimulusReadIndex])
                j += 1
                audioObject.in2Buf[j + audioObject.stimulusReadIndex] = Float(audioObject.Stimulus[j + audioObject.stimulusReadIndex])
            }
            audioObject.stimulusReadIndex += Int(frameCount)
        }
    }
    return noErr
}
There may be at least four things contributing to the round-trip latency.
512 samples, or about 11 ms, is the time required to gather enough samples before RemoteIO can call your callback.
Sound propagates at about 1 foot per millisecond; double that for a round trip.
The DAC has an output latency.
There is the time needed for the multiple ADCs (there's more than one microphone on your iOS device) to sample and post-process the audio (sigma-delta conversion, beamforming, equalization, etc.). The post-processing might be done in blocks, thus incurring the latency to gather enough samples (an undocumented number) for one block.
There's possibly also added overhead latency in moving data (hardware DMA of some unknown block size?) between the ADC and system memory, as well as driver and OS context-switching overhead.
There's also a startup latency to power up the audio hardware subsystems (amplifiers, etc.), so it may be best to start playing and recording audio well before outputting your sound (frequency sweep).

How do you pause, stop and reset an AUFilePlayer?

I am working with Core Audio, using an AUFilePlayer to load a few mp3s into a mixer unit. Everything plays great, however I am unable to pause the music or rewind it back to the start. I tried starting and stopping the AUGraph, but once playback is stopped I can't get it to restart. I also tried using AudioUnitSetProperty to set the file playback position to 0,
i.e. something along these lines:
ScheduledAudioFileRegion rgn;
memset (&rgn.mTimeStamp, 0, sizeof(rgn.mTimeStamp));
rgn.mTimeStamp.mFlags = kAudioTimeStampSampleTimeValid;
rgn.mTimeStamp.mSampleTime = 0;
rgn.mCompletionProc = NULL;
rgn.mCompletionProcUserData = NULL;
rgn.mAudioFile = inputFile;
rgn.mLoopCount = 1;
rgn.mStartFrame = 0;
rgn.mFramesToPlay = nPackets * inputFormat.mFramesPerPacket;
AudioUnitSetProperty(fileUnit, kAudioUnitProperty_ScheduledFileRegion,
kAudioUnitScope_Global, 0,&rgn, sizeof(rgn));
Any suggestions?
In case anyone else is dealing with a similar issue, which cost me several hours of searching on Google, here's what I've discovered about how to retrieve and set the playhead.
To get the playhead from an AUFilePlayer unit:
AudioTimeStamp timestamp;
UInt32 size = sizeof(timestamp);
err = AudioUnitGetProperty(unit, kAudioUnitProperty_CurrentPlayTime, kAudioUnitScope_Global, 0, &timestamp, &size);
The timestamp.mSampleTime is the current playhead for that file. Cast mSampleTime to a float or a double and divide by the file's sample rate to convert to seconds.
For restarting the AUFilePlayer's playhead, I had a more complex scenario in which multiple AUFilePlayers pass through a mixer and can be scheduled at different times, multiple times, and with varying loop counts. This is a real-world scenario, and getting them all to restart at the correct time took a little bit of code.
There are four scenarios for each AUFilePlayer and its schedule:
The playhead is at the beginning, so can be scheduled normally.
The playhead is past the item's duration, and doesn't need to be scheduled at all.
The playhead is before the item has started, so the start time can be moved up.
The playhead is in the middle of playing an item, so the region playing within the file needs to be adjusted, and remaining loops need to be scheduled separately (so they play in full).
Here is some code which demonstrates this (some external structures are from my own code and not Core Audio, but the mechanism should be clear):
// Add each region
for (int iItem = 0; iItem < schedule.items.count; iItem++) {
    AEFileScheduleItem *scheduleItem = [schedule.items objectAtIndex:iItem];

    // Set up the region
    ScheduledAudioFileRegion region;
    [file setRegion:&region schedule:scheduleItem];

    // Compute where we are at in it
    float playheadTime = schedule.playhead / file.sampleRate;
    float endOfItem = scheduleItem.startTime + (file.duration * (1 + scheduleItem.loopCount));

    // There are four scenarios:

    // 1. The playhead is -1
    //    In this case, we're all done
    if (schedule.playhead == -1) {
    }

    // 2. The playhead is past the item start time and duration*loopCount
    //    In this case, just ignore it and move on
    else if (playheadTime > endOfItem) {
        continue;
    }

    // 3. The playhead is less than or equal to the start time
    //    In this case, simply subtract the playhead from the start time
    else if (playheadTime <= scheduleItem.startTime) {
        region.mTimeStamp.mSampleTime -= schedule.playhead;
    }

    // 4. The playhead is in the middle of the file duration*loopCount
    //    In this case, set the start time to zero, and adjust the loopCount,
    //    startFrame and framesToPlay
    else {
        // First remove the start time
        region.mStartFrame = 0;
        double offsetTime = playheadTime - scheduleItem.startTime;

        // Next, take out any expired loops
        int loopsExpired = floor(offsetTime / file.duration);
        int fullLoops = region.mLoopCount - loopsExpired;
        region.mLoopCount = 0;
        offsetTime -= (loopsExpired * file.duration);

        // Then, adjust this segment of a loop accordingly
        region.mStartFrame = offsetTime * file.sampleRate;
        region.mFramesToPlay = region.mFramesToPlay - region.mStartFrame;

        // Finally, schedule any remaining loops separately
        if (fullLoops > 0) {
            ScheduledAudioFileRegion loops;
            [file setRegion:&loops schedule:scheduleItem];
            loops.mStartFrame = region.mFramesToPlay;
            loops.mLoopCount = fullLoops - 1;
            if (![super.errors check:AudioUnitSetProperty(unit, kAudioUnitProperty_ScheduledFileRegion, kAudioUnitScope_Global, 0, &loops, sizeof(loops))
                            location:@"AudioUnitSetProperty(ScheduledFileRegion)"])
                return false;
        }
    }

    // Set the region
    if (![super.errors check:AudioUnitSetProperty(unit, kAudioUnitProperty_ScheduledFileRegion, kAudioUnitScope_Global, 0, &region, sizeof(region))
                    location:@"AudioUnitSetProperty(ScheduledFileRegion)"])
        return false;
}
I figured it out. In case any of you ever run into the same problem, here was my solution:
On startup, I initialize the AUGraph with an array of file player audio units and set the playhead of each track in that array to zero.
On “pause”, I first stop the AUGraph. Then I loop through the array of file player audio units and capture the current playhead position. Each time the pause button is pressed, I add the new playhead position to the previously stored one to get its true position.
When the user hits play, I re-initialize the AUGraph just as if I were starting the app for the very first time, only I set the playhead to the value stored when the user hit “pause” instead of telling it to play from the start of the file.
If the user clicks stop, I set the stored playhead position to zero and then stop the AUGraph.
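A minimal sketch of that bookkeeping in Swift (error handling omitted; the graph, file player unit, and the storedPlayheadFrames counter are my own names, and on resume you would reschedule the file region with mStartFrame taken from storedPlayheadFrames instead of 0):

import AudioToolbox

// Sketch only: capture and reset the AUFilePlayer playhead around pause/stop.
final class FilePlayerTransport {
    private let graph: AUGraph
    private let filePlayerUnit: AudioUnit
    private(set) var storedPlayheadFrames: Float64 = 0

    init(graph: AUGraph, filePlayerUnit: AudioUnit) {
        self.graph = graph
        self.filePlayerUnit = filePlayerUnit
    }

    func pause() {
        _ = AUGraphStop(graph)
        var now = AudioTimeStamp()
        var size = UInt32(MemoryLayout<AudioTimeStamp>.size)
        _ = AudioUnitGetProperty(filePlayerUnit,
                                 kAudioUnitProperty_CurrentPlayTime,
                                 kAudioUnitScope_Global, 0, &now, &size)
        // mSampleTime is relative to the last scheduled region, so accumulate it.
        storedPlayheadFrames += now.mSampleTime
    }

    func stop() {
        storedPlayheadFrames = 0
        _ = AUGraphStop(graph)
    }
}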

iPhone/OpenAL getting sound length (playback time) of a sample

I'm new to OpenAL. I managed to get a sound-manager class that wraps OpenAL for iPhone, so I can load sounds and play them.
But I really need to know how long each sound file is in seconds, because I need to fire an event as soon as the sound has finished.
I've noticed that there is a way to calculate the length of a sound when populating the buffers(?). Can someone help me with this? Thanks in advance.
You can use this snippet to get the current playback time of the sound:
float result;
alGetSourcef(sourceID, AL_SEC_OFFSET, &result);
return result;
If you are populating known-size buffers with raw PCM audio samples of a known format, then:
duration = numberOfSampleFrames / sampleRate;
where, typically, the number of sample frames is number_of_bytes/2 for mono 16-bit samples, number_of_bytes/4 for 16-bit stereo, and so on.
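For example, a small helper along these lines (a sketch, assuming raw 16-bit PCM and that you already know the buffer's byte size, channel count and sample rate):

import Foundation

// Sketch only: duration of a raw PCM buffer from its size and format.
func pcmDuration(byteCount: Int, channels: Int, bitsPerSample: Int, sampleRate: Double) -> TimeInterval {
    let bytesPerFrame = channels * (bitsPerSample / 8)
    let frameCount = byteCount / bytesPerFrame
    return Double(frameCount) / sampleRate
}

// 10 seconds of 16-bit stereo at 44.1 kHz:
// pcmDuration(byteCount: 1_764_000, channels: 2, bitsPerSample: 16, sampleRate: 44_100) == 10.0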
I have the same problem and came up with the following solution. The first function is optional, but allows you to compensate for the elapsed time. I then fire an NSTimer with the resulting time interval.
Have fun! Dirk
static NSTimeInterval OPElapsedPlaybackTimeForSource(ALuint sourceID) {
    float result = 0.0;
    alGetSourcef(sourceID, AL_SEC_OFFSET, &result);
    return result;
}

static NSTimeInterval OPDurationFromSourceId(ALuint sourceID) {
    ALint bufferID, bufferSize, frequency, bitsPerSample, channels;
    alGetSourcei(sourceID, AL_BUFFER, &bufferID);
    alGetBufferi(bufferID, AL_SIZE, &bufferSize);
    alGetBufferi(bufferID, AL_FREQUENCY, &frequency);
    alGetBufferi(bufferID, AL_CHANNELS, &channels);
    alGetBufferi(bufferID, AL_BITS, &bitsPerSample);
    NSTimeInterval result = ((double)bufferSize) / (frequency * channels * (bitsPerSample / 8));
    NSLog(@"duration in seconds %lf", result);
    return result;
}
ALint bufferID, bufferSize;
alGetSourcei(sourceID, AL_BUFFER, &bufferID);
alGetBufferi(bufferID, AL_SIZE, &bufferSize);
NSLog(@"time in seconds %f", (1.0 * bufferSize) / (44100 * 2 * 2)); // 44100 Hz * 2 channels * 2 bytes (16-bit)

How to use an Audio Unit on the iPhone

I'm looking for a way to change the pitch of recorded audio as it is saved to disk, or played back (in real time). I understand Audio Units can be used for this. The iPhone offers limited support for Audio Units (for example it's not possible to create/use custom audio units, as far as I can tell), but several out-of-the-box audio units are available, one of which is AUPitch.
How exactly would I use an audio unit (specifically AUPitch)? Do you hook it into an audio queue somehow? Is it possible to chain audio units together (for example, to simultaneously add an echo effect and a change in pitch)?
EDIT: After inspecting the iPhone SDK headers (I think AudioUnit.h, I'm not in front of a Mac at the moment), I noticed that AUPitch is commented out. So it doesn't look like AUPitch is available on the iPhone after all. weep weep
Apple seems to have better organized their iPhone SDK documentation at developer.apple.com of late - now it's more difficult to find references to AUPitch, etc.
That said, I'm still interested in quality answers on using Audio Units (in general) on the iPhone.
There are some very good resources here (http://michael.tyson.id.au/2008/11/04/using-remoteio-audio-unit/) for using the RemoteIO Audio Unit. In my experience working with Audio Units on the iPhone, I've found that I can implement a transformation manually in the callback function. In doing so, you might find that this solves your problem.
Regarding changing pitch on the iPhone, OpenAL is the way to go. Check out the SoundManager class available from www.71squared.com for a great example of an OpenAL sound engine that supports pitch.
- (void)modifySpeedOf:(CFURLRef)inputURL byFactor:(float)factor andWriteTo:(CFURLRef)outputURL {
    ExtAudioFileRef inputFile = NULL;
    ExtAudioFileRef outputFile = NULL;

    AudioStreamBasicDescription destFormat;
    destFormat.mFormatID = kAudioFormatLinearPCM;
    destFormat.mFormatFlags = kAudioFormatFlagsCanonical;
    destFormat.mSampleRate = 44100 * factor;
    destFormat.mBytesPerPacket = 2;
    destFormat.mFramesPerPacket = 1;
    destFormat.mBytesPerFrame = 2;
    destFormat.mChannelsPerFrame = 1;
    destFormat.mBitsPerChannel = 16;
    destFormat.mReserved = 0;

    ExtAudioFileCreateWithURL(outputURL, kAudioFileCAFType,
                              &destFormat, NULL, kAudioFileFlags_EraseFile, &outputFile);
    ExtAudioFileOpenURL(inputURL, &inputFile);

    // Find out how many frames long this file is
    SInt64 length = 0;
    UInt32 dataSize2 = (UInt32)sizeof(length);
    ExtAudioFileGetProperty(inputFile,
                            kExtAudioFileProperty_FileLengthFrames, &dataSize2, &length);

    SInt16 *buffer = (SInt16 *)malloc(kBufferSize * sizeof(SInt16));
    UInt32 totalFramecount = 0;

    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mData = buffer;                               // pointer to buffer of audio data
    bufferList.mBuffers[0].mDataByteSize = kBufferSize * sizeof(SInt16); // number of bytes in the buffer

    while (true) {
        UInt32 frameCount = kBufferSize * sizeof(SInt16) / 2;
        // Read a chunk of input
        ExtAudioFileRead(inputFile, &frameCount, &bufferList);
        totalFramecount += frameCount;

        if (!frameCount || totalFramecount >= length) {
            // Termination condition
            break;
        }
        ExtAudioFileWrite(outputFile, frameCount, &bufferList);
    }

    free(buffer);
    ExtAudioFileDispose(inputFile);
    ExtAudioFileDispose(outputFile);
}
It will change the pitch based on the factor; note that because it simply writes the audio out at a scaled sample rate, the playback speed changes along with the pitch.
I've used the NewTimePitch audio unit for this before; the AudioComponentDescription for that is:
var newTimePitchDesc = AudioComponentDescription(componentType: kAudioUnitType_FormatConverter,
                                                 componentSubType: kAudioUnitSubType_NewTimePitch,
                                                 componentManufacturer: kAudioUnitManufacturer_Apple,
                                                 componentFlags: 0,
                                                 componentFlagsMask: 0)
then you can change the pitch parameter with an AudioUnitSetParameter call. For example, this changes the pitch by -1000 cents:
err = AudioUnitSetParameter(newTimePitchAudioUnit,
                            kNewTimePitchParam_Pitch,
                            kAudioUnitScope_Global,
                            0,
                            -1000,
                            0)
The parameters for this audio unit are as follows:
// Parameters for AUNewTimePitch
enum {
    // Global, rate, 1/32 -> 32.0, 1.0
    kNewTimePitchParam_Rate = 0,
    // Global, Cents, -2400 -> 2400, 1.0
    kNewTimePitchParam_Pitch = 1,
    // Global, generic, 3.0 -> 32.0, 8.0
    kNewTimePitchParam_Overlap = 4,
    // Global, Boolean, 0 -> 1, 1
    kNewTimePitchParam_EnablePeakLocking = 6
};
but you'll only need to change the pitch parameter for your purposes. For a guide on how to implement this, refer to Justin's answer.
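If you are working at the AVAudioEngine level rather than with raw Audio Units, AVAudioUnitTimePitch wraps the same time/pitch unit; a minimal sketch of the equivalent -1000 cent shift (file loading and error handling omitted; someURL is a placeholder):

import AVFoundation

// Sketch only: the same pitch shift via AVAudioEngine's wrapper node.
let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let timePitch = AVAudioUnitTimePitch()
timePitch.pitch = -1000   // cents, same scale as kNewTimePitchParam_Pitch
timePitch.rate = 1.0      // keep the tempo unchanged

engine.attach(player)
engine.attach(timePitch)
engine.connect(player, to: timePitch, format: nil)
engine.connect(timePitch, to: engine.mainMixerNode, format: nil)

// let file = try AVAudioFile(forReading: someURL)  // someURL is a placeholder
// player.scheduleFile(file, at: nil)
// try engine.start()
// player.play()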