Generating Float32 Array (Float32 PCM data) using CMSampleBuffer - swift

I get callbacks from the camera for audio, with the data in the form of CMSampleBuffer, but I am unable to convert this data to PCM data.
I followed the docs provided by Apple (copyPCMData, UnsafeMutablePointer, AudioBufferList) but all I get is 0.0 at the end.
Here is my code:
private let pcmBufferPointer = UnsafeMutablePointer<AudioBufferList>.allocate(capacity: 1024)
init(....) {
    //...
    let unsafeRawPointer = UnsafeMutableRawPointer.allocate(byteCount: 4, alignment: 0)
    let audioBuffer = AudioBuffer(mNumberChannels: 1, mDataByteSize: 4, mData: unsafeRawPointer)
    let audioBufferList = AudioBufferList(mNumberBuffers: 0, mBuffers: audioBuffer)
    self.pcmBufferPointer.initialize(repeating: audioBufferList, count: 1024)
}
//CMSampleBuffer obtained from AVCaptureAudioDataOutputSampleBufferDelegate
private func audioFrom(sampleBuffer: CMSampleBuffer) -> Void {
    let status = CMSampleBufferCopyPCMDataIntoAudioBufferList(sampleBuffer, 0, 1024, pcmBufferPointer)
    if status == 0 {
        Logger.log(key: "Audio Sample Buffer Status", message: "Buffer copied to pointer")
        let dataValue = pcmBufferPointer[0].mBuffers.mData!.load(as: Float32.self) //Tried with Int, Int16, Int32, Int64 and Float too
        Logger.log(key: "PCM Data Value", message: "Data value : \(dataValue)") //prints 0.0
    } else {
        Logger.log(key: "Audio Sample", message: "Buffer allocation failed with status \(status)")
    }
}

Finally got it working.
I had to add an extra step to convert the AudioBufferList pointer to an UnsafeMutableAudioBufferListPointer:
if status == 0 {
    let inputDataPtr = UnsafeMutableAudioBufferListPointer(pcmBufferPointer)
    let mBuffers: AudioBuffer = inputDataPtr[0]
    if let bufferPointer = UnsafeMutableRawPointer(mBuffers.mData) {
        let dataPointer = bufferPointer.assumingMemoryBound(to: Int16.self)
        let dataArray = Array(UnsafeBufferPointer(start: dataPointer, count: 1024))
        pcmArray.append(contentsOf: dataArray)
    } else {
        Logger.log(key: "Audio Sample", message: "Failed to generate audio sample")
    }
} else {
    Logger.log(key: "Audio Sample", message: "Buffer allocation failed with status \(status)")
}
The above code works only for single-channel PCM data. For 2-channel data, refer to the following gist: https://gist.github.com/hotpaw2/ba815fc23b5d642705f2b1dedfaf0107
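For reference, a rough sketch of how iterating every buffer in the list might look for multi-channel, non-interleaved Int16 data. This reuses status, pcmBufferPointer and pcmArray from the snippet above and is only an illustration; see the gist for the full approach:

if status == 0 {
    let bufferList = UnsafeMutableAudioBufferListPointer(pcmBufferPointer)
    // One AudioBuffer per channel when the data is non-interleaved.
    for audioBuffer in bufferList {
        guard let rawPointer = audioBuffer.mData else { continue }
        let sampleCount = Int(audioBuffer.mDataByteSize) / MemoryLayout<Int16>.size
        let samples = UnsafeBufferPointer(start: rawPointer.assumingMemoryBound(to: Int16.self),
                                          count: sampleCount)
        pcmArray.append(contentsOf: samples) // append each channel's samples
    }
}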


Converting AVAudioInputNode to S16LE PCM

I'm trying to convert the input node's format to S16LE. I've tried it with an AVAudioMixerNode.
First I create the audio session:
do {
    try audioSession.setCategory(.record)
    try audioSession.setActive(true)
} catch {
    ...
}
//Define formats
let inputNodeOutputFormat = audioEngine.inputNode.outputFormat(forBus: 0)
guard let wantedFormat = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: 16000, channels: 1, interleaved: false) else {
    return
}

//Create mixer node and attach it to the engine
audioEngine.attach(mixerNode)

//Connect the input node to mixer node and mixer node to mainMixerNode
audioEngine.connect(audioEngine.inputNode, to: mixerNode, format: inputNodeOutputFormat)
audioEngine.connect(mixerNode, to: audioEngine.mainMixerNode, format: wantedFormat)

//Install the tap on the output of the mixerNode
mixerNode.installTap(onBus: 0, bufferSize: bufferSize, format: wantedFormat) { (buffer, time) in
    let theLength = Int(buffer.frameLength)
    var bufferData: [Int16] = []
    for i in 0 ..< theLength {
        let char = Int16((buffer.int16ChannelData?.pointee[i])!)
        bufferData.append(char)
    }
}
I get the following error.
Exception '[[busArray objectAtIndexedSubscript:(NSUInteger)element]
setFormat:format error:&nsErr]: returned false, error Error
Domain=NSOSStatusErrorDomain Code=-10868 "(null)"' was thrown
What part of the graph did I mess up?
You have to set the format of nodes to match the actual format of the data. Setting the node's format doesn't cause any conversions to happen, except that mixer nodes can convert sample rates (but not data formats). You'll need to use an AVAudioConverter in your tap to do the conversion.
As an example of what this code would look like, to handle arbitrary conversions:
let inputNode = audioEngine.inputNode
let inputFormat = inputNode.inputFormat(forBus: 0)
let outputFormat = AVAudioFormat(... define your format ...)

guard let converter = AVAudioConverter(from: inputFormat, to: outputFormat) else {
    throw ...some error...
}

inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { [weak self] (buffer, time) in
    let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
        outStatus.pointee = AVAudioConverterInputStatus.haveData
        return buffer
    }

    let targetFrameCapacity = AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate)

    if let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: targetFrameCapacity) {
        var error: NSError?
        let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputBlock)
        assert(status != .error)

        let sampleCount = convertedBuffer.frameLength
        let rawData = convertedBuffer.int16ChannelData![0]
        // ... and here you have your data ...
    }
}
If you don't need to change the sample rate, and you're converting from uncompressed audio to uncompressed audio, you may be able to use the simpler convert(to:from:) method in your tap.
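For instance, a minimal sketch of that simpler path, assuming the same inputNode, inputFormat, outputFormat and converter from the snippet above, and that both formats share the same sample rate:

inputNode.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { (buffer, time) in
    // Same frame count, since no sample-rate conversion happens on this path.
    guard let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat,
                                                 frameCapacity: buffer.frameLength) else { return }
    do {
        try converter.convert(to: convertedBuffer, from: buffer)
        // convertedBuffer now holds the samples in outputFormat.
    } catch {
        print("Conversion failed: \(error)")
    }
}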
Since iOS 13, you can also do this with AVAudioSinkNode rather than a tap, which can be more convenient.
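A rough sketch of that alternative (an illustration, not code from the answer; it assumes the audioEngine and inputFormat from above, and that the hardware delivers Float32 samples):

let sinkNode = AVAudioSinkNode { (timestamp, frameCount, audioBufferList) -> OSStatus in
    // Samples arrive in the format of the node feeding the sink, so any
    // format conversion still has to happen inside this block or upstream.
    if let data = audioBufferList.pointee.mBuffers.mData {
        let samples = UnsafeBufferPointer(start: data.assumingMemoryBound(to: Float.self),
                                          count: Int(frameCount))
        // ... hand `samples` to an AVAudioConverter or your own consumer ...
        _ = samples
    }
    return noErr
}
audioEngine.attach(sinkNode)
audioEngine.connect(audioEngine.inputNode, to: sinkNode, format: inputFormat)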

Crash when recording: "required condition is false: format.sampleRate == hwFormat.sampleRate" after WebRTC call

My recording works normally, but after a WebRTC call I get this crash:
required condition is false: format.sampleRate == hwFormat.sampleRate
Here is how I start recording and install the tap:
func startRecord() {
    self.filePath = nil
    print("last format: \(audioEngine.inputNode.inputFormat(forBus: 0).sampleRate)")

    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playAndRecord, options: .mixWithOthers)
    } catch {
        print("======== Error setting setCategory \(error.localizedDescription)")
    }
    do {
        try session.setPreferredSampleRate(44100.0)
    } catch {
        print("======== Error setting rate \(error.localizedDescription)")
    }
    do {
        try session.setPreferredIOBufferDuration(0.005)
    } catch {
        print("======== Error IOBufferDuration \(error.localizedDescription)")
    }
    do {
        try session.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("========== Error starting session \(error.localizedDescription)")
    }

    let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16,
                               sampleRate: 44100.0,
                               // sampleRate: audioEngine.inputNode.inputFormat(forBus: 0).sampleRate,
                               channels: 1,
                               interleaved: true)

    audioEngine.connect(audioEngine.inputNode, to: mixer, format: format)
    audioEngine.connect(mixer, to: audioEngine.mainMixerNode, format: format)

    let dir = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true).first! as String
    filePath = dir.appending("/\(UUID.init().uuidString).wav")
    _ = ExtAudioFileCreateWithURL(URL(fileURLWithPath: filePath!) as CFURL,
                                  kAudioFileWAVEType, (format?.streamDescription)!, nil, AudioFileFlags.eraseFile.rawValue, &outref)

    mixer.installTap(onBus: 0, bufferSize: AVAudioFrameCount((format?.sampleRate)!), format: format, block: { (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
        let audioBuffer: AVAudioBuffer = buffer
        _ = ExtAudioFileWrite(self.outref!, buffer.frameLength, audioBuffer.audioBufferList)
    })

    try! audioEngine.start()
    startMP3Rec(path: filePath!, rate: 128)
}
func stopRecord() {
    self.audioFilePlayer.stop()
    self.audioEngine.stop()
    self.mixer.removeTap(onBus: 0)

    self.stopMP3Rec()
    ExtAudioFileDispose(self.outref!)

    try? AVAudioSession.sharedInstance().setActive(false)
}
func startMP3Rec(path: String, rate: Int32) {
    self.isMP3Active = true
    var total = 0
    var read = 0
    var write: Int32 = 0

    let mp3path = path.replacingOccurrences(of: "wav", with: "mp3")
    var pcm: UnsafeMutablePointer<FILE> = fopen(path, "rb")
    fseek(pcm, 4 * 1024, SEEK_CUR)
    let mp3: UnsafeMutablePointer<FILE> = fopen(mp3path, "wb")
    let PCM_SIZE: Int = 8192
    let MP3_SIZE: Int32 = 8192
    let pcmbuffer = UnsafeMutablePointer<Int16>.allocate(capacity: Int(PCM_SIZE * 2))
    let mp3buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: Int(MP3_SIZE))

    let lame = lame_init()
    lame_set_num_channels(lame, 1)
    lame_set_mode(lame, MONO)
    lame_set_in_samplerate(lame, 44100)
    lame_set_brate(lame, rate)
    lame_set_VBR(lame, vbr_off)
    lame_init_params(lame)

    DispatchQueue.global(qos: .default).async {
        while true {
            pcm = fopen(path, "rb")
            fseek(pcm, 4 * 1024 + total, SEEK_CUR)
            read = fread(pcmbuffer, MemoryLayout<Int16>.size, PCM_SIZE, pcm)
            if read != 0 {
                write = lame_encode_buffer(lame, pcmbuffer, nil, Int32(read), mp3buffer, MP3_SIZE)
                fwrite(mp3buffer, Int(write), 1, mp3)
                total += read * MemoryLayout<Int16>.size
                fclose(pcm)
            } else if !self.isMP3Active {
                _ = lame_encode_flush(lame, mp3buffer, MP3_SIZE)
                _ = fwrite(mp3buffer, Int(write), 1, mp3)
                break
            } else {
                fclose(pcm)
                usleep(50)
            }
        }
        lame_close(lame)
        fclose(mp3)
        fclose(pcm)
        self.filePathMP3 = mp3path
    }
}
func stopMP3Rec() {
    self.isMP3Active = false
}
The first time I run the app, I log the last format using
print("last format: \(audioEngine.inputNode.inputFormat(forBus: 0).sampleRate)")
It returns 0, and recording works normally.
The next time it returns 44100, and recording still works normally.
But after a WebRTC call I get 48000, and then it crashes on this line:
self.audioEngine.connect(self.audioEngine.inputNode, to: self.mixer, format: format)
I spent 4 hours on Stack Overflow but no solution worked for me.
I don't want the 48000 format, because when I set the sample rate to
sampleRate: audioEngine.inputNode.inputFormat(forBus: 0).sampleRate,
the output is hard to hear; I can barely recognize my voice :(
So I think 44100 is the best.
Can someone give me some advice? Thanks.
Here is the downsampling part, made concrete for your case:
let bus = 0
let inputNode = audioEngine.inputNode
let inputFormat = inputNode.outputFormat(forBus: bus)

let outputFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 44100, channels: 1, interleaved: true)!
let converter = AVAudioConverter(from: inputFormat, to: outputFormat)!

inputNode.installTap(onBus: bus, bufferSize: 1024, format: inputFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    var newBufferAvailable = true

    let inputCallback: AVAudioConverterInputBlock = { inNumPackets, outStatus in
        if newBufferAvailable {
            outStatus.pointee = .haveData
            newBufferAvailable = false
            return buffer
        } else {
            outStatus.pointee = .noDataNow
            return nil
        }
    }

    let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate))!

    var error: NSError?
    let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputCallback)

    // 44100 Hz buffer
    print(convertedBuffer.format)
}
This line is the bug:
let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, ...
AVAudioCommonFormat.pcmFormatInt16 does not work by default.
You should use .pcmFormatFloat32.
And the Xcode message makes it obvious; the crash is on this line:
self.audioEngine.connect(self.audioEngine.inputNode, to: self.mixer, format: format)
You can verify it by printing mixer.inputFormat(forBus: 0).
The actual device gives you a sample rate of 48000; you can get 44100 by converting.
Just use AVAudioConverter to downsample the audio buffer:
let input = engine.inputNode
let bus = 0
let inputFormat = input.outputFormat(forBus: bus)

guard let outputFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 44100, channels: 1, interleaved: true),
      let converter = AVAudioConverter(from: inputFormat, to: outputFormat) else {
    return
}

if let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate)) {
    var error: NSError?
    let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputCallback)
    assert(status != .error)
    print(convertedBuffer.format)
}
Only saw this in the iOS simulator:
I spent over an hour going down a rat hole on this. I had worked on a Logic audio session with headphones on (at 48K), then went over to the iOS simulator to work on my app's audio code and started getting this crash. I unplugged my headphones, and it still crashed. I rebooted the simulator, deleted the app from the simulator, restarted Xcode and the machine, and it still crashed.
Finally I went to System Preferences on my Mac, selected Sound > Input, and plugged in my headphones so it says "External Microphone".
I also set the simulator's audio input in its I/O settings to "Internal Microphone".
Now my app was able to start up in the simulator without crashing while trying to create an AKMicrophone()...
I tried the accepted answer but it didn't work for me.
I was able to fix it by declaring the audioEngine instance variable as an optional. Right before I need to monitor or record sound, I assign a new AVAudioEngine object to it.
When the recording session ends, I call audioEngine!.stop() and then set it to nil to deallocate the object.
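A minimal sketch of that lifecycle, assuming a hypothetical Recorder class that owns the engine (the class and method names are illustrative, not from the answer):

import AVFoundation

final class Recorder {
    // Optional so the engine can be torn down and recreated between sessions.
    private var audioEngine: AVAudioEngine?

    func startMonitoring() throws {
        let engine = AVAudioEngine()   // a fresh engine picks up the current hardware format
        audioEngine = engine
        let input = engine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024, format: input.inputFormat(forBus: 0)) { buffer, _ in
            // ... consume the buffer ...
        }
        try engine.start()
    }

    func stopMonitoring() {
        audioEngine?.inputNode.removeTap(onBus: 0)
        audioEngine?.stop()
        audioEngine = nil              // release the engine so the next session starts clean
    }
}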

Convert PCM Buffer to AAC ELD Format and vice versa

I'm having trouble converting a linear PCM buffer to a compressed AAC ELD (Enhanced Low Delay) buffer.
I got some working code for the conversion into ilbc format from this question:
AVAudioCompressedBuffer to UInt8 array and vice versa
This approach worked fine.
I changed the format definition to this:
let packetCapacity = 8
let maximumPacketSize = 96
lazy var capacity = packetCapacity * maximumPacketSize // 768

let convertedSampleRate: Double = 16000

lazy var aaceldFormat: AVAudioFormat = {
    var descriptor = AudioStreamBasicDescription(mSampleRate: convertedSampleRate, mFormatID: kAudioFormatMPEG4AAC_ELD, mFormatFlags: 0, mBytesPerPacket: 0, mFramesPerPacket: 0, mBytesPerFrame: 0, mChannelsPerFrame: 1, mBitsPerChannel: 0, mReserved: 0)
    return AVAudioFormat(streamDescription: &descriptor)!
}()
The conversion to a compressed buffer worked fine and I was able to convert the buffer to a UInt8 Array.
However, the conversion back to a PCM Buffer didn't work. The input block for the conversion back to a buffer looks like this:
func convertToBuffer(uints: [UInt8], outcomeSampleRate: Double) -> AVAudioPCMBuffer? {
    // Convert to buffer
    let compressedBuffer: AVAudioCompressedBuffer = AVAudioCompressedBuffer(format: aaceldFormat, packetCapacity: AVAudioPacketCount(packetCapacity), maximumPacketSize: maximumPacketSize)
    compressedBuffer.byteLength = UInt32(capacity)
    compressedBuffer.packetCount = AVAudioPacketCount(packetCapacity)

    var compressedBytes = uints
    compressedBytes.withUnsafeMutableBufferPointer {
        compressedBuffer.data.copyMemory(from: $0.baseAddress!, byteCount: capacity)
    }

    guard let audioFormat = AVAudioFormat(
        commonFormat: AVAudioCommonFormat.pcmFormatFloat32,
        sampleRate: outcomeSampleRate,
        channels: 1,
        interleaved: false
    ) else { return nil }

    guard let uncompressor = getUncompressingConverter(outputFormat: audioFormat) else { return nil }

    var newBufferAvailable = true
    let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
        if newBufferAvailable {
            outStatus.pointee = .haveData
            newBufferAvailable = false
            return compressedBuffer
        } else {
            outStatus.pointee = .noDataNow
            return nil
        }
    }

    guard let uncompressedBuffer: AVAudioPCMBuffer = AVAudioPCMBuffer(pcmFormat: audioFormat, frameCapacity: AVAudioFrameCount((audioFormat.sampleRate / 10))) else { return nil }

    var conversionError: NSError?
    uncompressor.convert(to: uncompressedBuffer, error: &conversionError, withInputFrom: inputBlock)
    if let err = conversionError {
        print("couldnt decompress compressed buffer", err)
    }

    return uncompressedBuffer
}
The error block after the convert method triggers and prints "too few bits left in input buffer". Also, it seems like the input block only gets called once.
I've tried different code variants and this seems to be one of the most common outcomes. I'm also not sure if the problem is in the initial conversion from the PCM buffer to the UInt8 array, although I do get a UInt8 array filled with 768 values every 0.1 seconds (sometimes the array contains a few zeros at the end, which doesn't happen with the iLBC format).
Questions:
1. Is the initial conversion from the PCM buffer to the UInt8 array done with the right approach? Are packetCapacity, capacity and maximumPacketSize valid? -> Again, this seems to work.
2. Am I missing something in the conversion back to a PCM buffer? Also, am I using the variables in the right way?
3. Has anyone achieved this conversion without using C in the project?
EDIT: I also worked with the approach from this post:
Decode AAC to PCM format using AVAudioConverter Swift
It works fine with AAC format, but not with AAC_LD or AAC_ELD

Swift AVAudioEngine: convert multichannel non-interleaved signal to single channel

I am using AVAudioEngine to take a measurement. I play a stimulus sound out of my interface, and use a micTap to record the returned signal.
I am now looking at different Audio Interfaces which support a multitude of different formats. I am converting the input format of the inputNode via a mixer for two different reasons:
to downsample from the interfaces' preferred sampleRate to the sampleRate at which my app is working
to convert the incoming channel count to a single mono channel
I try this; however, it does not always seem to work as expected. If my interface is running at 96k and my app is running at 48k, doing a format change via a mixer ends up with the following:
This looks like it is only getting one side of a stereo interleaved channel. Below is my audioEngine code:
func initializeEngine(inputSweep: SweepFilter) {
    buf1current = 0
    buf2current = 0
    in1StartTime = 0
    in2startTime = 0
    in1firstRun = true
    in2firstRun = true
    in1Buf = Array(repeating: 0, count: 1000000)
    in2Buf = Array(repeating: 0, count: 1000000)

    engine.stop()
    engine.reset()
    engine = AVAudioEngine()
    numberOfSamples = 0
    var time: Int = 0

    do {
        try AVAudioSession.sharedInstance().setCategory(.playAndRecord)
        try AVAudioSession.sharedInstance()
            .setPreferredSampleRate(Double(sampleRate))
    } catch {
        assertionFailure("AVAudioSession setup failed")
    }

    let format = engine.outputNode.inputFormat(forBus: 0)
    let stimulusFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                       sampleRate: Double(sampleRate),
                                       channels: 1,
                                       interleaved: false)

    let outputFormat = engine.outputNode.inputFormat(forBus: 0)
    let inputFormat = engine.inputNode.outputFormat(forBus: 0)
    let srcNode = AVAudioSourceNode { _, timeStamp, frameCount, audioBufferList -> OSStatus in
        let ablPointer = UnsafeMutableAudioBufferListPointer(audioBufferList)
        if self.in2firstRun == true {
            let start2 = CACurrentMediaTime()
            self.in2startTime = Double(CACurrentMediaTime())
            self.in2firstRun = false
        }
        if Int(frameCount) + time >= inputSweep.stimulus.count {
            self.running = false
            print("AUDIO ENGINE STOPPED")
        }
        if (Int(frameCount) + time) <= inputSweep.stimulus.count {
            for frame in 0..<Int(frameCount) {
                let value = inputSweep.stimulus[frame + time] * Float(outputVolume)
                for buffer in ablPointer {
                    let buf: UnsafeMutableBufferPointer<Float> = UnsafeMutableBufferPointer(buffer)
                    buf[frame] = value
                }
            }
            time += Int(frameCount)
        } else {
            for frame in 0..<Int(frameCount) {
                let value = 0
                for buffer in ablPointer {
                    let buf: UnsafeMutableBufferPointer<Float> = UnsafeMutableBufferPointer(buffer)
                    buf[frame] = Float(value)
                }
            }
        }
        return noErr
    }
    engine.attach(srcNode)

    engine.connect(srcNode, to: engine.mainMixerNode, format: stimulusFormat)
    engine.connect(engine.mainMixerNode, to: engine.outputNode, format: format)

    let requiredFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                       sampleRate: Double(sampleRate),
                                       channels: 1,
                                       interleaved: false)

    let formatMixer = AVAudioMixerNode()
    engine.attach(formatMixer)
    engine.connect(engine.inputNode, to: formatMixer, format: inputFormat)

    let MicSinkNode = AVAudioSinkNode() { (timeStamp, frames, audioBufferList) -> OSStatus in
        if self.in1firstRun == true {
            let start1 = CACurrentMediaTime()
            self.in1StartTime = Double(start1)
            self.in1firstRun = false
        }

        let ptr = audioBufferList.pointee.mBuffers.mData?.assumingMemoryBound(to: Float.self)

        var monoSamples = [Float]()
        monoSamples.append(contentsOf: UnsafeBufferPointer(start: ptr, count: Int(frames)))

        if self.buf1current >= 100000 {
            self.running = false
        }

        for frame in 0..<frames {
            self.in1Buf[self.buf1current + Int(frame)] = monoSamples[Int(frame)]
        }
        self.buf1current = self.buf1current + Int(frames)
        return noErr
    }

    engine.attach(MicSinkNode)
    engine.connect(formatMixer, to: MicSinkNode, format: requiredFormat)

    engine.prepare()

    assert(engine.inputNode != nil)
    running = true
    try! engine.start()
}
My sourceNode is fed from an array of Floats synthesised to match the stimulusFormat. If I listen to this audioEngine with my interface at 96k, the output stimulus sounds completely clean; however, this broken-up signal is what comes from the micTap. Physically, the output of the interface is routed directly to the input, so it does not go through any other device.
Further to this, I have the following function, which records my arrays to WAV files so that I can visually inspect them in a DAW.
func writetoFile(buff: [Float], name: String) {
    let SAMPLE_RATE = sampleRate

    let outputFormatSettings = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVLinearPCMBitDepthKey: 32,
        AVLinearPCMIsFloatKey: true,
        AVLinearPCMIsBigEndianKey: true,
        AVSampleRateKey: SAMPLE_RATE,
        AVNumberOfChannelsKey: 1
    ] as [String: Any]

    let fileName = name
    let DocumentDirURL = try! FileManager.default.url(for: .documentDirectory, in: .userDomainMask, appropriateFor: nil, create: true)
    let url = DocumentDirURL.appendingPathComponent(fileName).appendingPathExtension("wav")
    //print("FilePath: \(url.path)")

    let audioFile = try? AVAudioFile(forWriting: url, settings: outputFormatSettings, commonFormat: AVAudioCommonFormat.pcmFormatFloat32, interleaved: false)

    let bufferFormat = AVAudioFormat(settings: outputFormatSettings)
    let outputBuffer = AVAudioPCMBuffer(pcmFormat: bufferFormat!, frameCapacity: AVAudioFrameCount(buff.count))

    for i in 0..<buff.count {
        outputBuffer?.floatChannelData!.pointee[i] = Float(buff[i])
    }
    outputBuffer!.frameLength = AVAudioFrameCount(buff.count)

    do {
        try audioFile?.write(from: outputBuffer!)
    } catch let error as NSError {
        print("error:", error.localizedDescription)
    }
}
If I set my interface to 48k and my app is also working at 48k, and I inspect my reference signal and my measurement signal, I get the following:
The measured signal is clearly a lot longer than the original stimulus. The physical file size is the same, as it is initialised as an empty array of fixed size. However, at some point during the format conversion it goes wrong.
If I put my interface at 44.1k and my app runs at 48k, I can see regular 'glitches' in the audio. So the format conversion here is not working as it should.
Can anyone see anything obviously wrong?
Put the non-interleaved option AVLinearPCMIsNonInterleaved inside the format settings:
let outputFormatSettings = [
    AVLinearPCMIsNonInterleaved: 0,
    AVFormatIDKey: kAudioFormatLinearPCM,
    AVLinearPCMBitDepthKey: 32,
    AVLinearPCMIsFloatKey: true,
    AVLinearPCMIsBigEndianKey: true,
    AVSampleRateKey: SAMPLE_RATE,
    AVNumberOfChannelsKey: 1
] as [String: Any]
it worked for me, let me know

Get all sound frequencies of a WAV-file using Swift and AVFoundation

I would like to capture all frequencies between given timespans in a WAV file. The intent is to do some audio analysis in a later step. As a test, I've used the application "sox" to generate a 1-second-long WAV file which contains only a single tone at 13000 Hz. I want to read the file and find that frequency.
I'm using AVFoundation (which is important) to read the file. Since the input data is PCM, I need to use an FFT to get the actual frequencies, which I do using the Accelerate framework. However, I don't get the expected result (13000 Hz), but rather a lot of values I don't understand. I'm new to audio development, so any hint about where my code is failing is appreciated. The code includes a few comments where the issue occurs.
Thanks in advance!
Code:
import AVFoundation
import Accelerate

class Analyzer {
    // This function is implemented using the code from the following tutorial:
    // https://developer.apple.com/documentation/accelerate/vdsp/fast_fourier_transforms/finding_the_component_frequencies_in_a_composite_sine_wave
    func fftTransform(signal: [Float], n: vDSP_Length) -> [Int] {
        let observed: [DSPComplex] = stride(from: 0, to: Int(n), by: 2).map {
            return DSPComplex(real: signal[$0],
                              imag: signal[$0.advanced(by: 1)])
        }

        let halfN = Int(n / 2)
        var forwardInputReal = [Float](repeating: 0, count: halfN)
        var forwardInputImag = [Float](repeating: 0, count: halfN)
        var forwardInput = DSPSplitComplex(realp: &forwardInputReal,
                                           imagp: &forwardInputImag)

        vDSP_ctoz(observed, 2,
                  &forwardInput, 1,
                  vDSP_Length(halfN))

        let log2n = vDSP_Length(log2(Float(n)))

        guard let fftSetUp = vDSP_create_fftsetup(
            log2n,
            FFTRadix(kFFTRadix2)) else {
                fatalError("Can't create FFT setup.")
        }

        defer {
            vDSP_destroy_fftsetup(fftSetUp)
        }

        var forwardOutputReal = [Float](repeating: 0, count: halfN)
        var forwardOutputImag = [Float](repeating: 0, count: halfN)
        var forwardOutput = DSPSplitComplex(realp: &forwardOutputReal,
                                            imagp: &forwardOutputImag)

        vDSP_fft_zrop(fftSetUp,
                      &forwardInput, 1,
                      &forwardOutput, 1,
                      log2n,
                      FFTDirection(kFFTDirection_Forward))

        let componentFrequencies = forwardOutputImag.enumerated().filter {
            $0.element < -1
        }.map {
            return $0.offset
        }

        return componentFrequencies
    }
    func run() {
        // The frequencies array is an array of frequencies which is then converted to points on sine curves (signal)
        let n = vDSP_Length(4 * 4096)
        let frequencies: [Float] = [1, 5, 25, 30, 75, 100, 300, 500, 512, 1023]

        let tau: Float = .pi * 2
        let signal: [Float] = (0 ... n).map { index in
            frequencies.reduce(0) { accumulator, frequency in
                let normalizedIndex = Float(index) / Float(n)
                return accumulator + sin(normalizedIndex * frequency * tau)
            }
        }

        // These signals are then restored using the fftTransform function above, giving the exact same values as in the "frequencies" variable
        let frequenciesRestored = fftTransform(signal: signal, n: n).map({ Float($0) })
        assert(frequenciesRestored == frequencies)

        // Now I want to do the same thing, but reading the frequencies from a file (which includes a constant tone at 13000 Hz)
        let file = { PATH TO A WAV-FILE WITH A SINGLE TONE AT 13000Hz RUNNING FOR 1 SECOND }
        let asset = AVURLAsset(url: URL(fileURLWithPath: file))
        let track = asset.tracks[0]

        do {
            let reader = try AVAssetReader(asset: asset)
            let sampleRate = 48000.0

            let outputSettingsDict: [String: Any] = [
                AVFormatIDKey: kAudioFormatLinearPCM,
                AVSampleRateKey: Int(sampleRate),
                AVLinearPCMIsNonInterleaved: false,
                AVLinearPCMBitDepthKey: 16,
                AVLinearPCMIsFloatKey: false,
                AVLinearPCMIsBigEndianKey: false,
            ]

            let output = AVAssetReaderTrackOutput(track: track, outputSettings: outputSettingsDict)
            output.alwaysCopiesSampleData = false
            reader.add(output)
            reader.startReading()

            typealias audioBuffertType = Int16

            autoreleasepool {
                while (reader.status == .reading) {
                    if let sampleBuffer = output.copyNextSampleBuffer() {
                        var audioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
                        var blockBuffer: CMBlockBuffer?

                        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
                            sampleBuffer,
                            bufferListSizeNeededOut: nil,
                            bufferListOut: &audioBufferList,
                            bufferListSize: MemoryLayout<AudioBufferList>.size,
                            blockBufferAllocator: nil,
                            blockBufferMemoryAllocator: nil,
                            flags: kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
                            blockBufferOut: &blockBuffer
                        );

                        let buffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers, count: Int(audioBufferList.mNumberBuffers))

                        for buffer in buffers {
                            let samplesCount = Int(buffer.mDataByteSize) / MemoryLayout<audioBuffertType>.size
                            let samplesPointer = audioBufferList.mBuffers.mData!.bindMemory(to: audioBuffertType.self, capacity: samplesCount)
                            let samples = UnsafeMutableBufferPointer<audioBuffertType>(start: samplesPointer, count: samplesCount)

                            let myValues: [Float] = samples.map {
                                let value = Float($0)
                                return value
                            }

                            // Here I would expect my array to include multiple "13000" which is the frequency of the tone in my file
                            // I'm not sure what the variable 'n' does in this case, but changing it seems to change the result.
                            // The value should be twice as high as the highest measurable frequency (Nyquist frequency) (13000),
                            // but this crashes the application:
                            let mySignals = fftTransform(signal: myValues, n: vDSP_Length(2 * 13000))

                            assert(mySignals[0] == 13000)
                        }
                    }
                }
            }
        }
        catch {
            print("error!")
        }
    }
}
The test clip can be generated using:
sox -G -n -r 48000 ~/outputfile.wav synth 1.0 sine 13000
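As an aside on the comments in run() above (not from the original post): the indices returned by the tutorial-style transform are FFT bin numbers, and a bin number only equals a frequency in Hz when the number of samples matches the sample rate. A small sketch of the mapping, assuming n real samples recorded at sampleRate:

// Converts an FFT bin index to a frequency in Hz.
// Assumes the FFT was taken over `n` real samples recorded at `sampleRate`.
func binToHz(bin: Int, n: Int, sampleRate: Double) -> Double {
    return Double(bin) * sampleRate / Double(n)
}

// Example: with n = 16384 samples at 48000 Hz, a 13000 Hz tone lands
// near bin 13000 * 16384 / 48000 ≈ 4437, not at index 13000.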