I'm currently building a game music player in Swift for Mac OS, using GME for generating sound buffers and SDL for audio playback. I have previously (and successfully) used SDL_QueueAudio for playback, but I need more fine tuned control over the buffer handling in order to determine song progress, song end etc. So, that leaves me with SDL_AudioCallback in the SDL_AudioSpec API.
I CAN hear the song play, but it's accompanied by noise and distortion (and sometimes echo depending on what I try).
Setup
var desiredSpec = SDL_AudioSpec()
desiredSpec.freq = Int32(44100)
desiredSpec.format = SDL_AudioFormat(AUDIO_U8)
desiredSpec.samples = 512
desiredSpec.channels = 2
desiredSpec.callback = { callback(userData: $0, stream: $1, length: $2) }
SDL_OpenAudio(&desiredSpec, nil)
Callback
func callback(userData: UnsafeMutableRawPointer?, stream: UnsafeMutablePointer<UInt8>?, length: Int32) {
// Half sample length to compensate for double byte Int16.
var output = Array<Int16>.init(repeating: 0, count: Int(length / 2))
// Generates stereo signed Int16 audio samples.
gme_play(emulator, Int32(output.count), &output)
// Converts Int16 samples to UInt8 samples expected by the mixer below. Then adds to buffer.
var buffer = [UInt8]()
for i in 0 ..< (output.count) {
let uIntVal = UInt16(bitPattern: output[i])
buffer.append(UInt8(uIntVal & 0xff))
buffer.append(UInt8((uIntVal >> 8) & 0xff))
}
SDL_MixAudio(stream, buffer, Uint32(length), SDL_MIX_MAXVOLUME)
}
SOLVED!
In the end this was enough to make it work:
desiredSpec.format = SDL_AudioFormat(AUDIO_S16)
var buffer = Array<Int16>(repeating: 0, count: Int(length / 2))
gme_play(emulator, Int32(output.count), &output)
SDL_memcpy(stream, buffer, Int(length))
Related
I am trying to figure out how to use Apple's Core Audio APIs to record and play back linear PCM audio without any file I/O. (The recording side seems to work just fine.)
The code I have is pretty short, and it works somewhat. However, I am having trouble with identifying the source of clicks and pops in the output. I've been beating my head against this for many days with no success.
I have posted a git repo here, with a command-line program program that shows where I'm at: https://github.com/maxharris9/AudioRecorderPlayerSwift/tree/main/AudioRecorderPlayerSwift
I put in a couple of functions to prepopulate the recording. The tone generator (makeWave) and noise generator (makeNoise) are just in here as debugging aids. I'm ultimately trying to identify the source of the messed up output when you play back a recording in audioData:
// makeWave(duration: 30.0, frequency: 441.0) // appends to `audioData`
// makeNoise(frameCount: Int(44100.0 * 30)) // appends to `audioData`
_ = Recorder() // appends to `audioData`
_ = Player() // reads from `audioData`
Here's the player code:
var lastIndexRead: Int = 0
func outputCallback(inUserData: UnsafeMutableRawPointer?, inAQ: AudioQueueRef, inBuffer: AudioQueueBufferRef) {
guard let player = inUserData?.assumingMemoryBound(to: Player.PlayingState.self) else {
print("missing user data in output callback")
return
}
let sliceStart = lastIndexRead
let sliceEnd = min(audioData.count, lastIndexRead + bufferByteSize - 1)
print("slice start:", sliceStart, "slice end:", sliceEnd, "audioData.count", audioData.count)
if sliceEnd >= audioData.count {
player.pointee.running = false
print("found end of audio data")
return
}
let slice = Array(audioData[sliceStart ..< sliceEnd])
let sliceCount = slice.count
// doesn't fix it
// audioData[sliceStart ..< sliceEnd].withUnsafeBytes {
// inBuffer.pointee.mAudioData.copyMemory(from: $0.baseAddress!, byteCount: Int(sliceCount))
// }
memcpy(inBuffer.pointee.mAudioData, slice, sliceCount)
inBuffer.pointee.mAudioDataByteSize = UInt32(sliceCount)
lastIndexRead += sliceCount + 1
// enqueue the buffer, or re-enqueue it if it's a used one
check(AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, nil))
}
struct Player {
struct PlayingState {
var packetPosition: UInt32 = 0
var running: Bool = false
var start: Int = 0
var end: Int = Int(bufferByteSize)
}
init() {
var playingState: PlayingState = PlayingState()
var queue: AudioQueueRef?
// this doesn't help
// check(AudioQueueNewOutput(&audioFormat, outputCallback, &playingState, CFRunLoopGetMain(), CFRunLoopMode.commonModes.rawValue, 0, &queue))
check(AudioQueueNewOutput(&audioFormat, outputCallback, &playingState, nil, nil, 0, &queue))
var buffers: [AudioQueueBufferRef?] = Array<AudioQueueBufferRef?>.init(repeating: nil, count: BUFFER_COUNT)
print("Playing\n")
playingState.running = true
for i in 0 ..< BUFFER_COUNT {
check(AudioQueueAllocateBuffer(queue!, UInt32(bufferByteSize), &buffers[i]))
outputCallback(inUserData: &playingState, inAQ: queue!, inBuffer: buffers[i]!)
if !playingState.running {
break
}
}
check(AudioQueueStart(queue!, nil))
repeat {
CFRunLoopRunInMode(CFRunLoopMode.defaultMode, BUFFER_DURATION, false)
} while playingState.running
// delay to ensure queue emits all buffered audio
CFRunLoopRunInMode(CFRunLoopMode.defaultMode, BUFFER_DURATION * Double(BUFFER_COUNT + 1), false)
check(AudioQueueStop(queue!, true))
check(AudioQueueDispose(queue!, true))
}
}
I captured the audio with Audio Hijack, and noticed that the jumps are indeed correlated with the size of the buffer:
Why is this happening, and what can I do to fix it?
I believe you were beginning to zero in on, or at least suspect, the cause of the popping you are hearing: it's caused by discontinuities in your waveform.
My initial hunch was that you were generating the buffers independently (i.e. assuming that each buffer starts at time=0), but I checked out your code and it wasn't that. I suspect some of the calculations in makeWave were at fault. To check this theory I replaced your makeWave with the following:
func makeWave(offset: Double, numSamples: Int, sampleRate: Float64, frequency: Float64, numChannels: Int) -> [Int16] {
var data = [Int16]()
for sample in 0..<numSamples / numChannels {
// time in s
let t = offset + Double(sample) / sampleRate
let value = Double(Int16.max) * sin(2 * Double.pi * frequency * t)
for _ in 0..<numChannels {
data.append(Int16(value))
}
}
return data
}
This function removes the double loop in the original, accepts an offset so it knows which part of the wave is being generated and makes some changes to the sampling of the sine wave.
When Player is modified to use this function you get a lovely steady tone. I'll add the changes to player soon. I can't in good conscience show the quick and dirty mess it is now to the public.
Based on your comments below I refocused on your player. The issue was that the audio buffers expect byte counts but the slice count and some other calculations were based on Int16 counts. The following version of outputCallback will fix it. Concentrate on the use of the new variable bytesPerChannel.
func outputCallback(inUserData: UnsafeMutableRawPointer?, inAQ: AudioQueueRef, inBuffer: AudioQueueBufferRef) {
guard let player = inUserData?.assumingMemoryBound(to: Player.PlayingState.self) else {
print("missing user data in output callback")
return
}
let bytesPerChannel = MemoryLayout<Int16>.size
let sliceStart = lastIndexRead
let sliceEnd = min(audioData.count, lastIndexRead + bufferByteSize/bytesPerChannel)
if sliceEnd >= audioData.count {
player.pointee.running = false
print("found end of audio data")
return
}
let slice = Array(audioData[sliceStart ..< sliceEnd])
let sliceCount = slice.count
print("slice start:", sliceStart, "slice end:", sliceEnd, "audioData.count", audioData.count, "slice count:", sliceCount)
// need to be careful to convert from counts of Ints to bytes
memcpy(inBuffer.pointee.mAudioData, slice, sliceCount*bytesPerChannel)
inBuffer.pointee.mAudioDataByteSize = UInt32(sliceCount*bytesPerChannel)
lastIndexRead += sliceCount
// enqueue the buffer, or re-enqueue it if it's a used one
check(AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, nil))
}
I did not look at the Recorder code, but you may want to check if the same sort of error crept in there.
I'm having trouble converting a linear PCM buffer to a compressed AAC ELD (Enhanced Low Delay) buffer.
I got some working code for the conversion into ilbc format from this question:
AVAudioCompressedBuffer to UInt8 array and vice versa
This approach worked fine.
I changed the input for the format to this:
let packetCapacity = 8
let maximumPacketSize = 96
lazy var capacity = packetCapacity * maximumPacketSize // 768
let convertedSampleRate: Double = 16000
lazy var aaceldFormat: AVAudioFormat = {
var descriptor = AudioStreamBasicDescription(mSampleRate: convertedSampleRate, mFormatID: kAudioFormatMPEG4AAC_ELD, mFormatFlags: 0, mBytesPerPacket: 0, mFramesPerPacket: 0, mBytesPerFrame: 0, mChannelsPerFrame: 1, mBitsPerChannel: 0, mReserved: 0)
return AVAudioFormat(streamDescription: &descriptor)!
}()
The conversion to a compressed buffer worked fine and I was able to convert the buffer to a UInt8 Array.
However, the conversion back to a PCM Buffer didn't work. The input block for the conversion back to a buffer looks like this:
func convertToBuffer(uints: [UInt8], outcomeSampleRate: Double) -> AVAudioPCMBuffer? {
// Convert to buffer
let compressedBuffer: AVAudioCompressedBuffer = AVAudioCompressedBuffer(format: aaceldFormat, packetCapacity: AVAudioPacketCount(packetCapacity), maximumPacketSize: maximumPacketSize)
compressedBuffer.byteLength = UInt32(capacity)
compressedBuffer.packetCount = AVAudioPacketCount(packetCapacity)
var compressedBytes = uints
compressedBytes.withUnsafeMutableBufferPointer {
compressedBuffer.data.copyMemory(from: $0.baseAddress!, byteCount: capacity)
}
guard let audioFormat = AVAudioFormat(
commonFormat: AVAudioCommonFormat.pcmFormatFloat32,
sampleRate: outcomeSampleRate,
channels: 1,
interleaved: false
) else { return nil }
guard let uncompressor = getUncompressingConverter(outputFormat: audioFormat) else { return nil }
var newBufferAvailable = true
let inputBlock : AVAudioConverterInputBlock = {
inNumPackets, outStatus in
if newBufferAvailable {
outStatus.pointee = .haveData
newBufferAvailable = false
return compressedBuffer
} else {
outStatus.pointee = .noDataNow
return nil
}
}
guard let uncompressedBuffer: AVAudioPCMBuffer = AVAudioPCMBuffer(pcmFormat: audioFormat, frameCapacity: AVAudioFrameCount((audioFormat.sampleRate / 10))) else { return nil }
var conversionError: NSError?
uncompressor.convert(to: uncompressedBuffer, error: &conversionError, withInputFrom: inputBlock)
if let err = conversionError {
print("couldnt decompress compressed buffer", err)
}
return uncompressedBuffer
}
The error block after the convert method triggers and prints out "too few bits left in input buffer". Also, it seems like the input block only gets called once.
I've tried different codes and this seems to be one of the most common outcomes. I'm also not sure if the problem is in the initial conversion from the pcm buffer to uint8 array although I get an UInt8 Array filled with 768 values every 0.1 seconds (Sometimes the array contains a few zeros at the end, which doesn't happen in ilbc format.
Questions:
1. Is the initial conversion from pcm buffer to uint8 array done with the right approach? Are the packetCapacity, capacity and maximumPacketSize valid? -> Again, seems to work
2. Am I missing something at the conversion back to pcm buffer? Also, am I using the variables in the right way?
3. Has anyone achieved this conversion without using C in the project?
** EDIT: ** I also worked with the approach from this post:
Decode AAC to PCM format using AVAudioConverter Swift
It works fine with AAC format, but not with AAC_LD or AAC_ELD
Context
I am writing a signal interpreter using AVAudioEngine which will analyse microphone input. During development, I want to use a default input buffer so I don't have to make noises for the microphone to test my changes.
I am developing using Catalyst.
Problem
I am using AVAudioSinkNode to get the sound buffer (the performance is allegedly better than using .installTap). I am using (a subclass of) AVAudioSourceNode to generate a sine wave. When I connect these two together, I expect the sink node's callback to be called, but it is not. Neither is the source node's render block called.
let engine = AVAudioEngine()
let output = engine.outputNode
let outputFormat = output.inputFormat(forBus: 0)
let sampleRate = Float(outputFormat.sampleRate)
let sineNode440 = AVSineWaveSourceNode(
frequency: 440,
amplitude: 1,
sampleRate: sampleRate
)
let sink = AVAudioSinkNode { _, frameCount, audioBufferList -> OSStatus in
print("[SINK] + \(frameCount) \(Date().timeIntervalSince1970)")
return noErr
}
engine.attach(sineNode440)
engine.attach(sink)
engine.connect(sineNode440, to: sink, format: nil)
try engine.start()
Additional tests
If I connect engine.inputNode to the sink (i.e., engine.connect(engine.inputNode, to: sink, format: nil)), the sink callback is called as expected.
When I connect sineNode440 to engine.outputNode, I can hear the sound and the render block is called as expected.
So both the source and the sink work individually when connected to device input/output, but not together.
AVSineWaveSourceNode
Not important to the question but relevant: AVSineWaveSourceNode is based on Apple sample code. This node produces the correct sound when connected to engine.outputNode.
class AVSineWaveSourceNode: AVAudioSourceNode {
/// We need this separate class to be able to inject the state in the render block.
class State {
let amplitude: Float
let phaseIncrement: Float
var phase: Float = 0
init(frequency: Float, amplitude: Float, sampleRate: Float) {
self.amplitude = amplitude
phaseIncrement = (2 * .pi / sampleRate) * frequency
}
}
let state: State
init(frequency: Float, amplitude: Float, sampleRate: Float) {
let state = State(
frequency: frequency,
amplitude: amplitude,
sampleRate: sampleRate
)
self.state = state
let format = AVAudioFormat(standardFormatWithSampleRate: Double(sampleRate), channels: 1)!
super.init(format: format, renderBlock: { isSilence, _, frameCount, audioBufferList -> OSStatus in
print("[SINE GENERATION \(frequency) - \(frameCount)]")
let tau = 2 * Float.pi
let ablPointer = UnsafeMutableAudioBufferListPointer(audioBufferList)
for frame in 0..<Int(frameCount) {
// Get signal value for this frame at time.
let value = sin(state.phase) * amplitude
// Advance the phase for the next frame.
state.phase += state.phaseIncrement
if state.phase >= tau {
state.phase -= tau
}
if state.phase < 0.0 {
state.phase += tau
}
// Set the same value on all channels (due to the inputFormat we have only 1 channel though).
for buffer in ablPointer {
let buf: UnsafeMutableBufferPointer<Float> = UnsafeMutableBufferPointer(buffer)
buf[frame] = value
}
}
return noErr
})
for i in 0..<self.numberOfInputs {
print("[SINEWAVE \(frequency)] BUS \(i) input format: \(self.inputFormat(forBus: i))")
}
for i in 0..<self.numberOfOutputs {
print("[SINEWAVE \(frequency)] BUS \(i) output format: \(self.outputFormat(forBus: i))")
}
}
}
outputNode drives the audio processing graph when AVAudioEngine is configured normally ("online"). outputNode pulls audio from its input node, which pulls audio from its input node(s), etc. When you connect sineNode and sink to each other without making a connection to outputNode, there is nothing attached to an output bus of sink or an input bus of outputNode, and therefore when the hardware asks for audio from outputNode it has nowhere to get it.
If I understand correctly I think you can accomplish what you'd like to do by getting rid of sink, connecting sineNode to outputNode, and running AVAudioEngine in manual rendering mode. In manual rendering mode you pass a manual render block to receive audio (similar to AVAudioSinkNode) and drive the graph manually by calling renderOffline(_:to:).
I am writing an application that needs to detect a frequency in the audio stream. I have read about a million articles and am having problems crossing the finish line. I have my audio data coming to me in this function via the AVFoundation Framework from Apple.
I am using Swift 4.2 and have tried playing with the FFT functions, but they are a little over my head at the current moment.
Any thoughts?
// get's the data as a call back for the AVFoundation framework.
public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
// prints the whole sample buffer and tells us alot of information about what's inside
print(sampleBuffer);
// create a buffer, ready out the data, and use the CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer method to put
// it into a buffer
var buffer: CMBlockBuffer? = nil
var audioBufferList = AudioBufferList(mNumberBuffers: 1,
mBuffers: AudioBuffer(mNumberChannels: 1, mDataByteSize: 0, mData: nil))
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, bufferListSizeNeededOut: nil, bufferListOut: &audioBufferList, bufferListSize: MemoryLayout<AudioBufferList>.size, blockBufferAllocator: nil, blockBufferMemoryAllocator: nil, flags: UInt32(kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment), blockBufferOut: &buffer);
let abl = UnsafeMutableAudioBufferListPointer(&audioBufferList)
var sum:Int64 = 0
var count:Int = 0
var bufs:Int = 0
var max:Int64 = 0;
var min:Int64 = 0
// loop through the samples and check for min's and maxes.
for buff in abl {
let samples = UnsafeMutableBufferPointer<Int16>(start: UnsafeMutablePointer(OpaquePointer(buff.mData)),
count: Int(buff.mDataByteSize)/MemoryLayout<Int16>.size)
for sample in samples {
let s = Int64(sample)
sum = (sum + s*s)
count += 1
if(s > max) {
max = s;
}
if(s < min) {
min = s;
}
print(sample)
}
bufs += 1
}
// debug
print("min - \(min), max = \(max)");
// update the interface
DispatchQueue.main.async {
self.frequencyDataOutLabel.text = "min - \(min), max = \(max)";
}
// stop the capture session
self.captureSession.stopRunning();
}
After much research I found that the answer is to use an FFT method (Fast Fourier Transform). It takes the raw input from the iPhone's code above and converts it into an array of values representing magnitude of each frequency in bands.
Much props to the open code here https://github.com/jscalo/tempi-fft that created a visualizer that captures the data, and displays it. From there, it was a matter of manipulating it to meet the needs. In my case I was looking for frequencies very high above human hearing (20kHz range). By scanning the later half of the array in the tempi-fft code I was able to determine if frequencies I was looking for were present and loud enough.
I updated to latest version of Audiokit 4.5
and my Audiokit class that is meant to listen to microphone amplitude is now printing: 55: EXCEPTION (-1): "" infinitely on the console. the app doesnt crash or anything.. but is logging that.
My app is a video camera app that records using GPUImage library.
The logs appear only when I start recording for some reason.
In addition to that. My onAmplitudeUpdate callback method no longer outputs anything, just 0.0 values. This didnt happen before updating Audiokit. Any ideas here ?
Here is my class:
// G8Audiokit.swift
// GenerateToolkit
//
// Created by Omar Juarez Ortiz on 2017-08-03.
// Copyright © 2017 All rights reserved.
//
import Foundation
import AudioKit
class G8Audiokit{
//Variables for Audio audioAnalysis
var microphone: AKMicrophone! // Device Microphone
var amplitudeTracker: AKAmplitudeTracker! // Tracks the amplitude of the microphone
var signalBooster: AKBooster! // boosts the signal
var audioAnalysisTimer: Timer? // Continuously calls audioAnalysis function
let amplitudeBuffSize = 10 // Smaller buffer will yield more amplitude responiveness and instability, higher value will respond slower but will be smoother
var amplitudeBuffer: [Double] // This stores a rolling window of amplitude values, used to get average amplitude
public var onAmplitudeUpdate: ((_ value: Float) -> ())?
static let sharedInstance = G8Audiokit()
private init(){ //private because that way the class can only be initialized by itself.
self.amplitudeBuffer = [Double](repeating: 0.0, count: amplitudeBuffSize)
startAudioAnalysis()
}
// public override init() {
// // Initialize the audio buffer with zeros
//
// }
/**
Set up AudioKit Processing Pipeline and start the audio analysis.
*/
func startAudioAnalysis(){
stopAudioAnalysis()
// Settings
AKSettings.bufferLength = .medium // Set's the audio signal buffer size
do {
try AKSettings.setSession(category: .playAndRecord)
} catch {
AKLog("Could not set session category.")
}
// ----------------
// Input + Pipeline
// Initialize the built-in Microphone
microphone = AKMicrophone()
// Pre-processing
signalBooster = AKBooster(microphone)
signalBooster.gain = 5.0 // When video recording starts, the signal gets boosted to the equivalent of 5.0, so we're setting it to 5.0 here and changing it to 1.0 when we start video recording.
// Filter out anything outside human voice range
let highPass = AKHighPassFilter(signalBooster, cutoffFrequency: 55) // Lowered this a bit to be more sensitive to bass-drums
let lowPass = AKLowPassFilter(highPass, cutoffFrequency: 255)
// At this point you don't have much signal left, so you balance it against the original signal!
let rebalanced = AKBalancer(lowPass, comparator: signalBooster)
// Track the amplitude of the rebalanced signal, we use this value for audio reactivity
amplitudeTracker = AKAmplitudeTracker(rebalanced)
// Mute the audio that gets routed to the device output, preventing feedback
let silence = AKBooster(amplitudeTracker, gain:0)
// We need to complete the chain, routing silenced audio to the output
AudioKit.output = silence
// Start the chain and timer callback
do{ try AudioKit.start(); }
catch{}
audioAnalysisTimer = Timer.scheduledTimer(timeInterval: 0.01,
target: self,
selector: #selector(audioAnalysis),
userInfo: nil,
repeats: true)
// Put the timer on the main thread so UI updates don't interrupt
RunLoop.main.add(audioAnalysisTimer!, forMode: RunLoopMode.commonModes)
}
// Call this when closing the app or going to background
public func stopAudioAnalysis(){
audioAnalysisTimer?.invalidate()
AudioKit.disconnectAllInputs() // Disconnect all AudioKit components, so they can be relinked when we call startAudioAnalysis()
}
// This is called on the audioAnalysisTimer
#objc func audioAnalysis(){
writeToBuffer(val: amplitudeTracker.amplitude) // Write an amplitude value to the rolling buffer
let val = getBufferAverage()
onAmplitudeUpdate?(Float(val))
}
// Writes amplitude values to a rolling window buffer, writes to index 0 and pushes the previous values to the right, removes the last value to preserve buffer length.
func writeToBuffer(val: Double) {
for (index, _) in amplitudeBuffer.enumerated() {
if (index == 0) {
amplitudeBuffer.insert(val, at: 0)
_ = amplitudeBuffer.popLast()
}
else if (index < amplitudeBuffer.count-1) {
amplitudeBuffer.rearrange(from: index-1, to: index+1)
}
}
}
// Returns the average of the amplitudeBuffer, resulting in a smoother audio reactivity signal
func getBufferAverage() -> Double {
var avg:Double = 0.0
for val in amplitudeBuffer {
avg = avg + val
}
avg = avg / amplitudeBuffer.count
return avg
}
}