I'm using TinySoundFont to use SF2 files on watchOS. I want to play the raw audio generated by the framework in real time (which means calling tsf_note_on as soon as the corresponding button is pressed and calling tsf_render_short as soon as new data is needed). I'm using an AVAudioSourceNode to achieve that.
Despite the sound rendering fine when I render it into a file, it's really noisy when played using the AVAudioSourceNode. (Based on the answer from Rob Napier, this might be because I ignore the timestamp property - I'm looking for a solution that addresses that concern.) What causes this issue and how can I fix it?
I'm looking for a solution that renders audio in real time rather than precalculating it, since I also want to handle looping sounds correctly.
You can download a sample GitHub project here.
ContentView.swift
import SwiftUI
import AVFoundation
struct ContentView: View {
@ObservedObject var settings = Settings.shared
init() {
settings.prepare()
}
var body: some View {
Button("Play Sound") {
Settings.shared.playSound()
if !settings.engine.isRunning {
do {
try settings.engine.start()
} catch {
print(error)
}
}
}
}
}
Settings.swift
import SwiftUI
import AVFoundation
class Settings: ObservableObject {
static let shared = Settings()
var engine: AVAudioEngine!
var sourceNode: AVAudioSourceNode!
var tinySoundFont: OpaquePointer!
func prepare() {
let soundFontPath = Bundle.main.path(forResource: "GMGSx", ofType: "sf2")
tinySoundFont = tsf_load_filename(soundFontPath)
tsf_set_output(tinySoundFont, TSF_MONO, 44100, 0)
setUpSound()
}
func setUpSound() {
if let engine = engine,
let sourceNode = sourceNode {
engine.detach(sourceNode)
}
engine = .init()
let mixerNode = engine.mainMixerNode
let audioFormat = AVAudioFormat(
commonFormat: .pcmFormatInt16,
sampleRate: 44100,
channels: 1,
interleaved: false
)
guard let audioFormat = audioFormat else {
return
}
sourceNode = AVAudioSourceNode(format: audioFormat) { silence, timeStamp, frameCount, audioBufferList in
guard let data = self.getSound(length: Int(frameCount)) else {
return 1
}
let ablPointer = UnsafeMutableAudioBufferListPointer(audioBufferList)
data.withUnsafeBytes { (intPointer: UnsafePointer<Int16>) in
for index in 0 ..< Int(frameCount) {
let value = intPointer[index]
// Set the same value on all channels (due to the inputFormat, there's only one channel though).
for buffer in ablPointer {
let buf: UnsafeMutableBufferPointer<Int16> = UnsafeMutableBufferPointer(buffer)
buf[index] = value
}
}
}
return noErr
}
engine.attach(sourceNode)
engine.connect(sourceNode, to: mixerNode, format: audioFormat)
do {
try AVAudioSession.sharedInstance().setCategory(.playback)
} catch {
print(error)
}
}
func playSound() {
tsf_note_on(tinySoundFont, 0, 60, 1)
}
func getSound(length: Int) -> Data? {
let array = [Int16]()
var storage = UnsafeMutablePointer<Int16>.allocate(capacity: length)
storage.initialize(from: array, count: length)
tsf_render_short(tinySoundFont, storage, Int32(length), 0)
let data = Data(bytes: storage, count: length)
storage.deallocate()
return data
}
}
The AVAudioSourceNode initializer takes a render block. In the mode you're using (live playback), this is a real-time callback, so you have a very tight deadline to fill the block with the requested data and return it so it can be played. You don't have a ton of time to do calculations. You definitely don't have time to access the filesystem.
In your block, you're re-computing an entire WAV every render cycle, then writing it to disk, then reading it from disk, then filling in the block that was requested. You ignore the timestamp requested, and always fill the buffer starting at sample zero. The mismatch is what's causing the buzzing. The fact that you're so slow about it is probably what's causing the pitch-drop.
Depending on the size of your files, the simplest way to implement this is to first decode everything into memory, and fill in the buffers for the timestamps and lengths requested. It looks like your C code already generates PCM data, so there's no need to convert it into a WAV file. It seems to already be in the right format.
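A minimal sketch of that idea, assuming the whole sound has already been rendered into an array of Float samples. `PreRenderedSource` and its `render(into:frameCount:)` method are hypothetical names, not part of TinySoundFont or AVFoundation. The point is only that each render cycle copies the next requested slice from memory and pads with silence, without allocating or touching the filesystem:

```swift
/// Holds fully rendered PCM samples in memory and hands them out in
/// render-sized chunks. (Hypothetical helper for illustration.)
struct PreRenderedSource {
    private let samples: [Float]
    private(set) var position = 0

    init(samples: [Float]) {
        self.samples = samples
    }

    /// Copies up to `frameCount` samples into `output`, starting at the
    /// current position, and zero-fills anything past the end of the data.
    /// Returns the number of real (non-silent) samples written.
    mutating func render(into output: UnsafeMutableBufferPointer<Float>,
                         frameCount: Int) -> Int {
        let wanted = min(frameCount, output.count)
        let available = min(wanted, samples.count - position)
        for i in 0..<available {
            output[i] = samples[position + i]
        }
        for i in available..<wanted {
            output[i] = 0 // silence once the source data runs out
        }
        position += available
        return available
    }
}
```

Inside an AVAudioSourceNode render block, the idea would be to call `render` with the node's output buffer and the requested frame count, so playback continues from where the previous cycle stopped instead of restarting at sample zero.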
Apple provides a good sample project for a Signal Generator that you should use as a starting point. Download that and make sure it works as expected. Then work to swap in your SF2 code. You may also find the video on this helpful: What’s New in AVAudioEngine.
The easiest tool to use here is probably an AVAudioPlayerNode. Your SoundFontHelper is making things much more complicated, so I've removed it and just call TSF directly from Swift. To do this, create a file called tsf.c as follows:
#define TSF_IMPLEMENTATION
#include "tsf.h"
And add it to BridgingHeader.h:
#import "tsf.h"
Simplify ContentView to this:
import SwiftUI
struct ContentView: View {
@ObservedObject var settings = Settings.shared
init() {
// You'll want error handling here.
try! settings.prepare()
}
var body: some View {
Button("Play Sound") {
settings.play()
}
}
}
And that leaves the new version of Settings, which is the meat of it:
import SwiftUI
import AVFoundation
class Settings: ObservableObject {
static let shared = Settings()
var engine = AVAudioEngine()
let playerNode = AVAudioPlayerNode()
var tsf: OpaquePointer
var outputFormat = AVAudioFormat()
init() {
let soundFontPath = Bundle.main.path(forResource: "GMGSx", ofType: "sf2")
tsf = tsf_load_filename(soundFontPath)
engine.attach(playerNode)
engine.connect(playerNode, to: engine.mainMixerNode, format: nil)
updateOutputFormat()
}
// For simplicity, this object assumes the outputFormat does not change during its lifetime.
// It's important to watch for route changes, and recreate this object if they occur. For details, see:
// https://developer.apple.com/documentation/avfaudio/avaudiosession/responding_to_audio_session_route_changes
func updateOutputFormat() {
outputFormat = engine.mainMixerNode.outputFormat(forBus: 0)
}
func prepare() throws {
// Start the engine
try AVAudioSession.sharedInstance().setCategory(.playback)
try engine.start()
playerNode.play()
updateOutputFormat()
// Configure TSF. The only important thing here is the sample rate, which can be different on different hardware.
// Core Audio has a defined format of "deinterleaved 32-bit floating point."
tsf_set_output(tsf,
TSF_STEREO_UNWEAVED, // mode
Int32(outputFormat.sampleRate), // sampleRate
0) // gain
}
func play() {
tsf_note_on(tsf,
0, // preset_index
60, // key (middle C)
1.0) // velocity
// These tones have a long falloff, so you want a lot of source data. This is 10s.
let frameCount = 10 * Int(outputFormat.sampleRate)
// Create a buffer for the samples
let buffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: AVAudioFrameCount(frameCount))!
buffer.frameLength = buffer.frameCapacity
// Render the samples. Do not mix. This buffer has been extended to
// the needed size by the assignment to `frameLength` above. The call to
// `assumingMemoryBound` is known to be correct because the format is Float32.
let ptr = buffer.audioBufferList.pointee.mBuffers.mData?.assumingMemoryBound(to: Float.self)
tsf_render_float(tsf,
ptr, // buffer
Int32(frameCount), // samples
0) // mixing (do not mix)
// All done. Play the buffer, interrupting whatever is currently playing
playerNode.scheduleBuffer(buffer, at: nil, options: .interrupts)
}
}
You can find the full version at my fork. You can also see the first commit, which is another approach that maintains your SoundFontHelper and does conversions to deal with it, but it's much simpler to just render the audio correctly in the first place.
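For readers who still have 16-bit samples (for example from tsf_render_short), here is a rough sketch of converting them to the Float32 format mentioned above. The function names are made up for illustration, and the interleaved stereo layout (L R L R ...) is an assumption about the input:

```swift
/// Converts signed 16-bit PCM samples to Float32 in the range [-1, 1).
func floatSamples(fromInt16 samples: [Int16]) -> [Float] {
    let scale = Float(Int16.max) + 1 // 32768
    return samples.map { Float($0) / scale }
}

/// Splits interleaved stereo samples (L R L R ...) into separate channel
/// arrays. A trailing unpaired sample is dropped.
func deinterleaveStereo(_ interleaved: [Float]) -> (left: [Float], right: [Float]) {
    var left = [Float]()
    var right = [Float]()
    left.reserveCapacity(interleaved.count / 2)
    right.reserveCapacity(interleaved.count / 2)
    for i in stride(from: 0, to: interleaved.count - 1, by: 2) {
        left.append(interleaved[i])
        right.append(interleaved[i + 1])
    }
    return (left, right)
}
```

Rendering with tsf_render_float in the first place, as in the code above, avoids this conversion step entirely.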
I've been unable to find any examples that walk through how to export audio that has been piped through AudioKit 5's filters.
I want to add some effects to the audio (reverb, etc.) and then export the filtered audio.
Here is my attempt:
// load empty file to write
let docs = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
let dstURL:URL = docs.appendingPathComponent("rendred.caf")
var audioFile:AVAudioFile!
do {
audioFile = try AVAudioFile(forReading: dstURL)
} catch {
Log("Could not load file")
}
// duration of audio after add some effects
let duration = self.conductor.player.duration
let rendredTime = Double(duration)
// Using AudioEngine to export the mixed audio, but it doesn't work <---- 🛑
do {
try? self.conductor.engine.renderToFile(audioFile, duration: rendredTime, prerender: {
self.conductor.player.play()
})
}
I'm not sure how the renderToFile() function works.
It's hard to answer without your replies to what Mark Jeschke asked you, but in general:
You can load the audio file and create a new reverb (CostelloReverb, for example).
Pass the loaded audio to the reverb as its input node.
Set the reverb as the AudioEngine output (again, it depends on what exactly you want to do).
If you want to blend the effect with the dry playback, you can use DryWetMixer.
A general example:
import Foundation
import AudioKit
import SoundpipeAudioKit

class AudioProcessing {
    let engine = AudioEngine()
    // Assumption: AudioPlayer stands in for "the way you want to load/play";
    // use whichever source node suits your project.
    let player = AudioPlayer()
    let reverb: CostelloReverb
    let dryWetMix: DryWetMixer

    init() {
        // Load your audio file into the player here.
        reverb = CostelloReverb(player)
        // Blend the dry playback with the reverberated signal.
        dryWetMix = DryWetMixer(player, reverb)
        engine.output = dryWetMix
    }
}
I've been trying to play audio data as it is received, but the implementation I found had flaws. I've modified it somewhat, but there are almost no examples of streaming live audio data. The issue occurs at the beginning of the streamed data. It plays a few buffers, and then, when I monitor how the queue is being called, the AudioOutputCallback method goes rogue: it is called properly the first few times, then is called more times than the number of buffers I have allocated (about 6 or more times). I've also tried calling the callback method manually; it gets called, but it does not cycle through the 3 allocated buffers.
Some specifications: the data arrives in chunks of 1000 bytes and has never arrived in any other size, so it's pretty constant. The Audio Queue is used only for playback of data streamed from the network. 3 buffers are allocated, as Apple suggests, and they are reused. This implementation is supposed to be a kind of circular buffer.
Here is my current implementation:
Initialization:
public func setupAudioQueuePlayback() {
incomingData = NSMutableData()
audioQueueBuffers = []
isBuffersUsed = []
var outputStreamDescription = createLPCMDescription()
AudioQueueNewOutputWithDispatchQueue(&audioPlaybackQueue, &outputStreamDescription, 0, playbackQueue, AudioQueueOutputCallback)
createAudioBuffers()
AudioQueueStart(audioPlaybackQueue!, nil)
}
CreateBuffers method:
fileprivate func createAudioBuffers() {
for _ in 0..<allocatedBuffersUponStart {
var audioBuffer: AudioQueueBufferRef? = nil
let osStatus = AudioQueueAllocateBuffer(audioPlaybackQueue!, UInt32(1000), &audioBuffer)
if osStatus != noErr {
print("Error allocating buffers!"); return
} else {
self.audioQueueBuffers.append(audioBuffer)
self.isBuffersUsed.append(false)
AudioQueueAllocateBuffer(audioPlaybackQueue!, UInt32(1000), &audioBuffer)
}
}
}
Where the data is entering method:
private func playFromData(_ data: Data) {
playbackGroup.enter()
playbackQueue.sync {
incomingData.append(data)
var bufferIndex = 0
while true {
if !isBuffersUsed[bufferIndex] {
isBuffersUsed[bufferIndex] = true
break
} else {
bufferIndex += 1
if bufferIndex >= allocatedBuffersUponStart {
bufferIndex = 0
}
}
}
currentIndexGlobal = bufferIndex
let bufferReference = audioQueueBuffers[bufferIndex]
bufferReference?.pointee.mAudioDataByteSize = UInt32(incomingData.length)
bufferReference?.pointee.mAudioData.advanced(by: 0).copyMemory(from: incomingData.bytes, byteCount: incomingData.length)
AudioQueueEnqueueBuffer(audioPlaybackQueue!, bufferReference!, 0, nil)
incomingData = NSMutableData()
playbackGroup.leave()
}
}
Callback Method (Not Global):
private func AudioQueueOutputCallback(aq: AudioQueueRef, buffer: AudioQueueBufferRef) {
for index in 0..<allocatedBuffersUponStart {
if isBuffersUsed[index] == true {
isBuffersUsed[index] = false
}
}
}
If you're wondering how it's all wired together:
func inSomeMethodOutsideThisFile() {
audioService = AudioService.shared
audioService.setupAudioQueuePlayback()
dataManager.subscribe(toTopic: "\(deviceId)/LineOutAudio", qoS: .messageDeliveryAttemptedAtLeastOnce) { (audioData) in
self.audioService.playFromData(audioData)
}
}
I've tried other ways but this way was the main way it all started.
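One detail that stands out in the code above is that the callback clears every "used" flag, not just the flag of the buffer the queue handed back, and that playFromData spins forever if no buffer is free. As a rough sketch of per-buffer bookkeeping (the `BufferPool` type is hypothetical, and it assumes the callback can map the returned AudioQueueBufferRef back to its index in audioQueueBuffers):

```swift
/// Tracks which of a fixed number of buffers are currently enqueued.
/// (Hypothetical helper; it stores only indices, not the buffers themselves.)
struct BufferPool {
    private var inUse: [Bool]

    init(count: Int) {
        inUse = Array(repeating: false, count: count)
    }

    /// Number of buffers that are currently free.
    var freeCount: Int {
        inUse.filter { !$0 }.count
    }

    /// Marks the first free buffer as used and returns its index,
    /// or nil if every buffer is still owned by the queue.
    mutating func acquire() -> Int? {
        guard let index = inUse.firstIndex(of: false) else { return nil }
        inUse[index] = true
        return index
    }

    /// Marks exactly one buffer as free again. Intended to be called from
    /// the output callback with the index of the buffer that just finished.
    mutating func release(_ index: Int) {
        guard inUse.indices.contains(index) else { return }
        inUse[index] = false
    }
}
```

With something like this, the enqueue path can keep incoming data pending when `acquire()` returns nil instead of looping, and the callback releases only the buffer it was given.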
I'm having trouble writing 24-bit WAV files with AVAudioEngine in Swift.
My input is an array of Float.
I have the audio format of the input file (retrieved with AVAudioFile).
So I need to convert my input Float array into values that can be written to the buffer. I also need to find the right channel data to write to.
My code works with 16-bit and 32-bit files, but I don't know how to handle 24-bit files...
Here it is:
//Static func to write audiofile
fileprivate func writeAudioFile(to outputURL : URL,
withFormat format : AVAudioFormat,
fromSamples music : [Float] )
{
var outputFormatSettings = format.settings
guard let bufferFormat = AVAudioFormat(settings: outputFormatSettings) else{
return
}
var audioFile : AVAudioFile?
do{
audioFile = try AVAudioFile(forWriting: outputURL,
settings: outputFormatSettings,
commonFormat: format.commonFormat,
interleaved: true)
} catch let error as NSError {
print("error:", error.localizedDescription)
}
let frameCount = music.count / Int(format.channelCount)
let outputBuffer = AVAudioPCMBuffer(pcmFormat: bufferFormat,
frameCapacity: AVAudioFrameCount(frameCount))
//We write the data in the right channel
guard let bitDepth = (outputFormatSettings["AVLinearPCMBitDepthKey"] as? Int) else {
return
}
switch bitDepth {
case 16:
for i in 0..<music.count {
var floatValue = music[i]
if(floatValue > 1){
floatValue = 1
}
if(floatValue < -1){
floatValue = -1
}
let value = floatValue * Float(Int16.max)
outputBuffer?.int16ChannelData!.pointee[i] = Int16(value)
}
case 24:
//Here I am not sure of what I do ... Could'nt find the right channel !
for i in 0..<music.count {
outputBuffer?.floatChannelData!.pointee[i] = music[i]
}
case 32:
for i in 0..<music.count {
outputBuffer?.floatChannelData!.pointee[i] = music[i]
}
default:
return
}
outputBuffer?.frameLength = AVAudioFrameCount( frameCount )
do{
try audioFile?.write(from: outputBuffer!)
} catch let error as NSError {
print("error:", error.localizedDescription)
return
}
}
Thanks in advance if anyone has an idea of how to handle this!
Representing a 24-bit int in C isn't fun, so in Swift I'm sure it's downright painful, and none of the APIs support it anyway. Your best bet is to convert to a more convenient format for processing.
AVAudioFile has two formats and an internal converter to convert between them. Its fileFormat represents the format of the file on disk, while its processingFormat represents the format of the LPCM data when it is read from the file, and the format of the LPCM data it will accept when being written to.
The typical workflow is to choose a standard processingFormat, do all of your processing in that format, and let AVAudioFile convert to and from the file format when reading from and writing to disk. All of the Audio Unit APIs accept non-interleaved formats, so I tend to use non-interleaved for all of my processing formats.
Here's an example that copies the first half of an audio file. It doesn't address your existing code, but illustrates a more common approach:
func halfCopy(src: URL, dst: URL) throws {
let srcFile = try AVAudioFile(forReading: src) //This opens the file for reading using the standard format (deinterleaved floating point).
let dstFile = try AVAudioFile(forWriting: dst,
settings: srcFile.fileFormat.settings,
commonFormat: srcFile.processingFormat.commonFormat,
interleaved: srcFile.processingFormat.isInterleaved) //AVAudioFile(forReading: src) always returns a non-interleaved processing format, this will be false
let frameCount = AVAudioFrameCount(srcFile.length) / 2 // Copying first half of file
guard let buffer = AVAudioPCMBuffer(pcmFormat: srcFile.processingFormat,
frameCapacity: frameCount) else {
fatalError("Derp")
}
try srcFile.read(into: buffer, frameCount: frameCount)
try dstFile.write(from: buffer)
}
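For comparison, this is roughly what packing 24-bit samples by hand would involve, which is the work AVAudioFile's internal converter takes off your hands. It is only a sketch: the function names are made up, and little-endian signed 24-bit PCM (the usual WAV layout) is assumed:

```swift
/// Packs one Float sample in [-1, 1] into three little-endian bytes
/// of signed 24-bit PCM. Out-of-range input is clamped.
func packInt24LE(_ sample: Float) -> [UInt8] {
    let clamped = max(-1, min(1, sample))
    let maxValue: Float = 8_388_607 // 2^23 - 1
    let value = Int32((clamped * maxValue).rounded())
    return [
        UInt8(truncatingIfNeeded: value),
        UInt8(truncatingIfNeeded: value >> 8),
        UInt8(truncatingIfNeeded: value >> 16),
    ]
}

/// Reads three little-endian bytes back into a sign-extended Int32.
func unpackInt24LE(_ bytes: [UInt8]) -> Int32 {
    precondition(bytes.count == 3, "expected exactly 3 bytes")
    let raw = Int32(bytes[0]) | (Int32(bytes[1]) << 8) | (Int32(bytes[2]) << 16)
    return (raw << 8) >> 8 // shift up, then arithmetic shift down to sign-extend
}
```

Since there is no native 24-bit integer type, every sample has to go through this byte shuffling, which is why a Float processing format is usually the more comfortable choice.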
For my project (being compiled as a framework) I have a file ops.metal:
kernel void add(device float *lhs [[buffer(0)]],
device float *rhs [[buffer(1)]],
device float *result [[buffer(2)]],
uint id [[ thread_position_in_grid ]])
{
result[id] = lhs[id] + rhs[id];
}
and the following Swift code:
@available(OSX 10.11, *)
public class MTLContext {
var device: MTLDevice!
var commandQueue:MTLCommandQueue!
var library:MTLLibrary!
var commandBuffer:MTLCommandBuffer
var commandEncoder:MTLComputeCommandEncoder
init() {
if let defaultDevice = MTLCreateSystemDefaultDevice() {
device = defaultDevice
print("device created")
} else {
print("Metal is not supported")
}
commandQueue = device.makeCommandQueue()
library = device.newDefaultLibrary()
if let defaultLibrary = device.newDefaultLibrary() {
library = defaultLibrary
} else {
print("could not load default library")
}
commandBuffer = commandQueue.makeCommandBuffer()
commandEncoder = commandBuffer.makeComputeCommandEncoder()
}
deinit {
commandEncoder.endEncoding()
}
}
When I try to create an instance of MTLContext in a unit test, the device is created, but the default library cannot be created ("could not load default library"). I've checked that the compiled framework has a default.metallib in Resources (a missing default.metallib is the most common reason given for newDefaultLibrary failing).
Unfortunately I haven't been able to find any working examples that create compute kernels in a Metal shader file (there are a few examples using the performance shaders, but they don't need to define kernels in a shader file).
Any suggestions would be greatly appreciated!
newDefaultLibrary() loads from the main bundle of the currently running application. It doesn't search any embedded frameworks or other locations for libraries.
If you want to use a metallib that was compiled into an embedded framework, the easiest thing to do is to get a reference to its containing Bundle and ask for the default library of that bundle instead:
let frameworkBundle = Bundle(for: SomeClassFromMyShaderFramework.self)
guard let defaultLibrary = try? device.makeDefaultLibrary(bundle: frameworkBundle) else {
fatalError("Could not load default library from specified bundle")
}
This does require that you have at least one publicly-visible class in the framework containing your shaders, but that can be as simple as declaring an empty class strictly for the purpose of doing the bundle look-up:
public class SomeClassFromMyShaderFramework {}
I am using the following to get the video sample buffer:
- (void) writeSampleBufferStream:(CMSampleBufferRef)sampleBuffer ofType:(NSString *)mediaType
Now my question is: how can I get H.264-encoded NSData from the above sampleBuffer? Please suggest.
Update for 2017:
You can now stream video and audio using the VideoToolbox API.
Read the documentation here: VTCompressionSession
Original answer (from 2013):
Short: You can't; the sample buffer you receive is uncompressed.
Methods to get hardware-accelerated H.264 compression:
AVAssetWriter
AVCaptureMovieFileOutput
As you can see, both write to a file. Writing to a pipe does not work, because the encoder updates header information after a frame or GOP has been fully written. So you'd better not touch the file while the encoder is writing to it, as it rewrites header information at unpredictable times. Without this header information the video file will not be playable (the encoder updates the size field, so the first header written says the file is 0 bytes). Writing directly to a memory area is currently not supported. But you can open the encoded video file and demux the stream to get at the H.264 data (after the encoder has closed the file, of course).
You can only get raw video images in either BGRA or YUV color formats from AVFoundation. However, when you write those frames to an mp4 via AVAssetWriter, they will be encoded using H.264.
A good example with code showing how to do that is RosyWriter.
Note that after each AVAssetWriter write, you will know that one complete H.264 NAL was written to the mp4. You could write code that reads a complete H.264 NAL after each write by AVAssetWriter, which gives you access to an H.264-encoded frame. It might take a while to get it right with decent speed, but it is doable (I did it successfully).
By the way, in order to successfully decode these encoded video frames, you will need the H.264 SPS and PPS information, which is located in a different place in the mp4 file. In my case, I created a couple of test mp4 files and then manually extracted those values. Since they don't change unless you change the H.264 encoding settings, you can use them in your code.
Check my post "SPS values for H 264 stream in iPhone" to see some of the SPS/PPS values I used in my code.
Just a final note: in my case I had to stream H.264-encoded frames to another endpoint for decoding and viewing, so my code had to do this fast. It was relatively fast, but eventually I switched to VP8 for encoding and decoding, because it was much faster: everything was done in memory without file reading or writing.
Good luck, and hopefully this info helps.
Use the VideoToolbox API. See: https://developer.apple.com/videos/play/wwdc2014/513/
import Foundation
import AVFoundation
import VideoToolbox
public class LiveStreamSession {
let compressionSession: VTCompressionSession
var index = -1
var lastInputPTS = CMTime.zero
public init?(width: Int32, height: Int32){
var compressionSessionOrNil: VTCompressionSession? = nil
let status = VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
width: width,
height: height,
codecType: kCMVideoCodecType_H264,
encoderSpecification: nil, // let the video toolbox choose a encoder
imageBufferAttributes: nil,
compressedDataAllocator: kCFAllocatorDefault,
outputCallback: nil,
refcon: nil,
compressionSessionOut: &compressionSessionOrNil)
guard status == noErr,
let compressionSession = compressionSessionOrNil else {
return nil
}
VTSessionSetProperty(compressionSession, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue);
VTCompressionSessionPrepareToEncodeFrames(compressionSession)
self.compressionSession = compressionSession
}
public func pushVideoBuffer(buffer: CMSampleBuffer) {
// image buffer
guard let imageBuffer = CMSampleBufferGetImageBuffer(buffer) else {
assertionFailure()
return
}
// pts
let pts = CMSampleBufferGetPresentationTimeStamp(buffer)
guard CMTIME_IS_VALID(pts) else {
assertionFailure()
return
}
// duration
var duration = CMSampleBufferGetDuration(buffer);
if CMTIME_IS_INVALID(duration) && CMTIME_IS_VALID(self.lastInputPTS) {
duration = CMTimeSubtract(pts, self.lastInputPTS)
}
index += 1
self.lastInputPTS = pts
print("[\(Date())]: pushVideoBuffer \(index)")
let currentIndex = index
VTCompressionSessionEncodeFrame(compressionSession, imageBuffer: imageBuffer, presentationTimeStamp: pts, duration: duration, frameProperties: nil, infoFlagsOut: nil) {[weak self] status, encodeInfoFlags, sampleBuffer in
print("[\(Date())]: compressed \(currentIndex)")
if let sampleBuffer = sampleBuffer {
self?.didEncodeFrameBuffer(buffer: sampleBuffer, id: currentIndex)
}
}
}
deinit {
VTCompressionSessionInvalidate(compressionSession)
}
private func didEncodeFrameBuffer(buffer: CMSampleBuffer, id: Int) {
guard let attachments = CMSampleBufferGetSampleAttachmentsArray(buffer, createIfNecessary: true)
else {
return
}
let dic = Unmanaged<CFDictionary>.fromOpaque(CFArrayGetValueAtIndex(attachments, 0)).takeUnretainedValue()
// passUnretained avoids leaking a retain on the constant key for every frame.
let keyframe = !CFDictionaryContainsKey(dic, Unmanaged.passUnretained(kCMSampleAttachmentKey_NotSync).toOpaque())
// print("[\(Date())]: didEncodeFrameBuffer \(id) is I frame: \(keyframe)")
if keyframe,
let formatDescription = CMSampleBufferGetFormatDescription(buffer) {
// https://www.slideshare.net/instinctools_EE_Labs/videostream-compression-in-ios
var number = 0
CMVideoFormatDescriptionGetH264ParameterSetAtIndex(formatDescription, parameterSetIndex: 0, parameterSetPointerOut: nil, parameterSetSizeOut: nil, parameterSetCountOut: &number, nalUnitHeaderLengthOut: nil)
// SPS and PPS and so on...
let parameterSets = NSMutableData()
for index in 0 ..< number { // half-open range, so a count of 0 does not trap
var parameterSetPointer: UnsafePointer<UInt8>?
var parameterSetLength = 0
CMVideoFormatDescriptionGetH264ParameterSetAtIndex(formatDescription, parameterSetIndex: index, parameterSetPointerOut: &parameterSetPointer, parameterSetSizeOut: &parameterSetLength, parameterSetCountOut: nil, nalUnitHeaderLengthOut: nil)
// parameterSets.append(startCode, length: startCodeLength)
if let parameterSetPointer = parameterSetPointer {
parameterSets.append(parameterSetPointer, length: parameterSetLength)
}
//
if index == 0 {
print("SPS is \(parameterSetPointer) with length \(parameterSetLength)")
} else if index == 1 {
print("PPS is \(parameterSetPointer) with length \(parameterSetLength)")
}
}
print("[\(Date())]: parameterSets \(parameterSets.length)")
}
}
}
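The commented-out startCode line above hints at the usual next step for streaming: the encoded sample data is normally laid out as length-prefixed NAL units (AVCC), while many decoders and network formats expect Annex B start codes. Here is a sketch of that conversion on a plain byte array. The function name is made up, and the 4-byte length prefix is an assumption; the real size should be taken from nalUnitHeaderLengthOut:

```swift
/// Rewrites length-prefixed (AVCC) NAL units as Annex B, replacing each
/// big-endian length field with a 00 00 00 01 start code.
/// Returns nil if the input is truncated or malformed.
func annexB(fromAVCC avcc: [UInt8], lengthSize: Int = 4) -> [UInt8]? {
    precondition((1...4).contains(lengthSize), "length prefix must be 1 to 4 bytes")
    let startCode: [UInt8] = [0, 0, 0, 1]
    var output = [UInt8]()
    output.reserveCapacity(avcc.count)
    var offset = 0

    while offset < avcc.count {
        // Read the big-endian NAL unit length.
        guard offset + lengthSize <= avcc.count else { return nil }
        var nalLength = 0
        for i in 0..<lengthSize {
            nalLength = (nalLength << 8) | Int(avcc[offset + i])
        }
        offset += lengthSize

        // Copy the NAL unit payload behind a start code.
        guard offset + nalLength <= avcc.count else { return nil }
        output += startCode
        output += avcc[offset..<(offset + nalLength)]
        offset += nalLength
    }
    return output
}
```

The SPS and PPS collected in `parameterSets` would each need the same start code in front of them, and are typically sent ahead of every keyframe.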