I have an audio file with 5.1 channels. How do I get access to a buffer containing all of this information during playback?
The setup is roughly like this
engine.attach(playerNode)
engine.connect(playerNode, to: engine.mainMixerNode, format: audioFile.processingFormat)
engine.prepare()
try? engine.start()
File loading and playback is pretty standard, I think.
guard let audioFile = try? AVAudioFile(forReading: url, commonFormat: .pcmFormatFloat32, interleaved: false) else { return }
// ...
playerNode.scheduleFile(audioFile, at: nil)
Then I used a tap on the bus to get access to the buffer.
let format: AVAudioFormat = engine.mainMixerNode.outputFormat(forBus: 0)
engine.mainMixerNode.installTap(
    onBus: 0,
    bufferSize: 1024,
    format: format
) { buffer, time in
    // Do something cool with buffer, but buffer only has two channels.
    for channel in 0..<buffer.format.channelCount {
    }
}
Questions:
Is a tap on the bus the right way to get hold of this data? Or is there a better way?
How do I get at the data for more than two channels?
Install your tap on the playerNode and don't bother specifying the format:
playerNode.installTap(
    onBus: 0,
    bufferSize: 1024,
    format: nil
) { buffer, time in
    // 6 channel buffer
    for channel in 0..<buffer.format.channelCount {
    }
}
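Inside the tap you can then walk the channels through floatChannelData. A minimal sketch, assuming the deinterleaved Float32 buffers the player delivers here:
playerNode.installTap(onBus: 0, bufferSize: 1024, format: nil) { buffer, time in
    guard let channelData = buffer.floatChannelData else { return }
    let frameCount = Int(buffer.frameLength)
    for channel in 0..<Int(buffer.format.channelCount) {
        // One contiguous plane of samples per channel in a deinterleaved buffer
        let samples = UnsafeBufferPointer(start: channelData[channel], count: frameCount)
        let peak = samples.reduce(0) { max($0, abs($1)) }
        print("channel \(channel): peak \(peak)")
    }
}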
Related
We are working on a project which records voice from an external microphone. For analysis purposes, we need a sample rate of about 5 kHz.
We are using AVAudioEngine to record the voice.
We know Apple devices can't record at such an arbitrary rate, so we are using AVAudioConverter to downsample.
But, as you know, this is similar to compression: the more we reduce the sample rate, the more the file size and the file duration are reduced along with it. That is what is currently happening (correct me if I am wrong about this).
Issue
The issue is that downgrading the sample rate shortens the file length, which throws off our calculations and analysis.
For example, a 1-hour recording came out as 45 minutes after downsampling. So if we run our analysis over 5-minute intervals, it goes wrong.
What would be the best solution for this?
Query
We have searched the internet but could not figure out how the bufferSize parameter of installTap affects the result. In the current code, we have set it to 2688.
Can anyone clarify?
Code
let bus = 0
let inputNode = engine.inputNode
let equalizer = AVAudioUnitEQ(numberOfBands: 2)
equalizer.bands[0].filterType = .lowPass
equalizer.bands[0].frequency = 3000
equalizer.bands[0].bypass = false
equalizer.bands[1].filterType = .highPass
equalizer.bands[1].frequency = 1000
equalizer.bands[1].bypass = false
engine.attach(equalizer) //Attach equalizer
// Connect nodes
engine.connect(inputNode, to: equalizer, format: inputNode.inputFormat(forBus: 0))
engine.connect(equalizer, to: engine.mainMixerNode, format: inputNode.inputFormat(forBus: 0))
// call before creating converter because this changes the mainMixer's output format
engine.prepare()
let outputFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
sampleRate: 5000,
channels: 1,
interleaved: false)!
// Downsampling converter
guard let converter: AVAudioConverter = AVAudioConverter(from: engine.mainMixerNode.outputFormat(forBus: 0), to: outputFormat) else {
print("Can't convert in to this format")
return
}
engine.mainMixerNode.installTap(onBus: bus, bufferSize: 2688, format: nil) { (buffer, time) in
    var newBufferAvailable = true
    let inputCallback: AVAudioConverterInputBlock = { inNumPackets, outStatus in
        if newBufferAvailable {
            outStatus.pointee = .haveData
            newBufferAvailable = false
            return buffer
        } else {
            outStatus.pointee = .noDataNow
            return nil
        }
    }
    let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate))!
    var error: NSError?
    let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputCallback)
    assert(status != .error)
    if status == .haveData {
        // Process with converted buffer
    }
}
do {
    try engine.start()
} catch {
    print("Can't start the engine: \(error)")
}
Expected Result
We are fine with the compression of the buffer, but we would like the output file to have the same recording duration. If we record for 10 minutes, the output file should contain 10 minutes of data.
Digitized audio doesn't have an intrinsic duration since it can be played back at any sample rate.
In order for the resulting file's duration to be what you expect, the sample rates have to be what you expect at each stage: Recording, processing, and playback.
I suspect that one of two possible things is happening:
A) The sample rate of the buffer you receive inside installTap is not what you assumed it would be, and you are converting from the wrong format.
B) You are playing back your audio at a sample rate that is different from what you assume it is. (How do you know that your player is playing at 5000 Hz?)
In order to check this, you would have to break the process down into smaller pieces and check the sample rate at each stage.
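For example, something along these lines (a sketch; converter, buffer and convertedBuffer are the names from the code above, and duration is just frames divided by sample rate):
// What the hardware / mixer actually delivers to the tap
print("input:", engine.inputNode.outputFormat(forBus: 0).sampleRate)
print("mixer:", engine.mainMixerNode.outputFormat(forBus: 0).sampleRate)
// What the converter believes it is converting between
print("converter in:", converter.inputFormat.sampleRate)
print("converter out:", converter.outputFormat.sampleRate)
// Inside the tap: seconds contributed by each buffer, before and after conversion.
// The two numbers should match if no audio is being dropped.
let inSeconds = Double(buffer.frameLength) / buffer.format.sampleRate
let outSeconds = Double(convertedBuffer.frameLength) / convertedBuffer.format.sampleRate
print("in \(inSeconds) s, out \(outSeconds) s")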
So I've been trying to do speech recognition in Swift using the built-in SFSpeechRecognizer class while also downsampling and then recording the audio to a file, but I'm not well-versed enough in AVAudioEngine to figure it out.
I've gotten the Speech Recognition to work by itself, and I've gotten the recording the audio to work by itself, but I can't get them to work together.
Here's my existing code in which I try to record - the remaining code is just the standard speech recognition type stuff:
let audioFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 16000, channels: 1, interleaved: false)
let mixer = AVAudioMixerNode()
audioEngine.attach(mixer)
audioEngine.connect(inputNode!, to: mixer, format: inputNode!.inputFormat(forBus: 0))
// 1 Connecting Mixer
audioEngine.connect(mixer, to: audioEngine.outputNode, format: audioFormat)
// 2 Recognition
inputNode!.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
    print("Testing if this tap works")
    self.recognitionRequest?.append(buffer)
}
// 3 Downsampling and recording
mixer.installTap(onBus: 0, bufferSize: 1024, format: audioFormat) { (buffer, when) in
    print(buffer)
    try? self.outputFile!.write(from: buffer)
}
If I comment out 3, then the speech recognition works, but otherwise 2 doesn't even run - the tap doesn't output anything. I also can't put the recognitionRequest in 3, because then the speech recognition throws an error. I see in the docs that each bus can only have one tap - how can I get around this? Should I use an AVAudioConnectionPoint? I don't see it well documented in the docs.
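One workaround that respects the one-tap-per-bus rule (untested against this exact setup; the node names follow the snippet above) is to fan the input out to two mixer nodes with AVAudioConnectionPoints and give each branch its own tap:
let recognitionMixer = AVAudioMixerNode()
let recordingMixer = AVAudioMixerNode()
audioEngine.attach(recognitionMixer)
audioEngine.attach(recordingMixer)
// Fan the input out to both mixers using connection points
let inputFormat = inputNode!.inputFormat(forBus: 0)
audioEngine.connect(inputNode!,
                    to: [AVAudioConnectionPoint(node: recognitionMixer, bus: 0),
                         AVAudioConnectionPoint(node: recordingMixer, bus: 0)],
                    fromBus: 0,
                    format: inputFormat)
// Keep both branches in the active render chain
audioEngine.connect(recognitionMixer, to: audioEngine.mainMixerNode, format: inputFormat)
audioEngine.connect(recordingMixer, to: audioEngine.mainMixerNode, format: audioFormat)
// One tap per node: recognition at the input's native format...
recognitionMixer.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { buffer, when in
    self.recognitionRequest?.append(buffer)
}
// ...and the downsampled recording on the other branch
recordingMixer.installTap(onBus: 0, bufferSize: 1024, format: audioFormat) { buffer, when in
    try? self.outputFile?.write(from: buffer)
}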
I am working on a push-to-talk feature where the sender sends audio as a byte array to a server and the receiver listens to it in real time through a socket connection.
When I try to play the audio at the receiver's end using AVAudioEngine, it's not working.
let buffer = dataToPCMBuffer(format: format16KHzMono!, data: data)
self.audioPlayerNode = AVAudioPlayerNode()
self.audioEngine?.attach(self.audioPlayerNode)
let mixer = self.audioEngine?.mainMixerNode
self.audioEngine?.connect(self.audioPlayerNode, to: mixer!, format: AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 16000, channels: 1, interleaved: true))
self.playerQueue.async {
    self.audioPlayerNode.scheduleBuffer(buffer!) {
        print("stopping")
        if self.audioEngine!.isRunning {
            self.audioPlayerNode.play()
        } else {
            try? self.audioEngine?.start()
        }
    }
}
And I am facing a crash at the line given below.
self.audioEngine?.connect(self.audioPlayerNode, to: mixer!, format: AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 16000, channels: 1, interleaved: true))
Any help will be appreciated.
I think it's the format in your connection. Try using nil instead and let the engine pick. Node-to-node connections generally want the standard deinterleaved Float32 format, and only certain sample rates are supported - maybe an interleaved Int16 format at 16000 Hz is not one of them.
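For what it's worth, a rough sketch of that idea, assuming the incoming data really is 16 kHz mono Int16: connect the player with a deinterleaved Float32 format (or nil) and convert each network buffer to it before scheduling.
// Connect with a standard deinterleaved Float32 format (or pass nil and let the engine decide)
let playFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 16000, channels: 1, interleaved: false)!
self.audioEngine?.connect(self.audioPlayerNode, to: mixer!, format: playFormat)
// Convert the interleaved Int16 buffers coming off the socket into that format
let networkFormat = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 16000, channels: 1, interleaved: true)!
let converter = AVAudioConverter(from: networkFormat, to: playFormat)!
func schedule(_ int16Buffer: AVAudioPCMBuffer) {
    let floatBuffer = AVAudioPCMBuffer(pcmFormat: playFormat, frameCapacity: int16Buffer.frameLength)!
    // Same sample rate on both sides, so the simple convert(to:from:) variant is enough
    try? converter.convert(to: floatBuffer, from: int16Buffer)
    self.audioPlayerNode.scheduleBuffer(floatBuffer, completionHandler: nil)
}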
Is it possible to increase/decrease the volume of an AVAsset track or an AVMutableComposition of an audio file?
I have two audio files (a background instrumental and a recorded song); I want to decrease one file's volume and merge it with the other.
1. Change the Track's volume
To do this to the physical file, you will need to load the raw PCM data into Swift. Below is an example of getting the floating point data thanks to this SO post:
import AVFoundation
// ...
let url = Bundle.main.url(forResource: "your audio file", withExtension: "wav")
let file = try! AVAudioFile(forReading: url!)
let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: file.fileFormat.sampleRate, channels: 1, interleaved: false)!
let buf = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(file.length))!
try! file.read(into: buf)
// this makes a copy, you might not want that
let floatArray = Array(UnsafeBufferPointer(start: buf.floatChannelData![0], count: Int(buf.frameLength)))
print("floatArray \(floatArray)\n")
Once you have the data in floatArray, simply multiply every value in the array by a number between 0 and 1 to adjust the gain. If you are more familiar with decibels, convert your decibel value to a linear gain with the following line and multiply every array value by linGain:
let linGain = pow(10.0, decibelGain / 20.0)
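Applied to the array, that could look like this (a sketch; the -6 dB value is just an example):
let decibelGain: Float = -6.0                  // example: roughly half the level
let linGain: Float = pow(10, decibelGain / 20) // ≈ 0.5
let adjusted = floatArray.map { $0 * linGain } // write `adjusted` back out below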
Then it's a question of writing the audio file back again before you load it (credit):
let SAMPLE_RATE = Float64(16000.0)
let outputFormatSettings: [String: Any] = [
    AVFormatIDKey: kAudioFormatLinearPCM,
    AVLinearPCMBitDepthKey: 32,
    AVLinearPCMIsFloatKey: true,
    // AVLinearPCMIsBigEndianKey: false,
    AVSampleRateKey: SAMPLE_RATE,
    AVNumberOfChannelsKey: 1
]
let audioFile = try? AVAudioFile(forWriting: url, settings: outputFormatSettings, commonFormat: .pcmFormatFloat32, interleaved: true)
let bufferFormat = AVAudioFormat(settings: outputFormatSettings)!
let outputBuffer = AVAudioPCMBuffer(pcmFormat: bufferFormat, frameCapacity: AVAudioFrameCount(buff.count))!
// `buff` is the gain-adjusted sample array; I had my samples in doubles, so convert, then write
for i in 0..<buff.count {
    outputBuffer.floatChannelData!.pointee[i] = Float(buff[i])
}
outputBuffer.frameLength = AVAudioFrameCount(buff.count)
do {
    try audioFile?.write(from: outputBuffer)
} catch let error as NSError {
    print("error:", error.localizedDescription)
}
2. Mix the Tracks Together
Once you have your new .wav files, you can load both into AVAssets as before, this time with the desired gain already applied.
Then it looks like you will want AVAssetReaderAudioMixOutput, which has an initializer specifically for mixing audio tracks together:
AVAssetReaderAudioMixOutput.init(audioTracks: [AVAssetTrack], audioSettings: [String : Any]?)
Note: I would not use steps 1 & 2 continuously. For example, if you wanted to mix the song with a slider and hear the result live, I would recommend using AVPlayers and adjusting their volumes; then, when the user is ready, do this file I/O and mixing (a sketch of which follows below).
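For step 2, a sketch of reading both (already gain-adjusted) files through one AVAssetReaderAudioMixOutput by way of a composition; the file URLs are placeholders:
// Hypothetical URLs for the two gain-adjusted files
let instrumentalURL = URL(fileURLWithPath: "instrumental.wav")
let vocalsURL = URL(fileURLWithPath: "vocals.wav")
let composition = AVMutableComposition()
for fileURL in [instrumentalURL, vocalsURL] {
    let asset = AVURLAsset(url: fileURL)
    guard let sourceTrack = asset.tracks(withMediaType: .audio).first,
          let compositionTrack = composition.addMutableTrack(withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid) else { continue }
    try? compositionTrack.insertTimeRange(CMTimeRange(start: .zero, duration: asset.duration), of: sourceTrack, at: .zero)
}
// One output that mixes every audio track of the composition
let reader = try! AVAssetReader(asset: composition)
let mixOutput = AVAssetReaderAudioMixOutput(audioTracks: composition.tracks(withMediaType: .audio), audioSettings: nil)
reader.add(mixOutput)
reader.startReading()
while let sampleBuffer = mixOutput.copyNextSampleBuffer() {
    // hand the mixed samples to an AVAssetWriter input, analysis code, etc.
    print(CMSampleBufferGetNumSamples(sampleBuffer))
}
Alternatively, instead of rewriting the samples in step 1, you could build an AVMutableAudioMix with AVMutableAudioMixInputParameters per track, assign it to the output's audioMix property, and set the volumes there.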
I am using the following to get the video sample buffer:
- (void) writeSampleBufferStream:(CMSampleBufferRef)sampleBuffer ofType:(NSString *)mediaType
Now my question is: how can I get h.264-encoded NSData from the above sampleBuffer? Please suggest.
Update for 2017:
You can do streaming Video and Audio now by using the VideoToolbox API.
Read the documentation here: VTCompressionSession
Original answer (from 2013):
Short: You can't, the sample buffer you receive is uncompressed.
Methods to get hardware accelerated h264 compression:
AVAssetWriter
AVCaptureMovieFileOutput
As you can see, both write to a file. Writing to a pipe does not work, because the encoder updates header information after a frame or GOP has been fully written. So you'd better not touch the file while the encoder is writing to it, since it randomly rewrites header information; without that header information the video file will not be playable (it updates the size field, so the first header written says the file is 0 bytes). Directly writing to a memory area is not supported currently. But you can open the encoded video file and demux the stream to get at the h264 data (after the encoder has closed the file, of course).
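One way to do that demuxing once the encoder has closed the file is to read it back with AVAssetReader, passing nil output settings so the samples come back still compressed. A sketch (movieURL stands for the finished file; note the payload is in AVCC length-prefixed form, not Annex B, and the SPS/PPS live in the track's format description rather than in these samples):
let asset = AVURLAsset(url: movieURL)                                     // the finished .mp4/.mov
let track = asset.tracks(withMediaType: .video).first!
let reader = try! AVAssetReader(asset: asset)
let output = AVAssetReaderTrackOutput(track: track, outputSettings: nil)  // nil = no decompression
reader.add(output)
reader.startReading()
while let sample = output.copyNextSampleBuffer() {
    guard let dataBuffer = CMSampleBufferGetDataBuffer(sample) else { continue }
    var length = 0
    var pointer: UnsafeMutablePointer<Int8>?
    if CMBlockBufferGetDataPointer(dataBuffer, atOffset: 0, lengthAtOffsetOut: nil, totalLengthOut: &length, dataPointerOut: &pointer) == kCMBlockBufferNoErr,
       let pointer = pointer {
        let h264Data = Data(bytes: pointer, count: length)                // [4-byte length][NAL]...
        print("read \(h264Data.count) bytes of h264")
    }
}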
You can only get raw video images in either BGRA or YUV color formats from AVFoundation. However, when you write those frames to an mp4 via AVAssetWriter, they will be encoded using H264 encoding.
A good example with code on how to do that is RosyWriter
Note that after each AVAssetWriter write, you will know that one complete H264 NAL was written to the mp4. You could write code that reads a complete H264 NAL after each AVAssetWriter write, which will give you access to an H264-encoded frame. It might take a bit to get it right with decent speed, but it is doable (I did it successfully).
By the way, in order to successfully decode these encoded video frames, you will need the H264 SPS and PPS information, which is located in a different place in the mp4 file. In my case, I actually created a couple of test mp4 files and then manually extracted those out. Since they don't change unless you change the H264 encoding specs, you can reuse them in your code.
Check my post SPS values for H 264 stream in iPhone to see some of the SPS/PPS values I used in my code.
Just a final note: in my case I had to stream the h264-encoded frames to another endpoint for decoding/viewing, so my code had to be fast. It was relatively fast, but eventually I switched to VP8 for encoding/decoding, simply because it was way faster since everything was done in memory without file reading/writing.
Good luck, and hopefully this info helps.
Use the VideoToolbox API. Refer to: https://developer.apple.com/videos/play/wwdc2014/513/
import Foundation
import AVFoundation
import VideoToolbox

public class LiveStreamSession {
    let compressionSession: VTCompressionSession
    var index = -1
    var lastInputPTS = CMTime.zero

    public init?(width: Int32, height: Int32) {
        var compressionSessionOrNil: VTCompressionSession? = nil
        let status = VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
                                                width: width,
                                                height: height,
                                                codecType: kCMVideoCodecType_H264,
                                                encoderSpecification: nil, // let Video Toolbox choose an encoder
                                                imageBufferAttributes: nil,
                                                compressedDataAllocator: kCFAllocatorDefault,
                                                outputCallback: nil,
                                                refcon: nil,
                                                compressionSessionOut: &compressionSessionOrNil)
        guard status == noErr,
              let compressionSession = compressionSessionOrNil else {
            return nil
        }
        VTSessionSetProperty(compressionSession, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue)
        VTCompressionSessionPrepareToEncodeFrames(compressionSession)
        self.compressionSession = compressionSession
    }

    public func pushVideoBuffer(buffer: CMSampleBuffer) {
        // image buffer
        guard let imageBuffer = CMSampleBufferGetImageBuffer(buffer) else {
            assertionFailure()
            return
        }
        // pts
        let pts = CMSampleBufferGetPresentationTimeStamp(buffer)
        guard CMTIME_IS_VALID(pts) else {
            assertionFailure()
            return
        }
        // duration
        var duration = CMSampleBufferGetDuration(buffer)
        if CMTIME_IS_INVALID(duration) && CMTIME_IS_VALID(self.lastInputPTS) {
            duration = CMTimeSubtract(pts, self.lastInputPTS)
        }
        index += 1
        self.lastInputPTS = pts
        print("[\(Date())]: pushVideoBuffer \(index)")
        let currentIndex = index
        VTCompressionSessionEncodeFrame(compressionSession, imageBuffer: imageBuffer, presentationTimeStamp: pts, duration: duration, frameProperties: nil, infoFlagsOut: nil) { [weak self] status, encodeInfoFlags, sampleBuffer in
            print("[\(Date())]: compressed \(currentIndex)")
            if let sampleBuffer = sampleBuffer {
                self?.didEncodeFrameBuffer(buffer: sampleBuffer, id: currentIndex)
            }
        }
    }

    deinit {
        VTCompressionSessionInvalidate(compressionSession)
    }

    private func didEncodeFrameBuffer(buffer: CMSampleBuffer, id: Int) {
        guard let attachments = CMSampleBufferGetSampleAttachmentsArray(buffer, createIfNecessary: true) else {
            return
        }
        let dic = Unmanaged<CFDictionary>.fromOpaque(CFArrayGetValueAtIndex(attachments, 0)).takeUnretainedValue()
        // passUnretained: don't add an extra retain to the constant key
        let keyframe = !CFDictionaryContainsKey(dic, Unmanaged.passUnretained(kCMSampleAttachmentKey_NotSync).toOpaque())
        // print("[\(Date())]: didEncodeFrameBuffer \(id) is I frame: \(keyframe)")
        if keyframe,
           let formatDescription = CMSampleBufferGetFormatDescription(buffer) {
            // https://www.slideshare.net/instinctools_EE_Labs/videostream-compression-in-ios
            var number = 0
            CMVideoFormatDescriptionGetH264ParameterSetAtIndex(formatDescription, parameterSetIndex: 0, parameterSetPointerOut: nil, parameterSetSizeOut: nil, parameterSetCountOut: &number, nalUnitHeaderLengthOut: nil)
            // SPS and PPS and so on...
            let parameterSets = NSMutableData()
            for index in 0..<number {
                var parameterSetPointer: UnsafePointer<UInt8>?
                var parameterSetLength = 0
                CMVideoFormatDescriptionGetH264ParameterSetAtIndex(formatDescription, parameterSetIndex: index, parameterSetPointerOut: &parameterSetPointer, parameterSetSizeOut: &parameterSetLength, parameterSetCountOut: nil, nalUnitHeaderLengthOut: nil)
                // parameterSets.append(startCode, length: startCodeLength)
                if let parameterSetPointer = parameterSetPointer {
                    parameterSets.append(parameterSetPointer, length: parameterSetLength)
                }
                if index == 0 {
                    print("SPS is \(String(describing: parameterSetPointer)) with length \(parameterSetLength)")
                } else if index == 1 {
                    print("PPS is \(String(describing: parameterSetPointer)) with length \(parameterSetLength)")
                }
            }
            print("[\(Date())]: parameterSets \(parameterSets.length)")
        }
    }
}
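Usage would look roughly like this, pushing frames from an AVCaptureVideoDataOutput delegate (a sketch; the capture-session setup itself is assumed):
let session = LiveStreamSession(width: 1920, height: 1080)
// In your AVCaptureVideoDataOutputSampleBufferDelegate:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    session?.pushVideoBuffer(buffer: sampleBuffer)
}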