How to use a Metal argument buffer? - Swift

I am trying to use Metal argument buffers to access data in a Metal compute kernel.
The buffer has an entry when I print out the value CPU-side, but the Xcode debugger shows my argument buffer as empty on the GPU.
I can see my buffer with the sentinel value as an indirect resource in the debugger but no pointer to it in the argument buffer.
Here is the Swift code:
import MetalKit
do {
let device = MTLCreateSystemDefaultDevice()!
let capture_manager = MTLCaptureManager.shared()
let capture_desc = MTLCaptureDescriptor()
capture_desc.captureObject = device
try capture_manager.startCapture(with: capture_desc)
let argument_desc = MTLArgumentDescriptor()
argument_desc.dataType = MTLDataType.pointer
argument_desc.index = 0
argument_desc.arrayLength = 1024
let argument_encoder = device.makeArgumentEncoder(arguments: [argument_desc])!
let argument_buffer = device.makeBuffer(length: argument_encoder.encodedLength, options: MTLResourceOptions())
argument_encoder.setArgumentBuffer(argument_buffer, offset: 0)
var sentinel: UInt32 = 12345
let ptr = UnsafeRawPointer.init(&sentinel)
let buffer = device.makeBuffer(bytes: ptr, length: 4, options: MTLResourceOptions.storageModeShared)!
argument_encoder.setBuffer(buffer, offset: 0, index: 0)
let source = try String(contentsOf: URL.init(fileURLWithPath: "/path/to/kernel.metal"))
let library = try device.makeLibrary(source: source, options: MTLCompileOptions())
let function = library.makeFunction(name: "main0")!
let pipeline = try device.makeComputePipelineState(function: function)
let queue = device.makeCommandQueue()!
let encoder = queue.makeCommandBuffer()!
let compute_encoder = encoder.makeComputeCommandEncoder()!
compute_encoder.setComputePipelineState(pipeline)
compute_encoder.setBuffer(argument_buffer, offset: 0, index: 0)
compute_encoder.useResource(buffer, usage: MTLResourceUsage.read)
compute_encoder.dispatchThreads(MTLSize.init(width: 1, height: 1, depth: 1), threadsPerThreadgroup: MTLSize.init(width: 1, height: 1, depth: 1))
compute_encoder.endEncoding()
encoder.commit()
encoder.waitUntilCompleted()
capture_manager.stopCapture()
} catch {
print(error)
exit(1)
}
And the compute kernel:
#include <metal_stdlib>
#include <simd/simd.h>
using namespace metal;
struct Argument {
constant uint32_t *ptr [[id(0)]];
};
kernel void main0(
constant Argument *bufferArray [[buffer(0)]]
) {
constant uint32_t *ptr = bufferArray[0].ptr;
uint32_t y = *ptr;
}
If anyone has any ideas, I'd greatly appreciate it!

It seems that Metal optimizes out the kernel (or something along those lines) since it only performs read operations.
Changing the kernel to write to the buffer makes everything work and show up properly in the Xcode debugger.
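For reference, here is a minimal sketch of what that change could look like (my own illustration, not the exact code used: the struct member is moved to the device address space so it can be written, and the CPU side would then pass .write, or [.read, .write], to useResource):
#include <metal_stdlib>
using namespace metal;
struct Argument {
    device uint32_t *ptr [[id(0)]]; // device instead of constant so the kernel can write through it
};
kernel void main0(constant Argument *bufferArray [[buffer(0)]]) {
    device uint32_t *ptr = bufferArray[0].ptr;
    *ptr += 1; // any visible side effect keeps the access from being optimized away
}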

Related

How do you create and use an indirectCommandBuffer in Swift for Metal GPU computations?

I am currently working on a project that uses the GPU to do computations on large datasets. Currently I'm investigating the potential of using indirectCommandBuffers to speed up our code, especially since we're having trouble with its speed on the M1 processor (interestingly enough, our program runs super fast on AMD Metal GPUs). Another reason I want to do this is to avoid having to create the exact same compute command encoders 500+ times.
However, I'm having trouble coding the indirect command buffers, and I can't seem to find much documentation for them online, especially in Swift. When I first attempted this in the project I'm working on, I found that on the M1 it would just crash when I tried setting the MTLComputePipelineState of the MTLIndirectComputeCommand using .setComputePipelineState(), whereas on AMD chips it would hang when trying to commit and execute the commands in the indirectCommandBuffer, and if it got through everything, it would just return pointers to zeroed data.
I've created what is hopefully a minimal reproducible example to show the issue I'm having; it just adds two numpy arrays received from C 1000 times. Be aware this is just an example to illustrate the issue; our goal is to improve some finite-difference code with Metal.
I'm currently running macOS 12.4.
Below is the Swift function:
import Metal
import MetalPerformanceShaders
import Accelerate
import Foundation
@_cdecl("metalswift_add")
public func addition(array1: UnsafeMutablePointer<Float>,array2: UnsafeMutablePointer<Float>, length: Int) -> UnsafeMutablePointer<Float> {
var bFound = false
var device : MTLDevice!
device = MTLCreateSystemDefaultDevice()!
let defaultLibrary = try! device.makeLibrary(filepath: "metal.metallib")
let metalswift_addfunction = defaultLibrary.makeFunction(name: "metalswift_add")!
let descriptor = MTLComputePipelineDescriptor()
descriptor.computeFunction = metalswift_addfunction
descriptor.supportIndirectCommandBuffers = true
let computePipelineState = try! device.makeComputePipelineState(descriptor: descriptor, options: .init(), reflection: nil)
var Ref1 : UnsafeMutablePointer<Float> = UnsafeMutablePointer(array1)
var Ref2 : UnsafeMutablePointer<Float> = UnsafeMutablePointer(array2)
var size = length
let SizeBuffer : UnsafeMutableRawPointer = UnsafeMutableRawPointer(&size)
let ll = MemoryLayout<Float>.stride * length
var Buffer1:MTLBuffer! = device.makeBuffer(bytes:Ref1, length: ll, options:[])
var Buffer2:MTLBuffer! = device.makeBuffer(bytes:Ref2, length: ll, options:[])
var MetalBuffer:MTLBuffer! = device.makeBuffer(length: ll, options:[])
let Size:MTLBuffer! = device.makeBuffer(bytes: SizeBuffer, length: MemoryLayout<Int>.size, options: [])
var icbDescriptor:MTLIndirectCommandBufferDescriptor = MTLIndirectCommandBufferDescriptor()
icbDescriptor.commandTypes.insert(MTLIndirectCommandType.concurrentDispatchThreads)
icbDescriptor.inheritBuffers = false
icbDescriptor.inheritPipelineState = false
icbDescriptor.maxKernelBufferBindCount = 4
var indirectCommandBuffer = device.makeIndirectCommandBuffer(descriptor: icbDescriptor, maxCommandCount: 1)!
let icbCommand = indirectCommandBuffer.indirectComputeCommandAt(0)
icbCommand.setComputePipelineState(computePipelineState)
icbCommand.setKernelBuffer(Buffer1, offset: 0, at: 0)
icbCommand.setKernelBuffer(Buffer2, offset: 0, at: 1)
icbCommand.setKernelBuffer(MetalBuffer, offset: 0, at: 2)
icbCommand.setKernelBuffer(Size, offset: 0, at: 3)
icbCommand.concurrentDispatchThreads(MTLSize(width:computePipelineState.threadExecutionWidth, height: 1, depth: 1), threadsPerThreadgroup:MTLSize(width:computePipelineState.maxTotalThreadsPerThreadgroup, height: 1, depth: 1))
icbCommand.setBarrier()
for i in 0..<1000{
print(i)
let commandQueue = device.makeCommandQueue()!
let commandBuffer = commandQueue.makeCommandBuffer()!
let computeCommandEncoder = commandBuffer.makeComputeCommandEncoder()!
computeCommandEncoder.executeCommandsInBuffer(indirectCommandBuffer, range:0..<1)
computeCommandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
}
return(MetalBuffer!.contents().assumingMemoryBound(to: Float.self))
}
This is the Metal Function:
#include <metal_stdlib>
#include <metal_math>
using namespace metal;
#define size (*size_pr)
kernel void metalswift_add(const device float *Buffer1 [[ buffer(0) ]],
const device float *Buffer2[[ buffer(1) ]],
device float *MetalBuffer[[ buffer(2) ]],
const device int *size_pr[[ buffer(3) ]]) {
for (int i=0; i<size; i++){
MetalBuffer[i] = Buffer1[i] + Buffer2[i];
}
}
I had it working without the indirectCommandEncoders, so I believe that it's probably an issue with how I coded the indirectCommandEncoders rather than the Metal function.
If any other information is needed, let me know! Sorry if this is of low quality; this is my first question on Stack Overflow.
Update: I've updated the code above with some changes that stop it from crashing at runtime. However, I'm still running into the hanging issue on AMD Metal GPUs, and on the M1 it seems like it only goes through the Metal function once.
You aren't creating the MTLComputePipelineState correctly. To use a pipeline state in an ICB, you need to set supportIndirectCommandBuffers to true in a pipeline state descriptor. Kinda like this:
let metalswift_addfunction = defaultLibrary.makeFunction(name: "metalswift_add")!
let descriptor = MTLComputePipelineDescriptor()
descriptor.computeFunction = metalswift_addfunction
descriptor.supportIndirectCommandBuffers = true
let computePipelineState = try! device.makeComputePipelineState(descriptor: descriptor, options: .init(), reflection: nil)
With that, it should work.
By the way, I recommend running with Shader Validation. It does catch this error. You can enable it in diagnostics scheme settings or by passing an environment variable. You can find more information about shader validation by reading man MetalValidation in Terminal.
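For a command-line tool like the one above, enabling validation for a single run would look roughly like this (the exact variable names are best confirmed via man MetalValidation; this is how I remember them):
MTL_SHADER_VALIDATION=1 MTL_DEBUG_LAYER=1 ./your_binary
Here your_binary is just a placeholder for the compiled executable; MTL_SHADER_VALIDATION enables shader validation and MTL_DEBUG_LAYER enables API validation.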

Getting RGBA values for all pixels of CGImage Swift

I am trying to create a real-time video processing app in which I need to get the RGBA values of all pixels for each frame, process them using an external library, and display them. Getting the RGBA value for each pixel is too slow the way I am doing it, so I was wondering whether there is a faster way, using vImage. This is my current code and the way I get all the pixels for the current frame:
guard let cgImage = context.makeImage() else {
return nil
}
guard let data = cgImage.dataProvider?.data,
let bytes = CFDataGetBytePtr(data) else {
fatalError("Couldn't access image data")
}
assert(cgImage.colorSpace?.model == .rgb)
let bytesPerPixel = cgImage.bitsPerPixel / cgImage.bitsPerComponent
gp.async {
for y in 0 ..< cgImage.height {
for x in 0 ..< cgImage.width {
let offset = (y * cgImage.bytesPerRow) + (x * bytesPerPixel)
let components = (r: bytes[offset], g: bytes[offset + 1], b: bytes[offset + 2])
print("[x:\(x), y:\(y)] \(components)")
}
print("---")
}
}
This is the version using vImage, but there is some memory leak and I cannot access the pixels:
guard
let format = vImage_CGImageFormat(cgImage: cgImage),
var buffer = try? vImage_Buffer(cgImage: cgImage,
format: format) else {
exit(-1)
}
let rowStride = buffer.rowBytes / MemoryLayout<Pixel_8>.stride / format.componentCount
do {
let componentCount = format.componentCount
var argbSourcePlanarBuffers: [vImage_Buffer] = (0 ..< componentCount).map { _ in
guard let buffer1 = try? vImage_Buffer(width: Int(buffer.width),
height: Int(buffer.height),
bitsPerPixel: format.bitsPerComponent) else {
fatalError("Error creating source buffers.")
}
return buffer1
}
vImageConvert_ARGB8888toPlanar8(&buffer,
&argbSourcePlanarBuffers[0],
&argbSourcePlanarBuffers[1],
&argbSourcePlanarBuffers[2],
&argbSourcePlanarBuffers[3],
vImage_Flags(kvImageNoFlags))
let n = rowStride * Int(argbSourcePlanarBuffers[1].height) * format.componentCount
let start = buffer.data.assumingMemoryBound(to: Pixel_8.self)
var ptr = UnsafeBufferPointer(start: start, count: n)
print(Array(argbSourcePlanarBuffers)[1]) // prints the first 15 interleaved values
buffer.free()
}
You can access the underlying pixels in a vImage buffer to do this.
For example, given an image named cgImage, use the following code to populate a vImage buffer:
guard
let format = vImage_CGImageFormat(cgImage: cgImage),
let buffer = try? vImage_Buffer(cgImage: cgImage,
format: format) else {
exit(-1)
}
let rowStride = buffer.rowBytes / MemoryLayout<Pixel_8>.stride / format.componentCount
Note that a vImage buffer's data may be wider than the image (see: https://developer.apple.com/documentation/accelerate/finding_the_sharpest_image_in_a_sequence_of_captured_images) which is why I've added rowStride.
To access the pixels as a single buffer of interleaved values, use:
do {
let n = rowStride * Int(buffer.height) * format.componentCount
let start = buffer.data.assumingMemoryBound(to: Pixel_8.self)
let ptr = UnsafeBufferPointer(start: start, count: n)
print(Array(ptr)[ 0 ... 15]) // prints the first 16 interleaved values
}
To access the pixels as a buffer of Pixel_8888 values, use the following (make sure that format.componentCount is 4):
do {
let n = rowStride * Int(buffer.height)
let start = buffer.data.assumingMemoryBound(to: Pixel_8888.self)
let ptr = UnsafeBufferPointer(start: start, count: n)
print(Array(ptr)[ 0 ... 3]) // prints the first 4 pixels
}
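One extra note on the memory leak mentioned in the question: memory owned by a vImage_Buffer is not managed by ARC, so every buffer you create (including each planar buffer in your version) needs an explicit free() once you are done with it. A minimal sketch, placed right after the buffer is created:
defer {
    buffer.free() // releases the pixel data allocated by vImage_Buffer(cgImage:format:)
}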
This is the slowest way to do it. A faster way is with a custom CoreImage filter.
Faster than that is to write your own OpenGL shader (or rather, its equivalent in Metal for current devices).
I've written OpenGL shaders, but have not worked with Metal yet.
Both allow you to write graphics code that runs directly on the GPU.

Metal Command Buffer Internal Error: What is Internal Error (IOAF code 2067)?

Attempting to run a compute kernel results in the following message:
Execution of the command buffer was aborted due to an error during execution. Internal Error (IOAF code 2067)
To get more specific information, I query the command buffer error's user info and manage to extract more details. I followed the instructions from this video to produce the following message:
[Metal Diagnostics] __message__: MTLCommandBuffer execution failed: The commands
associated with the encoder were affected by an error, which may or may not have been
caused by the commands themselves, and failed to execute in full __:::__
__delegate_identifier__: GPUToolsDiagnostics
The breakpoint triggered by API Validation and Shader Validation yields a regular stack frame - not a GPU backtrace. The breakpoint does not reveal any new information apart from the above message.
I cannot find any reference to the mentioned IOAF code in documentation. The additional information printed reveals nothing of assistance. The kernel is quite divergent and I am speculating that may be causing the GPU to take too much time to complete. That may be to blame but I have nothing supporting this apart from a gut feeling.
Here is the thread setup for the group:
let threadExecutionWidth = pipeline.threadExecutionWidth
let threadgroupsPerGrid = MTLSize(width: (Int(pixelCount) + threadExecutionWidth - 1) / threadExecutionWidth, height: 1, depth: 1)
let threadsPerThreadgroup = MTLSize(width: threadExecutionWidth, height: 1, depth: 1)
commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
The GPU commands are being committed and waited upon for completion:
commandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
Here is my application-side code in its entirety:
import Metal
import Foundation
import simd
typealias Float4 = SIMD4<Float>
struct SimpleFileWriter {
var fileHandle: FileHandle
init(filePath: String, append: Bool = false) {
if !FileManager.default.fileExists(atPath: filePath) {
FileManager.default.createFile(atPath: filePath, contents: nil, attributes: nil)
}
fileHandle = FileHandle(forWritingAtPath: filePath)!
if !append {
fileHandle.truncateFile(atOffset: 0)
}
}
func write(content: String) {
fileHandle.seekToEndOfFile()
guard let data = content.data(using: String.Encoding.ascii) else {
fatalError("Could not convert \(content) to ascii data!")
}
fileHandle.write(data)
}
}
var imageWidth = 480
var imageHeight = 270
var sampleCount = 16
var bounceCount = 3
let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeDefaultLibrary(bundle: Bundle.module)
let primaryRayFunc = library.makeFunction(name: "ray_trace")!
let pipeline = try! device.makeComputePipelineState(function: primaryRayFunc)
var pixelData: [Float4] = (0..<(imageWidth * imageHeight)).map{ _ in Float4(0, 0, 0, 0)}
var pixelCount = UInt(pixelData.count)
let pixelDataBuffer = device.makeBuffer(bytes: &pixelData, length: Int(pixelCount) * MemoryLayout<Float4>.stride, options: [])!
let pixelDataMirrorPointer = pixelDataBuffer.contents().bindMemory(to: Float4.self, capacity: Int(pixelCount))
let pixelDataMirrorBuffer = UnsafeBufferPointer(start: pixelDataMirrorPointer, count: Int(pixelCount))
let commandQueue = device.makeCommandQueue()!
let commandBufferDescriptor = MTLCommandBufferDescriptor()
commandBufferDescriptor.errorOptions = MTLCommandBufferErrorOption.encoderExecutionStatus
let commandBuffer = commandQueue.makeCommandBuffer(descriptor: commandBufferDescriptor)!
let commandEncoder = commandBuffer.makeComputeCommandEncoder()!
commandEncoder.setComputePipelineState(pipeline)
commandEncoder.setBuffer(pixelDataBuffer, offset: 0, index: 0)
commandEncoder.setBytes(&pixelCount, length: MemoryLayout<Int>.stride, index: 1)
commandEncoder.setBytes(&imageWidth, length: MemoryLayout<Int>.stride, index: 2)
commandEncoder.setBytes(&imageHeight, length: MemoryLayout<Int>.stride, index: 3)
commandEncoder.setBytes(&sampleCount, length: MemoryLayout<Int>.stride, index: 4)
commandEncoder.setBytes(&bounceCount, length: MemoryLayout<Int>.stride, index: 5)
// We have to calculate the sum `pixelCount` times
// => amount of threadgroups is `resultsCount` / `threadExecutionWidth` (rounded up)
// because each threadgroup will process `threadExecutionWidth` threads
let threadExecutionWidth = pipeline.threadExecutionWidth;
let threadgroupsPerGrid = MTLSize(width: (Int(pixelCount) + threadExecutionWidth - 1) / threadExecutionWidth, height: 1, depth: 1)
// Here we set that each threadgroup should process `threadExecutionWidth` threads
// the only important thing for performance is that this number is a multiple of
// `threadExecutionWidth` (here 1 times)
let threadsPerThreadgroup = MTLSize(width: threadExecutionWidth, height: 1, depth: 1)
commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
commandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
if let error = commandBuffer.error as NSError? {
if let encoderInfo = error.userInfo[MTLCommandBufferEncoderInfoErrorKey] as? [MTLCommandBufferEncoderInfo] {
for info in encoderInfo {
print(info.label + info.debugSignposts.joined())
}
}
}
let sfw = SimpleFileWriter(filePath: "/Users/pprovins/Desktop/render.ppm")
sfw.write(content: "P3\n")
sfw.write(content: "\(imageWidth) \(imageHeight)\n")
sfw.write(content: "255\n")
for pixel in pixelDataMirrorBuffer {
sfw.write(content: "\(UInt8(pixel.x * 255)) \(UInt8(pixel.y * 255)) \(UInt8(pixel.z * 255)) ")
}
sfw.write(content: "\n")
Additionally, here is the shader being run. I have not included all function definitions for brevity's sake:
kernel void ray_trace(device float4 *result [[ buffer(0) ]],
const device uint& dataLength [[ buffer(1) ]],
const device int& imageWidth [[ buffer(2) ]],
const device int& imageHeight [[ buffer(3) ]],
const device int& samplesPerPixel [[ buffer(4) ]],
const device int& rayBounces [[ buffer (5)]],
const uint index [[thread_position_in_grid]]) {
if (index >= dataLength) {
return;
}
const float3 origin = float3(0.0);
const float aspect = float(imageWidth) / float(imageHeight);
const float3 vph = float3(0.0, 2.0, 0.0);
const float3 vpw = float3(2.0 * aspect, 0.0, 0.0);
const float3 llc = float3(-(vph / 2.0) - (vpw / 2.0) - float3(0.0, 0.0, 1.0));
float3 accumulatedColor = float3(0.0);
thread float seed = getSeed(index, index % imageWidth, index / imageWidth);
float row = float(index / imageWidth);
float col = float(index % imageWidth);
for (int aai = 0; aai < samplesPerPixel; ++aai) {
float ranX = fract(rand(seed));
float ranY = fract(rand(seed));
float u = (col + ranX) / float(imageWidth - 1);
float v = 1.0 - (row + ranY) / float(imageHeight - 1);
Ray r(origin, llc + u * vpw + v * vph - origin);
float3 color = float3(0.0);
HitRecord hr = {0.0, 0.0, false};
float attenuation = 1.0;
for (int bounceIndex = 0; bounceIndex < rayBounces; ++bounceIndex) {
testForHit(sceneDistance, r, hr);
if (hr.h) {
float3 target = hr.p + hr.n + random_f3_in_unit_sphere(seed);
attenuation *= 0.5;
r = Ray(hr.p, target - hr.p);
} else {
color = default_atmosphere_color(r) * attenuation;
break;
}
}
accumulatedColor += color / samplesPerPixel;
}
result[index] = float4(sqrt(accumulatedColor), 1.0);
}
Oddly enough, it occasionally runs. Setting the number of samples to 16 or above always results in the mentioned IOAF code. With fewer than 16 samples, the code runs ~25% of the time. The more samples, the more likely it is to produce the error code.
Is there any way to get additional information on IOAF code 2067?
Determining the error code with Metal API + Shader Validation was not possible.
By testing individual portions of the kernel, the particular error was narrowed down to a while loop that caused the GPU to hang.
The problem can essentially be boiled down to code that looks like:
while(true) {
// ad infinitum
}
or, in the case of the code above, in the call to random_f3_in_unit_sphere(seed):
while(randNum(seed) < threshold) {
// the while loop is not "bounded"
// in any sense. Whoops.
++seed;
}
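A sketch of one possible fix (my own illustration, not the original code): cap the number of iterations so the loop is guaranteed to terminate even for a pathological seed.
// Hypothetical bounded variant of the rejection loop
uint attempts = 0;
while (randNum(seed) < threshold && attempts < 64) {
    ++seed;
    ++attempts;
}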

Get all sound frequencies of a WAV-file using Swift and AVFoundation

I would like to capture all frequencies between given timespans in a WAV file. The intent is to do some audio analysis in a later step. As a test, I've used the application "Sox" to generate a 1-second-long WAV file that contains only a single tone at 13000 Hz. I want to read the file and find that frequency.
I'm using AVFoundation (which is important) to read the file. Since the input data is PCM, I need to use an FFT to get the actual frequencies, which I do using the Accelerate framework. However, I don't get the expected result (13000 Hz), but rather a lot of values I don't understand. I'm new to audio development, so any hint about where my code is failing is appreciated. The code includes a few comments where the issue occurs.
Thanks in advance!
Code:
import AVFoundation
import Accelerate
class Analyzer {
// This function is implemented using the code from the following tutorial:
// https://developer.apple.com/documentation/accelerate/vdsp/fast_fourier_transforms/finding_the_component_frequencies_in_a_composite_sine_wave
func fftTransform(signal: [Float], n: vDSP_Length) -> [Int] {
let observed: [DSPComplex] = stride(from: 0, to: Int(n), by: 2).map {
return DSPComplex(real: signal[$0],
imag: signal[$0.advanced(by: 1)])
}
let halfN = Int(n / 2)
var forwardInputReal = [Float](repeating: 0, count: halfN)
var forwardInputImag = [Float](repeating: 0, count: halfN)
var forwardInput = DSPSplitComplex(realp: &forwardInputReal,
imagp: &forwardInputImag)
vDSP_ctoz(observed, 2,
&forwardInput, 1,
vDSP_Length(halfN))
let log2n = vDSP_Length(log2(Float(n)))
guard let fftSetUp = vDSP_create_fftsetup(
log2n,
FFTRadix(kFFTRadix2)) else {
fatalError("Can't create FFT setup.")
}
defer {
vDSP_destroy_fftsetup(fftSetUp)
}
var forwardOutputReal = [Float](repeating: 0, count: halfN)
var forwardOutputImag = [Float](repeating: 0, count: halfN)
var forwardOutput = DSPSplitComplex(realp: &forwardOutputReal,
imagp: &forwardOutputImag)
vDSP_fft_zrop(fftSetUp,
&forwardInput, 1,
&forwardOutput, 1,
log2n,
FFTDirection(kFFTDirection_Forward))
let componentFrequencies = forwardOutputImag.enumerated().filter {
$0.element < -1
}.map {
return $0.offset
}
return componentFrequencies
}
func run() {
// The frequencies array is an array of frequencies which is then converted to points on sine curves (signal)
let n = vDSP_Length(4*4096)
let frequencies: [Float] = [1, 5, 25, 30, 75, 100, 300, 500, 512, 1023]
let tau: Float = .pi * 2
let signal: [Float] = (0 ... n).map { index in
frequencies.reduce(0) { accumulator, frequency in
let normalizedIndex = Float(index) / Float(n)
return accumulator + sin(normalizedIndex * frequency * tau)
}
}
// These signals are then restored using the fftTransform function above, giving the exact same values as in the "frequencies" variable
let frequenciesRestored = fftTransform(signal: signal, n: n).map({Float($0)})
assert(frequenciesRestored == frequencies)
// Now I want to do the same thing, but reading the frequencies from a file (which includes a constant tone at 13000 Hz)
let file = { PATH TO A WAV-FILE WITH A SINGLE TONE AT 13000Hz RUNNING FOR 1 SECOND }
let asset = AVURLAsset(url: URL(fileURLWithPath: file))
let track = asset.tracks[0]
do {
let reader = try AVAssetReader(asset: asset)
let sampleRate = 48000.0
let outputSettingsDict: [String: Any] = [
AVFormatIDKey: kAudioFormatLinearPCM,
AVSampleRateKey: Int(sampleRate),
AVLinearPCMIsNonInterleaved: false,
AVLinearPCMBitDepthKey: 16,
AVLinearPCMIsFloatKey: false,
AVLinearPCMIsBigEndianKey: false,
]
let output = AVAssetReaderTrackOutput(track: track, outputSettings: outputSettingsDict)
output.alwaysCopiesSampleData = false
reader.add(output)
reader.startReading()
typealias audioBuffertType = Int16
autoreleasepool {
while (reader.status == .reading) {
if let sampleBuffer = output.copyNextSampleBuffer() {
var audioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: AudioBuffer(mNumberChannels: 0, mDataByteSize: 0, mData: nil))
var blockBuffer: CMBlockBuffer?
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
sampleBuffer,
bufferListSizeNeededOut: nil,
bufferListOut: &audioBufferList,
bufferListSize: MemoryLayout<AudioBufferList>.size,
blockBufferAllocator: nil,
blockBufferMemoryAllocator: nil,
flags: kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
blockBufferOut: &blockBuffer
);
let buffers = UnsafeBufferPointer<AudioBuffer>(start: &audioBufferList.mBuffers, count: Int(audioBufferList.mNumberBuffers))
for buffer in buffers {
let samplesCount = Int(buffer.mDataByteSize) / MemoryLayout<audioBuffertType>.size
let samplesPointer = audioBufferList.mBuffers.mData!.bindMemory(to: audioBuffertType.self, capacity: samplesCount)
let samples = UnsafeMutableBufferPointer<audioBuffertType>(start: samplesPointer, count: samplesCount)
let myValues: [Float] = samples.map {
let value = Float($0)
return value
}
// Here I would expect my array to include multiple "13000" which is the frequency of the tone in my file
// I'm not sure what the variable 'n' does in this case, but changing it seems to change the result.
// The value should be twice as high as the highest measurable frequency (Nyquist frequency) (13000),
// but this crashes the application:
let mySignals = fftTransform(signal: myValues, n: vDSP_Length(2 * 13000))
assert(mySignals[0] == 13000)
}
}
}
}
}
catch {
print("error!")
}
}
}
The test clip can be generated using:
sox -G -n -r 48000 ~/outputfile.wav synth 1.0 sine 13000

How to get CVPixelBuffer handle from UnsafeMutablePointer<UInt8> in Swift?

I got a decoded AVFrame whose format shows 160/Videotoolbox_vld. After googling some articles (here) and viewing the FFmpeg source code (here, and here), the CVBuffer handle should be at AVFrame.data[3]. But the CVBuffer I got seems invalid; any CVPixelBufferGetXXX() function returns 0 or nil.
If I use av_hwframe_transfer_data() like FFmpeg's example/hw_decode.c does, the sample can be downloaded from the HW to a SW buffer. Its AVFrame.format will be nv12. After being converted via sws_scale to bgra, the sample can be shown in a view with the correct content.
I think the VideoToolbox-decoded frame is OK; the way I convert AVFrame.data[3] to a CVBuffer may be wrong. I have just learned how to access C pointers in Swift, but I am not sure how to read a resource handle (CVBuffer) from a pointer correctly.
The following is how I try to extract the CVBuffer from the AVFrame:
var pFrameOpt: UnsafeMutablePointer<AVFrame>? = av_frame_alloc()
avcodec_receive_frame(..., pFrameOpt)
let data3: UnsafeMutablePointer<UInt8>? = pFrameOpt?.pointee.data.3
data3?.withMemoryRebound(to: CVBuffer.self, capacity: 1) { pCvBuf in
let fW = pFrameOpt!.pointee.width // print 3840
let fH = pFrameOpt!.pointee.height // print 2160
let fFmt = pFrameOpt!.pointee.format // print 160
let cvBuf: CVBuffer = pCvBuf.pointee
let a1 = CVPixelBufferGetDataSize(cvBuf) // print 0
let a2 = CVPixelBufferGetPixelFormatType(cvBuf) // print 0
let a3 = CVPixelBufferGetWidth(cvBuf) // print 0
let a4 = CVPixelBufferGetHeight(cvBuf) // print 0
let a5 = CVPixelBufferGetBytesPerRow(cvBuf) // print 0
let a6 = CVPixelBufferGetBytesPerRowOfPlane(cvBuf, 0) // print 0
let a7 = CVPixelBufferGetWidthOfPlane(cvBuf, 0) // print 0
let a8 = CVPixelBufferGetHeightOfPlane(cvBuf, 0) // print 0
let a9 = CVPixelBufferGetPlaneCount(cvBuf) // print 0
let a10 = CVPixelBufferIsPlanar(cvBuf) // print false
let a11 = CVPixelBufferGetIOSurface(cvBuf) // print nil
let a12 = CVPixelBufferGetBaseAddress(cvBuf) // print nil
let a13 = CVPixelBufferGetBaseAddressOfPlane(cvBuf, 0) // print nil
let b1 = CVImageBufferGetCleanRect(cvBuf) // print 0, 0, 0, 0
let b2 = CVImageBufferGetColorSpace(cvBuf) // print nil
let b3 = CVImageBufferGetDisplaySize(cvBuf) // print 0, 0, 0, 0
let b4 = CVImageBufferGetEncodedSize(cvBuf) // print 0, 0, 0, 0
let b5 = CVImageBufferIsFlipped(cvBuf) // print false
// bad exec
var cvTextureOut: CVMetalTexture?
CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, ..., cvBuf, nil, .bgra8Unorm, 3840, 2160, 0, ...)
}
CVBuffer is not a fixed size, so rebinding the memory won't work in this way. You need to do this:
Unmanaged<CVBuffer>.fromOpaque(data!).takeRetainedValue()
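Applied to the data3 pointer from the question, a minimal sketch might look like this (the pointer value in data[3] is the CVPixelBufferRef itself, so you reinterpret the pointer rather than the memory it points to; whether to take a retain depends on your ownership model, and since the AVFrame still holds a reference I'd assume takeUnretainedValue here):
let raw = UnsafeRawPointer(data3!)                                        // data[3] is the CVPixelBufferRef
let cvBuf = Unmanaged<CVPixelBuffer>.fromOpaque(raw).takeUnretainedValue()
let width = CVPixelBufferGetWidth(cvBuf)                                  // should now report the real width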
However, the bottom line is FFmpeg's VideoToolbox backend is not creating a CVPixelBuffer with kCVPixelBufferMetalCompatibilityKey set to true. You won't be able to call CVMetalTextureCacheCreateTextureFromImage(...) successfully in any case.
You could consider using a CVPixelBufferPool with appropriate settings (including kCVPixelBufferMetalCompatibilityKey set to true) and then using VTPixelTransferSession to quickly copy FFmpeg's pixel buffer to your own.
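A rough sketch of that approach, assuming a 3840x2160 BGRA destination (the sizes and pixel format are placeholders taken from the question, and return codes are ignored for brevity):
import CoreVideo
import VideoToolbox
// Attributes for Metal-compatible destination buffers.
let attrs: [CFString: Any] = [
    kCVPixelBufferMetalCompatibilityKey: true,
    kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_32BGRA,
    kCVPixelBufferWidthKey: 3840,
    kCVPixelBufferHeightKey: 2160
]
var pool: CVPixelBufferPool?
CVPixelBufferPoolCreate(kCFAllocatorDefault, nil, attrs as CFDictionary, &pool)
var session: VTPixelTransferSession?
VTPixelTransferSessionCreate(allocator: kCFAllocatorDefault, pixelTransferSessionOut: &session)
// Copy FFmpeg's buffer (cvBuf from above) into a Metal-compatible buffer from the pool.
var dst: CVPixelBuffer?
CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pool!, &dst)
VTPixelTransferSessionTransferImage(session!, from: cvBuf, to: dst!)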
It seems that I wrongly treated the void* as a CVPixelBuffer* instead of casting the void* directly to a CVPixelBuffer. I cannot find a Swift way to do such a C-style cast of a raw pointer to an object reference (using as! CVPixelBuffer causes a crash).
So I created a C function that casts void* to CVPixelBufferRef to do the casting job.
// util.h
#include <CoreVideo/CVPixelBuffer.h>
CVPixelBufferRef CastToCVPixelBuffer(void* p);
// util.c
CVPixelBufferRef CastToCVPixelBuffer(void* p)
{
return (CVPixelBufferRef)p;
}
// BridgeHeader.h
#include "util.h"
Then pass the UnsafeMutablePointer<UInt8> in and get the CVPixelBuffer handle out:
let pFrameOpt: UnsafeMutablePointer<AVFrame>? = ...
let data3: UnsafeMutablePointer<UInt8>? = pFrameOpt?.pointee.data.3
let cvBuf: CVBuffer = CastToCVPixelBuffer(data3).takeUnretainedValue()
let width = CVPixelBufferGetWidth(cvBuf) // print 3840
let height = CVPixelBufferGetHeight(cvBuf) // print 2160
Try this
let cvBuf: CVBuffer = Array(UnsafeMutableBufferPointer(start: data3, count: 3))
.withUnsafeBufferPointer {
$0.baseAddress!.withMemoryRebound(to: CVBuffer.self, capacity: 1) { $0 }
}.pointee
or maybe even
let cvBuf: CVBuffer = unsafeBitcast(UnsafeMutableBufferPointer(start: data3, count: 3), to: CVBuffer.self)
/**
@function CVPixelBufferGetBaseAddressOfPlane
@abstract Returns the base address of the plane at planeIndex in the PixelBuffer.
@discussion Retrieving the base address for a PixelBuffer requires that the buffer base address be locked
via a successful call to CVPixelBufferLockBaseAddress. On OSX 10.10 and earlier, or iOS 8 and
earlier, calling this function with a non-planar buffer will have undefined behavior.
@param pixelBuffer Target PixelBuffer.
@param planeIndex Identifying the plane.
@result Base address of the plane, or NULL for non-planar CVPixelBufferRefs.
*/
@available(iOS 4.0, *)
public func CVPixelBufferGetBaseAddressOfPlane(_ pixelBuffer: CVPixelBuffer, _ planeIndex: Int) -> UnsafeMutableRawPointer?
Maybe you can try using CVPixelBufferLockBaseAddress before using CVPixelBufferGetBaseAddressOfPlane.
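A minimal sketch of that (assuming read-only access to the buffer obtained above as cvBuf):
CVPixelBufferLockBaseAddress(cvBuf, .readOnly)
let planeBase = CVPixelBufferGetBaseAddressOfPlane(cvBuf, 0) // valid while the base address is locked
// ... read the pixel data here ...
CVPixelBufferUnlockBaseAddress(cvBuf, .readOnly)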