I'm trying to read the image frames from a QuickTime movie file using AVFoundation and AVAssetReader on macOS. I want to display the frames via a texture map in Metal. There are many examples of using AVAssetReader online, but I cannot get it working for what I want.
I can read the basic frame data from the movie -- the time values, size, and durations in the printout look correct. However, when I try to get the pixelBuffer, CMSampleBufferGetImageBuffer returns NULL.
let track = asset.tracks(withMediaType: AVMediaType.video)[0]
let videoReaderSettings: [String: Int] = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_32BGRA)]
let output = AVAssetReaderTrackOutput(track: track, outputSettings: nil) // using videoReaderSettings causes it to no longer report frame data
guard let reader = try? AVAssetReader(asset: asset) else { exit(1) }
output.alwaysCopiesSampleData = true
reader.add(output)
reader.startReading()

var frameNumber = 0
while reader.status == .reading {
    if let sampleBuffer = output.copyNextSampleBuffer(), CMSampleBufferIsValid(sampleBuffer) {
        let frameTime = CMSampleBufferGetOutputPresentationTimeStamp(sampleBuffer)
        if frameTime.isValid {
            print("frame: \(frameNumber), time: \(String(format: "%.3f", frameTime.seconds)), size: \(CMSampleBufferGetTotalSampleSize(sampleBuffer)), duration: \(CMSampleBufferGetOutputDuration(sampleBuffer).value)")
            if let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
                getTextureFromCVBuffer(pixelBuffer)
                // break
            }
            frameNumber += 1
        }
    }
}
This problem was addressed here (Why does CMSampleBufferGetImageBuffer return NULL), where it is suggested that one must specify a video format in the settings argument instead of nil. So I tried replacing nil with videoReaderSettings above, with various values for the format: kCVPixelFormatType_32BGRA, kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, and others.
The result is that the frame 'time' values are still correct, but the 'size' and 'duration' values are 0. However, CMSampleBufferGetImageBuffer now DOES return something, where before it was NULL. But garbage shows up onscreen.
Here is the function which converts the pixelBuffer to a Metal texture.
func getTextureFromCVBuffer(_ pixelBuffer: CVPixelBuffer) {
    // Get width and height for the pixel buffer
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)

    // Convert the pixel buffer into a Metal texture.
    var cvTextureOut: CVMetalTexture?
    if CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, self.textureCache!, pixelBuffer, nil, .bgra8Unorm, width, height, 0, &cvTextureOut) != kCVReturnSuccess {
        print("CVMetalTexture create failed!")
    }
    guard let cvTexture = cvTextureOut, let inputTexture = CVMetalTextureGetTexture(cvTexture) else {
        print("Failed to create metal texture")
        return
    }
    texture = inputTexture
}
When I'm able to pass a pixelBuffer to this function, it does report the correct size for the image. But as I said, what appears onscreen is garbage -- it's composed of chunks of recent Safari browser pages, actually. I'm not sure if the problem is in the first function or the second function. A nonzero return value from CMSampleBufferGetImageBuffer is encouraging, but the 0's for size and duration are not.
I found this thread (Buffer size of CMSampleBufferRef) which suggests that showing 0 for the size and duration may not be a problem, so maybe the issue is in the conversion to the Metal texture?
Any idea what I am doing wrong?
Thanks!
Put videoReaderSettings into the AVAssetReaderTrackOutput:
let videoReaderSettings: [String: Int] = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_32BGRA)]
let output = AVAssetReaderTrackOutput(track: track, outputSettings: videoReaderSettings)
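Since the frames are destined for Metal, it may also help to request Metal-compatible BGRA pixel buffers in those settings. A minimal sketch, assuming the same track/output setup as above; kCVPixelBufferMetalCompatibilityKey asks CoreVideo for IOSurface-backed buffers that a CVMetalTextureCache can wrap without a copy:

let videoReaderSettings: [String: Any] = [
    kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_32BGRA),
    // Assumption: Metal compatibility is requested so the decoded buffers
    // can be wrapped by CVMetalTextureCacheCreateTextureFromImage directly.
    kCVPixelBufferMetalCompatibilityKey as String: true
]
let output = AVAssetReaderTrackOutput(track: track, outputSettings: videoReaderSettings)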
I am trying to create a real-time video processing app in which I need to get the RGBA values of all pixels for each frame, process them using an external library, and display them. The way I am doing it is too slow, and I was wondering if there is a way to do it faster using vImage. This is my current code and the way I get all the pixels from the current frame:
guard let cgImage = context.makeImage() else {
    return nil
}
guard let data = cgImage.dataProvider?.data,
      let bytes = CFDataGetBytePtr(data) else {
    fatalError("Couldn't access image data")
}
assert(cgImage.colorSpace?.model == .rgb)

let bytesPerPixel = cgImage.bitsPerPixel / cgImage.bitsPerComponent
gp.async {
    for y in 0 ..< cgImage.height {
        for x in 0 ..< cgImage.width {
            let offset = (y * cgImage.bytesPerRow) + (x * bytesPerPixel)
            let components = (r: bytes[offset], g: bytes[offset + 1], b: bytes[offset + 2])
            print("[x:\(x), y:\(y)] \(components)")
        }
        print("---")
    }
}
This is the version using vImage, but there is a memory leak and I cannot access the pixels:
guard
    let format = vImage_CGImageFormat(cgImage: cgImage),
    var buffer = try? vImage_Buffer(cgImage: cgImage,
                                    format: format) else {
    exit(-1)
}
let rowStride = buffer.rowBytes / MemoryLayout<Pixel_8>.stride / format.componentCount

do {
    let componentCount = format.componentCount
    var argbSourcePlanarBuffers: [vImage_Buffer] = (0 ..< componentCount).map { _ in
        guard let buffer1 = try? vImage_Buffer(width: Int(buffer.width),
                                               height: Int(buffer.height),
                                               bitsPerPixel: format.bitsPerComponent) else {
            fatalError("Error creating source buffers.")
        }
        return buffer1
    }
    vImageConvert_ARGB8888toPlanar8(&buffer,
                                    &argbSourcePlanarBuffers[0],
                                    &argbSourcePlanarBuffers[1],
                                    &argbSourcePlanarBuffers[2],
                                    &argbSourcePlanarBuffers[3],
                                    vImage_Flags(kvImageNoFlags))
    let n = rowStride * Int(argbSourcePlanarBuffers[1].height) * format.componentCount
    let start = buffer.data.assumingMemoryBound(to: Pixel_8.self)
    var ptr = UnsafeBufferPointer(start: start, count: n)
    print(Array(argbSourcePlanarBuffers)[1]) // prints the first 15 interleaved values
    buffer.free()
}
You can access the underlying pixels in a vImage buffer to do this.
For example, given an image named cgImage, use the following code to populate a vImage buffer:
guard
    let format = vImage_CGImageFormat(cgImage: cgImage),
    let buffer = try? vImage_Buffer(cgImage: cgImage,
                                    format: format) else {
    exit(-1)
}
let rowStride = buffer.rowBytes / MemoryLayout<Pixel_8>.stride / format.componentCount
Note that a vImage buffer's data may be wider than the image (see: https://developer.apple.com/documentation/accelerate/finding_the_sharpest_image_in_a_sequence_of_captured_images) which is why I've added rowStride.
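As a quick worked example (a sketch, assuming 8-bit components as above), rowStride lets you index an arbitrary pixel (x, y) without being tripped up by the row padding:

// The element offset of pixel (x, y) is (y * rowStride + x) * componentCount,
// because rowStride is measured in pixels, not bytes.
let x = 10, y = 20
let start = buffer.data.assumingMemoryBound(to: Pixel_8.self)
let offset = (y * rowStride + x) * format.componentCount
let firstComponent = start[offset] // first component of that pixel (e.g. alpha or red)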
To access the pixels as a single buffer of interleaved values, use:
do {
    let n = rowStride * Int(buffer.height) * format.componentCount
    let start = buffer.data.assumingMemoryBound(to: Pixel_8.self)
    let ptr = UnsafeBufferPointer(start: start, count: n)
    print(Array(ptr)[0 ... 15]) // prints the first 16 interleaved values
}
To access the pixels as a buffer of Pixel_8888 values, use the following (make sure that format.componentCount is 4):
do {
    let n = rowStride * Int(buffer.height)
    let start = buffer.data.assumingMemoryBound(to: Pixel_8888.self)
    let ptr = UnsafeBufferPointer(start: start, count: n)
    print(Array(ptr)[0 ... 3]) // prints the first 4 pixels
}
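Since Pixel_8888 is a four-element UInt8 tuple in the Swift overlay, each pixel can also be destructured directly; note that the channel order depends on the source format (ARGB is assumed here):

do {
    let start = buffer.data.assumingMemoryBound(to: Pixel_8888.self)
    // For an ARGB source the tuple elements are alpha, red, green, blue.
    let (a, r, g, b) = start[0]
    print("first pixel:", a, r, g, b)
}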
This is the slowest way to do it. A faster way is with a custom CoreImage filter.
Faster than that is to write your own OpenGL shader (or rather, its equivalent in Metal for current devices).
I've written OpenGL shaders, but have not worked with Metal yet.
Both allow you to write graphics code that runs directly on the GPU.
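For example, here is a hedged sketch of the CoreImage route: a custom CIColorKernel that runs per pixel on the GPU. It uses the legacy string-based kernel API for brevity (newer code would compile the kernel from Metal source), and the kernel body is just a placeholder for your own per-pixel processing:

import CoreImage

// Sketch only: a per-pixel kernel that swaps the red and blue channels.
let kernel = CIColorKernel(source: """
    kernel vec4 swapRB(__sample s) {
        return vec4(s.b, s.g, s.r, s.a);
    }
    """)

func process(_ image: CIImage) -> CIImage? {
    // apply(extent:arguments:) runs the kernel over every pixel on the GPU.
    kernel?.apply(extent: image.extent, arguments: [image])
}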
I'm kind of stuck right now when it comes to applying a specific texture to my 3D .obj model.
The easiest solution of all would be to do let test = SCNScene(named: "models.scnassets/modelFolder/ModelName.obj"), but this requires that the .mtl file map the texture file directly inside of it, which is not possible with my current workflow.
With my current understanding, this leaves me with the option of using a scattering function to apply textures to a specific semantic, something like this:
if let url = URL(string: obj) {
    let asset = MDLAsset(url: url)
    guard let object = asset.object(at: 0) as? MDLMesh else {
        print("Failed to get mesh from asset.")
        self.presentAlert(title: "Warning", message: "Could not fetch the model.", firstBtn: "Ok")
        return
    }

    // Create a material from the various textures with a scatteringFunction
    let scatteringFunction = MDLScatteringFunction()
    let material = MDLMaterial(name: "material", scatteringFunction: scatteringFunction)
    let property = MDLMaterialProperty(name: "texture", semantic: .baseColor, url: URL(string: self.textureURL))
    material.setProperty(property)

    // Apply the texture to every submesh of the asset
    object.submeshes?.forEach {
        if let submesh = $0 as? MDLSubmesh {
            submesh.material = material
        }
    }

    // Wrap the ModelIO object in a SceneKit object
    let node = SCNNode(mdlObject: object)
    let scene = SCNScene()
    scene.rootNode.addChildNode(node)

    // Set up the SceneView
    sceneView.scene = scene
    ...
}
The actual problem is the semantics. The 3D models are made in Unreal, and for many models there's a PNG texture which has 3 semantics packed inside of it, namely ambient occlusion, roughness, and metallic. Ambient occlusion would need to come from the red channel, roughness from the green channel, and metallic from the blue channel.
How could I achieve this? MDLMaterialSemantic has all of these possible semantics, but metallic, ambient occlusion, and roughness are all separate. I tried simply applying the texture to each of them, but obviously this did not work very well.
Considering that my .png texture has all 3 of those "packaged" in it, each under a different channel, how can I work with this? I was thinking that maybe I could somehow use a small script to add the texture mapping to the .mtl file directly in the app on my end, but this seems sketchy lol..
What are my other options if there's no way of doing this? I've also been trying to use .fbx files with AssimpKit, but I couldn't manage to load any textures, just the model in black...
I am open to any suggestion, if more info is needed, please let me know! Thank you very much!
Sorry, I don't have enough rep to comment, but this might be more of a comment than an answer!
Have you tried loading the texture PNG image separately (as an NS/UI/CGImage) and then splitting it into three channels manually, then applying these channels separately? (Splitting into three separate channels is not as simple as it could be... but you could use this grayscale conversion for guidance, and just do one channel at a time.)
Once you have your objects in SceneKit, it is possibly slightly easier to modify these materials. Once you have an SCNNode with an SCNGeometry with an SCNMaterial, you can access any of these materials and set the .contents property to almost anything (including an XXImage).
Edit:
Here's an extension you can try to extract the individual channels from a CGImage using Accelerate. You can get a CGImage from an NSImage/UIImage depending on whether you're on Mac or iOS (and you can load the file directly into one of those image formats).
I've just adapted the code from the link above, I am not very experienced with the Accelerate framework, so use at your own risk! But hopefully this puts you on the right path.
extension CGImage {
    enum Channel {
        case red, green, blue
    }

    func getChannel(channel: Channel) -> CGImage? {
        // code adapted from https://developer.apple.com/documentation/accelerate/converting_color_images_to_grayscale
        guard let format = vImage_CGImageFormat(cgImage: self) else { return nil }
        guard var sourceImageBuffer = try? vImage_Buffer(cgImage: self, format: format) else { return nil }
        guard var destinationBuffer = try? vImage_Buffer(width: Int(sourceImageBuffer.width), height: Int(sourceImageBuffer.height), bitsPerPixel: 8) else { return nil }
        defer {
            sourceImageBuffer.free()
            destinationBuffer.free()
        }

        // A coefficient of 1 for the requested channel and 0 for the others
        // turns the grayscale matrix multiply into a channel extraction.
        let redCoefficient: Float = channel == .red ? 1 : 0
        let greenCoefficient: Float = channel == .green ? 1 : 0
        let blueCoefficient: Float = channel == .blue ? 1 : 0

        let divisor: Int32 = 0x1000
        let fDivisor = Float(divisor)
        var coefficientsMatrix = [
            Int16(redCoefficient * fDivisor),
            Int16(greenCoefficient * fDivisor),
            Int16(blueCoefficient * fDivisor)
        ]
        let preBias: [Int16] = [0, 0, 0, 0]
        let postBias: Int32 = 0

        vImageMatrixMultiply_ARGB8888ToPlanar8(&sourceImageBuffer,
                                               &destinationBuffer,
                                               &coefficientsMatrix,
                                               divisor,
                                               preBias,
                                               postBias,
                                               vImage_Flags(kvImageNoFlags))

        guard let monoFormat = vImage_CGImageFormat(
            bitsPerComponent: 8,
            bitsPerPixel: 8,
            colorSpace: CGColorSpaceCreateDeviceGray(),
            bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue),
            renderingIntent: .defaultIntent) else { return nil }
        guard let result = try? destinationBuffer.createCGImage(format: monoFormat) else { return nil }
        return result
    }
}
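Then, as hypothetical usage (assuming packed is the loaded CGImage and material is the SCNMaterial of your geometry, using the physically based lighting model), the extracted channels can be assigned to the corresponding material properties:

material.lightingModel = .physicallyBased
if let ao = packed.getChannel(channel: .red),
   let roughness = packed.getChannel(channel: .green),
   let metalness = packed.getChannel(channel: .blue) {
    material.ambientOcclusion.contents = ao
    material.roughness.contents = roughness
    material.metalness.contents = metalness
}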
Xcode's GPU frame capture highlights multiple expressions in purple and says I should set the texture storage mode to private because only the GPU accesses it. I am trying to fix the purple suggestion.
Memory Usage: 'Texture:0x10499ae00 "CoreVideo 0x6000017f2bc0"' has storage mode 'Managed' but is accessed exclusively by a GPU
When using device.makeBuffer(bytes:length:options:) to create an MTLBuffer, I can set the storage mode to private in the options argument.
But when creating an MTLTexture from a CVPixelBuffer through CVMetalTextureCacheCreateTextureFromImage(), I don't know how to configure the storage mode for the created texture.
Ways I tried:
Pass a texture attributes dictionary to the textureAttributes argument in CVMetalTextureCacheCreateTextureFromImage(..., _ textureAttributes: CFDictionary?, ...)
var textureAttrs: [String: Any] = [:]
if #available(macOS 10.15, *) {
    textureAttrs[kCVMetalTextureStorageMode as String] = MTLStorageMode.private
}
CVMetalTextureCacheCreateTextureFromImage(..., textureAttrs as CFDictionary, ..., &texture)
if let texture = texture,
   let metalTexture = CVMetalTextureGetTexture(texture) {
    print(metalTexture.storageMode.rawValue)
}
My OS is already 10.15.4, but the created MTLTexture still has storageMode as managed (rawValue: 1).
Pass the same attributes to CVMetalTextureCacheCreate(), which creates the cache for CVMetalTextureCacheCreateTextureFromImage, in both cacheAttributes and textureAttributes.
The result is the same.
Problems:
Does my attributes dictionary have the wrong key-value set? The Apple documentation doesn't describe which key and value need to be set.
Or is there a correct way to configure this?
Or is it simply not supported yet?
References:
makeBuffer(bytes:length:options:)
CVMetalTextureCacheCreateTextureFromImage(_:_:_:_:_:_:_:_:_:)
CVMetalTextureCacheCreate(_:_:_:_:_:)
macOS 10.15+: kCVMetalTextureStorageMode
I have experience with Metal and have had the same kind of issue. There is no way to change a texture's storageMode on the fly. You have to create another MTLTexture with the desired storageMode and use an MTLBlitCommandEncoder to copy the data to it.
Here is the piece of code from my project:
MTLTextureDescriptor* descriptor = [[MTLTextureDescriptor alloc] init];
descriptor.storageMode = MTLStorageModePrivate;
descriptor.pixelFormat = MTLPixelFormatRGBA8Unorm;
descriptor.width = width;
descriptor.height = height;

id<MTLTexture> texture = [__metal_device newTextureWithDescriptor:descriptor];

if ((data != NULL) && (size > 0)) {
    id<MTLCommandQueue> command_queue = [__metal_device newCommandQueue];
    id<MTLCommandBuffer> command_buffer = [command_queue commandBuffer];
    id<MTLBlitCommandEncoder> command_encoder = [command_buffer blitCommandEncoder];
    id<MTLBuffer> buffer = [__metal_device newBufferWithBytes:data
                                                       length:size
                                                      options:MTLResourceStorageModeShared];

    [command_encoder copyFromBuffer:buffer
                       sourceOffset:0
                  sourceBytesPerRow:(width * 4)
                sourceBytesPerImage:(width * height * 4)
                         sourceSize:(MTLSize){ width, height, 1 }
                          toTexture:texture
                   destinationSlice:0
                   destinationLevel:0
                  destinationOrigin:(MTLOrigin){ 0, 0, 0 }];
    [command_encoder endEncoding];

    [command_buffer commit];
    [command_buffer waitUntilCompleted];
}
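In Swift, starting from the texture that the CV texture cache hands back, a minimal sketch of the same idea might look like this (device, commandQueue, and sourceTexture are assumed to exist in your project; the whole-texture copy(from:to:) blit requires macOS 10.15 / iOS 13):

let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: sourceTexture.pixelFormat,
                                                          width: sourceTexture.width,
                                                          height: sourceTexture.height,
                                                          mipmapped: false)
descriptor.storageMode = .private
descriptor.usage = .shaderRead

// Blit the managed, IOSurface-backed texture into the private one.
let privateTexture = device.makeTexture(descriptor: descriptor)!
let commandBuffer = commandQueue.makeCommandBuffer()!
let blit = commandBuffer.makeBlitCommandEncoder()!
blit.copy(from: sourceTexture, to: privateTexture)
blit.endEncoding()
commandBuffer.commit()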
To set the texture attributes, you must use the rawValue of the MTLStorageMode. For example:
var textureAttrs: [String: Any] = [:]
if #available(macOS 10.15, *) {
    textureAttrs[kCVMetalTextureStorageMode as String] = MTLStorageMode.managed.rawValue
}
var textureCache: CVMetalTextureCache!
CVMetalTextureCacheCreate(nil, nil, device, textureAttrs as CFDictionary, &textureCache)
Once the cache is created with those texture attributes, you can pass nil as the texture attributes in CVMetalTextureCacheCreateTextureFromImage when creating each texture as it will use whatever storage mode the cache was created with. For example:
var cvTexture: CVMetalTexture?
CVMetalTextureCacheCreateTextureFromImage(nil, textureCache, pixelBuffer, nil, .bgra8Unorm, width, height, 0, &cvTexture)
Something to note - I was getting Metal warnings in Xcode that my textures should be made private instead of managed, but when setting the cache to a private storage mode, the following error occurred:
failed assertion 'Texture Descriptor Validation IOSurface textures must use MTLStorageModeManaged'
This is because these textures are IOSurface-backed. So for now I'm keeping it managed.
I'm having trouble converting a linear PCM buffer to a compressed AAC ELD (Enhanced Low Delay) buffer.
I got some working code for the conversion into the iLBC format from this question:
AVAudioCompressedBuffer to UInt8 array and vice versa
This approach worked fine.
I changed the input for the format to this:
let packetCapacity = 8
let maximumPacketSize = 96
lazy var capacity = packetCapacity * maximumPacketSize // 768
let convertedSampleRate: Double = 16000

lazy var aaceldFormat: AVAudioFormat = {
    var descriptor = AudioStreamBasicDescription(mSampleRate: convertedSampleRate, mFormatID: kAudioFormatMPEG4AAC_ELD, mFormatFlags: 0, mBytesPerPacket: 0, mFramesPerPacket: 0, mBytesPerFrame: 0, mChannelsPerFrame: 1, mBitsPerChannel: 0, mReserved: 0)
    return AVAudioFormat(streamDescription: &descriptor)!
}()
The conversion to a compressed buffer worked fine and I was able to convert the buffer to a UInt8 Array.
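(For reference, that forward converter can be built with AVAudioConverter; this is just a minimal sketch, and the input format here is an assumption that must match the actual source buffers:)

lazy var compressor: AVAudioConverter? = {
    // Assumed input: mono Float32 PCM at the same sample rate as the output.
    guard let pcmFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                        sampleRate: convertedSampleRate,
                                        channels: 1,
                                        interleaved: false) else { return nil }
    return AVAudioConverter(from: pcmFormat, to: aaceldFormat)
}()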
However, the conversion back to a PCM Buffer didn't work. The input block for the conversion back to a buffer looks like this:
func convertToBuffer(uints: [UInt8], outcomeSampleRate: Double) -> AVAudioPCMBuffer? {
    // Convert to buffer
    let compressedBuffer = AVAudioCompressedBuffer(format: aaceldFormat, packetCapacity: AVAudioPacketCount(packetCapacity), maximumPacketSize: maximumPacketSize)
    compressedBuffer.byteLength = UInt32(capacity)
    compressedBuffer.packetCount = AVAudioPacketCount(packetCapacity)

    var compressedBytes = uints
    compressedBytes.withUnsafeMutableBufferPointer {
        compressedBuffer.data.copyMemory(from: $0.baseAddress!, byteCount: capacity)
    }

    guard let audioFormat = AVAudioFormat(
        commonFormat: AVAudioCommonFormat.pcmFormatFloat32,
        sampleRate: outcomeSampleRate,
        channels: 1,
        interleaved: false
    ) else { return nil }

    guard let uncompressor = getUncompressingConverter(outputFormat: audioFormat) else { return nil }

    var newBufferAvailable = true
    let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
        if newBufferAvailable {
            outStatus.pointee = .haveData
            newBufferAvailable = false
            return compressedBuffer
        } else {
            outStatus.pointee = .noDataNow
            return nil
        }
    }

    guard let uncompressedBuffer = AVAudioPCMBuffer(pcmFormat: audioFormat, frameCapacity: AVAudioFrameCount(audioFormat.sampleRate / 10)) else { return nil }

    var conversionError: NSError?
    uncompressor.convert(to: uncompressedBuffer, error: &conversionError, withInputFrom: inputBlock)
    if let err = conversionError {
        print("couldnt decompress compressed buffer", err)
    }

    return uncompressedBuffer
}
The error block after the convert method triggers and prints out "too few bits left in input buffer". Also, it seems like the input block only gets called once.
I've tried different approaches, and this seems to be one of the most common outcomes. I'm also not sure if the problem is in the initial conversion from the PCM buffer to the UInt8 array, although I get a UInt8 array filled with 768 values every 0.1 seconds (sometimes the array contains a few zeros at the end, which doesn't happen with the iLBC format).
Questions:
1. Is the initial conversion from PCM buffer to UInt8 array done with the right approach? Are packetCapacity, capacity and maximumPacketSize valid? -> Again, this seems to work.
2. Am I missing something in the conversion back to a PCM buffer? And am I using the variables in the right way?
3. Has anyone achieved this conversion without using C in the project?
**EDIT:** I also worked with the approach from this post:
Decode AAC to PCM format using AVAudioConverter Swift
It works fine with the AAC format, but not with AAC_LD or AAC_ELD.
I am receiving audio buffers and converting them into a conventional array for ease of use. This code has always been reliable. However, it recently began crashing quite frequently. I am using AirPods when it crashes, which may or may not be part of the problem. The mic object is an AKMicrophone object from AudioKit.
func tap() {
    let recordingFormat = mic.outputNode.inputFormat(forBus: 0)
    mic.outputNode.removeTap(onBus: 0)
    mic.outputNode.installTap(onBus: 0,
                              bufferSize: UInt32(recordingBufferSize),
                              format: recordingFormat) { (buffer, when) in
        let stereoDataUnsafePointer = buffer.floatChannelData!
        let monoPointer = stereoDataUnsafePointer.pointee
        let count = self.recordingBufferSize
        let bufferPointer = UnsafeBufferPointer(start: monoPointer, count: count)
        let array = Array(bufferPointer) // CRASHES HERE
    }
    mic.start()
}
When running on an iPhone 7 with AirPods, this crashes about 7 out of 10 times with one of two different error messages:
EXC_BAD_ACCESS
Fatal error: UnsafeMutablePointer.initialize overlapping range
If the way I was converting the array were wrong, I would expect it to crash every time. I speculate that the recording sample rate could be an issue.
I figured out the answer. I had hardcoded the buffer size to 10000 and, when installing the tap, specified a buffer size of 10000. However, the device ignored this buffer size and instead sent me buffers of 6400 frames. This meant that when I tried to initialize an array with size 10000, it went off the end. I modified the code to check the actual buffer size, not the size I requested:
let stereoDataUnsafePointer = buffer.floatChannelData!
let monoPointer = stereoDataUnsafePointer.pointee
let count = buffer.frameLength // <-- Check actual buffer size
let bufferPointer = UnsafeBufferPointer(start: monoPointer, count: Int(count))
let array = Array(bufferPointer)
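Note that the bufferSize parameter of installTap(onBus:bufferSize:format:block:) is only a requested size; Apple's documentation says the implementation may choose another size, so sizing arrays from buffer.frameLength is the safe approach.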