How to manually release CMSampleBuffer - swift

This code leads to memory leak and app crash:
var outputSamples = [Float]()
while assetReader.status == .reading {
    let trackOutput = assetReader.outputs.first!
    if let sampleBuffer = trackOutput.copyNextSampleBuffer(),
       let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
        let blockBufferLength = CMBlockBufferGetDataLength(blockBuffer)
        let sampleLength = CMSampleBufferGetNumSamples(sampleBuffer) * channelCount(from: assetReader)
        var data = Data(capacity: blockBufferLength)
        data.withUnsafeMutableBytes { (blockSamples: UnsafeMutablePointer<Int16>) in
            CMBlockBufferCopyDataBytes(blockBuffer, atOffset: 0, dataLength: blockBufferLength, destination: blockSamples)
            let processedSamples = process(blockSamples,
                                           ofLength: sampleLength,
                                           from: assetReader,
                                           downsampledTo: targetSampleCount)
            outputSamples += processedSamples
        }
    }
}
var paddedSamples = [Float](repeating: silenceDbThreshold, count: targetSampleCount)
paddedSamples.replaceSubrange(0..<min(targetSampleCount, outputSamples.count), with: outputSamples)
This is due to copyNextSampleBuffer() and the Create Rule: the caller owns the returned buffer and is responsible for releasing it.
However, CFRelease() is unavailable in Swift, and it is beyond my understanding why the documentation links to a rule that applies only to Objective-C.
Is there a way to release a CMSampleBuffer manually in Swift?

I recently solved a similar issue by using an autoreleasepool.
Try wrapping the area where sampleBuffer is used in an autoreleasepool, something like this:
var outputSamples = [Float]()
while assetReader.status == .reading {
    let trackOutput = assetReader.outputs.first!
    autoreleasepool {
        if let sampleBuffer = trackOutput.copyNextSampleBuffer(),
           let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
            let blockBufferLength = CMBlockBufferGetDataLength(blockBuffer)
            let sampleLength = CMSampleBufferGetNumSamples(sampleBuffer) * channelCount(from: assetReader)
            var data = Data(capacity: blockBufferLength)
            data.withUnsafeMutableBytes { (blockSamples: UnsafeMutablePointer<Int16>) in
                CMBlockBufferCopyDataBytes(blockBuffer, atOffset: 0, dataLength: blockBufferLength, destination: blockSamples)
                let processedSamples = process(blockSamples,
                                               ofLength: sampleLength,
                                               from: assetReader,
                                               downsampledTo: targetSampleCount)
                outputSamples += processedSamples
            }
        }
    }
}
var paddedSamples = [Float](repeating: silenceDbThreshold, count: targetSampleCount)
paddedSamples.replaceSubrange(0..<min(targetSampleCount, outputSamples.count), with: outputSamples)
If I understand correctly, the sampleBuffer will be released once execution leaves the scope of the autoreleasepool.
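As a minimal sketch of the pattern only (not the audio pipeline above): the loop below drains a pool on every iteration, so autoreleased temporaries created in one pass do not accumulate until the loop ends. The function name sumOfChunkSizes and the use of Data as a stand-in for a sample buffer are illustrative assumptions. On non-Apple platforms autoreleasepool is not available, so a pass-through shim keeps the sketch compiling there.

```swift
import Foundation

#if !canImport(ObjectiveC)
// Assumption: outside Apple platforms there is no autoreleasepool,
// so this pass-through shim simply runs the body.
func autoreleasepool<Result>(invoking body: () throws -> Result) rethrows -> Result {
    return try body()
}
#endif

/// Hypothetical helper: processes `chunkCount` temporary buffers and
/// returns the total number of bytes seen.
func sumOfChunkSizes(chunkCount: Int, chunkSize: Int) -> Int {
    var total = 0
    for _ in 0..<chunkCount {
        // Each iteration gets its own pool; objects autoreleased inside
        // the closure are released when the closure returns.
        autoreleasepool {
            let chunk = Data(count: chunkSize) // stand-in for a sample buffer
            total += chunk.count
        }
    }
    return total
}
```

The important design point is that the pool sits inside the loop body; a single pool around the whole loop would only drain after the last iteration.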

This is not really a solution: it seems that releasing the memory manually is impossible, and using a while loop with assetReader results in memory not being released when the unsafe mutable bytes are read.
I solved the problem with a workaround: converting the audio file to CAF format before feeding it to the while loop.
Downside: the conversion takes time, and the longer the audio file, the longer it takes.
Upside: it uses only a minuscule amount of memory, and memory was the problem in the first place.
Inspired by: answer in Extract meter levels from audio file


Memory issue in using CVPixelBufferPoolCreatePixelBuffer

I'm converting a CIImage to a CVPixelBuffer for streaming, so this conversion happens 25-30 times per second and needs to be fast.
I heard that using a buffer pool improves performance. Here is my setup:
private var bufferPool: CVPixelBufferPool?
private let context = CIContext(options: [.cacheIntermediates: false])
and then the conversion:
var mergedImageBuffer: CVPixelBuffer?
guard let bufferPool = bufferPool else {
    Logger.logError("Error retrieving final buffer pool.")
    return
}
CVPixelBufferPoolCreatePixelBuffer(nil, bufferPool, &mergedImageBuffer)
guard let validMergedImageBuffer = mergedImageBuffer else {
    Logger.logError("Error creating CVPixelBuffer for output image.")
    return
}
context.render(inputCIImage, to: validMergedImageBuffer)
But there seems to be a memory leak: the app crashes after 30-40 seconds, and the crash points to the CVPixelBufferPoolCreatePixelBuffer(nil, bufferPool, &mergedImageBuffer) line with the error EXC_RESOURCE RESOURCE_TYPE_MEMORY. I read that CVPixelBufferLockBaseAddress and CVPixelBufferUnlockBaseAddress might fix it, but I can't make it work. Could anyone help me with that? Thanks.
I changed the process:
var pixelBuffer: CVPixelBuffer?
let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
let width:Int = Int(bufferPoolWidth)
let height:Int = Int(bufferPoolHeight)
guard let mergedPixelBuffer = pixelBuffer else { return }
CVPixelBufferLockBaseAddress(mergedPixelBuffer, .readOnly)
let context = CIContext()
context.render(inputCIImage, to: mergedPixelBuffer)
... Here I use the pixel buffer to send to the stream, and then
CVPixelBufferUnlockBaseAddress(mergedPixelBuffer, .readOnly)
It's now much better (no crashing), but there are still some large memory peaks. Is there any way to improve it?

How do I specify a deallocator for memory allocated with posix_memalign in Swift?

I am creating page-aligned memory with posix_memalign, then I create an MTLBuffer from it with no copy. The GPU then blits data into that MTLBuffer. When that completes, I wrap the same memory in Data with Data.init(bytesNoCopy:count:deallocator:) to pass on in my project. I don't know what to use as the deallocator. I am translating this code from an Apple tutorial written in Obj-C. The Apple code is here. I spent two days researching this myself, trying to understand it.
The deallocator in the Apple Obj-C code looks like this; it is beyond my Obj-C knowledge.
// Block to dealloc memory created with vm_allocate
void (^deallocProvidedAddress)(void *bytes, NSUInteger length) =
^(void *bytes, NSUInteger length)
The code in question is towards the end of my listing.
// Blit all positions and velocities and provide them to the client either to show final results
// or continue the simulation on another device
func provideFullData(
_ dataProvider: AAPLFullDatasetProvider,
forSimulationTime time: CFAbsoluteTime
) {
let positionDataSize = positions[oldBufferIndex]!.length
let velocityDataSize = velocities[oldBufferIndex]!.length
var positionDataAddress: UnsafeMutableRawPointer? = nil
var velocityDataAddress: UnsafeMutableRawPointer? = nil
// Create buffers to transfer data to client
do {
// allocate memory on page-aligned addresses used by both GPU and CPU
let alignment = 0x4000
// make length a multiple of alignment
let positionAllocationSize = (positionDataSize + alignment - 1) & (~(alignment - 1))
posix_memalign(&positionDataAddress, alignment, positionAllocationSize)
let velocityAllocationSize = (velocityDataSize + alignment - 1) & (~(alignment - 1))
posix_memalign(&velocityDataAddress, alignment, velocityAllocationSize)
// Blit positions and velocities to a buffer for transfer
do {
// create MTL buffers from the aligned memory created above
let positionBuffer = device.makeBuffer(
bytesNoCopy: &positionDataAddress,
length: positionDataSize,
options: .storageModeShared,
deallocator: nil)
positionBuffer?.label = "Final Positions Buffer"
let velocityBuffer = device.makeBuffer(
bytesNoCopy: &velocityDataAddress,
length: velocityDataSize,
options: .storageModeShared,
deallocator: nil)
velocityBuffer?.label = "Final Velocities Buffer"
let commandBuffer = commandQueue?.makeCommandBuffer()
commandBuffer?.label = "Full Transfer Command Buffer"
let blitEncoder = commandBuffer?.makeBlitCommandEncoder()
blitEncoder?.label = "Full Transfer Blits"
blitEncoder?.pushDebugGroup("Full Position Data Blit")
if let _position = positions[oldBufferIndex], let positionBuffer {
from: _position,
sourceOffset: 0,
to: positionBuffer,
destinationOffset: 0,
size: positionBuffer.length)
blitEncoder?.pushDebugGroup("Full Velocity Data Blit")
if let _velocity = velocities[oldBufferIndex], let velocityBuffer {
from: _velocity,
sourceOffset: 0,
to: velocityBuffer,
destinationOffset: 0,
size: velocityBuffer.length)
// Ensure blit of data is complete before providing
// the data to the client
// Wrap the memory allocated with vm_allocate
// with a NSData object which will allow the app to
// rely on ObjC ARC (or even MMR) to manage the
// memory's lifetime. Initialize NSData object
// with a deallocation block to free the
// vm_allocated memory when the object has been
// deallocated
do {
// This code was in Obj-C; I don't know how to convert it to Swift
// Block to dealloc memory created with vm_allocate
// let deallocProvidedAddress: ((_ bytes: UnsafeMutableRawPointer?, _ length: Int) -> Void)? =
// { bytes, length in
// vm_deallocate(
// mach_task_self() as? vm_map_t,
// bytes as? vm_address_t,
// length)
// }
let positionData = Data(
bytesNoCopy: &positionDataAddress,
count: positionDataSize,
deallocator: .none) // this may be a memory leak
let velocityData = Data(
bytesNoCopy: &velocityDataAddress,
count: velocityDataSize,
deallocator: .none) // this may be a memory leak
dataProvider(positionData, velocityData, time)
Here is the listing for the Apple OBJ-C code
// Set the initial positions and velocities of the simulation based upon the simulation's config
- (void)initializeData
const float pscale = _config->clusterScale;
const float vscale = _config->velocityScale * pscale;
const float inner = 2.5f * pscale;
const float outer = 4.0f * pscale;
const float length = outer - inner;
_oldBufferIndex = 0;
_newBufferIndex = 1;
vector_float4 *positions = (vector_float4 *) _positions[_oldBufferIndex].contents;
vector_float4 *velocities = (vector_float4 *) _velocities[_oldBufferIndex].contents;
for(int i = 0; i < _config->numBodies; i++)
vector_float3 nrpos = generate_random_normalized_vector(-1.0, 1.0, 1.0);
vector_float3 rpos = generate_random_vector(0.0, 1.0);
vector_float3 position = nrpos * (inner + (length * rpos));
positions[i].xyz = position;
positions[i].w = 1.0;
vector_float3 axis = {0.0, 0.0, 1.0};
float scalar = vector_dot(nrpos, axis);
if((1.0f - scalar) < 1e-6)
axis.xy = nrpos.yx;
axis = vector_normalize(axis);
vector_float3 velocity = vector_cross(position, axis);
velocities[i].xyz = velocity * vscale;
NSRange fullRange;
fullRange = NSMakeRange(0, _positions[_oldBufferIndex].length);
[_positions[_oldBufferIndex] didModifyRange:fullRange];
fullRange = NSMakeRange(0, _velocities[_oldBufferIndex].length);
[_velocities[_oldBufferIndex] didModifyRange:fullRange];
/// Set simulation data for a simulation that was begun elsewhere (i.e. on another device)
- (void)setPositionData:(nonnull NSData *)positionData
velocityData:(nonnull NSData *)velocityData
_oldBufferIndex = 0;
_newBufferIndex = 1;
vector_float4 *positions = (vector_float4 *) _positions[_oldBufferIndex].contents;
vector_float4 *velocities = (vector_float4 *) _velocities[_oldBufferIndex].contents;
assert(_positions[_oldBufferIndex].length == positionData.length);
assert(_velocities[_oldBufferIndex].length == velocityData.length);
memcpy(positions, positionData.bytes, positionData.length);
memcpy(velocities, velocityData.bytes, velocityData.length);
NSRange fullRange;
fullRange = NSMakeRange(0, _positions[_oldBufferIndex].length);
[_positions[_oldBufferIndex] didModifyRange:fullRange];
fullRange = NSMakeRange(0, _velocities[_oldBufferIndex].length);
[_velocities[_oldBufferIndex] didModifyRange:fullRange];
_simulationTime = simulationTime;
/// Blit a subset of the positions data for this frame and provide them to the client
/// to show a summary of the simulation's progress
- (void)fillUpdateBufferWithPositionBuffer:(nonnull id<MTLBuffer>)buffer
usingCommandBuffer:(nonnull id<MTLCommandBuffer>)commandBuffer
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
blitEncoder.label = @"Position Update Blit Encoder";
[blitEncoder pushDebugGroup:@"Position Update Blit Commands"];
[blitEncoder copyFromBuffer:buffer
[blitEncoder popDebugGroup];
[blitEncoder endEncoding];
/// Blit all positions and velocities and provide them to the client either to show final results
/// or continue the simulation on another device
- (void)provideFullData:(nonnull AAPLFullDatasetProvider)dataProvider
NSUInteger positionDataSize = _positions[_oldBufferIndex].length;
NSUInteger velocityDataSize = _velocities[_oldBufferIndex].length;
void *positionDataAddress = NULL;
void *velocityDataAddress = NULL;
// Create buffers to transfer data to client
// Use vm allocate to allocate buffer on page aligned address
kern_return_t err;
err = vm_allocate((vm_map_t)mach_task_self(),
assert(err == KERN_SUCCESS);
err = vm_allocate((vm_map_t)mach_task_self(),
assert(err == KERN_SUCCESS);
// Blit positions and velocities to a buffer for transfer
id<MTLBuffer> positionBuffer = [_device newBufferWithBytesNoCopy:positionDataAddress
positionBuffer.label = @"Final Positions Buffer";
id<MTLBuffer> velocityBuffer = [_device newBufferWithBytesNoCopy:velocityDataAddress
velocityBuffer.label = @"Final Velocities Buffer";
id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
commandBuffer.label = @"Full Transfer Command Buffer";
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
blitEncoder.label = @"Full Transfer Blits";
[blitEncoder pushDebugGroup:@"Full Position Data Blit"];
[blitEncoder copyFromBuffer:_positions[_oldBufferIndex]
[blitEncoder popDebugGroup];
[blitEncoder pushDebugGroup:@"Full Velocity Data Blit"];
[blitEncoder copyFromBuffer:_velocities[_oldBufferIndex]
[blitEncoder popDebugGroup];
[blitEncoder endEncoding];
[commandBuffer commit];
// Ensure blit of data is complete before providing the data to the client
[commandBuffer waitUntilCompleted];
// Wrap the memory allocated with vm_allocate with a NSData object which will allow the app to
// rely on ObjC ARC (or even MMR) to manage the memory's lifetime. Initialize NSData object
// with a deallocation block to free the vm_allocated memory when the object has been
// deallocated
// Block to dealloc memory created with vm_allocate
void (^deallocProvidedAddress)(void *bytes, NSUInteger length) =
^(void *bytes, NSUInteger length)
NSData *positionData = [[NSData alloc] initWithBytesNoCopy:positionDataAddress
NSData *velocityData = [[NSData alloc] initWithBytesNoCopy:velocityDataAddress
dataProvider(positionData, velocityData, time);
You define the deallocation block (or even a named function) much as it's done in Obj-C, though some casting is needed. The Obj-C deallocator block becomes the following closure in Swift:
let deallocProvidedAddress = {
    (_ bytes: UnsafeMutableRawPointer, _ length: Int) -> Void in
    vm_deallocate(mach_task_self_, vm_offset_t(bitPattern: bytes), vm_size_t(length))
}
Then, instead of .none for the deallocator parameter of Data(bytesNoCopy:count:deallocator:), you pass .custom(deallocProvidedAddress).
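// Pass the allocated pointer itself. Writing &positionDataAddress would
// hand Data the address of the variable, not of the allocation.
let positionData = Data(
    bytesNoCopy: positionDataAddress!,
    count: positionDataSize,
    deallocator: .custom(deallocProvidedAddress))
let velocityData = Data(
    bytesNoCopy: velocityDataAddress!,
    count: velocityDataSize,
    deallocator: .custom(deallocProvidedAddress))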
dataProvider(positionData, velocityData, time)
However, since you don't call vm_allocate, but instead use posix_memalign, you'd need to call free instead of vm_deallocate in deallocProvidedAddress:
let deallocProvidedAddress = {
    (_ bytes: UnsafeMutableRawPointer, _ length: Int) -> Void in
    free(bytes)
}
How did I know to use free? Having never actually used posix_memalign myself, I just did man posix_memalign in Terminal, and it says, among other things:
Memory that is allocated via posix_memalign() can be used as an argument in subsequent calls to realloc(3), reallocf(3), and free(3).
So free is the appropriate way to deallocate memory allocated via posix_memalign
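To make the pairing concrete, here is a minimal, hedged sketch that allocates with posix_memalign and immediately wraps the memory in a Data whose custom deallocator calls free. The helper name makeAlignedData and the default alignment of 0x4000 (taken from the question's code) are assumptions for illustration; no Metal code is involved.

```swift
import Foundation

/// Hypothetical helper: allocates `count` bytes on an `alignment` boundary
/// and returns a Data that frees the memory when it is deallocated.
func makeAlignedData(count: Int, alignment: Int = 0x4000) -> Data? {
    // Round the allocation size up to a multiple of the alignment.
    let allocationSize = (count + alignment - 1) & ~(alignment - 1)

    var address: UnsafeMutableRawPointer? = nil
    // posix_memalign returns 0 on success and stores the pointer in `address`.
    guard posix_memalign(&address, alignment, allocationSize) == 0,
          let bytes = address else {
        return nil
    }
    memset(bytes, 0, allocationSize) // start from zeroed memory

    // Memory from posix_memalign is released with free().
    return Data(
        bytesNoCopy: bytes,
        count: count,
        deallocator: .custom({ pointer, _ in free(pointer) }))
}
```

Because the pointer is handed to Data right after a successful allocation, the caller never has to release it manually. Data.Deallocator also has a built-in .free case that should serve the same purpose as the custom closure here.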
This is my translation of the Obj-C version of provideFullData into Swift. It uses vm_allocate and vm_deallocate since that's what the Obj-C version does, but you can easily replace that with posix_memalign and free, if you like:
/// Blit all positions and velocities and provide them to the client either to show final results
/// or continue the simulation on another device
func provide(fullData dataProvider: AAPLFullDatasetProvider, forSimulationTime time: CFAbsoluteTime)
let positionDataSize = positions[oldBufferIndex]!.length
let velocityDataSize = velocities[oldBufferIndex]!.length
func vm_alloc(count: Int) -> UnsafeMutableRawPointer?
var address: vm_address_t = 0
let err = vm_allocate(mach_task_self_, &address, vm_size_t(count), VM_FLAGS_ANYWHERE)
return err == KERN_SUCCESS
? UnsafeMutableRawPointer(bitPattern: address)
: nil
func makeMTLBuffer(
from bytes: UnsafeMutableRawPointer,
count: Int,
labeled label: String) -> MTLBuffer?
guard let buffer = device.makeBuffer(
bytesNoCopy: bytes,
length: count,
options: [.storageModeShared],
deallocator: nil)
else { return nil }
buffer.label = label
return buffer
guard let positionDataAddress = vm_alloc(count: positionDataSize) else {
fatalError("failed to allocate position data")
guard let velocityDataAddress = vm_alloc(count: velocityDataSize) else {
fatalError("failed to allocate velocity data")
// Blit positions and velocities to a buffer for transfer
guard let positionBuffer = makeMTLBuffer(
from: positionDataAddress,
count: positionDataSize,
labeled: "Final Positions Buffer")
else { fatalError("Failed to allocate positions MTLBuffer") }
guard let velocityBuffer = makeMTLBuffer(
from: velocityDataAddress,
count: velocityDataSize,
labeled: "Final Velocities Buffer")
else { fatalError("Failed to allocate velocities MTLBuffer") }
guard let commandBuffer = commandQueue.makeCommandBuffer() else {
fatalError("Failed to make commandBuffer")
commandBuffer.label = "Full Transfer Command Buffer"
guard let blitEncoder = commandBuffer.makeBlitCommandEncoder() else {
fatalError("Failed to make blitEncoder")
blitEncoder.label = "Full Transfer Blits"
blitEncoder.pushDebugGroup("Full Position Data Blit")
blitEncoder.copy(
    from: positions[oldBufferIndex]!,
    sourceOffset: 0,
    to: positionBuffer,
    destinationOffset: 0,
    size: positionBuffer.length
)
blitEncoder.popDebugGroup()
blitEncoder.pushDebugGroup("Full Velocity Data Blit")
blitEncoder.copy(
    from: velocities[oldBufferIndex]!,
    sourceOffset: 0,
    to: velocityBuffer,
    destinationOffset: 0,
    size: velocityBuffer.length
)
blitEncoder.popDebugGroup()
blitEncoder.endEncoding()
commandBuffer.commit()
// Ensure blit of data is complete before providing the data to the client
commandBuffer.waitUntilCompleted()
// Wrap the memory allocated with vm_allocate with a NSData object which will allow the app to
// rely on ObjC ARC (or even MMR) to manage the memory's lifetime. Initialize NSData object
// with a deallocation block to free the vm_allocated memory when the object has been
// deallocated
// Block to dealloc memory created with vm_allocate
let deallocProvidedAddress =
{ (_ bytes: UnsafeMutableRawPointer, _ length: Int) -> Void in
    vm_deallocate(
        mach_task_self_,
        vm_offset_t(bitPattern: bytes),
        vm_size_t(length)
    )
}
let positionData = Data(
bytesNoCopy: positionDataAddress,
count: positionDataSize,
deallocator: .custom(deallocProvidedAddress))
let velocityData = Data(
bytesNoCopy: velocityDataAddress,
count: velocityDataSize,
deallocator: .custom(deallocProvidedAddress))
dataProvider(positionData, velocityData, time)
I see lots of opportunities for refactoring here (I already did a little). If you do something other than fatalError in the "sad" path, don't forget that you need to deallocate positionDataAddress and velocityDataAddress before returning or throwing. I would at least refactor it so that each Data instance is made immediately after its successful vm_allocate/posix_memalign instead of at the very end of the method; that way, cleanup happens automatically in case of errors. I'd also extract all the Metal blit code into its own function.
Refactored version
I was originally going to let the above version stand as is, but it cries out for reorganization, so I refactored it as I suggested above, plus a bit more.
For convenience, I created an extension on MTLBlitCommandEncoder to encode a copy from an MTLBuffer to Data:
fileprivate extension MTLBlitCommandEncoder
func encodeCopy(
from src: MTLBuffer,
to dst: MTLBuffer,
dstName: @autoclosure () -> String)
pushDebugGroup("Full \(dstName()) Data Blit")
defer { popDebugGroup() }
from: src, sourceOffset: 0,
to: dst, destinationOffset: 0,
size: dst.length
func encodeCopy(
from src: MTLBuffer,
to dst: inout Data,
dstName: @autoclosure () -> String)
guard let buffer = device.makeBuffer(
bytesNoCopy: $0.baseAddress!,
length: $0.count,
options: [.storageModeShared],
deallocator: nil)
else { fatalError("Failed to allocate MTLBuffer for \(dstName())") }
buffer.label = "\(dstName()) Buffer"
encodeCopy(from: src, to: buffer, dstName: dstName())
I moved the nested functions to fileprivate methods and changed the custom deallocator from a closure to a static method, renaming it to vm_dealloc:
fileprivate static func vm_dealloc(
    _ bytes: UnsafeMutableRawPointer,
    _ length: Int)
{
    vm_deallocate(
        mach_task_self_,
        vm_offset_t(bitPattern: bytes),
        vm_size_t(length)
    )
}
fileprivate func vm_alloc(count: Int) -> UnsafeMutableRawPointer?
var address: vm_address_t = 0
let err = vm_allocate(mach_task_self_, &address, vm_size_t(count), VM_FLAGS_ANYWHERE)
return err == KERN_SUCCESS
? UnsafeMutableRawPointer(bitPattern: address)
: nil
Since the pointer will be stored in an instance of Data anyway, and Data can handle cleanup automatically, I wrote vmAllocData(count:) to allocate the memory and immediately put it in a Data. The calling code doesn't need to worry about the underlying pointer anymore.
fileprivate func vmAllocData(count: Int) -> Data?
guard let ptr = vm_alloc(count: count) else {
return nil
return Data(
bytesNoCopy: ptr,
count: count,
deallocator: .custom(Self.vm_dealloc)
Then I moved the Metal code to a copy(positionsInto:andVelocitiesInto:) method. Some would quibble with the "and" in the name because it says the method does more than one thing, and it does. But using the same MTLBlitCommandEncoder to encode both copies is more efficient, and the alternative, creating the encoder separately and passing it in, would spread the Metal code out more than necessary. I think in this case it's OK to do more than one thing for the sake of efficiency and keeping the Metal code in one place. Anyway, this function uses encodeCopy from the extension above:
fileprivate func copy(
positionsInto positionData: inout Data,
andVelocitiesInto velocityData: inout Data)
guard let commandBuffer = commandQueue.makeCommandBuffer() else {
fatalError("Failed to make commandBuffer")
commandBuffer.label = "Full Transfer Command Buffer"
guard let blitEncoder = commandBuffer.makeBlitCommandEncoder() else {
fatalError("Failed to make blitEncoder")
blitEncoder.label = "Full Transfer Blits"
guard let positionSrc = positions[oldBufferIndex] else {
    fatalError("positions[\(oldBufferIndex)] is nil!")
}
blitEncoder.encodeCopy(
    from: positionSrc,
    to: &positionData,
    dstName: "Positions"
)
guard let velocitySrc = velocities[oldBufferIndex] else {
    fatalError("velocities[\(oldBufferIndex)] is nil!")
}
blitEncoder.encodeCopy(
    from: velocitySrc,
    to: &velocityData,
    dstName: "Velocity"
)
blitEncoder.endEncoding()
commandBuffer.commit()
// Ensure blit of data is complete before providing the data to the client
commandBuffer.waitUntilCompleted()
Then, finally, provide(fullData:forSimulationTime:) becomes:
func provide(fullData dataProvider: AAPLFullDatasetProvider, forSimulationTime time: CFAbsoluteTime)
let positionDataSize = positions[oldBufferIndex]!.length
let velocityDataSize = velocities[oldBufferIndex]!.length
guard var positionData = vmAllocData(count: positionDataSize) else {
fatalError("failed to allocate position data")
guard var velocityData = vmAllocData(count: velocityDataSize) else {
fatalError("failed to allocate velocity data")
copy(positionsInto: &positionData, andVelocitiesInto: &velocityData)
dataProvider(positionData, velocityData, time)

Implementing AVVideoCompositing causes video rotation problems

I am using Apple's example and have some issues with video transformation.
If the source assets have a preferredTransform other than identity, the output video has incorrectly rotated frames. This can be fixed when AVMutableVideoComposition has no value in its customVideoCompositorClass property and the AVMutableVideoCompositionLayerInstruction's transform is set to asset.preferredTransform. But because I am using a custom video compositor that adopts the AVVideoCompositing protocol, I can't use the standard video compositing instructions.
How can I pre-transform the input asset tracks before their CVPixelBuffers are passed into the Metal shaders? Or is there any other way to fix it?
Fragment of original code:
func buildCompositionObjectsForPlayback(_ forPlayback: Bool, overwriteExistingObjects: Bool) {
// Proceed only if the composition objects have not already been created.
if self.composition != nil && !overwriteExistingObjects { return }
if self.videoComposition != nil && !overwriteExistingObjects { return }
guard !clips.isEmpty else { return }
// Use the naturalSize of the first video track.
let videoTracks = clips[0].tracks(withMediaType:
let videoSize = videoTracks[0].naturalSize
let composition = AVMutableComposition()
composition.naturalSize = videoSize
With transitions:
Place clips into alternating video & audio tracks in composition, overlapped by transitionDuration.
Set up the video composition to cycle between "pass through A", "transition from A to B", "pass through B".
let videoComposition = AVMutableVideoComposition()
if self.transitionType == TransitionType.diagonalWipe.rawValue {
videoComposition.customVideoCompositorClass = APLDiagonalWipeCompositor.self
} else {
videoComposition.customVideoCompositorClass = APLCrossDissolveCompositor.self
// Every videoComposition needs these properties to be set:
videoComposition.frameDuration = CMTimeMake(1, 30) // 30 fps.
videoComposition.renderSize = videoSize
buildTransitionComposition(composition, andVideoComposition: videoComposition)
self.composition = composition
self.videoComposition = videoComposition
I worked around the transformation like this:
private func makeTransformedPixelBuffer(fromBuffer buffer: CVPixelBuffer, withTransform transform: CGAffineTransform) -> CVPixelBuffer? {
guard let newBuffer = renderContext?.newPixelBuffer() else {
return nil
// Correct transformation example I took from
var preferredTransform = transform
preferredTransform.b *= -1
preferredTransform.c *= -1
var transformedImage = CIImage(cvPixelBuffer: buffer).transformed(by: preferredTransform)
preferredTransform = CGAffineTransform(translationX: -transformedImage.extent.origin.x, y: -transformedImage.extent.origin.y)
transformedImage = transformedImage.transformed(by: preferredTransform)
let filterContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!)
filterContext.render(transformedImage, to: newBuffer)
return newBuffer
But I wonder whether there is a more memory-efficient way that avoids creating new pixel buffers.
How can I pre-transform the input asset tracks before their CVPixelBuffers
are passed into the Metal shaders?
The best way to achieve maximum performance is to transform your video frame directly in the shader. You just need to add a rotation matrix to your vertex shader.
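As a rough sketch of the CPU side of that idea, the code below packs the six components of a 2D affine transform (the same a, b, c, d, tx, ty layout that CGAffineTransform uses) into a column-major 3x3 matrix that could be uploaded to a vertex shader as a uniform. The names Affine2D, columnMajorMatrix and transformPoint are hypothetical, and the shader itself is not shown; transformPoint only mirrors on the CPU what the shader would compute per vertex.

```swift
/// Hypothetical plain-Swift mirror of CGAffineTransform's components.
struct Affine2D {
    var a, b, c, d, tx, ty: Float
}

/// Packs the transform into a column-major 3x3 matrix (9 floats),
/// the usual layout for a matrix uniform.
func columnMajorMatrix(for t: Affine2D) -> [Float] {
    return [t.a,  t.b,  0,   // column 0
            t.c,  t.d,  0,   // column 1
            t.tx, t.ty, 1]   // column 2
}

/// Applies the matrix to the point (x, y, 1), as a vertex shader would.
func transformPoint(_ m: [Float], x: Float, y: Float) -> (x: Float, y: Float) {
    return (x: m[0] * x + m[3] * y + m[6],
            y: m[1] * x + m[4] * y + m[7])
}

// Usage: a 90-degree rotation maps (1, 0) to (0, 1).
let quarterTurn = Affine2D(a: 0, b: 1, c: -1, d: 0, tx: 0, ty: 0)
let rotated = transformPoint(columnMajorMatrix(for: quarterTurn), x: 1, y: 0)
```

Doing the multiplication in the vertex shader avoids rendering into an intermediate pixel buffer for every frame.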

Does Swift manage the memory of a CVPixelBuffer that I create with CVPixelBufferCreate?

Let's say I want to store a frame from the camera output:
let imageBuffer:CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
And here is how the copy function is defined in an extension on CVPixelBuffer:
extension CVPixelBuffer {
func copy() -> CVPixelBuffer {
precondition(CFGetTypeID(self) == CVPixelBufferGetTypeID(), "copy() cannot be called on a non-CVPixelBuffer")
var _copy : CVPixelBuffer?
CVBufferGetAttachments(self, CVAttachmentMode.shouldPropagate),
guard let copy = _copy else { fatalError() }
CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
CVPixelBufferLockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
let dest = CVPixelBufferGetBaseAddress(copy)
let source = CVPixelBufferGetBaseAddress(self)
let height = CVPixelBufferGetHeight(self)
let bytesPerRow = CVPixelBufferGetBytesPerRow(self)
memcpy(dest, source, height * bytesPerRow)
CVPixelBufferUnlockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
return copy
The question is: do I need to explicitly manage the CVPixelBuffer copy I created, or does Swift take care of it through reference counting?
Swift manages your buffer object, so you don't have to worry about releasing it.
Core Foundation objects returned from annotated APIs are automatically memory-managed in Swift—you don't need to invoke the CFRetain, CFRelease, or CFAutorelease functions yourself.
In fact, there is no Swift version of the CVPixelBufferRelease function.
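The same reference-counting behaviour can be observed with an ordinary Swift class. The sketch below is only an analogy: TrackedBuffer is a hypothetical stand-in for a CVPixelBuffer, used because its deinit makes the release visible without any Core Video code.

```swift
/// Hypothetical stand-in for a buffer object; reports when it is released.
final class TrackedBuffer {
    private let onRelease: () -> Void

    init(onRelease: @escaping () -> Void) {
        self.onRelease = onRelease
    }

    deinit {
        // Runs when the last strong reference goes away.
        onRelease()
    }
}

/// Creates a buffer in an inner scope and returns how many buffers
/// had been released by the time that scope was left.
func releasedCountAfterScope() -> Int {
    var released = 0
    do {
        let buffer = TrackedBuffer(onRelease: { released += 1 })
        _ = buffer // no explicit release call is needed or available
    }
    return released
}
```

No manual release call appears anywhere; the object is destroyed once nothing references it any more.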

Why is an iPhone XS getting worse CPU performance when using the camera live than an iPhone 6S Plus?

I'm using live camera output to update a CIImage on an MTKView. My main issue is a large performance difference: an older iPhone gets better CPU performance than a newer one, even though every setting I've come across is the same on both.
This is a lengthy post, but I decided to include these details since they could be important to the cause of this problem. Please let me know what else I can include.
Below, I have my captureOutput function with two debug bools that I can turn on and off while running. I used this to try to determine the cause of my issue.
applyLiveFilter - bool whether or not to manipulate the CIImage with a CIFilter.
updateMetalView - bool whether or not to update the MTKView's CIImage.
// live output from camera
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection){
// Create CIImage from camera.
// Here I save a few percent of CPU by using a function
// to convert a sampleBuffer to a Metal texture, but
// whether I use this or the commented out code
// (without captureOutputMTLOptions) does not have
// significant impact.
guard let texture:MTLTexture = convertToMTLTexture(sampleBuffer: sampleBuffer) else{
var cameraImage:CIImage = CIImage(mtlTexture: texture, options: captureOutputMTLOptions)!
var transform: CGAffineTransform = .identity
transform = transform.scaledBy(x: 1, y: -1)
transform = transform.translatedBy(x: 0, y: -cameraImage.extent.height)
cameraImage = cameraImage.transformed(by: transform)
// old non-Metal way of getting the ciimage from the cvPixelBuffer
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else
var cameraImage:CIImage = CIImage(cvPixelBuffer: pixelBuffer)
var orientation = UIImage.Orientation.right
orientation = UIImage.Orientation.leftMirrored
// apply filter to camera image
if debug_applyLiveFilter {
cameraImage = self.applyFilterAndReturnImage(ciImage: cameraImage, orientation: orientation, currentCameraRes:currentCameraRes!)
if debug_updateMetalView {
self.MTLCaptureView!.image = cameraImage
Below is a chart of results between both phones toggling the different combinations of bools discussed above:
Even without updating the Metal view's CIImage and with no filters applied, the iPhone XS's CPU usage is 2% higher than the iPhone 6S Plus's. That isn't a significant overhead, but it makes me suspect that the camera captures differently on the two devices.
My AVCaptureSession's preset is set identically on both phones.
The CIImage created from captureOutput is the same size (extent) on both phones.
Are there any settings I need to set manually on these two phones' AVCaptureDevices, including activeFormat properties, to make them the same between devices?
The settings I have now are:
if let captureDevice = AVCaptureDevice.default( {
do {
try captureDevice.lockForConfiguration()
captureDevice.isSubjectAreaChangeMonitoringEnabled = true
captureDevice.focusMode = AVCaptureDevice.FocusMode.continuousAutoFocus
captureDevice.exposureMode = AVCaptureDevice.ExposureMode.continuousAutoExposure
} catch {
// Handle errors here
print("There was an error focusing the device's camera")
My MTKView is based on code written by Simon Gladman, with some edits for performance and to scale the render before it is scaled up to the width of the screen using Core Animation, as suggested by Apple.
class MetalImageView: MTKView
let colorSpace = CGColorSpaceCreateDeviceRGB()
var textureCache: CVMetalTextureCache?
var sourceTexture: MTLTexture!
lazy var commandQueue: MTLCommandQueue =
[unowned self] in
return self.device!.makeCommandQueue()
lazy var ciContext: CIContext =
[unowned self] in
return CIContext(mtlDevice: self.device!)
override init(frame frameRect: CGRect, device: MTLDevice?)
super.init(frame: frameRect,
device: device ?? MTLCreateSystemDefaultDevice())
if super.device == nil
fatalError("Device doesn't support Metal")
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, self.device!, nil, &textureCache)
framebufferOnly = false
enableSetNeedsDisplay = true
isPaused = true
preferredFramesPerSecond = 30
required init(coder: NSCoder)
fatalError("init(coder:) has not been implemented")
// The image to display
var image: CIImage?
override func draw(_ rect: CGRect)
guard var
image = image,
let targetTexture:MTLTexture = currentDrawable?.texture else
let commandBuffer = commandQueue.makeCommandBuffer()
let customDrawableSize:CGSize = drawableSize
let bounds = CGRect(origin:, size: customDrawableSize)
let originX = image.extent.origin.x
let originY = image.extent.origin.y
let scaleX = customDrawableSize.width / image.extent.width
let scaleY = customDrawableSize.height / image.extent.height
let scale = min(scaleX*IVScaleFactor, scaleY*IVScaleFactor)
image = image
.transformed(by: CGAffineTransform(translationX: -originX, y: -originY))
.transformed(by: CGAffineTransform(scaleX: scale, y: scale))
to: targetTexture,
commandBuffer: commandBuffer,
bounds: bounds,
colorSpace: colorSpace)
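The scale computation in draw(_:) above is an aspect-fit scale multiplied by a reduction factor. A minimal sketch of just that arithmetic, using plain Doubles so it runs outside of MetalKit; the function name aspectFitScale and the scaleFactor parameter are hypothetical, with scaleFactor standing in for IVScaleFactor, whose value is not shown here:

```swift
// Hypothetical helper mirroring the scale arithmetic in draw(_:).
// `scaleFactor` stands in for IVScaleFactor (value assumed, not from the post).
func aspectFitScale(imageWidth: Double, imageHeight: Double,
                    drawableWidth: Double, drawableHeight: Double,
                    scaleFactor: Double) -> Double {
    let scaleX = drawableWidth / imageWidth
    let scaleY = drawableHeight / imageHeight
    // For a positive factor, min(scaleX * f, scaleY * f) equals min(scaleX, scaleY) * f.
    return min(scaleX, scaleY) * scaleFactor
}

// Example with made-up sizes: a 1920x1080 image into a 960x960 drawable at half scale.
let exampleScale = aspectFitScale(imageWidth: 1920, imageHeight: 1080,
                                  drawableWidth: 960, drawableHeight: 960,
                                  scaleFactor: 0.5)
print(exampleScale) // 0.25
```

The smaller of the two axis ratios is used so the whole image fits inside the drawable without cropping.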
My AVCaptureSession (captureSession) and AVCaptureVideoDataOutput (videoOutput) are set up below:
func setupCameraAndMic(){
let backCamera = AVCaptureDevice.default(
var error: NSError?
var videoInput: AVCaptureDeviceInput!
do {
videoInput = try AVCaptureDeviceInput(device: backCamera!)
} catch let error1 as NSError {
error = error1
videoInput = nil
if error == nil &&
captureSession!.canAddInput(videoInput) {
guard CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, MetalDevice, nil, &textureCache) == kCVReturnSuccess else {
print("Error: could not create a texture cache")
stillImageOutput = AVCapturePhotoOutput()
if captureSession!.canAddOutput(stillImageOutput!) {
let q = DispatchQueue(label: "sample buffer delegate", qos: .default)
videoOutput.setSampleBufferDelegate(self, queue: q)
videoOutput.videoSettings = [
kCVPixelBufferPixelFormatTypeKey as AnyHashable as! String: NSNumber(value: kCVPixelFormatType_32BGRA),
kCVPixelBufferMetalCompatibilityKey as String: true
videoOutput.alwaysDiscardsLateVideoFrames = true
if captureSession!.canAddOutput(videoOutput){
The video and mic are recorded on two separate streams. Details on the microphone and recording video have been left out since my focus is performance of live camera output.
UPDATE - I have a simplified test project on GitHub that makes it a lot easier to test the problem I'm having:
Off the top of my head, you are not comparing pears with pears. Even setting aside the 2.49 GHz A12 against the 1.85 GHz A9, the differences between the cameras are also huge: even if you use them with the same parameters, several features of the XS's camera require more CPU resources (dual camera, stabilization, Smart HDR, etc.).
Sorry about the lack of sources. I tried to find metrics for the CPU cost of those features, but I couldn't find any. Unfortunately for your needs, that information is not relevant to marketing when they are selling it as the best camera ever in a smartphone.
They are selling it as the best processor as well. We don't know what would happen if the XS camera were used with an A9 processor; it would probably crash, but we will never know...
PS: Are your metrics for the whole processor or for the core in use? For the whole processor, you also need to consider other tasks that the devices may be executing. Per core, it is 21% of 200% against 39% of 600%.
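To make that last comparison concrete, here is a minimal sketch of the normalization. It assumes the readings follow the convention where each core contributes 100% (so a 2-core device tops out at 200% and a 6-core device at 600%); the function name shareOfTotalCPU is hypothetical:

```swift
// Converts a CPU reading, where each core contributes 100%,
// into a percentage of the device's total CPU capacity.
// Assumption: the reading uses the "100% per core" convention.
func shareOfTotalCPU(reportedPercent: Double, coreCount: Int) -> Double {
    precondition(coreCount > 0, "coreCount must be positive")
    return reportedPercent / Double(coreCount)
}

// 21% out of a 200% maximum (2 cores)
print(shareOfTotalCPU(reportedPercent: 21, coreCount: 2)) // 10.5
// 39% out of a 600% maximum (6 cores)
print(shareOfTotalCPU(reportedPercent: 39, coreCount: 6)) // 6.5
```

Under that assumption the larger raw number is actually the smaller share of the total capacity, which is why the two readings should not be compared directly.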