How to memory manage a CMSampleBuffer - Swift

I'm getting frames from my camera in the following way:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let imageBuffer: CVImageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
}
From the Apple documentation ...
If you need to reference the CMSampleBuffer object outside of the scope of this method, you must CFRetain it and then CFRelease it when you are finished with it.
To maintain optimal performance, some sample buffers directly reference pools of memory that may need to be reused by the device system and other capture inputs. This is frequently the case for uncompressed device native capture where memory blocks are copied as little as possible. If multiple sample buffers reference such pools of memory for too long, inputs will no longer be able to copy new samples into memory and those samples will be dropped.
Is it okay to hold a reference to CVImageBuffer without explicitly setting sampleBuffer = nil? I only ask because the latest version of Swift automatically memory manages CF data structures so CFRetain and CFRelease are not available.
Also, what is the reasoning behind "This is frequently the case for uncompressed device native capture where memory blocks are copied as little as possible." ? Why would a memory block be copied in the first place?

Is it okay to hold a reference to CVImageBuffer without explicitly setting sampleBuffer = nil?
If you're going to keep a reference to the image buffer, then keeping a reference to its "containing" CMSampleBuffer definitely cannot hurt. Will the "right thing" be done if you keep a reference to the CVImageBuffer but not the CMSampleBuffer? Maybe.
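For what it's worth, in Swift "keeping a reference" is just an ordinary assignment: ARC inserts the CFRetain/CFRelease calls for CoreMedia types for you. A minimal sketch, assuming you stash the frame in a (hypothetical) property:

var latestSampleBuffer: CMSampleBuffer?  // hypothetical property, not from the question

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Assigning retains the CMSampleBuffer, which in turn keeps its backing
    // CVImageBuffer (and the pool slot behind it) alive.
    latestSampleBuffer = sampleBuffer
}

// Elsewhere, once processing is finished, release the pool slot promptly:
// latestSampleBuffer = nil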
Also, what is the reasoning behind "This is frequently the case for uncompressed device native capture where memory blocks are copied as little as possible." ? Why would a memory block be copied in the first place?
There are questions on SO about how to do a deep copy on an image CMSampleBuffer, and the answers are not straightforward, so the chances of unintentionally copying one's memory block are very low. I think the intention of this documentation is to inform you that AVCaptureVideoDataOutput is efficient, and that this efficiency (via fixed-size frame pools) can have the surprising side effect of dropped frames if you hang onto too many CMSampleBuffers for too long, so don't do that.
The warning is slightly redundant, however, because even without the spectre of dropped frames, uncompressed video CMSampleBuffers are already a VERY hot potato due to their size and frequency. You only need to reference a few seconds' worth to use up gigabytes of RAM, so it is imperative to process them as quickly as possible and then release/nil any references to them.
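If the pixels genuinely need to outlive the callback, the usual escape hatch is to copy them out of the pool-backed buffer immediately so the original can be released. A hedged sketch, assuming a non-planar pixel format such as BGRA (planar formats like NV12 would need the per-plane CVPixelBuffer APIs instead):

import CoreVideo
import Foundation

func copyPixelBuffer(_ src: CVPixelBuffer) -> CVPixelBuffer? {
    var dst: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     CVPixelBufferGetWidth(src),
                                     CVPixelBufferGetHeight(src),
                                     CVPixelBufferGetPixelFormatType(src),
                                     nil,
                                     &dst)
    guard status == kCVReturnSuccess, let copy = dst else { return nil }

    CVPixelBufferLockBaseAddress(src, .readOnly)
    CVPixelBufferLockBaseAddress(copy, [])
    defer {
        CVPixelBufferUnlockBaseAddress(copy, [])
        CVPixelBufferUnlockBaseAddress(src, .readOnly)
    }
    guard let srcBase = CVPixelBufferGetBaseAddress(src),
          let dstBase = CVPixelBufferGetBaseAddress(copy) else { return nil }

    // Copy row by row, because the two buffers may use different bytes-per-row.
    let srcRowBytes = CVPixelBufferGetBytesPerRow(src)
    let dstRowBytes = CVPixelBufferGetBytesPerRow(copy)
    for row in 0..<CVPixelBufferGetHeight(src) {
        memcpy(dstBase + row * dstRowBytes,
               srcBase + row * srcRowBytes,
               min(srcRowBytes, dstRowBytes))
    }
    return copy
}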

Related

Memory leak in Swift when binding memory

I've come across a memory leak in Swift on the Mac. I'm creating buffers for a calculation on the GPU using Metal. The storage created for these is automatically deleted when they go out of scope, UNLESS I bind the contents to memory.
In this case, the memory is not deleted even when both the buffer and the bound pointer are out of scope.
I tried manually deallocating the buffer, but this fails since the memory was not allocated using malloc.
Is there a way to manage this memory to avoid a leak, or is this a bug in Swift on the Mac?
Any other thoughts?
Thank you very much,
Colin
let intensityBuff = myGPUData.device?.makeBuffer(
    length: MemoryLayout<Float>.stride * Int(myStars.nstars * myStars.npatch * myStars.npatch),
    options: .storageModeShared)
// Note: bindMemory's capacity is a count of Float elements, not a byte count.
let intensityPointer = intensityBuff?.contents().bindMemory(
    to: Float.self,
    capacity: Int(myStars.nstars * myStars.npatch * myStars.npatch))
Metal buffers need to have the flag MTLPurgeableState.empty set to indicate they can be cleared out of memory after you are done using them. For example:
intensityBuff!.setPurgeableState(MTLPurgeableState.empty)
It turns out that the issue is that not all of Apple's code has been ported to Swift - the bindMemory command is actually written in Objective-C.
For these functions, ARC doesn't work automatically - instead you need to include the code within an autoreleasepool block e.g.
autoreleasepool {
    // code with the bindMemory statement in it
}
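For reference, a sketch of the pattern applied to the buffers from the question (the names come from the question itself; that bindMemory is the source of the autoreleased allocation is the poster's diagnosis, not something verified here):

autoreleasepool {
    let count = Int(myStars.nstars * myStars.npatch * myStars.npatch)
    guard let buff = myGPUData.device?.makeBuffer(
        length: MemoryLayout<Float>.stride * count,
        options: .storageModeShared) else { return }
    let intensityPointer = buff.contents().bindMemory(to: Float.self, capacity: count)
    // ... fill `intensityPointer` and dispatch the GPU work here ...
    // Any autoreleased objects created by the bridged calls are released
    // when the pool drains at the closing brace.
}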
I haven't tested out the MTLPurgeableState.empty approach suggested above by Nate (thank you for that suggestion), since the autoreleasepool works fine.
And of course, Apple doesn't tell you which of the functions are Objective-C based - at least not that I could find!

Swift - Risk in using autoreleasepool? CPU usage?

With the Xcode Profiler I have just spotted a not really necessary memory peak on JSON decoding. Apparently it's a known issue, and I should wrap the call in an autoreleasepool, which helped:
// extension...
var jsonData: Data? {
    return autoreleasepool { try? JSONSerialization.data(withJSONObject: self, options: []) }
}
I found another few big chunks of allocations that were not really needed so I applied my newly-learned trick to other code as well, such as the following:
var protoArray = [Proto_Bit]()
for bit in data {
    autoreleasepool {
        if let str = bit.toJSONString(),
           let proto = try? Proto_Bit(jsonString: str) {
            protoArray.append(proto)
        }
    }
}
Now, before I wrap every single instruction of my code (or at least wherever I see fit) in this autoreleasepool thing, I would like to ask if there are any risks or drawbacks associated with it.
With these two wraps I was able to reduce my peak memory consumption from 500 MB to 170 MB. I am aware that Swift also does these kinds of things behind the scenes and probably has some guards in place; however, I would rather be safe than sorry.
Does autoreleasepool come with a CPU overhead? If it is 5% I would be okay with that, since it sounds like a good tradeoff; if it's more, I would have to investigate.
Can I mess up anything using autoreleasepool? Null pointers, thread locking, etc., since the block structure looks a bit scary... or is this just telling the hardware "at the end of the bracket, clean up and close the door behind you" without affecting other objects?
Autorelease Pools are a mechanism which comes from Objective-C for helping automate memory management and ensure that objects and resources are released "eventually", where that "eventually" comes when the pool is drained. That is, an autorelease pool, once created on a thread, captures (retains) all objects which are -autoreleased while the pool is active — when the pool is drained, all of those objects are released. (Note that this is a Foundation feature in conjunction with the Objective-C runtime, and is not directly integrated with hardware: it's way, way higher-level than that.)
As a short-hand for managing autorelease pools directly (and avoiding creating NSAutoreleasePool instances directly), Objective-C introduced the @autoreleasepool language keyword, which effectively creates an autorelease pool at the beginning of the scope, and drains it at the end:
@autoreleasepool /* create an autorelease pool to capture autoreleased objects */ {
    // ... do stuff ...
} /* drain the autorelease pool, releasing all objects that were in it */
Introducing autorelease pools manually in this way grants you more control over when autoreleased objects are effectively cleaned up: if you know that a block of code creates many autoreleased objects that really don't need to outlive that block of code, that may be a good candidate for wrapping up in an @autoreleasepool.
Autorelease pools pre-date ARC, which automates reference counting in a deterministic way, and its introduction made autorelease pools largely unnecessary in most code: if an object can be deterministically retained and released, there's no need to rely on autoreleasing it "at some point". (And in fact, along with regular memory management calls like -retain and -release themselves, ARC will not allow you to call -autorelease on objects directly either.)
Swift, following the ARC memory management model, also does not rely on autoreleasing objects — all objects are deterministically released after their last usage. However: Swift does still need to interoperate with Objective-C code, and notably, not all Objective-C code (including a lot of code in, e.g., Foundation) uses ARC. Many internal Apple frameworks still use Objective-C's manual memory management, and thus still rely on autoreleased objects.
On platforms where Swift might need to interoperate with Objective-C code, no work needs to be explicitly done in order to allow autoreleased objects to eventually be released: every Swift application on Darwin platforms has at least one implicit autorelease pool at the root of the process which captures autoreleased objects. However, as you note: this "eventual" release of Objective-C objects might keep memory usage high until the pool is drained. To help alleviate that high memory usage, Swift has autoreleasepool { ... } (matching Objective-C's #autoreleasepool { ... }), which allows you to explicitly and eagerly capture those autoreleased objects, and free them at the end of the scope.
To answer your questions directly, but in reverse order:
Can I mess up anything using autoreleasepool? For correctly-written code, no. All you're doing is helping the Objective-C runtime clean up these objects a little bit earlier than it would otherwise. And it's critical to note: the objects will only be released by the pool — if their retain count is still positive after the pool releases them, they must still be in use somewhere, and will not be deallocated until that other owner holding on to the object also releases them.
Is it possible that the introduction of an autoreleasepool will cause some unexpected behavior to occur which didn't before? Absolutely. Incorrectly-written code could have accidentally worked due to the fact that an object was incidentally kept alive long enough to prevent unintentional behavior from occurring — and releasing the object sooner might trigger it. But, this is both unlikely (given the miniscule amount of actually manual memory management outside of Apple frameworks) and not something you can rely on: if the code misbehaves inside of a newly-introduced autoreleasepool, it wasn't correct to begin with, and could have backfired on you some other way.
Does autoreleasepool come with a CPU overhead? Yes, and it is likely vanishingly small compared to the actual work an application performs. But, that doesn't mean that sprinkling autoreleasepool all over the place will be useful:
Given the decreasing amount of autoreleased objects in a Swift project as increasing amounts of code transition away from Objective-C, it's becoming rarer to see large numbers of autoreleased objects which need to be eagerly cleaned up. You could sprinkle autoreleasepools everywhere, but it's entirely possible that those pools will be entirely empty, with nothing to clean up.
autoreleasepools don't affect native Swift allocations: only Objective-C objects can be autoreleased, which means that for a good portion of Swift code, autoreleasepools are entirely wasted.
So, when should you use autoreleasepools?
When you're working with code coming from Objective-C, which
You've measured to show that is contributing to high memory usage thanks to autoreleased objects, which
You've also measured are cleaned up appropriately by the introduction of an autoreleasepool
In other words, exactly what you've done here in your question. So, kudos.
However, try to avoid cargo-culting the insertion of autoreleasepools all over the place: it's highly unlikely to be effective without actual measurements and understanding what might be going on.
[An aside: how do you know when objects/code might be coming from Objective-C? You can't, very easily. A good rule of thumb is that many Apple frameworks are still written in Objective-C under the hood, or may at some layer return an Objective-C object bridged (or not) to Swift — so they may be a likely culprit to investigate if you've measured something actionable. 3rd-party libraries are also much less likely to contain Objective-C these days, but you may also have source access to them to confirm.]
Another note about optimizations and autoreleasepools: in general, you should not typically expect a Release configuration of a build to behave differently with regard to autoreleased objects as opposed to a Debug configuration.
Unlike ARC code (both in Swift and in Objective-C), where the compiler can insert memory management optimizations for code at compile time, autorelease pools are a runtime feature, and since any retain will necessarily keep an object instance alive, even a single insertion of an object into an autorelease pool will keep it alive until it is disposed of at runtime. So, even if the compiler can aggressively optimize the specific locations of retains and releases for most objects in a Release configuration, there's nothing to be done for an object that's autoreleased.
(Well, the ARC optimizer can do some amount of optimization around autoreleasing objects if it has enough visibility into all of the code using the object, the context of the autorelease pools it belongs to, etc., but this is usually very limited because the scope in which the object was originally -autoreleased is usually far from the scope in which the autorelease pool lives, by definition [otherwise it would be a candidate for regular memory management].)

High Memory Allocation debugging with Apple Instruments

I have an app written in Swift which works fine initially, but over time the app gets sluggish. I have opened an Instruments profiling session using the Allocations and Leaks profiles.
What I have found is that allocations increase dramatically while doing something that should only overwrite the current data.
The memory in question is in the group < non-object >
Opening this group gives hundreds of different allocations, with the responsible library in every case being libvDSP. So with this I can conclude it is a vDSP call that is not releasing the memory properly. However, double-clicking on any of these does not present me with any code, just raw output I do not understand.
The function that calls vDSP is wrapped like this:
func outOfPlaceComplexFourierTransform(
    setup: FFTSetup,
    resultSize: Int,
    logSize: UInt,
    direction: FourierTransformDirection) -> ComplexFloatArray {
    let result = ComplexFloatArray.zeros(count: resultSize)
    self.useAsDSPSplitComplex { selfPointer in
        result.useAsDSPSplitComplex { resultPointer in
            vDSP_fft_zop(
                setup,
                &selfPointer,
                ComplexFloatArray.strideSize,
                &resultPointer,
                ComplexFloatArray.strideSize,
                logSize,
                direction.rawValue)
        }
    }
    return result
}
This is called from another function:
var mags1 = ComplexFloatArray.zeros(count: measurement.windowedImpulse!.count)
mags1 = (measurement.windowedImpulse?.outOfPlaceComplexFourierTransform(setup: fftSetup, resultSize: mags1.count, logSize: UInt(logSize), direction: ComplexFloatArray.FourierTransformDirection(rawValue: 1)!))!
Within this function, mags1 is manipulated and overwrites an existing array. It was my understanding that mags1 would be deallocated once this function has finished, as it is only available inside this function.
This function is called many times per second at times. Any help would be appreciated, as what should only take 5 MB very quickly grows by two hundred megabytes in a couple of seconds.
Any pointers on how to further investigate the source of the leak, or on how to properly deallocate this memory once finished, would be appreciated.
I cannot believe I solved this so quickly after posting this. (I genuinely had several hours of pulling my hair out).
Not included in my code here, I was creating a new FFTSetup every time this was called. Obviously this is memory intensive, and it was not reusing this memory.
In Instruments, looking at the call tree, I was able to see the function utilising this memory.
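For anyone hitting the same wall, here is a hedged sketch of the fix described (the wrapper class is illustrative, not the poster's code): create the FFTSetup once, reuse it for every transform, and destroy it exactly once when finished.

import Accelerate

final class FFTContext {
    let setup: FFTSetup
    let logSize: vDSP_Length

    init?(logSize: vDSP_Length) {
        // Allocating an FFTSetup is expensive; do it once, not per call.
        guard let setup = vDSP_create_fftsetup(logSize, FFTRadix(kFFTRadix2)) else { return nil }
        self.setup = setup
        self.logSize = logSize
    }

    deinit {
        // Release the setup's internal work buffers exactly once.
        vDSP_destroy_fftsetup(setup)
    }
}

// Usage: keep one FFTContext alive and pass `context.setup` into
// outOfPlaceComplexFourierTransform(setup:...) on every call.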

Memory usage growing seems to be linked to Metal objects

I am currently building an app that uses Metal to efficiently render triangles and evaluate a fitness function on textures. I noticed that the memory usage of my Metal app keeps growing, and I can't really understand why.
First of all I am surprised to see that in debug mode, according to Xcode debug panel, memory usage grows really slowly (about 20 MB after 200 generated images), whereas it grows way faster in release (about 100 MB after 200 generated images).
I don't store the generated images (at least not intentionally... but maybe there is some leak I am unaware of).
I am trying to understand where the leak (if it is one) comes from, but I don't really know where to start. I took a GPU frame capture to see the objects used by Metal, and it seems suspicious to me:
Looks like there are thousands of objects (the list is way longer than what you can see on the left panel).
Each time I draw an image, there is a moment when I call this code:
trianglesVerticesCoordiantes = device.makeBuffer(bytes: &positions, length: bufferSize , options: MTLResourceOptions.storageModeManaged)
triangleVerticiesColors = device.makeBuffer(bytes: &colors, length: bufferSize, options: MTLResourceOptions.storageModeManaged)
I will definitely make it a one-time allocation and then simply copy data into this buffer when needed, but could this cause the memory leak, or is it unrelated?
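For what it's worth, a hedged sketch of that one-time-allocation idea (`device`, `positions`, and `bufferSize` are assumed from the question; .storageModeManaged matches the macOS-style option used above):

import Metal

// Allocate once, up front.
let vertexBuffer = device.makeBuffer(length: bufferSize, options: .storageModeManaged)

func updateVertexBuffer(with positions: [Float]) {
    guard let buffer = vertexBuffer else { return }
    positions.withUnsafeBytes { raw in
        buffer.contents().copyMemory(from: raw.baseAddress!,
                                     byteCount: min(raw.count, bufferSize))
    }
    // Required for .storageModeManaged: tell Metal the CPU modified the data.
    buffer.didModifyRange(0..<bufferSize)
}

Whether this alone fixes the growth is unclear from the question, but per-draw makeBuffer(bytes:...) calls do create a new MTLBuffer object each frame, so reusing one buffer at least removes that allocation churn.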
EDIT with screenshot of Instruments:
EDIT #2: Tons of command encoder objects present when using the Inspector:
EDIT #3: Here is what seems to be the most suspect memory graph when analysed with the Xcode visual debugger:
And some detail:
I don't really know how to interpret this...
Thank you.

Saving CMSampleBufferRef for later processing

I am trying to use the AVFoundation framework to capture a 'series' of still images from AVCaptureStillImageOutput QUICKLY, like the burst mode in some cameras. I want to use the completion handler,
[stillImageOutput captureStillImageAsynchronouslyFromConnection:videoConnection
                                              completionHandler:^(CMSampleBufferRef imageSampleBuffer, NSError *error) {
and pass the imageSampleBuffer to an NSOperation object for later processing. However, I can't find a way to retain the buffer in the NSOperation class.
[stillImageOutput captureStillImageAsynchronouslyFromConnection:videoConnection
                                              completionHandler:^(CMSampleBufferRef imageSampleBuffer, NSError *error) {
    // Add to queue
    SaveImageDataOperation *saveOperation = [[SaveImageDataOperation alloc] initWithImageBuffer:imageSampleBuffer];
    [_saveDataQueue addOperation:saveOperation];
    [saveOperation release];

    // Continue
    [self captureCompleted];
}];
Does anyone know what I may be doing wrong here? Is there a better approach to do this?
"IMPORTANT: Clients of CMSampleBuffer must explicitly manage the retain count by calling CFRetain and CFRelease, even in processes using garbage collection."
SOURCE: CoreMedia.Framework CMSampleBuffer.h
I've been doing a lot of work with CMSampleBuffer objects recently, and I've learned that most of the media buffers sourced by the OS during real-time operations are allocated from pools. If AVFoundation (or CoreVideo/CoreMedia) runs out of buffers in a pool (i.e. you CFRetain a buffer for a 'long' time), the real-time aspect of the process is going to suffer or block until you CFRelease the buffer back into the pool.
So, in addition to manipulating the CFRetain/CFRelease count on the CMSampleBuffer, you should only keep the buffer retained long enough to unpack (deep copy) the bits in the CMBlockBuffer/CMFormat and create a new CMSampleBuffer to pass to your NSOperationQueue or dispatch_queue_t for later processing.
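The question is Objective-C, but for modern readers, here is a hedged Swift sketch of the "create a new CMSampleBuffer" step: wrap an already-copied pixel buffer in a standalone sample buffer whose lifetime the app fully controls, and enqueue that instead of the pool-backed original (the function name and timing parameter are illustrative):

import CoreMedia

func makeStandaloneSampleBuffer(from pixelBuffer: CVPixelBuffer,
                                timing: CMSampleTimingInfo) -> CMSampleBuffer? {
    var format: CMVideoFormatDescription?
    guard CMVideoFormatDescriptionCreateForImageBuffer(allocator: kCFAllocatorDefault,
                                                       imageBuffer: pixelBuffer,
                                                       formatDescriptionOut: &format) == noErr,
          let formatDescription = format else { return nil }

    var timingInfo = timing
    var sampleBuffer: CMSampleBuffer?
    // The resulting buffer references only app-owned memory, so holding it
    // does not starve the capture pipeline's buffer pool.
    CMSampleBufferCreateForImageBuffer(allocator: kCFAllocatorDefault,
                                       imageBuffer: pixelBuffer,
                                       dataReady: true,
                                       makeDataReadyCallback: nil,
                                       refcon: nil,
                                       formatDescription: formatDescription,
                                       sampleTiming: &timingInfo,
                                       sampleBufferOut: &sampleBuffer)
    return sampleBuffer
}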
In my situation I wanted to pass compressed CMSampleBuffers from the VideoToolbox over a network. I essentially created a deep copy of the CMSampleBuffer, with my application having full control over the memory allocation/lifetime. From there, I put the copied CMSampleBuffer on a queue for the network I/O to consume.
If the sample data is compressed, deep copying should be relatively fast. In my application, I used NSKeyedArchiver to create an NSData object from the relevant parts of the source CMSampleBuffer. For H.264 video data, that meant the CMBlockBuffer contents, the SPS/PPS header bytes and also the SampleTimingInfo. By serializing those elements I could reconstruct a CMSampleBuffer on the other end of the network that behaved identically to the one that VideoToolbox had given me. In particular, AVSampleBufferDisplayLayer was able to display them as if they were natively sourced on the machine.
For your application I would recommend the following:
1. Take your source CMSampleBuffer and compress the pixel data. If you can, use the hardware encoder in VideoToolbox to create I-frame-only H.264 images, which will be very high quality. The VT encoder is apparently very good for battery life as well, probably much better than JPEG unless there is a hardware JPEG codec on the system as well.
2. Deep copy the compressed CMSampleBuffer output by the VideoToolbox; VT will CFRelease the original CMSampleBuffer back to the pool used by the capture subsystem.
3. Retain the VT-compressed CMSampleBuffer only long enough to enqueue a deep copy for later processing.
Since the AVFoundation movie recorder can do steps #1 and #2 in real time without running out of buffers, you should be able to deep copy and enqueue your data on a dispatch_queue without exhausting the buffer pools used by the video capture component and VideoToolbox components.