How to limit memory consumption when using Audio Units on iPhone

For my app, I need to play music in the background while the user navigates inside it.
So, starting from MixerHost, I developed an audio mixer that can play 8 tracks simultaneously. However, it consumes too much memory because the 8 track files are loaded entirely into 8 buffers.
To limit memory consumption, I load only a small chunk of data at the beginning, and I feed in new data from the render callback like this:
result = ExtAudioFileRead(audioFileObject, &numberOfPacketsToRead, bufferList);
It works quite well, but sometimes playback pauses briefly. I know the origin of the problem: performing file-system access in the render callback.
But is there another solution that limits memory consumption?

The way this is typically handled is with a shared ring buffer. The ring buffer acts like a shock absorber between the real-time render thread and the slow disk accesses. Create a new thread that does nothing but read audio from the file and store it in the ring buffer. Then, in your render callback, just read from the ring buffer.
Apple has provided an implementation of a ring buffer suitable for use with Audio Units, called CARingBuffer. It's available in /Developer/Extras/CoreAudio/PublicUtility/CARingBuffer.
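Here is a minimal sketch of the idea in Swift (CARingBuffer itself is C++ and handles the thread-safety details properly; the class and method names here are illustrative, plain Ints stand in for the atomic indices a production version needs, and Float samples are assumed):

// Single-producer / single-consumer ring buffer (sketch only).
final class AudioRingBuffer {
    private var storage: [Float]
    private let capacity: Int
    private var readIndex = 0    // advanced only by the render callback
    private var writeIndex = 0   // advanced only by the loader thread

    init(capacity: Int) {
        self.capacity = capacity
        self.storage = [Float](repeating: 0, count: capacity)
    }

    // Samples currently buffered
    private var count: Int { (writeIndex - readIndex + capacity) % capacity }

    // Loader thread: push samples freshly read from disk (e.g. via ExtAudioFileRead)
    func write(_ samples: [Float]) -> Bool {
        guard samples.count <= capacity - 1 - count else { return false } // buffer full
        for sample in samples {
            storage[writeIndex] = sample
            writeIndex = (writeIndex + 1) % capacity
        }
        return true
    }

    // Render callback: pull exactly frameCount samples without blocking;
    // on underrun, return false and output silence. Never touch the disk here.
    func read(into output: UnsafeMutablePointer<Float>, frameCount: Int) -> Bool {
        guard count >= frameCount else { return false }
        for i in 0..<frameCount {
            output[i] = storage[readIndex]
            readIndex = (readIndex + 1) % capacity
        }
        return true
    }
}

The loader thread keeps the buffer topped up (for example, refilling whenever it drops below half capacity), so the render callback only ever performs a cheap memory copy.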

Does ExoPlayer download a chunk completely before processing (decoding) it?

Does ExoPlayer download a chunk completely before processing (decrypting, decoding) it? Is there a way to override this and start decoding/playback before the chunk is completely downloaded? The content is MPEG-DASH with a 6-second chunk size.
I am on the latest version of ExoPlayer. I am trying to improve the video start time, hence this query. Also, will smaller chunk sizes impact the video start time?
I think you mean a DASH segment when you say chunk. The terminology matters here because DASH segments can contain subsegments, each of which may be independently decodable, and it is made more confusing by the fact that the terms chunk and segment are both used in the ExoPlayer code.
It's useful when discussing this area to remember that the video download is actually a series of requests and responses rather than a constant stream of media.
To start decoding earlier, you typically have to request smaller 'pieces' (trying to avoid tripping over terminology...) of the video.
To be decodable, a video piece usually needs to start with a frame that does not reference any previous frames - an IDR frame, or a Stream Access Point (SAP).
Looking at ExoPlayer itself, you can set the number of segments downloaded per chunk (ExoPlayer terminology for the bit of the video you download) - take a look at the maxSegmentsPerLoad attribute in DefaultDashChunkSource: https://github.com/google/ExoPlayer/blob/bd54394391b0527893f382c9d641b8a55ffca765/library/dash/src/main/java/com/google/android/exoplayer2/source/dash/DashChunkSource.java
However, I think this is the opposite of what you are looking for - you would like to request a smaller piece of video, e.g. subsegments rather than whole segments.
For that, you most likely want to look at the new low-latency mechanisms introduced for DASH and for HLS. ExoPlayer has added support for these, and there is a public design document that provides a very good explanation of the background and the approach (link correct at the time of writing; the original ExoPlayer GitHub issue is also worth reading - https://github.com/google/ExoPlayer/issues/4904):
https://docs.google.com/document/d/1z9qwuP7ff9sf3DZboXnhEF9hzW3Ng5rfJVqlGn8N38k/edit#
The diagrams in this document explain it well, but the short answer to your question is yes: this approach does allow smaller 'pieces' of video to be downloaded and played back, and it does indeed help with video start time.

MTLSharedEventListener block called before command buffer scheduling and not in-flight

I am using an MTLSharedEvent to occasionally relay new information from the CPU to the GPU by writing into an MTLBuffer with storage mode .storageModeManaged, inside a block registered via the shared event (using the notify(_:atValue:block:) method of MTLSharedEvent, with an MTLSharedEventListener configured to be notified on a background dispatch queue). The process looks something like this:
let device = MTLCreateSystemDefaultDevice()!
let synchronizationQueue = DispatchQueue(label: "com.myproject.synchronization")
let sharedEvent = device.makeSharedEvent()!
let sharedEventListener = MTLSharedEventListener(dispatchQueue: synchronizationQueue)
// Updated only occasionally on the CPU (on user interaction). Mostly written to
// on the GPU
let managedBuffer = device.makeBuffer(length: 10, options: .storageModeManaged)!
var doExtraWork = true

func computeSomething(commandBuffer: MTLCommandBuffer) {
    // Do work on the GPU every frame

    // After writing to the buffer on the GPU, synchronize the buffer (required)
    let blitToSynchronize = commandBuffer.makeBlitCommandEncoder()!
    blitToSynchronize.synchronize(resource: managedBuffer)
    blitToSynchronize.endEncoding()

    // Occasionally, add extra information on the GPU
    if doExtraWork {
        // Register a block to write into the buffer
        sharedEvent.notify(sharedEventListener, atValue: 1) { event, value in
            // Safely write into the buffer. Make sure we call `didModifyRange(_:)` after
            // Update the counter
            event.signaledValue = 2
        }
        commandBuffer.encodeSignalEvent(sharedEvent, value: 1)
        commandBuffer.encodeWaitForEvent(sharedEvent, value: 2)
    }

    // Commit the work
    commandBuffer.commit()
}
The expected behavior is as follows:
1. The GPU does some work with the managed buffer.
2. Occasionally, the buffer needs to be updated with new information from the CPU. In that frame, we register a block of work to be executed. We do so in a dedicated block because we cannot guarantee that, by the time execution on the main thread reaches this point, the GPU is not simultaneously reading from or writing to the managed buffer. Hence, it is unsafe to simply write to it right away; we must first make sure the GPU is not doing anything with this data.
3. When the GPU schedules this command buffer for execution, commands encoded before the encodeSignalEvent(_:value:) call are executed, and then execution on the GPU stops until the block increments the signaledValue property of the event passed into the block.
4. When execution reaches the block, we can safely write into the managed buffer because we know the CPU has exclusive access to the resource. Once we have done so, we resume execution on the GPU.
The issue is that Metal does not seem to call the block while the GPU is executing the command, but rather before the command buffer is even scheduled. Worse, the system seems to "work" with the initial command buffer (the very first command buffer, before any others are scheduled).
I first noticed this issue when my scene would vanish after a CPU update; a GPU frame capture showed that the GPU had NaNs all over the place. I then ran into this strange situation when I deliberately waited on the background dispatch queue with a sleep(_:) call. Quite correctly, my shared-resource semaphore (not shown; signaled in a completion block of the command buffer and waited on in the main thread) reached a value of -1 after committing three command buffers to the command queue (three being the number of recycled shared MTLBuffers holding scene uniform data, etc.). This suggests that the first command buffer has not finished executing by the time the CPU is more than three frames ahead, which is consistent with the sleep(_:) behavior. Again, what isn't consistent is the ordering: Metal seems to call the block before even scheduling the buffer. Further, in subsequent frames, Metal doesn't seem to care that the sharedEventListener block is taking so long: it schedules the command buffer for execution even while the block is running, and the block finishes dozens of frames later.
This behavior is completely inconsistent with what I expect. What is going on here?
P.S.
There is probably a better way to periodically update a managed buffer whose contents are mostly modified on the GPU, but I have not yet found one. Any advice on this subject is appreciated as well. Of course, a triple-buffer system could work, but it would waste a lot of memory, as the managed buffer is quite large (whereas the shared buffers managed by the semaphore are quite small).
I think I have the answer for you, but I'm not sure.
From the MTLSharedEvent documentation:
Commands waiting on the event are allowed to run if the new value is equal to or greater than the value for which they are waiting. Similarly, setting the event's value triggers notifications if the value is equal to or greater than the value for which they are waiting.
This means that if you pass the values 1 and 2 as in your snippet, it will only work a single time; after that, the signaled value already satisfies both, so the event is never waited on again and the listener is never notified again.
You have to make sure that the values you wait on and then signal rise monotonically every time, bumping them up by 1 or more, as in the sketch below.
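For example, a minimal sketch of the fix, reusing the names from the snippet above (the counter bookkeeping is my assumption; any strictly increasing scheme works):

var eventValue: UInt64 = 0   // last value the GPU was released at

func encodeOccasionalCPUUpdate(into commandBuffer: MTLCommandBuffer) {
    let signalValue = eventValue + 1   // GPU reaches this point and fires the block
    let waitValue   = eventValue + 2   // GPU resumes once the block sets this
    sharedEvent.notify(sharedEventListener, atValue: signalValue) { event, _ in
        // Write into managedBuffer here, then call didModifyRange(_:)
        event.signaledValue = waitValue   // release the waiting GPU work
    }
    commandBuffer.encodeSignalEvent(sharedEvent, value: signalValue)
    commandBuffer.encodeWaitForEvent(sharedEvent, value: waitValue)
    eventValue = waitValue               // never reuse a previous value
}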

8 channel async mic recording in matlab

I wanted to record a sequence of sounds (using an 8-channel mic array).
MATLAB's audiorecorder does not support asynchronous recording with more than 2 channels.
When I say async, I want to achieve the following:
The user presses a key (handled by a GUI event handler) to start the recording; when the user presses a key again, the system saves the current recording and the user moves on to the next audio in the sequence.
I can record 8 channels from MATLAB using the audioDeviceReader System object, but for that I need to call it for each frame, so I would have to create a parallel process that communicates with both the event handler and the audioDeviceReader.
I don't have much experience with parallel programming. Should I look into audiorecorder's code and see if it can be trivially changed to support 8 channels (if that were possible, I suspect they would have already done it)? Or should I write code that spawns a parallel process exposing record and stop functions wrapping audioDeviceReader, which can interface with the event listener the way audiorecorder does? If so, how should I proceed?
Well, surprisingly, removing the channel-count error check in the library code worked. :)

Memory usage growing seems to be linked to Metal objects

I am currently building an app that uses Metal to efficiently render triangles and evaluate a fitness function on textures. I noticed that the memory usage of my Metal app keeps growing, and I can't really understand why.
First of all, I am surprised to see that, according to the Xcode debug panel, memory usage grows really slowly in debug mode (about 20 MB after 200 generated images), whereas it grows much faster in release (about 100 MB after 200 generated images).
I don't store the generated images (at least not intentionally... but maybe there is some leak I am unaware of).
I am trying to understand where the leak (if it is one) comes from, but I don't really know where to start. I took a GPU frame capture to see the objects used by Metal, and it looks suspicious to me: there seem to be thousands of objects (the list is far longer than what is visible in the left panel).
Each time I draw an image, I call this code:
trianglesVerticesCoordinates = device.makeBuffer(bytes: &positions, length: bufferSize, options: .storageModeManaged)
triangleVerticesColors = device.makeBuffer(bytes: &colors, length: bufferSize, options: .storageModeManaged)
I will definitely make this a one-time allocation and then simply copy data into the buffer when needed, but could these calls be causing the memory growth, or is that unrelated?
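As a sketch of that one-time-allocation plan (assuming the vertex data never outgrows bufferSize; the function name is illustrative), the per-frame makeBuffer calls could be replaced with a single allocation plus a copy:

// Allocated once, up front
let trianglesVerticesCoordinates = device.makeBuffer(
    length: bufferSize,
    options: .storageModeManaged)!

func updateVertexData(_ positions: [Float]) {
    // Per frame: overwrite the existing buffer instead of allocating a new one
    positions.withUnsafeBytes { src in
        trianglesVerticesCoordinates.contents()
            .copyMemory(from: src.baseAddress!, byteCount: src.count)
    }
    // Managed storage: tell Metal which bytes the CPU just changed
    trianglesVerticesCoordinates.didModifyRange(
        0..<(positions.count * MemoryLayout<Float>.stride))
}

Even if the old buffers are eventually released, calling makeBuffer(bytes:length:options:) every frame creates many short-lived MTLBuffer objects, which can keep memory high until the command buffers retaining them complete; reusing one buffer avoids that churn.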
EDIT with screenshot of Instruments:
EDIT #2: Tons of command encoder objects present when using the Inspector:
EDIT #3: Here is what seems to be the most suspect memory graph when analyzed with the Xcode visual debugger:
And some detail:
I don't really know how to interpret this...
Thank you.

Modify aurioTouch sample code to read data from an audio file

I want to modify Apple's aurioTouch sample code to generate the waveform from an audio file instead of from the mic input. I tried to do it, but I am not able to work out where and what changes to make. Can anyone guide me on how this can be achieved?
Thanks,
Look inside the render callback for a function named AudioUnitRender.
The render callback runs whenever the speakers are hungry for data.
IIRC, aurioTouch simply grabs however many samples are required from the microphone using this function.
Of course, the first time around it will fail because there will be nothing waiting.
Anyway, just comment out this call and instead fill the buffer yourself with samples from your file (which you would probably want to load into memory in advance; you don't want file I/O clogging a high-priority thread).
That means you will probably need to create some sort of AudioFile class and pass a reference to an instance of it when you set up the render callback. That way you will be able to access the data from within the render callback (which is a vanilla C function, i.e. not a member of a class, so it has no other way to access class data, unless you want to do something horrible with file-level variables).
Make sure you declare this AudioFile* audiofile property nonatomic if it is a property; you don't want your render callback kept waiting because some other thread is inside the object and consequently holds a lock on it.
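To make that concrete, here is a minimal sketch in Swift of the same idea (aurioTouch itself is C/C++, so treat this as illustrative); it assumes a non-interleaved Float stream format and that the whole file has already been decoded into samples, e.g. with ExtAudioFileRead:

import AudioToolbox

// Stands in for the "AudioFile class" described above
final class AudioFilePlayer {
    var samples: [Float] = []   // whole file, loaded up front
    var position = 0
}

let renderCallback: AURenderCallback = { inRefCon, _, _, _, inNumberFrames, ioData in
    // Recover the player instance that was passed as the callback's refCon
    let player = Unmanaged<AudioFilePlayer>.fromOpaque(inRefCon).takeUnretainedValue()
    guard let abl = UnsafeMutableAudioBufferListPointer(ioData) else { return noErr }
    let n = Int(inNumberFrames)
    for buffer in abl {   // one buffer per channel when non-interleaved
        let out = buffer.mData!.assumingMemoryBound(to: Float.self)
        for i in 0..<n {
            // Instead of AudioUnitRender (mic input), copy file samples,
            // padding with silence once the file runs out
            let idx = player.position + i
            out[i] = idx < player.samples.count ? player.samples[idx] : 0
        }
    }
    player.position += n
    return noErr
}

// When installing the callback, pass the player as the refCon:
// var callbackStruct = AURenderCallbackStruct(
//     inputProc: renderCallback,
//     inputProcRefCon: Unmanaged.passUnretained(player).toOpaque())

The refCon mechanism is exactly the "pass a reference to an instance of this class" trick described above: it is the only channel through which the C callback can reach your object.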