AudioUnit callback and synchronization: how to ensure thread safety with GCD - swift

I am building an audio app based on the AudioUnit callback facility and a graph of audio processing nodes. I know that the callback is executed on a separate (high-priority?) thread, and therefore all interaction with my processing nodes, such as changing EQ parameters while playing, should be done in a thread-safe manner. In other words, the nodes should be protected from modification while the audio callback chain is being executed.
The way I understand it in lower-level multithreading terms is that I need a lock, either in each node or a single one for the entire graph, that prevents writes while audio buffers are being processed.
However, I would like the implementation to be more "Swifty" and use DispatchQueue/DispatchGroup, which should provide the same functionality. I just can't quite understand how to do it in the most efficient manner.
So let's say all audio parameter modifications are done on a queue, like so:
audioQueue.async {
    eqNode.setEqParameters(...)
}
How do I ensure this block is not executed until the AudioUnit callback completes? Using audioQueue.sync is not an option, because it would make the system audio thread depend on my audioQueue, which is not great.
If I were to use DispatchGroup, what would be the best way to implement this flow?

A real-time Audio Unit callback should never lock, wait on locks, or manage memory (Swift or Objective-C objects or methods).
I would double-buffer the set of audio parameters (or use more sets, e.g. a ring buffer of parameter sets). Have the writer lock the set that is being changed and unlock it when done, and never switch sets at or faster than the known audio callback rate (2x or more slower would be safe). Have the periodic reader (the Audio Unit real-time callback) check the locks and skip any parameter set that is locked.
To avoid locks on inter-thread ring buffers entirely, you might want to use OS atomic memory barriers on the pointer, index, or status loads/stores (load-acquire, store-release, etc.) to prevent ARM processor write-buffer reordering, or Swift optimizer instruction reordering, from corrupting your data.
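A minimal sketch of that lock-free double-buffer idea in C11 (Swift gets the same acquire/release semantics via the swift-atomics package); the EQParams struct, its fields, and the function names are hypothetical:

#include <stdatomic.h>

/* Hypothetical parameter set; the fields are illustrative only. */
typedef struct {
    float gain[8];
    float frequency[8];
} EQParams;

static EQParams slots[2];            /* double buffer */
static _Atomic int active_slot = 0;  /* index the reader may use */

/* Writer (control thread): fill the idle slot with plain stores, then
   publish its index with a release store so the reader can never see a
   half-written set. Per the rule above, don't switch slots faster than
   the audio callback rate. */
void set_eq_params(const EQParams *p)
{
    int next = 1 - atomic_load_explicit(&active_slot, memory_order_relaxed);
    slots[next] = *p;
    atomic_store_explicit(&active_slot, next, memory_order_release);
}

/* Reader (real-time callback): the acquire load pairs with the release
   store above; no locks, no allocation, no blocking. */
const EQParams *current_eq_params(void)
{
    return &slots[atomic_load_explicit(&active_slot, memory_order_acquire)];
}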

Related

Can a mutex be used instead of a critical section when using stream buffers in FreeRTOS?

I am looking into using a stream buffer in FreeRTOS to transfer CAN frames from multiple tasks to an ISR, which puts them into a CAN transmit buffer as soon as it's ready. The manual here explains that a stream buffer should only be written to by one task/ISR and read by one task/ISR, and that a critical section is required otherwise.
Can a mutex be used in place of a critical section for this scenario? Would it make more sense to use one?
First, if you are sending short discrete frames, you may want to consider a message buffer instead of a stream buffer.
Yes you could use a mutex.
If sending from multiple tasks, the main thing to consider is what happens when the stream buffer becomes full. If you were using a different FreeRTOS object (other than a message buffer; message buffers are built on stream buffers), then multiple tasks attempting to write to the same instance of a full object would all block on their attempt to write, and be automatically unblocked when space became available - the highest-priority waiting task would be unblocked first, no matter the order in which the tasks entered the blocked state. However, with stream/message buffers you can only have one task blocked attempting to write to a full buffer - and if the buffer were protected by a mutex, all other tasks would instead block on the mutex. That could mean a low-priority task was blocked on the stream/message buffer while a higher-priority task was blocked on the mutex - a kind of priority inversion.
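A minimal sketch of the mutex-guarded writer side, assuming the buffer and mutex were created at startup with xMessageBufferCreate() and xSemaphoreCreateMutex(); the CANFrame type and function name are hypothetical:

#include "FreeRTOS.h"
#include "message_buffer.h"
#include "semphr.h"

typedef struct { uint32_t id; uint8_t dlc; uint8_t data[8]; } CANFrame;

static MessageBufferHandle_t xCanBuffer;    /* created at init */
static SemaphoreHandle_t xCanBufferMutex;   /* created at init */

/* Called from multiple tasks: the mutex ensures only one task is ever
   inside xMessageBufferSend() at a time. */
BaseType_t xQueueCanFrame(const CANFrame *pxFrame)
{
    size_t xSent = 0;
    if (xSemaphoreTake(xCanBufferMutex, pdMS_TO_TICKS(10)) == pdTRUE) {
        /* Send with zero block time: blocking on a full buffer while
           holding the mutex is exactly the priority-inversion risk
           described above. */
        xSent = xMessageBufferSend(xCanBuffer, pxFrame, sizeof(*pxFrame), 0);
        xSemaphoreGive(xCanBufferMutex);
    }
    return (xSent == sizeof(*pxFrame)) ? pdPASS : pdFAIL;
}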

UART DMA receive interrupt stops receiving data after several minutes

I have a project that uses an STM32F746G Discovery board. It receives fixed-size data from the UART sequentially, and the DMA callback (the HAL_UART_RxCpltCallback function) is used to inform the application each time a receive completes. It works fine at the beginning, but after several minutes of running the DMA callback stops being called, and as a result the specified parameter value doesn't get updated. Because the parameter is also used in another thread (actually an RTOS-defined timer), I believe this problem is caused by a lack of thread safety. But my problem is that mutexes and semaphores are not supported in ISRs, and I need to protect my variable in the DMA callback, which is an interrupt routine. I am using Keil RTX to handle multithreading, and the timer I use is an osTimer defined in RTX. How can I handle this issue?
Generally, only one thread should communicate with the ISR. If multiple threads are accessing a variable shared with an ISR, your design is wrong and needs to be fixed. In case of DMA, only one thread should access the buffer.
You'll need to protect the variables shared between that thread and the ISR - not necessarily with a mutex/semaphore, but perhaps with something simpler, like guaranteeing atomic access (the best solution if possible), or by using the non-interruptable nature that many ISRs have (an example for simple, single-threaded MCU applications). Alternatively, just temporarily disable interrupts during access, but that may not be possible, depending on real-time requirements.
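A minimal sketch of the atomic-access option, assuming the STM32 HAL callback named in the question; ulDecodeFrame() and the variable names are hypothetical:

#include "stm32f7xx_hal.h"   /* assumed HAL header; provides UART_HandleTypeDef */
#include <stdatomic.h>
#include <stdint.h>

extern uint32_t ulDecodeFrame(UART_HandleTypeDef *huart);   /* hypothetical */

/* Value shared between the DMA callback (ISR) and an RTOS timer thread. */
static _Atomic uint32_t ulLatestValue;

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    /* ISR context: a single atomic store needs no mutex or semaphore. */
    atomic_store(&ulLatestValue, ulDecodeFrame(huart));
}

uint32_t ulReadLatestValue(void)
{
    /* Thread/timer context: the atomic load always sees a consistent value. */
    return atomic_load(&ulLatestValue);
}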

Is it required to use spin_lock inside tasklets?

As far as I know, there is no need for synchronization techniques in an interrupt handler: the interrupt handler cannot run concurrently, and in short, preemption is disabled in an ISR. However, I have a doubt regarding tasklets. As far as I know, tasklets run in interrupt context. Thus, in my opinion, there is no need for a spin lock inside a tasklet function routine. However, I am not sure about this. Can somebody please explain? Thanks for your replies.
If data is shared between the top half and the bottom half, then use a lock. The simple rules for locking: locks are meant to protect data, not code.
1. What to protect?
2. Why to protect?
3. How to protect?
Two tasklets of the same type do not ever run simultaneously. Thus, there is no need to protect data used only within a single type of tasklet. If the data is shared between two different tasklets, however, you must obtain a normal spin lock before accessing the data in the bottom half. You do not need to disable bottom halves because a tasklet never preempts another running tasklet on the same processor.
For synchronization between code running in process context (A) and code running in softirq context (B) we need to use special locking primitives. We must use spinlock operations augmented with deactivation of bottom-half handlers on the current processor in (A), and only basic spinlock operations in (B). Using spinlocks makes sure that we don't have races between multiple CPUs, while deactivating the softirqs makes sure that we don't deadlock if the softirq is scheduled on the same CPU where we already acquired a spinlock. (c) Kernel docs
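A minimal sketch of both contexts, using the classic (pre-5.9) tasklet API; the shared counter and the names are hypothetical:

#include <linux/interrupt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(shared_lock);
static unsigned long shared_count;   /* shared between (A) and (B) */

/* (B) Tasklet, softirq context: a plain spin lock is sufficient,
   because another bottom half cannot preempt us on this CPU. */
static void my_tasklet_fn(unsigned long data)
{
    spin_lock(&shared_lock);
    shared_count++;
    spin_unlock(&shared_lock);
}
static DECLARE_TASKLET(my_tasklet, my_tasklet_fn, 0);
/* scheduled from an interrupt handler via tasklet_schedule(&my_tasklet) */

/* (A) Process context: spin_lock_bh() also disables bottom halves on
   this CPU, so the tasklet cannot interrupt us while we hold the lock. */
void update_from_process(void)
{
    spin_lock_bh(&shared_lock);
    shared_count++;
    spin_unlock_bh(&shared_lock);
}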

Read-Write lock with GCD

My application makes heavy use of GCD, and almost everything is split up into small tasks handled by dispatches. However, the underlying data model is mostly read and only occasionally written.
I currently use locks to prevent changes to the critical data structures while reading. But after looking into locks some more today, I found NSConditionLock and some page about read-write locks. The latter is exactly what I need.
I found this implementation: http://cocoaheads.byu.edu/wiki/locks . My question is, will this implementation work with GCD, seeing that it uses PThreads?
It will still work. pthreads is the threading API which underlies all of the other thread-using APIs on Mac OS X. (Under that there's Mach thread activations, but that's SPI, not API.) Anyway, the pthreads locks don't really require that you use pthreads threads.
However, GCD offers a better alternative as of iOS 5: dispatch_barrier_async(). Basically, you have a private concurrent queue. You submit all read operations to it in the normal fashion. You submit write operations to it using the barrier routines. Ta-da! Read-write locking.
You can learn more about this if you have access to the WWDC 2011 session video for Session 210 - Mastering Grand Central Dispatch.
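A minimal sketch of the barrier pattern, written against the libdispatch C API with Clang blocks; the queue label, model_value, and function names are hypothetical:

#include <dispatch/dispatch.h>

static dispatch_queue_t model_queue;   /* private concurrent queue */
static int model_value;                /* the protected state */

void model_init(void)
{
    model_queue = dispatch_queue_create("com.example.model",
                                        DISPATCH_QUEUE_CONCURRENT);
}

/* Readers run concurrently with each other. */
void model_read(void (^handler)(int))
{
    dispatch_async(model_queue, ^{ handler(model_value); });
}

/* The barrier block waits for in-flight readers, runs alone, then lets
   new readers proceed: read-write locking without a lock. */
void model_write(int new_value)
{
    dispatch_barrier_async(model_queue, ^{ model_value = new_value; });
}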
You might also want to consider maintaining a serial queue for all read/write operations. You can then dispatch_sync() writes to that queue to ensure that changes to the data model are applied promptly and dispatch_async() all the reads to make sure you maintain nice performance in the app.
Since you have a single serial queue on which all the reads and writes take place, you ensure that no reads can happen during a write. This is far less costly than a lock, but it means you cannot execute multiple 'read' operations simultaneously. This is unlikely to cause a problem for most applications.
Using dispatch_barrier_async() might mean that writes you make take an arbitrary amount of time to actually be committed since all the pre-existing tasks in the queue have to be completed before your barrier block executes.
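For comparison, a sketch of the serial-queue variant just described, with the same hypothetical names; the serial queue admits one block at a time, so reads never overlap a write:

#include <dispatch/dispatch.h>

static dispatch_queue_t rw_queue;   /* serial: one block at a time */
static int model_value;

void rw_init(void)
{
    rw_queue = dispatch_queue_create("com.example.rw", DISPATCH_QUEUE_SERIAL);
}

/* Synchronous write: the change is committed before we return. */
void rw_write(int v)
{
    dispatch_sync(rw_queue, ^{ model_value = v; });
}

/* Asynchronous read: the result is delivered to the handler later. */
void rw_read(void (^handler)(int))
{
    dispatch_async(rw_queue, ^{ handler(model_value); });
}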

Using multiple sockets, is non-blocking or blocking with select better?

Let's say I have a server program that can accept connections from 10 (or more) different clients. The clients send data at random, which is received by the server, but it is certain that at least one client will be sending data every update. The server cannot wait for information to arrive, because it has other processing to do. Aside from using asynchronous sockets, I see two options:
1. Make all sockets non-blocking. In a loop, call recv() on each socket, allow it to fail with WSAEWOULDBLOCK if there is no data available, and keep whatever data does arrive.
2. Leave the sockets as blocking. Add all sockets to an FD_SET and call select(). If the return value is non-zero (which it will be most of the time), loop through all the sockets with FD_ISSET() to find the appropriate number of readable sockets, and only call recv() on the readable ones.
The first option will create a lot more calls to the recv() function. The second method is a bigger pain from a programming perspective because of all the FD_SET and FD_ISSET looping.
Which method (or another method) is preferred? Is avoiding the overhead on letting recv() fail on a non-blocking socket worth the hassle of calling select()?
I think I understand both methods and I have tried both with success, but I don't know if one way is considered better or optimal.
I would recommend using overlapped IO instead. You can then kick off a WSARecv(), and provide a callback function to be invoked when the operation completes. What's more, since it'll only be invoked when your program is in an alertable wait state, you don't need to worry about locks like you would in a threaded application (assuming you run them on your main thread).
Note, however, that you do need to enter such an alertable wait state frequently. If this is your UI thread, make sure to use MsgWaitForMultipleObjectsEx() in your message loop, with the MWMO_ALERTABLE flag. This will give your callbacks a chance to run. On non-UI threads, regularly call one of the wait functions that put you into an alertable wait state.
Note also that modal dialogs generally will not enter an alertable wait state, as they have their own message loop which doesn't call MsgWaitForMultipleObjectsEx(). If you need to process network IO when showing a dialog box, do all of your network IO on a dedicated thread, which does enter an alertable wait state regularly.
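A minimal sketch of such a message loop; the structure is illustrative, not a drop-in main loop:

#include <windows.h>

void run_message_loop(void)
{
    for (;;) {
        /* Alertable wait: returns WAIT_IO_COMPLETION when an overlapped
           IO completion callback (APC) has run. */
        DWORD r = MsgWaitForMultipleObjectsEx(0, NULL, INFINITE,
                                              QS_ALLINPUT, MWMO_ALERTABLE);
        if (r == WAIT_IO_COMPLETION)
            continue;

        MSG msg;
        while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
            if (msg.message == WM_QUIT)
                return;
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }
}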
If, for whatever reason, you can't use overlapped IO - definitely use blocking select(). Using non-blocking recv() like that in an infinite loop is an inexcusable waste of CPU time. However, do put the sockets in non-blocking mode - as otherwise, if one byte arrives and you try to read two, you might end up blocking unexpectedly.
You might also want to consider using a library to abstract away the finicky details. For example, libevent or boost::asio.
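For reference, a minimal sketch of the select()-based polling loop from option 2, using the Winsock API implied by the question's mention of WSAEWOULDBLOCK; the socket array and buffer handling are illustrative:

#include <winsock2.h>

void poll_sockets(SOCKET *socks, int count)
{
    fd_set readable;
    FD_ZERO(&readable);
    for (int i = 0; i < count; i++)
        FD_SET(socks[i], &readable);

    struct timeval timeout = { 0, 0 };   /* poll; don't block the loop */
    if (select(0, &readable, NULL, NULL, &timeout) <= 0)
        return;                          /* nothing readable (or error) */

    for (int i = 0; i < count; i++) {
        if (FD_ISSET(socks[i], &readable)) {
            char buf[1024];
            int n = recv(socks[i], buf, sizeof(buf), 0);
            if (n > 0) {
                /* consume n bytes from buf */
            }
        }
    }
}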
The IO should be either completely blocking, with one thread per connection (in which case the event loop is essentially the OS scheduler), or completely non-blocking, in which case a select/WaitForMultipleObjects-based event loop lives in your application.
All intermediate variants are less maintainable and more error-prone.
The completely non-blocking approach scales much better as the number of concurrent connections grows and has no thread context-switch overhead, so it is preferable where the number of concurrent connections is not fixed. This approach has higher implementation complexity than the completely blocking one.
For completely non-blocking IO, the core of the application is a select/WaitForMultipleObjects-based event loop. All sockets are in non-blocking mode, and all reads/writes are generally done from within the event-loop thread (for top performance, writes can first be attempted directly from the thread requesting the write).