OperationQueue - crash when editing the same array from multiple operations - swift

I have an OperationQueue with multiple custom Operations, which all append to the same array on completion (each operation downloads a file from the user's iCloud, and when it's done it appends the file to the array).
This sometimes causes the app to crash, because several operations try to edit the array at the same time.
How can I prevent this, so that only one operation edits the array at a time while all operations still run simultaneously?
I must use OperationQueue because I need the operations to be cancelable.
func convertAssetsToMedias(assets: [PHAsset],
                           completion: @escaping (_ medias: [Media]) -> ()) {
    operationQueue = OperationQueue()
    var medias: [Media] = []
    operationQueue?.progress.totalUnitCount = Int64(assets.count)
    for asset in assets {
        // For each asset we start a new operation
        let convertionOperation = ConvertPHAssetToMediaOperation(asset)
        convertionOperation.qualityOfService = .userInteractive
        convertionOperation.completionBlock = { [unowned convertionOperation] in
            let media = convertionOperation.media
            medias.append(media) // CRASH HERE (sometimes)
            self.operationQueue?.progress.completedUnitCount += 1
            if let progress = self.operationQueue?.progress.fractionCompleted {
                self.delegate?.onICloudProgressUpdate(progress: progress)
            }
            convertionOperation.completionBlock = nil
        }
        operationQueue?.addOperation(convertionOperation)
    }
    operationQueue?.addBarrierBlock {
        completion(medias)
    }
}
Edit 1:
The Media file itself is nothing big, just a bunch of metadata and a URL to an actual file in the Documents directory. There are usually about 24 medias max in one run. Memory barely increases during those operations; the crash never occurred due to a lack of memory.
The operation ConvertPHAssetToMediaOperation is a subclass of AsyncOperation whose isAsynchronous property is set to true.
That's how I construct the Media object in the end of each operation:
self.media = Media(type: mediaType, url: resultURL, creationDate: date)
self.finish()
Edit 2: The crash is always the same: [crash report screenshot]
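One common way to make the appends safe while still letting the operations run concurrently is to funnel every mutation of the shared array through a single lock (a serial DispatchQueue would work equally well). The sketch below is not the asker's code: `SynchronizedArray` and `collectResults` are hypothetical names, each operation "produces" an `Int` instead of a `Media`, and a `DispatchGroup` is swapped in so the final callback fires only after every completion block (not merely every operation) has run.

```swift
import Foundation
import Dispatch

/// Hypothetical thread-safe wrapper: every read and write goes through one
/// NSLock, so completion blocks running on different threads never mutate
/// the underlying Swift array at the same time.
final class SynchronizedArray<Element> {
    private var storage: [Element] = []
    private let lock = NSLock()

    func append(_ element: Element) {
        lock.lock()
        defer { lock.unlock() }
        storage.append(element)
    }

    /// A copy of the contents taken while holding the lock.
    var snapshot: [Element] {
        lock.lock()
        defer { lock.unlock() }
        return storage
    }
}

/// Hypothetical stand-in for convertAssetsToMedias: operations still run
/// concurrently on the OperationQueue; only the append is serialized.
func collectResults(count: Int, completion: @escaping ([Int]) -> Void) {
    let queue = OperationQueue()
    let results = SynchronizedArray<Int>()
    let group = DispatchGroup()            // tracks completion blocks

    for i in 0..<count {
        let operation = BlockOperation {
            _ = (0..<1_000).reduce(0, +)   // simulated work
        }
        group.enter()
        operation.completionBlock = {
            results.append(i)              // serialized by the lock
            group.leave()
        }
        queue.addOperation(operation)
    }

    group.notify(queue: .global()) {
        completion(results.snapshot)
    }
}
```

The results arrive in completion order, not submission order; if order matters, store each result at its own index instead of appending.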


TaskGroup limit amount of memory usage for lots of tasks

I'm trying to build a chunked file-uploading mechanism using modern Swift Concurrency.
There is a streamed file reader which I'm using to read files chunk by chunk, 1 MB at a time.
It has two closures, nextChunk: (DataChunk) -> Void and completion: () -> Void. The first one is called once for every chunk-sized piece of data read from the InputStream.
To make this reader compatible with Swift Concurrency, I wrote an extension that creates an AsyncStream, which seems to be the most suitable tool for such a case.
public extension StreamedFileReader {
    func read() -> AsyncStream<DataChunk> {
        AsyncStream { continuation in
            self.read(nextChunk: { chunk in
                continuation.yield(chunk)
            }, completion: {
                continuation.finish()
            })
        }
    }
}
Using this AsyncStream I read some file iteratively and make network calls like this:
func process(_ url: URL) async {
    // ...
    do {
        for await chunk in reader.read() {
            let request = // ...
            _ = try await service.upload(data: chunk.data, request: request)
        }
    } catch let error {
        reader.cancelReading()
        print(error)
    }
}
The issue is that I'm not aware of any mechanism that prevents more than N network calls from executing at once. So when I try to upload a huge file (5 GB), memory consumption grows drastically.
Because of that, streaming the file makes no sense, as it would be easier to read the entire file into memory (it's a joke, but that's what it looks like).
In contrast, with good old GCD everything works like a charm:
func process(_ url: URL) {
    let semaphore = DispatchSemaphore(value: 5) // limit to no more than 5 requests at a given time
    let uploadGroup = DispatchGroup()
    let uploadQueue = DispatchQueue.global(qos: .userInitiated)
    uploadQueue.async(group: uploadGroup) {
        // ...
        reader.read(nextChunk: { chunk in
            let request = // ...
            uploadGroup.enter()
            semaphore.wait()
            service.upload(chunk: chunk, request: request) {
                uploadGroup.leave()
                semaphore.signal()
            }
        }, completion: { _ in
            print("read completed")
        })
    }
}
Well, it is not exactly the same behavior, as it uses a concurrent DispatchQueue, whereas AsyncStream iteration runs sequentially.
So I did a little research and found that TaskGroup is probably what I need in this case. It allows running async tasks in parallel, etc.
I tried it this way:
func process(_ url: URL) async {
    // ...
    do {
        let totalParts = try await withThrowingTaskGroup(of: Void.self) { [service] group -> Int in
            var counter = 1
            for await chunk in reader.read() {
                let request = // ...
                group.addTask {
                    _ = try await service.upload(data: chunk.data, request: request)
                }
                counter = chunk.index
            }
            return counter
        }
    } catch let error {
        reader.cancelReading()
        print(error)
    }
}
In that case memory consumption is even higher than in the example with AsyncStream iteration!
I suspect I need to suspend the group (or the task) under some condition, and call group.addTask only when the tasks I'm adding can actually be handled, but I have no idea how to do that.
I found this Q/A and tried to put try await group.next() for each 5th chunk, but it didn't help me at all.
Is there any mechanism similar to DispatchGroup + DispatchSemaphore, but for modern concurrency?
UPDATE:
To better demonstrate the difference between all three approaches, here are screenshots of the memory report:
[Memory report: AsyncStream iterating]
[Memory report: AsyncStream + TaskGroup (using try await group.next() on each 5th chunk)]
[Memory report: GCD DispatchQueue + DispatchGroup + DispatchSemaphore]
The key problem is the use of the AsyncStream. Your AsyncStream is reading data and yielding chunks more quickly than it can be uploaded.
Consider this MCVE where I simulate a stream of 100 chunks, 1mb each:
import os.log
private let log = OSLog(subsystem: "Test", category: .pointsOfInterest)
struct Chunk {
    let index: Int
    let data: Data
}
actor FileMock {
    let maxChunks = 100
    let chunkSize = 1_000_000
    var index = 0
    func nextChunk() -> Chunk? {
        guard index < maxChunks else { print("done"); return nil }
        defer { index += 1 }
        return Chunk(index: index, data: Data(repeating: UInt8(index & 0xff), count: chunkSize))
    }
    func chunks() -> AsyncStream<Chunk> {
        AsyncStream { continuation in
            index = 0
            while let chunk = nextChunk() {
                os_signpost(.event, log: log, name: "chunk")
                continuation.yield(chunk)
            }
            continuation.finish()
        }
    }
}
And
func uploadAll() async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        let chunks = await FileMock().chunks()
        var index = 0
        for await chunk in chunks {
            index += 1
            if index > 5 {
                try await group.next()
            }
            group.addTask { [self] in
                try await upload(chunk)
            }
        }
        try await group.waitForAll()
    }
}
func upload(_ chunk: Chunk) async throws {
    let id = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: #function, signpostID: id, "%d start", chunk.index)
    try await Task.sleep(nanoseconds: 1 * NSEC_PER_SEC)
    os_signpost(.end, log: log, name: #function, signpostID: id, "end")
}
When I do that, I see memory spike to 150 MB as the AsyncStream rapidly yields all of the chunks upfront:
[Instruments screenshot: memory and Points of Interest, AsyncStream version]
Note that all the Ⓢ signposts, showing when the Data objects are created, are clumped at the start of the process.
Note, the documentation warns us that the sequence might conceivably generate values faster than they can be consumed:
An arbitrary source of elements can produce elements faster than they are consumed by a caller iterating over them. Because of this, AsyncStream defines a buffering behavior, allowing the stream to buffer a specific number of oldest or newest elements. By default, the buffer limit is Int.max, which means the value is unbounded.
Unfortunately, the buffering alternatives, .bufferingOldest and .bufferingNewest, only discard values when the buffer is full. For some AsyncStreams that might be a viable solution (e.g., if you are tracking the user's location, you might only care about the most recent one), but when uploading chunks of a file, you obviously cannot have it discard chunks when the buffer fills up.
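The discarding behavior is easy to observe directly. This small sketch (the helper name `survivingElements` is hypothetical) yields five integers inside the `AsyncStream` build closure, before anyone iterates, so every value lands in the buffer; it then collects whatever the chosen `bufferingPolicy` kept.

```swift
/// Hypothetical helper: yields `count` integers synchronously inside the
/// build closure (before any consumer iterates), then returns whatever
/// the stream's buffer kept under the given policy.
func survivingElements(count: Int,
                       policy: AsyncStream<Int>.Continuation.BufferingPolicy) async -> [Int] {
    let stream = AsyncStream(Int.self, bufferingPolicy: policy) { continuation in
        for i in 0..<count {
            continuation.yield(i)   // nobody is awaiting yet, so this goes to the buffer
        }
        continuation.finish()       // buffered values are still delivered after finish
    }
    var kept: [Int] = []
    for await value in stream {
        kept.append(value)
    }
    return kept
}
```

With `.bufferingOldest(2)` only `[0, 1]` survive, and with `.bufferingNewest(2)` only `[3, 4]` do; either way, chunks of a file would be silently lost, while `.unbounded` keeps everything and therefore keeps everything in memory.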
So, rather than AsyncStream, just wrap your file reading with a custom AsyncSequence, which will not read the next chunk until it is actually needed, dramatically reducing peak memory usage, e.g.:
struct FileMock: AsyncSequence {
    typealias Element = Chunk
    struct AsyncIterator : AsyncIteratorProtocol {
        let chunkSize = 1_000_000
        let maxChunks = 100
        var current = 0
        mutating func next() async -> Chunk? {
            os_signpost(.event, log: log, name: "chunk")
            guard current < maxChunks else { return nil }
            defer { current += 1 }
            return Chunk(index: current, data: Data(repeating: UInt8(current & 0xff), count: chunkSize))
        }
    }
    func makeAsyncIterator() -> AsyncIterator {
        return AsyncIterator()
    }
}
And
func uploadAll() async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        var index = 0
        for await chunk in FileMock() {
            index += 1
            if index > 5 {
                try await group.next()
            }
            group.addTask { [self] in
                try await upload(chunk)
            }
        }
        try await group.waitForAll()
    }
}
And that avoids loading all 100 MB into memory at once. Note that the vertical scale of the memory graph is different, but you can see that the peak usage is 100 MB lower than in the graph above, and the Ⓢ signposts, showing when data is read into memory, are now distributed throughout the graph rather than all at the start:
[Instruments screenshot: memory and Points of Interest, custom AsyncSequence version]
Now, obviously, I am only mocking the reading of a large file with Chunk/Data objects and mocking the upload with a Task.sleep, but it hopefully illustrates the basic idea.
Bottom line, do not use AsyncStream to read the file, but rather consider a custom AsyncSequence or other pattern that reads the file in as the chunks are needed.
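As a sketch of what such a lazy reader might look like against a real file (rather than the mock above), here is a hypothetical `FileChunks` sequence built on `FileHandle.read(upToCount:)`. It opens the file on the first request and reads one chunk per call to `next()`, so nothing is pulled from disk until the consumer asks for it. The name, and the choice of `Data` as the element type instead of the asker's `DataChunk`, are assumptions for illustration.

```swift
import Foundation

/// Hypothetical lazy chunk reader: each call to next() reads at most
/// `chunkSize` bytes, so memory use tracks how fast the consumer iterates.
struct FileChunks: AsyncSequence {
    typealias Element = Data
    let url: URL
    let chunkSize: Int

    struct AsyncIterator: AsyncIteratorProtocol {
        let url: URL
        let chunkSize: Int
        var handle: FileHandle? = nil
        var finished = false

        mutating func next() async throws -> Data? {
            if finished { return nil }
            if handle == nil {
                handle = try FileHandle(forReadingFrom: url)   // opened on first demand
            }
            // An empty or nil read means end of file: close and stop.
            guard let data = try handle?.read(upToCount: chunkSize), !data.isEmpty else {
                try handle?.close()
                handle = nil
                finished = true
                return nil
            }
            return data
        }
    }

    func makeAsyncIterator() -> AsyncIterator {
        AsyncIterator(url: url, chunkSize: chunkSize)
    }
}
```

Because this is a throwing sequence, the consumer iterates with `for try await chunk in FileChunks(url: fileURL, chunkSize: 1_000_000)`, and it can be dropped into the `withThrowingTaskGroup` loop above in place of `FileMock()`.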
A few other observations:
You said “tried to put try await group.next() for each 5th chunk”. Perhaps you can show us what you tried. But note that this answer didn’t say “each 5th chunk” but rather “every chunk after the 5th”. We cannot comment on what you tried unless you show us what you actually tried (or provide a MCVE). And as the above shows, using Instruments’ “Points of Interest” tool can show the actual concurrency.
By the way, when uploading a large asset, consider using a file-based upload rather than Data. File-based uploads are far more memory efficient: regardless of the size of the asset, the memory used during a file-based upload will be measured in KB. You can even turn off chunking entirely, and a file-based upload will use very little memory regardless of the file size. URLSession file uploads have a minimal memory footprint, which is one of the reasons we do file-based uploads.
The other reason for file-based uploads is that, for iOS especially, one can marry the file-based upload with a background session. With a background session, the user can even leave the app to do something else, and the upload will continue to operate in the background. At that point, you can reassess whether you even need/want to do chunking at all.
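To make the “every chunk after the 5th” sliding window from the observations above concrete, here is a generic sketch of the same pattern. `forEachConcurrently` is a hypothetical helper, not a standard-library API: it pulls items from any `AsyncSequence`, and once `maxConcurrent` child tasks are in flight it awaits `group.next()` before starting another, so the window is refilled one slot at a time rather than in batches.

```swift
/// Hypothetical helper: consumes `items` and runs `operation` on each
/// element, keeping at most `maxConcurrent` child tasks in flight.
func forEachConcurrently<S: AsyncSequence>(
    _ items: S,
    maxConcurrent: Int,
    _ operation: @escaping @Sendable (S.Element) async throws -> Void
) async throws where S.Element: Sendable {
    precondition(maxConcurrent > 0, "maxConcurrent must be positive")
    try await withThrowingTaskGroup(of: Void.self) { group in
        var inFlight = 0
        for try await item in items {
            if inFlight >= maxConcurrent {
                // Window is full: wait for one child to finish (rethrowing
                // its error, if any) before starting a task for this item.
                _ = try await group.next()
                inFlight -= 1
            }
            group.addTask { try await operation(item) }
            inFlight += 1
        }
        // Drain whatever is still running.
        try await group.waitForAll()
    }
}
```

With `maxConcurrent: 5` this waits on `group.next()` for the 6th item and every item after it, which is the behavior the answer describes. Paired with a lazy sequence, at most about `maxConcurrent + 1` chunks are held in memory at once.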

Array Function of MPMediaItem Very Slow

I'm trying to edit the queue of my music player using the applicationQueuePlayer and the perform method (details here). However, whenever I apply any array function (map, filter etc.), it takes many seconds to complete, leading to (I think) data races and crashes when the user, for example, removes two tracks immediately after each other.
var musicPlayerController = MPMusicPlayerController.applicationQueuePlayer
self.musicPlayerController.perform { (currentQueue) in
    let items = currentQueue.items
    let itemsToRemove = items.filter { $0.artist == "Some artist" } // this takes multiple seconds
    if let item = itemsToRemove.first {
        currentQueue.remove(item)
    }
} completionHandler: { (newQueue, error) in
    if let e = error {
        print(e)
    } else {
        tracks = newQueue.items.map { Track(item: $0) } // this takes multiple seconds
    }
}
The issue is arising as I'm going through an MPMediaItem array. I don't think this is an issue with the MPMediaItem class though, as I'm able to complete a map of [MPMediaItem] in other places in the app e.g. when getting items from a playlist (a similar sized array to the queue items).
The issue happens solely when the MPMediaItems are taken from the MPMusicPlayerControllerMutableQueue and MPMusicPlayerControllerQueue.
Is this just a bug with MusicKit API?

Why is a process suspended by another process behind it?

The code is simple: it only reads and parses an XML file into an array. I did not notice the problem until one day I tried to open a big XML file.
I added a blur view with an NSProgressIndicator while the data is being parsed, but the blur view did not show up until the parsing was completed.
self.addBlurView()
let file = HandleFile.shared.openFile(filePath)
self.removeBlurView()
guard let name = file.name, let path = file.path, let data = file.data else {
    return
}
So I tried delaying the parsing. Then the blur view does show up, and it is removed when parsing completes.
self.addBlurView()
DispatchQueue.main.asyncAfter(deadline: .now() + 0.3, execute: {
    let file = HandleFile.shared.openFile(filePath)
    self.removeBlurView()
    guard let name = file.name, let path = file.path, let data = file.data else {
        return
    }
})
I thought it might be a threading problem, so I tried the following in func addBlurView(), but it failed. I also tried adding a counter in addBlurView(): it counted up to a certain number, paused, and continued counting after the data was parsed.
DispatchQueue.main.async {
    self.blurView.isHidden = false
    self.spinner.startAnimation(self)
}
I have no idea why this happens. Can anyone help me solve this problem?
Thanks.
As I mentioned in the comments above, the main queue is a serial queue, and all the tasks assigned to it are executed serially by the main thread. In general, you should not perform any heavy lifting (like loading a file into memory) on the main thread, as it blocks the main thread and renders the UI unresponsive.
Typically, all heavy-lifting tasks like loading a file into memory (anything that does not deal with UI rendering directly) should be delegated to a background dispatch queue. Try wrapping your openFile(filePath) call in a DispatchQueue:
self.addBlurView()
DispatchQueue.global(qos: .default).async {
    let file = HandleFile.shared.openFile(filePath)
}
Personally, I would expect the openFile function to have a completion block that is called on the main queue when it has finished loading the file, so that you can remove your blurView. In your case it seems to be a synchronous call, so you can try:
self.addBlurView()
DispatchQueue.global(qos: .default).async {
    let file = HandleFile.shared.openFile(filePath)
    DispatchQueue.main.async {
        self.removeBlurView()
    }
}
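The completion-block shape the answer describes can be sketched as a small generic wrapper. `loadInBackground` is a hypothetical name, and the blocking loader is passed in as a closure (in the question, that would be `{ HandleFile.shared.openFile(filePath) }`). The callback queue defaults to `.main` so the completion can safely touch the UI; it is a parameter only so the behavior can be checked outside an app.

```swift
import Foundation
import Dispatch

/// Hypothetical wrapper: runs a blocking loader off the main thread and
/// hands the result back on `callbackQueue` (the main queue by default).
func loadInBackground<T>(
    callbackQueue: DispatchQueue = .main,
    _ work: @escaping () -> T,
    completion: @escaping (T) -> Void
) {
    DispatchQueue.global(qos: .userInitiated).async {
        let result = work()          // heavy lifting, main thread stays free
        callbackQueue.async {
            completion(result)       // e.g. removeBlurView() and use the data
        }
    }
}
```

In the question's view controller this would read roughly as `addBlurView()` followed by `loadInBackground({ HandleFile.shared.openFile(filePath) }) { file in self.removeBlurView(); /* guard let name = file.name ... */ }`. Because the main thread is no longer blocked, the blur view and spinner get a chance to draw while the file loads.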

Apparently random execution time with GKGraph.findPath() method in Swift

I have a pathfinder class in a SpriteKit game that I want to use to process every path request in the game. I store the class in my SKScene and access it from different parts of the game, always from the main thread. The pathfinder uses a GKGridGraph of a pretty good size (288 x 224). The class holds an array of requests that are processed one after another at each update() call from the main scene. Here is the code:
class PathFinder {
    var isLookingForPath = false
    var groundGraph : GKGridGraph<MyNode>?
    var queued : [PathFinderRequest] = []
    var thread: DispatchQoS.QoSClass = .userInitiated
    func generate(minPoint: CGPoint) {
        // generate the groundGraph grid
    }
    func update() {
        // called every frame
        if !self.isLookingForPath {
            findPath()
        }
    }
    func findPath(from start: TuplePosition, to end: TuplePosition, on layer: PathFinderLayer, callBack: PathFinderCallback) {
        // Generating request
        let id = String.randomString(length: 5)
        let request = PathFinderRequest(id: id, start: start, end: end, layer: layer, callback: callBack)
        // Append the request object at the end of the array
        queued.append(request)
    }
    func findPath() {
        self.isLookingForPath = true
        guard let request = queued.first else {
            isLookingForPath = false
            return
        }
        let layer = request.layer
        let callback = request.callback
        let start = request.start
        let end = request.end
        let id = request.id
        var graph = self.groundGraph
        queued.removeFirst()
        let findItem = DispatchWorkItem {
            if let g = graph, let sn = g.node(atGridPosition: start.toVec()), let en = g.node(atGridPosition: end.toVec()) {
                if let path = g.findPath(from: sn, to: en) as? [GKGridGraphNode], path.count > 0 {
                    // Here we have the path found
                    // it worked !
                }
            }
            // Once the findPath() method execution is over,
            // we reset the "flag" so we can call it once again from
            // the update() method
            self.isLookingForPath = false
        }
        // Execute the findPath() method in the chosen thread
        // asynchronously
        DispatchQueue.global(qos: thread).async(execute: findItem)
    }
    func drawPath(_ path: [GKGridGraphNode]) {
        // draw the path on the scene
    }
}
Well, the code works quite well as it is. If I send random path requests within a (x±10, y±10) range, it returns them pretty quickly to each object holding the callback in the request object, but suddenly one request randomly takes a huge amount of time (approximately 20 s compared to 0.001 s), and despite everything I tried I wasn't able to find out what happens. It's never the same path, never the same caller, never after a certain amount of time... Here is a video of the issue: https://www.youtube.com/watch?v=-IYlLOQgJrQ
It definitely happens sooner when many entities are requesting paths, but I can't figure out why. I'm sure it has to do with the DispatchQueue async calls that I use to prevent the game from freezing.
With a delay on every call, the error appears later but is still there:
DispatchQueue.global(qos: thread).asyncAfter(deadline: .now() + 0.1, execute: findItem)
When I look at what is taking so much time to process, it is a sub-method of the GKGridGraph class:
[profiler screenshot]
So I really don't know how to figure this out. I tried everything I could think of, but it always happens, whatever the delay, the number of entities, the different threads, etc.
Thank you for your precious help!
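One thing visible in the code above is that `isLookingForPath` is written inside the work item on a global queue and read in `update()` on the main thread, with no synchronization. Whether that is related to the 20 s stall is not established here. The sketch below only shows a common alternative to the hand-rolled flag: a single serial queue that runs requests one at a time in FIFO order. `SerialRequestProcessor` is a hypothetical name, and the handler stands in for the `findPath(from:to:)` call; in a SpriteKit game, results would still need to be dispatched back to the main thread before touching nodes.

```swift
import Dispatch

/// Hypothetical request processor: a serial queue replaces the
/// isLookingForPath flag, so requests run one at a time, in submission
/// order, and no Bool is shared between threads.
final class SerialRequestProcessor<Request> {
    private let queue = DispatchQueue(label: "pathfinder.requests", qos: .userInitiated)  // serial by default
    private let handler: (Request) -> Void

    init(handler: @escaping (Request) -> Void) {
        self.handler = handler
    }

    /// Safe to call from any thread; the handler runs on the serial queue.
    func submit(_ request: Request) {
        queue.async { [handler] in
            handler(request)
        }
    }

    /// Blocks until every request submitted so far has been handled.
    func waitUntilIdle() {
        queue.sync {}
    }
}
```

Since the serial queue itself guarantees one-at-a-time execution, there is no need to poll a flag from `update()`; callers simply submit requests as they arrive.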

How does the semaphore keep async loop in order?

I've set up this script to loop through a bunch of data in the background, and I've successfully used a semaphore to keep everything (the array that will populate the table) in order, but I don't understand how or why the semaphore keeps the array in order. The dispatchGroup is entered, the loop stops and waits until the image is downloaded, once the image is fetched the dispatchSemaphore is set to 1, and immediately the dispatchGroup is exited and the semaphore is set back to 0. The semaphore toggles from 0 to 1 so fast that I don't understand how it keeps the array in order.
let dispatchQueue = DispatchQueue(label: "someTask")
let dispatchGroup = DispatchGroup()
let dispatchSemaphore = DispatchSemaphore(value: 0)
dispatchQueue.async {
    for doc in snapshot.documents {
        // create data object for array
        dispatchGroup.enter()
        // get image with asynchronous completion handler
        Storage.storage().reference(forURL: imageId).getData(maxSize: 1048576, completion: { (data, error) in
            defer {
                dispatchSemaphore.signal()
                dispatchGroup.leave()
            }
            if let imageData = data,
               error == nil {
                // add image to data object
                // append to array
            }
        })
        dispatchSemaphore.wait()
    }
    // do some extra stuff in background after loop is done
}
dispatchGroup.notify(queue: dispatchQueue) {
    DispatchQueue.main.async {
        self.tableView.reloadData()
    }
}
The answer is in your comment “get image with asynchronous completion handler”. Without the semaphore, all image downloads would be started at the same time and race to completion, so the image that downloads fastest would be added to the array first.
So after you start your download, you immediately wait on your semaphore. This blocks until it is signaled in the callback closure of the getData method. Only then can the loop continue to the next document and download it. This way you download one file after another and block the current thread while the downloads are running.
Using a serial queue is not an option here, since it would only cause the downloads to start serially; you can't affect the order in which they finish.
This is rather inefficient, though. Your network layer can probably run faster if you give it multiple requests at the same time (think of parallel downloads and HTTP pipelining). You're also 'wasting' a thread that could do other work in the meantime. If there is more work to do at the same time, GCD will spawn another thread, which wastes memory and other resources.
A better pattern would be to skip the semaphore, let the downloads run in parallel and store the image directly at the correct index in your array. This of course means you have to prepare an array of the appropriate size beforehand, and you have to think of a placeholder for missing or failed images. Optionals would do the trick nicely:
var images: [UIImage?] = Array(repeating: nil, count: snapshot.documents.count)
for (index, doc) in snapshot.documents.enumerated() {
    // create data object for array
    dispatchGroup.enter()
    // get image with asynchronous completion handler
    Storage.storage().reference(forURL: imageId).getData(maxSize: 1048576) { data, error in
        defer {
            dispatchGroup.leave()
        }
        if let imageData = data,
           error == nil {
            // add image to data object; store it at its document's index
            images[index] = UIImage(data: imageData)
        }
    }
}
The DispatchGroup isn't really doing anything here. You have mutual exclusion granted by the DispatchSemaphore, and the ordering is simply provided by the iteration order of snapshot.documents.
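The index-based pattern from the answer (start every download at once, then write each result into its own slot) can be sketched without Firebase. `fetchAllPreservingOrder` is a hypothetical name, and `fetch` stands in for `getData(maxSize:completion:)`. Because completion handlers may run on arbitrary threads, writes into the results array are guarded by a lock, and a `DispatchGroup` reports when every slot has been filled.

```swift
import Foundation
import Dispatch

/// Hypothetical sketch: starts every fetch at once, lets them finish in
/// any order, and still returns results in input order, because each
/// result is written to the slot fixed by its index (nil = failed).
func fetchAllPreservingOrder(
    _ inputs: [Int],
    fetch: @escaping (Int, @escaping (String?) -> Void) -> Void,
    completion: @escaping ([String?]) -> Void
) {
    var results = [String?](repeating: nil, count: inputs.count)
    let lock = NSLock()            // handlers may run concurrently
    let group = DispatchGroup()

    for (index, input) in inputs.enumerated() {
        group.enter()
        fetch(input) { value in
            lock.lock()
            results[index] = value // position depends on index, not finish time
            lock.unlock()
            group.leave()
        }
    }

    group.notify(queue: .global()) {
        completion(results)        // every slot has been written by now
    }
}
```

No thread is blocked waiting on a semaphore, and the downloads overlap; the ordering comes entirely from the `index` captured in each handler.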