Merging two dictionaries asynchronously in Swift

I am having a problem merging two dictionaries asynchronously. The idea is to run a calculation function that returns a dictionary multiple times asynchronously and then merge the results into a single dictionary. I tried the following:
var subResult: [String: Result] = [:]
let stride = (max_val - min_val) / 2 + 1
DispatchQueue.concurrentPerform(iterations: stride) { index in
    let c_size = max_val + 2*index
    var s_size = min_provided
    var localResult: [String: Result] = [:]
    repeat {
        let res = SubFlow().process(data: data, cSize: c_size, sSize: s_size)
        localResult.merge( res ) { (current, _) in current }
        s_size += 2
    } while (s_size <= c_size)
    subResult.merge( localResult ) { (current, _) in current }
}
This solution works, but I don't see it as a reliable one, as we are mutating the dictionary from multiple threads at once. I am new to Swift and not sure how to implement a "safe" and performant merge in this case.

As dictionaries are not thread-safe in Swift, you need to make sure that all writes to a dictionary happen on the same queue.
You can achieve it by either creating a serial queue or by creating a concurrent queue while making sure that the write operations are executed with barriers. The latter approach will allow concurrent reads from the object while it's not being written to:
var subResult: [String: Int] = [:]
let resultUpdateQueue = DispatchQueue(label: "com.example.myapp.resultUpdateQueue", attributes: .concurrent)
DispatchQueue.concurrentPerform(iterations: 10) { index in
    let localResult = ["sample\(index)":index]
    resultUpdateQueue.sync(flags: .barrier) {
        subResult.merge( localResult ) { (current, _) in current }
    }
}
print(subResult)
Be sure not to call concurrentPerform() from the main queue, because it blocks the calling thread until all the iterations have completed.

No, this is not safe. It's undefined behavior and I'm somewhat surprised it works.
Instead, you should be generating sub-results in parallel, and then merging them together serially. Something along the lines of:
DispatchQueue.concurrentPerform(iterations: stride) { index in
    // ... call process and generate localResult ...
    serialQueue.async { subResult.merge ... }
}
// Let serialQueue drain (e.g. serialQueue.sync { }) before reading subResult.
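To make the outline above concrete, here is a minimal sketch of "compute in parallel, merge serially". The process function, the queue label and the sizes are hypothetical stand-ins for the asker's SubFlow().process(...) and inputs, which are not shown in full. It uses sync rather than async for the merge, so every merge has finished by the time concurrentPerform returns.

```swift
import Dispatch

// Hypothetical stand-in for SubFlow().process(...): returns a small dictionary per call.
func process(cSize: Int, sSize: Int) -> [String: Int] {
    return ["\(cSize)-\(sSize)": cSize * sSize]
}

func mergedResults(iterations: Int) -> [String: Int] {
    var subResult: [String: Int] = [:]
    // A DispatchQueue is serial unless created with .concurrent.
    let serialQueue = DispatchQueue(label: "com.example.merge")

    DispatchQueue.concurrentPerform(iterations: iterations) { index in
        // Build the partial result without touching shared state.
        let cSize = 10 + 2 * index   // assumed sizes, for illustration only
        var sSize = 2
        var localResult: [String: Int] = [:]
        repeat {
            localResult.merge(process(cSize: cSize, sSize: sSize)) { current, _ in current }
            sSize += 2
        } while sSize <= cSize

        // Every write to subResult goes through the same serial queue.
        serialQueue.sync {
            subResult.merge(localResult) { current, _ in current }
        }
    }
    return subResult
}

let merged = mergedResults(iterations: 4)
print(merged.count)
```

The expensive work runs outside the queue, so the serial section only covers the cheap merge of one finished partial dictionary.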

Related

"PassthroughSubject" seems to be thread-unsafe, is this a bug or limitation?

"PassthroughSubject" seems to be thread-unsafe. Please see the code below: I'm sending 100 values concurrently to a subscriber which only requests .max(5). I think the subscriber should get only 5 values, but it actually gets more. Is this a bug or a limitation?
// Xcode11 beta2
var count = 0
let q = DispatchQueue(label: UUID().uuidString)
let g = DispatchGroup()
let subject = PassthroughSubject<Int, Never>()
let subscriber = AnySubscriber<Int, Never>(receiveSubscription: { (s) in
    s.request(.max(5))
}, receiveValue: { v in
    q.sync {
        count += 1
    }
    return .none
}, receiveCompletion: { c in
})
subject.subscribe(subscriber)
for i in 0..<100 {
    DispatchQueue.global().async(group: g) {
        subject.send(i)
    }
}
g.wait()
print("receive", count) // expected 5, but got more(7, 9...)
I believe the prefix operator can help:
/// Republishes elements up to the specified maximum count.
func prefix(Int) -> Publishers.Output<PassthroughSubject<Output, Failure>>
The max operator is returning the largest value at completion (and it's possible you're triggering completion more than once):
/// Publishes the maximum value received from the upstream publisher, after it finishes.
/// Available when Output conforms to Comparable.
func max() -> Publishers.Comparison<PassthroughSubject<Output, Failure>>

What does the reduce(_:_:) function do in Swift? [duplicate]

Here is a piece of code I don't understand. It uses Swift's reduce(_:_:) function with a closure that I am having trouble understanding. What values end up in maxVerticalPipCount and maxHorizontalPipCount? Are they 5 and 2 respectively?
let pipsPerRowForRank = [[0], [1], [1,1], [1,1,1], [2,2], [2,1,2],
[2,2,2], [2,1,2,2], [2,2,2,2], [2,2,1,2,2],
[2,2,2,2,2]]
let maxVerticalPipCount = CGFloat(pipsPerRowForRank.reduce(0) { max($1.count, $0) })
let maxHorizontalPipCount = CGFloat(pipsPerRowForRank.reduce(0) { max($1.max() ?? 0, $0) })
By the way, if you’re wondering what precisely reduce does, you can always refer to the source code, where you can see the actual code as well as a nice narrative description in the comments.
But the root of your question is that this code is not entirely obvious. I might suggest that if you’re finding it hard to reason about the code snippet, you can replace the opaque shorthand argument names, $0 and $1, with meaningful names, e.g.:
let verticalMax = pipsPerRowForRank.reduce(0) { previousMax, nextArray in
    max(nextArray.count, previousMax)
}
let horizontalMax = pipsPerRowForRank.reduce(0) { previousMax, nextArray in
    max(nextArray.max() ?? 0, previousMax)
}
By using argument names that make the functional intent more clear, it often is easier to grok what the code is doing. IMHO, especially when there are multiple arguments, using explicit argument names can make it more clear.
That having been said, I’d probably not use reduce and instead do something like:
let verticalMax = pipsPerRowForRank
    .lazy
    .map { $0.count }
    .max() ?? 0
To my eye, that makes the intent extremely clear, namely that we’re counting how many items are in each sub-array and returning the maximum count.
Likewise, for the horizontal one:
let horizontalMax = pipsPerRowForRank
    .lazy
    .flatMap { $0 }
    .max() ?? 0
Again, I think it’s clear that we’re flattening the sub-arrays into a single sequence of values, and then getting the maximum value.
And, in both cases, we’re using lazy to avoid building interim structures (in case our arrays were very large), but evaluating it as we go along. This improves memory characteristics of the routine and the resulting code is more efficient. Frankly, with an array of arrays this small, lazy isn’t needed, but I include it for your reference.
Bottom line, the goal with functional patterns is not to write code with the fewest keystrokes possible (as there are more concise renditions we could have written), but rather to write efficient code whose intent is as clear as possible with the least amount of cruft. But we should always be able to glance at the code and reason about it quickly. Sometimes if further optimization is needed, we’ll make a conscious decision to sacrifice readability for performance reasons, but that’s not needed here.
This is what the reduce functions do here:
var maxVerticalPipCount:CGFloat = 0
for rank in pipsPerRowForRank {
    if CGFloat(rank.count) > maxVerticalPipCount {
        maxVerticalPipCount = CGFloat(rank.count)
    }
}
var maxHorizontalPipCount:CGFloat = 0
for rank in pipsPerRowForRank {
    if CGFloat(rank.max() ?? 0) > maxHorizontalPipCount {
        maxHorizontalPipCount = CGFloat(rank.max() ?? 0)
    }
}
You shouldn't use the reduce(_:_:) function to find the max value. Use the max(by:) function, like this:
let maxVerticalPipCount = CGFloat(pipsPerRowForRank.max { $0.count < $1.count }?.count ?? 0)
let maxHorizontalPipCount = CGFloat(pipsPerRowForRank.max { ($0.max() ?? 0) < ($1.max() ?? 0) }?.max() ?? 0)
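As a quick check that the variants discussed in this thread agree, the sketch below computes both values with reduce, with map/flatMap plus max(), and with max(by:). The constant names are made up for the comparison, and the CGFloat conversion is left out so that the snippet needs no imports.

```swift
let pipsPerRowForRank = [[0], [1], [1,1], [1,1,1], [2,2], [2,1,2],
                         [2,2,2], [2,1,2,2], [2,2,2,2], [2,2,1,2,2],
                         [2,2,2,2,2]]

// reduce: carry the running maximum through the outer array
let verticalByReduce = pipsPerRowForRank.reduce(0) { previousMax, row in
    max(row.count, previousMax)
}
let horizontalByReduce = pipsPerRowForRank.reduce(0) { previousMax, row in
    max(row.max() ?? 0, previousMax)
}

// map / flatMap followed by max()
let verticalByMap = pipsPerRowForRank.map { $0.count }.max() ?? 0
let horizontalByMap = pipsPerRowForRank.flatMap { $0 }.max() ?? 0

// max(by:) on the outer array
let verticalByMaxBy = pipsPerRowForRank.max { $0.count < $1.count }?.count ?? 0
let horizontalByMaxBy = pipsPerRowForRank.max { ($0.max() ?? 0) < ($1.max() ?? 0) }?.max() ?? 0

print(verticalByReduce, horizontalByReduce)   // 5 2
```

The longest rows have five entries and the largest entry anywhere is 2, so the answer to the original question is yes: 5 and 2.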
The reduce function loops over every item in a collection, and combines them into one value. Think of it as literally reducing multiple values to one value. [Source]
From the Apple docs:
let numbers = [1, 2, 3, 4]
let numberSum = numbers.reduce(0, { x, y in
x + y
})
// numberSum == 10
In your code:
maxVerticalPipCount walks through the whole array; at each step it takes the max of the current sub-array's count ($1.count) and the result accumulated so far ($0).
maxHorizontalPipCount does the same, but with the current sub-array's largest value ($1.max() ?? 0) instead of its count.
Try printing each element inside the reduce closure for a better understanding.
let maxVerticalPipCount = pipsPerRowForRank.reduce(0) {
    print($0)
    return max($1.count, $0)
}
reduce is often used to add together all the numbers in an array, but it takes a closure and really does whatever you tell that closure to return.
let pipsPerRowForRank = [[1,1], [2,2,2]]
let maxVerticalPipCount = CGFloat(pipsPerRowForRank.reduce(0) {
max($1.count, $0)})
Here it starts at 0, from reduce(0), and loops through the full array, each time taking the higher of the value calculated so far and the number of items in the current subarray. In the example above the process will be:
maxVerticalPipCount = max(2, 0)
maxVerticalPipCount = max(3, 2)
maxVerticalPipCount = 3
As for the second one:
let pipsPerRowForRank = [[1,2], [1,2,3], [1,2,3,4], []]
let maxHorizontalPipCount = CGFloat(pipsPerRowForRank.reduce(0) {
max($1.max() ?? 0, $0)})
Here, instead of checking the count of each nested array, we check its maximum value, or 0 if the array is empty. So this one goes:
maxHorizontalPipCount = max(2, 0)
maxHorizontalPipCount = max(3, 2)
maxHorizontalPipCount = max(4, 3)
maxHorizontalPipCount = max(0, 4)
maxHorizontalPipCount = 4
Example with Swift 5:
enum Errors: Error {
    case someError
}
let numbers = [1,2,3,4,5]
let initialValue = 0
// The accumulator is a Result, so a failure can be carried through the reduction.
let sum = numbers.reduce(Result.success(initialValue)) { (result, value) -> Result<Int, Error> in
    if let runningTotal = try? result.get() {
        return .success(value + runningTotal)
    } else {
        return .failure(Errors.someError)
    }
}
switch sum {
case .success(let totalSum):
    print(totalSum)
case .failure(let error):
    print(error)
}

Swift for iOS - 2 for loops run at the same time?

I have two objects whose UI I need to update at the same time. I have a for loop for one, and after that another for loop. Each iteration of a loop has a short delay, so that the UI changes for the elements of an object happen one after the other rather than seemingly all at once.
func update(value: Int){
    var delay: Double = 0.05
    // first loop
    for i in 0...value {
        delayWithSeconds(delay) {
            //do something with object 1
        }
        delay = delay + 0.05
    }
    var delay2: Double = 0.05
    // second loop
    for i in 0...value {
        delayWithSeconds(delay2) {
            //do something with object 2
        }
        delay2 = delay2 + 0.05
    }
}
// Utility
func delayWithSeconds(_ seconds: Double, completion: @escaping () -> ()) {
    DispatchQueue.main.asyncAfter(deadline: .now() + seconds) {
        completion()
    }
}
I have tried wrapping each for loop with DispatchQueue.main.async and it didn't make a difference. In short - I would like to run both for loops at the same time (or perceived as such). These are on the UI thread.
I tried this and it seemed to work out quite well. It does exactly what I want it to do (at least visually they seem to run at the same time).
let concurrentQueue = DispatchQueue(label: "net.ericd.hello", attributes: .concurrent)
concurrentQueue.async {
    //my loop with delay here for object 1.
}
concurrentQueue.async {
    //my separate loop with delay here for object 2.
}
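Another way to make the two updates look simultaneous, sketched here under assumptions, is to use a single loop and run both updates for step i in the same delayed block, so they share one deadline. The updateBoth helper and its parameters are hypothetical, not part of the question's code; value is assumed to be non-negative, and the queue defaults to main because the work is UI work.

```swift
import Dispatch

// Hypothetical helper (not from the question): runs both updates for step i
// in the same block, so they share one deadline. Assumes value >= 0.
func updateBoth(upTo value: Int,
                step: Double = 0.05,
                on queue: DispatchQueue = .main,
                first: @escaping (Int) -> Void,
                second: @escaping (Int) -> Void) {
    let start = DispatchTime.now()
    for i in 0...value {
        // Step i fires (i + 1) * step seconds after the call.
        queue.asyncAfter(deadline: start + step * Double(i + 1)) {
            first(i)    // do something with object 1
            second(i)   // do something with object 2
        }
    }
}
```

A call would look like updateBoth(upTo: 10, first: { i in /* object 1 */ }, second: { i in /* object 2 */ }). Both closures then run on the main queue, which keeps the UI work on the UI thread.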
The generic function zip(_:_:) is useful when you want to iterate over two arrays at the same time.
Here I took two arrays:
var arrOfInt = ["1","2","3"]
var arrOfIntString = ["one","two","three"]
for (intNum, intString) in zip(arrOfInt, arrOfIntString) {
    print("Int:\(intNum), String:\(intString)")
}

Strange values from vDSP_meanvD

I am using the vDSP_meanvD function to determine the average of a data set (the consecutive differences of an array).
The code I am using is below:
func F(dataAllFrames:[Double],std:Double,medida:String)->Double{
    let nframes=dataAllFrames.count
    var diferencas_consecutivas_media = [Double](count: dataAllFrames.count-1, repeatedValue:0.0)
    var mediaDifConseq:Double = 0
    for(var i:Int=1; i<dataAllFrames.count; i++){
        diferencas_consecutivas_media[i-1]=dataAllFrames[i]-dataAllFrames[i-1]
    }
    var meanConseqDif = [Double](count: 1, repeatedValue:0.0)
    var meanConseqDifPtr = UnsafeMutablePointer<Double>(meanConseqDif)
    vDSP_meanvD(diferencas_consecutivas_media,1,meanConseqDifPtr,UInt(nframes))
    print( meanConseqDif[0])
}
The function F is called within a dispatch block:
let group = dispatch_group_create()
let queue = dispatch_queue_create("myqueue.data.processor", DISPATCH_QUEUE_CONCURRENT)
dispatch_group_async(group, queue) {
    F(measureData,std: std, medida: medida)
}
F is called in multiple dispatch blocks with different variable instances. Every now and then I get different values returned from vDSP_meanvD. Is there any context where this may happen?
Could calling it from multiple threads have some influence on that?
Any pointers would be greatly appreciated.
I wouldn't expect this code to work. This shouldn't be correct:
var meanConseqDif = [Double](count: 1, repeatedValue:0.0)
var meanConseqDifPtr = UnsafeMutablePointer<Double>(meanConseqDif)
vDSP_meanvD(diferencas_consecutivas_media,1,meanConseqDifPtr,UInt(nframes))
I believe this is pointing directly at the Array struct, so you're probably blowing away the metadata rather than updating the value you meant. But I would expect that you don't get the right answers at all in that case. Have you validated that your results are correct usually?
I think the code you mean is like this:
func F(dataAllFrames: [Double], std: Double, medida: String) -> Double {
    var diferencas_consecutivas_media = [Double](count: dataAllFrames.count-1, repeatedValue:0.0)
    for(var i = 1; i < dataAllFrames.count; i += 1) {
        diferencas_consecutivas_media[i-1] = dataAllFrames[i] - dataAllFrames[i-1]
    }
    var mediaDifConseq = 0.0
    // Pass the number of differences (count - 1), not the number of frames,
    // so that vDSP does not read one element past the end of the array.
    let n = vDSP_Length(diferencas_consecutivas_media.count)
    vDSP_meanvD(diferencas_consecutivas_media, 1, &mediaDifConseq, n)
    return mediaDifConseq
}
You don't need an output array to collect a single result. You can just use a Double directly, and use & to take an unsafe pointer to it.
Unrelated point, but you can get rid of all of the difference-generating code with a single zip and map:
let diferencasConsecutivasMedia = zip(dataAllFrames, dataAllFrames.dropFirst())
    .map { $1 - $0 }
I haven't profiled these two approaches, though. It's possible that your approach is faster. I find the zip and map much clearer and less error-prone, but others may feel differently.
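When the vDSP output looks suspicious, a plain-Swift reference value can help narrow the problem down. The sketch below computes the same mean of consecutive differences without Accelerate, in current Swift syntax. The function name is made up for illustration and the sample data is arbitrary.

```swift
// Mean of consecutive differences in plain Swift (no Accelerate),
// intended as a reference value when checking the vDSP result.
func meanOfConsecutiveDifferences(_ values: [Double]) -> Double? {
    let differences = zip(values, values.dropFirst()).map { $1 - $0 }
    guard !differences.isEmpty else { return nil }   // fewer than two samples
    return differences.reduce(0, +) / Double(differences.count)
}

if let mean = meanOfConsecutiveDifferences([1.0, 4.0, 9.0, 16.0]) {
    print(mean)   // the differences are 3, 5 and 7
}
```

Note that there are count - 1 differences for count samples, which is also the element count that the vDSP call should be given.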

Process Array in parallel using GCD

I have a large array that I would like to process by handing slices of it to a few asynchronous tasks. As a proof of concept, I have written the following code:
class TestParallelArrayProcessing {
    let array: [Int]
    var summary: [Int]
    init() {
        array = Array<Int>(count: 500000, repeatedValue: 0)
        for i in 0 ..< 500000 {
            array[i] = Int(arc4random_uniform(10))
        }
        summary = Array<Int>(count: 10, repeatedValue: 0)
    }
    func calcSummary() {
        let group = dispatch_group_create()
        let queue = dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0)
        for i in 0 ..< 10 {
            dispatch_group_async(group, queue, {
                let base = i * 50000
                for x in base ..< base + 50000 {
                    self.summary[i] += self.array[x]
                }
            })
        }
        dispatch_group_notify(group, queue, {
            println(self.summary)
        })
    }
}
After init(), array will be initialized with random integers between 0 and 9.
The calcSummary function dispatches 10 tasks that take disjoint chunks of 50000 items from array and add them up, using their respective slot in summary as an accumulator.
This program crashes at the self.summary[i] += self.array[x] line. The error is:
EXC_BAD_INSTRUCTION (code = EXC_I386_INVOP).
I can see, in the debugger, that it has managed to iterate a few times before crashing, and that the variables, at the time of the crash, have values within correct bounds.
I have read that EXC_I386_INVOP can happen when trying to access an object that has already been released. I wonder if this has anything to do with Swift making a copy of the array if it is modified, and, if so, how to avoid it.
This is a slightly different take on the approach in @Eduardo's answer, using the Array type's withUnsafeMutableBufferPointer<R>(body: (inout UnsafeMutableBufferPointer<T>) -> R) -> R method. That method's documentation states:
Call body(p), where p is a pointer to the Array's mutable contiguous storage. If no such storage exists, it is first created.
Often, the optimizer can eliminate bounds- and uniqueness-checks within an array algorithm, but when that fails, invoking the same algorithm on body's argument lets you trade safety for speed.
That second paragraph seems to be exactly what's happening here, so using this method might be more "idiomatic" in Swift, whatever that means:
func calcSummary() {
    let group = dispatch_group_create()
    let queue = dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0)
    self.summary.withUnsafeMutableBufferPointer {
        summaryMem -> Void in
        for i in 0 ..< 10 {
            dispatch_group_async(group, queue, {
                let base = i * 50000
                for x in base ..< base + 50000 {
                    summaryMem[i] += self.array[x]
                }
            })
        }
        // The pointer is only valid inside this closure, so wait for the blocks here.
        dispatch_group_wait(group, DISPATCH_TIME_FOREVER)
    }
    dispatch_group_notify(group, queue, {
        println(self.summary)
    })
}
When you use the += operator, the LHS is an inout parameter -- I think you're getting race conditions when, as you mention in your update, Swift moves around the array for optimization. I was able to get it to work by summing the chunk in a local variable, then simply assigning to the right index in summary:
func calcSummary() {
    let group = dispatch_group_create()
    let queue = dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0)
    for i in 0 ..< 10 {
        dispatch_group_async(group, queue, {
            let base = i * 50000
            var sum = 0
            for x in base ..< base + 50000 {
                sum += self.array[x]
            }
            self.summary[i] = sum
        })
    }
    dispatch_group_notify(group, queue, {
        println(self.summary)
    })
}
You can also use concurrentPerform(iterations: Int, execute work: (Int) -> Swift.Void) (since Swift 3).
It has a much simpler syntax and waits for all iterations to finish before returning:
DispatchQueue.concurrentPerform(iterations: iterations) { i in
    performOperation(i)
}
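Applied to the chunked sum from the question, that could look like the sketch below. It combines concurrentPerform with the local-sum and buffer-pointer ideas from the other answers. The chunkSums function and its parameters are made up for illustration, and it assumes the array length is divisible by the number of chunks.

```swift
import Dispatch

// Sketch: sums `chunks` equal slices of `array` in parallel.
// Assumes array.count is divisible by chunks.
func chunkSums(of array: [Int], chunks: Int) -> [Int] {
    var summary = [Int](repeating: 0, count: chunks)
    let chunkSize = array.count / chunks
    summary.withUnsafeMutableBufferPointer { buffer in
        // Copy the pointer value; it stays valid until this closure returns,
        // and concurrentPerform returns only after every iteration has finished.
        let slots = buffer
        DispatchQueue.concurrentPerform(iterations: chunks) { i in
            var sum = 0
            for x in (i * chunkSize)..<((i + 1) * chunkSize) {
                sum += array[x]
            }
            slots[i] = sum   // each iteration writes only its own slot
        }
    }
    return summary
}

let data = (0..<100).map { $0 % 10 }
print(chunkSums(of: data, chunks: 10))
```

Because concurrentPerform blocks until all iterations are done, no dispatch group or notify block is needed, and the pointer never outlives the closure that vends it.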
I think Nate is right: there are race conditions with the summary variable. To fix it, I used summary's memory directly:
func calcSummary() {
    let group = dispatch_group_create()
    let queue = dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0)
    let summaryMem = UnsafeMutableBufferPointer<Int>(start: &summary, count: 10)
    for i in 0 ..< 10 {
        dispatch_group_async(group, queue, {
            let base = i * 50000
            for x in base ..< base + 50000 {
                summaryMem[i] += self.array[x]
            }
        })
    }
    dispatch_group_notify(group, queue, {
        println(self.summary)
    })
}
This works (so far).
EDIT
Mike S has a very good point, in his comment below. I have also found this blog post, which sheds some light on the problem.
Any solution that assigns the i'th element of the array concurrently risks a race condition (Swift's Array is not thread-safe). On the other hand, dispatching to the same queue (in this case main) before updating solves the problem but results in slower performance overall. The only reason I see for taking either of these two approaches is if the array (summary) cannot wait for all concurrent operations to finish.
Otherwise, perform the concurrent operations on a local copy and assign it to summary upon completion. No race condition, no performance hit:
Swift 4
func calcSummary(of array: [Int]) -> [Int] {
    var summary = Array<Int>.init(repeating: 0, count: array.count)
    let iterations = 10 // number of parallel operations
    DispatchQueue.concurrentPerform(iterations: iterations) { index in
        let start = index * array.count / iterations
        let end = (index + 1) * array.count / iterations
        for i in start..<end {
            // Do stuff to get the i'th element
            summary[i] = Int.random(in: 0..<array.count)
        }
    }
    return summary
}
I've answered a similar question here for simply initializing an array after computing on another array