I'm trying to understand how DispatchQueues really work, and I'd like to know whether it is safe to assume that DispatchQueues have their own managing thread. For example, take a serial queue. In my understanding, since it is serial, a new task can be started only after the current one ends, so "someone" has to dequeue the next task from the queue and submit it for execution. So it seems the queue has to live within its own thread, which dispatches the tasks stored in the queue. Is this correct, or have I misunderstood something?
No, you should not assume that DispatchQueues have their own managed threads, and a queue doesn't have to execute all its tasks on the same thread. A serial queue only guarantees that the next task is picked up after the previous one completes:
Work submitted to dispatch queues executes on a pool of threads managed by the system. Except for the dispatch queue representing your app's main thread, the system makes no guarantees about which thread it uses to execute a task.
(source)
Practically, it is very possible that the same thread will run several or all sequential tasks from the same serial queue, provided they run close to each other in time. I would speculate that this is not pure coincidence but an optimization (it avoids context switches); however, it is not guaranteed.
In fact you can do this little experiment:
let serialQueue = DispatchQueue(label: "my.serialqueue")
var incr: Int = 0
DispatchQueue.concurrentPerform(iterations: 5) { iteration in
    // Randomize the time of access to the serial queue
    sleep(UInt32.random(in: 1...30))
    // Schedule execution on the serial queue
    serialQueue.async {
        incr += 1
        print("\(iteration) \(Date().timeIntervalSince1970) incremented \(incr) on \(Thread.current)")
    }
}
You will see something like this:
3 1612651601.6909518 incremented 1 on <NSThread: 0x600000fa0d40>{number = 7, name = (null)}
4 1612651611.689259 incremented 2 on <NSThread: 0x600000faf280>{number = 9, name = (null)}
0 1612651612.68934 incremented 3 on <NSThread: 0x600000fb4bc0>{number = 3, name = (null)}
2 1612651617.690246 incremented 4 on <NSThread: 0x600000fb4bc0>{number = 3, name = (null)}
1 1612651622.690335 incremented 5 on <NSThread: 0x600000faf280>{number = 9, name = (null)}
Iterations start concurrently, but we make them sleep for a random time, so they access the serial queue at different times. The result is that the same thread is unlikely to pick up every task, even though task execution is perfectly sequential.
Now if you remove the sleep at the top, causing all iterations to request access to the serial queue at roughly the same time, you will most likely see all tasks run on the same thread, which I believe is an optimization, not a coincidence:
4 1612651665.3658218 incremented 1 on <NSThread: 0x600003c94880>{number = 6, name = (null)}
3 1612651665.366118 incremented 2 on <NSThread: 0x600003c94880>{number = 6, name = (null)}
2 1612651665.366222 incremented 3 on <NSThread: 0x600003c94880>{number = 6, name = (null)}
0 1612651665.384039 incremented 4 on <NSThread: 0x600003c94880>{number = 6, name = (null)}
1 1612651665.3841062 incremented 5 on <NSThread: 0x600003c94880>{number = 6, name = (null)}
Here's an excellent read on the topic of iOS concurrency: "Underlying Truth"
It's true that GCD on macOS includes some direct support from the kernel. All the code is open source and can be viewed at https://opensource.apple.com/source/libdispatch/.
However, a Linux implementation of the same Dispatch APIs is available as part of the Swift open source project (swift-corelibs-libdispatch). This implementation does not use any special kernel support; it is implemented using plain pthreads. From that project's readme:
libdispatch on Darwin [macOS] is a combination of logic in the xnu kernel alongside the user-space Library. The kernel has the most information available to balance workload across the entire system. As a first step, however, we believe it is useful to bring up the basic functionality of the library using user-space pthread primitives on Linux. Eventually, a Linux kernel module could be developed to support more informed thread scheduling.
To address your question specifically — it's not correct that each queue has a managing thread. A queue is more like a data structure that assists in the management of threads — an abstraction that makes it so you as the developer don't have to think about thread details.
How threads are created and used is up to the system and can vary depending on what you do with your queues. For example, using .sync() on a queue often just acquires a lock and executes the block on the calling thread, even if the queue is a concurrent queue. You can see this by setting a breakpoint and observing which thread you're running on:
let syncQueue = DispatchQueue(label: "syncQueue", attributes: .concurrent)
print("Before sync")
syncQueue.sync {
    print("On queue")
}
print("After sync")
On the other hand, multiple async tasks can run at once on a concurrent queue, backed by multiple threads. In practice, the global queues seem to use up to 64 threads at once (the code below prints "Used 64 threads"):
var threads: Set<Thread> = []
let threadQueue = DispatchQueue(label: "threads set")
let group = DispatchGroup()
for _ in 0..<100 {
    group.enter()
    DispatchQueue.global(qos: .default).async {
        sleep(2)
        let thisThread = Thread.current
        threadQueue.sync { _ = threads.insert(thisThread) }
        group.leave()
    }
}
group.wait() // wait for all async() blocks to finish
print("Used \(threads.count) threads")
But without the sleep(), the tasks finish quickly, and the system doesn't need to use as many threads (the program prints "Used 20 threads", or 30, or some other lower number).
The main queue is another serial queue, which runs as part of your application lifecycle, or can be run manually with dispatchMain().
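For example, here is a minimal command-line sketch (my own illustration, not from the original answer): with no app lifecycle, dispatchMain() is what actually starts servicing blocks submitted to the main queue.

import Foundation

// A minimal sketch for a command-line tool (no app run loop):
// dispatchMain() parks the main thread and starts servicing the main queue.
DispatchQueue.global().async {
    print("work on a background thread")
    DispatchQueue.main.async {
        print("back on the main queue")
        exit(0) // dispatchMain() never returns, so end the process explicitly
    }
}
dispatchMain()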
Related
I have a piece of code that spawns long-running tasks 5-6 times per second. Each task takes some time to finish. I want to ignore all the other tasks while one is being executed; after it finishes, a fresh one should take its place.
There are a bunch of tools for concurrency in Swift 4.2. What would work best?
For solving this problem you can use GCD or Operation. In the case you have described, I would use Operation. With this approach you get a bit more user-friendly control over the operations being executed (stopping, cancelling, ...).
Small example:
let queue = OperationQueue()
queue.maxConcurrentOperationCount = 1
queue.addOperation { print("🤠") }
queue.addOperation { print("🤓") }
queue.addOperation { print("👺") }
In this case the operations are executed one by one. A sketch of how you could drop incoming tasks while one is running follows below.
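A minimal sketch of one way to do it, assuming it is acceptable to drop tasks rather than queue them all (submitIfIdle is a hypothetical helper; note the idle check is itself racy if called from many threads at once):

import Foundation

let queue = OperationQueue()
queue.maxConcurrentOperationCount = 1

// Hypothetical helper: only accept a new task when the queue is idle.
func submitIfIdle(_ work: @escaping () -> Void) {
    guard queue.operationCount == 0 else { return } // busy: ignore this task
    queue.addOperation(work)
}

submitIfIdle { print("🤠 long-running task") }
submitIfIdle { print("🤓 dropped if the first is still running") }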
In a nutshell:
I have one counter variable that is accessed from many threads. Although I've implemented multi-thread read/write protection, the variable still, inconsistently, gets written to simultaneously, leading to incorrect results from the counter.
Getting into the weeds:
I'm using a for loop that triggers roughly 100 URL requests in the background, each dispatched with DispatchQueue.global(qos: .userInitiated).async.
These requests are async; once each finishes, it updates a "counter" variable. This variable is supposed to be multi-thread protected, meaning it's always accessed from one thread and accessed synchronously. However, something is wrong: from time to time the variable is accessed simultaneously by two threads, and the counter doesn't update correctly. Here's an example; let's imagine we have 5 URLs to fetch:
We start with the Counter variable at 5.
1 URL Request Finishes -> Counter = 4
2 URL Request Finishes -> Counter = 3
3 URL Request Finishes -> Counter = 2
4 URL Request Finishes (and, I assume, the variable is accessed at the same time) -> Counter = 2
5 URL Request Finishes -> Counter = 1
As you can see, this leads to the counter being 1, instead of 0, which then affects other parts of the code. This error happens inconsistently.
Here is the multi-thread protection I use for the counter variable:
Dedicated Global Queue
// Background queue to synchronize data access
fileprivate let globalBackgroundSyncronizeDataQueue = DispatchQueue(label: "globalBackgroundSyncronizeSharedData")
The variable is always accessed via an accessor:
var numberOfFeedsToFetch_Value: Int = 0

var numberOfFeedsToFetch: Int {
    set (newValue) {
        globalBackgroundSyncronizeDataQueue.sync() {
            self.numberOfFeedsToFetch_Value = newValue
        }
    }
    get {
        return globalBackgroundSyncronizeDataQueue.sync {
            numberOfFeedsToFetch_Value
        }
    }
}
I assume I may be missing something, but I've used profiling and all seems to be good; I also checked the documentation and I seem to be doing what it recommends. I really appreciate your help.
Thanks!!
Answer from the Apple Developer Forums (https://forums.developer.apple.com/message/322332#322332):
The individual accessors are thread safe, but an increment operation isn't atomic given how you've written the code. That is, while one thread is getting or setting the value, no other threads can also be getting or setting the value. However, there's nothing preventing thread A from reading the current value (say, 2), thread B reading the same current value (2), each thread adding one to this value in their private temporary, and then each thread writing their incremented value (3 for both threads) to the property. So, two threads incremented but the property did not go from 2 to 4; it only went from 2 to 3. You need to do the whole increment operation (get, increment the private value, set) in an atomic way such that no other thread can read or write the property while it's in progress.
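Following that advice, a minimal sketch of the fix, reusing the queue and backing variable from the question (the helper name is mine), performs the whole read-modify-write in one sync block:

// Hypothetical helper: the entire read-modify-write runs as a single block
// on the serial queue, so no other thread can interleave between the read
// and the write.
func decrementNumberOfFeedsToFetch() -> Int {
    return globalBackgroundSyncronizeDataQueue.sync {
        numberOfFeedsToFetch_Value -= 1
        return numberOfFeedsToFetch_Value
    }
}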
I'm trying to do 3 async requests and control the loading with semaphores, so I know when all of them have loaded.
I init the semaphore this way:
let sem = dispatch_semaphore_create(2);
Then I send the semaphore-waiting code to the background:
let backgroundQueue = dispatch_get_global_queue(QOS_CLASS_BACKGROUND, 0)
dispatch_async(backgroundQueue) { [unowned self] () -> Void in
    println("Waiting for filters load")
    dispatch_semaphore_wait(sem, DISPATCH_TIME_FOREVER);
    println("Loaded")
}
Then I signal it 3 times (in each request's onSuccess/onFailure):
dispatch_semaphore_signal(sem)
But when the signal code runs, execution has already passed the semaphore wait; the wait never blocks on the semaphore count. Why?
You've created the semaphore with dispatch_semaphore_create(2) (which is like calling dispatch_semaphore_signal twice) and then signal it three more times (for a total of five), but you appear to have only one wait, which won't wait at all because your semaphore started with a count of 2.
That's obviously not going to work. Even if you fixed that (e.g. created the semaphore with zero and then issued three waits, as sketched below), this whole approach is inadvisable, because you're unnecessarily tying up a thread waiting for the other requests to finish.
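For reference, a sketch of that semaphore fix in modern Swift syntax (assuming exactly three signals, one per request):

let sem = DispatchSemaphore(value: 0) // start the count at zero

DispatchQueue.global(qos: .background).async {
    for _ in 0..<3 { sem.wait() } // one wait per expected signal
    print("All three requests finished")
}

// ...and each request's onSuccess/onFailure handler calls:
// sem.signal()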
This is a textbook candidate for dispatch groups. So you would generally use the following:
Create a dispatch_group_t:
dispatch_group_t group = dispatch_group_create();
Then do three dispatch_group_enter calls, one before each request.
In each of the three onSuccess/onFailure block pairs, do a dispatch_group_leave in both blocks.
Create a dispatch_group_notify block that will be performed when all of the requests are done. A runnable sketch of the whole pattern follows below.
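Here is that sketch in modern Swift syntax (loadFilter is a hypothetical stand-in for one of your three requests):

import Foundation

// Hypothetical stand-in for an async request that reports success or failure.
func loadFilter(completion: @escaping (Bool) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + .milliseconds(Int.random(in: 100...500))) {
        completion(Bool.random())
    }
}

let group = DispatchGroup()
for i in 1...3 {
    group.enter()                    // once before each request
    loadFilter { success in          // the completion runs on success or failure
        print("request \(i) finished, success: \(success)")
        group.leave()                // the "leave" in both paths
    }
}
group.notify(queue: .main) {
    print("All three requests are done")
}
dispatchMain() // keep this command-line sketch alive for the callbacks

The notify block runs exactly once, as soon as every enter has been balanced by a leave, without blocking any thread in the meantime.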
from random import randrange
from time import sleep
#import thread
from threading import Thread
from Queue import Queue

'''The idea is that there is a Seeker method that would search a location
for tasks. I have no idea how many tasks there will be; it could be 1, could
be 100. Each task needs to be put into a thread, do its thing and finish. I
have stripped down a lot of what this is really supposed to do just to focus
on the correct queuing and threading aspect of the program. The locking was
just me experimenting with locking.'''

class Runner(Thread):
    current_queue_size = 0

    def __init__(self, queue):
        self.queue = queue
        data = queue.get()
        self.ID = data[0]
        self.timer = data[1]
        #self.lock = data[2]
        Runner.current_queue_size += 1
        Thread.__init__(self)

    def run(self):
        #self.lock.acquire()
        print "running {ID}, will run for: {t} seconds.".format(ID=self.ID,
                                                                t=self.timer)
        print "Queue size: {s}".format(s=Runner.current_queue_size)
        sleep(self.timer)
        Runner.current_queue_size -= 1
        print "{ID} done, terminating, ran for {t}".format(ID=self.ID,
                                                           t=self.timer)
        print "Queue size: {s}".format(s=Runner.current_queue_size)
        #self.lock.release()
        sleep(1)
        self.queue.task_done()


def seeker():
    '''Gathers data that would need to enter its own thread.
    For now it just uses a count and random numbers to assign
    both a task ID and a time for each task.'''
    queue = Queue()
    queue_item = {}
    count = 1
    #lock = thread.allocate_lock()
    while count <= 40:
        random_number = randrange(1, 350)
        queue_item[count] = random_number
        print "{count} dict ID {key}: value {val}".format(count=count,
                                                          key=random_number,
                                                          val=random_number)
        count += 1

    for n in queue_item:
        #queue.put((n, queue_item[n], lock))
        queue.put((n, queue_item[n]))
        '''I assume it is OK to put a tuple in and pull it out later'''
        worker = Runner(queue)
        worker.setDaemon(True)
        worker.start()

    worker.join()
    '''Which one of these is necessary and why? The queue object
    joining or the thread object?'''
    #queue.join()


if __name__ == '__main__':
    seeker()
I have put most of my questions in the code itself, but to go over the main points (Python 2.7):
I want to make sure I am not creating some massive memory leak for myself later.
I have noticed that when I run it with a count of 40 in PuTTY or VNC on my Linux box, I don't always get all of the output, but when I use IDLE and Aptana on Windows, I do.
Yes, I understand that the point of Queue is to stagger out your threads so you are not flooding your system's memory, but the tasks at hand are time-sensitive, so they need to be processed as soon as they are detected regardless of how many or how few there are; I have found that with a Queue I can clearly dictate when a task has finished, as opposed to letting the garbage collector guess.
I still don't know why I am able to get away with using either the .join() on the thread object or on the queue object.
Tips, tricks, general help.
Thanks for reading.
If I understand you correctly, you need a thread to monitor something to see if there are tasks that need to be done. If a task is found, you want it to run in parallel with the seeker and other currently running tasks.
If this is the case, then I think you might be going about this the wrong way. Take a look at how the GIL works in Python. I think what you might really want here is multiprocessing.
Take a look at this from the pydocs:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
-> I am making an iPhone application.
-> I have a scenario where I am running a number of threads in the background.
-> Now suppose that on the main thread I receive an event and have to execute some action on a new background thread.
-> But while I perform that action on the new background thread, all my other threads should pause/sleep until the action is completed.
-> Once the action is over, all the other threads should resume their operation.
I will be exploring this more, but if anyone has an idea, please provide some input.
Thanks
Usually such signalling can be done with pthread condition variables, using a mutex for synchronization, like so:
Create the mutex and condition variable:
pthread_mutex_t mutex;
pthread_cond_t cond;

pthread_mutex_init(&mutex, NULL);
pthread_cond_init(&cond, NULL);
Thread 1: wait for the signal (pthread_cond_wait must be called with the mutex locked; it atomically releases the mutex while waiting and reacquires it on wakeup):
pthread_mutex_lock(&mutex);
pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);
Thread 2: signal the thread waiting on the condition:
pthread_mutex_lock(&mutex);
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
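On iOS the same pattern is also available through Foundation's NSCondition, which wraps a pthread mutex and condition variable. A minimal Swift sketch (the thread setup and names are illustrative):

import Foundation

let condition = NSCondition()
var actionFinished = false // the predicate guarded by the condition

// Threads that must pause until the action completes:
for i in 1...3 {
    Thread.detachNewThread {
        condition.lock()
        while !actionFinished {  // loop guards against spurious wakeups
            condition.wait()
        }
        condition.unlock()
        print("thread \(i) resumed")
    }
}

// The thread performing the action:
Thread.detachNewThread {
    sleep(1) // ...do the action...
    condition.lock()
    actionFinished = true
    condition.broadcast() // wake all waiters (signal() would wake just one)
    condition.unlock()
}

sleep(3) // keep the demo process alive long enough to observe the output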