shared queue between multiple process in perl - perl

The following senario was done using threads
A large queue #work_queue populated/enqueued by the main thread. Used Thread::Queue here.
≥ 2 connection objects of something are added in #conns which had to be loaded serially as part of the loading process uses Expect->spawn
Multiple Worker threads are invoked, and each thread given a single $conns[$i] object & reference to the shared \#work_queue.
The worker threads safely removes a single item from #work_queue and performs some processing through its connection object, after which it picks up the next available item from #work_queue.
When this #work_queue is empty all the threads will shutdown safely
Now, the problem is that the loading phase is taking too long in many cases. But due to the use of Expect->spawn, parallel loading of #conns is possible only on a separate process & not on a thread.
Please suggest a good way to achieve the above scenario using fork. Or, even better if there is a way to use Expect->spawn with threads. (UNIX/LINUX only)
See Is it possible to use threads with Expect?

Related

When are GCD queues used and when do you know you need them? Swift

After reading about Concurrent and Serial queues, sync and async, I think I have an idea about how to create queues and the order they are executed in. My problem is that in any of the tutorials I have seen, none of them actually tell you many use cases. For example:
I have a network manager that uses URLSessions and serializes json to send a request to my api. Does it make sense to wrap it in a .utility Queue or in a .userInitiated or do I just don't wrap it in a queue.
let task = LoginTask(username: username, password: password)
let networkQueue = DispatchQueue(label: "com.messenger.network",
qos: DispatchQoS.userInitiated)
networkQueue.async {
task.dataTask(in: dispatcher) { (user, httpCode, error) in
self.presenter?.loginUserResponse(user: user, httpCode: httpCode, error: error)
}
}
My question is: Is there any guidlines I can follow to know when there is a need to use queues or not because I cant find this information anywhere. I realise apple provides example usage howver it is very vague
Dispatch queues are used in a multitude of use cases, so it's hard to enumerate them, but two very common use cases are as follows:
You have some expensive and/or time-consuming process that you want to run on some thread other than the current thread. Often this is used when you're on the main thread and you want to run something on a background thread.
A good example of this would be image manipulation, which is a notoriously computationally (and memory) intensive process. So, you'd create a queue for image manipulation and then you'd dispatch each image manipulation task to that queue. You might also dispatch the UI update when it's done back to the main queue (because all UI updates must happen on the main thread). A common pattern would be:
imageQueue.async {
// manipulate the image here
// when done, update the UI:
DispatchQueue.main.async {
// update the UI and/or model objects on the main thread
}
}
You have some shared resource (it could be a simple variable, it could be some interaction with some other shared resource like a file or database) that you want to synchronize regardless of from which thread to invoke it. This is often part of a broader strategy of making something that is not inherently thread-safe behave in a thread safe manner.
The virtue of dispatch queues is that it greatly simplifies writing multi-threaded code, an otherwise very complicated technology.
The thing is that your example, initiating a network request, already runs the request on a background thread and URLSession manages all of this for you, so there's little value in using queues for that.
In the interest of full disclosure, there is a surprising of variety of different tools using GCD directly (e.g. dispatch groups or dispatch sources) or indirectly (e.g. operation queues) above and beyond the basic dispatch queues discussed above:
Dispatch groups: Sometimes you will initiate a series of asynchronous tasks and you want to be notified when they're all done. You can use a dispatch group (see https://stackoverflow.com/a/28101212/1271826 for a random example). This eliminates you from needing to keep track of when all of these tasks are done yourself.
Dispatch "apply" (now called concurrentPerform): Sometimes when you're running some massively parallel task, you want to use as many threads as you reasonably can. So concurrentPerform lets you effectively perform a for loop in parallel, and Apple has optimized it for the number of cores and CPUs your particular device, while not flooding it with too many concurrent tasks at any one time, exhausting the limited number of worker threads. See the https://stackoverflow.com/a/39949292/1271826 for an example of running a for loop in parallel.
Dispatch sources:
For example, if you have some background task that is doing a lot of work and you want to update the UI with the progress, sometimes those UI updates can come more quickly than the UI can handle them. So you can use a dispatch source (a DispatchSourceUserDataAdd) to decouple the background process from the UI updates. See aforementioned https://stackoverflow.com/a/39949292/1271826 for an example.
Traditionally, a Timer runs on the main run loop. But sometimes you want to run it on a background thread, but doing that with a Timer is complicated. But you can use a DispatchSourceTimer (a GCD timer) to run a timer on a queue other than the main queue. See https://stackoverflow.com/a/38164203/1271826 for example of how to create and use a dispatch timer. Dispatch timers also can be used to avoid some of the strong reference cycles that are easily introduced with target-based Timer objects.
Barriers: Sometimes when using a concurrent queue, you want most things to run concurrently, but for other things to run serially with respect to everything else on the queue. A barrier is a way to say "add this task to the queue, but make sure it doesn't run concurrently with respect to anything else on that queue."
An example of a barrier is the reader-writer pattern, where reading from some memory resource can happen concurrently with respect to all other reads, but any writes must not happen concurrently with respect to anything else on the queue. See https://stackoverflow.com/a/28784770/1271826 or https://stackoverflow.com/a/45628393/1271826.
Dispatch semaphores: Sometimes you need to let two tasks running on separate threads communicate to each other. You can use semaphores for one thread to "wait" for the "signal" from another.
One common application of semaphores is to make an inherently asynchronous task behave in a more synchronous manner.
networkQueue.async {
let semaphore = DispatchSemaphore(0)
let task = session.dataTask(with: url) { data, _, error in
// process the response
// when done, signal that we're done
semaphore.signal()
}
task.resume()
semaphore.wait(timeout: .distantFuture)
}
The virtue of this approach is that the dispatched task won't finish until the asynchronous network request is done. So if you needed to issue a series of network requests, but not have them run concurrently, semaphores can accomplish that.
Semaphores should be used sparingly, though, because they're inherently inefficient (generally blocking one thread waiting for another). Also, make sure you never wait for a semaphore from the main thread (because you're defeating the purpose of having the asynchronous task). That's why in the above example, I'm waiting on the networkQueue, not the main queue. All of this having been said, there's often better techniques than semaphores, but it is sometimes useful.
Operation queues: Operation queues are built on top of GCD dispatch queues, but offer some interesting advantages including:
The ability to wrap an inherently asynchronous task in a custom Operation subclass. (This avoids the disadvantages of the semaphore technique I discussed earlier.) Dispatch queues are generally used when running inherently synchronous tasks on a background thread, but sometimes you want to manage a bunch of tasks that are, themselves, asynchronous. A common example is the wrapping of asynchronous network requests in Operation subclass.
The ability to easily control the degree of concurrency. Dispatch queues can be either serial or concurrent, but it's cumbersome to design the control mechanism to, for example, to say "run the queued tasks concurrent with respect to each other, but no more than four at any given time." Operation queues make this much easier with the use of maxConcurrentOperationCount. (See https://stackoverflow.com/a/27022598/1271826 for an example.)
The ability to establish dependencies between various tasks (e.g. you might have a queue for downloading images and another queue for manipulating the images). With operation queues you can have one operation for the downloading of an image and another for the processing of the image, and you can make the latter dependent upon the completion of the former.
There are lots of other GCD related applications and technologies, but these are a few that I use with some frequency.

If one thread is busy on I/O will the entire process be blocked

In a multi-threaded process,If one thread is busy on I/O will the entire process be blocked?
AFAIK, it totally depends on programmer that how they manage the threads inside the programs.
If another thread is there with no I/O, processor will never sit idle & start executing this thread. However, process in split threads such that one thread waits for the result of the other, the the entire process will be blocked.
Please comment if more information needs to be added.
Does there exist any other explaination?
If the process has only one thread, then yes.
If the process has multiple threads, then normally no if the operating system supports multithreading.
This question can also be addressed in terms of the underlying implementation of user threads. There are different models for multithreading models, in order to implement user threads they have to be mapped to a kernel thread:
Many-to-One: Many user threads to one kernel thread
One-to-One: Each user thread is assigned to a kernel thread.
Many-to-Many: Many user threads are split on different kernel threads.
In the many-to-one case, a single block-operation (system call) within the thread can block the whole process. This disadvantage is not present in the one-to-one model.

Networking using run loop

I have an application which uses some external library for analytics. Problem is that I suspect it does some things synchronously, which blocks my thread and makes watchdog kill my app after 10 secs (0x8badf00d code). It is really hard to reproduce (I cannot), but there are quite few cases "in the wild".
I've read some documentation, which suggested that instead creating another thread I should use run-loops. Unfortunately the more I read about them, the more confused I get. And the last thing i want to do is release a fix which will break even more things :/
What I am trying to achieve is:
From main thread add a task to the run-loop, which calls just one function: initMyAnalytics(). My thread continues running, even if initMyAnalytics() gets locked waiting for network data. After initMyAnalytics() finishes, it quietly quits and never gets called again (so it doesnt loop or anything).
Any ideas how to achieve it? Code examples are welcome ;)
Regards!
You don't need to use a run loop in that case. Run loops' purpose is to proceed events from various sources sequentially in a particular thread and stay idle when they have nothing to do. Of course, you can detach a thread, create a run loop, add a source for your function and run the run loop until the function ends. The same as you can use a semi-trailer truck to carry your groceries home.
Here, what you need are dispatch queues. Dispatch queues are First-In-First-Out data structures that run tasks asynchronously. In contrary to run loops, a dispatch queue isn't tied to a particular thread: the working threads are automatically created and terminated as and when required.
As you only have one task to execute, you don't need to create a dispatch queue. Instead you will use an existing global concurrent queue. A concurrent queue execute one or more tasks concurrently, which is perfectly fine in our case. But if we had many tasks to execute and wanted each task to wait for its predecessor to end, we would need to create a serial queue.
So all you have to do is:
create a task for your function by enclosing it into a Block
get a global queue using dispatch_get_global_queue
add the task to the queue using dispatch_async.
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
initMyAnalytics();
});
DISPATCH_QUEUE_PRIORITY_DEFAULT is a macro that evaluates to 0. You can get different global queues with different priorities. The second parameter is reserved for future use and should always be 0.

NSManagedObjectContext deadlocking from 2 serial queues

I've created a system where i can request an NSManagedObjectContext from a singleton object, dependant on the queue it's running on. Every serial GCD dispatch queue is associated with a certain task, and thus gets its own context, though all with the same persistent store coordinator.
I was under the assumption that this would solve my problems associated with threads, which it so far seems to have done, but now i have a different problem: If 2 serial queues, with different MOCs, both try to make the context execute, they both lock and the app freezes. So what did i miss?
"...[I]f you create one context per thread, but all pointing to the same persistent store coordinator, Core Data takes care of accessing the coordinator in a thread-safe way (the lock and unlock methods of NSManagedObjectContext handle recursion)." (source)
What i read there, is that Core Data should handle locking and unlocking correctly with my setup. Or do i understand 'in a thread-safe way' wrong in this case?
Edit: I basically have a dictionary that maps a queue to a context. At first i wanted to work with threads instead of queues, until i read this part:
"Note: You can use threads, serial operation queues, or dispatch queues for concurrency. For the sake of conciseness, this article uses “thread” throughout to refer to any of these." (source)
If by "serial queue" you mean GCD dispatch queue or NSOperationQueue, you are making incorrect assumptions that each queue has a dedicated thread or that the tasks for each queue always run on the same thread.
You need to figure out a way of mapping a thread to a managed object context, perhaps by way of an NSDictionary and when you run a task on your queue, get the MOC associated with the current thread.
JeremyP is right: queues do not == threads. A queue may create a new thread for each operation - Core Data (in the default mode) requires thread confinement (that is, the thread that created the NSManagedObjectContext must be the thread used for all access to any objects from that context).
You may want to check how the confinement options are used - if you're targeting iOS5 alone, you might be able to change it without too much difficulty and still use the queues.

How to limit concurrency when using actors in Scala?

I'm coming from Java, where I'd submit Runnables to an ExecutorService backed by a thread pool. It's very clear in Java how to set limits to the size of the thread pool.
I'm interested in using Scala actors, but I'm unclear on how to limit concurrency.
Let's just say, hypothetically, that I'm creating a web service which accepts "jobs". A job is submitted with POST requests, and I want my service to enqueue the job then immediately return 202 Accepted — i.e. the jobs are handled asynchronously.
If I'm using actors to process the jobs in the queue, how can I limit the number of simultaneous jobs that are processed?
I can think of a few different ways to approach this; I'm wondering if there's a community best practice, or at least, some clearly established approaches that are somewhat standard in the Scala world.
One approach I've thought of is having a single coordinator actor which would manage the job queue and the job-processing actors; I suppose it could use a simple int field to track how many jobs are currently being processed. I'm sure there'd be some gotchyas with that approach, however, such as making sure to track when an error occurs so as to decrement the number. That's why I'm wondering if Scala already provides a simpler or more encapsulated approach to this.
BTW I tried to ask this question a while ago but I asked it badly.
Thanks!
I'd really encourage you to have a look at Akka, an alternative Actor implementation for Scala.
http://www.akkasource.org
Akka already has a JAX-RS[1] integration and you could use that in concert with a LoadBalancer[2] to throttle how many actions can be done in parallell:
[1] http://doc.akkasource.org/rest
[2] http://github.com/jboner/akka/blob/master/akka-patterns/src/main/scala/Patterns.scala
You can override the system properties actors.maxPoolSize and actors.corePoolSize which limit the size of the actor thread pool and then throw as many jobs at the pool as your actors can handle. Why do you think you need to throttle your reactions?
You really have two problems here.
The first is keeping the thread pool used by actors under control. That can be done by setting the system property actors.maxPoolSize.
The second is runaway growth in the number of tasks that have been submitted to the pool. You may or may not be concerned with this one, however it is fully possible to trigger failure conditions such as out of memory errors and in some cases potentially more subtle problems by generating too many tasks too fast.
Each worker thread maintains a dequeue of tasks. The dequeue is implemented as an array that the worker thread will dynamically enlarge up to some maximum size. In 2.7.x the queue can grow itself quite large and I've seen that trigger out of memory errors when combined with lots of concurrent threads. The max dequeue size is smaller 2.8. The dequeue can also fill up.
Addressing this problem requires you control how many tasks you generate, which probably means some sort of coordinator as you've outlined. I've encountered this problem when the actors that initiate a kind of data processing pipeline are much faster than ones later in the pipeline. In order control the process I usually have the actors later in the chain ping back actors earlier in the chain every X messages, and have the ones earlier in the chain stop after X messages and wait for the ping back. You could also do it with a more centralized coordinator.