How to synchronize tasks in different dispatch queues? - iPhone

I'm new to queues and I'm having some trouble setting up the following scheme.
I have three tasks that need doing.
Task A: Can only run on the main queue, can run concurrently with task B, cannot run concurrently with task C. Runs a lot but runs fairly quickly.
Task B: Can run on any queue, can run concurrently with task A, cannot run concurrently with task C. Runs rarely, but takes a long time to run. Needs task C to run afterwards, but once again task C cannot run concurrently with task A.
Task C: Can run on any queue. Cannot run concurrently with either task A or task B. Runs rarely and runs quickly.
Right now I have it like this:
Task A is submitted to the main queue from Serial Queue X (a task is submitted to Serial Queue X, which in turn submits task A to the main queue).
Task B is submitted to Serial Queue X.
Task C is submitted to the main queue by Serial Queue X, just like task A.
The problem here is that task C sometimes runs at the same time as task B: the main queue runs task C while the serial queue is still running task B.
So, how can I ensure that task B and task C never run at the same time while still allowing A and B to run at the same time and preventing A and C from running at the same time? Further, is there any easy way to make sure they run the same number of times? (alternating back and forth)

You know, I think I had this problem on my GRE, only A, B, and C were Bob, Larry, and Sue and they all worked at the same office.
I believe that this can be solved with a combination of a serial queue and a dispatch semaphore. If you set up a single-wide serial dispatch queue and submit tasks B and C to that, you'll guarantee that they won't run at the same time. You can then use a dispatch semaphore with a count set to 1 that is shared between tasks A and C to guarantee that only one of them will run at a time. I describe how such a semaphore works in my answer here. You might need to alter that code to use DISPATCH_TIME_FOREVER so that task A is held up before submission rather than just tossed aside if C is running (likewise for submission of C).
This way, A and B will be running on different queues (the main queue and your serial queue) so they can execute in parallel, but B and C cannot run at the same time due to their shared queue, nor can A and C because of the semaphore.
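A minimal sketch of that arrangement (the queue label and the runTaskA/runTaskB/runTaskC functions are placeholders, not from the question):
dispatch_queue_t queueBC = dispatch_queue_create("com.example.taskBC", NULL); // serial: B and C can never overlap
dispatch_semaphore_t semAC = dispatch_semaphore_create(1);                    // shared by A and C: only one of them at a time

// Task A: main queue; held up only while C holds the semaphore
dispatch_async(dispatch_get_main_queue(), ^{
    dispatch_semaphore_wait(semAC, DISPATCH_TIME_FOREVER);
    runTaskA();
    dispatch_semaphore_signal(semAC);
});

// Task B: serial queue; may overlap with A, never with C
dispatch_async(queueBC, ^{
    runTaskB();
});

// Task C: same serial queue as B, and takes the semaphore so it also excludes A
dispatch_async(queueBC, ^{
    dispatch_semaphore_wait(semAC, DISPATCH_TIME_FOREVER);
    runTaskC();
    dispatch_semaphore_signal(semAC);
});
The wait on the semaphore could equally be done on Serial Queue X before task A is submitted to the main queue, which is closer to what the answer describes; the point is simply that A and C both have to acquire it.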
As far as load balancing on A and C goes (what I assume you want to balance), that's probably going to be fairly application-specific, and might require some experimentation on your part to see how to interleave actions properly without wasting cycles. I'd also make sure that you really need them to alternate evenly, or if you can get by with one running slightly more than another.

Did you check out NSOperation to synchronize your operations? You can handle dependencies there.
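A rough sketch of what the NSOperation suggestion looks like (the block contents are placeholders); note that dependencies express ordering (C only starts after B finishes), not mutual exclusion:
NSOperationQueue *queue = [[NSOperationQueue alloc] init];
NSOperation *opB = [NSBlockOperation blockOperationWithBlock:^{ runTaskB(); }];
NSOperation *opC = [NSBlockOperation blockOperationWithBlock:^{ runTaskC(); }];
[opC addDependency:opB];   // C will not start until B has finished
[queue addOperation:opB];
[queue addOperation:opC];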

There's a much simpler way, of course, assuming that C must always follow A and B, which is to have A and B schedule C as completion callbacks for their own operations (and have C check that it's not already running, in case A and B both ask for it to happen simultaneously). The completion callback pattern (described in the dispatch_async man page) is very powerful and a great way of serializing async operations that nonetheless need to be coupled.
For the case where the problem is A, B, C, D and E, with A-D able to run asynchronously and E always running at the end, dispatch groups are a better solution, since you can set E to run as the completion callback for the entire group and then simply put A-D in that group.
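A sketch of that dispatch-group variant (the runTask functions are placeholders):
dispatch_group_t group = dispatch_group_create();
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

dispatch_group_async(group, queue, ^{ runTaskA(); });
dispatch_group_async(group, queue, ^{ runTaskB(); });
dispatch_group_async(group, queue, ^{ runTaskC(); });
dispatch_group_async(group, queue, ^{ runTaskD(); });

// E runs exactly once, after every task in the group has finished.
dispatch_group_notify(group, queue, ^{ runTaskE(); });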

Where does the scheduler run?

Having just finished a book on computer architecture, I find myself not completely clear on where the scheduler runs.
What I'm looking to have clarified is where the scheduler is running - does it have its own core assigned to run that and nothing else, or is the "scheduler" in fact just a more ambiguous algorithm that is implemented in every thread being executed - e.g. upon preemption of a thread, a switchToFrom() routine is run?
I don't need specifics for Windows X / Linux X / Mac OS X, just the general picture.
No, the scheduler does not run on its own core. In fact, multi-threading was common long before multi-core CPUs were.
The best way to see how scheduler code interacts with thread code is to start with a simple, cooperative, single-core example.
Suppose thread A is running and thread B is waiting on an event. Thread A posts that event, which causes thread B to become runnable. The event logic has to call the scheduler, and, for the purposes of this example, we assume that it decides to switch to thread B. At this point the call stack will look something like this:
thread_A_main()
post_event(...)
scheduler(...)
switch_threads(threadA, threadB)
switch_threads will save the CPU state on the stack, save thread A's stack pointer, and load the CPU stack pointer with the value of thread B's stack pointer. It will then load the rest of the CPU state from the stack, where the stack is now stack B. At this point, the call stack has become
thread_B_main()
wait_on_event(...)
scheduler(...)
switch_threads(threadB, threadC)
In other words, thread B has now woken up in the state it was in when it previously yielded control to thread C. When switch_threads() returns, it returns control to thread B.
This kind of manipulation of the stack pointer usually requires some hand-coded assembler.
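To get a feel for what switch_threads() does without dropping into assembler, here is a rough equivalent sketched on top of POSIX ucontext (the thread_t wrapper is invented for illustration; a real kernel does this register by register in assembly):
#include <ucontext.h>

typedef struct {
    ucontext_t ctx;   /* saved registers and stack pointer for this thread */
} thread_t;

/* Save the current thread's CPU state into 'from', then resume 'to'.
 * Execution continues past this call only when some later
 * switch_threads() switches back to 'from'. */
static void switch_threads(thread_t *from, thread_t *to)
{
    swapcontext(&from->ctx, &to->ctx);
}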
Add Interrupts
Thread B is running and a timer interrupt occurs. The call stack is now:
thread_B_main()
foo() //something thread B was up to
interrupt_shell
timer_isr()
interrupt_shell is a special function. It is not called. It is preemptively invoked by the hardware. foo() did not call interrupt_shell, so when interrupt_shell returns control to foo(), it must restore the CPU state exactly. This is different from a normal function, which returns leaving the CPU state according to calling conventions. Since interrupt_shell follows different rules to those stated by the calling conventions, it too must be written in assembler.
The main job of interrupt_shell is to identify the source of the interrupt and call the appropriate interrupt service routine (ISR) which in this case is timer_isr(), then control is returned to the running thread.
Add preemptive thread switches
Suppose timer_isr() decides that it's time for a time-slice and thread D is to be given some CPU time:
thread_B_main()
foo() //something thread B was up to
interrupt_shell
timer_isr()
scheduler()
Now, scheduler() can't call switch_threads() at this point because we are in interrupt context. However, it can be called soon after, usually as the last thing interrupt_shell does. This leaves the thread B stack saved in this state:
thread_B_main()
foo() //something thread B was up to
interrupt_shell
switch_threads(threadB, threadD)
Add Deferred Service Routines
Some OSes do not allow you to do complex logic like scheduling from within ISRs. One solution is to use a deferred service routine (DSR), which runs at a higher priority than threads but lower than interrupts. These are used so that, while scheduler() still needs to be protected from being preempted by DSRs, ISRs can execute without a problem. This reduces the number of places where a kernel has to mask (switch off) interrupts to keep its logic consistent.
I once ported some software from an OS that had DSRs to one that didn't. The simple solution was to create a "DSR thread" that ran at a higher priority than all other threads. The "DSR thread" simply replaces the DSR dispatcher that the other OS used.
Add traps
You may have observed that in the examples I've given so far, we call the scheduler from both thread and interrupt contexts. There are two ways in and two ways out. It looks a bit weird but it does work. However, moving forward, we may want to isolate our thread code from our kernel code, and we do this with traps. Here is the event posting redone with traps:
thread_A_main()
post_event(...)
user_space_scheduler(...)
trap()
interrupt_shell
kernel_space_scheduler(...)
switch_threads(threadA, threadB)
A trap causes an interrupt or an interrupt-like event. On the ARM CPU they are known as "software interrupts" and this is a good description.
Now all calls to switch_threads() begin and end in interrupt context, which, incidentally, usually happens in a special CPU mode. This is a step towards privilege separation.
As you can see, scheduling wasn't built in a day. You could go on:
Add a memory mapper
Add processes
Add multiple Cores
Add hyperthreading
Add virtualization
Happy reading!
Each core is separately running the kernel, and cooperates with other cores by reading / writing shared memory. One of the shared data structures maintained by the kernel is the list of tasks that are ready to run, and are just waiting for a timeslice to run in.
The kernel's process / thread scheduler runs on the core that needs to figure out what to do next. It's a distributed algorithm with no single decision-making thread.
Scheduling doesn't work by figuring out what task should run on which other CPU. It works by figuring out what this CPU should do now, based on which tasks are ready to run. This happens whenever a thread uses up its timeslice, or makes a system call that blocks. In Linux, even the kernel itself is pre-emptible, so a high-priority task can be run even in the middle of a system call that takes a lot of CPU time to handle. (e.g. checking the permissions on all the parent directories in an open("/a/b/c/d/e/f/g/h/file", ...), if they're hot in VFS cache so it doesn't block, just uses a lot of CPU time).
I'm not sure if this is done by having the directory-walking loop in (a function called by) open() "manually" call schedule() to see if the current thread should be pre-empted or not. Or maybe tasks waking up will have set some kind of hardware timer to fire an interrupt, and the kernel in general is pre-emptible if compiled with CONFIG_PREEMPT.
There's an inter-processor interrupt mechanism to ask another core to schedule something on itself, so the above description is an over-simplification. (e.g. for Linux run_on to support RCU sync points, and TLB shootdowns when a thread on another core uses munmap). But it's true that there isn't one "master control program"; generally the kernel on each core decides what that core should be running. (By running the same schedule() function on a shared data-structure of tasks that are ready to run.)
The scheduler's decision-making is not always as simple as taking the task at the front of the queue: a good scheduler will try to avoid bouncing a thread from one core to another (because its data will be hot in the caches of the core it was last running on, if that was recent). So to avoid cache thrashing, a scheduler algorithm might choose not to run a ready task on the current core if it was just running on a different core, instead leaving it for that other core to get to later. That way a brief interrupt-handler or blocking system call wouldn't result in a CPU migration.
This is especially important in a NUMA system, where running on the "wrong" core will be slower long-term, even once the caches populate.
There are three types of general schedulers:
Job scheduler also known as the Long term scheduler.
Short term scheduler also known as the CPU scheduler.
Medium term scheduler, mostly used to swap jobs so there can be non-blocking calls. This is usually to avoid having too many or too few I/O-bound jobs.
An operating systems book will show a nice automaton of the states these schedulers move jobs between: the job scheduler puts things from the job queue into the ready queue, and the CPU scheduler takes things from the ready queue to the running state. The scheduling algorithm is software like any other; it must run on a CPU core, and it is most likely part of the kernel somewhere.
It doesn't make sense for the scheduler itself to be preempted. The jobs inside the queue can be preempted while running, for I/O, etc. No, the kernel does not have to schedule itself to allocate a task; it just gets CPU time without scheduling itself. And yes, the data is most likely in RAM; I'm not sure whether it is worth keeping in the CPU cache.

Networking using run loop

I have an application which uses an external library for analytics. The problem is that I suspect it does some things synchronously, which blocks my thread and makes the watchdog kill my app after 10 seconds (0x8badf00d code). It is really hard to reproduce (I cannot), but there are quite a few cases "in the wild".
I've read some documentation which suggested that instead of creating another thread I should use run loops. Unfortunately, the more I read about them, the more confused I get. And the last thing I want to do is release a fix which will break even more things :/
What I am trying to achieve is:
From the main thread, add a task to the run loop which calls just one function: initMyAnalytics(). My thread continues running, even if initMyAnalytics() gets blocked waiting for network data. After initMyAnalytics() finishes, it quietly quits and never gets called again (so it doesn't loop or anything).
Any ideas how to achieve it? Code examples are welcome ;)
Regards!
You don't need to use a run loop in that case. A run loop's purpose is to process events from various sources sequentially in a particular thread and stay idle when it has nothing to do. Of course, you could detach a thread, create a run loop, add a source for your function, and run the run loop until the function ends. The same way you could use a semi-trailer truck to carry your groceries home.
Here, what you need are dispatch queues. Dispatch queues are first-in-first-out data structures that run tasks asynchronously. Unlike run loops, a dispatch queue isn't tied to a particular thread: the worker threads are automatically created and terminated as and when required.
As you only have one task to execute, you don't need to create a dispatch queue. Instead you will use an existing global concurrent queue. A concurrent queue executes one or more tasks concurrently, which is perfectly fine in our case. But if we had many tasks to execute and wanted each task to wait for its predecessor to end, we would need to create a serial queue.
So all you have to do is:
create a task for your function by enclosing it in a block
get a global queue using dispatch_get_global_queue
add the task to the queue using dispatch_async.
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    initMyAnalytics();
});
DISPATCH_QUEUE_PRIORITY_DEFAULT is a macro that evaluates to 0. You can get different global queues with different priorities. The second parameter is reserved for future use and should always be 0.
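For completeness, the serial-queue variant mentioned above would look almost the same; only the queue changes (the label here is arbitrary, just a sketch):
dispatch_queue_t queue = dispatch_queue_create("com.example.analytics", NULL); // NULL attributes => serial queue
dispatch_async(queue, ^{
    initMyAnalytics();
});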

Running two instances of the ScheduledThreadPoolExecutor

I have a number of asynchronous tasks to run in parallel. The tasks can be divided into two types; let's call one type A (time consuming) and everything else type B (faster and quicker to execute).
With a single ScheduledThreadPoolExecutor with a pool size of x, eventually all the threads end up busy executing type A tasks; as a result, type B tasks get blocked and delayed.
What I'm trying to accomplish is to run type A tasks in parallel with type B tasks, and I want the tasks within each type to run in parallel within their own group for performance.
Do you think it's prudent to have two instances of ScheduledThreadPoolExecutor, one each for type A and type B, with their own thread pools? Do you see any issues with this approach?
No, that seems reasonable.
I am doing something similar, i.e. I need to execute tasks serially depending on some id, e.g. all the tasks for the component with id="1" need to be executed serially with respect to each other and in parallel with all other tasks for components with different ids.
So basically I need a separate queue of tasks for each different component; the tasks are pulled one after another from each specific queue.
In order to achieve that, I use:
Executors.newSingleThreadExecutor(new JobThreadFactory(componentId));
for each component.
Additionally, I need an ExecutorService for a different type of tasks which are not bound to componentIds; for that I create an additional ExecutorService instance:
Executors.newFixedThreadPool(DEFAULT_THREAD_POOL_SIZE, new JobThreadFactory());
This works fine for my case at least.
The only problem I can think of is if there is a need for ordered execution of the tasks, i.e. task2 NEEDS to be executed after task1 and so on... But I doubt that's the case here.

Condor job using DAG with some jobs needing to run on the same host

I have a computation task which is split into several individual program executions, with dependencies. I'm using Condor 7 as the task scheduler (with the Vanilla Universe, due to constraints on the programs beyond my reach, so no checkpointing is involved), so a DAG looks like a natural solution. However, some of the programs need to run on the same host. I could not find a reference on how to do this in the Condor manuals.
Example DAG file:
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
I need to express that B and D need to be run on the same computer node, without breaking the parallel execution of B and C.
Thanks for your help.
Condor doesn't have any simple solutions, but there is at least one kludge that should work:
Have B leave some state behind on the execute node, probably in the form of a file, that says something like MyJobRanHere=UniqueIdentifier. Use the STARTD_CRON support to detect this and advertise it in the machine ClassAd. Have D use Requirements=MyJobRanHere=="UniqueIdentifier". As part of D's final cleanup, or perhaps in a new node E, remove the state. If you're running large numbers of jobs through, you'll probably need to clean out left-over state occasionally.
I don't know the answer but you should ask this question on the Condor Users mailing list. The folks who support the DAG functionality in Condor monitor it and will respond. See this page for subscription information. It's fairly low traffic.
It's generally fairly difficult to keep two jobs together on the same host in Condor without locking them to a specific host in advance, DAG or no DAG. I actually can't think of a really viable way to do this that would let B start before C or C start before B. If you were willing to enforce that B must always start before C, you could make part of the work that Job B does when it starts running be to modify the Requirements portion of Job C's ClassAd so that it has a "Machine == <hostname>" clause, where <hostname> is the name of the machine B landed on. This would also require that Job C be submitted held, or not submitted at all until B was running; B would then also have to release it as part of its start-up work.
That's pretty complicated...
So I just had a thought: you could use Condor's dynamic startd/slot features and collapse your DAG to achieve what you want. In your DAG, where you currently have two separate nodes, B and C, you would collapse them into one node B' that runs both B and C in parallel when it starts on a machine. As part of the job requirements you note that it needs 2 CPUs on a machine. Switch your startds to use the dynamic slot configuration so machines advertise all of their resources, not just statically allocated slots. Now you have B and C always running concurrently on one machine. There are some starvation issues with dynamic slots when you have a few multi-CPU jobs in a queue with lots of single-CPU jobs, but it's at least a more readily solved problem.
Another option is to tag B' with a special job attribute:
MultiCPUJob = True
And target it just at slot 1 on machines:
Requirements = Slot == 1 && ...your other requirements...
And have a static slot startd policy that says, "If a job with MultiCPUJob=True tries to run on slot 1 on me preempt any job that happens to be in slot 2 on this machine because I know this job will need 2 cores/CPUs".
This is inefficient but can be done with any version of Condor past 6.8.x. I actually use this type of setup in my own statically partitioned farms so if a job needs a machine all to itself for benchmarking it can happen without reconfiguring machines.
If you're interested in knowing more about that preemption option let me know and I can point you to some further configuration reading in the condor-user list archives.
The solution here is to use the fact that you can modify submit descriptions even while DAGMan is running as long as DAGMan has not yet submitted the node. Assume a simple DAG of A -> B -> C. If you want all nodes to run on the same host you can do the following:
Define a POST script on node A.
The post script searches condor_history for the ClusterId of the completed node A. Something like condor_history -l -attribute LastRemoteHost -m1 $JOB_ID ... You'll need to clean up the output and what not, but you'll be left with the host that ran node A.
The post script then searches for and modifies dependent job submit files, inserting a job requirement at the top of each submit file. Just make sure you build your job requirements incrementally so that they pick up this new requirement if it is present.
When the post script completes, DAGMan will then look to submit ready nodes, of which in this example we have one: B. The submission of B will now be done with the new requirement you added in step 3, so that it will run on the same execute host as A.
I do this currently with numerous jobs. It works great.

multithreading and multiprocessing

I am trying to understand basic OS concepts
I want to know if my understanding is right.
Multi-processing
Example: I invoke A.exe on my machine. I invoke another instance of it again. So there would be two instances of A.exe in RAM, which are called processes, and the OS would do multi-processing between them by means of context switching and so on.
Multi-threading
Example: A.exe consists of two things, say programs C and D. Assume that invoking A.exe means running C and D simultaneously. In that case:
1. Program A would create C and D as threads, spawning or starting them as soon as A.exe is loaded.
2. C and D are threads, and only when process A.exe is given a chance to execute does multi-threading between C and D happen.
3. C and D share the same process space allotted to A.
Is this correct?
Largely correct
it is not required that a process creates all its threads at the beginning; they can be created as needed, and as many as needed can be created
the operating system multi-tasks between threads; many processes consist of a single thread, others of several. The operating system has complicated ways of working out how to balance the CPU time of all the threads in the system, based on whether they need to run and what their priority is; it's not as simple as you describe, and it's not based on the process they are part of (except when that is part of the weighting system in the scheduler)
Multi-threading allows the threads to share state easily - there is no 'memory protection' between the threads in the same process (see the sketch after this list)
Multiple processes do not allow threads to share state except explicitly, e.g. by passing messages, sharing file handles, or explicitly sharing memory.
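A minimal sketch (not part of the original answer) of that shared-state point: two threads of one process updating the same global variable directly, with only a mutex for synchronization rather than any copying or message passing.
#include <pthread.h>
#include <stdio.h>

static int shared_counter = 0;                        /* visible to every thread in the process */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);                    /* shared state still needs synchronization */
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t c, d;                                   /* the "C" and "D" threads of the question */
    pthread_create(&c, NULL, worker, NULL);
    pthread_create(&d, NULL, worker, NULL);
    pthread_join(c, NULL);
    pthread_join(d, NULL);
    printf("%d\n", shared_counter);                   /* both threads updated the same memory */
    return 0;
}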
Yes. You are correct