Are callbacks in Mesos Scheduler thread-safe?

Does each callback to the scheduler from the Mesos master get run in a separate thread?

Multiple callbacks on the same scheduler will never be invoked simultaneously -- i.e., if a second event is delivered for a scheduler while a previous scheduler callback is still active, the second callback will not be invoked until the first callback completes. Note that this implies you should avoid performing blocking or long-running operations in scheduler callbacks -- a typical pattern is to dispatch an asynchronous operation in the callback.
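For illustration, here is a sketch of that pattern (hypothetical names; real Mesos schedulers are typically written against the C++, Java, or Python driver APIs, so treat this Swift sketch as a shape, not an API):

import Dispatch

// Hypothetical scheduler: the driver serializes callbacks, so hand
// long-running work to a background queue and return quickly.
final class MyScheduler {
    private let workQueue = DispatchQueue(label: "scheduler.work")  // invented label

    // Invoked by the driver; never runs concurrently with other callbacks.
    func resourceOffers(_ offers: [String]) {
        workQueue.async {
            processOffers(offers)  // long-running work happens off the callback path
        }
        // Returning promptly lets the driver deliver the next event.
    }
}

func processOffers(_ offers: [String]) { /* hypothetical work */ }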

Related

Is DispatchQueue using RunLoop?

Every Thread has its own RunLoop; how does DispatchQueue interact with them? Is DispatchQueue using RunLoop to dispatch tasks to a Thread, or does it do it another way?
Any thread can have a run loop, but, nowadays, in practice, only the main thread does.
When you create a thread manually, it will not have a run loop. The name RunLoop.current suggests that it grabs the thread's existing run loop, implying the thread will always have one. But in reality, when you call current, it returns the run loop if one is already there and, if not, creates a RunLoop for you. As the docs say:
If a run loop does not yet exist for the thread, one is created and returned.
And if you do create a run loop, you have to spin on it yourself (as shown here; and that example is over-simplified). But we don’t do that very often anymore. GCD has rendered it largely obsolete.
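For reference, a minimal sketch of that manual spinning (a hypothetical worker thread; the Port is added only so the loop has a source and doesn't exit immediately):

import Foundation

let worker = Thread {
    // RunLoop.current lazily creates the run loop on first access.
    let runLoop = RunLoop.current
    runLoop.add(Port(), forMode: .default)  // a source, so run() has work to wait on
    runLoop.run()                           // spin the loop ourselves
}
worker.start()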
At a high level, GCD has pools of worker threads, one pool per quality of service (QoS). When you dispatch something via GCD to any queue (other than targeting the main queue), it grabs an available worker thread of the appropriate QoS, performs the task, and when done, marks the worker thread as available for future dispatched tasks. No run loop is needed (or desired) for these worker threads.
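By contrast, a minimal sketch of dispatching onto a GCD worker thread, where no run loop is involved:

import Dispatch

// Grabs an available worker thread of .utility QoS, runs the block,
// then returns the thread to the pool.
DispatchQueue.global(qos: .utility).async {
    print("work done on a pooled worker thread")
}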

Why does DispatchQueue.main give me the option of sync or async if the main queue is always sync?

Why does DispatchQueue.main give me the option of sync or async if the main queue is always sync?

DispatchQueue.main.async {
    // code
}
DispatchQueue.main.async {
    // code
}

Is that the same as

DispatchQueue.main.sync {
    // code
}
DispatchQueue.main.sync {
    // code
}

So I was wondering if there is any difference.
The main queue is no different than any other queue with regard to whether you can enqueue something to it synchronously or asynchronously.
The only rule you must adhere to is to never use sync to enqueue something onto the same queue currently being used. That will cause a deadlock. Again, this is true no matter the queue, main or otherwise.
To answer your question "is that the same?": no, it is not. Assuming both sets of code are called from some background queue, in the first set the background queue keeps moving along without regard to when the two blocks eventually get executed on the main queue. In the second set (using sync), the background queue blocks each time until each block of code has run on the main queue.
If both sets of code are being called from the main queue, then there is a bigger difference. The first set of code (with async) keeps working. The second set of code (with sync) will cause the main queue to block at the first call to sync and your app will become unresponsive until the user (or the OS) kills it.
The only possibly relevant difference between the main queue and other queues is that the main queue is always a serial queue, while background queues can be either serial or concurrent. But both have valid uses for sync and async, as long as you avoid calling sync on the queue you are already running on.
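To make the difference concrete, a small sketch (an invented serial queue; the commented-out lines show the deadlock case):

import Dispatch

let background = DispatchQueue(label: "com.example.background")  // invented label

background.async {
    // async onto main: this background queue keeps moving immediately.
    DispatchQueue.main.async { print("ran on main, eventually") }

    // sync onto main: this background queue blocks until main runs the block.
    DispatchQueue.main.sync { print("ran on main; the caller waited") }
}

// Never do this from code already running on the main queue:
// DispatchQueue.main.sync { }  // deadlock: main waits for a block that
//                              // can only run once main is free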
Async: non-blocking; the block gets scheduled to run somewhere in the near future on that specific queue.
Sync: blocks the caller until the block has been executed on that specific queue.
What you are describing is the situation where you are executing code on the main thread and dispatching something asynchronously or synchronously onto that same queue. For async this is no problem, as the block gets scheduled somewhere in the near future, when there is a time slot available on that queue. However, a sync block executed on the same queue it is called from can deadlock.
This holds not only for the main queue but for any other queue as well, so if you are dispatching synchronously to a queue, make sure it is a different queue from the one you are currently working on.

How does the scheduler know a task is in the blocked state?

I am reading "An Embedded Software Primer" by David E. Simon.
It discusses RTOSes and their building blocks, the scheduler and tasks. It says each task is either in the ready state, the running state, or the blocked state. My question is: how does the scheduler determine that a task is in the blocked state? Assume a task is waiting for a semaphore; then presumably the semaphore call can't return. Does the scheduler see that a function does not return, and then mark its state as blocked?
The implementation details will vary by RTOS. Generally, each task has a state variable that identifies whether the task is ready, running, or blocked. The scheduler simply reads the task's state variable to determine whether the task is blocked.
Each task has a set of parameters that determine the state and context of the task. These parameters are often stored in a struct and called the "task control block" (although the implementation varies by RTOS). The ready/run/block state variable may be a part of the task control block.
When the task attempts to get the semaphore and the semaphore is not available then the task will be set to the blocked state. More specifically, the semaphore-get function will change the task from running to blocked. And then the scheduler will be called to determine which task should run next. The scheduler will read through the task state variables and will not run those tasks that are blocked.
When another task eventually sets the semaphore then the task that is blocked on the semaphore will be changed from the blocked to the ready state and the scheduler may be called to determine if a context switch should occur.
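As a minimal illustrative sketch (invented names, far simpler than any real RTOS kernel), note that it is the semaphore-get call, not the scheduler, that flips the task's state:

enum TaskState { case ready, running, blocked }

final class TaskControlBlock {
    var state: TaskState = .ready
    let priority: Int
    init(priority: Int) { self.priority = priority }
}

final class Kernel {
    var tasks: [TaskControlBlock] = []
    var semaphoreCount = 0
    var waiters: [TaskControlBlock] = []  // tasks blocked on the semaphore

    // Called by the running task when it tries to take the semaphore.
    func semaphoreGet(_ task: TaskControlBlock) {
        if semaphoreCount > 0 {
            semaphoreCount -= 1        // available: the task stays running
        } else {
            task.state = .blocked      // unavailable: mark the task blocked...
            waiters.append(task)
            schedule()                 // ...and pick another task to run
        }
    }

    // Called by some other task when it releases the semaphore.
    func semaphorePut() {
        if waiters.isEmpty {
            semaphoreCount += 1
        } else {
            waiters.removeFirst().state = .ready  // blocked -> ready
            schedule()                            // may cause a context switch
        }
    }

    // The scheduler just reads each task's state variable: blocked tasks
    // are skipped, and the highest-priority ready task is run.
    func schedule() {
        tasks.filter { $0.state == .ready }
            .max { $0.priority < $1.priority }?
            .state = .running
    }
}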
As I'm writing an RTOS ( http://distortos.org/ ), I thought that I might chime in.
The variable which holds the state of each thread is indeed usually implemented in RTOSes, and this includes my version:
https://github.com/DISTORTEC/distortos/blob/master/include/distortos/ThreadState.hpp#L26
https://github.com/DISTORTEC/distortos/blob/master/include/distortos/internal/scheduler/ThreadControlBlock.hpp#L329
However, this variable is usually used only as a debugging aid or for additional checks (like preventing you from starting a thread that has already been started).
In RTOSes targeted at deeply embedded systems, the distinction between ready and blocked is usually made using the containers that hold the threads. The threads are typically "chained" into linked lists, usually sorted by priority and insertion time. The scheduler has its own list of threads that are "ready" ( https://github.com/DISTORTEC/distortos/blob/master/include/distortos/internal/scheduler/Scheduler.hpp#L340 ). Each synchronization object (like a semaphore) also has its own list of threads which are "blocked" waiting for this object ( https://github.com/DISTORTEC/distortos/blob/master/include/distortos/Semaphore.hpp#L244 ). When a thread attempts to use a semaphore that is currently not available, it is simply moved from the scheduler's "ready" list to the semaphore's "blocked" list ( https://github.com/DISTORTEC/distortos/blob/master/source/synchronization/Semaphore.cpp#L82 ). The scheduler doesn't need to decide anything, as now, from the scheduler's perspective, this thread is just gone. When this semaphore is later released by another thread, the first thread waiting on the semaphore's "blocked" list is moved back to the scheduler's "ready" list ( https://github.com/DISTORTEC/distortos/blob/master/source/synchronization/Semaphore.cpp#L39 ).
Usually there's no need to make a special distinction between threads that are ready and the thread that is actually running. As the number of threads that can actually run is fixed and equal to the number of available CPU cores, all you need is a pointer for each CPU core which points to the thread from the "ready" list that is running on that core at that moment. In my system I do the same: the thread at the head of the "ready" list is the one that is running, but I also maintain an iterator which points to that thread ( https://github.com/DISTORTEC/distortos/blob/master/include/distortos/internal/scheduler/Scheduler.hpp#L337 ). You could have a separate list for running threads, but in most cases it would be a waste of space (there's usually just one) and would make other things slightly more complicated.
I've actually written an article about thread states and their transitions, if you're interested: http://distortos.org/documentation/task-states/ . That article makes no special distinction between a thread that is "ready" and the one that is actually running; I don't consider that distinction useful for anything, as long as you have other means to tell which of the "ready" threads is running.
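A sketch of the list-based approach described above (invented names again): here "blocked" is not a flag the scheduler reads; it is simply which list currently owns the thread:

final class TCB {
    let priority: Int
    init(priority: Int) { self.priority = priority }
}

final class ReadyList {
    // Sorted by priority; the head is, by definition, the running thread.
    private(set) var threads: [TCB] = []
    var running: TCB? { threads.first }

    func insert(_ t: TCB) {
        let i = threads.firstIndex { $0.priority < t.priority } ?? threads.endIndex
        threads.insert(t, at: i)
    }
    func remove(_ t: TCB) { threads.removeAll { $0 === t } }
}

final class Semaphore {
    private var value = 0
    private var blocked: [TCB] = []  // threads waiting on *this* object

    func wait(current: TCB, ready: ReadyList) {
        if value > 0 { value -= 1; return }
        ready.remove(current)        // from the scheduler's perspective,
        blocked.append(current)      // this thread is now simply gone
    }

    func post(ready: ReadyList) {
        if blocked.isEmpty { value += 1; return }
        ready.insert(blocked.removeFirst())  // blocked -> ready again
    }
}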

Networking using run loop

I have an application which uses an external library for analytics. The problem is that I suspect it does some things synchronously, which blocks my thread and makes the watchdog kill my app after 10 secs (the 0x8badf00d code). It is really hard to reproduce (I cannot), but there are quite a few cases "in the wild".
I've read some documentation which suggested that, instead of creating another thread, I should use run loops. Unfortunately, the more I read about them, the more confused I get. And the last thing I want to do is release a fix which breaks even more things :/
What I am trying to achieve is:
From the main thread, add a task to the run loop which calls just one function: initMyAnalytics(). My thread continues running, even if initMyAnalytics() gets locked waiting for network data. After initMyAnalytics() finishes, it quietly quits and never gets called again (so it doesn't loop or anything).
Any ideas how to achieve it? Code examples are welcome ;)
Regards!
You don't need a run loop in that case. Run loops' purpose is to process events from various sources sequentially on a particular thread, and to stay idle when they have nothing to do. Of course, you can detach a thread, create a run loop, add a source for your function, and run the run loop until the function ends. That's like using a semi-trailer truck to carry your groceries home.
Here, what you need are dispatch queues. Dispatch queues are first-in, first-out data structures that run tasks asynchronously. Unlike run loops, a dispatch queue isn't tied to a particular thread: the worker threads are automatically created and terminated as and when required.
As you only have one task to execute, you don't need to create a dispatch queue. Instead, you will use an existing global concurrent queue. A concurrent queue executes one or more tasks concurrently, which is perfectly fine in our case. But if we had many tasks to execute and wanted each task to wait for its predecessor to end, we would need a serial queue.
So all you have to do is:
create a task for your function by enclosing it in a block
get a global queue using dispatch_get_global_queue
add the task to the queue using dispatch_async.
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    initMyAnalytics();
});
DISPATCH_QUEUE_PRIORITY_DEFAULT is a macro that evaluates to 0. You can get different global queues with different priorities. The second parameter is reserved for future use and should always be 0.
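If you're writing Swift rather than Objective-C, the modern equivalent of the snippet above is (assuming the same hypothetical initMyAnalytics function):

DispatchQueue.global(qos: .default).async {
    initMyAnalytics()
}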

Who schedules the scheduler in an OS? Isn't it a chicken-and-egg scenario?

Who schedules the scheduler?
Which is the first task created, and how is that first task created? Isn't any resource or memory required for it? Isn't it like a chicken-and-egg scenario?
Isn't the scheduler a task? Does it get the CPU at the end of each time slice to check which task should be given the CPU next?
Are there any good links which make a person think about and understand all these concepts deeply, rather than spilling out some theory which needs to be learned by heart?
The scheduler is scheduled by
an (external) event such as an interrupt (disk done, mouse click, timer tick),
or an internal event (such as the completion of a thread, the signalling by a thread that it needs to wait for something, or the signalling of a thread that it has released a resource, or a trap caused by a thread doing something illegal like division by zero)
In short, it is triggered by any event that might require the set of runnable tasks, or their priorities, to be reevaluated. The scheduler decides which task(s) run next and passes control to the next task.
Typically, this "scheduling" of the scheduler is caused by the code associated with a hardware interrupt, or code associated with a system call.
While you can think of the scheduler as a real thread, in practice it doesn't need to be implemented that way, because it is executed with higher priority than any other task. Sophisticated OSes may in fact set aside a special thread that is the scheduler, and mark it busy when the scheduler gets control. That makes it pretty, but the bogus thread isn't scheduled by the scheduler.
One can have multiple schedulers: the highest priority one (e.g., the one we just described), and other schedulers which really are threads, and are run like other user tasks. Such lower priority schedulers tend to be used to manage actions which occur at much longer intervals, such as background jobs.
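A tiny sketch of that idea (invented names): the scheduler is just an ordinary function called at the end of interrupt handlers and system calls, not a task that must itself be scheduled:

var readyTasks = ["taskA", "taskB", "taskC"]  // hypothetical ready list
var current = 0

// Picks the next task; a real kernel would context-switch here.
func schedule() {
    current = (current + 1) % readyTasks.count
    print("now running:", readyTasks[current])
}

// Entry points that finish by invoking the scheduler:
func timerTickHandler() { schedule() }  // periodic timer interrupt
func onThreadBlocked()  { schedule() }  // a thread waits for a resource
func onResourceFreed()  { schedule() }  // a blocked thread became ready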
It is usually invoked periodically by a timed CPU interrupt.